I want to get the search terms that a user typed on Google to reach my long-tail landing page (and use them on that page).
Getting the "q" variable from the query string of the request's referrer (in ASP.NET C#) works well, but only if the referring Google page was not loaded over HTTPS.
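For reference, a minimal sketch of the HTTP case that does work, assuming ASP.NET's Request.UrlReferrer; the class and method names here are only illustrative, not taken from the original code:

    // Minimal sketch (ASP.NET, System.Web): read the "q" parameter from the
    // referring Google URL, if the browser sent a Referer header at all.
    using System;
    using System.Web;

    public static class SearchTermHelper
    {
        public static string GetGoogleQuery(HttpRequest request)
        {
            Uri referrer = request.UrlReferrer;          // null if no Referer header was sent
            if (referrer == null || !referrer.Host.Contains("google."))
                return null;

            // Parse the referrer's query string and pull out "q"
            var query = HttpUtility.ParseQueryString(referrer.Query);
            return query["q"];                           // null if "q" is absent (e.g. HTTPS searches)
        }
    }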
This is obviously a problem, because almost everyone is logged in to their Google account in the browser all the time, and if they are, all Google pages are automatically loaded (and redirected) over HTTPS.
When a user (on https://www.google.com) searches for something and clicks a search result, Google seems to redirect the user through an intermediate page that strips the request of its query string and replaces it with a different one that pretty much only contains the URL that the intermediate page should redirect to (i.e. the URL of my long-tail landing page).
Is there any way that I can get the original search terms that were used on https://www.google.com anyway? Maybe if JavaScript could access the browser history or something similar?
Is there any way that I can get the original search terms that were used on https://www.google.com
No, the full text of the HTTPS session is secured via SSL; this includes headers, URLs, etc. In your scenario, for security reasons, browsers omit the Referer header, so you won't be able to access it (unless the destination URL is also secured via HTTPS). This is part of the HTTP spec: 15.1.3 Encoding Sensitive Information in URIs.
The only thing you can do is put a disclaimer on your site to say it doesn't work over HTTPS.
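As a small, hypothetical illustration of that fallback: when the visitor arrives from an HTTPS Google search, Request.UrlReferrer is simply null and the keyword cannot be recovered, so the page has to show generic copy instead. This reuses the illustrative SearchTermHelper sketched in the question above; none of these names come from the original code:

    using System.Web;

    public static class Headline
    {
        public static string For(HttpRequest request)
        {
            string keyword = SearchTermHelper.GetGoogleQuery(request);

            return string.IsNullOrEmpty(keyword)
                ? "Welcome to our landing page"              // referrer stripped (HTTPS search)
                : "You searched for \"" + keyword + "\"";    // keyword recovered (plain-HTTP search)
        }
    }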
Since it is Google, it is not possible, because there is no shared link between Google and your website.
Once the user is on HTTPS, the browser does not send the Referer header to a non-HTTPS destination. I am sure you are aware that headers can be manipulated and cannot be trusted, but you may trust Google. However, due to its privacy policy, any activity performed on Google by Google users is not shared with third parties. Link
Again, in server-side languages you can find functions for the HTTP referrer but not an HTTPS referrer. That is for a reason!
Unless you have a collaboration with the originating server, which could make an exception to allow the HTTP referrer only for your website, it isn't possible.
Hope that helps! (in moving on) :)
EDIT: Wikipedia Link, see "Referrer Hiding" (second-to-last line)
To see the referrer data you need to either be a paying Google Ads customer (and the visitor must come via an ad click) or have your site on HTTPS as well. Certs are cheap these days, or you could use an intermediary like CloudFlare to do the SSL and have a self-signed cert on your site.
You can also see search queries, no matter the method used, with Google Webmaster Tools.
I wrote a little about this here: http://blogs.dixcart.com/public/technology/2012/03/say-goodbye-to-keyword-tracking.html
Related
We are a third-party company providing widget services for payments (for example). A parent website can add our widget, built in ASP.NET MVC 5, in an iframe. Our widget URL is completely secure (HTTPS), but the parent website where it is embedded is not. That parent website is used by hundreds of people to make payments. The problem is that they see the whole website as not "SECURE"; they cannot see that the iframe where they are making the payment is secure. Is there any way I can solve this issue? How can I make the parent website detect that the iframe it is using is HTTPS, and hence mark the whole page as secure? Or if there is another way to handle it, please guide me.
Parent - http
iframe - https
users : scared of making payments
You can't, because it's in fact not secure. HTTPS only provides protection if it's HTTPS all the way. If any part drops to HTTP, the whole channel is insecure.
Tell your clients to implement SSL to protect their users. If they choose not to, there's not really anything you can do about it. It might be worth updating your terms of service to indemnify your organization against damages should your clients not utilize SSL and some sort of breach should occur because of that, when using your widget. Basically, transfer the legal responsibility to the client.
I am very green to developing a web front-end.
I wanted to create an internally hosted site that will pull together a collection of resources.
This is the scenario:
I want to "embed" pages within the web app. My thoughts, make the site called look exactly like it does originally, but keep the navigation header above. I have googled quite a bit to try and get a good direction in where to take this. From what I have found, iframe is the way to go.
The issues:
We host Dell OpenManage Essentials on one of our servers. The only way to access it is through https://ome/. We currently do not have a CA in place, so the certificate currently on the server is expired and the browser shows a certificate error, which is accurate given the lack of a valid certificate.
My question:
1) Is an iframe the right approach to this situation?
2) How do I bypass, or at least give the user the ability to continue to, the embedded site? These sites are all internal.
You can't decide for a user to display content for a site with an expired certificate. The user has to accept that risk. That's why the browsers now immediately flag pages with expired certificates and make it super-obvious. There are few cases where you'd actually want to bypass this - so few, that the common browsers just don't make exceptions.
The VERY difficult way is to route your IFRAME url through a proxy that doesn't care about expired certs.
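For what it's worth, a rough sketch of what such a proxy could look like as an ASP.NET generic handler (.NET 4.5+), with certificate validation deliberately relaxed for the one internal host; the handler name and URL are assumptions, and this is a starting point rather than anything production-ready:

    // Fetch the internal page server-side, ignoring its expired certificate,
    // then relay the body so the iframe never sees the certificate error.
    using System.IO;
    using System.Net;
    using System.Web;

    public class OmeProxyHandler : IHttpHandler
    {
        public void ProcessRequest(HttpContext context)
        {
            // Only ever proxy the one internal host we trust (assumed: https://ome/)
            var request = (HttpWebRequest)WebRequest.Create("https://ome/");

            // Accept the expired/self-signed certificate for this request only
            request.ServerCertificateValidationCallback = (sender, cert, chain, errors) => true;

            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                context.Response.ContentType = response.ContentType;
                context.Response.Write(reader.ReadToEnd());
            }
        }

        public bool IsReusable { get { return true; } }
    }

Bear in mind that relative links, scripts and images inside the proxied page will still point at the original host, which is a big part of why this route is so painful.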
The less difficult way is to spend $50 on a certificate. Or you can even get a free one (YMMV) at https://startssl.com (at your own risk. I am unaffiliated.)
Q: Is it possible to manipulate the HTTP request headers, or use any other technique, when making a request in C# (to servers like yahoo.com/cnn.com), so that the size of the returned web page text (stream) can be greatly reduced: a simplified web page without all the extra scripts/images/CSS? Or, even better, can I request that only a sub-section of the web page that interests me be downloaded? I just need the response to be as small as possible so that it can be downloaded as quickly as possible before the page is processed later.
It really depends on the site, the services it provides, and the configuration it has. Things to look for that may help (not a complete list):
An exposed API that lets you access the data directly, e.g. an XML or JSON response.
Compression - your client has to request it via the appropriate HTTP headers, e.g. Accept-Encoding: gzip, deflate, and, needless to say, know how to process the response accordingly. SO thread on doing this in C#.
Requesting the mobile version of the site, if the site supports such a thing. How a site exposes such a version really depends on the site. Some prefix their URLs with m., some respond to the User-Agent string, some use other strategies...
Use the HTTP Range header. This also depends on whether the site supports it. MSDN link for the .NET API. (A sketch combining this with compression follows after this list.)
Have a play with tweaking some of the browser capabilities in your HTTP request header, see here. The response to this will vary from site to site, but this is how a client tells the server what it is capable of displaying and dealing with.
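By way of illustration, here is a small sketch combining the compression and Range points above using HttpClient; whether the server honours either is entirely up to the server, and the URL is just an example:

    using System;
    using System.Net;
    using System.Net.Http;
    using System.Net.Http.Headers;
    using System.Threading.Tasks;

    class SlimFetch
    {
        static async Task Main()
        {
            var handler = new HttpClientHandler
            {
                // Sends Accept-Encoding for us and transparently decompresses gzip/deflate
                AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
            };

            using (var client = new HttpClient(handler))
            {
                // Ask for only the first 16 KB of the page; servers that don't support
                // ranges will simply return 200 OK with the full body instead of 206.
                var request = new HttpRequestMessage(HttpMethod.Get, "https://www.cnn.com/");
                request.Headers.Range = new RangeHeaderValue(0, 16383);

                using (var response = await client.SendAsync(request))
                {
                    string body = await response.Content.ReadAsStringAsync();
                    Console.WriteLine("{0}: {1} chars", response.StatusCode, body.Length);
                }
            }
        }
    }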
There is no way to ask a server to render a different amount of data beyond what the server supports, in C# or any other language. I.e. there is no generic mechanism to tell a server "don't render inline CSS/JS/images" or "don't render ad content" or even "just give me the article text".
Many sites have "mobile" versions that will potentially have smaller page sizes, but they likely contain different or less information than the desktop version. You should be able to request the mobile version by picking a different URL or specifying a "user agent" corresponding to a phone (see the sketch at the end of this answer).
Some sites provide data as an RSS feed or some other means of obtaining the data automatically - you may want to check with each site.
If you know the particular portion of the page to download, you may be able to use the Range header for the GET request, but it may not be supported for dynamic pages.
Side notes:
- most sites will serve CSS/JS as separate files.
- make sure to check the license to see if there are any limitations for each site.
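As a rough illustration of the mobile-version point above, the sketch below sends a phone-style User-Agent with HttpClient; the UA string and URL are only examples, and many sites use an m. subdomain instead:

    using System;
    using System.Net.Http;
    using System.Threading.Tasks;

    class MobileFetch
    {
        static async Task Main()
        {
            using (var client = new HttpClient())
            {
                // Pretend to be a phone; some sites respond with a lighter page
                client.DefaultRequestHeaders.UserAgent.ParseAdd(
                    "Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X) " +
                    "AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148");

                string html = await client.GetStringAsync("https://www.cnn.com/");
                Console.WriteLine("Fetched {0} characters", html.Length);
            }
        }
    }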
Can anyone verify what I am seeing is not a bug? I am checking the referrer when someone comes to my site which is an ASP.NET C# store site. When I link from any of the other sites I control, my main page sees the referrer properly.
I am trying to support a third-party site that is linking to me. They have a Google Sites page at sites.google.com/site/whatever, and when I follow that link the referrer on my main page is blank.
Is that something Google is doing, or is it a truly bizarre bug in my code? (I know you can't see my code, but I would like verification that Google is stripping the referrer from their sites.google.com pages, please.)
Thanks
Google Sites is HTTPS by default, which means no referrer data is passed. This may be part of Google's move to HTTPS across the board. Implications discussed here.
HTTP RFC says referrers shouldn't be sent when going from HTTPS to HTTP. Not sure if HTTPS to HTTPS will work either. See discussion here.
I am writing a crawler. Once the crawler logs into a website, I want to make it "stay always logged in". How can I do that? Can a client (like a browser, crawler, etc.) make a server obey this rule? This scenario could occur when the server allows only a limited number of logins per day.
"Logged-in state" is usually represented by cookies. So what your have to do is to store the cookie information sent by that server on login, then send that cookie with each of your subsequent requests (as noted by Aiden Bell in his message, thx).
See also this question:
How to "keep-alive" with cookielib and httplib in python?
A more comprehensive article on how to implement it:
http://www.voidspace.org.uk/python/articles/cookielib.shtml
The simplest examples are at the bottom of this manual page:
https://docs.python.org/library/cookielib.html
You can also use a regular browser (like Firefox) to log in manually. Then you'll be able to save the cookie from that browser and use that in your crawler. But such cookies are usually valid only for a limited time, so it is not a long-term fully automated solution. It can be quite handy for downloading contents from a Web site once, however.
UPDATE:
I've just found another interesting tool in a recent question:
http://www.scrapy.org
It can also do such cookie based login:
http://doc.scrapy.org/topics/request-response.html#topics-request-response-ref-request-userlogin
The question I mentioned is here:
Scrapy domain_name for spider
Hope this helps.