How do websites find out which browser is visiting them?
How can I do this?
Can you give an answer for ASP.NET/C#?
They look for the user agent passed in the request.
In ASP.NET:
Request.ServerVariables["HTTP_USER_AGENT"]
The browser tells the server what kind of browser it is in the User-Agent string, which it includes with each HTTP request.
You can access the User-Agent directly and parse it yourself, or you can use ASP.NET's built-in browser capabilities feature, which relies on several *.browser files, regular expressions, etc.
User-Agent: <%= Request.UserAgent %>
ID: <%= Request.Browser.Id %>
Browser: <%= Request.Browser.Browser %>
Type: <%= Request.Browser.Capabilities["type"] %>
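The same information is available from code-behind. A minimal sketch (the old-IE fallback is just an illustrative use, and basic.aspx is a hypothetical page):

protected void Page_Load(object sender, EventArgs e)
{
    // Raw header value, exactly as the browser sent it.
    string rawAgent = Request.UserAgent;

    // Parsed capabilities, resolved via the *.browser files.
    HttpBrowserCapabilities caps = Request.Browser;
    if (caps.Browser == "IE" && caps.MajorVersion < 8)
    {
        Response.Redirect("~/basic.aspx");   // hypothetical fallback page
    }
}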
The HTTP protocol provides an attribute of the request header called the User-Agent, which the client (here, the web browser) fills in with a string identifying the browser make, version, and operating system. Like all elements of the HTTP header, this information may well be "spoofed" or altered for various purposes (for example by various client-side privacy gateways and such), but it is usually relatively reliable.
An example of such a User-Agent string is (here for a Firefox browser, version 3.5, running under Windows XP):
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5
This information, along with other attributes from the header, can be queried by the receiving application. Although the specifics vary from one language/framework to the next, many of these languages/frameworks expose a simple object model which mirrors the various objects associated with the HTTP protocol. In the case of the HTTP header, this typically comes from the "Request" (may be named differently) object, so accessing the User-Agent may look something like:
ClientBrowser = Request.Header("User-Agent")
or possibly
ClientBrowser = HttpHeader.UserAgent
Edit: In the case of C#/ASP.NET (late edit of question):
ClientBrowser = Request.ServerVariables["HTTP_USER_AGENT"]
Also, although you may be tempted to use this information directly, you can also rely on various libraries which encapsulate the details of parsing the [very many versions of the] User-Agent string, to figure out the particular web browser and even the particular forms of JavaScript such a client should be sent.
I am trying, in C#, to screen-scrape two airline sites so I can compare the two fares over many different dates. I managed to do it on qua.com, but when I try it on amadeus.net, that site gives me a response of
older browser not supported
So using the WebBrowser class doesn't work, and using HttpWebRequest doesn't work either.
So I want to use WebClient, but because amadeus.net is heavily based on JS or something, I do not know where to post the URL.
Any suggestions?
Edit: WebClient.DownloadString also doesn't work.
Try using the Navigate overload that takes additional headers to pass a user agent:
string useragent = "Mozilla/5.0 (Windows NT 6.0; rv:39.0) Gecko/20100101 Firefox/39.0";
webBrowser.Navigate(url, null, null, "User-Agent: " + useragent + "\r\n");
An alternative is to use another web browser control, such as Awesomium.
After looking into passing a fake user agent (from Jodrell) in HttpWebRequest: this works, but I had to deal with cookies, so that can get complicated.
Graffito suggested overriding the user agent within a WebBrowser, but that didn't work, as it gave me lots of JS loading errors; this is because the website itself requires a proper modern browser in order to work.
I found out that my IE itself was version 9, so I upgraded it to IE 11. Then I tried Graffito's solution again, but that didn't work either.
So in the end I thought I might as well update the WebBrowser control to the correct version by following this article.
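For reference, the usual form of that fix is the FEATURE_BROWSER_EMULATION registry key, which tells the WebBrowser control to render with the installed IE engine instead of the IE 7 default. A minimal sketch, assuming IE 11 is installed (11001 = IE 11 edge mode; run this before creating the first WebBrowser control):

using Microsoft.Win32;

string exeName = System.IO.Path.GetFileName(
    System.Diagnostics.Process.GetCurrentProcess().MainModule.FileName);
using (RegistryKey key = Registry.CurrentUser.CreateSubKey(
    @"Software\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION"))
{
    // The value name must match the host process's exe name.
    key.SetValue(exeName, 11001, RegistryValueKind.DWord);
}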
I have a problem with a certain site: I am provided with a list of product ID numbers (about 2000), and my job is to pull data from the producer's site. I already tried forming the URLs of the product pages, but there are some unknown variables that I can't fill in to get results. However, there is a search field, so I can use a URL like this: http://www.hansgrohe.de/suche.htm?searchtext=10117000&searchSubmit=Suchen - the problem is that the page displays some info (probably JavaScript) and then redirects straight to the desired page - the one that I need to pull data from.
Is there any way of tracking this redirection?
I would like to post some of my code, but everything I have so far is unhelpful, because it just downloads the source of the page before the redirect.
public static string Download(string uri)
{
    // Plain WebClient fetch: returns the raw page source, i.e. the page
    // before any client-side (JavaScript) redirect has a chance to run.
    WebClient client = new WebClient();
    client.Encoding = Encoding.UTF8;
    client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
    string s = client.DownloadString(uri);
    return s;
}
Also, the suggested answer is not helpful in this case, because the redirection doesn't come from the HTTP response - the page is redirected after a few seconds of loading the http://www.hansgrohe.de/suche.htm?searchtext=10117000&searchSubmit=Suchen URL.
I just found a solution. And since I'm new and have to wait a few hours to answer my own question, it will end up here.
I hope that other users will find it useful:
webBrowser1.Navigate(url);
// Wait until the JavaScript redirect has moved the browser away from the search URL.
while (webBrowser1.Url == null || webBrowser1.Url.AbsoluteUri == url)
{
    Application.DoEvents();   // keep pumping messages so the WebBrowser can load
}
string desiredUri = webBrowser1.Url.AbsoluteUri;
Thanks for the answers.
Welcome to the wonderful world of page scraping. The short answer is "you can't do that." Not in the general case, anyway, and certainly not with WebClient. The problem appears to be that some Javascript does the redirection. And since all WebClient does is download the page, it's not even going to download the Javascript. Much less parse and execute it.
You might be able to do this by creating a program that uses the WebBrowser class. You can have it load the page. It should do the redirect and then you can inspect the result, which should be the page you were looking for. I haven't actually done this, but it does seem possible.
Your other option is to fire up your Web browser's developer tools (like IE's F12 Developer Tools) and watch what's happening. You can then inspect the Javascript that's being executed as well as the modified DOM, and see where the redirect happens.
Yes, it's tedious work. But once you figure out the redirect for one page, you can probably generate the URL for the other pages you want automatically.
I created a REST-based Web API in MVC 4 and hosted it on a server. When I call it from HTML on another domain, or on my local PC, I need to call it as a JSONP request, i.e. put callback=? in the URL so it can be JSONP. My question is: why is this so? If it's due to cross-domain restrictions, then how do Google, Facebook, and other companies host their APIs? We call those from our own domains too, and we don't put callback=? in their URLs.
So why does my API need callback=? in the URL if I call it from another domain, or on my local PC, with simple jQuery and HTML?
It's because of the Same Origin Policy imposed by the browsers.
See
http://www.w3.org/Security/wiki/Same_Origin_Policy
http://en.wikipedia.org/wiki/Same_origin_policy
Also note that CORS might be a better option than JSONP in the future
http://en.wikipedia.org/wiki/Cross-origin_resource_sharing
EDIT: ------------
If you have gone through the above links, you will have seen that JSONP allows you to work around this Same Origin Policy security measure imposed by the browsers.
The trick is that browsers do allow <script> tags to reference files in domains other than the origin.
Basically, what happens with JSONP is that you send a callback function name to the server, appended to the query string. The server will then pad, or prefix, its otherwise plain JSON response with a call to this function - hence the P in the name, to denote that the response is padded (prefixed).
For example, you can create a script tag like this (the path is illustrative):
<script src="http://www.ext.site.com/api?callback=mymethod"></script>
Then the target server should send a response such that
mymethod({normal: 'json response'})
When this response is evaluated on the client side (as for any other JavaScript file), it will effectively call your method with the JSON response from that server.
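On the server side the padding itself is trivial. A minimal sketch as a hypothetical ASP.NET generic handler (the handler name and payload are illustrative):

using System.Web;

public class JsonpHandler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        // The callback name the client appended to the query string.
        string callback = context.Request.QueryString["callback"];

        string json = "{normal: 'json response'}";

        // Pad the JSON with a call to the client's function.
        context.Response.ContentType = "application/javascript";
        context.Response.Write(callback + "(" + json + ")");
    }

    public bool IsReusable { get { return true; } }
}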
However, this can only do GET requests.
If you want to make POST (PUT/DELETE) requests, you need to use CORS, in which the server needs to set a specific header beforehand:
Access-Control-Allow-Origin: http://www.ext.site.com
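In ASP.NET, one simple (if blunt) way to emit that header on every response is from Global.asax. A sketch, reusing the example origin from above:

protected void Application_BeginRequest(object sender, EventArgs e)
{
    // Allow the external origin to read our responses cross-domain.
    HttpContext.Current.Response.AppendHeader(
        "Access-Control-Allow-Origin", "http://www.ext.site.com");
}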
Hope this helps.
Because of the same-origin policy limitations. The same-origin policy prevents a script loaded from one domain from getting or manipulating properties of a document from another domain. That is, the domain of the requested URL must be the same as the domain of the current Web page. This basically means that the browser isolates content from different origins to guard them against manipulation.
It happens that when I save a web page's source from IE, it differs from the source downloaded by HttpWebRequest in my C# app.
I have saved both files for reference. The one saved from IE is here and the one from HttpWebRequest is here.
They differ in formatting and in the content itself. It seems that the one downloaded by HttpWebRequest is broken and doesn't contain valid data (which is perfect when saved from IE).
I don't know why I cannot get a nicely formatted source like the one saved from IE.
Regards,
Mariusz
I suspect the one downloaded using IE has got some state associated with it from either cookies or session variables that were set when you visited the site manually. The one downloaded using C# will have the default values for everything, and hence different content.
This seems likely because the file_web file contains a section called "LastViewedHotels" with an entry for the Arora Manchester.
Additionally, it looks like there is dynamic content for displaying adverts, which is different between the two files.
Usually this happens when the site you are navigating to loads additional content via Ajax or frames.
To overcome this and always fetch the content IE sees, you can use the WebBrowser control to navigate and take the source from there.
Here is an example.
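A rough sketch of that approach (WinForms; the WebBrowser control needs an STA thread, and the DoEvents polling is simplistic but keeps the example short):

using System.Windows.Forms;

static string DownloadRenderedSource(string url)
{
    using (WebBrowser browser = new WebBrowser())
    {
        browser.ScriptErrorsSuppressed = true;
        bool done = false;
        browser.DocumentCompleted += (s, e) => done = true;
        browser.Navigate(url);

        // Pump messages until the document (including script-driven content)
        // has finished loading.
        while (!done)
        {
            Application.DoEvents();
        }

        // The source as IE sees it, after scripts have run.
        return browser.DocumentText;
    }
}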
Update
From running a KDiff on the sources you gave, it looks like there's 1 major line difference:
<link rel="alternate" type="text/html" hreflang="de"...
And that looks like it has an ID generated from a session (a cookie) so there's not much you can do about that without copying the IE cookie header.
Previous answer
"Under the hood", IE and HttpWebRequest both perform the same simple task, which is to send the following text request on port 80 via a a socket to the HTTP server:
GET / HTTP/1.1
(or 1.0 - and a host header too).
If you're on Windows you can try it out. Install the built-in Windows telnet client (Add/Remove Programs -> Windows Features), or PuTTY, connect to the server on port 80, and type:
GET / HTTP/1.1
Host: yahoo.com
followed by a blank line to end the request.
The source from this, from IE, and from the HttpWebRequest class will be exactly the same. The only difference will come if IE is passing cookies to the server, plus any extra headers, which normally include:
A user agent
Accept: */*
Accept-Encoding: gzip
Cookies (including session cookies - ones that expire when IE is closed)
For formatting, IE might turn tabs into spaces, or the other way around. The HttpWebRequest will return the raw results without any formatting.
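A sketch of making HttpWebRequest send those IE-like extras (the user-agent string is illustrative; capture the real headers with the F12 tools):

using System.IO;
using System.Net;

HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://example.com/");
req.UserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)";
req.Accept = "*/*";
// Matches the gzip header IE sends, and decompresses transparently.
req.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
// Holds cookies (including session cookies) across requests.
req.CookieContainer = new CookieContainer();

using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
using (StreamReader reader = new StreamReader(resp.GetResponseStream()))
{
    string html = reader.ReadToEnd();
}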
I use Process.Start("firefox.exe", "http://localhost/page.aspx");
And how i can know page fails or no?
OR
How to know via HttpWebRequest, HttpWebResponse page fails or not?
When i use
HttpWebRequest myReq = (HttpWebRequest)WebRequest.Create("somepage.aspx");
HttpWebResponse loWebResponse = (HttpWebResponse)myReq.GetResponse();
Console.Write("{0},{1}",loWebResponse.StatusCode, loWebResponse.StatusDescription);
how can I return error details?
I don't need additional plugins or frameworks - I want to solve this problem with .NET only.
Any ideas, please?
Use WatiN to automate Firefox instead of Process.Start. It's a browser automation framework that will let you monitor what is happening properly.
http://watin.sourceforge.net/
edit: see also Google WebDriver http://google-opensource.blogspot.com/2009/05/introducing-webdriver.html
If you are spawning a child-process, it is quite hard and you'd probably need to use each browser's specific API (it won't be the same between FF and IE, for example).
It doesn't help that in many cases the exe detects an existing instance and forwards the request there (so you can't trust the exit-code, since the page hasn't even been requested in the right exe yet).
Personally, I try to avoid assuming any particular browser for this scenario; just launch the url:
Process.Start("http://somesite.com");
This will use the user's default browser. You have to hope it appears though - you can't (reliably and robustly) check that externally without lots of work.
One other option is to read the data yourself (WebClient.Download*) - but this may have issues with complex cookies, login, user-agent awareness, etc.
Use the HttpWebRequest class or the WebClient class to check this. I don't think Process.Start will return anything if the URL doesn't exist.
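One detail that matters for the snippet in the question: GetResponse() throws a WebException for 4xx/5xx status codes, so the error details have to be read from the exception. A minimal sketch:

try
{
    HttpWebRequest myReq = (HttpWebRequest)WebRequest.Create("http://localhost/page.aspx");
    using (HttpWebResponse loWebResponse = (HttpWebResponse)myReq.GetResponse())
    {
        // Only success (2xx/3xx) responses reach this point.
        Console.Write("{0},{1}", loWebResponse.StatusCode, loWebResponse.StatusDescription);
    }
}
catch (WebException ex)
{
    HttpWebResponse error = ex.Response as HttpWebResponse;
    if (error != null)
        Console.Write("{0},{1}", error.StatusCode, error.StatusDescription);   // HTTP-level failure (404, 500, ...)
    else
        Console.Write(ex.Status);   // no response at all: DNS failure, timeout, connection refused
}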
Don't start the page in this form. Instead, create a local http://localhost:<port>/wrapper.html which loads http://localhost/page.aspx and then either http://localhost:<port>/pass.html or http://localhost:<port>/fail.html. localhost:<port> is a trivial HTTP server interface implemented by your app.
The idea is that Javascript gives you an API inside the browser, which is far more standard than the APIs on the outside of browsers. Since the Javascript on wrapper.html comes from the same server and even port as the subsequent resources, this should satisfy the same-origin policies in current browsers.
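A rough sketch of that trivial server side in .NET, using HttpListener (the port, file name, and verdict handling are all illustrative):

using System;
using System.IO;
using System.Net;

HttpListener listener = new HttpListener();
listener.Prefixes.Add("http://localhost:8080/");   // the <port> above
listener.Start();

while (true)
{
    HttpListenerContext ctx = listener.GetContext();
    string path = ctx.Request.Url.AbsolutePath;

    if (path == "/wrapper.html")
    {
        // Serve the wrapper page whose Javascript loads page.aspx and then
        // requests /pass.html or /fail.html depending on the outcome.
        byte[] page = File.ReadAllBytes("wrapper.html");
        ctx.Response.ContentType = "text/html";
        ctx.Response.OutputStream.Write(page, 0, page.Length);
    }
    else if (path == "/pass.html" || path == "/fail.html")
    {
        Console.WriteLine("page.aspx result: " + path);   // record the verdict
    }
    ctx.Response.Close();
}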