Detecting 302 Redirect - c#

I'm trying to check the redirect location of a URL but always get the wrong result. For example, the URL http://www.yellowpages.com.eg/Mjg3NF9VUkxfMTEwX2h0dHA6Ly93d3cubG90dXMtYWlyLmNvbV8=/Lotus-Air/profile.html redirects to http://www.lotus-air.com with a 302 Found redirect (you can test it with this service: http://www.internetofficer.com/seo-tool/redirect-check/). However, I am getting "http://mobile.yellowpages.com.eg/" from webResp.GetResponseHeader("Location"). My code is as follows:
string url = @"http://www.yellowpages.com.eg/Mjg3NF9VUkxfMTEwX2h0dHA6Ly93d3cubG90dXMtYWlyLmNvbV8=/Lotus-Air/profile.html";
HttpWebRequest webReq = WebRequest.Create(url) as HttpWebRequest;
webReq.Method = "HEAD";
webReq.AllowAutoRedirect = false;
HttpWebResponse webResp = webReq.GetResponse() as HttpWebResponse;
txtOutput.Text += webResp.StatusCode.ToString() + "\r\n" ;
txtOutput.Text += webResp.GetResponseHeader("Location") + "\r\n";
txtOutput.Text += webResp.ResponseUri.ToString();
webResp.Close();
Thanks.
Yehia

They are probably sending different redirects based on the user agent, so you get one result in a browser and another in your code.

You could use an HTTP debugging proxy to see the headers moving back and forth; it also lets you change your user agent to test Ben's theory (I +1'd that).
A good one is Fiddler (Web Debugging Proxy), which is free and easy to use.
The screenshot below shows me changing the user agent to an old IEMobile one, "Mozilla/4.0 (compatible; MSIE 6.0; Windows CE; IEMobile 6.12; en-US; KIN.Two 1.0)", which redirects me to mobile.yellowpages.com.eg.
N.B. Changing to an iPad user agent takes you to iphone.yellowpages.com.eg.

As Ben pointed out, it redirects based on the user agent. Just set a user agent (this one is for Chrome):
webReq.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13";
For me it redirects to http://www.lotus-air.com.
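To compare what the server returns for different user agents, the check from the question can be wrapped in a small helper. This is a minimal sketch, not a definitive implementation: the class name RedirectProbe and its method names are made up for illustration, and it assumes the server answers HEAD requests the same way it answers GET.

```csharp
using System;
using System.Net;

public static class RedirectProbe
{
    // Resolves a Location header value (absolute or relative) against the
    // URI that was requested. Returns null when there is no Location value.
    public static Uri ResolveLocation(Uri requestUri, string location)
    {
        return string.IsNullOrEmpty(location) ? null : new Uri(requestUri, location);
    }

    // Sends a single HEAD request without following redirects and returns
    // the redirect target, or null if the response is not a 3xx redirect.
    public static Uri GetRedirectTarget(string url, string userAgent)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "HEAD";
        request.AllowAutoRedirect = false; // we want to see the 302 itself
        request.UserAgent = userAgent;     // null sends no User-Agent header

        using (var response = (HttpWebResponse)request.GetResponse())
        {
            int code = (int)response.StatusCode;
            if (code < 300 || code > 399)
                return null;
            return ResolveLocation(request.RequestUri, response.Headers["Location"]);
        }
    }
}
```

Calling GetRedirectTarget once with null and once with a desktop browser user-agent string should show whether the two requests are sent to different locations.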

Related

Querystring being ignored

I'm writing an interface to scrape info from a service. The link is behind a login, so I keep a copy of the cookies and then attempt to loop through the pages to get stats for our users.
The URLs to hit are of the format https://domain.com/groups/members/1234 for the first page; each subsequent page appends ?page=X.
string vUrl = "https://domain.com/groups/members/1234";
if (pageNumber > 1) vUrl += "?page=" + (pageNumber).ToString();
HttpWebRequest groupsRequest = (HttpWebRequest)WebRequest.Create(vUrl);
groupsRequest.CookieContainer = new CookieContainer();
groupsRequest.CookieContainer.Add(cookies); //recover cookies First request
groupsRequest.Method = WebRequestMethods.Http.Get;
groupsRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36";
groupsRequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
groupsRequest.UseDefaultCredentials = true;
groupsRequest.AutomaticDecompression = DecompressionMethods.GZip;
groupsRequest.Headers.Add("Accept-Language", "en-US,en;q=0.8");
groupsRequest.Headers.Add("Cache-Control", "max-age=0");
HttpWebResponse getResponse = (HttpWebResponse)groupsRequest.GetResponse();
This works fine for the first page and I get back the data that I need, but on each subsequent pass the query string is ignored. Debugging at the last line shows that RequestUri.Query on the request is correct, but ResponseUri.Query on the response is blank. The effect is that page 1 data is always returned.
I've tried to mimic the request headers that I see via Inspect in Chrome, but I'm stuck. Help?
When you put the URL that is failing into a browser, does it work? Because it is a GET, the browser should make the same request and tell you whether it is working. If it does not work in the browser, then perhaps you are missing something besides the query string.
If it does work, then use Fiddler to find out exactly which headers, cookies, and query string values are being sent, to make 100% sure that you are sending the correct request. It could be that the query string alone is not enough information to get the data that you need.
If you still can't get it, then capture the request in Fiddler when you send it through the browser, use this Fiddler extension to turn the request into code, and see what's up.
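A blank query on the response URI suggests that the server redirected the request and the redirect target no longer carries ?page=X. A quick way to confirm this is to compare the two URIs after the call. The following is a minimal sketch; the PagedUrl class and its method names are hypothetical, and the base URL is the placeholder from the question.

```csharp
using System;

public static class PagedUrl
{
    // Builds the members URL, appending ?page=N for every page after the first.
    public static Uri Build(string baseUrl, int pageNumber)
    {
        var builder = new UriBuilder(baseUrl);
        if (pageNumber > 1)
            builder.Query = "page=" + pageNumber; // no leading '?', UriBuilder adds it
        return builder.Uri;
    }

    // True when the request had a query string but the URI that finally
    // answered has a different (for example empty) one, as after a redirect.
    public static bool QueryWasDropped(Uri requested, Uri responded)
    {
        return requested.Query.Length > 0 && requested.Query != responded.Query;
    }
}
```

With the code from the question, PagedUrl.QueryWasDropped(groupsRequest.RequestUri, getResponse.ResponseUri) returning true would point at a redirect. Setting AllowAutoRedirect to false on the request then lets you inspect the redirect response directly.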

Error 500 Web Request Can't Scrape WebSite

I have no problem accessing the website with a browser, but when I programmatically try to access the website for scraping, I get the following error.
The remote server returned an error: (500) Internal Server Error.
Here is the code I'm using.
using System.IO;
using System.Net;
string strURL1 = "http://www.covers.com/index.aspx";
WebRequest req = WebRequest.Create(strURL1);
// Get the stream from the returned web response
StreamReader stream = new StreamReader(req.GetResponse().GetResponseStream());
System.Text.StringBuilder sb = new System.Text.StringBuilder();
string strLine;
// Read the stream a line at a time and append each non-empty line
while ((strLine = stream.ReadLine()) != null)
{
    if (strLine.Length > 0)
        sb.Append(strLine + Environment.NewLine);
}
stream.Close();
This one has me stumped. TIA
It's the user agent.
Many sites like the one you're attempting to scrape validate the user agent string in an attempt to stop you from scraping them. As it has with you, this quickly stops junior programmers from attempting the scrape. It's not really a very solid way of stopping a scrape, but it stumps some people.
Setting the User-Agent string will work. Change the code to:
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(strURL1);
req.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"; // Chrome user agent string
...and it will be fine.
It looks like it's doing some sort of user-agent checking. I was able to replicate your problem in PowerShell, but I noticed that the PowerShell cmdlet Invoke-WebRequest was working fine.
So I hooked up Fiddler, reran it, and took the user-agent string out of Fiddler.
Try setting the UserAgent property to the value of that header (without the "User-Agent:" prefix):
Mozilla/5.0 (Windows NT; Windows NT 6.2; en-US) WindowsPowerShell/4.0
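When a request fails like this, it helps to see exactly what came back before changing headers. The sketch below wraps the download in a try/catch and reports the HTTP status carried by the WebException. The ScrapeHelper class and its method names are made up for illustration; which user-agent string a particular site accepts is something you have to find out by testing.

```csharp
using System;
using System.IO;
using System.Net;

public static class ScrapeHelper
{
    // Summarizes a failed request: the HTTP status if the server answered,
    // otherwise the transport-level status (timeout, DNS failure, ...).
    public static string DescribeFailure(WebException ex)
    {
        var http = ex.Response as HttpWebResponse;
        if (http == null)
            return "No HTTP response: " + ex.Status;
        return "HTTP " + (int)http.StatusCode + " " + http.StatusDescription;
    }

    // Downloads a page with the given User-Agent and logs why it failed, if it did.
    public static string Download(string url, string userAgent)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = userAgent;
        try
        {
            using (var response = request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                return reader.ReadToEnd();
            }
        }
        catch (WebException ex)
        {
            Console.Error.WriteLine(DescribeFailure(ex));
            throw; // keep the original exception for the caller
        }
    }
}
```

Calling Download once with null and once with a browser user-agent string makes it easy to see whether the 500 depends on that header.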

C# WebClient cannot get a response over HTTPS

When I try to load HTML from a server over HTTPS, it returns error code 500, but when I open the same link in a browser it works fine. Is there any way to do this? I'm using WebClient and also sending user-agent information to the server:
HttpWebRequest req1 = (HttpWebRequest)WebRequest.Create("https://mobile.unibet.com/");
req1.UserAgent = @"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5";
var response1 = req1.GetResponse();
var responsestream1 = response1.GetResponseStream();
David is correct; this generally happens when the server expects some headers that are not being sent, in your case Accept.
This code works now:
string requestUrl = "https://mobile.unibet.com/unibet_index.t";
var request = (HttpWebRequest)WebRequest.Create(requestUrl);
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
request.UserAgent = "//Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
using (var response = request.GetResponse() as HttpWebResponse)
{
    using (var sr = new StreamReader(response.GetResponseStream()))
    {
        var responsestring = sr.ReadToEnd();
        if (!string.IsNullOrEmpty(responsestring))
        {
            Console.WriteLine(responsestring);
        }
    }
}
This should probably be a comment but there's not enough room in the comment for all the questions... I don't think the question has enough information to answer with any level of confidence.
A 500 error means a problem at the server. The short answer is that the browser is sending some content that the WebClient is not.
The WebClient may not be sending headers that the server expects. Does the server require authentication? Is this a page belonging to a company that you've contracted with, which perhaps provided you with credentials or an API key? Do you need to add HTTP Authorization?
If this is something you're doing with a company that you've got a partnership with, you should be able to ask them to help trace why you're getting a 500 error. Otherwise, you may need to provide us with a code sample and more details so we can offer more suggestions.

Website not returning cookies

I am trying to log into a website to send SMS via a Windows Phone 7 app. I have two providers working, but when I try Vodafone I run into an error.
From what I gather, the response does not contain cookies, or they are not being read. The request logs in OK and the response I get back is the correct page, but it contains no cookies.
The Url:
RequestUrl = String.Format("https://www.vodafone.ie/myv/services/login/Login.shtml?username={0}&password={1}", userSettings.Username, userSettings.Password),
The Request:
Request = (HttpWebRequest)WebRequest.Create((requestCollection.CurrentRequest().RequestUrl));
if (Request.CookieContainer == null)
{
    Request.CookieContainer = cookieJar.CookieContainer;
    Request.AllowAutoRedirect = true;
    Request.AllowReadStreamBuffering = true;
    Request.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6";
}
Where the code errors as the response cookies could not be evaluated:
public void AddCookiesToContainer(HttpWebResponse response)
{
    CookieCollection.Add(response.Cookies);
    CookieContainer.Add(response.ResponseUri, CookieCollection);
}
And below is the debugger showing no cookies :(
Which line of the code has the error?
Have you verified that the service does return cookies? (i.e. If you make the same request from a PC)
Edit:
The remote host returns cookies in its redirect to the index page, but the response for that page itself contains no cookies. This would explain why there are no cookies in the collection when you try to use it.
Verify this behaviour against a PC client and inspect the body of the response from index.jsp, as it may contain information to help you debug. Also check the documentation on how the process is supposed to work.
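If the cookies are set on the redirect rather than on the final page, they may already be in the CookieContainer that was attached to the request, even though response.Cookies on the final response is empty. On desktop .NET the container collects cookies from every response in the redirect chain; whether Windows Phone 7 behaves the same way is an assumption you would need to verify. A minimal sketch for checking what the container holds (the CookieJarInspector class is hypothetical):

```csharp
using System;
using System.Net;

public static class CookieJarInspector
{
    // Returns the "name=value" pairs the container would send to the given URL.
    public static string[] CookiesFor(CookieContainer jar, string url)
    {
        CookieCollection matches = jar.GetCookies(new Uri(url));
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            result[i] = matches[i].Name + "=" + matches[i].Value;
        return result;
    }
}
```

After the login request completes, calling CookiesFor(cookieJar.CookieContainer, "https://www.vodafone.ie/") shows whether the session cookies were captured, without going through AddCookiesToContainer.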

How to emulate XHttpRequest in c#

I need to access a service from a Windows client. It can be called by an AJAX GET request and returns XML.
If I use HttpWebRequest request = HttpWebRequest.Create...
for example with the URL http://site.com/UtilBillAjaxServlet?event=GET_PAMENT_CENT_DUE&SERVICEPROIDER=providername&SERVICETYPE=BROADBAND&CONSUMERNUMBER=195100601
it returns a 0-length response (in a browser it returns the correct response).
I think the problem is that the server detects the query as a non-XMLHttpRequest query (is there any difference?).
Thank you.
You should use Fiddler or another sniffer to trace that.
But for doing what you want just use the following:
http://support.microsoft.com/default.aspx/kb/307023
It's possible that the service only responds to requests coming from a browser; I'd find that a little strange, but not unheard of.
However, if that is the case you can emulate a browser request:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(yourUri);
// Pretend to be IE6!
request.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; windows NT 5.1)";
request.Method = "GET";
request.AllowAutoRedirect = true;
request.KeepAlive = true;
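To the question of whether there is any difference: many JavaScript libraries add an X-Requested-With: XMLHttpRequest header to AJAX calls, and some servers check for it. Whether this particular servlet does is an assumption; a capture in Fiddler would confirm which headers the browser really sends. A minimal sketch that adds the header (the AjaxRequestFactory class is made up for illustration):

```csharp
using System;
using System.Net;

public static class AjaxRequestFactory
{
    // Creates a GET request carrying the headers a typical AJAX call sends.
    // Nothing is sent until GetResponse() is called on the returned request.
    public static HttpWebRequest Create(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.Accept = "application/xml"; // the service is said to return XML
        request.Headers["X-Requested-With"] = "XMLHttpRequest";
        return request;
    }
}
```

The request is then used like any other: call GetResponse() and read the stream. If the response is still empty, compare the remaining headers and cookies with what the browser sends.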
