Can't get HTML code through HttpWebRequest - c#

I am trying to parse the HTML code of the page at http://odds.bestbetting.com/horse-racing/today in order to have a list of races, etc.
The problem is that I am not able to retrieve the HTML code of the page. Here is the C# code of the function:
public static string Http(string url) {
    Uri myUri = new Uri(url);
    // Create a 'HttpWebRequest' object for the specified url.
    HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(myUri);
    myHttpWebRequest.AllowAutoRedirect = true;
    // Send the request and wait for response.
    HttpWebResponse myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse();
    var stream = myHttpWebResponse.GetResponseStream();
    var reader = new StreamReader(stream);
    var html = reader.ReadToEnd();
    // Release resources of response object.
    myHttpWebResponse.Close();
    return html;
}
When I execute the program and call the function, it throws an exception on
HttpWebResponse myHttpWebResponse =
(HttpWebResponse)myHttpWebRequest.GetResponse();
which is:
Cannot handle redirect from HTTP/HTTPS protocols to other dissimilar ones.
I have read this question but I don't seem to have the same problem.
I've also tried figuring something out by sniffing the traffic with Fiddler, but I can't see where it redirects or anything similar. I have extracted these two possible redirections: odds.bestbetting.com/horse-racing/2011-06-10/byCourse
and odds.bestbetting.com/horse-racing/2011-06-10/byTime, but querying them produces the same result as above.
It's not the first time I've done something like this, but I'm really lost on this one. Any help?
Thanks!

I finally found the solution... it was indeed a problem with the headers, specifically the User-Agent one.
After lots of searching I found someone with the same problem on the same site. Although his code was different, the important bit was that he set the UserAgent property of the request manually to that of a browser. I think I had tried this before, but I may have done it badly... sorry.
The final code, if it is of interest to anyone, is this:
public static string Http(string url) {
    if (url.Length > 0)
    {
        Uri myUri = new Uri(url);
        // Create a 'HttpWebRequest' object for the specified url.
        HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(myUri);
        // Set the user agent as if we were a web browser
        myHttpWebRequest.UserAgent = @"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4";
        HttpWebResponse myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse();
        var stream = myHttpWebResponse.GetResponseStream();
        var reader = new StreamReader(stream);
        var html = reader.ReadToEnd();
        // Release resources of response object.
        myHttpWebResponse.Close();
        return html;
    }
    else { return "NO URL"; }
}
Thank you very much for helping.

There can be a dozen probable causes for your problem.
One of them is that the redirect from the server is pointing to an FTP site, or something like that.
It could also be that the server requires some headers in the request that you're failing to provide.
Check what a browser would send to the site and try to replicate it, as in the sketch below.
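For example, a minimal sketch of replicating browser-like headers on the request (the header values here are assumptions taken from a typical browser, not captured from this site):
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://odds.bestbetting.com/horse-racing/today");
// Assumed browser-like values; capture the real ones with Fiddler and adjust.
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0";
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
request.Headers["Accept-Language"] = "en-US,en;q=0.5";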

Related

.NET service responds 500 internal error and "missing parameter" to HttpWebRequest POSTS but test form works fine

I am using a simple .NET service (asmx) that works fine when invoked via the test form (POST). When invoking it via an HttpWebRequest object, I get a WebException: "System.Net.WebException: The remote server returned an error: (500) Internal Server Error." Digging deeper, reading WebException.Response.GetResponseStream(), I get the message "Missing parameter: serviceType.", but I've clearly included this parameter.
I'm at a loss here, and it's worse that I don't have access to debug the service itself.
Here is the code being used to make the request:
string postData = String.Format("serviceType={0}&SaleID={1}&Zip={2}", request.service, request.saleId, request.postalCode);
byte[] data = (new ASCIIEncoding()).GetBytes(postData);
HttpWebRequest httpWebRequest = (HttpWebRequest)WebRequest.Create(url);
httpWebRequest.Timeout = 60000;
httpWebRequest.Method = "POST";
httpWebRequest.ContentType = "application/x-www-form-urlencoded";
httpWebRequest.ContentLength = data.Length;
using (Stream newStream = httpWebRequest.GetRequestStream())
{
    newStream.Write(data, 0, data.Length);
}
try
{
    using (HttpWebResponse response = (HttpWebResponse)httpWebRequest.GetResponse())
    {
        if (response.StatusCode != HttpStatusCode.OK)
            throw new Exception("There was an error with the shipping freight service.");
        string responseData;
        // Read the response we already have rather than issuing a second request.
        using (StreamReader responseStream = new StreamReader(response.GetResponseStream(), System.Text.Encoding.GetEncoding("iso-8859-1")))
        {
            responseData = responseStream.ReadToEnd();
        }
        if (string.IsNullOrEmpty(responseData))
            throw new Exception("There was an error with the shipping freight service. Request went through but response is empty.");
        XmlDocument providerResponse = new XmlDocument();
        providerResponse.LoadXml(responseData);
        return providerResponse;
    }
}
catch (WebException webExp)
{
    string exMessage = webExp.Message;
    if (webExp.Response != null)
    {
        using (StreamReader responseReader = new StreamReader(webExp.Response.GetResponseStream()))
        {
            exMessage = responseReader.ReadToEnd();
        }
    }
    throw new Exception(exMessage);
}
Anyone have an idea what could be happening?
Thanks.
UPDATE
Stepping through the debugger, I see the parameters are correct. I also see the parameters are correct in Fiddler.
Examining Fiddler, I see 2 requests each time this code executes. The first request is a POST that sends the parameters. It gets a 301 response code with a "Document Moved Object Moved This document may be found here" message. The second request is a GET to the same URL with no body. It gets a 500 server error with a "Missing parameter: serviceType." message.
It seems like you found your problem when you looked at the requests in Fiddler. Taking an excerpt from http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html:
10.3.2 301 Moved Permanently
The requested resource has been assigned a new permanent URI and any future references to this resource SHOULD use one of the returned URIs. Clients with link editing capabilities ought to automatically re-link references to the Request-URI to one or more of the new references returned by the server, where possible.
.....
Note: When automatically redirecting a POST request after receiving a 301 status code, some existing HTTP/1.0 user agents will erroneously change it into a GET request.
Here are a couple of options you can take:
Hard-code your program to use the new Url that you see in the 301 response in Fiddler.
Adjust your code to retrieve the 301 response, parse the new Url out of it, and build a new request with the new Url.
The latter option would be ideal if you're dealing with user-based input for the Url (like a web browser), since you don't know where the user is going to want your program to go. A sketch of the second option follows.
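A minimal sketch of that approach, assuming a form-encoded POST like the one above (the variable names url and data are illustrative, carried over from the question's code):
var request = (HttpWebRequest)WebRequest.Create(url);
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
// Keep the 301 instead of letting .NET turn the redirected POST into a GET.
request.AllowAutoRedirect = false;
using (var requestStream = request.GetRequestStream())
{
    requestStream.Write(data, 0, data.Length);
}
using (var response = (HttpWebResponse)request.GetResponse())
{
    if (response.StatusCode == HttpStatusCode.MovedPermanently)
    {
        string newUrl = response.Headers["Location"];
        // ...re-issue the same POST body against newUrl...
    }
}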

Unable to get HTML

I'm trying to open an adfoc.us/504....9 link with HttpWebRequest.
However, it gives me no HTML code.
try
{
    WebRequest req = WebRequest.Create(txtLink.Text);
    WebProxy wp = new WebProxy(proxies[0]);
    //req.Proxy = wp;
    WebResponse wr = req.GetResponse();
    StreamReader sr = new StreamReader(wr.GetResponseStream());
    string content = sr.ReadToEnd();
    MessageBox.Show(content);
    sr.Close();
}
catch (UriFormatException)
{
    MessageBox.Show("URL should be in this format:\nhttp://www.google.com");
    return;
}
If I use a website like google.com, I get a message box with the Google HTML source.
If I use the adfoc.us/50.... link, I get an empty string.
Where could the problem be?
Thank you.
EDIT: I resolved the problem by installing the GeckoFx component.
This is just a guess.
If you can open the link in your browser but not from your code, it could mean that adfoc.us blocks you because it can't find a User-Agent header. Try adding a User-Agent header that a browser uses, for example as sketched below.
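A minimal sketch against the code above (the User-Agent string is an assumption; any current browser's value should work):
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(txtLink.Text);
// Assumed browser User-Agent; adfoc.us may reject requests that lack one.
req.UserAgent = "Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0";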
Try this:
var req = (System.Net.HttpWebRequest)System.Net.WebRequest.Create("");
req.AllowAutoRedirect = true;
and you can manually set MaximumAutomaticRedirections.
When initializing the WebRequest, add the following:
req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
Seems like it doesn't like the default header. I got the above from a Firefox request header.

Asp.Net c# logging in to another website

I know this question has been asked quite a lot of times, which is how I have got to where I am with the code below; however, I just can't get it to work on the particular website I am trying to access. I need to retrieve certain values from the page, but things like price and availability only come up after logging in, so I am trying to submit my login information and then go to the product page to get the information I need using the HTML Agility Pack.
At the moment it seems to attempt the login, but the website is either not accepting it or the cookies are not present on the next page load to actually keep me logged in.
If someone could help me with this I would be very grateful, as I am not a programmer but have been assigned this task as part of a software installation.
protected void Button5_Click(object sender, System.EventArgs e)
{
    string LOGIN_URL = "http://www.videor.com/quicklogin/1/0/0/0/index.html";
    string SECRET_PAGE_URL = "http://www.videor.com/item/47/32/0/703/index.html?scriptMode=&CUSTOMERNO=xxx&USERNAME=xxx&activeTabId=0";
    // have a cookie container ready to receive the forms auth cookie
    CookieContainer cookies = new CookieContainer();
    // first, request the login form to get the viewstate value
    HttpWebRequest webRequest = WebRequest.Create(LOGIN_URL) as HttpWebRequest;
    webRequest.CookieContainer = cookies;
    StreamReader responseReader = new StreamReader(
        webRequest.GetResponse().GetResponseStream()
    );
    string responseData = responseReader.ReadToEnd();
    responseReader.Close();
    string postData = "CUSTOMERNO=xxxx&USERNAME=xxxxx&PASSWORD=xxxxx";
    // now post to the login form
    webRequest = WebRequest.Create(LOGIN_URL) as HttpWebRequest;
    webRequest.Method = "POST";
    webRequest.ContentType = "application/x-www-form-urlencoded";
    webRequest.CookieContainer = cookies;
    // write the form values into the request message
    StreamWriter requestWriter = new StreamWriter(webRequest.GetRequestStream());
    requestWriter.Write(postData);
    requestWriter.Close();
    // we don't need the contents of the response, just the cookie it issues
    webRequest.GetResponse().Close();
    // now we can send our cookie along with a request for the protected page
    webRequest = WebRequest.Create(SECRET_PAGE_URL) as HttpWebRequest;
    webRequest.CookieContainer = cookies;
    responseReader = new StreamReader(webRequest.GetResponse().GetResponseStream());
    // and read the response
    responseData = responseReader.ReadToEnd();
    responseReader.Close();
    Response.Write(responseData);
}
This isn't a direct answer, since I'm not sure what's wrong with your code (from a cursory glance it looks OK), but another approach is browser automation using Selenium. The following code will actually load the page using Chrome (you can swap in Firefox or IE) and is simpler to code against. It also won't break if they add JavaScript or something.
var driver = new ChromeDriver();
driver.Navigate().GoToUrl(LOGON_URL);
driver.FindElement(By.Id("UserName")).SendKeys("myuser");
driver.FindElement(By.Id("Password")).SendKeys("mypassword");
driver.FindElement(By.TagName("Form")).Submit();
driver.Navigate().GoToUrl(SECRET_PAGE_URL);
// And now the html can be found as driver.PageSource. You can also look for
// different elements and get their inner text and stuff as well.
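Since the question mentions the HTML Agility Pack, here is a hedged sketch of feeding Selenium's page source into it (the XPath selector is illustrative, not taken from the real page):
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(driver.PageSource);
// Illustrative selector; adjust it to the product page's real markup.
var priceNode = doc.DocumentNode.SelectSingleNode("//span[@class='price']");
string price = priceNode != null ? priceNode.InnerText : null;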

pass string from C# Windows Form Application to php webpage

How can I pass some data to a webpage from C#/.NET? I'm currently using this:
ProcessStartInfo p1 = new ProcessStartInfo("http://www.example.com","key=123");
Process.Start(p1);
but how can I access it from PHP? I tried:
<?php echo($_GET['key']); ?>
but it prints nothing.
Try passing it within the URL itself:
ProcessStartInfo p1 = new ProcessStartInfo("http://timepass.comule.com?key=123","");
Process.Start(p1);
You should put the key parameter in the query string:
ProcessStartInfo p1 = new ProcessStartInfo("http://timepass.comule.com?key=123");
I would suggest using the HttpWebRequest class:
http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.aspx
This way, you would also have the ability to post data to your page, add auth parameters, cookies, etc., in case you might need it.
I'm not sure if this matters in your particular setup, but passing data through the query string is not secure. If security is an issue, I would POST the data over an SSL connection.
Update:
So, if you POSTed data to your PHP page like so:
string dataToSend = "data=" + HttpUtility.UrlEncode("this is your data string");
var dataBytes = System.Text.Encoding.UTF8.GetBytes(dataToSend);
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://localhost/yourpage.php");
req.ContentType = "application/x-www-form-urlencoded";
req.ContentLength = dataBytes.Length;
req.Method = "POST";
using (var stream = req.GetRequestStream())
{
    stream.Write(dataBytes, 0, dataBytes.Length);
}
// -- execute request and get response
HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
if (resp.StatusCode == HttpStatusCode.OK)
    Console.WriteLine("Hooray!");
you can retrieve it by using the following code in your PHP page:
echo $_POST["data"];
Update 2:
AFAIK, ProcessStartInfo/Process.Start() actually starts a process; in this case, I think it will start your browser. The second parameter is the command-line arguments. This information is used by programs so they know how to behave when started (hidden, open a default document, etc.). It's not related to the query string in any way. If you prefer to use Process.Start(), then try something like this:
ProcessStartInfo p1 = new ProcessStartInfo("iexplore","http://google.com?q=test");
Process.Start(p1);
If you run that, it will open Internet Explorer and open Google with "test" in the search box. If that were your page, you could access "q" by calling:
echo $_GET["q"];
In my applications I used a different method, i.e. using WebClient:
WebClient client1 = new WebClient();
string path = "dtscompleted.php"; // your php path
NameValueCollection formData = new NameValueCollection();
byte[] responseBytes2 = null;
formData.Add("key", "123");
try
{
    responseBytes2 = client1.UploadValues(path, "POST", formData);
}
catch (WebException web)
{
    //MessageBox.Show("Check network connection.\n" + web.Message);
}
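UploadValues returns the raw response bytes, so if you want to see what the PHP page printed you could decode them; a small sketch, assuming the page responds in UTF-8:
// Assumed encoding; match whatever the PHP page actually emits.
string phpResponse = System.Text.Encoding.UTF8.GetString(responseBytes2);
MessageBox.Show(phpResponse);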

Substitute for WebClient to Prevent Timeouts?

I'm working on a C# project that uses a public XML feed for calculations. I originally used XmlDocument.Load, but migrated to WebClient.DownloadString so I could include headers in my request. The feed I'm accessing usually responds quickly, but every now and again it fails to respond within the timeout period of the WebClient object, and I get an exception. Here's my code:
XmlDocument xmlDoc = new XmlDocument();
WebClient client = new WebClient();
client.Headers["User-Agent"] = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.202 Safari/535.1";
client.Headers["Accept"] = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
string data = client.DownloadString(/*URL*/);
xmlDoc.LoadXml(data);
I've read that you cannot change the timeout property of WebClient, and people who have this problem should use HttpWebRequest instead. Unfortunately, I don't know how to go about implementing this in a way that still allows me to use my headers AND send that result to xmlDoc. Due to the nature of this application, I don't care how long it takes to receive the data; I can handle alerting the user.
What is the best way to go about doing this?
You could use a WebClient derived class for this, which just adds the timeout you want for each fetch:
public class TimeoutWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        HttpWebRequest request = (HttpWebRequest)base.GetWebRequest(address);
        request.Timeout = 60000; // 1 minute timeout
        return request;
    }
}
If you use TimeoutWebClient instead of WebClient now, you get the timeout behavior that you want. If the custom headers you need are the same for each request, you could add those here as well and your calling code remains very clean.
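A sketch of the calling code with the derived client, reusing the headers from the question:
XmlDocument xmlDoc = new XmlDocument();
using (var client = new TimeoutWebClient())
{
    client.Headers["User-Agent"] = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.202 Safari/535.1";
    client.Headers["Accept"] = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
    string data = client.DownloadString(/*URL*/);
    xmlDoc.LoadXml(data);
}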
Alternatively, you can use HttpWebRequest directly, which exposes the Timeout property:
XmlDocument xmlDoc = new XmlDocument();
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(/*URL*/);
request.Timeout = 60000; // unlike WebClient, the timeout is settable here
// Fill in the headers; note that User-Agent and Accept must be set via the
// dedicated properties rather than the request.Headers collection.
// The response is presented as a stream. Wrap it in a StreamReader that
// xmlDoc.Load can accept.
xmlDoc.Load(new StreamReader(request.GetResponse().GetResponseStream()));
You could just catch the exception, then reissue the request. You might want to put some other logic in there to abort after a certain number of failed attempts; a sketch of that follows the code.
bool retry; // "continue" is a reserved word in C#, so use a different name
do
{
    retry = false;
    try
    {
        string data = client.DownloadString(/*URL*/);
    }
    catch (WebException e)
    {
        retry = true;
    }
} while (retry);
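A hedged variant that gives up after a fixed number of attempts (the cap of 3 is an arbitrary choice, and client is the WebClient from the question):
const int maxAttempts = 3; // arbitrary cap
string data = null;
for (int attempt = 1; attempt <= maxAttempts; attempt++)
{
    try
    {
        data = client.DownloadString(/*URL*/);
        break; // success
    }
    catch (WebException)
    {
        if (attempt == maxAttempts)
            throw; // give up and let the caller alert the user
    }
}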
