using HtmlWeb causes HttpWebRequest to timeout - c#

I'm using HtmlAgilityPack to load web pages and scrape the document contents. I have a number of URLs to load, and a few of them require gzip encoding, so I catch the exception thrown by HtmlWeb.Load(), check that it's a gzip encoding issue, and then load the page with HttpWebRequest instead. The first request made through HttpWebRequest succeeds, but every later attempt with HttpWebRequest times out.
Here's a cleaned up version of the code:
HtmlDocument doc = new HtmlDocument();
HtmlWeb web = new HtmlWeb();
try
{
doc = web.Load(uri);
Console.WriteLine("htmlweb and htmldocument success");
}
catch (ArgumentException ae)
{
Console.WriteLine("htmlweb and htmldocument not successful");
if (ae.Message.Contains("\'gzip\'"))
{
HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create(uri);
try
{
req.Headers[HttpRequestHeader.AcceptEncoding] = "gzip, deflate";
req.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
req.Method = "GET";
//req.UserAgent = "Mozilla/5.0 (Windows; U; MSIE 9.0; WIndows NT 9.0; en-US))";
string source;
req.KeepAlive = false;
//req.Timeout = 100000;
// On the second iteration we never get beyond this line
using (WebResponse webResponse = req.GetResponse())
{
using (HttpWebResponse httpWebResponse = webResponse as HttpWebResponse)
{
using (StreamReader reader = new StreamReader(httpWebResponse.GetResponseStream()))
{
source = reader.ReadToEnd();
}
}
}
req.Abort();
Console.WriteLine("httpwebresponse successful");
}
catch (WebException we)
{
Console.WriteLine("httpwebresponse not successful");
}
}
}
Is there some cleanup that I need to do, or is there something I'm forgetting?
Any help will be greatly appreciated.

I think I will have to load via WebRequest first instead of HtmlWeb, then inspect the response header for gzip and decompress as needed each time.
System.Net.HttpWebRequest req = (System.Net.HttpWebRequest)System.Net.HttpWebRequest.Create(uri);
//req.Headers[HttpRequestHeader.AcceptEncoding] = "gzip, deflate";
//req.AutomaticDecompression = System.Net.DecompressionMethods.Deflate | System.Net.DecompressionMethods.GZip;
//req.Method = "GET";
string source = String.Empty;
try
{
using (System.Net.WebResponse webResponse = req.GetResponse())
{
using (HttpWebResponse httpWebResponse = webResponse as HttpWebResponse)
{
StreamReader reader;
if (httpWebResponse.ContentEncoding.ToLower().Contains("gzip"))
{
reader = new StreamReader(new GZipStream(httpWebResponse.GetResponseStream(), CompressionMode.Decompress));
}
else if (httpWebResponse.ContentEncoding.ToLower().Contains("deflate"))
{
reader = new StreamReader(new DeflateStream(httpWebResponse.GetResponseStream(), CompressionMode.Decompress));
}
else
{
reader = new StreamReader(httpWebResponse.GetResponseStream());
}
source = reader.ReadToEnd();
}
}
req.Abort();
}
catch(Exception ex){
//received a 404 Error - apparently one of my links is now dead...
}
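The decoding step in the answer above can be exercised without a network connection. The following is a minimal sketch, not the poster's code: the class name ResponseBody and its helpers are hypothetical, and an in-memory gzip buffer stands in for a server response. It picks a decompressor from the Content-Encoding value and disposes the reader (and with it the underlying streams) once the body is read.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, for illustration only.
public static class ResponseBody
{
    // Wraps the raw stream in the decompressor named by Content-Encoding, if any.
    public static Stream Decode(Stream raw, string contentEncoding)
    {
        string encoding = contentEncoding ?? string.Empty;
        if (encoding.IndexOf("gzip", StringComparison.OrdinalIgnoreCase) >= 0)
            return new GZipStream(raw, CompressionMode.Decompress);
        if (encoding.IndexOf("deflate", StringComparison.OrdinalIgnoreCase) >= 0)
            return new DeflateStream(raw, CompressionMode.Decompress);
        return raw;
    }

    // Reads the whole body as text; disposing the reader also disposes the streams.
    public static string ReadAll(Stream raw, string contentEncoding)
    {
        using (var reader = new StreamReader(Decode(raw, contentEncoding), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Gzip-compresses text in memory so the sketch can run offline.
    public static byte[] Gzip(string text)
    {
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress))
            {
                byte[] bytes = Encoding.UTF8.GetBytes(text);
                gzip.Write(bytes, 0, bytes.Length);
            }
            return buffer.ToArray();
        }
    }

    public static void Main()
    {
        byte[] compressed = Gzip("<html><body>hello</body></html>");
        Console.WriteLine(ReadAll(new MemoryStream(compressed), "gzip"));
        Console.WriteLine(ReadAll(new MemoryStream(Encoding.UTF8.GetBytes("plain")), ""));
    }
}
```

With a real response, the call would presumably look like ResponseBody.ReadAll(httpWebResponse.GetResponseStream(), httpWebResponse.ContentEncoding), still inside the using block that owns the response.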

Related

Integration of Rightmove Real Time Data Feed (RTDF) asp.net

I am integrating the Rightmove Real Time Data Feed (RTDF) into my property site so that my properties are listed on the Rightmove website. I am using ASP.NET Web API to post data to the Rightmove listing service.
Rightmove has provided me with these SSL files: .p12, .pem and .jks. I have imported the .p12 certificate into the personal store on my local machine and I send it with my HTTP request
to the test API link provided by Rightmove.
I get the following error from the server:
The remote server returned an error: 403 forbidden.
I checked that my certificate is loaded successfully into the request. Below is my code:
public static string PostData(string data, string url)
{
String result = "";
try
{
byte[] bytebuffer = Encoding.UTF8.GetBytes(data);
HttpWebRequest objRequest = (HttpWebRequest)WebRequest.Create(url);
objRequest.Method = "POST";
objRequest.ContentLength = bytebuffer.Length;
objRequest.ContentType = "application/json";
objRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0";
objRequest.PreAuthenticate = true;
objRequest.Accept = "application/json";
objRequest.ClientCertificates.Add(CertificateHelper.GetRightmoveApiX509Certificate());
using (Stream stream = objRequest.GetRequestStream())
{
stream.Write(bytebuffer, 0, bytebuffer.Length);
stream.Close();
}
HttpWebResponse objResponse = (HttpWebResponse)objRequest.GetResponse();
using (StreamReader streamReader = new StreamReader(objResponse.GetResponseStream()))
{
result = streamReader.ReadToEnd();
// Close and clean up the StreamReader
streamReader.Close();
}
}
catch (Exception e)
{
result = "Exception: " + e.Message;
}
return result;
}
Please help me resolve the 403 Forbidden error.
Use the following.
I have tested it and it's working fine in my case.
// Grab Certificate
X509Certificate2 cert2 = new X509Certificate2(
AppDomain.CurrentDomain.BaseDirectory + "CertificateName.p12",
CertificatePasswordHere,
X509KeyStorageFlags.MachineKeySet);
var httpWebRequest = (HttpWebRequest)WebRequest.Create("https://adfapi.adftest.rightmove.com/v1/property/sendpropertydetails");
httpWebRequest.ContentType = "application/json";
httpWebRequest.Method = "POST";
httpWebRequest.ClientCertificates.Clear();
httpWebRequest.ClientCertificates.Add(cert2);
using (var streamWriter = new StreamWriter(httpWebRequest.GetRequestStream()))
{
streamWriter.Write(data);
streamWriter.Flush();
streamWriter.Close();
}
var httpResponse = (HttpWebResponse)httpWebRequest.GetResponse();
using (var streamReader = new StreamReader(httpResponse.GetResponseStream()))
{
var result = streamReader.ReadToEnd();
}

How to capture a response token sent by a REST API after a request?

I am working on consuming a REST API, and I am using basic authentication where the credentials are Base64-encoded, as follows:
private XmlDocument sendXMLRequest(string requestXml)
{
string destinationUrl = "https://serviceapi.testgroup.com/testtp/query";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(destinationUrl);
request.Headers["Authorization"] = "Basic " + Convert.ToBase64String(Encoding.Default.GetBytes("API_TEST_NR:Testnol1$"));
byte[] bytes;
bytes = System.Text.Encoding.ASCII.GetBytes(requestXml);
request.Method = "POST";
request.ContentLength = bytes.Length;
//request.Connection = "keep-alive";
request.ContentType = "text/xml";
request.KeepAlive = true;
request.Timeout = 2000;
request.MediaType = "text/xml";
Stream requestStream = request.GetRequestStream();
requestStream.Write(bytes, 0, bytes.Length);
requestStream.Close();
HttpWebResponse response;
Stream responseStream;
using (response = (HttpWebResponse)request.GetResponse())
{
if (response.StatusCode == HttpStatusCode.OK)
{
responseStream = response.GetResponseStream();
XmlReader reader = new XmlTextReader(responseStream);
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(reader);
try { reader.Close(); }
catch { }
try { responseStream.Close(); }
catch { }
try { response.Close(); }
catch { }
return xmlDoc;
}
}
try { response.Close(); }
catch { }
return null;
}
I'm fairly new to working with web APIs. According to the API documentation, the API responds with an access x-token after successful authorization, but I am not sure how to access or capture it from the HTTP headers.
What is a good way to achieve this?
This turned out to be easier than I thought: read the header by its name.
string xtoken= response.Headers["custom-header"];
Console.WriteLine(xtoken);
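The header lookup above can be tried offline as well. This is a small sketch under stated assumptions: the class AuthHeaders, the header name "x-token" and the credentials are all made up for illustration. It builds a Basic Authorization value and reads a header from a WebHeaderCollection, which is the type that response.Headers returns; lookups by name are case-insensitive, and a missing header yields null.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical helper, for illustration only.
public static class AuthHeaders
{
    // Builds the value of a Basic Authorization header from a user name and password.
    public static string Basic(string user, string password)
    {
        byte[] raw = Encoding.UTF8.GetBytes(user + ":" + password);
        return "Basic " + Convert.ToBase64String(raw);
    }

    // Returns the named header, or null when the collection does not contain it.
    public static string GetToken(WebHeaderCollection headers, string headerName)
    {
        return headers[headerName];
    }

    public static void Main()
    {
        Console.WriteLine(Basic("user", "pass"));

        // Stand-in for response.Headers; "x-token" is an assumed header name.
        var headers = new WebHeaderCollection();
        headers["x-token"] = "abc123";
        Console.WriteLine(GetToken(headers, "X-Token"));
    }
}
```

Checking the result for null before using it avoids a failure later on when the server rejects the credentials and sends no token.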
Try the approach below, which requests data using the WebRequest class. In most cases the WebRequest class is sufficient to receive data. However, if you need to set protocol-specific properties, you must cast the WebRequest to the protocol-specific type. For example, to access the HTTP-specific properties of HttpWebRequest, cast the WebRequest to an HttpWebRequest reference.
private XmlDocument GetRootLevelServiceDocument(
string serviceEndPoint, string oAuthToken)
{
XmlDocument xmlDoc = new XmlDocument();
HttpWebRequest request = CreateHttpRequest(serviceEndPoint,
oAuthToken);
using (HttpWebResponse response =
(HttpWebResponse)request.GetResponse())
{
using (XmlReader reader =
XmlReader.Create(response.GetResponseStream(),
new XmlReaderSettings() { CloseInput = true }))
{
xmlDoc.Load(reader);
string data = ReadResponse(response);
if (response.StatusCode != HttpStatusCode.OK)
{
LogMsg(string.Format("Error: {0}", data));
LogMsg(string.Format(
"Unexpected status code returned: {0}",
response.StatusCode));
}
}
}
return xmlDoc;
}

HttpWebRequest get 404 page only when using POST mode

First of all: I know this has been asked over 100 times, but most of those questions were caused either by timeout problems, by an incorrect URL, or by forgetting to close a stream (and believe me, I tried ALL the samples and none of them worked).
So, now to my question: in my Windows Phone app I'm using HttpWebRequest to POST some data to a PHP web service. That service should then save the data in some directories, but to simplify things, at the moment it only echoes "hello".
But when I use the following code, I always get a 404, complete with an Apache 404 HTML document. Therefore I think I can exclude the possibility of a timeout. It seems the request reaches the server, but for some reason a 404 is returned. What really surprises me is that if I use a GET request, everything works fine. So here is my code:
HttpWebRequest webRequest = (HttpWebRequest)HttpWebRequest.CreateHttp(server + "getfeaturedpicture.php?randomparameter="+ Environment.TickCount);
webRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0";
webRequest.Method = "POST";
webRequest.ContentType = "text/plain; charset=utf-8";
StreamWriter writer = new StreamWriter(await Task.Factory.FromAsync<Stream>(webRequest.BeginGetRequestStream, webRequest.EndGetRequestStream, null));
writer.Write(Encoding.UTF8.GetBytes("filter=" + Uri.EscapeDataString(filterML)));
writer.Close();
webRequest.BeginGetResponse(new AsyncCallback((res) =>
{
string strg = getResponseString(res);
Stator.mainPage.Dispatcher.BeginInvoke(() => { MessageBox.Show(strg); });
}), webRequest);
Although I don't think this is the reason, here's the source of getResponseString:
public static string getResponseString(IAsyncResult asyncResult)
{
HttpWebRequest webRequest = (HttpWebRequest)asyncResult.AsyncState;
HttpWebResponse webResponse;
try
{
webResponse = (HttpWebResponse)webRequest.EndGetResponse(asyncResult);
}
catch (WebException ex)
{
webResponse = ex.Response as HttpWebResponse;
}
MemoryStream tempStream = new MemoryStream();
webResponse.GetResponseStream().CopyTo(tempStream);
tempStream.Position = 0;
webResponse.Close();
return new StreamReader(tempStream).ReadToEnd();
}
This is tested code that works fine for a POST with a body. It may give you an idea.
public void testSend()
{
try
{
string url = "abc.com";
string str = "test";
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
req.Method = "POST";
req.ContentType = "text/plain; charset=utf-8";
req.BeginGetRequestStream(SendRequest, req);
}
catch (WebException)
{
}
}
//Get request stream and write body
private void SendRequest(IAsyncResult asyncResult)
{
string str = "test";
string Data = "data=" + str;
HttpWebRequest req = (HttpWebRequest)asyncResult.AsyncState;
byte[] postBytes = Encoding.UTF8.GetBytes(Data);
req.ContentType = "application/x-www-form-urlencoded";
req.ContentLength = postBytes.Length;
// Complete the BeginGetRequestStream call started in testSend
Stream requestStream = req.EndGetRequestStream(asyncResult);
requestStream.Write(postBytes, 0, postBytes.Length);
requestStream.Close();
req.BeginGetResponse(SendResponse, req);
}
//Get Response string
private void SendResponse(IAsyncResult asyncResult)
{
try
{
MemoryStream ms;
HttpWebRequest request = (HttpWebRequest)asyncResult.AsyncState;
HttpWebResponse response = (HttpWebResponse)request.EndGetResponse(asyncResult);
HttpWebResponse httpResponse = (HttpWebResponse)response;
string _responestring = string.Empty;
using (Stream data = response.GetResponseStream())
using (var reader = new StreamReader(data))
{
_responestring = reader.ReadToEnd();
}
}
catch (WebException)
{
}
}
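The body-writing step in the snippets above can be checked without a server. The sketch below is illustrative only, and FormBody with its methods is a hypothetical name. It encodes a form field to bytes and writes those bytes straight to a stream. It also shows a pitfall that is independent of the 404: passing a byte[] to StreamWriter.Write binds to the Write(object) overload, so the text "System.Byte[]" is sent instead of the payload.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, for illustration only.
public static class FormBody
{
    // Encodes one form field as UTF-8 bytes, escaping both name and value.
    public static byte[] Encode(string name, string value)
    {
        string pair = Uri.EscapeDataString(name) + "=" + Uri.EscapeDataString(value);
        return Encoding.UTF8.GetBytes(pair);
    }

    // Writes the encoded bytes directly to the (request) stream.
    public static void WriteTo(Stream requestStream, byte[] body)
    {
        requestStream.Write(body, 0, body.Length);
    }

    // Returns what a StreamWriter emits when it is handed a byte[].
    public static string WriterOutput(byte[] body)
    {
        var sink = new MemoryStream();
        using (var writer = new StreamWriter(sink))
        {
            writer.Write(body); // binds to Write(object), which calls ToString()
        }
        return Encoding.UTF8.GetString(sink.ToArray());
    }

    public static void Main()
    {
        byte[] body = Encode("filter", "a b&c");

        var sink = new MemoryStream();
        WriteTo(sink, body);
        Console.WriteLine(Encoding.UTF8.GetString(sink.ToArray()));

        Console.WriteLine(WriterOutput(body));
    }
}
```

Writing the string itself with StreamWriter.Write(string), or the bytes with Stream.Write, both put the intended payload on the wire.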
I would suggest using RestSharp for your POST requests on Windows Phone. I am building an app for a startup, and I faced lots of problems with code similar to yours. Here's an example of a POST request using RestSharp. Instead of three functions, it can be written more concisely, and the response can be handled in the same place. You can get RestSharp from NuGet.
RestRequest request = new RestRequest("your url", Method.POST);
request.AddParameter("key", value);
RestClient restClient = new RestClient();
restClient.ExecuteAsync(request, (response) =>
{
if (response.StatusCode == HttpStatusCode.OK)
{
StoryBoard2.Begin();
string result = response.Content;
if (result.Equals("success"))
message.Text = "Review submitted successfully!";
else
message.Text = "Review could not be submitted.";
indicator.IsRunning = false;
}
else
{
StoryBoard2.Begin();
message.Text = "Review could not be submitted.";
}
});
It turned out the problem was on the server side: I tried it on a friend's server and it worked fine there. I'll contact the hosting provider's support and provide details as soon as I get a response.

Scraping a dynamic page with cookies

I am trying to scrape this page for a set of zipcodes.
https://www.chase.com/mortgage/loan-officer/search-results.html#action-search;zipcode-11747;lastname-;language-
If you put that in your browser you will get results; however, trying to do the same in code fails.
First I tried
HttpWebRequest request = (HttpWebRequest)System.Net.WebRequest.Create(URI);
HttpWebResponse resp = (HttpWebResponse)request.GetResponse();
var sr = new System.IO.StreamReader(resp.GetResponseStream());
string page = sr.ReadToEnd().Trim();
Then I tried the code below, generated by a Fiddler plugin, but it didn't work either: no results are returned. What exactly am I missing?
private void MakeRequests()
{
HttpWebResponse response;
string responseText;
if (Request_www_chase_com(out response))
{
responseText = ReadResponse(response);
response.Close();
}
}
private static string ReadResponse(HttpWebResponse response)
{
using (Stream responseStream = response.GetResponseStream())
{
Stream streamToRead = responseStream;
if (response.ContentEncoding.ToLower().Contains("gzip"))
{
streamToRead = new GZipStream(streamToRead, CompressionMode.Decompress);
}
else if (response.ContentEncoding.ToLower().Contains("deflate"))
{
streamToRead = new DeflateStream(streamToRead, CompressionMode.Decompress);
}
using (StreamReader streamReader = new StreamReader(streamToRead, Encoding.UTF8))
{
return streamReader.ReadToEnd();
}
}
}
private bool Request_www_chase_com(out HttpWebResponse response)
{
response = null;
try
{
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("https://www.chase.com/mortgage/loan-officer/search-results.html");
request.KeepAlive = true;
request.Headers.Set(HttpRequestHeader.CacheControl, "max-age=0");
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
request.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.76 Safari/537.36";
request.Headers.Add("DNT", @"1");
request.Referer = "https://mail.google.com/mail/u/0/?shva=1";
request.Headers.Set(HttpRequestHeader.AcceptEncoding, "gzip,deflate,sdch");
request.Headers.Set(HttpRequestHeader.AcceptLanguage, "en-US,en;q=0.8");
request.Headers.Set(HttpRequestHeader.Cookie, @"v1st=3B46E5CCD302C2DE; marketlist=68|90|152|170|198; chasezip=zipcode=11577&county=Nassau&state=NY; ASP.NET_SessionId=kwybehscfioasswbl20wb14f; PC_1_0=n%3Dundefined|u%3Dundefined|l%3Dundefined|zip%3D11577|lastUpdate%3D2014-01-24|lastSent%3D2014-01-24|home%3Dpersonal|; SessionPersistence=CLICKSTREAMCLOUD%3A%3DvisitorId%3D%7CPROFILEDATA%3A%3D%7CSURFERINFO%3A%3Dbrowser%3DChrome%2COS%3DWindows%2Cresolution%3D1366x768%7C; fsr.s=%7B%22v2%22%3A-2%2C%22v1%22%3A1%2C%22rid%22%3A%22d464cf6-82273859-c860-572f-2944b%22%2C%22to%22%3A5%2C%22c%22%3A%22https%3A%2F%2Fwww.chase.com%2Fmortgage%2Floan-officer%2Fsearch-results.html%23action-search%3Bzipcode-11747%3Blastname-%3Blanguage-%22%2C%22pv%22%3A12%2C%22lc%22%3A%7B%22d18%22%3A%7B%22v%22%3A12%2C%22s%22%3Atrue%7D%7D%2C%22cd%22%3A18%2C%22sd%22%3A18%2C%22f%22%3A1390649574789%7D");
request.IfModifiedSince = DateTime.Parse("Fri, 24 Jan 2014 20:18:51 GMT");
response = (HttpWebResponse)request.GetResponse();
}
catch (WebException e)
{
if (e.Status == WebExceptionStatus.ProtocolError) response = (HttpWebResponse)e.Response;
else return false;
}
catch (Exception)
{
if (response != null) response.Close();
return false;
}
return true;
}
To make this work, you'd need to parse the HTML, then download and run the JavaScript. Instead of writing your own browser, use a Web Browser control to load the page, then scrape its inner HTML.
The page uses AJAX to create the results so all you will see in your response is the initial HTML
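One detail supports the answers above: the search parameters in that URL sit after the "#", and the fragment part of a URL is not sent to the server in an HTTP request, so only script running in the page can act on it. The sketch below is illustrative (FragmentDemo is a hypothetical name) and splits the URL from the question into the part a client would request and the fragment that stays local.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class FragmentDemo
{
    // Scheme, host, path and query: everything a client would actually request.
    public static string RequestedPart(string url)
    {
        return new Uri(url).GetLeftPart(UriPartial.Query);
    }

    // The fragment (including the leading '#'), which is handled client-side.
    public static string FragmentOf(string url)
    {
        return new Uri(url).Fragment;
    }

    public static void Main()
    {
        string url = "https://www.chase.com/mortgage/loan-officer/search-results.html"
                   + "#action-search;zipcode-11747;lastname-;language-";
        Console.WriteLine("Requested: " + RequestedPart(url));
        Console.WriteLine("Fragment:  " + FragmentOf(url));
    }
}
```

This is why the hand-written request and the Fiddler-generated one both receive the same initial HTML regardless of the zip code in the fragment.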

problem using proxy with HttpWebRequest in C#

I'm using this code to use a proxy with HttpWebRequest:
public string GetBoardPageResponse(string url, string proxy = "")
{
ServicePointManager.Expect100Continue = false;
HttpWebRequest request = HttpWebRequest.Create(url) as HttpWebRequest;
HttpWebResponse response = null;
WebProxy myProxy = new WebProxy(proxy);
request.Proxy = myProxy;
request.Timeout = 20000;
request.ReadWriteTimeout = 20000;
request.Accept = "*/*";
request.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (.NET CLR 3.5.30729)";
request.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip,deflate");
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
// SEND POST
Stream os = null;
StreamReader sr = null;
try
{
//post data
byte[] bytes = Encoding.ASCII.GetBytes(param);
if (param.Length > 0)
{
request.ContentLength = bytes.Length; //Count bytes to send
os = request.GetRequestStream();
os.Write(bytes, 0, bytes.Length); //Send it
}
// Get the response
HttpWebResponse webResponse;
using (webResponse = (HttpWebResponse)request.GetResponse())
if (webResponse == null)
return "";
sr = new StreamReader(webResponse.GetResponseStream(), Encoding.GetEncoding(webResponse.CharacterSet));
string encoding = webResponse.CharacterSet;
string data = sr.ReadToEnd().Trim();
return data;
}
catch (Exception ex)
{
return "";
}
finally
{
if (sr != null)
sr.Close();
if (response != null)
response.Close();
if (os != null)
os.Close();
}
}
This function works fine if I don't use a proxy server, but if I add any proxy it returns an empty result. If I use the same proxy with WebClient it works like a charm. I really have no idea what is blocking or breaking this.
Any ideas or help will be appreciated!
I just changed: using (webResponse = (HttpWebResponse)request.GetResponse())
to webResponse = (HttpWebResponse)request.GetResponse();
Rookie mistake.
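The reason that change matters is the scope of a brace-less using statement: it covers only the single statement that follows it, so the resource is disposed before the later lines run. The following sketch reproduces the shape of the bug offline, with a MemoryStream standing in for the response; UsingScope and its methods are hypothetical names. The catch-all handler mirrors the original function, which is why the failure surfaced as an empty string rather than an exception.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo, for illustration only.
public static class UsingScope
{
    // Same shape as the original: the using covers only the 'if', so the
    // stream is already disposed when the reader is created.
    public static string ReadBroken(Func<Stream> open)
    {
        Stream stream;
        try
        {
            using (stream = open())
                if (stream == null)
                    return "";
            // The stream was disposed at the end of the using statement above.
            using (var reader = new StreamReader(stream))
                return reader.ReadToEnd();
        }
        catch (Exception)
        {
            return ""; // swallow-everything handler, as in the original code
        }
    }

    // Braces keep the stream alive until it has been read.
    public static string ReadFixed(Func<Stream> open)
    {
        using (Stream stream = open())
        {
            if (stream == null)
                return "";
            using (var reader = new StreamReader(stream))
                return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        byte[] payload = Encoding.UTF8.GetBytes("hello");
        Console.WriteLine("broken: '" + ReadBroken(() => new MemoryStream(payload)) + "'");
        Console.WriteLine("fixed:  '" + ReadFixed(() => new MemoryStream(payload)) + "'");
    }
}
```

Dropping the using, as in the fix above, works because the finally block still closes the reader; wrapping the whole read in a braced using is an alternative that keeps the disposal automatic.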
