I need to use WebRequest to download the content of a web page into a string.
I can't use WebClient because it doesn't support certain HTTP headers that I need to set. I couldn't figure out the best practice for handling resource cleanup in this case (i.e. how to dispose of everything correctly). Is a using statement enough, or do I need to add some try/catch handling here as well?
This is my code so far:
var webRequest = (HttpWebRequest)WebRequest.Create("http://www.gooogle.com");
using (var webResponse = (HttpWebResponse)webRequest.GetResponse()) {
    using (var responseStream = webResponse.GetResponseStream()) {
        responseStream.ReadTimeout = 30;
        using (var streamReader = new StreamReader(responseStream, Encoding.UTF8)) {
            var page = streamReader.ReadToEnd();
        }
    }
    Console.WriteLine("Done");
}
Your code is fine (except, of course, that some exception handling would be nice). You don't need to worry about disposing or closing streams when you use using blocks; the compiler generates that cleanup code for you.
The best approach would be to wrap the code above in a function that returns the page and put a surrounding try/catch in there, for instance:
public string GetHtmlPage(string urlToFetch)
{
    string page = "";
    try
    {
        ... code ...
        return page;
    }
    catch (Exception exc)
    {
        throw new HtmlPageRetrievalException(exc);
    }
}
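For illustration only, a filled-in version might look roughly like the sketch below (HtmlPageRetrievalException is a placeholder type, as above; adjust the encoding and error handling to your needs):

public string GetHtmlPage(string urlToFetch)
{
    try
    {
        var webRequest = (HttpWebRequest)WebRequest.Create(urlToFetch);
        using (var webResponse = (HttpWebResponse)webRequest.GetResponse())
        using (var responseStream = webResponse.GetResponseStream())
        using (var streamReader = new StreamReader(responseStream, Encoding.UTF8))
        {
            // The using blocks dispose the response, stream and reader
            // even if ReadToEnd throws.
            return streamReader.ReadToEnd();
        }
    }
    catch (Exception exc)
    {
        throw new HtmlPageRetrievalException(exc);
    }
}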
I am downloading a PDF file using an HttpWebRequest object and writing the content from the response stream directly to a FileStream, using "using" blocks everywhere and also calling the .Close() methods right after the data is copied.
As the next step, I need to extract some text from that PDF file using a 3rd-party library (iText7), but it can't access the file.
At first I thought it was an iText7-related issue, but then I realized it doesn't seem so, because I can't even delete the file from File Explorer; I get a "file in use" error caused by my own app.
Here's the sample code:
HttpWebRequest webReq = (HttpWebRequest)HttpWebRequest.Create(url);
webReq.AllowAutoRedirect = true;
webReq.CookieContainer = Cookies;
webReq.UserAgent = UserAgent;
webReq.Referer = Referrer;
webReq.Method = WebRequestMethods.Http.Get;

using (HttpWebResponse response = (HttpWebResponse)webReq.GetResponse())
{
    using (Stream httpResponseStream = response.GetResponseStream())
    {
        using (FileStream output = File.Create(file1))
        {
            httpResponseStream.CopyTo(output);
            output.Close();
        }
        httpResponseStream.Close();
        response.Close();
        Cookies = webReq.CookieContainer;
    }
}
GC.Collect();
ExtractPDFDoc(file1); // the error is thrown in this function; the exception message is "Cannot open document."
Console.WriteLine("now waiting to let you check the file is in use? try delete it manually...");
Console.ReadKey(); // added this line to confirm that the file really is in use. I can't even delete the file manually from Windows File Explorer at this point. Interestingly, Acrobat Reader CAN open the file when I double-click it, which makes me think Adobe and iText7 use different methods to open the PDF file - but either way, I can't work around it.
Can you please help me figure out what is wrong here?
For those who want to see the ExtractPDFDoc() method:
public static object ExtractPDFDoc(string filename)
{
    iText.Kernel.Pdf.PdfReader pdfReader = null;
    iText.Kernel.Pdf.PdfDocument pdfDocument = null;
    try
    {
        pdfReader = new iText.Kernel.Pdf.PdfReader(filename);
        pdfDocument = new iText.Kernel.Pdf.PdfDocument(pdfReader);
    }
    catch (Exception ex)
    {
        pdfReader = null;
        pdfDocument = null;
        return new Exception(string.Format("ExtractPDFDoc() failed on file '{0}' with message '{1}'", filename, ex.Message));
        // this is where I get the error; ex.Message is 'Cannot open document.'
        // however, I can open it in Adobe Reader, but I can't delete it before closing my app.
    }
}
If I remember correctly, the iText objects are all IDisposable, so you should be sure to dispose of them as well. Also, I don't know why you're returning an exception instead of just throwing it.
public static object ExtractPDFDoc(string filename)
{
    iText.Kernel.Pdf.PdfReader pdfReader = null;
    iText.Kernel.Pdf.PdfDocument pdfDocument = null;
    try
    {
        pdfReader = new iText.Kernel.Pdf.PdfReader(filename);
        pdfDocument = new iText.Kernel.Pdf.PdfDocument(pdfReader);
    }
    catch (Exception ex)
    {
        throw new Exception(string.Format("ExtractPDFDoc() failed on file '{0}' with message '{1}'", filename, ex.Message), ex);
    }
    finally
    {
        pdfReader?.Dispose();
        pdfDocument?.Dispose();
    }
}
Unrelated to that, you can also stack your using statements instead of nesting them.
using (HttpWebResponse response = (HttpWebResponse)webReq.GetResponse())
using (Stream httpResponseStream = response.GetResponseStream())
using (FileStream output = File.Create(file1))
{
    // do stuff
}
I'm deeply sorry - thanks to #howcheng, I realized that it was iText7 that leaves the file open after it fails to open the document, because one of its dependency files was missing from the output folder.
It's clear that I should call .Close() on the iText7 objects when an exception occurs, to avoid misleading situations like this.
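For example, the catch block could release the handles before reporting the failure - a rough sketch only (it assumes the iText7 Close() methods on PdfDocument and PdfReader, and omits the actual text extraction):

catch (Exception ex)
{
    // Release whatever iText7 opened so the file handle is freed even on failure.
    if (pdfDocument != null) pdfDocument.Close();
    if (pdfReader != null) pdfReader.Close();
    throw new Exception(string.Format("ExtractPDFDoc() failed on file '{0}'", filename), ex);
}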
Thanks for all your help.
I'm making repeated requests to a web server using HttpWebRequest, but I randomly get a 'broken' response stream in return, e.g. it doesn't contain the tags that I KNOW are supposed to be there. If I request the same page multiple times in a row, it turns up 'broken' roughly 3 times out of 5.
The request always returns a 200 response, so at first I thought there was a null value inserted in the response that made the StreamReader think it had reached the end.
I've tried:
1) reading everything into a byte array and cleaning it
2) inserting a random Thread.Sleep after each request
Is there any potentially bad practice in my code below, or can anyone tell me why I'm randomly getting an incomplete response stream? As far as I can see I'm closing all unmanaged resources, so that shouldn't be a problem, right?
public string ReturnHtmlResponse(string url)
{
    string result;
    var request = (HttpWebRequest)WebRequest.Create(url);
    {
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            Console.WriteLine((int)response.StatusCode);
            var encoding = Encoding.GetEncoding(response.CharacterSet);
            using (var stream = response.GetResponseStream())
            {
                using (var sr = new StreamReader(stream, encoding))
                {
                    result = sr.ReadToEnd();
                }
            }
        }
    }
    return result;
}
I do not see any direct flaws in your code. It could be that one of the 'parent' using statements completes before the nested ones do. Try replacing the using statements with explicit Close() and Dispose() calls:
public string ReturnHtmlResponse(string url)
{
    string result;

    var request = (HttpWebRequest)WebRequest.Create(url);
    var response = (HttpWebResponse)request.GetResponse();

    Console.WriteLine((int)response.StatusCode);

    var encoding = Encoding.GetEncoding(response.CharacterSet);
    var stream = response.GetResponseStream();
    var sr = new StreamReader(stream, encoding);

    result = sr.ReadToEnd();

    sr.Close();
    stream.Close();
    response.Close();

    sr.Dispose();
    stream.Dispose();
    response.Dispose();

    return result;
}
I'm making a WinForms project in C#/C++ (the language could change depending on the best way to reach my goal). I need to get a page from a website and parse it to extract some information. I'm a complete beginner in web programming with Visual C#/C++, and all the answers I found here are too complicated for me. Could you tell me which standard classes I should use to fetch a page from the Internet, and how to parse it afterwards? I would be very pleased if you have any code examples, because as I wrote above I have no experience in web coding and no time to learn every term in detail. Thank you in advance.
You can use C# to download the specific web page and then do the analysis. A code example for the download:
using System.Net;
using System.Net.Mime;      // needed for ContentType
using System.IO;
using System.Text;          // needed for Encoding
using System.Windows.Forms;

string result = null;
string url = "http://www.devtopics.com";
WebResponse response = null;
StreamReader reader = null;

try
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = "GET";
    response = request.GetResponse();
    ContentType contentType = new ContentType(response.ContentType);
    Encoding encoding = Encoding.GetEncoding(contentType.CharSet);
    reader = new StreamReader(response.GetResponseStream(), encoding);
    result = reader.ReadToEnd();
}
catch (Exception ex)
{
    // handle error
    MessageBox.Show(ex.Message);
}
finally
{
    if (reader != null)
        reader.Close();
    if (response != null)
        response.Close();
}
Check out this project 'here' and their code examples 'here'
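For the parsing side, if you only need one specific piece of information, even plain string operations on the downloaded result can do. The following is purely an illustrative sketch (a real HTML parsing library is the better choice for anything non-trivial):

// Minimal sketch: extract the contents of the <title> tag from the downloaded HTML.
static string GetTitle(string html)
{
    // Find the <title> element with a case-insensitive search.
    int start = html.IndexOf("<title>", StringComparison.OrdinalIgnoreCase);
    if (start < 0)
        return null;
    start += "<title>".Length;

    int end = html.IndexOf("</title>", start, StringComparison.OrdinalIgnoreCase);
    if (end < 0)
        return null;

    return html.Substring(start, end - start).Trim();
}

Calling GetTitle(result) on the string downloaded above would then return the page title, for example.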
I have a MonoTouch-based iOS universal app. It uses REST services to fetch data, and I'm using the HttpWebRequest class to build and make my calls. Everything works great, except that it seems to be holding onto memory. I've got usings all over the code to limit the scope of things, and I've avoided anonymous delegates as well, since I had heard they can be a problem. I have a helper class that builds up the calls to my REST service, and as I make calls it just seems to keep holding onto memory.

I'm curious whether anyone has run into similar issues with HttpWebRequest and what to do about it. I'm currently looking into whether I can make the call using an nsMutableRequest and avoid HttpWebRequest entirely, but I'm struggling to get that working with NTLM authentication. Any advice is appreciated.
protected T IntegrationCall<T,I>(string methodName, I input) {
    HttpWebRequest invokeRequest = BuildWebRequest<I>(GetMethodURL(methodName), "POST", input, true);
    WebResponse response = invokeRequest.GetResponse();

    T result = DeserializeResponseObject<T>((HttpWebResponse)response);

    invokeRequest = null;
    response = null;

    return result;
}
protected HttpWebRequest BuildWebRequest<T>(string url, string method, T requestObject, bool IncludeCredentials)
{
    ServicePointManager.ServerCertificateValidationCallback = Validator;
    var invokeRequest = WebRequest.Create(url) as HttpWebRequest;
    if (invokeRequest == null)
        return null;

    if (IncludeCredentials)
    {
        invokeRequest.Credentials = CommonData.IntegrationCredentials;
    }

    if (!string.IsNullOrEmpty(method))
        invokeRequest.Method = method;
    else
        invokeRequest.Method = "POST";

    invokeRequest.ContentType = "text/xml";
    invokeRequest.Timeout = 40000;

    using (Stream requestObjectStream = new MemoryStream())
    {
        DataContractSerializer serializedObject = new DataContractSerializer(typeof(T));
        serializedObject.WriteObject(requestObjectStream, requestObject);
        requestObjectStream.Position = 0;

        using (StreamReader reader = new StreamReader(requestObjectStream))
        {
            string strTempRequestObject = reader.ReadToEnd();
            //byte[] requestBodyBytes = Encoding.UTF8.GetBytes(strTempRequestObject);
            Encoding enc = new UTF8Encoding(false);
            byte[] requestBodyBytes = enc.GetBytes(strTempRequestObject);
            invokeRequest.ContentLength = requestBodyBytes.Length;

            using (Stream postStream = invokeRequest.GetRequestStream())
            {
                postStream.Write(requestBodyBytes, 0, requestBodyBytes.Length);
            }
        }
    }

    return invokeRequest;
}
Using using is the right thing to do - but your code seems to be duplicating the same content multiple times (which it should not do).
requestObjectStream is turned into a string, which is then turned into a byte[], before being written to yet another stream. And that's without considering what the extra code (e.g. ReadToEnd and UTF8Encoding.GetBytes) might allocate internally (e.g. more strings, byte[]...).
So if what you serialize is large, you'll consume a lot of extra memory for nothing. It's even a bit worse for string and byte[], since you can't dispose of them manually (the GC decides when, which makes measurement harder).
I would try (but have not tested ;-) something like:
...
using (Stream requestObjectStream = new MemoryStream()) {
    DataContractSerializer serializedObject = new DataContractSerializer(typeof(T));
    serializedObject.WriteObject(requestObjectStream, requestObject);
    requestObjectStream.Position = 0;

    invokeRequest.ContentLength = requestObjectStream.Length;
    using (Stream postStream = invokeRequest.GetRequestStream())
        requestObjectStream.CopyTo(postStream);
}
...
That lets the MemoryStream copy itself directly to the request stream. An alternative is to call ToArray on the MemoryStream (but that creates another copy of the serialized object that the GC will have to track and free).
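For comparison, the ToArray variant would look roughly like the sketch below (untested, and it carries the extra-copy caveat just mentioned):

...
using (MemoryStream requestObjectStream = new MemoryStream()) {
    DataContractSerializer serializedObject = new DataContractSerializer(typeof(T));
    serializedObject.WriteObject(requestObjectStream, requestObject);

    // ToArray materializes the serialized payload as one more byte[] copy.
    byte[] requestBodyBytes = requestObjectStream.ToArray();
    invokeRequest.ContentLength = requestBodyBytes.Length;

    using (Stream postStream = invokeRequest.GetRequestStream())
        postStream.Write(requestBodyBytes, 0, requestBodyBytes.Length);
}
...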
I need to check if a text file exists on a site on a different domain. The URL could be:
http://sub.somedomain.com/blah/atextfile.txt
I need to do this from code-behind. I am trying to use the HttpWebRequest object, but I'm not sure how to do it.
EDIT: I am looking for a lightweight way of doing this, as I'll be executing this logic every few seconds.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(
    "http://sub.somedomain.com/blah/atextfile.txt");

// Note: GetResponse() throws a WebException for a 404 rather than returning
// a response with a non-OK status, so a missing file needs a catch as well.
HttpWebResponse response = (HttpWebResponse)request.GetResponse();

if (response.StatusCode == HttpStatusCode.OK)
{
    // FILE EXISTS!
}

response.Close();
You could probably use the method used here:
http://www.eggheadcafe.com/tutorials/aspnet/2c13cafc-be1c-4dd8-9129-f82f59991517/the-lowly-http-head-reque.aspx
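The idea described there is an HTTP HEAD request, which asks the server for the status line and headers only and never downloads the file body. A rough sketch of that approach (only an illustration; redirect and timeout handling are up to you) might be:

static bool RemoteFileExists(string url)
{
    try
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        // HEAD returns the same status and headers as GET but no body,
        // which keeps the check lightweight when it runs every few seconds.
        request.Method = "HEAD";

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            return response.StatusCode == HttpStatusCode.OK;
        }
    }
    catch (WebException)
    {
        // 404 and other error statuses surface as a WebException.
        return false;
    }
}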
Something like this might work for you:
using (WebClient webClient = new WebClient())
{
    try
    {
        using (Stream stream = webClient.OpenRead("http://does.not.exist.com/textfile.txt"))
        {
        }
    }
    catch (WebException)
    {
        // OpenRead failed - the file does not exist (or is unreachable); handle it here.
        throw;
    }
}