Hi, I was making a crawler for a site. After about 3 hours of crawling, my app stopped on a WebException. Below is my code in C#. client is a predefined WebClient object that is disposed every time gameDoc has been processed. gameDoc is an HtmlDocument object (from HtmlAgilityPack).
while (retrygamedoc)
{
    try
    {
        gameDoc.LoadHtml(client.DownloadString(url)); // this line caused the exception
        retrygamedoc = false;
    }
    catch
    {
        client.Dispose();
        client = new WebClient();
        retrygamedoc = true;
        Thread.Sleep(500);
    }
}
I tried to use the code below (to keep the WebClient fresh) from this answer:
while (retrygamedoc)
{
    try
    {
        using (WebClient client2 = new WebClient())
        {
            gameDoc.LoadHtml(client2.DownloadString(url)); // this line caused the exception
            retrygamedoc = false;
        }
    }
    catch
    {
        retrygamedoc = true;
        Thread.Sleep(500);
    }
}
but the result is still the same. Then I used a StreamReader and the result stayed the same! Below is my code using StreamReader.
while (retrygamedoc)
{
    try
    {
        // using HttpWebRequest directly to check the result
        HttpWebRequest webreq = (HttpWebRequest)WebRequest.Create(url);
        string responsestring = string.Empty;
        HttpWebResponse response = (HttpWebResponse)webreq.GetResponse(); // this causes the exception
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            responsestring = reader.ReadToEnd();
        }
        gameDoc.LoadHtml(responsestring);
        retrygamedoc = false;
    }
    catch
    {
        retrygamedoc = true;
        Thread.Sleep(500);
    }
}
What should I do and check? I am so confused because I am able to crawl some pages on the same site, but then, after about 1000 results, it throws the exception. The message from the exception is only "The request was aborted: The connection was closed unexpectedly." and the status is ConnectionClosed.
PS: the app is a desktop Windows Forms app.
Update:
Now I am skipping the values and setting them to null so that the crawling can go on. But if the data is really needed, I still have to update the crawling result manually, which is tiring because the result contains thousands of records. Please help me.
Example:
It is as if you had downloaded about 1300 records from the website, and then the application stopped, saying "The request was aborted: The connection was closed unexpectedly." while your internet connection is still up and running at a good speed.
ConnectionClosed may indicate (and probably does) that the server you're downloading from is closing the connection. Perhaps it is noticing a large number of requests from your client and is denying you additional service.
Since you can't control server-side shenanigans, I'd recommend you have some sort of logic to retry the download a bit later.
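If it helps, here is a minimal sketch of that idea in C#: retry in a loop, but back off for progressively longer between attempts instead of a fixed 500 ms. The url and gameDoc names are taken from the question's code; the attempt limit and delay values are just assumptions.
int attempt = 0;
const int maxAttempts = 5;  // assumption: give up after a handful of tries
bool succeeded = false;

while (!succeeded && attempt < maxAttempts)
{
    try
    {
        using (var freshClient = new WebClient())
        {
            gameDoc.LoadHtml(freshClient.DownloadString(url));
            succeeded = true;
        }
    }
    catch (WebException)
    {
        attempt++;
        // Back off: roughly 2s, 4s, 8s, ... so the server gets some breathing room.
        Thread.Sleep(TimeSpan.FromSeconds(Math.Pow(2, attempt)));
    }
}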
I got this error because the server returned a 404.
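For what it's worth, the two cases can usually be told apart by inspecting the WebException. A rough sketch, reusing the url and gameDoc variables from the question:
try
{
    using (var client2 = new WebClient())
    {
        gameDoc.LoadHtml(client2.DownloadString(url));
    }
}
catch (WebException ex)
{
    var httpResponse = ex.Response as HttpWebResponse;
    if (httpResponse != null && httpResponse.StatusCode == HttpStatusCode.NotFound)
    {
        // 404: the page does not exist, so retrying will not help - skip it
    }
    else if (ex.Status == WebExceptionStatus.ConnectionClosed)
    {
        // The server dropped the connection - wait a while and retry
    }
}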
Related
I'm trying to communicate between my host app and service via AppServiceConnection. I'm using the following code in my host app:
using (var connection = new AppServiceConnection())
{
    connection.AppServiceName = extension.AppServiceName;
    connection.PackageFamilyName = extension.PackageFamilyName;

    var connectionStatus = await connection.OpenAsync();
    if (connectionStatus == AppServiceConnectionStatus.Success)
    {
        var response = await connection.SendMessageAsync(requestMessage);
        if (response.Status == AppServiceResponseStatus.Success)
            returnValue = response.Message as ValueSet;
    }
}
And my service code:
private async void OnRequestReceived(AppServiceConnection sender, AppServiceRequestReceivedEventArgs args)
{
    var messageDeferral = args.GetDeferral();
    var message = args.Request.Message;
    var returnData = new ValueSet();

    var command = message["Command"] as string;
    switch (command)
    {
        case "ACTION":
            var value = await AsyncAction();
            returnData = new ValueSet { { "Value", JsonConvert.SerializeObject(value) } };
            break;
        default:
            break;
    }

    await args.Request.SendResponseAsync(returnData);
    messageDeferral.Complete();
}
This works some of the time, but other times the ValueSet (returnValue) is randomly empty when the host receives it. It has Value in it when returned in the service, but when I get the response in the host, nothing.
I've verified that the service is indeed setting the value, adding it to the ValueSet and returning it correctly.
Note that my service is receiving the request, the host is receiving the response and the response status is Success; a failed connection isn't the issue.
Sometimes this happens only once before requests start working again, other times it will happen ten times in a row.
The first working response after a failure always takes significantly longer than normal.
Also, I have no issues in the request from host to service. It's always service to host where the problem shows up.
Has anyone else run into this issue and figured it out?
In the process of creating a sample app I realized what the problem was. I was performing an asynchronous action (albeit a very short one) before my line var messageDeferral = args.GetDeferral();. It appears that this was allowing the background task to be closed before it had responded to the host. Simply moving that line to the beginning of the OnRequestReceived function fixed the problem for me.
So for anyone who runs into a similar issue, get your deferral before you do anything else! Spare yourself the pain I went through.
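In other words, something along these lines; a sketch of the corrected handler, keeping AsyncAction and the "Command"/"Value" keys from the code above:
private async void OnRequestReceived(AppServiceConnection sender, AppServiceRequestReceivedEventArgs args)
{
    // Take the deferral FIRST, before any await, so the platform keeps the
    // background task alive until the response has been sent.
    var messageDeferral = args.GetDeferral();
    try
    {
        var message = args.Request.Message;
        var returnData = new ValueSet();

        if ((message["Command"] as string) == "ACTION")
        {
            var value = await AsyncAction();
            returnData.Add("Value", JsonConvert.SerializeObject(value));
        }

        await args.Request.SendResponseAsync(returnData);
    }
    finally
    {
        messageDeferral.Complete();
    }
}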
I have an FTP client, running as part of a Windows service, that gets information from an FTP server on a scheduled basis. My issue is that sometimes the FTP server is down for planned maintenance. When this happens, my FTP client still calls out on a scheduled basis and fails with the following error:
System.Net.WebException. The underlying connection was closed: An unexpected error occurred on a receive
I get the error above twice. After this, I get the following timeout error every time indefinitely:
System.Net.WebException The operation has timed out
Even with the maintenance window complete, my windows service will keep timing out when attempting to connect to the FTP server. The only way we can solve the problem is by restarting the windows service. The following code shows my FTP client code:
var _request = (FtpWebRequest)WebRequest.Create(configuration.Url);
_request.Method = WebRequestMethods.Ftp.DownloadFile;
_request.KeepAlive = false;
_request.Timeout = configuration.RequestTimeoutInMilliseconds;
_request.Proxy = null; // Do NOT use a proxy
_request.Credentials = new NetworkCredential(configuration.UserName, configuration.Password);
_request.ServicePoint.ConnectionLeaseTimeout = configuration.RequestTimeoutInMilliseconds;
_request.ServicePoint.MaxIdleTime = configuration.RequestTimeoutInMilliseconds;

try
{
    using (var _response = (FtpWebResponse)_request.GetResponse())
    using (var _responseStream = _response.GetResponseStream())
    using (var _streamReader = new StreamReader(_responseStream))
    {
        this.c_rateSourceData = _streamReader.ReadToEnd();
    }
}
catch (Exception genericException)
{
    throw genericException;
}
Anyone know what the issue might be?
I have this method:
private void sendSms(object url)
{
    var Url = url.ToString();
    webRequest = WebRequest.Create(Url);
    // webRequest.BeginGetResponse(this.RespCallback, webRequest);
    webResponse = webRequest.GetResponse();
    // End the Asynchronous response.
    var stream = new StreamReader(webResponse.GetResponseStream());
    var response = stream.ReadToEnd().ToString();
    if (response.Contains(Config.ValidResponse))
    {
        var queryString = HttpUtility.ParseQueryString(webRequest.RequestUri.Query);
        OnMessageAccepted(this, new MessageAcceptedEventArgs(queryString["SN"], "n/a"));
    }
    else
    {
        OnMessageAccepted(this, new MessageAcceptedEventArgs("", "n/a"));
    }
}
which I call inside a loop like this:
while (true)
{
    sendSms(url);
    Thread.Sleep(400);
}
The problem is that after a couple of hundred calls, like 500 or 600, the calls get slower and slower. If I restart the application it starts out fast again, but then it starts slowing down. I was wondering if there is any buffer or cache I should clear every now and then to keep it fast?
PS: I developed the server, so I'm sure the server isn't slowing it down; I also tried this with different server implementations, both ones I developed and ones developed by others.
Thanks in advance.
You need to dispose the response and response stream using using blocks.
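Concretely, something along these lines; a sketch based on the method above, with the request and response as locals so each call cleans up after itself (Config.ValidResponse, OnMessageAccepted and MessageAcceptedEventArgs are taken from the question's code):
private void sendSms(object url)
{
    var request = WebRequest.Create(url.ToString());

    // The using blocks guarantee the response, stream and reader are released
    // even if an exception is thrown, so connections are not left open.
    using (var response = request.GetResponse())
    using (var responseStream = response.GetResponseStream())
    using (var reader = new StreamReader(responseStream))
    {
        var responseText = reader.ReadToEnd();
        if (responseText.Contains(Config.ValidResponse))
        {
            var queryString = HttpUtility.ParseQueryString(request.RequestUri.Query);
            OnMessageAccepted(this, new MessageAcceptedEventArgs(queryString["SN"], "n/a"));
        }
        else
        {
            OnMessageAccepted(this, new MessageAcceptedEventArgs("", "n/a"));
        }
    }
}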
Another question about Web proxy.
Here is my code:
IWebProxy Proxya = System.Net.WebRequest.GetSystemWebProxy();
Proxya.Credentials = CredentialCache.DefaultNetworkCredentials;

HttpWebRequest rqst = (HttpWebRequest)WebRequest.Create(targetServer);
rqst.Proxy = Proxya;
rqst.Timeout = 5000;

try
{
    rqst.GetResponse();
}
catch (WebException wex)
{
    connectErrMsg = wex.Message;
    proxyworks = false;
}
This code hangs for a minute or two the first time it is called. After that, on successive calls it works sometimes, but not others. It also never hits the catch block.
Now the weird part. If I add a MessageBox.Show(msg) call in the first section of code before the GetResponse() call this all will work every time without hanging. Here is an example:
try
{
    // ========Here is where I make the call and get the response========
    System.Windows.Forms.MessageBox.Show("Getting Response");
    // ========This makes the whole thing work every time========
    rqst.GetResponse();
}
catch (WebException wex)
{
    connectErrMsg = wex.Message;
    proxyworks = false;
}
I'm baffled about why it is behaving this way. I don't know if the timeout is not working (it's in milliseconds, not seconds, so it should time out after 5 seconds, right?) or what is going on. The most confusing thing is that the message box call makes it all work without hanging.
So any help and suggestions on what is happening are appreciated. These are the kind of bugs that drive me absolutely out of my mind.
EDIT and CORRECTION:
OK, so I've been testing this and the problem is caused when I try to download data from the URI that I am getting a response from. I am testing the connectivity using the GetResponse() method with a WebRequest, but am downloading the data with a WebClient. Here is the code for that:
public void LoadUpdateDataFromNet(string url, IWebProxy wProxy)
{
    // Create web client
    System.Net.WebClient webClnt = new System.Net.WebClient();

    // Set the proxy settings
    webClnt.Proxy = wProxy;
    webClnt.Credentials = wProxy.Credentials;
    byte[] tempBytes;

    // Download the data and put it into a stream for reading
    try
    {
        tempBytes = webClnt.DownloadData(url); // <-- HERE IS WHERE IT HANGS
    }
    catch (WebException wex)
    {
        MessageBox.Show("NEW ERROR: " + wex.Message);
        return;
    }

    // Code here that uses the downloaded data
}
The WebRequest and WebClient are both accessing the same URL which is a web path to an XML file and the proxy is the same one created in the method at the top of this post. I am testing to see if the created IWebProxy is valid for the specified path and file and then downloading the file.
The first piece of code I put above and this code using the WebClient are in separate classes and are called at different times, yet using a message box in the first bit of code still makes the whole thing run fine, which confuses me. I'm not sure what is happening here, or why message boxes and running/debugging in Visual Studio make the program run OK. Suggestions?
So, I figured out the answer to the problem. The timeout for the web request is still 5 seconds, but for some reason, if the response is not closed explicitly, it makes consecutive web requests hang. Here is the code now:
IWebProxy Proxya = System.Net.WebRequest.GetSystemWebProxy(); // to get default proxy settings
Proxya.Credentials = CredentialCache.DefaultNetworkCredentials;

Uri targetserver = new Uri(targetAddress);
Uri proxyserver = Proxya.GetProxy(targetserver);

HttpWebRequest rqst = (HttpWebRequest)WebRequest.Create(targetserver);
rqst.Proxy = Proxya;
rqst.Timeout = 5000;

try
{
    // Get response to check for valid proxy and then close it
    WebResponse wResp = rqst.GetResponse();
    //===================================================================
    wResp.Close(); // HERE WAS THE PROBLEM. ADDING THIS CALL MAKES IT WORK
    //===================================================================
}
catch (WebException wex)
{
    connectErrMsg = wex.Message;
    proxyworks = false;
}
Still not sure exactly how calling the message box was making everything work, but it doesn't really matter at this point. The whole thing works like a charm.
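Equivalently, a using block disposes the response for you, which is a bit harder to forget. A sketch of the same check:
try
{
    // Disposing the response releases the connection, which is what the
    // explicit Close() call above achieves.
    using (WebResponse wResp = rqst.GetResponse())
    {
        // Nothing to read; we only care that the request succeeded.
    }
}
catch (WebException wex)
{
    connectErrMsg = wex.Message;
    proxyworks = false;
}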
I have a windows service that calls a page after a certain interval of time. The page in turn creates some reports.
The problem is that the service stops doing anything after 2-3 calls; that is, it calls the page 2-3 times and then does not do any work, though it shows that the service is running. I am using timers in my service.
Please, can someone help me with a solution here?
Thank you.
The code (where t1 is my timer):
protected override void OnStart(string[] args)
{
    GetRecords();
    t1.Elapsed += new ElapsedEventHandler(OnElapsedTime);
    t1.Interval = //SomeTimeInterval
    t1.Enabled = true;
    t1.Start();
}

private void OnElapsedTime(object source, ElapsedEventArgs e)
{
    try
    {
        GetRecords();
    }
    catch (Exception ex)
    {
        EventLog.WriteEntry(ex.Message);
    }
}

public void GetRecords()
{
    try
    {
        string ConnectionString = //Connection string from web.config
        WebRequest Request = HttpWebRequest.Create(ConnectionString);
        Request.Timeout = 100000000;
        HttpWebResponse Response = (HttpWebResponse)Request.GetResponse();
    }
    catch (Exception ex)
    {
    }
}
Well, what does the code look like? WebClient is the easiest way to query a page:
string result;
using (WebClient client = new WebClient())
{
    result = client.DownloadString(address);
}
// do something with `result`
The timer code might also be glitchy if it is stalling...
It's possible that HttpWebRequest will restrict the number of concurrent HTTP requests to a specific page or server, as is generally proper HTTP client practice.
The fact that you're not properly disposing of your objects most likely means you are maintaining 2 or 3 connections to a specific page, each with a large timeout value, and HttpWebRequest is queueing or ignoring your requests until the first few complete (i.e., die from a client or server timeout, most likely the server in this case).
Add a 'finally' clause and dispose of your objects properly!
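For the GetRecords method above, that could look roughly like this; a sketch only, assuming it still lives inside the service class (so EventLog is available) and using a placeholder URL where the question reads it from web.config:
public void GetRecords()
{
    string reportUrl = "http://example.com/reports/run"; // placeholder; read the real URL from config

    try
    {
        var request = (HttpWebRequest)WebRequest.Create(reportUrl);
        request.Timeout = 100000000;

        // Disposing the response frees the connection, so later timer ticks
        // are not stuck waiting behind earlier, never-closed requests.
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            // The page has been hit; nothing needs to be read here.
        }
    }
    catch (Exception ex)
    {
        EventLog.WriteEntry(ex.Message); // at least log instead of swallowing the exception silently
    }
}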
I think you're missing something about disposing of your objects like StreamReader, WebRequest, etc. You should dispose of your expensive objects after using them.
Possibly the way you are requesting the page is throwing an unhandled exception, which leaves the service in an inoperable state.
Yes, we need code.
Marc's advice worked for me, in the context of a service
Using WebClient worked reliably, where WebRequest timed out.
#jscharf's explanation looks as good as any to me.