I have an EXE that downloads a string from a cloud-hosted API. It works well for one to two hours, downloading the string from the requested URI each time, but after a certain number of requests (roughly one to two hours) it stops downloading anything. I also tried the DownloadStringAsync method, but it behaves the same way. Here is the code:
static void Main(string[] args)
{
    for (int i = 0; i < 100000; i++)
    {
        using (var webClient = new WebClient())
        {
            webClient.Headers.Clear();
            webClient.Headers.Add("MyId", "037a1289-a1c6-e611-80d9-000d3a213f57");
            webClient.Headers.Add("Content-Type", "application/json; charset=utf-8");
            string downloadedString = webClient.DownloadString(new Uri("https://mycloudapiurl.com/api/MySet/GetMySets?id=D6364A82-9A3C-E711-80E0-000D3A213F57"));
            if (!string.IsNullOrWhiteSpace(downloadedString) && downloadedString != "null")
            {
                Console.WriteLine("Downloaded " + i + " times");
            }
            else
            {
                Console.WriteLine("Downloaded string is null for API URL");
            }
        }
        Thread.Sleep(10000);
    }
}
This run stops after around 100 to 120 iterations. The same thing happens in my real application, and I haven't been able to figure out what stops the downloads after a certain number of iterations.
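For illustration, one commonly suggested mitigation (an assumption here, not a confirmed fix for this case) is to reuse a single HttpClient rather than creating a WebClient per iteration, since creating a client per request can leave connections lingering and eventually exhaust sockets in a long-running loop. A minimal sketch:

using System;
using System.Net.Http;
using System.Threading.Tasks;

class Program
{
    // One shared client for the whole process; avoids per-request socket churn.
    static readonly HttpClient http = new HttpClient();

    static async Task Main(string[] args)
    {
        // Same custom header as the original code. The Content-Type header only
        // applies to request bodies, so it isn't needed for a GET.
        http.DefaultRequestHeaders.Add("MyId", "037a1289-a1c6-e611-80d9-000d3a213f57");
        for (int i = 0; i < 100000; i++)
        {
            string downloadedString = await http.GetStringAsync(
                "https://mycloudapiurl.com/api/MySet/GetMySets?id=D6364A82-9A3C-E711-80E0-000D3A213F57");
            if (!string.IsNullOrWhiteSpace(downloadedString) && downloadedString != "null")
                Console.WriteLine("Downloaded " + i + " times");
            else
                Console.WriteLine("Downloaded string is null for API URL");
            await Task.Delay(10000);
        }
    }
}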
My code (intended for looking up Steam market items via an API call) reads the API response into my program, and this works great. However, when the API call fails I want to display a separate error message letting the user know that something has gone wrong; currently the code simply crashes.
An example of a successful API call is:
https://steamcommunity.com/market/priceoverview/?currency=2&appid=730&market_hash_name=Glock-18%20%7C%20Steel%20Disruption%20%28Minimal%20Wear%29
This results in the following:
{"success":true,"lowest_price":"\u00a31.39","volume":"22","median_price":"\u00a31.40"}
So far this works perfectly fine; the problem arises when an incorrect link is used, like this:
https://steamcommunity.com/market/priceoverview/?currency=2&appid=730&market_hash_name=this-skin-does-not-exist
This results in an error like so:
{"success":false}
I want to know when this happens so I can display a message to the user, but in my code's current state it simply crashes when this is returned. Here's my current code:
webpage = "https://steamcommunity.com/market/priceoverview/?currency=2&appid=730&market_hash_name=" + Model.category + Model.weapon + " | " + Model.skin + " (" + Model.wear + ")";
System.Net.WebClient wc = new System.Net.WebClient();
byte[] raw = wc.DownloadData(webpage);
string webData = System.Text.Encoding.UTF8.GetString(raw);
if (webData.Substring(11, 1) == "t")
{
int lowestPos = webData.IndexOf("\"lowest_price\":\"");
int volumePos = webData.IndexOf("\",\"volume\":\"");
int medianPos = webData.IndexOf("\",\"median_price\":\"");
int endPos = webData.IndexOf("\"}");
Model.lowestPrice = webData.Substring(lowestPos + 16, volumePos - lowestPos - 16);
if (Model.lowestPrice.IndexOf("\\u00a3") != -1)
{
Model.lowestPrice = "£" + Model.lowestPrice.Substring(6);
}
Model.medianPrice = webData.Substring(medianPos + 18, endPos - medianPos - 18);
if (Model.medianPrice.IndexOf("\\u00a3") != -1)
{
Model.medianPrice = "£" + Model.medianPrice.Substring(6);
}
Model.volume = webData.Substring(volumePos + 12, medianPos - volumePos - 12);
}
else
{
Console.WriteLine("An error has occurred, please enter a correct skin");
}
The error occurs at byte[] raw = wc.DownloadData(webpage);
Any help would be appreciated :)
WebClient is deprecated, and you should consider using HttpClient if possible. When the server returns an error status, WebClient throws a WebException, so wrap your code in a try/catch block to catch the exception and react accordingly:
try
{
    System.Net.WebClient wc = new System.Net.WebClient();
    byte[] raw = wc.DownloadData(webpage);
    string webData = System.Text.Encoding.UTF8.GetString(raw);
}
catch (System.Net.WebException e)
{
    // handle the error here
}
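If you do move to HttpClient, here is a minimal sketch (the method name FetchPriceJsonAsync is hypothetical, and the webpage parameter is assumed to be the same URL built above) that checks the HTTP status instead of relying on an exception:

using System;
using System.Net.Http;
using System.Threading.Tasks;

// Sketch: HttpClient surfaces the status code directly, so no exception
// handling is needed for HTTP errors (only for network-level failures).
static async Task<string> FetchPriceJsonAsync(HttpClient http, string webpage)
{
    HttpResponseMessage response = await http.GetAsync(webpage);
    if (!response.IsSuccessStatusCode)
    {
        Console.WriteLine("An error has occurred, please enter a correct skin");
        return null;
    }
    return await response.Content.ReadAsStringAsync();
}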
I've been coding in C# for a while, trying to create a couple of small tools for myself and my friends, but I've run into a problem that stops me from continuing.
The problem is this: I want to use HtmlAgilityPack to fetch a value that changes over time and use it for a couple of different actions. But the value gets stuck at the same reading until I restart the program.
So here is the code I'm using:
public static void Main(string[] args)
{
    Console.WriteLine("Running the program!");
    Console.WriteLine("Reading the value!");
    int i = 0;
    string url = "Website";
    while (i < 300)
    {
        i++;
        HtmlWeb web = new HtmlWeb();
        HtmlDocument LoadWebsite = web.Load(url);
        HtmlNode rateNode = LoadWebsite.DocumentNode.SelectSingleNode("//div[@class='the-value']");
        string rate = rateNode.InnerText;
        Console.WriteLine(i + ". " + rate);
        Thread.Sleep(1000);
    }
    Console.WriteLine("Done");
    Console.ReadLine();
}
So it first loads the website, then gets the value from the div, and then writes the value so I can check it. But it just keeps writing the same value.
My question is: what do I have to change to get the newest value? The value on the website changes every few seconds, and I need the most recent one on each pass, since the rest of the tool depends on it.
Declare your HtmlWeb web = new HtmlAgilityPack.HtmlWeb(); outside the loop; there is no need to create a new one on every iteration.
You could be having caching issues with the website that you want to crawl.
Set web.UsingCache = false;.
If that doesn't work, append a random string to the URL so that every call is different.
Code:
HtmlWeb htmlWeb = new HtmlAgilityPack.HtmlWeb();
htmlWeb.UsingCache = false;
int i = 0;
while (i < 300)
{
    var uri = new Uri($"yoururl?z={Guid.NewGuid()}");
    i++;
    HtmlAgilityPack.HtmlDocument LoadWebsite = htmlWeb.Load(uri.AbsoluteUri);
    HtmlNode rateNode = LoadWebsite.DocumentNode.SelectSingleNode("//div[@class='the-value']");
    string rate = rateNode.InnerText;
    Console.WriteLine(i + ". " + rate);
    Thread.Sleep(1000);
}
I am trying to build a small application where, when I enter a list of around 100,000 to 200,000 URLs, it should download the HTML for each one and save it in a relative folder.
I have two solutions, but each has some problems, and I'm trying to figure out the best approach.
First Solution: Synchronous Method
Below is the code I am using:
currentline = 0;
var lines = txtUrls.Lines.Where(line => !String.IsNullOrWhiteSpace(line)).Count();
string urltext = txtUrls.Text;
List<string> list = new List<string>(
    txtUrls.Text.Split(new string[] { "\r\n" },
    StringSplitOptions.RemoveEmptyEntries));
lblStatus.Text = "Working";
btnStart.Enabled = false;
foreach (string url in list)
{
    using (WebClient client = new WebClient())
    {
        client.DownloadFile(url, @".\pages\page" + currentline + ".html");
        currentline++;
    }
}
lblStatus.Text = "Finished";
btnStart.Enabled = true;
The code works fine, but it's slow, and randomly after about 5,000 URLs it stops working while the process reports that it's completed. (Please note I run this code in a BackgroundWorker; to keep things simple, I am showing only the relevant code.)
Second Solution: Asynchronous Method
int currentline = 0;
string urltext = txtUrls.Text;
List<string> list = new List<string>(
    txtUrls.Text.Split(new string[] { "\r\n" },
    StringSplitOptions.RemoveEmptyEntries));
foreach (var url in list)
{
    using (WebClient webClient = new WebClient())
    {
        webClient.DownloadFileCompleted += new AsyncCompletedEventHandler(Completed);
        webClient.DownloadProgressChanged += new DownloadProgressChangedEventHandler(ProgressChanged);
        webClient.DownloadFileAsync(new Uri(url), @".\pages\page" + currentline + ".html");
    }
    currentline++;
    label1.Text = "No. of Lines Completed: " + currentline;
}
This code works super fast, but most of the time I get downloaded files of 0 KB, and I am sure the network is fast since I am testing on an OVH dedicated server.
Can anyone point out what I am doing wrong, offer tips on improving it, or suggest an entirely different solution to this problem?
Instead of using DownloadFile(), try using:
public async Task GetData()
{
    WebClient client = new WebClient();
    var data = await client.DownloadDataTaskAsync("http://xxxxxxxxxxxxxxxxxxxxx");
}
You will get the data as a byte[]. Then you just call File.WriteAllBytes() to save it to disk.
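A minimal sketch of how those pieces could fit together, with a SemaphoreSlim capping concurrency (the limit of 20, the .\pages folder, and the method name are assumptions for illustration):

using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

// Sketch: download many URLs concurrently, but never more than maxParallel at once.
// Assumes the .\pages folder already exists.
static async Task DownloadAllAsync(List<string> urls, int maxParallel = 20)
{
    var throttle = new SemaphoreSlim(maxParallel);
    var tasks = new List<Task>();
    for (int i = 0; i < urls.Count; i++)
    {
        int index = i; // capture a stable copy for the closure
        await throttle.WaitAsync();
        tasks.Add(Task.Run(async () =>
        {
            try
            {
                using (var client = new WebClient())
                {
                    byte[] data = await client.DownloadDataTaskAsync(urls[index]);
                    File.WriteAllBytes(@".\pages\page" + index + ".html", data);
                }
            }
            finally
            {
                throttle.Release();
            }
        }));
    }
    await Task.WhenAll(tasks);
}

Note that, unlike the second solution above, each WebClient here is not disposed until its download has actually completed; disposing the client while DownloadFileAsync is still in flight cancels the transfer, which is a likely cause of the 0 KB files.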
I want separate async invocations of the SplitFile method to run so that the task completes faster, but the code below isn't working. When I debug, it reaches the line RecCnt = File.ReadAllLines(SourceFile).Length - 1; and then exits. Please help.
public delegate void SplitFile_Delegate(FileInfo file);

static void Main(string[] args)
{
    DirectoryInfo d = new DirectoryInfo(@"D:\test\Perf testing Splitter"); // assuming Test is your folder
    FileInfo[] Files = d.GetFiles("*.txt"); // getting text files
    foreach (FileInfo file in Files)
    {
        SplitFile_Delegate LocalDelegate = new SplitFile_Delegate(SplitFile);
        IAsyncResult R = LocalDelegate.BeginInvoke(file, null, null); // invoking the method
        LocalDelegate.EndInvoke(R);
    }
}

private static void SplitFile(FileInfo file)
{
    try
    {
        String fname;
        //int FileLength;
        int RecCnt;
        int fileCount;
        fname = file.Name;
        String SourceFile = @"D:\test\Perf testing Splitter\" + file.Name;
        RecCnt = File.ReadAllLines(SourceFile).Length - 1;
        fileCount = RecCnt / 10000;
        FileStream fs = new FileStream(SourceFile, FileMode.Open);
        using (StreamReader sr = new StreamReader(fs))
        {
            while (!sr.EndOfStream)
            {
                String dataLine = sr.ReadLine();
                for (int x = 0; x < (fileCount + 1); x++)
                {
                    String Filename = @"D:\test\Perf testing Splitter\Destination Files\" + fname + "_" + x + "by" + (fileCount + 1) + ".txt"; // e.g. test0by4
                    using (StreamWriter Writer = File.AppendText(Filename))
                    {
                        for (int y = 0; y < 10000; y++)
                        {
                            Writer.WriteLine(dataLine);
                            dataLine = sr.ReadLine();
                        }
                        Writer.Close();
                    }
                }
            }
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }
}
Your code doesn't really need any multi-threading. It doesn't really even need asynchronous processing all that much - you're saturating the I/O most likely, and unless you've got multiple drives as the data sources, you're not going to improve that by adding parallelism.
On the other hand, your code reads each file twice, for no reason, wasting memory, time, and even CPU. Instead, just do this:
FileStream fs = new FileStream(SourceFile, FileMode.Open);
using (StreamReader sr = new StreamReader(fs))
{
    string line;
    string fileName = null;
    StreamWriter outputFile = null;
    int lineCounter = 0;
    int outputFileIndex = 0;
    while ((line = sr.ReadLine()) != null)
    {
        if (fileName == null || lineCounter >= 10000)
        {
            lineCounter = 0;
            outputFileIndex++;
            fileName = @"D:\Output\" + fname + "_" + outputFileIndex + ".txt";
            if (outputFile != null) outputFile.Dispose();
            outputFile = File.AppendText(fileName);
        }
        outputFile.WriteLine(line);
        lineCounter++;
    }
    if (outputFile != null) outputFile.Dispose(); // flush and close the last chunk
}
If you really need the filenames in the XOutOfY format, you can just rename them afterwards; it's a lot cheaper than reading the source file twice, line after line. Or, if you don't mind keeping the whole file in memory at once, just use the array you got from ReadAllLines and iterate over that, rather than doing the reading all over again.
To make this even easier, you can also use foreach (var line in File.ReadLines(SourceFile)), which streams the file lazily.
If you really want to make this asynchronous, the way to handle that is by using asynchronous I/O, not just by spooling new threads. So you can use await with StreamReader.ReadLineAsync etc.
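A minimal sketch of that asynchronous variant (hypothetical method and parameter names; same chunking logic as above):

using System.IO;
using System.Threading.Tasks;

// Sketch: the same split loop using asynchronous I/O instead of extra threads.
static async Task SplitFileAsync(string sourceFile, string outputDir, string fname)
{
    using (var sr = new StreamReader(sourceFile))
    {
        StreamWriter outputFile = null;
        int lineCounter = 0, outputFileIndex = 0;
        string line;
        while ((line = await sr.ReadLineAsync()) != null)
        {
            if (outputFile == null || lineCounter >= 10000)
            {
                lineCounter = 0;
                outputFileIndex++;
                if (outputFile != null) outputFile.Dispose();
                outputFile = File.AppendText(Path.Combine(outputDir, fname + "_" + outputFileIndex + ".txt"));
            }
            await outputFile.WriteLineAsync(line);
            lineCounter++;
        }
        if (outputFile != null) outputFile.Dispose(); // flush and close the last chunk
    }
}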
You are not required to call EndInvoke; really, all EndInvoke does is wait on the return value for you. Since SplitFile returns void, my guess is that an optimization kicks in because there is nothing to wait on, and the wait is simply skipped. For more details: C# Asynchronous call without EndInvoke?
That being said, your usage of Begin/EndInvoke will likely not be faster than serial programming (and will likely be marginally slower): because you call EndInvoke immediately after BeginInvoke, each iteration waits for its call to finish before starting the next, so the loop is still serialized. All that has changed is that you're using a delegate where one doesn't look necessary.
It's possible that what you meant to use was Parallel.ForEach (MSDN: https://msdn.microsoft.com/en-us/library/dd992001(v=vs.110).aspx) which will potentially run iterations in parallel.
Edit: As someone else has mentioned, having multiple threads engage in file operations will likely not improve performance as your file ops are probably disk bound. The main benefit you would get from an async file read/write would probably be unblocking the main thread for a UI update. You will need to specify what you want with "performance" if you want a better answer.
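For reference, a minimal sketch of the Parallel.ForEach approach mentioned above, reusing the directory path and SplitFile method from the question:

using System.IO;
using System.Threading.Tasks;

// Sketch: potentially parallel iterations over the files. Caveat from the
// edit above: if the work is disk-bound, this may not be any faster.
FileInfo[] files = new DirectoryInfo(@"D:\test\Perf testing Splitter").GetFiles("*.txt");
Parallel.ForEach(files, file => SplitFile(file));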
I am scraping the content of web pages heavily in a multi-threaded environment. I need a reliable downloader component that is tolerant of temporary server failures, connection drops, etc. Below is what my code looks like.
Now, I keep hitting a weird situation: it all starts perfectly, with 10 threads pulling data concurrently for about 10 minutes. After that I start getting WebExceptions with timeouts right after I call the GetResponse method of my request object. Putting a thread to sleep doesn't help. The only thing that helps is stopping the application and starting it over; then the next 10 minutes pass fine and the problem comes back again.
What I tried already and nothing has helped:
to close/dispose the response object explicitly and via the "using" statement
to call request.Abort everywhere it could have helped
to manipulate timeouts at ServicePointManager/ServicePoint and WebRequest level (extend / shorten the timeout interval)
to manipulate the KeepAlive property
to call to CloseConnectionGroup
to manipulate the number the threads that run simultaneously
Nothing helps! So it seems like a bug, or at least very poorly documented behavior. I've seen a lot of questions about this on Google and Stack Overflow, but none of them is fully answered; people basically suggest one of the things from the list above, and I've tried all of them.
public TResource DownloadResource(Uri uri)
{
    for (var resourceReadingAttempt = 0; resourceReadingAttempt <= MaxTries; resourceReadingAttempt++)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        HttpWebResponse response = null;
        for (var downloadAttempt = 0; downloadAttempt <= MaxTries; downloadAttempt++)
        {
            if (downloadAttempt > 0)
            {
                var sleepFor = TimeSpan.FromSeconds(4 << downloadAttempt) + TimeSpan.FromMilliseconds(new Random(DateTime.Now.Millisecond).Next(1000));
                Trace.WriteLine("Retry #" + downloadAttempt + " in " + sleepFor + ".");
                Thread.Sleep(sleepFor);
            }
            Trace.WriteLine("Trying to get a resource by URL: " + uri);
            var watch = Stopwatch.StartNew();
            try
            {
                response = (HttpWebResponse)request.GetResponse();
                break;
            }
            catch (WebException exception)
            {
                request.Abort();
                Trace.WriteLine("Failed to get a resource by the URL: " + uri + " after " + watch.Elapsed + ". " + exception.Message);
                if (exception.Status == WebExceptionStatus.Timeout)
                {
                    //Trace.WriteLine("Closing " + request.ServicePoint.CurrentConnections + " current connections.");
                    //request.ServicePoint.CloseConnectionGroup(request.ConnectionGroupName);
                    //request.Abort();
                    continue;
                }
                else
                {
                    using (var failure = exception.Response as HttpWebResponse)
                    {
                        Int32 code;
                        try { code = failure != null ? (Int32)failure.StatusCode : 500; }
                        catch { code = 500; }
                        if (code >= 500 && code < 600)
                        {
                            if (failure != null) failure.Close();
                            continue;
                        }
                        else
                        {
                            Trace.TraceError(exception.ToString());
                            throw;
                        }
                    }
                }
            }
        }
        if (response == null) throw new ApplicationException("Unable to get a resource from URL \"" + uri + "\".");
        try
        {
            // response disposal is required to eliminate problems with timeouts
            // more about the problem: http://stackoverflow.com/questions/5827030/httpwebrequest-times-out-on-second-call
            // http://social.msdn.microsoft.com/Forums/en/netfxnetcom/thread/a2014f3d-122b-4cd6-a886-d619d7e3140e
            TResource resource;
            using (var stream = response.GetResponseStream())
            {
                try
                {
                    resource = this.reader.ReadFromStream(stream);
                }
                catch (IOException exception)
                {
                    Trace.TraceError("Unable to read the resource stream: " + exception.ToString());
                    continue;
                }
            }
            return resource;
        }
        finally
        {
            // recycle as much as you can
            if (response != null)
            {
                response.Close();
                (response as IDisposable).Dispose();
                response = null;
            }
            if (request != null)
            {
                //Trace.WriteLine("closing connection group: " + request.ConnectionGroupName);
                //request.ServicePoint.CloseConnectionGroup(request.ConnectionGroupName);
                request.Abort();
                request = null;
            }
        }
    }
    throw new ApplicationException("Resource was not able to be acquired after several attempts.");
}
I had the same problem and searched a lot on the internet. The one solution I found was to cap the number of threads running at a time; I started using only 2-3 threads at a time.
Also use ServicePointManager.DefaultConnectionLimit = 200;
This will really help you.
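A minimal sketch of both suggestions together, assuming a DownloadResource method like the one in the question (the cap of 3 concurrent workers follows this answer's suggestion; the URL list is hypothetical):

using System;
using System.Collections.Generic;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

// Sketch: raise the per-host connection limit and cap concurrent downloads.
static void Main()
{
    ServicePointManager.DefaultConnectionLimit = 200; // the default is only 2 per host
    var throttle = new SemaphoreSlim(3); // 2-3 concurrent workers, per the answer
    var uris = new List<Uri> { /* ... your URLs ... */ };
    var tasks = new List<Task>();
    foreach (var uri in uris)
    {
        throttle.Wait(); // block until a worker slot is free
        tasks.Add(Task.Run(() =>
        {
            try { /* downloader.DownloadResource(uri); */ }
            finally { throttle.Release(); }
        }));
    }
    Task.WaitAll(tasks.ToArray());
}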