Connection problems while using a Parallel.ForEach loop - C#

I have a foreach loop which is responsible for executing a certain set of statements. Part of that is saving an image from a URL to Azure Storage. I have to do this for a large set of data. To speed this up, I have converted the foreach loop into a Parallel.ForEach loop.
Parallel.ForEach(listSkills, item =>
{
    // some business logic
    var b = getImageFromUrl(item.Url);
    Stream ms = new MemoryStream(b);
    saveImage(ms);
    // more business logic
});
private static byte[] getByteArray(Stream input)
{
    using (MemoryStream ms = new MemoryStream())
    {
        input.CopyTo(ms);
        return ms.ToArray();
    }
}
public static byte[] getImageFromUrl(string url)
{
    HttpWebRequest request = null;
    HttpWebResponse response = null;
    byte[] b = null;
    request = (HttpWebRequest)WebRequest.Create(url);
    response = (HttpWebResponse)request.GetResponse();
    if (request.HaveResponse)
    {
        if (response.StatusCode == HttpStatusCode.OK)
        {
            Stream receiveStream = response.GetResponseStream();
            b = getByteArray(receiveStream);
        }
    }
    return b;
}
public static void saveImage(Stream fileContent)
{
    fileContent.Seek(0, SeekOrigin.Begin);
    byte[] bytes = getByteArray(fileContent);
    CloudBlockBlob blob = null; // blob reference initialization elided
    blob.UploadFromByteArrayAsync(bytes, 0, bytes.Length).Wait();
}
However, there are instances when I get the below error and the image does not get saved.
An existing connection was forcibly closed by the remote host.
Also sharing the stack trace:
at System.Net.Sockets.NetworkStream.Read(Span`1 buffer)
at System.Net.Security.SslStream.<FillBufferAsync>d__183`1.MoveNext()
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Net.Security.SslStream.<ReadAsyncInternal>d__181`1.MoveNext()
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Net.Security.SslStream.Read(Byte[] buffer, Int32 offset, Int32 count)
at System.IO.Stream.Read(Span`1 buffer)
at System.Net.Http.HttpConnection.Read(Span`1 destination)
at System.Net.Http.HttpConnection.ContentLengthReadStream.Read(Span`1 buffer)
at System.Net.Http.HttpBaseStream.Read(Byte[] buffer, Int32 offset, Int32 count)
at System.IO.Stream.CopyTo(Stream destination, Int32 bufferSize)
at Utilities.getByteArray(Stream input) in D:\repos\SampleProj\Sample\Helpers\CH.cs:line 238
at Utilities.getImageFromUrl(String url) in D:\repos\SampleProj\Sample\Helpers\CH.cs:line 178
I am guessing this may be because I am not using locks? I am unsure whether locks are needed within a Parallel.ForEach loop.

According to another question on Stack Overflow, here are the potential causes for "An existing connection was forcibly closed by the remote host.":
You are sending malformed data to the application (which could include sending an HTTPS request to an HTTP server)
The network link between the client and server is going down for some reason
You have triggered a bug in the third-party application that caused it to crash
The third-party application has exhausted system resources
Since only some of your requests are affected, I think we can exclude the first one. This can be, of course, a network issue, and in that case, this would happen from time to time depending on the quality of the network between you and the server.
Unless you find indications of an Azure Storage bug from other users, there is a high probability your calls are consuming too much of the remote server's resources (connections/data) at the same time. Servers and proxies have limits on how many connections they can handle at the same time (especially from the same client machine).
Depending on the size of your listSkills list, your code may launch a large number of requests in parallel (as many as your thread pool allows), possibly flooding the server.
You could at least limit the number of parallel tasks launched using MaxDegreeOfParallelism, like this:
Parallel.ForEach(listSkills,
    new ParallelOptions { MaxDegreeOfParallelism = 4 },
    item =>
    {
        // some business logic
        var b = getImageFromUrl(item.Url);
        Stream ms = new MemoryStream(b);
        saveImage(ms);
        // more business logic
    });

You can control parallelism with PLINQ like this (note that the query must be enumerated before anything executes):
listSkills.AsParallel()
    .WithDegreeOfParallelism(10) // limit parallelism; keep this right after AsParallel()
    .Select(item =>
    {
        // your logic
        getImageFromUrl(item.Url);
        saveImage(your_stream);
        return item;
    })
    .ToList(); // PLINQ is lazy; force execution
But Parallel.ForEach is not good for IO because it is designed for CPU-intensive tasks; if you use it for IO-bound operations, especially web requests, you waste thread pool threads blocked while waiting for a response.
Instead, use asynchronous web request methods like HttpWebRequest.GetResponseAsync. You can also combine this with thread synchronization constructs: for example a semaphore, which acts like a queue; it allows X threads to pass, and the rest must wait until one of the busy threads finishes its work.
First make your getImageFromUrl method async, like this (not a perfect solution, but better):
public static async Task getImageFromUrl(SemaphoreSlim semaphore, string url)
{
    try
    {
        byte[] b = null;
        var request = (HttpWebRequest)WebRequest.Create(url);
        using (var response = await request.GetResponseAsync().ConfigureAwait(false))
        {
            // your logic
        }
    }
    catch (Exception ex)
    {
        // handle exception
    }
    finally
    {
        // release the semaphore slot so a waiting caller can proceed
        semaphore.Release();
    }
}
and then:
var tasks = new List<Task>();
using (var semaphore = new SemaphoreSlim(10))
{
    foreach (var url in urls)
    {
        // wait here until there is room for this task
        await semaphore.WaitAsync();
        tasks.Add(getImageFromUrl(semaphore, url));
    }
    // wait for the rest of the tasks to complete
    await Task.WhenAll(tasks);
}
You should not use Parallel or Task.Run for this; instead you can have an async handler method like:
public async Task handleResponse(Task<HttpResponseMessage> responseTask)
{
    HttpResponseMessage response = await responseTask;
    // process your data
}
and then use Task.WhenAll like:
Task[] requests = myList.Select(l => getImageFromUrl(l.Url)) // assuming it returns Task<HttpResponseMessage>
    .Select(r => handleResponse(r))
    .ToArray();
await Task.WhenAll(requests);
In the end, there are several solutions for your scenario; just forget Parallel.ForEach and use one of the optimized approaches above, for example combined as in the sketch below.
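Putting the pieces together, here is a minimal end-to-end sketch of the semaphore approach (the URLs and the saveImage helper come from the question; the shared HttpClient replacing the HttpWebRequest plumbing is my assumption, not the original code):
private static readonly HttpClient client = new HttpClient();

public static async Task DownloadAllAsync(IEnumerable<string> urls)
{
    var semaphore = new SemaphoreSlim(10); // at most 10 concurrent downloads
    var tasks = urls.Select(async url =>
    {
        await semaphore.WaitAsync();
        try
        {
            // download one image and hand it to the existing saveImage helper
            byte[] bytes = await client.GetByteArrayAsync(url);
            using (var ms = new MemoryStream(bytes))
            {
                saveImage(ms);
            }
        }
        finally
        {
            semaphore.Release();
        }
    });
    await Task.WhenAll(tasks);
}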

There are several problems with this code:
Parallel.ForEach is meant for data parallelism, not IO. The code is blocking all CPU cores while waiting for IO to complete.
HttpWebRequest is a wrapper over HttpClient in .NET Core. Using HttpWebRequest is inefficient and far more complex than needed.
HttpClient can retrieve or post stream contents, which means there's no reason to load stream contents into memory. HttpClient is thread-safe and meant to be reused, too.
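As a minimal sketch of that last point (the field name is illustrative):
// One HttpClient for the lifetime of the application;
// creating one per request exhausts sockets under load.
private static readonly HttpClient client = new HttpClient();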
There are several ways to execute many IO operations concurrently in .NET Core.
.NET 6
In the current Long-Term-Support version of .NET, .NET 6, this can be done using Parallel.ForEachAsync. Scott Hanselman shows how easy it is to use it for API calls.
You can retrieve the data directly with GetByteArrayAsync:
record CopyRequest(Uri sourceUri, Uri targetUri);
...
var requests = new List<CopyRequest>();
// Load some source/target URLs
var client = new HttpClient();
await Parallel.ForEachAsync(requests, async (req, ct) =>
{
    var bytes = await client.GetByteArrayAsync(req.sourceUri);
    var blob = new CloudAppendBlob(req.targetUri);
    await blob.UploadFromByteArrayAsync(bytes, 0, bytes.Length);
});
A better option would be to retrieve the data as a stream and send it directly to the blob:
await Parallel.ForEachAsync(requests, async (req, ct) =>
{
    var response = await client.GetAsync(req.sourceUri,
        HttpCompletionOption.ResponseHeadersRead);
    using var sourceStream = await response.Content.ReadAsStreamAsync();
    var blob = new CloudAppendBlob(req.targetUri);
    await blob.UploadFromStreamAsync(sourceStream);
});
HttpCompletionOption.ResponseHeadersRead causes GetAsync to return as soon as the response headers are received, without buffering any of the response data.
.NET 3.1
In older .NET Core versions (which are reaching end-of-life in a few months) you can use e.g. an ActionBlock with a degree of parallelism greater than 1:
var options = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 8 };
var copyBlock = new ActionBlock<CopyRequest>(async req =>
{
    var response = await client.GetAsync(req.sourceUri,
        HttpCompletionOption.ResponseHeadersRead);
    using var sourceStream = await response.Content.ReadAsStreamAsync();
    var blob = new CloudAppendBlob(req.targetUri);
    await blob.UploadFromStreamAsync(sourceStream);
}, options);
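The snippet above only constructs the block; standard TPL Dataflow usage is then to post the requests into it, signal completion, and await the block draining:
foreach (var req in requests)
{
    // SendAsync applies backpressure if the block's input queue is full
    await copyBlock.SendAsync(req);
}

copyBlock.Complete();       // no more input is coming
await copyBlock.Completion; // wait for all pending copies to finish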
The block classes in the TPL Dataflow library can be used to construct processing pipelines similar to a shell script pipeline, with each block piping its output to the next block.

Related

grpc WriteAsync locking up server

Grpc.Core 2.38.0
I have a collection of applications participating in inter-process communication using gRPC streaming. From time to time we've noticed a lockup, and memory exhaustion caused by that lockup, in the server processes, which are unable to finish a call to IAsyncStreamWriter.WriteAsync(...).
Recent changes to gRPC (.NET) have changed the WriteAsync API to accept a CancellationToken; however, this is not available in the Grpc.Core package.
A misconfigured grpc client accepting a stream can cause a deadlock. If a client does not dispose of the AsyncServerStreamingCall during error handling, the deadlock will occur on the server.
Example:
async Task ClientStreamingThread()
{
    while (...)
    {
        var theStream = grpcService.SomeStream(new());
        try
        {
            while (await theStream.ResponseStream.MoveNext(shutdownToken.Token))
            {
                var theData = theStream.ResponseStream.Current;
            }
        }
        catch (RpcException)
        {
            // if an exception occurs, start over, reopen the stream
        }
    }
}
The example above contains the misbehaving client. If an RpcException occurs, we'll return to the start of the while loop and open another stream without cleaning up the previous one. This causes the deadlock.
"Fix" the client code by disposing of the previous stream like the following:
async Task ClientStreamingThread()
{
    while (...)
    {
        // important: dispose of theStream when it goes out of scope
        using var theStream = grpcService.SomeStream(new());
        try
        {
            while (await theStream.ResponseStream.MoveNext(shutdownToken.Token))
            {
                var theData = theStream.ResponseStream.Current;
            }
        }
        catch (RpcException)
        {
            // if an exception occurs, start over, reopen the stream
        }
    }
}
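As an additional defensive measure on the server (my own sketch, not part of the answer above), you can race each write against a timeout, since Grpc.Core's WriteAsync offers no CancellationToken overload; responseStream and message are placeholders for the handler's stream writer and payload:
var writeTask = responseStream.WriteAsync(message);
var finished = await Task.WhenAny(writeTask, Task.Delay(TimeSpan.FromSeconds(30)));
if (finished != writeTask)
{
    // The client likely vanished without disposing the call.
    // Note: the abandoned write is not actually cancelled; this only
    // stops the handler from hanging forever and aborts the RPC.
    throw new RpcException(new Status(StatusCode.DeadlineExceeded, "WriteAsync timed out"));
}
await writeTask; // propagate any write exception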

Getting HTML response fails repeatedly after first failure

I have a program which gets the HTML code of ~500 webpages every 5 minutes. It runs correctly until the first failure (unable to download the source within 6 seconds). After that, all threads fail, and if I restart the program it again runs correctly until the first failure, and so on. Where am I wrong, and what should I do to make this work better?
This function runs every 5 minutes:
foreach (Company company in companies)
{
    string link = company.GetLink();
    Thread t = new Thread(() => F(company, link));
    t.Start();
    if (!t.Join(TimeSpan.FromSeconds(6)))
    {
        Debug.WriteLine(company.Name + " Fails");
        t.Abort();
    }
}
and this function downloads the HTML code:
private void F(Company company, string link)
{
    try
    {
        string htmlCode = GetInformationFromWeb.GetHtmlRequest(link);
        company.HtmlCode = htmlCode;
    }
    catch (Exception ex)
    {
    }
}
and this class:
public class GetInformationFromWeb
{
    public static string GetHtmlRequest(string url)
    {
        using (MyWebClient client = new MyWebClient())
        {
            client.Encoding = Encoding.UTF8;
            string htmlCode = client.DownloadString(url);
            return htmlCode;
        }
    }
}
and the web client class:
public class MyWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        HttpWebRequest request = base.GetWebRequest(address) as HttpWebRequest;
        request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
        return request;
    }
}
If your foreach is looping over 500 companies, and each is creating a new thread, your internet speed could become a bottleneck; you will receive timeouts over 6 seconds and fail very often.
I suggest you try parallelism. Note MaxDegreeOfParallelism, which sets the maximum number of parallel executions. You can tune this to suit your needs.
Parallel.ForEach(companies, new ParallelOptions { MaxDegreeOfParallelism = 10 }, (company) =>
{
    try
    {
        string htmlCode = GetInformationFromWeb.GetHtmlRequest(company.link);
        company.HtmlCode = htmlCode;
    }
    catch (Exception ex)
    {
        // ignore or process exception
    }
});
I have four basic suggestions:
Use HttpClient instead of the obsolete WebClient. HttpClient can deal with asynchronous operations natively and has far more flexibility to take advantage of. You can even read downloaded contents to strings/streams on a different thread, since you can configure await not to schedule your continuations back. You can also configure the client to time out after 6 seconds and raise a TaskCanceledException if that is exceeded.
Avoid swallowing exceptions (like you do in your F function), as it breaks debugging and obfuscates the real cause of problems. A correctly written program will never raise an exception during normal operation.
You are using threads in a useless way, in which they are not even overlapping; they are just waiting for each other to start, because you block the calling loop after each thread's start. In .NET it would be better to do multitasking using Tasks (for example, by calling them as Task.Run(async delegate() { await yourTask(); }), or AsyncContext.Run(...) if you need UI access), and it won't block anything.
The whole GetInformationFromWeb class is pointless at the moment, and you are spawning multiple client objects pointlessly too, since one HTTP client object can handle multiple requests. If you used HttpClient, even without additional bloat, you would just instantiate it once as a static global variable with all the necessary configuration and then call it from any place using as little code as client.GetStringAsync(Uri uri); see the sketch after this list.
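A minimal sketch of that suggestion (the 6-second timeout mirrors the question's t.Join(TimeSpan.FromSeconds(6)); the class and method names are illustrative, and the handler reproduces MyWebClient's decompression settings):
public static class Downloader
{
    // One shared client for the whole program.
    private static readonly HttpClient client = new HttpClient(
        new HttpClientHandler
        {
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        })
    {
        Timeout = TimeSpan.FromSeconds(6)
    };

    public static async Task<string> GetHtmlAsync(string url)
    {
        // Throws TaskCanceledException if the 6-second timeout elapses.
        return await client.GetStringAsync(url);
    }
}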
OT: Is it some kind of an academic project?

.NET HttpClient.PostAsync() slow after 3 requests

I am using the .NET 4.5 HttpClient class to make a POST request to a server a number of times. The first 3 calls run quickly, but the fourth time a call to await client.PostAsync(...) is made, it hangs for several seconds before returning the expected response.
using (HttpClient client = new HttpClient())
{
    // Prepare query
    StringBuilder queryBuilder = new StringBuilder();
    queryBuilder.Append("?arg=value");
    // Send query
    using (var result = await client.PostAsync(BaseUrl + queryBuilder.ToString(),
        new StreamContent(streamData)))
    {
        Stream stream = await result.Content.ReadAsStreamAsync();
        return new MyResult(stream);
    }
}
The server code is shown below:
HttpListener listener;

void Run()
{
    listener.Start();
    ThreadPool.QueueUserWorkItem((o) =>
    {
        while (listener.IsListening)
        {
            ThreadPool.QueueUserWorkItem((c) =>
            {
                var context = c as HttpListenerContext;
                try
                {
                    // Handle request
                }
                finally
                {
                    // Always close the stream
                    context.Response.OutputStream.Close();
                }
            }, listener.GetContext());
        }
    });
}
Inserting a debug statement at // Handle request shows that the server code doesn't seem to receive the request as soon as it is sent.
I have already investigated whether it could be a problem with the client not closing the response, meaning that the number of connections the ServicePoint provider allows could be reached. However, I have tried increasing ServicePointManager.MaxServicePoints but this has no effect at all.
I also found this similar question:
.NET HttpClient hangs after several requests (unless Fiddler is active)
I don't believe this is the problem with my code - even changing my code to exactly what is given there didn't fix the problem.
The problem was that there were too many Task instances scheduled to run.
Changing some of the Task.Factory.StartNew calls in my program, for tasks which ran for a long time, to use the TaskCreationOptions.LongRunning option fixed this. It appears that the task scheduler was waiting for other tasks to finish before it scheduled the request to the server.
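For illustration, the change amounts to something like this (a sketch; DoLongRunningWork stands in for whatever the long-lived task did):
// Before: the long-lived work occupies a regular thread-pool thread,
// starving other queued work such as the HTTP request handling.
Task.Factory.StartNew(() => DoLongRunningWork());

// After: LongRunning hints the scheduler to give the task a dedicated
// thread instead of a thread-pool thread.
Task.Factory.StartNew(() => DoLongRunningWork(),
    TaskCreationOptions.LongRunning);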

Async Loop There is no longer an HttpContext available

I have a requirement to process X number of files; we usually receive around 100 files each day. Each is a zip file, so I have to open it, create a stream, and send it to a WebApi service, which is a workflow; this workflow calls two more WebApi steps.
I implemented a console application that loops through the files, then calls a wrapper which makes a REST call using HttpWebRequest.GetResponse().
I stress tested the solution with 11K files; the synchronous version takes around 17 minutes to process all the files, but I would like to create an async version of it and be able to use await HttpWebRequest.GetResponseAsync().
Here is the Async version:
private async Task<KeyValuePair<HttpStatusCode, string>> REST_CallAsync(
    string httpMethod,
    string url,
    string contentType,
    object bodyMessage = null,
    Dictionary<string, object> headerParameters = null,
    object[] queryStringParamaters = null,
    string requestData = "")
{
    try
    {
        HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create("some url");
        req.Method = "POST";
        req.ContentType = contentType;
        // Adding zip stream to body
        var reqBodyBytes = ReadFully((Stream)bodyMessage);
        req.ContentLength = reqBodyBytes.Length;
        Stream reqStream = req.GetRequestStream();
        reqStream.Write(reqBodyBytes, 0, reqBodyBytes.Length);
        reqStream.Close();
        // Async call
        var resp = await req.GetResponseAsync();
        var httpResponse = (HttpWebResponse)resp;
        var responseData = new StreamReader(resp.GetResponseStream()).ReadToEnd();
        return new KeyValuePair<HttpStatusCode, string>(httpResponse.StatusCode, responseData);
    }
    catch (WebException webEx)
    {
        // something
    }
    catch (Exception ex)
    {
        // something
    }
}
In my console application I have a loop to open each file and call the async method (CallServiceAsync under the covers calls the method above):
foreach (var zipFile in Directory.EnumerateFiles(directory))
{
    using (var zipStream = System.IO.File.OpenRead(zipFile))
    {
        await _restFulService.CallServiceAsync<WorkflowResponse>(
            zipStream,
            headerParameters,
            null,
            true);
    }
    processId++;
}
What ended up happening was that only 2K of the 11K files got processed, without throwing any exception, so I was clueless. I changed the way I call the async method to:
foreach (var zipFile in Directory.EnumerateFiles(directory))
{
    using (var zipStream = System.IO.File.OpenRead(zipFile))
    {
        tasks.Add(_restFulService.CallServiceAsync<WorkflowResponse>(
            zipStream,
            headerParameters,
            null,
            true));
    }
}
And have another loop to await the tasks:
foreach (var task in await System.Threading.Tasks.Task.WhenAll(tasks))
{
    if (task.Value != null)
    {
        Console.WriteLine("Ending Process");
    }
}
And now I am facing a different error: when I process three files, the third one receives:
The client is disconnected because the underlying request has been completed. There is no longer an HttpContext available.
My question is: what am I doing wrong here? I use SimpleInjector as my IoC container; could that be the problem?
Also, when you do WhenAll, does it wait for each thread to run? Doesn't that make it synchronous, so that it waits for one thread to finish in order to execute the next one? I am new to this async world, so any help would be really appreciated.
Well, for those that added -1 to my question and, instead of providing some kind of solution, just suggested something meaningless, here is the answer, and the reason why specifying as much detail as possible is useful.
First problem: since I'm using IIS Express, if I'm not running my solution (F5) then the web applications are not available. That happened to me sometimes, though not always.
The second problem, and the one giving me a huge headache, is that not all the files got processed. I should have known the reason for this issue earlier: it is the usage of async/await in a console application. I forced my console app to work with async by doing:
static void Main(string[] args)
{
    System.Threading.Tasks.Task.Run(() => MainAsync(args)).Wait();
}

static async void MainAsync(string[] args)
{
    //rest of code
Then, if you note, in my foreach I had the await keyword, and what was happening is that await sends control flow back to the caller; in this case the OS is what called the console app (that is why it doesn't make too much sense to use async/await in a console app; I did it because I mistakenly used await when calling an async method).
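A side note on the snippet above (my own sketch, not part of the original answer): because MainAsync is declared async void, the lambda passed to Task.Run returns as soon as MainAsync hits its first await, so Main's .Wait() does not actually wait for the work to finish. Declaring it async Task lets the caller wait all the way through:
static void Main(string[] args)
{
    // Blocks until every await inside MainAsync has completed.
    MainAsync(args).GetAwaiter().GetResult();
}

static async Task MainAsync(string[] args)
{
    // rest of code
}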
So the result was that my process only processed some X number of files. What I ended up doing is the following:
Add a list of tasks, the same way I did above:
tasks.Add(_restFulService.CallServiceAsync<WorkflowResponse>(....
And the way to run the threads is (in my console app):
ExecuteAsync(tasks);
Finally my method:
static void ExecuteAsync(List<System.Threading.Tasks.Task<KeyValuePair<HttpStatusCode, WorkflowResponse>>> tasks)
{
    System.Threading.Tasks.Task.WhenAll(tasks).Wait();
}
UPDATE: Based on Scott's feedback, I changed the way I execute my threads.
And now I'm able to process all my files. I tested it: processing 1000 files with my synchronous process took around 160+ seconds for the whole run (I have a workflow of three steps to process each file), and when I put my async process in place it took 80+ seconds, almost half the time. On my production server with IIS I believe the execution time will be even less.
Hope this helps to anyone facing this type of issue.

How to handle multiple request batch processing using Task in ASP.NET?

I have a list of selected contentIds, and for each contentId I need to call an API, get the response, and then save the received response for each content in the DB.
At a time, a user can select any number of contents, ranging from 1 to 1000, and can pass this on to update the content DB details after getting the response from the API.
In this situation I end up creating multiple requests, one for each content.
I thought to go ahead with an ASP.NET async Task operation and wrote the following method.
The code I wrote currently creates one task for each contentId, and at the end I wait for all tasks to get the responses:
Task.WaitAll(allTasks);
public static Task<KeyValuePair<int, string>> GetXXXApiResponse(string url, int contentId)
{
    var client = new HttpClient();
    return client.GetAsync(url).ContinueWith(task =>
    {
        var response = task.Result;
        var strTask = response.Content.ReadAsStringAsync();
        strTask.Wait();
        var strResponse = strTask.Result;
        return new KeyValuePair<int, string>(contentId, strResponse);
    });
}
I am now thinking that each task I create will create one thread, and in turn, with the limited number of worker threads, this approach will end up taking all the threads, which I don't want to happen.
Can anyone help/guide me on how to handle this situation effectively, i.e., handling multiple API requests with a kind of batch processing using async tasks, etc.?
FYI: I'm using .NET framework 4.5
A task is just a representation of an asynchronous operation that can be waited on and cancelled. Creating new tasks doesn't necessarily create new threads (it mostly doesn't). If you use async-await correctly you don't even have a thread during most of the asynchronous operation.
But, making many requests concurrently can still be problematic (e.g. burdening the content server too much). So you may still want to limit the amount of concurrent calls using SemaphoreSlim or TPL Dataflow:
private static readonly SemaphoreSlim _semaphore = new SemaphoreSlim(100);

public static async Task<KeyValuePair<int, string>> GetXXXApiResponse(string url, int contentId)
{
    await _semaphore.WaitAsync();
    try
    {
        var client = new HttpClient();
        var response = await client.GetAsync(url);
        var strResponse = await response.Content.ReadAsStringAsync();
        return new KeyValuePair<int, string>(contentId, strResponse);
    }
    finally
    {
        _semaphore.Release();
    }
}
To wait for these tasks to complete, you should use Task.WhenAll to asynchronously wait, instead of Task.WaitAll, which blocks the calling thread:
await Task.WhenAll(apiResponses);
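For completeness, a sketch of the calling side under these assumptions (contentIds and the urlFor helper are illustrative, not from the question):
// Kick off all calls; the semaphore inside GetXXXApiResponse caps
// how many actually run concurrently.
var apiResponses = contentIds
    .Select(id => GetXXXApiResponse(urlFor(id), id))
    .ToList();

// Asynchronously wait for all of them; results arrive as an array
// of (contentId, response) pairs ready to be saved to the DB.
KeyValuePair<int, string>[] results = await Task.WhenAll(apiResponses);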
