Parallelizing multiple long-running tasks with async/await - C#

I have a helper method that returns IEnumerable<string>. As the collection grows, it's slowing down dramatically. My current approach is essentially the following:
var results = new List<string>();
foreach (var item in items)
{
results.Add(await item.Fetch());
}
I'm not actually sure whether this asynchronicity gives me any benefit (it sure doesn't seem like it), but all methods up the stack and to my controller's actions are asynchronous:
public async Task<IHttpActionResult> FetchAllItems()
As this code is ultimately used by my API, I'd really like to parallelize these all for what I hope would be great speedup. I've tried .AsParallel:
var results = items
.AsParallel()
.Select(i => i.Fetch().Result)
.ToList();
return results;
And .WhenAll (returning a string[]):
var tasks = items.Select(i => i.Fetch());
return Task.WhenAll(tasks).Result;
And a last-ditch effort of firing off all long-running jobs and sequentially awaiting them (hoping that they were all running in parallel, so waiting on one would let all others nearly complete):
var tasks = new LinkedList<Task<string>>();
foreach (var item in items)
tasks.AddLast(item.Fetch());
var results = new LinkedList<string>();
foreach (var task in tasks)
results.AddLast(task.Result);
In every test case, the time it takes to run is directly proportional to the number of items; there's no discernible speedup. What am I missing in using Tasks and await/async?

There's a difference between parallel and concurrent. Concurrency just means doing more than one thing at a time, whereas parallelism means doing more than one thing on multiple threads. async is great for concurrency, but doesn't (directly) help you with parallelism.
As a general rule, parallelism on ASP.NET should be avoided. This is because any parallel work you do (e.g., AsParallel, Parallel.ForEach, etc.) shares the same thread pool as ASP.NET, so it reduces ASP.NET's capacity to handle other requests. This impacts the scalability of your web service. It's best to leave the thread pool to ASP.NET.
However, concurrency is just fine - specifically, asynchronous concurrency. This is where Task.WhenAll comes in. Code like this is what you should be looking for (note that there is no call to Task<T>.Result):
var tasks = items.Select(i => i.Fetch());
return await Task.WhenAll(tasks);
Given your other code samples, it would be good to run through your call tree starting at Fetch and replace all Result calls with await. This may be (part of) your problem, because Result forces synchronous execution.
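For example, if somewhere inside Fetch you currently have something like the first line below (httpClient and url are hypothetical internals, shown only to illustrate the pattern), change it to the second:
// Blocking: ties up a thread and can deadlock under a synchronization context
var data = httpClient.GetStringAsync(url).Result;
// Asynchronous: frees the thread while the request is in flight
var data = await httpClient.GetStringAsync(url);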
Another possible problem is that the underlying resource being fetched does not support concurrent access, or there may be throttling that you're not aware of. E.g., if Fetch retrieves data from another web service, check out System.Net.ServicePointManager.DefaultConnectionLimit.

There is also a configurable limit on the maximum connections to a single server that can make download performance independent of the number of client threads.
To change the connection limit, set
ServicePointManager.DefaultConnectionLimit
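For example, at application startup (the value 50 is purely illustrative; tune it for your workload):
using System.Net;
// The classic .NET Framework default is only 2 concurrent connections per host for client apps
ServicePointManager.DefaultConnectionLimit = 50;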
Maximum concurrent requests for WebClient, HttpWebRequest, and HttpClient

Related

How to get faster results using parallelism

I have this (below) process that collects search results returned from a service. Each result is then added to the UI for display.
Can this be improved so that the _list can be processed in parallel (perhaps using multiple threads?), therefore I get faster results?
List<Query> queries = _list.Where(x => string.IsNullOrEmpty(x.Title)).ToList();
foreach (var item in queries)
{
List<ExtendedSearchResult> searchResults = (await _service.SearchAsync(item.Query))
.Select(x => ExtendedSearchResult.FromSearchResult(x))
.ToList();
if (searchResults != null)
{
foreach (var result in searchResults)
{
_view.AddItem(result);
}
}
}
I found this post, but I'm not sure whether it applies to my scenario or how to implement it.
Can this be improved so that the _list can be processed in parallel (perhaps using multiple threads?)
Maybe? There is not really any way to tell from the example. Is the work IO-bound or compute-bound? The searching might need to take some kind of lock to gain exclusive access to some resource; if that is the case, parallelism would do nothing except increase overhead.
As a rule of thumb, parallelism is best for compute-bound tasks, while async is best for IO-bound tasks. But a combination can often be useful: modern SSDs are inherently parallel, and compute-bound tasks are often run in the background to avoid blocking the main thread.
If you want to make this code parallel you need to make it thread-safe, i.e. make sure any shared objects are thread-safe, and ensure the UI is only updated on the main thread.
There are plenty of resources available for running tasks concurrently, for example using async await for multiple tasks. As an alternative, I would consider IAsyncEnumerable to update the UI as results are returned.
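For example, a minimal concurrent sketch of the code above, assuming _service.SearchAsync is safe to call concurrently (awaiting on the UI context keeps _view.AddItem on the UI thread):
var queries = _list.Where(x => string.IsNullOrEmpty(x.Title)).ToList();
// Start all searches up front, then await them together
var searchTasks = queries.Select(q => _service.SearchAsync(q.Query)).ToList();
var resultSets = await Task.WhenAll(searchTasks);
foreach (var result in resultSets.SelectMany(
    set => set.Select(ExtendedSearchResult.FromSearchResult)))
{
    _view.AddItem(result); // back on the UI thread after the await
}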

Parallel or async ASP.NET Core C#

I've googled this plenty but I'm afraid I don't fully understand the consequences of concurrency and parallelism.
I have about 3000 database rows, each with an average of 2-4 logical data objects attached that need to be validated as part of a search query, meaning the validation service needs to execute approx. 3*3000 times. E.g. if the user has filtered on color, then each row needs to validate the color and return the result. The loop cannot break when a match has been found; all logical objects always need to be evaluated (relevance is calculated for every object, not just matches).
This is done on-demand when the user selects various properties, meaning performance is key here.
I'm currently doing this by using Parallel.ForEach but wonder if it is smarter to use async behavior instead?
Current way
var validatorService = new LogicalGroupValidatorService();
ConcurrentBag<StandardSearchResult> results = new ConcurrentBag<StandardSearchResult>();
Parallel.ForEach(searchGroups, (group) =>
{
var searchGroupResult = validatorService.ValidateLogicGroupRecursivly(
propertySearchQuery, group.StandardPropertyLogicalGroup);
results.Add(new StandardSearchResult(searchGroupResult));
});
Async example code
var validatorService = new LogicalGroupValidatorService();
List<StandardSearchResult> results = new List<StandardSearchResult>();
var tasks = new List<Task<StandardPropertyLogicalGroupSearchResult>>();
foreach (var group in searchGroups)
{
tasks.Add(validatorService.ValidateLogicGroupRecursivlyAsync(
propertySearchQuery, group.StandardPropertyLogicalGroup));
}
await Task.WhenAll(tasks);
results = tasks.Select(logicalGroupResultTask =>
new StandardSearchResult(logicalGroupResultTask.Result)).ToList();
The difference between parallel and async is this:
Parallel: Spin up multiple threads and divide the work over each thread
Async: Do the work in a non-blocking manner.
Whether this makes a difference depends on what it is that is blocking in the async way. If you're doing work on the CPU, it's the CPU that is blocking you, and therefore you will still end up with multiple threads. If it's IO (or anything else besides the CPU), you will reuse the same thread.
For your particular example that means the following:
Parallel.ForEach => Spin up threads and execute each item on a different thread (the number of threads that are spun up is managed by the CLR)
async/await => Do this bit of work, but let me continue execution. Since you have many items, that means saying this multiple times. What happens next depends on the work:
If this bit of work is on the CPU, the effect is the same
Otherwise, you'll just use a single thread while the work is being done somewhere else
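To make the last point concrete: if ValidateLogicGroupRecursivly is CPU-bound (recursing over in-memory objects suggests it is), an "async" version built on Task.Run is just the parallel version in disguise; it still occupies one thread pool thread per running item. A sketch:
var tasks = searchGroups
    .Select(group => Task.Run(() => validatorService.ValidateLogicGroupRecursivly(
        propertySearchQuery, group.StandardPropertyLogicalGroup)))
    .ToList();
var groupResults = await Task.WhenAll(tasks);
var results = groupResults.Select(r => new StandardSearchResult(r)).ToList();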

How to efficiently make 1000s of web requests as quickly as possible

I need to make 100,000s of lightweight (i.e. small Content-Length) web requests from a C# console app. What is the fastest way I can do this (i.e. have completed all the requests in the shortest possible time) and what best practices should I follow? I can't fire and forget because I need to capture the responses.
Presumably I'd want to use the async web requests methods, however I'm wondering what the impact of the overhead of storing all the Task continuations and marshalling would be.
Memory consumption is not an overall concern, the objective is speed.
Presumably I'd also want to make use of all the cores available.
So I can do something like this:
Parallel.ForEach(iterations, async i =>
{
var response = await MakeRequest(i);
// do thing with response
});
but that won't make me any faster than just my number of cores.
I can do:
Parallel.ForEach(iterations, i =>
{
var response = MakeRequest(i);
response.GetAwaiter().OnCompleted(() =>
{
// do thing with response
});
});
but how do I keep my program running after the ForEach? Holding on to all the Tasks and WhenAll-ing them feels bloated; are there any existing patterns or helpers to have some kind of Task queue?
Is there any way to do better, and how should I handle throttling/error detection? For instance, if the remote endpoint is slow to respond, I don't want to continue spamming it.
I understand I also need to do:
ServicePointManager.DefaultConnectionLimit = int.MaxValue;
Anything else necessary?
The Parallel class does not work with async loop bodies, so you can't use it. Your loop body completes almost immediately and returns a task. There is no parallelism benefit here.
This is a very easy problem. Use one of the standard solutions for processing a series of items asynchronously with a given DOP (degree of parallelism); this one is good: http://blogs.msdn.com/b/pfxteam/archive/2012/03/05/10278165.aspx (use the last piece of code).
You need to empirically determine the right DOP. Simply try different values. There is no theoretical way to derive the best value because it is dependent on many things.
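That helper looks roughly like this (a sketch reconstructed from memory; the partitioner splits the source into dop streams, each drained sequentially by one task):
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class EnumerableExtensions
{
    // Process 'source' with at most 'dop' items in flight at once
    public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
    {
        return Task.WhenAll(
            from partition in Partitioner.Create(source).GetPartitions(dop)
            select Task.Run(async () =>
            {
                using (partition)
                    while (partition.MoveNext())
                        await body(partition.Current);
            }));
    }
}
Usage would then be something like await iterations.ForEachAsync(50, async i => { var response = await MakeRequest(i); /* do thing with response */ }); where 50 is the DOP you tune.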
The connection limit is the only limit that's in your way.
response.GetAwaiter().OnCompleted
Not sure what you tried to accomplish there... If you comment I'll explain the misunderstanding.
The operation you want to perform is
Call an I/O method
Process the result
You are correct that you should use an async version of the I/O method. What's more, you only need 1 thread to start all of the I/O operations. You will not benefit from parallelism here.
You will benefit from parallelism in the second part - processing the result - as this is a CPU-bound operation. Luckily, async/await will do all the work for you. Console applications don't have a synchronization context, which means that the part of the method after an await will run on a thread pool thread, optimally utilizing all CPU cores.
private async Task MakeRequestAndProcessResult(int i)
{
var result = await MakeRequestAsync(i);
ProcessResult(result);
}
var tasks = iterations.Select(i => MakeRequestAndProcessResult(i)).ToArray();
To achieve the same behavior in an environment with a synchronization context (for example WPF or WinForms), use ConfigureAwait(false).
var result = await MakeRequestAsync(i).ConfigureAwait(false);
To wait for the tasks to complete, you can use await Task.WhenAll(tasks) inside an async method or Task.WaitAll(tasks) in Main().
Throwing 100k requests at a web service will probably kill it, so you will have to limit the rate. You can check the answers to this question for some options on how to do it.
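One common option is a SemaphoreSlim gate around each request (a sketch; the limit of 20 is illustrative and should be tuned empirically):
var throttler = new SemaphoreSlim(20); // at most 20 requests in flight
var tasks = iterations.Select(async i =>
{
    await throttler.WaitAsync();
    try
    {
        await MakeRequestAndProcessResult(i);
    }
    finally
    {
        throttler.Release();
    }
}).ToArray();
await Task.WhenAll(tasks);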
Parallel.ForEach should be able to use more threads than there are cores if you explicitly set the MaxDegreeOfParallelism property of the ParallelOptions parameter (in the overload of ForEach that takes one) - see https://msdn.microsoft.com/en-us/library/system.threading.tasks.paralleloptions.maxdegreeofparallelism(v=vs.110).aspx
You should be able to set this to 1,000 to get it to use 1,000 threads or even more, but that might not be efficient due to threading overheads. You may wish to experiment (e.g., loop from 100 to 1,000 in steps of 100, submitting 1,000 requests each time and timing start to finish) or even set up some kind of self-tuning algorithm.

Using Parallel.ForEach and Task.Factory.StartNew for database inserts/updates

I am working in .NET 4.0 and my code is supposed to do this:
I have a Web API exposed to users. In it I have a collection of objects, basically a ConcurrentBag containing some objects. I have to iterate over each object in this collection and then insert/update its data in the database. The count of objects can be high (200-300), and on top of that there can be multiple concurrent users using my API.
Now, the insertion/update is very slow, as a connection is made to the database for each record, which makes this process very slow. Unfortunately I can't change the logic for this.
To improve performance I am using Parallel.ForEach instead of a routine foreach, as each iteration is different. Also, I am creating a separate task for each insertion in the db.
Here is My Code
var tasks = new List<Task>(allRecordings.Count);//Creating a Task List
Parallel.ForEach(allRecordings, recording =>
{
var recordingItem = recording;
//Lines oF Code
//
if (someConditions) {
var task = Task.Factory.StartNew(
() => SaveRecordingDetailsToDb(ref recordingItem, device.Locale));
recording.Title = recordingItem.Title;
recording.ProgramId =recordingItem.ProgramId;
recording.SeriesId = recordingItem.SeriesId;
tasks.Add(task);//Adding Task to List
}
});
Task.WaitAll(tasks.ToArray()); // Waiting for all tasks to complete before going back to the main function
Can a memory leak occur in the above block when there are multiple concurrent requests consuming this same API?
Also, will using Parallel.ForEach be better here than a normal foreach?
The TPL (Task Parallel Library) was designed specifically for compute-bound operations - operations which can be completed in parallel (like calculations on different CPU cores). In your case, you write into the database, so basically you write something to a file system, i.e. this is an IO operation. IO operations cannot be executed in parallel in the sense of pure parallelism. If you run several IO operations simultaneously, they will simply interrupt each other and thus take more time to complete compared with running them one by one. Of course, the database server should somehow handle this situation, but it will not be significantly faster than sending requests to it one by one; more likely, it will be even slower.

Difference between ThreadPool.QueueUserWorkItem and Parallel.ForEach?

What is the main difference between the two following approaches:
ThreadPool.QueueUserWorkItem
Clients objClient = new Clients();
List<Clients> objClientList = Clients.GetClientList();
foreach (var list in objClientList)
{
ThreadPool.QueueUserWorkItem(new WaitCallback(SendFilesToClient), list);
}
System.Threading.Tasks.Parallel ForEach
Clients objClient = new Clients();
List<Clients> objClientList = Clients.GetClientList();
Parallel.ForEach<Clients>(objClientList, list =>
{
SendFilesToClient(list);
});
I am new to multi-threading and want to know what's going to happen in each case (in terms of the execution process) and what the level of multi-threading is for each approach. Help me visualize both processes.
SendFilesToClient: Gets data from database, converts to Excel and sends the Excel file to respective client.
Thanks!
The main difference is functional. Parallel.ForEach will block (by design), so it will not return until all of the objects have been processed. Your foreach loop queuing thread pool work items will push the work onto background threads and not block.
Also, the Parallel.ForEach version has another major advantage - unhandled exceptions are pushed back to the call site, instead of being left unhandled on a ThreadPool thread.
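If you do stick with QueueUserWorkItem and need to know when all of the work has finished, you have to add your own signaling. One sketch, using CountdownEvent:
using (var done = new CountdownEvent(objClientList.Count))
{
    foreach (var client in objClientList)
    {
        ThreadPool.QueueUserWorkItem(state =>
        {
            try { SendFilesToClient((Clients)state); }
            finally { done.Signal(); } // count down even if SendFilesToClient throws
        }, client);
    }
    done.Wait(); // blocks until every queued item has signaled
}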
In general, Parallel.ForEach will be more efficient. Both options use the ThreadPool, but Parallel.ForEach does intelligent partitioning to prevent overthreading and to reduce the amount of overhead required by the scheduler. Individual tasks (which will map to ThreadPool threads) get reused, and effectively "pooled" to lower overhead, especially if SendFilesToClient is a fast operation (which, in this case, will not be true).
Note that you can also, as a third option, use PLINQ:
objClientList.AsParallel().ForAll(SendFilesToClient);
This will be very similar to the Parallel.ForEach method in terms of performance and functionality.
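If you need to cap the number of threads PLINQ uses, it also lets you say so explicitly (the 4 here is illustrative):
objClientList.AsParallel().WithDegreeOfParallelism(4).ForAll(SendFilesToClient);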
