Asynchonous programming for method loops - c#

I am trying to learn about asynchronous programming and how I can benefit from it.
My hope is that I can use it to improve performance whenever I'm looping over a method that takes an significant time to complete, like the following method.
string AddStrings()
{
string result = "";
for (int i = 0; i < 10000; i++)
{
result += "hi";
}
return result;
}
Obviously this method doesn't have much value, and I purposely made it ineffecient, in order to test the benefits of asynchronous programming. The test is done by looping over the method 100 times, first synchronously and then asynchronously.
Stopwatch watch = new Stopwatch();
watch.Start();
List<string> results = WorkSync();
//List<string> results = await WorkAsyncParallel();
watch.Stop();
Console.WriteLine(watch.ElapsedMilliseconds);
List<string> WorkSync()
{
var stringList = new List<string>();
for (int i = 0; i < 100; i++)
{
stringList.Add(AddStrings());
}
return stringList;
}
async Task<List<string>> WorkAsyncParallel()
{
var taskList = new List<Task<string>>();
for (int i = 0; i < 100; i++)
{
taskList.Add(Task.Run(() => AddStrings()));
}
var results = (await Task.WhenAll(taskList)).ToList();
return results;
}
Super optimistically (naively), I was hoping that the asynchronous loop would be 100 times as fast as the synchronous loop, as all the tasks are running at the same time. While that didn't exactly turn out to be the case, the time of the loop was decreased by more than two thirds, from around 5000 miliseconds to 1500 miliseconds!
Now my questions are:
What makes the asynchronous loop faster than the synchronous loop, but not nearly 100 times faster? I'm guessing each of the 100 tasks are fighthing for a limited amount of CPU?
Is this a valid method to improve performance when looping methods?
Thank you in advance.

My hope is that I can use it to improve performance
Not really. Concurrency (parallel or asynchronous) can improve performance, but asynchronous code on its own is more about freeing up threads. Asynchronous code provides two main benefits:
Server-side apps get better scalability. By using fewer threads, asynchronous code can scale further and faster than synchronous code.
UI apps get better responsiveness. By freeing up the UI thread, asynchronous code provides a better user experience.
Neither of these have much to do with performance. E.g., when comparing a synchronous server-side request handler with its basic asynchronous counterpart, the asynchronous one is usually slower (slightly), but the server as a whole scales better.
That said, asynchronous code can enable natural concurrency (e.g., Task.WhenAll). And if you do end up adding concurrency, then you can see some performance benefits - sometimes quite drastic ones.
The test is done by looping over the method 100 times, first synchronously and then asynchronously.
Technically, you're comparing single-threaded and parallel code, here.
As Jeroen pointed out in the comments, Parallel or PLINQ are the proper tools to use if you have a parallel problem. While you can use Task.Run, it's a very low-level tool for parallel programming. Task.Run is more commonly used for "shove this one thing to the thread pool", not "shove these 100 things to the thread pool".
What makes the asynchronous loop faster than the synchronous loop, but not nearly 100 times faster? I'm guessing each of the 100 tasks are fighthing for a limited amount of CPU?
Yes. Most machines these days have multi-core CPUs, and each core can do one thing at a time. So if you have 4 or 8 cores, that's the limit of your parallelism.
Is this a valid method to improve performance when looping methods?
If you have a lot of work to do in parallel, then parallel programming is acceptable. Again, I recommend using higher-level constructs like Parallel or PLINQ, which have better built-in partitioning strategies and other optimizations, which will be more efficient than throwing a bunch of tasks at the thread pool.
Parallel programming does have its caveats:
If your work is too fine-grained, then parallel work can end up being slower, as the overhead of partitioning, queueing, and scheduling erases the gains from concurrency.
You generally want to avoid parallelism in some situations such as handling a request on the server side. It's just usually not a good idea to allow one request to consume all the CPU resources of the whole server.
Now, if you want to test out asynchronous benefits, then I recommend using an operation that is inherently asynchronous (e.g., a client web request). Say, if your code was hitting 100 URLs. That would be something more naturally asynchronous that doesn't require thread pool threads.

Related

CPU benchmark test: Tasks vs ThreadPool vs Thread

I posted another SO question here, and as a follow-up, my colleague did a test, seen below, as some form of "counter" to the argument for async/await/Tasks.
(I am aware that the lock on resultList isn't needed, disregard that)
I am aware that async/await and Tasks is not made to handle CPU-intensive tasks but instead handle I/O operations that are done by the OS. The benchmark below is a CPU-intensive task, so the test is flawed from start.
However, as I understand it, using new Task().Start() will schedule the operation on the ThreadPool and execute the test code on different threads on the ThreadPool. Wouldnt that mean that the first and second test are more or less the same? (I'm guessing not, please explain
Why then the big difference between them?
some form of "counter" to the argument for async/await/Tasks.
The posted code has absolutely nothing to do with async or await. It's comparing three different kinds of parallelism:
Dynamic Task Parallelism.
Direct threadpool access.
Manual multithreading with manual partitioning.
The first two are somewhat comparable. Of course, direct threadpool access will be faster than Dynamic Task Parallelism. But what these tests don't show is that direct threadpool access is much harder to do correctly. In particular, when you are running real-world code and need to handle exceptions and return values, you have to add in boilerplate code and object instances to the direct threadpool access code that slows it down.
The third one is not comparable at all. It just uses 10 manual threads. Again, this example ignores the additional complexity necessary in real-world code; specifically, the need to handle exceptions and return values. It also assumes a partition size, which is problematic; real-world code does not have that luxury. If you're managing your own set of threads, then you have to decide things like how quickly you should increase the number of threads when the queue has many items, and how quickly you should end threads when the queue is empty. These are all difficult questions that add lots of code to the #3 test before you're really comparing the same thing.
And that's not even to say anything about the cost of maintenance. In my experience (i.e., as an application developer), micro-optimizations are just not worth it. Even if you took the "worst" (#1) approach, you're losing about 7 microseconds per item. That is an unimaginably small amount of savings. As a general rule, developer time is far more valuable to your company than user time. If your users have to process a hundred thousand items, the difference would barely be perceptible. If you were to adopt the "best" (#3) approach, the code would be much less maintainable, particularly considering the boilerplate and thread management code necessary in production code and not shown here. Going with #3 would probably cost your company far more in terms of developer time just writing or reading the code than it would ever save in terms of user time.
Oh, and the funniest part of all this is that with all these different kinds of parallelism compared, they didn't even include the one that is most suitable for this test: PLINQ.
static void Main(string[] args)
{
TaskParallelLibrary();
ManualThreads();
Console.ReadKey();
}
static void ManualThreads()
{
var queue = new List<string>();
for (int i = 0; i != 1000000; ++i)
queue.Add("string" + i);
var resultList = new List<string>();
var stopwatch = Stopwatch.StartNew();
var counter = 0;
for (int i = 0; i != 10; ++i)
{
new Thread(() =>
{
while (true)
{
var t = "";
lock (queue)
{
if (counter >= queue.Count)
break;
t = queue[counter];
++counter;
}
t = t.Substring(0, 5);
string t2 = t.Substring(0, 2) + t;
lock (resultList)
resultList.Add(t2);
}
}).Start();
}
while (resultList.Count < queue.Count)
Thread.Sleep(1);
stopwatch.Stop();
Console.WriteLine($"Manual threads: Processed {resultList.Count} in {stopwatch.Elapsed}");
}
static void TaskParallelLibrary()
{
var queue = new List<string>();
for (int i = 0; i != 1000000; ++i)
queue.Add("string" + i);
var stopwatch = Stopwatch.StartNew();
var resultList = queue.AsParallel().Select(t =>
{
t = t.Substring(0, 5);
return t.Substring(0, 2) + t;
}).ToList();
stopwatch.Stop();
Console.WriteLine($"Parallel: Processed {resultList.Count} in {stopwatch.Elapsed}");
}
On my machine, after running this code several times, I find that the PLINQ code outperforms the Manual Threads by about 30%. Sample output on .NET Core 3.0 preview5-27626-15, built for Release, run standalone:
Parallel: Processed 1000000 in 00:00:00.3629408
Manual threads: Processed 1000000 in 00:00:00.5119985
And, of course, the PLINQ code is:
Shorter
More maintainable
More robust (handles exceptions and return types)
Less awkward (no need to poll for completion)
More portable (partitions based on number of processors)
More flexible (automatically adjusts the thread pool as necessary based on amount of work)

Handling Parallel Jobs/Threads

I'm trying to refactoring my project and now I'm trying to research for best ways to increase the application's performance.
Question 1. SpinLock vs Interlocked
To creating a counter, which way has better performance.
Interlocked.increament(ref counter)
Or
SpinLock _spinlock = new SpinLock()
bool lockTaken = false;
try
{
_spinlock.Enter(ref lockTaken);
counter = counter + 1;
}
finally
{
if (lockTaken) _spinlock.Exit(false);
}
And if we need to increment another counter, like counter2, should we declare another SpinLock object? or its enough to use another boolean object?
Question 2. Handling nested tasks or better replacement
In this current version of my application, I used tasks, adding each new task to an array and then used Task.WaitAll()
After a lot of research I just figured out that using Parallel.ForEach has better performance, But how can I control the number of current threads? I know I can specify a MaxDegreeOfParallelism in a ParallelOptions parameter, but the problem is here, every time crawl(url) method runs, It just create another limited number of threads, I mean if I set MaxDegree to 10, every time crawl(url) runs, another +10 will created, am I right?, so how can I prevent this? should I use semaphore and threads instead of Parallel? Or there is a better way?
public void Start() {
Parallel.Invoke(() => { crawl(url) } );
}
crawl(string url) {
var response = getresponse(url);
Parallel.foreach(response.links, ParallelOption, link => {
crawl(link);
});
}
Question 3. Notify when all Jobs (and nested jobs) finished.
And my last question is how can I understand when all my jobs has finished?
There a is a lot of misconceptions here, I'll point out just a few.
To creating a counter, which way has better performance.
They both do, depending on your exact situation
After a lot of research I just figured out that using Parallel.ForEach
has better performance
This is also very suspect, and actually just wrong. Once again it depends on what you want to do.
I know I can specify a MaxDegreeOfParallelism in a ParallelOptions
parameter, but the problem is here, every time crawl(url) method runs, It just create another limited number of threads
Once again this is wrong, this is your own implementation detail, and depends on how you do it. also TPL MaxDegreeOfParallelism is only a suggestion, it will only do what it thinks heuristically is best for you.
should I use semaphore and threads instead of Parallel? Or there is a
better way?
The answer is a resounding yes.
OK, let's have a look at what you are doing. You say you are making a crawler. A crawler, accesses the internet, each time you access the internet or a network resource or the file system you are (said simplistically) waiting around for an IO completion port callbacks. This is what's knows as an IO workload.
With IO Bound tasks we don't want to tie up the thread pool with threads waiting for IO completion ports. It's inefficient, you are using up valuable resources waiting for callback on threads that are effectively paused.
So for IO bound work, we don't want to spin up new tasks, and we don't want to use Parallel ForEach to wait around using up threads waiting for events to happen. The most appropriate modern pattern for IO bound tasks is the async and await pattern.
For CPU bound work (if you want to use as much CPU as you can) smash the thread pool, use TPL Parallel or as many tasks that is effective.
The async and await pattern works well with completion ports, because instead of waiting around idly for a callback it will give the threads back and allow them to be reused.
...
However what I suggest is using another approach, where you can take advantage of async and await and also control degrees of parallelisation. This enables you to be good to your thread pool, not using up resources waiting for callbacks, and allowing IO to be IO. I give you TPL DataFlow ActionBlock and TransformManyBlocks
This subject is a little above a simple working example, but I can assure you its an appropriate path for what you are doing. What I suggest is you have a look at the following links.
Stephen Cleary There Is No Thread
Stephen Cleary Introduction to Dataflow
Msdn Blogs Parallel Programming with .NET
Stephen Toub Going Deep Stephen Toub: Inside TPL Dataflow, In this he even talks about crawler examples.
Some random blog on dataflow and crawlers Tpl Dataflow walkthrough – Part 5
In Summary, there are many ways to do what you want to do, and there are many technologies. But the main thing is you have some very skewed ideas about parallel programming. You need to hit the books, hit the blogs, and start getting some really solid design principles from the ground up, and stop trying to figure this all out for your self by nit picking small bits of information.
I'd suggest looking at Microsoft's Reactive Framework for this. You can write your Crawl function like this:
public IObservable<Response> Crawl(string url)
{
return
from r in Observable.Start(() => GetResponse(url))
from l in r.Links.ToObservable()
from r2 in Crawl(l).StartWith(r)
select r2;
}
Then to call it try this:
IObservable<Response> crawls = Crawl("www.microsoft.com");
IDisposable subscription =
crawls
.Subscribe(
r => { /* process each response as it arrives */ },
() => { /* All crawls complete */ });
Done. It handles all the threading for you. Just NuGet "System.Reactive".

How to efficiently make 1000s of web requests as quickly as possible

I need to make 100,000s of lightweight (i.e. small Content-Length) web requests from a C# console app. What is the fastest way I can do this (i.e. have completed all the requests in the shortest possible time) and what best practices should I follow? I can't fire and forget because I need to capture the responses.
Presumably I'd want to use the async web requests methods, however I'm wondering what the impact of the overhead of storing all the Task continuations and marshalling would be.
Memory consumption is not an overall concern, the objective is speed.
Presumably I'd also want to make use of all the cores available.
So I can do something like this:
Parallel.ForEach(iterations, i =>
{
var response = await MakeRequest(i);
// do thing with response
});
but that won't make me any faster than just my number of cores.
I can do:
Parallel.ForEach(iterations, i =>
{
var response = MakeRequest(i);
response.GetAwaiter().OnCompleted(() =>
{
// do thing with response
});
});
but how do I keep my program running after the ForEach. Holding on to all the Tasks and WhenAlling them feels bloated, are there any existing patterns or helpers to have some kind of Task queue?
Is there any way to get any better, and how should I handle throttling/error detection? For instance, if the remote endpoint is slow to respond I don't want to continue spamming it.
I understand I also need to do:
ServicePointManager.DefaultConnectionLimit = int.MaxValue
Anything else necessary?
The Parallel class does not work with async loop bodies so you can't use it. Your loop body completes almost immediately and returns a task. There is no parallelism benefit here.
This is a very easy problem. Use one of the standard solutions for processing a series of items asynchronously with a given DOP (this one is good: http://blogs.msdn.com/b/pfxteam/archive/2012/03/05/10278165.aspx. Use the last piece of code).
You need to empirically determine the right DOP. Simply try different values. There is no theoretical way to derive the best value because it is dependent on many things.
The connection limit is the only limit that's in your way.
response.GetAwaiter().OnCompleted
Not sure what you tried to accomplish there... If you comment I'll explain the misunderstanding.
The operation you want to perform is
Call an I/O method
Process the result
You are correct that you should use an async version of the I/O method. What's more, you only need 1 thread to start all of the I/O operations. You will not benefit from parallelism here.
You will benefit from parallelism in the second part - processing the result, as this will be a CPU-bound operation. Luckily, async/await will do all the job for you. Console applications don't have a synchronization context. It means that the part of the method after an await will run on a thread pool thread, optimally utilizing all CPU cores.
private async Task MakeRequestAndProcessResult(int i)
{
var result = await MakeRequestAsync();
ProcessResult(result);
}
var tasks = iterations.Select(i => MakeRequestAndProcessResult(i)).ToArray();
To achieve the same behavior in an environment with a synchronization context (for example WPF or WinForms), use ConfigureAwait(false).
var result = await MakeRequestAsync().ConfigureAwait(false);
To wait for the tasks to complete, you can use await Task.WhenAll(tasks) inside an async method or Task.WaitAll(tasks) in Main().
Throwing 100k requests at a web service will probably kill it, so you will have to limit it. You can check answers to this question to find some options how to do it.
Parallel.ForEach should be able to use more threads than there are cores if you explicitly set the MaxDegreeOfParallelism property of the ParallelOptions parameter (in the overload of ForEach where there is that parameter) - see https://msdn.microsoft.com/en-us/library/system.threading.tasks.paralleloptions.maxdegreeofparallelism(v=vs.110).aspx
You should be able to set this on 1,000 to get it to use 1,000 threads or even more, but that might not be efficient due to the threading overheads. You may wish to experiment (eg. loop from eg. 100 to 1,000 stepping in 100s to try submitting 1,000 requests each time and time start to finish) or even set up some kind of self-tuning algorithm.

Parallel.For vs regular threads

I'm trying to understand why Parallel.For is able to outperform a number of threads in the following scenario: consider a batch of jobs that can be processed in parallel. While processing these jobs, new work may be added, which then needs to be processed as well. The Parallel.For solution would look as follows:
var jobs = new List<Job> { firstJob };
int startIdx = 0, endIdx = jobs.Count;
while (startIdx < endIdx) {
Parallel.For(startIdx, endIdx, i => WorkJob(jobs[i]));
startIdx = endIdx; endIdx = jobs.Count;
}
This means that there are multiple times where the Parallel.For needs to synchronize. Consider a bread-first graph algorithm algorithm; the number of synchronizations would be quite large. Waste of time, no?
Trying the same in the old-fashioned threading approach:
var queue = new ConcurrentQueue<Job> { firstJob };
var threads = new List<Thread>();
var waitHandle = new AutoResetEvent(false);
int numBusy = 0;
for (int i = 0; i < maxThreads; i++)
threads.Add(new Thread(new ThreadStart(delegate {
while (!queue.IsEmpty || numBusy > 0) {
if (queue.IsEmpty)
// numbusy > 0 implies more data may arrive
waitHandle.WaitOne();
Job job;
if (queue.TryDequeue(out job)) {
Interlocked.Increment(ref numBusy);
WorkJob(job); // WorkJob does a waitHandle.Set() when more work was found
Interlocked.Decrement(ref numBusy);
}
}
// others are possibly waiting for us to enable more work which won't happen
waitHandle.Set();
})));
threads.ForEach(t => t.Start());
threads.ForEach(t => t.Join());
The Parallel.For code is of course much cleaner, but what I cannot comprehend, it's even faster as well! Is the task scheduler just that good? The synchronizations were eleminated, there's no busy waiting, yet the threaded approach is consistently slower (for me). What's going on? Can the threading approach be made faster?
Edit: thanks for all the answers, I wish I could pick multiple ones. I chose to go with the one that also shows an actual possible improvement.
The two code samples are not really the same.
The Parallel.ForEach() will use a limited amount of threads and re-use them. The 2nd sample is already starting way behind by having to create a number of threads. That takes time.
And what is the value of maxThreads ? Very critical, in Parallel.ForEach() it is dynamic.
Is the task scheduler just that good?
It is pretty good. And TPL uses work-stealing and other adaptive technologies. You'll have a hard time to do any better.
Parallel.For doesn't actually break up the items into single units of work. It breaks up all the work (early on) based on the number of threads it plans to use and the number of iterations to be executed. Then has each thread synchronously process that batch (possibly using work stealing or saving some extra items to load-balance near the end). By using this approach the worker threads are virtually never waiting on each other, while your threads are constantly waiting on each other due to the heavy synchronization you're using before/after every single iteration.
On top of that since it's using thread pool threads many of the threads it needs are likely already created, which is another advantage in its favor.
As for synchronization, the entire point of a Parallel.For is that all of the iterations can be done in parallel, so there is almost no synchronization that needs to take place (at least in their code).
Then of course there is the issue of the number of threads. The threadpool has a lot of very good algorithms and heuristics to help it determine how many threads are need at that instant in time, based on the current hardware, the load from other applications, etc. It's possible that you're using too many, or not enough threads.
Also, since the number of items that you have isn't known before you start I would suggest using Parallel.ForEach rather than several Parallel.For loops. It is simply designed for the situation that you're in, so it's heuristics will apply better. (It also makes for even cleaner code.)
BlockingCollection<Job> queue = new BlockingCollection<Job>();
//add jobs to queue, possibly in another thread
//call queue.CompleteAdding() when there are no more jobs to run
Parallel.ForEach(queue.GetConsumingEnumerable(),
job => job.DoWork());
Your creating a bunch of new threads and the Parallel.For is using a Threadpool. You'll see better performance if you were utilizing the C# threadpool but there really is no point in doing that.
I would shy away from rolling out your own solution; if there is a corner case where you need customization use the TPL and customize..

C# No performance gain with Task?

I used Task like below but there is no performance gain. I checked my method which executes in 0-1 seconds but with Task(30 Tasks), it takes 5-12 seconds. Can anyone guide if I have done any mistake. I want to run 30 parallel and expect 30 done in max 2 seconds.
Here is my code:
Task[] tasks = new Task[30];
for (int p = 0; p <= dstable.Tables[0].Rows.Count - 1; p++)
{
MethodParameters newParameter = new MethodParameters();
newParameter.Name = dstable.Tables[0].Rows[p]["Name"].ToString();
tasks[p] = Task.Factory.StartNew(() => ParseUri(newParameter));
Application.DoEvents();
}
try
{
Task.WaitAll(tasks);
//Console.Write("task completed");
}
catch (AggregateException ae)
{
throw ae.Flatten();
}
There are some major problems in your thinking.
does your PC have 30 Cores, so that every core can exactly takes one task? I don't think so
starting a seperate thread also takes some time.
with every concurrent thread more overhead is generated.
Can your problem be solved faster by starting more threads? This is only the case, when all threads do different tasks, like reading from disk, quering a database, computing something, etc.. 10 threads that do all "high-performance" tasks on the cpu, won't give an boost, quite contrary to, because every thread needs to clean up his mess, before he can give some cpu time to the next thread, and that one needs to clean up his mess too.
Check this link out
http://msdn.microsoft.com/en-us/library/ms810437.aspx
You can use the TPL
http://msdn.microsoft.com/en-us/library/dd460717.aspx
they try to guaranty the maximum effect from parallel threads.
Also I recommend this book
http://www.amazon.com/The-Multiprocessor-Programming-Maurice-Herlihy/dp/0123705916
When you really want to solve your problem in under 2 seconds, buy more CPU power ;)
I think you may have missing the main point of using thread.
Creating many threads may lead to more complexity by increasing OS over-head. (Context switching)
Also it will be much harder to manage your execution of code and harder to find and fix bugs if there is any.
Usage of thread may give you advantage when there is
Task need to be perform simultaneously
Building responsive UI

Categories