I need to make 100,000s of lightweight (i.e. small Content-Length) web requests from a C# console app. What is the fastest way I can do this (i.e. have completed all the requests in the shortest possible time) and what best practices should I follow? I can't fire and forget because I need to capture the responses.
Presumably I'd want to use the async web requests methods, however I'm wondering what the impact of the overhead of storing all the Task continuations and marshalling would be.
Memory consumption is not an overall concern, the objective is speed.
Presumably I'd also want to make use of all the cores available.
So I can do something like this:
Parallel.ForEach(iterations, i =>
{
var response = await MakeRequest(i);
// do thing with response
});
but that won't make me any faster than just my number of cores.
I can do:
Parallel.ForEach(iterations, i =>
{
var response = MakeRequest(i);
response.GetAwaiter().OnCompleted(() =>
{
// do thing with response
});
});
but how do I keep my program running after the ForEach. Holding on to all the Tasks and WhenAlling them feels bloated, are there any existing patterns or helpers to have some kind of Task queue?
Is there any way to get any better, and how should I handle throttling/error detection? For instance, if the remote endpoint is slow to respond I don't want to continue spamming it.
I understand I also need to do:
ServicePointManager.DefaultConnectionLimit = int.MaxValue
Anything else necessary?
The Parallel class does not work with async loop bodies so you can't use it. Your loop body completes almost immediately and returns a task. There is no parallelism benefit here.
This is a very easy problem. Use one of the standard solutions for processing a series of items asynchronously with a given DOP (this one is good: http://blogs.msdn.com/b/pfxteam/archive/2012/03/05/10278165.aspx. Use the last piece of code).
You need to empirically determine the right DOP. Simply try different values. There is no theoretical way to derive the best value because it is dependent on many things.
The connection limit is the only limit that's in your way.
response.GetAwaiter().OnCompleted
Not sure what you tried to accomplish there... If you comment I'll explain the misunderstanding.
The operation you want to perform is
Call an I/O method
Process the result
You are correct that you should use an async version of the I/O method. What's more, you only need 1 thread to start all of the I/O operations. You will not benefit from parallelism here.
You will benefit from parallelism in the second part - processing the result, as this will be a CPU-bound operation. Luckily, async/await will do all the job for you. Console applications don't have a synchronization context. It means that the part of the method after an await will run on a thread pool thread, optimally utilizing all CPU cores.
private async Task MakeRequestAndProcessResult(int i)
{
var result = await MakeRequestAsync();
ProcessResult(result);
}
var tasks = iterations.Select(i => MakeRequestAndProcessResult(i)).ToArray();
To achieve the same behavior in an environment with a synchronization context (for example WPF or WinForms), use ConfigureAwait(false).
var result = await MakeRequestAsync().ConfigureAwait(false);
To wait for the tasks to complete, you can use await Task.WhenAll(tasks) inside an async method or Task.WaitAll(tasks) in Main().
Throwing 100k requests at a web service will probably kill it, so you will have to limit it. You can check answers to this question to find some options how to do it.
Parallel.ForEach should be able to use more threads than there are cores if you explicitly set the MaxDegreeOfParallelism property of the ParallelOptions parameter (in the overload of ForEach where there is that parameter) - see https://msdn.microsoft.com/en-us/library/system.threading.tasks.paralleloptions.maxdegreeofparallelism(v=vs.110).aspx
You should be able to set this on 1,000 to get it to use 1,000 threads or even more, but that might not be efficient due to the threading overheads. You may wish to experiment (eg. loop from eg. 100 to 1,000 stepping in 100s to try submitting 1,000 requests each time and time start to finish) or even set up some kind of self-tuning algorithm.
Related
I have an Async processing pipeline. I'm implementing a constraint such that I need to limit the number of submissions to the next stage. For my component, I have:
a single input source (items are tagged with a source id)
a single destination that I need to propagate the inputs to in a round-robin fashion
If capacity is available for multiple clients, I'll forward a message for each (i.e. if I wake because client 3's semaphore has finally become available, I may first send a message for client 2, then 3, etc)
The processing loop is thus waiting on one or more of the following conditions to continue processing:
more input has arrived (it might be for a client that is not at its limit)
capacity has been released for a client that we are holding data for
Ideally, I'd thus use Task.WhenAny with
a task representing the input c.Reader.WaitToReadAsync(ct).AsTask()
N tasks representing the clients for which we are holding data, but it's not yet valid for submission (the Wait for the SemaphoreSlim would fail)
SemaphoreSlim's AvailableWaitHandle would be ideal - I want to know when it's available but I don't want to reserve it yet as I have a chain of work to process - I just want to know if one of my trigger conditions has arisen
Is there a way to await the AvailableWaitHandle ?
My current approach is a hack derived from this answer to a similar question by #usr - posting for reference
My actual code is here - there's also some more detail about the whole problem in my self-answer below
I want to know when it's available but I don't want to reserve it yet as I have a chain of work to process
This is very strange and it seems like SemaphoreSlim may not be what you want to use. SemaphoreSlim is a kind of mutual exclusion object that can allow multiple takers. It is sometimes used for throttling. But I would not want to use it as a signal.
It seems like something more like an asynchronous manual-reset event would be what you really want. Or, if you wanted to maintain a locking/concurrent-collection kind of concept, an asynchronous monitor or condition variable.
That said, it is possible to use a SemaphoreSlim as a signal. I just strongly hesitate suggesting this as a solution, since it seems like this requirement is highlighting a mistake in the choice of synchronization primitive.
Is there a way to await the AvailableWaitHandle?
Yes. You can await anything by using TaskCompletionSource. For WaitHandles in particular, ThreadPool.RegisterWaitForSingleObject gives you an efficient wait.
So, what you want to do is create a TCS, register the handle with the thread pool, and complete the TCS in the callback for that handle. Keep in mind that you want to be sure that the TCS is eventually completed and that everything is disposed properly.
I have support for this in my AsyncEx library (WaitHandleAsyncFactory.FromWaitHandle); code is here.
My AsyncEx library also has support for asynchronous manual-reset events, monitors, and condition variables.
Variation of #usr's answer which solved my problem
class SemaphoreSlimExtensions
public static Task AwaitButReleaseAsync(this SemaphoreSlim s) =>
s.WaitAsync().ContinueWith(_t -> s.Release(), TaskContinuationOptions.ExecuteSynchronously);
public static bool TryTake(this SemaphoreSlim s) =>
s.Wait(0);
In my use case, the await is just a trigger for synchronous logic that then walks the full set - the TryTake helper is in my case a natural way to handle the conditional acquisition of the semaphore and the processing that's contingent on that. My wait looks like this:
SemaphoreSlim[] throttled = Enumerable.Empty();
while (!ct.IsCancellationRequested)
{
var throttledClients = from s in throttled select s.AwaitButReleaseAsync();
var timeout = 3000;
var otherConditions = new[] { input.Reader.WaitToReadAsync().ToTask(), Task.Delay(ct, timeout) };
await Task.WhenAny(throttledClients.Append(otherConditions));
throttled = propagateStuff();
}
The actual code is here - I have other cases that follow the same general pattern. The bottom line is that I want to separate the concern of waiting for the availability of capacity on a SemaphoreSlim from actually reserving that capacity.
I'm trying to refactoring my project and now I'm trying to research for best ways to increase the application's performance.
Question 1. SpinLock vs Interlocked
To creating a counter, which way has better performance.
Interlocked.increament(ref counter)
Or
SpinLock _spinlock = new SpinLock()
bool lockTaken = false;
try
{
_spinlock.Enter(ref lockTaken);
counter = counter + 1;
}
finally
{
if (lockTaken) _spinlock.Exit(false);
}
And if we need to increment another counter, like counter2, should we declare another SpinLock object? or its enough to use another boolean object?
Question 2. Handling nested tasks or better replacement
In this current version of my application, I used tasks, adding each new task to an array and then used Task.WaitAll()
After a lot of research I just figured out that using Parallel.ForEach has better performance, But how can I control the number of current threads? I know I can specify a MaxDegreeOfParallelism in a ParallelOptions parameter, but the problem is here, every time crawl(url) method runs, It just create another limited number of threads, I mean if I set MaxDegree to 10, every time crawl(url) runs, another +10 will created, am I right?, so how can I prevent this? should I use semaphore and threads instead of Parallel? Or there is a better way?
public void Start() {
Parallel.Invoke(() => { crawl(url) } );
}
crawl(string url) {
var response = getresponse(url);
Parallel.foreach(response.links, ParallelOption, link => {
crawl(link);
});
}
Question 3. Notify when all Jobs (and nested jobs) finished.
And my last question is how can I understand when all my jobs has finished?
There a is a lot of misconceptions here, I'll point out just a few.
To creating a counter, which way has better performance.
They both do, depending on your exact situation
After a lot of research I just figured out that using Parallel.ForEach
has better performance
This is also very suspect, and actually just wrong. Once again it depends on what you want to do.
I know I can specify a MaxDegreeOfParallelism in a ParallelOptions
parameter, but the problem is here, every time crawl(url) method runs, It just create another limited number of threads
Once again this is wrong, this is your own implementation detail, and depends on how you do it. also TPL MaxDegreeOfParallelism is only a suggestion, it will only do what it thinks heuristically is best for you.
should I use semaphore and threads instead of Parallel? Or there is a
better way?
The answer is a resounding yes.
OK, let's have a look at what you are doing. You say you are making a crawler. A crawler, accesses the internet, each time you access the internet or a network resource or the file system you are (said simplistically) waiting around for an IO completion port callbacks. This is what's knows as an IO workload.
With IO Bound tasks we don't want to tie up the thread pool with threads waiting for IO completion ports. It's inefficient, you are using up valuable resources waiting for callback on threads that are effectively paused.
So for IO bound work, we don't want to spin up new tasks, and we don't want to use Parallel ForEach to wait around using up threads waiting for events to happen. The most appropriate modern pattern for IO bound tasks is the async and await pattern.
For CPU bound work (if you want to use as much CPU as you can) smash the thread pool, use TPL Parallel or as many tasks that is effective.
The async and await pattern works well with completion ports, because instead of waiting around idly for a callback it will give the threads back and allow them to be reused.
...
However what I suggest is using another approach, where you can take advantage of async and await and also control degrees of parallelisation. This enables you to be good to your thread pool, not using up resources waiting for callbacks, and allowing IO to be IO. I give you TPL DataFlow ActionBlock and TransformManyBlocks
This subject is a little above a simple working example, but I can assure you its an appropriate path for what you are doing. What I suggest is you have a look at the following links.
Stephen Cleary There Is No Thread
Stephen Cleary Introduction to Dataflow
Msdn Blogs Parallel Programming with .NET
Stephen Toub Going Deep Stephen Toub: Inside TPL Dataflow, In this he even talks about crawler examples.
Some random blog on dataflow and crawlers Tpl Dataflow walkthrough – Part 5
In Summary, there are many ways to do what you want to do, and there are many technologies. But the main thing is you have some very skewed ideas about parallel programming. You need to hit the books, hit the blogs, and start getting some really solid design principles from the ground up, and stop trying to figure this all out for your self by nit picking small bits of information.
I'd suggest looking at Microsoft's Reactive Framework for this. You can write your Crawl function like this:
public IObservable<Response> Crawl(string url)
{
return
from r in Observable.Start(() => GetResponse(url))
from l in r.Links.ToObservable()
from r2 in Crawl(l).StartWith(r)
select r2;
}
Then to call it try this:
IObservable<Response> crawls = Crawl("www.microsoft.com");
IDisposable subscription =
crawls
.Subscribe(
r => { /* process each response as it arrives */ },
() => { /* All crawls complete */ });
Done. It handles all the threading for you. Just NuGet "System.Reactive".
So I just cant grasp the concept here.
I have a Method that uses the Parallel class with the Foreach method.
But the thing I dont understand is, does it create new threads so it can run the function faster?
Let's take this as an example.
I do a normal foreach loop.
private static void DoSimpleWork()
{
foreach (var item in collection)
{
//DoWork();
}
}
What that will do is, it will take the first item in the list, assign the method DoWork(); to it and wait until it finishes. Simple, plain and works.
Now.. There are three cases I am curious about
If I do this.
Parallel.ForEach(stringList, simpleString =>
{
DoMagic(simpleString);
});
Will that split up the Foreach into let's say 4 chunks?
So what I think is happening is that it takes the first 4 lines in the list, assigns each string to each "thread" (assuming parallel creates 4 virtual threads) does the work and then starts with the next 4 in that list?
If that is wrong please correct me I really want to understand how this works.
And then we have this.
Which essentially is the same but with a new parameter
Parallel.ForEach(stringList, new ParallelOptions() { MaxDegreeOfParallelism = 32 }, simpleString =>
{
DoMagic(simpleString);
});
What I am curious about is this
new ParallelOptions() { MaxDegreeOfParallelism = 32 }
Does that mean it will take the first 32 strings from that list (if there even is that many in the list) and then do the same thing as I was talking about above?
And for the last one.
Task.Factory.StartNew(() =>
{
Parallel.ForEach(stringList, simpleString =>
{
DoMagic(simpleString);
});
});
Would that create a new task, assigning each "chunk" to it's own task?
Do not mix async code with parallel. Task is for async operations - querying a DB, reading file, awaiting some comparatively-computation-cheap operation such that your UI won't be blocked and unresponsive.
Parallel is different. That's designed for 1) multi-core systems and 2) computational-intensive operations. I won't go in details how it works, that kind of info could be found in an MS documentation. Long story short, Parallel.For most probably will make it's own decision on what exactly when and how to run. It might disobey you parameters, i.e. MaxDegreeOfParallelism or somewhat else. The whole idea is to provide the best possible parallezation, thus complete your operation as fast as possible.
Parallel.ForEach perform the equivalent of a C# foreach loop, but with each iteration executing in parallel instead of sequentially. There is no sequencing, it depends on whether the OS can find an available thread, if there is it will execute
MaxDegreeOfParallelism
By default, For and ForEach will utilize as many threads as the OS provides, so changing MaxDegreeOfParallelism from the default only limits how many concurrent tasks will be used by the application.
You do not need to modify this parameter in general but may choose to change it in advanced scenarios:
When you know that a particular algorithm you're using won't scale
beyond a certain number of cores. You can set the property to avoid
wasting cycles on additional cores.
When you're running multiple algorithms concurrently and want to
manually define how much of the system each algorithm can utilize.
When the thread pool's heuristics is unable to determine the right
number of threads to use and could end up injecting too many
threads. e.g. in long-running loop body iterations, the
thread pool might not be able to tell the difference between
reasonable progress or livelock or deadlock, and might not be able
to reclaim threads that were added to improve performance. You can set the property to ensure that you don't use more than a reasonable number of threads.
Task.StartNew is usually used when you require fine-grained control for a long-running, compute-bound task, and like what #Сергей Боголюбов mentioned, do not mix them up
It creates a new task, and that task will create threads asynchronously to run the for loop
You may find this ebook useful: http://www.albahari.com/threading/#_Introduction
does the work and then starts with the next 4 in that list?
This depends on your machine's hardware and how busy the machine's cores are with other processes/apps your CPU is working on
Does that mean it will take the first 32 strings from that list (if there even if that many in the list) and then do the same thing as I was talking about above?
No, there's is no guarantee that it will take first 32, could be less. It will vary each time you execute the same code
Task.Factory.StartNew creates a new tasks but it will not create a new one for each chunk as you expect.
Putting a Parallel.ForEach inside a new Task will not help you further reduce the time taken for the parallel tasks themselves.
I am new to threading and I need a clarification for the below scenario.
I am working on apple push notification services. My application demands to send notifications to 30k users when a new deal is added to the website.
can I split the 30k users into lists, each list containing 1000 users and start multiple threads or can use task?
Is the following way efficient?
if (lstDevice.Count > 0)
{
for (int i = 0; i < lstDevice.Count; i += 2)
{
splitList.Add(lstDevice.Skip(i).Take(2).ToList<DeviceHelper>());
}
var tasks = new Task[splitList.Count];
int count=0;
foreach (List<DeviceHelper> lst in splitList)
{
tasks[count] = Task.Factory.StartNew(() =>
{
QueueNotifications(lst, pMessage, pSubject, pNotificationType, push);
},
TaskCreationOptions.None);
count++;
}
QueueNotification method will just loop through each list item and creates a payload like
foreach (DeviceHelper device in splitList)
{
if (device.PlatformType.ToLower() == "ios")
{
push.QueueNotification(new AppleNotification()
.ForDeviceToken(device.DeviceToken)
.WithAlert(pMessage)
.WithBadge(device.Badge)
);
Console.Write("Waiting for Queue to Finish...");
}
}
push.StopAllServices(true);
Technically it is sure possible to split a list and then start threads that runs your List in parallel. You can also implement everything yourself, as you already have done, but this isn't a good approach. At first splitting a List into chunks that gets processed in parallel is already what Parallel.For or Parallel.ForEach does. There is no need to re-implement everything yourself.
Now, you constantly ask if something can run 300 or 500 notifications in parallel. But actually this is not a good question because you completly miss the point of running something in parallel.
So, let me explain you why that question is not good. At first, you should ask yourself why do you want to run something in parallel? The answer to that is, you want that something runs faster by using multiple CPU-cores.
Now your simple idea is probably that spawning 300 or 500 threads is faster, because you have more threads and it runs more things "in parallel". But that is not exactly the case.
At first, creating a thread is not "free". Every thread you create has some overhead, it takes some CPU-time to create a thread, and also it needs some memory. On top of that, if you create 300 threads it doesn't mean 300 threads run in parallel. If you have for example an 8 core CPU only 8 threads really can run in parallel. Creating more threads can even hurt your performance. Because now your program needs to switch constanlty between threads, that also cost CPU-performance.
The result of all that is. If you have something lightweight some small code that don't do a lot of computation it ends that creating a lot of threads will slow down your application instead of running faster, because the managing of your threads creates more overhead than running it on (for example) 8 cpu-cores.
That means, if you have a list of 30,000 of somewhat. It usally end that it is faster to just split your list in 8 chunks and work through your list in 8 threads as creating 300 Threads.
Your goal should never be: Can it run xxx things in parallel?
The question should be like: How many threads do i need, and how much items should every thread process to get my work as fastest done.
That is an important difference because just spawning more threads doesn't mean something ends up beeing fast.
So how many threads do you need, and how many items should every thread process? Well, you can write a lot of code to test it. But the amount changes from hardware to hardware. A PC with just 4 cores have another optimum than a system with 8 cores. If what you are doing is IO bound (for example read/write to disk/network) you also don't get more speed by increasing your threads.
So what you now can do is test everything, try to get the correct thread number and do a lot of benchmarking to find the best numbers.
But actually, that is the whole purpose of the TPL library with the Task<T> class. The Task<T> class already looks at your computer how many cpu-cores it have. And when you are running your Task it automatically tries to create as much threads needed to get the maximum out of your system.
So my suggestion is that you should use the TPL library with the Task<T> class. In my opinion you should never create Threads directly yourself or doing partition yourself, because all of that is already done in TPL.
I think the Task-Class is a good choise for your aim, becuase you have an easy handling over the async process and don't have to deal with Threads directly.
Maybe this help: Task vs Thread differences
But to give you a better answer, you should improve your question an give us more details.
You should be careful with creating to much parallel threads, because this can slow down your application. Read this nice article from SO: How many threads is too many?. The best thing is you make it configurable and than test some values.
I agree Task is a good choice however creating too many tasks also bring risks to your system and for failures, your decision is also a factor to come up a solution. For me I prefer MSQueue combining with thread pool.
If you want parallelize the creation of the push notifications and maximize the performance by using all CPU's on the computer you should use Parallel.ForEach:
Parallel.ForEach(
devices,
device => {
if (device.PlatformType.ToUpperInvariant() == "IOS") {
push.QueueNotification(
new AppleNotification()
.ForDeviceToken(device.DeviceToken)
.WithAlert(message)
.WithBadge(device.Badge)
);
}
}
);
push.StopAllServices(true);
This assumes that calling push.QueueNotification is thread-safe. Also, if this call locks a shared resource you may see lower than expected performance because of lock contention.
To avoid this lock contention you may be able to create a separate queue for each partition that Parallel.ForEach creates. I am improvising a bit here because some details are missing from the question. I assume that the variable push is an instance of the type Push:
Parallel.ForEach(
devices,
() => new Push(),
(device, _, push) => {
if (device.PlatformType.ToUpperInvariant() == "IOS") {
push.QueueNotification(
new AppleNotification()
.ForDeviceToken(device.DeviceToken)
.WithAlert(message)
.WithBadge(device.Badge)
);
}
return push;
},
push.StopAllServices(true);
);
This will create a separate Push instance for each partition that Parallel.ForEach creates and when the partition is complete it will call StopAllServices on the instance.
This approach should perform no worse than splitting the devices into N lists where N is the number of CPU's and and starting either N threads or N tasks to process each list. If one thread or task "gets behind" the total execution time will be the execution time of this "slow" thread or task. With Parallel.ForEach all CPU's are used until all devices have been processed.
I have a helper method returns IEnumerable<string>. As the collection grows, it's slowing down dramatically. My current approach is to do essentially the following:
var results = new List<string>();
foreach (var item in items)
{
results.Add(await item.Fetch());
}
I'm not actually sure whether this asynchronicity gives me any benefit (it sure doesn't seem like it), but all methods up the stack and to my controller's actions are asynchronous:
public async Task<IHttpActionResult> FetchAllItems()
As this code is ultimately used by my API, I'd really like to parallelize these all for what I hope would be great speedup. I've tried .AsParallel:
var results = items
.AsParallel()
.Select(i => i.Fetch().Result)
.AsList();
return results;
And .WhenAll (returning a string[]):
var tasks = items.Select(i => i.Fetch());
return Task<string>.WhenAll<string>(tasks).Result;
And a last-ditch effort of firing off all long-running jobs and sequentially awaiting them (hoping that they were all running in parallel, so waiting on one would let all others nearly complete):
var tasks = new LinkedList<Task<string>>();
foreach (var item in items)
tasks.AddLast(item.Fetch());
var results = new LinkedList<string>();
foreach (var task in tasks)
results.AddLast(task.Result);
In every test case, the time it takes to run is directly proportional to the number of items. There's no discernable speedup by doing this. What am I missing in using Tasks and await/async?
There's a difference between parallel and concurrent. Concurrency just means doing more than one thing at a time, whereas parallel means doing more than one thing on multiple threads. async is great for concurrency, but doesn't (directly) help you with parallelism.
As a general rule, parallelism on ASP.NET should be avoided. This is because any parallel work you do (i.e., AsParallel, Parallel.ForEach, etc) shares the same thread pool as ASP.NET, so that reduces ASP.NET's capability to handle other requests. This impacts the scalability of your web service. It's best to leave the thread pool to ASP.NET.
However, concurrency is just fine - specifically, asynchronous concurrency. This is where Task.WhenAll comes in. Code like this is what you should be looking for (note that there is no call to Task<T>.Result):
var tasks = items.Select(i => i.Fetch());
return await Task<string>.WhenAll<string>(tasks);
Given your other code samples, it would be good to run through your call tree starting at Fetch and replace all Result calls with await. This may be (part of) your problem, because Result forces synchronous execution.
Another possible problem is that the underlying resource being fetched does not support concurrent access, or there may be throttling that you're not aware of. E.g., if Fetch retrieves data from another web service, check out System.Net.ServicePointManager.DefaultConnectionLimit.
There is also a configurable limitation on the max connections to a single server that can make download performance independent to the number of client threads.
To change the connection limit use
ServicePointManager.DefaultConnectionLimit
Maximum concurrent requests for WebClient, HttpWebRequest, and HttpClient