Handling Parallel Jobs/Threads

Handling Parallel Jobs/Threads - c#

I'm trying to refactoring my project and now I'm trying to research for best ways to increase the application's performance.
Question 1. SpinLock vs Interlocked
To creating a counter, which way has better performance.
Interlocked.increament(ref counter)
Or
SpinLock _spinlock = new SpinLock()
bool lockTaken = false;
try
{
_spinlock.Enter(ref lockTaken);
counter = counter + 1;
}
finally
{
if (lockTaken) _spinlock.Exit(false);
}
And if we need to increment another counter, like counter2, should we declare another SpinLock object? or its enough to use another boolean object?
Question 2. Handling nested tasks or better replacement
In this current version of my application, I used tasks, adding each new task to an array and then used Task.WaitAll()
After a lot of research I just figured out that using Parallel.ForEach has better performance, But how can I control the number of current threads? I know I can specify a MaxDegreeOfParallelism in a ParallelOptions parameter, but the problem is here, every time crawl(url) method runs, It just create another limited number of threads, I mean if I set MaxDegree to 10, every time crawl(url) runs, another +10 will created, am I right?, so how can I prevent this? should I use semaphore and threads instead of Parallel? Or there is a better way?
public void Start() {
Parallel.Invoke(() => { crawl(url) } );
}
crawl(string url) {
var response = getresponse(url);
Parallel.foreach(response.links, ParallelOption, link => {
crawl(link);
});
}
Question 3. Notify when all Jobs (and nested jobs) finished.
And my last question is how can I understand when all my jobs has finished?

There a is a lot of misconceptions here, I'll point out just a few.
To creating a counter, which way has better performance.
They both do, depending on your exact situation
After a lot of research I just figured out that using Parallel.ForEach
has better performance
This is also very suspect, and actually just wrong. Once again it depends on what you want to do.
I know I can specify a MaxDegreeOfParallelism in a ParallelOptions
parameter, but the problem is here, every time crawl(url) method runs, It just create another limited number of threads
Once again this is wrong, this is your own implementation detail, and depends on how you do it. also TPL MaxDegreeOfParallelism is only a suggestion, it will only do what it thinks heuristically is best for you.
should I use semaphore and threads instead of Parallel? Or there is a
better way?
The answer is a resounding yes.
OK, let's have a look at what you are doing. You say you are making a crawler. A crawler, accesses the internet, each time you access the internet or a network resource or the file system you are (said simplistically) waiting around for an IO completion port callbacks. This is what's knows as an IO workload.
With IO Bound tasks we don't want to tie up the thread pool with threads waiting for IO completion ports. It's inefficient, you are using up valuable resources waiting for callback on threads that are effectively paused.
So for IO bound work, we don't want to spin up new tasks, and we don't want to use Parallel ForEach to wait around using up threads waiting for events to happen. The most appropriate modern pattern for IO bound tasks is the async and await pattern.
For CPU bound work (if you want to use as much CPU as you can) smash the thread pool, use TPL Parallel or as many tasks that is effective.
The async and await pattern works well with completion ports, because instead of waiting around idly for a callback it will give the threads back and allow them to be reused.
...
However what I suggest is using another approach, where you can take advantage of async and await and also control degrees of parallelisation. This enables you to be good to your thread pool, not using up resources waiting for callbacks, and allowing IO to be IO. I give you TPL DataFlow ActionBlock and TransformManyBlocks
This subject is a little above a simple working example, but I can assure you its an appropriate path for what you are doing. What I suggest is you have a look at the following links.
Stephen Cleary There Is No Thread
Stephen Cleary Introduction to Dataflow
Msdn Blogs Parallel Programming with .NET
Stephen Toub Going Deep Stephen Toub: Inside TPL Dataflow, In this he even talks about crawler examples.
Some random blog on dataflow and crawlers Tpl Dataflow walkthrough – Part 5
In Summary, there are many ways to do what you want to do, and there are many technologies. But the main thing is you have some very skewed ideas about parallel programming. You need to hit the books, hit the blogs, and start getting some really solid design principles from the ground up, and stop trying to figure this all out for your self by nit picking small bits of information.

I'd suggest looking at Microsoft's Reactive Framework for this. You can write your Crawl function like this:
public IObservable<Response> Crawl(string url)
{
return
from r in Observable.Start(() => GetResponse(url))
from l in r.Links.ToObservable()
from r2 in Crawl(l).StartWith(r)
select r2;
}
Then to call it try this:
IObservable<Response> crawls = Crawl("www.microsoft.com");
IDisposable subscription =
crawls
.Subscribe(
r => { /* process each response as it arrives */ },
() => { /* All crawls complete */ });
Done. It handles all the threading for you. Just NuGet "System.Reactive".

Related

How to store results from tasks running in threadpool?

I have a problem with a threadpool efficiency. I'm not sure I understand the whole concept. I did a lot of reading before asking that question and I know that threadpool is a good solution if you have a lot of small, relatively quick functions AND what's more important - non-blocking tasks. Using lock is very bad in threadpool.
And here is my question: How to return values from threadpool functions? If you have functions to run they probably produce some results, right? It's good to store those results somewhere. Where?
I'm running c.a. 200k very quick functions in a threadpool. The results I store in the List. Of course I have to do:
lock(lockobj)
{
myList.Add(result);
}
So, is this the right way? I mean, if your functions returns SOMETHING, you have to store them in some kind of collection. It has to be a blocking collection. So, I started thinking... "Blocking is very bead in threadpool, but you have to do this, at least once - at the end of every function
How to store/return results from functions running in threadpool?
Thanks!
JB
EDIT: By "function" I mean...
ThreadPool.QueueUserWorkItem(state =>
{
Result r = function(); // previously named "Task"
lock(lockobj)
{
allResults.Add(r);
}
}

If you don't want to block the ThreadPool threads use a lock-free approach. ConcurrentQueue is currently lock-free (as of .NET 4.6.2) when you enqueue items.
So simply do this:
public static ConcurrentQueue<Result> AllResults { get; } = new ConcurrentQueue<Result>();
ThreadPool.QueueUserWorkItem(state =>
{
Result r = function();
AllResults.Enqueue(r);
}
This will guarantee you don't block ThreadPool threads.

Any kind of collection that is thread safe/synchronized will do. There are plenty in .net framework.
You can also use volatile variables to store data between multiple threads - but this is usually considered a bad practice.
Another approach can be to schedule those operations on tasks that can produce results, they run by default on the thread pool and you can get the return values by awaiting the methods and checking the Result of the Task that is returned.
Finally you can write your own code in order to synchronize access to certain regions of code/variables etc using stuff like lock, semaphores, mutex etc

How to efficiently make 1000s of web requests as quickly as possible

I need to make 100,000s of lightweight (i.e. small Content-Length) web requests from a C# console app. What is the fastest way I can do this (i.e. have completed all the requests in the shortest possible time) and what best practices should I follow? I can't fire and forget because I need to capture the responses.
Presumably I'd want to use the async web requests methods, however I'm wondering what the impact of the overhead of storing all the Task continuations and marshalling would be.
Memory consumption is not an overall concern, the objective is speed.
Presumably I'd also want to make use of all the cores available.
So I can do something like this:
Parallel.ForEach(iterations, i =>
{
var response = await MakeRequest(i);
// do thing with response
});
but that won't make me any faster than just my number of cores.
I can do:
Parallel.ForEach(iterations, i =>
{
var response = MakeRequest(i);
response.GetAwaiter().OnCompleted(() =>
{
// do thing with response
});
});
but how do I keep my program running after the ForEach. Holding on to all the Tasks and WhenAlling them feels bloated, are there any existing patterns or helpers to have some kind of Task queue?
Is there any way to get any better, and how should I handle throttling/error detection? For instance, if the remote endpoint is slow to respond I don't want to continue spamming it.
I understand I also need to do:
ServicePointManager.DefaultConnectionLimit = int.MaxValue
Anything else necessary?

The Parallel class does not work with async loop bodies so you can't use it. Your loop body completes almost immediately and returns a task. There is no parallelism benefit here.
This is a very easy problem. Use one of the standard solutions for processing a series of items asynchronously with a given DOP (this one is good: http://blogs.msdn.com/b/pfxteam/archive/2012/03/05/10278165.aspx. Use the last piece of code).
You need to empirically determine the right DOP. Simply try different values. There is no theoretical way to derive the best value because it is dependent on many things.
The connection limit is the only limit that's in your way.
response.GetAwaiter().OnCompleted
Not sure what you tried to accomplish there... If you comment I'll explain the misunderstanding.

The operation you want to perform is
Call an I/O method
Process the result
You are correct that you should use an async version of the I/O method. What's more, you only need 1 thread to start all of the I/O operations. You will not benefit from parallelism here.
You will benefit from parallelism in the second part - processing the result, as this will be a CPU-bound operation. Luckily, async/await will do all the job for you. Console applications don't have a synchronization context. It means that the part of the method after an await will run on a thread pool thread, optimally utilizing all CPU cores.
private async Task MakeRequestAndProcessResult(int i)
{
var result = await MakeRequestAsync();
ProcessResult(result);
}
var tasks = iterations.Select(i => MakeRequestAndProcessResult(i)).ToArray();
To achieve the same behavior in an environment with a synchronization context (for example WPF or WinForms), use ConfigureAwait(false).
var result = await MakeRequestAsync().ConfigureAwait(false);
To wait for the tasks to complete, you can use await Task.WhenAll(tasks) inside an async method or Task.WaitAll(tasks) in Main().
Throwing 100k requests at a web service will probably kill it, so you will have to limit it. You can check answers to this question to find some options how to do it.

Parallel.ForEach should be able to use more threads than there are cores if you explicitly set the MaxDegreeOfParallelism property of the ParallelOptions parameter (in the overload of ForEach where there is that parameter) - see https://msdn.microsoft.com/en-us/library/system.threading.tasks.paralleloptions.maxdegreeofparallelism(v=vs.110).aspx
You should be able to set this on 1,000 to get it to use 1,000 threads or even more, but that might not be efficient due to the threading overheads. You may wish to experiment (eg. loop from eg. 100 to 1,000 stepping in 100s to try submitting 1,000 requests each time and time start to finish) or even set up some kind of self-tuning algorithm.

TPL vs Multithreading

I am new to threading and I need a clarification for the below scenario.
I am working on apple push notification services. My application demands to send notifications to 30k users when a new deal is added to the website.
can I split the 30k users into lists, each list containing 1000 users and start multiple threads or can use task?
Is the following way efficient?
if (lstDevice.Count > 0)
{
for (int i = 0; i < lstDevice.Count; i += 2)
{
splitList.Add(lstDevice.Skip(i).Take(2).ToList<DeviceHelper>());
}
var tasks = new Task[splitList.Count];
int count=0;
foreach (List<DeviceHelper> lst in splitList)
{
tasks[count] = Task.Factory.StartNew(() =>
{
QueueNotifications(lst, pMessage, pSubject, pNotificationType, push);
},
TaskCreationOptions.None);
count++;
}
QueueNotification method will just loop through each list item and creates a payload like
foreach (DeviceHelper device in splitList)
{
if (device.PlatformType.ToLower() == "ios")
{
push.QueueNotification(new AppleNotification()
.ForDeviceToken(device.DeviceToken)
.WithAlert(pMessage)
.WithBadge(device.Badge)
);
Console.Write("Waiting for Queue to Finish...");
}
}
push.StopAllServices(true);

Technically it is sure possible to split a list and then start threads that runs your List in parallel. You can also implement everything yourself, as you already have done, but this isn't a good approach. At first splitting a List into chunks that gets processed in parallel is already what Parallel.For or Parallel.ForEach does. There is no need to re-implement everything yourself.
Now, you constantly ask if something can run 300 or 500 notifications in parallel. But actually this is not a good question because you completly miss the point of running something in parallel.
So, let me explain you why that question is not good. At first, you should ask yourself why do you want to run something in parallel? The answer to that is, you want that something runs faster by using multiple CPU-cores.
Now your simple idea is probably that spawning 300 or 500 threads is faster, because you have more threads and it runs more things "in parallel". But that is not exactly the case.
At first, creating a thread is not "free". Every thread you create has some overhead, it takes some CPU-time to create a thread, and also it needs some memory. On top of that, if you create 300 threads it doesn't mean 300 threads run in parallel. If you have for example an 8 core CPU only 8 threads really can run in parallel. Creating more threads can even hurt your performance. Because now your program needs to switch constanlty between threads, that also cost CPU-performance.
The result of all that is. If you have something lightweight some small code that don't do a lot of computation it ends that creating a lot of threads will slow down your application instead of running faster, because the managing of your threads creates more overhead than running it on (for example) 8 cpu-cores.
That means, if you have a list of 30,000 of somewhat. It usally end that it is faster to just split your list in 8 chunks and work through your list in 8 threads as creating 300 Threads.
Your goal should never be: Can it run xxx things in parallel?
The question should be like: How many threads do i need, and how much items should every thread process to get my work as fastest done.
That is an important difference because just spawning more threads doesn't mean something ends up beeing fast.
So how many threads do you need, and how many items should every thread process? Well, you can write a lot of code to test it. But the amount changes from hardware to hardware. A PC with just 4 cores have another optimum than a system with 8 cores. If what you are doing is IO bound (for example read/write to disk/network) you also don't get more speed by increasing your threads.
So what you now can do is test everything, try to get the correct thread number and do a lot of benchmarking to find the best numbers.
But actually, that is the whole purpose of the TPL library with the Task<T> class. The Task<T> class already looks at your computer how many cpu-cores it have. And when you are running your Task it automatically tries to create as much threads needed to get the maximum out of your system.
So my suggestion is that you should use the TPL library with the Task<T> class. In my opinion you should never create Threads directly yourself or doing partition yourself, because all of that is already done in TPL.

I think the Task-Class is a good choise for your aim, becuase you have an easy handling over the async process and don't have to deal with Threads directly.
Maybe this help: Task vs Thread differences
But to give you a better answer, you should improve your question an give us more details.
You should be careful with creating to much parallel threads, because this can slow down your application. Read this nice article from SO: How many threads is too many?. The best thing is you make it configurable and than test some values.

I agree Task is a good choice however creating too many tasks also bring risks to your system and for failures, your decision is also a factor to come up a solution. For me I prefer MSQueue combining with thread pool.

If you want parallelize the creation of the push notifications and maximize the performance by using all CPU's on the computer you should use Parallel.ForEach:
Parallel.ForEach(
devices,
device => {
if (device.PlatformType.ToUpperInvariant() == "IOS") {
push.QueueNotification(
new AppleNotification()
.ForDeviceToken(device.DeviceToken)
.WithAlert(message)
.WithBadge(device.Badge)
);
}
}
);
push.StopAllServices(true);
This assumes that calling push.QueueNotification is thread-safe. Also, if this call locks a shared resource you may see lower than expected performance because of lock contention.
To avoid this lock contention you may be able to create a separate queue for each partition that Parallel.ForEach creates. I am improvising a bit here because some details are missing from the question. I assume that the variable push is an instance of the type Push:
Parallel.ForEach(
devices,
() => new Push(),
(device, _, push) => {
if (device.PlatformType.ToUpperInvariant() == "IOS") {
push.QueueNotification(
new AppleNotification()
.ForDeviceToken(device.DeviceToken)
.WithAlert(message)
.WithBadge(device.Badge)
);
}
return push;
},
push.StopAllServices(true);
);
This will create a separate Push instance for each partition that Parallel.ForEach creates and when the partition is complete it will call StopAllServices on the instance.
This approach should perform no worse than splitting the devices into N lists where N is the number of CPU's and and starting either N threads or N tasks to process each list. If one thread or task "gets behind" the total execution time will be the execution time of this "slow" thread or task. With Parallel.ForEach all CPU's are used until all devices have been processed.

C# .net For() Step?

I have a function, that processes a list of 6100 list items. The code used to work when the list was just 300 items. But instantly crashes with 6100. Is there a way I can loop through these 6100 items say 30 at a time and execute a new thread per item?
for (var i = 0; i < ListProxies.Items.Count; i++)
{
var s = ListProxies.Items[i] as string;
var thread = new ParameterizedThreadStart(ProxyTest.IsAlive);
var doIt = new Thread(thread) { Name = "CheckProxy# " + i };
doIt.Start(s);
}
Any help would be greatly appreciated.

Do you really need to spawn a new thread for each work item? Unless there is a genuine need for this (if so, please tell us why), I would strongly recommend you use the Managed Thread Pool instead. This will give you the concurrency benefits you require, but without the resource requirements (as well as the creation, destruction and massive context-switching costs) of running thousands of threads. If you are on .NET 4.0, you might also want to consider using the Task Parallel Library.
For example:
for (var i = 0; i < ListProxies.Items.Count; i++)
{
var s = ListProxies.Items[i] as string;
ThreadPool.QueueUserWorkItem(ProxyTest.IsAlive, s);
}
On another note, I would seriously consider renaming the IsAlive method (which looks like a boolean property or method) since:
It clearly has a void IsAlive(object) signature.
It has observable side-effects (from your comment that it "increment a progress bar and add a 'working' proxy to a new list").

There is a limit on the number of threads you can spawn. 6100 threads does seem quite a bit excesive.
I agree win Ani, you should look into a ThreadPool or even a Producer / Consumer process depending on what you are trying to accomplish.
There are quite a few processes for handling multi threaded applications but without knowing what you are doing in the start there really is no way to recommend any approach other than a ThreadPool or Producer / Consumer process (Queues with SyncEvents).
At any rate you really should try to keep the number of threads to a minimum otherwise you run the risk of thread locks, spin locks, wait locks, dead locks, race conditions, who knows what, etc...
If you want good information on threading with C# check out the book Concurrent Programming on Windows By Joe Duffy it is really helpful.

C# Improvement on a Fire-and-Forget

Greetings
I have a program that creates multiples instances of a class, runs the same long-running Update method on all instances and waits for completion. I'm following Kev's approach from this question of adding the Update to ThreadPool.QueueUserWorkItem.
In the main prog., I'm sleeping for a few minutes and checking a Boolean in the last child to see if done
while(!child[child.Length-1].isFinished) {
Thread.Sleep(...);
}
This solution is working the way I want, but is there a better way to do this? Both for the independent instances and checking if all work is done.
Thanks
UPDATE:
There doesn't need to be locking. The different instances each have a different web service url they request from, and do similar work on the response. They're all doing their own thing.

If you know the number of operations that will be performed, use a countdown and an event:
Activity[] activities = GetActivities();
int remaining = activities.Length;
using (ManualResetEvent finishedEvent = new ManualResetEvent(false))
{
foreach (Activity activity in activities)
{
ThreadPool.QueueUserWorkItem(s =>
{
activity.Run();
if (Interlocked.Decrement(ref remaining) == 0)
finishedEvent.Set();
});
}
finishedEvent.WaitOne();
}
Don't poll for completion. The .NET Framework (and the Windows OS in general) has a number of threading primitives specifically designed to prevent the need for spinlocks, and a polling loop with Sleep is really just a slow spinlock.

You can try Semaphore.

A blocking way of waiting is a bit more elegant than polling. See the Monitor.Wait/Monitor.Pulse (Semaphore works ok too) for a simple way to block and signal. C# has some syntactic sugar around the Monitor class in the form of the lock keyword.

This doesn't look good. There is almost never a valid reason to assume that when the last thread is completed that the other ones are done as well. Unless you somehow interlock the worker threads, which you should never do. It also makes little sense to Sleep(), waiting for a thread to complete. You might as well do the work that thread is doing.
If you've got multiple threads going, give them each a ManualResetEvent. You can wait on completion with WaitHandle.WaitAll(). Counting down a thread counter with the Interlocked class can work too. Or use a CountdownLatch.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.