C# No performance gain with Task? - c#

I used Task like below but there is no performance gain. I checked my method which executes in 0-1 seconds but with Task(30 Tasks), it takes 5-12 seconds. Can anyone guide if I have done any mistake. I want to run 30 parallel and expect 30 done in max 2 seconds.
Here is my code:
Task[] tasks = new Task[30];
for (int p = 0; p <= dstable.Tables[0].Rows.Count - 1; p++)
{
MethodParameters newParameter = new MethodParameters();
newParameter.Name = dstable.Tables[0].Rows[p]["Name"].ToString();
tasks[p] = Task.Factory.StartNew(() => ParseUri(newParameter));
Application.DoEvents();
}
try
{
Task.WaitAll(tasks);
//Console.Write("task completed");
}
catch (AggregateException ae)
{
throw ae.Flatten();
}

There are some major problems in your thinking.
does your PC have 30 Cores, so that every core can exactly takes one task? I don't think so
starting a seperate thread also takes some time.
with every concurrent thread more overhead is generated.
Can your problem be solved faster by starting more threads? This is only the case, when all threads do different tasks, like reading from disk, quering a database, computing something, etc.. 10 threads that do all "high-performance" tasks on the cpu, won't give an boost, quite contrary to, because every thread needs to clean up his mess, before he can give some cpu time to the next thread, and that one needs to clean up his mess too.
Check this link out
http://msdn.microsoft.com/en-us/library/ms810437.aspx
You can use the TPL
http://msdn.microsoft.com/en-us/library/dd460717.aspx
they try to guaranty the maximum effect from parallel threads.
Also I recommend this book
http://www.amazon.com/The-Multiprocessor-Programming-Maurice-Herlihy/dp/0123705916
When you really want to solve your problem in under 2 seconds, buy more CPU power ;)

I think you may have missing the main point of using thread.
Creating many threads may lead to more complexity by increasing OS over-head. (Context switching)
Also it will be much harder to manage your execution of code and harder to find and fix bugs if there is any.
Usage of thread may give you advantage when there is
Task need to be perform simultaneously
Building responsive UI

Related

TPL force higher parallelism

When queuing Tasks to the ThreadPool, the code relies on the default TaskScheduler to execute them. In my code example, I can see that 7 Tasks maximum get executed in parallel on separate threads.
new Thread(() =>
{
while (true)
{
ThreadPool.GetAvailableThreads(out var wt, out var cpt);
Console.WriteLine($"WT:{wt} CPT:{cpt}");
Thread.Sleep(500);
}
}).Start();
var stopwatch = new Stopwatch();
stopwatch.Start();
var tasks = Enumerable.Range(0, 100).Select(async i => { await Task.Yield(); Thread.Sleep(10000); }).ToArray();
Task.WaitAll(tasks);
Console.WriteLine(stopwatch.Elapsed.TotalSeconds);
Console.ReadKey();
Is there a way to force the scheduler to fire up more Tasks on other threads? Or is there a more "generous" scheduler in the framework without implementing a custom one?
EDIT:
Adding ThreadPool.SetMinThreads(100, X) seems to do the trick, I presume awaiting frees up the thread so the pool think it can fire up another one and then it immediately resumes.
By default, the minimum number of threads is set to the number of processors on a system. You can use the SetMinThreads method to increase the minimum number ofthreads. However, unnecessarily increasing these values can cause performance problems. If too many tasks start at the same time, all of them might appear to be slow. In most cases, the thread pool will perform better with its own algorithm for allocating threads. Reducing the minimum to less than the number of processors can also hurt performance.
From here: https://msdn.microsoft.com/en-us/library/system.threading.threadpool.setminthreads(v=vs.110).aspx
I removed AsParallel as it is not relevant and it just seems to confuse readers.
Is there a way to force the scheduler to fire up more Tasks on other threads?
You cannot have more executing threads than you have CPU cores. This is just how computers work. If you use more threads, then your work will actually get done more slowly since the threads must swap in and out of the cores in order to run.
Or is there a more "generous" scheduler in the framework without implementing a custom one?
PLINQ is already tuned to make maximum use of the hardware.
You can see this for yourself if you replace the Thread.Sleep call with something that actually uses the CPU (e.g., while (true) ;), and then watch your CPU usage in Task Manager. My expectation is that the 7 or 8 threads used by PLINQ in this example is all your machine can handle.
Useful link that explains it can be done with ThreadPool.SetMinThread:
https://gist.github.com/JonCole/e65411214030f0d823cb#file-threadpool-md
Try this: https://msdn.microsoft.com/en-us/library/system.threading.threadpool.setmaxthreads(v=vs.110).aspx
You can set the number of worker threads (first argument).
Use WithDegreeOfParallelism extension:
Enumerable.Range(0, 100).AsParallel().WithDegreeOfParallelism(x).Select(...

Why ThreadPool start only one thread in time?

I start threads exactly like book says:
for (int i = 1; i <= 4; i++) {
ThreadPool.QueueUserWorkItem(new WaitCallback(ThreadMethod), i);
}
ThreadMethod looks like:
static void ThreadMethod(object input) {
Console.WriteLine(input + " thread started");
//do some stuff for, like, 400 milliseconds
Console.WriteLine(input + " thread completed");
}
In some reason 2 thread starts only after 1 is completed (in this moment all work is already done and 2-4 thread just start and stop doing nothing).
What could be wrong? Ask anything what could help solve this problem.
I don't use any synchronization classes.
If it's matter, i have 2 core processor.
Your ThreadMethod just runs too fast. Everything is right with your code except if possible you should switch from ThreadPool to new abstractions like Task.Run.
Actually, as i made ThreadMethod to do much longer stuff, another threads started after some time but it don't worked in one time but just switches from thread to thread. Looks like i have to use another tool.
A TreadPool has a maximum number of threads that it can use. See ThreadPool.GetMaxThread and SetMaxThread. It is probably equal to the number of available cores by default.
For CPU intensive work it makes sense since you would actually lower performance by using more threads than you have cores. However, for slow jobs such as I/O intensive jobs, many threads can run in parallel to avoid blocking and wait until the I/O is complete. Example: Grabing several files at a time from various FTP servers.

TPL vs Multithreading

I am new to threading and I need a clarification for the below scenario.
I am working on apple push notification services. My application demands to send notifications to 30k users when a new deal is added to the website.
can I split the 30k users into lists, each list containing 1000 users and start multiple threads or can use task?
Is the following way efficient?
if (lstDevice.Count > 0)
{
for (int i = 0; i < lstDevice.Count; i += 2)
{
splitList.Add(lstDevice.Skip(i).Take(2).ToList<DeviceHelper>());
}
var tasks = new Task[splitList.Count];
int count=0;
foreach (List<DeviceHelper> lst in splitList)
{
tasks[count] = Task.Factory.StartNew(() =>
{
QueueNotifications(lst, pMessage, pSubject, pNotificationType, push);
},
TaskCreationOptions.None);
count++;
}
QueueNotification method will just loop through each list item and creates a payload like
foreach (DeviceHelper device in splitList)
{
if (device.PlatformType.ToLower() == "ios")
{
push.QueueNotification(new AppleNotification()
.ForDeviceToken(device.DeviceToken)
.WithAlert(pMessage)
.WithBadge(device.Badge)
);
Console.Write("Waiting for Queue to Finish...");
}
}
push.StopAllServices(true);
Technically it is sure possible to split a list and then start threads that runs your List in parallel. You can also implement everything yourself, as you already have done, but this isn't a good approach. At first splitting a List into chunks that gets processed in parallel is already what Parallel.For or Parallel.ForEach does. There is no need to re-implement everything yourself.
Now, you constantly ask if something can run 300 or 500 notifications in parallel. But actually this is not a good question because you completly miss the point of running something in parallel.
So, let me explain you why that question is not good. At first, you should ask yourself why do you want to run something in parallel? The answer to that is, you want that something runs faster by using multiple CPU-cores.
Now your simple idea is probably that spawning 300 or 500 threads is faster, because you have more threads and it runs more things "in parallel". But that is not exactly the case.
At first, creating a thread is not "free". Every thread you create has some overhead, it takes some CPU-time to create a thread, and also it needs some memory. On top of that, if you create 300 threads it doesn't mean 300 threads run in parallel. If you have for example an 8 core CPU only 8 threads really can run in parallel. Creating more threads can even hurt your performance. Because now your program needs to switch constanlty between threads, that also cost CPU-performance.
The result of all that is. If you have something lightweight some small code that don't do a lot of computation it ends that creating a lot of threads will slow down your application instead of running faster, because the managing of your threads creates more overhead than running it on (for example) 8 cpu-cores.
That means, if you have a list of 30,000 of somewhat. It usally end that it is faster to just split your list in 8 chunks and work through your list in 8 threads as creating 300 Threads.
Your goal should never be: Can it run xxx things in parallel?
The question should be like: How many threads do i need, and how much items should every thread process to get my work as fastest done.
That is an important difference because just spawning more threads doesn't mean something ends up beeing fast.
So how many threads do you need, and how many items should every thread process? Well, you can write a lot of code to test it. But the amount changes from hardware to hardware. A PC with just 4 cores have another optimum than a system with 8 cores. If what you are doing is IO bound (for example read/write to disk/network) you also don't get more speed by increasing your threads.
So what you now can do is test everything, try to get the correct thread number and do a lot of benchmarking to find the best numbers.
But actually, that is the whole purpose of the TPL library with the Task<T> class. The Task<T> class already looks at your computer how many cpu-cores it have. And when you are running your Task it automatically tries to create as much threads needed to get the maximum out of your system.
So my suggestion is that you should use the TPL library with the Task<T> class. In my opinion you should never create Threads directly yourself or doing partition yourself, because all of that is already done in TPL.
I think the Task-Class is a good choise for your aim, becuase you have an easy handling over the async process and don't have to deal with Threads directly.
Maybe this help: Task vs Thread differences
But to give you a better answer, you should improve your question an give us more details.
You should be careful with creating to much parallel threads, because this can slow down your application. Read this nice article from SO: How many threads is too many?. The best thing is you make it configurable and than test some values.
I agree Task is a good choice however creating too many tasks also bring risks to your system and for failures, your decision is also a factor to come up a solution. For me I prefer MSQueue combining with thread pool.
If you want parallelize the creation of the push notifications and maximize the performance by using all CPU's on the computer you should use Parallel.ForEach:
Parallel.ForEach(
devices,
device => {
if (device.PlatformType.ToUpperInvariant() == "IOS") {
push.QueueNotification(
new AppleNotification()
.ForDeviceToken(device.DeviceToken)
.WithAlert(message)
.WithBadge(device.Badge)
);
}
}
);
push.StopAllServices(true);
This assumes that calling push.QueueNotification is thread-safe. Also, if this call locks a shared resource you may see lower than expected performance because of lock contention.
To avoid this lock contention you may be able to create a separate queue for each partition that Parallel.ForEach creates. I am improvising a bit here because some details are missing from the question. I assume that the variable push is an instance of the type Push:
Parallel.ForEach(
devices,
() => new Push(),
(device, _, push) => {
if (device.PlatformType.ToUpperInvariant() == "IOS") {
push.QueueNotification(
new AppleNotification()
.ForDeviceToken(device.DeviceToken)
.WithAlert(message)
.WithBadge(device.Badge)
);
}
return push;
},
push.StopAllServices(true);
);
This will create a separate Push instance for each partition that Parallel.ForEach creates and when the partition is complete it will call StopAllServices on the instance.
This approach should perform no worse than splitting the devices into N lists where N is the number of CPU's and and starting either N threads or N tasks to process each list. If one thread or task "gets behind" the total execution time will be the execution time of this "slow" thread or task. With Parallel.ForEach all CPU's are used until all devices have been processed.

Parallel.For vs regular threads

I'm trying to understand why Parallel.For is able to outperform a number of threads in the following scenario: consider a batch of jobs that can be processed in parallel. While processing these jobs, new work may be added, which then needs to be processed as well. The Parallel.For solution would look as follows:
var jobs = new List<Job> { firstJob };
int startIdx = 0, endIdx = jobs.Count;
while (startIdx < endIdx) {
Parallel.For(startIdx, endIdx, i => WorkJob(jobs[i]));
startIdx = endIdx; endIdx = jobs.Count;
}
This means that there are multiple times where the Parallel.For needs to synchronize. Consider a bread-first graph algorithm algorithm; the number of synchronizations would be quite large. Waste of time, no?
Trying the same in the old-fashioned threading approach:
var queue = new ConcurrentQueue<Job> { firstJob };
var threads = new List<Thread>();
var waitHandle = new AutoResetEvent(false);
int numBusy = 0;
for (int i = 0; i < maxThreads; i++)
threads.Add(new Thread(new ThreadStart(delegate {
while (!queue.IsEmpty || numBusy > 0) {
if (queue.IsEmpty)
// numbusy > 0 implies more data may arrive
waitHandle.WaitOne();
Job job;
if (queue.TryDequeue(out job)) {
Interlocked.Increment(ref numBusy);
WorkJob(job); // WorkJob does a waitHandle.Set() when more work was found
Interlocked.Decrement(ref numBusy);
}
}
// others are possibly waiting for us to enable more work which won't happen
waitHandle.Set();
})));
threads.ForEach(t => t.Start());
threads.ForEach(t => t.Join());
The Parallel.For code is of course much cleaner, but what I cannot comprehend, it's even faster as well! Is the task scheduler just that good? The synchronizations were eleminated, there's no busy waiting, yet the threaded approach is consistently slower (for me). What's going on? Can the threading approach be made faster?
Edit: thanks for all the answers, I wish I could pick multiple ones. I chose to go with the one that also shows an actual possible improvement.
The two code samples are not really the same.
The Parallel.ForEach() will use a limited amount of threads and re-use them. The 2nd sample is already starting way behind by having to create a number of threads. That takes time.
And what is the value of maxThreads ? Very critical, in Parallel.ForEach() it is dynamic.
Is the task scheduler just that good?
It is pretty good. And TPL uses work-stealing and other adaptive technologies. You'll have a hard time to do any better.
Parallel.For doesn't actually break up the items into single units of work. It breaks up all the work (early on) based on the number of threads it plans to use and the number of iterations to be executed. Then has each thread synchronously process that batch (possibly using work stealing or saving some extra items to load-balance near the end). By using this approach the worker threads are virtually never waiting on each other, while your threads are constantly waiting on each other due to the heavy synchronization you're using before/after every single iteration.
On top of that since it's using thread pool threads many of the threads it needs are likely already created, which is another advantage in its favor.
As for synchronization, the entire point of a Parallel.For is that all of the iterations can be done in parallel, so there is almost no synchronization that needs to take place (at least in their code).
Then of course there is the issue of the number of threads. The threadpool has a lot of very good algorithms and heuristics to help it determine how many threads are need at that instant in time, based on the current hardware, the load from other applications, etc. It's possible that you're using too many, or not enough threads.
Also, since the number of items that you have isn't known before you start I would suggest using Parallel.ForEach rather than several Parallel.For loops. It is simply designed for the situation that you're in, so it's heuristics will apply better. (It also makes for even cleaner code.)
BlockingCollection<Job> queue = new BlockingCollection<Job>();
//add jobs to queue, possibly in another thread
//call queue.CompleteAdding() when there are no more jobs to run
Parallel.ForEach(queue.GetConsumingEnumerable(),
job => job.DoWork());
Your creating a bunch of new threads and the Parallel.For is using a Threadpool. You'll see better performance if you were utilizing the C# threadpool but there really is no point in doing that.
I would shy away from rolling out your own solution; if there is a corner case where you need customization use the TPL and customize..

C# .net For() Step?

I have a function, that processes a list of 6100 list items. The code used to work when the list was just 300 items. But instantly crashes with 6100. Is there a way I can loop through these 6100 items say 30 at a time and execute a new thread per item?
for (var i = 0; i < ListProxies.Items.Count; i++)
{
var s = ListProxies.Items[i] as string;
var thread = new ParameterizedThreadStart(ProxyTest.IsAlive);
var doIt = new Thread(thread) { Name = "CheckProxy# " + i };
doIt.Start(s);
}
Any help would be greatly appreciated.
Do you really need to spawn a new thread for each work item? Unless there is a genuine need for this (if so, please tell us why), I would strongly recommend you use the Managed Thread Pool instead. This will give you the concurrency benefits you require, but without the resource requirements (as well as the creation, destruction and massive context-switching costs) of running thousands of threads. If you are on .NET 4.0, you might also want to consider using the Task Parallel Library.
For example:
for (var i = 0; i < ListProxies.Items.Count; i++)
{
var s = ListProxies.Items[i] as string;
ThreadPool.QueueUserWorkItem(ProxyTest.IsAlive, s);
}
On another note, I would seriously consider renaming the IsAlive method (which looks like a boolean property or method) since:
It clearly has a void IsAlive(object) signature.
It has observable side-effects (from your comment that it "increment a progress bar and add a 'working' proxy to a new list").
There is a limit on the number of threads you can spawn. 6100 threads does seem quite a bit excesive.
I agree win Ani, you should look into a ThreadPool or even a Producer / Consumer process depending on what you are trying to accomplish.
There are quite a few processes for handling multi threaded applications but without knowing what you are doing in the start there really is no way to recommend any approach other than a ThreadPool or Producer / Consumer process (Queues with SyncEvents).
At any rate you really should try to keep the number of threads to a minimum otherwise you run the risk of thread locks, spin locks, wait locks, dead locks, race conditions, who knows what, etc...
If you want good information on threading with C# check out the book Concurrent Programming on Windows By Joe Duffy it is really helpful.

Categories