Multithreading has no effect when working with Bitmap object - c#

class Program
{
    static void Main(string[] args)
    {
        new Program();
    }

    private static int prepTotal;
    private static readonly object Lock = new object();

    public Program()
    {
        var sw = new Stopwatch();
        sw.Start();
        Parallel.For((long)0, 10, new ParallelOptions { MaxDegreeOfParallelism = 1 }, (j) =>
        {
            DoIt();
        });
        sw.Stop();
        Console.WriteLine($"1 thread sum time is {prepTotal} ms. Total time is {sw.ElapsedMilliseconds} ms.");

        sw.Restart();
        prepTotal = 0;
        Parallel.For((long)0, 10, new ParallelOptions { MaxDegreeOfParallelism = 3 }, (j) =>
        {
            DoIt();
        });
        sw.Stop();
        Console.WriteLine($"3 thread sum time is {prepTotal} ms. Total time is {sw.ElapsedMilliseconds} ms.");

        sw.Restart();
        prepTotal = 0;
        Parallel.For((long)0, 10, new ParallelOptions { MaxDegreeOfParallelism = 1 }, (j) =>
        {
            DoIt();
        });
        sw.Stop();
        Console.WriteLine($"1 thread sum time is {prepTotal} ms. Total time is {sw.ElapsedMilliseconds} ms.");

        sw.Restart();
        prepTotal = 0;
        Parallel.For((long)0, 10, new ParallelOptions { MaxDegreeOfParallelism = 3 }, (j) =>
        {
            DoIt();
        });
        sw.Stop();
        Console.WriteLine($"3 thread sum time is {prepTotal} ms. Total time is {sw.ElapsedMilliseconds} ms.");

        Console.ReadLine();
    }

    private static void DoIt()
    {
        var sw2 = new Stopwatch();
        sw2.Start();
        using (var bmp = new Bitmap(3000, 3000))
        {
        }
        sw2.Stop();
        lock (Lock)
        {
            prepTotal += (int)sw2.ElapsedMilliseconds;
        }
    }
}
When I run my test code (derived from really complex original code), I get the following results. As you can see, the code running on more threads is almost 3 times slower. Does the Bitmap constructor do some blocking, or what?
1 thread sum time is 125 ms. Total time is 132 ms.
3 thread sum time is 360 ms. Total time is 132 ms.
1 thread sum time is 121 ms. Total time is 127 ms.
3 thread sum time is 364 ms. Total time is 128 ms.

Well, I just used a profiler to see if my guess is correct, and indeed, new Bitmap(3000, 3000) is almost entirely memory bound. So unless you have a server machine with multiple independent memory systems, adding more CPU doesn't help any. The bottleneck is memory.
The second most important part happens in the Dispose, which is again... almost entirely memory bound.
Multi-threading only helps with CPU-bound code. Since the CPU is much faster than any memory you may have in your system, the CPU is only really saturated when it can avoid working with memory (and other I/O devices). Your case is pretty much exactly the opposite - there's very little CPU work, and where there is CPU work, it's mostly synchronized (e.g. requesting and freeing virtual memory). Not a lot of opportunities for parallelization.
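To see the contrast, here is a minimal sketch (the class name, method, and iteration counts are mine, not from the question) that swaps the Bitmap allocation for pure arithmetic; work like this typically scales close to linearly with MaxDegreeOfParallelism:
using System;
using System.Diagnostics;
using System.Threading.Tasks;

class CpuBoundDemo
{
    static double SpinMath()
    {
        double acc = 0;
        for (int i = 1; i < 20000000; i++)
            acc += Math.Sqrt(i); // pure CPU work, no heap traffic
        return acc;
    }

    static void Run(int maxDop)
    {
        var sw = Stopwatch.StartNew();
        Parallel.For(0, 10, new ParallelOptions { MaxDegreeOfParallelism = maxDop },
            _ => { SpinMath(); });
        Console.WriteLine($"MaxDOP={maxDop}: {sw.ElapsedMilliseconds} ms");
    }

    static void Main()
    {
        Run(1); // baseline
        Run(3); // should be close to 3x faster on a machine with at least 3 cores
    }
}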

Related

Task.Delay(millis) consistently blocks all threads for 100-500ms

I've noticed firing up several Task.Delay() calls basically "at the same time" causes systematic and periodic long pauses in the execution. Not just in one thread, but all running threads.
Here's an old SO question which probably describes the same issue: await Task.Delay(foo); takes seconds instead of ms
I hope it's ok to re-surface this with a fresh take, since the problem still exists and I haven't found any other workaround than "use Thread.Sleep", which doesn't really work in all cases.
Here's a test code:
static Stopwatch totalTime = new Stopwatch();

static void Main(string[] args)
{
    Task[] tasks = new Task[100];
    totalTime.Start();
    for (int i = 0; i < 100; i++)
    {
        tasks[i] = TestDelay(1000, 10, i);
    }
    Task.WaitAll(tasks);
}

private static async Task TestDelay(int loops, int delay, int id)
{
    int exact = 0;
    int close = 0;
    int off = 0;
    Stopwatch stopwatch = new Stopwatch();
    for (int i = 0; i < loops; i++)
    {
        stopwatch.Restart();
        await Task.Delay(delay);
        long duration = stopwatch.ElapsedMilliseconds;
        if (duration == delay) ++exact;
        else if (duration < delay + 10) ++close;
        else
        {
            //This is seen in chunks for all the tasks at once!
            Console.WriteLine(totalTime.ElapsedMilliseconds + " ------ " + id + ": " + duration + "ms");
            ++off;
        }
    }
    Console.WriteLine(totalTime.ElapsedMilliseconds + " -DONE- " + id + " Exact: " + exact + ", Close: " + close + ", Off:" + off);
}
By running the code, there will be 1-3 points in time, when all of the N tasks will block/hang/something for significantly more than 10ms, more like 100-500ms. This happens to all tasks, and at the same time. I've added relevant logging, in case someone wants to try it and fiddle with the numbers.
Finally the obvious question is: Why is this happening, and is there any way to avoid it? Can anyone run the code and NOT get the delays?
Tested with dotnetcore 3.1 and net 5.0. Ran on MacOS and Linux.
Changing min threads doesn't have any effect on this.
Just for laughs, I tried SemaphoreSlim.WaitAsync(millis) (on an always unsignaled semaphore), which funnily enough has the same problem.
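That variant is essentially (a sketch, not my exact test code):
var gate = new SemaphoreSlim(0);             // never released, so the wait always times out
bool signaled = await gate.WaitAsync(delay); // acts like Task.Delay(delay); returns false on timeout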
EDIT: Here's a sample output:
136 ------ 65: 117ms
136 ------ 73: 117ms
160 ------ 99: 140ms
... all 100 of these
161 ------ 3: 144ms
Similar output is printed later in the execution as well.
These lines are printed when a task delay takes over 10ms more than requested.
So the first number is the point in time, which is almost the same for all tasks, so I assume it's due to the same hang in execution. The second number is just the task id to tell them apart. Last number is the stopwatch-given delay, which is significantly more than the 10ms.
It can be 10-20ms easily, but 10x is not due to inaccuracy.
I've tried to look into GC problems, but it doesn't happen during a manual GC.Collect(), and when it does happen I don't see changes in heapdump. It's still a possibility, but I'm lost at pinpointing it.
I'll do the unthinkable, and answer my own question, just in case anyone else stumbles upon this.
First, thanks to @paulomorgado for pointing me towards thread pool latency. That indeed is the problem if you fire up hundreds of Task.Delay() calls in a short period of time.
The way I solved this was to create a separate Thread, which keeps track of the requested delays and uses TaskCompletionSource to enable asynchronous awaits on them.
E.g. create a struct with three fields: start time, delay duration, and a TaskCompletionSource. Have the thread loop through these (in a lock) and, whenever a duration has expired, mark the task done with TaskCompletionSource.SetResult().
Now you can have a custom async Delay(millis) method that
creates a new struct
adds it to a "requested delays" list (lock)
awaits the task completion
removes the struct from the list (lock)
returns
A custom TaskScheduler with the needed threads might be a fancier solution, but I found this approach simple and clean. And it seems to do the trick, especially since you can have more than one thread going through all the delays for extra efficiency. Obviously happy to have this approach murdered with any flaws you might notice.
Please note that this approach probably only makes sense if your code is filled with asynchronous delays for some reason (like mine).
EDIT Quick sample code for the relevant parts. This needs some optimizing in how the locks, loops, allocations, and lists are handled, but even with this, I can see a HUGE improvement.
With ridiculously short delays (say 10 ms), this shows an error of at most 80 ms (tried with 5 threads), whereas with Task.Delay it's at least 100 ms, up to 500 ms. With longer, reasonable delays (100 ms+) this is almost flawless, whereas Task.Delay() hits you with the same 100-500 ms surprise delay, at least in the beginning.
private struct CustomDelay
{
    public TaskCompletionSource Completion; // non-generic TaskCompletionSource requires .NET 5+; use TaskCompletionSource<bool> on older targets
    public long Started;
    public long Delay;
}

//Your favorite data structure here. Using List for clarity. Note: using separate blocks based on delay granularity might be a good idea.
private static List<CustomDelay> _requestedDelays = new List<CustomDelay>();

//Supporting members referenced by the code below, declared here so the sample compiles:
private static readonly object _delayListLock = new object();
private static volatile bool _running = true;
private static readonly Stopwatch _totalElapsed = Stopwatch.StartNew();
private static long TotalElapsed() => _totalElapsed.ElapsedMilliseconds;

//Create threads from this. Sleep can be longer if there are several threads.
private static void CustomDelayHandler()
{
    while (_running)
    {
        Thread.Sleep(10); //To avoid busy loop
        lock (_delayListLock)
        {
            for (int i = 0; i < _requestedDelays.Count; ++i)
            {
                CustomDelay delay = _requestedDelays[i];
                if (!delay.Completion.Task.IsCompleted)
                {
                    if (TotalElapsed() - delay.Started >= delay.Delay)
                    {
                        delay.Completion.SetResult();
                    }
                }
            }
        }
    }
}

//Use this instead of Task.Delay(millis)
private static async Task Delay(int ms)
{
    if (ms <= 0) return;
    CustomDelay delay = new CustomDelay()
    {
        //RunContinuationsAsynchronously keeps awaiters from resuming inline on the handler thread while it holds the lock
        Completion = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously),
        Delay = ms,
        Started = TotalElapsed()
    };
    lock (_delayListLock)
    {
        _requestedDelays.Add(delay);
    }
    await delay.Completion.Task;
    lock (_delayListLock)
    {
        _requestedDelays.Remove(delay);
    }
}
Here is my attempt to reproduce your observations. I am creating 100 tasks, and each task repeatedly awaits a 10 msec Task.Delay in a loop. The actual duration of each Delay is measured with a Stopwatch and used to update a dictionary that holds the occurrences of each duration (all measurements with the same integer duration are aggregated into a single entry in the dictionary). The total duration of the test is 10 seconds.
ThreadPool.SetMinThreads(100, 100);
const int nominalDelay = 10;
var cts = new CancellationTokenSource(10000); // Duration of the test
var durations = new ConcurrentDictionary<long, int>();
var tasks = Enumerable.Range(1, 100).Select(n => Task.Run(async () =>
{
    var stopwatch = new Stopwatch();
    while (true)
    {
        stopwatch.Restart();
        try { await Task.Delay(nominalDelay, cts.Token); }
        catch (OperationCanceledException) { break; }
        long duration = stopwatch.ElapsedMilliseconds;
        durations.AddOrUpdate(duration, _ => 1, (_, count) => count + 1);
    }
})).ToArray();
Task.WaitAll(tasks);
var totalTasks = durations.Values.Sum();
var totalDuration = durations.Select(pair => pair.Key * pair.Value).Sum();
Console.WriteLine($"Nominal delay: {nominalDelay} msec");
Console.WriteLine($"Min duration: {durations.Keys.Min()} msec");
Console.WriteLine($"Avg duration: {(double)totalDuration / totalTasks:#,0.0} msec");
Console.WriteLine($"Max duration: {durations.Keys.Max()} msec");
Console.WriteLine($"Total tasks: {totalTasks:#,0}");
Console.WriteLine($"---Occurrences by Duration---");
foreach (var pair in durations.OrderBy(e => e.Key))
{
    Console.WriteLine($"Duration {pair.Key,2} msec, Occurrences: {pair.Value:#,0}");
}
I ran the program on .NET Core 3.1.3, in Release mode, without the debugger attached. Here are the results:
(Try it on fiddle)
Nominal delay: 10 msec
Min duration: 9 msec
Avg duration: 15.2 msec
Max duration: 40 msec
Total tasks: 63,418
---Occurrences by Duration---
Duration 9 msec, Occurrences: 165
Duration 10 msec, Occurrences: 11,373
Duration 11 msec, Occurrences: 21,299
Duration 12 msec, Occurrences: 2,745
Duration 13 msec, Occurrences: 878
Duration 14 msec, Occurrences: 375
Duration 15 msec, Occurrences: 252
Duration 16 msec, Occurrences: 7
Duration 17 msec, Occurrences: 16
Duration 18 msec, Occurrences: 102
Duration 19 msec, Occurrences: 110
Duration 20 msec, Occurrences: 1,995
Duration 21 msec, Occurrences: 14,839
Duration 22 msec, Occurrences: 7,347
Duration 23 msec, Occurrences: 1,269
Duration 24 msec, Occurrences: 166
Duration 25 msec, Occurrences: 136
Duration 26 msec, Occurrences: 264
Duration 27 msec, Occurrences: 47
Duration 28 msec, Occurrences: 1
Duration 36 msec, Occurrences: 5
Duration 37 msec, Occurrences: 8
Duration 38 msec, Occurrences: 9
Duration 39 msec, Occurrences: 7
Duration 40 msec, Occurrences: 3
Running the program on .NET Framework 4.8.3801.0 produces similar results.
TL;DR, I was not able to reproduce the 100-500 msec durations you observed.

Task.Run behaves differently on the second execution

I'm a C# newbie.
Question:
class Program
{
    static void why()
    {
        List<Task> listOfDummyTask = new List<Task>();
        for (int i = 0; i < 100; ++i)
        {
            Task hoho = Task.Run(() =>
            {
                Thread.Sleep(2000);
                Console.WriteLine("Die");
            });
            listOfDummyTask.Add(hoho);
        }
        Task.WaitAll(listOfDummyTask.ToArray());
    }

    static void Main(string[] args)
    {
        Stopwatch sw = new Stopwatch();
        sw.Reset();
        sw.Start();
        why();
        sw.Stop();
        Console.WriteLine("{0} ms", sw.ElapsedMilliseconds.ToString());
        Console.WriteLine("----------------------------------");
        sw.Reset();
        sw.Start();
        why();
        sw.Stop();
        Console.WriteLine("{0} ms", sw.ElapsedMilliseconds.ToString());
        Console.WriteLine("----------------------------------");
        Console.WriteLine("End");
    }
}
When I call why() the first time, it prints "Die" about 4 at a time (1~4 "Die" appear together), and the stopwatch reports about 28,000 ms.
When I call why() the second time, it prints "Die" about 8 at a time (5~8 "Die" appear together), and the stopwatch reports 10,000 ~ 14,000 ms.
Why? What keyword describes this situation?
Task.Run in a loop 100 times will enqueue 100 work items to the default scheduler (thread pool). If the thread pool decides to execute up to 4 items concurrently, then it will take 4 work items off the queue and complete them in roughly 2 seconds. Then it will take four more, and so on until all are done.
There's no real expectation of how many tasks will execute concurrently, and how many will be postponed.
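The second call is likely faster because the pool has already grown extra worker threads during the first call. Here's a quick sketch to see that effect (my code, not from the question; exact numbers depend on your machine):
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class PoolGrowthDemo
{
    static void RunHundredSleeps()
    {
        var sw = Stopwatch.StartNew();
        Task[] tasks = Enumerable.Range(0, 100)
            .Select(_ => Task.Run(() => Thread.Sleep(2000)))
            .ToArray();
        Task.WaitAll(tasks);
        Console.WriteLine($"{sw.ElapsedMilliseconds} ms");
    }

    static void Main()
    {
        RunHundredSleeps();                 // slow: the pool starts near the core count and adds threads gradually
        ThreadPool.SetMinThreads(100, 100); // allow 100 workers to start without the injection delay
        RunHundredSleeps();                 // now close to a single ~2000 ms batch
    }
}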

Parallel.Invoke gives a minimal performance increase if any

I wrote a simple console app to test the performance of Parallel.Invoke, based on Microsoft's example on MSDN:
public static void TestParallelInvokeSimple()
{
    ParallelOptions parallelOptions = new ParallelOptions { MaxDegreeOfParallelism = 1 }; // 1 to disable threads, -1 to enable them
    Parallel.Invoke(parallelOptions,
        () =>
        {
            Stopwatch sw = new Stopwatch();
            sw.Start();
            Console.WriteLine("Begin first task...");
            List<string> objects = new List<string>();
            for (int i = 0; i < 10000000; i++)
            {
                if (objects.Count > 0)
                {
                    string tempstr = string.Join("", objects.Last().Take(6).ToList());
                    objects.Add(tempstr + i);
                }
                else
                {
                    objects.Add("START!");
                }
            }
            sw.Stop();
            Console.WriteLine("End first task... {0} seconds", sw.Elapsed.TotalSeconds);
        },
        () =>
        {
            Stopwatch sw = new Stopwatch();
            sw.Start();
            Console.WriteLine("Begin second task...");
            List<string> objects = new List<string>();
            for (int i = 0; i < 10000000; i++)
            {
                objects.Add("abc" + i);
            }
            sw.Stop();
            Console.WriteLine("End second task... {0} seconds", sw.Elapsed.TotalSeconds);
        },
        () =>
        {
            Stopwatch sw = new Stopwatch();
            sw.Start();
            Console.WriteLine("Begin third task...");
            List<string> objects = new List<string>();
            for (int i = 0; i < 20000000; i++)
            {
                objects.Add("abc" + i);
            }
            sw.Stop();
            Console.WriteLine("End third task... {0} seconds", sw.Elapsed.TotalSeconds);
        }
    );
}
The ParallelOptions is to easily enable/disable threading.
When I disable threading I get the following output:
Begin first task...
End first task... 10.034647 seconds
Begin second task...
End second task... 3.5326487 seconds
Begin third task...
End third task... 6.8715266 seconds
done!
Total elapsed time: 20.4456563 seconds
Press any key to continue . . .
When I enable threading by setting MaxDegreeOfParallelism to -1 I get:
Begin third task...
Begin first task...
Begin second task...
End second task... 5.9112167 seconds
End third task... 13.113622 seconds
End first task... 19.5815043 seconds
done!
Total elapsed time: 19.5884057 seconds
Which is practically the same speed as sequential processing. Since task 1 takes the longest - about 10 seconds, I would expect the threading to take around 10 seconds total to run all 3 tasks. So what gives? Why is Parallel.Invoke running my tasks slower individually, yet in parallel?
BTW, I've seen the exact same results when using Parallel.Invoke in a real app performing many different tasks at the same time (most of which are running queries).
If you think it's my PC, think again... it's 1 year old, with 8 GB of RAM, Windows 8.1, and an Intel Core i7 2.7 GHz 8-core CPU. My PC is not overloaded; I watched the performance while running my tests over and over again. It never maxed out, but obviously showed CPU and memory increases while running.
I haven't profiled this, but the majority of the time here is probably being spent doing memory allocation for those lists and tiny strings. These "tasks" aren't actually doing anything other than growing the lists with minimal input and almost no processing time.
Consider that:
objects.Add("abc" + i);
is essentially just creating a new string and then adding it to a list. Creating a small string like this is largely just a memory allocation exercise since strings are stored on the heap. Furthermore, the memory allocated for the List is going to fill up rapidly - each time it does the list will re-allocate more memory for its own storage.
Now, heap allocations are serialized within a process - four threads inside one process cannot allocate memory at the same time. Requests for memory allocation are processed in sequence since the shared heap is like any other shared resource that needs to be protected from concurrent mayhem.
So what you have are three extremely memory-hungry threads that are probably spending most of their time waiting for each other to finish getting new memory.
Fill those methods with CPU-intensive work (i.e. do some math, etc.) and you'll see the results are very different, as in the sketch below. The lesson is that not all tasks are efficiently parallelizable, and not all in the ways that you might think. The above, for example, could be sped up by running each task within its own process - with its own private memory space there would be no contention for memory allocation.
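For instance, here's a rough sketch of the same three-task shape with the list-building swapped for pure arithmetic (the iteration counts are mine, purely illustrative):
// Same Parallel.Invoke shape, but each task does pure math instead of allocating strings.
Parallel.Invoke(
    () => { double a = 0; for (int i = 1; i < 300000000; i++) a += Math.Sqrt(i); Console.WriteLine("First task done."); },
    () => { double a = 0; for (int i = 1; i < 100000000; i++) a += Math.Sqrt(i); Console.WriteLine("Second task done."); },
    () => { double a = 0; for (int i = 1; i < 200000000; i++) a += Math.Sqrt(i); Console.WriteLine("Third task done."); }
);
// With parallelism enabled, the total time approaches the longest single task rather than the sum of all three.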

Parallel Invoke is taking too long to launch

I need some advice. I have an application that processes trade information from a real-time data feed from the stock exchanges. My processing is falling behind.
Since I'm running on a 3 GHz Intel i7 with 32 GB of main memory, I should have enough power for this application. The Parse routine stores trade information in a SQL Server 2014 database, running on a Windows 2012 R2 server.
I put the following timing information in the main processing loop:
invokeTime.Restart();
Parallel.Invoke(() => parserObj.Parse(julian, data));
invokeTime.Stop();
var milliseconds = invokeTime.ElapsedMilliseconds;
if (milliseconds > maxMilliseconds) {
    maxMilliseconds = milliseconds;
    messageBar.SetText("Invoke: " + milliseconds);
}
I'm getting as much as 1122 milliseconds to do the Parallel.Invoke. A similar timing test shows that the Parse routine itself only takes 7 milliseconds (max).
Is there a better way of processing the data, other than doing the Parallel.Invoke?
Any suggestions will be greatly appreciated.
Charles
Have you tried
Task.Factory.StartNew(() => {
    parserObj.Parse(julian, data);
});
What does your Parse method look like? Maybe the bottleneck is in there...
In Stephen Toub's article http://blogs.msdn.com/b/pfxteam/archive/2011/10/24/10229468.aspx he writes: "Task.Run can and should be used for the most common cases of simply offloading some work to be processed on the ThreadPool". That's exactly what I want to do: offload the Parse routine to a background thread. So I changed:
Parallel.Invoke(() => parserObj.Parse(julian, data));
to:
Task.Run(() => parserObj.Parse(julian, data));
I also increased the number of threads in the ThreadPool from 8 to 80 by doing:
int minWorker, minIOC;
ThreadPool.GetMinThreads(out minWorker, out minIOC);
var newMinWorker = 10 * minWorker;
var newMinIOC = 10 * minIOC;
if (ThreadPool.SetMinThreads(newMinWorker, newMinIOC)) {
    textBox.AddLine("The minimum no. of worker threads is now: " + newMinWorker);
} else {
    textBox.AddLine("Drat! The minimum no. of worker threads could not be changed.");
}
The parsing loop, which runs for 6 1/2 hours/day, looks like:
var stopWatch = new Stopwatch();
var maxMilliseconds = 0L;
while ((data = GetDataFromIQClient()) != null) {
    if (MarketHours.IsMarketClosedPlus2()) {
        break;
    }
    stopWatch.Restart();
    Task.Run(() => parserObj.Parse(julian, data));
    stopWatch.Stop();
    var milliseconds = stopWatch.ElapsedMilliseconds;
    if (milliseconds > maxMilliseconds) {
        maxMilliseconds = milliseconds;
        messageBar.SetText("Task.Run: " + milliseconds);
    }
}
Now, the maximum time spent to call Task.Run was 96 milliseconds, and the maximum time spent in parser was 18 milliseconds. I'm now keeping up with the data transmission.
Charles

Parallel Library: does a delay on one degree of parallelism delay all of them?

I have a ConcurrentBag<UrlInfo> urls whose items are being processed in parallel (nothing is being written back to the collection):
urls.AsParallel<UrlInfo>().WithDegreeOfParallelism(17).ForAll(item =>
{
    UrlInfo info = MakeSynchronousWebRequest(item);
    (myProgress as IProgress<UrlInfo>).Report(info);
});
I have the timeout set to 30 seconds in the web request. When a url that is very slow to respond is encountered, all of the parallel processing grinds to a halt. Is this expected behavior, or should I be searching out some problem in my code?
Here's the progress:
myProgress = new Progress<UrlInfo>(info =>
{
    Action action = () =>
    {
        Interlocked.Increment(ref itested);
        if (info.status == UrlInfo.UrlStatusCode.dead)
        {
            Interlocked.Increment(ref idead);
            this.BadUrls.Add(info);
        }
        dead.Content = idead.ToString();
        tested.Content = itested.ToString();
    };
    try
    {
        Dispatcher.BeginInvoke(action);
    }
    catch (Exception ex)
    {
    }
});
It's the expected behavior. AsParallel doesn't return until all the operations are finished. Since you're making synchronous requests, you've got to wait until your slowest one is finished. However note that even if you've got one really slow task hogging up a thread, the scheduler continues to schedule new tasks as old ones finish on the remaining threads.
Here's a rather instructive example. It creates 101 tasks. The first task hogs one thread for 5000 ms; the 100 others churn on the remaining 20 threads for 1000 ms each. So it schedules 20 of those tasks at a time, and they run for one second each, going through that cycle 5 times to get through all 100 tasks, for a total of 5000 ms. However, if you change the 101 to 102, you've got 101 tasks churning on the 20 threads, which will end up taking 6000 ms; that 101st task just didn't have a thread to churn on until the 5 sec mark. If you change the 101 to, say, 2, you'll note it still takes 5000 ms, because you have to wait for the slow task to complete.
static void Main()
{
    ThreadPool.SetMinThreads(21, 21);
    var sw = new Stopwatch();
    sw.Start();
    Enumerable.Range(0, 101).AsParallel().WithDegreeOfParallelism(21).ForAll(i => Thread.Sleep(i == 0 ? 5000 : 1000));
    Console.WriteLine(sw.ElapsedMilliseconds);
}
