Parallel Invoke is taking too long to launch - c#

I need some advice. I have an application that processes trade information from a real-time data feed from the stock exchanges, and my processing is falling behind.
Since I'm running on a 3 GHz Intel i7 with 32 GB of main memory, I should have enough power for this application. The Parse routine stores trade information in a SQL Server 2014 database running on a Windows 2012 R2 server.
I put the following timing information in the main processing loop:
invokeTime.Restart();
Parallel.Invoke(() => parserObj.Parse(julian, data));
invokeTime.Stop();
var milliseconds = invokeTime.ElapsedMilliseconds;
if (milliseconds > maxMilliseconds) {
    maxMilliseconds = milliseconds;
    messageBar.SetText("Invoke: " + milliseconds);
}
I'm seeing as much as 1122 milliseconds for the Parallel.Invoke call. A similar timing test shows that the Parse routine itself takes only 7 milliseconds (max).
Is there a better way of processing the data, other than doing the Parallel.Invoke?
Any suggestions will be greatly appreciated.
Charles

Have you tried
Task.Factory.StartNew(() => {
    parserObj.Parse(julian, data);
});
What does your Parse method look like? Maybe the bottleneck is in there...

In his article http://blogs.msdn.com/b/pfxteam/archive/2011/10/24/10229468.aspx, Stephen Toub writes: "Task.Run can and should be used for the most common cases of simply offloading some work to be processed on the ThreadPool". That's exactly what I want to do: offload the Parse routine to a background thread. So I changed:
Parallel.Invoke(() => parserObj.Parse(julian, data));
to:
Task.Run(() => parserObj.Parse(julian, data));
I also increased the minimum number of worker threads in the ThreadPool from 8 to 80:
int minWorker, minIOC;
ThreadPool.GetMinThreads(out minWorker, out minIOC);
var newMinWorker = 10 * minWorker;
var newMinIOC = 10 * minIOC;
if (ThreadPool.SetMinThreads(newMinWorker, newMinIOC)) {
    textBox.AddLine("The minimum no. of worker threads is now: " + newMinWorker);
} else {
    textBox.AddLine("Drat! The minimum no. of worker threads could not be changed.");
}
The parsing loop, which runs for 6 1/2 hours/day, looks like:
var stopWatch = new Stopwatch();
var maxMilliseconds = 0L;
while ((data = GetDataFromIQClient()) != null) {
    if (MarketHours.IsMarketClosedPlus2()) {
        break;
    }
    stopWatch.Restart();
    Task.Run(() => parserObj.Parse(julian, data));
    stopWatch.Stop();
    var milliseconds = stopWatch.ElapsedMilliseconds;
    if (milliseconds > maxMilliseconds) {
        maxMilliseconds = milliseconds;
        messageBar.SetText("Task.Run: " + milliseconds);
    }
}
Now the maximum time spent in the Task.Run call is 96 milliseconds, and the maximum time spent in the parser is 18 milliseconds. I'm now keeping up with the data transmission.
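For reference, if the per-message Task.Run overhead ever becomes an issue again, a single long-running consumer fed by a queue avoids starting a task per message. This is only a minimal sketch, assuming data is a string and reusing the GetDataFromIQClient and Parse calls from above (requires System.Collections.Concurrent); the queue capacity is an arbitrary choice:

var queue = new BlockingCollection<string>(boundedCapacity: 10000);

// One dedicated consumer drains the queue for the whole session.
var consumer = Task.Factory.StartNew(() => {
    foreach (var item in queue.GetConsumingEnumerable()) {
        parserObj.Parse(julian, item);
    }
}, TaskCreationOptions.LongRunning);

string data;
while ((data = GetDataFromIQClient()) != null) {
    queue.Add(data);        // cheap hand-off; no task started per message
}
queue.CompleteAdding();     // let the consumer drain the backlog and exit
consumer.Wait();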
Charles

Related

Multiple parallel Tasks in C# do not improve calculation time

I have a complicated math problem to solve and I decided to do some independent calculations in parallel to improve calculation time. In many CAE programs, like ANSYS or SolidWorks, it is possible to set multiple cores for that purpose.
I created a simple Windows Forms example to illustrate my problem. Here the function CalculateStuff() raises A from the Sample class to the power 1.2, max times. For 2 tasks it's max / 2 times each, and for 4 tasks it's max / 4 times each.
I measured the resulting time of operation both with a single CalculateStuff() function and with four duplicates (CalculateStuff1(), ...2(), ...3(), ...4(), one for each task) containing the same code. I'm not sure whether it matters to use the same function for each task (in any case, Math.Pow is the same). I also tried enabling and disabling the ProgressBar.
The table represents the time of operation (sec) for all 12 cases. I expected it to be roughly 2 and 4 times faster for 2 and 4 tasks, but in some cases 4 tasks are even worse than 1. My computer has 2 processors with 10 cores each. According to the Debug window, CPU usage increases with more tasks. What's wrong with my code here, or do I misunderstand something? Why do multiple tasks not improve the time of operation?
private readonly ulong max = 400000000ul;

// Sample class
private class Sample
{
    public double A { get; set; } = 1.0;
}

// Clear WinForm elements
private void Clear()
{
    PBar1.Value = PBar2.Value = PBar3.Value = PBar4.Value = 0;
    TextBox.Text = "";
}

// Button that launches 1 task
private async void BThr1_Click(object sender, EventArgs e)
{
    Clear();
    DateTime start = DateTime.Now;
    Sample sample = new Sample();
    await Task.Delay(100);
    Task t = Task.Run(() => CalculateStuff(sample, PBar1, max));
    await t;
    TextBox.Text = (DateTime.Now - start).ToString(@"hh\:mm\:ss");
    t.Dispose();
}

// Button that launches 2 tasks
private async void BThr2_Click(object sender, EventArgs e)
{
    Clear();
    DateTime start = DateTime.Now;
    Sample sample1 = new Sample();
    Sample sample2 = new Sample();
    await Task.Delay(100);
    Task t1 = Task.Run(() => CalculateStuff(sample1, PBar1, max / 2));
    Task t2 = Task.Run(() => CalculateStuff(sample2, PBar2, max / 2));
    await t1; await t2;
    TextBox.Text = (DateTime.Now - start).ToString(@"hh\:mm\:ss");
    t1.Dispose(); t2.Dispose();
}

// Button that launches 4 tasks
private async void BThr4_Click(object sender, EventArgs e)
{
    Clear();
    DateTime start = DateTime.Now;
    Sample sample1 = new Sample();
    Sample sample2 = new Sample();
    Sample sample3 = new Sample();
    Sample sample4 = new Sample();
    await Task.Delay(100);
    Task t1 = Task.Run(() => CalculateStuff(sample1, PBar1, max / 4));
    Task t2 = Task.Run(() => CalculateStuff(sample2, PBar2, max / 4));
    Task t3 = Task.Run(() => CalculateStuff(sample3, PBar3, max / 4));
    Task t4 = Task.Run(() => CalculateStuff(sample4, PBar4, max / 4));
    await t1; await t2; await t3; await t4;
    TextBox.Text = (DateTime.Now - start).ToString(@"hh\:mm\:ss");
    t1.Dispose(); t2.Dispose(); t3.Dispose(); t4.Dispose();
}

// Calculate some math stuff
private static void CalculateStuff(Sample s, ProgressBar pb, ulong max)
{
    ulong c = max / (ulong)pb.Maximum;
    for (ulong i = 1; i <= max; i++)
    {
        s.A = Math.Pow(s.A, 1.2);
        if (i % c == 0)
            pb.Invoke(new Action(() => pb.Value = (int)(i / c)));
    }
}
Tasks are not threads. "Asynchronous" does not mean "simultaneous".
What's wrong with my code here or do I misunderstand something?
You're misunderstanding what tasks are.
You should think of tasks as something that you can do in any order you desire. Take the example of a cooking recipe:
Cut the potatoes
Cut the vegetables
Cut the meat
If these were not tasks and it was synchronous code, you would always do these steps in the exact order they were listed.
If they were tasks, that doesn't mean these jobs will be done simultaneously. You are only one person (= one thread), and you can only do one thing at a time.
You can do the tasks in any order you like, you can possibly even halt one task to begin on another, but you still can't do more than one thing at the same time. Regardless of the order in which you complete the tasks, the total time taken to complete all three tasks remains the same, and this is not (inherently) any faster.
If they were threads, that's like hiring 3 chefs, which means these jobs can be done simultaneously.
Asynchronicity does cut down on idling time, when it is awaitable.
Do note that asynchronous code can lead to time gains in cases where your synchronous code would otherwise be idling, e.g. waiting for a network response. This is not taken into account in the above example, which is exactly why I listed "cut [x]" jobs rather than "wait for [x] to boil".
Your job (the calculation) is not asynchronous code. It never idles (in a way that it's awaitable) and therefore it runs synchronously. This means you're not getting any benefit from running this asynchronously.
Reducing your code to a simpler example:
private static void CalculateStuff(Sample s, ProgressBar pb, ulong max)
{
    Thread.Sleep(5000);
}
Very simply put, this job takes 5 seconds and cannot be awaited. If a single thread has to run 3 of these jobs, they will still be handled one after the other, taking 15 seconds in total.
If the job inside your tasks were actually awaitable, you would see a time benefit. E.g.:
private static async Task CalculateStuff(Sample s, ProgressBar pb, ulong max)
{
    await Task.Delay(5000);
}
This job takes 5 seconds but is awaitable. If you run 3 of these tasks at the same time, your thread will not waste time idling (i.e. waiting for the delay) and will instead start on the following task. Since it can await (i.e. do nothing for) these tasks at the same time, this means that the total processing time takes 5 seconds total (plus some negligible overhead cost).
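To make the contrast concrete, here is a minimal, self-contained console sketch (independent of the question's code) that runs three awaitable 5-second jobs together and then three blocking ones back to back:

using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

class AwaitableVsBlocking
{
    static async Task Main()
    {
        var sw = Stopwatch.StartNew();
        // Three awaitable 5-second jobs awaited together: the caller idles
        // on all three at once, so this takes about 5 seconds.
        await Task.WhenAll(Task.Delay(5000), Task.Delay(5000), Task.Delay(5000));
        Console.WriteLine($"awaitable: {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        // Three blocking 5-second jobs on one thread run back to back,
        // so this takes about 15 seconds.
        for (int i = 0; i < 3; i++) Thread.Sleep(5000);
        Console.WriteLine($"blocking:  {sw.ElapsedMilliseconds} ms");
    }
}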
According to Debug window, CPU usage increases with more tasks.
The managing of tasks incurs a small overhead, which means the total amount of work (which can be measured in CPU usage over time) is slightly higher compared to synchronous code. That is to be expected.
This small cost usually pales in comparison to the benefits gained from well written asynchronous code. However, your code is simply not leveraging the actual benefits from asynchronicity, so you're only seeing the overhead cost and not its benefits, which is why your monitoring is giving you the opposite result of what you were expecting.
My computer has 2 processors, 10 cores each.
CPU cores, threads and tasks are three very different beasts.
Tasks are handled by threads, but they don't necessarily have a one-to-one mapping. Take the example of a team of 4 developers which has 10 bugs to resolve. While this means it's impossible for all 10 bugs to be resolved at the same time, these developers (threads) can take on the tickets (tasks) one after the other, picking up a new ticket (task) whenever they finish their previous one.
CPU cores are like workstations. It makes little sense to have fewer workstations (CPU cores) than developers (threads), since you'll end up with idling developers.
Additionally, you might not want your developers to be able to claim all workstations. Maybe HR and accounting (= other OS processes) also need to have some guaranteed workstations so they can do their job.
The company (= computer) doesn't just grind to a halt because the developers are fixing some bugs. This is what used to happen on single core machines - if one process claims the CPU, nothing else can happen. If that one process takes long or hangs, everything freezes.
This is why we have a thread pool. There is no straightforward real-world-analogy here (unless maybe a consultancy firm that dynamically adjusts how many developers it sends to your company), but the thread pool is basically able to decide how many developers are allowed to work at the company at the same time in order to ensure that development tasks can be seen to as fast as possible while also ensuring other departments can still do their work on the workstations as well.
It's a careful balancing act, not sending too many developers as that floods the systems, while also not sending too few developers as that means the work gets done too slowly.
The exact configuration of your thread pool is not something I can troubleshoot over a simple Q&A. But the behavior you describe is consistent with having fewer CPUs (dedicated to your runtime) and/or threads than you have tasks.
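As a first step, it is worth checking what the runtime reports it actually has available; a quick sketch:

// Sanity check of the resources the runtime sees.
Console.WriteLine($"Logical processors: {Environment.ProcessorCount}");
ThreadPool.GetMinThreads(out int minWorker, out int minIo);
ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
Console.WriteLine($"Worker threads: min {minWorker}, max {maxWorker}");
Console.WriteLine($"I/O threads: min {minIo}, max {maxIo}");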
There are a lot of possible reasons that you might not see the performance gains you're expecting, including things like what else your machine's cores are getting used for at the moment. Running this trimmed-down version of your code, I am able to see a marked improvement when running parallel:
private IEnumerable<Sample> CalculateMany(int n)
{
    return Enumerable.Range(0, n)
        .AsParallel() // comment this to remove parallelism
        .Select(i => { var s = new Sample(); CalculateStuff(s, max / (ulong)n); return s; })
        .ToList();
}

// Calculate some math stuff
private static void CalculateStuff(Sample s, ulong max)
{
    for (ulong i = 1; i <= max; i++)
    {
        s.A = Math.Pow(s.A, 1.2);
    }
}
Here are the timings from running CalculateMany with n values of 1, 2, and 4, with and without parallelism [timing screenshots omitted]; the parallel runs show the marked improvement described above, while the non-parallel runs do not.
I see similar results using Task.Run():
private IEnumerable<Sample> CalculateMany(int n)
{
    var tasks = Enumerable.Range(0, n)
        .Select(i => Task.Run(() => { var s = new Sample(); CalculateStuff(s, max / (ulong)n); return s; }))
        .ToArray();
    Task.WaitAll(tasks);
    return tasks
        .Select(t => t.Result)
        .ToList();
}
Unfortunately I cannot give you a reason beyond, probably, some state machine magic, but this significantly increases performance:
private async void BThr4_Click(object sender, EventArgs e)
{
    Clear();
    DateTime start = DateTime.Now;
    await Task.Delay(100);
    Task<Sample> t1 = Task.Run(() => CalculateStuff(PBar1, max / 4));
    Task<Sample> t2 = Task.Run(() => CalculateStuff(PBar2, max / 4));
    Task<Sample> t3 = Task.Run(() => CalculateStuff(PBar3, max / 4));
    Task<Sample> t4 = Task.Run(() => CalculateStuff(PBar4, max / 4));
    Sample sample1 = await t1;
    Sample sample2 = await t2;
    Sample sample3 = await t3;
    Sample sample4 = await t4;
    TextBox.Text = (DateTime.Now - start).ToString(@"hh\:mm\:ss");
    t1.Dispose(); t2.Dispose(); t3.Dispose(); t4.Dispose();
}

// Calculate some math stuff
private static Sample CalculateStuff(ProgressBar pb, ulong max)
{
    Sample s = new Sample();
    ulong c = max / (ulong)pb.Maximum;
    for (ulong i = 1; i <= max; i++)
    {
        s.A = Math.Pow(s.A, 1.2);
        if (i % c == 0)
            pb.Invoke(new Action(() => pb.Value = (int)(i / c)));
    }
    return s;
}
This way you are not keeping Sample instances in the calling function for the tasks to access; instead, each task creates its own instance and simply returns it to the caller once it has completed.

Redis Get String appears to be slow

I have written some test code to retrieve 1000 strings from my Redis cache. Obviously it is getting the same string in this test but it was written to see how long it would take to get these 1000 items.
The test completes in 23 seconds, which is only around 43 strings per second; that seems quite slow.
I am running this locally against the Redis instance that is in Azure, so I’m assuming there will be some latency. Have I missed out something or is there a way to reduce the time to get these 1000 items?
In my production environment, there could be several thousand items that need to be retrieved.
class Program
{
    static async Task Main(string[] args)
    {
        var connectionString = @"testserver-rc.redis.cache.windows.net:6380,password=password,ssl=True,abortConnect=False,defaultDatabase=2";
        var redisClient = new StackExchangeRedisCacheClient(new NewtonsoftSerializer(), connectionString, 2);
        await TestGets(redisClient);
        Console.ReadLine();
    }

    private static async Task TestGets(StackExchangeRedisCacheClient redisClient)
    {
        Console.WriteLine("Running...");
        var sw = new Stopwatch();
        sw.Start();
        for (var i = 0; i < 1000; i++)
        {
            await redisClient.Database.StringGetAsync("test_T-0004");
        }
        Console.WriteLine($"{sw.Elapsed.Seconds} seconds");
    }
}
43 per second? That sounds pretty fast. It means that, including overhead and latency, you are spending 23 ms per query.
I think you want to parallelize the query.
Try replacing your query line with
await Task.WhenAll(Enumerable.Range(0, 1000).Select(i => redisClient.Database.StringGetAsync("test_T-0004")));
The problem is that you are latency bound. You are waiting for each request to complete before firing the next one off.
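If the keys are distinct in production, a single batched read also cuts out the round trips. A minimal sketch, assuming redisClient.Database exposes the underlying StackExchange.Redis IDatabase and that the key names follow the pattern above (which they may not in your real data):

// One MGET round trip instead of a thousand sequential GETs.
RedisKey[] keys = Enumerable.Range(0, 1000)
    .Select(i => (RedisKey)$"test_T-{i:D4}")
    .ToArray();
RedisValue[] values = await redisClient.Database.StringGetAsync(keys);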

MongoDB Find() is slow if several .NET Tasks run in the same time

I'm new to the MongoDB ecosystem and have run into a problem using MongoDB in my ASP.NET MVC application. If I call the Find method on a collection just once, it works very fast, just a few milliseconds. But if I call the Find method several times at the same moment using Task.Run(), it gets extremely slow and can even exceed the 30 second timeout.
Performance degradation numbers are like the following:
a single query takes ~0.002 seconds to finish
4 parallel queries take ~3 seconds to finish (average duration ~2.5 seconds)
40 parallel queries take 30+ seconds to finish, and some of them fail because of timeouts (~18 seconds to finish on average).
If I run the same 40 queries one after another without using tasks, all of them complete in less than a second (~0.7 sec), i.e. the same ~0.002 sec per query.
What I have already tried, without success:
It makes no difference whether I create the connection object inside the task code or create it once at the beginning and store it in a static field.
It also makes no difference which way I query the data:
collection.AsQueryable().Where(...).Take(1)
collection.Find(Builders<T>.Filter.Eq(...)).FirstOrDefault()
collection.Find(m => ...).FirstOrDefault()
Increasing the timeout avoids the timeout exceptions, but performance remains just as bad.
My environment:
I have very little data in my MongoDB collection (21 objects, 1995 bytes total, 95 bytes per object on average). The MongoDB server is on localhost, so the network should not be an issue here. The server version is 3.2.4, running as a Windows service. I get the same result using MongoDB hosted in a Linux VM in Azure. The .NET MongoDB driver is the latest one, installed from NuGet, version 2.3.0.157.
I cannot believe that such a mature system cannot handle a few queries at the same time, so I have probably missed something. Could someone help and point me in the right direction?
EDIT. The code sample I used for testing:
// IMongoDatabase database;
int iterations = 40;
var tasks = new Task<TimeSpan>[iterations];
for (int i = 0; i < iterations; i++) {
    var tempI = i;
    tasks[i] = new Task<TimeSpan>(() => {
        var stopwatch = new Stopwatch();
        stopwatch.Start();
        var integrationId = INTEGRATION_IDS[tempI];
        try {
            var metadataCollection = database.GetCollection<CacheMetadata>(METADATA_COLLECTION_NAME);
            var query = metadataCollection.AsQueryable()
                .Where(m => m.IntegrationId == integrationId)
                .Take(1);
            var metadata = query.FirstOrDefault();
            stopwatch.Stop();
            // Write elapsed time to the console
        }
        catch (Exception ex) {
            // Write the exception details to the console
        }
        return stopwatch.Elapsed;
    });
}
for (int i = 0; i < iterations; i++) {
    tasks[i].Start();
}
Task.WaitAll(tasks);
// Write summary to the console

Parallel.Invoke gives a minimal performance increase if any

I wrote a simple console app to test the performance of Parallel.Invoke based on Microsoft's example on msdn:
public static void TestParallelInvokeSimple()
{
    ParallelOptions parallelOptions = new ParallelOptions { MaxDegreeOfParallelism = 1 }; // 1 to disable threads, -1 to enable them
    Parallel.Invoke(parallelOptions,
        () =>
        {
            Stopwatch sw = new Stopwatch();
            sw.Start();
            Console.WriteLine("Begin first task...");
            List<string> objects = new List<string>();
            for (int i = 0; i < 10000000; i++)
            {
                if (objects.Count > 0)
                {
                    string tempstr = string.Join("", objects.Last().Take(6).ToList());
                    objects.Add(tempstr + i);
                }
                else
                {
                    objects.Add("START!");
                }
            }
            sw.Stop();
            Console.WriteLine("End first task... {0} seconds", sw.Elapsed.TotalSeconds);
        },
        () =>
        {
            Stopwatch sw = new Stopwatch();
            sw.Start();
            Console.WriteLine("Begin second task...");
            List<string> objects = new List<string>();
            for (int i = 0; i < 10000000; i++)
            {
                objects.Add("abc" + i);
            }
            sw.Stop();
            Console.WriteLine("End second task... {0} seconds", sw.Elapsed.TotalSeconds);
        },
        () =>
        {
            Stopwatch sw = new Stopwatch();
            sw.Start();
            Console.WriteLine("Begin third task...");
            List<string> objects = new List<string>();
            for (int i = 0; i < 20000000; i++)
            {
                objects.Add("abc" + i);
            }
            sw.Stop();
            Console.WriteLine("End third task... {0} seconds", sw.Elapsed.TotalSeconds);
        }
    );
}
The ParallelOptions argument makes it easy to enable or disable threading.
When I disable threading I get the following output:
Begin first task...
End first task... 10.034647 seconds
Begin second task...
End second task... 3.5326487 seconds
Begin third task...
End third task... 6.8715266 seconds
done!
Total elapsed time: 20.4456563 seconds
Press any key to continue . . .
When I enable threading by setting MaxDegreeOfParallelism to -1 I get:
Begin third task...
Begin first task...
Begin second task...
End second task... 5.9112167 seconds
End third task... 13.113622 seconds
End first task... 19.5815043 seconds
done!
Total elapsed time: 19.5884057 seconds
Which is practically the same speed as sequential processing. Since task 1 takes the longest, about 10 seconds, I would expect the threaded run to take around 10 seconds total for all 3 tasks. So what gives? Why does each task run slower under Parallel.Invoke, to the point that running them in parallel saves almost nothing?
BTW, I've seen the exact same results when using Parallel.Invoke in a real app performing many different tasks at the same time (most of which are running queries).
If you think it's my PC, think again... it's 1 year old, with 8 GB of RAM, Windows 8.1, and an 8-core Intel Core i7 at 2.7 GHz. My PC is not overloaded; I watched the performance while running my tests over and over again. It never maxed out, though it obviously showed CPU and memory increases while running.
I haven't profiled this, but the majority of the time here is probably being spent doing memory allocation for those lists and tiny strings. These "tasks" aren't actually doing anything other than growing the lists with minimal input and almost no processing time.
Consider that:
objects.Add("abc" + i);
is essentially just creating a new string and then adding it to a list. Creating a small string like this is largely just a memory allocation exercise since strings are stored on the heap. Furthermore, the memory allocated for the List is going to fill up rapidly - each time it does the list will re-allocate more memory for its own storage.
Now, heap allocations are serialized within a process - four threads inside one process cannot allocate memory at the same time. Requests for memory allocation are processed in sequence since the shared heap is like any other shared resource that needs to be protected from concurrent mayhem.
So what you have are three extremely memory-hungry threads that are probably spending most of their time waiting for each other to finish getting new memory.
Fill those methods with CPU-intensive work (i.e. do some math) and you'll see the results are very different. The lesson is that not all tasks are efficiently parallelizable, and not always in the ways you might think. The above, for example, could be sped up by running each task within its own process: with its own private memory space, there would be no contention for memory allocation.
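To illustrate, a body along these lines (a sketch; the function name and constants are made up) does pure arithmetic on locals and allocates nothing, so the three invocations scale far better:

// Pure CPU work with no heap allocation: each task crunches a local double,
// so the threads never queue up behind the allocator.
static double CrunchNumbers(int iterations)
{
    double acc = 1.000001;
    for (int i = 0; i < iterations; i++)
    {
        acc = Math.Sqrt(acc) * 1.000001;   // arbitrary math, no allocations
    }
    return acc;
}

// e.g. replace each list-building action with:
// () => Console.WriteLine(CrunchNumbers(200000000))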

Parallel Library: does a delay on one degree of parallelism delay all of them?

I have a ConcurrentBag urls whose items are being processed in parallel (nothing is being written back to the collection):
urls.AsParallel<UrlInfo>().WithDegreeOfParallelism(17).ForAll(item =>
{
    UrlInfo info = MakeSynchronousWebRequest(item);
    (myProgress as IProgress<UrlInfo>).Report(info);
});
I have the timeout set to 30 seconds in the web request. When a url that is very slow to respond is encountered, all of the parallel processing grinds to a halt. Is this expected behavior, or should I be searching out some problem in my code?
Here's the progress :
myProgress = new Progress<UrlInfo>(info =>
{
    Action action = () =>
    {
        Interlocked.Increment(ref itested);
        if (info.status == UrlInfo.UrlStatusCode.dead)
        {
            Interlocked.Increment(ref idead);
            this.BadUrls.Add(info);
        }
        dead.Content = idead.ToString();
        tested.Content = itested.ToString();
    };
    try
    {
        Dispatcher.BeginInvoke(action);
    }
    catch (Exception ex)
    {
    }
});
It's the expected behavior. AsParallel doesn't return until all the operations are finished. Since you're making synchronous requests, you've got to wait until your slowest one is finished. However note that even if you've got one really slow task hogging up a thread, the scheduler continues to schedule new tasks as old ones finish on the remaining threads.
Here's a rather instructive example. It creates 101 tasks. The first task hogs one thread for 5000 ms; the other 100 churn on the remaining 20 threads for 1000 ms each. So it schedules 20 of those tasks at a time, each batch running for one second, and goes through that cycle 5 times to get through all 100 tasks, for a total of 5000 ms. However, if you change the 101 to 102, you've got 101 tasks churning on the 20 threads, which ends up taking 6000 ms; that 101st task just doesn't have a thread to churn on until the 5-second mark. And if you change the 101 to, say, 2, note that it still takes 5000 ms, because you have to wait for the slow task to complete.
static void Main()
{
    ThreadPool.SetMinThreads(21, 21);
    var sw = new Stopwatch();
    sw.Start();
    Enumerable.Range(0, 101).AsParallel().WithDegreeOfParallelism(21).ForAll(i => Thread.Sleep(i == 0 ? 5000 : 1000));
    Console.WriteLine(sw.ElapsedMilliseconds);
}
