How does sequential loop run faster than Parallel loop in C#?

I tried a very minimal example:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Threading;
using System.Collections.Concurrent;
using System.Diagnostics;
namespace TPLExample {
class Program {
static void Main(string[] args) {
int[] dataItems = new int[100];
double[] resultItems = new double[100];
for (int i = 0; i < dataItems.Length; ++i) {
dataItems[i] = i;
}
Stopwatch stopwatch = new Stopwatch();
stopwatch.Reset();
stopwatch.Start();
Parallel.For(0, dataItems.Length, (index) => {
resultItems[index] = Math.Pow(dataItems[index], 2);
});
stopwatch.Stop();
Console.WriteLine("TPL Time elapsed: {0}", stopwatch.Elapsed);
stopwatch.Reset();
stopwatch.Start();
for (int i = 0; i < dataItems.Length; ++i) {
resultItems[i] = Math.Pow(dataItems[i], 2);
}
stopwatch.Stop();
Console.WriteLine("Sequential Time elapsed: {0}", stopwatch.Elapsed);
WaitForEnterKey();
}
public static void WaitForEnterKey() {
Console.WriteLine("Press enter to finish");
Console.ReadLine();
}
public static void PrintMessage() {
Console.WriteLine("Message printed");
}
}
}
The output was:
TPL Time elapsed: 00:00:00.0010670
Sequential Time elapsed: 00:00:00.0000178
Press enter to finish
The sequential loop is way faster than the TPL version! How is this possible? From my understanding, the calculation within Parallel.For is executed in parallel, so shouldn't it be faster?

Simply put: for only iterating over a hundred items and performing a small mathematical operation, spawning new threads and waiting for them to complete produces more overhead than just running through the loop would.
From my understanding, the calculation within Parallel.For is executed in parallel, so shouldn't it be faster?
As generally happens when people make sweeping statements about computer performance, there are far more variables at play here, and you can't really make that assumption. For example, inside your for loop, you are doing nothing more than Math.Pow, which the processor can perform very quickly. If this were an I/O-intensive operation, requiring each thread to wait a long time, or even a series of processor-intensive operations, you would get more out of parallel processing (assuming you have a multi-core processor). But as it is, the overhead of creating and synchronizing these threads is far greater than any advantage that parallelism might give you.

Parallel loop processing is beneficial when the operation performed within the loop is relatively costly. All you're doing in your example is calculating an exponent, which is trivial. The overhead of multithreading is far outweighing the gains that you're getting in this case.
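One way to shrink the per-iteration overhead described here is to hand each worker a contiguous range of indices instead of invoking a delegate once per element. The sketch below assumes .NET 4 or later and uses Partitioner.Create from System.Collections.Concurrent; the class and method names (ChunkedSquares, SquareSequential, SquareChunked) are made up for illustration. For only 100 trivial items the plain loop will most likely still win.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading.Tasks;

static class ChunkedSquares
{
    // Plain sequential loop, used as the reference result.
    public static double[] SquareSequential(int[] data)
    {
        var result = new double[data.Length];
        for (int i = 0; i < data.Length; i++)
        {
            result[i] = Math.Pow(data[i], 2);
        }
        return result;
    }

    // Parallel version: each delegate call handles a whole index range,
    // so the scheduling cost is paid once per chunk, not once per element.
    public static double[] SquareChunked(int[] data)
    {
        var result = new double[data.Length];
        if (data.Length == 0) return result; // Partitioner.Create rejects empty ranges
        Parallel.ForEach(Partitioner.Create(0, data.Length), range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
            {
                result[i] = Math.Pow(data[i], 2);
            }
        });
        return result;
    }

    static void Main()
    {
        var data = new int[1000000];
        for (int i = 0; i < data.Length; i++) data[i] = i % 1000;

        var sw = Stopwatch.StartNew();
        SquareSequential(data);
        Console.WriteLine("Sequential: {0}", sw.Elapsed);

        sw.Restart();
        SquareChunked(data);
        Console.WriteLine("Chunked parallel: {0}", sw.Elapsed);
    }
}
```

Timings depend heavily on the machine and on JIT warm-up, so treat any single run as a rough indication only.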

This code example is practical proof of the really nice answers above.
I've simulated an expensive operation by simply blocking the thread with Thread.Sleep.
The output was:
Sequential Loop - 00:00:09.9995500
Parallel Loop - 00:00:03.0347901
class Program
{
static void Main(string[] args)
{
const int a = 10;
Stopwatch sw = new Stopwatch();
sw.Start();
//for (long i = 0; i < a; i++)
//{
// Thread.Sleep(1000);
//}
Parallel.For(0, a, i =>
{
Thread.Sleep(1000);
});
sw.Stop();
Console.WriteLine(sw.Elapsed);
Console.ReadLine();
}
}

The overhead of parallelization is far greater than simply running Math.Pow 100 times sequentially. The others have said this.
More importantly, though, the memory access is trivial in the sequential version, but with the parallel version, the threads have to share memory (resultItems) and that kind of thing will really kill you even if you have a million items.
See page 44 of this excellent Microsoft whitepaper on parallel programming:
http://www.microsoft.com/en-us/download/details.aspx?id=19222. Here is an MSDN magazine article on the subject: http://msdn.microsoft.com/en-us/magazine/cc872851.aspx

Related

Parallel.Invoke gives a minimal performance increase if any

I wrote a simple console app to test the performance of Parallel.Invoke based on Microsoft's example on msdn:
public static void TestParallelInvokeSimple()
{
ParallelOptions parallelOptions = new ParallelOptions { MaxDegreeOfParallelism = 1 }; // 1 to disable threads, -1 to enable them
Parallel.Invoke(parallelOptions,
() =>
{
Stopwatch sw = new Stopwatch();
sw.Start();
Console.WriteLine("Begin first task...");
List<string> objects = new List<string>();
for (int i = 0; i < 10000000; i++)
{
if (objects.Count > 0)
{
string tempstr = string.Join("", objects.Last().Take(6).ToList());
objects.Add(tempstr + i);
}
else
{
objects.Add("START!");
}
}
sw.Stop();
Console.WriteLine("End first task... {0} seconds", sw.Elapsed.TotalSeconds);
},
() =>
{
Stopwatch sw = new Stopwatch();
sw.Start();
Console.WriteLine("Begin second task...");
List<string> objects = new List<string>();
for (int i = 0; i < 10000000; i++)
{
objects.Add("abc" + i);
}
sw.Stop();
Console.WriteLine("End second task... {0} seconds", sw.Elapsed.TotalSeconds);
},
() =>
{
Stopwatch sw = new Stopwatch();
sw.Start();
Console.WriteLine("Begin third task...");
List<string> objects = new List<string>();
for (int i = 0; i < 20000000; i++)
{
objects.Add("abc" + i);
}
sw.Stop();
Console.WriteLine("End third task... {0} seconds", sw.Elapsed.TotalSeconds);
}
);
}
The ParallelOptions is to easily enable/disable threading.
When I disable threading I get the following output:
Begin first task...
End first task... 10.034647 seconds
Begin second task...
End second task... 3.5326487 seconds
Begin third task...
End third task... 6.8715266 seconds
done!
Total elapsed time: 20.4456563 seconds
Press any key to continue . . .
When I enable threading by setting MaxDegreeOfParallelism to -1 I get:
Begin third task...
Begin first task...
Begin second task...
End second task... 5.9112167 seconds
End third task... 13.113622 seconds
End first task... 19.5815043 seconds
done!
Total elapsed time: 19.5884057 seconds
Which is practically the same speed as sequential processing. Since task 1 takes the longest - about 10 seconds, I would expect the threading to take around 10 seconds total to run all 3 tasks. So what gives? Why is Parallel.Invoke running my tasks slower individually, yet in parallel?
BTW, I've seen the exact same results when using Parallel.Invoke in a real app performing many different tasks at the same time (most of which are running queries).
If you think it's my pc, think again... it's 1 year old, with 8GB of RAM, windows 8.1, Intel Core I7 2.7GHz 8 core cpu. My PC is not overloaded as I watched the performance while running my tests over and over again. My PC never maxed out but obviously showed cpu and memory increase when running.

I haven't profiled this, but the majority of the time here is probably being spent doing memory allocation for those lists and tiny strings. These "tasks" aren't actually doing anything other than growing the lists with minimal input and almost no processing time.
Consider that:
objects.Add("abc" + i);
is essentially just creating a new string and then adding it to a list. Creating a small string like this is largely just a memory allocation exercise since strings are stored on the heap. Furthermore, the memory allocated for the List is going to fill up rapidly - each time it does the list will re-allocate more memory for its own storage.
Now, heap allocations are serialized within a process - four threads inside one process cannot allocate memory at the same time. Requests for memory allocation are processed in sequence since the shared heap is like any other shared resource that needs to be protected from concurrent mayhem.
So what you have are three extremely memory-hungry threads that are probably spending most of their time waiting for each other to finish getting new memory.
Fill those methods with CPU-intensive work (i.e., do some math) and you'll see that the results are very different. The lesson is that not all tasks are efficiently parallelizable, and not always in the ways that you might think. The above, for example, could be sped up by running each task within its own process: with its own private memory space there would be no contention for memory allocation.
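To try the CPU-bound variant suggested above, something along these lines could replace the list-building lambdas. It is only a sketch: the class name CpuBoundInvoke, the helper SumOfSquareRoots and the iteration counts are all invented for illustration, and the loop bodies deliberately avoid heap allocation.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

static class CpuBoundInvoke
{
    // Pure arithmetic: nothing is allocated inside the loop.
    public static double SumOfSquareRoots(int count)
    {
        double sum = 0;
        for (int i = 1; i <= count; i++)
        {
            sum += Math.Sqrt(i);
        }
        return sum;
    }

    // Runs three independent CPU-bound actions; 1 disables parallelism, -1 enables it.
    public static double[] Run(int maxDegreeOfParallelism, int iterations)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism };
        var results = new double[3];
        Parallel.Invoke(options,
            () => { results[0] = SumOfSquareRoots(iterations); },
            () => { results[1] = SumOfSquareRoots(iterations); },
            () => { results[2] = SumOfSquareRoots(2 * iterations); });
        return results;
    }

    static void Main()
    {
        var sw = Stopwatch.StartNew();
        Run(1, 100000000);
        Console.WriteLine("MaxDegreeOfParallelism = 1: {0} seconds", sw.Elapsed.TotalSeconds);

        sw.Restart();
        Run(-1, 100000000);
        Console.WriteLine("MaxDegreeOfParallelism = -1: {0} seconds", sw.Elapsed.TotalSeconds);
    }
}
```

Each action writes to its own array slot, so no locking is needed for the results.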

C# WebClient with Task.Run only achieve 5% network usage. WHY?

I am experimenting / learning the new Task library and I have written a very simple html downloader using WebClient and Task.Run. However I can never reach anything more than 5% on my network usage. I would like to understand why and how I can improve my code to reach 100% network usage / throughput (probably not possible but it has to be a lot more than 5%).
I would also like to be able to limit the number of threads; however, it seems it's not as easy as I thought (i.e. it needs a custom task scheduler). Is there a way to just do something like this to set the max thread count: something.SetMaxThread(2)?
internal static class Program
{
private static void Main()
{
for (var i = 0; i < 1000000; i++)
{
Go(i, Thread.CurrentThread.ManagedThreadId);
}
Console.Read();
}
private static readonly Action<int, int> Go = (counter, threadId) => Task.Run(() =>
{
var stopwatch = new Stopwatch();
stopwatch.Start();
var webClient = new WebClient();
webClient.DownloadString(new Uri("http://stackoverflow.com"));
stopwatch.Stop();
Console.Write("{0} == {1} | ", threadId.ToString("D3"), Thread.CurrentThread.ManagedThreadId.ToString("D3"));
Console.WriteLine("{0}: {1}ms ", counter.ToString("D3"), stopwatch.ElapsedMilliseconds.ToString("D4"));
});
}
This is the async version according to @spender. However, my understanding is that await will "remember" the point in time, hand off the download to the OS level, skip the two Console.Write calls, return to Main immediately and continue scheduling the remaining Go calls in the for loop. Am I understanding it correctly? So there's no blocking on the UI.
private static async void Go(int counter, int threadId)
{
using (var webClient = new WebClient())
{
var stopWatch = new Stopwatch();
stopWatch.Start();
await webClient.DownloadStringTaskAsync(new Uri("http://ftp.iinet.net.au/test500MB.dat"));
stopWatch.Stop();
Console.Write("{0} == {1} | ", threadId.ToString("D3"), Thread.CurrentThread.ManagedThreadId.ToString("D3"));
Console.WriteLine("{0}: {1}ms ", counter.ToString("D3"), stopWatch.ElapsedMilliseconds.ToString("D4"));
}
}
What I noticed was that when I am downloading large files there's not that much difference in terms of download speed / network usage. Both the threading version and the async version peaked at about 12.5% network usage and about 12 MByte/sec download speed. I also tried to run multiple instances (multiple .exe running) and again there's no huge difference between the two. And when I am trying to download large files from 2 URLs concurrently (20 instances) I get similar network usage (12.5%) and download speed (10-12 MByte/sec). I guess I am reaching the peak?

As it stands, your code is suboptimal because, although you are using Task.Run to create asynchronous code that runs in the ThreadPool, the code that is being run in the ThreadPool is still blocking on the line:
webClient.DownloadString(...
This amounts to an abuse of the ThreadPool because it is not designed to run blocking tasks, and is slow to spin up additional threads to deal with peaks in workload. This in turn will have a seriously degrading effect on the smooth running of any API that uses the ThreadPool (timers, async callbacks, they're everywhere), because they'll schedule work that goes to the back of the (saturated) queue for the ThreadPool (which is tied up reluctantly and slowly spinning up hundreds of threads that are going to spend 99.9% of their time doing nothing).
Stop blocking the ThreadPool and switch to proper async methods that do not block.
So now you can literally break your router and seriously upset the SO site admins with the following simple mod:
private static void Main()
{
for (var i = 0; i < 1000000; i++)
{
Go(i, Thread.CurrentThread.ManagedThreadId);
}
Console.Read();
}
private static async Task Go(int counter, int threadId)
{
var stopwatch = new Stopwatch();
stopwatch.Start();
using (var webClient = new WebClient())
{
await webClient.DownloadStringTaskAsync(
new Uri("http://stackoverflow.com"));
}
//...
}
HttpWebRequest (and therefore WebClient) are also constrained by a number of limits.
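On the question of capping concurrency, one common approach with async code is a SemaphoreSlim gate rather than a thread limit. The following is a minimal sketch, assuming .NET Framework 4.5 or later; ThrottledDownloads and RunThrottledAsync are hypothetical names, and the values 20 and 4 are arbitrary. One of the limits mentioned above is ServicePointManager.DefaultConnectionLimit, which on .NET Framework caps concurrent connections per host (2 by default for client applications).

```csharp
using System;
using System.Linq;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

static class ThrottledDownloads
{
    // Runs work(0) .. work(count - 1) with at most maxConcurrency operations in flight.
    public static async Task RunThrottledAsync(int count, int maxConcurrency, Func<int, Task> work)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = Enumerable.Range(0, count).Select(async i =>
            {
                await gate.WaitAsync();      // waits without blocking a thread
                try { await work(i); }
                finally { gate.Release(); }
            }).ToList();                     // ToList starts every task exactly once
            await Task.WhenAll(tasks);
        }
    }

    static void Main()
    {
        // Assumption: .NET Framework, where this setting limits connections per host.
        ServicePointManager.DefaultConnectionLimit = 4;

        RunThrottledAsync(20, 4, async i =>
        {
            using (var webClient = new WebClient())
            {
                string html = await webClient.DownloadStringTaskAsync(new Uri("http://stackoverflow.com"));
                Console.WriteLine("{0}: {1} chars", i, html.Length);
            }
        }).GetAwaiter().GetResult();
    }
}
```

Keeping the request count small (20 here rather than 1000000) also avoids hammering the target site while experimenting.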

Multithreaded code executes by threadnumber-times slower using System.Threading and Visual Studio C# Express Hosting Process

I have a very simple program counting the characters in a string. An integer threadnum sets the number of threads and divides the data by threadnum accordingly into chunks for each thread to process.
Each thread increments the values contained in a shared dictionary, building a character histogram.
private Dictionary<UInt32, int> dict = new Dictionary<UInt32, int>();
In order to wait for all threads to finish and continue with the main process, I invoke Thread.Join
Initially I had a local dictionary for each thread which get merged afterwards, but a shared dictionary worked fine, without locking.
No references are locked in the method BuildDictionary, though locking the dictionary did not significantly impact thread-execution time.
Each thread is timed, and the resulting dictionary compared.
The dictionary content is the same regardless of a single or multiple threads - as it should be.
Each thread takes a fraction determined by threadnum to complete - as it should be.
Problem:
The total time is roughly a multiple of threadnum; that is to say, the execution time increases with the number of threads.
(Unfortunately I cannot run a C# Profiler at the moment. Additionally I would prefer C# 3 code compatibility. )
Others are likely struggling as well. It may be that the VS 2010 express edition vshost process stacks and schedules threads to be run sequentially?
Another MT-performance issue was recently posted here as "Visual Studio C# 2010 Express Debug running Faster than Release".
Code:
public int threadnum = 8;
Thread[] threads = new Thread[threadnum];
Stopwatch stpwtch = new Stopwatch();
stpwtch.Start();
for (var threadidx = 0; threadidx < threadnum; threadidx++)
{
threads[threadidx] = new Thread(BuildDictionary);
threads[threadidx].Start(threadidx);
threads[threadidx].Join(); //Blocks the calling thread, till thread completion
}
WriteLine("Total - time: {0} msec", stpwtch.ElapsedMilliseconds);
Can you help please?
Update:
It appears that the strange behavior of an almost linear slowdown with increasing thread-number is an artifact due to the numerous hooks of the IDE's Debugger.
Running the process outside the developer environment, I actually do get a 30% speed increase on a 2 logical/physical core machine. During debugging I am already at the high end of CPU utilization, and hence I suspect it is wise to have some leeway during development through additional idle cores.
As I did initially, I now let each thread compute on its own local data chunk, which is written back under a lock to a shared list and aggregated after all threads have finished.
Conclusion:
Be heedful of the environment the process is running in.

We can put the dictionary synchronization issues Tony the Lion mentions in his answer aside for the moment, because in your current implementation you are in fact not running anything in parallel!
Let's take a look at what you are currently doing in your loop:
Start a thread.
Wait for the thread to complete.
Start the next thread.
In other words, you should not be calling Join inside the loop.
Instead, you should start all threads as you are doing, but use a signaling construct such as an AutoResetEvent to determine when all threads have completed.
See example program:
class Program
{
static EventWaitHandle _waitHandle = new AutoResetEvent(false);
static void Main(string[] args)
{
int numThreads = 5;
for (int i = 0; i < numThreads; i++)
{
new Thread(DoWork).Start(i);
}
for (int i = 0; i < numThreads; i++)
{
_waitHandle.WaitOne();
}
Console.WriteLine("All threads finished");
}
static void DoWork(object id)
{
Thread.Sleep(1000);
Console.WriteLine(String.Format("Thread {0} completed", (int)id));
_waitHandle.Set();
}
}
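Alternatively, you could just as well call Join in the second loop if you have references to the threads available. That is also the more robust option: two Set calls on an AutoResetEvent that arrive before the waiting thread has been released can collapse into a single signal, leaving the main thread waiting forever.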
After you have done this you can and should worry about the dictionary synchronization problems.

From MSDN: a Dictionary can support multiple readers concurrently, as long as the collection is not modified.
You say:
but a shared dictionary worked fine, without locking.
Each thread increments the values contained in a shared dictionary
Your program is by definition broken: if you alter the data in the dictionary without proper locking, you will end up with bugs. Nothing more needs to be said.

I wouldn't use some shared static Dictionary, if each thread worked on a local copy you could amalgamate your results once all threads had signalled completion.
WaitHandle.WaitAll avoids any deadlocking on an AutoResetEvent.
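using System;
using System.Collections.Generic;
using System.Threading;
// Inclusive [Start, End] index range of the text handled by one thread.
class Partition
{
    public int Start;
    public int End;
}
class Program
{
    static void Main()
    {
        char[] text = "Some String".ToCharArray();
        int numThreads = 5;
        // A smarter split is left to the OP; PartitionWork below is only a simple even one.
        Partition[] partitions = PartitionWork(text, numThreads);
        ManualResetEvent[] completions = new ManualResetEvent[numThreads];
        IDictionary<char, int>[] results = new IDictionary<char, int>[numThreads];
        for (int i = 0; i < numThreads; i++)
        {
            results[i] = new Dictionary<char, int>();
            completions[i] = new ManualResetEvent(false);
            // Per-iteration copies, so each lambda captures its own values.
            Partition partition = partitions[i];
            IDictionary<char, int> local = results[i];
            ManualResetEvent completed = completions[i];
            new Thread(() => BuildDictionary(
                text, partition.Start, partition.End, local, completed)).Start();
        }
        // WaitAll timeouts may not exceed Int32.MaxValue milliseconds (about 24.8 days).
        if (WaitHandle.WaitAll(completions, new TimeSpan(24, 0, 0, 0)))
        {
            Console.WriteLine("All threads finished");
        }
        else
        {
            Console.WriteLine("Timed out after 24 days");
        }
        // Merge the results
        IDictionary<char, int> merged = results[0];
        for (int i = 1; i < numThreads; i++)
        {
            foreach (KeyValuePair<char, int> item in results[i])
            {
                if (merged.ContainsKey(item.Key))
                {
                    merged[item.Key] += item.Value;
                }
                else
                {
                    merged.Add(item.Key, item.Value);
                }
            }
        }
    }
    // Simple even split; the last partition takes the remainder.
    static Partition[] PartitionWork(char[] text, int count)
    {
        Partition[] partitions = new Partition[count];
        int chunk = text.Length / count;
        for (int i = 0; i < count; i++)
        {
            partitions[i] = new Partition();
            partitions[i].Start = i * chunk;
            partitions[i].End = (i == count - 1) ? text.Length - 1 : (i + 1) * chunk - 1;
        }
        return partitions;
    }
    static void BuildDictionary(
        char[] text,
        int start,
        int finish,
        IDictionary<char, int> result,
        EventWaitHandle completed)
    {
        for (int i = start; i <= finish; i++)
        {
            if (result.ContainsKey(text[i]))
            {
                result[text[i]]++;
            }
            else
            {
                result.Add(text[i], 1);
            }
        }
        // Set is defined on EventWaitHandle, not on the WaitHandle base class.
        completed.Set();
    }
}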
With this implementation the only variable that is ever shared is the char[] of the text and that is always read only.
You do have the burden of merging the dictionaries at the end, but that is a small price for avoiding any concurrency issues. In a later version of the framework I would have used TPL and ConcurrentDictionary and possibly Partitioner<TSource>.
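For readers on .NET 4 or later, the TPL variant mentioned above might look roughly like the sketch below. It is not the answerer's code: ParallelHistogram and Count are illustrative names, and the design (one private Dictionary per worker, merged into a ConcurrentDictionary when the worker finishes) is just one reasonable way to combine Parallel.ForEach, Partitioner.Create and ConcurrentDictionary.AddOrUpdate.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

static class ParallelHistogram
{
    public static ConcurrentDictionary<char, int> Count(char[] text)
    {
        var histogram = new ConcurrentDictionary<char, int>();
        if (text.Length == 0) return histogram; // Partitioner.Create rejects empty ranges

        Parallel.ForEach<Tuple<int, int>, Dictionary<char, int>>(
            Partitioner.Create(0, text.Length),
            // localInit: every worker starts with its own private dictionary.
            () => new Dictionary<char, int>(),
            // body: count one index range into the worker-local dictionary.
            (range, loopState, local) =>
            {
                for (int i = range.Item1; i < range.Item2; i++)
                {
                    int count;
                    local.TryGetValue(text[i], out count); // count stays 0 if the key is absent
                    local[text[i]] = count + 1;
                }
                return local;
            },
            // localFinally: merge the private counts into the shared result.
            local =>
            {
                foreach (KeyValuePair<char, int> pair in local)
                {
                    histogram.AddOrUpdate(pair.Key, pair.Value, (key, existing) => existing + pair.Value);
                }
            });
        return histogram;
    }

    static void Main()
    {
        foreach (KeyValuePair<char, int> pair in Count("Some String".ToCharArray()))
        {
            Console.WriteLine("'{0}': {1}", pair.Key, pair.Value);
        }
    }
}
```

The shared dictionary is touched only once per worker per distinct character, so contention stays low even for long inputs.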

I totally agree with TonyTheLion and others, and as you fix the actual problem with join'ing at the wrong place, there still will be problem with (no) locks and updating the shared dictionary. I wanted to drop you a quick workaround: just wrap your integer value into some object:
instead of:
Dictionary<uint, int> dict = new Dictionary<uint, int>();
use:
class Entry { public int value; }
Dictionary<uint, Entry> dict = new Dictionary<uint, Entry>();
and now increment Entry.value instead. That way, the Dictionary itself will not notice any changes, and it will be safe without locking the dictionary.
Note: this will, however, work only if each thread is guaranteed to use only its own Entry. I've just noticed this is not true here, as you said 'histogram of characters'. You will have to lock each Entry during the increment, or some increments may be lost. Still, locking at the Entry level will be significantly faster than locking the whole dictionary.
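A sketch of the per-Entry locking idea follows. It rests on one extra assumption that the workaround needs in order to be safe: every Entry is created up front on a single thread, so the worker threads never add or remove keys and only read the dictionary structure. The names EntryHistogram and Count, and the strided split of the input, are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class Entry { public int value; }

static class EntryHistogram
{
    public static Dictionary<uint, Entry> Count(uint[] data, int threadCount)
    {
        // Create every Entry before any worker starts, so the workers
        // never modify the dictionary itself.
        var dict = new Dictionary<uint, Entry>();
        foreach (uint key in data)
        {
            if (!dict.ContainsKey(key)) dict.Add(key, new Entry());
        }

        var threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            int offset = t; // per-thread copy for the lambda
            threads[t] = new Thread(() =>
            {
                // This worker handles indices offset, offset + threadCount, ...
                for (int i = offset; i < data.Length; i += threadCount)
                {
                    Entry entry = dict[data[i]];
                    lock (entry) { entry.value++; } // lock one Entry, not the dictionary
                }
            });
            threads[t].Start();
        }
        foreach (Thread thread in threads) thread.Join();
        return dict;
    }

    static void Main()
    {
        uint[] data = { 1, 2, 2, 3, 3, 3 };
        foreach (KeyValuePair<uint, Entry> pair in Count(data, 2))
        {
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value.value);
        }
    }
}
```

Interlocked.Increment(ref entry.value) would be a lighter-weight alternative to the lock for a plain counter.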

Rotem saw it.
Your main thread should Join the other threads only after having started all of them.
Otherwise it waits for the first thread to finish before starting and waiting for the second one.
for (var threadidx = 0; threadidx < threadnum; threadidx++)
{
threads[threadidx] = new Thread(BuildDictionary);
threads[threadidx].Start(threadidx);
}
for (var threadidx = 0; threadidx < threadnum; threadidx++)
{
threads[threadidx].Join(); //Blocks the calling thread, till thread completion
}

As Rotem points out, by joining in the loop you are waiting for each thread to complete before continuing.
The hint for why this is can be found in the Thread.Join documentation on MSDN:
Blocks the calling thread until a thread terminates
So your loop will not continue until that one thread has completed its work. To start all the threads and then wait for them to complete, join them outside the loop:
public int threadnum = 8;
Thread[] threads = new Thread[threadnum];
Stopwatch stpwtch = new Stopwatch();
stpwtch.Start();
// Start all the threads doing their work
for (var threadidx = 0; threadidx < threadnum; threadidx++)
{
threads[threadidx] = new Thread(BuildDictionary);
threads[threadidx].Start(threadidx);
}
// Join to all the threads to wait for them to complete
for (var threadidx = 0; threadidx < threadnum; threadidx++)
{
threads[threadidx].Join();
}
System.Diagnostics.Debug.WriteLine("Total - time: {0} msec", stpwtch.ElapsedMilliseconds);
You will really need to post your BuildDictionary function. It is very likely that the operation will be no faster with multiple threads and the threading overhead will actually increase execution time.

Multithreading and speeding things up

I have the following piece of code. I wish to start the file creation on multiple threads. The objective is that it will take less time to create 10 files when I do it on multiple threads. As I understand I need to introduce the element of asynchronous calls to make that happen.
What changes should I make in this piece of code?
using System;
using System.Text;
using System.Threading;
using System.IO;
using System.Diagnostics;
namespace MultiDemo
{
class MultiDemo
{
public static void Main()
{
var stopWatch = new Stopwatch();
stopWatch.Start();
// Create an instance of the test class.
var ad = new MultiDemo();
//Should create 10 files in a loop.
for (var x = 0; x < 10; x++)
{
var y = x;
int threadId;
var myThread = new Thread(() => TestMethod("outpFile", y, out threadId));
myThread.Start();
myThread.Join();
//TestMethod("outpFile", y, out threadId);
}
stopWatch.Stop();
Console.WriteLine("Milliseconds Taken:\t{0}", stopWatch.Elapsed.TotalMilliseconds);
}
public static void TestMethod(string fileName, int hifi, out int threadId)
{
fileName = fileName + hifi;
var fs = new FileStream(fileName, FileMode.OpenOrCreate, FileAccess.ReadWrite);
var sw = new StreamWriter(fs, Encoding.UTF8);
for (int x = 0; x < 10000; x++)
{
sw.WriteLine(DateTime.Now.ToString());
}
sw.Close();
threadId = Thread.CurrentThread.ManagedThreadId;
Console.WriteLine("{0}",threadId);
}
}
}
Right now, if I comment out the thread creation part of the code and just call TestMethod 10 times in a loop, it is faster than the version that creates a thread for each call.

The threaded version of your code is doing extra work, so it's not surprising that it's slower.
When you do something like:
var myThread = new Thread(() => TestMethod("outpFile", y, out threadId));
myThread.Start();
myThread.Join();
...you're creating a thread, having it call TestMethod, then waiting for it to finish. The additional overhead of creating and starting a thread will make things slower than just calling TestMethod without any threads.
It's possible that you'll see better performance if you start all of the threads working and then wait for them to finish, e.g.:
var workers = new List<Thread>();
for (int i = 0; i < 10; ++i)
{
var y = i;
int threadId;
var myThread = new Thread(() => TestMethod("outpFile", y, out threadId));
myThread.Start();
workers.Add(myThread);
}
foreach (var worker in workers) worker.Join();

Perhaps this doesn't directly answer your question but here is my thought on the matter. The bottleneck in that code is unlikely to be the processor. I would bet the disk IO would take way more time than the CPU processing. As such, I don't believe that creating new threads will help at all (all the threads will attempt to write to the same disk). I think this is a case of premature optimization. If I were you, I would just do it all on one thread.

The reason it's slower is that all you're doing is starting a new thread and waiting for it to complete. That has to be slower, because the plain loop simply doesn't perform those extra steps.
Try this out (assuming .Net 4.0 because of TPL). On my machine, it's consistently 100ms faster when done in parallel.
[Test]
public void Y()
{
var sw = Stopwatch.StartNew();
Parallel.For(0, 10, n => TestMethod("parallel", n));
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
sw.Restart();
for (int i = 0; i < 10; i++)
TestMethod("forloop", i);
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
}
private static void TestMethod(string fileName, int hifi)
{
fileName = fileName + hifi;
var fs = new FileStream(fileName, FileMode.OpenOrCreate, FileAccess.ReadWrite);
var sw = new StreamWriter(fs, Encoding.UTF8);
for (int x = 0; x < 10000; x++)
{
sw.WriteLine(DateTime.Now.ToString());
}
sw.Close();
}

The primary thing to observe in your case is Amdahl's Law. Your algorithm makes roughly equal use of each of the following resources:
Processor usage
Memory access
Drive access
Of these, the drive access is by far the slowest item, so to see speedup you'll need to parallelize your algorithm across this resource. In other words, if you parallelize your program by writing the 10 different files to 10 different drives, you'll see a substantial performance improvement compared to just parallelizing the computation of the file contents. In fact if you create the files on 10 different threads, the serialization involved with drive access could actually reduce the overall performance of your program.
Although both imply multi-threaded programming, parallelization should NOT be treated the same as asynchronous programming in the case of IO. While I would not recommend parallelizing your use of the file system, it is almost always beneficial to use asynchronous methods for reading/writing to files.
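As a rough illustration of the asynchronous route, the file-writing routine from the question could be sketched like this, assuming .NET 4.5 or later. AsyncFileWrites, WriteFileAsync and WriteAllAsync are invented names; the file name prefix outpFile and the 10 files of 10000 lines are taken from the question.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

static class AsyncFileWrites
{
    public static async Task WriteFileAsync(string path, int lineCount)
    {
        // The final 'true' (useAsync) asks for asynchronous I/O at the OS level.
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None, 4096, true))
        using (var sw = new StreamWriter(fs, Encoding.UTF8))
        {
            for (int x = 0; x < lineCount; x++)
            {
                await sw.WriteLineAsync(DateTime.Now.ToString());
            }
            await sw.FlushAsync(); // flush asynchronously before Dispose closes the file
        }
    }

    // Starts all writes and returns a task that completes when every file is written.
    public static Task WriteAllAsync(string directory, int fileCount, int lineCount)
    {
        var writes = Enumerable.Range(0, fileCount)
            .Select(n => WriteFileAsync(Path.Combine(directory, "outpFile" + n), lineCount));
        return Task.WhenAll(writes);
    }

    static void Main()
    {
        WriteAllAsync(Directory.GetCurrentDirectory(), 10, 10000).GetAwaiter().GetResult();
        Console.WriteLine("Done");
    }
}
```

No thread is created explicitly here; whether this beats the single-threaded loop still depends on the drive.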

This is the wrong way to gain speed: multithreading is for doing work in parallel, not for accelerating a single piece of work.

So why did you decide to use multithreading? The price of starting a new thread might be higher than that of a simple loop. It's not something you can blindly decide about... If you insist on using threads, you can also check the managed ThreadPool or async delegates, which can reduce the cost of creating new threads by reusing existing ones.
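A minimal sketch of the ThreadPool suggestion, assuming .NET 4 or later for CountdownEvent. PooledWork and RunOnPool are hypothetical names, and the work delegate is assumed not to throw (an unhandled exception on a pool thread would take down the process).

```csharp
using System;
using System.Threading;

static class PooledWork
{
    // Queues itemCount work items on the managed ThreadPool and waits for all of them.
    // Returns how many items completed.
    public static int RunOnPool(int itemCount, Action<int> work)
    {
        int completed = 0;
        using (var done = new CountdownEvent(itemCount))
        {
            for (int i = 0; i < itemCount; i++)
            {
                int item = i; // per-iteration copy for the lambda
                ThreadPool.QueueUserWorkItem(state =>
                {
                    try
                    {
                        work(item);
                        Interlocked.Increment(ref completed);
                    }
                    finally
                    {
                        done.Signal(); // one signal per work item
                    }
                });
            }
            done.Wait(); // returns once the count reaches zero
        }
        return completed;
    }

    static void Main()
    {
        int finished = RunOnPool(10, n =>
            Console.WriteLine("Item {0} on thread {1}", n, Thread.CurrentThread.ManagedThreadId));
        Console.WriteLine("{0} items finished", finished);
    }
}
```

Pool threads are reused between work items, so the per-item cost is much lower than creating a dedicated Thread each time.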

You're negating the benefit of multiple threads because you Join each thread and thus wait for it to complete before you create and start the next thread.
Instead, add the threads to a list as you create and start them, and then loop through the list of threads, joining them in sequence until they finish.
using System.Collections.Generic;
List<Thread> threads= new List<Thread>();
//Should create 10 files in a loop.
for (var x = 0; x < 10; x++)
{
var y = x;
int threadId;
var myThread = new Thread(() => TestMethod("outpFile", y, out threadId));
threads.Add(myThread);
myThread.Start();
//myThread.Join();
//TestMethod("outpFile", y, out threadId);
}
foreach (var thread in threads) thread.Join();

Try something like:
for (int i = 0; i < 10; ++i)
{
var y = i;
int threadId;
new Action(() => { TestMethod("outpFile", y, out threadId); }).BeginInvoke(null, null);
}
Console.ReadLine();
If it isn't quicker than the serial calls, then your I/O is indeed the bottleneck and there is nothing you can do about it.

How do Tasks in the Task Parallel Library affect ActivityID?

Before using the Task Parallel Library, I have often used CorrelationManager.ActivityId to keep track of tracing/error reporting with multiple threads.
ActivityId is stored in Thread Local Storage, so each thread gets its own copy. The idea is that when you fire up a thread (activity), you assign a new ActivityId. The ActivityId will be written to the logs with any other trace information, making it possible to single out the trace information for a single 'Activity'. This is really useful with WCF as the ActivityId can be carried over to the service component.
Here is an example of what I'm talking about:
static void Main(string[] args)
{
ThreadPool.QueueUserWorkItem(new WaitCallback((o) =>
{
DoWork();
}));
}
static void DoWork()
{
try
{
Trace.CorrelationManager.ActivityId = Guid.NewGuid();
//The functions below contain tracing which logs the ActivityID.
CallFunction1();
CallFunction2();
CallFunction3();
}
catch (Exception ex)
{
Trace.Write(Trace.CorrelationManager.ActivityId + " " + ex.ToString());
}
}
Now, with the TPL, my understanding is that multiple Tasks share Threads. Does this mean that ActivityId is prone to being reinitialized mid-task (by another task)? Is there a new mechanism to deal with activity tracing?

I ran some experiments and it turns out the assumption in my question is incorrect - multiple tasks created with the TPL do not run on the same thread at the same time.
ThreadLocalStorage is safe to use with TPL in .NET 4.0, since a thread can only be used by one task at a time.
The assumption that tasks can share threads concurrently was based on an interview I heard about c# 5.0 on DotNetRocks (sorry, I can't remember which show it was) - so my question may (or may not) become relevant soon.
My experiment starts a number of tasks, and records how many tasks ran, how long they took, and how many threads were consumed. The code is below if anyone would like to repeat it.
class Program
{
static void Main(string[] args)
{
int totalThreads = 100;
TaskCreationOptions taskCreationOpt = TaskCreationOptions.None;
Task task = null;
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
Task[] allTasks = new Task[totalThreads];
for (int i = 0; i < totalThreads; i++)
{
task = Task.Factory.StartNew(() =>
{
DoLongRunningWork();
}, taskCreationOpt);
allTasks[i] = task;
}
Task.WaitAll(allTasks);
stopwatch.Stop();
Console.WriteLine(String.Format("Completed {0} tasks in {1} milliseconds", totalThreads, stopwatch.ElapsedMilliseconds));
Console.WriteLine(String.Format("Used {0} threads", threadIds.Count));
Console.ReadKey();
}
private static List<int> threadIds = new List<int>();
private static object locker = new object();
private static void DoLongRunningWork()
{
lock (locker)
{
//Keep a record of the managed thread used.
if (!threadIds.Contains(Thread.CurrentThread.ManagedThreadId))
threadIds.Add(Thread.CurrentThread.ManagedThreadId);
}
Guid g1 = Guid.NewGuid();
Trace.CorrelationManager.ActivityId = g1;
Thread.Sleep(3000);
Guid g2 = Trace.CorrelationManager.ActivityId;
Debug.Assert(g1.Equals(g2));
}
}
The output (of course this will depend on the machine) was:
Completed 100 tasks in 23097 milliseconds
Used 23 threads
Changing taskCreationOpt to TaskCreationOptions.LongRunning gave different results:
Completed 100 tasks in 3458 milliseconds
Used 100 threads

Please forgive my posting this as an answer, as it is not really an answer to your question. It is related, however, since it deals with CorrelationManager behavior and threads/tasks/etc. I have been looking at using the CorrelationManager's LogicalOperationStack (and the StartLogicalOperation/StopLogicalOperation methods) to provide additional context in multithreading scenarios.
I took your example and modified it slightly to add the ability to perform work in parallel using Parallel.For. Also, I use StartLogicalOperation/StopLogicalOperation to bracket (internally) DoLongRunningWork. Conceptually, DoLongRunningWork does something like this each time it is executed:
DoLongRunningWork
  StartLogicalOperation
  Thread.Sleep(3000)
  StopLogicalOperation
I have found that if I add these logical operations to your code (more or less as is), all of the logical operations remain in sync (always the expected number of operations on the stack, and the values of the operations on the stack are always as expected).
In some of my own testing I found that this was not always the case. The logical operation stack was getting "corrupted". The best explanation I could come up with is that the "merging" back of the CallContext information into the "parent" thread context when the "child" thread exits was causing the "old" child thread context information (logical operation) to be "inherited" by another sibling child thread.
The problem might also be related to the fact that Parallel.For apparently uses the main thread (at least in the example code, as written) as one of the "worker threads" (or whatever they should be called in the parallel domain). Whenever DoLongRunningWork is executed, a new logical operation is started (at the beginning) and stopped (at the end) (that is, pushed onto the LogicalOperationStack and popped back off of it). If the main thread already has a logical operation in effect and if DoLongRunningWork executes ON THE MAIN THREAD, then a new logical operation is started so the main thread's LogicalOperationStack now has TWO operations. Any subsequent executions of DoLongRunningWork (as long as this "iteration" of DoLongRunningWork is executing on the main thread) will (apparently) inherit the main thread's LogicalOperationStack (which now has two operations on it, rather than just the one expected operation).
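Whether the calling thread takes part in a Parallel.For can be checked directly. The following is a minimal sketch, not the code from my tests: CallerThreadProbe and CountIterationsOnCallingThread are hypothetical names, and the short Sleep is only there to give the loop a reason to spread over several threads.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original program.
public static class CallerThreadProbe
{
    // Returns how many of the iterations were executed on the thread
    // that called Parallel.For.
    public static int CountIterationsOnCallingThread(int iterations)
    {
        int callerId = Thread.CurrentThread.ManagedThreadId;
        int onCaller = 0;
        Parallel.For(0, iterations, i =>
        {
            if (Thread.CurrentThread.ManagedThreadId == callerId)
            {
                Interlocked.Increment(ref onCaller);
            }
            Thread.Sleep(10); // simulate a small amount of work
        });
        return onCaller;
    }
}
```

A non-zero result from CountIterationsOnCallingThread(100) means that some iterations ran on the caller itself, which is the situation described above: any logical operation already on that thread's stack is visible to those iterations.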
It took me a long time to figure out why the behavior of the LogicalOperationStack was different in my own example than in my modified version of your example. Finally I saw that in my code I had bracketed the entire program in a logical operation, whereas in my modified version of your test program I had not. The implication is that in my test program, each time my "work" was performed (analogous to DoLongRunningWork), there was already a logical operation in effect.
So, when I modified your test program to bracket the entire program in a logical operation AND used Parallel.For, I ran into exactly the same problem.
Using the conceptual model above, this will run successfully:
Parallel.For
  DoLongRunningWork
    StartLogicalOperation
    Sleep(3000)
    StopLogicalOperation
While this will eventually assert due to an apparently out of sync LogicalOperationStack:
StartLogicalOperation
Parallel.For
  DoLongRunningWork
    StartLogicalOperation
    Sleep(3000)
    StopLogicalOperation
StopLogicalOperation
Here is my sample program. It is similar to yours in that it has a DoLongRunningWork method that manipulates the ActivityId as well as the LogicalOperationStack. I also have two flavors of kicking off DoLongRunningWork: one uses Tasks, the other uses Parallel.For. Each flavor can also be executed with or without the whole parallelized operation enclosed in a logical operation, so there are a total of 4 ways to execute the parallel operation. To try each one, simply uncomment the desired "Use..." method, recompile, and run. UseTasks, UseTasks(true), and UseParallelFor should all run to completion. UseParallelFor(true) will assert at some point because the LogicalOperationStack does not have the expected number of entries.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
namespace CorrelationManagerParallelTest
{
class Program
{
static void Main(string[] args)
{
//UseParallelFor(true) will assert because LogicalOperationStack will not have expected
//number of entries, all others will run to completion.
UseTasks(); //Equivalent to original test program with only the parallelized
//operation bracketed in logical operation.
////UseTasks(true); //Bracket entire UseTasks method in logical operation
////UseParallelFor(); //Equivalent to original test program, but use Parallel.For
//rather than Tasks. Bracket only the parallelized
//operation in logical operation.
////UseParallelFor(true); //Bracket entire UseParallelFor method in logical operation
}
private static List<int> threadIds = new List<int>();
private static object locker = new object();
private static int mainThreadId = Thread.CurrentThread.ManagedThreadId;
private static int mainThreadUsedInDelegate = 0;
// baseCount is the expected number of entries in the LogicalOperationStack
// at the time that DoLongRunningWork starts. If the entire operation is bracketed
// externally by Start/StopLogicalOperation, then baseCount will be 1. Otherwise,
// it will be 0.
private static void DoLongRunningWork(int baseCount)
{
lock (locker)
{
//Keep a record of the managed thread used.
if (!threadIds.Contains(Thread.CurrentThread.ManagedThreadId))
threadIds.Add(Thread.CurrentThread.ManagedThreadId);
if (Thread.CurrentThread.ManagedThreadId == mainThreadId)
{
mainThreadUsedInDelegate++;
}
}
Guid lo1 = Guid.NewGuid();
Trace.CorrelationManager.StartLogicalOperation(lo1);
Guid g1 = Guid.NewGuid();
Trace.CorrelationManager.ActivityId = g1;
Thread.Sleep(3000);
Guid g2 = Trace.CorrelationManager.ActivityId;
Debug.Assert(g1.Equals(g2));
//This assert on LogicalOperationStack.Count will eventually fail if a logical operation
//was already in effect when the Parallel.For operation was started.
Debug.Assert(Trace.CorrelationManager.LogicalOperationStack.Count == baseCount + 1, string.Format("MainThread = {0}, Thread = {1}, Count = {2}, ExpectedCount = {3}", mainThreadId, Thread.CurrentThread.ManagedThreadId, Trace.CorrelationManager.LogicalOperationStack.Count, baseCount + 1));
//The operation on top of the stack should be the one started by this call.
Debug.Assert(Trace.CorrelationManager.LogicalOperationStack.Peek().Equals(lo1), string.Format("MainThread = {0}, Thread = {1}, Top = {2}, ExpectedTop = {3}", mainThreadId, Thread.CurrentThread.ManagedThreadId, Trace.CorrelationManager.LogicalOperationStack.Peek(), lo1));
Trace.CorrelationManager.StopLogicalOperation();
}
private static void UseTasks(bool encloseInLogicalOperation = false)
{
int totalThreads = 100;
TaskCreationOptions taskCreationOpt = TaskCreationOptions.None;
Task task = null;
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
if (encloseInLogicalOperation)
{
Trace.CorrelationManager.StartLogicalOperation();
}
Task[] allTasks = new Task[totalThreads];
for (int i = 0; i < totalThreads; i++)
{
task = Task.Factory.StartNew(() =>
{
DoLongRunningWork(encloseInLogicalOperation ? 1 : 0);
}, taskCreationOpt);
allTasks[i] = task;
}
Task.WaitAll(allTasks);
if (encloseInLogicalOperation)
{
Trace.CorrelationManager.StopLogicalOperation();
}
stopwatch.Stop();
Console.WriteLine(String.Format("Completed {0} tasks in {1} milliseconds", totalThreads, stopwatch.ElapsedMilliseconds));
Console.WriteLine(String.Format("Used {0} threads", threadIds.Count));
Console.WriteLine(String.Format("Main thread used in delegate {0} times", mainThreadUsedInDelegate));
Console.ReadKey();
}
private static void UseParallelFor(bool encloseInLogicalOperation = false)
{
int totalThreads = 100;
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
if (encloseInLogicalOperation)
{
Trace.CorrelationManager.StartLogicalOperation();
}
Parallel.For(0, totalThreads, i =>
{
DoLongRunningWork(encloseInLogicalOperation ? 1 : 0);
});
if (encloseInLogicalOperation)
{
Trace.CorrelationManager.StopLogicalOperation();
}
stopwatch.Stop();
Console.WriteLine(String.Format("Completed {0} tasks in {1} milliseconds", totalThreads, stopwatch.ElapsedMilliseconds));
Console.WriteLine(String.Format("Used {0} threads", threadIds.Count));
Console.WriteLine(String.Format("Main thread used in delegate {0} times", mainThreadUsedInDelegate));
Console.ReadKey();
}
}
}
The whole issue of whether (and how) LogicalOperationStack can be used with Parallel.For and/or other threading/Task constructs probably merits its own question. Maybe I will post one. In the meantime, I wonder if you have any thoughts on this (or whether you had considered using LogicalOperationStack, since ActivityId appears to be safe).
[EDIT]
See my answer to this question for more information about using LogicalOperationStack and/or CallContext.LogicalSetData with some of the various Thread/ThreadPool/Task/Parallel constructs.
See also my question here on SO about LogicalOperationStack and Parallel extensions:
Is CorrelationManager.LogicalOperationStack compatible with Parallel.For, Tasks, Threads, etc
Finally, see also my question here on Microsoft's Parallel Extensions forum:
http://social.msdn.microsoft.com/Forums/en-US/parallelextensions/thread/7c5c3051-133b-4814-9db0-fc0039b4f9d9
In my testing it looks like Trace.CorrelationManager.LogicalOperationStack can become corrupted when using Parallel.For or Parallel.Invoke IF you start a logical operation in the main thread and then start/stop logical operations in the delegate. In my tests (see either of the two links above) the LogicalOperationStack should always have exactly 2 entries when DoLongRunningWork is executing (if I start a logical operation in the main thread before kicking off DoLongRunningWork using various techniques). So, by "corrupted" I mean that the LogicalOperationStack will eventually have many more than 2 entries.
From what I can tell, this is probably because Parallel.For and Parallel.Invoke use the main thread as one of the "worker" threads to perform the DoLongRunningWork action.
Using a stack stored in CallContext.LogicalSetData to mimic the behavior of the LogicalOperationStack (similar to log4net's LogicalThreadContext.Stacks which is stored via CallContext.SetData) yields even worse results. If I am using such a stack to maintain context, it becomes corrupted (i.e. does not have the expected number of entries) in almost all of the scenarios where I have a "logical operation" in the main thread and a logical operation in each iteration/execution of the DoLongRunningWork delegate.
