C# Multithreading - Threads don't run


I am teaching myself C# from my usual C++ programming and now I'm doing threads.
The following simple code compiles fine and should output beeps on a loop via threads for 30 seconds.
using System;
using System.Runtime.InteropServices;
using System.Threading;
class BeepSample
{
[DllImport("kernel32.dll", SetLastError=true)]
static extern bool Beep(uint dwFreq, uint dwDuration);
static void Main()
{
Console.WriteLine("Testing PC speaker...");
for (uint i = 100; i <= 20000; i++)
{
var BeepThread = new Thread(delegate() { Beep(i, 30000); });
}
Console.WriteLine("Testing complete.");
Console.ReadLine();
}
}
Only problem is the threads don't seem to work.
I know I am missing something basic.
Any ideas?

You forgot to start the thread (see Thread.Start on MSDN):
for (uint i = 100; i <= 20000; i++)
{
var BeepThread = new Thread(delegate() { Beep(i, 30000); });
BeepThread.Start();
}
However, that looks suspicious. Why would you need 19,901 threads? You probably want a single thread that loops over the frequencies and pauses briefly between beeps.
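The single-thread idea could look roughly like the sketch below. It is only an illustration: the names FrequencySweep and Sweep, the 500 to 2000 Hz range, the 100 Hz step and the 30 ms duration are assumptions, and it uses Console.Beep (Windows only) instead of the P/Invoke declaration from the question.

```csharp
using System;
using System.Threading;

static class FrequencySweep
{
    // Calls play(frequency) for from, from + step, ... up to and including to.
    public static void Sweep(int from, int to, int step, Action<int> play)
    {
        for (int freq = from; freq <= to; freq += step)
        {
            play(freq);
        }
    }

    static void Main()
    {
        Console.WriteLine("Testing PC speaker...");
        // One worker thread walks through the frequencies in order.
        // Console.Beep blocks for the given duration, so beeps do not overlap.
        var worker = new Thread(() =>
            Sweep(500, 2000, 100, freq => Console.Beep(freq, 30)));
        worker.Start();
        worker.Join(); // wait for the sweep to finish
        Console.WriteLine("Testing complete.");
    }
}
```

Passing the beep action in as a delegate keeps the loop itself independent of the sound API, so it can be exercised on machines without a speaker.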

Only problem is the threads don't seem to work.
That part is clear: you never started the threads, so they never do anything.
The code has several other issues:
There is a closure issue. The delegate captures the loop variable itself, so copy it into a local first:
for (uint i = 100; i <= 20000; i++)
{
uint icopy = i;
var BeepThread = new Thread(delegate() { Beep(icopy, 30000); });
BeepThread.Start();
}
The Thread class creates foreground threads. Each logical processor core can run only one thread at a time, and each thread is costly in both computation and memory, because it needs a Thread Environment Block, kernel stack memory and user stack memory. Even if the current code runs, it will overwhelm your system, and you will most likely have to kill the process to recover.
Console.ReadLine() blocks only the main thread. The other threads are foreground threads, so they keep running even if the main thread exits. The proper way to wait is to call Join on each Thread object, which makes the main thread wait until that thread completes.
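One of the preferred ways to rewrite the same code is to use the Task Parallel Library:
// The upper bound of Parallel.For is exclusive, so 20001 covers 100..20000.
Parallel.For(100, 20001,
new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
i =>
{
// Parallel.For supplies an int, while Beep expects a uint.
Beep((uint)i, 30000);
});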
Benefits:
The code doesn't create thousands of threads and overwhelm the system
It runs on the thread pool (background threads) and uses only as many threads as required. The maximum never exceeds the processor count of the system and is often lower, because threads are reused
It automatically blocks the calling (main) thread until all iterations have completed
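If you stay with explicit threads, the advice to call Join on each Thread object can be sketched as follows. The names JoinDemo and RunAll and the sleep durations are hypothetical; the point is only that the threads are kept in a list and joined after all of them have been started.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class JoinDemo
{
    // Starts one thread per work item, then blocks until every thread has finished.
    // Returns how many work items completed.
    public static int RunAll(int count, Action<int> work)
    {
        var threads = new List<Thread>();
        int finished = 0;
        for (int i = 0; i < count; i++)
        {
            int copy = i; // each closure gets its own copy of the loop variable
            var t = new Thread(() =>
            {
                work(copy);
                Interlocked.Increment(ref finished);
            });
            threads.Add(t);
            t.Start();
        }
        // Join only after all threads are running, so they execute concurrently.
        foreach (var thread in threads)
        {
            thread.Join();
        }
        return finished;
    }

    static void Main()
    {
        int done = RunAll(4, id => Thread.Sleep(100 * (id + 1)));
        Console.WriteLine("Threads finished: " + done);
    }
}
```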

Ok, thanks guys. Had gone for lunch.
I implemented...
for (uint i = 500; i <= 550; i++)
{
uint icopy = i;
var BeepThread = new Thread(delegate() { Beep(icopy, 30000); });
BeepThread.Start();
}
Which worked great.
As predicted the threads did not terminate after the main thread was executed but it does what I want which is awesome.
Bless y'all.

Related

Throttling in multithread console application

My console application opens 100 threads which all do exactly the same thing: send some data to a host in the internal network. The host is very responsive; I have checked that it can handle a much larger number of requests per second. The console application is also quite primitive and responsive (it doesn't use a database or anything similar); it only sends requests to the host. Increasing the number of threads doesn't improve the speed. It seems something is throttling the communication between the app and the host. Moreover, I ran three instances of the same console application at the same time, and together they completed three times as many requests, so the limitation seems to be at the application level.
I have already increased DefaultConnectionLimit but with no effect.
class Program
{
static void Main(string[] args)
{
System.Net.ServicePointManager.DefaultConnectionLimit = 200;
for (var i = 1; i <= 100; i++)
{
int threadId = i;
Thread thread = new Thread(() =>
{
Testing(threadId);
});
thread.Start();
}
}
private static void Testing(int threadId)
{
//just communicate with host
}
}

The thing is that creating more threads than you have processor cores is pointless.
For example, if you have 4 cores and create 100 threads, where do you expect the other 96 threads to run? They have to wait, and performance drops because of the cost of creating and managing unnecessary threads.
You should use the ThreadPool, which optimizes the number of threads created and scheduled to work.

Creating a new Thread every time is very expensive. You shouldn't create threads explicitly. Use the Task API instead to run this on the thread pool:
var tasks = new Task[100];
for (var i = 0; i < 100; i++)
{
int threadId = i;
tasks[i] = Task.Run(() => Testing(threadId));
}
Task.WhenAll(tasks).GetAwaiter().GetResult();

Replacing threads with tasks

New to threading and tasks here :)
So, I wrote a simple threading program that creates a few threads and runs them asynchronously then waits for them to finish.
I then changed it to a Task. The code does exactly the same thing and the only change is I change a couple of statements.
So, two questions really:
In the below code, what is the difference?
I'm struggling to figure out async/await. How would I integrate it into the below, or given all examples seem to be one method calls another that are both async/await return is this a bad example of using Task to do background work?
Thanks.
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
ThreadSample();
TaskSample();
}
private static void ThreadSample()
{
Random r = new Random();
MyThreadTest[] myThreads = new MyThreadTest[4];
Thread[] threads = new Thread[4];
for (int i = 0; i < 4; i++)
{
myThreads[i] = new MyThreadTest($"T{i}", r.Next(1, 500));
threads[i] = new Thread(new ThreadStart(myThreads[i].ThreadSample));
threads[i].Start();
}
for (int i = 0; i < 4; i++)
{
threads[i].Join();
}
System.Console.WriteLine("Finished");
System.Console.ReadKey();
}
private static void TaskSample()
{
Random r = new Random();
MyThreadTest[] myTasks = new MyThreadTest[4];
Task[] tasks = new Task[4];
for (int i = 0; i < 4; i++)
{
myTasks[i] = new MyThreadTest($"T{i}", r.Next(1, 500));
tasks[i] = new Task(new Action(myTasks[i].ThreadSample));
tasks[i].Start();
}
for (int i = 0; i < 4; i++)
{
tasks[i].Wait();
}
System.Console.WriteLine("Finished");
System.Console.ReadKey();
}
}
class MyThreadTest
{
private string name;
private int interval;
public MyThreadTest(string name, int interval)
{
this.name = name;
this.interval = interval;
Console.WriteLine($"Thread created: {name},{interval}");
}
public void ThreadSample()
{
for (int i = 0; i < 5; i++)
{
Thread.Sleep(interval);
Console.WriteLine($"{name} At {i} on thread {Thread.CurrentThread.ManagedThreadId}");
}
}
public void TaskSample()
{
for (int i = 0; i < 5; i++)
{
Thread.Sleep(interval);
Console.WriteLine($"{name} At {i} on thread {Thread.CurrentThread.ManagedThreadId}");
}
}
}
}

The Task Parallel Library (TPL) is an abstraction, and you shouldn't try to compare Tasks directly with threads. The Task object represents the abstract concept of an asynchronous task - a piece of code that should execute asynchronously and which will either complete, fault (throw an exception) or be canceled. The abstraction means you can write and use such tasks without worrying too much about exactly how they're executed asynchronously. There are lots of useful things like ContinueWith() you can use to compose, sequence and otherwise manage tasks.
Threads are a much lower level concrete system facility that can be used to run code asynchronously, but without all the niceties you get from the Task Parallel Library (TPL). If you want to sequence tasks or anything like that, you have to code it yourself.
In your example code, you're not actually directly creating any threads. Instead, the Actions you've written are being executed by the system thread pool. Of course, this can be changed. The TPL abstraction layer provides the TaskScheduler class which you can extend - if you have some special way of running code asynchronously, you can write a TaskScheduler to use TPL with it.
async/await is 100% compiler sugar. The compiler decomposes an async method into chunks, each of which becomes a Task, and those chunks execute sequentially with the help of a state machine, all generated by the compiler. One caution: by default, await captures the current SynchronizationContext and resumes on that context. So if you're doing this in WPF or Windows Forms, your continuation code after an await isn't running on a separate thread at all; it's running on the UI thread. You can disable this by calling ConfigureAwait(false). Really, async/await are primarily intended for asynchronous programming in UI environments where synchronization to a main thread is important.
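As a rough illustration of how async/await could fit the sample from the question, the sketch below replaces Thread.Sleep with Task.Delay and waits with Task.WhenAll. The names AsyncSample, WorkAsync and RunAllAsync and the intervals are made up for the example, and an async Main requires C# 7.1 or later.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

static class AsyncSample
{
    // Asynchronous counterpart of the question's ThreadSample method.
    // Task.Delay does not block a thread while it waits.
    public static async Task<int> WorkAsync(string name, int interval)
    {
        int steps = 0;
        for (int i = 0; i < 5; i++)
        {
            await Task.Delay(interval);
            Console.WriteLine($"{name} at {i}");
            steps++;
        }
        return steps;
    }

    // Starts four workers and asynchronously waits for all of them.
    public static async Task<int> RunAllAsync()
    {
        Task<int>[] tasks = Enumerable.Range(0, 4)
            .Select(i => WorkAsync($"T{i}", 10 * (i + 1)))
            .ToArray();
        int[] results = await Task.WhenAll(tasks);
        return results.Sum();
    }

    static async Task Main()
    {
        int total = await RunAllAsync();
        Console.WriteLine($"Finished, {total} steps in total");
    }
}
```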

In the below code, what is the difference?
The difference is big. A Task is a unit of work that runs on a thread from the thread pool, which is sized according to the estimated amount of work. If another Task is queued and the pool holds idle but still alive threads, it reuses one of them instead of spinning up a new thread (which is very costly). Multiple tasks can end up using the same thread over time (not simultaneously, obviously).
Task-based parallelism in a nutshell: tasks are jobs, and the ThreadPool provisions the resources to complete those jobs. The consequence is smarter, more elastic thread and resource utilization, especially in general-purpose programs that target a variety of execution environments and resource availability, for example VMs in the cloud.
I'm struggling to figure out async/await.
await implies a dependency of one task on another. If you don't have one in your case, other than waiting for all of them to complete, what you are doing is pretty much enough.
If you need it, you can achieve that with the TPL too, for example via ContinueWith.
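A minimal sketch of expressing such a dependency with ContinueWith is shown below. The class name ContinuationDemo and the values are purely illustrative.

```csharp
using System;
using System.Threading.Tasks;

static class ContinuationDemo
{
    public static Task<string> Build()
    {
        Task<int> first = Task.Run(() => 21);
        // The continuation is scheduled only after 'first' has completed,
        // so reading t.Result here does not block.
        Task<string> second = first.ContinueWith(t => "Result: " + (t.Result * 2));
        return second;
    }

    static void Main()
    {
        Console.WriteLine(Build().Result); // prints "Result: 42"
    }
}
```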

Thread Local Storage working principle

This is an example of Thread Local Storage (TLS) from an Apress parallel programming book. I know that on a 4-core computer, 4 threads can run in parallel at the same time. In this example we create 10 tasks, and we suppose we have a 4-core computer. Each thread-local storage instance lives on one thread, so when the 10 tasks start in parallel, only 4 threads execute them. So we have 4 TLS instances, and 10 tasks try to change 4 thread-local storage objects. How does TLS prevent a data race when the thread count is less than the task count?
using System;
using System.Threading;
using System.Threading.Tasks;
namespace Listing_04
{
class BankAccount
{
public int Balance
{
get;
set;
}
}
class Listing_04
{
static void Main(string[] args)
{
// create the bank account instance
BankAccount account = new BankAccount();
// create an array of tasks
Task<int>[] tasks = new Task<int>[10];
// create the thread local storage
ThreadLocal<int> tls = new ThreadLocal<int>();
for (int i = 0; i < 10; i++)
{
// create a new task
tasks[i] = new Task<int>((stateObject) =>
{
// get the state object and use it
// to set the TLS data
tls.Value = (int)stateObject;
// enter a loop for 1000 balance updates
for (int j = 0; j < 1000; j++)
{
// update the TLS balance
tls.Value++;
}
// return the updated balance
return tls.Value;
}, account.Balance);
// start the new task
tasks[i].Start();
}
// get the result from each task and add it to
// the balance
for (int i = 0; i < 10; i++)
{
account.Balance += tasks[i].Result;
}
// write out the counter value
Console.WriteLine("Expected value {0}, Balance: {1}",
10000, account.Balance);
// wait for input before exiting
Console.WriteLine("Press enter to finish");
Console.ReadLine();
}
}
}

We have 4 TLS so 10 task try to change 4 Thread local storage object
In your example, you could have anywhere between 1 and 10 TLS slots. This is because a) you are not managing your threads explicitly and so the tasks are executed using the thread pool, and b) the thread pool creates and destroys threads over time according to demand.
A loop of only 1000 iterations will complete almost instantaneously. So it's likely all ten of your tasks will get through the thread pool before the thread pool decides a work item has been waiting long enough to justify adding any new threads. But there is no guarantee of this.
Some important parts of the documentation include these statements:
By default, the minimum number of threads is set to the number of processors on a system
and
When demand is low, the actual number of thread pool threads can fall below the minimum values.
In other words, on your four-core system, the default minimum number of threads is four, but the actual number of threads active in the thread pool could in fact be less than that. And if the tasks take long enough to execute, the number of active threads could rise above that.
The biggest thing to keep in mind here is that using TLS in the context of a thread pool is almost certainly the wrong thing to do.
You use TLS when you have control over the threads, and you want a thread to be able to maintain some data private or unique to that thread. That's the opposite of what happens when you are using the thread pool. Even in the simplest case, multiple tasks can use the same thread, and so would wind up sharing TLS. And in more complicated scenarios, such as when using await, a single task could wind up executed in different threads, and so that one task could wind up using different TLS values depending on what thread is assigned to that task at that moment.
how Tls prevent data race problem when thread count < Task count ??
That depends on what "data race problem" you're talking about.
The fact is, the code you posted is filled with problems that are at the very least odd, if not outright wrong. For example, you are passing account.Balance as the initial value for each task. But why? This value is evaluated when you create the task, before it could ever be modified later, so what's the point of passing it?
And if you thought you were passing whatever the current value is when the task starts, that seems like that would be wrong too. Why would it be valid to make the starting value for a given task vary according to how many tasks had already completed and been accounted for in your later loop? (To be clear: that's not what's happening…but even if it were, it'd be a strange thing to do.)
Beyond all that, it's not clear what you thought using TLS here would accomplish anyway. When each task starts, you reinitialize the TLS value to 0 (i.e. the value of account.Balance that you've passed to the Task<int> constructor). So no thread involved ever sees a value other than 0 during the context of executing any given task. A local variable would accomplish exactly the same thing, without the overhead of TLS and without confusing anyone who reads the code and tries to figure out why TLS was used when it adds no value to the code.
So, does TLS solve some sort of "data race problem"? Not in this example, it doesn't appear to. So asking how it does that is impossible to answer. It doesn't do that, so there is no "how".
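To make the point about the local variable concrete, here is a sketch of the same computation without ThreadLocal<int>. The names LocalBalanceDemo and Run are assumptions, and Task.Run stands in for the explicit Task constructor used in the listing.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

static class LocalBalanceDemo
{
    public static int Run(int taskCount, int updatesPerTask)
    {
        var tasks = new Task<int>[taskCount];
        for (int i = 0; i < taskCount; i++)
        {
            tasks[i] = Task.Run(() =>
            {
                // A plain local is private to this invocation of the lambda,
                // no matter which pool thread runs it.
                int localBalance = 0;
                for (int j = 0; j < updatesPerTask; j++)
                {
                    localBalance++;
                }
                return localBalance;
            });
        }
        Task.WaitAll(tasks);
        // The results are combined on the calling thread only.
        return tasks.Sum(t => t.Result);
    }

    static void Main()
    {
        Console.WriteLine("Expected value {0}, Balance: {1}", 10000, Run(10, 1000));
    }
}
```

Because no state is shared between the tasks, there is nothing to race on, whether one pool thread or ten end up executing them.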
For what it's worth, I modified your example slightly so that it would report the individual threads that were assigned to the tasks. I found that on my machine, the number of threads used varied between two and eight. This is consistent with my eight-core machine, with the variation due to how much the first thread in the pool can get done before the pool has initialized additional threads and assigned tasks to them. Most commonly, I would see the first thread completing between three and five of the tasks, with the remaining tasks handled by remaining individual threads.
In each case, the thread pool created eight threads as soon as the tasks were started. But most of the time, at least one of those threads wound up unused, because the other threads were able to complete the tasks before the pool was saturated. That is, there is overhead in the thread pool just managing the tasks, and in your example the tasks are so inexpensive that this overhead allows one or more thread pool threads to finish one task before the thread pool needs that thread for another.
I've copied that version below. Note that I also added a delay between trial iterations, to allow the thread pool to terminate the threads it created (on my machine, this took 20 seconds, hence the delay time hard-coded…you can see the threads being terminated in the debugger output).
static void Main(string[] args)
{
while (_PromptContinue())
{
// create the bank account instance
BankAccount account = new BankAccount();
// create an array of tasks
Task<int>[] tasks = new Task<int>[10];
// create the thread local storage
ThreadLocal<int> tlsBalance = new ThreadLocal<int>();
ThreadLocal<(int Id, int Count)> tlsIds = new ThreadLocal<(int, int)>(
() => (Thread.CurrentThread.ManagedThreadId, 0), true);
for (int i = 0; i < 10; i++)
{
int k = i;
// create a new task
tasks[i] = new Task<int>((stateObject) =>
{
// get the state object and use it
// to set the TLS data
tlsBalance.Value = (int)stateObject;
(int id, int count) = tlsIds.Value;
tlsIds.Value = (id, count + 1);
Console.WriteLine($"task {k}: thread {id}, initial value {tlsBalance.Value}");
// enter a loop for 1000 balance updates
for (int j = 0; j < 1000; j++)
{
// update the TLS balance
tlsBalance.Value++;
}
// return the updated balance
return tlsBalance.Value;
}, account.Balance);
// start the new task
tasks[i].Start();
}
// Make sure this thread isn't busy at all while the thread pool threads are working
Task.WaitAll(tasks);
// get the result from each task and add it to
// the balance
for (int i = 0; i < 10; i++)
{
account.Balance += tasks[i].Result;
}
// write out the counter value
Console.WriteLine("Expected value {0}, Balance: {1}", 10000, account.Balance);
Console.WriteLine("{0} thread ids used: {1}",
tlsIds.Values.Count,
string.Join(", ", tlsIds.Values.Select(t => $"{t.Id} ({t.Count})")));
System.Diagnostics.Debug.WriteLine("done!");
_Countdown(TimeSpan.FromSeconds(20));
}
}
private static void _Countdown(TimeSpan delay)
{
System.Diagnostics.Stopwatch sw = System.Diagnostics.Stopwatch.StartNew();
TimeSpan remaining = delay - sw.Elapsed,
sleepMax = TimeSpan.FromMilliseconds(250);
int cchMax = $"{delay.TotalSeconds,2:0}".Length;
string format = $"\r{{0,{cchMax}:0}}", previousText = null;
while (remaining > TimeSpan.Zero)
{
string nextText = string.Format(format, remaining.TotalSeconds);
if (previousText != nextText)
{
Console.Write(format, remaining.TotalSeconds);
previousText = nextText;
}
Thread.Sleep(remaining > sleepMax ? sleepMax : remaining);
remaining = delay - sw.Elapsed;
}
Console.Write(new string(' ', cchMax));
Console.Write('\r');
}
private static bool _PromptContinue()
{
Console.Write("Press Esc to exit, any other key to proceed: ");
try
{
return Console.ReadKey(true).Key != ConsoleKey.Escape;
}
finally
{
Console.WriteLine();
}
}

Load Test using C# Async Await

I am creating a console program, which can test read / write to a Cache by simulating multiple clients, and have written following code. Please help me understand:
Is it correct way to achieve the multi client simulation
What can I do more to make it a genuine load test
void Main()
{
List<Task<long>> taskList = new List<Task<long>>();
for (int i = 0; i < 500; i++)
{
taskList.Add(TestAsync());
}
Task.WaitAll(taskList.ToArray());
long averageTime = taskList.Average(t => t.Result);
}
public static async Task<long> TestAsync()
{
// Returns the total time taken using Stop Watch in the same module
return await Task.Factory.StartNew(() => // Call Cache Read / Write);
}

Adjusted your code slightly to see how many threads we have at a particular time.
static volatile int currentExecutionCount = 0;
static void Main(string[] args)
{
List<Task<long>> taskList = new List<Task<long>>();
var timer = new Timer(Print, null, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1));
for (int i = 0; i < 1000; i++)
{
taskList.Add(DoMagic());
}
Task.WaitAll(taskList.ToArray());
timer.Change(Timeout.Infinite, Timeout.Infinite);
timer = null;
//to check that we have all the threads executed
Console.WriteLine("Done " + taskList.Sum(t => t.Result));
Console.ReadLine();
}
static void Print(object state)
{
Console.WriteLine(currentExecutionCount);
}
static async Task<long> DoMagic()
{
return await Task.Factory.StartNew(() =>
{
Interlocked.Increment(ref currentExecutionCount);
//place your code here
Thread.Sleep(TimeSpan.FromMilliseconds(1000));
Interlocked.Decrement(ref currentExecutionCount);
return 4;
}
//this should hint to the scheduler to use new threads rather than pooled ones
, TaskCreationOptions.LongRunning
);
}
The result: inside a virtual machine, between 2 and 10 threads run simultaneously if I don't use the hint. With the hint, up to 100. On a real machine I can see 1000 threads at once, and Process Explorer confirms this. The documentation for TaskCreationOptions.LongRunning has some helpful details on the hint.

If the service is very busy, your clients apparently have to wait a while before their requests are serviced. Your program does not measure this, because your stopwatch starts running when the service request starts.
If you also want to measure the average time until a request is finished, you should start your stopwatch when the request is made, not when the request is serviced.
Your program only takes threads from the thread pool. If you start more tasks than there are threads, some tasks will have to wait before the work inside TestAsync starts running. This wait time would be measured if you recorded the time at which the task is queued.
Besides the flaw in time measurements, how many service requests do you expect simultaneously? Are there enough free threads in your thread pool to simulate this? If you expect about 50 service requests at the same time, and the size of your thread pool is only 20 threads, then you'll never run 50 service requests at the same time. Vice versa: if your thread pool is way bigger than your number of expected simultaneous service requests, then you'll measure longer times than is actually the case.
Consider changing the number of threads in your thread pool, and make sure no one else uses any threads of the pool.
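One possible way to capture the waiting time is to start the stopwatch when the task is queued and read it twice inside the task. This is only a sketch: LoadTimer and MeasureAsync are hypothetical names, and Thread.Sleep stands in for the cache call.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class LoadTimer
{
    // Returns { milliseconds spent waiting in the queue, total milliseconds }.
    public static Task<long[]> MeasureAsync(Action request)
    {
        // Started when the request is made, not when a pool thread picks it up.
        var sw = Stopwatch.StartNew();
        return Task.Run(() =>
        {
            long queued = sw.ElapsedMilliseconds; // time until a thread was available
            request();
            long total = sw.ElapsedMilliseconds;  // queue wait plus service time
            return new[] { queued, total };
        });
    }

    static void Main()
    {
        var tasks = Enumerable.Range(0, 200)
            .Select(_ => MeasureAsync(() => Thread.Sleep(50)))
            .ToArray();
        Task.WaitAll(tasks);
        Console.WriteLine("Average queue wait: {0:F1} ms", tasks.Average(t => t.Result[0]));
        Console.WriteLine("Average total time: {0:F1} ms", tasks.Average(t => t.Result[1]));
    }
}
```

Comparing the two averages gives a rough idea of how much of the measured latency comes from thread pool starvation on the client rather than from the service itself.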

Multithreaded code executes by threadnumber-times slower using System.Threading and Visual Studio C# Express Hosting Process

I have a very simple program counting the characters in a string. An integer threadnum sets the number of threads and divides the data by threadnum accordingly into chunks for each thread to process.
Each thread increments the values contained in a shared dictionary, building a character histogram.
private Dictionary<UInt32, int> dict = new Dictionary<UInt32, int>();
In order to wait for all threads to finish and continue with the main process, I invoke Thread.Join
Initially I had a local dictionary for each thread which get merged afterwards, but a shared dictionary worked fine, without locking.
No references are locked in the method BuildDictionary, though locking the dictionary did not significantly impact thread-execution time.
Each thread is timed, and the resulting dictionary compared.
The dictionary content is the same regardless of a single or multiple threads - as it should be.
Each thread takes a fraction determined by threadnum to complete - as it should be.
Problem:
The total time is roughly a multiple of threadnum; that is, the execution time increases with the number of threads.
(Unfortunately I cannot run a C# Profiler at the moment. Additionally I would prefer C# 3 code compatibility. )
Others are likely struggling as well. It may be that the VS 2010 express edition vshost process stacks and schedules threads to be run sequentially?
Another MT performance issue was posted here recently as "Visual Studio C# 2010 Express Debug running Faster than Release".
Code:
public int threadnum = 8;
Thread[] threads = new Thread[threadnum];
Stopwatch stpwtch = new Stopwatch();
stpwtch.Start();
for (var threadidx = 0; threadidx < threadnum; threadidx++)
{
threads[threadidx] = new Thread(BuildDictionary);
threads[threadidx].Start(threadidx);
threads[threadidx].Join(); //Blocks the calling thread, till thread completion
}
WriteLine("Total - time: {0} msec", stpwtch.ElapsedMilliseconds);
Can you help please?
Update:
It appears that the strange behavior of an almost linear slowdown with increasing thread-number is an artifact due to the numerous hooks of the IDE's Debugger.
Running the process outside the developer environment, I actually do get a 30% speed increase on a 2 logical/physical core machine. During debugging I am already at the high end of CPU utilization, and hence I suspect it is wise to have some leeway during development through additional idle cores.
As I did initially, I now let each thread compute on its own local data chunk, which is then written back under a lock to a shared list and aggregated after all threads have finished.
Conclusion:
Be heedful of the environment the process is running in.

We can put the dictionary synchronization issues Tony the Lion mentions in his answer aside for the moment, because in your current implementation you are in fact not running anything in parallel!
Let's take a look at what you are currently doing in your loop:
Start a thread.
Wait for the thread to complete.
Start the next thread.
In other words, you should not be calling Join inside the loop.
Instead, you should start all threads as you are doing, but use a signaling construct such as an AutoResetEvent to determine when all threads have completed.
See example program:
class Program
{
static EventWaitHandle _waitHandle = new AutoResetEvent(false);
static void Main(string[] args)
{
int numThreads = 5;
for (int i = 0; i < numThreads; i++)
{
new Thread(DoWork).Start(i);
}
for (int i = 0; i < numThreads; i++)
{
_waitHandle.WaitOne();
}
Console.WriteLine("All threads finished");
}
static void DoWork(object id)
{
Thread.Sleep(1000);
Console.WriteLine(String.Format("Thread {0} completed", (int)id));
_waitHandle.Set();
}
}
Alternatively you could just as well be calling Join in the second loop if you have references to the threads available.
After you have done this you can and should worry about the dictionary synchronization problems.
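One caveat with a single AutoResetEvent: calling Set on an event that is already signaled has no effect, so if two workers signal before the main thread wakes up, one signal can be lost and the main thread may wait forever. A hedged alternative, assuming .NET 4 or later is available, is CountdownEvent, which counts signals instead of latching them. The names CountdownDemo and Run are illustrative.

```csharp
using System;
using System.Threading;

static class CountdownDemo
{
    // Starts numThreads workers and waits until each has signaled once.
    public static int Run(int numThreads)
    {
        int completed = 0;
        using (var countdown = new CountdownEvent(numThreads))
        {
            for (int i = 0; i < numThreads; i++)
            {
                new Thread(() =>
                {
                    Thread.Sleep(100); // simulated work
                    Interlocked.Increment(ref completed);
                    countdown.Signal(); // every signal is counted, none is lost
                }).Start();
            }
            countdown.Wait(); // returns once the count reaches zero
        }
        return completed;
    }

    static void Main()
    {
        Console.WriteLine("All {0} threads finished", Run(5));
    }
}
```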

A Dictionary can support multiple readers concurrently, as long as the collection is not modified. From MSDN
You say:
but a shared dictionary worked fine, without locking.
Each thread increments the values contained in a shared dictionary
Your program is by definition broken: if you alter the data in the dictionary without proper locking, you will end up with bugs. Nothing more needs to be said.

I wouldn't use some shared static Dictionary, if each thread worked on a local copy you could amalgamate your results once all threads had signalled completion.
WaitHandle.WaitAll avoids any deadlocking on an AutoResetEvent.
using System;
using System.Collections.Generic;
using System.Threading;
class Program
{
static void Main()
{
char[] text = "Some String".ToCharArray();
int numThreads = 5;
// I leave the implementation of the next line (and of the Partition type) to the OP.
Partition[] partitions = PartitionWork(text, numThreads);
ManualResetEvent[] completions = new ManualResetEvent[numThreads];
IDictionary<char, int>[] results = new IDictionary<char, int>[numThreads];
for (int i = 0; i < numThreads; i++)
{
int index = i; // copy the loop variable so each thread captures its own value
results[index] = new Dictionary<char, int>();
completions[index] = new ManualResetEvent(false);
// Thread.Start accepts at most one argument, so pass the rest through a lambda.
new Thread(() => BuildDictionary(
text,
partitions[index].Start,
partitions[index].End,
results[index],
completions[index])).Start();
}
// WaitAll rejects timeouts longer than Int32.MaxValue milliseconds (about 24.8 days).
if (WaitHandle.WaitAll(completions, TimeSpan.FromDays(1)))
{
Console.WriteLine("All threads finished");
}
else
{
Console.WriteLine("Timed out after a day");
}
// Merge the results of all other threads into the first dictionary
IDictionary<char, int> result = results[0];
for (int i = 1; i < numThreads; i++)
{
foreach (KeyValuePair<char, int> item in results[i])
{
if (result.ContainsKey(item.Key))
{
result[item.Key] += item.Value;
}
else
{
result.Add(item.Key, item.Value);
}
}
}
}
// Counts the characters in text[start..finish] (inclusive) into result,
// then signals completion.
static void BuildDictionary(
char[] text,
int start,
int finish,
IDictionary<char, int> result,
EventWaitHandle completed)
{
for (int i = start; i <= finish; i++)
{
if (result.ContainsKey(text[i]))
{
result[text[i]]++;
}
else
{
result.Add(text[i], 1);
}
}
completed.Set();
}
}
With this implementation the only variable that is ever shared is the char[] of the text and that is always read only.
You do have the burden of merging the dictionaries at the end, but that is a small price for avoiding any concurrency issues. In a later version of the framework I would have used TPL and ConcurrentDictionary and possibly Partitioner<TSource>.
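For readers on .NET 4 or later, a version along the lines of the TPL and ConcurrentDictionary remark might look like the sketch below. The names Histogram and Count are assumptions. Handing out work one character at a time is simple but not efficient; chunking the input (for example with Partitioner) would reduce the overhead.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class Histogram
{
    // Builds a character histogram using a thread-safe dictionary.
    public static ConcurrentDictionary<char, int> Count(string text)
    {
        var counts = new ConcurrentDictionary<char, int>();
        // AddOrUpdate inserts 1 for a new key or applies the update function
        // to an existing one; the resulting update is atomic per key.
        Parallel.ForEach(text.ToCharArray(),
            c => counts.AddOrUpdate(c, 1, (key, old) => old + 1));
        return counts;
    }

    static void Main()
    {
        var counts = Count("hello world");
        Console.WriteLine("l: {0}, o: {1}", counts['l'], counts['o']); // prints "l: 3, o: 2"
    }
}
```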

I totally agree with TonyTheLion and the others: once you fix the actual problem of joining in the wrong place, there will still be a problem with updating the shared dictionary without locks. I wanted to offer a quick workaround: just wrap your integer value in an object:
instead of:
Dictionary<uint, int> dict = new Dictionary<uint, int>();
use:
class Entry { public int value; }
Dictionary<uint, Entry> dict = new Dictionary<uint, Entry>();
and now increment Entry.value instead. That way the Dictionary itself never sees a modification, and it is safe to read it without locking, provided every key has been added before the threads start.
Note: this only works if each thread is guaranteed to use only its own Entry. I've just noticed that this is not true here, since you said 'histogram of characters'. You will have to lock each Entry during the increment, or some increments may be lost. Still, locking at the Entry level will be significantly faster than locking the whole dictionary.
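A sketch of the Entry approach follows. It swaps the per-Entry lock for Interlocked.Increment, which provides the same atomicity for a simple counter, and it adds all keys on one thread first so the dictionary structure is never modified concurrently. EntryHistogram, Count and the striding scheme are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

class Entry { public int value; }

static class EntryHistogram
{
    public static Dictionary<uint, Entry> Count(uint[] data, int threadCount)
    {
        // Add every key up front, on a single thread. Afterwards the worker
        // threads only read the dictionary, which is safe without locking.
        var dict = new Dictionary<uint, Entry>();
        foreach (uint d in data)
        {
            if (!dict.ContainsKey(d)) dict[d] = new Entry();
        }

        var threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            int start = t; // per-thread copy of the loop variable
            threads[t] = new Thread(() =>
            {
                // Thread 'start' handles elements start, start + threadCount, ...
                for (int i = start; i < data.Length; i += threadCount)
                {
                    Interlocked.Increment(ref dict[data[i]].value);
                }
            });
            threads[t].Start();
        }
        foreach (var thread in threads) thread.Join();
        return dict;
    }

    static void Main()
    {
        uint[] data = "hello world".Select(c => (uint)c).ToArray();
        var counts = Count(data, 4);
        Console.WriteLine("l: " + counts[(uint)'l'].value); // prints "l: 3"
    }
}
```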

Rotem saw it.
Your main thread should Join the other threads only after having started all of them.
Otherwise it waits for the first thread to finish before it starts and waits for the second one.
for (var threadidx = 0; threadidx < threadnum; threadidx++)
{
threads[threadidx] = new Thread(BuildDictionary);
threads[threadidx].Start(threadidx);
}
for (var threadidx = 0; threadidx < threadnum; threadidx++)
{
threads[threadidx].Join(); //Blocks the calling thread, till thread completion
}

As Rotem points out, by joining inside the loop you wait for each thread to complete before continuing.
The hint for why this happens can be found in the Thread.Join documentation on MSDN:
Blocks the calling thread until a thread terminates
So your loop will not continue until that one thread has completed its work. To start all the threads and then wait for them to complete, join them in a separate loop after the one that starts them:
public int threadnum = 8;
Thread[] threads = new Thread[threadnum];
Stopwatch stpwtch = new Stopwatch();
stpwtch.Start();
// Start all the threads doing their work
for (var threadidx = 0; threadidx < threadnum; threadidx++)
{
threads[threadidx] = new Thread(BuildDictionary);
threads[threadidx].Start(threadidx);
}
// Join to all the threads to wait for them to complete
for (var threadidx = 0; threadidx < threadnum; threadidx++)
{
threads[threadidx].Join();
}
System.Diagnostics.Debug.WriteLine("Total - time: {0} msec", stpwtch.ElapsedMilliseconds);
You will really need to post your BuildDictionary function. It is very likely that the operation will be no faster with multiple threads and the threading overhead will actually increase execution time.
