My console application starts 100 threads that all do exactly the same thing: send some data to a host on the internal network. The host is very responsive; I have checked that it can handle a much larger number of requests per second. The console application is also quite primitive and responsive (it doesn't use a database or anything similar); it only sends requests to the host. Increasing the number of threads doesn't improve the speed. Something seems to be throttling the communication between the app and the host. Moreover, I ran three instances of the same console application at the same time and together they did 3x as much work, so the limitation seems to be at the application level.
I have already increased DefaultConnectionLimit but with no effect.
class Program
{
static void Main(string[] args)
{
System.Net.ServicePointManager.DefaultConnectionLimit = 200;
for (var i = 1; i <= 100; i++)
{
int threadId = i;
Thread thread = new Thread(() =>
{
Testing(threadId);
});
thread.Start();
}
}
private static void Testing(int threadId)
{
//just communicate with host
}
}
The thing is that creating more threads than you have processor cores is pointless.
For example, if you have 4 cores and create 100 threads, where do you expect the other 96 threads to run? They have to wait, and the drop in performance comes from creating and managing unnecessary threads.
You should use the ThreadPool, which optimizes the number of threads created and scheduled to work.
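As a rough illustration of the thread pool suggestion, here is a minimal sketch that queues the work items with ThreadPool.QueueUserWorkItem and waits for them with a CountdownEvent. The Testing body below is a hypothetical stand-in (a short sleep) for the question's real request code, and the class and method names are made up for this example.

```csharp
using System;
using System.Threading;

public static class PoolSketch
{
    // Hypothetical stand-in for the question's Testing(threadId) method.
    private static void Testing(int workItemId)
    {
        Thread.Sleep(10); // simulate a short request to the host
    }

    // Queues `count` work items on the thread pool and returns how many completed.
    public static int RunOnPool(int count)
    {
        int completed = 0;
        using (var done = new CountdownEvent(count))
        {
            for (var i = 0; i < count; i++)
            {
                int id = i; // copy the loop variable for the closure
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try
                    {
                        Testing(id);
                        Interlocked.Increment(ref completed);
                    }
                    finally
                    {
                        done.Signal(); // always signal, even if Testing throws
                    }
                });
            }
            done.Wait(); // block until every work item has signalled
        }
        return completed;
    }

    public static void Main()
    {
        Console.WriteLine(RunOnPool(100));
    }
}
```

The pool decides how many threads actually run at once, so the 100 work items share a much smaller set of threads.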
Creating a new Thread every time is very expensive. You shouldn't create threads explicitly. Use the Task API instead to run this on the thread pool:
var tasks = new Task[100];
for (var i = 0; i < 100; i++)
{
int threadId = i;
tasks[i] = Task.Run(() => Testing(threadId));
}
Task.WhenAll(tasks).GetAwaiter().GetResult();
Edit: As per the discussion in the comments, I was overestimating how much a large number of threads would help. I have gone back to Parallel.ForEach with a reasonable MaxDegreeOfParallelism and will just have to wait it out.
I have a 2D array data structure, and perform work on slices of the data. There will only ever be around 1000 threads required to work on all the data simultaneously. Basically there are around 1000 "days" worth of data for all ~7000 data points, and I would like to process the data for each day in a new thread in parallel.
My issue is that doing work in the child threads dramatically slows down how quickly the main thread starts them. If no work is done in the child threads, the main thread starts them all basically instantly. In my example below, with just a bit of work, it takes ~65ms to start all the threads. In my real use case, the worker threads will take around 5-10 seconds to compute everything they need, but I would like them all to start instantly; otherwise I am basically running the work in sequence. I do not understand why their work slows down the main thread's ability to start them.
How the data is set up shouldn't matter (I hope). The way it's set up might look weird; I was just simulating exactly how I receive the data. What's important is that if you comment out the foreach loop in the DoThreadWork method, the time it takes to start the threads is far lower.
I have the for (var i = 0; i < 4; i++) loop just to run the simulation multiple times to see 4 sets of timing results to make sure that it wasn't just slow the first time.
Here is a code snippet to simulate my real code:
public static void Main(string[] args)
{
var fakeData = Enumerable
.Range(0, 7000)
.Select(_ => Enumerable.Range(0, 400).ToArray())
.ToArray();
const int offset = 100;
var dataIndices = Enumerable
.Range(offset, 290)
.ToArray();
for (var i = 0; i < 4; i++)
{
var s = Stopwatch.StartNew();
var threads = dataIndices
.Select(n =>
{
var thread = new Thread(() =>
{
foreach (var fake in fakeData)
{
var sliced = new ArraySegment<int>(fake, n - offset, n - (n - offset));
DoThreadWork(sliced);
}
});
return thread;
})
.ToList();
foreach (var thread in threads)
{
thread.Start();
}
Console.WriteLine($"Before Join: {s.ElapsedMilliseconds}");
foreach (var thread in threads)
{
thread.Join();
}
Console.WriteLine($"After Join: {s.ElapsedMilliseconds}");
}
}
private static void DoThreadWork(ArraySegment<int> fakeData)
{
// Commenting out this foreach loop will dramatically increase the speed
// in which all the threads start
var a = 0;
foreach (var fake in fakeData)
{
// Simulate thread work
a += fake;
}
}
Use the thread/task pool and limit the thread/task count to 2*(CPU cores) at most. Creating more threads doesn't magically get more work done, because you need hardware threads to run them (1 per core on non-SMT CPUs, 2 per core with Intel Hyper-Threading or AMD's SMT implementation). Running hundreds or thousands of threads that do real work, rather than passively awaiting asynchronous callbacks (i.e. I/O), is far less efficient because the CPU thrashes on needless context switches.
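To make the advice about a bounded pool concrete, here is a minimal sketch using Parallel.For with MaxDegreeOfParallelism set to Environment.ProcessorCount. The data shape (7000 rows of 400 values, slices of width 100) follows the question's simulation; the class and method names are invented for this example, and summing a slice is only a placeholder for the real per-day work.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class BoundedParallelSketch
{
    // Sums one fixed-width slice of every row (placeholder for the per-"day" work).
    private static long SumSlice(int[][] data, int start, int length)
    {
        long total = 0;
        foreach (var row in data)
        {
            for (int i = start; i < start + length; i++)
            {
                total += row[i];
            }
        }
        return total;
    }

    // Processes every start index with at most one worker per logical processor.
    public static long[] ProcessAll(int[][] data, int[] startIndices, int length)
    {
        var results = new long[startIndices.Length];
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };
        Parallel.For(0, startIndices.Length, options, k =>
        {
            // Each iteration writes only its own slot, so no locking is needed.
            results[k] = SumSlice(data, startIndices[k], length);
        });
        return results;
    }

    public static void Main()
    {
        var data = Enumerable.Range(0, 7000)
            .Select(_ => Enumerable.Range(0, 400).ToArray())
            .ToArray();
        var starts = Enumerable.Range(0, 290).ToArray();
        var results = ProcessAll(data, starts, 100);
        Console.WriteLine(results.Length);
    }
}
```

The work items are queued all at once, but only about as many as there are logical processors run at any moment, which avoids the context-switch thrashing described above.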
I've been working on a multithreaded file archiver for a week now. It uses plain threads exclusively; synchronization is done with monitors and AutoResetEvent.
I set the number of threads to the number of cores like this:
public static int GetCoreCount()
{
int coreCount = 0;
foreach (var item in new System.Management.ManagementObjectSearcher("Select * from Win32_Processor").Get())
{
coreCount += int.Parse(item["NumberOfCores"].ToString());
}
return coreCount;
}
But that loads my CPU to ~65% at most.
And the load is far from uniform; it constantly falls and rises.
Does anyone have any idea how to use 100% of the processor's capacity?
This is my Run() code :
public void Run()
{
var readingThread = new Thread(new ThreadStart(ReadInFile));
var compressingThreads = new List<Thread>();
for (var i = 0; i < CoreManager.GetCoreCount(); i++)
{
var j = i;
ProcessEvents[j] = new AutoResetEvent(false);
compressingThreads.Add(new Thread(() => Process(j)));
}
var writingThread = new Thread(new ThreadStart(WriteOutFile));
readingThread.Start();
foreach (var compressThread in compressingThreads)
{
compressThread.Start();
}
writingThread.Start();
WaitHandle.WaitAll(ProcessEvents);
OutputDictionary.SetCompleted();
writingThread.Join();
}
It's not possible to tell what is limiting your core usage without profiling, and also knowing how much data you are compressing in your test.
However, I can say that in order to get good efficiency, which includes both full core utilization and close to a factor of n speedup for n threads over one thread, in pigz I have to create pools of threads that are always there, either running or waiting for more work. Creating and destroying threads for every chunk of data to be processed has a huge impact. I also have pools of pre-allocated blocks of memory for the same reason.
The source code at the link, in C, may be of help.
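For readers working in C#, the idea of a pool of long-lived worker threads that wait for work can be sketched roughly as follows. This is not how pigz is implemented; it is an assumed C# analogue built on BlockingCollection, with ProcessChunk as a hypothetical placeholder for the compression step.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class WorkerPoolSketch
{
    // Hypothetical stand-in for compressing one chunk; here it just counts bytes.
    private static int ProcessChunk(byte[] chunk) => chunk.Length;

    // Starts `workerCount` long-lived threads that pull chunks from a shared queue.
    public static long Run(IEnumerable<byte[]> chunks, int workerCount)
    {
        long totalBytes = 0;
        using (var queue = new BlockingCollection<byte[]>(boundedCapacity: workerCount * 2))
        {
            var workers = new List<Thread>();
            for (int i = 0; i < workerCount; i++)
            {
                var worker = new Thread(() =>
                {
                    // Blocks while the queue is empty; ends once adding is
                    // complete and the queue has been drained.
                    foreach (var chunk in queue.GetConsumingEnumerable())
                    {
                        Interlocked.Add(ref totalBytes, ProcessChunk(chunk));
                    }
                });
                worker.Start();
                workers.Add(worker);
            }

            foreach (var chunk in chunks)
            {
                queue.Add(chunk); // blocks when the bounded queue is full
            }

            queue.CompleteAdding();
            foreach (var worker in workers)
            {
                worker.Join();
            }
        }
        return totalBytes;
    }

    public static void Main()
    {
        var chunks = new List<byte[]> { new byte[1024], new byte[2048], new byte[512] };
        Console.WriteLine(Run(chunks, Environment.ProcessorCount));
    }
}
```

The threads are created once and reused for every chunk, and the bounded queue keeps the reader from racing far ahead of the workers.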
This is an example of thread-local storage (TLS) from an Apress parallel programming book. I know that on a computer with 4 cores, 4 threads can run in parallel at the same time. In this example we create 10 tasks, and we suppose that the computer has 4 cores. Each thread-local storage slot lives on one thread, so when the 10 tasks start in parallel, only 4 threads run them. We have 4 TLS slots, so 10 tasks try to change 4 thread-local storage objects. I want to ask: how does TLS prevent a data race when the thread count is less than the task count?
using System;
using System.Threading;
using System.Threading.Tasks;
namespace Listing_04
{
class BankAccount
{
public int Balance
{
get;
set;
}
}
class Listing_04
{
static void Main(string[] args)
{
// create the bank account instance
BankAccount account = new BankAccount();
// create an array of tasks
Task<int>[] tasks = new Task<int>[10];
// create the thread local storage
ThreadLocal<int> tls = new ThreadLocal<int>();
for (int i = 0; i < 10; i++)
{
// create a new task
tasks[i] = new Task<int>((stateObject) =>
{
// get the state object and use it
// to set the TLS data
tls.Value = (int)stateObject;
// enter a loop for 1000 balance updates
for (int j = 0; j < 1000; j++)
{
// update the TLS balance
tls.Value++;
}
// return the updated balance
return tls.Value;
}, account.Balance);
// start the new task
tasks[i].Start();
}
// get the result from each task and add it to
// the balance
for (int i = 0; i < 10; i++)
{
account.Balance += tasks[i].Result;
}
// write out the counter value
Console.WriteLine("Expected value {0}, Balance: {1}",
10000, account.Balance);
// wait for input before exiting
Console.WriteLine("Press enter to finish");
Console.ReadLine();
}
}
}
We have 4 TLS so 10 task try to change 4 Thread local storage object
In your example, you could have anywhere between 1 and 10 TLS slots. This is because a) you are not managing your threads explicitly and so the tasks are executed using the thread pool, and b) the thread pool creates and destroys threads over time according to demand.
A loop of only 1000 iterations will complete almost instantaneously. So it's likely all ten of your tasks will get through the thread pool before the thread pool decides a work item has been waiting long enough to justify adding any new threads. But there is no guarantee of this.
Some important parts of the documentation include these statements:
By default, the minimum number of threads is set to the number of processors on a system
and
When demand is low, the actual number of thread pool threads can fall below the minimum values.
In other words, on your four-core system, the default minimum number of threads is four, but the actual number of threads active in the thread pool could in fact be less than that. And if the tasks take long enough to execute, the number of active threads could rise above that.
The biggest thing to keep in mind here is that using TLS in the context of a thread pool is almost certainly the wrong thing to do.
You use TLS when you have control over the threads, and you want a thread to be able to maintain some data private or unique to that thread. That's the opposite of what happens when you are using the thread pool. Even in the simplest case, multiple tasks can use the same thread, and so would wind up sharing TLS. And in more complicated scenarios, such as when using await, a single task could wind up executed in different threads, and so that one task could wind up using different TLS values depending on what thread is assigned to that task at that moment.
how Tls prevent data race problem when thread count < Task count ??
That depends on what "data race problem" you're talking about.
The fact is, the code you posted is filled with problems that are at the very least odd, if not outright wrong. For example, you are passing account.Balance as the initial value for each task. But why? This value is evaluated when you create the task, before it could ever be modified later, so what's the point of passing it?
And if you thought you were passing whatever the current value is when the task starts, that seems like that would be wrong too. Why would it be valid to make the starting value for a given task vary according to how many tasks had already completed and been accounted for in your later loop? (To be clear: that's not what's happening…but even if it were, it'd be a strange thing to do.)
Beyond all that, it's not clear what you thought using TLS here would accomplish anyway. When each task starts, you reinitialize the TLS value to 0 (i.e. the value of account.Balance that you've passed to the Task<int> constructor). So no thread involved ever sees a value other than 0 during the context of executing any given task. A local variable would accomplish exactly the same thing, without the overhead of TLS and without confusing anyone who reads the code and tries to figure out why TLS was used when it adds no value to the code.
So, does TLS solve some sort of "data race problem"? Not in this example, it doesn't appear to. So asking how it does that is impossible to answer. It doesn't do that, so there is no "how".
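To illustrate the point about a local variable doing the same job, here is a minimal sketch of the listing rewritten without ThreadLocal. The class and method names are invented for this example, and it uses Task.Run rather than the original constructor-plus-Start pattern.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class LocalBalanceSketch
{
    // Runs `taskCount` tasks that each count to `updatesPerTask` in a local variable.
    public static int Run(int taskCount, int updatesPerTask)
    {
        var tasks = new Task<int>[taskCount];
        for (int i = 0; i < taskCount; i++)
        {
            tasks[i] = Task.Run(() =>
            {
                // A local is private to this invocation, whichever pool
                // thread happens to run it, so nothing is shared.
                int localBalance = 0;
                for (int j = 0; j < updatesPerTask; j++)
                {
                    localBalance++;
                }
                return localBalance;
            });
        }

        // Wait for all tasks and add up their individual results.
        int[] results = Task.WhenAll(tasks).GetAwaiter().GetResult();
        return results.Sum();
    }

    public static void Main()
    {
        Console.WriteLine("Expected value {0}, Balance: {1}", 10000, Run(10, 1000));
    }
}
```

Each task returns its own count and the totals are combined on the calling thread, so no shared state is touched while the tasks run.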
For what it's worth, I modified your example slightly so that it would report the individual threads that were assigned to the tasks. I found that on my machine, the number of threads used varied between two and eight. This is consistent with my eight-core machine, with the variation due to how much the first thread in the pool can get done before the pool has initialized additional threads and assigned tasks to them. Most commonly, I would see the first thread completing between three and five of the tasks, with the remaining tasks handled by remaining individual threads.
In each case, the thread pool created eight threads as soon as the tasks were started. But most of the time, at least one of those threads wound up unused, because the other threads were able to complete the tasks before the pool was saturated. That is, there is overhead in the thread pool just managing the tasks, and in your example the tasks are so inexpensive that this overhead allows one or more thread pool threads to finish one task before the thread pool needs that thread for another.
I've copied that version below. Note that I also added a delay between trial iterations, to allow the thread pool to terminate the threads it created (on my machine, this took 20 seconds, hence the delay time hard-coded…you can see the threads being terminated in the debugger output).
static void Main(string[] args)
{
while (_PromptContinue())
{
// create the bank account instance
BankAccount account = new BankAccount();
// create an array of tasks
Task<int>[] tasks = new Task<int>[10];
// create the thread local storage
ThreadLocal<int> tlsBalance = new ThreadLocal<int>();
ThreadLocal<(int Id, int Count)> tlsIds = new ThreadLocal<(int, int)>(
() => (Thread.CurrentThread.ManagedThreadId, 0), true);
for (int i = 0; i < 10; i++)
{
int k = i;
// create a new task
tasks[i] = new Task<int>((stateObject) =>
{
// get the state object and use it
// to set the TLS data
tlsBalance.Value = (int)stateObject;
(int id, int count) = tlsIds.Value;
tlsIds.Value = (id, count + 1);
Console.WriteLine($"task {k}: thread {id}, initial value {tlsBalance.Value}");
// enter a loop for 1000 balance updates
for (int j = 0; j < 1000; j++)
{
// update the TLS balance
tlsBalance.Value++;
}
// return the updated balance
return tlsBalance.Value;
}, account.Balance);
// start the new task
tasks[i].Start();
}
// Make sure this thread isn't busy at all while the thread pool threads are working
Task.WaitAll(tasks);
// get the result from each task and add it to
// the balance
for (int i = 0; i < 10; i++)
{
account.Balance += tasks[i].Result;
}
// write out the counter value
Console.WriteLine("Expected value {0}, Balance: {1}", 10000, account.Balance);
Console.WriteLine("{0} thread ids used: {1}",
tlsIds.Values.Count,
string.Join(", ", tlsIds.Values.Select(t => $"{t.Id} ({t.Count})")));
System.Diagnostics.Debug.WriteLine("done!");
_Countdown(TimeSpan.FromSeconds(20));
}
}
private static void _Countdown(TimeSpan delay)
{
System.Diagnostics.Stopwatch sw = System.Diagnostics.Stopwatch.StartNew();
TimeSpan remaining = delay - sw.Elapsed,
sleepMax = TimeSpan.FromMilliseconds(250);
int cchMax = $"{delay.TotalSeconds,2:0}".Length;
string format = $"\r{{0,{cchMax}:0}}", previousText = null;
while (remaining > TimeSpan.Zero)
{
string nextText = string.Format(format, remaining.TotalSeconds);
if (previousText != nextText)
{
Console.Write(format, remaining.TotalSeconds);
previousText = nextText;
}
Thread.Sleep(remaining > sleepMax ? sleepMax : remaining);
remaining = delay - sw.Elapsed;
}
Console.Write(new string(' ', cchMax));
Console.Write('\r');
}
private static bool _PromptContinue()
{
Console.Write("Press Esc to exit, any other key to proceed: ");
try
{
return Console.ReadKey(true).Key != ConsoleKey.Escape;
}
finally
{
Console.WriteLine();
}
}
I am teaching myself C# from my usual C++ programming and now I'm doing threads.
The following simple code compiles fine and should output beeps on a loop via threads for 30 seconds.
using System;
using System.Runtime.InteropServices;
using System.Threading;
class BeepSample
{
[DllImport("kernel32.dll", SetLastError=true)]
static extern bool Beep(uint dwFreq, uint dwDuration);
static void Main()
{
Console.WriteLine("Testing PC speaker...");
for (uint i = 100; i <= 20000; i++)
{
var BeepThread = new Thread(delegate() { Beep(i, 30000); });
}
Console.WriteLine("Testing complete.");
Console.ReadLine();
}
}
Only problem is the threads don't seem to work.
I know I am missing something basic.
Any ideas?
You forgot to start the thread (MSDN link):
for (uint i = 100; i <= 20000; i++)
{
var BeepThread = new Thread(delegate() { Beep(i, 30000); });
BeepThread.Start();
}
However, that looks suspicious. Why would you need 19,901 threads? You probably want a single thread that runs a loop and pauses for short periods while it outputs different frequencies through the beeper.
Only problem is the threads don't seem to work.
That part is clear: you never started the threads, so they cannot do anything.
The code has several other issues:
Closure issue: the lambda captures the loop variable itself, so copy it into a local first:
for (uint i = 100; i <= 20000; i++)
{
uint icopy = i; // must be uint: i is uint and Beep takes uint
var BeepThread = new Thread(delegate() { Beep(icopy, 30000); });
BeepThread.Start();
}
The Thread class starts foreground threads, and each logical processor core can run only one thread at a time. Each Thread is also a very costly resource in both computation and memory, since it needs a Thread Environment Block, kernel stack memory, and user stack memory. Even if the current code runs, it will bring your system to its knees, and you will most likely have to kill the process to recover.
Console.ReadLine(); only blocks the main thread. The other threads, being foreground threads, keep running even after the main thread exits. The proper way to wait is to call Join on each Thread object, which makes the main thread wait until that thread completes.
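The Join approach mentioned above could look roughly like this. It is a sketch under assumptions: the Beep call is replaced by a short sleep so the example runs anywhere, and the class and method names are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class JoinSketch
{
    // Starts one thread per value in [first, last] and waits for all of them.
    public static int RunAndJoin(uint first, uint last)
    {
        int finished = 0;
        var threads = new List<Thread>();
        for (uint i = first; i <= last; i++)
        {
            uint icopy = i; // per-iteration copy so each closure sees its own value
            var thread = new Thread(() =>
            {
                Thread.Sleep(50); // placeholder for Beep(icopy, duration)
                Console.WriteLine($"thread for frequency {icopy} finished");
                Interlocked.Increment(ref finished);
            });
            thread.Start();
            threads.Add(thread);
        }

        // Join blocks the calling thread until the given thread has completed.
        foreach (var thread in threads)
        {
            thread.Join();
        }
        return finished;
    }

    public static void Main()
    {
        Console.WriteLine(RunAndJoin(500, 505));
    }
}
```

Keeping the Thread objects in a list is what makes the final Join loop possible; the original code discarded each reference right after creating it.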
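One of the preferred ways to rewrite the same code is to use the Task Parallel Library:
// The upper bound of Parallel.For is exclusive, so 20001 covers 100..20000
Parallel.For(100, 20001,
new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
i =>
{
// Each iteration receives its own i, so no closure copy is needed
Beep((uint)i, 30000);
});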
Benefits:
The code doesn't create so many threads that it kills the system
It runs on the thread pool (background threads) and uses only as many threads as it needs; the maximum never exceeds the system's processor count and is usually far lower, because threads are reused
It automatically blocks the calling thread until all iterations finish
Ok, thanks guys. Had gone for lunch.
I implemented...
for (uint i = 500; i <= 550; i++)
{
uint icopy = i;
var BeepThread = new Thread(delegate() { Beep(icopy, 30000); });
BeepThread.Start();
}
Which worked great.
As predicted, the threads did not terminate after the main thread finished, but it does what I want, which is awesome.
Bless y'all.
I am creating a console program, which can test read / write to a Cache by simulating multiple clients, and have written following code. Please help me understand:
Is this the correct way to simulate multiple clients?
What more can I do to make it a genuine load test?
void Main()
{
List<Task<long>> taskList = new List<Task<long>>();
for (int i = 0; i < 500; i++)
{
taskList.Add(TestAsync());
}
Task.WaitAll(taskList.ToArray());
double averageTime = taskList.Average(t => t.Result); // Average returns double
}
public static async Task<long> TestAsync()
{
// Returns the total time taken using Stop Watch in the same module
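return await Task.Factory.StartNew(() =>
{
var stopwatch = System.Diagnostics.Stopwatch.StartNew();
// Call Cache Read / Write here
return stopwatch.ElapsedMilliseconds;
});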
}
Adjusted your code slightly to see how many threads we have at a particular time.
static volatile int currentExecutionCount = 0;
static void Main(string[] args)
{
List<Task<long>> taskList = new List<Task<long>>();
var timer = new Timer(Print, null, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1));
for (int i = 0; i < 1000; i++)
{
taskList.Add(DoMagic());
}
Task.WaitAll(taskList.ToArray());
timer.Change(Timeout.Infinite, Timeout.Infinite);
timer = null;
//to check that we have all the threads executed
Console.WriteLine("Done " + taskList.Sum(t => t.Result));
Console.ReadLine();
}
static void Print(object state)
{
Console.WriteLine(currentExecutionCount);
}
static async Task<long> DoMagic()
{
return await Task.Factory.StartNew(() =>
{
Interlocked.Increment(ref currentExecutionCount);
//place your code here
Thread.Sleep(TimeSpan.FromMilliseconds(1000));
Interlocked.Decrement(ref currentExecutionCount);
return 4;
}
// this hints the scheduler to run the task on a dedicated thread rather than a pooled one
, TaskCreationOptions.LongRunning
);
}
The result: inside a virtual machine, I have from 2 to 10 threads running simultaneously if I don't use the hint. With the hint, up to 100. And on a real machine I can see 1000 threads at once. Process Explorer confirms this. Some details on the hint that would be helpful.
If it is very busy, then apparently your clients have to wait a while before their requests are serviced. Your program does not measure this, because your stopwatch starts running when the service request starts.
If you also want to measure what happens to the average time before a request is finished, you should start your stopwatch when the request is made, not when the request is serviced.
Your program only takes threads from the thread pool. If you start more tasks than there are threads, some tasks will have to wait before TestAsync starts running. This wait time would be measured if you remember the time at which Task.Run is called.
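One way to capture that wait time is sketched below: the stopwatch starts when the request is queued, and the task records both how long it waited for a pool thread and how long the whole request took. The TimedRequest helper and its tuple result are hypothetical names for this example, and the sleep stands in for the cache call.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class QueueTimeSketch
{
    // Returns the time spent queued and the total time from request to completion.
    public static Task<(long WaitMs, long TotalMs)> TimedRequest(Action request)
    {
        var sw = Stopwatch.StartNew(); // starts when the request is made
        return Task.Run<(long WaitMs, long TotalMs)>(() =>
        {
            long waitMs = sw.ElapsedMilliseconds; // time spent waiting for a pool thread
            request();
            return (waitMs, sw.ElapsedMilliseconds);
        });
    }

    public static void Main()
    {
        // Hypothetical workload: 200 simulated cache calls of about 100 ms each.
        var tasks = Enumerable.Range(0, 200)
            .Select(_ => TimedRequest(() => Thread.Sleep(100)))
            .ToArray();
        var results = Task.WhenAll(tasks).GetAwaiter().GetResult();
        Console.WriteLine(
            $"avg wait {results.Average(r => r.WaitMs):F0} ms, " +
            $"avg total {results.Average(r => r.TotalMs):F0} ms");
    }
}
```

If the average wait is a large share of the average total, the test is measuring thread pool queuing rather than the cache itself.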
Besides the flaw in the time measurements, how many simultaneous service requests do you expect? Are there enough free threads in your thread pool to simulate this? If you expect about 50 service requests at the same time, and your thread pool has only 20 threads, then you'll never run 50 service requests at the same time. Vice versa: if your thread pool is far bigger than your expected number of simultaneous service requests, then you'll measure longer times than is actually the case.
Consider changing the number of threads in your thread pool, and make sure no one else uses any threads of the pool.
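A minimal sketch of adjusting the pool size with ThreadPool.GetMinThreads and ThreadPool.SetMinThreads follows. The helper name and the figure of 50 concurrent requests are assumptions taken from the example numbers above, not measured values.

```csharp
using System;
using System.Threading;

public static class PoolSizeSketch
{
    // Raises the pool's minimum worker count so `expectedConcurrent` requests
    // can start without waiting for the pool to inject threads gradually.
    // Returns false if the runtime rejects the new value.
    public static bool EnsureMinWorkers(int expectedConcurrent)
    {
        ThreadPool.GetMinThreads(out int minWorkers, out int minIo);
        if (minWorkers >= expectedConcurrent)
        {
            return true; // already large enough
        }
        // Keep the I/O completion thread minimum unchanged.
        return ThreadPool.SetMinThreads(expectedConcurrent, minIo);
    }

    public static void Main()
    {
        bool ok = EnsureMinWorkers(50);
        ThreadPool.GetMinThreads(out int workers, out int io);
        Console.WriteLine($"applied: {ok}, min workers: {workers}, min I/O: {io}");
    }
}
```

Call this once before starting the load test, since the setting is process-wide and affects every user of the pool.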