I wanted to stress test my new CPU, so I threw this together in about 2 minutes.
When I add more threads, the efficiency decreases dramatically. These are the results (note that I set the priority to High in Task Manager):
1 Thread: After one minute on 1 thread(s), you got to the number/prime 680263811
2 Threads: After one minute on 2 thread(s), you got to the number/prime 360252913
4 Threads: After one minute on 4 thread(s), you got to the number/prime 216150449
There are problems with the code, I just made it as a test. Please don't bash me for writing it horribly... I've kinda had a bad day.
static void Main(string[] args)
{
    Console.Write("Stress test, how many OS threads?: ");
    int thr = int.Parse(Console.ReadLine());
    Thread[] t = new Thread[thr];
    Stopwatch s = new Stopwatch();
    s.Start();
    UInt64 it = 0;
    UInt64 prime = 0;
    for (int i = 0; i < thr; i++)
    {
        t[i] = new Thread(delegate()
        {
            while (s.Elapsed.TotalMinutes < 1)
            {
                it++;
                if (it % 2 != 0) // im 100% sure that x % 2 does not give primes, but it uses up the cpu, so idc
                {
                    prime = it;
                }
            }
            Console.WriteLine("After one minute on " + t.Length + " thread(s), you got to the number/prime " + prime.ToString()); // idc if this prints 100 times, this is just a test
        });
        t[i].Start();
    }
    Console.ReadLine();
}
Question: Can someone explain these unexpected results?
Your threads are incrementing it without any synchronization, so you're going to get weird results like this. Worse, you're also assigning prime without any synchronization.
thread 1: reads 0 from it, then gets unscheduled by the OS for whatever reason
thread 2: reads 0 from it, then increments to 1
thread 2: does work, assigns 1 to prime
... thread 2 repeats for a while. thread 2 is now up to 7, and is about to check if (it % 2 != 0)
thread 1: regains the CPU
thread 1: increments it to 1
thread 2: assigns 1 to prime --- wat?
The possibilities get even worse once a bit in the high half of it starts changing, because 64-bit reads and writes are not atomic either. (The numbers below are a little larger than the ones in the question, as if the test had run for longer, but wildly variable results are possible.) Consider:
After some time it = 0x00000000_FFFFFFFF
thread 1 reads it (both words)
thread 2 reads the higher word 0x00000000_????????
thread 1 calculates it + 1 0x00000001_00000000
thread 1 writes it (both halves)
thread 2 reads the lower word (and puts this with the already read half) 0x00000000_00000000
While thread 1 was incrementing to 4294967296, thread 2 managed to read 0.
You should apply the volatile keyword to the variables it and prime (which means turning them into fields, since volatile cannot be applied to locals).
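As a side note (my sketch, not part of the original answer): if you also want the counter itself to behave consistently, one option is to promote the counters to static long fields and use Interlocked. The Worker method and the switch from UInt64 to long are illustrative assumptions, since the classic Interlocked overloads take long.

static long it = 0;    // shared counters as fields; Interlocked cannot be used on locals
static long prime = 0;

static void Worker(Stopwatch s)
{
    while (s.Elapsed.TotalMinutes < 1)
    {
        // Atomically increment the shared counter and get the new value.
        long current = Interlocked.Increment(ref it);
        if (current % 2 != 0)
        {
            // Publish the latest odd value atomically.
            Interlocked.Exchange(ref prime, current);
        }
    }
}

This serializes every increment, so it still will not scale with the thread count, but at least the numbers it produces are consistent.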
Related
I have a List to loop over using multiple threads. I take the first item of the List, do some processing, then remove the item.
When the count of the List drops to 0, I fetch more data from the database.
In a word: I have a lot of records in my database. I need to publish them to my server. In the process of publishing, multithreading is required, and the number of threads may be 10 or less.
For example:
private List<string> list;

void LoadDataFromDatabase()
{
    list = ...; // load data from database...
}

void DoMethod()
{
    while (list.Count > 0)
    {
        var item = list.FirstOrDefault();
        list.RemoveAt(0);
        DoProcess(); // how to use multi-threading here (with a custom number of threads)?
        if (list.Count <= 0)
        {
            LoadDataFromDatabase();
        }
    }
}
Please help me, I'm a beginner in C#. I have searched for a lot of solutions, but found nothing similar.
Also, I need to be able to customize the number of threads.
Does your processing of the list need to be sequential? In other words, can you not process element n + 1 until you have finished processing element n? If that is your case, then multi-threading is not the right solution.
Otherwise, if your processing of the elements is fully independent, you can use m threads, dividing the work so that each thread handles roughly Elements.Count / m elements.
Example: printing a list:
List<int> a = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
int num_threads = 2;
int thread_elements = a.Count / num_threads;

// start the threads
Thread[] threads = new Thread[num_threads];
for (int i = 0; i < num_threads; ++i)
{
    threads[i] = new Thread(Work); // Work(object) matches ParameterizedThreadStart
    threads[i].Start(i);
}

// this works fine if the total number of elements is divisible by num_threads,
// but if we have 500 elements and 7 threads, then thread_elements = 500 / 7 = 71
// and 71 * 7 = 497, so there are 3 elements left unprocessed
// process them here:
int actual = thread_elements * num_threads;
for (int i = actual; i < a.Count; ++i)
    Console.WriteLine(a[i]);

// wait for all threads to finish
for (int i = 0; i < num_threads; ++i)
{
    threads[i].Join();
}

void Work(object arg)
{
    Console.WriteLine("Thread #" + arg + " has begun...");
    // calculate my working range [start, end)
    int id = (int)arg;
    int mystart = id * thread_elements;
    int myend = (id + 1) * thread_elements;
    // start work on my range !!
    for (int i = mystart; i < myend; ++i)
        Console.WriteLine("Thread #" + arg + " Element " + a[i]);
}
ADDED: For your case (uploading to a server), it is the same as the code above. You assign a number of threads, assigning each thread a number of elements (which is calculated automatically into the variable thread_elements, so you only need to change num_threads). For the Work method, all you need is to replace the line Console.WriteLine("Thread #" + arg + " Element " + a[i]); with your uploading code.
One more thing to keep in mind: multi-threading depends on your machine's CPU. If your CPU has 4 cores, for example, the best performance is usually obtained with at most 4 threads, so that each core gets one thread. Otherwise, 10 threads, say, may be slower than 4 threads because they compete for the CPU cores (unless the threads are idle, waiting for some event to occur, e.g. uploading; in that case 10 threads can run fine, because they don't take 100% of the CPU).
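As a starting point (my addition, not part of the original answer), the thread count can be derived from the machine instead of being hard-coded:

int num_threads = Environment.ProcessorCount; // one thread per core for CPU-bound work
int thread_elements = a.Count / num_threads;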
WARNING: do NOT modify the list while any thread is working on it (add, remove, set elements...), and do not assign two threads the same element. Such things will cause you a lot of bugs and exceptions!
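If you do want several threads pulling from one shared collection instead of pre-partitioning it, here is a rough sketch of mine (assuming .NET 4.0+) using ConcurrentQueue<string>; LoadRecords and DoProcess stand in for the poster's database load and publish logic and are assumed to return/take the individual items:

IEnumerable<string> records = LoadRecords(); // hypothetical: returns the rows to publish
var queue = new ConcurrentQueue<string>(records);

int num_threads = 10; // "the number of threads may be 10 or less"
var workers = new Thread[num_threads];
for (int i = 0; i < num_threads; i++)
{
    workers[i] = new Thread(() =>
    {
        string item;
        // TryDequeue is thread safe, so no explicit lock is needed and no item is handed out twice.
        while (queue.TryDequeue(out item))
        {
            DoProcess(item); // hypothetical: publish one record to the server
        }
    });
    workers[i].Start();
}
foreach (var w in workers)
{
    w.Join();
}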
This is a simple scenario that can be expanded in multiple ways if you add some details to your requirements:
IEnumerable<Data> LoadDataFromDatabase()
{
    return ...
}

void ProcessInParallel()
{
    while (true)
    {
        var data = LoadDataFromDatabase().ToList();
        if (!data.Any()) break;
        data.AsParallel().ForAll(ProcessSingleData); // ParallelQuery exposes ForAll rather than ForEach
    }
}

void ProcessSingleData(Data d)
{
    // do something with the data
}
There are many ways to approach this. You can create threads and partition the list yourself, or you can take advantage of the TPL and use Parallel.ForEach. In the example on that link you can see that an Action is called for each member of the list being iterated over. If this is your first taste of threading, I would also attempt to do it the old-fashioned way.
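For reference, a Parallel.ForEach version of the poster's scenario might look like the sketch below (my code, not the answerer's; LoadRecords and DoProcess are hypothetical stand-ins for loading and publishing a record, and 10 matches the "10 or less" requirement):

List<string> items = LoadRecords();                                // hypothetical: the records to publish
var options = new ParallelOptions { MaxDegreeOfParallelism = 10 }; // cap the concurrency
Parallel.ForEach(items, options, item =>
{
    DoProcess(item); // hypothetical: publish one record to the server
});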
Here's my opinion ;)
You can avoid multithreading if your "List" is not really huge.
Instead of a List, you can use a Queue (FIFO, First In First Out). Then just use the Dequeue() method to get one element of the Queue, do some work on it, and get the next one. Something like:
while (queue.Count > 0)
{
    var temp = DoSomeWork(queue.Dequeue());
}
I think this will be better for your purpose.
I will get the first item of the List and do some processing, then remove the item.
Bad.
First, you want a queue, not a list.
Second, you do not process then remove, you remove THEN process.
Why?
So that you keep the locks small. Lock the list access (note that you need to synchronize access), remove the item, THEN unlock immediately and process it. This way you keep the locks short. If you take, process, then remove, you are basically single-threaded: you have to keep the lock in place while processing so that the next thread does not take the same item again.
And since you need to synchronize access and you want multiple threads, this is about the only way.
Read up on the lock statement for a start (you can later move to something like SpinLock). Do NOT use raw threads unless you have to; instead schedule Tasks (using the Task API new in .NET 4.0), which gives you more flexibility.
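Here is a rough sketch of the remove-then-process pattern described above (my code, not the answerer's; list is the poster's field, while _sync, WorkerLoop, and the item-taking version of DoProcess are illustrative assumptions):

private readonly object _sync = new object();

void WorkerLoop()
{
    while (true)
    {
        string item;
        // Hold the lock only while taking an item, never while processing it.
        lock (_sync)
        {
            if (list.Count == 0) return; // nothing left (re-loading from the database omitted for brevity)
            item = list[0];
            list.RemoveAt(0);
        }
        DoProcess(item); // hypothetical: the actual processing/upload happens outside the lock
    }
}

// Started from DoMethod, for example:
// var threads = Enumerable.Range(0, 10).Select(_ => new Thread(WorkerLoop)).ToArray();
// foreach (var t in threads) t.Start();
// foreach (var t in threads) t.Join();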
I have made a simple console application for printing prime numbers. I am using the ThreadPool for the function which checks whether a number is prime or not.
In Task Manager, this program quickly starts to take too much memory (1 GB in a few seconds).
How do I improve that if I still have to use the ThreadPool?
Here is the code which I wrote:
class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine(2);
        Console.WriteLine(3);
        Console.WriteLine(5);
        Console.WriteLine(7);
        Console.WriteLine(11);
        Console.WriteLine(13);
        Console.WriteLine(17);
        for (long i = 19; i < Int64.MaxValue; i = i + 2)
        {
            if (i % 3 == 0 || i % 5 == 0 || i % 7 == 0 || i % 11 == 0 || i % 13 == 0 || i % 17 == 0)
                continue;
            ThreadPool.QueueUserWorkItem(CheckForPrime, i);
        }
        Console.Read();
    }

    private static void CheckForPrime(object i)
    {
        var i1 = i as long?;
        var val = Math.Sqrt(i1.Value);
        for (long j = 19; j <= val; j = j + 2)
        {
            if (i1 % j == 0) return;
        }
        Console.WriteLine(i1);
    }
}
The simplest way to fix your code is to limit the work queue using a semaphore:
class Program
{
    // Max 100 items in queue
    private static readonly Semaphore WorkLimiter = new Semaphore(100, 100);

    static void Main(string[] args)
    {
        Console.WriteLine(2);
        Console.WriteLine(3);
        Console.WriteLine(5);
        Console.WriteLine(7);
        Console.WriteLine(11);
        Console.WriteLine(13);
        Console.WriteLine(17);
        for (long i = 19; i < Int64.MaxValue; i = i + 2)
        {
            if (i % 3 == 0 || i % 5 == 0 || i % 7 == 0 || i % 11 == 0 || i % 13 == 0 || i % 17 == 0)
                continue;
            // Get one of the 100 "allowances" to add to the queue.
            WorkLimiter.WaitOne();
            ThreadPool.QueueUserWorkItem(CheckForPrime, i);
        }
        Console.Read();
    }

    private static void CheckForPrime(object i)
    {
        var i1 = i as long?;
        try
        {
            var val = Math.Sqrt(i1.Value);
            for (long j = 19; j <= val; j = j + 2)
            {
                if (i1 % j == 0) return;
            }
            Console.WriteLine(i1);
        }
        finally
        {
            // Allow another add to the queue
            WorkLimiter.Release();
        }
    }
}
This will allow you to keep the queue full (100 items in queue) at all times, without over-filling it or adding a Sleep.
To put it rather bluntly, you're doing multi-threading wrong. Threads are a powerful tool when used correctly, but like all tools, they're not the correct solution in every case. A glass bottle works well for holding beer, but not so great for hammering nails.
In the general case, creating more threads is not going to make things run any faster, and that is particularly true here as you've discovered. The code you've written queues up a new thread each iteration through your loop, and each of those threads will allocate a stack. Since the default size for a stack in the .NET world is 1 MB, it doesn't take very long for your memory commitment to skyrocket. It therefore comes as no particular surprise that you're exceeding 1 GB. Eventually, you'll run into a hard memory limit and get an OutOfMemoryException thrown at you. And memory is just the most obvious of the resources that your design is quickly starving your system for. Unless your system resources can grow exponentially with your thread pool, you're not going to experience any performance advantages.
Adil suggests inserting a call to Thread.Sleep to give the new thread you create time to run before continuing through the loop (and creating additional threads). As I mentioned in a comment, although this "works", it seems like an awfully ugly hack to me. But it's hard for me to suggest a better solution because the real problem is the design. You say that you have to use a thread pool, but you don't say why that is the case.
If you absolutely have to use a thread pool, the best workaround is probably to set an arbitrary limit on the size of the thread pool (i.e., how many new threads it can spawn), which is accomplished by calling the SetMaxThreads method. This feels at least a little less hacky to me than Thread.Sleep.
Note: If you decide to pursue the SetMaxThreads approach, you should be aware that you cannot set the maximum to less than the minimum. The default value for the minimum is the number of CPU cores, so if you have a dual-core processor, you can't set the maximum to 1 without first lowering the minimum.
Finally, although it doesn't really change the answer in this case, it is worth noting that Task Manager is not a memory profiler. Relying on it as if it were one will frequently get you bad (or at least very misleading) data.
Edit: After further thought, it occurs to me that the problem really lies not with exponential execution but with unbounded queuing. The maximum number of allowed threads is probably irrelevant, because the code will still queue work items up faster than they can ever hope to be processed. So never mind about limiting the pool size. You probably want to go with Joachim's solution involving a semaphore, or the implicit suggestion that everyone has made of not using a thread pool.
You are queuing work items in a loop without any pause. You should give the process some breathing room so that some of the queued work finishes before you add more items to the ThreadPool. You can use System.Threading.Thread.Sleep for that.
for (long i = 19; i < Int64.MaxValue; i = i + 2)
{
    if (i % 3 == 0 || i % 5 == 0 || i % 7 == 0 || i % 11 == 0 || i % 13 == 0 || i % 17 == 0)
        continue;
    ThreadPool.QueueUserWorkItem(CheckForPrime, i);
    System.Threading.Thread.Sleep(100);
}
You should know where to use threads, where they will be beneficial, how many threads you require, and what the impact of threading on application performance will be. How long to suspend the current thread depends on the application; I just used 100 milliseconds, you can adjust it for your application.
The following Ruby code runs in ~15s. It barely uses any CPU/memory (about 25% of one CPU):
def collatz(num)
  num.even? ? num/2 : 3*num + 1
end

start_time = Time.now
max_chain_count = 0
max_starter_num = 0
(1..1000000).each do |i|
  count = 0
  current = i
  current = collatz(current) and count += 1 until (current == 1)
  max_chain_count = count and max_starter_num = i if (count > max_chain_count)
end
puts "Max starter num: #{max_starter_num} -> chain of #{max_chain_count} elements. Found in: #{Time.now - start_time}s"
And the following TPL C# version puts all my 4 cores at 100% usage and is orders of magnitude slower than the Ruby version:
static void Euler14Test()
{
    Stopwatch sw = new Stopwatch();
    sw.Start();
    int max_chain_count = 0;
    int max_starter_num = 0;
    object locker = new object();
    Parallel.For(1, 1000000, i =>
    {
        int count = 0;
        int current = i;
        while (current != 1)
        {
            current = collatz(current);
            count++;
        }
        if (count > max_chain_count)
        {
            lock (locker)
            {
                max_chain_count = count;
                max_starter_num = i;
            }
        }
        if (i % 1000 == 0)
            Console.WriteLine(i);
    });
    sw.Stop();
    Console.WriteLine("Max starter i: {0} -> chain of {1} elements. Found in: {2}s", max_starter_num, max_chain_count, sw.Elapsed.ToString());
}

static int collatz(int num)
{
    return num % 2 == 0 ? num / 2 : 3 * num + 1;
}
How come Ruby runs faster than C#? I've been told that Ruby is slow. Is that not true when it comes to algorithms?
Perf AFTER correction:
Ruby (Non parallel): 14.62s
C# (Non parallel): 2.22s
C# (With TPL): 0.64s
Actually, the bug is quite subtle, and has nothing to do with threading. The reason that your C# version takes so long is that the intermediate values computed by the collatz method eventually start to overflow the int type, resulting in negative numbers which may then take ages to converge.
This first happens when i is 134,379, for which the 129th term (assuming one-based counting) is 2,482,111,348. This exceeds the maximum value of 2,147,483,647 and therefore gets stored as -1,812,855,948.
To get good performance (and correct results) on the C# version, just change:
int current = i;
…to:
long current = i;
…and:
static int collatz(int num)
…to:
static long collatz(long num)
That will bring your runtime down to a respectable 1.5 seconds.
Edit: CodesInChaos raises a very valid point about enabling overflow checking when debugging math-oriented applications. Doing so would have allowed the bug to be immediately identified, since the runtime would throw an OverflowException.
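For illustration (my addition, not part of the answer), wrapping the arithmetic in a checked block, or enabling the compiler's overflow-checking option for the project, makes the failure explicit:

static int collatz(int num)
{
    checked
    {
        // Throws OverflowException as soon as 3 * num + 1 exceeds int.MaxValue,
        // first hit by the chain starting at 134,379 as described above.
        return num % 2 == 0 ? num / 2 : 3 * num + 1;
    }
}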
Should be:
Parallel.For(1L, 1000000L, i =>
{
Otherwise you get integer overflow and start checking negative values. The collatz method should likewise operate on long values.
I experienced something like that. I figured out it was because each of the loop iterations needs to start another thread, and that takes some time which, in this case, is comparable to (I think greater than) the work the loop body actually does.
There is an alternative: get the number of CPU cores you have, then run a parallel loop with the same number of iterations as you have cores; each iteration evaluates a part of the actual loop you want, using an inner for loop whose bounds depend on the parallel loop variable.
EDIT: EXAMPLE
int start = 1, end = 1000000;
int N_CORES = Environment.ProcessorCount; // added: the original snippet assumes N_CORES is defined
Parallel.For(0, N_CORES, n =>
{
    int s = start + (end - start) * n / N_CORES;
    int e = n == N_CORES - 1 ? end : start + (end - start) * (n + 1) / N_CORES;
    for (int i = s; i < e; i++)
    {
        // Your code
    }
});
You should try this code, I'm pretty sure this will do the job faster.
EDIT: ELUCIDATION
Well, it has been quite a long time since I answered this question, but I faced the problem again and finally understood what was going on.
I had been using AForge's implementation of a parallel for loop, and it seems it fires a thread for each iteration of the loop; that's why, if the loop body takes a relatively small amount of time to execute, you end up with inefficient parallelism.
So, as some of you pointed out, the System.Threading.Tasks.Parallel methods are based on Tasks, which are a higher-level abstraction over threads:
"Behind the scenes, tasks are queued to the ThreadPool, which has been enhanced with algorithms that determine and adjust to the number of threads and that provide load balancing to maximize throughput. This makes tasks relatively lightweight, and you can create many of them to enable fine-grained parallelism."
So yeah, if you use the default library's implementation, you won't need this kind of workaround.
I'm trying to understand the basics of multi-threading, so I built a little program that raised a few questions, and I'll be thankful for any help :)
Here is the little program:
class Program
{
    public static int count;
    public static int max;

    static void Main(string[] args)
    {
        int t = 0;
        DateTime Result;
        Console.WriteLine("Enter Max Number : ");
        max = int.Parse(Console.ReadLine());
        Console.WriteLine("Enter Thread Number : ");
        t = int.Parse(Console.ReadLine());
        count = 0;
        Result = DateTime.Now;
        List<Thread> MyThreads = new List<Thread>();
        for (int i = 1; i < 31; i++)
        {
            Thread Temp = new Thread(print);
            Temp.Name = i.ToString();
            MyThreads.Add(Temp);
        }
        foreach (Thread th in MyThreads)
            th.Start();
        while (count < max)
        {
        }
        Console.WriteLine("Finish , Took : " + (DateTime.Now - Result).ToString() + " With : " + t + " Threads.");
        Console.ReadLine();
    }

    public static void print()
    {
        while (count < max)
        {
            Console.WriteLine(Thread.CurrentThread.Name + " - " + count.ToString());
            count++;
        }
    }
}
I checked this with some test runs:
I made the maximum number 100, and it seems that the fastest execution time is with 2 threads, which is about 80% faster than the time with 10 threads.
Questions:
1) Threads 4-10 don't print even one time, how can that be?
2) Shouldn't more threads be faster?
I made the maximum number 10000 and disabled printing.
With this configuration, 5 threads seem to be the fastest.
Why is there a change compared to the first check?
Also, in this configuration (with printing), all the threads print a few times. Why is that different from the first run, where only a few threads printed?
Is there a way to make all the threads print one by one? In a line, or something like that?
Thank you very much for your help :)
Your code is certainly a first step into the world of threading, and you've just experienced the first (of many) headaches!
To start with, static may enable you to share a variable among the threads, but it does not do so in a thread safe manner. This means your count < max expression and count++ are not guaranteed to be up to date or an effective guard between threads. Look at the output of your program when max is only 10 (t set to 4, on my 8 processor workstation):
T0 - 0
T0 - 1
T0 - 2
T0 - 3
T1 - 0 // wait T1 got count = 0 too!
T2 - 1 // and T2 got count = 1 too!
T2 - 6
T2 - 7
T2 - 8
T2 - 9
T0 - 4
T3 - 1 // and T3 got count = 1 too!
T1 - 5
To your question about each thread printing one-by-one, I assume you're trying to coordinate access to count. You can accomplish this with synchronization primitives (such as the lock statement in C#). Here is a naive modification to your code which will ensure only max increments occur:
static object countLock = new object();

public static void printWithLock()
{
    // loop forever
    while (true)
    {
        // protect access to count using a static object
        // now only 1 thread can use 'count' at a time
        lock (countLock)
        {
            if (count >= max) return;
            Console.WriteLine(Thread.CurrentThread.Name + " - " + count.ToString());
            count++;
        }
    }
}
This simple modification makes your program logically correct, but also slow. The sample now exhibits a new problem: lock contention. Every thread is now vying for access to countLock. We've made our program thread safe, but without any benefits of parallelism!
Threading and parallelism is not particularly easy to get right, but thankfully recent versions of .Net come with the Task Parallel Library (TPL) and Parallel LINQ (PLINQ).
The beauty of the library is how easy it would be to convert your current code:
var sw = new Stopwatch();
sw.Start();
Enumerable.Range(0, max)
    .AsParallel()
    .ForAll(number =>
        Console.WriteLine("T{0}: {1}",
            Thread.CurrentThread.ManagedThreadId,
            number));
Console.WriteLine("{0} ms elapsed", sw.ElapsedMilliseconds);
// Sample output from max = 10
//
// T9: 3
// T9: 4
// T9: 5
// T9: 6
// T9: 7
// T9: 8
// T9: 9
// T8: 1
// T7: 2
// T1: 0
// 30 ms elapsed
The output above is an interesting illustration of why threading produces "unexpected results" for newer users. When threads execute in parallel, they may complete chunks of code at different points in time or one thread may be faster than another. You never really know with threading!
Your print function is far from thread safe; that's why threads 4-10 don't print. All threads share the same max and count variables.
The reason more threads slow you down is likely the context switch that takes place each time the processor changes focus between threads.
Also, when you're creating a lot of threads, the system needs to allocate new ones. Most of the time it is now advisable to use Tasks instead, as they are drawn from a system-managed thread pool and thus don't necessarily have to be allocated. Creating a distinct new thread is rather expensive.
Take a look here anyhow: http://msdn.microsoft.com/en-us/library/aa645740(VS.71).aspx
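To illustrate the Tasks suggestion (my sketch, not the answerer's code; print is the poster's method), the same worker can be scheduled on the thread pool instead of a dedicated thread:

// A dedicated thread gets its own stack and must be created and torn down each time.
var worker = new Thread(print);
worker.Start();
worker.Join();

// A Task runs on a pooled thread instead (Task.Run needs .NET 4.5+; on 4.0 use Task.Factory.StartNew).
Task task = Task.Run(() => print());
task.Wait();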
Look carefully:
t = int.Parse(Console.ReadLine());
count = 0;
Result = DateTime.Now;
List<Thread> MyThreads = new List<Thread>();
for (int i = 1; i < 31; i++)
{
    Thread Temp = new Thread(print);
    Temp.Name = i.ToString();
    MyThreads.Add(Temp);
}
I think you forgot to use the variable t: the loop condition is i < 31, so you always create 30 threads regardless of what was entered.
You should read many books on parallel and multithreaded programming before writing code, because a programming language is just a tool. Good luck!
Say I have an IO-bound task. I'm using WithDegreeOfParallelism(10) and WithExecutionMode(ParallelExecutionMode.ForceParallelism), but the query still only uses two threads. Why?
I understand PLINQ will usually choose a degree of parallelism equal to my core count, but why does it ignore my specific request for higher parallelism?
static void Main(string[] args)
{
    TestParallel(0.UpTo(8));
}

private static void TestParallel(IEnumerable<int> input)
{
    var timer = new Stopwatch();
    timer.Start();
    var size = input.Count();
    if (input.AsParallel().
        WithDegreeOfParallelism(10).
        WithExecutionMode(ParallelExecutionMode.ForceParallelism).
        Where(IsOdd).Count() != size / 2)
        throw new Exception("Failed to count the odds");
    timer.Stop();
    Console.WriteLine("Tested " + size + " numbers in " + timer.Elapsed.TotalSeconds + " seconds");
}

private static bool IsOdd(int n)
{
    Thread.Sleep(1000);
    return n % 2 == 1;
}
PLINQ tries to find the optimal number of threads to do what you want as quickly as possible. If you only have 2 cores on your CPU, that number is most likely 2. If you had a quad core, you would be more likely to see 4 threads appear, but creating 4 threads on a dual-core machine wouldn't really improve performance, because only 2 threads can be active at the same time.
Also, with IO-bound operations, it is likely that any extra threads would simply block on the first IO operation performed.
10 is the maximum:
"Sets the degree of parallelism to use in a query. Degree of parallelism is the maximum number of concurrently executing tasks that will be used to process the query."
From here: MSDN
It appears PLINQ tunes the number of threads. When I wrapped the above code in a while(true) loop, the first two iterations took two seconds each, but the third and subsequent iterations took only one second. PLINQ noticed the cores were idle and upped the number of threads. Impressive!
I would agree with Rory, except for IO. I haven't tested with disk IO, but network IO can definitely be more effective with more threads than there are CPU cores.
A simple test to prove it (it would be more correct to run the test several times for each thread count, since network speed isn't constant, but still):
[Test]
public void TestDownloadThreadsImpactToSpeed()
{
    var sampleImages = Enumerable.Range(0, 100)
        .Select(x => "url to some quite large file from a good server which does not have anti-DDoS stuff.")
        .ToArray();
    for (int i = 0; i < 8; i++)
    {
        var start = DateTime.Now;
        var threadCount = (int)Math.Pow(2, i);
        Parallel.For(0, sampleImages.Length - 1, new ParallelOptions { MaxDegreeOfParallelism = threadCount },
            index =>
            {
                using (var webClient = new WebClient())
                {
                    webClient.DownloadFile(sampleImages[index],
                        string.Format(@"c:\test\{0}", index));
                }
            });
        Console.WriteLine("Number of threads: {0}, Seconds: {1}", threadCount, (DateTime.Now - start).TotalSeconds);
    }
}
Results with a 500x500 px image from a CDN, using an 8-core machine with an SSD:
Number of threads: 1, Seconds: 25.3904522
Number of threads: 2, Seconds: 10.8986233
Number of threads: 4, Seconds: 9.9325681
Number of threads: 8, Seconds: 3.7352137
Number of threads: 16, Seconds: 3.3071892
Number of threads: 32, Seconds: 3.1421797
Number of threads: 64, Seconds: 3.1161782
Number of threads: 128, Seconds: 3.7272132
The last result takes that long, I think, mostly because we only have 100 images to download :)
The time differences between 8 and 64 threads aren't that big, but that is on an 8-core machine. If it were a 2-core machine (a cheap end-user notebook), I think forcing 8 threads would have more impact than forcing 64 threads does on an 8-core machine.