I imagined the result would be a negative value, since there is no locking and multiple threads share the same object. I have tested this many times, in both release and debug builds, and every time the result is correct. Why is it still correct?
Code:
class Program
{
    static BankAccount ThisbankAccount = new BankAccount(10000);

    public static void WithdrawMoney()
    {
        for (int i = 0; i < 1000; i++)
            ThisbankAccount.WithdrawMoney(25);
    }

    static void Main(string[] args)
    {
        Thread client1 = new Thread(WithdrawMoney);
        Thread client2 = new Thread(WithdrawMoney);
        Thread client3 = new Thread(WithdrawMoney);
        client1.Start();
        client2.Start();
        client3.Start();
        client3.Join();
        Console.WriteLine(ThisbankAccount.Balance);
        Console.ReadLine();
    }
}
public class BankAccount
{
    object Acctlocker = new object();

    public BankAccount(int initialAmount)
    {
        m_balance = initialAmount;
    }

    public void WithdrawMoney(int amnt)
    {
        // lock (Acctlocker)
        // {
        if (m_balance - amnt >= 0)
        {
            m_balance -= amnt;
        }
        // }
    }

    public int Balance
    {
        get { return m_balance; }
    }

    private int m_balance;
}
Just because something works now doesn't mean it is guaranteed to keep working. Race conditions are hard to trigger and might take years to surface. And when they surface, they can be very hard to track down and diagnose.
To see your problem in action, change this code:
if (m_balance - amnt >= 0)
{
    m_balance -= amnt;
}
to:
if (m_balance - amnt >= 0)
{
    Thread.Sleep(10);
    m_balance -= amnt;
}
That introduces a slow enough code path to make the problem show up easily.
The reason you aren't spotting it with your current code is that the operations you are doing (subtraction and comparison) are very fast, so the window for the race condition is very small, and you are simply lucky enough for it not to occur. Given enough time, though, it eventually will.
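If you want the withdrawal to be thread-safe without taking a lock, one alternative (a sketch, not the original code; the `SafeBankAccount` name is introduced here for illustration) is a compare-and-swap retry loop using `Interlocked.CompareExchange`:

```csharp
using System;
using System.Threading;

public class SafeBankAccount
{
    private int m_balance;

    public SafeBankAccount(int initialAmount) { m_balance = initialAmount; }

    public int Balance => Volatile.Read(ref m_balance);

    // Lock-free withdrawal: read, compute, and commit only if no other
    // thread changed the balance in the meantime; otherwise retry.
    public bool WithdrawMoney(int amnt)
    {
        while (true)
        {
            int current = Volatile.Read(ref m_balance);
            if (current - amnt < 0)
                return false; // insufficient funds
            if (Interlocked.CompareExchange(ref m_balance, current - amnt, current) == current)
                return true;
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        var account = new SafeBankAccount(10000);
        var threads = new Thread[3];
        for (int t = 0; t < threads.Length; t++)
        {
            threads[t] = new Thread(() =>
            {
                for (int i = 0; i < 1000; i++)
                    account.WithdrawMoney(25);
            });
            threads[t].Start();
        }
        foreach (var th in threads) th.Join();
        // 3 * 1000 * 25 = 75000 is requested, but only 10000 / 25 = 400
        // withdrawals can succeed, so the balance ends at exactly 0.
        Console.WriteLine(account.Balance);
    }
}
```

Unlike the unlocked version, the balance can never go negative here, because the decrement only commits when the balance the thread saw is still current.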
Related
I was trying out some concepts related to lock and Mutex in C# threading. However, I found that using Mutex gave me correct results, while the results using lock were inconsistent.
With lock construct:
class BankAccount
{
    private int balance;
    public object padlock = new object();

    public int Balance { get => balance; private set => balance = value; }

    public void Deposit(int amount)
    {
        lock (padlock)
        {
            balance += amount;
        }
    }

    public void Withdraw(int amount)
    {
        lock (padlock)
        {
            balance -= amount;
        }
    }

    public void Transfer(BankAccount where, int amount)
    {
        lock (padlock)
        {
            balance = balance - amount;
            where.Balance = where.Balance + amount;
        }
    }
}
static void Main(string[] args)
{
    var ba1 = new BankAccount();
    var ba2 = new BankAccount();

    var task = Task.Factory.StartNew(() =>
    {
        for (int j = 0; j < 1000; ++j)
            ba1.Deposit(100);
    });
    var task1 = Task.Factory.StartNew(() =>
    {
        for (int j = 0; j < 1000; ++j)
            ba2.Deposit(100);
    });
    var task2 = Task.Factory.StartNew(() =>
    {
        for (int j = 0; j < 1000; ++j)
            ba1.Transfer(ba2, 100);
    });

    Task.WaitAll(task, task1, task2);

    Console.WriteLine($"Final balance is {ba1.Balance}.");
    Console.WriteLine($"Final balance is {ba2.Balance}.");
    Console.ReadLine();
}
The code gives an incorrect balance for ba2, while ba1 is correctly set to 0. This happens even though each operation is surrounded by a lock statement. It is not working correctly.
With Mutex construct:
class BankAccount
{
    private int balance;

    public int Balance { get => balance; private set => balance = value; }

    public void Deposit(int amount)
    {
        balance += amount;
    }

    public void Withdraw(int amount)
    {
        balance -= amount;
    }

    public void Transfer(BankAccount where, int amount)
    {
        balance = balance - amount;
        where.Balance = where.Balance + amount;
    }
}
static void Main(string[] args)
{
    var ba1 = new BankAccount();
    var ba2 = new BankAccount();
    var mutex1 = new Mutex();
    var mutex2 = new Mutex();

    var task = Task.Factory.StartNew(() =>
    {
        for (int j = 0; j < 1000; ++j)
        {
            var lockTaken = mutex1.WaitOne();
            try
            {
                ba1.Deposit(100);
            }
            finally
            {
                if (lockTaken)
                {
                    mutex1.ReleaseMutex();
                }
            }
        }
    });
    var task1 = Task.Factory.StartNew(() =>
    {
        for (int j = 0; j < 1000; ++j)
        {
            var lockTaken = mutex2.WaitOne();
            try
            {
                ba2.Deposit(100);
            }
            finally
            {
                if (lockTaken)
                {
                    mutex2.ReleaseMutex();
                }
            }
        }
    });
    var task2 = Task.Factory.StartNew(() =>
    {
        for (int j = 0; j < 1000; ++j)
        {
            bool haveLock = Mutex.WaitAll(new[] { mutex1, mutex2 });
            try
            {
                ba1.Transfer(ba2, 100);
            }
            finally
            {
                if (haveLock)
                {
                    mutex1.ReleaseMutex();
                    mutex2.ReleaseMutex();
                }
            }
        }
    });

    Task.WaitAll(task, task1, task2);

    Console.WriteLine($"Final balance is {ba1.Balance}.");
    Console.WriteLine($"Final balance is {ba2.Balance}.");
    Console.ReadLine();
}
With this approach I was getting correct balances every time I ran it.
I am not able to figure out why the first approach is not working correctly. Am I missing something with respect to lock statements?
The main problem is with this line:
public int Balance { get => balance; private set => balance = value; }
You are allowing external code to meddle with the balance field without the protection of the padlock. You also allow out-of-order reads of the balance field, because there is no memory barrier, or, even worse, torn reads if you later replace the int type with the more appropriate decimal.
The second problem can be solved by protecting the read with the padlock.
public int Balance { get { lock (padlock) return balance; } }
As for the Transfer method, it can now be implemented without access to the other BankAccount's balance, like this:
public void Transfer(BankAccount where, int amount)
{
    Withdraw(amount);
    where.Deposit(amount);
}
This Transfer implementation is not atomic though, since an exception in the where.Deposit method could lead to the amount vanishing into thin air. Also, other threads are not prevented from reading inconsistent values for the two BankAccounts' balances. This is why people generally use databases equipped with the ACID properties for this kind of work.
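If Transfer must update both accounts under locks, one common sketch is to acquire both padlocks in a globally consistent order, so that two opposite transfers cannot deadlock. The per-account `id` ordering key below is an assumption introduced for this example, not part of the original code:

```csharp
using System;
using System.Threading;

public class BankAccount
{
    private static int nextId;
    private readonly int id = Interlocked.Increment(ref nextId); // hypothetical ordering key
    private readonly object padlock = new object();
    private int balance;

    public int Balance { get { lock (padlock) return balance; } }

    public void Deposit(int amount) { lock (padlock) balance += amount; }

    // Lock both accounts, always in ascending id order, so two threads
    // transferring in opposite directions can never each hold one lock
    // while waiting for the other.
    public void Transfer(BankAccount where, int amount)
    {
        object first = id < where.id ? padlock : where.padlock;
        object second = id < where.id ? where.padlock : padlock;
        lock (first)
        lock (second)
        {
            balance -= amount;
            where.balance += amount;
        }
    }
}
```

With this ordering, `a.Transfer(b, x)` and `b.Transfer(a, y)` running concurrently contend on the same first lock instead of deadlocking, and the total money across both accounts is conserved.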
Both versions of the code give the same results on my machine (VS2017, .NET Framework 4.7.2) and work fine, so perhaps it's a difference with your system.
Final balance is 0.
Final balance is 200000.
Mutexes are historically and originally intended for inter-process synchronization.
So within the process where the mutex is created, it never blocks the thread that already owns it unless it has been released, as in the code provided in the question.
Using an operating-system mutex object to synchronize threads within a single process is bad practice and an anti-pattern.
Use Semaphore or Monitor within a process if you are having problems with lock and volatile.
Mutex : "A synchronization primitive that can also be used for interprocess synchronization."
Semaphore : "Limits the number of threads that can access a resource or pool of resources concurrently."
Monitor : "Provides a mechanism that synchronizes access to objects."
lock : "The lock statement acquires the mutual-exclusion lock for a given object, executes a statement block, and then releases the lock. While a lock is held, the thread that holds the lock can again acquire and release the lock. Any other thread is blocked from acquiring the lock and waits until the lock is released."
volatile : "The volatile keyword indicates that a field might be modified by multiple threads that are executing at the same time. The compiler, the runtime system, and even hardware may rearrange reads and writes to memory locations for performance reasons. Fields that are declared volatile are not subject to these optimizations. Adding the volatile modifier ensures that all threads will observe volatile writes performed by any other thread in the order in which they were performed. There is no guarantee of a single total ordering of volatile writes as seen from all threads of execution."
Hence you may try to add volatile:
private volatile int balance;
You can also make the locker object static so that it is shared between instances, if needed:
static private object padlock = new object();
Suppose I have some UI, and some Asynchronous Task handler. I can use callback methods/events that link back to the UI and tell it to update its progress display. This is all fine.
But what if I want to change progress on the UI only sometimes? As in, every 10%, I should get a single console output/UI change that would tell me the total/completed, percentage, and the elapsed time. This would require a simple class:
class Progress
{
    int completed;
    int total;

    public Progress(int t)
    {
        total = t;
        completed = 0;
    }

    bool ShouldReportProgress()
    {
        if ((int)((completed / (double)total) * 100) % 10 == 0)
            return true;
        return false;
    }
}
I need help with two things: the ShouldReportProgress() method is naive and works incorrectly for various reasons (it outputs more than once per step and can skip steps entirely), so I'm hoping there's a better way to do it.
I'm also assuming that a class like this MUST already exist somewhere. I could write it myself if it doesn't, but it just seems like a logical conclusion of this issue.
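For what it's worth, .NET does ship IProgress&lt;T&gt;/Progress&lt;T&gt; for the reporting plumbing, but throttling to every 10% is still up to the consumer. One robust way to write the check (a sketch, assuming completed only ever increases; the `ProgressThrottle` name is made up for this example) is to compare the current 10% bucket against the last bucket reported, which fires at most once per bucket:

```csharp
using System;

// Throttles progress reporting to one report per 10% bucket.
// Assumes 'completed' is monotonically non-decreasing.
class ProgressThrottle
{
    private readonly int total;
    private int lastBucket = -1; // nothing reported yet

    public ProgressThrottle(int total) { this.total = total; }

    public bool ShouldReport(int completed)
    {
        int percent = (int)((long)completed * 100 / total);
        int bucket = percent / 10; // 0..10
        if (bucket > lastBucket)
        {
            lastBucket = bucket;
            return true;
        }
        return false; // this bucket was already reported
    }
}
```

For total = 50, this returns true at completed = 1 (bucket 0), then next at completed = 5 (10%), 10 (20%), and so on, and never returns true twice for the same bucket, which avoids the "outputs more than once per step" problem.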
Here's one idea of how you could create your class. I've added a field the user can set, ReportIncrement, which specifies how often progress should be reported. There is also an Increment enum, which specifies whether ReportIncrement is a fixed number (i.e. report every 15th completion) or a percentage (i.e. report every time we complete 10% of the total).
Every time the Completed field is changed, a call is made to ReportProgress, which then checks whether progress should actually be reported (it also takes an argument to force reporting). The check is done in the ShouldReport method, which determines whether the current progress is greater than or equal to the last reported progress plus the increment amount:
class Progress
{
    public enum Increment
    {
        Percent,
        FixedAmount
    }

    public Increment IncrementType { get; set; }
    public int ReportIncrement { get; set; }
    public int Total { get; }

    private double completed;
    public double Completed
    {
        get { return completed; }
        set
        {
            completed = value;
            ReportProgress();
        }
    }

    public void ReportProgress(bool onlyIfShould = true)
    {
        if (!onlyIfShould || ShouldReport())
        {
            Console.WriteLine(
                $"{Completed / Total * 100:0}% complete ({Completed} / {Total})");
            lastReportedAmount = Completed;
        }
    }

    public Progress(int total)
    {
        Total = total;
    }

    private double lastReportedAmount;

    private bool ShouldReport()
    {
        if (Completed >= Total) return true;
        switch (IncrementType)
        {
            case Increment.FixedAmount:
                return lastReportedAmount + ReportIncrement <= Completed;
            case Increment.Percent:
                return lastReportedAmount / Total * 100 +
                       ReportIncrement <= Completed / Total * 100;
            default:
                return true;
        }
    }
}
Then, for example, this class can then be used to report every 10%:
private static void Main()
{
    var progress = new Progress(50)
    {
        ReportIncrement = 10,
        IncrementType = Progress.Increment.Percent
    };

    Console.WriteLine("Starting");
    for (int i = 0; i < progress.Total; i++)
    {
        Console.Write('.');
        Thread.Sleep(100);
        progress.Completed++;
    }
    Console.WriteLine("\nDone!\nPress any key to exit...");
    Console.ReadKey();
}
Or every 10th completion:
// Same code as above, only this time we specify 'FixedAmount'
var progress = new Progress(50)
{
    ReportIncrement = 10,
    IncrementType = Progress.Increment.FixedAmount
};
I've searched for the differences between the * operator and the Math.BigMul method and found nothing, so I decided to test their efficiency against each other. Consider the following code:
public class Program
{
    static void Main()
    {
        Stopwatch MulOperatorWatch = new Stopwatch();
        Stopwatch MulMethodWatch = new Stopwatch();

        MulOperatorWatch.Start();
        // Creates a new MulOperatorClass to perform the start method 100 times.
        for (int i = 0; i < 100; i++)
        {
            MulOperatorClass mOperator = new MulOperatorClass();
            mOperator.start();
        }
        MulOperatorWatch.Stop();

        MulMethodWatch.Start();
        for (int i = 0; i < 100; i++)
        {
            MulMethodClass mMethod = new MulMethodClass();
            mMethod.start();
        }
        MulMethodWatch.Stop();

        Console.WriteLine("Operator = " + MulOperatorWatch.ElapsedMilliseconds.ToString());
        Console.WriteLine("Method = " + MulMethodWatch.ElapsedMilliseconds.ToString());
        Console.ReadLine();
    }

    public class MulOperatorClass
    {
        public void start()
        {
            List<long> MulOperatorList = new List<long>();
            for (int i = 0; i < 15000000; i++)
            {
                MulOperatorList.Add(i * i);
            }
        }
    }

    public class MulMethodClass
    {
        public void start()
        {
            List<long> MulMethodList = new List<long>();
            for (int i = 0; i < 15000000; i++)
            {
                MulMethodList.Add(Math.BigMul(i, i));
            }
        }
    }
}
To sum it up: I've created two classes, MulMethodClass and MulOperatorClass, each with a start method that fills a variable of type List<long> with the values of i multiplied by i, many times. The only difference between these methods is the use of the * operator in the operator class and the use of Math.BigMul in the method class.
I'm creating 100 instances of each of these classes, just to prevent an overflow of the lists (I can't create a 1,000,000,000-item list).
I then measure the time it takes for each set of 100 classes to execute. The results are pretty peculiar: I did this process about 15 times, and the average results were (in milliseconds):
Operator = 20357
Method = 24579
That's about a 4.5-second difference, which I think is a lot. I've looked at the source code of the BigMul method; it uses the * operator and practically does the same exact thing.
So, my questions:
Why does such a method even exist? It does exactly the same thing.
If it does exactly the same thing, why is there such a huge efficiency difference between the two?
I'm just curious :)
Microbenchmarking is an art. You are right that the method is around 10% slower on x86; it's the same speed on x64. Note that you have to multiply two longs, so ((long)i) * ((long)i), because it is BigMul!
Now, some easy rules if you want to microbenchmark:
A) Don't allocate memory in the benchmarked code. You don't want the GC to run (you are enlarging the List<>).
B) Preallocate the memory outside the timed zone (create the List<> with the right capacity before running the code).
C) Run the methods at least once or twice before benchmarking them.
D) Try not to do anything but what you are benchmarking, while still forcing the compiler to run your code. For example, checking a condition on the result of the operation that is always false, and throwing an exception if it turns out true, is normally good enough to fool the compiler.
static void Main()
{
    // Check x86 or x64
    Console.WriteLine(IntPtr.Size == 4 ? "x86" : "x64");
    // Check Debug/Release
    Console.WriteLine(IsDebug() ? "Debug, USELESS BENCHMARK" : "Release");
    // Check if debugger is attached
    Console.WriteLine(System.Diagnostics.Debugger.IsAttached ? "Debugger attached, USELESS BENCHMARK!" : "Debugger not attached");

    // High priority
    Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;

    Stopwatch MulOperatorWatch = new Stopwatch();
    Stopwatch MulMethodWatch = new Stopwatch();

    // Prerunning of the benchmarked methods
    MulMethodClass.start();
    MulOperatorClass.start();

    {
        // No useless method allocation here
        MulMethodWatch.Start();
        for (int i = 0; i < 100; i++)
        {
            MulMethodClass.start();
        }
        MulMethodWatch.Stop();
    }

    {
        // No useless method allocation here
        MulOperatorWatch.Start();
        for (int i = 0; i < 100; i++)
        {
            MulOperatorClass.start();
        }
        MulOperatorWatch.Stop();
    }

    Console.WriteLine("Operator = " + MulOperatorWatch.ElapsedMilliseconds.ToString());
    Console.WriteLine("Method = " + MulMethodWatch.ElapsedMilliseconds.ToString());
    Console.ReadLine();
}

public class MulOperatorClass
{
    // The method is static. No useless memory allocation
    public static void start()
    {
        for (int i = 2; i < 15000000; i++)
        {
            // This condition will always be false, but the compiler
            // won't be able to remove the code
            if (((long)i) * ((long)i) == ((long)i))
            {
                throw new Exception();
            }
        }
    }
}

public class MulMethodClass
{
    public static void start()
    {
        // The method is static. No useless memory allocation
        for (int i = 2; i < 15000000; i++)
        {
            // This condition will always be false, but the compiler
            // won't be able to remove the code
            if (Math.BigMul(i, i) == i)
            {
                throw new Exception();
            }
        }
    }
}

private static bool IsDebug()
{
    // Taken from http://stackoverflow.com/questions/2104099/c-sharp-if-then-directives-for-debug-vs-release
    object[] customAttributes = Assembly.GetExecutingAssembly().GetCustomAttributes(typeof(DebuggableAttribute), false);
    if ((customAttributes != null) && (customAttributes.Length == 1))
    {
        DebuggableAttribute attribute = customAttributes[0] as DebuggableAttribute;
        return (attribute.IsJITOptimizerDisabled && attribute.IsJITTrackingEnabled);
    }
    return false;
}
E) If you are really sure your code is OK, try changing the order of the tests.
F) Run your program at higher priority.
But be happy :-)
At least one other person had the same question and wrote a blog article: http://reflectivecode.com/2008/10/mathbigmul-exposed/
He made the same errors you did.
I have a task to show the difference between synchronized and unsynchronized multithreading, so I wrote an application simulating withdrawing money from clients' bank accounts. Each of a number of threads chooses a random user and withdraws money from the account.
Every thread should withdraw from every account once. The first time, the threads are synchronized; the second time they are not. So there should be a difference between the accounts withdrawn from by synchronized threads and by unsynchronized ones, and the difference should vary for different numbers of users and threads. But in my application I only see a difference with 1000 threads. I need the unsynchronized threads' results to differ strongly from the synchronized threads' ones.
The class User:
public class User : IComparable
{
    public string Name { get; set; }
    public int Start { get; set; }
    public int FinishSync { get; set; }
    public int FinishUnsync { get; set; }
    public int Hypothetic { get; set; }
    public int Differrence { get; set; }
    ...
}
The method which withdraws money:
public void Withdraw(ref List<User> users, int sum, bool isSync)
{
    int ind = 0;
    Thread.Sleep(_due);
    var rnd = new Random(DateTime.Now.Millisecond);

    // _used is the list of users already withdrawn from by this thread
    while (_used.Count < users.Count)
    {
        while (_used.Contains(ind = rnd.Next(0, users.Count))) ; // choosing a random user

        if (isSync) // isSync = whether the threads are synchronized
        {
            if (Monitor.TryEnter(users[ind]))
            {
                try
                {
                    users[ind].FinishSync = users[ind].FinishSync - sum;
                }
                finally
                {
                    Monitor.Exit(users[ind]);
                }
            }
        }
        else
        {
            lock (users[ind])
            {
                users[ind].FinishUnsync = users[ind].FinishUnsync - sum;
            }
        }
        _used.Add(ind);
    }
    done = true;
}
And the threads are created this way:
private void Withdrawing(bool IsSync)
{
    if (IsSync)
    {
        for (int i = 0; i < _num; i++)
        {
            _withdrawers.Add(new Withdrawer(Users.Count, _due, _pause));
            _threads.Add(new Thread(delegate()
                { _withdrawers[i].Withdraw(ref Users, _sum, true); }));
            _threads[i].Name = i.ToString();
            _threads[i].Start();
            _threads[i].Join();
        }
    }
    else
    {
        for (int i = 0; i < _num; ++i)
        {
            _withdrawers.Add(new Withdrawer(Users.Count, _due, _pause));
            _threads.Add(new Thread(delegate()
                { _withdrawers[i].Withdraw(ref Users, _sum, false); }));
            _threads[i].Name = i.ToString();
            _threads[i].Start();
        }
    }
}
I've changed the Withdrawer class this way, because the problem could have been in creating the threads separately from the delegate:
class Withdrawer
{
    private List<int>[] _used;
    private int _due;
    private int _pause;
    public int done;
    private List<Thread> _threads;

    public Withdrawer(List<User> users, int n, int due, int pause, int sum)
    {
        _due = due;
        _pause = pause;
        done = 0;
        _threads = new List<Thread>(users.Count);
        InitializeUsed(users, n);
        CreateThreads(users, n, sum, false);
        _threads.Clear();
        while (done < n) ;
        Array.Clear(_used, 0, n - 1);
        InitializeUsed(users, n);
        CreateThreads(users, n, sum, true);
    }

    private void InitializeUsed(List<User> users, int n)
    {
        _used = new List<int>[n];
        for (int i = 0; i < n; i++)
        {
            _used[i] = new List<int>(users.Count);
            for (int j = 0; j < users.Count; j++)
            {
                _used[i].Add(j);
            }
        }
    }

    private void CreateThreads(List<User> users, int n, int sum, bool isSync)
    {
        for (int i = 0; i < n; i++)
        {
            _threads.Add(new Thread(delegate() { Withdraw(users, sum, isSync); }));
            _threads[i].Name = i.ToString();
            _threads[i].Start();
        }
    }

    public void Withdraw(List<User> users, int sum, bool isSync)
    {
        int ind = 0;
        var rnd = new Random();
        var myUsed = _used[int.Parse(Thread.CurrentThread.Name)];
        while (myUsed.Count > 0)
        {
            int x = rnd.Next(myUsed.Count);
            ind = myUsed[x];
            if (isSync)
            {
                lock (users[ind])
                {
                    Thread.Sleep(_due);
                    users[ind].FinishSync -= sum;
                }
            }
            else
            {
                Thread.Sleep(_due);
                users[ind].FinishUnsync -= sum;
            }
            myUsed[x] = myUsed[myUsed.Count - 1];
            myUsed.RemoveAt(myUsed.Count - 1);
            Thread.Sleep(_pause);
        }
        done++;
    }
}
Now the problem is that the FinishUnsync values are correct, while the FinishSync values are absolutely not.
Thread.Sleep(_due);
and
Thread.Sleep(_pause);
are used to "hold" the resource, because my task says the thread should acquire the resource, hold it for _due ms, and after processing wait _pause ms before finishing.
Your code isn't doing anything useful, and doesn't show the difference between synchronized and unsynchronized access. There are many things you'll need to address.
Comments in your code say that _used is a list of users that have been accessed by the thread. You're apparently creating that on a per-thread basis. If that's true, I don't see how. From the looks of things I'd say that _used is accessible to all threads. I don't see anywhere that you're creating a per-thread version of that list. And the naming convention indicates that it's at class scope.
If that list is not per-thread, that would go a long way towards explaining why your data is always the same. You also have a real race condition here because you're updating the list from multiple threads.
Assuming that _used really is a per-thread data structure . . .
You have this code:
if (isSync) // isSync = whether the threads are synchronized
{
    if (Monitor.TryEnter(users[ind]))
    {
        try
        {
            users[ind].FinishSync = users[ind].FinishSync - sum;
        }
        finally
        {
            Monitor.Exit(users[ind]);
        }
    }
}
else
{
    lock (users[ind])
    {
        users[ind].FinishUnsync = users[ind].FinishUnsync - sum;
    }
}
Both of these provide synchronization. In the isSync case, a second thread will fail to do its update if a thread already has the user locked. In the second case, the second thread will wait for the first to finish, and then will do the update. In either case, the use of Monitor or lock prevents concurrent update.
Still, you would potentially see a difference if multiple threads could be executing the isSync code at the same time. But you won't see a difference because in your synchronized case you never let more than one thread execute. That is, you have:
if (IsSync)
{
    for (int i = 0; i < _num; i++)
    {
        _withdrawers.Add(new Withdrawer(Users.Count, _due, _pause));
        _threads.Add(new Thread(delegate()
            { _withdrawers[i].Withdraw(ref Users, _sum, true); }));
        _threads[i].Name = i.ToString();
        _threads[i].Start();
        _threads[i].Join();
    }
}
else
{
    for (int i = 0; i < _num; ++i)
    {
        _withdrawers.Add(new Withdrawer(Users.Count, _due, _pause));
        _threads.Add(new Thread(delegate()
            { _withdrawers[i].Withdraw(ref Users, _sum, false); }));
        _threads[i].Name = i.ToString();
        _threads[i].Start();
    }
}
So in the IsSync case, you start a thread and then wait for it to complete before you start another thread. Your code is not multithreaded. And in the "unsynchronized" case you're using a lock to prevent concurrent updates. So in one case you prevent concurrent updates by only running one thread at a time, and in the other case you prevent concurrent updates by using a lock. There will be no difference.
Something else worth noting is that your method of randomly selecting a user is highly inefficient, and could be part of the problem you're seeing. Basically, you pick a random number and check whether it's in a list. If it is, you try again, and so on, and the list keeps growing. Quick experimentation shows that I have to generate about 7,000 random numbers between 0 and 1,000 before I get all of them. So your threads are spending a huge amount of time trying to find the next unused account, which makes them less likely to be processing the same user account at the same time.
You need to do three things. First, change your Withdraw method so it does this:
if (isSync) // isSync = whether the threads are synchronized
{
    // synchronized: prevent concurrent updates.
    lock (users[ind])
    {
        users[ind].FinishSync = users[ind].FinishSync - sum;
    }
}
else
{
    // unsynchronized: it's a free-for-all.
    users[ind].FinishUnsync = users[ind].FinishUnsync - sum;
}
Your Withdrawing method should be the same regardless of whether IsSync is true or not. That is, it should be:
for (int i = 0; i < _num; ++i)
{
    _withdrawers.Add(new Withdrawer(Users.Count, _due, _pause));
    _threads.Add(new Thread(delegate()
        { _withdrawers[i].Withdraw(ref Users, _sum, IsSync); }));
    _threads[i].Name = i.ToString();
    _threads[i].Start();
}
Now you always have multiple threads running. The only difference is whether access to the user account is synchronized.
Finally, make your _used list a list of indexes into the users list. Something like:
_used = new List<int>(users.Count);
for (int i = 0; i < users.Count; ++i)
{
    _used.Add(i);
}
Now, when you select a user, you do this:
var x = rnd.Next(_used.Count);
ind = _used[x];
// now remove the item from _used
_used[x] = _used[_used.Count - 1];
_used.RemoveAt(_used.Count - 1);
That way you can generate all users more efficiently. It will take n random numbers to generate n users.
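The swap-and-remove selection above is effectively an incremental Fisher-Yates shuffle. A self-contained sketch of the idea (the `RandomIndexPicker` name is made up for this example):

```csharp
using System;
using System.Collections.Generic;

// Incremental Fisher-Yates: each call to Next() returns a distinct random
// index in O(1), using exactly one random number per selection.
class RandomIndexPicker
{
    private readonly List<int> pool;
    private readonly Random rnd = new Random();

    public RandomIndexPicker(int count)
    {
        pool = new List<int>(count);
        for (int i = 0; i < count; i++) pool.Add(i);
    }

    public int Count => pool.Count;

    // Caller must check Count > 0 before calling.
    public int Next()
    {
        int x = rnd.Next(pool.Count);
        int ind = pool[x];
        pool[x] = pool[pool.Count - 1]; // swap-remove: O(1), no shifting
        pool.RemoveAt(pool.Count - 1);
        return ind;
    }
}
```

Draining the picker visits every index exactly once, so n selections need exactly n random numbers, compared with the roughly 7,000 draws per 1,000 items measured above for the retry approach.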
A couple of nitpicks:
I have no idea why you have the Thread.Sleep call in the Withdraw method. What benefit do you think it provides?
There's no real reason to pass DateTime.Now.Millisecond to the Random constructor. Just calling new Random() uses Environment.TickCount for the seed, unless you really want to limit the seed to numbers between 0 and 999.
The following NUnit test compares the performance of running a single thread versus two threads on a dual-core machine. Specifically, this is a VMware dual-core virtual Windows 7 machine running on a quad-core Linux SLED host, which is a Dell Inspiron 503.
Each thread simply loops and increments 2 counters, addCounter and readCounter. This test was originally exercising a queue implementation which was discovered to perform worse on a multi-core machine. So, narrowing the problem down to small reproducible code, what you have here is no queue, only incrementing variables, and, to my shock and dismay, it's far slower with 2 threads than with one.
When running the first test, the Task Manager shows one core 100% busy and the other core almost idle. Here's the test output for the single-thread test:
readCounter 360687000
readCounter2 0
total readCounter 360687000
addCounter 360687000
addCounter2 0
You see over 360 Million increments!
Next, the dual-thread test shows 100% busy on both cores for the whole 5-second duration of the test. However, its output shows only:
readCounter 88687000
readCounter2 134606500
total readCounter 223293500
addCounter 88687000
addCounter2 67303250
addFailure0
That's only 223 million read increments. What in God's creation are those 2 CPUs doing for those 5 seconds to get less work done?
Any possible clue? Can you run the tests on your machine to see if you get different results? One idea is that perhaps the VMware dual-core performance isn't what you would hope.
using System;
using System.Threading;
using NUnit.Framework;

namespace TickZoom.Utilities.TickZoom.Utilities
{
    [TestFixture]
    public class ActiveMultiQueueTest
    {
        private volatile bool stopThread = false;
        private Exception threadException;
        private long addCounter;
        private long readCounter;
        private long addCounter2;
        private long readCounter2;
        private long addFailureCounter;

        [SetUp]
        public void Setup()
        {
            stopThread = false;
            addCounter = 0;
            readCounter = 0;
            addCounter2 = 0;
            readCounter2 = 0;
        }

        [Test]
        public void TestSingleCoreSpeed()
        {
            var speedThread = new Thread(SpeedTestLoop);
            speedThread.Name = "1st Core Speed Test";
            speedThread.Start();
            Thread.Sleep(5000);
            stopThread = true;
            speedThread.Join();
            if (threadException != null)
            {
                throw new Exception("Thread failed: ", threadException);
            }
            Console.Out.WriteLine("readCounter " + readCounter);
            Console.Out.WriteLine("readCounter2 " + readCounter2);
            Console.Out.WriteLine("total readCounter " + (readCounter + readCounter2));
            Console.Out.WriteLine("addCounter " + addCounter);
            Console.Out.WriteLine("addCounter2 " + addCounter2);
        }

        [Test]
        public void TestDualCoreSpeed()
        {
            var speedThread1 = new Thread(SpeedTestLoop);
            speedThread1.Name = "Speed Test 1";
            var speedThread2 = new Thread(SpeedTestLoop2);
            speedThread2.Name = "Speed Test 2";
            speedThread1.Start();
            speedThread2.Start();
            Thread.Sleep(5000);
            stopThread = true;
            speedThread1.Join();
            speedThread2.Join();
            if (threadException != null)
            {
                throw new Exception("Thread failed: ", threadException);
            }
            Console.Out.WriteLine("readCounter " + readCounter);
            Console.Out.WriteLine("readCounter2 " + readCounter2);
            Console.Out.WriteLine("total readCounter " + (readCounter + readCounter2));
            Console.Out.WriteLine("addCounter " + addCounter);
            Console.Out.WriteLine("addCounter2 " + addCounter2);
            Console.Out.WriteLine("addFailure" + addFailureCounter);
        }

        private void SpeedTestLoop()
        {
            try
            {
                while (!stopThread)
                {
                    for (var i = 0; i < 500; i++)
                    {
                        ++addCounter;
                    }
                    for (var i = 0; i < 500; i++)
                    {
                        readCounter++;
                    }
                }
            }
            catch (Exception ex)
            {
                threadException = ex;
            }
        }

        private void SpeedTestLoop2()
        {
            try
            {
                while (!stopThread)
                {
                    for (var i = 0; i < 500; i++)
                    {
                        ++addCounter2;
                        i++;
                    }
                    for (var i = 0; i < 500; i++)
                    {
                        readCounter2++;
                    }
                }
            }
            catch (Exception ex)
            {
                threadException = ex;
            }
        }
    }
}
Edit: I tested the above on a quad-core laptop without VMware and got similarly degraded performance. So I wrote another test, similar to the above, but with each thread method in a separate class. My purpose was to test 4 cores.
Well, that test showed excellent results which improved almost linearly with 1, 2, 3, or 4 cores.
With some experimentation on both machines, it now appears that proper performance only happens if the main thread methods are on different instances instead of the same instance.
In other words, if multiple threads' main entry methods are on the same instance of a particular class, then performance on a multi-core machine gets worse for each thread you add, instead of better as you might assume.
It almost appears that the CLR is "synchronizing" so that only one thread at a time can run on that method. However, my testing says that isn't the case. So it's still unclear what's happening.
But my own problem seems to be solved simply by making separate instances of classes to run the threads as their starting points.
Sincerely,
Wayne
EDIT:
Here's an updated unit test that tests 1, 2, 3, and 4 threads, all on the same instance of a class, using arrays whose elements used in the thread loops are at least 10 elements apart. Performance still degrades significantly for each thread added.
using System;
using System.Threading;
using NUnit.Framework;

namespace TickZoom.Utilities.TickZoom.Utilities
{
    [TestFixture]
    public class MultiCoreSameClassTest
    {
        private ThreadTester threadTester;

        public class ThreadTester
        {
            private Thread[] speedThread = new Thread[400];
            private long[] addCounter = new long[400];
            private long[] readCounter = new long[400];
            private bool[] stopThread = new bool[400];
            internal Exception threadException;
            private int count;

            public ThreadTester(int count)
            {
                for (var i = 0; i < speedThread.Length; i += 10)
                {
                    speedThread[i] = new Thread(SpeedTestLoop);
                }
                this.count = count;
            }

            public void Run()
            {
                for (var i = 0; i < count * 10; i += 10)
                {
                    speedThread[i].Start(i);
                }
            }

            public void Stop()
            {
                for (var i = 0; i < stopThread.Length; i += 10)
                {
                    stopThread[i] = true;
                }
                for (var i = 0; i < count * 10; i += 10)
                {
                    speedThread[i].Join();
                }
                if (threadException != null)
                {
                    throw new Exception("Thread failed: ", threadException);
                }
            }

            public void Output()
            {
                var readSum = 0L;
                var addSum = 0L;
                for (var i = 0; i < count; i++)
                {
                    readSum += readCounter[i];
                    addSum += addCounter[i];
                }
                Console.Out.WriteLine("Thread readCounter " + readSum + ", addCounter " + addSum);
            }

            private void SpeedTestLoop(object indexarg)
            {
                var index = (int)indexarg;
                try
                {
                    while (!stopThread[index * 10])
                    {
                        for (var i = 0; i < 500; i++)
                        {
                            ++addCounter[index * 10];
                        }
                        for (var i = 0; i < 500; i++)
                        {
                            ++readCounter[index * 10];
                        }
                    }
                }
                catch (Exception ex)
                {
                    threadException = ex;
                }
            }
        }

        [SetUp]
        public void Setup()
        {
        }

        [Test]
        public void SingleCoreTest()
        {
            TestCores(1);
        }

        [Test]
        public void DualCoreTest()
        {
            TestCores(2);
        }

        [Test]
        public void TriCoreTest()
        {
            TestCores(3);
        }

        [Test]
        public void QuadCoreTest()
        {
            TestCores(4);
        }

        public void TestCores(int numCores)
        {
            threadTester = new ThreadTester(numCores);
            threadTester.Run();
            Thread.Sleep(5000);
            threadTester.Stop();
            threadTester.Output();
        }
    }
}
That's only 223 million read increments. What in God's creation are those 2 CPUs doing for those 5 seconds to get less work done?
You're probably running into cache contention -- when a single CPU is incrementing your integer, it can do so in its own L1 cache, but as soon as two CPUs start "fighting" over the same value, the cache line it's on has to be copied back and forth between their caches each time each one accesses it. The extra time spent copying data between caches adds up fast, especially when the operation you're doing (incrementing an integer) is so trivial.
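One way to test the cache-contention theory is to have each thread increment a purely local variable (which stays in a register or that core's L1 cache) and publish the result only once at the end. This is a sketch, not the original test: the counter fields in the question are distinct, but being adjacent they likely share a cache line ("false sharing"), whereas the local-variable version shares nothing during the hot loop:

```csharp
using System;
using System.Threading;

class LocalCounterDemo
{
    static void Main()
    {
        long total = 0;
        var threads = new Thread[2];
        for (int t = 0; t < threads.Length; t++)
        {
            threads[t] = new Thread(() =>
            {
                long local = 0;                    // private to this thread: no cache-line sharing
                for (int i = 0; i < 1_000_000; i++)
                    local++;
                Interlocked.Add(ref total, local); // one contended write instead of a million
            });
            threads[t].Start();
        }
        foreach (var th in threads) th.Join();
        Console.WriteLine(total); // deterministically 2000000
    }
}
```

In this shape the two threads typically scale with core count instead of fighting over shared lines, which is consistent with the "separate instances" observation in the question's edit.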
A few things:
You should probably test each setup at least 10 times and take the average.
As far as I know, Thread.Sleep is not exact; it depends on how the OS switches your threads.
Thread.Join is not immediate. Again, it depends on how the OS switches your threads.
A better way to test would be to run a computationally intensive operation (say, sum from one to a million) on two configurations and time them. For example:
Time how long it takes to sum from one to a million.
Time how long it takes to sum from one to 500,000 on one thread and from 500,001 to 1,000,000 on another.
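The two measurements above can be sketched like this (stopwatch timing omitted for brevity; the `SumRange` helper is introduced for this example). Whatever the timings turn out to be on a given machine, both configurations must compute the same total:

```csharp
using System;
using System.Threading.Tasks;

class SumDemo
{
    // Sums the inclusive range [from, to].
    static long SumRange(long from, long to)
    {
        long s = 0;
        for (long i = from; i <= to; i++) s += i;
        return s;
    }

    static void Main()
    {
        // Configuration 1: single-threaded sum of 1..1,000,000.
        long single = SumRange(1, 1_000_000);

        // Configuration 2: two tasks, each summing half the range.
        var left  = Task.Run(() => SumRange(1, 500_000));
        var right = Task.Run(() => SumRange(500_001, 1_000_000));
        long parallel = left.Result + right.Result;

        Console.WriteLine(single);             // 500000500000 = 1,000,000 * 1,000,001 / 2
        Console.WriteLine(parallel == single); // True
    }
}
```

Wrapping each configuration in a Stopwatch then gives a CPU-bound comparison that doesn't depend on Thread.Sleep at all.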
You were right to think that two threads would work faster than one thread. But yours are not the only threads running: the OS has threads, your browser has threads, and so on. Keep in mind that your timings will not be exact and may even fluctuate.
Lastly, there are other reasons (see slide 24) why threads can work slower.