Exception Handling vs performance [duplicate]

I know that exceptions have a performance penalty, and that it's generally more efficient to try and avoid exceptions than to drop a big try/catch around everything -- but what about the try block itself? What's the cost of merely declaring a try/catch, even if it never throws an exception?

The performance cost of try is very small. The major cost of exception handling is getting the stack trace and other metadata, and that's a cost that's not paid until you actually have to throw an exception.
But this will vary by language and implementation. Why not write a simple loop in C# and time it yourself?
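One way to time it yourself is sketched below. The class and method names (TryCostDemo, SumPlain, SumWithTry) are made up for this illustration, and the iteration count is an arbitrary assumption. The sketch compares a loop that enters a try block on every iteration, without ever throwing, against the same loop with no exception handling. Build in release mode and run without a debugger attached; the numbers will differ between machines and runtimes.

```csharp
using System;
using System.Diagnostics;

static class TryCostDemo
{
    // Sums 0..n-1 with no exception handling at all.
    public static long SumPlain(int n)
    {
        long total = 0;
        for (int i = 0; i < n; i++)
        {
            total += i;
        }
        return total;
    }

    // Same loop, but every iteration enters a try block that never throws.
    public static long SumWithTry(int n)
    {
        long total = 0;
        for (int i = 0; i < n; i++)
        {
            try
            {
                total += i;
            }
            catch (Exception)
            {
                // Never reached: nothing in the try block throws.
            }
        }
        return total;
    }

    static void Main()
    {
        const int n = 100000000; // arbitrary; large enough to measure

        // Warm-up so JIT compilation is not part of the measurement.
        SumPlain(1000);
        SumWithTry(1000);

        Stopwatch plain = Stopwatch.StartNew();
        long a = SumPlain(n);
        plain.Stop();

        Stopwatch withTry = Stopwatch.StartNew();
        long b = SumWithTry(n);
        withTry.Stop();

        // The sums are printed so the loops cannot be optimized away.
        Console.WriteLine("Plain:    {0} ms (sum {1})", plain.ElapsedMilliseconds, a);
        Console.WriteLine("With try: {0} ms (sum {1})", withTry.ElapsedMilliseconds, b);
    }
}
```

If the two timings come out close, that supports the claim that an unexercised try block is cheap on your setup. Swapping the order of the two measurements is a quick sanity check.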

Actually, a couple of months ago I was creating an ASP.NET web app, and I accidentally put a try/catch block inside a very long loop. Even though the loop wasn't throwing any exceptions, it was taking too long to finish. When I went back and saw the try/catch inside the loop, I did it the other way around: I wrapped the loop in the try/catch block. Performance improved a lot. You can try this on your own: do something like
int total = 0;
DateTime startTime = DateTime.Now;
for(int i = 0; i < 20000; i++)
{
try
{
total += i;
}
catch
{
// nothing to catch;
}
}
Console.Write((DateTime.Now - startTime).ToString());
And then take out the try / catch block. You'll see a big difference!

A common saying is that exceptions are expensive when they are caught - not thrown. This is because most of the exception metadata gathering (such as getting a stack trace etc.) only really happens on the try-catch side (not on the throw side).
Unwinding the stack is actually pretty quick - the CLR walks up the call stack and only pays heed to the finally blocks it finds; at no point in a pure try-finally block does the runtime attempt to 'complete' an exception (its metadata etc.).
From what I remember, any try-catches with filters (such as "catch (FooException) {}") are just as expensive - even if they do not do anything with the exception.
I would venture to say that a method (call it CatchesAndRethrows) with the following block:
try
{
ThrowsAnException();
}
catch
{
throw;
}
might result in a faster stack walk in a calling method such as:
try
{
CatchesAndRethrows();
}
catch (Exception ex) // The runtime has already done most of the work.
{
// Some fancy logic
}
Some numbers:
With: 0.13905ms
Without: 0.096ms
Percent difference: 144%
Here is the benchmark I ran (remember, release mode - run without debug):
static void Main(string[] args)
{
Stopwatch withCatch = new Stopwatch();
Stopwatch withoutCatch = new Stopwatch();
int iterations = 20000;
for (int i = 0; i < iterations; i++)
{
if (i % 100 == 0)
{
Console.Write("{0}%", 100 * i / iterations);
Console.CursorLeft = 0;
Console.CursorTop = 0;
}
CatchIt(withCatch, withoutCatch);
}
Console.WriteLine("With: {0}ms", ((float)(withCatch.ElapsedMilliseconds)) / iterations);
Console.WriteLine("Without: {0}ms", ((float)(withoutCatch.ElapsedMilliseconds)) / iterations);
Console.WriteLine("Percent difference: {0}%", 100 * withCatch.ElapsedMilliseconds / withoutCatch.ElapsedMilliseconds);
Console.ReadKey(true);
}
static void CatchIt(Stopwatch withCatch, Stopwatch withoutCatch)
{
withCatch.Start();
try
{
FinallyIt(withoutCatch);
}
catch
{
}
withCatch.Stop();
}
static void FinallyIt(Stopwatch withoutCatch)
{
try
{
withoutCatch.Start();
ThrowIt(withoutCatch);
}
finally
{
withoutCatch.Stop();
}
}
private static void ThrowIt(Stopwatch withoutCatch)
{
throw new NotImplementedException();
}

To see what it really costs, you can run the code below. It takes a simple two-dimensional array and generates random coordinates, about half of which are out of range. If your exception only occurs once, of course you will not notice it. My example is meant to emphasize what it means when this happens many thousands of times, and how much a simple test saves you compared with catching an exception.
const int size = 1000;
const int maxSteps = 100000;
var randomSeed = Environment.TickCount; // any int works; the same seed is reused for both runs
var random = new Random(randomSeed);
var numOutOfRange = 0;
var grid = new int[size,size];
var stopwatch = new Stopwatch();
Console.WriteLine("---Start test with exception---");
stopwatch.Reset();
stopwatch.Start();
for (int i = 0; i < maxSteps; i++)
{
int coord = random.Next(0, size * 2);
try
{
grid[coord, coord] = 1;
}
catch (IndexOutOfRangeException)
{
numOutOfRange++;
}
}
stopwatch.Stop();
Console.WriteLine("Time used: " + stopwatch.ElapsedMilliseconds + "ms, Number out of range: " + numOutOfRange);
Console.WriteLine("---End test with exception---");
random = new Random(randomSeed);
stopwatch.Reset();
Console.WriteLine("---Start test without exception---");
numOutOfRange = 0;
stopwatch.Start();
for (int i = 0; i < maxSteps; i++)
{
int coord = random.Next(0, size * 2);
if (coord >= grid.GetLength(0) || coord >= grid.GetLength(1))
{
numOutOfRange++;
continue;
}
grid[coord, coord] = 1;
}
stopwatch.Stop();
Console.WriteLine("Time used: " + stopwatch.ElapsedMilliseconds + "ms, Number out of range: " + numOutOfRange);
Console.WriteLine("---End test without exception---");
Console.ReadLine();
Example output of this code:
---Start test with exception---
Time used: 3228ms, Number out of range: 49795
---End test with exception---
---Start test without exception---
Time used: 3ms, Number out of range: 49795
---End test without exception---

You might want to read up on Structured Exception Handling. It's the Windows implementation of exceptions, and it is used in .NET.
http://www.microsoft.com/msj/0197/Exception/Exception.aspx

Related

Why is the regular multiplication operator much more efficient than the BigMul method?

I've searched for the differences between the * operator and the Math.BigMul method, and found nothing. So I've decided I would try and test their efficiency against each other. Consider the following code :
public class Program
{
static void Main()
{
Stopwatch MulOperatorWatch = new Stopwatch();
Stopwatch MulMethodWatch = new Stopwatch();
MulOperatorWatch.Start();
// Creates a new MulOperatorClass to perform the start method 100 times.
for (int i = 0; i < 100; i++)
{
MulOperatorClass mOperator = new MulOperatorClass();
mOperator.start();
}
MulOperatorWatch.Stop();
MulMethodWatch.Start();
for (int i = 0; i < 100; i++)
{
MulMethodClass mMethod = new MulMethodClass();
mMethod.start();
}
MulMethodWatch.Stop();
Console.WriteLine("Operator = " + MulOperatorWatch.ElapsedMilliseconds.ToString());
Console.WriteLine("Method = " + MulMethodWatch.ElapsedMilliseconds.ToString());
Console.ReadLine();
}
public class MulOperatorClass
{
public void start()
{
List<long> MulOperatorList = new List<long>();
for (int i = 0; i < 15000000; i++)
{
MulOperatorList.Add(i * i);
}
}
}
public class MulMethodClass
{
public void start()
{
List<long> MulMethodList = new List<long>();
for (int i = 0; i < 15000000; i++)
{
MulMethodList.Add(Math.BigMul(i,i));
}
}
}
}
To sum it up: I've created two classes, MulMethodClass and MulOperatorClass, which both have a start method that fills a variable of type List<long> with the values of i multiplied by i, many times. The only difference between these methods is the use of the * operator in the operator class and the use of Math.BigMul in the method class.
I'm creating 100 instances of each of these classes, just to prevent an overflow of the lists (I can't create a list of 1000000000 items).
I then measure the time it takes for each of the 100 instances to execute. The results are pretty peculiar: I ran this process about 15 times, and the average results were (in milliseconds):
Operator = 20357
Method = 24579
That's a difference of about 4.2 seconds, which I think is a lot. I've looked at the source code of the BigMul method: it uses the * operator, and does practically the exact same thing.
So, my questions:
Why does such a method even exist? It does exactly the same thing.
If it does exactly the same thing, why is there such a huge efficiency difference between the two?
I'm just curious :)

Microbenchmarking is an art. You are right that the method is around 10% slower on x86; it runs at the same speed on x64. Note that you have to multiply two longs, so ((long)i) * ((long)i), because that is what BigMul does!
Now, some easy rules if you want to microbenchmark:
A) Don't allocate memory in the benchmarked code... You don't want the GC to run (you are enlarging the List<>)
B) Preallocate the memory outside the timed zone (create the List<> with the right capacity before running the code)
C) Run the methods at least once or twice before benchmarking them.
D) Try not to do anything except what you are benchmarking, but do force the compiler to run your code. For example, checking an always-true condition based on the result of the operation, and throwing an exception if it is false, is normally good enough to fool the compiler.
static void Main()
{
// Check x86 or x64
Console.WriteLine(IntPtr.Size == 4 ? "x86" : "x64");
// Check Debug/Release
Console.WriteLine(IsDebug() ? "Debug, USELESS BENCHMARK" : "Release");
// Check if debugger is attached
Console.WriteLine(System.Diagnostics.Debugger.IsAttached ? "Debugger attached, USELESS BENCHMARK!" : "Debugger not attached");
// High priority
Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;
Stopwatch MulOperatorWatch = new Stopwatch();
Stopwatch MulMethodWatch = new Stopwatch();
// Prerunning of the benchmarked methods
MulMethodClass.start();
MulOperatorClass.start();
{
// No useless method allocation here
MulMethodWatch.Start();
for (int i = 0; i < 100; i++)
{
MulMethodClass.start();
}
MulMethodWatch.Stop();
}
{
// No useless method allocation here
MulOperatorWatch.Start();
for (int i = 0; i < 100; i++)
{
MulOperatorClass.start();
}
MulOperatorWatch.Stop();
}
Console.WriteLine("Operator = " + MulOperatorWatch.ElapsedMilliseconds.ToString());
Console.WriteLine("Method = " + MulMethodWatch.ElapsedMilliseconds.ToString());
Console.ReadLine();
}
public class MulOperatorClass
{
// The method is static. No useless memory allocation
public static void start()
{
for (int i = 2; i < 15000000; i++)
{
// This condition will always be false, but the compiler
// won't be able to remove the code
if (((long)i) * ((long)i) == ((long)i))
{
throw new Exception();
}
}
}
}
public class MulMethodClass
{
public static void start()
{
// The method is static. No useless memory allocation
for (int i = 2; i < 15000000; i++)
{
// This condition will always be false, but the compiler
// won't be able to remove the code
if (Math.BigMul(i, i) == i)
{
throw new Exception();
}
}
}
}
private static bool IsDebug()
{
// Taken from http://stackoverflow.com/questions/2104099/c-sharp-if-then-directives-for-debug-vs-release
object[] customAttributes = Assembly.GetExecutingAssembly().GetCustomAttributes(typeof(DebuggableAttribute), false);
if ((customAttributes != null) && (customAttributes.Length == 1))
{
DebuggableAttribute attribute = customAttributes[0] as DebuggableAttribute;
return (attribute.IsJITOptimizerDisabled && attribute.IsJITTrackingEnabled);
}
return false;
}
E) If you are really sure your code is OK, try changing the order of the tests
F) Put your program in higher priority
But be happy :-)
At least one other person had the same question and wrote a blog article about it: http://reflectivecode.com/2008/10/mathbigmul-exposed/
He made the same mistakes you did.
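As for why the method exists at all, the note about multiplying two longs can be illustrated with a small sketch. The class and method names here (BigMulDemo and its helpers) are made up for this example. With two int operands, the * operator computes a 32-bit product that can wrap around, whereas Math.BigMul(int, int) returns the full 64-bit product, the same value as casting both operands to long first.

```csharp
using System;

static class BigMulDemo
{
    // int * int: the product is computed in 32 bits and wraps on overflow,
    // then the (already wrapped) result is widened to long.
    public static long MultiplyAsInt(int a, int b)
    {
        return unchecked(a * b);
    }

    // Widening both operands first gives the full 64-bit product.
    public static long MultiplyAsLong(int a, int b)
    {
        return (long)a * (long)b;
    }

    // Math.BigMul(int, int) also returns the full 64-bit product.
    public static long MultiplyWithBigMul(int a, int b)
    {
        return Math.BigMul(a, b);
    }

    static void Main()
    {
        int i = 100000;
        Console.WriteLine(MultiplyAsInt(i, i));      // 1410065408 (wrapped)
        Console.WriteLine(MultiplyAsLong(i, i));     // 10000000000
        Console.WriteLine(MultiplyWithBigMul(i, i)); // 10000000000
    }
}
```

This also suggests that the i * i loop in the question was not computing the same values as the BigMul loop once i grew past 46340, so the two timings were not comparing like with like.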

Calculate pending time to finish processing threaded code C#

I have code that uses threads, and I want to show the estimated time remaining until processing finishes. button1 calls Function1(), which reads a file in chunks of 1024 bytes in a while loop until it reaches the end of the file. Inside the while loop there is a foreach loop that calls Function2(). I start the timer at the beginning of each while-loop iteration and stop it at the end of the iteration. I then try to approximate the pending time, knowing in advance the number of iterations the while loop will process. I save the elapsed time of the first iteration (let's say T1) and multiply it by the number of iterations.
This would be
PendingTime = T1*Iterations.
Then I do
PendingTime = PendingTime - Ti, where Ti is the elapsed time of the ith iteration.
The issue is that when I try this with the real code, T1*Iterations gives me 402 s, while the processing actually takes 12 s.
Maybe some expert can see what I'm doing wrong. Thanks in advance.
The code looks like this:
async void button1_Click(object sender, EventArgs e)
{
//Some code
await Task.Run(() => Function1(inputfile, cts.Token), cts.Token);
//Some code
}
public void Function2()
{
//Some code
}
public void Function1(string inputfile, CancellationToken token)
{
int buffer = 1024;
int IterationCounter = 0;
decimal Iterations = 1;
int PendingTime = 0;
using (BinaryReader reader = new BinaryReader(File.Open(inputfile, FileMode.Open)))
{
FileLength = (int)reader.BaseStream.Length;
Iterations = (int)FileLength/buffer;
while (chunk.Length > 0)
{
Stopwatch sw1 = Stopwatch.StartNew(); //Start time counter
//++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
//some code
chunk = reader.ReadBytes(buffer);
foreach (byte data in chunk)
{
//Some code
Function2(); //Call to Function2
}
//++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
//Checking if it is the first iteration to save the pending time
//Pending time would be the elapsed time for the first iteration
//multiplied by the number of iterations (FileLength/1024).
sw1.Stop(); //Stop time counter
if (IterationCounter == 1)
{
PendingTime = (int)((decimal)Math.Round(sw1.Elapsed.TotalMilliseconds / 1000, 4)*Iterations);
}
//Show in TexBox1 the pending time
TextBox1.Invoke((MethodInvoker)delegate
{
PendingTime = PendingTime - (int)Math.Round(sw1.Elapsed.TotalMilliseconds / 1000, 4);
TextBox1.Text = PendingTime + " s";
});
}
}
}
Update:
I'm testing with the following code based on the example of Peter Duniho.
It can be tested with any file (e.g. a txt file). I've tested with a 5 MB txt file and the execution time was 3 seconds, but the pending time always appears as zero in TextBox1. Where am I wrong?
Note: I changed this:
double timePerIteration = sw1.Elapsed / ++IterationCounter;
to this
double timePerIteration = sw1.ElapsedMilliseconds/1000/ ++IterationCounter;
Since I was getting the error:
Operator '/' cannot be applied to operands of type 'System.TimeSpan' and 'int' (CS0019)
The code so far is below. Thanks for the help.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Drawing;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Forms;
namespace TestTimer
{
public partial class MainForm : Form
{
CancellationTokenSource cts = new CancellationTokenSource();
string filename = "";
long FileLength;
FileInfo fInfo;
Stopwatch sw1 = new Stopwatch();
public MainForm()
{
InitializeComponent();
}
void BtnSelectFileClick(object sender, EventArgs e)
{
OpenFileDialog ofd = new OpenFileDialog();
ofd.Title = "Select file";
DialogResult dr = ofd.ShowDialog();
if (dr == DialogResult.OK)
{
filename = ofd.FileName;
fInfo = new FileInfo(filename);
}
else
{
MessageBox.Show("File not found");
return;
}
}
async void BtnRunProcessClick(object sender, System.EventArgs e)
{
cts = new CancellationTokenSource();
await Task.Run(() => Function1(filename, cts.Token), cts.Token);
}
public void Function1(string inputfile, CancellationToken token)
{
int buffer = 1024;
int IterationCounter = 0;
int Iterations = 0;
double pendingTime = 0;
using (BinaryReader reader = new BinaryReader(File.Open(inputfile, FileMode.Open)))
{
FileLength = (int)reader.BaseStream.Length;
Iterations = (int)FileLength/buffer;
byte[] chunk;
sw1 = Stopwatch.StartNew(); //Start time counter
while (true)
{
chunk = reader.ReadBytes(buffer);
if (chunk.Length == 0) {break;}
foreach (byte data in chunk)
{
Thread.Sleep(90/100);
}
// pendingTime is the current average time-per-iteration,
// times the number of iterations left
double timePerIteration = sw1.ElapsedMilliseconds/1000/ ++IterationCounter;
pendingTime = timePerIteration * (Iterations - IterationCounter);
TextBox1.Invoke((MethodInvoker)delegate
{
// Let string.Format() take care of rounding for you
TextBox1.Text = string.Format("{0:0} s", pendingTime / 1000);
});
}
MessageBox.Show("Execution time: " + string.Format("{0:0} s", sw1.ElapsedMilliseconds / 1000) );
}
}
}
}

I don't see how the code you posted ever actually compiled, never mind worked. The FileLength variable does not appear to be declared, and you never increment the IterationCounter variable, giving you a negative PendingTime value with each iteration. Even if you had incremented the counter, your PendingTime variable's actual meaning changes from the block that executes when the counter is 1 and a little later when you subtract your elapsed time from the current PendingTime variable.
That suggests the code you posted isn't really the code you're using, since the displayed time remaining would always have been negative (even assuming the declaration of FileLength just got accidentally dropped from your post for some reason). For the sake of argument, I'll add a statement that does the increment…
As commenter Chris says, when each iteration's actual duration can vary, as it seems to be the case here, the best you're going to do is average all of the iterations up to the current one. Even that may lead to an erroneous time-remaining display, with a fair amount of variation from iteration to iteration (especially if the number of iterations is small), but at least it's more likely to be close.
Something like this would likely work better for you:
public void Function1(string inputfile, CancellationToken token)
{
int buffer = 1024;
int IterationCounter = 0;
int Iterations;
using (BinaryReader reader = new BinaryReader(File.Open(inputfile, FileMode.Open)))
{
if (reader.BaseStream.Length == 0)
{
// nothing to do
return;
}
// NOTE: this won't work for files with length > int.MaxValue!
// Your original code has the same limitation, and I have not
// bothered to change that.
// Now that we know for sure the length is > 0, we can
// do the following to ensure a correct iteration count
Iterations = ((int)reader.BaseStream.Length - 1) / buffer + 1;
Stopwatch sw1 = Stopwatch.StartNew();
byte[] chunk;
// ReadBytes() returns an empty array at end of file, which ends the loop
while ((chunk = reader.ReadBytes(buffer)).Length > 0)
{
//++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
//some code
foreach (byte data in chunk)
{
//Some code
Function2(); //Call to Function2
}
//++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
// pendingTime is the current average time-per-iteration,
// times the number of iterations left
double timePerIteration = sw1.Elapsed.TotalMilliseconds / ++IterationCounter,
pendingTime = timePerIteration *
(Iterations - IterationCounter);
//Show in TexBox1 the pending time
TextBox1.Invoke((MethodInvoker)delegate
{
// Let string.Format() take care of rounding for you
TextBox1.Text = string.Format("{0:0} s", pendingTime / 1000);
});
}
}
}
Unless you are guaranteed that your input file is always exactly a multiple of 1024 bytes in length, you also had a bug in your calculation of the total iteration count. I fixed that in the above as well.
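The arithmetic can also be isolated into two small helpers, sketched here under made-up names (EtaDemo, ChunkCount, EstimateRemaining). The sketch does the averaging in floating point. That matters for the update in the question: sw1.ElapsedMilliseconds/1000/ ++IterationCounter is integer division, so for a run of a few seconds over thousands of iterations it truncates to zero, and the later pendingTime / 1000 shrinks the value again.

```csharp
using System;

static class EtaDemo
{
    // Number of chunks needed to cover 'length' bytes (ceiling division).
    public static long ChunkCount(long length, int chunkSize)
    {
        return length == 0 ? 0 : (length - 1) / chunkSize + 1;
    }

    // Average time per completed iteration, times the iterations left.
    // Returns zero until at least one iteration has completed.
    public static TimeSpan EstimateRemaining(TimeSpan elapsed, long done, long total)
    {
        if (done <= 0 || done >= total)
        {
            return TimeSpan.Zero;
        }
        double msPerIteration = elapsed.TotalMilliseconds / done; // floating point
        return TimeSpan.FromMilliseconds(msPerIteration * (total - done));
    }

    static void Main()
    {
        // Hypothetical numbers: a 5,000,000-byte file read in 1024-byte chunks,
        // with 1000 chunks processed after 1 second.
        long total = ChunkCount(5000000, 1024);
        TimeSpan remaining = EstimateRemaining(TimeSpan.FromSeconds(1), 1000, total);
        Console.WriteLine("{0} chunks, about {1:0.0} s left", total, remaining.TotalSeconds);
    }
}
```

In the loop itself, EstimateRemaining(sw1.Elapsed, IterationCounter, Iterations) would be called once per chunk, after incrementing the counter, and the result handed to the UI thread.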

How can I get the waiting times for access or failing in a function that is locked by threads?

I'm using a function to add some values to a dynamic array (I know that I could use a list, but it's a requirement that I must use an array).
Right now everything is working, but I need to know when a thread fails to add a value (because the function is locked), and save that time, and when it succeeds in adding it (I think I already have the latter, as you can see in the Add function).
Insert Data:
private void button6_Click(object sender, EventArgs e)
{
showMessage(numericUpDown5.Value.ToString());
showMessage(numericUpDown6.Value.ToString());
for (int i = 0; i < int.Parse(numericUpDown6.Value.ToString()); i++)
{
ThreadStart start = new ThreadStart(insertDataSecure);
new Thread(start).Start();
}
}
private void insertDataSecure()
{
for (int i = 0; i < int.Parse(numericUpDown5.Value.ToString()); i++)
sArray.addSecure(i);
MessageBox.Show(String.Format("Finished data inserted, you can check the result in: {0}", Path.Combine(
Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location),
"times.txt")), "Result", MessageBoxButtons.OK, MessageBoxIcon.Information);
}
Function to Add:
private object padLock = new object();
public void addSecure(int value)
{
Stopwatch sw = Stopwatch.StartNew();
string values = "";
lock (padLock)
{
try
{
if (array == null)
{
this.size = 1;
Resize(this.size);
array[0] = value;
count++;
}
else
{
count++;
if (size == count)
{
size *= 2;
Resize(size);
}
array[count - 1] = value;
}
}
catch
{
throw new System.ArgumentException("It was impossible to insert, try again later.", "insert");
}
values=String.Format("Element {0}, Time taken: {1}ms", value.ToString(), sw.Elapsed.TotalMilliseconds);
sw.Stop();
saveFile(values);
}
}
Sorry for asking this question, but I have read different articles, and this is the last one that I tried to use: http://msdn.microsoft.com/en-us/library/4tssbxcw.aspx but when I tried to implement it in my code, it ended up crashing with a strange error.

I'm afraid I might not completely understand the question. It sounds like you want to know how long it takes between the time the thread starts and when it actually acquires the lock. But in that case, the thread does not actually fail to add a value; it is simply delayed some period of time.
On the other hand, you do have an exception handler, so presumably there's some scenario you expect where the Resize() method can throw an exception (but you should catch only those exceptions you expect and know you can handle…a bare catch clause is not a good idea, though the harm is mitigated somewhat by the fact that you do throw some exception from the exception handler). So I can't help but wonder if that is the failure you're talking about.
That said, assuming the former interpretation is correct – that you want to time how long it takes to acquire the lock – then the following change to your code should do that:
public void addSecure(int value)
{
Stopwatch sw = Stopwatch.StartNew();
string values = "";
lock (padLock)
{
// Save the current timer value here
TimeSpan elapsedToAcquireLock = sw.Elapsed;
try
{
if (array == null)
{
this.size = 1;
Resize(this.size);
array[0] = value;
count++;
}
else
{
count++;
if (size == count)
{
size *= 2;
Resize(size);
}
array[count - 1] = value;
}
}
catch
{
throw new System.ArgumentException("It was impossible to insert, try again later.", "insert");
}
sw.Stop();
values = string.Format(
"Element {0}, Time taken: for lock acquire: {1}ms, for append operation: {2}ms",
value.ToString(),
elapsedToAcquireLock.TotalMilliseconds,
sw.Elapsed.TotalMilliseconds - elapsedToAcquireLock.TotalMilliseconds);
saveFile(values);
}
}
That will display the individual times for the sections of code: acquiring the lock, and then actually adding the value to the array (i.e. the latter not including the time taken to acquire the lock).
If that's not actually what you are trying to do, please edit your question so that it is more clear.
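If "failing" is meant literally, that is, giving up when the lock cannot be taken within some time limit, then a plain lock statement cannot express it, because lock waits indefinitely. A minimal sketch using Monitor.TryEnter with a timeout follows. The names LockTimingDemo and TryAdd, the 50 ms timeout, and the count++ stand-in for the array append are all assumptions made for this illustration, not part of the code in the question.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class LockTimingDemo
{
    private static readonly object padLock = new object();
    private static int count;

    // Tries to take the lock for at most timeoutMs milliseconds.
    // Returns true if the value was added; waitedMs reports how long the
    // attempt to acquire the lock took, whether it succeeded or not.
    public static bool TryAdd(int timeoutMs, out double waitedMs)
    {
        Stopwatch sw = Stopwatch.StartNew();
        bool acquired = Monitor.TryEnter(padLock, timeoutMs);
        waitedMs = sw.Elapsed.TotalMilliseconds;
        if (!acquired)
        {
            return false; // the lock stayed busy for the whole timeout
        }
        try
        {
            count++; // stand-in for the real array append
            return true;
        }
        finally
        {
            Monitor.Exit(padLock);
        }
    }

    static void Main()
    {
        double waited;

        // Uncontended: the lock is free, so the add succeeds at once.
        Console.WriteLine("added={0}", TryAdd(50, out waited));

        // Contended: another thread holds the lock longer than the timeout.
        using (var held = new ManualResetEventSlim(false))
        using (var release = new ManualResetEventSlim(false))
        {
            var holder = new Thread(() =>
            {
                lock (padLock)
                {
                    held.Set();     // signal that the lock is now taken
                    release.Wait(); // keep holding it until told to let go
                }
            });
            holder.Start();
            held.Wait();

            bool added = TryAdd(50, out waited);
            Console.WriteLine("added={0}, waited about {1:0} ms", added, waited);

            release.Set();
            holder.Join();
        }
    }
}
```

The failed attempts and their wait times could then be written to the same file as the successful ones.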

Trying to cause a read or write error in multi-threading

How come the following doesn't give me a read / write error ever?
In my main thread I increment i, which is a read, and a write.
In both threads I write to total.
I would expect to see an error, but I leave it running and never get one, why is this?
For reference, my computer has 2 cores, 4 logical cores; it's an Intel i3.
using System;
using System.Collections.Generic;
using System.Threading;
namespace Threading001
{
class Program
{
static void Main(string[] args)
{
int i = 0;
int total = 0;
new Thread(() => {
var service2 = new Service();
while (true)
{
try
{
total += i;
}
catch (Exception ex)
{
Console.WriteLine("Error from thread 2: {0}", ex);
break;
}
}
}).Start();
while (true)
{
try
{
total += i;
i++;
}
catch (Exception ex)
{
Console.WriteLine("Error from thread 1: {0}", ex);
break;
}
}
}
}
}

Since both threads may update total at the same time, the error you may get is a "corrupted" total, i.e. a total that doesn't contain the "correct" sum. However, it won't throw an exception.
If you want to do this kind of thing and avoid any potential error, you could use Interlocked.Add:
Interlocked.Add(ref total, i)
Here is a simplified example:
using System;
using System.Collections.Generic;
using System.Threading;
namespace Threading001
{
class Program
{
static void Main(string[] args)
{
int total = 0;
Thread thread = new Thread(() => {
for ( int j=0 ; j < 1000000 ; j++)
{
Interlocked.Add(ref total, 1);
//total++; //not thread-safe
}
});
thread.Start();
for ( int i=0 ; i < 1000000 ; i++)
{
Interlocked.Add(ref total, 1);
//total++; //not thread-safe
}
thread.Join();
Console.WriteLine(total);
}
}
}
With the non-thread-safe version, I get a different total each time (e.g. 1425407, 1109498, ...).
With the thread safe version, I get 2000000 every time.
So, conclusion: if both threads try to write at the same time, I won't get the results I want. However, it won't throw an exception.

A race condition does not throw an exception.
http://en.wikipedia.org/wiki/Race_condition
Note the example there.
Here is the same question asked in a more "correct" manner:
pthreads: If I increment a global from two different threads, can there be sync issues?

Dual-core performance worse than single core?

The following NUnit test compares performance between running a single thread versus running 2 threads on a dual core machine. Specifically, this is a VMWare dual core virtual Windows 7 machine running on a quad core Linux SLED host, which is a Dell Inspiron 503.
Each thread simply loops and increments 2 counters, addCounter and readCounter. This test was originally testing a Queue implementation which was discovered to perform worse on a multi-core machine. So, in narrowing the problem down to a small reproducible piece of code, what you have here is no queue, only incrementing variables, and, to my shock and dismay, it's far slower with 2 threads than with one.
When running the first test, the Task Manager shows 1 of the cores 100% busy with the other core almost idle. Here's the test output for the single thread test:
readCounter 360687000
readCounter2 0
total readCounter 360687000
addCounter 360687000
addCounter2 0
You see over 360 Million increments!
Next, the dual thread test shows 100% busy on both cores for the whole 5-second duration of the test. However, its output shows only:
readCounter 88687000
readCounter2 134606500
totoal readCounter 223293500
addCounter 88687000
addCounter2 67303250
addFailure0
That's only 223 Million read increments. What in god's creation are those 2 CPUs doing for those 5 seconds to get less work done?
Any possible clue? And can you run the tests on your machine to see if you get different results? One idea is that perhaps the VMWare dual core performance isn't what you would hope.
using System;
using System.Threading;
using NUnit.Framework;
namespace TickZoom.Utilities.TickZoom.Utilities
{
[TestFixture]
public class ActiveMultiQueueTest
{
private volatile bool stopThread = false;
private Exception threadException;
private long addCounter;
private long readCounter;
private long addCounter2;
private long readCounter2;
private long addFailureCounter;
[SetUp]
public void Setup()
{
stopThread = false;
addCounter = 0;
readCounter = 0;
addCounter2 = 0;
readCounter2 = 0;
}
[Test]
public void TestSingleCoreSpeed()
{
var speedThread = new Thread(SpeedTestLoop);
speedThread.Name = "1st Core Speed Test";
speedThread.Start();
Thread.Sleep(5000);
stopThread = true;
speedThread.Join();
if (threadException != null)
{
throw new Exception("Thread failed: ", threadException);
}
Console.Out.WriteLine("readCounter " + readCounter);
Console.Out.WriteLine("readCounter2 " + readCounter2);
Console.Out.WriteLine("total readCounter " + (readCounter + readCounter2));
Console.Out.WriteLine("addCounter " + addCounter);
Console.Out.WriteLine("addCounter2 " + addCounter2);
}
[Test]
public void TestDualCoreSpeed()
{
var speedThread1 = new Thread(SpeedTestLoop);
speedThread1.Name = "Speed Test 1";
var speedThread2 = new Thread(SpeedTestLoop2);
speedThread2.Name = "Speed Test 2";
speedThread1.Start();
speedThread2.Start();
Thread.Sleep(5000);
stopThread = true;
speedThread1.Join();
speedThread2.Join();
if (threadException != null)
{
throw new Exception("Thread failed: ", threadException);
}
Console.Out.WriteLine("readCounter " + readCounter);
Console.Out.WriteLine("readCounter2 " + readCounter2);
Console.Out.WriteLine("totoal readCounter " + (readCounter + readCounter2));
Console.Out.WriteLine("addCounter " + addCounter);
Console.Out.WriteLine("addCounter2 " + addCounter2);
Console.Out.WriteLine("addFailure" + addFailureCounter);
}
private void SpeedTestLoop()
{
try
{
while (!stopThread)
{
for (var i = 0; i < 500; i++)
{
++addCounter;
}
for (var i = 0; i < 500; i++)
{
readCounter++;
}
}
}
catch (Exception ex)
{
threadException = ex;
}
}
private void SpeedTestLoop2()
{
try
{
while (!stopThread)
{
for (var i = 0; i < 500; i++)
{
++addCounter2;
i++;
}
for (var i = 0; i < 500; i++)
{
readCounter2++;
}
}
}
catch (Exception ex)
{
threadException = ex;
}
}
}
}
Edit: I tested the above on a quad core laptop w/o vmware and got similarly degraded performance. So I wrote another test similar to the above but which has each thread method in a separate class. My purpose in doing that was to test 4 cores.
Well, that test showed excellent results, which improved almost linearly with 1, 2, 3, or 4 cores.
With some experimentation now on both machines, it appears that proper performance only happens if the main thread methods are on different instances instead of the same instance.
In other words, if the main entry methods of multiple threads are on the same instance of a particular class, then performance on a multi-core machine will get worse with each thread you add, instead of better as you might assume.
It almost appears that the CLR is "synchronizing" so only one thread at a time can run on that method. However, my testing says that isn't the case. So it's still unclear what's happening.
But my own problem seems to be solved simply by making separate instances of methods to run threads as their starting point.
Sincerely,
Wayne
EDIT:
Here's an updated unit test that tests 1, 2, 3, & 4 threads with them all on the same instance of a class. It uses arrays, with the variables used in each thread loop at least 10 elements apart. Performance still degrades significantly for each thread added.
using System;
using System.Threading;
using NUnit.Framework;
namespace TickZoom.Utilities.TickZoom.Utilities
{
[TestFixture]
public class MultiCoreSameClassTest
{
private ThreadTester threadTester;
public class ThreadTester
{
private Thread[] speedThread = new Thread[400];
private long[] addCounter = new long[400];
private long[] readCounter = new long[400];
private bool[] stopThread = new bool[400];
internal Exception threadException;
private int count;
public ThreadTester(int count)
{
for( var i=0; i<speedThread.Length; i+=10)
{
speedThread[i] = new Thread(SpeedTestLoop);
}
this.count = count;
}
public void Run()
{
for (var i = 0; i < count*10; i+=10)
{
speedThread[i].Start(i);
}
}
public void Stop()
{
for (var i = 0; i < stopThread.Length; i+=10 )
{
stopThread[i] = true;
}
for (var i = 0; i < count * 10; i += 10)
{
speedThread[i].Join();
}
if (threadException != null)
{
throw new Exception("Thread failed: ", threadException);
}
}
public void Output()
{
var readSum = 0L;
var addSum = 0L;
for (var i = 0; i < count; i++)
{
readSum += readCounter[i];
addSum += addCounter[i];
}
Console.Out.WriteLine("Thread readCounter " + readSum + ", addCounter " + addSum);
}
private void SpeedTestLoop(object indexarg)
{
var index = (int) indexarg;
try
{
while (!stopThread[index*10])
{
for (var i = 0; i < 500; i++)
{
++addCounter[index*10];
}
for (var i = 0; i < 500; i++)
{
++readCounter[index*10];
}
}
}
catch (Exception ex)
{
threadException = ex;
}
}
}
[SetUp]
public void Setup()
{
}
[Test]
public void SingleCoreTest()
{
TestCores(1);
}
[Test]
public void DualCoreTest()
{
TestCores(2);
}
[Test]
public void TriCoreTest()
{
TestCores(3);
}
[Test]
public void QuadCoreTest()
{
TestCores(4);
}
public void TestCores(int numCores)
{
threadTester = new ThreadTester(numCores);
threadTester.Run();
Thread.Sleep(5000);
threadTester.Stop();
threadTester.Output();
}
}
}

That's only 223 Million read increments. What in god's creation are those 2 CPUs doing for those 5 seconds to get less work done?
You're probably running into cache contention -- when a single CPU is incrementing your integer, it can do so in its own L1 cache, but as soon as two CPUs start "fighting" over the same value, the cache line it's on has to be copied back and forth between their caches each time each one accesses it. The extra time spent copying data between caches adds up fast, especially when the operation you're doing (incrementing an integer) is so trivial.
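When the threads write to different variables that merely sit on the same cache line, this effect is usually called false sharing. The sketch below is one way to probe for it, with made-up names (FalseSharingDemo, Run) and an assumed 64-byte cache line. Two threads each increment their own slot of a long[] array: first with the slots adjacent, then with the slots 16 elements (128 bytes) apart. Whether the adjacent case is noticeably slower depends on the CPU, the runtime and the JIT, so treat it as an experiment rather than a proof.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class FalseSharingDemo
{
    // Runs two threads, each incrementing its own slot of a shared array
    // 'iterations' times. 'stride' is the distance, in elements, between
    // the two slots (slot 0 and slot 'stride').
    public static long[] Run(int stride, int iterations, out TimeSpan elapsed)
    {
        long[] counters = new long[stride + 1];
        Thread first = new Thread(() => Increment(counters, 0, iterations));
        Thread second = new Thread(() => Increment(counters, stride, iterations));

        Stopwatch sw = Stopwatch.StartNew();
        first.Start();
        second.Start();
        first.Join();
        second.Join();
        elapsed = sw.Elapsed;
        return counters;
    }

    // Each thread only ever touches its own slot, so no updates are lost.
    static void Increment(long[] counters, int slot, int iterations)
    {
        for (int i = 0; i < iterations; i++)
        {
            counters[slot]++;
        }
    }

    static void Main()
    {
        const int iterations = 100000000; // arbitrary; large enough to measure
        TimeSpan adjacent, apart;

        Run(1, 1000, out adjacent); // warm-up so JIT time is not measured

        Run(1, iterations, out adjacent); // slots 8 bytes apart
        Run(16, iterations, out apart);   // slots 128 bytes apart

        Console.WriteLine("Adjacent slots: {0} ms", adjacent.TotalMilliseconds);
        Console.WriteLine("Slots 16 apart: {0} ms", apart.TotalMilliseconds);
    }
}
```

In the question's first test, the four counters are fields of the same object and so are probably laid out next to each other, which would fit this explanation.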

A few things:
You should probably test each setup at least 10 times and take the average.
As far as I know, Thread.Sleep is not exact; it depends on how the OS schedules your threads.
Thread.Join is not immediate. Again, it depends on how the OS schedules your threads.
A better way to test would be to run a computationally intensive operation (say, sum from one to a million) on two configurations and time them. For example:
Time how long it takes to sum from one to a million
Time how long it takes to sum one to 500000 on one thread and 500001 to 1000000 on another
You were right when you thought that two threads would work faster than one thread. But yours are not the only threads running: the OS has threads, your browser has threads, and so on. Keep in mind that your timings will not be exact and may even fluctuate.
Lastly, there are other reasons (see slide 24) why threads work slower.
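The suggested comparison could be sketched as follows. The names SplitSumDemo, SumRange and SumOnTwoThreads are made up for this example. Each thread accumulates into a local variable and publishes its result once at the end, so the two threads do not write to shared memory inside the hot loop. Summing only to a million finishes too quickly to time reliably on current hardware, so Main uses a larger upper bound, which is an assumption of this sketch.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class SplitSumDemo
{
    // Sums the integers from 'from' to 'to', inclusive.
    public static long SumRange(long from, long to)
    {
        long sum = 0;
        for (long i = from; i <= to; i++)
        {
            sum += i;
        }
        return sum;
    }

    // Splits 1..n into two halves and sums each half on its own thread.
    public static long SumOnTwoThreads(long n)
    {
        long mid = n / 2;
        long low = 0, high = 0;
        Thread t1 = new Thread(() => { low = SumRange(1, mid); });
        Thread t2 = new Thread(() => { high = SumRange(mid + 1, n); });
        t1.Start();
        t2.Start();
        t1.Join(); // after Join, the results written by the threads are visible here
        t2.Join();
        return low + high;
    }

    static void Main()
    {
        const long n = 1000000000; // larger than a million so the timing is measurable

        SumRange(1, 1000);     // warm-up
        SumOnTwoThreads(1000); // warm-up

        Stopwatch one = Stopwatch.StartNew();
        long a = SumRange(1, n);
        one.Stop();

        Stopwatch two = Stopwatch.StartNew();
        long b = SumOnTwoThreads(n);
        two.Stop();

        Console.WriteLine("One thread:  {0} ms (sum {1})", one.ElapsedMilliseconds, a);
        Console.WriteLine("Two threads: {0} ms (sum {1})", two.ElapsedMilliseconds, b);
    }
}
```

Running it several times and averaging, as suggested in the first point, helps smooth out scheduling noise.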
