I have a simple performance test that indirectly calls WriteAsync many times. It performs reasonably as long as WriteAsync is implemented as shown below. However, when I inline WriteByte into WriteAsync, performance degrades by about a factor of 7.
(To be clear: The only change that I make is replacing the statement containing the WriteByte call with the body of WriteByte.)
Can anybody explain why this happens? I've had a look at the differences in the generated code with Reflector, but nothing struck me as so different that it would explain the huge perf hit.
public sealed override async Task WriteAsync(
byte[] buffer, int offset, int count, CancellationToken cancellationToken)
{
var writeBuffer = this.WriteBuffer;
var pastEnd = offset + count;
while ((offset < pastEnd) && ((writeBuffer.Count < writeBuffer.Capacity) ||
await writeBuffer.FlushAsync(cancellationToken)))
{
offset = WriteByte(buffer, offset, writeBuffer);
}
this.TotalCount += count;
}
private int WriteByte(byte[] buffer, int offset, WriteBuffer writeBuffer)
{
var currentByte = buffer[offset];
if (this.previousWasEscapeByte)
{
this.previousWasEscapeByte = false;
this.crc = Crc.AddCrcCcitt(this.crc, currentByte);
currentByte = (byte)(currentByte ^ Frame.EscapeXor);
++offset;
}
else
{
if (currentByte < Frame.InvalidStart)
{
this.crc = Crc.AddCrcCcitt(this.crc, currentByte);
++offset;
}
else
{
currentByte = Frame.EscapeByte;
this.previousWasEscapeByte = true;
}
}
writeBuffer[writeBuffer.Count++] = currentByte;
return offset;
}
async methods are rewritten by the compiler into a giant state machine, very similar to methods using yield return. All of your locals become fields in the state machine's class. The compiler currently doesn't try to optimize this at all, so any optimization is up to the coder.
Every local which would have sat happily in a register is now being read from and written to memory. Refactoring synchronous code out of async methods and into a sync method is a very valid performance optimization -- you're just doing the reverse!
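To make that concrete, here is a rough sketch of the transformation (illustrative only; the names are made up and the real compiler-generated state machine has many more moving parts):
// Original method: 'total' and 'i' can live in registers.
async Task<int> SumAsync(Task<int>[] tasks)
{
    int total = 0;
    for (int i = 0; i < tasks.Length; i++)
    {
        total += await tasks[i];
    }
    return total;
}
// Roughly what the compiler generates: every local and parameter is
// hoisted into a field so it survives across awaits, which means every
// read and write now goes through memory.
class SumAsyncStateMachine
{
    public Task<int>[] tasks; // hoisted parameter
    public int total;         // hoisted local
    public int i;             // hoisted local
    public int state;         // which resume point comes next
    public void MoveNext()
    {
        // switch (state) { ... } dispatching to the code after each await
    }
}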
I've been trying to solve this issue for quite some time now. I've written some example code showcasing the usage of lock in C#. Running my code manually, I can see that it works the way it should, but of course I would like to write a unit test that confirms this.
I have the following ObjectStack.cs class:
enum ExitCode
{
Success = 0,
Error = 1
}
public class ObjectStack
{
private readonly Stack<Object> _objects = new Stack<object>();
private readonly Object _lockObject = new Object();
private const int NumOfPopIterations = 1000;
public ObjectStack(IEnumerable<object> objects)
{
foreach (var anObject in objects) {
Push(anObject);
}
}
public void Push(object anObject)
{
_objects.Push(anObject);
}
public void Pop()
{
_objects.Pop();
}
public void ThreadSafeMultiPop()
{
for (var i = 0; i < NumOfPopIterations; i++) {
lock (_lockObject) {
try {
Pop();
}
//Because of lock, the stack will be emptied safely and no exception is ever caught
catch (InvalidOperationException) {
Environment.Exit((int)ExitCode.Error);
}
if (_objects.Count == 0) {
Environment.Exit((int)ExitCode.Success);
}
}
}
}
public void ThreadUnsafeMultiPop()
{
for (var i = 0; i < NumOfPopIterations; i++) {
try {
Pop();
}
//Because there is no lock, an exception is caught when popping an already empty stack
catch (InvalidOperationException) {
Environment.Exit((int)ExitCode.Error);
}
if (_objects.Count == 0) {
Environment.Exit((int)ExitCode.Success);
}
}
}
}
And Program.cs:
public class Program
{
private const int NumOfObjects = 100;
private const int NumOfThreads = 10000;
public static void Main(string[] args)
{
var objects = new List<Object>();
for (var i = 0; i < NumOfObjects; i++) {
objects.Add(new object());
}
var objectStack = new ObjectStack(objects);
Parallel.For(0, NumOfThreads, x => objectStack.ThreadUnsafeMultiPop());
}
}
I'm trying to write a unit test that tests the thread-unsafe method, by checking the exit code value (0 = success, 1 = error) of the executable.
I tried starting and running the application executable as a process in my test, a couple of hundred times, checking the exit code value each time. Unfortunately, it was 0 every single time.
Any ideas are greatly appreciated!
Logically, there is one very small piece of code where this problem can happen. Once one of the threads enters the block of code that pops a single element, either the pop will work, in which case the next line of code in that thread will Exit with success, or the pop will fail, in which case the next line of code will catch the exception and Exit with an error.
This means that no matter how much parallelization you put into the program, there is still only one single point in the whole program execution stack where the issue can occur and that is directly before the program exits.
The code is genuinely unsafe, but the probability of an issue happening in any single execution of the code is extremely low, as it requires the scheduler to decide not to execute the line of code that will exit the environment cleanly and instead let one of the other threads raise an exception and exit with an error.
It is extremely difficult to "prove" that a concurrency bug exists, except for really obvious ones, because you are completely dependent on what the scheduler decides to do.
Looking at some other posts, I see this one, which is written about Java but is also relevant to C#: How should I unit test threaded code?
It includes a link to this which might be useful to you: http://research.microsoft.com/en-us/projects/chess/
Hope this is useful and apologies if it is not. Testing concurrency is inherently unpredictable as is writing example code to cause it.
Thanks for all the input! Although I do agree that this is a concurrency issue that is quite hard to detect, due to scheduler behavior among other things, I seem to have found an acceptable solution to my problem.
I wrote the following unit test:
[TestMethod]
public void Executable_Process_Is_Thread_Safe()
{
const string executablePath = "Thread.Locking.exe";
for (var i = 0; i < 1000; i++) {
var process = new Process() {StartInfo = {FileName = executablePath}};
process.Start();
process.WaitForExit();
if (process.ExitCode == 1) {
Assert.Fail();
}
}
}
When I ran the unit test, it seemed that the Parallel.For execution in Program.cs threw strange exceptions at times, so I had to change that to traditional for-loops:
public class Program
{
private const int NumOfObjects = 100;
private const int NumOfThreads = 10000;
public static void Main(string[] args)
{
var objects = new List<Object>();
for (var i = 0; i < NumOfObjects; i++) {
objects.Add(new object());
}
var tasks = new Task[NumOfThreads];
var objectStack = new ObjectStack(objects);
for (var i = 0; i < NumOfThreads; i++)
{
var task = new Task(objectStack.ThreadUnsafeMultiPop);
tasks[i] = task;
}
for (var i = 0; i < NumOfThreads; i++)
{
tasks[i].Start();
}
//Using this seems to throw exceptions from unit test context
//Parallel.For(0, NumOfThreads, x => objectStack.ThreadUnsafeMultiPop());
}
}
Of course, the unit test is quite dependent on the machine you're running it on (a fast processor may be able to empty the stack and exit safely before reaching the critical section in all cases).
1.) You could use IL injection in a post-build step of your code to insert context switches in the form of Thread.Sleep(0) (e.g. using ILGenerator), which would most likely help these issues to arise (a hand-rolled illustration of the idea is sketched after this list).
2.) I would recommend you take a look at the CHESS project by Microsoft research team.
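As a low-tech illustration of suggestion 1 (hand-written rather than IL-injected, and only a sketch against the ObjectStack code from the question), yielding the thread between the Pop and the Count check widens the race window considerably:
public void ThreadUnsafeMultiPopWithYield()
{
    for (var i = 0; i < NumOfPopIterations; i++)
    {
        try
        {
            Pop();
        }
        catch (InvalidOperationException)
        {
            Environment.Exit((int)ExitCode.Error);
        }
        // Invite a context switch inside the critical window between
        // the Pop above and the Count check below.
        Thread.Sleep(0);
        if (_objects.Count == 0)
        {
            Environment.Exit((int)ExitCode.Success);
        }
    }
}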
A lot of the time my lists are too big to process at once, and I have to break them up into batches.
Is there a way to encapsulate this in a method or an extension? I seem to have to write this batching logic everywhere.
const int TAKE = 100;
int skip = 0;
while (skip < contacts.Count)
{
var batch = contacts.Skip(skip).Take(TAKE).ToList();
DoSomething(batch);
skip += TAKE;
}
I would like to do something like -
Batch(contacts, DoSomething);
Or anything similar, so I do not have to write this batching logic again and again.
Using the Batch solution from this thread, it seems trivial:
const int batchSize = 100;
foreach (var batch in contacts.Batch(batchSize))
{
DoSomething(batch);
}
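For reference, the Batch extension from the linked thread is along these lines (a minimal sketch, not the exact code from that answer):
public static class BatchExtensions
{
    public static IEnumerable<IEnumerable<TSource>> Batch<TSource>(
        this IEnumerable<TSource> source, int batchSize)
    {
        var batch = new List<TSource>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<TSource>(batchSize);
            }
        }
        if (batch.Count > 0)
        {
            yield return batch; // final partial batch
        }
    }
}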
If you want to also wrap it up:
public static void ProcessInBatches<TSource>(
this IEnumerable<TSource> source,
int batchSize,
Action<IEnumerable<TSource>> action)
{
foreach (var batch in source.Batch(batchSize))
{
action(batch);
}
}
So, your code can be transformed into:
const int batchSize = 100;
contacts.ProcessInBatches(batchSize, DoSomething);
Any idea why referencing the Position property of a stream drastically increases IO time?
The execution time of:
sw.Restart();
fs = new FileStream("tmp", FileMode.Open);
var br = new BinaryReader(fs);
for (int i = 0; i < n; i++)
{
fs.Position+=0; //Should be a NOOP
a[i] = br.ReadDouble();
}
Debug.Write("\n");
Debug.Write(sw.ElapsedMilliseconds.ToString());
Debug.Write("\n");
fs.Close();
sw.Stop();
Debug.Write(a[0].ToString() + "\n");
Debug.Write(a[n - 1].ToString() + "\n");
is ~100 times slower than the equivalent loop without the "fs.Position+=0;".
Normally the intention of using Seek (or manipulating the Position property) is to speed things up when you don't need all the data in the file.
But if, e.g., you only need every second value in the file, it is apparently much faster to read the entire file and discard the data you don't need than to skip every second value by moving Stream.Position.
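For example (a sketch of the comparison, assuming a file of consecutive doubles of which only every second one is needed):
// Skipping via Position: each set flushes FileStream's internal
// read buffer, so this is slow.
for (int i = 0; i < n; i++)
{
    a[i] = br.ReadDouble();
    fs.Position += sizeof(double); // skip the value we don't need
}
// Reading everything and discarding half: stays inside the buffer,
// so this is much faster despite reading twice the data.
for (int i = 0; i < n; i++)
{
    a[i] = br.ReadDouble();
    br.ReadDouble(); // read the unwanted value and throw it away
}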
You're doing two things:
Fetching the position
Setting it
It's entirely possible that each of those performs a direct interaction with the underlying Win32 APIs, whereas normally you can read a fair amount of data without having to interop with native code at all, due to buffering.
I'm slightly surprised at the extent to which it's worse, but I'm not surprised that it is worse. I think it would be worth doing separate tests to find out which has more effect - the get or the set. So write similar code which either just reads or just writes the position. That shouldn't affect the code you write later, but it may satisfy your curiosity a bit further.
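For example, the split tests might look like this (adapting the loop from the question; illustrative only):
// Get only: fetch Position but never set it.
for (int i = 0; i < n; i++)
{
    long ignored = fs.Position;
    a[i] = br.ReadDouble();
}
// Set only: assign a position we can compute without reading it
// (the value is the place the stream is already at, so the loop
// still reads the file sequentially).
for (int i = 0; i < n; i++)
{
    fs.Position = (long)i * sizeof(double);
    a[i] = br.ReadDouble();
}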
You might also try
fs.Seek(0, SeekOrigin.Current);
... which is more likely to be ignored as a genuine no-op. But even so, skipping a single byte using fs.Seek(1, SeekOrigin.Current) may well be expensive again.
From the reflector:
public override long Position {
[SecuritySafeCritical]
get {
if (this._handle.IsClosed) {
__Error.FileNotOpen();
}
if (!this.CanSeek) {
__Error.SeekNotSupported();
}
if (this._exposedHandle) {
this.VerifyOSHandlePosition();
}
return this._pos + (long)(this._readPos - this._readLen + this._writePos);
}
set {
if (value < 0L) {
throw new ArgumentOutOfRangeException("value", Environment.GetResourceString("ArgumentOutOfRange_NeedNonNegNum"));
}
if (this._writePos > 0) {
this.FlushWrite(false);
}
this._readPos = 0;
this._readLen = 0;
this.Seek(value, SeekOrigin.Begin);
}
}
(Self-explanatory) - essentially, every set causes a flush, and there is no check for whether you're setting the position to the same value; if you're not happy with FileStream, make your own stream proxy that handles position updates more gracefully :)
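For instance, such a proxy could skip no-op assignments (a minimal sketch; the class name and the choice of deriving from FileStream are my own, and a real version would need more constructors and care around buffering):
public class LazySeekFileStream : FileStream
{
    public LazySeekFileStream(string path, FileMode mode)
        : base(path, mode)
    {
    }
    public override long Position
    {
        get { return base.Position; }
        set
        {
            // Only invoke the expensive base setter (which flushes the
            // internal buffer) when the position actually changes.
            if (value != base.Position)
            {
                base.Position = value;
            }
        }
    }
}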
Yesterday I was giving a talk about the new C# "async" feature, in particular delving into what the generated code looked like, and the GetAwaiter() / BeginAwait() / EndAwait() calls.
We looked in some detail at the state machine generated by the C# compiler, and there were two aspects we couldn't understand:
Why the generated class contains a Dispose() method and a $__disposing variable, which never appear to be used (and the class doesn't implement IDisposable).
Why the internal state variable is set to 0 before any call to EndAwait(), when 0 normally appears to mean "this is the initial entry point".
I suspect the first point could be answered by doing something more interesting within the async method, although if anyone has any further information I'd be glad to hear it. This question is more about the second point, however.
Here's a very simple piece of sample code:
using System.Threading.Tasks;
class Test
{
static async Task<int> Sum(Task<int> t1, Task<int> t2)
{
return await t1 + await t2;
}
}
... and here's the code which gets generated for the MoveNext() method which implements the state machine. This is copied directly from Reflector - I haven't fixed up the unspeakable variable names:
public void MoveNext()
{
try
{
this.$__doFinallyBodies = true;
switch (this.<>1__state)
{
case 1:
break;
case 2:
goto Label_00DA;
case -1:
return;
default:
this.<a1>t__$await2 = this.t1.GetAwaiter<int>();
this.<>1__state = 1;
this.$__doFinallyBodies = false;
if (this.<a1>t__$await2.BeginAwait(this.MoveNextDelegate))
{
return;
}
this.$__doFinallyBodies = true;
break;
}
this.<>1__state = 0;
this.<1>t__$await1 = this.<a1>t__$await2.EndAwait();
this.<a2>t__$await4 = this.t2.GetAwaiter<int>();
this.<>1__state = 2;
this.$__doFinallyBodies = false;
if (this.<a2>t__$await4.BeginAwait(this.MoveNextDelegate))
{
return;
}
this.$__doFinallyBodies = true;
Label_00DA:
this.<>1__state = 0;
this.<2>t__$await3 = this.<a2>t__$await4.EndAwait();
this.<>1__state = -1;
this.$builder.SetResult(this.<1>t__$await1 + this.<2>t__$await3);
}
catch (Exception exception)
{
this.<>1__state = -1;
this.$builder.SetException(exception);
}
}
It's long, but the important lines for this question are these:
// End of awaiting t1
this.<>1__state = 0;
this.<1>t__$await1 = this.<a1>t__$await2.EndAwait();
// End of awaiting t2
this.<>1__state = 0;
this.<2>t__$await3 = this.<a2>t__$await4.EndAwait();
In both cases the state is changed again afterwards before it's next obviously observed... so why set it to 0 at all? If MoveNext() were called again at this point (either directly or via Dispose) it would effectively start the async method again, which would be wholly inappropriate as far as I can tell... and if MoveNext() isn't called, the change in state is irrelevant.
Is this simply a side-effect of the compiler reusing iterator block generation code for async, where it may have a more obvious explanation?
Important disclaimer
Obviously this is just a CTP compiler. I fully expect things to change before the final release - and possibly even before the next CTP release. This question is in no way trying to claim this is a flaw in the C# compiler or anything like that. I'm just trying to work out whether there's a subtle reason for this that I've missed :)
Okay, I finally have a real answer. I sort of worked it out on my own, but only after Lucian Wischik from the VB part of the team confirmed that there really is a good reason for it. Many thanks to him - and please visit his blog (on archive.org), which rocks.
The value 0 here is only special because it's not a valid state which you might be in just before the await in a normal case. In particular, it's not a state which the state machine may end up testing for elsewhere. I believe that using any non-positive value would work just as well: -1 isn't used because it normally means "finished". You could argue that we're giving an extra meaning to state 0 here, but ultimately it doesn't really matter. The point of this question was finding out why the state is being set at all.
The value is relevant if the await ends in an exception which is caught. We can end up coming back to the same await statement again, but we mustn't be in the state meaning "I'm just about to come back from that await" as otherwise all kinds of code would be skipped. It's simplest to show this with an example. Note that I'm now using the second CTP, so the generated code is slightly different to that in the question.
Here's the async method:
static async Task<int> FooAsync()
{
var t = new SimpleAwaitable();
for (int i = 0; i < 3; i++)
{
try
{
Console.WriteLine("In Try");
return await t;
}
catch (Exception)
{
Console.WriteLine("Trying again...");
}
}
return 0;
}
Conceptually, the SimpleAwaitable can be any awaitable - maybe a task, maybe something else. For the purposes of my tests, it always returns false for IsCompleted, and throws an exception in GetResult.
Here's the generated code for MoveNext:
public void MoveNext()
{
int returnValue;
try
{
int num3 = state;
if (num3 == 1)
{
goto Label_ContinuationPoint;
}
if (state == -1)
{
return;
}
t = new SimpleAwaitable();
i = 0;
Label_ContinuationPoint:
while (i < 3)
{
// Label_ContinuationPoint: should be here
try
{
num3 = state;
if (num3 != 1)
{
Console.WriteLine("In Try");
awaiter = t.GetAwaiter();
if (!awaiter.IsCompleted)
{
state = 1;
awaiter.OnCompleted(MoveNextDelegate);
return;
}
}
else
{
state = 0;
}
int result = awaiter.GetResult();
awaiter = null;
returnValue = result;
goto Label_ReturnStatement;
}
catch (Exception)
{
Console.WriteLine("Trying again...");
}
i++;
}
returnValue = 0;
}
catch (Exception exception)
{
state = -1;
Builder.SetException(exception);
return;
}
Label_ReturnStatement:
state = -1;
Builder.SetResult(returnValue);
}
I had to move Label_ContinuationPoint to make it valid code - otherwise it's not in the scope of the goto statement - but that doesn't affect the answer.
Think about what happens when GetResult throws its exception. We'll go through the catch block, increment i, and then loop round again (assuming i is still less than 3). We're still in whatever state we were in before the GetResult call... but when we get inside the try block we must print "In Try" and call GetAwaiter again... and we'll only do that if state isn't 1. Without the state = 0 assignment, it will use the existing awaiter and skip the Console.WriteLine call.
It's a fairly tortuous bit of code to work through, but that just goes to show the kinds of thing that the team has to think about. I'm glad I'm not responsible for implementing this :)
If it was kept at 1 (the first case), you would get a call to EndAwait without a call to BeginAwait. If it's kept at 2 (the second case), you'd get the same result, just on the other awaiter.
I'm guessing that BeginAwait returns false if it has already been started (a guess on my part) and keeps the original value to return from EndAwait. If that's the case, it would work correctly, whereas if you set the state to -1 you might have an uninitialized this.<1>t__$await1 in the first case.
This, however, assumes that BeginAwait won't actually start the action on any calls after the first, and that it will return false in those cases. Starting it again would of course be unacceptable, since it could have side effects or simply give a different result. It also assumes that EndAwait will always return the same value no matter how many times it's called, and that it can be called when BeginAwait returns false (as per the above assumption).
It would seem to be a guard against race conditions.
If we inline the statements where MoveNext is called by a different thread after the state = 0 in question, it would look something like this:
this.<a1>t__$await2 = this.t1.GetAwaiter<int>();
this.<>1__state = 1;
this.$__doFinallyBodies = false;
this.<a1>t__$await2.BeginAwait(this.MoveNextDelegate)
this.<>1__state = 0;
//second thread
this.<a1>t__$await2 = this.t1.GetAwaiter<int>();
this.<>1__state = 1;
this.$__doFinallyBodies = false;
this.<a1>t__$await2.BeginAwait(this.MoveNextDelegate)
this.$__doFinallyBodies = true;
this.<>1__state = 0;
this.<1>t__$await1 = this.<a1>t__$await2.EndAwait();
//other thread
this.<1>t__$await1 = this.<a1>t__$await2.EndAwait();
If the assumptions above are correct, there's some unneeded work done, such as getting the awaiter again and reassigning the same value to <1>t__$await1. If the state was kept at 1, then the last part would instead be:
//second thread
//I suppose this un matched call to EndAwait will fail
this.<1>t__$await1 = this.<a1>t__$await2.EndAwait();
Further, if it was set to 2, the state machine would assume it had already gotten the value of the first action, which would be untrue, and a (potentially) unassigned variable would be used to calculate the result.
Could it be something to do with stacked/nested async calls?
i.e.:
async Task m1()
{
await m2();
}
async Task m2()
{
await m3();
}
async Task m3()
{
Thread.Sleep(10000);
}
Does the MoveNext delegate get called multiple times in this situation?
Just a punt, really.
Explanation of the actual states:
Possible states:
0 - initialized (I think so) or waiting for the end of an operation
>0 - just called MoveNext, choosing the next state
-1 - ended
Is it possible that this implementation just wants to ensure that if another call to MoveNext happens from wherever (while waiting), it will re-evaluate the whole state chain from the beginning, to re-evaluate results which could already be outdated in the meantime?
I have a class that inherits from MemoryStream in order to provide some buffering. The class works exactly as expected but every now and then I get an InvalidOperationException during a Read with the error message being
Collection was modified; enumeration operation may not execute.
My code is below and the only line that enumerates a collection would seem to be:
m_buffer = m_buffer.Skip(count).ToList();
However, I have that and all other operations that can modify the m_buffer object within locks, so I'm mystified as to how a Write operation could interfere with a Read to cause that exception.
public class MyMemoryStream : MemoryStream
{
private ManualResetEvent m_dataReady = new ManualResetEvent(false);
private List<byte> m_buffer = new List<byte>();
public override void Write(byte[] buffer, int offset, int count)
{
lock (m_buffer)
{
m_buffer.AddRange(buffer.ToList().Skip(offset).Take(count));
}
m_dataReady.Set();
}
public override int Read(byte[] buffer, int offset, int count)
{
if (m_buffer.Count == 0)
{
// Block until the stream has some more data.
m_dataReady.Reset();
m_dataReady.WaitOne();
}
lock (m_buffer)
{
if (m_buffer.Count >= count)
{
// More bytes available than were requested.
Array.Copy(m_buffer.ToArray(), 0, buffer, offset, count);
m_buffer = m_buffer.Skip(count).ToList();
return count;
}
else
{
int length = m_buffer.Count;
Array.Copy(m_buffer.ToArray(), 0, buffer, offset, length);
m_buffer.Clear();
return length;
}
}
}
}
I cannot say exactly what's going wrong from the code you posted, but one oddity is that you lock on m_buffer yet replace the buffer, so the object locked on is not always the collection being read and modified.
It is good practice to use a dedicated private readonly object for the locking:
private readonly object locker = new object();
// ...
lock(locker)
{
// ...
}
You have at least one data race there: in the Read method, if you're pre-empted after the if (m_buffer.Count == 0) block and before the lock, Count can be 0 again. You should check the count inside the lock, and use Monitor.Wait, Monitor.Pulse and/or Monitor.PulseAll for the wait/signal coordination, like this:
// On Write
lock (m_buffer)
{
    // ...
    Monitor.PulseAll(m_buffer);
}
// On Read
lock (m_buffer)
{
    while (m_buffer.Count == 0)
        Monitor.Wait(m_buffer);
    // ...
}
You have to protect all accesses to m_buffer, and calling m_buffer.Count is not special in that regard.
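Putting both answers' suggestions together, the buffer handling could look like this (a sketch under the assumptions above, using a dedicated lock object and mutating the list in place instead of replacing it):
private readonly object m_lock = new object();
private readonly List<byte> m_buffer = new List<byte>();
public override void Write(byte[] buffer, int offset, int count)
{
    lock (m_lock)
    {
        m_buffer.AddRange(buffer.Skip(offset).Take(count));
        Monitor.PulseAll(m_lock); // wake any reader waiting for data
    }
}
public override int Read(byte[] buffer, int offset, int count)
{
    lock (m_lock)
    {
        while (m_buffer.Count == 0)
        {
            Monitor.Wait(m_lock); // releases the lock while waiting
        }
        int length = Math.Min(count, m_buffer.Count);
        m_buffer.CopyTo(0, buffer, offset, length);
        m_buffer.RemoveRange(0, length);
        return length;
    }
}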
Do you modify the contents of buffer in another thread somewhere? I suspect that may be the enumeration giving the error, rather than m_buffer.