C# MongoDB: possible deadlock in FindAsync extension

I'm trying to cut down on the code I'm writing.
For every FindAsync call, we have to write this piece of code:
using (var cursor = await SomeCollection.FindAsync(filter, options))
{
    while (await cursor.MoveNextAsync())
    {
        var batch = cursor.Current;
        foreach (var item in batch)
        {
            //do something
        }
    }
}
I came up with an extension method. My code:
public static async Task<List<T>> GetResultsFromFindAsync<T>(this Task<IAsyncCursor<T>> find)
{
    List<T> result = new List<T>();
    using (var cursor = await find)
    {
        while (await cursor.MoveNextAsync())
        {
            var batch = cursor.Current;
            foreach (var item in batch)
            {
                result.Add(item);
            }
        }
    }
    return result;
}
and now, I only use:
List<MyObject> lst = await SomeCollection.FindAsync(filter, options).GetResultsFromFindAsync();
The question is: can this cause deadlocks, since there are two async tasks involved in a process with a single await clause?
I know that the process really contains two await clauses, but won't they conflict with each other, or possibly cause data loss?
I executed a FindAsync using this extension and got my data, so it's working, but a single passing test doesn't guarantee that a deadlock can never occur.
I would very much like to know why or why not.
Thanks.

Can this cause deadlocks? Yes, but not for the reasons you listed.
If someone decides to call your method synchronously and uses Wait or Result on the returned Task, they may get a deadlock. SO is littered with "why does my task never complete" questions for that exact reason. Use ConfigureAwait(false) on all Tasks awaited inside your GetResultsFromFindAsync to safeguard yourself.
With that said, if GetResultsFromFindAsync is always consumed asynchronously, there is no issue.
With regards to there only being a "single await clause" - that's not actually true. You are awaiting the Task returned by FindAsync inside your GetResultsFromFindAsync implementation, thus propagating its completion and exceptions (if any). When you call
SomeCollection.FindAsync(filter, options).GetResultsFromFindAsync()
, you are in effect awaiting both Tasks (as the inner Task is awaited by the outer Task). If FindAsync happens to throw (asynchronously), the Task returned by GetResultsFromFindAsync will automatically transition to Faulted state and the exception will be re-thrown when the outer Task is awaited.
In conclusion, there is nothing technically wrong with the method you have written, although introducing ConfigureAwait(false) wouldn't hurt.
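As a minimal sketch of the ConfigureAwait(false) suggestion: the IBatchCursor<T> interface and the ListCursor<T> class below are hypothetical stand-ins for the driver's IAsyncCursor<T>, introduced only so the example compiles without the MongoDB package. The extension method itself has the same shape as the one in the question, with ConfigureAwait(false) added to each await.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for the driver's IAsyncCursor<T> (assumption, not the real interface).
public interface IBatchCursor<T> : IDisposable
{
    IEnumerable<T> Current { get; }
    Task<bool> MoveNextAsync();
}

public static class CursorExtensions
{
    public static async Task<List<T>> GetResultsAsync<T>(this Task<IBatchCursor<T>> find)
    {
        var result = new List<T>();
        // ConfigureAwait(false): do not resume on the caller's captured context.
        using (var cursor = await find.ConfigureAwait(false))
        {
            while (await cursor.MoveNextAsync().ConfigureAwait(false))
            {
                result.AddRange(cursor.Current);
            }
        }
        return result;
    }
}

// In-memory cursor used only to exercise the extension method.
public sealed class ListCursor<T> : IBatchCursor<T>
{
    private readonly Queue<List<T>> _batches;

    public ListCursor(IEnumerable<List<T>> batches)
    {
        _batches = new Queue<List<T>>(batches);
    }

    public IEnumerable<T> Current { get; private set; } = new List<T>();

    public Task<bool> MoveNextAsync()
    {
        if (_batches.Count == 0) return Task.FromResult(false);
        Current = _batches.Dequeue();
        return Task.FromResult(true);
    }

    public void Dispose() { }
}
```

With this in place, a caller that blocks on the returned Task no longer depends on the awaits inside the method being able to get back onto the caller's context.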
EDIT
Having said all of the above, I would personally consider merging the two calls into one, so that calling FindAsync is the responsibility of GetResultsFromFindAsync. This is to prevent consumers from reusing the cursor returned by FindAsync as I expect the following will fail:
Task<IAsyncCursor<MyObject>> find = SomeCollection.FindAsync(filter, options);
List<MyObject> lst1 = await find.GetResultsFromFindAsync();
List<MyObject> lst2 = await find.GetResultsFromFindAsync(); // BOOM.

The .NET driver already offers extension methods for this type of thing. ToList, ToListAsync, and ForEachAsync are all available.
http://mongodb.github.io/mongo-csharp-driver/2.2/reference/driver/crud/reading/#iteration

Related

Awaiting async tasks instantly vs declaring first and then awaiting

Let's look at the following 2 examples:
public class MyClass
{
    public async Task Main()
    {
        var result1 = "";
        var result2 = "";
        var request1 = await DelayMe();
        var request2 = await DelayMe();
        result1 = request1;
        result2 = request2;
    }
    private static async Task<String> DelayMe()
    {
        await Task.Delay(2000);
        return "";
    }
}
And:
public class MyClass
{
    public async Task Main()
    {
        var result1 = "";
        var result2 = "";
        var request1 = DelayMe();
        var request2 = DelayMe();
        result1 = await request1;
        result2 = await request2;
    }
    private static async Task<String> DelayMe()
    {
        await Task.Delay(2000);
        return "";
    }
}
The first example shows how you would typically write async/await code, where one thing happens after the other and each call is awaited properly.
The second one calls the async Task method first and awaits it later.
The first example takes a bit over 4000ms to execute because the await waits for the first request to complete before the second one is made; the second example takes a bit over 2000ms. This happens because the Task actually starts running as soon as execution steps over the var request1 = DelayMe(); line, which means that request1 and request2 are running in parallel. At this point it looks like the await keyword just ensures that the Task has completed.
The second approach feels and acts like an await Task.WhenAll(request1, request2), but in this scenario, if something fails in the 2 requests, you will get an exception instantly instead of waiting for everything to compute and then getting an AggregateException.
My question is: is there a drawback (performance or otherwise) in using the second approach to run multiple awaitable Tasks in parallel when the result of one doesn't depend on the execution of the other? Looking at the lowered code, it looks like the second example generates an equal amount of System.Threading.Tasks.Task`1 per awaited item while the first one doesn't. Is this still going through the async/await state-machine flow?
"if something fails in the 2 requests, you will get an exception instantly instead of waiting for everything to compute and then getting an AggregateException."
If something fails in the first request, then yes. If something fails in the second request, then no, you wouldn't check the second request results until that task is awaited.
"My question is: is there a drawback (performance or otherwise) in using the second approach to run multiple awaitable Tasks in parallel when the result of one doesn't depend on the execution of the other? Looking at the lowered code, it looks like the second example generates an equal amount of System.Threading.Tasks.Task`1 per awaited item while the first one doesn't. Is this still going through the async/await state-machine flow?"
It's still going through the state machine flow. I tend to recommend await Task.WhenAll because the intent of the code is more explicit, but there are some people who don't like the "always wait even when there are exceptions" behavior. The flip side to that is that Task.WhenAll always collects all the exceptions - if you have fail-fast behavior, then some exceptions could be ignored.
Regarding performance, concurrent execution would be better because you can do multiple operations concurrently. There's no danger of threadpool exhaustion from this because async/await does not use additional threads.
As a side note, I recommend using the term "asynchronous concurrency" for this rather than "parallel", since to many people "parallel" implies parallel processing, i.e., Parallel or PLINQ, which would be the wrong technologies to use in this case.
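The difference in exception behavior can be sketched as follows. The FailAfter helper and the class name are hypothetical, made up for this illustration. Both tasks fail, the second one earlier than the first, yet awaiting them one by one only surfaces the failure of the task that is awaited first, while the task returned by Task.WhenAll retains both exceptions.

```csharp
using System;
using System.Threading.Tasks;

public static class ExceptionDemo
{
    // Hypothetical helper: fails after the given delay.
    private static async Task<string> FailAfter(int ms, string message)
    {
        await Task.Delay(ms);
        throw new InvalidOperationException(message);
    }

    // Awaiting one by one: only the failure of the first awaited task is observed.
    public static async Task<string> AwaitSeparately()
    {
        var first = FailAfter(50, "first");
        var second = FailAfter(10, "second"); // fails earlier, but is never observed
        try
        {
            await first;
            await second; // not reached, because awaiting 'first' throws
            return "none";
        }
        catch (InvalidOperationException ex)
        {
            return ex.Message;
        }
    }

    // Task.WhenAll: await rethrows a single exception,
    // but the combined task's Exception property holds all of them.
    public static async Task<int> AwaitWithWhenAll()
    {
        var all = Task.WhenAll(FailAfter(50, "first"), FailAfter(10, "second"));
        try
        {
            await all;
            return 0;
        }
        catch (InvalidOperationException)
        {
            return all.Exception.InnerExceptions.Count;
        }
    }
}
```

Here AwaitSeparately reports the message of the first task even though the second one failed sooner, and AwaitWithWhenAll counts two inner exceptions on the combined task.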
The drawback of using the second approach to run multiple awaitable tasks in parallel is that the parallelism is not obvious. And non-obvious parallelism (implicit multithreading, in other words) is dangerous because the bugs it can introduce are notoriously inconsistent and only sporadically observed. Let's suppose that the actual DelayMe running in the production environment was the one below:
private static int delaysCount = 0;
private static async Task<String> DelayMe()
{
    await Task.Delay(2000);
    return (++delaysCount).ToString();
}
Sequentially awaited calls to DelayMe will return increasing numbers. Parallelly awaited calls will occasionally return the same number.
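One way to make such a counter safe under concurrent calls is an atomic increment. The sketch below is an illustration under that assumption, not the only possible fix; the class name and the RunConcurrently helper are made up for the example, and the delay is shortened.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Counter
{
    private static int _delaysCount = 0;

    public static async Task<string> DelayMe()
    {
        await Task.Delay(20);
        // Interlocked.Increment performs the read-modify-write as one atomic step,
        // so two continuations running at the same time cannot get the same value.
        return Interlocked.Increment(ref _delaysCount).ToString();
    }

    // Starts n calls without awaiting them individually, then waits for all of them.
    public static async Task<string[]> RunConcurrently(int n)
    {
        var tasks = Enumerable.Range(0, n).Select(_ => DelayMe()).ToArray();
        return await Task.WhenAll(tasks);
    }
}
```

With the atomic increment, every concurrently awaited call returns a distinct number, although the order in which the numbers are handed out is still not deterministic.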

Parallelize without Task.Run()

Is it possible to let something run in the background (for example, a DB query) without spawning a new thread with Task.Run()?
Example Db query
public async Task Query(long objectToDelete)
{
    using (var ctx = new Context())
    {
        Car car = new Car { Id = objectToDelete };
        // Mark the stub entity as deleted so SaveChangesAsync issues the DELETE
        ctx.Entry(car).State = EntityState.Deleted;
        await ctx.SaveChangesAsync();
    }
}
From what I understand, Query runs synchronously up until the first "await", and then control is returned to the caller.
I wonder if I can leave the caller like this:
public async Task Caller(long id)
{
    var runningQuery = Query(id);
    /* do something else
    */
    // await runningQuery; // commented out so "Caller" can complete early
}
You can do exactly this. The query is running as soon as SaveChangesAsync returns. await pauses execution until the task is done. So don't use await (you did this correctly).
Note, that fire and forget work is very difficult. You need to make sure to log errors. Also, you often cannot rely on the work to ever complete. There might be an error (like a network blip or a deadlock or a timeout) or your whole process might exit before the query is complete. This is especially relevant in ASP.NET. Consider fire and forget work optional if you do not wait for it at any point.
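The point about logging errors can be sketched like this. The Forget extension method and the Errors queue are hypothetical names for this illustration, with the queue standing in for whatever logger the application really uses.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class FireAndForget
{
    // Stand-in for a real logger (assumption for this sketch).
    public static readonly ConcurrentQueue<string> Errors = new ConcurrentQueue<string>();

    // Observes the task so that a failure is recorded instead of being silently lost.
    // The returned task never faults; callers may ignore it.
    public static async Task Forget(this Task task)
    {
        try
        {
            await task.ConfigureAwait(false);
        }
        catch (Exception ex)
        {
            Errors.Enqueue(ex.Message);
        }
    }
}
```

In the Caller method above this would read `_ = Query(id).Forget();`. It only covers the logging concern; the work can still be cut short if the process exits first.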
The await keyword will halt execution within the method that it is called and return execution to the method that called that method. So if you wanted to return from "Caller" early, you would just leave the
await runningQuery;
in place.
This Microsoft Page explains it. If you look at the diagram under the heading What Happens in an Async Method , you get a good visual reference of the logic flow.

async function never executed why in c# with quartz task backgrounder in c# [duplicate]

I have a multi-tier .Net 4.5 application calling a method using C#'s new async and await keywords that just hangs and I can't see why.
At the bottom I have an async method that extends our database utility OurDBConn (basically a wrapper for the underlying DBConnection and DBCommand objects):
public static async Task<T> ExecuteAsync<T>(this OurDBConn dataSource, Func<OurDBConn, T> function)
{
    string connectionString = dataSource.ConnectionString;
    // Start the SQL and pass back to the caller until finished
    T result = await Task.Run(
        () =>
        {
            // Copy the SQL connection so that we don't get two commands running at the same time on the same open connection
            using (var ds = new OurDBConn(connectionString))
            {
                return function(ds);
            }
        });
    return result;
}
Then I have a mid level async method that calls this to get some slow running totals:
public static async Task<ResultClass> GetTotalAsync( ... )
{
    var result = await this.DBConnection.ExecuteAsync<ResultClass>(
        ds => ds.Execute("select slow running data into result"));
    return result;
}
Finally I have a UI method (an MVC action) that runs synchronously:
Task<ResultClass> asyncTask = midLevelClass.GetTotalAsync(...);
// do other stuff that takes a few seconds
ResultClass slowTotal = asyncTask.Result;
The problem is that it hangs on that last line forever. It does the same thing if I call asyncTask.Wait(). If I run the slow SQL method directly it takes about 4 seconds.
The behaviour I'm expecting is that when it gets to asyncTask.Result, if it's not finished it should wait until it is, and once it is it should return the result.
If I step through with a debugger the SQL statement completes and the lambda function finishes, but the return result; line of GetTotalAsync is never reached.
Any idea what I'm doing wrong?
Any suggestions to where I need to investigate in order to fix this?
Could this be a deadlock somewhere, and if so is there any direct way to find it?
Yep, that's a deadlock all right. And a common mistake with the TPL, so don't feel bad.
When you write await foo, the runtime, by default, schedules the continuation of the function on the same SynchronizationContext that the method started on. In English, let's say you called your ExecuteAsync from the UI thread. Your query runs on the threadpool thread (because you called Task.Run), but you then await the result. This means that the runtime will schedule your "return result;" line to run back on the UI thread, rather than scheduling it back to the threadpool.
So how does this deadlock? Imagine you just have this code:
var task = dataSource.ExecuteAsync(_ => 42);
var result = task.Result;
So the first line kicks off the asynchronous work. The second line then blocks the UI thread. So when the runtime wants to run the "return result" line back on the UI thread, it can't do that until the Result completes. But of course, the Result can't be given until the return happens. Deadlock.
This illustrates a key rule of using the TPL: when you use .Result on a UI thread (or some other fancy sync context), you must be careful to ensure that nothing that Task is dependent upon is scheduled to the UI thread. Or else evilness happens.
So what do you do? Option #1 is use await everywhere, but as you said that's already not an option. Second option which is available for you is to simply stop using await. You can rewrite your two functions to:
public static Task<T> ExecuteAsync<T>(this OurDBConn dataSource, Func<OurDBConn, T> function)
{
    string connectionString = dataSource.ConnectionString;
    // Start the SQL and pass back to the caller until finished
    return Task.Run(
        () =>
        {
            // Copy the SQL connection so that we don't get two commands running at the same time on the same open connection
            using (var ds = new OurDBConn(connectionString))
            {
                return function(ds);
            }
        });
}
public static Task<ResultClass> GetTotalAsync( ... )
{
    return this.DBConnection.ExecuteAsync<ResultClass>(
        ds => ds.Execute("select slow running data into result"));
}
What's the difference? There's now no awaiting anywhere, so nothing being implicitly scheduled to the UI thread. For simple methods like these that have a single return, there's no point in doing an "var result = await...; return result" pattern; just remove the async modifier and pass the task object around directly. It's less overhead, if nothing else.
Option #3 is to specify that you don't want your awaits to schedule back to the UI thread, but just schedule to the thread pool. You do this with the ConfigureAwait method, like so:
public static async Task<ResultClass> GetTotalAsync( ... )
{
    var resultTask = this.DBConnection.ExecuteAsync<ResultClass>(
        ds => ds.Execute("select slow running data into result"));
    // Resume on the thread pool instead of the captured context
    return await resultTask.ConfigureAwait(false);
}
Awaiting a task normally would schedule to the UI thread if you're on it; awaiting the result of ConfigureAwait(false) will ignore whatever context you are on, and always schedule to the threadpool. The downside of this is you have to sprinkle this everywhere in all functions your .Result depends on, because any missed .ConfigureAwait might be the cause of another deadlock.
This is the classic mixed-async deadlock scenario, as I describe on my blog. Jason described it well: by default, a "context" is saved at every await and used to continue the async method. This "context" is the current SynchronizationContext unless it is null, in which case it is the current TaskScheduler. When the async method attempts to continue, it first re-enters the captured "context" (in this case, an ASP.NET SynchronizationContext). The ASP.NET SynchronizationContext only permits one thread in the context at a time, and there is already a thread in the context - the thread blocked on Task.Result.
There are two guidelines that will avoid this deadlock:
Use async all the way down. You mention that you "can't" do this, but I'm not sure why not. ASP.NET MVC on .NET 4.5 can certainly support async actions, and it's not a difficult change to make.
Use ConfigureAwait(continueOnCapturedContext: false) as much as possible. This overrides the default behavior of resuming on the captured context.
I was in the same deadlock situation, but in my case I was calling an async method from a sync method. What worked for me was:
private static SiteMetadataCacheItem GetCachedItem()
{
    TenantService TS = new TenantService(); // my service datacontext
    var CachedItem = Task.Run(async () =>
        await TS.GetTenantDataAsync(TenantIdValue)
    ).Result; // doesn't deadlock anymore
}
Is this a good approach? Any ideas?
Just to add to the accepted answer (not enough rep to comment), I had this issue arise when blocking using task.Result, even though every await below it had ConfigureAwait(false), as in this example:
public Foo GetFooSynchronous()
{
    var foo = new Foo();
    foo.Info = GetInfoAsync().Result; // often deadlocks in ASP.NET
    return foo;
}
private async Task<string> GetInfoAsync()
{
    return await ExternalLibraryStringAsync().ConfigureAwait(false);
}
The issue actually lay with the external library code. The async library method tried to continue in the calling sync context, no matter how I configured the await, leading to deadlock.
Thus, the answer was to roll my own version of the external library code ExternalLibraryStringAsync, so that it would have the desired continuation properties.
wrong answer for historical purposes
After much pain and anguish, I found the solution buried in this blog post (Ctrl-f for 'deadlock'). It revolves around using task.ContinueWith, instead of the bare task.Result.
Previously deadlocking example:
public Foo GetFooSynchronous()
{
    var foo = new Foo();
    foo.Info = GetInfoAsync().Result; // often deadlocks in ASP.NET
    return foo;
}
private async Task<string> GetInfoAsync()
{
    return await ExternalLibraryStringAsync().ConfigureAwait(false);
}
Avoid the deadlock like this:
public Foo GetFooSynchronous()
{
    var foo = new Foo();
    GetInfoAsync() // ContinueWith doesn't run until the task is complete
        .ContinueWith(task => foo.Info = task.Result);
    return foo;
}
private async Task<string> GetInfoAsync()
{
    return await ExternalLibraryStringAsync().ConfigureAwait(false);
}
Quick answer:
Change this line
ResultClass slowTotal = asyncTask.Result;
to
ResultClass slowTotal = await asyncTask;
Why? You should not use .Result to get the result of a task in most applications other than console applications; if you do, your program can hang when it gets there.
You can also try the code below if you want to keep using .Result:
ResultClass slowTotal = Task.Run(async ()=>await asyncTask).Result;

How to convert this Parallel.ForEach code to async/await

I'm having some trouble getting my head around async/await. I'm helping with an existing code base that has the following code (simplified, for brevity):
List<BuyerContext> buyerContexts = GetBuyers();
var results = new List<Result>();
Parallel.ForEach(buyerContexts, buyerContext =>
{
    //The following call creates a connection to a remote web server that
    //can take up to 15 seconds to respond
    var result = Bid(buyerContext);
    if (result != null)
        results.Add(result);
});
foreach (var result in results)
{
    // do some work here that is predicated on the
    // Parallel.ForEach having completed all of its calls
}
How can I convert this code to asynchronous code instead of parallel using async/await? I'm suffering from some pretty severe performance issues that I believe are a result of using a parallel approach to multiple network I/O operations.
I've tried several approaches myself but I'm getting warnings from Visual Studio that my code will execute synchronously or that I can't use await keywords outside of an async method so I'm sure I'm just missing something simple.
EDIT #1: I'm open to alternatives to async/await as well. That just seems to be the proper approach based on my reading so far.
EDIT #2: This application is a Windows Service. It calls out to several "buyers" to ask them to bid on a particular piece of data. I need ALL of the bids back before processing can continue.
The key to "making things async" is to start at the leaves. In this case, start in your network code (not shown), and change whatever synchronous call you have (e.g., WebClient.DownloadString) to the corresponding asynchronous call (e.g., HttpClient.GetStringAsync). Then await that call.
Using await will force the calling method to be async, and change its return type from T to Task<T>. It is also a good idea at this point to add the Async suffix so you're following the well-known convention. Then take all of that method's callers and change them to use await as well, which will then require them to be async, etc. Repeat until you have a BidAsync method to use.
Then you should look at replacing your parallel loop; this is pretty easy to do with Task.WhenAll:
List<BuyerContext> buyerContexts = GetBuyers();
var tasks = buyerContexts.Select(buyerContext => BidAsync(buyerContext));
var results = await Task.WhenAll(tasks);
foreach (var result in results)
{
    ...
}
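A self-contained sketch of this pattern follows, including the null check from the original loop. BuyerContext, Result and BidAsync here are hypothetical stand-ins for the asker's types and network call: the fake BidAsync only waits briefly and lets odd-numbered buyers decline.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-ins for the types in the question.
public sealed class BuyerContext { public int Id { get; set; } }
public sealed class Result { public int BuyerId { get; set; } }

public static class Bidding
{
    // Fake network call: odd-numbered buyers return no bid (null).
    public static async Task<Result> BidAsync(BuyerContext buyer)
    {
        await Task.Delay(10);
        return buyer.Id % 2 == 0 ? new Result { BuyerId = buyer.Id } : null;
    }

    public static async Task<List<Result>> CollectBidsAsync(IEnumerable<BuyerContext> buyers)
    {
        // ToList forces the Select to run now, so every request is started exactly once.
        var tasks = buyers.Select(b => BidAsync(b)).ToList();
        // WhenAll completes when every bid is back; results keep the order of the tasks.
        Result[] results = await Task.WhenAll(tasks);
        // Drop buyers that did not bid, as the original loop did.
        return results.Where(r => r != null).ToList();
    }
}
```

Because the results come back as an array from Task.WhenAll, no shared list is mutated from several operations at once, so no locking is needed.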
Basically, to make use of async-await, the Bid method should have this signature instead of the current one:
public async Task<Result> BidAsync(BuyerContext buyerContext);
This will allow you to use await in this method. Now, every time you make a network call, you basically need to await it. For example, here's how to modify the call and signature of a synchronous method to an asynchronous one.
Before
//Signature
public string ReceiveStringFromClient();
//Call
string messageFromClient = ReceiveStringFromClient();
After
//Signature
public Task<string> ReceiveStringFromClientAsync();
//Call
string messageFromClient = await ReceiveStringFromClientAsync();
If you still need to be able to make synchronous calls to these methods, I would recommend creating new ones suffixed with "Async".
Now you need to do this on every level until you reach your network calls, at which point you'll be able to await .Net's async methods. They normally have the same name as their synchronous version, suffixed with "Async".
Once you've done all that, you can make use of this in your main code. I would do something along these lines:
List<BuyerContext> buyerContexts = GetBuyers();
var results = new List<Result>();
List<Task> tasks = new List<Task>();
//There really is no need for Parallel.ForEach unless you have hundreds of thousands of requests to make.
//If that's the case, I hope you have a good network interface!
foreach (var buyerContext in buyerContexts)
{
    var task = Task.Run(async () =>
    {
        var result = await BidAsync(buyerContext);
        if (result != null)
        {
            // List<T> is not thread-safe, so guard the shared list
            lock (results)
            {
                results.Add(result);
            }
        }
    });
    tasks.Add(task);
}
//Block the current thread until all the calls are completed
Task.WaitAll(tasks.ToArray());
foreach (var result in results)
{
    // do some work here that is predicated on
    // all of the calls having completed
}

How to call an asynchronous (await) method synchronously?

I'm using the .NET API available from parse.com,
https://parse.com/docs/dotnet_guide#objects-saving
A snippet of my code looks like this;
public async void UploadCurrentXML()
{
    ...
    var query = ParseObject.GetQuery("RANDOM_TABLE").WhereEqualTo("some_field", "string");
    var count = await query.CountAsync();
    ParseObject temp_A;
    temp_A = await query.FirstAsync();
    ...
    // do lots of stuff
    ...
    await temp_A.SaveAsync();
}
To summarize; A query is made to a remote database. From the result a specific object (or its reference) is obtained from the database. Multiple operations are performed on the object and in the end, its saved back into the database.
All the database operations happen via await ParseObject.randomfunction() . Is it possible to call these functions in a synchronous manner? Or at least wait till the operation returns without moving on? The application is designed for maintenance purposes and time of operation is NOT an issue.
I'm asking this because as things stand, I get an error which states
The number of count operations in progress has reached its limit.
I've tried,
var count = await query.CountAsync().ConfigureAwait(false);
in all the await calls, but it doesn't help - the code is still running asynchronously.
var count = query.CountAsync().Result;
causes the application to get stuck - fairly certain that I've hit a deadlock.
A bit of searching led me to this question,
How would I run an async Task<T> method synchronously?
But I don't understand how it could apply to my case, since I do not have access to the source of ParseObject. Help? (Am using .NET 4.5)
I recommend that you use asynchronous programming throughout. If you're running into some kind of resource issue (i.e., multiple queries on a single db not allowed), then you should structure your code so that cannot happen (e.g., disabling UI buttons while operations are in flight). Or, if you must, you can use SemaphoreSlim to throttle your async code:
private readonly SemaphoreSlim _mutex = new SemaphoreSlim(1);
public async Task UploadCurrentXMLAsync()
{
    await _mutex.WaitAsync();
    try
    {
        ...
        var query = ParseObject.GetQuery("RANDOM_TABLE").WhereEqualTo("some_field", "string");
        var count = await query.CountAsync();
        ParseObject temp_A;
        temp_A = await query.FirstAsync();
        ...
        // do lots of stuff
        ...
        await temp_A.SaveAsync();
    }
    finally
    {
        _mutex.Release();
    }
}
But if you really, really want to synchronously block, you can do it like this:
public async Task UploadCurrentXMLAsync();
Task.Run(() => UploadCurrentXMLAsync()).Wait();
Again, I just can't recommend this last "solution", which is more of a hack than a proper solution.
If the API method returns a Task, you can get its awaiter and retrieve the result synchronously:
api.DoWorkAsync().GetAwaiter().GetResult();
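One practical difference between GetAwaiter().GetResult() and .Result is how a failure surfaces, sketched below with a hypothetical Fail helper. Both calls still block the calling thread, so neither avoids the deadlock described earlier when a synchronization context is involved.

```csharp
using System;
using System.Threading.Tasks;

public static class BlockingDemo
{
    // Hypothetical helper: a task that has already failed.
    private static Task<int> Fail() =>
        Task.FromException<int>(new InvalidOperationException("boom"));

    // .Result wraps the failure in an AggregateException.
    public static string ViaResult()
    {
        try { return Fail().Result.ToString(); }
        catch (Exception ex) { return ex.GetType().Name; }
    }

    // GetAwaiter().GetResult() rethrows the original exception, as await would.
    public static string ViaGetResult()
    {
        try { return Fail().GetAwaiter().GetResult().ToString(); }
        catch (Exception ex) { return ex.GetType().Name; }
    }
}
```

ViaResult reports an AggregateException while ViaGetResult reports the InvalidOperationException itself, which usually makes catch blocks in synchronous callers simpler to write.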
