Loosely-ordered concurrency with loops?


I have the following code that is meant to fetch data from a REST service (the Get(i) calls), then populate a matrix (not shown; this happens within addLabels()) with their relationships.
All of the Get() calls can run in parallel with each other, but they must all finish before anything enters the second loop (where, once again, the calls can run in parallel with each other). The addLabels() calls depend on the work from the Get() calls being complete.
**To anyone stumbling across this post, this code is the solution:**
private async void GetTypeButton_Click(object sender, RoutedEventArgs e)
{
await PokeType.InitTypes(); // initializes relationships in the matrix
var table = PokeType.EffectivenessMatrix;
// pretty-printing the table
// ...
// ...
}
private static bool initialized = false;
public static async Task InitTypes()
{
if (initialized) return;
// await suspends InitTypes (without blocking the thread) until the first batch is finished
await Task.WhenAll(Enumerable.Range(1, NUM_TYPES /* inclusive */).Select(i => Get(i)));
// doesn't need to be parallelized because it's quick work.
foreach(PokeType type in cachedTypes.Values)
{
JObject data = type.GetJsonFromCache();
addLabels(type, (JArray)data["super_effective"], Effectiveness.SuperEffectiveAgainst);
addLabels(type, (JArray)data["ineffective"], Effectiveness.NotVeryEffectiveAgainst);
addLabels(type, (JArray)data["no_effect"], Effectiveness.UselessAgainst);
}
initialized = true;
}
public static async Task<PokeType> Get(int id);
As the code is currently written, the InitTypes() method attempts to enter both loops simultaneously; the cachedTypes dictionary is empty because the first loop hasn't finished populating it yet, so it never runs and no relationships are constructed.
How can I properly structure this function? Thanks!

Parallel and async-await don't go together well. Your async lambda expression is actually async void since Parallel.For expects an Action<int>, which means that Parallel.For can't wait for that operation to complete.
If you're trying to call Get(i) multiple times concurrently and wait for them to complete before moving on, you need to use Task.WhenAll:
await Task.WhenAll(Enumerable.Range(1, NUM_TYPES).Select(i => Get(i)));
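As a rough illustration of the two-phase pattern, here is a minimal sketch. GetAsync is a hypothetical stand-in for the REST call in the question; the Task.Delay and the returned strings are assumptions made purely for demonstration.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class TwoPhaseDemo
{
    // Hypothetical stand-in for the REST call; not the asker's real Get(i).
    public static async Task<string> GetAsync(int id)
    {
        await Task.Delay(10); // simulate network latency
        return $"type-{id}";
    }

    public static async Task<string[]> LoadAllAsync(int count)
    {
        // Phase 1: Select starts every call as WhenAll enumerates it;
        // the await resumes only after all of them have finished.
        string[] results = await Task.WhenAll(
            Enumerable.Range(1, count).Select(i => GetAsync(i)));

        // Phase 2 would go here: every result is available at this point,
        // in the same order in which the tasks were supplied.
        return results;
    }
}
```

Because the lambda returns the Task from GetAsync (rather than being an async void delegate), WhenAll has something to wait on.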

Related

How can I asynchronously transform one IEnumerable to another, just like LINQ's Select(), but using await on every transformed item?

Consider this situation:
class Product { }
interface IWorker
{
Task<Product> CreateProductAsync();
}
I am now given an IEnumerable<IWorker> workers and am supposed to create an IEnumerable<Product> from it that I have to pass to some other function that I cannot alter:
void CheckProducts(IEnumerable<Product> products);
This method needs to have access to the entire IEnumerable<Product>. It is not possible to subdivide it and call CheckProducts on multiple subsets.
One obvious solution is this:
CheckProducts(workers.Select(worker => worker.CreateProductAsync().Result));
But this is blocking, of course, and hence it would only be my last resort.
Syntactically, I need precisely this, just without blocking.
I cannot use await inside of the function I'm passing to Select() as I would have to mark it as async and that would require it to return a Task itself and I would have gained nothing. In the end I need an IEnumerable<Product> and not an IEnumerable<Task<Product>>.
It is important to know that the order of the workers creating their products does matter, their work must not overlap. Otherwise, I would do this:
async Task<IEnumerable<Product>> CreateProductsAsync(IEnumerable<IWorker> workers)
{
var tasks = workers.Select(worker => worker.CreateProductAsync());
return await Task.WhenAll(tasks);
}
But unfortunately, Task.WhenAll() executes some tasks in parallel while I need them executed sequentially.
Here is one possibility to implement it if I had an IReadOnlyList<IWorker> instead of an IEnumerable<IWorker>:
async Task<IEnumerable<Product>> CreateProductsAsync(IReadOnlyList<IWorker> workers)
{
var resultList = new Product[workers.Count];
for (int i = 0; i < resultList.Length; ++i)
resultList[i] = await workers[i].CreateProductAsync();
return resultList;
}
But I must deal with an IEnumerable and, even worse, it is usually quite huge, sometimes it is even unlimited, yielding workers forever. If I knew that its size was decent, I would just call ToArray() on it and use the method above.
The ultimate solution would be this:
async Task<IEnumerable<Product>> CreateProductsAsync(IEnumerable<IWorker> workers)
{
foreach (var worker in workers)
yield return await worker.CreateProductAsync();
}
But yield and await are incompatible as described in this answer. Looking at that answer, would that hypothetical IAsyncEnumerator help me here? Does something similar meanwhile exist in C#?
A summary of the issues I'm facing:
I have a potentially endless IEnumerable<IWorker>
I want to asynchronously call CreateProductAsync() on each of them in the same order as they are coming in
In the end I need an IEnumerable<Product>
A summary of what I already tried, but doesn't work:
I cannot use Task.WhenAll() because it executes tasks in parallel.
I cannot use ToArray() and process that array manually in a loop because my sequence is sometimes endless.
I cannot use yield return because it's incompatible with await.
Does anybody have a solution or workaround for me?
Otherwise I will have to use that blocking code...

IEnumerator<T> is a synchronous interface, so blocking is unavoidable if CheckProducts enumerates the next product before the next worker has finished creating the product.
Nevertheless, you can achieve parallelism by creating products on another thread, adding them to a BlockingCollection<T>, and yielding them on the main thread:
static IEnumerable<Product> CreateProducts(IEnumerable<IWorker> workers)
{
var products = new BlockingCollection<Product>(3);
Task.Run(async () => // On the thread pool...
{
try
{
foreach (IWorker worker in workers)
{
Product product = await worker.CreateProductAsync(); // Create products serially.
products.Add(product); // Enqueue the product, blocking if the queue is full.
}
}
finally
{
products.CompleteAdding(); // Notify GetConsumingEnumerable that we're done, even if a worker throws.
}
});
return products.GetConsumingEnumerable();
}
To avoid unbounded memory consumption, you can optionally specify the capacity of the queue as a constructor argument to BlockingCollection<T>. I used 3 in the code above.

The Situation:
Here you're saying you need to do this synchronously, because IEnumerable doesn't support async and the requirements are you need an IEnumerable<Product>.
I am now given an IEnumerable workers and am supposed to
create an IEnumerable from it that I have to pass to some
other function that I cannot alter:
Here you say the entire product set needs to be processed at the same time, presumably making a single call to void CheckProducts(IEnumerable<Product> products).
This methods needs to check the entire Product set as a whole. It is
not possible to subdivide the result.
And here you say the enumerable can yield an indefinite number of items
But I must deal with an IEnumerable and, even worse, it is usually
quite huge, sometimes it is even unlimited, yielding workers forever.
If I knew that its size was decent, I would just call ToArray() on it
and use the method above.
So let's put these together. You need to do asynchronous processing of an indefinite number of items within a synchronous environment and then evaluate the entire set as a whole... synchronously.
The Underlying Problems:
1: To evaluate a set as a whole, it must be completely enumerated. To completely enumerate a set, it must be finite. Therefore it is impossible to evaluate an infinite set as a whole.
2: Switching back and forth between sync and async forces the async code to run synchronously. That might be OK from a requirements perspective, but from a technical perspective it can cause deadlocks (maybe unavoidable, I don't know; look that up, I'm not the expert).
Possible Solutions to Problem 1:
1: Force the source to be an ICollection<T> instead of IEnumerable<T>. This enforces finiteness.
2: Alter the CheckProducts algorithm to process iteratively, potentially yielding intermediary results while still maintaining an ongoing aggregation internally.
Possible Solutions to Problem 2:
1: Make the CheckProducts method asynchronous.
2: Make the CreateProduct... method synchronous.
Bottom Line
You can't do what you're asking how you're asking, and it sounds like someone else is dictating your requirements. They need to change some of the requirements, because what they're asking for is (and I really hate using this word) impossible. Is it possible you have misinterpreted some of the requirements?

Two ideas for you OP
Multiple call solution
If you are allowed to call CheckProducts more than once, you could simply do this:
foreach (var worker in workers)
{
var product = await worker.CreateProductAsync();
CheckProducts(new [] { product } );
}
If it adds value, I'm pretty sure you could work out a way to do it in batches of, say, 100 at a time, too.
Thread pool solution
If you are not allowed to call CheckProducts more than once, and not allowed to modify CheckProducts, there is no way to force it to yield control and allow other continuations to run. So no matter what you do, you cannot force asynchronousness into the IEnumerable that you pass to it, not just because of the compiler checking, but because it would probably deadlock.
So here is a thread pool solution. The idea is to create one separate thread to process the products in series; the processor is async, so a call to CreateProductAsync() will still yield control to anything else that has been posted to the synchronization context, as needed. However, it can't magically force CheckProducts to give up control, so there is still some possibility that it will block occasionally if it is able to check products faster than they are created. In my example I'm using Monitor.Wait() so the O/S won't schedule the thread until there is something waiting for it. You'll still be using up a thread resource while it blocks, but at least you won't be wasting CPU time in a busy-wait loop.
public static IEnumerable<Product> CreateProducts(IEnumerable<Worker> workers)
{
var queue = new ConcurrentQueue<Product>();
var task = Task.Run(() => ConvertProducts(workers.GetEnumerator(), queue));
while (true)
{
while (queue.Count > 0)
{
Product product;
var ok = queue.TryDequeue(out product);
if (ok) yield return product;
}
if (task.IsCompleted && queue.Count == 0) yield break;
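// Monitor.Wait must be called while holding the lock on the same object.
lock (queue) { Monitor.Wait(queue, 1000); }
}
}
private static async Task ConvertProducts(IEnumerator<Worker> input, ConcurrentQueue<Product> output)
{
while (input.MoveNext())
{
var current = input.Current;
var product = await current.CreateProductAsync();
output.Enqueue(product);
// Monitor.Pulse also requires the lock; it wakes the waiting consumer.
lock (output) { Monitor.Pulse(output); }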
}
}

From your requirements I can put together the following:
1) Workers processed in order
2) Open to receive new Workers at any time
We can use the fact that a dataflow TransformBlock has a built-in queue and processes items in order, so we can accept Workers from the producer at any time.
Next we make the result of the TransformBlock observable so that the consumer can consume Products on demand.
I made some quick changes and started the consumer portion. This simply takes the observable produced by the Transformer and maps it to an enumerable that yields each product. For background, here is the description of ToEnumerable():
The ToEnumerator operator returns an enumerator from an observable sequence. The enumerator will yield each item in the sequence as it is produced
Source
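using System;
using System.Collections.Generic;
using System.Reactive.Linq; // ToEnumerable() comes from the System.Reactive package
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;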
namespace ClassLibrary1
{
public class WorkerProducer
{
public async Task ProduceWorker()
{
//await ProductTransformer_Transformer.SendAsync(new Worker())
}
}
public class ProductTransformer
{
public IObservable<Product> Products { get; private set; }
public TransformBlock<Worker, Product> Transformer { get; private set; }
private Task<Product> CreateProductAsync(Worker worker) => Task.FromResult(new Product());
public ProductTransformer()
{
Transformer = new TransformBlock<Worker, Product>(wrk => CreateProductAsync(wrk));
Products = Transformer.AsObservable();
}
}
public class ProductConsumer
{
private ThirdParty ThirdParty { get; set; } = new ThirdParty();
private ProductTransformer Transformer { get; set; } = new ProductTransformer();
public ProductConsumer()
{
ThirdParty.CheckProducts(Transformer.Products.ToEnumerable());
}
}
public class Worker { }
public class Product { }
public class ThirdParty
{
public void CheckProducts(IEnumerable<Product> products)
{
}
}
}

Unless I misunderstood something, I don't see why you don't simply do it like this:
var productList = new List<Product>(workers.Count());
foreach(var worker in workers)
{
productList.Add(await worker.CreateProductAsync());
}
CheckProducts(productList);
What about if you simply keep clearing a List of size 1?
var productList = new List<Product>(1);
var checkTask = Task.CompletedTask;
foreach(var worker in workers)
{
await checkTask;
productList.Clear();
productList.Add(await worker.CreateProductAsync());
checkTask = Task.Run(() => CheckProducts(productList));
}
await checkTask;

You can use Task.WhenAll, but instead of returning result of Task.WhenAll, return collection of tasks transformed to the collection of results.
async Task<IEnumerable<Product>> CreateProductsAsync(IEnumerable<IWorker> workers)
{
var tasks = workers.Select(worker => worker.CreateProductAsync()).ToList();
await Task.WhenAll(tasks);
return tasks.Select(task => task.Result);
}
Order of tasks will be preserved.
It also seems fine to simply use return await Task.WhenAll().
From the docs of Task.WhenAll<TResult>(IEnumerable<Task<TResult>>):
The Task.Result property of the returned task will be set to
an array containing all of the results of the supplied tasks in the
same order as they were provided...
If workers need to be executed one by one in the order they were created, and another function needs the whole set of worker results:
async Task<IEnumerable<Product>> CreateProductsAsync(IEnumerable<IWorker> workers)
{
var products = new List<Product>();
foreach (var worker in workers)
{
var product = await worker.CreateProductAsync();
products.Add(product);
}
return products;
}

You can do this now with async, IEnumerable and LINQ but every method in the chain after the async would be a Task<T>, and you need to use something like await Task.WhenAll at the end. You can use async lambdas in the LINQ methods, which return Task<T>. You don't need to wait synchronously in these.
The Select will start your tasks sequentially i.e. they won't even exist as tasks until the select enumerates each one, and won't keep going after you stop enumerating. You could also run your own foreach over the enumerable of tasks if you want to await them all individually.
You can break out of this like any other foreach without it starting all of them, so this will also work on an infinite enumerable.
public async Task Main()
{
// This async method call could also be an async lambda
foreach (var task in GetTasks())
{
var result = await task;
Console.WriteLine($"Result is {result}");
if (result > 5) break;
}
}
private IEnumerable<Task<int>> GetTasks()
{
return GetNumbers().Select(WaitAndDoubleAsync);
}
private async Task<int> WaitAndDoubleAsync(int i)
{
Console.WriteLine($"Waiting {i} seconds asynchronously");
await Task.Delay(TimeSpan.FromSeconds(i));
return i * 2;
}
/// Keeps yielding numbers
private IEnumerable<int> GetNumbers()
{
var i = 0;
while (true) yield return i++;
}
Outputs the following, then stops:
Waiting 0 seconds asynchronously
Result is 0
Waiting 1 seconds asynchronously
Result is 2
Waiting 2 seconds asynchronously
Result is 4
Waiting 3 seconds asynchronously
Result is 6
The important thing is that you can't mix yield and await in the same method, but you can yield Tasks returned from a method that uses await absolutely fine, so you can use them together just by splitting them into separate methods. Select is already a method that uses yield, so you may not need to write your own method for this.
In your post you were looking for a Task<IEnumerable<Product>>, but what you can actually use is a IEnumerable<Task<Product>>.
You can go even further with this e.g. if you had something like a REST API where one resource can have links to other resources, like if you just wanted to get a list of users of a group, but stop when you found the user you were interested in:
public async Task<IEnumerable<Task<User>>> GetUserTasksAsync(int groupId)
{
var group = await GetGroupAsync(groupId);
return group.UserIds.Select(GetUserAsync);
}
foreach (var task in await GetUserTasksAsync(1))
{
var user = await task;
...
}

There is no solution to your problem. You can't transform a deferred IEnumerable<Task<Product>> to a deferred IEnumerable<Product>, such that the consuming thread will not get blocked while enumerating the IEnumerable<Product>. The IEnumerable<T> is a synchronous interface. It returns an enumerator with a synchronous MoveNext method. The MoveNext returns bool, which is not an awaitable type. An asynchronous interface IAsyncEnumerable<T> exists, whose enumerator has an asynchronous MoveNextAsync method, with a return type of ValueTask<bool>. But you have explicitly said that you can't change the consuming method, so you are stuck with the IEnumerable<T> interface. No solution then.
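For comparison, here is a minimal sketch of what the IAsyncEnumerable<T> route looks like when the consumer can be changed (which the question rules out). It requires C# 8 or later. FixedWorker and CountProductsAsync are hypothetical names introduced only for illustration; Product and IWorker mirror the declarations in the question.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public class Product { }

public interface IWorker
{
    Task<Product> CreateProductAsync();
}

// Hypothetical worker used only for illustration.
public class FixedWorker : IWorker
{
    public Task<Product> CreateProductAsync() => Task.FromResult(new Product());
}

public static class AsyncStreamDemo
{
    // An async iterator may combine yield return and await.
    // Workers run strictly one after another, and the source is
    // enumerated lazily, so an endless sequence is not a problem.
    public static async IAsyncEnumerable<Product> CreateProductsAsync(IEnumerable<IWorker> workers)
    {
        foreach (IWorker worker in workers)
            yield return await worker.CreateProductAsync();
    }

    // Hypothetical consumer that accepts an async stream instead of IEnumerable<Product>.
    public static async Task<int> CountProductsAsync(IAsyncEnumerable<Product> products)
    {
        int count = 0;
        await foreach (Product product in products)
            count++;
        return count;
    }
}
```

The consuming side has to use await foreach, which is exactly why this does not help when CheckProducts is fixed to take IEnumerable<Product>.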

try
workers.ForEach(async wrkr =>
{
var prdlist = await wrkr.CreateProductAsync();
//Remaining tasks....
});

Replacement for async void

I'm developing an application for monitoring certain tasks (e.g. if certain services/websites are currently up and running, certain records in the database exist, etc.). And as most of these tasks are long running, I use TPL with async/await.
I have a base class for all such tasks:
public abstract class LongRunningOperation
{
// .. some props...
internal async void Start()
{
try
{
this.Status = OperationStatus.Started;
await this.DoStart();
this.Status = OperationStatus.Finished;
}
catch (Exception e)
{
this.Status = OperationStatus.Error;
this.Message = e.ToString();
}
}
protected abstract Task DoStart();
}
And method that launches these tasks looks like this:
public static LongRunningOperation[] LaunchOperations()
{
LongRunningOperation[] operations = GetAllLongRunningOperations();
foreach (var o in operations)
Task.Factory.StartNew(() => { o.Start(); });
return operations;
}
The array returned by this method is used to monitor all LongRunningOperations and log the stats. Currently I have a console application with a while (true) loop that prints out the stats (name, status, current runtime) for each operation on the screen (refreshing every second) until all the operations are finished.
The thing that bothers me is the async void method. I've read that it's bad practice to use async void methods, but:
I can't figure out what harm they might do in my scenario
If I change the Start method to return Task, its return value will never be used anywhere, and I can't think why I need it
I'd appreciate it if someone could clarify these points

An async void method is a "fire and forget" operation. You can not wait for any result, and will not know when the operation completes and if it has been successful or not.
Basically, you should use async void only when you are sure that you'll never need to know when the operation finished or whether it was successful (for example, writing logs).
With async methods that return Task, a caller is capable of waiting for an operation to finish, and also handle exceptions that happened during the execution of the operation.
To summarize, if you do not need a result, an async Task is slightly better because you can await it as well as handle exceptions and deal with task ordering.
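A minimal sketch of the Task-returning variant, assuming the same shape as the base class in the question. StartAsync, FailingOperation, Launcher and the OperationStatus members are hypothetical names chosen for this illustration.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public enum OperationStatus { NotStarted, Started, Finished, Error }

public abstract class LongRunningOperation
{
    public OperationStatus Status { get; private set; }
    public string Message { get; private set; }

    // Returns Task instead of void so a caller can await completion if it wants to.
    public async Task StartAsync()
    {
        try
        {
            Status = OperationStatus.Started;
            await DoStart();
            Status = OperationStatus.Finished;
        }
        catch (Exception e)
        {
            Status = OperationStatus.Error;
            Message = e.ToString();
        }
    }

    protected abstract Task DoStart();
}

// Hypothetical operation that always fails, for illustration only.
public class FailingOperation : LongRunningOperation
{
    protected override async Task DoStart()
    {
        await Task.Delay(10);
        throw new InvalidOperationException("service unreachable");
    }
}

public static class Launcher
{
    // Starts every operation and returns one task that completes when all have finished.
    public static Task LaunchAll(LongRunningOperation[] operations) =>
        Task.WhenAll(operations.Select(o => o.StartAsync()));
}
```

The monitoring loop can still poll Status as before; the returned task simply gives the caller an optional, reliable "everything is done" signal without wrapping each call in Task.Factory.StartNew.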

Inside a loop, does each async call get chained to the returned task using the task's ContinueWith?

The best practice is to collect all the async calls in a collection inside the loop and call Task.WhenAll(). Yet I want to understand what happens when an await is encountered inside the loop: what would the returned Task contain? What about further async calls? Will it create new tasks and add them to the already returned Task sequentially?
As per the code below
private void CallLoopAsync()
{
var loopReturnedTask = LoopAsync();
}
private async Task LoopAsync()
{
int count = 0;
while(count < 5)
{
await SomeNetworkCallAsync();
count++;
}
}
The steps I assumed are
LoopAsync gets called
count is set to zero, code enters while loop, condition is checked
SomeNetworkCallAsync is called, and the returned task is awaited
New task/awaitable is created
New task is returned to CallLoopAsync()
Now, provided there is enough time for the process to live, How / In what way, will the next code lines like count++ and further SomeNetworkCallAsync be executed?
Update - Based on Jon Hanna and Stephen Cleary:
So there is one Task and the implementation of that Task will involve
5 calls to NetworkCallAsync, but the use of a state-machine means
those tasks need not be explicitly chained for this to work. This, for
example, allows it to decide whether to break the looping or not based
on the result of a task, and so on.
Though they are not chained, each call will wait for the previous call to complete because we have used await (in the state machine, awaiter.GetResult();). It behaves as if five consecutive calls have been made and they are executed one after another (only after the previous call completes). If this is true, we have to be a bit more careful in how we compose the async calls. For example:
Instead of writing
private async Task SomeWorkAsync()
{
await SomeIndependentNetworkCall();// 2 sec to complete
var result1 = await GetDataFromNetworkCallAsync(); // 2 sec to complete
await PostDataToNetworkAsync(result1); // 2 sec to complete
}
It should be written
private Task[] RefactoredSomeWorkAsync()
{
var task1 = SomeIndependentNetworkCall();// 2 sec to complete
var task2 = GetDataFromNetworkCallAsync()
.ContinueWith(t => PostDataToNetworkAsync(t.Result)).Unwrap();// 4 sec to complete
return new[] { task1, task2 };
}
So that we can say RefactoredSomeWorkAsync is faster by 2 seconds, because of the possibility of parallelism
private async Task CallRefactoredSomeWorkAsync()
{
await Task.WhenAll(RefactoredSomeWorkAsync());//Faster, 4 sec
await SomeWorkAsync(); // Slower, 6 sec
}
Is this correct? - Yes. Along with "async all the way", "Accumulate tasks all the way" is good practice. Similar discussion is here

When count is zero, new task will be created because of await and be returned
No. It will not. It will simply call the async method consecutively, without storing or returning the result. The value in loopReturnedTask will store the Task of LoopAsync, not related to SomeNetworkCallAsync.
await SomeNetworkCallAsync(); // call, wait and forget the result
You may want to read the MSDN article on async/await.

To produce code similar to what async and await do, if those keywords didn't exist, would require code a bit like:
private struct LoopAsyncStateMachine : IAsyncStateMachine
{
public int _state;
public AsyncTaskMethodBuilder _builder;
public TestAsync _this;
public int _count;
private TaskAwaiter _awaiter;
void IAsyncStateMachine.MoveNext()
{
try
{
if (_state != 0)
{
_count = 0;
goto afterSetup;
}
TaskAwaiter awaiter = _awaiter;
_awaiter = default(TaskAwaiter);
_state = -1;
loopBack:
awaiter.GetResult();
awaiter = default(TaskAwaiter);
_count++;
afterSetup:
if (_count < 5)
{
awaiter = _this.SomeNetworkCallAsync().GetAwaiter();
if (!awaiter.IsCompleted)
{
_state = 0;
_awaiter = awaiter;
_builder.AwaitUnsafeOnCompleted<TaskAwaiter, TestAsync.LoopAsyncStateMachine>(ref awaiter, ref this);
return;
}
goto loopBack;
}
_state = -2;
_builder.SetResult();
}
catch (Exception exception)
{
_state = -2;
_builder.SetException(exception);
return;
}
}
[DebuggerHidden]
void IAsyncStateMachine.SetStateMachine(IAsyncStateMachine param0)
{
_builder.SetStateMachine(param0);
}
}
public Task LoopAsync()
{
LoopAsyncStateMachine stateMachine = new LoopAsyncStateMachine();
stateMachine._this = this;
AsyncTaskMethodBuilder builder = AsyncTaskMethodBuilder.Create();
stateMachine._builder = builder;
stateMachine._state = -1;
builder.Start(ref stateMachine);
return builder.Task;
}
(The above is based on what happens when you use async and await except that the result of that uses names that cannot be valid C# class or field names, along with some extra attributes. If its MoveNext() reminds you of an IEnumerator that's not entirely irrelevant, the mechanism by which await and async produce an IAsyncStateMachine to implement a Task is similar in many ways to how yield produces an IEnumerator<T>).
The result is a single Task which comes from AsyncTaskMethodBuilder and makes use of LoopAsyncStateMachine (which is close to the hidden struct that the async produces). Its MoveNext() method is first called upon the task being started. It will then use an awaiter on SomeNetworkCallAsync. If it is already completed it moves on to the next stage (increment count and so on), otherwise it stores the awaiter in a field. On subsequent uses it will be called because the SomeNetworkCallAsync() task has returned, and it will get the result (which is void in this case, but could be a value if values were returned). It then attempts further loops and again returns when it is waiting on a task that is not yet completed.
When it finally reaches a count of 5 it calls SetResult() on the builder, which sets the result of the Task that LoopAsync had returned.
So there is one Task and the implementation of that Task will involve 5 calls to SomeNetworkCallAsync, but the use of a state machine means those tasks need not be explicitly chained for this to work. This, for example, allows it to decide whether to break the looping or not based on the result of a task, and so on.

When an async method first yields at an await, it returns a Task (or Task<T>). This is not the task being observed by the await; it is a completely different task created by the async method. The async state machine controls the lifetime of that Task.
One way to think of it is to consider the returned Task as representing the method itself. The returned Task will only complete when the method completes. If the method returns a value, then that value is set as the result of the task. If the method throws an exception, then that exception is captured by the state machine and placed on that task.
So, there's no need for attaching continuations to the returned task. The returned task will not complete until the method is done.
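A small sketch that makes this observable. The gate parameter is a hypothetical stand-in for the task returned by SomeNetworkCallAsync(), and returning the count is an assumption added so the result can be inspected.

```csharp
using System.Threading.Tasks;

public static class LoopDemo
{
    // 'gate' is a hypothetical stand-in for the task returned by SomeNetworkCallAsync().
    public static async Task<int> LoopAsync(Task gate)
    {
        int count = 0;
        while (count < 5)
        {
            await gate; // the method yields here while 'gate' is incomplete
            count++;
        }
        return count; // becomes the Result of the task handed back to the caller
    }
}
```

If LoopAsync is called with the Task of a TaskCompletionSource that has not been completed yet, the task it returns reports IsCompleted as false. Once SetResult is called on that source, the loop runs to the end and the same returned task completes with the value 5.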
How / In what way, will the next code lines like count++ and further SomeNetworkCallAsync be executed?
I do explain this in my async intro post. In summary, when a method awaits, it captures a "current context" (SynchronizationContext.Current unless it is null, in which case it uses TaskScheduler.Current). When the await completes, it resumes executing its async method within that context.
That's what technically happens; but in the vast majority of cases, this simply means:
If an async method starts on a UI thread, then it will resume on that same UI thread.
If an async method starts within an ASP.NET request context, then it will resume with that same request context (not necessarily on the same thread, though).
Otherwise, the async method resumes on a thread pool thread.

Entity Framework SaveChanges() vs. SaveChangesAsync() and Find() vs. FindAsync()

I have been searching for the differences between 2 pairs above but haven't found any articles explaining clearly about it as well as when to use one or another.
So what is the difference between SaveChanges() and SaveChangesAsync()?
And between Find() and FindAsync()?
On server side, when we use Async methods, we also need to add await. Thus, I don't think it is asynchronous on server side.
Does it only help to prevent the UI blocking on client side browser? Or are there any pros and cons between them?

Any time that you need to do an action on a remote server, your program generates the request, sends it, then waits for a response. I will use SaveChanges() and SaveChangesAsync() as an example but the same applies to Find() and FindAsync().
Say you have a list myList of 100+ items that you need to add to your database. To insert that, your function would look something like so:
using(var context = new MyEDM())
{
context.MyTable.AddRange(myList);
context.SaveChanges();
}
First you create an instance of MyEDM, add the list myList to the table MyTable, then call SaveChanges() to persist the changes to the database. It works how you want, the records get committed, but your program cannot do anything else until the commit finishes. This can take a long time depending on what you are committing. If you are committing changes to the records, Entity Framework has to commit those one at a time (I once had a save take 2 minutes for updates)!
To solve this problem, you could do one of two things. The first is you can start up a new thread to handle the insert. While this will free up the calling thread to continue executing, you created a new thread that is just going to sit there and wait. There is no need for that overhead, and this is what the async await pattern solves.
For I/O operations, await quickly becomes your best friend. Taking the code section from above, we can modify it to be:
using(var context = new MyEDM())
{
Console.WriteLine("Save Starting");
context.MyTable.AddRange(myList);
await context.SaveChangesAsync();
Console.WriteLine("Save Complete");
}
It is a very small change, but there are profound effects on the efficiency and performance of your code. So what happens? The beginning of the code is the same: you create an instance of MyEDM and add your myList to MyTable. But when you call await context.SaveChangesAsync(), the execution of code returns to the calling function! So while you are waiting for all those records to commit, your code can continue to execute. Say the function that contained the above code had the signature of public async Task SaveRecords(List<MyTable> saveList), the calling function could look like this:
public async Task MyCallingFunction()
{
Console.WriteLine("Function Starting");
Task saveTask = SaveRecords(GenerateNewRecords());
for(int i = 0; i < 1000; i++){
Console.WriteLine("Continuing to execute!");
}
await saveTask;
Console.WriteLine("Function Complete");
}
Why you would have a function like this, I don't know, but what it outputs shows how async await works. First let's go over what happens.
Execution enters MyCallingFunction, Function Starting then Save Starting get written to the console, then the function SaveChangesAsync() gets called. At this point, execution returns to MyCallingFunction and enters the for loop, writing 'Continuing to execute!' up to 1000 times. When SaveChangesAsync() finishes, execution returns to the SaveRecords function, writing Save Complete to the console. Once everything in SaveRecords completes, execution will continue in MyCallingFunction right where it was when SaveChangesAsync() finished. Confused? Here is an example output:
Function Starting
Save Starting
Continuing to execute!
Continuing to execute!
Continuing to execute!
Continuing to execute!
Continuing to execute!
....
Continuing to execute!
Save Complete!
Continuing to execute!
Continuing to execute!
Continuing to execute!
....
Continuing to execute!
Function Complete!
Or maybe:
Function Starting
Save Starting
Continuing to execute!
Continuing to execute!
Save Complete!
Continuing to execute!
Continuing to execute!
Continuing to execute!
....
Continuing to execute!
Function Complete!
That is the beauty of async await, your code can continue to run while you are waiting for something to finish. In reality, you would have a function more like this as your calling function:
public async Task MyCallingFunction()
{
List<Task> myTasks = new List<Task>();
myTasks.Add(SaveRecords(GenerateNewRecords()));
myTasks.Add(SaveRecords2(GenerateNewRecords2()));
myTasks.Add(SaveRecords3(GenerateNewRecords3()));
myTasks.Add(SaveRecords4(GenerateNewRecords4()));
await Task.WhenAll(myTasks.ToArray());
}
Here, you have four different save record functions going at the same time. MyCallingFunction will complete a lot faster using async await than if the individual SaveRecords functions were called in series.
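To see the difference between the two approaches, here is a rough timing sketch. FakeSaveAsync is a hypothetical placeholder that uses Task.Delay to simulate 200 ms of database I/O; it is not one of the SaveRecords functions above, and the numbers it prints will vary by machine.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class WhenAllDemo
{
    // Hypothetical save operation: Task.Delay simulates 200 ms of database I/O.
    static Task FakeSaveAsync() => Task.Delay(200);

    // Awaits each save before starting the next one.
    public static async Task<long> RunInSeriesAsync()
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 4; i++)
        {
            await FakeSaveAsync();
        }
        return sw.ElapsedMilliseconds;
    }

    // Starts all four saves first, then awaits them together.
    public static async Task<long> RunConcurrentlyAsync()
    {
        var sw = Stopwatch.StartNew();
        var tasks = new[] { FakeSaveAsync(), FakeSaveAsync(), FakeSaveAsync(), FakeSaveAsync() };
        await Task.WhenAll(tasks);
        return sw.ElapsedMilliseconds;
    }

    public static async Task Main()
    {
        Console.WriteLine($"In series:    {await RunInSeriesAsync()} ms");
        Console.WriteLine($"Concurrently: {await RunConcurrentlyAsync()} ms");
    }
}
```

The series version takes roughly the sum of the four delays, the concurrent version roughly the longest one. With Entity Framework, note that a single context instance does not support several operations at once, so each concurrent save would need its own context.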
The one thing that I have not touched on yet is the await keyword. What this does is suspend the current function, without blocking the thread, until whatever Task you are awaiting completes. So in the case of the original MyCallingFunction, the line Function Complete! will not be written to the console until the SaveRecords function finishes.
Long story short, if you have the option to use async await for I/O-bound work such as database calls, you should, as it can greatly improve the throughput and responsiveness of your application.

My remaining explanation will be based on the following code snippet.
using System;
using System.Threading;
using System.Threading.Tasks;
using static System.Console;
public static class Program
{
const int N = 20;
static readonly object obj = new object();
static int counter;
public static void Job(ConsoleColor color, int multiplier = 1)
{
for (long i = 0; i < N * multiplier; i++)
{
lock (obj)
{
counter++;
ForegroundColor = color;
Write($"{Thread.CurrentThread.ManagedThreadId}");
if (counter % N == 0) WriteLine();
ResetColor();
}
Thread.Sleep(N);
}
}
static async Task JobAsync()
{
// intentionally removed
}
public static async Task Main()
{
// intentionally removed
}
}
Case 1
static async Task JobAsync()
{
Task t = Task.Run(() => Job(ConsoleColor.Red, 1));
Job(ConsoleColor.Green, 2);
await t;
Job(ConsoleColor.Blue, 1);
}
public static async Task Main()
{
Task t = JobAsync();
Job(ConsoleColor.White, 1);
await t;
}
Remarks: Because the synchronous part (green) of JobAsync runs longer than the task t (red), t has already completed by the time await t is reached. As a result, the continuation (blue) runs on the same thread as the green part.
The synchronous part of Main (white) only starts after the green part has finished. That is why synchronous work inside an asynchronous method is problematic: it holds up the caller.
Case 2
static async Task JobAsync()
{
Task t = Task.Run(() => Job(ConsoleColor.Red, 2));
Job(ConsoleColor.Green, 1);
await t;
Job(ConsoleColor.Blue, 1);
}
public static async Task Main()
{
Task t = JobAsync();
Job(ConsoleColor.White, 1);
await t;
}
Remarks: This case is the opposite of the first. The synchronous part (green) of JobAsync is shorter than the task t (red), so t has not completed by the time await t is reached. As a result, the continuation (blue) runs on a different thread from the green part.
The synchronous part of Main (white) still starts only after the green part has finished.
Case 3
static async Task JobAsync()
{
Task t = Task.Run(() => Job(ConsoleColor.Red, 1));
await t;
Job(ConsoleColor.Green, 1);
Job(ConsoleColor.Blue, 1);
}
public static async Task Main()
{
Task t = JobAsync();
Job(ConsoleColor.White, 1);
await t;
}
Remarks: This case solves the problem from the previous cases, where synchronous work ran at the start of the asynchronous method. The task t is awaited immediately, so JobAsync returns to Main right away. As a result, the continuation (green, then blue) runs on a different thread from Main (white).
The synchronous part of Main (white) starts immediately and runs in parallel with JobAsync.
If you want to add other cases, feel free to edit.
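The thread behaviour seen in Cases 1 and 2 can also be checked in isolation. The sketch below is not part of the snippet above; ObserveAsync is a hypothetical helper that records the managed thread id before and after an await, and it assumes a console app with no synchronization context.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AwaitThreadDemo
{
    // Returns the managed thread id seen before and after awaiting t.
    public static async Task<(int Before, int After)> ObserveAsync(Task t)
    {
        int before = Thread.CurrentThread.ManagedThreadId;
        await t;
        int after = Thread.CurrentThread.ManagedThreadId;
        return (before, after);
    }

    public static async Task Main()
    {
        // Already completed: the method carries on synchronously (as in Case 1).
        var done = await ObserveAsync(Task.CompletedTask);
        Console.WriteLine($"completed task, same thread: {done.Before == done.After}");

        // Still pending: the continuation is scheduled elsewhere (as in Case 2).
        var pending = await ObserveAsync(Task.Delay(100));
        Console.WriteLine($"pending task, same thread:   {pending.Before == pending.After}");
    }
}
```

Awaiting an already completed task never switches threads, so the first line always reports True. In a console app the second line typically reports False, because the continuation is picked up by a thread-pool thread.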

This statement is incorrect:
On server side, when we use Async methods, we also need to add await.
You do not need to add "await". It is merely a convenient keyword in C# that lets you write more lines of code after the call, and those lines only get executed after the Save operation completes. But as you pointed out, you could accomplish that simply by calling SaveChanges instead of SaveChangesAsync.
But fundamentally, an async call is about much more than that. The idea here is that if there is other work you can do (on the server) while the Save operation is in progress, then you should use SaveChangesAsync. Do not use "await". Just call SaveChangesAsync, and then continue to do other stuff in parallel. This includes potentially, in a web app, returning a response to the client even before the Save has completed. But of course, you still will want to check the final result of the Save so that in case it fails, you can communicate that to your user, or log it somehow.
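One way to do that final check, sketched here under stated assumptions, is to keep the Task returned by the save and observe it once the other work is done. FakeSaveChangesAsync is a hypothetical stand-in for SaveChangesAsync (a 100 ms delay that can be told to fail), and awaiting the stored task at the end is only one option; ContinueWith is another.

```csharp
using System;
using System.Threading.Tasks;

public static class DeferredCheckDemo
{
    // Hypothetical stand-in for SaveChangesAsync(): completes after 100 ms,
    // optionally fails, and otherwise returns the number of rows written.
    static async Task<int> FakeSaveChangesAsync(bool fail)
    {
        await Task.Delay(100);
        if (fail) throw new InvalidOperationException("save failed");
        return 3;
    }

    // Starts the save, does unrelated work, then observes the outcome.
    public static async Task<string> HandleAsync(bool fail)
    {
        Task<int> saveTask = FakeSaveChangesAsync(fail); // started, not awaited yet
        string other = "other work done";                // runs while the save is in flight
        try
        {
            int rows = await saveTask;                   // check the final result
            return $"{other}; saved {rows} rows";
        }
        catch (InvalidOperationException ex)
        {
            return $"{other}; {ex.Message}";             // report or log the failure
        }
    }

    public static async Task Main()
    {
        Console.WriteLine(await HandleAsync(fail: false)); // other work done; saved 3 rows
        Console.WriteLine(await HandleAsync(fail: true));  // other work done; save failed
    }
}
```

If the task is never observed at all, a failed save goes unnoticed, which is why the stored Task is checked before the method finishes.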

Use a Task to avoid multiple calls to expensive operation and to cache its result

I have an async method that fetches some data from a database. This operation is fairly expensive, and takes a long time to complete. As a result, I'd like to cache the method's return value. However, it's possible that the async method will be called multiple times before its initial execution has a chance to return and save its result to the cache, resulting in multiple calls to this expensive operation.
To avoid this, I'm currently reusing a Task, like so:
public class DataAccess
{
private Task<MyData> _getDataTask;
public async Task<MyData> GetDataAsync()
{
if (_getDataTask == null)
{
_getDataTask = Task.Run(() => synchronousDataAccessMethod());
}
return await _getDataTask;
}
}
My thought is that the initial call to GetDataAsync will kick off the synchronousDataAccessMethod method in a Task, and any subsequent calls to this method before the Task has completed will simply await the already running Task, automatically avoiding calling synchronousDataAccessMethod more than once. Calls made to GetDataAsync after the private Task has completed will cause the Task to be awaited, which will immediately return the data from its initial execution.
This seems to be working, but I'm having some strange performance issues that I suspect may be tied to this approach. Specifically, awaiting _getDataTask after it has completed takes several seconds (and locks the UI thread), even though synchronousDataAccessMethod is not called again.
Am I misusing async/await? Is there a hidden gotcha that I'm not seeing? Is there a better way to accomplish the desired behavior?
EDIT
Here's how I call this method:
var result = (await myDataAccessObject.GetDataAsync()).ToList();
Maybe it has something to do with the fact that the result is not immediately enumerated?

If you want to await it further up the call stack, I think you want this:
public class DataAccess
{
private Task<MyData> _getDataTask;
private readonly object lockObj = new Object();
public async Task<MyData> GetDataAsync()
{
lock(lockObj)
{
if (_getDataTask == null)
{
_getDataTask = Task.Run(() => synchronousDataAccessMethod());
}
}
return await _getDataTask;
}
}
Your original code has the potential for this happening:
Thread 1 sees that _getDataTask == null, and begins constructing the task
Thread 2 sees that _getDataTask == null, and begins constructing the task
Thread 1 finishes constructing the task, which starts, and Thread 1 waits on that task
Thread 2 finishes constructing a task, which starts, and Thread 2 waits on that task
You end up with two instances of the task running.
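The effect of the lock can be checked with a small runnable sketch. CachedDataAccess follows the shape of the locked version above, but SlowLoad (a hypothetical stand-in for synchronousDataAccessMethod), the int result, and the Calls counter are assumptions added purely for the demonstration.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public class CachedDataAccess
{
    private Task<int> _getDataTask;
    private readonly object _lockObj = new object();
    public int Calls;   // counts how often the expensive method actually runs

    // Hypothetical expensive operation standing in for synchronousDataAccessMethod().
    private int SlowLoad()
    {
        Interlocked.Increment(ref Calls);
        Thread.Sleep(100);
        return 42;
    }

    public async Task<int> GetDataAsync()
    {
        lock (_lockObj)   // only one caller at a time may create the task
        {
            if (_getDataTask == null)
            {
                _getDataTask = Task.Run(() => SlowLoad());
            }
        }
        return await _getDataTask;   // awaited outside the lock
    }
}

public static class LockDemo
{
    public static async Task Main()
    {
        var dao = new CachedDataAccess();
        // 20 callers race to request the data from thread-pool threads.
        int[] results = await Task.WhenAll(
            Enumerable.Range(0, 20).Select(_ => Task.Run(() => dao.GetDataAsync())));
        Console.WriteLine($"result: {results.Distinct().Single()}, calls: {dao.Calls}");
        // prints "result: 42, calls: 1"
    }
}
```

All 20 callers receive the same value, and the counter shows that the expensive method ran only once.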

Use the lock statement to prevent multiple calls to the database query section. The lock makes the check thread safe, so that once the task has been cached, all other calls use it instead of going to the database.
lock (StaticObject) // a single static object, so every caller of this routine shares the same lock
{
if (_getDataTask == null)
{
// Get data code here
}
return _getDataTask;
}

Please rewrite your function as:
public Task<MyData> GetDataAsync()
{
if (_getDataTask == null)
{
_getDataTask = Task.Run(() => synchronousDataAccessMethod());
}
return _getDataTask;
}
This does not change anything about how the function can be used: you can still await the returned task!
Please tell me if that changes anything.

Bit late to answer this, but there is an open source library called LazyCache that will do this for you in two lines of code, and it was recently updated to handle caching Tasks for just this sort of situation. It is also available on NuGet.
Example:
Func<Task<List<MyData>>> cacheableAsyncFunc = () => myDataAccessObject.GetDataAsync();
var cachedData = await cache.GetOrAddAsync("myDataAccessObject.GetData", cacheableAsyncFunc);
return cachedData;
// Or instead just do it all in one line if you prefer
// return await cache.GetOrAddAsync("myDataAccessObject.GetData", myDataAccessObject.GetDataAsync);
It has built-in locking by default, so the cacheable method will only execute once per cache miss, and it uses a lambda so you can do "get or add" in one go. It defaults to 20 minutes sliding expiration, but you can set whatever caching policy you like on it.
More info on caching tasks is in the api docs and you may find the sample app to demo caching tasks useful.
(Disclaimer: I am the author of LazyCache)
