Changing each element in a list with different threads - c#

I have the following pseudo-code:
public void Associar(List<Data> dados)
{
List<Task> tasks = new List<Task>();
foreach(var dado in dados)
{
tasks.Add(AdicionarAsync(dado));
}
Task.WaitAll(tasks.ToArray());
Debug.WriteLine(dados.Select(e => e.Colecao).Sum(e => e.Count));
}
public async Task<List<Foo>> ConsultarNoBanco()
{
//make request
//here the result is OK
return result;
}
public async Task AdicionarAsync(Data dado)
{
dado.Colecao = await ConsultarNoBanco();
//Here the result (dado.Colecao) is wrong
//If I modify the code to ConsultarNoBanco().Result everything works fine
}
The output of this code must always be 411. However, the result changes each time the method Associar() is called. What is the best way to use a thread-safe list to change each item in a collection from multiple threads?

Use Parallel.ForEach() to modify entries in your list.
It will partition the work and manage the threading for you.
You can also stop the loop early by calling Break() on the ParallelLoopState that is passed to the loop body.
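A minimal sketch of the Break() mechanism, assuming a plain int array as hypothetical input. The Parallel.ForEach overload used here passes the item, the ParallelLoopState and the item's index to the body, and each iteration writes only to its own slot of the output array:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical input: each element is processed independently.
int[] values = { 3, 8, 1, 9, 4, 7, 2 };
var results = new int[values.Length];

// The body receives the item, the ParallelLoopState and the item's index.
// Each iteration writes only to results[index], so no lock is needed.
ParallelLoopResult loop = Parallel.ForEach(values, (value, state, index) =>
{
    if (value == 9)
    {
        // Break() asks the loop to stop starting iterations beyond this index;
        // every iteration with a lower index is still guaranteed to run.
        state.Break();
        return;
    }
    results[index] = value * 2;
});

Console.WriteLine($"Completed: {loop.IsCompleted}, LowestBreakIteration: {loop.LowestBreakIteration}");
```

Iterations with a higher index than the break may or may not have run, depending on scheduling, so only the slots below the break index are guaranteed to be filled.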

The current answers and comments are saying that you need to manage concurrency because you are modifying entries in your list using tasks/threading. IMHO this is incorrect, since the modifications being done on your Data objects are fine: each task only modifies its designated Data object. No synchronization is necessary at that point.
On the other hand, your method ConsultarNoBanco is being executed from multiple tasks/threads at the same time. Since you are not showing the code of that method, we cannot say anything about it. But my impression is that this method is not thread-safe, especially since it does not receive the Data object, so I can only assume it is doing something unrelated to the Data object.
Can you show the code of the method ConsultarNoBanco? Is it thread-safe? You mention a request, is the request-handler thread-safe?
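To illustrate the point above, here is a sketch with hypothetical stand-ins (QueryAsync for ConsultarNoBanco, int lists for Colecao, an array slot for each Data object). It does not show what ConsultarNoBanco really does; it only shows that when the query method builds a new result on every call and shares no mutable state, concurrent tasks that each assign their own item produce the same total every time, without any lock:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for ConsultarNoBanco: it builds and returns a NEW list
// on every call and touches no shared mutable state, so it is safe to call
// from many tasks at once.
async Task<List<int>> QueryAsync(int id)
{
    await Task.Delay(10);                 // simulate I/O latency
    return Enumerable.Range(0, id).ToList();
}

// Each task writes only to its own slot of the array, so the writes need no lock.
var collections = new List<int>[20];
var tasks = Enumerable.Range(0, collections.Length)
    .Select(async i => collections[i] = await QueryAsync(i))
    .ToArray();
await Task.WhenAll(tasks);

int total = collections.Sum(c => c.Count);
Console.WriteLine(total); // 190 (0 + 1 + ... + 19)
```

If the real method instead reuses a shared list, connection or other non-thread-safe object across calls, that shared object is where synchronization (or one instance per call) is needed.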

Related

What are best practices / good patterns for managing cached async data?

I am rewriting an old app and I am trying to use async to speed it up.
The old code was doing something like this:
var value1 = getValue("key1");
var value2 = getValue("key2");
var value3 = getValue("key3");
where the getValue function was managing its own cache in a dictionary, doing something like this:
object getValue(string key) {
if (cache.ContainsKey(key)) return cache[key];
var value = callSomeHttpEndPointsAndCalculateTheValue(key);
cache.Add(key, value);
return value;
}
If I make getValue async and I await every call to getValue, then everything works well. But it is not faster than the old version, because everything still runs sequentially, as it used to.
If I remove the await (well, if I postpone it, but that's not the focus of this question), I finally get the slow stuff to run in parallel. But if a second call to getValue("key1") is executed before the first call has finished, I end up executing the same slow call twice, and everything is slower than the old version because it doesn't take advantage of the cache.
Is there something like await("key1") that will only await if a previous call with "key1" is still awaiting?
EDIT (follow-up to a comment)
By "speed it up" I mean more responsive.
For example, when the user selects a material in a drop-down, I want to update the list of available thicknesses or colors in other drop-downs and other material properties in other UI elements. Sometimes this triggers a cascade of events that requires the same getValue("key") to be used more than once.
For example when the material is changed, a few functions may be called: updateThicknesses(), updateHoleOffsets(), updateMaxWindLoad(), updateMaxHoleDistances(), etc. Each function reads the values from the UI elements and decides whether to do its own slow calculations independently from the other functions. Each function can require a few http calls to calculate some parameters, and some of those parameters may be required by several functions.
The old implementation was calling the functions in sequence, so the second function would take advantage of some values cached while processing the first one. The user would see each section of the interface updating in sequence over 5-6 seconds the first time and very quickly the following times, unless the new value required some new http endpoint calls.
The new async implementation calls all the functions at the same time, so every function ends up calling the same http endpoints because their results are not yet cached.
A simple method is to cache the tasks instead of the values; this way you can await both a pending task and an already completed task to get the values.
If several parallel tasks all try to get a value using the same key, only the first will spin off the task, the others will await the same task.
Here's a simple implementation:
private Dictionary<string, Task<object>> cache = new();
public Task<object> getValueAsync(string key)
{
lock (cache)
{
if (!cache.TryGetValue(key, out var result))
cache[key] = result = callSomeHttpEndPointsAndCalculateTheValueAsync(key);
return result;
}
}
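A runnable sketch of the task-caching idea, assuming a hypothetical CalculateValueAsync in place of callSomeHttpEndPointsAndCalculateTheValueAsync, plus a counter added only to show how many times the slow work actually runs when three callers ask for the same key at once:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var cache = new Dictionary<string, Task<object>>();
int slowCalls = 0;   // counts how often the expensive work really runs

// Hypothetical slow operation standing in for
// callSomeHttpEndPointsAndCalculateTheValueAsync.
async Task<object> CalculateValueAsync(string key)
{
    Interlocked.Increment(ref slowCalls);
    await Task.Delay(100);               // simulate HTTP latency
    return key.ToUpperInvariant();
}

Task<object> GetValueAsync(string key)
{
    lock (cache)
    {
        // Only the first caller for a key starts the work; later callers
        // receive the same (pending or completed) task.
        if (!cache.TryGetValue(key, out var task))
            cache[key] = task = CalculateValueAsync(key);
        return task;
    }
}

// Three "update" functions asking for the same key at the same time.
var results = await Task.WhenAll(GetValueAsync("key1"), GetValueAsync("key1"), GetValueAsync("key1"));
Console.WriteLine($"{results[0]} computed {slowCalls} time(s)");
```

One consequence of this design: a task that faults stays in the dictionary, so every later caller for that key observes the same exception unless the entry is removed.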
Judging by the comments the following example should probably not be used.
Since ConcurrentDictionary has been mentioned, here's a version using that instead.
private ConcurrentDictionary<string, Task<object>> cache = new();
public Task<object> getValueAsync(string key)
{
return cache.GetOrAdd(key, k => callSomeHttpEndPointsAndCalculateTheValueAsync(k));
}
The method seems simpler, and that alone might be grounds for switching to it, but in my experience ConcurrentDictionary and the other ConcurrentXXX collections seem to have their niche uses and seem somewhat more heavy-handed, and thus slower, for the basic stuff.
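The ConcurrentDictionary documentation notes that GetOrAdd's valueFactory can be invoked more than once when several threads race for the same key, so the ConcurrentDictionary version can start the slow call more than once. A commonly used workaround, sketched here with the same hypothetical CalculateValueAsync, is to store a Lazy<Task<object>> so that only the Lazy instance that ends up in the dictionary ever starts the work:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

var cache = new ConcurrentDictionary<string, Lazy<Task<object>>>();
int slowCalls = 0;

// Hypothetical slow operation (same stand-in as in the lock-based sketch).
async Task<object> CalculateValueAsync(string key)
{
    Interlocked.Increment(ref slowCalls);
    await Task.Delay(100);
    return key.ToUpperInvariant();
}

// GetOrAdd may run its factory more than once when threads race, but it
// returns the single Lazy instance stored in the dictionary to every caller,
// and Lazy (ExecutionAndPublication by default) runs CalculateValueAsync at
// most once for that instance.
Task<object> GetValueAsync(string key) =>
    cache.GetOrAdd(key, k => new Lazy<Task<object>>(() => CalculateValueAsync(k))).Value;

var getters = new Task<object>[10];
Parallel.For(0, getters.Length, i => getters[i] = GetValueAsync("key1"));
var results = await Task.WhenAll(getters);
Console.WriteLine($"{results[0]} computed {slowCalls} time(s)");
```

Creating a few extra Lazy objects under contention is cheap, because a losing Lazy is simply discarded without ever evaluating Value.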

Running Async Foreach Loop C# async await

I am struggling to grasp the basic concept of c# async await.
Basically, what I have is a List of objects which I need to process. The processing involves iterating through each object's properties and joining strings, then creating a new object (in this case called a TrelloCard), and eventually adding each card to a list of TrelloCards.
The iteration takes quite a long time, so what I would like to do is process multiple objects asynchronously.
I've tried multiple approaches, but basically I want to do something like this. (In the example below I have removed the processing and just put System.Threading.Thread.Sleep(2000). I'm aware that this is NOT an async method and that I could use Task.Delay, but the point is that my processing does not have any async methods; I just want to run the entire method as multiple instances.)
private async Task<List<TrelloCard>> ProcessJobs(IQueryable<IGrouping<CardGrouping, Job>> jobs)
{
List<TrelloCard> cards = new List<TrelloCard>();
foreach (var job in jobs.ToList())
{
var card = await ProcessCards(job, cards); // I would like to run multiple instances of the processing
cards.Add(card); // Once each instance is finished, it adds it to the list
}
return cards;
}
private async Task<TrelloCard> ProcessCards(Job job)
{
System.Threading.Thread.Sleep(2000); //Just for examples sake
return new TrelloCard();
}
I am struggling to grasp the basic concept of c# async await.
A simple definition: async/await is part of .NET concurrency and can be used to make multiple I/O calls without wasting threads, which are meant for compute operations. Typical examples are calls to a database, a web service, the network or file I/O, none of which needs to hold the current thread while waiting.
In your current case, where the use case is:
iterating through its properties and joining strings, and then creating a new object
eventually adding a list of trellocards
This seems to be a compute-bound operation. Unless you are doing I/O, it seems to me you are traversing an in-memory object, and for this case the better choice would be:
Parallel.ForEach, to parallelize the in-memory processing. You need to be careful of race conditions, though: the same memory could be accessed by multiple threads, which can corrupt it, especially during write operations. So, at least in the current code, use a thread-safe collection such as ConcurrentBag from the System.Collections.Concurrent namespace (or whichever suits the use case) instead of List<TrelloCard>, or consider a thread-safe list.
Also note that if your methods are not async by default, you can wrap them in Task.Run so that you can await them. This still needs a thread-pool thread, but it can be called using async/await.
Here is Parallel.ForEach code for your use case. I am doing a direct replacement; note that there seems to be an issue in your code: the ProcessCards function takes only a Job object, but you are also passing the cards collection, which is a compilation error:
private List<TrelloCard> ProcessJobs(IQueryable<IGrouping<CardGrouping, Job>> jobs)
{
ConcurrentBag<TrelloCard> cards = new ConcurrentBag<TrelloCard>();
Parallel.ForEach(jobs.ToList(), (job) =>
{
var card = ProcessCards(job); // I would like to run multiple instances of the processing
cards.Add(card); // Once each instance is finished, it adds it to the list
});
return cards.ToList();
}
private TrelloCard ProcessCards(Job job)
{
return new TrelloCard();
}
If you want them to run in parallel you could spawn a new Task for each operation and then await the completion of all using Task.WhenAll.
private async Task<List<TrelloCard>> ProcessJobs(IQueryable<IGrouping<CardGrouping, Job>> jobs)
{
List<Task<TrelloCard>> tasks = new List<Task<TrelloCard>>();
foreach (var job in jobs)
{
tasks.Add(ProcessCards(job));
}
var results = await Task.WhenAll(tasks);
return results.ToList();
}
private Task<TrelloCard> ProcessCards(Job job)
{
return Task.Run(() =>
{
System.Threading.Thread.Sleep(2000); //Just for examples sake
return new TrelloCard();
});
}
jobs.ToList() is just wasting memory. jobs is already an IEnumerable, so it can be used directly in a foreach.
ProcessCards doesn't compile. You need something like this:
private Task<TrelloCard> ProcessCards(Job job)
{
return Task.Run(() =>
{
System.Threading.Thread.Sleep(2000); //Just for examples sake
return new TrelloCard();
});
}
Now you want ProcessJobs to
create a ProcessCards task for each job
wait for all tasks to finish
return a sequence of TrelloCard
private async Task<List<TrelloCard>> ProcessJobs(IQueryable<IGrouping<CardGrouping, Job>> jobs)
{
return (await Task.WhenAll(jobs.Select(ProcessCards))).ToList();
}
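A self-contained sketch of the Select + Task.WhenAll approach, with the awaited array converted to a List (Task.WhenAll produces an array). An int stands in for a Job and a string for a TrelloCard, both hypothetical simplifications so that it compiles on its own:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical CPU-bound work standing in for ProcessCards; a string plays the
// role of TrelloCard so the sketch is self-contained.
Task<string> ProcessCard(int job) => Task.Run(() =>
{
    Thread.Sleep(200);                   // simulated synchronous processing
    return $"card-{job}";
});

async Task<List<string>> ProcessJobs(IEnumerable<int> jobs)
{
    // Select starts one task per job; WhenAll awaits them all and returns
    // the results in the same order as the input.
    string[] cards = await Task.WhenAll(jobs.Select(ProcessCard));
    return cards.ToList();
}

var sw = System.Diagnostics.Stopwatch.StartNew();
List<string> result = await ProcessJobs(Enumerable.Range(1, 4));
Console.WriteLine($"{string.Join(", ", result)} in {sw.ElapsedMilliseconds} ms");
```

Because the four 200 ms sleeps run on thread-pool threads at the same time, the elapsed time is typically far below 800 ms, though the exact figure depends on the machine and the thread pool.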

Why is Task.WaitAll not waiting until all of my Tasks have completed? C#

Before I start: I have looked at similar questions, and I don't think they have an answer for my situation.
I am having problems with Task.Factory.StartNew and Task.WaitAll.
I am getting null reference exceptions on an object within a created class that is initialized in the task, even though the code that throws the exception should be waiting until all tasks are complete.
If I run this code without the tasks it works fine.
Why is Task.WaitAll not waiting until all of the Tasks have been completed?
Queue<Task> tasks = new Queue<Task>();
//Go through all transactions in the file via the reader.
foreach (transaction t in xr.read_x12(_progressbar_all_processing)) {
tasks.Enqueue(Task.Factory.StartNew(() => {
//Create a new provider from the current transaction and then
//add it to the global provider list.
provider p = new provider(t);
t_info.provider_list.Add(p);
//Null out the segments of the current transaction
//We are done with them and now the garbage collector
//can clean them up for us.
t.segments = null;
}));
}
Task.WaitAll(tasks.ToArray());
foreach(provider p in t_info.providers){
//Every provider has a List<claims> claims_list
//Do something with p.claims_list
foreach(claim c in p.claims_list){ //<--null exception here
}
}
t_info.provider_list is a List<provider>. This class is not safe for multiple threads to write to at once, so you must synchronize access to the list.
lock(t_info.provider_list)
{
t_info.provider_list.Add(p);
}
This will only allow a single thread to do the Add call at a time and will fix your issues with a broken collection.
A suggestion to make this easier to get right: use Task.WhenAll instead. Make each of your tasks return a value which is the result of its own unit of work.
WhenAll has the signature:
Task<TResult[]> WhenAll<TResult>(IEnumerable<Task<TResult>> tasks)
Task.WhenAll on MSDN
So you pass it a collection of tasks that each evaluate to a TResult and you get back a task that evaluates to an array containing all the results when they're done.
This way, you are absolved of any responsibility for using thread-safe collections to pass data between tasks. It's much harder to get wrong.
It's also compatible with async/await, which is all about consuming values returned via tasks.
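A minimal sketch of that suggestion, with hypothetical stand-ins (an int for a transaction, a string for the provider built from it): each task returns its own result, and the list is only built once Task.WhenAll has produced the results array:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-ins: an int plays the role of a transaction and a string
// the role of the provider built from it.
string BuildProvider(int transaction) => $"provider-{transaction}";

int[] transactions = Enumerable.Range(1, 1000).ToArray();

// Each task returns its own result instead of calling Add on a shared List<T>,
// so no lock or concurrent collection is needed.
Task<string>[] tasks = transactions
    .Select(t => Task.Run(() => BuildProvider(t)))
    .ToArray();

string[] providers = await Task.WhenAll(tasks);

// Only this code touches the list, and only after every task has finished.
var providerList = new List<string>(providers);
Console.WriteLine(providerList.Count); // 1000
```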

Task.Run with Parameter(s)?

I'm working on a multi-tasking network project and I'm new to Threading.Tasks. I implemented a simple Task.Factory.StartNew() and I wonder how I can do it with Task.Run()?
Here is the basic code:
Task.Factory.StartNew(new Action<object>(
(x) =>
{
// Do something with 'x'
}), rawData);
I looked into System.Threading.Tasks.Task in the Object Browser and I couldn't find an Action<T>-like parameter. There is only Action, which takes no parameters.
There are only two similar things, static Task Run(Action action) and static Task Run(Func<Task> function), but neither lets me pass parameters.
Yes, I know I can create a simple extension method for it but my main question is can we write it on single line with Task.Run()?
private void RunAsync()
{
//Beware of closures. String is immutable.
string param = "Hi";
Task.Run(() => MethodWithParameter(param));
}
private void MethodWithParameter(string param)
{
//Do stuff
}
Edit
Due to popular demand, I must note that the Task launched will run in parallel with the calling thread. Assuming the default TaskScheduler, this will use the .NET ThreadPool. Anyway, this means you need to account for whatever parameters are passed to the Task as potentially being accessed by multiple threads at once, making them shared state. This includes accessing them on the calling thread.
In my above code that case is made entirely moot. Strings are immutable. That's why I used them as an example. But say you're not using a String...
One solution is to use async and await. This, by default, will capture the SynchronizationContext of the calling thread and will create a continuation for the rest of the method after the call to await and attach it to the created Task. If this method is running on the WinForms GUI thread it will be of type WindowsFormsSynchronizationContext.
The continuation will run after being posted back to the captured SynchronizationContext - again only by default. So you'll be back on the thread you started with after the await call. You can change this in a variety of ways, notably using ConfigureAwait. In short, the rest of that method will not continue until after the Task has completed on another thread. But the calling thread will continue to run in parallel, just not the rest of the method.
This waiting to complete running the rest of the method may or may not be desirable. If nothing in that method later accesses the parameters passed to the Task you may not want to use await at all.
Or maybe you use those parameters much later on in the method. There is no reason to await immediately, as you could continue safely doing work. Remember, you can store the returned Task in a variable and await it later, even in the same method. For instance, await it once you need to access the passed parameters safely after doing some other work. Again, you do not need to await the Task right when you run it.
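A sketch of storing the task and awaiting it later, assuming a hypothetical MethodWithParameter that sums an int array so the result is observable. The array is shared with the task, so it is only modified again after the await:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical work method standing in for MethodWithParameter; it returns a
// value here so the effect of awaiting later is visible.
int MethodWithParameter(int[] values)
{
    int total = 0;
    foreach (int v in values) total += v;
    return total;
}

int[] data = { 1, 2, 3, 4 };

// Start the task but do not await it yet.
Task<int> pending = Task.Run(() => MethodWithParameter(data));

// Other work that does NOT touch 'data' can run here in parallel with the task.
int unrelated = 6 * 7;

// Only after the await is it safe to modify 'data' again.
int sum = await pending;
data[0] = 100;
Console.WriteLine($"{sum} {unrelated}"); // 10 42
```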
Anyways, a simple way to make this thread-safe with respect to the parameters passed to Task.Run is to do this:
You must first decorate RunAsync with async:
private async void RunAsync()
Important Notes
Preferably the method marked async should not return void, as the linked documentation mentions. The common exception to this is event handlers such as button clicks and such. They must return void. Otherwise I always try to return a Task or Task<TResult> when using async. It's good practice for quite a few reasons.
Now you can await running the Task like below. You cannot use await without async.
await Task.Run(() => MethodWithParameter(param));
//Code here and below in the same method will not run until AFTER the above task has completed in one fashion or another
So, in general, if you await the task you can avoid treating passed in parameters as a potentially shared resource with all the pitfalls of modifying something from multiple threads at once. Also, beware of closures. I won't cover those in depth but the linked article does a great job of it.
Regarding Run and StartNew the code below I find most important to know, really. There are legitimate reasons to use either, neither is obsolete or "better" than the other. Be aware simply replacing one with the other is a very bad idea unless you understand this:
//These are exactly the same
Task.Run(x);
Task.Factory.StartNew(x, CancellationToken.None,
TaskCreationOptions.DenyChildAttach, TaskScheduler.Default);
//These are also exactly the same
Task.Factory.StartNew(x);
Task.Factory.StartNew(x, CancellationToken.None,
TaskCreationOptions.None, TaskScheduler.Current);
Side Notes
A bit off topic, but be careful using any type of "blocking" on the WinForms GUI thread due to it being marked with [STAThread]. Using await won't block at all, but I do sometimes see it used in conjunction with some sort of blocking.
"Block" is in quotes because you technically cannot block the WinForms GUI thread. Yes, if you use lock on the WinForms GUI thread it will still pump messages, despite you thinking it's "blocked". It's not.
This can cause bizarre issues in very rare cases, which is one of the reasons you never want to use a lock when painting, for example. That's a fringe and complex case, but I've seen it cause crazy issues, so I noted it for completeness' sake.
Use variable capture to "pass in" parameters.
var x = rawData;
Task.Run(() =>
{
// Do something with 'x'
});
You could also use rawData directly, but you must be careful: if you change the value of rawData outside of the task (for example, an iterator in a for loop), it will also change the value inside the task.
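A sketch of the for-loop case, assuming each task just squares its index. A for loop's variable is one variable shared by all iterations (unlike a foreach iteration variable, which is fresh on each iteration since C# 5), so copying it into a variable declared inside the loop body gives each lambda its own copy:

```csharp
using System;
using System.Threading.Tasks;

var tasks = new Task<int>[10];
for (int i = 0; i < tasks.Length; i++)
{
    // 'i' is a single variable shared by every iteration of a for loop, so a
    // lambda that captured it directly could observe a later value.
    // Copying it into a variable declared inside the loop body gives each
    // lambda its own, unchanging copy.
    int copy = i;
    tasks[i] = Task.Run(() => copy * copy);
}

int[] squares = await Task.WhenAll(tasks);
Console.WriteLine(string.Join(", ", squares)); // 0, 1, 4, 9, 16, 25, 36, 49, 64, 81
```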
You can also pass the value as the state argument of TaskFactory.StartNew:
Action<object> action = (o) => Thread.Sleep((int)o);
int param = 10;
await new TaskFactory().StartNew(action, param);
I know this is an old thread, but I wanted to share a solution I ended up having to use since the accepted post still has an issue.
The Issue:
As pointed out by Alexandre Severino, if param (in the function below) changes shortly after the function call, you might get some unexpected behavior in MethodWithParameter.
Task.Run(() => MethodWithParameter(param));
My Solution:
To account for this, I ended up writing something more like the following line of code:
(new Func<T, Task>(async (p) => await Task.Run(() => MethodWithParameter(p)))).Invoke(param);
This allowed me to safely use the parameter asynchronously despite the fact that the parameter changed very quickly after starting the task (which caused issues with the posted solution).
Using this approach, param (value type) gets its value passed in, so even if the async method runs after param changes, p will have whatever value param had when this line of code ran.
Just use Task.Run
var task = Task.Run(() =>
{
//this will already share scope with rawData, no need to use a placeholder
});
Or, if you would like to use it in a method and await the task later
public Task<T> SomethingAsync<T>()
{
var task = Task.Run(() =>
{
//presumably do something which takes a few ms here
//this will share scope with any passed parameters in the method
return default(T);
});
return task;
}
It's unclear if the original problem was the same problem I had: wanting to max out CPU threads on computation inside a loop while preserving the iterator's value, and keeping the code inline to avoid passing a ton of variables to a worker function.
for (int i = 0; i < 300; i++)
{
Task.Run(() => {
var x = ComputeStuff(datavector, i); // value of i was incorrect
var y = ComputeMoreStuff(x);
// ...
});
}
I got this to work by changing the outer iterator and localizing its value with a gate.
for (int ii = 0; ii < 300; ii++)
{
System.Threading.CountdownEvent handoff = new System.Threading.CountdownEvent(1);
Task.Run(() => {
int i = ii;
handoff.Signal();
var x = ComputeStuff(datavector, i);
var y = ComputeMoreStuff(x);
// ...
});
handoff.Wait();
}
The idea is to avoid using a signal like the one above.
Copying the int values into a struct that is declared inside the loop body prevents those values from changing: each iteration gets its own copy.
I had the following problem: the loop variable i would change before DoSomething(i) was called (i was incremented at the end of the loop before () => DoSomething(i, ii) ran). With the structs it doesn't happen anymore. It is a nasty bug to find: DoSomething(i, ii) looks great, but you are never sure whether it gets called each time with a different value of i (or just 100 times with i = 100), hence the struct.
struct Job { public int P1; public int P2; }
…
for (int i = 0; i < 100; i++) {
var job = new Job { P1 = i, P2 = i * i }; // a fresh copy for each iteration
Task.Run(() => DoSomething(job));
}
There is another way of doing this. I found it useful.
int param = 10; // the value to pass to the work item
ThreadPool.QueueUserWorkItem(someMethod, param);
void someMethod(object parameter){
var param = (int) parameter;
// do the job
}
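In modern .NET (not .NET Framework) there is also a generic overload, ThreadPool.QueueUserWorkItem<TState>(Action<TState>, TState, bool preferLocal), which passes the state with its real type and avoids the cast. A sketch, assuming a ManualResetEventSlim is an acceptable way for the caller to wait for the result:

```csharp
using System;
using System.Threading;

int param = 21;
int result = 0;
using var done = new ManualResetEventSlim(false);

// Generic overload: TState is inferred as int, so the callback receives an int
// directly instead of an object that must be cast back.
ThreadPool.QueueUserWorkItem(value =>
{
    result = value * 2;      // the actual job
    done.Set();              // tell the caller the work item has finished
}, param, preferLocal: false);

done.Wait();
Console.WriteLine(result); // 42
```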

What is causing this particular method to deadlock?

As best as I can, I opt for async all the way down. However, I am still stuck using ASP.NET Membership which isn't built for async. As a result my calls to methods like string[] GetRolesForUser() can't use async.
In order to build roles properly I depend on data from various sources so I am using multiple tasks to fetch the data in parallel:
public override string[] GetRolesForUser(string username) {
...
Task.WaitAll(taskAccounts, taskContracts, taskOtherContracts, taskMoreContracts, taskSomeProduct);
...
}
All of these tasks simply fetch data from a SQL Server database using Entity Framework. However, the introduction of that last task (taskSomeProduct) is causing a deadlock, while none of the other methods have.
Here is the method that causes a deadlock:
public async Task<int> SomeProduct(IEnumerable<string> ids) {
var q = from c in this.context.Contracts
join p in this.context.Products
on c.ProductId equals p.Id
where ids.Contains(c.Id)
select p.Code;
//Adding .ConfigureAwait(false) fixes the problem here
var codes = await q.ToListAsync();
var slotCount = codes.Sum(p => char.GetNumericValue(p, p.Length - 1));
return Convert.ToInt32(slotCount);
}
However, this method (which looks very similar to all the other methods) isn't causing deadlocks:
public async Task<List<CustomAccount>> SomeAccounts(IEnumerable<string> ids) {
return await this.context.Accounts
.Where(o => ids.Contains(o.Id))
.ToListAsync()
.ToCustomAccountListAsync();
}
I'm not quite sure what it is about that one method that is causing the deadlock. Ultimately they are both doing the same task of querying the database. Adding ConfigureAwait(false) to the one method does fix the problem, but I'm not quite sure what differentiates it from the other methods, which execute fine.
Edit
Here is some additional code which I originally omitted for brevity:
public static Task<List<CustomAccount>> ToCustomAccountListAsync(this Task<List<Account>> sqlObjectsTask) {
var sqlObjects = sqlObjectsTask.Result;
var customObjects = sqlObjects.Select(o => PopulateCustomAccount(o)).ToList();
return Task.FromResult<List<CustomAccount>>(customObjects);
}
The PopulateCustomAccount method simply returns a CustomAccount object from the database Account object.
In ToCustomAccountListAsync you call Task.Result. That's a classic deadlock. Use await.
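A sketch of the suggested fix, with hypothetical stand-ins (ints for Account rows, strings for CustomAccount objects, LoadAccountsAsync for the ToListAsync query). In the real code the method stays an extension method in a static class; here it is a local function only so that the example is self-contained:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for PopulateCustomAccount.
string PopulateCustomAccount(int account) => $"custom-{account}";

// Same shape as ToCustomAccountListAsync, but it awaits the incoming task
// instead of blocking on .Result, so no thread sits blocked while the
// query is still running.
async Task<List<string>> ToCustomAccountListAsync(Task<List<int>> sqlObjectsTask)
{
    List<int> sqlObjects = await sqlObjectsTask;
    return sqlObjects.Select(PopulateCustomAccount).ToList();
}

// Hypothetical stand-in for the ToListAsync() query.
async Task<List<int>> LoadAccountsAsync()
{
    await Task.Delay(50);
    return new List<int> { 1, 2, 3 };
}

List<string> accounts = await ToCustomAccountListAsync(LoadAccountsAsync());
Console.WriteLine(string.Join(", ", accounts)); // custom-1, custom-2, custom-3
```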
This is not an answer, but I have a lot to say, it wouldn't fit in comments.
Some facts: an EF context is not thread-safe and doesn't support parallel execution:
While thread safety would make async more useful it is an orthogonal feature. It is unclear that we could ever implement support for it in the most general case, given that EF interacts with a graph composed of user code to maintain state and there aren't easy ways to ensure that this code is also thread safe.
For the moment, EF will detect if the developer attempts to execute two async operations at one time and throw.
Some predictions:
You say that:
The parallel execution of the other four tasks has been in production for months without deadlocking.
They can't be executing in parallel. One possibility is that the thread pool cannot assign more than one thread to your operations; in that case they would be executed sequentially. Or it could be the way you are initializing your tasks; I'm not sure. Assuming they are executed sequentially (otherwise you would have recognized the exception I'm talking about), there is another problem:
Task.WaitAll hanging with multiple awaitable tasks in ASP.NET
So maybe it isn't about that specific task SomeProduct but it always happens on the last task? Well, if they executed in parallel, there wouldn't be a "last task" but as I've already pointed out, they must be running sequentially considering they had been in production for quite a long time.
