I have the following situation (or a basic misunderstanding of the async/await mechanism).
Assume you have a set of 1-20 long-running web request calls: findItemsByProduct().
I want to wrap them in a single async method that abstracts all of these calls into one async call, but I can't seem to do it without using more threads.
If I'm doing:
int total = result.paginationOutput.totalPages;
for (int i = 2; i < total + 1; i++)
{
    await Task.Factory.StartNew(() =>
    {
        result = client.findItemsByProduct(i);
    });
    newList.AddRange(result.searchResult.item);
}
return newList;
The problem here is that the calls don't run together; rather, they wait for one another, one by one.
I would like all the calls to run together and then harvest the results.
As pseudocode, I would like the code to run like this:
forEach item {
result = item.makeWebRequest();
}
foreach item {
List.addRange(item.harvestResults);
}
I have no idea how to make the code do that, though.
Ideally, you should add a findItemsByProductAsync that returns a Task<Item[]>. That way, you don't have to create unnecessary tasks using StartNew or Task.Run.
Then your code can look like this:
int total = result.paginationOutput.totalPages;
// Start all downloads; each download is represented by a task.
Task<Item[]>[] tasks = Enumerable.Range(2, total - 1)
.Select(i => client.findItemsByProductAsync(i)).ToArray();
// Wait for all downloads to complete.
Item[][] results = await Task.WhenAll(tasks);
// Flatten the results into a single collection.
return results.SelectMany(x => x).ToArray();
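If the client only exposes a synchronous findItemsByProduct, a thin wrapper can stand in until a real async API exists. This is only a sketch: the Client type name is assumed, the response shape (searchResult.item) is taken from the question, and Task.Run is a stopgap that still burns a thread-pool thread per call rather than giving true asynchrony:
// Hypothetical wrapper; prefer a natively asynchronous client method if one is available.
public static Task<Item[]> findItemsByProductAsync(this Client client, int page)
{
    return Task.Run(() => client.findItemsByProduct(page).searchResult.item.ToArray());
}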
Given your requirements, which I see as:
Process n number of non-blocking tasks
Process results after all queries have returned
I would use a CountdownEvent for this, e.g.:
var results = new ConcurrentBag<ItemType>();
using (var e = new CountdownEvent(result.pagination.totalPages - 1))
{
    for (int i = 2; i <= result.pagination.totalPages; i++)
    {
        int page = i; // capture a copy of the loop variable for the closure
        Task.Factory.StartNew(() => client.findItemsByProduct(page).searchResult.item)
            .ContinueWith(t => {
                foreach (var item in t.Result)
                    results.Add(item); // ConcurrentBag has no AddRange
                e.Signal(); // signal this task is done
            });
    }
    // Wait for all requests to complete
    e.Wait();
}
// Process results
foreach (var item in results)
{
...
}
This particular problem is solved easily enough without even using await. Simply create each of the tasks, put all of the tasks into a list, and then use WhenAll on that list to get a task that represents the completion of all of those tasks:
public static Task<Item[]> Foo()
{
    int total = result.paginationOutput.totalPages;
    var tasks = new List<Task<Item>>();
    for (int i = 2; i < total + 1; i++)
    {
        int page = i; // capture a copy so each task uses its own page number
        tasks.Add(Task.Factory.StartNew(() => client.findItemsByProduct(page)));
    }
    return Task.WhenAll(tasks);
}
Also note that you have a major problem in how you use result in your code. You have each of the different tasks using the same variable, so there are race conditions over whether or not it works properly. You could end up adding the results of the same call twice and skipping another call entirely. Instead, you should make the call to findItemsByProduct the result of the task, and use that task's Result.
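The same pitfall applies to capturing the loop variable directly. A minimal sketch of the difference (the Response type and the page variable are illustrative, not from the original code):
// Racy: every closure shares the same result variable and the same i,
// so pages can be processed twice or skipped entirely depending on timing.
Response result = null;
for (int i = 2; i < total + 1; i++)
{
    tasks.Add(Task.Factory.StartNew(() => { result = client.findItemsByProduct(i); }));
}

// Safe: copy the loop variable and make the fetched page the task's Result.
for (int i = 2; i < total + 1; i++)
{
    int page = i;
    tasks.Add(Task.Factory.StartNew(() => client.findItemsByProduct(page)));
}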
If you want to use async/await properly you have to declare your functions async, and the functions that call you also have to be async. This continues until you have one synchronous function that starts the async process.
By the way, you didn't describe what's in the list. I assume the items are objects of type T; in that case result.searchResult.item returns IEnumerable<T>.
Your function would look like this:
private async Task<List<T>> FindItems(...)
{
    int total = result.paginationOutput.totalPages;
    var newList = new List<T>();
    for (int i = 2; i < total + 1; i++)
    {
        var pageResult = await Task.Factory.StartNew(() =>
        {
            return client.findItemsByProduct(i);
        });
        newList.AddRange(pageResult.searchResult.item);
    }
    return newList;
}
If you do it this way, your function will be asynchronous, but the calls to findItemsByProduct will be executed one after another. If you want to execute them simultaneously, you should not await each result, but start the next task before the previous one is finished. Once all tasks are started, wait until all are finished. Like this:
private async Task<List<T>> FindItems(...)
{
    int total = result.paginationOutput.totalPages;
    var tasks = new List<Task<IEnumerable<T>>>();
    // start all tasks; don't wait for the results yet
    for (int i = 2; i < total + 1; i++)
    {
        int page = i; // capture a copy so each task fetches its own page
        Task<IEnumerable<T>> task = Task.Factory.StartNew<IEnumerable<T>>(() =>
        {
            return client.findItemsByProduct(page).searchResult.item;
        });
        tasks.Add(task);
    }
    // now that all tasks are started, wait until all are finished
    await Task.WhenAll(tasks);
    // the result of each task is now in task.Result,
    // and each result is an IEnumerable<T>;
    // put them all into one big list using some LINQ:
    return tasks.SelectMany(task => task.Result).ToList();
    // if you're not familiar with LINQ yet, use a foreach instead:
    // var newList = new List<T>();
    // foreach (var task in tasks)
    // {
    //     newList.AddRange(task.Result);
    // }
    // return newList;
}
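To illustrate the "async all the way up" point from earlier, whoever calls FindItems has to be async as well (or be the one synchronous entry point that starts the work). A minimal sketch with a hypothetical caller; only FindItems comes from the code above:
private async Task ShowItemsAsync()
{
    // await propagates the asynchrony upward instead of blocking
    List<T> items = await FindItems(...);
    // DisplayItems is a made-up placeholder for whatever consumes the list
    DisplayItems(items);
}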
Related
Let's say I want to download 1000 recipes from a website. The website accepts at most 10 concurrent connections. Each recipe should be stored in an array, at its corresponding index. (I don't want to send the array to the DownloadRecipe method.)
Technically, I've already solved the problem, but I would like to know if there is an even cleaner way to use async/await or something else to achieve it?
static async Task MainAsync()
{
int recipeCount = 1000;
int connectionCount = 10;
string[] recipes = new string[recipeCount];
Task<string>[] tasks = new Task<string>[connectionCount];
int r = 0;
while (r < recipeCount)
{
for (int t = 0; t < tasks.Length; t++)
{
int index = r; // capture a copy of r so each task downloads its own recipe
tasks[t] = Task.Run(async () => recipes[index] = await DownloadRecipe(index));
r++;
}
await Task.WhenAll(tasks);
}
}
static async Task<string> DownloadRecipe(int index)
{
// ... await calls to download recipe
}
Also, this solution isn't optimal, since it doesn't bother starting a new download until all 10 running downloads are finished. Is there something we can improve without bloating the code too much? A thread pool limited to 10 threads?
There are many, many ways you could do this. One way is to use an ActionBlock, which gives you access to MaxDegreeOfParallelism fairly easily and works well with async methods:
static async Task MainAsync()
{
var recipeCount = 1000;
var connectionCount = 10;
var recipes = new string[recipeCount];
async Task Action(int i) => recipes[i] = await DownloadRecipe(i);
var processor = new ActionBlock<int>(Action, new ExecutionDataflowBlockOptions()
{
MaxDegreeOfParallelism = connectionCount,
SingleProducerConstrained = true
});
for (var i = 0; i < recipeCount; i++)
await processor.SendAsync(i);
processor.Complete();
await processor.Completion;
}
static async Task<string> DownloadRecipe(int index)
{
...
}
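Note that ActionBlock<T> and ExecutionDataflowBlockOptions live in the System.Threading.Tasks.Dataflow namespace (the TPL Dataflow NuGet package), so the snippet above assumes a using System.Threading.Tasks.Dataflow; directive.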
Another way might be to use a SemaphoreSlim
var slim = new SemaphoreSlim(connectionCount, connectionCount);
var tasks = Enumerable
.Range(0, recipeCount)
.Select(Selector);
async Task<string> Selector(int i)
{
await slim.WaitAsync();
try
{
return await DownloadRecipe(i);
}
finally
{
slim.Release();
}
}
var recipes = await Task.WhenAll(tasks);
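Note that tasks is a lazy LINQ query here: the Selector calls only start when Task.WhenAll enumerates it, and from then on the SemaphoreSlim ensures that at most connectionCount downloads are in flight at once.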
Another set of approaches is to use Reactive Extensions (Rx)... Once again there are many ways to do this, this is just an awaitable approach (and likely could be better all things considered)
var results = await Enumerable
.Range(0, recipeCount)
.ToObservable()
.Select(i => Observable.FromAsync(() => DownloadRecipe(i)))
.Merge(connectionCount)
.ToArray()
.ToTask();
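This Rx sketch assumes the System.Reactive package, with using directives for System.Reactive.Linq (ToObservable, FromAsync, Merge) and System.Reactive.Threading.Tasks (ToTask).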
An alternative approach is to have 10 "pools" which load data "simultaneously".
You don't need to wrap IO operations in a separate thread; using a separate thread for IO operations is just a waste of resources.
Notice that a thread which downloads data does nothing but wait for a response. This is where the async-await approach comes in very handy: we can send multiple requests without waiting for them to complete and without wasting threads.
static async Task MainAsync()
{
var requests = Enumerable.Range(0, 1000).ToArray();
var maxConnections = 10;
var pools = requests
.GroupBy(i => i % maxConnections)
.Select(group => DownloadRecipesFor(group.ToArray()))
.ToArray();
await Task.WhenAll(pools);
var recipes = pools.SelectMany(pool => pool.Result).ToArray();
}
static async Task<IEnumerable<string>> DownloadRecipesFor(params int[] requests)
{
var recipes = new List<string>();
foreach (var request in requests)
{
var recipe = await DownloadRecipe(request);
recipes.Add(recipe);
}
return recipes;
}
Because inside a pool (the DownloadRecipesFor method) we download results one by one, we make sure that we have no more than 10 active requests at any time.
This is a little more efficient than the original, because we don't wait for 10 tasks to complete before starting the next batch.
It is not ideal, though, because if the last "pool" finishes earlier than the others, it isn't able to pick up any of their remaining requests.
The final result will have corresponding indexes, because we process the "pools", and the requests inside them, in the same order as we created them.
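One way around that last limitation (not part of the original answer, just a sketch reusing DownloadRecipe from the question) is to let a fixed number of workers pull indexes from a shared queue, so any worker that finishes early simply takes the next pending request:
static async Task MainAsync()
{
    const int recipeCount = 1000;
    const int maxConnections = 10;
    var recipes = new string[recipeCount];
    // shared work queue of recipe indexes (System.Collections.Concurrent)
    var pending = new ConcurrentQueue<int>(Enumerable.Range(0, recipeCount));
    // start maxConnections workers; each keeps dequeuing until the queue is empty
    var workers = Enumerable.Range(0, maxConnections).Select(async _ =>
    {
        while (pending.TryDequeue(out var index))
        {
            recipes[index] = await DownloadRecipe(index);
        }
    });
    await Task.WhenAll(workers);
}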
I want to generate a List<Task> and then run them all in parallel. I tried to do this, but I don't understand how to pass the index into the lambda expression. In my implementation, every task ends up with the last index of the for loop, but I want each task to use its own index in the parallel computation...
List<Task> tasks = new List<Task>();
var valueSize = Values.Count;
for (int i = 1; i <= valueSize; i++)
{
tasks.Add(Task.Factory.StartNew(
() => {
this.Values[i] = this._tcpRepository.GetValueAsync(i).ToString().GetInt32();
}));
}
Task.WaitAll(tasks.ToArray());
Additional information:
public Dictionary<int, int> Values { get; set; }
In TcpRepository
public async Task<string> GetValueAsync(int digit)
{
var netStream = this.TcpClient.GetStream();
if (netStream.CanWrite)
{
Byte[] sendBytes = Encoding.UTF8.GetBytes(digit.ToString());
await netStream.WriteAsync(sendBytes, 0, sendBytes.Length);
}
if (netStream.CanRead)
{
byte[] bytes = new byte[this.Client.ReceiveBufferSize];
await netStream.ReadAsync(bytes, 0, (int)this.Client.ReceiveBufferSize);
return bytes.ToString();
}
return null;
}
GetInt32() is my custom extension method for string: public static int GetInt32(this string value). The strings can come back with garbage characters that are not numbers.
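The body of GetInt32() isn't shown above, so purely for illustration, such an extension might look roughly like this: strip everything that isn't a digit and parse what remains.
public static class StringExtensions
{
    // Hypothetical implementation; the real one from the question is not shown.
    public static int GetInt32(this string value)
    {
        var digits = new string(value.Where(char.IsDigit).ToArray());
        return digits.Length > 0 ? int.Parse(digits) : 0;
    }
}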
There are a few issues here.
To directly answer your question, you need to capture a copy of i for each task so that the value isn't updated before it is accessed in your async code.
You can do this by swapping your for loop with Enumerable.Range.
Another issue is that you are running GetValueAsync on the ThreadPool but not awaiting it. Consequently, your Task.WaitAll only waits for the outer Tasks, not the Tasks returned by GetValueAsync.
Here is an example of what you could do:
var tasks = Enumerable.Range(1, valueSize)
.Select(async i =>
{
string val = await _tcpRepository.GetValueAsync(i);
Values[i] = val.GetInt32();
});
Your final issue is the use of Task.WaitAll; this introduces the sync-over-async antipattern. You should allow async to grow through your code base, and instead use Task.WhenAll, which returns a Task that completes when all the provided Tasks are complete:
await Task.WhenAll(tasks);
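Put together, the enclosing method itself becomes async (a sketch; the method name is made up, everything else is from the snippets above):
// Hypothetical containing method; it returns Task rather than void so callers can await it in turn.
public async Task PopulateValuesAsync()
{
    var valueSize = Values.Count;
    var tasks = Enumerable.Range(1, valueSize)
        .Select(async i =>
        {
            string val = await _tcpRepository.GetValueAsync(i);
            // note: a plain Dictionary is not thread-safe if these continuations
            // run concurrently; a ConcurrentDictionary would be safer
            Values[i] = val.GetInt32();
        });
    await Task.WhenAll(tasks);
}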
I'm not sure if I'm misunderstanding the usage of Task.WhenAny, but in the following code only "0" gets printed, when it should print "1" and "2" and then mx.Name every time a task finishes:
public async void PopulateMxRecords(List<string> nsRecords, int threads)
{
ThreadPool.SetMinThreads(threads, threads);
var resolver = new DnsStubResolver();
var tasks = nsRecords.Select(ns => resolver.ResolveAsync<MxRecord>(ns, RecordType.Mx));
Console.WriteLine("0");
var finished = Task.WhenAny(tasks);
Console.WriteLine("1");
while (mxNsRecords.Count < nsRecords.Count)
{
Console.WriteLine("2");
var task = await finished;
var mxRecords = await task;
foreach(var mx in mxRecords)
Console.WriteLine(mx.Name);
}
}
The DnsStubResolver is part of ARSoft.Tools.Net.Dns. The nsRecords list contains up to 2 million strings.
I'm not sure if I'm misunderstanding the usage of Task.WhenAny
You might be. The pattern you seem to be looking for is interleaving. In the following example, notice the important changes that I have made:
use ToList() to materialize the LINQ query results,
move WhenAny() into the loop,
use Remove(task) as each task completes, and
run the while loop as long as tasks.Count() > 0.
Those are the important changes. The other changes are there to make your listing into a runnable demo of interleaving, the full listing of which is here: https://dotnetfiddle.net/nr1gQ7
public static async Task PopulateMxRecords(List<string> nsRecords)
{
var tasks = nsRecords.Select(ns => ResolveAsync(ns)).ToList();
while (tasks.Count() > 0)
{
var task = await Task.WhenAny(tasks);
tasks.Remove(task);
var mxRecords = await task;
Console.WriteLine(mxRecords);
}
}
Say I had a list of algorithms, each containing an async call somewhere in the body of that algorithm. The order in which I execute the algorithms is the order in which I want to receive the results; that is, I want the AlgorithmResult list to look like {Algorithm1Result, Algorithm2Result, Algorithm3Result} after all the algorithms have executed. Would I be right in saying that if Algorithm1 and Algorithm3 finished before Algorithm2, my results would actually be in the order {Algorithm1Result, Algorithm3Result, Algorithm2Result}?
var algorithms = new List<Algorithm>(){Algorithm1, Algorithm2, Algorithm3};
var algorithmResults = new List<AlgorithmResults>();
foreach (var algorithm in algorithms)
{
algorithmResults.Add(await algorithm.Execute());
}
No, the results would be in the same order you added them to the list, since each operation is awaited separately before the next one starts.
class Program
{
public static async Task<int> GetResult(int timing)
{
return await Task.Run(() =>
{
Thread.Sleep(timing * 1000);
return timing;
});
}
public static async Task<List<int>> GetAll()
{
List<int> tasks = new List<int>();
tasks.Add(await GetResult(3));
tasks.Add(await GetResult(2));
tasks.Add(await GetResult(1));
return tasks;
}
static void Main(string[] args)
{
var res = GetAll().Result;
}
}
res contains the list in the order it was added either way; note also that this is not parallel execution.
Since you don't add the tasks, but the results of the tasks after awaiting them, they will be in the order you want.
Even if you did not await each result before adding the next task, you could still get the order you want.
In small steps, showing the types involved:
List<Task<AlgorithmResult>> tasks = new List<Task<AlgorithmResult>>();
foreach (Algorithm algorithm in algorithms)
{
Task<AlgorithmResult> task = algorithm.Execute();
// don't wait until task Completes, just remember the Task
// and continue executing the next Algorithm
tasks.Add(task);
}
Now some Tasks may be running, some may already have completed. Let's wait until they are all complete, and fetch the results:
await Task.WhenAll(tasks);
List<AlgorithmResult> results = tasks.Select(task => task.Result).ToList();
I need to do something like this; Test() is just calculations, and I only want to put all CPU cores to work to find the solution faster.
for (int i= 0; i<1000000; i++)
{
result[i] = Test(i);
if (result[i] == 0)
{
break;
}
}
I have worked with BackgroundWorker before, and I could create an array of N bgWorkers and handle a queue myself, but that looks like too much trouble.
So I found Task.Factory, which seems similar to what I want, but I still don't know how to handle each separate task, wait for its result, and stop everything when the answer is found.
Task<string> task = Task.Factory.StartNew<string>
(() => DownloadString("http://www.google.com/"));
string result = task.Result;
Or maybe there is another solution for my problem.
Parallel.For(0, 1000000, (i, loopState) =>
{
result[i] = Test(i);
if (result[i] == 0)
{
loopState.Stop();
return;
}
});
How to: Write a Simple Parallel.For Loop
How to: Stop or Break from a Parallel.For Loop
You actually don't need to put multiple cores to use, since you're doing IO-bound work.
You could take advantage of naturally async APIs for HTTP requests, such as those exposed by HttpClient. Then you could asynchronously check each task as it completes to see if it contains the correct result:
public async Task<string> FindResultAsync(string[] urlList)
{
    var downloadTasks = urlList.Select(url => DownloadStringAsync(url)).ToList();
    while (downloadTasks.Count > 0)
    {
        var finishedTask = await Task.WhenAny(downloadTasks);
        if (finishedTask.Result == someValue)
        {
            return finishedTask.Result;
        }
        downloadTasks.Remove(finishedTask);
    }
    return null;
}
}