C# LanguageExt - combine multiple async calls into one grouped call - c#

I have a method that looks up an item asynchronously from a datastore;
class MyThing {}
Task<Try<MyThing>> GetThing(int thingId) {...}
I want to look up multiple items from the datastore, and wrote a new method to do this. I also wrote a helper method that will take multiple Try<T> and combine their results into a single Try<IEnumerable<T>>.
public static class TryExtensions
{
Try<IEnumerable<T>> Collapse<T>(this IEnumerable<Try<T>> items)
{
var failures = items.Fails().ToArray();
return failures.Any() ?
Try<IEnumerable<T>>(new AggregateException(failures)) :
Try(items.Select(i => i.Succ(a => a).Fail(Enumerable.Empty<T>())));
}
}
async Task<Try<MyThing[]>> GetThings(IEnumerable<string> ids)
{
var results = new List<Try<Things>>();
foreach (var id in ids)
{
var thing = await GetThing(id);
results.Add(thing);
}
return results.Collapse().Map(p => p.ToArray());
}
Another way to do it would be like this;
async Task<Try<MyThing[]>> GetThings(IEnumerable<string> ids)
{
var tasks = ids.Select(async id => await GetThing(id)).ToArray();
await Task.WhenAll(tasks);
return tasks.Select(t => t.Result).Collapse().Map(p => p.ToArray());
}
The problem with this is that all the tasks will run in parallel and I don't want to hammer my datastore with lots of parallel requests. What I really want is to make my code functional, using monadic principles and features of LanguageExt. Does anyone know how to achieve this?
Update
Thanks for the suggestion #MatthewWatson, this is what it looks like with the SemaphoreSlim;
async Task<Try<MyThing[]>> GetThings(IEnumerable<string> ids)
{
var mutex = new SemaphoreSlim(1);
var results = ids.Select(async id =>
{
await mutex.WaitAsync();
try { return await GetThing(id); }
finally { mutex.Release(); }
}).ToArray();
await Task.WhenAll(tasks);
return tasks.Select(t => t.Result).Collapse().Map(Enumerable.ToArray);
return results.Collapse().Map(p => p.ToArray());
}
Problem is, this is still not very monadic / functional, and ends up with more lines of code than my original code with a foreach block.

In the "Another way" you almost achieved your goal when you called:
var tasks = ids.Select(async id => await GetThing(id)).ToArray();
Except that Tasks doesn't run sequentially so you will end up with many queries hitting your datastore, which is caused by .ToArray() and Task.WhenAll. Once you called .ToArray() it allocated and started the Tasks already, so if you can "tolerate" one foreach to achieve the sequential tasks running, like this:
public static class TaskExtensions
{
public static async Task RunSequentially<T>(this IEnumerable<Task<T>> tasks)
{
foreach (var task in tasks) await task;
}
}
Despite that running a "loop" of queries is not a quite good practice
in general, unless you have in some background service and some
special scenario, leveraging this to the Database engine through
WHERE thingId IN (...) in general is a better option. Even you
have big amount of thingIds we can slice it into small 10s, 100s.. to
narrow the WHERE IN footprint.
Back to our RunSequentially, I would like to make it more functional like this for example:
tasks.ToList().ForEach(async task => await task);
But sadly this will still run kinda "Parallel" tasks.
So the final usage should be:
async Task<Try<MyThing[]>> GetThings(IEnumerable<string> ids)
{
var tasks = ids.Select(id => GetThing(id));// remember don't use .ToArray or ToList...
await tasks.RunSequentially();
return tasks.Select(t => t.Result).Collapse().Map(p => p.ToArray());
}
Another overkill functional solution is to get Lazy in a Queue recursively !!
Instead GetThing, get a Lazy one GetLazyThing that returns Lazy<Task<Try<MyThing>>> simply by wrapping GetThing:
new Lazy<Task<Try<MyThing>>>(() => GetThing(id))
Now using couple extensions/functions:
public static async Task RecRunSequentially<T>(this IEnumerable<Lazy<Task<T>>> tasks)
{
var queue = tasks.EnqueueAll();
await RunQueue(queue);
}
public static Queue<T> EnqueueAll<T>(this IEnumerable<T> list)
{
var queue = new Queue<T>();
list.ToList().ForEach(m => queue.Enqueue(m));
return queue;
}
public static async Task RunQueue<T>(Queue<Lazy<Task<T>>> queue)
{
if (queue.Count > 0)
{
var task = queue.Dequeue();
await task.Value; // this unwraps the Lazy object content
await RunQueue(queue);
}
}
Finally:
var lazyTasks = ids.Select(id => GetLazyThing(id));
await lazyTasks.RecRunSequentially();
// Now collapse and map as you like
Update
However if you don't like the fact that EnqueueAll and RunQueue are not "pure", we can take the following approach with the same Lazy trick
public static async Task AwaitSequentially<T>(this Lazy<Task<T>>[] array, int index = 0)
{
if (array == null || index < 0 || index >= array.Length - 1) return;
await array[index].Value;
await AwaitSequentially(array, index + 1); // ++index is not pure :)
}
Now:
var lazyTasks = ids.Select(id => GetLazyThing(id));
await tasks.ToArray().AwaitSequentially();
// Now collapse and map as you like

Related

Storing each async result in its own array element

Let's say I want to download 1000 recipes from a website. The websites accepts at most 10 concurrent connections. Each recipe should be stored in an array, at its corresponding index. (I don't want to send the array to the DownloadRecipe method.)
Technically, I've already solved the problem, but I would like to know if there is an even cleaner way to use async/await or something else to achieve it?
static async Task MainAsync()
{
int recipeCount = 1000;
int connectionCount = 10;
string[] recipes = new string[recipeCount];
Task<string>[] tasks = new Task<string>[connectionCount];
int r = 0;
while (r < recipeCount)
{
for (int t = 0; t < tasks.Length; t++)
{
tasks[t] = Task.Run(async () => recipes[r] = await DownloadRecipe(r));
r++;
}
await Task.WhenAll(tasks);
}
}
static async Task<string> DownloadRecipe(int index)
{
// ... await calls to download recipe
}
Also, this solution it's not optimal, since it doesn't bother starting a new download until all the 10 running downloads are finished. Is there something we can improve there without bloating the code too much? A thread pool limited to 10 threads?
There are many many ways you could do this. One way is to use an ActionBlock which give you access to MaxDegreeOfParallelism fairly easily and will work well with async methods
static async Task MainAsync()
{
var recipeCount = 1000;
var connectionCount = 10;
var recipes = new string[recipeCount];
async Task Action(int i) => recipes[i] = await DownloadRecipe(i);
var processor = new ActionBlock<int>(Action, new ExecutionDataflowBlockOptions()
{
MaxDegreeOfParallelism = connectionCount,
SingleProducerConstrained = true
});
for (var i = 0; i < recipeCount; i++)
await processor.SendAsync(i);
processor.Complete();
await processor.Completion;
}
static async Task<string> DownloadRecipe(int index)
{
...
}
Another way might be to use a SemaphoreSlim
var slim = new SemaphoreSlim(connectionCount, connectionCount);
var tasks = Enumerable
.Range(0, recipeCount)
.Select(Selector);
async Task<string> Selector(int i)
{
await slim.WaitAsync()
try
{
return await DownloadRecipe(i)
}
finally
{
slim.Release();
}
}
var recipes = await Task.WhenAll(tasks);
Another set of approaches is to use Reactive Extensions (Rx)... Once again there are many ways to do this, this is just an awaitable approach (and likely could be better all things considered)
var results = await Enumerable
.Range(0, recipeCount)
.ToObservable()
.Select(i => Observable.FromAsync(() => DownloadRecipe(i)))
.Merge(connectionCount)
.ToArray()
.ToTask();
Alternative approach to have 10 "pools" which will load data "simultaneously".
You don't need to wrap IO operations with the separate thread. Using separate thread for IO operations is just a waste of resources.
Notice that thread which downloads data will do nothing, but just waiting for a response. This is where async-await approach come very handy - we can send multiple requests without waiting them to complete and without wasting threads.
static async Task MainAsync()
{
var requests = Enumerable.Range(0, 1000).ToArray();
var maxConnections = 10;
var pools = requests
.GroupBy(i => i % maxConnections)
.Select(group => DownloadRecipesFor(group.ToArray()))
.ToArray();
await Task.WhenAll(pools);
var recipes = pools.SelectMany(pool => pool.Result).ToArray();
}
static async Task<IEnumerable<string>> DownLoadRecipesFor(params int[] requests)
{
var recipes = new List<string>();
foreach (var request in requests)
{
var recipe = await DownloadRecipe(request);
recipes.Add(recipe);
}
return recipes;
}
Because inside the pool (DownloadRecipesFor method) we download results one by one - we make sure that we have no more than 10 active requests all the time.
This is little bit more effective than originals, because we don't wait for 10 tasks to complete before starting next "bunch".
This is not ideal, because if last "pool" finishes early then others it aren't able to pickup next request to handle.
Final result will have corresponding indexes, because we will process "pools" and requests inside in same order as we created them.

Task.WhenAny blocking

I'm not sure if I'm misunderstanding the usage of Task.WhenAny but in the following code only "0" gets printed when it should print "1" and "2" and then "mx.Name" every time a task finishes:
public async void PopulateMxRecords(List<string> nsRecords, int threads)
{
ThreadPool.SetMinThreads(threads, threads);
var resolver = new DnsStubResolver();
var tasks = nsRecords.Select(ns => resolver.ResolveAsync<MxRecord>(ns, RecordType.Mx));
Console.WriteLine("0");
var finished = Task.WhenAny(tasks);
Console.WriteLine("1");
while (mxNsRecords.Count < nsRecords.Count)
{
Console.WriteLine("2");
var task = await finished;
var mxRecords = await task;
foreach(var mx in mxRecords)
Console.WriteLine(mx.Name);
}
}
The DnsStubResolver is part of ARSoft.Tools.Net.Dns. The nsRecords list contains up to 2 million strings.
I'm not sure if I'm misunderstanding the usage of Task.WhenAny
You might be. The pattern you seem to be looking for is interleaving. In the following example, notice the important changes that I have made:
use ToList() to materialize the LINQ query results,
move WhenAny() into the loop,
use Remove(task) as each task completes, and
run the while loop as long as tasks.Count() > 0.
Those are the important changes. The other changes are there to make your listing into a runnable demo of interleaving, the full listing of which is here: https://dotnetfiddle.net/nr1gQ7
public static async Task PopulateMxRecords(List<string> nsRecords)
{
var tasks = nsRecords.Select(ns => ResolveAsync(ns)).ToList();
while (tasks.Count() > 0)
{
var task = await Task.WhenAny(tasks);
tasks.Remove(task);
var mxRecords = await task;
Console.WriteLine(mxRecords);
}
}

C# Running many async tasks the same time

I'm kinda new to async tasks.
I've a function that takes student ID and scrapes data from specific university website with the required ID.
private static HttpClient client = new HttpClient();
public static async Task<Student> ParseAsync(string departmentLink, int id, CancellationToken ct)
{
string website = string.Format(departmentLink, id);
try
{
string data;
var stream = await client.GetAsync(website, ct);
using (var reader = new StreamReader(await stream.Content.ReadAsStreamAsync(), Encoding.GetEncoding("windows-1256")))
data = reader.ReadToEnd();
//Parse data here and return Student.
} catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
}
And it works correctly. Sometimes though I need to run this function for a lot of students so I use the following
for(int i = ids.first; i <= ids.last; i++)
{
tasks[i - ids.first] = ParseStudentData.ParseAsync(entity.Link, i, cts.Token).ContinueWith(t =>
{
Dispatcher.Invoke(() =>
{
listview_students.Items.Add(t.Result);
//Students.Add(t.Result);
//lbl_count.Content = $"{listview_students.Items.Count}/{testerino.Length}";
});
});
}
I'm storing tasks in an array to wait for them later.
This also works finely as long as the students count is between (0, ~600?) it's kinda random.
And then for every other student that still hasn't been parsed throws A task was cancelled.
Keep in mind that, I never use the cancellation token at all.
I need to run this function on so many students it can reach ~9000 async task altogether. So what's happening?
You are basically creating a denial of service attack on the website when you are queuing up 9000 requests in such a short time frame. Not only is this causing you errors, but it could take down the website. It would be best to limit the number of concurrent requests to a more reasonable value (say 30). While there are probably several ways to do this, one that comes to mind is the following:
private async Task Test()
{
var tasks = new List<Task>();
for (int i = ids.first; i <= ids.last; i++)
{
tasks.Add(/* Do stuff */);
await WaitList(tasks, 30);
}
}
private async Task WaitList(IList<Task> tasks, int maxSize)
{
while (tasks.Count > maxSize)
{
var completed = await Task.WhenAny(tasks).ConfigureAwait(false);
tasks.Remove(completed);
}
}
Other approaches might leverage the producer/consumer pattern using .Net classes such as a BlockingCollection
This is what I ended up with based on #erdomke code:
public static async Task ForEachParallel<T>(
this IEnumerable<T> list,
Func<T, Task> action,
int dop)
{
var tasks = new List<Task>(dop);
foreach (var item in list)
{
tasks.Add(action(item));
while (tasks.Count >= dop)
{
var completed = await Task.WhenAny(tasks).ConfigureAwait(false);
tasks.Remove(completed);
}
}
// Wait for all remaining tasks.
await Task.WhenAll(tasks).ConfigureAwait(false);
}
// usage
await Enumerable
.Range(1, 500)
.ForEachParallel(i => ProcessItem(i), Environment.ProcessorCount);

creating a .net async wrapper to a sync request

I have the following situation (or a basic misunderstanding with the async await mechanism).
Assume you have a set of 1-20 web request call that takes a long time: findItemsByProduct().
you want to wrap it around in an async request, that would be able to abstract all these calls into one async call, but I can't seem to be able to do it without using more threads.
If I'm doing:
int total = result.paginationOutput.totalPages;
for (int i = 2; i < total + 1; i++)
{
await Task.Factory.StartNew(() =>
{
result = client.findItemsByProduct(i);
});
newList.AddRange(result.searchResult.item);
}
}
return newList;
problem here, that the calls don't run together, rather they are waiting one by one.
I would like all the calls to run together and than harvest the results.
as pseudo code, I would like the code to run like this:
forEach item {
result = item.makeWebRequest();
}
foreach item {
List.addRange(item.harvestResults);
}
I have no idea how to make the code to do that though..
Ideally, you should add a findItemsByProductAsync that returns a Task<Item[]>. That way, you don't have to create unnecessary tasks using StartNew or Task.Run.
Then your code can look like this:
int total = result.paginationOutput.totalPages;
// Start all downloads; each download is represented by a task.
Task<Item[]>[] tasks = Enumerable.Range(2, total - 1)
.Select(i => client.findItemsByProductAsync(i)).ToArray();
// Wait for all downloads to complete.
Item[][] results = await Task.WhenAll(tasks);
// Flatten the results into a single collection.
return results.SelectMany(x => x).ToArray();
Given your requirements which I see as:
Process n number of non-blocking tasks
Process results after all queries have returned
I would use the CountdownEvent for this e.g.
var results = new ConcurrentBag<ItemType>(result.pagination.totalPages);
using (var e = new CountdownEvent(result.pagination.totalPages))
{
for (int i = 2; i <= result.pagination.totalPages+1; i++)
{
Task.Factory.StartNew(() => return client.findItemsByProduct(i))
.ContinueWith(items => {
results.AddRange(items);
e.Signal(); // signal task is done
});
}
// Wait for all requests to complete
e.Wait();
}
// Process results
foreach (var item in results)
{
...
}
This particular problem is solved easily enough without even using await. Simply create each of the tasks, put all of the tasks into a list, and then use WhenAll on that list to get a task that represents the completion of all of those tasks:
public static Task<Item[]> Foo()
{
int total = result.paginationOutput.totalPages;
var tasks = new List<Task<Item>>();
for (int i = 2; i < total + 1; i++)
{
tasks.Add(Task.Factory.StartNew(() => client.findItemsByProduct(i)));
}
return Task.WhenAll(tasks);
}
Also note you have a major problem in how you use result in your code. You're having each of the different tasks all using the same variable, so there are race conditions as to whether or not it works properly. You could end up adding the same call twice and having one skipped entirely. Instead you should have the call to findItemsByProduct be the result of the task, and use that task's Result.
If you want to use async-await properly you have to declare your functions async, and the functions that call you also have to be async. This continues until you have once synchronous function that starts the async process.
Your function would look like this:
by the way you didn't describe what's in the list. I assume they are
object of type T. in that case result.SearchResult.Item returns
IEnumerable
private async Task<List<T>> FindItems(...)
{
int total = result.paginationOutput.totalPages;
var newList = new List<T>();
for (int i = 2; i < total + 1; i++)
{
IEnumerable<T> result = await Task.Factory.StartNew(() =>
{
return client.findItemsByProduct(i);
});
newList.AddRange(result.searchResult.item);
}
return newList;
}
If you do it this way, your function will be asynchronous, but the findItemsByProduct will be executed one after another. If you want to execute them simultaneously you should not await for the result, but start the next task before the previous one is finished. Once all tasks are started wait until all are finished. Like this:
private async Task<List<T>> FindItems(...)
{
int total = result.paginationOutput.totalPages;
var tasks= new List<Task<IEnumerable<T>>>();
// start all tasks. don't wait for the result yet
for (int i = 2; i < total + 1; i++)
{
Task<IEnumerable<T>> task = Task.Factory.StartNew(() =>
{
return client.findItemsByProduct(i);
});
tasks.Add(task);
}
// now that all tasks are started, wait until all are finished
await Task.WhenAll(tasks);
// the result of each task is now in task.Result
// the type of result is IEnumerable<T>
// put all into one big list using some linq:
return tasks.SelectMany ( task => task.Result.SearchResult.Item)
.ToList();
// if you're not familiar to linq yet, use a foreach:
var newList = new List<T>();
foreach (var task in tasks)
{
newList.AddRange(task.Result.searchResult.item);
}
return newList;
}

Parallel.ForEach not setting all values in loop

I am querying a sql data base for some employees.
When i receive these employees I and looping each one using the Parallel.ForEach.
The only reason I'm looping he employees that was retrieved from the database is so expand a few of the properties that I do not want to clutter up the data base with.
Never-the-less in this example I am attempting to set the Avatar for the current employee in the loop, but only one out of three always gets set, none of the other employees Avatar ever gets set to their correct URI. Basically, I'm taking the file-name of the avatar and building the full path to the users folder.
What am I doing wrong here to where each employees Avatar is not updated with the full path to their directory, like the only one that is being set? Parallel stack and there is in deep four
I'm sure I've got the code formatted incorrectly. I've looked at that Parallel Task and it does in deep create 4 Parallel Task on 6 Threads.
Can someone point out to me the correct way to format the code to use Parallel?
Also, one thing, if I remove the return await Task.Run()=> from the GetEmployees method I get an error of cannot finish task because some other task fished first.
The Parallel is acting as if it is only setting one of the Avatars for one of the employees.
---Caller
public async static Task<List<uspGetEmployees_Result>> GetEmployess(int professionalID, int startIndex, int pageSize, string where, string equals)
{
var httpCurrent = HttpContext.Current;
return await Task.Run(() =>
{
List<uspGetEmployees_Result> emps = null;
try
{
using (AFCCInc_ComEntities db = new AFCCInc_ComEntities())
{
var tempEmps = db.uspGetEmployees(professionalID, startIndex, pageSize, where, equals);
if (tempEmps != null)
{
emps = tempEmps.ToList<uspGetEmployees_Result>();
Parallel.ForEach<uspGetEmployees_Result>(
emps,
async (e) =>
{
e.Avatar = await Task.Run(() => BuildUserFilePath(e.Avatar, e.UserId, httpCurrent, true));
}
);
};
}
}
catch (SqlException ex)
{
throw ex;
};
return emps;
});
}
--Callee
static string BuildUserFilePath(object fileName, object userProviderKey, HttpContext context, bool resolveForClient = false)
{
return string.Format("{0}/{1}/{2}",
resolveForClient ?
context.Request.Url.AbsoluteUri.Replace(context.Request.Url.PathAndQuery, "") : "~",
_membersFolderPath + AFCCIncSecurity.Folder.GetEncryptNameForSiteMemberFolder(userProviderKey.ToString(), _cryptPassword),
fileName.ToString());
}
----------------------------------------Edit------------------------------------
The final code that I'm using with everyone's help. Thanks so much!
public async static Task<List<uspGetEmployees_Result>> GetEmployess(int professionalID, int startIndex, int pageSize, string where, string equals)
{
var httpCurrent = HttpContext.Current;
List<uspGetEmployees_Result> emps = null;
using (AFCCInc_ComEntities db = new AFCCInc_ComEntities())
{
emps = await Task.Run(() => (db.uspGetEmployees(professionalID, startIndex, pageSize, where, equals) ?? Enumerable.Empty<uspGetEmployees_Result>()).ToList());
if (emps.Count() == 0) { return null; }
int skip = 0;
while (true)
{
// Do parallel processing in "waves".
var tasks = emps
.Take(Environment.ProcessorCount)
.Select(e => Task.Run(() => e.Avatar = BuildUserFilePath(e.Avatar, e.UserId, httpCurrent, true))) // No await here - we just want the tasks.
.Skip(skip)
.ToArray();
if (tasks.Length == 0) { break; }
skip += Environment.ProcessorCount;
await Task.WhenAll(tasks);
};
}
return emps;
}
Your definition of BuildUserFilePath and its usage are inconsistent. The definition clearly states that it's a string-returning method, whereas its usage implies that it returns a Task<>.
Parallel.ForEach and async don't mix very well - that's the reason your bug happened in the first place.
Unrelated but still worth noting: your try/catch is redundant as all it does is rethrow the original SqlException (and even that it doesn't do very well because you'll end up losing the stack trace).
Do you really, really want to return null?
public async static Task<List<uspGetEmployees_Result>> GetEmployess(int professionalID, int startIndex, int pageSize, string where, string equals)
{
var httpCurrent = HttpContext.Current;
// Most of these operations are unlikely to be time-consuming,
// so why await the whole thing?
using (AFCCInc_ComEntities db = new AFCCInc_ComEntities())
{
// I don't really know what exactly uspGetEmployees returns
// and, if it's an IEnumerable, whether it yields its elements lazily.
// The fact that it can be null, however, bothers me, so I'll sidestep it.
List<uspGetEmployees_Result> emps = await Task.Run(() =>
(db.uspGetEmployees(professionalID, startIndex, pageSize, where, equals) ?? Enumerable.Empty<uspGetEmployees_Result>()).ToList()
);
// I'm assuming that BuildUserFilePath returns string - no async.
await Task.Run(() =>
{
Parallel.ForEach(emps, e =>
{
// NO async/await within the ForEach delegate body.
e.Avatar = BuildUserFilePath(e.Avatar, e.UserId, httpCurrent, true);
});
});
}
}
There seems to be over-use of async and Task.Run() in this code. For example, what are you hoping to achieve from this segment?
Parallel.ForEach<uspGetEmployees_Result>(
emps,
async (e) =>
{
e.Avatar = await Task.Run(() => BuildUserFilePath(e.Avatar, e.UserId, httpCurrent, true));
}
);
You're already using await on the result of the entire method, and you've used a Parallel.ForEach to get parallel execution of items in your loop, so what does the additional use of await Task.Run() get you? The code would certainly be a lot easier to follow without it.
It is not clear to me what you are trying to achieve here. Can you describe what your objectives are for this method?

Categories