How to use Parallel.For with async methods - C#

What's the best way to do parallel processing in C# with async methods?
Let me explain with some simple code.
Example scenario: We have a person and 1000 text files from them. We want to check that their text files do not contain sensitive keywords; if any one of the files does, we mark the person as untrusted. The method that checks a file is an async method, and as soon as one sensitive keyword is found, no further processing is required and the checking loop must stop for that person.
For the best performance, we must use parallel processing.
Simple pseudocode:
bool sensitivedetected = false;
Parallel.ForEach(textfilecollection, async (textfile, parallelloopstate) =>
{
    if (await hassensitiveasync(textfile))
    {
        sensitivedetected = true;
        parallelloopstate.Break();
    }
});
if (sensitivedetected)
    markuntrusted(person);
The problem is that Parallel.ForEach doesn't wait for the async tasks to complete, so the statement if (sensitivedetected) runs as soon as the tasks have been created.
I have read other questions such as "write parallel.for with async" and "async/await and Parallel.For", and lots of other pages.
Those topics are useful when the results of the async methods need to be collected and used later, but in my scenario the loop should end as soon as possible.
Update: Sample code:
Boolean detected=false;
Parallel.ForEach(UrlList, async (url, pls) =>
{
using (HttpClient hc = new HttpClient())
{
var result = await hc.GetAsync(url);
if ((await result.Content.ReadAsStringAsync()).Contains("sensitive"))
{
detected = true;
pls.Break();
}
}
});
if (detected)
Console.WriteLine("WARNING");

The simplest way to achieve what you need (and not what you want, because threading is evil) is to use Reactive Extensions.
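// Requires the System.Reactive package, plus
// using System.Reactive.Linq; and using System.Reactive.Threading.Tasks;
var firstSensitive = await UrlList
    .Select(async url =>
    {
        using (var http = new HttpClient())
        {
            var result = await http.GetAsync(url);
            return await result.Content.ReadAsStringAsync();
        }
    })
    .ToObservable() // IEnumerable<Task<string>> -> IObservable<Task<string>>
    .SelectMany(downloadTask => downloadTask.ToObservable())
    .Where(result => result.Contains("sensitive"))
    .FirstOrDefaultAsync();
if (firstSensitive != null)
    Console.WriteLine("WARNING");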
To limit the number of concurrent HTTP requests:
const int concurrentRequestLimit = 4;
var semaphore = new SemaphoreSlim(concurrentRequestLimit);
var firstSensitive = await UrlList
    .Select(async url =>
    {
        await semaphore.WaitAsync(); // wait for a free slot
        try
        {
            using (var http = new HttpClient())
            {
                var result = await http.GetAsync(url);
                return await result.Content.ReadAsStringAsync();
            }
        }
        finally
        {
            semaphore.Release(); // free the slot even if the request fails
        }
    })
    .ToObservable() // IEnumerable<Task<string>> -> IObservable<Task<string>>
    .SelectMany(downloadTask => downloadTask.ToObservable())
    .Where(result => result.Contains("sensitive"))
    .FirstOrDefaultAsync();
if (firstSensitive != null)
    Console.WriteLine("WARNING");
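If you would rather not depend on Rx, the same early-exit behaviour can be sketched with Task.WhenAny and a CancellationTokenSource. This is only a minimal sketch under assumptions: HasSensitiveAsync and AnySensitiveAsync are hypothetical names, the check is simulated with Task.Delay instead of real file or HTTP access, and the program uses C# top-level statements (.NET 6 or later assumed).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the real check: "file-7" is the only sensitive item.
async Task<bool> HasSensitiveAsync(string textFile, CancellationToken token)
{
    await Task.Delay(50, token); // simulates I/O
    return textFile == "file-7";
}

// Returns true as soon as any check reports a hit, then cancels the rest.
async Task<bool> AnySensitiveAsync(IEnumerable<string> textFiles)
{
    using var cts = new CancellationTokenSource();
    var pending = textFiles.Select(f => HasSensitiveAsync(f, cts.Token)).ToList();
    while (pending.Count > 0)
    {
        Task<bool> finished = await Task.WhenAny(pending);
        pending.Remove(finished);
        if (await finished)
        {
            cts.Cancel(); // ask the remaining checks to stop
            return true;
        }
    }
    return false;
}

var files = Enumerable.Range(0, 20).Select(i => $"file-{i}");
bool sensitiveDetected = await AnySensitiveAsync(files);
Console.WriteLine(sensitiveDetected ? "WARNING" : "clean");
```

All checks start at once here; to cap the number in flight, the SemaphoreSlim gate shown above could wrap the call inside the Select.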

Related

What is the fastest and most efficient ways to make a large number of C# WebRequests?

I have a list of URLs (thousands), I want to asynchronously get page data from each URL as fast as possible without putting extreme load on the CPU.
I have tried using threading but it still feels quite slow:
public static ConcurrentQueue<string> List = new ConcurrentQueue<string>(); //URL List (assume I added them already)
public static void Threading()
{
for(int i=0;i<100;i++) //100 threads
{
Thread thread = new Thread(new ThreadStart(Task));
thread.Start();
}
}
public static void Task()
{
while(!(List.IsEmpty))
{
List.TryDequeue(out string URL);
//GET REQUEST HERE
}
}
Is there any better way to do this? I want to do this asynchronously but I can't figure out how to do it, and I don't want to sacrifice speed or CPU efficiency to do so.
Thanks :)
You should use Microsoft's Reactive Framework (aka Rx) - NuGet System.Reactive and add using System.Reactive.Linq; - then you can do this:
public static IObservable<(string url, string content)> GetAllUrls(List<string> urls) =>
Observable
.Using(
() => new HttpClient(),
hc =>
from url in urls.ToObservable()
from response in Observable.FromAsync(() => hc.GetAsync(url))
from content in Observable.FromAsync(() => response.Content.ReadAsStringAsync())
select (url, content));
That allows you to consume the results in a couple of ways.
You can process them as they get produced:
IDisposable subscription =
GetAllUrls(urlsx).Subscribe(x => Console.WriteLine(x.content));
Or you can get all of them produced and then await the full results:
(string url, string content)[] results = await GetAllUrls(urlsx).ToArray();
You are best off using HttpClient which allows async Task requests.
Just store each task in a list, and await the whole list. To prevent too many requests at once, wait for any single one to complete if there are too many, and remove the completed one from the list.
const int maxDegreeOfParallelism = 100;
static HttpClient _client = new HttpClient();
public static async Task GetAllUrls(List<string> urls)
{
var tasks = new List<Task>(urls.Count);
foreach (var url in urls)
{
if (tasks.Count == maxDegreeOfParallelism) // this prevents too many requests at once
tasks.Remove(await Task.WhenAny(tasks));
tasks.Add(GetUrl(url));
}
await Task.WhenAll(tasks);
}
private static async Task GetUrl(string url)
{
using var response = await _client.GetAsync(url);
// handle response here
var responseStr = await response.Content.ReadAsStringAsync(); // whatever
// do stuff etc
}
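On .NET 6 or later, Parallel.ForEachAsync offers the same throttling with less bookkeeping. The following is a hedged sketch rather than a drop-in replacement: FetchAsync and GetAllAsync are hypothetical names, the HTTP call is simulated with Task.Delay so the example runs offline, and the URLs are placeholders.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for an HTTP GET so that the sketch runs offline.
async Task<string> FetchAsync(string url, CancellationToken token)
{
    await Task.Delay(20, token);
    return $"content of {url}";
}

// Fetches every URL with at most maxConcurrency requests in flight.
async Task<IDictionary<string, string>> GetAllAsync(IEnumerable<string> urls, int maxConcurrency)
{
    var results = new ConcurrentDictionary<string, string>(); // safe for concurrent writes
    var options = new ParallelOptions { MaxDegreeOfParallelism = maxConcurrency };
    await Parallel.ForEachAsync(urls, options, async (url, token) =>
    {
        results[url] = await FetchAsync(url, token);
    });
    return results;
}

var pageUrls = Enumerable.Range(1, 50).Select(i => $"https://example.com/page/{i}");
var pages = await GetAllAsync(pageUrls, maxConcurrency: 8);
Console.WriteLine($"Fetched {pages.Count} pages");
```

Unlike Parallel.ForEach, this method returns a Task that completes only after every async body has finished.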

Problems with parallel for loop

I am trying to load a full auction house by loading each page asynchronously from an API call and grouping identical items together in lists stored in a dictionary. When I use the parallel for loop, it does not return anything. Help would be appreciated.
Have a great day!
-Vexea
{
string url = "https://api.hypixel.net/skyblock/auctions";
//Gets number of pages to make threads on the auction house...
using (HttpResponseMessage response = await ApiHelper.GetApiClient("application/json").GetAsync(url))
{
if (response.IsSuccessStatusCode)
{
AuctionHouseModel auctionHouse = await response.Content.ReadAsAsync<AuctionHouseModel>();
return auctionHouse.Pages;
}
else
{
return 0;
}
}
}
private static async Task<AuctionPageModel> LoadHypixelAuctionPage(int page, string apiKey)
{
//Loads a solid page...
string url = "https://api.hypixel.net/skyblock/auctions?key=" + apiKey + "&page=" + page;
using (HttpResponseMessage response = await ApiHelper.GetApiClient("application/json").GetAsync(url))
{
if (response.IsSuccessStatusCode)
{
return await response.Content.ReadAsAsync<AuctionPageModel>();
}
else
{
return null;
}
}
}
public async static Task<AuctionHouseModel> LoadHypixelAuctionHouse(string apiKey)
{
//Loads all pages needed and puts them into a dictionary...
List<AuctionPageModel> pages = new List<AuctionPageModel>();
AuctionHouseModel output = new AuctionHouseModel();
Parallel.For(1, await LoadHypixelAuctionPages(), async page => {
pages.Add(await LoadHypixelAuctionPage(page, apiKey)); //This returns nothing, count of pages stays 0 and nothing happens...
});
foreach (AuctionPageModel page in pages)
foreach(AuctionProductModel product in page.Products)
try
{
output.Products[product.Name].Add(product);
}
catch
{
output.Products.Add(product.Name, new List<AuctionProductModel>());
output.Products[product.Name].Add(product);
}
output.Pages = await LoadHypixelAuctionPages();
return output;
}
When you're doing parallel programming you need to make sure to use thread-safe types or locking. There may be more wrong than this, but the first thing to fix is locking access to the pages list. Secondly, the first parameter of Parallel.For is inclusive while the second is exclusive. So if LoadHypixelAuctionPages() returns 0 or 1, nothing runs inside the loop; you probably mean LoadHypixelAuctionPages() + 1 if the first page number is 1 and not 0:
List<AuctionPageModel> pages = new List<AuctionPageModel>();
AuctionHouseModel output = new AuctionHouseModel();
Parallel.For(1, await LoadHypixelAuctionPages() + 1, async page =>
{
var loadedPage = await LoadHypixelAuctionPage(page, apiKey);
lock(pages)
{
pages.Add(loadedPage);
}
});
//...
Take a look at this fiddle to see what can happen when not locking.
An alternative to locking is to use one of the concurrent collections, such as ConcurrentQueue<T>.
You can't use Parallel.For or Parallel.ForEach with async code. Parallel is for CPU-bound code and async is (primarily) for I/O-bound code. Those methods treat an async lambda as async void, so they return before the awaited work has finished.
Instead of parallel concurrency, you need asynchronous concurrency (Task.WhenAll):
List<AuctionPageModel> pages = new List<AuctionPageModel>();
AuctionHouseModel output = new AuctionHouseModel();
var tasks = Enumerable
.Range(1, await LoadHypixelAuctionPages())
.Select(async page => pages.Add(await LoadHypixelAuctionPage(page, apiKey)))
.ToList();
await Task.WhenAll(tasks);
or, more simply:
AuctionHouseModel output = new AuctionHouseModel();
var tasks = Enumerable
.Range(1, await LoadHypixelAuctionPages())
.Select(async page => await LoadHypixelAuctionPage(page, apiKey))
.ToList();
var pages = await Task.WhenAll(tasks);
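The difference between the two approaches can be observed with a small experiment. This is a sketch under assumptions: LoadPageAsync is a hypothetical loader simulated with Task.Delay (later pages deliberately finish sooner), and the program uses C# top-level statements.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical page loader: later pages finish sooner, to show ordering.
async Task<int> LoadPageAsync(int page)
{
    await Task.Delay(100 - page * 10);
    return page * page;
}

int finishedInParallelFor = 0;
Parallel.For(1, 6, async page =>
{
    await LoadPageAsync(page); // the lambda is async void here
    Interlocked.Increment(ref finishedInParallelFor);
});
// Parallel.For returns once each lambda hits its first await,
// so most likely no page has finished loading at this point.
Console.WriteLine($"After Parallel.For: {Volatile.Read(ref finishedInParallelFor)} finished");

// Task.WhenAll waits for every task and returns results in input order.
int[] results = await Task.WhenAll(Enumerable.Range(1, 5).Select(p => LoadPageAsync(p)));
Console.WriteLine(string.Join(", ", results)); // 1, 4, 9, 16, 25
```

Because Task.WhenAll returns the results in the order of the input tasks, no shared list and no locking are needed.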

Storing each async result in its own array element

Let's say I want to download 1000 recipes from a website. The websites accepts at most 10 concurrent connections. Each recipe should be stored in an array, at its corresponding index. (I don't want to send the array to the DownloadRecipe method.)
Technically, I've already solved the problem, but I would like to know if there is an even cleaner way to use async/await or something else to achieve it?
static async Task MainAsync()
{
int recipeCount = 1000;
int connectionCount = 10;
string[] recipes = new string[recipeCount];
Task<string>[] tasks = new Task<string>[connectionCount];
int r = 0;
while (r < recipeCount)
{
for (int t = 0; t < tasks.Length; t++)
{
int index = r; // copy: the lambda would otherwise capture r, which changes before it runs
tasks[t] = Task.Run(async () => recipes[index] = await DownloadRecipe(index));
r++;
}
await Task.WhenAll(tasks);
}
}
static async Task<string> DownloadRecipe(int index)
{
// ... await calls to download recipe
}
Also, this solution is not optimal, since it doesn't start a new download until all 10 running downloads have finished. Is there something we can improve there without bloating the code too much? A thread pool limited to 10 threads?
There are many, many ways you could do this. One way is to use an ActionBlock (from the TPL Dataflow library), which gives you access to MaxDegreeOfParallelism fairly easily and works well with async methods:
static async Task MainAsync()
{
var recipeCount = 1000;
var connectionCount = 10;
var recipes = new string[recipeCount];
async Task Action(int i) => recipes[i] = await DownloadRecipe(i);
var processor = new ActionBlock<int>(Action, new ExecutionDataflowBlockOptions()
{
MaxDegreeOfParallelism = connectionCount,
SingleProducerConstrained = true
});
for (var i = 0; i < recipeCount; i++)
await processor.SendAsync(i);
processor.Complete();
await processor.Completion;
}
static async Task<string> DownloadRecipe(int index)
{
...
}
Another way might be to use a SemaphoreSlim:
var slim = new SemaphoreSlim(connectionCount, connectionCount);
var tasks = Enumerable
    .Range(0, recipeCount)
    .Select(Selector);
async Task<string> Selector(int i)
{
    await slim.WaitAsync(); // wait for one of the connection slots
    try
    {
        return await DownloadRecipe(i);
    }
    finally
    {
        slim.Release();
    }
}
var recipes = await Task.WhenAll(tasks); // results are in index order
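Another set of approaches is to use Reactive Extensions (Rx). Once again there are many ways to do this; this is just one awaitable approach (and it could likely be better, all things considered). Note that Merge emits results in completion order, so the resulting array is not guaranteed to match the request indexes.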
var results = await Enumerable
.Range(0, recipeCount)
.ToObservable()
.Select(i => Observable.FromAsync(() => DownloadRecipe(i)))
.Merge(connectionCount)
.ToArray()
.ToTask();
An alternative approach is to have 10 "pools" which load data "simultaneously".
You don't need to wrap I/O operations in a separate thread. Using a separate thread for I/O operations is just a waste of resources.
Notice that a thread which downloads data does nothing but wait for a response. This is where the async-await approach comes in very handy: we can send multiple requests without waiting for them to complete and without wasting threads.
static async Task MainAsync()
{
var requests = Enumerable.Range(0, 1000).ToArray();
var maxConnections = 10;
var pools = requests
.GroupBy(i => i % maxConnections)
.Select(group => DownloadRecipesFor(group.ToArray()))
.ToArray();
await Task.WhenAll(pools);
var recipes = pools.SelectMany(pool => pool.Result).ToArray();
}
static async Task<IEnumerable<string>> DownloadRecipesFor(params int[] requests)
{
var recipes = new List<string>();
foreach (var request in requests)
{
var recipe = await DownloadRecipe(request);
recipes.Add(recipe);
}
return recipes;
}
Because inside each pool (the DownloadRecipesFor method) we download results one by one, we make sure that there are never more than 10 active requests.
This is a little more efficient than the original, because we don't wait for 10 tasks to complete before starting the next "bunch".
It is not ideal, because a "pool" that finishes earlier than the others cannot pick up the next request to handle.
Each pool returns its results in the order it received its requests, so the final array is ordered pool by pool (0, 10, 20, ..., then 1, 11, 21, ...) rather than by original index. To store each recipe at its own index, have each pool return (index, recipe) pairs or write into a shared array by index.
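One way to avoid idle pools is to let every worker pull the next index from a shared counter, so a worker that finishes early simply takes more work. The sketch below is an illustration under assumptions, not the answer's original method: DownloadRecipe is simulated with Task.Delay, and DownloadAllAsync and Worker are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the real download.
async Task<string> DownloadRecipe(int index)
{
    await Task.Delay(5);
    return $"recipe {index}";
}

async Task<string[]> DownloadAllAsync(int recipeCount, int connectionCount)
{
    var recipes = new string[recipeCount];
    int next = -1; // last index handed out to a worker

    // Each worker keeps claiming the next free index until none are left.
    async Task Worker()
    {
        while (true)
        {
            int i = Interlocked.Increment(ref next);
            if (i >= recipeCount) return;
            recipes[i] = await DownloadRecipe(i); // each index is written by one worker only
        }
    }

    // connectionCount workers means at most connectionCount downloads in flight.
    await Task.WhenAll(Enumerable.Range(0, connectionCount).Select(_ => Worker()));
    return recipes;
}

string[] allRecipes = await DownloadAllAsync(recipeCount: 25, connectionCount: 4);
Console.WriteLine(allRecipes[7]); // recipe 7
```

Each recipe lands at its own index, and no extra threads are created: the workers are plain async methods that interleave on whatever threads run their continuations.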

How to properly execute a List of Tasks async in C#

I have a list of objects that I need to run a long running process on and I would like to kick them off asynchronously, then when they are all finished return them as a list to the calling method. I've been trying different methods that I have found, however it appears that the processes are still running synchronously in the order that they are in the list. So I am sure that I am missing something in the process of how to execute a list of tasks.
Here is my code:
public async Task<List<ShipmentOverview>> GetShipmentByStatus(ShipmentFilterModel filter)
{
if (string.IsNullOrEmpty(filter.Status))
{
throw new InvalidShipmentStatusException(filter.Status);
}
var lookups = GetLookups(false, Brownells.ConsolidatedShipping.Constants.ShipmentStatusType);
var lookup = lookups.SingleOrDefault(sd => sd.Name.ToLower() == filter.Status.ToLower());
if (lookup != null)
{
filter.StatusId = lookup.Id;
var shipments = Shipments.GetShipments(filter);
var tasks = shipments.Select(async model => await GetOverview(model)).ToList();
ShipmentOverview[] finishedTask = await Task.WhenAll(tasks);
return finishedTask.ToList();
}
else
{
throw new InvalidShipmentStatusException(filter.Status);
}
}
private async Task<ShipmentOverview> GetOverview(ShipmentModel model)
{
String version;
var user = AuthContext.GetUserSecurityModel(Identity.Token, out version) as UserSecurityModel;
var profile = AuthContext.GetProfileSecurityModel(user.Profiles.First());
var overview = new ShipmentOverview
{
Id = model.Id,
CanView = true,
CanClose = profile.HasFeatureAction("Shipments", "Close", "POST"),
CanClear = profile.HasFeatureAction("Shipments", "Clear", "POST"),
CanEdit = profile.HasFeatureAction("Shipments", "Get", "PUT"),
ShipmentNumber = model.ShipmentNumber.ToString(),
ShipmentName = model.Name,
};
var parcels = Shipments.GetParcelsInShipment(model.Id);
overview.NumberParcels = parcels.Count;
var orders = parcels.Select(s => WareHouseClient.GetOrderNumberFromParcelId(s.ParcelNumber)).ToList();
overview.NumberOrders = orders.Distinct().Count();
//check validations
var vals = Shipments.GetShipmentValidations(model.Id);
if (model.ValidationTypeId == Constants.OrderValidationType)
{
if (vals.Count > 0)
{
overview.NumberOrdersTotal = vals.Count();
overview.NumberParcelsTotal = vals.Sum(s => WareHouseClient.GetParcelsPerOrder(s.ValidateReference));
}
}
return overview;
}
It looks like you're using asynchronous methods while you really want threads.
An asynchronous method yields control back to its caller when it reaches an await on an operation that has not completed yet, and resumes once that operation has completed. You can see how it works here.
Basically, the main use of async/await is to avoid blocking the calling thread (for example the UI thread, so that the UI stays responsive).
If you want to run multiple computations in parallel, you will want to use threads, like this:
using System.Threading.Tasks;
public void MainMethod() {
// Parallel.ForEach will automagically run the "right" number of threads in parallel
Parallel.ForEach(shipments, shipment => ProcessShipment(shipment));
// do something when all shipments have been processed
}
public void ProcessShipment(Shipment shipment) { ... }
Marking the method as async doesn't auto-magically make it execute in parallel. Since you're not using await at all, it will in fact execute completely synchronously, as if it wasn't async. You might have read somewhere that async makes functions execute asynchronously, but this simply isn't true - forget it. The only thing it does is build a state machine to handle task continuations for you when you use await, generating all the code needed to manage those tasks and their error handling.
If your code is mostly I/O bound, use the asynchronous APIs with await to make sure the methods actually execute concurrently. If it is CPU bound, Task.Run (or Parallel.ForEach) will work best.
Also, there's no point in doing .Select(async model => await GetOverview(model)). It's almost equivalent to .Select(model => GetOverview(model)). In any case, since the method doesn't actually return an asynchronous task, it will be executed while doing the Select, long before you get to the Task.WhenAll.
Given this, even the async on GetShipmentByStatus is pretty much useless - you only use await to await the Task.WhenAll, but since all the tasks are already completed by that point, it will simply complete synchronously.
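The effect is easy to measure. In the following sketch (hypothetical method names; Thread.Sleep stands in for a blocking database or service call, Task.Delay for a truly asynchronous one), five "async" calls that never await run one after another, while five genuinely asynchronous calls overlap.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Marked async but never awaits: the body runs synchronously on the caller's thread.
// (The compiler reports warning CS1998 for this, suppressed here on purpose.)
#pragma warning disable CS1998
async Task<int> BlockingOverview(int id)
{
    Thread.Sleep(100); // hypothetical stand-in for a blocking call
    return id;
}
#pragma warning restore CS1998

// Really asynchronous: control returns to the caller at the await.
async Task<int> AsyncOverview(int id)
{
    await Task.Delay(100);
    return id;
}

var watch = Stopwatch.StartNew();
await Task.WhenAll(Enumerable.Range(0, 5).Select(id => BlockingOverview(id)));
long blockingMs = watch.ElapsedMilliseconds; // roughly 5 x 100 ms: calls ran in sequence

watch.Restart();
await Task.WhenAll(Enumerable.Range(0, 5).Select(id => AsyncOverview(id)));
long asyncMs = watch.ElapsedMilliseconds; // roughly 100 ms: calls overlapped

Console.WriteLine($"blocking: {blockingMs} ms, async: {asyncMs} ms");
```

Exact timings vary by machine, but the first figure should be several times larger than the second.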
If your tasks are CPU bound and not I/O bound, then here is the pattern I believe you're looking for:
static void Main(string[] args) {
Task firstStepTask = Task.Run(() => firstStep());
Task secondStepTask = Task.Run(() => secondStep());
//...
Task finalStepTask = Task.Factory.ContinueWhenAll(
new Task[] { firstStepTask, secondStepTask }, //more if more than two steps...
(previousTasks) => finalStep());
finalStepTask.Wait();
}

GetStringAsync with ContinueWith and WhenAll -- Not Waiting for all requests to finish

I'm trying to process each individual request as it finishes, which is what would happen in the ContinueWith after the GetStringAsync and then when they've all completed have an additional bit of processing.
However, it seems that the ContinueWith on the WhenAll fires right away. It appears as if the GetStringAsync tasks are faulting, but I can't figure out why.
When I use WaitAll instead of WhenAll and just add my processing after the WaitAll then my requests work just fine. But when I change it to WhenAll it fails.
Here is an example of my code:
using (var client = new HttpClient())
{
Task.WhenAll(services.Select(x =>
{
return client.GetStringAsync(x.Url).ContinueWith(response =>
{
Console.WriteLine(response.Result);
}, TaskContinuationOptions.AttachedToParent);
}).ToArray())
.ContinueWith(response =>
{
Console.WriteLine("All tasks completed");
});
}
You shouldn't use ContinueWith and TaskContinuationOptions.AttachedToParent when using async-await. Use async-await only, instead:
async Task<IEnumerable<string>> SomeMethod(...)
{
using (var client = new HttpClient())
{
var ss = await Task.WhenAll(services.Select(async x =>
{
var s = await client.GetStringAsync(x.Url);
Console.WriteLine(s);
return s;
}));
Console.WriteLine("All tasks completed");
return ss;
}
}
Well, I found the issue. I'll leave it here in case anyone else comes along looking for a similar answer. I still needed to await the Task.WhenAll method.
So, the correct code would be:
using (var client = new HttpClient())
{
await Task.WhenAll(services.Select(x =>
{
return client.GetStringAsync(x.Url).ContinueWith(response =>
{
Console.WriteLine(response.Result);
}, TaskContinuationOptions.AttachedToParent);
}).ToArray())
.ContinueWith(response =>
{
Console.WriteLine("All tasks completed");
});
}
I still see a couple issues with your solution:
Drop the using statement - you don't want to dispose HttpClient.
Drop the ContinueWiths - you don't need them if you are using await properly.
The Task.WhenAny approach described in this MSDN article is a somewhat cleaner way to process tasks as they complete.
I would re-write your example like this:
var client = new HttpClient();
var tasks = services.Select(x => client.GetStringAsync(x.Url)).ToList();
while (tasks.Count > 0)
{
var firstDone = await Task.WhenAny(tasks);
tasks.Remove(firstDone);
Console.WriteLine(await firstDone);
}
Console.WriteLine("All tasks completed");
Edit to address comment:
If you need access to the service object as the tasks complete, one way would be to modify tasks to be a list of Task<ObjectWithMoreData> instead of Task<string>. Notice the lambda is marked async so we can await within it:
var client = new HttpClient();
var tasks = services.Select(async x => new
{
Service = x,
ResponseText = await client.GetStringAsync(x.Url)
}).ToList();
while (tasks.Count > 0)
{
var firstDone = await Task.WhenAny(tasks);
tasks.Remove(firstDone);
var result = await firstDone;
Console.WriteLine(result.ResponseText);
// do something with result.Service
}
Console.WriteLine("All tasks completed");
