I'm trying to create a web scraper that queries a lot of URLs in parallel and waits for their responses using Task.WhenAll(). However, if one of the tasks is unsuccessful, WhenAll fails. I expect many of the tasks to return a 404 and want to handle or ignore those. For example:
IEnumerable<string> urls = Enumerable.Range(1, 1000).Select(i => "https://somewebsite.com/" + i);
List<Task<string>> tasks = new List<Task<string>>();
foreach (string url in urls)
{
    tasks.Add(Task.Run(() =>
    {
        try
        {
            return (new HttpClient()).GetStringAsync(url);
        }
        catch (HttpRequestException)
        {
            return Task.FromResult<string>("");
        }
    }));
}
var responseStrings = await Task.WhenAll(tasks);
This never hits the catch block, and WhenAll fails at the first 404. How can I get WhenAll to ignore exceptions and just return the results of the tasks that completed successfully? Better yet, could it be done somewhere in the code below?
var tasks = Enumerable.Range(1, 1000).Select(i => (new HttpClient()).GetStringAsync("https://somewebsite.com/" + i));
var responseStrings = await Task.WhenAll(tasks);
Thanks for your help.
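You need to use await to observe the exception. GetStringAsync does not throw when it is called; it returns a task that faults later, so a try/catch around the call alone never sees the 404. Awaiting the task inside the try block rethrows the exception where the catch can handle it: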
var tasks = Enumerable.Range(1, 1000).Select(i => TryGetStringAsync("https://somewebsite.com/" + i));
var responseStrings = await Task.WhenAll(tasks);
var validResponses = responseStrings.Where(x => x != null);

// One shared HttpClient for all requests
private static readonly HttpClient httpClient = new HttpClient();

private async Task<string> TryGetStringAsync(string url)
{
    try
    {
        return await httpClient.GetStringAsync(url);
    }
    catch (HttpRequestException)
    {
        // 404s and other failed requests become null
        return null;
    }
}
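The same idea as a self-contained sketch that runs without network access. FakeGetStringAsync is a hypothetical stand-in for httpClient.GetStringAsync that "404s" for even ids, and SafeFetchDemo, TryGetAsync and FetchAllAsync are illustrative names. Because the catch sits inside an async method around the await, each failure becomes null, so the task returned by WhenAll never faults.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class SafeFetchDemo
{
    // Hypothetical stand-in for httpClient.GetStringAsync: even ids fail like a 404.
    public static async Task<string> FakeGetStringAsync(int id)
    {
        await Task.Delay(10);
        if (id % 2 == 0)
            throw new HttpRequestException($"404 for {id}");
        return $"page {id}";
    }

    // The await rethrows the faulted task's exception inside the try,
    // so the catch actually runs and turns the failure into null.
    public static async Task<string?> TryGetAsync(int id)
    {
        try
        {
            return await FakeGetStringAsync(id);
        }
        catch (HttpRequestException)
        {
            return null;
        }
    }

    // Starts all requests, waits for all of them, keeps only the successes.
    public static async Task<List<string>> FetchAllAsync(IEnumerable<int> ids)
    {
        string?[] responses = await Task.WhenAll(ids.Select(id => TryGetAsync(id)));
        return responses.Where(r => r != null).Select(r => r!).ToList();
    }
}
```

For ids 1 to 6 this yields "page 1", "page 3" and "page 5"; WhenAll returns results in the same order as the input tasks.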
Related
I want to improve performance and remove the delay before the data is shown to the user. As per the requirements, I first need to get a list of data from one source and then fetch further data from other sources based on that list. Doing this sequentially takes a lot of time.
I'm looking for suggestions on how to call the client asynchronously, wait for everything at the end, and so reduce the overall wait time for the requests.
foreach (var n in player.data)
{
    var request1 = new HttpRequestMessage(HttpMethod.Get, "https://api.*****.com/buckets/" + **** + "/tests/" + n.id);
    var client1 = new HttpClient();
    request1.Headers.Authorization = new System.Net.Http.Headers.AuthenticationHeaderValue("Bearer", "****-b23a-*****-b1be-********");
    HttpResponseMessage response1 = await client1.SendAsync(request1, HttpCompletionOption.ResponseHeadersRead);
    List<dataroot> root1 = new List<dataroot>();
    if (response1.StatusCode == System.Net.HttpStatusCode.OK)
    {
        try
        {
            var apiString1 = await response1.Content.ReadAsStringAsync();
            var player1 = Newtonsoft.Json.JsonConvert.DeserializeObject<envRoot>(apiString1);
            if (!string.IsNullOrEmpty(player1.data.environments[0].parent_environment_id))
            {
                player.data.Where(x => x.id == player1.data.environments[0].test_id).ToList().ForEach(s => s.isShared = true);
                player.data.Where(x => x.id == player1.data.environments[0].test_id).ToList().ForEach(s => s.sharedEnvironmentId = player1.data.environments[0].parent_environment_id);
                //player.data.Where(x=>x.id==player1.data.environments[0].test_id).ToList().ForEach(s=>s.sharedEnvironmentId=player1.data.environments[0].test_id);
            }
            player.data.Where(x => x.id == player1.data.environments[0].test_id).ToList().ForEach(s => s.normalenvironmentId = player1.data.environments[0].id);
        }
        catch (Exception ex)
        {
            var test = ex;
        }
    }
}
You can try the way I did in my sample below:
https://github.com/rajabb/RunningLongRunningTasksEfficientlyAndWaitAtEnd
The main part of the code is:
List<Task> tasks = new List<Task>();
for (int i = 0; i < 100; i++)
{
    tasks.Add(LongRunningTask.RunAsync(i.ToString()));
}
await Task.WhenAll(tasks.ToArray());
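Applied to the loop in the question, the pattern might look like the sketch below. TestItem, EnvInfo and FetchEnvironmentAsync are hypothetical stand-ins for the player.data items, the deserialized response, and the SendAsync/DeserializeObject call; the SemaphoreSlim cap is an added assumption so that a long list does not fire every request at once. All requests are started before anything is awaited, and the results come back as a dictionary that can be applied to player.data afterwards on a single thread, so the list is never mutated concurrently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public record TestItem(string Id);                          // hypothetical player.data item
public record EnvInfo(string TestId, string EnvironmentId); // hypothetical response shape

public static class ParallelLookup
{
    // Hypothetical stand-in for the HTTP request + JSON deserialization.
    public static async Task<EnvInfo> FetchEnvironmentAsync(string testId, CancellationToken ct = default)
    {
        await Task.Delay(100, ct); // simulated network latency
        return new EnvInfo(testId, "env-" + testId);
    }

    public static async Task<Dictionary<string, EnvInfo>> FetchAllAsync(
        IEnumerable<TestItem> items, int maxConcurrency = 8)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);

        // Wait for a free slot, fetch, then release the slot.
        async Task<EnvInfo> FetchOneAsync(TestItem item)
        {
            await gate.WaitAsync();
            try { return await FetchEnvironmentAsync(item.Id); }
            finally { gate.Release(); }
        }

        // Start every request first, then wait once for all of them.
        EnvInfo[] results = await Task.WhenAll(items.Select(item => FetchOneAsync(item)));
        return results.ToDictionary(r => r.TestId);
    }
}
```

With a real HttpClient, one shared instance for all requests is generally preferable to creating a new one inside the loop.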
I've used the code below from this post: What is the best way to cal API calls in parallel in .net Core, C#?
It works fine, but when I process a large list, some of the calls fail.
My question is: how can I implement retry logic for this?
var tasks = new List<Task<string>>();
foreach (var post in list)
{
    async Task<string> func()
    {
        var response = await client.GetAsync("posts/" + post);
        return await response.Content.ReadAsStringAsync();
    }
    tasks.Add(func());
}
await Task.WhenAll(tasks);
var postResponses = new List<string>();
foreach (var t in tasks)
{
    var postResponse = await t; // t.Result would be okay too.
    postResponses.Add(postResponse);
    Console.WriteLine(postResponse);
}
This is my attempt to use Polly. It doesn't work: about the same number of requests still fail as before.
What am I doing wrong?
var policy = Policy
    .Handle<HttpRequestException>()
    .RetryAsync(3);
foreach (var mediaItem in uploadedMedia)
{
    var mediaRequest = new HttpRequestMessage { *** };
    async Task<string> func()
    {
        var response = await client.SendAsync(mediaRequest);
        return await response.Content.ReadAsStringAsync();
    }
    tasks.Add(policy.ExecuteAsync(() => func()));
}
await Task.WhenAll(tasks);
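Two things in this attempt are worth checking. First, SendAsync does not throw for 4xx/5xx responses, so a policy that only handles HttpRequestException never sees those failures unless the response is checked, for example with EnsureSuccessStatusCode. Second, mediaRequest is created once outside the delegate, and HttpClient refuses to send the same HttpRequestMessage twice (it throws InvalidOperationException), which this policy does not handle. The sketch below uses a hand-rolled retry loop instead of Polly so it stays self-contained; RetryAsync and GetWithRetryAsync are hypothetical names, and the linearly growing delay is an assumption.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class RetryDemo
{
    // Runs `operation`; on HttpRequestException retries up to `retries` more
    // times, waiting baseDelayMs, 2*baseDelayMs, ... between attempts.
    public static async Task<T> RetryAsync<T>(Func<Task<T>> operation, int retries = 3, int baseDelayMs = 200)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (HttpRequestException) when (attempt < retries)
            {
                await Task.Delay(baseDelayMs * (attempt + 1));
            }
        }
    }

    // A fresh HttpRequestMessage is built on every attempt, and
    // EnsureSuccessStatusCode turns 4xx/5xx responses into HttpRequestException.
    public static Task<string> GetWithRetryAsync(HttpClient client, string url) =>
        RetryAsync(async () =>
        {
            using var request = new HttpRequestMessage(HttpMethod.Get, url);
            using var response = await client.SendAsync(request);
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        });
}
```

With Polly, the equivalent change is to build the request and call EnsureSuccessStatusCode inside the delegate passed to policy.ExecuteAsync.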
The following code runs M1, M2, M3 and M4 in parallel. Each method may throw an exception. RunAll should return, for each of the four async methods, either the int it returned or its exception.
async Task<string> RunAll()
{
    int m1result, m2result, m3result, m4result;
    try
    {
        var m1task = M1();
        var m2task = M2();
        var m3task = M3();
        var m4task = M4();
        // await Task.WhenAll(new Task<int>[] { m1task, m2task, m3task, m4task });
        m1result = await m1task;
        m2result = await m2task;
        m3result = await m3task;
        m4result = await m4task;
    }
    catch (Exception ex)
    {
        // need to return the ex of the failed task. How?
    }
    // How to implement M1HasException, M2HasException, ... in the following lines?
    var m1msg = M1HasException ? M1ExceptionMessage : m1result.ToString();
    var m2msg = M2HasException ? M2ExceptionMessage : m2result.ToString();
    var m3msg = M3HasException ? M3ExceptionMessage : m3result.ToString();
    var m4msg = M4HasException ? M4ExceptionMessage : m4result.ToString();
    return $"M1: {m1msg}, M2: {m2msg}, M3: {m3msg}, M4: {m4msg}";
}
How can I capture the individual exceptions of the failed tasks?
For example, if only M2 threw an exception, the result should be:
"M1: 1, M2: Exception...., M3: 3, M4: 4"
Each task has a Status and an Exception property.
You may want to check whether it has faulted:
myTask.Status == TaskStatus.Faulted
Or whether it holds an exception:
if (myTask.Exception != null)
You can use ContinueWhenAll to schedule a continuation that runs once all the tasks have finished, and then check each task's status there.
See the docs here.
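A minimal sketch of the ContinueWhenAll approach for the question's M1..M4, assuming a hypothetical stand-in M(n) in which only M(2) fails. The continuation receives the array of finished tasks, so it can read Result from the successful ones and Exception from the faulted one.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class ContinueWhenAllDemo
{
    // Hypothetical stand-in for M1..M4: M(2) throws, the others return n.
    public static async Task<int> M(int n)
    {
        await Task.Delay(10 * n);
        if (n == 2) throw new InvalidOperationException("M2 failed");
        return n;
    }

    public static Task<string> RunAll()
    {
        Task<int>[] tasks = { M(1), M(2), M(3), M(4) };

        // Runs once every task has finished, whether it succeeded or faulted.
        return Task.Factory.ContinueWhenAll(tasks, done =>
            string.Join(", ", done.Select((t, i) =>
                $"M{i + 1}: " + (t.Status == TaskStatus.Faulted
                    ? t.Exception!.InnerException!.Message
                    : t.Result.ToString()))));
    }
}
```

Here RunAll() produces "M1: 1, M2: M2 failed, M3: 3, M4: 4".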
As other answers/comments pointed out, one possible approach is by using ContinueWith or ContinueWhenAll. This is a clever trick because Task has the Exception property:
Gets the AggregateException that caused the Task to end prematurely. If the Task completed successfully or has not yet thrown any exceptions, this will return null.
With ContinueWith, the antecedent task is passed as the argument to the delegate whether it completed successfully or not. From there you can check whether an exception was thrown.
Task<string> GetStringedResult<T>(Task<T> initialTask)
{
    return initialTask.ContinueWith(t => {
        return t.Exception?.InnerException.Message ?? t.Result.ToString();
    });
}
async Task<string> RunAll()
{
    string m1result, m2result, m3result, m4result;
    var m1task = GetStringedResult(M1());
    var m2task = GetStringedResult(M2());
    var m3task = GetStringedResult(M3());
    var m4task = GetStringedResult(M4());
    m1result = await m1task;
    m2result = await m2task;
    m3result = await m3task;
    m4result = await m4task;
    return $"M1: {m1result}, M2: {m2result}, M3: {m3result}, M4: {m4result}";
}
You can wrap the tasks in a WaitAll and catch the AggregateException (docs). Note that WaitAll blocks the calling thread until all the tasks have finished:
try
{
    Task.WaitAll(new[] { task1, task2 }, token);
}
catch (AggregateException ae)
{
    foreach (var ex in ae.InnerExceptions)
    {
        // Do whatever you want with ex.
    }
}
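Because WaitAll blocks, a non-blocking variant of the same idea in async code is to keep a reference to the task returned by Task.WhenAll: awaiting it rethrows only one of the inner exceptions, but its Exception property still holds an AggregateException with all of them. A sketch, where Fail and CollectFailuresAsync are hypothetical helpers:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class AllExceptionsDemo
{
    // Hypothetical task that always fails with the given message.
    static async Task Fail(string message)
    {
        await Task.Delay(10);
        throw new InvalidOperationException(message);
    }

    // Collects the message of every failed task without blocking a thread.
    public static async Task<List<string>> CollectFailuresAsync()
    {
        Task whenAll = Task.WhenAll(Fail("first"), Fail("second"), Task.Delay(5));
        try
        {
            await whenAll; // rethrows only one of the inner exceptions
        }
        catch
        {
            // Ignored: whenAll.Exception holds all of them.
        }
        return whenAll.Exception?.InnerExceptions.Select(e => e.Message).ToList()
               ?? new List<string>();
    }
}
```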
You could wrap each await in a try/catch block and capture the exception message, if any. That seems feasible:
var results = new List<string>();
try { results.Add((await t1).ToString()); } catch (Exception ex) { results.Add("Exception: " + ex.Message); }
try { results.Add((await t2).ToString()); } catch (Exception ex) { results.Add("Exception: " + ex.Message); }
try { results.Add((await t3).ToString()); } catch (Exception ex) { results.Add("Exception: " + ex.Message); }
return string.Join("|", results);
If you want to use WhenAll, you can await it and ignore the exception, then retrieve the individual task results the same way as above:
try { await Task.WhenAll(t1, t2, t3); } catch { }
// then read each task's result as shown above
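Put together for the question's RunAll, the WhenAll variant could look like the sketch below. M1..M4 are hypothetical stand-ins (M2 always fails), and the "Exception: ..." text format is an assumption. Since every task has finished once WhenAll completes, inspecting IsFaulted and Result afterwards does not block.

```csharp
using System;
using System.Threading.Tasks;

public static class RunAllDemo
{
    // Hypothetical stand-ins for the question's M1..M4.
    static async Task<int> M1() { await Task.Delay(10); return 1; }
    static async Task<int> M2() { await Task.Delay(10); throw new InvalidOperationException("boom"); }
    static async Task<int> M3() { await Task.Delay(10); return 3; }
    static async Task<int> M4() { await Task.Delay(10); return 4; }

    // Formats a finished task: its result, or the message of its exception.
    static string Describe(Task<int> t) =>
        t.IsFaulted ? "Exception: " + t.Exception!.InnerException!.Message : t.Result.ToString();

    public static async Task<string> RunAll()
    {
        Task<int> m1 = M1(), m2 = M2(), m3 = M3(), m4 = M4();
        try
        {
            await Task.WhenAll(m1, m2, m3, m4);
        }
        catch
        {
            // Ignored here: each task's own Exception is inspected below.
        }
        return $"M1: {Describe(m1)}, M2: {Describe(m2)}, M3: {Describe(m3)}, M4: {Describe(m4)}";
    }
}
```

This returns "M1: 1, M2: Exception: boom, M3: 3, M4: 4".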
I am starting 2 channels in the Azure Media Services portal.
Starting a channel takes a long time to complete, about 25-30 seconds per channel. Hence, multithreading :)
However, the following is not clear to me:
I have 2 methods:
public async Task<bool> StartAsync(string programName, CancellationToken token = default(CancellationToken))
{
    var workerThreads = new List<Thread>();
    var results = new List<bool>();
    foreach (var azureProgram in _accounts.GetPrograms(programName))
    {
        var thread = new Thread(() =>
        {
            var result = StartChannelAsync(azureProgram).Result;
            lock (results)
            {
                results.Add(result);
            }
        });
        workerThreads.Add(thread);
        thread.Start();
    }
    foreach (var thread in workerThreads)
    {
        thread.Join();
    }
    return results.All(r => r);
}
and
private async Task<bool> StartChannelAsync(IProgram azureProgram)
{
    var state = _channelFactory.ConvertToState(azureProgram.Channel.State);
    if (state == State.Running)
    {
        return true;
    }
    if (state.IsTransitioning())
    {
        return false;
    }
    await azureProgram.Channel.StartAsync();
    return true;
}
in the first method I use
var result = StartChannelAsync(azureProgram).Result;
In this case everything works fine. But if I use
var result = await StartChannelAsync(azureProgram);
Execution is not awaited, and results has zero entries.
What am I missing here?
And is this the correct approach?
Any comments on the code are appreciated. I am not a multithreading king ;)
Cheers!
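Don't spawn new Thread instances to execute tasks in parallel; use Task.WhenAll instead. With await StartChannelAsync(azureProgram) inside the new Thread(...) lambda, the lambda becomes an async void method: the thread returns at the first await, Join completes immediately, and results is still empty when you read it.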
public async Task<bool> StartAsync(string programName, CancellationToken token = default(CancellationToken))
{
    // Create a task for each program and fire them "at the same time"
    Task<bool>[] startingChannels = _accounts.GetPrograms(programName)
        .Select(n => StartChannelAsync(n))
        .ToArray();
    // Create a task that will be completed when all the supplied tasks are done
    bool[] results = await Task.WhenAll(startingChannels);
    return results.All(r => r);
}
Note: I see that you're passing a CancellationToken to your StartAsync method, but you're not actually using it. Consider passing it as an argument to StartChannelAsync and then using it when calling azureProgram.Channel.StartAsync.
If you love one-liners:
public async Task<bool> StartAsync(string programName, CancellationToken token = default(CancellationToken))
{
    return (await Task.WhenAll(_accounts.GetPrograms(programName)
        .Select(p => StartChannelAsync(p))
        .ToArray())).All(r => r);
}
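Following the note about the unused CancellationToken, here is a sketch of flowing it into every started task. StartChannelAsync below is a hypothetical simulation built on Task.Delay, and programs are plain strings rather than IProgram; whether the real azureProgram.Channel.StartAsync accepts a token is not confirmed here. If the token is canceled, the affected tasks end up Canceled and awaiting Task.WhenAll throws an OperationCanceledException (specifically a TaskCanceledException).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ChannelStarter
{
    // Hypothetical stand-in for starting one channel; the delay honors the token.
    static async Task<bool> StartChannelAsync(string program, CancellationToken token)
    {
        await Task.Delay(50, token); // throws if the token is canceled
        return true;
    }

    public static async Task<bool> StartAsync(IEnumerable<string> programs, CancellationToken token = default)
    {
        // Every channel start receives the same token.
        Task<bool>[] starting = programs.Select(p => StartChannelAsync(p, token)).ToArray();
        bool[] results = await Task.WhenAll(starting);
        return results.All(r => r);
    }
}
```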
I would like to fire several tasks while setting a timeout on them. The idea is to gather the results from the tasks that beat the clock, and cancel (or even just ignore) the other tasks.
I tried using the WithCancellation extension method as explained here; however, throwing an exception caused WhenAll to return without supplying any results.
Here's what I tried, but I'm open to other directions as well (note, however, that I need to use await rather than Task.Run since I need the httpContext in the tasks):
var cts = new CancellationTokenSource(TimeSpan.FromSeconds(3));
IEnumerable<Task<MyResults>> tasks =
    from url in urls
    select taskAsync(url).WithCancellation(cts.Token);
Task<MyResults>[] excutedTasks = null;
MyResults[] res = null;
try
{
    // Execute the query and start the searches:
    excutedTasks = tasks.ToArray();
    res = await Task.WhenAll(excutedTasks);
}
catch (Exception exc)
{
    if (excutedTasks != null)
    {
        foreach (Task<MyResults> faulted in excutedTasks.Where(t => t.IsFaulted))
        {
            // work with faulted and faulted.Exception
        }
    }
}
// work with res
EDIT:
Following @Servy's answer below, this is the implementation I went with:
var cts = new CancellationTokenSource(TimeSpan.FromSeconds(3));
IEnumerable<Task<MyResults>> tasks =
    from url in urls
    select taskAsync(url).WithCancellation(cts.Token);
// Execute the query and start the searches:
Task<MyResults>[] excutedTasks = tasks.ToArray();
try
{
    await Task.WhenAll(excutedTasks);
}
catch (OperationCanceledException)
{
    // Do nothing - we expect this if a timeout has occurred
}
IEnumerable<Task<MyResults>> completedTasks = excutedTasks.Where(t => t.Status == TaskStatus.RanToCompletion);
// These tasks have already completed, so reading Result does not block.
List<MyResults> results = completedTasks.Select(t => t.Result).ToList();
If any of the tasks fails to complete, you are correct that WhenAll doesn't return the results of the ones that did complete: the task it returns ends up faulted or canceled, and awaiting it throws instead of producing any results. Fortunately, you still have the original collection of tasks, so you can get the results of the ones that completed successfully from there.
var completedTasks = excutedTasks.Where(t => t.Status == TaskStatus.RanToCompletion);
Just use that instead of res.
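A self-contained sketch of the whole pattern, under stated assumptions: the WithCancellation explanation linked in the question isn't reproduced here, so the helper below is one commonly used version of it (it stops waiting for the task when the token fires but does not stop the underlying work), and FetchAsync is a hypothetical stand-in for taskAsync(url). On .NET 6 or later, Task<TResult>.WaitAsync(CancellationToken) offers the same behavior without a custom helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutDemo
{
    // Stops *waiting* for `task` once `token` is canceled; the task itself keeps running.
    public static async Task<T> WithCancellation<T>(this Task<T> task, CancellationToken token)
    {
        var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
        using (token.Register(() => tcs.TrySetResult(true)))
        {
            if (task != await Task.WhenAny(task, tcs.Task))
                throw new OperationCanceledException(token);
        }
        return await task;
    }

    // Hypothetical stand-in for taskAsync(url): finishes after delayMs.
    static async Task<string> FetchAsync(string url, int delayMs)
    {
        await Task.Delay(delayMs);
        return "result of " + url;
    }

    public static async Task<List<string>> GatherWithinAsync(
        IReadOnlyList<(string Url, int DelayMs)> work, TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource(timeout);
        Task<string>[] tasks = work
            .Select(w => FetchAsync(w.Url, w.DelayMs).WithCancellation(cts.Token))
            .ToArray();
        try
        {
            await Task.WhenAll(tasks);
        }
        catch (OperationCanceledException)
        {
            // Expected when at least one task missed the deadline.
        }
        // Only tasks that ran to completion have a usable Result.
        return tasks.Where(t => t.Status == TaskStatus.RanToCompletion)
                    .Select(t => t.Result)
                    .ToList();
    }
}
```

With delays of 10 ms, 20 ms and 3000 ms and a 500 ms timeout, only the first two results are returned, in input order.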
I tried your code and it worked just fine, except that the canceled tasks are not in the Faulted state but in the Canceled state. So if you want to process the canceled tasks, use t.IsCanceled instead. The non-canceled tasks ran to completion. Here is the code I used:
public static async Task MainAsync()
{
    var urls = new List<string> {"url1", "url2", "url3", "url4", "url5", "url6"};
    var cts = new CancellationTokenSource(TimeSpan.FromSeconds(3));
    IEnumerable<Task<MyResults>> tasks =
        from url in urls
        select taskAsync(url).WithCancellation(cts.Token);
    Task<MyResults>[] excutedTasks = null;
    MyResults[] res = null;
    try
    {
        // Execute the query and start the searches:
        excutedTasks = tasks.ToArray();
        res = await Task.WhenAll(excutedTasks);
    }
    catch (Exception exc)
    {
        if (excutedTasks != null)
        {
            foreach (Task<MyResults> faulted in excutedTasks.Where(t => t.IsFaulted))
            {
                // work with faulted and faulted.Exception
            }
        }
    }
}
public static async Task<MyResults> taskAsync(string url)
{
    Console.WriteLine("Start " + url);
    var random = new Random();
    var delay = random.Next(10);
    await Task.Delay(TimeSpan.FromSeconds(delay));
    Console.WriteLine("End " + url);
    return new MyResults();
}
private static void Main(string[] args)
{
    MainAsync().Wait();
}