I have gone through many Stack Overflow threads but I am still not sure why this is taking so long. I am using VS 2022, it's a console app, and the target is .NET 6.0. There is just one file, Program.cs, where the function and the call to the function are coded.
I am making a GET call to an external API. Since that API returns 10,000+ rows, I am trying to call the method that wraps this API 2-3 times in parallel. Each call also adds to a ConcurrentBag that is declared at the top, which I then query with LINQ to show some summaries on the UI.
This same external GET call takes less than 30 seconds in Postman, but my app takes forever.
Here is my code. Right now this entire code is in Program.cs of a console application. I plan to move the GetAPIData() method to a class library once this works.
static async Task GetAPIData(string url, ConcurrentBag<Stats> bag, int taskNumber)
{
    var client = new HttpClient();
    var serializer = new JsonSerializer();
    client.Timeout = TimeSpan.FromMilliseconds(Timeout.Infinite);
    var bearerToken = "xxxxxxxxxxxxxx";
    client.DefaultRequestHeaders.Add("Authorization", $"Bearer {bearerToken}");

    using (var stream = await client.GetStreamAsync(url).ConfigureAwait(false))
    using (var sr = new StreamReader(stream))
    using (JsonTextReader reader = new JsonTextReader(sr))
    {
        reader.SupportMultipleContent = true;
        while (reader.Read())
        {
            if (reader.TokenType == JsonToken.StartObject)
            {
                bag.Add(serializer.Deserialize<Stats>(reader));
            }
        }
    }
}
My calling code:
var taskNumber=1;
var url = "https://extrenalsite/api/stream";
var bag = new ConcurrentBag<Stats>();
var task1 = Task.Run(() => GetAPIData(url, bag, taskNumber++));
var task2 = Task.Run(() => GetAPIData(url, bag, taskNumber++));
var task3 = Task.Run(() => GetAPIData(url, bag, taskNumber++));
await Task.WhenAll(task1, task2, task3);
Please let me know why it is so slow and takes so long to execute even though I have spawned three parallel tasks.
Thanks.
If it is not necessary to write to the shared collection immediately, you can improve performance by collecting the results locally in the task method. This needs no thread synchronisation and will therefore be faster:
static async Task<List<Stats>> GetAPIData(string url, int taskNumber)
{
    var result = new List<Stats>();
    // your code, but instead of writing to `bag`:
    result.Add(serializer.Deserialize<Stats>(reader));
    // ...
    return result;
}
var task1 = Task.Run(() => GetAPIData(url, taskNumber++));
var task2 = Task.Run(() => GetAPIData(url, taskNumber++));
var task3 = Task.Run(() => GetAPIData(url, taskNumber++));
await Task.WhenAll(task1, task2, task3);
var allResults = task1.Result.Concat(task2.Result).Concat(task3.Result);
I have two services that ultimately both update the same object, so we have a test to ensure that the writes to that object complete (Under the hood we have retry policies on each).
9 times out of 10, one or more of the theories will fail, with task.ShouldNotBeNull(); always being the assertion that fails. What am I getting wrong with the async code in this sample? Why would the task be null?
[Theory]
[InlineData(1)]
[InlineData(5)]
[InlineData(10)]
[InlineData(20)]
public async Task ConcurrencyIssueTest(int iterations)
{
var orderResult = await _driver.PlaceOrder();
var tasksA = new List<Task<ApiResponse<string>>>();
var tasksB = new List<Task<ApiResponse<string>>>();
await Task.Run(() => Parallel.For(1, iterations,
x =>
{
tasksA.Add(_Api.TaskA(orderResult.OrderId));
tasksB.Add(_Api.TaskB(orderResult.OrderId));
}));
//Check all tasks return successful
foreach (var task in tasksA)
{
task.ShouldNotBeNull();
var result = task.GetAwaiter().GetResult();
result.ShouldNotBeNull();
result.StatusCode.ShouldBe(HttpStatusCode.OK);
}
foreach (var task in tasksB)
{
task.ShouldNotBeNull();
var result = task.GetAwaiter().GetResult();
result.ShouldNotBeNull();
result.StatusCode.ShouldBe(HttpStatusCode.OK);
}
}
There's no need for Task.Run and Parallel looping here. I'm presuming that your _Api calls are IO-bound? You want something more like this:
var tasksA = new List<Task<ApiResponse<string>>>();
var tasksB = new List<Task<ApiResponse<string>>>();

// fire off all the async tasks
for (var i = 0; i < iterations; i++)
{
    tasksA.Add(_Api.TaskA(orderResult.OrderId));
    tasksB.Add(_Api.TaskB(orderResult.OrderId));
}

// await the results
await Task.WhenAll(tasksA).ConfigureAwait(false);
foreach (var task in tasksA)
{
    // no need for GetAwaiter(); you've awaited above, so Result won't block
    var result = task.Result;
    result.ShouldNotBeNull();
    result.StatusCode.ShouldBe(HttpStatusCode.OK);
}

// to get the most out of async, only await the tasks just before you need them
await Task.WhenAll(tasksB).ConfigureAwait(false);
foreach (var task2 in tasksB)
{
    var result2 = task2.Result;
    result2.ShouldNotBeNull();
    result2.StatusCode.ShouldBe(HttpStatusCode.OK);
}
This will fire off all your API calls asynchronously and then wait while the results come back. Your Parallel.For and Task.Run wrappers are just using additional thread-pool threads to zero benefit.
If _Api were CPU-bound you could get a benefit from Task.Run, but I'm guessing these are web API calls or similar, so Task.Run is doing nothing but occupying an additional thread.
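To illustrate the distinction, here is a minimal sketch (ComputeChecksum and largeBuffer are hypothetical stand-ins for CPU-bound work, not part of the original test):

// IO-bound: the call already returns a Task that completes when the response arrives,
// so wrapping it in Task.Run only burns an extra thread-pool thread for no gain.
Task<ApiResponse<string>> ioTask = _Api.TaskA(orderResult.OrderId);

// CPU-bound: here Task.Run is useful, because it moves the computation off the current thread.
Task<int> cpuTask = Task.Run(() => ComputeChecksum(largeBuffer)); // hypothetical CPU-bound work

await Task.WhenAll(ioTask, cpuTask);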
As others have suggested, remove the Parallel.For and await all the tasks before asserting on them.
I would also recommend removing .Result from each task and awaiting them instead.
public async Task ConcurrencyIssueTest(int iterations)
{
var orderResult = await _driver.PlaceOrder();
var taskA = _Api.TaskA(orderResult.OrderId);
var taskB = _Api.TaskB(orderResult.OrderId);
await Task.WhenAll(taskA, taskB);
var taskAResult = await taskA;
taskAResult.ShouldNotBeNull();
taskAResult.StatusCode.ShouldBe(HttpStatusCode.OK);
var taskBResult = await taskB;
taskBResult.ShouldNotBeNull();
taskBResult.StatusCode.ShouldBe(HttpStatusCode.OK);
}
I am describing my problem with a simple example first, and then with a version closer to the real problem.
Imagine we have n items [i1, i2, i3, i4, ..., in] in box1, and a box2 that can process m items at a time (m is usually much less than n). The time required for each item is different. I want m items to be in progress at all times until all items have been processed.
The closer-to-real problem: you have a list1 of n strings (URL addresses of files) and you want a system that downloads m files concurrently (for example via httpclient.GetAsync()). Whenever the download of one of the m items finishes, another remaining item from list1 must be substituted as soon as possible, and this must continue until all items of list1 have been processed.
(The values of n and m are specified by user input at runtime.)
How can this be done?
Here is a generic method you can use.
When you call this, TIn will be string (the URL address) and asyncProcessor will be your async method that takes the URL address as input and returns a Task.
The SemaphoreSlim used by this method allows only a fixed number of concurrent async I/O requests at a time; as soon as one completes, the next request executes. Something like a sliding window pattern.
// Fallback used when the caller doesn't pass a limit (the value 4 is just an illustrative default).
private const int DefaultMaxDegreeOfParallelism = 4;

public static Task ForEachAsync<TIn>(
    IEnumerable<TIn> inputEnumerable,
    Func<TIn, Task> asyncProcessor,
    int? maxDegreeOfParallelism = null)
{
    int maxAsyncThreadCount = maxDegreeOfParallelism ?? DefaultMaxDegreeOfParallelism;
    SemaphoreSlim throttler = new SemaphoreSlim(maxAsyncThreadCount, maxAsyncThreadCount);

    IEnumerable<Task> tasks = inputEnumerable.Select(async input =>
    {
        await throttler.WaitAsync().ConfigureAwait(false);
        try
        {
            await asyncProcessor(input).ConfigureAwait(false);
        }
        finally
        {
            throttler.Release();
        }
    });

    return Task.WhenAll(tasks);
}
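For illustration, a minimal usage sketch (not part of the original answer): downloading a user-supplied list of URLs with at most m concurrent requests, assuming ForEachAsync is reachable from the calling code, a shared HttpClient, and a hypothetical DownloadAsync helper:

var client = new HttpClient();

async Task DownloadAsync(string url)
{
    var response = await client.GetAsync(url).ConfigureAwait(false);
    // do something with the response here
}

// urlList is the user's list of URL strings, m is the user-supplied concurrency limit
await ForEachAsync(urlList, DownloadAsync, maxDegreeOfParallelism: m);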
You should look into TPL Dataflow. Add the System.Threading.Tasks.Dataflow NuGet package to your project, and then what you want is as simple as:
private static HttpClient _client = new HttpClient();

public async Task<List<MyClass>> ProcessDownloads(IEnumerable<string> uris,
    int concurrentDownloads)
{
    var result = new List<MyClass>();

    var downloadData = new TransformBlock<string, string>(async uri =>
    {
        return await _client.GetStringAsync(uri); // GetStringAsync is a thread-safe method.
    }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = concurrentDownloads });

    var processData = new TransformBlock<string, MyClass>(
        json => JsonConvert.DeserializeObject<MyClass>(json),
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });

    var collectData = new ActionBlock<MyClass>(
        data => result.Add(data)); // When you don't specify options, dataflow processes items one at a time.

    // Set up the chain of blocks: have each block call `.Complete()` on the next block
    // when it finishes processing its last item.
    downloadData.LinkTo(processData, new DataflowLinkOptions { PropagateCompletion = true });
    processData.LinkTo(collectData, new DataflowLinkOptions { PropagateCompletion = true });

    // Load the data in to the first transform block to start off the process.
    foreach (var uri in uris)
    {
        await downloadData.SendAsync(uri).ConfigureAwait(false);
    }
    downloadData.Complete(); // Signal you are done adding data.

    // Wait for the last object to be added to the list.
    await collectData.Completion.ConfigureAwait(false);

    return result;
}
In the above code, only concurrentDownloads HTTP requests will be in flight at any given time, an unlimited number of threads will be processing the received strings and turning them into objects, and a single thread will be taking those objects and adding them to a list.
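For illustration, a minimal usage sketch (assumed to be called from the same class; the URLs and the MyClass type are placeholders, as in the code above):

var uris = new List<string> { "https://example.com/a.json", "https://example.com/b.json" };
List<MyClass> items = await ProcessDownloads(uris, concurrentDownloads: 4);
Console.WriteLine($"Downloaded and deserialized {items.Count} items.");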
UPDATE: here is a simplified example that only does what you asked for in the question
private static HttpClient _client = new HttpClient();
public void ProcessDownloads(IEnumerable<string> uris, int concurrentDownloads)
{
var downloadData = new ActionBlock<string>(async uri =>
{
var response = await _client.GetAsync(uri); //GetAsync is a thread safe method.
//do something with response here.
}, new ExecutionDataflowBlockOptions{MaxDegreeOfParallelism = concurrentDownloads});
foreach (var uri in uris)
{
downloadData.Post(uri);
}
downloadData.Complete();
downloadData.Completion.Wait();
}
A simple solution for throttling is a SemaphoreSlim.
EDIT
After a slight alteration, the code now creates the tasks only when they are needed:
var client = new HttpClient();
SemaphoreSlim semaphore = new SemaphoreSlim(m, m); //set the max here
var tasks = new List<Task>();
foreach(var url in urls)
{
// moving the wait here throttles the foreach loop
await semaphore.WaitAsync();
tasks.Add(((Func<Task>)(async () =>
{
//await semaphore.WaitAsync();
var response = await client.GetAsync(url); // possibly ConfigureAwait(false) here
// do something with response
semaphore.Release();
}))());
}
await Task.WhenAll(tasks);
This is another way to do it
var client = new HttpClient();
var tasks = new HashSet<Task>();
foreach(var url in urls)
{
if(tasks.Count == m)
{
tasks.Remove(await Task.WhenAny(tasks));
}
tasks.Add(((Func<Task>)(async () =>
{
var response = await client.GetAsync(url); // possibly ConfigureAwait(false) here
// do something with response
}))());
}
await Task.WhenAll(tasks);
Process items in parallel, limiting the number of simultaneous jobs:
string[] strings = GetStrings(); // Items to process.
const int m = 2; // Max simultaneous jobs.
Parallel.ForEach(strings, new ParallelOptions {MaxDegreeOfParallelism = m}, s =>
{
DoWork(s);
});
What's the best way to do parallel processing in C# with some async methods?
Let me explain with some simple code.
Example scenario: we have a person and 1000 text files from them. We want to check that their text files do not contain sensitive keywords, and if one of the text files does contain a sensitive keyword, we mark the person as untrusted. The method that checks this is an async method, and as soon as we find one of the sensitive keywords, further processing is not required and the checking loop must be broken for that person.
For the best performance, we want to use parallel processing.
Simple pseudocode:
bool sensitiveDetected = false;
Parallel.ForEach(textFileCollection, async (textFile, parallelLoopState) =>
{
    if (await HasSensitiveAsync(textFile))
    {
        sensitiveDetected = true;
        parallelLoopState.Break();
    }
});
if (sensitiveDetected)
    MarkUntrusted(person);
The problem is that Parallel.ForEach doesn't wait for the async lambdas to complete, so the if (sensitiveDetected) statement runs as soon as the tasks have been created.
I have read other questions like "write parallel.for with async" and "async/await and Parallel.For" and lots of other pages.
Those topics are useful when you need the results of the async methods to be collected and used later, but in my scenario the loop should end as soon as possible.
Update: Sample code:
Boolean detected=false;
Parallel.ForEach(UrlList, async (url, pls) =>
{
using (HttpClient hc = new HttpClient())
{
var result = await hc.GetAsync(url);
if ((await result.Content.ReadAsStringAsync()).Contains("sensitive"))
{
detected = true;
pls.Break();
}
}
});
if (detected)
Console.WriteLine("WARNING");
The simplest way to achieve what you need (and not what you want, because threading is evil) is to use Reactive Extensions.
var firstSensitive = await UrlList
    .ToObservable() // System.Reactive.Linq: turns the IEnumerable of URLs into an observable sequence
    .Select(async url =>
    {
        using (var http = new HttpClient())
        {
            var result = await http.GetAsync(url);
            return await result.Content.ReadAsStringAsync();
        }
    })
    .SelectMany(downloadTask => downloadTask.ToObservable())
    .Where(result => result.Contains("sensitive"))
    .FirstOrDefaultAsync();

if (firstSensitive != null)
    Console.WriteLine("WARNING");
To limit the number of concurrent HTTP queries :
const int concurrentRequestLimit = 4;
var semaphore = new SemaphoreSlim(concurrentRequestLimit);

var firstSensitive = await UrlList
    .ToObservable()
    .Select(async url =>
    {
        await semaphore.WaitAsync();
        try
        {
            using (var http = new HttpClient())
            {
                var result = await http.GetAsync(url);
                return await result.Content.ReadAsStringAsync();
            }
        }
        finally
        {
            semaphore.Release();
        }
    })
    .SelectMany(downloadTask => downloadTask.ToObservable())
    .Where(result => result.Contains("sensitive"))
    .FirstOrDefaultAsync();

if (firstSensitive != null)
    Console.WriteLine("WARNING");
if(firstSensitive != null)
Console.WriteLine("WARNING");
Update
Added the missing code that adds the tasks to listOfTasks.
I have a list of tasks that I await on:
var files = Directory.GetFiles(myFilesDirectory);
var listOfTasks = new List<Task>();
files.ToList().ForEach(file =>
{
    var localFile = file; // to avoid any closure issue
    listOfTasks.Add(ProcessMyFileTask(localFile));
});
await Task.WhenAll(listOfTasks.ToArray());
Console.WriteLine("All done!");
Here's the ProcessMyFileTask
private async Task<List<string>> ProcessMyFileTask(string filePath)
{
using (var streamReader = File.OpenText(filePath))
{
string line;
if ((line = await streamReader.ReadLineAsync()) != null)
{
return DumpHexInLog(line);
}
return null;
}
}
The message shows up when all files are processed. But if I add a continuation task, like this:
var files = Directory.GetFiles(myFilesDirectory);
var listOfTasks = new List<Task>();
files.ToList().ForEach(file =>
{
    var localFile = file; // to avoid any closure issue
    listOfTasks.Add(ProcessMyFileTask(localFile).ContinueWith(list =>
        ValidateHexDumpsTask(list.Result, localFile)));
});
await Task.WhenAll(listOfTasks.ToArray());
Console.WriteLine("All done!");
Which tasks will be awaited? I mean, would "All done!" appear after all the ProcessMyFileTask calls are done, or only after all the ValidateHexDumpsTask calls are done too?
When I tested it, it came after ValidateHexDumpsTask, but I am not sure whether that will always be the case or whether it just happened because of timing.
It will complete only when both the ProcessMyFileTask and ValidateHexDumpsTask calls are done.
However, ContinueWith is not recommended. It's a low-level, dangerous API. You should use await instead:
var files = Directory.GetFiles(myFilesDirectory);
var listOfTasks = files.Select(ProcessAndValidateAsync);
await Task.WhenAll(listOfTasks);
Console.WriteLine("All done!");
async Task ProcessAndValidateAsync(string file)
{
    var list = await ProcessMyFileTask(file);
    ValidateHexDumps(list, file);
}
WhenAll will be completed when ValidateHexDumps has finished running on each of the items.
ContinueWith returns a Task that represents the completion of the continuation, not the completion of the task that it is a continuation of.
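A minimal sketch (with hypothetical delays standing in for the real work, not taken from the question) showing that awaiting the task returned by ContinueWith also waits for the continuation:

var sw = System.Diagnostics.Stopwatch.StartNew();
var antecedent = Task.Delay(100);                                       // stands in for ProcessMyFileTask
var withContinuation = antecedent.ContinueWith(_ => Thread.Sleep(200)); // stands in for ValidateHexDumps
await Task.WhenAll(withContinuation);
Console.WriteLine(sw.ElapsedMilliseconds); // roughly 300 ms: WhenAll waited for the continuation as well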
I'm trying to process each individual request as it finishes (that is what the ContinueWith after GetStringAsync is for), and then do an additional bit of processing once they have all completed.
However, it seems that the ContinueWith on the WhenAll fires right away. It appears as if the GetStringAsync tasks are faulting, but I can't figure out why.
When I use WaitAll instead of WhenAll and just add my processing after the WaitAll, my requests work fine. But when I change it to WhenAll, it fails.
Here is an example of my code:
using (var client = new HttpClient())
{
Task.WhenAll(services.Select(x =>
{
return client.GetStringAsync(x.Url).ContinueWith(response =>
{
Console.WriteLine(response.Result);
}, TaskContinuationOptions.AttachedToParent);
}).ToArray())
.ContinueWith(response =>
{
Console.WriteLine("All tasks completed");
});
}
You shouldn't use ContinueWith and TaskContinuationOptions.AttachedToParent when using async-await. Use async-await only, instead:
async Task<IEnumerable<string>> SomeMethod(...)
{
    using (var client = new HttpClient())
    {
        var ss = await Task.WhenAll(services.Select(async x =>
        {
            var s = await client.GetStringAsync(x.Url);
            Console.WriteLine(s);
            return s;
        }));

        Console.WriteLine("All tasks completed");
        return ss;
    }
}
Well, I found the issue. I'll leave it here in case anyone else comes along looking for a similar answer. I still needed to await the Task.WhenAll method.
So, the correct code would be:
using (var client = new HttpClient())
{
await Task.WhenAll(services.Select(x =>
{
return client.GetStringAsync(x.Url).ContinueWith(response =>
{
Console.WriteLine(response.Result);
}, TaskContinuationOptions.AttachedToParent);
}).ToArray())
.ContinueWith(response =>
{
Console.WriteLine("All tasks completed");
});
}
I still see a couple of issues with your solution:
Drop the using statement - you don't want to dispose HttpClient.
Drop the ContinueWiths - you don't need them if you are using await properly.
The Task.WhenAny approach described in this MSDN article is a somewhat cleaner way to process tasks as they complete.
I would re-write your example like this:
var client = new HttpClient();
var tasks = services.Select(x => client.GetStringAsync(x.Url)).ToList();
while (tasks.Count > 0)
{
var firstDone = await Task.WhenAny(tasks);
tasks.Remove(firstDone);
Console.WriteLine(await firstDone);
}
Console.WriteLine("All tasks completed");
Edit to address comment:
If you need access to the service object as the tasks complete, one way would be to modify tasks to be a list of Task<ObjectWithMoreData> instead of Task<string>. Notice the lambda is marked async so we can await within it:
var client = new HttpClient();
var tasks = services.Select(async x => new
{
Service = x,
ResponseText = await client.GetStringAsync(x.Url)
}).ToList();
while (tasks.Count > 0)
{
var firstDone = await Task.WhenAny(tasks);
tasks.Remove(firstDone);
var result = await firstDone;
Console.WriteLine(result.ResponseText);
// do something with result.Service
}
Console.WriteLine("All tasks completed");