I am new to threading world of c#. I read there are different ways to do threading like sequential.
My scenario is below. Which one would be more suitable for the below.
I have list of complex objects. I will be making calls to PUT endpoint for each object [body of put] separately. There can be 1000 or more objects in the list. And I cannot pass all the objects at one and hence I have to pass each object in every call to the put endpoint. In this way, I have to make 1000 calls separately if there are 1000 objects.
Each put call is independent of each other while I have to store the properties of the response back from each call.
I was thinking to apply threading concept to above but not sure which one and how to do it.
Any suggestions would be greatly appreciated.
Thanking in advance.
As per the comments below,
Putting the method signatures here and adding more details.
I have IEnumerable<CamelList>. For each camel, I have to make a put request call and update the table from the response of each call. I will write a new method that will accept this list and make use of below 2 methods to make call and update table. I have to ensure, I am making not more than 100 calls at the same time and the API I am calling can be called by the same user 100 times per minute.
We have a method as
public Camel SendRequest(handler, uri, route, Camel); //basically takes all the parameters and provide you the Camel.
We have a method as public void updateInTable(Entity Camel); //updates the table.
HTTP calls are typically made using the HttpClient class, whose HTTP methods are already asynchronous. You don't need to create your own threads or tasks.
All asynchronous methods return a Task or Task<T> value. You need to use theawaitkeyword to await for the operation to complete asynchronously - that means the thread is released until the operation completes. When that happens, execution resumes after theawait`.
You can see how to write a PUT request here. The example uses the PutAsJsonAsync method to reduce the boilerplate code needed to serialize a Product class into a string and create a StringContent class with the correct content type, eg:
var response = await client.PutAsJsonAsync($"api/products/{product.Id}", product);
response.EnsureSuccessStatusCode();
If you want to PUT 1000 products, all you need is an array or list with the products. You can use LINQ to make multiple calls and await the tasks they return at the end :
var callTasks = myProducts.Select(product=>client.PutAsJsonAsync($"api/products/{product.Id}", product);
var responses = await Task.WhenAll(callTasks);
This means that you have to wait for all requests to finish before you can check if any one succeeded. You can change the body of Select to await the response itself :
var callTasks = myProducts.Select(async product=>{
var response=await client.PutAsJsonAsync($"api/products/{product.Id}", product);
if (!response.IsSuccessStatusCode)
{
//Log the error
}
return response.StatusCode;
});
var responses=await Task.WhenAll(callTasks);
It's better to conver the lambda into a separate method though, eg PutProductAsync :
async Task<HttpStatusCode> PutProduct(Product product,HttpClient client)
{
var response=await client.PutAsJsonAsync($"api/products/{product.Id}", product);
if (!response.IsSuccessStatusCode)
{
//Log the error
}
return response.StatusCode;
};
var callTasks = myProducts.Select(product=>PutProductAsync(product));
var responses=await Task.WhenAll(callTasks);
I'm going to suggest using Microsoft's Reactive Framework for this. You need to NuGet "System.Reactive" to get the bits.
Then you can do this:
var urls = new string[1000]; //somehow populated;
Func<string, HttpContent, IObservable<string>> putCall = (u, c) =>
Observable
.Using(
() => new HttpClient(),
hc =>
from resp in Observable.FromAsync(() => hc.PutAsync(u, c))
from body in Observable.FromAsync(() => resp.Content.ReadAsStringAsync())
select body);
var callsPerTimeSpanAllowed = 100;
var timeSpanAllowed = TimeSpan.FromMinutes(1.0);
IObservable<IList<string>> bufferedIntervaledUrls =
Observable.Zip(
Observable.Interval(timeSpanAllowed),
urls.ToObservable().Buffer(callsPerTimeSpanAllowed),
(_, buffered_urls) => buffered_urls);
var query =
from bufferedUrls in bufferedIntervaledUrls
from url in bufferedUrls
from result in putCall(url, new StringContent("YOURCONTENTHERE"))
select new { url, result };
IDisposable subscription =
query
.Subscribe(
x => { /* do something with each `x.url` & `x.result` */ },
() => { /* do something when it is all finished */ });
This code is breaking the URLs into blocks (or buffers) of 100 and putting them on a timeline (or interval) of 1 minute apart. It then calls the putCall for each URL and returns the result.
It's probably a little advanced for you now, but I thought this answer might be useful just to see how clean this can be.
Related
I have a WebApi in .NET CORE 3.1 in which I'm trying to get results from a service (other third party). I have created multiple requests in my API for the same service but some parameters of every request are different, the results return from service will be different for every request but structure of result will be same.
As all requests are independent of each other I want to run all that in parallel. And I want to return the first result as soon as received from the service from my API, but I also want to run all other requests in background and save there results in REDIS.
I tried to create a sample code to check if possible:
[HttpPost]
[Route("Test")]
public async Task<SearchResponse> Test(SearchRequest req)
{
List<Task<SearchResponse>> TaskList = new List<Task<SearchResponse>>();
for (int i = 0; i < 10; i++)
{
SearchRequest copyReq = Util.Copy(req); // my util function to copy the request
copyReq.ChangedParameter = i; // This is an example, many param can changed
TaskList.Add(Task.Run(() => DoSomething(copyReq)));
}
var finishedTask = await Task.WhenAny(TaskList);
return await finishedTask;
}
private async Task<SearchResponse> DoSomething(SearchRequest req)
{
// Here calling the third party service
SearchResponse resp = await service.GetResultAsync(req);
// Saving the result in REDIS
RedisManager.Save("KEY",resp);
return resp;
}
Now I'm wondering if this is correct way to dealing with this problem or not. If there is any better way please guide me to that.
EDIT
Use Case scenario
I have created a web app which will fetch results from my webapi and will display the results.
The WebApp searches for list of products (can be anything) by sending a request to my api. Now my api creates different requests as the source (Let's say Site1 and Site2) for results can be different.
Now the third party handles all requests to different sources(Site1 and Site2) and convert there results into my result structure. I have just to provide the parameter from which site i want to get results and then call the service at my end.
Now I want to send the results to my WebApp as soon as any source(site1 or site2) gives me the result, and in background I want to save the result of other source in redis. So that I can fetch that too from my webapp on other request hit.
The code looks pretty good; there's only one adjustment I'd recommend: don't use Task.Run. Task.Run causes a thread switch, which is totally unnecessary here.
[HttpPost]
[Route("Test")]
public async Task<SearchResponse> Test(SearchRequest req)
{
var TaskList = new List<Task<SearchResponse>>();
for (int i = 0; i < 10; i++)
{
SearchRequest copyReq = Util.Copy(req); // my util function to copy the request
copyReq.ChangedParameter = i; // This is an example, many param can changed
TaskList.Add(DoSomething(copyReq));
}
return await await Task.WhenAny(TaskList);
}
private async Task<SearchResponse> DoSomething(SearchRequest req)
{
// Here calling the third party service
SearchResponse resp = await service.GetResultAsync(req);
// Saving the result in REDIS
RedisManager.Save("KEY",resp);
return resp;
}
Note that this is using fire-and-forget. In the general sense, fire-and-forget is dangerous, since it means you don't care if the code fails or if it even completes. In this case, since the code is only updating a cache, fire-and-forget is acceptable.
I have an API which needs to be run in a loop for Mass processing.
Current single API is:
public async Task<ActionResult<CombinedAddressResponse>> GetCombinedAddress(AddressRequestDto request)
We are not allowed to touch/modify the original single API. However can be run in bulk, using foreach statement. What is the best way to run this asychronously without locks?
Current Solution below is just providing a list, would this be it?
public async Task<ActionResult<List<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
var combinedAddressResponses = new List<CombinedAddressResponse>();
foreach(AddressRequestDto request in requests)
{
var newCombinedAddress = (await GetCombinedAddress(request)).Value;
combinedAddressResponses.Add(newCombinedAddress);
}
return combinedAddressResponses;
}
Update:
In debugger, it has to go to combinedAddressResponse.Result.Value
combinedAddressResponse.Value = null
and Also strangely, writing combinedAddressResponse.Result.Value gives error below "Action Result does not contain a definition for for 'Value' and no accessible extension method
I'm writing this code off the top of my head without an IDE or sleep, so please comment if I'm missing something or there's a better way.
But effectively I think you want to run all your requests at once (not sequentially) doing something like this:
public async Task<ActionResult<List<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
var combinedAddressResponses = new List<CombinedAddressResponse>(requests.Count);
var tasks = new List<Task<ActionResult<CombinedAddressResponse>>(requests.Count);
foreach (var request in requests)
{
tasks.Add(Task.Run(async () => await GetCombinedAddress(request));
}
//This waits for all the tasks to complete
await tasks.WhenAll(tasks.ToArray());
combinedAddressResponses.AddRange(tasks.Select(x => x.Result.Value));
return combinedAddressResponses;
}
looking for a way to speed things up and run in parallel thanks
What you need is "asynchronous concurrency". I use the term "concurrency" to mean "doing more than one thing at a time", and "parallel" to mean "doing more than one thing at a time using threads". Since you're on ASP.NET, you don't want to use additional threads; you'd want to use a form of concurrency that works asynchronously (which uses fewer threads). So, Parallel and Task.Run should not be parts of your solution.
The way to do asynchronous concurrency is to build a collection of tasks, and then use await Task.WhenAll. E.g.:
public async Task<ActionResult<IReadOnlyList<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
// Build the collection of tasks by doing an asynchronous operation for each request.
var tasks = requests.Select(async request =>
{
var combinedAddressResponse = await GetCombinedAdress(request);
return combinedAddressResponse.Value;
}).ToList();
// Wait for all the tasks to complete and get the results.
var results = await Task.WhenAll(tasks);
return results;
}
I have the following code, that I intend to run asynchronously. My goal is that GetPictureForEmployeeAsync() is called in parallel as many times as needed. I'd like to make sure that 'await' on CreatePicture does not prevent me from doing so.
public Task<Picture[]> GetPictures(IDictionary<string, string> tags)
{
var query = documentRepository.GetRepositoryQuery();
var employees = query.Where(doc => doc.Gender == tags["gender"]);
return Task.WhenAll(employees.Select(employee => GetPictureForEmployeeAsync(employee, tags)));
}
private Task<Picture> GetPictureForEmployeeAsync(Employee employee, IDictionary<string, string> tags)
{
var base64PictureTask = blobRepository.GetBase64PictureAsync(employee.ID.ToString());
var documentTask = documentRepository.GetItemAsync(employee.ID.ToString());
return CreatePicture(tags, base64PictureTask, documentTask);
}
private static async Task<Picture> CreatePicture(IDictionary<string, string> tags, Task<string> base64PictureTask, Task<Employee> documentTask)
{
var document = await documentTask;
return new Picture
{
EmployeeID = document.ID,
Data = await base64PictureTask,
ID = document.ID.ToString(),
Tags = tags,
};
}
If I understand it correctly, Task.WhenAll() is not affected by the two awaited tasks inside CreatePicture() because GetPictureForEmployeeAsync() is not awaited. Am I right about this? If not, how should I restructure the code to achieve what I want?
I'd like to make sure that 'await' on CreatePicture does not prevent me from doing so.
It doesn't.
If I understand it correctly, Task.WhenAll() is not affected by the two awaited tasks inside CreatePicture() because GetPictureForEmployeeAsync() is not awaited. Am I right about this?
Yes and no. The WhenAll isn't limited in any way by the awaited tasks in CreatePicture, but that has nothing to do with whether GetPictureForEmployeeAsync is awaited or not. These two lines of code are equivalent in terms of behavior:
return Task.WhenAll(employees.Select(employee => GetPictureForEmployeeAsync(employee, tags)));
return Task.WhenAll(employees.Select(async employee => await GetPictureForEmployeeAsync(employee, tags)));
I recommend reading my async intro to get a good understanding of how async and await work with tasks.
Also, since GetPictures has non-trivial logic (GetRepositoryQuery and evaluating tags["gender"]), I recommend using async and await for GetPictures, as such:
public async Task<Picture[]> GetPictures(IDictionary<string, string> tags)
{
var query = documentRepository.GetRepositoryQuery();
var employees = query.Where(doc => doc.Gender == tags["gender"]);
var tasks = employees.Select(employee => GetPictureForEmployeeAsync(employee, tags)).ToList();
return await Task.WhenAll(tasks);
}
As a final note, you may find your code cleaner if you don't pass around "tasks meant to be awaited" - instead, await them first and pass their result values:
async Task<Picture> GetPictureForEmployeeAsync(Employee employee, IDictionary<string, string> tags)
{
var base64PictureTask = blobRepository.GetBase64PictureAsync(employee.ID.ToString());
var documentTask = documentRepository.GetItemAsync(employee.ID.ToString());
await Task.WhenAll(base64PictureTask, documentTask);
return CreatePicture(tags, await base64PictureTask, await documentTask);
}
static Picture CreatePicture(IDictionary<string, string> tags, string base64Picture, Employee document)
{
return new Picture
{
EmployeeID = document.ID,
Data = base64Picture,
ID = document.ID.ToString(),
Tags = tags,
};
}
The thing to keep in mind about calling an async method is that, as soon as an await statement is reached inside that method, control immediately goes back to the code that invoked the async method -- no matter where the await statement happens to be in the method. With a 'normal' method, control doesn't go back to the code that invokes that method until the end of that method is reached.
So in your case, you can do the following:
private async Task<Picture> GetPictureForEmployeeAsync(Employee employee, IDictionary<string, string> tags)
{
// As soon as we get here, control immediately goes back to the GetPictures
// method -- no need to store the task in a variable and await it within
// CreatePicture as you were doing
var picture = await blobRepository.GetBase64PictureAsync(employee.ID.ToString());
var document = await documentRepository.GetItemAsync(employee.ID.ToString());
return CreatePicture(tags, picture, document);
}
Because the first line of code in GetPictureForEmployeeAsync has an await, control will immediately go right back to this line...
return Task.WhenAll(employees.Select(employee => GetPictureForEmployeeAsync(employee, tags)));
...as soon as it is invoked. This will have the effect of all of the employee items getting processed in parallel (well, sort of -- the number of threads that will be allotted to your application will be limited).
As an additional word of advice, if this application is hitting a database or web service to get the pictures or documents, this code will likely cause you issues with running out of available connections. If this is the case, consider using System.Threading.Tasks.Parallel and setting the maximum degree of parallelism, or use SemaphoreSlim to control the number of connections used simultaneously.
I am looking at using IObservable to get a response in a request-response environment within a c# async methods and replace some older callback based code, but I am finding that if a value is pushed (Subject.OnNext) to the observable but FirstAsync is not yet at the await, then the FirstAsync is never given that message.
Is there a straightforward way to make it work, without a 2nd task/thread plus synchronisation?
public async Task<ResponseMessage> Send(RequestMessage message)
{
var id = Guid.NewGuid();
var ret = Inbound.FirstAsync((x) => x.id == id).Timeout(timeout); // Never even gets invoked if response is too fast
await DoSendMessage(id, message);
return await ret; // Will sometimes miss the event/message
}
// somewhere else reading the socket in a loop
// may or may not be the thread calling Send
Inbound = subject.AsObservable();
while (cond)
{
...
subject.OnNext(message);
}
I cant put the await for the FirstAsync simply before I send the request, as that would prevent the request being sent.
The await will subscribe to the observable. You can separate the subscription from the await by calling ToTask:
public async Task<ResponseMessage> Send(RequestMessage message)
{
var id = Guid.NewGuid();
var ret = Inbound.FirstAsync((x) => x.id == id).Timeout(timeout).ToTask();
await DoSendMessage(id, message);
return await ret;
}
I took a closer look and there is a very easy solution to your problem by just converting hot into cold observable. Replace Subject with ReplaySubject. Here is the article: http://www.introtorx.com/content/v1.0.10621.0/14_HotAndColdObservables.html.
Here is the explanation:
The Replay extension method allows you take an existing observable
sequence and give it 'replay' semantics as per ReplaySubject. As a
reminder, the ReplaySubject will cache all values so that any late
subscribers will also get all of the values.
I am using postcodes.io api to get Geolocation information based on postcodes. It's working fine. But this API has a limitation that it can accept only 100 postcodes per request. I have around 300 postcodes. I am thinking of making calls to API parallely and aggregate the response after getting everything.
var httpClient = _httpClientFactory.GetHttpClient("GeolocationAPI", postCodeServiceUrl, _httpLogger);
// chunks the list by 100 items and returns the collection
var postcodeChunks = postcodes.ChunkBy(100);
postcodeChunks.ForEach(
postcodeList =>
{
var response = httpClient
.Post
<MultipleGeolocationInfoRequest,
GetGeolocationResult<GetGeolocationResult<GeolocationInfo>[]>>(
"postcodes",
new MultipleGeolocationInfoRequest {Postcodes = postcodeList.ToArray()});
}
);
ChunkBy extension method is returning the list of lists and each list containing 100 postcodes.
I am facing difficulty in aggregating the return response and handling exceptions if any. The response from each call is also a collection.
API : http://postcodes.io/
To aggregate results you can collect them into a collection of results first and than SelectMany:
List<Poscode[]> responses = new List<Poscode[]>();
postcodeChunks.ForEach(
postcodeList =>
{
try // handle exception for individual call
{
var response = httpClient.Post(....);
responses.Add(response);
}
catch(Exception ex)
{
// do something sensible with exception for request
// continue or abort (with throw)
}
}
// merge all results into single list
var aggregated = results.SelectMany(x => x).ToList();
Note that your code is not parallel at all - consider using Parallel.ForEach or asycn based code with Task.WhenAll to run all tasks in parallel.