I am using the postcodes.io API to get geolocation information based on postcodes. It's working fine, but this API has a limitation: it can accept only 100 postcodes per request. I have around 300 postcodes, so I am thinking of making calls to the API in parallel and aggregating the responses after getting everything.
var httpClient = _httpClientFactory.GetHttpClient("GeolocationAPI", postCodeServiceUrl, _httpLogger);
// chunks the list by 100 items and returns the collection
var postcodeChunks = postcodes.ChunkBy(100);
postcodeChunks.ForEach(
    postcodeList =>
    {
        var response = httpClient
            .Post<MultipleGeolocationInfoRequest,
                  GetGeolocationResult<GetGeolocationResult<GeolocationInfo>[]>>(
                "postcodes",
                new MultipleGeolocationInfoRequest { Postcodes = postcodeList.ToArray() });
    }
);
The ChunkBy extension method returns a list of lists, each containing 100 postcodes.
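For reference, the extension is roughly equivalent to this sketch (the actual implementation may differ):

using System.Collections.Generic;
using System.Linq;

public static class ListExtensions
{
    // Splits a list into sublists of at most chunkSize items
    public static List<List<T>> ChunkBy<T>(this List<T> source, int chunkSize)
    {
        return source
            .Select((item, index) => new { item, index })
            .GroupBy(x => x.index / chunkSize)
            .Select(g => g.Select(x => x.item).ToList())
            .ToList();
    }
}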
I am facing difficulty aggregating the returned responses and handling any exceptions. The response from each call is itself a collection.
API : http://postcodes.io/
To aggregate the results, you can collect them into a collection first and then SelectMany:
List<Postcode[]> responses = new List<Postcode[]>();
postcodeChunks.ForEach(
    postcodeList =>
    {
        try // handle exception for individual call
        {
            var response = httpClient.Post(....);
            responses.Add(response);
        }
        catch (Exception ex)
        {
            // do something sensible with the exception for this request,
            // then continue or abort (with throw)
        }
    }
);

// merge all results into a single list
var aggregated = responses.SelectMany(x => x).ToList();
Note that your code is not parallel at all - consider using Parallel.ForEach or async-based code with Task.WhenAll to run all the calls concurrently.
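For example, with an awaitable PostAsync on the wrapper (a hypothetical method; adapt it to whatever async call your client actually exposes), the chunked calls could run concurrently like this:

// PostAsync is assumed here, not part of the shown wrapper
var tasks = postcodeChunks
    .Select(postcodeList => httpClient.PostAsync(
        "postcodes",
        new MultipleGeolocationInfoRequest { Postcodes = postcodeList.ToArray() }))
    .ToList();

// All requests are now in flight; wait for every chunk to complete
var responses = await Task.WhenAll(tasks);

// Flatten the per-chunk collections into one result list
var aggregated = responses.SelectMany(r => r).ToList();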
I'm connecting to and fetching transitive group data from the MS Graph API via the following logic:
var queryOptions = new List<QueryOption>()
{
    new QueryOption("$count", "true")
};

var lstTemp = graphClient.Groups[$"{groupID}"].TransitiveMembers
    .Request(queryOptions)
    .Header("ConsistencyLevel", "eventual")
    .Select("id,mail,onPremisesSecurityIdentifier").Top(999)
    .GetAsync().GetAwaiter().GetResult();

var lstGroups = lstTemp.CurrentPage.Where(x => x.ODataType.Contains("group")).ToList();

while (lstTemp.NextPageRequest != null)
{
    lstTemp = lstTemp.NextPageRequest.GetAsync().GetAwaiter().GetResult();
    lstGroups.AddRange(lstTemp.CurrentPage.Where(x => x.ODataType.Contains("group")).ToList());
}
Although this logic works fine, for larger data sets where the result count could be around 10K records or more, I've noticed the time required to fetch all of the results is around 10-12 seconds.
I'm looking for a way to parallelize (or multi-thread/multi-task) the API calls so that the overall time to get the complete results is further reduced.
In C# we have Parallel.For etc.; can I use it in this scenario to replace my regular while loop above?
Any suggestions?
Not really using the Parallel.For API, but you can execute a bunch of asynchronous tasks concurrently by throwing them into a List<Task<T>> and awaiting the whole list with Task.WhenAll. Your code may look something like this:
var queryOptions = new List<QueryOption>()
{
    new QueryOption("$count", "true")
};

// Creating the first request
var firstRequest = graphClient.Groups[$"{groupID}"].TransitiveMembers
    .Request(queryOptions)
    .Header("ConsistencyLevel", "eventual")
    .Select("id,mail,onPremisesSecurityIdentifier").Top(999)
    .GetAsync();

// Creating a list of all requests (starting with the first one)
var requests = new List<Task<IGroupTransitiveMembersCollectionWithReferencesPage>>() { firstRequest };

// Awaiting the first response
var firstResponse = await firstRequest;

// Getting the total count from the response
var count = (int)firstResponse.AdditionalData["@odata.count"];

// Setting the offset to the amount of data you already pulled
var offset = 999;
while (offset < count)
{
    // Creating the next request
    var nextRequest = graphClient.Groups[$"{groupID}"].TransitiveMembers
        .Request() // Notice no $count=true (may potentially hurt performance and we don't need it anymore anyway)
        .Header("ConsistencyLevel", "eventual")
        .Select("id,mail,onPremisesSecurityIdentifier")
        .Skip(offset).Top(999) // Skipping the data you already pulled
        .GetAsync();

    // Adding it to the list
    requests.Add(nextRequest);

    // Increasing the offset
    offset += 999;
}

// Waiting for all the requests to finish
var allResponses = await Task.WhenAll(requests);

// This flattens the pages while filtering as you did
var allGroups = allResponses
    .Select(r => r.CurrentPage)
    .SelectMany(page => page.Where(m => m.ODataType.Contains("group")))
    .ToList();
Couldn't check if this code works without a Graph tenant, so you might need to modify it a bit, but I hope you can see the general idea.
Also, I allowed myself to refactor the code to use proper async/await, since that's good, standard practice; it should still work with .GetAwaiter().GetResult() if you can't use await in your context for some reason (please reconsider, though).
I have a Web API in .NET Core 3.1 in which I'm trying to get results from a service (a third party). I have created multiple requests in my API for the same service, but some parameters of every request are different; the results returned from the service will be different for every request, but the structure of the results will be the same.
As all requests are independent of each other, I want to run them all in parallel. I want to return the first result from my API as soon as it is received from the service, but I also want to run all the other requests in the background and save their results in Redis.
I tried to create a sample code to check if possible:
[HttpPost]
[Route("Test")]
public async Task<SearchResponse> Test(SearchRequest req)
{
    List<Task<SearchResponse>> TaskList = new List<Task<SearchResponse>>();
    for (int i = 0; i < 10; i++)
    {
        SearchRequest copyReq = Util.Copy(req); // my util function to copy the request
        copyReq.ChangedParameter = i; // This is an example, many params can change
        TaskList.Add(Task.Run(() => DoSomething(copyReq)));
    }
    var finishedTask = await Task.WhenAny(TaskList);
    return await finishedTask;
}

private async Task<SearchResponse> DoSomething(SearchRequest req)
{
    // Here calling the third party service
    SearchResponse resp = await service.GetResultAsync(req);
    // Saving the result in REDIS
    RedisManager.Save("KEY", resp);
    return resp;
}
Now I'm wondering whether this is the correct way to deal with this problem. If there is a better way, please guide me to it.
EDIT
Use Case scenario
I have created a web app which will fetch results from my Web API and display them.
The web app searches for a list of products (can be anything) by sending a request to my API. My API then creates different requests, as the sources for results (let's say Site1 and Site2) can be different.
The third party handles all requests to the different sources (Site1 and Site2) and converts their results into my result structure. I just have to provide the parameter for the site I want results from and then call the service at my end.
Now I want to send the results to my web app as soon as any source (Site1 or Site2) gives me a result, and in the background I want to save the result of the other source in Redis, so that I can fetch that too from my web app on another request.
The code looks pretty good; there's only one adjustment I'd recommend: don't use Task.Run. Task.Run causes a thread switch, which is totally unnecessary here.
[HttpPost]
[Route("Test")]
public async Task<SearchResponse> Test(SearchRequest req)
{
    var TaskList = new List<Task<SearchResponse>>();
    for (int i = 0; i < 10; i++)
    {
        SearchRequest copyReq = Util.Copy(req); // my util function to copy the request
        copyReq.ChangedParameter = i; // This is an example, many params can change
        TaskList.Add(DoSomething(copyReq));
    }
    return await await Task.WhenAny(TaskList);
}

private async Task<SearchResponse> DoSomething(SearchRequest req)
{
    // Here calling the third party service
    SearchResponse resp = await service.GetResultAsync(req);
    // Saving the result in REDIS
    RedisManager.Save("KEY", resp);
    return resp;
}
Note that this is using fire-and-forget. In the general sense, fire-and-forget is dangerous, since it means you don't care if the code fails or if it even completes. In this case, since the code is only updating a cache, fire-and-forget is acceptable.
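If you want a bit of protection around that fire-and-forget cache write, one option is to catch and log failures inside DoSomething so a Redis error can't fault an unobserved task (a sketch; _logger is an assumed injected logger, not part of the original code):

private async Task<SearchResponse> DoSomething(SearchRequest req)
{
    SearchResponse resp = await service.GetResultAsync(req);
    try
    {
        // Best-effort cache write; a failure here shouldn't bubble up
        RedisManager.Save("KEY", resp);
    }
    catch (Exception ex)
    {
        // _logger is hypothetical; use whatever logging you have
        _logger.LogWarning(ex, "Failed to cache search response");
    }
    return resp;
}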
I have an API which needs to be run in a loop for mass processing.
Current single API is:
public async Task<ActionResult<CombinedAddressResponse>> GetCombinedAddress(AddressRequestDto request)
We are not allowed to touch/modify the original single API; however, it can be run in bulk using a foreach statement. What is the best way to run this asynchronously without locks?
Is the current solution below, which just provides a list, the way to do it?
public async Task<ActionResult<List<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
    var combinedAddressResponses = new List<CombinedAddressResponse>();
    foreach (AddressRequestDto request in requests)
    {
        var newCombinedAddress = (await GetCombinedAddress(request)).Value;
        combinedAddressResponses.Add(newCombinedAddress);
    }
    return combinedAddressResponses;
}
Update:
In the debugger I have to drill into combinedAddressResponse.Result.Value; combinedAddressResponse.Value is null.
Also, strangely, writing combinedAddressResponse.Result.Value in code gives the error "'ActionResult' does not contain a definition for 'Value' and no accessible extension method".
I'm writing this code off the top of my head without an IDE or sleep, so please comment if I'm missing something or there's a better way.
But effectively I think you want to run all your requests at once (not sequentially) doing something like this:
public async Task<ActionResult<List<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
    var combinedAddressResponses = new List<CombinedAddressResponse>(requests.Count);
    var tasks = new List<Task<ActionResult<CombinedAddressResponse>>>(requests.Count);
    foreach (var request in requests)
    {
        tasks.Add(Task.Run(async () => await GetCombinedAddress(request)));
    }

    // This waits for all the tasks to complete
    await Task.WhenAll(tasks);

    combinedAddressResponses.AddRange(tasks.Select(x => x.Result.Value));
    return combinedAddressResponses;
}
Looking for a way to speed things up and run the requests in parallel, thanks.
What you need is "asynchronous concurrency". I use the term "concurrency" to mean "doing more than one thing at a time", and "parallel" to mean "doing more than one thing at a time using threads". Since you're on ASP.NET, you don't want to use additional threads; you'd want to use a form of concurrency that works asynchronously (which uses fewer threads). So, Parallel and Task.Run should not be parts of your solution.
The way to do asynchronous concurrency is to build a collection of tasks, and then use await Task.WhenAll. E.g.:
public async Task<ActionResult<IReadOnlyList<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
    // Build the collection of tasks by doing an asynchronous operation for each request.
    var tasks = requests.Select(async request =>
    {
        var combinedAddressResponse = await GetCombinedAddress(request);
        return combinedAddressResponse.Value;
    }).ToList();

    // Wait for all the tasks to complete and get the results.
    var results = await Task.WhenAll(tasks);
    return results;
}
I am new to the threading world of C#. I read that there are different ways to do threading, like sequential.
My scenario is below; which approach would be most suitable for it?
I have a list of complex objects, and I will be making calls to a PUT endpoint for each object [the body of the PUT] separately. There can be 1000 or more objects in the list, and I cannot pass all the objects at once, so I have to pass each object in its own call to the PUT endpoint. That means 1000 separate calls if there are 1000 objects.
Each PUT call is independent of the others, but I have to store properties from the response of each call.
I was thinking of applying threading concepts to the above but am not sure which one or how to do it.
Any suggestions would be greatly appreciated.
Thanks in advance.
As per the comments below, I'm putting the method signatures here and adding more details.
I have an IEnumerable<CamelList>. For each camel, I have to make a PUT request and update the table from the response of each call. I will write a new method that accepts this list and uses the two methods below to make the calls and update the table. I have to ensure I make no more than 100 calls at the same time, and the API I am calling allows the same user 100 calls per minute.
We have a method:
public Camel SendRequest(handler, uri, route, Camel); // basically takes all the parameters and provides you the Camel
and a method:
public void updateInTable(Entity Camel); // updates the table
HTTP calls are typically made using the HttpClient class, whose HTTP methods are already asynchronous. You don't need to create your own threads or tasks.
All asynchronous methods return a Task or Task<T> value. You need to use the await keyword to wait for the operation to complete asynchronously - that means the thread is released until the operation completes. When that happens, execution resumes after the await.
You can see how to write a PUT request here. The example uses the PutAsJsonAsync method to reduce the boilerplate code needed to serialize a Product class into a string and create a StringContent class with the correct content type, eg:
var response = await client.PutAsJsonAsync($"api/products/{product.Id}", product);
response.EnsureSuccessStatusCode();
If you want to PUT 1000 products, all you need is an array or list with the products. You can use LINQ to make multiple calls and await the tasks they return at the end:
var callTasks = myProducts.Select(product => client.PutAsJsonAsync($"api/products/{product.Id}", product));
var responses = await Task.WhenAll(callTasks);
This means that you have to wait for all requests to finish before you can check whether any one of them succeeded. You can change the body of Select to await the response itself:
var callTasks = myProducts.Select(async product =>
{
    var response = await client.PutAsJsonAsync($"api/products/{product.Id}", product);
    if (!response.IsSuccessStatusCode)
    {
        //Log the error
    }
    return response.StatusCode;
});

var responses = await Task.WhenAll(callTasks);
It's better to convert the lambda into a separate method though, eg PutProductAsync:
async Task<HttpStatusCode> PutProductAsync(Product product, HttpClient client)
{
    var response = await client.PutAsJsonAsync($"api/products/{product.Id}", product);
    if (!response.IsSuccessStatusCode)
    {
        //Log the error
    }
    return response.StatusCode;
}

var callTasks = myProducts.Select(product => PutProductAsync(product, client));
var responses = await Task.WhenAll(callTasks);
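The question also mentions a limit of 100 calls per user per minute. One coarse way to respect that on top of PutProductAsync (a sketch under that assumption, not a precise rate limiter) is to send the products in batches of 100 and pause between batches:

// Split the products into groups of 100 (the stated per-minute allowance)
var batches = myProducts
    .Select((product, index) => (product, index))
    .GroupBy(x => x.index / 100, x => x.product);

foreach (var batch in batches)
{
    // Fire up to 100 PUTs concurrently and wait for the batch to finish
    var batchResults = await Task.WhenAll(batch.Select(product => PutProductAsync(product, client)));

    // Crude throttle: wait out a minute before starting the next batch
    await Task.Delay(TimeSpan.FromMinutes(1));
}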
I'm going to suggest using Microsoft's Reactive Framework for this. You need to NuGet "System.Reactive" to get the bits.
Then you can do this:
var urls = new string[1000]; //somehow populated

Func<string, HttpContent, IObservable<string>> putCall = (u, c) =>
    Observable
        .Using(
            () => new HttpClient(),
            hc =>
                from resp in Observable.FromAsync(() => hc.PutAsync(u, c))
                from body in Observable.FromAsync(() => resp.Content.ReadAsStringAsync())
                select body);

var callsPerTimeSpanAllowed = 100;
var timeSpanAllowed = TimeSpan.FromMinutes(1.0);

IObservable<IList<string>> bufferedIntervaledUrls =
    Observable.Zip(
        Observable.Interval(timeSpanAllowed),
        urls.ToObservable().Buffer(callsPerTimeSpanAllowed),
        (_, buffered_urls) => buffered_urls);

var query =
    from bufferedUrls in bufferedIntervaledUrls
    from url in bufferedUrls
    from result in putCall(url, new StringContent("YOURCONTENTHERE"))
    select new { url, result };

IDisposable subscription =
    query
        .Subscribe(
            x => { /* do something with each `x.url` & `x.result` */ },
            () => { /* do something when it is all finished */ });
This code is breaking the URLs into blocks (or buffers) of 100 and putting them on a timeline (or interval) of 1 minute apart. It then calls the putCall for each URL and returns the result.
It's probably a little advanced for you now, but I thought this answer might be useful just to see how clean this can be.
I want to run all tasks with WhenAll (not one by one), but after that I need to update the list (the LastReport property) based on the results.
I think I have a solution, but I would like to check if there is a better way.
The idea is to:
Run all tasks
Remember the relation between configuration and task
Update the configuration
My solution is:
var lastReportAllTasks = new List<Task<Dictionary<string, string>>>();
var configurationTaskRelation = new Dictionary<int, Task<Dictionary<string, string>>>();

foreach (var configuration in MachineConfigurations)
{
    var task = machineService.GetReports(configuration);
    lastReportAllTasks.Add(task);
    configurationTaskRelation.Add(configuration.Id, task);
}

await Task.WhenAll(lastReportAllTasks);

foreach (var configuration in MachineConfigurations)
{
    var lastReportTask = configurationTaskRelation[configuration.Id];
    configuration.LastReport = await lastReportTask;
}
The Select function can be asynchronous itself. You can await the report and return both the configuration and the result in the same result object (anonymous type or tuple, whatever you prefer):
var tasks = MachineConfigurations.Select(async conf =>
{
    var report = await machineService.GetReports(conf);
    return new { conf, report };
});

var results = await Task.WhenAll(tasks);
foreach (var pair in results)
{
    pair.conf.LastReport = pair.report;
}
EDIT - Loops and error handling
As Servy suggested, Task.WhenAll can be omitted and the awaiting moved inside the loop:
foreach (var task in tasks)
{
    var pair = await task;
    pair.conf.LastReport = pair.report;
}
The tasks will still execute concurrently. In case of exception though, some configuration objects will be modified and some not.
In general, this would be an ugly situation, requiring extra exception handling code to clean up the modified objects. Exception handling is a lot easier when modifications are done on-the-side and finalized/applied when the happy path completes. That's one reason why updating the Configuration objects inside the Select() requires careful consideration.
In this particular case though it may be better to "skip" the failed reports, possibly move them to an error queue and reprocess them at a later time. It may be better to have partial results than no results at all, as long as this behaviour is expected:
foreach (var task in tasks)
{
    try
    {
        var pair = await task;
        pair.conf.LastReport = pair.report;
    }
    catch (Exception exc)
    {
        //Make sure the error is logged
        Log.Error(exc);
        //Note: the failing configuration isn't in scope here; track it
        //alongside the task if the error queue needs it
        ErrorQueue.Enqueue(new ProcessingError(exc));
    }
}
//Handle errors after the loop
EDIT 2 - Dataflow
For completeness: I have several thousand ticket reports to generate each day, and each GDS call (the service through which every travel agency sells tickets) takes considerable time. I can't run all requests at the same time - I start getting server serialization errors if I try more than 10 concurrent requests. I can't retry everything either.
In this case I used TPL DataFlow combined with some Railway oriented programming tricks. An ActionBlock with a DOP of 8 processes the ticket requests. The results are wrapped in a Success class and sent to the next block. Failed requests and exceptions are wrapped in a Failure class and sent to another block. Both classes inherit from IFlowEnvelope which has a Successful flag. Yes, that's F# Discriminated Union envy.
This is combined with some retry logic for timeouts etc.
In pseudocode the pipeline looks like this :
var reportingBlock = new TransformBlock<Ticket, IFlowEnvelope<TicketReport>>(reportFunc, dopOptions);
var happyBlock = new ActionBlock<IFlowEnvelope<TicketReport>>(storeToDb);
var errorBlock = new ActionBlock<IFlowEnvelope<TicketReport>>(logError);
reportingBlock.LinkTo(happyBlock, linkOptions, msg => msg.Successful);
reportingBlock.LinkTo(errorBlock, linkOptions, msg => !msg.Successful);

foreach (var ticket in tickets)
{
    reportingBlock.Post(ticket);
}
reportFunc catches any exceptions and wraps them as Failure<T> objects:
async Task<IFlowEnvelope<TicketReport>> reportFunc(Ticket ticket)
{
    try
    {
        //Do the heavy processing that produces `report`
        return new Success<TicketReport>(report);
    }
    catch (Exception exc)
    {
        //Construct an error message, msg
        return new Failure<TicketReport>(msg);
    }
}
The real pipeline includes steps that parse daily reports and individual tickets. Each call to the GDS takes 1-6 seconds so the complexity of the pipeline is justified.
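The envelope types themselves aren't shown; a minimal sketch of what the pseudocode implies (hypothetical, the real types may differ) could be:

// Hypothetical definitions matching the pseudocode above
public interface IFlowEnvelope<T>
{
    bool Successful { get; }
}

public class Success<T> : IFlowEnvelope<T>
{
    public bool Successful => true;
    public T Value { get; }
    public Success(T value) => Value = value;
}

public class Failure<T> : IFlowEnvelope<T>
{
    public bool Successful => false;
    public string Message { get; }
    public Failure(string message) => Message = message;
}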
I think you don't need lists or dictionaries. Why not a simple loop that updates LastReport with the results:
foreach (var configuration in MachineConfigurations)
{
    configuration.LastReport = await machineService.GetReports(configuration);
}
For executing all reports "in parallel":
Func<Configuration, Task> loadReport =
    async config => config.LastReport = await machineService.GetReports(config);

await Task.WhenAll(MachineConfigurations.Select(loadReport));
And a (very rough) attempt to be more functional:
Func<Configuration, Task<Configuration>> getConfigWithReportAsync =
    async config =>
    {
        var report = await machineService.GetReports(config);
        return new Configuration
        {
            Id = config.Id,
            LastReport = report
        };
    };

var configsWithUpdatedReports =
    await Task.WhenAll(MachineConfigurations.Select(getConfigWithReportAsync));
using System.Linq;

var taskResultsWithConfiguration = MachineConfigurations.Select(conf =>
    new { Conf = conf, Task = machineService.GetReports(conf) }).ToList();

await Task.WhenAll(taskResultsWithConfiguration.Select(pair => pair.Task));

// Reading .Result here is safe: Task.WhenAll has already awaited every task
foreach (var pair in taskResultsWithConfiguration)
    pair.Conf.LastReport = pair.Task.Result;