I have a method that attempts to download data from several URLs in Parallel, and return an IEnumerable of Deserialized types
The method looks like this:
public IEnumerable<TContent> DownloadContentFromUrls(IEnumerable<string> urls)
{
var list = new List<TContent>();
Parallel.ForEach(urls, url =>
{
lock (list)
{
_httpClient.GetAsync(url).ContinueWith(request =>
{
var response = request.Result;
//todo ensure success?
response.Content.ReadAsStringAsync().ContinueWith(text =>
{
var results = JObject.Parse(text.Result)
.ToObject<IEnumerable<TContent>>();
list.AddRange(results);
});
});
}
});
return list;
}
In my unit test (I stub out _httpClient to return a known set of text) I basically get
Sequence contains no elements
This is because the method is returning before the tasks have completed.
If I add .Wait() on the end of my .ContinueWith() calls, it passes, but I'm sure that I'm misusing the API here...
If you want a blocking call which downloads in parallel using the HttpClient.GetAsync method then you should implement it like so:
public IEnumerable<TContent> DownloadContentFromUrls<TContent>(IEnumerable<string> urls)
{
var queue = new ConcurrentQueue<TContent>();
using (var client = new HttpClient())
{
Task.WaitAll(urls.Select(url =>
{
return client.GetAsync(url).ContinueWith(response =>
{
var content = JsonConvert.DeserializeObject<IEnumerable<TContent>>(response.Result.Content.ReadAsStringAsync().Result);
foreach (var c in content)
queue.Enqueue(c);
});
}).ToArray());
}
return queue;
}
This creates an array of tasks, one for each Url, which represents a GetAsync/Deserialize operation. This is assuming that the Url returns a Json array of TContent. An empty array or a single member array will deserialize fine, but not a single array-less object.
Related
So I had to create dozens of API requests and get json to make it an object and put it in a list.
I also wanted the requests to be parallel because I do not care about the order in which the objects enter the list.
public ConcurrentBag<myModel> GetlistOfDstAsync()
{
var req = new RequestGenerator();
var InitializedObjects = req.GetInitializedObjects();
var myList = new ConcurrentBag<myModel>();
Parallel.ForEach(InitializedObjects, async item =>
{
RestRequest request = new RestRequest("resource",Method.GET);
request.AddQueryParameter("key", item.valueOne);
request.AddQueryParameter("key", item.valueTwo);
var results = await GetAsync<myModel>(request);
myList.Add(results);
});
return myList;
}
What creates a new problem, I do not understand how to put them in the list and it seems I do not use a solution that exists in a form ConcurrentBag
Is my assumption correct and I implement it wrong or should I use another solution?
I also wanted the requests to be parallel
What you actually want is concurrent requests. Parallel does not work as expected with async.
To do asynchronous concurrency, you start each request but do not await the tasks yet. Then, you can await all the tasks together and get the responses using Task.WhenAll:
public async Task<myModel[]> GetlistOfDstAsync()
{
var req = new RequestGenerator();
var InitializedObjects = req.GetInitializedObjects();
var tasks = InitializedObject.Select(async item =>
{
RestRequest request = new RestRequest("resource",Method.GET);
request.AddQueryParameter("key", item.valueOne);
request.AddQueryParameter("key", item.valueTwo);
return await GetAsync<myModel>(request);
}).ToList();
var results = await TaskWhenAll(tasks);
return results;
}
I am describing my problem in a simple example and then describing a more close problem.
Imagine We Have n items [i1,i2,i3,i4,...,in] in the box1 and we have a box2 that can handle m items to do them (m is usually much less than n) . The time required for each item is different. I want to always have doing m job items until all items are proceeded.
A much more close problem is that for example you have a list1 of n strings (URL addresses) of files and we want to have a system to have m files downloading concurrently (for example via httpclient.getAsync() method). Whenever downloading of one of m items finishes, another remaining item from list1 must be substituted as soon as possible and this must be countinued until all of List1 items proceeded.
(number of n and m are specified by users input at runtime)
How this can be done?
Here is a generic method you can use.
when you call this TIn will be string (URL addresses) and the asyncProcessor will be your async method that takes the URL address as input and returns a Task.
The SlimSemaphore used by this method is going to allow only n number of concurrent async I/O requests in real time, as soon as one completes the other request will execute. Something like a sliding window pattern.
public static Task ForEachAsync<TIn>(
IEnumerable<TIn> inputEnumerable,
Func<TIn, Task> asyncProcessor,
int? maxDegreeOfParallelism = null)
{
int maxAsyncThreadCount = maxDegreeOfParallelism ?? DefaultMaxDegreeOfParallelism;
SemaphoreSlim throttler = new SemaphoreSlim(maxAsyncThreadCount, maxAsyncThreadCount);
IEnumerable<Task> tasks = inputEnumerable.Select(async input =>
{
await throttler.WaitAsync().ConfigureAwait(false);
try
{
await asyncProcessor(input).ConfigureAwait(false);
}
finally
{
throttler.Release();
}
});
return Task.WhenAll(tasks);
}
You should look in to TPL Dataflow, add the System.Threading.Tasks.Dataflow NuGet package to your project then what you want is as simple as
private static HttpClient _client = new HttpClient();
public async Task<List<MyClass>> ProcessDownloads(IEnumerable<string> uris,
int concurrentDownloads)
{
var result = new List<MyClass>();
var downloadData = new TransformBlock<string, string>(async uri =>
{
return await _client.GetStringAsync(uri); //GetStringAsync is a thread safe method.
}, new ExecutionDataflowBlockOptions{MaxDegreeOfParallelism = concurrentDownloads});
var processData = new TransformBlock<string, MyClass>(
json => JsonConvert.DeserializeObject<MyClass>(json),
new ExecutionDataflowBlockOptions {MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded});
var collectData = new ActionBlock<MyClass>(
data => result.Add(data)); //When you don't specifiy options dataflow processes items one at a time.
//Set up the chain of blocks, have it call `.Complete()` on the next block when the current block finishes processing it's last item.
downloadData.LinkTo(processData, new DataflowLinkOptions {PropagateCompletion = true});
processData.LinkTo(collectData, new DataflowLinkOptions {PropagateCompletion = true});
//Load the data in to the first transform block to start off the process.
foreach (var uri in uris)
{
await downloadData.SendAsync(uri).ConfigureAwait(false);
}
downloadData.Complete(); //Signal you are done adding data.
//Wait for the last object to be added to the list.
await collectData.Completion.ConfigureAwait(false);
return result;
}
In the above code only concurrentDownloads number of HttpClients will be active at any given time, unlimited threads will be processing the received strings and turning them in to objects, and a single thread will be taking those objects and adding them to a list.
UPDATE: here is a simplified example that only does what you asked for in the question
private static HttpClient _client = new HttpClient();
public void ProcessDownloads(IEnumerable<string> uris, int concurrentDownloads)
{
var downloadData = new ActionBlock<string>(async uri =>
{
var response = await _client.GetAsync(uri); //GetAsync is a thread safe method.
//do something with response here.
}, new ExecutionDataflowBlockOptions{MaxDegreeOfParallelism = concurrentDownloads});
foreach (var uri in uris)
{
downloadData.Post(uri);
}
downloadData.Complete();
downloadData.Completion.Wait();
}
A simple solution for throttling is a SemaphoreSlim.
EDIT
After a slight alteration the code now creates the tasks when they are needed
var client = new HttpClient();
SemaphoreSlim semaphore = new SemaphoreSlim(m, m); //set the max here
var tasks = new List<Task>();
foreach(var url in urls)
{
// moving the wait here throttles the foreach loop
await semaphore.WaitAsync();
tasks.Add(((Func<Task>)(async () =>
{
//await semaphore.WaitAsync();
var response = await client.GetAsync(url); // possibly ConfigureAwait(false) here
// do something with response
semaphore.Release();
}))());
}
await Task.WhenAll(tasks);
This is another way to do it
var client = new HttpClient();
var tasks = new HashSet<Task>();
foreach(var url in urls)
{
if(tasks.Count == m)
{
tasks.Remove(await Task.WhenAny(tasks));
}
tasks.Add(((Func<Task>)(async () =>
{
var response = await client.GetAsync(url); // possibly ConfigureAwait(false) here
// do something with response
}))());
}
await Task.WhenAll(tasks);
Process items in parallel, limiting the number of simultaneous jobs:
string[] strings = GetStrings(); // Items to process.
const int m = 2; // Max simultaneous jobs.
Parallel.ForEach(strings, new ParallelOptions {MaxDegreeOfParallelism = m}, s =>
{
DoWork(s);
});
How i can get an iterator for use in Task.WaitAll to wait for all task exist in customList with lowest overhead or lines of code ?
public class Custom
{
public Task task;
public int result;
}
public class Main
{
void doSomething()
{
List<Custom> customList=new List<>();
//customList.Add(custom1,2,3,4,5.....);
Task.WaitAll(????)
}
}
Tasks aren't threads. They represent the execution of a function, so there is no point in storing them and their results as classes. Getting the results of multiple tasks is very easy if you use Task.WhenAll.
There is no reason to wrap the task in a class. You can think of Task itself as the wrapper over a thread invocation and is result.
If you want to retrieve the contents from multiple pages, you can use LINQ a single HttpClient to request the content from many pages at once. await Task.WhenAll will return the results from all tasks :
string[] myUrls=...;
HttpClient client=new HttpClient();
var tasks=myUrls.Select(url=>client.GetStringAsync(url));
string[] pages=await Task.WhenAll(tasks);
If you want to return more data, eg check the contents, you can do so inside the Select lambda:
var tasks=myUrls.Select(async url=>
{
var response=await client.GetStringAsync(url);
return new {
Url=url,
IsSensitive=response.Contains("sensitive"),
Response=response
};
});
var results =await Task.WhenAll(tasks);
Results contains the anonymous objects generated inside Select. You can iterate or query them just like any other array, eg:
foreach(var result in results)
{
Console.WriteLine(result.Url);
}
i have a static method that should return list. but i want to do an await inside the method.
public static List<ContactModel> CreateSampleData()
{
var data = new List<ContactModel>();
StorageFolder musiclibrary = KnownFolders.MusicLibrary;
artists = (await musiclibrary.GetFoldersAsync(CommonFolderQuery.GroupByAlbumArtist)).ToList();
for (var i = 0; i < artists.Count; i++)
{
try
{
data.Add(new ContactModel(artists[i].Name));
}
catch { }
}
return data;
}
when i make it
public static async Task<List<ContactModel>> CreateSampleData(){//method contents}
i get error on another page for this code
Error: Task<List<ContactModel>> doesnt contain a definition for ToAlphaGroups
var items = ContactModel.CreateSampleData();
data = items.ToAlphaGroups(x => x.Name);
You have to await your async method:
var items = await ContactModel.CreateSampleData();
Your method now returns a Task, thats why you get the error message.
I don't know whether I should mention this because I agree with the answer of Jan-Patric Ahnen.
But since you said you cannot add await to your code: Task has a property called Result that returns the "result" of the Task.
var items = ContactModel.CreateSampleData().Result;
data = items.ToAlphaGroups(x => x.Name);
A few things before you use Result:
Result blocks the calling thread, if called from the UI thread your app might become unresponsive
You should try to avoid Result at all cost and try to use await since Result can produce unexpected results.
I have multiple async methods that return the same type from different OAuth based REST Api calls.
If I call one directly, I can get the data back:
//Call a specific provider
public async Task<List<Contacts>> Get()
{
return await Providers.SpecificProvider.GetContacts();
}
However, if I try to loop through multiple accounts, the object is returning before the AWAIT finishes:
//Call all providers
public async Task<List<Contacts>> Get()
{
return await Providers.GetContactsFromAllProviders();
}
public async Task<List<Contacts>> GetContactsFromAllProviders()
{
var returnList = new List<Contacts>();
//Providers inherits from List<>, so it can be enumerated to trigger
//all objects in the collection
foreach (var provider in Providers)
{
var con = await provider.GetContacts();
returnList.Add(con);
}
return returnList;
}
I'm new to async and am probably missing something simple
The code you have provided will call the web service for each provider one by one in a serial fashion. I do not see how your statement the object is returning before the AWAIT finishes can be true because inside the loop the call to GetContacts is awaited.
However, instead of calling the providers serially you can call them in parallel which in most cases should decrease the execution time:
var tasks = providers.Select(provider => provider.GetContacts());
// Tasks execute in parallel - wait for them all to complete.
var result = await Task.WhenAll(tasks);
// Combine the returned lists into a single list.
var returnList = result.SelectMany(contacts => contacts).ToList();