I'm trying to parallelize work that relies on external resources, and combine it into a single resulting dictionary.
To illustrate my need, imagine I want to download a set of files and put each result in a dictionary, where the key is the URL:
string[] urls = { "http://msdn.microsoft.com", "http://www.stackoverflow.com", "http://www.google.com" };
var fileContentTask = GetUrls(urls);
fileContentTask.Wait();
Dictionary<string, string> result = fileContentTask.Result;
// Do something
However, I was not able to code the GetUrls method. I can generate all the tasks, but I couldn't find how to consolidate the results into the dictionary:
static Task<Dictionary<string,string>> GetUrls(string[] urls)
{
var subTasks = from url in urls
let wc = new WebClient()
select wc.DownloadStringTaskAsync(url);
return Task.WhenAll(subTasks); // Does not compile
}
How can I merge the resulting tasks into a dictionary?
You need to perform the mapping yourself. For example, you could use:
static async Task<Dictionary<string,string>> GetUrls(string[] urls)
{
var tasks = urls.Select(async url =>
{
using (var client = new WebClient())
{
return new { url, content = await client.DownloadStringTaskAsync(url) };
}
}).ToList();
var results = await Task.WhenAll(tasks);
return results.ToDictionary(pair => pair.url, pair => pair.content);
}
Note how the method has to be async so that you can use await within it.
As an alternative to @Jon's answer, here is another approach (not reliable as written, because the using block disposes the WebClient as soon as the task is returned, before the download has completed):
private static Task<Dictionary<string, string>> GetUrls(string[] urls)
{
var tsc = new TaskCompletionSource<Dictionary<string, string>>();
var subTasks = urls.ToDictionary(
url => url,
url =>
{
using (var wc = new WebClient())
{
return wc.DownloadStringTaskAsync(url);
}
}
);
Task.WhenAll(subTasks.Values).ContinueWith(allTasks =>
{
var actualResult = subTasks.ToDictionary(
task => task.Key,
task => task.Value.Result
);
tsc.SetResult(actualResult);
});
return tsc.Task;
}
Something that makes use of your existing LINQ:
static async Task<Dictionary<string, string>> GetUrls(string[] urls)
{
IEnumerable<Task<string>> subTasks = from url in urls
let wc = new WebClient()
select wc.DownloadStringTaskAsync(url);
var urlsAndData = subTasks.Zip(urls, async (data, url) => new { url, data = await data });
return (await Task.WhenAll(urlsAndData)).ToDictionary(a => a.url, a => a.data);
}
But as that does not dispose the WebClient, I would refactor out a method, as below. I've also added a Distinct call, as there's no point downloading the same URL twice only to fall over when building the dictionary.
static async Task<Dictionary<string, string>> GetUrls(string[] urls)
{
var distinctUrls = urls
.Distinct().ToList();
var urlsAndData =
distinctUrls
.Select(DownloadStringAsync)
.Zip(distinctUrls, async (data, url) => new { url, data = await data });
return (await Task.WhenAll(urlsAndData)).ToDictionary(a => a.url, a => a.data);
}
private static async Task<string> DownloadStringAsync(string url)
{
using (var client = new WebClient())
{
return await client.DownloadStringTaskAsync(url);
}
}
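To illustrate why the Distinct call matters, here is a minimal sketch. The class name DistinctDemo, the helper CountUnique, and the example URLs are made up for illustration; the only library behaviour relied on is that ToDictionary throws an ArgumentException when a key occurs twice.

```csharp
using System;
using System.Linq;

public static class DistinctDemo
{
    // Hypothetical helper: removes duplicate URLs first, so ToDictionary cannot hit a repeated key.
    public static int CountUnique(string[] urls) =>
        urls.Distinct().ToDictionary(u => u, u => u.Length).Count;

    public static void Main()
    {
        string[] urls = { "http://a.example", "http://b.example", "http://a.example" };

        try
        {
            // The third entry repeats the first key, so this throws.
            urls.ToDictionary(u => u, u => u.Length);
        }
        catch (ArgumentException)
        {
            Console.WriteLine("Duplicate key rejected");
        }

        // With Distinct, only two entries remain.
        Console.WriteLine(CountUnique(urls));
    }
}
```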
Related
Well, I'm building a web parsing app and having some trouble making it async.
I have a method which creates async tasks, and a decorator for RestSharp so I can make requests via a proxy. Basically, the code just makes 5 attempts at requesting the web page.
The task returns a RestResponse and its status code is always 0. This is the problem, because if I do the same synchronously, it works.
private static async Task<HtmlNode> GetTableAsync(int page)
{
ProxyClient client = new ProxyClient((name) =>ProxyProvider.GetCoreNoCD(name),
serviceName, 10000, 10000);
var task = client.TryGetAsync(new Uri(GetPageUrl(page)), (res) =>
{
return res.IsSuccessStatusCode && res.IsSuccessful;
},5);
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml((await task).Content);
return doc.DocumentNode.SelectSingleNode("//div[@class=\"table_block\"]/table");
}
And this works as expected, but synchronously.
private static async Task<HtmlNode> GetTableAsync(int page)
{
ProxyClient client = new ProxyClient((name) =>ProxyProvider.GetCoreNoCD(name),
serviceName, 10000, 10000);
var task = client.TryGetAsync(new Uri(GetPageUrl(page)), (res) =>
{
return res.IsSuccessStatusCode && res.IsSuccessful;
},5);
task.Wait();
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(task.Result.Content);
return doc.DocumentNode.SelectSingleNode("//div[@class=\"table_block\"]/table");
}
ProxyClient's insides:
public async Task<RestResponse?> TryGetAsync(Uri uri,
Predicate<RestResponse> condition, int tryCount = 15,
List<KeyValuePair<string, string>> query = null,
List<KeyValuePair<string, string>> headers = null,
Method method = Method.Get, string body = null)
{
WebClient? client = null;
RestResponse? res = null;
for(int i = 0; i < tryCount; i++)
{
try
{
client = new WebClient(source.Invoke(serviceName), serviceName, timeout);
res = await client.GetResponseAsync(uri, query, headers, method, body);
if (condition(res))
return res;
}
catch(Exception)
{
///TODO:add log maybe?
}
finally
{
if (client != null)
{
client.SetCDToProxy(new TimeSpan(cd));
client.Dispose();
}
}
}
return res;
}
I have no idea how to make it work with async, and I don't understand why it doesn't work as expected.
I think it might have to do with the Task.Wait(). I would consider changing it to await, like this:
private static async Task<HtmlNode> GetTableAsync(int page)
{
ProxyClient client = new ProxyClient((name) =>ProxyProvider.GetCoreNoCD(name),
serviceName, 10000, 10000);
var statusOk = false;
var result = await client.GetAsync(new Uri(GetPageUrl(page)));
statusOk = result.IsSuccessStatusCode &&
result.StatusCode == HttpStatusCode.OK;
//do what you want based on statusOk
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(result.Content);
return doc.DocumentNode.SelectSingleNode("//div[@class=\"table_block\"]/table");
}
I just decided to try different solutions, and it seems to work only if I return the task result directly, like this:
ProxyClient client = new ProxyClient((name) => ProxyProvider.GetCoreNoCD(name),
serviceName, 10000, 10000);
return await client.TryGetAsync(new Uri(GetPageUrl(page)), (res) =>
{ return res.IsSuccessStatusCode && res.IsSuccessful; });
I thought it could be some kind of misunderstanding of async/await, but it seems not. Maybe it is some kind of RestSharp bug.
I think you're just checking the result too early. You need to look at the result after the await:
var task = client.TryGetAsync(...);
// Too early to check
var x = await task;
// Check now
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(x.Content);
Using C# and the Amazon SDK for .NET Core, I am able to list all the files within an Amazon S3 folder as below:
public async Task<string> GetMenuUrl(entities.Restaurant restaurant)
{
AmazonS3Client s3Client = new AmazonS3Client(_appSettings.AWSPublicKey, _appSettings.AWSPrivateKey, Amazon.RegionEndpoint.APSoutheast2);
string imagePath;
string restaurantName = trimSpecialCharacters(restaurant.Name);
int restaurantId = restaurant.RestaurantId;
ListObjectsRequest listRequest = new ListObjectsRequest();
ListObjectsResponse listResponse;
imagePath = $"Business_menu/{restaurantId}/";
listRequest.BucketName = _appSettings.AWSS3BucketName;
listRequest.Prefix = imagePath;
do
{
listResponse = await s3Client.ListObjectsAsync(listRequest);
} while (listResponse.IsTruncated);
var files = listResponse.S3Objects.Select(x => x.Key);
var arquivos = files.Select(x => Path.GetFileName(x)).ToList();
return arquivos.ToString();
}
Currently arquivos is a list containing both images (image1.jpg, image2.jpg), which is as expected, and I then return it as a string.
But when I call this method from another function:
public async Task<VenueMenuResponse> GetVenueMenuUrl(int restaurantId)
{
var restaurant = await _context.Restaurant.Where(w => w.RestaurantId == restaurantId).FirstOrDefaultAsync();
var result = await _storyService.GetMenuUrl(restaurant);
var response = new MenuResponse() //just contains string variable called MenuUrl
{
MenuUrl = result
};
return response;
}
It returns this:
{
"menuUrl": "System.Collections.Generic.List`1[System.String]"
}
whereas I want it to return:
{
"menuUrl": "Image1.jpg"
},
{
"menuUrl": "Image2.jpg"
}
You need to iterate through the results and return a list of results.
public async Task<IEnumerable<VenueMenuResponse>> GetVenueMenuUrl(int restaurantId)
{
var restaurant = await _context.Restaurant.Where(w => w.RestaurantId == restaurantId).FirstOrDefaultAsync();
var result = await _storyService.GetMenuUrl(restaurant);
var response = result.Select(e => new MenuResponse() //just contains string variable called MenuUrl
{
MenuUrl = e
});
return response;
}
var result = string.Join(", ", fileName);
The default ToString() implementation of List<T> simply returns the name of the type.
This magical line managed to solve things for me. The list was originally outputting just the object instead of the actual content of the list.
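The difference between the two calls can be seen in a minimal sketch like the following. The class name JoinDemo, the helper Describe, and the file names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class JoinDemo
{
    // Hypothetical helper: joins the items into one comma-separated string.
    public static string Describe(IEnumerable<string> fileNames) =>
        string.Join(", ", fileNames);

    public static void Main()
    {
        var files = new List<string> { "image1.jpg", "image2.jpg" };

        // List<T> does not override ToString(), so this prints the type name,
        // not the contents.
        Console.WriteLine(files.ToString());

        // string.Join prints the actual items: image1.jpg, image2.jpg
        Console.WriteLine(Describe(files));
    }
}
```

If each file name should become its own response object rather than one joined string, returning the list itself and mapping it with Select is the alternative.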
I've used the below code from this post - What is the best way to cal API calls in parallel in .net Core, C#?
It works fine, but when I'm processing a large list, some of the calls fail.
My question is, how can I implement Retry logic into this?
foreach (var post in list)
{
async Task<string> func()
{
var response = await client.GetAsync("posts/" + post);
return await response.Content.ReadAsStringAsync();
}
tasks.Add(func());
}
await Task.WhenAll(tasks);
var postResponses = new List<string>();
foreach (var t in tasks) {
var postResponse = await t; //t.Result would be okay too.
postResponses.Add(postResponse);
Console.WriteLine(postResponse);
}
This is my attempt to use Polly. It doesn't work, as it still fails on around the same number of requests as before.
What am I doing wrong?
var policy = Policy
.Handle<HttpRequestException>()
.RetryAsync(3);
foreach (var mediaItem in uploadedMedia)
{
var mediaRequest = new HttpRequestMessage { *** }
async Task<string> func()
{
var response = await client.SendAsync(mediaRequest);
return await response.Content.ReadAsStringAsync();
}
tasks.Add(policy.ExecuteAsync(() => func()));
}
await Task.WhenAll(tasks);
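Two things may explain why the policy above never retries. First, HttpClient.SendAsync does not throw an HttpRequestException for a non-success status code, so a policy that only handles that exception ignores such responses. Second, an HttpRequestMessage cannot be sent more than once, so the request has to be created inside the retried delegate. The following is a hand-rolled sketch of the idea without Polly, not a drop-in replacement: RetryDemo, RetryAsync, and their parameters are hypothetical names, and the flaky call is simulated.

```csharp
using System;
using System.Threading.Tasks;

public static class RetryDemo
{
    // Hypothetical helper: runs 'attempt' up to maxAttempts times.
    // A result is accepted as soon as isSuccess returns true for it.
    public static async Task<T> RetryAsync<T>(
        Func<Task<T>> attempt, Func<T, bool> isSuccess, int maxAttempts, TimeSpan delay)
    {
        T result = default(T);
        for (int i = 1; i <= maxAttempts; i++)
        {
            try
            {
                // 'attempt' is a factory, so each try can build a fresh request.
                result = await attempt();
                if (isSuccess(result)) return result;
            }
            catch (Exception) when (i < maxAttempts)
            {
                // Swallow and retry; an exception on the final attempt propagates.
            }
            if (i < maxAttempts) await Task.Delay(delay);
        }
        return result; // last unsuccessful result
    }

    public static async Task Main()
    {
        int calls = 0;
        // Simulated flaky call: returns an empty body twice, then succeeds.
        string body = await RetryAsync(
            () => Task.FromResult(++calls < 3 ? "" : "ok"),
            r => r.Length > 0,
            maxAttempts: 5,
            delay: TimeSpan.FromMilliseconds(10));
        Console.WriteLine($"{body} after {calls} calls");
    }
}
```

For the HTTP case, the attempt delegate would create a new HttpRequestMessage and call client.SendAsync, and the success check would be something like r => r.IsSuccessStatusCode.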
I have 2 projects. One of them is an ASP.NET Core Web API and the second one is a console application which consumes the API.
The API method looks like:
[HttpPost]
public async Task<IActionResult> CreateBillingInfo(BillingSummary
billingSummaryCreateDto)
{
var role = User.FindFirst(ClaimTypes.Role).Value;
if (role != "admin")
{
return BadRequest("Available only for admin");
}
... other properties
billingSummaryCreateDto.Price = icu * roc.Price;
billingSummaryCreateDto.Project =
await _context.Projects.FirstOrDefaultAsync(x => x.Id ==
billingSummaryCreateDto.ProjectId);
await _context.BillingSummaries.AddAsync(billingSummaryCreateDto);
await _context.SaveChangesAsync();
return StatusCode(201);
}
The console application which consumes the API:
public static async Task CreateBillingSummary(int projectId)
{
var json = JsonConvert.SerializeObject(new {projectId});
var data = new StringContent(json, Encoding.UTF8, "application/json");
using var client = new HttpClient();
client.DefaultRequestHeaders.Authorization =
new AuthenticationHeaderValue("Bearer", await Token.GetToken());
var loginResponse = await client.PostAsync(LibvirtUrls.createBillingSummaryUrl,
data);
WriteLine("Response Status Code: " + (int) loginResponse.StatusCode);
string result = await loginResponse.Content.ReadAsStringAsync();
WriteLine(result);
}
The Main method in Program.cs looks like:
static async Task Main(string[] args)
{
if (Environment.GetEnvironmentVariable("TAIKUN_USER") == null ||
Environment.GetEnvironmentVariable("TAIKUN_PASSWORD") == null ||
Environment.GetEnvironmentVariable("TAIKUN_URL") == null)
{
Console.WriteLine("Please specify all credentials");
Environment.Exit(0);
}
Timer timer = new Timer(1000); // show time every second
timer.Elapsed += Timer_Elapsed;
timer.Start();
while (true)
{
Thread.Sleep(1000); // after 1 second begin
await PollerRequests.CreateBillingSummary(60); // auto id
await PollerRequests.CreateBillingSummary(59); // auto id
Thread.Sleep(3600000); // 1hour wait again requests
}
}
Is it possible to find all the ids and pass them in automatically instead of hard-coding 59 and 60? The ids come from the projects table (_context.Projects).
I also tried an approach using a method which returns the ids:
public static async Task<IEnumerable<int>> GetProjectIds2()
{
var json = await
Helpers.Transformer(LibvirtUrls.projectsUrl);
List<ProjectListDto> vmList =
JsonConvert.DeserializeObject<List<ProjectListDto>>(json);
return vmList.Select(x => x.Id).AsEnumerable(); // tried ToList() as well
}
and in the Main method I used:
foreach (var i in await PollerRequests.GetProjectIds2())
new List<int> { i }
.ForEach(async c => await
PollerRequests.CreateBillingSummary(c));
For the first 3 ids it worked, but it does not process the other ones. I tested with Console.WriteLine, and the method returns all ids.
First get all the ids:
var ids = await PollerRequests.GetProjectIds2();
Then create a list of tasks and run them all:
var taskList = new List<Task>();
foreach(var id in ids)
taskList.Add(PollerRequests.CreateBillingSummary(id));
await Task.WhenAll(taskList);
I am trying to download multiple files simultaneously, but all files are downloaded one by one, sequentially. First http://download.geofabrik.de/europe/cyprus-latest.osm.pbf is downloaded, then http://download.geofabrik.de/europe/finland-latest.osm.pbf starts to download, the next file to be downloaded is http://download.geofabrik.de/europe/great-britain-latest.osm.pbf, and so on.
But I would like to download them simultaneously.
So I've the following code based on the code from this answer:
static void Main(string[] args)
{
Task.Run(async () =>
{
await DownloadFiles();
}).GetAwaiter().GetResult();
}
public static async Task DownloadFiles()
{
IList<string> urls = new List<string>
{
@"http://download.geofabrik.de/europe/cyprus-latest.osm.pbf",
@"http://download.geofabrik.de/europe/finland-latest.osm.pbf",
@"http://download.geofabrik.de/europe/great-britain-latest.osm.pbf",
@"http://download.geofabrik.de/europe/belgium-latest.osm.pbf",
@"http://download.geofabrik.de/europe/belgium-latest.osm.pbf"
};
foreach (var url in urls)
{
string fileName = url.Substring(url.LastIndexOf('/'));
await DownloadFile(url, fileName);
}
}
public static async Task DownloadFile(string url, string fileName)
{
string address = @"D:\Downloads";
using (var client = new WebClient())
{
await client.DownloadFileTaskAsync(url, $"{address}{fileName}");
}
}
However, when I look at my file system, I see that the files are downloaded one by one, sequentially, not simultaneously.
In addition, I've tried the following approach, but there are still no simultaneous downloads:
static void Main(string[] args)
{
IList<string> urls = new List<string>
{
@"http://download.geofabrik.de/europe/cyprus-latest.osm.pbf",
@"http://download.geofabrik.de/europe/finland-latest.osm.pbf",
@"http://download.geofabrik.de/europe/great-britain-latest.osm.pbf",
@"http://download.geofabrik.de/europe/belgium-latest.osm.pbf",
@"http://download.geofabrik.de/europe/belgium-latest.osm.pbf"
};
Parallel.ForEach(urls,
new ParallelOptions { MaxDegreeOfParallelism = 10 },
DownloadFile);
}
public static void DownloadFile(string url)
{
string address = @"D:\Downloads";
using (var sr = new StreamReader(WebRequest.Create(url)
.GetResponse().GetResponseStream()))
using (var sw = new StreamWriter(address + url.Substring(url.LastIndexOf('/'))))
{
sw.Write(sr.ReadToEnd());
}
}
Could you tell me how it is possible to download simultaneously?
Any help would be greatly appreciated.
foreach (var url in urls)
{
string fileName = url.Substring(url.LastIndexOf('/'));
await DownloadFile(url, fileName); // you wait to download the item and then move the next
}
Instead, you should create the tasks first and then wait for all of them to complete.
public static Task DownloadFiles()
{
IList<string> urls = new List<string>
{
@"http://download.geofabrik.de/europe/cyprus-latest.osm.pbf",
@"http://download.geofabrik.de/europe/finland-latest.osm.pbf",
@"http://download.geofabrik.de/europe/great-britain-latest.osm.pbf",
@"http://download.geofabrik.de/europe/belgium-latest.osm.pbf",
@"http://download.geofabrik.de/europe/belgium-latest.osm.pbf"
};
var tasks = urls.Select(url=> {
var fileName = url.Substring(url.LastIndexOf('/'));
return DownloadFile(url, fileName);
}).ToArray();
return Task.WhenAll(tasks);
}
The rest of your code can remain the same.
Eldar's solution works with some minor edits. This is the full working DownloadFiles method that was edited:
public static async Task DownloadFiles()
{
IList<string> urls = new List<string>
{
@"http://download.geofabrik.de/europe/cyprus-latest.osm.pbf",
@"http://download.geofabrik.de/europe/finland-latest.osm.pbf",
@"http://download.geofabrik.de/europe/great-britain-latest.osm.pbf",
@"http://download.geofabrik.de/europe/belgium-latest.osm.pbf",
@"http://download.geofabrik.de/europe/belgium-latest.osm.pbf"
};
var tasks = urls.Select(t => {
var fileName = t.Substring(t.LastIndexOf('/'));
return DownloadFile(t, fileName);
}).ToArray();
await Task.WhenAll(tasks);
}
This will download them asynchronously, but one after the other, because each await completes before the next download starts:
await DownloadFile(url, fileName);
await DownloadFile(url2, fileName2);
This will do what you actually want to achieve, because both downloads are started before either is awaited:
var task1 = DownloadFile(url, fileName);
var task2 = DownloadFile(url2, fileName2);
await Task.WhenAll(task1, task2);
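The difference between the two patterns can be measured with a small sketch. It assumes nothing about the network: FakeDownload is a hypothetical stand-in for DownloadFile that just waits, and ConcurrencyDemo, RunSequentially, and RunConcurrently are names invented for this example.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class ConcurrencyDemo
{
    // Stand-in for DownloadFile: simulates I/O that takes delayMs milliseconds.
    public static Task FakeDownload(int delayMs) => Task.Delay(delayMs);

    // Awaits each operation before starting the next one.
    public static async Task<long> RunSequentially(int count, int delayMs)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
            await FakeDownload(delayMs);
        return sw.ElapsedMilliseconds;
    }

    // Starts every operation first, then awaits them all together.
    public static async Task<long> RunConcurrently(int count, int delayMs)
    {
        var sw = Stopwatch.StartNew();
        var tasks = new Task[count];
        for (int i = 0; i < count; i++)
            tasks[i] = FakeDownload(delayMs);
        await Task.WhenAll(tasks);
        return sw.ElapsedMilliseconds;
    }

    public static async Task Main()
    {
        Console.WriteLine($"Sequential: {await RunSequentially(3, 200)} ms");
        Console.WriteLine($"Concurrent: {await RunConcurrently(3, 200)} ms");
    }
}
```

The sequential run should take roughly the sum of the delays, while the concurrent run should take roughly as long as a single delay; exact timings vary by machine.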