Send multiple requests at once to my WebAPI using Task.WhenAll - c#

I'm trying to send multiple identical requests at (almost) the same time to my WebAPI to do some performance testing.
For this, I call PerformRequest multiple times and wait for the calls with await Task.WhenAll.
I want to calculate the time each request takes to complete, plus the start time of each one. In my code, however, I don't know what happens if the result of R3 (request number 3) comes back before R1. Would the duration be wrong?
From what I see in the results, I think the results are getting mixed up with each other; for example, R4's result gets recorded as R1's result. So any help would be appreciated.
GlobalStopWatcher is a static class that I'm using to find the start time of each request.
Basically I want to make sure that the elapsedMilliseconds and Duration of each request are associated with that request itself,
so that if the result of the 10th request comes back before the result of the 1st, the duration doesn't end up as duration = elapsedTime(10th) - startTime(1st). Isn't that the case?
I wanted to add a lock, but it seems impossible to use one around an await keyword.
public async Task<RequestResult> PerformRequest(RequestPayload requestPayload)
{
    var url = "myUrl.com";
    var client = new RestClient(url) { Timeout = -1 };
    var request = new RestRequest { Method = Method.POST };
    request.AddHeaders(requestPayload.Headers);
    foreach (var cookie in requestPayload.Cookies)
    {
        request.AddCookie(cookie.Key, cookie.Value);
    }
    request.AddJsonBody(requestPayload.BodyRequest);
    var st = new Stopwatch();
    st.Start();
    var elapsedMilliseconds = GlobalStopWatcher.Stopwatch.ElapsedMilliseconds;
    var result = await client.ExecuteAsync(request).ConfigureAwait(false);
    st.Stop();
    var duration = st.ElapsedMilliseconds;
    return new RequestResult()
    {
        Millisecond = elapsedMilliseconds,
        Content = result.Content,
        Duration = duration
    };
}
public async Task RunAllTasks(int numberOfRequests)
{
    GlobalStopWatcher.Stopwatch.Start();
    var arrTasks = new Task<RequestResult>[numberOfRequests];
    for (var i = 0; i < numberOfRequests; i++)
    {
        arrTasks[i] = _requestService.PerformRequest(requestPayload);
    }
    var results = await Task.WhenAll(arrTasks).ConfigureAwait(false);
    RequestsFinished?.Invoke(this, results.ToList());
}

Where I think you're going wrong is in using a static GlobalStopWatcher and pushing that timing code into the very function you're testing.
You should keep everything separate and use a new instance of Stopwatch for each RunAllTasks call.
Let's make it so.
Start with these:
public async Task<RequestResult<R>> ExecuteAsync<R>(Stopwatch global, Func<Task<R>> process)
{
    var s = global.ElapsedMilliseconds;     // start time relative to the shared stopwatch
    var c = await process();                // run the operation under test
    var d = global.ElapsedMilliseconds - s; // duration of this call only
    return new RequestResult<R>()
    {
        Content = c,
        Millisecond = s,
        Duration = d
    };
}
public class RequestResult<R>
{
    public R Content;
    public long Millisecond;
    public long Duration;
}
Now you're in a position to test anything that fits the signature of Func<Task<R>>.
Let's try this:
public async Task<int> DummyAsync(int x)
{
    await Task.Delay(TimeSpan.FromSeconds(x % 3));
    return x;
}
We can set up a test like this:
public async Task<RequestResult<int>[]> RunAllTasks(int numberOfRequests)
{
    var sw = Stopwatch.StartNew();
    var tasks =
        from i in Enumerable.Range(0, numberOfRequests)
        select ExecuteAsync<int>(sw, () => DummyAsync(i));
    return await Task.WhenAll(tasks).ConfigureAwait(false);
}
Note that the line var sw = Stopwatch.StartNew(); captures a new Stopwatch for each RunAllTasks call. Nothing is actually "global" anymore.
If I execute that with RunAllTasks(7), it runs and counts correctly: each result's Millisecond is its own start offset and its Duration covers only its own call.
Now you can refactor your PerformRequest method to do just what it needs to:
public async Task<string> PerformRequest(RequestPayload requestPayload)
{
    var url = "myUrl.com";
    var client = new RestClient(url) { Timeout = -1 };
    var request = new RestRequest { Method = Method.POST };
    request.AddHeaders(requestPayload.Headers);
    foreach (var cookie in requestPayload.Cookies)
    {
        request.AddCookie(cookie.Key, cookie.Value);
    }
    request.AddJsonBody(requestPayload.BodyRequest);
    var response = await client.ExecuteAsync(request);
    return response.Content;
}
Running the tests is easy:
public async Task<RequestResult<string>[]> RunAllTasks(int numberOfRequests)
{
    var sw = Stopwatch.StartNew();
    var tasks =
        from i in Enumerable.Range(0, numberOfRequests)
        select ExecuteAsync<string>(sw, () => _requestService.PerformRequest(requestPayload));
    return await Task.WhenAll(tasks).ConfigureAwait(false);
}
If there's any doubt about the thread-safety of Stopwatch then you could do this:
public async Task<RequestResult<R>> ExecuteAsync<R>(Func<long> getMilliseconds, Func<Task<R>> process)
{
    var s = getMilliseconds();
    var c = await process();
    var d = getMilliseconds() - s;
    return new RequestResult<R>()
    {
        Content = c,
        Millisecond = s,
        Duration = d
    };
}
public async Task<RequestResult<int>[]> RunAllTasks(int numberOfRequests)
{
    var sw = Stopwatch.StartNew();
    var tasks =
        from i in Enumerable.Range(0, numberOfRequests)
        select ExecuteAsync<int>(() => { lock (sw) { return sw.ElapsedMilliseconds; } }, () => DummyAsync(i));
    return await Task.WhenAll(tasks).ConfigureAwait(false);
}
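A side note on the question's last point: you can't take a lock across an await, since the compiler forbids await inside a lock block. If you ever do need mutual exclusion around awaited code, the usual substitute is SemaphoreSlim(1, 1) used as an async-compatible mutex. A minimal sketch, assuming you really need exclusive access (the timing code above doesn't):
private static readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);

public async Task<T> WithGateAsync<T>(Func<Task<T>> action)
{
    // WaitAsync is the awaitable counterpart of Monitor.Enter / lock
    await _gate.WaitAsync().ConfigureAwait(false);
    try
    {
        return await action().ConfigureAwait(false); // awaiting while "holding the lock" is fine here
    }
    finally
    {
        _gate.Release();
    }
}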

Related

Why is using tasks with HttpClient synchronously so much slower?

So I was trying to do a quick performance test against a web API to see how it would handle multiple synchronous HTTP requests at once. I did this by spinning up 30 tasks and having each of them send an HTTP request with the HttpClient. To my surprise, it was extremely slow. I thought it was due to a lack of async/await, or that the web API was slow, but it turns out it only happens when I'm using tasks with synchronous HTTP calls (see TestSynchronousWithParallelTasks() below).
So I wrote some simple tests comparing no tasks at all, async/await with tasks, and Parallel.ForEach. All of them finish quickly, in around 10-20 milliseconds, but the original case takes around 20 seconds!
Class: HttpClientTest - Passed (5) - 19.2 sec
TestProject.HttpClientTest.TestAsyncWithParallelTasks - Passed - 12 ms
TestProject.HttpClientTest.TestIterativeAndSynchronous - Passed - 22 ms
TestProject.HttpClientTest.TestParallelForEach - Passed - 15 ms
TestProject.HttpClientTest.TestSynchronousWithParallelTasks - Passed - 19.1 sec
TestProject.HttpClientTest.TestSynchronousWithParallelThreads - Passed - 10 ms
public class HttpClientTest
{
    private HttpClient httpClient;
    private readonly ITestOutputHelper _testOutputHelper;

    public HttpClientTest(ITestOutputHelper testOutputHelper)
    {
        _testOutputHelper = testOutputHelper;
        ServicePointManager.DefaultConnectionLimit = 100;
        httpClient = new HttpClient(new HttpClientHandler { MaxConnectionsPerServer = 100 });
    }

    [Fact]
    public async Task TestSynchronousWithParallelTasks()
    {
        var tasks = new List<Task>();
        var url = "https://localhost:44388/api/values";
        for (var i = 0; i < 30; i++)
        {
            var task = Task.Run(() =>
            {
                var response = httpClient.GetAsync(url).Result;
                var content = response.Content.ReadAsStringAsync().Result;
            });
            tasks.Add(task);
        }
        await Task.WhenAll(tasks);
    }

    [Fact]
    public void TestIterativeAndSynchronous()
    {
        var url = "https://localhost:44388/api/values";
        for (var i = 0; i < 30; i++)
        {
            var response = httpClient.GetAsync(url).Result;
            var content = response.Content.ReadAsStringAsync().Result;
        }
    }

    [Fact]
    public async Task TestAsyncWithParallelTasks()
    {
        var url = "https://localhost:44388/api/values";
        var tasks = new List<Task>();
        for (var i = 0; i < 30; i++)
        {
            var task = Task.Run(async () =>
            {
                var response = await httpClient.GetAsync(url);
                var content = await response.Content.ReadAsStringAsync();
            });
            tasks.Add(task);
        }
        await Task.WhenAll(tasks);
    }

    [Fact]
    public void TestParallelForEach()
    {
        var url = "https://localhost:44388/api/values";
        var n = new int[30];
        Parallel.ForEach(n, new ParallelOptions { MaxDegreeOfParallelism = 2 }, (i) =>
        {
            var response = httpClient.GetAsync(url).Result;
            var content = response.Content.ReadAsStringAsync().Result;
        });
    }

    [Fact]
    public async Task TestSynchronousWithParallelThreads()
    {
        var tasks = new List<Task>();
        var url = "https://localhost:44388/api/values";
        var threads = new List<Thread>();
        for (var i = 0; i < 30; i++)
        {
            var thread = new Thread(() =>
            {
                var response = httpClient.GetAsync(url).Result;
                var content = response.Content.ReadAsStringAsync().Result;
            });
            thread.Start();
            threads.Add(thread);
        }
        foreach (var thread in threads)
        {
            thread.Join();
        }
    }
}
So any idea what's causing this performance hit?
I would have expected TestSynchronousWithParallelTasks() to be faster than TestIterativeAndSynchronous(), since it starts more requests at once (even if they're IO bound), while the latter waits for each request to finish before starting a new one. So it seems the tasks are somehow blocking each other?
Edit: I added a test case that uses threads instead, and it's quick like the rest.
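A plausible explanation (my hedged reading; it isn't confirmed in this thread) is thread pool starvation. Each Task.Run body blocks a pool thread on .Result, and the continuations of GetAsync also need pool threads to run on; with 30 workers blocked, the pool injects new threads only gradually, so the continuations queue up and the test crawls. Dedicated threads don't compete for the pool, which is why TestSynchronousWithParallelThreads stays fast. If that's the cause, pre-growing the pool before the test should collapse the 19 seconds to milliseconds:
// Hedged diagnostic sketch: raise the worker-thread minimum above the 30 blocked tasks.
ThreadPool.GetMinThreads(out var minWorkers, out var minIocp);
ThreadPool.SetMinThreads(Math.Max(minWorkers, 40), minIocp); // 40 > 30 blocked Task.Run bodies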

Is my approach correct for concurrent network requests?

I wrote a web crawler and I want to know if my approach is correct. The only issue I'm facing is that it stops after some hours of crawling. No exception, it just stops.
1 - The private members and the constructor:
private const int CONCURRENT_CONNECTIONS = 5;
private readonly HttpClient _client;
private readonly string[] _services = new string[2] {
    "https://example.com/items?id=ID_HERE",
    "https://another_example.com/items?id=ID_HERE"
};
private readonly List<SemaphoreSlim> _semaphores;

public Crawler() {
    ServicePointManager.DefaultConnectionLimit = CONCURRENT_CONNECTIONS;
    _client = new HttpClient();
    _semaphores = new List<SemaphoreSlim>();
    foreach (var _ in _services) {
        _semaphores.Add(new SemaphoreSlim(CONCURRENT_CONNECTIONS));
    }
}
Single HttpClient instance.
_services is just a string array containing the URLs; they are not on the same domain.
I'm using semaphores (one per domain) since I read that it's not a good idea to rely on the network request queue (I don't remember what it's called).
2 - The Run method, which is the one I will call to start crawling.
public async Task Run(List<int> ids) {
    const int BATCH_COUNT = 1000;
    var svcIndex = 0;
    var tasks = new List<Task<string>>(BATCH_COUNT);
    foreach (var itemId in ids) {
        tasks.Add(DownloadItem(svcIndex, _services[svcIndex].Replace("ID_HERE", $"{itemId}")));
        if (++svcIndex >= _services.Length) {
            svcIndex = 0;
        }
        if (tasks.Count >= BATCH_COUNT) {
            var results = await Task.WhenAll(tasks);
            await SaveDownloadedData(results.ToList());
            tasks.Clear();
        }
    }
    if (tasks.Count > 0) {
        var results = await Task.WhenAll(tasks);
        await SaveDownloadedData(results.ToList());
        tasks.Clear();
    }
}
DownloadItem is an async function that actually makes the GET request; note that I'm not awaiting it here.
If the number of tasks reaches BATCH_COUNT, I await them all and save the results to a file.
3 - The DownloadItem function.
private async Task<string> DownloadItem(int serviceIndex, string link) {
    var needReleaseSemaphore = true;
    var result = string.Empty;
    try {
        await _semaphores[serviceIndex].WaitAsync();
        var r = await _client.GetStringAsync(link);
        _semaphores[serviceIndex].Release();
        needReleaseSemaphore = false;
        // DUE TO JSON SIZE, I NEED TO REMOVE A VALUE (IT'S USELESS FOR ME)
        var obj = JObject.Parse(r);
        if (obj.ContainsKey("blah")) {
            obj.Remove("blah");
        }
        result = obj.ToString(Formatting.None);
    } catch {
        result = string.Empty;
        // SINCE I GOT AN EXCEPTION, I WILL 'LOCK' THIS SERVICE FOR 1 MINUTE.
        // IF I RELEASED THIS SEMAPHORE, I WILL LOCK IT AGAIN FIRST.
        if (!needReleaseSemaphore) {
            await _semaphores[serviceIndex].WaitAsync();
            needReleaseSemaphore = true;
        }
        await Task.Delay(60_000);
    } finally {
        // RELEASE THE SEMAPHORE, IF NEEDED.
        if (needReleaseSemaphore) {
            _semaphores[serviceIndex].Release();
        }
    }
    return result;
}
4 - The function that saves the result.
private async Task SaveDownloadedData(List<string> myData) {
    // 'await using' disposes the stream asynchronously; no explicit DisposeAsync needed
    await using var fs = new FileStream("./output.dat", FileMode.Append);
    foreach (var res in myData) {
        var blob = Encoding.UTF8.GetBytes(res);
        await fs.WriteAsync(BitConverter.GetBytes((uint)blob.Length));
        await fs.WriteAsync(blob);
    }
}
5 - Finally, the Main function.
static async Task Main(string[] args) {
    var crawler = new Crawler();
    var items = LoadItemIds();
    await crawler.Run(items);
}
After all this, is my approach correct? I need to make millions of requests; it will take some weeks/months to gather all the data I need (due to the connection limit).
After 12-14 hours it just stops, and I need to manually restart the app (memory usage is OK; my VPS has 1 GB and it never uses more than 60%).
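One hedged suggestion rather than a diagnosis: the catch block in DownloadItem swallows every exception, and Main has no top-level handler or logging, so a failure anywhere else (SaveDownloadedData, for instance) would look exactly like a silent stop. A cheap first step is to leave a trace; the crash.log file name below is illustrative:
// Hedged sketch: record unobserved and top-level failures before the process dies quietly.
static async Task Main(string[] args) {
    TaskScheduler.UnobservedTaskException += (_, e) =>
        File.AppendAllText("crash.log", e.Exception + Environment.NewLine);
    try {
        var crawler = new Crawler();
        await crawler.Run(LoadItemIds());
    } catch (Exception ex) {
        File.AppendAllText("crash.log", ex + Environment.NewLine);
        throw;
    }
}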

How to find all ids and pass them automatically as parameters using LINQ?

I have 2 projects. One of them is an ASP.NET Core Web API and the second is a console application which consumes the API.
The API method looks like:
[HttpPost]
public async Task<IActionResult> CreateBillingInfo(BillingSummary billingSummaryCreateDto)
{
    var role = User.FindFirst(ClaimTypes.Role).Value;
    if (role != "admin")
    {
        return BadRequest("Available only for admin");
    }
    // ... other properties
    billingSummaryCreateDto.Price = icu * roc.Price;
    billingSummaryCreateDto.Project =
        await _context.Projects.FirstOrDefaultAsync(x => x.Id == billingSummaryCreateDto.ProjectId);
    await _context.BillingSummaries.AddAsync(billingSummaryCreateDto);
    await _context.SaveChangesAsync();
    return StatusCode(201);
}
The console application consuming the API:
public static async Task CreateBillingSummary(int projectId)
{
    var json = JsonConvert.SerializeObject(new { projectId });
    var data = new StringContent(json, Encoding.UTF8, "application/json");
    using var client = new HttpClient();
    client.DefaultRequestHeaders.Authorization =
        new AuthenticationHeaderValue("Bearer", await Token.GetToken());
    var loginResponse = await client.PostAsync(LibvirtUrls.createBillingSummaryUrl, data);
    WriteLine("Response Status Code: " + (int)loginResponse.StatusCode);
    string result = await loginResponse.Content.ReadAsStringAsync();
    WriteLine(result);
}
Program.cs main method looks like:
static async Task Main(string[] args)
{
    if (Environment.GetEnvironmentVariable("TAIKUN_USER") == null ||
        Environment.GetEnvironmentVariable("TAIKUN_PASSWORD") == null ||
        Environment.GetEnvironmentVariable("TAIKUN_URL") == null)
    {
        Console.WriteLine("Please specify all credentials");
        Environment.Exit(0);
    }
    Timer timer = new Timer(1000); // show time every second
    timer.Elapsed += Timer_Elapsed;
    timer.Start();
    while (true)
    {
        Thread.Sleep(1000); // after 1 second begin
        await PollerRequests.CreateBillingSummary(60); // auto id
        await PollerRequests.CreateBillingSummary(59); // auto id
        Thread.Sleep(3600000); // wait 1 hour before the next requests
    }
}
Is it possible to find all the ids and pass them in automatically instead of 59 and 60? The ids come from the projects table (_context.Projects).
I also tried an approach using a method which returns the ids:
public static async Task<IEnumerable<int>> GetProjectIds2()
{
    var json = await Helpers.Transformer(LibvirtUrls.projectsUrl);
    List<ProjectListDto> vmList = JsonConvert.DeserializeObject<List<ProjectListDto>>(json);
    return vmList.Select(x => x.Id).AsEnumerable(); // tried ToList() as well
}
and in the main method used:
foreach (var i in await PollerRequests.GetProjectIds2())
    new List<int> { i }
        .ForEach(async c => await PollerRequests.CreateBillingSummary(c));
It worked for the first 3 ids but doesn't get the other ones; tested with Console.WriteLine, the method returns all the ids.
First, get all the ids:
var ids = await PollerRequests.GetProjectIds2();
Then create a list of tasks and run them all:
var taskList = new List<Task>();
foreach (var id in ids)
    taskList.Add(PollerRequests.CreateBillingSummary(id));
await Task.WhenAll(taskList);
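Putting that together with the hourly loop from Program.cs, a minimal sketch (reusing the existing PollerRequests methods) could replace the hard-coded ids entirely:
// Hedged sketch: one CreateBillingSummary per project id, repeated hourly.
while (true)
{
    var ids = await PollerRequests.GetProjectIds2();
    await Task.WhenAll(ids.Select(id => PollerRequests.CreateBillingSummary(id)));
    await Task.Delay(TimeSpan.FromHours(1)); // non-blocking, unlike Thread.Sleep
}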

Processing large number of tasks concurrently and asynchronously

I would like to process a list of 50,000 URLs through a web service. The provider of this service allows 5 connections per second.
I need to process these URLs in parallel while adhering to the provider's rules.
This is my current code:
static void Main(string[] args)
{
    process_urls().GetAwaiter().GetResult();
}

public static async Task process_urls()
{
    // let's say there is a list of 50,000+ URLs
    var urls = System.IO.File.ReadAllLines("urls.txt");
    var allTasks = new List<Task>();
    var throttler = new SemaphoreSlim(initialCount: 5);
    foreach (var url in urls)
    {
        await throttler.WaitAsync();
        allTasks.Add(
            Task.Run(async () =>
            {
                try
                {
                    Console.WriteLine(String.Format("Starting {0}", url));
                    var client = new HttpClient();
                    var xml = await client.GetStringAsync(url);
                    // do some processing on xml output
                    client.Dispose();
                }
                finally
                {
                    throttler.Release();
                }
            }));
    }
    await Task.WhenAll(allTasks);
}
Instead of var client = new HttpClient(); I will create a new object of the target web service, but this is just to keep the code generic.
Is this the correct approach to handle and process a huge list of connections? And is there any way I can limit the number of established connections per second to 5, since the current implementation doesn't consider any timeframe?
Thanks
Reading values from a web service is an IO operation, which can be done asynchronously without multithreading.
The threads would do nothing but wait for a response in this case, so using parallelism is just a waste of resources.
public static async Task process_urls()
{
    var urls = System.IO.File.ReadAllLines("urls.txt");
    foreach (var urlGroup in SplitToGroupsOfFive(urls))
    {
        var tasks = new List<Task>();
        foreach (var url in urlGroup)
        {
            var task = ProcessUrl(url);
            tasks.Add(task);
        }
        // This delay ensures the next 5 urls are started only after 1 second
        tasks.Add(Task.Delay(1000));
        await Task.WhenAll(tasks);
    }
}
private async Task ProcessUrl(string url)
{
    using (var client = new HttpClient())
    {
        var xml = await client.GetStringAsync(url);
        // do some processing on xml output
    }
}
private IEnumerable<IEnumerable<string>> SplitToGroupsOfFive(IEnumerable<string> urls)
{
    const int GROUP_SIZE = 5;
    string[] group = null;
    int count = 0;
    foreach (var url in urls)
    {
        if (group == null)
            group = new string[GROUP_SIZE];
        group[count] = url;
        count++;
        if (count < GROUP_SIZE)
            continue;
        yield return group;
        group = null;
        count = 0;
    }
    if (group != null && count > 0)
    {
        // only the first 'count' slots of the final group are filled
        yield return group.Take(count);
    }
}
Because you mention that "processing" the response is also an IO operation, the async/await approach is the most efficient: it uses only one thread and processes other tasks while previous ones are waiting on the web service or on file-writing IO.
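If you need a strict rolling limit of 5 new requests per second, rather than batches of 5, another common pattern (a sketch under the same assumptions as above) is a semaphore whose slots are handed back one second after they're taken, so at most 5 requests start in any one-second window:
// Hedged sketch: the semaphore gates request starts; each slot frees itself after 1 second.
var throttler = new SemaphoreSlim(5);
var allTasks = new List<Task>();
foreach (var url in urls)
{
    await throttler.WaitAsync();
    _ = Task.Delay(1000).ContinueWith(_ => throttler.Release()); // free this slot after 1 second
    allTasks.Add(ProcessUrl(url));
}
await Task.WhenAll(allTasks);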

C# await lambda function

I'll start off by posting the code that's giving me trouble:
public async Task main()
{
    Task t = func();
    await t;
    list.ItemsSource = jlist; // jlist previously defined
}

public async Task func()
{
    TwitterService service = new TwitterService(_consumerKey, _consumerSecret);
    service.AuthenticateWith(_accessToken, _accessTokenSecret);
    TwitterGeoLocationSearch g = new TwitterGeoLocationSearch(40.758367, -73.982706, 25, 0);
    SearchOptions s = new SearchOptions();
    s.Geocode = g;
    s.Q = "";
    s.Count = 1;
    service.Search(s, (statuses, response) => get_tweets(statuses, response));

    void get_tweets(TwitterSearchResult statuses, TwitterResponse response)
    {
        // unimportant code
        jlist.Add(info);
        System.Diagnostics.Debug.WriteLine("done with get_tweets, jlist created");
    }
}
I am having issues with the get_tweets(..) function (which I believe runs on a different thread): Task t is not really awaited the way I intend in main, because func returns before the Search callback fires. Basically, my issue is that list.ItemsSource = jlist runs before get_tweets has finished. Does anyone have a solution, or can anyone point me in the right direction?
First, create a TAP wrapper for TwitterService.Search, using TaskCompletionSource. So something like:
public static Task<Tuple<TwitterSearchResult, TwitterResponse>> SearchAsync(this TwitterService service, SearchOptions options)
{
    var tcs = new TaskCompletionSource<Tuple<TwitterSearchResult, TwitterResponse>>();
    service.Search(options, (status, response) => tcs.SetResult(Tuple.Create(status, response)));
    return tcs.Task;
}
Then you can consume it using await:
SearchOptions s = new SearchOptions();
s.Geocode = g;
s.Q = "";
s.Count = 1;
var result = await service.SearchAsync(s);
get_tweets(result.Item1, result.Item2);
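One hedged refinement: TaskCompletionSource.SetResult throws if it's ever called twice, so if the library could invoke the callback more than once, or you want failures surfaced as exceptions, prefer the Try variants. The null check below is an assumption about how the library signals failure, not confirmed behavior:
// Hedged sketch: tolerate a double callback and surface failure as a faulted task.
service.Search(options, (status, response) =>
{
    if (status != null)
        tcs.TrySetResult(Tuple.Create(status, response));
    else
        tcs.TrySetException(new InvalidOperationException("Twitter search returned no result"));
});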
