So I have this code.
This is the main function: a parallel for loop that iterates through all the data that needs to be posted and calls a function:
ParallelOptions pOpt = new ParallelOptions();
pOpt.MaxDegreeOfParallelism = 30;
Parallel.For(0, maxsize, pOpt, (index, loopstate) =>
{
    // Calls the function where all the web requests are made
    CallRequests(data1, data2);
    if (isAborted)
        loopstate.Stop();
});
This function is called inside the parallel loop
public static void CallRequests(string data1, string data2)
{
var cookie = new CookieContainer();
var postData = Parameters[23] + data1 +
Parameters[24] + data2;
HttpWebRequest getRequest = (HttpWebRequest)WebRequest.Create(Parameters[25]);
getRequest.Accept = Parameters[26];
getRequest.KeepAlive = true;
getRequest.Referer = Parameters[27];
getRequest.CookieContainer = cookie;
getRequest.UserAgent = Parameters[28];
getRequest.Method = WebRequestMethods.Http.Post;
getRequest.AllowWriteStreamBuffering = true;
getRequest.ProtocolVersion = HttpVersion.Version10;
getRequest.AllowAutoRedirect = false;
getRequest.ContentType = Parameters[29];
getRequest.ReadWriteTimeout = 5000;
getRequest.Timeout = 5000;
getRequest.Proxy = null;
byte[] byteArray = Encoding.ASCII.GetBytes(postData);
getRequest.ContentLength = byteArray.Length;
Stream newStream = getRequest.GetRequestStream(); //open connection
newStream.Write(byteArray, 0, byteArray.Length); // Send the data.
newStream.Close();
HttpWebResponse getResponse = (HttpWebResponse)getRequest.GetResponse();
if (getResponse.Headers["Location"] == Parameters[30])
{
//These are simple get requests to retrieve the source code using the same format as above.
//I need to preserve the cookie
GetRequets(data1, data2, Parameters[31], Parameters[13], cookie);
GetRequets(data1, data2, Parameters[32], Parameters[15], cookie);
}
}
From what I have seen and been told, I understand that making these requests async is a better idea than using a parallel loop. My method is also heavy on the processor. I wonder how I can make these requests async but also preserve the multithreaded aspect. I also need to keep the cookie after the post request finishes.
Converting the CallRequests method to async is really just a case of switching the sync method calls for async ones with the await keyword and changing the method signature to return Task.
Something like this:
public static async Task CallRequestsAsync(string data1, string data2)
{
var cookie = new CookieContainer();
var postData = Parameters[23] + data1 +
Parameters[24] + data2;
HttpWebRequest getRequest = (HttpWebRequest)WebRequest.Create(Parameters[25]);
getRequest.Accept = Parameters[26];
getRequest.KeepAlive = true;
getRequest.Referer = Parameters[27];
getRequest.CookieContainer = cookie;
getRequest.UserAgent = Parameters[28];
getRequest.Method = WebRequestMethods.Http.Post;
getRequest.AllowWriteStreamBuffering = true;
getRequest.ProtocolVersion = HttpVersion.Version10;
getRequest.AllowAutoRedirect = false;
getRequest.ContentType = Parameters[29];
getRequest.ReadWriteTimeout = 5000;
getRequest.Timeout = 5000;
getRequest.Proxy = null;
byte[] byteArray = Encoding.ASCII.GetBytes(postData);
getRequest.ContentLength = byteArray.Length;
Stream newStream = await getRequest.GetRequestStreamAsync(); // open connection
await newStream.WriteAsync(byteArray, 0, byteArray.Length); // send the data
newStream.Close();
HttpWebResponse getResponse = (HttpWebResponse)await getRequest.GetResponseAsync();
if (getResponse.Headers["Location"] == Parameters[30])
{
//These are simple get requests to retrieve the source code using the same format as above.
//I need to preserve the cookie
GetRequets(data1, data2, Parameters[31], Parameters[13], cookie);
GetRequets(data1, data2, Parameters[32], Parameters[15], cookie);
}
}
However this, in itself, doesn't really get you anywhere because you still need to await the returned tasks in your main method. A very straightforward (if somewhat blunt) way of doing so would be to simply call Task.WaitAll() (or await Task.WhenAll() if the calling method itself is to become async). Something like this:
var tasks = Enumerable.Range(0, maxsize).Select(index => CallRequestsAsync(data1, data2));
Task.WaitAll(tasks.ToArray());
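Or, awaited from an async calling method, the non-blocking form mentioned above:
await Task.WhenAll(tasks);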
However, this is really pretty blunt and loses control over how many iterations are running in parallel, etc. I much prefer the TPL Dataflow library for this sort of thing. This library provides a way of chaining async (or sync, for that matter) operations in parallel and passing them from one "processing block" to the next. It has a myriad of options for tweaking degrees of parallelism, buffer sizes, etc.
A detailed exposé is beyond the scope of this answer, so I'd encourage you to read up on it, but one possible approach would be to simply push this to an action block - something like this:
var actionBlock = new ActionBlock<int>(async index =>
{
await CallRequestsAsync(data1, data2);
}, new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 30,
BoundedCapacity = 100,
});
for (int i = 0; i < maxsize; i++)
{
    actionBlock.Post(i); // or await actionBlock.SendAsync(i) if calling method is also async
}
actionBlock.Complete();
actionBlock.Completion.Wait(); // or await actionBlock.Completion if calling method is also async
A couple of additional points that are outside the scope of my answer but worth mentioning in passing:
It looks like your CallRequests method is updating some external variable with its results. Where possible it's best to avoid this pattern and have the method return the results for collation later, which the TPL Dataflow library handles through TransformBlock<> (a rough sketch follows after these points). If updating external state is unavoidable, then make sure you have thought about the multithreaded implications (deadlocks, race conditions, etc.), which are outside the scope of my answer.
I am assuming there is some useful property of index which has been lost when you created a minimal description for your question? Does it index into a parameter list or something similar? If so, you can always just iterate over these directly and change the ActionBlock<int> to an ActionBlock<{--whatever the type of your parameter is--}>
Make sure you understand the difference between multi-threaded/parallel execution and asynchronous. There are some similarities/overlaps for sure but just making something async doesn't make it multithreaded nor is the converse true.
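To illustrate the first point, here is a rough sketch of collecting results with a TransformBlock instead of updating shared state. CallRequestsWithResultAsync is a hypothetical variant of your method that returns its result as a string; the blocks come from System.Threading.Tasks.Dataflow, ConcurrentBag from System.Collections.Concurrent, and the snippet assumes the calling method is async:
// Hypothetical: CallRequestsWithResultAsync returns its result instead of writing to shared state.
var results = new ConcurrentBag<string>();
var transformBlock = new TransformBlock<int, string>(
    async index => await CallRequestsWithResultAsync(data1, data2),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 30, BoundedCapacity = 100 });
var collectBlock = new ActionBlock<string>(result => results.Add(result));
// Completion flows from the transform block to the collecting block.
transformBlock.LinkTo(collectBlock, new DataflowLinkOptions { PropagateCompletion = true });
for (int i = 0; i < maxsize; i++)
{
    await transformBlock.SendAsync(i); // respects BoundedCapacity (back-pressure)
}
transformBlock.Complete();
await collectBlock.Completion; // results now holds one entry per request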
Related
I am building a C# WinForms application and I have many REST calls to process. Each call takes about 10 seconds until I receive an answer, so in the end my application runs for quite a while, mostly spending time waiting for the REST service to answer.
I am not making progress because no matter what I try (ConfigureAwait, WaitAll or WhenAll), the application hangs, or when I want to access each task's result, it goes back to the Main method or hangs. Here is what I currently have:
I am building up a list of tasks to fill my objects:
List<Task> days = new List<Task>();
for (DateTime d = dtStart; d <= dtEnd; d = d.AddDays(1))
{
if (UseProduct)
{
Task _t = AsyncBuildDay(d, Project, Product, fixVersion);
var t = _t as Task<Day>;
days.Add(t);
}
else
{
Task _t = AsyncBuildDay(d, Project, fixVersion);
var t = _t as Task<Day>;
days.Add(t);
}
}
Then I am starting and waiting until every task is finished and the objects are built:
Task.WaitAll(days.ToArray());
When I try this instead, the tasks are stuck waiting for activation:
var tks = Task.WhenAll(days.ToArray());
What runs asynchronously inside the tasks (AsyncBuildDay) is a query to JIRA:
private async Task<string> GetResponse(string url)
{
WebRequest request = WebRequest.Create(url);
request.Method = "GET";
request.Headers["Authorization"] = "Basic " + Convert.ToBase64String(Encoding.Default.GetBytes(JIRAUser + ":" + JIRAPassword));
request.Credentials = new NetworkCredential(JIRAUser, JIRAPassword);
WebResponse response = await request.GetResponseAsync().ConfigureAwait(false);
// Get the stream containing all content returned by the requested server.
Stream dataStream = response.GetResponseStream();
// Open the stream using a StreamReader for easy access.
StreamReader reader = new StreamReader(dataStream);
// Read the content fully up to the end.
string json = reader.ReadToEnd();
return json;
}
And now I would like to access all my objects with .Result, but then the whole code freezes again.
foreach (Task<Day> t in days)
{
dc.colDays.Add(t.Result);
}
I can't find a way to get to my objects and I'm really going nuts with this stuff. Any ideas are much appreciated!
You're overcomplicating this.
Task.WhenAll is the way to go; it returns a new Task that completes when the provided tasks have all completed.
It's also non-blocking.
By awaiting the Task returned by Task.WhenAll, you unwrap its results into an array:
List<Task<Day>> dayTasks = new();
// ...
Day[] days = await Task.WhenAll(dayTasks);
You can then add this to dc.colDays:
dc.colDays.AddRange(days);
Or, if dc.colDays doesn't have an AddRange method:
foreach (var day in days) dc.colDays.Add(day);
Alternatively, it might be better to await whichever task completes first and remove the completed task from the list as you go:
while (dayTasks.Count > 0)
{
    Task<Day> completedTask = await Task.WhenAny(dayTasks);
    Day day = await completedTask; // get this task's result
    // Do something with the result.
    dayTasks.Remove(completedTask);
}
How can I execute this code in parallel? I tried running it in threads, but the requests are still executed sequentially. I am new to parallel programming, so I will be very happy for your help.
public async Task<IList<AdModel>> LocalBitcoins_buy(int page_number)
{
IList<AdModel> Buy_ads = new List<AdModel>();
string next_page_url;
string url = "https://localbitcoins.net/buy-bitcoins-online/.json?page=" + page_number;
WebRequest request = WebRequest.Create(url);
request.Method = "GET";
using (WebResponse response = await request.GetResponseAsync())
{
using (var reader = new StreamReader(response.GetResponseStream()))
{
JObject json = JObject.Parse(await reader.ReadToEndAsync());
next_page_url = (string) json["pagination"]["next"];
int counter = (int) json["data"]["ad_count"];
for (int ad_list_index = 0; ad_list_index < counter; ad_list_index++)
{
AdModel save = new AdModel();
save.Seller = (string) json["data"]["ad_list"][ad_list_index]["data"]["profile"]["username"];
save.Give = (string) json["data"]["ad_list"][ad_list_index]["data"]["currency"];
save.Get = "BTC";
save.Limits = (string) json["data"]["ad_list"][ad_list_index]["data"]["first_time_limit_btc"];
save.Deals = (string) json["data"]["ad_list"][ad_list_index]["data"]["profile"]["trade_count"];
save.Reviews = (string) json["data"]["ad_list"][ad_list_index]["data"]["profile"]["feedback_score"];
save.PaymentWindow = (string) json["data"]["ad_list"][ad_list_index]["data"]["payment_window_minutes"];
Buy_ads.Add(save);
}
}
}
Console.WriteLine(page_number);
return Buy_ads;
}
I googled and found these links: 1, 2. It seems that WebRequest cannot execute requests in parallel. I also tried to send multiple requests in parallel using WebRequest, and for some reason WebRequest did not make the requests in parallel.
But when I used the HttpClient class, it did make the requests in parallel. Try to use HttpClient instead of WebRequest, as Microsoft recommends.
So, firstly, you should use HttpClient to make web requests.
Then you can use the next approach to download pages in parallel:
public static IList<AdModel> DownloadAllPages()
{
int[] pageNumbers = getPageNumbers();
// Array of tasks that download data from the pages.
Task<IList<AdModel>>[] tasks = new Task<IList<AdModel>>[pageNumbers.Length];
// This loop launches download tasks in parallel.
for (int i = 0; i < pageNumbers.Length; i++)
{
// Launch download task without waiting for its completion.
tasks[i] = LocalBitcoins_buy(pageNumbers[i]);
}
// Wait for all tasks to complete.
Task.WaitAll(tasks);
// Combine results from all tasks into a common list.
return tasks.SelectMany(t => t.Result).ToList();
}
Of course, you should add error handling into this method.
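Coming back to the first point, here is a minimal sketch of what the request part of LocalBitcoins_buy could look like with a shared HttpClient; the JSON handling stays the same as in your question.
// Reuse one HttpClient instance for all requests instead of creating one per call.
private static readonly HttpClient httpClient = new HttpClient();

public async Task<IList<AdModel>> LocalBitcoins_buy(int page_number)
{
    string url = "https://localbitcoins.net/buy-bitcoins-online/.json?page=" + page_number;
    // GetStringAsync issues the GET and reads the whole response body asynchronously.
    string body = await httpClient.GetStringAsync(url);
    JObject json = JObject.Parse(body);
    IList<AdModel> Buy_ads = new List<AdModel>();
    // ... populate Buy_ads from json["data"]["ad_list"] exactly as in the original method ...
    return Buy_ads;
}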
I suggest you study this post carefully: https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/async/walkthrough-accessing-the-web-by-using-async-and-await
It has a tutorial that is doing exactly what you want to do: Downloading multiple pages simultaneously.
WebRequest deliberately does not work asynchronously with GetResponse. It has a second method for that: GetResponseAsync().
To get your toes wet with threading, you can also use the drop-in replacement for the foreach and for loops: https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/how-to-write-a-simple-parallel-foreach-loop
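A minimal sketch of that drop-in replacement, assuming the page numbers are known up front; note that blocking on each task inside the loop body gives up the benefits of async and is shown only to illustrate Parallel.ForEach:
// Thread-safe collection for results produced by parallel loop iterations.
var allAds = new ConcurrentBag<AdModel>();
Parallel.ForEach(pageNumbers, pageNumber =>
{
    // Blocks this worker thread until the page has been downloaded and parsed.
    IList<AdModel> ads = LocalBitcoins_buy(pageNumber).GetAwaiter().GetResult();
    foreach (var ad in ads)
        allAds.Add(ad);
});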
I am trying to read the stream of an HttpWebResponse using async/await:
async static Task testHttpWebClientAsync()
{
string url = "http://localhost/1.txt";
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
req.Method = "GET";
HttpWebResponse resp = (HttpWebResponse)await req.GetResponseAsync();
Stream stream = resp.GetResponseStream();
stream.ReadTimeout = 10 * 1000;
byte[] buffer = new byte[1024];
while (await stream.ReadAsync(buffer, 0, buffer.Length) > 0)
{
//time out exception never thrown
}
}
But it doesn't work; it never times out on ReadAsync.
For comparison, a non-async version works perfectly against the same localhost test server:
static void testHttpWebClient()
{
string url = "http://localhost/1.txt";
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
req.Method = "GET";
HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
Stream stream = resp.GetResponseStream();
stream.ReadTimeout = 10 * 1000;
byte[] buffer = new byte[1024];
while (stream.Read(buffer, 0, buffer.Length) > 0)
{
//time out exception thrown here
}
}
The above code is tested in a console application:
static void Main(string[] args)
{
testHttpWebClient();
MainAsync(args).GetAwaiter().GetResult();
}
async static Task MainAsync(string[] args)
{
await testHttpWebClientAsync();
}
But this is not relevant to the problem; I originally found the problem in a WinForms project and created the console project to reproduce it.
For reference, the test server code is something like:
int c = 10;
byte[] ba = new byte[1024];
SendHeader(sHttpVersion, sMimeType,(int) ba.Length*c, " 200 OK", ref mySocket);
for (int k = 0; k < c; k++)
{
//set break point here
SendToBrowser(ba, ref mySocket);
}
There are several similar topics on SO, but it seems that none of them solve this problem. From an API design perspective, there is no obvious reason why ReadAsync() shouldn't time out just like Read() does; ReadAsync would only need to watch both the socket and an internal timer event, which is how Task.Delay() works. This has nothing to do with CancellationToken, etc., because we don't need to cancel anything, even though ReadAsync has an overload that accepts a CancellationToken.
So this question asks both for a solution to the problem and for an explanation of why ReadAsync doesn't just time out as expected.
Asynchronous APIs on HttpWebRequest (and on WebClient since it uses HttpWebRequest internally) do not use timeouts internally. While I can't really explain the reasoning behind it, this is by design.
This is especially apparent in the Write logic of the ConnectStream (used internally by HttpWebResponse):
if (async) {
m_Connection.BeginMultipleWrite(buffers, m_WriteCallbackDelegate, asyncResult);
}
else {
SafeSetSocketTimeout(SocketShutdown.Send);
m_Connection.MultipleWrite(buffers);
}
SafeSetSocketTimeout is the method responsible of setting the timeout on the underlying socket. As you can see, it's deliberately skipped on the async path. It's the same thing for read operations, but the code is more convoluted so it's harder to show.
Therefore, you really have only two solutions:
Implement the timeout yourself, usually with a timer that calls .Abort() on the HttpWebRequest (see the sketch after this list)
Use HttpClient instead
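A rough sketch of the first option (the helper name is illustrative): it races the read against Task.Delay and aborts the request if the delay finishes first, since the async path ignores ReadWriteTimeout.
// Hypothetical helper: read with a manual timeout on the async path.
static async Task<int> ReadWithTimeoutAsync(Stream stream, byte[] buffer, int timeoutMs, HttpWebRequest request)
{
    Task<int> readTask = stream.ReadAsync(buffer, 0, buffer.Length);
    Task finished = await Task.WhenAny(readTask, Task.Delay(timeoutMs));
    if (finished != readTask)
    {
        request.Abort(); // makes the pending read fail instead of hanging forever
        throw new TimeoutException("Read timed out after " + timeoutMs + " ms.");
    }
    return await readTask; // propagates the read's result or exception
}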
Okay, so basically I have a function that returns a string, but to get that string it uses a WebRequest, which means that while the web request is running the form locks up unless I put it on a different thread.
But I can't figure out a way to capture the returned data from a thread, since it's started using Thread.Start and that returns void.
Any help please?
Current code if it matters to anyone:
string CreateReqThread(string UrlReq)
{
System.Threading.Thread NewThread = new System.Threading.Thread(() => CreateReq(UrlReq));
string ReturnedData = "";
return ReturnedData;
}
string CreateReq(string url)
{
try
{
WebRequest SendReq = WebRequest.Create(url);
SendReq.Credentials = CredentialCache.DefaultCredentials;
SendReq.Proxy = WebRequest.DefaultWebProxy; //For closed port networks like colleges
SendReq.Proxy.Credentials = CredentialCache.DefaultCredentials;
SendReq.Timeout = 15000;
System.IO.StreamReader Reader = new System.IO.StreamReader(SendReq.GetResponse().GetResponseStream());
string Response = Reader.ReadToEnd();
Reader.Close();
return Response;
}
catch (WebException e)
{
EBox(e.Message, "Unknown Error While Connecting");
return null;
}
}
A common means of doing this is to use a Task<T> instead of a thread:
Task<string> CreateReqThread(string UrlReq)
{
return Task.Factory.StartNew(() => CreateReq(UrlReq));
// In .NET 4.5, you can use (or better yet, reimplement using await/async directly)
// return Task.Run(() => CreateReq(UrlReq));
}
You can then call Task<T>.Result to get the returned value (later), when it's needed, or schedule a continuation on the task which will run when it completes.
This could look something like:
var request = CreateReqThread(theUri);
request.ContinueWith(t =>
{
// Shove results in a text box
this.textBox.Text = t.Result;
}, TaskScheduler.FromCurrentSynchronizationContext());
This also works perfectly with the new await/async support in C# 5.
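With await, the same flow collapses to something like this sketch (the handler name and textBox control are placeholders):
// Sketch only: an async event handler awaiting the task and updating the UI afterwards.
private async void fetchButton_Click(object sender, EventArgs e)
{
    string result = await CreateReqThread(theUri); // CreateReq runs on a thread-pool thread
    this.textBox.Text = result;                    // resumes on the UI thread after the await
}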
Consider the following code snippet:
public static Task<string> FetchAsync()
{
    string url = "http://www.example.com", message = "Hello World!";
    var request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = WebRequestMethods.Http.Post;
    return Task.Factory.FromAsync<Stream>(request.BeginGetRequestStream, request.EndGetRequestStream, null)
        .ContinueWith(t =>
        {
            var stream = t.Result;
            var data = Encoding.ASCII.GetBytes(message);
            Task.Factory.FromAsync(stream.BeginWrite, stream.EndWrite, data, 0, data.Length, null, TaskCreationOptions.AttachedToParent)
                .ContinueWith(t2 => { stream.Close(); });
        })
        .ContinueWith<string>(t =>
        {
            var t1 =
                Task.Factory.FromAsync<WebResponse>(request.BeginGetResponse, request.EndGetResponse, null)
                    .ContinueWith<string>(t2 =>
                    {
                        var response = (HttpWebResponse)t2.Result;
                        var stream = response.GetResponseStream();
                        var buffer = new byte[response.ContentLength > 0 ? response.ContentLength : 0x100000];
                        var t3 = Task<int>.Factory.FromAsync(stream.BeginRead, stream.EndRead, buffer, 0, buffer.Length, null, TaskCreationOptions.AttachedToParent)
                            .ContinueWith<string>(t4 =>
                            {
                                stream.Close();
                                response.Close();
                                if (t4.Result < buffer.Length)
                                {
                                    Array.Resize(ref buffer, t4.Result);
                                }
                                return Encoding.ASCII.GetString(buffer);
                            });
                        t3.Wait();
                        return t3.Result;
                    });
            t1.Wait();
            return t1.Result;
        });
}
It should return a Task<string>, send an HTTP POST request with some data, return the result from the web server as a string, and be as efficient as possible.
Do you spot any problems regarding async flow in the example above?
Is it OK to have .Wait() inside .ContinueWith() in this example?
Do you see any other problems with this piece of code (keeping aside exception handling for now)?
If async related C# 4.0 code is huge and ugly - there is a chance that it's implemented properly. If it's nice and short, then most likely it's not ;)
...though you may get it to look more attractive by creating extension methods on the WebRequest and Stream classes and cleaning up the main method.
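For instance, a couple of sketch extension methods along those lines (the method names are illustrative), wrapping the same Begin/End pairs used above:
// Sketch: expose the APM Begin/End pairs as Task-returning extension methods.
public static class WebRequestExtensions
{
    public static Task<Stream> GetRequestStreamTask(this WebRequest request)
    {
        return Task.Factory.FromAsync<Stream>(request.BeginGetRequestStream, request.EndGetRequestStream, null);
    }

    public static Task<WebResponse> GetResponseTask(this WebRequest request)
    {
        return Task.Factory.FromAsync<WebResponse>(request.BeginGetResponse, request.EndGetResponse, null);
    }
}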
P.S.: I hope C# 5.0 with its new async keyword and library will be released soon.
Reference: http://msdn.microsoft.com/en-us/vstudio/async.aspx
You're correct in thinking that the Waits are unnecessary - Result will block until a result is ready.
However, an even easier way would be to base it on the examples provided in the ParallelExtensionsExtras library.
They have made extensions for WebClient which do exactly what you're looking for:
static Task<string> FetchAsync()
{
string url = "http://www.example.com", message = "Hello World!";
return new WebClient().UploadStringTask(url, "POST", message);
}
You can read more about it in this post on the Parallel Programming with .NET blog.