Invoking synchronous method asynchronously completes task faster than natural async methods [closed] - c#

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
Closed 7 years ago.
Sorry for the bad title. I am currently learning the TPL and reading this blog article, which states:
The ability to invoke a synchronous method asynchronously does nothing for scalability, because you’re typically still consuming the same amount of resources you would have if you’d invoked it synchronously (in fact, you’re using a bit more, since there’s overhead incurred to scheduling something ).
So I thought I'd give it a try, and I created a demo application that uses WebClient's DownloadStringTaskAsync and DownloadString (synchronous) methods.
My demo application has two methods:
DownloadHtmlNotAsyncInAsyncWay
This provides an asynchronous wrapper around the synchronous method DownloadString, which should not scale well.
DownloadHTMLCSAsync
This calls the async method DownloadStringTaskAsync.
I created 100 tasks from each method and compared the time consumed, and found that option 1 took less time than option 2. Why?
Here is my code.
using System;
using System.Diagnostics;
using System.Net;
using System.Threading.Tasks;
public class Program
{
public static void Main()
{
const int repeattime = 100;
var s = new Sample();
var sw = new Stopwatch();
var tasks = new Task<string>[repeattime];
sw.Start();
for (var i = 0; i < repeattime; i++)
{
tasks[i] = s.DownloadHtmlNotAsyncInAsyncWay();
}
Task.WhenAll(tasks);
Console.WriteLine("==========Time elapsed(non natural async): " + sw.Elapsed + "==========");
sw.Reset();
sw.Start();
for (var i = 0; i < repeattime; i++)
{
tasks[i] = s.DownloadHTMLCSAsync();
}
Task.WhenAll(tasks);
Console.WriteLine("==========Time elapsed(natural async) : " + sw.Elapsed + "==========");
sw.Reset();
}
}
public class Sample
{
private const string Url = "https://www.google.co.in";
public async Task<string> DownloadHtmlNotAsyncInAsyncWay()
{
return await Task.Run(() => DownloadHTML());
}
public async Task<string> DownloadHTMLCSAsync()
{
using (var w = new WebClient())
{
var content = await w.DownloadStringTaskAsync(new Uri(Url));
return GetWebTitle(content);
}
}
private string DownloadHTML()
{
using (var w = new WebClient())
{
var content = w.DownloadString(new Uri(Url));
return GetWebTitle(content);
}
}
private static string GetWebTitle(string content)
{
int titleStart = content.IndexOf("<title>", StringComparison.InvariantCultureIgnoreCase);
if (titleStart < 0)
{
return null;
}
int titleBodyStart = titleStart + "<title>".Length;
int titleBodyEnd = content.IndexOf("</title>", titleBodyStart, StringComparison.InvariantCultureIgnoreCase);
return content.Substring(titleBodyStart, titleBodyEnd - titleBodyStart);
}
}
Here is dotnetfiddle link.
Why did the first option complete in less time than the second?

You aren't actually measuring anything.
Task.WhenAll(tasks); returns a Task that completes when all of those tasks have completed.
You don't do anything with that task, so you aren't waiting for anything to finish.
Therefore, you're just measuring the synchronous initialization of each alternative. Task.Run() just queues a delegate to the thread pool; it does less work than setting up an HTTP request.
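To make the difference visible, here is a minimal sketch. It replaces the real download with a hypothetical `FakeDownloadAsync` that just waits about 200 ms via `Task.Delay` (an assumption, so that it runs without network access). The names `TimingDemo`, `MeasureWithoutWaiting`, and `MeasureWithWaitingAsync` are made up for this example and are not from the question.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class TimingDemo
{
    // Hypothetical stand-in for a download: completes after roughly 200 ms.
    public static async Task<string> FakeDownloadAsync()
    {
        await Task.Delay(200);
        return "<title>demo</title>";
    }

    // Mirrors the question: WhenAll is called, but the task it returns is dropped.
    public static TimeSpan MeasureWithoutWaiting(int count)
    {
        var sw = Stopwatch.StartNew();
        var tasks = Enumerable.Range(0, count).Select(_ => FakeDownloadAsync()).ToArray();
        Task.WhenAll(tasks); // returned task is ignored, so nothing is waited for
        return sw.Elapsed;
    }

    // Awaits the task returned by WhenAll, so the stopwatch covers the actual work.
    public static async Task<TimeSpan> MeasureWithWaitingAsync(int count)
    {
        var sw = Stopwatch.StartNew();
        var tasks = Enumerable.Range(0, count).Select(_ => FakeDownloadAsync()).ToArray();
        await Task.WhenAll(tasks);
        return sw.Elapsed;
    }

    public static async Task Main()
    {
        Console.WriteLine("Without waiting: " + MeasureWithoutWaiting(100));
        Console.WriteLine("With waiting:    " + await MeasureWithWaitingAsync(100));
    }
}
```

The first number only reflects how long it takes to start the tasks; the second includes the simulated delay. In a synchronous `Main`, calling `Task.WhenAll(tasks).Wait()` would have the same effect as the `await`.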

in fact, you’re using a bit more, since there’s overhead incurred to scheduling something
Even if you were correctly awaiting the tasks, as SLaks suggested, it would be near impossible to accurately measure this overhead.
Your test is downloading a webpage, which requires network access.
The overhead you're trying to measure is so much smaller than the variance in network latency that it would be lost in the noise.

Related

Speed up multiple API calls

So I did this project in uni that I am trying to refactor. One of the problems I am having is my method for getting the top list, which consists of around 250 movies, i.e. 250 API calls. After that I render them all on my web page. The API I am using is OMDBAPI, and I am getting every movie individually, as you can see in the code below.
Basically, what the web page does is load 10 movies by default, but I can also load all of the movies, which is around 250.
I am trying to wrap my head around asynchronous programming. It takes around 4-6 seconds to run this method according to a Stopwatch in C#, but I believe it should be possible to refactor and refine it. I am new to asynchronous programming, and I have tried looking at the MSFT documentation and several earlier questions here on SO, but I am not getting anywhere with speeding up the calls.
I have looked at using Parallel for this, but I think my problem should be solvable with async?
Using a Stopwatch, I have pinpointed the delay as coming mostly from the code between the two //x comments.
First and foremost I would like to speed up the calls, but I would love tips on best practices for async programming as well.
public async Task<List<HomeTopListMovieDTO>> GetTopListAggregatedData(Parameter parameter)
{
List<Task<HomeTopListMovieDTO>> tasks = new List<Task<HomeTopListMovieDTO>>();
var toplist = await GetToplist(parameter);
//x
foreach (var movie in toplist)
{
tasks.Add(GetTopListMovieDetails(movie.ImdbID));
}
var results = Task.WhenAll(tasks);
//x
var tempToplist = toplist.ToArray();
for (int i = 0; i < tasks.Count; i++)
{
tasks[i].Result.NumberOfLikes = tempToplist[i].NumberOfLikes;
tasks[i].Result.NumberOfDislikes = tempToplist[i].NumberOfDislikes;
}
List<HomeTopListMovieDTO> toplistMovies = results.Result.ToList();
return toplistMovies;
}
public async Task<HomeTopListMovieDTO> GetTopListMovieDetails(string imdbId)
{
string urlString = baseUrl + "i=" + imdbId + accessKey;
return await apiWebClient.GetAsync<HomeTopListMovieDTO>(urlString);
}
public async Task<T> GetAsync<T>(string urlString)
{
using (HttpClient client = new HttpClient())
{
var response = await client.GetAsync(urlString,
HttpCompletionOption.ResponseHeadersRead);
response.EnsureSuccessStatusCode();
var data = await response.Content.ReadAsStringAsync();
var result = JsonConvert.DeserializeObject<T>(data);
return result;
}
}
Your async code looks OK. I would throttle it so that it makes no more than X parallel requests, using Partitioner / Parallel.ForEach instead, but the approach with WhenAll is also good enough unless you see connections refused because of port exhaustion or the API's DDoS protection.
You should reuse HttpClient; see more details in
https://www.aspnetmonsters.com/2016/08/2016-08-27-httpclientwrong, so in your case create the HttpClient in the root method and pass it as a parameter to your async methods. HttpClient is thread-safe and can be used for parallel calls.
You should dispose the HttpResponseMessage.
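A minimal sketch of both points, under some assumptions: `CannedHandler` is a hypothetical handler that answers every request locally so the example runs without network access (in real code you would simply use `new HttpClient()`), and `SharedClientDemo`, `GetBodyAsync`, and `GetAllAsync` are illustrative names, not part of the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical handler that answers every request locally, so no network is needed.
public sealed class CannedHandler : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent("body for " + request.RequestUri)
        };
        return Task.FromResult(response);
    }
}

public static class SharedClientDemo
{
    // One HttpClient shared by all requests instead of one per call.
    private static readonly HttpClient Client = new HttpClient(new CannedHandler());

    public static async Task<string> GetBodyAsync(string url)
    {
        // The using block disposes the response once the body has been read.
        using (var response = await Client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }

    public static Task<string[]> GetAllAsync(IEnumerable<string> urls)
    {
        // Start every request first, then wait for all of them together.
        return Task.WhenAll(urls.Select(url => GetBodyAsync(url)));
    }
}
```

The results come back in the same order as the input URLs, because `Task.WhenAll` preserves the order of the tasks it was given.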

What is wrong with my Code (SendPingAsync)

I'm writing a C# ping application.
I started with a synchronous ping method, but I figured out that pinging several servers with one click takes more and more time.
So I decided to try the asynchronous method.
Can someone help me out?
public async Task<string> CustomPing(string ip, int amountOfPackets, int sizeOfPackets)
{
// Timeout per ping, in milliseconds
int timeout = 2000;
// Payload of sizeOfPackets bytes
byte[] buffer = Encoding.ASCII.GetBytes(new string('b', sizeOfPackets));
// Sum of the round-trip times
long ms = 0;
// Send the pings one after another and add up the round-trip times
using (Ping ping = new Ping())
{
for (int i = 0; i < amountOfPackets; i++)
{
PingReply reply = await ping.SendPingAsync(ip, timeout, buffer);
ms += reply.RoundtripTime;
}
}
// Average round-trip time
return (ms / amountOfPackets + " ms");
}
I defined a "Server"-Class (Ip or host, City, Country).
Then I create a "server"-List:
List<Server> ServerList = new List<Server>()
{
new Server("www.google.de", "Some City,", "Some Country")
};
Then I loop through this list and I try to call the method like this:
foreach (var server in ServerList)
ListBox.Items.Add("The average response time of your custom server is: " + server.CustomPing(server.IP, amountOfPackets, sizeOfPackets));
Unfortunately, this is much more complicated than the synchronous method, and at the point where my method should return the value, it returns
System.Threading.Tasks.Task`1[System.String]
Since you have an async method, it returns the task when it is called like this:
Task<string> task = server.CustomPing(server.IP, amountOfPackets, sizeOfPackets);
When you add it directly to your ListBox while concatenating it with a string, it uses the ToString method, which by default prints the full class name of the object. This should explain your output:
System.Threading.Tasks.Task`1[System.String]
The [System.String] part actually tells you the return type of the task result. This is what you want, and to get it you need to await the task, like this:
foreach (var server in ServerList)
ListBox.Items.Add("The average response time of your custom server is: " + await server.CustomPing(server.IP, amountOfPackets, sizeOfPackets));
1) this has to be done in another async method, and
2) this will ruin all the parallelism that you are aiming for, because it waits for each method call to finish before starting the next one.
What you can do is to start all tasks one after the other, collect the returning tasks and wait for all of them to finish. Preferably you would do this in an async method like a clickhandler:
private async void Button1_Click(object sender, EventArgs e)
{
Task<string> [] allTasks = ServerList.Select(server => server.CustomPing(server.IP, amountOfPackets, sizeOfPackets)).ToArray();
// WhenAll will wait for all tasks to finish and return the return values of each method call
string [] results = await Task.WhenAll(allTasks);
// now you can execute your loop and display the results:
foreach (var result in results)
{
ListBox.Items.Add(result);
}
}
The class System.Threading.Tasks.Task<TResult> is a helper class for multitasking. While it resides in the Threading namespace, it works for threadless multitasking just as well. Indeed, if you see a function return a Task, you can usually use it for any form of multitasking. Tasks are very agnostic about how they are used. You can even run one synchronously, if you do not mind the small extra overhead of a Task that does not do much.
Task helps with some of the most important rules/conventions of multitasking:
Do not accidentally swallow exceptions. Thread-based multitasking is notoriously good at doing just that.
Do not use the result after a cancellation.
It does that by throwing exceptions in your face (usually AggregateException) if you try to access the Result property when convention says you should not.
It also has all those other useful properties for multitasking.
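A small sketch of the exception behaviour described above. `FailAsync` is a hypothetical operation that always throws, and `TaskExceptionDemo`, `ViaResult`, and `ViaAwaitAsync` are names invented for this example. Blocking on `Result` is done here only for illustration in a console context; in UI code you would await instead.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskExceptionDemo
{
    // Hypothetical operation that always fails after yielding once.
    public static async Task<int> FailAsync()
    {
        await Task.Yield();
        throw new InvalidOperationException("ping failed");
    }

    // Reading Result on a faulted task throws AggregateException wrapping the original.
    public static string ViaResult()
    {
        try
        {
            return FailAsync().Result.ToString();
        }
        catch (AggregateException ex)
        {
            return "Result: " + ex.InnerException.GetType().Name;
        }
    }

    // await rethrows the original exception directly.
    public static async Task<string> ViaAwaitAsync()
    {
        try
        {
            return (await FailAsync()).ToString();
        }
        catch (InvalidOperationException ex)
        {
            return "await: " + ex.GetType().Name;
        }
    }
}
```

Either way the failure is not silently lost: it surfaces at the point where the result is consumed.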

Parallel request to scrape multiple pages of a website

I want to scrape a website with plenty of pages of interesting data, but as the source is very large I want to multithread and limit the load.
I use a Parallel.ForEach to start each chunk of 10 tasks, and I wait in the main for loop until the number of active threads drops below a threshold. For that I use a counter of active threads that I increment when starting a new thread with a WebClient and decrement when the DownloadStringCompleted event of the WebClient is triggered.
Originally the question was how to use DownloadStringTaskAsync instead of DownloadString and wait until each of the threads started in the Parallel.ForEach has completed. This has been solved with a workaround:
a counter (activeThreads) and a Thread.Sleep in the main for loop.
Is using await DownloadStringTaskAsync instead of DownloadString supposed to improve the speed at all, by freeing a thread while waiting for the DownloadString data to arrive?
And to get back to the original question, is there a way to do this more elegantly using the TPL, without the workaround involving a counter?
private static volatile int activeThreads = 0;
public static void RecordData()
{
var groupSize = 10; // number of URLs per chunk
var source = db.ListOfUrls; // Thousands urls
var iterations = source.Length / groupSize;
for (int i = 0; i < iterations; i++)
{
var subList = source.Skip(groupSize* i).Take(groupSize);
Parallel.ForEach(subList, (item) => RecordUri(item));
//I want to wait here until process further data to avoid overload
while (activeThreads > 30) Thread.Sleep(100);
}
}
private static async Task RecordUri(Uri uri)
{
using (WebClient wc = new WebClient())
{
Interlocked.Increment(ref activeThreads);
wc.DownloadStringCompleted += (sender, e) => Interlocked.Decrement(ref activeThreads);
var jsonData = await wc.DownloadStringTaskAsync(uri);
var root = JsonConvert.DeserializeObject<RootObject>(jsonData);
RecordData(root);
}
}
If you want an elegant solution you should use Microsoft's Reactive Framework. It's dead simple:
var source = db.ListOfUrls; // Thousands urls
var query =
from uri in source.ToObservable()
from jsonData in Observable.Using(
() => new WebClient(),
wc => Observable.FromAsync(() => wc.DownloadStringTaskAsync(uri)))
select new { uri, json = JsonConvert.DeserializeObject<RootObject>(jsonData) };
IDisposable subscription =
query.Subscribe(x =>
{
/* Do something with x.uri && x.json */
});
That's the entire code. It's nicely multi-threaded and it's kept under control.
Just NuGet "System.Reactive" to get the bits.
Parallel.ForEach
Creates about ProcessorCount tasks to execute the function for each item in the source enumerable. It takes care that there are not too many tasks and waits until all items and tasks have been processed.
Task.WhenAll
Only awaits the given tasks; it does not execute them. It is up to you to start them properly and not too many at once.
But there is a fault in your code. The function RecordUri returns a task that has to be awaited; otherwise the ForEach will just create more and more, as it never knows when the current task has completed. Also problematic is that you create a task within a task, and the outer task does nothing but wait for the inner one.
You might also want to take a look at this overload of Parallel.ForEach
https://msdn.microsoft.com/en-us/library/dd782934(v=vs.110).aspx
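One way to actually await those tasks while still working through the URLs in groups is sketched below. It is an assumption-laden illustration, not the question's code: `RecordAsync` is a hypothetical stand-in for `RecordUri` (it only waits 50 ms and tracks how many calls are in flight), and `ChunkedDemo`, `ProcessInChunksAsync`, and `MaxActive` are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ChunkedDemo
{
    private static int _active;   // operations currently in flight
    public static int MaxActive;  // highest value _active has reached

    // Hypothetical stand-in for RecordUri: tracks concurrency and waits briefly.
    public static async Task RecordAsync(string item)
    {
        int now = Interlocked.Increment(ref _active);
        int seen;
        // Raise MaxActive to 'now' if it is currently lower.
        while (now > (seen = Volatile.Read(ref MaxActive)))
            Interlocked.CompareExchange(ref MaxActive, now, seen);
        await Task.Delay(50);
        Interlocked.Decrement(ref _active);
    }

    // Starts one chunk at a time and awaits the whole chunk before starting the next.
    public static async Task ProcessInChunksAsync(IReadOnlyList<string> source, int groupSize)
    {
        for (int i = 0; i < source.Count; i += groupSize)
        {
            var chunk = source.Skip(i).Take(groupSize);
            await Task.WhenAll(chunk.Select(item => RecordAsync(item)));
        }
    }
}
```

Because each chunk is awaited as a whole, no more than `groupSize` operations run at once, and no separate counter or `Thread.Sleep` loop is needed. Stepping by `groupSize` also covers a final partial chunk.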
Edit
Is using await DownloadStringTaskAsync instead of DownloadString supposed to improve at all the speed by freeing a thread while waiting for the DownloadString data to arrive ?
No. When a task is awaiting an external resource it enters a suspended state (a Windows API that does not rely on old-style busy waiting), so there is not much difference.
What differs is the overhead the compiler generates when compiling your async code. DownloadStringTaskAsync creates a task that contains the long operation. If you await it, you attach yourself to that task (as with ContinueWith), so you just create one Task to await another. This is the overhead I was talking about above.
My approach would be: use the synchronous method inside your Parallel.ForEach. The threading will be handled by the TPL and you are free to go on.
Remember "KISS"

How to specify the number of parallel tasks executed in Parallel.ForEach? [duplicate]

This question already has answers here:
Keep running a specific number of tasks
(2 answers)
Have a set of Tasks with only X running at a time
(5 answers)
Closed 9 years ago.
I have ~500 tasks, each of them takes ~5 seconds where most of the time is wasted on waiting for the remote resource to reply. I would like to define the number of threads that should be spawned myself (after some testing) and run the tasks on those threads. When one task finishes I would like to spawn another task on the thread that became available.
I found System.Threading.Tasks the easiest to achieve what I want, but I think it is impossible to specify the number of tasks that should be executed in parallel. For my machine it's always around 8 (quad core cpu). Is it possible to somehow tell how many tasks should be executed in parallel? If not what would be the easiest way to achieve what I want? (I tried with threads, but the code is much more complex). I tried increasing MaxDegreeOfParallelism parameter, but it only limits the maximum number, so no luck here...
This is the code that I have currently:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ConsoleApplication1
{
class Program
{
private static List<string> _list = new List<string>();
private static int _toProcess = 0;
static void Main(string[] args)
{
for (int i = 0; i < 1000; ++i)
{
_list.Add("parameter" + i);
}
var w = new Worker();
var w2 = new StringAnalyzer();
Parallel.ForEach(_list, new ParallelOptions() { MaxDegreeOfParallelism = 32 }, item =>
{
++_toProcess;
string data = w.DoWork(item);
w2.AnalyzeProcessedString(data);
});
Console.WriteLine("Finished");
Console.ReadKey();
}
static void Done(Task<string> t)
{
Console.WriteLine(t.Result);
--_toProcess;
}
}
class Worker
{
public string DoWork(string par)
{
// It's a long running but not CPU heavy task (downloading stuff from the internet)
System.Threading.Thread.Sleep(5000);
return par + " processed";
}
}
class StringAnalyzer
{
public void AnalyzeProcessedString(string data)
{
// Rather short, not CPU heavy
System.Threading.Thread.Sleep(1000);
Console.WriteLine(data + " and analyzed");
}
}
}
Assuming you can use native async methods like HttpClient.GetStringAsync while getting your resource,
int numTasks = 20;
SemaphoreSlim semaphore = new SemaphoreSlim(numTasks);
HttpClient client = new HttpClient();
List<string> result = new List<string>();
foreach(var url in urls)
{
semaphore.Wait();
client.GetStringAsync(url)
.ContinueWith(t => {
lock (result) result.Add(t.Result);
semaphore.Release();
});
}
for (int i = 0; i < numTasks; i++) semaphore.Wait();
Since GetStringAsync uses I/O completion ports internally (like most other async I/O methods) instead of creating new threads, this may be the solution you are after.
See also http://blog.stephencleary.com/2013/11/there-is-no-thread.html
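The same idea can also be written with async/await. The sketch below is a variation on the code above, not a replacement for it: it uses `SemaphoreSlim.WaitAsync` so that waiting for a free slot does not block a thread, and it releases the slot in a `finally` block so a failed request cannot leak it. `Throttle` and `RunAsync` are hypothetical names, and the `work` delegate stands in for something like `client.GetStringAsync(url)`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttle
{
    // Runs work(item) for every item, with at most maxConcurrent operations in flight.
    public static async Task<TResult[]> RunAsync<T, TResult>(
        IEnumerable<T> items, int maxConcurrent, Func<T, Task<TResult>> work)
    {
        using (var gate = new SemaphoreSlim(maxConcurrent))
        {
            var tasks = items.Select(async item =>
            {
                await gate.WaitAsync();      // waits for a slot without blocking a thread
                try
                {
                    return await work(item);
                }
                finally
                {
                    gate.Release();          // released even if work throws
                }
            }).ToArray();                    // ToArray starts every wrapper task
            return await Task.WhenAll(tasks);
        }
    }
}
```

A call might look like `await Throttle.RunAsync(urls, 20, url => client.GetStringAsync(url))`, assuming `urls` and `client` exist as in the snippet above.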
As L.B mentioned, the .NET Framework has methods that perform I/O operations (requests to databases, web services, etc.) using IOCP internally; they can be recognized by their names, which end with Async by convention. So you can just use them to build robust, scalable applications that process multiple requests simultaneously.
EDIT: I've completely rewritten the code example using modern best practices, so it is much more readable, shorter, and easier to use.
For .NET 4.5 we can use the async-await approach:
class Program
{
static void Main(string[] args)
{
var task = Worker.DoWorkAsync();
task.Wait(); //stop and wait until our async method completed
foreach (var item in task.Result)
{
Console.WriteLine(item);
}
Console.ReadLine();
}
}
static class Worker
{
public async static Task<IEnumerable<string>> DoWorkAsync()
{
List<string> results = new List<string>();
for (int i = 0; i < 10; i++)
{
var request = (HttpWebRequest)WebRequest.Create("http://microsoft.com");
using (var response = await request.GetResponseAsync())
{
results.Add(response.ContentType);
}
}
return results;
}
}
Here is the nice MSDN tutorial about async programming using async-await.

C# WebClient with Task.Run only achieve 5% network usage. WHY?

I am experimenting / learning the new Task library and I have written a very simple html downloader using WebClient and Task.Run. However I can never reach anything more than 5% on my network usage. I would like to understand why and how I can improve my code to reach 100% network usage / throughput (probably not possible but it has to be a lot more than 5%).
I would also like to be able to limit the number of threads; however, it seems it's not as easy as I thought (i.e. it needs a custom task scheduler). Is there a way to just do something like this to set the max thread count: something.SetMaxThread(2)?
internal static class Program
{
private static void Main()
{
for (var i = 0; i < 1000000; i++)
{
Go(i, Thread.CurrentThread.ManagedThreadId);
}
Console.Read();
}
private static readonly Action<int, int> Go = (counter, threadId) => Task.Run(() =>
{
var stopwatch = new Stopwatch();
stopwatch.Start();
var webClient = new WebClient();
webClient.DownloadString(new Uri("http://stackoverflow.com"));
stopwatch.Stop();
Console.Write("{0} == {1} | ", threadId.ToString("D3"), Thread.CurrentThread.ManagedThreadId.ToString("D3"));
Console.WriteLine("{0}: {1}ms ", counter.ToString("D3"), stopwatch.ElapsedMilliseconds.ToString("D4"));
});
}
This is the async version according to @spender. My understanding is that await will "remember" the current point in the method, hand the download off to the OS, skip the two Console writes for now, return to Main immediately, and continue scheduling the remaining Go calls in the for loop. Am I understanding it correctly? So there's no blocking on the UI.
private static async void Go(int counter, int threadId)
{
using (var webClient = new WebClient())
{
var stopWatch = new Stopwatch();
stopWatch.Start();
await webClient.DownloadStringTaskAsync(new Uri("http://ftp.iinet.net.au/test500MB.dat"));
stopWatch.Stop();
Console.Write("{0} == {1} | ", threadId.ToString("D3"), Thread.CurrentThread.ManagedThreadId.ToString("D3"));
Console.WriteLine("{0}: {1}ms ", counter.ToString("D3"), stopWatch.ElapsedMilliseconds.ToString("D4"));
}
}
What I noticed was that when I am downloading large files there's not that much difference in terms of download speed / network usage. Both the threading version and the async version peaked at about 12.5% network usage and about 12 MByte/sec download speed. I also tried running multiple instances (multiple .exe running), and again there's no huge difference between the two. And when I try to download large files from 2 URLs concurrently (20 instances) I get similar network usage (12.5%) and download speed (10-12 MByte/sec). I guess I am reaching the peak?
As it stands, your code is suboptimal because, although you are using Task.Run to create asynchronous code that runs in the ThreadPool, the code that is being run in the ThreadPool is still blocking on the line:
webClient.DownloadString(...
This amounts to an abuse of the ThreadPool because it is not designed to run blocking tasks, and is slow to spin up additional threads to deal with peaks in workload. This in turn will have a seriously degrading effect on the smooth running of any API that uses the ThreadPool (timers, async callbacks, they're everywhere), because they'll schedule work that goes to the back of the (saturated) queue for the ThreadPool (which is tied up reluctantly and slowly spinning up hundreds of threads that are going to spend 99.9% of their time doing nothing).
Stop blocking the ThreadPool and switch to proper async methods that do not block.
So now you can literally break your router and seriously upset the SO site admins with the following simple mod:
private static void Main()
{
for (var i = 0; i < 1000000; i++)
{
Go(i, Thread.CurrentThread.ManagedThreadId);
}
Console.Read();
}
private static async Task Go(int counter, int threadId)
{
var stopwatch = new Stopwatch();
stopwatch.Start();
using (var webClient = new WebClient())
{
await webClient.DownloadStringTaskAsync(
new Uri("http://stackoverflow.com"));
}
//...
}
HttpWebRequest (and therefore WebClient) is also constrained by a number of limits.
