C# - how to do multiple web requests at the same time

I wrote some code to check URLs, but it works really slowly. I want it to work on several URLs at the same time, for example 10, or at least make it as fast as possible.
My code:
Parallel.ForEach(urls, new ParallelOptions {
MaxDegreeOfParallelism = 10
}, s => {
try {
using(HttpRequest httpRequest = new HttpRequest()) {
httpRequest.UserAgent = "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0";
httpRequest.Cookies = new CookieDictionary(false);
httpRequest.ConnectTimeout = 10000;
httpRequest.ReadWriteTimeout = 10000;
httpRequest.KeepAlive = true;
httpRequest.IgnoreProtocolErrors = true;
string check = httpRequest.Get(s + "'", null).ToString();
if (errors.Any(new Func < string, bool > (check.Contains))) {
Valid.Add(s);
Console.WriteLine(s);
File.WriteAllLines(Environment.CurrentDirectory + "/Good.txt", Valid);
}
}
} catch {
}
});

It is unlikely that your service calls are CPU-bound, so spinning up more threads to handle the load is probably not the best approach. You will likely get better throughput if you use async and await instead, with the more modern HttpClient rather than HttpRequest or HttpWebRequest.
Here is an example of how to do it:
var client = new HttpClient();
//Start with a list of URLs
var urls = new string[]
{
"http://www.google.com",
"http://www.bing.com"
};
//Start requests for all of them
var requests = urls.Select
(
url => client.GetAsync(url)
).ToList();
//Wait for all the requests to finish
await Task.WhenAll(requests);
//Get the responses
var responses = requests.Select
(
task => task.Result
);
foreach (var r in responses)
{
// Extract the message body
var s = await r.Content.ReadAsStringAsync();
Console.WriteLine(s);
}
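The snippet above starts every request at once. If the goal is the question's "10 URLs at a time", one possible way to cap the number of requests in flight is a SemaphoreSlim gate around each call. This is a minimal sketch, not part of the original answer: ThrottledRequests, RunThrottledAsync and DownloadAllAsync are hypothetical names, and it assumes C# 8 or later for the using declaration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledRequests
{
    // Runs `work` for every item with at most `maxConcurrency` operations
    // in flight at any time. Results come back in the order of `items`.
    public static async Task<TResult[]> RunThrottledAsync<T, TResult>(
        IEnumerable<T> items, int maxConcurrency, Func<T, Task<TResult>> work)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);
        List<Task<TResult>> tasks = items.Select(async item =>
        {
            await gate.WaitAsync();          // wait for a free slot
            try { return await work(item); }
            finally { gate.Release(); }      // free the slot even on failure
        }).ToList();
        return await Task.WhenAll(tasks);
    }

    // Hypothetical usage: download every URL, 10 at a time, with one shared client.
    public static Task<string[]> DownloadAllAsync(HttpClient client, IEnumerable<string> urls)
        => RunThrottledAsync(urls, 10, url => client.GetStringAsync(url));
}
```

No extra threads are created here; the semaphore only delays when each asynchronous call is started. If any single request fails, Task.WhenAll rethrows after all requests have finished, so per-URL error handling would belong inside the `work` delegate.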

Try doing as below.
Parallel.ForEach(urls, new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount - 1 }
Capping the degree of parallelism at one less than the number of cores leaves one core free, so the rest of the machine stays responsive while the loop runs.
Also, consider @KSib's comment.

Related

Ping with multithreading

I did this:
WebClient client = new WebClient();
string[] dns = client.DownloadString("https://public-dns.info/nameservers.txt")
.Split('\n');
List<string> parsedDns = new List<string>();
foreach (string dnsStr in dns)
{
Ping ping = new Ping();
if (dnsStr.Contains(":"))
{
}
else if (ping.SendPingAsync(dnsStr, 150).Result.RoundtripTime <= 150)
{
parsedDns.Add(dnsStr);
}
}
foreach (var dns_ in parsedDns.ToArray())
{
Console.WriteLine(dns_);
}
Console.ReadKey();
What it does is collect the DNS servers listed on a page, put them in a string[], and then ping them one by one; those that respond in less than 150 ms are saved and printed to the console. I tried to do it with multiple threads, but it kept giving me errors. I would like to know how to do this with, for example, 500 threads, without any bugs, to speed up the process.
You could use the Parallel.ForEachAsync API, that was introduced in .NET 6.
var parsedDns = new ConcurrentQueue<string>();
var options = new ParallelOptions() { MaxDegreeOfParallelism = 10 };
Parallel.ForEachAsync(dns, options, async (dnsStr, ct) =>
{
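// Dispose each Ping, and count only replies that actually succeeded:
// a failed or timed-out reply reports a RoundtripTime of 0.
using Ping ping = new();
PingReply reply = await ping.SendPingAsync(dnsStr, 150);
if (reply.Status == IPStatus.Success && reply.RoundtripTime <= 150)
{
parsedDns.Enqueue(dnsStr);
}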
}).Wait();
The Parallel.ForEachAsync method returns a Task that you can either await, or simply Wait as in the above example.
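As a variation on the answer above, here is a sketch that awaits Parallel.ForEachAsync instead of blocking on Wait and tolerates hosts that cannot be pinged. It assumes .NET 6 or later. HostFilter, KeepResponsiveAsync and PingWithinAsync are hypothetical names introduced for illustration, and passing the probe in as a delegate is a design choice of this sketch, not something from the original answer.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Net.NetworkInformation;
using System.Threading.Tasks;

public static class HostFilter
{
    // Runs `probe` for each host, at most `maxParallel` at a time,
    // and returns the hosts for which the probe reported true.
    public static async Task<List<string>> KeepResponsiveAsync(
        IEnumerable<string> hosts, int maxParallel, Func<string, Task<bool>> probe)
    {
        var kept = new ConcurrentQueue<string>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        await Parallel.ForEachAsync(hosts, options, async (host, ct) =>
        {
            try
            {
                if (await probe(host)) kept.Enqueue(host);
            }
            catch (PingException)
            {
                // The host could not be pinged at all; treat it as unresponsive.
            }
        });
        return kept.ToList();
    }

    // True only when the ping succeeded within the timeout.
    public static async Task<bool> PingWithinAsync(string host, int timeoutMs)
    {
        using var ping = new Ping();
        PingReply reply = await ping.SendPingAsync(host, timeoutMs);
        return reply.Status == IPStatus.Success && reply.RoundtripTime <= timeoutMs;
    }
}
```

From an async method this could be called as `await HostFilter.KeepResponsiveAsync(hosts, 10, h => HostFilter.PingWithinAsync(h, 150))`, where `hosts` is the downloaded list with blank lines and entries containing ":" filtered out first, as the question's loop does for the latter. The order of the returned hosts is not guaranteed.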

Why is using tasks with HttpClient synchronously so much slower?

So I was trying to do a quick performance test against a web API to see how it would handle multiple synchronous HTTP requests at once. I did this by spinning up 30 tasks and having each of them send an HTTP request with the HttpClient. To my surprise, it was extremely slow. I thought it was due to the lack of async/await, or that the web API was slow, but it turns out it only happens when I'm using tasks with synchronous HTTP calls (see TestSynchronousWithParallelTasks() below).
So I compared running without tasks, async/await with tasks, and Parallel.ForEach using some simple tests. All of these finished quickly, in around 10-20 milliseconds, except the original case, which takes around 20 seconds!
Class: HttpClientTest    Passed (5)    19.2 sec
TestProject.HttpClientTest.TestAsyncWithParallelTasks    Passed    12 ms
TestProject.HttpClientTest.TestIterativeAndSynchronous    Passed    22 ms
TestProject.HttpClientTest.TestParallelForEach    Passed    15 ms
TestProject.HttpClientTest.TestSynchronousWithParallelTasks    Passed    19.1 sec
TestProject.HttpClientTest.TestSynchronousWithParallelThreads    Passed    10 ms
public class HttpClientTest
{
private HttpClient httpClient;
private readonly ITestOutputHelper _testOutputHelper;
public HttpClientTest(ITestOutputHelper testOutputHelper)
{
_testOutputHelper = testOutputHelper;
ServicePointManager.DefaultConnectionLimit = 100;
httpClient = new HttpClient(new HttpClientHandler { MaxConnectionsPerServer = 100 });
}
[Fact]
public async Task TestSynchronousWithParallelTasks()
{
var tasks = new List<Task>();
var url = "https://localhost:44388/api/values";
for (var i = 0; i < 30; i++)
{
var task = Task.Run(() =>
{
var response = httpClient.GetAsync(url).Result;
var content = response.Content.ReadAsStringAsync().Result;
});
tasks.Add(task);
}
await Task.WhenAll(tasks);
}
[Fact]
public void TestIterativeAndSynchronous()
{
var url = "https://localhost:44388/api/values";
for (var i = 0; i < 30; i++)
{
var response = httpClient.GetAsync(url).Result;
var content = response.Content.ReadAsStringAsync().Result;
}
}
[Fact]
public async Task TestAsyncWithParallelTasks()
{
var url = "https://localhost:44388/api/values";
var tasks = new List<Task>();
for (var i = 0; i < 30; i++)
{
var task = Task.Run(async () =>
{
var response = await httpClient.GetAsync(url);
var content = await response.Content.ReadAsStringAsync();
});
tasks.Add(task);
}
await Task.WhenAll(tasks);
}
[Fact]
public void TestParallelForEach()
{
var url = "https://localhost:44388/api/values";
var n = new int[30];
Parallel.ForEach(n, new ParallelOptions { MaxDegreeOfParallelism = 2 }, (i) =>
{
var response = httpClient.GetAsync(url).Result;
var content = response.Content.ReadAsStringAsync().Result;
});
}
[Fact]
public async Task TestSynchronousWithParallelThreads()
{
var tasks = new List<Task>();
var url = "https://localhost:44388/api/values";
var threads = new List<Thread>();
for (var i = 0; i < 30; i++)
{
var thread = new Thread( () =>
{
var response = httpClient.GetAsync(url).Result;
var content = response.Content.ReadAsStringAsync().Result;
});
thread.Start();
threads.Add(thread);
}
foreach(var thread in threads)
{
thread.Join();
}
}
}
So any idea what's causing this performance hit?
I would have expected TestSynchronousWithParallelTasks() to be faster than TestIterativeAndSynchronous(), since it starts more requests at once, even if the work is IO-bound, while the latter waits for each request before starting a new one. So it seems like the tasks are somehow blocking each other?
Edit: Added a test case to use threads instead and it's quick like the rest.

Getting latest app version from play store xamarin

How can I get the latest Android app version from the Google Play Store? I used to do so with the code below:
using (var webClient = new System.Net.WebClient())
{
var searchString = "itemprop=\"softwareVersion\">";
var endString = "</";
//possible network error if phone gets disconnected
string jsonString = webClient.DownloadString(PlayStoreUrl);
var pos = jsonString.IndexOf(searchString, StringComparison.InvariantCultureIgnoreCase) + searchString.Length;
var endPos = jsonString.IndexOf(endString, pos, StringComparison.Ordinal);
appStoreversion = Convert.ToDouble(jsonString.Substring(pos, endPos - pos).Trim());
System.Diagnostics.Debug.WriteLine($"{currentVersion} :: {appStoreversion}");
System.Diagnostics.Debug.WriteLine($"{appStoreversion > currentVersion}");
if ((appStoreversion.ToString() != currentVersion.ToString() && (appStoreversion > currentVersion)))
{
IsUpdateRequired = true;
}
}
and the code below even throws an exception:
var document =
Jsoup.Connect("https://play.google.com/store/apps/details?id=" + "com.spp.in.spp" + "&hl=en")
.Timeout(30000)
.UserAgent("Mozilla/5.0 (Windows; U; WindowsNT 5.1; en-US; rv1.8.1.6) Gecko/20070725 Firefox/2.0.0.6")
.Referrer("http://www.google.com")
.Get();
Exception:
Android.OS.NetworkOnMainThreadException: Exception of type
'Android.OS.NetworkOnMainThreadException' was thrown. at
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw ()
But the Play Store now seems to have changed a few things, so the existing functionality is broken. A few similar threads are already available here, but they seem to be outdated.
This will return a string-based version, at least until Google changes the html page contents again.
var version = await Task.Run(async () =>
{
var uri = new Uri($"https://play.google.com/store/apps/details?id={PackageName}&hl=en");
using (var client = new HttpClient())
using (var request = new HttpRequestMessage(HttpMethod.Get, uri))
{
request.Headers.TryAddWithoutValidation("Accept", "text/html");
request.Headers.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0");
request.Headers.TryAddWithoutValidation("Accept-Charset", "ISO-8859-1");
using (var response = await client.SendAsync(request).ConfigureAwait(false))
{
try
{
response.EnsureSuccessStatusCode();
var responseHTML = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
var rx = new Regex(@"(?<=""htlgb"">)(\d{1,3}\.\d{1,3}\.{0,1}\d{0,3})(?=<\/span>)", RegexOptions.Compiled);
MatchCollection matches = rx.Matches(responseHTML);
return matches.Count > 0 ? matches[0].Value : "Unknown";
}
catch
{
return "Error";
}
}
}
}
);
Console.WriteLine(version);
According to this link, this exception is thrown when an application attempts to perform a networking operation on its main thread. You may refer to this thread, which states that network operations on Android need to be performed off the main UI thread. The easiest way is to use a Task to push the work onto a thread in the default thread pool.
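Since the answer above returns the store version as a string, the question's update check (which converted the version to a double) needs a different comparison. One possible approach is System.Version, which compares each numeric component separately. This is a sketch under assumptions: VersionCheck and IsUpdateRequired are hypothetical names, and both inputs are assumed to be dotted numeric versions such as "1.9.3".

```csharp
using System;

public static class VersionCheck
{
    // Returns true only when both strings parse as versions and the
    // store version is strictly newer than the installed one.
    // Non-version strings such as "Unknown" or "Error" yield false.
    public static bool IsUpdateRequired(string storeVersion, string currentVersion)
    {
        return Version.TryParse(storeVersion, out var store)
            && Version.TryParse(currentVersion, out var current)
            && store > current;
    }
}
```

With this comparison "1.10.0" counts as newer than "1.9.3", which a numeric comparison of "1.10" and "1.9" would get wrong. One caveat: System.Version treats a missing component as lower than zero, so "1.2.0" compares as newer than "1.2"; using the same number of components on both sides avoids surprises.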

Processing large number of tasks concurrently and asynchronously

I would like to process a list of 50,000 URLs through a web service. The provider of this service allows 5 connections per second.
I need to process these URLs in parallel while adhering to the provider's rules.
This is my current code:
static void Main(string[] args)
{
process_urls().GetAwaiter().GetResult();
}
public static async Task process_urls()
{
// let's say there is a list of 50,000+ URLs
var urls = System.IO.File.ReadAllLines("urls.txt");
var allTasks = new List<Task>();
var throttler = new SemaphoreSlim(initialCount: 5);
foreach (var url in urls)
{
await throttler.WaitAsync();
allTasks.Add(
Task.Run(async () =>
{
try
{
Console.WriteLine(String.Format("Starting {0}", url));
var client = new HttpClient();
var xml = await client.GetStringAsync(url);
//do some processing on xml output
client.Dispose();
}
finally
{
throttler.Release();
}
}));
}
await Task.WhenAll(allTasks);
}
Instead of var client = new HttpClient(); I will create a new object of the target web service; this is just to make the code generic.
Is this the correct approach to handle and process a huge list of connections? And is there any way I can limit the number of established connections to 5 per second, given that the current implementation does not consider any time frame?
Thanks
Reading values from a web service is an IO operation, which can be done asynchronously without multithreading.
In this case the threads do nothing but wait for a response, so using parallel threads just wastes resources.
public static async Task process_urls()
{
var urls = System.IO.File.ReadAllLines("urls.txt");
var allTasks = new List<Task>();
var throttler = new SemaphoreSlim(initialCount: 5);
foreach (var urlGroup in SplitToGroupsOfFive(urls))
{
var tasks = new List<Task>();
foreach(var url in urlGroup)
{
var task = ProcessUrl(url);
tasks.Add(task);
}
// This delay makes sure that the next 5 URLs are started only after 1 second has passed
tasks.Add(Task.Delay(1000));
await Task.WhenAll(tasks.ToArray());
}
}
private static async Task ProcessUrl(string url)
{
using (var client = new HttpClient())
{
var xml = await client.GetStringAsync(url);
//do some processing on xml output
}
}
private static IEnumerable<IEnumerable<string>> SplitToGroupsOfFive(IEnumerable<string> urls)
{
const int GROUP_SIZE = 5;
string[] group = null;
int count = 0;
foreach (var url in urls)
{
if (group == null)
group = new string[GROUP_SIZE];
group[count] = url;
count++;
if (count < GROUP_SIZE)
continue;
yield return group;
group = null;
count = 0;
}
// Return the final, partially filled group without its unused (null) slots
if (group != null && count > 0)
{
yield return group.Take(count);
}
}
Because you mention that the "processing" of the response is also an IO operation, the async/await approach is the most efficient: it does not block a thread while waiting, so other tasks can make progress while earlier ones wait for a response from the web service or for a file write to complete.
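The batching approach above waits for the slowest request in each group before starting the next five. An alternative sketch, not taken from the answer, holds each of the 5 semaphore slots for at least one second, so a new request starts as soon as a slot is both finished and at least one second old. RateLimiter and RunAsync are hypothetical names, the window is a parameter so it can be shortened for testing, and timer precision means the limit is approximate rather than exact.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class RateLimiter
{
    // Starts `work` for each item, allowing at most `maxPerWindow` starts
    // per `window`: a slot is released only after its work has finished
    // AND the window has elapsed since that work started.
    public static async Task RunAsync<T>(
        IEnumerable<T> items, int maxPerWindow, TimeSpan window, Func<T, Task> work)
    {
        using var slots = new SemaphoreSlim(maxPerWindow);
        var running = new List<Task>();
        foreach (T item in items)
        {
            await slots.WaitAsync();         // wait until a slot is free
            running.Add(RunOneAsync(item));
        }
        await Task.WhenAll(running);

        async Task RunOneAsync(T item)
        {
            Task minimumHold = Task.Delay(window);
            try { await work(item); }
            finally
            {
                await minimumHold;           // keep the slot for the full window
                slots.Release();
            }
        }
    }
}
```

For the question's limit this might be called as `await RateLimiter.RunAsync(urls, 5, TimeSpan.FromSeconds(1), ProcessUrl)`. It also caps the number of simultaneous requests at 5, which may or may not match what the provider means by "5 connections per second".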

How to use HttpClient PostAsync() with threadpool in C#?

I'm using the following code to post an image to a server.
var image = Image.FromFile(@"C:\Image.jpg");
Task<string> upload = Upload(image);
upload.Wait();
public static async Task<string> Upload(Image image)
{
var uriBuilder = new UriBuilder
{
Host = "somewhere.net",
Path = "/path/",
Port = 443,
Scheme = "https",
Query = "process=false"
};
using (var client = new HttpClient())
{
client.DefaultRequestHeaders.Add("locale", "en_US");
client.DefaultRequestHeaders.Add("country", "US");
var content = ConvertToHttpContent(image);
content.Headers.ContentType = MediaTypeHeaderValue.Parse("image/jpeg");
using (var mpcontent = new MultipartFormDataContent("--myFakeDividerText--")
{
{content, "fakeImage", "myFakeImageName.jpg"}
}
)
{
using (
var message = await client.PostAsync(uriBuilder.Uri, mpcontent))
{
var input = await message.Content.ReadAsStringAsync();
return "nothing for now";
}
}
}
}
I'd like to modify this code to run multiple threads. I've used "ThreadPool.QueueUserWorkItem" before and started to modify the code to leverage it.
private void UseThreadPool()
{
int minWorker, minIOC;
ThreadPool.GetMinThreads(out minWorker, out minIOC);
ThreadPool.SetMinThreads(1, minIOC);
int maxWorker, maxIOC;
ThreadPool.GetMaxThreads(out maxWorker, out maxIOC);
ThreadPool.SetMinThreads(4, maxIOC);
var events = new List<ManualResetEvent>();
foreach (var image in ImageCollection)
{
var resetEvent = new ManualResetEvent(false);
ThreadPool.QueueUserWorkItem(
arg =>
{
var img = Image.FromFile(image.getPath());
Task<string> upload = Upload(img);
upload.Wait();
resetEvent.Set();
});
events.Add(resetEvent);
if (events.Count <= 0) continue;
foreach (ManualResetEvent e in events) e.WaitOne();
}
}
The problem is that only one thread executes at a time due to the call to "upload.Wait()". So I'm still executing each thread in sequence. It's not clear to me how I can use PostAsync with a thread-pool.
How can I post images to a server using multiple threads by tweaking the code above? Is HttpClient PostAsync the best way to do this?
"I'd like to modify this code to run multiple threads."
Why? The thread pool should only be used for CPU-bound work (and I/O completions, of course).
You can do concurrency just fine with async:
var tasks = ImageCollection.Select(image =>
{
var img = Image.FromFile(image.getPath());
return Upload(img);
});
await Task.WhenAll(tasks);
Note that I removed your Wait. You should avoid using Wait or Result with async tasks; use await instead. Yes, this will cause async to grow through your code, and you should use async "all the way".
