I have an issue where I loop over about 31 web service URLs.
If I put a Thread.Sleep(1000) at the top of the method, it works perfectly, but if I remove it, only about 10 of the 31 requests succeed (sometimes fewer, sometimes more). How do I make it wait?
Code
foreach(var item in ss)
{
// call metadataApi(url, conn, name, alias) for each item
}
public static void metadataApi(string _url, string _connstring, string _spname, string _alias)
{
// Thread.Sleep(1000);
//Metadata creation - Table Creation
using (var httpClient = new HttpClient())
{
string url = _url;
using (HttpResponseMessage response = httpClient.GetAsync(url).GetAwaiter().GetResult())
using (HttpContent content = response.Content)
{
Console.WriteLine("CHECKING");
if (response.IsSuccessStatusCode)
{
Console.WriteLine("IS OK");
string json = content.ReadAsStringAsync().GetAwaiter().GetResult();
//Doing some stuff not relevant
}
}
}
}
How it can look
You should look to use async/await where you can, but you could try something like this:
// share a single instance so connections are pooled
public static HttpClient httpClient = new HttpClient();
public static void Main(string[] args)
{
    // build a list of tasks to wait on, then wait
    // (url, conn, name and alias stand for the values taken from each item)
    var tasks = ss.Select(item => metadataApi(url, conn, name, alias)).ToArray();
    Task.WaitAll(tasks);
}
public static async Task metadataApi(string _url, string _connstring, string _spname, string _alias)
{
    string url = _url;
    using (HttpResponseMessage response = await httpClient.GetAsync(url))
    {
        Console.WriteLine("CHECKING");
        if (response.IsSuccessStatusCode)
        {
            Console.WriteLine("IS OK");
            string json = await response.Content.ReadAsStringAsync();
            //Doing some stuff not relevant
        }
    }
}
One thing to note: this will try to run many requests in parallel. If you need to run them one after the other, you may want to write another async function that awaits each result individually, and call that from Main. Blocking on tasks (.Result, .Wait(), Task.WaitAll) is a bit of an antipattern (since C# 7.1 you can make Main itself async), but for your script it should be "ok". I'd still minimize it, which is why I wouldn't use .Result inside a loop.
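Both options (all requests in parallel, or one after the other) can be sketched with an async entry point. This is a minimal sketch assuming C# 9 top-level statements; FetchAsync is a hypothetical offline stand-in for httpClient.GetStringAsync(url), and the URLs are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Placeholder URLs; in the real program these come from your own list.
var urls = Enumerable.Range(1, 5).Select(i => $"https://example.com/api/{i}").ToList();

// Parallel: start every request, then await them all together.
// Task.WhenAll returns the results in the same order as the input.
string[] parallelResults = await Task.WhenAll(urls.Select(url => FetchAsync(url)));

// Sequential: await each request before starting the next one.
var sequentialResults = new List<string>();
foreach (var url in urls)
{
    sequentialResults.Add(await FetchAsync(url));
}

Console.WriteLine($"{parallelResults.Length} parallel, {sequentialResults.Count} sequential");

// Hypothetical stand-in for httpClient.GetStringAsync(url) so the sketch runs offline.
static async Task<string> FetchAsync(string url)
{
    await Task.Delay(50); // simulated network latency
    return $"payload from {url}";
}
```

The parallel version finishes in roughly the time of the slowest request; the sequential one takes roughly the sum of all of them, but only ever has one request in flight, which can matter if the server throttles concurrent callers.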
Related
I have a list of URLs (thousands) and I want to asynchronously get the page data from each URL as fast as possible, without putting extreme load on the CPU.
I have tried using threading but it still feels quite slow:
public static ConcurrentQueue<string> List = new ConcurrentQueue<string>(); //URL List (assume I added them already)
public static void Threading()
{
for(int i=0;i<100;i++) //100 threads
{
Thread thread = new Thread(new ThreadStart(Task));
thread.Start();
}
}
public static void Task()
{
while (!List.IsEmpty)
{
List.TryDequeue(out string URL);
//GET REQUEST HERE
}
}
Is there any better way to do this? I want to do this asynchronously but I can't figure out how to do it, and I don't want to sacrifice speed or CPU efficiency to do so.
Thanks :)
You should use Microsoft's Reactive Framework (aka Rx) - NuGet System.Reactive and add using System.Reactive.Linq; - then you can do this:
public static IObservable<(string url, string content)> GetAllUrls(List<string> urls) =>
Observable
.Using(
() => new HttpClient(),
hc =>
from url in urls.ToObservable()
from response in Observable.FromAsync(() => hc.GetAsync(url))
from content in Observable.FromAsync(() => response.Content.ReadAsStringAsync())
select (url, content));
That allows you to consume the results in a couple of ways.
You can process them as they get produced:
IDisposable subscription =
GetAllUrls(urlsx).Subscribe(x => Console.WriteLine(x.content));
Or you can get all of them produced and then await the full results:
(string url, string content)[] results = await GetAllUrls(urlsx).ToArray();
You are best off using HttpClient, which supports async Task-based requests.
Just store each task in a list and await the whole list. To prevent too many requests at once, wait for any single one to complete whenever the list is full, and remove the completed one from the list.
const int maxDegreeOfParallelism = 100;
static HttpClient _client = new HttpClient();
public static async Task GetAllUrls(List<string> urls)
{
    var tasks = new List<Task>(urls.Count);
    foreach (var url in urls)
    {
        if (tasks.Count == maxDegreeOfParallelism) // this prevents too many requests at once
            tasks.Remove(await Task.WhenAny(tasks));
        tasks.Add(GetUrl(url));
    }
    await Task.WhenAll(tasks);
}
private static async Task GetUrl(string url)
{
    using var response = await _client.GetAsync(url);
    // handle response here
    var responseStr = await response.Content.ReadAsStringAsync(); // whatever
    // do stuff etc
}
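A common alternative to the WhenAny loop (swapped in here, not the technique the answer above uses) is to start one task per URL and gate the actual request with a SemaphoreSlim. This is a sketch assuming C# 9 top-level statements; the URLs are placeholders, Task.Delay stands in for _client.GetStringAsync(url) so it runs offline, and the peak-concurrency counter exists only to show that the limit holds.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

const int maxDegreeOfParallelism = 4;
var urls = Enumerable.Range(1, 20).Select(i => $"https://example.com/page/{i}").ToList();

using var throttle = new SemaphoreSlim(maxDegreeOfParallelism);
var gate = new object();
int inFlight = 0, peakInFlight = 0;

// One task per URL; the semaphore lets at most maxDegreeOfParallelism
// of them past WaitAsync at any moment.
async Task<string> GetUrlThrottledAsync(string url)
{
    await throttle.WaitAsync();
    try
    {
        lock (gate) { inFlight++; peakInFlight = Math.Max(peakInFlight, inFlight); }
        await Task.Delay(20); // stand-in for _client.GetStringAsync(url)
        return $"payload from {url}";
    }
    finally
    {
        lock (gate) { inFlight--; }
        throttle.Release(); // always free the slot, even if the request throws
    }
}

string[] pages = await Task.WhenAll(urls.Select(url => GetUrlThrottledAsync(url)));
Console.WriteLine($"fetched {pages.Length} pages, peak concurrency {peakInFlight}");
```

One design difference: in the WhenAny loop, a task that faults is removed from the list without being awaited, so its exception is silently dropped. Here every task goes into Task.WhenAll, which rethrows the first failure when awaited.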
I'm coding a plug-in for Excel. I'd like to add a new method to Excel that can crawl a web page and return its HTML code.
My problem is that I have a lot of URLs to process, and if I use a synchronous method it will take a lot of time and freeze Excel.
Let's say I have a cell A1 which contains "http://www.google.com", and in A2 my method "=downloadHtml(A1)".
I'm using HttpClient because it already handles async. So here is my code:
static void Main()
{
GetWebPage(new Uri("http://www.google.com"));
}
static async void GetWebPage(Uri URI)
{
string html = await HttpGetAsync(URI);
//Do other operations with html code
Console.WriteLine(html);
}
static async Task<string> HttpGetAsync(Uri URI)
{
try
{
HttpClient hc = new HttpClient();
Task<Stream> result = hc.GetStreamAsync(URI);
Stream vs = await result;
StreamReader am = new StreamReader(vs);
return await am.ReadToEndAsync();
}
catch (WebException ex)
{
switch (ex.Status)
{
case WebExceptionStatus.NameResolutionFailure:
Console.WriteLine("domain_not_found");
break;
//Catch other exceptions here
}
}
return "";
}
The problem is that when I run the program, it exits before the task completes.
If I add a
Console.ReadLine();
the program does not exit, due to the ReadLine instruction, and after a couple of seconds I see the HTML printed on my screen (due to the Console.WriteLine instruction). So the program works.
How can I handle this?
GetWebPage is a fire-and-forget method (an async void), so you cannot wait for it to finish: Main returns as soon as GetWebPage reaches its first await, and the process exits.
You should be using this instead:
static void Main()
{
string html = Task.Run(() => HttpGetAsync(new Uri("http://www.google.com"))).GetAwaiter().GetResult();
//Do other operations with html code
Console.WriteLine(html);
}
Also, you could simplify the download code to this:
using (var hc = new HttpClient())
{
    return await hc.GetStringAsync(URI);
}
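Since C# 7.1, Main itself can be async, which avoids both the Task.Run wrapper and the blocking GetResult call. A minimal sketch assuming C# 9 top-level statements (which make the entry point async automatically); GetWebPageAsync is GetWebPage changed to return Task<string> instead of void, and HttpGetAsync here is a hypothetical offline stand-in that fakes the download with Task.Delay rather than calling HttpClient.

```csharp
using System;
using System.Threading.Tasks;

// With top-level statements the entry point is already async,
// so it can await the work and the process stays alive until it finishes.
string html = await GetWebPageAsync(new Uri("http://www.google.com"));
Console.WriteLine(html);

// Returns Task<string> instead of void, so callers can await it.
static async Task<string> GetWebPageAsync(Uri uri)
{
    string page = await HttpGetAsync(uri);
    // Do other operations with the html code here
    return page;
}

// Hypothetical offline stand-in for new HttpClient().GetStringAsync(uri).
static async Task<string> HttpGetAsync(Uri uri)
{
    await Task.Delay(100); // simulated download time
    return $"<html><!-- fetched {uri} --></html>";
}
```

Because nothing in the chain returns void, every caller can await the work, so the process cannot exit before the HTML arrives.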
I have a WinForms application that has two roles. If no command line parameters are present, the Main function calls Application.Run, and presents the UI. If command line parameters are present, Application.Run is NOT called. Instead, I call an async method like this:
result = HandleCommandLine(args).GetAwaiter().GetResult();
(I am new to async/await, and this form was based on a SO answer).
The end goal is to loop through a list, and for each entry, start a new task. Each of those tasks should run in parallel with the others. The tasks are started like this:
runningTasks.Add(Task.Factory.StartNew((args) => HandlePlayback( (Dictionary<string,string>) ((object[])args)[0]), new object[] { runArgs } ));
The tasks are added to the collection of runningTasks, and I later call:
Task.WaitAll(runningTasks.ToArray());
In each of the runningTasks, I am trying to send web requests using HttpClient:
using (HttpResponseMessage response = await Client.SendAsync(message))
{
using (HttpContent responseContent = response.Content)
{
result = await responseContent.ReadAsStringAsync();
}
}
Once Client.SendAsync is called, the whole thing goes belly up. All of my runningTasks complete, and the application exits. Nothing past the Client.SendAsync executes in any of those tasks.
Since I am new at async/await, I have very few ideas about what exactly might be wrong, and hence few ideas about how to fix it. I imagine it has something to do with the SynchronizationContexts in this situation (WinForms app acting like a console app), but I'm not grasping what I need to do and where to keep the service request and the web request async calls from causing everything to complete too early.
I guess my question then is, why are (only some) 'awaited' calls causing all tasks to complete? What can I do about it?
UPDATE:
Two things. @Joe White: The WindowsFormsSynchronizationContext.Current is always null wherever I check.
@David Pine: Minimal (kind of :) ) complete viable example follows. You will either need to add a command line argument to the project, or force execution to the HandleCommandLine function. In this example, it tries to make a website request for each of three sites. It doesn't appear to matter if they exist. The code reaches the Client.SendAsync some number of times (usually not three), but timing appears to matter.
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
namespace WindowsFormsApplication1
{
static class Program
{
static List<Task> runningTasks = new List<Task>();
[STAThread]
static int Main()
{
int result = 1; // true, optimism
Application.EnableVisualStyles();
Application.SetCompatibleTextRenderingDefault(false);
string[] args = Environment.GetCommandLineArgs();
if (args.Length > 1)
{
// do the command line work async, while keeping this thread active
result = HandleCommandLine(args).GetAwaiter().GetResult();
}
else
{
// normal interface mode
Application.Run(new Form1());
}
return result;
}
static async Task<int> HandleCommandLine(string[] args)
{
// headless mode
int result = 1; // true, optimism
result = await HandleControlMode(args);
return result;
}
private static async Task<int> HandleControlMode(string[] Arguments)
{
int result = 1; // optimism
try
{
List<string> sites = new List<string>() { @"http://localhost/site1", @"http://localhost/site2", @"http://localhost/site3" };
foreach (string site in sites)
{
Begin(site); // fire off tasks
// the HandleControlMode method is async because in other circumstances, I do the following:
//await Task.Delay(5000); // sleep 5 seconds
}
// wait while all test running threads complete
try
{
Task.WaitAll(runningTasks.ToArray());
}
catch (Exception)
{
// not really a catch all handler...
}
}
catch (Exception)
{
// not really a catch all handler...
}
return result;
}
private static void Begin(string site)
{
//runningTasks.Add(Task.Factory.StartNew(() => HandlePlayback(runArgs)));
runningTasks.Add(Task.Factory.StartNew((args) => HandlePlayback((string)((object[])args)[0]), new object[] { site }));
}
private static async Task<int> HandlePlayback(string site)
{
int result = 1;
try
{
PlaybackEngine engine = new PlaybackEngine(site);
bool runResult = await engine.RunCommandLine(site);
if (!runResult)
{
result = 0;
}
}
catch (Exception)
{
result = 0;
}
return result;
}
}
public class PlaybackEngine
{
private static HttpClientHandler ClientHandler = new HttpClientHandler()
{
AllowAutoRedirect = false,
AutomaticDecompression = System.Net.DecompressionMethods.GZip | DecompressionMethods.Deflate
};
private static HttpClient Client = new HttpClient(ClientHandler);
public string Target { get; set; }
public PlaybackEngine(string target)
{
Target = target;
}
public async Task<bool> RunCommandLine(string site)
{
bool success = true;
string response = await this.SendRequest();
return success;
}
private async Task<string> SendRequest()
{
string result = string.Empty;
string requestTarget = Target;
HttpMethod method = HttpMethod.Post;
var message = new HttpRequestMessage(method, requestTarget);
StringContent requestContent = null;
requestContent = new StringContent("dummycontent", Encoding.UTF8, "application/x-www-form-urlencoded");
message.Content = requestContent;
try
{
using (HttpResponseMessage response = await Client.SendAsync(message))
{
using (HttpContent responseContent = response.Content)
{
result = await responseContent.ReadAsStringAsync();
System.Diagnostics.Debug.WriteLine(result);
}
}
}
catch (Exception ex)
{
}
return result;
}
}
}
UPDATE2:
I put similar code online at http://rextester.com/CJS33330
It's a straight console app, and I've added .ConfigureAwait(false) to all awaits (with no effect). In separate testing, I tried 4 or 5 other ways to call the first async function from Main - which all worked but had the same behavior.
The problem with this code is that I am not waiting on the Tasks that I thought I was. The runningTasks collection accepts any kind of Task, and I didn't realize that Task.Factory.StartNew returns a different type than the Task I was trying to start. My function returns
Task<int>
but StartNew returns
Task<Task<int>>
Those outer tasks complete as soon as the delegate returns the inner task (when it reaches its first await), so the main thread did not stay alive long enough for the actual routines to run. You have to wait on the inner task instead:
Task<Task<int>> wrappedTask = Task.Factory.StartNew(...);
Task<int> t = await wrappedTask;
runningTasks.Add(t);
...
Task allTasks = Task.WhenAll(runningTasks.ToArray());
await allTasks;
For some reason, I was not able to use the built-in Unwrap() extension method, which should be equivalent, but the above code does the job.
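The difference between StartNew and the alternatives shows up in a small offline sketch, assuming C# 9 top-level statements. HandlePlaybackAsync is a hypothetical stand-in for HandlePlayback that only delays; Unwrap() (an extension method on System.Threading.Tasks.TaskExtensions) and Task.Run both give back a task that tracks the inner work rather than the wrapper.

```csharp
using System;
using System.Threading.Tasks;

// StartNew with an async delegate yields Task<Task<int>>. The outer task
// completes as soon as the delegate returns the inner, still-running task.
Task<Task<int>> wrapped = Task.Factory.StartNew(() => HandlePlaybackAsync("http://localhost/site1"));
Task<int> inner = await wrapped;
Console.WriteLine($"outer done, inner done yet? {inner.IsCompleted}"); // typically False

// Unwrap() gives a proxy Task<int> that completes together with the inner task.
Task<int> unwrapped = Task.Factory.StartNew(() => HandlePlaybackAsync("http://localhost/site2")).Unwrap();

// Task.Run has overloads for async delegates and unwraps automatically.
Task<int> viaRun = Task.Run(() => HandlePlaybackAsync("http://localhost/site3"));

int[] results = await Task.WhenAll(inner, unwrapped, viaRun);
Console.WriteLine(string.Join(", ", results));

// Hypothetical stand-in for HandlePlayback: simulates an awaited HTTP call.
static async Task<int> HandlePlaybackAsync(string site)
{
    await Task.Delay(200);
    Console.WriteLine($"finished {site}");
    return 1;
}
```

Task.Run is generally the simpler choice for offloading an async method, since it already unwraps the returned task.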
I have made a class to handle multiple HTTP GET requests. It looks something like this:
public partial class MyHttpClass : IDisposable
{
private HttpClient theClient;
private string ApiBaseUrl = "https://example.com/";
public MyHttpClass()
{
this.theClient = new HttpClient();
this.theClient.BaseAddress = new Uri(ApiBaseUrl);
this.theClient.DefaultRequestHeaders.Accept.Clear();
this.theClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
}
public async Task<JObject> GetAsync(string reqUrl)
{
var returnObj = new JObject();
var response = await this.theClient.GetAsync(reqUrl);
if (response.IsSuccessStatusCode)
{
returnObj = await response.Content.ReadAsAsync<JObject>();
Console.WriteLine("GET successful");
}
else
{
Console.WriteLine("GET failed");
}
return returnObj;
}
public void Dispose()
{
theClient.Dispose();
}
}
I am then queueing multiple requests by using a loop over Task.Run() followed by Task.WaitAll(), in the manner of:
public async Task Start()
{
foreach(var item in list)
{
taskList.Add(Task.Run(() => this.GetThing(item)));
}
Task.WaitAll(taskList.ToArray());
}
public async Task GetThing(string url)
{
var response = await this.theClient.GetAsync(url);
// some code to process and save response
}
It definitely works faster than synchronous operation, but it is not as fast as I expected. Based on other advice, I think the local thread pool is slowing me down. MSDN suggests I should specify it as a long-running task, but I can't see a way to do that when calling it like this.
Right now I haven't got into limiting threads, I am just doing batches and testing speed to discover the right approach.
Can anyone suggest some areas for me to look at to increase the speed?
So, after you've set ServicePointManager.DefaultConnectionLimit to a nice high number, or just the ConnectionLimit of the ServicePoint that manages connections to the host you are hitting:
ServicePointManager
.FindServicePoint(new Uri("https://example.com/"))
.ConnectionLimit = 1000;
the only suspect bit of code is where you start everything...
public async Task Start()
{
foreach(var item in list)
{
taskList.Add(Task.Run(() => this.GetThing(item)));
}
Task.WaitAll(taskList.ToArray());
}
This can be reduced to
var tasks = list.Select(this.GetThing);
to create the tasks (your async methods return hot (running) tasks... no need to double wrap with Task.Run)
Then, rather than blocking while waiting for them to complete, wait asynchronously instead:
await Task.WhenAll(tasks);
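One detail worth knowing about list.Select(this.GetThing): Select is lazily evaluated, so the calls only start when the sequence is enumerated, and they start again every time it is enumerated. A small offline sketch, assuming C# 9 top-level statements, where a hypothetical GetThing just counts how often it is invoked:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

var list = new List<string> { "https://example.com/a", "https://example.com/b", "https://example.com/c" };
int started = 0;

// Hypothetical stand-in for GetThing: counts how many requests get started.
async Task GetThing(string url)
{
    started++; // runs synchronously before the first await, on the enumerating thread
    await Task.Delay(10); // simulated request
}

// Select is lazy: every enumeration calls GetThing again.
IEnumerable<Task> lazyTasks = list.Select(url => GetThing(url));
await Task.WhenAll(lazyTasks);
await Task.WhenAll(lazyTasks); // enumerates again, so three more "requests"
int lazyStarts = started;

// Materializing with ToList() creates each task (and request) exactly once.
started = 0;
List<Task> tasks = list.Select(url => GetThing(url)).ToList();
await Task.WhenAll(tasks);
await Task.WhenAll(tasks); // same tasks, nothing new is started
Console.WriteLine($"lazy: {lazyStarts} starts, materialized: {started} starts");
```

Running it prints "lazy: 6 starts, materialized: 3 starts". Calling ToList() (or ToArray()) right after the Select is a cheap way to make sure each request is issued exactly once.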
You are probably hitting some overhead from creating multiple instance-based HttpClients instead of using a static instance. Your implementation will not scale; using a shared HttpClient is actually recommended.
See my answer on why: What is the overhead of creating a new HttpClient per call in a WebAPI client?
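ServicePointManager is a .NET Framework mechanism. As far as I know, on .NET Core 2.1 and later HttpClient does not consult it; the per-host limit is the MaxConnectionsPerServer property of SocketsHttpHandler, which is not capped at 2 by default there, so raising it is rarely needed. A sketch of where the setting lives and of a single shared client, assuming .NET Core 2.1+; the limit of 100 and the two-minute lifetime are illustrative values, not recommendations.

```csharp
using System;
using System.Net.Http;

// On .NET Core 2.1+ HttpClient is backed by SocketsHttpHandler,
// which carries its own per-server connection limit.
var handler = new SocketsHttpHandler
{
    MaxConnectionsPerServer = 100, // illustrative cap, tune for the target host
    PooledConnectionLifetime = TimeSpan.FromMinutes(2) // recycle pooled connections periodically
};

// Create one client and share it (e.g. via a static field) for all requests.
var sharedClient = new HttpClient(handler) { BaseAddress = new Uri("https://example.com/") };

Console.WriteLine($"MaxConnectionsPerServer = {handler.MaxConnectionsPerServer}");
```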
I'm trying to make an async method that returns a value. Everything works when I use the method without a return value and process the data inside it, but the problem appears when the return clause is added: the program freezes completely, with no error, even after waiting a while.
Please see the code:
public void runTheAsync(){
string resp = sendRequest("http://google.com","x=y").Result;
}
public async Task<string> sendRequest(string url, string postdata)
{
//There is no problem if void is used as the return type; the problem appears when Task<string> is used: the program freezes completely.
Console.WriteLine("On the UI thread.");
string result = await TaskEx.Run(() =>
{
Console.WriteLine("Starting CPU-intensive work on background thread...");
string work = webRequest(url,postdata);
return work;
});
return result;
}
public string webRequest(string url, string postdata)
{
string _return = "";
WebClient client = new WebClient();
byte[] data = Encoding.UTF8.GetBytes(postdata);
Uri uri = new Uri(url);
_return = System.Text.Encoding.UTF8.GetString(client.UploadData(uri, "POST", data));
return _return;
}
string resp = sendRequest("http://google.com","x=y").Result;
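That's your problem. If you call Result on a Task, it blocks the calling thread until the Task finishes. Here the calling thread is the UI thread, and the await inside sendRequest captures the UI SynchronizationContext and wants to resume on that same blocked thread, so neither side can proceed: a deadlock.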
Instead, you can do this:
public async void runTheAsync()
{
string resp = await sendRequest("http://google.com","x=y");
}
But creating async void methods should be avoided. Whether you can actually avoid it depends on how you are calling it.
Try this (data correctness checks etc. are omitted, but you ignored them too :-)):
public async Task<string> UploadRequestAsync(string url, string postdata)
{
    using (var client = new WebClient())
    {
        // UploadDataTaskAsync is the awaitable counterpart of UploadData (.NET 4.5+)
        byte[] response = await client.UploadDataTaskAsync(new Uri(url), "POST", Encoding.UTF8.GetBytes(postdata));
        return Encoding.UTF8.GetString(response);
    }
}
You are somehow doing the work twice, awaiting an explicitly started task. I'd be curious to see what the generated code for this looks like... And of course, in production code use the proper classes from .NET 4.5.