System.Net.WebClient unreasonably slow - c#

When using the System.Net.WebClient.DownloadData() method I'm getting an unreasonably slow response time.
When fetching a URL using the WebClient class in .NET it takes around 10 seconds before I get a response, while the same page is fetched by my browser in under 1 second.
And this is with data that's 0.5kB or smaller in size.
The request involves POST/GET parameters and a user agent header if perhaps that could cause problems.
I haven't (yet) checked whether other ways of downloading data in .NET give me the same problem, but I suspect I might get similar results. (I've always had a feeling web requests in .NET are unusually slow...)
What could be the cause of this?
Edit:
I tried doing the exact thing using System.Net.HttpWebRequest instead, using the following method, and all requests finish in under 1 sec.
public static string DownloadText(string url)
{
var request = (HttpWebRequest)WebRequest.Create(url);
var response = (HttpWebResponse)request.GetResponse();
using (var reader = new StreamReader(response.GetResponseStream()))
{
return reader.ReadToEnd();
}
}
While this (old) method using System.Net.WebClient takes 15-30s for each request to finish:
public static string DownloadText(string url)
{
var client = new WebClient();
byte[] data = client.DownloadData(url);
return client.Encoding.GetString(data);
}

I had that problem with WebRequest. Try setting Proxy = null;
WebClient wc = new WebClient();
wc.Proxy = null;
By default, WebClient and WebRequest try to determine which proxy to use from the IE settings; this sometimes results in a delay of about 5 seconds before the actual request is sent.
This applies to all classes that use WebRequest, including WCF services with HTTP binding.
In general you can use this static code at application startup:
WebRequest.DefaultWebProxy = null;
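To check whether proxy auto-detection really is the culprit on a given machine, one option is to time the same request with and without the proxy lookup. This is only a minimal sketch: `ProxyTiming`, `CreateClient` and `TimeDownload` are made-up names for illustration, and the URL you pass in is up to you.

```csharp
using System;
using System.Diagnostics;
using System.Net;

public static class ProxyTiming
{
    // Hypothetical helper: builds a WebClient, optionally skipping proxy auto-detection.
    public static WebClient CreateClient(bool bypassProxy)
    {
        var client = new WebClient();
        if (bypassProxy)
        {
            // null means "no proxy": the request goes straight to the server.
            client.Proxy = null;
        }
        return client;
    }

    // Downloads the URL once and returns the elapsed wall-clock time.
    public static TimeSpan TimeDownload(string url, bool bypassProxy)
    {
        var watch = Stopwatch.StartNew();
        using (var client = CreateClient(bypassProxy))
        {
            client.DownloadData(url);
        }
        watch.Stop();
        return watch.Elapsed;
    }
}
```

Calling `ProxyTiming.TimeDownload(url, false)` and then `ProxyTiming.TimeDownload(url, true)` a few times each should show whether the multi-second delay goes away. The very first request in a process also pays for DNS and connection setup, so compare repeated runs rather than a single pair.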

Download Wireshark here http://www.wireshark.org/
Capture the network packets and filter the "http" packets.
It should give you the answer right away.

There is nothing inherently slow about .NET web requests; that code should be fine. I regularly use WebClient and it works very quickly.
How big is the payload in each direction? Silly question maybe, but is it simply bandwidth limitations?
IMO the most likely thing is that your web-site has spun down, and when you hit the URL the web-site is slow to respond. This is then not the fault of the client. It is also possible that DNS is slow for some reason (in which case you could hard-code the IP into your "hosts" file), or that some proxy server in the middle is slow.
If the web-site isn't yours, it is also possible that they are detecting atypical usage and deliberately injecting a delay to annoy scrapers.
I would grab Fiddler (a free, simple web inspector) and look at the timings.

WebClient may be slow on some workstations when Automatic Proxy Settings is checked in the IE settings (Connections tab - LAN Settings).

Setting WebRequest.DefaultWebProxy = null; or client.Proxy = null didn't do anything for me, using Xamarin on iOS.
I did two things to fix this:
I wrote a downloadString function which does not use WebRequest and System.Net:
public static async Task<string> FnDownloadStringWithoutWebRequest(string url)
{
using (var client = new HttpClient())
{
//Define Headers
client.DefaultRequestHeaders.Accept.Clear();
client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
var response = await client.GetAsync(url);
if (response.IsSuccessStatusCode)
{
string responseContent = await response.Content.ReadAsStringAsync();
//dynamic json = Newtonsoft.Json.JsonConvert.DeserializeObject(responseContent);
return responseContent;
}
Logger.DefaultLogger.LogError(LogLevel.NORMAL, "GoogleLoginManager.FnDownloadString", "error fetching string, code: " + response.StatusCode);
return "";
}
}
This is however still slow with Managed HttpClient.
So secondly, in Visual Studio Community for Mac, right click on your Project in the Solution -> Options -> set HttpClient implementation to NSUrlSession, instead of Managed.
Managed is not fully integrated into iOS, doesn't support TLS 1.2, and thus does not support the ATS standards set as default in iOS9+, see here:
https://learn.microsoft.com/en-us/xamarin/ios/app-fundamentals/ats
With both these changes, string downloads are always very fast (<<1s).
Without both of these changes, on every second or third try, downloadString took over a minute.
Just FYI, there's one more thing you could try, though it shouldn't be necessary anymore:
//var authgoogle = new OAuth2Authenticator(...);
//authgoogle.Completed...
if (authgoogle.IsUsingNativeUI)
{
// Step 2.1 Creating Login UI
// In order to access the SFSafariViewController API the cast is necessary
SafariServices.SFSafariViewController c = null;
c = (SafariServices.SFSafariViewController)ui_object;
PresentViewController(c, true, null);
}
else
{
PresentViewController(ui_object, true, null);
}
Though in my experience, you probably don't need the SafariController.

Another alternative (also free) to Wireshark is Microsoft Network Monitor.

What browser are you using to test?
Try using the default IE install. System.Net.WebClient uses the local IE settings, proxy etc. Maybe that has been mangled?

Another cause for extremely slow WebClient downloads is the destination media to which you are downloading. If it is a slow device like a USB key, this can massively impact download speed. To my HDD I could download at 6MB/s, to my USB key, only 700kb/s, even though I can copy files to this USB at 5MB/s from another drive. wget shows the same behavior. This is also reported here:
https://superuser.com/questions/413750/why-is-downloading-over-usb-so-slow
So if this is your scenario, an alternative solution is to download to HDD first and then copy files to the slow medium after download completes.

Related

In .NET, failure to retrieve HTTP resource from W3C web site

Retrieving the resource at http://www.w3.org/TR/xmlschema11-1/XMLSchema.xsd takes around 10 seconds using the following mechanisms:
web browser
curl
Java URL.openConnection()
It's possible that the W3C site is applying some "throttling" - deliberately slowing the response to discourage bulk requests.
Trying to retrieve the same resource from a C# application on .NET, I get a timeout after about 60-70 seconds. I've tried a couple of different approaches, both with the same result:
System.Xml.XmlUrlResolver.GetEntity()
new WebClient().OpenRead(uri)
Anyone have any idea what's going on? Would another API, or some configuration options, solve the problem?
The problem is they are (probably) checking for a User-Agent string. If it's not present, they send you to purgatory. .NET's http clients do not set this by default.
So, give this a shot:
private static readonly HttpClient _client = new HttpClient();
public static async Task TestMe()
{
using (var req = new HttpRequestMessage(HttpMethod.Get,
"http://www.w3.org/TR/xmlschema11-1/XMLSchema.xsd"))
{
req.Headers.Add("user-agent",
"Mozilla/5.0 (iPhone; CPU iPhone OS 10_3 like Mac OS X)");
using (var resp = await _client.SendAsync(req))
{
resp.EnsureSuccessStatusCode();
var data = await resp.Content.ReadAsStringAsync();
}
}
}
No idea why they do this; maybe it's a bug in their back-end? (I sure wouldn't want to leave a socket open longer than it needs to be for no good reason.) The request still takes 10-15 seconds, but it's better than the 120+ second timeout.

.Net C# RESTSharp 10 Minute Timeout

I have embedded a browser control into a .Net form and compiled it as a Windows executable. The browser control is displaying our HTML5 image viewer. The application opens sockets so it can listen to "push" requests from various servers. This allows images to be pushed to individual users' desktops.
When an incoming image push request comes in, the application calls a REST service using RESTSharp to generate a token for the viewer to use to display the image.
As long as the requests are consistently arriving, everything works great. If there is a lull (10 minutes seems to be the time frame), then the RESTSharp request times out. It is almost as though creating new instances of the RESTSharp objects reuses the old ones, in an attempted .Net optimization.
Here is the RESTSharp code I am using:
private async Task<string> postJsonDataToUrl(string lpPostData) {
IRestClient client = new RestClient(string.Format("{0}:{1}", MstrScsUrlBase, MintScsUrlPort));
IRestRequest request = new RestRequest(string.Format("{0}{1}{2}", MstrScsUrlContextRoot, MstrScsUrlPath, SCS_GENERATE_TOKEN_URL_PATH));
request.Timeout = 5000;
request.ReadWriteTimeout = 5000;
request.AddParameter("application/json", lpPostData, ParameterType.RequestBody);
IRestResponse response = await postResultAsync(client, request);
return response.Content;
} // postJsonDataToUrl
private static Task<IRestResponse> postResultAsync(IRestClient client, IRestRequest request) {
return client.ExecutePostTaskAsync(request);
} // PostResultAsync
This is the line where the time out occurs:
IRestResponse response = await postResultAsync(client, request);
I have tried rewriting this using .Net's HttpWebRequest and I get the same problem.
If I lengthen the RESTSharp timeouts, I am able to make calls to the server (using a different client) while the application is "timing out" so I know the server isn't the issue.
The initial version of the code did not have the await async call structure - that was added as an attempt to get more information on the problem.
I am not getting any errors other than the REST timeout.
I have had limited success with forcing a Garbage Collection with this call:
GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
Any thoughts?
It is possible you are hitting the connection limit for .Net apps, as in MS docs:
"By default, an application using the HttpWebRequest class uses a maximum of two persistent connections to a given server, but you can set the maximum number of connections on a per-application basis."
(https://learn.microsoft.com/en-us/dotnet/framework/network-programming/managing-connections).
Closing the connections should help, or you might be able to increase that limit; how to do so is also described in that doc.
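Raising the limit for one endpoint could look roughly like this. It is a sketch under assumptions: `ConnectionSettings` and `Configure` are hypothetical names, the numbers are arbitrary, and shortening `MaxIdleTime` is only a guess at what might help with a lull between requests, not something the question confirms. These `ServicePoint` settings apply to the HttpWebRequest stack on .NET Framework; on newer .NET versions the API is marked obsolete and may have little effect.

```csharp
using System;
using System.Net;

public static class ConnectionSettings
{
    // Hypothetical helper: adjusts connection handling for a single endpoint.
    public static ServicePoint Configure(Uri endpoint, int connectionLimit, TimeSpan maxIdle)
    {
        // One ServicePoint exists per scheme/host/port combination.
        ServicePoint servicePoint = ServicePointManager.FindServicePoint(endpoint);

        // Default is two persistent connections per server, per the docs quoted above.
        servicePoint.ConnectionLimit = connectionLimit;

        // How long an idle connection is kept before being closed, in milliseconds.
        servicePoint.MaxIdleTime = (int)maxIdle.TotalMilliseconds;

        return servicePoint;
    }
}
```

For example, `ConnectionSettings.Configure(new Uri("http://localhost:5001/"), 10, TimeSpan.FromSeconds(30))` would be called once at startup, before the first request to that host.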
I ended up putting
GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
in a timer that fired every 2 minutes. This completely solved my issue.
This is very surprising to me since my HttpWebRequest code was wrapped in "using" statements so the resources should have been released properly. I can only conclude that .Net was optimizing the use of the class and was trying to reuse a stale class rather than allow me to create a new one from scratch.
A new way of doing things.
var body = @"{ ""key"": ""value"" }";
// HTTP package
var request = new RestRequest("https://localhost:5001/api/networkdevices", Method.Put);
request.AddHeader("Content-Type", "application/json");
request.AddHeader("Keep-Alive", "");// set "timeout=120" will work as well
request.Timeout = 120;
request.AddBody(body);
// HTTP call
var client = new RestClient();
RestResponse response = await client.ExecuteAsync(request);
Console.WriteLine(response.Content);

WebRequest "HEAD" light weight alternative

I recently discovered that the following does not work with certain sites, such as IMDB.com.
class Program
{
static void Main(string[] args)
{
try
{
System.Net.WebRequest wc = System.Net.WebRequest.Create("http://www.imdb.com"); //args[0]);
((HttpWebRequest)wc).UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.19 (KHTML, like Gecko) Chrome/0.2.153.1 Safari/525.19";
wc.Timeout = 1000;
wc.Method = "HEAD";
WebResponse res = wc.GetResponse();
var streamReader = new System.IO.StreamReader(res.GetResponseStream());
Console.WriteLine(streamReader.ReadToEnd());
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
}
}
It returns an HTTP 405 (Method Not Allowed). My problem is, I use code very similar to the above to check if a link is valid, and the vast majority of the time it works correctly. I can switch the method to GET and it works (with an increased timeout), but this slows things down by an order of magnitude. I am assuming the 405 response is a server configuration on IMDB's side.
Is there a way for me to do the same thing as above, in a light weight manner in .NET? Or, is there a way to fix the above code so it works as a GET request that works with imdb?
Open the connection yourself with a socket (instead of an HttpRequest or WebClient), and close the stream as soon as you've read the status code. Fortunately the status code comes near the top of the response stream :)
You'll have to clarify what you mean by "lightweight". What are you trying to accomplish?
Whether or not you can use GET/POST/HEAD/DELETE/etc will depend on the URL and what's configured in the application that is running on the server at that URL.
If all you're trying to do is see if you can make a connection without actually downloading the content, you could maybe try just initiating a connection to port 80 using sockets, but there isn't really a reliable or universally supported way to do it just by changing the HTTP method.
If HEAD returns a 405, that means the server doesn't support HEAD (at least for that URL) and you'll have to fall back to GET instead. The majority of sites should support HEAD, so you probably want to do HEAD by default, but if it throws a 405, you could maybe fall back to GET for that domain. Or maybe you want to try HEAD first for each request; YMMV.
If the server requires GET and you want to reduce network traffic, you could try doing a conditional GET and/or a partial GET (see e.g. RFC2616). I've never tried doing those with WebRequest but I think it lets you add custom outgoing HTTP headers, so you should be able to do it.
Also, don't forget that, if you're writing a spider (which you clearly are), you should respect the server's robots.txt, and it's also courteous to throttle your requests to something like one request every two seconds, so you don't slashdot the server.
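The HEAD-first approach with a GET fallback, combined with a partial GET, might be sketched like this. It uses HttpClient rather than WebRequest, and `LinkChecker`, `ShouldFallBackToGet`, `BuildPartialGet` and `CheckAsync` are invented names for illustration. Servers are free to ignore the Range header and send the whole body, so treat the savings as best-effort.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static class LinkChecker
{
    private static readonly HttpClient Client = new HttpClient();

    // True when the HEAD response says the method is not allowed for this URL.
    public static bool ShouldFallBackToGet(HttpStatusCode status)
    {
        return status == HttpStatusCode.MethodNotAllowed; // 405
    }

    // Builds a GET that asks for only the first byte of the body.
    public static HttpRequestMessage BuildPartialGet(string url)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.Range = new RangeHeaderValue(0, 0); // sent as "Range: bytes=0-0"
        return request;
    }

    // Tries HEAD first and falls back to a ranged GET only on a 405.
    public static async Task<HttpStatusCode> CheckAsync(string url)
    {
        using (var head = new HttpRequestMessage(HttpMethod.Head, url))
        using (var headResponse = await Client.SendAsync(head))
        {
            if (!ShouldFallBackToGet(headResponse.StatusCode))
            {
                return headResponse.StatusCode;
            }
        }

        // ResponseHeadersRead completes once the headers arrive, so the body
        // is not buffered before the response is disposed.
        using (var get = BuildPartialGet(url))
        using (var getResponse = await Client.SendAsync(get, HttpCompletionOption.ResponseHeadersRead))
        {
            return getResponse.StatusCode;
        }
    }
}
```

A server that honors the range request answers with 206 (Partial Content) rather than 200, so a link checker built on this would need to count both as "valid".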

What is the best way to download files via HTTP using .NET?

In one of my applications I'm using the WebClient class to download files from a web server. Depending on the web server, the application sometimes downloads millions of documents. It seems that when there are a lot of documents, the WebClient doesn't scale well performance-wise.
It also seems that the WebClient doesn't immediately close the connection it opened to the web server, even after it has successfully downloaded the particular document.
I would like to know what other alternatives I have.
Update:
I also noticed that WebClient performs the authentication handshake for each and every download. I was expecting to see this handshake only once, since my application only communicates with a single web server. Shouldn't subsequent WebClient calls reuse the authentication session?
Update: My application also calls some web service methods, and for these web service calls the authentication session does seem to be reused. I'm using WCF to communicate with the web service.
I think you can still use "WebClient". However, you are better off using the "using" block as a good practice. This will make sure that the object is closed and disposed of:
using(WebClient client = new WebClient()) {
// Use client
}
I bet you are running into the default limit of 2 connections per server. Try running this code at the beginning of your program:
var cme = new System.Net.Configuration.ConnectionManagementElement();
cme.MaxConnection = 100;
System.Net.ServicePointManager.DefaultConnectionLimit = 100;
I have noticed the same behavior with the session in another project I was working on. To solve this "problem" I did use a static CookieContainer (since the session of the client is recognized by a value saved in a cookie).
public static class SomeStatics
{
private static CookieContainer _cookieContainer;
public static CookieContainer CookieContainer
{
get
{
if (_cookieContainer == null)
{
_cookieContainer = new CookieContainer();
}
return _cookieContainer;
}
}
}
public class CookieAwareWebClient : WebClient
{
protected override WebRequest GetWebRequest(Uri address)
{
WebRequest request = base.GetWebRequest(address);
if (request is HttpWebRequest)
{
(request as HttpWebRequest).CookieContainer = SomeStatics.CookieContainer;
(request as HttpWebRequest).KeepAlive = false;
}
return request;
}
}
//now the code that will download the file
using(WebClient client = new CookieAwareWebClient())
{
client.DownloadFile("http://address.com/somefile.pdf", @"c:\temp\savedfile.pdf");
}
The code is just an example, inspired by the questions "Using CookieContainer with WebClient class" and "C# get rid of Connection header in WebClient".
The above code will close your connection immediately after the file is downloaded, and it will reuse the authentication.
WebClient is probably the best option. It doesn't close the connection straight away for a reason: so it can use the same connection again, without having to open a new one. If you find that it's not reusing the connection as expected, that's usually because you're not Close()ing the response from the previous request:
var request = WebRequest.Create("...");
// populate parameters
var response = request.GetResponse();
// process response
response.Close(); // <-- make sure you don't forget this!

How to perform a fast web request in C#

I have a HTTP based API which I potentially need to call many times. The problem is that I can't get the request to take less than about 20 seconds, though the same request made through a browser is near instantaneous. The following code illustrates how I have implemented it so far.
WebRequest r = HttpWebRequest.Create("https://example.com/http/command?param=blabla");
var response = r.GetResponse();
One solution would be to make an asynchronous request but I would like to know why it takes so long and if I can avoid it. I have also tried using the WebClient class but I suspect it uses a WebRequest internally.
Update:
Running the following code took about 40 seconds in Release Mode (measured with Stopwatch):
WebRequest g = HttpWebRequest.Create("http://www.google.com");
var response = g.GetResponse();
I'm working at a university where there might be different things in the network configuration affecting the performance, but the direct use of the browser illustrates that it should be near instant.
Update 2:
I uploaded the code to a remote machine and it worked fine so the conclusion must be that the .NET code does something extra compared to the browser or it has problems resolving the address through the university network (proxy issues or something?!).
This problem is similar to another post on StackOverflow:
Stackoverflow-2519655(HttpWebrequest is extremely slow)
Most of the time the problem is the Proxy property. You should set this property to null, otherwise the object will attempt to search for an appropriate proxy server to use before going directly to the source. Note: this proxy search is turned on by default, so you have to explicitly tell the object not to perform it.
request.Proxy = null;
using (var response = (HttpWebResponse)request.GetResponse())
{
}
I was having the 30 second delay on 'first' attempt - JamesR's reference to the other post mentioning setting proxy to null solved it instantly!
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(_site.url);
request.Proxy = null; // <-- this is the good stuff
...
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Does your site have an invalid SSL cert? Try adding this
ServicePointManager.ServerCertificateValidationCallback = new System.Net.Security.RemoteCertificateValidationCallback(AlwaysAccept);
//... somewhere AlwaysAccept is defined as:
using System.Security.Cryptography.X509Certificates;
using System.Net.Security;
public bool AlwaysAccept(object sender, X509Certificate certification, X509Chain chain, SslPolicyErrors sslPolicyErrors)
{
return true;
}
You don't close your response. As soon as you hit the number of allowed connections, you have to wait for the earlier ones to time out. Try
using (var response = g.GetResponse())
{
// do stuff with your response
}
