What is the best way to download files via HTTP using .NET? - c#

In one of my applications I'm using the WebClient class to download files from a web server. Depending on the web server, the application sometimes downloads millions of documents. When there are a lot of documents, WebClient doesn't seem to scale well performance-wise.
It also seems that WebClient doesn't immediately close the connection it opened to the web server, even after it has successfully downloaded a particular document.
I would like to know what other alternatives I have.
Update:
Also, I noticed that for each and every download WebClient performs the authentication handshake. I was expecting to see this handshake only once, since my application only communicates with a single web server. Shouldn't subsequent WebClient calls reuse the authentication session?
Update: My application also calls some web service methods, and for these web service calls the authentication session does seem to be reused. I'm using WCF to communicate with the web service.

I think you can still use WebClient. However, you are better off wrapping it in a using block as good practice. This makes sure the object is closed and disposed of:
using (WebClient client = new WebClient())
{
    // Use client
}

I bet you are running into the default limit of 2 connections per server. Try running this code at the beginning of your program:
// Note: constructing a ConnectionManagementElement on its own has no effect at runtime;
// the static property below is what actually raises the per-host limit (default is 2).
System.Net.ServicePointManager.DefaultConnectionLimit = 100;

I have noticed the same behavior with the session in another project I was working on. To solve this "problem" I used a static CookieContainer (since the client's session is identified by a value saved in a cookie).
public static class SomeStatics
{
    private static CookieContainer _cookieContainer;

    public static CookieContainer CookieContainer
    {
        get
        {
            if (_cookieContainer == null)
            {
                _cookieContainer = new CookieContainer();
            }
            return _cookieContainer;
        }
    }
}

public class CookieAwareWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        if (request is HttpWebRequest)
        {
            (request as HttpWebRequest).CookieContainer = SomeStatics.CookieContainer;
            (request as HttpWebRequest).KeepAlive = false;
        }
        return request;
    }
}
// now the code that will download the file
using (WebClient client = new CookieAwareWebClient())
{
    client.DownloadFile("http://address.com/somefile.pdf", @"c:\temp\savedfile.pdf");
}
The code is just an example and was inspired by Using CookieContainer with WebClient class and C# get rid of Connection header in WebClient.
The above code will close your connection immediately after the file is downloaded, and it will reuse the authentication session.
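To also address the per-download authentication handshake mentioned in the question, one option (a sketch, not part of the original answer; it assumes Basic authentication, and the credentials, URL and path are placeholders) is to set credentials once on the client and enable PreAuthenticate in the same kind of override:
public class AuthenticatedWebClient : CookieAwareWebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        var request = base.GetWebRequest(address);
        var httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            // For Basic authentication, once the first 401 challenge has been answered,
            // PreAuthenticate lets later requests send the Authorization header proactively
            // instead of repeating the challenge round trip for every download.
            httpRequest.PreAuthenticate = true;
        }
        return request;
    }
}

using (var client = new AuthenticatedWebClient())
{
    client.Credentials = new NetworkCredential("user", "password");
    client.DownloadFile("http://address.com/somefile.pdf", @"c:\temp\savedfile.pdf");
}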

WebClient is probably the best option. It doesn't close the connection straight away for a reason: so it can reuse the same connection for subsequent requests, without having to open a new one. If you find that connections aren't being reused as expected, that's usually because you're not calling Close() on the response from the previous request:
var request = WebRequest.Create("...");
// populate parameters
var response = request.GetResponse();
// process response
response.Close(); // <-- make sure you don't forget this!
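Equivalently (a small variation on the same point, not from the original answer), a using block guarantees the response is closed even if processing throws:
var request = WebRequest.Create("...");
// populate parameters
using (var response = request.GetResponse())
{
    // process response; the response is closed automatically when the block ends
}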

Related

CredentialCache and HttpWebRequest in .NET

I'm having difficulty understanding how web requests and credentials work in .NET.
I have the following method that is executing a request to a SOAP endpoint.
public WebResponse Execute(NetworkCredential Credentials)
{
    HttpWebRequest webRequest = CreateWebRequest(_url, actionUrl);
    webRequest.AllowAutoRedirect = true;
    webRequest.PreAuthenticate = true;
    webRequest.Credentials = Credentials;
    // Add headers and content into the requestStream
    // (elided here; asyncResult is assumed to come from an earlier webRequest.BeginGetResponse call)
    asyncResult.AsyncWaitHandle.WaitOne();
    return webRequest.EndGetResponse(asyncResult);
}
It works well enough. However, users of my applications may have to execute dozens of these requests in short succession, and hundreds over the course of the day. My goal is to implement some of the recommendations I've read about, namely using an HttpClient that exists for the entire lifetime of the application, and using the CredentialCache to store the user's credentials instead of passing them to each request.
So I'm starting with the CredentialCache.
Following the example linked above, I instantiated a CredentialCache and added my network credentials to it. Note that this is the exact same NetworkCredential object that I was passing to the request earlier.
NetworkCredential credential = new NetworkCredential();
credential.UserName = Name;
credential.Password = PW;
Program.CredCache.Add(new Uri("https://blah.com/"), "Basic", credential);
Then, when I go to send my HTTP request, I get the credentials from the cache, instead of providing the credentials object directly.
public WebResponse Execute(NetworkCredential Credentials)
{
    HttpWebRequest webRequest = CreateWebRequest(_url, actionUrl);
    webRequest.AllowAutoRedirect = true;
    webRequest.PreAuthenticate = true;
    webRequest.Credentials = Program.CredCache;
    // more stuff down here
}
The request now fails with a 401 error.
I am failing to understand this on several levels. For starters, I can't seem to figure out whether or not the CredentialCache has indeed passed the proper credentials to the HTTP request.
I suspect part of the problem might be that I'm trying to use "Basic" authentication. I tried "Digest" as well just as a shot in the dark (which also failed), but I'm sure there must be a way to see what kind of authentication the server is expecting.
I have been combing StackOverflow and MDN trying to read up as much as possible about this, but I am having a difficult time separating the relevant information from the outdated and irrelevant information.
If anyone can help me solve the problem that would be most appreciated, but even links to proper educational resources would be helpful.
According to the documentation, the note saying credentials added this way are for SMTP only and do not work for HTTP or FTP requests applies to the Add(host, port, authenticationType, credential) overload of CredentialCache, not to the class as a whole; the Add(Uri, authenticationType, credential) overload you are using is the one intended for HTTP and FTP:
https://msdn.microsoft.com/en-us/library/system.net.credentialcache(v=vs.110).aspx
The remark is easy to misread, which is probably why it appears to contradict the later API docs.
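To find out which scheme the server actually expects, one option (a sketch, not part of the original answer; the URL is the placeholder from the question) is to make an unauthenticated request and read the WWW-Authenticate header off the 401 response:
// Requires: using System.Net;
try
{
    var probe = (HttpWebRequest)WebRequest.Create("https://blah.com/");
    using (probe.GetResponse()) { }
}
catch (WebException ex)
{
    var resp = ex.Response as HttpWebResponse;
    if (resp != null && resp.StatusCode == HttpStatusCode.Unauthorized)
    {
        // Typical values: Basic realm="...", Digest ..., Negotiate, NTLM
        Console.WriteLine(resp.Headers["WWW-Authenticate"]);
    }
}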
You could try using the HttpClient class. The methods and return types are different, so you would need to tweak your other code a bit, but it would look a bit like this:
public class CommsClass
{
    private HttpClient _httpClient;

    public CommsClass(NetworkCredential credentials)
    {
        var handler = new HttpClientHandler { Credentials = credentials };
        _httpClient = new HttpClient(handler);
    }

    public HttpResponseMessage Execute(HttpRequestMessage message)
    {
        var response = _httpClient.SendAsync(message).Result;
        return response;
    }
}
You can do all sorts of other things with the handler and the client, like setting request headers or a base address.
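For example (a sketch, not part of the original answer; the credentials, URL and header value are placeholders), the handler and client could be configured once like this:
// Requires: using System.Net; using System.Net.Http; using System.Net.Http.Headers;
var handler = new HttpClientHandler
{
    Credentials = new NetworkCredential("user", "password"),
    PreAuthenticate = true   // resend credentials proactively after the first challenge
};
var client = new HttpClient(handler)
{
    BaseAddress = new Uri("https://blah.com/")
};
client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("text/xml"));
// Relative URIs now resolve against BaseAddress, e.g.:
// var response = await client.GetAsync("SomeEndpoint");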

WebClient is very slow

I have a problem with WebClient.
It is very slow. It takes about 3-5 seconds to DownloadString from one website.
I don't have any network problems.
This is my modified WebClient:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Net;

namespace StatusChecker
{
    class WebClientEx : WebClient
    {
        public CookieContainer CookieContainer { get; private set; }

        public WebClientEx()
        {
            CookieContainer = new CookieContainer();
            ServicePointManager.Expect100Continue = false;
            Encoding = System.Text.Encoding.UTF8;
            WebRequest.DefaultWebProxy = null;
            Proxy = null;
        }

        public void ClearCookies()
        {
            CookieContainer = new CookieContainer();
        }

        protected override WebRequest GetWebRequest(Uri address)
        {
            var request = base.GetWebRequest(address);
            if (request is HttpWebRequest)
            {
                (request as HttpWebRequest).CookieContainer = CookieContainer;
            }
            return request;
        }
    }
}
UPDATE:
In Wireshark I saw that a single DownloadString is sending and receiving a few thousand packets.
There may be two issues at hand here (that I've also noticed in my own programs previously):
The first request takes an abnormally long time: this occurs because WebRequest by default detects and loads proxy settings the first time it is used, which can take quite a while. To stop this, set the request's Proxy property (or WebRequest.DefaultWebProxy) to null and it will bypass the check (provided you can access the internet directly).
You can't download more than 2 items at once: by default, you can only have 2 simultaneous HTTP connections open per host. To change this, set ServicePointManager.DefaultConnectionLimit to something larger. I usually set this to int.MaxValue (just make sure you don't spam the host with 1,000,000 connections). A combined startup snippet is sketched after the options below.
There are a few options if it is related to the initial proxy settings being checked:
Disable the automatic proxy detection settings in Internet Explorer
Set the proxy to null on the WebClient instance:
webClient.Proxy = null;
On application startup set the default webproxy to null:
WebRequest.DefaultWebProxy = null;
In older .NET code instead of setting to null, you used to write this (but null is now preferred):
webclient.Proxy = GlobalProxySelection.GetEmptyWebProxy();
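Putting those together, a typical application-startup snippet (a sketch combining the proxy and connection-limit fixes above; the limit of 20 is just an example value) could be:
// Run once at startup, before the first request is made.
WebRequest.DefaultWebProxy = null;                            // skip automatic proxy detection
System.Net.ServicePointManager.DefaultConnectionLimit = 20;   // default is 2 per host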
Maybe it will help somebody: some web servers support compression (gzip or others), so you can add an Accept-Encoding header to your requests and enable automatic decompression on the web client instance. Chrome works that way.
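A sketch of what that could look like with a WebClient subclass (the class name is made up; setting AutomaticDecompression also adds the Accept-Encoding header for you):
public class GzipWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        var request = base.GetWebRequest(address) as HttpWebRequest;
        if (request != null)
        {
            // Advertise gzip/deflate support and decompress the response transparently.
            request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        }
        return request;
    }
}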

System.Net.WebClient unreasonably slow

When using the System.Net.WebClient.DownloadData() method I'm getting an unreasonably slow response time.
When fetching a URL using the WebClient class in .NET it takes around 10 seconds before I get a response, while the same page is fetched by my browser in under 1 second.
And this is with data that's 0.5 kB or smaller in size.
The request involves POST/GET parameters and a user agent header, in case that could be causing problems.
I haven't (yet) tested whether other ways to download data in .NET give me the same problem, but I suspect I might get similar results. (I've always had a feeling web requests in .NET are unusually slow...)
What could be the cause of this?
Edit:
I tried doing the exact same thing using System.Net.HttpWebRequest instead, using the following method, and all requests finish in under 1 second.
public static string DownloadText(string url)
{
    var request = (HttpWebRequest)WebRequest.Create(url);
    var response = (HttpWebResponse)request.GetResponse();
    using (var reader = new StreamReader(response.GetResponseStream()))
    {
        return reader.ReadToEnd();
    }
}
While this (old) method using System.Net.WebClient takes 15-30s for each request to finish:
public static string DownloadText(string url)
{
    var client = new WebClient();
    byte[] data = client.DownloadData(url);
    return client.Encoding.GetString(data);
}
I had that problem with WebRequest. Try setting Proxy = null;
WebClient wc = new WebClient();
wc.Proxy = null;
By default, WebClient and WebRequest try to determine what proxy to use from the IE settings, which can result in a delay of around 5 seconds before the actual request is sent.
This applies to all classes that use WebRequest, including WCF services with HTTP binding.
In general you can use this static code at application startup:
WebRequest.DefaultWebProxy = null;
Download Wireshark here http://www.wireshark.org/
Capture the network packets and filter the "http" packets.
It should give you the answer right away.
There is nothing inherently slow about .NET web requests; that code should be fine. I regularly use WebClient and it works very quickly.
How big is the payload in each direction? Silly question maybe, but is it simply bandwidth limitations?
IMO the most likely thing is that your web-site has spun down, and when you hit the URL the web-site is slow to respond. This is then not the fault of the client. It is also possible that DNS is slow for some reason (in which case you could hard-code the IP into your "hosts" file), or that some proxy server in the middle is slow.
If the web-site isn't yours, it is also possible that they are detecting atypical usage and deliberately injecting a delay to annoy scrapers.
I would grab Fiddler (a free, simple web inspector) and look at the timings.
WebClient may be slow on some workstations when "Automatically detect settings" is checked in the IE proxy settings (Connections tab - LAN Settings).
Setting WebRequest.DefaultWebProxy = null; or client.Proxy = null didn't do anything for me, using Xamarin on iOS.
I did two things to fix this:
I wrote a downloadString function which does not use WebRequest (it uses HttpClient from System.Net.Http instead):
public static async Task<string> FnDownloadStringWithoutWebRequest(string url)
{
    using (var client = new HttpClient())
    {
        // Define headers
        client.DefaultRequestHeaders.Accept.Clear();
        client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));

        var response = await client.GetAsync(url);
        if (response.IsSuccessStatusCode)
        {
            string responseContent = await response.Content.ReadAsStringAsync();
            //dynamic json = Newtonsoft.Json.JsonConvert.DeserializeObject(responseContent);
            return responseContent;
        }

        Logger.DefaultLogger.LogError(LogLevel.NORMAL, "GoogleLoginManager.FnDownloadString", "error fetching string, code: " + response.StatusCode);
        return "";
    }
}
This is however still slow with Managed HttpClient.
So secondly, in Visual Studio Community for Mac, right click on your Project in the Solution -> Options -> set HttpClient implementation to NSUrlSession, instead of Managed.
Screenshot: Set HttpClient implementation to NSUrlSession instead of Managed
Managed is not fully integrated into iOS, doesn't support TLS 1.2, and thus does not support the ATS standards set as default in iOS9+, see here:
https://learn.microsoft.com/en-us/xamarin/ios/app-fundamentals/ats
With both these changes, string downloads are always very fast (<<1s).
Without both of these changes, on every second or third try, downloadString took over a minute.
Just FYI, there's one more thing you could try, though it shouldn't be necessary anymore:
//var authgoogle = new OAuth2Authenticator(...);
//authgoogle.Completed...

if (authgoogle.IsUsingNativeUI)
{
    // Step 2.1 Creating Login UI
    // In order to access the SFSafariViewController API the cast is necessary
    SafariServices.SFSafariViewController c = null;
    c = (SafariServices.SFSafariViewController)ui_object;
    PresentViewController(c, true, null);
}
else
{
    PresentViewController(ui_object, true, null);
}
Though in my experience, you probably don't need the SafariController.
Another alternative (also free) to Wireshark is Microsoft Network Monitor.
What browser are you using to test?
Try using the default IE install. System.Net.WebClient uses the local IE settings, proxy etc. Maybe that has been mangled?
Another cause for extremely slow WebClient downloads is the destination media to which you are downloading. If it is a slow device like a USB key, this can massively impact download speed. To my HDD I could download at 6MB/s, to my USB key, only 700kb/s, even though I can copy files to this USB at 5MB/s from another drive. wget shows the same behavior. This is also reported here:
https://superuser.com/questions/413750/why-is-downloading-over-usb-so-slow
So if this is your scenario, an alternative solution is to download to HDD first and then copy files to the slow medium after download completes.
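In that case, a sketch of the workaround (the URL and paths are placeholders) could be:
// Requires: using System.IO; using System.Net;
// Download to a fast local drive first, then copy to the slow medium.
string tempPath = Path.Combine(Path.GetTempPath(), "somefile.bin");
using (var client = new WebClient())
{
    client.DownloadFile("http://example.com/somefile.bin", tempPath);
}
File.Copy(tempPath, @"E:\somefile.bin", true);
File.Delete(tempPath);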

How to execute a URL or hyperlink without leaving the existing page using ASP.NET

I have used this code
WebClient webClient = new WebClient();
byte[] reqHTML;
reqHTML = webClient.DownloadData(url);
for executing a URL. I have a question here: when using this code, are the cookies set or not?
Cookies are not sent by default with WebClient. You could, however, write your own implementation that uses a cookie container:
public class CookieAwareWebClient : WebClient
{
    private CookieContainer _container = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        if (request is HttpWebRequest)
            ((HttpWebRequest)request).CookieContainer = _container;
        return request;
    }
}
If you mean the cookies from the ASP.NET page that is executing - then no: I'm pretty sure that WebClient is not going to look at all for the cookies on the current executing web request.
If you want this functionality, can you perhaps use AJAX from the browser? Perhaps via jQuery? That should flow the context etc as per standard browser rules.
Alternatively, you are going to have to handle the cookies yourself (i.e. copy them into the WebClient, and back if needed).
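If you do need the current request's cookies forwarded, a rough sketch (assuming this runs in ASP.NET code-behind; the target URL is a placeholder) would be to copy them into a CookieContainer that your cookie-aware client then uses:
// Requires: using System.Net; using System.Web;
var container = new CookieContainer();
var target = new Uri("http://example.com/page.aspx");
foreach (string name in Request.Cookies)
{
    HttpCookie incoming = Request.Cookies[name];
    container.Add(target, new Cookie(incoming.Name, incoming.Value));
}
// The CookieAwareWebClient above would need to accept this container
// (e.g. via a constructor parameter) so its requests send the copied cookies.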

Test if a website is alive from a C# application

I am looking for the best way to test if a website is alive from a C# application.
Background
My application consists of a Winforms UI, a backend WCF service and a website to publish content to the UI and other consumers. To prevent the situation where the UI starts up and fails to work properly because the WCF service is missing or the website is down, I have added an app startup check to ensure that everything is alive.
The application is being written in C#, .NET 3.5, Visual Studio 2008
Current Solution
Currently I am making a web request to a test page on the website that will in turn test the web site and then display a result.
WebRequest request = WebRequest.Create("http://localhost/myContentSite/test.aspx");
WebResponse response = request.GetResponse();
I am assuming that if there are no exceptions thrown during this call then all is well and the UI can start.
Question
Is this the simplest, correct way, or is there some other sneaky call that I don't know about in C#, or a better way to do it?
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
if (response == null || response.StatusCode != HttpStatusCode.OK)
    return false; // anything other than 200 OK is treated as "not alive"
As #Yanga mentioned, HttpClient is probably the more common way to do this now.
HttpClient client = new HttpClient();
var checkingResponse = await client.GetAsync(url);
if (!checkingResponse.IsSuccessStatusCode)
{
    return false;
}
While using WebResponse, please make sure that you close the response stream (i.e. call .Close()), otherwise it can hang the machine after repeated executions.
E.g.
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(sURL);
HttpWebResponse response = (HttpWebResponse)req.GetResponse();
// your code here
response.Close();
from the NDiagnostics project on CodePlex...
public override bool WebSiteIsAvailable(string Url)
{
    string Message = string.Empty;
    HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(Url);

    // Set the credentials to the current user account
    request.Credentials = System.Net.CredentialCache.DefaultCredentials;
    request.Method = "GET";

    try
    {
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            // Do nothing; we're only testing to see if we can get the response
        }
    }
    catch (WebException ex)
    {
        Message += ((Message.Length > 0) ? "\n" : "") + ex.Message;
    }

    return (Message.Length == 0);
}
Today we can update the answers to use HttpClient():
HttpClient Client = new HttpClient();
var result = await Client.GetAsync("https://stackoverflow.com");
int StatusCode = (int)result.StatusCode;
Assuming the WCF service and the website live in the same web app, you can use a "Status" WebService that returns the application status. You probably want to do some of the following:
Test that the database is up and running (good connection string, service is up, etc...)
Test that the website is working (how exactly depends on the website)
Test that WCF is working (how exactly depends on your implementation)
Added bonus: you can return some versioning info on the service if you need to support different releases in the future.
Then, you create a client on the Win.Forms app for the WebService. If the WS is not responding (i.e. you get some exception on invoke) then the website is down (like a "general error").
If the WS responds, you can parse the result and make sure that everything works, or if something is broken, return more information.
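A sketch of what such a "Status" contract could look like (the names and fields are hypothetical, not from the original answer):
// Requires: using System.ServiceModel; using System.Runtime.Serialization;
[ServiceContract]
public interface IStatusService
{
    [OperationContract]
    StatusResult GetStatus();
}

[DataContract]
public class StatusResult
{
    [DataMember] public bool DatabaseOk { get; set; }
    [DataMember] public bool WebSiteOk { get; set; }
    [DataMember] public string Version { get; set; }   // versioning info for future releases
}
The WinForms app then calls GetStatus() at startup and treats an exception on the call itself as the "general error" case described above.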
You'll want to check the status code for OK (status 200).
Solution from: How do you check if a website is online in C#?
var ping = new System.Net.NetworkInformation.Ping();
// Ping takes a host name or IP address, not a URL; note this only checks that the
// host is reachable, not that the web server itself is responding.
var result = ping.Send("www.stackoverflow.com");
if (result.Status != System.Net.NetworkInformation.IPStatus.Success)
    return;
