WebRequest "HEAD" light weight alternative - c#

I recently discovered that the following does not work with certain sites, such as IMDB.com.
class Program
{
    static void Main(string[] args)
    {
        try
        {
            var request = (System.Net.HttpWebRequest)System.Net.WebRequest.Create("http://www.imdb.com"); //args[0]);
            request.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.19 (KHTML, like Gecko) Chrome/0.2.153.1 Safari/525.19";
            request.Timeout = 1000;
            request.Method = "HEAD";
            using (var response = request.GetResponse())
            using (var streamReader = new System.IO.StreamReader(response.GetResponseStream()))
            {
                // a HEAD response carries no body, so this prints nothing on success
                Console.WriteLine(streamReader.ReadToEnd());
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
It returns an HTTP 405 (Method Not Allowed). My problem is that I use code very similar to the above to check whether a link is valid, and the vast majority of the time it works correctly. I can switch the method to GET and it works (with an increased timeout), but that slows things down by an order of magnitude. I assume the 405 response is a server-side configuration on IMDB's end.
Is there a way for me to do the same thing in a lightweight manner in .NET? Or is there a way to fix the above code so it works as a GET request against IMDB?

Open the connection yourself with a socket (instead of an HttpWebRequest or WebClient), and close the stream as soon as you've read the status code. Fortunately, the status code comes near the top of the response stream :)
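A minimal sketch of that socket approach, assuming port 80 and a host name of your choosing (requires System.IO and System.Net.Sockets):
static int GetStatusCode(string host)
{
    using (var client = new TcpClient(host, 80))
    using (var stream = client.GetStream())
    {
        // send a bare GET and ask the server to close the connection afterwards
        var writer = new StreamWriter(stream);
        writer.Write("GET / HTTP/1.1\r\nHost: " + host + "\r\nConnection: close\r\n\r\n");
        writer.Flush();

        // the status line is the very first line of the response, e.g. "HTTP/1.1 200 OK"
        var reader = new StreamReader(stream);
        string statusLine = reader.ReadLine();
        return int.Parse(statusLine.Split(' ')[1]);
    }
}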

You'll have to clarify what you mean by "lightweight". What are you trying to accomplish?
Whether or not you can use GET/POST/HEAD/DELETE/etc will depend on the URL and what's configured in the application that is running on the server at that URL.
If all you're trying to do is see whether you can make a connection without actually downloading the content, you could try just initiating a connection to port 80 using sockets, but there isn't really a reliable or universally supported way to do this just by changing the HTTP method.

If HEAD returns a 405, that means the server doesn't support HEAD (at least for that URL) and you'll have to fall back to GET instead. The majority of sites should support HEAD, so you probably want to try HEAD by default and fall back to GET for any domain that throws a 405, as sketched below. Or maybe you want to try HEAD first for every request; YMMV.
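A sketch of that fallback logic; the helper and URL names are illustrative, and note that GetResponse throws a WebException for 4xx/5xx status codes:
static bool IsLinkValid(string url)
{
    try
    {
        ProbeStatus(url, "HEAD");
        return true;
    }
    catch (WebException ex)
    {
        var resp = ex.Response as HttpWebResponse;
        if (resp != null && resp.StatusCode == HttpStatusCode.MethodNotAllowed)
        {
            // 405: the server rejects HEAD for this URL, so retry once with GET
            try { ProbeStatus(url, "GET"); return true; }
            catch (WebException) { return false; }
        }
        return false;
    }
}

static void ProbeStatus(string url, string method)
{
    var request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = method;
    request.Timeout = 5000;
    using (request.GetResponse()) { } // dispose immediately; only the status matters
}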
If the server requires GET and you want to reduce network traffic, you could try doing a conditional GET and/or a partial GET (see e.g. RFC 2616). I've never tried doing those with WebRequest, but I think it lets you add custom outgoing HTTP headers, so you should be able to do it; a sketch follows.
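I haven't verified this against any particular server, but a conditional plus partial GET with HttpWebRequest might look like the following; the URL is illustrative, and a 304 surfaces as a WebException:
var request = (HttpWebRequest)WebRequest.Create("http://www.example.com/page");
request.IfModifiedSince = DateTime.UtcNow.AddDays(-1); // conditional GET
request.AddRange(0, 0);                                // partial GET: ask for the first byte only
try
{
    using (var response = (HttpWebResponse)request.GetResponse())
    {
        // 206 Partial Content if ranges are honored, 200 with the full body otherwise
        Console.WriteLine((int)response.StatusCode);
    }
}
catch (WebException ex)
{
    var resp = ex.Response as HttpWebResponse;
    if (resp != null && resp.StatusCode == HttpStatusCode.NotModified)
        Console.WriteLine("304 Not Modified - the resource exists but hasn't changed");
    else
        throw;
}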
Also, don't forget that, if you're writing a spider (which you clearly are), you should respect the server's robots.txt, and it's also courteous to throttle your requests to something like one request every two seconds, so you don't slashdot the server.

Related

In .NET, failure to retrieve HTTP resource from W3C web site

Retrieving the resource at http://www.w3.org/TR/xmlschema11-1/XMLSchema.xsd takes around 10 seconds using the following mechanisms:
web browser
curl
Java URL.openConnection()
It's possible that the W3C site is applying some "throttling" - deliberately slowing the response to discourage bulk requests.
Trying to retrieve the same resource from a C# application on .NET, I get a timeout after about 60-70 seconds. I've tried a couple of different approaches, both with the same result:
System.Xml.XmlUrlResolver.GetEntity()
new WebClient().OpenRead(uri)
Anyone have any idea what's going on? Would another API, or some configuration options, solve the problem?
The problem is that they are (probably) checking for a User-Agent string. If it's not present, they send you to purgatory. .NET's HTTP clients do not set one by default.
So, give this a shot:
private static readonly HttpClient _client = new HttpClient();

public static async Task TestMe()
{
    using (var req = new HttpRequestMessage(HttpMethod.Get,
        "http://www.w3.org/TR/xmlschema11-1/XMLSchema.xsd"))
    {
        req.Headers.Add("user-agent",
            "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3 like Mac OS X)");
        using (var resp = await _client.SendAsync(req))
        {
            resp.EnsureSuccessStatusCode();
            var data = await resp.Content.ReadAsStringAsync();
        }
    }
}
No idea why they do this; maybe it's a bug in their back-end? (I sure wouldn't want to leave a socket open longer than it needs to be for no good reason.) The request still takes 10-15 seconds, but that's better than the 120+ second timeout.

Is there a way to unit test for a timeout error?

I created a program to call thousands of URLs, and I want to make sure that if a URL times out, I can trace the timeout error back to that URL. What I have in place is:
string url = "http://www.google.com";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.Timeout = 5000;
request.ReadWriteTimeout = 5000;
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20100101 Firefox/14.0.1"; // from Fiddler

DateTime giveUp = DateTime.UtcNow.AddSeconds(5);
if (DateTime.UtcNow > giveUp)
    throw new TimeoutException("The following url could not be reached: " + url);
I want to write a unit test that verifies the timeout exception works correctly. Let me know if you need more details, but this should suffice.
The problem becomes what you're really testing. You could very easily spin up a little HTTP server in your test and send a request to it, knowing that it will time out, which would exercise your timeout code. But then you're really testing whether HttpWebRequest correctly times out. You don't care whether it times out correctly, only that it does, and that your code handles the resulting TimeoutException.
What is usually done in this case is to use a test double (mock) that basically just throws a TimeoutException.
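A minimal sketch of that approach; the IUrlChecker seam and its fake are hypothetical names you introduce yourself, and MSTest attributes are shown (NUnit works the same way):
// the production code depends on this interface instead of HttpWebRequest directly
public interface IUrlChecker
{
    string Fetch(string url); // implementations throw TimeoutException on timeout
}

// test double: no network involved, it just throws
public class ThrowingUrlChecker : IUrlChecker
{
    public string Fetch(string url)
    {
        throw new TimeoutException("The following url could not be reached: " + url);
    }
}

[TestMethod]
[ExpectedException(typeof(TimeoutException))]
public void Fetch_WhenUrlTimesOut_ThrowsTimeoutException()
{
    IUrlChecker checker = new ThrowingUrlChecker();
    checker.Fetch("http://www.example.com");
}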
Expose a setter for the timeout period so you can manipulate it from the unit test. Set it so short that an exception is always thrown, then verify that the exception is raised, either with the ExpectedException attribute or in a catch block inside your unit test.
You should have well-known results, so I see two possibilities:
1. You trust the timer functions and use (for testing) a timeout that is an average duration (e.g. if your durations are 40-50 ms, you use 45 ms).
2. You set up your own server with simulated delays, as sketched below.
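A minimal sketch of option 2, using HttpListener as the deliberately slow local server; the port and delay values are arbitrary (requires System, System.Net, and System.Threading.Tasks):
public static void TestTimeoutAgainstSlowLocalServer()
{
    var listener = new HttpListener();
    listener.Prefixes.Add("http://localhost:8080/"); // arbitrary local port
    listener.Start();
    Task.Run(async () =>
    {
        var context = await listener.GetContextAsync();
        await Task.Delay(10000); // respond slower than the client's timeout
        context.Response.Close();
    });

    var request = (HttpWebRequest)WebRequest.Create("http://localhost:8080/");
    request.Timeout = 5000;
    try
    {
        request.GetResponse();
        Console.WriteLine("unexpected: no timeout");
    }
    catch (WebException ex)
    {
        // ex.Status is WebExceptionStatus.Timeout when the 5000 ms limit is hit
        Console.WriteLine("Timed out: " + ex.Status);
    }
    finally
    {
        listener.Stop();
    }
}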

System.Net.WebClient unreasonably slow

When using the System.Net.WebClient.DownloadData() method I'm getting an unreasonably slow response time.
When fetching a URL with the WebClient class in .NET, it takes around 10 seconds before I get a response, while the same page is fetched by my browser in under a second.
And this is with data that's 0.5 kB or smaller in size.
The request involves POST/GET parameters and a user-agent header, in case that could be causing problems.
I haven't (yet) tried whether other ways of downloading data in .NET give me the same problems, but I suspect I might get similar results. (I've always had a feeling web requests in .NET are unusually slow...)
What could be the cause of this?
Edit:
I tried doing the exact same thing using System.Net.HttpWebRequest instead, with the following method, and all requests finish in under a second:
public static string DownloadText(string url)
{
    var request = (HttpWebRequest)WebRequest.Create(url);
    using (var response = (HttpWebResponse)request.GetResponse())
    using (var reader = new StreamReader(response.GetResponseStream()))
    {
        return reader.ReadToEnd();
    }
}
While this (old) method using System.Net.WebClient takes 15-30s for each request to finish:
public static string DownloadText(string url)
{
    using (var client = new WebClient())
    {
        byte[] data = client.DownloadData(url);
        return client.Encoding.GetString(data);
    }
}
I had that problem with WebRequest. Try setting Proxy = null:
WebClient wc = new WebClient();
wc.Proxy = null;
By default, WebClient and WebRequest try to determine which proxy to use from the IE settings, which can sometimes add a delay of around 5 seconds before the actual request is sent.
This applies to all classes that use WebRequest, including WCF services with HTTP binding.
In general you can use this static code at application startup:
WebRequest.DefaultWebProxy = null;
Download Wireshark here http://www.wireshark.org/
Capture the network packets and filter the "http" packets.
It should give you the answer right away.
There is nothing inherently slow about .NET web requests; that code should be fine. I regularly use WebClient and it works very quickly.
How big is the payload in each direction? Silly question maybe, but is it simply bandwidth limitations?
IMO the most likely thing is that your web-site has spun down, and when you hit the URL the web-site is slow to respond. This is then not the fault of the client. It is also possible that DNS is slow for some reason (in which case you could hard-code the IP into your "hosts" file), or that some proxy server in the middle is slow.
If the web-site isn't yours, it is also possible that they are detecting atypical usage and deliberately injecting a delay to annoy scrapers.
I would grab Fiddler (a free, simple web inspector) and look at the timings.
WebClient may be slow on some workstations when Automatic Proxy Settings in checked in the IE settings (Connections tab - LAN Settings).
Setting WebRequest.DefaultWebProxy = null; or client.Proxy = null didn't do anything for me, using Xamarin on iOS.
I did two things to fix this:
First, I wrote a downloadString function that does not use WebRequest or WebClient:
public static async Task<string> FnDownloadStringWithoutWebRequest(string url)
{
    using (var client = new HttpClient())
    {
        // define headers
        client.DefaultRequestHeaders.Accept.Clear();
        client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));

        var response = await client.GetAsync(url);
        if (response.IsSuccessStatusCode)
        {
            string responseContent = await response.Content.ReadAsStringAsync();
            //dynamic json = Newtonsoft.Json.JsonConvert.DeserializeObject(responseContent);
            return responseContent;
        }

        Logger.DefaultLogger.LogError(LogLevel.NORMAL, "GoogleLoginManager.FnDownloadString", "error fetching string, code: " + response.StatusCode);
        return "";
    }
}
This is, however, still slow with the Managed HttpClient implementation.
So secondly, in Visual Studio Community for Mac, right-click your project in the solution -> Options -> set the HttpClient implementation to NSUrlSession instead of Managed.
Screenshot: Set HttpClient implementation to NSUrlSession instead of Managed
The Managed implementation is not fully integrated into iOS and does not support TLS 1.2, so it does not support the ATS requirements that are on by default in iOS 9+; see here:
https://learn.microsoft.com/en-us/xamarin/ios/app-fundamentals/ats
With both of these changes, string downloads are always very fast (well under a second).
Without them, every second or third downloadString call took over a minute.
Just FYI, there's one more thing you could try, though it shouldn't be necessary anymore:
//var authgoogle = new OAuth2Authenticator(...);
//authgoogle.Completed...

if (authgoogle.IsUsingNativeUI)
{
    // Step 2.1: creating the login UI.
    // In order to access the SFSafariViewController API the cast is necessary.
    SafariServices.SFSafariViewController c = null;
    c = (SafariServices.SFSafariViewController)ui_object;
    PresentViewController(c, true, null);
}
else
{
    PresentViewController(ui_object, true, null);
}
Though in my experience, you probably don't need the SafariController.
Another alternative (also free) to Wireshark is Microsoft Network Monitor.
What browser are you using to test?
Try using the default IE install. System.Net.WebClient uses the local IE settings, proxy etc. Maybe that has been mangled?
Another cause of extremely slow WebClient downloads is the destination medium you are downloading to. If it is a slow device like a USB key, this can massively impact download speed. To my HDD I could download at 6 MB/s; to my USB key, only 700 kB/s, even though I can copy files to that USB key at 5 MB/s from another drive. wget shows the same behavior. This is also reported here:
https://superuser.com/questions/413750/why-is-downloading-over-usb-so-slow
So if this is your scenario, an alternative solution is to download to the HDD first and then copy the files to the slow medium after the download completes, as sketched below.
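A minimal sketch of that workaround; both paths are illustrative (requires System.IO and System.Net):
string temp = Path.Combine(Path.GetTempPath(), "download.bin"); // fast local drive
using (var client = new WebClient())
{
    client.DownloadFile("http://example.com/file.bin", temp);
}
File.Copy(temp, @"E:\download.bin", true); // then copy to the slow USB medium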

How to perform a fast web request in C#

I have an HTTP-based API which I potentially need to call many times. The problem is that I can't get the request to take less than about 20 seconds, though the same request made through a browser is near-instantaneous. The following code illustrates how I have implemented it so far.
WebRequest r = HttpWebRequest.Create("https://example.com/http/command?param=blabla");
var response = r.GetResponse();
One solution would be to make an asynchronous request but I would like to know why it takes so long and if I can avoid it. I have also tried using the WebClient class but I suspect it uses a WebRequest internally.
Update:
Running the following code took about 40 seconds in Release Mode (measured with Stopwatch):
WebRequest g = HttpWebRequest.Create("http://www.google.com");
var response = g.GetResponse();
I'm working at a university where there might be things in the network configuration affecting performance, but the direct use of the browser shows that it should be near-instant.
Update 2:
I uploaded the code to a remote machine and it worked fine, so the conclusion must be that the .NET code does something extra compared to the browser, or it has problems resolving the address through the university network (proxy issues or something?!).
This problem is similar to another Stack Overflow post: HttpWebRequest is extremely slow (question 2519655).
Most of the time the problem is the proxy server property. You should set this property to null; otherwise the object will attempt to search for an appropriate proxy server to use before going directly to the source. Note: this behavior is turned on by default, so you have to explicitly tell the object not to perform this proxy search:
request.Proxy = null;
using (var response = (HttpWebResponse)request.GetResponse())
{
}
I was getting the 30-second delay on the 'first' attempt. JamesR's reference to the other post mentioning setting the proxy to null solved it instantly!
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(_site.url);
request.Proxy = null; // <-- this is the good stuff
...
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Does your site have an invalid SSL cert? Try adding this
using System.Net.Security;
using System.Security.Cryptography.X509Certificates;

ServicePointManager.ServerCertificateValidationCallback =
    new RemoteCertificateValidationCallback(AlwaysAccept);

//... somewhere AlwaysAccept is defined as:
public bool AlwaysAccept(object sender, X509Certificate certification, X509Chain chain, SslPolicyErrors sslPolicyErrors)
{
    return true;
}
You don't close your response. As soon as you hit the number of allowed connections, you have to wait for the earlier ones to time out. Try:
using (var response = g.GetResponse())
{
// do stuff with your response
}

Posting using POST from C# over https

After wasting two days on this question (and trying to make it work), I've decided to take a step back and ask a more basic question, because apparently there's something I don't know or am doing wrong.
The requirements are simple: I need to make an HTTP POST (passing a few values) over HTTPS from C#.
The website (if given the appropriate values) will return some simple HTML and a response code (I'll show these later).
It's really that simple. The "webservice" works. I have a PHP sample that works and successfully connects to it. I also have a Delphi "demo" application (with source code) that works. And finally I have the demo application (binary) from the company that runs the "service", which of course also works.
But I need to do it from C#. As simple as that sounds, it is not working.
For testing purposes I've created a simple console app and a simple connect method. I've tried about 7 different ways to create an HTTP request, all more or less the same thing with different implementations (using WebClient, using HttpWebRequest, etc.).
Every method works, except when the URI begins with 'https'.
I get a WebException saying that the remote server returned 404. I've installed Fiddler (as suggested by an SO user) and investigated the traffic a little. The 404 is because I am passing something wrong, because as I mentioned, the 'service' works. I'll talk about the Fiddler results later.
The URL where I have to POST the data is: https://servicios.mensario.com/enviomasivo/apip/
And this is the POST data: (the values are fakes)
usuario=SomeUser&clave=SomePassword&nserie=01234567890123456789&version=01010000&operacion=220
The server may return a two- or three-line response (sorry about the Spanish, but the company is from Spain). Here's a sample of a possible response:
HTTP/1.1 200 OK
Content-Type: text/plain
01010000 100 BIEN
998
And here's another
HTTP/1.1 200 OK
Content-Type: text/plain
01010000 20 AUTENTIFICACION NEGATIVA
Ha habido un problema en la identificación ante el servidor. Corrija sus datos de autentificacion.
The 1st one means OK, and the 2nd one is Auth Failure.
As you can see, the task is quite easy, only it doesn't work. If I use Fiddler, I see that there's some sort of SSL negotiation going on in the connection, and then everything works fine. However, as far as I've read, .NET handles all that for us (yes, I've added the callback to always accept invalid certs). I don't understand what I'm doing wrong. I can post/email the code, but what I'd like to know is very simple:
How can you make a POST over SSL using C# and a "simple" HttpWebRequest, and then have the response in a string/array/whatever for processing?
Trust me when I say I've been Googling and Stack Overflowing for two days. I don't have any sort of proxy; the connection goes through my router, standard ports, nothing fancy. My machine is a Windows Vista VMware virtual machine, but given that the sample applications (PHP, Delphi, binary) all work without issue, I can't see that as the problem.
The different samples (sans the binary) are available here if anyone wants to take a look at them.
I'd appreciate any help. If anyone wants to try with a "real" username, I have a demo user and I could pass you the user/pass for testing purposes. I only have one demo user (the one they gave me) and that's why I'm not pasting it here. I don't want to flood the user with tests ;)
I've tried (within the samples) using UTF8 and ASCII, but that didn't change anything.
I am 100% positive that there's something I have to do with SSL and I am not doing it because I don't know about it.
Thanks in advance.
Martín.
I was battling with the exact same problem a bit earlier (although in compact framework). Here's my question and my own answer to it:
Asynchronous WebRequest with POST-parameters in .NET Compact Framework
My version is asynchronous, so it's a bit more complex than what you're looking for, but the idea remains.
private string sendRequest(string url, string method, string postdata)
{
    WebRequest rqst = HttpWebRequest.Create(url);

    // only needed if you use HTTP AUTH
    //CredentialCache creds = new CredentialCache();
    //creds.Add(new Uri(url), "Basic", new NetworkCredential(this.Uname, this.Pwd));
    //rqst.Credentials = creds;

    rqst.Method = method;
    if (!String.IsNullOrEmpty(postdata))
    {
        //rqst.ContentType = "application/xml";
        rqst.ContentType = "application/x-www-form-urlencoded";
        byte[] byteData = UTF8Encoding.UTF8.GetBytes(postdata);
        rqst.ContentLength = byteData.Length;
        using (Stream postStream = rqst.GetRequestStream())
        {
            postStream.Write(byteData, 0, byteData.Length);
        }
    }
    ((HttpWebRequest)rqst).KeepAlive = false;
    using (StreamReader rsps = new StreamReader(rqst.GetResponse().GetResponseStream()))
    {
        string strRsps = rsps.ReadToEnd();
        return strRsps;
    }
}
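For comparison, a shorter synchronous sketch using WebClient.UploadValues, which applies the form URL-encoding for you; the values are the fake ones from the question, and note the answer below about dropping the trailing slash (requires System.Collections.Specialized, System.Net, and System.Text):
var values = new NameValueCollection
{
    { "usuario", "SomeUser" },
    { "clave", "SomePassword" },
    { "nserie", "01234567890123456789" },
    { "version", "01010000" },
    { "operacion", "220" }
};
using (var client = new WebClient())
{
    // UploadValues POSTs as application/x-www-form-urlencoded by default
    byte[] raw = client.UploadValues("https://servicios.mensario.com/enviomasivo/apip", values);
    Console.WriteLine(Encoding.UTF8.GetString(raw));
}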
See my answer to your other question. I believe your problem may not be your C# code: the web service URL actually returns a 404 with several other tools I used, but it returns the response you indicated if you leave off the trailing slash from the web service URL, so I suggest trying that.
Oddly, it doesn't seem to matter whether the trailing slash is there when not using SSL. Something strange with that web server, I guess.
I don't know if you've already resolved this issue; it's a post from one year ago. I am Spanish and I am using Mensario too.
To send an HTTP request (this is in classic ASP, but the process is the same):
Function enviarSMS2
    Dim oHTTP, inicio
    Dim strParametros
    Dim devolver

    Set oHTTP = server.CreateObject("Msxml2.ServerXMLHTTP")
    strParametros = "usuario="&usuario&"&clave="&clave&"&nserie="&nserie&"&version=01010000&operacion=300&sms=1%0934635035526%0920041231233000%09Clinica+Paz%09Clinica+Paz+le+desea+Feliz+Navidad%2E&sms=2%0934612345678%0920041231233001%09Clinica+Paz%09Clinica+Paz+le+desea+Feliz+Navidad%2E"
    'response.Write strParametros
    'response.End

    'Open the connection with the POST method, since we are sending a request.
    oHTTP.open "POST", "https://servicios.mensario.com/enviomasivo/apip", False

    'Add the required HTTP headers...
    oHTTP.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"

    'Send the request
    oHTTP.send strParametros
    devolver = oHTTP.responsetext

    'Check whether it succeeded
    inicio = Instr(devolver, "100 BIEN")
    'response.Write "--->"&inicio
    'response.End
    if inicio <= 0 then
        enviarSMS2 = "Ha ocurrido un error en el envío del SMS." 'an error occurred while sending the SMS
    else
        enviarSMS2 = Mid(devolver, inicio+9, len(devolver))
    end if
    Set oHTTP = Nothing
End Function
The only thing I don't have is a user/password to try it with :)
Basically, when the response contains "100 BIEN" the function returns the rest of the message; otherwise it returns an error message. Hope it helps :)
