In .NET, failure to retrieve HTTP resource from W3C web site - c#

Retrieving the resource at http://www.w3.org/TR/xmlschema11-1/XMLSchema.xsd takes around 10 seconds using the following mechanisms:
web browser
curl
Java URL.openConnection()
It's possible that the W3C site is applying some "throttling" - deliberately slowing the response to discourage bulk requests.
Trying to retrieve the same resource from a C# application on .NET, I get a timeout after about 60-70 seconds. I've tried a couple of different approaches, both with the same result:
System.Xml.XmlUrlResolver.GetEntity()
new WebClient().OpenRead(uri)
Anyone have any idea what's going on? Would another API, or some configuration options, solve the problem?

The problem is that they are (probably) checking for a User-Agent header. If it's not present, they send you to purgatory. .NET's HTTP clients do not set one by default.
So, give this a shot:
private static readonly HttpClient _client = new HttpClient();
public static async Task TestMe()
{
    using (var req = new HttpRequestMessage(HttpMethod.Get,
        "http://www.w3.org/TR/xmlschema11-1/XMLSchema.xsd"))
    {
        req.Headers.Add("user-agent",
            "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3 like Mac OS X)");
        using (var resp = await _client.SendAsync(req))
        {
            resp.EnsureSuccessStatusCode();
            var data = await resp.Content.ReadAsStringAsync();
        }
    }
}
No idea why they do this; maybe it's a bug in their back end? (I sure wouldn't want to leave a socket open longer than it needs to be for no good reason.) The request still takes 10-15 seconds, but that's better than the 120+ second timeout.
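The question used WebClient rather than HttpClient, so here is a minimal sketch of setting the same header on both, assuming the User-Agent theory above is correct. The class name and the "MyApp/1.0" token are hypothetical, and whether the W3C server accepts an arbitrary token has not been verified here.

```csharp
using System;
using System.Net;
using System.Net.Http;

public static class UserAgentClients
{
    // Hypothetical product token, chosen for illustration only.
    public const string UserAgent = "MyApp/1.0";

    // HttpClient that sends the User-Agent header on every request.
    public static HttpClient CreateHttpClient()
    {
        var client = new HttpClient();
        client.DefaultRequestHeaders.UserAgent.ParseAdd(UserAgent);
        return client;
    }

    // WebClient equivalent, for code that calls OpenRead(uri).
    public static WebClient CreateWebClient()
    {
        var client = new WebClient();
        client.Headers[HttpRequestHeader.UserAgent] = UserAgent;
        return client;
    }
}
```

On .NET Framework, WebClient may drop this header from its collection after a request, so it is probably safest to set it again before each call when reusing one WebClient instance.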

Related

.Net C# RESTSharp 10 Minute Timeout

I have embedded a browser control into a .NET form and compiled it as a Windows executable. The browser control displays our HTML5 image viewer. The application opens sockets so it can listen for "push" requests from various servers. This allows images to be pushed to individual users' desktops.
When an incoming image push request comes in, the application calls a REST service using RESTSharp to generate a token for the viewer to use to display the image.
As long as requests arrive consistently, everything works great. If there is a lull (10 minutes seems to be the time frame), the RESTSharp request times out. It is almost as though creating new RESTSharp instances actually reuses the old ones, as some kind of attempted .NET optimization.
Here is the RESTSharp code I am using:
private async Task<string> postJsonDataToUrl(string lpPostData) {
    IRestClient client = new RestClient(string.Format("{0}:{1}", MstrScsUrlBase, MintScsUrlPort));
    IRestRequest request = new RestRequest(string.Format("{0}{1}{2}", MstrScsUrlContextRoot, MstrScsUrlPath, SCS_GENERATE_TOKEN_URL_PATH));
    request.Timeout = 5000;
    request.ReadWriteTimeout = 5000;
    request.AddParameter("application/json", lpPostData, ParameterType.RequestBody);
    IRestResponse response = await postResultAsync(client, request);
    return response.Content;
} // postJsonDataToUrl
private static Task<IRestResponse> postResultAsync(IRestClient client, IRestRequest request) {
    return client.ExecutePostTaskAsync(request);
} // PostResultAsync
This is the line where the time out occurs:
IRestResponse response = await postResultAsync(client, request);
I have tried rewriting this using .Net's HttpWebRequest and I get the same problem.
If I lengthen the RESTSharp timeouts, I am able to make calls to the server (using a different client) while the application is "timing out" so I know the server isn't the issue.
The initial version of the code did not have the await async call structure - that was added as an attempt to get more information on the problem.
I am not getting any errors other than the REST timeout.
I have had limited success with forcing a Garbage Collection with this call:
GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
Any thoughts?
It is possible you are hitting the connection limit for .NET apps, as described in the Microsoft docs:
"By default, an application using the HttpWebRequest class uses a maximum of two persistent connections to a given server, but you can set the maximum number of connections on a per-application basis."
(https://learn.microsoft.com/en-us/dotnet/framework/network-programming/managing-connections).
Closing the connections should help, or you might be able to increase that limit; the same doc page explains how.
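A minimal sketch of raising that limit, assuming a .NET Framework app where HttpWebRequest (and therefore older RestSharp) goes through ServicePointManager. The numeric values are illustrative, and the two idle-time settings are an extra assumption (that a stale pooled connection is being reused after the lull), not something the quoted doc prescribes for this case.

```csharp
using System;
using System.Net;

public static class ConnectionSettings
{
    // Call once at startup, before the first request to 'server' is created.
    public static ServicePoint Configure(Uri server)
    {
        // Raise the per-host limit from the default of 2 (illustrative value).
        ServicePointManager.DefaultConnectionLimit = 10;

        ServicePoint servicePoint = ServicePointManager.FindServicePoint(server);
        // Assumption: close idle connections after 30 s instead of the
        // default 100 s, and recycle active ones after 60 s.
        servicePoint.MaxIdleTime = 30000;
        servicePoint.ConnectionLeaseTimeout = 60000;
        return servicePoint;
    }
}
```

For example, `ConnectionSettings.Configure(new Uri("http://example.com:8080/"))` with the real base URL in place of the placeholder. On .NET Core and later, HttpClient does not read these settings, so this only applies to the HttpWebRequest stack.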
I ended up putting
GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
in a timer that fired every 2 minutes. This completely solved my issue.
This is very surprising to me since my HttpWebRequest code was wrapped in "using" statements so the resources should have been released properly. I can only conclude that .Net was optimizing the use of the class and was trying to reuse a stale class rather than allow me to create a new one from scratch.
A newer way of doing this:
var body = @"{ ""key"": ""value"" }";
// HTTP package
var request = new RestRequest("https://localhost:5001/api/networkdevices", Method.Put);
request.AddHeader("Content-Type", "application/json");
request.AddHeader("Keep-Alive", ""); // setting "timeout=120" works as well
// Note: RestSharp versions where Timeout is an int interpret it as milliseconds, so 120 here means 120 ms.
request.Timeout = 120;
request.AddBody(body);
// HTTP call
var client = new RestClient();
RestResponse response = await client.ExecuteAsync(request);
Console.WriteLine(response.Content);

Big size of ServicePoint object after several hours sending HTTP request in parallel

We are using HttpClient to send requests to remote Web API in parallel:
public async Task<HttpResponseMessage> PostAsync(HttpRequestInfo httpRequestInfo)
{
    using (var httpClient = new HttpClient())
    {
        httpClient.BaseAddress = new Uri(httpRequestInfo.BaseUrl);
        if (httpRequestInfo.RequestHeaders.Any())
        {
            foreach (var requestHeader in httpRequestInfo.RequestHeaders)
            {
                httpClient.DefaultRequestHeaders.Add(requestHeader.Key, requestHeader.Value);
            }
        }
        return await httpClient.PostAsync(httpRequestInfo.RequestUrl, httpRequestInfo.RequestBody);
    }
}
This API can be called by several threads concurrently. After running for about four hours we saw what looked like a memory leak: according to the profiling tool there are two ServicePoint objects, one of which is quite big, about 160 MB.
As far as I can tell, the code above has some problems:
We should share HttpClient instances as much as possible. In our case the request address and headers may vary a lot, so is this something we can improve, or does it not hurt performance much? I am thinking of keeping a dictionary to store and look up HttpClient instances.
We didn't modify the DefaultConnectionLimit of ServicePoint, so by default it can only send two requests to the same server concurrently. If we raise this value, will the memory problem go away?
We also suppressed HTTPS certificate validation: ServicePointManager.ServerCertificateValidationCallback = delegate { return true; }; Does this have something to do with the problem?
Because this issue is not easy to reproduce (it takes a long time), I just need some ideas so that I can optimize our code for long-running use.
I'll explain the situation myself, in case others run into this issue later.
First, this is not a memory leak; it is a performance problem.
We tested our application in a virtual machine with a proxy enabled, which made the internet connection quite slow. In our case each HTTP request could take 3-4 seconds. Over time, many connections pile up in the ServicePoint queue. So it is not a memory leak; the earlier connections just don't finish quickly enough.
Once each HTTP request is no longer that slow, everything returns to normal.
We also tried to reduce the number of HttpClient instances to improve HTTP request performance:
private readonly ConcurrentDictionary<HttpRequestInfo, HttpClient> _httpClients;
private HttpClient GetHttpClient(HttpRequestInfo httpRequestInfo)
{
    if (_httpClients.ContainsKey(httpRequestInfo))
    {
        HttpClient value;
        if (!_httpClients.TryGetValue(httpRequestInfo, out value))
        {
            throw new InvalidOperationException("It seems there is no related http client in the dictionary.");
        }
        return value;
    }
    var httpClient = new HttpClient { BaseAddress = new Uri(httpRequestInfo.BaseUrl) };
    if (httpRequestInfo.RequestHeaders.Any())
    {
        foreach (var requestHeader in httpRequestInfo.RequestHeaders)
        {
            httpClient.DefaultRequestHeaders.Add(requestHeader.Key, requestHeader.Value);
        }
    }
    httpClient.DefaultRequestHeaders.ExpectContinue = false;
    httpClient.DefaultRequestHeaders.ConnectionClose = true;
    httpClient.Timeout = TimeSpan.FromMinutes(2);
    if (!_httpClients.TryAdd(httpRequestInfo, httpClient))
    {
        throw new InvalidOperationException("Adding new http client thrown an exception.");
    }
    return httpClient;
}
In our case, only the request body differs for the same server address. I also overrode the Equals and GetHashCode methods of HttpRequestInfo.
We also set ServicePointManager.DefaultConnectionLimit = int.MaxValue;
Hope this helps.
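One caveat with the check-then-add lookup in GetHttpClient: two threads that miss at the same time will both build a client, and the second TryAdd then fails and throws. A possible alternative is GetOrAdd with Lazy<HttpClient>, sketched below. HttpRequestInfo is not shown in the post, so this sketch keys the cache by base URL string instead; the HttpClientCache name and that key choice are assumptions for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Net.Http;

public sealed class HttpClientCache
{
    // Lazy<T> means only one HttpClient is ever built per key, even if
    // GetOrAdd invokes its factory more than once under contention.
    private readonly ConcurrentDictionary<string, Lazy<HttpClient>> _clients =
        new ConcurrentDictionary<string, Lazy<HttpClient>>(StringComparer.OrdinalIgnoreCase);

    public HttpClient GetHttpClient(string baseUrl)
    {
        return _clients.GetOrAdd(
            baseUrl,
            key => new Lazy<HttpClient>(() => Create(key))).Value;
    }

    private static HttpClient Create(string baseUrl)
    {
        // Same timeout as the original answer; other settings omitted.
        return new HttpClient
        {
            BaseAddress = new Uri(baseUrl),
            Timeout = TimeSpan.FromMinutes(2)
        };
    }
}
```

Repeated calls with the same base URL return the same HttpClient instance, so no call ever throws because another thread added the entry first.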

How do I remove the delay between HTTP Requests when using Asynchronous actions in ASP.NET?

I am using HttpClient to send a GET request to a server inside of a while loop
while (cycle < maxcycle)
{
    var searchParameters = new ASearchParameters
    {
        Page = cycle++,
        id = getid
    };
    var searchResponse = await Client.SearchAsync(searchParameters);
}
and SearchAsync contains
public async Task<AuctionResponse> SearchAsync(ASearchParameters searchParameters)
{
    var uriString = "Contains a https url with parameters";
    var searchResponseMessage = await HttpClient.GetAsync(uriString);
    return await Deserialize<AuctionResponse>(searchResponseMessage);
}
The problem is that after every request there is a delay before the next request starts.
You can see this in the Fiddler timeline, and Fiddler also shows a "Tunnel To" example.com:443 entry before every request.
Question: Why is there a delay, and how can I remove it?
I see two things that are happening here. First, depending on the deserializer, it may take a while to translate your response back into an object. You might want to time that step and see if that's not the majority of your time spent. Second, the SSL handshake (the origin of your "tunnel to") does require a round trip to establish the SSL channel. I thought HttpClient sent a Keep-Alive header by default, but you may want to see if it is A) not being sent or B) being rejected. If you are re-establishing an SSL channel for each request, that could easily take on the order of a hundred ms all by itself (depending upon the server/network load).
If you're using Fiddler, you can enable the ability to inspect SSL traffic to see what the actual request/response headers are.
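To follow the suggestion of timing each step, a small helper like the one below could be wrapped around the request and the deserialization separately. This is only a sketch; StepTimer and TimeAsync are hypothetical names, not part of any library.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class StepTimer
{
    // Runs one awaited step and reports its result and how long it took.
    public static async Task<(T Result, TimeSpan Elapsed)> TimeAsync<T>(Func<Task<T>> step)
    {
        var stopwatch = Stopwatch.StartNew();
        T result = await step();
        stopwatch.Stop();
        return (result, stopwatch.Elapsed);
    }
}
```

In the question's code this would mean one call such as `await StepTimer.TimeAsync(() => HttpClient.GetAsync(uriString))` and a second one around `Deserialize<AuctionResponse>(searchResponseMessage)`. Comparing the two elapsed times shows whether the gap comes from the network or from deserialization.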
I believe you see this delay for a couple of reasons. Based on the code you provided, all other actions besides the request itself take up some fraction of the time between requests. So deserializing the response will add to a delay.
Also, the delay might be tied to the amount of data that is being returned and processed further down the stack. I tried to recreate the scenario you describe in your question with the following code:
const int MaxNumberOfCycles = 10;
static void Main()
{
    Start().Wait();
}
// Must be static so that the static Main can call it.
static async Task Start()
{
    var client = new Client();
    var cycle = 0;
    while (cycle < MaxNumberOfCycles)
    {
        var response = await client.SearchAsync(cycle++);
    }
}
class Client
{
    public async Task<HttpResponseMessage> SearchAsync(int n)
    {
        // parameter 'n' used to vary web service response data
        var url = ... // url removed for privacy
        using (var client = new HttpClient())
        using (var response = await client.GetAsync(url))
        {
            return response;
        }
    }
}
With small response sizes I saw no delay between requests. As response sizes increased I began to see slightly longer delays. Here's a screenshot for a series of requests returning 1MB responses:
One thing I noticed about your scenario is that your transfer activity graph shows a solid black line at the end of each request. This line indicates the "time to first byte", meaning that response processing did not even start until the very end of your request.
Another possibility to consider is that Fiddler itself is causing these delays. I noticed that your responses aren't being streamed by Fiddler, which probably affects the results. You can read more about response streaming in Fiddler.
I hope some of this information helps...

WebRequest "HEAD" light weight alternative

I recently discovered that the following does not work with certain sites, such as IMDB.com.
class Program
{
    static void Main(string[] args)
    {
        try
        {
            System.Net.WebRequest wc = System.Net.WebRequest.Create("http://www.imdb.com"); //args[0]);
            ((HttpWebRequest)wc).UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.19 (KHTML, like Gecko) Chrome/0.2.153.1 Safari/525.19";
            wc.Timeout = 1000;
            wc.Method = "HEAD";
            WebResponse res = wc.GetResponse();
            var streamReader = new System.IO.StreamReader(res.GetResponseStream());
            Console.WriteLine(streamReader.ReadToEnd());
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
It returns an HTTP 405 (Method Not Allowed). My problem is that I use code very similar to the above to check whether a link is valid, and the vast majority of the time it works correctly. I can switch the method to GET and it works (with an increased timeout), but this slows things down by an order of magnitude. I am assuming the 405 response comes from the server configuration on IMDB's side.
Is there a way for me to do the same thing as above in a lightweight manner in .NET? Or is there a way to fix the above code so it works as a GET request that works with IMDB?
Open the connection yourself with a socket (instead of an HttpRequest or WebClient), and close the stream as soon as you've read the status code. Fortunately the status code comes near the top of the response stream :)
You'll have to clarify what you mean by "lightweight". What are you trying to accomplish?
Whether or not you can use GET/POST/HEAD/DELETE/etc will depend on the URL and what's configured in the application that is running on the server at that URL.
If all you're trying to do is see whether you can make a connection without actually downloading the content, you could maybe try just initiating a connection to port 80 using sockets, but there isn't really a reliable or universally supported way to do it just by changing the HTTP method.
If HEAD returns a 405, that means the server doesn't support HEAD (at least for that URL) and you'll have to fall back to GET instead. The majority of sites should support HEAD, so you probably want to do HEAD by default, but if it returns a 405, you could fall back to GET for that domain. Or maybe you want to try HEAD first for each request; YMMV.
If the server requires GET and you want to reduce network traffic, you could try doing a conditional GET and/or a partial GET (see e.g. RFC2616). I've never tried doing those with WebRequest but I think it lets you add custom outgoing HTTP headers, so you should be able to do it.
Also, don't forget that, if you're writing a spider (which you clearly are), you should respect the server's robots.txt, and it's also courteous to throttle your requests to something like one request every two seconds, so you don't slashdot the server.
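The HEAD-then-GET fallback and the partial GET mentioned above could be combined roughly as follows. This is a sketch using HttpClient rather than WebRequest; LinkChecker, BuildProbe and GetStatusAsync are hypothetical names, and whether a given server honours the Range header is not guaranteed.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static class LinkChecker
{
    private static readonly HttpClient Client = new HttpClient();

    // Builds either a HEAD request or a GET that asks for the first byte only.
    public static HttpRequestMessage BuildProbe(string url, bool useHead)
    {
        var request = new HttpRequestMessage(useHead ? HttpMethod.Head : HttpMethod.Get, url);
        if (!useHead)
        {
            // Servers that ignore Range simply answer 200 with the full body.
            request.Headers.Range = new RangeHeaderValue(0, 0);
        }
        return request;
    }

    public static async Task<HttpStatusCode> GetStatusAsync(string url)
    {
        using (var head = BuildProbe(url, true))
        using (var response = await Client.SendAsync(head))
        {
            if (response.StatusCode != HttpStatusCode.MethodNotAllowed)
            {
                return response.StatusCode;
            }
        }

        // HEAD was refused: retry with GET and return once the headers arrive,
        // without reading the body.
        using (var get = BuildProbe(url, false))
        using (var response = await Client.SendAsync(get, HttpCompletionOption.ResponseHeadersRead))
        {
            return response.StatusCode;
        }
    }
}
```

A server that honours the range answers 206 (Partial Content) rather than 200, so a caller checking link validity should treat both as success.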

System.Net.WebClient unreasonably slow

When using the System.Net.WebClient.DownloadData() method I'm getting an unreasonably slow response time.
When fetching a URL using the WebClient class in .NET, it takes around 10 seconds before I get a response, while the same page is fetched by my browser in under 1 second.
And this is with data that's 0.5kB or smaller in size.
The request involves POST/GET parameters and a user agent header, in case that could cause problems.
I haven't (yet) checked whether other ways of downloading data in .NET give me the same problems, but I suspect I might get similar results. (I've always had a feeling web requests in .NET are unusually slow...)
What could be the cause of this?
Edit:
I tried doing the exact thing using System.Net.HttpWebRequest instead, using the following method, and all requests finish in under 1 sec.
public static string DownloadText(string url)
{
    var request = (HttpWebRequest)WebRequest.Create(url);
    var response = (HttpWebResponse)request.GetResponse();
    using (var reader = new StreamReader(response.GetResponseStream()))
    {
        return reader.ReadToEnd();
    }
}
While this (old) method using System.Net.WebClient takes 15-30s for each request to finish:
public static string DownloadText(string url)
{
    var client = new WebClient();
    byte[] data = client.DownloadData(url);
    return client.Encoding.GetString(data);
}
I had that problem with WebRequest. Try setting Proxy = null;
WebClient wc = new WebClient();
wc.Proxy = null;
By default, WebClient and WebRequest try to determine which proxy to use from the IE settings; sometimes this adds a delay of around 5 seconds before the actual request is sent.
This applies to all classes that use WebRequest, including WCF services with HTTP binding.
In general you can use this static code at application startup:
WebRequest.DefaultWebProxy = null;
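For code that uses HttpClient instead of WebClient, the equivalent switch lives on the handler. A minimal sketch, assuming that proxy auto-detection is also the cause of the delay there; NoProxyClient is a hypothetical helper name.

```csharp
using System.Net.Http;

public static class NoProxyClient
{
    // Handler that skips proxy lookup entirely.
    public static HttpClientHandler CreateHandler()
    {
        return new HttpClientHandler { UseProxy = false };
    }

    // HttpClient built on that handler; intended to be created once and reused.
    public static HttpClient Create()
    {
        return new HttpClient(CreateHandler());
    }
}
```

This only makes sense when no proxy is actually required to reach the server; behind a mandatory proxy the requests would fail instead of merely being slow.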
Download Wireshark here http://www.wireshark.org/
Capture the network packets and filter the "http" packets.
It should give you the answer right away.
There is nothing inherently slow about .NET web requests; that code should be fine. I regularly use WebClient and it works very quickly.
How big is the payload in each direction? Silly question maybe, but is it simply bandwidth limitations?
IMO the most likely thing is that your web-site has spun down, and when you hit the URL the web-site is slow to respond. This is then not the fault of the client. It is also possible that DNS is slow for some reason (in which case you could hard-code the IP into your "hosts" file), or that some proxy server in the middle is slow.
If the web-site isn't yours, it is also possible that they are detecting atypical usage and deliberately injecting a delay to annoy scrapers.
I would grab Fiddler (a free, simple web inspector) and look at the timings.
WebClient may be slow on some workstations when Automatic Proxy Settings is checked in the IE settings (Connections tab - LAN Settings).
Setting WebRequest.DefaultWebProxy = null; or client.Proxy = null didn't do anything for me, using Xamarin on iOS.
I did two things to fix this:
I wrote a downloadString function which does not use WebRequest and System.Net:
public static async Task<string> FnDownloadStringWithoutWebRequest(string url)
{
    using (var client = new HttpClient())
    {
        //Define Headers
        client.DefaultRequestHeaders.Accept.Clear();
        client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
        var response = await client.GetAsync(url);
        if (response.IsSuccessStatusCode)
        {
            string responseContent = await response.Content.ReadAsStringAsync();
            //dynamic json = Newtonsoft.Json.JsonConvert.DeserializeObject(responseContent);
            return responseContent;
        }
        Logger.DefaultLogger.LogError(LogLevel.NORMAL, "GoogleLoginManager.FnDownloadString", "error fetching string, code: " + response.StatusCode);
        return "";
    }
}
This is however still slow with Managed HttpClient.
So secondly, in Visual Studio Community for Mac, right click on your Project in the Solution -> Options -> set HttpClient implementation to NSUrlSession, instead of Managed.
Managed is not fully integrated into iOS, doesn't support TLS 1.2, and thus does not support the ATS standards set as default in iOS9+, see here:
https://learn.microsoft.com/en-us/xamarin/ios/app-fundamentals/ats
With both these changes, string downloads are always very fast (<<1s).
Without both of these changes, on every second or third try, downloadString took over a minute.
Just FYI, there's one more thing you could try, though it shouldn't be necessary anymore:
//var authgoogle = new OAuth2Authenticator(...);
//authgoogle.Completed...
if (authgoogle.IsUsingNativeUI)
{
    // Step 2.1 Creating Login UI
    // In order to access SFSafariViewController API the cast is necessary
    SafariServices.SFSafariViewController c = null;
    c = (SafariServices.SFSafariViewController)ui_object;
    PresentViewController(c, true, null);
}
else
{
    PresentViewController(ui_object, true, null);
}
Though in my experience, you probably don't need the SafariController.
Another alternative (also free) to Wireshark is Microsoft Network Monitor.
What browser are you using to test?
Try using the default IE install. System.Net.WebClient uses the local IE settings, proxy etc. Maybe that has been mangled?
Another cause for extremely slow WebClient downloads is the destination media to which you are downloading. If it is a slow device like a USB key, this can massively impact download speed. To my HDD I could download at 6MB/s, to my USB key, only 700kb/s, even though I can copy files to this USB at 5MB/s from another drive. wget shows the same behavior. This is also reported here:
https://superuser.com/questions/413750/why-is-downloading-over-usb-so-slow
So if this is your scenario, an alternative solution is to download to HDD first and then copy files to the slow medium after download completes.
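The download-then-copy workaround could look roughly like this. It is a sketch assuming the system temp directory sits on a fast local disk; StagedDownload and its method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Net;

public static class StagedDownload
{
    // Runs 'download' against a temp file on the local disk, then copies the
    // finished file to its final (possibly slow) destination.
    public static void DownloadVia(Action<string> download, string destinationPath)
    {
        string tempPath = Path.GetTempFileName();
        try
        {
            download(tempPath);
            File.Copy(tempPath, destinationPath, true);
        }
        finally
        {
            // Always clean up the staging file, even if the download failed.
            File.Delete(tempPath);
        }
    }

    // WebClient-based convenience wrapper.
    public static void DownloadFile(string url, string destinationPath)
    {
        using (var client = new WebClient())
        {
            DownloadVia(temp => client.DownloadFile(url, temp), destinationPath);
        }
    }
}
```

Separating the staging logic from the WebClient call keeps the copy step usable with any other downloader as well.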
