Even though I am properly terminating everything, when I check existing HTTP connections I see they are not terminated.
For example, when I open 200 concurrent connections by starting different tasks, I see:
158 Established HTTP connections
927 TimeWait
95 SynSent
24 LastAck
6 CloseWait
34 FinWait
The worst part is that the number of TimeWait connections keeps increasing each minute.
So how can I prevent this from happening?
After a while, Windows becomes unable to make any new requests.
This problem occurs when I use web proxies: Too many proxy connection kills window's resolving hosts ability
Here when I use 200 connections with different proxies
Connections in TimeWait state can generate a performance problem.
First, take a look at TCP State diagram,
https://en.wikipedia.org/wiki/File:Tcp_state_diagram_fixed_new.svg
TimeWait is the state a TCP connection enters on the side that closes first, after that machine's TCP has sent the ACK segment in response to the FIN segment received from its peer (details in RFC 793, which defined TCP back in 1981: http://www.ietf.org/rfc/rfc793.txt). During this state the socket resources, including the TCB (TCP Control Block) and, of course, the port, are not released to the OS. They are released only after a timeout expires. The original reason for the wait is a form of the Two Generals problem between peers on an unreliable medium: the final ACK may be lost, so the socket is kept around in case the peer retransmits its FIN. How long the connection stays in TimeWait is configurable, with a default that depends on the operating system.
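To watch how many connections sit in each state while a test runs, the counts can be collected from code. The following is a minimal sketch using `IPGlobalProperties` from `System.Net.NetworkInformation`; the class name is made up for illustration, and note that it counts all TCP connections on the machine, not only HTTP ones.

```csharp
using System;
using System.Linq;
using System.Net.NetworkInformation;

static class TcpStateCounter
{
    static void Main()
    {
        // Snapshot of all TCP connections currently known to the local machine.
        TcpConnectionInformation[] connections =
            IPGlobalProperties.GetIPGlobalProperties().GetActiveTcpConnections();

        // Group by state (Established, TimeWait, SynSent, ...) and print the counts,
        // largest group first.
        foreach (var group in connections.GroupBy(c => c.State)
                                         .OrderByDescending(g => g.Count()))
        {
            Console.WriteLine("{0,-12} {1}", group.Key, group.Count());
        }
    }
}
```

Running it once a minute during a load test should show whether the TimeWait count levels off or keeps growing.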
These links can help you to set the TcpTimedWaitDelay parameter in Windows:
https://technet.microsoft.com/en-us/library/cc938217.aspx
http://msdn.microsoft.com/en-us/library/ee377084%28v=bts.10%29.aspx
It says the default value is 240 seconds, but in my tests I experienced lower times (between 60 and 120 seconds).
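Anyway, today's networks are more reliable, and web services that require high performance and throughput should reduce this value. I would suggest setting it to just 5 seconds. If you want to be more conservative, set it to 30 seconds. (The documentation linked above lists a valid range of 30 to 300 seconds, so check that a lower value is actually honored on your Windows version.)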
Another parameter that could be useful is the maximum number of ephemeral ports that Windows allows a client to open. Windows Server limits the maximum number of ephemeral TCP ports by default; on some Windows versions this value is 5000. You can change this by setting the MaxUserPort value in the registry.
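As a rough sketch of what setting these two values looks like from code, the snippet below writes them under the TCP/IP parameters key with `Microsoft.Win32.Registry`. The class name and the chosen numbers are only illustrative (30 seconds is the conservative figure mentioned above, 65534 is assumed here as an upper limit for MaxUserPort). It needs administrator rights, a reboot is normally required for the change to take effect, and newer Windows versions may manage the dynamic port range through other means, so treat this as a starting point rather than a recipe.

```csharp
using System;
using Microsoft.Win32;

static class TcpRegistryTuning
{
    // Registry location of the TCP/IP parameters on Windows.
    const string ParametersKey = @"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters";

    static void Main()
    {
        // Open the key for writing; this fails without administrator rights.
        using (RegistryKey key = Registry.LocalMachine.OpenSubKey(ParametersKey, true))
        {
            if (key == null)
            {
                Console.WriteLine("TCP/IP parameters key not found.");
                return;
            }

            // Illustrative values only; both are stored as DWORDs.
            key.SetValue("TcpTimedWaitDelay", 30, RegistryValueKind.DWord);
            key.SetValue("MaxUserPort", 65534, RegistryValueKind.DWord);

            // Read the values back to confirm what was written.
            Console.WriteLine("TcpTimedWaitDelay = {0}", key.GetValue("TcpTimedWaitDelay"));
            Console.WriteLine("MaxUserPort       = {0}", key.GetValue("MaxUserPort"));
        }
    }
}
```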
Related
I have a .NET application (running on a Windows PC) which communicates with a Linux box via OPC UA. The use case here is to make ~40 read requests to the server serially. Once these 40 read calls are complete, the next cycle of 40 read calls begins. Each read call returns a response from the server carrying a payload of ~16 KB, which is fragmented and delivered to the client. For most requests, the server finishes delivering the complete response within 5 ms. However, for some requests it takes ~300 ms to complete.
In scenarios where this delay exists, I can see the following pattern of re-transmissions.
[71612] A new Read request is sent to the server.
[71613-71630] The response is delivered to the client.
[71631] A new Read request is sent to the server.
[71632] A TCP Spurious Retransmission occurs from the server for packet [71844] with Seq No. 61624844
[71633] Client sends a DUP ACK for the packet.
[71634] Client does a TCP Retransmission for the read request in [71846] after 288ms
This delay adds up and causes some 5-6 seconds of delay for a complete cycle of 40 requests. I want to figure out what is causing these retransmissions (and hence the delays) and what can be done to:
Reduce the frequency of retransmissions.
Reduce the 300ms delay from the client side to quickly retransmit the obstructed read request.
I have tried disabling the Nagle algorithm on the server to possibly improve performance but it did not have any effect. Also, when reducing the response size by half (8KB), the retransmissions are rare and hence the delay is minute as well. But reducing the response is not a valid solution in our use case.
The connection to the Linux box goes through a switch; when connecting to it directly, point-to-point, there is only a marginal reduction in the delay.
I can share relevant code but I think this issue is likely with the TCP stack (or at least, some configuration that should be enabled?) hence it would make little difference.
I have two C# asp.net applications running on IIS:
The main application creates up to 80 threads, each of which establishes an HTTP connection to a certain endpoint (the same endpoint for all of them, on the LAN) roughly every 3 seconds.
That endpoint is being hosted on localhost (e.g. localhost:4510).
This endpoint is the second application which represents the "driver" that will ultimately establish a connection to a device within LAN.
So it's totally possible to have 80 threads trying to make a request to driver/device at the same time.
Over time the app seems to have issues with anything involving HttpClients: RavenDB, Elasticsearch and also the 80 threads.
I read a few things about the ServicePointManager class, especially DefaultConnectionLimit and MaxServicePoints, and how they influence HTTP throughput.
I only have a basic understanding of the underlying mechanism, so I'd like to ask whether I should focus on a specific subject, or what I should check to improve HTTP throughput.
Update:
With the current configuration, CPU load and memory consumption are both low.
The following code shows how the 80 HttpClients that connect to the driver on localhost:4510 are created:
var driverBaseAddressSp = ServicePointManager.FindServicePoint(driverBaseAddress);
Debug.WriteLine(driverBaseAddressSp.ConnectionLimit);
Debug.WriteLine(driverBaseAddressSp.MaxIdleTime);
var connectionUriSp = ServicePointManager.FindServicePoint(connectionUri);
Debug.WriteLine(connectionUriSp.ConnectionLimit);
Debug.WriteLine(connectionUriSp.MaxIdleTime);
return new HttpClient { BaseAddress = driverBaseAddress };
ConnectionLimit shows int.MaxValue when debugging, but I cannot find any configuration for it in the solution?
We have written a simple WebSocket client using System.Net.WebSockets. The KeepAliveInterval on the ClientWebSocket is set to 30 seconds.
The connection is opened successfully and traffic flows as expected in both directions, or if the connection is idle, the client sends Pong requests every 30 seconds to the server (visible in Wireshark).
But after 100 seconds the connection is abruptly terminated due to the TCP socket being closed at the client end (watching in Wireshark we see the client send a FIN). The server responds with a 1001 Going Away before closing the socket.
After a lot of digging we have tracked down the cause and found a rather heavy-handed workaround. Despite a lot of Google and Stack Overflow searching we have only seen a couple of other examples of people posting about the problem and nobody with an answer, so I'm posting this to save others the pain and in the hope that someone may be able to suggest a better workaround.
The source of the 100 second timeout is that the WebSocket uses a System.Net.ServicePoint, which has a MaxIdleTime property to allow idle sockets to be closed. On opening the WebSocket if there is an existing ServicePoint for the Uri it will use that, with whatever the MaxIdleTime property was set to on creation. If not, a new ServicePoint instance will be created, with MaxIdleTime set from the current value of the System.Net.ServicePointManager MaxServicePointIdleTime property (which defaults to 100,000 milliseconds).
The issue is that neither WebSocket traffic nor WebSocket keep-alives (Ping/Pong) appear to register as traffic as far as the ServicePoint idle timer is concerned. So exactly 100 seconds after opening the WebSocket it just gets torn down, despite traffic or keep-alives.
Our hunch is that this may be because the WebSocket starts life as an HTTP request which is then upgraded to a websocket. It appears that the idle timer is only looking for HTTP traffic. If that is indeed what is happening that seems like a major bug in the System.Net.WebSockets implementation.
The workaround we are using is to set the MaxIdleTime on the ServicePoint to int.MaxValue. This allows the WebSocket to stay open indefinitely. But the downside is that this value applies to any other connections for that ServicePoint. In our context (which is a Load test using Visual Studio Web and Load testing) we have other (HTTP) connections open for the same ServicePoint, and in fact there is already an active ServicePoint instance by the time that we open our WebSocket. This means that after we update the MaxIdleTime, all HTTP connections for the Load test will have no idle timeout. This doesn't feel quite comfortable, although in practice the web server should be closing idle connections anyway.
We also briefly explored whether we could create a new ServicePoint instance reserved just for our WebSocket connection, but couldn't see a clean way of doing that.
One other little twist which made this harder to track down is that although the System.Net.ServicePointManager MaxServicePointIdleTime property defaults to 100 seconds, Visual Studio is overriding this value and setting it to 120 seconds - which made it harder to search for.
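For readers who want to try the workaround, here is a minimal sketch of it. It assumes the .NET Framework HTTP stack (where `ServicePointManager` is in play) and assumes that the ServicePoint used by the WebSocket can be looked up through the http/https form of the endpoint URI; the post does not say how the ServicePoint was obtained, so verify that in your own setup. The class name, method name and URI are hypothetical.

```csharp
using System;
using System.Net;

static class ServicePointIdleWorkaround
{
    // servicePointUri: assumed to be the http(s) form of the WebSocket endpoint.
    public static void DisableIdleTimeout(Uri servicePointUri)
    {
        // Returns the existing ServicePoint for this scheme/host/port, or creates one.
        ServicePoint servicePoint = ServicePointManager.FindServicePoint(servicePointUri);

        Console.WriteLine("MaxIdleTime before: {0} ms", servicePoint.MaxIdleTime);

        // Affects every connection that shares this ServicePoint, not just the WebSocket.
        servicePoint.MaxIdleTime = int.MaxValue;

        Console.WriteLine("MaxIdleTime after:  {0} ms", servicePoint.MaxIdleTime);
    }

    static void Main()
    {
        // Hypothetical endpoint, for illustration only.
        DisableIdleTimeout(new Uri("https://example.com/"));
    }
}
```

Printing the value before changing it also reveals whether something else (such as the test host) has already overridden the 100-second default.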
I ran into this issue this week. Your workaround got me pointed in the right direction, but I believe I've narrowed down the root cause.
If a "Content-Length: 0" header is included in the "101 Switching Protocols" response from a WebSocket server, ClientWebSocket gets confused and schedules the connection for cleanup in 100 seconds.
Here's the offending code from the .Net Reference Source:
//if the returned contentlength is zero, preemptively invoke calldone on the stream.
//this will wake up any pending reads.
if (m_ContentLength == 0 && m_ConnectStream is ConnectStream) {
((ConnectStream)m_ConnectStream).CallDone();
}
According to RFC 7230 Section 3.3.2, Content-Length is prohibited in 1xx (Informational) messages, but I've found it mistakenly included in some server implementations.
For additional details, including some sample code for diagnosing ServicePoint issues, see this thread: https://github.com/ably/ably-dotnet/issues/107
I set the KeepAliveInterval for the socket to 0 like this:
theSocket.Options.KeepAliveInterval = TimeSpan.Zero;
That eliminated the problem of the WebSocket shutting down when the timeout was reached. But then again, it probably also turns off the sending of ping messages altogether.
I studied this issue recently, compared packet captures in Wireshark (webclient-client for Python and WebSocketClient for .NET), and found what happens. In WebSocketClient, Options.KeepAliveInterval only sends a packet to the server when no message has been received from the server during that period. But some servers only check whether there are active messages from the client. So we have to manually send arbitrary packets (not necessarily ping packets; WebSocketMessageType has no ping type) to the server at regular intervals, even if the server side is continuously sending packets. That's the solution.
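A minimal sketch of such a manual heartbeat is shown below. The class name, the "heartbeat" payload and the choice of a text message are assumptions for illustration; the server has to tolerate (or expect) whatever message you send. Keep in mind that ClientWebSocket supports only one outstanding send at a time, so in a real client this loop has to be coordinated with any other code that calls SendAsync.

```csharp
using System;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

static class WebSocketHeartbeat
{
    // Sends a small application-level message at a fixed interval while the socket is open.
    public static async Task RunAsync(ClientWebSocket socket, TimeSpan interval, CancellationToken token)
    {
        // Hypothetical payload; use whatever your server accepts as a no-op message.
        byte[] payload = Encoding.UTF8.GetBytes("heartbeat");

        while (socket.State == WebSocketState.Open && !token.IsCancellationRequested)
        {
            await socket.SendAsync(
                new ArraySegment<byte>(payload),
                WebSocketMessageType.Text,
                true,   // endOfMessage: the heartbeat is a complete message
                token);

            // Throws an OperationCanceledException when the token is cancelled.
            await Task.Delay(interval, token);
        }
    }
}
```

It could be started right after ConnectAsync completes, for example with `Task heartbeat = WebSocketHeartbeat.RunAsync(socket, TimeSpan.FromSeconds(30), cts.Token);`.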
Given an application that in parallel requests 100 urls at a time for 10000 urls, I'll receive the following error for 50-5000 of them:
The remote name cannot be resolved 'www.url.com'
I understand that the error means the DNS Server was unable to resolve the url. However, for each run, the number of urls that cannot be resolved changes (ranging from 50 to 5000).
Am I making too many requests too fast? And can I even do that? - Running the same test on a much more powerful server, shows that only 10 urls could not be resolved - which sounds much more realistic.
The code that does the parallel requesting:
var semp = new SemaphoreSlim(100);
var uris = File.ReadAllLines(@"C:\urls.txt").Select(x => new Uri(x));
foreach (var uri in uris)
{
    Task.Run(async () =>
    {
        await semp.WaitAsync();
        try
        {
            var result = await Web.TryGetPage(uri); // Using HttpWebRequest
        }
        finally
        {
            semp.Release(); // free the slot even if the request throws
        }
    });
}
I'll bet that you didn't know that the DNS lookup of HttpWebRequest (which is the cornerstone of all .NET HTTP APIs) happens synchronously, even when making async requests (annoying, right?). This means that firing off many requests at once causes severe ThreadPool strain and a large amount of latency. This can lead to unexpected timeouts. If you really want to step things up, don't use the .NET DNS implementation. You can use a third-party library to resolve hosts and create your web request with an IP address instead of a hostname, then manually set the Host header before firing off the request. You can achieve much higher throughput this way.
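The last step (request by IP, original name in the Host header) might look roughly like this. The class and method names are hypothetical, the IP address is a documentation placeholder, and the resolved address is assumed to come from whichever resolver you pick. This sketch only makes sense for plain HTTP: over HTTPS the certificate would be validated against the IP address and the request would normally fail.

```csharp
using System;
using System.Net;

static class HostHeaderRequest
{
    // Builds a request that connects to resolvedAddress but still asks for the original host.
    public static HttpWebRequest Create(Uri original, IPAddress resolvedAddress)
    {
        // Same scheme, port, path and query as the original, with the IP as the host.
        var builder = new UriBuilder(original) { Host = resolvedAddress.ToString() };

        var request = (HttpWebRequest)WebRequest.Create(builder.Uri);

        // Keep the original name in the Host header so virtual hosting still works.
        // (For a non-default port, the header would need "host:port".)
        request.Host = original.Host;
        return request;
    }

    static void Main()
    {
        // Placeholder values, for illustration only.
        HttpWebRequest request = Create(
            new Uri("http://www.example.com/page"),
            IPAddress.Parse("192.0.2.10"));

        Console.WriteLine(request.RequestUri);
        Console.WriteLine(request.Host);
    }
}
```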
It does sound like you're swamping your local DNS server (in the jargon, your local recursive DNS resolver).
When your program issues a DNS resolution request, it sends a port 53 datagram to the local resolver. That resolver responds either by replying from its cache or recursively resending the request to some other resolver that's been identified as possibly having the record you're looking for.
So, your multithreaded program is causing a lot of datagrams to fly around. Internet Protocol hosts and routers handle congestion and overload by dropping datagram packets. It's like handling a traffic jam on a bridge by bulldozing cars off the bridge. In an overload situation, some packets just disappear.
So, it's up to endpoint software using datagram protocols to try again if their packets get lost. That's the purpose of TCP, and that's how it can provide the illusion of an error-free stream of data even though it can only communicate with datagrams.
So, your program will need to try again when you get a resolution failure on some of your DNS requests. You're a datagram endpoint, so you own the responsibility of retrying. I suspect the .NET library is giving you back a failure when some of your requests time out because your datagrams got dropped.
Now, here's the important thing. It is also the responsibility of a datagram endpoint program, like yours, to implement congestion control. TCP does this automatically using its sliding window system, with algorithms known as slow-start and congestion avoidance (additive increase, multiplicative decrease). If TCP didn't do this, all internet routers would be congested all the time. This approach was dreamed up by Van Jacobson, and you should go read about it.
In the meantime you should implement a simple form of it in your bulk DNS lookup program. Here's how you might do that.
Start with a batch size of, say, 5 lookups.
Every time you get the whole batch back successfully, increase your batch size by one for your next batch. This is the gradual ramp-up (additive increase, in TCP terms). As long as you're not getting congestion, you increase the network load.
Every time you get a failure to resolve a name, reduce the size of the next batch by half. So, for example, if your batch size was 30 and you got a failure, your next batch size will be 15. This is the backoff (multiplicative decrease). You respond to congestion by dramatically reducing the load you're putting on the network.
Implement a maximum batch size of something like 100 just to avoid being too much of a pig and looking like a crude denial-of-service attack to the DNS system.
I had a similar project a while ago and this strategy worked well for me.
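The batch-size rule described in this answer can be sketched as a small pure function. The class and method names are made up for illustration; the starting size of 5, the cap of 100, the +1 on success and the halving on failure are taken from the steps above.

```csharp
using System;

static class BatchSizer
{
    // Returns the size of the next batch given the outcome of the current one.
    public static int Next(int current, bool hadFailure, int max)
    {
        if (hadFailure)
        {
            // Back off: halve the batch, but never go below one lookup.
            return Math.Max(1, current / 2);
        }

        // Ramp up: one more lookup per batch, up to the cap.
        return Math.Min(max, current + 1);
    }

    static void Main()
    {
        int size = 5;                                    // starting batch size
        bool[] outcomes = { false, false, true, false }; // true = a lookup failed

        foreach (bool failed in outcomes)
        {
            size = Next(size, failed, 100);
            Console.WriteLine(size);                     // prints 6, 7, 3, 4 (one per line)
        }
    }
}
```

In the lookup loop you would take `size` URLs from the queue, resolve them, re-queue any failures for a retry, and then call `Next` with the outcome of that batch.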
I am using Entity framework 4.0 in conjunction with REST web service.
On the client side, during data/entities loading, client is making 40 sequential web requests.
When I set HttpWebRequest.KeepAlive to false (Fiddler shows Connection: Close headers in client-server communication), data loading is about 50% faster (requests are still sequential), and I am wondering why.
From Wikipedia:
HTTP persistent connection, also called HTTP keep-alive, or HTTP connection reuse, is the idea of using the same TCP connection to send and receive multiple HTTP requests/responses, as opposed to opening a new connection for every single request/response pair.
From MSDN:
When the KeepAlive property is true, the application makes persistent connections to the servers that support them.
When using HTTP/1.1, Keep-Alive is on/true by default.
What's wrong? How can I speed up persistent requests?
Maybe on the client the limit on the number of concurrent connections per IP is higher for non-persistent connections than for persistent ones. So when using keep-alive, the client may have allowed you to have 10 connections in parallel, but without keep-alive you can have, for example, 15 parallel connections.
But this will be faster only on a local network, where establishing a connection is really fast. On the internet (RTT of 5-200 ms), every new connection first needs the three-way handshake (SYN, SYN+ACK, ACK), which costs roughly one extra round trip before the request can be sent. So especially if you have many small requests (for example images under 1 kB), keep-alive can be close to twice as fast, because you set up the connection only once and then send 1 packet as the request and receive 1 packet as the response. Without keep-alive, you need 3 packets to open the connection, then send the request, then receive the response, and then further packets (FIN and ACK in each direction) to close the connection.
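To put rough numbers on this, here is a back-of-the-envelope sketch. It uses a deliberately simplified model that is an assumption of this example, not a measurement: one round trip for each TCP handshake, one round trip per small request/response pair, and nothing for transfer time, connection teardown or TLS. The class name and the 50 ms RTT are illustrative.

```csharp
using System;

static class KeepAliveEstimate
{
    // Estimated total time in milliseconds for a run of sequential small requests.
    public static double TotalMs(int requests, double rttMs, bool keepAlive)
    {
        // With keep-alive the handshake is paid once; without it, once per request.
        int handshakes = keepAlive ? 1 : requests;
        return (handshakes + requests) * rttMs;
    }

    static void Main()
    {
        // 40 sequential requests over a link with a 50 ms round-trip time.
        Console.WriteLine(TotalMs(40, 50, true));   // 41 round trips: 2050
        Console.WriteLine(TotalMs(40, 50, false));  // 80 round trips: 4000
    }
}
```

On a LAN with sub-millisecond round trips both totals become negligible, which is why other effects, such as the connection limit mentioned above, can dominate there.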