I have a Windows Service hosting an advanced WCF service that communicates over TCP (netTcpBinding) using protobuf-net, sometimes also with certificates.
The receiveTimeout is set to infinite so that the connection is never dropped due to inactivity. But from what I understand, the connection could be dropped anyway, so I have created a simple two-way keepalive service method that the client calls every 9 minutes to keep the connection alive. It's very important that the connection never breaks.
Is this the correct way? Or could I simply remove my keepalive because the receiveTimeout is set to infinite?
Edit: Current app.config for the WCF service: http://1drv.ms/1uEVKIt
No. This is widely misunderstood, and unfortunately there is much misinformation out there.
First, "Infinite" is a sort of semi-valid value. There are two special config serializers that convert "Infinite" to either TimeSpan.MaxValue or int.MaxValue (so they're not really "infinite" anyway), but not everything in WCF seems to recognize this. So it's always best to specify your timeouts explicitly with time values.
Second, you don't need a "keepalive" method in your service, since WCF provides what's called a "reliable session". If you add <reliableSession enabled="true" />, WCF will provide its own keepalive mechanism through "infrastructure messages".
By having your own "keepalive" mechanism, you're effectively doubling the load on your service, and it can actually create more problems than it solves.
Third, when using a reliable session, you use the inactivityTimeout setting of reliableSession. This does two things. First, it controls how frequently infrastructure (keepalive) messages are sent: they are sent at half the timeout value, so if you set it to 18 minutes, they will be sent every 9 minutes. Second, if no infrastructure or operation messages (i.e. messages that are part of your data contract) are received within the inactivity timeout, the connection is aborted, because there has likely been a problem (one side has crashed, there's a network problem, etc.).
receiveTimeout is the maximum amount of time during which no operation messages can be received before the connection is aborted (the default is 10 minutes). Setting it to a large value (Int32.MaxValue milliseconds is roughly 24.8 days) keeps the connection up, while setting inactivityTimeout to a smaller value (again, the default is 10 minutes), namely less than twice the time after which network routers drop an idle connection, keeps the connection alive.
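To make the timing arithmetic concrete, here is a small C# sketch using only BCL types (no WCF). It encodes the relationships described above: keepalives go out at half of inactivityTimeout, so that half must be shorter than the router's idle cutoff, and Int32.MaxValue read as milliseconds is about 24.8 days. The "half the timeout" rule is taken from this answer rather than from a WCF API, and the class name and router idle values are illustrative assumptions.

```csharp
using System;

public static class ReliableSessionTiming
{
    // Per the answer above: infrastructure (keepalive) messages are sent
    // at half of reliableSession's inactivityTimeout.
    public static TimeSpan KeepAliveInterval(TimeSpan inactivityTimeout) =>
        TimeSpan.FromTicks(inactivityTimeout.Ticks / 2);

    // Keepalives must arrive before an idle router drops the connection,
    // i.e. inactivityTimeout / 2 must be shorter than the router's idle cutoff.
    public static bool SurvivesRouterIdle(TimeSpan inactivityTimeout, TimeSpan routerIdleTimeout) =>
        KeepAliveInterval(inactivityTimeout) < routerIdleTimeout;

    // Int32.MaxValue interpreted as milliseconds (roughly 24.8 days).
    public static readonly TimeSpan Int32MaxMilliseconds =
        TimeSpan.FromTicks(int.MaxValue * TimeSpan.TicksPerMillisecond);
}
```

With an inactivityTimeout of 18 minutes, KeepAliveInterval returns 9 minutes, the same interval as the hand-written keepalive in the question.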
WCF handles all of this for you. You can then simply subscribe to the Connection Aborted messages to know when the connection is dropped for real reasons (app crashes, network timeouts, clients losing power, etc.) and recreate the connection.
Additionally, if you don't need ordered messages, set ordered="false", as this greatly reduces the overhead of reliable sessions. The default is true.
Note: You may not receive a connection aborted event until the inactivityTimeout has expired (or you try to use the connection). Be aware of this, and set your timeouts accordingly.
Most recommendations on the internet are to set both receiveTimeout and inactivityTimeout to Infinite. This has two problems. First, infrastructure messages don't get sent in a timely manner, so routers will drop the connection, forcing you to do your own keepalives. Second, the large inactivity timeout means WCF won't recognize when a connection legitimately drops, and you have to rely on that ping failing to know when a failure occurs. This is all completely unnecessary, and can in fact make your service even less reliable.
See also this: How do I correctly configure a WCF NetTcp Duplex Reliable Session?
Related
I'm way out of my comfort zone, so bear with me on providing the relevant information. We have just moved an IIS-hosted WCF service to a new server, and clients calling this service started experiencing timeouts. It does OK for about 10 minutes after recycling the app pool, and then everything begins timing out. We enabled WCF tracing, where I can see that it's saying MaxConcurrentSessions has been exceeded. The documentation says that value defaults to 100 × [number of processors], so it should be 200 for us.
The server is behind a load balancer, but is currently the only server. We notice the connections hover at around 6 per second in Performance Monitor, but they climb to around 30 when the timeouts happen and continue climbing from there.
The clients connect using wsHttpBinding with TransportWithMessageCredential security. The service validates the credentials provided in the message using the ASP.NET membership provider in a custom UserNamePasswordValidator configured for use in the server's binding behavior. The clients do not enable reliableSession on their bindings. The service uses the default SessionMode and InstanceContextMode, which I believe are Allowed and PerSession respectively? We do not call Close on the service proxies because, in a past investigation, I found that this only sets a flag on the object preventing it from being reused, and ours always go out of scope anyway... but we are now testing to see whether calling it does close the connection.
If I'm interpreting the WCF trace log correctly (and I don't understand the majority of what I'm reading there), it appears we are processing around 30-40 messages per minute and that each request completes in less than 300 ms (usually much less, on rare occasions nearly 1 s). I determined the number of messages by counting the "Processing message n" entries over a few one-minute spans. So if we're getting 40 per minute and it takes 100 s for those connections/sessions to time out and close, we would still only have about 67 open at once before the first ones begin to time out. That's not close to the 200 limit. Does the connection for a single client request get more than one session?
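The back-of-the-envelope estimate in the question is an application of Little's law: the average number of open sessions equals the arrival rate multiplied by how long each session stays open. A minimal C# sketch, assuming a steady arrival rate; the class and method names are hypothetical.

```csharp
using System;

public static class SessionEstimate
{
    // Little's law: average concurrent sessions = arrival rate × time each session stays open.
    public static double ConcurrentSessions(double requestsPerMinute, TimeSpan sessionLifetime) =>
        requestsPerMinute / 60.0 * sessionLifetime.TotalSeconds;
}
```

With 40 requests per minute and a 100-second lifetime this gives about 67 concurrent sessions. If sessions are instead held for the 10-minute default timeout mentioned in the answer below, the same load gives 400, well above the 200 limit.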
The strange thing is that we didn't have any timeouts before, and we copied the service and web.config straight over to the new server. I believe the server and IIS versions were upgraded (Windows Server 2016, IIS 10). Can you please help me identify and provide the relevant information to track down the problem causing these timeouts?
Edit:
From my reading, everything seems to indicate that the client must call Close; otherwise the server will leave the connection open until it times out. However, in our test, we see one connection created in Performance Monitor, but it remains open after Close has been called anyway. So I can't determine whether the need to call Close is a rumor or whether we are misinterpreting our monitoring. The real test would be to call Close everywhere and see if it eliminates our timeouts.
After increasing our MaxConcurrentSessions to 400, we saw in Performance Monitor the number of concurrent sessions and instances steadily rise by about 1 per second up to about 225, where it finally leveled off; it's hovering around there now. So it seems like sessions are not being closed.
Well, we figured it out. Nothing just popped up and told us what the problem was, and it took a lot of brainstorming, but here's what we did:
Enabled WCF tracing. We went through the traces and understood enough to see that the traffic didn't look out of the ordinary: all of the events seemed to match the expected number and types of service calls. Viewing them in SvcTraceViewer, it didn't seem to be a DoS attack or anything like that. We just used the default configuration from that link, but it looks like it can be heavily customized to provide the specific information you're after, if you know what that is.
What really helped in this case was finding the WCF performance counters. Initially we were using the ASP.NET performance counters to look at open sessions, which was not the right metric. This CodeProject guide helped us enable the WCF performance counters, giving us insight into the number of sessions and the limit in real time.
It also helped to brush up on how WCF sessions and instances are related as well as creation of a security context:
https://www.codeproject.com/Articles/188749/WCF-Sessions-Brief-Introduction
http://webservices20.blogspot.com/2009/01/wcf-performance-gearing-up-your-service.html
https://learn.microsoft.com/en-us/previous-versions/dotnet/netframework-4.0/hh273122(v=vs.100)
We were able to see the percentage of the maximum WCF sessions being used, and observed it climbing higher and higher towards the default limit of 200 (100 per processor) before eventually leveling off between 150 and 200. This leveling off, together with far more sessions existing at any given time than the average number of requests per minute seen in our WCF tracing, indicated that sessions were closing, but only after timing out rather than as soon as the server completed the request.
In a Stack Overflow question that I've been unable to find again, I once asked about the purpose of the ClientBase<TChannel>.Close method (a.k.a. the Close method of a WCF service proxy) and, somewhat incorrectly, came to the conclusion that all it did was set a flag on the proxy object marking it closed so that it couldn't be used again. The documentation's description of the method seems in line with that:
Causes the ClientBase<TChannel> object to transition from its current state into the closed state.
Well, at the point where I would call Close, my references always just go out of scope anyway, allowing garbage collection to clean them up, so that seemed pointless. But I think a key factor was that my earlier question was about basicHttpBinding, which is stateless. In this case, we are using wsHttpBinding, which is stateful: the server keeps the session and leaves the connection open after it completes the request so that subsequent calls from the client can be made on the same connection. So, though I couldn't find any documentation or track down in the source code where it happens, it seems WCF clients must call Close on their service proxy after they make their last request in order to tell the server it can close the connection and free up that session slot. I didn't have the opportunity to look for a message sent to the server upon calling Close, but using the performance counter we were able to observe the number of sessions dropping from 1 to 0, where before it would remain at 1 after our client called the service.
But are we saying that a WCF client, over whom we may have no control, is able to harm server performance and possibly cause a denial of service if they aren't diligent in their coding and forget to call Close, and the server has no control over its own performance? That sounds like a recipe for disaster. Well, there are two things you can do on the server to mitigate this. First, you can increase the maximum number of sessions. In our case we were hovering around 175 but occasionally exceeding 200 during traffic spikes. We bumped it up to 800 temporarily to ensure we wouldn't exceed the maximum. The trade-off is dedicating more server resources to holding sessions that will probably never be used again until they time out. Luckily, the server also controls the timeout. The service can control how long these sessions are held open using ReceiveTimeout and InactivityTimeout. Both default to 10 minutes, but the lesser of the two is used. If you're thinking, "Receive timeout sounds wrong; that controls the amount of time the service can take to receive a large message," you're not alone. However, that's incorrect. On the server side:
ReceiveTimeout – used by the Service Framework Layer to initialize the session-idle timeout which controls how long a session can be idle before timing out.
On the client side, it is not used. So we set our ReceiveTimeout to 30 seconds, and the number of sessions dropped significantly. That may actually have been too low, because some spots in code that do reuse the service proxy (making multiple calls in a loop, for instance, or doing some data processing between calls) now get an error when trying to call the service after the session has been closed. So you will have to find the right balance. But best practice, it seems, is to close your connections.
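The rule described above (the server uses the lesser of ReceiveTimeout and InactivityTimeout as the session idle limit) also explains the errors that appeared after lowering ReceiveTimeout to 30 seconds: a client that pauses longer than that between calls on the same proxy finds its session already gone. A small sketch of that check; the "lesser of the two" rule is taken from this answer, and the class and method names are hypothetical.

```csharp
using System;

public static class SessionIdle
{
    // Effective server-side idle limit, per the answer: the lesser of the two timeouts.
    public static TimeSpan EffectiveIdleTimeout(TimeSpan receiveTimeout, TimeSpan inactivityTimeout) =>
        receiveTimeout < inactivityTimeout ? receiveTimeout : inactivityTimeout;

    // True if a client that waits gapBetweenCalls before reusing its proxy
    // would find that the server has already dropped the session.
    public static bool SessionExpiresBetweenCalls(
        TimeSpan gapBetweenCalls, TimeSpan receiveTimeout, TimeSpan inactivityTimeout) =>
        gapBetweenCalls > EffectiveIdleTimeout(receiveTimeout, inactivityTimeout);
}
```

For example, a 45-second processing gap survives the 10-minute defaults but not a 30-second ReceiveTimeout.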
One gotcha to watch out for is using Dispose on your service proxy. I had always tried typing .dispo to see whether IntelliSense would pop up the Dispose method on my proxy, found that it didn't, and so assumed the proxy didn't implement IDisposable and didn't need to be closed or disposed. It turns out it does implement IDisposable, but explicitly, so you'd have to cast it to IDisposable to call Dispose on it. But wait! Don't go putting your proxy in a using statement just yet. The implementation of Dispose simply calls Close on the proxy, which will throw an exception if the proxy is in the faulted state (i.e. if a service call threw an exception). So you can't safely do something like this:
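```csharp
using (MyWcfClient proxy = new MyWcfClient())
{
    try
    {
        proxy.Calculate();
    }
    catch (Exception)
    {
        // Even if the error is swallowed here, a faulted proxy makes
        // Dispose (i.e. Close) throw when the using block exits.
    }
}
```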
because if Calculate throws an exception, the closing bracket of the using block will also throw an exception when it tries to dispose your proxy. Instead you just have to call Close after your last service method call. Evidently you can also call Abort in the catch, but I'm not sure if that actually communicates with the server to end the session.
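```csharp
MyWcfClient proxy = new MyWcfClient();
try
{
    proxy.Calculate();
    // Close only after the last call; this lets the server release the session.
    proxy.Close();
}
catch (Exception)
{
    // The call or Close failed (the proxy may be faulted): tear it down without throwing.
    proxy.Abort();
}
```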
Addendum
We surmise that the reason we started experiencing this after moving servers, and not before, is that we were previously using Barracuda products and are now using Oracle; perhaps the old load balancer or firewall was closing open connections for us.
When using WCF for two computers to communicate over the network, I execute a method on the remote server. The time the operation takes is not known; it can take anywhere from 1 second to a day or more, so I want to set the ((IClientChannel)pipeProxy).OperationTimeout property to a high value. But is this the way to go, or is it a dirty way of programming, since a connection is active for the whole time (it is all on a relatively stable LAN)?
I wouldn't do it like that. Such a long timeout is likely to cause issues.
I would split the operation into two: one call from client to server which starts the operation, and then a callback from the server to the client to say that it's finished. The callback would of course include any result information (success, failure, etc.).
For something which takes such a long time, you might also want to introduce a "keep alive" mechanism where the client periodically calls the server to check that it is still responding.
If you have a very long timeout, it makes it hard to know if something has actually gone wrong. But if you split the operation into two, it makes it impossible to know if something has gone wrong unless you poll occasionally with a keep-alive (or more accurately, "are you alive?") style message.
Alternatively, you could have the server call back occasionally with a progress message, but that's a bit harder to manage than having the client polling the server occasionally (because the client would have to track the last time the server called it back to determine if the server had stopped responding).
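For the "are you alive?" polling (or progress callbacks) described above, the client needs to remember when it last heard from the server and decide when silence means failure. A minimal sketch of that bookkeeping, independent of WCF; the class name and the silence threshold are hypothetical, and the caller passes in timestamps so the logic is easy to test.

```csharp
using System;

// Tracks when the client last heard from the server (a successful poll or a
// progress/completion callback) and decides whether silence means failure.
public sealed class ServerLivenessMonitor
{
    private readonly object _gate = new object();
    private readonly TimeSpan _maxSilence;
    private DateTime _lastHeardUtc;

    public ServerLivenessMonitor(TimeSpan maxSilence, DateTime startedUtc)
    {
        _maxSilence = maxSilence;
        _lastHeardUtc = startedUtc;
    }

    // Call whenever a poll succeeds or a callback arrives; callbacks may come
    // in on other threads, hence the lock.
    public void RecordContact(DateTime nowUtc)
    {
        lock (_gate)
        {
            if (nowUtc > _lastHeardUtc) _lastHeardUtc = nowUtc;
        }
    }

    // False once the server has been silent for longer than maxSilence.
    public bool IsServerResponsive(DateTime nowUtc)
    {
        lock (_gate)
        {
            return nowUtc - _lastHeardUtc <= _maxSilence;
        }
    }
}
```

With a 5-minute window, the server counts as unresponsive 6 minutes after the last contact, and any new contact resets the window.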
I've written a pair of applications, using WCF for a client/server connection between them.
Part of the functionality is that the client will download (potentially very large) files across WCF. Everything works fine.
I'm using basicHttpBinding, and have my sendTimeout and receiveTimeout set to a couple of minutes.
The trouble I have is, if I set my timeouts to be smaller, then they don't allow enough time for a large file download (especially across a slow network).
If I leave the timeouts as is, then I have to wait a long time when I get a dropped connection.
Is there a better way for me to deal with this issue, that still allows me to download the files over WCF?
EDIT: In addition to the answer from luksan, I found a lot of useful information on this previous post: Timeouts WCF Services
Have you tried setting the SendTimeout to a large value and leaving the ReceiveTimeout as is? I think SendTimeout times out a long-running operation whereas ReceiveTimeout times out an inactive channel.
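One way to pick a SendTimeout that is generous enough for large files without being arbitrarily long is to derive it from the file size and the slowest throughput you are willing to tolerate, plus a fixed allowance for connection setup. This is a hypothetical heuristic, not a WCF feature; the throughput floor and the overhead are assumptions you would tune for your network.

```csharp
using System;

public static class TransferTimeout
{
    // Time to move fileBytes at no less than minBytesPerSecond, plus a fixed overhead.
    public static TimeSpan ForFile(long fileBytes, double minBytesPerSecond, TimeSpan overhead)
    {
        if (fileBytes < 0) throw new ArgumentOutOfRangeException(nameof(fileBytes));
        if (minBytesPerSecond <= 0) throw new ArgumentOutOfRangeException(nameof(minBytesPerSecond));
        return overhead + TimeSpan.FromSeconds(fileBytes / minBytesPerSecond);
    }
}
```

For a 1 GiB file at a floor of 1 MiB/s with 30 seconds of overhead, this yields 1,054 seconds. You could apply a value like this to the proxy's OperationTimeout (which, as far as I know, is initialized from the binding's SendTimeout) before starting each download, so a dropped connection on a small file is still detected quickly.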
So I realize this is a pretty loaded question, but here's what I'm trying to gauge.
I've got a server that accepts reliable-session TCP connections via WCF and opens a callback channel to the client. 99.999% of the time, it's just connected, waiting for the server to issue a callback (not actively processing anything, just maintaining the connection).
What kind of per machine bottlenecks will I hit? I've already handled WCF <servicethrottling /> attributes on the binding, but just from a load/max connection/"anything else I'm missing" standpoint, I'm trying to get a sense of how many clients can be served per Azure Small Instance given that by and large, these guys will be sitting idly by, just waiting.
If you're opening outbound connections, you'll want to consider increasing ServicePointManager.DefaultConnectionLimit in your role's OnStart() code. The default is 2 for an ordinary process (ASP.NET's autoConfig raises it to 12 per CPU), which is easy to exhaust.
While you're at it, you might as well consider setting ServicePointManager.UseNagleAlgorithm to false if you push lots of short messages (under, oh, 1400 bytes). Otherwise the messages get buffered for up to half a second. I gave a bit more detail on Nagle in this SO answer.
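A minimal sketch of both settings, meant to be called once early in startup (for example from the role's OnStart()). The connection limit of 100 is an arbitrary example value, not a recommendation. Note that on .NET 6 and later ServicePointManager is marked obsolete and does not affect HttpClient, so this applies to .NET Framework code such as a classic Azure role.

```csharp
using System.Net;

public static class OutboundConnectionSettings
{
    // Call once at startup, before outbound connections are created:
    // ServicePoint objects that already exist keep the values they were created with.
    public static void Apply(int connectionLimit = 100)
    {
        // Raise the per-host limit on concurrent outbound HTTP connections.
        ServicePointManager.DefaultConnectionLimit = connectionLimit;

        // Send small messages immediately instead of coalescing them (Nagle).
        ServicePointManager.UseNagleAlgorithm = false;
    }
}
```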
I'm attempting to create a WCF service where several thousand (~10,000) clients can connect via a duplex NetTcpBinding for extended periods of time (weeks, maybe months).
After a bit of reading, it looks like it's better to host in IIS than a custom application or Windows service.
Is using WCF for such a service acceptable, or even possible? If so, where can I expect to run into throttling or performance issues, such as increasing the WCF ListenBacklog & MaxConcurrentConnections?
Thanks!
Why do you need to maintain an open connection for weeks or months? That will introduce a lot of complexity: timeout handling, error handling, recreating connections, etc. I even doubt that this will work.
Net.tcp connections use a transport session, which leads to PerSession instancing of the WCF service: a single service instance serves all requests and lives for the whole duration of the session (weeks or months in your case), so the instance and all of its contents stay in memory. Any interruption or unhandled exception will break the channel and close the session; all of the session's local data is lost, and the client must create a new proxy to start a new session. Any timeout (the default is 10 minutes of inactivity) will also close the session. Lastly, depending on the complexity of your business logic, you may find that if even a few hundred clients need processing at the same time, a single server cannot serve all of them, and some clients will time out (again breaking the session). Load balancing with net.tcp requires an algorithm with sticky sessions (session affinity), and the whole architecture becomes even more complicated and fragile. Scalability with net.tcp means that the service can be deployed on multiple servers, but each client session must be handled by a single server (if that server dies, all sessions it serves die as well).
Hosting in IIS/WAS/AppFabric has several advantages, two of which are health monitoring and process recycling. Health monitoring continuously checks that the worker process is still alive and can process requests; if it can't, a new worker process is silently started and new incoming requests are routed to it. Process recycling regularly recycles the application domain (the default setting is every 29 hours), which keeps the process healthy and mitigates memory leaks. The side effect is that recreating either the process or the application domain will kill all sessions. Once you self-host the service, you lose both of these features, so you have to manage the health of your service yourself.
Edit:
IMHO, health status information doesn't have to be sent over TCP. That information doesn't require all the fancy stuff: if you lose some of it, it will not affect anything, so you can use UDP for health status transfers.
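A minimal sketch of such a fire-and-forget health datagram using UdpClient. The message format (machine name plus UTC timestamp) and the collector endpoint are assumptions for illustration. Nothing is acknowledged, so a lost packet simply means one missing status sample, which is the trade-off described above.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;

public static class HealthHeartbeat
{
    // Sends one status datagram; UDP gives no delivery guarantee and no reply.
    public static void Send(UdpClient client, IPEndPoint collector, string machineName, DateTime utcNow)
    {
        string payload = $"{machineName}|{utcNow:o}";
        byte[] bytes = Encoding.UTF8.GetBytes(payload);
        client.Send(bytes, bytes.Length, collector);
    }

    // Collector side: blocks until a datagram arrives (or the socket's ReceiveTimeout elapses).
    public static string Receive(UdpClient listener)
    {
        IPEndPoint sender = new IPEndPoint(IPAddress.Any, 0);
        byte[] bytes = listener.Receive(ref sender);
        return Encoding.UTF8.GetString(bytes);
    }
}
```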
When using TCP, you don't need to keep the proxy/session open just to keep the connection open. A TCP connection is not closed immediately when you close the proxy. It remains open in a pool for a short time, and if another proxy needs a connection to the same server, it is reused (the default idle timeout in the pool should be 2 minutes). I discussed the Net.Tcp transport in WCF in another answer.
I'm not a fan of callbacks; this whole concept in WCF is overused and abused. Keeping 10,000 TCP connections open for months just in case you sometimes need to send data back to a few PCs sounds ridiculous. If you need to communicate with a PC, expose a service on the PC and call it when you need to send commands. Just add functionality that calls the server when the PC starts and when it is about to shut down, plus a way to transfer monitoring information.
Anyway, 10,000 PCs sending information every minute could mean you receive 10,000 requests at the same time, which can have the same effect as a denial-of-service attack. Depending on the processing time, your server(s) may not be able to process them, and many requests will time out. You could also consider message queuing or publish-subscribe protocols: messages are passed to a queue or topic, and the server(s) process them continuously.
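To illustrate the queueing idea in-process, here is a minimal sketch using BlockingCollection<T> as a bounded buffer drained by a fixed pool of workers. A real deployment would more likely use a durable message broker; the worker count, capacity, and class name are arbitrary assumptions. A burst is absorbed by the queue (and Add blocks once it is full, applying back-pressure) instead of every report competing for the server at once.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class HealthReportQueue
{
    // Pushes all reports through a bounded queue drained by workerCount consumers
    // and returns how many reports were processed.
    public static int ProcessAll(IEnumerable<string> reports, int workerCount, int capacity)
    {
        if (workerCount < 1) throw new ArgumentOutOfRangeException(nameof(workerCount));
        using var queue = new BlockingCollection<string>(boundedCapacity: capacity);
        int processed = 0;

        var workers = new Task[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            workers[i] = Task.Run(() =>
            {
                foreach (string report in queue.GetConsumingEnumerable())
                {
                    // Placeholder for real work, e.g. persisting the report.
                    Interlocked.Increment(ref processed);
                }
            });
        }

        foreach (string report in reports)
            queue.Add(report); // blocks while the queue is full (back-pressure)

        queue.CompleteAdding(); // lets the consumers finish once the queue drains
        Task.WaitAll(workers);
        return processed;
    }
}
```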