WCF not handling 1,000 calls per second - C#

I am working on a WCF service that is hosted in a Windows Service, using netTcpBinding.
When I tried to load test the service, I built a simple client that calls the service about 1,000 times per second. At first the calls took about 2 to 8 seconds to return, and after leaving the simple client running for about half an hour the response time increased, and some clients got timeout exceptions on the send timeout, which was configured to be 2 minutes.
These are the steps I tried to perform:
Revised the service throttling configuration:
<serviceThrottling maxConcurrentCalls="2147483647" maxConcurrentInstances="2147483647" maxConcurrentSessions="2147483647"/>
Moved from a Windows 7 machine to Windows Server 2008, but got the same result.
Updated the TCP binding configuration to the following:
NetTcpBinding baseBinding = new NetTcpBinding(SecurityMode.None, true);
baseBinding.MaxBufferSize = int.MaxValue;
baseBinding.MaxConnections = int.MaxValue;
baseBinding.ListenBacklog = int.MaxValue;
baseBinding.MaxBufferPoolSize = long.MaxValue;
baseBinding.TransferMode = TransferMode.Buffered;
baseBinding.MaxReceivedMessageSize = int.MaxValue;
baseBinding.PortSharingEnabled = true;
baseBinding.ReaderQuotas.MaxDepth = int.MaxValue;
baseBinding.ReaderQuotas.MaxStringContentLength = int.MaxValue;
baseBinding.ReaderQuotas.MaxArrayLength = int.MaxValue;
baseBinding.ReaderQuotas.MaxBytesPerRead = int.MaxValue;
baseBinding.ReaderQuotas.MaxNameTableCharCount = int.MaxValue;
baseBinding.ReliableSession.Enabled = true;
baseBinding.ReliableSession.Ordered = true;
baseBinding.ReliableSession.InactivityTimeout = new TimeSpan(23, 23, 59, 59);

BindingElementCollection elements = baseBinding.CreateBindingElements();
ReliableSessionBindingElement reliableSessionElement = elements.Find<ReliableSessionBindingElement>();
if (reliableSessionElement != null)
{
    reliableSessionElement.MaxPendingChannels = 128;

    TcpTransportBindingElement transport = elements.Find<TcpTransportBindingElement>();
    transport.ConnectionPoolSettings.MaxOutboundConnectionsPerEndpoint = 1000;

    CustomBinding newBinding = new CustomBinding(elements);
    newBinding.CloseTimeout = new TimeSpan(0, 20, 9);
    newBinding.OpenTimeout = new TimeSpan(0, 25, 0);
    newBinding.ReceiveTimeout = new TimeSpan(23, 23, 59, 59);
    newBinding.SendTimeout = new TimeSpan(0, 20, 0);
    newBinding.Name = "netTcpServiceBinding";
    return newBinding;
}
else
{
    throw new Exception("the base binding does not " +
                        "have ReliableSessionBindingElement");
}
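This snippet is the body of a binding-factory method. For illustration, a hypothetical hosting sketch using it might look like the following (MyService, IMyService, the endpoint address, and the stand-in CreateBinding() are placeholders, not code from the question):
using System;
using System.ServiceModel;
using System.ServiceModel.Channels;

[ServiceContract]
public interface IMyService
{
    [OperationContract]
    string Ping(string message);
}

public class MyService : IMyService
{
    public string Ping(string message) { return message; }
}

public static class HostProgram
{
    static Binding CreateBinding()
    {
        // Stand-in for the question's factory method above, which would
        // return the tuned CustomBinding instead of this plain binding.
        return new NetTcpBinding(SecurityMode.None, true);
    }

    public static void Main()
    {
        var host = new ServiceHost(typeof(MyService),
                                   new Uri("net.tcp://localhost:8000"));
        host.AddServiceEndpoint(typeof(IMyService), CreateBinding(), "MyService");
        host.Open();
        Console.WriteLine("Service listening; press Enter to stop.");
        Console.ReadLine();
        host.Close();
    }
}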
Changed my service operations to use async/await:
public async Task<ReturnObj> Connect(ClientInfo clientInfo)
{
    var task = Task.Factory.StartNew(() =>
    {
        // do the needed work
        // insert into database
        // query some table to return information to the client
        return new ReturnObj(); // the lambda must return a ReturnObj for this to compile
    });
    var res = await task;
    return res;
}
and updated the client to use async/await in its calls to the service.
Applied the worker-thread solution proposed in this link:
https://support.microsoft.com/en-us/kb/2538826
(although I am using .NET 4.5.1), setting the thread pool minimums to 1000 worker threads and 1000 IOCP threads.
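For reference, raising the pool minimums as described comes down to a single ThreadPool.SetMinThreads call; a minimal sketch (the 1000/1000 values are the ones from the question, not a general recommendation):
using System;
using System.Threading;

public static class ThreadPoolSetup
{
    public static void Main()
    {
        // Keep at least 1000 worker threads and 1000 I/O completion port
        // threads available before the pool throttles thread injection.
        // Returns false if the values are out of range.
        bool applied = ThreadPool.SetMinThreads(1000, 1000);
        Console.WriteLine("MinThreads applied: {0}", applied);
    }
}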
After all this, the service started to handle more requests, but the delay was still there, and the simple client took about 4 hours to hit a timeout.
The strange thing I found is that the service handles only about 8 to 16 calls within any 100 ms, regardless of the number of threads currently alive in the service.
I found a lot of articles about configuration changes to machine.config and Aspnet.config. I think these do not apply to my case, since I am using netTcp in a Windows Service, not IIS, but I implemented those changes anyway and saw no change in the results.
Could someone point me to what I am missing, or am I asking the service for something it cannot support?

It could be how your test client is written. With NetTcp, when you create a channel, it tries to get one from the idle connection pool. If it's empty, then it opens a new socket connection. When you close a client channel, it's returned back to the idle connection pool. The default size of the idle connection pool is 10, which means once there are 10 connections in the idle pool, any subsequent closes will actually close the TCP socket. If your test code is creating and disposing of channels quickly, you could be discarding connections in the pool. You could then be hitting a problem of too many sockets in the TIME_WAIT state.
Here is a blog post describing how to modify the pooling behavior.
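For example, the pool limits can be raised by rebuilding the binding from its elements, much as the question already does on the service side. A sketch for the client binding, assuming you control how it is constructed (the 200-connection and 2-minute values are illustrative):
using System;
using System.ServiceModel;
using System.ServiceModel.Channels;

public static class ClientBindingFactory
{
    public static Binding Create()
    {
        NetTcpBinding binding = new NetTcpBinding(SecurityMode.None);
        BindingElementCollection elements = binding.CreateBindingElements();

        // The TCP transport element owns the idle-connection pool settings.
        TcpTransportBindingElement tcp = elements.Find<TcpTransportBindingElement>();
        tcp.ConnectionPoolSettings.MaxOutboundConnectionsPerEndpoint = 200; // default is 10
        tcp.ConnectionPoolSettings.IdleTimeout = TimeSpan.FromMinutes(2);

        // Use this CustomBinding when creating the client channel factory.
        return new CustomBinding(elements);
    }
}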

This is most likely due to the concurrency mode being set to Single (the default value). Try setting ConcurrencyMode to Multiple by adding a ServiceBehavior attribute to your service implementation.
Be sure to check the documentation: https://msdn.microsoft.com/en-us/library/system.servicemodel.concurrencymode(v=vs.110).aspx
Example:
// With ConcurrencyMode.Multiple, threads can call an operation at any time.
// It is your responsibility to guard your state with locks. If
// you always guarantee you leave state consistent when you leave
// the lock, you can assume it is valid when you enter the lock.
[ServiceBehavior(ConcurrencyMode = ConcurrencyMode.Multiple)]
class MultipleCachingHttpFetcher : IContract
You may also be interested in the Sessions, Instancing, and Concurrency article, which describes these concurrency problems.
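Note that netTcpBinding with reliable sessions is sessionful, so the WCF defaults (InstanceContextMode.PerSession plus ConcurrencyMode.Single) serialize the calls arriving over one channel, which would match the symptoms in the question. A minimal sketch of the attribute applied to a service shaped like the asker's (the contract and class names here are placeholders):
using System.ServiceModel;

[ServiceContract]
public interface IConnectService
{
    [OperationContract]
    string Connect(string clientId);
}

[ServiceBehavior(ConcurrencyMode = ConcurrencyMode.Multiple)]
public class ConnectService : IConnectService
{
    // With ConcurrencyMode.Multiple, calls on the same session may run
    // in parallel, so shared state must be guarded explicitly.
    private readonly object _sync = new object();
    private int _callCount;

    public string Connect(string clientId)
    {
        lock (_sync)
        {
            _callCount++;
        }
        return clientId;
    }
}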

Related

Hangfire: huge latency when using MySQL

I'm using Hangfire with a MySQL backend. When I do a simple
var jobId = BackgroundJob.Enqueue(
() => Debug.WriteLine("Test"));
I see a delay of 5-10 seconds, even though I've set the polling rate to 1 sec:
app.UseHangfireServer(new BackgroundJobServerOptions()
{
    SchedulePollingInterval = TimeSpan.FromSeconds(1),
    ServerCheckInterval = TimeSpan.FromSeconds(1),
    HeartbeatInterval = TimeSpan.FromSeconds(1)
});
My setup is as simple as possible - server and client on the same machine, and the queue is empty. I was expecting delays no bigger than 1 second.
I do not plan to use distributed servers. Can I somehow force a server to start the task immediately? I assume that if I switched to in-memory storage for Hangfire it would start tasks immediately - I found https://github.com/perrich/Hangfire.MemoryStorage, but it is stated that it should not be used in production. What are my options for optimizing latency?
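For what it's worth, wiring up the in-memory storage mentioned above looks roughly like this (a sketch assuming the Hangfire.MemoryStorage package linked in the question, and bearing in mind its production caveat):
using Hangfire;
using Hangfire.MemoryStorage;
using Owin;

public class Startup
{
    public void Configuration(IAppBuilder app)
    {
        // Swap the MySQL job storage for in-process memory storage; jobs
        // then skip the SQL polling round-trip, but are lost on restart.
        GlobalConfiguration.Configuration.UseMemoryStorage();
        app.UseHangfireServer();
    }
}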

Async TcpClient Connect behaves differently when deployed on Windows and Linux

I'm writing a client application that has to connect to a server application via a TCP socket. The framework of choice is .NET Core 2.0 (it is not ASP.NET Core, it is just a console app). I'm using the TcpClient class and its .BeginConnect() and .EndConnect() methods to be able to set a connection timeout. Here is the code:
using System;
using System.Net.Sockets;

public class Program
{
    public static void Main(String[] args)
    {
        var c = new TcpClient();
        int retryCount = 0;
        var success = false;
        IAsyncResult res;
        do
        {
            if (retryCount > 0) Console.WriteLine("Retry: {0}", retryCount);
            retryCount++;
            c.Close();
            c = new TcpClient();
            res = c.BeginConnect("10.64.4.49", 13000, null, null);
            success = res.AsyncWaitHandle.WaitOne(TimeSpan.FromSeconds(2));
            Console.WriteLine(success.ToString());
        }
        while (!c.Connected);
        c.EndConnect(res);
        Console.WriteLine("Connected");
        Console.ReadLine();
    }
}
When I compile, publish and run this console app, and nothing is listening on the IP address and port, the results differ depending on whether the app runs on Windows or Linux.
Here are the results on Windows:
Here is what it looks like on Linux:
The results are pretty much the same; the only difference is that on Windows it tries to connect every two seconds, while on Linux it acts like those two seconds are ignored and goes on a "rampage connection session", as I call it.
I'm not sure if this is a .NET Core issue or some Linux tune-up that Windows already has predefined.
Can anyone advise what the problem might be, and possibly propose a solution?
Thanks in advance,
Julian Dimitrov
I think I understand why you're having an issue, and it seems to be based upon a misunderstanding of what a timeout should do.
For the sake of testing, I changed your code to this:
var sw = Stopwatch.StartNew();
res = c.BeginConnect("127.0.0.1", 12, null, null);
success = res.AsyncWaitHandle.WaitOne(TimeSpan.FromSeconds(10));
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
On Windows, I can see that the connection fails after ~1 second, whereas running the same code on Linux, it fails almost instantly. It seems that Linux can work out that a connection is impossible faster than Windows can. I think you may be mistaking the time Windows takes to work out that it can't connect for the timeout you've specified.
Next: what is a timeout? A timeout is the maximum time a connection may take to be established. It's a limit: the operation has to complete in less than X seconds (e.g. 10 seconds) or it fails. If an operation completes in 1 second, it returns immediately.
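So if the intent is "attempt a connection at most once every two seconds", the pause has to be made explicit; the timeout only caps how long one attempt may take. A sketch of pacing the retry loop that way (the address and port are copied from the question; this is an illustration, not a drop-in fix):
using System;
using System.Diagnostics;
using System.Net.Sockets;
using System.Threading;

public static class PacedConnect
{
    public static void Main()
    {
        var interval = TimeSpan.FromSeconds(2);
        var client = new TcpClient();

        while (true)
        {
            var attempt = Stopwatch.StartNew();
            client.Close();
            client = new TcpClient();
            IAsyncResult res = client.BeginConnect("10.64.4.49", 13000, null, null);
            bool completed = res.AsyncWaitHandle.WaitOne(interval);

            if (completed && client.Connected)
            {
                client.EndConnect(res);
                break;
            }

            // If the attempt failed faster than the interval (as on Linux),
            // sleep out the remainder so retries happen at most every 2 s.
            var remaining = interval - attempt.Elapsed;
            if (remaining > TimeSpan.Zero)
                Thread.Sleep(remaining);
        }

        Console.WriteLine("Connected");
    }
}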

Windows service - unrecoverable FtpWebRequest timeout when an FTP provider maintenance window occurs

I have a Windows service that downloads a file from an FTP server every hour on a schedule. It uses the following code to do this:
var _request = (FtpWebRequest)WebRequest.Create(configuration.Url);
_request.Method = WebRequestMethods.Ftp.DownloadFile;
_request.Timeout = 20000;
_request.Credentials = new NetworkCredential("auser", "apassword");

using (var _response = (FtpWebResponse)_request.GetResponse())
using (var _responseStream = _response.GetResponseStream())
using (var _streamReader = new StreamReader(_responseStream))
{
    this.c_fileData = _streamReader.ReadToEnd();
}
Normally, downloading the FTP data works perfectly fine. However, every few months the FTP server provider notifies us that some maintenance needs to be performed. Once maintenance starts (it usually lasts only 2 or 3 hours), our hourly FTP download attempt fails - i.e. it times out, which is expected.
The problem is that after the maintenance window our Windows service continues to time out on every attempt to download the file. Our Windows service has retry logic, but each retry times out as well.
Once we restart the Windows service, the application starts downloading FTP files successfully again.
Does anyone know why we have to restart the Windows service to recover from this failure? Could it be a network issue, e.g. DNS?
Note 1: There are already questions similar to this one, but they do not involve a maintenance window, and they do not have any credible answers either.
Note 2: We profiled the memory of the application and it seems all FTP objects are being disposed of correctly.
Note 3: We ran a console app with the same FTP code after the maintenance window and it worked fine, while the Windows service was still timing out.
Any help much appreciated.
We eventually got to the bottom of this issue, albeit without all questions being answered.
When we used a different memory profiler, it showed two FtpWebRequest objects that had been sitting in memory, undisposed, for days in the process. These objects were what was causing the problem, i.e. they were not being properly disposed.
From research, to solve the issue, we did the following:
Set KeepAlive to false
Set the connection lease timeout to a limited value
Set the maximum idle time to a limited value
Wrapped the request in a try/catch/finally, aborting the request in the finally block
We changed the code to the following:
var _request = (FtpWebRequest)WebRequest.Create(configuration.Url);
_request.Method = WebRequestMethods.Ftp.DownloadFile;
_request.Timeout = 20000;
_request.Credentials = new NetworkCredential("auser", "apassword");
_request.KeepAlive = false;
_request.ServicePoint.ConnectionLeaseTimeout = 20000;
_request.ServicePoint.MaxIdleTime = 20000;

try
{
    using (var _response = (FtpWebResponse)_request.GetResponse())
    using (var _responseStream = _response.GetResponseStream())
    using (var _streamReader = new StreamReader(_responseStream))
    {
        this.c_fileData = _streamReader.ReadToEnd();
    }
}
catch (Exception)
{
    // Rethrow without resetting the stack trace
    // ("throw genericException" would overwrite it).
    throw;
}
finally
{
    // Make sure the connection is torn down even on failure.
    _request.Abort();
}
To be honest, we are not sure whether every step here was necessary, but the problem no longer exists: objects do not hang around, and the application still functions after a maintenance window, so we are happy!

Azure queue performance

For Windows Azure queues, the scalability target per storage account is supposed to be around 500 messages/second (http://msdn.microsoft.com/en-us/library/windowsazure/hh697709.aspx). I have the following simple program that just writes a few messages to a queue. The program takes 10 seconds to complete (4 messages/second). I am running the program from inside a virtual machine (in West Europe), and my storage account is also located in West Europe. I don't have geo-replication set up for my storage. My connection string is set up to use the HTTP protocol.
// http://blogs.msdn.com/b/windowsazurestorage/archive/2010/06/25/nagle-s-algorithm-is-not-friendly-towards-small-requests.aspx
ServicePointManager.UseNagleAlgorithm = false;

CloudStorageAccount storageAccount = CloudStorageAccount.Parse(ConfigurationManager.AppSettings["DataConnectionString"]);
var cloudQueueClient = storageAccount.CreateCloudQueueClient();
var queue = cloudQueueClient.GetQueueReference(Guid.NewGuid().ToString());
queue.CreateIfNotExist();

var w = new Stopwatch();
w.Start();
for (int i = 0; i < 50; i++)
{
    Console.WriteLine("nr {0}", i);
    queue.AddMessage(new CloudQueueMessage("hello " + i));
}
w.Stop();

Console.WriteLine("elapsed: {0}", w.ElapsedMilliseconds);
queue.Delete();
Any idea how I can get better performance?
EDIT:
Based on Sandrino Di Mattia's answer, I re-analyzed the code I originally posted and found that it was not complete enough to reproduce the issue: I had in fact created a queue just before the call to ServicePointManager.UseNagleAlgorithm = false. The code that reproduces my problem looks more like this:
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(ConfigurationManager.AppSettings["DataConnectionString"]);
var cloudQueueClient = storageAccount.CreateCloudQueueClient();
var queue = cloudQueueClient.GetQueueReference(Guid.NewGuid().ToString());
//ServicePointManager.UseNagleAlgorithm = false; // If you change the Nagle algorithm here, the performance will be okay.
queue.CreateIfNotExist();
ServicePointManager.UseNagleAlgorithm = false; // TOO LATE, the queue is already created without 'nagle'

var w = new Stopwatch();
w.Start();
for (int i = 0; i < 50; i++)
{
    Console.WriteLine("nr {0}", i);
    queue.AddMessage(new CloudQueueMessage("hello " + i));
}
w.Stop();

Console.WriteLine("elapsed: {0}", w.ElapsedMilliseconds);
queue.Delete();
Sandrino's suggested solution - configuring the ServicePointManager via the app.config file - has the advantage that the ServicePointManager is initialized when the application starts up, so you don't have to worry about ordering dependencies.
I answered a similar question a few days ago: How to achieve more than 10 inserts per second with Azure storage tables.
Adding 1000 items to table storage took over 3 minutes; with the changes described in my answer, it dropped to 4 seconds (250 requests/sec). In the end, table storage and storage queues aren't all that different: the backend is the same, the data is simply stored in a different way, and both are exposed through a REST API. So if you improve the way you handle your requests, you'll get better performance.
The most important changes:
expect100Continue: false
useNagleAlgorithm: false (you're already doing this)
Parallel requests combined with connectionManagement/maxconnection
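These settings can live in app.config under <system.net>, or be applied in code at startup before the first request creates its ServicePoint; a minimal programmatic sketch (the connection limit of 100 is illustrative, not a value from the answer):
using System.Net;

public static class StartupTuning
{
    public static void Apply()
    {
        // Must run before the first request to the storage endpoint;
        // existing service points keep the settings they were created with.
        ServicePointManager.Expect100Continue = false;
        ServicePointManager.UseNagleAlgorithm = false;
        ServicePointManager.DefaultConnectionLimit = 100; // allow more parallel requests
    }
}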
Also, ServicePointManager.DefaultConnectionLimit should be increased before the service point is created. Sandrino's answer actually says the same thing, but via config.
Turn off proxy detection, even in the cloud (the autodetect attribute of the proxy config element) - it slows down initialisation.
Choose well-distributed partition keys.
Colocate your storage account near your compute and your customers.
Design so you can add more storage accounts as needed.
Microsoft set the SLA at 2,000 tps on queues and tables as of July 2012.
I didn't read Sandrino's linked answer, sorry - I just landed on this question while watching the Build 2012 session on exactly this topic.

AppFabric DataCacheFactory initialization often takes ~30 seconds

When I initialize my client to connect to AppFabric's cache, it inconsistently takes up to 30 seconds to connect on the following line:
factory = new DataCacheFactory(configuration);
See my full Init() code below - mostly taken from here.
I say inconsistently because sometimes it takes 1 second and other times 27, 28, etc. seconds. I have an ASP.NET site using the AppFabric cache, which lives on a different box (on the same domain). Everything is working great except for the inconsistent connection time. When it connects, it's all good - I just need it to connect consistently in ~1 second :) ... Thoughts?
public static void Init()
{
    if (cache == null)
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        try
        {
            // Define array for 1 cache host
            List<DataCacheServerEndpoint> servers = new List<DataCacheServerEndpoint>(1);
            var appFabricHost = ConfigurationManager.AppSettings["AppFabricHost"];
            var appFabricPort = ConfigurationManager.AppSettings["AppFabricPort"].ParseAs<int>();

            // Specify cache host details
            //   Parameter 1 = host name
            //   Parameter 2 = cache port number
            servers.Add(new DataCacheServerEndpoint(appFabricHost, appFabricPort));
            TraceHelper.TraceVerbose("Init", string.Format("Defined AppFabric - Host: {0}, Port: {1}", appFabricHost, appFabricPort));

            // Create cache configuration
            DataCacheFactoryConfiguration configuration = new DataCacheFactoryConfiguration();

            // Set the cache host(s)
            configuration.Servers = servers;

            // Set default properties for local cache (local cache disabled)
            configuration.LocalCacheProperties = new DataCacheLocalCacheProperties();

            // Disable tracing to avoid informational/verbose messages on the web page
            DataCacheClientLogManager.ChangeLogLevel(System.Diagnostics.TraceLevel.Off);

            // Pass configuration settings to the DataCacheFactory constructor
            factory = new DataCacheFactory(configuration);

            // Get a reference to the named cache
            cache = factory.GetCache(cacheName);
            TraceHelper.TraceVerbose("Init", "Defined AppFabric - CacheName: " + cacheName);
        }
        catch (Exception ex)
        {
            TraceHelper.TraceError("Init", ex);
        }
        finally
        {
            TraceHelper.TraceInfo("Init", string.Format("AppFabric init took {0} seconds", sw.Elapsed.Seconds));
        }

        if (cache == null)
        {
            TraceHelper.TraceError("Init", string.Format("First init cycle took {0} seconds and failed, retrying", sw.Elapsed.Seconds));
            UrlShortener.Init(); // if at first you don't succeed, try try again ...
        }
    }
}
Is it any faster and/or more consistent if you keep all the configuration info in a .config file rather than creating your configuration programmatically? See here for details - I would always use this method rather than programmatic configuration, as it's much easier to update when something changes.
Otherwise, I think the general advice is that DataCacheFactory is an expensive object to create because of what it does, i.e. it makes a network connection to each server in the cluster. You definitely don't want to create a DataCacheFactory every time you need to get something from the cache; instead, consider creating it in Application_Start, perhaps as a singleton, and reusing it throughout your application. That doesn't solve the slow first connection, but it does mitigate it.
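A minimal sketch of that singleton idea, assuming the cache client configuration lives in the .config file so the parameterless DataCacheFactory constructor can pick it up (the cache name "default" is a placeholder):
using System;
using Microsoft.ApplicationServer.Caching;

public static class CacheHolder
{
    // Created lazily on first use and reused for the application's
    // lifetime, so the expensive connection setup happens only once.
    private static readonly Lazy<DataCacheFactory> Factory =
        new Lazy<DataCacheFactory>(() => new DataCacheFactory());

    public static DataCache GetCache()
    {
        return Factory.Value.GetCache("default"); // placeholder cache name
    }
}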
