GRPC performance vs WCF performance

We have a legacy app that runs on top of WCF, and we are trying to move off it to another technology. One requirement is performance on the wire, so part of evaluating GRPC is measuring how quickly it works and also how many simultaneous clients we can run.
To that end, we're simulating a high number of calls, each passing a relatively small amount of data. In that respect, WCF has turned out to be significantly faster than GRPC, which is very unexpected. Is there something wrong with the way the tests were conceived and implemented?
The server code:
public override Task<TestReply> Test(TestRequest request, ServerCallContext context)
{
    // Build a reply of request.Size characters cycling through 'a'..'z'.
    var ret = new char[request.Size];
    var a = (int)'a';
    for (var i = 0; i < request.Size; i++)
    {
        ret[i] = (char)(a + (i % 26));
    }
    // The work is synchronous, so return an already-completed task.
    return Task.FromResult(new TestReply { Message = new string(ret) });
}
The client code:
static void Main(string[] args)
{
    // Allows HTTP/2 without TLS (a plain "http://" address) on .NET Core 3.x.
    AppContext.SetSwitch("System.Net.Http.SocketsHttpHandler.Http2UnencryptedSupport", true);
    using var channel = GrpcChannel.ForAddress("http://remote_server:8001", new GrpcChannelOptions { Credentials = ChannelCredentials.Insecure });
    var client = new Greeter.GreeterClient(channel);
    string TestMethod(int i)
    {
        var request = new TestRequest { Size = i };
        return client.Test(request).Message; // blocking unary call, one at a time
    }
    var start = DateTime.Now;
    for (var i = 0; i < 15625; i++)
    {
        var val = TestMethod(10);
    }
    var end = DateTime.Now;
    Console.WriteLine("Elapsed: {0:F2} s", (end - start).TotalSeconds);
}
If we run a single instance of the client, it takes just under 7 seconds. If we run 64 instances simultaneously, each takes an average of 23 seconds. Part of the problem is that running 64 instances is also CPU-intensive on both client and server: with 64 clients, the client machine sees 85-95% CPU utilization, and the server sees 70-80%.
By comparison, WCF runs a single instance of that code in 2.4 seconds and 64 instances in an average of 9 seconds, and never experiences significant CPU utilization on either machine.
Are we using GRPC incorrectly? Is there something wrong with the test? What can we do to make GRPC run a little faster/leaner?
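The timings above can be turned into per-call latency and aggregate throughput with a little arithmetic. The sketch below assumes each client makes exactly 15,625 sequential calls, as in the client code; BenchmarkMath and its methods are hypothetical names used only for illustration.

```csharp
using System;

// Derives per-call latency and aggregate throughput from wall-clock timings.
// Assumption: every client makes CallsPerClient sequential (blocking) calls.
public static class BenchmarkMath
{
    public const int CallsPerClient = 15625;

    // Average time per call, in milliseconds, for one sequential client.
    public static double MsPerCall(double elapsedSeconds)
    {
        return elapsedSeconds * 1000.0 / CallsPerClient;
    }

    // Total calls completed per second across all concurrent clients.
    public static double AggregateCallsPerSecond(int clients, double avgElapsedSeconds)
    {
        return (double)clients * CallsPerClient / avgElapsedSeconds;
    }
}
```

Since 64 × 15,625 is exactly 1,000,000 calls, 23 s versus 9 s corresponds to roughly 43,000 versus 111,000 calls per second in aggregate, while a single client sees about 0.45 ms versus 0.15 ms per call. Because each client waits for one reply before sending the next request, its throughput is bounded by that per-call round-trip time.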

Related

How to read from multiple EventHub partitions simultaneously with high throughput?

My one role instance needs to read data from 20-40 EventHub partitions at the same time (context: this is our internal virtual partitioning scheme - 20-40 partitions represent scale out unit).
In my prototype I use the code below, but I get a maximum throughput of 8 MBPS. Since running the same console app multiple times multiplies the throughput (perfmon counter) accordingly, I don't think this is either a VM network limit or an EventHub service-side limit.
I wonder whether I'm creating the clients correctly here...
Thank you!
Zaki
const string EventHubName = "...";
const string ConsumerGroupName = "...";
var connectionStringBuilder = new ServiceBusConnectionStringBuilder();
connectionStringBuilder.SharedAccessKeyName = "...";
connectionStringBuilder.SharedAccessKey = "...";
connectionStringBuilder.Endpoints.Add(new Uri("sb://....servicebus.windows.net/"));
connectionStringBuilder.TransportType = TransportType.Amqp;
var clientConnectionString = connectionStringBuilder.ToString();
var eventHubClient = EventHubClient.CreateFromConnectionString(clientConnectionString, EventHubName);
var runtimeInformation = await eventHubClient.GetRuntimeInformationAsync().ConfigureAwait(false);
var consumerGroup = eventHubClient.GetConsumerGroup(ConsumerGroupName);
var offStart = DateTime.UtcNow.AddMinutes(-10);
var offEnd = DateTime.UtcNow.AddMinutes(-8);
var workUnitManager = new WorkUnitManager(runtimeInformation.PartitionCount);
var readers = new List<PartitionReader>();
for (int i = 0; i < runtimeInformation.PartitionCount; i++)
{
var reader = new PartitionReader(
consumerGroup,
runtimeInformation.PartitionIds[i],
i,
offStart,
offEnd,
workUnitManager);
readers.Add(reader);
}
internal async Task Read()
{
try
{
Console.WriteLine("Creating a receiver for '{0}' with offset {1}", this.partitionId, this.startOffset);
EventHubReceiver receiver = await this.consumerGroup.CreateReceiverAsync(this.partitionId, this.startOffset).ConfigureAwait(false);
Console.WriteLine("Receiver for '{0}' has been created.", this.partitionId);
var stopWatch = new Stopwatch();
stopWatch.Start();
while (true)
{
var message =
(await receiver.ReceiveAsync(1, TimeSpan.FromSeconds(10)).ConfigureAwait(false)).FirstOrDefault();
if (message == null)
{
continue;
}
if (message.EnqueuedTimeUtc >= this.endOffset)
{
break;
}
this.processor.Push(this.partitionIndex, message);
}
this.Duration = TimeSpan.FromMilliseconds(stopWatch.ElapsedMilliseconds);
}
catch (Exception ex)
{
Console.WriteLine(ex);
throw;
}
}

The code snippet you provided effectively creates one connection to the Service Bus service and then runs all receivers on that single connection (at the protocol level, it creates multiple AMQP links on the same connection).
To achieve high throughput for receive operations, you will need to create multiple connections and tune the ratio of receivers to connections. That's effectively what happens when you run the above code in multiple processes.
Here's how:
You will need to go one layer down in the .NET client SDK and code at the MessagingFactory level; you can start with one MessagingFactory per EventHubClient. A MessagingFactory represents one connection to the Event Hubs service. Code to create a dedicated connection per EventHubClient:
var connStr = new ServiceBusConnectionStringBuilder("Endpoint=sb://servicebusnamespacename.servicebus.windows.net/;SharedAccessKeyName=saskeyname;SharedAccessKey=sakKey");
connStr.TransportType = TransportType.Amqp;
var msgFactory = MessagingFactory.CreateFromConnectionString(connStr.ToString());
var ehClient = msgFactory.CreateEventHubClient("teststream");
I added connStr to the sample only to emphasize setting TransportType to Amqp.
You will end up with multiple connections to port 5671 (AMQP over TLS).
If you rewrite your code with one MessagingFactory per EventHubClient (or a reasonable ratio), you are all set; in your code, you will need to move the EventHubClient creation into the reader.
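The receivers-to-connections ratio can be planned with a small helper. This is a sketch under assumptions: ConnectionPlanner and GroupPartitions are hypothetical names, and each resulting group is meant to share one MessagingFactory (one connection), with one receiver per partition in the group.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical planner: splits partition ids into groups, one group per connection.
public static class ConnectionPlanner
{
    public static List<List<string>> GroupPartitions(IReadOnlyList<string> partitionIds, int receiversPerConnection)
    {
        if (receiversPerConnection <= 0)
            throw new ArgumentOutOfRangeException(nameof(receiversPerConnection));

        return partitionIds
            .Select((id, index) => (id, index))
            .GroupBy(p => p.index / receiversPerConnection) // consecutive partitions share a connection
            .Select(g => g.Select(p => p.id).ToList())
            .ToList();
    }
}
```

For example, 32 partitions with 4 receivers per connection yield 8 groups, i.e. 8 MessagingFactory instances; setting receiversPerConnection to 1 gives one connection per receiver.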
The only extra criterion to consider when creating multiple connections is billing: only 100 connections (senders and receivers combined) are included in the Basic SKU. I assume you are already on Standard (as you have more than one throughput unit), which includes 1,000 connections, so there is no need to worry; I'm mentioning it just in case.
~Sree

A good option is to create a Task for each partition.
This is a copy of my implementation, which is able to process 2.5k messages per second per partition. The achievable rate also depends on your downstream processing speed.
static void EventReceiver()
{
    // One dedicated task (with its own MessagingFactory, i.e. connection) per partition.
    // Partition ids run from "0" to EventHubPartitionCount - 1, hence < rather than <=.
    // cts (a CancellationTokenSource) and counter are assumed to be fields defined elsewhere.
    for (int i = 0; i < EventHubPartitionCount; i++)
    {
        Task.Factory.StartNew((state) =>
        {
            Console.WriteLine("Starting worker to process partition: {0}", state);
            var factory = MessagingFactory.Create(ServiceBusEnvironment.CreateServiceUri("sb", "tests-eventhub", ""), new MessagingFactorySettings()
            {
                TokenProvider = TokenProvider.CreateSharedAccessSignatureTokenProvider("Listen", "PGSVA7L="),
                TransportType = TransportType.Amqp
            });
            var client = factory.CreateEventHubClient("eventHubName");
            var group = client.GetConsumerGroup("customConsumer");
            Console.WriteLine("Group: {0}", group.GroupName);
            // The starting time is interpreted as UTC.
            var receiver = group.CreateReceiver(state.ToString(), DateTime.UtcNow);
            while (true)
            {
                if (cts.IsCancellationRequested)
                {
                    receiver.Close();
                    break;
                }
                var messages = receiver.Receive(20);
                messages.ToList().ForEach(aMessage =>
                {
                    // Process your event
                });
                Console.WriteLine(counter);
            }
        }, i, TaskCreationOptions.LongRunning); // the loop blocks, so give it a dedicated thread
    }
}
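The one-task-per-partition pattern can be sketched without the Service Bus SDK. In this hypothetical version, receiveBatch stands in for receiver.Receive(...) plus processing, partition ids run from "0" to partitionCount - 1, each loop gets a dedicated thread via TaskCreationOptions.LongRunning, and a single CancellationToken stops all loops.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class PartitionPump
{
    // Runs one loop per partition until the token is cancelled and returns the
    // number of events processed per partition id.
    // receiveBatch is hypothetical: it returns how many events it handled (0 if none).
    public static ConcurrentDictionary<string, long> Run(
        int partitionCount, Func<string, int> receiveBatch, CancellationToken token)
    {
        var processed = new ConcurrentDictionary<string, long>();

        Task[] tasks = Enumerable.Range(0, partitionCount) // ids "0" .. partitionCount - 1
            .Select(i => i.ToString())
            .Select(partitionId => Task.Factory.StartNew(() =>
            {
                while (!token.IsCancellationRequested)
                {
                    int n = receiveBatch(partitionId);
                    processed.AddOrUpdate(partitionId, n, (_, total) => total + n);
                }
            }, token, TaskCreationOptions.LongRunning, TaskScheduler.Default))
            .ToArray();

        try
        {
            Task.WaitAll(tasks); // waits for every loop to exit
        }
        catch (AggregateException) when (token.IsCancellationRequested)
        {
            // Some tasks were cancelled before they started; that is expected here.
        }
        return processed;
    }
}
```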

C#: Why does the WebClient time out most of the time when it is invoked through a thread?

I am working on a project that uses a timed web client. The class structure is as follows:
Controller => main supervisor class
Form1, SourceReader, ReportWriter, UrlFileReader, HTTPWorker, TimedWebClient.
HTTPWorker is the class that gets the page source for a given URL.
TimedWebClient is the class that handles the timeout of the WebClient. Here is the code:
class TimedWebClient : WebClient
{
int Timeout;
public TimedWebClient()
{
this.Timeout = 5000;
}
protected override WebRequest GetWebRequest(Uri address)
{
var objWebRequest = base.GetWebRequest(address);
objWebRequest.Timeout = this.Timeout;
return objWebRequest;
}
}
In HTTPWorker I have:
TimedWebClient wclient = new TimedWebClient();
wclient.Proxy = WebRequest.GetSystemWebProxy();
wclient.Headers["Accept"] = "application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*";
wclient.Headers["User-Agent"] = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; MDDC)";
byte[] pagesource = wclient.DownloadData(requestUrl); // DownloadData returns the raw bytes
UTF8Encoding objUTF8 = new UTF8Encoding();
responseData = objUTF8.GetString(pagesource);
I have handled exceptions there.
In Form1 I have a background controller and a URL list.
First implementation:
I took one URL at a time and gave it to the single Controller object to process.
That worked fine, but because it is sequential, it took a long time when the list was large.
Second implementation:
In the DoWork handler of the BackgroundWorker, I created seven controllers and seven threads. Each controller has its own HTTPWorker object. But now it throws exceptions saying "timed out".
Below is the code in Form1.cs, backgroundWorker1_DoWork:
private void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
{
bool done = false;
while (!backgroundWorker1.CancellationPending && !done)
{
int iterator = 1;
int tempiterator = iterator;
Controller[] cntrlrarray = new Controller[numofcontrollers];
Thread[] threadarray = new Thread[numofcontrollers];
int cntrlcntr = 0;
for ( cntrlcntr = 0; cntrlcntr < numofcontrollers; cntrlcntr++)
{
cntrlrarray[cntrlcntr] = new Controller();
}
cntrlcntr = 0;
for (iterator = 1; iterator <= this.urlList.Count; iterator++)
{
int assignedthreads = 0;
for (int threadcounter = 0; threadcounter < numofcontrollers; threadcounter++)
{
cntrlcntr = threadcounter;
threadarray[threadcounter] = new Thread(() => cntrlrarray[cntrlcntr].Process(iterator - 1));
threadarray[threadcounter].Name = this.urlList[iterator - 1];
threadarray[threadcounter].Start();
backgroundWorker1.ReportProgress(iterator);
assignedthreads++;
if (iterator == this.urlList.Count)
{
break;
}
else
{
iterator++;
}
}
for (int threadcounter = 0; threadcounter < assignedthreads; threadcounter++)
{
cntrlcntr = threadcounter;
threadarray[threadcounter].Join();
}
if (iterator == this.urlList.Count)
{
break;
}
else
{
iterator--;
}
}
done = true;
}
}
What is the reason and the solution for this?
Apologies for the length. Thank you in advance.

The sky... it's full of threads! Seriously, though: don't use this many threads. That's what asynchronous I/O is for. If you're using .NET 4.5, this is very easy to do with async/await; otherwise, it takes a bit of boilerplate code, but it's still far preferable to this.
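As a sketch of the async/await approach (C# 7 or later), the helper below keeps at most maxConcurrency downloads in flight using SemaphoreSlim, without blocking a thread per request. The download delegate is an assumption; in practice it could be something like url => httpClient.GetStringAsync(url) with a single shared HttpClient. AsyncDownloader is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncDownloader
{
    // Downloads every distinct URL with at most maxConcurrency requests in flight.
    public static async Task<Dictionary<string, string>> DownloadAllAsync(
        IEnumerable<string> urls, Func<string, Task<string>> download, int maxConcurrency)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = urls.Distinct().Select(async url =>
            {
                await gate.WaitAsync().ConfigureAwait(false); // wait for a free slot
                try
                {
                    string body = await download(url).ConfigureAwait(false);
                    return (url: url, body: body);
                }
                finally
                {
                    gate.Release(); // free the slot even if the download failed
                }
            }).ToList();

            var results = await Task.WhenAll(tasks).ConfigureAwait(false);
            return results.ToDictionary(r => r.url, r => r.body);
        }
    }
}
```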
With that out of the way, the number of TCP connections is quite limited by default. Even if there were a use for having 1000 downloads at once (and there probably isn't, since you're sharing bandwidth), you simply can't create and drop TCP connections willy-nilly: there's a limit on open TCP connections (anywhere from 5 to 20, unless you're on a server). You can change this, but it's usually preferable to do things differently. See this entry. This might also be a problem if this application is not running alone (which it probably isn't, given that you wouldn't have such a problem on server Windows). For example, torrent clients often bump into the half-open connection limit (a half-open connection is one that is still waiting for the end of the initial TCP handshake), which would of course be detrimental to your application.
Now, even if you keep under this limit, there's also a fixed number of local ports available for outbound connections. This is a problem when you quickly open and close TCP connections, because after a connection is closed, TCP keeps it in the TIME_WAIT state for about 4 minutes (to make sure no stray packets arrive at the port, which could be reused in the meantime). This means that if you create enough connections in this time interval, you're going to "starve" your port pool, and every new TCP connection will be denied (so your browser will temporarily stop working, etc.).
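A rough upper bound on how fast you can open new connections follows from dividing the number of usable local ports by the TIME_WAIT duration. The inputs are assumptions: 240 seconds matches the "about 4 minutes" above, and 16,384 is the default dynamic port range (49152-65535) on Windows Vista/Server 2008 and later; older Windows versions use a smaller range. PortBudget is a hypothetical name.

```csharp
// Rough upper bound on the sustained rate of new outgoing TCP connections when each
// closed connection keeps its local port in TIME_WAIT for timeWaitSeconds.
public static class PortBudget
{
    public static double MaxNewConnectionsPerSecond(int usablePorts, double timeWaitSeconds)
    {
        // Each port can be reused at most once per TIME_WAIT period.
        return usablePorts / timeWaitSeconds;
    }
}
```

With those numbers the budget is roughly 68 new connections per second; opening and closing connections faster than that will eventually exhaust the pool.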
Next, a 5-second timeout is pretty low. Really. Imagine that it takes a second to complete a handshake (that's a ping of ~300ms, which is still within the realm of reasonable internet response). Suddenly, you've got a new connection, which has to wait for the other handshakes to finish, and it might take a few seconds just for that. And that's still just the initiation of the connection. Then there's the DNS lookup, and the response of the HTTP server itself... 5 seconds is a low timeout.
In short, it's not the multithreading; it's the massive number of (useless) connections you're opening. Also, for URLs on a single website, you should look into Keep-Alive connections: they can reuse the already opened TCP connection, which significantly mitigates this problem.
Now, to get deeper into this: you're starting and destroying threads needlessly. Instead, it would be a better idea to have a URL queue and several consumer threads that take input from the queue. This way, you'll only have those 7 (or whatever the number) threads polling the queue as long as there's something in it, which saves a lot of system resources (and improves your performance). I suspect that the Thread.Join you're doing might also have something to do with your issues. Even though you're running the thing in a background worker, it's possible that something strange is happening in there.
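The queue-plus-consumers design can be sketched with BlockingCollection<T>. The process delegate is a hypothetical stand-in for the HTTPWorker download, and UrlQueueWorkers is a hypothetical name; the point is that workerCount threads are created once, pull URLs until the queue is completed, and are joined once at the end.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class UrlQueueWorkers
{
    // Processes all URLs with a fixed pool of workerCount threads sharing one queue.
    public static ConcurrentDictionary<string, string> ProcessAll(
        IEnumerable<string> urls, Func<string, string> process, int workerCount)
    {
        var results = new ConcurrentDictionary<string, string>();
        using (var queue = new BlockingCollection<string>())
        {
            var workers = new Thread[workerCount];
            for (int w = 0; w < workerCount; w++)
            {
                workers[w] = new Thread(() =>
                {
                    // Blocks while the queue is empty; ends after CompleteAdding
                    // once the queue has been drained.
                    foreach (string url in queue.GetConsumingEnumerable())
                    {
                        try { results[url] = process(url); }
                        catch (Exception ex) { results[url] = "error: " + ex.Message; }
                    }
                });
                workers[w].IsBackground = true;
                workers[w].Name = "url-worker-" + w;
                workers[w].Start();
            }

            foreach (string url in urls) queue.Add(url); // producer
            queue.CompleteAdding();                      // no more work is coming
            foreach (Thread t in workers) t.Join();      // join once, after all work is done
        }
        return results;
    }
}
```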

ZeroMQ performance issue

I'm having an issue with ZeroMQ, which I believe is because I'm not very familiar with it.
I'm trying to build a very simple service where multiple clients connect to a server and send a query. The server responds to this query.
When I use the REQ-REP socket combination (client using REQ, server binding to a REP socket), I'm able to get close to 60,000 messages per second at the server side (when client and server are on the same machine). When distributed across machines, each new client instance on a different machine linearly increases the messages per second at the server, easily reaching 40,000+ with enough client instances.
Now, the REP socket is blocking, so I followed the ZeroMQ guide and used the rrbroker pattern (http://zguide.zeromq.org/cs:rrbroker):
REQ (client) <----> [server ROUTER -- DEALER --- REP (workers running on different threads)]
However, this completely screws up the performance. I'm getting only around 4000 messages per second at the server when running across machines. Not only that, each new client started on a different machine reduces the throughput of every other client.
I'm pretty sure I'm doing something stupid. I'm wondering if ZeroMQ experts here can point out any obvious mistakes. Thanks!
Edit: Adding code as per advice. I'm using the clrzmq nuget package (https://www.nuget.org/packages/clrzmq-x64/)
Here's the client code. A timer counts how many responses are received every second.
for (int i = 0; i < numTasks; i++) { Task.Factory.StartNew(() => Client(), TaskCreationOptions.LongRunning); }
void Client()
{
using (var ctx = new Context())
{
Socket socket = ctx.Socket(SocketType.REQ);
socket.Connect("tcp://192.168.1.10:1234");
while (true)
{
socket.Send("ping", Encoding.Unicode);
string res = socket.Recv(Encoding.Unicode);
}
}
}
Server - case 1: The server keeps track of how many requests are received per second
using (var zmqContext = new Context())
{
Socket socket = zmqContext.Socket(SocketType.REP);
socket.Bind("tcp://*:1234");
while (true)
{
string q = socket.Recv(Encoding.Unicode);
if (q.CompareTo("ping") == 0) {
socket.Send("pong", Encoding.Unicode);
}
}
}
With this setup, at server side, I can see around 60,000 requests received per second (when client is on the same machine). When on different machines, each new client increases number of requests received at server as expected.
Server Case 2: This is essentially rrbroker from ZMQ guide.
void ReceiveMessages(Context zmqContext, string zmqConnectionString, int numWorkers)
{
List<PollItem> pollItemsList = new List<PollItem>();
routerSocket = zmqContext.Socket(SocketType.ROUTER);
try
{
routerSocket.Bind(zmqConnectionString);
PollItem pollItem = routerSocket.CreatePollItem(IOMultiPlex.POLLIN);
pollItem.PollInHandler += RouterSocket_PollInHandler;
pollItemsList.Add(pollItem);
}
catch (ZMQ.Exception ze)
{
Console.WriteLine("{0}", ze.Message);
return;
}
dealerSocket = zmqContext.Socket(SocketType.DEALER);
try
{
dealerSocket.Bind("inproc://workers");
PollItem pollItem = dealerSocket.CreatePollItem(IOMultiPlex.POLLIN);
pollItem.PollInHandler += DealerSocket_PollInHandler;
pollItemsList.Add(pollItem);
}
catch (ZMQ.Exception ze)
{
Console.WriteLine("{0}", ze.Message);
return;
}
// Start the worker pool; workers can't connect
// to the inproc socket before it is bound.
workerPool.Start(numWorkers);
while (true)
{
zmqContext.Poll(pollItemsList.ToArray());
}
}
void RouterSocket_PollInHandler(Socket socket, IOMultiPlex revents)
{
RelayMessage(routerSocket, dealerSocket);
}
void DealerSocket_PollInHandler(Socket socket, IOMultiPlex revents)
{
RelayMessage(dealerSocket, routerSocket);
}
void RelayMessage(Socket source, Socket destination)
{
bool hasMore = true;
while (hasMore)
{
byte[] message = source.Recv();
hasMore = source.RcvMore;
destination.Send(message, message.Length, hasMore ? SendRecvOpt.SNDMORE : SendRecvOpt.NONE);
}
}
Where the worker pool's start method is:
public void Start(int numWorkerTasks=8)
{
for (int i = 0; i < numWorkerTasks; i++)
{
QueryWorker worker = new QueryWorker(this.zmqContext);
Task task = Task.Factory.StartNew(() =>
worker.Start(),
TaskCreationOptions.LongRunning);
}
Console.WriteLine("Started {0} with {1} workers.", this.GetType().Name, numWorkerTasks);
}
public class QueryWorker
{
Context zmqContext;
public QueryWorker(Context zmqContext)
{
this.zmqContext = zmqContext;
}
public void Start()
{
Socket socket = this.zmqContext.Socket(SocketType.REP);
try
{
socket.Connect("inproc://workers");
}
catch (ZMQ.Exception ze)
{
Console.WriteLine("Could not create worker, error: {0}", ze.Message);
return;
}
while (true)
{
try
{
string message = socket.Recv(Encoding.Unicode);
if (message.CompareTo("ping") == 0)
{
socket.Send("pong", Encoding.Unicode);
}
}
catch (ZMQ.Exception ze)
{
Console.WriteLine("Could not receive message, error: " + ze.ToString());
}
}
}
}

Could you post some source code or at least a more detailed explanation of your test case? In general the way to build out your design is to make one change at a time, and measure at each change. You can always move stepwise from a known working design to more complex ones.

Most probably the 'ROUTER' is the bottleneck.
Check out these related questions on this:
Client maintenance in ZMQ ROUTER
Load testing ZeroMQ (ZMQ_STREAM) for finding the maximum simultaneous users it can handle
ROUTER (and ZMQ_STREAM, which is just a variant of ROUTER) internally has to maintain the client mapping; hence, IMO, it can accept only a limited number of connections from a particular client. It looks like ROUTER can multiplex multiple clients only as long as each client has only one active connection.
I could be wrong here - but I am not seeing much proof to the contrary (simple working code that scales to multi-clients with multi-connections with ROUTER or STREAM).
There certainly is a very severe restriction on concurrent connections with ZeroMQ, though it looks like no one knows what is causing it.

I have done performance testing on calling a native unmanaged DLL function from C# with various methods:
1. C++/CLI wrapper
2. PInvoke
3. ZeroMQ/clrzmq
The last might be interesting for you.
My finding at the end of my performance test was that the ZMQ binding clrzmq was not useful: it produced a factor-of-100 performance overhead, even after I tried to optimize the PInvoke calls within the binding's source code. Therefore, I used ZMQ without a binding, calling it directly through PInvoke. These calls must use the cdecl calling convention and the SuppressUnmanagedCodeSecurity attribute to get the most speed.
I had to import just 5 functions, which was fairly easy.
In the end, the speed was only a bit slower than a plain PInvoke call, but with ZMQ in between (in my case over "inproc").
This may give you the hint to try it without the binding, if speed is interesting for you.
This is not a direct answer for your question but may help you to increase performance in general.
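For reference, direct P/Invoke declarations against libzmq's C API look roughly like this. It is a sketch, not the answerer's code: which five functions they imported is not stated, and the library name "libzmq" is an assumption that depends on how the native DLL is named and deployed. Because byte[] is blittable, the marshaler pins the array and passes a pointer to its first element, so zmq_recv can write into it directly.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Security;

// Minimal P/Invoke surface for the libzmq C API (cdecl), with security checks suppressed.
[SuppressUnmanagedCodeSecurity]
public static class LibZmq
{
    private const string Lib = "libzmq"; // assumption: native library name on your system

    public const int ZMQ_REQ = 3; // socket type constants from zmq.h
    public const int ZMQ_REP = 4;

    [DllImport(Lib, CallingConvention = CallingConvention.Cdecl)]
    public static extern IntPtr zmq_ctx_new();

    [DllImport(Lib, CallingConvention = CallingConvention.Cdecl)]
    public static extern IntPtr zmq_socket(IntPtr context, int type);

    [DllImport(Lib, CallingConvention = CallingConvention.Cdecl, CharSet = CharSet.Ansi)]
    public static extern int zmq_connect(IntPtr socket, string endpoint);

    [DllImport(Lib, CallingConvention = CallingConvention.Cdecl, CharSet = CharSet.Ansi)]
    public static extern int zmq_bind(IntPtr socket, string endpoint);

    [DllImport(Lib, CallingConvention = CallingConvention.Cdecl)]
    public static extern int zmq_send(IntPtr socket, byte[] buf, UIntPtr len, int flags);

    [DllImport(Lib, CallingConvention = CallingConvention.Cdecl)]
    public static extern int zmq_recv(IntPtr socket, byte[] buf, UIntPtr len, int flags);

    [DllImport(Lib, CallingConvention = CallingConvention.Cdecl)]
    public static extern int zmq_close(IntPtr socket);

    [DllImport(Lib, CallingConvention = CallingConvention.Cdecl)]
    public static extern int zmq_ctx_term(IntPtr context);
}
```

The native library is only loaded when one of these methods is first called, so the declarations themselves compile and load without libzmq present.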

Track dead WebDriver instances during parallel task

I am seeing some dead-instance weirdness when running parallelized nested-loop web stress tests using Selenium WebDriver; a simple example would be hitting 300 unique pages with 100 impressions each.
I'm "successfully" getting 4-8 WebDriver instances going using a ThreadLocal<FirefoxWebDriver> to isolate them per task thread, and MaxDegreeOfParallelism on a ParallelOptions instance to limit the threads. I'm partitioning and parallelizing the outer loop only (the collection of pages), and checking .IsValueCreated on the ThreadLocal<> container at the beginning of each partition's "long running task" method. To facilitate cleanup later, I add each new instance to a ConcurrentDictionary keyed by thread id.
No matter what parallelizing or partitioning strategy I use, the WebDriver instances will occasionally do one of the following:
Launch but never show a URL or run an impression
Launch, run any number of impressions fine, then just sit idle at some point
When either of these happen, the parallel loop eventually seems to notice that a thread isn't doing anything, and it spawns a new partition. If n is the number of threads allowed, this results in having n productive threads only about 50-60% of the time.
Cleanup still works fine at the end; there may be 2n open browsers or more, but the productive and unproductive ones alike get cleaned up.
Is there a way to monitor for these useless WebDriver instances and a) scavenge them right away, plus b) get the parallel loop to replace the task segment immediately, instead of lagging behind for several minutes as it often does now?

I was having a similar problem. It turns out that WebDriver doesn't have the best method for finding open ports. As described here, it acquires a system-wide lock on ports, finds an open port, and then starts the instance. This can starve the other instances you're trying to start of ports.
I got around this by specifying a random port number directly in the delegate for the ThreadLocal<IWebDriver> like this:
var ports = new List<int>();
var portLock = new object(); // ThreadLocal factories run concurrently on different threads
var rand = new Random((int)DateTime.Now.Ticks & 0x0000FFFF);
var driver = new ThreadLocal<IWebDriver>(() =>
{
    var profile = new FirefoxProfile();
    int port;
    lock (portLock) // List<int> and Random are not thread-safe
    {
        port = rand.Next(50) + 7050;
        while (ports.Contains(port) && ports.Count != 50) port = rand.Next(50) + 7050;
        ports.Add(port);
    }
    profile.Port = port;
    return new FirefoxDriver(profile);
});
This works pretty consistently for me, although the case where all 50 ports in the range end up in use remains unresolved.
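A thread-safe port picker that also handles the exhausted-range case could look like the class below. PortAllocator is a hypothetical helper: it keeps the 7050-7099 range from the answer behind a lock and throws when every port is in use instead of looping. In the ThreadLocal<IWebDriver> factory you would then set profile.Port = allocator.Next().

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: hands out distinct ports from [firstPort, firstPort + count)
// to concurrently running callers.
public sealed class PortAllocator
{
    private readonly object _sync = new object();
    private readonly Random _random; // Random is not thread-safe; only used under _sync
    private readonly List<int> _free = new List<int>();
    private readonly HashSet<int> _inUse = new HashSet<int>();

    public PortAllocator(int firstPort, int count, int seed)
    {
        _random = new Random(seed);
        for (int p = firstPort; p < firstPort + count; p++) _free.Add(p);
    }

    // Returns a random free port, or throws once every port in the range is taken.
    public int Next()
    {
        lock (_sync)
        {
            if (_free.Count == 0)
                throw new InvalidOperationException("All ports in the range are in use.");
            int i = _random.Next(_free.Count);
            int port = _free[i];
            _free[i] = _free[_free.Count - 1]; // swap-remove keeps this O(1)
            _free.RemoveAt(_free.Count - 1);
            _inUse.Add(port);
            return port;
        }
    }

    // Makes a port available again, e.g. after the driver that used it has quit.
    public void Release(int port)
    {
        lock (_sync)
        {
            if (_inUse.Remove(port)) _free.Add(port);
        }
    }
}
```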

Since there is neither an OnReady event nor an IsReady property, I worked around it by sleeping the thread for several seconds after creating each instance. Doing that seems to give me 100% durable, functioning WebDriver instances.
Thanks to your suggestion, I've implemented IsReady functionality in my open-source project Webinator. Use that if you want, or use the code outlined below.
I tried instantiating 25 instances, and all of them were functional, so I'm pretty confident in the algorithm at this point (I leverage HtmlAgilityPack to see if elements exist, but I'll skip it for the sake of simplicity here):
public void WaitForReady(IWebDriver driver)
{
var js = @"{ var temp=document.createElement('div'); temp.id='browserReady';" +
@"b=document.getElementsByTagName('body')[0]; b.appendChild(temp); }";
((IJavaScriptExecutor)driver).ExecuteScript(js);
WaitForSuccess(() =>
{
IWebElement element = null;
try
{
element = driver.FindElement(By.Id("browserReady"));
}
catch
{
// element not found
}
return element != null;
},
timeoutInMilliseconds: 10000);
js = @"{var temp=document.getElementById('browserReady');" +
@" temp.parentNode.removeChild(temp);}";
((IJavaScriptExecutor)driver).ExecuteScript(js);
}
private bool WaitForSuccess(Func<bool> action, int timeoutInMilliseconds)
{
if (action == null) return false;
bool success;
const int PollRate = 250;
var maxTries = timeoutInMilliseconds / PollRate;
int tries = 0;
do
{
success = action();
tries++;
if (!success && tries < maxTries) // don't sleep after the final attempt
{
Thread.Sleep(PollRate);
}
}
while (!success && tries < maxTries);
return success;
}
The assumption is if the browser is responding to javascript functions and is finding elements, then it's probably a reliable instance and ready to be used.

UploadValuesAsync response time

I am writing a test harness to test an HTTP POST. The test case sends 8 HTTP requests using UploadValuesAsync on the WebClient class at 10-second intervals: it sleeps 10 seconds after every 8 requests. I record the start time and end time of each request. When I compute the average response time, I get around 800 ms. But when I run this test case synchronously using the UploadValues method of WebClient, I get an average response time of 250 ms. Can you tell me why there is a difference between these two methods? I expected a lower response time with async, but I did not get it.
Here is the code that sends 8 requests asynchronously:
var count = 0;
foreach (var nameValueCollection in requestCollections)
{
count++;
NameValueCollection collection = nameValueCollection;
PostToURL(collection,uri);
if (count % 8 == 0)
{
Thread.Sleep(TimeSpan.FromSeconds(10));
count = 0;
}
}
UPDATED
Here is the code that sends the 8 requests synchronously:
public void PostToURLSync(NameValueCollection collection,Uri uri)
{
var response = new ServiceResponse
{
Response = "Not Started",
Request = string.Join(";", collection.Cast<string>()
.Select(col => String.Concat(col, "=", collection[col])).ToArray()),
ApplicationId = collection["ApplicationId"]
};
try
{
using (var transportType2 = new DerivedWebClient())
{
transportType2.Expect100Continue = false;
transportType2.Timeout = TimeSpan.FromMilliseconds(2000);
response.StartTime = DateTime.Now;
var responeByte = transportType2.UploadValues(uri, "POST", collection);
response.EndTime = DateTime.Now;
response.Response = Encoding.Default.GetString(responeByte);
}
}
catch (Exception exception)
{
Console.WriteLine(exception.ToString());
}
response.ResponseInMs = (int)response.EndTime.Subtract(response.StartTime).TotalMilliseconds;
responses.Add(response);
Console.WriteLine(response.ResponseInMs);
}
Here is the code that posts to the HTTP URI:
public void PostToURL(NameValueCollection collection,Uri uri)
{
var response = new ServiceResponse
{
Response = "Not Started",
Request = string.Join(";", collection.Cast<string>()
.Select(col => String.Concat(col, "=", collection[col])).ToArray()),
ApplicationId = collection["ApplicationId"]
};
try
{
using (var transportType2 = new DerivedWebClient())
{
transportType2.Expect100Continue = false;
transportType2.Timeout = TimeSpan.FromMilliseconds(2000);
response.StartTime = DateTime.Now;
transportType2.UploadValuesCompleted += new UploadValuesCompletedEventHandler(transportType2_UploadValuesCompleted);
transportType2.UploadValuesAsync(uri, "POST", collection,response);
}
}
catch (Exception exception)
{
Console.WriteLine(exception.ToString());
}
}
Here is the upload completed event handler:
private void transportType2_UploadValuesCompleted(object sender, UploadValuesCompletedEventArgs e)
{
var now = DateTime.Now;
var response = (ServiceResponse)e.UserState;
response.EndTime = now;
response.ResponseInMs = (int) response.EndTime.Subtract(response.StartTime).TotalMilliseconds;
Console.WriteLine(response.ResponseInMs);
if (e.Error != null)
{
response.Response = e.Error.ToString();
}
else
if (e.Result != null && e.Result.Length > 0)
{
string downloadedData = Encoding.Default.GetString(e.Result);
response.Response = downloadedData;
}
//Recording response in Global variable
responses.Add(response);
}

One problem you're probably running into is that .NET, by default, throttles outgoing HTTP connections to the limit recommended by the HTTP/1.1 specification (RFC 2616): 2 concurrent connections per remote host. Assuming 2 concurrent connections and 250 ms per request, the response time for your first 2 requests will be 250 ms, for the next 2 it will be 500 ms, then 750 ms, and for the last 2 it will be 1000 ms. This would yield a 625 ms average response time, which is not far from the 800 ms you're seeing.
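The 625 ms estimate comes from requests queuing behind the connection limit in waves. The sketch below reproduces that arithmetic, assuming every request takes the same service time and all requests are issued at once; ThrottleModel is a hypothetical name.

```csharp
// Hypothetical model: `requests` identical requests are issued at once, only `connections`
// of them run concurrently, and each takes `serviceMs`. Request i (0-based) completes in
// wave i / connections + 1, so its observed response time is that wave number * serviceMs.
public static class ThrottleModel
{
    public static double AverageResponseMs(int requests, int connections, double serviceMs)
    {
        double total = 0;
        for (int i = 0; i < requests; i++)
        {
            total += (i / connections + 1) * serviceMs; // integer division groups requests into waves
        }
        return total / requests;
    }
}
```

AverageResponseMs(8, 2, 250) gives 625, and raising the connection limit to 8 gives 250, the same as the synchronous measurement.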
To remove the throttling, increase ServicePointManager.DefaultConnectionLimit to the maximum number of concurrent connections you want to support, and you should see your average response time go down a lot.
A secondary problem may be that the server itself is slower at handling multiple concurrent connections than at handling one request at a time. Even once you remove the throttling above, I'd expect each of the async requests to execute, on average, somewhat slower than if the server were only executing one request at a time. How much slower depends on how well the server is optimized for concurrent requests.
A final problem may be caused by the test methodology. For example, if your test client is simulating a browser session by storing cookies and re-sending them with each request, it may run into problems with some servers that serialize requests from a single user. This is often a simplification for server apps so they don't have to deal with locking cross-request state like session state. If you're running into this problem, make sure that each WebClient sends different cookies to simulate different users.
I'm not saying that you're running into all three of these problems (you might only be running into 1 or 2), but these are the most likely culprits for the problem you're seeing.

As Justin suggested, I tried increasing ServicePointManager.DefaultConnectionLimit, but that did not fix the issue. I was not able to reproduce the other problems Justin suggested; I am not sure how to reproduce them in the first place.
I then ran the same code on a peer machine, where it produced the response times I expected. The difference between the two machines is the operating system: mine runs Windows Server 2003, and the other runs Windows Server 2008.
Since it worked on the other machine, I suspect it might be one of the problems Justin described, server settings on 2003, or something else. I did not spend much time digging into it after that: since this is a test harness, the issue was low priority, and we had no further time for it.
As I have no clue what exactly fixed it, I am not accepting any answer other than this one, because at the very least I know that switching to Server 2008 fixed the issue.
