How to maximize http.sys file upload performance

How to maximize http.sys file upload performance - c#

I'm building a tool that transfers very large streaming data sets (possibly on the order of terabytes in a single stream; routinely in the tens of gigabytes) from one server to another. The client portion of the tool will read blocks from the source disk, and send them over the network. The server side will read these blocks off the network and write them to a file on the server disk.
Right now I'm trying to decide which transport to use. Options are raw TCP, and HTTP.
I really, REALLY want to be able to use HTTP. The HttpListener (or WCF if I want to go that route) make it easy to plug in to the HTTP Server API (http.sys), and I can get things like authentication and SSL for free. The problem right now is performance.
I wrote a simple test harness that sends 128K blocks of NULL bytes using the BeginWrite/EndWrite async I/O idiom, with async BeginRead/EndRead on the server side. I've modified this test harness so I can do this with either HTTP PUT operations via HttpWebRequest/HttpListener, or plain old socket writes using TcpClient/TcpListener. To rule out issues with network cards or network pathways, both the client and server are on one machine and communicate over localhost.
On my 12-core Windows 2008 R2 test server, the TCP version of this test harness can push bytes at 450MB/s, with minimal CPU usage. On the same box, the HTTP version of the test harness runs between 130MB/s and 200MB/s depending upon how I tweak it.
In both cases CPU usage is low, and the vast majority of what CPU usage there is is kernel time, so I'm pretty sure my usage of C# and the .NET runtime is not the bottleneck. The box has two 6-core Xeon X5650 processors, 24GB of single-ranked DDR3 RAM, and is used exclusively by me for my own performance testing.
I already know about HTTP client tweaks like ServicePointManager.MaxServicePointIdleTime, ServicePointManager.DefaultConnectionLimit, ServicePointManager.Expect100Continue, and HttpWebRequest.AllowWriteStreamBuffering.
Does anyone have any ideas for how I can get HTTP.sys performance beyond 200MB/s? Has anyone seen it perform this well on any environment?
UPDATE:
Here's a bit more detail on the performance I'm seeing with TcpListener vs HttpListener:
First, I wrote a TcpClient/TcpListener test. On my test box that was able to push 450MB/s.
Then using reflector I figured out how to get the raw Socket object underlying HttpWebRequest, and modified my HTTP client test to use that. Still no joy; barely 200MB/s.
My current theory is that http.sys is optimized for the typical IIS use case, which is lots of concurrent small requests, and lots of concurrent and possibly large responses. I hypothesize that in order to achieve this optimization, MSFT had to do so at the expense of what I'm trying to accomplish, which is very high throughput on a single very large request, with a very small response.
For what it's worth, I also tried up to 32 concurrent HTTP PUT operations to see if it could scale out, but there was still no joy; about 200MB/s.
Interestingly, on my development workstation, which is a quad-core Xeon Precision T7400 running 64-bit Windows 7, my TcpClient implementation is about 200MB/s, and the HTTP version is also about 200MB/s. Once I take it to a higher-end server-class machine running Server 2008 R2, the TcpClient code gets up to 450MB/s, while the HTTP.sys code stays around 200.
At this point I've sadly concluded that HTTP.sys is not the right tool for the job I need done, and will have to continue to use the hand-rolled socket protocol we've been using all along.

I can't see too much of interest except for this Tech Note. It might be worth having a fiddle with MaxBytesPerSend

If you're going to send files over the LAN then UDP is the way to go, because TCP's overhead is a waste in that case. TCP provides rate limiting to avoid too many lost packets, whereas with UDP the application has to sort that out by itself. NFS would do the job, were it not that you're stuck with windows; but I'm sure there must be ready made UDP stuff. Also use the tool "iperf" (available on linux, probably also windows) to benchmark the network link irrespective of the protocol. Some network cards are plain crap and rely on the CPU too much, which will limit your speed to 200mbit. You want a proper network card with its own processors (don't know the exact terms to put this).

Related

Performance send / receive data via TCP socket Client-Server application

I'm developing a system for exchanging data between client and server application using sockets via TCP.
The server was written long ago in C++ (by a third part), the client instead is the application that I'm still developing in C#. The bases have already been completed. Both applications are communicating properly. The problem is that data is exchanged (in my opinion) too slowly.
I emphasize in my opinion because I have no idea what you can actually expect using sockets via TCP. Currently, and by eye, after several tests, the communication speed is an average of 200/250 KB per second, both in reception and in sending.
There are applications that allow you to evaluate the speed of data exchange between two endpoints on a specific port? What speed should I expect?
More specifically, the client and the server are developed for the file sharing. Currently, the client can receive and send one file at a time. The protocol would do otherwise (receive/send multiple files at once), but for security reasons I preferred otherwise (if you're receiving more than one file and you lose the connection you'll lose all files you're receiving).
Change this feature could significantly affect the speed of data exchange? In what way?

First, you have to know the bandwith of the link your host provides you to know what to expect. If you are hosting it yourself, your upload speed may be way slower than the download speed link. You can log into your server and run a speed test.
I had a situation similar to yours (developed a server and a client to send binary data to it) and by that time (2-3 years ago) I used a version of this tool to measure the speed of the data exchange. You would install it on your server, set up the monitor (ports you want to watch, etc), configure the charts and let it run for some time. It worked really well. You can look for other similar tools searching for "Bandwidth Traffic Monitor".
If you allow your application to exchange multiple files at the same time, have in mind that your link speed will be divided among the connections and, depending on how many clients are using your application, the multiple connections and file writes your server will be doing may be limited by its processing capacity, in addition to the bandwith.

What speed should I expect?
The full speed the slowest part of the network between the two nodes can transfer, minus the bandwidth of existing traffic over that slowest part. On a 100 Mbit LAN, I would expect a one-way transfer speed of 10 MB/s.
Change this feature could significantly affect the speed of data exchange? In what way?
That totally depends on the protocol being used and the client and server implementation. You'll have to benchmark where the hard work is being done, where your bottleneck is.

Does Localhost perform better on windows?

I have a .NET 3.5 server application that usually has about 8 clients. I'm using System.Net.Sockets for all the networking.
I've been told that if a client is running on the same box, it should use localhost:<port> or 127.0.0.1:<port> instead of the machine's ip or name for better performance. Several people at work have said that this skips some layers of the tcp stack.
But I'm not able to see any performance difference at all in testing (timing how long it takes to get a ping packet from server to client using System.Diagnostics.Stopwatch).
Should there really be better performance in theory?

No, performance is same in both cases. If you are using your local device ip address, your operating system kernel don't transport your packets data to your network device and this data don't be was sended anywhere then you don't have any ISO layers calculations (encapsulation, decapsulation etc).
I belive the OS will see this is a local device and you treat it like it was 127.0.0.1. So in fact both will have the same effect.

I suppose it's possible that there will be an extremely tiny performance boost in using 127.0.0.1 (though I doubt it), but with 8 clients you'll never notice it. That performance difference would have to be aggregate over A LOT of traffic to become at all noticeable.
The larger concern would be which value is better from a maintenance perspective. If the application is always looking at localhost for external dependencies, it won't do well if run on another host. But if it's looking for a more universally understood address for those dependencies, it'll find them from anywhere on the network.

How to send information fast like many games do?

I'm thinking like the methods games like Counter Sstrike, WoW etc uses. In CS you often have just like 50 ping, is there any way to send information to an online MySQL database at that speed?
Currently I'm using an online PHP script which my program requests, but this is really slow, because the program first has to send headers and post-information to it, and then retrieve the result as an ordinary webpage.
There really have to be any easier, faster way of doing this? I've heard about TCP/IP, is this what I should use here? Is it possible for it to connect to the database in a faster way than indirectly via the PHP script?

TCP/IP is made up of three protocols:
TCP
UDP
ICMP
ICMP is what you are using when you ping another computer on a network.
Games, like CounterStrike, don't care about what you previously did. So there's no requirement for completeness, to be able to reconstruct what you did (which is why competitors have to tape what they are doing). This is what UDP is used for - there's no guarantee that data is delivered or received. Which is why lag can be such a problem - you're already dead, you just didn't know it.
TCP guarantees that data is sent and received. Slower than UDP.
There are numerous things to be aware of to have a fast connection - less hops, etc.

Client-to-server for latency-critical stuff? Use non-blocking UDP.
For reliable stuff that can be a little slower, if you use TCP make sure you do so in a non-blocking fashion (select(), non-blocking send, etc.).
The big reason to use UDP is if you have time-sensitive data - if the position of a critter gets dropped, you're better off ignoring it and sending the next position packet rather than re-sending the last one.
And I don't think any high-performance game has each and every call resolve to a call to the database. It's more common to (if a database is even used) persist data occasionally, or at important events.
You're not going to implement Counterstrike or anything similar on top of http.

Most games like the ones you cite use UDP for this (one of the TCP/IP suite of protocols.) UDP is chosen over TCP for this application since it's lighter weight allowing for better performance and TCP's reliability features aren't necessary.
Keep in mind though, those games have standalone clients and servers usually written in C or C++. If your application is browser-based and you're trying to do this over HTTP then use a long-lived connection and strip back the headers as much as possible, including cookies. The Tornado framework may be of interest to you there. You may also want to look into HTML5 WebSockets however widespread support is still a fair way off.
If you are targeting a browser-based plugin like Flash, Java, SilverLight then you may be able to use UDP but I don't know enough about those platforms to confirm.
Edit:
Also worth mentioning: once your networking code and protocol is sufficiently optimized there are still things you can do to improve the experience for players with high pings.

Socket.BeginReceive Performance on Mono

I'm developing a server in C#. This server will act as a data server for a backup service: a client will send data, a lot of data, continuously, specifically will send data chunk of files, up to five, in the same tcp channel. I'll send data to the server slowly, i don't want to kill customer bandwidth, so i didn't need to speed up at max data send and, for this reason, i can use a single tcp channel for everything.
Said this, actually the server uses BeginReceive method to acquire data from client and, on windows, this means IOCP. My questions is: how BeginReceive will perform on linux/freebsd trough mono? On windows, i've read a lot of stuff, will perform very well but this software, the server part, will run on linux or freebsd trough mono and i don't know how these methods are implemented on it!
More, to try to reduce continue allocations of an Async State object for the (Begin|End)Receive method i mantain one for the tcp connection and in the BeginReceive callback i copy out data before reuse it (naturally i don't clear data in because i know how much read trough EndReceive return value). Buffer is set on 8kb so i'll at max copy out 8kb of data, it shouldn't kill resoruces.
My target is to get up to 400/500 connections at max. It isn't so much, but the server (machine), in the meantime, will handle files trough an own filesystem (developed using fuse first in C# and later in C) on LVM+Linux Software Raid Mirror and antivirus check using clamav so the software must be light as can!
EDIT: I forgot to say that the machine will be (probably) a Intel Core 2 Duo 2.66+ GHz (3 MB L2 - FSB 1066 MHz) with 2 GB of ram and the SO using 64 bits.
Is mono using epoll (libevent) or kqueue (on freebsd)? And I should do something specific to try to maximize performances? Can I do something more to don't kill resources receiving data packets?

I know it's a little late, but I just found this question...
Mono is able to handle the number of connections that you need and much more. I regularly test xsp2 (the Mono ASP.NET standalone server) with over 1k simultaneous connections.. If this is going to be a high load situation, you should play a bit with setting MONO_THREADS_PER_CPU until you find the right number of threads for the ThreadPool.
On linux, Mono uses epoll when available (which is always these days).

I can't speak specifically about the performance of that one function on mono, but in general mono performs very well these days. 4-500 connections is as you say, not very many, so I doubt you'd have any issues.
In saying that, it shouldn't be very hard to set a test for this kind of thing up. I think that's probably the only way you'll get a conclusive answer for your situation.

Monitoring (network) resource utlization and performance of a windows application

I am building a client-server based solution; client being a desktop application and the server being a web application.
Basically, I need to monitor the performance and resource utilization of the client, which is a .NET 2.0 based Windows Desktop application.
The most important thing I need to monitor is the network resources the client uses, i.e. what is the size of the data that flows out from the client to the server and what is the size of the data that the client downloads from the server.
Apart from this, general performance monitoring would help too.
Please guide.
Edit: A few people have suggested using perfmon, but aren't the values shown in perfmon system-wide? I need these network based stats for a single application only...bytes being sent and received by a single desktop application.

The standard tool for network monitoring is Wireshark.
It allows you to filter the network traffic very flexiblely.
This could be quite an overkill for your application though.
If you are using pure .NET, I would suggest that you add performance logging into your networking classes on the server side- if you are using .Net library classes, then inheritate from them your own classes which add statistics when sending and receiving data.

You need to split your monitoring in two parts:
How the system interacts with the server (number of calls performed)
Amount of network traffic (size of exchanged data for any call)
The first part is (in my experience) often negleted while it has a lot of importance, because acquiring a new connection is often much more expensive that data traffic in itself.
You do not tell us anything about the king of connection you're using (low level tcpip calls, web services, WCF or what else) but my suggestion is:
Find a way to determine how many time your application calls the server
Find how much any single call is costing in term of data exchanged
How to monitor these values depends a lot from the technology involved, for some is very simple (if, for example, you're using a web service, setting up Fiddler to monitor the calls and examining an monitoring results is very simple), for other you need to work using a low level traffic analyzer like Wireshark or MS Network Monitor and learn how to filter traffic according to IP address of the server, ports used and other parameters.
If you clarify your application architecture I can try to be more specific.
Regards
Massimo

You can also use Task Manager to do this. Go to the processes tab, then View->"select columns". Check "I/O read bytes" and "I/O write bytes". Then find your program in the processes list and you can observe the cumulative values.

Take a look at this article: http://www.codeproject.com/KB/IP/apptraffwatcher.aspx
You may be able to tear apart the source code, and grab what you need to meassure download/upload for your application's process ID.
It looks like he uses this library to get information about the amount of traffic: http://www.codeproject.com/KB/IP/trafficwatcher.aspx

I tried the perfmon and I was unable to watch our network traffic either. But I was able to in the Performance Explorer in Visual Studio 2005 Team suite.
If you have Team edition Visual Studio you can set up either Sampling/Instrumentation on your desktop application. Then go into options of the session. select Events -> Windows Kernel Trace -> Network. Run your application and let the Visual studio log the data. Then save the report. (I love Microsoft for this "feature") go to the command prompt, navigate to C:\Program Files\Microsoft Visual Studio 8\Team Tools\Performance Tools and run "vsperfreport /CALLTRACE (filename).vsp" This will produce a csv file containing all network packets sent/recieved/size/port etc by the desktop application.
I know this was a long winded solution but I just tried it on my .Net 2.0 application and it captured all of our communication with Oracle Identity Manager and Oracle Database.

It is not clear by your post if you are using HTTP requests. You indicated that the server is a web application, which implies (perhaps incorrectly) to me that you might be using the HTTP protocol to send/receive data from server to client.
If so, one tool that might be of use is Fiddler. This tool will monitor all HTTP traffic in and out of your workstation and it can (I believe) watch specific sessions and applications. The nice part is that you can see individual requests and see the statistics for these requests, including bytes in/out, round trip times, and other useful bits of information.
If you are not HTTP based, then this tool won't help.

I'm surprised nobody has suggested SysInternals (now Microsoft) Process Explorer (technet.microsoft.com/en-us/sysinternals/bb896653.aspx). If you right click on the executable in question and left click properties it will bring up a dialog box. Then you switch to the performance tab and you can monitor I/O of the executable. The Performance Graph tab will show CPU usage and I/O bytes history graphed over time. It's a cool and free tool.

You want to look at perfmon (otherwise called Performance Monitor in admin tools off the start menu).
Open it in its default graph view, add a counter, select network interface, then bytes per second (or a similar counter), click ok and you're done.
You can experiment with the other networking counters as there are many, one of them will do exactly what you want. You can also save the perfmon logs to a file and view them afterwards - you'll see the graph in its entirety and you can "zoom in" on sections. Alternatively, you can save log-style files with just raw numbers.
Here's a quick guide through perfmon as an admin tool, once you understand that, the rest comes easily.
In Vista you can't add individual counters any more, you add the entire set of counters grouped under an object - so for my example, you'd add the Network Interface object, then you'd see all the individual counters on the graph after you click ok.

If you want this built into your client codebase, and not using an external tool, you can use Performance Counters to get access to this and most other things reported by the Performance Monitor, Task Manager, etc.

You should check out ACE Analyst for this use case - think of it as a superintelligent layer on top of Wireshark packet captures. You need to look at the packets to understand the true nature of the application behavior as runs across the network.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.