I have a WPF app that makes some WCF calls (about 5-6 per minute). It has about 100 users. These calls come in bursts (The user presses save, that calls a WCF "Broker" service, which then calls several other WCF Services.)
I was looking into duplex communication and I saw that WCF can support TCP communication. I also saw that IIS 7 can support TCP hosting.
From what I have read, there can be some performance gains by using TCP.
But my understanding of TCP is that it is more for systems that are going to be making many hundreds of calls per minute.
Would my less chatty system see real benefits from taking the time to switch from HTTP to TCP?
As a matter of opinion, I would say that if your current system works well and you're not experiencing any particular problem using HTTP, then you probably shouldn't change it. Why would you inject uncertainty into your project for no particular reason?
If you're making five or six calls per minute, then I can't see how converting to TCP will gain you much. Sure, your data transmission time will be slightly less, but what's the point? If your messages are huge--megabytes in size--then I might worry about improving data transmission speed. Otherwise, there's just no point to it.
Now, if you expect that your traffic will increase a thousandfold in the near future, then you probably should look at converting to TCP rather than HTTP. Beyond that, I'd recommend that you spend your time and effort on improvements that add value to your product.
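To put rough numbers on that, here is a back-of-the-envelope sketch using the figures from the question (100 users, up to 6 calls per minute each). The per-call saving of 5 ms is purely a placeholder assumption, not a measured difference between HTTP and TCP bindings; substitute whatever you actually measure.

```python
# Rough load estimate for the scenario in the question.
# ASSUMED_SAVING_PER_CALL_MS is a hypothetical placeholder, not a measurement.

USERS = 100
CALLS_PER_USER_PER_MINUTE = 6        # upper end of "5-6 per minute"
ASSUMED_SAVING_PER_CALL_MS = 5.0     # assumption: replace with a measured value


def aggregate_calls_per_second(users: int, calls_per_minute: float) -> float:
    """Total call rate arriving at the broker service across all users."""
    return users * calls_per_minute / 60.0


def saving_per_user_per_minute_ms(calls_per_minute: float, saving_ms: float) -> float:
    """Time one user would save per minute if every call got faster by saving_ms."""
    return calls_per_minute * saving_ms


rate = aggregate_calls_per_second(USERS, CALLS_PER_USER_PER_MINUTE)
saved = saving_per_user_per_minute_ms(CALLS_PER_USER_PER_MINUTE, ASSUMED_SAVING_PER_CALL_MS)

print(f"aggregate load: {rate:.1f} calls/s")
print(f"time saved per user: {saved:.0f} ms per minute")
```

Even with the broker fanning each call out to several other services, the total stays in the tens of calls per second, which is why the transport is unlikely to be the thing worth optimizing here.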
I am looking for ideas on how to speed up message transfers through RabbitMQ.
I installed the latest version on 64-bit Windows and run the server on my local machine, where I also publish and consume through a C# implementation. I initially maxed out at 40,000 messages per second, which is impressive but does not suit my needs. (I am competing with a custom binary reader that can handle 24 million unparsed 16-byte arrays per second; obviously I don't expect to get close to that, but I am trying to improve at least.) I need to send around 115,000,000 messages as fast as possible. I do not want to persist the data, and the connection is going to be direct to a single consumer. I then packed my 16-byte arrays into larger chunks and published those onto the bus, without any improvement: the transfer rate maxed out at 45 MB/second. I find this very slow, given that in the end it should boil down to raw transmission speed: I could create byte arrays several megabytes in size, at which point the cost of routing by the exchange becomes negligible compared with raw transmission. Why does my message bus max out at a transfer speed of 45 MB/second?
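For scale, here is a quick sketch of the arithmetic these figures imply. It assumes that the 45 figure means 45 megabytes per second with 1 MB = 10^6 bytes, and that both observed rates are sustained; neither assumption is confirmed above.

```python
# Arithmetic implied by the figures in the question.
# Assumptions: 45 MB/s with 1 MB = 10**6 bytes, and sustained rates.

MESSAGES = 115_000_000
MESSAGE_BYTES = 16
OBSERVED_MSG_RATE = 40_000           # messages per second, one message per array
OBSERVED_BYTE_RATE = 45 * 10**6      # bytes per second when sending large chunks

total_bytes = MESSAGES * MESSAGE_BYTES

# Payload throughput when every 16-byte array is its own message.
unbatched_payload_rate = OBSERVED_MSG_RATE * MESSAGE_BYTES

# Time to move the whole data set under each regime.
seconds_unbatched = MESSAGES / OBSERVED_MSG_RATE
seconds_chunked = total_bytes / OBSERVED_BYTE_RATE

print(f"total payload: {total_bytes / 10**9:.2f} GB")
print(f"unbatched payload rate: {unbatched_payload_rate / 10**6:.2f} MB/s")
print(f"unbatched transfer: {seconds_unbatched / 60:.1f} minutes")
print(f"chunked transfer: {seconds_chunked:.1f} seconds")
```

So chunking already moves the job from roughly 48 minutes to roughly 41 seconds; the open question is only why the chunked rate itself does not go higher.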
Bump...and update: I have not seen any answer to this question in a long time, and I am a bit surprised that not a single RabbitMQ developer chimed in. I played extensively with RabbitMQ and ZeroMQ, and decided that RabbitMQ is not up to the task when it comes to high-throughput, in-process messaging. The broker implementation, and especially the parsing logic, is a major bottleneck to improving throughput. I dropped RabbitMQ from my list of possible options.
There was a white paper describing how they provided a solution for managing low latency, high throughput options financial data streams, but it sounds to me as if all they did was throw hardware at the problem rather than provide a solution that targets low latency, high throughput requirements.
ZeroMQ did a superb job once I had studied the documentation more intensively. I can run communication in-process, and it provides stable enough push/pull, pub/sub, req/rep, and pair/pair patterns, which I need. I was looking for blocking logic within the pub/sub pattern, which ZeroMQ does not provide (it drops messages instead when a high watermark is exceeded), but the push/pull pattern does provide blocking. So pretty much everything I needed is provided for. The only gripe I have is with their understanding of event processing; the event structure implementation through poll/multiplex is not very satisfactory.
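The difference between dropping and blocking at a high watermark can be illustrated without ZeroMQ at all. The sketch below is only an analogy built on Python's standard bounded `queue.Queue`; it is not ZeroMQ code, and the names `HIGH_WATER_MARK`, `publish_dropping`, `push_blocking`, and `drain` are made up for the illustration.

```python
import queue
import threading

HIGH_WATER_MARK = 3  # hypothetical and deliberately tiny


def publish_dropping(q: queue.Queue, messages) -> int:
    """Pub/sub-like behaviour: if the queue is full, drop the message."""
    dropped = 0
    for message in messages:
        try:
            q.put_nowait(message)
        except queue.Full:
            dropped += 1
    return dropped


def push_blocking(q: queue.Queue, messages) -> None:
    """Push/pull-like behaviour: put() blocks until the consumer frees a slot."""
    for message in messages:
        q.put(message)
    q.put(None)  # sentinel telling the consumer to stop


def drain(q: queue.Queue, out: list) -> None:
    """Consumer: collect messages until the sentinel arrives."""
    while True:
        message = q.get()
        if message is None:
            break
        out.append(message)


messages = list(range(10))

# Dropping: no consumer is running, so everything past the watermark is lost.
drop_queue = queue.Queue(maxsize=HIGH_WATER_MARK)
dropped = publish_dropping(drop_queue, messages)

# Blocking: the producer stalls whenever the queue is full; nothing is lost.
block_queue = queue.Queue(maxsize=HIGH_WATER_MARK)
received = []
consumer = threading.Thread(target=drain, args=(block_queue, received))
consumer.start()
push_blocking(block_queue, messages)
consumer.join()

print(f"dropped: {dropped}, received with blocking: {len(received)}")
```

The trade-off is the usual one: dropping keeps a fast producer fast at the cost of losing data, while blocking preserves every message at the cost of throttling the producer to the consumer's pace.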
I've got a WPF application written in C#. It has to instantiate thousands of objects. After pulling data from the database server, it has to run a ton of calculations that take time. The whole process takes up to 20-30 seconds, with 80% of that coming from the calculations.
So to help resolve this issue, I wrote a WCF service that keeps a copy of the already instantiated objects with the calcs already run, and then upon request, transfers the instantiated objects to the calling client.
It works! However, it's slow...really slow. Much slower than the original way. It takes 3-4 minutes to transfer all the objects from the WCF service, thus defeating its purpose.
I've tried streaming instead of buffering the service and increasing or decreasing the different service options in the client and server config files, but haven't found settings that make a real difference yet.
Is this slow speed to be expected, or should it be fast and I just need to modify some options? If so, what options?
WCF isn't necessarily slow but if the application isn't designed properly, the application can be slow. It could be compared to loading up a few thousand pounds of weight on a sports car. The car is a fast car, but it isn't really being used properly.
First, I would say you have to minimize the amount of data that is being sent on the wire (more about this later). Once on the wire, you'll get a lot better performance if you use TCP or named pipes instead of HTTP. See Choosing a Transport. HTTP is easy since most networks are configured to let it pass easily, but it isn't designed for large data sets.
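As a rough illustration of how much the wire format alone can matter, the sketch below serializes the same records as text (JSON) and as a fixed binary layout. The record shape (an integer id plus a floating-point value) is a hypothetical stand-in, since the question does not say what the objects contain, and Python's `json` and `struct` modules are used here only for the comparison; they are not what WCF uses internally.

```python
import json
import struct

# Hypothetical record shape: (int id, float value). Not taken from the question.
records = [(i, i * 0.5) for i in range(1000)]

# Text encoding: field names and punctuation are repeated for every record.
as_json = json.dumps([{"id": i, "value": v} for i, v in records]).encode("utf-8")

# Binary encoding: "<id" is a little-endian 4-byte int followed by an
# 8-byte double, so exactly 12 bytes per record with no padding.
RECORD_FORMAT = "<id"
as_binary = b"".join(struct.pack(RECORD_FORMAT, i, v) for i, v in records)

# Round-trip check: the binary form carries the same information.
decoded = list(struct.iter_unpack(RECORD_FORMAT, as_binary))

print(f"JSON:   {len(as_json)} bytes")
print(f"binary: {len(as_binary)} bytes")
print(f"round trip ok: {decoded == records}")
```

Sending fewer fields, and only the records the client actually needs, usually helps more than any encoding change, but the size gap between text and binary encodings is worth measuring for thousands of objects.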
If the delay is coming from the calculations, then the only thing the WCF service will accomplish is offloading the processing from the server to the client. Ultimately this might be a good thing - or even necessary - if you plan on having a high volume of concurrent requests to the server but as you have noticed, it doesn't necessarily mean shorter times for the end user. What you should focus on doing is minimizing the calculation time.
It is hard to give specifics since you haven't revealed much about what is being queried, what is being returned, and what the calculations are doing. However, I have had impressive results with large data sets by offloading code from the application server to the database server via Visual Studio SQL Server Projects. Since SQL Server can host the CLR, you can write database objects (like user-defined functions) in C# or VB or any other CLR language and deploy them directly into the database. Then you can use these functions in your queries, and they can be very fast since they run inside the database engine, next to the data. I've seen orders-of-magnitude differences between running C# in the application and running the same function in the database.
If 80% of your application's work comes from the calculations, then it might be a great idea to parallelize some parts of it, for example with the Task Parallel Library.
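The pattern in question is "split independent calculations across a pool of workers, then collect the results in order", which is what `Parallel.ForEach` in the Task Parallel Library gives you in C#. Below is a minimal sketch of the same pattern in Python, where `expensive_calculation` is a hypothetical stand-in for the per-object work. Note the assumption baked in: the calculations must be independent of each other. Also note that in CPython a thread pool does not speed up CPU-bound work; `ProcessPoolExecutor` is the closer equivalent there, and the thread pool is used only to keep the sketch short.

```python
from concurrent.futures import ThreadPoolExecutor


def expensive_calculation(n: int) -> int:
    """Hypothetical stand-in for the calculations run on one object."""
    return sum(i * i for i in range(n))


inputs = [1_000, 2_000, 3_000, 4_000]

# Fan the independent calculations out to a pool of workers.
# pool.map returns results in the same order as the inputs.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_results = list(pool.map(expensive_calculation, inputs))

# The sequential version, for comparison of results (not of speed).
sequential_results = [expensive_calculation(n) for n in inputs]

print(parallel_results == sequential_results)
```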
I have an application that performs analysis on incoming event flow (CEP engine).
This flow can come from different sources (database, network, etc...).
For efficient decoupling, I want this service to expose a named pipe using wcf, and allow a different application to read the data from the source and feed it into the service.
So, one process is in charge of getting and handling the incoming data while the other for analyzing it, connecting the two using wcf with named pipes binding. They both will be deployed on the same machine.
The question is: will I notice lower throughput using WCF in the middle than if I had simply coupled the two services into a single process and used regular events?
In modern mainstream operating systems, IPC will never be, can never be, as fast as in-process eventing. The reason for this is the overhead of context switching associated with activating different processes. Even on a multi-core system where distinct processes run on distinct cores, and each runs independently (so there is no cost associated with activating one process versus another, since both are always active), the communication across processes still requires crossing security boundaries, hitting the network stack (even if using pipes), and so on. Where a local function call will be on the order of 1000's of CPU cycles to invoke, an IPC will be millions.
So IPC will be slower than in-process communication. Whether that actually matters in your case is a different question. For example, suppose you have an operation that requires a Monte Carlo simulation that runs for 2 hours. In this case it really doesn't matter whether it takes 1 ms or 1000 ms to invoke the operation.
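A small sketch of that reasoning: what matters is the invocation overhead as a fraction of total time per operation. The 2-hour case uses the numbers above; the second, per-event case uses made-up figures (100 microseconds of IPC overhead around 10 microseconds of work) purely to show the opposite extreme.

```python
def overhead_fraction(call_overhead_s: float, work_s: float) -> float:
    """Share of total time per operation spent on invoking it."""
    return call_overhead_s / (call_overhead_s + work_s)


TWO_HOURS_S = 2 * 60 * 60

# The Monte Carlo example: 1 ms versus 1000 ms to invoke a 2-hour job.
fast_invoke = overhead_fraction(0.001, TWO_HOURS_S)
slow_invoke = overhead_fraction(1.0, TWO_HOURS_S)

# Hypothetical per-event case: tiny work items, each crossing a process boundary.
per_event = overhead_fraction(100e-6, 10e-6)

print(f"2-hour job, 1 ms invoke:    {fast_invoke:.7%}")
print(f"2-hour job, 1000 ms invoke: {slow_invoke:.4%}")
print(f"10 us event, 100 us invoke: {per_event:.1%}")
```

If each event crosses the process boundary individually, the overhead can dominate; batching many events per call moves the ratio back toward the first case.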
Usually, performance of the communication is not what you want to optimize for. Even if performance is important, focusing on one small aspect of performance - let's say, whether to use IPC or local function calls - is probably the wrong way to go about things.
I assumed "CEP" referred to "complex event processing" which implies high throughput, high volume processing. So I understand that performance is important to you.
But for true scalability and reliability, you cannot simply optimize in-process eventing; you will need to rely on multiple computers and scale out. This will imply some degree of IPC, one way or the other. It's obviously important to be efficient at the smaller scale (events), but your overall top-end performance will be largely bounded by the architecture you choose for scale-out.
WCF is nice because of the flexibility it allows in moving building blocks from the local machine to a remote machine, and because of the Channel stack, you can add communication services in a modular way.
Whether this is important to you, is up to you to decide.
Can anyone help me with a question about webservices and scalability? I have written a webservice as a facade into our document management system and need to think about scalability issues. What areas should I be looking at to ensure performance and availability?
Thanks in advance
Performance is separate from scalability. Scalability means that you can add more servers to linearly increase system throughput (i.e. more client connections). The best way to start is with stateless web services. That way any client can call any of the n web service instances on n different machines. If there is a shared database at the end for persistence, that will ultimately be your bottleneck. There are ways to reduce that with data partitioning and sharding, but only worry about them when you get to that point.
First of all, decide what acceptable behaviour is for your web service. What should it be able to cope with: 1000 connections per second? What response time should each connection get?
Then you need to automate the usage of your web service so you can stress test the system.
What happens when you have 100 requests per second? 1000? 10000?
Then you can make a decision about if performance is ok, if the acceptable behaviour is too strict, or if you need to do heavy performance tuning based on actual profiling data.
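A minimal sketch of such an automated stress test is below. It is a skeleton under stated assumptions, not a real load-testing tool: `call_service` is a hypothetical placeholder (here it just sleeps for 1 ms) that you would replace with an actual request to your web service, and the request counts and concurrency levels are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_service() -> None:
    """Hypothetical stand-in for one web service request; replace with a real call."""
    time.sleep(0.001)


def timed_call(_request_number: int) -> float:
    """Issue one request and return its latency in seconds."""
    start = time.perf_counter()
    call_service()
    return time.perf_counter() - start


def run_load(total_requests: int, concurrency: int) -> dict:
    """Fire total_requests calls from `concurrency` workers and summarize them."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(total_requests)))
    elapsed = time.perf_counter() - started
    p95_index = min(len(latencies) - 1, int(0.95 * len(latencies)))
    return {
        "requests": len(latencies),
        "throughput_rps": len(latencies) / elapsed,
        "median_s": latencies[len(latencies) // 2],
        "p95_s": latencies[p95_index],
    }


# Step the load up and watch where latency starts to degrade.
for concurrency in (1, 10, 50):
    print(concurrency, run_load(total_requests=100, concurrency=concurrency))
```

Comparing the reported latencies at each load level against the targets you decided on up front tells you which of the three outcomes above you are in.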
You should be looking to host your WCF service in IIS. IIS has a lot of performance, scalability, security etc. mechanisms built in and is the best starting point to save you reinventing the wheel.
Some of the performance is certainly due to your own code, but let's assume that it's already optimized. At that point, the additional performance scaling issues involve the service host (e.g. IIS), the machines that host it, and their network (inter/intranet) connection speeds. You'll need to do some speed tests to be sure of things.
Well it really depends on what you're doing in your web service, but the only way you're going to find out is by simulating lots of users and measuring it.
Take a look at my answer to this question: Measuring performance
When we tested our code in this manner (where the web services were hosted in Windows services), we found that the bottleneck was authenticating each user in the facade service. In particular, the Windows component LSASS was using most of the CPU.
Luckily we were able to create new machines, each with a facade service, which then called through to our main set of web services. This enabled us to scale up to a large number of users (in the region of 100,000 users using our software normally).
I currently have an application that sends XML over TCP sockets from windows client to windows server.
We are rewriting the architecture and our servers are going to be in Java. One architecture we are looking at is a REST architecture over HTTP, so C# WinForms clients would send information using this. We are looking for high throughput and low latency.
Does anyone have any performance metrics on this approach versus other C# client to Java server communication options?
This isn't really well-enough defined to make any metric statements; how big are the messages, how often would you be hitting the REST service, is it straight HTTP or do you need to secure it with SSL? In other words, what can you tell us about the workload parameters?
(I say this over and over again on performance questions: unless you can tell me something about the workload, I can't -- nobody really can -- tell you what will give better performance. That's why they used to say you couldn't consider performance until you had an implementation: it's not that you can't think about performance, it's that people often couldn't or at least wouldn't think about workload.)
That said, though, you can make some good estimates simply by looking at how many messages you want to exchange, because setup time for TCP/IP often dominates REST. REST offers two advantages here: first, the TCP/IP time often does dominate the message transmission, and that's pretty well optimized in production web servers like Apache or lighttpd; second, a RESTful architecture enhances scalability by eliminating session state. That means you can scale freely using just a simple TCP/IP load balancer.
I would set up a test to try it and see. I understand that the only part of your application you're changing is the client/server communication. So analyse what you're sending now, and put together a test client/server setup sending messages which are representative of what you think your final solution is going to be doing (perhaps representative only in terms of size/throughput).
As noted in the previous post, there's not enough detail to really judge what the performance is going to be like. For example:
Is your message structure/format going to be the same, but merely sent over HTTP rather than raw sockets?
Are you going to be sending subsets of XML data? Processing large quantities of XML can be memory intensive (e.g. if you're using a DOM-based approach).
What overhead is your chosen REST framework going to introduce? (Hopefully very little, but at the moment we don't know.)
The best solution is to set something up using (say) Jersey and spend some time testing various scenarios. If you're re-architecting a solution, it's going to be worth a few days investigating performance (let alone functionality, ease of development etc.)
It's going to be plenty fast, unless you have a very, very large number of concurrent clients hitting those servers. The XML shredding keeps getting faster in both Java and .NET. If you are on CLR2 and Java 5 or above, you will be fine. But of course you still need to do the tests to verify.
We've tested in our lab, REST and SOAP transactions, and they are faster than you might think. Tens of thousands of messages per second. Small numbers of modern CPUs generating XML messages can easily saturate a gigabit network. In other words, the network is the bottleneck (transmission of data), not the CPU (serializing & de-serializing XML).
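As a sanity check on the "network is the bottleneck" claim, here is the arithmetic for a gigabit link. The 2,000-byte message size is a hypothetical figure chosen for illustration (it is not from our lab tests), and the calculation ignores TCP/IP and HTTP header overhead, so real ceilings are somewhat lower.

```python
GIGABIT_BITS_PER_S = 1_000_000_000
ASSUMED_MESSAGE_BYTES = 2_000   # hypothetical XML message size


def max_messages_per_second(link_bits_per_s: int, message_bytes: int) -> float:
    """Upper bound on message rate for a link, ignoring protocol overhead."""
    link_bytes_per_s = link_bits_per_s / 8
    return link_bytes_per_s / message_bytes


ceiling = max_messages_per_second(GIGABIT_BITS_PER_S, ASSUMED_MESSAGE_BYTES)
print(f"link capacity: {GIGABIT_BITS_PER_S / 8 / 10**6:.0f} MB/s")
print(f"ceiling at {ASSUMED_MESSAGE_BYTES} bytes/message: {ceiling:,.0f} messages/s")
```

At that message size the link tops out in the tens of thousands of messages per second, the same order of magnitude as the serialization rates mentioned above.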
And if you do your software design properly, then in the very unlikely situation where REST is not sufficient, swapping out the message format layer (REST => protobufs) will get you better transmission performance with minimal disruption.
But before you need to go there, you will be able to send some money to Cisco and get lots more headroom.