Hi, I'm in the early stages of choosing an actor framework for a project I'm about to start. As far as I know, Orleans was meant to relieve the developer of as much pain as possible, at the cost of some performance. With Akka.net, if I'm right, an actor takes about 400 bytes, and you have to go low-level to handle cluster connections and other things that Orleans manages for you, but it will give you great performance.
The only performance metrics I have found around the internet for Orleans are:
Using X-Large VMs (8 CPU Cores / 14 GB RAM) on Microsoft Azure, with one silo per VM:
A grain will handle a maximum of 1,000 requests per second.
A silo will handle a maximum of 10,000 requests per second.
A silo will hold 100,000 active grains.
And for Akka.net in the main page:
50 million msg/sec on a single machine. Small memory footprint; ~2.5 million actors per GB of heap.
I'd like to know what machines were used in the Akka.net scenario, how a grain compares to an actor in terms of requests per second, roughly how many grains/actors you can fit in a GB of RAM, and how much memory a grain weighs.
From the quotes above it looks like Akka.net performs much better (at ~2.5 million actors per GB, that works out to roughly 400 bytes per actor, matching the figure I mentioned), but I'd like a deeper comparison of the two in terms of performance.
I found this Akka.Net VS MS Orleans Comparison and Orleans and Akka Actors: A Comparison, but they don't address the performance question.
Thanks!
Akka.net reports local messages, which are basically function calls. Orleans reports remote messages; see RPC. That is the main difference, though there are other differences as well, of course.
Beyond that, the only real advice I can give you is to measure it yourself, with a benchmark that is realistic for you, in a setup as close as possible to production in terms of communication pattern and number of servers.
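To make the local-vs-remote distinction concrete, here is a rough sketch of what a "message" means in each framework (illustrative only; the grain/actor names are made up, and I'm assuming recent Orleans and Akka.NET APIs):

    using System.Threading.Tasks;
    using Akka.Actor;
    using Orleans;

    // Orleans: every grain call goes through a generated proxy and is an
    // awaited, serialized RPC that may cross the network -- this is what
    // the Orleans requests-per-second numbers measure.
    public interface IHelloGrain : IGrainWithIntegerKey
    {
        Task<string> SayHello(string name);
    }
    // var grain = client.GetGrain<IHelloGrain>(0);   // client: an initialized IClusterClient
    // var reply = await grain.SayHello("world");     // remote call, not a method call

    // Akka.NET: a Tell to a local actor enqueues onto an in-process mailbox --
    // no serialization, no network hop -- which is what the 50M msg/sec
    // figure measures.
    public class HelloActor : ReceiveActor
    {
        public HelloActor()
        {
            Receive<string>(name => Sender.Tell($"Hello, {name}"));
        }
    }
    // var system = ActorSystem.Create("demo");
    // var actor = system.ActorOf<HelloActor>("hello");
    // actor.Tell("world");                           // local enqueue, roughly a function call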
Microsoft Orleans was used to develop the backend of Halo 4 and Halo 5, both of which were acclaimed for their multiplayer performance in online matches.
I work with Akka.net, and I hear and read a lot of claims that Akka.net is better or faster because of this or that, but with little evidence for what those claims are based on.
I would advise you to ignore the bias and do your own research, or study the use cases for each one. Also keep in mind that a performance comparison may be biased toward the tool its author is most familiar with.
Personally, I think Microsoft Orleans is faster to learn, and the code has less boilerplate compared to Akka.net. It is also much closer to what C# developers are used to.
Your link about Akka vs Orleans isn't comparing Akka.net vs Orleans. Akka on the JVM is another story: Akka.net is just a port, and the JVM is totally different from the .NET runtime, which is why the Orleans team says C# doesn't need something like Akka (sorry, I could not find their post saying this).
Related
I've built multiple socket server apps in Node.js for a multi-user artificial intelligence app. We're looking at 1K to 10K active socket connections per box. However, even when idle with 0 active connections, some of my servers consume 50-100 MB of memory when running on Unix. I'm sure that on a sensible platform like C# or C++ this should be close to 0 MB, so we are considering a port to a "better" platform. Now let me clarify my use case:
This is not a "web server". No files are served.
We do lots of CPU intensive data processing and certain portions have already been ported to C++ and pulled into node via native modules.
We don't need to access much I/O (in most cases a few files are accessed, in some cases none, we don't use an RDBMS either)
We went with Node because it was Unix friendly (unlike .NET) and seemed easy to use, but with its current memory consumption we need to evaluate other options. Many have compared Node.js with ASP.NET, but I need to build a socket server in C# or C++.
I have significant experience with .NET and C++. There are libs like SuperSocket (used by Redgate and Telerik) that handle all of the low-level stuff in .NET. I will have to find a similar socket framework for C++.
So putting this all together, what are the advantages of using .NET or C++ over Node.js? And considering my servers are highly CPU-bound (not I/O bound) would the benefits of using .NET/C++ be significant or should I stick with Node.js? Any other comments regarding porting a Node.js app to C# or C++?
Bounty: I need advice and a recommended socket server library/implementation/example app in C# and/or C++. Must be open source. I need it to be high-performance, async and bug-free. Must support binary data transfer. Must run on Windows. Unix is a bonus.
We're looking at 1K to 10K active socket connections per box
The bottleneck here is not the programming language or the technology; it's the hardware and OS support. What limits the number of concurrent sockets is basically the machine you're running on. Still, from my experience, the deterministic object lifetime of C++ can help dramatically when supporting large numbers of concurrent OS resources.
This is not a "web server". No files are served.
I have done some Node.js in my professional work, and some C#, but mostly C++. Even with Node.js as a web server, the client and server code didn't have much in common besides the language itself: the web server dealt mostly with business logic, while the client dealt with fetching and presenting the data interactively. So I think the main advantage of Node.js as a web server is that it gives purist-JS developers the ability to write server-side code without using languages/technologies they are not familiar with.
We do lots of CPU intensive data processing and certain portions have already been ported to C++ and pulled into node via native modules.
Yep. Using a strongly typed language can do wonders here: no redundant runtime parsing.
We don't need to access much I/O (in most cases a few files are accessed, in some cases none, we don't use an RDBMS either)
Well, I feel there's a myth in the air that Node.js somehow handles IO better than other technologies. This is simply wrong. The main feature of Node.js is that its IO is asynchronous by default, but Node.js didn't invent the wheel here: you have asynchronous IO in Java (java.nio), in C# (async/await), and in C++ (native mechanisms like epoll/IO completion ports, or higher-level libraries like Boost.ASIO, the C++ REST SDK, Proxygen, etc.).
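To illustrate, here is a minimal (non-production) sketch of asynchronous socket IO in C# with async/await; the port and buffer size are arbitrary:

    using System;
    using System.Net;
    using System.Net.Sockets;
    using System.Threading.Tasks;

    class AsyncEchoServer
    {
        static async Task Main()
        {
            var listener = new TcpListener(IPAddress.Any, 9000);
            listener.Start();
            while (true)
            {
                var client = await listener.AcceptTcpClientAsync();
                _ = HandleAsync(client);   // fire-and-forget, one task per connection
            }
        }

        static async Task HandleAsync(TcpClient client)
        {
            using (client)                          // deterministic disposal of the socket
            using (var stream = client.GetStream())
            {
                var buffer = new byte[4096];
                int n;
                // ReadAsync releases the thread while waiting on the OS,
                // so idle connections cost almost no threads -- the same
                // property Node gets from its event loop.
                while ((n = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
                    await stream.WriteAsync(buffer, 0, n);
            }
        }
    }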
We went with node because it was Unix friendly (unlike .NET)
.NET Core is a relatively new technology that lets .NET run on Unix-based systems (like Linux).
I will have to find a similar socket framework for C++.
Boost.ASIO, or write something yourself; it's really not that hard.
So putting this all together, what are the advantages of using .NET or C++ over Node.js?
Better CPU usage: because C++ and C# are strongly typed languages, and C++ is statically compiled, there are huge opportunities for the compiler to optimize CPU-intensive jobs.
Lower memory footprint: usually because strongly typed languages have smaller objects, without the overhead of keeping a lot of metadata behind the scenes.
With C++, stack allocation and scoped object lifetimes usually keep the memory footprint low. Again, it depends on the quality of the code in any language.
No callback hell: C# has tasks and async/await; C++ has futures/promises, and some compilers (e.g. VC++) support await as well. Asynchronous code becomes pure fun to write compared to callbacks. Yes, I am aware of JS promises and the new async/await support, but they are relatively new compared to the .NET implementation.
Compiler checks: since C# and C++ have to be compiled, a lot of silly bugs are caught at compile time. No "undefined is not a function" or "cannot read property of undefined".
Other than that, it's pretty much a matter of choice.
NetMQ is a native C# port of ZeroMQ.
ZeroMQ is a lightweight messaging library, and the ZeroMQ guide is great if you want to learn about messaging; it is also available as a book. It applies both to ZeroMQ and NetMQ.
If you are on Windows and need to handle a lot of connections, I don't recommend ZeroMQ, as it doesn't use IOCP.
NetMQ uses IOCP on Windows and works on both Windows and Linux.
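For example, a minimal push/pull round-trip with NetMQ looks roughly like this (a sketch assuming a recent NetMQ API; the address and message contents are arbitrary):

    using System;
    using NetMQ;
    using NetMQ.Sockets;

    class NetMQDemo
    {
        static void Main()
        {
            // '@' binds, '>' connects (NetMQ address convention).
            using (var pull = new PullSocket("@tcp://*:5556"))
            using (var push = new PushSocket(">tcp://localhost:5556"))
            {
                push.SendFrame(new byte[] { 1, 2, 3, 4 });   // binary frame
                byte[] msg = pull.ReceiveFrameBytes();
                Console.WriteLine($"received {msg.Length} bytes");
            }
        }
    }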
Disclosure: I'm the author of NetMQ and a maintainer on the ZeroMQ (libzmq) project.
[1] https://github.com/zeromq/netmq
[2] http://netmq.readthedocs.io/en/latest/
[3] http://zguide.zeromq.org/page:all
[4] http://www.amazon.com/ZeroMQ-Messaging-Applications-Pieter-Hintjens/dp/1449334067/ref=sr_1_1?ie=UTF8&qid=1462550951&sr=8-1&keywords=zeromq
We do lots of CPU intensive data processing
Node.js may have been the wrong choice from the start, and it will probably never match the performance of a C++ server. However, it can get pretty close if you do things right. In addition, writing good C++ and doing a complete rewrite of a system is difficult and time-consuming. So I want to give you some reasons to stick with Node.js, or at least to completely exhaust all your options before you move.
my servers consume 50-100 MB
Are you using Node.js v0.12? With Node.js v4.2 LTS, an idle Node.js server should use around 20 MB of memory. (It will probably never be near 0 MB because of V8.) Have you checked for memory leaks?
1K to 10K active socket connections per box
This should be easily achievable. If you are using the most popular library, socket.io, here are some relevant benchmarks:
on a 3.3 GHz Xeon X5470 using one core, the max messages-sent-per-second rate is around 9,000–10,000 depending on the concurrency level.
from: http://drewww.github.io/socket.io-benchmarking/
(Since all these connections are kept alive concurrently, CPU usage matters more.)
If you are already using that and having issues, try replacing socket.io with SocketCluster, which is faster and more scalable. Replacing it should be easier than a complete rewrite. Here are some benchmarks:
8-core Amazon EC2 m3.2xlarge instance running Linux
at 42K, the CPU use of the busiest worker dropped to around 45%
http://socketcluster.io/#!/performance
Finally, to show that Node.js can get close to C++ performance, have a look at this:
servers use 12G memory
It supports 1,200,000 active websocket connections
https://github.com/smallnest/C1000K-Servers
My point is that you have average performance goals that you should be able to reach with Node.js with little effort. Try to benchmark (https://github.com/machinezone/tcpkali) and find the actual issue rather than doing a complete rewrite.
Having listened to recent Azure podcasts (particularly the one on building low-latency financial systems on Azure) and read all the hype about Service Fabric, I decided to try to adapt the 'Distributed computation code sample: Monte Carlo simulation' pattern to my needs.
My scenario is:
One request, with a given starting state, runs 10k full sports-match simulations using a simplistic (computation-wise) Monte Carlo based model.
My first attempt was:
1 × stateful 'Processor' actor that receives the start state of the match and forwards it to the 10k+ task actors, along with the relevant aggregator ActorId
10K+ × stateless 'Task' actors, each running 1 simulation and passing the result to its aggregator actor. Simulation time was small (~2 ms)
100 × stateful 'Aggregator' actors that aggregated received simulations and passed them to a finaliser actor
1 × 'Finaliser' actor that calculated the final result
Running the above on my dev box simply using Tasks takes < 100 ms, but the setup above (running on the dev machine as a local cluster) took 50 seconds and more!
While debugging, one potential cause I found was the amount of time it takes for the Processor actor to send the initial tasks, so I was wondering what sort of overhead there is in calling into Service Fabric (I guess all sorts of naming-service calls happen when I call an actor's methods) and whether the slowness is likely due to this combined with my number of tasks.
To eliminate other possibilities I did the following, and noticed only very small differences in total time:
Made all actors stateless, to ensure that state management wasn't adding overhead.
Created all ActorProxies in the Processor and stored their references for future calls, to ensure actor activations weren't causing issues.
Does anybody have any suggestions about where to go from here, or has anybody tried to implement something similar?
Thanks,
Alex
I would have posted this as a comment, but I do not yet have enough reputation for that! If you reference this page in Service Fabric's documentation, take a look at the comments below the article, particularly the comment trail started by "tom" sometime around June 2015. He was experiencing poor performance (~20 operations per second) with stateful actors, which seemed to be acknowledged as an area for future improvement. They stressed the use of readonly attributes on non-mutating methods to significantly improve performance. Abhishek Ram also included some notes and a link to information on relevant performance counters that may help with troubleshooting.
You noted that you tried using stateless actors with little impact on performance. I would point further down the comment trail, where another user reports achieving 2k+ operations per second on a single actor using readonly methods, which I would expect to perform similarly to stateless actor methods. Perhaps the information from the performance counters can be compared with this to see how closely your performance matches their somewhat trivial example in the comments.
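For reference, the readonly marking discussed in that comment trail looked roughly like this in the preview-era Reliable Actors SDK (a sketch from memory: the attribute and its namespace changed across SDK versions, and the actor/method names here are made up):

    using System.Threading.Tasks;
    using Microsoft.ServiceFabric.Actors;

    public interface ISimulationActor : IActor
    {
        // Mutating call: the runtime replicates actor state after it completes.
        Task RunSimulationAsync(int iterations);

        // Non-mutating call: marking it readonly lets the runtime skip
        // state replication, which is where the large throughput gain
        // reported in the comments came from.
        [Readonly]
        Task<double> GetResultAsync();
    }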
I'm looking for ideas on how to speed up message transfers through RabbitMQ.
I installed the latest version on Windows 64-bit, running a server on my local machine, to which I also publish and from which I consume via a C# implementation. I initially maxed out at 40,000 messages per second, which is impressive but does not suit my needs (I am competing with a custom binary reader that can handle 24 million unparsed 16-byte byte arrays per second; obviously I don't expect to get anywhere close to that, but I am attempting to at least improve). I need to send around 115,000,000 messages as fast as possible. I do not want to persist the data, and the connection will be direct to one single consumer. I then built chunks out of my 16-byte arrays and published those onto the bus, without any improvement: the transfer rate maxed out at 45 MB/second. I find this very slow, given that in the end it should boil down to raw transmission speed, because I could create byte arrays several megabytes in size, at which point the exchange's routing overhead becomes negligible relative to raw transmission speed. Why does my message bus max out at 45 MB/second?
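For what it's worth, the chunked publishing described above boils down to something like this with the RabbitMQ .NET client (a sketch; the queue name, batch size, and message count are made up, and I'm assuming the classic RabbitMQ.Client API):

    using RabbitMQ.Client;

    class ChunkedPublisher
    {
        static void Main()
        {
            var factory = new ConnectionFactory { HostName = "localhost" };
            using (var connection = factory.CreateConnection())
            using (var channel = connection.CreateModel())
            {
                channel.QueueDeclare("ticks", durable: false, exclusive: false,
                                     autoDelete: false, arguments: null);

                // Pack many 16-byte records into one frame so per-message
                // broker overhead (framing, routing, parsing) is amortized.
                const int RecordSize = 16;
                const int RecordsPerMessage = 4096;            // ~64 KB per publish
                var buffer = new byte[RecordSize * RecordsPerMessage];

                for (int i = 0; i < 1000; i++)
                {
                    // ... fill `buffer` with the next batch of records ...
                    channel.BasicPublish(exchange: "", routingKey: "ticks",
                                         basicProperties: null, body: buffer);
                }
            }
        }
    }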
Bump and update: I have not seen any answer to this question in quite a while. I am a bit surprised that not a single RabbitMQ developer chimed in. I played extensively with RabbitMQ and ZeroMQ, and I decided that RabbitMQ is not up to the task for high-throughput in-process messaging. The broker implementation, and especially the parsing logic, is a major bottleneck to improving throughput. I dropped RabbitMQ from my list of possible options.
There was a white paper describing how they provided a solution for managing low-latency, high-throughput financial options data streams, but it sounds to me like all they did was throw hardware at the problem rather than provide a solution that targets low-latency, high-throughput requirements.
ZeroMQ did a superb job once I studied the documentation more intensively. I can run communication in-process, and it provides stable enough push/pull, pub/sub, req/rep, and pair patterns, which I need. I was looking for blocking logic within the pub/sub pattern, which ZeroMQ does not provide (it drops messages instead when a high watermark is exceeded), but the push/pull pattern does block. So pretty much everything I need is provided. The only gripe I have is with their approach to event processing; the event-structure implementation through poll/multiplex is not very satisfactory.
Is it a good idea to build an OLTP system using WCF?
The system must process 5-8k requests per second.
As noted by @nonnb in a comment, WCF is a great platform for building service-oriented or distributed applications. This includes using WCF in OLTP applications (we do that here). With WCF you could theoretically keep adding servers to scale and handle the load, but usually you will end up hitting some database contention (e.g. locking).
5K-8K requests per second is a large number. That translates to 300K to roughly 500K requests per minute. To put this in perspective, if you look at the TPC-C benchmark results, the top end of your range is almost in the top 50 results, with the lower end in (maybe) the top third of results.
Note that the Microsoft TPC-C results are C++ running in COM+ and do not involve .NET or WCF.
In terms of WCF, some reading of interest would be Creating High Performance WCF Services and A Performance Comparison of Windows Communication Foundation. The latter is almost 4 years old, so some of those performance benchmarks may have improved over the years.
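One concrete knob from that "high performance WCF" material is the service throttle, whose defaults are far below 5K-8K requests/sec. A hedged sketch of raising it in code (the service contract and the numbers are placeholders, not recommendations):

    using System;
    using System.ServiceModel;
    using System.ServiceModel.Description;

    [ServiceContract]
    public interface IOrderService
    {
        [OperationContract]
        int Submit(string order);
    }

    public class OrderService : IOrderService
    {
        public int Submit(string order) => order.Length;   // trivial stub
    }

    class Program
    {
        static void Main()
        {
            var host = new ServiceHost(typeof(OrderService));
            host.AddServiceEndpoint(typeof(IOrderService), new NetTcpBinding(),
                                    "net.tcp://localhost:9000/orders");

            // Raise WCF's conservative default throttles; the real values
            // must come out of load testing, not guesswork.
            var throttle = host.Description.Behaviors.Find<ServiceThrottlingBehavior>();
            if (throttle == null)
            {
                throttle = new ServiceThrottlingBehavior();
                host.Description.Behaviors.Add(throttle);
            }
            throttle.MaxConcurrentCalls = 512;
            throttle.MaxConcurrentSessions = 2048;
            throttle.MaxConcurrentInstances = 512;

            host.Open();
            Console.WriteLine("listening; press Enter to stop");
            Console.ReadLine();
            host.Close();
        }
    }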
I currently have an application that sends XML over TCP sockets from a Windows client to a Windows server.
We are rewriting the architecture, and our servers are going to be in Java. One architecture we are looking at is REST over HTTP, so the C# WinForms clients would send information that way. We are looking for high throughput and low latency.
Does anyone have any performance metrics on this approach versus other C#-client-to-Java-server communication options?
This isn't really well-enough defined to make any statements about metrics: how big are the messages, how often will you be hitting the REST service, and is it straight HTTP or do you need to secure it with SSL? In other words, what can you tell us about the workload parameters?
(I say this over and over again on performance questions: unless you can tell me something about the workload, I can't -- nobody really can -- tell you what will give better performance. That's why they used to say you couldn't consider performance until you had an implementation: it's not that you can't think about performance, it's that people often couldn't or at least wouldn't think about workload.)
That said, you can make some good estimates simply by looking at how many messages you want to exchange, because TCP/IP setup time often dominates in REST. REST offers two advantages here: first, since the TCP/IP time often dominates message transmission, it helps that this is pretty well optimized in production web servers like Apache or lighttpd; second, a RESTful architecture enhances scalability by eliminating session state, which means you can scale freely using just a simple TCP/IP load balancer.
I would set up a test to try it and see. As I understand it, the only part of your application you're changing is the client/server communication. So analyse what you're sending now, and put together a test client/server setup sending messages representative of what you think your final solution will be doing (perhaps representative only in terms of size/throughput).
As noted in the previous post, there's not enough detail to really judge what the performance is going to be like, e.g.:
Is your message structure/format going to stay the same, merely sent over HTTP rather than raw sockets?
Are you going to be sending subsets of the XML data? Processing large quantities of XML can be memory-intensive (e.g. if you're using a DOM-based approach).
What overhead is your chosen REST framework going to introduce (hopefully very little, but at the moment we don't know)?
The best solution is to set something up using (say) Jersey and spend some time testing various scenarios. If you're re-architecting a solution, it's worth a few days investigating performance (let alone functionality, ease of development, etc.).
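On the C# client side, a throwaway harness like this gives a first-order requests-per-second number (a sketch: the endpoint URL and payload size are placeholders you'd swap for something representative of your real XML):

    using System;
    using System.Diagnostics;
    using System.Net.Http;
    using System.Threading.Tasks;

    class RestBench
    {
        static async Task Main()
        {
            var client = new HttpClient();
            var payload = new string('x', 2048);   // size this like your real messages
            const int Requests = 10000;

            var sw = Stopwatch.StartNew();
            for (int i = 0; i < Requests; i++)
            {
                var response = await client.PostAsync(
                    "http://localhost:8080/test",  // placeholder endpoint
                    new StringContent(payload, System.Text.Encoding.UTF8, "application/xml"));
                response.EnsureSuccessStatusCode();
            }
            sw.Stop();
            Console.WriteLine($"{Requests / sw.Elapsed.TotalSeconds:F0} req/s (serial)");
            // Run several of these loops concurrently (Task.WhenAll) to probe
            // throughput rather than single-connection latency.
        }
    }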
It's going to be plenty fast unless you have a very, very large number of concurrent clients hitting those servers. XML shredding keeps getting faster in both Java and .NET. If you are on CLR 2 and Java 5 or above, you will be fine. But of course you still need to run the tests to verify.
We've tested REST and SOAP transactions in our lab, and they are faster than you might think: tens of thousands of messages per second. A small number of modern CPUs generating XML messages can easily saturate a gigabit network. In other words, the network (transmission of data) is the bottleneck, not the CPU (serializing and deserializing XML).
And if you do your software design properly, then in the very unlikely situation where REST is not sufficient, swapping out the message-format layer (REST => protobufs) will get you better transmission performance with minimal disruption.
But before you need to go there, you can send some money to Cisco and get lots more headroom.