I've built multiple socket server apps in Node.js for a multi-user artificial intelligence app. We're looking at 1K to 10K active socket connections per box. However, even when idle and with 0 active connections, some of my servers consume 50-100 MB of memory when running on Unix. I'm sure with a sensible platform like C# or C++, this should be close to 0 MB. So we are considering a port to a "better" platform. Now let me clarify my use case:
This is not a "web server". No files are served.
We do lots of CPU intensive data processing and certain portions have already been ported to C++ and pulled into node via native modules.
We don't need to access much I/O (in most cases a few files are accessed, in some cases none, we don't use an RDBMS either)
We went with node because it was Unix friendly (unlike .NET) and seemed easy to use. But with its current memory consumption we need to evaluate other options. Many have compared Node.js with ASP.NET but I need to build a socket server in C# or C++.
I have significant experience with .NET and C++. There are libs like SuperSocket (used by Redgate and Telerik) that handle all of the low-level stuff in .NET. I will have to find a similar socket framework for C++.
So putting this all together, what are the advantages of using .NET or C++ over Node.js? And considering my servers are highly CPU-bound (not I/O bound) would the benefits of using .NET/C++ be significant or should I stick with Node.js? Any other comments regarding porting a Node.js app to C# or C++?
Bounty: I need advice and a recommended socket server library/implementation/example app in C# and/or C++. Must be open source. I need it to be high-performance, async and bug-free. Must support binary data transfer. Must run on Windows. Unix is a bonus.
We're looking at 1K to 10K active socket connections per box
The bottleneck here is not the programming language or the technology; it's the hardware and OS support. The thing that limits the number of concurrent sockets is basically the machine you're running on. Yet, from my experience, the deterministic object lifetime of C++ can help dramatically when supporting a large number of concurrent OS resources.
This is not a "web server". No files are served.
I have done some Node.js in my professional work; I have done some C#, but mostly C++. Even with Node.js as a web server, the client and server code didn't have much in common besides the language itself. The web server dealt mostly with business logic, while the client dealt with fetching and presenting the data interactively. So, I think the main advantage of Node.js as a web server is that it gives purist JS developers the ability to write server-side code without using languages/technologies they are not familiar with.
We do lots of CPU intensive data processing and certain portions have
already been ported to C++ and pulled into node via native modules.
Yep. Using a strongly typed language can do wonders here: no redundant runtime parsing.
We don't need to access much I/O (in most cases a few files are
accessed, in some cases none, we don't use an RDBMS either)
Well, I feel there's a myth in the air that Node.js somehow handles IO better than other technologies. This is simply wrong. The main feature of Node.js is that, by default, its IO is asynchronous. But Node.js didn't invent any wheel: you have asynchronous IO in Java (java.nio), C# (async/await) and C++ (natively with epoll/IO completion ports, or at a higher level with Boost.ASIO, the C++ REST SDK, Proxygen, etc.)
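To make that concrete, here is a minimal C# sketch of the same non-blocking style (the file name is a placeholder and error handling is omitted; this is an illustration, not the poster's code):

```csharp
// Minimal sketch: asynchronous I/O with async/await in C#.
using System;
using System.IO;
using System.Threading.Tasks;

class AsyncIoDemo
{
    static async Task Main()
    {
        // Opened with useAsync: true so reads go through the OS async I/O facilities.
        using var stream = new FileStream("data.bin", FileMode.Open, FileAccess.Read,
                                          FileShare.Read, bufferSize: 4096, useAsync: true);
        var buffer = new byte[4096];
        int read = await stream.ReadAsync(buffer, 0, buffer.Length); // does not block the calling thread
        Console.WriteLine($"Read {read} bytes without blocking the thread.");
    }
}
```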
We went with node because it was Unix friendly (unlike .NET)
.NET Core is a relatively new technology with which .NET can also run on Unix-based systems (like Linux).
I will have to find a similar socket framework for C++.
Boost.ASIO, or write something yourself; it's really not that hard.
So putting this all together, what are the advantages of using .NET or
C++ over Node.js?
Better CPU usage: because C++ and C# are strongly typed languages, and C++ is statically compiled, there are huge opportunities for the compiler to optimize CPU-intensive jobs.
Lower memory footprint: usually because strongly typed languages have smaller objects, without the overhead of keeping a lot of metadata behind the scenes.
With C++, thanks to stack allocation and scoped object lifetimes, the memory footprint is usually low. Again, it depends on the quality of the code in any language.
No callback hell: C# has tasks and async/await; C++ has futures/promises, and some compilers (e.g. VC++) support await as well. Asynchronous code simply becomes pure fun to write as opposed to callbacks (a short sketch follows this list). Yes, I am aware of JS promises and the new async/await stuff, but they are relatively new compared to the .NET implementation.
Compiler checks: since C# and C++ have to be compiled, a lot of silly bugs are caught at compile time: no "undefined is not a function" or "cannot read property of undefined".
Other than that, it's pretty much a matter of choice.
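To illustrate the async/await point, here is a minimal sketch of an asynchronous TCP echo server using the standard TcpListener/NetworkStream APIs; the port and buffer size are arbitrary, and a production server would add error handling, backpressure and shutdown logic:

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

class EchoServer
{
    static async Task Main()
    {
        var listener = new TcpListener(IPAddress.Any, 9000); // arbitrary port for the example
        listener.Start();
        while (true)
        {
            TcpClient client = await listener.AcceptTcpClientAsync(); // no callbacks, no blocked thread
            _ = HandleClientAsync(client); // fire-and-forget, one async flow per connection
        }
    }

    static async Task HandleClientAsync(TcpClient client)
    {
        using (client)
        using (NetworkStream stream = client.GetStream())
        {
            var buffer = new byte[4096];
            int read;
            while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                await stream.WriteAsync(buffer, 0, read); // echo the bytes straight back
            }
        }
    }
}
```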
NetMQ is a native C# port of ZeroMQ.
ZeroMQ is a lightweight messaging library; the ZeroMQ guide is great if you want to learn about messaging, and it also comes as a book. It applies to both ZeroMQ and NetMQ.
If you are using Windows and need to handle a lot of connections, I don't recommend ZeroMQ, as it does not use IOCP.
NetMQ uses IOCP on Windows and works on both Windows and Linux.
Disclosure - I'm the author of NetMQ and a maintainer on the ZeroMQ (libzmq) project.
[1] https://github.com/zeromq/netmq
[2] http://netmq.readthedocs.io/en/latest/
[3] http://zguide.zeromq.org/page:all
[4] http://www.amazon.com/ZeroMQ-Messaging-Applications-Pieter-Hintjens/dp/1449334067/ref=sr_1_1?ie=UTF8&qid=1462550951&sr=8-1&keywords=zeromq
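As a rough illustration of what the NetMQ API looks like (a hedged request/reply sketch; the port is arbitrary and both sockets live in one process only to keep the example self-contained):

```csharp
using NetMQ;
using NetMQ.Sockets;

class NetMqDemo
{
    static void Main()
    {
        // "@" binds, ">" connects (NetMQ address convention); the port is chosen arbitrarily.
        using (var server = new ResponseSocket("@tcp://*:5556"))
        using (var client = new RequestSocket(">tcp://localhost:5556"))
        {
            client.SendFrame("hello");                    // byte[] frames work as well (binary data)
            string request = server.ReceiveFrameString(); // blocks until the request arrives
            server.SendFrame(request.ToUpperInvariant());
            string reply = client.ReceiveFrameString();
            System.Console.WriteLine(reply);              // "HELLO"
        }
    }
}
```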
We do lots of CPU intensive data processing
Node.js may have been the wrong choice from the start, and it will probably never match the performance of a C++ server. However, it can get pretty close if you are doing things right. In addition, writing good C++ and doing a complete rewrite of a system is difficult and time-consuming. So, I want to give some reasons for you to stick with Node.js, or at least to completely exhaust all your options before you move.
my servers consume 50-100 MB
Are you using Node.js v0.12? With Node.js v4.2 LTS, an idle Node.js server should use around 20 MB of memory. (It will probably never be near 0 MB because of V8.) Have you checked for memory leaks?
1K to 10K active socket connections per box
This should be easily achievable. If you are using the most popular library, socket.io, here are some relevant benchmarks:
on a 3.3 GHz Xeon X5470 using one core, the max messages-sent-per-second rate is around 9,000–10,000 depending on the concurrency level.
from: http://drewww.github.io/socket.io-benchmarking/
(Since all these connections are kept alive concurrently, CPU usage matters more.)
If you are already using that and having issues, try replacing socket.io with SocketCluster, which is faster and more scalable. Replacing it should be easier than a complete rewrite. Here are some benchmarks:
8-core Amazon EC2 m3.2xlarge instance running Linux
at 42K, the CPU use of the busiest worker dropped to around 45%
http://socketcluster.io/#!/performance
Finally, to show that Node.js can come close to C++ performance, have a look at this:
servers use 12G memory
It supports 1,200,000 active websocket connections
https://github.com/smallnest/C1000K-Servers
My point is that your performance goals are modest and you should be able to reach them with Node.js with little effort. Try to benchmark (https://github.com/machinezone/tcpkali) and find the issue rather than doing a complete rewrite.
Related
Hi, I'm in the early stage of choosing an actor framework for a project I'm about to start. As far as I know, Orleans was meant to relieve the developer of as much pain as possible, at the cost of some performance. With Akka.net, I know that an actor takes about 400 bytes, if I'm right, and you have to go low-level to handle cluster connections and other things that are managed by Orleans, but it will bring you great performance.
The only performance metrics I found around internet for Orleans are:
Using X-Large VMs (8 CPU Cores / 14 GB RAM) on Microsoft Azure, with one silo per VM:
A grain will handle a maximum of 1,000 requests per second.
A silo will handle a maximum of 10,000 requests per second.
A silo will hold 100,000 active grains.
And for Akka.net in the main page:
50 million msg/sec on a single machine. Small memory footprint; ~2.5 million actors per GB of heap.
I'd like to know what machines were used in the Akka.net scenario, how grains compare to actors (in terms of requests per second and how many grains/actors you can fit in a GB of RAM, more or less), and how much memory a grain occupies.
From the quotes for Orleans and Akka.net it looks like Akka.net performs much better, but I'd like a further comparison of the two in terms of performance.
I found this Akka.Net VS MS Orleans Comparison and Orleans and Akka Actors: A Comparison, but they do not address the performance question.
Thanks!
Akka.net reports local messages, which are basically function calls. Orleans reports remote messages; see RPC. That is a main difference. There are other differences as well, of course.
Besides the above, the only real advice I can give you is to measure it yourself, with a realistic benchmark, in a setup that is as close as possible to production in terms of communication pattern and number of servers.
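To make the local-vs-remote distinction concrete, here is a minimal, hedged Akka.net sketch; a local Tell/Ask like this never leaves the process, which is what the "millions of msg/sec" figures measure, whereas an Orleans grain call is an RPC:

```csharp
using System;
using System.Threading.Tasks;
using Akka.Actor;

public class EchoActor : ReceiveActor
{
    public EchoActor()
    {
        // Replying to the sender within the same process is essentially an
        // in-memory enqueue/dequeue, not a network round trip.
        Receive<string>(msg => Sender.Tell(msg));
    }
}

class LocalMessagingDemo
{
    static async Task Main()
    {
        var system = ActorSystem.Create("bench");
        IActorRef echo = system.ActorOf<EchoActor>("echo");
        string reply = await echo.Ask<string>("ping", TimeSpan.FromSeconds(1));
        Console.WriteLine(reply); // "ping"
        await system.Terminate();
    }
}
```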
Microsoft Orleans was used to develop the backend of Halo 4 and Halo 5, which were both acclaimed for their multiplayer performance in online matches.
I work with Akka.net, and I hear and read a lot of claims that Akka.net is better or faster because of this and that, but with little evidence to back those claims up.
I would advise you to ignore the bias and do your own research, or study the use cases for each one. Also keep in mind that a performance comparison may be biased towards the tool the developer is familiar with.
Personally, I think Microsoft Orleans is faster to learn and the code has less boilerplate compared to Akka.net. It is also much more familiar to what C# developers are used to.
Your link about Akka vs Orleans isn't comparing Akka.net vs Orleans. Akka on the JVM is another story: Akka.net is just a port, and the JVM is totally different from the .NET runtime, which is why the Orleans team says C# doesn't need something like Akka (sorry, I could not find their post saying this).
Given that the familiar form of .NET runs on Windows, which is not a real-time O/S, and Mono runs on Linux (the standard kernel is also not a real-time O/S).
Given also, that any memory allocation scheme offering garbage collection (as in "managed" .NET), and indeed any heap memory scheme will introduce non-deterministic, potentially non-trivial delays into an application's execution behavior.
Is there any combination of alternate host O/S and coding paradigm in which one can leverage all of the power and conveniences of C# .NET while implementing a solution which can execute designated portions of code within tightly specified time constraints? e.g. start a C# method every 10ms to a tolerance of less than 1ms, with completion time determined only by the work performed in the method itself?
Obviously, the application would have to be carefully written; time-critical code would have to avoid memory allocations; the application would have to have completed all its memory allocation etc. work and have no other threads active once the hard real-time loop is started. Also, the host O/S would have to support real-time scheduling.
Is this possible within the .NET / MONO framework, or is it precluded by the design of the .NET runtime, framework, and O/Ss on which it (or compatible equivalent) is supported?
For example: is it possible to do reliable fine-grained (~1ms) machine control purely in C# with something like NETduino, or do they have limits or require alternate strategies for such applications?
Short Answer: No.
Longer answer: The closest you can get is running the .NET Micro Framework directly on hardware, but the TinyCLR still doesn't give you deterministic timings. Microsoft has Windows CE/Windows Embedded Compact as their real-time offering, but even that is only real time for slower tasks (I believe somewhere in the range of 50 microseconds or more; not sure if that qualifies as hard real time).
I do not know whether it would be technically possible to create a real-time C# implementation, but no one has done so, and even .NET Native isn't made for that.
Can C# be used for hard real-time? Yes
When we talk about real-time it's most often (if not always) about robotics and IoT. And for that we almost always go with one of these options (forget Windows CE and Windows 10 IoT):
Microcontrollers (example: Arduino, RPi Pico, NodeMCU)
Linux based SBCs (example: Raspberry Pi, BeagleBone, Rock Pi)
Microcontrollers are real-time by nature. Basically the device just runs a loop forever (though there are interrupts and multi-threading on some chips). The top languages in this category are C/C++ and MicroPython. But C# can also be used:
Wilderness Labs (Netduino and Meadow F7)
.NET nanoFramework (several boards)
The second option (Linux-based SBCs) is a bit more tricky. The OS has complete control over the hardware and it has a scheduler; that way many processes can run on just one CPU. The OS itself does a lot of housekeeping as well.
Linux has a set of scheduling APIs that can be used to tell the OS to favor our process over others, and the OS will do its best to comply, but there are no guarantees. This is usually called soft real-time. In .NET you can use Process.PriorityClass to change your process's nice value. Depending on how busy the OS is and the amount of resources available (CPUs and memory), you might get satisfying results.
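A minimal sketch of that soft real-time approach using only the standard Process API (whether a given priority class is honoured depends on the OS and on the privileges the process runs with):

```csharp
using System;
using System.Diagnostics;

class PriorityDemo
{
    static void Main()
    {
        var me = Process.GetCurrentProcess();
        // On Linux this maps to the process's nice value; elevated classes may require root.
        me.PriorityClass = ProcessPriorityClass.High;
        Console.WriteLine($"Now running with priority class {me.PriorityClass}");
    }
}
```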
Other than that, Linux also provides hard real-time capabilities with the PREEMPT_RT patch, and there is also a feature that lets you isolate a CPU core for your selected processes. But to my knowledge .NET does not have any API for these capabilities (P/Invoke may work).
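For the P/Invoke route, a hedged sketch of calling the Linux sched_setscheduler(2) syscall directly; the constants and struct layout follow the Linux man page, root/CAP_SYS_NICE is required, and this only buys hard real-time behaviour on a suitably tuned (e.g. PREEMPT_RT) kernel:

```csharp
using System;
using System.Runtime.InteropServices;

class RealtimeSchedDemo
{
    // struct sched_param from <sched.h>: a single int priority field
    [StructLayout(LayoutKind.Sequential)]
    struct SchedParam { public int sched_priority; }

    const int SCHED_FIFO = 1; // Linux value of the FIFO real-time policy

    [DllImport("libc", SetLastError = true)]
    static extern int sched_setscheduler(int pid, int policy, ref SchedParam param);

    static void Main()
    {
        var param = new SchedParam { sched_priority = 50 }; // valid range is 1..99 for SCHED_FIFO
        // pid 0 means "the calling process"
        if (sched_setscheduler(0, SCHED_FIFO, ref param) != 0)
            Console.WriteLine($"sched_setscheduler failed, errno = {Marshal.GetLastWin32Error()}");
        else
            Console.WriteLine("Process is now scheduled with SCHED_FIFO priority 50.");
    }
}
```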
I am fairly new to Node.js and I don't feel 100% comfortable writing business logic in it.
I have to support MSSQL as the database backend, so I came across Edge.js. Subsequently, I thought it might be a good idea to write my data layer and business logic as Edge.js modules.
Does anybody have experience with this approach?
Are there any limitations?
Edge.js (http://tjanczuk.github.io/edge) was created primarily to help Node.js developers in efficiently performing tasks that Node.js is not good at, for example:
running CPU intensive operations (which Edge.js enables one to do on dedicated CLR threads),
accessing functionality that is not available or not mature in Node.js (e.g. accessing MS SQL databases, which Edge.js enables you to do with ADO.NET),
efficiently integrating pre-existing .NET components or business logic in new Node.js applications.
The primary cost of using Edge.js is in increased memory footprint, since your node.exe process is now hosting two virtual machines instead of one: V8 and CLR. However, that disadvantage becomes irrelevant if your alternative is to run .NET code in a separate process. Compared to the alternative of running .NET logic in an external process, Edge.js has the advantage of dramatically reduced latency (see http://bit.ly/1hQseHY), and simplicity (one process instead of two or more).
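For orientation, the C# side of an Edge.js module is essentially a Func&lt;object, Task&lt;object&gt;&gt;; by convention a class named Startup with an Invoke method, roughly like the hedged sketch below (the namespace and the body are placeholders, not working data-layer code):

```csharp
using System.Threading.Tasks;

namespace MyBusinessLogic
{
    public class Startup
    {
        // Edge.js calls this with the JavaScript payload marshalled into .NET types.
        public async Task<object> Invoke(object input)
        {
            // Placeholder for CPU-heavy work or an ADO.NET call to MSSQL;
            // this runs on a CLR thread, so it does not block the Node.js event loop.
            var name = (string)input;
            await Task.Delay(1); // stand-in for real asynchronous work
            return $"Hello from the CLR, {name}";
        }
    }
}
```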
I want it to work on windows servers.
It will be a cloud type server - it'll consist of modules\parts running on different machines all over the world using http\tcp + upnp to connect to each other
There are going to be controlling\monitoring\observing modules on each machine to provide stats on performance
This network is going to be working with large amounts of live VIDEO\AUDIO streaming\broadcasting data
It is going to use FFMPEG for re-encoding and OpenGL, OpenCV and such for filtering (.NET wrappers exist and work BTW)
It will not use any WCF or IIS
I want to develop it with a team of 2-4 developers, smart students.
So is it OK to create this in C# .NET, or should I not waste my time on the promises of ease it could give a developer and go with C\C++?
So is it reasonable to write a server application in C# in my case?
Off-topic - why not WCF
Warning: it gets way too subjective in here.
WCF is great when you have a big corporation with relatively little data exchanged per service session.
When you have video, LIVE video, it all gets complicated: large amounts of data, lots of users streaming in and out of your service at the same time.
Try to do live video streaming over the HTTP binding, then try it with the other bindings, and you'll see why I do not like the idea of live streaming with WCF: it is slow, with way too much information that is not needed for live streaming. And after all, have you ever seen a live video streaming app built on WCF? No, you haven't. Maybe you have seen more-or-less-live video on a Silverlight + IIS pair, which I do not like because it is just a Silverlight\Windows Media Player video streaming solution, while I want more than that.
I love to have cross-platform clients with rich UIs. And I do not like (this is all my personal opinion, so it is subjective) the Silverlight + IIS + WCF combination. So what shall I do? Right: go to sockets and streams in such an old and simple format as FLV, with Flash as the back-end client. Simpler to develop in some parts, and a more conservative way of doing live video over the web than the one you get from MS today.
I love Flash FLV live streaming because you just open a socket and start sending live FLV video data onto it (for each user, an FLV header and then FLV "tags", one by one: video tag, audio tag, video tag, audio tag, etc.) and Flash plays it! With no special\unusual code. It is fast, easy to support, and does not require anything new\unusual from the client. And on the server side you can make great use of that "tag" form of video\audio data representation.
So that is, in short, why I just do not want to use WCF: it is hard to get live video playing out of it on the client side, and it brings no general benefits for a live video server.
And when most of the live data goes through sockets, why bother with using WCF for service management?
During the last half of 2009 and the first half of 2010 I was getting into WCF, live video streaming, Silverlight and Flash, comparing the process of client\server creation and reading different formats with a team of very interesting developers. In general, at the end of the project we had lots of mini servers streaming live data and lots of different clients receiving it. Comparing everything we had done, we came to conclusions close to the ones I present to you here.
That is why I do not want to use WCF in my next project: I do not want to think about how to deliver the media data, I want to focus on filtering\editing it.
Why the question appeared
We started playing with FFmpeg\OpenCV in C, and it is pretty simple to manipulate data using them... in C... on Linux...
But when we started to play with their .NET bindings (we are currently playing with Tao.FFmpeg), we found that in most cases we end up working with the C# Marshal class a lot and keeping two variables for each C analog (the pointer problem), and so on. I hope we will not see such problems with Emgu CV, but it still makes me a little bit afraid...
I think it's entirely reasonable. The benefits of C# with regard to ease of development will greatly outweigh any performance drawbacks of not using C++.
C# is generally more cross-platform than C++. True, C++ is a cross-platform language, but there are large differences between the APIs that C++ programs use to interact with the system. C# and .Net/Mono have a much more standardized interface to the socket layer.
Finally, with ambitious projects like this, getting the project into a usable form is a much more important goal than getting the highest performance possible. Performance only matters if the project is complete. Write it in C# because that will give you the greatest odds of completion. Then worry about performance.
I'm not exactly sure why people have brought up Cross Platform concerns as clearly the OP has stated the app will run on Windows.
As to the actual questions.
Can you build a server application that communicates via tcp/http in C# that does not have to run in IIS. -> Yes.
Can you build a server application that is performant and scales in C# -> Yes.
Can you do so with Students -> Maybe. Depends on the students... ;) But that is irrespective of the language in use.
Is this something I would do? Yes. We've done that. We have a C# app running on approximately 20,000 machines right now that are communicating effectively over TCP. We aren't using WCF, but we did decide to use RESTful-style services over HTTP for the data transfer.
Our biggest issue was simply tuning the app to transfer the "right" amount of data over the wire at a time. This network is for data collection and storage. It averages around 200 GB of data collected per day.
UPDATE
I wanted to clarify a bit about the above app. The 20,000 machines at the above installation are clients (XP, Vista, 7, 2003 Server, and 2008 Servers). There's only one data collection point server in the mix. The clients post data to the server, when connected to a network, once every 45 seconds. Roughly 97% of the machines stay connected in this manner, the rest connect a couple times a week.
This works out to the server processing about 37 million requests a day.
Now, to be sure, each request is relatively small at around 5 KB to 6 KB. However, the sheer number of requests shows that a C# application can handle managing those connections, which is the bigger part of the OP's problem.
Because the OP's files are large (video), the real issue is simply data transfer, which will be hindered more by hard drive speed as well as network speed and latency. Those issues are independent of which language you are working in and will limit the number of connections per server based on available bandwidth.
Working this out, let's limit it to one server as an example. If you have a video bitrate of 400 kb/s and a 25 Mb/s connection to the internet, then that box can physically handle only around 62 simultaneous connections, which is so FAR below the number of connections our app is doing as to be a rounding error.
Assuming perfect network conditions (which don't exist), pumping that internet connection up to 100 Mb/s (which can be expensive) means a 4x increase in simultaneous connections, to around 240; still completely manageable.
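The back-of-the-envelope arithmetic above, written out as a tiny sketch (the bitrates are the assumed figures from the example, not measurements):

```csharp
// Rough capacity estimate: simultaneous streams are capped by uplink bandwidth.
double perStreamKbps = 400;   // assumed per-viewer video bitrate (kilobits per second)
double uplinkMbps = 25;       // assumed server uplink (megabits per second)
double maxStreams = uplinkMbps * 1000 / perStreamKbps;   // 25,000 / 400 = 62.5
System.Console.WriteLine($"~{(int)maxStreams} simultaneous streams at 25 Mb/s; a 100 Mb/s uplink roughly quadruples this");
```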
However, the network is only one side of the equation. Drive speed on the servers matters a lot. You had better have a good disk array capable of continuously delivering that amount of data. I know drives claim 3 Gb/s transfer rates, but a drive that can saturate the channel has never been built, which means serious planning and money in the server setup.
The point of all of this is to say that the language doesn't matter one bit in your situation. You have other much larger contention issues. With that being the case, go with the language that will help you get the project done faster.
Why stop at C#? If you (possibly) want cross-platform, write it in Python or similar; you'll find that the networking aspects of a scripting language are far better than C#'s (as that's pretty much the role scripting languages are put to nowadays: running web-based servers).
You'll find developer productivity is much improved over C# (just as C# has better productivity than C++), and there are lots of people who know and want to work on these systems. It sounds like the performance of the servers themselves is less important than the networking, so it appears that a scripting language would be your best choice. Plus, the ffmpeg libraries are more tightly integrated with Python, via pyffmpeg, than with C# (well, mostly).
And it'd be a lot cooler, more fun, and very much cross-platform!
If you want C# and also cross-platform abilities, your development will have to target the Mono platform (or another cross-platform .NET runtime, if you can find one). You might have to give up VisualStudio, and maybe some Microsoft-specific libraries and tools, but you can still have C# on multiple platforms. Just make sure you start the multi-platform building and testing EARLY in the process or it will be hell to change things later.
If the target of the application is to run only on Windows platforms, I'm completely sure I would write this application in C#. Many applications like that could be running right now without us even knowing it.
If the target is to run on multiple platforms, you should first encapsulate all the problems that a non-Windows platform can bring to your application.
Why would you have to write it in C++ if, in this case, C# is capable of doing everything that C++ does? I would use C++ to program hardware-level things, like a robot or something similar. For a server application, C# will fit what you want very well; it was designed for these things.
And C# is cross-platform; you just need the right tool to make it work on a specific platform.
Named Pipes ? XML-RPC ? Standard Input-Output ? Web Services ?
I would not use unsafe stuff like Shared Memory and similar
Named pipes would be the fastest method, but they only work for communication between processes on the same computer. Named pipe communication doesn't go all the way down the network stack (because it only works for communication on the same computer), so it will always be faster.
Anonymous Pipes may only be used on the local machine. However, Named Pipes may traverse the network.
I left out shared memory since you specifically mentioned that you don't want to go that route. Shared memory would be even faster than named pipes, though.
So it depends on whether you only need to communicate between processes on the same computer or on different computers. Any XML-based communication protocol (e.g. web services) will usually be slower due to the massive overhead of XML.
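A minimal, hedged sketch of the named-pipe option using the standard System.IO.Pipes classes; the pipe name is arbitrary, and the server and client would normally live in separate processes rather than in one program:

```csharp
using System;
using System.IO;
using System.IO.Pipes;
using System.Threading.Tasks;

class NamedPipeDemo
{
    static async Task Main()
    {
        // Normally these two halves live in different processes; one process suffices for a sketch.
        using var server = new NamedPipeServerStream("demo-pipe", PipeDirection.InOut);
        using var client = new NamedPipeClientStream(".", "demo-pipe", PipeDirection.InOut);

        Task serverAccept = server.WaitForConnectionAsync();
        await client.ConnectAsync();
        await serverAccept;

        // Client sends a line, server echoes it back.
        var clientWriter = new StreamWriter(client) { AutoFlush = true };
        var clientReader = new StreamReader(client);
        var serverWriter = new StreamWriter(server) { AutoFlush = true };
        var serverReader = new StreamReader(server);

        await clientWriter.WriteLineAsync("hello");
        string request = await serverReader.ReadLineAsync();
        await serverWriter.WriteLineAsync($"echo: {request}");
        Console.WriteLine(await clientReader.ReadLineAsync()); // "echo: hello"
    }
}
```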
I don't think there's a quick answer to this. If I were you, I would buy/borrow a copy of Advanced Programming in the Unix Environment (APUE) by Stevens and Rago and read Chapters 15 and 16 on IPC. It's a brilliant book if you really want to understand how *nix (a lot of it applies to any POSIX system) works down to the kernel level.
If you must have a quick answer, I would say the following (without putting a huge amount of thought into it), in descending order of efficiency:
Local Machine IPC
    Shared Memory / Memory-Mapped Files
    Named Pipe / FIFO (only between related processes - i.e. fork)
    Unix Domain Sockets
Network IPC / Internet Sockets
    Datagram Sockets
    Stream Sockets
    Raw Sockets
At both levels, you are going to have to think about how the data you transfer is encoded/decoded and trade off between memory usage and CPU utilization.
At the network level, you will have to consider what layers of protocols you are going to run on top of. Most commonly, at the bottom of the application layer, you will be choosing between TCP/IP and UDP. TCP has a lot more overhead, as it does error correction, checksumming and lots of other things. If you need in-order delivery of messages, you need to use TCP as opposed to UDP.
On top of these are other protocols like HTTP and SOAP (on top of HTTP or another protocol like FTP/SMTP, etc.). A binary protocol is going to be more efficient as long as you are network-bound rather than CPU-bound. If you are using SOAP on the MS .NET platform, binary encoding of the messages is going to be quicker across the network but may be more CPU-intensive.
I could go on; it's not a simple question. Learning where the latencies are and how buffering is handled is key to being able to make decisions on the trade-offs you are always forced into with IPC. I'd recommend the APUE book above if you really want to know what is going on under the hood...
Windows messaging is one of the fastest ways to do IPC; after all, Windows is built on it.
It's possible to use WM_COPYDATA with P/Invoke calls to exchange data between two form-based .NET applications, and I've got an open source library for doing exactly that. I've benchmarked around 1,771 msg/sec on a fairly hot laptop.
http://thecodeking.github.com/XDMessaging.Net
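For reference, the raw Win32 mechanism underneath that approach looks roughly like the hedged sketch below (this is plain WM_COPYDATA via P/Invoke, not the XDMessaging API; the target window handle is assumed to have been obtained elsewhere, e.g. via FindWindow):

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

static class CopyDataSender
{
    const int WM_COPYDATA = 0x004A;

    [StructLayout(LayoutKind.Sequential)]
    struct COPYDATASTRUCT
    {
        public IntPtr dwData;  // application-defined value
        public int cbData;     // size of the payload in bytes
        public IntPtr lpData;  // pointer to the payload
    }

    [DllImport("user32.dll")]
    static extern IntPtr SendMessage(IntPtr hWnd, int msg, IntPtr wParam, ref COPYDATASTRUCT lParam);

    // Sends a UTF-8 string to another process's window; hTargetWnd must be supplied by the caller.
    public static void SendString(IntPtr hTargetWnd, string text)
    {
        byte[] payload = Encoding.UTF8.GetBytes(text);
        IntPtr buffer = Marshal.AllocHGlobal(payload.Length);
        try
        {
            Marshal.Copy(payload, 0, buffer, payload.Length);
            var cds = new COPYDATASTRUCT { dwData = IntPtr.Zero, cbData = payload.Length, lpData = buffer };
            SendMessage(hTargetWnd, WM_COPYDATA, IntPtr.Zero, ref cds);
        }
        finally
        {
            Marshal.FreeHGlobal(buffer); // safe: SendMessage blocks until the receiver has processed the message
        }
    }
}
```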
I don't know why you won't go with shared memory, but it's very, very fast between C# apps on the same machine, and very reliable (unlike TCP sockets). spazzarama/SharedMemory is a fantastic C# lib that supports shared arrays and buffers with a simple high-level API. You just initialize the class with a common memory file name (on the client/server sides) and then update the array. Values magically appear on the other side!
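If you would rather stay on built-in APIs than take the SharedMemory library, the same idea can be sketched with System.IO.MemoryMappedFiles (named maps like this are a Windows feature; the map name and offset are arbitrary, and real code needs its own synchronization, e.g. a named EventWaitHandle):

```csharp
using System;
using System.IO.MemoryMappedFiles;

class SharedMemoryDemo
{
    static void Main(string[] args)
    {
        // Both processes open the same named map; 4 KB is an arbitrary size for the example.
        using var map = MemoryMappedFile.CreateOrOpen("demo-shared-map", 4096);
        using var accessor = map.CreateViewAccessor();

        if (args.Length > 0 && args[0] == "write")
        {
            accessor.Write(0, 42);             // producer: write an int at offset 0
        }
        else
        {
            int value = accessor.ReadInt32(0); // consumer: read it back from the same offset
            Console.WriteLine($"Value in shared memory: {value}");
        }
    }
}
```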