Node.js: writing modules for business logic (Edge.js) - C#

I'm fairly new to Node.js and I don't feel 100% comfortable writing business logic in it.
I have to support MS SQL Server as the database backend, so I came across Edge.js. Consequently, I thought it might be a good idea to write my data layer and business logic as Edge.js modules.
Does anybody have experience with this approach?
Are there any limitations?

Edge.js (http://tjanczuk.github.io/edge) was created primarily to help Node.js developers efficiently perform tasks that Node.js is not good at, for example:
running CPU intensive operations (which Edge.js enables one to do on dedicated CLR threads),
accessing functionality that is not available or not mature in Node.js (e.g. accessing MS SQL databases, which Edge.js enables you to do with ADO.NET),
efficiently integrating pre-existing .NET components or business logic in new Node.js applications.
The primary cost of using Edge.js is an increased memory footprint, since your node.exe process is now hosting two virtual machines instead of one: V8 and CLR. However, that disadvantage becomes irrelevant if your alternative is to run .NET code in a separate process: compared to that, Edge.js offers dramatically reduced latency (see http://bit.ly/1hQseHY) and simplicity (one process instead of two or more).
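For a sense of the programming model, here is a minimal sketch of calling a C# async lambda from a Node.js module with Edge.js (the function name and payload are illustrative, not from the question):

    var edge = require('edge');

    // edge.func compiles the C# async lambda embedded in the comment
    // and returns a JavaScript proxy function for it.
    var getCustomer = edge.func(function () {/*
        async (input) => {
            // A real data layer would call into ADO.NET or existing
            // .NET assemblies here instead of echoing the input.
            return ".NET received: " + input.ToString();
        }
    */});

    getCustomer('customer-42', function (error, result) {
        if (error) throw error;
        console.log(result);
    });

For a real data layer you would more likely point edge.func at a precompiled assembly (via the assemblyFile/typeName/methodName options) instead of embedding C# source in comments.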

Related

Porting socket server from Node.js to C#

I've built multiple socket server apps in Node.js for a multi-user artificial intelligence app. We're looking at 1K to 10K active socket connections per box. However, even when idle and with 0 active connections, some of my servers consume 50-100 MB of memory when running on Unix. I'm sure with a sensible platform like C# or C++, this should be close to 0 MB. So we are considering a port to a "better" platform. Now let me clarify my use case:
This is not a "web server". No files are served.
We do lots of CPU intensive data processing and certain portions have already been ported to C++ and pulled into node via native modules.
We don't need to access much I/O (in most cases a few files are accessed, in some cases none, we don't use an RDBMS either)
We went with node because it was Unix friendly (unlike .NET) and seemed easy to use. But with its current memory consumption we need to evaluate other options. Many have compared Node.js with ASP.NET but I need to build a socket server in C# or C++.
I have significant experience with .NET and C++. There are libs like SuperSocket (used by Redgate and Telerik) that handle all of the low-level stuff in .NET. I will have to find a similar socket framework for C++.
So putting this all together, what are the advantages of using .NET or C++ over Node.js? And considering my servers are highly CPU-bound (not I/O bound) would the benefits of using .NET/C++ be significant or should I stick with Node.js? Any other comments regarding porting a Node.js app to C# or C++?
Bounty: I need advice and a recommended socket server library/implementation/example app in C# and/or C++. Must be open source. I need it to be high-performance, async and bug-free. Must support binary data transfer. Must run on Windows. Unix is a bonus.
We're looking at 1K to 10K active socket connections per box
The bottleneck here is not the programming language or the technology; it's the hardware and OS support. The thing that limits the concurrent socket count is basically the machine you're running on. Still, from my experience, the deterministic object lifetime of C++ can help dramatically when supporting a large number of concurrent OS resources.
This is not a "web server". No files are served.
I have done some Node.js in my professional work; I have done some C#, but mostly C++. Even with Node.js as a web server, the client and server code didn't have much in common besides the language itself: the web server dealt mostly with business logic, while the client dealt with fetching and presenting the data interactively. So I think the main advantage of Node.js as a web server is that it gives purist JS developers the ability to write server-side code without using languages/technologies they are not familiar with.
We do lots of CPU intensive data processing and certain portions have
already been ported to C++ and pulled into node via native modules.
Yep. Using a strongly typed language can do wonders here: no redundant runtime parsing.
We don't need to access much I/O (in most cases a few files are
accessed, in some cases none, we don't use an RDBMS either)
Well, I feel there's a myth in the air that Node.js somehow handles I/O better than other technologies. This is simply wrong. The main feature of Node.js is that I/O is asynchronous by default, but Node.js didn't invent anything here: you have asynchronous I/O in Java (java.nio), C# (async/await) and C++ (natively with epoll/I/O completion ports, or via higher-level libraries like Boost.ASIO, the C++ REST SDK, Proxygen, etc.).
We went with node because it was Unix friendly (unlike .NET)
.NET Core is a relatively new technology that lets .NET run on Unix-based systems (like Linux).
I will have to find a similar socket framework for C++.
Boost.ASIO, or write something yourself; it's really not that hard.
So putting this all together, what are the advantages of using .NET or
C++ over Node.js?
Better CPU usage: because C++ and C# are strongly typed languages, and C++ is statically compiled, there are huge opportunities for the compiler to optimize CPU-intensive jobs.
Lower memory footprint: usually because strongly typed languages have smaller objects, without the overhead of keeping a lot of metadata behind the scenes.
With C++, stack allocation and scoped object lifetimes usually keep the memory footprint low. Again, it depends on the quality of the code in any language.
No callback hell: C# has tasks and async/await. C++ has futures/promises, and some compilers (e.g. VC++) support await as well. Asynchronous code simply becomes pure fun to write as opposed to callbacks (a short sketch follows this list). Yes, I am aware of JS promises and the new async/await stuff, but they are relatively new compared to the .NET implementation.
Compiler checks: since C# and C++ have to be compiled, a lot of silly bugs are caught at compile time. No "undefined is not a function" or "cannot read property of undefined".
Other than that, it's pretty much a matter of choice.
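To illustrate the "no callback hell" point, here is a minimal sketch of an asynchronous TCP echo server in C# (the port number and buffer size are arbitrary, not from the question):

    using System;
    using System.Net;
    using System.Net.Sockets;
    using System.Threading.Tasks;

    class EchoServer
    {
        static async Task Main()
        {
            var listener = new TcpListener(IPAddress.Any, 9000);
            listener.Start();
            while (true)
            {
                // Await the next connection without blocking a thread.
                TcpClient client = await listener.AcceptTcpClientAsync();
                _ = HandleClientAsync(client); // one linear task per connection
            }
        }

        static async Task HandleClientAsync(TcpClient client)
        {
            using (client)
            {
                NetworkStream stream = client.GetStream();
                var buffer = new byte[4096];
                int read;
                // Control flow reads top to bottom: no nested callbacks.
                while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
                    await stream.WriteAsync(buffer, 0, read);
            }
        }
    }

The same shape in callback style would need one callback per accept, read and write; with async/await the compiler performs that transformation for you.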
NetMQ is a native C# port of ZeroMQ.
ZeroMQ is a lightweight messaging library. The ZeroMQ guide is great if you want to learn about messaging, and it also comes as a book; it is applicable to both ZeroMQ and NetMQ.
If you are using Windows and need to handle a lot of connections, I don't recommend ZeroMQ, as it does not use IOCP.
NetMQ uses IOCP on Windows and works on both Windows and Linux.
Disclosure - I'm the author of NetMQ and a maintainer on the zeromq (libzmq) project.
[1] https://github.com/zeromq/netmq
[2] http://netmq.readthedocs.io/en/latest/
[3] http://zguide.zeromq.org/page:all
[4] http://www.amazon.com/ZeroMQ-Messaging-Applications-Pieter-Hintjens/dp/1449334067
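For a first feel of the API, here is a minimal NetMQ request/reply sketch (the address and payloads are illustrative):

    using System;
    using NetMQ;
    using NetMQ.Sockets;

    class Program
    {
        static void Main()
        {
            // NetMQ address convention: "@" binds, ">" connects.
            using (var server = new ResponseSocket("@tcp://*:5555"))
            using (var client = new RequestSocket(">tcp://localhost:5555"))
            {
                client.SendFrame("Hello");
                Console.WriteLine(server.ReceiveFrameString()); // "Hello"
                server.SendFrame("World");
                Console.WriteLine(client.ReceiveFrameString()); // "World"
            }
        }
    }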
We do lots of CPU intensive data processing
Node.js may have been the wrong choice from the start, and it would probably never match the performance of a C++ server. However, it can get pretty close if you are doing things right. In addition, writing good C++ and completely rewriting a system is difficult and time-consuming. So I want to give some reasons to stick with Node.js, or at least to completely exhaust all your options before you move.
my servers consume 50-100 MB
Are you using Node.js v0.12? With Node.js v4.2 LTS, an idle Node.js server should use around 20 MB of memory (it will probably never be near 0 MB because of V8). Have you checked for memory leaks?
1K to 10K active socket connections per box
This should be easily achievable. If you are using the most popular socket.io library, here are some relevant benchmarks:
on a 3.3 GHz Xeon X5470 using one core, the max messages-sent-per-second rate is around 9,000–10,000 depending on the concurrency level.
from: http://drewww.github.io/socket.io-benchmarking/
(Since all these connections are kept alive concurrently, CPU usage matters more.)
If you are already using that and having issues, try replacing socket.io with SocketCluster, which is faster and more scalable. Replacing this should be easier than a complete rewrite. Here are some benchmarks:
8-core Amazon EC2 m3.2xlarge instance running Linux
at 42K, the CPU use of the busiest worker dropped to around 45%
http://socketcluster.io/#!/performance
Finally, to see that Node.js can nearly reach C++ performance, have a look at this:
servers use 12G memory
It supports 1,200,000 active websocket connections
https://github.com/smallnest/C1000K-Servers
My point is that you have moderate performance goals that you should be able to reach with Node.js with little effort. Try benchmarking (e.g. with https://github.com/machinezone/tcpkali) to find the issue rather than doing a complete rewrite.

Can C# .NET be used for hard real-time?

Given that the familiar form of .NET runs on Windows, which is not a real-time O/S, and Mono runs on Linux (the standard kernel is also not a real-time O/S).
Given also that any memory allocation scheme offering garbage collection (as in "managed" .NET), and indeed any heap memory scheme, will introduce non-deterministic, potentially non-trivial delays into an application's execution behavior.
Is there any combination of alternate host O/S and coding paradigm in which one can leverage all of the power and conveniences of C# .NET while implementing a solution which can execute designated portions of code within tightly specified time constraints? e.g. start a C# method every 10ms to a tolerance of less than 1ms, with completion time determined only by the work performed in the method itself?
Obviously, the application would have to be carefully written; time-critical code would have to avoid memory allocations; the application would have to have completed all its memory allocation etc. work and have no other threads active once the hard real-time loop is started. Also, the host O/S would have to support real-time scheduling.
Is this possible within the .NET / MONO framework, or is it precluded by the design of the .NET runtime, framework, and O/Ss on which it (or compatible equivalent) is supported?
For example: is it possible to do reliable fine-grained (~1ms) machine control purely in C# with something like NETduino, or do they have limits or require alternate strategies for such applications?
Short Answer: No.
Longer answer: The closest you can get is running the .NET Micro Framework directly on hardware, but the TinyCLR still doesn't give you deterministic timings. Microsoft has Windows CE/Windows Embedded Compact as their real-time offering, but even that is real-time only for slower tasks (I believe somewhere in the range of 50 microseconds or more; I'm not sure if that qualifies as hard real-time).
I do not know whether it would be technically possible to create a real-time C# implementation, but no one has done it, and even .NET Native isn't made for that.
Can C# be used for hard real-time? Yes
When we talk about real-time it's most often (if not always) about robotics and IoT. And for that we almost always go with one of these options (forget Windows CE and Windows 10 IoT):
Microcontrollers (example: Arduino, RPi Pico, NodeMCU)
Linux based SBCs (example: Raspberry Pi, BeagleBone, Rock Pi)
Microcontrollers are by nature real-time. Basically the device will just run a loop forever (there are interrupts and multi-threading on some chips though). Top languages in this category are C/C++ and MicroPython. But C# can also be used:
Wilderness Labs (Netduino and Meadow F7)
.NET nanoFramework (several boards)
The second option (Linux-based SBCs) is a bit trickier. The OS has complete control over the hardware, and it has a scheduler; that way many processes can run on just one CPU. The OS itself does a lot of housekeeping as well.
Linux has a set of scheduling APIs that can be used to tell the OS to favor our process over others. The OS will do its best to comply, but there are no guarantees. This is usually called soft real-time. In .NET you can use Process.PriorityClass to change your process's nice value. Depending on how busy the OS is and the amount of resources available (CPUs and memory), you might get satisfying results.
Other than that, Linux also provides hard real-time capabilities with the PREEMPT_RT patch, and there is also a feature that lets you isolate a CPU core for selected processes. But to my knowledge .NET does not have any API to use these capabilities (P/Invoke may work; a sketch follows).
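A minimal sketch of both levels in C# (the SCHED_FIFO priority value is arbitrary; the P/Invoke part is Linux-only and requires root or CAP_SYS_NICE):

    using System;
    using System.Diagnostics;
    using System.Runtime.InteropServices;

    class RealTimeSetup
    {
        // Mirrors struct sched_param from sched.h; only the priority field is used.
        [StructLayout(LayoutKind.Sequential)]
        struct SchedParam { public int sched_priority; }

        const int SCHED_FIFO = 1; // Linux real-time FIFO scheduling policy

        [DllImport("libc", SetLastError = true)]
        static extern int sched_setscheduler(int pid, int policy, ref SchedParam param);

        static void Main()
        {
            // Soft real-time: raise the process priority (maps to nice on Linux).
            Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;

            // Harder real-time: request SCHED_FIFO for this process (pid 0 = self).
            var param = new SchedParam { sched_priority = 50 };
            if (sched_setscheduler(0, SCHED_FIFO, ref param) != 0)
                Console.Error.WriteLine("sched_setscheduler failed: errno " + Marshal.GetLastWin32Error());
        }
    }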

Cache Dataset in session to fetch on demand

I'm considering using WCF or mORMot as the framework for a RESTful service, where the business/legacy code that needs to be accessed is written in Delphi. Performance is a premise of the project.
The application must be prepared for load balancing. The clients of the REST service are Windows desktop applications. These desktop clients allow the user to view large volumes of data, with huge resultsets from SQL statements. What is the best way to implement a service that caches a recordset and consumes it gradually through the REST service? Can you demonstrate a good example? The recordset must be cached in the session until the client completes the consultation or decides to do the full fetch. Am I looking at the right architecture?
Will load balancing work in WCF? Since the recordset is cached on a single server, the row-fetch requests, if any, must fall on the same server.
Both WCF and mORMot share the same high-performance kernel-mode http.sys server. Both feature IOCP and multi-threading.
For performance, mORMot will be lighter, will allocate (much) less memory, won't be affected by Garbage Collector freezes, and is able to get JSON content directly from the database engine (by-passing most temporary data conversion and allocation) - so that you can achieve amazing speed. In short, mORMot was designed for performance of serving REST/JSON content from the ground up - with a multi-threaded kernel (whereas e.g. node.js is mono-threaded). If your purpose is also to cache some data, mORMot works very well as 64 bit native services, giving access to all your system RAM if needed, and has built-in real-time content compression.
WCF is a great general-purpose communication library, which can be RESTful, but is not RESTful from its (historical) roots. The main issue I saw with WCF is the difficulty to configure it between applications (.exe.config tuning may be confusing), and that it is a big black box. For instance, it was not possible to implement Cross-origin resource sharing with WCF when the server is hosted as a Windows service (the Access-Control-Allow-Origin: HTTP headers are deleted by WCF!): you have to host it within IIS - and can't fix the issue, whereas with a full Open Source solution, you can fix any issue.
Load balancing can be implemented in mORMot and WCF with the same algorithm. Instead of a round-robin algorithm, in your case a simple routing based on the content may be enough.
Using WCF to serve business logic written in Delphi will be slow, error prone and difficult to maintain. Mixing technologies induces unneeded complexity. I would not go into this direction.
If you have an existing Delphi code base and some Delphi skills, I guess mORMot may be a better choice. It was reported, for example, that a single production server is able to handle more than one million requests per day, serving thousands of concurrent clients, with a dedicated JavaScript process on the server side. One of the mORMot design goals was to help work with existing code and legacy projects. But I'm not 100% impartial, since I'm the main maintainer of this open source project. :)
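As a framework-neutral sketch of the session-cache idea from the question (all names are hypothetical), the server can materialize the resultset once, hand the client a token, and serve the rows in pages; with load balancing, sticky sessions (or the content-based routing suggested above) must route follow-up fetches to the server that owns the cache:

    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Linq;

    // One cached, fully materialized resultset per client session.
    class ResultSetCache
    {
        readonly ConcurrentDictionary<Guid, IReadOnlyList<string[]>> _cache =
            new ConcurrentDictionary<Guid, IReadOnlyList<string[]>>();

        // Run the query once, cache the rows, return a token for later fetches.
        public Guid StartQuery(IEnumerable<string[]> rows)
        {
            var id = Guid.NewGuid();
            _cache[id] = rows.ToList();
            return id;
        }

        // Each REST call fetches the next page from the cached rows.
        public IEnumerable<string[]> FetchPage(Guid id, int offset, int count) =>
            _cache.TryGetValue(id, out var rows)
                ? rows.Skip(offset).Take(count)
                : Enumerable.Empty<string[]>();

        // Called when the client completes the consultation or does a full fetch.
        public void EndQuery(Guid id) => _cache.TryRemove(id, out _);
    }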

WCF cluster for mid-length tasks

We have a self-hosted WCF application which analyses text. The installation we're discussing involves processing batches of small fragments of text (social media) and longer ones (like newspaper articles). The longer fragments take on average 5-6 sec to process in one WCF instance, while the shorter ones take under 1 sec. There are millions of items of each kind to be processed every day.
Several questions:
What is the recommended configuration? Windows Azure / any kind of IaaS like Amazon / cluster managed by a load balancer?
Is there a built-in support for load balancing in WCF, which does not require writing a wrapper?
For some reason, when a long task is running and another task is submitted to an instance deployed on a multicore machine, they both run in parallel on the same core, instead of starting on another core which is free. Is this some kind of conservative allocation? Can it be managed more efficiently?
The easy answer is Azure (because it's a PaaS by Microsoft), but that isn't really a technical question. It depends on costs and growth predictions.
Not really. WCF supports load balancing, but WCF itself runs in your process and can't load balance itself. It's usually a feature of your hosting platform.
If those are 2 different processes, then the OS schedules the CPU time, and I wouldn't recommend messing with that. If both were run on the same core, it's probably because they can be (which makes sense, as WCF does a lot of I/O).

An approach towards a Distributed Computing problem using the .Net framework

I'm interested in programming a project which distributes a certain computation on large files across several computers. The need for distributed computing arises from the crashy and unstable nature of the software I'm using to do the actual computing, so it might crash on some computers, but others will surely do the job.
The ideas I have so far include:
-Using several servers, each pulling a task from a master server whenever possible
-Using VMware virtual machines
-Using a load-balancing cluster
Which is better suited for the job? Are there any other ideas I should be aware of?
Also, if you can recommend a reliable distributed computing C# framework, that would be helpful.
I haven't used any of these myself (yet), but I bookmarked this question a little while ago; there are some good suggestions there.
Have you looked at Hadoop MapReduce? It's an open-source implementation of Google's MapReduce framework. Though it's Java and not C#, it sounds like it could be perfectly suited to your scenario; the master server automatically handles load balancing and fault-tolerance in a distributed environment.
I would check out Appistry CloudIQ Platform. It links together multiple machines into a single computing framework identified by a unified address. Your client simply submits jobs to the unified address, and the framework distributes jobs to individual machines. It also monitors task execution, and can automatically restart failed jobs. So if your application is prone to crashing, this framework could be ideal. Rather than submitting the same job to multiple machines (and wasting CPU) to cover the failure case, just submit it once, and let the framework handle restarting the jobs that actually fail. I would consider it ideal for your reliability concerns.
