Currently I am working on a multi-process desktop application on Windows. This application will be a shrink-wrapped product deployed on client machines across the world. While we can set broad specifications for those machines - e.g. Windows XP SP3 with .Net 4.0 CF - we won't have control over them and we can't be too specific about their configuration - e.g. we cannot require that the machine have a CUDA 1.4-capable graphics processor, etc.
Some of these processes are managed (.Net 4.0) and others are unmanaged (C++ Win32). The processes need to share data. The options I have evaluated to date are:
Tcp sockets
Named Pipes
Pipes seem to perform a little better, but for our needs the performance of both is acceptable. And sockets give us the flexibility of crossing machine (and operating system - we would like to support non-Microsoft OSes eventually) boundaries in the future, hence our preference for sockets.
However - my major concern is this: if we use TCP sockets, are we likely to run into issues with firewalls? Has anyone else deployed desktop applications/programs that use TCP for IPC and experienced issues? If so, what kind?
I know this is a fairly open-ended question and I will be glad to rephrase. But I would really like to know what kind of potential problems we are likely to run into.
edit: To throw a little more light on this - we are only transporting a few PODs: ints, floats and strings. We have built a layer of abstraction that offers two paradigms - request/response and subscription. The transport layer has been abstracted away, and we currently have two implementations - pipe-based and TCP-based.
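For illustration, a minimal sketch of what such an abstraction might look like (the names and signatures are hypothetical, not the actual code):

```csharp
using System;

// Hypothetical transport abstraction offering the two paradigms described above.
public interface ITransport
{
    // Request/response: send a message and block for the peer's reply.
    byte[] Request(byte[] payload);

    // Subscription: register a callback invoked whenever the peer publishes on a topic.
    IDisposable Subscribe(string topic, Action<byte[]> onMessage);
}

// The application would then ship two implementations of ITransport - one wrapping
// NamedPipeServerStream/NamedPipeClientStream and one wrapping TcpListener/TcpClient -
// and pick one at startup via configuration.
```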
Performance of pipes is often better on a fast LAN, but TCP is often better on slower networks or WANs. See the MSDN points below.
TCP is also more configurable. Concerning firewalls, they allow you to open/close communication ports. If that's not an option or a concern, an alternative would be HTTP (REST/JSON, web services, XML-RPC, etc.), but you have to consider whether the HTTP overhead is acceptable. Make sure you test with real-world datasets: passing only trivial data in a test makes the overhead look unreasonable, when it would be perfectly reasonable with a real-world payload.
Some other info from msdn:
In a fast local area network (LAN) environment, Transmission Control
Protocol/Internet Protocol (TCP/IP) Sockets and Named Pipes clients
are comparable in terms of performance. However, the performance
difference between the TCP/IP Sockets and Named Pipes clients becomes
apparent with slower networks, such as across wide area networks
(WANs) or dial-up networks. This is because of the different ways the
interprocess communication (IPC) mechanisms communicate between peers.
For named pipes, network communications are typically more
interactive. A peer does not send data until another peer asks for it
using a read command. A network read typically involves a series of
peek named pipes messages before it begins to read the data. These can
be very costly in a slow network and cause excessive network traffic,
which in turn affects other network clients.
It is also important to clarify if you are talking about local pipes
or network pipes. If the server application is running locally on the
computer running an instance of Microsoft® SQL Server™ 2000, the local
Named Pipes protocol is an option. Local named pipes runs in kernel
mode and is extremely fast.
For TCP/IP Sockets, data transmissions are more streamlined and have
less overhead. Data transmissions can also take advantage of TCP/IP
Sockets performance enhancement mechanisms such as windowing, delayed
acknowledgements, and so on, which can be very beneficial in a slow
network. Depending on the type of applications, such performance
differences can be significant.
TCP/IP Sockets also support a backlog queue, which can provide a
limited smoothing effect compared to named pipes that may lead to pipe
busy errors when you are attempting to connect to SQL Server.
In general, sockets are preferred in a slow LAN, WAN, or dial-up
network, whereas named pipes can be a better choice when network speed
is not the issue, as it offers more functionality, ease of use, and
configuration options.
For more information about TCP/IP, see the Microsoft Windows NT®
documentation.
If you need to impersonate the named pipe client's security credentials, there's really only one option :) And named pipes also have nicer names (although DNS SRV records can provide those for TCP ports also).
Otherwise, there's not much difference. Both treat the data as a stream of bytes, leaving you responsible for finding message boundaries yourself. Named pipes have an additional option of keeping message boundaries for you, but be warned: you must create the pipe in message mode and also explicitly set the read mode on the receiving end.
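To illustrate that last point, here is a minimal C# sketch (the pipe name "demo_pipe" is made up) showing the pipe created in message mode on the server and the read mode set explicitly on the client:

```csharp
using System;
using System.IO.Pipes;
using System.Text;

class MessagePipeDemo
{
    static void Server()
    {
        // The server creates the pipe with PipeTransmissionMode.Message...
        using (var server = new NamedPipeServerStream(
            "demo_pipe", PipeDirection.InOut, 1, PipeTransmissionMode.Message))
        {
            server.WaitForConnection();
            byte[] msg = Encoding.UTF8.GetBytes("hello");
            server.Write(msg, 0, msg.Length);   // sent as one discrete message
        }
    }

    static void Client()
    {
        using (var client = new NamedPipeClientStream(".", "demo_pipe", PipeDirection.InOut))
        {
            client.Connect();
            client.ReadMode = PipeTransmissionMode.Message; // ...and the client must opt in explicitly

            var buffer = new byte[1024];
            var sb = new StringBuilder();
            do
            {
                int read = client.Read(buffer, 0, buffer.Length);
                sb.Append(Encoding.UTF8.GetString(buffer, 0, read));
            } while (!client.IsMessageComplete);            // loop until the whole message arrived
            Console.WriteLine(sb.ToString());
        }
    }
}
```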
If I understand your requirements correctly, you need to communicate between processes running on the same computer. The processes probably all run in the security context of the user who is logged on interactively.
In that case I should mention that there are different aspects to the solution. One problem is simply sharing the data between the applications. Another is the protocol which defines how the common data can be accessed and modified and how the communication between the processes takes place. For example, you can have one process which provides the data and all the others subscribe to it. Another case: you can have common data which can be read or modified by all the applications, and you just need to be sure that no two processes modify the shared data at the same time, and that nobody reads the data while another process is modifying it. Of course, many other communication scenarios are possible.
With that in mind, I would suggest two other options which you did not include in your question:
use of memory mapped files (see here and here)
use of a COM interface
Both approaches can be implemented well in both .NET and unmanaged C++. The use of memory mapped files is the best way from a performance point of view. If you create a view that is not associated with a physical file, you get plain shared memory which can be used between processes. Additionally, you can use a Mutex or Event to ensure the memory is not accessed by multiple applications at the same time.
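As a rough sketch of that idea in C# (the object names "Local\MySharedMem" and "Local\MySharedMutex" are placeholders):

```csharp
using System;
using System.IO.MemoryMappedFiles;
using System.Threading;

class SharedMemoryWriter
{
    static void Main()
    {
        // A named, pagefile-backed mapping (no physical file) shared across processes,
        // guarded by a named Mutex so readers and writers don't overlap.
        using (var mmf = MemoryMappedFile.CreateOrOpen("Local\\MySharedMem", 4096))
        using (var mutex = new Mutex(false, "Local\\MySharedMutex"))
        using (var accessor = mmf.CreateViewAccessor())
        {
            mutex.WaitOne();                 // keep other processes out while we write
            try
            {
                accessor.Write(0, 42);       // an int at offset 0
                accessor.Write(4, 3.14);     // a double at offset 4
            }
            finally
            {
                mutex.ReleaseMutex();
            }
        }
    }
}
```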
In the simplest scenario you can even use #pragma data_seg in C++ to place some data in a named section of a DLL and use the /SECTION linker option (like /SECTION:.MYSEC,RWS) to make that data shared. You can then load the DLL in all your .NET applications and all your unmanaged C++ applications, giving every process a very simple way to access the common data.
If you need a more complex communication scenario, the COM interface approach in C++/.NET could be the best choice. In that case I would recommend the article which describes, step by step, how to implement a Primary Interop Assembly with a COM interface purely in .NET and use it from both .NET and C++ COM for the communication.
Related
I have an application that performs analysis on incoming event flow (CEP engine).
This flow can come from different sources (database, network, etc...).
For efficient decoupling, I want this service to expose a named pipe endpoint using WCF, and allow a different application to read the data from the source and feed it into the service.
So one process is in charge of getting and handling the incoming data while the other analyzes it, and the two are connected using WCF with the named pipes binding. Both will be deployed on the same machine.
The question is: will I notice lower throughput with WCF in the middle than if I had simply coupled the two services into a single process and used regular events?
No, in modern mainstream operating systems, IPC will never be, can never be, as fast as in-process eventing. The reason for this is the overhead of context switching associated to activating different processes. Even for a multi-core system where distinct processes run on distinct cores, though they each run independently (and therefore there is no cost associated to activating one process versus another - they are both always active), the communication across processes still requires crossing security boundaries, hitting the network stack (even if using pipes), and so on. Where a local function call will be on the order of 1000's of cpu cycles to invoke, an IPC will be millions.
So IPC will be slower than in-process communication. Whether that actually matters in your case is a different question. For example, suppose you have an operation that requires a Monte Carlo simulation that runs for 2 hours. In this case it really doesn't matter whether invoking the operation takes 1 ms or 1000 ms.
Usually, performance of the communication is not what you want to optimize for. Even if performance is important, focusing on one small aspect of performance - let's say, whether to use IPC or local function calls - is probably the wrong way to go about things.
I assumed "CEP" referred to "complex event processing" which implies high throughput, high volume processing. So I understand that performance is important to you.
But, for true scalability and reliability, you cannot simply optimize in-process eventing; you will need to rely on multiple computers and scale out. This will imply some degree of IPC, one way or another. It's obviously important to be efficient at the smaller scale (events), but your overall top-end performance will be largely bounded by the architecture you choose for scaling out.
WCF is nice because of the flexibility it allows in moving building blocks from the local machine to a remote machine, and because of the Channel stack, you can add communication services in a modular way.
Whether this is important to you, is up to you to decide.
I'm in the process of learning C# and just need pointing in the right direction. I want to build a client in C# that communicates with a server running PHP/MySQL. There will need to be almost constant communication between the two. It will be for a game, so I need low-latency, bi-directional communication. I'm not looking for an exact how-to, but rather what method I should use to connect the two for the fastest and most reliable connection. I have read that others use XML, but that seems like it would be slow if used near-constantly, like once or more per second - but I could be totally wrong. Thanks in advance!
Normally communication with those characteristics is made over a persistent TCP connection. C# offers lots of ready-to-use functionality in the System.Net.Sockets namespace (start looking from TcpClient and TcpListener; there are also more low-level interfaces if you need them).
The important question here is: what exactly do you mean by "server running PHP"? If the server offers only an HTTP interface, then you would find it more natural to communicate not with sockets but with the WebClient or the more low-level HttpWebRequest classes instead.
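For example, a bare-bones persistent TCP connection with TcpClient might look like the following; the host, port and line-based commands are invented purely for illustration:

```csharp
using System;
using System.IO;
using System.Net.Sockets;

class GameConnection
{
    static void Main()
    {
        // Placeholder endpoint - wherever the game server is listening.
        using (var client = new TcpClient("game.example.com", 9000))
        using (var stream = client.GetStream())
        using (var writer = new StreamWriter(stream) { AutoFlush = true })
        using (var reader = new StreamReader(stream))
        {
            writer.WriteLine("LOGIN player1");       // send a line-based command
            string reply = reader.ReadLine();        // block for the server's reply
            Console.WriteLine("Server said: " + reply);
        }
    }
}
```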
Ah, writing a game in C# as a means to get started with the language! How many have started this way.
Have you defined your client-server protocol yet? I'm not talking about TCP vs. UDP, which TomTom and Jon have discussed. I mean, what is the data stream going to look like?
Packet fragmentation is the enemy of low-latency network code. Learn about MTU and packet fragmentation, Nagle's algorithm, etc. and write down some notes for later when you implement the network code. Make sure you calculate the smallest size packet you would be interested in sending, how big its headers might be, and how large of a payload you can fit into that packet. Then see if you can come up with a protocol that uses the available space efficiently.
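As a concrete example of one of those notes, disabling Nagle's algorithm in C# is a one-liner on the connection; the endpoint below is a placeholder:

```csharp
using System.Net.Sockets;

class LowLatencySetup
{
    static TcpClient Connect()
    {
        // Placeholder endpoint. Disabling Nagle's algorithm (TCP_NODELAY) sends small
        // packets immediately instead of coalescing them, which helps latency at the
        // cost of putting more, smaller packets on the wire.
        var client = new TcpClient("game.example.com", 9000);
        client.NoDelay = true;
        return client;
    }
}
```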
You may gain a lot more by optimizing your server application and/or porting it to a different language. Just because you CAN use PHP for everything server side doesn't mean you SHOULD. Keep the part that shows you useful information in a web browser, and evaluate whether you should rewrite the time-critical and game client communication parts in another language. Interpreted languages are not especially well known for their speed when crunching real-time game world data. Sure, I once wrote something like that in Perl using POE, but ultimately it was a lot less performant than the C code I was mimicking.
Finally, I would recommend you look into XNA, since it has a lot of this stuff already.
I see a lot of questions on the topic of network programming. Despite all the questions and answers, I just do not know which way is best to start. Is it better to start from the lowest level, or to work immediately in .NET C# without going into the details below the abstraction? Is it better to go with Winsock, or BSD socket programming on Linux?
You can still do low-level TCP or UDP programming in C#, so at that point it is really just a matter of choice whether you want to write network code in C, C#, etc... If all you are trying to do is learn how to write network code, I would consider the language more of a personal choice as the underlying network concepts remain the same.
.NET C#
You have:
low level API to work with TCP, UDP etc.
.NET Remoting
WCF (http, tcp, named pipes, MSMQ)
I would recommend the last option, but it depends on what exactly you are trying to learn:
how to build distributed apps, or the nitty-gritty details of the low-level socket API.
It all depends on your existing programming skills. I would not start with the lower levels such as the Socket class (or TcpClient/UdpClient) without having at least a basic understanding of asynchronous programming.
A lot of people who start with socket programming launch a separate thread for reading, since the Read method blocks. That's a very ineffective way to solve the problem, especially in servers. BeginRead/EndRead is the way to go.
Next up is designing a transfer protocol, since TCP doesn't guarantee that a complete message is delivered in one read. It only guarantees that the bytes arrive in the correct order.
The next big thing with socket programming is how to handle incoming data. A newbie mistake is to start appending strings, which results in a lot of memory usage in server applications. Use byte[] buffers and a buffer pool (flyweight pattern) to manage incoming data; that should be an easy task if you've designed a good protocol.
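To tie those points together, here is a hedged sketch of a receiver that uses BeginRead/EndRead with a reused byte[] buffer and a simple 4-byte length-prefixed framing scheme (the names and framing format are illustrative, not prescriptive):

```csharp
using System;
using System.Net.Sockets;

class FramedReceiver
{
    private readonly NetworkStream _stream;
    private readonly byte[] _readBuffer = new byte[8192];  // reused, not re-allocated per read
    private byte[] _pending = new byte[0];                 // bytes accumulated so far (a real server would pool these)

    public event Action<byte[]> MessageReceived;

    public FramedReceiver(NetworkStream stream)
    {
        _stream = stream;
        _stream.BeginRead(_readBuffer, 0, _readBuffer.Length, OnRead, null);
    }

    private void OnRead(IAsyncResult ar)
    {
        int bytesRead = _stream.EndRead(ar);
        if (bytesRead == 0) return;                        // remote side closed the connection

        // Append the newly received bytes to what we already have.
        var combined = new byte[_pending.Length + bytesRead];
        Buffer.BlockCopy(_pending, 0, combined, 0, _pending.Length);
        Buffer.BlockCopy(_readBuffer, 0, combined, _pending.Length, bytesRead);
        _pending = combined;

        // Extract complete messages: [4-byte little-endian length][payload].
        while (_pending.Length >= 4)
        {
            int length = BitConverter.ToInt32(_pending, 0);
            if (_pending.Length < 4 + length) break;       // wait for the rest of the message

            var payload = new byte[length];
            Buffer.BlockCopy(_pending, 4, payload, 0, length);

            var rest = new byte[_pending.Length - 4 - length];
            Buffer.BlockCopy(_pending, 4 + length, rest, 0, rest.Length);
            _pending = rest;

            var handler = MessageReceived;
            if (handler != null) handler(payload);
        }

        _stream.BeginRead(_readBuffer, 0, _readBuffer.Length, OnRead, null);
    }
}
```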
As you can see, it's quite a big task to take on with no prior experience. WCF is a much better option since it handles most of that stuff for you.
Named Pipes? XML-RPC? Standard Input-Output? Web Services?
I would not use unsafe stuff like Shared Memory and similar approaches.
Named pipes would be the fastest method, but they only work for communication between processes on the same computer. Because named pipe communication doesn't go all the way down the network stack, it will always be faster for local communication.
Anonymous Pipes may only be used on the local machine. However, Named Pipes may traverse the network.
I left out Shared Memory since you specifically mentioned that you don't want to go that route, though Shared Memory would be even faster than named pipes.
So it depends on whether you only need to communicate between processes on the same computer or across different computers. Any XML-based communication protocol (e.g. Web Services) will usually be slower due to the massive overhead of XML.
I don't think there's a quick answer to this. If I were you, I would buy/borrow a copy of Advanced Programming in the Unix Environment (APUE) by Stevens and Rago and read Chapters 15 and 16 on IPC. It's a brilliant book if you really want to understand how *nix (much of it applies to any POSIX system) works down to the kernel level.
If you must have a quick answer, I would say the following (without putting a huge amount of thought into it), in descending order of efficiency:
Local Machine IPC
Shared Memory / Memory Mapped Files
Named Pipe / FIFO (only between related processes - i.e. after a fork)
Unix Domain Socket
Network IPC / Internet Sockets
Datagram Sockets
Stream Sockets
Raw Sockets
At both levels, you are going to have to think about how the data you transfer is encoded/decoded and trade off between memory usage and CPU utilization.
At the network level, you will have to consider what layers of protocols you are going to run on top of. Most commonly, below the application layer you will be choosing between TCP and UDP. TCP has a lot more overhead as it does error correction, checksumming and lots of other things. If you need in-order delivery of messages you need to use TCP as opposed to UDP.
On top of these are other protocols like HTTP, SOAP (on top of HTTP or another protocol like FTP/SMTP etc.). A binary protocol is going to be more efficient as long as you are network bound rather than CPU bound. If using SOAP on the MS.Net platform, then binary encoding of the messages is going to be quicker across the network but may be more CPU intensive.
I could go on; it's not a simple question. Learning where the latencies are and how buffering is handled is key to being able to make decisions on the trade-offs you are always forced into with IPC. I'd recommend the APUE book above if you really want to know what is going on under the hood...
Windows messages are one of the fastest ways to do IPC - after all, Windows is built on them.
It's possible to use WM_COPYDATA with P/Invoke calls to exchange data between two form-based .Net applications, and I've got an open source library for doing exactly that. I've benchmarked around 1771 msg/sec on a fairly hot laptop.
http://thecodeking.github.com/XDMessaging.Net
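For reference, this is roughly what the underlying WM_COPYDATA mechanism looks like when done by hand with P/Invoke. It is a simplified sketch, not the XDMessaging API itself; the receiving form would override WndProc and handle message 0x004A:

```csharp
using System;
using System.Runtime.InteropServices;

class CopyDataSender
{
    private const int WM_COPYDATA = 0x004A;

    [StructLayout(LayoutKind.Sequential)]
    private struct COPYDATASTRUCT
    {
        public IntPtr dwData;   // application-defined value
        public int cbData;      // size of the data in bytes
        public IntPtr lpData;   // pointer to the data
    }

    [DllImport("user32.dll", CharSet = CharSet.Auto)]
    private static extern IntPtr SendMessage(IntPtr hWnd, int msg, IntPtr wParam, ref COPYDATASTRUCT lParam);

    public static void Send(IntPtr targetWindowHandle, string text)
    {
        IntPtr buffer = Marshal.StringToHGlobalUni(text);  // unmanaged copy of the string
        try
        {
            var cds = new COPYDATASTRUCT
            {
                dwData = IntPtr.Zero,
                cbData = (text.Length + 1) * 2,             // UTF-16 chars + null terminator
                lpData = buffer
            };
            SendMessage(targetWindowHandle, WM_COPYDATA, IntPtr.Zero, ref cds);
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);
        }
    }
}
```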
I don't know why you won't go with shared memory, but it's very fast between C# apps on the same machine, and very reliable (unlike TCP sockets). spazzarama/SharedMemory is a fantastic C# library that supports shared arrays and buffers with a simple high-level API. You just initialize the class with a common memory file name (on the client and server sides) and then update the array; values magically appear on the other side!
I have a core .NET application that needs to spawn an arbitrary number of subprocesses. These processes need to be able to access some form of state object in the core application.
What is the best technique? I'll be moving a large amount of data between processes (Bitmaps), so it needs to be fast.
WCF would probably fit the bill.
Here's a really good article on .NET Remoting for performing distributed intensive analysis. Though Remoting has been superseded by WCF, the article is still relevant and shows how to make the calls asynchronously, etc.
This article contrasts WCF with .NET Remoting - the key takeaway is that WCF throughput outperforms Remoting for small data, but approaches Remoting performance as the data size increases.
I have similar requirements and am using Windows Communication Foundation to do that right now. My data sizes are probably a bit smaller though.
For reference, I'm doing about 30-60 requests per second of about 5 KB-30 KB each on a quad-core machine. WCF has been holding up quite well so far.
With WCF you have the added advantages of choosing a transport protocol and security mode that is suitable for your application.
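For example, a minimal WCF host using the named pipe transport for local communication might look like the following; the contract, address and names are illustrative only:

```csharp
using System;
using System.ServiceModel;

// Hypothetical contract for handing data to the analysis process.
[ServiceContract]
public interface IAnalysisService
{
    [OperationContract]
    void Submit(byte[] payload);
}

public class AnalysisService : IAnalysisService
{
    public void Submit(byte[] payload) { /* hand off to the analysis engine */ }
}

class Host
{
    static void Main()
    {
        using (var host = new ServiceHost(typeof(AnalysisService),
            new Uri("net.pipe://localhost/analysis")))
        {
            // NetNamedPipeBinding keeps the traffic on-machine; swapping the binding
            // (e.g. to NetTcpBinding) changes the transport without touching the contract.
            host.AddServiceEndpoint(typeof(IAnalysisService),
                new NetNamedPipeBinding(), "feed");
            host.Open();
            Console.WriteLine("Listening on net.pipe://localhost/analysis/feed");
            Console.ReadLine();
        }
    }
}
```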
I'd be hesitant to move large data around. I'd be inclined to move pointers to large data around instead, i.e., memory mapped files.
If you truly need to have separate processes, there are always named pipes, which would perform quite well.
However, would an application domain boundary suffice? Then you could do object marshalling and things would be a lot easier. Your application could work with shared instances of the same object by deriving from MarshalByRefObject.
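A minimal sketch of that approach (the type and domain names are made up):

```csharp
using System;

// Deriving from MarshalByRefObject lets calls on this object cross the
// application domain boundary via a proxy.
public class SharedState : MarshalByRefObject
{
    public int Counter { get; set; }
}

class Program
{
    static void Main()
    {
        AppDomain worker = AppDomain.CreateDomain("Worker");

        // Create the object inside the worker domain and get a proxy to it here.
        var state = (SharedState)worker.CreateInstanceAndUnwrap(
            typeof(SharedState).Assembly.FullName,
            typeof(SharedState).FullName);

        state.Counter = 42;               // executes in the Worker domain via the proxy
        Console.WriteLine(state.Counter);

        AppDomain.Unload(worker);
    }
}
```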
You can use .NET Remoting for inter-process communication (IPC) with IpcChannel. Otherwise you can search for shared memory wrappers and other IPC forms.
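As a rough sketch, registering an IpcChannel on the server side looks something like this (the port and object names are placeholders):

```csharp
using System;
using System.Runtime.Remoting;
using System.Runtime.Remoting.Channels;
using System.Runtime.Remoting.Channels.Ipc;

// A MarshalByRefObject exposed over the Remoting IpcChannel
// (which uses named pipes under the hood).
public class StateServer : MarshalByRefObject
{
    public string GetStatus() { return "ok"; }
}

class ServerHost
{
    static void Main()
    {
        ChannelServices.RegisterChannel(new IpcChannel("myAppPort"), false);
        RemotingConfiguration.RegisterWellKnownServiceType(
            typeof(StateServer), "state", WellKnownObjectMode.Singleton);
        Console.WriteLine("Remoting server ready at ipc://myAppPort/state");
        Console.ReadLine();

        // A client in another process would connect with:
        // var proxy = (StateServer)Activator.GetObject(typeof(StateServer), "ipc://myAppPort/state");
    }
}
```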
There is an MSDN article comparing WCF to a variety of methods, including Remoting. However, unless I am reading the bar graph wrong, it shows Remoting to be the same or slightly better (contrary to what the other answer said).
There is also a blog post about WCF vs. Remoting. The blog post clearly shows Remoting is faster for binary objects, and since you are passing Bitmaps (binary objects), it seems Remoting, shared memory, or another IPC option might be faster, although WCF might not be a bad choice anyway.