Threaded Communication and Object Overhead - C#

Denizens of Stack Overflow, I require your knowledge. I am working on a video processing system using WPF and basic C# .NET socket communications. Each client transmits video data at 30 frames per second to a server across a LAN for processing. Some of the processing is handled by each of the clients to mitigate server load.
I have learned programming in an environment in which hardware limitations have never been a concern. Video changes that; "Hello World" did not prepare me for this, to say the least. Implementing either of these two prospective methods is not a serious issue. Determining which one I should devote my time and energy to is where I require assistance.
I have two options (but am open to suggestions!), assuming hardware prevents the clients from producing results as close to real time as possible:
--Queued Client--
The client processes a queue of video frames: each frame is processed and then sent via TCP to the server for further analysis. This system processes only a single frame at a time, in order of sensor capture, and transmits it to the server via a single static socket client. *This system fails to take advantage of modern multi-core hardware.
--Threaded Client--
The client utilizes threaded (background worker) processing and transmission of each frame to the server. Each new frame triggers a new processing thread as well as the instantiation of a new network communication class. *This system utilizes modern hardware but may produce serious timing concerns.
To the heart of my inquiry: does threaded communication produce mostly in-order delivery? I already plan to sync video frames between the clients on the server end... but will data delivery be so far out of order as to create a new problem? Recall that this is communication across a local network.
More importantly, will instantiating a new socket communication class as well as a new (simple) video processing class create enough overhead that each frame should NOT be queued or processed in parallel?
The code is just starting to take shape. Hardware of the client systems is unknown and as such their performance cannot be determined. How would you proceed with development?
I am a college student. As such any input assists me in the design of my first real world application of my knowledge.

'To the heart of my inquiry, does threaded communication produce mostly-in-order communication?' No, not in general. If video frames are processed concurrently, then some mechanism to maintain end-to-end order is usually required (e.g. sequence numbers), together with a suitable protocol and sufficient buffering to reassemble and maintain a valid sequence of frames at the server (with the correct timing and synchronization, if display is required instead of, or as well as, streaming to a disk file).
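A minimal sketch of the sequence-number approach (the class and member names here are illustrative, not from any particular library):

    using System;
    using System.Collections.Generic;

    // Sender side: tag each frame with a sequence number before transmission.
    static class FrameHeader
    {
        public const int Size = sizeof(long);

        public static byte[] Prepend(long sequence, byte[] payload)
        {
            var packet = new byte[Size + payload.Length];
            BitConverter.GetBytes(sequence).CopyTo(packet, 0);
            payload.CopyTo(packet, Size);
            return packet;
        }
    }

    // Server side: hold out-of-order arrivals until the next expected frame appears.
    class FrameReassembler
    {
        private readonly SortedDictionary<long, byte[]> _pending = new SortedDictionary<long, byte[]>();
        private long _nextExpected;

        public IEnumerable<byte[]> Accept(long sequence, byte[] frame)
        {
            _pending[sequence] = frame;
            while (_pending.TryGetValue(_nextExpected, out var ready))
            {
                _pending.Remove(_nextExpected);
                _nextExpected++;
                yield return ready;
            }
        }
    }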
Video usually requires every trick available to optimize performance: pools of frame objects (optimally allocated to avoid false sharing) to avoid garbage collection, thread pools for image processing, etc.
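For example, a minimal frame-buffer pool sketch (ConcurrentBag is one of several reasonable containers; the names are illustrative):

    using System.Collections.Concurrent;

    // Rent a buffer instead of allocating one per frame, so the garbage
    // collector never sees 30 buffers' worth of garbage per second.
    class FramePool
    {
        private readonly ConcurrentBag<byte[]> _buffers = new ConcurrentBag<byte[]>();
        private readonly int _frameSize;

        public FramePool(int frameSize) { _frameSize = frameSize; }

        public byte[] Rent() => _buffers.TryTake(out var buffer) ? buffer : new byte[_frameSize];

        public void Return(byte[] buffer) => _buffers.Add(buffer);
    }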
'will data delivery be so far out of order as to create a new problem?' - quite possibly. If you don't specify and apply a suitable minimum network speed and availability, some streams may stretch the available buffering to the point where frames have to be dropped, duplicated or interpolated to maintain synchronization. Doing that effectively is part of the fun of video protocols :)

There is only one valid response to "is performance of XXXXXXX sufficient": try it and measure.
In your case you should estimate:
- network traffic to/from the server;
- the number of clients and the number of units of work they send per unit of time (i.e. the total number of frames per second, in your case);
- how long processing of one unit of work will take.
When you have estimated the requirements, see whether they look reasonable (e.g. 10 Tb/second of incoming data can't be handled by any normal machine, while 100 Mb/s may work on a normal 1 Gb network).
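A back-of-envelope check with assumed numbers (10 clients, 30 fps, 640x480 grayscale at 1 byte per pixel; substitute your real figures):

    using System;

    // Assumed figures only; replace with your actual client count, frame rate and frame size.
    const int clients = 10, fps = 30, frameBytes = 640 * 480;
    double mbitPerSecond = clients * fps * (double)frameBytes * 8 / 1_000_000;
    Console.WriteLine(mbitPerSecond); // ~737 Mbit/s: already close to saturating a 1 Gb LAN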
Then build the most basic version of the system possible (e.g. use ASP.NET to build a single-page site and post files to it at the required rate; for "processing" use Thread.Sleep) and observe/measure the results.
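For the "processing" placeholder, a stub like this is enough to exercise the pipeline (the 10 ms delay is an assumption, to be replaced once the real algorithm exists):

    using System.Threading;

    // Stand-in for real frame processing: burn a fixed delay so end-to-end
    // throughput can be measured before any real algorithm is written.
    static byte[] ProcessFrameStub(byte[] frame)
    {
        Thread.Sleep(10); // assumed ~10 ms of work per frame
        return frame;
    }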
As for your "will creation of an object be slow": it is extremely unlikely to matter in your case, as you plan to send a huge amount of data over the network. But it is extremely easy to try yourself - Stopwatch + new MyObject() will show you detailed timing.
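A minimal sketch of that measurement (MyObject stands in for whichever socket or processing class you are worried about):

    using System;
    using System.Diagnostics;

    object last = null;
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < 100_000; i++)
        last = new MyObject(); // placeholder type; substitute your own class
    sw.Stop();
    GC.KeepAlive(last); // keep the JIT from discarding the allocations
    Console.WriteLine($"avg {sw.Elapsed.TotalMilliseconds * 1000 / 100_000:F3} us per construction");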

Related

C# and Siemens PLC communication

Does anyone know in which step of the PLC cycle a C# read-data command takes place?
The PLC process steps are:
1. The operating system starts the scan cycle monitoring time.
2. The CPU writes the values from the process-image output table to the output modules.
3. The CPU reads the status of the inputs at the input modules and updates the process-image input table.
4. The CPU processes the user program in time slices and performs the operations specified in the program.
5. At the end of a cycle, the operating system executes pending tasks, such as the loading and clearing of blocks.
6. The CPU then goes back to the beginning of the cycle after the configured minimum cycle time, as necessary, and starts cycle time monitoring again.
My purpose is to find out how a C# application can affect the PLC CPU scan cycle time.
It really depends on how you read values from the PLC, but - in general - it's irrelevant: whenever you read, you get the value stored in PLC memory at that time.
In my experience, client applications connected to PLCs have no measurable effect on scan cycle time. By the way, I highly recommend using OPC UA subscriptions to maximize read/write efficiency and let the PLC firmware manage the tasks internally.
A more detailed answer is probably possible with additional details (PLC type, library used for connection / data read-write, etc.).
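For instance, a sketch of a data-change subscription, assuming the Opc.UaFx client library (the endpoint URL and node id are placeholders; other OPC UA stacks offer equivalent calls):

    using System;
    using Opc.UaFx.Client;

    using (var client = new OpcClient("opc.tcp://plc.local:4840")) // placeholder endpoint
    {
        client.Connect();
        // The server pushes value changes; the PLC firmware decides when to sample,
        // instead of the C# application polling on its own schedule.
        client.SubscribeDataChange("ns=2;s=Machine/Speed", (sender, e) =>
            Console.WriteLine($"New value: {e.Item.Value}"));
        Console.ReadLine(); // keep the process (and the subscription) alive
    }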

How to measure the volume of data actually used up by a socket?

I need to measure as precisely as possible how much of the cell service provider's data limit my application uses up.
Is it possible to get the volume of data transferred by a .Net UDP Socket over the network interface (including overhead of UDP and IP)?
The application is a server communicating with a great number of embedded devices, each of which is connected to the internet using GPRS with a very low data limit (several megabytes per month at best, so even a few bytes here and there matter). I know the devices don't open connections with any other servers, so measuring the traffic server-side should be enough.
I know I can't get a 100% accurate number (I have no idea what traffic the service provider charges for), but I would like to get as close as possible.
Assuming this is IPv4, you could add 28 bytes for every datagram you transfer, but your problem is going to be detecting packet loss and, potentially, fragmentation. You could add some metadata to your communication to detect packet loss (e.g. sequence numbers, acknowledgments and so on), but that would of course add more overhead, which you might not want. Maybe a percentage for expected packet loss could help. As for fragmentation, again you could compensate when the size of your message is greater than the MTU (which I believe could be quite small, like 296 bytes; not too sure though, maybe check with your mobile provider).
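A sketch of that accounting, assuming IPv4 (the 576-byte MTU is a placeholder; the real value depends on the carrier, and loss/retransmission is not modelled):

    using System;
    using System.Net;
    using System.Net.Sockets;

    // Counts payload plus an 8-byte UDP header per datagram and an estimated
    // 20-byte IPv4 header per fragment.
    class MeteredUdpSender
    {
        private readonly Socket _socket =
            new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);
        private const int AssumedMtu = 576; // placeholder; confirm with the provider

        public long EstimatedWireBytes { get; private set; }

        public void Send(byte[] payload, EndPoint target)
        {
            _socket.SendTo(payload, target);
            int udpLength = payload.Length + 8; // UDP header + data
            int fragments = Math.Max(1, (int)Math.Ceiling(udpLength / (double)(AssumedMtu - 20)));
            EstimatedWireBytes += udpLength + fragments * 20; // one IPv4 header per fragment
        }
    }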
Another, somewhat non-intrusive, option could be reading the network performance counters of your process, or restricting your communication to a separate AppDomain.

Improve C# Socket performance

We have a TCP async socket game server written in C# serving many concurrent connections. Everything works fine except for the problems below:
Frequent disconnections for some users (Not all mind you)
Degraded performance for users on slow connections
Most of the users who face this problem are using GSM connections (portable USB dongles based on CDMA) and often the signal strength is poor. So basically they have rather poor bandwidth. Perhaps under 1KB per sec.
My question: What can we do to make the connections more stable and improve performance even for the slower connections?
I am thinking of dynamic TCP buffers on the client and server side, but I am not really sure of the performance degradation caused by the overhead of doing this dynamically for each connection, or whether my direction is even correct.
Max data packet size is under 1 KB.
Current TCP buffer size on server and client is 16KB
Any pointers or references on how to write stable async socket code in C# for poor or slow connections would be a great help. Thanks in advance.
"Performance" is a very relative term. It looks like your main concern is with data transfer rates. Unfortunately you can't do much about it given low-bandwidth connections - maybe data compression can help, but actual effect depends on your data, and there's always a tradeoff between transfer rate improvement vs. compression/de-compression delays. There's also latency to consider for any sort of interactive game.
As @Pierre suggested in the comments, you might consider UDP for the transport, but that only works if you can tolerate packet loss and re-ordering, and that again depends on the data and what you do with it.
Another approach I would suggest investigating is to provide two different quality-of-service modes. Clients on good links can use full functionality (say, full-resolution game images), while low-bandwidth clients would get reduced options (say, much smaller size low-quality images). You can measure initial round-trip times on client connect/handshake to select the mode.
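A sketch of that selection, assuming the server echoes a probe byte during the handshake (the 150 ms threshold is an arbitrary assumption):

    using System.Diagnostics;
    using System.Net.Sockets;

    static int SelectQualityTier(NetworkStream stream)
    {
        var probe = new byte[1];
        var sw = Stopwatch.StartNew();
        stream.Write(probe, 0, 1); // the server is assumed to echo the byte straight back
        stream.Read(probe, 0, 1);
        sw.Stop();
        return sw.ElapsedMilliseconds < 150 ? 0 /* full quality */ : 1 /* reduced */;
    }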
Hope this helps a bit.

RabbitMQ transfer rates speed up?

I am looking for ideas on how to speed up message transfers through RabbitMQ.
I installed the latest version on 64-bit Windows, running a server on my local machine, on which I also publish and consume through a C# implementation. I initially maxed out at 40,000 messages per second, which is impressive but does not suit my needs (I am competing with a custom binary reader which can handle 24 million unparsed 16-byte byte arrays per second; obviously I don't expect to get anywhere close to that, but I am at least attempting to improve). I need to send around 115,000,000 messages as fast as possible. I do not want to persist the data, and the connection is going to be direct to one single consumer.
I then built chunks of my 16-byte arrays and published them onto the bus without any improvement. The transfer rate maxed out at 45 MB/second. I find this very slow given that, in the end, it should boil down to raw transmission speed, because I could create byte arrays several megabytes in size, at which point the cost of routing by the exchange becomes negligible versus raw transmission speed. Why does my message bus max out at 45 MB/second transfer speed?
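For reference, the batched, non-persistent publish path described above looks roughly like this with the RabbitMQ .NET client (queue name and batch size are illustrative):

    using RabbitMQ.Client;

    var factory = new ConnectionFactory { HostName = "localhost" };
    using (var connection = factory.CreateConnection())
    using (var channel = connection.CreateModel())
    {
        channel.QueueDeclare("ticks", durable: false, exclusive: false,
                             autoDelete: true, arguments: null);
        var props = channel.CreateBasicProperties();
        props.Persistent = false; // no disk persistence, as described above

        var batch = new byte[16 * 4096]; // 4096 16-byte records per message
        // ... fill the batch from the source ...
        channel.BasicPublish(exchange: "", routingKey: "ticks",
                             basicProperties: props, body: batch);
    }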
Bump... and update: I have not seen any answer to this question in quite some time. I am a bit surprised that not a single RabbitMQ developer chimed in. I played extensively with RabbitMQ and ZeroMQ. I have decided that RabbitMQ is not up to the task when looking at high-throughput in-process messaging solutions. The broker implementation, and especially the parsing logic, is a major bottleneck to improving throughput. I dropped RabbitMQ from my list of possible options.
There was a white paper describing how they provided a solution for managing low-latency, high-throughput financial data streams, but it sounds to me as if all they did was throw hardware at it, rather than provide a solution that targets low-latency, high-throughput requirements.
ZeroMQ did a superb job after I studied the documentation more intensively. I can run communication in-process, and it provides stable push/pull, pub/sub, req/rep, and pair patterns, which I need. I was looking for blocking logic within the pub/sub pattern, which ZeroMQ does not provide (it drops messages instead when a high-water mark is exceeded), but the push/pull pattern does provide blocking. So pretty much all I need is provided. The only gripe I have is with their understanding of event processing; the event structure implementation through poll/multiplex is not very satisfactory.
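For what it's worth, the blocking push/pull pattern mentioned above looks like this with NetMQ, the C# port of ZeroMQ (address and payload sizes are illustrative; the two blocks run in separate processes or threads):

    using NetMQ;
    using NetMQ.Sockets;

    // Producer process: SendFrame blocks once the high-water mark is reached,
    // giving the back-pressure that pub/sub does not provide.
    using (var push = new PushSocket("@tcp://127.0.0.1:5555")) // '@' means bind
    {
        push.SendFrame(new byte[16]);
    }

    // Consumer process:
    using (var pull = new PullSocket(">tcp://127.0.0.1:5555")) // '>' means connect
    {
        byte[] message = pull.ReceiveFrameBytes();
    }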

Can interprocess communication be as fast as in-process events (using WCF and C#)?

I have an application that performs analysis on an incoming event flow (a CEP engine).
This flow can come from different sources (database, network, etc.).
For efficient decoupling, I want this service to expose a named pipe using WCF, and to allow a different application to read the data from the source and feed it into the service.
So one process is in charge of getting and handling the incoming data while the other analyzes it, the two connected using WCF with the named pipe binding. Both will be deployed on the same machine.
The question is: will I notice lower throughput with WCF in the middle than if I had simply coupled the two services into a single process and used regular events?
No. On modern mainstream operating systems, IPC will never be, and can never be, as fast as in-process eventing. The reason is the overhead of context switching associated with activating different processes. Even on a multi-core system where distinct processes run on distinct cores, so that there is no cost associated with activating one process versus another (both are always active), communication across processes still requires crossing security boundaries, hitting the network stack (even when using pipes), and so on. Where a local function call is on the order of thousands of CPU cycles to invoke, an IPC will be millions.
So IPC will be slower than in-process communication. Whether that actually matters in your case is a different question. For example, suppose you have an operation that requires a Monte Carlo simulation that runs for two hours. In that case it really doesn't matter whether invoking the operation takes 1 ms or 1000 ms.
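A sketch that makes the gap concrete: time a trivial in-process call against a one-byte round trip over a named pipe (the iteration count is arbitrary):

    using System;
    using System.Diagnostics;
    using System.IO.Pipes;
    using System.Threading.Tasks;

    class IpcVsInProcess
    {
        const int Iterations = 10_000;

        static void Main()
        {
            // In-process: a plain delegate invocation.
            Func<int, int> handler = x => x + 1;
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < Iterations; i++) handler(i);
            sw.Stop();
            Console.WriteLine($"In-process: {sw.Elapsed.TotalMilliseconds / Iterations} ms/call");

            // IPC: a one-byte round trip over a named pipe on the same machine.
            var server = Task.Run(() =>
            {
                using (var pipe = new NamedPipeServerStream("bench"))
                {
                    pipe.WaitForConnection();
                    var buf = new byte[1];
                    for (int i = 0; i < Iterations; i++)
                    {
                        pipe.Read(buf, 0, 1);  // echo each byte back to the client
                        pipe.Write(buf, 0, 1);
                    }
                }
            });

            using (var client = new NamedPipeClientStream(".", "bench"))
            {
                client.Connect();
                var buf = new byte[1];
                sw.Restart();
                for (int i = 0; i < Iterations; i++)
                {
                    client.Write(buf, 0, 1);
                    client.Read(buf, 0, 1);
                }
                sw.Stop();
            }
            server.Wait();
            Console.WriteLine($"Named pipe: {sw.Elapsed.TotalMilliseconds / Iterations} ms/round trip");
        }
    }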
Usually, performance of the communication is not what you want to optimize for. Even if performance is important, focusing on one small aspect of performance - let's say, whether to use IPC or local function calls - is probably the wrong way to go about things.
I assumed "CEP" referred to "complex event processing" which implies high throughput, high volume processing. So I understand that performance is important to you.
But, for true scalability and reliability, you cannot simply optimize in-process eventing; you will need to rely on multiple computers and scale out. This will imply some degree of IPC, one way or the other. It's obviously important to be efficient at the smaller scale (events), but your overall top-end performance will be largely bounded by the architecture you choose for scale-out.
WCF is nice because of the flexibility it allows in moving building blocks from the local machine to a remote machine, and because of the Channel stack, you can add communication services in a modular way.
Whether this is important to you is up to you to decide.
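A sketch of the named-pipe setup itself, assuming a simple one-way contract (the service name, URI and contract are placeholders):

    using System;
    using System.ServiceModel;

    [ServiceContract]
    public interface IEventSink
    {
        [OperationContract(IsOneWay = true)]
        void Push(byte[] eventData);
    }

    public class EventSink : IEventSink
    {
        public void Push(byte[] eventData) { /* hand off to the CEP engine */ }
    }

    public static class Wiring
    {
        // Analysis process: host the service over a named pipe.
        public static ServiceHost OpenHost()
        {
            var host = new ServiceHost(typeof(EventSink), new Uri("net.pipe://localhost/cep"));
            host.AddServiceEndpoint(typeof(IEventSink), new NetNamedPipeBinding(), "sink");
            host.Open();
            return host;
        }

        // Reader process: create a proxy channel and feed events in.
        public static IEventSink ConnectClient() =>
            new ChannelFactory<IEventSink>(new NetNamedPipeBinding(),
                new EndpointAddress("net.pipe://localhost/cep/sink")).CreateChannel();
    }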
