How quickly do events fire? - c#

I am developing an application to work alongside a hardware configuration.
The hardware connects to the application through a serial port and will be sending data to it at a fast rate (roughly one sample every 2-4 μs).
My plan is to receive the data via the serial port in the parent form, and then pass it, via an event, to a UserControl dedicated to displaying it.
While I've had no problem with directly transmitting it to a single form, I'm unfamiliar with events, and am not sure if the event will fire fast enough to smoothly handle the data.
My questions are:
How fast can an event fire? Is it based off hardware, or are there software limitations?
What are some drawbacks to handling data with this method?
Are there better alternatives to passing data via an event?
Is it at all feasible to process data at this rate?

Events are plenty fast (a delegate call is almost as cheap as a direct call)
For smoothness, you only need as many batches per second as your frame rate (about 60 Hz)
Depending on your serial port vendor and settings, you might get considerable delay caused by buffering.
Processing data at "this" rate (500,000 samples per second) could very well be feasible, depending on how much processing you have to do. Given the four orders of magnitude difference between your sample rate and the display update rate, it's worth batching them when doing screen updates, not trying to redraw 500,000 times per second.
No traditional "serial port", even the fancy USB-based ones, can transfer 500,000 samples per second. (Best UART I've ever seen is 3 Mbps = 300,000 bytes per second, and each of your packets is probably more than one byte)
If 2-4 microseconds is the bit time, then you have nothing to worry about. The serial port will buffer hundreds of bits before generating an event to your program.
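For illustration, one way to do the batching the answer describes: let the DataReceived handler only accumulate bytes, and push a batch to the display control from a UI timer at roughly the frame rate. This is a minimal sketch, not the asker's code; the port settings, the field names and the displayControl.AppendSamples method are assumptions.

    // Sketch only: accumulate serial data as it arrives, hand it to the
    // UserControl ~60 times per second instead of once per sample.
    private readonly SerialPort port = new SerialPort("COM3", 115200); // assumed settings
    private readonly List<byte> pending = new List<byte>();
    private readonly object sync = new object();
    private readonly Timer uiTimer = new Timer();                      // System.Windows.Forms.Timer

    private void Form1_Load(object sender, EventArgs e)
    {
        port.DataReceived += (s, args) =>
        {
            byte[] chunk = new byte[port.BytesToRead];
            port.Read(chunk, 0, chunk.Length);
            lock (sync) pending.AddRange(chunk);                       // cheap: just buffer
        };
        port.Open();

        uiTimer.Interval = 16;                                         // ~60 updates per second
        uiTimer.Tick += (s, args) =>
        {
            byte[] batch;
            lock (sync) { batch = pending.ToArray(); pending.Clear(); }
            if (batch.Length > 0)
                displayControl.AppendSamples(batch);                   // hypothetical UserControl method
        };
        uiTimer.Start();
    }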

Related

How to measure the volume of data actually used up by a socket?

I need to measure as precisely as possible how much of the cell service provider's data limit my application uses up.
Is it possible to get the volume of data transferred by a .Net UDP Socket over the network interface (including overhead of UDP and IP)?
The application is a server communicating with a great number of embedded devices, each of which is connected to the internet using GPRS with a very low data limit (several megabytes per month at best, so even a few bytes here and there matter). I know the devices don't open connections with any other servers, so measuring the traffic server-side should be enough.
I know I can't get a 100% accurate number (I have no idea exactly what traffic the service provider charges for), but I would like to get as close as possible.
Assuming this is IPv4, you could add 28 bytes to every datagram you transfer, but your problem is going to be detecting packet loss and, potentially, fragmentation. You could add some metadata to your communication to detect packet loss (e.g. sequence numbers, acknowledgments and so on), but that would of course add more overhead, which you might not want; working with a percentage of expected packet loss could help instead. As for fragmentation, you could again compensate when the size of your message is greater than the MTU (which I believe could be quite small, perhaps around 296 bytes, though I'm not sure; check with your mobile provider).
Another, somewhat non-intrusive option could be reading the network performance counters of your process, or restricting your communication to a separate AppDomain.
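As a rough illustration of that bookkeeping (20 bytes of IPv4 header plus 8 bytes of UDP header per datagram), here is a sketch; SendCounted and the counter field are made-up names, and fragmentation and application-level retransmissions are ignored:

    // Count payload bytes plus 28 bytes of IPv4 + UDP header per datagram.
    private static long bytesOnTheWire;
    private const int Ipv4UdpOverhead = 28; // 20-byte IPv4 header + 8-byte UDP header

    static void SendCounted(UdpClient client, byte[] payload, IPEndPoint target)
    {
        client.Send(payload, payload.Length, target);
        Interlocked.Add(ref bytesOnTheWire, payload.Length + Ipv4UdpOverhead);
    }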

How to ensure a period between two messages over a Serial port

The documentation for the Serial Port states that:
The DataReceived event is not guaranteed to be raised for every byte
received. Use the BytesToRead property to determine how much data is
left to be read in the buffer.
The protocol we are trying to implement separates messages by idle periods. Since we have to rely on the arrival time of each character, this restriction in .NET seems to be a problem.
Does anyone know how .NET's SerialPort decides whether or not to raise an event? Does it buffer received bytes to avoid event spamming at high baud rates?
Is there any guarantee that at least one event will be raised every XY milliseconds? What is that minimal period, if any?
How to approach this problem?
EDIT: A little more research shows that it can be done by setting the timeouts. Stupid me!
This is not a good plan for a protocol unless the periods between messages are at least a couple of seconds. Windows does not provide any kind of service guarantee for code that runs in user mode; it is not a real-time operating system. Your code will fail when the machine gets heavily loaded and your code gets pre-empted by other threads that run at a higher priority, like kernel threads. Delays of hundreds of milliseconds are common, and several seconds is certainly possible, especially when your code got paged out and the paging file is fragmented. Very hard to troubleshoot, because it reproduces very poorly.
The alternative is simple, just use a frame around the message so you can reliably detect the start and the end of a message. Two bytes will do, STX and ETX are popular choices. Add a length byte if the end-of-message byte can also appear in the data.
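A minimal sketch of that kind of framing on the receiving side; the STX/ETX values and the OnBytesReceived/HandleMessage names are illustrative, and it assumes ETX cannot appear inside the payload (otherwise add the length byte as suggested):

    const byte STX = 0x02, ETX = 0x03;
    List<byte> message = new List<byte>();
    bool inMessage = false;

    void OnBytesReceived(byte[] chunk)
    {
        foreach (byte b in chunk)
        {
            if (b == STX) { message.Clear(); inMessage = true; }        // start of frame
            else if (b == ETX && inMessage)                             // end of frame
            {
                HandleMessage(message.ToArray());                       // hypothetical handler
                inMessage = false;
            }
            else if (inMessage) { message.Add(b); }                     // payload byte
        }
    }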

Threaded Communication and Object Overhead

Denizens of Stack Overflow, I require your knowledge. I am working on a video processing system utilizing WPF and basic C# .NET socket communications. Each client transmits video data at 30 frames per second to a server across a LAN environment for processing. Some of the processing is handled by each of the clients to mitigate server load.
I have learned programming in an environment in which hardware limitations have never been a concern. Video changes that; "Hello World" did not prepare me for this, to say the least. Implementation of either of these two prospective methods is not a serious issue. Determination of which I should devote my time and energy to is where I require assistance.
I have two options (but open to suggestions!) assuming hardware limits the clients from producing as close to real time results as possible:
--Queued Client--
Client processes a queue of video frames. Each frame is processed and then sent via TCP packets to the server for further analysis. This system only processes a single frame at a time, in order of sensor capture, and transmits it to the server via a static socket client. *This system fails to take advantage of modern multi-core hardware.
--Threaded Client--
The client utilizes threaded (background worker) processing and transmission of each frame to the server. Each new frame triggers a new processing thread as well as the instantiation of a new network communication class. *This system utilizes modern hardware but may produce serious timing concerns.
To the heart of my inquiry, does threaded communication produce mostly-in-order communication? I already plan to synch video frames between the clients on the server end... but will data delivery be so far out of order as to create a new problem? Recall that this is communication across a local network.
More importantly, will instantiating a new socket communication class as well as a new (simple) video processing class create enough overhead that each frame should NOT be queued or processed in parallel?
The code is just starting to take shape. Hardware of the client systems is unknown and as such their performance cannot be determined. How would you proceed with development?
I am a college student. As such any input assists me in the design of my first real world application of my knowledge.
'To the heart of my inquiry, does threaded communication produce mostly-in-order communication?' No, not in general. If video frames are processed concurrently, then some mechanism to maintain end-to-end order is often required (e.g. sequence numbers), together with a suitable protocol and sufficient buffering to reassemble and maintain a valid sequence of frames at the server (with the correct timing and synchronization, if display is required instead of, or as well as, streaming to a disk file).
Video usually requires every trick available to optimize performance: pools of frame objects (optimally allocated to avoid false sharing) to avoid garbage collection, thread pools for image processing, etc.
'will data delivery be so far out of order as to create a new problem?' - quite possibly. If you don't specify and apply a suitable minimum network speed and availability, some streams may stretch the available buffering to the point where frames have to be dropped, duplicated or interpolated to maintain synchronization. Doing that effectively is part of the fun of video protocols :)
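For example, the sequence-number idea could look roughly like this; Tag, OnPacket and HandleInOrder are made-up names and the 4-byte prefix is an assumption, not part of the original design:

    // Client side: prefix each frame with an incrementing sequence number.
    int nextToSend;
    byte[] Tag(byte[] frame)
    {
        byte[] packet = new byte[frame.Length + 4];
        BitConverter.GetBytes(nextToSend++).CopyTo(packet, 0);
        frame.CopyTo(packet, 4);
        return packet;
    }

    // Server side: buffer out-of-order packets and release them in sequence.
    readonly SortedDictionary<int, byte[]> reorder = new SortedDictionary<int, byte[]>();
    int nextExpected;
    void OnPacket(byte[] packet)
    {
        int seq = BitConverter.ToInt32(packet, 0);
        reorder[seq] = packet;                     // still carries the 4-byte header
        byte[] next;
        while (reorder.TryGetValue(nextExpected, out next))
        {
            HandleInOrder(next);                   // hand off in order to the sync/display logic
            reorder.Remove(nextExpected++);
        }
    }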
There is only one valid response to "is performance of XXXXXXX sufficient" - try and measure.
In your case you should estimate:
- the network traffic to/from the server
- the number of clients and the number of units of work per unit of time the clients send (i.e. the total number of frames per second, in your case)
- how long processing of one unit of work will take
When you have the estimates, see if they look reasonable (e.g. 10 Tb/s of incoming data can't be handled by any normal machine, while 100 Mb/s may work on a normal 1 Gb network).
Then build the most basic version of the system possible (e.g. use ASP.NET to build a single-page site and post files to it at the required speed; for "processing" use Thread.Sleep) and observe/measure the results.
As for your "will creation of an object be slow" question - it is extremely unlikely to matter in your case, as you plan to send a huge amount of data over the network. But it is extremely easy to try yourself - Stopwatch + new MyObject() will show you detailed timing.
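For instance, a rough timing loop in that spirit; MyObject stands in for your socket/processing class and the iteration count is arbitrary:

    // Crude micro-benchmark: average construction cost over many iterations.
    var sw = System.Diagnostics.Stopwatch.StartNew();
    const int iterations = 100000;
    for (int i = 0; i < iterations; i++)
    {
        var obj = new MyObject();   // placeholder for your own class
    }
    sw.Stop();
    Console.WriteLine((sw.Elapsed.TotalMilliseconds / iterations) + " ms per construction");

At 30 frames per second per client, anything in the microsecond range per construction will be lost in the noise next to the network transfer itself.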

Highest Performance for Cross AppDomain Signaling

My performance sensitive application uses MemoryMappedFiles for pushing bulk data between many AppDomains. I need the fastest mechanism to signal a receiving AD that there is new data to be read.
The design looks like this:
AD 1: Writer to MMF, when data is written it should notify the reader ADs
AD 2,3,N..: Reader of MMF
The readers do not need to know how much data is written, because each message written will start with a non-zero int and the reader will read until it hits a zero; don't worry about partially written messages.
(I think) Traditionally, within a single AD, Monitor.Wait/Pulse could be used for this, but I do not think it works across AppDomains.
A MarshalByRefObject remoting method or event can also be used, but I would like something faster. (I benchmarked 1,000,000 MarshalByRefObject calls/sec on my machine; not bad, but I want more.)
A named EventWaitHandle is about twice as fast from initial measurements.
Is there anything faster?
Note: The receiving ADs do not need to get every signal as long as the last signal is not dropped.
A thread context switch costs between 2000 and 10,000 machine cycles on Windows. If you want more than a million per second then you are going to have to solve the Great Silicon Speed Bottleneck. You are already on the very low end of the overhead.
Focus on switching less often and collecting more data in one whack. Nothing needs to switch on a microsecond timescale.
The named EventWaitHandle is the way to go for a one-way signal (for lowest latency). From my measurements it is about 2x faster than a cross-AppDomain method call. The method call performance is very impressive in the latest version of the CLR to date (4), and it should make the most sense for the large majority of cases, since it's possible to pass some information in the method call (in my case, how much data to read).
If it's OK to continuously burn a thread on the receiving end, and performance is that critical, a tight loop may be faster.
I hope Microsoft continues to improve the cross appdomain functionality as it can really help with application reliability and plugin-ins.
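A minimal sketch of the named EventWaitHandle approach described above; the event name, ReadAllPendingMessages and the single-reader setup are assumptions:

    // Writer AppDomain: publish to the MMF, then signal.
    var signal = new EventWaitHandle(false, EventResetMode.AutoReset, "Local\\MmfDataReady");
    // ... write the message (non-zero ints terminated by zero) to the MMF ...
    signal.Set();

    // Reader AppDomain: wait, then drain everything available. With several
    // readers on one auto-reset event only one wakes per Set, so in practice
    // you would use one named event per reader (or a manual-reset event).
    var wait = new EventWaitHandle(false, EventResetMode.AutoReset, "Local\\MmfDataReady");
    while (true)
    {
        wait.WaitOne();
        ReadAllPendingMessages();   // read until the zero terminator, as described
    }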

Fixing gaps in streamed USB data

We have a hardware system with some FPGA's and an FTDI USB controller. The hardware streams data over USB bulk transfer to the PC at around 5MB/s and the software is tasked with staying in sync, checking the CRC and writing the data to file.
The FTDI chip has a 'busy' pin which goes high while it's waiting for the PC to do its business. There is a limited amount of buffering in the FTDI chip and elsewhere on the hardware.
The busy line is going high for longer than the hardware can buffer (50-100ms) so we are losing data. To save us from having to re-design the hardware I have been asked to 'fix' this issue!
I think my code is quick enough as we've had it running up to 15MB/s, so that leaves an IO bottleneck somewhere. Are we just expecting too much from the PC/OS?
Here is my data entry point. Occasionally we get a dropped bit or byte. If the checksum doesn't compute, I shift through until it does. byte[] data is nearly always 4k.
void ftdi_OnData(byte[] data)
{
    List<byte> buffer = new List<byte>(data.Length);
    int index = 0;
    while ((index + rawFile.Header.PacketLength + 1) < data.Length)
    {
        if (CheckSum.CRC16(data, index, rawFile.Header.PacketLength + 2)) // packet length + 2 for the 16-bit checksum
        {
            buffer.AddRange(data.SubArray<byte>(index, rawFile.Header.PacketLength));
            index += rawFile.Header.PacketLength + 2; // skip past the packet and the two checksum bytes; we don't want to save them
        }
        else
        {
            index++; // shift through until the checksum computes
        }
    }
    rawFile.AddData(buffer.ToArray(), 0, buffer.Count);
}
Tip: do not write to a file.... queue.
Modern computers have multiple processors. If you want certain things as fast as possible, use multiple processors.
Have one thread deal with the USB data, check checksums, etc. It queues (ONLY) the results into a thread-safe queue.
Another thread reads data from the queue and writes it to a file, possibly buffered.
Finished ;)
100 ms is a lot of time for decent operations. I have successfully managed around 250,000 IO data packets per second (financial data) in C# without breaking a sweat.
Basically, make sure your IO threads do ONLY that and use main memory as a buffer. Especially when dealing with hardware on one end, the thread doing that should do ONLY that, possibly running at high priority if needed.
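A minimal sketch of that split, using a BlockingCollection between the FTDI callback and a dedicated writer thread; ExtractValidPackets (standing in for the CRC loop from the question), the file name and the buffer size are assumptions:

    readonly BlockingCollection<byte[]> writeQueue = new BlockingCollection<byte[]>();

    void ftdi_OnData(byte[] data)
    {
        byte[] payload = ExtractValidPackets(data); // the checksum/shift loop from above
        writeQueue.Add(payload);                    // returns immediately, no file IO here
    }

    void WriterLoop() // started once on a background thread
    {
        using (var file = new FileStream("capture.raw", FileMode.Create,
                                         FileAccess.Write, FileShare.None,
                                         1 << 20))  // 1 MB write buffer
        {
            foreach (byte[] chunk in writeQueue.GetConsumingEnumerable())
                file.Write(chunk, 0, chunk.Length);
        }
    }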
To get good read throughput on Windows on USB, you generally need to have multiple asynchronous reads (or very large reads, which is often less convenient) queued onto the USB device stack. I'm not quite sure what the FTDI drivers / libraries do internally in this regard.
Traditionally I have written mechanisms with an array of OVERLAPPED structures and an array of buffers, and kept shovelling them into ReadFile as soon as they're free. I was doing 40+ MB/s reads on USB 2 like this about 5-6 years ago, so modern PCs should certainly be able to cope.
It's very important that you (or your drivers/libraries) don't get into a "start a read, finish a read, deal with the data, start another read" cycle, because you'll find that the bus is idle for vast swathes of time. A USB analyser would show you if this was happening.
I agree with the others that you should get off the thread the read is happening on as soon as possible - don't block the FTDI event handler for any longer than it takes to put the buffer into another queue.
I'd preallocate a circular queue of buffers, pick the next free one and throw the received data into it, then complete the event handling as quickly as possible.
All that checksumming and concatenation with its attendant memory allocation, garbage collection, etc, can be done the other side of potentially 100s of MB of buffer time/space on the PC. At the moment you may well be effectively asking your FPGA/hardware buffer to accommodate the time taken for you to do all sorts of ponderous PC stuff which can be done much later.
I'm optimistic though - if you can really buffer 100ms of data on the hardware, you should be able to get this working reliably. I wish I could persuade all my clients to allow so much...
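A minimal sketch of such a preallocated pool, assuming fixed 4 kB buffers and made-up Rent/Return names:

    // Preallocate buffers up front so the receive path never allocates.
    readonly ConcurrentQueue<byte[]> pool = new ConcurrentQueue<byte[]>();

    void InitPool(int count, int size)
    {
        for (int i = 0; i < count; i++)
            pool.Enqueue(new byte[size]);           // e.g. 256 buffers of 4096 bytes
    }

    byte[] Rent()
    {
        byte[] buffer;
        return pool.TryDequeue(out buffer) ? buffer : new byte[4096]; // grow only if drained
    }

    void Return(byte[] buffer)
    {
        pool.Enqueue(buffer);                       // hand back once the writer is done with it
    }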
So what does your receiving code look like? Do you have a thread running at high priority responsible solely for capturing the data and passing it in memory to another thread in a non-blocking fashion? Do you run the process itself at an elevated priority?
Have you designed the rest of your code to avoid the more expensive gen-2 garbage collections? How large are your buffers, and are they on the large object heap? Do you reuse them efficiently?
