C# and Siemens PLC communication

Does anyone know in which step of the PLC cycle a C# read-data command is serviced?
The PLC process steps are:
The operating system starts the scan cycle monitoring time.
The CPU writes the values from the process-image output table in the output modules.
The CPU reads out the status of the inputs at the input modules and updates the process-image input table.
The CPU processes the user program in time slices and performs the operations specified in the program.
At the end of a cycle, the operating system executes pending tasks, such as the loading and clearing of blocks.
The CPU then goes back to the beginning of the cycle after the configured minimum cycle time, as necessary, and starts cycle time monitoring again.
My purpose is to find out how a C# application can affect the PLC CPU scan cycle time.

It really depends on how you read values from the PLC, but - in general - it's irrelevant: whenever you read, you get the value stored in PLC memory at that time.
From my experience, client applications connected to PLCs have no measurable effect on scan cycle time. By the way, I highly recommend using OPC UA subscriptions to maximize read/write efficiency and let the PLC firmware manage the communication tasks internally.
A more detailed answer is probably possible with additional details (PLC type, library used for the connection / data read-write, etc.).
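For illustration, a minimal sketch of such a subscription, assuming the OPC Foundation .NET client stack (Opc.Ua.Client) and an already-connected Session; the node id, intervals and tag name here are placeholders, not anything from the question:

using System;
using Opc.Ua;
using Opc.Ua.Client;

static class PlcSubscriptionSketch
{
    // Assumes 'session' is an already-connected Opc.Ua.Client.Session.
    public static void SubscribeToTag(Session session)
    {
        var subscription = new Subscription(session.DefaultSubscription)
        {
            PublishingInterval = 500        // ms; the server pushes changes, the client does no cyclic reads
        };

        var item = new MonitoredItem(subscription.DefaultItem)
        {
            StartNodeId = "ns=3;s=\"DataBlock\".\"MyTag\"",   // placeholder node id
            AttributeId = Attributes.Value,
            SamplingInterval = 250          // ms, sampled on the server/PLC side
        };

        item.Notification += (monitoredItem, e) =>
        {
            foreach (var value in monitoredItem.DequeueValues())
                Console.WriteLine("{0} = {1} @ {2}", monitoredItem.DisplayName, value.Value, value.SourceTimestamp);
        };

        subscription.AddItem(item);
        session.AddSubscription(subscription);
        subscription.Create();              // from here on, value changes arrive via the Notification callback
    }
}

The point of the subscription is that the cyclic sampling stays on the server/PLC side, so the C# client is not issuing its own polling reads.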

Related

How quickly do events fire?

I am developing an application to work alongside a hardware configuration.
The hardware connects to the application through a serial port in the application, and will be sending data from the hardware to the application at a fast rate (~2-4 μs).
My plan is to receive the data via serial port in the parent form, and then send this data to a User Control that is dedicated to displaying it, via an event.
While I've had no problem with directly transmitting it to a single form, I'm unfamiliar with events, and am not sure if the event will fire fast enough to smoothly handle the data.
My questions are:
How fast can an event fire? Is it based off hardware, or are there software limitations?
What are some drawbacks to handling data with this method?
Are there better alternatives to passing data via an event?
Is it at all feasible to process data at this rate?
Events are plenty fast (a delegate call is almost as cheap as a direct call)
For smoothness, you only need as many batches per second as your frame rate (about 60 Hz)
Depending on your serial port vendor and settings, you might get considerable delay caused by buffering.
Processing data at "this" rate (500,000 samples per second) could very well be feasible, depending on how much processing you have to do. Given the four orders of magnitude difference between your sample rate and the display update rate, it's worth batching them when doing screen updates, not trying to redraw 500,000 times per second.
No traditional "serial port", even the fancy USB-based ones, can transfer 500,000 samples per second. (The best UART I've ever seen runs at 3 Mbps, which is about 300,000 bytes per second once start and stop bits are accounted for, and each of your packets is probably more than one byte.)
If 2-4 microseconds is the bit time, then you have nothing to worry about. The serial port will buffer hundreds of bits before generating an event to your program.
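To illustrate the batching point above, a rough sketch (port name, baud rate and timer interval are made up): the DataReceived handler only queues raw chunks, and a UI-rate timer drains whatever has accumulated since the last tick:

using System;
using System.Collections.Concurrent;
using System.IO.Ports;

class SerialBatcher
{
    private readonly SerialPort _port = new SerialPort("COM3", 115200);          // placeholder port/baud
    private readonly ConcurrentQueue<byte[]> _pending = new ConcurrentQueue<byte[]>();

    public void Start()
    {
        _port.DataReceived += OnDataReceived;    // raised on a worker thread, not the UI thread
        _port.Open();
    }

    private void OnDataReceived(object sender, SerialDataReceivedEventArgs e)
    {
        int count = _port.BytesToRead;
        byte[] chunk = new byte[count];
        _port.Read(chunk, 0, count);
        _pending.Enqueue(chunk);                 // just queue; no parsing or drawing here
    }

    // Call this from a UI timer ticking at roughly the display rate (~60 Hz),
    // e.g. a System.Windows.Forms.Timer with Interval = 16.
    public void DrainToDisplay(Action<byte[]> display)
    {
        byte[] chunk;
        while (_pending.TryDequeue(out chunk))
            display(chunk);                      // one screen update per batch, not per sample
    }
}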

CPU underutilized. Due to blocking I/O?

I am trying to find where the bottleneck lies in a C# server application that underutilizes the CPU. I think this may be due to poor disk I/O performance and have nothing to do with the application itself, but I am having trouble turning that supposition into fact.
The application reads messages from a local MSMQ queue, does some processing on each message and, after processing, sends a response message to another local MSMQ queue.
I am using an async loop to read messages from the queue, dequeuing them as fast as possible and dispatching them for processing with Task.Run (I do not await this Task.Run; I just attach an OnlyOnFaulted continuation to it to log errors). Each message is processed concurrently, i.e. there is no need to wait for a message to be fully processed before processing the next one.
At the end of the processing of a message, I use the Send method of MessageQueue (somewhat asynchronous, but not really, because it has to wait on the disk write before returning; see System.Messaging - why MessageQueue does not offer an asynchronous version of Send).
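For reference, a stripped-down sketch of the loop described above (the queue paths, the Process method and the error logging are placeholders); it wraps BeginReceive/EndReceive in a Task and fires off the processing without awaiting it:

using System;
using System.Messaging;
using System.Threading.Tasks;

class MessagePump
{
    private readonly MessageQueue _input = new MessageQueue(@".\private$\requests");    // placeholder path
    private readonly MessageQueue _output = new MessageQueue(@".\private$\responses");  // placeholder path

    public async Task RunAsync()
    {
        while (true)
        {
            // Asynchronously wait for the next message (APM pattern wrapped into a Task).
            Message msg = await Task.Factory.FromAsync(
                (callback, state) => _input.BeginReceive(MessageQueue.InfiniteTimeout, state, callback),
                _input.EndReceive,
                null);

            // Dispatch the processing without awaiting it, so the loop keeps dequeuing.
            Task.Run(() =>
                {
                    Message response = Process(msg);     // CPU-bound work
                    _output.Send(response);              // blocks on the disk write, as noted above
                })
                .ContinueWith(t => Console.Error.WriteLine(t.Exception),
                              TaskContinuationOptions.OnlyOnFaulted);
        }
    }

    private Message Process(Message msg)
    {
        return msg;                                      // placeholder for the real processing
    }
}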
For the benchmarks I queue 100K messages (approx. 100 MB total size) and then launch the program. On two of my personal computers (an SSD on one and a SATA2 HDD on the other, both quad-core i7 CPUs with 8 logical processors) I reach ~95% CPU usage for the duration of the program lifecycle (dequeuing the 100K messages, processing them and sending responses). Messages are dequeued as fast as possible, processed as fast as possible (this is where the CPU is involved) and the response for each message is then sent to a different local queue.
Now, on a virtual machine running a non-HT dual-core CPU (I have no idea what the underlying disk is, but it seems far less performant than mine: during the benchmark, Perfmon shows avg. disk sec/write around 10-15 ms on this VM, whereas it is around 2 ms on my personal machines), running the same benchmark I only reach ~55% CPU (and when I run the same benchmark on that machine without sending response messages to the queue, I reach ~90% CPU).
I don't really understand what the problem is here. It seems clear that sending messages to the queue slows down the overall processing of the program (and the dequeuing of the messages to be processed), but why would that be, considering that I use Task.Run to launch the processing of each dequeued message and, ultimately, the sending of the response? I would not expect the CPU to be underutilized. Unless a thread sending a message blocks other threads from running on the same core while it waits for the return (the disk write), which might make sense given that the latency is much higher than on my personal computers; but a thread waiting for I/O should not prevent other threads from running.
I am really trying to understand why I am not reaching at least 95% CPU usage on this machine. I am blindly blaming poorer disk I/O performance, but I still don't see why it would lead to CPU underutilization, considering that I run the processing concurrently using Task.Run. It could also be some system problem completely unrelated to the disk, but given that MessageQueue.Send seems to be the problem, and that this method ultimately writes messages to a memory-mapped file plus the disk, I don't see where the performance issue could come from other than the disk.
It is of course a system performance issue, since the program maxes out CPU usage on my own computers, but I need to find out what exactly the bottleneck is on the VM, and why exactly it affects the concurrency / speed of my application.
Any idea?
To examine poor disk and/or CPU utilization there is only one tool: the Windows Performance Toolkit. For an example of how to use it, see here.
You should get the latest one from the Windows 8.1 SDK (requires .NET 4.5.1), which gives you the most capabilities, but the one from the Windows 8 SDK is also fine.
There you get the graphs % CPU Utilization and % Disk Utilization. If either one is at 100% while the other one is low, you have found the bottleneck. Since it is a system-wide profiler, you can check whether the MSMQ service is using the disk badly, or whether it is you or someone else (e.g. a virus scanner is a common issue).
You can go directly to your call stacks and check which process and thread woke up your worker thread that is supposed to run at full speed. Then you can jump to the readying thread and process and check what it was doing before it could ready your thread. That way you can see directly what was holding it up for so long.
No more guessing. You can really see what the system is doing.
To analyze further, enable the following columns in the CPU Usage (Precise) view:
NewProcess
NewThreadId
NewThreadStack(Frame Tags)
ReadyingProcess
ReadyingThreadId
Ready(us) Sum
Wait(us) Sum
Wait(us)
%CPU Usage
Then drill down into a call stack in your process to see where high Wait(us) times occur in a thread that is supposed to run at full speed. You can drill down to a single event until you can go no further. Then you will see values in ReadyingProcess and ReadyingThreadId. Go to that process/thread (it can be your own) and repeat the process until you end up in some blocking operation which involves disk I/O, a sleep, or a long-running device driver call (e.g. a virus scanner or the VM driver).
If the disk I/O performance counters don't look abnormally high, I'd look next at the hypervisor level. Assuming you're running the exact same code, using a VM adds latency to the entire stack (CPU, RAM, disk). You can perhaps tweak CPU scheduling at the hypervisor level and see whether this increases CPU utilization.
I'd also consider using a RAMDisk temporarily for performance testing. This would eliminate the Disk/SAN latency and you can see if that fixes your problem.

Threaded Communication and Object Overhead

Denizens of Stack Overflow, I require your knowledge. I am working on a video processing system utilizing WPF and basic C# .NET socket communications. Each client transmits video data at 30 frames per second to a server across a LAN environment for processing. Some of the processing is handled by each of the clients to mitigate server load.
I have learned programming in an environment in which hardware limitations have never been a concern. Video changes that. "Hello World" did not prepare me for this, to say the least. Implementing either of these two prospective methods is not a serious issue; determining which one I should devote my time and energy to is where I require assistance.
I have two options (but I am open to suggestions!), assuming the hardware limits how close to real-time results the clients can produce:
--Queued Client--
Client processes a queue of video frames. Each frame is processed and then sent via TCP packets to the server for further analysis. This system only processes a single frame at a time, in order of sensor capture, and transmits it to the server via a static socket client. *This system fails to take advantage of modern multi-core hardware.
--Threaded Client--
The client utilizes threaded (background worker) processing and transmission of each frame to the server. Each new frame triggers a new processing thread as well as the instantiation of a new network communication class. *This system utilizes modern hardware but may produce serious timing concerns.
To the heart of my inquiry, does threaded communication produce mostly-in-order communication? I already plan to sync video frames between the clients on the server end... but will data delivery be so far out of order as to create a new problem? Recall that this is communication across a local network.
More importantly, will instantiating a new socket communication class as well as a new (simple) video processing class create enough overhead that each frame should NOT be queued or processed in parallel?
The code is just starting to take shape. Hardware of the client systems is unknown and as such their performance cannot be determined. How would you proceed with development?
I am a college student. As such any input assists me in the design of my first real world application of my knowledge.
'To the heart of my inquiry, does threaded communication produce mostly-in-order communication?' No, not in general. If video frames are processed concurrently, then some mechanism to maintain end-to-end order is often required (e.g. sequence numbers), together with a suitable protocol and sufficient buffering to reassemble and maintain a valid sequence of frames at the server (with the correct timing and synchronization, if display is required instead of, or as well as, streaming to a disk file).
Video usually requires every trick available to optimize performance: pools of frame objects (optimally allocated to avoid false sharing) to avoid garbage collection, thread pools for image processing, etc.
'Will data delivery be so far out of order as to create a new problem?' Quite possibly. If you don't specify and enforce a suitable minimum network speed and availability, some streams may stretch the available buffering to the point where frames have to be dropped, duplicated or interpolated to maintain synchronization. Doing that effectively is part of the fun of video protocols :)
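A rough illustration of that sequence-number idea (the types, and the policy of simply holding frames until the gap fills, are made up for the sketch): the client stamps each frame before transmission, and the server releases frames only in order:

using System.Collections.Generic;

class Frame
{
    public long Sequence;      // stamped by the client before transmission
    public byte[] Payload;
}

class FrameReorderBuffer
{
    private readonly SortedDictionary<long, Frame> _pending = new SortedDictionary<long, Frame>();
    private long _nextExpected;

    // Called on the server whenever a frame arrives, possibly out of order.
    // Returns every frame that has become contiguous with what was already released.
    public List<Frame> Add(Frame frame)
    {
        if (frame.Sequence >= _nextExpected)
            _pending[frame.Sequence] = frame;        // stale duplicates are simply ignored

        var released = new List<Frame>();
        Frame next;
        while (_pending.TryGetValue(_nextExpected, out next))
        {
            _pending.Remove(_nextExpected);
            _nextExpected++;
            released.Add(next);
        }
        return released;
    }
}

In a real protocol you would also time out on a missing sequence number and drop or interpolate, as described above, rather than hold later frames forever.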
There is only one valid response to "is performance of XXXXXXX sufficient" - try and measure.
In your case you should estimate
network traffic to/from the server;
the number of clients and the number of units of work per unit of time they send (i.e. the total number of frames per second, in your case);
how long the processing of one unit of work will take.
When you have estimated the requirements, see whether they look reasonable (e.g. 10 Tb/s of incoming data can't be handled by any normal machine, while 100 Mb/s may work on a normal 1 Gb network).
Then build the most basic version of the system possible (i.e. use ASP.NET to build a single-page site and post files to it at the required speed; for "processing" use Thread.Sleep) and observe/measure the results.
As for your 'will creation of an object be slow' question: it is extremely unlikely to matter in your case, as you plan to send a huge amount of data over the network. But it is extremely easy to try yourself - Stopwatch + new MyObject() will show you detailed timing.
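For the Stopwatch check suggested above, something along these lines is enough (MyObject stands in for whichever class you are worried about instantiating):

using System;
using System.Diagnostics;

class AllocationTiming
{
    class MyObject { }                          // stand-in for the class under test

    static void Main()
    {
        const int iterations = 1000000;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            var obj = new MyObject();           // the instantiation being measured
        }
        sw.Stop();
        Console.WriteLine("{0} allocations in {1} ms", iterations, sw.ElapsedMilliseconds);
    }
}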

Highest Performance for Cross AppDomain Signaling

My performance sensitive application uses MemoryMappedFiles for pushing bulk data between many AppDomains. I need the fastest mechanism to signal a receiving AD that there is new data to be read.
The design looks like this:
AD 1: Writer to MMF, when data is written it should notify the reader ADs
AD 2,3,N..: Reader of MMF
The readers do not need to know how much data is written, because each message written will start with a non-zero int and the reader will read until it hits zero; don't worry about partially written messages.
(I think) traditionally, within a single AD, Monitor.Wait/Pulse could be used for this, but I do not think it works across AppDomains.
A MarshalByRefObject remoting method or event can also be used but I would like something faster. (I benchmark 1,000,000 MarshalByRefObject calls/sec on my machine, not bad but I want more)
A named EventWaitHandle is about twice as fast from initial measurements.
Is there anything faster?
Note: The receiving ADs do not need to get every signal as long as the last signal is not dropped.
A thread context switch costs between 2000 and 10,000 machine cycles on Windows. If you want more than a million per second then you are going to have to solve the Great Silicon Speed Bottleneck. You are already on the very low end of the overhead.
Focus on switching less often and collecting more data in one whack. Nothing needs to switch at microsecond granularity.
The named EventWaitHandle is the way to go for a one-way signal (for lowest latency); from my measurements it is about 2x faster than a cross-AppDomain method call. Method call performance is very impressive in the latest version of the CLR to date (4) and should make the most sense for the large majority of cases, since it's possible to pass some information in the method call (in my case, how much data to read).
If it's OK to continuously burn a thread on the receiving end, and performance is that critical, a tight loop may be faster.
I hope Microsoft continues to improve the cross-AppDomain functionality, as it can really help with application reliability and plug-ins.
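A bare-bones sketch of that named EventWaitHandle signal (the event name is a placeholder; error handling and the MMF access itself are omitted). Named kernel events are visible from every AppDomain, which is what makes this work:

using System;
using System.Threading;

static class CrossDomainSignal
{
    private static readonly EventWaitHandle Signal =
        new EventWaitHandle(false, EventResetMode.AutoReset, "MmfDataReady");   // placeholder name

    // Writer side (AD 1): call once the message is in the memory-mapped file.
    public static void NotifyReaders()
    {
        Signal.Set();
    }

    // Reader side (AD 2..N): block until signalled, then read until the zero terminator.
    // Note: an AutoReset event releases only one waiter per Set; with several reader
    // AppDomains you would typically use one named event per reader.
    public static void ReaderLoop(Action readFromMmf)
    {
        while (true)
        {
            Signal.WaitOne();
            readFromMmf();
        }
    }
}

This also fits the requirement above that readers may miss intermediate signals: repeated Set calls while nobody is waiting coalesce into a single signalled state, so only the latest pending signal is delivered.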

Fixing gaps in streamed USB data

We have a hardware system with some FPGAs and an FTDI USB controller. The hardware streams data over USB bulk transfer to the PC at around 5 MB/s, and the software is tasked with staying in sync, checking the CRC and writing the data to file.
The FTDI chip has a 'busy' pin which goes high while it's waiting for the PC to do its business. There is a limited amount of buffering in the FTDI and elsewhere on the hardware.
The busy line is going high for longer than the hardware can buffer (50-100ms) so we are losing data. To save us from having to re-design the hardware I have been asked to 'fix' this issue!
I think my code is quick enough as we've had it running up to 15MB/s, so that leaves an IO bottleneck somewhere. Are we just expecting too much from the PC/OS?
Here is my data entry point. Occasionally we get a dropped bit or byte. If the checksum doesn't compute, I shift through until it does. byte[] data is nearly always 4k.
void ftdi_OnData(byte[] data)
{
    List<byte> buffer = new List<byte>(data.Length);
    int index = 0;
    while ((index + rawFile.Header.PacketLength + 1) < data.Length)
    {
        if (CheckSum.CRC16(data, index, rawFile.Header.PacketLength + 2)) // <- packet length + 2 for 16-bit checksum
        {
            buffer.AddRange(data.SubArray<byte>(index, rawFile.Header.PacketLength));
            index += rawFile.Header.PacketLength + 2; // <- skip the two checksum bytes, we don't want to save them...
        }
        else
        {
            index++; // shift through
        }
    }
    rawFile.AddData(buffer.ToArray(), 0, buffer.Count);
}
Tip: do not write to a file... queue it instead.
Modern computers have multiple processors. If you want certain things as fast as possible, use multiple processors.
Have one thread deal with the USB data, check checksums etc. It queues (ONLY) the results into a thread-safe queue.
Another thread reads data from the queue and writes it to a file, possibly buffered.
Finished ;)
100 ms is a lot of time for decent operations. I have successfully handled around 250,000 I/O data packets per second (financial data) using C# without breaking a sweat.
Basically, make sure your I/O threads do ONLY that and use your internal memory as a buffer. Especially when dealing with hardware on one end, the thread doing that should ONLY do that, possibly running at high priority if needed.
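A minimal sketch of that split, using a BlockingCollection as the thread-safe queue (the file path, capacity and buffer sizes are placeholders):

using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

class UsbToFilePipeline
{
    // Bounded so a slow disk shows up as back-pressure instead of unbounded memory growth.
    private readonly BlockingCollection<byte[]> _writeQueue = new BlockingCollection<byte[]>(10000);

    // Called from the USB/FTDI thread: checksum already verified, queue ONLY - no file I/O here.
    public void Enqueue(byte[] payload)
    {
        _writeQueue.Add(payload);
    }

    // Dedicated writer: drains the queue and writes through a large buffered FileStream.
    public Task StartWriter(string path)
    {
        return Task.Factory.StartNew(() =>
        {
            using (var file = new FileStream(path, FileMode.Create, FileAccess.Write,
                                             FileShare.None, 1 << 20))
            {
                foreach (byte[] chunk in _writeQueue.GetConsumingEnumerable())
                    file.Write(chunk, 0, chunk.Length);
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Call when capture stops so the writer can drain the queue and close the file.
    public void Complete()
    {
        _writeQueue.CompleteAdding();
    }
}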
To get good read throughput on Windows on USB, you generally need to have multiple asynchronous reads (or very large reads, which is often less convenient) queued onto the USB device stack. I'm not quite sure what the FTDI drivers / libraries do internally in this regard.
Traditionally I have written mechanisms with an array of OVERLAPPED structures and an array of buffers, and kept shovelling them into ReadFile as soon as they're free. I was doing 40+ MB/s reads on USB2 like this about 5-6 years ago, so modern PCs should certainly be able to cope.
It's very important that you (or your drivers/libraries) don't get into a "start a read, finish a read, deal with the data, start another read" cycle, because you'll find that the bus is idle for vast swathes of time. A USB analyser would show you if this was happening.
I agree with the others that you should get off the thread that the read is happening on as soon as possible - don't block the FTDI event handler for any longer than it takes to put the buffer into another queue.
I'd preallocate a circular queue of buffers, pick the next free one and throw the received data into it, then complete the event handling as quickly as possible.
All that checksumming and concatenation with its attendant memory allocation, garbage collection, etc, can be done the other side of potentially 100s of MB of buffer time/space on the PC. At the moment you may well be effectively asking your FPGA/hardware buffer to accommodate the time taken for you to do all sorts of ponderous PC stuff which can be done much later.
I'm optimistic though - if you can really buffer 100ms of data on the hardware, you should be able to get this working reliably. I wish I could persuade all my clients to allow so much...
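And one way of keeping the FTDI handler that lean, sketched here as a simple preallocated pool of reusable buffers rather than a strict circular queue (the counts and sizes are guesses, sized comfortably above the ~4 KB callbacks mentioned in the question):

using System;
using System.Collections.Concurrent;

class ReceiveBufferPool
{
    private const int BufferSize = 64 * 1024;                 // larger than any single callback
    private readonly ConcurrentQueue<byte[]> _free = new ConcurrentQueue<byte[]>();
    private readonly ConcurrentQueue<ArraySegment<byte>> _filled = new ConcurrentQueue<ArraySegment<byte>>();

    public ReceiveBufferPool(int bufferCount = 1024)
    {
        for (int i = 0; i < bufferCount; i++)
            _free.Enqueue(new byte[BufferSize]);              // all allocation happens up front
    }

    // FTDI event handler: copy the bytes into a pooled buffer and return immediately.
    public void OnData(byte[] data)
    {
        byte[] buffer;
        if (!_free.TryDequeue(out buffer))
            buffer = new byte[BufferSize];                    // pool exhausted: grow rather than drop data

        Buffer.BlockCopy(data, 0, buffer, 0, data.Length);
        _filled.Enqueue(new ArraySegment<byte>(buffer, 0, data.Length));
    }

    // Worker thread: take a filled segment, do the CRC/strip/write work, then recycle the buffer.
    public bool TryTake(out ArraySegment<byte> segment)
    {
        return _filled.TryDequeue(out segment);
    }

    public void Recycle(ArraySegment<byte> segment)
    {
        _free.Enqueue(segment.Array);
    }
}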
So what does your receiving code look like? Do you have a thread running at high priority responsible solely for capturing the data and passing it in memory to another thread in a non-blocking fashion? Do you run the process itself at an elevated priority?
Have you designed the rest of your code to avoid the more expensive gen-2 garbage collections? How large are your buffers? Are they on the large object heap? Do you reuse them efficiently?
