I am writing an application that uses OpenNETCF.IO.Serial (open source, see serial code here) for its serial communication on a Windows CE 6.0 device. This application is coded in C# in the Compact Framework 2.0. I do not believe the issue I am about to describe is specifically related to these details, but I may be proven to be wrong in that regard.
The issue I am having is that, seemingly randomly (read as: intermittent issue I cannot reliably duplicate yet), data will fail to transmit or be received until the device itself is rebooted. The Windows CE device communicates with a system that runs an entirely different application. Rebooting this other system and disconnecting/reconnecting communication cables does not appear to resolve this issue, only rebooting the Windows CE device.
The only sign of this issue occurring is a lack of a TxDone event from OpenNETCF firing (look for "TxDone();" in OpenNETCF.IO.Serial class), and no data being received, when I know for a fact that the connected system is sending data.
Any character value from 1 - 255 (0x01 - 0xFF) can be sent and received in our serial communication. Null values are discarded.
My serial settings are 38400 Baud, 8 data bits, no parity, 1 stop bit (38400, 8n1). I've set the input and output buffer sizes to 256 bytes. DataReceived event happens whenever we receive 1 or more characters, and transmission occurs when there's 1 or more bytes in the output buffer, since messages are of variable length.
No handshaking is used. Since this is RS422, only 4 signals are being used: RX+, RX-, TX+, TX-.
When I receive a "DataReceived" event, I read all data from the input buffer and copy it into my own buffer so I can parse it at my leisure outside of the DataReceived event. When I receive a command message, I send a quick acknowledgment message back. When the other system receives a command message from the Windows CE device, it will send a quick acknowledgment message back. Acknowledgment messages get no further replies since they're intended as a simple "Yep, got it."
In my code, I receive/transmit through multiple threads, so I use the lock keyword so I'm not transmitting multiple messages simultaneously on multiple threads. Double checking through code has shown that I am not getting hung up on any locks.
At this point, I am wondering if I am continuously missing something obvious about how serial communication works, such as if I need to set some variable or property, rather than just reading from an input buffer when not empty and writing to a transmit buffer.
Any insight, options to check, suggestions, ideas, and so on are welcome. This is something I've been wrestling with on my own for months, I hope that answers or comments I receive here can help in figuring out this issue. Thank you in advance.
Edit, 2/24/2011:
(1) I can only seem to recreate the error on boot-up of the system that the Windows CE device is communicating with, and not on every boot-up. I also looked at the signals: the common-mode voltage fluctuates, but the amplitude of the noise that occurs at system boot-up seems unrelated to whether the issue occurs (I've seen 25V peak-to-peak cause no issue, while at 5V peak-to-peak the issue reoccurred).
The issue keeps sounding more and more hardware-related, but I'm trying to figure out what can cause the symptoms I'm seeing, as none of the hardware actually appears to fail or shut down, at least where I've been able to reach to measure signals. My apologies, but I will not be able to give any part numbers for the hardware, so please don't ask about the components being used.
(2) As per @ctacke's suggestion, I ensured all transmits go through the same location for maintainability. The thread safety I put in is essentially as follows:
lock(transmitLockObj)
{
    try
    {
        comPort.Output = data;
    }
    [various catches and error handling for each]
}
(3) Getting UART OVERRUN errors, in a test where <10 bytes were being sent and received on about a 300msec time interval at 38400 Baud. Once it gets an error, it goes to the next loop iteration, and does NOT run ReadFile, and does NOT run TxDone event (or any other line checking procedures). Also, not only does closing and reopening the port do nothing to resolve this, rebooting the software while the device is still running doesn't do anything, either. Only a hardware reboot.
My DataReceived event is as follows:
try
{
    byte[] input = comPort.Input; //set so Input gets FULL RX buffer
    lock(bufferLockObj)
    {
        for (int i = 0; i < input.Length; i++)
        {
            _rxRawBuffer.Enqueue(input[i]);
            //timer regularly checks this buffer and parses data elsewhere
            //there, it is "lock(bufferLockObj){dataByte = _rxRawBuffer.Dequeue();}"
            //so wait is kept short in DataReceived, while remaining safe
        }
    }
}
catch (Exception exc)
{
    //[exception logging and handling]
    //hasn't gotten here, so no point in showing
}
However, the UART OVERRUN errors started immediately after the WriteFile call timed out for the first time in the test. I honestly can't see my code causing a UART OVERRUN condition.
Thoughts? Hardware or software related, I'm checking everything I can think to check.
Everything sounds right, but your observations kind of show that they're not.
Since you've stated that you're sending from multiple threads, the first thing I'd do is put in some sort of mechanism for sending where all send requests come into one location before calling out to the serial object instance. Sure, you say that you've ensured you have thread safety, but serializing these calls through one location would help reinforce that (and make the code a bit more maintainable/extensible).
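A minimal sketch of that "one location" idea, assuming desktop .NET rather than the Compact Framework 2.0 (it would need adapting there). SerialSender and its write delegate are hypothetical names, not part of OpenNETCF: any thread may call Enqueue(), but only the single worker thread ever touches the port.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper: all send requests funnel through one queue, and a
// single worker thread performs the actual write via the supplied delegate.
public sealed class SerialSender : IDisposable
{
    private readonly Queue<byte[]> _pending = new Queue<byte[]>();
    private readonly object _gate = new object();
    private readonly Action<byte[]> _write;
    private readonly Thread _worker;
    private bool _stopping;

    public SerialSender(Action<byte[]> write)
    {
        _write = write;
        _worker = new Thread(Run);
        _worker.IsBackground = true;
        _worker.Start();
    }

    // Safe to call from any thread; returns immediately.
    public void Enqueue(byte[] message)
    {
        lock (_gate)
        {
            _pending.Enqueue(message);
            Monitor.Pulse(_gate);
        }
    }

    private void Run()
    {
        while (true)
        {
            byte[] next;
            lock (_gate)
            {
                while (_pending.Count == 0 && !_stopping)
                    Monitor.Wait(_gate);
                if (_pending.Count == 0)
                    return; // stopping and fully drained
                next = _pending.Dequeue();
            }
            _write(next); // outside the lock, so callers never wait on I/O
        }
    }

    // Lets the worker drain anything still queued, then stops it.
    public void Dispose()
    {
        lock (_gate)
        {
            _stopping = true;
            Monitor.Pulse(_gate);
        }
        _worker.Join();
    }
}
```

With the code from the question, the delegate would presumably just do the comPort.Output assignment. Error handling inside the delegate is left out here; an exception thrown by it would end the worker thread.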
Next I'd probably add some temp handling in the Serial lib to specifically set an event or break in the debugger when you've done a Tx but the TxDone event doesn't fire within some bounding period. It's always possible that the Serial lib has a bug in it (trust me, the author of that code is far from infallible) where some race condition is getting by.
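Something along these lines could serve as that temporary check. It is only a sketch: TxWatchdog is a hypothetical class, not something that exists in the OpenNETCF Serial library, and it assumes a .NET version where WaitOne(int) is available.

```csharp
using System;
using System.Threading;

// Hypothetical helper: call Arm() right before a transmit, SignalDone() from
// the TxDone handler, and WaitForDone() to learn whether the completion
// arrived within the bounding period.
public sealed class TxWatchdog
{
    private readonly ManualResetEvent _done = new ManualResetEvent(false);
    private readonly int _timeoutMs;

    public TxWatchdog(int timeoutMs)
    {
        _timeoutMs = timeoutMs;
    }

    public void Arm()
    {
        _done.Reset();
    }

    public void SignalDone()
    {
        _done.Set();
    }

    // Returns false when TxDone did not fire within the timeout.
    public bool WaitForDone()
    {
        return _done.WaitOne(_timeoutMs);
    }
}
```

When WaitForDone() returns false, that is the place to log the state of the port or call System.Diagnostics.Debugger.Break() so the stall can be inspected while it is happening.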
Thank you everyone who responded. We've found that this actually appears to be hardware-related. I'm afraid I can't give more information than this, but I thank everyone who contributed possible solutions or troubleshooting steps.
Like many people, I'm having issues with the DataReceived event not firing.
After working with it, I wrapped my handling processes under the BytesToRead count, so if I miss a fire, I can pick up where I left off. Seemed like it would fix all my issues.
The problem is, sometimes it doesn't trigger even once. Depending on the packet being sent back, this could be absolutely critical, forcing me to restart the application and the setup process because it relies on being able to process a response.
Reading through some of the responses to similar questions hasn't gotten me any closer to guaranteeing that the event will fire at minimum requirements. Microsoft mentions the issue with DataReceived not being guaranteed to fire for every byte, but I noticed this above:
The DataReceived event is also raised if an Eof character is received, regardless of the number of bytes in the internal input buffer and the value of the ReceivedBytesThreshold property.
So my question is, can I force an EOF character through my serial connection to force the event to fire? What would this character be, 0x1A?
If I can't force an EOF character through serial, what would my options be? My first thought was maybe create a Task to keep a watch for the event triggering, and if it doesn't trigger, to trigger the actions through the Task.
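For what it's worth, the watcher idea could look roughly like this. It is an untested-on-hardware sketch that uses a System.Threading.Timer instead of a Task, and ReceiveFallback, bytesToRead, and drain are hypothetical names: bytesToRead would wrap something like the port's BytesToRead, and drain would be the same read-and-process routine the DataReceived handler calls.

```csharp
using System;
using System.Threading;

// Hypothetical safety net: if bytes are waiting but DataReceived has been
// silent for 'quietPeriod', run the same drain routine the event would run.
public sealed class ReceiveFallback : IDisposable
{
    private readonly Func<int> _bytesToRead;
    private readonly Action _drain;
    private readonly TimeSpan _quietPeriod;
    private readonly object _gate = new object();
    private DateTime _lastEventUtc = DateTime.UtcNow;
    private Timer _timer;

    public ReceiveFallback(Func<int> bytesToRead, Action drain, TimeSpan quietPeriod)
    {
        _bytesToRead = bytesToRead;
        _drain = drain;
        _quietPeriod = quietPeriod;
    }

    // Call at the top of the real DataReceived handler.
    public void NoteEventFired()
    {
        lock (_gate) { _lastEventUtc = DateTime.UtcNow; }
    }

    // Starts periodic checks on a thread-pool timer.
    public void Start(TimeSpan pollInterval)
    {
        _timer = new Timer(delegate { Check(); }, null, pollInterval, pollInterval);
    }

    // Returns true when the fallback had to step in and drain the port.
    public bool Check()
    {
        lock (_gate)
        {
            bool silent = DateTime.UtcNow - _lastEventUtc >= _quietPeriod;
            if (!silent || _bytesToRead() == 0)
                return false;
            _drain();
            _lastEventUtc = DateTime.UtcNow;
            return true;
        }
    }

    public void Dispose()
    {
        if (_timer != null) _timer.Dispose();
    }
}
```

The drain routine can now run on either the event thread or the timer thread, so it has to be safe to call from both.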
So I was able to fix my issues, coming out a little wiser.
From my observations, the ReceivedBytesThreshold plays a critical part in how effective the event actually is. When the threshold is set too low, the serial port has a tendency to get itself confused, and will eventually throw up its hands and give up.
Setting this closer to the size of my expected data coming in appeared to help ease the burden enough to make the reading fairly consistent.
I didn't test my idea of using a Task, but reading further online appeared to answer my question about using 0x1A: from what I noticed, it will trigger if it receives this character (at least on a Windows machine).
I'm not sure how to approach this. I am hesitant about showing my code because it's a university assignment. I need some place to start.
I'm making a TCP card game with four players and a server. Every 100ms, a player asks for an update from the server using a background worker. The server accepts a client connection and reads in an Enumeration value (sent as an Int32) that tells it what the client wants the server to send (update, card, player, etc.) and a value that is read in based on the Enumeration value (receiving an Update Enumeration means it needs to read in an Int32 next). The server sends back a response based on the Enumeration read in.
Here's where the problem occurs. I have a custom-built computer (2500K processor, Win8 x64) and when I execute the program on it, it will loop forever, accepting client requests and sending the appropriate response back. Exactly as expected! However, on my laptop (Lenovo YogaPad, Win8 x64) the back-and-forth exchange lasts for around 30-50 requests and then deadlocks. It's always at the same spot. The server has read in the Enumeration and is waiting for the second value. The client is past the part of sending the enum and value and is waiting for the results. It is always stable on my desktop and always deadlocks on my laptop. Even when I slow the program down to update every second, it still deadlocks. I'm not sure what to do.
I've built the program on each computer. I've built the program on my desktop and ran it on my laptop and it still deadlocks. Does anyone have any suggestions?
Many thanks!
You are lucky that the code hangs on one of your machines before you hand the assignment in, rather than on your teacher's machine. You are also lucky that the problem is reproducible, so you had better find out exactly where it hangs. Without having access to the code I have the following wild guesses:
you forgot to do proper error handling in all places and now it busy loops because of an unexpected error
it hangs inside a read where you try to read N bytes but the peer sends only K<N bytes
These are wild guesses, but without access to even the basic structure of your program you probably cannot expect anything more.
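A related pitfall worth ruling out for the second guess: a single Stream.Read call may return fewer bytes than requested even when the peer sent everything, so code that assumes one Read fills the buffer can get out of step. A minimal sketch of a read-until-full loop follows. StreamHelpers is a hypothetical name, and ReadInt32 assumes both ends use the same byte order.

```csharp
using System;
using System.IO;

public static class StreamHelpers
{
    // Loops until 'count' bytes have arrived, because Stream.Read is allowed
    // to return fewer bytes than requested. Throws if the stream ends early.
    public static byte[] ReadExactly(Stream stream, int count)
    {
        byte[] buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0)
                throw new EndOfStreamException(
                    "Stream ended after " + offset + " of " + count + " bytes.");
            offset += read;
        }
        return buffer;
    }

    // Assumption: the sender wrote the value in this machine's byte order.
    public static int ReadInt32(Stream stream)
    {
        return BitConverter.ToInt32(ReadExactly(stream, 4), 0);
    }
}
```

This does not help if the peer really never sends the remaining bytes; in that case the loop still blocks, and setting a read timeout on the stream is what would turn the hang into an exception you can log.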
I have a GPIB device that I'm communicating with using a National Instruments USB-to-GPIB adapter. The USB-to-GPIB adapter works great.
I am wondering what can cause a GPIB device to become unresponsive. If I turn the device off and back on, it will respond, and when I run my program it responds at first. Then it cuts off: I can't communicate with the GPIB device at all, it just times out.
Did I fill up the buffer?
Some specifics from another questioner
I'm controlling a National Instruments GPIB card (not USB) with PyVisa. The instrument on the GPIB bus is a Newport ESP300 motion controller. During a session of several hours (all the while sending commands to and reading from the ESP300) the ESP300 will sometimes stop listening and become unresponsive. All reads time out, and not even *idn? produces a response.
Is there something I can do that is likely to clear this state? e.g. drive the IFC line?
Since you are using National Instruments hardware, you can run NI Trace in the background to check all the commands that are sent out from the program. In the trace, check the last command (and its parameters) sent out by the program, since that is the one that causes the hardware to hang.
You can download NI IO Trace here
There should be a clear command (something like "*CLS", but don't quote me on that). I always run that when I first connect to a device. Then make sure you have a good timeout duration. I found that around 1 second works for my device. Less than 1 second makes me miss the read after a write. Most of the time, a timeout happens because you just missed the response or you are reading after a command that returns nothing. Make sure you are also checking the error queue in between writes to confirm each write actually went through properly.
Even the command *CLS will not work if the device is not listening anymore (which might be the case here). The only way to force resetting the device's interface whatever its status (listening or not) is to send the low-level gpib bus message "Selected Device Clear" (it is implemented by the function "ibclr" of the standard gpib library, e.g. https://www.l-com.com/multimedia/manuals/M_USB-488.PDF page 3-7, but I don't know what is the equivalent in Python). This command is intended to be used whenever a GPIB error occurs, I'm always doing it and never had problems. For this to work well you should also monitor the return values of all gpib calls - usually people don't do it so they are unaware of errors until the program hangs.
I have a program that is used to talk to hardware over rs232. This software is used to display a stream of data that is pushed over the rs232 from the hardware as fast as it can be. The problem I am running into is that over time the private memory assigned to the program explodes, and will very rapidly crash the program. If I disable the hardware from sending data for about 2 minutes, then the software can clear out the memory, but only if I pause the data stream.
I am using the DataReceived event from the SerialPort, and this appears to be where the problem is at, because it will cause a memory spike even if the DataReceived function does nothing inside it. The only thing I can come up with is that every time this event is raised it creates a new thread to run, and it is happening so fast that the computer doesn't have time to run GC while the data is coming in.
Is there a more efficient way to pull data off a SerialPort object? I only care about a string when I receive a "NewLine".
Thanks,
John Vickers
DataReceived is executed on a different thread. I had problems with really fast data and this event caused me problems. Because of that, I created one thread and read the data myself:
while (this.serialPort.IsOpen)
{
    int b = this.serialPort.ReadByte();
    if (b != -1)
    {
        // data is good here
    }
}
But like the others said, without any code sample, there isn't much we can help you with.
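Since the question only cares about complete lines, the bytes coming out of a loop like the one above could be fed into a small line assembler. This is only a sketch: LineAssembler is a hypothetical helper, and it assumes single-byte (ASCII) data terminated by a line feed. SerialPort.ReadLine() together with the NewLine property is the built-in alternative if blocking reads are acceptable.

```csharp
using System;
using System.Text;

// Hypothetical helper: push each byte from the reader thread's ReadByte()
// loop; LineReceived is raised once per complete line.
public sealed class LineAssembler
{
    private readonly StringBuilder _current = new StringBuilder();

    public event Action<string> LineReceived;

    public void Push(int b)
    {
        if (b < 0) return;         // ReadByte() returns -1 at end of stream
        char c = (char)b;          // assumption: one byte per character
        if (c == '\r') return;     // tolerate CRLF line endings
        if (c != '\n')
        {
            _current.Append(c);
            return;
        }

        string line = _current.ToString();
        _current.Length = 0;       // reuse the builder for the next line
        Action<string> handler = LineReceived;
        if (handler != null) handler(line);
    }
}
```

The handler runs on the reader thread, so anything that touches the UI from it still has to be marshalled over, and a slow handler will hold up reading.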
This is very unusual but it is technically possible. SerialPort uses threadpool threads to call the DataReceived event handler. As soon as it receives one or more bytes, it grabs a TP thread to notify your app. There's a lock in the event generation code, only one thread can call your event handler at a time.
A potential failure mode here is that one of these calls, likely the first one, enters a loop in your code from which it never exits. If you haven't set up the Handshake property, the device can keep sending and triggering more TP calls, all of them blocking on that lock.
Diagnose this from the Debug + Windows + Threads window. If my guess is accurate then you should see a large number of threads listed here. One of them should be inside your DataReceived event handler, double-click it and look at the call stack to see where it is stuck. The memory you are seeing consumed is eaten by the stacks of these threads, one megabyte each.
Another possibility is that your DataReceived event handling code is very slow, possibly by calling Control.Invoke(). Slow enough to not be able to keep up with the device. You now really do need to use the Handshake property to setup flow control. Or fix whatever makes it so slow. There should also be a very large number of ErrorReceived events btw, be sure to implement it so you can see this stuff going wrong.
There's an upper limit on the number of TP threads that can be running at the same time. It is rather generous, 250 times the number of cores. That can easily consume half a gigabyte of memory on a typical dual-core machine.
Just to revive this issue.
I'm seeing a massive memory leak when using the DataReceived event.
I am using a USB 3G modem which provides a serial modem interface. I wrote a tiny program that just opens the serial port and connects to the DataReceived event. The event handler is just an empty method.
If you yank out the dongle, memory starts to leak at about 10MB per second. No exception is thrown.
Spinning up a new thread and using the synchronous Read(...) method solved the problem for me. I now get an exception when I yank out the dongle that I can handle and no memory leaks.
I developed a state-driven serial port programming language in C#, and I believe it solves nearly all of the serial port problems that people run into.
Would you please try it with the following simple state and check memory leaks ?
state Init
recv();
$len = length($DATA_PACKET);
if("$len > 0") {
log($DATA_PACKET, Debug);
}
end state
Screen shots
Project homepage
Download
If you have any questions, please feel free to ask.
Here's some background on what I'm trying to do:
1. Open a serial port from a mobile device to a Bluetooth printer.
2. Send an EPL/2 form to the Bluetooth printer, so that it understands how to treat the data it is about to receive.
3. Once the form has been received, send some data to the printer which will be printed on label stock.
4. Repeat step 3 as many times as necessary for each label to be printed.
Step 2 only happens the first time, since the form does not need to precede each label. My issue is that when I send the form, if I send the label data too quickly it will not print. Sometimes I get "Bluetooth Failure: Radio Non-Operational" printed on the label instead of the data I sent.
I have found a way around the issue by doing the following:
for (int attempt = 0; attempt < 3; attempt++)
{
    try
    {
        serialPort.Write(labelData);
        break;
    }
    catch (TimeoutException ex)
    {
        // Log info or display info based on ex.Message
        Thread.Sleep(3000);
    }
}
So basically, I can catch a TimeoutException and retry the write method after waiting a certain amount of time (three seconds seems to work all the time, but any less and it seems to throw the exception every attempt). After three attempts I just assume the serial port has something wrong and let the user know.
This way seems to work ok, but I'm sure there's a better way to handle this. There are a few properties in the SerialPort class that I think I need to use, but I can't really find any good documentation or examples of how to use them. I've tried playing around with some of the properties, but none of them seem to do what I'm trying to achieve.
Here's a list of the properties I have played with:
CDHolding
CtsHolding
DsrHolding
DtrEnable
Handshake
RtsEnable
I'm sure some combination of these will handle what I'm trying to do more gracefully.
I'm using C# (2.0 framework), a Zebra QL 220+ Bluetooth printer and a Windows Mobile 6 handheld device, if that makes any difference for solutions.
Any suggestions would be appreciated.
[UPDATE]
I should also note that the mobile device is using Bluetooth 2.0, whereas the printer is only at version 1.1. I'm assuming the speed difference is what's causing the printer to lag behind in receiving the data.
Well I've found a way to do this based on the two suggestions already given. I need to set up my serial port object with the following:
serialPort.Handshake = Handshake.RequestToSendXOnXOff;
serialPort.WriteTimeout = 10000; // Could use a lower value here.
Then I just need to do the write call:
serialPort.Write(labelData);
Since the Zebra printer supports software flow control, it will send an XOff value to the mobile device when the buffer is nearly full. This causes the mobile device to wait for an XOn value to be sent from the printer, effectively notifying the mobile device that it can continue transmitting.
By setting the write time out property, I'm giving a total time allowed for the transmission before a write timeout exception is thrown. You would still want to catch the write timeout, as I had done in my sample code in the question. However, it wouldn't be necessary to loop 3 (or an arbitrary amount of) times, trying to write each time, since the software flow control would start and stop the serial port write transmission.
Flow control is the correct answer here, and it may not be present/implemented/applicable to your bluetooth connection.
Check out the Zebra specification and see if they implement, or if you can turn on, software flow control (xon, xoff) which will allow you to see when the various buffers are getting full.
Further, the bluetooth radio is unlikely to be capable of transmitting faster than 250k at the maximum. You might consider artificially limiting it to 9,600bps - this will allow the radio a lot of breathing room for retransmits, error correction, detection, and its own flow control.
If all else fails, the hack you're using right now isn't bad, but I'd call Zebra tech support and find out what they recommend before giving up.
-Adam
The issue is likely not with the serial port code, but with the underlying bluetooth stack. The port you're using is purely virtual, and it's unlikely that any of the handshaking is even implemented (as it would be largely meaningless). CTS/RTS DTR/DSR are simply non-applicable for what you're working on.
The underlying issue is that when you create the virtual port, underneath it has to bind to the bluetooth stack and connect to the paired serial device. The port itself has no idea how long that might take and it's probably set up to do this asynchronously (though it would be purely up to the device OEM how that's done) to prevent the caller from locking up for a long period if there is no paired device or the paired device is out of range.
While your code may feel like a hack, it's probably the best, most portable way to do what you're doing.
You could use a bluetooth stack API to try to see if the device is there and alive before connecting, but there is no standardization of stack APIs, so the Widcomm and Microsoft APIs differ on how you'd do that, and Widcomm is proprietary and expensive. What you'd end up with is a mess of trying to discover the stack type, dynamically loading an appropriate verifier class, and having it call the stack and look for the device. In light of that, your simple poll seems much cleaner, and you don't have to shell out a few $k for the Widcomm SDK.