C# TCP application deadlocks on one computer and never deadlocks on another

I'm not sure how to approach this. I am hesitant about showing my code because it's a university assignment. I need some place to start.
I'm making a TCP card game with four players and a server. Every 100 ms, each player asks the server for an update using a BackgroundWorker. The server accepts a client connection and reads an enumeration value (sent as an Int32) that tells it what the client wants the server to send back (update, card, player, etc.), followed by a value whose type depends on that enumeration value (receiving the Update value means it needs to read an Int32 next). The server then sends back a response based on the enumeration value it read.
Here's where the problem occurs. I have a custom-built desktop (2500K processor, Win8 x64), and when I run the program on it, it loops forever, accepting client requests and sending the appropriate responses back, exactly as expected. However, on my laptop (Lenovo YogaPad, Win8 x64) the back-and-forth exchange lasts for around 30-50 requests and then deadlocks. It's always at the same spot: the server has read the enumeration value and is waiting for the second value, while the client has already sent both the enumeration value and the second value and is waiting for the result. It is always stable on my desktop and always deadlocks on my laptop. I even slowed the program down to update every second, and it still deadlocks. I'm not sure what to do.
I've built the program on each computer, and I've also built it on my desktop and run it on my laptop; it still deadlocks. Does anyone have any suggestions?
Many thanks!

You are lucky that the code hangs on one of your own machines before you hand the assignment in, rather than on your teacher's machine afterwards. You are also lucky that the problem is reproducible, so your first step should be to find out exactly where it hangs. Without access to the code, I have the following wild guesses:
You forgot to do proper error handling everywhere, and now it busy-loops because of an unexpected error.
It hangs inside a read where you try to read N bytes but the peer sends only K < N bytes.
These are wild guesses, but without access to even the basic structure of your program you probably cannot expect anything more.
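To illustrate the second guess: a single `Stream.Read` call on a `NetworkStream` may return fewer bytes than requested, so reading an `Int32` safely means looping until all four bytes have arrived. The sketch below assumes the request layout described in the question (an action code followed by one `Int32` value); the names `Framing`, `WriteRequest` and `ReadInt32Exact` are hypothetical. It also writes each request in a single call and flushes it, so nothing stays buffered on the client side.

```csharp
using System;
using System.IO;

static class Framing
{
    // Sends one request as a single 8-byte write: action code, then value.
    public static void WriteRequest(Stream stream, int action, int value)
    {
        byte[] buffer = new byte[8];
        Array.Copy(BitConverter.GetBytes(action), 0, buffer, 0, 4);
        Array.Copy(BitConverter.GetBytes(value), 0, buffer, 4, 4);
        stream.Write(buffer, 0, buffer.Length);
        stream.Flush(); // do not leave part of the request in a client-side buffer
    }

    // Reads exactly four bytes, looping because Read may return fewer than asked for.
    public static int ReadInt32Exact(Stream stream)
    {
        byte[] buffer = new byte[4];
        int offset = 0;
        while (offset < buffer.Length)
        {
            int read = stream.Read(buffer, offset, buffer.Length - offset);
            if (read == 0)
                throw new EndOfStreamException("Peer closed the connection in the middle of a value.");
            offset += read;
        }
        return BitConverter.ToInt32(buffer, 0);
    }
}
```

On the server this would be two calls in a row (`ReadInt32Exact` for the action, then again for the value). Both sides use `BitConverter`, so byte order matches as long as both machines share the same endianness. Setting `TcpClient.ReceiveTimeout` (in milliseconds) on both ends turns a silent hang into an `IOException`, which makes it much easier to see exactly which read is stuck.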

Related

Cancel background worker which is calling external process

I have created a Telnet server for a project, and it works fine. However, when a client connects to the server, the server needs to connect to a database; again, this works fine when the connection information is correct and/or the database calls do not take too long.
If the database call takes a long time (usually due to incorrect credentials or a badly optimised stored procedure) the server will crash with a Windows error message (i.e. not debuggable), which I understand is the underlying TCP system kicking in, which is fine. To resolve this I am putting all the database calls into BackgroundWorkers, so the server (and clients) continue to work, however I need to kill off this process if it is obviously taking too long.
I know about BackgroundWorker.CancellationPending, but since this is a single method call to the database (via an external DLL), it will never get checked. The same issue applies to a self-made approach I have seen elsewhere. The other option I have seen is Thread.Abort(), but I also know that it is unpredictable and unsafe, so it is probably best not to use it.
Does anyone have any suggestions how to accomplish this?
The problem here is that an external DLL is controlling the waiting. Normally, you could cancel ADO.NET connections or socket connections but this doesn't work here.
Two reliable approaches:
Move the connection into a child process that you can kill. Kill is safe (in contrast to Thread.Abort!) because all state of that process is gone at the same time.
Structure the application so that in case of cancellation the result of the connection attempt is just being ignored and that the app continues running something else. You just let the hanging connection attempt "dangle" in the background and throw away its result when it happens to return later.
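The second approach can be sketched with the Task Parallel Library. This assumes .NET 4.5 or later for `Task.Run` (the BackgroundWorker code in the question may target an older framework), and `blockingCall` is a stand-in for the external DLL's database method; the names are hypothetical. The call runs on a thread-pool thread; if it has not finished within the timeout, the caller moves on and whatever it eventually returns is discarded.

```csharp
using System;
using System.Threading.Tasks;

static class DanglingCall
{
    // Waits at most `timeout` for `blockingCall`. Returns false on timeout and leaves the
    // call running in the background; its eventual result is ignored.
    // An exception thrown within the timeout surfaces here as an AggregateException.
    public static bool TryRun<T>(Func<T> blockingCall, TimeSpan timeout, out T result)
    {
        Task<T> work = Task.Run(blockingCall);

        // Observe a late failure so it does not end up as an unobserved task exception.
        work.ContinueWith(t => { var unused = t.Exception; },
                          TaskContinuationOptions.OnlyOnFaulted);

        if (work.Wait(timeout))
        {
            result = work.Result;
            return true;
        }
        result = default(T);
        return false;
    }
}
```

The trade-off is that each hung call keeps a thread-pool thread blocked until the DLL returns (if ever). If hangs are frequent, that is where the first approach, a child process that can be killed, is the more robust choice.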

What can cause a GPIB device to be unresponsive?

I have a GPIB device that I'm communicating with through a National Instruments USB-to-GPIB adapter. The adapter itself works great.
I am wondering what can cause a GPIB device to become unresponsive. If I turn the device off and back on, it responds, and when I run my program it responds at first. Then it cuts off: I can't communicate with the GPIB device at all, and every operation just times out.
Did I fill up the buffer?
Some specifics from another questioner
I'm controlling a National Instruments GPIB card (not USB) with PyVisa. The instrument on the GPIB bus is a Newport ESP300 motion controller. During a session of several hours (all the while sending commands to and reading from the ESP300) the ESP300 will sometimes stop listening and become unresponsive. All reads time out, and not even *idn? produces a response.
Is there something I can do that is likely to clear this state? e.g. drive the IFC line?
Since you are using National Instruments hardware, you can run NI I/O Trace in the background to capture every command the program sends out. In the trace, check the last command (and its parameters) that the program sent before the hardware hung.
You can download NI IO Trace here
There should be a clear command (something like "*CLS", but don't quote me on that). I always send it when I first connect to a device. Then make sure you have a suitable timeout. For my device, around 1 second works; with less than 1 second I miss the read after a write. Most of the time, a timeout happens because you just missed the response or because you are reading after a command that returns nothing. Also check the error queue between writes to make sure each write actually went through properly.
Even *CLS will not work if the device is no longer listening (which might be the case here). The only way to force a reset of the device's interface, whatever its state (listening or not), is to send the low-level GPIB bus message "Selected Device Clear". It is implemented by the function "ibclr" of the standard GPIB library (e.g. https://www.l-com.com/multimedia/manuals/M_USB-488.PDF page 3-7), but I don't know the equivalent in Python. This command is intended to be used whenever a GPIB error occurs; I always do this and have never had problems. For it to work well, you should also monitor the return values of all GPIB calls; people usually don't, so they are unaware of errors until the program hangs.
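The answers above combine into one pattern: check the outcome of every call, and when a read times out, send a device clear before trying anything else. A minimal sketch follows, written against a hypothetical `IGpibDevice` interface (not a real NI or VISA type); `DeviceClear` is assumed to map to whatever your library exposes for Selected Device Clear, such as `ibclr` in the NI-488.2 API mentioned above, and `Read` is assumed to throw `TimeoutException` when nothing arrives.

```csharp
using System;

// Hypothetical abstraction over whichever GPIB/VISA library is actually in use.
interface IGpibDevice
{
    void Write(string command);
    string Read();        // assumed to throw TimeoutException when no response arrives
    void DeviceClear();   // assumed to send Selected Device Clear (e.g. ibclr)
}

static class GpibRecovery
{
    // Sends a query; on a timeout, clears the device's interface and retries once.
    // A second timeout is not caught and propagates to the caller.
    public static string Query(IGpibDevice device, string command, Action<string> log)
    {
        try
        {
            device.Write(command);
            return device.Read();
        }
        catch (TimeoutException)
        {
            log("Timeout on '" + command + "'; sending device clear and retrying.");
            device.DeviceClear();
            device.Write(command);
            return device.Read();
        }
    }
}
```

Retrying only once keeps a genuinely dead instrument from looping forever, and the log line records which command preceded the lock-up, which is the same information the NI I/O Trace suggestion is after.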

Partial file uploads being automatically deleted

I have some C# code that uploads files to my Apache server via HttpWebRequest. While an upload is in progress, I can use ls -la to watch the file size grow.
Now, if I for example pull my computers network cable, the partial file upload remains on the server.
However, if I simply close my c# app, the partial file is deleted!
I assume this is being caused by my streams being closed gracefully. How can I prevent this behavior? I want my partial file uploads to remain regardless of how the uploading app behaves.
I have attempted to use a destructor to abort my request stream, as well as call System.Environment.Exit(1), neither of which had any effect.
Pulling the network cable will never be equivalent to aborting the stream or closing the socket, as it is a failure in a lower OSI level.
Whenever the application is closed, the networking session is aborted and any pending operation is cancelled. I don't think there's any workaround, unless you programmatically split the file transfer into smaller chunks and save them as you go along (this way you'd have a manual incremental transfer, but it requires some server-side code).
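On the client, the chunked approach could start with a plan like the sketch below. Everything about the server side is an assumption here: the endpoint, how it is told the offset, and the chunk size are hypothetical, and the server code that appends each chunk is not shown. The point is that each chunk is a complete, separately committed request, so an app that exits mid-transfer loses at most the chunk in flight, and `resumeFrom` lets a restarted upload skip what the server already has.

```csharp
using System;
using System.Collections.Generic;

// One upload unit: Length bytes starting at byte Offset of the source file.
struct Chunk
{
    public readonly long Offset;
    public readonly int Length;
    public Chunk(long offset, int length) { Offset = offset; Length = length; }
}

static class ChunkPlanner
{
    // Splits the byte range [resumeFrom, fileLength) into chunks of at most chunkSize bytes.
    // resumeFrom would be the size the server already holds (0 for a fresh upload).
    public static List<Chunk> Plan(long fileLength, int chunkSize, long resumeFrom)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");
        var chunks = new List<Chunk>();
        for (long offset = resumeFrom; offset < fileLength; offset += chunkSize)
        {
            int length = (int)Math.Min((long)chunkSize, fileLength - offset);
            chunks.Add(new Chunk(offset, length));
        }
        return chunks;
    }
}
```

Each chunk would then be read from a FileStream (seek to `Offset`, read `Length` bytes) and sent in its own HttpWebRequest; how the offset travels (query string, custom header) depends on the server-side code you write.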
Write a very simple HTTP proxy that keeps accepting connections but never closes a connection to your server
Even simpler, using netcat 1.10 (though this will accept just one connection)
nc -q $FOREVER -l -p 12345 -c 'nc $YOUR_SERVER 80'
Then connect your C# client to localhost:12345
This might be a silly suggestion but what if you call Process.GetCurrentProcess().Kill(); while the application is being closed?
Before looking at processing of partial uploads, start by testing whether turning keepalives on in Apache configuration solves your problem of receiving partial uploads.
This may reduce the number of disconnects and thus the need to process their partial data. Such disconnects may be caused by the client or the server, but often they are caused by an intermediate node such as a firewall. The keepalives option maintains steady "dummy" traffic (packets with a zero-length data payload), advertising to all parties that the connection is still alive.
For a large site with heavy concurrent load, keepalives are a bad thing which is why they are off by default. The option makes connection management for Apache much more complicated, preventing optimized connection reuse, and there is also a little bit of extra network traffic. But maybe you have a specialized use case where this is not a concern.
Keepalives will never help you at all if your clients simply tend to crash too soon (that is, if you see steady progress on the uploads at all times). They may help you considerably if the issue is network related.
They will help you tremendously if your clients generate the data gradually, with long delays in between uploaded chunks.
Have you checked whether your application steps into
void FinishUpload(IAsyncResult result) {…}
(line 240) when the app is aborted or killed? If so, you might consider not entering the callback. This is a bit dirty, but it may give you a place to start digging.
Does Apache support the SendChunked property of HttpWebRequest (chunked request bodies)?
If so it is worth trying out.
http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.sendchunked.aspx

What makes my program work with a delay on Windows startup?

My program starts with Windows startup, and a BackgroundWorker is supposed to start working as soon as the program opens.
But it starts with a delay, and at first it even returns false results (it checks whether a site is up).
Only after about 15 seconds do the BackgroundWorker and the rest of the program work normally. I think this is because the .NET Framework is still loading, or the internet connection is not up yet, or something else has not finished loading during Windows startup.
What can solve this, and what is the probable cause? (WinForm C#)
Edit:
Here is something I thought of, though I don't think it is good practice. Is there a better way?
(Load method):
while (!netConnection())
{
}
if(netConnection())
bwCheck.RunWorkerAsync();
I think this is because of .net framework trying to load
Nope. If that were the case your program wouldn't run.
or internet connection that is not up yet, or
Yup. The network card/interface/connection/whatever is not initialized and connected to the internet yet. You can't expect a PC to be connected to the internet immediately at startup. Even more, what if your customer is on a domain using network authentication? What if they delay network communications until some task is complete (this was actually the problem in my case below. Seriously.)
It may take even longer to get up and running in that case (read: don't add a Thread.Sleep() in a vain attempt to 'fix' the issue).
I had to fix a problem like this once in a systems design where we communicated to a motion control board via the ethernet bus in a PC. I ended up adding some code to monitor the status of the network connection and, only when it was established, started talking to the device via the network card.
EDIT: As SLaks pointed out in the comments, this is pretty simple in C#: the NetworkChange.NetworkAvailabilityChanged event is there for your programming pleasure.
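Instead of the busy loop in the question's Load method, the event can gate the start. A minimal sketch, assuming the names `NetworkGate` and `RunWhenNetworkAvailable` (hypothetical): it starts the work immediately if a network interface is already available, and otherwise on the first `NetworkAvailabilityChanged` notification that reports availability.

```csharp
using System;
using System.Net.NetworkInformation;

static class NetworkGate
{
    // Invokes `start` at most once: now if a network is available, otherwise
    // the first time NetworkAvailabilityChanged reports IsAvailable == true.
    public static void RunWhenNetworkAvailable(Action start)
    {
        object gate = new object();
        bool started = false;
        NetworkAvailabilityChangedEventHandler handler = null;

        Action startOnce = () =>
        {
            lock (gate)
            {
                if (started) return;
                started = true;
            }
            NetworkChange.NetworkAvailabilityChanged -= handler;
            start();
        };

        handler = (sender, e) => { if (e.IsAvailable) startOnce(); };

        // Subscribe before checking, so a change between the two steps is not missed.
        NetworkChange.NetworkAvailabilityChanged += handler;
        if (NetworkInterface.GetIsNetworkAvailable()) startOnce();
    }
}
```

The event is raised on a background thread, so in a WinForms Load handler the callback should marshal back to the UI thread, e.g. `NetworkGate.RunWhenNetworkAvailable(() => BeginInvoke(new Action(() => bwCheck.RunWorkerAsync())));`. Note that "a network interface is up" is not the same as "the site is reachable", so the site check may still need a short retry right after startup.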
It is absolutely because everything is still starting up. Services can still be coming online long after you log in; the quick login you see is a Windows optimization that lets you log in while everything else is still starting up.
Take note of
How to detect working internet connection in C#?
specifically a technique that avoids the loopback adapter:
System.Net.NetworkInformation.NetworkInterface.GetIsNetworkAvailable()

Serial, RS422, In C#, TxDone Event Not Firing, No Data Being Received

I am writing an application that uses OpenNETCF.IO.Serial (open source, see serial code here) for its serial communication on a Windows CE 6.0 device. This application is coded in C# in the Compact Framework 2.0. I do not believe the issue I am about to describe is specifically related to these details, but I may be proven to be wrong in that regard.
The issue I am having is that, seemingly randomly (read as: intermittent issue I cannot reliably duplicate yet), data will fail to transmit or be received until the device itself is rebooted. The Windows CE device communicates with a system that runs an entirely different application. Rebooting this other system and disconnecting/reconnecting communication cables does not appear to resolve this issue, only rebooting the Windows CE device.
The only sign of this issue occurring is a lack of a TxDone event from OpenNETCF firing (look for "TxDone();" in OpenNETCF.IO.Serial class), and no data being received, when I know for a fact that the connected system is sending data.
Any character value from 1 - 255 (0x01 - 0xFF) can be sent and received in our serial communication. Null values are discarded.
My serial settings are 38400 Baud, 8 data bits, no parity, 1 stop bit (38400, 8n1). I've set the input and output buffer sizes to 256 bytes. DataReceived event happens whenever we receive 1 or more characters, and transmission occurs when there's 1 or more bytes in the output buffer, since messages are of variable length.
No handshaking is used. Since this is RS422, only 4 signals are being used: RX+, RX-, TX+, TX-.
When I receive a "DataReceived" event, I read all data from the input buffer into my own buffer in my code, so I can parse it at my leisure outside of the DataReceived event. When I receive a command message, I send a quick acknowledgment message back. When the other system receives a command message from the Windows CE device, it sends a quick acknowledgment message back. Acknowledgment messages get no further replies, since they're intended as a simple "Yep, got it."
In my code, I receive/transmit through multiple threads, so I use the lock keyword so I'm not transmitting multiple messages simultaneously on multiple threads. Double checking through code has shown that I am not getting hung up on any locks.
At this point, I am wondering if I am continuously missing something obvious about how serial communication works, such as if I need to set some variable or property, rather than just reading from an input buffer when not empty and writing to a transmit buffer.
Any insight, options to check, suggestions, ideas, and so on are welcome. This is something I've been wrestling with on my own for months, I hope that answers or comments I receive here can help in figuring out this issue. Thank you in advance.
Edit, 2/24/2011:
(1) I can only seem to recreate the error when the system the Windows CE device communicates with boots up, and not on every boot-up. I also looked at the signals: the common-mode voltage fluctuates, but the amplitude of the noise at system boot-up seems unrelated to whether the issue occurs (I've seen 25 V peak-to-peak cause no issue, while at 5 V peak-to-peak the issue recurred).
The issue keeps sounding more and more hardware-related, but I'm trying to figure out what can cause the symptoms I'm seeing, as none of the hardware actually appears to fail or shut down, at least where I've been able to measure signals. My apologies, but I will not be able to give any part numbers for the hardware, so please don't ask which components are being used.
(2) As per ctacke's suggestion, I made all transmits go through the same location for maintainability; the thread safety I put in is essentially as follows:
lock(transmitLockObj)
{
try
{
comPort.Output = data;
}
[various catches and error handling for each]
}
(3) I'm getting UART OVERRUN errors in a test where fewer than 10 bytes were being sent and received roughly every 300 ms at 38400 baud. Once it gets an error, it goes to the next loop iteration and does NOT run ReadFile or the TxDone event (or any other line-checking procedures). Also, not only does closing and reopening the port do nothing to resolve this, restarting the software while the device is still running doesn't help either. Only a hardware reboot does.
My DataReceived event is as follows:
try
{
byte[] input = comPort.Input; //set so Input gets FULL RX buffer
lock(bufferLockObj)
{
for (int i = 0; i < input.Length; i++)
{
_rxRawBuffer.Enqueue(input[i]);
//timer regularly checks this buffer and parses data elsewhere
//there, it is "lock(bufferLockObj){dataByte = _rxRawBuffer.Dequeue();}"
//so wait is kept short in DataReceived, while remaining safe
}
}
}
catch (Exception exc)
{
//[exception logging and handling]
//hasn't gotten here, so no point in showing
}
However, it was immediately after the WriteFile call timed out for the first time in the test that I started getting UART OVERRUN errors. I honestly can't see how my code could cause a UART OVERRUN condition.
Thoughts? Hardware or software related, I'm checking everything I can think to check.
Everything sounds right, but your observations kind of show that they're not.
Since you've stated that you're sending from multiple threads, the first thing I'd do is put in some sort of mechanism for sending where all send requests come into one location before calling out to the serial object instance. Sure, you say that you've ensured you have thread safety, but serializing these calls through one location would help reinforce that (and make the code a bit more maintainable/extensible).
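One way to get that single send location is a queue drained by one dedicated writer thread, so only that thread ever touches the port. A minimal sketch, assuming `writeToPort` wraps the OpenNETCF call shown in the question (`data => comPort.Output = data`); the class and member names are hypothetical. It avoids newer types such as BlockingCollection, but it has not been checked against Compact Framework 2.0.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Every transmit request goes through Send(); only the writer thread touches the port.
sealed class TransmitQueue : IDisposable
{
    private readonly Queue<byte[]> pending = new Queue<byte[]>();
    private readonly AutoResetEvent signal = new AutoResetEvent(false);
    private readonly Action<byte[]> writeToPort; // assumption: wraps comPort.Output = data
    private readonly Thread writer;
    private volatile bool stopping;

    public TransmitQueue(Action<byte[]> writeToPort)
    {
        this.writeToPort = writeToPort;
        writer = new Thread(WriterLoop);
        writer.IsBackground = true;
        writer.Start();
    }

    // Safe to call from any thread; it only enqueues and never waits on the port.
    public void Send(byte[] data)
    {
        lock (pending) { pending.Enqueue(data); }
        signal.Set();
    }

    private void WriterLoop()
    {
        while (true)
        {
            signal.WaitOne();
            // Read the flag before draining, so everything queued before Dispose() is still written.
            bool stopRequested = stopping;
            while (true)
            {
                byte[] next;
                lock (pending)
                {
                    if (pending.Count == 0) break;
                    next = pending.Dequeue();
                }
                // The single call site that writes to the port. Exceptions are not handled
                // here; put the existing try/catch around the port call inside writeToPort.
                writeToPort(next);
            }
            if (stopRequested) return;
        }
    }

    // Writes whatever is still queued, then stops the writer thread.
    public void Dispose()
    {
        stopping = true;
        signal.Set();
        writer.Join();
        signal.Close();
    }
}
```

Compared with a lock around each caller's write, this also means a transmit that blocks inside the serial library stalls only the writer thread, and the queue length becomes an easy thing to log when the problem appears.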
Next I'd probably add some temp handling in the Serial lib to specifically set an event or break in the debugger when you've done a Tx but the TxDone event doesn't fire within some bounding period. It's always possible that the Serial lib has a bug in it (trust me, the author of that code is far from infallible) where some race condition is getting by.
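The bounded-period check could be a small watchdog like the sketch below (names hypothetical). The idea is to call `OnTransmit(Environment.TickCount)` right after writing to the port, `OnTxDone()` from the TxDone handler, and poll `IsOverdue(Environment.TickCount)` from a timer, calling `Debugger.Break()` or logging when it returns true. It only tracks the oldest outstanding transmit, so a single TxDone clears everything pending; that is a deliberate simplification.

```csharp
using System;

// Debug aid: reports when a transmit has gone longer than timeoutMs without a TxDone.
sealed class TxDoneWatchdog
{
    private readonly int timeoutMs;
    private readonly object sync = new object();
    private int? pendingSince; // tick count of the oldest transmit still waiting for TxDone

    public TxDoneWatchdog(int timeoutMs) { this.timeoutMs = timeoutMs; }

    public void OnTransmit(int nowTicks)
    {
        lock (sync) { if (!pendingSince.HasValue) pendingSince = nowTicks; }
    }

    public void OnTxDone()
    {
        lock (sync) { pendingSince = null; }
    }

    public bool IsOverdue(int nowTicks)
    {
        lock (sync)
        {
            if (!pendingSince.HasValue) return false;
            // Unchecked subtraction stays correct across Environment.TickCount wraparound.
            int elapsed = unchecked(nowTicks - pendingSince.Value);
            return elapsed > timeoutMs;
        }
    }
}
```

Passing the tick count in, rather than reading the clock inside the class, keeps the logic easy to exercise without real hardware.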
Thank you everyone who responded. We've found that this actually appears to be hardware-related. I'm afraid I can't give more information than this, but I thank everyone who contributed possible solutions or troubleshooting steps.
