C# socket image transfer: file sometimes partially transferred

Problem
I have a PHP client which sends an image file to a C# socket server. My problem is that about 30% of the time the file is only partially transferred and then the transfer stops.
PHP END ->
$file = file_get_contents('a.bmp');
socket_write($socket,$file);
C# END ->
int l= Socket.Receive(buffer, buffer.Length, SocketFlags.None);
//create the file using a file stream
How can I always transfer the full file without intermediate states? And why does it happen?

From the documentation for Socket.Receive:
If you are using a connection-oriented Socket, the Receive method will read as much data as is available, up to the number of bytes specified by the size parameter. If the remote host shuts down the Socket connection with the Shutdown method, and all available data has been received, the Receive method will complete immediately and return zero bytes.
This means you may get less than the total amount. This is just the way sockets work.
So if you get a partial read, you should call Socket.Receive again. You can use the overload of Socket.Receive that takes an offset to continue reading into the same buffer.
Here is an article that shows how to "keep reading" until you get what you want:
Socket Send and Receive
If you don't know how big the data is, you must keep reading until Socket.Receive returns zero.
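As a minimal sketch of the "read until zero" case, assuming the sender closes (or shuts down) its socket after writing the file, the loop could look like the following. The class and method names (SocketReader, ReceiveUntilClosed) and the 8192-byte chunk size are illustrative choices, not part of the original code.

```csharp
using System.IO;
using System.Net.Sockets;

static class SocketReader
{
    // Reads from a connected socket until Receive returns 0, which happens
    // once the remote side has closed or shut down its sending half.
    // Returns every byte that was received.
    public static byte[] ReceiveUntilClosed(Socket socket)
    {
        byte[] chunk = new byte[8192]; // arbitrary chunk size (assumption)
        using (MemoryStream received = new MemoryStream())
        {
            int bytesRead;
            // Each call may return fewer bytes than the chunk can hold.
            while ((bytesRead = socket.Receive(chunk, 0, chunk.Length, SocketFlags.None)) > 0)
            {
                received.Write(chunk, 0, bytesRead);
            }
            return received.ToArray();
        }
    }
}
```

Note that this loop only ends when the sender closes the connection; if the PHP side keeps the socket open after socket_write, the receiver needs another way to learn the size, such as a length prefix.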

Related

How does a TCP packet arrive when using the Socket api in C#

I have been reading about TCP packets and how they can be split up any number of times during their voyage. I took this to assume I would have to implement some kind of buffer on top of the buffer used for the actual network traffic in order to store each ReceiveAsync() until enough data is available to parse a message. BTW, I am sending length-prefixed, protobuf-serialized messages over TCP.
Then I read that the lower layers (ethernet?, IP?) will actually re-assemble packets transparently.
My question is, in C#, am I guaranteed to receive a full "message" over TCP? In other words, if I send 32 bytes, will I necessarily receive those 32 bytes in "one-go" (one call to ReceiveAsync())? Or do I have to "store" each receive until the number of bytes received is equal to the length-prefix?
Also, could I receive more than one message in a single call to ReceiveAsync()? Say one "protobuf message" is 32 bytes. I send 2 of them. Could I potentially receive 48 bytes in "one go" and then 16 in another?
I know this question shows up easily on google, but I can never tell if it's in the correct context (talking about the actual TCP protocol, or how C# will expose network traffic to the programmer).
Thanks.

TCP is a stream protocol - it transmits a stream of bytes. That's all. Absolutely no message framing / grouping is implied. In fact, you should forget that Ethernet packets or IP datagrams even exist when writing code using a TCP socket.
You may find yourself with 1 byte available, or 10,000 bytes available to read. The beauty of the (synchronous) Berkeley sockets API is that you, as an application programmer, don't need to worry about this. Since you're using a length-prefixed message format (good job!), simply recv() as many bytes as you're expecting. If there are more bytes available than the application requests, the kernel will keep the rest buffered until the next call. If there are fewer bytes available than required, the thread will either block or the call will indicate that fewer bytes were received. In the latter case, you can simply call recv() again for the remaining bytes, sleeping until more data is available.
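A rough synchronous sketch of that idea in C#, written against Stream so it works with NetworkStream. The question does not say how the length prefix is encoded, so the 4-byte big-endian prefix here is an assumption, as are the names LengthPrefixedReader, ReadFully and ReadMessage.

```csharp
using System.IO;

static class LengthPrefixedReader
{
    // Reads exactly count bytes into buffer, looping over short reads.
    // Throws if the stream ends before count bytes have arrived.
    public static void ReadFully(Stream stream, byte[] buffer, int count)
    {
        int offset = 0;
        while (offset < count)
        {
            int bytesRead = stream.Read(buffer, offset, count - offset);
            if (bytesRead == 0)
                throw new EndOfStreamException("Stream ended after " + offset + " of " + count + " bytes.");
            offset += bytesRead;
        }
    }

    // Reads one message: an assumed 4-byte big-endian length, then the payload.
    public static byte[] ReadMessage(Stream stream)
    {
        byte[] prefix = new byte[4];
        ReadFully(stream, prefix, 4);
        int length = (prefix[0] << 24) | (prefix[1] << 16) | (prefix[2] << 8) | prefix[3];
        if (length < 0)
            throw new InvalidDataException("Negative message length.");
        byte[] payload = new byte[length];
        ReadFully(stream, payload, length);
        return payload;
    }
}
```

Calling ReadMessage repeatedly on the same stream yields one payload per call; any bytes of the next message that have already arrived simply stay buffered until the next call asks for them.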
The problem with async APIs is that they require the application to track a lot more state itself. Even this Microsoft example of Asynchronous Client Sockets is far more complicated than it needs to be. With async APIs, you still control the amount of data you're requesting from the kernel, but when your async callback is fired, you then need to know the next amount of data to request.
Note that C# async/await in .NET 4.5 makes asynchronous processing easier, as you can write it in a synchronous style. Have a look at this answer where the author comments:
Socket.ReceiveAsync is a strange one. It has nothing to do with async/await features in .net4.5. It was designed as an alternative socket API that wouldn't thrash memory as hard as BeginReceive/EndReceive, and only needs to be used in the most hardcore of server apps.

TCP is a stream-based octet protocol. So, from the application's perspective, you can only read or write bytes to the stream.
I have been reading about TCP packets and how they can be split up any number of times during their voyage.
TCP packets are a network implementation detail. They're used for efficiency (it would be very inefficient to send one byte at a time). Packet fragmentation is done at the device driver / hardware level, and is never exposed to applications. An application never knows what a "packet" is or where its boundaries are.
I took this to assume I would have to implement some kind of buffer on top of the buffer used for the actual network traffic in order to store each ReceiveAsync() until enough data is available to parse a message.
Yes. Because "message" is not a TCP concept. It's purely an application concept. Most application protocols do define a kind of "message" because it's easier to reason about.
Some application protocols, however, do not define the concept of a "message"; they treat the TCP stream as an actual stream, not a sequence of messages.
In order to support both kinds of application protocols, TCP/IP APIs have to be stream-based.
BTW, I am sending length-prefixed, protobuf-serialized messages over TCP.
That's good. Length prefixing is much easier to deal with than the alternatives, IMO.
My question is, in C#, am I guaranteed to receive a full "message" over TCP?
No.
Or do I have to "store" each receive until the number of bytes received is equal to the length-prefix? Also, could I receive more than one message in a single call to ReceiveAsync()?
Yes, and yes.
Even more fun:
You can get only part of your length prefix (assuming a multi-byte length prefix).
You can get any number of messages at once.
Your buffer can contain part of a message, or part of a message's length prefix.
The next read may not finish the current message, or even the current message's length prefix.
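The cases above can be handled by accumulating whatever each receive returns and extracting complete messages as they become available. The following is only a sketch under assumptions: a 4-byte big-endian length prefix, and a hypothetical MessageFramer class that favors clarity over efficiency (it copies bytes into a List<byte>).

```csharp
using System;
using System.Collections.Generic;

// Accumulates received bytes and hands back complete messages.
sealed class MessageFramer
{
    private const int PrefixSize = 4; // assumed prefix width
    private readonly List<byte> pending = new List<byte>();

    // Call with the buffer and byte count from each receive.
    // Returns zero or more complete payloads, in arrival order.
    public List<byte[]> Append(byte[] buffer, int count)
    {
        for (int i = 0; i < count; i++)
            pending.Add(buffer[i]);

        List<byte[]> messages = new List<byte[]>();
        // Stop as soon as even the prefix is incomplete.
        while (pending.Count >= PrefixSize)
        {
            int length = (pending[0] << 24) | (pending[1] << 16) | (pending[2] << 8) | pending[3];
            if (length < 0)
                throw new InvalidOperationException("Negative message length.");
            if (pending.Count - PrefixSize < length)
                break; // payload not complete yet; wait for the next receive
            messages.Add(pending.GetRange(PrefixSize, length).ToArray());
            pending.RemoveRange(0, PrefixSize + length);
        }
        return messages;
    }
}
```

With two 32-byte frames on the wire, feeding this framer 48 bytes and then 16 bytes returns one message from each call, and feeding it only part of a prefix returns an empty list.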
For more information on the details, see my TCP/IP .NET FAQ, particularly the sections on message framing and some example code for length-prefixed messages.
I strongly recommend using only asynchronous APIs in production; the synchronous alternative of having two threads per connection negatively impacts scalability.
Oh, and I also always recommend using SignalR if possible. Raw TCP/IP socket programming is always complex.

My question is, in C#, am I guaranteed to receive a full "message" over TCP?
No. You are not guaranteed to receive a full message in one call. A single send does not necessarily result in a single receive. You must keep reading on the receiving side until you have received everything you need.
See the example here, it keeps the read data in a buffer and keeps checking to see if there is more data to be read:
private static void ReceiveCallback(IAsyncResult ar)
{
    try
    {
        // Retrieve the state object and the client socket
        // from the asynchronous state object.
        StateObject state = (StateObject)ar.AsyncState;
        Socket client = state.workSocket;
        // Read data from the remote device.
        int bytesRead = client.EndReceive(ar);
        if (bytesRead > 0)
        {
            // There might be more data, so store the data received so far.
            state.sb.Append(Encoding.ASCII.GetString(state.buffer, 0, bytesRead));
            // Get the rest of the data.
            client.BeginReceive(state.buffer, 0, StateObject.BufferSize, 0,
                new AsyncCallback(ReceiveCallback), state);
        }
        else
        {
            // All the data has arrived; put it in response.
            if (state.sb.Length > 1)
            {
                response = state.sb.ToString();
            }
            // Signal that all bytes have been received.
            receiveDone.Set();
        }
    }
    catch (Exception e)
    {
        Console.WriteLine(e.ToString());
    }
}
See this MSDN article and this article for more details. The 2nd link goes into more details and it also has sample code.

TCP client to server corrupts data

This is very weird.
When I send data to a TCP server with a TCP client, it corrupts the data for some extremely odd and quite annoying reason.
Here is the server code:
TcpListener lis = new TcpListener(IPAddress.Any, 9380); // it needs to be 9380, crucial
lis.Start();
Socket sk = lis.AcceptSocket();
byte[] pd = new byte[sk.ReceiveBufferSize];
sk.Receive(pd);
// cd is the current directory, filename is the file, ex picture.png,
// that was previously sent to the server with UDP.
File.WriteAllBytes(Path.Combine(cd, filename), pd);
Here is the client code:
// disregard "filename" var, it was previously assigned in earlier code
byte[] p = File.ReadAllBytes(filename);
TcpClient client = new TcpClient();
client.Connect(IPAddress.Parse(getIP()), 9380);
Stream st = client.GetStream();
st.Write(p, 0, p.Length);
What happens is extreme data loss. Sometimes I would upload a 5KB file, but when the server receives and writes the file I sent, it turns out to be something crazy like 2 bytes. Or it would be 8KB instead! Applications sent through this won't even run, pictures show errors, etc.
I would like to note, however, with this, client -> server fails, on the other hand, server -> client works. Strange.
Btw, in case you are interested this is for sending files... Also, using .NET 4.5. Also, my network is extremely reliable.

Hi, I think you have some misconceptions about TCP.
I can see you are setting up a server with a receive buffer of x bytes.
Firstly, have you checked how many bytes that is? I suspect it is quite small, something like 1024.
When you write data over TCP, the data is split into segments. Each time you receive, you will get some of the data that was sent, and the call will tell you how many bytes were received. As far as I can see from your use case, you do not know the size of the data to receive, so you will have to build a protocol between your client and server to communicate this. The simplest such protocol would be to write a 4-byte integer specifying the size of the data to be received.
The communication would go like this.
Client:
Write 4 bytes (file size)
Write Data bytes
Server:
Read 4 bytes (file size)
While fewer than (file size) bytes have been received, read
the stream and push the bytes into a memory/file stream
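A minimal sketch of that exchange, written against Stream so that either a NetworkStream or a test stream can be passed in. The byte order of the 4-byte size (big-endian here), the 8192-byte chunk size, and the names FileTransfer, Send and Receive are assumptions made for illustration.

```csharp
using System;
using System.IO;

static class FileTransfer
{
    // Client side: write the 4-byte size, then the data bytes.
    public static void Send(Stream output, byte[] fileBytes)
    {
        int size = fileBytes.Length;
        byte[] header = { (byte)(size >> 24), (byte)(size >> 16), (byte)(size >> 8), (byte)size };
        output.Write(header, 0, header.Length);
        output.Write(fileBytes, 0, size);
    }

    // Server side: read the 4-byte size, then copy exactly that many
    // bytes into destination (a MemoryStream or FileStream).
    public static int Receive(Stream input, Stream destination)
    {
        byte[] header = new byte[4];
        int got = 0;
        while (got < header.Length) // the size itself may arrive in pieces
        {
            int n = input.Read(header, got, header.Length - got);
            if (n == 0) throw new EndOfStreamException("Connection closed while reading the size.");
            got += n;
        }
        int size = (header[0] << 24) | (header[1] << 16) | (header[2] << 8) | header[3];
        if (size < 0) throw new InvalidDataException("Negative file size.");

        byte[] chunk = new byte[8192];
        int remaining = size;
        while (remaining > 0)
        {
            int n = input.Read(chunk, 0, Math.Min(chunk.Length, remaining));
            if (n == 0) throw new EndOfStreamException("Connection closed with " + remaining + " bytes missing.");
            destination.Write(chunk, 0, n);
            remaining -= n;
        }
        return size;
    }
}
```

On the server, input would be the NetworkStream for the accepted connection and destination a FileStream for the target path; the loop writes each chunk as it arrives instead of assuming one receive delivers the whole file.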

How does HttpWebRequest GetResponseStream work

Say I create an HttpWebRequest and call its GetResponseStream method. When I attempt to read the stream, has all of the data already been copied to a local buffer, or does it work by reading it as it comes across the wire?

The GetResponseStream method returns a specific implementation of the Stream abstract class that is bound to the underlying TCP/IP socket, much like a NetworkStream. The data is not copied to a local buffer up front; it is read as it comes across the wire. When the client reads from this stream, it gets only the data that is currently available on the socket, and the read blocks only when no data is available at all, until the server writes more data to the socket.
So if we assume that the server has already written 5 bytes to the socket, and you attempt to read 5 bytes from the stream on the client, you will be able to retrieve those 5 bytes. If you attempt to read 6 bytes, the read returns the 5 bytes that are available rather than waiting for the sixth; a further read then blocks until the server sends more data or until it times out.

If you look at that example you'll see that you need to call .GetResponse() first, which will answer your question.

C# socket blocking behavior

My situation is this : I have a C# tcp socket through which I receive structured messages consisting of a 3 byte header and a variable size payload. The tcp data is routed through a network of tunnels and is occasionally susceptible to fragmentation. The solution to this is to perform a blocking read of 3 bytes for the header and a blocking read of N bytes for the variable size payload (the value of N is in the header). The problem I'm experiencing is that occasionally, the blocking receive operation returns a partial packet. That is, it reads a volume of bytes less than the number I explicitly set in the receive call. After some debugging, it appears that the number of bytes it returns is equal to the number of bytes in the Available property of the socket before the receive op.
This behavior is contrary to my expectation. If the socket is blocking and I explicitly set the number of bytes to receive, shouldn't the socket block until it receives those bytes? Any help, pointers, etc. would be much appreciated.

The behaviour depends on the type of socket you're using. TCP is a Connection-Oriented Socket, which means:
If you are using a connection-oriented Socket, the Receive method will read as much data as is available, up to the number of bytes specified by the size parameter. If the remote host shuts down the Socket connection with the Shutdown method, and all available data has been received, the Receive method will complete immediately and return zero bytes.
When using TCP sockets, you have to be prepared for this possibility; check the return value of the Receive method, and if it was less than what you expected, Receive again until either the socket is closed or you've actually received as much data as you need.
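One way to sketch that "Receive again" loop directly on a Socket is shown below. The names SocketReceiveSketch and ReceiveExact are hypothetical; the Receive overload with an offset and size is what lets each call continue where the previous one stopped.

```csharp
using System.IO;
using System.Net.Sockets;

static class SocketReceiveSketch
{
    // Blocks until exactly count bytes have been stored in buffer,
    // calling Receive again after every partial read.
    // Throws if the connection is closed before count bytes arrive.
    public static void ReceiveExact(Socket socket, byte[] buffer, int count)
    {
        int received = 0;
        while (received < count)
        {
            // Ask only for what is still missing, at the current offset.
            int n = socket.Receive(buffer, received, count - received, SocketFlags.None);
            if (n == 0)
                throw new EndOfStreamException("Socket closed after " + received + " of " + count + " bytes.");
            received += n;
        }
    }
}
```

For the protocol in the question, this would be called once with a 3-byte buffer for the header and then once more with an N-byte buffer for the payload, where N is decoded from the header in whatever way that protocol defines.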

How does NetworkStream work in two directions?

I've read an example of a Tcp Echo Server and some things are unclear to me.
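// listener (a started TcpListener) and rcvBuffer (a byte[]) are set up earlier in the example
TcpClient client = null;
NetworkStream netStream = null;
try {
    client = listener.AcceptTcpClient();
    netStream = client.GetStream();
    int bytesRcvd;
    int totalBytesEchoed = 0;
    // Read returns 0 once the client has closed its side of the connection
    while ((bytesRcvd = netStream.Read(rcvBuffer, 0, rcvBuffer.Length)) > 0) {
        netStream.Write(rcvBuffer, 0, bytesRcvd);
        totalBytesEchoed += bytesRcvd;
    }
    netStream.Close();
    client.Close();
} catch {
    // netStream is still null if AcceptTcpClient or GetStream failed
    if (netStream != null) netStream.Close();
}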
When the server receives a packet (the while loop), it reads the data into rcvBuffer and writes it to the stream.
What confuses me is the chronological order of messages in communication. Is the data which was written with netStream.Write() sent immediately to the client (who may even still be sending), or only after the data which is already written to the stream (by the client) has been processed?
The following question may even clarify the previous: If a client sends some data by writing to the stream, is that data moved to the message queue on the server side waiting to be read so the stream is actually "empty"? That would explain why the server can immediately write to stream - because the data which comes from the stream is actually buffered elsewhere...?

A TCP connection is, in principle, full duplex. So you are dealing with 2 separate channels and yes, both sides could be writing at the same time.

Hint: The method call NetworkStream.Read is blocking in that example.
The book is absolutely correct -- raw access to TCP streams does not imply any sort of extra "chunking" and, in this example for instance, a single byte could easily be processed at a time. However, performing the reading and writing in batches (normally with exposed buffers) can allow for more efficient processing (often as a result of fewer system calls). The network layer and network hardware also employ their own forms of buffers.
There is actually no guarantee that data written from Write() will actually be written before more Reads() successfully complete: even if data is flushed in one layer it does not imply it is flushed in another and there is absolutely no guarantee that the data has made its way back over to the client. This is where higher-level protocols come into play.
With this echo example the data is simply shoved through as fast as it can be. Both the Write and the Read will block based upon the underlying network stack (the send and receive buffers in particular), each with their own series of buffers.
[This simplifies things a bit of course -- one could always look at the TCP [protocol] itself which does impose transmission characteristics on the actual packet flow.]

You are right that, technically, when performing a Read() operation you are not reading bits off the wire. You are basically reading buffered data (chunks received by the TCP stack and arranged in the correct order). When sending, you can call Flush(), which in theory should send the data immediately, but modern TCP stacks have some logic for gathering data into appropriately sized packets before bursting them onto the wire.
As Henk Holterman explained, TCP is a full-duplex protocol (if supported by all the underlying infrastructure), so sending and receiving data is more a matter of when your server/client reads and writes data. It's not as if, when your server sends data, a client will read it immediately. The client can be sending its own data and only then perform a Read(); in this case the data will stay in the network buffer longer and can be discarded after some time if no one wants to read it. At least, I've experienced this when dealing with my supa dupa server/client library (-.
