How to handle buffering data read from the network?


When reading data over the network, you specify a buffer to receive the data into:
byte[] b = new byte[4096];
socket.Receive(b);
Now my first thought is, of course, to reuse the receive buffer by declaring it as a member variable of the class. My next issue is that a single call may not have received all of the data I am expecting, so I need to buffer my data. This is easy to accomplish by keeping track of the count of bytes received so far and passing it as the offset:
socket.Receive(m_ReceiveBuffer, count, m_ReceiveBuffer.Length - count, SocketFlags.None);
Now, the issue here is that if the buffer is still not big enough, I am guessing that I need to grow it, which means copying memory, and then continue receiving into the larger buffer. If something went wrong, this buffer would keep growing, and sufficiently large messages could run the system out of memory.
Any ideas how to properly handle this? Is there a better way of receiving the data than just fill, copy, grow, fill, copy, grow that I am talking about?

Read in chunks:
const int ChunkSize = 4096;
int bytesRead;
byte[] buffer = new byte[ChunkSize];
while ((bytesRead = socket.Receive(buffer, 0, ChunkSize, SocketFlags.None)) > 0)
{
byte[] actualBytesRead = new byte[bytesRead];
Buffer.BlockCopy(buffer, 0, actualBytesRead, 0, bytesRead);
// Do something with actualBytesRead,
// maybe add it to a list or write it to a stream somewhere
}
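Instead of allocating a new array per chunk, the loop can write each chunk straight into a stream, and a size limit answers the question's worry about running out of memory. A minimal sketch, assuming the data is read through a Stream (for example a NetworkStream wrapping the socket) and that the sender closes the connection when it is done; ChunkedReader, ReadAllCapped and the maxBytes limit are hypothetical names, not a library API:

```csharp
using System;
using System.IO;

public static class ChunkedReader
{
    // Reads `source` until it reports end of stream (Read returns 0),
    // appending only the bytes actually received to a MemoryStream.
    // Throws instead of growing without bound if more than maxBytes arrive.
    public static byte[] ReadAllCapped(Stream source, int maxBytes, int chunkSize = 4096)
    {
        if (maxBytes < 0) throw new ArgumentOutOfRangeException(nameof(maxBytes));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var chunk = new byte[chunkSize];          // reused for every Read call
        using var accumulated = new MemoryStream();
        int bytesRead;
        while ((bytesRead = source.Read(chunk, 0, chunk.Length)) > 0)
        {
            // Hypothetical policy: refuse to buffer more than maxBytes in total.
            if (accumulated.Length + bytesRead > maxBytes)
                throw new InvalidDataException(
                    $"Incoming data exceeds the {maxBytes}-byte limit.");

            // Write only the valid part of the chunk, not chunk.Length bytes.
            accumulated.Write(chunk, 0, bytesRead);
        }
        return accumulated.ToArray();
    }
}
```

The limit is checked before each write, so a misbehaving sender can cost at most maxBytes of accumulated data (plus the MemoryStream's spare capacity) before the read is abandoned.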

Before starting with System.Net.Sockets.Socket, are you sure that you can't use System.Net.Sockets.TcpClient, which does all the messy buffer work for you and turns the connection into an easily managed stream (UdpClient similarly simplifies UDP, though it hands you whole datagrams rather than a stream)?
If not, remember that the amount of data you receive doesn't have to equal what you requested, so you should always look at the return value of the Receive method. And the only way to avoid running out of memory is to actually process what you receive.

First, separate the code into receiving the data and processing the data. The receive buffer should only hold the data until the code gets a chance to copy it out to the processing area. The processing area is where you determine whether you have received enough data to do something useful; don't leave the data in the network receive buffer until that happens. For the network receive buffer, I think a circular buffer will help with your idea of reusing a buffer. Hopefully you have an idea of the message sizes, which will help in choosing the size of the buffer. Assert (or do something similar) when the write pointer of the circular buffer catches up with the read pointer, and if that happens, increase the size of the buffer. Once the buffer is large enough, you should be able to read data out of it into the processing code fast enough that the circular buffer doesn't overflow.
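A minimal sketch of such a circular buffer for bytes. Rather than asserting when the write position catches up with the read position, it throws, so the overflow is visible and the capacity can be increased. ByteRingBuffer and its members are hypothetical names; a receive loop would call Write(chunk, 0, bytesRead) and the processing code would drain it with Read:

```csharp
using System;

public sealed class ByteRingBuffer
{
    private readonly byte[] _data;
    private int _readPos;   // index of the oldest unread byte
    private int _count;     // number of unread bytes currently stored

    public ByteRingBuffer(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _data = new byte[capacity];
    }

    public int Count => _count;
    public int Capacity => _data.Length;

    // Copies `count` bytes in; throws rather than overwriting unread data.
    public void Write(byte[] source, int offset, int count)
    {
        if (count > Capacity - _count)
            throw new InvalidOperationException("Ring buffer overflow; increase the capacity.");

        int writePos = (_readPos + _count) % Capacity;
        int firstPart = Math.Min(count, Capacity - writePos);          // up to the physical end
        Buffer.BlockCopy(source, offset, _data, writePos, firstPart);
        Buffer.BlockCopy(source, offset + firstPart, _data, 0, count - firstPart); // wrapped part
        _count += count;
    }

    // Copies up to `count` unread bytes out and returns how many were copied.
    public int Read(byte[] destination, int offset, int count)
    {
        int toRead = Math.Min(count, _count);
        int firstPart = Math.Min(toRead, Capacity - _readPos);
        Buffer.BlockCopy(_data, _readPos, destination, offset, firstPart);
        Buffer.BlockCopy(_data, 0, destination, offset + firstPart, toRead - firstPart);
        _readPos = (_readPos + toRead) % Capacity;
        _count -= toRead;
        return toRead;
    }
}
```

Because the processing code reads out of the ring buffer, the same fixed array is reused for the lifetime of the connection, and the exception tells you when the chosen capacity was too small.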

Related

memory efficient way to read from COM stream to c# byte[]

My current approach is to read the COM stream into a C# MemoryStream and then call .ToArray(). However, I believe ToArray creates a redundant copy of the data. Is there a better way, with reduced memory usage as the priority?
var memStream = new MemoryStream(10000);
var chunk = new byte[1000];
while (true)
{
int bytesRead = comStream.read(ref chunk, chunk.Length);
if (bytesRead == 0)
break; // eos
memStream.Write(chunk, 0, bytesRead);
}
//fairly sure this creates a duplicate copy of the data
var array = memStream.ToArray();
//does this also dupe the data?
var array2 = memStream.GetBuffer();

If you know the length of the data before you start consuming it, then you can allocate a simple byte[] and fill it in your read loop by incrementing an offset by the number of bytes read each time (and decrementing your "number of bytes you're allowed to touch" count). This does depend on having a read overload or API that accepts either an offset or a pointer, though.
If that isn't an option, GetBuffer() is your best bet: it doesn't duplicate the data; rather, it hands you the current, possibly oversized, byte[]. Because it may be oversized, you must consider it in combination with the current .Length, perhaps wrapping the length/data pair in an ArraySegment<byte> or a Span<byte>/Memory<byte>.
In the "the length is known" scenario, if you're happy to work with oversized buffers, you could also consider a leased array, via ArrayPool<byte>.Shared - rent one of at least that size, fill it, then constrain your segment/span to the populated part (and remember to return it to the pool when you're done).
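Both suggestions can be sketched as follows, assuming the length is known up front and the source is a Stream. ReadPooled rents a pooled array, fills exactly `length` bytes by advancing an offset after each Read, and passes only the populated segment to a caller-supplied callback; ValidBytes pairs GetBuffer() with Length. KnownLengthReader, ReadPooled and ValidBytes are hypothetical names. The callback must not keep the segment after it returns, because the array goes back to the pool, and GetBuffer() throws UnauthorizedAccessException for a MemoryStream constructed over an existing array without publiclyVisible: true.

```csharp
using System;
using System.Buffers;
using System.IO;

public static class KnownLengthReader
{
    // Rents an array of at least `length` bytes, fills exactly `length` bytes,
    // hands the populated segment to `process`, then returns the array to the
    // pool (even if `process` throws).
    public static T ReadPooled<T>(Stream source, int length, Func<ArraySegment<byte>, T> process)
    {
        byte[] rented = ArrayPool<byte>.Shared.Rent(length);   // may be larger than length
        try
        {
            int offset = 0;
            while (offset < length)
            {
                int n = source.Read(rented, offset, length - offset);
                if (n == 0)
                    throw new EndOfStreamException($"Expected {length} bytes, got {offset}.");
                offset += n;   // advance by however many bytes this Read produced
            }
            // Constrain the view to the populated part, never rented.Length.
            return process(new ArraySegment<byte>(rented, 0, length));
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(rented);
        }
    }

    // The GetBuffer() route: no copy, but the array may be oversized,
    // so it is paired with the stream's Length.
    public static ArraySegment<byte> ValidBytes(MemoryStream ms) =>
        new ArraySegment<byte>(ms.GetBuffer(), 0, (int)ms.Length);
}
```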

Offset and length were out of bounds for the array

My code
private static int readinput(byte[] buff, BinaryReader reader)
{
int size = reader.ReadInt32();
reader.Read(buff, 0, size);
return size;
}
Exception in reader.Read(buff, 0, size);
The exception is "Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection."

Take a step back and think about your code
You've written a method that takes an array of bytes. We don't know how big this array is, because it's controlled by the code calling the method. Let's assume it is 1000 bytes long.
Then you read an int from somewhere else; let's assume 2000 is read.
Then you attempt to read 2000 bytes into an array that can only hold 1000 bytes. You perform no check to make sure your array is big enough, nor do you read in chunks and concatenate them if it isn't.
That's why you get the error you're getting. As for what you should be coding, I think you need to think about that some more: maybe size the buffer according to the size int you read, or read in chunks.
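A sketch of the asker's method with those checks added: it rejects a length prefix that is negative or larger than the caller's buffer, and it keeps calling Read because BinaryReader.Read can return fewer bytes than requested (for example over a NetworkStream). The choice of InvalidDataException and EndOfStreamException, and the InputReader class name, are assumptions rather than anything the question specifies:

```csharp
using System;
using System.IO;

public static class InputReader
{
    // Reads a 4-byte length prefix, then exactly that many bytes into `buff`.
    public static int ReadInput(byte[] buff, BinaryReader reader)
    {
        int size = reader.ReadInt32();
        if (size < 0 || size > buff.Length)
            throw new InvalidDataException(
                $"Length prefix {size} does not fit in a {buff.Length}-byte buffer.");

        int offset = 0;
        while (offset < size)
        {
            int n = reader.Read(buff, offset, size - offset);   // may be a partial read
            if (n == 0)
                throw new EndOfStreamException($"Stream ended after {offset} of {size} bytes.");
            offset += n;
        }
        return size;
    }
}
```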

The buffer buff that you passed into your function to read the data is too small. buff.Length should be bigger than or equal to your variable called size.
Set a breakpoint on "reader.Read(buff, 0, size);" and hover over buff and size and you'll see what I mean.
Make sure that when you call your function, the buff you pass in is of sufficient size. If you don't know what size to create a buffer for ahead of time, then change your function to look something like this:
private static byte[] ReadInput(BinaryReader reader)
{
int size = reader.ReadInt32();
return reader.ReadBytes(size);
}
Especially since you're just reading it into the beginning of a provided buffer anyway.
Summary to frame what you're currently doing:
You provided us a function that takes a BinaryReader (at whatever position it's already at; if it's new, position 0) and reads a 32-bit integer (4 bytes) to figure out the size of the data that follows it. Then you read data of that size into the buffer provided to the function, buff. You need to be sure that, whatever size of data you're going to read, the buffer passed in is large enough. If you make the buffer too large, then "reader.Read(buff, 0, size)" only fills its beginning. So if your intention was just to read the data into a perfectly sized buffer, I suggest using the code above.
Just thought I'd explain it a bit more in case that helps you understand what's going on.
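One caveat with the ReadBytes version: ReadBytes allocates an array of the requested size before reading, so a corrupted or hostile length prefix can trigger a huge allocation, and if the stream ends early it returns a shorter array instead of throwing. A sketch that guards against both, where maxSize is a hypothetical limit chosen by the caller and FramedReader is an illustrative name:

```csharp
using System;
using System.IO;

public static class FramedReader
{
    public static byte[] ReadInput(BinaryReader reader, int maxSize)
    {
        int size = reader.ReadInt32();
        // Validate before ReadBytes allocates anything.
        if (size < 0 || size > maxSize)
            throw new InvalidDataException($"Length prefix {size} is outside 0..{maxSize}.");

        byte[] data = reader.ReadBytes(size);
        // ReadBytes returns fewer bytes, without throwing, if the stream ends early.
        if (data.Length != size)
            throw new EndOfStreamException($"Expected {size} bytes, got {data.Length}.");
        return data;
    }
}
```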

Get Length of Data Available in NetworkStream

I would like to be able to get the length of the data available from a TCP network stream in C#, so I can set the size of the buffer before reading from the network stream. There is a NetworkStream.Length property, but it isn't implemented yet, and I don't want to allocate an enormous buffer as it would take up too much space. The only way I thought of doing it would be to precede the data transfer with another transfer giving the size, but this seems a little messy. What would be the best way for me to go about doing this?

When accessing Streams, you usually read and write data in small chunks (e.g. a kilobyte or so), or use a method like CopyTo that does that for you.
This is an example using CopyTo to copy the contents of a stream to another stream and return it as a byte[] from a method, using an automatically-sized buffer.
using (MemoryStream ms = new MemoryStream())
{
networkStream.CopyTo(ms);
return ms.ToArray();
}
This is code that reads data in the same way, but more manually, which might be better for you to work with, depending on what you're doing with the data:
byte[] buffer = new byte[2048]; // read in chunks of 2KB
int bytesRead;
while((bytesRead = networkStream.Read(buffer, 0, buffer.Length)) > 0)
{
//do something with data in buffer, up to the size indicated by bytesRead
}
(the basis for these code snippets came from Most efficient way of reading data from a stream)

There is no inherent length of a network stream. You will either have to send the length of the data to follow from the other end or read all of the incoming data into a different stream where you can access the length information.
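A minimal sketch of sending the length first, assuming both ends agree on a 4-byte big-endian length header in front of every message (the header format, MessageFraming and its methods are assumptions for illustration). Because a single Read can return part of a header or part of a payload, the reader loops until it has exactly the number of bytes it needs, and maxLength stops a bogus header from causing a huge allocation:

```csharp
#nullable enable
using System;
using System.Buffers.Binary;
using System.IO;

public static class MessageFraming
{
    public static void WriteMessage(Stream stream, byte[] payload)
    {
        var header = new byte[4];
        BinaryPrimitives.WriteInt32BigEndian(header, payload.Length);
        stream.Write(header, 0, header.Length);
        stream.Write(payload, 0, payload.Length);
    }

    // Returns null if the stream ended cleanly before a new header started.
    public static byte[]? ReadMessage(Stream stream, int maxLength)
    {
        var header = new byte[4];
        int got = ReadFully(stream, header, header.Length);
        if (got == 0) return null;                      // peer closed between messages
        if (got < header.Length) throw new EndOfStreamException("Truncated length header.");

        int length = BinaryPrimitives.ReadInt32BigEndian(header);
        if (length < 0 || length > maxLength)
            throw new InvalidDataException($"Message length {length} outside 0..{maxLength}.");

        var payload = new byte[length];
        if (ReadFully(stream, payload, length) < length)
            throw new EndOfStreamException("Truncated message body.");
        return payload;
    }

    // Keeps calling Read until `count` bytes arrive or the stream ends;
    // returns how many bytes were actually read.
    private static int ReadFully(Stream stream, byte[] buffer, int count)
    {
        int offset = 0;
        while (offset < count)
        {
            int n = stream.Read(buffer, offset, count - offset);
            if (n == 0) break;
            offset += n;
        }
        return offset;
    }
}
```

Over a NetworkStream, the receiving side would call ReadMessage in a loop until it returns null, which here means the peer closed the connection between messages.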

The thing is, you can't really be sure that all the data has been read by the socket yet; more data might come in at any time. This is true even if you somehow know how much data to expect, say if you have a packet header that contains the length: the whole packet might not have been received yet.
If you're reading arbitrary data (like a file, perhaps), you should have a buffer of reasonable size (like 1k-10k, or whatever you find optimal for your scenario) and then write the data to a file as it's read from the stream.
var buffer = new byte[1000];
int readBytes;
using (var netstream = GetTheStreamSomhow())
using (var fileStream = GetFileStreamSomeHow())
{
    // Read returns 0 only once the remote side has closed the connection,
    // so this reads until the socket is closed
    while ((readBytes = netstream.Read(buffer, 0, buffer.Length)) > 0)
    {
        // write only the bytes actually read, not the whole buffer
        fileStream.Write(buffer, 0, readBytes);
    }
}
Or just use CopyTo like Tim suggested :) Just make sure that all the data has indeed been read, including data that hasn't gotten across the network yet.

You could send the length of the incoming data first.
For example:
You have data = byte[16] that you want to send. First you send the length, 16, having defined on the server that this length field is always 2 characters long (because "16" has two characters). Now the server knows that incomingLength = 16, and it can wait for incomingLength bytes of data.

When does Socket.Receive return the data?

Beginner question again: kind of a follow-up to a question I asked not long ago.
I am trying to understand this synchronous socket tutorial http://msdn.microsoft.com/en-us/library/6y0e13d3.aspx, particularly one single line in the code below.
QUESTION:
I want to make sure I am understanding the program flow right. When does handler.Receive(bytes) return? Does it return and store the number of bytes received in int bytesRec when it "overflows" and has received more than 1024 bytes? And if that is so, and this might sound silly, what happens if MORE bytes arrive as it is storing the 1024 bytes in the data variable and not listening for more bytes that might be arriving at that time? Or should I not worry about it and let .NET take care of that?
Socket handler = listener.Accept();
data = null;
// An incoming connection needs to be processed.
while (true) {
bytes = new byte[1024];
int bytesRec = handler.Receive(bytes);
// My question is WHEN does the following line
// get to be executed
data += Encoding.ASCII.GetString(bytes,0,bytesRec);
if (data.IndexOf("<EOF>") > -1) {
break;
}
}

When does handler.Receive(bytes) return?
Documentation:
If no data is available for reading, the Receive method will block until data is available, unless a time-out value was set by using Socket.ReceiveTimeout. If the time-out value was exceeded, the Receive call will throw a SocketException. If you are in non-blocking mode, and there is no data available in the protocol stack buffer, the Receive method will complete immediately and throw a SocketException. You can use the Available property to determine if data is available for reading. When Available is non-zero, retry the receive operation.
Does it return and store the number of bytes received in int bytesRec when it "overflows" and has received more than 1024 bytes?
No, it always returns the number of bytes that have been read. If it didn't, how could you know what part of bytes contains meaningful data and what part of it has remained unused?
It is very important to understand how sockets typically work: bytes may be arriving in packets, but as far as the receiver is concerned each byte should be considered independently. This means that there is no guarantee that you will get the bytes in the chunks the sender sent them, and of course there is no guarantee that there is enough data to fill up your buffer.
If you only want to process incoming data in 1024-byte chunks, it is your own responsibility to keep calling Receive until it has released 1024 bytes in total to you.
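A minimal sketch of that responsibility, using the Socket.Receive(byte[], int, int, SocketFlags) overload so each call appends at the current offset. ReceiveExactly and SocketHelpers are hypothetical names. A return value of 0 means the peer has shut down its sending side, so the loop stops waiting and throws:

```csharp
using System;
using System.IO;
using System.Net.Sockets;

public static class SocketHelpers
{
    // Blocks until exactly `count` bytes have been received.
    public static byte[] ReceiveExactly(Socket socket, int count)
    {
        var buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            // Receive returns as soon as some data is available,
            // which may be fewer bytes than requested.
            int received = socket.Receive(buffer, offset, count - offset, SocketFlags.None);
            if (received == 0)
                throw new EndOfStreamException($"Connection closed after {offset} of {count} bytes.");
            offset += received;
        }
        return buffer;
    }
}
```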
And if that is so, and this might sound silly, what happens if MORE bytes arrive as it is storing the 1024 bytes in the data variable and not listening for more bytes that might be arriving at that time?
Let's reiterate that Receive will not store 1024 bytes in the buffer because that's the buffer's size. It will store up to 1024 bytes.
If there is more data buffered internally by the network stack than your buffer can hold then 1024 bytes will be given back to you and the rest will stay in the network stack's buffers until you call Receive again. If Receive has begun copying data to your buffer and at that moment more data is received from the network then most likely what's going to happen is that these will have to wait for the next Receive call.
After all, at no point did anyone provide a guarantee that Receive would give you all the data it can (although of course that is desirable and it is what happens most of the time).

How to "concatenate" or "combine" or "join" a series of 'binarily' serialized byte arrays? [duplicate]

Possible Duplicate:
Is the received stream from a socket limited to a single send command?
Note: I find this question very complicated (hopefully not for you guys; that's why I'm asking here, lol), and I tried my best to explain it as simply and clearly as possible.
In my application, I'm continually receiving byte arrays into a fixed-size buffer.
This series of byte arrays that I'm receiving has been serialized 'binarily'.
However, sometimes the byte array being received will be bigger than the fixed-size buffer, so I need to store the currently received bytes in a container and loop again to receive the remaining bytes coming in.
My question now is how to "concatenate" or "combine" or "join" all the "batches" of byte arrays I received (stored in a container, possibly a queue of byte arrays) to form a single byte array and then deserialize them?
int bytesRead = client.EndReceive(ar);
if (bytesRead > 0)
{
// There might be more data, so store the data received so far.
// If the buffer was not filled, I have to get the number of bytes received as Thorsten Dittmar was saying, before queuing it
dataReceivedQueue.Enqueue(state.buffer);
// Get the rest of the data.
client.BeginReceive(state.buffer, 0, StateObject.BufferSize, 0,
new AsyncCallback(ReceiveCallback_onQuery), state);
}
else
{
// All the data has arrived; put it in response.
response_onQueryHistory = ByteArrayToObject(functionThatCombinesBytes(dataReceivedQueue));
// Signal that all bytes have been received.
receiveDoneQuery.Set();
}
state.buffer is the buffer where data is received. It is a byte array of size 4096. state is of type StateObject.
ByteArrayToObject(byte[]) takes care of deserializing the data received and converting it back to its object form.
functionThatCombinesBytes(Queue) receives a Queue of byte arrays and "combines" all of them into one byte array.

Just because you call BeginReceive with a buffer of a particular size doesn't mean it will fill the buffer entirely, so some of your queued buffers are very likely only partially filled with received data, with the remainder left as zeros. This will almost certainly corrupt your combined stream if you simply concatenate them, since you're not also storing the number of bytes actually read into each buffer. You also appear to be reusing the same buffer each time, so you'll just be overwriting already-read data with new data.
I would therefore suggest replacing your dataReceivedQueue with a MemoryStream, and using something like:
if (bytesRead > 0)
{
// There might be more data, so store the data received so far.
memoryStream.Write(state.buffer, 0, bytesRead);
// Get the rest of the data.
client.BeginReceive(state.buffer, 0, StateObject.BufferSize, 0,
new AsyncCallback(ReceiveCallback_onQuery), state);
}
else
{
// All the data has arrived; put it in response.
response_onQueryHistory = ByteArrayToObject(memoryStream.ToArray());
// Signal that all bytes have been received.
receiveDoneQuery.Set();
}

First of all, Enqueue stores a reference to state.buffer, not a copy of it. So unless your dataReceivedQueue's type implements its own Enqueue that copies the data, every queued entry points to the same array, which is rewritten by each client.BeginReceive call.
You can simply add a MemoryStream member to your StateObject and append bytes to it as they come:
state.rawData.Seek(0, SeekOrigin.End);
state.rawData.Write(state.buffer, 0, bytesRead);

First of all, you need to store not only the byte array but also the number of bytes in it that are actually valid. Each receive may not fully fill the buffer, which is why the number of bytes read is returned (bytesRead in your code).
If you had this, you could calculate the size of the final buffer by summing up the number of received bytes for each "batch".
After that you can - in a loop - use Array.Copy to copy a "batch" to a specified position with a specified length into the target array.
For example, this could look like this:
// Batch is a class that contains the batch byte buffer and the number of bytes valid
int destinationPos = 0;
byte[] destination = new byte[batches.Sum(b => b.ValidLength)]; // total valid bytes (needs System.Linq)
foreach (Batch b in batches)
{
    Array.Copy(b.Bytes, 0, destination, destinationPos, b.ValidLength);
    destinationPos += b.ValidLength; // the next batch starts where this one ended
}
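A self-contained version of the same idea, with the Batch type the answer only describes spelled out as a record, plus a hypothetical Capture helper that copies the valid bytes out of the receive buffer. The copy matters because the question's code reuses state.buffer for every BeginReceive call. Batch, BatchCombiner, Capture and Combine are illustrative names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical batch type: the received bytes plus how many of them are valid.
public sealed record Batch(byte[] Bytes, int ValidLength);

public static class BatchCombiner
{
    // Copy the valid bytes out so later receives into the same buffer
    // cannot overwrite data that has already been queued.
    public static Batch Capture(byte[] receiveBuffer, int bytesRead)
    {
        var copy = new byte[bytesRead];
        Buffer.BlockCopy(receiveBuffer, 0, copy, 0, bytesRead);
        return new Batch(copy, bytesRead);
    }

    public static byte[] Combine(IEnumerable<Batch> batches)
    {
        var list = batches.ToList();                     // enumerate only once
        var destination = new byte[list.Sum(b => b.ValidLength)];
        int destinationPos = 0;
        foreach (Batch b in list)
        {
            Array.Copy(b.Bytes, 0, destination, destinationPos, b.ValidLength);
            destinationPos += b.ValidLength;
        }
        return destination;
    }
}
```

In the receive callback, dataReceivedQueue.Enqueue(BatchCombiner.Capture(state.buffer, bytesRead)) would replace the original Enqueue(state.buffer) call.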
