How to place a delimiter in a NetworkStream byte array?

I'm setting up a way to communicate between a server and a client. At the moment, the first byte of each message indicates what kind of request is coming, and by looking up that request's class I can determine the length of the request:
stream.Read(message, 0, 1);
if (message[0] == <byte representation of a known class>)
{
    stream.Read(message, 0, Class.RequestSize);
}
I'm curious how to handle the case where the class size is not known, or where the data turns out to be corrupt after reading a known request.
I'm thinking that I can insert in some sort of delimiter into the stream, but since a byte can only be between 0-255, I'm not sure how to go about creating a unique delimiter. Do I want to place a pattern into the stream to represent the end of a message? How can I be sure that this pattern is unique enough to not be mistaken for actual data?

There are different approaches to this. One option is to send the length of the class name, and possibly of the whole packet, first (e.g. always in the first byte). That way you can read just that byte and then n more bytes to get the class name.
With this approach you don't end up reading a lot of data that a malicious client sends with the intent to DoS your application, and you can quickly determine whether you have read enough to handle the packet or whether it is not yet complete.
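As a rough sketch of the length-first idea, the following assumes a frame layout of one type byte, a 4-byte big-endian length, and then the payload. The class name Framing, the MaxPayload limit, and the layout itself are illustrative assumptions, not something specified in the question.

```csharp
using System;
using System.IO;

public static class Framing
{
    // Hypothetical upper bound: reject oversized lengths before allocating anything.
    public const int MaxPayload = 64 * 1024;

    // Reads exactly count bytes, looping because Read may return fewer than requested.
    public static byte[] ReadExactly(Stream stream, int count)
    {
        var buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int n = stream.Read(buffer, offset, count - offset);
            if (n == 0) throw new EndOfStreamException("Stream ended mid-message.");
            offset += n;
        }
        return buffer;
    }

    // Assumed layout: [1 byte type][4 bytes big-endian length][payload].
    public static void WriteFrame(Stream stream, byte type, byte[] payload)
    {
        int len = payload.Length;
        stream.WriteByte(type);
        stream.Write(new[] { (byte)(len >> 24), (byte)(len >> 16), (byte)(len >> 8), (byte)len }, 0, 4);
        stream.Write(payload, 0, len);
    }

    public static (byte Type, byte[] Payload) ReadFrame(Stream stream)
    {
        byte[] header = ReadExactly(stream, 5);
        int len = (header[1] << 24) | (header[2] << 16) | (header[3] << 8) | header[4];
        if (len < 0 || len > MaxPayload) throw new InvalidDataException("Bad length: " + len);
        return (header[0], ReadExactly(stream, len));
    }
}
```

Checking the length against a limit before reading the payload is what keeps a hostile or corrupt length value from making the receiver allocate or wait for an absurd amount of data.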

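Some low-level control bytes are meant specifically to be used as delimiters. Start of Text and End of Text have (hex) values of 0x02 and 0x03 respectively, and Start of Heading is paired with End of Transmission, 0x01 and 0x04; you could use these. Bear in mind that they are only unambiguous if the payload itself can never contain those values (plain text, for example); arbitrary binary data would need escaping.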
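If the payload can contain arbitrary bytes, one common way to keep such delimiters unique is byte stuffing. The sketch below is only an illustration: the choice of 0x10 (DLE) as the escape byte and the XOR with 0x20 are assumptions, and the class name DelimitedFrame is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class DelimitedFrame
{
    public const byte Stx = 0x02, Etx = 0x03, Dle = 0x10;

    // Wraps the payload in STX ... ETX, escaping any payload byte that
    // collides with a control value as DLE followed by (byte XOR 0x20).
    public static byte[] Encode(byte[] payload)
    {
        var result = new List<byte> { Stx };
        foreach (byte b in payload)
        {
            if (b == Stx || b == Etx || b == Dle)
            {
                result.Add(Dle);
                result.Add((byte)(b ^ 0x20));
            }
            else
            {
                result.Add(b);
            }
        }
        result.Add(Etx);
        return result.ToArray();
    }

    // Reverses Encode for one complete frame.
    public static byte[] Decode(byte[] frame)
    {
        if (frame.Length < 2 || frame[0] != Stx || frame[frame.Length - 1] != Etx)
            throw new FormatException("Missing STX/ETX.");
        var result = new List<byte>();
        for (int i = 1; i < frame.Length - 1; i++)
        {
            byte b = frame[i];
            if (b == Dle)
            {
                if (++i >= frame.Length - 1) throw new FormatException("Dangling escape.");
                b = (byte)(frame[i] ^ 0x20);
            }
            result.Add(b);
        }
        return result.ToArray();
    }
}
```

After encoding, 0x02 and 0x03 only ever appear as the first and last byte of a frame, so a receiver can scan for them safely.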

Related

Creating an Integer Format?

I'm venturing into networking using C# and I'm trying to create a clean way to send my packets. For now I'm not going to worry about enclosing packets in special characters, as I've been reading about; instead, each packet starts with a three-digit number at the front of the data passed to the client. For example, a data string may be the following.
LoginPacket is packet 000.
LoginPacket Data would be "000Username~Password"
I've tried to clean this up so that I could write something like this:
SendPacket(000, new string[] { "data", "parameters" });
However, the integer 000 is immediately converted to zero.
Is there a way around this, or would I be better off storing it all in a string, such as
SendPacket(new string[] { "000", "data", "params" });

When you convert the number to text you need to specify the number of digits. The number 000 and 0 are both zero. However the string "000" and "0" are different strings.
Use
n.ToString("000");
To ensure you get three digits.
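A small sketch of how that could fit the packet format from the question. The PacketText class and its Build and ParseId methods are hypothetical helper names, and the "~" separator is taken from the question's example.

```csharp
using System;
using System.Globalization;

public static class PacketText
{
    // Builds "NNN" + fields joined by '~', e.g. "000Username~Password".
    public static string Build(int packetId, params string[] fields)
    {
        if (packetId < 0 || packetId > 999)
            throw new ArgumentOutOfRangeException(nameof(packetId));
        return packetId.ToString("000", CultureInfo.InvariantCulture) + string.Join("~", fields);
    }

    // Reads the three-digit id back from the front of a packet string.
    public static int ParseId(string packet)
    {
        return int.Parse(packet.Substring(0, 3), CultureInfo.InvariantCulture);
    }
}
```

With this, PacketText.Build(0, "Username", "Password") produces "000Username~Password", so the id can stay an int everywhere except at the point where the text is built.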

I would suggest you go with a Command Type followed by a Length followed by the Payload
Then your payloads can be of a similar structure.. so the login command (0) would use a structure that began with a length byte followed by username followed by a second length byte and finally followed by a password.
For example:
0155Dan-o8password
Remember that this all comes over the wire as a byte array.. so you read the first 4 bytes (Int32, Command Type)
Then read the next Int32 to figure out the length of the payload.. that's how many bytes you will read in your third read.
Now that you know the Command is login you can implement login-specific reading.
In addition I would suggest you create some extension methods to make this easier.
like: Stream.ReadByte, Stream.ReadInt32, Stream.ReadInt64, Stream.ReadString(length)
Then some application-specific extensions.. like Stream.ReadLogin
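The extension methods could look roughly like the sketch below. The byte order (little-endian), the UTF-8 encoding, and the ReadBytes helper are assumptions made for the example; ReadLogin follows the login structure described above (length byte, username, length byte, password).

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamExtensions
{
    // Reads exactly count bytes; a single Read call may return fewer.
    public static byte[] ReadBytes(this Stream stream, int count)
    {
        var buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int n = stream.Read(buffer, offset, count - offset);
            if (n == 0) throw new EndOfStreamException();
            offset += n;
        }
        return buffer;
    }

    // Assumes little-endian Int32 on the wire.
    public static int ReadInt32(this Stream stream)
    {
        byte[] b = stream.ReadBytes(4);
        return b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24);
    }

    // Assumes UTF-8 text; length is in bytes.
    public static string ReadString(this Stream stream, int length)
    {
        return Encoding.UTF8.GetString(stream.ReadBytes(length));
    }

    // Application-specific: [length byte][username][length byte][password].
    public static (string User, string Password) ReadLogin(this Stream stream)
    {
        string user = stream.ReadString(stream.ReadBytes(1)[0]);
        string password = stream.ReadString(stream.ReadBytes(1)[0]);
        return (user, password);
    }
}
```

A caller would then read the command type with stream.ReadInt32(), the payload length with a second stream.ReadInt32(), and dispatch to stream.ReadLogin() when the command is the login command.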

How do you account for TCP not delivering all the bytes in one read?

I just read an article that says a read on a TcpClient's NetworkStream may not return all the sent bytes in one call. How do you account for this?
For example, the server can write a string to the TCP stream. The client reads half of the string's bytes, and then reads the other half in another read call.
How do you know when you need to combine the byte arrays received in both calls?

how do you know when you need to combine the byte arrays received in both calls?
You need to decide this at the protocol level. There are four common models:
Close-on-finish: each side can only send a single "message" per connection. After sending the message, they close the sending side of the socket. The receiving side keeps reading until it reaches the end of the stream.
Length-prefixing: Before each message, include the number of bytes in the message. This could be in a fixed-length format (e.g. always 4 bytes) or some compressed format (e.g. 7 bits of size data per byte, top bit set for the final byte of size data). Then there's the message itself. The receiving code will read the size, then read that many bytes.
Chunking: Like length-prefixing, but in smaller chunks. Each chunk is length-prefixed, with a final chunk indicating "end of message"
End-of-message signal: Keep reading until you see the terminator for the message. This can be a pain if the message has to be able to include arbitrary data, as you'd need to include an escaping mechanism in order to represent the terminator data within the message.
Additionally, less commonly, there are protocols where each message is always a particular size - in which case you just need to keep going until you've read that much data.
In all of these cases, you basically need to loop, reading data into some sort of buffer until you've got enough of it, however you determine that. You should always use the return value of Read to note how many bytes you actually read, and always check whether it's 0, in which case you've reached the end of the stream.
Also note that this doesn't just affect network streams - for anything other than a local MemoryStream (which will always read as much data as you ask for in one go, if it's in the stream at all), you should assume that data may only become available over the course of multiple calls.
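To make the "compressed" length prefix mentioned above concrete, here is a sketch of one possible encoding: 7 bits of size per byte, least significant group first, with the top bit set on the final byte. The group order and the CompactLength class name are assumptions; note that BinaryWriter's built-in 7-bit encoding uses the opposite convention for the top bit.

```csharp
using System;
using System.IO;

public static class CompactLength
{
    // Writes value as 7-bit groups, low group first; top bit marks the final byte.
    public static void Write(Stream stream, int value)
    {
        if (value < 0) throw new ArgumentOutOfRangeException(nameof(value));
        while (value >= 0x80)
        {
            stream.WriteByte((byte)(value & 0x7F)); // more groups follow: top bit clear
            value >>= 7;
        }
        stream.WriteByte((byte)(value | 0x80));     // final group: top bit set
    }

    // Reads groups until one has the top bit set, or the stream ends.
    public static int Read(Stream stream)
    {
        int value = 0;
        for (int shift = 0; shift < 35; shift += 7)
        {
            int b = stream.ReadByte();
            if (b < 0) throw new EndOfStreamException("Stream ended inside length prefix.");
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) != 0)
            {
                if (value < 0) throw new InvalidDataException("Length prefix overflow.");
                return value;
            }
        }
        throw new InvalidDataException("Length prefix too long.");
    }
}
```

Small messages then cost a single prefix byte, while the reader still knows exactly how many payload bytes to loop for afterwards.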

You should call Read() in a loop whose condition checks whether there is still data available to be read.

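That is kind of hard to answer, because you can never know when data will arrive, and that's why I usually use a separate thread for receiving data in my chat program. But you should be able to use something similar to this:
// Assumes myNetworkStream, myReadBuffer (byte[]), myCompleteMessage (StringBuilder)
// and numberOfBytesRead (int) are declared elsewhere.
do
{
    numberOfBytesRead = myNetworkStream.Read(myReadBuffer, 0, myReadBuffer.Length);
    myCompleteMessage.Append(Encoding.ASCII.GetString(myReadBuffer, 0, numberOfBytesRead));
}
// DataAvailable only says whether bytes are buffered right now, not whether the message is complete.
while (myNetworkStream.DataAvailable);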
Look at this source!

Fragmented length prefix causes next data read from buffer use incorrect message length

I'm one of those people who come here to find answers to questions that others have asked, and I think I have never asked anything myself, but after two days of searching unsuccessfully I decided that it's time to ask. So here it is...
I have a TCP server and client written in C#, .NET 4, asynchronous sockets using SocketAsyncEventArgs. I have a length-prefixed message framing protocol. Overall everything works just fine, but one issue keeps bugging me.
Situation is like this (I will use small numbers just as an example):
Let's say the server has a send buffer length of 16 bytes.
It sends a message which is 6 bytes long, and prefixes it with a 4-byte length prefix. The total message length is 6+4=10.
The client reads the data and receives a buffer 16 bytes long (yes, 10 bytes of data and 6 bytes equal to zero).
Received buffer looks like this: 6 0 0 0 56 21 33 1 5 7 0 0 0 0 0 0
So I read the first 4 bytes, which are my length prefix, and determine that my message is 6 bytes long. I read it as well, and everything is fine so far. Then I have 16-10=6 bytes left to read. All of them are zeroes. I read 4 of them, since that is my length prefix. So it's a zero-length message, which is allowed as a keep-alive packet.
Remaining data to read: 0 0
Now the issue "kicks in". I got only 2 remaining bytes to read, they are not enough to complete a 4 byte-long length prefix buffer. So I read those 2 bytes, and wait for more incoming data. Now server is not aware that I'm still reading length prefix (I'm just reading all those zeroes in the buffer) and sends another message correctly prefixed with 4 bytes. And the client is assuming the server sends those missing 2 bytes. I receive the data on the client side, and read first two bytes to form a complete 4 byte length buffer. The results are something like that
lengthBuffer = new byte[4]{0, 0, 42, 0}
Which then translates into 2752512 message length. So my code will continue to read next 2752512 bytes to complete the message...
In every single message framing example I have seen, zero-length messages are supported as keep-alives. And no example I've seen does anything more than I do. The problem is that I do not know how much data I have to read when I receive it from the server. Since I have a buffer partially filled with zeroes, I have to read it all, as those zeroes could be keep-alives I sent from the other end of the connection.
I could drop zero-length messages and stop reading the buffer after first empty message and it should fix this issue, and use custom messages for my keep-alive mechanism. But I want to know if I am missing something, or doing something wrong, since every code example I've seen seems to have same issue (?)
UPDATE
Marc Gravell, you, sir, took the words right out of my mouth. I was about to update that the issue is with sending the data. The problem is that, when initially exploring .NET Sockets and SocketAsyncEventArgs, I came across this sample: http://archive.msdn.microsoft.com/nclsamples/Wiki/View.aspx?title=socket%20performance
It uses a reusable pool of buffers. It simply takes a predefined maximum number of client connections, for example 10, takes a maximum single buffer size, for example 512, and creates one large buffer for all of them. So 512 * 10 * 2 (for send and receive) = 10240.
So we have byte[] buff = new byte[10240];
Then for each client that connects it assigns a piece of this large buffer. The first connected client gets the first 512 bytes for data reading operations, and gets the next 512 bytes (offset 512) for data sending operations. Therefore the code ended up with an already allocated send buffer whose size is 512 (exactly the number the client later receives as BytesTransferred). This buffer is populated with data, and all the remaining space out of these 512 bytes is sent as zeroes.
Strangely enough, this example is from MSDN. The reason there is a single huge buffer is to avoid fragmenting heap memory, when a buffer gets pinned and the GC can't collect it, or something like that.
Comment from BufferManager.cs in the provided example (see link above):
This class creates a single large buffer which can be divided up and
assigned to SocketAsyncEventArgs objects for use with each socket I/O
operation. This enables bufffers to be easily reused and gaurds
against fragmenting heap memory.
So the issue is pretty much clear. Any suggestions on how I should resolve this are welcome :) Is it true what they say about fragmented heap memory, is it OK to create a data buffer "on the fly"? If so, will I have memory issues when the server scales to a few hundred or even thousands of clients?

I guess the problem is that you are treating the trailing zeros in the buffer you read as data. This is not data. It is garbage. No one ever sent it to you.
The Stream.Read call returns you the number of bytes actually read. You should not interpret the rest of the buffer in any way.
The problem is that I do not know how much data I have to read when I
receive it from the server.
Yes, you do: Use the return value from Stream.Read.
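As an illustration of that point, the sketch below accumulates only the bytes a read actually returned and extracts complete length-prefixed messages from them, so a prefix split across two reads is handled naturally. The LengthPrefixedParser class is a hypothetical helper; the 4-byte little-endian prefix matches the buffer shown in the question.

```csharp
using System;
using System.Collections.Generic;

public sealed class LengthPrefixedParser
{
    // Bytes received so far that do not yet form a complete message.
    private readonly List<byte> pending = new List<byte>();

    // Feed only what the read returned: buffer[0..count). The rest of the
    // buffer is not data and is never looked at.
    public List<byte[]> Feed(byte[] buffer, int count)
    {
        for (int i = 0; i < count; i++) pending.Add(buffer[i]);

        var messages = new List<byte[]>();
        while (pending.Count >= 4)
        {
            int length = pending[0] | (pending[1] << 8) | (pending[2] << 16) | (pending[3] << 24);
            if (length < 0) throw new InvalidOperationException("Negative length prefix.");
            if (pending.Count - 4 < length) break; // message incomplete: wait for more bytes
            messages.Add(pending.GetRange(4, length).ToArray());
            pending.RemoveRange(0, 4 + length);
        }
        return messages;
    }
}
```

With the question's 16-byte buffer and a returned count of 10, Feed yields the single 6-byte message and leaves nothing pending, so the six trailing zeroes are never mistaken for keep-alives.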

That sounds simply like a bug in either your send or receive code. You should only get BytesTransferred as the data that was actually sent, or some number smaller than that if arriving in fragments. The first thing I would wonder is: did you setup the send correctly? i.e. if you have an oversized buffer, a correct implementation might look like:
args.SetBuffer(buffer, 0, actualBytesToSend);
if (!socket.SendAsync(args)) { /* whatever */ }
where actualBytesToSend can be much less than buffer.Length. My initial suspicion is that
you are doing something like:
args.SetBuffer(buffer, 0, buffer.Length);
and therefore sending more data than you have actually populated.
I should emphasize: there is something wrong in either your send or receive; I do not believe, at least without an example, that there is some fundamental underlying bug in the BCL here - I use the async API extensively, and it works fine - but you do need to accurately track the data you are sending and receiving at all points.

"Now server is not aware that I'm still reading length prefix (I'm just reading all those zeroes in the buffer) and sends another message correctly prefixed with 4 bytes.".
Why? How does the server know what you are and aren't reading? If the server retransmits any part of a message it is in error. TCP already does that for you.
There seems to be something radically wrong with your server.

C# best way to differentiate socket messages

I'm new to sockets, and I'm creating an online tic-tac-toe game. I know how to make the connections between the clients and the server, but I will add a chat too.
So I'm doing this: when a user chats, I send a message with a prefix, "CHAT: HELLO WORLD",
and when a user makes a move, I send a message without the prefix. Is this the best way?
Thanks!

In defining a wire protocol over a stream-based protocol like TCP, you have a few options for constructing messages:
Fixed-length
All messages are the same length; every sequence of x bytes represents a new message.
Length-prefixed (variable length)
The first byte(s) of the message represent the length of the payload to follow.
String-terminated (variable length)
Read bytes from the stream until you come to a specified byte-string that represents the end of a message, e.g. the newline character \n.
If you ever intend on changing the protocol (protip: you will, even if you don't think you will), it is crucial that you include an identifier for the protocol version in each message to prevent issues when dealing with clients using an older iteration of the protocol. Clearly, this is the first thing you must determine before deciphering the rest of the payload, so this should be the first byte(s) of the message (following any length-prefix) - how could we determine the version if we don't know where it is located in every message we receive?

Typically you would go with a format that includes a packet length, type and payload.
In your case you could go with a Byte (type), Int16 (length), Byte[] (payload).
The type can be represented in code as an enum. Length would just represent the length of the payload.
public enum PacketType : byte
{
    PlayerMove = 1,
    PlayerChat = 2
}
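A sketch of how the type, length and payload could be laid out on the wire. The big-endian byte order, the treatment of the 16-bit length as unsigned, and the Packet class with its Encode and TryDecode methods are assumptions made for the example.

```csharp
using System;

public enum PacketType : byte
{
    PlayerMove = 1,
    PlayerChat = 2
}

public static class Packet
{
    // Layout (assumed): [1 byte type][2 bytes big-endian payload length][payload].
    public static byte[] Encode(PacketType type, byte[] payload)
    {
        if (payload.Length > ushort.MaxValue) throw new ArgumentException("Payload too large.");
        var packet = new byte[3 + payload.Length];
        packet[0] = (byte)type;
        packet[1] = (byte)(payload.Length >> 8);
        packet[2] = (byte)payload.Length;
        Buffer.BlockCopy(payload, 0, packet, 3, payload.Length);
        return packet;
    }

    // Returns false when the first count bytes of data do not yet hold a whole packet.
    public static bool TryDecode(byte[] data, int count, out PacketType type, out byte[] payload)
    {
        type = 0;
        payload = null;
        if (count < 3) return false;
        int length = (data[1] << 8) | data[2];
        if (count < 3 + length) return false;
        type = (PacketType)data[0];
        payload = new byte[length];
        Buffer.BlockCopy(data, 3, payload, 0, length);
        return true;
    }
}
```

TryDecode returning false is the signal to keep reading from the socket and try again once more bytes have arrived.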

You need to define a protocol. Remember to allow room for additional features :-).
E.g. using regular expressions over complete lines (each ending with the chosen line terminator):
Matching ^:[a-c][1-3]: is a move (colon, position, colon, user name).
Matching ^!.*?: is a chat message (exclamation point, name, colon, text).
Anything else (in V1) is an error.
Remember:
Data is sent in packets, so you might need multiple reads from the socket to get a complete message.
Avoid ambiguity: resolving whether something might be x or y is hard.
Specify a text encoding (eg. UTF-8).
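A sketch of how such line patterns might be applied once a complete line has been received. The capture groups, the LineKind enum, and the LineProtocol.Classify method are additions for illustration; only the two patterns come from the description above.

```csharp
using System;
using System.Text.RegularExpressions;

public enum LineKind { Move, Chat, Error }

public static class LineProtocol
{
    // ":b2:alice"  -> move at position b2 by user alice
    private static readonly Regex MovePattern = new Regex(@"^:([a-c][1-3]):(.*)$");
    // "!bob:hello" -> chat from bob with text "hello"
    private static readonly Regex ChatPattern = new Regex(@"^!(.*?):(.*)$");

    // Classifies one complete line (without its terminator) and extracts its two fields.
    public static LineKind Classify(string line, out string first, out string second)
    {
        Match m = MovePattern.Match(line);
        LineKind kind = LineKind.Move;
        if (!m.Success)
        {
            m = ChatPattern.Match(line);
            kind = LineKind.Chat;
        }
        if (!m.Success)
        {
            first = second = null;
            return LineKind.Error;
        }
        first = m.Groups[1].Value;
        second = m.Groups[2].Value;
        return kind;
    }
}
```

Because the two patterns start with different characters, no line can match both, which is the kind of ambiguity the list above warns against.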

I assume you're using TCP?
You need to make sure you 'frame' both messages so you can identify them and also avoid potential blocking issues (in case the client stops sending while you are still expecting to read CHAT: or whatever you define). With TCP your byte order is guaranteed but reading does not guarantee a complete 'packet' so you'll need to implement some way of building up a buffer and identifying when your 'message' is complete.
A reasonably simple way of doing this is to make sure each 'message' has a header with the type and size specified.
EG:
Enumerate your message types (move and chat currently), so say 'chat' is 0x01 and your message is 1020 bytes. You can prefix your 'message' with 0x0103FC so the server knows how many bytes to expect, and build up a buffer using async socket calls until the 1020 bytes are read (or you arbitrarily decide that the client is not sending anymore)

SslStream equivalent of TcpClient.Available?

Based on the advice of @Len-Holgate in this question, I'm asynchronously requesting 0-byte reads and, in the callback, accepting the available bytes with synchronous reads, since I know the data is available and the reads won't block. This seems so efficient and wonderful.
But then I add the option for SslStream, and the approach falls apart. The zero-byte read is fine, but the SslStream decrypts the bytes, leaving a zero byte-count in the TcpClient's buffer (appropriately so), and I cannot determine how many bytes are now in the SslStream available for reading.
Is there a simple trick around this?
Some code, just for context:
sslStream.BeginRead(this.zeroByteBuffer, 0, 0, DataAvailable, this);
And after the EndRead() ( which correctly returns 0 ), DataAvailable contains:
// by now this is 0, because sslStream has already consumed the bytes
available = myTcpClient.Available;
if (0 < available) // Never occurs
{
// this part can be distractingly complicated, but
// it's based on the available byte count
sslStream.Read(...);
}
And due to the protocol, I need to evaluate byte-by-byte and decode variable byte-width unicode and stuff. I don't want to have to read byte-by-byte asynchronously!

If I understood correctly, your messages are delimited by a certain character, and you are already using a StringBuilder to cover the case when a message is fragmented into multiple pieces.
You could consider ignoring the delimiter while reading: append any data to the StringBuilder as it becomes available, and then inspect the StringBuilder for the delimiter character. When one is found, you can extract a single message using sb.ToString(0, delimiterIndex) followed by sb.Remove(0, delimiterIndex + 1) (the + 1 drops the delimiter itself), repeating until no delimiters remain.
This would also cover the case when two messages arrive in a single read.
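A minimal sketch of that buffering approach, assuming the text has already been decoded into strings. The DelimitedTextBuffer class is a hypothetical helper, and the delimiter is whatever character the protocol uses.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public sealed class DelimitedTextBuffer
{
    private readonly StringBuilder sb = new StringBuilder();
    private readonly char delimiter;

    public DelimitedTextBuffer(char delimiter)
    {
        this.delimiter = delimiter;
    }

    // Appends newly received text and returns every complete message it now contains.
    public List<string> Append(string chunk)
    {
        sb.Append(chunk);
        var messages = new List<string>();
        int index;
        while ((index = IndexOfDelimiter()) >= 0)
        {
            messages.Add(sb.ToString(0, index));
            sb.Remove(0, index + 1); // + 1 also removes the delimiter itself
        }
        return messages;
    }

    private int IndexOfDelimiter()
    {
        for (int i = 0; i < sb.Length; i++)
        {
            if (sb[i] == delimiter) return i;
        }
        return -1;
    }
}
```

Any text after the last delimiter simply stays in the buffer until a later Append completes it.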
