var buffer = new byte[short.MaxValue];
var splitString = new string[] {"\r\n"};
while (_tcpClient.Connected)
{
if (!_networkStream.CanRead || !_networkStream.DataAvailable)
continue;
var bytesRead = _networkStream.Read(buffer, 0, buffer.Length);
var stringBuffer = Encoding.ASCII.GetString(buffer, 0, bytesRead);
var messages =
stringBuffer.Split(splitString, StringSplitOptions.RemoveEmptyEntries);
foreach (var message in messages)
{
if (MessageReceived != null)
{
MessageReceived(this, new SimpleTextClientEventArgs(message));
}
}
}
The problem is that even with a buffer as big as short.MaxValue, a single read can fill the buffer completely. When you split the string created from the buffer, the last message gets cut off, and the rest of it arrives with the next read.
I was thinking of creating a buffer large enough for a single line (which according to RFC2812 is 512 chars), extracting a substring up until the first "\r\n", then array-copying the rest of the data to the beginning of the buffer and using the offset parameter to read more data onto the end of the data that wasn't extracted last iteration. Sorry if that was hard to follow...
Is that the best solution, or am I missing the obvious here?
You're dealing with TCP/IP, which means you're dealing with stream data. You must not rely on how the data comes in terms of whether one call to Read will give you the whole of the data or not. In a case like this, you probably want to just keep reading (it will block until there's some data) and convert the binary data into a text buffer. When you see a line terminator in the text buffer, you can notify the higher level of that message, and remove it from the buffer... but don't assume anything about what comes after that message. You may well still have more data to read.
As a side-note, is IRC really only ASCII? If so, that at least makes things a bit simpler...
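The approach described above can be sketched as a small accumulator that keeps any unfinished tail between reads. This is only a minimal sketch: the class name LineAccumulator and its Append method are hypothetical, and it assumes a single-byte encoding such as ASCII, so that no character can be split across two reads.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not from the thread): accumulates decoded text and
// hands back only complete, terminator-delimited lines.
public sealed class LineAccumulator
{
    private readonly StringBuilder _pending = new StringBuilder();
    private readonly string _terminator;

    public LineAccumulator(string terminator = "\r\n")
    {
        _terminator = terminator;
    }

    // Appends the bytes from one Read call and returns every line completed so far.
    // Assumption: ASCII input, so a character never straddles two reads.
    public List<string> Append(byte[] buffer, int count)
    {
        _pending.Append(Encoding.ASCII.GetString(buffer, 0, count));
        var lines = new List<string>();
        string text = _pending.ToString();
        int start = 0;
        int end;
        while ((end = text.IndexOf(_terminator, start, StringComparison.Ordinal)) >= 0)
        {
            if (end > start) // skip empty lines
                lines.Add(text.Substring(start, end - start));
            start = end + _terminator.Length;
        }
        // Keep the unfinished tail for the next call.
        _pending.Clear();
        _pending.Append(text, start, text.Length - start);
        return lines;
    }
}
```

In the read loop, you would call Append(buffer, bytesRead) after each _networkStream.Read and raise MessageReceived once per returned line.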
So here's how I ended up solving it:
var buffer = new byte[Resources.MaxBufferSize];
var contentLength = 0;
while (_tcpClient.Connected)
{
if (!_networkStream.CanRead || !_networkStream.DataAvailable)
continue;
var bytesRead = _networkStream.Read(buffer, contentLength, buffer.Length - contentLength - 1);
contentLength += bytesRead;
var message = string.Empty;
do
{
message = ExtractMessage(ref buffer, ref contentLength);
if (!String.IsNullOrEmpty(message))
{
if (MessageReceived != null)
{
MessageReceived(this, new SimpleTextClientEventArgs(message));
}
}
} while (message != string.Empty);
}
private string ExtractMessage(ref byte[] buffer, ref int length)
{
var message = string.Empty;
var stringBuffer = Encoding.UTF8.GetString(buffer, 0, length);
var lineBreakPosition = stringBuffer.IndexOf(Resources.LineBreak);
if (lineBreakPosition > -1)
{
message = stringBuffer.Substring(0, lineBreakPosition);
var tempBuffer = new byte[Resources.MaxBufferSize];
length = length - message.Length - Resources.LineBreak.Length;
if (length > 0)
{
Array.Copy(buffer, lineBreakPosition + Resources.LineBreak.Length, tempBuffer, 0, length);
buffer = tempBuffer;
}
}
return message;
}
What would be the most optimal/fastest way to split a Stream into chunks delimited by a byte pattern (e.g. new byte[] { 0, 0 })?
My current, naive and slow, implementation reads the stream byte by byte and decrements a counter each time it encounters the delimiter. If the counter reaches zero, it yields a memory chunk.
const int NUMBER_CONSECUTIVE_DELIMITER = 2;
const int DELIMITER = 0;
public IEnumerable<ReadOnlyMemory<byte>> Chunk(Stream stream)
{
var chunk = new MemoryStream();
try
{
int b; //the byte being read
int c = NUMBER_CONSECUTIVE_DELIMITER;
while ((b = stream.ReadByte()) != -1) //Read the stream byte by byte, -1 = end of the stream
{
chunk.WriteByte((byte)b); //Write this byte to the next chunk
if (b == DELIMITER)
c--; //if we hit the delimiter (ie '0') decrement the counter
else
c = NUMBER_CONSECUTIVE_DELIMITER; //else, reset the counter
if (c <= 0 || stream.Position == stream.Length) //we hit two subsequent '0's or the end of the stream
{
var r = chunk.ToArray().AsMemory(); //parse it to a Memory<T>
chunk.Dispose();
chunk = new();
yield return r;
}
}
}
finally
{
chunk.Dispose();
}
}
Such an algorithm is difficult to implement by hand, because a stream has to be read in fixed-size buffers, and a buffer can be too big or too small for the content being interpreted: a chunk, or even the delimiter itself, may straddle two reads. The ReadOnlySequence<T> struct was added to solve this problem. More information about this topic can be seen here.
With System.IO.Pipelines (available as a NuGet package), the problem can be solved as follows:
public static async Task FillPipeAsync(Stream stream, PipeWriter writer, CancellationToken cancellationToken = default)
{
// The minimum buffer size that is used for the current buffer segment.
const int bufferSize = 65536;
while (true)
{
// Request 65536 bytes from the PipeWriter.
Memory<byte> memory = writer.GetMemory(bufferSize);
// Read the content from the stream.
int bytesRead = await stream.ReadAsync(memory, cancellationToken).ConfigureAwait(false);
if (bytesRead == 0) break;
// Tell the writer how many bytes are read.
writer.Advance(bytesRead);
// Flush the data to the PipeWriter.
FlushResult result = await writer.FlushAsync(cancellationToken).ConfigureAwait(false);
if (result.IsCompleted) break;
}
// This enables our reading process to be notified that no more new data is coming.
await writer.CompleteAsync().ConfigureAwait(false);
}
This will read your stream asynchronously and write a buffer segment to the pipe. Next you have to implement a read logic to slice/merge the concatenated buffer segments into chunks:
public static async IAsyncEnumerable<ReadOnlySequence<byte>> ReadPipeAsync(PipeReader reader, ReadOnlyMemory<byte> delimiter,
[EnumeratorCancellation] CancellationToken cancellationToken = default)
{
while (true)
{
// Read from the PipeReader.
ReadResult result = await reader.ReadAsync(cancellationToken).ConfigureAwait(false);
ReadOnlySequence<byte> buffer = result.Buffer;
while (TryReadChunk(ref buffer, delimiter.Span, out ReadOnlySequence<byte> chunk))
yield return chunk;
// Tell the PipeReader how many bytes are read.
// This is essential because the Pipe will release buffer segments that are no longer in use.
reader.AdvanceTo(buffer.Start, buffer.End);
// Take care of the complete notification and return the last buffer. UPDATE: Corrected issue 2/.
if (result.IsCompleted)
{
yield return buffer;
break;
}
}
await reader.CompleteAsync().ConfigureAwait(false);
}
private static bool TryReadChunk(ref ReadOnlySequence<byte> buffer, ReadOnlySpan<byte> delimiter,
out ReadOnlySequence<byte> chunk)
{
// Search the buffer for the first byte of the delimiter.
SequencePosition? position = buffer.PositionOf(delimiter[0]);
// If no occurrence was found or the next bytes of the data in the buffer do not match the delimiter, return false.
// UPDATE: Corrected issue 3/.
if (position is null || !buffer.Slice(position.Value, delimiter.Length).FirstSpan.StartsWith(delimiter))
{
chunk = default;
return false;
}
// Return the calculated chunk and update the buffer to cut the start.
chunk = buffer.Slice(0, position.Value);
buffer = buffer.Slice(buffer.GetPosition(delimiter.Length, position.Value));
return true;
}
For this to work in that form you have to use an IAsyncEnumerable so that the chunks can be streamed into a foreach loop. Merging and slicing is largely handled by the pipe, so that a reliable algorithm can be built here with relatively little code. This code will also handle this in a high-performance manner.
Usage:
// Create a Pipe that manages the buffer.
Pipe pipe = new Pipe();
ConfiguredTaskAwaitable writing = FillPipeAsync(stream, pipe.Writer).ConfigureAwait(false);
// The delimiter that should be used. This can be any data with length > 0.
ReadOnlyMemory<byte> delimiter = new ReadOnlyMemory<byte>(new byte[] { 0, 0 });
// 'await foreach' and 'await writing' are executed asynchronously (in parallel).
await foreach (ReadOnlySequence<byte> chunk in ReadPipeAsync(pipe.Reader, delimiter))
{
// Use "chunk" to retrieve your chunked content.
};
await writing;
Note that reading and chunking is done asynchronously and independently.
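One caveat with the TryReadChunk helper above: it only inspects the first occurrence of the delimiter's first byte and compares against FirstSpan, so it can miss a delimiter that follows a false first-byte hit or that straddles two buffer segments. A possible alternative, sketched here under the assumption that SequenceReader<byte> from System.Buffers is available (.NET Core 3.0 or later), lets the reader search for the whole delimiter. The class name ChunkParser is hypothetical.

```csharp
using System;
using System.Buffers;

public static class ChunkParser
{
    // Sketch of an alternative TryReadChunk: SequenceReader<byte> looks for the
    // complete delimiter rather than only its first byte.
    public static bool TryReadChunk(ref ReadOnlySequence<byte> buffer,
        ReadOnlySpan<byte> delimiter, out ReadOnlySequence<byte> chunk)
    {
        var reader = new SequenceReader<byte>(buffer);
        if (!reader.TryReadTo(out chunk, delimiter, advancePastDelimiter: true))
        {
            chunk = default;
            return false;
        }
        // Drop the chunk and its delimiter from the front of the buffer.
        buffer = buffer.Slice(reader.Position);
        return true;
    }
}
```

It has the same signature as the original helper, so ReadPipeAsync could call it unchanged; as before, the returned chunk does not include the delimiter.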
I eventually ended up with the below code, strongly inspired by Philipp's answer above and https://keestalkstech.com/2010/11/seek-position-of-a-string-in-a-file-or-filestream/.
public override IEnumerable<byte[]> Chunk(Stream stream)
{
var buffer = new byte[bufferSize];
var size = bufferSize;
var offset = 0;
var position = stream.Position;
var nextChunk = Array.Empty<byte>();
while (true)
{
var bytesRead = stream.Read(buffer, offset, size);
// when no bytes are read -- the string could not be found
if (bytesRead <= 0)
break;
// when less than size bytes are read, we need to slice the buffer to prevent reading of "previous" bytes
ReadOnlySpan<byte> ro = buffer;
if (bytesRead < size)
ro = ro.Slice(0, offset + bytesRead);
// check if we can find our search bytes in the buffer
var i = ro.IndexOf(Delimiter);
if (i > -1 && // we found something
i <= bytesRead && //i <= r -- we found something in the area that was read (at the end of the buffer, the last values are not overwritten). i = r if the delimiter is at the end of the buffer
nextChunk.Length + (i + Delimiter.Length - offset) >= MinChunkSize) //the size of the chunk that will be made is large enough
{
var chunk = buffer[offset..(i + Delimiter.Length)];
yield return Concat(nextChunk, chunk);
nextChunk = Array.Empty<byte>();
offset = 0;
size = bufferSize;
position += i + Delimiter.Length;
stream.Position = position;
continue;
}
else if (stream.Position == stream.Length)
{
// we're at the end of the stream
var chunk = buffer[offset..(bytesRead + offset)]; //return the bytes read
yield return Concat(nextChunk, chunk);
break;
}
// the stream is not finished. Copy the last 2 bytes to the beginning of the buffer and set the offset to fill the buffer as of byte 3
nextChunk = Concat(nextChunk, buffer[offset..buffer.Length]);
offset = Delimiter.Length;
size = bufferSize - offset;
Array.Copy(buffer, buffer.Length - offset, buffer, 0, offset);
position += bufferSize - offset;
}
}
Sorry for such a vague title; I really don't know what to title this issue.
Basically, when I get a stream that's chunked, as indicated by Transfer-Encoding, I run the following code:
private IEnumerable<byte[]> ReceiveMessageBodyChunked() {
readChunk:
#region Read a line from the Stream which should be a Block Length (Chunk Body Length)
string blockLength = _receiverHelper.ReadLine();
#endregion
#region If the end of the block is reached, re-read from the stream
if (blockLength == Http.NewLine) {
goto readChunk;
}
#endregion
#region Trim it so it should end up with JUST the number
blockLength = blockLength.Trim(' ', '\r', '\n');
#endregion
#region If the end of the message body is reached
if (blockLength == string.Empty) {
yield break;
}
#endregion
int blockLengthInt = 0;
#region Convert the Block Length String to an Int32 base16 (hex)
try {
blockLengthInt = Convert.ToInt32(blockLength, 16);
} catch (Exception ex) {
if (ex is FormatException || ex is OverflowException) {
throw new Exception(string.Format(ExceptionValues.HttpException_WrongChunkedBlockLength, blockLength), ex);
}
throw;
}
#endregion
// If the end of the message body is reached.
if (blockLengthInt == 0) {
yield break;
}
byte[] buffer = new byte[blockLengthInt];
int totalBytesRead = 0;
while (totalBytesRead != blockLengthInt) {
int length = blockLengthInt - totalBytesRead;
int bytesRead = _receiverHelper.HasData ? _receiverHelper.Read(buffer, 0, length) : _request.ClientStream.Read(buffer, 0, length);
if (bytesRead == 0) {
WaitData();
continue;
}
totalBytesRead += bytesRead;
System.Windows.Forms.MessageBox.Show("Chunk Length: " + blockLengthInt + "\nBytes Read/Total:" + bytesRead + "/" + totalBytesRead + "\n\n" + Encoding.ASCII.GetString(buffer));
yield return buffer;
}
goto readChunk;
}
What this does is read one line from the stream, which should be the chunk's length, perform a few checks, and eventually convert that line to an Int32 using base 16.
From there it creates a byte buffer whose length is that Int32.
It then keeps reading from the stream until it has read as many bytes as the Int32 we converted.
This works splendidly; however, for whatever reason, it responds incorrectly on the last read.
It reads exactly as many bytes as the chunk length, and all the data I expect is there. BUT it ALSO returns another small piece of data that was ALREADY read, at the very end, resulting in, let's say, all data from <!DOCTYPE html> down to </html> AS WELL AS some data from somewhere inside, like <form>, etc.
Here's an example of what occurred:
As you can see, the highlighted red text should NOT have been returned from the read! It should have ended at </html>.
Why is the chunk's length lying to me, and how can I find the proper size to read?
I'm not familiar with C# but if I understand your code and the semantics of Read in C# correctly (which seem to be similar to read in C) then the problem is that you are using the same buffer again and again without resetting it first:
byte[] buffer = new byte[blockLengthInt];
int totalBytesRead = 0;
while (totalBytesRead != blockLengthInt) {
int length = blockLengthInt - totalBytesRead;
int bytesRead = _receiverHelper.HasData ? _receiverHelper.Read(buffer, 0, length) : _request.ClientStream.Read(buffer, 0, length);
...
totalBytesRead += bytesRead;
...
yield return buffer;
}
To give an example of what goes wrong here: assume that the chunk size is 10, the content is 0123456789, the first read returns 6 bytes, and the second read returns the remaining 4 bytes. In this case the start of your buffer will be 012345 after the first read and 678945 after the second. The 45 at the end remains from the previous read, because the second read replaced only the first 4 bytes of the buffer and kept the rest.
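A possible fix, sketched below, is to write each partial read after the bytes already received and to hand the buffer out only once it is full. The helper name ReadFully is hypothetical, and unlike the question's code (which calls WaitData and retries) this sketch assumes a blocking stream and treats a zero-byte read as a premature end of stream.

```csharp
using System;
using System.IO;

public static class StreamHelpers
{
    // Hypothetical helper: fills a fresh buffer with exactly `count` bytes.
    public static byte[] ReadFully(Stream stream, int count)
    {
        var buffer = new byte[count];
        int total = 0;
        while (total < count)
        {
            // Write after the bytes already received instead of at offset 0.
            int n = stream.Read(buffer, total, count - total);
            if (n == 0)
                throw new EndOfStreamException(
                    "Stream ended after " + total + " of " + count + " bytes.");
            total += n;
        }
        return buffer;
    }
}
```

With such a helper, the chunk loop would yield once per chunk (for example, yield return StreamHelpers.ReadFully(stream, blockLengthInt);) rather than once per partial read.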
What's odd AF is that if I hand the request to another TCPStream proxied (127.0.0.1:8888 as a proxy, which is Fiddler), it works perfectly fine...
Fiddler is a proxy and might change how the response gets transferred. For example it might use Content-length instead of chunked encoding or it might use smaller chunks so that you always get the full chunk with the first read.
I have a large file with (text/Binary) format.
file format: (0 represent a byte)
00000FileName0000000Hello
World
world1
...
0000000000000000000000
Currently I'm using FileStream and I want to read the Hello.
I know where Hello starts, and it ends with 0x0D 0x0A.
I also need to go back if the word is not equal to Hello.
How can I read until a carriage return?
Is there any Peek-like function in FileStream, so I can move the read pointer back?
Is FileStream even a good choice in this case?
You can use the method FileStream.Seek to change the read/write position.
You can use BinaryReader for reading binary content; however, it uses an inner buffer, so you cannot rely on the underlying Stream.Position anymore, because it can read more bytes in the background than you want. But you can re-implement the methods you need:
private byte[] ReadBytes(Stream s, int count)
{
byte[] buffer = new byte[count];
if (count == 0)
{
return buffer;
}
// reading one byte
if (count == 1)
{
int value = s.ReadByte();
if (value == -1)
throw new IOException("Out of stream");
buffer[0] = (byte)value;
return buffer;
}
// reading multiple bytes
int offset = 0;
do
{
int readBytes = s.Read(buffer, offset, count - offset);
if (readBytes == 0)
throw new IOException("Out of stream");
offset += readBytes;
}
while (offset < count);
return buffer;
}
public int ReadInt32(Stream s)
{
byte[] buffer = ReadBytes(s, 4);
return BitConverter.ToInt32(buffer, 0);
}
// similarly, write ReadInt16/64, etc, whatever you need
Assuming that you are on the start position, you can write a ReadString, too:
private string ReadString(Stream s, char delimiter)
{
var result = new List<char>();
int c;
while ((c = s.ReadByte()) != -1 && (char)c != delimiter)
{
result.Add((char)c);
}
return new string(result.ToArray());
}
Usage:
FileStream fs = GetMyFile(); // todo
if (!fs.CanSeek)
throw new NotSupportedException("sorry");
long posCurrent = fs.Position; // save current position
int posHello = ReadInt32(fs); // read position of "hello"
fs.Seek(posHello, SeekOrigin.Begin); // seeking to hello
string hello = ReadString(fs, '\n'); // reading hello
fs.Seek(posCurrent, SeekOrigin.Begin); // seeking back
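The save-position, read, seek-back pattern in the usage above can also be wrapped into a peek-style helper that stops at the 0x0D 0x0A pair mentioned in the question. This is a minimal sketch: the name PeekLine is hypothetical, and it assumes single-byte (ASCII) text and a seekable stream.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamPeek
{
    // Hypothetical helper: reads ASCII text up to (not including) the next CR LF,
    // then restores the stream position so the same bytes can be read again.
    public static string PeekLine(Stream s)
    {
        if (!s.CanSeek)
            throw new NotSupportedException("Peeking requires a seekable stream.");
        long start = s.Position;
        var line = new StringBuilder();
        int previous = -1, current;
        while ((current = s.ReadByte()) != -1)
        {
            if (previous == '\r' && current == '\n')
            {
                line.Length--; // drop the '\r' appended on the previous pass
                break;
            }
            line.Append((char)current);
            previous = current;
        }
        s.Position = start; // rewind: this is what makes it a peek
        return line.ToString();
    }
}
```

If the peeked word is the one you want, you can then read it for real; otherwise the stream is still positioned where it was before the call.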
I have a basic stream, which is the input stream of an HTTP request:
var s=new HttpListener().GetContext().Request.InputStream;
I want to read the stream, which contains non-character (binary) content, because I sent the packet myself.
When we wrap this stream in a StreamReader and call its ReadToEnd() function, it can read the whole stream and return a string...
HttpListener listener = new HttpListener();
listener.Prefixes.Add("http://127.0.0.1/");
listener.Start();
var context = listener.GetContext();
var sr = new StreamReader(context.Request.InputStream);
string x = sr.ReadToEnd(); // This works
But since it has non-character content, we can't use StreamReader (I tried all the encodings; using a string is just wrong). And I can't use the function
context.Request.InputStream.Read(buffer,position,Len)
because I can't get the length of the stream: InputStream.Length always throws an exception and can't be used. I don't want to create a small protocol like [size][file] and read the size first and then the file. Somehow the StreamReader can get the length, and I just want to know how.
I also tried this and it didn't work:
List<byte> bb = new List<byte>();
var ss = context.Request.InputStream;
byte b = (byte)ss.ReadByte();
while (b >= 0)
{
bb.Add(b);
b = (byte)ss.ReadByte();
}
I've solved it by the following
FileStream fs = new FileStream("C:\\cygwin\\home\\Dff.rar", FileMode.Create);
byte[] file = new byte[1024 * 1024];
int finishedBytes = ss.Read(file, 0, file.Length);
while (finishedBytes > 0)
{
fs.Write(file, 0, finishedBytes);
finishedBytes = ss.Read(file, 0, file.Length);
}
fs.Close();
Thanks, Jon and Douglas.
Your bug lies in the following line:
byte b = (byte)ss.ReadByte();
The byte type is unsigned; when Stream.ReadByte returns -1 at the end of the stream, you’re indiscriminately casting it to byte, which converts it to 255 and, therefore, satisfies the b >= 0 condition. It is helpful to note that the return type is int, not byte, for this very reason.
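The narrowing described above can be reproduced in isolation. The following is only an illustrative sketch (the class and method names are hypothetical); it performs the same int-to-byte cast as the question's code, in an unchecked context, which is the default for C# projects.

```csharp
using System;

public static class ByteCastDemo
{
    // Mirrors the cast in the question: narrows the int returned by
    // Stream.ReadByte to a byte without overflow checking.
    public static byte Narrow(int readByteResult)
    {
        return unchecked((byte)readByteResult);
    }
}
```

Passing -1 to Narrow yields 255, which is why a byte variable can never signal the end of the stream: a comparison such as b >= 0 is true for every possible byte value.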
A quick-and-dirty fix for your code:
List<byte> bb = new List<byte>();
var ss = context.Request.InputStream;
int next = ss.ReadByte();
while (next != -1)
{
bb.Add((byte)next);
next = ss.ReadByte();
}
The following solution is more efficient, since it avoids the byte-by-byte reads incurred by the ReadByte calls, and uses a dynamically-expanding byte array for Read calls instead (similar to the way that List<T> is internally implemented):
var ss = context.Request.InputStream;
byte[] buffer = new byte[1024];
int totalCount = 0;
while (true)
{
int currentCount = ss.Read(buffer, totalCount, buffer.Length - totalCount);
if (currentCount == 0)
break;
totalCount += currentCount;
if (totalCount == buffer.Length)
Array.Resize(ref buffer, buffer.Length * 2);
}
Array.Resize(ref buffer, totalCount);
StreamReader cannot get the length either -- it seems there's some confusion regarding the third parameter of Stream.Read. That parameter specifies the maximum number of bytes that will be read, which does not need (and really cannot) be equal to the number of bytes actually available in the stream. You just call Read in a loop until it returns 0, in which case you know you have reached the end of the stream. This is all documented on MSDN, and it's also exactly how StreamReader does it.
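There's also no problem in reading the request with StreamReader and getting it into a string; strings in .NET can contain any character, including NUL, so nothing gets truncated. Be aware, though, that decoding arbitrary bytes with a text encoding such as UTF-8 is not guaranteed to round-trip back to the same bytes. The problem will be making sense of the contents of the string, but we can't really talk about that since you don't provide any relevant information.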
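If the goal is simply to get the raw bytes, the "call Read in a loop until it returns 0" pattern does not have to be written by hand. A minimal sketch, assuming it is acceptable to hold the whole body in memory (the class and method names are hypothetical):

```csharp
using System.IO;

public static class RequestBodyReader
{
    // Reads everything up to the end of the stream. Stream.CopyTo performs the
    // "Read until it returns 0" loop internally.
    public static byte[] ReadAllBytes(Stream input)
    {
        using (var memory = new MemoryStream())
        {
            input.CopyTo(memory);
            return memory.ToArray();
        }
    }
}
```

For the listener in the question, this would be called as RequestBodyReader.ReadAllBytes(context.Request.InputStream).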
HttpRequestStream won't give you the length, but you can get it from the HttpListenerRequest.ContentLength64 property. Like Jon said, make sure you observe the return value from the Read method. In my case, we get buffered reads and cannot read our entire 226KB payload in one go.
Try
byte[] getPayload(HttpListenerContext context)
{
int length = (int)context.Request.ContentLength64;
byte[] payload = new byte[length];
int numRead = 0;
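while (numRead < length)
{
int n = context.Request.InputStream.Read(payload, numRead, length - numRead);
if (n == 0) // end of stream reached before the announced length; avoid looping forever
throw new System.IO.EndOfStreamException("Request body ended before ContentLength64 bytes were read.");
numRead += n;
}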
return payload;
}
I'm trying to read the response stream from an HttpWebResponse object. I know the length of the stream (_response.ContentLength) however I keep getting the following exception:
Specified argument was out of the range of valid values.
Parameter name: size
While debugging, I noticed that at the time of the error, the values were as such:
length = 15032 //the length of the stream as defined by _response.ContentLength
bytesToRead = 7680 //the number of bytes in the stream that still need to be read
bytesRead = 7680 //the number of bytes that have been read (offset)
body.length = 15032 //the size of the byte[] the stream is being copied to
The peculiar thing is that the bytesToRead and bytesRead variables are ALWAYS 7680, regardless of the size of the stream (contained in the length variable). Any ideas?
Code:
int length = (int)_response.ContentLength;
byte[] body = null;
if (length > 0)
{
int bytesToRead = length;
int bytesRead = 0;
try
{
body = new byte[length];
using (Stream stream = _response.GetResponseStream())
{
while (bytesToRead > 0)
{
// Read may return anything from 0 to length.
int n = stream.Read(body, bytesRead, length);
// The end of the file is reached.
if (n == 0)
break;
bytesRead += n;
bytesToRead -= n;
}
stream.Close();
}
}
catch (Exception exception)
{
throw;
}
}
else
{
body = new byte[0];
}
_responseBody = body;
You want this line:
int n = stream.Read(body, bytesRead, length);
to be this:
int n = stream.Read(body, bytesRead, bytesToRead);
You are passing the stream's total length as the maximum number of bytes to read, but once the offset has been applied, only bytesToRead bytes of the array remain. The offset plus the count then runs past the end of the buffer, which is why Read throws.
You should keep this part, though:
if (n == 0)
break;
Stream.Read blocks until at least one byte is available and returns 0 only when the end of the stream has been reached, so without this check a response that ends before ContentLength bytes have arrived would keep the while loop spinning forever.