Why does CodedInputStream set stream position to end? - c#

I'm using protocol buffers 3 in C#. I'm trying to bounce through a stream to find the start location of each message, without actually deserialising the messages. All messages are written to the stream with WriteDelimitedTo.
I then use this code to try to jump from one length marker to the next:
_map = new List<int>();
_stream.Seek(0, SeekOrigin.Begin);
var codedStream = new CodedInputStream(_stream);
while (_stream.Position < _stream.Length)
{
var length = codedStream.ReadInt32();
_map.Add((int) _stream.Position);
_stream.Seek(length, SeekOrigin.Current);
}
However, the moment I call codedStream.ReadInt32(), the stream position is set to the end, rather than to the byte just after the varint32.

This behaviour is due to CodedInputStream buffering the original stream, as you can see in the source code. It is probably unsuitable for manually reading and seeking through a stream. An alternative is to use parts of Marc Gravell's source code for reading a varint, available here, and move through the raw stream directly.
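For example, here is a minimal sketch of that approach (my own illustration, not Marc Gravell's exact code): read the base-128 varint directly from the raw stream, then seek past each message body.
using System;
using System.Collections.Generic;
using System.IO;

// Reads a base-128 varint (the length prefix WriteDelimitedTo emits) from the raw stream.
static int ReadVarint32(Stream stream)
{
    int result = 0;
    for (int shift = 0; shift < 35; shift += 7)
    {
        int b = stream.ReadByte();
        if (b == -1) throw new EndOfStreamException();
        result |= (b & 0x7F) << shift;
        if ((b & 0x80) == 0) return result;
    }
    throw new InvalidDataException("Malformed varint32.");
}

// Build the map of message start positions without deserialising anything.
var map = new List<long>();
_stream.Seek(0, SeekOrigin.Begin);
while (_stream.Position < _stream.Length)
{
    int length = ReadVarint32(_stream);
    map.Add(_stream.Position);                // start of the message payload
    _stream.Seek(length, SeekOrigin.Current); // skip over the message body
}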

Related

Read from a compressing GZipStream

I'm exploring how to implement an HTTP server in C#. (And before you ask, I know there is Kestrel (and nothing else that isn't obsolete), and I want a much, much smaller application.) So, the response could be a Stream that is not seekable and has an unknown length. For this situation, chunked encoding can be used instead of sending a Content-Length header.
The response can also be compressed with gzip or br as indicated by the client. This can be accomplished with e.g. the GZipStream class. I almost said "easily", because that's not really the case. I find the GZipStream API confusing every time I use it, and I usually bump into every exception there is until I finally get it right.
It seems like I can only write (push) to a GZipStream and the compressed data will trickle out the other end into the specified "base" stream. But that's not desirable because I can't just let the compressed data flow to the client. It needs to be chunked. That is, each bit of compressed data needs to be prefixed with its chunk size. Of course the GZipStream cannot produce that format.
Instead, I'd like to read (pull) from the compressing GZipStream, but that doesn't seem to be possible. The documentation says it will throw an exception if I try that. But there has to be some instance that brings the compressed bytes into the chunked format.
So how would I get the expected result? Can it even be achieved with this API? Why can't I pull from the compressing stream, only push?
I'm not trying to make up (non-functional) sample code because that would only be confusing.
PS: Okay, maybe this:
Stream responseBody = ...;
if (canCompress)
{
responseBody = new GZipStream(responseBody, CompressionMode.Compress); // <-- probably wrong
}
// not shown: add appropriate headers
while (true)
{
int chunkLength = responseBody.Read(buffer); // <-- not possible
if (chunkLength == 0)
break;
response.Write($"{chunkLength:X}\r\n");
response.Write(buffer.AsMemory()[..chunkLength]);
response.Write("\r\n");
}
response.Write("0\r\n\r\n");
Your usage of GZipStream is incomplete. While your responseBody is the correct target stream, you have to actually write the bytes TO the GZipStream itself.
In addition, once you are done writing, you must close the GZipStream instance so that all compressed bytes are flushed to your target stream. This is the critical step, because there is no such thing as "partial compression" of an input stream in GZip: the whole input has to be analyzed in order to compress it properly. Closing the stream is the missing link that MUST happen before you can continue writing the response.
Finally, you need to reset the position of your output stream so that you can read it into an intermediary response buffer.
using MemoryStream responseBody = new MemoryStream();
if (canCompress)
{
    using MemoryStream gzipStreamBuffer = new MemoryStream(bytes); // 'bytes' holds the uncompressed payload
    using (GZipStream gzipStream = new GZipStream(responseBody, CompressionMode.Compress, true))
    {
        gzipStreamBuffer.CopyTo(gzipStream);
    } // disposing the GZipStream writes all remaining compressed bytes
    responseBody.Seek(0, SeekOrigin.Begin); // reset the response so that we can read it into the buffer
}
var buffer = new byte[20];
while (true)
{
int chunkLength = responseBody.Read(buffer);
if (chunkLength == 0)
break;
// write response
}
In my test example, my bytes input was 241 bytes, whereas the compressed bytes written to the buffer totaled 82 bytes.
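As a footnote on the pull question itself: if buffering the entire compressed response is undesirable, one alternative is to invert the flow and give GZipStream a base stream whose Write emits every buffer as an HTTP/1.1 chunk, so the compressor pushes straight into the chunked format and no pulling is needed. A sketch (untested against real clients; ChunkedWriteStream is an illustrative name, not a framework type):
using System;
using System.IO;
using System.Text;

// Wraps the raw response stream; every Write becomes one HTTP/1.1 chunk.
class ChunkedWriteStream : Stream
{
    private readonly Stream _inner;
    public ChunkedWriteStream(Stream inner) => _inner = inner;

    public override void Write(byte[] buffer, int offset, int count)
    {
        if (count == 0) return; // a zero-length chunk would terminate the body
        byte[] header = Encoding.ASCII.GetBytes($"{count:X}\r\n");
        _inner.Write(header, 0, header.Length);
        _inner.Write(buffer, offset, count);
        _inner.Write(new byte[] { (byte)'\r', (byte)'\n' }, 0, 2); // trailing CRLF
    }

    public void WriteTerminator() // the final "0\r\n\r\n"
    {
        byte[] end = Encoding.ASCII.GetBytes("0\r\n\r\n");
        _inner.Write(end, 0, end.Length);
    }

    public override void Flush() => _inner.Flush();
    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}
Disposing a GZipStream constructed over such a stream (with leaveOpen: true) flushes its final block as ordinary Writes, after which WriteTerminator ends the body.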

Pass the Length of uncertain Stream to WCF Service

Is there any way to pass the Length of an uncertain Stream to a WCF service?
By "uncertain stream" I mean a stream that provides its length only after processing and writing its data, e.g. GZipStream.
Background
I'm making a WCF service that receives multiple streams from a client.
As WCF streaming only allows one stream in the message, I decided to concatenate all streams into one stream and divide it in server code.
The streams the client provides will contain various kinds of stream, such as FileStream and MemoryStream with data from DataTable serialization:
using (var fileStream = new FileStream(filePath, FileMode.Open))
using (var memoryStream = new MemoryStream())
using (var concatStream = new ConcatenatedStream(fileStream, memoryStream))
{
client.UploadStreams(concatStream);
}
ConcatenatedStream is a Stream implementation suggested in c# - How do I concatenate two System.Io.Stream instances into one? - Stack Overflow.
On the server side, the Length of each stream is needed to divide the single stream back into multiple streams.
As I want to save memory on the client side, I decided to use a PullStream.
The PullStream writes to its buffer on demand, when Read is called.
But this causes a big problem: I cannot get the Length of the PullStream before streaming starts.
Any help will be appreciated.
Thanks
Let's make it simple:
If you have the length of a part of the stream on the client before you start pushing it to the server, you can prepend a structure (a header) before the payload and read that structure on the server. That is a standard data transfer pattern: by putting a header before each payload you tell your server how long the next part is going to be. A minimal sketch follows.
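Here is what that header pattern might look like (the 4-byte little-endian length and the helper names are illustrative):
using System;
using System.IO;

// Sender: prefix each part with its 4-byte length.
static void WritePart(Stream output, byte[] payload)
{
    output.Write(BitConverter.GetBytes(payload.Length), 0, 4);
    output.Write(payload, 0, payload.Length);
}

// Receiver: read the header, then exactly that many payload bytes.
static byte[] ReadPart(Stream input)
{
    byte[] header = new byte[4];
    ReadFully(input, header, 4);
    int length = BitConverter.ToInt32(header, 0);
    byte[] payload = new byte[length];
    ReadFully(input, payload, length);
    return payload;
}

// Stream.Read may return fewer bytes than requested, so loop until done.
static void ReadFully(Stream input, byte[] buffer, int count)
{
    int read = 0;
    while (read < count)
    {
        int n = input.Read(buffer, read, count - read);
        if (n == 0) throw new EndOfStreamException();
        read += n;
    }
}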
If you do not have the length of a part of the stream on the client before you start pushing it to the server, you are going to have to 'insert' the header inside the payload. That's not very intuitive and not that useful, but it does work. I used such a thing when my data was prepared asynchronously on the client and the first buffers were ready before the length was known. In this scenario you are going to need a so-called marker, i.e. a set of bytes that can never occur anywhere in the stream except right before the header.
This scenario is the toughest of the three to implement for the first time. Buckle up. To do it right you should impose an artificial structure on your stream. Such a structure is used for streaming video over a network and is called the Network Abstraction Layer, or NAL; read about it. It is also known as the Annex B stream format from the H.264 standard. Abstract away from the video domain the standard describes; the idea is very versatile.
In short, the payload is divided into parts, so-called NAL Units or NALUs. Each part has a byte sequence which marks its start, then comes a type indicator and the length of the current NALU, then follows the payload of the NALU. For your purposes you would need to implement NALUs of two types:
Main data payload
Metadata
After you imagine what your stream should look like, you have to get a grip on the idea of "stream encoding". Those are fearsome words, but do not worry: you just have to ensure that the byte sequence used to mark the start of a NALU never occurs inside the payload of a NALU. To achieve that you are going to implement some replacement tactic; see the sketch below, and browse for samples.
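As an example of such a replacement tactic, here is a hedged sketch loosely modeled on the H.264 "emulation prevention" byte: whenever two zero bytes would be followed by a small value, an 0x03 escape byte is stuffed in, so a start marker such as 0x00 0x00 0x01 can never appear inside a payload.
using System.Collections.Generic;

// Escapes a NALU payload so it can never contain the start marker.
static byte[] EscapePayload(byte[] payload)
{
    var result = new List<byte>(payload.Length);
    int zeros = 0; // consecutive 0x00 bytes seen so far
    foreach (byte b in payload)
    {
        if (zeros == 2 && b <= 0x03)
        {
            result.Add(0x03); // escape byte breaks up a would-be marker
            zeros = 0;
        }
        result.Add(b);
        zeros = (b == 0x00) ? zeros + 1 : 0;
    }
    return result.ToArray();
}
The decoder does the inverse: it drops any 0x03 that directly follows two zero bytes.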
When you are done thinking this through, and before you dive in, think twice about it. Maybe scenario 3 would suit you better.
In case you are sure you will never have to process a part of the streamed data, you can greatly simplify things, i.e. totally skip the stream encoding and implement something like this:
Client Stream principal code:
private byte[] mabytPayload;
private int mintCurrentPayloadPosition;
private int? mintTotalPayloadLength;
private bool mblnTotalPayloadLengthSent;
public int Read(byte[] iBuffer, int iStart, int iLength)
{
if (mintTotalPayloadLength.HasValue && !mblnTotalPayloadLengthSent)
{
//1. Write the packet type (0)
//2. Write the total stream length (4 bytes).
...
mblnTotalPayloadLengthSent = true;
}
else
{
//1. Write the packet type (1)
//2. Write the packet length (iLength - 1 for example, 1 byte is for
//the type specification)
//3. Write the payload packet.
...
}
}
public void TotalStreamLengthSet(int iTotalStreamLength)
{
mintTotalPayloadLength = iTotalStreamLength;
}
Server stream reader:
public void WCFUploadCallback(Stream iUploadStream)
{
while(!endOfStream)
{
//1. Read the packet type.
if (normalPayload)
{
//2.a Read the payload packet length.
//2.b Read the payload.
}
else
{
//2.c Read the total stream length.
}
}
}
In the scenario where your upload is non-stop and the metadata about the stream is ready on the client long after the payload (that happens as well), you are going to need two channels: one channel for the payload stream and another channel for metadata, where your server asks the client something like 'what did you just start sending me?' or 'what have you sent me?' and the client explains itself in the next message.
If you are ready to commit to one of the scenarios, I can give you some further details and/or recommendations.

Why is my DeflateStream not receiving data correctly over TCP?

I have a TcpClient on a client and server setup on my local machine. I have been successfully using the NetworkStream to communicate back and forth between the two.
Moving forward I am trying to implement compression in the communications. I've tried GZipStream and DeflateStream. I have decided to focus on DeflateStream. However, the connection is hanging without reading data now.
I have tried 4 different implementations that have all failed due to the Server side not reading the incoming data and the connection timing out. I will focus on the two implementations I have tried most recently and to my knowledge should work.
The client boils down to this request. There are two separate implementations, one with a StreamWriter and one without.
textToSend = ENQUIRY + START_OF_TEXT + textToSend + END_OF_TEXT;
// Send XML Request
byte[] request = Encoding.UTF8.GetBytes(textToSend);
using (DeflateStream streamOut = new DeflateStream(netStream, CompressionMode.Compress, true))
{
//using (StreamWriter sw = new StreamWriter(streamOut))
//{
// sw.Write(textToSend);
// sw.Flush();
streamOut.Write(request, 0, request.Length);
streamOut.Flush();
//}
}
The server receives the request and I do
1.) a quick read of the first character then if it matches what I expect
2.) I continue reading the rest.
The first read works correctly and if I want to read the whole stream it is all there. However I only want to read the first character and evaluate it then continue in my LongReadStream method.
When I try to continue reading the stream there is no data to be read. I am guessing that the data is being lost during the first read but I'm not sure how to determine that. All this code works correctly when I use the normal NetworkStream.
Here is the server side code.
private void ProcessRequests()
{
// This method reads the first byte of data correctly and if I want to
// I can read the entire request here. However, I want to leave
// all that data until I want it below in my LongReadStream method.
if (QuickReadStream(_netStream, receiveBuffer, 1) != ENQUIRY)
{
// Invalid Request, close connection
clientIsFinished = true;
_client.Client.Disconnect(true);
_client.Close();
return;
}
while (!clientIsFinished) // Keep reading text until client sends END_TRANSMISSION
{
// Inside this method there is no data and the connection times out waiting for data
receiveText = LongReadStream(_netStream, _client);
// Continue talking with Client...
}
_client.Client.Shutdown(SocketShutdown.Both);
_client.Client.Disconnect(true);
_client.Close();
}
private string LongReadStream(NetworkStream stream, TcpClient c)
{
bool foundEOT = false;
StringBuilder sbFullText = new StringBuilder();
int readLength, totalBytesRead = 0;
string currentReadText;
c.ReceiveBufferSize = DEFAULT_BUFFERSIZE * 100;
byte[] bigReadBuffer = new byte[c.ReceiveBufferSize];
while (!foundEOT)
{
using (var decompressStream = new DeflateStream(stream, CompressionMode.Decompress, true))
{
//using (StreamReader sr = new StreamReader(decompressStream))
//{
//currentReadText = sr.ReadToEnd();
//}
readLength = decompressStream.Read(bigReadBuffer, 0, c.ReceiveBufferSize);
currentReadText = Encoding.UTF8.GetString(bigReadBuffer, 0, readLength);
totalBytesRead += readLength;
}
sbFullText.Append(currentReadText);
if (currentReadText.EndsWith(END_OF_TEXT))
{
foundEOT = true;
sbFullText.Length = sbFullText.Length - 1; // drop the END_OF_TEXT marker
}
// Validate data code removed for simplicity
}
c.ReceiveBufferSize = DEFAULT_BUFFERSIZE;
c.ReceiveTimeout = timeOutMilliseconds;
return sbFullText.ToString();
}
private string QuickReadStream(NetworkStream stream, byte[] receiveBuffer, int receiveBufferSize)
{
using (DeflateStream zippy = new DeflateStream(stream, CompressionMode.Decompress, true))
{
int bytesIn = zippy.Read(receiveBuffer, 0, receiveBufferSize);
var returnValue = Encoding.UTF8.GetString(receiveBuffer, 0, bytesIn);
return returnValue;
}
}
EDIT
NetworkStream has an underlying Socket property which has an Available property. MSDN says this about the available property.
Gets the amount of data that has been received from the network and is
available to be read.
Before the call below Available is 77. After reading 1 byte the value is 0.
//receiveBufferSize = 1
int bytesIn = zippy.Read(receiveBuffer, 0, receiveBufferSize);
There doesn't seem to be any documentation about DeflateStream consuming the whole underlying stream and I don't know why it would do such a thing when there are explicit calls to be made to read specific numbers of bytes.
Does anyone know why this happens, or if there is a way to preserve the underlying data for a future read? Based on this 'feature', and a previous article I read stating that a DeflateStream must be closed to finish sending (flushing won't work), it seems DeflateStreams may be limited in their use for networking, especially if one wishes to counter DoS attacks by testing incoming data before accepting a full stream.
The basic flaw I can see in your code is a possible misunderstanding of how network streams and compression work together.
I think your code might work if you kept working with one DeflateStream. However, you use one in your quick read and then you create another one.
I will try to explain my reasoning with an example. Assume you have 8 bytes of original data to be sent over the network in compressed form. Let's assume, for the sake of argument, that every byte (8 bits) of original data is compressed to 6 bits. Now let's see what your code does with this.
From the network stream, you can't read less than 1 byte. You can't take only 1 bit. You take 1 byte, 2 bytes, or any number of bytes, but not bits.
But if you want to receive just 1 byte of the original data, you need to read the first whole byte of compressed data. Only 6 bits of that byte represent the first byte of uncompressed data; its last 2 bits belong to the second byte of original data.
Now if you cut the stream there, what is left is 5 bytes in the network stream that make no sense on their own and can't be decompressed.
The deflate algorithm is more complex than that, so it makes perfect sense that it does not allow you to stop reading from the NetworkStream at one point and continue with a new DeflateStream from the middle. A decompression context must be present in order to restore the data to its original form. Once you dispose the first DeflateStream in your quick read, this context is gone and you can't continue.
So, to resolve your issue, try to create only one DeflateStream, pass it to your functions, and dispose it afterwards. A rough sketch of that fix follows.
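The sketch below assumes the helpers are reworked to accept any Stream; ENQUIRY_BYTE is an illustrative constant, not from the original code.
// One DeflateStream per connection: its decompression context survives both reads.
using (var inflate = new DeflateStream(_netStream, CompressionMode.Decompress, true))
{
    var firstByte = new byte[1];
    if (inflate.Read(firstByte, 0, 1) != 1 || firstByte[0] != ENQUIRY_BYTE)
        return; // invalid request, close the connection

    var buffer = new byte[4096];
    int n;
    while ((n = inflate.Read(buffer, 0, buffer.Length)) > 0)
    {
        // process Encoding.UTF8.GetString(buffer, 0, n) here,
        // watching for END_OF_TEXT as in the original loop
    }
}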
This is broken in many ways.
You are assuming that a Read call will return exactly the number of bytes you want. It might instead deliver everything in one-byte chunks.
DeflateStream has an internal buffer. It can't be any other way: Input bytes do not correspond 1:1 to output bytes. There must be some internal buffering. You must use one such stream.
The same issue applies to UTF-8: UTF-8 encoded strings cannot be split at arbitrary byte boundaries, so sometimes your Unicode data will be garbled.
Don't touch ReceiveBufferSize, it does not help in any way.
You cannot reliably flush a deflate stream, I think, because the output might be at a partial byte position. You probably should devise a message framing format in which you prepend the compressed length as an uncompressed integer. Then, send the compressed deflate stream after the length. This is decodable in a reliable way.
Fixing these issues is not easy.
Since you seem to control client and server you should discard all of this and not devise your own network protocol. Use a higher-level mechanism such as web services, HTTP, protobuf. Anything is better than what you have there.
Basically there are a few things wrong with the code I posted above. First, when I read data, I'm not doing anything to make sure ALL of the data is being read in. As per the Microsoft documentation:
The Read operation reads as much data as is available, up to the
number of bytes specified by the size parameter.
In my case I was not making sure my reads would get all the data I expected.
This can be accomplished simply with this code.
byte[] data = new byte[packageSize];
bytesRead = _netStream.Read(data, 0, packageSize);
while (bytesRead < packageSize) // keep reading until the full package has arrived
bytesRead += _netStream.Read(data, bytesRead, packageSize - bytesRead);
On top of this problem, I had a fundamental issue with using DeflateStream - namely, I should not use DeflateStream to write to the underlying NetworkStream. The correct approach is to first use the DeflateStream to compress the data into a byte array, then send that byte array using the NetworkStream directly.
Using this approach compressed the data correctly over the network and allowed it to be read properly on the other end.
You may point out that I must know the size of the data, and that is true. Every call has an 8-byte 'header' that includes the size of the compressed data and the size of the data when uncompressed, although I think the second was ultimately not needed.
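For reference, a sketch of the sending side this implies (the method and variable names are mine, not from the original code): compress into a byte array first, then write the 8-byte header followed by the compressed bytes over the plain NetworkStream.
using System;
using System.IO;
using System.IO.Compression;
using System.Net.Sockets;

static void SendCompressed(NetworkStream netStream, byte[] data)
{
    byte[] compressed;
    using (var ms = new MemoryStream())
    {
        using (var deflate = new DeflateStream(ms, CompressionMode.Compress, true))
        {
            deflate.Write(data, 0, data.Length);
        } // disposing the DeflateStream flushes its final block into ms
        compressed = ms.ToArray();
    }
    netStream.Write(BitConverter.GetBytes(compressed.Length), 0, 4); // compressed size
    netStream.Write(BitConverter.GetBytes(data.Length), 0, 4);       // uncompressed size
    netStream.Write(compressed, 0, compressed.Length);
}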
The code for reading the header is below. Note that the variable packageSize serves two purposes.
int packageSize = streamIn.Read(sizeOfDataInBytes, 0, 4);
while (packageSize != 4) // make sure all 4 header bytes have been read
{
packageSize += streamIn.Read(sizeOfDataInBytes, packageSize, 4 - packageSize);
}
// packageSize now switches purpose: from bytes-read count to the decoded length
packageSize = BitConverter.ToInt32(sizeOfDataInBytes, 0);
With this information I can correctly use the code I showed you first to get the contents fully.
Once I have the full compressed byte array I can get the incoming data like so:
var output = new MemoryStream();
using (var stream = new MemoryStream(bufferIn))
using (var decompress = new DeflateStream(stream, CompressionMode.Decompress))
{
decompress.CopyTo(output);
}
var unCompressedArray = output.ToArray();
output.Dispose();
return Encoding.UTF8.GetString(unCompressedArray);

Send Multiple Object through TCP without Closing or Disposing Stream

I'm using BinaryFormatter to serialize an object over a NetworkStream.
The code looks like this:
//OpenConnection ...
TcpClient client = server.AcceptTcpClient();
Message message = new Message("bla bla"); // This is the serializable class
NetworkStream stream = client.GetStream(); // Get Stream
BinaryFormatter bf = new BinaryFormatter();
bf.Serialize(stream, message);
stream.Flush();
stream.Close(); //Close Connection
And in the client code, we just need to read from the stream with
bf.Deserialize(stream) as Message
to get the object we just sent from the server.
But there is a problem here: if I delete the line stream.Close();, the client cannot read the object (changing it to stream.Dispose(); behaves the same).
However, I want to use this stream again to send another Message. What can I do? Please help; it's giving me a real headache.
UPDATE:
I found the reason for this issue: I was running both the client and the server on one machine. It works fine on two different machines. Can someone tell me why? This has been a big problem for me for the last couple of days.
Sending multiple separate messages involves "framing" - splitting the single channel into separate chunks that don't ever require the client to "read to end". Oddly, though, I was under the impression that BinaryFormatter already implemented basic framing - but: I could be wrong. In the general case, when working with a binary protocol, the most common approach is to prefix each message with the length of the payload, i.e.
using(var ms = new MemoryStream()) {
while(...)
{
// not shown: serialize to ms
var len = BitConverter.GetBytes((int)ms.Length);
output.Write(len, 0, 4);
output.Write(ms.GetBuffer(), 0, (int) ms.Length);
ms.SetLength(0); // ready for next cycle
}
}
the caller has to (see the sketch after this list):
read exactly 4 bytes (at least, for the above), or detect EOF
determine the length
read exactly that many bytes
deserialize
repeat
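Sketched out (hedged: BinaryFormatter appears only because the question uses it, and it is obsolete and unsafe for untrusted input; assumes the usual System.IO and BinaryFormatter namespaces), the reading side might look like this:
var lenBuffer = new byte[4];
while (TryReadFully(input, lenBuffer, 4)) // false on clean EOF at a frame boundary
{
    int len = BitConverter.ToInt32(lenBuffer, 0);
    var payload = new byte[len];
    if (!TryReadFully(input, payload, len))
        throw new EndOfStreamException(); // EOF in the middle of a frame
    using (var ms = new MemoryStream(payload))
    {
        var msg = (Message)new BinaryFormatter().Deserialize(ms);
        // ... handle msg, then loop for the next frame ...
    }
}

// Read may return fewer bytes than asked for, so loop until the buffer is full.
static bool TryReadFully(Stream s, byte[] buffer, int count)
{
    int read = 0;
    while (read < count)
    {
        int n = s.Read(buffer, read, count - read);
        if (n == 0)
        {
            if (read == 0) return false; // clean EOF before a new frame
            throw new EndOfStreamException();
        }
        read += n;
    }
    return true;
}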
If that sounds like a lot of work, maybe just use a serializer that does all this for you; for example, with protobuf-net, this would be:
while(...) { // each item
Serializer.SerializeWithLengthPrefix(output, msg, PrefixStyle.Base128, 1); // 'msg' is the current item
}
and the reader would be:
foreach(var msg in Serializer.DeserializeItems<Message>(
input, PrefixStyle.Base128, 1))
{
// ...
}
(note: this does not use the same format / rules as BinaryFormatter)

Why does HttpWebResponse return a null terminated string?

I recently was using HttpWebResponse to return XML data from an HttpWebRequest, and I noticed that the stream returned a null-terminated string to me.
I assume it's because the underlying library has to be compatible with C++, but I wasn't able to find a resource to provide further illumination.
Mostly I'm wondering if there is an easy way to disable this behavior so I don't have to sanitize strings I'm passing into my XML reader.
Edit: here is a sample of the relevant code:
httpResponse.GetResponseStream().Read(serverBuffer, 0, BUFFER_SIZE);
output = processResponse(System.Text.UTF8Encoding.UTF8.GetString(serverBuffer))
where processResponse looks like:
processResponse(string xmlResponse)
{
var Parser = new XmlDocument();
xmlResponse = xmlResponse.Replace('\0',' '); //fix for httpwebrequest null terminating strings
Parser.LoadXml(xmlResponse);
This definitely isn't normal behaviour. Two options:
You made a mistake in the reading code (e.g. creating a buffer and then calling Read on a stream, expecting it to fill the buffer)
The web server actually returned a null-terminated response
You should be able to tell the difference using Wireshark if nothing else.
Could it be that you are setting the wrong size for the buffer you are loading?
You can use a StreamReader to avoid the temp buffer if you don't need it.
using(var stream = new StreamReader(httpResponse.GetResponseStream()))
{
string output = stream.ReadToEnd();
//...
}
Hmm... I doubt it returns a null-terminated string, since there is simply no such concept in C#. At best you could have a string with a '\u0000' character at the end, but in this case it would mean that the response from the server contains such a character, and the HttpWebRequest is simply doing its duty and returning whatever the server returned.
Update
after reading your code, the mistake is pretty obvious: you are Read()-ing from a stream into a byte[] but not taking note of how much you actually read:
int responseLength = httpResponse.GetResponseStream().Read(
serverBuffer, 0, BUFFER_SIZE);
output = processResponse(System.Text.UTF8Encoding.UTF8.GetString(
serverBuffer, 0, responseLength));
this would fix the immediate problem, leaving only the other bugs in your code to deal with, like the fact that you cannot correctly handle a response larger than BUFFER_SIZE... I would suggest you open an XML document reader on the returned stream instead of manipulating the stream via an (unnecessary) byte[] copy operation:
Parser.Load(httpResponse.GetResponseStream());
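For example (a minimal sketch, assuming request is the original HttpWebRequest and the usual System.Net and System.Xml namespaces):
using (var httpResponse = (HttpWebResponse)request.GetResponse())
using (var stream = httpResponse.GetResponseStream())
{
    var parser = new XmlDocument();
    parser.Load(stream); // XmlDocument handles buffering and reads to the end itself
    // ... work with parser ...
}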
