Read from a compressing GZipStream - c#

I'm exploring how to implement an HTTP server in C#. (And before you ask, I know there is Kestrel (and nothing else that isn't obsolete), and I want a much, much smaller application.) So, the response could be a Stream that cannot be seeked and has an unknown length. For this situation, chunked encoding can be used instead of sending a Content-Length header.
The response can also be compressed with gzip or br as indicated by the client. This can be accomplished with e.g. the GZipStream class. I almost said "easily", but that's not really the case. I find the GZipStream API confusing every time I use it. I usually bump into every exception there is until I finally get it right.
It seems like I can only write (push) to a GZipStream and the compressed data will trickle out the other end into the specified "base" stream. But that's not desirable because I can't just let the compressed data flow to the client. It needs to be chunked. That is, each bit of compressed data needs to be prefixed with its chunk size. Of course the GZipStream cannot produce that format.
Instead, I'd like to read (pull) from the compressing GZipStream, but that doesn't seem to be possible. The documentation says it will throw an exception if I try that. But there has to be some component that brings the compressed bytes into the chunked format.
So how would I get the expected result? Can it even be achieved with this API? Why can't I pull from the compressing stream, only push?
I'm not trying to make up (non-functional) sample code because that would only be confusing.
PS: Okay, maybe this:
Stream responseBody = ...;
if (canCompress)
{
responseBody = new GZipStream(responseBody, CompressionMode.Compress); // <-- probably wrong
}
// not shown: add appropriate headers
while (true)
{
int chunkLength = responseBody.Read(buffer); // <-- not possible
if (chunkLength == 0)
break;
response.Write($"{chunkLength:X}\r\n");
response.Write(buffer.AsMemory()[..chunkLength]);
response.Write("\r\n");
}
response.Write("0\r\n\r\n");

Your usage of GZipStream is incomplete. While responseBody is the correct target stream, you have to actually write the bytes TO the GZipStream itself.
In addition, once you are done writing, you must close the GZipStream instance so that all compressed bytes are written to the target stream. This is the critical step: the compressor buffers data internally, and the final block and the gzip trailer are only written when the stream is closed. Until then, the target stream does not contain a complete gzip stream, so this MUST happen before you continue writing the response.
Finally, you need to reset the position of your output stream so that you can read it into an intermediary response buffer.
using MemoryStream responseBody = new MemoryStream();
if (canCompress)
{
    // bytes holds the uncompressed response body
    using MemoryStream gzipStreamBuffer = new MemoryStream(bytes);
    // leaveOpen: true keeps responseBody usable after the GZipStream is disposed
    using (var gzipStream = new GZipStream(responseBody, CompressionMode.Compress, true))
    {
        gzipStreamBuffer.CopyTo(gzipStream);
    } // disposing closes the GZipStream so that all compressed bytes are written
    responseBody.Seek(0, SeekOrigin.Begin); // reset the response so that we can read it to the buffer
}
var buffer = new byte[20];
while (true)
{
    int chunkLength = responseBody.Read(buffer);
    if (chunkLength == 0)
        break;
    // write response
}
In my test example, my bytes input was 241 bytes, whereas the compressed bytes written to the buffer totaled 82 bytes.
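If buffering the whole compressed body in memory is not wanted, the push model from the question can still produce chunked output by putting the chunk framing into the stream that GZipStream writes to. The following is only a minimal sketch of that idea, not a complete HTTP implementation: ChunkedWriteStream, WriteTerminator and ChunkedGzip.Send are hypothetical names made up for this example, and the headers are assumed to be sent elsewhere.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: frames every Write call as one HTTP/1.1 chunk.
sealed class ChunkedWriteStream : Stream
{
    private static readonly byte[] Crlf = { 13, 10 };
    private readonly Stream _inner;

    public ChunkedWriteStream(Stream inner) => _inner = inner;

    public override void Write(byte[] buffer, int offset, int count)
    {
        if (count == 0) return; // a zero-length chunk would terminate the body
        byte[] header = Encoding.ASCII.GetBytes(count.ToString("X") + "\r\n");
        _inner.Write(header, 0, header.Length); // chunk size in hex
        _inner.Write(buffer, offset, count);    // chunk data
        _inner.Write(Crlf, 0, Crlf.Length);     // chunk end
    }

    // Writes the final zero-length chunk without trailers.
    public void WriteTerminator()
    {
        byte[] end = Encoding.ASCII.GetBytes("0\r\n\r\n");
        _inner.Write(end, 0, end.Length);
    }

    public override void Flush() => _inner.Flush();
    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}

static class ChunkedGzip
{
    // Pushes source through gzip; the compressed bytes arrive at response as chunks.
    public static void Send(Stream source, Stream response)
    {
        var chunked = new ChunkedWriteStream(response);
        using (var gzip = new GZipStream(chunked, CompressionMode.Compress, leaveOpen: true))
        {
            source.CopyTo(gzip);
        } // disposing writes the last compressed bytes, still framed as chunks
        chunked.WriteTerminator();
    }
}
```

With this arrangement the source never has to be seekable or of known length. GZipStream decides how large each write is, so some chunks may be very small; placing a BufferedStream between the two is one possible way to even that out.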

Related

HttpWebResponse Stream reading in uneven chunks

I am using the stream from HttpWebRequest.GetResponse().GetResponseStream() to read data from a streaming web API. I use Begin/EndRead on the stream with a buffer of 65K bytes. I can see that data is being returned in the following pattern:
16383 bytes read.
1 bytes read.
16383 bytes read.
1 bytes read.
16383 bytes read.
1 bytes read.
etc...
Obviously the 1 byte reads introduce a lot of inefficiency in the process, and the buffer size I provide is large enough to fit 16384 bytes or more. Is there anything I can do as a client to improve this or is simply up to the server how it's streaming data to me?
The reader code is basically:
var buffer = new byte[65536];
using (var stream = response.GetResponseStream()) {
while (true) {
var bytesRead = await AsyncRead(stream.BeginRead, stream.EndRead, buffer);
Console.WriteLine($"{bytesRead} bytes read.");
// do something with the bytes
}
}
where AsyncRead just calls BeginRead(buffer, 0, buffer.Length, callback, null), then EndRead in the callback and returns the return value of EndRead.
BTW this is on .NET 4.0, no HttpClient.
What exactly are you trying to achieve by sending the HttpWebRequest to the target server?
Are you trying to read a live response from the server after asking the server for data, or are you initializing a request between your client application and the target server? If you are trying to send an HttpWebRequest to the target server and read its HttpWebResponse, get the response stream from the server and then use System.IO.StreamReader to read the incoming stream.
Then, just to be on the safe side, have the System.IO.StreamReader decode the stream as UTF-8, if text is your overall goal.
After decoding as UTF-8 you have a string value which you can output to the console or wherever you want to send it.
I hope this is what you wanted; if not, then I have been essentially useless! :P

Why is my DeflateStream not receiving data correctly over TCP?

I have a TcpClient class on a client and server setup on my local machine. I have been using the Network stream to facilitate communications back and forth between the 2 successfully.
Moving forward I am trying to implement compression in the communications. I've tried GZipStream and DeflateStream. I have decided to focus on DeflateStream. However, the connection is hanging without reading data now.
I have tried 4 different implementations that have all failed due to the Server side not reading the incoming data and the connection timing out. I will focus on the two implementations I have tried most recently and to my knowledge should work.
The client is broken down to this request: There are 2 separate implementations, one with streamwriter one without.
textToSend = ENQUIRY + START_OF_TEXT + textToSend + END_OF_TEXT;
// Send XML Request
byte[] request = Encoding.UTF8.GetBytes(textToSend);
using (DeflateStream streamOut = new DeflateStream(netStream, CompressionMode.Compress, true))
{
//using (StreamWriter sw = new StreamWriter(streamOut))
//{
// sw.Write(textToSend);
// sw.Flush();
streamOut.Write(request, 0, request.Length);
streamOut.Flush();
//}
}
The server receives the request and I do
1.) a quick read of the first character then if it matches what I expect
2.) I continue reading the rest.
The first read works correctly and if I want to read the whole stream it is all there. However I only want to read the first character and evaluate it then continue in my LongReadStream method.
When I try to continue reading the stream there is no data to be read. I am guessing that the data is being lost during the first read but I'm not sure how to determine that. All this code works correctly when I use the normal NetworkStream.
Here is the server side code.
private void ProcessRequests()
{
// This method reads the first byte of data correctly and if I want to
// I can read the entire request here. However, I want to leave
// all that data until I want it below in my LongReadStream method.
if (QuickReadStream(_netStream, receiveBuffer, 1) != ENQUIRY)
{
// Invalid Request, close connection
clientIsFinished = true;
_client.Client.Disconnect(true);
_client.Close();
return;
}
while (!clientIsFinished) // Keep reading text until client sends END_TRANSMISSION
{
// Inside this method there is no data and the connection times out waiting for data
receiveText = LongReadStream(_netStream, _client);
// Continue talking with Client...
}
_client.Client.Shutdown(SocketShutdown.Both);
_client.Client.Disconnect(true);
_client.Close();
}
private string LongReadStream(NetworkStream stream, TcpClient c)
{
bool foundEOT = false;
StringBuilder sbFullText = new StringBuilder();
int readLength, totalBytesRead = 0;
string currentReadText;
c.ReceiveBufferSize = DEFAULT_BUFFERSIZE * 100;
byte[] bigReadBuffer = new byte[c.ReceiveBufferSize];
while (!foundEOT)
{
using (var decompressStream = new DeflateStream(stream, CompressionMode.Decompress, true))
{
//using (StreamReader sr = new StreamReader(decompressStream))
//{
//currentReadText = sr.ReadToEnd();
//}
readLength = decompressStream.Read(bigReadBuffer, 0, c.ReceiveBufferSize);
currentReadText = Encoding.UTF8.GetString(bigReadBuffer, 0, readLength);
totalBytesRead += readLength;
}
sbFullText.Append(currentReadText);
if (currentReadText.EndsWith(END_OF_TEXT))
{
foundEOT = true;
sbFullText.Length = sbFullText.Length - 1;
}
else
{
sbFullText.Append(currentReadText);
}
// Validate data code removed for simplicity
}
c.ReceiveBufferSize = DEFAULT_BUFFERSIZE;
c.ReceiveTimeout = timeOutMilliseconds;
return sbFullText.ToString();
}
private string QuickReadStream(NetworkStream stream, byte[] receiveBuffer, int receiveBufferSize)
{
using (DeflateStream zippy = new DeflateStream(stream, CompressionMode.Decompress, true))
{
int bytesIn = zippy.Read(receiveBuffer, 0, receiveBufferSize);
var returnValue = Encoding.UTF8.GetString(receiveBuffer, 0, bytesIn);
return returnValue;
}
}
EDIT
NetworkStream has an underlying Socket property which has an Available property. MSDN says this about the available property.
Gets the amount of data that has been received from the network and is
available to be read.
Before the call below Available is 77. After reading 1 byte the value is 0.
//receiveBufferSize = 1
int bytesIn = zippy.Read(receiveBuffer, 0, receiveBufferSize);
There doesn't seem to be any documentation about DeflateStream consuming the whole underlying stream and I don't know why it would do such a thing when there are explicit calls to be made to read specific numbers of bytes.
Does anyone know why this happens or if there is a way to preserve the underlying data for a future read? Based on this 'feature' and a previous article that I read stating a DeflateStream must be closed to finish sending (flush won't work) it seems DeflateStreams may be limited in their use for networking especially if one wishes to counter DOS attacks by testing incoming data before accepting a full stream.
The basic flaw I can think of looking at your code is a possible misunderstanding of how network stream and compression works.
I think your code might work, if you kept working with one DeflateStream. However, you use one in your quick read and then you create another one.
I will try to explain my reasoning on an example. Assume you have 8 bytes of original data to be sent over the network in a compressed way. Now let's assume for sake of an argument, that each and every byte (8 bits) of original data will be compressed to 6 bits in compressed form. Now let's see what your code does to this.
From the network stream, you can't read less than 1 byte. You can't take 1 bit only. You take 1 byte, 2 bytes, or any number of bytes, but not bits.
But if you want to receive just 1 byte of the original data, you need to read the first whole byte of compressed data. However, only 6 bits of that byte represent the first byte of uncompressed data. The last 2 bits of the first byte already belong to the second byte of original data.
Now if you cut the stream there, what is left is 5 bytes in the network stream that do not make sense on their own and can't be decompressed.
The deflate algorithm is more complex than that and thus it makes perfect sense if it does not allow you to stop reading from the NetworkStream at one point and continue with new DeflateStream from the middle. There is a context of the decompression that must be present in order to decompress the data to their original form. Once you dispose the first DeflateStream in your quick read, this context is gone, you can't continue.
So, to resolve your issue, try to create only one DeflateStream and pass it to your functions, then dispose it.
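As a rough sketch of that suggestion, the check of the first byte and the read of the remaining data can share one DeflateStream, so the decompression context and its internal buffer survive between the two reads. SingleDeflateReader, ReadMessage and the expectedFirst parameter are hypothetical names for this example; expectedFirst stands in for the ENQUIRY marker from the question, whose actual value is not shown there.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

static class SingleDeflateReader
{
    // Returns the text after the marker byte, or null if the marker does not match.
    public static string ReadMessage(Stream compressed, byte expectedFirst)
    {
        // One DeflateStream for the whole message; leaveOpen keeps the caller's stream usable.
        using (var inflate = new DeflateStream(compressed, CompressionMode.Decompress, leaveOpen: true))
        {
            int first = inflate.ReadByte(); // first decompressed byte, or -1 at end of data
            if (first != expectedFirst)
                return null; // invalid request

            // Continue with the SAME stream: bytes it has already buffered are not lost.
            var rest = new MemoryStream();
            inflate.CopyTo(rest);
            return Encoding.UTF8.GetString(rest.ToArray());
        }
    }
}
```

CopyTo only returns once the decompressor reaches the end of the deflate data, so on a live connection the sender has to close its compressing stream for each message.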
This is broken in many ways.
You are assuming that a read call will read the exact number of bytes you want. It might read everything in one byte chunks though.
DeflateStream has an internal buffer. It can't be any other way: Input bytes do not correspond 1:1 to output bytes. There must be some internal buffering. You must use one such stream.
Same issue with UTF-8: UTF-8 encoded strings cannot be split at byte boundaries. Sometimes, your Unicode data will be garbled.
Don't touch ReceiveBufferSize, it does not help in any way.
You cannot reliably flush a deflate stream, I think, because the output might be at a partial byte position. You probably should devise a message framing format in which you prepend the compressed length as an uncompressed integer. Then, send the compressed deflate stream after the length. This is decodable in a reliable way.
Fixing these issues is not easy.
Since you seem to control client and server you should discard all of this and not devise your own network protocol. Use a higher-level mechanism such as web services, HTTP, protobuf. Anything is better than what you have there.
Basically there are a few things wrong with the code I posted above. First is that when I read data I'm not doing anything to make sure the data is ALL being read in. As per microsoft documentation
The Read operation reads as much data as is available, up to the
number of bytes specified by the size parameter.
In my case I was not making sure my reads would get all the data I expected.
This can be accomplished simply with this code.
byte[] data = new byte[packageSize];
int bytesRead = 0;
while (bytesRead < packageSize)
{
    int n = _netStream.Read(data, bytesRead, packageSize - bytesRead);
    if (n == 0)
        throw new EndOfStreamException(); // connection closed before all data arrived
    bytesRead += n;
}
On top of this problem I had a fundamental issue with using DeflateStream: I should not use DeflateStream to write to the underlying NetworkStream. The correct approach is to first use the DeflateStream to compress data into a byte array, then send that byte array using the NetworkStream directly.
Using this approach helped to correctly compress data over the network and properly read the data on the other end.
You may point out that I must know the size of the data, and that is true. Every call has an 8-byte 'header' that includes the size of the compressed data and the size of the data when it is uncompressed, although I think the second was ultimately not needed.
The code for this is here. Note the variable packageSize serves 2 purposes: first it counts the header bytes read so far, then it holds the decoded size.
int packageSize = 0;
while (packageSize != 4)
{
    int n = streamIn.Read(sizeOfDataInBytes, packageSize, 4 - packageSize);
    if (n == 0)
        throw new EndOfStreamException(); // connection closed in the middle of the header
    packageSize += n;
}
packageSize = BitConverter.ToInt32(sizeOfDataInBytes, 0);
With this information I can correctly use the code I showed you first to get the contents fully.
Once I have the full compressed byte array I can get the incoming data like so:
using (var output = new MemoryStream())
using (var stream = new MemoryStream(bufferIn))
using (var decompress = new DeflateStream(stream, CompressionMode.Decompress))
{
    decompress.CopyTo(output);
    return Encoding.UTF8.GetString(output.ToArray());
}
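Put together, the framing described in this answer could look roughly like the sketch below. It is a simplification under stated assumptions, not the poster's actual code: the header here is only 4 bytes (the compressed size, in the machine's byte order via BitConverter), and FramedDeflate, WriteMessage, ReadMessage and ReadFully are hypothetical names.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class FramedDeflate
{
    // Compresses text into a byte array first, then sends [4-byte length][compressed bytes].
    public static void WriteMessage(Stream output, string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        byte[] compressed;
        using (var buffer = new MemoryStream())
        {
            using (var deflate = new DeflateStream(buffer, CompressionMode.Compress, leaveOpen: true))
            {
                deflate.Write(raw, 0, raw.Length);
            } // disposing finishes the deflate data
            compressed = buffer.ToArray();
        }
        byte[] header = BitConverter.GetBytes(compressed.Length);
        output.Write(header, 0, header.Length);
        output.Write(compressed, 0, compressed.Length);
    }

    // Reads one framed message and returns the decompressed text.
    public static string ReadMessage(Stream input)
    {
        int size = BitConverter.ToInt32(ReadFully(input, 4), 0);
        byte[] compressed = ReadFully(input, size);
        using (var inflate = new DeflateStream(new MemoryStream(compressed), CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            inflate.CopyTo(output);
            return Encoding.UTF8.GetString(output.ToArray());
        }
    }

    // Loops until exactly count bytes have arrived, since Read may return fewer.
    private static byte[] ReadFully(Stream input, int count)
    {
        byte[] data = new byte[count];
        int total = 0;
        while (total < count)
        {
            int n = input.Read(data, total, count - total);
            if (n == 0)
                throw new EndOfStreamException();
            total += n;
        }
        return data;
    }
}
```

Because each message carries its own length, the receiver never reads past the end of a message, and every message gets a fresh decompressor over a complete byte array.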

Generate zip file with xml content on the fly [duplicate]

I want to write a String to a Stream (a MemoryStream in this case) and read the bytes one by one.
stringAsStream = new MemoryStream();
UnicodeEncoding uniEncoding = new UnicodeEncoding();
String message = "Message";
stringAsStream.Write(uniEncoding.GetBytes(message), 0, message.Length);
Console.WriteLine("This:\t\t" + (char)uniEncoding.GetBytes(message)[0]);
Console.WriteLine("Differs from:\t" + (char)stringAsStream.ReadByte());
The (undesired) result I get is:
This: M
Differs from: ?
It looks like it's not being read correctly, as the first char of "Message" is 'M', which works when getting the bytes from the UnicodeEncoding instance but not when reading them back from the stream.
What am I doing wrong?
The bigger picture: I have an algorithm which will work on the bytes of a Stream, I'd like to be as general as possible and work with any Stream. I'd like to convert an ASCII-String into a MemoryStream, or maybe use another method to be able to work on the String as a Stream. The algorithm in question will work on the bytes of the Stream.
After you write to the MemoryStream and before you read it back, you need to Seek back to the beginning of the MemoryStream so you're not reading from the end.
UPDATE
After seeing your update, I think there's a more reliable way to build the stream:
UnicodeEncoding uniEncoding = new UnicodeEncoding();
String message = "Message";
// You might not want to use the outer using statement that I have
// I wasn't sure how long you would need the MemoryStream object
using(MemoryStream ms = new MemoryStream())
{
var sw = new StreamWriter(ms, uniEncoding);
try
{
sw.Write(message);
sw.Flush();//otherwise you are risking empty stream
ms.Seek(0, SeekOrigin.Begin);
// Test and work with the stream here.
// If you need to start back at the beginning, be sure to Seek again.
}
finally
{
sw.Dispose();
}
}
As you can see, this code uses a StreamWriter to write the entire string (with proper encoding) out to the MemoryStream. This takes the hassle out of ensuring the entire byte array for the string is written.
Update: I ran into the issue of an empty stream several times. It's enough to call Flush right after you've finished writing.
Try this "one-liner" from Delta's Blog, String To MemoryStream (C#).
MemoryStream stringInMemoryStream =
new MemoryStream(ASCIIEncoding.Default.GetBytes("Your string here"));
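The string will be loaded into the MemoryStream, and you can read from it. See Encoding.GetBytes(...), which has also been implemented for a few other encodings. Note that ASCIIEncoding.Default is just the inherited static Encoding.Default (the system's ANSI code page on .NET Framework, UTF-8 on .NET Core and later), not ASCII; use Encoding.ASCII or Encoding.UTF8 to be explicit.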
You're using message.Length, which returns the number of characters in the string, but you should be using the number of bytes to write. You should use something like:
byte[] messageBytes = uniEncoding.GetBytes(message);
stringAsStream.Write(messageBytes, 0, messageBytes.Length);
You're then reading a single byte and expecting to get a character from it just by casting to char. UnicodeEncoding will use two bytes per character.
As Justin says you're also not seeking back to the beginning of the stream.
Basically I'm afraid pretty much everything is wrong here. Please give us the bigger picture and we can help you work out what you should really be doing. Using a StreamWriter to write and then a StreamReader to read is quite possibly what you want, but we can't really tell from just the brief bit of code you've shown.
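For text data, the writer/reader pairing mentioned above might look like the following sketch. StringStreamDemo and RoundTrip are hypothetical names for this example, and UTF-16 is used only because the question used UnicodeEncoding.

```csharp
using System.IO;
using System.Text;

static class StringStreamDemo
{
    // Writes message into a MemoryStream as UTF-16 and reads it back as text.
    public static string RoundTrip(string message)
    {
        var stream = new MemoryStream();

        // leaveOpen: true, so disposing the writer flushes it without closing the stream.
        using (var writer = new StreamWriter(stream, Encoding.Unicode, 1024, leaveOpen: true))
        {
            writer.Write(message);
        }

        stream.Position = 0; // rewind before reading

        // The reader decodes whole characters, not single bytes.
        using (var reader = new StreamReader(stream, Encoding.Unicode))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Calling StringStreamDemo.RoundTrip("Message") should give back the same string, since the reader decodes two bytes per character instead of casting one byte to char.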
I think it would be a lot more productive to use a TextWriter, in this case a StreamWriter, to write to the MemoryStream. After that, as others have said, you need to "rewind" the MemoryStream using something like stringAsStream.Position = 0L;.
stringAsStream = new MemoryStream();
// create stream writer with UTF-16 (Unicode) encoding to write to the memory stream
// leaveOpen: true, because disposing the StreamWriter would otherwise close the MemoryStream too
using (StreamWriter sWriter = new StreamWriter(stringAsStream, UnicodeEncoding.Unicode, 1024, leaveOpen: true))
{
    sWriter.Write("Lorem ipsum.");
}
stringAsStream.Position = 0L; // rewind
Note that:
StreamWriter defaults to using an instance of UTF8Encoding unless specified otherwise. This instance of UTF8Encoding is constructed without a byte order mark (BOM).
Also, you don't have to create a new UnicodeEncoding() usually, since there's already one as a static member of the class for you to use in convenient utf-8, utf-16, and utf-32 flavors.
And then, finally (as others have said) you're trying to convert the bytes directly to chars, which they are not. If I had a memory stream and knew it was a string, I'd use a TextReader to get the string back from the bytes. It seems "dangerous" to me to mess around with the raw bytes.
You need to reset the stream to the beginning:
stringAsStream.Seek(0, SeekOrigin.Begin);
Console.WriteLine("Differs from:\t" + (char)stringAsStream.ReadByte());
This can also be done by setting the Position property to 0:
stringAsStream.Position = 0;

C# networkstream compression - Sharpziplib, DotNetZip, gzipstream all give errors on my stream

I have a pair of C# client-server programs that communicate using a networkstream.
Everything works fine as it is without compression.
Now I'd like to get the bandwidth-usage down, so I want to use a compressing wrapperstream around my networkstream.
I have tried SharpZipLib, DotNetZip, C#'s own GZipStream - but I can get none of them to work.
SharpZipLib has problems flushing, and applying the fix specified here: http://community.sharpdevelop.net/forums/p/7855/22139.aspx results in an exception "Header checksum illegal".
Using DotNetZip's DeflateStream results in a ZLibException("Bad state (invalid stored block lengths)");
GZipStream gives me a System.IO.InvalidDataException stating "The magic number in GZip header is not correct. Make sure you are passing in a GZip stream.".
The way I've implemented it is that every time an array of bytes has to be sent by my framework, I create a new compression stream wrapper around the existing networkstream, write the bytes to the compression stream, and then flush, close & dispose it.
This is to make sure that each WriteMessage(byte[] blah) uses its own state-independent compression stream that will be flushed immediately.
I've taken care to not let any of the streams close the original network stream.
using (System.IO.Stream outputStream = CreateOutputStreamWrapper(_networkStream))
{
outputStream.Write(messageBytes, 0, messageBytes.Length);
outputStream.Flush();
outputStream.Close();
outputStream.Dispose();
}
Basically, my DecompressionStream is created as follows (optionals commented out)
protected System.IO.Stream CreateInputStreamWrapper(System.IO.Stream inInputStream)
{
//return new DeflateStream(inInputStream, CompressionMode.Decompress, true);
//return new BZip2InputStream(inInputStream, true);
return new GZipStream(inInputStream, System.IO.Compression.CompressionMode.Decompress, true);
}
and started as
_inputStream.BeginRead(_buffer, 0, _buffer.Length, new AsyncCallback(ReceiveCallback), null);
then in the ReceiveCallback, the data is read, the stream is flushed, closed and disposed:
//Get received bytes count
var bytesRead = _inputStream.EndRead(ar);
_inputStream.Flush();
_inputStream.Close();
_inputStream.Dispose();
and immediately create a new inputStream by calling CreateInputStreamWrapper again.
So what's going on ?
Since all compression-stream implementations are failing with errors that come down to "there's an error in the datastream" I have a hunch it must be me and my code.
On the other hand, if I remove the compression and just use the networkstream there's no problem, which makes me think the problem must lie with the compression-code.
Does this sound familiar to anyone ?
And while we're at it, does anyone know of any (other) compression stream implementations that are suited to wrap around a networkstream ?
Just in case anyone else ever reads this, DotNetZip's ZLib streams have a FlushMode flag that enables you to set up flushing compatible for networking stuff ('Sync' and 'Full' modes).

Why does gzip/deflate compressing a small file result in many trailing zeroes?

I'm using the following code to compress a small (~4kB) HTML file in C#.
byte[] fileBuffer = ReadFully(inFile, ResponsePacket.maxResponsePayloadLength); // Read the entire requested HTML file into a memory buffer
inFile.Close(); // Close the requested HTML file
byte[] payload;
using (MemoryStream compMS = new MemoryStream()) // Create a new memory stream to hold the compressed HTML data
{
using (GZipStream gzip = new GZipStream(compMS, CompressionMode.Compress)) // Create a new GZip object pointing to the empty memory stream
{
gzip.Write(fileBuffer, 0, fileBuffer.Length); // Compress the file buffer and write it to the empty memory stream
gzip.Close(); // Close the GZip object
}
payload = compMS.GetBuffer(); // Write the compressed file buffer data in the memory stream to a byte buffer
}
The resulting compressed data is about 2k, but about half of it is just zeroes. This is for a very bandwidth sensitive application (which is why I'm bothering to compress 4kB in the first place), so the extra 1kB of zeroes is wasted valuable space. My best guess would be that the compression algorithm is padding out the data to a block boundary. If so, is there any way to override this behavior or change the block size? I get the same results with vanilla .NET GZipStream and zlib's GZipStream, as well as DeflateStream.
Wrong MemoryStream method. GetBuffer() returns the underlying buffer, which is always at least as large as the data in the stream. That is very efficient because no copy needs to be made.
But you need the ToArray() method here. Or use the Length property.
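A minimal sketch of the corrected compression step, assuming the same approach as in the question; GzipPayload and Compress are hypothetical names for this example:

```csharp
using System.IO;
using System.IO.Compression;

static class GzipPayload
{
    // Returns exactly the compressed bytes, without the unused tail of the internal buffer.
    public static byte[] Compress(byte[] data)
    {
        using (var compMS = new MemoryStream())
        {
            using (var gzip = new GZipStream(compMS, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            } // disposing finishes the gzip stream (and closes compMS)

            // ToArray copies only the bytes actually written and still works on a closed MemoryStream.
            return compMS.ToArray();
        }
    }
}
```

The returned array is only as long as the compressed data, so no padding zeroes are sent.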
