The deflate function of zlib accepts a Z_FULL_FLUSH option as its argument, as far as I know, when full flush is performed, a full flush point (0,0,0xFF,0xFF) will be set, subsequent data will be independent from the bytes precedes that point, hence makes the compressed data quasi random accessible. I've read a little bit source of dictzip, it took advantage of this feature to implement its chunk-wise random-accessibility, yet kept the compatibility with gzip. I want to reimplement dictzip compressor/decompressor using CLR provided functionality. The decompressor (random parts reader) is easy, I can just use the DeflateStream to decompress the data chunk by chunk, with no problem, but as for the creation, there is a huge obstacle, the API of DeflateStream is too high-level, it seems all deflate details are hidden, hence unexploitable. I really don't like to include a C shared library in my C# project, it makes the cross-platform deployment very painful, and cancels the benefits of choosing coding in C# in the first place. So, what should I do? Is there any way I can circumvent this obstacle merely using managed code? Does Mono provide a lower-level wrapper for zlib so that I can call deflate with full flush option? Any advice will be appreciated!
--- >8 ---
Thanks to Mark Adler who provided the answer. DotNetZip (specifically Ionic.Zlib.DeflateStream) supports exactly the feature I was asking for. The following example shows how it works:
Encoding u8 = Encoding.UTF8;
byte[] s1 = u8.GetBytes("the quick brown fox "),
s2 = u8.GetBytes("jumps over the lazy dog!");
var ms = new MemoryStream(100);
var deflator = new DeflateStream(ms, CompressionMode.Compress, true);
deflator.FlushMode = FlushType.Full;
deflator.Write(s1, 0, s1.Length);
deflator.Flush();
var pos = ms.Position;//remember the full flush point
deflator.Write(s2, 0, s2.Length);
deflator.Dispose();
var inflator = new DeflateStream(ms, CompressionMode.Decompress);
ms.Position = pos;
byte[] buf = new byte[100];
inflator.Read(buf, 0, s2.Length);
Console.WriteLine(u8.GetString(buf));//output: jumps over the lazy dog!
Try DotNetZip. SO won't let me post less than 30 characters, even though that is a complete and correct answer, so I added this extraneous sentence.
Related
I am confused by the behavior of Deflate algorithm, such as, the first chunk of bytes (size 12~13k) always decompress successfully. But the second decompression never turns out successful..
I am using DotNetZip (DeflateStream) with a simple code, later I switched to ZLIB.Net (component ace), Org.Bouncycastle, and variety of c# libraries.
The compression goes in c++ (the server that sends the packets) with deflateInit2, windowSize (-15) -> (15 - nowrap).
What could be incorrectly going so that I'm having zeros at the end of the buffer despite the fact that the decompression went successfully?
a example code with "Org.BouncyCastle.Utilities.Zlib"
it's pretty much the same code for almost any lib (DotNetZip, ZLIB.Net, ...)
internal static bool Inflate(byte[] compressed, out byte[] decompressed)
{
using (var inputStream = new MemoryStream(compressed))
using (var zInputStream = new ZInputStream(inputStream, true))
using (var outputStream = new MemoryStream())
{
zInputStream.CopyTo(outputStream);
decompressed = outputStream.ToArray();
}
return true;
}
To make sure everything working correctly, you should check for the following:
Both zlib versions match on both sides (compression - server, and decompression - client).
Flush mode is set to sync, that means the buffer must be synchronized in order to decompress further packets sent by the server.
Ensure that the packets you received is actually correct, and at my specific case, I was appending a different array size (a constant one in fact, of 0xFFFF) which might be different than the size of the received data (and that happens in most cases).
[EDIT 13th Nov. 19']
Keep in mind that conventionally the server may not send the last 4 bytes (sync 00 00 ff ff) if both has a contract that the flush type is sync, so be aware of adding them manually.
I have a TcpClient class on a client and server setup on my local machine. I have been using the Network stream to facilitate communications back and forth between the 2 successfully.
Moving forward I am trying to implement compression in the communications. I've tried GZipStream and DeflateStream. I have decided to focus on DeflateStream. However, the connection is hanging without reading data now.
I have tried 4 different implementations that have all failed due to the Server side not reading the incoming data and the connection timing out. I will focus on the two implementations I have tried most recently and to my knowledge should work.
The client is broken down to this request: There are 2 separate implementations, one with streamwriter one without.
textToSend = ENQUIRY + START_OF_TEXT + textToSend + END_OF_TEXT;
// Send XML Request
byte[] request = Encoding.UTF8.GetBytes(textToSend);
using (DeflateStream streamOut = new DeflateStream(netStream, CompressionMode.Compress, true))
{
//using (StreamWriter sw = new StreamWriter(streamOut))
//{
// sw.Write(textToSend);
// sw.Flush();
streamOut.Write(request, 0, request.Length);
streamOut.Flush();
//}
}
The server receives the request and I do
1.) a quick read of the first character then if it matches what I expect
2.) I continue reading the rest.
The first read works correctly and if I want to read the whole stream it is all there. However I only want to read the first character and evaluate it then continue in my LongReadStream method.
When I try to continue reading the stream there is no data to be read. I am guessing that the data is being lost during the first read but I'm not sure how to determine that. All this code works correctly when I use the normal NetworkStream.
Here is the server side code.
private void ProcessRequests()
{
// This method reads the first byte of data correctly and if I want to
// I can read the entire request here. However, I want to leave
// all that data until I want it below in my LongReadStream method.
if (QuickReadStream(_netStream, receiveBuffer, 1) != ENQUIRY)
{
// Invalid Request, close connection
clientIsFinished = true;
_client.Client.Disconnect(true);
_client.Close();
return;
}
while (!clientIsFinished) // Keep reading text until client sends END_TRANSMISSION
{
// Inside this method there is no data and the connection times out waiting for data
receiveText = LongReadStream(_netStream, _client);
// Continue talking with Client...
}
_client.Client.Shutdown(SocketShutdown.Both);
_client.Client.Disconnect(true);
_client.Close();
}
private string LongReadStream(NetworkStream stream, TcpClient c)
{
bool foundEOT = false;
StringBuilder sbFullText = new StringBuilder();
int readLength, totalBytesRead = 0;
string currentReadText;
c.ReceiveBufferSize = DEFAULT_BUFFERSIZE * 100;
byte[] bigReadBuffer = new byte[c.ReceiveBufferSize];
while (!foundEOT)
{
using (var decompressStream = new DeflateStream(stream, CompressionMode.Decompress, true))
{
//using (StreamReader sr = new StreamReader(decompressStream))
//{
//currentReadText = sr.ReadToEnd();
//}
readLength = decompressStream.Read(bigReadBuffer, 0, c.ReceiveBufferSize);
currentReadText = Encoding.UTF8.GetString(bigReadBuffer, 0, readLength);
totalBytesRead += readLength;
}
sbFullText.Append(currentReadText);
if (currentReadText.EndsWith(END_OF_TEXT))
{
foundEOT = true;
sbFullText.Length = sbFullText.Length - 1;
}
else
{
sbFullText.Append(currentReadText);
}
// Validate data code removed for simplicity
}
c.ReceiveBufferSize = DEFAULT_BUFFERSIZE;
c.ReceiveTimeout = timeOutMilliseconds;
return sbFullText.ToString();
}
private string QuickReadStream(NetworkStream stream, byte[] receiveBuffer, int receiveBufferSize)
{
using (DeflateStream zippy = new DeflateStream(stream, CompressionMode.Decompress, true))
{
int bytesIn = zippy.Read(receiveBuffer, 0, receiveBufferSize);
var returnValue = Encoding.UTF8.GetString(receiveBuffer, 0, bytesIn);
return returnValue;
}
}
EDIT
NetworkStream has an underlying Socket property which has an Available property. MSDN says this about the available property.
Gets the amount of data that has been received from the network and is
available to be read.
Before the call below Available is 77. After reading 1 byte the value is 0.
//receiveBufferSize = 1
int bytesIn = zippy.Read(receiveBuffer, 0, receiveBufferSize);
There doesn't seem to be any documentation about DeflateStream consuming the whole underlying stream and I don't know why it would do such a thing when there are explicit calls to be made to read specific numbers of bytes.
Does anyone know why this happens or if there is a way to preserve the underlying data for a future read? Based on this 'feature' and a previous article that I read stating a DeflateStream must be closed to finish sending (flush won't work) it seems DeflateStreams may be limited in their use for networking especially if one wishes to counter DOS attacks by testing incoming data before accepting a full stream.
The basic flaw I can think of looking at your code is a possible misunderstanding of how network stream and compression works.
I think your code might work, if you kept working with one DeflateStream. However, you use one in your quick read and then you create another one.
I will try to explain my reasoning on an example. Assume you have 8 bytes of original data to be sent over the network in a compressed way. Now let's assume for sake of an argument, that each and every byte (8 bits) of original data will be compressed to 6 bits in compressed form. Now let's see what your code does to this.
From the network stream, you can't read less than 1 byte. You can't take 1 bit only. You take 1 byte, 2 bytes, or any number of bytes, but not bits.
But if you want to receive just 1 byte of the original data, you need to read first whole byte of compressed data. However, there is only 6 bits of compressed data that represent the first byte of uncompressed data. The last 2 bits of the first byte are there for the second byte of original data.
Now if you cut the stream there, what is left is 5 bytes in the network stream that do not make any sense and can't be uncompressed.
The deflate algorithm is more complex than that and thus it makes perfect sense if it does not allow you to stop reading from the NetworkStream at one point and continue with new DeflateStream from the middle. There is a context of the decompression that must be present in order to decompress the data to their original form. Once you dispose the first DeflateStream in your quick read, this context is gone, you can't continue.
So, to resolve your issue, try to create only one DeflateStream and pass it to your functions, then dispose it.
This is broken in many ways.
You are assuming that a read call will read the exact number of bytes you want. It might read everything in one byte chunks though.
DeflateStream has an internal buffer. It can't be any other way: Input bytes do not correspond 1:1 to output bytes. There must be some internal buffering. You must use one such stream.
Same issue with UTF-8: UTF-8 encoded strings cannot be split at byte boundaries. Sometimes, your Unicode data will be garbled.
Don't touch ReceiveBufferSize, it does not help in any way.
You cannot reliably flush a deflate stream, I think, because the output might be at a partial byte position. You probably should devise a message framing format in which you prepend the compressed length as an uncompressed integer. Then, send the compressed deflate stream after the length. This is decodable in a reliable way.
Fixing these issues is not easy.
Since you seem to control client and server you should discard all of this and not devise your own network protocol. Use a higher-level mechanism such as web services, HTTP, protobuf. Anything is better than what you have there.
Basically there are a few things wrong with the code I posted above. First is that when I read data I'm not doing anything to make sure the data is ALL being read in. As per microsoft documentation
The Read operation reads as much data as is available, up to the
number of bytes specified by the size parameter.
In my case I was not making sure my reads would get all the data I expected.
This can be accomplished simply with this code.
byte[] data= new byte[packageSize];
bytesRead = _netStream.Read(data, 0, packageSize);
while (bytesRead < packageSize)
bytesRead += _netStream.Read(data, bytesRead, packageSize - bytesRead);
On top of this problem I had a fundamental issue with using DeflateStream - namely I should not use DeflateStream to write to the underlying NetworkStream. The correct approach is to first use the DeflateStream to compress data into a ByteArray, then send that ByteArray using the NetworkStream directly.
Using this approach helped to correctly compress data over the network and property read the data on the other end.
You may point out that I must know the size of the data, and that is true. Every call has a 8 byte 'header' that includes the size of the compressed data and the size of the data when it is uncompressed. Although I think the second was utimately not needed.
The code for this is here. Note the variable compressedSize serves 2 purposes.
int packageSize = streamIn.Read(sizeOfDataInBytes, 0, 4);
while (packageSize!= 4)
{
packageSize+= streamIn.Read(sizeOfDataInBytes, packageSize, 4 - packageSize);
}
packageSize= BitConverter.ToInt32(sizeOfDataInBytes, 0);
With this information I can correctly use the code I showed you first to get the contents fully.
Once I have the full compressed byte array I can get the incoming data like so:
var output = new MemoryStream();
using (var stream = new MemoryStream(bufferIn))
{
using (var decompress = new DeflateStream(stream, CompressionMode.Decompress))
{
decompress.CopyTo(output);;
}
}
output.Position = 0;
var unCompressedArray = output.ToArray();
output.Close();
output.Dispose();
return Encoding.UTF8.GetString(unCompressedArray);
I'm trying to decompress data compressed with zlib algorithm in C# using 2 most legitimate libraries compatible with zlib algorithm and I got similar exception thrown.
Using DotNetZip:
Ionic.Zlib.ZlibException: Bad state (invalid stored block lengths)
Using Zlib.Net:
inflate: invalid stored block lenghts
but using same data as input to zlib-flate command on linux using only default parameters, works great and decompressed without any warnings (output is correct):
zlib-flate -uncompress < ./dbgZlib
Any suggestions what I can do in order to decompress this data in C# or why actually decompression failing in this case?
Compressed data as hex:
root#localhost:~# od -t x1 -An ./dbgZlib |tr -d '\n '
789c626063520b666060606262d26160d05307329999e70a6400e93c2066644080cf8c938c0c0c4d0d0d0d2d839c437c02dcfd0c0c0c11d28ea121013e7e41860ce18e210640e06810141669c080051840012eb970d790800090f99eee409ea189025e806c8e8b5354a89b13d81c136ca60f3a000e5fd6af0fb14a3221873e96400506374cd6c7d52dc8d98980657e7e06460ace0a4ce86e80da9f0249030edf816c16481ab06b60404f03931169c0cdc728c0db0fd928681a3042a481480347336c6e21320d78fb8155195a9090067ca3420387771a400a546aa70100000000ffff
Compressed data as base64:
root#localhost:~# base64 ./dbgZlib
eJxiYGNSC2ZgYGBiYtJhYNBTBzKZmecKZADpPCBmZECAz4yTjAwMTQ0NDS2DnEN8Atz9DAwMEdKO
oSEBPn5BhgzhjiEGQOBoEBQWacCABRhAAS65cNeQgACQ+Z7uQJ6hiQJegGyOi1NUqJsT2BwTbKYP
OgAOX9avD7FKMiGHPpZABQY3TNbH1S3I2YmAZX5+BkYKzgpM6G6A2p8CSQMO34FsFkgasGtgQE8D
kxFpwM3HKMDbD9koaBowQqSBSANHM2xuITINePuBVRlakJAGfKNCA4d3GkAKVGqnAQAAAAD//w==
Data after decompression, encoded with base64 look like this:
root#localhost:~# zlib-flate -uncompress < ./dbgZlib | base64
AAYCJlMAAAACAgIsAAAuJwAAAAMDnRBoAAAAbgAAAAEAAAAAAAAAAAAAAPMBkjIwMTUxMTE5UkNU
TFBHTjAwMQAAAAAAAAAAAABBVVRQTE5SMQBXQVQwMDAwQTBSVlkwAAAAAAAAAAAAAAAAAAAAAAAA
AAAwMDAwMDAwMAAAAAAAAAAAAAAAAAAAAAAAAAAAMDAwMFdFVFBQTFBHTklHMDAwMTQgICAgICAg
ICAgICAgICAgICAgICAgICAgICAgICAwMAAAAAAAAAAAAABEQlpVRkIAAAAAMDQAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABAAAAAAX14QAAAAAAAAAA
AAAAAAAAAAAAAAAAAAIBAAAAAAAAAAAAAABBVDAwMDBBMFJWWTAAAAAAAAAAUExOAAAAAAAAAABM
RUZSQ0IAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATk4wMiBDIAIAAAAAAAAAAAAAAAAA
AAABBfXhAAAAAGQCAgIsAABA9wAAAAQDnRBoAAA+gAAAAAEAAAAAAAAAAAAAAPMBkzIwMTUxMTE5
UkNGTDJQS04AAAAAAAAAAAAAAABBVVRQTE5SMgBXQVQwMDAwQTBZMEE2AAAAAAAAAAAAAAAAAAAA
AAAAAAAwMDAwMDAwMAAAAAAAAAAAAAAAAAAAAAAAAAAAMDAwMFdFVFBQTFBLTjAwMDAwMTggICAg
ICAgICAgICAgICAgICAgICAgICAgICAgICAwMAAAAAAAAAAAAABETVpVUUIAAAAAMDQAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABAAAAAAX14QAAAAAA
AAAAAAAAAAAAAAAAAAAAAAIBAAAAAAAAAAAAAABBVDAwMDBBMFkwQTYAAAAAAAAAUExOAAAAAAAA
AABMRUZSQ0IAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATk4wMiBDIAIAAAAAAAAAAAAA
AAAAAAABBfXhAAAAAGQ=
The problem is that you are using zlib-flate as a general-purpose compression algorithm which, according to the manpage for it, you should not do:
This program should not be used as a general purpose compression tool.
Use something like gzip(1) instead.
So perhaps you should follow the instructions given by your tools and not use them for things that they are not intended for. Use gzip and the System.IO.Compression.GZipStream instead, it's much simpler, especially when you're looking for cross-platform compatible compression algorithms.
That said...
The reason that you can't inflate the data is that it lacks a correct GZIP header. If you add the right header to it you will get something that can be decompressed.
For instance:
public static byte[] DecompressZLibRaw(byte[] bCompressed)
{
byte[] bHdr = new byte[] { 0x1F, 0x8b, 0x08, 0, 0, 0, 0, 0 };
using (var sOutput = new MemoryStream())
using (var sCompressed = new MemoryStream())
{
sCompressed.Write(bHdr, 0, bHdr.Length);
sCompressed.Write(bCompressed, 0, bCompressed.Length);
sCompressed.Position = 0;
using (var decomp = new GZipStream(sCompressed, CompressionMode.Decompress))
{
decomp.CopyTo(sOutput);
}
return sOutput.ToArray();
}
}
Adding the header makes all the difference.
NB: There are two bytes in the 10-byte GZIP header that are not stripped from your source. These are normally used to store the compression flags and the source file system. In the compressed data you present they are invalid values. Additionally the file footer is abbreviated to 5 bytes instead of 8 bytes... all of which is not actually required for decompression. Which probably has a lot to do with why the manpage says not to use this for general compression.
The stream you provided is not complete. It appears that you ended it with a Z_SYNC_FLUSH or Z_FULL_FLUSH in your C# code, instead of a Z_FINISH like you're supposed to. That is causing the error. If you terminate the stream properly, you won't have a problem.
zlib-flate is simply ignoring that error.
If you are not in control of the generation of the stream, you can still use zlib to decompress what's there. You just need to use it at a lower level where you operate on blocks of data and get the decompressed data available given the provided input.
I'm using protocol buffers 3 in c#. I'm trying to bounce through a stream to find the start locations of each message, without actually Deserialising the messages. All messages are written to the stream with WriteDelimitedTo.
I then use this code to try to jump from length markers:
_map = new List<int>();
_stream.Seek(0, SeekOrigin.Begin);
var codedStream = new CodedInputStream(_stream);
while (_stream.Position < _stream.Length)
{
var length = codedStream.ReadInt32();
_map.Add((int) _stream.Position);
_stream.Seek(length, SeekOrigin.Current);
}
However, the moment I do codedStream.ReadInt32() the stream position is set to the end, rather than just the next byte after the varint32.
This behaviour is due to CodedInputStream buffering the original stream as you can see in the source code. It is probably unsuitable for manually reading and seeking through a stream. An alternative is to use parts of Marc Gravell's source code for reading a varint, available here, and move though the raw stream directly.
Ok, before we start. I work for a company that has a license to redistribute PDF files from various publishers in any media form. So, that being said, the extraction of embedded fonts from the given PDF files is not only legal - but also vital to the presentation.
I am using code found on this site, however I do not recall the author, when I find it I will reference them. I have located the stream within the PDF file that contains the embedded fonts, I have isolated this encoded stream as a string and then into a byte[]. When I use the following code I get an error
Block length does not match with its complement.
Code (the error occurs in the while line below):
private static byte[] DecodeFlateDecodeData(byte[] data)
{
MemoryStream outputStream;
using (outputStream = new MemoryStream())
{
using (var compressedDataStream = new MemoryStream(data))
{
// Remove the first two bytes to skip the header (it isn't recognized by the DeflateStream class)
compressedDataStream.ReadByte();
compressedDataStream.ReadByte();
var deflateStream = new DeflateStream(compressedDataStream, CompressionMode.Decompress, true);
var decompressedBuffer = new byte[compressedDataStream.Length];
int read;
// The error occurs in the following line
while ((read = deflateStream.Read(decompressedBuffer, 0, decompressedBuffer.Length)) != 0)
{
outputStream.Write(decompressedBuffer, 0, read);
}
outputStream.Flush();
compressedDataStream.Close();
}
return ReadFully(outputStream);
}
}
After using the usual tools (Google, Bing, archives here) I found that the majority of the time that this occurs is when one has not consumed the first two bytes of the encoding stream - but this is done here so i cannot find the source of this error. Below is the encoded stream:
H‰LT}lg?7ñù¤aŽÂ½ãnÕ´jh›Ú?-T’ÑRL–¦
ëš:Uí6Ÿ¶“ø+ñ÷ùü™”ÒÆŸŸíóWlDZ“ºu“°tƒ¦t0ÊD¶jˆ
Ö m:$½×^*qABBï?Þç÷|ýÞßóJÖˆD"yâP—òpgÇó¦Q¾S¯9£Û¾mçÁçÚ„cÂÛO¡É‡·¥ï~á³ÇãO¡ŸØö=öPD"d‚ìA—$H'‚DC¢D®¤·éC'Å:È—€ìEV%cÿŽS;þÔ’kYkùcË_ZÇZ/·þYº(ý݇Ã_ó3m¤[3¤²4ÿo?²õñÖ*Z/Þiãÿ¿¾õ8Ü ?»„O Ê£ðÅP9ÿ•¿Â¯*–z×No˜0ãÆ-êàîoR‹×ÉêÊêÂulaƒÝü
Please help, I am beating my head against the wall here!
NOTE: The stream above is the encoded version of Arial Black - according to the specs inside the PDF:
661 0 obj
<<
/Type /FontDescriptor
/FontFile3 662 0 R
/FontBBox [ -194 -307 1688 1083 ]
/FontName /HLJOBA+ArialBlack
/Flags 4
/StemV 0
/CapHeight 715
/XHeight 518
/Ascent 0
/Descent -209
/ItalicAngle 0
/CharSet (/space/T/e/s/t/a/k/i/n/g/S/r/E/x/m/O/u/l)
>>
endobj
662 0 obj
<< /Length 1700 /Filter /FlateDecode /Subtype /Type1C >>
stream
H‰LT}lg?7ñù¤aŽÂ½ãnÕ´jh›Ú?-T’ÑRL–¦
ëš:Uí6Ÿ¶“ø+ñ÷ùü™”ÒÆŸŸíóWlDZ“ºu“°tƒ¦t0ÊD¶jˆ
Ö m:$½×^*qABBï?Þç÷|ýÞßóJÖˆD"yâP—òpgÇó¦Q¾S¯9£Û¾mçÁçÚ„cÂÛO¡É‡·¥ï~á³ÇãO¡ŸØö=öPD"d‚ìA—$H'‚DC¢D®¤·éC'Å:È—€ìEV%cÿŽS;þÔ’kYkùcË_ZÇZ/·þYº(ý݇Ã_ó3m¤[3¤²4ÿo?²õñÖ*Z/Þiãÿ¿¾õ8Ü ?»„O Ê£ðÅP9ÿ•¿Â¯*–z×No˜0ãÆ-êàîoR‹×ÉêÊêÂulaƒÝü
Is there a particular reason why you're not using the GetStreamBytes() method that is provided with iText? What about data? Are you sure you are looking at the correct bytes? Did you create the PRStream object correctly and did you get the bytes with PdfReader.GetStreamBytesRaw()? If so, why decode the bytes yourself? Which brings me to my initial counter-question: is there a particular reason why you're not using the GetStreamBytes() method?
Looks like GetStreamBytes() might solve your problem out right, but let me point out that I think you're doing something dangerous concerning end-of-line markers. The PDF Specification in 7.3.8.1 states that:
The keyword stream that follows the stream dictionary shall be
followed by an end-of-line marker consisting of either a CARRIAGE
RETURN and a LINE FEED or just a LINE FEED, and not by a CARRIAGE
RETURN alone.
In your code it looks like you always skip two bytes while the spec says it could be either one or two (CR LF or LF).
You should be able to catch whether you are running into this by comparing the exact number of bytes you want to decode with the value of the (Required) "Length" key in the stream dictionary.
Okay, for anyone who might stumble across this issue themselves allow me to warn you - this is a rocky road without a great deal of good solutions. I eventually moved away from writing all of the code to extract the fonts myself. I simply downloaded MuPDF (open source) and then made command line calls to mutool.exe:
mutool extract C:\mypdf.pdf
This pulls all of the fonts into the folder mutool resides in (it also extracts some images (these are the fonts that could not be converted (usually small subsets I think))). I then wrote a method to move those from that folder into the one I wanted them in.
Of course, to convert these to anything usable is a headache in itself - but I have found it to be doable.
As a reminder, font piracy IS piracy.