Decompress file with wrong size - c#

I have a method that decompresses a *.gz file:
using (FileStream originalFileStream = new FileStream(gztempfilename, FileMode.Open, FileAccess.Read))
{
    using (FileStream decompressedFileStream = new FileStream(outputtempfilename, FileMode.Create, FileAccess.Write))
    {
        using (GZipStream decompressionStream = new GZipStream(originalFileStream, CompressionMode.Decompress))
        {
            decompressionStream.CopyTo(decompressedFileStream);
        }
    }
}
It worked perfectly, but recently I received a batch of files with the wrong size recorded:
When I open them with 7-Zip they show Packed Size ~ 1,600,000 and Size = 7 (it should be ~20,000,000).
So when I extract them using this code I get only part of the file, but when I extract the same file with 7-Zip I get the full file.
How can I handle this situation in my code?

My guess is that the other end makes a mistake when GZipping the files. It looks like it does not set the ISIZE bytes correctly.
The ISIZE bytes are the last four bytes of a valid GZip file and come after a 32-bit CRC value which in turn comes directly after the compressed data bytes.
7-Zip seems to be robust against such mistakes whereas the GZipStream is not. It is odd however that 7-Zip is not showing you any errors. It should show you (tested with 7-Zip 16.02 x64/Win7)...
CRC error in case the size is simply wrong,
"Unexpected end of data" in case some or all of the ISIZE bytes are cut off,
"There are some data after end of the payload data" in case there is more data following the ISIZE bytes.
7-Zip always uses the last four bytes of the packed file to determine the size of the original unpacked file without checking if the file is valid and whether the bytes read for that are actually the ISIZE bytes.
You can verify this by checking those last four bytes of the GZipped file with a hex viewer. For your example they should be exactly 07 00 00 00.
If you know the exact size of the unpacked original file you could replace those bytes so that they specify the correct size. For instance, if the unpacked file's size is 20,000,078, which is 01312D4E in hex (0-padded to eight digits), those bytes should be 4E 2D 31 01.
In case you don't know the exact size you can try replacing them with the maximum value, i.e. FF FF FF FF.
After that try your unpack code again.
This is obviously only a hacky solution to your problem. Better try fixing the code that GZips the files you receive or try to find a library that is more robust than GZipStream.
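The hex-viewer check and the byte order described above can also be sketched in code. This is a minimal sketch, not part of the original answer: the class name GzipTrailer and both method names are made up for illustration, and it assumes the last four bytes of the file really are the ISIZE field.

```csharp
using System;
using System.IO;

public static class GzipTrailer
{
    // Reads the last four bytes of a file as a little-endian unsigned
    // 32-bit integer (the ISIZE field of a well-formed gzip file).
    public static uint ReadIsize(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            if (fs.Length < 4)
                throw new InvalidDataException("File is too short to contain ISIZE.");

            fs.Seek(-4, SeekOrigin.End);
            var b = new byte[4];
            int read = 0;
            while (read < 4)
            {
                int n = fs.Read(b, read, 4 - read);
                if (n == 0) throw new EndOfStreamException();
                read += n;
            }
            return (uint)b[0] | ((uint)b[1] << 8) | ((uint)b[2] << 16) | ((uint)b[3] << 24);
        }
    }

    // Encodes an uncompressed size as the four little-endian ISIZE bytes.
    public static byte[] ToIsizeBytes(uint size)
    {
        return new[]
        {
            (byte)(size & 0xFF),
            (byte)((size >> 8) & 0xFF),
            (byte)((size >> 16) & 0xFF),
            (byte)((size >> 24) & 0xFF)
        };
    }
}
```

For the example size above, BitConverter.ToString(GzipTrailer.ToIsizeBytes(20000078)) gives "4E-2D-31-01", which matches the bytes listed in the answer.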

I used ICSharpCode.SharpZipLib.GZip.GZipInputStream from the SharpZipLib library instead of System.IO.Compression.GZipStream, and it helped.

Did you try this to check the size? For example:
byte[] bArray;
using (FileStream f = new FileStream(tempFile, FileMode.Open))
{
    bArray = new byte[f.Length];
    // Read takes an int count, and the buffer is bArray (not b)
    f.Read(bArray, 0, (int)f.Length);
}
Regards
try:
GZipStream uncompressed = new GZipStream(streamIn, CompressionMode.Decompress, true);
FileStream streamOut = new FileStream(tempDoc[0], FileMode.Create, FileAccess.Write, FileShare.None);

Looks like this is less a bug in GZipStream than a side effect of how it is used: it does not write the original file length to the end of the .gz data until the stream is closed.
You need to change the way you compress your files using GZipStream.
This way works:
inputBytes = Encoding.UTF8.GetBytes(output);
using (var outputStream = new MemoryStream())
{
    using (var gZipStream = new GZipStream(outputStream, CompressionMode.Compress))
        gZipStream.Write(inputBytes, 0, inputBytes.Length);
    System.IO.File.WriteAllBytes("file.xml.gz", outputStream.ToArray());
}
And this way causes the error you have (whether or not you call Flush()), because the output is read while the GZipStream is still open, before the footer has been written:
inputBytes = Encoding.UTF8.GetBytes(output);
using (var outputStream = new MemoryStream())
{
    using (var gZipStream = new GZipStream(outputStream, CompressionMode.Compress))
    {
        gZipStream.Write(inputBytes, 0, inputBytes.Length);
        System.IO.File.WriteAllBytes("file.xml.gz", outputStream.ToArray());
    }
}

You might need to call decompressedStream.Seek() after closing the gZip stream.
As shown here.

Related

How can I find the start of a GZip string in a MemoryStream?

byte[] httpDecompress(HttpDatagram http)
{
    int magicnum = 0x1f8b;
    Stream str= http.Body.ToMemoryStream();
    using (var zipStream = new GZipStream(str, CompressionMode.Decompress))
    using (var resultStream = new MemoryStream())
    {
        zipStream.CopyTo(resultStream);
        return resultStream.ToArray();
    }
}
That is the code, but it gives a magic number error. How can I find the beginning of the GZip string? I think that is the source of the problem. Can anyone help?
Not knowing where the gzip stream starts may or may not be your problem. (In fact, probably not.) In any case, you can search for the three-byte sequence 1f 8b 08 to identify candidate gzip streams. Start decompressing from the 1f to see if it really is a gzip stream.
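The search described above could look roughly like this. It is only a sketch: the class GzipScanner and its method names are hypothetical, and it assumes the whole body is already available as a byte array. Note that a candidate that fails part-way may surface as an exception type other than InvalidDataException on some runtimes, so treat the catch clause as an assumption too.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class GzipScanner
{
    // Returns every offset at which the three-byte sequence 1f 8b 08 occurs.
    public static List<int> FindCandidates(byte[] data)
    {
        var offsets = new List<int>();
        for (int i = 0; i + 2 < data.Length; i++)
        {
            if (data[i] == 0x1f && data[i + 1] == 0x8b && data[i + 2] == 0x08)
                offsets.Add(i);
        }
        return offsets;
    }

    // Tries each candidate in order and returns the first one that
    // decompresses without throwing, or null if none does.
    public static byte[] TryDecompressFirst(byte[] data)
    {
        foreach (int offset in FindCandidates(data))
        {
            try
            {
                using (var input = new MemoryStream(data, offset, data.Length - offset))
                using (var gzip = new GZipStream(input, CompressionMode.Decompress))
                using (var output = new MemoryStream())
                {
                    gzip.CopyTo(output);
                    return output.ToArray();
                }
            }
            catch (InvalidDataException)
            {
                // Not a real gzip stream at this offset; try the next candidate.
            }
        }
        return null;
    }
}
```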

GZipStream makes my text bigger than original

There is a post here, "Compress and decompress string in c#", about compressing strings in C#.
I've implemented the same code myself, but the returned text is almost twice the size of mine :O
I've tried it on a JSON string of size 87 like this:
{"G":"82f88ff5-4143-46ef-86cc-a19910f4a6b5","U":"df39e3c7-ffd3-4829-a9cd-27bfcbd4403a"}
The result is 168
H4sIAAAAAAAEAC2NUQ6DIBQE5yx8l0QFqfQCnqAHqKCXaHr3jsaQ3TyYfcuXwKpeamHi0Bf9YCaSGVW6psLua5QWmifykVbPyCDJ3gube4GHet+tXZZM7Xrj6d7Z3u/W8896dVVpd5rMbCaa3k1k25M88OMPcjDew64AAAA=
I've changed Unicode to ASCII but the result is still too big (128)
H4sIAAAAAAAEAA3KyxGAMAgFwF44y0w+JAEbsAILICSvCcfedc/70EUnaYEq0FiyVJa+wdoj2LNZThDvs9FB918Xqu0ag4H1Vy3GbrG4jImYSyRVp/cDp8EZE1cAAAA=
public static string Compress(this string s)
{
    var bytes = Encoding.ASCII.GetBytes(s);
    using (var msi = new MemoryStream(bytes))
    using (var mso = new MemoryStream())
    {
        using (var gs = new GZipStream(mso, CompressionMode.Compress))
        {
            msi.CopyTo(gs);
        }
        return Convert.ToBase64String(mso.ToArray());
    }
}
Gzip is not only a compression method but a complete file format. This means it adds additional structures whose size can usually be neglected.
However, when compressing small strings they can blow up the overall gzip stream.
The standard GZIP header, for example, has 10 bytes and its footer is 8 bytes long.
If you take your gzip-compressed result in raw format (not the bloated Base64-encoded one), you will see that it has 95 bytes.
Therefore the 18 bytes for header and footer already make up nearly 20% of the output!
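To see the two sources of growth separately (gzip framing first, then Base64), something like the following can be used. It is a rough sketch; the class GzipOverhead and its methods are made-up names, and the exact compressed size depends on the deflate implementation of the runtime, so no particular number is promised here.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipOverhead
{
    // Compresses a byte array into a complete gzip stream (header + data + footer).
    public static byte[] Gzip(byte[] input)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(input, 0, input.Length);
            }
            // Read the array only after the GZipStream has been disposed.
            return output.ToArray();
        }
    }

    // Returns { input bytes, raw gzip bytes, Base64 characters }.
    public static int[] Measure(string text)
    {
        byte[] input = Encoding.ASCII.GetBytes(text);
        byte[] raw = Gzip(input);
        string base64 = Convert.ToBase64String(raw);
        return new[] { input.Length, raw.Length, base64.Length };
    }
}
```

Base64 always produces 4 characters for every 3 input bytes (rounded up), so the 95 raw bytes mentioned above become 4 * 32 = 128 characters, which is the length reported in the question.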

Convert a wav file to 8000Hz 16Bit Mono Wav

I need to convert a wav file to 8000Hz 16Bit Mono Wav. I already have code that works well with the NAudio library, but I want to use a MemoryStream instead of a temporary file.
using System.IO;
using NAudio.Wave;
static void Main()
{
    var input = File.ReadAllBytes("C:/input.wav");
    var output = ConvertWavTo8000Hz16BitMonoWav(input);
    File.WriteAllBytes("C:/output.wav", output);
}
public static byte[] ConvertWavTo8000Hz16BitMonoWav(byte[] inArray)
{
    using (var mem = new MemoryStream(inArray))
    using (var reader = new WaveFileReader(mem))
    using (var converter = WaveFormatConversionStream.CreatePcmStream(reader))
    using (var upsampler = new WaveFormatConversionStream(new WaveFormat(8000, 16, 1), converter))
    {
        // todo: without saving to file using MemoryStream or similar
        WaveFileWriter.CreateWaveFile("C:/tmp_pcm_8000_16_mono.wav", upsampler);
        return File.ReadAllBytes("C:/tmp_pcm_8000_16_mono.wav");
    }
}
Not sure if this is the optimal way, but it works...
public static byte[] ConvertWavTo8000Hz16BitMonoWav(byte[] inArray)
{
    using (var mem = new MemoryStream(inArray))
    {
        using (var reader = new WaveFileReader(mem))
        {
            using (var converter = WaveFormatConversionStream.CreatePcmStream(reader))
            {
                using (var upsampler = new WaveFormatConversionStream(new WaveFormat(8000, 16, 1), converter))
                {
                    byte[] data;
                    using (var m = new MemoryStream())
                    {
                        upsampler.CopyTo(m);
                        data = m.ToArray();
                    }
                    using (var m = new MemoryStream())
                    {
                        // to create a proper WAV header, which begins with RIFF
                        var w = new WaveFileWriter(m, upsampler.WaveFormat);
                        // append WAV data body
                        w.Write(data, 0, data.Length);
                        // flush so the chunk sizes in the header are updated before copying the bytes out
                        w.Flush();
                        return m.ToArray();
                    }
                }
            }
        }
    }
}
One thing to add (sorry, I can't comment yet due to lack of points): NAudio ALWAYS writes 46-byte headers, which in certain situations can cause crashes. I want to add this in case someone encounters it while searching for a clue as to why NAudio WAV files crash only certain programs.
I encountered this problem after figuring out how to convert and/or resample WAV files with NAudio, was stuck on it for 2 days, and only figured it out with a hex editor.
(The 2 extra bytes are located at bytes 37 and 38, right before the data subchunk [d,a,t,a,size<4bytes>].)
Here is a comparison of two WAV file headers: the left one was saved by NAudio (46 bytes), the right one by Audacity (44 bytes).
You can check this by looking at the NAudio source in WaveFormat.cs at line 310, where instead of 16 bytes for the fmt chunk, 18+extra are reserved (+extra because some WAV files contain even bigger headers than 46 bytes), but NAudio always seems to write 46-byte headers and never 44 (MS standard). Note that NAudio is in fact able to read 44-byte headers (line 210 in WaveFormat.cs).
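Instead of a hex editor, the header variant can also be checked in code by reading the declared size of the fmt chunk. The following is a sketch only: WavHeaderInspector is a hypothetical helper, not part of NAudio, and it assumes the common layout where the fmt chunk comes first and the data chunk follows it immediately.

```csharp
using System;
using System.Text;

public static class WavHeaderInspector
{
    // Reads the declared size of the "fmt " chunk body. Assumes "fmt " is
    // the first chunk after the 12-byte RIFF/WAVE preamble.
    public static int ReadFmtChunkSize(byte[] wav)
    {
        if (wav == null || wav.Length < 20)
            throw new ArgumentException("Too short to be a WAV header.");
        if (Encoding.ASCII.GetString(wav, 0, 4) != "RIFF" ||
            Encoding.ASCII.GetString(wav, 8, 4) != "WAVE" ||
            Encoding.ASCII.GetString(wav, 12, 4) != "fmt ")
            throw new ArgumentException("Not a RIFF/WAVE file with a leading fmt chunk.");

        // Chunk sizes are stored as little-endian 32-bit integers.
        return wav[16] | (wav[17] << 8) | (wav[18] << 16) | (wav[19] << 24);
    }

    // Header length up to the start of the sample data, assuming the data
    // chunk follows the fmt chunk directly: 12 + 8 + fmtSize + 8.
    public static int HeaderLength(byte[] wav)
    {
        return 28 + ReadFmtChunkSize(wav);
    }
}
```

With an fmt size of 16 this yields 44, and with 18 it yields 46, which corresponds to the two header variants compared above.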

binary file to string

I'm trying to read a binary file (for example an executable) into a string, then write it back:
FileStream fs = new FileStream("C:\\tvin.exe", FileMode.Open);
BinaryReader br = new BinaryReader(fs);
byte[] bin = br.ReadBytes(Convert.ToInt32(fs.Length));
System.Text.Encoding enc = System.Text.Encoding.ASCII;
string myString = enc.GetString(bin);
fs.Close();
br.Close();
System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();
byte[] rebin = encoding.GetBytes(myString);
FileStream fs2 = new FileStream("C:\\tvout.exe", FileMode.Create);
BinaryWriter bw = new BinaryWriter(fs2);
bw.Write(rebin);
fs2.Close();
bw.Close();
This does not work (the result has exactly the same size in bytes but can't run).
If I do bw.Write(bin) the result is OK, but I must save it to a string.
When you decode the bytes into a string and re-encode them back into bytes, you're losing information. ASCII in particular is a very bad choice for this, since ASCII will throw out a lot of information on the way, but you risk losing information when encoding and decoding regardless of the type of Encoding you pick, so you're not on the right path.
What you need is one of the BaseXX routines, which encode binary data to printable characters, typically for storage or transmission over a medium that only allows text (email and usenet come to mind).
Ascii85 is one such algorithm, and the page contains links to different implementations. It has a ratio of 4:5, meaning that 4 bytes will be encoded as 5 characters (a 25% increase in size).
If nothing else, there's already a Base64 encoding routine built into .NET. It has a ratio of 3:4 (a 33% increase in size); see:
Convert.ToBase64String Method
Convert.FromBase64String Method
Here's what your code can look like with these methods:
string myString;
using (FileStream fs = new FileStream("C:\\tvin.exe", FileMode.Open))
using (BinaryReader br = new BinaryReader(fs))
{
    byte[] bin = br.ReadBytes(Convert.ToInt32(fs.Length));
    myString = Convert.ToBase64String(bin);
}
byte[] rebin = Convert.FromBase64String(myString);
using (FileStream fs2 = new FileStream("C:\\tvout.exe", FileMode.Create))
using (BinaryWriter bw = new BinaryWriter(fs2))
    bw.Write(rebin);
I don't think you can represent all bytes with ASCII in that way. Base64 is an alternative, but with a ratio between bytes and text of 3:4.
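The difference between the two round trips discussed in these answers can be shown on a handful of bytes. This is a small illustrative sketch; the class RoundTrip and its method names are invented for the example.

```csharp
using System;
using System.Text;

public static class RoundTrip
{
    // Decodes bytes as ASCII and encodes the string back to bytes.
    // Bytes of 0x80 and above have no ASCII character and are replaced.
    public static byte[] ViaAscii(byte[] data)
    {
        string text = Encoding.ASCII.GetString(data);
        return Encoding.ASCII.GetBytes(text);
    }

    // Encodes bytes as Base64 text and decodes the text back to bytes.
    public static byte[] ViaBase64(byte[] data)
    {
        string text = Convert.ToBase64String(data);
        return Convert.FromBase64String(text);
    }
}
```

For an input such as { 0x4D, 0x5A, 0x90, 0x00, 0xFF }, the ASCII round trip keeps the length but turns 0x90 and 0xFF into 0x3F ('?'), which explains an output file of identical size that no longer runs. The Base64 round trip returns the original bytes unchanged.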

Modifying XMP data with C#

I'm using C# in ASP.NET version 2. I'm trying to open an image file, read (and change) the XMP header, and close it back up again. I can't upgrade ASP, so WIC is out, and I just can't figure out how to get this working.
Here's what I have so far:
Bitmap bmp = new Bitmap(Server.MapPath(imageFile));
MemoryStream ms = new MemoryStream();
StreamReader sr = new StreamReader(Server.MapPath(imageFile));
*[stuff with find and replace here]*
byte[] data = ToByteArray(sr.ReadToEnd());
ms = new MemoryStream(data);
originalImage = System.Drawing.Image.FromStream(ms);
Any suggestions?
How about this kinda thing?
byte[] data = File.ReadAllBytes(path);
... find & replace bit here ...
File.WriteAllBytes(path, data);
Also, I really recommend against using System.Drawing.Bitmap in an ASP.NET process, as it leaks memory and will crash or randomly fail every now and again (even MS admits this).
Here's the bit from MS about why System.Drawing.Bitmap isn't stable:
http://msdn.microsoft.com/en-us/library/system.drawing.aspx
"Caution:
Classes within the System.Drawing namespace are not supported for use within a Windows or ASP.NET service. Attempting to use these classes from within one of these application types may produce unexpected problems, such as diminished service performance and run-time exceptions."
Part 1 of the XMP spec 2012, page 10 specifically talks about how to edit a file in place without needing to understand the surrounding format (although they do suggest this as a last resort). The embedded XMP packet looks like this:
<?xpacket begin="■" id="W5M0MpCehiHzreSzNTczkc9d"?>
... the serialized XMP as described above: ...
<x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf= ...>
...
</rdf:RDF>
</x:xmpmeta>
... XML whitespace as padding ...
<?xpacket end="w"?>
In this example, ‘■’ represents the Unicode “zero width non-breaking space character” (U+FEFF) used as a byte-order marker.
Part 3 of the XMP spec (2010, page 12) also gives specific byte patterns (UTF-8, UTF-16, big/little endian) to look for when scanning the bytes. This would complement Chris' answer about reading the file in as a giant byte stream.
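Scanning the raw bytes for the packet wrapper could be sketched as follows. This is an assumption-laden example: XmpPacketFinder is a hypothetical helper, it handles only the UTF-8 form of the markers (the UTF-16 patterns from the spec are not covered), and it takes the first match without validating the content in between.

```csharp
using System;
using System.Text;

public static class XmpPacketFinder
{
    // Naive byte search; returns -1 when the pattern is not found.
    private static int IndexOf(byte[] haystack, byte[] needle, int start)
    {
        for (int i = start; i + needle.Length <= haystack.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j]) j++;
            if (j == needle.Length) return i;
        }
        return -1;
    }

    // Locates a UTF-8 XMP packet. On success, start is the offset of
    // "<?xpacket begin=" and length runs through the "?>" that closes
    // the "<?xpacket end=" instruction.
    public static bool TryFindPacket(byte[] file, out int start, out int length)
    {
        start = -1;
        length = 0;
        byte[] beginMarker = Encoding.ASCII.GetBytes("<?xpacket begin=");
        byte[] endMarker = Encoding.ASCII.GetBytes("<?xpacket end=");
        byte[] piClose = Encoding.ASCII.GetBytes("?>");

        int begin = IndexOf(file, beginMarker, 0);
        if (begin < 0) return false;
        int end = IndexOf(file, endMarker, begin + beginMarker.Length);
        if (end < 0) return false;
        int close = IndexOf(file, piClose, end + endMarker.Length);
        if (close < 0) return false;

        start = begin;
        length = close + piClose.Length - begin;
        return true;
    }
}
```

When editing in place, the replacement packet has to occupy exactly the same number of bytes as the original, which is what the whitespace padding inside the packet is for.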
You can use the following functions to read/write the binary data:
public byte[] GetBinaryData(string path, int bufferSize)
{
    MemoryStream ms = new MemoryStream();
    using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.Read))
    {
        int bytesRead;
        byte[] buffer = new byte[bufferSize];
        while((bytesRead = fs.Read(buffer,0,bufferSize))>0)
        {
            ms.Write(buffer,0,bytesRead);
        }
    }
    return(ms.ToArray());
}
public void SaveBinaryData(string path, byte[] data, int bufferSize)
{
    using (FileStream fs = File.Open(path, FileMode.Create, FileAccess.Write))
    {
        int totalBytesSaved = 0;
        while (totalBytesSaved<data.Length)
        {
            int remainingBytes = Math.Min(bufferSize, data.Length - totalBytesSaved);
            fs.Write(data, totalBytesSaved, remainingBytes);
            totalBytesSaved += remainingBytes;
        }
    }
}
However, loading entire images to memory would use quite a bit of RAM. I don't know much about XMP headers, but if possible you should:
Load only the headers in memory
Manipulate the headers in memory
Write the headers to a new file
Copy the remaining data from the original file
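The four steps above can be sketched like this. It is only an outline under strong assumptions: HeaderRewriter is a made-up name, the caller is assumed to already know how many bytes the old header occupies, and real image formats often store lengths or offsets that would also need updating when the header size changes.

```csharp
using System;
using System.IO;

public static class HeaderRewriter
{
    // Writes newHeader to destPath, then appends everything in sourcePath
    // that follows the first oldHeaderLength bytes, copying in small chunks
    // so the whole image never has to be held in memory.
    public static void ReplaceHeader(string sourcePath, string destPath,
                                     int oldHeaderLength, byte[] newHeader)
    {
        using (var src = new FileStream(sourcePath, FileMode.Open, FileAccess.Read))
        using (var dst = new FileStream(destPath, FileMode.Create, FileAccess.Write))
        {
            dst.Write(newHeader, 0, newHeader.Length);

            // Skip the old header and stream the rest of the file.
            src.Seek(oldHeaderLength, SeekOrigin.Begin);
            var buffer = new byte[8192];
            int bytesRead;
            while ((bytesRead = src.Read(buffer, 0, buffer.Length)) > 0)
            {
                dst.Write(buffer, 0, bytesRead);
            }
        }
    }
}
```

The manual read loop is used instead of Stream.CopyTo so the sketch also compiles against older framework versions such as the one mentioned in the question.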
