C# decode (decompress) Deflate data of PDF File

I would like to decompress some Deflate-encoded data (extracted from a PDF) in C#.
Unfortunately, every attempt throws the exception "Found invalid data while decoding.".
But the data is valid.
private void Decompress()
{
    FileStream fs = new FileStream(@"S:\Temp\myFile.bin", FileMode.Open);
    // First two bytes are irrelevant
    fs.ReadByte();
    fs.ReadByte();
    DeflateStream d_Stream = new DeflateStream(fs, CompressionMode.Decompress);
    StreamToFile(d_Stream, @"S:\Temp\myFile1.txt", FileMode.OpenOrCreate);
    d_Stream.Close();
    fs.Close();
}
private static void StreamToFile(Stream inputStream, string outputFile, FileMode fileMode)
{
    if (inputStream == null)
        throw new ArgumentNullException("inputStream");
    if (String.IsNullOrEmpty(outputFile))
        throw new ArgumentException("Argument null or empty.", "outputFile");
    using (FileStream outputStream = new FileStream(outputFile, fileMode, FileAccess.Write))
    {
        int cnt = 0;
        const int LEN = 4096;
        byte[] buffer = new byte[LEN];
        while ((cnt = inputStream.Read(buffer, 0, LEN)) != 0)
            outputStream.Write(buffer, 0, cnt);
    }
}
Does anyone have any ideas?
Thanks.

I added this to generate test data:
private static void Compress()
{
    FileStream fs = new FileStream(@"C:\Temp\myFile.bin", FileMode.Create);
    DeflateStream d_Stream = new DeflateStream(fs, CompressionMode.Compress);
    for (byte n = 0; n < 255; n++)
        d_Stream.WriteByte(n);
    d_Stream.Close();
    fs.Close();
}
I modified Decompress like this:
private static void Decompress()
{
    FileStream fs = new FileStream(@"C:\Temp\myFile.bin", FileMode.Open);
    // First two bytes are irrelevant
    // fs.ReadByte();
    // fs.ReadByte();
    DeflateStream d_Stream = new DeflateStream(fs, CompressionMode.Decompress);
    StreamToFile(d_Stream, @"C:\Temp\myFile1.txt", FileMode.OpenOrCreate);
    d_Stream.Close();
    fs.Close();
}
I ran it like this:
static void Main(string[] args)
{
    Compress();
    Decompress();
}
And got no errors.
I conclude that either the first two bytes are relevant (they obviously are for my particular test data) or your data has a problem.
Can we have some of your test data to play with?
(Obviously not if it's sensitive.)

private static string decompress(byte[] input)
{
    // Drop the first two bytes before handing the data to DeflateStream
    byte[] cutinput = new byte[input.Length - 2];
    Array.Copy(input, 2, cutinput, 0, cutinput.Length);
    var stream = new MemoryStream();
    using (var compressStream = new MemoryStream(cutinput))
    using (var decompressor = new DeflateStream(compressStream, CompressionMode.Decompress))
        decompressor.CopyTo(stream);
    return Encoding.Default.GetString(stream.ToArray());
}
Thank you user159335 and user1011394 for putting me on the right track! Just pass all bytes of the stream as the input of the function above. Make sure the byte count matches the length specified.
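
Why dropping two bytes helps: PDF FlateDecode data is normally wrapped in the zlib format (RFC 1950), that is, a 2-byte header, then the raw Deflate stream (RFC 1951), then a 4-byte Adler-32 checksum, whereas DeflateStream expects only the raw Deflate part. Below is a minimal sketch that validates the header before skipping it instead of dropping two bytes blindly. The names ZlibHeader, LooksLikeZlib and InflateZlib are made up for illustration, and the sketch assumes no preset dictionary is used.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class ZlibHeader
{
    // True when the first two bytes form a valid zlib header (RFC 1950):
    // the low nibble of CMF is 8 (Deflate) and (CMF * 256 + FLG) is a multiple of 31.
    public static bool LooksLikeZlib(byte[] data)
    {
        if (data == null || data.Length < 2) return false;
        int cmf = data[0], flg = data[1];
        return (cmf & 0x0F) == 8 && ((cmf << 8) | flg) % 31 == 0;
    }

    // Inflates zlib-wrapped data by skipping the 2-byte header.
    // Assumption: no preset dictionary (FDICT bit clear); the Adler-32
    // trailer is not verified here.
    public static byte[] InflateZlib(byte[] data)
    {
        if (!LooksLikeZlib(data))
            throw new InvalidDataException("Not a zlib stream.");
        using (var input = new MemoryStream(data, 2, data.Length - 2))
        using (var inflater = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            inflater.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

If LooksLikeZlib returns false, the data is probably either raw Deflate already or not Flate data at all, which is worth checking before skipping anything.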

All you need to do is use GZip instead of Deflate. Below is the code I use for the content of the stream… endstream section in a PDF document:
using System.IO.Compression;
public void DecompressStreamData(byte[] data)
{
    int start = 0;
    // skip leading CR, LF
    while (start < data.Length && ((data[start] == 0x0a) || (data[start] == 0x0d))) start++;
    byte[] tempdata = new byte[data.Length - start];
    Array.Copy(data, start, tempdata, 0, data.Length - start);
    MemoryStream msInput = new MemoryStream(tempdata);
    MemoryStream msOutput = new MemoryStream();
    try
    {
        GZipStream decomp = new GZipStream(msInput, CompressionMode.Decompress);
        decomp.CopyTo(msOutput);
    }
    catch (Exception e)
    {
        MessageBox.Show(e.Message);
    }
}

None of the solutions worked for me on Deflate attachments in a PDF/A-3 document. Some research showed that .NET's DeflateStream does not support compressed streams with a header and trailer as per RFC 1950.
Error message for reference: The archive entry was compressed using an unsupported compression method.
The solution is to use an alternative library, SharpZipLib.
Here is a simple method that successfully decoded a Deflate attachment from a PDF/A-3 file for me:
public static string SZLDecompress(byte[] data) {
    var outputStream = new MemoryStream();
    using var compressedStream = new MemoryStream(data);
    using var inputStream = new InflaterInputStream(compressedStream);
    inputStream.CopyTo(outputStream);
    outputStream.Position = 0;
    return Encoding.Default.GetString(outputStream.ToArray());
}
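
On .NET 6 and later, the framework's own System.IO.Compression.ZLibStream reads the RFC 1950 wrapper directly, so a third-party library may not be needed there. A minimal sketch, assuming the attachment bytes are a complete zlib stream; the class and method names are placeholders, and the Compress counterpart is only there to produce test input:

```csharp
using System.IO;
using System.IO.Compression;

public static class ZlibNet6
{
    // Requires .NET 6 or later (ZLibStream does not exist in .NET Framework).
    public static byte[] Decompress(byte[] zlibData)
    {
        using var input = new MemoryStream(zlibData);
        using var zlib = new ZLibStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        zlib.CopyTo(output);
        return output.ToArray();
    }

    // Counterpart that produces zlib-wrapped data, e.g. for testing Decompress.
    public static byte[] Compress(byte[] raw)
    {
        using var output = new MemoryStream();
        using (var zlib = new ZLibStream(output, CompressionMode.Compress, leaveOpen: true))
        {
            zlib.Write(raw, 0, raw.Length);
        }
        return output.ToArray();
    }
}
```

The result can be turned into text with the same Encoding call as in the method above, provided the attachment really is text in that encoding.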

Related

Value already read, or no value when trying to read from a Stream

I've been trying this for a long time, but it keeps giving me an error. I have an array of bytes that should represent an NBT document. I would like to convert it into a C# object using the fNbt library.
Here is my code:
byte[] buffer = Convert.FromBase64String(value);
byte[] decompressed;
using (var inputStream = new MemoryStream(buffer))
{
    using var outputStream = new MemoryStream();
    using (var gzip = new GZipStream(inputStream, CompressionMode.Decompress, leaveOpen: true))
    {
        gzip.CopyTo(outputStream);
    }
    fNbt.NbtReader reader = new fNbt.NbtReader(outputStream, true);
    var output = reader.ReadValueAs<AuctionItem>(); //Error: Value already read, or no value to read.
    return output;
}
When I try this, it works:
decompressed = outputStream.ToArray();
outputStream.Seek(0, SeekOrigin.Begin);
outputStream.Read(new byte[1000], 0, decompressed.Count() - 1);
But when I try this, it doesn't:
outputStream.Seek(0, SeekOrigin.Begin);
fNbt.NbtReader reader = new fNbt.NbtReader(outputStream, true);
reader.ReadValueAs<AuctionItem>();

NbtReader, like most stream readers, begins reading from the current position of whatever stream you give it. Since you have just finished writing to outputStream, that position is the end of the stream, which means there is nothing left to read at that point.
The solution is to seek outputStream back to the beginning before reading from it:
outputStream.Seek(0, SeekOrigin.Begin); // <-- seek to the beginning
// Do the read
fNbt.NbtReader reader = new fNbt.NbtReader(outputStream, true);
var output = reader.ReadValueAs<AuctionItem>(); // No error anymore
return output;
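
The effect is easy to reproduce without fNbt. The sketch below uses only a MemoryStream (StreamPositionDemo and WriteThenRead are illustrative names, not part of any library): it writes some bytes, optionally rewinds, and reports how many bytes a subsequent Read returns.

```csharp
using System;
using System.IO;

public static class StreamPositionDemo
{
    // Writes payload into a MemoryStream and tries to read it back,
    // optionally rewinding first. Returns the number of bytes read.
    public static int WriteThenRead(byte[] payload, bool rewind)
    {
        using (var stream = new MemoryStream())
        {
            stream.Write(payload, 0, payload.Length); // Position is now at the end
            if (rewind)
                stream.Seek(0, SeekOrigin.Begin);     // same effect as stream.Position = 0
            var buffer = new byte[payload.Length];
            return stream.Read(buffer, 0, buffer.Length);
        }
    }
}
```

For a three-byte payload, WriteThenRead returns 0 without the rewind and 3 with it.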

The solution is as follows. NbtReader.ReadValueAs does not treat an NbtCompound or NbtList as a value. I wrote this small reader, but it is not finished yet (I will update the code once it is).
public static T ReadValueAs<T>(string value) where T: new()
{
    byte[] buffer = Convert.FromBase64String(value);
    using (var inputStream = new MemoryStream(buffer))
    {
        using var outputStream = new MemoryStream();
        using (var gzip = new GZipStream(inputStream, CompressionMode.Decompress, leaveOpen: true))
        {
            gzip.CopyTo(outputStream);
        }
        outputStream.Seek(0, SeekOrigin.Begin);
        return new EasyNbt.NbtReader(outputStream).ReadValueAs<T>();
    }
}
This is the NbtReader:
private MemoryStream MemStream { get; set; }
public NbtReader(MemoryStream memStream)
{
    MemStream = memStream;
}
public T ReadValueAs<T>() where T: new()
{
    return ReadTagAs<T>(new fNbt.NbtReader(MemStream, true).ReadAsTag());
}
private T ReadTagAs<T>(fNbt.NbtTag nbtTag)
{
    //Reads to the root and adds to T...
}

FileStream is not working with relative path

I'm trying to use a FileStream with a relative path but it is not working.
var pic = ReadFile("~/Images/money.png");
It is working when I use something like:
var p = GetFilePath();
var pic = ReadFile(p);
The rest of the code (from SO):
public static byte[] ReadFile(string filePath)
{
    byte[] buffer;
    FileStream fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read);
    try
    {
        int length = (int)fileStream.Length; // get file length
        buffer = new byte[length]; // create buffer
        int count; // actual number of bytes read
        int sum = 0; // total number of bytes read
        // read until Read method returns 0 (end of the stream has been reached)
        while ((count = fileStream.Read(buffer, sum, length - sum)) > 0)
            sum += count; // sum is a buffer offset for next reading
    }
    finally
    {
        fileStream.Close();
    }
    return buffer;
}
public string GetFilePath()
{
    return HttpContext.Current.Server.MapPath("~/Images/money.png");
}
I don't get why it is not working, because the FileStream constructor allows relative paths.

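I'm assuming the folder containing your program has a subfolder Images, which contains your image file:
\folder\program.exe
\folder\Images\money.png
Try without the "~". FileStream does not expand "~"; in ASP.NET that prefix is only resolved by methods such as Server.MapPath, which is why your GetFilePath version works.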
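
Note that a relative path given to FileStream is resolved against the process's current working directory, which is not necessarily the program's folder. A hedged sketch that anchors the path to the application base directory instead; PathDemo and FromAppBase are illustrative names, and the Images/money.png layout is the one assumed in this answer:

```csharp
using System;
using System.IO;

public static class PathDemo
{
    // Builds an absolute path below the application's base directory,
    // independent of the current working directory.
    public static string FromAppBase(params string[] parts)
    {
        return Path.Combine(AppContext.BaseDirectory, Path.Combine(parts));
    }
}
```

Usage would look like ReadFile(PathDemo.FromAppBase("Images", "money.png")). Inside a classic ASP.NET request, Server.MapPath as used in the question remains the usual way to map "~/..." paths.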

I had the same issue and solved it with the code below.
Try one of these; I hope it solves your issue too.
#region GetImageStream
public static Stream GetImageStream(string Image64string)
{
    Stream imageStream = new MemoryStream();
    if (!string.IsNullOrEmpty(Image64string))
    {
        byte[] imageBytes = Convert.FromBase64String(Image64string.Substring(Image64string.IndexOf(',') + 1));
        using (Image targetimage = BWS.AWS.S3.ResizeImage(System.Drawing.Image.FromStream(new MemoryStream(imageBytes, false)), new Size(1600, 1600), true))
        {
            targetimage.Save(imageStream, ImageFormat.Jpeg);
        }
    }
    return imageStream;
}
#endregion
Second one:
#region GetImageStream
public static Stream GetImageStream(Stream stream)
{
    Stream imageStream = new MemoryStream();
    if (stream != null)
    {
        using (Image targetimage = BWS.AWS.S3.ResizeImage(System.Drawing.Image.FromStream(stream), new Size(1600, 1600), true))
        {
            targetimage.Save(imageStream, ImageFormat.Jpeg);
        }
    }
    return imageStream;
}
#endregion

C# - Compress byte[]

I receive a zip file as a base64 string, convert it to a byte[], open it in memory, modify the content, and then want to 'compress' the new byte[] back into a base64 string.
My problem is that I don't know how to 'compress' the new byte[] into zip format.
public string ModifyZipContent(string base64) {
    ZipPackage zipPackage = null;
    MemoryStream memoryStream = null;
    long length;
    byte[] data = Convert.FromBase64String(base64);
    byte[] buffer;
    byte[] newData;
    int arrayOffset = 0;
    memoryStream = new MemoryStream();
    memoryStream.Write(data, 0, data.Length);
    zipPackage = (ZipPackage)Package.Open(memoryStream, FileMode.Open);
    PackagePartCollection zipParts = zipPackage.GetParts();
    // this is awful
    foreach(ZipPackagePart zipPart in zipParts) {
        using(Stream stream = zipPart.GetStream()) {
            arrayOffset += (int)stream.Length;
        }
    }
    newData = new byte[arrayOffset];
    // end
    arrayOffset = 0;
    foreach(ZipPackagePart zipPart in zipParts) {
        using(Stream stream = zipPart.GetStream()) {
            length = stream.Length;
            buffer = new byte[length];
            stream.Read(buffer, 0, (int)length);
            Buffer.BlockCopy(buffer, 0, newData, arrayOffset, buffer.Length);
            arrayOffset += buffer.Length;
        }
    }
    return Convert.ToBase64String(newData);
}

I haven't fully tested this, but something along these lines should work...
// Requires System.IO.Compression using statement.
byte[] bytes = new byte[256]; // Your byte[] would be here instead of this empty one.
using (var zipFile = ZipFile.Open("C:/ZipFile.zip", ZipArchiveMode.Update))
{
    var entry = zipFile.CreateEntry("YourEntryPathHere");
    using (var stream = entry.Open())
    {
        stream.Write(bytes, 0, bytes.Length);
    }
}
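
Since the question works entirely in memory and wants a base64 string back, the same idea can be applied to a MemoryStream rather than a file on disk. A minimal sketch using ZipArchive; the class name, method name and single-entry shape are assumptions made for illustration:

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class InMemoryZip
{
    // Creates a zip archive with a single entry entirely in memory
    // and returns it as a base64 string.
    public static string ZipToBase64(string entryName, byte[] content)
    {
        using (var memory = new MemoryStream())
        {
            // leaveOpen: true keeps 'memory' usable after the archive is disposed;
            // disposing the archive is what writes the zip central directory.
            using (var archive = new ZipArchive(memory, ZipArchiveMode.Create, leaveOpen: true))
            {
                ZipArchiveEntry entry = archive.CreateEntry(entryName);
                using (Stream entryStream = entry.Open())
                {
                    entryStream.Write(content, 0, content.Length);
                }
            }
            return Convert.ToBase64String(memory.ToArray());
        }
    }
}
```

For several modified parts, the inner block would presumably loop over them and create one entry per part before the archive is disposed.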

uncompressed file is bigger than original file in GZIP

I'm using the following function to compress (thanks to http://www.dotnetperls.com/):
public static void CompressStringToFile(string fileName, string value)
{
    // A.
    // Write string to temporary file.
    string temp = Path.GetTempFileName();
    File.WriteAllText(temp, value);
    // B.
    // Read file into byte array buffer.
    byte[] b;
    using (FileStream f = new FileStream(temp, FileMode.Open))
    {
        b = new byte[f.Length];
        f.Read(b, 0, (int)f.Length);
    }
    // C.
    // Use GZipStream to write compressed bytes to target file.
    using (FileStream f2 = new FileStream(fileName, FileMode.Create))
    using (GZipStream gz = new GZipStream(f2, CompressionMode.Compress, false))
    {
        gz.Write(b, 0, b.Length);
    }
}
And to decompress:
static byte[] Decompress(byte[] gzip)
{
    // Create a GZIP stream with decompression mode.
    // ... Then create a buffer and write into while reading from the GZIP stream.
    using (GZipStream stream = new GZipStream(new MemoryStream(gzip), CompressionMode.Decompress))
    {
        const int size = 4096;
        byte[] buffer = new byte[size];
        using (MemoryStream memory = new MemoryStream())
        {
            int count = 0;
            do
            {
                count = stream.Read(buffer, 0, size);
                if (count > 0)
                {
                    memory.Write(buffer, 0, count);
                }
            }
            while (count > 0);
            return memory.ToArray();
        }
    }
}
My goal is to compress log files, then decompress them in memory and compare the uncompressed data to the original file, in order to check that the compression succeeded and that I can open the compressed file successfully.
The problem is that the uncompressed data is bigger than the original file most of the time, so my comparison fails although the compression probably succeeded.
Any idea why?
By the way, here is how I compare the uncompressed data to the original file:
static bool FileEquals(byte[] file1, byte[] file2)
{
    if (file1.Length == file2.Length)
    {
        for (int i = 0; i < file1.Length; i++)
        {
            if (file1[i] != file2[i])
            {
                return false;
            }
        }
        return true;
    }
    return false;
}

Try this method to compress a file:
public static byte[] Compress(byte[] raw)
{
    using (MemoryStream memory = new MemoryStream())
    {
        using (GZipStream gzip = new GZipStream(memory,
            CompressionMode.Compress, true))
        {
            gzip.Write(raw, 0, raw.Length);
        }
        return memory.ToArray();
    }
}
And this to decompress:
static byte[] Decompress(byte[] gzip)
{
    // Create a GZIP stream with decompression mode.
    // ... Then create a buffer and write into while reading from the GZIP stream.
    using (GZipStream stream = new GZipStream(new MemoryStream(gzip), CompressionMode.Decompress))
    {
        const int size = 4096;
        byte[] buffer = new byte[size];
        using (MemoryStream memory = new MemoryStream())
        {
            int count = 0;
            do
            {
                count = stream.Read(buffer, 0, size);
                if (count > 0)
                {
                    memory.Write(buffer, 0, count);
                }
            }
            while (count > 0);
            return memory.ToArray();
        }
    }
}
Tell me if it worked.
Good luck.

I think you'd be better off with the simplest API call: try Stream.CopyTo(). I can't find the error in your code. If I were working on it, I'd make sure everything is flushed properly. I can't recall whether GZipStream flushes its output to the FileStream when the using block closes, but then you are also saying that the final file is larger, not smaller.
Anyhow, the best policy in my experience: don't rewrite gotcha-prone code when you don't need to. At least you tested it ;)
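
A minimal sketch of that suggestion, working on byte arrays only; GZipRoundTrip and its method names are illustrative. Disposing the compressing GZipStream is what flushes the remaining data and the gzip footer, so the compressed bytes are taken only after its using block ends.

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class GZipRoundTrip
{
    public static byte[] Compress(byte[] raw)
    {
        using (var output = new MemoryStream())
        {
            // Disposing the GZipStream flushes the remaining data and the gzip footer.
            using (var gzip = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return output.ToArray();
        }
    }

    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }

    // True when compressing and then decompressing returns the original bytes.
    public static bool RoundTrips(byte[] raw)
    {
        return Decompress(Compress(raw)).SequenceEqual(raw);
    }
}
```

When verifying a log file, feeding File.ReadAllBytes(path) into RoundTrips keeps text encoding out of the comparison. If the content is first converted to a string and written back with File.WriteAllText, the resulting bytes can differ from the original file (encoding, byte order mark), which is one possible cause of a size mismatch.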

zlib.net code example for C# that takes byte[] as input argument

I spent 3 hours searching for how to decompress a string using Zlib.net.dll and did not find anything useful.
My string is compressed by an old VB6 program that uses zlib.dll, and I do not want to go through a file each time I decompress a string.
The problem is that you need to know the original size of the byte[] before compression.
Alternatively, you can use a dynamic array for decoding the data.
The code is here:
private string ZlibNetDecompress(string iCompressData, uint OriginalSize)
{
    byte[] todecode_byte = Convert.FromBase64String(iCompressData);
    byte[] lDecodeData = new byte[OriginalSize];
    string lTempoString = System.Text.Encoding.Unicode.GetString(todecode_byte);
    todecode_byte = System.Text.Encoding.Default.GetBytes(lTempoString);
    string lReVal = "";
    MemoryStream outStream = new MemoryStream();
    MemoryStream InStream = new MemoryStream(todecode_byte);
    zlib.ZOutputStream outZStream = new zlib.ZOutputStream(outStream);
    try
    {
        CopyStream(InStream, outZStream);
        lDecodeData = outStream.GetBuffer();
        lReVal = System.Text.Encoding.Default.GetString(lDecodeData);
    }
    finally
    {
        outZStream.Close();
        InStream.Close();
    }
    return lReVal;
}
private void CopyStream(System.IO.Stream input, System.IO.Stream output)
{
    byte[] buffer = new byte[2000];
    int len;
    while ((len = input.Read(buffer, 0, 2000)) > 0)
    {
        output.Write(buffer, 0, len);
    }
    output.Flush();
}

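You could use the DeflateStream class from the framework.
var data = new byte[resultSizeMax];
using (Stream ds = new DeflateStream(stream, CompressionMode.Decompress))
{
    // Read up to 1000 bytes; stop early if the stream ends (Read returns 0).
    int i = 0, read;
    while (i < 1000 && (read = ds.Read(data, i, 1000 - i)) > 0)
        i += read;
}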
