Decompressing a Zip file from a string - c#

I'm fetching an object from couchbase where one of the fields has a file. The file is zipped and then encoded in base64.
How would I be able to take this string and decompress it back to the original file?
Then, if I'm using ASP.NET MVC 4, how would I send it back to the browser as a downloadable file?
The original file is being created on a Linux system and decoded on a Windows system (C#).

You should use Convert.FromBase64String to get the bytes, then decompress them, and then use Controller.File to have the client download the file. To decompress, you need to open the zip file with a ZIP library. .NET 4.5's built-in ZipArchive class should work. Alternatively, you could use another library; both SharpZipLib and DotNetZip support reading from streams.
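// Needs System, System.IO, System.IO.Compression, System.Linq, System.Web and System.Web.Mvc
public ActionResult MyAction()
{
    string base64String = // get from Linux system
    byte[] zipBytes = Convert.FromBase64String(base64String);
    using (var zipStream = new MemoryStream(zipBytes))
    using (var zipArchive = new ZipArchive(zipStream))
    {
        // Single() throws unless the archive holds exactly one entry
        var entry = zipArchive.Entries.Single();
        string mimeType = MimeMapping.GetMimeMapping(entry.Name);
        using (var decompressedStream = entry.Open())
        using (var buffer = new MemoryStream())
        {
            // Copy the entry out now: a stream-based result would only be read
            // after this method returns, when the archive is already disposed.
            decompressedStream.CopyTo(buffer);
            return File(buffer.ToArray(), mimeType, entry.Name);
        }
    }
}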
You'll also need the MIME type of the file; MimeMapping.GetMimeMapping can give you that for most common types.

I've used SharpZipLib successfully for this type of task in the past.
For an example that's very close to what you need to do have a look here.
Basically, the steps should be something like this:
you get the compressed input as a string from the database
create a MemoryStream and write the decoded bytes to it (the string is Base64, so decode it with Convert.FromBase64String first)
seek back to the beginning of the memory stream
use the MemoryStream as an input to the SharpZipLib ZipFile class
follow the example provided above to unpack the contents of the ZipFile
Update
If the string contains only the zipped contents of the file (not a full Zip archive) then you can simply use the GZipStream class in .NET to unzip the contents. You can find a sample here. But the initial steps are the same as above (get string from db, write to memory stream, feed memory stream as input to the GZipStream to decompress).
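The gzip path described in the update can be sketched as follows. This is a minimal sketch, assuming the database field holds Base64 text wrapping a single gzip stream; the class name Base64Gzip and both method names are made up for illustration, and Compress exists only so the round trip can be tried without a real database value.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class Base64Gzip
{
    // Decodes a Base64 string and inflates the gzip data it contains.
    public static byte[] Decompress(string base64)
    {
        byte[] compressed = Convert.FromBase64String(base64);
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }

    // Inverse helper, included only so the sketch can be exercised end to end.
    public static string Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            // leaveOpen: true keeps 'output' usable after the gzip stream is flushed and closed
            using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
            {
                gzip.Write(data, 0, data.Length);
            }
            return Convert.ToBase64String(output.ToArray());
        }
    }
}
```

If the data turns out to be a full Zip archive instead, GZipStream will reject it with an invalid magic number error, and the ZipArchive or SharpZipLib route applies.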

Related

System.Text.Encoding.Default.GetBytes fails

Here is my sample code:
Code Snippet 1: This code executes on my file repository server and returns the file as an encoded string through the WCF service:
byte[] fileBytes = new byte[0];
using (FileStream stream = System.IO.File.OpenRead(@"D:\PDFFiles\Sample1.pdf"))
{
    fileBytes = new byte[stream.Length];
    stream.Read(fileBytes, 0, fileBytes.Length);
    stream.Close();
}
string retVal = System.Text.Encoding.Default.GetString(fileBytes); // fileBytes size is 209050
Code Snippet 2:
The client machine that requested the PDF file receives the encoded string, converts it back to a PDF and saves it locally.
byte[] encodedBytes = System.Text.Encoding.Default.GetBytes(retVal); /// GETTING corrupted here
string pdfPath = @"C:\DemoPDF\Sample2.pdf";
using (FileStream fileStream = new FileStream(pdfPath, FileMode.Create)) //encodedBytes is 327279
{
    fileStream.Write(encodedBytes, 0, encodedBytes.Length);
    fileStream.Close();
}
The above code works absolutely fine on .NET Framework 4.5 and 4.6.1.
When I use the same code in ASP.NET Core 2.0, it fails to convert to a byte array properly. I am not getting any runtime error, but the final PDF cannot be opened after it is created; the viewer reports that the PDF file is corrupted.
I also tried Encoding.Unicode and Encoding.UTF8, but I get the same error for the final PDF.
I have also noticed that with Encoding.Unicode at least the original byte array and the resulting byte array have the same size. With the other encodings the sizes do not match either.
So the question is: is System.Text.Encoding.Default.GetBytes broken in .NET Core 2.0?
I have edited my question for better understanding.
Sample1.pdf exists on a different server, which uses WCF to transmit the data to the client; the client receives the encoded string and writes it out as Sample2.pdf.
Hopefully my question makes some sense now.
1: the number of times you should ever use Encoding.Default is essentially zero; there may be a hypothetical case, but if there is one: it is elusive
2: PDF files are not text, so trying to use an Encoding on them is just... wrong; you aren't "GETTING corrupted here" - it just isn't text.
You may wish to see Extracting text from PDFs in C# or Reading text from PDF in .NET
If you simply wish to copy the content without parsing it: File.Copy or Stream.CopyTo are good options.
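To make the point concrete, here is a small sketch, not the asker's service code. It assumes that if the WCF contract really must carry a string, Base64 would be the text form to use (returning a byte[] or a Stream avoids the question entirely). The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

public static class BinaryTransfer
{
    // Copies bytes from one stream to another without interpreting them.
    public static void Copy(Stream source, Stream destination)
    {
        source.CopyTo(destination);
    }

    // Lossless: Base64 maps every byte sequence to text and back again.
    public static byte[] RoundTripBase64(byte[] data)
    {
        return Convert.FromBase64String(Convert.ToBase64String(data));
    }

    // Lossy for arbitrary binary data: byte sequences that are not valid
    // UTF-8 are replaced with U+FFFD while decoding, so they cannot be recovered.
    public static byte[] RoundTripUtf8(byte[] data)
    {
        return Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(data));
    }
}
```

Running both round trips over a few bytes that include values such as 0xFF shows the difference: the Base64 result matches the input, while the text-encoding result does not.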

Serializing Json directly into AWS S3 bucket with Newtonsoft.Json

I have an object that has to be converted to Json format and uploaded via Stream object. This is the AWS S3 upload code:
AWSS3Client.PutObjectAsync(new PutObjectRequest()
{
    InputStream = stream,
    BucketName = name,
    Key = keyName
}).Wait();
Here stream is of type Stream and is read by AWSS3Client.
The data that I am uploading is a complex object that has to be in Json format.
I can convert the object to a string using JsonConvert.SerializeObject, or serialize it to a file using JsonSerializer. But since the amount of data is quite significant, I would prefer to avoid a temporary string or file and convert the object to a readable Stream right away. My ideal code would look something like this:
AWSS3Client.PutObjectAsync(new PutObjectRequest()
{
    InputStream = MagicJsonConverter.ToStream(myDataObject),
    BucketName = name,
    Key = keyName
}).Wait();
Is there a way to achieve this using Newtonsoft.Json ?
You need two things here: first, a producer/consumer stream, e.g. BlockingStream from this StackOverflow question, and second, a Json.Net serializer writing to this stream, as in this other SO question.
Another practical option is to wrap the memory stream in a gzip stream (2 lines of code).
JSON usually compresses very well (a 1GB file can shrink to around 50MB).
Then, when serving the stream to S3, wrap it in a gzip stream that decompresses it.
I guess the trade-off compared with a temp file is CPU vs IO (both will probably work well). If you can store it compressed on S3, it will save you space and increase networking efficiency too.
Example code:
var compressed = new MemoryStream();
// leaveOpen: true keeps 'compressed' usable after the gzip stream is disposed
using (var zip = new GZipStream(compressed, CompressionLevel.Fastest, true))
{
    // Write to zip stream...
}
compressed.Seek(0, SeekOrigin.Begin);
// Use stream to upload to S3
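The same idea can be packaged as a helper. This is only a sketch under assumptions: GzipBuffer and its Compress method are hypothetical names, and the Action<Stream> parameter stands in for whatever serializer writes the JSON, so the block itself depends on nothing beyond the base class library.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class GzipBuffer
{
    // Runs the caller's writer against a gzip stream and returns the
    // compressed bytes as a rewound MemoryStream, ready to be read.
    public static MemoryStream Compress(Action<Stream> write)
    {
        var compressed = new MemoryStream();
        // leaveOpen: true so disposing the gzip stream does not close 'compressed'
        using (var zip = new GZipStream(compressed, CompressionLevel.Fastest, true))
        {
            write(zip);
        }
        compressed.Seek(0, SeekOrigin.Begin);
        return compressed;
    }
}
```

With Json.NET, the delegate would typically wrap the stream in a StreamWriter, call JsonSerializer.Serialize(writer, myDataObject), and flush the writer before returning, so that everything reaches the gzip stream before it is disposed.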

Unzip Byte Array in C#

I'm working on the EBICS protocol and I want to read data from an XML file to compare it with another file.
I have successfully decoded the data from base64 using
Convert.FromBase64String(OrderData); but now I have a byte array.
To read the content I have to unzip it. I tried to unzip it using GZip, like in this example:
static byte[] Decompress(byte[] data)
{
    using (var compressedStream = new MemoryStream(data))
    using (var zipStream = new GZipStream(compressedStream, CompressionMode.Decompress))
    using (var resultStream = new MemoryStream())
    {
        zipStream.CopyTo(resultStream);
        return resultStream.ToArray();
    }
}
But it does not work; I get this error message:
the magic number in gzip header is not correct. make sure you are passing in a gzip stream
Now I have no idea how to unzip it, please help me!
Thanks!
The first four bytes provided by the OP in a comment to another answer, 0x78 0xda 0xe5 0x98, are the start of a zlib stream. It is neither gzip, nor zip, but zlib. You need a ZlibStream, which for some reason Microsoft does not provide (this was true when the answer was written; .NET 6 later added System.IO.Compression.ZLibStream). That's fine though, since what Microsoft does provide is buggy.
You should use DotNetZip, which provides ZlibStream, and it works.
Try using SharpZipLib. It copes with various compression formats and is free: current versions are released under the MIT license, while older versions used the GPL with a linking exception.
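As others have pointed out, I suspect you have a zip stream and not gzip. Check the first 4 bytes in a hex view: ZIP files normally start with the signature 0x04034b50 (stored little-endian, so the bytes read 50 4B 03 04, i.e. "PK" followed by 03 04; see the ZIP file format wiki), whereas GZIP files start with the two bytes 1F 8B (see the GZIP file format wiki).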
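A quick way to run that check in code is to look at the leading bytes. The sketch below is illustrative only: CompressionSniffer and Detect are hypothetical names, and the zlib test follows the header rule from RFC 1950 (compression method 8 in the low nibble, and the first two bytes read as a big-endian number divisible by 31).

```csharp
using System;

public static class CompressionSniffer
{
    // Guesses the container format from the first bytes of the data.
    public static string Detect(byte[] data)
    {
        if (data == null || data.Length < 2) return "unknown";

        // ZIP local file header signature: 'P' 'K' 0x03 0x04
        if (data.Length >= 4 && data[0] == 0x50 && data[1] == 0x4B
            && data[2] == 0x03 && data[3] == 0x04) return "zip";

        // gzip magic number: 0x1F 0x8B
        if (data[0] == 0x1F && data[1] == 0x8B) return "gzip";

        // zlib: method 8 (deflate) and (CMF * 256 + FLG) is a multiple of 31
        if ((data[0] & 0x0F) == 8 && ((data[0] << 8) | data[1]) % 31 == 0) return "zlib";

        return "unknown";
    }
}
```

For the four bytes quoted in this thread (0x78 0xda 0xe5 0x98) the function reports zlib, which matches the diagnosis in the first answer.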
I think I finally got it - as usual the problem is not what is in the title. Luckily I've noticed the word EBICS in your post. So, according to EBICS spec the data is first compressed, then encrypted and finally base64 encoded. As you see, after decoding base64 you need first to decrypt the data and then try to unzip it.
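UPDATE: If that's not the case, it turns out from the EBICS spec (Chapter 16, Appendix: Standards and references) that ZIP refers to the zlib/deflate format, so you can replace GZipStream with DeflateStream. Note that DeflateStream expects raw deflate data, so the 2-byte zlib header has to be skipped before the data is passed to it.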
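A minimal sketch of that DeflateStream approach follows. It assumes the input is a plain zlib stream without a preset dictionary; ZlibHelper and Inflate are hypothetical names, and the Adler-32 checksum at the end of the data is not verified. On .NET 6 and later, System.IO.Compression.ZLibStream handles the header and trailer itself, which makes the manual skip unnecessary.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class ZlibHelper
{
    // Inflates zlib-wrapped data by skipping the 2-byte zlib header
    // (CMF and FLG) and handing the raw deflate payload to DeflateStream.
    public static byte[] Inflate(byte[] zlibData)
    {
        if (zlibData == null || zlibData.Length < 2)
            throw new ArgumentException("Not a zlib stream.", nameof(zlibData));

        using (var input = new MemoryStream(zlibData, 2, zlibData.Length - 2))
        using (var deflate = new DeflateStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            deflate.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

In the EBICS case this would be called on the bytes returned by Convert.FromBase64String, after any decryption step the spec requires.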

DotNetZip Library read from one zip into another

Using the DotNetZip Library (http://dotnetzip.codeplex.com/) is there a way to move files from one zip file into another without extracting that file to disk first? Maybe extract to a stream, then update into the other zip from that same stream?
The zip files are password protected and the data in these zip files are meant to stay that way due to their licenses. If I simply extract to disk first then update the other zip there is a chance where those files can be intercepted by the user.
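Yes, you should be able to do something like:
var ms = new MemoryStream();
using (ZipFile zip = ZipFile.Read(sourceZipFile))
{
    // Extract the entry into memory instead of onto disk
    zip["NameOfEntryInArchive.doc"].Extract(ms);
}
// Rewind so the next archive reads the entry from the start
ms.Seek(0, SeekOrigin.Begin);
using (ZipFile zip = new ZipFile())
{
    zip.AddEntry("NameOfEntryInArchive.doc", ms);
    // The stream is read during Save, so it must still be open here
    zip.Save(zipToCreate);
}
(treat it as pseudocode since I didn't have a chance to compile it)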
Naturally you'll have to add your decryption/encryption to that, but those calls are equally straightforward.

How to get type and number of .zip contents

How can I get the content names of a zipped folder in C#, i.e. the names of the files and folders inside the compressed folder?
I want to decompress the zip using GZipStream only.
thanks,
kapil
You can't do this using GZipStream only. You will need an implementation of the ZIP standard such as #ziplib. Quote from MSDN:
Compressed GZipStream objects written to a file with an extension of .gz can be decompressed using many common compression tools; however, this class does not inherently provide functionality for adding files to or extracting files from .zip archives.
Example with #ziplib:
using (var stream = File.OpenRead("test.zip"))
using (var zipStream = new ZipInputStream(stream))
{
    ZipEntry entry;
    while ((entry = zipStream.GetNextEntry()) != null)
    {
        // entry.IsDirectory, entry.IsFile, ...
        Console.WriteLine(entry.Name);
    }
}
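If adding a third-party library is not an option, the ZipArchive class built into .NET 4.5 and later can list entries as well. The following is a sketch rather than a drop-in replacement: ZipListing and ListEntries are hypothetical names, and treating a name that ends in "/" as a folder is a convention of the ZIP format, not a property exposed by ZipArchiveEntry.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class ZipListing
{
    // Returns one line per entry, marking folders and files.
    public static List<string> ListEntries(Stream zipStream)
    {
        var names = new List<string>();
        // leaveOpen: true so the caller keeps ownership of the stream
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read, true))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                // Folder entries are stored with a trailing slash in their name
                bool isDirectory = entry.FullName.EndsWith("/", StringComparison.Ordinal);
                names.Add((isDirectory ? "[dir]  " : "[file] ") + entry.FullName);
            }
        }
        return names;
    }
}
```

A typical call would pass File.OpenRead("test.zip") and print each returned line; counting the list gives the number of entries.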
