Compress a file with GZipStream while maintaining its meta-data

How can I get the extension of compressed file after being compressed with System.IO.Compression.GZipStream?
For example, if the original file is named test.doc and compresses to test.gz, how do I know what file extension to use when decompressing?

GZipStream gives you no way to get the original file name - in fact there may never be a file name at all, if for example a piece of data is created in memory and then sent over a network connection.
Instead of replacing the file extension, why not append it, for example: test.doc.gz
Then you can simply strip it off when decompressing.
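The append-and-strip convention described above could look roughly like this. It is a minimal sketch: the class and method names (GzipNaming, Compress, Decompress) are made up for illustration, and it assumes .NET 4.0 or later for Stream.CopyTo.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class GzipNaming
{
    // Compresses "test.doc" to "test.doc.gz", so the original name stays recoverable.
    public static string Compress(string sourcePath)
    {
        string targetPath = sourcePath + ".gz";
        using (FileStream input = File.OpenRead(sourcePath))
        using (FileStream output = File.Create(targetPath))
        using (var gzip = new GZipStream(output, CompressionMode.Compress))
        {
            input.CopyTo(gzip);
        }
        return targetPath;
    }

    // Decompresses "test.doc.gz" back to "test.doc" by stripping the ".gz" suffix.
    public static string Decompress(string gzPath)
    {
        if (!gzPath.EndsWith(".gz", StringComparison.OrdinalIgnoreCase))
            throw new ArgumentException("Expected a path ending in .gz", nameof(gzPath));

        string targetPath = gzPath.Substring(0, gzPath.Length - 3);
        using (FileStream input = File.OpenRead(gzPath))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (FileStream output = File.Create(targetPath))
        {
            gzip.CopyTo(output);
        }
        return targetPath;
    }
}
```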

I had to do this some time ago. The solution is to use the J# libraries to do it. You still write it in C# however.
http://msdn.microsoft.com/en-us/magazine/cc164129.aspx
That's Microsoft's answer on the topic.

I'm not sure what your question is - I assume you want a mechanism to "remember" what the extension was before the compression took place?
If that is the question then the convention of test.doc compressing into test.doc.gz will work.

The test.gz is just a raw byte stream with no meta-data about what has been compressed (for example, original file name, extension etc). What you'd need to do is create an archive that contains the gzip stream and meta-data about each file contained in the archive.
The article linked to in Mech Software's answer provides a pretty reasonable way to implement this.
There was also this question (vaguely related) asked some time back which may help out:
How to compress a directory with the built in .net compression classes?
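To illustrate the idea of storing meta-data next to the gzip stream, here is a very small sketch for a single file. The layout (a length-prefixed UTF-8 file name followed by the gzip data) and the names NamedGzip, Pack and Unpack are assumptions made up for this example, not a standard format and not what the linked article does. It needs .NET 4.5 or later for the leaveOpen constructors.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class NamedGzip
{
    // Layout (an assumption of this sketch, not a standard format):
    // [length-prefixed UTF-8 file name][gzip stream of the file contents]
    public static void Pack(string sourcePath, Stream output)
    {
        using (var writer = new BinaryWriter(output, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write(Path.GetFileName(sourcePath));
        }
        using (FileStream input = File.OpenRead(sourcePath))
        using (var gzip = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
        {
            input.CopyTo(gzip);
        }
    }

    // Reads the stored name, then decompresses into targetDirectory under that name.
    public static string Unpack(Stream input, string targetDirectory)
    {
        string name;
        using (var reader = new BinaryReader(input, Encoding.UTF8, leaveOpen: true))
        {
            name = Path.GetFileName(reader.ReadString()); // drop any directory part
        }
        string targetPath = Path.Combine(targetDirectory, name);
        using (var gzip = new GZipStream(input, CompressionMode.Decompress, leaveOpen: true))
        using (FileStream output = File.Create(targetPath))
        {
            gzip.CopyTo(output);
        }
        return targetPath;
    }
}
```

A real multi-file archive would also need per-entry lengths so that entries can be located without decompressing everything before them.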

Related

How do I get the file version from bytes or stream?

I get the file version this way:
var fileVersion = FileVersionInfo.GetVersionInfo(path).FileVersion
But this option is not suitable for me, since I have to use a non-native tool to get the file that returns the stream. Can I get the file version from this stream or from an array of bytes?

Unfortunately, you can't do this directly. You should:
Write the file to disk in some sort of temporary location
Read the version from the file on disk
Delete the file

In short, no, what you want is not possible with the current tools. The problem is that, as you've noticed, FileVersionInfo.GetVersionInfo relies on a physical file to be present on disk. If you look at its internals, you'll see that all it really does is to delegate to the Windows API which does the real work, precisely in the GetFileVersionInfo function, which in turn also takes a file name as parameter, so it's only designed to operate from the filesystem.
A possible workaround would be to drop a temp file with the binary you got from your stream, get the version info you need, then delete the file.
Another option would be to look for a library that can parse in-memory exe/dll files and extract the relevant details directly from there.
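The temp-file workaround could be sketched like this. StreamVersionReader and GetFileVersion are hypothetical names chosen for the example; the only framework calls are Path.GetTempFileName, Stream.CopyTo and FileVersionInfo.GetVersionInfo.

```csharp
using System.Diagnostics;
using System.IO;

public static class StreamVersionReader
{
    // Copies the stream to a temporary file, reads its version info, then deletes the file.
    public static string GetFileVersion(Stream binary)
    {
        string tempPath = Path.GetTempFileName();
        try
        {
            using (FileStream temp = File.Create(tempPath))
            {
                binary.CopyTo(temp);
            }
            return FileVersionInfo.GetVersionInfo(tempPath).FileVersion;
        }
        finally
        {
            // Runs even if reading the version info throws.
            File.Delete(tempPath);
        }
    }
}
```

If you only have a byte array, wrapping it in a MemoryStream before calling the method should be enough.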

Unpacking tar/BZ2 files using C#

I have a tar.bz2 file and I want to extract it to a directory. The examples I found only show how to compress or decompress, but what I actually want is to extract (unpack) the contents.
I also tried ICSharpCode.SharpZipLib.BZip2, but I didn't find an option to unpack.

While you use a ZipInputStream for .zip files, you should use a BZip2InputStream for .bz2 files (and GZipInputStream for .gz files etc.).
Taken from:
How to decompress .bz2 file in C#?

Decompressing and unpacking are two different operations. A foo.tar.bz2 file is actually a foo.tar file which was then compressed using bz2.
So to get the individual files you have to reverse those steps. First decompress the file (which you managed to do with SharpZipLib). The result of this decompression then has to be untarred (which can also be done with SharpZipLib); see the docs for details.

Determine if an .iso is actually a video/movie in C#

I'm in a situation where I'd like to, using C#, look at .iso files that are in a directory and determine if they are indeed video discs (DVD/BD or similar).
I don't need to actually distinguish the type, just a blanket "yes this is a video disc". Is there a way to do this?

An ISO file is actually a CD image in file format. The easiest way to determine what is on it is to mount it with a virtual CD program. Or you can look at the file contents.
Here is the specification for ISO files:
http://users.telenet.be/it3.consultants.bvba/handouts/ISO9960.html
Once you can determine what information is on the disc, you can work out whether there is video on it by examining the contents of those files.
That is a much more daunting task than just determining the file structure.
This specification only covers ISO files. Other CD formats will need to be read using their own specifications.
You can determine whether the file is an ISO from its header data.
Here is a Stack Overflow question explaining this in a little more detail:
Using .NET, how can you find the mime type of a file based on the file signature not the extension
EDIT
Looking into the MIME type approach a little more reveals that Microsoft has to have a registered MIME type for that header data. It may not know that the file is an ISO and may tell you application/octet-stream. If this is the case, you can instead apply your own judgement to the raw bytes and look for things that tell you it is an ISO file you can handle. Usually you can tell what type and version a file is from the first 20 bytes or so, but ISO 9660 is an exception: its "CD001" identifier sits at byte offset 32769, at the start of the volume descriptors, so the first 256 bytes are not enough.
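A hand-rolled signature check along those lines might look like the sketch below. IsoSniffer and LooksLikeIso9660 are names invented for this example. It only looks for the ISO 9660 identifier "CD001" in the first volume descriptor (sector 16, 2048-byte sectors), so an image that carries only a UDF file system would not pass.

```csharp
using System.IO;
using System.Text;

public static class IsoSniffer
{
    // ISO 9660 volume descriptors start at sector 16 (16 * 2048 bytes).
    // Byte 0 of a descriptor is its type; bytes 1-5 hold the identifier "CD001".
    private const long SignatureOffset = 16 * 2048 + 1;

    public static bool LooksLikeIso9660(Stream image)
    {
        if (!image.CanSeek || image.Length < SignatureOffset + 5)
            return false;

        image.Seek(SignatureOffset, SeekOrigin.Begin);
        var buffer = new byte[5];
        int read = 0;
        while (read < buffer.Length)
        {
            int n = image.Read(buffer, read, buffer.Length - read);
            if (n == 0) return false; // unexpected end of stream
            read += n;
        }
        return Encoding.ASCII.GetString(buffer) == "CD001";
    }
}
```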

I did some searching around for a library that you could use to read/write ISO files. You just need the read part obviously and this project is something you could probably use http://discutils.codeplex.com/

As another mentioned, an ISO file contains a file system. The easiest way to read it is to mount it as a virtual drive, using any one of a number of utilities. Once you've mounted it as a drive, then you can determine that it likely contains a movie by inspecting the file system (i.e. using Directory.GetFiles and similar methods in C#).
If you want to read the file's contents directly (without mounting it), I'm not sure what to tell you. I've heard that 7-zip has an API that will let you read the files. You might also check out DiscUtils, which claims to be able to read ISO files.
Once you can read the contents of the file system, see the "Filesystem" section of http://en.wikipedia.org/wiki/DVD-Video. That will tell you what files and directories you should expect to see in the ISO of a DVD movie.
Note that the files' existence is an indication that the image probably contains a DVD movie. There's no way to tell for sure without examining the files' contents individually. Tracking down the specifications for the individual file types might be a more difficult task.
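Assuming the image is already mounted (or otherwise exposed as a directory tree), the existence check could be as small as this. VideoDiscHeuristic is a made-up name. VIDEO_TS\VIDEO_TS.IFO is the DVD-Video layout described in the Wikipedia section mentioned above; the BDMV directory check for Blu-ray is an extra assumption of this sketch. As noted, a positive result only means the disc probably contains video.

```csharp
using System.IO;

public static class VideoDiscHeuristic
{
    // Returns true when the directory tree has the typical DVD-Video or Blu-ray layout.
    public static bool LooksLikeVideoDisc(string root)
    {
        bool dvd = File.Exists(Path.Combine(root, "VIDEO_TS", "VIDEO_TS.IFO"));
        bool bluRay = Directory.Exists(Path.Combine(root, "BDMV"));
        return dvd || bluRay;
    }
}
```

For a mounted image you would call it with the drive root, for example VideoDiscHeuristic.LooksLikeVideoDisc(@"E:\").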

Try using IMAPIv2 to interrogate the ISO.
This link doesn't do that, but it should get you started in the right direction.
How to create optical ISO using IMAPIv2

Is it possible to decompress a zip file while maintaining hierarchy using just .NET or some other built-in Windows API?

I have a zip file that contains folder hierarchies and files.
\images\
\images\1.jpg
\images\2.jpg
\something\something\a.exe
\something\something\b.exe
1.txt
I need to decompress the contents of this zip file to a location. I also need to preserve the structure of the zip file.
I've read about .NET's GZipStream and DeflateStream but I am of the opinion that it is too "complicated" for my purpose.
I've also used DotNetZip and SharpZipLib in the past for personal projects but since this is work related and I'm working at a huge company, I would have a hard time convincing legal to use these libraries.
Question:
Is it possible to decompress a zip file while maintaining hierarchy using just .NET or some other built-in Windows API?
PS: I've also read this but I think it's hacky because you'll need to produce another executable just to hide the progress dialog.
Thanks!

Check whether Ionic Zip helps.

DotNetZip would do what you want, but I understand your concerns about legal approval.
On a side note, it might be good for you to navigate the legal jungle associated with getting an open-source library approved for use in the company, just to understand what's involved. But I'll leave that up to you.
Getting back to rolling your own...
DotNetZip is pretty full-featured, and it handles a number of scenarios you probably don't care about, such as Unicode filenames and comments, setting Windows timestamps and permissions on extracted files, getting timestamps from zip files created on old Unix systems, split archives, encrypted archives, files over 2 GB, and self-extracting archives. Many zip files use none of those things.
Also DotNetZip does eventing and zip updates and zip creation - all the code associated with these things is probably not of interest to you, if you confine yourself just to the requirements you described in your question.
You could, though, grab the DotNetZip code and use it to help you roll your own solution. If you constrain yourself to JUST reading zip files and not dealing with all the possible special cases, the zip format is not difficult to parse.
Here's how to do it:
1. Open the zip file using new FileStream() or File.Open. You want a FileStream object.
2. Read 4 bytes. Verify that they are the zip-entry-header signature (0x04034b50). In the file, the order in which you will find these bytes is 50 4b 03 04.
3. If you find a match, you're in business. The offsets below are relative to the start of the entry header, and multi-byte values use the same byte ordering as the signature.
   - At offset 14 is a 4-byte CRC. Get it.
   - At offset 18 is the 4-byte length of the compressed blob. Get it (N).
   - At offset 22 is the 4-byte length of the uncompressed blob. Get it (U).
   - At offset 26 is the 2-byte length of the filename. Get it (L).
   - At offset 28 is the 2-byte length of the "extra field". Get it (E).
   - Immediately after the extra-field length, at offset 30, is the actual filename. Read L bytes for the filename and call System.Text.Encoding.ASCII.GetString(). The result will include a directory path, with the backslashes replaced with slashes (Unix style). String.Replace() the slashes.
   - After the filename comes the extra field. Seek E bytes to get beyond it; you can mostly ignore it. This is where the compressed data starts.
4. Open a System.IO.Compression.DeflateStream on the zip FileStream, using CompressionMode.Decompress, with the FileStream positioned at the current offset. This assumes the entry uses the deflate method (the 2-byte value at offset 8 is 8). Open a new FileStream for output, with the file path you read in step 3. In a loop, call inflater.Read() and output.Write() to write the decompressed output of the DeflateStream to a filesystem file with the correct name. You will need to stop reading from the DeflateStream when you have read exactly U (uncompressed) bytes.
5. Check the uncompressed size (U) against the data you actually wrote out from the DeflateStream (after decompression). They should match.
6. If you are fancy, you can check the CRC of the output against what was in the header.
7. Go to step 2 to look for the next entry in the file. Because the DeflateStream may have read ahead in the FileStream, first seek to the start of the compressed data plus N.
The most complicated part is step 3. Working code for that is easily found in this source module; look for the ReadHeader method.
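The steps above can be put together roughly as follows. This is a sketch under stated assumptions, not DotNetZip's code: MiniUnzip and CopyExactly are names invented here, and it assumes unencrypted entries, no Zip64, ASCII file names, and sizes present in the entry header (general-purpose flag bit 3 clear). It also handles stored (method 0) entries and refuses names that would land outside the target directory. The CRC check from step 6 is left out.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class MiniUnzip
{
    private const uint LocalHeaderSignature = 0x04034b50;

    // Walks the entry headers from the start of the file and extracts each entry.
    public static void Extract(string zipPath, string targetDirectory)
    {
        string root = Path.GetFullPath(targetDirectory).TrimEnd(Path.DirectorySeparatorChar);
        using (FileStream zip = File.OpenRead(zipPath))
        using (var reader = new BinaryReader(zip))
        {
            while (zip.Length - zip.Position >= 30 && reader.ReadUInt32() == LocalHeaderSignature)
            {
                reader.ReadUInt16();                          // offset 4: version needed
                ushort flags = reader.ReadUInt16();           // offset 6: general-purpose flags
                ushort method = reader.ReadUInt16();          // offset 8: 0 = stored, 8 = deflate
                reader.ReadUInt32();                          // offset 10: DOS time and date
                reader.ReadUInt32();                          // offset 14: CRC-32 (not verified here)
                uint compressedSize = reader.ReadUInt32();    // offset 18: N
                uint uncompressedSize = reader.ReadUInt32();  // offset 22: U
                ushort nameLength = reader.ReadUInt16();      // offset 26: L
                ushort extraLength = reader.ReadUInt16();     // offset 28: E
                if ((flags & 0x0008) != 0)
                    throw new NotSupportedException("Sizes are stored in a data descriptor.");

                string name = Encoding.ASCII.GetString(reader.ReadBytes(nameLength));
                zip.Seek(extraLength, SeekOrigin.Current);    // skip the extra field
                long dataStart = zip.Position;

                string path = Path.GetFullPath(Path.Combine(root, name));
                if (!path.StartsWith(root + Path.DirectorySeparatorChar, StringComparison.Ordinal))
                    throw new InvalidDataException("Entry escapes the target directory: " + name);

                if (name.EndsWith("/", StringComparison.Ordinal))
                {
                    Directory.CreateDirectory(path);          // directory entry, no data
                }
                else
                {
                    Directory.CreateDirectory(Path.GetDirectoryName(path));
                    using (FileStream output = File.Create(path))
                    {
                        if (method == 8)
                        {
                            using (var inflater = new DeflateStream(zip, CompressionMode.Decompress, true))
                                CopyExactly(inflater, output, uncompressedSize);
                        }
                        else if (method == 0)
                        {
                            CopyExactly(zip, output, uncompressedSize);
                        }
                        else
                        {
                            throw new NotSupportedException("Compression method " + method);
                        }
                    }
                }
                // DeflateStream may read ahead, so jump to the next header explicitly.
                zip.Seek(dataStart + compressedSize, SeekOrigin.Begin);
            }
        }
    }

    // Copies exactly count bytes, failing if the source ends early.
    private static void CopyExactly(Stream source, Stream destination, long count)
    {
        var buffer = new byte[81920];
        while (count > 0)
        {
            int read = source.Read(buffer, 0, (int)Math.Min(buffer.Length, count));
            if (read == 0) throw new EndOfStreamException("Entry is shorter than its header claims.");
            destination.Write(buffer, 0, read);
            count -= read;
        }
    }
}
```

The loop stops at the first signature that is not an entry header, which in a well-formed file is the start of the central directory.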

Maybe the full feature set of GZipStream is a bit complicated, but note that the sample on the MSDN page is exactly what you need. I mean this MSDN page (the 4.0 version), not the one you supplied in the question.
http://msdn.microsoft.com/en-us/library/system.io.compression.gzipstream.aspx#Y2750
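For readers on .NET Framework 4.5 or later (an assumption, since the question does not say which version is in use), the framework itself can extract a multi-entry zip archive with its folder hierarchy through ZipFile.ExtractToDirectory. GZipStream on its own only handles a single gzip stream. BuiltInUnzip is a name made up for this sketch; on .NET Framework the project also needs a reference to System.IO.Compression.FileSystem.

```csharp
using System.IO.Compression;

public static class BuiltInUnzip
{
    // Extracts all entries, recreating the directory structure stored in the archive.
    public static void Extract(string zipPath, string targetDirectory)
    {
        ZipFile.ExtractToDirectory(zipPath, targetDirectory);
    }
}
```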

Unzip file while reading it

I have hundreds of CSV files zipped. This is great because they take very little space but when it is time to use them, I have to make some space on my HD and unzip them before I can process. I was wondering if it is possible with .NET to unzip a file while reading it. In other words, I would like to open a zip file, start to decompress the file and as we go, process the file.
So there would be no need for extra space on my drive. Any ideas or suggestions?

Yes. Zip is a streamed format which means that you can use the data as you decompress it rather than having to decompress everything first.
With .net's System.IO.Compression classes you can apply similar compression as used in zip files (Deflate & GZip) to any stream you like, but if you want to work with actual zip format files you'll need a third party library like this one (sharpziplib).

A better solution might be to keep the files decompressed on the drive, but turn on compression on the file system level. This way you'll just be reading CSV files, and the OS will take care of making sure it doesn't take too much space.
Anyhoo, to answer your question, maybe the GZipStream class can help you.
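If the CSV files were stored as .gz rather than .zip, reading them line by line without unpacking to disk could look like this. GzipCsv and ReadLines are hypothetical names for the sketch. Lines are decompressed lazily as the caller iterates.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class GzipCsv
{
    // Yields one decompressed line at a time; nothing is written to disk.
    public static IEnumerable<string> ReadLines(string gzPath)
    {
        using (FileStream file = File.OpenRead(gzPath))
        using (var gzip = new GZipStream(file, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line;
        }
    }
}
```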

sharpziplib allows for stream-based decompression - see this related question - the item provides similar stream-based Read methods, so you can process each item like you would with any stream.

I'm not sure about zip files, but you could use the GZ format with GZipStream (it works like any other input stream). Unfortunately, the entire System.IO.Compression namespace is only 2 classes (the other does DEFLATE).
EDIT: There's a class called ZipPackage. I'm not sure whether it will let you do streaming decompression, but it might be worth looking into.
Also, take a look at #ziplib.
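On .NET Framework 4.5 or later, which is newer than what some of the answers above describe, the System.IO.Compression namespace also contains ZipArchive, and an entry can be read as a stream straight out of the .zip file. A minimal sketch, with ZipCsv and ReadLines as made-up names and the entry name supplied by the caller:

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class ZipCsv
{
    // Streams one entry of a zip archive line by line without extracting it to disk.
    public static IEnumerable<string> ReadLines(string zipPath, string entryName)
    {
        using (ZipArchive archive = ZipFile.OpenRead(zipPath))
        {
            ZipArchiveEntry entry = archive.GetEntry(entryName);
            if (entry == null)
                throw new FileNotFoundException("No such entry in the archive.", entryName);

            using (var reader = new StreamReader(entry.Open()))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                    yield return line;
            }
        }
    }
}
```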
