Quickest Way To Decompress BIG .tar.gz In C#? - c#

I have a load of .tar.gz files that are around 5GB each. I have noticed that the .NET GZipStream actually gets stuck in an infinite loop trying to decompress them.
I found some pure C# libraries, but they all had issues with the size of my files. Unlike other posters (see "24GB tar.gz Decompress using sharpziplib"), I am compiling the application as a 64-bit .NET 4.5.1 application on an x64 machine.
I noticed that .NET 4.5.1 is said to remove the 2GB limit, but after reading up on it I found that quite misleading. With the relevant option turned on, an object as a whole is no longer capped at 2GB, but the addressable range of an array such as a byte[] still appears to be about 2GB.
Does anyone have a solution, or have I hit a limitation in C#? I can invoke the 64-bit 7-Zip DLL from my app, or call the 7-Zip .exe and wait for it to finish (a bit of a bodge), but there has to be a cleaner way. I also want the quickest decompression, preferably in pure C# code, but I'm currently left thinking this is not possible in C# (due to the limit on the addressable range of byte arrays).

You won't be able to load the resulting data into a single byte[] in C#. You will still be limited by the array size.
However, you should be able to decompress these without issue by working purely with streams, so the data never has to fit in one array. I've had very good luck with DotNetZip and large streams. Using it, you should be able to just do the following (the output is the .tar, which still has to be unpacked separately):
// Stream the .gz straight to disk; nothing is buffered in a single byte[].
using (System.IO.Stream input = System.IO.File.OpenRead(inputFile))
using (System.IO.Stream decompressor = new Ionic.Zlib.GZipStream(input, Ionic.Zlib.CompressionMode.Decompress, true))
using (System.IO.Stream output = System.IO.File.Create(outputFile))
    decompressor.CopyTo(output);
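For comparison, here is a minimal sketch of the same streaming idea with an explicit copy loop. It swaps in the framework's System.IO.Compression.GZipStream so that it compiles without DotNetZip, which is an assumption on my part and not what the answer recommends (the question reports trouble with the framework class on 5GB inputs). The class name GzipStreaming, the method name Decompress and the 1MB default buffer are hypothetical.

```csharp
using System.IO;
using System.IO.Compression;

public static class GzipStreaming
{
    // Decompresses inputFile (.gz) into outputFile and returns the number of bytes written.
    // Only one buffer of bufferSize bytes is held in memory at any time.
    // Assumption: framework GZipStream stands in for Ionic.Zlib.GZipStream here.
    public static long Decompress(string inputFile, string outputFile, int bufferSize = 1 << 20)
    {
        using (FileStream input = File.OpenRead(inputFile))
        using (var decompressor = new GZipStream(input, CompressionMode.Decompress))
        using (FileStream output = File.Create(outputFile))
        {
            var buffer = new byte[bufferSize];
            long total = 0; // long, so the count does not overflow past 2GB
            int read;
            while ((read = decompressor.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
                total += read;
            }
            return total;
        }
    }
}
```

Calling GzipStreaming.Decompress(inputFile, outputFile) should behave like the CopyTo version above. The loop is spelled out only to show that the working set is the buffer, not the file.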

Related

Unable to scan 3.5GB image using C# Twain/WIA

I'm trying to scan a 2400DPI A3-size image to TIFF with an Epson scanner using C# (which will result in a 3.5GB uncompressed TIFF). I've tried twain-cs, twaindotnet and ntwain as wrappers (which should use the 64-bit capable twaindsm.dll) as well as WIA.
In all cases, when telling Twain to scan to file, just over halfway (the expected 2GB mark) it gives an error that the driver doesn't have enough memory to do that, unless I set it to save using JPEG compression (the Epson driver doesn't seem to have lossless compression for photo formats).
When telling Twain to do a memory transfer, it does the full scan, but when I transfer the memory and write it to a TIFF (using libTiff), the first half is okay (which I guess is around 2GB) and the last half is just a single line repeated (I'm still not sure which line, as it doesn't seem to be the last line it scanned). So even though it doesn't generate an error, it has a problem after the 2GB mark.
WIA gives me a hard limit of 1200DPI, which I think is a limit set in the Epson driver. Besides that, I haven't been able to get WIA to transfer directly to file, as I can't find a way to set TYMED_FILE (and all the C++ code I find uses low-level minidriver calls). I also haven't found a way to get the stream so I can write it to a file myself. Creating a minidriver then gives me the problem of an unsigned (and certainly not MS-certified) driver.
Any help or links that will point me in the right direction will be welcome!

Unzip internal ZIP file to path

I have an application in which I want to copy directories from an internal ZIP to a path.
Did some searching and found this: Decompress byte array to string via BinaryReader yields empty string. However, the result is simply bytes. I haven't a clue about how to translate this back into folders that can then be moved to a path. (Working with just bytes is confusing to me)
Doing some more searching on here pointed me to the .NET 4.5 feature:
https://learn.microsoft.com/en-us/dotnet/standard/io/how-to-compress-and-extract-files
There's one complication: I don't have a zip path, but rather an array of bytes from the zip, kept internally inside my application. Keeping this in mind, how would I go about using this ZipFile feature with an array of bytes as the input?
Some other things I've looked at:
Compress a single file using C#
https://msdn.microsoft.com/en-us/library/system.io.compression.zipfile%28v=vs.110%29.aspx
How to extract zip file contents into a folder in .NET 4.5
Note: for this particular application, I'd like to refrain from using external DLLs. A portable CLI executable is what I'm aiming for.
To satisfy both constraints (I only have bytes, and I want to unzip them without using MemoryBuffer, as that still makes no sense to me), I ended up creating a temporary folder, creating an empty file in that folder, filling it with the bytes of the zipped file, and then using ZipFile.ExtractToDirectory() to extract it to the final destination.
It may not be the most efficient, but it works quite well.
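For what it's worth, the temporary file can probably be avoided. The sketch below wraps the bytes in a MemoryStream, opens a ZipArchive over it, and extracts with the ExtractToDirectory extension method. On .NET Framework 4.5 this needs references to System.IO.Compression and System.IO.Compression.FileSystem, both part of the framework, so no external DLLs. The names EmbeddedZip and ExtractTo are hypothetical, and I'm assuming the byte array holds a complete, valid zip.

```csharp
using System.IO;
using System.IO.Compression;

public static class EmbeddedZip
{
    // Extracts a zip held in memory (zipBytes) into destinationDirectory,
    // recreating the folder structure stored in the archive.
    public static void ExtractTo(byte[] zipBytes, string destinationDirectory)
    {
        // MemoryStream makes the byte[] look like a file to ZipArchive.
        using (var memory = new MemoryStream(zipBytes, writable: false))
        using (var archive = new ZipArchive(memory, ZipArchiveMode.Read))
        {
            archive.ExtractToDirectory(destinationDirectory);
        }
    }
}
```

Usage would be a single call such as EmbeddedZip.ExtractTo(bytes, targetPath), where bytes is however the application stores the zip internally.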

Ghostscript wrapper that works with byte arrays rather than file directories

I'm using a generic C# wrapper to render images from PDFs after a user uploads a file, and I'm wondering whether it's possible to configure the wrapper to work with byte arrays rather than actual files on disk, as this would save me an extra trip and increase my application's performance. Ideally I want to pass in a byte array of the PDF and have it return a byte array. I had a look at the wrapper code and I can't figure out how exactly (if it's even possible) I would do this. So is it possible? If so, any guidance as to where I should start?
Thanks.
You can't feed a sequence of bytes to the Ghostscript PDF interpreter, nor read back a PDF file as a sequence of bytes produced by the pdfwrite device.
The reason is simply that the PDF interpreter, and the PDF writer, both need random access to the file in order to interpret/create the file. If the whole file were held in memory then it would be possible to do so, but that would be a severe limitation on the size of files.
The wrapper you are using is a pure wrapper and does not provide what you need. Take a look at Ghostscript.NET, a managed Ghostscript wrapper (full implementation), which allows you to interpret prolog / PostScript and to run multiple instances of the Ghostscript library at the same time if you need to process multiple PDFs at once. There is a class, GhostscriptViewerPdfFileHandler, which demonstrates how to manipulate a PDF through the interpreter. Everything you need can be done.

Compress multiple jpeg files together in C#

I was trying to compress JPEG files (say 16 files) together using C#. I successfully created a tar file and finally a tar.gz (using the C# GZipStream class). But the problem with my solution is that the gzip pass increased the size of the tar file by 37% (so a compression ratio of 137%). I tried to manually compress the files together using WinRAR and it gave me a 10% reduction in size (a compression ratio of 90%).
I believe that my problem is with GZipStream. I think I should go for another kind of compression (or compressor?!), do you have any idea/suggestion of compression to use.
The framework's compression routines don't always do a great job.
I would recommend trying DotNetZip to compress this. My experience is that its compression (even gzip) is much closer to other software, and the output is far smaller than with the framework classes. It is also nice in that it requires nearly no code changes from the framework's GZipStream class if you want to use its GZipStream implementation.

programming files of size larger than 2 GB using C#.Net

How do I write large content to disk dynamically using C#? Any advice or reference is appreciated.
I am trying to create a file (custom format and extension) and write to it. The user uploads a file, its contents are converted to a byte stream, and that is written to the file (filename.hd). The indexing of the uploaded files is done in another file (filename.hi).
This works fine for me while the "filename.hd" file is under 2GB. Once it exceeds 2GB, I can no longer add content. This is my problem.
After googling I found that FAT32 Windows-based systems (most versions) only support a 2GB file size. Is there any solution for me to handle this situation? Please let me know.
Thanks in advance
sree
Use another filesystem (e.g. NTFS) ?
Use StreamWriter for writing to disk. StringBuilder is recommended for building the string, since appending two 'string' values really creates a new string each time, which hurts performance.
Okay, you will have some restrictions that are not code related:
File system - FAT and FAT32 will restrict you (FAT16 caps a single file at 2GB, FAT32 at just under 4GB).
Whether the system is 16, 32 or 64 bit will place restrictions on you.
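On the code side, a rough sketch of the append step described in the question might look like this. FileStream positions are 64-bit (long), so the code itself does not impose a 2GB ceiling; the file system still can. The names DataFileWriter, AppendBlock and FileSystemOf are hypothetical, and how the returned offset gets recorded in filename.hi is left out because the question doesn't describe the index format.

```csharp
using System.IO;

public static class DataFileWriter
{
    // Appends block to dataFile (e.g. filename.hd) and returns the offset at which
    // it was written, so the caller can record it in the index file.
    // The offset is a long, so it stays valid beyond 2GB.
    public static long AppendBlock(string dataFile, byte[] block)
    {
        using (var stream = new FileStream(dataFile, FileMode.Append, FileAccess.Write))
        {
            long offset = stream.Position; // FileMode.Append starts at the end of the file
            stream.Write(block, 0, block.Length);
            return offset;
        }
    }

    // Returns the file system name of the volume holding path (for example "NTFS" or
    // "FAT32" on Windows), so the application can warn before hitting a FAT limit.
    public static string FileSystemOf(string path)
    {
        string root = Path.GetPathRoot(Path.GetFullPath(path));
        return new DriveInfo(root).DriveFormat;
    }
}
```

Checking FileSystemOf once at startup and refusing to grow the file on a FAT volume is one option; moving the data file to an NTFS volume, as suggested above, is the simpler fix.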
