SharpZipLib compression level when creating vs. updating a zip file - c#

I've got a couple of questions regarding the use of SharpZipLib.
I'm using two different methods, one to create a zip file and another to update it, which seems like overkill. Is there a better way?
Create Zip File:
using (var zipFile = File.Open(zipFilename, FileMode.CreateNew))
using (var outputStream = new ZipOutputStream(zipFile))
{
    outputStream.Password = password;
    outputStream.SetLevel(compressionLevel); // 0 = store only, 9 = best compression
    var buffer = new byte[4096];
    foreach (var file in filenames)
    {
        var entry = new ZipEntry(Path.GetFileName(file));
        entry.DateTime = DateTime.Now;
        outputStream.PutNextEntry(entry);
        using (var fs = File.OpenRead(file))
        {
            var sourceBytes = 0;
            do
            {
                sourceBytes = fs.Read(buffer, 0, buffer.Length);
                outputStream.Write(buffer, 0, sourceBytes);
            } while (sourceBytes > 0);
        }
    }
    outputStream.Finish();
    outputStream.Close();
}
Update Zip File:
using (var zipFile = File.Open(zipFilename, FileMode.Open)) // ZipFile has to read the existing archive, so Append won't work
using (var outputStream = new ZipFile(zipFile))
{
    outputStream.Password = password;
    outputStream.BeginUpdate();
    foreach (var filename in filenames)
    {
        // ZipFile.Add takes the source file path and the entry name;
        // it builds the ZipEntry (including its timestamp) internally.
        outputStream.Add(filename, Path.GetFileName(filename));
    }
    outputStream.CommitUpdate();
}
I've tried using File.Open(zipFilename, FileMode.Append) in the create method, hoping I'd be able to append files to the zip file, but it just overwrites it. The only other way I found is the second method, but as I said, it seems like overkill. Hopefully this can be simplified, which brings me to my second question/issue.
Why is the compression level set at the stream level (which makes sense to me, by the way) when creating a new zip file, but at the file level when updating one?
Also, when setting it at the stream level, it takes a number from 0 to 9:
outputStream.SetLevel(compressionLevel);
But when setting it at the file level for an update, it uses an enum.
Surely you don't want different levels of compression when adding versus updating, but 0 to 9 are not the values the enum provides.
Thanks.
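To the first question: both cases can probably be driven through the same ZipFile update API, creating the archive only when it doesn't already exist. A sketch, not verified against every SharpZipLib version (zipFilename, password and filenames are the question's variables):
using (var zip = File.Exists(zipFilename)
    ? new ZipFile(zipFilename)      // open the existing archive for update
    : ZipFile.Create(zipFilename))  // or create a brand new one
{
    zip.Password = password;
    zip.BeginUpdate();
    foreach (var filename in filenames)
    {
        zip.Add(filename, Path.GetFileName(filename));
    }
    zip.CommitUpdate();
}
It doesn't resolve the enum-versus-0-to-9 mismatch, but it does collapse the create and update paths into one method.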


How to unzip a file in memory to an Azure CloudBlob [duplicate]

I'm probably doing something obviously stupid here. Please point it out!
I have some C# code that pulls down a bunch of .gz files from SFTP (using the SSH.NET NuGet package - works great!). Each .gz contains a single .CSV file. I want to keep these files in memory without hitting disk (yes, I know, server memory management concerns exist - that's fine, as these files are fairly small), decompress them in memory to extract the CSV inside, and then return a collection of CSV files in a custom DTO (FtpFile).
My problem is that while the MemoryStream from the SFTP connection has data in it, either my GZipStream never seems to get populated or the copy from the GZipStream to my output MemoryStream is failing. I have tried the more traditional approach of looping over Read with my own buffer, but it had the same results as this code.
Aside from connection details (it connects successfully, so no worries there), here's all of my code:
Logic:
public static List<FtpFile> Foo()
{
    var connectionInfo = new ConnectionInfo("example.com",
        "username",
        new PasswordAuthenticationMethod("username", "password"));
    using (var client = new SftpClient(connectionInfo))
    {
        client.Connect();
        var searchResults = client.ListDirectory("/testdir")
            .Where(obj => obj.IsRegularFile
                && obj.Name.ToLowerInvariant().StartsWith("test_")
                && obj.Name.ToLowerInvariant().EndsWith(".gz"))
            .Take(2)
            .ToList();
        var fileResults = new List<FtpFile>();
        foreach (var file in searchResults)
        {
            var ftpFile = new FtpFile { FileName = file.Name, FileSize = file.Length };
            using (var fileStream = new MemoryStream())
            {
                client.DownloadFile(file.FullName, fileStream); // Success! All is good here, so far. :)
                using (var gzStream = new GZipStream(fileStream, CompressionMode.Decompress))
                {
                    using (var outputStream = new MemoryStream())
                    {
                        gzStream.CopyTo(outputStream);
                        byte[] outputBytes = outputStream.ToArray(); // No data. Sad panda. :'(
                        ftpFile.FileContents = Encoding.ASCII.GetString(outputBytes);
                        fileResults.Add(ftpFile);
                    }
                }
            }
        }
        return fileResults;
    }
}
FtpFile (just a simple DTO I'm populating):
public class FtpFile
{
    public string FileName { get; set; }
    public long FileSize { get; set; }
    public string FileContents { get; set; }
}
PSA: if anybody comes along and copies this code, be aware that it is NOT good code, in that you could have some serious memory management problems! Best practice would be to stream to disk instead, which is not being done here. My needs are very specific in that I have to have these files simultaneously in memory for what I'm building with them.
If you are inserting data into the stream, make sure to seek back to its origin before un-gzipping it.
The following should fix your troubles:
using (var fileStream = new MemoryStream())
{
    client.DownloadFile(file.FullName, fileStream); // Success! All is good here, so far. :)
    fileStream.Seek(0, SeekOrigin.Begin); // rewind before decompressing
    using (var gzStream = new GZipStream(fileStream, CompressionMode.Decompress))
    {
        using (var outputStream = new MemoryStream())
        {
            gzStream.CopyTo(outputStream);
            byte[] outputBytes = outputStream.ToArray(); // now holds the decompressed CSV bytes
            ftpFile.FileContents = Encoding.ASCII.GetString(outputBytes);
            fileResults.Add(ftpFile);
        }
    }
}
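(Setting fileStream.Position = 0; would do the same thing as the Seek call; the point is simply that the stream has to be rewound before anything reads from it.)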

How to go from byte[] to MemoryStream, unzip, then write to a FileStream

I am unsure what I am doing wrong. I grab a byte[] (emailAttachment.Body) and pass it to the method ExtractZipFile, which converts it to a MemoryStream and unzips it, returning the entries as KeyValuePairs; I then write each one to a file using a FileStream.
However, when I go to open the newly created files, there is an error opening them; they cannot be opened.
The methods below are in the same class.
using Ionic.Zip;
var extractedFiles = ExtractZipFile(emailAttachment.Body);
foreach (KeyValuePair<string, MemoryStream> extractedFile in extractedFiles)
{
    string FileName = extractedFile.Key;
    using (FileStream file = new FileStream(CurrentFileSystem +
        FileName.FileFullPath(), FileMode.Create, System.IO.FileAccess.Write))
    {
        byte[] bytes = new byte[extractedFile.Value.Length];
        extractedFile.Value.Read(bytes, 0, (int)extractedFile.Value.Length);
        file.Write(bytes, 0, bytes.Length);
        extractedFile.Value.Close();
    }
}
private Dictionary<string, MemoryStream> ExtractZipFile(byte[] messagePart)
{
    Dictionary<string, MemoryStream> result = new Dictionary<string, MemoryStream>();
    MemoryStream data = new MemoryStream(messagePart);
    using (ZipFile zip = ZipFile.Read(data))
    {
        foreach (ZipEntry ent in zip)
        {
            MemoryStream memoryStream = new MemoryStream();
            ent.Extract(memoryStream);
            result.Add(ent.FileName, memoryStream);
        }
    }
    return result;
}
Is there something I am missing? I do not want to save the original zip file, just the extracted files from the MemoryStream.
What am I doing wrong?
After writing to your MemoryStream, you're not setting the position back to 0:
MemoryStream memoryStream = new MemoryStream();
ent.Extract(memoryStream);
result.Add(ent.FileName,memoryStream);
Because of this, the stream position will be at the end when you try to read from it, and you'll read nothing. Make sure to rewind it:
memoryStream.Position = 0;
Also, you don't have to handle the copy manually. Just let the CopyTo method take care of it:
extractedFile.Value.CopyTo(file);
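Putting both suggestions together, the writing loop shrinks to something like this (a sketch reusing the question's own variables):
foreach (KeyValuePair<string, MemoryStream> extractedFile in extractedFiles)
{
    using (FileStream file = new FileStream(CurrentFileSystem +
        extractedFile.Key.FileFullPath(), FileMode.Create, FileAccess.Write))
    {
        extractedFile.Value.Position = 0; // rewind before reading
        extractedFile.Value.CopyTo(file);
    }
    extractedFile.Value.Close();
}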
I'd suggest that you clean up your use of MemoryStream in your code.
I agree that calling memoryStream.Position = 0; will allow this code to work correctly, but it's an easy thing to miss when reading and writing memory streams.
It's better to write code that avoids the bug.
Try this:
private IEnumerable<(string Path, byte[] Content)> ExtractZipFile(byte[] messagePart)
{
    using (var data = new MemoryStream(messagePart))
    using (var zipFile = ZipFile.Read(data))
    {
        foreach (var zipEntry in zipFile)
        {
            using (var memoryStream = new MemoryStream())
            {
                zipEntry.Extract(memoryStream);
                yield return (Path: zipEntry.FileName, Content: memoryStream.ToArray());
            }
        }
    }
}
Then your calling code would look something like this:
foreach (var extractedFile in ExtractZipFile(emailAttachment.Body))
{
    File.WriteAllBytes(Path.Combine(CurrentFileSystem, extractedFile.Path.FileFullPath()), extractedFile.Content);
}
It's just a lot less code and a much better chance of avoiding bugs. The number one predictor of bugs in code is the number of lines of code you write.
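One caveat worth noting with the iterator version: it is lazily evaluated, so the underlying MemoryStream and ZipFile stay open until the foreach completes (or the enumerator is disposed). Since each yielded tuple carries its own byte[] copy, the results remain valid afterwards.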
Since I find it all a lot of code for a simple operation, here's my two cents.
using Ionic.Zip;
using (var s = new MemoryStream(emailAttachment.Body))
using (ZipFile zip = ZipFile.Read(s))
{
    foreach (ZipEntry ent in zip)
    {
        string path = Path.Combine(CurrentFileSystem, ent.FileName.FileFullPath());
        using (FileStream file = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            ent.Extract(file);
        }
    }
}

Adding large files to a System.IO.Compression.ZipArchiveEntry throws an OutOfMemoryException

I am trying to add a large video file (~500 MB) to a ZipArchiveEntry using this code:
using (var zipFile = ZipFile.Open(outputZipFile, ZipArchiveMode.Update))
{
    var zipEntry = zipFile.CreateEntry("largeVideoFile.avi");
    using (var writer = new BinaryWriter(zipEntry.Open()))
    {
        using (FileStream fs = File.Open(@"largeVideoFile.avi", FileMode.Open))
        {
            var buffer = new byte[16 * 1024];
            using (var data = new BinaryReader(fs))
            {
                int read;
                while ((read = data.Read(buffer, 0, buffer.Length)) > 0)
                {
                    writer.Write(buffer, 0, read);
                }
            }
        }
    }
}
I am getting the error
System.OutOfMemoryException
when writer.Write is called, although I used an intermediate buffer...
Any idea how to solve this?
Build the application as Any CPU and run it on an x64 machine; that should fix the issue (or build the application as x64 directly).
Video normally can't be compressed much, and with ZipArchiveMode.Update the zip contents are held in memory until the archive is completely written out, which is what exhausts a 32-bit process.
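If you have to stay 32-bit, or want to avoid the memory spike altogether, a possible workaround (a sketch, assuming the archive can be created fresh rather than updated in place) is ZipArchiveMode.Create, which streams each entry to the output file instead of buffering the whole archive in memory:
using (var archive = ZipFile.Open(outputZipFile, ZipArchiveMode.Create))
{
    var zipEntry = archive.CreateEntry("largeVideoFile.avi");
    using (var entryStream = zipEntry.Open())
    using (var fs = File.OpenRead(@"largeVideoFile.avi"))
    {
        fs.CopyTo(entryStream); // copies in small chunks; no whole-file allocation
    }
}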

Nesting Zip Files and Folders in Memory using DotNetZip Library

We have a page where users can download media; we construct a folder structure similar to the following, zip it up, and send it back to the user in the response.
ZippedFolder.zip
    - Folder A
        - File 1
        - File 2
    - Folder B
        - File 3
        - File 4
The existing implementation saves files and directories temporarily to the file system and then deletes them at the end. We are trying to get away from doing this and would like to accomplish it entirely in memory.
I am able to successfully create a ZipFile with files in it, but the problem I am running into is creating Folder A and Folder B and adding files to those and then adding those two folders to the Zip File.
How can I do this without saving to the file system?
The code for just saving the file streams to the zip file and then setting the Output Stream on the response is the following.
public Stream CompressStreams(IList<Stream> Streams, IList<string> StreamNames, Stream OutputStream = null)
{
    MemoryStream Response = null;
    using (ZipFile ZippedFile = new ZipFile())
    {
        for (int i = 0, length = Streams.Count; i < length; i++)
        {
            ZippedFile.AddEntry(StreamNames[i], Streams[i]);
        }
        if (OutputStream != null)
        {
            ZippedFile.Save(OutputStream);
        }
        else
        {
            Response = new MemoryStream();
            ZippedFile.Save(Response);
            // Move the stream back to the beginning for reading
            Response.Seek(0, SeekOrigin.Begin);
        }
    }
    return Response;
}
EDIT We are using DotNetZip for the zipping/unzipping library.
Here's another way of doing it using System.IO.Compression.ZipArchive
public Stream CompressStreams(IList<Stream> Streams, IList<string> StreamNames, Stream OutputStream = null)
{
    MemoryStream Response = new MemoryStream();
    using (ZipArchive ZippedFile = new ZipArchive(Response, ZipArchiveMode.Create, true))
    {
        for (int i = 0, length = Streams.Count; i < length; i++)
        {
            using (var entry = ZippedFile.CreateEntry(StreamNames[i]).Open())
            {
                Streams[i].CopyTo(entry);
            }
        }
    }
    // Rewind so the finished archive can be read, by the caller or by the copy below.
    Response.Seek(0, SeekOrigin.Begin);
    if (OutputStream != null)
    {
        Response.CopyTo(OutputStream);
    }
    return Response;
}
and a little test:
using (var write = new FileStream(@"C:\users\Public\Desktop\Testzip.zip", FileMode.OpenOrCreate, FileAccess.Write))
using (var read = new FileStream(@"C:\windows\System32\drivers\etc\hosts", FileMode.Open, FileAccess.Read))
{
    CompressStreams(new List<Stream>() { read }, new List<string>() { @"A\One.txt" }, write);
}
Re: your comment -- sorry, I'm not sure whether it creates something in the background, but you're not creating it yourself, so there is nothing extra for you to manage.
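And since the question itself uses DotNetZip: the nesting works the same way there, driven entirely by the entry names. A rough sketch (file1Stream and the other stream variables are placeholders for your input streams):
using (var zip = new ZipFile())
using (var response = new MemoryStream())
{
    // Path separators in the entry name create the nested folders.
    zip.AddEntry("Folder A/File 1", file1Stream);
    zip.AddEntry("Folder A/File 2", file2Stream);
    zip.AddEntry("Folder B/File 3", file3Stream);
    zip.Save(response);
    response.Seek(0, SeekOrigin.Begin); // ready to copy to the HTTP response
}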

Displaying the contents of a Zip archive in WinRT

I want to iterate through the contents of a zipped archive and, where the contents are readable, display them. I can do this for text-based files, but can't seem to work out how to pull out binary data from things like images. Here's what I have:
var zipArchive = new System.IO.Compression.ZipArchive(stream);
foreach (var entry in zipArchive.Entries)
{
    using (var entryStream = entry.Open())
    {
        if (IsFileBinary(entry.Name))
        {
            using (BinaryReader br = new BinaryReader(entryStream))
            {
                //var fileSize = await reader.LoadAsync((uint)entryStream.Length);
                var fileSize = br.BaseStream.Length;
                byte[] read = br.ReadBytes((int)fileSize);
                binaryContent = read;
            }
        }
    }
}
I can see inside the zip file, but calls to Length result in an OperationNotSupported error. Also, given that I'm getting a long and then having to cast to an integer, it feels like I'm missing something quite fundamental about how this should work.
I think the stream will decompress the data as it is read, which means that the stream cannot know the decompressed length. Calling entry.Length should return the correct size value that you can use. You can also call entry.CompressedLength to get the compressed size.
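Applied to the question's loop, that might look like this (a sketch; entry, entryStream and binaryContent are the question's variables):
using (BinaryReader br = new BinaryReader(entryStream))
{
    var fileSize = entry.Length; // uncompressed size, read from the archive's metadata
    binaryContent = br.ReadBytes((int)fileSize);
}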
Just copy the stream into a file or another stream:
using (var fs = await file.OpenStreamForWriteAsync())
{
    using (var src = entry.Open())
    {
        var buffLen = 1024;
        var buff = new byte[buffLen];
        int read;
        while ((read = await src.ReadAsync(buff, 0, buffLen)) > 0)
        {
            await fs.WriteAsync(buff, 0, read);
            await fs.FlushAsync();
        }
    }
}
