OutOfMemory exception creating a large zip file - c#

I am attempting to create a zip file using the Ionic.Zip library in .NET. My procedure iterates over a list of files from various sources and puts each file (file bytes, file name) into the zip file my procedure builds (as shown below).
The procedure below works great for smaller lists of files. But with larger lists of files, my procedure throws an OutOfMemory exception. I "thought" I was putting the zipped contents directly into the zip file I am building. But since I am getting an OutOfMemory exception, it makes me think that my procedure is loading up everything into memory before saving it to disk.
This is my procedure:
using (var tempFileStream = new System.IO.FileStream(tempFileName, FileMode.OpenOrCreate))
using (var zipOutputStream = new ZipOutputStream(tempFileStream))
{
    foreach (var fileRec in dbFileRecs)
    {
        var fileToPutInZip = getFileData();
        if (fileToPutInZip != null)
        {
            var fileNameToDownload = removeIllegalChars(fileToPutInZip.FileName);
            fileNameToDownload = ensureUniqueFilename(fileNameToDownload);
            var entry = zipOutputStream.PutNextEntry(fileNameToDownload);
            using (var ms = new System.IO.MemoryStream(fileToPutInZip.FileData)) // FileData is a byte array
            {
                ms.CopyTo(zipOutputStream);
            }
            zipOutputStream.Flush();
        }
    }
    zipOutputStream.Flush();
    zipOutputStream.Close();
}
Am I doing something wrong here? How can I write directly to the zip file and not load up the whole file into memory (and avoid those OutOfMemory errors)?
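For what it's worth, the ZipOutputStream in the snippet above is already writing straight to disk; the per-file buffering most likely comes from getFileData(), which materializes every file as a single byte[] (FileData). A minimal sketch of a streaming alternative, where getFileRecord and openFileStream are hypothetical stand-ins for a data layer that can return a readable Stream instead of a byte array:

using (var tempFileStream = new System.IO.FileStream(tempFileName, FileMode.Create))
using (var zipOutputStream = new ZipOutputStream(tempFileStream))
{
    foreach (var fileRec in dbFileRecs)
    {
        var record = getFileRecord(fileRec); // hypothetical: metadata only, no bytes
        if (record == null) continue;

        var entryName = ensureUniqueFilename(removeIllegalChars(record.FileName));
        zipOutputStream.PutNextEntry(entryName);

        // Copy in small chunks; only one buffer's worth of file data
        // is held in memory at a time.
        using (var source = openFileStream(record)) // hypothetical stream source
        {
            source.CopyTo(zipOutputStream, 81920);
        }
    }
}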

Related

XML file from ZIP Archive is incomplete in C#

I work with large XML files (~1,000,000 lines, 34 MB) that are stored in a ZIP archive. The XML file is used at runtime to store and load app settings and measurements. It gets loaded with this function:
public static void LoadFile(string path, string name)
{
    using (var file = File.OpenRead(path))
    {
        using (var zip = new ZipArchive(file, ZipArchiveMode.Read))
        {
            var foundConfigurationFile = zip.Entries.First(x => x.FullName == ConfigurationFileName);
            using (var stream = new StreamReader(foundConfigurationFile.Open()))
            {
                var xmlSerializer = new XmlSerializer(typeof(ProjectConfiguration));
                var newObject = xmlSerializer.Deserialize(stream);
                CurrentConfiguration = null;
                CurrentConfiguration = newObject as ProjectConfiguration;
                AddRecentFiles(name, path);
            }
        }
    }
}
This works most of the time.
However, some files don't get read to the end, and I get an error that the file contains invalid XML. I used
foundConfigurationFile.ExtractToFile(tempPath); // tempPath: some scratch location
and found that the extracted file stops at around line 800,000. But this only happens inside this code. When I open the file in an editor, everything is there.
It looks like the zip doesn't get loaded correctly, or for that matter, completely.
Am I running into some limitation? Or is there an error in my code that I can't find?
The file is saved via:
using (var file = File.OpenWrite(Path.Combine(dirInfo.ToString(), fileName.ToString()) + ".pwe"))
{
    var zip = new ZipArchive(file, ZipArchiveMode.Create);
    var configurationEntry = zip.CreateEntry(ConfigurationFileName, CompressionLevel.Optimal);
    var stream = configurationEntry.Open();
    var xmlSerializer = new XmlSerializer(typeof(ProjectConfiguration));
    xmlSerializer.Serialize(stream, CurrentConfiguration);
    stream.Close();
    zip.Dispose();
}
Update:
The problem was the File.OpenWrite() method.
If you try to overwrite a file with this method, the result is a mix of the old file and the new file whenever the new file is shorter than the old one.
File.OpenWrite() doesn't truncate the old file first, as stated in the docs.
To do it correctly, it was necessary to use the File.Create() method instead, because that method truncates the old file first.
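A minimal sketch of the corrected save, differing from the original only in the File.Create() call and in letting using blocks handle the disposals:

// File.Create truncates any existing file, so a shorter new archive cannot
// end up mixed with leftover bytes from an older, longer one.
using (var file = File.Create(Path.Combine(dirInfo.ToString(), fileName.ToString()) + ".pwe"))
using (var zip = new ZipArchive(file, ZipArchiveMode.Create))
using (var stream = zip.CreateEntry(ConfigurationFileName, CompressionLevel.Optimal).Open())
{
    var xmlSerializer = new XmlSerializer(typeof(ProjectConfiguration));
    xmlSerializer.Serialize(stream, CurrentConfiguration);
}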

Does ZipArchive load entire zip file into memory

If I stream a zip file like so:
using var zip = new ZipArchive(fileStream, ZipArchiveMode.Read);
using var sr = new StreamReader(zip.Entries[0].Open());
var line = sr.ReadLine(); //etc..
Am I streaming the zip file entry or is it loading the entire zip file into memory then I am streaming the uncompressed file?
It depends on how the fileStream was created. Was it created from a file on disk? If so, then ZipArchive will read from disk as it needs data. It won't put the entire thing in memory then read it. That would be incredibly inefficient.
I have a bunch of experience with this... I worked on a project where I had to unarchive 25 GB zip files. .NET's ZipArchive was very quick and very memory efficient.
You can have MemoryStreams that contain data that ZipArchive can read from, so you aren't limited to just Zip files on disk.
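For example, a two-line sketch (assuming zipBytes holds the raw bytes of an archive):

// ZipArchive reads from any readable, seekable stream, not just a file on disk.
using var ms = new MemoryStream(zipBytes);
using var archive = new ZipArchive(ms, ZipArchiveMode.Read);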
Here is a reasonably efficient way to unzip a ZipArchive:
var di = new DirectoryInfo(Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.CommonApplicationData), "MyDirectoryToExtractTo"));
var filesToExtract = _zip.Entries.Where(x =>
    !string.IsNullOrEmpty(x.Name) &&
    !x.FullName.EndsWith("/", StringComparison.Ordinal));
foreach (var x in filesToExtract)
{
    var fi = new FileInfo(Path.Combine(di.FullName, x.FullName));
    if (!fi.Directory.Exists) { fi.Directory.Create(); }
    using (var i = x.Open())
    using (var o = fi.OpenWrite())
    {
        i.CopyTo(o);
    }
}
This will extract all the files to C:\ProgramData\MyDirectoryToExtractTo\, keeping the directory structure.
If you'd like to see how ZipArchive was implemented to verify, take a look here.

Out Of Memory Exception when zipping memory stream

I have a text box where a user can submit a list of document IDs, to download those files zipped up from an Azure blob.
The code currently works by building a zip memory stream and then, for each document ID submitted, building a memory stream, getting the file into that stream, and adding it to the zip file. The issue is that when we are building the memory stream and getting a file larger than 180 MB, the program throws an out-of-memory exception.
Here is the code:
public async Task<byte[]> BuildZipStream(string valueDataUploadContainerName, IEnumerable<Document> docs)
{
    var zipMemStream = new MemoryStream();
    using (Ionic.Zip.ZipFile zip = new Ionic.Zip.ZipFile())
    {
        zip.Name = System.IO.Path.GetTempFileName();
        var insertedEntries = new List<string>();
        foreach (var doc in docs)
        {
            var EntryName = $"{doc.Name}{Path.GetExtension(doc.DocumentPath)}";
            if (insertedEntries.Contains(EntryName))
            {
                EntryName = $"{doc.Name} (1){Path.GetExtension(doc.DocumentPath)}";
                var i = 1;
                while (insertedEntries.Contains(EntryName))
                {
                    EntryName = $"{doc.Name} ({i.ToString()}){Path.GetExtension(doc.DocumentPath)}";
                    i++;
                }
            }
            insertedEntries.Add(EntryName);
            var file = await GetFileStream(blobFolderName, doc.DocumentPath);
            if (file != null)
                zip.AddEntry($"{EntryName}", file);
        }
        zip.Save(zipMemStream);
    }
    zipMemStream.Seek(0, 0);
    return zipMemStream.ToArray();
}
And here is how the file is actually retrieved from blob storage:
public async Task<byte[]> GetFileStream(string container, string filename)
{
    var blobStorageAccount = _keyVaultService.GetSecret(new KeyVaultModel { Key = storageLocation });
    var storageAccount = CloudStorageAccount.Parse(blobStorageAccount ?? _config.Value.StorageConnection);
    var blobClient = storageAccount.CreateCloudBlobClient();
    var blobContainer = blobClient.GetContainerReference(container);
    await blobContainer.CreateIfNotExistsAsync();
    var blockBlob = blobContainer.GetBlockBlobReference(filename);
    if (blockBlob.Exists())
    {
        using (var mStream = new MemoryStream())
        {
            await blockBlob.DownloadToStreamAsync(mStream);
            mStream.Seek(0, 0);
            return mStream.ToArray();
        }
    }
    return null; // blob not found
}
The problem occurs when the program hits await blockBlob.DownloadToStreamAsync(mStream); it will sit and spin for a while and then throw an out-of-memory exception.
I have read a few different solutions, none of which have worked for me, the most common being to change the Platform target under Properties to at least x64 (I am running at x86). Another option would be to move the GetFileStream logic into BuildZipStream, but then I feel that method would be doing too much.
Any suggestions?
EDIT:
The problem is actually occurring when the program hits zip.Save(zipMemStream).
Your methodology here is flawed, because you do not know:
- the number of files, or
- the size of each file.
So you cannot accurately determine whether the server has enough RAM to house all the files in memory. What you are doing here is collecting every Azure blob file the user lists and putting it into a zip file in memory, while also downloading each file in memory. It's no wonder you're getting an out-of-memory exception; even with 128 GB of RAM, if the user requests a big enough set of files, you'll run out of memory.
Your best solution, and the most common practice for downloading and zipping multiple Azure blob files, is to use a temporary blob file.
Instead of writing to a MemoryStream, you write to a FileStream and place that zipped file onto Azure Blob Storage, then serve that zipped blob file. Once the file is served, you remove it from the blob.
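A rough sketch of that approach, reusing the question's GetFileStream helper; blobContainer and the final serve/cleanup step are assumptions left to the caller:

var tempPath = Path.GetTempFileName();
try
{
    using (var zip = new Ionic.Zip.ZipFile())
    {
        foreach (var doc in docs)
        {
            var file = await GetFileStream(blobFolderName, doc.DocumentPath);
            if (file != null)
                zip.AddEntry(doc.Name, file);
        }
        zip.Save(tempPath); // the archive is spooled to disk, not to RAM
    }

    // Upload the finished archive to a scratch blob and serve that instead.
    var tempBlob = blobContainer.GetBlockBlobReference($"downloads/{Guid.NewGuid()}.zip");
    using (var fs = File.OpenRead(tempPath))
    {
        await tempBlob.UploadFromStreamAsync(fs);
    }
    // ...hand the client a link to tempBlob, then delete it once served...
}
finally
{
    File.Delete(tempPath);
}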
Hope this helps.

How to extract multi-volume archive within Azure Blob Storage?

I have a multi-volume archive stored in Azure Blob Storage that is split into a series of zips titled like this: Archive-Name.zip.001, Archive-Name.zip.002, etc., up to Archive-Name.zip.010. Each file is 250 MB and contains hundreds of PDFs.
Currently we are trying to iterate through each archive part and extract the PDFs. This works except when the last PDF in an archive has been split between two archive parts; ZipFile in C# is unable to process the split file and throws an exception.
We tried reading all the archive parts into a single MemoryStream and then extracting the files, however the memory streams then exceed the 2 GB limit, so this method does not work either.
It is not feasible to download the archive into a machine's memory, extract it, then upload the PDFs to a new file. The extraction needs to be done in Azure, where the program will run.
This is the code we are currently using - it is unable to handle PDFs split between two archive parts.
public static void UnzipTaxForms(TextWriter log, string type, string fiscalYear)
{
    var folderName = "folderName";
    var outPutContainer = GetContainer("containerName");
    CreateIfNotExists(outPutContainer);
    var fileItems = ListFileItems(folderName);
    fileItems = fileItems.Where(i => i.Name.Contains(".zip")).ToList();
    foreach (var file in fileItems)
    {
        using (var ziped = ZipFile.Read(GetMemoryStreamFromFile(folderName, file.Name)))
        {
            foreach (var zipEntry in ziped)
            {
                using (var outPutStream = new MemoryStream())
                {
                    zipEntry.Extract(outPutStream);
                    var blockblob = outPutContainer.GetBlockBlobReference(zipEntry.FileName);
                    outPutStream.Seek(0, SeekOrigin.Begin);
                    blockblob.UploadFromStream(outPutStream);
                }
            }
        }
    }
}
One more note: we are unable to change the way the multi-volume archive is generated. Any help would be appreciated.
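One possible direction, not from the original thread: .zip.001/.zip.002-style parts are usually a raw byte split of a single archive, so concatenating them in order rebuilds the original zip. If the Azure worker has local scratch disk, that sidesteps the 2 GB MemoryStream cap. A sketch, reusing the GetMemoryStreamFromFile helper above:

var tempZipPath = Path.Combine(Path.GetTempPath(), "combined.zip");
using (var combined = File.Create(tempZipPath))
{
    // Parts must be appended in order: .001, .002, ...
    foreach (var file in fileItems.OrderBy(i => i.Name))
    {
        using (var part = GetMemoryStreamFromFile(folderName, file.Name))
        {
            part.CopyTo(combined);
        }
    }
}

using (var ziped = ZipFile.Read(tempZipPath))
{
    // ...extract and upload entries exactly as in the original loop...
}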

OutOfMemory exception when trying to download multiple files as a Zip file using Ionic.Zip dll

This is the working code I use to download multiple files as a zip file using the Ionic.Zip dll. The file contents are stored in a SQL database. The program works if I try to download 1-2 files at a time, but throws an OutOfMemory exception if I try to download multiple files, as some of the files can be very large.
The exception occurs when it's trying to write into outputStream.
How can I improve this code to download multiple files, or is there a better way to download multiple files one by one rather than zipping them into one large file?
Code:
public ActionResult DownloadMultipleFiles()
{
    string connectionString = "MY DB CONNECTION STRING";
    List<Document> documents = new List<Document>();
    var query = "MY LIST OF FILES - FILE METADATA LIKE FILEID, FILENAME";
    documents = query.Query<Document>(connectionString).ToList();
    List<Document> DOCS = documents.GetRange(0, 50); // 50 FILES
    Response.Clear();
    var outputStream = new MemoryStream();
    using (var zip = new ZipFile())
    {
        foreach (var doc in DOCS)
        {
            Stream stream = new MemoryStream();
            byte[] content = GetFileContent(doc.FileContentId); // This method returns file content
            stream.Write(content, 0, content.Length);
            zip.UseZip64WhenSaving = Zip64Option.AsNecessary; // edited
            zip.AddEntry(doc.FileName, content);
        }
        zip.Save(outputStream);
    }
    return File(outputStream, "application/zip", "allFiles.zip");
}
Download the files to disk instead of to memory, then use Ionic to zip them from disk. This means you don't need to have all the files in memory at once.
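A minimal sketch of that suggestion, reusing the question's DOCS list and GetFileContent helper (temp paths and cleanup strategy are assumptions):

// Spool each file to a temp folder, zip from disk, and stream the finished
// archive back; only one file's bytes are held in memory at a time.
var workDir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString());
var zipPath = Path.Combine(Path.GetTempPath(), Guid.NewGuid() + ".zip");
Directory.CreateDirectory(workDir);
try
{
    foreach (var doc in DOCS)
    {
        System.IO.File.WriteAllBytes(
            Path.Combine(workDir, doc.FileName),
            GetFileContent(doc.FileContentId));
    }

    using (var zip = new ZipFile())
    {
        zip.UseZip64WhenSaving = Zip64Option.AsNecessary;
        zip.AddDirectory(workDir, ""); // entries are read from disk at save time
        zip.Save(zipPath);             // the archive itself also goes to disk
    }
}
finally
{
    Directory.Delete(workDir, true); // the spooled copies are no longer needed
}

// DeleteOnClose removes the temp archive once the response has been streamed.
var zipStream = new FileStream(zipPath, FileMode.Open, FileAccess.Read,
                               FileShare.Read, 4096, FileOptions.DeleteOnClose);
return File(zipStream, "application/zip", "allFiles.zip");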
