I am getting an OutOfMemoryException while trying to add files to a .zip file. I am building and running the application as a 32-bit process.
string[] filePaths = Directory.GetFiles(Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData) + "\\capture\\capture");
System.IO.Compression.ZipArchive zip = ZipFile.Open(filePaths1[c], ZipArchiveMode.Update);
foreach (String filePath in filePaths)
{
    string nm = Path.GetFileName(filePath);
    zip.CreateEntryFromFile(filePath, "capture/" + nm, CompressionLevel.Optimal);
}
zip.Dispose();
zip = null;
I am unable to understand the reason behind it.
The exact reason depends on a variety of factors, but most likely you are simply adding too much data to the archive. Try using the ZipArchiveMode.Create option instead, which writes the archive directly to disk without caching it in memory.
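For illustration, here is a minimal sketch of the same loop using Create mode (assuming the output archive does not already exist - Create always writes a brand-new archive; the output path is a placeholder):

// Sketch only: ZipArchiveMode.Create streams entries straight to disk instead of
// buffering the whole archive in memory. "targetZipPath" is a placeholder for your output file.
string targetZipPath = @"C:\temp\capture.zip";   // placeholder path
string[] filePaths = Directory.GetFiles(
    Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData), "capture", "capture"));

using (ZipArchive zip = ZipFile.Open(targetZipPath, ZipArchiveMode.Create))
{
    foreach (string filePath in filePaths)
    {
        zip.CreateEntryFromFile(filePath, "capture/" + Path.GetFileName(filePath), CompressionLevel.Optimal);
    }
}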
If you are really trying to update an existing archive, you can still use ZipArchiveMode.Create. But it will require opening the existing archive, copying all of its contents to a new archive (using Create), and then adding the new content.
Without a good, minimal, complete code example, it would not be possible to say for sure where the exception is coming from, never mind how to fix it.
EDIT:
Here is what I mean by "…opening the existing archive, copying all of its contents to a new archive (using Create), and then adding the new content":
string[] filePaths = Directory.GetFiles(Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData) + "\\capture\\capture");

using (ZipArchive zipFrom = ZipFile.Open(filePaths1[c], ZipArchiveMode.Read))
using (ZipArchive zipTo = ZipFile.Open(filePaths1[c] + ".tmp", ZipArchiveMode.Create))
{
    foreach (ZipArchiveEntry entryFrom in zipFrom.Entries)
    {
        ZipArchiveEntry entryTo = zipTo.CreateEntry(entryFrom.FullName);
        using (Stream streamFrom = entryFrom.Open())
        using (Stream streamTo = entryTo.Open())
        {
            streamFrom.CopyTo(streamTo);
        }
    }

    foreach (String filePath in filePaths)
    {
        string nm = Path.GetFileName(filePath);
        zipTo.CreateEntryFromFile(filePath, "capture/" + nm, CompressionLevel.Optimal);
    }
}

File.Delete(filePaths1[c]);
File.Move(filePaths1[c] + ".tmp", filePaths1[c]);
Or something like that. Lacking a good, minimal, complete code example, I just wrote the above in my browser. I didn't try to compile it, never mind test it. And you may want to adjust some specifics (e.g. the handling of the temp file). But hopefully you get the idea.
The reason is simple: OutOfMemoryException means there is not enough memory for the execution to continue.
Compression consumes a lot of memory. There is no guarantee that a change of logic will solve the problem, but you can consider a few ways to alleviate it.
1.
Since your main program must be 32-bit, you can consider starting a separate 64-bit process to do the compression (using System.Diagnostics.Process.Start). After the 64-bit process finishes its job and exits, your 32-bit main program can continue. You can simply use a tool already installed on the system, or write a small helper program yourself.
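For illustration, a minimal sketch of launching such a helper process - the helper executable name ("ZipHelper64.exe") and its arguments are hypothetical placeholders, not a real tool:

// Illustration only: "ZipHelper64.exe" stands in for a 64-bit helper you would write yourself
// (or an already-installed tool such as a command-line archiver); its arguments are made up.
string sourceFolder = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData), "capture", "capture");
string zipPath = @"C:\temp\capture.zip";   // placeholder output path

var psi = new ProcessStartInfo
{
    FileName = "ZipHelper64.exe",
    Arguments = $"\"{sourceFolder}\" \"{zipPath}\"",
    UseShellExecute = false,
    CreateNoWindow = true
};

using (var proc = Process.Start(psi))
{
    proc.WaitForExit();   // the 32-bit caller resumes once the 64-bit process has exited
    if (proc.ExitCode != 0)
        throw new InvalidOperationException("Compression helper failed.");
}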
2.
Another approach is to dispose of the archive each time you add an entry.
ZipArchive.Dispose saves the file, so after each iteration the memory allocated for the ZipArchive can be freed.
foreach (String filePath in filePaths)
{
    System.IO.Compression.ZipArchive zip = ZipFile.Open(filePaths1[c], ZipArchiveMode.Update);
    string nm = Path.GetFileName(filePath);
    zip.CreateEntryFromFile(filePath, "capture/" + nm, CompressionLevel.Optimal);
    zip.Dispose();
}
This approach is not straightforward, and it might not be as effective as the first approach.
Related
I am trying to create a function that will retrieve all the uploaded files (which are now saved as byte arrays in the database) and download them in a single zip file. I currently have 6,000 files to download (and the number could grow).
The functionality is already working (from retrieval to download) if I limit the number of files being downloaded; otherwise, I get an OutOfMemoryException in the foreach loop.
Here's pseudo code (the files variable is a list of byte arrays and file names):
var files = getAllFilesFromDb();
foreach (var file in files)
{
    var tempFilePath = Path.Combine(path, file.filename);
    using (FileStream stream = new FileStream(tempFilePath, FileMode.Create, FileAccess.ReadWrite))
    {
        stream.Write(file.fileData, 0, file.fileData.Length);
    }
}

private readonly IEntityRepository<File> fileRepository;

IEnumerable<FileModel> getAllFilesFromDb()
{
    return fileRepository.Select(f => new FileModel() { fileData = f.byteArray, filename = f.fileName });
}
My question is, is there any other way to do this to avoid getting such errors?
To avoid this problem, you could avoid loading the contents of all the files in one go. Most likely you will need to split your database call into two database calls:
Retrieve a list of all the files without their contents but with some identifier - like the PK of the table.
A method which retrieves the contents of an individual file.
Then your (pseudo)code becomes
get list of all files
for each file
get the file contents
write the file to disk
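A hedged C# sketch of that pattern (the two repository methods shown here are hypothetical; adapt them to whatever your repository actually exposes):

// Hypothetical two-step retrieval: first only the keys (cheap), then one file's bytes at a time.
// "path" is the same output folder as in the question's code.
IEnumerable<int> fileIds = GetAllFileIdsFromDb();        // hypothetical: returns only the PKs
foreach (int id in fileIds)
{
    FileModel file = GetFileContentFromDb(id);           // hypothetical: loads a single file's bytes
    var tempFilePath = Path.Combine(path, file.filename);
    using (var stream = new FileStream(tempFilePath, FileMode.Create, FileAccess.Write))
    {
        stream.Write(file.fileData, 0, file.fileData.Length);
    }
    // "file" goes out of scope here, so its byte array can be garbage collected
    // before the next file is loaded.
}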
Another possibility is to alter the way your query works so that it uses deferred execution - this means it will not actually load all the files at once, but stream them one at a time from the database - but without seeing more code from your repository implementation, I cannot (and will not) guess the right solution for you.
I have a 7zip archive created with LZMA2 compression (compression level: ultra).
The archive contains 1,749 files, which originally had a total size of 661 MB.
The zipped file is 39 MB in size.
Now I'm trying to use C# to extract a single, tiny (~200 KB) file from this archive.
I'm getting the corresponding IArchiveEntry from the IArchive (which works relatively fast),
but then calling IArchiveEntry.WriteToFile(targetPath) takes around 33 seconds! And similarly long if I write to a memory stream instead. (edit: When I'm running this on a 7z LZMA2 archive with compression level = normal, it still takes 9 seconds)
When I'm opening the same archive in the actual 7zip application and extract the same file from there, it takes around 2-3 seconds only.
I suspected it's some sort of multicore (7zip) vs. single core (SharpCompress, probably?) thing, but I don't notice any CPU usage spike during decompression with 7zip... maybe it's too fast to be noticeable, though.
Does anyone know what could be causing such slow speeds with SharpCompress? Am I maybe missing some setting, or using the wrong factory (ArchiveFactory)?
If not - is there any C# library out there that might be significantly faster at decompressing this?
For reference, here's a sketch of how I'm using SharpCompress to extract:
private void Extract()
{
    using (var archive = GetArchive())
    {
        var entryPath = /* ... path to entry .. */
        var entry = TryGetEntry(archive, entryPath);
        entry.WriteToFile(some_target_path);
    }
}

private IArchive GetArchive()
{
    string path = /* .. path to my .7z file */;
    return ArchiveFactory.Open(path);
}

private IArchiveEntry TryGetEntry(IArchive archive, string path)
{
    path = path.Replace("\\", "/");
    foreach (var entry in archive.Entries)
    {
        if (!entry.IsDirectory)
        {
            if (entry.Key == path)
                return entry;
        }
    }
    return null;
}
Update: As a temporary solution, I'm now including 7zr.exe from the 7zip SDK in my application and running it in a new process to extract a single file, reading the process's output into a binary stream.
This works in around ~3 seconds compared to the ~33 seconds with SharpCompress. It works for now, but it's kind of ugly... so I'm still curious why SharpCompress seems to be so slow here.
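A rough sketch of that kind of workaround (illustrative only - the exact 7zr.exe switches may vary by version; -so writes the extracted data to standard output in standard 7-Zip builds; archivePath and entryPath are placeholders):

// Sketch of a 7zr.exe workaround: extract one entry to stdout and read it as binary.
var psi = new ProcessStartInfo
{
    FileName = "7zr.exe",
    Arguments = $"e \"{archivePath}\" -so \"{entryPath}\"",
    UseShellExecute = false,
    RedirectStandardOutput = true,
    CreateNoWindow = true
};

byte[] entryBytes;
using (var proc = Process.Start(psi))
using (var buffer = new MemoryStream())
{
    // Read the raw base stream (not the text reader) so binary data isn't mangled.
    proc.StandardOutput.BaseStream.CopyTo(buffer);
    proc.WaitForExit();
    entryBytes = buffer.ToArray();
}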
This line is the problem
foreach (var entry in archive.Entries)
The problem is described here (i.e., if there are 100 files, it decompresses the 1st file 100 times, the 2nd file 99 times, and so on).
You need to use a reader (forward-only). See the API.
But the sample code there doesn't support 7z.
For 7z you can use archive.ExtractAllEntries(), e.g.:
var reader = archive.ExtractAllEntries();
while (reader.MoveToNextEntry())
{
    if (!reader.Entry.IsDirectory)
        reader.WriteEntryToDirectory(extractDir, new ExtractionOptions() { ExtractFullPath = false, Overwrite = true });
}
It will be much faster.
If you need all the files you could also do:
using var reader = archive.ExtractAllEntries();
reader.WriteAllToDirectory(targetPath, new ExtractionOptions() { ExtractFullPath = true, Overwrite = true });
How can I read content of a text file inside a zip archive?
For example, I have an archive qwe.zip, and inside it there's a file asd.txt; how can I read the contents of that file?
Is it possible to do without extracting the whole archive? It needs to be done quickly, when the user clicks an item in a list, to show a description of the archive (it's needed for a plugin system for another program). So extracting the whole archive isn't the best solution, because it might be a few MB, which would take at least a few seconds or more to extract, while only that single file needs to be read.
You could use a library such as SharpZipLib or DotNetZip to unzip the file and fetch the contents of individual files contained inside. This operation can be performed in memory, and you don't need to store the files in a temporary folder.
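For example, a minimal sketch using SharpZipLib's ZipFile class (assuming the qwe.zip/asd.txt names from the question); only that one entry is decompressed:

using ICSharpCode.SharpZipLib.Zip;
using System.IO;

// Open the archive, locate the entry via the central directory, and read just that entry.
var zf = new ZipFile("qwe.zip");
try
{
    ZipEntry entry = zf.GetEntry("asd.txt");
    if (entry != null)
    {
        using (Stream entryStream = zf.GetInputStream(entry))
        using (var reader = new StreamReader(entryStream))
        {
            string description = reader.ReadToEnd();   // contents of asd.txt only
        }
    }
}
finally
{
    zf.Close();   // release the file handle
}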
Unzip to a temp folder, take the file, and delete the temp data:
public static void Decompress(string outputDirectory, string zipFile)
{
    if (!File.Exists(zipFile))
        throw new FileNotFoundException("Zip file not found.", zipFile);

    // Open the package read-only and dispose it when done so the file isn't left locked.
    using (Package zipPackage = ZipPackage.Open(zipFile, FileMode.Open, FileAccess.Read))
    {
        foreach (PackagePart part in zipPackage.GetParts())
        {
            string targetFile = outputDirectory + "\\" + part.Uri.ToString().TrimStart('/');

            using (Stream streamSource = part.GetStream(FileMode.Open, FileAccess.Read))
            using (Stream streamDestination = File.OpenWrite(targetFile))
            {
                // Copy the entry in 10,000-byte chunks.
                byte[] arrBuffer = new byte[10000];
                int iRead = streamSource.Read(arrBuffer, 0, arrBuffer.Length);
                while (iRead > 0)
                {
                    streamDestination.Write(arrBuffer, 0, iRead);
                    iRead = streamSource.Read(arrBuffer, 0, arrBuffer.Length);
                }
            }
        }
    }
}
Although I'm late to the game and the question has already been answered, in the hope that this might still be useful for others who find this thread, I would like to add another solution.
Just today I encountered a similar problem when I wanted to check the contents of a ZIP file with C#. Unlike NewProger, I cannot use a third-party library and need to stay within the out-of-the-box .NET classes.
You can use the System.IO.Packaging namespace and the ZipPackage class. If it is not already included in the assembly, you need to add a reference to WindowsBase.dll.
It seems, however, that this class does not work with every Zip file. Calling GetParts() may return an empty list, although in the QuickWatch window you can find a property called _zipArchive that contains the correct contents.
If this is the case for you, you can use Reflection to get its contents.
On geissingert.com you can find a blog article ("Getting a list of files from a ZipPackage") that gives a coding example for this.
SharpZipLib or DotNetZip may still need to read the whole .zip file to unzip a single file. Actually, there is a way to extract a specific file from the .zip without reading the entire archive, reading only a small segment of it (the zip format's central directory at the end of the file records where each entry's compressed data is located).
I needed insight into Excel files, and I did it like so:
using (var zip = ZipFile.Open("ExcelWorkbookWithMacros.xlsm", ZipArchiveMode.Update))
{
    var entry = zip.GetEntry("xl/_rels/workbook.xml.rels");
    if (entry != null)
    {
        var tempFile = Path.GetTempFileName();
        entry.ExtractToFile(tempFile, true);
        var content = File.ReadAllText(tempFile);
        [...]
    }
}
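A small variation, in case the temp file isn't needed: the entry's stream can be read directly (same workbook and entry path as above, opened read-only):

using (var zip = ZipFile.Open("ExcelWorkbookWithMacros.xlsm", ZipArchiveMode.Read))
{
    var entry = zip.GetEntry("xl/_rels/workbook.xml.rels");
    if (entry != null)
    {
        // ZipArchiveEntry.Open() returns the decompressed stream for just this entry.
        using (var reader = new StreamReader(entry.Open()))
        {
            var content = reader.ReadToEnd();
            // [...]
        }
    }
}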
I am trying to get a list of files in a directory. The problem is that the call also returns files which are not yet finished copying.
If I try to use such a file, it returns an error stating that the file is in use.
I tried to use FileInfo.CreationTime, but it doesn't represent the copy finish time.
How can you query for files that are fully copied?
As in the code below, Directory.GetFiles() returns files which are not yet finished copying.
My test file size is over 200 MB.
if (String.IsNullOrEmpty(strDirectoryPath))
{
    txtResultPrint.AppendText("ERROR : Wrong Directory Name! ");
}
else
{
    string[] newFiles = Directory.GetFiles(strDirectoryPath, "*.epk");
    _epkList.PushNewFileList(newFiles);
    if (_epkList.IsNewFileAdded())
    {
        foreach (var fileName in _epkList.GetNewlyAddedFile())
        {
            txtResultPrint.AppendText(DateTime.Now.Hour + ":" + DateTime.Now.Minute + ":" + DateTime.Now.Second + " => ");
            txtResultPrint.AppendText(fileName + Environment.NewLine);
            this.Visible = true;
            notifyIconMain.Visible = true;
        }
    }
    else
    {
    }
}
If performance and best-practices aren't huge concerns then you could simply wrap the failing file operation in an inner-scoped try/catch.
using System.IO;
string[] files = Directory.GetFiles("pathToFiles");
foreach (string file in files) {
    FileStream fs = null;
    try {
        // try to open file for exclusive access
        fs = new FileStream(
            file,
            FileMode.Open,
            FileAccess.Read,  // we might not have Read/Write privileges
            FileShare.None    // request exclusive (non-shared) access
        );
    }
    catch (IOException ioe) {
        // File is in use by another process, or doesn't exist
    }
    finally {
        if (fs != null)
            fs.Close();
    }
}
This isn't really the best design advice as you shouldn't be relying on exception handling for this sort of thing, but if you're in a pinch and it's not code for a client or for your boss then this should work alright until a better solution is suggested or found.
Do you have the ability to change the copying itself?
If yes (and if you can guarantee that your program will always execute on NTFS on Windows Vista or newer), you can use Transactional NTFS to wrap the copy in a single transaction. File(s) being copied will only become visible to the rest of the world after you commit the transaction, so you'll never even see the partially copied files.
Unfortunately, Transactional NTFS is not accessible directly from the .NET Framework - you'll need to P/Invoke into Win32 API functions such as CreateTransaction, CommitTransaction, RollbackTransaction, CopyFileTransacted (and the other *Transacted functions).
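A rough sketch of the P/Invoke plumbing (treat the signatures as a starting point and verify them against the Win32 documentation before relying on them):

using System;
using System.Runtime.InteropServices;

static class TransactedCopy
{
    [DllImport("ktmw32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern IntPtr CreateTransaction(IntPtr attributes, IntPtr uow, uint createOptions,
        uint isolationLevel, uint isolationFlags, uint timeout, string description);

    [DllImport("ktmw32.dll", SetLastError = true)]
    static extern bool CommitTransaction(IntPtr transaction);

    [DllImport("ktmw32.dll", SetLastError = true)]
    static extern bool RollbackTransaction(IntPtr transaction);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern bool CopyFileTransacted(string existingFile, string newFile, IntPtr progressRoutine,
        IntPtr data, IntPtr cancel, uint copyFlags, IntPtr transaction);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool CloseHandle(IntPtr handle);

    public static void CopyInTransaction(string source, string destination)
    {
        // The copied file only becomes visible to other readers once the transaction commits.
        IntPtr tx = CreateTransaction(IntPtr.Zero, IntPtr.Zero, 0, 0, 0, 0, "file copy");
        if (tx == new IntPtr(-1))   // INVALID_HANDLE_VALUE
            throw new System.ComponentModel.Win32Exception();

        try
        {
            if (!CopyFileTransacted(source, destination, IntPtr.Zero, IntPtr.Zero, IntPtr.Zero, 0, tx))
                throw new System.ComponentModel.Win32Exception();

            if (!CommitTransaction(tx))
                throw new System.ComponentModel.Win32Exception();
        }
        catch
        {
            RollbackTransaction(tx);   // abandon the partial copy on any failure
            throw;
        }
        finally
        {
            CloseHandle(tx);
        }
    }
}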
Here is the scenario:
I have a directory with 2+ million files. The code I have below writes out all the files in about 90 minutes. Does anybody have a way to speed it up or make this code more efficient? I'd also like to only write out the file names in the listing.
string lines = (listBox1.Items.ToString());
string sourcefolder1 = textBox1.Text;
string destinationfolder = @"C:\anfiles";

using (StreamWriter output = new StreamWriter(destinationfolder + "\\" + "MasterANN.txt"))
{
    string[] files = Directory.GetFiles(textBox1.Text, "*.txt");
    foreach (string file in files)
    {
        FileInfo file_info = new FileInfo(file);
        output.WriteLine(file_info.Name);
    }
}
The slowdown is that it writes out one line at a time.
It takes about 13-15 minutes to get all the files it needs to write out.
The following 75 minutes is spent creating the file.
It could help if you don't create a FileInfo instance for every file; use Path.GetFileName instead:
string lines = (listBox1.Items.ToString());
string sourcefolder1 = textBox1.Text;
string destinationfolder = @"C:\anfiles";

using (StreamWriter output = new StreamWriter(Path.Combine(destinationfolder, "MasterANN.txt")))
{
    string[] files = Directory.GetFiles(textBox1.Text, "*.txt");
    foreach (string file in files)
    {
        output.WriteLine(Path.GetFileName(file));
    }
}
You're reading 2+ million file descriptors into memory. Depending on how much memory you have, you may well be swapping. Try breaking it up into smaller chunks by filtering on the file names.
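For example (illustrative only - adjust the prefixes to match your actual file names, since names starting with other characters would be missed here):

// Illustrative chunking: list the files in smaller alphabetical batches so each
// Directory.GetFiles call returns a smaller array.
string sourceFolder = textBox1.Text;   // same source folder as in the question
using (var output = new StreamWriter(@"C:\anfiles\MasterANN.txt"))
{
    foreach (char prefix in "abcdefghijklmnopqrstuvwxyz0123456789")
    {
        foreach (string file in Directory.GetFiles(sourceFolder, prefix + "*.txt"))
            output.WriteLine(Path.GetFileName(file));
    }
}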
The first thing I would need to know is: where's the slowdown? Is it taking 89 minutes for Directory.GetFiles() to execute, or is the delay spread out over the calls to FileInfo file_info = new FileInfo(file);?
If the delay is from the latter, you can probably speed things up by getting the file name from the path instead of creating a FileInfo instance to get the file name:
System.IO.Path.GetFileName(file);
From my experience, it's Directory.GetFiles that's slowing you down (aside from the console output). To overcome this, P/Invoke into FindFirstFile/FindNextFile to avoid all the memory consumption and general lagginess.
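A rough sketch of that approach - the P/Invoke declarations are abbreviated, so verify them against the Win32 documentation before using them. FindFirstFile/FindNextFile hand back one name at a time, which also matches the goal of writing out only file names:

using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

static class NativeFileList
{
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    struct WIN32_FIND_DATA
    {
        public uint dwFileAttributes;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)] public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)] public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern IntPtr FindFirstFile(string fileName, out WIN32_FIND_DATA findData);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern bool FindNextFile(IntPtr findHandle, out WIN32_FIND_DATA findData);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool FindClose(IntPtr findHandle);

    // Streams matching file names one at a time (e.g. pattern @"C:\huge\*.txt")
    // without building a 2-million-entry array in memory. Directories that match
    // the pattern are also returned; filter on dwFileAttributes if that matters.
    public static IEnumerable<string> EnumerateFileNames(string pattern)
    {
        IntPtr handle = FindFirstFile(pattern, out WIN32_FIND_DATA data);
        if (handle == new IntPtr(-1))   // INVALID_HANDLE_VALUE: no matches (or an error)
            yield break;
        try
        {
            do
            {
                yield return data.cFileName;
            }
            while (FindNextFile(handle, out data));
        }
        finally
        {
            FindClose(handle);
        }
    }
}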
Using Directory.EnumerateFiles, you do not need to load all the file names into memory first. Check this out: C# directory.getfiles memory help
In your case, the code could be:
using (StreamWriter output = new StreamWriter(destinationfolder + "\\" + "MasterANN.txt"))
{
    foreach (var file in Directory.EnumerateFiles(sourcefolder, "*.txt"))
    {
        output.WriteLine(Path.GetFileName(file));
    }
}
From this doc:
The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.
So if you have sufficient memory, Directory.GetFiles is ok. But Directory.EnumerateFiles is much better when a folder contains millions of files.