System.IO.Compression.ZipArchive with a large file - C#

I have an SSIS script task, written in C#, that zips a file.
I have a problem when zipping an approximately 1 GB file.
I tried to implement the code below and I still get a 'System.OutOfMemoryException':
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at ST_4cb59661fb81431abcf503766697a1db.ScriptMain.AddFileToZipUsingStream(String sZipFile, String sFilePath, String sFileName, String sBackupFolder, String sPrefixFolder) in c:\Users\dtmp857\AppData\Local\Temp\vsta\84bef43d323b439ba25df47c365b5a29\ScriptMain.cs:line 333
at ST_4cb59661fb81431abcf503766697a1db.ScriptMain.Main() in c:\Users\dtmp857\AppData\Local\Temp\vsta\84bef43d323b439ba25df47c365b5a29\ScriptMain.cs:line 131
This is the snippet of code that zips the file:
protected bool AddFileToZipUsingStream(string sZipFile, string sFilePath, string sFileName, string sBackupFolder, string sPrefixFolder)
{
    bool bIsSuccess = false;
    try
    {
        if (File.Exists(sZipFile))
        {
            using (ZipArchive addFile = ZipFile.Open(sZipFile, ZipArchiveMode.Update))
            {
                addFile.CreateEntryFromFile(sFilePath, sFileName);
                //Move File after zipping it
                BackupFile(sFilePath, sBackupFolder, sPrefixFolder);
            }
        }
        else
        {
            //from https://stackoverflow.com/questions/28360775/adding-large-files-to-io-compression-ziparchiveentry-throws-outofmemoryexception
            using (var zipFile = ZipFile.Open(sZipFile, ZipArchiveMode.Update))
            {
                var zipEntry = zipFile.CreateEntry(sFileName);
                using (var writer = new BinaryWriter(zipEntry.Open()))
                using (FileStream fs = File.Open(sFilePath, FileMode.Open))
                {
                    var buffer = new byte[16 * 1024];
                    using (var data = new BinaryReader(fs))
                    {
                        int read;
                        while ((read = data.Read(buffer, 0, buffer.Length)) > 0)
                            writer.Write(buffer, 0, read);
                    }
                }
            }
            //Move File after zipping it
            BackupFile(sFilePath, sBackupFolder, sPrefixFolder);
        }
        bIsSuccess = true;
    }
    catch (Exception ex)
    {
        throw ex;
    }
    return bIsSuccess;
}
What am I missing? Please give me suggestions, maybe a tutorial or best practice for handling this problem.

I know this is an old post but what can I say, it helped me sort out some stuff and still comes up as a top hit on Google.
So there is definitely something wrong with the System.IO.Compression library!
First and Foremost...
You must make sure to turn off the "Prefer 32-bit" build option. Having it set (in my case with a build for "AnyCPU") causes many inconsistent issues.
Now with that said, I took some demo files (several less than 500MB, one at 500MB, and one at 1GB), and created a sample program with 3 buttons that made use of the 3 methods.
Button 1 - ZipFile.CreateFromDirectory(AbsolutePath, TargetFile);
Button 2 - zipArchive.CreateEntryFromFile(AbsolutePath, RelativePath); (the ZipFileExtensions extension method on a ZipArchive instance)
Button 3 - Using the [16 * 1024] byte-buffer method from above
Now here is where it gets interesting. (Assuming the program is built as "AnyCPU" with the "Prefer 32-bit" box unchecked)... all 3 methods worked on a 64-bit Windows OS, regardless of how much memory it had.
However, as soon as I ran the same test on a 32-bit OS, regardless of how much memory it had, ONLY method 1 worked!
Methods 2 and 3 blew up with the OutOfMemory error, AND to rub salt into the wound, method 3 (the preferred chunking method) actually corrupted more files than method 2!
By corrupted, I mean that the 500 MB and the 1 GB files ended up in the zipped archive but at a size smaller than the original (they were basically truncated).
So I dunno... since there are not many 32-bit OSes around anymore, I guess maybe it is a moot point.
But it seems like there are some bugs in the System.IO.Compression framework!
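For anyone landing here with the same problem, below is a minimal sketch of a streaming approach that avoids ZipArchiveMode.Update entirely (Update mode holds entry data in memory, whereas Create writes it straight to the output stream). The parameter names follow the question's code; treat it as an illustration, not a tested drop-in replacement:
// Sketch: create a new archive and stream the source file into it.
// Requires: using System.IO; using System.IO.Compression;
using (FileStream zipStream = new FileStream(sZipFile, FileMode.Create))
using (ZipArchive archive = new ZipArchive(zipStream, ZipArchiveMode.Create))
{
    ZipArchiveEntry entry = archive.CreateEntry(sFileName, CompressionLevel.Optimal);
    using (Stream entryStream = entry.Open())
    using (FileStream source = File.OpenRead(sFilePath))
    {
        source.CopyTo(entryStream); // copies in buffered chunks, never the whole file at once
    }
}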

Related

File Copy Program Doesn't Properly Copy File

Hello
I've been working on a terminal-like application to get better at programming in C#, just something to help me learn. I've decided to add a feature that will copy a file exactly as it is to a new file... It seems to work almost perfectly. When opened in Notepad++ the files are only a few lines apart in length, and very, very close to the same as far as actual file size goes. However, the duplicated copy of the file never runs. It says the file is corrupt. I have a feeling the problem is within the methods for reading and rewriting binary to files that I created. The code is as follows; thanks for the help. Sorry for the spaghetti code too, I get a bit sloppy when I'm messing around with new ideas.
Class that handles the file copying/writing
using System;
using System.IO;
//using System.Collections.Generic;

namespace ConsoleFileExplorer
{
    class FileTransfer
    {
        private BinaryWriter writer;
        private BinaryReader reader;
        private FileStream fsc; // file to be duplicated
        private FileStream fsn; // new location of file
        int[] fileData;
        private string _file;

        public FileTransfer(String file)
        {
            _file = file;
            fsc = new FileStream(file, FileMode.Open);
            reader = new BinaryReader(fsc);
        }

        // Reads all the original file's data into an array of bytes
        public byte[] ReadAllDataToArray()
        {
            byte[] bytes = reader.ReadBytes((int)fsc.Length); // reading bytes from the original file
            return bytes;
        }

        // Writes the array of original byte data to a new file
        public void WriteDataFromArray(byte[] fileData, string path) // got a feeling this is the problem :p
        {
            fsn = new FileStream(path, FileMode.Create);
            writer = new BinaryWriter(fsn);
            int i = 0;
            while (i < fileData.Length)
            {
                writer.Write(fileData[i]);
                i++;
            }
        }
    }
}
Code that interacts with this class:
(Sleep(5000) is there because I was expecting an error on the first attempt...)
case '3':
    Console.Write("Enter source file: ");
    string sourceFile = Console.ReadLine();
    if (sourceFile == "")
    {
        Console.Clear();
        Console.ForegroundColor = ConsoleColor.DarkRed;
        Console.Error.WriteLine("Must input a proper file path.\n");
        Console.ForegroundColor = ConsoleColor.White;
        Menu();
    }
    else
    {
        Console.WriteLine("Copying Data"); System.Threading.Thread.Sleep(5000);
        FileTransfer trans = new FileTransfer(sourceFile);
        // copying the original file's data
        byte[] data = trans.ReadAllDataToArray();
        Console.Write("Enter Location to store data: ");
        string newPath = Console.ReadLine();
        // Just for me to make sure it doesn't exit if I forget
        if (newPath == "")
        {
            Console.Clear();
            Console.ForegroundColor = ConsoleColor.DarkRed;
            Console.Error.WriteLine("Cannot have empty path.");
            Console.ForegroundColor = ConsoleColor.White;
            Menu();
        }
        else
        {
            Console.WriteLine("Writing data to file"); System.Threading.Thread.Sleep(5000);
            trans.WriteDataFromArray(data, newPath);
            Console.WriteLine("File stored.");
            Console.ReadLine();
            Console.Clear();
            Menu();
        }
    }
    break;
File compared to new file: screenshots of the original file and the new file were attached to the original post.
You're not properly disposing the file streams and the binary writer. Both tend to buffer data (which is a good thing, especially when you're writing one byte at a time). Use using, and your problem should disappear. Unless somebody is editing the file while you're reading it, of course.
BinaryReader and BinaryWriter do not just write "raw data". They also add metadata as needed - they're designed for serialization and deserialization, rather than reading and writing bytes. Now, in the particular case of ReadBytes and Write(byte[]), those really are just raw bytes; but there's not much point in using these classes just for that. Reading and writing bytes is something every Stream gives you - and that includes FileStream. There's no reason to use BinaryReader/BinaryWriter here whatsoever - the file streams give you everything you need.
A better approach would be to simply use
using (var fsn = ...)
{
    fsn.Write(fileData, 0, fileData.Length);
}
or even just
File.WriteAllBytes(fileName, fileData);
Maybe you're thinking that writing a byte at a time is closer to "the metal", but that simply isn't the case. At no point during this does the CPU pass a byte at a time to the hard drive. Instead, the hard drive copies data directly from RAM, with no intervention from the CPU. And most hard drives still can't write (or read) arbitrary amounts of data from the physical media - instead, you're reading and writing whole sectors. If the system really did write a byte at a time, you'd just keep rewriting the same sector over and over again, just to write one more byte.
An even better approach would be to use the fact that you've got file streams open, and stream the files from source to destination rather than first reading everything into memory, and then writing it back to disk.
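As a rough sketch of that streaming approach (assuming .NET 4 or later for Stream.CopyTo; reusing sourceFile and newPath from the question's code):
// Copy source to destination in buffered chunks; neither file is ever fully in memory.
using (FileStream source = File.OpenRead(sourceFile))
using (FileStream destination = File.Create(newPath))
{
    source.CopyTo(destination);
}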
There is a File.Copy() method in C#; you can see it here: https://msdn.microsoft.com/ru-ru/library/c6cfw35a(v=vs.110).aspx
If you want to implement it yourself, try placing a breakpoint inside your methods and using the debugger. It is like the story about the fisherman and the god who gave the fisherman a rod so he could catch fish himself, rather than handing him the fish.
Also, look at your int[] fileData field and the byte[] fileData parameter inside the last method; maybe that naming collision is part of the problem.
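In other words, if a byte-for-byte copy is all that is needed, something like this should do (the overwrite flag is an assumption about the desired behaviour):
File.Copy(sourceFile, newPath, true); // true = overwrite an existing destination file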

Out of memory exception reading "Large" file

I'm trying to serialize an object into a string
The first problem I encountered was that the XmlSerializer.Serialize method threw an OutOfMemoryException. I've tried all kinds of solutions and none worked, so I serialized it into a file instead.
The file is about 300 MB (32-bit process, 8 GB RAM), and trying to read it with StreamReader.ReadToEnd also results in an OutOfMemoryException.
The XML format and loading it into a string are not optional; they are a must.
The question is:
Is there any reason a 300 MB file would throw that kind of exception? 300 MB is not really a large file.
Serialization code that fails on .Serialize
using (MemoryStream ms = new MemoryStream())
{
    var type = obj.GetType();
    if (!serializers.ContainsKey(type))
        serializers.Add(type, new XmlSerializer(type));
    // new XmlSerializer(obj.GetType()).Serialize(ms, obj);
    serializers[type].Serialize(ms, obj);
    ms.Position = 0;
    using (StreamReader sr = new StreamReader(ms))
    {
        return sr.ReadToEnd();
    }
}
Serialization and read from file that fails on ReadToEnd
var type = obj.GetType();
if (!serializers.ContainsKey(type))
    serializers.Add(type, new XmlSerializer(type));

FileStream fs = new FileStream(@"c:/temp.xml", FileMode.Create);
TextWriter writer = new StreamWriter(fs, new UTF8Encoding());
serializers[type].Serialize(writer, obj);
writer.Close();
fs.Close();

using (StreamReader sr = new StreamReader(@"c:/temp.xml"))
{
    return sr.ReadToEnd();
}
The object is large because it's an entire, elaborate system configuration object...
UPDATE:
Reading the file in chunks (8 * 1024 chars) will load the file into a StringBuilder, but the builder fails on ToString()... I'm starting to think there is no way, which is really strange.
Yeah, if you're using 32-bit, trying to load 300MB in one chunk is going to be awkward, especially when using approaches that don't know the final size (number of characters, not bytes) in advance, thus have to keep doubling an internal buffer. And that is just when processing the string! It then needs to rip that into a DOM, which can often take several times as much space as the underlying data. And finally, you need to deserialize it into the actual objects, usually taking about the same again.
So - indeed, trying to do this in 32-bit will be tough.
The first thing to try is: don't use ReadToEnd - just use XmlReader.Create with either the file path or the FileStream, and let XmlReader worry about how to load the data. Don't load the contents for it.
After that... the next thing to do is: don't limit it to 32-bit.
Well, you could try enabling the 3GB switch, but... moving to 64-bit would be preferable.
Aside: XML is not a good choice for large volumes of data.
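To make the first suggestion concrete, here is a minimal sketch of handing the serializer an XmlReader instead of a string (reusing the serializers dictionary from the question; the path is illustrative and nothing here is tested):
// Requires: using System.Xml; using System.Xml.Serialization;
using (FileStream fs = File.OpenRead(@"c:/temp.xml"))
using (XmlReader reader = XmlReader.Create(fs))
{
    object config = serializers[type].Deserialize(reader); // no 300 MB string is ever built
    // cast config to the concrete configuration type as needed
}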
Exploring the source code for StreamReader.ReadToEnd reveals that it internally makes use of the StringBuilder.Append method:
public override String ReadToEnd()
{
    if (stream == null)
        __Error.ReaderClosed();

#if FEATURE_ASYNC_IO
    CheckAsyncTaskInProgress();
#endif

    // Call ReadBuffer, then pull data out of charBuffer.
    StringBuilder sb = new StringBuilder(charLen - charPos);
    do
    {
        sb.Append(charBuffer, charPos, charLen - charPos);
        charPos = charLen; // Note we consumed these characters
        ReadBuffer();
    } while (charLen > 0);
    return sb.ToString();
}
which most probably throws the exception, and that leads to this question/answer: interesting OutOfMemoryException with StringBuilder

Issue with memory management and program performance

OK, I made a C# winform app, it's a File_Splitter_Joiner.
You just give it a file and it splits it for you to a number of pieces you specify.
The splitting is done in a separate thread.
Everything was working pretty fine until I sliced a 1 GB file!
In Task Manager, I saw that my program started consuming 1 GB of memory and my computer almost died!
Not just that: when slicing finished, the memory consumption didn't change!
(I don't know if this means the garbage collector isn't working, although I'm pretty sure I lost all references to whatever was holding the big data chunks, so it should work.)
Here's the Splitter constructor (just to give you a better idea):
public FileSplitter(string FileToSplitPath, string PiecesFolder, int NumberOfPieces, int PieceSize, SplittingMethod Method)
{
    FileToSplitInfo = new FileInfo(FileToSplitPath);
    this.FileToSplitPath = FileToSplitPath;
    this.PiecesFolder = PiecesFolder;
    this.NumberOfPieces = NumberOfPieces;
    this.PieceSize = PieceSize;
    this.Method = Method;
    SplitterThread = new Thread(Split);
}
And here is the method that did the actual splitting:
(I'm still a newbie, so what you're about to see 'may not' be done in the best way ever possible, I'm just learning here)
private void Split()
{
    int remainingSize = 0;
    int remainingPos = -1;
    bool isNumberOfPiecesEqualInSize = true;
    int fileSize = (int)FileToSplitInfo.Length; // FileToSplitInfo is a FileInfo object

    if (fileSize % PieceSize != 0)
    {
        remainingSize = fileSize % PieceSize;
        remainingPos = fileSize - remainingSize;
        isNumberOfPiecesEqualInSize = false;
    }

    byte[] fileBytes = new byte[fileSize];
    var _fs = File.Open(FileToSplitPath, FileMode.Open);
    BinaryReader br = new BinaryReader(_fs);
    br.Read(fileBytes, 0, fileSize);
    br.Close();
    _fs.Close();

    for (int i = 0, index = 0; i < NumberOfPieces; i++, index += PieceSize)
    {
        var fs = File.Create(PiecesFolder + "\\" + Path.GetFileName(FileToSplitPath) + "." + (i + 1).ToString());
        var bw = new BinaryWriter(fs);
        bw.Write(fileBytes, index, PieceSize);
        if (i == NumberOfPieces - 1 && !isNumberOfPiecesEqualInSize && Method == SplittingMethod.NumberOfPieces)
            bw.Write(fileBytes, remainingPos, remainingSize);
        bw.Close();
        fs.Close();
    }

    MessageBox.Show("File has been split successfully!");
    SplitterThread.Abort();
}
Now, instead of reading the bytes of the file via a BinaryReader, I first read them via the File.ReadAllBytes method. It worked fine with small file sizes, but I got a System.OutOfMemoryException when I dealt with our big guy. I don't know why I didn't get that exception when I read the bytes via a BinaryReader.
(That was an in-between question.)
So, the main question is: how can I load big files (we're talking gigabytes) in a way that doesn't consume so much memory? I mean, how can I make my program not consume all that memory?
And how can I free the used memory after the splitting is done?
(I actually used
bw.Dispose(); fs.Dispose();
instead of
bw.Close(); fs.Close();
and it was the same.)
I know the question might not make sense, because when we load something it goes into memory, not somewhere else. But the reason I asked it like that is that I used another splitting/joining program (not written by me) just to see whether it had the same problem: I loaded the file, the program consumed about 5 MB of RAM, and when I started splitting it used about 10 MB!
Now that is a VERY big difference... Probably that app was written in C/C++...
So to sum up: what's at fault? Is it my code, and if so, how can I fix it? Or is it C# when it comes to performance?
Thank you SOOO much for anything you could hook me up with :)
The following 2 lines will kill you:
int fileSize = (int)FileToSplitInfo.Length; // a FileInfo object
...
byte[] fileBytes = new byte[fileSize];
Your code will fail when the size is over Int32.MaxValue. The cast is unnecessary; just use long fileSize = FileToSplitInfo.Length;
Even this corrected code will fail when there is not enough contiguous memory. Fragmentation (of the LOH) will bring you down sooner or later.
You allocate memory for the entire file, but you only need PieceSize bytes at a time.
You don't even need to know the fileSize; just:
byte[] pieceBuffer = new byte[PieceSize];
while (true)
{
    int nBytes = br.Read(pieceBuffer, 0, pieceBuffer.Length);
    if (nBytes == 0)
        break;
    // write this piece, the length is nBytes
}
There are different aspects that can be made better:
If you are working with a big file, why first read everything into an array and only afterwards write it into other files? Just write into the new file while you are reading from the other one.
Use using to guarantee disposal of the streams in every case, whether there is an exception or not.
If you begin to work with really big files, like 1 GB or even more, I would recommend looking at Memory-Mapped Files (see the sketch below). You gain a substantial memory-consumption benefit, possibly at some performance cost.
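A rough sketch of that memory-mapped idea, reusing the names from the question (untested, error handling omitted; whether it actually beats a plain buffered copy is something to measure):
// Requires: using System.IO.MemoryMappedFiles;
long fileLength = new FileInfo(FileToSplitPath).Length;
using (var mmf = MemoryMappedFile.CreateFromFile(FileToSplitPath, FileMode.Open))
{
    long pieceIndex = 1;
    for (long offset = 0; offset < fileLength; offset += PieceSize, pieceIndex++)
    {
        long length = Math.Min(PieceSize, fileLength - offset);
        string piecePath = Path.Combine(PiecesFolder, Path.GetFileName(FileToSplitPath) + "." + pieceIndex);
        using (var view = mmf.CreateViewStream(offset, length, MemoryMappedFileAccess.Read))
        using (var piece = File.Create(piecePath))
        {
            view.CopyTo(piece); // only this window of the file is mapped at a time
        }
    }
}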

Insufficient system resources exist to complete the requested service

I have a Web Application hosted in IIS 6 on a Windows Server 2003 box and have to handle 2 large PDF files, around 7-8 MB each. These files are read by the website from a network share, and the bytes are passed to a WCF service for saving elsewhere.
Here is the code I use to read the files:
public static byte[] ReadFile(string filePath)
{
    int count;
    int sum = 0;
    byte[] buffer;
    FileStream stream = new FileStream(filePath, FileMode.Open, FileAccess.Read);
    try
    {
        int length = (int)stream.Length;
        buffer = new byte[length];
        while ((count = stream.Read(buffer, sum, length - sum)) > 0)
            sum += count;
        return buffer;
    }
    catch (Exception)
    {
        throw;
    }
    finally
    {
        stream.Close();
        stream.Dispose();
    }
}
An error is thrown on the stream.Read() and the error is:
Insufficient system resources exist to complete the requested service
This code works in my dev environment, but as soon as I deploy to our production environment we get this error message.
I have seen that this error has surfaced a few times when searching around, and the workaround for it is to use File.Move(), but we cannot do this as the file bytes need to be passed to a WCF service method.
Is there something in IIS 6 that needs to be changed to allow holding 15-20 MB in memory when reading a file? Or is there something else that needs to be configured?
Any ideas?
See this:
Why I need to read file piece by piece to buffer?
It seems you are reading the whole file at once, without buffering:
buffer = new byte[length];
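As a hedged sketch of what reading piece by piece could look like (the 64 KB chunk size is an arbitrary assumption; for 7-8 MB files the result still ends up in memory, but each individual read from the network share stays small):
public static byte[] ReadFileChunked(string filePath)
{
    using (FileStream stream = new FileStream(filePath, FileMode.Open, FileAccess.Read))
    using (MemoryStream ms = new MemoryStream())
    {
        byte[] chunk = new byte[64 * 1024];
        int read;
        while ((read = stream.Read(chunk, 0, chunk.Length)) > 0)
            ms.Write(chunk, 0, read);
        return ms.ToArray();
    }
}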
Best regards.

C#: Appending *contents* of one text file to another text file

There is probably no other way to do this, but is there a way to append the contents of one text file into another text file, while clearing the first after the move?
The only way I know is to just use a reader and writer, which seems inefficient for large files...
Thanks!
No, I don't think there's anything which does this.
If the two files use the same encoding and you don't need to verify that they're valid, you can treat them as binary files, e.g.
using (Stream input = File.OpenRead("file1.txt"))
using (Stream output = new FileStream("file2.txt", FileMode.Append,
                                       FileAccess.Write, FileShare.None))
{
    input.CopyTo(output); // Using .NET 4
}
File.Delete("file1.txt");
Note that if file1.txt contains a byte order mark, you should skip past this first to avoid having it in the middle of file2.txt.
If you're not using .NET 4 you can write your own equivalent of Stream.CopyTo... even with an extension method to make the hand-over seamless:
public static class StreamExtensions
{
    public static void CopyTo(this Stream input, Stream output)
    {
        if (input == null)
        {
            throw new ArgumentNullException("input");
        }
        if (output == null)
        {
            throw new ArgumentNullException("output");
        }
        byte[] buffer = new byte[8192];
        int bytesRead;
        while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, bytesRead);
        }
    }
}
Ignoring error handling, encodings, and efficiency for the moment, something like this would probably work (but I haven't tested it):
File.AppendAllText("path/to/destination/file", File.ReadAllText("path/to/source/file"));
Then you just have to delete or clear out the first file once this step is complete.
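For that clearing step, one hedged possibility (File.Delete if the file itself can go away, File.WriteAllText with an empty string if it needs to stay):
File.WriteAllText("path/to/source/file", string.Empty); // or: File.Delete("path/to/source/file");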
The cmd.exe version of this is
type fileone.txt >>filetwo.txt
del fileone.txt
You could spawn a system shell from your code to do this. It should be pretty efficient.
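If you did want to drive that from C#, a sketch along these lines should work (the cmd /c wrapper and the && chaining are assumptions, not something taken from the question):
// Requires: using System.Diagnostics;
var psi = new ProcessStartInfo("cmd.exe", "/c type fileone.txt >> filetwo.txt && del fileone.txt")
{
    UseShellExecute = false,
    CreateNoWindow = true
};
using (Process proc = Process.Start(psi))
{
    proc.WaitForExit();
}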
I'm not sure what you mean by "inefficient". Jon's answer is probably enough for most cases.
However, if you are concerned about extremely large source files, Memory-Mapped Files could be your friend. See this link for more info.
