System.OutOfMemoryException when trying to read large files - C#

public static byte[] ReadMemoryMappedFile(string fileName)
{
    long length = new FileInfo(fileName).Length;
    using (var stream = File.Open(fileName, FileMode.OpenOrCreate, FileAccess.Read, FileShare.ReadWrite))
    {
        using (var mmf = MemoryMappedFile.CreateFromFile(stream, null, length, MemoryMappedFileAccess.Read, null, HandleInheritability.Inheritable, false))
        {
            using (var viewStream = mmf.CreateViewStream(0, length, MemoryMappedFileAccess.Read))
            {
                using (BinaryReader binReader = new BinaryReader(viewStream))
                {
                    var result = binReader.ReadBytes((int)length);
                    return result;
                }
            }
        }
    }
}
OpenFileDialog openfile = new OpenFileDialog();
openfile.Filter = "All Files (*.*)|*.*";
openfile.ShowDialog();

byte[] buff = ReadMemoryMappedFile(openfile.FileName);
texteditor.Text = BitConverter.ToString(buff).Replace("-", " "); // <-- A first chance exception of type 'System.OutOfMemoryException' occurred in mscorlib.dll
I get a System.OutOfMemoryException when trying to read large files.
I've been reading up on this all over the web for four weeks and have tried a lot, but I still can't find a good solution to my problem.
Please help me.
Update
public byte[] FileToByteArray(string fileName)
{
    byte[] buff = null;
    FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read);
    BinaryReader br = new BinaryReader(fs);
    long numBytes = new FileInfo(fileName).Length;
    buff = br.ReadBytes((int)numBytes);
    //return buff;
    return File.ReadAllBytes(fileName);
}
OR
public static byte[] FileToByteArray(FileStream stream, int initialLength)
{
    // If we've been passed an unhelpful initial length, just
    // use 32K.
    if (initialLength < 1)
    {
        initialLength = 32768;
    }

    BinaryReader br = new BinaryReader(stream);
    byte[] buffer = new byte[initialLength];
    int read = 0;
    int chunk;
    while ((chunk = br.Read(buffer, read, buffer.Length - read)) > 0)
    {
        read += chunk;

        // If we've reached the end of our buffer, check to see if there's
        // any more information
        if (read == buffer.Length)
        {
            int nextByte = br.ReadByte();

            // End of stream? If so, we're done
            if (nextByte == -1)
            {
                return buffer;
            }

            // Nope. Resize the buffer, put in the byte we've just
            // read, and continue
            byte[] newBuffer = new byte[buffer.Length * 2];
            Array.Copy(buffer, newBuffer, buffer.Length);
            newBuffer[read] = (byte)nextByte;
            buffer = newBuffer;
            read++;
        }
    }

    // Buffer is now too big. Shrink it.
    byte[] ret = new byte[read];
    Array.Copy(buffer, ret, read);
    return ret;
}
I still get a System.OutOfMemoryException when trying to read large files.

If your file is 4 GB, then BitConverter will turn each byte into an "XX " chunk of the string: 2 bytes per char * 3 chars per byte * 4,294,967,295 bytes = 25,769,803,770 bytes. You need 25+ GB of free memory just to fit the resulting string, on top of the file you already have in memory as a byte array.
Besides, no single object in a .NET program may be over 2 GB (by default). That makes the theoretical limit for a string length 1,073,741,823 chars, and even then you need to be running a 64-bit process.
So the solution in your case: open a FileStream, read the first 16,384 bytes (or however many fit on your screen), convert them to hex, display them, and remember the file offset. When the user wants to navigate to the next or previous page, seek to that position in the file on disk, read, display again, and so on.
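A minimal sketch of that paging approach (the page size, class name, and the viewer hookup at the end are assumptions for illustration, not part of the original code):

// Sketch only: reads one "page" of a large file and renders it as hex,
// without ever loading the whole file into memory.
using System;
using System.IO;
using System.Text;

public static class HexPager
{
    // Page size is an assumption; pick whatever fits your text box.
    public const int PageSize = 16384;

    public static string ReadPageAsHex(string fileName, long pageIndex)
    {
        using (var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            long offset = pageIndex * (long)PageSize;
            if (offset >= fs.Length)
                return string.Empty;

            fs.Seek(offset, SeekOrigin.Begin);          // jump to the requested page
            var buffer = new byte[PageSize];
            int read = fs.Read(buffer, 0, buffer.Length);

            var sb = new StringBuilder(read * 3);
            for (int i = 0; i < read; i++)
                sb.Append(buffer[i].ToString("X2")).Append(' ');
            return sb.ToString();
        }
    }
}

// Usage (hypothetical, reusing the question's controls):
// texteditor.Text = HexPager.ReadPageAsHex(openfile.FileName, currentPage);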

You need to read the file in chunks, keep track of where you are in the file, page the contents on screen, and use Seek and Position to move up and down in the file stream.

You will not be able to display a 4 GB file by reading all of it into memory first, no matter what approach you use.
The approach is to virtualize the data, reading only the visible lines when the user scrolls. If you need a read-only text viewer, you can use a WPF ItemsControl with a virtualizing stack panel and bind it to a custom IList collection that lazily fetches lines from the file, calculating the file offset for each line index.
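A rough sketch of such a lazily fetching collection (the one-time offset index, the class name, and the UTF-8 assumption are illustrative, not part of the answer):

// Sketch only: a read-only IList<string> that indexes line offsets once,
// then reads individual lines from disk on demand (e.g. for a virtualizing ItemsControl).
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Text;

public class LazyFileLines : IList<string>
{
    private readonly string _path;
    private readonly List<long> _lineOffsets = new List<long>();

    public LazyFileLines(string path)
    {
        _path = path;
        // One pass over the file to record where each line starts; the data itself stays on disk.
        // (For a 4 GB file you would want a buffered scan instead of ReadByte.)
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            _lineOffsets.Add(0);
            int b;
            while ((b = fs.ReadByte()) != -1)
            {
                if (b == '\n')
                    _lineOffsets.Add(fs.Position);
            }
        }
    }

    public int Count { get { return _lineOffsets.Count; } }
    public bool IsReadOnly { get { return true; } }

    public string this[int index]
    {
        get
        {
            // Opening the file per access keeps the sketch simple; cache the stream in real code.
            using (var fs = new FileStream(_path, FileMode.Open, FileAccess.Read, FileShare.Read))
            {
                fs.Seek(_lineOffsets[index], SeekOrigin.Begin);
                using (var reader = new StreamReader(fs, Encoding.UTF8))
                {
                    return reader.ReadLine() ?? string.Empty;
                }
            }
        }
        set { throw new NotSupportedException(); }
    }

    // Remaining IList<string> members are not needed for read-only binding and just throw.
    public int IndexOf(string item) { throw new NotSupportedException(); }
    public void Insert(int index, string item) { throw new NotSupportedException(); }
    public void RemoveAt(int index) { throw new NotSupportedException(); }
    public void Add(string item) { throw new NotSupportedException(); }
    public void Clear() { throw new NotSupportedException(); }
    public bool Contains(string item) { throw new NotSupportedException(); }
    public void CopyTo(string[] array, int arrayIndex) { throw new NotSupportedException(); }
    public bool Remove(string item) { throw new NotSupportedException(); }
    public IEnumerator<string> GetEnumerator()
    {
        for (int i = 0; i < Count; i++) yield return this[i];
    }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}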

Related

File sent over NetworkStream is received corrupted C#

My goal is to send a file over a TCP connection using NetworkStream.
I first send the length of the data, and then I use a FileStream and a BinaryWriter to send the data byte by byte.
While debugging, I found out that some zero bytes are being put at the beginning of the file on the receiving end.
For example, a file whose content is azertyuiop arrives as four zero bytes followed by azerty, with the trailing uiop missing, causing files like images to be corrupted.
The code I've got so far:
(Where br is a BinaryReader and bw is a BinaryWriter)
Sender:
using (var readStream = new FileStream(fileLocation, FileMode.Open))
{
    // Send the data length first
    bw.Write(new FileInfo(fileLocation).Length);
    bw.Flush();

    var buffer = new byte[1];
    while (readStream.Read(buffer, 0, 1) > 0)
    {
        bw.Write(buffer[0]);
        bw.Flush();
    }
}
Receiver:
// Get data length
var dataLength = br.ReadInt32();

using (var fs = new FileStream(newFileLocation, FileMode.Create))
{
    var buffer = new byte[1];
    for (int i = 0; i < dataLength; i++)
    {
        br.Read(buffer, 0, 1);
        fs.Write(buffer, 0, 1);
    }
}
What am I missing or doing wrong?
The problem could be the following:
bw.Write(new FileInfo(fileLocation).Length);
...
var dataLength = br.ReadInt32();
The Length property is actually of type long (8 bytes), but you are reading the value as an Int32 (4 bytes), leaving the other 4 bytes in the stream.
FileInfo.Length is a long, not an Int32.
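A minimal sketch of the fix, reusing the question's br/bw variables (either make the receiver read all 8 bytes, or make the sender write a 4-byte length):

// Receiver side, option 1: read the full 8-byte long.
long dataLength = br.ReadInt64();

// Option 2: agree on a 4-byte Int32 length on both sides
// (fine as long as the file is under 2 GB).
bw.Write((int)new FileInfo(fileLocation).Length);   // sender
int dataLength32 = br.ReadInt32();                   // receiver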

Convert Stream to byte[] c# for large files of 2GB

Trying to convert a Stream object to byte[] using the method below:
public static byte[] ReadFully(System.IO.Stream input)
{
    byte[] buffer = new byte[16 * 1024];
    using (System.IO.MemoryStream ms = new System.IO.MemoryStream())
    {
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            ms.Write(buffer, 0, read);
        }
        return ms.ToArray();
    }
}
However, the "input" parameter is a large file of about 2 GB, and for such files the code fails to produce the byte array.
For smaller files it works fine.
That's what a Stream is for.
You don't load the whole content into a byte[], you read a small buffer from the Stream into memory and handle it, then dispose and read the next buffer.
If you still need to use a byte[]:
It seems like your app can't address enough memory for the data, which suggests it's running as a 32-bit process.
Try changing it to 64-bit (in Project Properties, go to Build and untick "Prefer 32-bit").
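Note that even in a 64-bit process, a single array is limited to 2 GB by default. If you really need to materialize larger arrays, .NET 4.5 and later let you opt in through app.config (a sketch; the per-array element count limit still applies):

<configuration>
  <runtime>
    <!-- Allow individual arrays larger than 2 GB in total size on 64-bit (.NET 4.5+). -->
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>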
A single object is limited to under 2 GB (which is why all the indexes use int).
How about using a list of byte arrays to hold the entire data?
public List<byte[]> ReadBytesList(string fileName)
{
    List<byte[]> rawDataBytes = new List<byte[]>();
    byte[] buff;
    FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read);
    BinaryReader br = new BinaryReader(fs);
    long numBytes = new FileInfo(fileName).Length;

    int arrayCount = (int)(numBytes / 2100000000); // keep each chunk below the maximum byte[] length (int.MaxValue is 2,147,483,647)
    int arrayRest = (int)(numBytes % 2100000000);

    if (arrayCount > 0)
    {
        for (int i = 0; i < arrayCount; i++)
        {
            buff = br.ReadBytes(2100000000);
            rawDataBytes.Add(buff);
        }
        buff = br.ReadBytes(arrayRest);
        rawDataBytes.Add(buff);
    }
    else
    {
        buff = br.ReadBytes(arrayRest);
        rawDataBytes.Add(buff);
    }
    return rawDataBytes;
}
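For completeness, a usage sketch (the paths are hypothetical): the chunks can be processed or written back out one at a time, so no single allocation exceeds the per-object limit.

List<byte[]> chunks = ReadBytesList(@"C:\temp\huge-input.bin");
using (var output = new FileStream(@"C:\temp\huge-copy.bin", FileMode.Create, FileAccess.Write))
{
    foreach (byte[] chunk in chunks)
    {
        output.Write(chunk, 0, chunk.Length); // each chunk stays under the 2 GB object limit
    }
}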

Convert a VERY LARGE binary file into a Base64String incrementally

I need help converting a VERY LARGE binary file (a ZIP file) to a Base64String and back again. The files are too large to be loaded into memory all at once (they throw OutOfMemoryExceptions); otherwise this would be a simple task. I do not want to process the contents of the ZIP file individually; I want to process the entire ZIP file.
The problem:
I can convert the entire ZIP file (test sizes vary from 1 MB to 800 MB at present) to Base64String, but when I convert it back, it is corrupted. The new ZIP file is the correct size, it is recognized as a ZIP file by Windows and WinRAR/7-Zip, etc., and I can even look inside the ZIP file and see the contents with the correct sizes/properties, but when I attempt to extract from the ZIP file, I get: "Error: 0x80004005" which is a general error code.
I am not sure where or why the corruption is happening. I have done some investigating, and I have noticed the following:
If you have a large text file, you can convert it to Base64String incrementally without issue. If calling Convert.ToBase64String on the entire file yielded: "abcdefghijklmnopqrstuvwx", then calling it on the file in two pieces would yield: "abcdefghijkl" and "mnopqrstuvwx".
Unfortunately, if the file is a binary then the result is different. While the entire file might yield: "abcdefghijklmnopqrstuvwx", trying to process this in two pieces would yield something like: "oiweh87yakgb" and "kyckshfguywp".
Is there a way to incrementally base 64 encode a binary file while avoiding this corruption?
My code:
private void ConvertLargeFile()
{
    FileStream inputStream = new FileStream("C:\\Users\\test\\Desktop\\my.zip", FileMode.Open, FileAccess.Read);
    byte[] buffer = new byte[MultipleOfThree];
    int bytesRead = inputStream.Read(buffer, 0, buffer.Length);
    while (bytesRead > 0)
    {
        byte[] secondaryBuffer = new byte[buffer.Length];
        int secondaryBufferBytesRead = bytesRead;
        Array.Copy(buffer, secondaryBuffer, buffer.Length);
        bool isFinalChunk = false;
        Array.Clear(buffer, 0, buffer.Length);
        bytesRead = inputStream.Read(buffer, 0, buffer.Length);
        if (bytesRead == 0)
        {
            isFinalChunk = true;
            buffer = new byte[secondaryBufferBytesRead];
            Array.Copy(secondaryBuffer, buffer, buffer.Length);
        }

        String base64String = Convert.ToBase64String(isFinalChunk ? buffer : secondaryBuffer);
        File.AppendAllText("C:\\Users\\test\\Desktop\\Base64Zip", base64String);
    }
    inputStream.Dispose();
}
The decoding is more of the same. I use the size of the base64String variable above (which varies depending on the original buffer size that I test with), as the buffer size for decoding. Then, instead of Convert.ToBase64String(), I call Convert.FromBase64String() and write to a different file name/path.
EDIT:
In my haste to reduce the code (I refactored it into a new project, separate from other processing, to eliminate code that isn't central to the issue), I introduced a bug. The base 64 conversion should be performed on secondaryBuffer for all iterations except the last (identified by isFinalChunk), when buffer should be used. I have corrected the code above.
EDIT #2:
Thank you all for your comments/feedback. After correcting the bug (see the above edit), I re-tested my code, and it is actually working now. I intend to test and implement #rene's solution as it appears to be the best, but I thought that I should let everyone know of my discovery as well.
Based on the code shown in the blog post by Wiktor Zychla, the following code works. The same solution is indicated in the Remarks section of Convert.ToBase64String, as pointed out by Ivan Stoev.
// using System.Security.Cryptography
private void ConvertLargeFile()
{
    // encode
    var filein = @"C:\Users\test\Desktop\my.zip";
    var fileout = @"C:\Users\test\Desktop\Base64Zip";
    using (FileStream fs = File.Open(fileout, FileMode.Create))
    using (var cs = new CryptoStream(fs, new ToBase64Transform(), CryptoStreamMode.Write))
    using (var fi = File.Open(filein, FileMode.Open))
    {
        fi.CopyTo(cs);
    }
    // the zip file is now stored in Base64Zip

    // and decode
    using (FileStream f64 = File.Open(fileout, FileMode.Open))
    using (var cs = new CryptoStream(f64, new FromBase64Transform(), CryptoStreamMode.Read))
    using (var fo = File.Open(filein + ".orig", FileMode.Create))
    {
        cs.CopyTo(fo);
    }
    // the original file is in my.zip.orig
    // use the command-line tool
    //   fc my.zip my.zip.orig
    // to verify that the start file and the encoded-then-decoded file
    // are the same
}
The code uses standard classes found in the System.Security.Cryptography namespace: a CryptoStream together with FromBase64Transform and its counterpart ToBase64Transform.
You can avoid using a secondary buffer by passing offset and length to Convert.ToBase64String, like this:
private void ConvertLargeFile()
{
    using (var inputStream = new FileStream("C:\\Users\\test\\Desktop\\my.zip", FileMode.Open, FileAccess.Read))
    {
        byte[] buffer = new byte[MultipleOfThree];
        int bytesRead = inputStream.Read(buffer, 0, buffer.Length);
        while (bytesRead > 0)
        {
            String base64String = Convert.ToBase64String(buffer, 0, bytesRead);
            File.AppendAllText("C:\\Users\\test\\Desktop\\Base64Zip", base64String);
            bytesRead = inputStream.Read(buffer, 0, buffer.Length);
        }
    }
}
The above should work, but I think Rene's answer is actually the better solution.
Use this code:
public void ConvertLargeFile(string source, string destination)
{
    using (FileStream inputStream = new FileStream(source, FileMode.Open, FileAccess.Read))
    {
        int buffer_size = 30000; // or any multiple of 3
        byte[] buffer = new byte[buffer_size];
        int bytesRead = inputStream.Read(buffer, 0, buffer.Length);
        while (bytesRead > 0)
        {
            byte[] buffer2 = buffer;
            if (bytesRead < buffer_size)
            {
                buffer2 = new byte[bytesRead];
                Buffer.BlockCopy(buffer, 0, buffer2, 0, bytesRead);
            }
            string base64String = System.Convert.ToBase64String(buffer2);
            File.AppendAllText(destination, base64String);
            bytesRead = inputStream.Read(buffer, 0, buffer.Length);
        }
    }
}

ushort array compression in C#

I've got a ushort array (actually an image coming from a camera) that I'd like to losslessly compress before persistent storage. I'm using the System.IO.Compression.GZipStream class. This approach, to my knowledge, requires that I convert the ushort array to a byte array. My solution appears to work correctly, but it just isn't as quick as I'd like. The images are about 2 MB in raw size; compression takes (on my slow machine) 200-400 ms and decompression takes 100-200 ms. I'm looking for suggestions to improve performance.
public static class Zip
{
    public static ushort[] Decompress_ByteToShort(byte[] zippedData)
    {
        byte[] decompressedData = null;
        using (MemoryStream outputStream = new MemoryStream())
        {
            using (MemoryStream inputStream = new MemoryStream(zippedData))
            {
                using (GZipStream zip = new GZipStream(inputStream, CompressionMode.Decompress))
                {
                    zip.CopyTo(outputStream);
                }
            }
            decompressedData = outputStream.ToArray();
        }

        ushort[] decompressShort = new ushort[decompressedData.Length / sizeof(ushort)];
        Buffer.BlockCopy(decompressedData, 0, decompressShort, 0, decompressedData.Length);
        return decompressShort;
    }

    public static byte[] Compress_ShortToByte(ushort[] plainData)
    {
        byte[] compressedData = null;
        byte[] uncompressedData = new byte[plainData.Length * sizeof(ushort)];
        Buffer.BlockCopy(plainData, 0, uncompressedData, 0, plainData.Length * sizeof(ushort));

        using (MemoryStream outputStream = new MemoryStream())
        {
            using (GZipStream zip = new GZipStream(outputStream, CompressionMode.Compress))
            {
                zip.Write(uncompressedData, 0, uncompressedData.Length);
            }
            // Don't get the MemoryStream data before the GZipStream is closed,
            // since it doesn't yet contain complete compressed data.
            // GZipStream writes additional data, including footer information, when it is disposed.
            compressedData = outputStream.ToArray();
        }
        return compressedData;
    }
}
The first problem I see in your approach is that you are using byte arrays instead of directly reading from and writing to files.
Using a smaller temporary buffer and reading/writing to streams and files directly in chunks should be much faster.
Here I propose some functions and overloads you can use to decompress from byte arrays, to byte arrays, from streams, to streams, from files, and to files.
The performance improvement should be from 10% to 20%. Try to adjust the constants as needed.
I used DeflateStream instead of GZipStream; this increases performance a bit. You can go back to GZipStream if you prefer.
I tried just the byte[] to ushort[] and ushort[] to byte[] versions of the code, and they are about 10% faster. Accessing files directly instead of loading them into a big buffer should increase performance even more.
WARNING: This way of reading and writing images is not endianness-agnostic: a file saved from an Intel/AMD machine is not necessarily compatible with a big-endian machine, for example some ARM tablets. Just as a side note :)
/// <summary>The average file size, used to preallocate the right amount of memory for compression.</summary>
private const int AverageFileSize = 100000;

/// <summary>The default size of the buffer used to convert data. WARNING: Must be a multiple of 2!</summary>
private const int BufferSize = 32768;

/// <summary>Decompresses a byte array to unsigned shorts.</summary>
public static ushort[] Decompress_ByteToShort(byte[] zippedData)
{
    using (var inputStream = new MemoryStream(zippedData))
        return Decompress_File(inputStream);
}

/// <summary>Decompresses a file to unsigned shorts.</summary>
public static ushort[] Decompress_File(string inputFilePath)
{
    using (var stream = new FileStream(inputFilePath, FileMode.Open, FileAccess.Read))
        return Decompress_File(stream);
}

/// <summary>Decompresses a file stream to unsigned shorts.</summary>
public static ushort[] Decompress_File(Stream zippedData)
{
    using (var zip = new DeflateStream(zippedData, CompressionMode.Decompress, true))
    {
        // Our temporary buffer.
        var buffer = new byte[BufferSize];

        // Read the number of elements, written initially as a header in the file.
        zip.Read(buffer, 0, sizeof(int));
        var resultLength = BitConverter.ToInt32(buffer, 0);

        // Create the result array.
        var result = new ushort[resultLength];

        // Decompress the file chunk by chunk.
        var resultOffset = 0;
        for (; ; )
        {
            // Read a chunk of data.
            var count = zip.Read(buffer, 0, BufferSize);
            if (count <= 0)
                break;

            // Copy a piece of the decompressed buffer.
            Buffer.BlockCopy(buffer, 0, result, resultOffset, count);

            // Advance counter.
            resultOffset += count;
        }
        return result;
    }
}

/// <summary>Compresses an ushort array to a byte array.</summary>
public static byte[] Compress_ShortToByte(ushort[] plainData)
{
    using (var outputStream = new MemoryStream(AverageFileSize))
    {
        Compress_File(plainData, outputStream);
        return outputStream.ToArray();
    }
}

/// <summary>Compresses an ushort array directly to a file.</summary>
public static void Compress_File(ushort[] plainData, string outputFilePath)
{
    using (var stream = new FileStream(outputFilePath, FileMode.OpenOrCreate, FileAccess.Write))
        Compress_File(plainData, stream);
}

/// <summary>Compresses an ushort array directly to a file stream.</summary>
public static void Compress_File(ushort[] plainData, Stream outputStream)
{
    using (var zip = new DeflateStream(outputStream, CompressionMode.Compress, true))
    {
        // Our temporary buffer.
        var buffer = new byte[BufferSize];

        // Writes the length of the plain data.
        zip.Write(BitConverter.GetBytes(plainData.Length), 0, sizeof(int));

        var inputOffset = 0;
        var availableBytes = plainData.Length * sizeof(ushort);
        while (availableBytes > 0)
        {
            // Compute the amount of bytes to copy.
            var bytesCount = Math.Min(BufferSize, availableBytes);

            // Copy a chunk of plain data into the temporary buffer.
            Buffer.BlockCopy(plainData, inputOffset, buffer, 0, bytesCount);

            // Write the buffer.
            zip.Write(buffer, 0, bytesCount);

            // Advance counters.
            inputOffset += bytesCount;
            availableBytes -= bytesCount;
        }
    }
}
There is a GZipStream constructor (Stream, CompressionLevel); you can change the CompressionLevel to speed up compression. There is a level named Fastest in that enumeration.
Links to relevant documentation:
http://msdn.microsoft.com/pl-pl/library/hh137341(v=vs.110).aspx
http://msdn.microsoft.com/pl-pl/library/system.io.compression.compressionlevel(v=vs.110).aspx
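For illustration, a sketch of that overload dropped into the compression routine from the question (the overload exists in .NET 4.5 and later; uncompressedData and compressedData are the question's variables):

using (MemoryStream outputStream = new MemoryStream())
{
    // CompressionLevel.Fastest trades compression ratio for speed;
    // CompressionLevel.Optimal does the opposite.
    using (GZipStream zip = new GZipStream(outputStream, CompressionLevel.Fastest))
    {
        zip.Write(uncompressedData, 0, uncompressedData.Length);
    }
    compressedData = outputStream.ToArray();
}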

how to buffer an input stream until it is complete

I'm implementing a WCF service that accepts image streams. However, I'm currently getting an exception when I run it, because it tries to get the length of the stream before the stream is complete. So what I'd like to do is buffer the stream until it's complete. However, I can't find any examples of how to do this...
Can anyone help?
My code so far:
public String uploadUserImage(Stream stream)
{
    Stream fs = stream;
    BinaryReader br = new BinaryReader(fs);
    Byte[] bytes = br.ReadBytes((Int32)fs.Length); // this causes the exception
    File.WriteAllBytes(filepath, bytes);
}
Rather than try to fetch the length, you should read from the stream until it returns that it's "done". In .NET 4, this is really easy:
// Assuming we *really* want to read it into memory first...
MemoryStream memoryStream = new MemoryStream();
stream.CopyTo(memoryStream);
memoryStream.Position = 0;
File.WriteAllBytes(filepath, memoryStream.ToArray());
In .NET 3.5 there's no CopyTo method, but you can write something similar yourself:
public static void CopyStream(Stream input, Stream output)
{
    byte[] buffer = new byte[8192];
    int bytesRead;
    while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        output.Write(buffer, 0, bytesRead);
    }
}
However, now we've got something to copy a stream, why bother reading it all into memory first? Let's just write it straight to a file:
using (FileStream output = File.OpenWrite(filepath))
{
    CopyStream(stream, output); // Or stream.CopyTo(output);
}
I'm not sure what you are returning (or not returning), but something like this might work for you:
public String uploadUserImage(Stream stream)
{
    const int KB = 1024;
    Byte[] bytes = new Byte[KB];
    StringBuilder sb = new StringBuilder();

    using (BinaryReader br = new BinaryReader(stream))
    {
        int len;
        do
        {
            len = br.Read(bytes, 0, KB);
            string readData = Encoding.UTF8.GetString(bytes, 0, len); // only decode the bytes actually read
            sb.Append(readData);
        } while (len == KB);
    }

    //File.WriteAllBytes(filepath, bytes);
    return sb.ToString();
}
A string can hold up to 2 GB, I believe.
Try this (note that CopyTo needs a Stream target, so use File.Create rather than a StreamWriter):
using (FileStream fs = File.Create(filepath))
{
    stream.CopyTo(fs);
}
Jon Skeet's answer for .NET 3.5 and below using a buffered read is actually done incorrectly.
The buffer isn't cleared between reads, which can result in issues on any read that returns less than 8192 bytes; for example, if the 2nd read reads 192 bytes, the last 8000 bytes from the first read would STILL be in the buffer and would then end up in the output stream.
With my code below, you supply a Stream and it returns an IEnumerable of byte arrays.
Using this, you can foreach it, write each chunk to a MemoryStream, and then use .GetBuffer() to end up with a merged byte[].
private IEnumerable<byte[]> ReadFullStream(Stream stream)
{
    while (true)
    {
        // Since this is created every loop, its buffer is cleared.
        byte[] buffer = new byte[8192];

        // Read up to 8192 bytes into the buffer.
        int bytesRead = stream.Read(buffer, 0, buffer.Length);
        if (bytesRead == 0)
        {
            // If we read nothing, the stream is finished.
            break;
        }

        // If we read LESS than 8192 bytes, resize the buffer to remove everything after
        // what was read; otherwise you will have null bytes (0x00) at the end of your buffer.
        if (bytesRead < buffer.Length)
        {
            Array.Resize(ref buffer, bytesRead);
        }

        // Yield return the buffer data.
        yield return buffer;
    } // loop here until we reach a read == 0 (end of stream)
}
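A short usage sketch of the snippet above, reusing the question's stream and filepath (ToArray is used here instead of the GetBuffer call mentioned above, so the result has no unused trailing capacity):

byte[] allBytes;
using (var ms = new MemoryStream())
{
    foreach (byte[] chunk in ReadFullStream(stream))
    {
        ms.Write(chunk, 0, chunk.Length); // append each chunk
    }
    allBytes = ms.ToArray();
}
File.WriteAllBytes(filepath, allBytes);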
