Memory leak at simple loading/saving of files - c#

As part of my thesis, I need to load, modify and save .dds texture files. Therefore I'm using the DevIL.NET-Wrapper library (but the problem isn't specific to this library I guess, it's more of a general problem).
I managed (by using the visual studio memory analysis tools) to figure out the memory leaking function inside the DevIL.NET-Wrapper:
public static byte[] ReadStreamFully(Stream stream, int initialLength) {
if(initialLength < 1) {
initialLength = 32768; //Init to 32K if not a valid initial length
}
byte[] buffer = new byte[initialLength];
int position = 0;
int chunk;
while((chunk = stream.Read(buffer, position, buffer.Length - position)) > 0) {
position += chunk;
//If we reached the end of the buffer check to see if there's more info
if(position == buffer.Length) {
int nextByte = stream.ReadByte();
//If -1 we reached the end of the stream
if(nextByte == -1) {
return buffer;
}
//Not at the end, need to resize the buffer
byte[] newBuffer = new byte[buffer.Length * 2];
Array.Copy(buffer, newBuffer, buffer.Length);
newBuffer[position] = (byte) nextByte;
buffer = newBuffer;
position++;
}
}
//Trim the buffer before returning
byte[] toReturn = new byte[position];
Array.Copy(buffer, toReturn, position);
return toReturn;
}
I wrote a test program to figure out where the memory leak actually comes from:
private static void testMemoryOverflow(string[] args)
{
DevIL.ImageImporter im;
DevIL.ImageExporter ie;
...
foreach (String file in ddsPaths)
{
using (FileStream fs = File.Open(file, FileMode.Open))
{
/* v memory leak v */
DevIL.Image img = im.LoadImageFromStream(fs);
/* ^ memory leak ^ */
ie.SaveImage(img, fileSavePath);
img = null;
}
}
}
The LoadImageFromStream() function is also part of the DevIL.NET-Wrapper and in fact calls the ReadStreamFully() function shown above. This is where the leak occurs.
What I already tried:
Using GC.Collect()
Disposing the FileStream object manually instead of using the using statement
Disposing the stream inside the DevIL.NET ReadStreamFully() function from above
Does anyone have a solution for this?
I'm new to C#, so maybe it's kind of a basic mistake.

Your issue is the buffer size.
byte[] newBuffer = new byte[buffer.Length * 2];
After two iterations you're already very close to the 85,000-byte threshold at which objects are allocated on the Large Object Heap (LOH). At three iterations you've passed it. Once there, the buffers won't be collected until a full garbage collection occurs across all generations. Even then, the LOH isn't compacted by default, so you'll still see high memory usage.
I'm not sure why the library you're using does this. I'm not sure why you're using it either, given that you can use:
Image img = Image.FromStream(fs); // built into .NET.
The way that library is written looks like it was from an earlier version of .NET. It doesn't appear to have memory usage as any sort of concern.
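To make the threshold above concrete, here is a minimal sketch that asks the runtime which generation it reports for a freshly allocated array. LohDemo and GenerationOfNewArray are hypothetical names used only for illustration, and the sketch assumes the default LOH threshold of 85,000 bytes has not been reconfigured.

```csharp
using System;

public static class LohDemo
{
    // Allocates a byte array of the given length and returns the GC
    // generation the runtime reports for it immediately afterwards.
    public static int GenerationOfNewArray(int length)
    {
        byte[] data = new byte[length];
        return GC.GetGeneration(data);
    }
}
```

With default settings, the 32768-byte initial buffer should normally be reported in generation 0, while the 131072-byte buffer produced by the second doubling should be reported as generation 2, because LOH objects are only collected together with generation 2.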

Related

Handling big file stream (read+write bytes)

The following code does three things:
Read all bytes from an input file
Keep only part of the file in outbytes
Write the extracted bytes in outputfile
byte[] outbytes = File.ReadAllBytes(sourcefile).Skip(offset).Take(size).ToArray();
File.WriteAllBytes(outfile, outbytes);
But there is a limitation of ~2GB data for each step.
Edit: The extracted bytes size can also be greater than 2GB.
How can I handle big files? What is the best way to proceed with good performance, regardless of size?
Thx !
Here is an example that uses FileStream to take the middle 3 GB out of a 5 GB file:
byte[] buffer = new byte[1024 * 1024]; // 1 MB buffer
using (var readFS = File.OpenRead(pathToBigFile))
using (var writeFS = File.OpenWrite(pathToNewFile))
{
readFS.Seek(1024L * 1024 * 1024, SeekOrigin.Begin); // seek to 1 GB in
for (int i = 0; i < 3 * 1024; i++) { // 3072 reads of up to one megabyte = 3 GB
int bytesRead = readFS.Read(buffer, 0, buffer.Length);
writeFS.Write(buffer, 0, bytesRead);
}
}
It's not production-grade code: Read might not return a full megabyte each time, so you could end up with less than 3 GB. It's meant to demonstrate the concept of using two FileStreams, reading repeatedly from one and writing repeatedly to the other. You can modify it to copy an exact number of bytes by keeping a running total of bytesRead in the loop and stopping once you have read enough.
It is better to stream the data from one file to the other, only loading small parts of it into memory:
public static void CopyFileSection(string inFile, string outFile, long startPosition, long size)
{
// Open the files as streams
using (var inStream = File.OpenRead(inFile))
using (var outStream = File.OpenWrite(outFile))
{
// seek to the start position
inStream.Seek(startPosition, SeekOrigin.Begin);
// Create a variable to track how much more to copy
// and a buffer to temporarily store a section of the file
long remaining = size;
byte[] buffer = new byte[81920];
do
{
// Read the smaller of 81920 or remaining and break out of the loop if we've already reached the end of the file
int bytesRead = inStream.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
if (bytesRead == 0) { break; }
// Write the buffered bytes to the output file
outStream.Write(buffer, 0, bytesRead);
remaining -= bytesRead;
}
while (remaining > 0);
}
}
Usage:
CopyFileSection(sourcefile, outfile, offset, size);
This should have equivalent functionality to your current method without the overhead of reading the entire file, regardless of its size, into memory.
Note: If you're doing this in code that uses async/await, you should change CopyFileSection to be public static async Task CopyFileSection and change inStream.Read and outStream.Write to await inStream.ReadAsync and await outStream.WriteAsync respectively.
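The async variant described in the note could look roughly like the sketch below. The class name FileSectionCopier and the method name CopyFileSectionAsync are assumptions made for illustration. Opening the streams with useAsync: true and using FileMode.Create for the output (which truncates an existing file, unlike File.OpenWrite) are also choices of this sketch, not something the answer prescribes.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class FileSectionCopier
{
    // Copies `size` bytes starting at `startPosition` from inFile to outFile,
    // holding at most one 81920-byte buffer in memory at a time.
    public static async Task CopyFileSectionAsync(string inFile, string outFile, long startPosition, long size)
    {
        const int BufferSize = 81920;
        using (var inStream = new FileStream(inFile, FileMode.Open, FileAccess.Read, FileShare.Read, BufferSize, useAsync: true))
        using (var outStream = new FileStream(outFile, FileMode.Create, FileAccess.Write, FileShare.None, BufferSize, useAsync: true))
        {
            inStream.Seek(startPosition, SeekOrigin.Begin);
            long remaining = size;
            byte[] buffer = new byte[BufferSize];
            while (remaining > 0)
            {
                // Never ask for more than what is still needed.
                int toRead = (int)Math.Min(buffer.Length, remaining);
                int bytesRead = await inStream.ReadAsync(buffer, 0, toRead);
                if (bytesRead == 0) { break; } // end of file reached early
                await outStream.WriteAsync(buffer, 0, bytesRead);
                remaining -= bytesRead;
            }
        }
    }
}
```

Usage from an async method would be: await FileSectionCopier.CopyFileSectionAsync(sourcefile, outfile, offset, size);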

C# MemoryStream.Read() always reads same part

Edit: Solution is at bottom of post
I am trying my luck with reading binary files. Since I don't want to rely on byte[] AllBytes = File.ReadAllBytes(myPath), because the binary file might be rather big, I want to read small portions of the same size (which fits nicely with the file format to read) in a loop, using what I would call a "buffer".
public void ReadStream(MemoryStream ContentStream)
{
byte[] buffer = new byte[sizePerHour];
for (int hours = 0; hours < NumberHours; hours++)
{
int t = ContentStream.Read(buffer, 0, sizePerHour);
SecondsToAdd = BitConverter.ToUInt32(buffer, 0);
// further processing of my byte[] buffer
}
}
My stream contains all the bytes I want, which is a good thing. When I enter the loop several things cease to work.
My int t is 0, although I would presume that ContentStream.Read() would copy data from the stream into my byte array, but that isn't the case.
I tried buffer = ContentStream.GetBuffer(), but that results in my buffer containing all of my stream, a behaviour I wanted to avoid by reading into a buffer.
Resetting the stream to position 0 before reading did not help either, nor did specifying an offset for my Stream.Read(), which means I am lost.
Can anyone point me to reading small portions of a stream to a byte[]? Maybe with some code?
Thanks in advance
Edit:
What pointed me in the right direction was the answer that .Read() returns 0 if the end of the stream is reached. I modified my code to the following:
public void ReadStream(MemoryStream ContentStream)
{
byte[] buffer = new byte[sizePerHour];
ContentStream.Seek(0, SeekOrigin.Begin); //Added this line
for (int hours = 0; hours < NumberHours; hours++)
{
int t = ContentStream.Read(buffer, 0, sizePerHour);
SecondsToAdd = BitConverter.ToUInt32(buffer, 0);
// further processing of my byte[] buffer
}
}
And everything works like a charm. Initially I reset the stream to its origin on every iteration over the hours and passed an offset. Moving the reset-to-beginning part outside my loop and leaving the offset at 0 did the trick.
Read returns zero if the end of the stream is reached. Are you sure that your memory stream has the content you expect? I've tried the following and it works as expected:
// Create the source of the memory stream.
UInt32[] source = {42, 4711};
List<byte> sourceBuffer = new List<byte>();
Array.ForEach(source, v => sourceBuffer.AddRange(BitConverter.GetBytes(v)));
// Read the stream.
using (MemoryStream contentStream = new MemoryStream(sourceBuffer.ToArray()))
{
byte[] buffer = new byte[sizeof (UInt32)];
int t;
do
{
t = contentStream.Read(buffer, 0, buffer.Length);
if (t > 0)
{
UInt32 value = BitConverter.ToUInt32(buffer, 0);
}
} while (t > 0);
}
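Because Read may in general return fewer bytes than requested, a small helper that keeps reading until the buffer is full or the stream ends can make fixed-size record reading more robust. The following is only a sketch; StreamChunks and ReadBlock are hypothetical names, not part of the framework.

```csharp
using System;
using System.IO;

public static class StreamChunks
{
    // Fills `buffer` from `stream`, looping because a single Read call may
    // return fewer bytes than asked for. Returns the number of bytes read,
    // which is smaller than buffer.Length only when the stream has ended.
    public static int ReadBlock(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int read = stream.Read(buffer, total, buffer.Length - total);
            if (read == 0) { break; } // end of stream
            total += read;
        }
        return total;
    }
}
```

A loop over records would then call ReadBlock once per record and stop as soon as the return value is smaller than the record size.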

FileStream.CopyTo(Net.ConnectStream): what happens internally?

This code works fine. My question is: what happens within the Net.ConnectStream when I use the CopyTo() method?
System.Net.HttpWebRequest request; // creation of the request is omitted here
using (FileStream fileStream = new FileStream("C:\\myfile.txt", FileMode.Open))
{
using (Stream str = request.GetRequestStream())
{
fileStream.CopyTo(str);
}
}
More specific: What happens to the data?
1. write into the memory and upload then? (what's with big files?)
2. write into the network directly? (how does that work?)
Thanks for your answers
It creates a byte[] buffer and calls Read on the source and Write on the destination until the source doesn't have any more data.
So when doing this with big files you don't need to be concerned about running out of memory because you'll only allocate as much as the buffer size, 81920 bytes by default.
Here's the actual implementation -
public void CopyTo(Stream destination)
{
// ... a bunch of argument validation stuff (omitted)
this.InternalCopyTo(destination, 81920);
}
private void InternalCopyTo(Stream destination, int bufferSize)
{
byte[] array = new byte[bufferSize];
int count;
while ((count = this.Read(array, 0, array.Length)) != 0)
{
destination.Write(array, 0, count);
}
}

File Chunking Performance in C#

I am trying to empower users to upload large files. Before I upload a file, I want to chunk it up. Each chunk needs to be a C# object. The reason is logging. It's a long story, but I need to create actual C# objects that represent each file chunk. Regardless, I'm trying the following approach:
public static List<FileChunk> GetAllForFile(byte[] fileBytes)
{
List<FileChunk> chunks = new List<FileChunk>();
if (fileBytes.Length > 0)
{
FileChunk chunk = new FileChunk();
for (int i = 0; i < (fileBytes.Length / 512); i++)
{
chunk.Number = (i + 1);
chunk.Offset = (i * 512);
chunk.Bytes = fileBytes.Skip(chunk.Offset).Take(512).ToArray();
chunks.Add(chunk);
chunk = new FileChunk();
}
}
return chunks;
}
Unfortunately, this approach seems to be incredibly slow. Does anyone know how I can improve the performance while still creating objects for each chunk?
thank you
I suspect this is going to hurt a little:
chunk.Bytes = fileBytes.Skip(chunk.Offset).Take(512).ToArray();
Try this instead:
byte[] buffer = new byte[512];
Buffer.BlockCopy(fileBytes, chunk.Offset, buffer, 0, 512);
chunk.Bytes = buffer;
(Code not tested)
This code is likely slow because Skip doesn't do anything special for arrays (though it could). Every pass through your loop therefore iterates over the first 512*n items of the array, which results in O(n^2) performance where you should be seeing O(n).
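Putting the Buffer.BlockCopy suggestion into the whole loop might look like the sketch below. The FileChunk class here is a hypothetical stand-in with the three members used in the question, and Chunker is an assumed name. Unlike the loop in the question, this sketch also emits a final, shorter chunk when the length is not a multiple of 512.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the FileChunk class from the question.
public class FileChunk
{
    public int Number { get; set; }
    public int Offset { get; set; }
    public byte[] Bytes { get; set; }
}

public static class Chunker
{
    public const int ChunkSize = 512;

    // Splits fileBytes into chunks of up to ChunkSize bytes. Each byte is
    // copied exactly once, so the total work is linear in the file size.
    public static List<FileChunk> GetAllForFile(byte[] fileBytes)
    {
        var chunks = new List<FileChunk>();
        int number = 1;
        for (int offset = 0; offset < fileBytes.Length; offset += ChunkSize)
        {
            int count = Math.Min(ChunkSize, fileBytes.Length - offset);
            var bytes = new byte[count];
            Buffer.BlockCopy(fileBytes, offset, bytes, 0, count);
            chunks.Add(new FileChunk { Number = number, Offset = offset, Bytes = bytes });
            number++;
        }
        return chunks;
    }
}
```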
Try something like this (untested code):
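public static List<FileChunk> GetAllForFile(string fileName)
{
var chunks = new List<FileChunk>();
using (FileStream stream = new FileStream(fileName, FileMode.Open, FileAccess.Read))
{
int i = 0;
// Stop once the position reaches the end of the file
while (stream.Position < stream.Length)
{
var chunk = new FileChunk();
chunk.Number = i;
chunk.Offset = i * 512;
// Size the array to what is left, so the last chunk may be shorter
chunk.Bytes = new byte[(int)Math.Min(512, stream.Length - stream.Position)];
// Note: Read may return fewer bytes than requested; check its result in real code
stream.Read(chunk.Bytes, 0, chunk.Bytes.Length);
chunks.Add(chunk);
i++;
}
}
return chunks;
}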
The above code skips several steps in your process, preferring to read the bytes from the file directly.
Note that, if the file size is not an exact multiple of 512 bytes, the last chunk will contain fewer than 512 bytes.
Same as Robert Harvey's answer, but using a BinaryReader, that way I don't need to specify an offset. If you use a BinaryWriter on the other end to reassemble the file, you won't need the Offset member of FileChunk.
public static List<FileChunk> GetAllForFile(string fileName) {
var chunks = new List<FileChunk>();
using (FileStream stream = new FileStream(fileName, FileMode.Open, FileAccess.Read)) {
BinaryReader reader = new BinaryReader(stream);
int i = 0;
bool eof = false;
while (!eof) {
var chunk = new FileChunk();
chunk.Number = i;
chunk.Offset = (i * 512);
chunk.Bytes = reader.ReadBytes(512);
chunks.Add(chunk);
i++;
if (chunk.Bytes.Length < 512) { eof = true; }
}
}
return chunks;
}
Have you thought about what you're going to do to compensate for packet loss and data corruption?
Since you mentioned that the load is taking a long time, I would use asynchronous file reading to speed up the loading process. The hard disk is the slowest component of a computer. Google does asynchronous reads and writes in Google Chrome to improve load times. I had to do something like this in C# in a previous job.
The idea would be to spawn several asynchronous requests over different parts of the file. Then when a request comes in, take the byte array and create your FileChunk objects taking 512 bytes at a time. There are several benefits to this:
If you have this run in a separate thread, then you won't have the whole program waiting to load the large file you have.
You can process a byte array, creating FileChunk objects, while the hard disk is still fulfilling read requests for other parts of the file.
You will save RAM if you limit the number of pending read requests. This causes fewer page faults and uses the RAM and CPU cache more efficiently, which speeds up processing further.
You would want to use the following methods in the FileStream class.
[HostProtectionAttribute(SecurityAction.LinkDemand, ExternalThreading = true)]
public virtual IAsyncResult BeginRead(
byte[] buffer,
int offset,
int count,
AsyncCallback callback,
Object state
)
public virtual int EndRead(
IAsyncResult asyncResult
)
Also this is what you will get in the asyncResult:
// Extract the FileStream (state) out of the IAsyncResult object
FileStream fs = (FileStream) ar.AsyncState;
// Get the result
Int32 bytesRead = fs.EndRead(ar);
Here is some reference material for you to read.
This is a code sample of working with Asynchronous File I/O Models.
This is a MS documentation reference for Asynchronous File I/O.
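As a rough sketch of the idea of issuing reads over different parts of the file, the helper below reads one segment asynchronously. Note that it swaps the BeginRead/EndRead pair shown above for the Task-based ReadAsync method, which is an assumption about what is available in your framework version. SegmentReader and ReadSegmentAsync are hypothetical names.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class SegmentReader
{
    // Reads up to `count` bytes starting at `offset`. The returned array is
    // shorter than `count` if the file ends inside the requested segment.
    public static async Task<byte[]> ReadSegmentAsync(string path, long offset, int count)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, useAsync: true))
        {
            fs.Seek(offset, SeekOrigin.Begin);
            var buffer = new byte[count];
            int total = 0;
            while (total < count)
            {
                int read = await fs.ReadAsync(buffer, total, count - total);
                if (read == 0) { break; } // end of file
                total += read;
            }
            if (total < count) { Array.Resize(ref buffer, total); }
            return buffer;
        }
    }
}
```

Several segments could then be requested at once and awaited together with Task.WhenAll, with the number of segments in flight kept small to bound memory use.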

How to copy data from one UnmanagedMemoryStream to another

I'm writing an UnmanagedRewindBuffer and I want to implement dynamic resizing of the buffer. I've tried several different things, but I can't seem to be able to get it right. The basic idea is that:
I allocate a new block of unmanaged memory.
Create a new UnmanagedMemoryStream (UMS).
Copy the contents from the old UMS to the new UMS.
Dispose of the old UMS and free the old allocated block.
Replace the old UMS and memory block with the new ones.
Here is my resize function:
private void DynamicallyResizeBuffer(long spaceNeeded)
{
while (_ums.Length < spaceNeeded)
{
// Allocate a new buffer
int length = (int)((double)spaceNeeded * RESIZE_FACTOR);
IntPtr tempMemoryPointer = Marshal.AllocHGlobal(length);
// Zero the newly allocated memory
//MemSet(tempMemoryPointer, length, 0);
byte* bytePointer = (byte*)tempMemoryPointer.ToPointer();
for (int i = 0; i < length; i++)
{
*(bytePointer + i) = 0;
}
// Copy the data
// MoveMemory(bytePointer, _memoryPointer.ToPointer(), _length);
// Create a new UnmanagedMemoryStream
UnmanagedMemoryStream tempUms = new UnmanagedMemoryStream(bytePointer, length, length, FileAccess.ReadWrite);
// Set up the reader and writers
BinaryReader tempReader = new BinaryReader(tempUms);
BinaryWriter tempWriter = new BinaryWriter(tempUms);
// Copy the data
_ums.Position = 0;
tempWriter.Write(ReadBytes(_length));
// I had deleted this line while I was using the writers and
// I forgot to copy it over, but the line was here when I used
// the MoveMemory function
tempUms.Position = _ums.Position;
// Free the old resources
Free(true);
_ums = tempUms;
_reader = tempReader;
_writer = tempWriter;
_length = length;
}
}
And here is my test for resizing:
public void DynamicResizeTest()
{
Int32 expected32 = 32;
Int32 actual32 = 0;
UInt64 expected64 = 64;
UInt64 actual64 = 0;
string expected = "expected";
string actual = string.Empty;
string actualFromBytes = string.Empty;
byte[] expectedBytes = Encoding.UTF8.GetBytes(expected);
// Create a 4-byte buffer
UnmanagedRewindBuffer ubs = null;
try
{
ubs = new UnmanagedRewindBuffer(4, 1);
ubs.WriteInt32(expected32);
// should dynamically resize for the 64 bit integer
ubs.WriteUInt64(expected64);
ubs.WriteString(expected);
// should dynamically resize for the bytes
ubs.WriteByte(expectedBytes);
ubs.Rewind();
actual32 = ubs.ReadInt32();
actual64 = ubs.ReadUInt64();
actual = ubs.ReadString();
actualFromBytes = Encoding.UTF8.GetString(ubs.ReadBytes(expected.Length));
}
finally
{
if (ubs != null)
{
ubs.Clear();
ubs.Dispose();
}
ubs = null;
}
Assert.AreEqual(expected32, actual32);
Assert.AreEqual(expected64, actual64);
Assert.AreEqual(expected, actual);
Assert.AreEqual(expected, actualFromBytes);
}
I've tried calling MoveMemory, which is just an unsafe extern to the kernel32 RtlMoveMemory, but when I run the test I get the following results:
actual32 is 32, expected 32
actual64 is 0, expected 64
actual is "", expected "expected"
actualFromBytes is some gibberish, expected "expected"
When I use the reader/writer to directly read from the old UMS to the new UMS, I get the following results:
actual32 is 32, expected 32
actual64 is 64, expected 64
actual is "", expected "expected"
actualFromBytes is "\b\0expect", expected "expected"
If I allocate enough space right from the start, then I have no issues with reading the values and I get the correct expected results.
What's the right way to copy the data?
Update:
Per Alexi's comment, here is the Free method which disposes of the reader/writer and the UnmanagedMemoryStream:
private void Free(bool disposeManagedResources)
{
// Dispose unmanaged resources
Marshal.FreeHGlobal(_memoryPointer);
// Dispose managed resources. Should not be called from destructor.
if (disposeManagedResources)
{
_reader.Close();
_writer.Close();
_reader = null;
_writer = null;
_ums.Dispose();
_ums = null;
}
}
You forgot this assignment:
_memoryPointer = tempMemoryPointer;
That can go unnoticed for a while: _memoryPointer is pointing to a released memory block that still contains the old bytes, until the Windows heap manager reuses the block or your code overwrites memory owned by another allocation. Exactly when that happens is unpredictable. You can take "unsafe" in the class name quite literally here.
First guess: you are not disposing the BinaryWriter, so data may not be committed to the underlying stream.
You may also be missing code that updates the position in your UnmanagedRewindBuffer.
Second guess: the reader is created on the wrong stream.
Note: Consider using Stream.CopyTo (.NET 4 - http://msdn.microsoft.com/en-us/library/system.io.stream.copyto.aspx) to copy the stream. For 3.5, check "How do I copy the contents of one stream to another?".
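As an alternative to allocating a second block and copying by hand, a block obtained from Marshal.AllocHGlobal can be grown with Marshal.ReAllocHGlobal, which preserves the existing bytes. This is a different technique from the one in the question, shown only as a sketch; UnmanagedResize, Grow and RoundTrip are hypothetical names. The block may move when it grows, so the returned pointer must replace the stored one, and any UnmanagedMemoryStream built over the old pointer would have to be recreated.

```csharp
using System;
using System.Runtime.InteropServices;

public static class UnmanagedResize
{
    // Grows a block allocated with AllocHGlobal. Existing bytes are kept,
    // but the block may move, so callers must store the returned pointer.
    public static IntPtr Grow(IntPtr block, int newSize)
    {
        return Marshal.ReAllocHGlobal(block, (IntPtr)newSize);
    }

    // Writes an Int32, grows the block, writes an Int64 after it, then
    // reads both values back and returns their sum.
    public static long RoundTrip()
    {
        IntPtr block = Marshal.AllocHGlobal(4);
        try
        {
            Marshal.WriteInt32(block, 32);
            block = Grow(block, 12); // keep the new pointer
            Marshal.WriteInt64(block, 4, 64L);
            return Marshal.ReadInt32(block) + Marshal.ReadInt64(block, 4);
        }
        finally
        {
            Marshal.FreeHGlobal(block);
        }
    }
}
```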
