The user specifies a filename and a block size. The original file is split into blocks of the user's block size (except the last block). For each block the program calculates a SHA-256 hash and writes it to the console.
The program has two threads: the first thread reads the original file and puts each block's byte array into a queue; the second thread takes a block's byte array from the queue and calculates its hash.
After the first iteration the memory is not released until the program completes.
On subsequent iterations memory is allocated and released normally.
Still, during the next read of a part array I get an OutOfMemoryException.
How can I manage memory correctly to avoid this memory leak?
class Encryption
{
static FileInfo originalFile;
static long partSize = 0;
static long lastPartSize = 0;
static long numParts = 0;
static int lastPartNumber = 0;
static string[] hash;
static Queue<byte[]> partQueue = new Queue<byte[]>();
public Encryption(string _filename, long _partSize)
{
try
{
originalFile = new FileInfo(_filename);
partSize = _partSize;
numParts = originalFile.Length / partSize;
lastPartSize = originalFile.Length % partSize;
if (lastPartSize != 0)
{
numParts++;
}
else if (lastPartSize == 0)
{
lastPartSize = partSize;
}
lastPartNumber = (int)numParts - 1;
hash = new string[numParts];
}
catch (FileNotFoundException fe)
{
Console.WriteLine("Error: {0}\nStackTrace: {1}", fe.Message, fe.StackTrace);
return;
}
catch (Exception e)
{
Console.WriteLine("Error: {0}\nStackTrace: {1}", fe.Message, fe.StackTrace);
}
}
private void readFromFile()
{
try
{
using (FileStream fs = new FileStream(originalFile.FullName, FileMode.Open, FileAccess.Read))
{
for (int i = 0; i < numParts; i++)
{
long len = 0;
if (i == lastPartNumber)
{
len = lastPartSize;
}
else
{
len = partSize;
}
byte[] part = new byte[len];
fs.Read(part, 0, (int)len);
partQueue.Enqueue(part);
part = null;
}
}
}
catch(Exception e)
{
Console.WriteLine("Error: {0}\nStackTrace: {1}", fe.Message, fe.StackTrace);
}
}
private static void hashToArray()
{
try
{
SHA256Managed sha256HashString = new SHA256Managed();
int numPart = 0;
while (numPart < numParts)
{
long len = 0;
if (numPart == lastPartNumber)
{
len = lastPartSize;
}
else
{
len = partSize;
}
hash[numPart] = sha256HashString.ComputeHash(partQueue.Dequeue()).ToString();
numPart++;
}
}
catch (Exception e)
{
Console.WriteLine("Error: {0}\nStackTrace: {1}", fe.Message, fe.StackTrace);
}
}
private void hashWrite()
{
try
{
Console.WriteLine("\nResult:\n");
for (int i = 0; i < numParts; i++)
{
Console.WriteLine("{0} : {1}", i, hash[i]);
}
}
catch(Exception e)
{
Console.WriteLine("Error: {0}\nStackTrace: {1}", fe.Message, fe.StackTrace);
}
}
public void threadsControl()
{
try
{
Thread readingThread = new Thread(readFromFile);
Thread calculateThread = new Thread(hashToArray);
readingThread.Start();
calculateThread.Start();
readingThread.Join();
calculateThread.Join();
hashWrite();
}
catch (Exception e)
{
Console.WriteLine("Error: {0}\nStackTrace: {1}", fe.Message, fe.StackTrace);
}
}
}
You should read some books about .NET internals before writing code like this. Your understanding of the .NET memory model is wrong, and this is why you are getting this error. An OutOfMemoryException occurs very rarely if you take care of your resources, especially when you are dealing with arrays.
You should know that the .NET runtime has two heaps for reference objects: the ordinary small-object heap and the Large Object Heap (LOH). The most important difference between them is that the LOH is not compacted, even after a garbage collection.
You should know that large arrays (roughly 85,000 bytes and above, which big file blocks easily are) are allocated on the LOH, so memory is consumed very quickly. Also you should know that this line:
part = null;
does not free the memory immediately. Even worse, this line does nothing at all, because the queue still holds a reference to the part of the file you've just read. This is why you run out of memory. You could try to fix it by forcing the GC after each hash computation, but that is a highly discouraged solution.
You should rewrite your algorithm (which is a very simple case of the producer/consumer pattern) so that the whole file content is never stored in memory at once. This is quite easy: move your part variable out into a static field and read the next file part into it. Introduce an EventWaitHandle (or one of its derived classes) instead of the queue, and compute the next hash right after you've read the next part of the file, as sketched below.
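Here is a minimal sketch of that rewrite (my own illustration, untested against your project; it reuses your fields originalFile, partSize, lastPartSize, numParts, lastPartNumber and hash, and assumes using System.Threading and System.Security.Cryptography). Two AutoResetEvents hand a single shared block back and forth, so only one block of the file is ever alive:
static byte[] currentPart;                                                // the single block shared between the threads
static readonly AutoResetEvent partReady = new AutoResetEvent(false);     // reader -> hasher
static readonly AutoResetEvent partConsumed = new AutoResetEvent(true);   // hasher -> reader
private static void readFromFile()
{
    using (var fs = new FileStream(originalFile.FullName, FileMode.Open, FileAccess.Read))
    {
        for (int i = 0; i < numParts; i++)
        {
            long len = (i == lastPartNumber) ? lastPartSize : partSize;
            byte[] part = new byte[len];
            fs.Read(part, 0, (int)len);   // as in your code; a robust version would loop until len bytes are read
            partConsumed.WaitOne();       // wait until the hasher is done with the previous block
            currentPart = part;
            partReady.Set();              // signal that a new block is available
        }
    }
}
private static void hashToArray()
{
    using (var sha256 = SHA256.Create())
    {
        for (int i = 0; i < numParts; i++)
        {
            partReady.WaitOne();                                      // wait for the reader
            byte[] digest = sha256.ComputeHash(currentPart);
            hash[i] = BitConverter.ToString(digest).Replace("-", ""); // hex string instead of byte[].ToString()
            currentPart = null;                                       // drop the only live reference to the block
            partConsumed.Set();                                       // let the reader load the next block
        }
    }
}
The hex conversion also fixes a side issue in your code: ComputeHash(...).ToString() only prints the type name, not the hash value.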
I recommend you start with the basics of threading in C# by reading Joe Albahari's great series, and only after that try to implement solutions like this. Good luck with your projects.
I'm using this, slightly modified, to copy large files from a file share with the ability to continue copying if the download was disrupted. It runs in a BackgroundWorker and reports progress. This works fine, but I'd like to have the ability to write the current MD5 hash to disk (the current total, not once for each block) each time a block of file data is written to disk, WITH MINIMAL ADDITIONAL OVERHEAD. If a partial file is discovered, I'd like to read the MD5 hash from the file, and if it is identical to that of the partial file, continue copying. When the file has been copied completely, the MD5 hash in the file should be that of the completely copied file. I'd like to use that later to determine that the files in source and destination are identical. Thanks for any help!
This is my current copy method:
public static bool CopyFile(List<CopyObjects> FileList, FSObjToCopy job, BackgroundWorker BW)
{
Stopwatch sw = new Stopwatch();
long RestartPosition = 0;
bool Retry = false;
int BYTES_TO_READ = (0x200000); // 2 MB block size
foreach (CopyObjects co in FileList)
{
FileInfo fi = co.file;
FileInfo fo = null;
if (fi.Directory.FullName.StartsWith($@"{Test_Updater_Core.ServerName}\{Test_Updater_Core.ServerTemplateRoot}"))
{
if (File.Exists(fi.FullName.Replace($@"{Test_Updater_Core.ServerName}\{Test_Updater_Core.ServerTemplateRoot}", $@"{ Test_Updater_Core.USBStore_Drive.driveInfo.Name.Replace("\\", "")}\{Test_Updater_Core.UsbTemplateRoot}")))
{
fi = new FileInfo(fi.FullName.Replace($@"{Test_Updater_Core.ServerName}\{Test_Updater_Core.ServerTemplateRoot}", $@"{Test_Updater_Core.USBStore_Drive.driveInfo.Name.Replace("\\", "")}\{Test_Updater_Core.UsbTemplateRoot}"));
co.destination = co.destination.Replace($@"{Test_Updater_Core.USBStore_Drive.driveInfo.Name.Replace("\\", "")}\{Test_Updater_Core.UsbTemplateRoot}", $@"{Test_Updater_Core.LocalInstallDrive}\{Test_Updater_Core.LocalTemplateRoot}");
fo = new FileInfo($"{fi.FullName.Replace($@"{Test_Updater_Core.USBStore_Drive.driveInfo.Name.Replace("\\", "")}\{Test_Updater_Core.UsbTemplateRoot}", $@"{Test_Updater_Core.LocalInstallDrive}\{Test_Updater_Core.LocalTemplateRoot}")}{Test_Updater_Core.TempFileExtension}");
}
}
//If a clean cancellation was requested, we do it here, otherwise the BackgroundWorker will be killed
if (BW.CancellationPending)
{
job.Status = FSObjToCopy._Status.Complete;
return false;
}
//If a pause is requested, we loop here until resume or termination has been signaled
while (job.PauseBackgroundWorker == true)
{
Thread.Sleep(100);
if (BW.CancellationPending)
{
job.Status = FSObjToCopy._Status.Complete;
return false;
}
Application.DoEvents();
}
if (fo == null)
fo = new FileInfo($"{fi.FullName.Replace(job.Source, co.destination)}{Test_Updater_Core.TempFileExtension}");
if (fo.Exists)
{
Retry = true;
RestartPosition = fo.Length - BYTES_TO_READ;
}
else
{
RestartPosition = 0;
Retry = false;
}
if (RestartPosition <= 0)
{
Retry = false;
}
sw.Start();
try
{
// Read source files into file streams
FileStream source = new FileStream(fi.FullName, FileMode.Open, FileAccess.Read);
// Additional way to write to file stream
FileStream dest = new FileStream(fo.FullName, FileMode.OpenOrCreate, FileAccess.Write);
// Actual read file length
int destLength = 0;
// If the length of each read is less than the length of the source file, read in chunks
if (BYTES_TO_READ < source.Length)
{
byte[] buffer = new byte[BYTES_TO_READ];
long copied = 0;
if (Retry)
{
source.Seek(RestartPosition, SeekOrigin.Begin);
dest.Seek(RestartPosition, SeekOrigin.Begin);
Retry = false;
}
while (copied <= source.Length - BYTES_TO_READ)
{
destLength = source.Read(buffer, 0, BYTES_TO_READ);
source.Flush();
dest.Write(buffer, 0, BYTES_TO_READ);
dest.Flush();
// Current position of flow
dest.Position = source.Position;
copied += BYTES_TO_READ;
job.CopiedSoFar += BYTES_TO_READ;
if (sw.ElapsedMilliseconds > 250)
{
job.PercComplete = (int)(float)((float)job.CopiedSoFar / (float)job.TotalFileSize * 100);
sw.Restart();
sw.Start();
job.ProgressCell.Value = job.PercComplete;
BW.ReportProgress(job.PercComplete < 100 ? job.PercComplete : 99);
}
if (BW.CancellationPending)
{
job.Status = FSObjToCopy._Status.Complete;
return false;
}
while (job.PauseBackgroundWorker == true)
{
Thread.Sleep(100);
if (BW.CancellationPending)
{
job.Status = FSObjToCopy._Status.Complete;
return false;
}
Application.DoEvents();
}
}
int left = (int)(source.Length - copied);
destLength = source.Read(buffer, 0, left);
source.Flush();
dest.Write(buffer, 0, left);
dest.Flush();
job.CopiedSoFar += left;
}
else
{
// If the file length of each copy is longer than that of the source file, the actual file length is copied directly.
byte[] buffer = new byte[source.Length];
source.Read(buffer, 0, buffer.Length);
source.Flush();
dest.Write(buffer, 0, buffer.Length);
dest.Flush();
job.CopiedSoFar += source.Length;
job.PercComplete = (int)(float)((float)job.CopiedSoFar / (float)job.TotalFileSize * 100);
job.ProgressCell.Value = job.PercComplete;
BW.ReportProgress(job.PercComplete < 100 ? job.PercComplete : 99);
}
source.Close();
dest.Close();
fo.LastWriteTimeUtc = fi.LastWriteTimeUtc;
if (File.Exists(fo.FullName))
{
if (File.Exists(fo.FullName.Replace($"{Test_Updater_Core.TempFileExtension}", "")))
{
File.Delete(fo.FullName.Replace($"{Test_Updater_Core.TempFileExtension}", ""));
}
File.Move(fo.FullName, fo.FullName.Replace($"{Test_Updater_Core.TempFileExtension}", ""));
}
job.ProgressCell.Value = job.PercComplete;
BW.ReportProgress(job.PercComplete);
}
catch (Exception ex)
{
MessageBox.Show($"There was an error copying:{Environment.NewLine}{fi}{Environment.NewLine}to:" +
$"{Environment.NewLine}{fo}{Environment.NewLine}The error is: {Environment.NewLine}{ex.Message}",
"Error", MessageBoxButtons.OK, MessageBoxIcon.Error);
job.Status = FSObjToCopy._Status.Error;
return false;
}
finally
{
sw.Stop();
}
}
return true;
}
I decided to create checksum files on the server, each containing a series of checksums. As I copy the file, I add the checksums to an internal list and compare them to the server list. If at some point they do not match, I go back to the point where they were identical and start back up from there. At the end of the copy job, I write the checksums from the internal list to disk, with the same name as on the server. If I'd like to check the integrity of a file, I can compare the server file to the local file and verify the checksums. A rough sketch of the per-block hashing is below.
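This is only an illustrative sketch, not the production code; HashBlocks and the default block size are made up, and it assumes using System, System.Collections.Generic, System.IO and System.Security.Cryptography:
static List<string> HashBlocks(string path, int blockSize = 0x200000)
{
    var blockChecksums = new List<string>();
    using (var md5 = MD5.Create())
    using (var fs = File.OpenRead(path))
    {
        var buffer = new byte[blockSize];
        int read;
        while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
        {
            // hash exactly the bytes of this block
            byte[] digest = md5.ComputeHash(buffer, 0, read);
            blockChecksums.Add(BitConverter.ToString(digest).Replace("-", ""));
        }
    }
    return blockChecksums;
}
On resume, the first index where the local list and the server list differ gives the restart offset (index * blockSize); once the copy finishes, the two lists are identical.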
I have an embedded Debian board running a .NET 4.0 application on Mono with a fixed number of threads (no actions, no tasks). Because of memory issues I used CLR-Profiler on Windows to analyse the memory heap.
The following diagram shows that IThreadPoolWorkItems are not being collected (at least not in generation 0):
Now, I really don't have any idea where these objects could possibly be used and why they aren't collected.
Where could the cause of this behaviour be, and where would the IThreadPoolWorkItem be used?
What can I do to find out where they are being used? (I couldn't find them by searching the code or by looking in CLR-Profiler yet.)
Edit
...
private Dictionary<byte, Telegram> _incoming = new Dictionary<byte, Telegram>();
private Queue<byte> _serialDataQueue;
private byte[] _receiveBuffer = new byte[2048];
private Dictionary<Telegram, Telegram> _resultQueue = new Dictionary<Telegram, Telegram>();
private static Telegram _currentTelegram;
ManualResetEvent _manualReset = new ManualResetEvent(false);
...
// Called from other thread (class) to send new telegrams
public bool Send(Dictionary<byte, Telegram> telegrams, out IDictionary<Telegram, Telegram> received)
{
try
{
_manualReset.Reset();
_incoming.Clear(); // clear all prev sending telegrams
_resultQueue.Clear(); // clear the receive queue
using (token = new CancellationTokenSource())
{
foreach (KeyValuePair<byte, Telegram> pair in telegrams)
{
_incoming.Add(pair.Key, pair.Value);
}
int result = WaitHandle.WaitAny(new[] { token.Token.WaitHandle, _manualReset });
received = _resultQueue.Clone<Telegram, Telegram>();
_resultQueue.Clear();
return result == 1;
}
}
catch (Exception err)
{
...
return false;
}
}
// Communication-Thread
public void Run()
{
while(true)
{
...
GetNextTelegram(); // _currentTelegram is set there and _incoming Queue is dequeued
byte[] telegramArray = GenerateTelegram(_currentTelegram, ... );
bool telegramReceived = SendReceiveTelegram(3000, telegramArray);
...
}
}
// Helper method to send and receive telegrams
private bool SendReceiveTelegram(int timeOut, byte[] telegram)
{
// send telegram
try
{
// check if serial port is open
if (_serialPort != null && !_serialPort.IsOpen)
{
_serialPort.Open();
}
Thread.Sleep(10);
_serialPort.Write(telegram, 0, telegram.Length);
}
catch (Exception err)
{
log.ErrorFormat(err.Message, err);
return false;
}
// receive telegram
int offset = 0, bytesRead;
_serialPort.ReadTimeout = timeOut;
int bytesExpected = GetExpectedBytes(_currentTelegram);
if (bytesExpected == -1)
return false;
try
{
while (bytesExpected > 0 &&
(bytesRead = _serialPort.Read(_receiveBuffer, offset, bytesExpected)) > 0)
{
offset += bytesRead;
bytesExpected -= bytesRead;
}
for (int index = 0; index < offset; index++)
_serialDataQueue.Enqueue(_receiveBuffer[index]);
List<byte> resultList;
// looks if telegram is valid and removes bytes from _serialDataQueue
bool isValid = IsValid(_serialDataQueue, out resultList, _currentTelegram);
if (isValid && resultList != null)
{
// only add to queue if its really needed!!
byte[] receiveArray = resultList.ToArray();
_resultQueue.Add((Telegram)_currentTelegram.Clone(), respTelegram);
}
if (!isValid)
{
Clear();
}
return isValid;
}
catch (TimeoutException err) // Timeout exception
{
log.ErrorFormat(err.Message, err);
Clear();
return false;
} catch (Exception err)
{
log.ErrorFormat(err.Message, err);
Clear();
return false;
}
}
Thx for your help!
I found out that, as spender already mentioned, the "issue" is the communication over SerialPort. I found an interesting topic here:
SerialPort has a background thread that's waiting for events (via WaitCommEvent). Whenever an event arrives, it queues a threadpool work
item that may result in a call to your event handler. Let's focus on
one of these threadpool threads. It tries to take a lock (quick
reason: this exists to synchronize event raising with closing; for
more details see the end) and once it gets the lock it checks whether
the number of bytes available to read is above the threshold. If so,
it calls your handler.
So this lock is the reason your handler won't be called in separate
threadpool threads at the same time.
That's most certainly the reason why they aren't collected immediately. I also tried not using the blocking Read in my SendReceiveTelegram method, but using a SerialDataReceivedEventHandler instead led to the same result.
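For reference, the event-driven variant looked roughly like this (an illustrative sketch only, not my exact code; the handler still runs on a thread-pool work item that SerialPort queues internally):
_serialPort.DataReceived += (sender, e) =>
{
    // drain whatever is currently available into the receive queue
    int available = _serialPort.BytesToRead;
    int read = _serialPort.Read(_receiveBuffer, 0, Math.Min(available, _receiveBuffer.Length));
    for (int i = 0; i < read; i++)
        _serialDataQueue.Enqueue(_receiveBuffer[i]);
};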
So for me, I will leave things as they are for now, unless someone brings me a better solution where these ThreadPoolWorkItems aren't kept in the queue that long anymore.
Thx for your help and also your negative assessment :-D
I have a data parsing application I have been working on, and given the sheer size of the text files it reads, controlling memory usage is key to good performance. The two-part strategy first measures how much RAM each file would contribute to the total, but it also needs to know how much RAM is available to the application at a given point in time. If enough RAM is available, the application opts to do its processing in memory. Otherwise, it switches to a mode that performs all or most of the operations on disk, roughly as sketched below.
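The decision itself looks roughly like this (a sketch only; the files collection, ProcessInMemory and ProcessOnDisk are placeholders for the real processing paths, and the two measuring helpers are shown further down):
foreach (string path in files)
{
    long needed = GetSizeInMemory(path);            // -1 means the file is too big to even measure
    if (needed > 0 && needed < GetMaxAllowedMemory())
        ProcessInMemory(path);                      // placeholder for the in-memory mode
    else
        ProcessOnDisk(path);                        // placeholder for the disk-based mode
}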
Measuring a file's contribution to memory usage is quick and easy:
static Int64 GetSizeInMemory(string path)
{
//THIS CODE IS SPEEDY
Int64 r = ((Func<Int64>)(
() =>
{
try
{
using (Stream s = new MemoryStream())
{
BinaryFormatter formatter = new BinaryFormatter();
formatter.Serialize(s, File.ReadAllLines(path));
return s.Length;
}
}
catch
{
//this file is way too big
return -1;
}
}
))();
GC.Collect();
GC.WaitForPendingFinalizers();
return r;
}
However, measuring the total amount of memory available is slow and difficult. In this case, I attempted to do so by growing an array until allocation fails and trapping the resulting OutOfMemoryException, which in my thinking should give the most reliable figure.
static Int64 GetMaxAllowedMemory()
{
//THIS CODE IS SLOW
Int64 r = ((Func<Int64>)(
() =>
{
byte[] b = new byte[]{};
Int64 rs = 0;
while (true)
{
try
{
Array.Resize<byte>(ref b, b.Length + 1);
b[b.Length - 1] = new byte();
rs = b.Length;
} catch (Exception e) {
break;
}
}
b = null;
return rs;
}
))();
GC.Collect();
GC.WaitForPendingFinalizers();
return r;
}
Is there a better approach I should be using here?
Please note I have looked at a number of questions similar to this one on Stack Overflow, but most deal only with obtaining a figure for the total amount of available RAM on the computer, which is not the same as the maximum amount of RAM a .NET process is allowed at runtime.
UPDATE
After receiving an answer, I came up with the following that allows me to get the total amount of RAM available to the application.
static Int64 GetMemoryFailPoint()
{
Int64 r = ((Func<Int64>)(
() =>
{
int rs = 1;
while (true)
{
try
{
using (new System.Runtime.MemoryFailPoint(rs))
{
}
}
catch {
break;
}
rs++;
}
return Convert.ToInt64(rs) * 1000000;
}
))();
return r;
}
You could try to use the MemoryFailPoint class:
try
{
using (new System.Runtime.MemoryFailPoint(1024)) // 1024 megabytes
{
// Do processing in memory
}
}
catch (InsufficientMemoryException)
{
// Do processing on disk
}
Based on this original post.
Instead of reading the entire file into memory and seeing whether that fails or not, you can instead use MemoryFailPoint to check whether there is enough RAM available to do the in-memory processing, by using the size of the file on disk.
void ProcessFile(string path)
{
try
{
var fileInfo = new FileInfo(path);
var fileSizeInMb = (int)(fileInfo.Length >> 20);
using (new System.Runtime.MemoryFailPoint(fileSizeInMb))
{
// Do processing in memory
}
}
catch (InsufficientMemoryException)
{
// Do processing on disk
}
}
I'm using a function to add some values to a dynamic array (I know that I could use a List, but it's a requirement that I must use an array).
Right now everything is working, but I need to know when a thread fails to add a value (because the array is locked, and I want to save that time) and when it succeeds in adding it (I think I already have the success case, as you can see in the Add function).
Insert Data:
private void button6_Click(object sender, EventArgs e)
{
showMessage(numericUpDown5.Value.ToString());
showMessage(numericUpDown6.Value.ToString());
for (int i = 0; i < int.Parse(numericUpDown6.Value.ToString()); i++)
{
ThreadStart start = new ThreadStart(insertDataSecure);
new Thread(start).Start();
}
}
private void insertDataSecure()
{
for (int i = 0; i < int.Parse(numericUpDown5.Value.ToString()); i++)
sArray.addSecure(i);
MessageBox.Show(String.Format("Finished data inserted, you can check the result in: {0}", Path.Combine(
Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location),
"times.txt")), "Result", MessageBoxButtons.OK, MessageBoxIcon.Information);
}
Function to Add:
private object padLock = new object();
public void addSecure(int value)
{
Stopwatch sw = Stopwatch.StartNew();
string values = "";
lock (padLock)
{
try
{
if (array == null)
{
this.size = 1;
Resize(this.size);
array[0] = value;
count++;
}
else
{
count++;
if (size == count)
{
size *= 2;
Resize(size);
}
array[count - 1] = value;
}
}
catch
{
throw new System.ArgumentException("It was impossible to insert, try again later.", "insert");
}
values=String.Format("Element {0}, Time taken: {1}ms", value.ToString(), sw.Elapsed.TotalMilliseconds);
sw.Stop();
saveFile(values);
}
Sorry for asking this question, but I have read different articles and this is the last one that I tried to use: http://msdn.microsoft.com/en-us/library/4tssbxcw.aspx, but when I tried to implement it in my code it finally crashed with a strange error.
I'm afraid I might not completely understand the question. It sounds like you want to know how long it takes between the time the thread starts and when it actually acquires the lock. But in that case, the thread does not actually fail to add a value; it is simply delayed some period of time.
On the other hand, you do have an exception handler, so presumably there's some scenario you expect where the Resize() method can throw an exception (but you should catch only those exceptions you expect and know you can handle…a bare catch clause is not a good idea, though the harm is mitigated somewhat by the fact that you do throw an exception from the exception handler). So I can't help but wonder if that is the failure you're talking about.
That said, assuming the former interpretation is correct – that you want to time how long it takes to acquire the lock – then the following change to your code should do that:
public void addSecure(int value)
{
Stopwatch sw = Stopwatch.StartNew();
string values = "";
lock (padLock)
{
// Save the current timer value here
TimeSpan elapsedToAcquireLock = sw.Elapsed;
try
{
if (array == null)
{
this.size = 1;
Resize(this.size);
array[0] = value;
count++;
}
else
{
count++;
if (size == count)
{
size *= 2;
Resize(size);
}
array[count - 1] = value;
}
}
catch
{
throw new System.ArgumentException("It was impossible to insert, try again later.", "insert");
}
sw.Stop();
values = string.Format(
"Element {0}, Time taken: for lock acquire: {1}ms, for append operation: {2}ms",
value.ToString(),
elapsedToAcquireLock.TotalMilliseconds,
sw.Elapsed.TotalMilliseconds - elapsedToAcquireLock.TotalMilliseconds);
saveFile(values);
}
}
That will display the individual times for the sections of code: acquiring the lock, and then actually adding the value to the array (i.e. the latter not including the time taken to acquire the lock).
If that's not actually what you are trying to do, please edit your question so that it is more clear.
We are trying with the code below.
public static int SplitFile(string fileName, string tmpFolder, List<string> queue, int splitSize = 100000)
{
int chunk = 0;
if (!Directory.Exists(tmpFolder))
Directory.CreateDirectory(tmpFolder);
using (var lineIterator = File.ReadLines(fileName).GetEnumerator())
{
bool stillGoing = true;
for (chunk = 0; stillGoing; chunk++)
{
stillGoing = WriteChunk(lineIterator, splitSize, chunk, tmpFolder, queue);
}
}
return chunk;
}
private static bool WriteChunk(IEnumerator<string> lineIterator,
int splitSize, int chunk, string tmpFolder, List<string> queue)
{
try
{
//int tmpChunkSize = 1000;
//int tmpChunkInc = 0;
string splitFile = Path.Combine(tmpFolder, "file" + chunk + ".txt");
using (var writer = File.CreateText(splitFile))
{
queue.Add(splitFile);
for (int i = 0; i < splitSize; i++)
{
if (!lineIterator.MoveNext())
{
return false;
}
writer.WriteLine(lineIterator.Current);
}
}
return true;
}
catch (Exception)
{
throw;
}
}
It creates around 36 text files (around 800 MB), but then starts throwing an "Out of memory exception" while creating the 37th file, at lineIterator.MoveNext().
Meanwhile lineIterator.Current shows the value in the debugger.
Since it is a huge file, you should read it with the Seek and ReadBytes methods of BinaryReader.
You can see a simple example here. After you call ReadBytes, check for the last newline and write out the processed lines once you have read a certain amount of data. Don't write every line individually as you read it, and don't keep all the data in memory.
The rest is in your hands; a sketch of the idea follows.
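This is only an illustrative outline (names and chunk size are made up; it assumes the file can be split on '\n' bytes and keeps a single chunk in memory at a time):
static void SplitByChunks(string path, string tmpFolder, int chunkBytes = 1 << 20)
{
    Directory.CreateDirectory(tmpFolder);
    using (var reader = new BinaryReader(File.OpenRead(path)))
    {
        int fileIndex = 0;
        byte[] carry = new byte[0];                  // bytes after the last newline of the previous chunk
        while (true)
        {
            byte[] chunk = reader.ReadBytes(chunkBytes);
            bool endOfFile = chunk.Length < chunkBytes;
            // stitch the carried-over partial line onto the front of this chunk
            byte[] data = new byte[carry.Length + chunk.Length];
            Buffer.BlockCopy(carry, 0, data, 0, carry.Length);
            Buffer.BlockCopy(chunk, 0, data, carry.Length, chunk.Length);
            if (data.Length == 0)
                break;
            // cut at the last newline so only whole lines are written out
            int cut = endOfFile ? data.Length : Array.LastIndexOf(data, (byte)'\n') + 1;
            if (cut <= 0) cut = data.Length;
            string splitFile = Path.Combine(tmpFolder, "file" + fileIndex++ + ".txt");
            using (var output = File.Create(splitFile))
                output.Write(data, 0, cut);
            carry = new byte[data.Length - cut];
            Buffer.BlockCopy(data, cut, carry, 0, carry.Length);
            if (endOfFile)
                break;
        }
    }
}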
Maybe it is related to this one: When does File.ReadLines free resources
IEnumerable doesn't inherit from IDisposable because, typically, the class that implements it only gives you the promise of being enumerable; it hasn't actually done anything yet that warrants disposal.
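That said, the enumerator that File.ReadLines hands out is disposable, and disposing it is what releases the underlying file. An illustrative snippet (file name made up):
IEnumerable<string> lines = File.ReadLines("input.txt");
// foreach disposes the enumerator automatically; the explicit form makes that visible:
using (IEnumerator<string> e = lines.GetEnumerator())
{
    while (e.MoveNext())
        Console.WriteLine(e.Current);
}   // Dispose() here closes the underlying reader, just like the using block in SplitFile above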