I have a data parsing application I have been working on, and given the sheer size of the text files it reads, controlling memory usage is key to good performance. The strategy has two parts: first, measure how much RAM each file would contribute to the total; second, determine how much RAM is available to the application at that point in time. If enough RAM is available, the application does its processing in memory. Otherwise, it switches to a mode that performs all or most of the operations on disk.
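Roughly, the decision logic looks like this (a sketch only; ProcessFiles, ProcessInMemory, and ProcessOnDisk are illustrative names, and the two measuring helpers are shown below):
void ProcessFiles(IEnumerable<string> paths)
{
    Int64 available = GetMaxAllowedMemory(); // defined below
    Int64 budget = 0;
    foreach (string path in paths)
    {
        Int64 required = GetSizeInMemory(path); // defined below; -1 means "too big"
        if (required > 0 && budget + required < available)
        {
            budget += required;
            ProcessInMemory(path);  // placeholder for the real in-memory path
        }
        else
        {
            ProcessOnDisk(path);    // placeholder for the on-disk fallback
        }
    }
}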
Measuring a file's contribution to memory usage is quick and easy:
static Int64 GetSizeInMemory(string path)
{
    //THIS CODE IS SPEEDY
    Int64 r = ((Func<Int64>)(
        () =>
        {
            try
            {
                using (Stream s = new MemoryStream())
                {
                    BinaryFormatter formatter = new BinaryFormatter();
                    formatter.Serialize(s, File.ReadAllLines(path));
                    return s.Length;
                }
            }
            catch
            {
                //this file is way too big
                return -1;
            }
        }
    ))();
    GC.Collect();
    GC.WaitForPendingFinalizers();
    return r;
}
However, measuring the total amount of memory available is slow and difficult. In this case, I attempted to do so by trapping the out-of-memory error, which in my thinking should give the most reliable figure.
static Int64 GetMaxAllowedMemory()
{
    //THIS CODE IS SLOW
    Int64 r = ((Func<Int64>)(
        () =>
        {
            byte[] b = new byte[] { };
            Int64 rs = 0;
            while (true)
            {
                try
                {
                    Array.Resize<byte>(ref b, b.Length + 1);
                    b[b.Length - 1] = new byte();
                    rs = b.Length;
                }
                catch (Exception)
                {
                    break;
                }
            }
            b = null;
            return rs;
        }
    ))();
    GC.Collect();
    GC.WaitForPendingFinalizers();
    return r;
}
Is there a better approach I should be using here?
Please note I have looked at a number of questions similar to this one on Stack Overflow, but most deal only with obtaining a figure for the total amount of available RAM on the computer, which is not the same as the maximum amount of RAM a .NET process is allowed at runtime.
UPDATE
After receiving an answer, I came up with the following, which allows me to get the total amount of RAM available to the application.
static Int64 GetMemoryFailPoint()
{
    Int64 r = ((Func<Int64>)(
        () =>
        {
            int rs = 1;
            while (true)
            {
                try
                {
                    using (new System.Runtime.MemoryFailPoint(rs))
                    {
                    }
                }
                catch
                {
                    break;
                }
                rs++;
            }
            //rs is the first size that failed, so the largest size that succeeded is rs - 1
            return Convert.ToInt64(rs - 1) * 1000000;
        }
    ))();
    return r;
}
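Probing one megabyte at a time is slow when a lot of memory is available; a binary search over the MemoryFailPoint size converges in a few dozen probes instead. A minimal sketch (the 65536 MB upper bound is an arbitrary assumption, and it uses the same MB-to-bytes convention as above):
static Int64 GetMemoryFailPointBinary()
{
    int lo = 0, hi = 65536; // search range in MB; the upper bound is an assumption
    while (lo < hi)
    {
        int mid = (lo + hi + 1) / 2;
        try
        {
            // Succeeds if the runtime believes mid MB could be allocated.
            using (new System.Runtime.MemoryFailPoint(mid)) { }
            lo = mid;      // mid MB is available; search higher
        }
        catch (InsufficientMemoryException)
        {
            hi = mid - 1;  // mid MB is not available; search lower
        }
    }
    return Convert.ToInt64(lo) * 1000000;
}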
You could try to use the MemoryFailPoint class:
try
{
    using (new System.Runtime.MemoryFailPoint(1024)) // 1024 megabytes
    {
        // Do processing in memory
    }
}
catch (InsufficientMemoryException)
{
    // Do processing on disk
}
Based on this original post.
Instead of reading the entire file into memory and seeing whether that fails, you can instead use MemoryFailPoint to check whether there is enough RAM available to do the in-memory processing, using the size of the file on disk.
void ProcessFile(string path)
{
    try
    {
        var fileInfo = new FileInfo(path);
        // MemoryFailPoint requires a positive size, so round up to at least 1 MB
        var fileSizeInMb = Math.Max(1, (int)(fileInfo.Length >> 20));
        using (new System.Runtime.MemoryFailPoint(fileSizeInMb))
        {
            // Do processing in memory
        }
    }
    catch (InsufficientMemoryException)
    {
        // Do processing on disk
    }
}
Related
The user specifies a file name and a block size. The original file is split into blocks of the given size (except possibly the last block). For each block, the program calculates a SHA256 hash and writes it to the console.
The program has two threads: the first reads the original file and puts each block's byte array into a queue; the second removes the byte arrays from the queue and calculates the hashes.
After the first iteration, the memory is not released until the program completes.
On subsequent iterations, memory is allocated and released normally.
So, during the next read of a part array, I get an OutOfMemoryException.
How can I manage memory correctly to avoid the memory leak?
class Encryption
{
    static FileInfo originalFile;
    static long partSize = 0;
    static long lastPartSize = 0;
    static long numParts = 0;
    static int lastPartNumber = 0;
    static string[] hash;
    static Queue<byte[]> partQueue = new Queue<byte[]>();

    public Encryption(string _filename, long _partSize)
    {
        try
        {
            originalFile = new FileInfo(_filename);
            partSize = _partSize;
            numParts = originalFile.Length / partSize;
            lastPartSize = originalFile.Length % partSize;
            if (lastPartSize != 0)
            {
                numParts++;
            }
            else
            {
                lastPartSize = partSize;
            }
            lastPartNumber = (int)numParts - 1;
            hash = new string[numParts];
        }
        catch (FileNotFoundException fe)
        {
            Console.WriteLine("Error: {0}\nStackTrace: {1}", fe.Message, fe.StackTrace);
            return;
        }
        catch (Exception e)
        {
            Console.WriteLine("Error: {0}\nStackTrace: {1}", e.Message, e.StackTrace);
        }
    }
    private void readFromFile()
    {
        try
        {
            using (FileStream fs = new FileStream(originalFile.FullName, FileMode.Open, FileAccess.Read))
            {
                for (int i = 0; i < numParts; i++)
                {
                    long len = 0;
                    if (i == lastPartNumber)
                    {
                        len = lastPartSize;
                    }
                    else
                    {
                        len = partSize;
                    }
                    byte[] part = new byte[len];
                    fs.Read(part, 0, (int)len);
                    partQueue.Enqueue(part);
                    part = null;
                }
            }
        }
        catch (Exception e)
        {
            Console.WriteLine("Error: {0}\nStackTrace: {1}", e.Message, e.StackTrace);
        }
    }
    private static void hashToArray()
    {
        try
        {
            SHA256Managed sha256HashString = new SHA256Managed();
            int numPart = 0;
            while (numPart < numParts)
            {
                long len = 0;
                if (numPart == lastPartNumber)
                {
                    len = lastPartSize;
                }
                else
                {
                    len = partSize;
                }
                //BitConverter, not ToString(): calling ToString() on a byte[] yields "System.Byte[]"
                hash[numPart] = BitConverter.ToString(sha256HashString.ComputeHash(partQueue.Dequeue()));
                numPart++;
            }
        }
        catch (Exception e)
        {
            Console.WriteLine("Error: {0}\nStackTrace: {1}", e.Message, e.StackTrace);
        }
    }
    private void hashWrite()
    {
        try
        {
            Console.WriteLine("\nResult:\n");
            for (int i = 0; i < numParts; i++)
            {
                Console.WriteLine("{0} : {1}", i, hash[i]);
            }
        }
        catch (Exception e)
        {
            Console.WriteLine("Error: {0}\nStackTrace: {1}", e.Message, e.StackTrace);
        }
    }
    public void threadsControl()
    {
        try
        {
            Thread readingThread = new Thread(readFromFile);
            Thread calculateThread = new Thread(hashToArray);
            readingThread.Start();
            calculateThread.Start();
            readingThread.Join();
            calculateThread.Join();
            hashWrite();
        }
        catch (Exception e)
        {
            Console.WriteLine("Error: {0}\nStackTrace: {1}", e.Message, e.StackTrace);
        }
    }
}
You should read up on .NET internals before writing code like this. Your misunderstanding of the .NET memory model is why you are getting this error. An OutOfMemoryException occurs very rarely if you are careful with your resources, especially when dealing with arrays.
You should know that the .NET runtime has two heaps for reference objects: the regular heap and the Large Object Heap (LOH). The most important difference between them is that the LOH is not compacted, even after garbage collection.
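You can observe the LOH boundary directly: on the Microsoft CLR, the runtime reports large objects as generation 2 from the moment they are allocated. A small sketch:
byte[] small = new byte[1024];        // goes to the small object heap
byte[] large = new byte[100 * 1024];  // >= 85,000 bytes: goes to the LOH

Console.WriteLine(GC.GetGeneration(small)); // 0 (a freshly allocated gen-0 object)
Console.WriteLine(GC.GetGeneration(large)); // 2 (LOH objects report generation 2)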
You should know that large arrays (85,000 bytes or more) go to the LOH, so its memory is consumed very quickly and fragments easily. Also, you should know that this line:
part = null;
does not free the memory immediately. Even worse, this line does nothing at all here, because the queue still holds a reference to the part of the file you have read. This is why your memory runs out. You can try to fix this by calling the GC after each hash computation, but that is a highly unrecommended solution.
You should rewrite your algorithm (which is a very simple case of the producer/consumer pattern) so that it does not store the whole file contents in memory at once. This is quite easy: move your part variable out to a static field, and read the next file part into it. Introduce an EventWaitHandle (or one of its child classes) in your code instead of the queue, and simply compute the next hash right after you have read the next part of the file, as in the sketch below.
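A minimal sketch of that approach, reusing the fields from your question (AutoResetEvent is one of EventWaitHandle's child classes; the method names are illustrative). At most one part is alive at a time:
static byte[] currentPart;
static readonly AutoResetEvent partReady = new AutoResetEvent(false);
static readonly AutoResetEvent partConsumed = new AutoResetEvent(true);

static void readerThread()
{
    using (FileStream fs = new FileStream(originalFile.FullName, FileMode.Open, FileAccess.Read))
    {
        for (int i = 0; i < numParts; i++)
        {
            partConsumed.WaitOne(); // wait until the hasher is done with the previous part
            long len = (i == lastPartNumber) ? lastPartSize : partSize;
            currentPart = new byte[len];
            fs.Read(currentPart, 0, (int)len);
            partReady.Set();        // hand the part over to the hasher
        }
    }
}

static void hasherThread()
{
    SHA256Managed sha = new SHA256Managed();
    for (int i = 0; i < numParts; i++)
    {
        partReady.WaitOne();        // block until the reader produces the next part
        hash[i] = BitConverter.ToString(sha.ComputeHash(currentPart));
        partConsumed.Set();         // allow the reader to overwrite currentPart
    }
}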
I recommend that you start with the basics of threading in C# by reading Joe Albahari's great series, and only after that try to implement solutions like this one. Good luck with your projects.
I'm using a function to add some values to a dynamic array (I know I could use a List, but it's a requirement that I use an array).
Right now everything is working, but I need to know when a thread fails to add a value (because the array is locked, so I can record that time) and when it succeeds in adding it (I think I already have the success case covered, as you can see in the addSecure function).
Insert Data:
private void button6_Click(object sender, EventArgs e)
{
    showMessage(numericUpDown5.Value.ToString());
    showMessage(numericUpDown6.Value.ToString());
    for (int i = 0; i < int.Parse(numericUpDown6.Value.ToString()); i++)
    {
        ThreadStart start = new ThreadStart(insertDataSecure);
        new Thread(start).Start();
    }
}

private void insertDataSecure()
{
    for (int i = 0; i < int.Parse(numericUpDown5.Value.ToString()); i++)
        sArray.addSecure(i);
    MessageBox.Show(String.Format("Finished data inserted, you can check the result in: {0}", Path.Combine(
        Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location),
        "times.txt")), "Result", MessageBoxButtons.OK, MessageBoxIcon.Information);
}
Function to Add:
private object padLock = new object();

public void addSecure(int value)
{
    Stopwatch sw = Stopwatch.StartNew();
    string values = "";
    lock (padLock)
    {
        try
        {
            if (array == null)
            {
                this.size = 1;
                Resize(this.size);
                array[0] = value;
                count++;
            }
            else
            {
                count++;
                if (size == count)
                {
                    size *= 2;
                    Resize(size);
                }
                array[count - 1] = value;
            }
        }
        catch
        {
            throw new System.ArgumentException("It was impossible to insert, try again later.", "insert");
        }
        values = String.Format("Element {0}, Time taken: {1}ms", value.ToString(), sw.Elapsed.TotalMilliseconds);
        sw.Stop();
        saveFile(values);
    }
}
Sorry for asking this question, but I have read several articles, and this is the last one I tried to use: http://msdn.microsoft.com/en-us/library/4tssbxcw.aspx, but when I tried to implement it in my code, it crashed with a strange error.
I'm afraid I might not completely understand the question. It sounds like you want to know how long it takes between the time the thread starts and when it actually acquires the lock. But in that case, the thread does not actually fail to add a value; it is simply delayed some period of time.
On the other hand, you do have an exception handler, so presumably there's some scenario you expect where the Resize() method can throw an exception (but you should catch only those exceptions you expect and know you can handle; a bare catch clause is not a good idea, though the harm is mitigated somewhat by the fact that you do throw an exception from the exception handler). So I can't help but wonder if that is the failure you're talking about.
That said, assuming the former interpretation is correct – that you want to time how long it takes to acquire the lock – then the following change to your code should do that:
public void addSecure(int value)
{
    Stopwatch sw = Stopwatch.StartNew();
    string values = "";
    lock (padLock)
    {
        // Save the current timer value here
        TimeSpan elapsedToAcquireLock = sw.Elapsed;
        try
        {
            if (array == null)
            {
                this.size = 1;
                Resize(this.size);
                array[0] = value;
                count++;
            }
            else
            {
                count++;
                if (size == count)
                {
                    size *= 2;
                    Resize(size);
                }
                array[count - 1] = value;
            }
        }
        catch
        {
            throw new System.ArgumentException("It was impossible to insert, try again later.", "insert");
        }
        sw.Stop();
        values = string.Format(
            "Element {0}, Time taken: for lock acquire: {1}ms, for append operation: {2}ms",
            value.ToString(),
            elapsedToAcquireLock.TotalMilliseconds,
            sw.Elapsed.TotalMilliseconds - elapsedToAcquireLock.TotalMilliseconds);
        saveFile(values);
    }
}
That will display the individual times for the sections of code: acquiring the lock, and then actually adding the value to the array (i.e. the latter not including the time taken to acquire the lock).
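If instead you really do want to treat "the lock is currently held" as a failure rather than a delay, a non-blocking attempt is possible with Monitor.TryEnter. A sketch (tryAddSecure is a hypothetical variant of your method):
public bool tryAddSecure(int value)
{
    // Attempt to take the lock without blocking; returns false immediately
    // if another thread currently holds it.
    if (!Monitor.TryEnter(padLock))
    {
        saveFile(String.Format("Element {0}: lock was busy, add skipped", value));
        return false;
    }
    try
    {
        // ... same add logic as in addSecure ...
        return true;
    }
    finally
    {
        Monitor.Exit(padLock);
    }
}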
If that's not actually what you are trying to do, please edit your question so that it is more clear.
We are trying the code below.
public static int SplitFile(string fileName, string tmpFolder, List<string> queue, int splitSize = 100000)
{
    int chunk = 0;
    if (!Directory.Exists(tmpFolder))
        Directory.CreateDirectory(tmpFolder);
    using (var lineIterator = File.ReadLines(fileName).GetEnumerator())
    {
        bool stillGoing = true;
        for (chunk = 0; stillGoing; chunk++)
        {
            stillGoing = WriteChunk(lineIterator, splitSize, chunk, tmpFolder, queue);
        }
    }
    return chunk;
}

private static bool WriteChunk(IEnumerator<string> lineIterator,
    int splitSize, int chunk, string tmpFolder, List<string> queue)
{
    try
    {
        //int tmpChunkSize = 1000;
        //int tmpChunkInc = 0;
        string splitFile = Path.Combine(tmpFolder, "file" + chunk + ".txt");
        using (var writer = File.CreateText(splitFile))
        {
            queue.Add(splitFile);
            for (int i = 0; i < splitSize; i++)
            {
                if (!lineIterator.MoveNext())
                {
                    return false;
                }
                writer.WriteLine(lineIterator.Current);
            }
        }
        return true;
    }
    catch (Exception)
    {
        throw;
    }
}
It creates around 36 text files (around 800 MB), but then starts throwing an "Out of memory exception" during the creation of the 37th file, at lineIterator.MoveNext().
Meanwhile, lineIterator.Current shows the value in the debugger.
As it is a huge file, you should read it with the Seek and ReadBytes methods of BinaryReader.
You can see a simple example here. After you call ReadBytes, check for the last newline and process the file a certain number of lines at a time. Don't write every line right as you read it, and don't keep all the data in memory.
The rest is in your hands.
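A minimal sketch of that idea (it assumes no multi-byte characters are split across chunk boundaries, which holds for ASCII; ProcessLine is a hypothetical handler for whatever you do with each line):
using (var reader = new BinaryReader(File.OpenRead(fileName)))
{
    string carry = "";
    byte[] chunk;
    while ((chunk = reader.ReadBytes(1 << 20)).Length > 0) // read 1 MB at a time
    {
        string text = carry + System.Text.Encoding.UTF8.GetString(chunk);
        int lastNewline = text.LastIndexOf('\n');
        if (lastNewline < 0) { carry = text; continue; }   // no full line yet

        carry = text.Substring(lastNewline + 1);           // incomplete last line
        foreach (var line in text.Substring(0, lastNewline).Split('\n'))
            ProcessLine(line.TrimEnd('\r'));               // hypothetical handler
    }
    if (carry.Length > 0) ProcessLine(carry);              // final partial line
}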
Maybe it is related to this one: When does File.ReadLines free resources?
IEnumerable doesn't inherit from IDisposable because, typically, the class that implements it only gives you the promise of being enumerable; it hasn't actually done anything yet that warrants disposal.
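The enumerator it hands out, however, is disposable, and that is what releases the underlying file handle. This is roughly what a foreach over File.ReadLines expands to:
IEnumerable<string> lines = File.ReadLines(fileName);
using (IEnumerator<string> e = lines.GetEnumerator())
{
    while (e.MoveNext())
        Console.WriteLine(e.Current);
} // Dispose() runs here, closing the underlying file handle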
I know that exceptions have a performance penalty, and that it's generally more efficient to try and avoid exceptions than to drop a big try/catch around everything -- but what about the try block itself? What's the cost of merely declaring a try/catch, even if it never throws an exception?
The performance cost of try is very small. The major cost of exception handling is getting the stack trace and other metadata, and that's a cost that's not paid until you actually have to throw an exception.
But this will vary by language and implementation. Why not write a simple loop in C# and time it yourself?
Actually, a couple of months ago I was creating an ASP.NET web app, and I accidentally wrapped a try/catch block in a very long loop. Even though the loop wasn't generating any exceptions, it was taking too much time to finish. When I went back and saw the try/catch wrapped by the loop, I did it the other way around: I wrapped the loop IN the try/catch block. Performance improved a LOT. You can try this on your own: do something like
int total = 0;
DateTime startTime = DateTime.Now;
for (int i = 0; i < 20000; i++)
{
    try
    {
        total += i;
    }
    catch
    {
        // nothing to catch
    }
}
Console.Write((DateTime.Now - startTime).ToString());
And then take out the try / catch block. You'll see a big difference!
A common saying is that exceptions are expensive when they are caught - not thrown. This is because most of the exception metadata gathering (such as getting a stack trace etc.) only really happens on the try-catch side (not on the throw side).
Unwinding the stack is actually pretty quick - the CLR walks up the call stack and only pays heed to the finally blocks it finds; at no point in a pure try-finally block does the runtime attempt to 'complete' an exception (it's metadata etc.).
From what I remember, any try-catches with filters (such as "catch (FooException) {}") are just as expensive - even if they do not do anything with the exception.
I would venture to say that a method (call it CatchesAndRethrows) with the following block:
try
{
    ThrowsAnException();
}
catch
{
    throw;
}
Might result in a faster stack walk in a method - such as:
try
{
    CatchesAndRethrows();
}
catch (Exception ex) // The runtime has already done most of the work.
{
    // Some fancy logic
}
Some numbers:
With: 0.13905ms
Without: 0.096ms
Percent difference: 144%
Here is the benchmark I ran (remember, release mode - run without debug):
static void Main(string[] args)
{
    Stopwatch withCatch = new Stopwatch();
    Stopwatch withoutCatch = new Stopwatch();
    int iterations = 20000;
    for (int i = 0; i < iterations; i++)
    {
        if (i % 100 == 0)
        {
            Console.Write("{0}%", 100 * i / iterations);
            Console.CursorLeft = 0;
            Console.CursorTop = 0;
        }
        CatchIt(withCatch, withoutCatch);
    }
    Console.WriteLine("With: {0}ms", ((float)(withCatch.ElapsedMilliseconds)) / iterations);
    Console.WriteLine("Without: {0}ms", ((float)(withoutCatch.ElapsedMilliseconds)) / iterations);
    Console.WriteLine("Percent difference: {0}%", 100 * withCatch.ElapsedMilliseconds / withoutCatch.ElapsedMilliseconds);
    Console.ReadKey(true);
}

static void CatchIt(Stopwatch withCatch, Stopwatch withoutCatch)
{
    withCatch.Start();
    try
    {
        FinallyIt(withoutCatch);
    }
    catch
    {
    }
    withCatch.Stop();
}

static void FinallyIt(Stopwatch withoutCatch)
{
    try
    {
        withoutCatch.Start();
        ThrowIt(withoutCatch);
    }
    finally
    {
        withoutCatch.Stop();
    }
}

private static void ThrowIt(Stopwatch withoutCatch)
{
    throw new NotImplementedException();
}
To see what it really costs, you can run the code below. It takes a simple two-dimensional array and generates random coordinates, some of which are out of range. If your exception occurs only once, of course you will not notice it. My example is meant to emphasize what it means to do this several thousand times, and what catching an exception versus implementing a simple test will save you.
const int size = 1000;
const int maxSteps = 100000;
var randomSeed = (int)(DateTime.UtcNow - new DateTime(1970, 1, 1, 0, 0, 0).ToLocalTime()).TotalMilliseconds;
var random = new Random(randomSeed);
var numOutOfRange = 0;
var grid = new int[size, size];
var stopwatch = new Stopwatch();

Console.WriteLine("---Start test with exception---");
stopwatch.Reset();
stopwatch.Start();
for (int i = 0; i < maxSteps; i++)
{
    int coord = random.Next(0, size * 2);
    try
    {
        grid[coord, coord] = 1;
    }
    catch (IndexOutOfRangeException)
    {
        numOutOfRange++;
    }
}
stopwatch.Stop();
Console.WriteLine("Time used: " + stopwatch.ElapsedMilliseconds + "ms, Number out of range: " + numOutOfRange);
Console.WriteLine("---End test with exception---");

random = new Random(randomSeed);
stopwatch.Reset();
Console.WriteLine("---Start test without exception---");
numOutOfRange = 0;
stopwatch.Start();
for (int i = 0; i < maxSteps; i++)
{
    int coord = random.Next(0, size * 2);
    if (coord >= grid.GetLength(0) || coord >= grid.GetLength(1))
    {
        numOutOfRange++;
        continue;
    }
    grid[coord, coord] = 1;
}
stopwatch.Stop();
Console.WriteLine("Time used: " + stopwatch.ElapsedMilliseconds + "ms, Number out of range: " + numOutOfRange);
Console.WriteLine("---End test without exception---");
Console.ReadLine();
Example output of this code:
---Start test with exception---
Time used: 3228ms, Number out of range: 49795
---End test with exception---
---Start test without exception---
Time used: 3ms, Number out of range: 49795
---End test without exception---
You might want to read up on Structured Exception Handling. It's Windows' implementation of exceptions, and it's what .NET uses under the hood.
http://www.microsoft.com/msj/0197/Exception/Exception.aspx
I have a problem with multithreading in .NET with the following code:
class Program
{
    private static ManualResetEvent[] resetEvents;

    private void cs(object o)
    {
        int xx = 0;
        while (true)
        {
            xx++;
            System.Xml.XmlDocument document = new System.Xml.XmlDocument();
            document.Load("ConsoleApplication6.exe.config");
            MSScriptControl.ScriptControlClass s =
                new MSScriptControl.ScriptControlClass();
            s.Language = "JScript";
            object res = s.Eval("1+2");
            Console.WriteLine("thread {0} execution {1}", o, xx);
            System.Threading.Thread.Sleep(5000);
        }
    }

    static void Main(string[] args)
    {
        Program c = new Program();
        for (int i = 0; i < 1000; i++)
        {
            System.Threading.Thread t = new System.Threading.Thread(
                new System.Threading.ParameterizedThreadStart(c.cs));
            t.Start((object)i);
        }
    }
}
When this code is executed, it crashes after a few minutes. Why is it crashing? What can I do to prevent the crashes?
You're starting 1000 threads. That is roughly 1 GB in reserved stack space alone (1 MB per thread by default), plus all the objects the threads create. My guess is that you're running out of memory. What are you trying to accomplish?
It rarely makes sense to create more threads than processor cores, because the app will spend a lot of time context switching and you will not see much, if any, performance gain. The number of threads that can actually run at the same time equals the number of cores your machine has. Why does this have to be done with 1000 threads? (Obviously I am assuming you don't have a 1000-core machine.) A thread-pool-based sketch follows below.
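For comparison, a sketch of a thread-pool version of the Main above. This assumes each work item is finite; the infinite while (true) loop in cs would have to become a bounded unit of work for pooling to make sense:
static void Main(string[] args)
{
    Program c = new Program();
    for (int i = 0; i < 1000; i++)
    {
        // The pool sizes itself to the machine instead of creating
        // 1000 dedicated threads, each reserving ~1 MB of stack.
        System.Threading.ThreadPool.QueueUserWorkItem(c.cs, i);
    }
    Console.ReadKey(true); // keep the process alive; pool threads are background threads
}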
There are probably some aspects of this app that aren't apparent from the provided code. That being said, I haven't been able to reproduce the crash, but I am able to run up an obscene amount of memory usage (160 MB after 29 iterations of xx++).
I suspect that the ScriptControlClass is staying alive in memory. While there may be better ways of solving this problem, I was able to keep the memory at a consistent 60-70MB with this addition:
Console.WriteLine("thread {0} execution {1}" , o , xx);
s = null;
UPDATE
I had similar results with this code memory-wise, but the code below was slightly faster. After several minutes the memory usage was a consistent 51 MB when xx < 30 was changed back to true.
class Program
{
    private static ManualResetEvent[] resetEvents;

    [ThreadStatic]
    static ScriptControlClass s;

    private void cs(object o)
    {
        int xx = 0;
        if (s == null)
        {
            s = new ScriptControlClass();
        }
        /* you should be able to see consistent memory usage after 30 iterations */
        while (xx < 30)
        {
            xx++;
            //Unsure why this is here but doesn't seem to affect the memory usage
            // like the ScriptControlClass object.
            System.Xml.XmlDocument document = new System.Xml.XmlDocument();
            document.Load("ConsoleApplication6.exe.config");
            s.Language = "JScript";
            object res = s.Eval("1+2");
            Console.WriteLine("thread {0} execution {1}", o, xx);
            System.Threading.Thread.Sleep(5000);
        }
        s = null;
    }

    static void Main(string[] args)
    {
        Program c = new Program();
        for (int i = 0; i < 1000; i++)
        {
            System.Threading.Thread t = new System.Threading.Thread(
                new System.Threading.ParameterizedThreadStart(c.cs));
            t.Start((object)i);
        }
    }
}
UPDATE 2
My Googling can't find anything about using the ScriptControlClass in a multithreaded environment.