So the error I am getting is this.
An unhandled exception of type 'System.OutOfMemoryException' occurred
in mscorlib.dll
I have never encountered this error before and I've looked it up on Google.
I do have a 64-Bit system.
I do have 16GB of RAM.
Some people said that I need to set the platform target to x64 in my project properties, but won't that make it so only 64-bit systems will be able to run this application?
public static string RC4(string input, string key)
{
    StringBuilder result = new StringBuilder();
    int x, y, j = 0;
    int[] box = new int[256];
    for (int i = 0; i < 256; i++)
    {
        box[i] = i;
    }
    for (int i = 0; i < 256; i++)
    {
        j = (key[i % key.Length] + box[i] + j) % 256;
        x = box[i];
        box[i] = box[j];
        box[j] = x;
    }
    for (int i = 0; i < input.Length; i++)
    {
        y = i % 256;
        j = (box[y] + j) % 256;
        x = box[y];
        box[y] = box[j];
        box[j] = x;
        result.Append((char)(input[i] ^ box[(box[y] + box[j]) % 256]));
    }
    return result.ToString(); // This is the line throwing the error.
}
Because every second I'm appending new text from a keyboard hook to a text file. So let's say I type abc in the first second: it appends that; if I type def the next second, it appends that. This is all happening inside a timer tick, so it's really straightforward.
Whenever the text file reaches around 350,000 KB it throws that error.
There is a limit to how big single objects are allowed to be, even on x64. The actual limit doesn't matter and depends on configuration, but the upshot is that when processing large volumes of data you should read through the data in pieces (usually via Stream), processing chunks at a time. Never try to hold the entire thing in memory at once. This applies equally to input and output.
Now, if you managed to load the entire thing into input, then you got lucky; but StringBuilder is intentionally oversized so that it doesn't have to keep reallocating all the time. You might be able to "fix" your code by telling StringBuilder the number of characters you need in the constructor, but that is only a temporary hack that will let you process slightly larger data. The real fix is to not attempt to process huge data in a single chunk.
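To make that concrete, here is a minimal sketch of the chunked pattern (file names and block size are placeholders, and the cipher state would have to be kept in fields so it carries over between blocks; this is not a drop-in replacement for the RC4 method above):

using (var input = File.OpenRead("keylog.txt"))       // placeholder paths
using (var output = File.Create("keylog.enc"))
{
    byte[] buffer = new byte[64 * 1024];              // process 64 KB at a time
    int read;
    while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        // transform buffer[0..read) here, carrying any cipher state
        // (box, j, counter) over between iterations, then write it out
        output.Write(buffer, 0, read);
    }
}

With the data written out incrementally, nothing held by the process grows with the size of the log file.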
Related
When I open large audio files, I get an out of memory error.
I know this method isn't common, so I thought I'd have to ask a real boffin.
I have the following happening:
public static List<List<float>> int_filenamewavedataRight = new List<List<float>>();
From there I open up an audio file and load the audio level values into this list so I can view them correctly through the NAudio library.
I clear the arrays like so for a new file:
int_filenamewavedataRight.Clear();
int_filenamewavedataRight.Add(new List<float>());
Then, I load all of the values into memory for speedy display of the waveforms:
waveStream.Position = 0; // startPosition + (e.ClipRectangle.Left * bytesPerSample * samplesPerPixel);
int bytesRead;
byte[] waveData = new byte[bytesPerSample];
int samples = (int)(waveStream.Length / bytesPerSample);
wavepeakloaded = 0;
int OldPrecentVal = 0;
for (int x = 0; x < samples; x++)
{
    short high = 0;
    bytesRead = waveStream.Read(waveData, 0, bytesPerSample);
    if (bytesRead == 0)
        break;
    for (int n = 0; n < bytesRead; n += 2)
    {
        short sample = BitConverter.ToInt16(waveData, n);
        if (sample > high) high = sample;
    }
    float highPercent2 = (float)Math.Round(((((float)high) - short.MinValue) / ushort.MaxValue), 2);
    // ERRORING HERE:
    int_filenamewavedataRight[filename_value].Add((float)Math.Round(highPercent2, 2));
}
Small audio files, such as a typical 5-minute song, are fine, but longer files of 25 minutes or more throw an exception once the list count reaches 67108864: "Exception of type 'System.OutOfMemoryException' was thrown."
x = {"Exception of type 'System.OutOfMemoryException' was thrown."}
ex.StackTrace " at System.Collections.Generic.List`1.set_Capacity(Int32 value)\r\n at System.Collections.Generic.List`1.EnsureCapacity(Int32 min)\r\n at System.Collections.Generic.List`1.Add(T item)\r\n at APP.WaveViewer.LoadWaveToMemory(Int32 filename_value) in WaveViewer.cs:line 1391"
I'm using a list of lists so that I can address the audio files like a simple array, but as I don't know the initial size of the array, I can't specify a size up front.
I'm also pre-loading the waveform data like this so I can have playback, zoom and pan the audio file at the same time.
Is this easily fixable, or should I find a different way of doing this, such as writing these peak volume values to a temporary file, rather than keeping them in memory, or is there a better way?
I've looked this up in various places on the net, such as here; however, it seems like a rare thing to be doing.
Thanks.
The simple answer is that it was running as 32-bit (x86), which doesn't have enough addressable memory for 50,000,000+ samples.
Changing the program to x64 solved that specific problem.
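Worth adding from the stack trace: the exception is thrown while List<float> grows its backing array. At 67,108,864 floats that array is already 256 MB, and the next Add forces an attempt to allocate a 512 MB replacement in one contiguous block, which a 32-bit process can rarely satisfy. Independently of the x64 fix, the inner list could be pre-sized from the known sample count so it never has to reallocate; a minimal sketch using the variables from the question:

// Sketch only: size the per-file list up front from the stream length
int samples = (int)(waveStream.Length / bytesPerSample);
int_filenamewavedataRight.Add(new List<float>(samples));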
That's what I like about SO: you can pool resources from so many people and learn as you ask questions.
I have a txt file. Right now I need to load it line by line and check how many times '#' appears in the entire file.
So, basically, I have a single-line string; how do I count the occurrences of '#' quickly?
I need to count this fast since we have lots of files like this, and each of them is about 300-400 MB.
I searched, and it seems the straightforward way is the fastest way to do this:
int num = 0;
foreach (char c in line)
{
    if (c == '#') num++;
}
Is there a different method that could be faster than this? Any other suggestions?
If needed, we do not have to load the txt file line by line, but we do need to know the number of lines in each file.
Thanks
The fastest approach is really bound by I/O capability and computational speed. Usually the best way to find the fastest technique is to benchmark the candidates.
Disclaimer: results are (of course) bound to my machine and may vary significantly on different hardware. For testing I used a single text file of about 400 MB. If interested, the file may be downloaded here (zipped). The executable was compiled as x86.
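For reference, the timings below can be reproduced with a harness along these lines (CountHashes is a placeholder for whichever option is under test; the exact methodology is an assumption):

var sw = System.Diagnostics.Stopwatch.StartNew();
long count = CountHashes("C:\\tmp\\test.txt");   // placeholder for the option being measured
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds + " ms, count = " + count);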
Option 1: Read entire file, no parallelization
long count = 0;
var text = File.ReadAllText("C:\\tmp\\test.txt");
for (var i = 0; i < text.Length; i++)
    if (text[i] == '#')
        count++;
Results:
Average execution time: 5828 ms
Average process memory: 1674 MB
This is the "naive" approach, which reads the entire file in memory and then uses a for loop (which is significantly faster than foreach or LINQ).
As expected, the memory occupied by the process is very high (about 4 times the file size); this is likely caused by a combination of the string's size in memory (more info here) and string-processing overhead.
Option 2: Read file in chunks, no parallelization
long count = 0;
using (var file = File.OpenRead("C:\\tmp\\test.txt"))
using (var reader = new StreamReader(file))
{
    const int size = 500000; // chunk size: 500k chars
    char[] buffer = new char[size];
    while (!reader.EndOfStream)
    {
        var read = await reader.ReadBlockAsync(buffer, 0, size); // read a chunk
        for (var i = 0; i < read; i++)
            if (buffer[i] == '#')
                count++;
    }
}
Results:
Average execution time: 4819 ms
Average process memory: 7.48 MB
This was unexpected. In this version we read the file in chunks of 500k characters instead of loading it entirely into memory, and the execution time is even lower than in the previous approach. Please note that reducing the chunk size will increase execution time (because of the overhead). Memory consumption is extremely low (as expected: we only load roughly 500k characters, about 1 MB, directly into a char array).
Better (or worse) performance may be obtained by changing the chunk size.
Option 3: Read file in chunks, with parallelization
long count = 0;
using (var file = File.OpenRead("C:\\tmp\\test.txt"))
using (var reader = new StreamReader(file))
{
    const int size = 2000000; // roughly 4 times the single-threaded value
    const int parallelization = 4; // chunks are split into sub-chunks processed in parallel
    char[] buffer = new char[size];
    while (!reader.EndOfStream)
    {
        var read = await reader.ReadBlockAsync(buffer, 0, size);
        var sliceSize = read / parallelization;
        var counts = new long[parallelization];
        Parallel.For(0, parallelization, i => {
            var start = i * sliceSize;
            var end = start + sliceSize;
            if (i == parallelization - 1) // the last slice also takes the remainder
                end += read % parallelization;
            long localCount = 0;
            for (var j = start; j < end; j++)
            {
                if (buffer[j] == '#')
                    localCount++;
            }
            counts[i] = localCount;
        });
        count += counts.Sum();
    }
}
Results:
Average execution time: 3363 ms
Average process memory: 10.37 MB
As expected, this version performs better than the single-threaded one, but not 4 times better as we might have hoped. Memory consumption is again very low compared to the first version (same considerations as before), and we are taking advantage of multi-core environments.
Parameters like chunk size and number of parallel tasks can significantly change the results; you should go by trial and error to find the best combination for you.
Conclusions
I was inclined to think that the "load everything into memory" version would be the fastest, but this really depends on the overhead of string processing and on I/O speed. The parallel chunked approach seems the fastest on my machine, which should lead you to an idea: when in doubt, just benchmark it.
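The question also mentions needing the number of lines per file; the same chunked loop can count newline characters in the same pass at essentially no extra cost. A sketch (assuming '\n'-terminated lines; add one if the file does not end with a newline):

long hashCount = 0, lineCount = 0;
using (var reader = new StreamReader(File.OpenRead("C:\\tmp\\test.txt")))
{
    char[] buffer = new char[500000];
    int read;
    while ((read = reader.ReadBlock(buffer, 0, buffer.Length)) > 0)
    {
        for (var i = 0; i < read; i++)
        {
            if (buffer[i] == '#') hashCount++;
            else if (buffer[i] == '\n') lineCount++;
        }
    }
}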
You can test if it's faster, but a shorter way to write it would be:
int num = File.ReadAllText(filePath).Count(i => i == '#');
Hmm, but I just saw that you need the line count as well, so here is something similar. Again, it would need to be compared against what you have:
var fileLines = File.ReadAllLines(filePath);
var count = fileLines.Length;
var num = fileLines.Sum(line => line.Count(i => i == '#'));
You could use pointers. I don't know if this would be any faster though. You would have to do some testing:
static void Main(string[] args)
{
    string str = "This is # my st#ing";
    int numberOfCharacters = 0;
    unsafe
    {
        fixed (char* p = str)
        {
            char* ptr = p;
            while (*ptr != '\0')
            {
                if (*ptr == '#')
                    numberOfCharacters++;
                ptr++;
            }
        }
    }
    Console.WriteLine(numberOfCharacters);
}
Note that you must go into your project properties and allow unsafe code in order for this code to work.
This is for my internship so I can't give much more context, but this method isn't returning the desired int and is throwing an index-out-of-range exception instead.
The String[] passed into the method is composed of information from a handheld scanner used by my company's shipping department. The resulting dataAsByteArray is really a Byte[][], so the .Length in the nested if statement gets the number of bytes of a bundle entry and then adds it to fullBundlePacketSize as long as the resulting sum is less than 1000.
Why less than 1000? The bug I've been tasked with fixing is that some scanners (with older versions of Bluetooth) will only transmit about 1000 bytes of data to the receiver at a time. This method is meant to find how many bytes can be transmitted without cutting into a bundle entry (originally I had it hard-coded to just transmit 1000 bytes at a time, and that caused the receiver to get invalid bundle data).
The scanners are running a really old version of Windows CE, and trying to debug in VS 2008 just opens an emulator for the device, which doesn't help.
I'm sure it's something really simple, but I feel like a new set of eyes looking at it would help, so any help or solutions are greatly appreciated!
private int MaxPacketSizeForFullBundle(string[] data)
{
    int fullBundlePacketSize = 0;
    var dataAsByteArray = data.Select(s => Encoding.ASCII.GetBytes(s)).ToArray();
    for (int i = 0; i < dataAsByteArray.Length; i++)
    {
        if ((fullBundlePacketSize + dataAsByteArray[i + 1].Length < 1000))
        {
            fullBundlePacketSize += dataAsByteArray[i].Length;
        }
    }
    return fullBundlePacketSize;
}
Take a look at your loop:
for (int i = 0; i < dataAsByteArray.Length; i++)
{
    if ((fullBundlePacketSize + dataAsByteArray[i + 1].Length < 1000))
                                                ^^^^^
I suspect you are throwing an exception because you are indexing an array beyond its length.
Do you mean this?
for (int i = 0; i < (dataAsByteArray.Length - 1); i++)
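For completeness, here is one possible reading of the intent, sketched out: check each entry against the limit before including it, rather than looking ahead to the next entry. This is an assumption about what the method should do, not a confirmed fix:

private int MaxPacketSizeForFullBundle(string[] data)
{
    int fullBundlePacketSize = 0;
    foreach (string entry in data)
    {
        int entryLength = Encoding.ASCII.GetByteCount(entry);
        if (fullBundlePacketSize + entryLength >= 1000)
            break;                     // this entry would push the packet past the limit
        fullBundlePacketSize += entryLength;
    }
    return fullBundlePacketSize;
}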
This works in conjunction with n8wrl's answer. Your problem was an index-out-of-range exception. I believe you were attempting to access the actual byte values inside the entries of dataAsByteArray.
string[] data = {"444", "abc445x"};
int x = MaxPacketSizeForFullBundle(data);
dataAsByteArray = {byte[2][]}
dataAsByteArray[0] contains {byte[3]}, which holds the actual value of each character in the string; for "444" it would contain 52, 52, 52. You are probably attempting to access these individual values, which means you need to access the deeper-nested bytes with dataAsByteArray[i][j] where 0 <= j < dataAsByteArray[i].Length.
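To make the nesting concrete (same hypothetical input as above):

string[] data = { "444", "abc445x" };
byte[][] dataAsByteArray = data.Select(s => Encoding.ASCII.GetBytes(s)).ToArray();
// dataAsByteArray[0]        -> { 52, 52, 52 }  ("444" as ASCII bytes)
// dataAsByteArray[0].Length -> 3               (byte count of the first entry)
// dataAsByteArray[0][1]     -> 52              (a single character's byte value)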
First things first:
I have a git repo over here that holds the code of my current efforts and an example data set
Background
The example data set holds a bunch of records in Int32 format. Each record is composed of several bit fields that basically hold info on events where an event is either:
The detection of a photon
The arrival of a synchronizing signal
Each Int32 record can be treated like the following C-style struct:
struct {
    unsigned TimeTag  :16;
    unsigned Channel  :12;
    unsigned Route    :2;
    unsigned Valid    :1;
    unsigned Reserved :1;
} TTTRrecord;
Whether we are dealing with a photon record or a sync event, time tag will always hold the time of the event relative to the start of the experiment (macro-time).
If a record is a photon, valid == 1.
If a record is a sync signal or something else, valid == 0.
If a record is a sync signal, sync type = channel & 7 will give either a value indicating start of frame or end of scan line in a frame.
The last relevant bit of info is that Timetag is 16 bit and thus obviously limited. If the Timetag counter rolls over, the rollover counter is incremented. This rollover (overflow) flag can easily be obtained from overflow = Channel & 2048.
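In C#, those fields unpack from a single Int32 record with shifts and masks; a sketch matching the layout above, where record stands for one raw Int32 value:

int timeTag  = record & 0xFFFF;          // bits 0-15:  TimeTag
int channel  = (record >> 16) & 0xFFF;   // bits 16-27: Channel
int route    = (record >> 28) & 0x3;     // bits 28-29: Route
int valid    = (record >> 30) & 0x1;     // bit 30:     Valid
int overflow = (channel & 2048) >> 11;   // rollover flag, as described
int syncType = channel & 7;              // start-of-frame / end-of-line marker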
My Goal
These records come in from a high speed scanning microscope and I would like to use these records to reconstruct images from the recorded photon data, preferably at 60 FPS.
To do so, I obviously have all the info:
I can look over all available data, find all overflows, which allows me to reconstruct the sequential macro time for each record (photon or sync).
I also know when the frame started and when each line composing the frame ended (and thus also how many lines there are).
Therefore, to reconstruct a bitmap of size noOfLines * noOfLines I can process the bulk array of records line by line where each time I basically make a "histogram" of the photon events with edges at the time boundary of each pixel in the line.
Put another way, if I know Tstart and Tend of a line, and I know the number of pixels I want to spread my photons over, I can walk through all records of the line and check if the macro time of my photons falls within the time boundary of the current pixel. If so, I add one to the value of that pixel.
This approach works; the current code in the repo gives me the image I expect, but it is too slow (several tens of milliseconds to calculate a frame).
What I tried already:
The magic happens in the function int[] Renderline (see repo).
public static int[] RenderlineV(int[] someRecords, int pixelduration, int pixelCount)
{
    // Will hold the pixels obviously
    int[] linePixels = new int[pixelCount];

    // Calculate everything (sync, overflow, ...) from the raw records
    int[] timeTag = someRecords.Select(x => Convert.ToInt32(x & 65535)).ToArray();
    int[] channel = someRecords.Select(x => Convert.ToInt32((x >> 16) & 4095)).ToArray();
    int[] valid = someRecords.Select(x => Convert.ToInt32((x >> 30) & 1)).ToArray();
    int[] overflow = channel.Select(x => (x & 2048) >> 11).ToArray();

    int[] absTime = new int[overflow.Length];
    absTime[0] = 0;
    Buffer.BlockCopy(overflow, 0, absTime, 4, (overflow.Length - 1) * 4);
    absTime = absTime.Cumsum(0, (prev, next) => prev * 65536 + next).Zip(timeTag, (o, tt) => o + tt).ToArray();

    long lineStartTime = absTime[0];
    int tempIdx = 0;

    for (int j = 0; j < linePixels.Length; j++)
    {
        int count = 0;
        for (int i = tempIdx; i < someRecords.Length; i++)
        {
            if (valid[i] == 1 && lineStartTime + (j + 1) * pixelduration >= absTime[i])
            {
                count++;
            }
        }
        // Avoid checking records in the raw data that were already binned to a pixel.
        linePixels[j] = count;
        tempIdx += count;
    }

    return linePixels;
}
Treating the photon records in my data set as an array of structs and addressing struct members in a loop was a bad idea. I could increase speed significantly (2X) by dumping all bit fields into separate arrays and addressing those. This version of the render function is already in the repo.
I also realised I could improve loop speed by referring to the .Length property of the array I am iterating over, as this supposedly eliminates bounds checking.
The major speed loss is in the inner loop of this nested set of loops:
for (int j = 0; j < linePixels.Length; j++)
{
    int count = 0;
    lineStartTime += pixelduration;
    for (int i = tempIdx; i < absTime.Length; i++)
    {
        //if (lineStartTime + (j + 1) * pixelduration >= absTime[i] && valid[i] == 1)
        // Seems quicker to calculate the boundary before...
        //if (valid[i] == 1 && lineStartTime >= absTime[i])
        // Quicker still...
        if (lineStartTime > absTime[i] && valid[i] == 1)
        {
            // Slow... looking into linePixels[] each iteration is a bad idea.
            //linePixels[j]++;
            count++;
        }
    }
    // Doing it here is faster.
    linePixels[j] = count;
    tempIdx += count;
}
Rendering 400 lines like this in a for loop takes roughly 150 ms in a VM (I do not have a dedicated Windows machine right now and I run a Mac myself, I know I know...).
I just installed Win10CTP on a 6 core machine and replacing the normal loops by Parallel.For() increases the speed by almost exactly 6X.
Oddly enough, the non-parallel for loop runs at almost the same speed in the VM as on the physical 6-core machine...
Regardless, I cannot imagine that this function cannot be made quicker. I would first like to eke out every bit of efficiency from the line render before I start thinking about other things.
I would like to optimise the function that generates the line to the maximum.
Outlook
Until now my programming has dealt with rather trivial things, so I lack some experience, but here are things I think I might consider:
Matlab is/seems very efficient with vectorised operations. Could I achieve similar things in C#, e.g. by using Microsoft.Bcl.Simd? Is my case suited for something like this? Would I see gains even in my VM or should I definitely move to real HW?
Could I gain from pointer arithmetic/unsafe code to run through my arrays?
...
Any help would be greatly, greatly appreciated.
I apologize beforehand for the quality of the code in the repo; I am still in the quick-and-dirty testing stage... Nonetheless, constructive criticism is welcome :)
Update
As some mentioned, absTime is already ordered. Therefore, once a record is hit that no longer falls in the current pixel or bin, there is no need to continue the inner loop.
Adding a break gives a 5X speed gain...
for (int i = tempIdx; i < absTime.Length; i++)
{
    //if (lineStartTime + (j + 1) * pixelduration >= absTime[i] && valid[i] == 1)
    // Seems quicker to calculate the boundary before...
    //if (valid[i] == 1 && lineStartTime >= absTime[i])
    // Quicker still...
    if (lineStartTime > absTime[i] && valid[i] == 1)
    {
        // Slow... looking into linePixels[] each iteration is a bad idea.
        //linePixels[j]++;
        count++;
    }
    else
    {
        break;
    }
}
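Since absTime is sorted, the same idea can be pushed further: a single moving index can walk the record arrays exactly once for the whole line, so no record is ever revisited and sync records (valid == 0) are skipped rather than ending the scan early. A sketch using the same arrays as above, with lineStartTime still holding the start time of the line:

int idx = 0;
long pixelEnd = lineStartTime + pixelduration;
for (int j = 0; j < linePixels.Length; j++)
{
    int count = 0;
    while (idx < absTime.Length && absTime[idx] < pixelEnd)
    {
        if (valid[idx] == 1)
            count++;
        idx++;                        // consume this record, valid or not
    }
    linePixels[j] = count;
    pixelEnd += pixelduration;
}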
Hello, I'm trying to get all possible combinations, with repetition, of a given char array.
The char array consists of alphabet letters (lowercase only) and I need to generate strings with a length of 30 or more chars.
I tried the many-for-loops method, but when I try to get all combinations with a string length of more than 5 I get an OutOfMemoryException.
So I created a similar method that takes only the first 200,000 strings, then the next 200,000 and so on; this proved successful, but only with smaller string lengths.
This was my method for a length of 7 chars:
public static int Progress = 0;

public static ArrayList CreateRngUrl7()
{
    ArrayList AllCombos = new ArrayList();
    int passed = 0;
    int Too = Progress + 200000;
    char[] alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ".ToLower().ToCharArray();
    for (int i = 0; i < alpha.Length; i++)
        for (int i1 = 0; i1 < alpha.Length; i1++)
            for (int i2 = 0; i2 < alpha.Length; i2++)
                for (int i3 = 0; i3 < alpha.Length; i3++)
                    for (int i4 = 0; i4 < alpha.Length; i4++)
                        for (int i5 = 0; i5 < alpha.Length; i5++)
                            for (int i6 = 0; i6 < alpha.Length; i6++)
                            {
                                if (passed > (Too - 200000) && passed < Too)
                                {
                                    string word = new string(new char[] { alpha[i], alpha[i1], alpha[i2], alpha[i3], alpha[i4], alpha[i5], alpha[i6] });
                                    AllCombos.Add(word);
                                }
                                passed++;
                            }
    if (Too >= passed)
    {
        MessageBox.Show("All combinations of RNG7 were returned");
    }
    Progress = Too;
    return AllCombos;
}
I tried adding 30 for-loops in the way described above so I would get strings of length 30, but the application just hangs. Is there any better way to do this? All answers would be much appreciated. Thank you in advance!
Can someone please just post how this is done with larger-length strings? I just want to see an example. I don't have to store the data; I just need to compare it with something and then release it from memory. I used the alphabet as an example; I don't need the whole alphabet. The question was not how long it would take or how many combinations there would be!
You get an OutOfMemoryException because inside the loop you allocate a string and store it in an ArrayList. The strings have to stay in memory until the ArrayList is garbage collected and your loop creates more strings than you will be able to store.
If you simply want to check the string for a condition you should put the check inside the loop:
for ( ... some crazy loop ...) {
    var word = ... create word ...
    if (!WordPassesTest(word)) {
        Console.WriteLine(word + " failed test.");
        return false;
    }
}
return true;
Then you only need storage for a single word. Of course, if the loop is crazy enough, it will not terminate before the end of the universe as we know it.
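For scale: 26 letters at length 30 means 26^30 ≈ 2.8 × 10^42 combinations; even generating a billion strings per second, enumerating them all would take on the order of 10^26 years.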
If you need to execute many nested but similar loops you can use recursion to simplify the code. Here is an example that is not incredible efficient, but at least it is simple:
Char[] chars = "ABCD".ToCharArray();

IEnumerable<String> GenerateStrings(Int32 length) {
    if (length == 0) {
        yield return String.Empty;
        yield break;
    }
    var strings = chars.SelectMany(c => GenerateStrings(length - 1), (c, s) => c + s);
    foreach (var str in strings)
        yield return str;
}
Calling GenerateStrings(3) will generate all strings of length 3 using lazy evaluation (so no additional storage is required for the strings).
Building on top of an IEnumerable generating your strings, you can create primitives to buffer and process batches of strings. An easy solution is to use Reactive Extensions for .NET. Here you already have a Buffer primitive:
GenerateStrings(3)
    .ToObservable()
    .Buffer(10)
    .Subscribe(list => ... ship the list to another computer and process it ...);
The lambda in Subscribe will be called with a List<String> with at most 10 strings (the parameter provided in the call to Buffer).
Unless you have an infinite number of computers, you will still have to pull the computers from a pool and only recycle them back to the pool when they have finished the computation.
It should be obvious from the comments on this question that you will not be able to process 26^30 strings even if you have multiple computers at your disposal.
I don't have time right now to write code, but essentially, if you are running out of RAM, use disk. I'm thinking along the lines of one thread running an algorithm to find the combinations and another persisting the results to disk, releasing the RAM.
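A minimal sketch of that split, assuming the lazy GenerateStrings from the earlier answer and a placeholder output path; a bounded BlockingCollection keeps only a fixed number of strings in RAM while one task produces combinations and another drains them to disk:

// requires System.Collections.Concurrent, System.IO, System.Threading.Tasks
var queue = new BlockingCollection<string>(boundedCapacity: 100000);

var producer = Task.Run(() =>
{
    foreach (var word in GenerateStrings(7))   // lazy, nothing accumulates in memory
        queue.Add(word);                       // blocks while the buffer is full
    queue.CompleteAdding();
});

var consumer = Task.Run(() =>
{
    using (var writer = new StreamWriter("combinations.txt"))   // placeholder path
        foreach (var word in queue.GetConsumingEnumerable())
            writer.WriteLine(word);
});

Task.WaitAll(producer, consumer);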