List<List<float>> hitting memory limits - c#

When I open large audio files, I get an out of memory error.
I know this method isn't common, so I thought I'd have to ask a real boffin.
I have the following happening:
public static List<List<float>> int_filenamewavedataRight = new List<List<float>>();
From there I open an audio file (through the NAudio library) and load the audio level values into this list so I can view them correctly.
I clear the arrays like so for a new file:
int_filenamewavedataRight.Clear();
int_filenamewavedataRight.Add(new List<float>());
Then I load all of the values into memory for speedy display of the waveforms:
waveStream.Position = 0;
int bytesRead;
byte[] waveData = new byte[bytesPerSample];
waveStream.Position = 0; // startPosition + (e.ClipRectangle.Left * bytesPerSample * samplesPerPixel);
int samples = (int)(waveStream.Length / bytesPerSample);
wavepeakloaded = 0;
int OldPrecentVal = 0;
for (int x = 0; x < samples; x++)
{
    short high = 0;
    bytesRead = waveStream.Read(waveData, 0, bytesPerSample);
    if (bytesRead == 0)
        break;
    for (int n = 0; n < bytesRead; n += 2)
    {
        short sample = BitConverter.ToInt16(waveData, n);
        if (sample > high) high = sample;
    }
    float highPercent2 = (float)Math.Round(((((float)high) - short.MinValue) / ushort.MaxValue), 2);

    // ERRORING HERE
    int_filenamewavedataRight[filename_value].Add((float)Math.Round(highPercent2, 2));
}
Small audio files, such as a typical song around 5 minutes long, are fine, but longer files of 25 minutes or more throw an exception once the list count reaches 67108864, and I then get an "Exception of type 'System.OutOfMemoryException' was thrown."
x = {"Exception of type 'System.OutOfMemoryException' was thrown."}
ex.StackTrace " at System.Collections.Generic.List`1.set_Capacity(Int32 value)\r\n at System.Collections.Generic.List`1.EnsureCapacity(Int32 min)\r\n at System.Collections.Generic.List`1.Add(T item)\r\n at APP.WaveViewer.LoadWaveToMemory(Int32 filename_value) in WaveViewer.cs:line 1391"
I'm using a list of lists so that I can address the audio files like a simple array, but as I don't know the size up front, I can't specify a capacity initially.
I'm also pre-loading the waveform data like this so I can play back, zoom, and pan the audio file at the same time.
Is this easily fixable, or should I find a different way of doing this, such as writing these peak volume values to a temporary file, rather than keeping them in memory, or is there a better way?
I've looked this up in various places on the net (such as here), but it seems like a rare thing to be doing.
Thanks.

The simple answer is that it was running as 32-bit (x86), which doesn't have enough addressable memory for 50,000,000+ samples.
Changing the program to target x64 solved that specific problem.
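As a side note, since the total sample count is known before the loop, the inner list could also be created with that capacity up front, which avoids the repeated capacity-doubling visible in the stack trace (EnsureCapacity/set_Capacity). This is only a sketch of that idea, not the accepted fix; a single large contiguous allocation is still needed, so on x86 it merely delays the problem:

// Sketch only: pre-size the inner list using the sample count that is already
// computed before the read loop. This avoids List<float> reallocating and copying
// as it grows, but the full block of memory is still required, so targeting x64
// (or streaming the peaks to a temporary file, as the question suggests) is the
// more robust route.
int samples = (int)(waveStream.Length / bytesPerSample);
int_filenamewavedataRight.Clear();
int_filenamewavedataRight.Add(new List<float>(samples));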
That's what I like about SO: you can pool resources from so many people and learn as you ask questions.

Related

CSharp: Failed to read LARGE .npy file. Exception is "NumSharp.dll Arithmetic operation resulted in an overflow."

I am trying to read a large .npy file in CSharp.
In order to do that I am trying to use the NumSharp NuGet package.
The file is a 7 GB jagged float array (float[][]). It has ~1 million vectors, each with 960 dimensions.
Note:
To be more specific the data I use is the GIST from the following link Approximate Nearest Neighbors Large datasets.
The following is the method I use to load the data, but it fails with an exception:
private static void ReadNpyVectorsFromFile(string pathPrefix, out List<float[]> candidates)
{
    var npyFilename = $"{pathPrefix}.npy";
    var v = np.load(npyFilename); // NDArray
    candidates = v
        .astype(np.float32)
        .ToJaggedArray<float>()
        .OfType<float[]>()
        .Select(a =>
        {
            return a.OfType<float>().ToArray();
        })
        .ToList();
}
The exception is:
Exception thrown: 'System.OverflowException' in NumSharp.dll An
unhandled exception of type 'System.OverflowException' occurred in
NumSharp.dll Arithmetic operation resulted in an overflow.
How can I workaround this?
Update
The NumSharp package has a limitation if the file is too big.
Read the comments/answers below for more explanations.
I added one answer with a suggestion for a workaround.
However, a good alternative is to save the data as .npz (refer to numpy.savez()); then the following package can do the job:
https://github.com/matajoh/libnpy
Code sample:
NPZInputStream npz = new NPZInputStream(npyFilename);
var keys = npz.Keys();
//var header = npz.Peek(keys[0]);
var t = npz.ReadFloat32(keys[0]);
Debug.Assert(t.DataType == DataType.FLOAT32);
I see that you've already found a workaround. Just in case you want to know the cause of your problem, it is because of a limitation of the Array class in .NET.
The np.load(string path) method is defined here, which in turn calls np.load(Stream stream).
int bytes;
Type type;
int[] shape;
if (!parseReader(reader, out bytes, out type, out shape))
throw new FormatException();
Array array = Arrays.Create(type, shape.Aggregate((dims, dim) => dims * dim));
var result = new NDArray(readValueMatrix(reader, array, bytes, type, shape));
return result.reshape(shape);
Here, bytes is the size of your data type; because you are using float, this value is 4. And shape holds the number of vectors and their dimensions.
Next, let's look at the readValueMatrix method.
int total = 1;
for (int i = 0; i < shape.Length; i++)
    total *= shape[i];

var buffer = new byte[bytes * total];
// omitted
NumSharp is trying to create a one-dimensional byte array with a size equal to bytes * total. Here, bytes is 4 and total is the number of vectors multiplied by the size of all dimensions.
However, in .NET, the maximum index in any given dimension of a byte array is 0X7FFFFFC7, which is 2147483591, as documented here. I haven't downloaded your data yet, but my guess is it is big enough that bytes * total > 2147483591.
Note that if you want to use NumSharp to write your data back to an .npy file, you will have the same problem inside the writeValueMatrix method.
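To put numbers on it: the dataset described above is roughly 1,000,000 vectors × 960 dimensions × 4 bytes ≈ 3.84 billion bytes, well past that limit. As an illustration only (the values are taken from the question, and the check itself is not part of NumSharp), a quick estimate before calling np.load could look like this:

// Illustrative pre-check (not NumSharp API): estimate the single buffer np.load
// would need and compare it against the documented .NET limit for byte arrays.
const long MaxByteArrayLength = 0x7FFFFFC7; // 2,147,483,591

long vectorCount = 1_000_000; // from the question's dataset description
long dimensions = 960;
long bytesPerElement = 4;     // float32

long requiredBytes = vectorCount * dimensions * bytesPerElement; // ~3.84 GB
if (requiredBytes > MaxByteArrayLength)
{
    // np.load would have to allocate one byte[] of this size and will fail,
    // so the file needs to be split or read in parts instead.
    Console.WriteLine($"Required buffer of {requiredBytes:N0} bytes exceeds the .NET array limit.");
}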
The issue is that the NumSharp data structure is a heavy RAM consumer, and it seems the C# GC is not aware of what NumSharp is allocating, so it reaches the RAM limit very quickly.
So, in order to overcome this, I split the input .npy file so that each part stays below the maximum allocation allowed in C# (2,147,483,591 bytes). In my case I split it into 5 different files (200k vectors each).
Python part to split the large .npy file:
import numpy as np

infile = r'C:\temp\input\GIST.1m.npy'
data = np.load(infile)
size = data.shape[0]
# create 5 files
incr = int(size / 5)
# the +1 is to handle any leftovers
r = range(0, int(size / incr + 1))
for i in r:
    print(i)
    start = i * incr
    stop = min(start + incr, size)
    if start >= len(data):
        break
    np.save(infile.replace('.npy', f'.{i}.npy'), data[start:stop])
Now in CSharp the code looks as follows:
private static void ReadNpyVectorsFromFile(string pathPrefix, out List<float[]> candidates)
{
    candidates = new List<float[]>();
    // TODO:
    // For now I am assuming there are 10 files maximum...
    // This can be improved by scanning the input folder and
    // collecting all the relevant files.
    foreach (var i in Enumerable.Range(0, 10))
    {
        var npyFilename = $"{pathPrefix}.{i}.npy";
        Console.WriteLine(npyFilename);
        if (!File.Exists(npyFilename))
            continue;
        var v = np.load(npyFilename); // NDArray
        var tempList = v
            .astype(np.float32)
            .ToJaggedArray<float>()
            .OfType<float[]>()
            .Select(a => { return a.OfType<float>().ToArray(); })
            .ToList();
        candidates.AddRange(tempList);
    }
}

How to get the # of occurrences of a char in a string FAST in C#?

I have a txt file. Right now, I need to load it line by line and check how many times '#' appears in the entire file.
So, basically, I have a single-line string; how do I get the number of occurrences of '#' fast?
I need to count this fast since we have lots of files like this, and each of them is about 300-400 MB.
I searched, and it seems the straightforward way is the fastest way to do this:
int num = 0;
foreach (char c in line)
{
    if (c == '#') num++;
}
Is there a different method that could be faster than this? Any other suggestions?
If needed, we do not have to load the txt file line by line, but we do need to know the number of lines in each file.
Thanks
The fastest approach really depends on I/O capabilities and computational speed. Usually the best way to understand which technique is fastest is to benchmark them.
Disclaimer: Results are (of course) bound to my machine and may vary significantly on different hardware. For testing I have used a single text file of about 400MB in size. If interested the file may be downloaded here (zipped). Executable compiled as x86.
Option 1: Read entire file, no parallelization
long count = 0;
var text = File.ReadAllText("C:\\tmp\\test.txt");
for (var i = 0; i < text.Length; i++)
    if (text[i] == '#')
        count++;
Results:
Average execution time: 5828 ms
Average process memory: 1674 MB
This is the "naive" approach, which reads the entire file in memory and then uses a for loop (which is significantly faster than foreach or LINQ).
As expected, the memory occupied by the process is very high (about 4 times the file size); this may be caused by a combination of the string's size in memory (more info here) and string processing overhead.
Option 2: Read file in chunks, no parallelization
long count = 0;
using (var file = File.OpenRead("C:\\tmp\\test.txt"))
using (var reader = new StreamReader(file))
{
    const int size = 500000; // chunk size 500k chars
    char[] buffer = new char[size];
    while (!reader.EndOfStream)
    {
        var read = await reader.ReadBlockAsync(buffer, 0, size); // read chunk
        for (var i = 0; i < read; i++)
            if (buffer[i] == '#')
                count++;
    }
}
Results:
Average execution time: 4819 ms
Average process memory: 7.48 MB
This was unexpected. In this version we are reading the file in chunks of 500k characters instead of loading it entirely into memory, and execution time is even lower than in the previous approach. Please note that reducing the chunk size will increase execution time (because of the overhead). Memory consumption is extremely low: as expected, we are only holding roughly 500k chars (about 1 MB) in memory at a time, directly in a char array.
Better (or worse) performance may be obtained by changing the chunk size.
Option 3: Read file in chunks, with parallelization
long count = 0;
using (var file = File.OpenRead("C:\\tmp\\test.txt"))
using (var reader = new StreamReader(file))
{
    const int size = 2000000; // this is roughly 4 times the single threaded value
    const int parallelization = 4; // this will split chunks in sub-chunks processed in parallel
    char[] buffer = new char[size];
    while (!reader.EndOfStream)
    {
        var read = await reader.ReadBlockAsync(buffer, 0, size);
        var sliceSize = read / parallelization;
        var counts = new long[parallelization];
        Parallel.For(0, parallelization, i =>
        {
            var start = i * sliceSize;
            var end = start + sliceSize;
            if (i == parallelization - 1) // the last slice also takes any leftover characters
                end += read % parallelization;
            long localCount = 0;
            for (var j = start; j < end; j++)
            {
                if (buffer[j] == '#')
                    localCount++;
            }
            counts[i] = localCount;
        });
        count += counts.Sum();
    }
}
Results:
Average execution time: 3363 ms
Average process memory: 10.37 MB
As expected, this version performs better than the single-threaded one, but not 4 times better as one might have hoped. Memory consumption is again very low compared to the first version (same considerations as before), and we are taking advantage of multi-core environments.
Parameters like chunk size and number of parallel tasks may significantly change the results; you should go by trial and error to find the best combination for you.
Conclusions
I was inclined to think that the "load everything in memory" version was the fastest, but this really depends on the overhead of string processing and I/O speed. The parallel chunked approach seems the fastest on my machine, which should lead you to one conclusion: when in doubt, just benchmark it.
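The exact measurement harness isn't shown above; if you want to reproduce the comparison, a minimal sketch of one possible setup (class, method, and parameter names here are mine, and the counting delegate is a placeholder rather than the code used for the numbers above) could be:

// Assumed benchmarking sketch, not the harness actually used for the results above:
// runs a counting strategy several times and averages wall-clock time and working set.
using System;
using System.Diagnostics;

static class CountBenchmark
{
    public static void Run(string name, Func<long> countStrategy, int iterations = 5)
    {
        long totalMs = 0;
        long totalWorkingSet = 0;

        for (int i = 0; i < iterations; i++)
        {
            var sw = Stopwatch.StartNew();
            long count = countStrategy(); // e.g. one of the options above
            sw.Stop();

            totalMs += sw.ElapsedMilliseconds;
            totalWorkingSet += Process.GetCurrentProcess().WorkingSet64;

            Console.WriteLine($"{name}: run {i + 1}, count = {count}, {sw.ElapsedMilliseconds} ms");
        }

        Console.WriteLine($"{name}: average {totalMs / iterations} ms, " +
                          $"average working set {totalWorkingSet / iterations / (1024 * 1024)} MB");
    }
}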
You can test if it's faster, but a shorter way to write it would be:
int num = File.ReadAllText(filePath).Count(i => i == '#');
Hmm, but I just saw you need the line count as well, so this is similar. Again, would need to be compared to what you have:
var fileLines = File.ReadAllLines(filePath);
var count = fileLines.Length;
var num = fileLines.Sum(line => line.Count(i => i == '#'));
You could use pointers. I don't know if this would be any faster though. You would have to do some testing:
static void Main(string[] args)
{
    string str = "This is # my st#ing";
    int numberOfCharacters = 0;
    unsafe
    {
        fixed (char *p = str)
        {
            char *ptr = p;
            while (*ptr != '\0')
            {
                if (*ptr == '#')
                    numberOfCharacters++;
                ptr++;
            }
        }
    }
    Console.WriteLine(numberOfCharacters);
}
Note that you must go into your project properties and allow unsafe code in order for this code to work.

Why is my memory running out when the file gets too big?

So the error I am getting is this.
An unhandled exception of type 'System.OutOfMemoryException' occurred
in mscorlib.dll
I have never encountered this error before and I've looked it up on Google.
I do have a 64-Bit system.
I do have 16GB of RAM.
Some people said that I need to set the platform target to x64 in my project properties, but won't that make it so only 64-bit systems will be able to run this application?
public static string RC4(string input, string key)
{
    StringBuilder result = new StringBuilder();
    int x, y, j = 0;
    int[] box = new int[256];
    for (int i = 0; i < 256; i++)
    {
        box[i] = i;
    }
    for (int i = 0; i < 256; i++)
    {
        j = (key[i % key.Length] + box[i] + j) % 256;
        x = box[i];
        box[i] = box[j];
        box[j] = x;
    }
    for (int i = 0; i < input.Length; i++)
    {
        y = i % 256;
        j = (box[y] + j) % 256;
        x = box[y];
        box[y] = box[j];
        box[j] = x;
        result.Append((char)(input[i] ^ box[(box[y] + box[j]) % 256]));
    }
    return result.ToString(); // This would be the line throwing me the error.
}
Because every second I'm appending new text from a keyboard hook to a text file. So let's say I type abc the first second, it's going to append that; if I type def the next second, it's going to append that. This is all happening inside a timer tick, so it's really straightforward.
Whenever the text file reaches around 350,000 KB it throws that error.
There is a limit to how big single objects are allowed to be, even on x64. The actual limit doesn't matter and depends on configuration, but the upshot is that when processing large volumes of data you should read through the data in pieces (usually via Stream), processing chunks at a time. Never try to hold the entire thing in memory at once. This applies equally to input and output.
Now, if you managed to load the entire thing into input, then you got lucky; but StringBuilder is intentionally oversized so that it doesn't have to keep allocating all the time. You might be able to "fix" your code by telling StringBuilder the correct number of characters you need in the constructor, but that is only a temporary hack that will let you process slightly larger data. The real fix is to not attempt to process huge data in a single chunk.
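For reference, the "temporary hack" mentioned above is just pre-sizing the builder; since the RC4 output in the posted code has exactly as many characters as the input, the builder can be allocated once at the right size (a sketch, and again only a stopgap):

// Sketch of the pre-sizing hack: allocate the builder at its final size so it
// never has to grow and copy. This only postpones the failure for somewhat
// larger inputs; chunked processing via a Stream remains the real fix.
StringBuilder result = new StringBuilder(input.Length);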

Why isn't this method returning anything?

This is for my internship so I can't give much more context, but this method isn't returning the desired int and is causing an index out of range exception instead.
The String[] taken into the method is composed of information from a handheld scanner used by my company's shipping department. Its resulting dataAsByteArray is really a Byte[][] so the .Length in the nested If statement will get the number of Bytes of a Bundle entry and then add it to fullBundlePacketSize as long as the resulting sum is less than 1000.
Why less than 1000? The bug I've been tasked with fixing is that some scanners (with older versions of Bluetooth) will only transmit about 1000 bytes of data to the receiver at a time. This method is to find how many bytes can be transmitted without cutting into a bundle entry (originally I had it hard coded to just transmit 1000 bytes at a time and that caused the receiver to get invalid bundle data).
The scanners are running a really old version of Windows CE and trying to debug in VS (2008) just opens an emulator for the device which doesn't help.
I'm sure it's something really simple, but I feel like a new set of eyes looking at it would help, so any help or solutions are greatly appreciated!
private int MaxPacketSizeForFullBundle(string[] data)
{
    int fullBundlePacketSize = 0;
    var dataAsByteArray = data.Select(s => Encoding.ASCII.GetBytes(s)).ToArray();
    for (int i = 0; i < dataAsByteArray.Length; i++)
    {
        if ((fullBundlePacketSize + dataAsByteArray[i + 1].Length < 1000))
        {
            fullBundlePacketSize += dataAsByteArray[i].Length;
        }
    }
    return fullBundlePacketSize;
}
Take a look at your loop:
for (int i = 0; i < dataAsByteArray.Length; i++)
{
    if ((fullBundlePacketSize + dataAsByteArray[i + 1].Length < 1000))
                                                ^^^^^
I suspect you are throwing an exception because you are indexing an array beyond its length.
Do you mean this?
for (int i = 0; i < (dataAsByteArray.Length - 1); i++)
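For completeness, here is a sketch of one possible correction, under the assumption that the intent is to accumulate whole entries while the running total stays below the ~1000-byte transmit limit (that reading of the intent is mine, not something the original code states):

// Assumed intent: sum entry sizes until the next whole entry would push the
// total past the ~1000 bytes the older Bluetooth scanners can send at once.
private int MaxPacketSizeForFullBundle(string[] data)
{
    int fullBundlePacketSize = 0;
    var dataAsByteArray = data.Select(s => Encoding.ASCII.GetBytes(s)).ToArray();
    for (int i = 0; i < dataAsByteArray.Length; i++)
    {
        if (fullBundlePacketSize + dataAsByteArray[i].Length >= 1000)
            break; // the next entry would not fit in one transmission
        fullBundlePacketSize += dataAsByteArray[i].Length;
    }
    return fullBundlePacketSize;
}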
In conjunction with n8wrl's answer: your problem was an out-of-range access. I believe you were attempting to access the actual byte values inside the entries of dataAsByteArray.
string[] data = {"444", "abc445x"};
int x = MaxPacketSizeForFullBundle(data);
dataAsByteArray = {byte[2][]}
dataAsByteArray[0] contains a {byte[3]} holding the value of each character in the string, so for "444" it would contain 52, 52, 52. You are probably attempting to access these individual values, which means you need to access the deeper nested bytes with dataAsByteArray[i][j] where 0 <= j < data[i].Length.

Video rate image construction from binary data performance

First things first:
I have a git repo over here that holds the code of my current efforts and an example data set
Background
The example data set holds a bunch of records in Int32 format. Each record is composed of several bit fields that basically hold info on events where an event is either:
The detection of a photon
The arrival of a synchronizing signal
Each Int32 record can be treated like following C-style struct:
struct {
    unsigned TimeTag  : 16;
    unsigned Channel  : 12;
    unsigned Route    : 2;
    unsigned Valid    : 1;
    unsigned Reserved : 1;
} TTTRrecord;
Whether we are dealing with a photon record or a sync event, the time tag will always hold the time of the event relative to the start of the experiment (macro-time).
If a record is a photon, valid == 1.
If a record is a sync signal or something else, valid == 0.
If a record is a sync signal, sync type = channel & 7 will give either a value indicating start of frame or end of scan line in a frame.
The last relevant bit of info is that TimeTag is 16-bit and thus obviously limited. If the TimeTag counter rolls over, the rollover counter is incremented. This rollover (overflow) flag can easily be obtained from the channel field: overflow = Channel & 2048.
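To make the bit layout concrete, here is a small helper that unpacks one record with the same masks the struct (and the question's code further down) implies; the helper name is mine and the bit order follows those masks:

// Unpacks one Int32 TTTR record according to the layout above (illustrative helper).
static void UnpackRecord(int record,
                         out int timeTag, out int channel, out int route,
                         out int valid, out int overflow)
{
    timeTag  = record & 0xFFFF;        // bits 0-15  : TimeTag (wraps, see rollover note)
    channel  = (record >> 16) & 0xFFF; // bits 16-27 : Channel
    route    = (record >> 28) & 0x3;   // bits 28-29 : Route
    valid    = (record >> 30) & 1;     // bit 30     : Valid (1 = photon record)
    overflow = (channel & 2048) >> 11; // rollover flag encoded in the Channel field
}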
My Goal
These records come in from a high speed scanning microscope and I would like to use these records to reconstruct images from the recorded photon data, preferably at 60 FPS.
To do so, I obviously have all the info:
I can look over all available data, find all overflows, which allows me to reconstruct the sequential macro time for each record (photon or sync).
I also know when the frame started and when each line composing the frame ended (and thus also how many lines there are).
Therefore, to reconstruct a bitmap of size noOfLines * noOfLines I can process the bulk array of records line by line where each time I basically make a "histogram" of the photon events with edges at the time boundary of each pixel in the line.
Put another way, if I know Tstart and Tend of a line, and I know the number of pixels I want to spread my photons over, I can walk through all records of the line and check if the macro time of my photons falls within the time boundary of the current pixel. If so, I add one to the value of that pixel.
This approach works; the current code in the repo gives me the image I expect, but it is too slow (several tens of ms to calculate a frame).
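Expressed directly, the per-line binning described above maps each valid photon to a pixel index from its macro time. The sketch below is only an illustration of that idea with assumed parameter names (mirroring the code further down); it is not the repo's Renderline:

// Illustrative only: each valid photon falls into the pixel whose time window
// contains its macro time.
static int[] BinPhotonsToPixels(long[] absTime, int[] valid,
                                long lineStartTime, int pixelduration, int pixelCount)
{
    var linePixels = new int[pixelCount];
    for (int i = 0; i < absTime.Length; i++)
    {
        if (valid[i] != 1)
            continue; // skip sync and other non-photon records

        long offset = absTime[i] - lineStartTime;
        if (offset < 0)
            continue; // record belongs to an earlier line

        int pixel = (int)(offset / pixelduration);
        if (pixel < pixelCount)
            linePixels[pixel]++; // count this photon in its pixel
    }
    return linePixels;
}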
What I tried already:
The magic happens in the function int[] Renderline (see repo).
public static int[] RenderlineV(int[] someRecords, int pixelduration, int pixelCount)
{
    // Will hold the pixels obviously
    int[] linePixels = new int[pixelCount];

    // Calculate everything (sync, overflow, ...) from the raw records
    int[] timeTag = someRecords.Select(x => Convert.ToInt32(x & 65535)).ToArray();
    int[] channel = someRecords.Select(x => Convert.ToInt32((x >> 16) & 4095)).ToArray();
    int[] valid = someRecords.Select(x => Convert.ToInt32((x >> 30) & 1)).ToArray();
    int[] overflow = channel.Select(x => (x & 2048) >> 11).ToArray();
    int[] absTime = new int[overflow.Length];
    absTime[0] = 0;
    Buffer.BlockCopy(overflow, 0, absTime, 4, (overflow.Length - 1) * 4);
    absTime = absTime.Cumsum(0, (prev, next) => prev * 65536 + next).Zip(timeTag, (o, tt) => o + tt).ToArray();

    long lineStartTime = absTime[0];
    int tempIdx = 0;

    for (int j = 0; j < linePixels.Length; j++)
    {
        int count = 0;
        for (int i = tempIdx; i < someRecords.Length; i++)
        {
            if (valid[i] == 1 && lineStartTime + (j + 1) * pixelduration >= absTime[i])
            {
                count++;
            }
        }
        // Avoid checking records in the raw data that were already binned to a pixel.
        linePixels[j] = count;
        tempIdx += count;
    }
    return linePixels;
}
Treating photon records in my data set as an array of structs and addressing members of my struct in an iteration was a bad idea. I could increase speed significantly (2X) by dumping all bitfields into an array and addressing these. This version of the render function is already in the repo.
I also realised I could improve the loop speed by making sure I refer to the .Length property of the array I am running through as this supposedly eliminates bounds checking.
The major speed loss is in the inner loop of this nested set of loops:
for (int j = 0; j < linePixels.Length; j++)
{
    int count = 0;
    lineStartTime += pixelduration;
    for (int i = tempIdx; i < absTime.Length; i++)
    {
        //if (lineStartTime + (j + 1) * pixelduration >= absTime[i] && valid[i] == 1)
        // Seems quicker to calculate the boundary before...
        //if (valid[i] == 1 && lineStartTime >= absTime[i])
        // Quicker still...
        if (lineStartTime > absTime[i] && valid[i] == 1)
        {
            // Slow... looking into linePixels[] each iteration is a bad idea.
            //linePixels[j]++;
            count++;
        }
    }
    // Doing it here is faster.
    linePixels[j] = count;
    tempIdx += count;
}
Rendering 400 lines like this in a for loop takes roughly 150 ms in a VM (I do not have a dedicated Windows machine right now and I run a Mac myself, I know I know...).
I just installed Win10CTP on a 6 core machine and replacing the normal loops by Parallel.For() increases the speed by almost exactly 6X.
Oddly enough, the non-parallel for loop runs almost at the same speed in the VM or the physical 6 core machine...
Regardless, I cannot imagine that this function cannot be made quicker. I would first like to eke out every bit of efficiency from the line render before I start thinking about other things.
I would like to optimise the function that generates the line to the maximum.
Outlook
Until now, my programming dealt with rather trivial things so I lack some experience but things I think I might consider:
Matlab is/seems very efficient with vectored operations. Could I achieve similar things in C#, i.e. by using Microsoft.Bcl.Simd? Is my case suited for something like this? Would I see gains even in my VM or should I definitely move to real HW?
Could I gain from pointer arithmetic/unsafe code to run through my arrays?
...
Any help would be greatly, greatly appreciated.
I apologize beforehand for the quality of the code in the repo, I am still in the quick and dirty testing stage... Nonetheless, criticism is welcomed if it is constructive :)
Update
As some mentioned, absTime is ordered already. Therefore, once a record is hit that is no longer in the current pixel or bin, there is no need to continue the inner loop.
5X speed gain by adding a break...
for (int i = tempIdx; i < absTime.Length; i++)
{
    //if (lineStartTime + (j + 1) * pixelduration >= absTime[i] && valid[i] == 1)
    // Seems quicker to calculate the boundary before...
    //if (valid[i] == 1 && lineStartTime >= absTime[i])
    // Quicker still...
    if (lineStartTime > absTime[i] && valid[i] == 1)
    {
        // Slow... looking into linePixels[] each iteration is a bad idea.
        //linePixels[j]++;
        count++;
    }
    else
    {
        break;
    }
}
