System.OutOfMemoryException. Creating a big matrix - c#

I have a matrix [3,15000]. I need to compute the covariance matrix of the original matrix and then find its eigenvalues.
This is a part of my code:
double[,] covarianceMatrix = new double[numberOfObjects, numberOfObjects];
for (int n = 0; n < numberOfObjects; n++)
{
    for (int m = 0; m < numberOfObjects; m++)
    {
        double sum = 0;
        for (int k = 0; k < TimeAndRepeats[i, 1]; k++)
        {
            sum += originalMatrix[k, n] * originalMatrix[k, m];
        }
        covarianceMatrix[n, m] = sum / TimeAndRepeats[i, 1];
    }
}
alglib.smatrixevd(covarianceMatrix, numberOfObjects, 1, true, out eigenValues, out eigenVectors);
numberOfObjects here is about 15000.
When I do my computations for a smaller number of objects everything is OK, but for all my data I get an exception.
Is it possible to solve this problem?
I am using macOS, x64.
My environment is MonoDevelop.

double[,] covarianceMatrix = new double[numberOfObjects,numberOfObjects];
You said that your matrix is [3, 15000] and that numberOfObjects is 15000, so this line creates a [15000, 15000] matrix of doubles.
15000 * 15000 = 225,000,000 doubles at 8 bytes each: 1,800,000,000 bytes, or roughly 1.8 GB.
That's probably why you are running out of memory.
Edit:
According to this question and this question, the size of objects in C# cannot be larger than 2GB. The 1.8GB does not count any additional overhead required to reference the items in the array, so that 1.8GB might actually be > 2GB when everything is accounted for (I can't say without the debugging info; someone with more C# experience might have to set me straight on this). You might consider this workaround if you're trying to work with really large arrays, since statically allocated arrays can get messy.
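To make that workaround concrete, here is a minimal sketch of allocating the covariance matrix as a jagged array, so each row is a separate 15000-element array (about 120 KB) instead of one 1.8 GB block (variable names follow the question):

// Same total memory, but no single object comes near the 2 GB limit
double[][] covarianceMatrix = new double[numberOfObjects][];
for (int n = 0; n < numberOfObjects; n++)
{
    covarianceMatrix[n] = new double[numberOfObjects];
}

Note the total footprint is still ~1.8 GB, and alglib.smatrixevd takes a double[,], so this only sidesteps the single-object limit, not the overall memory pressure.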

When you create covarianceMatrix, you are creating an object of 15000 * 15000 = 225,000,000 doubles,
so you need 1,800,000,000 bytes of memory. That is why you get the OutOfMemoryException.

The exception name tells you exactly what the problem is. You could use floats instead of doubles to halve the amount of memory needed. Another option would be to create a class for the covariance matrix that keeps its data in a disk file, though you'd need to implement proper mechanisms to operate on it, and performance would be limited as well.
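A minimal sketch of the float variant, reusing the question's variables (it is usually safer to accumulate the sum in double and only store the result as float):

// 15000 * 15000 floats at 4 bytes each is ~900 MB, half the double version
float[,] covarianceMatrix = new float[numberOfObjects, numberOfObjects];
for (int n = 0; n < numberOfObjects; n++)
{
    for (int m = 0; m < numberOfObjects; m++)
    {
        double sum = 0; // accumulate in double to limit rounding error
        for (int k = 0; k < TimeAndRepeats[i, 1]; k++)
        {
            sum += originalMatrix[k, n] * originalMatrix[k, m];
        }
        covarianceMatrix[n, m] = (float)(sum / TimeAndRepeats[i, 1]);
    }
}

Keep in mind that alglib.smatrixevd expects a double[,], so a float matrix would need a different eigensolver or a conversion step.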

Related

Declaring a jagged array succeeds, but out of memory when declaring a multi-dimensional array of the same size

I get an out of memory exception when running this line of code:
double[,] _DataMatrix = new double[_total_traces, _samples_per_trace];
But this code completes successfully:
double[][] _DataMatrix = new double[_total_traces][];
for (int i = 0; i < _total_traces; i++)
{
    _DataMatrix[i] = new double[_samples_per_trace];
}
My first question is why is this happening?
As a followup question, my ultimate goal is to run Principal Component Analysis (PCA) on this data. It's a pretty large dataset. The number of "rows" in the matrix could be a couple million. The number of "columns" will be around 50. I found a PCA library in the Accord.net framework that seems popular. It takes a jagged array as input (which I can successfully create and populate with data), but I run out of memory when I pass it to PCA - I guess because it is passing by value and creating a copy of the data(?). My next thought was to just write my own method to do the PCA so I wouldn't have to copy the data, but I haven't got that far yet. I haven't really had to deal with memory management much before, so I'm open to tips.
Edit: This is not a duplicate of the topic linked below, because that link did not explain how the memory of the two was stored differently and why one would cause memory issues despite them both being the same size.
In a 32-bit process it is hard to find a contiguous range of addresses of more than a few hundred MB (see for example https://stackoverflow.com/a/30035977/613130). But it is easy to have scattered pieces of memory totalling some hundred MB (or even 1 GB)...
The multidimensional array is a single slab of contiguous memory; the jagged array is a collection of small arrays (so many small pieces of memory).
Note that in a 64-bit process it is much easier to create an array of the maximum size permitted by .NET (around 2 GB, or even more... see https://stackoverflow.com/a/2338797/613130).

.Net Dictionary<int,int> out of memory exception at around 6,000,000 entries

I am using a Dictionary<int, int> to store the frequency of colors in an image, where the key is the color (as an int) and the value is the number of times the color has been found in the image.
When I process larger / more colorful images, this dictionary grows very large. I get an out of memory exception at just around 6,000,000 entries. Is this the expected capacity when running in 32-bit mode? If so, is there anything I can do about it? And what might be some alternative methods of keeping track of this data that won't run out of memory?
For reference, here is the code that loops through the pixels in a bitmap and saves the frequency in the Dictionary<int,int>:
Bitmap b; // = something...
Dictionary<int, int> count = new Dictionary<int, int>();
System.Drawing.Color color;
for (int i = 0; i < b.Width; i++)
{
    for (int j = 0; j < b.Height; j++)
    {
        color = b.GetPixel(i, j);
        int argb = color.ToArgb();
        if (!count.ContainsKey(argb))
        {
            count.Add(argb, 0);
        }
        count[argb] = count[argb] + 1;
    }
}
Edit: In case you were wondering what image has that many different colors in it: http://allrgb.com/images/mandelbrot.png
Edit: I also should mention that this is running inside an asp.net web application using .Net 4.0. So there may be additional memory restrictions.
Edit: I just ran the same code inside a console application and had no problems. The problem only happens in ASP.Net.
Update: Given the OP's sample image, it seems that the maximum number of items would be over 16 million, and apparently even that is too much to allocate when instantiating the dictionary. I see three options here:
Resize the image down to a manageable size and work from that.
Try to convert to a color scheme with fewer color possibilities.
Go for an array of fixed size as others have suggested.
Previous answer: the problem is that you don't allocate enough space for your dictionary. At some point, when it is expanding, you just run out of memory for the expansion, but not necessarily for the new dictionary.
Example: this code runs out of memory at nearly 24 million entries (on my machine, running in 32-bit mode):
Dictionary<int, int> count = new Dictionary<int, int>();
for (int i = 0; ; i++)
    count.Add(i, i);
because at the last expansion it is still using the space for the entries already there, and tries to allocate new space for that many million more, which is too much.
Now, if we initially allocate space for, say, 40 million entries, it runs without problem:
Dictionary<int, int> count = new Dictionary<int, int>(40000000);
So try to indicate how many entries there will be when creating the dictionary.
From MSDN:
The capacity of a Dictionary is the number of elements that can be added to the Dictionary before resizing is necessary. As elements are added to a Dictionary, the capacity is automatically increased as required by reallocating the internal array.
If the size of the collection can be estimated, specifying the initial capacity eliminates the need to perform a number of resizing operations while adding elements to the Dictionary.
Each dictionary entry holds two 4-byte integers: 8 bytes total. 8 bytes * 6 million entries is only about 48 MB, plus or minus some space for object overhead, alignment, etc. There's plenty of space in memory for this. .Net provides virtual address space of up to 2 GB per process. 48 MB or so shouldn't cause a problem.
I expect what's actually happening here is related to how the dictionary auto-expands and how the garbage collector handles (or doesn't handle) compaction.
First, the auto-expanding part. Last time I checked (back around .Net 2.0*), collections in .Net tended to use arrays internally. They would allocate a reasonably-sized array in the collection constructor (say, 10 items), and then use a doubling algorithm to create additional space whenever the array filled up. All the existing items have to be copied to the new array, but then the old array can be garbage collected. The garbage collector is pretty reliable about this, and so it means you're left using space for at most 2n - 1 items in the collection.
Now the garbage collector compaction part. After a certain size, these arrays end up in a section of memory called the Large Object Heap. Garbage collection still works here (though less often). What doesn't work well here is compaction (think memory defragmentation). The physical memory used by the old object will be released, returned to the operating system, and made available for other processes. However, the virtual address space in your process... the table that maps program memory offsets to physical memory addresses... will still have the (empty) space reserved.
This is important, because remember: we're working with a rapidly growing object. It's possible for such an object to take up address space far larger than the final size of the object itself. An object grows enough, fast enough, and suddenly you get an OutOfMemoryException, even though your app isn't really using all that much RAM.
The first solution here is to allocate enough space in the initial collection for all of your data. This allows you to skip all those re-allocations and copies. Your data will live in a single array, and use only the space you actually asked for. Most collections, including the Dictionary, have a constructor overload that lets you specify the number of items the first array should hold. Be careful here: you don't need to allocate an item for every pixel in your image. There will be a lot of repeated colors. You only need to allocate enough to have space for each distinct color in your image. If it's only large images that give you problems, and you're almost handling them at six million records, you might find that 8 million is plenty.
My next suggestion is to group your pixel colors. A human can't tell, and doesn't care, if two colors are just one bit apart in any of the RGB components. You might go as far as to look at the separate RGB values for each pixel and normalize the pixel so that you only care about changes of more than 5 or so in an R, G, or B value. That would get you from 16.5 million potential colors all the way down to only about 132,000, and the data will likely be more useful, too. That might look something like this:
// 51 levels per channel after quantizing: 51^3 = 132,651 buckets
var colorCounts = new Dictionary<Color, int>(132651);
foreach (Color c in GetImagePixels().Select(p => Color.FromArgb((p.R / 5) * 5, (p.G / 5) * 5, (p.B / 5) * 5)))
{
    // the Dictionary indexer throws for a missing key, so look up first
    int n;
    colorCounts.TryGetValue(c, out n);
    colorCounts[c] = n + 1;
}
* IIRC, somewhere in a recent or upcoming version of .Net both of these issues are being addressed: one by allowing you to force compaction of the LOH, and the other by using a set of arrays for collection backing stores, rather than trying to keep everything in one big array.
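For reference, the LOH compaction switch that eventually shipped (in .NET 4.5.1) looks like this; a small sketch:

using System.Runtime;

// Ask the GC to compact the Large Object Heap on the next blocking
// full collection (available since .NET 4.5.1).
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();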
The maximum size limit provided by the CLR is 2 GB:
When you run a 64-bit managed application on a 64-bit Windows operating system, you can create an object of no more than 2 gigabytes (GB).
You may be better off using an array.
You may also check BigArray<T>, getting around the 2GB array size limit.
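The idea behind such a BigArray<T> is chunking: present many ordinary arrays behind a single long index, so no one allocation approaches the 2 GB object cap. A minimal sketch (the block size and names are illustrative; real implementations add bounds checks and tuned block sizes):

// Many small arrays exposed as one big indexable sequence
public class BigArray<T>
{
    private const int BlockSize = 1 << 20; // 1M elements per block
    private readonly T[][] _blocks;

    public BigArray(long size)
    {
        long blockCount = (size + BlockSize - 1) / BlockSize;
        _blocks = new T[blockCount][];
        for (long i = 0; i < blockCount; i++)
            _blocks[i] = new T[BlockSize];
        Length = size;
    }

    public long Length { get; private set; }

    public T this[long index]
    {
        get { return _blocks[index / BlockSize][index % BlockSize]; }
        set { _blocks[index / BlockSize][index % BlockSize] = value; }
    }
}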
In the 32 bit runtime, the maximum number of items you can have in a Dictionary<int, int> is in the neighborhood of 61.7 million. See my old article for more info.
If you're running in 32 bit mode, then your entire application plus whatever bits of ASP.NET and the underlying machinery is required all have to fit within the memory available to your process: normally 2 GB in the 32-bit runtime.
By the way, a really wacky way to solve your problem (but one I wouldn't recommend unless you're really hurting for memory) would be the following, roughly sketched after the list (assuming a 24-bit image):
Call LockBits to get a pointer to the raw image data
Compress the per-scan-line padding by moving the data for each scan line to fill the previous row's padding. You end up with an array of 3-byte values followed by a bunch of empty space (to equal the padding).
Sort the image data. That is, sort the 3-byte values. You'd have to write a custom sort, but it wouldn't be too bad.
Go sequentially through the array and count the number of unique values.
Allocate a 2-dimensional array: int[count,2] to hold the values and their occurrence counts.
Go sequentially through the array again to count occurrences of each unique value and populate the counts array.
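A rough sketch of a simplified variant of those steps; it copies the pixel data out with LockBits instead of compacting it in place, so it trades some of the memory savings for simplicity (b is the question's Bitmap; System.Drawing.Imaging and System.Runtime.InteropServices are assumed to be imported):

Rectangle rect = new Rectangle(0, 0, b.Width, b.Height);
BitmapData data = b.LockBits(rect, ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);
int[] pixels = new int[b.Width * b.Height];
byte[] row = new byte[b.Width * 3];
for (int y = 0; y < b.Height; y++)
{
    // Stride includes the per-scan-line padding, so copy row by row
    Marshal.Copy(data.Scan0 + y * data.Stride, row, 0, row.Length);
    for (int x = 0; x < b.Width; x++)
        pixels[y * b.Width + x] = row[x * 3] | (row[x * 3 + 1] << 8) | (row[x * 3 + 2] << 16);
}
b.UnlockBits(data);

// Sorting groups identical colors together, so counting occurrences
// is a single sequential pass over the sorted values.
Array.Sort(pixels);
int uniqueColors = 0;
for (int i = 0; i < pixels.Length; i++)
    if (i == 0 || pixels[i] != pixels[i - 1])
        uniqueColors++;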
I wouldn't honestly suggest using this method. Just got a little laugh when I thought of it.
Try using an array instead. I doubt it will run out of memory. 6 million int array elements is not a big deal.
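For 24-bit color, that array is straightforward; a minimal sketch using the question's Bitmap b (the counts array alone is 2^24 ints, i.e. 64 MB):

// one counter per possible 24-bit color
int[] counts = new int[1 << 24];
for (int i = 0; i < b.Width; i++)
{
    for (int j = 0; j < b.Height; j++)
    {
        // mask off the alpha byte so the value can index the array
        counts[b.GetPixel(i, j).ToArgb() & 0xFFFFFF]++;
    }
}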

Extract a vector from a two dimensional array efficiently in C#

I have a very large two dimensional array and I need to compute vector operations on this array. NTerms and NDocs are both very large integers.
var myMat = new double[NTerms, NDocs];
I need to to extract vector columns from this matrix. Currently, I'm using for loops.
int col = 100;
for (int i = 0; i < NTerms; i++)
{
    myVec[i] = myMat[i, col];
}
This operation is very slow. In Matlab I can extract the vector without the need for iteration, like so:
myVec = myMat(:, col);
Is there any way to do this in C#?
There are no constructs in C# that let you work with arrays as in Matlab. With the code you already have, you can speed up the vector creation using the Task Parallel Library, which was introduced in .NET Framework 4.0.
Parallel.For(0, NTerms, i => myVec[i] = myMat[i, col]);
If your CPU has more than one core, you will get some improvement in performance; otherwise there will be no effect.
For more examples of how the Task Parallel Library can be used with matrices and arrays, you can refer to the MSDN article Matrix Decomposition.
But I doubt that C# is a good choice when it comes to some serious math calculations.
Some possible problems:
Could it be the way that elements are accessed for multi-dimensional arrays in C#? See this earlier article.
Another problem may be that you are accessing non-contiguous memory - so not much help from cache, and maybe you're even having to fetch from virtual memory (disk) if the array is very large.
What happens to your speed when you access a whole row at a time, instead of a column? If that's significantly faster, you can be 90% sure it's a contiguous-memory issue...
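Following on from that: if you can store the data transposed so that each vector you need is a row, the whole vector can be pulled out in one call, because the rows of a rectangular array are contiguous. A minimal sketch (the row variable is illustrative; Buffer.BlockCopy takes byte offsets as ints, so this assumes the matrix is under 2 GB):

// Copy row `row` of myMat into myVec in one bulk, memmove-style copy.
double[] myVec = new double[NDocs];
Buffer.BlockCopy(myMat, row * NDocs * sizeof(double), myVec, 0, NDocs * sizeof(double));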

Copying part of a multidimensional array into a smaller one

I have two multi-dimensional arrays declared like this:
bool?[,] biggie = new bool?[500, 500];
bool?[,] small = new bool?[100, 100];
I want to copy part of the biggie one into the small one. Let’s say I want indexes 100 to 199 horizontally and 100 to 199 vertically.
I have written a simple for statement that goes like this:
for (int x = 0; x < 100; x++)
{
    for (int y = 0; y < 100; y++)
    {
        small[x, y] = biggie[x + 100, y + 100];
    }
}
I do this A LOT in my code, and this has proven to be a major performance jammer.
Array.Copy only copies single-dimensional arrays; with multi-dimensional arrays it just treats the whole matrix as a single array, putting each row after the end of the previous one, which won’t let me cut a square out of the middle of my array.
Is there a more efficient way to do this?
P.S.: I do consider refactoring my code so as not to do this at all, and doing whatever I want to do with the bigger array. Copying matrices just can’t be painless; the point is that I have already stumbled upon this before, looked for an answer, and got none.
In my experience, there are two ways to do this efficiently:
Use unsafe code and work directly with pointers.
Convert the 2D array to a 1D array and do the necessary arithmetic when you need to access it as a 2D array.
The first approach is ugly and relies on potentially invalid assumptions, since 2D arrays are not guaranteed to be laid out contiguously in memory. The advantage of the first approach is that you don't have to change the code that already uses 2D arrays. The second approach is as efficient as the first, doesn't make invalid assumptions, but does require updating your code.
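A minimal sketch of the second approach (sizes and names are illustrative): keep the bool? grid in one flat array and copy the sub-square row by row with Array.Copy, which is one bulk copy per row instead of 100 element-by-element assignments:

// Flat storage: element (x, y) lives at index x * side + y.
int bigSize = 500, smallSize = 100, offset = 100;
bool?[] biggie = new bool?[bigSize * bigSize];
bool?[] small = new bool?[smallSize * smallSize];

// Each row segment of the sub-square is contiguous, so copy it in one call.
for (int x = 0; x < smallSize; x++)
{
    Array.Copy(biggie, (x + offset) * bigSize + offset,
               small, x * smallSize, smallSize);
}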

Is the size of an array constrained by the upper limit of int (2147483647)?

I'm doing some Project Euler exercises and I've run into a scenario where I want arrays which are larger than 2,147,483,647 (the upper limit of int in C#).
Sure these are large arrays, but for instance, I can't do this
// fails
bool[] BigArray = new bool[2147483648];
// also fails, cannot convert uint to int
ArrayList BigArrayList = new ArrayList(2147483648);
So, can I have bigger arrays?
EDIT:
It was for a Sieve of Atkin, you know, so I just wanted a really big one :D
Any time you are working with an array this big, you should probably try to find a better solution to the problem. That being said, I'll still attempt to answer your question.
As mentioned in this article, there is a 2 GB limit on any single object in .NET. This applies to x86, x64 and IA64 alike.
As with 32-bit Windows operating systems, there is a 2GB limit on the size of an object you can create while running a 64-bit managed application on a 64-bit Windows operating system.
Also, if you define an array that is too big on the stack, you will get a stack overflow. If you define the array on the heap, the runtime will try to allocate it all in one big contiguous block. It would be better to use an ArrayList, which has implicit dynamic allocation on the heap. This will not let you get past the 2GB limit, but will probably allow you to get closer to it.
I think the stack size limit will be bigger only if you are using an x64 or IA64 architecture and operating system. Using x64 or IA64 you will have 64-bit allocatable memory instead of 32-bit.
If you are not able to allocate the array list all at once, you can probably allocate it in parts.
Using an array list and adding one object at a time on an x64 Windows 2008 machine with 6GB of RAM, the largest size I can get the ArrayList to is 134,217,728. So I really think you have to find a better solution to your problem that does not use as much memory. Perhaps writing to a file instead of using RAM.
The array limit is, AFAIK, fixed as Int32 even on 64-bit: there is a cap on the maximum size of a single object. However, you could have a nice big jagged array quite easily.
Worse: because references are larger on x64, for ref-type arrays you actually get fewer elements in a single array.
See here:
I’ve received a number of queries as to why the 64-bit version of the 2.0 .Net runtime still has array maximum sizes limited to 2GB. Given that it seems to be a hot topic of late I figured a little background and a discussion of the options to get around this limitation was in order.
First some background; in the 2.0 version of the .Net runtime (CLR) we made a conscious design decision to keep the maximum object size allowed in the GC Heap at 2GB, even on the 64-bit version of the runtime. This is the same as the current 1.1 implementation of the 32-bit CLR, however you would be hard pressed to actually manage to allocate a 2GB object on the 32-bit CLR because the virtual address space is simply too fragmented to realistically find a 2GB hole. Generally people aren’t particularly concerned with creating types that would be >2GB when instantiated (or anywhere close), however since arrays are just a special kind of managed type which are created within the managed heap they also suffer from this limitation.
It should be noted that in .NET 4.5 the memory size limit can optionally be removed with the gcAllowVeryLargeObjects flag; however, this doesn't change the maximum dimension size. The key point is that if you have arrays of a custom type, or multi-dimensional arrays, you can now go beyond 2GB in memory size.
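For completeness, the flag goes in the application's configuration file and only takes effect in a 64-bit process:

<configuration>
  <runtime>
    <!-- lifts the 2 GB total-size limit; the per-dimension element count limits still apply -->
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>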
You don't need an array that large at all.
When your method runs into resource problems, don't just look at how to expand the resources, look at the method also. :)
Here's a class that uses a 3 MB buffer to calculate primes using the sieve of Eratosthenes. The class keeps track of how far you have calculated primes, and when the range needs to be expanded it creates a buffer to test another 3 million numbers.
It keeps the found prime numbers in a list, and when the range is expanded the previous primes are used to rule out numbers in the buffer.
I did some testing, and a buffer around 3 MB is most efficient.
public class Primes {
    private const int _blockSize = 3000000;
    private List<long> _primes;
    private long _next;

    public Primes() {
        _primes = new List<long>() { 2, 3, 5, 7, 11, 13, 17, 19 };
        _next = 23;
    }

    private void Expand() {
        bool[] sieve = new bool[_blockSize];
        foreach (long prime in _primes) {
            for (long i = ((_next + prime - 1L) / prime) * prime - _next;
                 i < _blockSize; i += prime) {
                sieve[i] = true;
            }
        }
        for (int i = 0; i < _blockSize; i++) {
            if (!sieve[i]) {
                _primes.Add(_next);
                for (long j = i + _next; j < _blockSize; j += _next) {
                    sieve[j] = true;
                }
            }
            _next++;
        }
    }

    public long this[int index] {
        get {
            if (index < 0) throw new IndexOutOfRangeException();
            while (index >= _primes.Count) {
                Expand();
            }
            return _primes[index];
        }
    }

    public bool IsPrime(long number) {
        while (_primes[_primes.Count - 1] < number) {
            Expand();
        }
        return _primes.BinarySearch(number) >= 0;
    }
}
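Usage is then just indexing or membership testing; a small example (the indexer is zero-based):

var primes = new Primes();
Console.WriteLine(primes[99]);          // 541, the 100th prime
Console.WriteLine(primes.IsPrime(997)); // True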
I believe that even within a 64-bit CLR, there's a limit of 2GB (or possibly 1GB, I can't remember exactly) per object. That would prevent you from creating a larger array. The fact that Array.CreateInstance only takes Int32 arguments for sizes is suggestive too.
On a broader note, I suspect that if you need arrays that large you should really change how you're approaching the problem.
I'm very much a newbie with C# (i.e., learning it this week), so I'm not sure of the exact details of how ArrayList is implemented. However, I would guess that as you haven't defined a type for the ArrayList example, the array would be allocated as an array of object references. This might well mean that you are actually allocating 4-8 GB of memory, depending on the architecture.
According to MSDN, the index of an array of bytes cannot be greater than 2147483591. In .NET prior to 4.5 this was also the memory limit for an array. In .NET 4.5 this maximum is the same for byte arrays, but for other element types it can be up to 2146435071 elements.
This is the code for illustration:
// requires: using System.Runtime.InteropServices; (for Marshal)
static void Main(string[] args)
{
    // -----------------------------------------------
    // Pre .NET 4.5, or gcAllowVeryLargeObjects unset
    const int twoGig = 2147483591;   // magic number from .NET
    var type = typeof(int);          // type to use
    var size = Marshal.SizeOf(type); // type size
    var num = twoGig / size;         // max element count
    var arr20 = Array.CreateInstance(type, num);
    var arr21 = new byte[num];

    // -----------------------------------------------
    // .NET 4.5 with x64 and gcAllowVeryLargeObjects set
    var arr451 = new byte[2147483591];
    var arr452 = Array.CreateInstance(typeof(int), 2146435071);
    var arr453 = new byte[2146435071]; // another magic number

    return;
}
