I want to create an array of 10^9 elements, where each element is an integer of the same size. I always get an OutOfMemoryException at the initialization line. How can I achieve this?
If this is not possible, what alternative strategies would you suggest?
Arrays are limited to 2GB in .NET 4.0 and earlier, even in a 64-bit process. So with one billion elements, the maximum supported element size is two bytes, but an int is four bytes. This will not work.
If you want a larger collection, you need to write it yourself, backed by multiple arrays.
In .NET 4.5 it's possible to avoid this limitation; see Jon Skeet's answer for details.
Assuming you mean int as the element type, you can do this using .NET 4.5, if you're using a 64-bit CLR.
You need to use the <gcAllowVeryLargeObjects> configuration setting. This is not on by default.
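For reference, enabling it is a one-line addition to the application's configuration file; the element lives under `<runtime>`:

```xml
<configuration>
  <runtime>
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>
```

Note that the setting only takes effect in a 64-bit process.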
If you're using an older CLR or you're on a 32-bit machine, you're out of luck. Of course, if you're on a 64-bit machine but just an old version of the CLR, you could encapsulate your "one large array" into a separate object which has a list of smaller arrays... you could even implement IList<int> that way so that most code wouldn't need to know that you weren't really using a single array.
(As noted in comments, you'll still only be able to create an array with 2^31 elements; but your requirement of 10^9 is well within this.)
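A minimal sketch of the "list of smaller arrays" idea; the class name and chunk size are my own choices, not from any library:

```csharp
using System;

// Hypothetical chunked array: spreads int elements across many small arrays
// so that no single object ever approaches the 2GB limit.
public class ChunkedIntArray
{
    private const int ChunkSize = 1 << 20; // 1M elements (4MB) per chunk
    private readonly int[][] chunks;

    public ChunkedIntArray(long length)
    {
        chunks = new int[(length + ChunkSize - 1) / ChunkSize][];
        for (long i = 0; i < chunks.Length; i++)
            chunks[i] = new int[Math.Min(ChunkSize, length - i * ChunkSize)];
    }

    public int this[long index]
    {
        get { return chunks[index / ChunkSize][index % ChunkSize]; }
        set { chunks[index / ChunkSize][index % ChunkSize] = value; }
    }
}
```

Wrapping this in an `IList<int>` implementation, as suggested above, would let most calling code stay unaware that it isn't a single array.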
I think you shouldn't load all this data into memory. Store it in a file, and write a class that works like an array but actually reads and writes its data from and to that file.
This is the general idea (you'll still need to handle buffering, bounds checking, and disposing the stream):
public class FileArray
{
    readonly Stream s;
    public FileArray(Stream stream) { s = stream; }

    public int this[long index]
    {
        get { s.Position = index * 4L; var buf = new byte[4]; s.Read(buf, 0, 4); return BitConverter.ToInt32(buf, 0); }
        set { s.Position = index * 4L; s.Write(BitConverter.GetBytes(value), 0, 4); }
    }
}
That way you have something that works just like an array, but with the data stored on your hard drive.
Related
I am writing a .NET application running on Windows Server 2016 that does an HTTP GET on a bunch of pieces of a large file. This dramatically speeds up the download process, since you can download them in parallel. Unfortunately, once they are downloaded, it takes a fairly long time to piece them all back together.
There are between 2-4k files that need to be combined. The server this will run on has PLENTY of memory, close to 800GB. I thought it would make sense to use MemoryStreams to store the downloaded pieces until they can be sequentially written to disk, BUT I am only able to consume about 2.5GB of memory before I get a System.OutOfMemoryException. The server has hundreds of GB available, and I can't figure out how to use them.
MemoryStreams are built around byte arrays. Arrays cannot be larger than 2GB currently.
The current implementation of System.Array uses Int32 for all its internal counters etc, so the theoretical maximum number of elements is Int32.MaxValue.
There's also a 2GB max-size-per-object limit imposed by the Microsoft CLR.
As you try to put the content in a single MemoryStream the underlying array gets too large, hence the exception.
Try to store the pieces separately, and write them directly to the FileStream (or whatever you use) when ready, without first trying to concatenate them all into 1 object.
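A sketch of that approach, assuming you know each piece's offset in the final file (the method name and parameters are illustrative):

```csharp
using System.IO;

// Illustrative: write each downloaded piece straight to its offset in the
// destination file, so no piece is ever concatenated in memory first.
static void WritePiece(string destinationPath, long offset, byte[] piece)
{
    using (var fs = new FileStream(destinationPath, FileMode.OpenOrCreate,
                                   FileAccess.Write, FileShare.Write))
    {
        fs.Position = offset;
        fs.Write(piece, 0, piece.Length);
    }
}
```

Each downloader task can call this as soon as its piece arrives; no single object grows with the total file size.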
According to the source code of the MemoryStream class, you will not be able to store more than 2 GB of data in one instance of this class.
The reason is that the maximum length of the stream is set to Int32.MaxValue, and the maximum index of an array is set to 0x7FFFFFC7, which is 2,147,483,591 decimal (≈ 2 GB).
Snippet MemoryStream
private const int MemStreamMaxLength = Int32.MaxValue;
Snippet array
// We impose limits on maximum array length in each dimension to allow efficient
// implementation of advanced range check elimination in future.
// Keep in sync with vm\gcscan.cpp and HashHelpers.MaxPrimeArrayLength.
// The constants are defined in this method: inline SIZE_T MaxArrayLength(SIZE_T componentSize) from gcscan
// We have different max sizes for arrays with elements of size 1 for backwards compatibility
internal const int MaxArrayLength = 0X7FEFFFFF;
internal const int MaxByteArrayLength = 0x7FFFFFC7;
The question More than 2GB of managed memory was discussed a long time ago on the Microsoft forum, and it references a blog article about BigArray, getting around the 2GB array size limit.
Update
I suggest using the following code, which should be able to allocate more than 4 GB on an x64 build but will fail before reaching 4 GB on an x86 build:
private static void Main(string[] args)
{
    List<byte[]> data = new List<byte[]>();
    Random random = new Random();
    while (true)
    {
        try
        {
            var tmpArray = new byte[1024 * 1024];
            random.NextBytes(tmpArray);
            data.Add(tmpArray);
            Console.WriteLine($"{data.Count} MB allocated");
        }
        catch (OutOfMemoryException)
        {
            Console.WriteLine("Further allocation failed.");
            break;
        }
    }
}
As has already been pointed out, the main problem here is the nature of MemoryStream being backed by a byte[], which has fixed upper size.
The option of using an alternative Stream implementation has been noted. Another alternative is to look into "pipelines", the newer IO API (System.IO.Pipelines). A pipeline is based around discontiguous memory, which means it isn't required to use a single contiguous buffer; the pipelines library will allocate multiple slabs as needed, which your code can process. I have written extensively on this topic; part 1 is here. Part 3 probably has the most code focus.
Just to confirm that I understand your question: you're downloading a single very large file in multiple parallel chunks and you know how big the final file is? If you don't then this does get a bit more complicated but it can still be done.
The best option is probably to use a MemoryMappedFile (MMF). What you'll do is to create the destination file via MMF. Each thread will create a view accessor to that file and write to it in parallel. At the end, close the MMF. This essentially gives you the behavior that you wanted with MemoryStreams but Windows backs the file by disk. One of the benefits to this approach is that Windows manages storing the data to disk in the background (flushing) so you don't have to, and should result in excellent performance.
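A rough sketch of that MMF approach (file name, sizes, and the helper method are placeholders of my own, not a prescribed API):

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;

// Illustrative: each worker writes its piece through a view accessor into a
// destination file that was created up front at its final size.
static void WritePieceViaMmf(MemoryMappedFile mmf, long offset, byte[] piece)
{
    using (var view = mmf.CreateViewAccessor(offset, piece.Length))
    {
        view.WriteArray(0, piece, 0, piece.Length);
    }
}

// Setup (done once); Windows flushes the data to disk in the background:
// using (var mmf = MemoryMappedFile.CreateFromFile(
//     "result.bin", FileMode.Create, null, totalFileSize))
// { ... call WritePieceViaMmf from each download task ... }
```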
All, I have the following Append which I am performing when producing a single line for a fixed-width text file:
formattedLine.Append(this.reversePadding ?
strData.PadLeft(this.maximumLength) :
strData.PadRight(this.maximumLength));
This particular exception happens in the PadLeft() call, where this.maximumLength = 1,073,741,823 [the field length of an NVARCHAR(MAX) column gathered from SQL Server]. formattedLine = "101102AA-1" at the time of the exception, so why is this happening? I thought the maximum allowed length was 2,147,483,647.
I am wondering if https://stackoverflow.com/a/1769472/626442 be the answer here - however, I am managing any memory with the appropriate Dispose() calls on any disposable objects and using block where possible.
Note. This fixed text export is being done on a background thread.
Thanks for your time.
This particular exception happens on the PadLeft() where this.maximumLength = 1,073,741,823
Right. So you're trying to create a string with over a billion characters in.
That's not going to work, and I very much doubt that it's what you really want to do.
Note that each char in .NET is two bytes, and also strings in .NET are null-terminated... and have some other fields beyond the data (the length, for one). That means you'd need at least 2147483652 bytes + object overhead, which pushes you over the 2GB-per-object limit.
If you're running on a 64-bit version of Windows, in .NET 4.5, there's a special app.config setting of <gcAllowVeryLargeObjects> that allows arrays bigger than 2GB. However, I don't believe that will change your particular use case:
Using this element in your application configuration file enables arrays that are larger than 2 GB in size, but does not change other limits on object size or array size:
The maximum number of elements in an array is UInt32.MaxValue.
The maximum index in any single dimension is 2,147,483,591 (0x7FFFFFC7) for byte arrays and arrays of single-byte structures, and 2,146,435,071 (0X7FEFFFFF) for other types.
The maximum size for strings and other non-array objects is unchanged.
What would you want to do with such a string after creating it, anyway?
In order to allocate memory for this operation, the OS must find contiguous memory that is large enough to perform the operation.
Memory fragmentation can cause that to be impossible, especially when using a 32-bit .NET implementation.
I think there might be a better approach to what you are trying to accomplish. Presumably, this StringBuilder is going to be written to a file (that's what it sounds like from your description), and apparently, you are also potentially dealing with large (huge) database records.
You might consider a streaming approach, that wont require allocating such a huge block of memory.
To accomplish this you might investigate the following:
The SqlDataReader class exposes a GetChars() method, that allows you to read a chunk of a single large record.
Then, instead of using a StringBuilder, perhaps using a StreamWriter ( or some other TextWriter derived class) to write each chunk to the output.
This will only require having one buffer-full of the record in your application's memory space at one time. Good luck!
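A sketch of that chunked read/write loop; the column ordinal, buffer size, and method name are made up for illustration:

```csharp
using System.Data;
using System.IO;

// Illustrative: stream one large text column to a TextWriter in 8K-char
// chunks, so only one small buffer is ever held in memory.
// Assumes the reader was opened with CommandBehavior.SequentialAccess and
// is positioned on a row; column ordinal 0 is the large text field.
static void StreamColumnToWriter(IDataReader reader, TextWriter writer)
{
    char[] buffer = new char[8192];
    long dataIndex = 0;
    long charsRead;
    while ((charsRead = reader.GetChars(0, dataIndex, buffer, 0, buffer.Length)) > 0)
    {
        writer.Write(buffer, 0, (int)charsRead);
        dataIndex += charsRead;
    }
}
```

Pair this with a `StreamWriter` over the output file and the billion-character string never needs to exist as a single object.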
I'm trying to optimize some code where I have a large number of arrays containing structs of different sizes, all based on the same interface. In certain cases the structs are larger and hold more data, other times they are small, and other times I would prefer to keep null as a value to save memory.
My first question is: is it a good idea to do something like this? I previously had an array of my full data struct, but when I tested mixing the sizes up I could save a lot of memory. Are there any other downsides?
I've been trying out different things, and it seems to work quite well when making an array of a common interface, but I'm not sure I'm checking the size of the array correctly.
I've simplified the example quite a bit. Here I'm adding different structs to an array, but I'm unable to determine the size using the traditional Marshal.SizeOf method. Would it be correct to simply iterate through the collection and sum the size of each value in it?
IComparable[] myCollection = new IComparable[1000];
myCollection[0] = null;
myCollection[1] = (int)1;
myCollection[2] = "helloo world";
myCollection[3] = long.MaxValue;
System.Runtime.InteropServices.Marshal.SizeOf(myCollection);
The last line will throw this exception:
Type 'System.IComparable[]' cannot be marshaled as an unmanaged structure; no meaningful size or offset can be computed.
Excuse the long post:
Is this an optimal and usable solution?
How can I determine the size of my array?
I may be wrong, but it looks to me like your IComparable[] array is a managed array. If so, you can use this code to get the length:
int arrayLength = myCollection.Length;
If you are doing platform interop between C# and C++, then the answer to your question headline "Can I find the length of an unmanaged array" is no, it's not possible. Function signatures with arrays in C/C++ tend to follow this pattern:
void doSomeWorkOnArrayUnmanaged(int * myUnmanagedArray, int length)
{
// Do work ...
}
In .NET the array itself is a type which carries some basic information, such as its size and its runtime type. Therefore we can use this:
void DoSomeWorkOnManagedArray(int [] myManagedArray)
{
int length = myManagedArray.Length;
// Do work ...
}
Whenever using platform invoke to interop between C# and C++ you will need to pass the length of the array to the receiving function, as well as pin the array (but that's a different topic).
Does this answer your question? If not, please clarify.
Optimality always depends on your requirements. If you really need to store many elements of different classes/structs, your solution is completely viable.
However, I guess your expectations on the data structure might be misleading: Array elements are per definition all of the same size. This is even true in your case: Your array doesn't store the elements themselves but references (pointers) to them. The elements are allocated somewhere on the VM heap. So your data structure actually goes like this: It is an array of 1000 pointers, each pointer pointing to some data. The size of each particular element may of course vary.
This leads to the next question: the size of your array. What are you intending to do with the size? Do you need to know how many bytes to allocate when you serialize your data to some persistent storage? That depends on the serialization format. Or do you just need a rough estimate of how much memory your structure is consuming? In the latter case you need to consider the array itself plus the size of each particular element. The array in your example consumes approximately 1000 times the size of a reference (4 bytes on a 32-bit machine, 8 bytes on a 64-bit machine). To compute the sizes of the elements, you can indeed iterate over the array and sum up the size of each one. Be aware that this is only an estimate: the virtual machine adds memory-management overhead which is hard to determine exactly.
Today I noticed that C#'s String class returns the length of a string as an Int. Since an Int is always 32-bits, no matter what the architecture, does this mean that a string can only be 2GB or less in length?
A 2GB string would be very unusual and would present many problems along with it. However, most .NET APIs seem to use int to convey values such as length and count. Does this mean we are forever limited to collection sizes which fit in 32 bits?
Seems like a fundamental problem with the .NET APIs. I would have expected things like count and length to be returned via the equivalent of size_t.
Seems like a fundamental problem with
the .NET API...
I don't know if I'd go that far.
Consider almost any collection class in .NET. Chances are it has a Count property that returns an int. So this suggests the class is bounded at a size of int.MaxValue (2147483647). That's not really a problem; it's a limitation -- and a perfectly reasonable one, in the vast majority of scenarios.
Anyway, what would the alternative be? There's uint -- but that's not CLS-compliant. Then there's long...
What if Length returned a long?
An additional 32 bits of memory would be required anywhere you wanted to know the length of a string.
The benefit would be: we could have strings taking up billions of gigabytes of RAM. Hooray.
Try to imagine the mind-boggling cost of some code like this:
// Lord knows how many characters
string ulysses = GetUlyssesText();
// allocate an entirely new string of roughly equivalent size
string schmulysses = ulysses.Replace("Ulysses", "Schmulysses");
Basically, if you're thinking of string as a data structure meant to store an unlimited quantity of text, you've got unrealistic expectations. When it comes to objects of this size, it becomes questionable whether you have any need to hold them in memory at all (as opposed to hard disk).
Correct, the maximum length would be the size of Int32; however, you'll likely run into other memory issues when dealing with strings that large anyway.
At some value of String.Length, probably around 5MB, it's not really practical to use String anymore. String is optimised for short bits of text.
Think about what happens when you do
myString += " more chars"
Something like:
System calculates length of myString plus length of " more chars"
System allocates that amount of memory
System copies myString to new memory location
System copies " more chars" to new memory location after last copied myString char
The original myString is left to the mercy of the garbage collector.
While this is nice and neat for small bits of text, it's a nightmare for large strings; just finding 2GB of contiguous memory is probably a showstopper.
So if you know you are handling more than a very few MB of characters, use the StringBuilder class.
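For comparison, a StringBuilder appends into an internal buffer and only reallocates when that buffer is exhausted, so the repeated copy-everything cost described above disappears:

```csharp
using System.Text;

// Building a large string with StringBuilder: each Append writes into an
// internal buffer instead of copying the whole string on every +=.
var sb = new StringBuilder();
for (int i = 0; i < 100000; i++)
{
    sb.Append(" more chars");
}
string result = sb.ToString(); // one final copy, not 100,000 of them
```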
It's pretty unlikely that you'll need to store more than two billion objects in a single collection. You're going to incur some pretty serious performance penalties when doing enumerations and lookups, which are the two primary purposes of collections. If you're dealing with a data set that large, there is almost assuredly some other route you can take, such as splitting up your single collection into many smaller collections that contain portions of the entire data set you're working with.
Heeeey, wait a sec.... we already have this concept -- it's called a dictionary!
If you need to store, say, 5 billion English strings, use this type:
Dictionary<string, List<string>> bigStringContainer;
Let's make the key string represent, say, the first two characters of the string. Then write an extension method like this:
public static string BigStringIndex(this string s)
{
return String.Concat(s[0], s[1]);
}
and then add items to bigStringContainer like this:
bigStringContainer[item.BigStringIndex()].Add(item);
and call it a day. (There are obviously more efficient ways you could do that, but this is just an example)
Oh, and if you really, really do need to be able to look up any arbitrary object by absolute index, use an Array instead of a collection. Okay, yeah, you lose some type safety, but you can index array elements with a long.
The fact that the framework uses Int32 for Count/Length properties, indexers etc is a bit of a red herring. The real problem is that the CLR currently has a max object size restriction of 2GB.
So a string -- or any other single object -- can never be larger than 2GB.
Changing the Length property of the string type to return long, ulong or even BigInteger would be pointless since you could never have more than approx 2^30 characters anyway (2GB max size and 2 bytes per character.)
Similarly, because of the 2GB limit, the only arrays that could even approach having 2^31 elements would be bool[] or byte[] arrays that only use 1 byte per element.
Of course, there's nothing to stop you creating your own composite types to workaround the 2GB restriction.
(Note that the above observations apply to Microsoft's current implementation, and could very well change in future releases. I'm not sure whether Mono has similar limits.)
In versions of .NET prior to 4.5, the maximum object size is 2GB. From 4.5 onwards you can allocate larger objects if gcAllowVeryLargeObjects is enabled. Note that the limit for string is not affected, but "arrays" should cover "lists" too, since lists are backed by arrays.
Even in x64 versions of Windows I got hit by .Net limiting each object to 2GB.
2GB is pretty small for a medical image. 2GB is even small for a Visual Studio download image.
If you are working with a file that is 2GB, that means you're likely going to be using a lot of RAM, and you're seeing very slow performance.
Instead, for very large files, consider using a MemoryMappedFile (see: http://msdn.microsoft.com/en-us/library/system.io.memorymappedfiles.memorymappedfile.aspx). Using this method, you can work with a file of nearly unlimited size, without having to load the whole thing in memory.
Does anybody know what the maximum number of items in a List is?
How do I increase that size? Or is there a collection that takes infinitely many items (as many as will fit in memory, that is)?
EDIT:
I get an out-of-memory exception when Count = 134217728 in a list of ints. I've got 3GB of RAM, of which 2.2 are in use. Does that sound normal?
List<T> will be limited to the max of an array, which is 2GB (even in x64). If that isn't enough, you're using the wrong type of data storage. You can save a lot of overhead by starting it the right size, though - by passing an int to the constructor.
Re your edit - with 134217728 x Int32, that is 512MB. Remember that List<T> uses a doubling algorithm; if you are drip-feeding items via Add (without allocating all the space first) it is going to try to double to 1GB (on top of the 512MB you're already holding, the rest of your app, and of course the CLR runtime and libraries). I'm assuming you're on x86, so you already have a 2GB limit per process, and it is likely that you have fragmented your "large object heap" to death while adding items.
Personally, yes, it sounds about right to start getting an out-of-memory at this point.
Edit: in .NET 4.5, arrays larger than 2GB are allowed if the <gcAllowVeryLargeObjects> switch is enabled. The limit then is 2^31 items. This might be useful for arrays of references (8 bytes each in x64), or an array of large structs.
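A small illustration of the capacity point from above: pre-sizing the list allocates the backing array once, avoiding every intermediate doubling (and, for the question's 134,217,728 ints, the ~1GB temporary array):

```csharp
using System.Collections.Generic;

// For the question's list you'd pass 134217728 here; a smaller number is
// used so the example runs quickly, but the idea is identical at any size.
int capacity = 1 << 20;
var list = new List<int>(capacity);
for (int i = 0; i < capacity; i++)
{
    list.Add(i); // never triggers a reallocation until Count > capacity
}
```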
The List limit is 2.1 billion objects or the size of your memory, whichever is hit first.
It's limited only by memory.
edit: or not - 2GB is the limit! This is quite interesting: BigArray, getting around the 2GB array size limit
On a x64 Machine, using .Net Framework 4 (Not the Client Profile), compiling for Any CPU in Release mode, I could chew up all the available memory. My process is now 5.3GB and I've consumed all available memory (8GB) on my PC. It's actually a Server 2008 R2 x64.
I used a custom Collection class based on CollectionBase to store 61,910,847 instances of the following class:
public class AbbreviatedForDrawRecord {
public int DrawId { get; set; }
public long Member_Id { get; set; }
public string FactorySerial { get; set; }
public AbbreviatedForDrawRecord() {
}
public AbbreviatedForDrawRecord(int drawId, long memberId, string factorySerial) {
this.DrawId = drawId;
this.Member_Id = memberId;
this.FactorySerial = factorySerial;
}
}
The List will dynamically grow itself to accommodate as many items as you want to include - until you exceed available memory!
From MSDN documentation:
If Count already equals Capacity, the capacity of the List is increased by automatically reallocating the internal array, and the existing elements are copied to the new array before the new element is added.
If Count is less than Capacity, this method is an O(1) operation. If the capacity needs to be increased to accommodate the new element, this method becomes an O(n) operation, where n is Count.
The interface defines Count and IndexOf etc. as type int, so I would assume that int.MaxValue, or 2,147,483,647, is the most items you could stick in a list.
You really have to question why on earth you would need that many; there is likely to be a more sensible approach to managing the data.