I've read in a few places now that the maximum instance size for a struct should be 16 bytes.
But I cannot see where that number (16) comes from.
Browsing around the net, I've found some who suggest it's an approximate number for good performance, but Microsoft talks as if it were a hard upper limit (e.g., MSDN).
Does anyone have a definitive answer about why it is 16 bytes?
It is just a performance rule of thumb.
The point is that because value types are passed by value, the entire struct has to be copied when it is passed to a function, whereas for a reference type only the reference (4 bytes on a 32-bit system, 8 on a 64-bit one) has to be copied. A struct might still save a bit of time because you remove a layer of indirection, so even if it is larger than a reference it can be more efficient than passing a reference around. But at some point it becomes so big that the cost of copying is noticeable, and a common rule of thumb is that this typically happens around 16 bytes. 16 is chosen because it's a nice round number, a power of two, and the alternatives are either 8 (too small, which would make structs almost useless) or 32 (at which point the cost of copying the struct is already problematic if you're using structs for performance reasons).
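To make the copying cost concrete, here is a minimal sketch; Vec3 and the method names are invented for illustration, and the 'in' parameter requires C# 7.2 or later:

struct Vec3 { public double X, Y, Z; }   // 24 bytes: above the 16-byte rule of thumb

static class CopyDemo
{
    // The full 24-byte struct is copied on every call.
    public static double SumByValue(Vec3 v) => v.X + v.Y + v.Z;

    // Only a reference-sized address is passed ('in' is C# 7.2+).
    public static double SumByRef(in Vec3 v) => v.X + v.Y + v.Z;
}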
But ultimately, this is performance advice. It answers the question of "which would be most efficient to use? A struct or a class?". But it doesn't answer the question of "which best maps to my problem domain".
Structs and classes behave differently. If you need a struct's behavior, then I would say to make it a struct, no matter the size. At least until you run into performance problems, profile your code, and find that your struct is a problem.
Your link even says that it is just a matter of performance:
If one or more of these conditions are not met, create a reference type instead of a structure. Failure to adhere to this guideline can negatively impact performance.
If a structure is not larger than 16 bytes, it can be copied with a few simple processor instructions. If it's larger, a loop is used to copy the structure.
As long as the structure is not larger than 16 bytes, the processor has to do about the same work when copying the structure as when copying a reference. If the structure is larger, you lose the performance benefit of having a structure, and you should generally make it a class instead.
The size figure comes largely from the amount of time it takes to copy the struct on the stack, for example to pass to a method. Anything much larger than this and you are consuming a lot of stack space and CPU cycles just copying data - when a reference to an immutable class (even with dereferencing) could be a lot more efficient.
As other answers have noted, the per-byte cost of copying a structure larger than a certain threshold (16 bytes in earlier versions of .NET, since grown to 20-24) is significantly greater than the per-byte cost of a smaller structure. It's important to note, however, that copying a structure of any particular size once will be a fraction of the cost of creating a new class object instance of that same size. If a struct would be copied many times in its lifetime, and the value-type semantics are not particularly required, a class object may be preferable. If, however, a struct would end up being copied only once or twice, such copying would likely be cheaper than the creation of a new class object. The break-even number of copies at which a class object becomes cheaper varies with the size of the struct/object in question, but is much higher for things below the "cheap copying" threshold than for things above it.
BTW, another point worth mentioning is that the cost of passing a struct as a ref parameter is independent of the size of the struct. In many cases, optimal performance may be achieved by using value types and passing them by ref. One must be careful to avoid using properties or readonly fields of structure types, however, since accessing either of those will create an implicit temporary copy of the struct in question.
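A sketch of that pitfall, with made-up type names. Calling a mutating method through a readonly field of struct type operates on a hidden defensive copy, while a ref parameter avoids copying entirely:

struct Big
{
    public long A, B, C, D;          // 32 bytes
    public void Bump() { A++; }
}

class Holder
{
    private readonly Big _big;       // readonly field of a struct type

    public void Demo()
    {
        _big.Bump();                 // compiles, but mutates a defensive copy;
                                     // _big.A is still 0 afterwards
    }

    public static void Bump(ref Big b) => b.A++;   // cost independent of struct size
}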
Here is a scenario where structs can exhibit superior performance:
When you need to create thousands of instances. If you were to use a class, you would first need to allocate the array to hold the instances and then, in a loop, allocate each instance. But if you use structs, the thousands of instances become available immediately after you allocate the array that holds them.
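A minimal sketch of the difference, using a hypothetical Point3 type; the class-based variant is shown in comments for comparison:

struct Point3 { public double X, Y, Z; }

// One allocation: a million Point3 values exist immediately, stored inline in the array.
Point3[] points = new Point3[1_000_000];

// The class-based equivalent needs one extra allocation per element:
//   Point3Class[] points = new Point3Class[1_000_000];
//   for (int i = 0; i < points.Length; i++) points[i] = new Point3Class();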
In addition, structs are extremely useful when you need to do interop or want to dip into unsafe code for performance reasons.
As always there is a trade-off and one needs to analyze what they are doing to determine the best way to implement something.
PS: This scenario came into play when I was working with LIDAR data, where there could be millions of points representing x, y, z and other attributes of ground data. This data needed to be loaded into memory for some intensive computation to output all kinds of stuff.
I think the 16 bytes is just a rule of thumb from a performance point of view. An object in .NET uses at least 24 bytes of memory (IIRC), so if you made your structure much larger than that, a reference type would be preferable.
I can't think of any reason why they chose 16 bytes specifically.
Let's start small: say I need to store a const value of 200; should I always use an unsigned byte for this?
This is just a minimal thing, I guess. But what about structs? Is it wise to build up my structs so that their size is divisible by 32 on a 32-bit system? Say I need to iterate over a very large array of structs: does it matter much whether the struct consists of 34 bits or 64? I would think it gains a lot if I could squeeze 2 bits off the 34-bit struct.
Or does all this create unnecessary overhead, and am I better off replacing all my bits and shorts with ints inside this struct so the CPU does not have to "go looking" for the right memory block?
This is very much a processor implementation detail; the CLR and the jitter already do a lot of work to ensure that your data types are laid out to get the best perf out of the program. There is, for example, never a case where a struct occupies 34 bits; the CLR's design choices already ensure that you get a running start on using types that work well on modern processors.
Structs are laid out to be optimal, and that involves alignment choices that depend on the data type. An int, for example, will always be aligned to an offset that's a multiple of 4. That gives the processor an easy time reading the int: it doesn't have to multiplex the misaligned bytes back into an int, and it avoids a scenario where the value straddles a CPU cache line and needs to be glued back together from multiple memory-bus reads. Some processors even treat misaligned reads and writes as fatal errors, one of the reasons you don't have an Itanium in your machine.
So if you have a struct that has a byte and an int, then you'll end up with a data type that takes 8 bytes but doesn't use 3 of them, the ones between the byte and the int. These unused bytes are called padding. There can also be padding at the end of a struct to ensure that alignment is still optimal when you put structs in an array.
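You can observe the padding from managed code; a sketch using Marshal.SizeOf, which reports the marshalled layout (for a simple sequential struct like this it matches the 8-byte expectation):

using System;
using System.Runtime.InteropServices;

struct ByteThenInt
{
    public byte B;    // 1 byte at offset 0
                      // 3 bytes of padding so that I lands on a multiple of 4
    public int I;     // 4 bytes at offset 4
}

class PaddingDemo
{
    static void Main() =>
        Console.WriteLine(Marshal.SizeOf(typeof(ByteThenInt)));   // prints 8
}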
Declaring a single variable as a byte is okay; Intel/AMD processors take the same amount of time to read/write one as a 32-bit int. But using short is not okay: that requires an extra byte in the CPU instruction (a size-override prefix) and can cost an extra CPU cycle. In practice you don't often save any memory because of the alignment rule. Using byte only buys you something if it can be combined with another byte. An array of bytes is fine; a struct with multiple byte members is fine. Your example is not; it works just as well when you declare it int.
Using types smaller than an int can be awkward in C# code; the MSIL code model is int-based. Basic operators like + and - are only defined for int and larger; there is no operator for smaller types. So you end up having to use a cast to truncate the result back to the smaller size. The sweet spot is int.
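A small example of the cast this forces:

short a = 1, b = 2;
// a + b is evaluated as int, so assigning back to short needs an explicit cast:
short sum = (short)(a + b);
// Compound assignment hides the cast but truncates the same way:
a += b;                      // equivalent to a = (short)(a + b)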
Wow, it really depends on a bunch of stuff. Are you concerned about performance or memory? If it's performance, you are generally better off staying with the "natural" word-size alignment. For example, if you are using a 64-bit processor, using 64-bit ints aligned on 64-bit boundaries provides the best performance. I don't think C# makes any guarantees about this sort of thing (it's meant to remain abstracted from the hardware).
That said, there is an informal rule that says "avoid the sin of premature optimization". This is particularly true in C#. If you aren't having a performance or memory issue, don't worry about it.
If you find you are having a performance problem, use a profiler to determine where the problem actually is (it might not be where you think). If it's a memory problem determine the objects consuming the most memory and determine where you can optimize (as per your example using a byte rather than an int or short, if possible).
If you really have to worry about such details you might want to consider using C++, where you can better control memory usage (for example you can allocate large blocks of memory without it being initialized), access bitfields, etc.
I am attempting to ascertain the maximum sizes (in RAM) of a List and a Dictionary. I am also curious as to the maximum number of elements / entries each can hold, and their memory footprint per entry.
My reasons are simple: I, like most programmers, am somewhat lazy (this is a virtue). When I write a program, I like to write it once and try to future-proof it as much as possible. I am currently writing a program that uses Lists, but noticed that the indexer wants an integer. Since the capabilities of my program are only limited by available memory / coding style, I'd like to write it so I can use a List with Int64s or possibly BigInts (as the indices). I've seen IEnumerable as a possibility here, but would like to find out if I can just stuff an Int64 into a Dictionary object as the key, instead of rewriting everything. If I can, I'd like to know what the cost of that might be compared to rewriting it.
My hope is that should my program prove useful, I need only hit recompile in 5 years time to take advantage of the increase in memory.
Is it specified in the documentation for the class? No? Then it's unspecified.
In terms of current implementations, there's no maximum size in RAM in the classes themselves. If you create a value type that's 2MB in size, push a few thousand into a list, and receive an out-of-memory exception, that's nothing to do with List<T>.
Internally, List<T>'s workings would prevent it from ever having more than 2 billion items. It's harder to come to a quick answer with Dictionary<TKey, TValue>, since the way things are positioned within it is more complicated, but really, if I were looking at dealing with a billion items (of a 32-bit value, for example, that's 4GB), I'd be looking to store them in a database and retrieve them using data-access code.
At the very least, once you're dealing with a single data structure that's 4GB in size, rolling your own custom collection class no longer counts as reinventing the wheel.
I am using a ConcurrentDictionary to rank 3x3 patterns in half a million games of Go. Obviously there are a lot of possible patterns. With .NET 4.0 the ConcurrentDictionary runs out of memory at around 120 million objects. It is using 8GB at that point (on a 32GB machine) but wants to grow far too much, I think (table growths happen in large chunks with ConcurrentDictionary). Using a database would slow me down at least a hundredfold, I think. And the process is already taking 10 hours.
My solution was to use a multi-phase approach, actually doing multiple passes, one for each subset of patterns: one pass for odd patterns and one for even patterns, say. When using more objects no longer fails, I can reduce the number of passes.
.NET 4.5 adds support for larger arrays on 64-bit by using unsigned 32-bit indices for arrays (the mentioned limit goes from 2 billion to 4 billion). See also http://msdn.microsoft.com/en-us/library/hh285054(v=vs.110).aspx. Not sure which objects will benefit from this; List<> might.
I think you have bigger issues to solve before even wondering if a Dictionary with an int64 key will be useful in 5 or 10 years.
Having a List or Dictionary of 2e+10 elements in memory (int32) doesn't seem to be a good idea, never mind 9e+18 elements (int64). Anyhow, the framework will never allow you to create a monster that size (not even close), and probably never will. (Keep in mind that a simple int[int.MaxValue] array already far exceeds the framework's limit for memory allocation of any given object.)
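For illustration, a sketch of that parenthetical; on default settings (no gcAllowVeryLargeObjects configured) the allocation fails rather than succeeding:

try
{
    // int.MaxValue elements * 4 bytes is roughly 8 GB of payload,
    // far beyond the default per-object allocation limit.
    int[] huge = new int[int.MaxValue];
}
catch (OutOfMemoryException)
{
    Console.WriteLine("single-object allocation limit exceeded");
}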
And the question remains: why would you ever want your application to hold in memory a list of so many items? You are better off using a specialized data-storage backend (a database) if you have to manage that amount of information.
My question is basically about how the C# compiler handles memory allocation of small datatypes. I do know that, for example, operators like add are defined on int and not on short, and thus computations will be executed as if the shorts were ints.
Assuming the following:
There's no business logic/validation logic associated with the choice of short as a datatype
We're not doing anything with unsafe code
Does using the short datatype wherever possible reduce the memory footprint of my application, and is it advisable to do so? Or is using short and the like not worth the effort, because the compiler allocates the full memory amount of an Int32 anyway and adds additional casts when doing arithmetic?
Any links on the supposed runtime performance impact would be greatly appreciated.
Related questions:
Why should I use int instead of a byte or short in C#
Integer summing blues, short += short problem
From a memory-only perspective, using short instead of int will be better. The simple reason is that a short variable needs only half the size of an int variable in memory. The CLR does not expand short to int in memory.
Nevertheless, this reduced memory consumption might, and probably will, decrease the runtime performance of your application significantly. All modern CPUs perform much better with 32-bit numbers than with 16-bit numbers. Additionally, in many cases the CLR will have to convert between short and int, e.g. when calling methods that take int arguments. There are many other performance considerations to weigh before going this way.
I would only change this in a few dedicated locations and modules of your application, and only if you really encounter measurable memory shortages.
In some cases you can of course switch from int to short easily without hurting performance. One example is a giant array of ints all of whose values also fit in a short.
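A quick sketch of where the saving shows up:

// Primitive sizes (sizeof on built-in types needs no unsafe context):
Console.WriteLine(sizeof(short));   // 2
Console.WriteLine(sizeof(int));     // 4

// For large arrays the halved element size is a real saving:
short[] smaller = new short[10_000_000];   // ~20 MB of element data
int[]   larger  = new int[10_000_000];     // ~40 MB of element data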
It makes sense in terms of memory usage only if your program has very large arrays (or collections built on arrays, like List<>) of these types, or arrays of packed structs composed of them. By "large" I mean that the total memory footprint of these arrays is a large percentage of the working set and of the available memory. As for advisability, I'd venture that using short types is inadvisable unless the data your program operates on is explicitly specified in terms of short etc., or the volume of data runs into gigabytes.
In short: yes. However, you should also pay attention to memory alignment. You may find Mastering C# structs and Memory alignment of classes in c#? useful.
Depends on what you are using the shorts for. Also, are you allocating so many variables that the memory footprint is going to matter?
If this program were going to be used on a mobile device or a device with memory limitations, then I might be concerned. However, most machines today run at least 1-2 GB of RAM and have pretty decent dual-core processors, and most mobile devices today are becoming beastly mini computers. If you're declaring so much that such a machine would start to die, then you already have a problem in your code.
However, in response to the question: it can matter on memory-limited machines. If you're declaring a lot of 4-byte variables when a 2-byte variable would do, then you should probably use the short.
If you're performing complicated calculations, square roots and such, or high-value calculations, then you should probably use variables with more bytes so you don't risk losing any data. Just declare what you need when you need it. Zero it out when you're done with it to make sure C# cleans it up, if you're worried about memory limitations.
When it comes to machine language, at the register level, I think it is better to align with the register size, as most move and arithmetic instructions operate at register boundaries. If the machine has a 32-bit register set, it is better to align to 32 bits. If the machine has 16-bit registers for I/O operations, it is good practice to align to 16 bits to reduce the number of operations when moving data.
In Java, a byte or short is stored in the JVM's "natural" word length, i.e., for the most part, 32 bits. An exception would be an array of bytes, where each byte occupies a byte of memory.
Does the CLR do the same thing?
If it does, in what situations are there exceptions to this? For example, how much memory does this struct occupy?
struct MyStruct
{
    short s1;
    short s2;
}
Although it's not really intended for this purpose, and may at times give slightly different answers (because it thinks about things from a marshalling point of view, not a CLR-internal-structures point of view), Marshal.SizeOf can give an answer:
System.Runtime.InteropServices.Marshal.SizeOf(typeof(MyStruct))
In this case, it answers 4. (i.e. the shorts are being stored as shorts). Please note that this is an implementation detail, so the answer today should not be relied upon for any purpose.
It is actually the job of the JIT compiler to assign the memory layout of classes and structures. The actual layout is not discoverable in any way (beyond looking at the generated machine code); a [StructLayout] attribute is required to marshal the object to a known layout. The JIT takes advantage of this by re-ordering fields to keep them aligned and minimize the allocation size.
There will be no surprises in the struct you quoted; the fields are already aligned on any current CPU architecture that can execute managed code. The size of value types is guaranteed by the CLI: a short always takes 16 bits. Your structure will take 32 bits.
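When a known layout is required, the [StructLayout] attribute mentioned above pins it down; a sketch with a hypothetical header struct:

using System.Runtime.InteropServices;

// LayoutKind.Sequential keeps declaration order; Pack = 1 removes alignment padding.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct PackedHeader
{
    public byte Tag;      // offset 0
    public int Length;    // offset 1; no padding because Pack = 1
}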
The CLR does to some extent pack members of the same size. It does pack arrays, and I would expect your example structure to take up four bytes on any platform.
Exactly which types are packed and how depends on the CLR implementation and the current platform. The rules are not strictly defined, so that the CLR has some freedom to rearrange the members to store them in the most efficient manner.
Since RAM seems to be the new disk, and since that statement means that access to main memory is now considered slow in much the same way disk access has always been, I do want to maximize locality of reference in memory for high-performance applications. For example, in a sorted index, I want adjacent values to be close (unlike, say, in a hashtable), and I want the data the index is pointing to close by, too.
In C, I can whip up a data structure with a specialized memory manager, like the developers of the (immensely complex) Judy array did. With direct control over the pointers, they even went so far as to encode additional information in the pointer value itself. When working in Python, Java or C#, I am deliberately one (or more) level(s) of abstraction away from this type of solution and I'm entrusting the JIT compilers and optimizing runtimes with doing clever tricks on the low levels for me.
Still, I guess, even at this high level of abstraction, there are things that can be semantically considered "closer" and therefore are likely to be actually closer at the low levels. For example, I was wondering about the following (my guess in parentheses):
Can I expect an array to be an adjacent block of memory (yes)?
Are two integers in the same instance closer than two in different instances of the same class (probably)?
Does an object occupy a contiguous region in memory (no)?
What's the difference between an array of objects with only two int fields and a single object with two int[] fields? (this example is probably Java specific)
I started wondering about these in a Java context, but my wondering has become more general, so I'd suggest to not treat this as a Java question.
In .NET, elements of an array are certainly contiguous. In Java I'd expect them to be in most implementations, but it appears not to be guaranteed.
I think it's reasonable to assume that the memory used by an instance for fields is in a single block... but don't forget that some of those fields may be references to other objects.
For the Java array part, Sun's JNI documentation includes this comment, tucked away in a discussion about strings:
For example, the Java virtual machine may not store arrays contiguously.
For your last question, if you have two int[] then each of those arrays will be a contiguous block of memory, but they could be very "far apart" in memory. If you have an array of objects with two int fields, then each object could be a long way from each other, but the two integers within each object will be close together. Potentially more importantly, you'll end up taking a lot more memory with the "lots of objects" solution due to the per-object overhead. In .NET you could use a custom struct with two integers instead, and have an array of those - that would keep all the data in one big block.
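A sketch of that last suggestion, with made-up type names:

struct Pair { public int X, Y; }
class PairClass { public int X, Y; }

// One contiguous block of 1,000,000 * 8 bytes, no per-element headers:
Pair[] packed = new Pair[1_000_000];

// An array of references; each element is a separate heap object with its own overhead:
PairClass[] scattered = new PairClass[1_000_000];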
I believe that in both Java and .NET, if you allocate a lot of smallish objects in quick succession within a single thread then those objects are likely to have good locality of reference. When the GC compacts a heap, this may improve - or it may potentially become worse, if a heap with
A B C D E
is compacted to
A D E B
(where C is collected) - suddenly A and B, which may have been "close" before, are far apart. I don't know whether this actually happens in any garbage collector (there are loads around!) but it's possible.
Basically in a managed environment you don't usually have as much control over locality of reference as you do in an unmanaged environment - you have to trust that the managed environment is sufficiently good at managing it, and that you'll have saved enough time by coding to a higher level platform to let you spend time optimising elsewhere.
First, your title implies C#. "Managed code" is a term coined by Microsoft, if I'm not mistaken.
Java primitive arrays are guaranteed to be a contiguous block of memory. If you have a
int[] array = new int[4];
you can, from JNI (native C), get an int *p pointing to the actual array. I think this goes for the Array* classes of containers as well (ArrayList, ArrayBlockingQueue, etc.).
Early implementations of the JVM had objects as contiguous structs, I think, but this cannot be assumed with newer JVMs. (JNI abstracts this away.)
Two integers in the same object will as you say probably be "closer", but they may not be. This will probably vary even using the same JVM.
An object with two int fields is an object, and I don't think any JVM guarantees that the members will be "close". An int array with two elements will very likely be backed by an 8-byte-long array.
With regard to arrays, here is an excerpt from the CLI (Common Language Infrastructure) specification:
Array elements shall be laid out within the array object in row-major order (i.e., the elements associated with the rightmost array dimension shall be laid out contiguously from lowest to highest index). The actual storage allocated for each array element can include platform-specific padding. (The size of this storage, in bytes, is returned by the sizeof instruction when it is applied to the type of that array's elements.)
Good question! I think I would resort to writing extensions in C++ that handle memory in a more carefully managed way and just exposing enough of an interface to allow the rest of the application to manipulate the objects. If I was that concerned about performance I would probably resort to a C++ extension anyway.
I don't think anyone has talked about Python, so I'll have a go.
Can I expect an array to be an adjacent block of memory (yes)?
In Python, arrays are more like arrays of pointers in C. So the pointers will be adjacent, but the actual objects are unlikely to be.
Are two integers in the same instance closer than two in different instances of the same class (probably)?
Probably not, for the same reason as above. The instance will only hold pointers to the objects that are the actual integers. Python doesn't have native ints (like Java), only boxed Ints (in Java-speak).
Does an object occupy a contiguous region in memory (no)?
Probably not. However, if you use the __slots__ optimisation, then some parts of it will be contiguous!
What's the difference between an array of objects with only two int fields and a single object with two int[] fields?
(this example is probably Java specific)
In Python, in terms of memory locality, they are both pretty much the same! One will make an array of pointers to objects, each of which in turn contains two pointers to ints; the other will make two arrays of pointers to integers.
If you need to optimise to that level then I suspect a VM based language is not for you ;)