MSDN says that:

"The sizeof operator can be used only in unsafe code blocks. Although you can use the Marshal.SizeOf method, the value returned by this method is not always the same as the value returned by sizeof."

and:

"Marshal.SizeOf returns the size after the type has been marshaled, whereas sizeof returns the size as it has been allocated by the common language runtime, including any padding."
I once read in the book CLR via C# (page 522) that:

Questions:

1) Is the padding mentioned here (in the MSDN quote) the same as the padding mentioned in the book?

2) If I have an object of type Person, how can I know its TRUE SIZE in MEMORY?

Edit - why do I need this?

Please notice this: they have a sample of reading records:
using (var accessor = mmf.CreateViewAccessor(offset, length))
{
    int colorSize = Marshal.SizeOf(typeof(MyColor)); //<--------HERE
    MyColor color;

    for (long i = 0; i < length; i += colorSize)
    {
        accessor.Read(i, out color);
        color.Brighten(10);
        accessor.Write(i, ref color);
    }
}
If the size reported by Marshal.SizeOf is not the same as the size reported by sizeof, which one should I choose? It has to be accurate!

According to this sample, they don't consider the padding, and they should... (or not...)
This might seem disingenuous - but the size you're interested in from the memory-mapped-file point of view is not the same as the size of that object in memory. They might be called memory-mapped files, but in .Net that doesn't necessarily mean quite the same thing as it would in native code. (The underlying implementation is still the same, though - a section of logical memory is mapped to a section of a file, so the name is still correct.)
sizeof returns the correct size of the object in physical memory, including any padding bytes etc. Therefore if you need to know the exact size of an object in native-memory terms, use that (but this doesn't apply to memory-mapped files as I'll explain in a moment).
As the documentation says, Marshal.SizeOf reports the size of an object from the .Net point of view, excluding the two hidden data items (the type object pointer and the sync block index), which are used by the runtime only.
The example you have copied uses Marshal.SizeOf because the padding values are related only to the physical object when in memory. When the object is serialized, only the logical .Net data is serialized. When the object is loaded back again, those two hidden values are re-assigned based on the state of the runtime at that point; e.g. the type pointer might be different. It would be meaningless to serialize them. It would be like serializing a native pointer (not an offset) to disk - it's incredibly unlikely that the data it points to will be in the same place next time.
Ergo - if you want to know how much an array of 100 Color objects uses in physical memory, use sizeof; if you want to know how large the memory-mapped file for the same data would be, use Marshal.SizeOf.
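To see the difference concretely, here is a minimal sketch (the Sample struct is hypothetical and must be compiled with /unsafe; the exact numbers depend on the platform and the default marshaling rules):

using System;
using System.Runtime.InteropServices;

struct Sample
{
    public byte B;
    public char C;   // 2 bytes in memory (UTF-16), but marshals to 1 ANSI byte by default
}

class Program
{
    unsafe static void Main()
    {
        Console.WriteLine(sizeof(Sample));                  // e.g. 4: CLR layout, including padding
        Console.WriteLine(Marshal.SizeOf(typeof(Sample)));  // e.g. 2: size after marshaling
    }
}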
Related
I want to make an array of 10^9 elements, where each element is an integer. I always get an OutOfMemoryException at the initialization line. How can I achieve this?

If this is not possible, please suggest alternative strategies.
Arrays are limited to 2GB in .NET 4.0 or earlier, even in a 64-bit process. So with one billion elements, the maximum supported element size is two bytes, but an int is four bytes, so this will not work.
If you want to have a larger collection, you need to write it yourself, backed by multiple arrays.
In .NET 4.5 it's possible to avoid this limitation; see Jon Skeet's answer for details.
Assuming you mean int as the element type, you can do this using .NET 4.5, if you're using a 64-bit CLR.
You need to use the <gcAllowVeryLargeObjects> configuration setting. This is not on by default.
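For reference, the setting goes in the application's app.config (this element exists from .NET 4.5 onwards):

<configuration>
  <runtime>
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>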
If you're using an older CLR or you're on a 32-bit machine, you're out of luck. Of course, if you're on a 64-bit machine but just an old version of the CLR, you could encapsulate your "one large array" into a separate object which has a list of smaller arrays... you could even implement IList<int> that way so that most code wouldn't need to know that you weren't really using a single array.
(As noted in comments, you'll still only be able to create an array with 2^31 elements; but your requirement of 10^9 is well within this.)
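Here is a minimal sketch of the "encapsulate smaller arrays" idea from the answer above (the BigArray name and chunk size are my own; holding one billion ints this way still needs a 64-bit process with roughly 4GB of memory to spare):

// One logical array backed by smaller chunks, so no single
// allocation exceeds the 2GB object limit.
class BigArray
{
    private const int ChunkSize = 1 << 20;   // 1M ints (4MB) per chunk
    private readonly int[][] chunks;

    public BigArray(long length)
    {
        chunks = new int[(length + ChunkSize - 1) / ChunkSize][];
        for (int i = 0; i < chunks.Length; i++)
            chunks[i] = new int[ChunkSize];
    }

    public int this[long index]
    {
        get { return chunks[index / ChunkSize][index % ChunkSize]; }
        set { chunks[index / ChunkSize][index % ChunkSize] = value; }
    }
}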
I think you shouldn't load all this data into memory. Store it in a file instead, and make a class that works like an array but actually reads and writes its data from and to the file.

This is the general idea (you have to convert your int values to and from bytes, which BitConverter can do):
public class FileArray
{
    readonly Stream s;
    readonly byte[] buf = new byte[4];   // each int occupies 4 bytes in the file

    public FileArray(Stream stream) { s = stream; }

    public int this[long index]
    {
        get { s.Position = index * 4; s.Read(buf, 0, 4); return BitConverter.ToInt32(buf, 0); }
        set { s.Position = index * 4; s.Write(BitConverter.GetBytes(value), 0, 4); }
    }
}
that way you would have something that works just like an array, but the data would be stored on your hard drive
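A usage sketch of the class above (the file name is illustrative; note that this creates a roughly 4GB file):

using (var stream = new FileStream("data.bin", FileMode.Create))
{
    var array = new FileArray(stream);
    array[999999999] = 42;                 // the file grows on demand
    Console.WriteLine(array[999999999]);   // prints 42
}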
I am trying to port an existing Delphi application to C#. It has been very tough, but I have managed somehow until now; at this point I have come across a problem...
Delphi code contains following code:
type
  TFruit = record
    name : string[20];
    case isRound : Boolean of // Choose how to map the next section
      True :
        (diameter : Single);  // Maps to same storage as length
      False :
        (length : Single;     // Maps to same storage as diameter
         width : Single);
  end;
i.e. a variant record (with a case statement inside), which determines how the record is laid out and what its size is.

I am trying to do the same with a C# struct and haven't succeeded yet; I hope someone can help me here. Just let me know if there's any way I can implement this in C#.

Thanks in advance....
You could use an explicit struct layout to replicate this Delphi variant record. However, I would not bother since it seems pretty unlikely that you really want assignment to diameter to assign also to length, and vice versa. That Delphi record declaration looks like it dates from mid-1990s style of Delphi coding. Modern Delphi code would seldom be written that way.
I would just do it like this:
struct Fruit
{
    string name;
    bool isRound;
    float diameter; // only valid when isRound is true
    float length;   // only valid when isRound is false
    float width;    // only valid when isRound is false
}
A more elegant option would be a class with properties for each struct field. And you would arrange that the property getters and setters for the 3 floats raised exceptions if they were accessed for an invalid value of isRound.
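A sketch of that more elegant option (my own naming; only Diameter is shown, Length and Width would be guarded the same way on !IsRound):

using System;

class Fruit
{
    public string Name { get; set; }
    public bool IsRound { get; set; }

    private float diameter;
    public float Diameter
    {
        get
        {
            if (!IsRound)
                throw new InvalidOperationException("Diameter is only valid for round fruit.");
            return diameter;
        }
        set { diameter = value; }
    }
    // Length and Width: same pattern, but guarded on IsRound being false
}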
Perhaps this will do the trick?
This is NOT a copy-and-paste solution, note that the offsets and data sizes may need to be changed depending on how the Delphi structure is declared and/or aligned.
[StructLayout(LayoutKind.Explicit)]
unsafe struct Fruit
{
    [FieldOffset(0)]  public fixed char name[20];
    [FieldOffset(20)] public bool IsRound;
    [FieldOffset(21)] public float Diameter;
    [FieldOffset(21)] public float Length;
    [FieldOffset(25)] public float Width;
}
It depends on what you are trying to do.
If you're simply trying to make a corresponding structure then look at David Heffernan's answer. These days there is little justification for mapping two fields on top of each other unless they truly represent the same thing. (Say, individual items or the same items in an array.)
If you're actually trying to share files you need to look along the lines of ananthonline's answer but there's a problem with it that's big enough I couldn't put it in a comment:
Not only is there the Unicode issue but a Delphi shortstring has no corresponding structure in C# and thus it's impossible to simply map a field on top of it.
That string[20] actually comprises 21 bytes, a one-byte length code and then 20 characters worth of data. You have to honor the length code as there is no guarantee of what lies beyond the specified length--you're likely to find garbage there. (Hint: If the record is going to be written to disk always zap the field before putting new data in it. It makes it much easier to examine the file on disk when debugging.)
Thus you need to declare two fields and write code to process it on both ends.
Since you have to do that anyway I would go further and write code to handle the rest of it so as to eliminate the need for unsafe code at all.
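For illustration, a decoding helper along those lines (the name is mine; it assumes the Delphi side wrote single-byte ANSI characters):

using System;
using System.Text;

static class ShortStringReader
{
    // Decode a Delphi string[20]: 1 length byte followed by up to 20 ANSI chars.
    // Bytes beyond the stated length are undefined and must be ignored.
    public static string Read(byte[] buffer, int offset)
    {
        int length = Math.Min(buffer[offset], 20);
        return Encoding.Default.GetString(buffer, offset + 1, length);
    }
}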
I'm trying to optimize some code where I have a large number of arrays containing structs of different sizes, but based on the same interface. In certain cases the structs are larger and hold more data, other times they are small structs, and other times I would prefer to keep null as a value to save memory.
My first question is: is it a good idea to do something like this? I previously had an array of my full data struct, but when testing a mixed approach it looked like I could save a lot of memory. Are there any other downsides?

I've been trying out different things, and it seems to work quite well when making an array of a common interface, but I'm not sure I'm checking the size of the array correctly.

I've simplified the example quite a bit. Here I'm adding different structs to an array, but I'm unable to determine the size using the traditional Marshal.SizeOf method. Would it be correct to simply iterate through the collection and sum the size of each value in it?
IComparable[] myCollection = new IComparable[1000];
myCollection[0] = null;
myCollection[1] = (int)1;
myCollection[2] = "helloo world";
myCollection[3] = long.MaxValue;
System.Runtime.InteropServices.Marshal.SizeOf(myCollection);
The last line will throw this exception:
Type 'System.IComparable[]' cannot be marshaled as an unmanaged structure; no meaningful size or offset can be computed.
Excuse the long post:

1) Is this an optimal and usable solution?

2) How can I determine the size of my array?
I may be wrong, but it looks to me like your IComparable[] array is a managed array. If so, you can use this code to get the length:
int arrayLength = myCollection.Length;
If you are doing platform interop between C# and C++, then the answer to your question headline "Can I find the length of an unmanaged array" is no, it's not possible. Function signatures with arrays in C/C++ tend to follow this pattern:
void doSomeWorkOnArrayUnmanaged(int * myUnmanagedArray, int length)
{
    // Do work ...
}
In .NET the array itself is a type which carries some basic information, such as its size and its runtime type. Therefore we can use this:
void DoSomeWorkOnManagedArray(int [] myManagedArray)
{
    int length = myManagedArray.Length;
    // Do work ...
}
Whenever using platform invoke to interop between C# and C++ you will need to pass the length of the array to the receiving function, as well as pin the array (but that's a different topic).
Does this answer your question? If not, please clarify.
Optimality always depends on your requirements. If you really need to store many elements of different classes/structs, your solution is completely viable.
However, I guess your expectations of the data structure might be misleading you: array elements are by definition all of the same size. This is even true in your case: your array doesn't store the elements themselves but references (pointers) to them; the value types among them (the int and the long) are boxed into heap objects when stored through the interface. The elements are allocated somewhere on the VM heap. So your data structure actually goes like this: it is an array of 1000 pointers, each pointer pointing to some data. The size of each particular element may of course vary.
This leads to the next question: the size of your array. What are you intending to do with the size? Do you need to know how many bytes to allocate when you serialize your data to some persistent storage? That depends on the serialization format. Or do you need just a rough estimate of how much memory your structure is consuming? In the latter case you need to consider the array itself plus the size of each particular element. The array in your example consumes approximately 1000 times the size of a reference (4 bytes on a 32-bit machine, 8 bytes on a 64-bit machine). To compute the sizes of the elements, you can indeed iterate over the array and sum up the size of each one. Please be aware that this is only an estimate: the virtual machine adds some memory management overhead which is hard to determine exactly.
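A rough sketch of such an estimate for the array from the question (it ignores per-object headers and alignment, so treat the number as a lower bound):

long estimatedBytes = (long)myCollection.Length * IntPtr.Size;   // the array of references itself
foreach (IComparable item in myCollection)
{
    if (item == null) continue;                  // null slots cost nothing beyond the reference
    if (item is string s)
        estimatedBytes += (long)s.Length * sizeof(char);   // character data only
    else
        estimatedBytes += System.Runtime.InteropServices.Marshal.SizeOf(item); // boxed primitives: 4 for int, 8 for long
}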
I am a tinkerer—no doubt about that. For this reason (and very little beyond that), I recently did a little experiment to confirm my suspicion that writing to a struct is not an atomic operation, which means that a so-called "immutable" value type which attempts to enforce certain constraints could hypothetically fail at its goal.
I wrote a blog post about this using the following type as an illustration:
struct SolidStruct
{
    public SolidStruct(int value)
    {
        X = Y = Z = value;
    }

    public readonly int X;
    public readonly int Y;
    public readonly int Z;
}
While the above looks like a type for which it could never be true that X != Y or Y != Z, in fact this can happen if a value is "mid-assignment" at the same time it is copied to another location by a separate thread.
OK, big deal. A curiosity and little more. But then I had this hunch: my 64-bit CPU should actually be able to copy 64 bits atomically, right? So what if I got rid of Z and just stuck with X and Y? That's only 64 bits; it should be possible to overwrite those in one step.
Sure enough, it worked. (I realize some of you are probably furrowing your brows right now, thinking, Yeah, duh. How is this even interesting? Humor me.) Granted, I have no idea whether this is guaranteed or not given my system. I know next to nothing about registers, cache misses, etc. (I am literally just regurgitating terms I've heard without understanding their meaning); so this is all a black box to me at the moment.
The next thing I tried—again, just on a hunch—was a struct consisting of 32 bits using 2 short fields. This seemed to exhibit "atomic assignability" as well. But then I tried a 24-bit struct, using 3 byte fields: no go.
Suddenly the struct appeared to be susceptible to "mid-assignment" copies once again.
Down to 16 bits with 2 byte fields: atomic again!
Could someone explain to me why this is? I've heard of "bit packing", "cache line straddling", "alignment", etc.—but again, I don't really know what all that means, nor whether it's even relevant here. But I feel like I see a pattern, without being able to say exactly what it is; clarity would be greatly appreciated.
The pattern you're looking for is the native word size of the CPU.
Historically, the x86 family worked natively with 16-bit values (and before that, 8-bit values). For that reason, your CPU can handle these atomically: it's a single instruction to set these values.
As time progressed, the native element size increased to 32 bits, and later to 64 bits. In every case, an instruction was added to handle this specific amount of bits. However, for backwards compatibility, the old instructions were still kept around, so your 64-bit processor can work with all of the previous native sizes.
Since your struct elements are stored in contiguous memory (without padding, i.e. empty space), the runtime can exploit this knowledge to only execute that single instruction for elements of these sizes. Put simply, that creates the effect you're seeing, because the CPU can only execute one instruction at a time (although I'm not sure if true atomicity can be guaranteed on multi-core systems).
However, the native element size was never 24 bits. Consequently, there is no single instruction to write 24 bits, so multiple instructions are required for that, and you lose the atomicity.
The C# standard (ISO 23270:2006, ECMA-334) has this to say regarding atomicity:

12.5 Atomicity of variable references

Reads and writes of the following data types shall be atomic: bool, char, byte, sbyte, short, ushort, uint, int, float, and reference types. In addition, reads and writes of enum types with an underlying type in the previous list shall also be atomic. Reads and writes of other types, including long, ulong, double, and decimal, as well as user-defined types, need not be atomic. (emphasis mine) Aside from the library functions designed for that purpose, there is no guarantee of atomic read-modify-write, such as in the case of increment or decrement.

Your example X = Y = Z = value is shorthand for 3 separate assignment operations, each of which is defined to be atomic by 12.5. The sequence of 3 operations (assign value to Z, assign Z to Y, assign Y to X) is not guaranteed to be atomic.
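To make that concrete, the chained assignment in the constructor is roughly equivalent to three independent stores, each atomic on its own for an int:

// Rough expansion of X = Y = Z = value:
Z = value;   // atomic 32-bit store
Y = value;   // atomic 32-bit store
X = value;   // atomic 32-bit store
// A thread copying the struct between any two of these stores
// can observe X != Y or Y != Z.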
Since the language specification doesn't mandate atomicity, while X = Y = Z = value; might be an atomic operation, whether it is or not is dependent on a whole bunch of factors:
the whims of the compiler writers
what code generation optimizations options, if any, were selected at build time
the details of the JIT compiler responsible for turning the assembly's IL into machine language. Identical IL run under Mono, say, might exhibit different behaviour than when run under .Net 4.0 (and that might even differ from earlier versions of .Net).
the particular CPU on which the assembly is running.
One might also note that even a single machine instruction is not necessarily guaranteed to be an atomic operation—many are interruptible.
Further, visiting the CLI standard (ISO 23271:2006, ECMA-335), we find section 12.6.6:
12.6.6 Atomic reads and writes

A conforming CLI shall guarantee that read and write access to properly aligned memory locations no larger than the native word size (the size of type native int) is atomic (see §12.6.2) when all the write accesses to a location are the same size. Atomic writes shall alter no bits other than those written. Unless explicit layout control (see Partition II (Controlling Instance Layout)) is used to alter the default behavior, data elements no larger than the natural word size (the size of a native int) shall be properly aligned. Object references shall be treated as though they are stored in the native word size.

[Note: There is no guarantee about atomic update (read-modify-write) of memory, except for methods provided for that purpose as part of the class library (see Partition IV). (emphasis mine) An atomic write of a "small data item" (an item no larger than the native word size) is required to do an atomic read/modify/write on hardware that does not support direct writes to small data items. end note]

[Note: There is no guaranteed atomic access to 8-byte data when the size of a native int is 32 bits even though some implementations might perform atomic operations when the data is aligned on an 8-byte boundary. end note]
x86 CPU operations take place in 8, 16, 32, or 64 bits; manipulating other sizes requires multiple operations.
The compiler and x86 CPU are going to be careful to move only exactly as many bytes as the structure defines. There are no x86 instructions that can move 24 bits in one operation, but there are single instruction moves for 8, 16, 32, and 64 bit data.
If you add another byte field to your 24 bit struct (making it a 32 bit struct), you should see your atomicity return.
Some compilers allow you to define padding on structs to make them behave like native register-sized data. If you pad your 24-bit struct, the compiler will add another byte to "round up" the size to 32 bits so that the whole structure can be moved in one atomic instruction. The downside is that your structure will always occupy about a third more space in memory (32 bits instead of 24).
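In C# you can express that padding yourself with the Size parameter of StructLayout; a sketch (the struct name is illustrative, and as noted below alignment still matters):

using System.Runtime.InteropServices;

// Forces the struct to occupy 4 bytes instead of 3, so that the whole
// value can be moved with a single 32-bit instruction.
[StructLayout(LayoutKind.Sequential, Size = 4)]
struct Rgb
{
    public byte R, G, B;   // 3 bytes of data; the 4th byte is padding
}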
Note that alignment of the structure in memory is also critical to atomicity. If a multibyte structure does not begin at an aligned address, it may span multiple cache lines in the CPU cache. Reading or writing this data will require multiple clock cycles and multiple read/writes even though the opcode is a single move instruction. So, even single instruction moves may not be atomic if the data is misaligned. x86 does guarantee atomicity for native sized read/writes on aligned boundaries, even in multicore systems.
It is possible to achieve memory atomicity with multi-step moves using the x86 LOCK prefix. However, this should be avoided as it can be very expensive in multicore systems. (LOCK not only blocks other cores from accessing memory, it also locks the system bus for the duration of the operation, which can impact disk I/O and video operations. LOCK may also force the other cores to purge their local caches.)
Today I noticed that C#'s String class returns the length of a string as an int. Since an int is always 32 bits, no matter what the architecture, does this mean that a string can only be 2GB or less in length?

A 2GB string would be very unusual, and would present many problems along with it. However, most .NET APIs seem to use int to convey values such as length and count. Does this mean we are forever limited to collection sizes that fit in 32 bits?

Seems like a fundamental problem with the .NET APIs. I would have expected things like count and length to be returned via the equivalent of size_t.
Seems like a fundamental problem with the .NET API...
I don't know if I'd go that far.
Consider almost any collection class in .NET. Chances are it has a Count property that returns an int. So this suggests the class is bounded at a size of int.MaxValue (2147483647). That's not really a problem; it's a limitation -- and a perfectly reasonable one, in the vast majority of scenarios.
Anyway, what would the alternative be? There's uint -- but that's not CLS-compliant. Then there's long...
What if Length returned a long?
An additional 32 bits of memory would be required anywhere you wanted to know the length of a string.
The benefit would be: we could have strings taking up billions of gigabytes of RAM. Hooray.
Try to imagine the mind-boggling cost of some code like this:
// Lord knows how many characters
string ulysses = GetUlyssesText();
// allocate an entirely new string of roughly equivalent size
string schmulysses = ulysses.Replace("Ulysses", "Schmulysses");
Basically, if you're thinking of string as a data structure meant to store an unlimited quantity of text, you've got unrealistic expectations. When it comes to objects of this size, it becomes questionable whether you have any need to hold them in memory at all (as opposed to hard disk).
Correct: the maximum length would be int.MaxValue; however, you'll likely run into other memory issues when dealing with strings anywhere near that size anyway.
At some value of String.Length (probably about 5MB) it's not really practical to use String anymore; String is optimised for short bits of text.
Think about what happens when you do

myString += " more chars";

Something like:

1. The system calculates the length of myString plus the length of " more chars"
2. The system allocates that amount of memory
3. The system copies myString to the new memory location
4. The system copies " more chars" to the new memory location, after the last copied myString char
5. The original myString is left to the mercy of the garbage collector.
While this is nice and neat for small bits of text, it's a nightmare for large strings; just finding 2GB of contiguous memory is probably a showstopper.

So if you know you are handling more than a very few MB of characters, use a buffer class such as StringBuilder.
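For example, with StringBuilder the appends go into an internal buffer and the full string is only materialised once at the end:

using System.Text;

var sb = new StringBuilder();
for (int i = 0; i < 100000; i++)
    sb.Append(" more chars");   // no full-string copy per append
string result = sb.ToString();  // a single final allocation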
It's pretty unlikely that you'll need to store more than two billion objects in a single collection. You're going to incur some pretty serious performance penalties when doing enumerations and lookups, which are the two primary purposes of collections. If you're dealing with a data set that large, there is almost assuredly some other route you can take, such as splitting up your single collection into many smaller collections that contain portions of the entire set of data you're working with.
Heeeey, wait a sec.... we already have this concept -- it's called a dictionary!
If you need to store, say, 5 billion English strings, use this type:
Dictionary<string, List<string>> bigStringContainer;
Let's make the key string represent, say, the first two characters of the string. Then write an extension method like this:
public static string BigStringIndex(this string s)
{
return String.Concat(s[0], s[1]);
}
and then add items to bigStringContainer like this:
bigStringContainer[item.BigStringIndex()].Add(item);
and call it a day. (You would need to create the List<string> for each key the first time it's seen; and there are obviously more efficient ways you could do this, but it's just an example.)
Oh, and if you really really really do need to be able to look up any arbitrary object by absolute index, use an Array instead of a collection. Okay yeah, you lose some type safety, but you can index array elements with a long.
The fact that the framework uses Int32 for Count/Length properties, indexers etc is a bit of a red herring. The real problem is that the CLR currently has a max object size restriction of 2GB.
So a string -- or any other single object -- can never be larger than 2GB.
Changing the Length property of the string type to return long, ulong or even BigInteger would be pointless since you could never have more than approx 2^30 characters anyway (2GB max size and 2 bytes per character.)
Similarly, because of the 2GB limit, the only arrays that could even approach having 2^31 elements would be bool[] or byte[] arrays that only use 1 byte per element.
Of course, there's nothing to stop you creating your own composite types to workaround the 2GB restriction.
(Note that the above observations apply to Microsoft's current implementation, and could very well change in future releases. I'm not sure whether Mono has similar limits.)
In versions of .NET prior to 4.5, the maximum object size is 2GB. From 4.5 onwards you can allocate larger objects if gcAllowVeryLargeObjects is enabled. Note that the limit for string is not affected, but "arrays" should cover "lists" too, since lists are backed by arrays.
Even in x64 versions of Windows I got hit by .Net limiting each object to 2GB.
2GB is pretty small for a medical image. 2GB is even small for a Visual Studio download image.
If you are working with a file that is 2GB, that means you're likely going to be using a lot of RAM, and you're seeing very slow performance.
Instead, for very large files, consider using a MemoryMappedFile (see: http://msdn.microsoft.com/en-us/library/system.io.memorymappedfiles.memorymappedfile.aspx). Using this method, you can work with a file of nearly unlimited size, without having to load the whole thing in memory.
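A minimal sketch (the file name is illustrative):

using System;
using System.IO.MemoryMappedFiles;

using (var mmf = MemoryMappedFile.CreateFromFile(@"C:\huge-image.dat"))
using (var accessor = mmf.CreateViewAccessor(0, 0))    // length 0 maps the whole file
{
    byte first = accessor.ReadByte(0);     // random access at any offset,
    accessor.Write(0, (byte)(first + 1));  // without loading the file into RAM
}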