sizeof empty string in C#

In Java, an empty string is 40 bytes. In Python it's 20 bytes. How big is an empty string object in C#? I cannot do sizeof, and I don't know how else to find out. Thanks.

It's 18 bytes:
16 bytes of memory + 2 bytes per character allocated + 2 bytes for the final null character.
Note that this was written about .NET 1.1.
The m_ArrayLength field was removed in .NET 4.0 (you can see this in the reference source).

The CLR version matters. Prior to .NET 4, a string object had an extra 4-byte field, m_arrayLength, that stored the string's "capacity". That field is no longer around in .NET 4. Otherwise a string has the standard object header: 4 bytes for the sync block and 4 bytes for the method table pointer. Then come 4 bytes to store the string length (m_stringLength), followed by 2 bytes for each character in the string, and a terminating 0 char to keep it compatible with native code. Objects are always a multiple of 4 bytes long, with a minimum of 16 bytes.
An empty string is thus 4 + 4 + 4 + 2 = 14 bytes, rounded up to 16 bytes on .NET 4.0, and 20 bytes on earlier versions. The given values are for x86. This is all very visible in the debugger; check this answer for hints.
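If you want to check these numbers yourself without attaching a debugger, a rough heap measurement works too. This is only a sketch under a few assumptions: the empty-string literal is interned, so it probes strings of length 1 and subtracts the 2 bytes of character data, and the exact figure depends on the CLR version and bitness.

using System;

class StringSizeProbe
{
    static void Main()
    {
        const int count = 1000000;
        var keep = new string[count];              // keep references alive so nothing is collected

        long before = GC.GetTotalMemory(true);
        for (int i = 0; i < count; i++)
            keep[i] = new string('x', 1);          // length-1 strings; "" itself is interned
        long after = GC.GetTotalMemory(true);

        double perString = (after - before) / (double)count;
        // Subtracting the 2 bytes of character data approximates the fixed per-string overhead.
        Console.WriteLine("~{0:F1} bytes per 1-char string, ~{1:F1} bytes fixed overhead",
                          perString, perString - 2);
        GC.KeepAlive(keep);
    }
}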

Jon Skeet recently wrote a whole article on the subject.
On x86, an empty string is 16 bytes, and on x64 it's 32 bytes.

Related

How many bytes does a string take up in x64?

For the purpose of learning, I'm trying to understand how C# strings are internally stored in memory.
According to this blog post, the size of a C# string (x64 with .NET Framework 4.0) is:
26 + 2 * length
A string with a single character will take 26 + 2 * 1 = 28 bytes, rounded up to the next multiple of 8, i.e. 32 bytes.
This is indeed similar to what I measured.
What puzzles me is what is in that 26-byte overhead.
I have run the following code and inspected the memory:
string abc = "abcdeg";
string aaa = "x";
string ccc = "zzzzz";
AFAIK those blocks (the colored regions in my memory-view screenshot) are the following:
Green: Sync block (8 bytes)
Cyan: Type info (8 bytes)
Yellow: Length (4 bytes)
Pink: The actual characters: 2 bytes per char + 2 bytes for the NULL terminator.
Look at the "x" string. It is indeed 32 bytes (as calculated).
Anyway, it looks like the end of the string is padded with zeroes.
The "x" string could end right after the two bytes of the NULL terminator and still be memory aligned (thus being 24 bytes).
Why do we need an extra 8 bytes?
I have observed similar results with other (bigger) string sizes.
It looks like there is always an extra 8 bytes.
As Hans Passant suggested, there is an extra 4-byte field at the end of the string object (on x64 it might require another 4 bytes of padding).
So in the end we have:
= 8 (sync) + 8 (type) + 4 (length) + 4 (extra field) + 2 (null terminator) + 2 * length
= 26 + 2 * length
So Jon Skeet's blog post was right (how could it be wrong?)
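For completeness, here is that arithmetic as a small C# sketch; the 26-byte overhead and the rounding up to an 8-byte boundary are taken from the x64 / .NET 4.0 measurements above, not from any official spec.

using System;

class StringSizeFormula
{
    // Predicted heap size of a string on x64 / .NET 4.0, using the figures from this post:
    // 8 (sync) + 8 (type) + 4 (length) + 4 (extra field) + 2 (null terminator) + 2 * length,
    // rounded up to the next multiple of 8.
    static long PredictedSize(int length)
    {
        long raw = 26 + 2L * length;
        return (raw + 7) / 8 * 8;
    }

    static void Main()
    {
        int[] lengths = { 0, 1, 5, 6 };
        foreach (int len in lengths)
            Console.WriteLine("length {0}: {1} bytes", len, PredictedSize(len));
        // length 1 -> 32 bytes, matching the "x" string above; length 6 ("abcdeg") -> 40 bytes.
    }
}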

Why is the minimum size of a reference type 12 bytes for a 32-bit .NET process

I was reading the Pro .NET Performance book's section on reference type internals. It mentions that for a 32-bit .NET process a reference type has 4 bytes of object header and 4 bytes of method table pointer. It also says that on a 32-bit system objects are aligned to the nearest 4-byte multiple, which makes the minimum size of a reference type 12 bytes.
My question is, why is the minimum size 12 bytes? The object is 8 bytes and that already aligns with a 4 byte boundary.
The minimum of 12 bytes is a requirement of the garbage collection implementation.
From here: http://msdn.microsoft.com/en-us/magazine/cc163791.aspx#S9
The Base Instance Size is the size of the object as computed by the class loader, based on the field declarations in the code. As discussed previously, the current GC implementation needs an object instance of at least 12 bytes. If a class does not have any instance fields defined, it will carry an overhead of 4 bytes. The rest of the 8 bytes will be taken up by the Object Header (which may contain a syncblk number) and TypeHandle.
(TypeHandle being a handle to the method table).
So you have 8 bytes of overhead (the object header and the method table pointer). If you want any data in the object, then you need at least one more byte, and because memory is allocated to objects in 4-byte chunks, you end up with a minimum of 12 bytes.
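If you want to see the 12-byte minimum empirically, the same heap-delta trick used for strings above works for plain objects too. This is a sketch, not an exact tool; the class names are made up and the printed figures depend on the runtime and bitness.

using System;

class Empty { }                        // no instance fields: header + type handle + 4 bytes of padding
class OneInt { public int Value; }     // the int fits into the slot the empty class pads out

class MinObjectSizeProbe
{
    static double BytesPerInstance(Func<object> factory)
    {
        const int count = 1000000;
        var keep = new object[count];
        long before = GC.GetTotalMemory(true);
        for (int i = 0; i < count; i++)
            keep[i] = factory();
        long after = GC.GetTotalMemory(true);
        GC.KeepAlive(keep);
        return (after - before) / (double)count;
    }

    static void Main()
    {
        Console.WriteLine("Empty:  ~{0:F1} bytes", BytesPerInstance(() => new Empty()));
        Console.WriteLine("OneInt: ~{0:F1} bytes", BytesPerInstance(() => new OneInt()));
        // In a 32-bit process both print roughly 12; in a 64-bit process roughly 24.
    }
}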

Binary file format: why did the system read all 4 bytes and display the value correctly as 25736483?

Using the C# BinaryWriter class, I am writing two values to a file:
bw.Write(i);
bw.Write(s);
where i is an integer with the value 25736483 and s is a string with the value "I am happy".
I am then reading the file back and outputting the values to a TextBox:
iN = br.ReadInt32();
newS = br.ReadString();
this.textBox1.Text = iN.ToString() + newS;
The integer is stored in 4 bytes and the string in another 11 bytes. When we call ReadInt32, how does the system know that it has to read 4 bytes and not only 1 byte? Why did it read all 4 bytes and display the value correctly as 25736483?
You told it to read an Int32 with ReadInt32, and an Int32 is 4 bytes (32 bits). So you said "Read a four byte (32-bit) integer", and it did exactly what you told it to do.
The same thing happened when you used bw.Write(i); - as you said, i is an integer, and you told it to write an integer. Since an integer is 4 bytes (32 bits) on your platform, it wrote 4 bytes.
Integers are 32-bits on your platform. So when you passed an integer to write, you got an overload that writes 4 bytes. The ReadInt32 function knows to read four bytes because 32 bits is four bytes, so that's what it always does.
The Write method has an overload for each variable type that it knows how to write. This overload knows exactly how many bytes it takes to store the value in the stream.
You can see every specific overload here.
When you read data you call the correct version of .Read for the data type you are trying to read, which is also specifically coded to know how many bytes are in that type.
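A self-contained round trip illustrates this; the file name here is made up, but the Write overloads and Read methods are the standard BinaryWriter/BinaryReader ones. Write(int) emits exactly 4 bytes, and Write(string) emits a length prefix followed by the encoded characters, which is how ReadInt32 and ReadString know how much to consume.

using System;
using System.IO;

class BinaryRoundTrip
{
    static void Main()
    {
        int i = 25736483;
        string s = "I am happy";

        using (var bw = new BinaryWriter(File.Create("data.bin")))
        {
            bw.Write(i);   // Int32 overload: always writes exactly 4 bytes
            bw.Write(s);   // string overload: length prefix + UTF-8 bytes (1 + 10 = 11 bytes here)
        }

        using (var br = new BinaryReader(File.OpenRead("data.bin")))
        {
            int iN = br.ReadInt32();        // consumes exactly 4 bytes
            string newS = br.ReadString();  // reads the length prefix, then that many bytes
            Console.WriteLine("{0} {1}", iN, newS);
        }

        Console.WriteLine("File length: {0} bytes", new FileInfo("data.bin").Length);  // 15
    }
}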

How is memory allocated in int array

How much space does an int array take up? Or: how much space (in bytes) does an int array that looks something like this consume:
int[] SampleArray = new int[] { 1, 2, 3, 4 };
Is memory allocation language-specific?
Thank you all
Since you added a lot of language tags, I will answer for C#. In C#, this depends on the platform.
For 32-bit, each int is 4 bytes plus 4 bytes for the reference to the object, which makes 4 * 4 + 4 = 20 bytes.
For 64-bit, each int is 4 bytes plus 8 bytes for the reference to the object, which makes 4 * 4 + 8 = 24 bytes.
From C# 5.0 in a Nutshell, page 22:
Each reference to an object requires an extra four or eight bytes, depending on whether the .NET runtime is running on a 32- or 64-bit platform.
In C++, how much memory new int[4]{1, 2, 3, 4} actually allocates is implementation-defined but the size of the array will be sizeof(int)*4.
The question is: is memory allocation language-specific?
Yes, memory allocation is language-specific; it varies according to the language.
For example:
sizeof(int) * 4
In Java an int is 4 bytes, so the memory consumption is 4 * 4 = 16 bytes.
It depends on the language, but even more on the platform.
You need 4 integers. Normally an integer is 2 or 4 bytes (4 on most systems), but to be sure check sizeof(int).
Also keep in mind that the values can be represented differently depending on the platform, e.g. MSB first or LSB first (or a mix when 4 bytes are used).
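Since the question carries several language tags: in C# specifically you can check the int size and the byte order at run time. A minimal sketch:

using System;

class EndiannessCheck
{
    static void Main()
    {
        Console.WriteLine("sizeof(int) = {0}", sizeof(int));                  // always 4 in C#
        Console.WriteLine("Little-endian: {0}", BitConverter.IsLittleEndian); // true on x86/x64
        Console.WriteLine(BitConverter.ToString(BitConverter.GetBytes(1)));   // 01-00-00-00 when LSB first
    }
}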
In Java, an int[] array is an Object, represented in memory by the header (8 bytes on x86) and an int length field (4 bytes), followed by the array of ints (arrayLength * 4).
approxSize = 8 + 4 + 4 * arrayLength
See more here: http://www.javamex.com/tutorials/memory/object_memory_usage.shtml
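For C#, note that the 20/24-byte figures above count only the reference; the array object itself also carries a header and a length field in front of the 16 bytes of element data. A rough way to measure the whole object, using the same heap-delta approach as in the earlier answers (a sketch; exact numbers depend on the runtime):

using System;

class ArraySizeProbe
{
    static void Main()
    {
        const int count = 1000000;
        var keep = new int[count][];                 // allocated up front, so it is excluded from the delta
        long before = GC.GetTotalMemory(true);
        for (int i = 0; i < count; i++)
            keep[i] = new int[] { 1, 2, 3, 4 };
        long after = GC.GetTotalMemory(true);
        GC.KeepAlive(keep);
        Console.WriteLine("~{0:F1} bytes per int[4]", (after - before) / (double)count);
        // Roughly 28 bytes in a 32-bit process (4 sync + 4 type + 4 length + 16 data),
        // roughly 40 bytes in a 64-bit one; exact figures depend on the runtime.
    }
}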

Bit/byte conversion

How many bits is a .NET string that's 10 characters in length? (.NET strings are UTF-16, right?)
On 32-bit systems:
4 bytes = Type pointer (Every object has one of these)
4 bytes = Lock (One of these too!)
4 bytes = Length (Need the length)
2 * Length bytes = Data (And the chars themselves)
=======================
12 + 2*Length bytes
=======================
96 + 16*Length bits
So 10 chars would be 96 + 16 * 10 = 256 bits = 32 bytes.
I am not sure if the Lock grows to 64-bit on 64-bit systems. I kinda hope not, but you never know. The 64-bit structure overhead is therefore anywhere from 16-20 bytes (as opposed to the 12 bytes on 32-bit).
Every char in the string is two bytes in size, so if you are just converting the chars directly and not using any particular encoding, the answer is string.Length * 2 * 8
Otherwise the result depends on the encoding; you can write:
int numbits = System.Text.Encoding.UTF8.GetByteCount(str)*8; //returns 80
or
int numbits = System.Text.Encoding.Unicode.GetByteCount(str)*8; //returns 160
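Putting those two lines into a runnable form for a 10-character string (this assumes the string is plain ASCII, which is why UTF-8 needs only one byte per character):

using System;
using System.Text;

class BitCount
{
    static void Main()
    {
        string str = "ABCDEFGHIJ";  // 10 characters, all ASCII
        Console.WriteLine(Encoding.UTF8.GetByteCount(str) * 8);     // 80 bits: 1 byte per ASCII char
        Console.WriteLine(Encoding.Unicode.GetByteCount(str) * 8);  // 160 bits: 2 bytes per char (UTF-16)
        Console.WriteLine(str.Length * 2 * 8);                      // 160 bits: matches the in-memory char data
    }
}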
If you are talking pure UTF-16, then:
10 characters = 20 bytes = 160 bits
This really needs context in order to be answered properly.
It all comes down to how you define a character and how you store the data.
For example, if you define a character as a single letter from the user's point of view, it can be more than 2 bytes; for example, the character Å can be two Unicode code points (U+0041 U+030A, Latin Capital A + Combining Ring Above), so it will require two .NET chars, or 4 bytes in UTF-16.
Now, even if you are talking about 10 .NET char elements, then if the string is in memory you have some object overhead (as already mentioned) and a bit of alignment overhead (on a 32-bit system everything has to be aligned to a 4-byte boundary; in 64-bit the rules are more complicated), so you may have some empty bytes at the end.
If you are talking about databases or files, then each database and file system has its own overhead.
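A small demonstration of the Å example above, using the decomposed form built from those two code points (NFC normalization collapses it back to the single code point U+00C5):

using System;
using System.Text;

class CombiningDemo
{
    static void Main()
    {
        string decomposed = "A\u030A";  // U+0041 + U+030A (combining ring above), renders as Å
        Console.WriteLine(decomposed.Length);                                     // 2 .NET chars
        Console.WriteLine(Encoding.Unicode.GetByteCount(decomposed));             // 4 bytes in UTF-16
        Console.WriteLine(decomposed.Normalize(NormalizationForm.FormC).Length);  // 1 after NFC (U+00C5)
    }
}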
