Make C# enums memory efficient? - c#

According to this question, C# will assign a 4-byte size to a field of type Fruits whether it is defined like this:
enum Fruits : byte { Apple, Orange, Banana }
or like this:
enum Fruits { Apple, Orange, Banana }
I'm still curious whether there is any way of sidestepping this and making the size of an enum smaller than 4 bytes. I know this probably wouldn't be very efficient or desirable, but it's still interesting to know whether it's possible at all.

Data alignment (typically on a 1, 2, or 4 byte boundary) is used for faster access to the data (an int should be aligned on a 4-byte boundary).
For instance
(let me use byte and int instead of enum for readability, and struct instead of class - it's an easy way to get the size of a struct with the help of sizeof):
// sizeof() == 8 == 1 + 3 (padding) + 4
public struct MyDemo {
public byte A; // Padded with 3 unused bytes
public int B; // Aligned on 4 byte
}
// sizeof() == 8 == 1 + 1 + 2 (padding) + 4
public struct MyDemo {
public byte A; // Bytes should be aligned on 1 Byte Border
public byte B; // Padded with 2 unused bytes
public int C; // Aligned on 4 byte
}
// sizeof() == 2 == 1 + 1
public struct MyDemo {
public byte A; // Bytes should be aligned on 1 Byte Border
public byte B; // Bytes should be aligned on 1 Byte Border
}
So far so good: you can see the effect even for fields within a class (struct), e.g.
public struct MyClass {
// 4 Byte in total: 1 + 1 + 2 (we are lucky: no padding here)
private Fruits m_Fruits; // Aligned on 1 Byte border
private byte m_MyByte; // Aligned on 1 Byte border
private short m_MyShort; // Aligned on 2 Byte border
}
In the case of a collection (array), all the values are of the same type and are aligned in the same way, which is why no padding is required:
// Length * 1Byte == Length byte in total
byte[] array = new byte[] {
byte1, // 1 Byte alignment
byte2, // 1 Byte alignment
byte3, // 1 Byte alignment
...
byteN, // 1 Byte alignment
};

For the vast majority of applications the size overhead will not matter at all. For some specialized applications, like image processing, it may make sense to use constant byte values and do bit-manipulations instead. This can also be a way to pack multiple values into a single byte, or combine flag-bits with values:
const byte Apple = 0x01;
const byte Orange = 0x02;
const byte Banana = 0x03;
const byte FruitMask = 0x0f; // bits 0-3 represent the fruit value
const byte Red = 0x10;
const byte Green = 0x20;
const byte ColorMask = 0x70; // bits 4-6 represent the color
const byte IsValidFlag = 0x80; // bit 7 represents the valid flag
...
var fruitValue = myBytes[i] & FruitMask;
var isRed = (myBytes[i] & ColorMask) == Red;
var isValid = (myBytes[i] & IsValidFlag) > 0;
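For illustration, here is a minimal sketch (using the constants above plus a hypothetical Pack helper, not part of the original answer) of how a fruit, a color and the valid flag could be combined into a single byte and read back:
// Hypothetical helper: packs a fruit value, a color value and the valid flag into one byte.
static byte Pack(byte fruit, byte color, bool isValid)
{
    return (byte)((fruit & FruitMask) | (color & ColorMask) | (isValid ? IsValidFlag : 0));
}
byte packed = Pack(Apple, Green, isValid: true); // 0xA1
var fruit = packed & FruitMask;                  // 1 (Apple)
var isGreen = (packed & ColorMask) == Green;     // true
var valid = (packed & IsValidFlag) > 0;          // true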

According to this question, C# will assign a 4-byte size to a field of type Fruits whether it is defined like this
I would say that this is not what is actually written there. The post describes memory alignment on the stack, which seems to reserve 4 bytes for a byte variable too (this can be platform dependent):
byte b = 1;
results in the same IL_0000: ldc.i4.1 instruction as var fb1 = FruitsByte.Apple and int i = 1; (see it at sharplab.io), and the same 4-byte difference (Core CLR 6.0.322.12309 on x86) in the move instructions.
However, using the corresponding enums as struct fields will result in them being aligned to the corresponding boundaries:
Console.WriteLine(Unsafe.SizeOf<C>()); // prints 2
Console.WriteLine(Unsafe.SizeOf<C1>()); // prints 8
public enum Fruits : byte { Apple, Orange, Banana }
public enum Fruits1 { Apple, Orange, Banana }
public struct C {
public Fruits f1;
public Fruits f2;
}
public struct C1 {
public Fruits1 f1;
public Fruits1 f2;
}
The same happens for arrays, which allocate a contiguous region of memory without padding between the individual elements.
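As a rough check (a sketch, assuming System.Runtime.CompilerServices.Unsafe is available and reusing the Fruits and Fruits1 enums declared above), the element size follows the enum's underlying type:
Console.WriteLine(Unsafe.SizeOf<Fruits>()); // prints 1 - byte-backed enum
Console.WriteLine(Unsafe.SizeOf<Fruits1>()); // prints 4 - int-backed enum
// So a Fruits[] of N elements needs roughly N bytes of payload,
// while a Fruits1[] needs roughly 4 * N bytes (plus the fixed array header in both cases).
var packed = new Fruits[1000];
var padded = new Fruits1[1000];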
Useful reading:
StructLayoutAttribute
Blittable and Non-Blittable Types
Article about blittable types with a lot of links

Related

How to do -1 on a specific byte of an integer in C#?

I have an integer u=101057541.
In binary, this is equal to: 00000110 00000110 00000100 00000101
Now, I regard each byte as a separate decimal number (so 6, 6, 4, 5 in this case).
I want to subtract 1 from the first byte, resulting in 6-1=5.
I try to do this as follows:
int West = u | (((u>>24) - 1) << 24);
However, the result is the same as when I ADD 1 to this byte. Can someone explain why, and tell me how to subtract 1 from this byte?
UPDATE:
Thus, the result I want is the following binary number:
00000101 00000110 00000100 00000101
Because you're "or"-ing that byte back in:
u | (((u>>24) - 1) << 24);
should be
(u & mask) | (((u>>24) - 1) << 24);
where mask is everything except the byte you're playing with.
You might find unsafe code easier:
unsafe
{
int i = 101057541;
byte* b = (byte*)&i;
b[3]--; // note CPU endianness is important here
Console.WriteLine(i);
}
You can do the same thing without unsafe using "spans" if you're using all the latest bits:
int i = 101057541;
var bytes = MemoryMarshal.Cast<int, byte>(MemoryMarshal.CreateSpan(ref i, 1));
bytes[3]--; // note CPU endianness is important here
Console.WriteLine(i);
or you could use a "union" via a struct with explicit layout - so 4 bytes overlapping 1 int:
var x = new Int32Bytes();
x.Value = 101057541;
x.Byte3--; // note CPU endianness is important here
Console.WriteLine(x.Value);
with:
[StructLayout(LayoutKind.Explicit)]
struct Int32Bytes
{
[FieldOffset(0)]
public int Value;
[FieldOffset(0)]
public byte Byte0;
[FieldOffset(1)]
public byte Byte1;
[FieldOffset(2)]
public byte Byte2;
[FieldOffset(3)]
public byte Byte3;
}
When you subtract 1 from 00000110 the result is 00000101. You OR this with the original value and you get 00000111, which is the same as if you had added 1.
As a one-liner for your problem, you should mask out the region of bits you are manipulating:
int West = (u & 0x00FFFFFF) | ((((u >> 24) & 0xFF) - 1) << 24);
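If you need this for more than one byte position, here is a small sketch (with a hypothetical DecrementByte helper) that generalizes the masking approach:
// Hypothetical helper: decrements the byte at position `index` (0 = least significant byte).
static int DecrementByte(int value, int index)
{
    int shift = index * 8;
    int mask = 0xFF << shift;            // bits of the byte being changed
    int b = (value >> shift) & 0xFF;     // isolate that byte
    return (value & ~mask) | (((b - 1) & 0xFF) << shift);
}
int u = 101057541; // 00000110 00000110 00000100 00000101
Console.WriteLine(Convert.ToString(DecrementByte(u, 3), 2).PadLeft(32, '0'));
// prints 00000101000001100000010000000101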

Why is the size of struct A not equal to the size of struct B with the same fields?

Why is the size of struct A not equal to the size of struct B?
And what do I need to do to make them the same size?
using System;
namespace ConsoleApplication1
{
class Program
{
struct A
{
char a;
char c;
int b;
}
struct B
{
char a;
int b;
char c;
}
static void Main(string[] args)
{
unsafe
{
Console.WriteLine(sizeof(A));
Console.WriteLine(sizeof(B));
}
Console.ReadLine();
}
}
}
Output is :
8
12
There is padding between the fields. The padding is determined by the preceding fields and the next field.
Also, this condition should be true:
(size of struct) % (size of largest type) == 0
In your case the largest type is int and its size is 4 bytes.
struct A
{
char a; // size is 2, no previous field, next field size is 2 - no alignment needed
char c; // size is 2, previous size is 2 -> 2 + 2 = 4, next size is 4 - no alignment needed
int b; //size is 4, it is last field, size is 4 + 4 = 8.
//current size is 2 + 2 + 4 = 8
//8 % 4 == 0 - true - 8 is final size
}
struct B
{
char a; // size is 2, next size is 4, alignment needed - 2 -> 4, size of this field with alignment is 4
int b; // size is 4, previous is 4, next size is 2(lower) - no alignment needed
char c; // size is 2, previous is 4 + 4 = 8 - no alignment needed
//current size is 4 + 4 + 2 = 10
//but the size should satisfy size % 4 == 0 -> 10 % 4 == 0 is false, so adjust to 12
}
If you want the same size for both structs you can use LayoutKind.Explicit:
[StructLayout(LayoutKind.Explicit)]
public struct A
{
[FieldOffset(0)]
char a;
[FieldOffset(2)]
char c;
[FieldOffset(4)]
int b;
}
[StructLayout(LayoutKind.Explicit)]
public struct B
{
[FieldOffset(0)]
char a;
[FieldOffset(2)]
int b;
[FieldOffset(6)]
char c;
}
OR
You can use LayoutKind.Sequential, Pack = 1 and CharSet = CharSet.Unicode to get size 8.
[StructLayout(LayoutKind.Sequential, Pack = 1, CharSet = CharSet.Unicode)]
public struct A
{
char a;
char c;
int b;
}
[StructLayout(LayoutKind.Sequential, Pack = 1, CharSet = CharSet.Unicode)]
public struct B
{
char a;
int b;
char c;
}
Also, you can get struct size without unsafe:
Console.WriteLine(System.Runtime.InteropServices.Marshal.SizeOf(typeof(A)));
Console.WriteLine(System.Runtime.InteropServices.Marshal.SizeOf(typeof(B)));
It's because your compiler reserves the right to insert padding between the members of the struct, plus some space at the end. (But note that padding is not allowed before the first member.)
It does this in order to align the start of members on easily-addressable memory locations.
In particular, a compiler is likely to insert padding between a single char and an int. An even number of chars followed by an int may well therefore take up less space than a char followed by an int followed by an odd number of chars.
This is a processor implementation detail, one that .NET tries very hard to hide. Variables need to have a storage location that allows the processor to read and write the value with a single data bus operation. That makes the alignment of the variable's address very important. Reading a single byte is never a problem. But a short (2 bytes) should have an address that is a multiple of 2. An int (4 bytes) should have an address that is a multiple of 4. Ideally a long or double (8 bytes) has an address that is a multiple of 8, but that cannot always be achieved, not on a 32-bit processor.
Intel and AMD processors allow unaligned reads and writes, unlike RISC cores. But that can come at a cost: it might require two data bus cycles to read two chunks of bytes, part of the upper bytes of the value and part of the lower bytes, with a circuit that shuffles those bytes into the right place. That takes time, typically an extra 1 to 3 clock cycles. On a RISC core it is far more expensive, since handling the bus error trap takes lots and lots of time.
But more severely, unaligned access breaks the .NET memory model, which provides an atomicity guarantee for simple value types and object references. Unaligned reads and writes break that promise: they might cause tearing, observing only part of the bytes being written. And much worse, they can break the garbage collector, which relies heavily on an object reference being updated atomically.
So when the CLR determines the layout of a structure or class, it must ensure that the alignment requirement is met. If it is not, it needs to leave extra unused space between the variables, and perhaps extra space at the end to ensure members are still aligned when stored in an array. The generic word for that extra space is "padding".
Specific to a class declaration: it has [StructLayout(LayoutKind.Auto)], so the CLR can shuffle the members around to achieve the best layout. Not so for a struct; structs are LayoutKind.Sequential by default. Beyond classes and structs, this alignment guarantee is also required for static variables and for the arguments and local variables of a method, but there it is not nearly as easily observed.
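A small sketch of that difference (assuming System.Runtime.CompilerServices.Unsafe and System.Runtime.InteropServices are available; the exact numbers depend on the runtime and platform):
Console.WriteLine(Unsafe.SizeOf<SequentialLayout>()); // typically prints 24 (padding around A and C)
Console.WriteLine(Unsafe.SizeOf<AutoLayout>()); // typically prints 16 (fields reordered by the CLR)
[StructLayout(LayoutKind.Sequential)] // the default for structs: fields keep declaration order
public struct SequentialLayout { public byte A; public long B; public byte C; }
[StructLayout(LayoutKind.Auto)] // the default for classes: the CLR may reorder fields
public struct AutoLayout { public byte A; public long B; public byte C; }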
The order of the fields is different; I would guess that the size is different as the members are padded (i.e. located in such a way that they begin on an even machine word in order to make access easier at the cost of memory consumption).

sizeof operator gives extra size of a struct in C# [duplicate]

This question already has answers here:
Structure padding and packing
(11 answers)
Closed 6 years ago.
I am trying to check the size of all of my variables (value types) using the sizeof operator. I went through one of the MSDN articles where it is written that
For all other types, including structs, the sizeof operator can be used only in unsafe code blocks
and also structs should not contain any fields or properties that are reference types
For this, I enabled unsafe compilation in my project properties and created a structure as follows:
struct EmployeeStruct
{
int empId;
long salary;
}
and used it as follows:
unsafe
{
size = sizeof(EmployeeStruct);
}
Console.WriteLine("Size of type in bytes is: {0}", size);
Here I am getting the output Size of type in bytes is: 16, however by looking at the structure it should be 12 (4 for the int and 8 for the long).
Can someone help me understand why I am getting 4 extra bytes?
You don't need to use unsafe code. It is recommended to use System.Runtime.InteropServices.Marshal.SizeOf()
eg: Marshal.SizeOf(new EmployeeStruct());
That returns 16 instead of 12, because the default pack size in memory is 8.
So, for:
struct EmployeeStruct
{
int empId; // 4 bytes
long salary; // 8 bytes
}
//returns 16 instead of 12 (because the struct is padded to a multiple of 8)
for:
struct EmployeeStruct
{
int empId; // 4 bytes
int empAge; // 4 bytes
long salary; // 8 bytes
}
//returns 16 too
and for
struct EmployeeStruct
{
int empId; // 4 bytes
int empAge; // 4 bytes
int IdCompany; // 4 bytes
long salary; // 8 bytes
}
returns 24 instead of 20 (because the struct is padded to a multiple of 8)
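As a sketch (assuming you control the struct and can apply StructLayoutAttribute), lowering the packing with Pack = 4 removes that extra padding for the original example:
[StructLayout(LayoutKind.Sequential, Pack = 4)]
struct PackedEmployeeStruct
{
int empId; // 4 bytes
long salary; // 8 bytes, but only aligned on a 4-byte boundary because of Pack = 4
}
// Marshal.SizeOf(typeof(PackedEmployeeStruct)) returns 12 instead of 16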
I don't know exactly what you want, but if you need the sum of each field size, you can try this function:
public int SizeOf(Type t)
{
int s = 0;
var fields = t.GetFields(BindingFlags.Public | BindingFlags.Instance | BindingFlags.NonPublic);
foreach (var f in fields)
{
var x = f.FieldType;
s += x.IsPrimitive ? Marshal.SizeOf(x) : SizeOf(x);
}
return s;
}
It returns 12 instead of 16 for your case, and you can use it for complex structures, e.g.:
struct EmployeeStruct
{
int field1; // 4 bytes
long field2; // 8 bytes
Person p; // 12 bytes
}
struct Person
{
int field1; // 4 bytes
long field2; // 8 bytes
}
SizeOf(typeof(EmployeeStruct)) will return 24 instead of 32, but remember, the REAL SIZE IN MEMORY is 32; the runtime uses 32 bytes when allocating memory.
Regards.

Combining bytes into logical values

I'm getting back data in the (expected) following format:
\u0001\u0001\u0004\0\u0001\0\0\0
Each segment represents a byte. The first two segments \u0001\u0001 represent the service version number, the second two \u0004\0 represent the status code, and the final 4 \u0001\0\0\0 equal the request id.
How can I take the fields I KNOW go together and make a logical value out of the result? For example, the status code \u0004\0 should be a signed short and the request id should be an int.
What I've played around with, but I don't know the validity:
byte s1 = 0004;
byte s2 = 0;
short statusCode = (short)(s1 | (s2 << 8));
byte r1 = 0001;
byte r2 = 0;
byte r3 = 0;
byte r4 = 0;
int requestId = (int)(r1 | (r2 << 8) | (r3 << 16) | (r4 << 24));
While your logic seems fine, manual bit shifting can become quite tedious, especially when the amount of data you have to handle increases. It's simple enough for 8 bytes, but for everything else, I would suggest you look into marshalling bytes directly into objects.
For this, define a value type that describes your data:
public struct Data
{
public short serviceVersion;
public short statusCode;
public int requestId;
}
Then you can convert the string into a byte array and marshal it as a Data object:
// raw input, as a string
string s = "\u0001\u0001\u0004\0\u0001\0\0\0";
// convert string into byte array
byte[] bytes = Encoding.UTF8.GetBytes(s);
// interpret byte array as `Data` object
GCHandle handle = GCHandle.Alloc(bytes, GCHandleType.Pinned);
Data data = (Data)Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(Data));
handle.Free();
// access the data!
Console.WriteLine(data.serviceVersion); // 257
Console.WriteLine(data.statusCode); // 4
Console.WriteLine(data.requestId); // 1
Expanding on poke's post above: if you want to get really fancy you can achieve a union-like effect
using System.Runtime.InteropServices;
[StructLayout(LayoutKind.Explicit, Size = 8)]
public struct Data
{
[FieldOffset(0)]
public short serviceVersion;
[FieldOffset(2)]
public short statusCode;
[FieldOffset(4)]
public int requestId;
[FieldOffset(0)]
public ulong Value;
}
Where the use of the Data.Value field reads all of the bits you care about.
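For example, a brief usage sketch (assuming the same little-endian byte order and the bytes array from the first answer):
var data = new Data { Value = BitConverter.ToUInt64(bytes, 0) };
Console.WriteLine(data.serviceVersion); // 257
Console.WriteLine(data.statusCode); // 4
Console.WriteLine(data.requestId); // 1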
Additionally, using unsafe you can avoid the marshalling.
// raw input, as a string
string s = "\u0001\u0001\u0004\0\u0001\0\0\0";
// convert string into byte array
byte[] bytes = Encoding.UTF8.GetBytes(s);
Data data = new Data();
unsafe
{
Data* d = &data;
fixed(byte* b = bytes)
{
*d = *((Data*)b);
}
}
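As another option (a sketch, assuming .NET Core 2.1+ where System.Buffers.Binary.BinaryPrimitives is available), the same fields can be read without marshalling or unsafe code:
// raw input, as a string
string s = "\u0001\u0001\u0004\0\u0001\0\0\0";
// convert string into byte array
byte[] bytes = Encoding.UTF8.GetBytes(s);
// read each field from its known offset, assuming little-endian wire order
short serviceVersion = BinaryPrimitives.ReadInt16LittleEndian(bytes.AsSpan(0, 2));
short statusCode = BinaryPrimitives.ReadInt16LittleEndian(bytes.AsSpan(2, 2));
int requestId = BinaryPrimitives.ReadInt32LittleEndian(bytes.AsSpan(4, 4));
Console.WriteLine(serviceVersion); // 257
Console.WriteLine(statusCode); // 4
Console.WriteLine(requestId); // 1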

C# flags enum word size

If I declare
[Flags]
public enum MyColor
{
Red = 1,
Green = 2,
Blue = 4,
White = 8,
Magenta = 16,
... (etc)
}
Is there a way to determine/set the number of bytes that this enum takes up? Also, what byte order would it end up in? (e.g. do I have to do a HostToNetwork() to properly send it over the wire?) Also, in order to call HostToNetwork, can I cast it as a byte array and iterate?
[Flags]
public enum MyColor : byte // sets the underlying type.
{
Red = 1,
Green = 2,
Blue = 4,
White = 8,
Magenta = 16,
... (etc)
}
It's not possible to directly set the endianness. You can use some well-crafted numbers that simulate big-endian bytes on a little-endian system. However, I'd always use explicit APIs for converting byte orders.
Complete answer is:
Is there a way to determine/set the number of Bytes that this enum takes up?
Yes:
[Flags]
public enum MyColor : byte // sets the underlying type.
{
Red = 1,
Green = 2,
Blue = 4,
White = 8,
Magenta = 16,
... (etc)
}
Also, what byte order would it end up in?
Whatever byte order the platform it runs on uses; in my case, x86 (little-endian).
Also, in order to call HostToNetwork, can i cast as a byte array and iterate?
This is where it's tricky. I found out a few things:
the enum's underlying type will expand (or be expanded by the ": long" you have to tag onto the end of the declaration) and it must be one of the built-in integral types. So it is actually impossible to do what I was really trying to do (an enum of 6 bytes).
serializing this structure to an array of bytes (to be converted to network order and sent over the wire) is far from straightforward. The BitConverter class does the trick, and this is pretty helpful for dancing between endiannesses: http://snipplr.com/view/15179/adapt-systembitconverter-to-handle-big-endian-network-byte-ordering-in-order-to-create-number-types-from-bytes-and-viceversa/
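For illustration, a minimal sketch (not from the original answer) that sends an int-backed flags enum in network byte order using IPAddress.HostToNetworkOrder and BitConverter:
MyColor value = MyColor.Red | MyColor.Blue;
// convert the underlying int to network (big-endian) order before sending
int networkOrder = IPAddress.HostToNetworkOrder((int)value);
byte[] wire = BitConverter.GetBytes(networkOrder);
// and convert back on the receiving side
var received = (MyColor)IPAddress.NetworkToHostOrder(BitConverter.ToInt32(wire, 0));
Console.WriteLine(received); // Red, Blue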
