Given that:
int onlyLastByteContainsValue = 35;
Which of the following is faster and why?
byte valueInByte = BitConverter.GetBytes(onlyLastByteContainsValue)[3];
Or
byte valueInByte = (byte)onlyLastByteContainsValue;
Follow-up question: Are there other differences between the two above?
Naturally, the cast will be faster; from my profiling it is faster by up to 5x with optimizations off (and even more so with optimizations on).
Of course there are different things going on:
Your BitConverter statement allocates an array with a size of sizeof(int), fills it with all the bytes of the int value, and then indexes the array to retrieve only one byte. It goes without saying that this is wasteful of resources.
The cast simply takes the least significant byte of the value; a range check is performed only in a checked context (explicit numeric conversions are unchecked by default).
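A small sketch (not from the original answer) illustrating the difference, including the endianness caveat that determines which array index holds the low byte:

int value = 300;                            // 0x012C -- does not fit in a byte

byte truncated = (byte)value;               // 0x2C: the low byte, no exception by default
// byte checkedCast = checked((byte)value); // would throw OverflowException

byte[] bytes = BitConverter.GetBytes(value);
// On little-endian platforms (x86/x64) the low byte is bytes[0];
// on a big-endian platform it would be bytes[3].
byte fromArray = BitConverter.IsLittleEndian ? bytes[0] : bytes[3];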
Related
Should primitive array content be accessed by int for best performance?
Here's an example
int[] arr = new int[] { 1, 2, 3, 4, 5 };
The array is only 5 elements long, so the index doesn't have to be an int; it could be a short or a byte, which would save 3 bytes of useless memory allocation if a byte is used instead of an int. Of course, that only works if I know the array won't grow beyond 255 elements.
byte index = 1;
int value = arr[index];
But does this work as well as it sounds?
I'm worried about how this is executed at a lower level: does the index get cast to an int, or are there other operations that would actually slow the whole process down instead of optimizing it?
In C and C++, arr[index] is formally equivalent to *(arr + index). Your concerns about casting should be answerable in terms of the simpler question of what the machine will do when it needs to add an integer offset to a pointer.
I think it's safe to say that on most modern machines, when you add a "byte" to a pointer, it's going to use the same instruction as it would if you added a 32-bit integer to a pointer. And indeed it's still going to represent that byte using the machine word size, padded with some unused space. So this isn't going to make using the array faster.
Your optimization might make a difference if you need to store millions of these indices in a table; then using byte instead of int would use a quarter of the memory and take less time to move that memory around. If the array you are indexing is huge, and the index needs to be larger than the machine word size, then that's a different consideration. But I think it's safe to say that in most normal situations this optimization doesn't really make sense, and size_t is probably the most appropriate generic type for array indices, all things being equal (since it corresponds exactly to the machine word size on the majority of architectures).
does index gets casted to int or other operations which would actually slow down the whole process instead of this optimizing it
No, but
that would save useless 3 byte memory allocation
You don't gain anything by saving 3 bytes.
Only if you are storing a huge array of those indices might the amount of space you'd save make it a worthwhile investment.
Otherwise stick with a plain int, it's the processor's native word size and thus the fastest.
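For instance (a rough illustration, not from the original answers), the saving only shows up when the index table itself is large:

// One million small indices: the element type changes the table size, not the lookup speed.
byte[] byteIndices = new byte[1000000];   // ~1 MB
int[]  intIndices  = new int[1000000];    // ~4 MB
// Indexing an array with either type compiles to the same kind of address computation.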
I'm porting some C# decompression code to AS3, and since it's doing some pretty complex stuff, it's using a range of datatypes such as byte and short. The problem is, AS3 doesn't have those datatypes.
For the most part I can use uint to hold these values. However, at some points, I get a line such as:
length[symbol++] = (short)len;
To my understanding, this means that len must be read and assigned to the length array as a short. So I'm wondering, how would I do this in AS3? I'm guessing perhaps to do:
length[symbol++] = len & 0xFF;
But I'm unsure if this would give a proper result.
So basically, my question is this: how do I make sure to keep the correct number of bytes when doing this sort of stuff in AS3? Maybe I should use ByteArrays instead?
Depending on the reason why the cast is in the C# code, you may or may not need to keep it in the AS3 code. If the cast is purely to adjust the type to the element type of the length array (i.e. there is no loss of precision), then you don't need the cast. If len can actually be bigger than 0x7FFF, you'll need to perform some equivalent of the cast.
I think ByteArray may be a reasonable option if you need to handle the result the way a C# StreamReader does, but random access may be harder than necessary.
Note that short is 2 bytes long (a synonym for System.Int16), so to convert to it using bit manipulations you need to do & 0xFFFF. Also be very careful if casting between signed and unsigned types...
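To illustrate in C# terms (a small sketch, not from the original answer) why the mask has to be 0xFFFF rather than 0xFF:

int len = 0x12ABCD;

int low8  = len & 0xFF;     // 0xCD   -- keeps only 8 bits, loses data a short would keep
int low16 = len & 0xFFFF;   // 0xABCD -- keeps the 16 bits that a short holds
short cast = (short)len;    // also 0xABCD, but reinterpreted as signed: -21555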
I wanted to try to allocate a 4 billion bytes array and this is my C# code:
long size = 4 * 1000;
size *= 1000;
size *= 1000;
byte[] array = new byte[size];
This code fails with a System.OverflowException on the line containing new. Okay, it turns out Length returns an int, so the array length is also limited to what an int can store.
Then why is there no compile-time error and long is allowed to be used as the number of array elements at allocation?
Because the specification says so in section 7.6.10.4:
Each expression in the expression list must be of type int, uint, long, or ulong, or implicitly convertible to one or more of these types.
This is most likely to easily allow creation of arrays larger than 2 GiB, even though they are not supported yet (but will be, without a language change, once the CLR makes such a change). Mono does support this already, however, and .NET 4.5 apparently will allow larger arrays too.
Regarding array length being an int, by the way: there is also LongLength, returning a long. This was added in .NET 1.1 and was probably a future-proofing change.
Why is long allowed as an array length?
Answer: because long in .NET means Int64,
and array indices can be Int64 according to the specification.
Second question: why is an OverflowException thrown?
Because no single object can be allocated more than 2 GB of memory.
It is a limitation of the CLR: no single object can exceed 2 GB, including arrays:
Large array C# OutOfMemoryException
This is regardless of 32-bit or 64-bit OSs. That said, it doesn't stop you from using more than that amount in total, just not on one object.
It is a runtime error because if you keep the long (or other initializing value) within range, it will work.
You can initialize arrays with all integral types: sbyte, char, short, int, and long - all compile; the unsigned variants work too.
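For example (a quick sketch): the narrower types are implicitly widened to int, while the 64-bit types compile fine and are only range-checked at run time:

sbyte sb = 5; char c = (char)5; short s = 5;
var a1 = new byte[sb];   // sbyte, char and short are widened to int
var a2 = new byte[c];
var a3 = new byte[s];

long l = 5; uint u = 5; ulong ul = 5;
var a4 = new byte[l];    // long, uint and ulong are accepted directly;
var a5 = new byte[u];    // a value outside the supported range throws
var a6 = new byte[ul];   // OverflowException at run time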
There is a solution in .NET 4.5 and later for allowing a larger size for an array:
<runtime>
<gcAllowVeryLargeObjects enabled="true" />
</runtime>
See documentation.
The long is an integral type, so it can be used to define the array; the exception is from building too large of an array, not specifically from using a long.
Ex, this works just fine:
long size = 20;
byte[] array = new byte[size];
Just to avoid reinventing the wheel, I am asking here...
I have an application with lots of arrays, and it is running out of memory.
So the thought is to compress the List<int> into something else that would have the same interface (IList<T>, for example), but instead of int I could use shorter integers.
For example, if my value range is 0 - 1,000,000 I need only log2(1,000,000) ≈ 20 bits. So instead of storing 32 bits, I can trim the excess and reduce memory requirements by 12/32 = 37.5%.
Do you know of an implementation of such an array? C++ and Java would also be OK, since I could easily convert them to C#.
Additional requirements (since everyone is starting to talk me OUT of the idea):
integers in the list ARE unique
they have no special property, so they aren't compressible in any other way than reducing the bit count
if the value range is one million for example, lists would be from 2 to 1000 elements in size, but there will be plenty of them, so no BitSets
the new data container should behave like a resizable array (regarding the big-O complexity of its methods)
EDIT:
Please don't tell me NOT to do it. The requirement for this is well thought out, and it is the ONLY option that is left.
Also, the 1M value range and 20 bits for it are ONLY AN EXAMPLE. I have cases with all different ranges and integer sizes.
Also, I could have even shorter integers, for example 7-bit integers; then the packing would be
00000001
11111122
22222333
33334444
444.....
for the first five 7-bit values (35 bits), packed into 5 bytes, with 5 bits of the last byte left over for the next value.
Almost done coding it - will be posted soon...
Since you can only allocate memory in byte quanta, you are essentially asking if/how you can fit the integers in 3 bytes instead of 4 (but see point 3 below). This is not a good idea.
1. Since there is no 3-byte sized integer type, you would need to use something else (e.g. an opaque 3-byte buffer) in its place. This would require that you wrap all access to the contents of the list in code that performs the conversion so that you can still put "ints" in and pull "ints" out.
2. Depending on both the architecture and the memory allocator, requesting 3-byte chunks might not affect the memory footprint of your program at all (it might simply litter your heap with unusable 1-byte "holes").
3. Reimplementing the list from scratch to work with an opaque byte array as its backing store would avoid the two previous issues (and it can also let you squeeze every last bit of memory instead of just whole bytes), but it's a tall order and quite prone to error.
You might instead want to try something like:
Not keeping all this data in memory at the same time. At 4 bytes per int, you'd need to reach hundreds of millions of integers before memory runs out. Why do you need all of them at the same time?
Compressing the dataset by not storing duplicates if possible. There are bound to be a few of them if you are up to hundreds of millions.
Changing your data structure so that it stores differences between successive values (deltas), if that is possible. This might not be very hard to achieve, but you can only realistically expect something in the ballpark of a 50% improvement (which may not be enough), and it will totally destroy your ability to index into the list in constant time.
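A minimal sketch of the delta idea, assuming the values are kept sorted (the sample numbers are made up):

using System.Collections.Generic;

List<int> values = new List<int> { 10, 15, 115, 120 };  // hypothetical sorted data
List<int> deltas = new List<int>();
int previous = 0;
foreach (int v in values)
{
    deltas.Add(v - previous);   // 10, 5, 100, 5 -- each gap is much smaller than the value
    previous = v;
}
// Reading element i back requires summing the first i+1 deltas,
// which is why constant-time indexing is lost.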
One option that will get you from 32 bits to 24 bits is to create a custom struct that stores an integer inside 3 bytes:
public struct Entry {
    byte b1; // low
    byte b2; // middle
    byte b3; // high

    public void Set(int x) {
        b1 = (byte)x;
        b2 = (byte)(x >> 8);
        b3 = (byte)(x >> 16);
    }

    public int Get() {
        return (b3 << 16) | (b2 << 8) | b1;
    }
}
You can then just create a List<Entry>.
var list = new List<Entry>();
var e = new Entry();
e.Set(12312);
list.Add(e);
Console.WriteLine(list[0].Get()); // outputs 12312
This reminds me of base64 and similar kinds of binary-to-text encoding.
They take 8 bit bytes and do a bunch of bit-fiddling to pack them into 4-, 5-, or 6-bit printable characters.
This also reminds me of the Zork Standard Code for Information Interchange (ZSCII), which packs 3 letters into 2 bytes, where each letter occupies 5 bits.
It sounds like you want to take a bunch of 10- or 20-bit integers and pack them into a buffer of 8-bit bytes.
The source code is available for many libraries that handle a packed array of single bits.
Perhaps you could (a) download that source code and modify it (starting from some BitArray or other packed encoding), recompiling to create a new library that handles packing and unpacking 10- or 20-bit integers rather than single bits.
It may take less programming and testing time to
(b) write a library that, from the outside, appears to act just like (a), but internally it breaks up 20-bit integers into 20 separate bits, then stores them using an (unmodified) BitArray class.
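A rough sketch of approach (b), assuming 20-bit unsigned values; the class name PackedIntList is made up:

using System.Collections;

class PackedIntList
{
    private const int BitsPerValue = 20;   // assumed width of each stored value
    private readonly BitArray bits;

    public PackedIntList(int capacity)
    {
        bits = new BitArray(capacity * BitsPerValue);
    }

    public void Set(int index, int value)
    {
        // Store the value bit by bit in an unmodified BitArray.
        for (int i = 0; i < BitsPerValue; i++)
            bits[index * BitsPerValue + i] = ((value >> i) & 1) == 1;
    }

    public int Get(int index)
    {
        // Reassemble the value from its individual bits.
        int value = 0;
        for (int i = 0; i < BitsPerValue; i++)
            if (bits[index * BitsPerValue + i])
                value |= 1 << i;
        return value;
    }
}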
Edit: Given that your integers are unique you could do the following: store unique integers until the number of integers you're storing is half the maximum number. Then switch to storing the integers you don't have. This will reduce the storage space by 50%.
Might be worth exploring other simplification techniques before trying to use 20-bit ints.
How do you treat duplicate integers? If you have lots of duplicates you could reduce the storage size by storing the integers in a Dictionary<int, int> where keys are unique integers and values are the corresponding counts. Note this assumes you don't care about the order of your integers.
Are your integers all unique? Perhaps you're storing lots of unique integers in the range 0 to 100 mil. In this case you could try storing the integers you don't have. Then when determining if you have an integer i just ask if it's not in your collection.
I'm trying to optimize some code where I have a large number of arrays containing structs of different sizes, but based on the same interface. In certain cases the structs are larger and hold more data, other times they are small structs, and other times I would prefer to keep null as a value to save memory.
My first question is: is it a good idea to do something like this? I previously had an array of my full data struct, but when testing a mixed approach it looked like I could save a lot of memory. Are there any other downsides?
I've been trying out different things, and it seems to work quite well when making an array of a common interface, but I'm not sure I'm checking the size of the array correctly.
I've simplified the example quite a bit. Here I'm adding values of different types to an array, but I'm unable to determine the size using the traditional Marshal.SizeOf method. Would it be correct to simply iterate through the collection and sum the size of each value in it?
IComparable[] myCollection = new IComparable[1000];
myCollection[0] = null;
myCollection[1] = (int)1;
myCollection[2] = "helloo world";
myCollection[3] = long.MaxValue;
System.Runtime.InteropServices.Marshal.SizeOf(myCollection);
The last line will throw this exception:
Type 'System.IComparable[]' cannot be marshaled as an unmanaged structure; no meaningful size or offset can be computed.
Excuse the long post:
Is this an optimal and usable solution?
How can I determine the size of my array?
I may be wrong but it looks to me like your IComparable[] array is a managed array? If so then you can use this code to get the length
int arrayLength = myCollection.Length;
If you are doing platform interop between C# and C++, then the answer to your question headline "Can I find the length of an unmanaged array" is no, it's not possible. Function signatures with arrays in C/C++ tend to follow this pattern:
void doSomeWorkOnArrayUnmanaged(int * myUnmanagedArray, int length)
{
// Do work ...
}
In .NET the array itself is an object that carries some basic information, such as its size, its runtime type, etc. Therefore we can use this:
void DoSomeWorkOnManagedArray(int [] myManagedArray)
{
int length = myManagedArray.Length;
// Do work ...
}
Whenever using platform invoke to interop between C# and C++ you will need to pass the length of the array to the receiving function, as well as pin the array (but that's a different topic).
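On the C# side this might look roughly as follows (a sketch; the library name NativeLib.dll is made up):

using System.Runtime.InteropServices;

[DllImport("NativeLib.dll")]
static extern void doSomeWorkOnArrayUnmanaged(int[] myUnmanagedArray, int length);

// The length must be passed explicitly, because the native side
// cannot recover it from the pointer alone.
int[] data = { 1, 2, 3 };
doSomeWorkOnArrayUnmanaged(data, data.Length);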
Does this answer your question? If not, please clarify.
Optimality always depends on your requirements. If you really need to store many elements of different classes/structs, your solution is completely viable.
However, I guess your expectations of the data structure might be misleading: array elements are by definition all of the same size. This is even true in your case: your array doesn't store the elements themselves but references (pointers) to them. The elements are allocated somewhere on the VM heap. So your data structure actually goes like this: it is an array of 1000 pointers, each pointer pointing to some data. The size of each particular element may of course vary.
This leads to the next question: the size of your array. What are you intending to do with the size? Do you need to know how many bytes to allocate when you serialize your data to some persistent storage? This depends on the serialization format... Or do you need just a rough estimate of how much memory your structure is consuming?

In the latter case you need to consider the array itself and the size of each particular element. The array which you gave in your example consumes approximately 1000 times the size of a reference (which should be 4 bytes on a 32-bit machine and 8 bytes on a 64-bit machine). To compute the sizes of each element, you can indeed iterate over the array and sum up the size of the particular elements. Please be aware that this is only an estimate: the virtual machine adds some memory management overhead which is hard to determine exactly...
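For instance, a rough estimating helper (a sketch under the assumptions noted in the comments; it ignores object headers and boxing overhead, which make the real footprint larger):

// Assumes 64-bit references; strings are counted by their UTF-16 payload,
// other elements by the unmanaged size of their runtime type.
static long EstimateSize(IComparable[] array)
{
    long total = array.Length * 8L;   // the reference slots of the array itself
    foreach (var element in array)
    {
        if (element == null) continue;
        if (element is string s)
            total += s.Length * 2L;
        else
            total += System.Runtime.InteropServices.Marshal.SizeOf(element.GetType());
    }
    return total;
}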