We need to represent huge numbers in our application, and we're doing this using integer arrays. The final product should be tuned for maximum performance. We were thinking about encapsulating our array in a class so we could add properties related to the array, such as isNegative, numberBase, and the like.
We're afraid, however, that using classes will kill us performance-wise. We did a test where we created a fixed number of arrays and set their values, once through pure array usage and once through a class wrapping the array, with the array accessed through the class:
for (int i = 0; i < 10000; i++)
{
    if (createClass)
    {
        BigNumber b = new BigNumber(new int[5000], 10);
        for (int j = 0; j < b.Number.Length; j++)
        {
            b[j] = 5;
        }
    }
    else
    {
        int[] test = new int[5000];
        for (int j = 0; j < test.Length; j++)
        {
            test[j] = 5;
        }
    }
}
It appears that using classes slows down the running time of the above code by a factor of almost 6. We also tried encapsulating the array in a struct instead, which made the running time almost equal to pure array usage.
What is causing this huge overhead when using classes compared to structs? Is it really just the performance gain you get when you use the stack instead of the heap?
BigNumber just stores the array in a private variable exposed by a property. Simplified:
public class BigNumber
{
    private int[] number;

    public BigNumber(int[] number) { this.number = number; }

    public int[] Number { get { return number; } }
}
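(For the test loop above to compile with b[j], the real class also has an indexer; in this simplified form it would look roughly like this, inside the class:)
public int this[int index]
{
    get { return number[index]; }
    set { number[index] = value; }
}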
It's not surprising that the second loop is much faster than the first one. What's happening is not that the class is extraordinarily slow; it's that the loop is really easy for the compiler to optimize.
As the loop ranges from 0 to test.Length - 1, the compiler can tell that the index variable can never be outside of the array, so it can remove the range check when accessing the array by index.
In the first loop the compiler can't make that connection between the loop and the array, so it has to check the index against the bounds for each item that is accessed.
There will always be a bit of overhead when you encapsulate an array inside a class, but it's not as much as the difference you get in your test. You have chosen a situation where the compiler is able to optimize the plain array access very well, so what you are measuring is more the compiler's ability to optimize the code than what you set out to test.
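If you want the class-based version to benefit from the same optimization, one option (just a sketch, using the simplified one-argument BigNumber above) is to copy the array reference into a local and loop against that local's Length, so the JIT can again tie the loop bound to the array and drop the range check:
BigNumber b = new BigNumber(new int[5000]);
int[] digits = b.Number;                  // copies the reference, not the data
for (int j = 0; j < digits.Length; j++)
{
    digits[j] = 5;                        // bound is the array's own Length, so no per-access range check
}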
You should profile the code when you run it and see where the time is being spent.
Also consider another language that makes it easy to use big ints.
You're using an integer data type to store a single digit, which is just one part of a really large number. That is wrong.
The numerals 0-9 can be represented in 4 bits. A byte contains 8 bits, so you can stuff 2 digits into a single byte (there's your first speed-up hint).
Now, go look up how many bytes an integer takes up (hint: it will be way more than you need to store a single digit).
What's killing performance is the use of integers, which consumes about 4 times as much memory as you need. Use bytes or, at worst, a character array (2 digits per byte or character) to store the numerals. It doesn't take a whole lot of logic to "pack" and "unpack" the numerals into a byte.
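For illustration, a minimal sketch (my own example, not the poster's code) of packing and unpacking two decimal digits per byte with shifts and masks:
int hiDigit = 3, loDigit = 7;                    // the two digits to store

// Pack: first digit into the high nibble, second into the low nibble.
byte packed = (byte)((hiDigit << 4) | loDigit);  // 0x37 == 55

// Unpack: shift and mask to recover the digits.
int hiBack = packed >> 4;                        // 3
int loBack = packed & 0x0F;                      // 7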
On the face of it, I would not expect a big difference, and certainly not a factor of 6. BigNumber is just a class around an int[], isn't it? It would help if you showed us a little of BigNumber. And check your benchmarking.
It would be ideal if you posted something small we could copy/paste and run.
Without seeing your BigNumber implementation, it is very difficult to tell. However, I have two guesses.
1) Your loop over the plain array can get special handling by the JIT, which removes the array bounds checking. This can give you a significant boost, especially since you're not doing any "real work" in the loop:
for (int j = 0; j < test.Length; j++) // This removes bounds checking by JIT
2) Are you timing this in Release mode, outside of Visual Studio? If not, that alone could explain your 6x slowdown, since the Visual Studio hosting process slows down class access artificially. Make sure you're in Release mode and use Ctrl+F5 to test your timings.
Rather than reinventing (and debugging and perfecting) the wheel, you might be better served using an existing big integer implementation, so you can get on with the rest of your project.
This SO topic is a good start.
You might also check out this CodeProject article.
As pointed out by Guffa, the difference is mostly bounds checking.
To guarantee that bounds checking will not ruin performance, you can also put your tight loops in an unsafe block and this will eliminate bounds checking. To do this you'll need to compile with the /unsafe option.
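A minimal sketch of what that could look like (the array and the value 5 are just placeholders mirroring the question's test; remember to compile with /unsafe):
int[] test = new int[5000];
unsafe
{
    fixed (int* p = test)                 // pin the array and take a raw pointer to its first element
    {
        for (int j = 0; j < test.Length; j++)
        {
            p[j] = 5;                     // pointer write: no per-element bounds check
        }
    }
}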
//pre-load the bits -- do this only ONCE
byte[] baHi = new byte[16];
baHi[0]=0;
baHi[1] = 000 + 00 + 00 + 16; //0001
baHi[2] = 000 + 00 + 32 + 00; //0010
baHi[3] = 000 + 00 + 32 + 16; //0011
baHi[4] = 000 + 64 + 00 + 00; //0100
baHi[5] = 000 + 64 + 00 + 16; //0101
baHi[6] = 000 + 64 + 32 + 00; //0110
baHi[7] = 000 + 64 + 32 + 16; //0111
baHi[8] = 128 + 00 + 00 + 00; //1000
baHi[9] = 128 + 00 + 00 + 16; //1001
//not needed for 0-9
//baHi[10] = 128 + 00 + 32 + 00; //1010
//baHi[11] = 128 + 00 + 32 + 16; //1011
//baHi[12] = 128 + 64 + 00 + 00; //1100
//baHi[13] = 128 + 64 + 00 + 16; //1101
//baHi[14] = 128 + 64 + 32 + 00; //1110
//baHi[15] = 128 + 64 + 32 + 16; //1111
//-------------------------------------------------------------------------
//START PACKING
//load TWO digits (0-9) at a time
//this means if you're loading a big number from
//a file, you read two digits at a time
//and put them into bLoVal and bHiVal
//230942034371231235 see that '37' in the middle?
// ^^
//
byte bHiVal = 3; //0000 0011
byte bLoVal = 7; //0000 0111
byte bShiftedLeftHiVal = (byte)baHi[bHiVal]; //0011 0000 =3, shifted (48)
//fuse the two together into a single byte
byte bNewVal = (byte)(bShiftedLeftHiVal + bLoVal); //0011 0111 = 55 decimal
//now store bNewVal wherever you want to store it
//for later retrieval, like a byte array
//END PACKING
//-------------------------------------------------------------------------
Response.Write("PACKING: hi: " + bHiVal + " lo: " + bLoVal + " packed: " + bNewVal);
Response.Write("<br>");
//-------------------------------------------------------------------------
//START UNPACKING
byte bUnpackedLoByte = (byte)(bNewVal & 15); //will yield 7
byte bUnpackedHiByte = (byte)(bNewVal & 240); //will yield 48
//now we need to change '48' back into '3'
string sHiBits = "00000000" + Convert.ToString(bUnpackedHiByte, 2); //drops leading 0s, so we pad...
sHiBits = sHiBits.Substring(sHiBits.Length - 8, 8); //and get the last 8 characters
sHiBits = ("0000" + sHiBits).Substring(0, 8); //shift right
bUnpackedHiByte = (byte)Convert.ToSByte(sHiBits, 2); //and, finally, get back the original byte
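//(note: this whole string round-trip is just a right shift; (byte)(bNewVal >> 4) yields the same 3)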
//the above method, reworked, could also be used to PACK the data,
//though it might be slower than hitting an array.
//You can also loop through baHi to unpack, comparing the original
//bUnpackedHyByte to the contents of the array and return
//the index of where you found it (the index would be the
//unpacked digit)
Response.Write("UNPACKING: input: " + bNewVal + " hi: " + bUnpackedHiByte + " lo: " + bUnpackedLoByte);
//now create your output with bUnpackedHiByte and bUnpackedLoByte,
//then move on to the next two bytes in where ever you stored the
//really big number
//END UNPACKING
//-------------------------------------------------------------------------
Even if you just changed your INT to SHORT in your original solution, you'd chop your memory requirements in half. The above takes memory down to almost the bare minimum (I'm sure someone will come along screaming about a few wasted bytes).
I am trying to find a simple algorithm that reverses the bits of a number, up to N bits. For example:
For N = 2:
01 -> 10
11 -> 11
For N = 3:
001 -> 100
011 -> 110
101 -> 101
The only thing I keep finding is how to bit-reverse a full byte, but that only works for N = 8 and that's not always what I need.
Does anyone know an algorithm that can do this bitwise operation? I need to do many of them for an FFT, so I'm looking for something that can be very optimised too.
Here is a C# implementation of the bitwise reverse operation:
public uint Reverse(uint a, int length)
{
    uint b = 0b_0;
    for (int i = 0; i < length; i++)
    {
        b = (b << 1) | (a & 0b_1);
        a = a >> 1;
    }
    return b;
}
The code above first shifts the output value one position to the left, adds the lowest bit of the input to the output, and then shifts the input one position to the right, repeating until all of the requested bits have been processed. Here are some samples:
uint a = 0b_1100;
uint b = Reverse(a, 4); //should be 0b_0011;
And
uint a = 0b_100;
uint b = Reverse(a, 3); //should be 0b_001;
This implementation's time complexity is O(N), where N is the length of the input.
Here's a small look-up table solution that's good for (2<=N<=32).
For N==8, I think everyone agrees that a 256-entry byte lookup table is the way to go. Similarly, for N from 2 to 7, you could create 4-, 8-, ..., 128-entry lookup byte arrays.
For N==16, you could flip each byte and then reorder the two bytes. Similarly, for N==24, you could flip each byte and then reorder things (which would leave the middle one flipped but in the same position). It should be obvious how N==32 would work.
For N==9, think of it as three 3-bit numbers (flip each of them, reorder them and then do some masking and shifting to get them in the right position). For N==10, it's two 5-bit numbers. For N==11, it's two 5-bit numbers on either side of a center bit that doesn't change. The same for N==13 (two 6-bit numbers around an unchanging center bit). For a prime like N==23, it would be a pair of 8-bit numbers around a center 7-bit number.
For the odd numbers between 24 and 32 it gets more complicated. You probably need to consider five separate numbers. Consider N==29: that could be four 7-bit numbers around an unchanging center bit. For N==31, it would be a center bit surrounded by a pair of 8-bit numbers and a pair of 7-bit numbers.
That said, that's a ton of complicated logic. It would be a bear to test. It might be faster than @MuhammadVakili's bit shifting solution (it certainly would be for N<=8), but it might not. I suggest you go with his solution.
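For what it's worth, here is a rough sketch (my own, untested against the other answers) of the table-driven approach for N==8 and N==16; the 256-entry table is built once up front:
static class BitRev
{
    // Rev8[b] is byte b with its 8 bits reversed.
    static readonly byte[] Rev8 = BuildTable();

    static byte[] BuildTable()
    {
        var t = new byte[256];
        for (int i = 0; i < 256; i++)
        {
            int r = 0;
            for (int b = 0; b < 8; b++)
                if ((i & (1 << b)) != 0)
                    r |= 1 << (7 - b);
            t[i] = (byte)r;
        }
        return t;
    }

    // N == 8: a single table lookup.
    public static byte Reverse8(byte x) => Rev8[x];

    // N == 16: reverse each byte via the table, then swap the two bytes.
    public static ushort Reverse16(ushort x) =>
        (ushort)((Rev8[x & 0xFF] << 8) | Rev8[(x >> 8) & 0xFF]);
}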
Using string manipulation?
static void Main(string[] args)
{
    uint number = 269;
    int numBits = 4;
    string strBinary = Convert.ToString(number, 2).PadLeft(32, '0');
    Console.WriteLine($"{number}");
    Console.WriteLine($"{strBinary}");
    string strBitsReversed = new string(strBinary.Substring(strBinary.Length - numBits, numBits).ToCharArray().Reverse().ToArray());
    string strBinaryModified = strBinary.Substring(0, strBinary.Length - numBits) + strBitsReversed;
    uint numberModified = Convert.ToUInt32(strBinaryModified, 2);
    Console.WriteLine($"{strBinaryModified}");
    Console.WriteLine($"{numberModified}");
    Console.Write("Press Enter to Quit.");
    Console.ReadLine();
}
Output:
269
00000000000000000000000100001101
00000000000000000000000100001011
267
Press Enter to Quit.
Currently I'm working on a solution for a prime-number calculator/checker. The algorithm is already working and very efficient (0.359 seconds for the first 9012330 primes). Here is a part of the upper region where everything is declared:
const uint anz = 50000000;
uint a = 3, b = 4, c = 3, d = 13, e = 12, f = 13, g = 28, h = 32;
bool[,] prim = new bool[8, anz / 10];
uint max = 3 * (uint)(anz / (Math.Log(anz) - 1.08366));
uint[] p = new uint[max];
Now I wanted to go to the next level and use ulongs instead of uints to cover a larger range (you can see that already), which is where I ran into my problem: the bool array.
As everybody knows, a bool takes up a whole byte, which wastes a lot of memory when creating the array... So I'm searching for a more resource-friendly way to do this.
My first idea was a bit array -> not a byte array! <- to store the bools, but I haven't figured out how to do that yet. So if someone has ever done something like this, I would appreciate any kind of tips and solutions. Thanks in advance :)
You can use BitArray collection:
http://msdn.microsoft.com/en-us/library/system.collections.bitarray(v=vs.110).aspx
MSDN Description:
Manages a compact array of bit values, which are represented as Booleans, where true indicates that the bit is on (1) and false indicates the bit is off (0).
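For example, a small sketch of how System.Collections.BitArray could back a simple sieve (the limit of 100 is only to keep the demo short):
using System;
using System.Collections;

class BitArraySieveDemo
{
    static void Main()
    {
        const int limit = 100;
        var isComposite = new BitArray(limit + 1);    // one bit per number, all false initially

        for (int i = 2; i * i <= limit; i++)
        {
            if (isComposite[i]) continue;
            for (int j = i * i; j <= limit; j += i)
                isComposite[j] = true;                // mark multiples as composite
        }

        for (int n = 2; n <= limit; n++)
            if (!isComposite[n]) Console.Write(n + " ");
    }
}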
You can (and should) use well tested and well known libraries.
But if you're looking to learn something (as seems to be the case), you can do it yourself.
Another reason you may want a custom bit array is to use the hard drive to store the array, which comes in handy when calculating primes. To do this you'd need to further split addr: for example, the lowest 3 bits for the mask, the next 28 bits for 256MB of in-memory storage, and the bits above that to select a buffer file on disk.
Yet another reason for a custom bit array is to compress memory use when specifically searching for primes. After all, more than half of your bits will be 'false' because the numbers corresponding to them are even, so you can both speed up your calculation AND reduce memory requirements if you don't store the even bits at all. You can do that by changing the way addr is interpreted. Furthermore, you can also exclude numbers divisible by 3 (only 2 out of every 6 numbers have a chance of being prime), thus reducing memory requirements by 60% compared to a plain bit array.
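As a sketch of that idea (mine, not part of the code below): store bits only for odd candidates, so a number n >= 3 maps to bit (n - 3) / 2 and back again.
// Only odd candidates (3, 5, 7, ...) get a bit; even numbers are never stored.
static long BitIndexFor(long n) => (n - 3) / 2;     // n assumed odd, n >= 3
static long NumberAt(long bitIndex) => 2 * bitIndex + 3;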
Notice the use of shift and logical operators to make the code a bit more efficient.
byte mask = (byte)(1 << (int)(addr & 7)); for example can be written as
byte mask = (byte)(1 << (int)(addr % 8));
and addr >> 3 can be written as addr / 8
Testing shift/logical operators vs division shows 2.6s vs 4.8s in favor of shift/logical for 200000000 operations.
Here's the code:
void Main()
{
    var barr = new BitArray(10);
    barr[4] = true;
    Console.WriteLine("Is it " + barr[4]);
    Console.WriteLine("Is it Not " + barr[5]);
}

public class BitArray
{
    private readonly byte[] _buffer;

    public bool this[long addr]
    {
        get
        {
            byte mask = (byte)(1 << (int)(addr & 7));
            byte val = _buffer[(int)(addr >> 3)];
            bool bit = (val & mask) == mask;
            return bit;
        }
        set
        {
            byte mask = (byte)(1 << (int)(addr & 7));
            int offs = (int)(addr >> 3);
            if (value)
                _buffer[offs] = (byte)(_buffer[offs] | mask);   // switch the bit on
            else
                _buffer[offs] = (byte)(_buffer[offs] & ~mask);  // switch the bit off
        }
    }

    public BitArray(long size)
    {
        // A byte buffer sized to hold 8 bools per byte. The spare +1 avoids dealing with rounding.
        _buffer = new byte[size / 8 + 1];
    }
}
We wrote a crude data scope.
(The freeware terminal programs we found were unable to keep up with Bluetooth speeds)
The results are okay, and we are writing them to a comma-separated file for use with a spreadsheet. It would be better to see the hex values line up in nice columns in the RichTextBox instead of the way it looks now (screen cap appended).
This is the routine that adds the values (i.e., numbers from 0 to FF) to the text in the RichTextBox.
public void Write(byte[] b)
{
    if (writting)
    {
        for (int i = 0; i < b.Length; i++)
        {
            storage[sPlace++] = b[i];
            pass += b[i].ToString("X") + " "; //// <<<--- Here is the problem
            if (sPlace % numericUpDown1.Value == 0)
            {
                pass += "\r\n";
            }
        }
    }
}
I would like a way for the instruction pass += b[i].ToString("X") + " "; to produce a leading zero on values from 00h to 0Fh
Or, some other way to turn the value in byte b into two hexadecimal characters from 00 to FF.
The digits on the left (FF 40 0 5) line up nice and neatly because they are identical. As soon as we encounter any difference in the data, the columns vanish and the data becomes extremely difficult to read by eye.
Use a standard numeric format string:
pass += b[i].ToString("X2") + " ";
The MSDN documentation, Standard Numeric Format Strings, has examples.
I'm currently trying to get this C code converted into C#.
Since I'm not really familiar with C, I'd really appreciate your help!
static unsigned char byte_table[2080] = {0};
First off, a byte array gets declared but never filled, which I'm okay with.
BYTE* packet = //bytes come in here from a file
int unknownVal = 0;
int unknown_field0 = *(DWORD *)(packet + 0x08);
do
{
    *((BYTE *)packet + i) ^= byte_table[(i + unknownVal) & 0x7FF];
    ++i;
}
while (i <= packet[0]);
But down here... I really have no idea how to translate this into C#.
BYTE = byte[], right?
DWORD = double?
But how can (packet + 0x08) be translated? How can I add a hex offset to a byte array? Oo
I'd be happy about anything that helps! :)
In C, setting any set of memory to {0} will set the entire memory area to zeroes, if I'm not mistaken.
That bottom loop can be rewritten in a simpler, C#-friendly fashion.
byte[] packet = arrayofcharsfromfile;
int field = packet[8] + (packet[9] << 8) + (packet[10] << 16) + (packet[11] << 24); // Assuming 32-bit little-endian integer
int unknownval = 0;
int i = 0;
do // Why waste the newline? I don't know. Conventions are silly!
{
    packet[i] ^= byte_table[(i + unknownval) & 0x7FF];
} while (++i <= packet[0]);
field is set by taking the four bytes including and following index 8 and generating a 32 bit int from them.
In C, you can cast pointers to other types, as is done in your provided snippet. What they're doing is taking an array of bytes (each one 1/4 the size of a DWORD) and adding 8 to the index, which advances the pointer by 8 bytes (since each element is a byte wide), and then treating that pointer as a DWORD pointer. In simpler terms, they're turning the byte array into a DWORD array and then taking index 2, as 8/4=2.
You can simulate this behavior in a safe fashion by stringing the bytes together with bitshifting and addition, as I demonstrated above. It's not as efficient and isn't as pretty, but it accomplishes the same thing, and in a platform agnostic way too. Not all platforms are little endian.
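If you would rather not do the shifting yourself, a one-line alternative (a sketch; it relies on the machine being little-endian, which BitConverter.IsLittleEndian can confirm) is:
// packet is the byte[] read from the file, as in the snippet above
int field = BitConverter.ToInt32(packet, 8);   // reads the same four bytes in the machine's byte order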
I have written the following function:
public void TestSB()
{
    string str = "The quick brown fox jumps over the lazy dog.";
    StringBuilder sb = new StringBuilder();
    int j = 0;
    int len = 0;
    try
    {
        for (int i = 0; i < (10000000 * 2); i++)
        {
            j = i;
            len = sb.Length;
            sb.Append(str);
        }
        Console.WriteLine("Success ::" + sb.Length.ToString());
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message + " :: " + j.ToString() + " :: " + len.ToString());
    }
}
Now, I suppose that StringBuilder has the capacity to hold over 2 billion characters (2,147,483,647 to be precise).
But when I ran the above function, it threw a System.OutOfMemoryException on reaching a length of only about 800 million characters.
Moreover, I am seeing widely different results on different PCs with the same memory and a similar amount of load.
Can anyone please provide or explain me the reason for this?
Each character requires 2 bytes (as a char in .NET is a UTF-16 code unit). So by the time you've reached 800 million characters, that's 1.6GB of contiguous memory required¹. Now when the StringBuilder needs to resize itself, it has to create another array of the new size (which I believe tries to double the capacity), which means trying to allocate a 3.2GB array.
I believe that the CLR (even on 64-bit systems) can't allocate a single object of more than 2GB in size. (That certainly used to be the case.) My guess is that your StringBuilder is trying to double in size, and blowing that limit. You may be able to get a little higher by constructing the StringBuilder with a specific capacity - a capacity of around a billion may be feasible.
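For example (just a sketch; whether the initial allocation itself succeeds still depends on the process and available memory):
// Pre-sizing avoids the repeated doubling, though the ~2GB single-object limit still applies.
StringBuilder sb = new StringBuilder(1000000000);   // roughly a billion characters of capacity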
In the normal course of things this isn't a problem, of course - even strings requiring hundreds of megs are rare.
¹ I believe the implementation of StringBuilder actually changed in .NET 4 to use fragments in some situations - but I don't know the details. So it may not always need contiguous memory while still in builder form... but it would if you ever called ToString.