Why arithmetic shift halfs a number only in SOME incidents? - c#

Hey, I'm self-learning about bitwise, and I saw somewhere in the internet that arithmetic shift (>>) by one halfs a number. I wanted to test it:
44 >> 1 returns 22, ok
22 >> 1 returns 11, ok
11 >> 1 returns 5, and not 5.5, why?
Another Example:
255 >> 1 returns 127
127 >> 1 returns 63 and not 63.5, why?
Thanks.

The bit shift operator doesn't actually divide by 2. Instead, it moves the bits of the number to the right by the number of positions given on the right hand side. For example:
00101100 = 44
00010110 = 44 >> 1 = 22
Notice how the bits in the second line are the same as the line above, merely
shifted one place to the right. Now look at the second example:
00001011 = 11
00000101 = 11 >> 1 = 5
This is exactly the same operation as before. However, the result of 5 is due to the fact that the last bit is shifted to the right and disappears, creating the result 5. Because of this behavior, the right-shift operator will generally be equivalent to dividing by two and then throwing away any remainder or decimal portion.

11 in binary is 1011
11 >> 1
means you shift your binary representation to the right by one step.
1011 >> 1 = 101
Then you have 101 in binary which is 1*1 + 0*2 + 1*4 = 5.
If you had done 11 >> 2 you would have as a result 10 in binary i.e. 2 (1*2 + 0*1).
Shifting by 1 to the right transforms sum(A_i*2^i) [i=0..n] in sum(A_(i+1)*2^i) [i=0..n-1]
that's why if your number is even (i.e. A_0 = 0) it is divided by two. (sorry for the customised LateX syntax... :))

Binary has no concept of decimal numbers. It's returning the truncated (int) value.
11 = 1011 in binary. Shift to the right and you have 101, which is 5 in decimal.

Bit shifting is the same as division or multiplication by 2^n. In integer arithmetics the result gets rounded towards zero to an integer. In floating-point arithmetics bit shifting is not permitted.
Internally bit shifting, well, shifts bits, and the rounding simply means bits that fall off an edge simply getting removed (not that it would actually calculate the precise value and then round it). The new bits that appear on the opposite edge are always zeroes for the right hand side and for positive values. For negative values, one bits are appended on the left hand side, so that the value stays negative (see how two's complement works) and the arithmetic definition that I used still holds true.

In most statically-typed languages, the return type of the operation is e.g. "int". This precludes a fractional result, much like integer division.
(There are better answers about what's 'under the hood', but you don't need to understand those to grok the basics of the type system.)

Related

Reading and writing non-standard floating point values

I'm working with a binary file (3d model file for an old video game) in C#. The file format isn't officially documented, but some of it has been reverse-engineered by the game's community.
I'm having trouble understanding how to read/write the 4-byte floating point values. I was provided this explanation by a member of the community:
For example, the bytes EE 62 ED FF represent the value -18.614.
The bytes are little endian ordered. The first 2 bytes represent the decimal part of the value, and the last 2 bytes represent the whole part of the value.
For the decimal part, 62 EE converted to decimal is 25326. This represents the fraction out of 1, which also can be described as 65536/65536. Thus, divide 25326 by 65536 and you'll get 0.386.
For the whole part, FF ED converted to decimal is 65517. 65517 represents the whole number -19 (which is 65517 - 65536).
This makes the value -19 + .386 = -18.614.
This explanation mostly makes sense, but I'm confused by 2 things:
Does the magic number 65536 have any significance?
BinaryWriter.Write(-18.613f) writes the bytes as 79 E9 94 C1, so my assumption is the binary file I'm working with uses its own proprietary method of storing 4-byte floating point values (i.e. I can't use C#'s float interchangably and will need to encode/decode the values first)?
Firstly, this isn't a Floating Point Number its a Fix Point Number
Note : A fixed point number has a specific number of bits (or digits) reserved for the integer part (the part to the left of the decimal point)
Does the magic number 65536 have any significance
Its the max number of values unsigned 16 bit number can hold, or 2^16, yeah its significant, because the number you are working with is 2 * 16 bit values encoded for integral and fractional components.
so my assumption is the binary file I'm working with uses its own
proprietary method of storing 4-byte floating point values
Nope wrong again, floating point values in .Net adhere to the IEEE Standard for Floating-Point Arithmetic (IEEE 754) technical standard
When you use BinaryWriter.Write(float); it basically just shifts the bits into bytes and writes it to the Stream.
uint TmpValue = *(uint *)&value;
_buffer[0] = (byte) TmpValue;
_buffer[1] = (byte) (TmpValue >> 8);
_buffer[2] = (byte) (TmpValue >> 16);
_buffer[3] = (byte) (TmpValue >> 24);
OutStream.Write(_buffer, 0, 4);
If you want to read and write this special value you will need to do the same thing, you are going to have to read and write the bytes and convert them your self
This should be a build-in value unique to the game.
It should be more similar to Fraction Value.
Where 62 EE represent the Fraction Part of the value and FF ED represent the Whole Number Part of the value.
The While Number Part is easy to understand so I'm not going to explain it.
The explanation of Fraction Part is:
For every 2 bytes, there are 65536 possibilities (0 ~ 65535).
256 X 256 = 65536
hence the magic number 65536.
And the game itself must have a build-in algorithm to divide the first 2 bytes by 65536.
Choosing any number other than this will be a waste of memory space and result in decreased accuracy of the value which can be represented.
Of course, it's all depended on what kind of accuracy the game wish to present.

How is an integer stored in memory?

This is most probably the dumbest question anyone would ask, but regardless I hope I will find a clear answer for this.
My question is - How is an integer stored in computer memory?
In c# an integer is of size 32 bit. MSDN says we can store numbers from -2,147,483,648 to 2,147,483,647 inside an integer variable.
As per my understanding a bit can store only 2 values i.e 0 & 1. If I can store only 0 or 1 in a bit, how will I be able to store numbers 2 to 9 inside a bit?
More precisely, say I have this code int x = 5; How will this be represented in memory or in other words how is 5 converted into 0's and 1's, and what is the convention behind it?
It's represented in binary (base 2). Read more about number bases. In base 2 you only need 2 different symbols to represent a number. We usually use the symbols 0 and 1. In our usual base we use 10 different symbols to represent all the numbers, 0, 1, 2, ... 8, and 9.
For comparison, think about a number that doesn't fit in our usual system. Like 14. We don't have a symbol for 14, so how to we represent it? Easy, we just combine two of our symbols 1 and 4. 14 in base 10 means 1*10^1 + 4*10^0.
1110 in base 2 (binary) means 1*2^3 + 1*2^2 + 1*2^1 + 0*2^0 = 8 + 4 + 2 + 0 = 14. So despite not having enough symbols in either base to represent 14 with a single symbol, we can still represent it in both bases.
In another commonly used base, base 16, which is also known as hexadecimal, we have enough symbols to represent 14 using only one of them. You'll usually see 14 written using the symbol e in hexadecimal.
For negative integers we use a convenient representation called twos-complement which is the complement (all 1s flipped to 0 and all 0s flipped to 1s) with one added to it.
There are two main reasons this is so convenient:
We know immediately if a number is positive of negative by looking at a single bit, the most significant bit out of the 32 we use.
It's mathematically correct in that x - y = x + -y using regular addition the same way you learnt in grade school. This means that processors don't need to do anything special to implement subtraction if they already have addition. They can simply find the twos-complement of y (recall, flip the bits and add one) and then add x and y using the addition circuit they already have, rather than having a special circuit for subtraction.
This is not a dumb question at all.
Let's start with uint because it's slightly easier. The convention is:
You have 32 bits in a uint. Each bit is assigned a number ranging from 0 to 31. By convention the rightmost bit is 0 and the leftmost bit is 31.
Take each bit number and raise 2 to that power, and then multiply it by the value of the bit. So if bit number three is one, that's 1 x 23. If bit number twelve is zero, that's 0 x 212.
Add up all those numbers. That's the value.
So five would be 00000000000000000000000000000101, because 5 = 1 x 20 + 0 x 21 + 1 x 22 + ... the rest are all zero.
That's a uint. The convention for ints is:
Compute the value as a uint.
If the value is greater than or equal to 0 and strictly less than 231 then you're done. The int and uint values are the same.
Otherwise, subtract 232 from the uint value and that's the int value.
This might seem like an odd convention. We use it because it turns out that it is easy to build chips that perform arithmetic in this format extremely quickly.
Binary works as follows (as your 32 bits).
1 1 1 1 | 1 1 1 1 | 1 1 1 1 | 1 1 1 1 | 1 1 1 1 | 1 1 1 1 | 1 1 1 1 | 1 1 1 1
2^ 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16......................................0
x
x = sign bit (if 1 then negative number if 0 then positive)
So the highest number is 0111111111............1 (all ones except the negative bit), which is 2^30 + 2 ^29 + 2^28 +........+2^1 + 2^0 or 2,147,483,647.
The lowest is 1000000.........0, meaning -2^31 or -2147483648.
Is this what high level languages lead to!? Eeek!
As other people have said it's a base 2 counting system. Humans are naturally base 10 counters mostly, though time for some reason is base 60, and 6 x 9 = 42 in base 13. Alan Turing was apparently adept at base 17 mental arithmetic.
Computers operate in base 2 because it's easy for the electronics to be either on or off - representing 1 and 0 which is all you need for base 2. You could build the electronics in such a way that it was on, off or somewhere in between. That'd be 3 states, allowing you to do tertiary math (as opposed to binary math). However the reliability is reduced because it's harder to tell the difference between those three states, and the electronics is much more complicated. Even more levels leads to worse reliability.
Despite that it is done in multi level cell flash memory. In these each memory cell represents on, off and a number of intermediate values. This improves the capacity (each cell can store several bits), but it is bad news for reliability. This sort of chip is used in solid state drives, and these operate on the very edge of total unreliability in order to maximise capacity.

C# bitwise addition and subtraction like in C

I'm writing up a QuadTree class and I need to include neighbor searches. I'm following along with this paper which includes a C implementation. (I'm basically converting it to C#.) And I'm having trouble because I'm not very good at working with binary numbers. I need to do the following things:
given #100 and #010, combine them to get #110. 1s will never overlap. right now what I do is store 100 and 10, when I add them it gives me 110.
perform left and right binary shifts on numbers like #011. right now I was storing 011 and applying the >> or << operator to it. This gives me completely the wrong number but strangely the logic I used it in to find common quad tree parents was working just fine.
given #011 and #001 sum them to get #100. Right now I'm storing 11 and 1, this obviously doesn't work at all
.
Perform bitwise subtraction, same as above but - instead of + :)
perform ANDs, ORs and XORs.
So, right now I'm storing everything as an integer that looks like the binary value, and the math seems to be working for the ANDs, ORs, XORs and shifts. The numbers are crazy in the middle but I get the right answer in the end.
I'm trying an implementation right now where I store the integer value that is the value of the binary number but I have to convert back and forth a lot, this is particularly a pain because I'm converting between an integer representation of the Value of the number and a string representation of the binary number itself. Creating a new string every second line of code is going to be a performance nightmare.
Can you guys recommend a basic strategy for working with binary numbers in C#?
P.S. I don't ever need to see the integer representation of these numbers. I really don't care that #010 == 2, the work I need to do sees it as #010 end of story.
Hopefully this is more clear than my original post!
I hope this will clear up some confusion:
The literal 0x00000001 represents 1 using hexadecimal notation.
The literal 1 represents 1 using decimal notation.
Here are some pairs of equivalent literals:
0x9 == 9
0xA == 10
0xF == 15
0x10 == 16
0x11 == 17
0x1F == 31
The primitive integral data types (byte, sbyte, ushort, short, char, uint, int, ulong, long) are are always stored in the computer's memory in binary format. Here are the above values notated with five bits:
1 == #00001
9 == #01001
10 == #01010
15 == #01111
16 == #10000
17 == #10001
31 == #11111
Now, consider:
unsigned int xLeftLocCode = xLocCode - 0x00000001;
Either of the following is equivalent in C#:
uint xLeftLocCode = xLocCode - 1;
uint xLeftLocCode = xLocCode - 0x00000001;
To assign the binary #10 to a uint variable, do this:
uint binaryOneZero = 2; //or 0x2
Hexadecimal notation is useful for bit manipulation because there is a one-to-one correspondence between hexadecimal and binary digits. More accurately, there is a one-to-four correspondence, as each of the 16 hexadecimal digits represents one of the 16 possible states of four bits:
0 == 0 == 0000
1 == 1 == 0001
2 == 2 == 0010
3 == 3 == 0011
4 == 4 == 0100
5 == 5 == 0101
6 == 6 == 0110
7 == 7 == 0111
8 == 8 == 1000
9 == 9 == 1001
A == 10 == 1010
B == 11 == 1011
C == 12 == 1100
D == 13 == 1101
E == 14 == 1110
F == 15 == 1111
It's therefore fairly easy to know that 0x700FF, for example, has all the bits of the least significant byte set, none of the next most significant byte, and the lower three bits of the next most significant byte. It's much harder to see that information at a glance if you use the literal 459007 to represent that value.
In conclusion:
Can you guys recommend a basic strategy for working with binary numbers in C#?
Learn the table above, and use hexadecimal literals.
why that line of interest does bitwise subtraction in C but integer subtraction in C#.
They should both do the same. Why do you think otherwise? And what do you mean with bitwise subtraction?
I'm storing the binary number 10 as the integer 10, not as the integer 2. I could be doing this all wrong.
Yes, That's wrong. #010 == 2, not 10.
I guess what I looking for is some basic strategy for working with binary numbers in C#.
By and large, those operations should be the same as in C.
The line you marked has nothing to do with bitwise anything. It's a straight subtraction of the number 1 (expressed as a hex double word (DWORD), or unsigned long integer (uint in C#).

Why does -2 % 360 give -2 instead of 358 in c#

Microsoft Mathematics and Google's calculator give me 358 for -2 % 360, but C# and windows calculator are outputting -2 ... which is the right answer ?
The C# compiler is doing the right thing according to the C# specification, which states that for integers:
The result of x % y is the value produced by x – (x / y) * y.
Note that (x/y) always rounds towards zero.
For the details of how remainder is computed for binary and decimal floating point numbers, see section 7.8.3 of the specification.
Whether this is the "right answer" for you depends on how you view the remainder operation. The remainder must satisfy the identity that:
dividend = quotient * divisor + remainder
I say that clearly -2 % 360 is -2. Why? Well, first ask yourself what the quotient is. How many times does 360 go into -2? Clearly zero times! 360 doesn't go into -2 at all. If the quotient is zero then the remainder must be -2 in order to satisfy the identity. It would be strange to say that 360 goes into -2 a total of -1 times, with a remainder of 358, don't you think?
Which is the right answer?
Both answers are correct. It's merely a matter of convention which value is returned.
Both, see Modulo operation on Wikipedia.
I found this very easy to understand explanation at http://mathforum.org/library/drmath/view/52343.html
There are different ways of thinking about remainders when you deal
with negative numbers, and he is probably confusing two of them. The
mod function is defined as the amount by which a number exceeds the
largest integer multiple of the divisor that is not greater than that
number. In this case, -340 lies between -360 and -300, so -360 is the
greatest multiple LESS than -340; we subtract 60 * -6 = -360 from -340
and get 20:
-420 -360 -300 -240 -180 -120 -60 0 60 120 180 240 300 360
--+----+----+----+----+----+----+----+----+----+----+----+----+----+--
| | | |
-360| |-340 300| |340
|=| |==|
20 40
Working with a positive number like 340, the multiple we subtract is
smaller in absolute value, giving us 40; but with negative numbers, we
subtract a number with a LARGER absolute value, so that the mod
function returns a positive value. This is not always what people
expect, but it is consistent.
If you want the remainder, ignoring the sign, you have to take the
absolute value before using the mod function.
Doctor Peterson, The Math Forum
http://mathforum.org/dr.math/
IMO, -2 is much easier to understand and code with. If you divide -2 by 360, your answer is 0 remainder -2 ... just as dividing 2 by 360 is 0 remainder 2. It's not as natural to consider that 358 is also the remainder of -2 mod 360.
From wikipedia:
if the remainder is nonzero, there are two possible choices for the
remainder, one negative and the other positive, and there are also two
possible choices for the quotient. Usually, in number theory, the
positive remainder is always chosen, but programming languages choose
depending on the language and the signs of a and n.[2] However, Pascal
and Algol68 do not satisfy these conditions for negative divisors, and
some programming languages, such as C89, don't even define a result if
either of n or a is negative.

C/C++ Date Solution/Conversion

I need to come up with a way to unpack a date into a readable format. unfortunately I don't completely understand the original process/code that was used.
Per information that was forwarded to me the date was packed using custom C/Python code as follows;
date = year << 20;
date |= month << 16;
date |= day << 11;
date |= hour << 6;
date |= minute;
For example, a recent packed date is 2107224749 which equates to Tuesday Sept. 22 2009 10:45am
I understand....or at least I am pretty sure....the << is shifting the bits but I am not sure what the "|" accomplishes.
Also, in order to unpack the code the notes read as follows;
year = (date & 0xfff00000) >> 20;
month = (date & 0x000f0000) >> 16;
day = (date & 0x0000f800) >> 11;
hour = (date & 0x000007c0) >> 6;
minute = (date & 0x0000003f);
Ultimately, what I need to do is perform the unpack and convert to readable format using either JavaScript or ASP but I need to better understand the process above in order to develop a solution.
Any help, hints, tips, pointers, ideas, etc. would be greatly appreciated.
The pipe (|) is bitwise or, it is used to combine the bits into a single value.
The extraction looks straight-forward, except I would recommend shifting first, and masking then. This keeps the constant used for the mask as small as possible, which is easier to manage (and can possibly be a tad more efficient, although for this case that hardly matters).
Looking at the masks used written in binary reveals how many bits are used for each field:
0xfff00000 has 12 bits set, so 12 bits are used for the year
0x000f0000 has 4 bits set, for the month
0x0000f800 has 5 bits set, for the day
0x000007c0 has 5 bits set, for the hour
0x0000003f has 6 bits set, for the minute
The idea is exactly what you said. Performing "<<" just shifts the bits to the left.
What the | (bitwise or) is accomplishing is basically adding more bits to the number, but without overwriting what was already there.
A demonstration of this principle might help.
Let's say we have a byte (8 bits), and we have two numbers that are each 4 bits, which we want to "put together" to make a byte. Assume the numbers are, in binary, 1010, and 1011. So we want to end up with the byte: 10101011.
Now, how do we do this? Assume we have a byte b, which is initialized to 0.
If we take the first number we want to add, 1010, and shift it by 4 bits, we get the number 10100000 (the shift adds bytes to the right of the number).
If we do: b = (1010 << 4), b will have the value 10100000.
But now, we want to add the 4 more bits (0011), without touching the previous bits. To do this, we can use |. This is because the | operator "ignores" anything in our number which is zero. So when we do:
10100000 (b's current value)
|
00001011 (the number we want to add)
We get:
10101011 (the first four numbers are copied from the first number,
the other four numbers copied from the second number).
Note: This answer came out a little long, I'm wikiing this, so, if anyone here has a better idea how to explain it, I'd appreciate your help.
These links might help:
http://www.gamedev.net/reference/articles/article1563.asp
http://compsci.ca/v3/viewtopic.php?t=9893
In the decode section & is bit wise and the 0xfff00000 is a hexadecimal bit mask. Basically each character in the bit mask represents 4 bits of the number. 0 being 0000 in binary and f being 1111 so if you look at the operation in binary you are anding 1111 1111 1111 0000 0000 ... with whatever is in date so basically you are getting the upper three nibbles(half bytes) and shifting them down so that 00A00000 gives you 10(A in hex) for the year.
Also note that |= is like += it is bit wise or then assignment rolled in to one.
Just to add some practical tips:
minute = value & ((1 << 6)-1);
hour = (value >> 6) & ((1<<5)-1); // 5 == 11-6 == bits reserved for hour
...
1 << 5 creates a bit at position 5 (i.e. 32=00100000b),
(1<<5)-1 cretaes a bit mask where the 5 lowest bits are set (i.e. 31 == 00011111b)
x & ((1<<5)-1) does a bitwise 'and' preserving only the bits set in the lowest five bits, extracting the original hour value.
Yes the << shifts bits and the | is the bitwise OR operator.

Categories