I have rarely used bitmasks and am trying to get more familiar with them. I understand the basic usages. As I understand them, the approach below should work, but it doesn't.
I have a use case where I have four different ints that may come in different arrangements, and I need to check whether the current set of ints has already appeared before in a different arrangement.
So one iteration they might come as:
2, 5, 10, 8
Next iteration:
1, 0, 2, 5
Now on the next iteration if this comes:
0, 1, 2, 5
It needs to discern that the last set has already come in a different arrangement and skip it.
I am wondering, can I create a mask out of these ints, put them in a HashSet, so then I have easy lookup for whether or not that set of ints has come before?
Basically I am doing this:
int mask = int0 & int1 & int2 & int3;
if (checkHashSet.Contains(mask))
return; // int set already came, skip
//int set has not been processed, add mask and process
checkHashSet.Add(mask);
But that seems to be producing a mask that ends up equal to all following masks generated. So this doesn't work.
Can this work like this somehow? What would be the most performant way to check if a set of ints, no matter their arrangement, has already been processed?
The bit mask should be generated by shifting, combining the individual bits with OR (not AND):
int mask = (1 << int0) | (1 << int1) | (1 << int2) | (1 << int3);
HashSet<T>.Add already reports whether the item was present, so a separate Contains check is redundant.
if (checkHashSet.Add(mask))
    // int set has not been processed; Add stored the mask, so process it
else
    // int set already came, skip
If an integer can be greater than 31, you can use a long or ulong; if it can be greater than 63, use two longs or a BigInteger.
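Putting the pieces together, here is a minimal sketch (the class and method names are mine; it assumes each value fits in 0..63 so a ulong holds the mask, and that no value repeats within a set, since a repeated value only sets its bit once):

using System;
using System.Collections.Generic;

class OrderInsensitiveCheck
{
    static readonly HashSet<ulong> seen = new HashSet<ulong>();

    // Returns true if this combination of values has not been processed before.
    static bool TryProcess(int a, int b, int c, int d)
    {
        // Each value sets one bit, so any ordering of the same values
        // produces the same mask.
        ulong mask = (1UL << a) | (1UL << b) | (1UL << c) | (1UL << d);
        return seen.Add(mask);
    }

    static void Main()
    {
        Console.WriteLine(TryProcess(2, 5, 10, 8)); // True  (new set)
        Console.WriteLine(TryProcess(1, 0, 2, 5));  // True  (new set)
        Console.WriteLine(TryProcess(0, 1, 2, 5));  // False (same set, reordered)
    }
}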
To your question of "is there a better way". I think probably yes.
1) Sort your inputs and keep a set of the combinations you've already seen, keyed by a string built from the sorted values. This is close to what you proposed but is easier to read and implement (see the sketch after this list).
2) A map whose key is an int[1000] would be easier than bit masks. As you process your input, for each number you find, increment that location in the array: ++array[n]. You can then add the array to a map as your key, and search the map to work out if you've seen a particular combination before.
3) 1 << n only goes so far, so you'd need an array of 64-bit words to do it as a proper mask: array[n / 64] & (1UL << (n % 64)), or something like that. A good solution for embedded work perhaps, but generally harder to understand. Also, it won't work reliably if any number is repeated, as a bit can only mark one occurrence. Or use BigInteger as per Shingo's answer.
Personally I'd go with the first one.
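A minimal sketch of option 1 (the helper names are mine):

using System;
using System.Collections.Generic;

class SortedKeyCheck
{
    static readonly HashSet<string> seen = new HashSet<string>();

    // Builds an order-insensitive key by sorting the values first.
    static bool TryProcess(int a, int b, int c, int d)
    {
        int[] values = { a, b, c, d };
        Array.Sort(values);
        string key = string.Join(",", values); // e.g. "0,1,2,5"
        return seen.Add(key);                  // false if this combination was seen before
    }
}

Unlike a single-bit mask, the sorted-key approach also distinguishes sets that contain repeated values.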
C# 8.0 introduces a convenient way to slice arrays - see the official C# 8.0 blog post.
The syntax to access the last element of an array is
var value = new[] { 10, 11, 12, 13 };
int a = value[^1]; // 13
int b = value[^2]; // 12
I'm wondering why the indexing for accessing the elements backwards starts at 1 instead of 0? Is there a technical reason for this?
Official answer
Here is a comment from Mads Torgersen explaining this design decision from the C# 8 blog post:
We decided to follow Python when it comes to the from-beginning and from-end arithmetic. 0 designates the first element (as always), and ^0 the “length’th” element, i.e. the one right off the end. That way you get a simple relationship, where an element's position from beginning plus its position from end equals the length. the x in ^x is what you would have subtracted from the length if you’d done the math yourself.
Why not use the minus (-) instead of the new hat (^) operator? This primarily has to do with ranges. Again in keeping with Python and most of the industry, we want our ranges to be inclusive at the beginning, exclusive at the end. What is the index you pass to say that a range should go all the way to the end? In C# the answer is simple: x..^0 goes from x to the end. In Python, there is no explicit index you can give: -0 doesn’t work, because it is equal to 0, the first element! So in Python, you have to leave the end index off completely to express a range that goes to the end: x... If the end of the range is computed, then you need to remember to have special logic in case it comes out to 0. As in x..-y, where y was computed and came out to 0. This is a common nuisance and source of bugs.
Finally, note that indices and ranges are first class types in .NET/C#. Their behavior is not tied to what they are applied to, or even to be used in an indexer. You can totally define your own indexer that takes Index and another one that takes Range – and we’re going to add such indexers to e.g. Span. But you can also have methods that take ranges, for instance.
My answer
I think this is to match the classic syntax we are used to:
value[^1] == value[value.Length - 1]
If it used 0, it would be confusing when the two syntaxes were used side-by-side. This way it has lower cognitive load.
Other languages like Python also use the same convention.
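A small illustration of that relationship, together with a range that runs to the end (the array contents are just an example):

using System;

class IndexDemo
{
    static void Main()
    {
        var value = new[] { 10, 11, 12, 13 };

        // ^x corresponds to value.Length - x
        Console.WriteLine(value[^1] == value[value.Length - 1]); // True
        Console.WriteLine(value[^4] == value[0]);                // True

        // Ranges are inclusive at the start and exclusive at the end,
        // so 1..^0 runs from index 1 through the last element.
        int[] tail = value[1..^0]; // { 11, 12, 13 }
        Console.WriteLine(string.Join(",", tail)); // 11,12,13
    }
}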
I was digging around in .NET's implementation of Dictionaries, and found one function that I'm curious about: HashHelpers.GetPrime.
Most of what it does is quite straightforward: it looks for a prime number above some minimum which is passed to it as a parameter, apparently for the specific purpose of being used as the number of buckets in a hashtable-like structure. But there's one mysterious part:
if (HashHelpers.IsPrime(j) && (j - 1) % 101 != 0)
{
return j;
}
What is the purpose of the (j - 1) % 101 != 0 check? i.e. Why do we apparently want to avoid having a number of buckets which is 1 more than a multiple of 101?
The comments explain it pretty well:
‘InitHash’ is basically an implementation of classic DoubleHashing (see http://en.wikipedia.org/wiki/Double_hashing)

1) The only ‘correctness’ requirement is that the ‘increment’ used to probe
   a. Be non-zero
   b. Be relatively prime to the table size ‘hashSize’. (This is needed to insure you probe all entries in the table before you ‘wrap’ and visit entries already probed)

2) Because we choose table sizes to be primes, we just need to insure that the increment is 0 < incr < hashSize

Thus this function would work: Incr = 1 + (seed % (hashSize-1))

While this works well for ‘uniformly distributed’ keys, in practice, non-uniformity is common. In particular in practice we can see ‘mostly sequential’ behavior, where you get long clusters of keys that ‘pack’. To avoid bad behavior you want it to be the case that the increment is ‘large’ even for ‘small’ values (because small values tend to happen more in practice). Thus we multiply ‘seed’ by a number that will make these small values bigger (and not hurt large values). We picked HashPrime (101) because it was prime, and if ‘hashSize-1’ is not a multiple of HashPrime (enforced in GetPrime), then incr has the potential of being every value from 1 to hashSize-1. The choice was largely arbitrary.
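In code terms, the comment describes an increment computed roughly like the sketch below (the names mirror the comment; the actual .NET source differs in detail):

static class DoubleHashingSketch
{
    const uint HashPrime = 101;

    // The "increment" (second hash) for double hashing. It must be non-zero and
    // relatively prime to hashSize; since hashSize is prime, any value in
    // 1..hashSize-1 qualifies.
    static uint GetIncrement(uint hashCode, uint hashSize)
    {
        // Multiplying by HashPrime makes small hash codes yield large increments.
        return 1 + (hashCode * HashPrime) % (hashSize - 1);
    }
}

If hashSize - 1 were a multiple of 101, then (hashCode * 101) % (hashSize - 1) could only produce multiples of 101, so the increment would no longer be able to take every value from 1 to hashSize - 1. The (j - 1) % 101 != 0 check in GetPrime rules that case out.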
Can someone kindly point me to an explanation, if there is one, to this chunk of code and what it does and why? Specifically the bottom line...
protected uint uMask;
int nBits = (int)Math.Log(BlockSize, 2);
uMask = 0xffffffff << nBits;
For instance, on the first iteration BlockSize is 8, nBits is 3 and after the operation, the uMask is 4294967288.
I tried Googling the third line as I don't know how to put this into plain language, and I got examples of code and that is not what I was looking for.
This looks to be creating a mask to exclude bits from a larger value. Some piece of data is probably stored in a larger value and has a maximum value of BlockSize. This code determines how many bits are required for that item, given its maximum value in BlockSize, and then uses this number of bits to create a mask. After the last line, uMask (a 32-bit uint) will look like this in binary (assuming BlockSize is 8 and nBits is 3):
11111111111111111111111111111000
or in hex:
0xfffffff8
This would typically be used to remove one field stored in a piece of data in order to isolate some other field. Conceptually, you might have bits used for value A and value B in a 32-bit value:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAABBB
Suppose you want to get the value for A. You could do something like this:
result = value & uMask;   // Step 1: Mask off B
result = result >> nBits; // Step 2: Align A
The data will look like this:
Step 1: AAAAAAAAAAAAAAAAAAAAAAAAAAAAA000
Step 2: 000AAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Unless you have savant-level math ability, you're never going to be able to read masks in decimal; hex or binary is much easier.
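As a concrete sketch, here is how a mask built this way could be used to split a packed value into two fields; the interpretation of the low bits as an offset within a block and the high bits as a block number is my guess based on the name BlockSize:

using System;

class MaskDemo
{
    static void Main()
    {
        const int BlockSize = 8;
        int nBits = (int)Math.Log(BlockSize, 2); // 3 bits cover offsets 0..7
        uint uMask = 0xffffffff << nBits;        // 0xfffffff8

        uint packed = 0b1011_0101;               // example packed value

        uint offset = packed & ~uMask;           // low 3 bits: offset within the block
        uint block  = (packed & uMask) >> nBits; // remaining bits: block number

        Console.WriteLine(offset); // 5  (0b101)
        Console.WriteLine(block);  // 22 (0b10110)
    }
}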
I would like to generate coupon codes , e.g. AYB4ZZ2. However, I would also like to be able to mark the used coupons and limit their global number, let's say N. The naive approach would be something like "generate N unique alphanumeric codes, put them into database and perform a db search on every coupon operation."
However, as far as I realize, we can also attempt to find a function MakeCoupon(n), which converts the given number into a coupon-like string with predefined length.
As far as I understand, MakeCoupon should fulfill the following requirements:
Be bijective. Its inverse MakeNumber(coupon) should be effectively computable.
Output for MakeCoupon(n) should be alphanumeric and should have small and constant length - so that it could be called human readable. E.g. SHA1 digest wouldn't pass this requirement.
Practical uniqueness. Results of MakeCoupon(n) for every natural n <= N should be totally unique or unique in the same terms as, for example, MD5 is unique (with the same extremely small collision probability).
(this one is tricky to define) It shouldn't be obvious how to enumerate all remaining coupons from a single coupon code - let's say MakeCoupon(n) and MakeCoupon(n + 1) should visually differ.
E.g. MakeCoupon(n), which simply outputs n padded with zeroes would fail this requirement, because 000001 and 000002 don't actually differ "visually".
Q:
Does any function or function generator that fulfills the above requirements exist? My search attempts only lead me to [CPAN] CouponCode, but it does not fulfill the requirement of the corresponding function being bijective.
Basically you can split your operation into two parts:
Somehow "encrypt" your initial number n, so that two consecutive numbers yield (very) different results
Construct your "human-readable" code from the result of step 1
For step 1 I'd suggest using a simple block cipher (e.g. a Feistel cipher with a round function of your choice). See also this question.
Feistel ciphers work in several rounds. During each round, some round function is applied to one half of the input, the result is XORed with the other half, and the two halves are swapped. The nice thing about Feistel ciphers is that the round function doesn't have to be invertible (the input to the round function is retained unmodified after each round, so its output can be recomputed during decryption). Therefore you can choose whatever crazy operation(s) you like :). Also, Feistel ciphers are symmetric, which fulfills your first requirement.
A short example in C#
const int BITCOUNT = 30;
const int BITMASK = (1 << BITCOUNT / 2) - 1; // mask for the lower half (15 bits)

static uint roundFunction(uint number) {
    // Any scrambling operation will do; it does not need to be invertible.
    return (((number ^ 47894) + 25) << 1) & BITMASK;
}

static uint crypt(uint number) {
    uint left = number >> (BITCOUNT / 2);  // upper half
    uint right = number & BITMASK;         // lower half
    for (int round = 0; round < 10; ++round) {
        left = left ^ roundFunction(right);           // mix one half with the round function of the other
        uint temp = left; left = right; right = temp; // swap halves
    }
    return left | (right << (BITCOUNT / 2));
}
(Note that after the last round there is no swapping, in the code the swapping is simply undone in the construction of the result)
Apart from fulfilling your requirements 3 and 4 (the function is total, so for different inputs you get different outputs, and the input is "totally scrambled" according to your informal definition), it is also its own inverse (thus implicitly fulfilling requirement 1), i.e. crypt(crypt(x)) == x for each x in the input domain (0..2^30-1 in this implementation). Also it's cheap in terms of performance requirements.
For step 2 just encode the result to some base of your choice. For instance, to encode a 30-bit number, you could use 6 "digits" of an alphabet of 32 characters (so you can encode 6*5=30 bits).
An example for this step in C#:
const string ALPHABET= "AG8FOLE2WVTCPY5ZH3NIUDBXSMQK7946";
static string couponCode(uint number) {
StringBuilder b = new StringBuilder();
for (int i=0; i<6; ++i) {
b.Append(ALPHABET[(int)number&((1 << 5)-1)]);
number = number >> 5;
}
return b.ToString();
}
static uint codeFromCoupon(string coupon) {
uint n = 0;
for (int i = 0; i < 6; ++i)
n = n | (((uint)ALPHABET.IndexOf(coupon[i])) << (5 * i));
return n;
}
For inputs 0 - 9 this yields the following coupon codes
0 => 5VZNKB
1 => HL766Z
2 => TMGSEY
3 => P28L4W
4 => EM5EWD
5 => WIACCZ
6 => 8DEPDA
7 => OQE33A
8 => 4SEQ5A
9 => AVAXS5
Note that this approach has two different internal "secrets": first, the round function together with the number of rounds used, and second, the alphabet you use for encoding the encrypted result. But also note that the shown implementation is in no way secure in a cryptographical sense!
Also note that the shown function is a total bijective function, in the sense that every possible 6-character code (with characters out of your alphabet) will yield a unique number. To prevent anyone from entering just some random code, you should define some kind of restrictions on the input number. E.g. only issue coupons for the first 10,000 numbers. Then the probability of some random coupon code being valid would be 10000/2^30 ≈ 0.00001 (on average it would take on the order of 100,000 attempts to find a correct coupon code). If you need more "security", you can just increase the bit size/coupon code length (see below).
EDIT: Change Coupon code length
Changing the length of the resulting coupon code requires some math: The first (encrypting) step only works on a bit string with even bit count (this is required for the Feistel cipher to work).
In the second step, the number of bits that can be encoded using a given alphabet depends on the "size" of the chosen alphabet and the length of the coupon code. This "entropy", given in bits, is, in general, not an integer number, far less an even integer number. For example:
A 5-digit code using a 30-character alphabet results in 30^5 possible codes, which means ld(30^5) = 24.53 bits per coupon code.
For a four-digit code, there is a simple solution: Given a 32-character alphabet you can encode ld(32^4) = 4*5 = 20 bits. So you can just set BITCOUNT to 20 and change the for loop in the second part of the code to run until 4 (instead of 6).
Generating a five-digit code is a bit trickier and somehow "weakens" the algorithm: You can set BITCOUNT to 24 and just generate a 5-digit code from an alphabet of 30 characters (remove two characters from the ALPHABET string and let the for loop run until 5).
But this will not generate all possible 5-digit codes: with 24 bits you can only get 16,777,216 possible values from the encryption stage, while 5-digit codes could encode 24,300,000 possible numbers, so some possible codes will never be generated. More specifically, the last position of the code will never contain some characters of the alphabet. This can be seen as a drawback, because it narrows down the set of valid codes in an obvious way.
When decoding a coupon code, you'll first have to run the codeFromCoupon function and then check whether bit 24 (counting from bit 0, i.e. the 25th bit) of the result is set; any decoded value ≥ 2^24 marks an invalid code that you can immediately reject. Note that, in practice, this might even be an advantage, since it allows a quick check (e.g. on the client side) of the validity of a code without giving away all internals of the algorithm.
If that bit is not set, you call the crypt function and get the original number.
Though I may get docked for this answer I feel like I need to respond - I really hope that you hear what I'm saying as it comes from a lot of painful experience.
While this task is very academically challenging, and software engineers tend to want to challenge their intellect rather than just solve problems, I need to provide you with some direction on this if I may. There is no retail store in the world, at least none with any kind of success, that doesn't keep very good track of each and every entity it generates: from each piece of inventory to every single coupon or gift card it sends out the door. It's just not good stewardship otherwise, because it's not a question of if people are going to cheat you, it's when, and if you have every possible item in your arsenal you'll be ready.
Now, let's talk about the process by which the coupon is used in your scenario.
When the customer redeems the coupon there is going to be some kind of POS system in front, right? That may even be an online business where they just enter their coupon code, versus a register where the cashier scans a barcode (I'm assuming that's what we're dealing with here). And so now, as the vendor, you're saying that if you have a valid coupon code I'm going to give you some kind of discount, and because our goal was to generate coupon codes that were reversible we don't need a database to verify that code, we can just reverse it! I mean it's just math, right? Well, yes and no.
Yes, you're right, it's just math. In fact, that's also the problem because so is cracking SSL. But, I'm going to assume that we all realize the math used in SSL is just a bit more complex than anything used here and the key is substantially larger.
It does not behoove you, nor is it wise for you to try and come up with some kind of scheme that you're just sure nobody cares enough to break, especially when it comes to money. You are making your life very difficult trying to solve a problem you really shouldn't be trying to solve because you need to be protecting yourself from those using the coupon codes.
Therefore, this problem is unnecessarily complicated and could be solved like this.
// insert a record into the database for the coupon,
// thus generating an auto-incrementing key
var id = [some code to insert into database and get back the key]
// base64 encode the resulting key value
var couponCode = Convert.ToBase64String(BitConverter.GetBytes(id));
// truncate the coupon code if you like
// update the database with the coupon code
Create a coupon table that has an auto-incrementing key.
Insert into that table and get the auto-incrementing key back.
Base64 encode that id into a coupon code.
Truncate that string if you want.
Store that string back in the database with the coupon just inserted.
What you want is called Format-preserving encryption.
Without loss of generality, by encoding in base 36 we can assume that we are talking about integers in 0..M-1 rather than strings of symbols. M should probably be a power of 2.
After choosing a secret key and specifying M, FPE gives you a pseudo-random permutation of 0..M-1 encrypt along with its inverse decrypt.
string GenerateCoupon(int n) {
Debug.Assert(0 <= n && n < N);
return Base36.Encode(encrypt(n));
}
bool IsCoupon(string code) {
return decrypt(Base36.Decode(code)) < N;
}
If your FPE is secure, this scheme is secure: no attacker can generate other coupon codes with probability higher than O(N/M) given knowledge of arbitrarily many coupons, even if he manages to guess the number associated with each coupon that he knows.
This is still a relatively new field, so there are few implementations of such encryption schemes. This crypto.SE question only mentions Botan, a C++ library with Perl/Python bindings, but not C#.
Word of caution: in addition to the fact that there are no well-accepted standards for FPE yet, you must consider the possibility of a bug in the implementation. If there is a lot of money on the line, you need to weigh that risk against the relatively small benefit of avoiding a database.
You can use a base-36 number system. Assume that you want 6 characters in the coupon output.
pseudo code for MakeCoupon
MakeCoupon(n)
{
    Have a byte array of fixed size, say 6. Initialize all the values to 0.
    Convert the number to base-36 and store the 'digits' in the array
    (using integer division and mod operations).
    Now, for each 'digit', find the corresponding ASCII character, assuming the
    digits run over 0..9,A..Z.
    With this convention, output the six digits as a string.
}
Calculating the number back is the reverse of this operation.
This gives you up to 36^6 (about 2.1 billion) distinct codes with 6 allowed characters.
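A minimal C# sketch of that pseudocode (the digit alphabet and the fixed length of 6 are taken from the answer; the class and method names are mine):

static class Base36Coupon
{
    const string Digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Converts n to a fixed-length, 6-character base-36 string.
    public static string MakeCoupon(long n)
    {
        var chars = new char[6];
        for (int i = 5; i >= 0; --i)
        {
            chars[i] = Digits[(int)(n % 36)]; // least significant 'digit' goes last
            n /= 36;
        }
        return new string(chars);
    }

    // Reverse of MakeCoupon.
    public static long MakeNumber(string coupon)
    {
        long n = 0;
        foreach (char c in coupon)
            n = n * 36 + Digits.IndexOf(c);
        return n;
    }
}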
Choose a cryptographic hash function c. There are a few requirements on c, but for now let us take SHA1.
Choose a secret key k.
Your coupon code generating function could be, for number n:
concatenate n and k as "n"+"k" (this is known as salting in password management)
compute c("n"+"k")
the result of SHA1 is 160 bits; encode them (for instance with base64) as an ASCII string
if the result is too long (as you said it is the case for SHA1), truncate it to keep only the first 10 letters and name this string s
your coupon code is printf "%09d%s" n s, i.e. the concatenation of zero-padded n and the truncated hash s.
Yes, it is trivial to guess n, the number of the coupon, from the code (but see below). But it is hard to generate another valid code.
Your requirements are satisfied:
To compute the reverse function, just read the first 9 digits of the code
The length is always 19 (9 digits of n, plus 10 letters of hash)
It is unique, since the first 9 digits are unique. The last 10 chars are too, with high probability.
It is not obvious how to generate the hash, even if one guesses that you used SHA1.
Some comments:
If you're worried that reading n is too obvious, you can obfuscate it lightly, for example by base64-encoding it and alternating the characters of n and s in the code.
I am assuming that you won't need more than a billion codes, thus the printing of n on 9 digits, but you can of course adjust the parameters 9 and 10 to your desired coupon code length.
SHA1 is just an option, you could use another cryptographic function like private key encryption, but you need to check that this function remains strong when truncated and when the clear text is provided.
This is not optimal in code length, but has the advantage of simplicity and widely available libraries.
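A minimal C# sketch of this scheme (the secret key, the 9-digit/10-character split, and the use of SHA1 follow the answer; treat it as an illustration, not production-grade crypto):

using System;
using System.Security.Cryptography;
using System.Text;

static class HashedCoupon
{
    const string Key = "my-secret-key"; // example secret

    public static string MakeCoupon(int n)
    {
        using (var sha1 = SHA1.Create())
        {
            // "Salt" the number with the secret key, hash it, and keep the first 10 characters.
            byte[] digest = sha1.ComputeHash(Encoding.UTF8.GetBytes(n + Key));
            string s = Convert.ToBase64String(digest).Substring(0, 10);
            return n.ToString("D9") + s; // zero-padded n followed by the truncated hash
        }
    }

    public static bool IsValid(string coupon)
    {
        if (coupon.Length != 19 || !int.TryParse(coupon.Substring(0, 9), out int n))
            return false;
        return MakeCoupon(n) == coupon; // recompute and compare the hash part
    }
}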
I've been studying C# and ran across some familiar ground from my old work in C++. I never understood the reason for bitwise operators in a real application. I've never used them and have never had a reason to use them. I've been studying how they work; the example below shows the complement bitwise operator. What is the point of bitwise operators, their use and how they work?
Maybe I'm missing something in bitwise logic.
byte bitComp = 15; // bitComp = 15 = 00001111b
byte bresult = (byte) ~bitComp; // bresult = 240 = 11110000b
Here's an example for the ~complement bitwise operator:
byte bitComp = 15; // bitComp = 15 = 00001111b
byte bresult = (byte) ~bitComp; // bresult = 240 = 11110000b
A typical use is manipulating bits that represent independent, combinable 'flags'.
Example from MSDN: Enumeration Types
[Flags]
enum Days2
{
None = 0x0,
Sunday = 0x1,
Monday = 0x2,
Tuesday = 0x4,
Wednesday = 0x8,
Thursday = 0x10,
Friday = 0x20,
Saturday = 0x40
}
class MyClass
{
Days2 meetingDays = Days2.Tuesday | Days2.Thursday;
Days2 notWednesday = ~(Days2.Wednesday);
}
See also Stack Overflow question Most common C# bitwise operations.
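For completeness, a typical way to test and modify those flags (meetingDays is from the example above):

Days2 meetingDays = Days2.Tuesday | Days2.Thursday;

bool onTuesday = (meetingDays & Days2.Tuesday) != 0; // test a flag
meetingDays |= Days2.Monday;                         // set a flag
meetingDays &= ~Days2.Thursday;                      // clear a flag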
Here's an everyday bitwise-op trick not many people have discovered:
When you have an enumerated type representing a bitfield, you need to define each enum entry as a distinct bit value, as in:
enum
{
Option1 = 1,
Option2 = 2,
Option3 = 4,
Option4 = 8,
Option5 = 16
};
but it's easy to forget that the next item in the sequence needs to be double the last number. Using bit shifting, it makes the sequence much easier to get right:
enum
{
Option1 = 1<<0,
Option2 = 1<<1,
Option3 = 1<<2,
Option4 = 1<<3,
Option5 = 1<<4
};
Another typical (but I think less common) usage is to compose several numbers into one big number. An example for this can be the windows RGB macro:
#define RGB(r, g ,b) ((DWORD) (((BYTE) (r) | ((WORD) (g) << 8)) | (((DWORD) (BYTE) (b)) << 16)))
Where you take 3 bytes and compose an integer from them that represents the RGB value.
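A rough C# equivalent of that macro, plus the reverse operation (the method names are mine):

// Pack three channels into one int: red in bits 0-7, green in 8-15, blue in 16-23.
static int Rgb(byte r, byte g, byte b) => r | (g << 8) | (b << 16);

// Pull the channels back out with shifts and masks.
static (byte r, byte g, byte b) Channels(int rgb) =>
    ((byte)(rgb & 0xFF), (byte)((rgb >> 8) & 0xFF), (byte)((rgb >> 16) & 0xFF));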
Except for combining flags, bit logic isn't necessarily something you need in your UI code, but it is still tremendously important. For example, I maintain a binary serialization library that needs to deal with all sorts of complex bit-packing strategies (variable-length base-128 integer encoding, for example). This is one of the implementations (actually, this is a slower/safer version - there are other variants for dealing with buffered data, but they are harder to follow):
public static bool TryDecodeUInt32(Stream source, out uint value)
{
if (source == null) throw new ArgumentNullException("source");
int b = source.ReadByte();
if (b < 0)
{
value = 0;
return false;
}
if ((b & 0x80) == 0)
{
// single-byte
value = (uint) b;
return true;
}
int shift = 7;
value = (uint)(b & 0x7F);
bool keepGoing;
int i = 0;
do
{
b = source.ReadByte();
if (b < 0) throw new EndOfStreamException();
i++;
keepGoing = (b & 0x80) != 0;
value |= ((uint)(b & 0x7F)) << shift;
shift += 7;
} while (keepGoing && i < 4);
if (keepGoing && i == 4)
{
throw new OverflowException();
}
return true;
}
We have:
tests to see if the most-significant-bit is set (this changes the meaning of the data)
shifts-a-plenty
removal of the most-significant-bit
bitwise combination of values
This is real code, used in a real (and much used) protocol. In general, bit operations are used a lot in any kind of encoding layer.
It is also hugely important in graphics programming, for example. And lots of others.
There are also some micro-optimisations (maths intensive work etc) that can be done with bit operations.
An example from COM programming:
An HRESULT is an error code consisting of a 32 bit integer. The high bit is a flag indicating whether the code represents success (0) or failure (1). The next 15 bits are an integer representing what sort of error it is -- an ole automation error or a win32 error or whatever. The lower 16 bits are the actual error code.
Being able to shift bits around is quite useful when you want to get information into or out of an HRESULT.
Now, you almost always want to abstract away the bit twiddling. It's much better to have a method (or in C, a macro) that tells you whether the HRESULT is failure, rather than actually twiddling out the bit with (hr & 0x80000000) != 0 right there in your source code. Let the compiler inline it.
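A hedged sketch of what such helpers could look like for the layout described above (the names and the exact field widths follow the description in this answer, not the Windows headers):

static class Hresult
{
    // Layout as described: 1 severity bit, 15 "facility" bits, 16 code bits.
    public static bool IsFailure(int hr) => (hr & unchecked((int)0x80000000)) != 0;
    public static int Facility(int hr)   => (hr >> 16) & 0x7FFF;
    public static int Code(int hr)       => hr & 0xFFFF;
}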
There are lots of examples of low-level data structures where information is crammed into words and needs to be extracted with bitwise operations.
Three major uses off of the top of my head:
1) In embedded applications, you often have to access memory-mapped registers, of which individual bits mean certain things (for instance the update bit in an ADC or serial register). This is more relevant to C++ than C#.
2) Calculations of checksums, like CRCs. These use shifts and masks very heavily. Before anybody says "use a standard library", I have come across non-standard checksums too many times, which have had to be implemented from scratch.
3) When dealing with data which comes from another platform with a different bit or byte order (or both) from the one you are executing your code on. This is particularly true when doing software testing of embedded systems, receiving data across a network which has not been converted to network order, or processing bulk data from a data capture system. Check out the Wikipedia article on Endianness. If you are really interested, read the classic article "On Holy Wars and a Plea for Peace" by Danny Cohen.
A couple of examples:
Communication stacks: a header attached to data in a layer of a communication stack may contain bytes where individual bits within those bytes signify something, and so have to be masked before they can be processed. Similarly, when assembling the header in the response, individual bits will then need to be set or cleared.
Embedded software: embedded microcontrollers can have tens or hundreds of hardware registers, in which individual bits (or collections thereof) control different functions within the chip, or indicate the status of parts of the hardware.
Incidentally, in C and C++, bitfields are not recommended where portability is important, as the order of bits in a bitfield is compiler-dependent. Using masks instead guarantees which bit(s) will be set or cleared.
As to 'how they work': bitwise operations are one of the lowest level operations CPUs support, and in fact some bitwise operations, like NAND and NOR, are universal - you can build any operation at all out of a sufficiently large set of NAND gates. This may seem academic, but if you look at how things like adders are implemented in hardware, that's often basically what they boil down to.
As to the 'point': in a lot of higher level applications, of course, there is not much use for bit ops, but at the lowest levels of a system they are incredibly important. Just off the top of my head, things that would be very difficult to write without bit operations include device drivers, cryptographic software, error correction systems like RAID5 or erasure codes, checksums like CRC, video decoding software, memory allocators, or compression software.
They are also useful for maintaining large sets of integers efficiently, for instance in the fd_set used by the common select syscall, or when solving certain search/optimization problems.
Take a look at the source of an MPEG4 decoder, cryptography library, or operating system kernel sometime and you'll see many, many examples of bit operations being used.
Left and right shift operators (<< and >>) are often used in performance critical applications which do arithmetic operations and more specifically multiplications and divisions by powers of two.
For example suppose you had to calculate the mathematical expression 5*2^7. A naive implementation would be:
int result = 5 * (int)Math.Pow(2, 7);
Using left shift operator you could write:
int result = 5 << 7;
The second expression will be orders of magnitude faster than the first while yielding the same result.
Minimizing Memory Use
Naturally a very generalized reason is to cram a lot of data into a small amount of memory. If you consider an array of booleans like this:
bool data[64] = {...};
That can take 64 bytes (512 bits) of memory. Meanwhile the same idea can be represented with bits using 8 bytes (64-bits) of memory:
uint64_t data = ...;
And of course we have a boatload of DRAM these days, so it might not seem like it'd matter to compact all this data into the minimum size, but we're still dealing with, say, 64-bit general purpose registers. We're still dealing with 64-byte cache lines and kilobytes per physically-mapped page, and moving data down the memory hierarchy is expensive. So if you're processing a boatload of data sequentially, for example, and you can reduce that down to 1/8th its size, often you'll be able to process a whole lot more of it in a shorter amount of time.
So a common use of the analogy above to store a bunch of booleans in a small amount of space is when bit flags are involved, like this:
enum Flags
{
flag_selected = 1 << 0,
flag_hidden = 1 << 1,
flag_removed = 1 << 2,
flag_hovering = 1 << 3,
flag_minimized = 1 << 4,
...
};
uint8_t flags = flag_selected | flag_hovering;
Operating on Multiple Bits at Once
But on top of cramming all this data into a smaller amount of space, you can also do things like test for multiple bits simultaneously:
// Check if the element is hidden or removed.
if (flags & (flag_hidden | flag_removed))
{
...
}
And a smart optimizer will typically reduce that down to a single bitwise AND if flag_hidden and flag_removed are literal constants known at compile-time.
As another example, let's go back to the example above:
bool data[64];
Let's say you wanted to test if all 64 booleans are set in which case you do something different. Given this type of representation, we might have to do this:
bool all_set = true;
for (int j=0; j < 64; ++j)
{
if (!data[j])
{
all_set = false;
break;
}
}
if (all_set)
{
// Do something different when all booleans are set.
...
}
And that's pretty expensive when the bitwise representation allows us to do this:
uint64_t data = ...;
if (data == 0xffffffffffffffff)
{
// Do something different when all bits are set.
...
}
This above version can perform the check for all 64 bits set in a single instruction on 64-bit machines. With SIMD registers, you can even test for more than 64 bits at a time with a single SIMD instruction.
As another example let's say you want to count how many of those booleans are set. In that case you might have to do this working with the boolean representation:
int count = 0;
for (int j=0; j < 64; ++j)
count += data[j];
// do something with count
Meanwhile if you used bitwise operations, you can do this:
uint64_t data = ...;
const int count = __popcnt64(data);
// do something with count
And some hardware can do that very efficiently as a native instruction. Others can still do it a whole, whole lot faster than looping through 64 booleans and counting the booleans set to true.
Efficient Arithmetic
Another common one is efficient arithmetic. If you have something like:
x = pow(2, n);
Where n is a runtime variable, then you can often get a much more efficient result doing:
x = 1 << n;
Of course an optimizing compiler using intrinsics for pow might be able to translate the former into the latter, but at least C and C++ compilers I've checked as of late cannot perform this optimization, at least when n is not known at compile-time.
Whenever you're working with power of two, you can often do a lot of things efficiently with bitshifts and other bitwise operations. For example, take this:
x = n % power_of_two;
... where power_of_two is a runtime variable that is always a power of two. In that case you can do:
x = n & (power_of_two - 1);
Which has the same effect as the modulo (only for power of two numbers). It's because a power of two value will always be a set bit followed by zeros. For example, 16 will be 0b10000. If you subtract one from that, it becomes: 0b1111, and using a bitwise and with that will effectively clear all upper bits in a way that gives you the analogical equivalent of n % 16.
Similar thing with left-shifts to multiply by a power of two, right-shifts to divide by a power of two, etc. One of the main reasons a lot of hardware favored power of two image sizes like 16x16, 32x32, 64x64, 256x256, etc. is due to the efficient arithmetic enabled by it using bitwise instructions.
Conclusion
So anyway, this is a brief introduction to what you can do with bitwise operations and instructions from fast arithmetic, reduced memory use, and being able to perform operations on potentially many bits at once without looping through them and operating on them one bit at a time.
And they're still very relevant today in performance-critical fields. For example, if you look at the Atomontage voxel rendering engine, it claims to be able to represent a voxel in about a single bit, and that's important not just to fit huge voxel data in DRAM but also to render it really quickly from smaller, fast memory like registers. Naturally it can't do that if it's going to use 8 bits just to store a true/false kind of value which has to be checked individually.
I work in motion control (among other things) and the way you communicate with the drives is usually by using bit sequences. You set one bit pattern in a memory location x to set the motion profile, you set an enable bit at memory location y to start the motion, read a pattern from location z to get the status of the move, etc. The lower you go the more bit twiddling you have to perform.
There is a reason that, depending on the kind of person, can be important: cool-iness! ;)
Take this, an algorithm to calculate the absolute value of a number, without using any conditional branch instruction:
int abs(const int input) {
    int temp = input >> 31;       // all 1s if input is negative, all 0s otherwise
    return (input ^ temp) - temp; // negates negative inputs (flip bits, add 1); leaves non-negative inputs unchanged
}
A reason to avoid conditional branching is that it could stop your processor pre-fetching, waiting for the condition to be verified to know to which value the program counter should be set.
So, apart from the joke, there are very good technical reasons to do it.