Part of my application data contains a set of 9 ternary (base-3) "bits". To keep the data compact for the database, I would like to store that data as a single short. Since 3^9 < 2^15 I can represent any possible 9 digit base-3 number as a short.
My current method is to work with the value as a string of length 9: I can read or set any digit by index, and it is nice and easy. To convert it to a short, though, I currently convert to base 10 by hand (using a shift-add loop) and then use Int16.Parse to turn the result into a binary short. To convert a stored value back to the base-3 string, I run the process in reverse. All of this takes time, and I would like to optimize it if at all possible.
What I would like to do is always store the value as a short, and read and set ternary bits in place. Ideally, I would have functions to get and set individual digits from the binary in place.
I have tried playing with some bit shifts and mod functions, but haven't quite come up with the right way to do this. I'm not sure it is even possible without going through the full conversion.
Can anyone give me any bitwise arithmetic magic that can help out with this?
public class Base3Handler
{
    // Powers of three: idx[p] == 3^p. The extra entry (3^9 = 19683) lets
    // ReadBase3Bit compute n % idx[position + 1] for the top digit.
    private static readonly int[] idx = {1, 3, 9, 27, 81, 243, 729, 2187, 6561, 19683};

    public static byte ReadBase3Bit(short n, byte position)
    {
        // position is a byte, so it can never be negative.
        if (position > 8)
            throw new ArgumentOutOfRangeException(nameof(position));
        // Strip the digits above 'position', then divide away the ones below it.
        return (byte)((n % idx[position + 1]) / idx[position]);
    }

    public static short WriteBase3Bit(short n, byte position, byte newBit)
    {
        // Adjust by the difference between the old and new digit at that place.
        byte oldBit = ReadBase3Bit(n, position);
        return (short)(n + (newBit - oldBit) * idx[position]);
    }
}
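For the string-to-short conversion itself (and back), here is a minimal sketch using Horner's method with no intermediate base-10 string. The method names are just for illustration, and it assumes the last character is the least significant digit:

public static short Pack(string digits) // e.g. "201102012"
{
    short n = 0;
    foreach (char c in digits)
        n = (short)(n * 3 + (c - '0')); // accumulate the base-3 value
    return n;
}

public static string Unpack(short n)
{
    var chars = new char[9];
    for (int i = 8; i >= 0; i--, n /= 3)
        chars[i] = (char)('0' + n % 3); // peel off the least significant digit
    return new string(chars);
}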
These are small numbers. Store them as you wish, efficiently in memory, but then use a table lookup to convert from one form to another as needed.
You can't do bit operations on ternary values. You need to use multiply, divide and modulo to extract and combine values.
To use bit operations you need to limit the packing to 8 ternaries per short (i.e. 2 bits each)
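A hedged sketch of the table-lookup idea mentioned above: precompute the string form of every possible value once (3^9 = 19683 entries), so conversion in either direction becomes a single lookup. The names here are illustrative, and it assumes using System.Linq and System.Collections.Generic:

static readonly string[] ShortToDigits = BuildTable();
static readonly Dictionary<string, short> DigitsToShort =
    ShortToDigits.Select((s, n) => new { s, n })
                 .ToDictionary(p => p.s, p => (short)p.n);

static string[] BuildTable()
{
    var table = new string[19683]; // 3^9 possible values
    for (int n = 0; n < table.Length; n++)
    {
        var chars = new char[9];
        for (int d = 8, v = n; d >= 0; d--, v /= 3)
            chars[d] = (char)('0' + v % 3); // least significant digit last
        table[n] = new string(chars);
    }
    return table;
}

Then a stored short n converts with ShortToDigits[n], and a digit string s converts back with DigitsToShort[s].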
Related
I have a list of entities, and for the purpose of analysis, an entity can be in one of three states. Of course I wish it was only two states, then I could represent that with a bool.
In most cases there will be a list of entities where the size of the list is usually 100 < n < 500.
I am working on analyzing the effects of the combinations of the entities and the states.
So if I have 1 entity, then I can have 3 combinations. If I have two entities, I can have nine combinations (3^2), and so on.
Because of the amount of combinations, brute forcing this will be impractical (it needs to run on a single system). My task is to find good-but-not-necessarily-optimal solutions that could work. I don't need to test all possible permutations, I just need to find one that works. That is an implementation detail.
What I do need to do is to register the combinations possible for my current data set - this is basically to avoid duplicating the work of analyzing each combination. Every time a process arrives at a certain configuration of combinations, it needs to check whether that combo is already being worked on or was resolved in the past.
So if I have x amount of tri-state values, what is an efficient way of storing and comparing this in memory? I realize there will be limitations here. Just trying to be as efficient as possible.
I can't think of a more effective unit of storage than two bits, where one of the four "bit states" is not used. But I don't know how to make this efficient. Do I need to make a choice between optimizing for storage size and optimizing for performance?
How can something like this be modeled in C# in a way that wastes the least amount of resources and still performs relatively well when a process needs to ask "Has this particular combination of tri-state values already been tested?"?
Edit: As an example, say I have just 3 entities, and the state is represented by a simple integer, 1, 2 or 3. We would then have this list of combinations:
111 112 113 121 122 123 131 132 133
211 212 213 221 222 223 231 232 233
311 312 313 321 322 323 331 332 333
I think you can break this down as follows:
You have a set of N entities, each of which can have one of three different states.
Given one particular permutation of states for those N entities, you want to remember that you have processed that permutation.
It therefore seems that you can treat the N entities as a base-3 number with N digits.
When considering one particular set of states for the N entities, you can store that as an array of N bytes where each byte can have the value 0, 1 or 2, corresponding to the three possible states.
That isn't a memory-efficient way of storing the states for one particular permutation, but that's OK because you don't need to store that array. You just need to store a single bit somewhere corresponding to that permutation.
So what you can do is to convert the byte array into a base 10 number that you can use as an index into a BitArray. You then use the BitArray to remember whether a particular permutation of states has been processed.
To convert a byte array representing a base three number to a decimal number, you can use this code:
public static int ToBase10(byte[] entityStates) // Each state can be 0, 1 or 2.
{
    int result = 0;
    for (int i = 0, n = 1; i < entityStates.Length; n *= 3, ++i)
        result += n * entityStates[i];
    return result;
}
Given that you have numEntities different entities, you can then create a BitArray like so:
int numEntities = 4;
int numPerms = (int)Math.Pow(3, numEntities); // 3^N permutations for N entities
BitArray states = new BitArray(numPerms);
Then states can store a bit for each possible permutation of states for all the entities.
Let's suppose that you have 4 entities A, B, C and D, and you have a permutation of states (which will be 0, 1 or 2) as follows: A2 B1 C0 D1. That is, entity A has state 2, B has state 1, C has state 0 and D has state 1.
You would represent that as a byte array like so:
byte[] permutation = { 2, 1, 0, 1 };
Then you can convert that to a base 10 number like so:
int asBase10 = ToBase10(permutation);
Then you can check if that permutation has been processed like so:
if (!states[asBase10])
{
    // Not processed, so process it.
    process(permutation);
    states[asBase10] = true; // Remember that we processed it.
}
Without getting overly fancy with algorithms and data structures, and assuming your tri-state values can be represented as strings and don't have an easily determined, fixed maximum count - i.e. "111", "112", etc. (or even "1:1:1", "1:1:2") - then a simple SortedSet may end up being fairly efficient.
As a bonus, it doesn't care about the number of values in your set.
SortedSet<string> alreadyTried = new SortedSet<string>();

if (!HasSetBeenTried("1:1:1"))
{
    // do whatever
}

if (!HasSetBeenTried("500:212:100"))
{
    // do whatever
}

public bool HasSetBeenTried(string set)
{
    if (alreadyTried.Contains(set)) return true; // seen before
    alreadyTried.Add(set); // first time: remember it for next time
    return false;
}
Simple math says:
3 entities with 3 states each make 27 combinations.
So you need log(27)/log(2) ≈ 4.75 bits to store that information.
Because a PC can only use whole bits, you need to "waste" ~0.25 bits and use 5 bits per combination.
The more data you gather, the better you can pack that information, but in the end maybe a compression algorithm could help even more.
Again: you only asked for memory efficiency, not performance.
In general you can calculate the bits you need with Math.Ceiling(Math.Log(noCombinations, 2)).
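As a quick check of that formula (a sketch; noCombinations is the variable name assumed above):

int noCombinations = 27;
int bitsNeeded = (int)Math.Ceiling(Math.Log(noCombinations, 2)); // 5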
Because I needed to look at some methods in BigInteger, I DotPeeked into the assembly. And then I found something rather odd:
internal int _sign;
Why would you use an int for the sign of a number? Is there no reason, or is there something I'm missing. I mean, they could use a BitArray, or a bool, or a byte. Why an int?
If you look at some of the usages of the _sign field in the decompiled code, you may find things like this:
if ((this._sign ^ other._sign) < 0)
return this._sign >= 0 ? 1 : -1;
Basically, the int type allows comparing the signs of two values with a single XOR: the result is negative exactly when the signs differ. Obviously neither byte nor bool would allow this.
Still there is a question: why not Int16 then, as it would consume less memory? This is perhaps connected with alignment.
Storing the sign as an int allows you to simply multiply by the sign to apply it to the result of a calculation. This could come in handy when converting to simpler types.
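For example (a hedged sketch; sign stands in for the internal _sign field, which holds -1, 0 or 1):

int sign = -1; // as _sign would hold
long magnitude = 42; // assumed magnitude from some calculation
long result = magnitude * sign; // applies the sign with no branching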
A bool can have only 2 states. The advantage of an int is that it also makes it simple to keep track of the special value, 0:
public bool get_IsZero()
{
    return (this._sign == 0);
}
And several more shortcuts like that when you read the rest of the code.
The size of any class object is going to be rounded up to 32 bits (four bytes), so "saving" three bytes won't buy anything. One might be able to shave four bytes off the size of a typical BigInteger by stealing a bit from one of the words that holds the numeric value, but the extra processing required for such usage would outweigh the cost of wasting a 32-bit integer.
A more interesting possibility might be to have BigInteger be an abstract class, with derived classes PositiveBigInteger and NegativeBigInteger. Since every class object is going to have a word that says what class it is, such an approach would save 32 bits for each BigInteger that's created. Use of an abstract class in such fashion would add an extra virtual member dispatch to each function call, but would likely save an "if" test on most of them (since the methods of e.g. NegativeBigInteger would know by virtue of the fact that they are invoked that this is negative, they wouldn't have to test it). Such a design could also improve efficiency if there were classes for TinyBigInteger (a BigInteger whose value could fit in a single Integer) and SmallBigInteger (a BigInteger whose value could fit in a Long). I have no idea if Microsoft considered such a design, or what the trade-offs would have been.
Gets a number that indicates the sign (negative, positive, or zero) of the current System.Numerics.BigInteger object.
-1: The value of this object is negative.
0: The value of this object is 0 (zero).
1: The value of this object is positive.
That means
class Program
{
    static void Main(string[] args)
    {
        BigInteger bInt1 = BigInteger.Parse("0");
        BigInteger bInt2 = BigInteger.Parse("-5");
        BigInteger bInt3 = BigInteger.Parse("5");

        division10(bInt1); // it is Impossible
        division10(bInt2); // it is Possible : -2
        division10(bInt3); // it is Possible : 2
    }

    static void division10(BigInteger bInt)
    {
        double d = 10;
        if (bInt.IsZero)
        {
            Console.WriteLine("it is Impossible");
        }
        else
        {
            Console.WriteLine("it is Possible : {0}", d / (int)bInt);
        }
    }
}
Also, don't use byte or types such as sbyte, ushort, or uint here, because of the CLS: not all of those types are CLS-compliant.
We are rewriting some applications previously developed in Visual FoxPro and redeveloping them using .Net ( using C# )
Here is our scenario:
Our application uses smartcards. We read in data from a smartcard which has a name and number. The name comes back OK in readable text, but the number, in this case '900', comes back as a 2-byte character representation (131 & 132) and looks like this - ƒ„
Those 2 special characters can be seen in the extended ASCII table. Now, as you can see, the 2 bytes are 131 and 132, and they can vary, as there is no single standard extended ASCII table (as far as I can tell from reading some of the posts on here).
So... the smart card was previously written to using the BINTOC function in VFP, and therefore the 900 was written to the card as ƒ„. Within FoxPro those 2 special characters can be converted back into integer format using CTOBIN, another built-in FoxPro function.
So (finally getting to the point) - so far we have been unable to convert those 2 special characters back to an int (900), and we are wondering whether it is possible in .NET to read the character representation of an integer back into an actual integer.
Or is there a way to rewrite the logic of those 2 VFP functions in C#?
UPDATE:
After some fiddling we realise that to get 900 into 2 bytes we need to convert 900 into a 16-bit binary value, then convert that 16-bit binary value into a decimal value.
So, as above, we are receiving back 131 and 132, whose binary values are 10000011 (decimal 131) and 10000100 (decimal 132).
When we concatenate these 2 values into '1000001110000100' it gives the decimal value 33668; however, if we remove the leading 1 and convert '000001110000100' to decimal, it gives the correct value of 900.
Not too sure why this is, though...
Any help would be appreciated.
It looks like VFP is storing your value as a signed 16-bit (short) integer. It seems to have a strange changeover point for the negative numbers, but it adds 128 to 8-bit numbers and 32768 to 16-bit numbers.
So converting your 16-bit numbers from the string should be as easy as reading the string as a 16-bit integer and then taking 32768 away from it. If you have to do this manually, multiply the first number by 256, then add the second number to get the stored value, and then take 32768 away from that to get your value.
Examples:
131 * 256 = 33536
33536 + 132 = 33668
33668 - 32768 = 900
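In C# that works out to something like this (a minimal sketch, assuming the two bytes read from the card arrive as b1 = 131 and b2 = 132):

byte b1 = 131, b2 = 132; // the two bytes read from the card
int stored = b1 * 256 + b2; // 33668, the raw 16-bit value
int value = stored - 32768; // 900, the original number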
You could try using the C# conversions as per http://msdn.microsoft.com/en-us/library/ms131059.aspx and http://msdn.microsoft.com/en-us/library/tw38dw27.aspx to do at least some of the work for you, but if not, it shouldn't be too hard to code the above manually.
It's a few years late, but here's a working example.
public ulong CharToBin(byte[] s)
{
    // Accept 1 to 8 bytes; anything else yields 0.
    if (s == null || s.Length < 1 || s.Length > 8)
        return 0ul;

    // Bytes are least significant first ("reversed"), as old VFP stored them.
    var result = 0ul;
    var multiplier = 1ul;
    for (var i = 0; i < s.Length; i++)
    {
        if (i > 0)
            multiplier *= 256ul; // advance to the next byte's place value
        result += s[i] * multiplier;
    }
    return result;
}
This is a VFP 8 and earlier equivalent for CTOBIN, which covers your scenario. You should be able to write your own BINTOC based on the code above. VFP 9 added support for multiple options like non-reversed binary data, currency and double data types, and signed values. This sample only covers reversed unsigned binary like older VFP supported.
Some notes:
The code supports 1, 2, 4, and 8-byte values, which covers all unsigned numeric values up to System.UInt64.
Before casting the result down to your expected numeric type, you should verify the ceiling. For example, if you need an Int32, then check the result against Int32.MaxValue before you perform the cast.
The sample avoids the complexity of string encoding by accepting a byte array. You would need to understand which encoding was used to read the string, then apply that same encoding to get the byte array before calling this function. In the VFP world this is frequently Encoding.ASCII, but it depends on the application.
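A usage sketch under the question's numbers (bytes least significant first; the 32768 offset matches the signed storage described in the earlier answer):

var raw = CharToBin(new byte[] { 132, 131 }); // 132 + 131*256 = 33668
var value = (int)raw - 32768; // 900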
I'm trying to determine the number of digits in a C# ulong number. I'm trying to do so using some math logic rather than ToString().Length. I have not benchmarked the two approaches, but I have seen other posts about using System.Math.Floor(System.Math.Log10(number)) + 1 to determine the number of digits.
It seems to work fine until I transition from 999999999999997 to 999999999999998, at which point I start getting an incorrect count.
Has anyone encountered this issue before?
I have seen similar posts with a Java emphasis at Why log(1000)/log(10) isn't the same as log10(1000)? and also a post at How to get the separate digits of an int number?, which indicates how I could possibly achieve the same using the % operator, but with a lot more code.
Here is the code I used to simulate this:
Action<ulong> displayInfo = number =>
Console.WriteLine("{0,-20} {1,-20} {2,-20} {3,-20} {4,-20}",
number,
number.ToString().Length,
System.Math.Log10(number),
System.Math.Floor(System.Math.Log10(number)),
System.Math.Floor(System.Math.Log10(number)) + 1);
Array.ForEach(new ulong[] {
9U,
99U,
999U,
9999U,
99999U,
999999U,
9999999U,
99999999U,
999999999U,
9999999999U,
99999999999U,
999999999999U,
9999999999999U,
99999999999999U,
999999999999999U,
9999999999999999U,
99999999999999999U,
999999999999999999U,
9999999999999999999U}, displayInfo);
Array.ForEach(new ulong[] {
1U,
19U,
199U,
1999U,
19999U,
199999U,
1999999U,
19999999U,
199999999U,
1999999999U,
19999999999U,
199999999999U,
1999999999999U,
19999999999999U,
199999999999999U,
1999999999999999U,
19999999999999999U,
199999999999999999U,
1999999999999999999U
}, displayInfo);
Thanks in advance
Pat
log10 is going to involve floating point conversion - hence the rounding error. The error is pretty small for a double, but is a big deal for an exact integer!
Excluding the .ToString() method and floating-point methods, then yes, I think you are going to have to use an iterative method, but I would use an integer divide rather than a modulo.
Integer-divide by 10. Is the result > 0? If so, iterate; if not, stop.
The number of digits is the number of iterations required.
Eg. 5 -> 0; 1 iteration = 1 digit.
1234 -> 123 -> 12 -> 1 -> 0; 4 iterations = 4 digits.
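A minimal sketch of that loop (CountDigits is just an illustrative name):

static int CountDigits(ulong n)
{
    int digits = 1; // 0..9 already count as one digit
    while ((n /= 10) > 0) // integer divide; stop when the quotient reaches 0
        digits++;
    return digits;
}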
I would use ToString().Length unless you know this is going to be called millions of times.
"premature optimization is the root of all evil" - Donald Knuth
From the documentation:
By default, a Double value contains 15 decimal digits of precision, although a maximum of 17 digits is maintained internally.
I suspect that you're running into precision limits. Your value of 999,999,999,999,998 is probably at the limit of precision, and since the ulong has to be converted to a double before calling Math.Log10, you see this error.
Other answers have posted why this happens.
Here is an example of a fairly quick way to determine the "length" of an integer (some cases excluded). This by itself is not very interesting -- but I include it here because using this method in conjunction with Log10 can get the accuracy "perfect" for the entire range of an unsigned long without requiring a second log invocation.
// The lookup would only be generated once
// and could be a hard-coded array literal.
ulong[] lookup = Enumerable.Range(0, 20)
    .Select(n => (ulong)Math.Pow(10, n)).ToArray();

ulong x = 999;
int i = 0;
for (; i < lookup.Length; i++)
{
    if (lookup[i] > x)
        break;
}
// i is the length of x "in a base-10 string";
// does not work with "0" or negative numbers
This lookup-table approach can be easily converted to any base. This method should be faster than the iterative divide-by-base approach but profiling is left as an exercise to the reader. (A direct if-then branch broken into "groups" is likely quicker yet, but that's way too much repetitive typing for my tastes.)
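A hedged sketch of that combination: take Math.Log10 as a fast guess, then correct it against the same power table. DigitCount and powers are illustrative names, and it assumes using System and System.Linq:

static readonly ulong[] powers = Enumerable.Range(0, 20)
    .Select(n => (ulong)Math.Pow(10, n)).ToArray();

static int DigitCount(ulong x)
{
    if (x == 0) return 1;
    int guess = (int)Math.Log10(x); // may be off by one near powers of ten
    if (guess + 1 < powers.Length && x >= powers[guess + 1])
        guess++; // Log10 rounded down too far
    else if (x < powers[guess])
        guess--; // Log10 rounded up too far (the 999999999999998 case)
    return guess + 1;
}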
Happy coding.
I have an enum declaration like this:
public enum Filter
{
    a = 0x0001,
    b = 0x0002
}
What does that mean? They are using this to filter an array.
It means they're the integer values assigned to those names. Enums are basically just named numbers. You can cast between the underlying type of an enum and the enum value.
For example:
public enum Colour
{
    Red = 1,
    Blue = 2,
    Green = 3
}

Colour green = (Colour) 3;
int three = (int) Colour.Green;
By default an enum's underlying type is int, but you can use any of byte, sbyte, short, ushort, int, uint, long or ulong:
public enum BigEnum : long
{
    BigValue = 0x5000000000 // Couldn't fit this in an int
}
It just means that if you use Filter.a, you get 1, and Filter.b is 2.
The weird hex notation is just that: notation.
EDIT:
Since this is a 'filter' the hex notation makes a little more sense.
By writing 0x1, you specify the following bit pattern:
0000 0001
And 0x2 is:
0000 0010
This makes it clearer on how to use a filter.
So for example, if you wanted to filter out data that has the lower 2 bits set, you could do:
Filter.a | Filter.b
which would correspond to:
0000 0011
The hex notation makes the concept of a filter clearer (for some people). For example, it's relatively easy to figure out the binary of 0x83F0 by looking at it, but much more difficult for 33776 (the same number in base 10).
It's not clear what it is that you find unclear, so let's discuss it all:
The enum values have been given explicit numerical values. Each enum value is always represented as a numerical value for the underlying storage, but if you want to be sure what that numerical value is you have to specify it.
The numbers are written in hexadecimal notation, this is often used when you want the numerical values to contain a single set bit for masking. It's easier to see that the value has only one bit set when it's written as 0x8000 than when it's written as 32768.
In your example it's not as obvious as you have only two values, but for bit filtering each value represents a single bit so that each value is twice as large as the previous:
public enum Filter {
    First = 0x0001,
    Second = 0x0002,
    Third = 0x0004,
    Fourth = 0x0008
}
You can use such an enum to filter out single bits in a value:
if ((num & Filter.First) != 0 && (num & Filter.Third) != 0) {
    Console.WriteLine("First and third bits are set.");
}
It could mean anything. We need to see more code than that to be able to understand what it's doing.
0x001 is the number 1. Anytime you see the 0x it means the programmer has entered the number in hexadecimal.
Those are literal hexadecimal numbers.
The main reason is that hex notation is easier to read when you need numbers that are powers of 2.
To use an enum type as a bit flag, the enum values need to go up in powers of 2: 1, 2, 4, 8, 16, 32, 64, etc. To keep that readable, hex notation is used.
E.g. 2^16 is 0x10000 in hex (neat and clean), but it is written 65536 in classical decimal notation. Same for 0x200 (hex notation) and 512 (2^9).
Those look like they are bit masks of some sort. But their actual values are 1 and 2...
You can assign values to enums such as:
enum Example {
    a = 10,
    b = 23,
    c = 0x00FF
}
etc...
Using hexadecimal notation like that usually indicates that there may be some bit manipulation. I've used this notation often when dealing with this very thing, for the very reason you asked this question - this notation sort of pops out at you and says "Pay attention to me, I'm important!"
Well, we could use integers; in fact we could avoid specifying values at all, since by default an enum assigns 0 to its first member and an incremented value to each following member. Many developers use hex notation here to hit two targets with one arrow:
Complicate the code, making it difficult to understand
Improve performance, since hex codes are nearer to binary
My view is: if that is how we are going to write numbers, why are we in a fourth-generation language at all - we might as well move back to binary.
Still, it is quite a good technique when playing with bits and with encryption/decryption processes.