Hash function for comparison

Normally a hash function creates a unique number. Are there also hash functions that can be used for approximate comparisons?
For example, compared with the series
6 7 8 9 10 11 23 40 10
this series would match:
5 8 10 9 9 12 24 40 20
and this one would not:
25 7 12 9 10 12 90 90
I'm asking because I'm thinking about pattern recognition. Is there some math that lets you specify the percentage of match you want to find? I'm using C# as the programming language.
To clarify, let me first describe an analogy for what I'd like to catch.
Imagine water droplets falling, but not in a constant flow.
The measurement tools are also not perfect. I am timing the intervals between falling droplets; one measurement covers a series of, say, 19 to 25 droplets (in fact I can measure such a series at once, for example by filming it with a camera).
Given one such series, I'd like to figure out whether the next series is different or the same. There might be a random gap of time between series, and the measurement tools don't detect the beginning or end of a series; they just take between 19 and 25 measurements at once.
I'm not sure which direction to go with this: fuzzy logic, neural-network pattern detection, distance vectors... There seem to be lots of ways, but I wonder whether there is something simpler (I was thinking of something like a hash, but maybe it should be something else).

Hash functions can be used for (not uniquely) identifying certain values. They are not guaranteed to be unique (better said, when there are more possible values than hash codes, it is guaranteed that some different values will have identical hash codes). A small deviation in the value usually results in a completely different hash code (as Bobson has already mentioned). Another use of hash codes is to detect inequality of two values in constant time: if the hash codes differ, the values differ.
It might be possible to design a hash function that does what you want, especially if you know the domain your values live in, but that requires some mathematical background.
As far as I know there is no hash function for the example you gave.
Here is another idea for integers: use modulo-10 operations and sum the absolute differences between corresponding digits. This way you calculate a 'distance' between two numbers, not the 'difference'. I once did something similar on strings to find strings that are close to each other.
In C# (assuming non-negative numbers):
using System;
int Distance(int x, int y)
{
    int result = 0;
    // Loop while either number still has digits; with && the extra
    // digits of the longer number were ignored (5 vs 105 gave 0).
    while ((x > 0) || (y > 0))
    {
        result += Math.Abs(x % 10 - y % 10); // compare the last digits
        x /= 10;
        y /= 10;
    }
    return result;
}
void Caller()
{
    int distance = Distance(123, 456);
    if (distance == 0) Console.WriteLine("x and y are equal");
    else Console.WriteLine("the relative distance between x and y = " + distance);
}

Related

Transform 2 ints into 1 int of length 5

This might not belong here, so if I should ask this somewhere else, please tell me.
Let's say we have 10032 (X) and 154 (Y) as the input; what I need is a single int as the output, and that output needs to be 4 or 5 digits long.
Given the output and either X or Y, nobody should be able to discover the formula. In this scenario Y stays the same but X changes often.
I am reading about hashing, but I am unsure which hash would be best for me, or whether a simple math formula would do the job. The program currently uses the following:
X + Y * 2 / 3, rounded down.
The solution also needs a very low collision rate.
Thanks

For this question, you may have better luck at Cryptography Stack Exchange but here are a few thoughts.
It sounds like you want to map a 5-digit int and 3-digit int to a 4- or 5-digit int with the qualifications that:
a. The producing algorithm is difficult to determine given the input
b. There are few collisions
Given some function F(x,y), there are 100,000,000 combinations of x and y if x has between 1 and 5 digits and y between 1 and 3 digits.
If F(x,y) produces a 5-digit number, there are 100,000 possible outputs.
On average this would mean that each value of F(x,y) has 1,000 combinations of x, y that map to it.
So even in the best case (F spreads its inputs evenly), the odds that F(x1,y1)=F(x2,y2) for two different random pairs are about 1 in 100,000, and by the birthday paradox a collision becomes likely once you have a few hundred inputs, which for most uses I can think of would be considered too high.
Considering those things, probably the simplest idea would be a basic modular ring over the ints, as Oscar mentioned. For your modulus, pick the greatest prime number with the number of digits you want to keep. For instance, if you want a 5-digit result, use 99,991. Or if you wanted to reduce collisions and go with 9 digits, you would use 999,999,937. You can use a prime list to look up which prime to use as your modulus.
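A minimal sketch of the modular idea in C#. The combining step `Combine` (append the 3-digit Y to X, then reduce modulo the prime) is a hypothetical choice, not something the answer specifies, and it is not a cryptographic hash: anyone who knows Y and the formula can invert it. The trial-division helper `GreatestPrimeWithDigits` (also hypothetical) finds the greatest prime with a given number of digits.

```csharp
using System;

// Trial-division primality test; fast enough for numbers up to about 10^12.
bool IsPrime(long n)
{
    if (n < 2) return false;
    if (n % 2 == 0) return n == 2;
    for (long d = 3; d * d <= n; d += 2)
        if (n % d == 0) return false;
    return true;
}

// Greatest prime below 10^digits.
long GreatestPrimeWithDigits(int digits)
{
    long candidate = 1;
    for (int k = 0; k < digits; k++) candidate *= 10;
    candidate -= 1;                       // largest number with `digits` digits
    while (!IsPrime(candidate)) candidate--;
    return candidate;
}

// Hypothetical combining step: append the 3-digit y to x, then reduce modulo p.
// Assumes 0 <= y <= 999. This spreads values over [0, p) but is NOT secure.
long Combine(long x, long y, long p) => (x * 1000 + y) % p;

long p5 = GreatestPrimeWithDigits(5);
Console.WriteLine(p5);                      // 99991
Console.WriteLine(Combine(10032, 154, p5)); // 33054
```

`GreatestPrimeWithDigits(9)` gives the 9-digit counterpart, 999,999,937.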

I assume that a good approach to minimise collisions would be to use modulus 10^6 after whatever operation you perform on both numbers.
The hard part would be the operation between the original ints, but look up theory about hashing and I am sure you can find nice suggestions.
In order to make it truly difficult to reverse, you could perform operations in several stages, each one of them depending on the results of the previous one. Just an idea...

decimal d = (X * Y) - (reverse X * reverse Y);
(By "reverse" I mean the digits in reverse order: reverse 10032 would be 23001.)
Then take the first 4 or 5 digits if there are more.
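A sketch of that formula in C#, with two assumptions the answer leaves open: the difference can be negative (it is for 10032 and 154), so its absolute value is used, and `long` replaces `decimal` since every intermediate value is an integer. `Reverse` and `Combine` are hypothetical helper names.

```csharp
using System;

// Reverses the decimal digits of a non-negative number, e.g. 10032 -> 23001.
long Reverse(long n)
{
    long result = 0;
    while (n > 0)
    {
        result = result * 10 + n % 10;
        n /= 10;
    }
    return result;
}

// |(X * Y) - (reverse X * reverse Y)|, then keep the first `digits` digits.
long Combine(long x, long y, int digits)
{
    long d = Math.Abs(x * y - Reverse(x) * Reverse(y));
    string s = d.ToString();
    return s.Length > digits ? long.Parse(s.Substring(0, digits)) : d;
}

// 10032 * 154 - 23001 * 451 = -8828523, so the first 5 digits are 88285
Console.WriteLine(Combine(10032, 154, 5)); // 88285
```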
Or you could make a string that looks like this:
10032154, then run a hash method on it and take the first 4 or 5 digits.
(You could reverse this too, so the string is 45123001.)
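A sketch of the string-and-hash variant. The answer doesn't name a hash, so SHA-256 (System.Security.Cryptography) is an assumption, and the digest is reduced with a modulus rather than by taking its literal first digits; `ShortHash` is a hypothetical name. With only 100,000 possible outputs, collisions still cannot be avoided.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Concatenate X and Y as text, hash with SHA-256, reduce the first 4 bytes modulo `modulus`.
int ShortHash(int x, int y, uint modulus)
{
    byte[] input = Encoding.UTF8.GetBytes(x.ToString() + y.ToString()); // e.g. "10032154"
    byte[] hash;
    using (SHA256 sha = SHA256.Create())
    {
        hash = sha.ComputeHash(input);
    }
    uint value = BitConverter.ToUInt32(hash, 0);
    return (int)(value % modulus);   // modulus 100000 gives at most 5 digits
}

Console.WriteLine(ShortHash(10032, 154, 100000));
```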
By the way, why do you need to keep only the first 4 or 5 digits?
Reducing the number of digits will increase the chance of collisions.

Weighted random number generation C#

I have been searching for an answer to this, but all the discussions I have found are either in a language I don't understand or rely on having a collection where each element has its own weight.
I basically just want a random number between 0 and 10 that is "middle-weighted", as in 5 comes up more often than 0 and 10. I have been trying to figure out an algorithm where I can give any number between the min and max values I have defined as the "weighted number", and all the generated numbers would be weighted appropriately. I know this may sound like "I don't want to think about this, I'll just sit back and wait for someone else to do it", but I have been thinking and searching about this for like an hour and I'm really lost :|
So in the end, I'd like to be able to call (via an extension method):
random.NextWeighted(MIN, MAX, WEIGHT);

Assuming you have an inverse normal distribution method available:
Scale your random number so that it's a double between zero and one.
Pass it to InverseNormalDistribution.
Scale the returned value based on the weight. (For example, divide it by weight / 100.)
Calculate (MIN + MAX) / 2 + ScaledValue * (MAX - MIN).
If that's less than MIN, return MIN. If it's more than MAX, return MAX. Otherwise, return this value.
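A sketch of these steps in C#. .NET has no built-in inverse normal function, so this swaps in the Box-Muller transform, which turns two uniform doubles into a standard normal value (the same distribution you get by passing one uniform value through an inverse normal CDF). The weight is applied by dividing by weight / 100; rounding to the nearest int and the hypothetical `NextWeighted` signature (a local function taking the `Random`) are assumptions.

```csharp
using System;

// Middle-weighted random int in [min, max]; a larger weight packs values closer to the middle.
int NextWeighted(Random random, int min, int max, double weight)
{
    // Box-Muller transform: u1 is in (0, 1] so Math.Log never sees 0.
    double u1 = 1.0 - random.NextDouble();
    double u2 = random.NextDouble();
    double z = Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2); // standard normal

    double scaled = z / (weight / 100.0);                     // scale by the weight
    double value = (min + max) / 2.0 + scaled * (max - min);  // centre on the middle
    return (int)Math.Round(Math.Clamp(value, min, max));      // clamp, then round
}

var rng = new Random(42);
int[] counts = new int[11];
for (int k = 0; k < 10000; k++) counts[NextWeighted(rng, 0, 10, 600)]++;
Console.WriteLine(string.Join(" ", counts)); // how often each of 0..10 came up
```

With weight 600 the standard deviation is (max - min) / 6, so almost all values fall inside the range before clamping; small weights push many samples to MIN or MAX. To call it as `random.NextWeighted(MIN, MAX, WEIGHT)`, move it into a static class and mark the first parameter with `this`.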

I don't know how much more often you want 5 to appear than the other numbers between 0 and 10, but you could create an array with the distribution you want.
Something like
var dist = new []{0,1,2,3,4,5,6,7,8,9,10,5,5,5};
Then, if you pick a random index between 0 and 13, you will get numbers between 0 and 10, but 5 four times more often than the others. It's pretty fast, but not very practical if you want numbers between 0 and a billion.
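A sketch of the lookup in C#; `PickWeighted` is a hypothetical helper name, and Random.Next(dist.Length) supplies the random index from 0 to 13.

```csharp
using System;

// Each value appears once, plus three extra 5s, so 5 is picked 4 times as often.
int[] dist = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 5, 5, 5 };

var random = new Random();
// Next(dist.Length) returns an index from 0 to 13 inclusive.
int PickWeighted() => dist[random.Next(dist.Length)];

Console.WriteLine(PickWeighted());
```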

Get number of digits in an unsigned long integer c#

I'm trying to determine the number of digits in a C# ulong number using some math logic rather than ToString().Length. I have not benchmarked the two approaches, but I have seen other posts about using System.Math.Floor(System.Math.Log10(number)) + 1 to determine the number of digits.
It seems to work fine until I go from 999999999999997 to 999999999999998, at which point I start getting an incorrect count.
Has anyone encountered this issue before?
I have seen similar posts with a Java emphasis ("Why log(1000)/log(10) isn't the same as log10(1000)?") and also a post ("How to get the separate digits of an int number?") which shows how I could achieve the same using the % operator, but with a lot more code.
Here is the code I used to simulate this:
Action<ulong> displayInfo = number =>
    Console.WriteLine("{0,-20} {1,-20} {2,-20} {3,-20} {4,-20}",
        number,
        number.ToString().Length,
        System.Math.Log10(number),
        System.Math.Floor(System.Math.Log10(number)),
        System.Math.Floor(System.Math.Log10(number)) + 1);
Array.ForEach(new ulong[] {
    9U,
    99U,
    999U,
    9999U,
    99999U,
    999999U,
    9999999U,
    99999999U,
    999999999U,
    9999999999U,
    99999999999U,
    999999999999U,
    9999999999999U,
    99999999999999U,
    999999999999999U,
    9999999999999999U,
    99999999999999999U,
    999999999999999999U,
    9999999999999999999U }, displayInfo);
Array.ForEach(new ulong[] {
    1U,
    19U,
    199U,
    1999U,
    19999U,
    199999U,
    1999999U,
    19999999U,
    199999999U,
    1999999999U,
    19999999999U,
    199999999999U,
    1999999999999U,
    19999999999999U,
    199999999999999U,
    1999999999999999U,
    19999999999999999U,
    199999999999999999U,
    1999999999999999999U
}, displayInfo);
Thanks in advance
Pat

log10 involves a floating-point conversion, hence the rounding error. The error is pretty small for a double, but it is a big deal for an exact integer!
If you exclude ToString() and floating-point methods, then yes, I think you will have to use an iterative method, but I would use integer division rather than modulo.
Integer-divide by 10. Is the result > 0? If so, iterate again. If not, stop.
The number of digits is the number of iterations required.
E.g. 5 -> 0; 1 iteration = 1 digit.
1234 -> 123 -> 12 -> 1 -> 0; 4 iterations = 4 digits.
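The steps above as a C# sketch (`CountDigits` is a hypothetical name). A do/while loop is used so that 0 counts as one digit, a case the steps don't cover explicitly.

```csharp
using System;

// Count decimal digits by repeated integer division by 10.
int CountDigits(ulong number)
{
    int digits = 0;
    do
    {
        number /= 10;   // drop the last digit
        digits++;       // one iteration per digit
    } while (number > 0);
    return digits;
}

Console.WriteLine(CountDigits(999999999999998UL)); // 15
```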

I would use ToString().Length unless you know this is going to be called millions of times.
"premature optimization is the root of all evil" - Donald Knuth

From the documentation:
By default, a Double value contains 15 decimal digits of precision, although a maximum of 17 digits is maintained internally.
I suspect that you're running into precision limits. Your value of 999,999,999,999,998 is probably at the limit of precision, and since the ulong has to be converted to double before calling Math.Log10, you see this error.

Other answers have posted why this happens.
Here is an example of a fairly quick way to determine the "length" of an integer (some cases excluded). This by itself is not very interesting -- but I include it here because using this method in conjunction with Log10 can get the accuracy "perfect" for the entire range of an unsigned long without requiring a second log invocation.
// the lookup would only be generated once
// and could be a hard-coded array literal
ulong[] lookup = Enumerable.Range(0, 20)
    .Select((n) => (ulong)Math.Pow(10, n)).ToArray();
ulong x = 999;
int i = 0;
for (; i < lookup.Length; i++) {
    if (lookup[i] > x) {
        break;
    }
}
// i is length of x "in a base-10 string"
// does not work with "0" or negative numbers
This lookup-table approach can be easily converted to any base. This method should be faster than the iterative divide-by-base approach but profiling is left as an exercise to the reader. (A direct if-then branch broken into "groups" is likely quicker yet, but that's way too much repetitive typing for my tastes.)
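A sketch of the combination with Log10 mentioned above, assuming the Log10 estimate is off by at most one (double rounding errors are far smaller than that): take Floor(Log10(x)) + 1 as a first guess, then correct it with a single comparison against the table. `DigitCount` is a hypothetical name, and the table is built by repeated multiplication so no floating-point rounding is involved.

```csharp
using System;

// 10^0 .. 10^19, built exactly with integer multiplication.
ulong[] lookup = new ulong[20];
lookup[0] = 1;
for (int k = 1; k < lookup.Length; k++) lookup[k] = lookup[k - 1] * 10;

int DigitCount(ulong x)
{
    if (x == 0) return 1; // Log10(0) is -Infinity, so treat zero separately
    // First estimate from Log10; double rounding can make it one too high or too low
    int d = (int)Math.Floor(Math.Log10(x)) + 1;
    // x has exactly d digits when 10^(d-1) <= x < 10^d
    if (x < lookup[d - 1]) d--;
    else if (d < lookup.Length && x >= lookup[d]) d++;
    return d;
}

Console.WriteLine(DigitCount(999999999999998UL)); // 15
```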
Happy coding.

Calculate factorials in C#

How can you calculate large factorials using C#? The Windows 7 calculator overflows at factorial(3500). As a programming and mathematical question, I am interested in knowing how you can calculate the factorial of a larger number (20000, maybe) in C#. Any pointers?
[Edit] I just checked with the calculator on Win 2k3, since I recalled doing a bigger factorial there. I was surprised by the way things worked out.
Calc on Win2k3 worked even with big numbers. I tried 50000! and got an answer: 3.3473205095971448369154760940715e+213236
It was very fast.
The question here is not only about finding the appropriate data type; it is also a bit mathematical. If I write simple factorial code in C# (recursive or loop), the performance is really bad: it takes multiple seconds to get an answer. How is the calculator in Windows 2k3 (or XP) able to compute such a huge factorial in less than 10 seconds? Is there any other way of calculating a factorial programmatically in C#?

Have a look at the BigInteger structure:
http://msdn.microsoft.com/en-us/library/system.numerics.biginteger.aspx
Maybe this can help you implement this functionality.
CodeProject has an implementation for older versions of the framework at http://www.codeproject.com/KB/cs/biginteger.aspx.

If I try to write a simple factorial code in C# [recursive or loop], the performance is really bad. It takes multiple seconds to get an answer.
Let's do a quick order-of-magnitude calculation here for a naive implementation of factorial that performs n multiplications. Suppose we are on the last step. 19999! is about $2^{18}$ bits. 20000 needs only 15 bits, but we'll assume that it is a 32 bit ($2^5$ bit) integer. The final multiplication therefore involves the addition of up to $2^5$ partial results, each roughly $2^{18}$ bits long. The number of bit operations will therefore be on the order of $2^{23}$.
That's for the last stage; there are 20000 stages, which we'll round up to $2^{16}$, so that is a total of about $2^{39}$ operations. Some of them will of course be cheaper, but we're going for an order of magnitude here.
A modern processor does about $2^{32}$ operations per second. Therefore it will take about $2^7$ seconds to get the result.
Of course, the big integer library writers were not naive; they take advantage of the ability of the chip to do many bit operations in parallel. They're probably doing the math in 32 bit chunks, giving speedups of a factor of $2^5$. So our total order-of-magnitude calculation is that it should take about $2^2$ seconds to get a result.
$2^2$ is 4. So your observation that it takes a few seconds to get a result is expected.
How is the calc in Windows 2k3 (or XP) able to perform such a huge factorial in less than 10 seconds?
I don't know. Extreme cleverness in exploiting the math operations on the chip probably. Or, using a non-naive algorithm for calculating factorial. Or, possibly they are using Stirling's Approximation and getting an inexact result.
Is there any other way of calculating factorial programmatically in C#?
Sure. If all you care about is the order of magnitude then you can use Stirling's Approximation. If you care about the exact value then you're going to have to compute it.
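A sketch of the order-of-magnitude route in C#, using Stirling's series for ln(n!) with its first correction term 1/(12n); `Log10Factorial` is a hypothetical name. The result is an approximation only, but it is enough for the digit count and leading digits of 50000!.

```csharp
using System;

// Stirling's series: ln(n!) ~ n ln n - n + 0.5 ln(2 pi n) + 1/(12 n), for n >= 1.
double Log10Factorial(int n)
{
    double lnFactorial = n * Math.Log(n) - n + 0.5 * Math.Log(2 * Math.PI * n) + 1.0 / (12.0 * n);
    return lnFactorial / Math.Log(10);   // convert natural log to base 10
}

double log10 = Log10Factorial(50000);
long digitCount = (long)Math.Floor(log10) + 1;               // number of decimal digits
double mantissa = Math.Pow(10, log10 - Math.Floor(log10));   // leading digits
Console.WriteLine($"{mantissa}e+{Math.Floor(log10)} ({digitCount} digits)");
```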

There exist sophisticated computational algorithms for efficiently computing the factorials of large, arbitrary precision numbers. The Schönhage–Strassen algorithm, for instance, allows you to perform asymptotically fast multiplication for arbitrarily large integers.
Case in point, Mathematica computes 22000! on my machine in less than 1 second. The Implementation Notes page at reference.wolfram.com states:
(Mathematica's) n! uses an O(log(n) M(n)) algorithm of Schönhage based on dynamic decomposition to prime powers.
Unfortunately, the implementation of such algorithms is both complicated and error prone. Rather than trying to roll your own implementation, it may be wiser for you to license a copy of Mathematica (or a similar product that meets your functional and performance needs) and either use it, or a .NET programming interface to it, to perform your computation.
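Schönhage's prime-power method is too involved for a short example, but a much simpler trick is worth knowing: multiply the numbers in a balanced binary tree, so that most multiplications combine operands of similar size instead of repeatedly multiplying one huge number by a small one. This is a sketch of that divide-and-conquer product with System.Numerics.BigInteger, not the algorithm quoted above; `RangeProduct` and `Factorial` are hypothetical names, and how much it helps depends on the runtime's BigInteger multiplication, so measure before relying on it.

```csharp
using System;
using System.Numerics;

// Product of all integers in [lo, hi], split in half recursively (a "product tree").
BigInteger RangeProduct(int lo, int hi)
{
    if (lo > hi) return BigInteger.One;
    if (hi - lo < 8)
    {
        // Small ranges: a plain loop is cheapest.
        BigInteger p = BigInteger.One;
        for (int i = lo; i <= hi; i++) p *= i;
        return p;
    }
    int mid = lo + (hi - lo) / 2;
    return RangeProduct(lo, mid) * RangeProduct(mid + 1, hi);
}

BigInteger Factorial(int n) => RangeProduct(2, n);

Console.WriteLine(Factorial(20)); // 2432902008176640000
```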

Have you looked at System.Numerics.BigInteger?

With System.Numerics.BigInteger:
using System.Numerics;
var bi = new BigInteger(1);
var factorial = 171; // the n whose factorial is computed
for (var i = 1; i <= factorial; i++)
{
    bi *= i;
}
will be calculated to
1241018070217667823424840524103103992616605577501693185388951803611996075221691752992751978120487585576464959501670387052809889858690710767331242032218484364310473577889968548278290754541561964852153468318044293239598173696899657235903947616152278558180061176365108428800000000000000000000000000000000000000000
For 50000! it takes a couple of seconds to calculate, but it seems to work: the result is a 213237-digit number, which is also what Wolfram says.

You will probably have to implement your own arbitrary-precision numeric type.
There are various approaches. Probably not the most efficient, but perhaps the simplest, is to use variable-length arrays of byte (unsigned char), where each element represents a digit. Ideally this would be wrapped in a class, and you could then add a method that lets you multiply the number by another arbitrary-precision number. A multiply by a standard C# integer would probably also be a good idea, but a little trickier to implement.
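A minimal sketch of the digit-array idea, assuming a `List<byte>` that stores one decimal digit per element, least significant first. It only implements multiplication by an ordinary int, which is all a factorial loop needs; `MultiplyBy` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Digits stored least-significant first: 120 is { 0, 2, 1 }.
List<byte> digits = new List<byte> { 1 };

// Multiply the digit array in place by an ordinary int (assumes m >= 1).
void MultiplyBy(List<byte> number, int m)
{
    long carry = 0;
    for (int i = 0; i < number.Count; i++)
    {
        long product = (long)number[i] * m + carry;
        number[i] = (byte)(product % 10);   // keep one digit
        carry = product / 10;               // push the rest to the next digit
    }
    while (carry > 0)                       // leftover carry becomes new high digits
    {
        number.Add((byte)(carry % 10));
        carry /= 10;
    }
}

for (int n = 2; n <= 25; n++) MultiplyBy(digits, n);   // 25!
string text = string.Concat(digits.AsEnumerable().Reverse());
Console.WriteLine(text); // 15511210043330985984000000
```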

Since they don't give you the result down to the last digit, they may be "cheating" using some approximation.
Check out http://mathworld.wolfram.com/StirlingsApproximation.html
Using Stirling's formula you can calculate (an approximation of) the factorial of n in log n time. Of course, they might also have a dictionary with pre-calculated values of factorial(n) for every n up to one million, making the calculator show the result extremely fast.

This answer covers the limits of the basic .NET types for computing and representing n!.
Basic code to calculate the factorial for "SomeType" that supports multiplication:
SomeType factorial = 1;
int n = 35;
for (int i = 1; i <= n; i++)
{
    factorial *= i;
}
Limits for built-in number types:
short - correct results up to 7!, incorrect results afterwards; the code returns 0 starting at 18! (similar to int)
int - correct results up to 12!, incorrect results afterwards; the code returns 0 starting at 34! (Why computing factorial of relatively small numbers (34+) returns 0)
float - precise results up to 13!, correct but not precise afterwards; returns infinity starting at 35!
long - correct results up to 20!, incorrect results afterwards; the code returns 0 starting at 66! (similar to int)
double - precise results up to 22!, correct but not precise afterwards; returns infinity starting at 171!
BigInteger - precise; the upper limit is set by memory only.
Note: integer types overflow pretty quickly and start producing incorrect results. Realistically, if you need factorials for any practical use, long is the type to go with (up to 20!); if you can't limit the input, BigInteger is the only type in the .NET Framework that gives precise results (albeit slowly for large numbers, as there is no built-in optimized n! method).

You need a special big-number library for this. This link introduces the System.Numerics.BigInteger class, and incidentally has an example program that calculates factorials. But don't use the example! If you recurse like that, your stack will grow horribly. Just write a for-loop to do the multiplication.

I don't know how you could do this in a language without arbitrary-precision arithmetic. I guess a start could be to count the factors of 5 and 2, remove them from the product, and add the corresponding zeroes at the end.
As you can see there are many.
>>> factorial(20000)
<<non-zeroes removed>>0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000L
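The zero-counting step of that idea needs only the factors of 5, because 1..n always contains at least as many factors of 2 (Legendre's formula). This sketch counts the trailing zeros without computing the rest of the product; `TrailingZeros` is a hypothetical name.

```csharp
using System;

// Trailing zeros of n!: each factor of 5 pairs with a factor of 2 to make a 10.
int TrailingZeros(int n)
{
    int count = 0;
    for (long power = 5; power <= n; power *= 5)
        count += (int)(n / power);   // multiples of 5, 25, 125, ...
    return count;
}

Console.WriteLine(TrailingZeros(20000)); // 4999
```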

How to manage AI actions based on percentages

I have been looking for some time into how a programmer can simulate an AI decision based on action percentages for Final Fantasy Tactics-like games (strategy games).
Say for example that the AI character has the following actions:
Attack 1: 10%
Attack 2: 9%
Magic : 4%
Move : 1%
These add up to far less than 100%.
At first I thought about having an array with 100 empty slots, where attack 1 would have 10 slots and attack 2 would have 9 slots. Using a random number I could then get the action to perform. My problem is that this is not really efficient, or doesn't seem to be. Another important question: what do I do if I land on an empty slot? Do I have to scale all actions for each character to 100%, or maybe define a "default" action for everyone?
Or maybe there is a more efficient way to look at all of this? I think that percentages are the easiest way to implement an AI.

The best answer I can come up with is to make a list of all the possible moves you want the character to have, give each a relative value, then scale all of them to total 100%.
EDIT:
For example, here are three moves I have. I want attack and magic to be equally likely, and fleeing to be half as likely as attacking or using magic:
attack = 20
magic = 20
flee = 10
This adds up to 50, so dividing each by this total gives me a fractional value (multiply by 100 for percentage):
attack = 0.4
magic = 0.4
flee = 0.2
Then, I would make from this a list of cumulative values (i.e. each entry is a sum of that entry and all that came before it):
attack = 0.4
magic = 0.8
flee = 1
Now, generate a random number between 0 and 1 and find the first entry in the list that is greater than or equal to that number. That is the move you make.
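These steps as a C# sketch; `PickAction` is a hypothetical helper that normalizes on the fly. It uses a strict "greater than" comparison because Random.NextDouble returns values in [0, 1); that differs from "greater than or equal" only on exact boundaries.

```csharp
using System;

// Pick an index from relative weights using a random number r in [0, 1).
int PickAction(double[] weights, double r)
{
    double total = 0;
    foreach (double w in weights) total += w;

    double cumulative = 0;
    for (int i = 0; i < weights.Length; i++)
    {
        cumulative += weights[i] / total;  // running sum of normalized weights
        if (r < cumulative) return i;      // first entry above r
    }
    return weights.Length - 1;             // guard against rounding error near 1.0
}

string[] actions = { "attack", "magic", "flee" };
double[] actionWeights = { 20, 20, 10 };
var random = new Random();
Console.WriteLine(actions[PickAction(actionWeights, random.NextDouble())]);
```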

No, you just create thresholds. One simple way is:
0 - 9 -> Attack 1
10 - 18 -> Attack 2
19 - 22 -> Magic
23 -> Move
24 - 99 -> Something else (you need to add up to 100)
Now create a random number and mod it by 100 (so num = randomNumber % 100) to pick your action. The better the random number generator, the closer you will get to the intended distribution. Then take the result and see which category it falls into. You can make this even more efficient, but it is a good start.
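A sketch of the threshold table in C#. It uses Random.Next(100), which already returns 0 to 99, instead of taking an arbitrary random number % 100 (which can be slightly biased when the generator's range isn't a multiple of 100). The mapping is a switch expression with relational patterns (C# 9 or later); `ChooseAction` is a hypothetical name.

```csharp
using System;

// Map a number 0..99 to an action using the thresholds above.
string ChooseAction(int num) => num switch
{
    <= 9 => "Attack 1",    // 10 values: 0-9
    <= 18 => "Attack 2",   // 9 values: 10-18
    <= 22 => "Magic",      // 4 values: 19-22
    23 => "Move",          // 1 value
    _ => "Something else"  // 24-99: the default action
};

var random = new Random();
Console.WriteLine(ChooseAction(random.Next(100)));
```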

Well, if they don't all add up to 100, they aren't really percentages. This doesn't matter, though; you just need to figure out the relative probability of each action. To do this, use the following formula:
prob = value_of_action / total_value_of_all_actions
This gives you a number between 0 and 1. If you really want a percentage rather than a fraction, multiply it by 100.
Here is an example:
prob_attack = 10 / (10 + 9 + 4 + 1)
= 10 / 24
= 0.4167
This equates to attack being chosen 41.67% of the time.
You can then generate thresholds as mentioned in other answers, and use a random number between 0 and 1 to choose your action.
