Rabin-Karp string matching algorithm - C#

I've seen this Rabin-Karp string matching algorithm in the forums on the website and I'm interested in trying to implement it, but I was wondering if anyone could tell me why the variables ulong Q and ulong D are 100007 and 256 respectively?
What significance do these values carry?
static void Main(string[] args)
{
string A = "String that contains a pattern.";
string B = "pattern";
ulong siga = 0;
ulong sigb = 0;
ulong Q = 100007;
ulong D = 256;
for (int i = 0; i < B.Length; i++)
{
siga = (siga * D + (ulong)A[i]) % Q;
sigb = (sigb * D + (ulong)B[i]) % Q;
}
if (siga == sigb)
{
Console.WriteLine(string.Format(">>{0}<<{1}", A.Substring(0, B.Length), A.Substring(B.Length)));
return;
}
ulong pow = 1;
for (int k = 1; k <= B.Length - 1; k++)
pow = (pow * D) % Q;
for (int j = 1; j <= A.Length - B.Length; j++)
{
siga = (siga + Q - pow * (ulong)A[j - 1] % Q) % Q;
siga = (siga * D + (ulong)A[j + B.Length - 1]) % Q;
if (siga == sigb)
{
if (A.Substring(j, B.Length) == B)
{
Console.WriteLine(string.Format("{0}>>{1}<<{2}", A.Substring(0, j),
A.Substring(j, B.Length),
A.Substring(j + B.Length)));
return;
}
}
}
Console.WriteLine("Not copied!");
}

About the magic numbers, Paul's answer is pretty clear.
As far as the code is concerned, Rabin-Karp's principal idea is to perform a hash comparison between a sliding portion of the string and the pattern.
The hash cannot be recomputed from scratch on every substring, otherwise the complexity would be O(n·m) (quadratic when the pattern length m grows with n) instead of linear O(n) in the typical case.
Therefore a rolling hash function is used, so that at each iteration only the outgoing and the incoming character are needed to update the hash value of the substring.
So, let's comment your code:
for (int i = 0; i < B.Length; i++)
{
siga = (siga * D + (ulong)A[i]) % Q;
sigb = (sigb * D + (ulong)B[i]) % Q;
}
if (siga == sigb)
{
Console.WriteLine(string.Format(">>{0}<<{1}", A.Substring(0, B.Length), A.Substring(B.Length)));
return;
}
^ This piece computes the hash of pattern B (sigb) and the hash of the initial substring of A that has the same length as B (siga).
Actually it's not completely correct, because hashes can collide¹, so it is necessary to modify the if statement: if (siga == sigb && A.Substring(0, B.Length) == B).
ulong pow = 1;
for (int k = 1; k <= B.Length - 1; k++)
pow = (pow * D) % Q;
^ Here pow is computed, i.e. D^(B.Length - 1) mod Q, the weight of the leading character of the window, which is needed to perform the rolling hash.
for (int j = 1; j <= A.Length - B.Length; j++)
{
siga = (siga + Q - pow * (ulong)A[j - 1] % Q) % Q;
siga = (siga * D + (ulong)A[j + B.Length - 1]) % Q;
if (siga == sigb)
{
if (A.Substring(j, B.Length) == B)
{
Console.WriteLine(string.Format("{0}>>{1}<<{2}", A.Substring(0, j),
A.Substring(j, B.Length),
A.Substring(j + B.Length)));
return;
}
}
}
^ Finally, the rest of the string (i.e. from the second character to the end) is scanned: at each step the hash of the current substring of A is updated and compared with the hash of B (computed at the beginning).
If the two hashes are equal, the substring and the pattern are compared¹ and, if they're actually equal, a message is printed and the method returns.
¹ Hash values can collide; hence, if two strings have different hash values they're definitely different, but if the two hashes are equal the strings may or may not be equal.
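The walkthrough above can be condensed into one helper that always verifies a hash hit before reporting a match. This is only a sketch under assumptions: the class name RabinKarpSketch and the method IndexOf are hypothetical, and the defaults for d and q are simply the values from the question, not recommended constants.

```csharp
using System;

static class RabinKarpSketch
{
    // Returns the index of the first occurrence of pattern in text, or -1.
    // d (base) and q (modulus) default to the values used in the question.
    public static int IndexOf(string text, string pattern, ulong d = 256, ulong q = 100007)
    {
        int n = text.Length, m = pattern.Length;
        if (m == 0) return 0;
        if (m > n) return -1;

        ulong hText = 0, hPat = 0, pow = 1;
        for (int i = 0; i < m; i++)
        {
            hText = (hText * d + text[i]) % q;
            hPat = (hPat * d + pattern[i]) % q;
            if (i > 0) pow = (pow * d) % q; // ends as d^(m-1) mod q
        }

        for (int j = 0; ; j++)
        {
            // Equal hashes are only a hint: confirm with a real comparison.
            if (hText == hPat && string.CompareOrdinal(text, j, pattern, 0, m) == 0)
                return j;
            if (j + m >= n) return -1;

            // Roll the window: drop text[j], then append text[j + m].
            hText = (hText + q - pow * text[j] % q) % q;
            hText = (hText * d + text[j + m]) % q;
        }
    }
}
```

Because the first window goes through the same verified check as every later one, the separate unverified comparison before the loop is no longer needed.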

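The algorithm uses hashing for fast string comparison. Q and D are magic numbers that the coder probably arrived at with a little bit of trial and error and that give a good distribution of hash values for this particular algorithm. In the posted code D is the base by which the running hash is multiplied for each character (256 covers every single-byte character value), and Q is the modulus that keeps the hash value small.
You can see these types of magic numbers used for hashing in many places. The example below is the decompiled definition of the GetHashCode function of the .NET 2.0 string type: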
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
public override unsafe int GetHashCode()
{
char* chrPointer = null;
int num1;
int num2;
fixed (string str = (string)this)
{
num1 = 352654597;
num2 = num1;
int* numPointer = chrPointer;
for (int i = this.Length; i > 0; i = i - 4)
{
num1 = (num1 << 5) + num1 + (num1 >> 27) ^ numPointer;
if (i <= 2)
{
break;
}
num2 = (num2 << 5) + num2 + (num2 >> 27) ^ numPointer + (void*)4;
numPointer = numPointer + (void*)8;
}
}
return num1 + num2 * 1566083941;
}
Here is another example, from a ReSharper (R#) generated GetHashCode override for a sample type:
public override int GetHashCode()
{
unchecked
{
int result = (SomeStrId != null ? SomeStrId.GetHashCode() : 0);
result = (result*397) ^ (Desc != null ? Desc.GetHashCode() : 0);
result = (result*397) ^ (AnotherId != null ? AnotherId.GetHashCode() : 0);
return result;
}
}

inaccurate results with function to add an array of digits together

So I have this function:
static int[] AddArrays(int[] a, int[] b)
{
int length1 = a.Length;
int length2 = b.Length;
int carry = 0;
int max_length = Math.Max(length1, length2) + 1;
int[] minimum_arr = new int[max_length - length1].Concat(a).ToArray();
int[] maximum_arr = new int[max_length - length2].Concat(b).ToArray();
int[] new_arr = new int[max_length];
for (int i = max_length - 1; i >= 0; i--)
{
int first_digit = maximum_arr[i];
int second_digit = i - (max_length - minimum_arr.Length) >= 0 ? minimum_arr[i - (max_length - minimum_arr.Length)] : 0;
if (second_digit + first_digit + carry > 9)
{
new_arr[i] = (second_digit + first_digit + carry) % 10;
carry = 1;
}
else
{
new_arr[i] = second_digit + first_digit + carry;
carry = 0;
}
}
if (carry == 1)
{
int[] result = new int[max_length + 1];
result[0] = 1;
Array.Copy(new_arr, 0, result, 1, max_length);
return result;
}
else
{
return new_arr;
}
}
It basically takes two arrays of digits and adds them together. The point is that each array of digits represents a number that is bigger than the integer limits. The function is close to working, but the results get inaccurate in certain places and I honestly have no idea why. For example, if the function is given these inputs:
"1481298410984109284109481491284901249018490849081048914820948019" and
"3475893498573573849739857349873498739487598" (both of these are turned into an array of integers before being sent to the function)
the expected output is:
1,481,298,410,984,109,284,112,957,384,783,474,822,868,230,706,430,922,413,560,435,617
and what i get is:
1,481,298,410,984,109,284,457,070,841,142,258,634,158,894,233,092,241,356,043,561,7
I would very much appreciate some help with this; I've been trying to figure it out for hours and I can't seem to get it to work perfectly.
I suggest reversing arrays a and b and using the good old school algorithm:
static int[] AddArrays(int[] a, int[] b) {
Array.Reverse(a);
Array.Reverse(b);
int[] result = new int[Math.Max(a.Length, b.Length) + 1];
int carry = 0;
int value = 0;
for (int i = 0; i < Math.Max(a.Length, b.Length); ++i) {
value = (i < a.Length ? a[i] : 0) + (i < b.Length ? b[i] : 0) + carry;
result[i] = value % 10;
carry = value / 10;
}
if (carry > 0)
result[result.Length - 1] = carry;
else
Array.Resize(ref result, result.Length - 1);
// Let's restore a and b
Array.Reverse(a);
Array.Reverse(b);
Array.Reverse(result);
return result;
}
Demo:
string a = "1481298410984109284109481491284901249018490849081048914820948019";
string b = "3475893498573573849739857349873498739487598";
string c = string.Concat(AddArrays(
a.Select(d => d - '0').ToArray(),
b.Select(d => d - '0').ToArray()));
Console.Write(c);
Output:
1481298410984109284112957384783474822868230706430922413560435617
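One way to gain confidence in a digit-array adder is to compare it with System.Numerics.BigInteger on the same inputs. The sketch below is an assumption-laden illustration, not the answer's code: DigitAddCheck, AddDigits and MatchesBigInteger are hypothetical names, and AddDigits is a variant that indexes from the end instead of reversing, so the inputs are never modified. It assumes the inputs are non-empty digit strings without leading zeros.

```csharp
using System;
using System.Linq;
using System.Numerics;

static class DigitAddCheck
{
    // Adds two most-significant-first digit arrays without modifying them.
    public static int[] AddDigits(int[] a, int[] b)
    {
        int n = Math.Max(a.Length, b.Length);
        int[] result = new int[n + 1];
        int carry = 0;
        for (int k = 0; k < n; k++)
        {
            int ia = a.Length - 1 - k; // k-th digit from the right of a
            int ib = b.Length - 1 - k; // k-th digit from the right of b
            int sum = (ia >= 0 ? a[ia] : 0) + (ib >= 0 ? b[ib] : 0) + carry;
            result[n - k] = sum % 10;
            carry = sum / 10;
        }
        result[0] = carry;
        // Drop the leading slot when no final carry was produced.
        return carry > 0 ? result : result.Skip(1).ToArray();
    }

    // True when the digit-array sum agrees with BigInteger addition.
    public static bool MatchesBigInteger(string x, string y)
    {
        int[] digits = AddDigits(x.Select(c => c - '0').ToArray(),
                                 y.Select(c => c - '0').ToArray());
        return string.Concat(digits) == (BigInteger.Parse(x) + BigInteger.Parse(y)).ToString();
    }
}
```

Calling MatchesBigInteger with the two strings from the question is a quick regression check for any rewrite of the function.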

How can I optimize this problem while using only 3 for loops?

So the problem that I'm trying to optimize is to find and print all four-digit numbers of the type ABCD for which: A + B = C + D.
For example:
1001
1010
1102
etc.
I have used four for loops to solve this (one for every digit of the number).
for (int a = 1; a <= 9; a++)
{
for (int b = 0; b <= 9; b++)
{
for (int c = 0; c <= 9; c++)
{
for (int d = 0; d <= 9; d++)
{
if ((a + b) == (c + d))
{
Console.WriteLine(" " + a + " " + b + " " + c + " " + d);
}
}
}
}
}
My question is: how can I solve this using only 3 for loops?
Here's an option with two loops (though still 10,000 iterations), separating the pairs of digits:
int sumDigits(int input)
{
int result = 0;
while (input != 0)
{
result += input % 10;
input /= 10;
}
return result;
}
//optimized for max of two digits
int sumDigitsAlt(int input)
{
return (input % 10) + ( (input / 10) % 10);
}
// a and b
for (int i = 0; i <= 99; i++)
{
int sum = sumDigits(i);
// c and d
for (int j = 0; j <= 99; j++)
{
if (sum == sumDigits(j))
{
Console.WriteLine( (100 * i) + j);
}
}
}
I suppose the while() loop inside of sumDigits() might count as a third loop, but since we know we have at most two digits we could remove it if needed.
And, of course, we can use a similar tactic to do this with one loop which counts from 0 to 9999, and even that we can hide:
var numbers = Enumerable.Range(0, 10000).
Where(n => {
// there is no a/b
if (n < 100 && n == 0) return true;
if (n < 100) return false;
int sumCD = n % 10;
n /= 10;
sumCD += n % 10;
n /= 10;
int sumAB = n % 10;
n /= 10;
sumAB += n % 10;
return (sumAB == sumCD);
});
One approach is to write a method that takes in an integer and returns true if the integer has four digits and the sum of the first two equals the sum of the last two:
public static bool FirstTwoEqualLastTwo(int input)
{
if (input < 1000 || input > 9999) return false;
var first = input / 1000;
var second = (input - first * 1000) / 100;
var third = (input - first * 1000 - second * 100) / 10;
var fourth = input - first * 1000 - second * 100 - third * 10;
return (first + second) == (third + fourth);
}
Then you can write a single loop from 1000-9999 and output the numbers for which this is true with a space between each digit (not sure why that's the output, but it appears that's what you were doing in your sample code):
static void Main(string[] args)
{
for (int i = 1000; i < 10000; i++)
{
if (FirstTwoEqualLastTwo(i))
{
Console.WriteLine(" " + string.Join(" ", i.ToString().ToArray()));
}
}
Console.Write("Done. Press any key to exit...");
Console.ReadKey();
}
We can compute the value of d from the values of a,b,c.
for (int a = 1; a <= 9; a++)
{
for (int b = 0; b <= 9; b++)
{
for (int c = 0; c <= 9; c++)
{
if (a + b >= c && a + b <= 9 + c)
{
int d = a + b - c;
Console.WriteLine(" " + a + " " + b + " " + c + " " + d);
}
}
}
}
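We can further optimize by changing the third loop to for (int c = Math.Max(0, a + b - 9); c <= Math.Min(9, a + b); c++) and getting rid of the if statement. The upper bound has to be capped at 9 because c is a single digit, while a + b can be as large as 18.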
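The tightened third loop can be sketched as follows. The class and method names (BalancedDigits, Find) are hypothetical, and the results are collected into a list rather than printed so that they can be counted; both bounds of c are clamped so that c and d each stay a single digit.

```csharp
using System;
using System.Collections.Generic;

static class BalancedDigits
{
    // Returns every four-digit number ABCD with A + B == C + D.
    public static List<int> Find()
    {
        var found = new List<int>();
        for (int a = 1; a <= 9; a++)
        {
            for (int b = 0; b <= 9; b++)
            {
                int sum = a + b;
                // c must be a digit, and d = sum - c must be a digit too.
                for (int c = Math.Max(0, sum - 9); c <= Math.Min(9, sum); c++)
                {
                    int d = sum - c;
                    found.Add(1000 * a + 100 * b + 10 * c + d);
                }
            }
        }
        return found;
    }
}
```

Every iteration of the innermost loop now produces a result, so the loop body runs 615 times instead of 9,000.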

Rounding amount with available set of denominations

This isn't the regular kind of rounding, which rounds up or down based on a single value.
I want a function to which I pass the amount as an integer and the denominations as an array of integers.
The function should return the nearest integer value achievable with the passed array of denominations.
Whether to round up or down is passed as another parameter.
Code:
var amount = 61; // for. e.g.
int[] denoms = [20, 50]; // for. e.g.
bool roundUp = true;
amount = RoundAmount(amount, denoms, roundUp);
Expected result :
RoundAmount should return the nearest amount achievable with the denoms that I have passed.
If roundUp = true, the return value should be 70, because 70 = 20 + 50: the amount 70 can be achieved with one note of 20 and one note of 50.
If roundUp = false, it should return 60, because 60 = 20 + 20 + 20: the amount 60 can be achieved with three notes of 20.
What I got so far:
I have only reached the point where I can round the amount up or down based on a single integer (and not an array of integers).
public int RoundAmount(int amount, int value, bool roundUp)
{
if (roundUp)
amount = amount - (amount % value) + value;
else
amount = amount - (amount % value);
return amount;
}
Edit:
I have another recursive function which checks if amount is achievable or not,
Only if amount isn't achievable, RoundAmount function is called.
So in my example, amount = 70 will never be the input because 70 is achievable with available denoms and I won't call the RoundAmount in that case.
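As a baseline for the requirement described above, a plain reachability table is enough for small amounts. This is only a sketch under assumptions: the names NaiveRounding and Round are hypothetical, amount is assumed non-negative, and at least one denomination is assumed positive. It is a brute-force approach and makes no claim to match the speed of the optimized solutions.

```csharp
using System;
using System.Linq;

static class NaiveRounding
{
    // Nearest amount buildable from denoms, at or above (roundUp) or at or below amount.
    public static int Round(int amount, int[] denoms, bool roundUp)
    {
        int[] d = denoms.Where(x => x > 0).Distinct().ToArray();
        if (d.Length == 0) throw new ArgumentException("need a positive denomination");

        // Adding the smallest denomination to the largest buildable value below
        // amount always lands at or above amount, so this limit is sufficient.
        int limit = amount + d.Min();
        bool[] reachable = new bool[limit + 1];
        reachable[0] = true; // zero notes

        for (int n = 1; n <= limit; n++)
            foreach (int x in d)
                if (x <= n && reachable[n - x]) { reachable[n] = true; break; }

        if (roundUp)
        {
            for (int n = amount; n <= limit; n++)
                if (reachable[n]) return n;
        }
        else
        {
            for (int n = amount; n >= 0; n--)
                if (reachable[n]) return n;
        }
        throw new InvalidOperationException("no buildable amount found");
    }
}
```

With amount = 61 and denoms = { 20, 50 } the table marks 0, 20, 40, 50, 60, 70 and 80 as buildable, which gives the 70 and 60 from the expected result.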
Solution: (Thanks to maraca and Koray)
I'm glad it's working with long numbers, though that wasn't the original requirement.
private static long RoundAmount_maraca(long a, long[] d, bool up)
{
d = d.ToArray();
Array.Sort(d);
if (a < d[0])
return up ? d[0] : 0;
long count = 0;
for (long i = 0; i < d.Length; i++)
{
if (d[i] == 0)
continue;
for (long j = i + 1; j < d.Length; j++)
if (d[j] % d[i] == 0)
d[j] = 0;
if (d[i] > a && !up)
break;
d[count++] = d[i];
if (d[i] > a)
break;
}
if (count == 1)
return (!up ? a : (a + d[0] - 1)) / d[0] * d[0];
long gcd = euclid(d[1], d[0]);
for (long i = 2; i < count && gcd > 1; i++)
gcd = euclid(d[i], gcd);
if (up)
a = (a + gcd - 1) / gcd;
else
a /= gcd;
for (long i = 0; i < count; i++)
{
d[i] /= gcd;
if (a % d[i] == 0)
return a * gcd;
}
var set = new HashSet<long>();
set.Add(0);
long last = 0;
for (long n = d[0]; ; n++)
{
if (!up && n > a)
return last * gcd;
for (long i = 0; i < count && n - d[i] >= 0; i++)
{
if (set.Contains(n - d[i]))
{
if (n >= a)
return n * gcd;
if ((a - n) % d[0] == 0)
return a * gcd;
set.Add(n);
last = n;
break;
}
}
}
}
private static long euclid(long a, long b)
{
while (b != 0)
{
long h = a % b;
a = b;
b = h;
}
return a;
}
I am assuming that you are looking for a performant solution with a relatively small number of denominations b (e.g. fewer than 100), while the amount a and the denominations d[i] can be quite large (e.g. less than 10^6).
1. Sort d ascending and remove duplicates. When rounding down, keep only the values smaller than or equal to a; when rounding up, additionally keep the smallest value greater than or equal to a and discard the greater ones.
2. (Optional) Remove all numbers which are a multiple of some other number, O(b^2).
3. Calculate the greatest common divisor gcd of the denominations. You can use the Euclidean algorithm starting with the first two numbers, then calculate the greatest common divisor of the result and the third number, and so on. Of course you can stop as soon as you reach one.
4. Divide a by gcd, rounding the result in the direction you want (using integer division, rounding down: a /= gcd, rounding up: a = (a + gcd - 1) / gcd).
5. Divide all denominations by gcd (d[i] /= gcd). Now the greatest common divisor of all denominations is one and therefore it is guaranteed that a Frobenius number exists (all amounts greater than that number can be built and require no rounding). While doing so you can also check if the new value leads to a % d[i] == 0 and immediately return a * gcd if so.
6. Create a hash set for the values which can be built. It is better than an array because the array potentially wastes a lot of space (remember the Frobenius number). Add zero to the set.
7. Create a variable n for the current number and initialize it with the smallest denomination: n = d[0].
8. If n can be built with any of the available denominations, in other words if the set contains any of n - d[i], then proceed with the next step. Otherwise increase n by one and repeat this step, unless n == a and you are rounding down; then you can immediately return the last number that could be built, multiplied by gcd. You could also remove n - d[b - 1] from the set each time because this value will not be requested any more.
9. If n >= a return n * gcd (this can only be true when rounding up; rounding down would already have returned the result in step 8). Else if (a - n) % d[0] == 0 return a * gcd. This check is even better than looking for the Frobenius number (the number after which d[0] - 1 consecutive values can be built); it is more or less equivalent (d[0] - 1 consecutive values means the difference between one of them and a modulo d[0] has to be zero) but could return much faster. Else increase n by one and continue with step 8.
An example with d = {4, 6} and a = 9999 (or any other big odd number) shows the advantages of this algorithm. It is easy to see that odd numbers can never be built and we would fill up the whole set with all even numbers except 2. But if we divide by gcd we get d = {2, 3} and aUp = 5000 and aDown = 4999. The Frobenius number for {2, 3} is 1 (the only number which cannot be built), so after at most 3 (first number where all modulos are covered) steps (instead of 10000) the modulo would be zero and we would return a * gcd which gives 9998 or 10000 depending on rounding direction, which is the correct result.
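The gcd scaling used in this example can be reproduced in a few lines. This is a hedged sketch: GcdReduction and Reduce are hypothetical names that do not appear in the answer's code, and all denominations are assumed to be positive.

```csharp
using System;
using System.Linq;

static class GcdReduction
{
    // Euclidean algorithm, same shape as the euclid helper in the answer.
    public static long Gcd(long a, long b)
    {
        while (b != 0) { long h = a % b; a = b; b = h; }
        return a;
    }

    // Scales the amount and the denominations down by their common gcd.
    public static (long Gcd, long Amount, long[] Denoms) Reduce(long amount, long[] denoms, bool up)
    {
        long g = denoms.Aggregate(0L, (acc, x) => Gcd(acc, x)); // Gcd(0, x) == x
        long scaled = up ? (amount + g - 1) / g : amount / g;   // ceiling or floor division
        return (g, scaled, denoms.Select(x => x / g).ToArray());
    }
}
```

For a = 9999 and d = {4, 6} this yields gcd 2, denominations {2, 3} and a scaled amount of 5000 when rounding up or 4999 when rounding down, matching the numbers in the paragraph above.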
Here is the code with test included. I did six runs on my crappy notebook and it took 90, 92, 108, 94, 96 and 101 seconds (edit: early loop escape if current denomination greater than current number && n - d[i] >= 0 halves the times and gives an average of about 45s) for 7200 random roundings (3600 in each direction) with combinations of different amounts of denominations (range 2 to 100), dMax (range 100 to 10^6) and aMax (range 10^4 to 10^6), (see the code at the bottom for the exact values). I think the time for the random number generation and output can be neglected, so with this input and the given ranges the algorithm rounds about 160 numbers per second on average (edit: see thirty times faster version below).
public static final int round(int a, int[] d, boolean up) {
d = d.clone(); // otherwise input gets changed
Arrays.sort(d);
if (a < d[0])
return up ? d[0] : 0;
int count = 0;
for (int i = 0; i < d.length; i++) {
if (d[i] == 0)
continue;
for (int j = i + 1; j < d.length; j++)
if (d[j] % d[i] == 0)
d[j] = 0;
if (d[i] > a && !up)
break;
d[count++] = d[i];
if (d[i] > a)
break;
}
if (count == 1)
return (!up ? a : (a + d[0] - 1)) / d[0] * d[0];
int gcd = euclid(d[1], d[0]);
for (int i = 2; i < count && gcd > 1; i++)
gcd = euclid(d[i], gcd);
if (up)
a = (a + gcd - 1) / gcd;
else
a /= gcd;
for (int i = 0; i < count; i++) {
d[i] /= gcd;
if (a % d[i] == 0)
return a * gcd;
}
Set<Integer> set = new HashSet<>();
set.add(0);
int last = 0;
for (int n = d[0];; n++) {
if (!up && n > a)
return last * gcd;
for (int i = 0; i < count && n - d[i] >= 0; i++) {
if (set.contains(n - d[i])) {
if (n >= a)
return n * gcd;
if ((a - n) % d[0] == 0)
return a * gcd;
set.add(n);
last = n;
break;
}
}
}
}
public static final int euclid(int a, int b) {
while (b != 0) {
int h = a % b;
a = b;
b = h;
}
return a;
}
public static final int REPEAT = 100;
public static final int[] D_COUNT = {2, 5, 10, 20, 50, 100};
public static final int[] D_MAX = {100, 10000, 1000000};
public static final int[] A_MAX = {10000, 1000000};
public static void main(String[] args) {
long start = System.currentTimeMillis();
Random r = new Random();
for (int i = 0; i < REPEAT; i++) {
for (int j = 0; j < D_COUNT.length; j++) {
for (int k = 0; k < D_MAX.length; k++) {
for (int l = 0; l < A_MAX.length; l++) {
int[] d = new int[D_COUNT[j]];
for (int m = 0; m < d.length; m++)
d[m] = r.nextInt(D_MAX[k]);
int a = r.nextInt(A_MAX[l]);
System.out.println(round(a, d, false));
System.out.println(round(a, d, true));
}
}
}
}
System.out.println((System.currentTimeMillis() - start) / 1000 + " seconds");
}
As it turns out, @Koray's edit 7 is about three times faster for the given input (only for very large gcds is my algorithm above faster). So to get the ultimate algorithm I replaced the dynamic programming part of my algorithm with that of @Koray (with some improvements). It worked: it is roughly ten times faster than edit 7 and thirty times faster than the algorithm above, which would give about 5000 roundings per second on average (a very rough estimate).
private static int round(int a, int[] d, boolean up) {
d = d.clone();
Arrays.sort(d);
if (a < d[0])
return up ? d[0] : 0;
int count = 0;
for (int i = 0; i < d.length; i++) {
if (d[i] == 0)
continue;
if (a % d[i] == 0)
return a;
for (int j = i + 1; j < d.length; j++)
if (d[j] > 0 && d[j] % d[i] == 0)
d[j] = 0;
if (d[i] > a && !up)
break;
d[count++] = d[i];
if (d[i] > a)
break;
}
if (count == 1)
return (!up ? a : (a + d[0] - 1)) / d[0] * d[0];
int gcd = euclid(d[1], d[0]);
for (int i = 2; i < count && gcd > 1; i++)
gcd = euclid(d[i], gcd);
if (gcd > 1) {
if (up)
a = (a + gcd - 1) / gcd;
else
a /= gcd;
for (int i = 0; i < count; i++) {
d[i] /= gcd;
if (a % d[i] == 0)
return a * gcd;
}
}
int best = !up ? d[count - 1] : ((a + d[0] - 1) / d[0] * d[0]);
if (d[count - 1] > a) {
if (d[count - 1] < best)
best = d[count - 1];
count--;
}
Stack<Integer> st = new Stack<Integer>();
BitSet ba = new BitSet(a + 1);
for (int i = 0; i < count; i++) {
ba.set(d[i]);
st.push(d[i]);
}
while (st.size() > 0) {
int v1 = st.pop();
for (int i = 0; i < count; i++) {
int val = v1 + d[i];
if (val <= a && !ba.get(val)) {
if ((a - val) % d[0] == 0)
return a * gcd;
ba.set(val, true);
st.push(val);
if (!up && val > best)
best = val;
} else if (val > a) {
if (up && val < best)
best = val;
break;
}
}
}
return best * gcd;
}
private static void test()
{
var amount = 61;
int[] denoms = new int[] { 20, 50 };
int up = RoundAmount(amount, denoms, true);//->70
int down = RoundAmount(amount, denoms, false);//->60
}
private static int RoundAmount(int amount, int[] denoms, bool roundUp)
{
HashSet<int> hs = new HashSet<int>(denoms);
bool added = true;
while (added)
{
added = false;
var arr = hs.ToArray();
foreach (int v1 in arr)
foreach (int v2 in arr)
if ((v1 < amount) && (v2 < amount) && (hs.Add(v1 + v2)))
added = true;
}
int retval = roundUp ? int.MaxValue : int.MinValue;
foreach (int v in hs)
{
if (roundUp)
{
if ((v < retval) && (v >= amount))
retval = v;
}
else
{
if ((v > retval) && (v <= amount))
retval = v;
}
}
return retval;
}
Edit 7
Edit 6 had a bug if a "0" denom exists. I examined @maraca's code in detail (it's just great, I think) and, inspired by it, tried some optimizations on my code. Here are the performance comparisons. (I've tried to convert maraca's code to C#; I hope I've done it right.)
private static int REPEAT = 100;
private static int[] D_COUNT = { 2, 5, 10, 20, 50, 100 };
private static int[] D_MAX = { 100, 10000, 1000000 };
private static int[] A_MAX = { 10000, 1000000 };
private static void testR()
{
Random r = new Random();
long wMaraca = 0;
long wKoray = 0;
for (int i = 0; i < REPEAT; i++)
{
for (int j = 0; j < D_COUNT.Length; j++)
{
for (int k = 0; k < D_MAX.Length; k++)
{
for (int l = 0; l < A_MAX.Length; l++)
{
int[] d = new int[D_COUNT[j]];
for (int m = 0; m < d.Length; m++)
d[m] = r.Next(D_MAX[k]);
int a = r.Next(A_MAX[l]);
Stopwatch maraca = Stopwatch.StartNew();
int m1 = RoundAmount_maraca(a, d, false);
int m2 = RoundAmount_maraca(a, d, true);
maraca.Stop();
wMaraca += maraca.ElapsedMilliseconds;
Stopwatch koray = Stopwatch.StartNew();
int k1 = RoundAmount_koray(a, d, false);
int k2 = RoundAmount_koray(a, d, true);
koray.Stop();
wKoray += koray.ElapsedMilliseconds;
if ((m1 != k1) || (m2 != k2))
{
throw new Exception("something is wrong!");
}
}
}
}
}
//some results with debug compile
//try1
//maraca: 50757 msec
//koray: 19188 msec
//try2
//maraca: 52623 msec
//koray: 19102 msec
//try3
//maraca: 57139 msec
//koray: 18952 msec
//try4
//maraca: 64911 msec
//koray: 21070 msec
}
private static int RoundAmount_koray(int amount, int[] denoms, bool roundUp)
{
List<int> lst = denoms.ToList();
lst.Sort();
if (amount < lst[0])
return roundUp ? lst[0] : 0;
HashSet<int> hs = new HashSet<int>();
for (int i = 0, count = lst.Count; i < count; i++)
{
int v = lst[i];
if (v != 0)
{
if (v > amount && !roundUp)
break;
if (hs.Add(v))
{
if (amount % v == 0)
return amount;
else
for (int j = i + 1; j < count; j++)
if (lst[j] != 0)
if (v % lst[j] == 0)
lst[j] = 0;
else if (amount % (v + lst[j]) == 0)
return amount;
}
}
}
denoms = hs.ToArray();
HashSet<int> hsOK = new HashSet<int>(denoms);
Stack<int> st = new Stack<int>(denoms);
BitArray ba = new BitArray(amount + denoms.Max() * 2 + 1);
int minOK = amount - denoms.Min();
while (st.Count > 0)
{
int v1 = st.Pop();
foreach (int v2 in denoms)
{
int val = v1 + v2;
if (!ba.Get(val))
{
if (amount % val == 0)
return amount;
ba.Set(val, true);
if (val < amount)
st.Push(val);
if (val >= minOK)
hsOK.Add(val);
}
}
}
if (!roundUp)
{
int retval = 0;
foreach (int v in hsOK)
if (v > retval && v <= amount)
retval = v;
return retval;
}
else
{
int retval = int.MaxValue;
foreach (int v in hsOK)
if (v < retval && v >= amount)
retval = v;
return retval;
}
}
private static int RoundAmount_maraca(int a, int[] d, bool up)
{
d = d.ToArray();
Array.Sort(d);
if (a < d[0])
return up ? d[0] : 0;
int count = 0;
for (int i = 0; i < d.Length; i++)
{
if (d[i] == 0)
continue;
for (int j = i + 1; j < d.Length; j++)
if (d[j] % d[i] == 0)
d[j] = 0;
if (d[i] > a && !up)
break;
d[count++] = d[i];
if (d[i] > a)
break;
}
if (count == 1)
return (!up ? a : (a + d[0] - 1)) / d[0] * d[0];
int gcd = euclid(d[1], d[0]);
for (int i = 2; i < count && gcd > 1; i++)
gcd = euclid(d[i], gcd);
if (up)
a = (a + gcd - 1) / gcd;
else
a /= gcd;
for (int i = 0; i < count; i++)
{
d[i] /= gcd;
if (a % d[i] == 0)
return a * gcd;
}
var set = new HashSet<int>();
set.Add(0);
int last = 0;
for (int n = d[0]; ; n++)
{
if (!up && n > a)
return last * gcd;
for (int i = 0; i < count && n - d[i] >= 0; i++)
{
if (set.Contains(n - d[i]))
{
if (n >= a)
return n * gcd;
if ((a - n) % d[0] == 0)
return a * gcd;
set.Add(n);
last = n;
break;
}
}
}
}
private static int euclid(int a, int b)
{
while (b != 0)
{
int h = a % b;
a = b;
b = h;
}
return a;
}
Edit - Maraca in c#
Maraca's last edit clearly outperforms all! I have tried to prepare a better c# conversion of his code + added a ulong version. (int version is ~1.6 times faster than the ulong version)
#region maraca int
private static int RoundAmount_maraca(int a, int[] d0, bool up)
{
int[] d = new int[d0.Length];
Buffer.BlockCopy(d0, 0, d, 0, d.Length * sizeof(int));
Array.Sort(d);
if (a < d[0])
return up ? d[0] : 0;
int count = 0;
for (int i = 0; i < d.Length; i++)
{
if (d[i] == 0)
continue;
for (int j = i + 1; j < d.Length; j++)
if (d[j] % d[i] == 0)
d[j] = 0;
if (d[i] > a && !up)
break;
d[count++] = d[i];
if (d[i] > a)
break;
}
if (count == 1)
return (!up ? a : (a + d[0] - 1)) / d[0] * d[0];
int gcd = euclid(d[1], d[0]);
for (int i = 2; i < count && gcd > 1; i++)
gcd = euclid(d[i], gcd);
if (up)
a = (a + gcd - 1) / gcd;
else
a /= gcd;
for (int i = 0; i < count; i++)
{
d[i] /= gcd;
if (a % d[i] == 0)
return a * gcd;
}
int best = !up ? d[count - 1] : ((a + d[0] - 1) / d[0] * d[0]);
if (d[count - 1] > a)
{
if (d[count - 1] < best)
best = d[count - 1];
count--;
}
var st = new Stack<int>();
BitArray ba = new BitArray(a+1);
for (int i = 0; i < count; i++)
{
ba.Set(d[i], true);
st.Push(d[i]);
}
while (st.Count > 0)
{
int v1 = st.Pop();
for (int i = 0; i < count; i++)
{
int val = v1 + d[i];
if (val <= a && !ba.Get(val))
{
if ((a - val) % d[0] == 0)
return a * gcd;
ba.Set(val, true);
st.Push(val);
if (!up && val > best)
best = val;
}
else if (up && val > a && val < best)
best = val;
}
}
return best * gcd;
}
private static int euclid(int a, int b)
{
while (b != 0)
{
int h = a % b;
a = b;
b = h;
}
return a;
}
#endregion
#region maraca ulong
private static ulong RoundAmount_maraca_ulong(ulong a, ulong[] d0, bool up)
{
ulong[] d = new ulong[d0.Length];
Buffer.BlockCopy(d0, 0, d, 0, d.Length * sizeof(ulong));
Array.Sort(d);
if (a < d[0])
return up ? d[0] : 0ul;
int count = 0;
for (int i = 0; i < d.Length; i++)
{
if (d[i] == 0ul)
continue;
for (int j = i + 1; j < d.Length; j++)
if (d[j] % d[i] == 0ul)
d[j] = 0ul;
if (d[i] > a && !up)
break;
d[count++] = d[i];
if (d[i] > a)
break;
}
if (count == 1)
return (!up ? a : (a + d[0] - 1ul)) / d[0] * d[0];
ulong gcd = euclid(d[1], d[0]);
for (int i = 2; i < count && gcd > 1; i++)
gcd = euclid(d[i], gcd);
if (up)
a = (a + gcd - 1ul) / gcd;
else
a /= gcd;
for (int i = 0; i < count; i++)
{
d[i] /= gcd;
if (a % d[i] == 0ul)
return a * gcd;
}
ulong best = !up ? d[count - 1] : ((a + d[0] - 1ul) / d[0] * d[0]);
if (d[count - 1] > a)
{
if (d[count - 1] < best)
best = d[count - 1];
count--;
}
var st = new Stack<ulong>();
UlongBitArray ba = new UlongBitArray(a + 1ul);
for (int i = 0; i < count; i++)
{
ba.Set(d[i], true);
st.Push(d[i]);
}
while (st.Count > 0)
{
ulong v1 = st.Pop();
for (int i = 0; i < count; i++)
{
ulong val = v1 + d[i];
if (val <= a && !ba.Get(val))
{
if ((a - val) % d[0] == 0ul)
return a * gcd;
ba.Set(val, true);
st.Push(val);
if (!up && val > best)
best = val;
}
else if (up && val > a && val < best)
best = val;
}
}
return best * gcd;
}
private static ulong euclid(ulong a, ulong b)
{
while (b != 0)
{
ulong h = a % b;
a = b;
b = h;
}
return a;
}
class UlongBitArray
{
ulong[] bits;
public UlongBitArray(ulong length)
{
// A ulong holds 64 bits, so pack 64 flags per element instead of 32.
this.bits = new ulong[(length - 1ul) / 64ul + 1ul];
}
public bool Get(ulong index)
{
return (this.bits[index / 64ul] & (1ul << (int)(index % 64ul))) > 0ul;
}
public void Set(ulong index, bool val)
{
if (val)
this.bits[index / 64ul] |= 1ul << (int)(index % 64ul);
else
this.bits[index / 64ul] &= ~(1ul << (int)(index % 64ul));
}
}
#endregion
Edit 8
I have made some improvements and in random tests outperformed @maraca's latest update :) If you choose to use my custom stack class, please make measurements in release mode. (This custom stack class is of course much slower in debug mode but 5-15% faster than .NET's in release mode. In my tests, using the .NET Stack class did not change the performance comparison between the two; it's just an extra boost.)
private delegate int RoundAmountDelegate(int amount, int[] denoms, bool roundUp);
private static int REPEAT = 100;
private static int[] D_COUNT = { 2, 5, 10, 20, 50, 100 };
private static int[] D_MAX = { 100, 10000, 1000000 };
private static int[] A_MAX = { 10000, 1000000 };
private static void testR()
{
#if DEBUG
while (true)
#endif
{
Random r = new Random();
long wT1 = 0; RoundAmountDelegate func1 = RoundAmount_maraca;
long wT2 = 0; RoundAmountDelegate func2 = RoundAmount_koray;
for (int i = 0; i < REPEAT; i++)
{
for (int j = 0; j < D_COUNT.Length; j++)
{
for (int k = 0; k < D_MAX.Length; k++)
{
for (int l = 0; l < A_MAX.Length; l++)
{
int[] d = new int[D_COUNT[j]];
ulong[] dl = new ulong[D_COUNT[j]];
for (int m = 0; m < d.Length; m++)
{
d[m] = r.Next(D_MAX[k]) + 1;
dl[m] = (ulong)d[m];
}
int a = r.Next(A_MAX[l]);
ulong al = (ulong)a;
Stopwatch w1 = Stopwatch.StartNew();
int m1 = func1(a, d, false);
int m2 = func1(a, d, true);
w1.Stop();
wT1 += w1.ElapsedMilliseconds;
Stopwatch w2 = Stopwatch.StartNew();
int k1 = func2(a, d, false);
int k2 = func2(a, d, true);
w2.Stop();
wT2 += w2.ElapsedMilliseconds;
if ((m1 != k1) || (m2 != k2))
{
#if !DEBUG
MessageBox.Show("error");
#else
throw new Exception("something is wrong!");
#endif
}
}
}
}
}
//some results with release compile
//maraca: 1085 msec
//koray(with .NET Stack<int>): 801 msec
//maraca: 1127 msec
//koray(with .NET Stack<int>): 741 msec
//maraca: 989 msec
//koray(with .NET Stack<int>): 736 msec
//maraca: 962 msec
//koray(with .NET Stack<int>): 632 msec
//-------------------------------------------
//maraca: 1045 msec
//koray(with custom stack): 674 msec
//maraca: 1060 msec
//koray(with custom stack): 606 msec
//maraca: 1175 msec
//koray(with custom stack): 711 msec
//maraca: 878 msec
//koray(with custom stack): 699 msec
#if !DEBUG
MessageBox.Show(wT1 + " " + wT2 + " %" + (double)wT2 / (double)wT1 * 100d);
#endif
}
}
#region Koray
private static int RoundAmount_koray(int amount, int[] denoms, bool roundUp)
{
int[] sorted = new int[denoms.Length];
Buffer.BlockCopy(denoms, 0, sorted, 0, sorted.Length * sizeof(int));
Array.Sort(sorted);
int minD = sorted[0];
if (amount < minD)
return roundUp ? minD : 0;
HashSet<int> hs = new HashSet<int>();
for (int i = 0, count = sorted.Length; i < count; i++)
{
int v = sorted[i];
if (v != 0)
{
if (!roundUp && v > amount)
break;
else if (hs.Add(v))
{
if (amount % v == 0)
return amount;
else
for (int j = i + 1; j < count; j++)
if (sorted[j] != 0)
if (v % sorted[j] == 0)
sorted[j] = 0;
else if (amount % (v + sorted[j]) == 0)
return amount;
}
}
}
denoms = new int[hs.Count];
int k = 0;
foreach (var v in hs)
denoms[k++] = v;
HashSet<int> hsOK = new HashSet<int>(denoms);
stack st = new stack(denoms);
//Stack<int> st = new Stack<int>(denoms);
BitArray ba = new BitArray(amount + denoms[denoms.Length - 1] * 2 + 1);
int minOK = roundUp ? amount : amount - minD;
int maxOK = roundUp ? amount + minD : amount;
while (st.Count > 0)
{
int v1 = st.Pop();
foreach (int v2 in denoms)
{
int val = v1 + v2;
if (val <= maxOK)
{
if (!ba.Get(val))
{
if (amount % val == 0)
return amount;
int diff = amount - val;
if (diff % v1 == 0 || diff % v2 == 0)
return amount;
ba.Set(val, true);
if (val < amount)
st.Push(val);
if (val >= minOK)
hsOK.Add(val);
}
}
else
break;
}
}
if (!roundUp)
{
int retval = 0;
foreach (int v in hsOK)
if (v > retval && v <= amount)
retval = v;
return retval;
}
else
{
int retval = int.MaxValue;
foreach (int v in hsOK)
if (v < retval && v >= amount)
retval = v;
return retval;
}
}
private sealed class stack
{
int[] _array;
public int Count;
public stack()
{
this._array = new int[0];
}
public stack(int[] arr)
{
this.Count = arr.Length;
this._array = new int[this.Count*2];
Buffer.BlockCopy(arr, 0, this._array, 0, this.Count * sizeof(int));
}
public void Push(int item)
{
if (this.Count == this._array.Length)
{
int[] destinationArray = new int[2 * this.Count];
Buffer.BlockCopy(this._array, 0, destinationArray, 0, this.Count * sizeof(int));
this._array = destinationArray;
}
this._array[this.Count++] = item;
}
public int Pop()
{
return this._array[--this.Count];
}
}
#endregion
#region Maraca
private static int RoundAmount_maraca(int a, int[] d0, bool up)
{
int[] d = new int[d0.Length];
Buffer.BlockCopy(d0, 0, d, 0, d.Length * sizeof(int));
Array.Sort(d);
if (a < d[0])
return up ? d[0] : 0;
int count = 0;
for (int i = 0; i < d.Length; i++)
{
if (d[i] == 0)
continue;
for (int j = i + 1; j < d.Length; j++)
if (d[j] % d[i] == 0)
d[j] = 0;
if (d[i] > a && !up)
break;
d[count++] = d[i];
if (d[i] > a)
break;
}
if (count == 1)
return (!up ? a : (a + d[0] - 1)) / d[0] * d[0];
int gcd = euclid(d[1], d[0]);
for (int i = 2; i < count && gcd > 1; i++)
gcd = euclid(d[i], gcd);
if (up)
a = (a + gcd - 1) / gcd;
else
a /= gcd;
for (int i = 0; i < count; i++)
{
d[i] /= gcd;
if (a % d[i] == 0)
return a * gcd;
}
int best = !up ? d[count - 1] : ((a + d[0] - 1) / d[0] * d[0]);
if (d[count - 1] > a)
{
if (d[count - 1] < best)
best = d[count - 1];
count--;
}
var st = new Stack<int>();
BitArray ba = new BitArray(a + 1);
for (int i = 0; i < count; i++)
{
ba.Set(d[i], true);
st.Push(d[i]);
}
while (st.Count > 0)
{
int v1 = st.Pop();
for (int i = 0; i < count; i++)
{
int val = v1 + d[i];
if (val <= a && !ba.Get(val))
{
if ((a - val) % d[0] == 0)
return a * gcd;
ba.Set(val, true);
st.Push(val);
if (!up && val > best)
best = val;
}
else if (up && val > a && val < best)
best = val;
}
}
return best * gcd;
}
private static int euclid(int a, int b)
{
while (b != 0)
{
int h = a % b;
a = b;
b = h;
}
return a;
}
#endregion
This is a standard knapsack problem; its Wikipedia page explains the concept.
I think your problem can be split into two parts.
Do knapsack for the denominations.
Use f[i] to represent the last denomination used to construct amount i; f[i] == -1 means that amount i cannot be constructed.
fill f with -1
f[0] = 0
for i from 1 to target_amount + min(denoms) - 1
for j from 0 to denoms.size() - 1
if i >= denoms[j] and f[i - denoms[j]] != -1
{
f[i] = denoms[j]
break
}
Find nearest amount based on roundUp.
roundUp == true
Starting from target_amount, scan upward until you find an f[i] that is not -1.
roundUp == false
Starting from target_amount, scan downward until you find an f[i] that is not -1.
Optional: find which denominations construct your target amount
Backtrack your f[target_amount].
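The two steps above can be sketched in C# as follows. This is only a minimal illustration, not the answerer's code: the class and method names are made up, it assumes a non-negative amount and strictly positive denominations, and it stores a plain reachable/unreachable flag instead of the last denomination.

```csharp
using System;
using System.Linq;

static class NearestAmountSketch
{
    // Returns the constructible amount nearest to `amount` in the chosen direction.
    // Assumption: amount >= 0 and every denomination is > 0.
    public static int RoundAmount(int amount, int[] denoms, bool roundUp)
    {
        // Some multiple of the smallest denomination always lies in [amount, limit].
        int limit = amount + denoms.Min();
        var reachable = new bool[limit + 1];
        reachable[0] = true; // the empty sum

        for (int i = 1; i <= limit; i++)
            foreach (int d in denoms)
                if (i >= d && reachable[i - d]) { reachable[i] = true; break; }

        // Walk from the target in the rounding direction.
        // Rounding down stops at 0 at the latest, because reachable[0] is true.
        int step = roundUp ? 1 : -1;
        int idx = amount;
        while (!reachable[idx]) idx += step;
        return idx;
    }
}
```

For example, with denominations 7 and 13 and amount 29 this returns 33 when rounding up and 28 when rounding down.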
Just fill an array of length amount + smallest denomination + 1 with the possible combinations of coins (a standard dynamic programming problem).
Then walk this array from the index amount in the rounding direction.
Delphi example
var
A : array of Integer;
Denoms: array of Integer;
coin, amount, idx, i, Maxx: Integer;
roundUp: Boolean;
s: string;
begin
amount := 29;
SetLength(Denoms, 2);
Denoms[0] := 7;
Denoms[1] := 13;
Maxx := amount + MinIntValue(Denoms);
SetLength(A, Maxx + 1);
A[0] := 1;
for coin in Denoms do begin
for i := 0 to Maxx - coin do
if A[i] <> 0 then
A[i + coin] := coin;
end;
roundUp := True;
idx := amount;
i := 2 * Ord(roundUp) - 1;// 1 for roundUp=true, -1 for false
while A[idx] = 0 do //scan for nonzero entry
idx := idx + i;
s := '';
while idx > 0 do begin //roll back to get components of this sum
s := s + Format('%d ', [A[idx]]);
idx := idx - A[idx];
end;
Memo1.Lines.Add(s);
This outputs the combination 13 13 7 for roundUp := True, and 7 7 7 7 otherwise.
(The code does not look for an "optimal" solution.)
Example for coins 3 and 5:
[0, 0, 0, 3, 0, 5, 3, 0, 5, 3, 5]
To find which coins make up cell 8, step down by the cell value: by 5, then by 3.
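For readers following along in C#, here is a rough port of the same table idea. The class and method names are made up for illustration, and positive coin values are assumed. As in the Delphi code, cell 0 holds a sentinel value of 1 to mark the empty sum.

```csharp
using System;
using System.Collections.Generic;

static class CoinTableSketch
{
    // table[i] is the last coin used to reach sum i, or 0 if i is unreachable.
    // table[0] is a sentinel (1) marking the empty sum.
    public static int[] BuildTable(int max, int[] coins)
    {
        var table = new int[max + 1];
        table[0] = 1;
        foreach (int coin in coins)
            for (int i = 0; i + coin <= max; i++)
                if (table[i] != 0) table[i + coin] = coin;
        return table;
    }

    // Steps down from idx by the stored coin values to list the coins used.
    public static List<int> Components(int[] table, int idx)
    {
        if (table[idx] == 0) throw new ArgumentException("Sum is not reachable.");
        var parts = new List<int>();
        while (idx > 0)
        {
            parts.Add(table[idx]);
            idx -= table[idx];
        }
        return parts;
    }
}
```

Calling BuildTable(10, new[] { 3, 5 }) reproduces the cells shown above (apart from the sentinel in cell 0), and Components(table, 8) yields 5 followed by 3.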
The Coin Problem is a well-researched topic and I would like to reference some papers where you can probably find better solutions:
The Money Changing Problem Revisited
Coin Problem
Also, using C# (a statically typed language) will restrict you from having the most efficient algorithm over a dynamically typed language. If you plan to go down that route, you can have a look at the website The Frobenius problem. You can right-click and inspect the code (though I didn't understand much of it, having no experience with JavaScript).
Anyhow, this is how I would tackle the problem in C#:
private static List<int> _denominations = new List<int>() { 1000, 5000 };
private static int _denominationMin = _denominations[0];
static void Main()
{
bool roundDown = false;
Console.WriteLine("Enter number: ");
int input = Convert.ToInt32(Console.ReadLine());
if(roundDown)
{
for(int i = input; i >= _denominationMin; i--) // >= so the smallest denomination itself is also tried
{
if(Check(0,0,i))
{
Console.WriteLine("Number: {0}", i);
break;
}
}
}
else
{
for (int i = input; i < int.MaxValue; i++)
{
if (Check(0, 0, i))
{
Console.WriteLine("Number: {0}", i);
break;
}
}
}
Console.Read();
}
static bool Check(int highest, int sum, int goal)
{
//Bingo!
if (sum == goal)
{
return true;
}
//Oops! exceeded here
if (sum > goal)
{
return false;
}
// Loop through _denominations.
foreach (int value in _denominations)
{
// Add higher or equal amounts.
if (value >= highest)
{
if(Check(value, sum + value, goal))
{
return true;
}
}
}
return false;
}
It worked well with {4, 6} for input 19999, so I don't think it is all that bad. There is certainly room for improvement so that it does not run into a StackOverflowException. One could halve or quarter the input, or subtract a number whose factors include some of the denominations. It is also important that the denominations are sorted and that no entry is a multiple of another, e.g. {4, 6, 8} -> {4, 6}.
Anyhow, if I have time I will try to make it more efficient. I just wanted to provide an alternative solution.
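The cleanup of the denominations mentioned above (sorted, with no entry a multiple of another) could look roughly like this. The helper name is hypothetical and not part of the code above; non-positive entries are simply skipped.

```csharp
using System;
using System.Collections.Generic;

static class DenominationSketch
{
    // Returns the denominations sorted ascending, dropping every entry that is
    // a multiple of a smaller kept entry (this also drops duplicates).
    public static List<int> RemoveMultiples(IEnumerable<int> denoms)
    {
        var sorted = new List<int>(denoms);
        sorted.Sort();

        var kept = new List<int>();
        foreach (int d in sorted)
        {
            if (d <= 0) continue; // ignore invalid entries
            bool redundant = false;
            foreach (int k in kept)
                if (d % k == 0) { redundant = true; break; }
            if (!redundant) kept.Add(d);
        }
        return kept;
    }
}
```

For example, RemoveMultiples(new[] { 8, 4, 6 }) gives {4, 6}, since anything payable with 8 is also payable with 4.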

Argument out of range exception thrown

I found this function in a question about percentile calculation and copied it into my project, but it throws an out-of-range exception at
else
{
int k = (int)n;
double d = n - k;
return sequence[k - 1] + d * (sequence[k] - sequence[k - 1]);//EXCEPTION
}
What could be the source of the problem, and how do I solve it?
Function:
public double Percentile(double[] sequence, double excelPercentile)
{
Array.Sort(sequence);
int N = sequence.Length;
double n = (N - 1) * excelPercentile + 1;
// Another method: double n = (N + 1) * excelPercentile;
if (n == 1d) return sequence[0];
else if (n == N) return sequence[N - 1];
else
{
int k = (int)n;
double d = n - k;
return sequence[k - 1] + d * (sequence[k] - sequence[k - 1]);
}
}
The issue is that k ends up at or beyond the number of items in the array, so sequence[k] is out of range.
As was mentioned, the function is designed to work with values of excelPercentile between 0 and 1. Restricting the input should correct the problem.
public double Percentile(double[] sequence, double excelPercentile)
{
//if(excelPercentile > 1)
//excelPercentile = 1;
//else if(excelPercentile < 0)
//excelPercentile = 0;
//Depending on how you validate the input you can assume that it's a whole number percentage. Then you only need to check for the number to be between 0 and 100
if(excelPercentile > 100)
excelPercentile = 100;
else if(excelPercentile < 0)
excelPercentile = 0;
excelPercentile /= 100;
Array.Sort(sequence);
int N = sequence.Length;
double n = (N - 1) * excelPercentile + 1;
// Another method: double n = (N + 1) * excelPercentile;
if (n == 1d) return sequence[0];
else if (n == N) return sequence[N - 1];
else
{
int k = (int)n;
double d = n - k;
return sequence[k - 1] + d * (sequence[k] - sequence[k - 1]);
}
}
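To see why an unscaled percentage breaks the original function, it helps to compute the index k on its own. The small helper below is purely illustrative (its name is made up); it repeats only the index arithmetic from Percentile.

```csharp
using System;

static class PercentileIndexSketch
{
    // Returns the value of k that Percentile computes for an array of the
    // given length, before it reads sequence[k - 1] and sequence[k].
    public static int LowerIndex(int length, double excelPercentile)
    {
        double n = (length - 1) * excelPercentile + 1;
        return (int)n;
    }
}
```

For an array of 5 items, LowerIndex(5, 0.9) is 4, so sequence[3] and sequence[4] are read and both exist. LowerIndex(5, 90) is 361, far outside the array, which is what triggers the exception.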

Rabin-Karp string matching algorithm by rolling hash

Here is an implementation of the Rabin-Karp string matching algorithm in C#:
static void Main(string[] args)
{
string A = "String that contains a pattern.";
string B = "pattern";
ulong siga = 0;
ulong sigb = 0;
ulong Q = 100007;
ulong D = 256;
for (int i = 0; i < B.Length; i++)
{
siga = (siga * D + (ulong)A[i]) % Q;
sigb = (sigb * D + (ulong)B[i]) % Q;
}
if (siga == sigb)
{
Console.WriteLine(string.Format(">>{0}<<{1}", A.Substring(0, B.Length), A.Substring(B.Length)));
return;
}
ulong pow = 1;
for (int k = 1; k <= B.Length - 1; k++)
pow = (pow * D) % Q;
for (int j = 1; j <= A.Length - B.Length; j++)
{
siga = (siga + Q - pow * (ulong)A[j - 1] % Q) % Q;
siga = (siga * D + (ulong)A[j + B.Length - 1]) % Q;
if (siga == sigb)
{
if (A.Substring(j, B.Length) == B)
{
Console.WriteLine(string.Format("{0}>>{1}<<{2}", A.Substring(0, j),
A.Substring(j, B.Length),
A.Substring(j + B.Length)));
return;
}
}
}
Console.WriteLine("Not copied!");
}
However, it has one problem: if I change the second string, the result is "Not copied!". With
string A = "String that contains a pattern.";
string B = "pattern";
it finds the match, but here it shows "Not copied!":
string A = "String that contains a pattern.";
string B = "Matches contains a pattern ";
I want to check whether B was copied from the first string even if something has been added to it or the position of the words has changed; that should not make a difference. How can I change the code so that it compares the hashes of each word in the strings instead?
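Rabin-Karp only finds exact substrings, so B has to occur in A character for character. "Matches " does not appear in A, and neither does the trailing space (in A, "pattern" is followed by "."). Change
string B = "Matches contains a pattern ";
to
string B = "contains a pattern";
and it will work.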
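To try different values of B quickly, the search from the question can be wrapped in a function that returns the match position. This is a sketch that reuses the question's constants Q and D; the class and method names are made up, and it reports -1 when nothing is found.

```csharp
using System;

static class RabinKarpSketch
{
    const ulong Q = 100007; // modulus, as in the question
    const ulong D = 256;    // base, as in the question

    // Returns the index of the first exact occurrence of pattern in text, or -1.
    public static int IndexOf(string text, string pattern)
    {
        int m = pattern.Length;
        if (m == 0) return 0;
        if (m > text.Length) return -1;

        ulong sigText = 0, sigPattern = 0, pow = 1;
        for (int i = 0; i < m; i++)
        {
            sigText = (sigText * D + text[i]) % Q;
            sigPattern = (sigPattern * D + pattern[i]) % Q;
            if (i > 0) pow = (pow * D) % Q; // ends up as D^(m-1) mod Q
        }

        for (int j = 0; ; j++)
        {
            // Equal hashes can be a collision, so confirm with a real comparison.
            if (sigText == sigPattern &&
                string.CompareOrdinal(text, j, pattern, 0, m) == 0)
                return j;
            if (j + m >= text.Length) return -1; // that was the last window

            // Roll the window: drop text[j], append text[j + m].
            sigText = (sigText + Q - pow * text[j] % Q) % Q;
            sigText = (sigText * D + text[j + m]) % Q;
        }
    }
}
```

With A = "String that contains a pattern.", searching for "pattern" or "contains a pattern" returns the position of the match, while "Matches contains a pattern " returns -1 because that exact text never occurs in A.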
