Generic Binary Search in C#

Below is my generic binary search. It works fine with an integer array (it finds all the elements in it), but the problem arises when I use a string array to search for string data. It works for the first-index and last-index elements, but I can't find the middle elements.
string[] Stringarray = new string[] { "b", "a", "ab", "abc", "c" };
public static void BinarySearch<T>(T[] array, T searchFor, Comparer<T> comparer)
{
    int high, low, mid;
    high = array.Length - 1;
    low = 0;
    if (array[0].Equals(searchFor))
        Console.WriteLine("Value {0} Found At Index {1}", array[0], 0);
    else if (array[high].Equals(searchFor))
        Console.WriteLine("Value {0} Found At Index {1}", array[high], high);
    else
    {
        while (low <= high)
        {
            mid = (high + low) / 2;
            if (comparer.Compare(array[mid], searchFor) == 0)
            {
                Console.WriteLine("Value {0} Found At Index {1}", array[mid], mid);
                break;
            }
            else
            {
                if (comparer.Compare(searchFor, array[mid]) > 0)
                    high = mid + 1;
                else
                    low = mid + 1;
            }
        }
        if (low > high)
        {
            Console.WriteLine("Value Not Found In the Collection");
        }
    }
}

A binary search requires that the input be sorted. How is "b, a, ab, abc, c" sorted? It does not appear to be sorted on any obvious sort key. If you are trying to search unsorted data you should be using a hash set, not a binary search on a list.
Also, your calculation of midpoint is subtly wrong because the addition of high + low can overflow. It then becomes a negative number, which is divided by two.
This is extremely unlikely for realistically-sized arrays but it is entirely possible that you'll want to use this algorithm someday for data types that support indexing with large integers, like a memory-mapped file of sorted data.
The best practice for writing a binary search algorithm is to do (high - low) / 2 + low when calculating the midpoint, because that stays in range the whole time.
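For illustration, a minimal sketch of that calculation (the helper name is just a placeholder):
// (high + low) / 2 can overflow for very large indices; computing the
// offset first keeps every intermediate value inside the valid range.
static int Midpoint(int low, int high)
{
    return low + (high - low) / 2;
}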

These two lines are suspect:
high = mid + 1
low = mid + 1
Hmm. Look at the offsets. This is, of course, the well-documented binary search algorithm described on Wikipedia. You are also doing extra work. Examine the pseudo-code and the examples there closely.

@pst Your advice really worked. :) This code now works for both int and string.
public static int BinarySearch<T>(T[] array, T searchFor, Comparer<T> comparer)
{
    int high, low, mid;
    high = array.Length - 1;
    low = 0;
    if (array[0].Equals(searchFor))
        return 0;
    else if (array[high].Equals(searchFor))
        return high;
    else
    {
        while (low <= high)
        {
            mid = (high + low) / 2;
            if (comparer.Compare(array[mid], searchFor) == 0)
                return mid;
            else if (comparer.Compare(array[mid], searchFor) > 0)
                high = mid - 1;
            else
                low = mid + 1;
        }
        return -1;
    }
}
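For completeness, a small usage sketch of that method (my own example, not part of the original post; it assumes the array has been sorted with the same comparer that is passed in):
// The input must be sorted with the comparer you pass, otherwise the search is meaningless.
string[] words = { "a", "ab", "abc", "b", "c" };
Array.Sort(words, Comparer<string>.Default);
int index = BinarySearch(words, "ab", Comparer<string>.Default);
Console.WriteLine(index == -1 ? "Not found" : "Found at index " + index);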

// Binary search recursive method
public void BinarySearch(int[] input, int key, int start, int end)
{
    int index = -1;
    int mid = (start + end) / 2;
    if (input[start] <= key && key <= input[end])
    {
        if (key < input[mid])
            BinarySearch(input, key, start, mid);
        else if (key > input[mid])
            BinarySearch(input, key, mid + 1, end);
        else if (key == input[mid])
            index = mid;
        if (index != -1)
            Console.WriteLine("got it at " + index);
    }
}

int[] input4 = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
BinarySearch(input4, 1, 0, 8);

Related

Binary Search with recursive algorithm

Another problem here!
I've been coding a binary search with a recursive algorithm.
Now there seems to be some problem when it searches in the upper half of the array. I can't really find what's wrong.
//=====Binary Search=====//
static int BinarySearch(City[] cities, int key, int low, int high)
{
    int mid;
    if (low > high)
    {
        return -1;
    }
    mid = low + high / 2;
    if (key == cities[mid].temp)
    {
        return mid;
    }
    else if (key < cities[mid].temp)
    {
        return BinarySearch(cities, key, low, mid - 1);
    }
    else
    {
        return BinarySearch(cities, key, mid + 1, high);
    }
}
When I search for a number that can't be found it will print: "can't find temperature".
It is doing its work as long as I don't search for a number in the upper half.
Console.WriteLine("\n\tBINÄR SÖKNING\n");
do
{
loop = true;
Console.Write("search temperature:");
str = Console.ReadLine();
try
{
key = Convert.ToInt32(str);
index = BinarySearch(cities, key, low, high);
if (index == -1)
{
Console.WriteLine($"can't find temperature: {key}°C");
}
else
{
Console.WriteLine("");
Console.WriteLine(cities[index].ToString());
loop = false;
}
}
catch
{
Console.WriteLine("Only numbers, please");
}
} while (loop);
If I search for a number in the upper half, the console prints "Only numbers, please". It goes to the catch part, as it should when I search for something that cannot be converted to an int.
Operator precedence bites again.
How is the expression low + high / 2 parsed?
Probably not the way you think. Multiplicative operators have higher precedence than additive operators, so
low + high / 2
gets parsed as
low + (high / 2)
rather than your intended
(low + high) / 2
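So the fix is a pair of parentheses, or the overflow-safe form mentioned in the first answer above:
mid = (low + high) / 2;        // what you intended
mid = low + (high - low) / 2;  // same midpoint, and it also avoids integer overflow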

C# Get Binary Search To Display All Duplicate Values In Array

If I search through my sorted array, it only displays one of the values even if there are multiple of the same value in the array. I don't want it to tell me how many duplicates there are; I want it to display all of the duplicate values in the array that I search for. Or is there a different search I need to use to do this?
So if I have array1{1,2,3,4,4,5,5,5,5,6} and I search for 5 I want it to output:
5
5
5
5
This is my code with the binary search.
public class search
{
    public static void Main(string[] args)
    {
        // Arrays are created here. e.g. array1{1,2,3,4,4,5,5,5,5,6}
        int Input;
        Console.WriteLine("Enter the number you would like to search for.");
        Input = Convert.ToInt32(Console.ReadLine());
        int y = BinarySearch(array1, Input);
        Console.WriteLine("array1 {0} : array2{1} : array3 {2} : array4 {3} : array5 {4}", array1[y], array2[y], array3[y], array4[y], array5[y]);
    }

    public static int BinarySearch(double[] Array, int Search)
    {
        int x = Array.Length;
        int low = 0;
        int high = x - 1;
        while (low <= high)
        {
            while (low <= high)
            {
                int mid = (low + high) / 2;
                if (Search < Array[mid])
                {
                    high = mid - 1;
                }
                else if (Search > Array[mid])
                {
                    low = mid + 1;
                }
                else if (Search == Array[mid])
                {
                    Console.WriteLine("{0}", Search);
                    return mid;
                }
            }
            Console.WriteLine("{0} was not found.", Search);
        }
        return high;
    }
}
Sure, you can use binary search for this. But your code is a bit odd: when you find the first element here, else if (Search == Array[mid]), you immediately return from the function and never call it again. That is why you get only one result.
To make it work, when you find such an element you need to keep searching through indices low ... mid-1 and then through indices mid+1 ... high.
Here is the code, but I strongly advise you not to just copy it; try to rewrite it as a while loop (every recursion can be rewritten as a loop).
static void BinarySearch(int[] array, int low, int high, int searchedValue)
{
    if (low > high)
        return;
    int mid = (low + high) / 2;
    if (searchedValue < array[mid])
    {
        high = mid - 1;
        BinarySearch(array, low, high, searchedValue);
    }
    else if (searchedValue > array[mid])
    {
        low = mid + 1;
        BinarySearch(array, low, high, searchedValue);
    }
    else if (searchedValue == array[mid])
    {
        Console.WriteLine(array[mid]);
        BinarySearch(array, low, mid - 1, searchedValue);
        BinarySearch(array, mid + 1, high, searchedValue);
    }
}
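If you do rewrite it as a loop, one option (a sketch of my own, not the answer's code) is to find a single match with an ordinary binary search and then walk outwards, since equal values are contiguous in a sorted array:
static void PrintAllMatches(int[] array, int searchedValue)
{
    // Ordinary binary search for any one occurrence.
    int low = 0, high = array.Length - 1, found = -1;
    while (low <= high)
    {
        int mid = low + (high - low) / 2;
        if (searchedValue < array[mid]) high = mid - 1;
        else if (searchedValue > array[mid]) low = mid + 1;
        else { found = mid; break; }
    }
    if (found == -1)
    {
        Console.WriteLine("{0} was not found.", searchedValue);
        return;
    }
    // Duplicates are adjacent, so back up to the first occurrence
    // and print forward until the value changes.
    int first = found;
    while (first > 0 && array[first - 1] == searchedValue)
        first--;
    for (int i = first; i < array.Length && array[i] == searchedValue; i++)
        Console.WriteLine(array[i]);
}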

Quick Sort Implementation with large numbers [duplicate]

I learnt about quicksort and how it can be implemented both recursively and iteratively.
In the iterative method:
Push the range (0...n) into the stack
Partition the given array with a pivot
Pop the top element.
Push the partitions (index range) onto a stack if the range has more than one element
Do the above 3 steps, till the stack is empty
And the recursive version is the normal one defined in the wiki.
I learnt that recursive algorithms are always slower than their iterative counterparts.
So, which method is preferred in terms of time complexity (memory is not a concern)?
Which one is fast enough to use in a programming contest?
Does the C++ STL sort() use a recursive approach?
In terms of (asymptotic) time complexity, they are both the same.
"Recursive is slower than iterative": the rationale behind this statement is the overhead of the recursive stack (saving and restoring the environment between calls).
However, these are a constant number of operations per call, and they do not change the number of "iterations".
Both recursive and iterative quicksort are O(n log n) average case and O(n^2) worst case.
EDIT:
Just for the fun of it, I ran a benchmark with the (Java) code attached to this post, and then ran a Wilcoxon statistical test to check what the probability is that the running times are indeed distinct.
The results may be conclusive (P_VALUE = 2.6e-34, https://en.wikipedia.org/wiki/P-value. Remember that the P_VALUE is P(T >= t | H), where T is the test statistic and H is the null hypothesis). But the answer is not what you expected.
The average of the iterative solution was 408.86 ms, while that of the recursive one was 236.81 ms.
(Note: I used Integer and not int as the argument to recursiveQsort(); otherwise the recursive version would have done much better, because it would not have had to box a lot of integers, which is also time consuming. I did it because the iterative solution has no choice but to do so.)
Thus, your assumption is not true: the recursive solution is faster (on my machine and in Java, at the very least) than the iterative one, with P_VALUE = 2.6e-34.
public static void recursiveQsort(int[] arr,Integer start, Integer end) {
if (end - start < 2) return; //stop clause
int p = start + ((end-start)/2);
p = partition(arr,p,start,end);
recursiveQsort(arr, start, p);
recursiveQsort(arr, p+1, end);
}
public static void iterativeQsort(int[] arr) {
Stack<Integer> stack = new Stack<Integer>();
stack.push(0);
stack.push(arr.length);
while (!stack.isEmpty()) {
int end = stack.pop();
int start = stack.pop();
if (end - start < 2) continue;
int p = start + ((end-start)/2);
p = partition(arr,p,start,end);
stack.push(p+1);
stack.push(end);
stack.push(start);
stack.push(p);
}
}
private static int partition(int[] arr, int p, int start, int end) {
int l = start;
int h = end - 2;
int piv = arr[p];
swap(arr,p,end-1);
while (l < h) {
if (arr[l] < piv) {
l++;
} else if (arr[h] >= piv) {
h--;
} else {
swap(arr,l,h);
}
}
int idx = h;
if (arr[h] < piv) idx++;
swap(arr,end-1,idx);
return idx;
}
private static void swap(int[] arr, int i, int j) {
int temp = arr[i];
arr[i] = arr[j];
arr[j] = temp;
}
public static void main(String... args) throws Exception {
Random r = new Random(1);
int SIZE = 1000000;
int N = 100;
int[] arr = new int[SIZE];
int[] millisRecursive = new int[N];
int[] millisIterative = new int[N];
for (int t = 0; t < N; t++) {
for (int i = 0; i < SIZE; i++) {
arr[i] = r.nextInt(SIZE);
}
int[] tempArr = Arrays.copyOf(arr, arr.length);
long start = System.currentTimeMillis();
iterativeQsort(tempArr);
millisIterative[t] = (int)(System.currentTimeMillis()-start);
tempArr = Arrays.copyOf(arr, arr.length);
start = System.currentTimeMillis();
recursiveQsort(tempArr,0,arr.length);
millisRecursive[t] = (int)(System.currentTimeMillis()-start);
}
int sum = 0;
for (int x : millisRecursive) {
System.out.println(x);
sum += x;
}
System.out.println("end of recursive. AVG = " + ((double)sum)/millisRecursive.length);
sum = 0;
for (int x : millisIterative) {
System.out.println(x);
sum += x;
}
System.out.println("end of iterative. AVG = " + ((double)sum)/millisIterative.length);
}
Recursion is NOT always slower than iteration. Quicksort is a perfect example of this. The only way to do it iteratively is to create a stack structure, so you end up doing the same thing the compiler does when you use recursion, and probably you will do it worse than the compiler. Also, there will be more jumps if you don't use recursion (to pop and push values onto the stack).
That's the solution I came up with in JavaScript. I think it works.
const myArr = [33, 103, 3, 726, 200, 984, 198, 764, 9]
document.write('initial order :', JSON.stringify(myArr), '<br><br>')
qs_iter(myArr)
document.write('_Final order :', JSON.stringify(myArr))
function qs_iter(items) {
if (!items || items.length <= 1) {
return items
}
var stack = []
var low = 0
var high = items.length - 1
stack.push([low, high])
while (stack.length) {
var range = stack.pop()
low = range[0]
high = range[1]
if (low < high) {
var pivot = Math.floor((low + high) / 2)
stack.push([low, pivot])
stack.push([pivot + 1, high])
while (low < high) {
while (low < pivot && items[low] <= items[pivot]) low++
while (high > pivot && items[high] > items[pivot]) high--
if (low < high) {
var tmp = items[low]
items[low] = items[high]
items[high] = tmp
}
}
}
}
return items
}
Let me know if you find a mistake :)
UPDATE from Mister Jojo:
this code just shuffles values around; only in rare cases could that happen to produce a sorted array, in other words never.
For those who have any doubt, I put it in a snippet.

How to calculate distance similarity measure of given 2 strings?

I need to calculate the similarity between two strings. So what exactly do I mean? Let me explain with an example:
The real word: hospital
Mistaken word: haspita
Now my aim is to determine how many characters I need to modify in the mistaken word to obtain the real word. In this example, I need to modify 2 letters. So what would be the percentage? I always take the length of the real word, so it becomes 2 / 8 = 25%, and the distance similarity measure (DSM) of these two given strings is therefore 75%.
How can I achieve this with performance being a key consideration?
I just addressed this exact same issue a few weeks ago. Since someone is asking now, I'll share the code. In my exhaustive tests my code is about 10x faster than the C# example on Wikipedia even when no maximum distance is supplied. When a maximum distance is supplied, this performance gain increases to 30x - 100x+. Note a couple of key points for performance:
If you need to compare the same words over and over, first convert the words to arrays of integers. The Damerau-Levenshtein algorithm includes many >, <, == comparisons, and ints compare much faster than chars.
It includes a short-circuiting mechanism to quit if the distance exceeds a provided maximum.
Use a rotating set of three arrays rather than a massive matrix as in all the implementations I've seen elsewhere.
Make sure your arrays slice across the shorter word's width.
Code (it works exactly the same if you replace int[] with string in the parameter declarations):
/// <summary>
/// Computes the Damerau-Levenshtein Distance between two strings, represented as arrays of
/// integers, where each integer represents the code point of a character in the source string.
/// Includes an optional threshold which can be used to indicate the maximum allowable distance.
/// </summary>
/// <param name="source">An array of the code points of the first string</param>
/// <param name="target">An array of the code points of the second string</param>
/// <param name="threshold">Maximum allowable distance</param>
/// <returns>Int.MaxValue if threshold exceeded; otherwise the Damerau-Levenshtein distance between the strings</returns>
public static int DamerauLevenshteinDistance(int[] source, int[] target, int threshold) {
int length1 = source.Length;
int length2 = target.Length;
// Return trivial case - difference in string lengths exceeds threshold
if (Math.Abs(length1 - length2) > threshold) { return int.MaxValue; }
// Ensure arrays [i] / length1 use shorter length
if (length1 > length2) {
Swap(ref target, ref source);
Swap(ref length1, ref length2);
}
int maxi = length1;
int maxj = length2;
int[] dCurrent = new int[maxi + 1];
int[] dMinus1 = new int[maxi + 1];
int[] dMinus2 = new int[maxi + 1];
int[] dSwap;
for (int i = 0; i <= maxi; i++) { dCurrent[i] = i; }
int jm1 = 0, im1 = 0, im2 = -1;
for (int j = 1; j <= maxj; j++) {
// Rotate
dSwap = dMinus2;
dMinus2 = dMinus1;
dMinus1 = dCurrent;
dCurrent = dSwap;
// Initialize
int minDistance = int.MaxValue;
dCurrent[0] = j;
im1 = 0;
im2 = -1;
for (int i = 1; i <= maxi; i++) {
int cost = source[im1] == target[jm1] ? 0 : 1;
int del = dCurrent[im1] + 1;
int ins = dMinus1[i] + 1;
int sub = dMinus1[im1] + cost;
//Fastest execution for min value of 3 integers
int min = (del > ins) ? (ins > sub ? sub : ins) : (del > sub ? sub : del);
if (i > 1 && j > 1 && source[im2] == target[jm1] && source[im1] == target[j - 2])
min = Math.Min(min, dMinus2[im2] + cost);
dCurrent[i] = min;
if (min < minDistance) { minDistance = min; }
im1++;
im2++;
}
jm1++;
if (minDistance > threshold) { return int.MaxValue; }
}
int result = dCurrent[maxi];
return (result > threshold) ? int.MaxValue : result;
}
Where Swap is:
static void Swap<T>(ref T arg1,ref T arg2) {
T temp = arg1;
arg1 = arg2;
arg2 = temp;
}
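A minimal usage sketch (my own, not part of the answer): the ToCodes helper and the threshold value below are illustrative. Casting each char to int gives UTF-16 code units, which matches code points for ordinary (BMP) text.
using System.Linq;

static int[] ToCodes(string s) => s.Select(c => (int)c).ToArray();

// "hospital" vs "haspita": one substitution plus one deletion = distance 2.
int distance = DamerauLevenshteinDistance(ToCodes("hospital"), ToCodes("haspita"), threshold: 3);
Console.WriteLine(distance == int.MaxValue ? "beyond the threshold" : "distance = " + distance);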
What you are looking for is called edit distance or Levenshtein distance. The wikipedia article explains how it is calculated, and has a nice piece of pseudocode at the bottom to help you code this algorithm in C# very easily.
Here's a C# implementation of it:
private static int CalcLevenshteinDistance(string a, string b)
{
    if (String.IsNullOrEmpty(a) && String.IsNullOrEmpty(b)) {
        return 0;
    }
    if (String.IsNullOrEmpty(a)) {
        return b.Length;
    }
    if (String.IsNullOrEmpty(b)) {
        return a.Length;
    }
    int lengthA = a.Length;
    int lengthB = b.Length;
    var distances = new int[lengthA + 1, lengthB + 1];
    for (int i = 0; i <= lengthA; distances[i, 0] = i++);
    for (int j = 0; j <= lengthB; distances[0, j] = j++);
    for (int i = 1; i <= lengthA; i++)
        for (int j = 1; j <= lengthB; j++)
        {
            int cost = b[j - 1] == a[i - 1] ? 0 : 1;
            distances[i, j] = Math.Min
            (
                Math.Min(distances[i - 1, j] + 1, distances[i, j - 1] + 1),
                distances[i - 1, j - 1] + cost
            );
        }
    return distances[lengthA, lengthB];
}
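Tying this back to the question's percentage example, a small sketch using the method above (following the asker's convention of always dividing by the real word's length):
string real = "hospital", mistaken = "haspita";
int distance = CalcLevenshteinDistance(real, mistaken);               // 2 for this pair
double similarity = 100.0 * (1.0 - (double)distance / real.Length);   // 75 for this pair
Console.WriteLine(similarity + "%");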
There are a large number of string similarity distance algorithms that can be used. Some of them (not an exhaustive list) are:
Levenshtein
Needleman-Wunsch
Smith-Waterman
Smith-Waterman-Gotoh
Jaro, Jaro-Winkler
Jaccard Similarity
Euclidean Distance
Dice Similarity
Cosine Similarity
Monge-Elkan
A library that contains implementations of all of these is called SimMetrics,
which has both Java and C# implementations.
I have found that Levenshtein and Jaro-Winkler are great for small differences between strings such as:
spelling mistakes; or
ö instead of o in a person's name.
However when comparing something like article titles where significant chunks of the text would be the same but with "noise" around the edges, Smith-Waterman-Gotoh has been fantastic:
Compare these two titles (which are the same article but worded differently by different sources):
An endonuclease from Escherichia coli that introduces single polynucleotide chain scissions in ultraviolet-irradiated DNA
Endonuclease III: An Endonuclease from Escherichia coli That Introduces Single Polynucleotide Chain Scissions in Ultraviolet-Irradiated DNA
A site that provides an algorithm comparison over these strings shows:
Levenshtein: 81
Smith-Waterman-Gotoh: 94
Jaro-Winkler: 78
Jaro-Winkler and Levenshtein are not as competent as Smith-Waterman-Gotoh at detecting the similarity. If we compare two titles that are not the same article, but have some matching text:
Fat metabolism in higher plants. The function of acyl thioesterases in the metabolism of acyl-coenzymes A and acyl-acyl carrier proteins
Fat metabolism in higher plants. The determination of acyl-acyl carrier protein and acyl coenzyme A in a complex lipid mixture
Jaro-Winkler gives a false positive, but Smith-Waterman-Gotoh does not:
Levenshtein: 54
Smith-Waterman-Gotoh: 49
Jaro-Winkler: 89
As Anastasiosyal pointed out, SimMetrics has the java code for these algorithms. I had success using the SmithWatermanGotoh java code from SimMetrics.
Here is my implementation of Damerau-Levenshtein distance, which returns not only the similarity coefficient but also the error locations in the corrected word (this feature can be used in text editors). My implementation also supports different weights for the errors (substitution, deletion, insertion, transposition).
public static List<Mistake> OptimalStringAlignmentDistance(
string word, string correctedWord,
bool transposition = true,
int substitutionCost = 1,
int insertionCost = 1,
int deletionCost = 1,
int transpositionCost = 1)
{
int w_length = word.Length;
int cw_length = correctedWord.Length;
var d = new KeyValuePair<int, CharMistakeType>[w_length + 1, cw_length + 1];
var result = new List<Mistake>(Math.Max(w_length, cw_length));
if (w_length == 0)
{
for (int i = 0; i < cw_length; i++)
result.Add(new Mistake(i, CharMistakeType.Insertion));
return result;
}
for (int i = 0; i <= w_length; i++)
d[i, 0] = new KeyValuePair<int, CharMistakeType>(i, CharMistakeType.None);
for (int j = 0; j <= cw_length; j++)
d[0, j] = new KeyValuePair<int, CharMistakeType>(j, CharMistakeType.None);
for (int i = 1; i <= w_length; i++)
{
for (int j = 1; j <= cw_length; j++)
{
bool equal = correctedWord[j - 1] == word[i - 1];
int delCost = d[i - 1, j].Key + deletionCost;
int insCost = d[i, j - 1].Key + insertionCost;
int subCost = d[i - 1, j - 1].Key;
if (!equal)
subCost += substitutionCost;
int transCost = int.MaxValue;
if (transposition && i > 1 && j > 1 && word[i - 1] == correctedWord[j - 2] && word[i - 2] == correctedWord[j - 1])
{
transCost = d[i - 2, j - 2].Key;
if (!equal)
transCost += transpositionCost;
}
int min = delCost;
CharMistakeType mistakeType = CharMistakeType.Deletion;
if (insCost < min)
{
min = insCost;
mistakeType = CharMistakeType.Insertion;
}
if (subCost < min)
{
min = subCost;
mistakeType = equal ? CharMistakeType.None : CharMistakeType.Substitution;
}
if (transCost < min)
{
min = transCost;
mistakeType = CharMistakeType.Transposition;
}
d[i, j] = new KeyValuePair<int, CharMistakeType>(min, mistakeType);
}
}
int w_ind = w_length;
int cw_ind = cw_length;
while (w_ind >= 0 && cw_ind >= 0)
{
switch (d[w_ind, cw_ind].Value)
{
case CharMistakeType.None:
w_ind--;
cw_ind--;
break;
case CharMistakeType.Substitution:
result.Add(new Mistake(cw_ind - 1, CharMistakeType.Substitution));
w_ind--;
cw_ind--;
break;
case CharMistakeType.Deletion:
result.Add(new Mistake(cw_ind, CharMistakeType.Deletion));
w_ind--;
break;
case CharMistakeType.Insertion:
result.Add(new Mistake(cw_ind - 1, CharMistakeType.Insertion));
cw_ind--;
break;
case CharMistakeType.Transposition:
result.Add(new Mistake(cw_ind - 2, CharMistakeType.Transposition));
w_ind -= 2;
cw_ind -= 2;
break;
}
}
if (d[w_length, cw_length].Key > result.Count)
{
int delMistakesCount = d[w_length, cw_length].Key - result.Count;
for (int i = 0; i < delMistakesCount; i++)
result.Add(new Mistake(0, CharMistakeType.Deletion));
}
result.Reverse();
return result;
}
public struct Mistake
{
public int Position;
public CharMistakeType Type;
public Mistake(int position, CharMistakeType type)
{
Position = position;
Type = type;
}
public override string ToString()
{
return Position + ", " + Type;
}
}
public enum CharMistakeType
{
None,
Substitution,
Insertion,
Deletion,
Transposition
}
This code is a part of my project: Yandex-Linguistics.NET.
I wrote some tests and it seems to me that the method is working.
But comments and remarks are welcome.
Here is an alternative approach:
A typical method for finding similarity is Levenshtein distance, and there is no doubt a library with code available.
Unfortunately, this requires comparing against every string. You might be able to write a specialized version of the code that short-circuits the calculation if the distance is greater than some threshold, but you would still have to do all the comparisons.
Another idea is to use some variant of trigrams or n-grams. These are sequences of n characters (or n words, or n genomic sequences, or n whatever). Keep a mapping from trigrams to strings and choose the ones that have the biggest overlap. A typical choice of n is 3, hence the name.
For instance, English would have these trigrams:
Eng
ngl
gli
lis
ish
And England would have:
Eng
ngl
gla
lan
and
Well, 2 out of 7 (or 4 out of 10) match. If this works for you, you can index the trigram/string table and get a faster search (a rough sketch follows below).
You can also combine this with Levenshtein to reduce the set of comparisons to those that have some minimum number of n-grams in common.
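A rough sketch of the trigram-overlap idea in C# (the helper names and the Jaccard-style score are my own choices; other overlap scores work just as well):
using System;
using System.Collections.Generic;
using System.Linq;

static HashSet<string> Trigrams(string s)
{
    var grams = new HashSet<string>();
    for (int i = 0; i + 3 <= s.Length; i++)
        grams.Add(s.Substring(i, 3));   // every 3-character window
    return grams;
}

static double TrigramSimilarity(string a, string b)
{
    var ga = Trigrams(a);
    var gb = Trigrams(b);
    int shared = ga.Count(g => gb.Contains(g));              // trigrams present in both strings
    return (double)shared / (ga.Count + gb.Count - shared);  // shared / union (Jaccard-style)
}

// TrigramSimilarity("English", "England") finds 2 shared trigrams ("Eng", "ngl").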
Here's a VB.net implementation:
Public Shared Function LevenshteinDistance(ByVal v1 As String, ByVal v2 As String) As Integer
Dim cost(v1.Length, v2.Length) As Integer
If v1.Length = 0 Then
Return v2.Length 'if string 1 is empty, the number of edits will be the insertion of all characters in string 2
ElseIf v2.Length = 0 Then
Return v1.Length 'if string 2 is empty, the number of edits will be the insertion of all characters in string 1
Else
'setup the base costs for inserting the correct characters
For v1Count As Integer = 0 To v1.Length
cost(v1Count, 0) = v1Count
Next v1Count
For v2Count As Integer = 0 To v2.Length
cost(0, v2Count) = v2Count
Next v2Count
'now work out the cheapest route to having the correct characters
For v1Count As Integer = 1 To v1.Length
For v2Count As Integer = 1 To v2.Length
'the first min term is the cost of editing the character in place (which will be the cost-to-date or the cost-to-date + 1 (depending on whether a change is required)
'the second min term is the cost of inserting the correct character into string 1 (cost-to-date + 1),
'the third min term is the cost of inserting the correct character into string 2 (cost-to-date + 1) and
cost(v1Count, v2Count) = Math.Min(
cost(v1Count - 1, v2Count - 1) + If(v1.Chars(v1Count - 1) = v2.Chars(v2Count - 1), 0, 1),
Math.Min(
cost(v1Count - 1, v2Count) + 1,
cost(v1Count, v2Count - 1) + 1
)
)
Next v2Count
Next v1Count
'the final result is the cheapest cost to get the two strings to match, which is the bottom right cell in the matrix
'in the event of strings being equal, this will be the result of zipping diagonally down the matrix (which will be square as the strings are the same length)
Return cost(v1.Length, v2.Length)
End If
End Function

algorithm to find the correct set of numbers

I will take either a Python or a C# solution.
I have about 200 numbers:
19.16
98.48
20.65
122.08
26.16
125.83
473.33
125.92
3,981.21
16.81
100.00
43.58
54.19
19.83
3,850.97
20.83
20.83
86.81
37.71
36.33
6,619.42
264.53
...
...
I know that in this set of numbers there is a combination of numbers that will add up to a certain number, let's say 2341.42.
How do I find out which combination of numbers adds up to that?
I am helping someone in accounting track down the correct numbers.
Here's a recursive function in Python that will find ALL solutions of any size with only two arguments (that you need to specify).
def find_all_sum_subsets(target_sum, numbers, offset=0):
    solutions = []
    for i in xrange(offset, len(numbers)):
        value = numbers[i]
        if target_sum == value:
            solutions.append([value])
        elif target_sum > value:
            sub_solutions = find_all_sum_subsets(target_sum - value, numbers, i + 1)
            for sub_solution in sub_solutions:
                solutions.append(sub_solution + [value])
    return solutions
Here it is working:
>>> find_all_sum_subsets(10, [1,2,3,4,5,6,7,8,9,10,11,12])
[[4, 3, 2, 1], [7, 2, 1], [6, 3, 1], [5, 4, 1], [9, 1], [5, 3, 2], [8, 2], [7, 3], [6, 4], [10]]
>>>
You can use backtracking to generate all the possible solutions. This way you can quickly write your solution.
EDIT:
You just implement the algorithm in C#:
public void backtrack(double sum, String solution, ArrayList numbers, int depth, double targetValue, int j)
{
    for (int i = j; i < numbers.Count; i++)
    {
        double potentialSolution = Convert.ToDouble(numbers[i] + "");
        if (sum + potentialSolution > targetValue)
            continue;
        else if (sum + potentialSolution == targetValue)
        {
            if (depth == 0)
            {
                solution = potentialSolution + "";
                /* Store solution */
            }
            else
            {
                solution += "," + potentialSolution;
                /* Store solution */
            }
        }
        else
        {
            if (depth == 0)
            {
                solution = potentialSolution + "";
            }
            else
            {
                solution += "," + potentialSolution;
            }
            backtrack(sum + potentialSolution, solution, numbers, depth + 1, targetValue, i + 1);
        }
    }
}
You will call this function this way:
backtrack (0, "", numbers, 0, 2341.42, 0);
The source code was implemented on the fly to answer your question and was not tested, but essentially you can understand what I mean from this code.
[Begin Edit]:
I misread the original question. I thought that it said that there is some combination of 4 numbers in the list of 200+ numbers that add up to some other number. That is not what was asked, so my answer does not really help much.
[End Edit]
This is pretty clunky, but it should work if all you need is to find the 4 numbers that add up to a certain value (it could find more than 4 tuples):
Just get your 200 numbers into an array (or list or some IEnumerable structure) and then you can use the code that I posted. If you have the numbers on paper, you will have to enter them into the array manually as below. If you have them in softcopy, you can cut and paste them and then add the numbers[x] = xxx code around them. Or, you could cut and paste them into a file and then read the file from disk into an array.
double[] numbers = new double[200];
numbers[0] = 123;
numbers[1] = 456;
//
// and so on.
//
var n0 = numbers;
var n1 = numbers.Skip(1);
var n2 = numbers.Skip(2);
var n3 = numbers.Skip(3);
var x = from a in n0
        from b in n1
        from c in n2
        from d in n3
        where a + b + c + d == 2341.42
        select new { a1 = a, b1 = b, c1 = c, d1 = d };

foreach (var aa in x)
{
    Console.WriteLine("{0}, {1}, {2}, {3}", aa.a1, aa.b1, aa.c1, aa.d1);
}
Try the following approach for finding a combination of any two numbers:
float targetSum = 3;
float[] numbers = new float[] { 1, 2, 3, 4, 5, 6 };
Array.Sort(numbers); // Sort numbers in ascending order.
int startIndex = 0;
int endIndex = numbers.Length - 1;
while (startIndex != endIndex)
{
    float firstNumber = numbers[startIndex];
    float secondNumber = numbers[endIndex];
    float sum = firstNumber + secondNumber;
    if (sum == targetSum)
    {
        // Found a combination.
        break;
    }
    else if (sum < targetSum)
    {
        startIndex++;
    }
    else
    {
        endIndex--;
    }
}
Remember that when using floating-point or decimal numbers, rounding could be an issue.
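A small sketch of what that means in practice (my own example, with an illustrative tolerance): either compare sums with a small epsilon, or keep the amounts in decimal, which represents values like 0.01 exactly.
const double Epsilon = 0.005;                 // half a cent, purely illustrative
bool Matches(double sum, double target) => Math.Abs(sum - target) < Epsilon;

// ...or avoid the problem entirely for currency amounts:
decimal target = 2341.42m;
decimal sum = 2321.59m + 19.83m;              // decimal arithmetic is exact for these values
Console.WriteLine(sum == target);             // True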
This should be implemented as a recursive algorithm. Basically, for any given number, determine if there is a subset of the remaining numbers for which the sum is your desired value.
Iterate across the list of numbers; for each entry, subtract that from your total, and determine if there is a subset of the remaining list which sums up to the new total. If not, try with your original total and the next number in the list (and a smaller sublist, of course).
As to implementation:
You want to define a method which takes a target number, and a list, and which returns a list of numbers which sum up to that target number. That algorithm should iterate through the list; if an element of the list subtracted from the target number is zero, return that element in a list; otherwise, recurse on the method with the remainder of the list, and the new target number. If any recursion returns a non-null result, return that; otherwise, return null.
List<decimal> FindSumSubset(decimal sum, List<decimal> list)
{
    for (int i = 0; i < list.Count; i++)
    {
        decimal value = list[i];
        if (sum - value == 0.0m)
        {
            return new List<decimal> { value };
        }
        else
        {
            var subset = FindSumSubset(sum - value, list.GetRange(i + 1, list.Count - i - 1));
            if (subset != null)
            {
                subset.Add(value);
                return subset;
            }
        }
    }
    return null;
}
Note, however, that the complexity of this is pretty ugly, and for significantly larger sets of numbers it becomes intractable relatively quickly. It should still be doable in less than geologic time for 200 decimals, though.
