Fuzzy matching multiple words in string

Fuzzy matching multiple words in string - c#

I'm trying to employ the help of the Levenshtein Distance to find fuzzy keywords(static text) on an OCR page.
To do this, I want to give a percentage of errors that are allowed (say, 15%).
string Keyword = "past due electric service";
Since the keyword is 25 characters long, I want to allow for 4 errors (25 * .15 rounded up)
I need to be able to compare it to...
string Entire_OCR_Page = "previous bill amount payment received on 12/26/13 thank
you! current electric service total balances unpaid 7
days after the total due date are subject to a late
charge of 7.5% of the amount due or $2.00, whichever/5
greater. "
This is how I am doing it now...
int LevenshteinDistance = LevenshteinAlgorithm(Keyword, Entire_OCR_Page); // = 202
int NumberOfErrorsAllowed = 4;
int Allowance = (Entire_OCR_Page.Length() - Keyword.Length()) + NumberOfErrorsAllowed; // = 205
Clearly, Keyword is not found in OCR_Text (which it shouldn't be). But, using Levenshtein's Distance, the number of errors is less than the 15% leeway (therefore my logic says it's found).
Does anyone know of a better way to do this?

Answered My Question with the use of sub-strings. Posting in case others run into the same type of problem. A little unorthodox, but it works great for me.
int TextLengthBuffer = (int)StaticTextLength - 1; //start looking for correct result with one less character than it should have.
int LowestLevenshteinNumber = 999999; //initialize insanely high maximum
decimal PossibleStringLength = (PossibleString.Length); //Length of string to search
decimal StaticTextLength = (StaticText.Length); //Length of text to search for
decimal NumberOfErrorsAllowed = Math.Round((StaticTextLength * (ErrorAllowance / 100)), MidpointRounding.AwayFromZero); //Find number of errors allowed with given ErrorAllowance percentage
//Look for best match with 1 less character than it should have, then the correct amount of characters.
//And last, with 1 more character. (This is because one letter can be recognized as
//two (W -> VV) and visa versa)
for (int i = 0; i < 3; i++)
{
for (int e = TextLengthBuffer; e <= (int)PossibleStringLength; e++)
{
string possibleResult = (PossibleString.Substring((e - TextLengthBuffer), TextLengthBuffer));
int lAllowance = (int)(Math.Round((possibleResult.Length - StaticTextLength) + (NumberOfErrorsAllowed), MidpointRounding.AwayFromZero));
int lNumber = LevenshteinAlgorithm(StaticText, possibleResult);
if (lNumber <= lAllowance && ((lNumber < LowestLevenshteinNumber) || (TextLengthBuffer == StaticText.Length && lNumber <= LowestLevenshteinNumber)))
{
PossibleResult = (new StaticTextResult { text = possibleResult, errors = lNumber });
LowestLevenshteinNumber = lNumber;
}
}
TextLengthBuffer++;
}
public static int LevenshteinAlgorithm(string s, string t) // Levenshtein Algorithm
{
int n = s.Length;
int m = t.Length;
int[,] d = new int[n + 1, m + 1];
if (n == 0)
{
return m;
}
if (m == 0)
{
return n;
}
for (int i = 0; i <= n; d[i, 0] = i++)
{
}
for (int j = 0; j <= m; d[0, j] = j++)
{
}
for (int i = 1; i <= n; i++)
{
for (int j = 1; j <= m; j++)
{
int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;
d[i, j] = Math.Min(
Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
d[i - 1, j - 1] + cost);
}
}
return d[n, m];
}

I think it's not working because a large chunk of your string is matched. So what I'd do, is try splitting your Keyword into separate words.
Then find all places where those words are matched in your OCR_TEXT.
Then look at all those places where they matched and see if any 4 of those places are consecutive and match the original phrase.
Am unsure if my explanation is clear?

Related

How can I extract the places values of a number?

I'm working on a math game in the Unity game engine using C#, specifically a reusable component to teach the grid method for multiplication. For example, when given the numbers 34 and 13, it should generate a 3X3 grid (a header column and row for the multiplier and multiplicand place values and 2X2 for the number of places in the multiplier and multiplicand). Something that looks like this:
My issue is that I don't know the best way to extract the place values of the numbers (eg 34 -> 30 and 4). I was thinking of just converting it to a string, adding 0s to the higher place values based on its index, and converting it back to an int, but this seems like a bad solution. Is there a better way of doing this?
Note: I'll pretty much only be dealing with positive whole numbers, but the number of place values might vary.
Thanks to all who answered! Thought it might be helpful to post my Unity-specific solution that I constructed with all the replies:
List<int> GetPlaceValues(int num) {
List<int> placeValues = new List<int>();
while (num > 0) {
placeValues.Add(num % 10);
num /= 10;
}
for(int i = 0;i<placeValues.Count;i++) {
placeValues[i] *= (int)Mathf.Pow(10, i);
}
placeValues.Reverse();
return placeValues;
}

Take advantage of the way our number system works. Here's a basic example:
string test = "12034";
for (int i = 0; i < test.Length; ++i) {
int digit = test[test.Length - i - 1] - '0';
digit *= (int)Math.Pow(10, i);
Console.WriteLine("digit = " + digit);
}
Basically, it reads from the rightmost digit (assuming the input is an integer), and uses the convenient place value of the way our system works to calculate the meaning of the digit.
test.Length - i - 1 treats the rightmost as 0, and indexes positive to the left of there.
- '0' converts from the encoding value for '0' to an actual digit.
Play with the code

Perhaps you want something like this (ideone):
int n = 76302;
int mul = 1;
int cnt = 0;
int res[10];
while(n) {
res[cnt++] = (n % 10) * mul;
mul*=10;
cout << res[cnt-1] << " ";
n = n / 10;
}
output
2 0 300 6000 70000

My answer is incredibly crude, and could likely be improved by someone with better maths skills:
void Main()
{
GetMulGrid(34, 13).Dump();
}
int[,] GetMulGrid(int x, int y)
{
int[] GetPlaceValues(int num)
{
var numDigits = (int)Math.Floor(Math.Log10(num) + 1);
var digits = num.ToString().ToCharArray().Select(ch => Convert.ToInt32(ch.ToString())).ToArray();
var multiplied =
digits
.Select((d, i) =>
{
if (i != (numDigits - 1) && d == 0) d = 1;
return d * (int)Math.Pow(10, (numDigits - i) - 1);
})
.ToArray();
return multiplied;
}
var xComponents = GetPlaceValues(x);
var yComponents = GetPlaceValues(y);
var arr = new int[xComponents.Length + 1, yComponents.Length + 1];
for(var row = 0; row < yComponents.Length; row++)
{
for(var col = 0; col < xComponents.Length; col++)
{
arr[row + 1,col + 1] = xComponents[col] * yComponents[row];
if (row == 0)
{
arr[0, col + 1] = xComponents[col];
}
if (col == 0)
{
arr[row + 1, 0] = yComponents[row];
}
}
}
return arr;
}
For your example of 34 x 13 it produces:
And for 304 x 132 it produces:
It spits this out as an array, so how you consume and display the results will be up to you.

For two-digit numbers you can use modulo
int n = 34;
int x = n % 10; // 4
int y = n - x; // 30

Calculate Nth Pi Digit

I am trying to calculate the nth digit of Pi without using Math.Pi which can be specified as a parameter.
I modified an existing algorithm, since I like to find the Nth digit without using string conversions or default classes.
This is how my algorithm currently looks like:
static int CalculatePi(int pos)
{
List<int> result = new List<int>();
int digits = 102;
int[] x = new int[digits * 3 + 2];
int[] r = new int[digits * 3 + 2];
for (int j = 0; j < x.Length; j++)
x[j] = 20;
for (int i = 0; i < digits; i++)
{
int carry = 0;
for (int j = 0; j < x.Length; j++)
{
int num = (int)(x.Length - j - 1);
int dem = num * 2 + 1;
x[j] += carry;
int q = x[j] / dem;
r[j] = x[j] % dem;
carry = q * num;
}
if (i < digits - 1)
result.Add((int)(x[x.Length - 1] / 10));
r[x.Length - 1] = x[x.Length - 1] % 10; ;
for (int j = 0; j < x.Length; j++)
x[j] = r[j] * 10;
}
return result[pos];
}
So far its working till digit 32, and then, an error occurs.
When I try to print the digits like so:
static void Main(string[] args)
{
for (int i = 0; i < 100; i++)
{
Console.WriteLine("{0} digit of Pi is : {1}", i, CalculatePi(i));
}
Console.ReadKey();
}
This I get 10 for the 32rd digit and the 85rd digit and some others as well, which is obviously incorrect.
The original digits from 27 look like so:
...3279502884.....
but I get
...32794102884....
Whats wrong with the algorithm, how can I fix this problem?
And can the algorithm still be tweaked to improve the speed?

So far it works right up until the cursor reaches digit 32. Upon which, an exception is thrown.
Rules are as follows:
Digit 31 is incorrect, because it should be 5 not a 4.
Digit 32 should be a 0.
When you get a 10 digit result you need to carry 1 over to the previous digit to change the 10 to a 0.
The code changes below will work up to ~ digit 361 when 362 = 10.
Once the program enters the 900's then there are a lot of wrong numbers.
Inside your loop you can do this by keeping track of the previous digit, only adding it to the list after the succeeding digit has been computed.
Overflows need to be handled as they occur, as follows:
int prev = 0;
for (int i = 0; i < digits; i++)
{
int carry = 0;
for (int j = 0; j < x.Length; j++)
{
int num = (int)(x.Length - j - 1);
int dem = num * 2 + 1;
x[j] += carry;
int q = x[j] / dem;
r[j] = x[j] % dem;
carry = q * num;
}
// calculate the digit, but don't add to the list right away:
int digit = (int)(x[x.Length - 1] / 10);
// handle overflow:
if(digit >= 10)
{
digit -= 10;
prev++;
}
if (i > 0)
result.Add(prev);
// Store the digit for next time, when it will be the prev value:
prev = digit;
r[x.Length - 1] = x[x.Length - 1] % 10;
for (int j = 0; j < x.Length; j++)
x[j] = r[j] * 10;
}
Since the digits are being updated sequentially one-by-one, one whole iteration later than previously, the if (i < digits - 1) check can be removed.
However, you need to add a new one to replace it: if (i > 0), because you don't have a valid prev value on the first pass through the loop.
The happy coincidence of computing only the first 100 digits means the above will work.
However, what do you suppose will happen when a 10 digit result follows a 9 digit one? Not good news I'm afraid, because the 1 with need carrying over to the 9 (the previous value), which will make it 10.
A more robust solution is to finish your calculation, then do a loop over your list going backwards, carrying any 10s you encounter over to the previous digits, and propagating any carries.
Consider the following:
for (int pos = digits - 2; pos >= 1; pos--)
{
if(result[pos] >= 10)
{
result[pos] -= 10;
result[pos - 1] += 1;
}
}

fuzzy matching word on OCR page

I have a static phrase the I am searching an OCR'd image for.
string KeywordToFind = "Account Number"
string OcrPageText = "
GEORGIA
POWER
A SOUTHERN COMPANY
AecountNumber
122- 493
Pagel of2
Please Pay By
Jan 29,2014
Total Due
39.11
"
How can I find the word "AecountNumber" using my keyword "Account Number"?
I have tried using variations of the Levenshtein Distance Algorithm HERE with varied success. I've also tried regexes, but the OCR often converts the text differently, thus rendering the regex useless.
Suggestions? I can provide more code if the link doesn't give enough information. Also, Thanks!

Why not try something mostly arbitrary, like this -- while it would certainly match a lot more than just account number, the chances of the start and end characters existing elsewhere in that order is pretty slim.
A.?c.?.?nt ?N.?[mn]b.?r
http://regex101.com/r/zV1yM2
It'll match things like:
Account Number
AccntNumbr
Aecnt Nunber

Answered My Question with the use of sub-strings. Posting in case others run into the same type of problem. A little unorthodox, but it works great for me.
int TextLengthBuffer = (int)StaticTextLength - 1; //start looking for correct result with one less character than it should have.
int LowestLevenshteinNumber = 999999; //initialize insanely high maximum
decimal PossibleStringLength = (PossibleString.Length); //Length of string to search
decimal StaticTextLength = (StaticText.Length); //Length of text to search for
decimal NumberOfErrorsAllowed = Math.Round((StaticTextLength * (ErrorAllowance / 100)), MidpointRounding.AwayFromZero); //Find number of errors allowed with given ErrorAllowance percentage
//Look for best match with 1 less character than it should have, then the correct amount of characters.
//And last, with 1 more character. (This is because one letter can be recognized as
//two (W -> VV) and visa versa)
for (int i = 0; i < 3; i++)
{
for (int e = TextLengthBuffer; e <= (int)PossibleStringLength; e++)
{
string possibleResult = (PossibleString.Substring((e - TextLengthBuffer), TextLengthBuffer));
int lAllowance = (int)(Math.Round((possibleResult.Length - StaticTextLength) + (NumberOfErrorsAllowed), MidpointRounding.AwayFromZero));
int lNumber = LevenshteinAlgorithm(StaticText, possibleResult);
if (lNumber <= lAllowance && ((lNumber < LowestLevenshteinNumber) || (TextLengthBuffer == StaticText.Length && lNumber <= LowestLevenshteinNumber)))
{
PossibleResult = (new StaticTextResult { text = possibleResult, errors = lNumber });
LowestLevenshteinNumber = lNumber;
}
}
TextLengthBuffer++;
}
public static int LevenshteinAlgorithm(string s, string t) // Levenshtein Algorithm
{
int n = s.Length;
int m = t.Length;
int[,] d = new int[n + 1, m + 1];
if (n == 0)
{
return m;
}
if (m == 0)
{
return n;
}
for (int i = 0; i <= n; d[i, 0] = i++)
{
}
for (int j = 0; j <= m; d[0, j] = j++)
{
}
for (int i = 1; i <= n; i++)
{
for (int j = 1; j <= m; j++)
{
int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;
d[i, j] = Math.Min(
Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
d[i - 1, j - 1] + cost);
}
}
return d[n, m];
}

Regex or string compare with allowance of error

I'm trying to do a string compare in C# with some allowance for error. For example, if my search term is "Welcome", but if my comparison string (generated through OCR) is "We1come" and my error allowance is 20%, that should match. That part isn't so difficult using something like the Levenshtein algorithm. The hard part is making it work within a larger block of text, like a regular expression. For example, maybe my OCR result is "Hello. My name is Ben. We1come to my StackOverflow question.", I would like to pick out that We1come as a good result compared to my search term.

Took quite a while, but this works well. Fun problem :)
string PossibleString = PossibleString.ToString().ToLower();
string StaticText = StaticText.ToLower();
decimal PossibleStringLength = (PossibleString.Length);
decimal StaticTextLength = (StaticText.Length);
decimal NumberOfErrorsAllowed = Math.Round((StaticTextLength * (ErrorAllowance / 100)), MidpointRounding.AwayFromZero);
int LevenshteinDistance = LevenshteinAlgorithm(StaticText, PossibleString);
string PossibleResult = string.Empty;
if (LevenshteinDistance == PossibleStringLength - StaticTextLength)
{
// Perfect match. no need to calculate.
PossibleResult = StaticText;
}
else
{
int TextLengthBuffer = (int)StaticTextLength - 1;
int LowestLevenshteinNumber = 999999;
for (int i = 0; i < 3; i++) // Check for best results with same amount of characters as expected, as well as +/- 1
{
for (int e = TextLengthBuffer; e <= (int)PossibleStringLength; e++)
{
string possibleResult = (PossibleString.Substring((e - TextLengthBuffer), TextLengthBuffer));
int lAllowance = (int)(Math.Round((possibleResult.Length - StaticTextLength) + (NumberOfErrorsAllowed), MidpointRounding.AwayFromZero));
int lNumber = LevenshteinAlgorithm(StaticText, possibleResult);
if (lNumber <= lAllowance && ((lNumber < LowestLevenshteinNumber) || (TextLengthBuffer == StaticText.Length && lNumber <= LowestLevenshteinNumber)))
{
PossibleResult = possibleResult;
LowestLevenshteinNumber = lNumber;
}
}
TextLengthBuffer++;
}
}
public static int LevenshteinAlgorithm(string s, string t)
{
int n = s.Length;
int m = t.Length;
int[,] d = new int[n + 1, m + 1];
if (n == 0)
{
return m;
}
if (m == 0)
{
return n;
}
for (int i = 0; i <= n; d[i, 0] = i++)
{
}
for (int j = 0; j <= m; d[0, j] = j++)
{
}
for (int i = 1; i <= n; i++)
{
for (int j = 1; j <= m; j++)
{
int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;
d[i, j] = Math.Min(
Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
d[i - 1, j - 1] + cost);
}
}
return d[n, m];
}

If it is somehow predictable how the OCR can miss letters, I would replace the letters in the search with misses.
If the search is Welcome, the regex would be (?i)We[l1]come.

Damerau–Levenshtein distance algorithm, disable counting of delete

How can i disable counting of deletion, in this implementation of Damerau-Levenshtein distance algorithm, or if there is other algorithm already implemented please point me to it.
Example(disabled deletion counting):
string1: how are you?
string2: how oyu?
distance: 1 (for transposition, 4 deletes doesn't count)
And here is the algorithm:
public static int DamerauLevenshteinDistance(string string1, string string2, int threshold)
{
// Return trivial case - where they are equal
if (string1.Equals(string2))
return 0;
// Return trivial case - where one is empty
if (String.IsNullOrEmpty(string1) || String.IsNullOrEmpty(string2))
return (string1 ?? "").Length + (string2 ?? "").Length;
// Ensure string2 (inner cycle) is longer_transpositionRow
if (string1.Length > string2.Length)
{
var tmp = string1;
string1 = string2;
string2 = tmp;
}
// Return trivial case - where string1 is contained within string2
if (string2.Contains(string1))
return string2.Length - string1.Length;
var length1 = string1.Length;
var length2 = string2.Length;
var d = new int[length1 + 1, length2 + 1];
for (var i = 0; i <= d.GetUpperBound(0); i++)
d[i, 0] = i;
for (var i = 0; i <= d.GetUpperBound(1); i++)
d[0, i] = i;
for (var i = 1; i <= d.GetUpperBound(0); i++)
{
var im1 = i - 1;
var im2 = i - 2;
var minDistance = threshold;
for (var j = 1; j <= d.GetUpperBound(1); j++)
{
var jm1 = j - 1;
var jm2 = j - 2;
var cost = string1[im1] == string2[jm1] ? 0 : 1;
var del = d[im1, j] + 1;
var ins = d[i, jm1] + 1;
var sub = d[im1, jm1] + cost;
//Math.Min is slower than native code
//d[i, j] = Math.Min(del, Math.Min(ins, sub));
d[i, j] = del <= ins && del <= sub ? del : ins <= sub ? ins : sub;
if (i > 1 && j > 1 && string1[im1] == string2[jm2] && string1[im2] == string2[jm1])
d[i, j] = Math.Min(d[i, j], d[im2, jm2] + cost);
if (d[i, j] < minDistance)
minDistance = d[i, j];
}
if (minDistance > threshold)
return int.MaxValue;
}
return d[d.GetUpperBound(0), d.GetUpperBound(1)] > threshold
? int.MaxValue
: d[d.GetUpperBound(0), d.GetUpperBound(1)];
}

public static int DamerauLevenshteinDistance( string string1
, string string2
, int threshold)
{
// Return trivial case - where they are equal
if (string1.Equals(string2))
return 0;
// Return trivial case - where one is empty
// WRONG FOR YOUR NEEDS:
// if (String.IsNullOrEmpty(string1) || String.IsNullOrEmpty(string2))
// return (string1 ?? "").Length + (string2 ?? "").Length;
//DO IT THIS WAY:
if (String.IsNullOrEmpty(string1))
// First string is empty, so every character of
// String2 has been inserted:
return (string2 ?? "").Length;
if (String.IsNullOrEmpty(string2))
// Second string is empty, so every character of string1
// has been deleted, but you dont count deletions:
return 0;
// DO NOT SWAP THE STRINGS IF YOU WANT TO DEAL WITH INSERTIONS
// IN A DIFFERENT MANNER THEN WITH DELETIONS:
// THE FOLLOWING IS WRONG FOR YOUR NEEDS:
// // Ensure string2 (inner cycle) is longer_transpositionRow
// if (string1.Length > string2.Length)
// {
// var tmp = string1;
// string1 = string2;
// string2 = tmp;
// }
// Return trivial case - where string1 is contained within string2
if (string2.Contains(string1))
//all changes are insertions
return string2.Length - string1.Length;
// REVERSE CASE: STRING2 IS CONTAINED WITHIN STRING1
if (string1.Contains(string2))
//all changes are deletions which you don't count:
return 0;
var length1 = string1.Length;
var length2 = string2.Length;
// PAY ATTENTION TO THIS CHANGE!
// length1+1 rows is way too much! You need only 3 rows (0, 1 and 2)
// read my explanation below the code!
// TOO MUCH ROWS: var d = new int[length1 + 1, length2 + 1];
var d = new int[2, length2 + 1];
// THIS INITIALIZATION COUNTS DELETIONS. YOU DONT WANT IT
// or (var i = 0; i <= d.GetUpperBound(0); i++)
// d[i, 0] = i;
// But you must initiate the first element of each row with 0:
for (var i = 0; i <= 2; i++)
d[i, 0] = 0;
// This initialization counts insertions. You need it, but for
// better consistency of code I call the variable j (not i):
for (var j = 0; j <= d.GetUpperBound(1); j++)
d[0, j] = j;
// Now do the job:
// for (var i = 1; i <= d.GetUpperBound(0); i++)
for (var i = 1; i <= length1; i++)
{
//Here in this for-loop: add "%3" to evey term
// that is used as first index of d!
var im1 = i - 1;
var im2 = i - 2;
var minDistance = threshold;
for (var j = 1; j <= d.GetUpperBound(1); j++)
{
var jm1 = j - 1;
var jm2 = j - 2;
var cost = string1[im1] == string2[jm1] ? 0 : 1;
// DON'T COUNT DELETIONS! var del = d[im1, j] + 1;
var ins = d[i % 3, jm1] + 1;
var sub = d[im1 % 3, jm1] + cost;
// Math.Min is slower than native code
// d[i, j] = Math.Min(del, Math.Min(ins, sub));
// DEL DOES NOT EXIST
// d[i, j] = del <= ins && del <= sub ? del : ins <= sub ? ins : sub;
d[i % 3, j] = ins <= sub ? ins : sub;
if (i > 1 && j > 1 && string1[im1] == string2[jm2] && string1[im2] == string2[jm1])
d[i % 3, j] = Math.Min(d[i % 3, j], d[im2 % 3, jm2] + cost);
if (d[i % 3, j] < minDistance)
minDistance = d[i % 3, j];
}
if (minDistance > threshold)
return int.MaxValue;
}
return d[length1 % 3, d.GetUpperBound(1)] > threshold
? int.MaxValue
: d[length1 % 3, d.GetUpperBound(1)];
}
here comes my explanation why you need only 3 rows:
Look at this line:
var d = new int[length1 + 1, length2 + 1];
If one string has the length n and the other has the length m, then your code needs a space of (n+1)*(m+1) integers. Each Integer needs 4 Byte. This is waste of memory if your strings are long. If both strings are 35.000 byte long, you will need more than 4 GB of memory!
In this code you calculate and write a new value for d[i,j]. And to do this, you read values from its upper neighbor (d[i,jm1]), from its left neighbor (d[im1,j]), from its upper-left neighbor (d[im1,jm1]) and finally from its double-upper-double-left neighbour (d[im2,jm2]). So you just need values from your actual row and 2 rows before.
You never need values from any other row. So why do you want to store them? Three rows are enough, and my changes make shure, that you can work with this 3 rows without reading any wrong value at any time.

I would advise not rewriting this specific algorithm to handle specific cases of "free" edits. Many of them radically simplify the concept of the problem to the point where the metric will not convey any useful information.
For example, when substitution is free the distance between all strings is the difference between their lengths. Simply transmute the smaller string into the prefix of the larger string and add the needed letters. (You can guarantee that there is no smaller distance because one insertion is required for each character of edit distance.)
When transposition is free the question reduces to determining the sum of differences of letter counts. (Since the distance between all anagrams is 0, sorting the letters in each string and exchanging out or removing the non-common elements of the larger string is the best strategy. The mathematical argument is similar to that of the previous example.)
In the case when insertion and deletion are free the edit distance between any two strings is zero. If only insertion OR deletion is free this breaks the symmetry of the distance metric - with free deletions, the distance from a to aa is 1, while the distance from aa to a is 1. Depending on the application this could possibly be desirable; but I'm not sure if it's something you're interested in. You will need to greatly alter the presented algorithm because it makes the mentioned assumption of one string always being longer than the other.

Try to change var del = d[im1, j] + 1; to var del = d[im1, j];, I think that solves your problem.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Fuzzy matching multiple words in string - c#

Related

How can I extract the places values of a number?

Calculate Nth Pi Digit

fuzzy matching word on OCR page

Regex or string compare with allowance of error

Damerau–Levenshtein distance algorithm, disable counting of delete

Categories

Resources