C# check given word to answer word

C# check given word to answer word - c#

I'm making a simple spell checker for C#, I want to try and compare a given answer to the randomly chosen word.
I want to compare the first letter to all the letters given in the answer so I can tell if it's correct, it's there was a swap of letters, a deletion or an added letter. I would ultimately like to be able to tell if only one letter was wrong to see a substitution was used.
For example, correct answer hello:
checking first letter ~H
h e l l o
1 0 0 0 0
h h e l l o
1 1 0 0 0 0
e l l o
0 0 0 0
Then go through it for the second letter.
I've absolutely no idea when it comes to C#.
I've tried
int CheckErrors(string Answer, string Guess)
{
if (Answer == Guess)
{
return 1;
}
if (Answer == null || Guess == null)
{
return -1;
}
for (int i = 0; i <= Answer.Length; i++)
{
if (Guess[i] != Answer[i])
{
count++;
//substitution = -4
//deletion = -5
//transposition = -6
//insertion = -7
}
return count;
}
return -9;
}
but I just can't get any further.
UPDATE:
with further research I guess what I was trying to do is something like:
STRING answer = answer_given
STRING input = correct_answer
int check = 0;
FOR (int i = 0; i < input.Length; i++)
{
FOR (int ii = 0; ii < answer.Length; ii++)
{
if (input[i] == answer[i])
{
int check++;
}
}
}
Obviously I know this would keep adding check up, but I can't guess what how else to do it.
ANOTHER UPDATE!
I can use this-
int CheckErrors(string Answer, string Guess)
{
int[,] d = new int[Answer.Length + 1, Guess.Length + 1];
for (int i = 0; i <= Answer.Length; i++)
d[i, 0] = i;
for (int j = 0; j <= Guess.Length; j++)
d[0, j] = j;
for (int j = 1; j <= Guess.Length; j++)
for (int i = 1; i <= Answer.Length; i++)
if (Answer[i - 1] == Guess[j - 1])
d[i, j] = d[i - 1, j - 1]; //no operation
else
d[i, j] = Math.Min(Math.Min(
d[i - 1, j] + 1, //a deletion
d[i, j - 1] + 1), //an insertion
d[i - 1, j - 1] + 1 //a substitution
);
return d[Answer.Length, Guess.Length];
}
But I need a way to do a count for the amount of times each error is used?

Several issues with your function:
You're trying to use a single return value to handle multiple scenarios in which the meaning of that value is not consistent. It's not advisable for a function to be able to return both a state (match, one or both values null, no match, etc) and a counter.
If you're going to use numbers to represent return states, use enum to make it more readable.
Your for loop always terminates after one iteration because it hits the return statement every time. Your return statement needs to be moved after the for loop.
if (Guess[i] != Answer[i]) will throw an exception if i is greater than the length of Guess.
It's not clear what count is supposed to represent, and it's not defined in the function.
You need to better define what exactly is your function supposed to do. If answer is "Hello" and guess is "Hhello", what are you returning? The number of letters that don't match (1)? A code that represents what the error was (Insertion)? Where the error is located? If you need more than one of those things, then you need a separate function.

You may get inspiration by looking at Approximate String Matching in Wikipedia and StackOverflow.

Have you considered trying string.Compare(...)?
http://msdn.microsoft.com/en-us/library/zkcaxw5y.aspx

Related

Non exact match in XML

I've got problem. I have to equal value from XML with string, which is typed in textBox. What I have to do is make program more "inteligent" which means, if I type "kraków" instead of "Kraków", program should find the location anyway.
Sample of code:
public static IEnumerable<XElement> GetRowsWithColumn(IEnumerable<XElement> rows, String name, String value)
{
return rows
.Where(row => row.Elements("col")
.Any(col =>
col.Attributes("name").Any(attr => attr.Value.Equals(name))
&& col.Value.Equals(value)));
}
If I type "Kraków" then I get good response from XML, but when I type "kraków" there's no match. What should I do?
And if I can ask one more question, how can I prompts such as google have? If you type "progr" google shows you "programming" for example.

You could compare your values while you use
.ToUpper()
for your strings.
To get these prompts as google have, you could need regular expression.
For more see here:
Learning Regular Expressions

just make a function which compares the strings. you can use any criteria you want
...
col.Attributes("name").Any(attr => AreEquivelant(attr.Value, name))
...
private static bool AreEquivelant(string s1, string s2)
{
//compare the strings however you want
}

You are going to find a distance. A distance is the difference between two words. You can use Levenshtein for this one.
From Wikipedia :
In information theory and computer science, the Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one word into the other.
A basic usecase :
static void Main(string[] args)
{
Console.WriteLine(Levenshtein.FindDistance("Alois", "aloisdg"));
Console.WriteLine(Levenshtein.FindDistance("Alois", "aloisdg", true));
Console.ReadLine();
}
Output
3
2
Lower the value, better is the match.
For your example, you could use it and if the match is lower than something (like 2) you got a valid match.
I made one here :
Code :
public static int FindDistance(string s1, string s2, bool forceLowerCase = false)
{
if (String.IsNullOrEmpty(s1) || s1.Length == 0)
return String.IsNullOrEmpty(s2) ? s2.Length : 0;
if (String.IsNullOrEmpty(s2) || s2.Length == 0)
return String.IsNullOrEmpty(s1) ? s1.Length : 0;
// not in Levenshtein but I need it.
if (forceLowerCase)
{
s1 = s1.ToLowerInvariant();
s2 = s2.ToLowerInvariant();
}
int s1Len = s1.Length;
int s2Len = s2.Length;
int[,] d = new int[s1Len + 1, s2Len + 1];
for (int i = 0; i <= s1Len; i++)
d[i, 0] = i;
for (int j = 0; j <= s2Len; j++)
d[0, j] = j;
for (int i = 1; i <= s1Len; i++)
{
for (int j = 1; j <= s2Len; j++)
{
int cost = Convert.ToInt32(s1[i - 1] != s2[j - 1]);
int min = Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1);
d[i, j] = Math.Min(min, d[i - 1, j - 1] + cost);
}
}
return d[s1Len, s2Len];
}

fuzzy matching word on OCR page

I have a static phrase the I am searching an OCR'd image for.
string KeywordToFind = "Account Number"
string OcrPageText = "
GEORGIA
POWER
A SOUTHERN COMPANY
AecountNumber
122- 493
Pagel of2
Please Pay By
Jan 29,2014
Total Due
39.11
"
How can I find the word "AecountNumber" using my keyword "Account Number"?
I have tried using variations of the Levenshtein Distance Algorithm HERE with varied success. I've also tried regexes, but the OCR often converts the text differently, thus rendering the regex useless.
Suggestions? I can provide more code if the link doesn't give enough information. Also, Thanks!

Why not try something mostly arbitrary, like this -- while it would certainly match a lot more than just account number, the chances of the start and end characters existing elsewhere in that order is pretty slim.
A.?c.?.?nt ?N.?[mn]b.?r
http://regex101.com/r/zV1yM2
It'll match things like:
Account Number
AccntNumbr
Aecnt Nunber

Answered My Question with the use of sub-strings. Posting in case others run into the same type of problem. A little unorthodox, but it works great for me.
int TextLengthBuffer = (int)StaticTextLength - 1; //start looking for correct result with one less character than it should have.
int LowestLevenshteinNumber = 999999; //initialize insanely high maximum
decimal PossibleStringLength = (PossibleString.Length); //Length of string to search
decimal StaticTextLength = (StaticText.Length); //Length of text to search for
decimal NumberOfErrorsAllowed = Math.Round((StaticTextLength * (ErrorAllowance / 100)), MidpointRounding.AwayFromZero); //Find number of errors allowed with given ErrorAllowance percentage
//Look for best match with 1 less character than it should have, then the correct amount of characters.
//And last, with 1 more character. (This is because one letter can be recognized as
//two (W -> VV) and visa versa)
for (int i = 0; i < 3; i++)
{
for (int e = TextLengthBuffer; e <= (int)PossibleStringLength; e++)
{
string possibleResult = (PossibleString.Substring((e - TextLengthBuffer), TextLengthBuffer));
int lAllowance = (int)(Math.Round((possibleResult.Length - StaticTextLength) + (NumberOfErrorsAllowed), MidpointRounding.AwayFromZero));
int lNumber = LevenshteinAlgorithm(StaticText, possibleResult);
if (lNumber <= lAllowance && ((lNumber < LowestLevenshteinNumber) || (TextLengthBuffer == StaticText.Length && lNumber <= LowestLevenshteinNumber)))
{
PossibleResult = (new StaticTextResult { text = possibleResult, errors = lNumber });
LowestLevenshteinNumber = lNumber;
}
}
TextLengthBuffer++;
}
public static int LevenshteinAlgorithm(string s, string t) // Levenshtein Algorithm
{
int n = s.Length;
int m = t.Length;
int[,] d = new int[n + 1, m + 1];
if (n == 0)
{
return m;
}
if (m == 0)
{
return n;
}
for (int i = 0; i <= n; d[i, 0] = i++)
{
}
for (int j = 0; j <= m; d[0, j] = j++)
{
}
for (int i = 1; i <= n; i++)
{
for (int j = 1; j <= m; j++)
{
int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;
d[i, j] = Math.Min(
Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
d[i - 1, j - 1] + cost);
}
}
return d[n, m];
}

Fuzzy matching multiple words in string

I'm trying to employ the help of the Levenshtein Distance to find fuzzy keywords(static text) on an OCR page.
To do this, I want to give a percentage of errors that are allowed (say, 15%).
string Keyword = "past due electric service";
Since the keyword is 25 characters long, I want to allow for 4 errors (25 * .15 rounded up)
I need to be able to compare it to...
string Entire_OCR_Page = "previous bill amount payment received on 12/26/13 thank
you! current electric service total balances unpaid 7
days after the total due date are subject to a late
charge of 7.5% of the amount due or $2.00, whichever/5
greater. "
This is how I am doing it now...
int LevenshteinDistance = LevenshteinAlgorithm(Keyword, Entire_OCR_Page); // = 202
int NumberOfErrorsAllowed = 4;
int Allowance = (Entire_OCR_Page.Length() - Keyword.Length()) + NumberOfErrorsAllowed; // = 205
Clearly, Keyword is not found in OCR_Text (which it shouldn't be). But, using Levenshtein's Distance, the number of errors is less than the 15% leeway (therefore my logic says it's found).
Does anyone know of a better way to do this?

Answered My Question with the use of sub-strings. Posting in case others run into the same type of problem. A little unorthodox, but it works great for me.
int TextLengthBuffer = (int)StaticTextLength - 1; //start looking for correct result with one less character than it should have.
int LowestLevenshteinNumber = 999999; //initialize insanely high maximum
decimal PossibleStringLength = (PossibleString.Length); //Length of string to search
decimal StaticTextLength = (StaticText.Length); //Length of text to search for
decimal NumberOfErrorsAllowed = Math.Round((StaticTextLength * (ErrorAllowance / 100)), MidpointRounding.AwayFromZero); //Find number of errors allowed with given ErrorAllowance percentage
//Look for best match with 1 less character than it should have, then the correct amount of characters.
//And last, with 1 more character. (This is because one letter can be recognized as
//two (W -> VV) and visa versa)
for (int i = 0; i < 3; i++)
{
for (int e = TextLengthBuffer; e <= (int)PossibleStringLength; e++)
{
string possibleResult = (PossibleString.Substring((e - TextLengthBuffer), TextLengthBuffer));
int lAllowance = (int)(Math.Round((possibleResult.Length - StaticTextLength) + (NumberOfErrorsAllowed), MidpointRounding.AwayFromZero));
int lNumber = LevenshteinAlgorithm(StaticText, possibleResult);
if (lNumber <= lAllowance && ((lNumber < LowestLevenshteinNumber) || (TextLengthBuffer == StaticText.Length && lNumber <= LowestLevenshteinNumber)))
{
PossibleResult = (new StaticTextResult { text = possibleResult, errors = lNumber });
LowestLevenshteinNumber = lNumber;
}
}
TextLengthBuffer++;
}
public static int LevenshteinAlgorithm(string s, string t) // Levenshtein Algorithm
{
int n = s.Length;
int m = t.Length;
int[,] d = new int[n + 1, m + 1];
if (n == 0)
{
return m;
}
if (m == 0)
{
return n;
}
for (int i = 0; i <= n; d[i, 0] = i++)
{
}
for (int j = 0; j <= m; d[0, j] = j++)
{
}
for (int i = 1; i <= n; i++)
{
for (int j = 1; j <= m; j++)
{
int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;
d[i, j] = Math.Min(
Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
d[i - 1, j - 1] + cost);
}
}
return d[n, m];
}

I think it's not working because a large chunk of your string is matched. So what I'd do, is try splitting your Keyword into separate words.
Then find all places where those words are matched in your OCR_TEXT.
Then look at all those places where they matched and see if any 4 of those places are consecutive and match the original phrase.
Am unsure if my explanation is clear?

Checkers Board Assistance

I'm just wondering if there is a simpler way of doing this:
for (int i = 0; i < 1; i++)
{
for (int j = 0; i < 8; j+2)
{
board[ i, j ] = 2;
board[( i + 1 ), j ] = 2;
board[( i + 2 ), j ] = 2;
}
}
What I'm trying to do is place checkers pieces on the actual checkers board. So this is to place the black pieces on the top.
P.S. If you could also give me some help on the bottom set of pieces(white).

Apart from fixing the loop you could otherwise explicitly place the pieces, makes it more readable
int[,] board = new[,]{{1,0,1,0,1,0,1,0},
{0,1,0,1,0,1,0,1},
{1,0,1,0,1,0,1,0},
{0,0,0,0,0,0,0,0},
{0,0,0,0,0,0,0,0},
{0,1,0,1,0,1,0,1},
{1,0,1,0,1,0,1,0},
{0,1,0,1,0,1,0,1}};

Modulo will do the trick but i think that TonyS's anwser was my first reaction and I would prefer that insteed the one shown below.
char[,] board = new char[8,8];
private void InitializeBoard()
{
string BoardRepresentation = "";
//for every board cell
for (int i = 0; i < 8; i++)
{
for (int j = 0; j < 8; j++)
{
//initialize board cell
board[i, j] = '0';
if (j <= 2)//black top
{
//Modulo is the trick
if ((j - i) == 0 || ((j - i) % 2) == 0)
{
board[i, j] = 'B';
}
}
else if (j >= 5) //white bot
{
if ((j - i) == 0 || ((j - i) % 2) == 0)
{
board[i, j] = 'W';
}
}
}
}
for (int j = 0; j < 8; j++)
{
for (int i = 0; i < 8; i++)
{
BoardRepresentation += board[i, j] + " ";
}
BoardRepresentation += Environment.NewLine;
}
}

You missed to tell us what exactly is it that you want to achieve. I see several big errors in your code, so I'll assume that you don't have full grasp of how for loop works. I hope it's not too presumptuous that I'm trying to explain it here
For loop
For loop is used to execute same portion of code several times. How many times it will be executed depends on conditions you set. Most often, you will see it in this format:
for (int i = 0; i < n; i++)
{
// Some code
}
This for loop executed code within the brackets ({ and }) n times. This is not only way to define a loop. More thorough definition of a loop is following:
for (<initialization>; <condition>; <afterthought>)
Initialization - You can some variables needed for looping. This is executed once before code within the loop is executed first time. This is optional, and you can leave it empty and use variable declared before in condition.
Condition - This is executed before each execution of code within the loop. If condition expression evaluates to true, loop is executed. Once loop is executed and afterthought is executed, condition is evaluated again and again until it evaluates to false. Condition is also optional. If you leave it out, in C# loop will be executed again until you break the loop in a different way.
Afterthought - This is executed each time code within the loop is finished executing. This is usually used to increment a variable on which loop depends. This is also optional.
Fixing your code
I assume you wanted to mark fields in a 8x8 matrix like in a checkerboard, although this is not stated in your question. You could do it this way:
// For each row in a board (start with 0, go until 7, increase by 1)
for (int i = 0; i < 8; i++)
{
// start coloring the row. Determine which field within the row needs
// to be black. In first row, first field is black, in second second
// field is black, in third row first field is black again and so on.
// So, even rows have black field in first blace, odd rows have black
// on second place.
// We calculate this by determining division remained when dividing by 2.
int firstBlack = i % 2;
// Starting with calculated field, and until last field color fields black.
// Skip every second field. (start with firstBlack, go until 7, increase by 2)
for (int j = firstBlack; j < 8; j += 2)
{
// Color the field black (set to 2)
board[i][j] = 2;
}
}
You can see my comments inline.
Big errors in your code
// This loop would be executed only once. It goes from 0 to less than 1 and is increased
// after first execution. You might as well done it without the loop.
for (int i = 0; i < 1; i++)
{
// This doesn't make sense, because you use i in condition, and initialize
// and update j.
// Also, you are trying to update j, but you are not doing so. You are not
// assigning anything to you. You should do j+=2 to increase by two. Or you
// can do j = j + 2
for (int j = 0; i < 8; j+2)
{
// I didn't really understand what you were trying to achieve here
board[ i, j ] = 2;
board[( i + 1 ), j ] = 2;
board[( i + 2 ), j ] = 2;
}
}

Damerau–Levenshtein distance algorithm, disable counting of delete

How can i disable counting of deletion, in this implementation of Damerau-Levenshtein distance algorithm, or if there is other algorithm already implemented please point me to it.
Example(disabled deletion counting):
string1: how are you?
string2: how oyu?
distance: 1 (for transposition, 4 deletes doesn't count)
And here is the algorithm:
public static int DamerauLevenshteinDistance(string string1, string string2, int threshold)
{
// Return trivial case - where they are equal
if (string1.Equals(string2))
return 0;
// Return trivial case - where one is empty
if (String.IsNullOrEmpty(string1) || String.IsNullOrEmpty(string2))
return (string1 ?? "").Length + (string2 ?? "").Length;
// Ensure string2 (inner cycle) is longer_transpositionRow
if (string1.Length > string2.Length)
{
var tmp = string1;
string1 = string2;
string2 = tmp;
}
// Return trivial case - where string1 is contained within string2
if (string2.Contains(string1))
return string2.Length - string1.Length;
var length1 = string1.Length;
var length2 = string2.Length;
var d = new int[length1 + 1, length2 + 1];
for (var i = 0; i <= d.GetUpperBound(0); i++)
d[i, 0] = i;
for (var i = 0; i <= d.GetUpperBound(1); i++)
d[0, i] = i;
for (var i = 1; i <= d.GetUpperBound(0); i++)
{
var im1 = i - 1;
var im2 = i - 2;
var minDistance = threshold;
for (var j = 1; j <= d.GetUpperBound(1); j++)
{
var jm1 = j - 1;
var jm2 = j - 2;
var cost = string1[im1] == string2[jm1] ? 0 : 1;
var del = d[im1, j] + 1;
var ins = d[i, jm1] + 1;
var sub = d[im1, jm1] + cost;
//Math.Min is slower than native code
//d[i, j] = Math.Min(del, Math.Min(ins, sub));
d[i, j] = del <= ins && del <= sub ? del : ins <= sub ? ins : sub;
if (i > 1 && j > 1 && string1[im1] == string2[jm2] && string1[im2] == string2[jm1])
d[i, j] = Math.Min(d[i, j], d[im2, jm2] + cost);
if (d[i, j] < minDistance)
minDistance = d[i, j];
}
if (minDistance > threshold)
return int.MaxValue;
}
return d[d.GetUpperBound(0), d.GetUpperBound(1)] > threshold
? int.MaxValue
: d[d.GetUpperBound(0), d.GetUpperBound(1)];
}

public static int DamerauLevenshteinDistance( string string1
, string string2
, int threshold)
{
// Return trivial case - where they are equal
if (string1.Equals(string2))
return 0;
// Return trivial case - where one is empty
// WRONG FOR YOUR NEEDS:
// if (String.IsNullOrEmpty(string1) || String.IsNullOrEmpty(string2))
// return (string1 ?? "").Length + (string2 ?? "").Length;
//DO IT THIS WAY:
if (String.IsNullOrEmpty(string1))
// First string is empty, so every character of
// String2 has been inserted:
return (string2 ?? "").Length;
if (String.IsNullOrEmpty(string2))
// Second string is empty, so every character of string1
// has been deleted, but you dont count deletions:
return 0;
// DO NOT SWAP THE STRINGS IF YOU WANT TO DEAL WITH INSERTIONS
// IN A DIFFERENT MANNER THEN WITH DELETIONS:
// THE FOLLOWING IS WRONG FOR YOUR NEEDS:
// // Ensure string2 (inner cycle) is longer_transpositionRow
// if (string1.Length > string2.Length)
// {
// var tmp = string1;
// string1 = string2;
// string2 = tmp;
// }
// Return trivial case - where string1 is contained within string2
if (string2.Contains(string1))
//all changes are insertions
return string2.Length - string1.Length;
// REVERSE CASE: STRING2 IS CONTAINED WITHIN STRING1
if (string1.Contains(string2))
//all changes are deletions which you don't count:
return 0;
var length1 = string1.Length;
var length2 = string2.Length;
// PAY ATTENTION TO THIS CHANGE!
// length1+1 rows is way too much! You need only 3 rows (0, 1 and 2)
// read my explanation below the code!
// TOO MUCH ROWS: var d = new int[length1 + 1, length2 + 1];
var d = new int[2, length2 + 1];
// THIS INITIALIZATION COUNTS DELETIONS. YOU DONT WANT IT
// or (var i = 0; i <= d.GetUpperBound(0); i++)
// d[i, 0] = i;
// But you must initiate the first element of each row with 0:
for (var i = 0; i <= 2; i++)
d[i, 0] = 0;
// This initialization counts insertions. You need it, but for
// better consistency of code I call the variable j (not i):
for (var j = 0; j <= d.GetUpperBound(1); j++)
d[0, j] = j;
// Now do the job:
// for (var i = 1; i <= d.GetUpperBound(0); i++)
for (var i = 1; i <= length1; i++)
{
//Here in this for-loop: add "%3" to evey term
// that is used as first index of d!
var im1 = i - 1;
var im2 = i - 2;
var minDistance = threshold;
for (var j = 1; j <= d.GetUpperBound(1); j++)
{
var jm1 = j - 1;
var jm2 = j - 2;
var cost = string1[im1] == string2[jm1] ? 0 : 1;
// DON'T COUNT DELETIONS! var del = d[im1, j] + 1;
var ins = d[i % 3, jm1] + 1;
var sub = d[im1 % 3, jm1] + cost;
// Math.Min is slower than native code
// d[i, j] = Math.Min(del, Math.Min(ins, sub));
// DEL DOES NOT EXIST
// d[i, j] = del <= ins && del <= sub ? del : ins <= sub ? ins : sub;
d[i % 3, j] = ins <= sub ? ins : sub;
if (i > 1 && j > 1 && string1[im1] == string2[jm2] && string1[im2] == string2[jm1])
d[i % 3, j] = Math.Min(d[i % 3, j], d[im2 % 3, jm2] + cost);
if (d[i % 3, j] < minDistance)
minDistance = d[i % 3, j];
}
if (minDistance > threshold)
return int.MaxValue;
}
return d[length1 % 3, d.GetUpperBound(1)] > threshold
? int.MaxValue
: d[length1 % 3, d.GetUpperBound(1)];
}
here comes my explanation why you need only 3 rows:
Look at this line:
var d = new int[length1 + 1, length2 + 1];
If one string has the length n and the other has the length m, then your code needs a space of (n+1)*(m+1) integers. Each Integer needs 4 Byte. This is waste of memory if your strings are long. If both strings are 35.000 byte long, you will need more than 4 GB of memory!
In this code you calculate and write a new value for d[i,j]. And to do this, you read values from its upper neighbor (d[i,jm1]), from its left neighbor (d[im1,j]), from its upper-left neighbor (d[im1,jm1]) and finally from its double-upper-double-left neighbour (d[im2,jm2]). So you just need values from your actual row and 2 rows before.
You never need values from any other row. So why do you want to store them? Three rows are enough, and my changes make shure, that you can work with this 3 rows without reading any wrong value at any time.

I would advise not rewriting this specific algorithm to handle specific cases of "free" edits. Many of them radically simplify the concept of the problem to the point where the metric will not convey any useful information.
For example, when substitution is free the distance between all strings is the difference between their lengths. Simply transmute the smaller string into the prefix of the larger string and add the needed letters. (You can guarantee that there is no smaller distance because one insertion is required for each character of edit distance.)
When transposition is free the question reduces to determining the sum of differences of letter counts. (Since the distance between all anagrams is 0, sorting the letters in each string and exchanging out or removing the non-common elements of the larger string is the best strategy. The mathematical argument is similar to that of the previous example.)
In the case when insertion and deletion are free the edit distance between any two strings is zero. If only insertion OR deletion is free this breaks the symmetry of the distance metric - with free deletions, the distance from a to aa is 1, while the distance from aa to a is 1. Depending on the application this could possibly be desirable; but I'm not sure if it's something you're interested in. You will need to greatly alter the presented algorithm because it makes the mentioned assumption of one string always being longer than the other.

Try to change var del = d[im1, j] + 1; to var del = d[im1, j];, I think that solves your problem.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

C# check given word to answer word - c#

You may get inspiration by looking at Approximate String Matching in Wikipedia and StackOverflow.

Have you considered trying string.Compare(...)? http://msdn.microsoft.com/en-us/library/zkcaxw5y.aspx

Related

Non exact match in XML

fuzzy matching word on OCR page

Fuzzy matching multiple words in string

Checkers Board Assistance

Damerau–Levenshtein distance algorithm, disable counting of delete

Categories

Resources