I am trying to implement Iterative Deeping Search. I do not know what I am doing wrong, but I don't seem to be getting it right. I always end up with an infinite loop.
Can anyone point out my mistake?
I implemented the Depth-Limited Search and used it in my IDS code. DLS seems to be working fine on its own, but I do not understand IDS and why i'm ending up in an infinite loop.
public class IterativeDeepeningSearch<T> where T : IComparable
{
string closed;
public int maximumDepth;
public int depth = 0;
bool Found = false;
Stack<Vertex<T>> open;
public IterativeDeepeningSearch()
{
open = new Stack<Vertex<T>>();
}
public bool IDS(Vertex<T> startNode, Vertex<T> goalNode)
{
// loops through until a goal node is found
for (int _depth = 0; _depth < Int32.MaxValue; _depth++)
{
bool found = DLS(startNode, goalNode, _depth);
if (found)
{
return true;
}
}
// this will never be reached as it
// loops forever until goal is found
return false;
}
public bool DLS(Vertex<T> startNode, Vertex<T> goalNode, int _maximumDepth)
{
maximumDepth = _maximumDepth;
open.Push(startNode);
while (open.Count > 0 && depth < maximumDepth)
{
Vertex<T> node = open.Pop();
closed = closed + " " + node.Data.ToString();
if (node.Data.ToString() == goalNode.Data.ToString())
{
Debug.Write("Success");
Found = true;
break;
}
List<Vertex<T>> neighbours = node.Neighbors;
depth++;
if (neighbours != null)
{
foreach (Vertex<T> neighbour in neighbours)
{
if (!closed.Contains(neighbour.ToString()))
open.Push(neighbour);
}
Debug.Write("Failure");
}
}
Console.WriteLine(closed);
return Found;
}
}
}
PS: My Vertex Class just has two properties, Data and Children
The for loop iterates on _depth but in the DLS function you are passing depth, which is always 0
for (int _depth = 0; _depth < Int32.MaxValue; _depth++)
{
bool found = DLS(startNode, goalNode, depth);
if (found)
{
return true;
}
}
Related
This question already has answers here:
C# compiler error: "not all code paths return a value"
(9 answers)
Closed 3 years ago.
not all code paths return a value error is thrown while executing it.please help us to resolve it at the earliest.
There seems to be some code path that is not returning any value.
can someone please help to fix it up?
There are many for loops in the code. i am not able to figure which one is causing this issue.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
//namespace ConsoleApp7
//{
class Solution
{
static bool CheckElementSymbol(string elementName, string symbol)
{
symbol = symbol.ToLower();
int symbol_length = symbol.Length;
int numberofchars = 0;
int firstletter = 0;
bool firstcharfound = false;
bool secondcharfound = false;
//bool symbolfound = false;
//int symbolpresent = 0;
int secondcharmatch = 0;
if (symbol_length == 2)
{
foreach (char sym in symbol)
{
numberofchars = numberofchars + 1;
var firstcharmatch = new List<int>();
//int index = 0;
int sourcelength = elementName.Length;
if (numberofchars == 1)
{
for (int index = 0; index < sourcelength; index++)
{
int matchfound1stchar = elementName.IndexOf(sym, index, 1);
if (matchfound1stchar != -1)
{
firstletter = 1;
firstcharmatch.Add(matchfound1stchar + 1);
}
}
if (firstletter == 1)
{
firstcharfound = true;
}
else
{
firstcharfound = false;
}
}
//int matchingchar = elementName
if (numberofchars == 2)
{
secondcharmatch = elementName.LastIndexOf(elementName, sym);
//yield return index;
if (secondcharmatch != -1)
{
secondcharfound = true;
secondcharmatch = secondcharmatch + 1;
}
else
{ secondcharfound = false; }
}
//int matchingchar = elementName
if (secondcharfound == true && firstcharfound == true)
{
foreach (int value in firstcharmatch)
{
if (secondcharmatch > value)
{
//symbolfound = true;
//return symbolfound;
return true;
}
return false;
}
}
else
{
return false;
}
}
}
else
{
return false;
}
}
static void Main(string[] args)
{
TextWriter textWriter = new StreamWriter(#System.Environment.GetEnvironmentVariable("OUTPUT_PATH"), true);
string elementName = Console.ReadLine();
string symbol = Console.ReadLine();
bool res = CheckElementSymbol(elementName, symbol);
textWriter.WriteLine((res ? 1 : 0));
textWriter.Flush();
textWriter.Close();
}
}
//}
Found the problem. If the string has no characters then it should return false
if (symbol_length == 2)
{
foreach (char sym in symbol)(...)//this code is irrelevant.
return false; //here is the solution, if there are no characters in the string, then return false .
}
else
{
return false;
}
Next time make your code more easily read and only show the relevant parts.
First of all, please post only relevant and minimal code in your question in order to get quick response.
For your query, you need to understand that the compiler error you are getting is
error CS0161: 'Solution.CheckElementSymbol(string, string)': not all
code paths return a value
This error is because all your return statements are inside either IF or ELSE statements.
The error would get fixed if you add a return statement at the end of CheckElementSymbol method.
return false;
Hope this helps.
I'm working on a hacker rank problem and I believe my solution is correct. However, most of my test cases are being timed out. Could some one suggest how to improve the efficiency of my code?
The important part of the code starts at the "Trie Class".
The exact question can be found here: https://www.hackerrank.com/challenges/contacts
using System;
using System.Collections.Generic;
using System.IO;
class Solution
{
static void Main(String[] args)
{
int N = Int32.Parse(Console.ReadLine());
string[,] argList = new string[N, 2];
for (int i = 0; i < N; i++)
{
string[] s = Console.ReadLine().Split();
argList[i, 0] = s[0];
argList[i, 1] = s[1];
}
Trie trie = new Trie();
for (int i = 0; i < N; i++)
{
switch (argList[i, 0])
{
case "add":
trie.add(argList[i, 1]);
break;
case "find":
Console.WriteLine(trie.find(argList[i, 1]));
break;
default:
break;
}
}
}
}
class Trie
{
Trie[] trieArray = new Trie[26];
private int findCount = 0;
private bool data = false;
private char name;
public void add(string s)
{
s = s.ToLower();
add(s, this);
}
private void add(string s, Trie t)
{
char first = Char.Parse(s.Substring(0, 1));
int index = first - 'a';
if(t.trieArray[index] == null)
{
t.trieArray[index] = new Trie();
t.trieArray[index].name = first;
}
if (s.Length > 1)
{
add(s.Substring(1), t.trieArray[index]);
}
else
{
t.trieArray[index].data = true;
}
}
public int find(string s)
{
int ans;
s = s.ToLower();
find(s, this);
ans = findCount;
findCount = 0;
return ans;
}
private void find(string s, Trie t)
{
if (t == null)
{
return;
}
if (s.Length > 0)
{
char first = Char.Parse(s.Substring(0, 1));
int index = first - 'a';
find(s.Substring(1), t.trieArray[index]);
}
else
{
for(int i = 0; i < 26; i++)
{
if (t.trieArray[i] != null)
{
find("", t.trieArray[i]);
}
}
if (t.data == true)
{
findCount++;
}
}
}
}
EDIT: I did some suggestions in the comments but realized that I can't replace s.Substring(1) with s[0]... because I actually need s[1..n]. AND s[0] returns a char so I'm going to need to do .ToString on it anyways.
Also, to add a little more information. The idea is that it needs to count ALL names after a prefix for example.
Input: "He"
Trie Contains:
"Hello"
"Help"
"Heart"
"Ha"
"No"
Output: 3
I could just post a solution here that gives you 40 points but i guess this wont be any fun.
Make use of Enumerator<char> instead of any string operations
Count while adding values
any charachter minus 'a' makes a good arrayindex (gives you values between 0 and 25)
I think, this link must be very helpfull
There, you can find the good implementation of a trie data structure, but it's a java implementation. I implemented trie on c# with that great article one year ago, and i tried to find solution to an another hackerrank task too ) It was successful.
In my task, i had to a simply trie (i mean my alphabet equals 2) and only one method for adding new nodes:
public class Trie
{
//for integer representation in binary system 2^32
public static readonly int MaxLengthOfBits = 32;
//alphabet trie size
public static readonly int N = 2;
class Node
{
public Node[] next = new Node[Trie.N];
}
private Node _root;
}
public void AddValue(bool[] binaryNumber)
{
_root = AddValue(_root, binaryNumber, 0);
}
private Node AddValue(Node node, bool[] val, int d)
{
if (node == null) node = new Node();
//if least sagnificient bit has been added
//need return
if (d == val.Length)
{
return node;
}
// get 0 or 1 index of next array(length 2)
int index = Convert.ToInt32(val[d]);
node.next[index] = AddValue(node.next[index], val, ++d);
return node;
}
I'm testing a self written element generator (ICollection<string>) and compare the calculated count to the actual count to get an idea if there's an error or not in my algorithm.
As this generator can generate lots of elements on demand I'm looking in Partitioner<string> and I have implemented a basic one which seems to also produce valid enumerators which together give the same amount of strings as calculated.
Now I want to test how this behaves if run parallel (again first testing for correct count):
MyGenerator generator = new MyGenerator();
MyPartitioner partitioner = new MyPartitioner(generator);
int isCount = partitioner.AsParallel().Count();
int shouldCount = generator.Count;
bool same = isCount == shouldCount; // false
I don't get why this count is not equal! What is the ParallelQuery<string> doing?
generator.Count() == generator.Count // true
partitioner.GetPartitions(xyz).Select(enumerator =>
{
int count = 0;
while (enumerator.MoveNext())
{
count++;
}
return count;
}).Sum() == generator.Count // true
So, I'm currently not seeing an error in my code. Next I tried to manualy count that ParallelQuery<string>:
int count = 0;
partitioner.AsParallel().ForAll(e => Interlocked.Increment(ref count));
count == generator.Count // true
Summed up: Everyone counts my enumerable correct, ParallelQuery.ForAll enumerates exactly generator.Count elements. But what does ParallelQuery.Count()?
If the correct count is something about 10k, ParallelQuery sees 40k.
internal sealed class PartialWordEnumerator : IEnumerator<string>
{
private object sync = new object();
private readonly IEnumerable<char> characters;
private readonly char[] limit;
private char[] buffer;
private IEnumerator<char>[] enumerators;
private int position = 0;
internal PartialWordEnumerator(IEnumerable<char> characters, char[] state, char[] limit)
{
this.characters = new List<char>(characters);
this.buffer = (char[])state.Clone();
if (limit != null)
{
this.limit = (char[])limit.Clone();
}
this.enumerators = new IEnumerator<char>[this.buffer.Length];
for (int i = 0; i < this.buffer.Length; i++)
{
this.enumerators[i] = SkipTo(state[i]);
}
}
private IEnumerator<char> SkipTo(char c)
{
IEnumerator<char> first = this.characters.GetEnumerator();
IEnumerator<char> second = this.characters.GetEnumerator();
while (second.MoveNext())
{
if (second.Current == c)
{
return first;
}
first.MoveNext();
}
throw new InvalidOperationException();
}
private bool ReachedLimit
{
get
{
if (this.limit == null)
{
return false;
}
for (int i = 0; i < this.buffer.Length; i++)
{
if (this.buffer[i] != this.limit[i])
{
return false;
}
}
return true;
}
}
public string Current
{
get
{
if (this.buffer == null)
{
throw new ObjectDisposedException(typeof(PartialWordEnumerator).FullName);
}
return new string(this.buffer);
}
}
object IEnumerator.Current
{
get { return this.Current; }
}
public bool MoveNext()
{
lock (this.sync)
{
if (this.position == this.buffer.Length)
{
this.position--;
}
if (this.position == -1)
{
return false;
}
IEnumerator<char> enumerator = this.enumerators[this.position];
if (enumerator.MoveNext())
{
this.buffer[this.position] = enumerator.Current;
this.position++;
if (this.position == this.buffer.Length)
{
return !this.ReachedLimit;
}
else
{
return this.MoveNext();
}
}
else
{
this.enumerators[this.position] = this.characters.GetEnumerator();
this.position--;
return this.MoveNext();
}
}
}
public void Dispose()
{
this.position = -1;
this.buffer = null;
}
public void Reset()
{
throw new NotSupportedException();
}
}
public override IList<IEnumerator<string>> GetPartitions(int partitionCount)
{
IEnumerator<string>[] enumerators = new IEnumerator<string>[partitionCount];
List<char> characters = new List<char>(this.generator.Characters);
int length = this.generator.Length;
int characterCount = this.generator.Characters.Count;
int steps = Math.Min(characterCount, partitionCount);
int skip = characterCount / steps;
for (int i = 0; i < steps; i++)
{
char c = characters[i * skip];
char[] state = new string(c, length).ToCharArray();
char[] limit = null;
if ((i + 1) * skip < characterCount)
{
c = characters[(i + 1) * skip];
limit = new string(c, length).ToCharArray();
}
if (i == steps - 1)
{
limit = null;
}
enumerators[i] = new PartialWordEnumerator(characters, state, limit);
}
for (int i = steps; i < partitionCount; i++)
{
enumerators[i] = Enumerable.Empty<string>().GetEnumerator();
}
return enumerators;
}
EDIT: I believe I have found the solution. According to the documentation on IEnumerable.MoveNext (emphasis mine):
If MoveNext passes the end of the collection, the enumerator is
positioned after the last element in the collection and MoveNext
returns false. When the enumerator is at this position, subsequent
calls to MoveNext also return false until Reset is called.
According to the following logic:
private bool ReachedLimit
{
get
{
if (this.limit == null)
{
return false;
}
for (int i = 0; i < this.buffer.Length; i++)
{
if (this.buffer[i] != this.limit[i])
{
return false;
}
}
return true;
}
}
The call to MoveNext() will return false only one time - when the buffer is exactly equal to the limit. Once you have passed the limit, the return value from ReachedLimit will start to become false again, making return !this.ReachedLimit return true, so the enumerator will continue past the end of the limit all the way until it runs out of characters to enumerate. Apparently, in the implementation of ParallelQuery.Count(), MoveNext() is called multiple times when it has reached the end, and since it starts to return a true value again, the enumerator happily continues returning more elements (this is not the case in your custom code that walks the enumerator manually, and apparently also is not the case for the ForAll call, so they "accidentally" return the correct results).
The simplest fix to this is to remember the return value from MoveNext() once it becomes false:
private bool _canMoveNext = true;
public bool MoveNext()
{
if (!_canMoveNext) return false;
...
if (this.position == this.buffer.Length)
{
if (this.ReachedLimit) _canMoveNext = false;
...
}
Now once it begins returning false, it will return false for every future call and this returns the correct result from AsParallel().Count(). Hope this helps!
The documentation on Partitioner notes (emphasis mine):
The static methods on Partitioner are all thread-safe and may
be used concurrently from multiple threads. However, while a created
partitioner is in use, the underlying data source should not be
modified, whether from the same thread that is using a partitioner or
from a separate thread.
From what I can understand of the code you have given, it would seem that ParallelQuery.Count() is most likely to have thread-safety issues because it may possibly be iterating multiple enumerators at the same time, whereas all the other solutions would require the enumerators to be run synchronized. Without seeing the code you are using for MyGenerator and MyPartitioner is it difficult to determine if thread-safety issues could be the culprit.
To demonstrate, I have written a simple enumerator that returns the first hundred numbers as strings. Also, I have a partitioner, that distributes the elements in the underlying enumerator over a collection of numPartitions separate lists. Using all the methods you described above on our 12-core server (when I output numPartitions, it uses 12 by default on this machine), I get the expected result of 100 (this is LINQPad-ready code):
void Main()
{
var partitioner = new SimplePartitioner(GetEnumerator());
GetEnumerator().Count().Dump();
partitioner.GetPartitions(10).Select(enumerator =>
{
int count = 0;
while (enumerator.MoveNext())
{
count++;
}
return count;
}).Sum().Dump();
var theCount = 0;
partitioner.AsParallel().ForAll(e => Interlocked.Increment(ref theCount));
theCount.Dump();
partitioner.AsParallel().Count().Dump();
}
// Define other methods and classes here
public IEnumerable<string> GetEnumerator()
{
for (var i = 1; i <= 100; i++)
yield return i.ToString();
}
public class SimplePartitioner : Partitioner<string>
{
private IEnumerable<string> input;
public SimplePartitioner(IEnumerable<string> input)
{
this.input = input;
}
public override IList<IEnumerator<string>> GetPartitions(int numPartitions)
{
var list = new List<string>[numPartitions];
for (var i = 0; i < numPartitions; i++)
list[i] = new List<string>();
var index = 0;
foreach (var s in input)
list[(index = (index + 1) % numPartitions)].Add(s);
IList<IEnumerator<string>> result = new List<IEnumerator<string>>();
foreach (var l in list)
result.Add(l.GetEnumerator());
return result;
}
}
Output:
100
100
100
100
This clearly works. Without more information it is impossible to tell you what is not working in your particular implementation.
Does anyone know about a good way to accomplish this task?
Currently i'm doing it more ore less this way, but i'm feeling someway unhappy with this code, unable to say what i could immediately improve.
So if anyone has a smarter way of doing this job i would be happy to know.
private bool Check(List<MyItem> list)
{
bool result = true;
//MyItem implements IComparable<MyItem>
list.Sort();
for (int pos = 0; pos < list.Count - 1; pos++)
{
bool previousCheckOk = true;
if (pos != 0)
{
if (!CheckCollisionWithPrevious(pos))
{
MarkAsFailed(pos);
result = false;
previousCheckOk = false;
}
else
{
MarkAsGood(pos);
}
}
if (previousCheckOk && pos != list.Count - 1)
{
if (!CheckCollisionWithFollowing(pos))
{
MarkAsFailed(pos);
result = false;
}
else
{
MarkAsGood(pos);
}
}
}
return result;
}
private bool CheckCollisionWithPrevious(int pos)
{
bool checkOk = false;
var previousItem = _Item[pos - 1];
// Doing some checks ...
return checkOk;
}
private bool CheckCollisionWithFollowing(int pos)
{
bool checkOk = false;
var followingItem = _Item[pos + 1];
// Doing some checks ...
return checkOk;
}
Update
After reading the answer from Aaronaught and a little weekend to refill full mind power i came up with the following solution, that looks far better now (and nearly the same i got from Aaronaught):
public bool Check(DataGridView dataGridView)
{
bool result = true;
_Items.Sort();
for (int pos = 1; pos < _Items.Count; pos++)
{
var previousItem = _Items[pos - 1];
var currentItem = _Items[pos];
if (previousItem.CollidesWith(currentItem))
{
dataGridView.Rows[pos].ErrorText = "Offset collides with item named " + previousItem.Label;
result = false;
sb.AppendLine("Line " + pos);
}
}
dataGridView.Refresh();
return result;
}
It's certainly possible to reduce the repetition:
private bool Check(List<MyItem> list)
{
list.Sort();
for (int pos = 1; pos < list.Count; pos++)
{
if (!CheckCollisionWithPrevious(list, pos))
{
MarkAsFailed();
return false;
}
MarkAsGood();
}
return true;
}
private bool CheckCollisionWithPrevious(List<MyItem> list, int pos)
{
bool checkOk = false;
var previousItem = list[pos - 1];
// Doing some checks ...
return checkOk;
}
Assuming that CheckCollisionWithPrevious and CheckCollisionWithFollowing perform essentially the same comparisons, then this will perform the same function with a lot less code.
I've also added the list as a parameter to the second function; it doesn't make sense to be taking it as a parameter in the first function, but then referencing a hard-coded member in the function it calls. If you're going to take a parameter, then pass that parameter down the chain.
As far as performance is concerned, though, you're re-sorting the list every time this happens; if it happens often enough, you might be better off using a sorted collection to begin with.
Edit: And just for good measure, if the whole point of this code is just to check for some kind of duplicate key, then you would be way better off using a data structure that prevents this in the first place, such as a Dictionary<TKey, TValue>.
Is there any simple algorithm to determine the likeliness of 2 names representing the same person?
I'm not asking for something of the level that Custom department might be using. Just a simple algorithm that would tell me if 'James T. Clark' is most likely the same name as 'J. Thomas Clark' or 'James Clerk'.
If there is an algorithm in C# that would be great, but I can translate from any language.
Sounds like you're looking for a phonetic-based algorithms, such as soundex, NYSIIS, or double metaphone. The first actually is what several government departments use, and is trivial to implement (with many implementations readily available). The second is a slightly more complicated and more precise version of the first. The latter-most works with some non-English names and alphabets.
Levenshtein distance is a definition of distance between two arbitrary strings. It gives you a distance of 0 between identical strings and non-zero between different strings, which might also be useful if you decide to make a custom algorithm.
Levenshtein is close, although maybe not exactly what you want.
I've faced similar problem and tried to use Levenstein distance first, but it did not work well for me. I came up with an algorithm that gives you "similarity" value between two strings (higher value means more similar strings, "1" for identical strings). This value is not very meaningful by itself (if not "1", always 0.5 or less), but works quite well when you throw in Hungarian Matrix to find matching pairs from two lists of strings.
Use like this:
PartialStringComparer cmp = new PartialStringComparer();
tbResult.Text = cmp.Compare(textBox1.Text, textBox2.Text).ToString();
The code behind:
public class SubstringRange {
string masterString;
public string MasterString {
get { return masterString; }
set { masterString = value; }
}
int start;
public int Start {
get { return start; }
set { start = value; }
}
int end;
public int End {
get { return end; }
set { end = value; }
}
public int Length {
get { return End - Start; }
set { End = Start + value;}
}
public bool IsValid {
get { return MasterString.Length >= End && End >= Start && Start >= 0; }
}
public string Contents {
get {
if(IsValid) {
return MasterString.Substring(Start, Length);
} else {
return "";
}
}
}
public bool OverlapsRange(SubstringRange range) {
return !(End < range.Start || Start > range.End);
}
public bool ContainsRange(SubstringRange range) {
return range.Start >= Start && range.End <= End;
}
public bool ExpandTo(string newContents) {
if(MasterString.Substring(Start).StartsWith(newContents, StringComparison.InvariantCultureIgnoreCase) && newContents.Length > Length) {
Length = newContents.Length;
return true;
} else {
return false;
}
}
}
public class SubstringRangeList: List<SubstringRange> {
string masterString;
public string MasterString {
get { return masterString; }
set { masterString = value; }
}
public SubstringRangeList(string masterString) {
this.MasterString = masterString;
}
public SubstringRange FindString(string s){
foreach(SubstringRange r in this){
if(r.Contents.Equals(s, StringComparison.InvariantCultureIgnoreCase))
return r;
}
return null;
}
public SubstringRange FindSubstring(string s){
foreach(SubstringRange r in this){
if(r.Contents.StartsWith(s, StringComparison.InvariantCultureIgnoreCase))
return r;
}
return null;
}
public bool ContainsRange(SubstringRange range) {
foreach(SubstringRange r in this) {
if(r.ContainsRange(range))
return true;
}
return false;
}
public bool AddSubstring(string substring) {
bool result = false;
foreach(SubstringRange r in this) {
if(r.ExpandTo(substring)) {
result = true;
}
}
if(FindSubstring(substring) == null) {
bool patternfound = true;
int start = 0;
while(patternfound){
patternfound = false;
start = MasterString.IndexOf(substring, start, StringComparison.InvariantCultureIgnoreCase);
patternfound = start != -1;
if(patternfound) {
SubstringRange r = new SubstringRange();
r.MasterString = this.MasterString;
r.Start = start++;
r.Length = substring.Length;
if(!ContainsRange(r)) {
this.Add(r);
result = true;
}
}
}
}
return result;
}
private static bool SubstringRangeMoreThanOneChar(SubstringRange range) {
return range.Length > 1;
}
public float Weight {
get {
if(MasterString.Length == 0 || Count == 0)
return 0;
float numerator = 0;
int denominator = 0;
foreach(SubstringRange r in this.FindAll(SubstringRangeMoreThanOneChar)) {
numerator += r.Length;
denominator++;
}
if(denominator == 0)
return 0;
return numerator / denominator / MasterString.Length;
}
}
public void RemoveOverlappingRanges() {
SubstringRangeList l = new SubstringRangeList(this.MasterString);
l.AddRange(this);//create a copy of this list
foreach(SubstringRange r in l) {
if(this.Contains(r) && this.ContainsRange(r)) {
Remove(r);//try to remove the range
if(!ContainsRange(r)) {//see if the list still contains "superset" of this range
Add(r);//if not, add it back
}
}
}
}
public void AddStringToCompare(string s) {
for(int start = 0; start < s.Length; start++) {
for(int len = 1; start + len <= s.Length; len++) {
string part = s.Substring(start, len);
if(!AddSubstring(part))
break;
}
}
RemoveOverlappingRanges();
}
}
public class PartialStringComparer {
public float Compare(string s1, string s2) {
SubstringRangeList srl1 = new SubstringRangeList(s1);
srl1.AddStringToCompare(s2);
SubstringRangeList srl2 = new SubstringRangeList(s2);
srl2.AddStringToCompare(s1);
return (srl1.Weight + srl2.Weight) / 2;
}
}
Levenstein distance one is much simpler (adapted from http://www.merriampark.com/ld.htm):
public class Distance {
/// <summary>
/// Compute Levenshtein distance
/// </summary>
/// <param name="s">String 1</param>
/// <param name="t">String 2</param>
/// <returns>Distance between the two strings.
/// The larger the number, the bigger the difference.
/// </returns>
public static int LD(string s, string t) {
int n = s.Length; //length of s
int m = t.Length; //length of t
int[,] d = new int[n + 1, m + 1]; // matrix
int cost; // cost
// Step 1
if(n == 0) return m;
if(m == 0) return n;
// Step 2
for(int i = 0; i <= n; d[i, 0] = i++) ;
for(int j = 0; j <= m; d[0, j] = j++) ;
// Step 3
for(int i = 1; i <= n; i++) {
//Step 4
for(int j = 1; j <= m; j++) {
// Step 5
cost = (t.Substring(j - 1, 1) == s.Substring(i - 1, 1) ? 0 : 1);
// Step 6
d[i, j] = System.Math.Min(System.Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), d[i - 1, j - 1] + cost);
}
}
// Step 7
return d[n, m];
}
}
I doubt there is, considering even the Customs Department doesn't seem to have a satisfactory answer...
If there is a solution to this problem I seriously doubt it's a part of core C#. Off the top of my head, it would require a database of first, middle and last name frequencies, as well as account for initials, as in your example. This is fairly complex logic that relies on a database of information.
Second to Levenshtein distance, what language do you want? I was able to find an implementation in C# on codeproject pretty easily.
In an application I worked on, the Last name field was considered reliable.
So presented all the all the records with the same last name to the user.
User could sort by the other fields to look for similar names.
This solution was good enough to greatly reduce the issue of users creating duplicate records.
Basically looks like the issue will require human judgement.