Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question appears to be off-topic because it lacks sufficient information to diagnose the problem. Describe your problem in more detail or include a minimal example in the question itself.
Closed 9 years ago.
I need help building a regex.
In my MVC5 view I have a text area that will contain one or more groups of integers, each of which can be 6, 7, or 8 characters long.
In my controller I need to extract all of these numbers from the input string and put them into an array.
Examples would be:
123456 123457 123458
or
123456
123457
123458
or
123456,123457, 123458
These groups may or may not have 1 or 2 leading zeroes:
00123456, 00123457 123458
This is what I ended up with:
public string[] ExtractWorkOrderNumbers(string myText)
{
    var result = new List<string>();
    var regex = new Regex(@"( |,)*(\d+)");
    var m = regex.Match(myText);
    while (m.Success)
    {
        // Group 2 holds the digits; group 1 holds the delimiters.
        var wo = m.Groups[2].Value;
        // Only add numbers that have not been seen yet.
        if (!result.Contains(wo)) result.Add(wo);
        m = m.NextMatch();
    }
    return result.ToArray();
}
Assumption: zero or more spaces and/or commas serve as delimiters.
[TestMethod()]
public void TestMethod3()
{
    var myText = "123456 1234567, 123458, 00123456, 01234567";
    var regex = new Regex(@"( |,)*(\d+)");
    var m = regex.Match(myText);
    var matchCount = 0;
    while (m.Success)
    {
        Console.WriteLine("Match" + (++matchCount));
        // Group 1 is the delimiter run, group 2 is the number.
        for (int i = 1; i <= 2; i++)
        {
            Group g = m.Groups[i];
            Console.WriteLine("Group" + i + "='" + g + "'");
            CaptureCollection cc = g.Captures;
            for (int j = 0; j < cc.Count; j++)
            {
                Capture c = cc[j];
                Console.WriteLine("Capture" + j + "='" + c + "', Position=" + c.Index);
            }
        }
        m = m.NextMatch();
    }
}
Output:
(For each match, all Group2's are your matches, Group1 is the delimiter)
Match1
Group1=''
Group2='123456'
Capture0='123456', Position=0
Match2
Group1=' '
Capture0=' ', Position=6
Group2='1234567'
Capture0='1234567', Position=7
Match3
Group1=' '
Capture0=',', Position=14
Capture1=' ', Position=15
Group2='123458'
Capture0='123458', Position=16
Match4
Group1=' '
Capture0=',', Position=22
Capture1=' ', Position=23
Group2='00123456'
Capture0='00123456', Position=24
Match5
Group1=' '
Capture0=',', Position=32
Capture1=' ', Position=33
Group2='01234567'
Capture0='01234567', Position=34
By using the named capturing groups feature of regular expressions (Regex), we can extract the data from matching patterns. In your case, we can extract the non-zero integer portion of the text string:
using System.Text.RegularExpressions;
// A pattern consisting of at most two leading zeros followed by 6 to 8 non-zero digits
// (digits 1-9 only, so a value such as 100456 does not match).
var regex = new Regex(@"^[0]{0,2}(?<Value>[1-9]{6,8})$");
var firstString = "123456";
var secondString = "01234567";
var thirdString = "0012345678";
var firstMatch = regex.Match(firstString);
var secondMatch = regex.Match(secondString);
var thirdMatch = regex.Match(thirdString);
int firstValue = 0;
int secondValue = 0;
int thirdValue = 0;
if (firstMatch.Success)
int.TryParse(firstMatch.Groups["Value"].Value, out firstValue);
if (secondMatch.Success)
int.TryParse(secondMatch.Groups["Value"].Value, out secondValue);
if (thirdMatch.Success)
int.TryParse(thirdMatch.Groups["Value"].Value, out thirdValue);
Console.WriteLine("First Value = {0}", firstValue);
Console.WriteLine("Second Value = {0}", secondValue);
Console.WriteLine("Third Value = {0}", thirdValue);
Output:
First Value = 123456
Second Value = 1234567
Third Value = 12345678
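The answers above match one number at a time. As a minimal sketch of pulling every number out of the whole text area in one pass, the following uses Regex.Matches with lookarounds. It assumes that leading zeros count toward the 6 to 8 characters (as in the question's 00123456 example) and that any non-digit character acts as a delimiter; the class and method names are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WorkOrderSketch
{
    // Matches a run of 6 to 8 digits that is not part of a longer run of digits.
    private static readonly Regex Number = new Regex(@"(?<!\d)\d{6,8}(?!\d)");

    // Returns the distinct numbers in the order they first appear.
    public static string[] Extract(string text) =>
        Number.Matches(text)
              .Cast<Match>()
              .Select(m => m.Value)
              .Distinct()
              .ToArray();
}
```

For example, WorkOrderSketch.Extract("00123456, 00123457 123458\n123458") yields 00123456, 00123457 and 123458, while runs of 5 or 9 digits are skipped.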
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 2 years ago.
I have a .txt file with the following:
F Am G F
I was tired of my lady,
Gm7 C E D C
We'd been together too long.
Dm7 F Am G F
Like a worn out recording,
Gm7 C E D C
of a favorite song.
I want to produce a txt file with the following output:
I was tired of my [F]lady,[Am][G][F]
We'd been toge[Gm7]ther too [C]long.[E][D][C]
Like a worn [Dm7]out recor[F]ding,[Am][G][F]
of a fa[Gm7]vorite [C]song.[E][D][C]
Note:
The chords (i.e. F, Am, G, F etc.) have been inserted into the line below (it can be before or in a word; approximate location is fine)
Square brackets have been added around the chords (i.e. F, Am, G, F etc.)
I am a C# developer, so I would like to use a C# library of some sort to do the above.
As per Flydog57:
Use the File class to read the first 2 lines into 2 string (chords and text). Create a StringBuilder object (say buffer). Find the index of the first non-space character in chords. Get the Substring of the text string up to that point, and Append it to buffer. Get a Substring from chords (based on the index of the next space). Format it using string interpolation and Append it to buffer. Repeat to the end of the line, and then repeat for every pair of lines in the file
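The steps described by Flydog57 can be sketched for a single chord/lyric pair as follows. This is only an illustration under the assumption that chords are separated by spaces and that a chord's column in the chord line is where it belongs in the lyric line; ChordMergeSketch and Merge are hypothetical names.

```csharp
using System;
using System.Text;

public static class ChordMergeSketch
{
    // Inserts each chord from chordLine into lyricLine at the chord's column, wrapped in [].
    public static string Merge(string chordLine, string lyricLine)
    {
        var buffer = new StringBuilder();
        int copied = 0; // number of lyric characters already appended
        int i = 0;
        while (i < chordLine.Length)
        {
            if (chordLine[i] == ' ') { i++; continue; }
            // The chord runs from i up to the next space (or the end of the line).
            int end = chordLine.IndexOf(' ', i);
            if (end < 0) end = chordLine.Length;
            // Copy the lyric text that precedes this chord's column.
            int upTo = Math.Min(i, lyricLine.Length);
            if (upTo > copied)
            {
                buffer.Append(lyricLine, copied, upTo - copied);
                copied = upTo;
            }
            buffer.Append($"[{chordLine.Substring(i, end - i)}]");
            i = end;
        }
        // Append whatever lyric text remains after the last chord.
        if (copied < lyricLine.Length)
            buffer.Append(lyricLine, copied, lyricLine.Length - copied);
        return buffer.ToString();
    }
}
```

With a chord line of "    C   G" and a lyric line of "Hello world", Merge returns "Hell[C]o wo[G]rld". Chords that sit past the end of the lyric are simply appended after it.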
The following code uses chord positions and lyric character positions to merge every two lines in your lyric/chord data.
The class below parses the lines and merges each pair into a single line with lyrics and chords.
public class LyricAndChordMerger
{
public IList<string> MakeMergedLines(string[] lines)
{
IList<string> mergedLines = new List<string>();
for (int i = 0; i < lines.Length; i = i + 2)
{
string chordLine = lines[i];
string lyricLine = lines[i + 1];
Dictionary<int, string> chords = MakeChordsArray(chordLine);
string mergedLine = string.Empty;
for (int j = 0; j < chordLine.Length; j++)
{
string chord = string.Empty;
if (chords.ContainsKey(j))
{
chord = chords[j] ?? "";
if (chord.Length > 0) chord = string.Format("[{0}]", chord);
}
string lyricChar = "";
if (lyricLine.Length > j)
{
lyricChar = lyricLine[j].ToString();
}
mergedLine += chord + lyricChar;
}
mergedLines.Add(mergedLine);
}
return mergedLines;
}
public Dictionary<int, string> MakeChordsArray(string chordLine)
{
string[] values = chordLine.Split(' ');
Dictionary<int, string> chordsAndPositions = new Dictionary<int, string>();
int indexOffset = 0;
for (int i = 0; i < values.Count(); i++)
{
int index = i + indexOffset;
chordsAndPositions.Add(index, values[i]);
int valueLength = values[i].Length;
indexOffset += valueLength <= 1 ? 0 : valueLength - 1;
}
return chordsAndPositions;
}
}
And you would use it like so...
string inputFile = "[path to your lyrics and chord file]";
string[] inputLines;
using (var sr = new StreamReader(inputFile))
{
inputLines = sr.ReadToEnd().Split(new char[]{'\n','\r'}, StringSplitOptions.RemoveEmptyEntries);
}
var merger = new LyricAndChordMerger();
IList<string> mergedLines = merger.MakeMergedLines(inputLines);
foreach (string line in mergedLines)
{
Console.WriteLine(line);
}
And the output looks like...
I was tired of my [F]lady[Am],[G][F]
We'd been to[Gm7]gether too [C]long[E].[D][C]
Like a worn [Dm7]out reco[F]rding[Am],[G][F]
of a [Gm7]favorite [C]song[E].[D][C]
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
How can I add a number or character pattern to my random key generator?
Is it hard? I'm new to coding :) Thanks for the help!
It took me a lot of time to get this far; I have been stuck here for a day and a half and can't find a way to add patterns to this.
Like This :
D4B6C5604E26-4F1198-44C1
EA3705694B8A-478E83-2D01
D3B8E2DE7BFC-49CF95-68E6
A6CD996B352A-48B89A-8C69
After - 4 Numbers and After second - 3 Numbers
static void Main(string[] args)
{
var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
var stringChars = new char[12];
var stringChars4 = new char[6];
var stringChars7 = new char[4];
var random = new Random();
for (int i = 0; i < stringChars.Length; i++)
{
stringChars[i] = chars[random.Next(chars.Length)];
}
for (int i = 0; i < stringChars4.Length; i++)
{
stringChars4[i] = chars[random.Next(chars.Length)];
}
for (int i = 0; i < stringChars7.Length; i++)
{
stringChars7[i] = chars[random.Next(chars.Length)];
}
var finalString = new String(stringChars);
var finalString4 = new String(stringChars4);
var finalString7 = new String(stringChars7);
var chars2 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
var stringChars2 = new char[12];
var stringChars5 = new char[6];
var stringChars8 = new char[4];
var randoms = new Random();
for (int i = 0; i < stringChars.Length; i++)
{
stringChars2[i] = chars2[random.Next(chars.Length)];
}
for (int i = 0; i < stringChars5.Length; i++)
{
stringChars5[i] = chars2[random.Next(chars.Length)];
}
for (int i = 0; i < stringChars8.Length; i++)
{
stringChars8[i] = chars2[random.Next(chars.Length)];
}
var finalString2 = new String(stringChars2);
var finalString8 = new String(stringChars8);
var finalString5 = new String(stringChars5);
var chars3 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
var stringChars3 = new char[12];
var stringChars6 = new char[6];
var stringChars9 = new char[4];
var randomss = new Random();
for (int i = 0; i < stringChars3.Length; i++)
{
stringChars3[i] = chars3[random.Next(chars3.Length)];
}
for (int i = 0; i < stringChars6.Length; i++)
{
stringChars6[i] = chars3[random.Next(chars3.Length)];
}
for (int i = 0; i < stringChars9.Length; i++)
{
stringChars9[i] = chars3[random.Next(chars3.Length)];
}
var finalString3 = new String(stringChars3);
var finalString6 = new String(stringChars6);
var finalString9 = new String(stringChars9);
Console.WriteLine("Keys:");
Console.WriteLine();
Console.ReadKey();
Console.WriteLine(finalString + "-" + finalString4 + "-" + finalString7);
Console.WriteLine();
Console.ReadKey();
Console.WriteLine(finalString2 + "-" + finalString5 + "-" + finalString8);
Console.WriteLine();
Console.ReadKey();
Console.WriteLine(finalString3 + "-" + finalString6 + "-" + finalString9);
Console.WriteLine();
Console.ReadKey();
}
Assuming you are looking for a way to get a string of random characters (from a given set of characters) formatted to a given specification such as 'xxx-xx-xxxx!xxx', where 'x' is a random character:
Regex.Replace is a nice way to construct such a string. It lets you run arbitrary code to construct each replacement, so replacing every 'x' with a randomly selected character will produce the result you seem to be looking for:
var r = new Random();
// Convert the string to an array of one-character strings, because Replace wants strings
var charsAsStrings = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
    .Select(x => x.ToString()).ToArray();
var result = Regex.Replace("xxx-xxx", "x",
    m => charsAsStrings[r.Next(charsAsStrings.Length)]);
Notes:
Make sure to read "Random number generator only generating one random number" to instantiate Random properly.
Random numbers/strings are not unique. Presumably you will store them in some sort of list or database and regenerate the ones that are not unique.
Using similar-looking symbols like 'O' and '0' (or 'I', 'l', '1') in strings that may need to be read by humans is not the best idea.
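The three notes can be folded into one small sketch: a single shared Random instance, a HashSet to discard duplicates, and an alphabet without look-alike characters. The alphabet, the class name KeySketch, and the template format are assumptions made for illustration, not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeySketch
{
    // One shared Random instance, so keys generated in quick succession differ.
    private static readonly Random Rng = new Random();

    // Assumed alphabet: letters and digits minus the look-alikes O, 0, I, 1 and L.
    private const string Alphabet = "ABCDEFGHJKMNPQRSTUVWXYZ23456789";

    // Replaces every 'x' in the template with a random character from Alphabet.
    public static string NextKey(string template) =>
        Regex.Replace(template, "x", m => Alphabet[Rng.Next(Alphabet.Length)].ToString());

    // Generates 'count' distinct keys; duplicates are discarded and regenerated.
    public static List<string> UniqueKeys(string template, int count)
    {
        var seen = new HashSet<string>();
        while (seen.Count < count) seen.Add(NextKey(template));
        return new List<string>(seen);
    }
}
```

Calling KeySketch.UniqueKeys("xxxxxxxxxxxx-xxxxxx-xxxx", 3) returns three different keys in the 12-6-4 layout from the question. Random is fine for demo keys like these, but it is not meant for anything security-sensitive.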
Create a function for the code generation, it makes the main method more readable.
private static readonly Random _random = new Random();
private static string CreateCode()
{
var bytes = new byte[11];
_random.NextBytes(bytes);
string s = BitConverter.ToString(bytes).Replace("-", "");
string result = new StringBuilder(s)
.Insert(18, '-')
.Insert(12, '-')
.ToString();
return result;
}
static void Main(string[] args)
{
const int N = 3;
var codes = new string[N];
for (int i = 0; i < N; i++) {
codes[i] = CreateCode();
Console.WriteLine(codes[i]);
}
}
I use the Random.NextBytes method to generate random bytes. We need 11 of them, because each byte is represented by 2 hex digits and the key has 12 + 6 + 4 = 22 of them.
Your codes are in hexadecimal format, i.e. they contain only the letters A - F and digits. This solution uses BitConverter to format a byte array as a hexadecimal string. It produces strings like "BD-EB-1F-0C-9B-9E-0C-F5-6E-2E-46", so it is necessary to remove the "-" first.
Then I convert the string into a StringBuilder, which has an Insert method that we can use to insert dashes at the required places. I insert the second dash first, so that the index of the other one is not shifted.
You could also simply call Console.WriteLine(CreateCode()); three times and not create the codes array. But if you want to do other things with the codes, like saving them to a file or copying them to the clipboard, it's better to store them somewhere.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
The sequence should go like this.
A-Z,AA-AZ,BA-BZ,CA-CZ,.......,ZA-ZZ
After ZZ it should start from AAA.
Then AAA to ZZZ and then AAAA to ZZZZ and so on.
This sequence is pretty much like that of an Excel sheet.
Edit: Added my code
private void SequenceGenerator()
{
var numAlpha = new Regex("(?<Numeric>[0-9]*)(?<Alpha>[a-zA-Z]*)");
var match = numAlpha.Match(txtBNo.Text);
var alpha = match.Groups["Alpha"].Value;
var num = Convert.ToInt32(match.Groups["Numeric"].Value);
lastChar = alpha.Substring(alpha.Length - 1);
if (lastChar=="Z")
{
lastChar = "A";
txtBNo.Text = num.ToString() + "A" + alpha.Substring(0, alpha.Length - 1) + lastChar;
}
else
{
txtBNo.Text = num.ToString() + alpha.Substring(0, alpha.Length - 1) + Convert.ToChar(Convert.ToInt32(Convert.ToChar(lastChar)) + 1);
}
}
This is what I've done. But, I know that is a wrong logic.
Thanks.
As I wrote in the comment, it's a base-conversion problem, where your output is in base 26, with the symbols A-Z:
static string NumToLetters(int num)
{
string str = string.Empty;
// We need to do at least a "round" of division
// to handle num == 0
do
{
// We have to "prepend" the new digit
str = (char)('A' + (num % 26)) + str;
num /= 26;
}
while (num != 0);
return str;
}
Lucky for you, I've done this once before. The problem I encountered is that in an Excel sheet there is no 0, not even in double-'digit' 'numbers'. You start with A (that's 1), and from Z (that's 26) you go straight to AA (27). This is why it's not a simple base-conversion problem, and you need some extra code to handle this.
Testing the function suggested by xanatos results with the following:
NumToLetters(0) --> A
NumToLetters(25) --> Z
NumToLetters(26) --> BA
My solution has more code, but it has been tested against Excel and is fully compatible, except that it starts with 0 and not 1 (meaning that A is 0, Z is 25, AA is 26, ZZ is 701, AAA is 702, and so on). You can change it to start from 1 if you want; it's fairly easy.
private static string mColumnLetters = "zabcdefghijklmnopqrstuvwxyz";
// Convert Column name to 0 based index
public static int ColumnIndexByName(string ColumnName)
{
string CurrentLetter;
int ColumnIndex, LetterValue, ColumnNameLength;
ColumnIndex = -1; // A is the first column, but for the calculation its number is 1 and not 0. However, the index is always zero-based.
ColumnNameLength = ColumnName.Length;
for (int i = 0; i < ColumnNameLength; i++)
{
CurrentLetter = ColumnName.Substring(i, 1).ToLower();
LetterValue = mColumnLetters.LastIndexOf(CurrentLetter); // LastIndexOf, because 'z' appears at both index 0 and 26 and must count as 26 here
ColumnIndex += LetterValue * (int)Math.Pow(26, (ColumnNameLength - (i + 1)));
}
return ColumnIndex;
}
// Convert 0 based index to Column name
public static string ColumnNameByIndex(int ColumnIndex)
{
int ModOf26, Subtract;
StringBuilder NumberInLetters = new StringBuilder();
ColumnIndex += 1; // A is the first column, but for the calculation its number is 1 and not 0. However, the index is always zero-based.
while (ColumnIndex > 0)
{
if (ColumnIndex <= 26)
{
ModOf26 = ColumnIndex;
NumberInLetters.Insert(0, mColumnLetters.Substring(ModOf26, 1));
ColumnIndex = 0;
}
else
{
ModOf26 = ColumnIndex % 26;
Subtract = (ModOf26 == 0) ? 26 : ModOf26;
ColumnIndex = (ColumnIndex - Subtract) / 26;
NumberInLetters.Insert(0, mColumnLetters.Substring(ModOf26, 1));
}
}
return NumberInLetters.ToString().ToUpper();
}
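The "no zero digit" behaviour described in this answer can also be written more compactly. The sketch below is not the answer's code; it is an alternative that keeps the same 0-based convention (0 is A, 25 is Z, 26 is AA) by subtracting 1 before taking each remainder. ColumnNameSketch and ToName are hypothetical names.

```csharp
using System;

public static class ColumnNameSketch
{
    // Converts a 0-based index to an Excel-style name: 0 -> A, 25 -> Z, 26 -> AA.
    public static string ToName(int index)
    {
        if (index < 0) throw new ArgumentOutOfRangeException(nameof(index));
        string name = string.Empty;
        int n = index + 1; // work 1-based, because there is no zero digit
        while (n > 0)
        {
            n--; // shift so that the remainder falls in 0..25
            name = (char)('A' + n % 26) + name;
            n /= 26;
        }
        return name;
    }
}
```

ToName(701) gives "ZZ" and ToName(702) gives "AAA", matching the values quoted in this answer.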
Try this method:
public static IEnumerable<string> GenerateItems()
{
var buffer = new[] { '@' }; // '@' is the character just before 'A', so the first increment yields "A"
var maxIdx = 0;
while(true)
{
var i = maxIdx;
while (true)
{
if (buffer[i] < 'Z')
{
buffer[i]++;
break;
}
if (i == 0)
{
buffer = Enumerable.Range(0, ++maxIdx + 1).Select(c => 'A').ToArray();
break;
}
buffer[i] = 'A';
i--;
}
yield return new string(buffer);
}
// ReSharper disable once FunctionNeverReturns
}
This is an infinite generator of the alphabetical sequence you need, so you must restrict the number of items, like this:
var sequence = GenerateItems().Take(10000).ToArray();
Do not call it like this (it causes an infinite loop):
foreach (var i in GenerateItems())
Console.WriteLine(i);
This question already has answers here:
Add separator to string at every N characters?
(15 answers)
Closed 8 years ago.
The string displays value as:
123456789012
I need it like:
1234 5678 9012
There should be space between every 4 characters in this string. How do I do that?
displaynum_lbl.Text = Regex.Replace(printClass.mynumber.ToString(), ".{4}", "$0");
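Assuming that it's fine to work from left to right, this should do the trick. Note the space after $0; when the length is a multiple of 4 the result ends with a space, which TrimEnd() removes: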
displaynum_lbl.Text = System.Text.RegularExpressions.Regex.Replace(printClass.mynumber.ToString(), ".{4}", "$0 ");
You can find that and a good deal more information in other StackOverflow answers, example: Add separator to string at every N characters?
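String abc = "123456789012";
// Stop before the end of the string so that no trailing space is added
for (int i = 4; i < abc.Length; i += 4)
{
    abc = abc.Insert(i, " ");
    i++; // skip over the space that was just inserted
}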
You can do this in LINQ:
var s = "123456789012";
var list = Enumerable
.Range(0, s.Length/4)
.Select(i => s.Substring(i*4, 4))
.ToList();
var res = string.Join(" ", list);
Console.WriteLine(res);
public string InsertSpaces(string s)
{
    char[] result = new char[s.Length + (s.Length / 4)];
    for (int i = 0, target = 0; i < s.Length; i++)
    {
        result[target++] = s[i];
        // Parentheses are required: == binds tighter than & in C#
        if ((i & 3) == 3)
            result[target++] = ' ';
    }
    return new string(result);
}
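The answers above either leave a trailing space or assume the length is a multiple of 4. A minimal sketch that avoids both, assuming grouping from the left and using hypothetical names (GroupingSketch, Group), could look like this:

```csharp
using System;
using System.Text;

public static class GroupingSketch
{
    // Inserts 'separator' between consecutive groups of 'size' characters, left to right.
    public static string Group(string s, int size = 4, char separator = ' ')
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        var sb = new StringBuilder(s.Length + s.Length / size);
        for (int i = 0; i < s.Length; i++)
        {
            // Add the separator before the first character of every group except the first.
            if (i > 0 && i % size == 0) sb.Append(separator);
            sb.Append(s[i]);
        }
        return sb.ToString();
    }
}
```

GroupingSketch.Group("123456789012") returns "1234 5678 9012", and a string such as "12345" becomes "1234 5".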
I am creating a program that checks whether a word is a simplified word (txt, msg, etc.) and, if it is, finds the correct spelling, e.g. txt = text, msg = message. I am using the NHunspell Suggest method in C#, which suggests all possible results.
The problem is that if I input "txt", the results are text, tat, tot, etc. I don't know how to select the correct word. I used Levenshtein distance (C# - Compare String Similarity), but every candidate still gets a distance of 1.
Input: txt
Result: text = 1, ext = 1, tit = 1
Can you help me get the meaning or the correct spelling of the simplified words?
Example: msg
I have tested your input with your sample data, and only text has a distance of 25, whereas the others have a distance of 33. Here's my code:
string input = "TXT";
string[] words = new[]{"text","tat","tot"};
var levenshtein = new Levenshtein();
const int maxDistance = 30;
var distanceGroups = words
.Select(w => new
{
Word = w,
Distance = levenshtein.iLD(w.ToUpperInvariant(), input)
})
.Where(x => x.Distance <= maxDistance)
.GroupBy(x => x.Distance)
.OrderBy(g => g.Key)
.ToList();
foreach (var topCandidate in distanceGroups.First())
Console.WriteLine("Word:{0} Distance:{1}", topCandidate.Word, topCandidate.Distance);
and here is the levenshtein class:
public class Levenshtein
{
///*****************************
/// Compute Levenshtein distance
/// Memory efficient version
///*****************************
public int iLD(String sRow, String sCol)
{
int RowLen = sRow.Length; // length of sRow
int ColLen = sCol.Length; // length of sCol
int RowIdx; // iterates through sRow
int ColIdx; // iterates through sCol
char Row_i; // ith character of sRow
char Col_j; // jth character of sCol
int cost; // cost
/// Test string length
if (Math.Max(sRow.Length, sCol.Length) > Math.Pow(2, 31))
throw (new Exception("\nMaximum string length in Levenshtein.iLD is " + Math.Pow(2, 31) + ".\nYours is " + Math.Max(sRow.Length, sCol.Length) + "."));
// Step 1
if (RowLen == 0)
{
return ColLen;
}
if (ColLen == 0)
{
return RowLen;
}
/// Create the two vectors
int[] v0 = new int[RowLen + 1];
int[] v1 = new int[RowLen + 1];
int[] vTmp;
/// Step 2
/// Initialize the first vector
for (RowIdx = 1; RowIdx <= RowLen; RowIdx++)
{
v0[RowIdx] = RowIdx;
}
// Step 3
/// For each column
for (ColIdx = 1; ColIdx <= ColLen; ColIdx++)
{
/// Set the 0'th element to the column number
v1[0] = ColIdx;
Col_j = sCol[ColIdx - 1];
// Step 4
/// For each row
for (RowIdx = 1; RowIdx <= RowLen; RowIdx++)
{
Row_i = sRow[RowIdx - 1];
// Step 5
if (Row_i == Col_j)
{
cost = 0;
}
else
{
cost = 1;
}
// Step 6
/// Find minimum
int m_min = v0[RowIdx] + 1;
int b = v1[RowIdx - 1] + 1;
int c = v0[RowIdx - 1] + cost;
if (b < m_min)
{
m_min = b;
}
if (c < m_min)
{
m_min = c;
}
v1[RowIdx] = m_min;
}
/// Swap the vectors
vTmp = v0;
v0 = v1;
v1 = vTmp;
}
// Step 7
/// Value between 0 - 100
/// 0==perfect match 100==totally different
///
/// The vectors were swapped one last time at the end of the last loop,
/// that is why the result is now in v0 rather than in v1
//System.Console.WriteLine("iDist=" + v0[RowLen]);
int max = System.Math.Max(RowLen, ColLen);
return ((100 * v0[RowLen]) / max);
}
///*****************************
/// Compute the min
///*****************************
private int Minimum(int a, int b, int c)
{
int mi = a;
if (b < mi)
{
mi = b;
}
if (c < mi)
{
mi = c;
}
return mi;
}
///*****************************
/// Compute Levenshtein distance
///*****************************
public int LD(String sNew, String sOld)
{
int[,] matrix; // matrix
int sNewLen = sNew.Length; // length of sNew
int sOldLen = sOld.Length; // length of sOld
int sNewIdx; // iterates through sNew
int sOldIdx; // iterates through sOld
char sNew_i; // ith character of sNew
char sOld_j; // jth character of sOld
int cost; // cost
/// Test string length
if (Math.Max(sNew.Length, sOld.Length) > Math.Pow(2, 31))
throw (new Exception("\nMaximum string length in Levenshtein.LD is " + Math.Pow(2, 31) + ".\nYours is " + Math.Max(sNew.Length, sOld.Length) + "."));
// Step 1
if (sNewLen == 0)
{
return sOldLen;
}
if (sOldLen == 0)
{
return sNewLen;
}
matrix = new int[sNewLen + 1, sOldLen + 1];
// Step 2
for (sNewIdx = 0; sNewIdx <= sNewLen; sNewIdx++)
{
matrix[sNewIdx, 0] = sNewIdx;
}
for (sOldIdx = 0; sOldIdx <= sOldLen; sOldIdx++)
{
matrix[0, sOldIdx] = sOldIdx;
}
// Step 3
for (sNewIdx = 1; sNewIdx <= sNewLen; sNewIdx++)
{
sNew_i = sNew[sNewIdx - 1];
// Step 4
for (sOldIdx = 1; sOldIdx <= sOldLen; sOldIdx++)
{
sOld_j = sOld[sOldIdx - 1];
// Step 5
if (sNew_i == sOld_j)
{
cost = 0;
}
else
{
cost = 1;
}
// Step 6
matrix[sNewIdx, sOldIdx] = Minimum(matrix[sNewIdx - 1, sOldIdx] + 1, matrix[sNewIdx, sOldIdx - 1] + 1, matrix[sNewIdx - 1, sOldIdx - 1] + cost);
}
}
// Step 7
/// Value between 0 - 100
/// 0==perfect match 100==totally different
//System.Console.WriteLine("Dist=" + matrix[sNewLen, sOldLen]);
int max = System.Math.Max(sNewLen, sOldLen);
return (100 * matrix[sNewLen, sOldLen]) / max;
}
}
Not a complete solution, just a hopefully helpful suggestion...
It seems to me that people are unlikely to use simplifications that are as long as the correct word, so you could at least filter out all results whose length <= the input's length.
You really need to implement the SOUNDEX routine that exists in SQL. I've done that in the following code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace Soundex
{
class Program
{
static char[] ignoreChars = new char[] { 'a', 'e', 'h', 'i', 'o', 'u', 'w', 'y' };
static Dictionary<char, int> charVals = new Dictionary<char, int>()
{
{'b',1},
{'f',1},
{'p',1},
{'v',1},
{'c',2},
{'g',2},
{'j',2},
{'k',2},
{'q',2},
{'s',2},
{'x',2},
{'z',2},
{'d',3},
{'t',3},
{'l',4},
{'m',5},
{'n',5},
{'r',6}
};
static void Main(string[] args)
{
Console.WriteLine(Soundex("txt"));
Console.WriteLine(Soundex("text"));
Console.WriteLine(Soundex("ext"));
Console.WriteLine(Soundex("tit"));
Console.WriteLine(Soundex("Cammmppppbbbeeelll"));
}
static string Soundex(string s)
{
s = s.ToLower();
StringBuilder sb = new StringBuilder();
sb.Append(s.First());
foreach (var c in s.Substring(1))
{
if (ignoreChars.Contains(c)) { continue; }
// if the previous character yields the same integer then skip it
if ((int)char.GetNumericValue(sb[sb.Length - 1]) == charVals[c]) { continue; }
sb.Append(charVals[c]);
}
return string.Join("", sb.ToString().Take(4)).PadRight(4, '0');
}
}
}
See, with this code, the only match out of the examples you gave would be text. Run the console application and you'll see the output (i.e. txt would match text).
One method I think programs like Word use to correct spellings is to apply NLP (Natural Language Processing) techniques to get the order of nouns/adjectives used in the context of the spelling mistake. By comparing that to known sentence structures, they can estimate, say, a 70% chance that the misspelled word was a noun and use that information to filter the corrected spellings.
SharpNLP looks like a good library, but I haven't had a chance to fiddle with it yet. To build a library of known sentence structures, by the way, in uni we applied our algorithms to public domain books.
Check out Sam's SimMetrics library, which I found on SO, for many more similarity algorithms to use besides Levenshtein distance.
Expanding on my comment, you could use a regex to search for a result that is an 'expansion' of the input. Something like this:
private int stringSimilarity(string input, string result)
{
    // Build a pattern such as "t.*x.*t.*" that allows any characters between the input's letters
    string regexPattern = "";
    foreach (char c in input)
        regexPattern += Regex.Escape(c.ToString()) + ".*";
    Match match = Regex.Match(result, regexPattern,
        RegexOptions.IgnoreCase);
    if (match.Success)
        return 1;
    else
        return 0;
}
Ignore the 1 and the 0 - I don't know how similarity valuing works.
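Two of the suggestions in this thread, dropping candidates that are no longer than the input and keeping only candidates that are an 'expansion' of it, can be combined without a regex. The following is a rough sketch under those assumptions; AbbreviationSketch and its methods are hypothetical names, and the suggestion list stands in for whatever NHunspell's Suggest returns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AbbreviationSketch
{
    // True when every character of 'abbr' appears in 'word' in the same order (case-insensitive).
    public static bool IsSubsequence(string abbr, string word)
    {
        int i = 0;
        foreach (char c in word)
        {
            if (i < abbr.Length && char.ToUpperInvariant(c) == char.ToUpperInvariant(abbr[i])) i++;
        }
        return i == abbr.Length;
    }

    // Keeps suggestions that are longer than the input and expand it, shortest first.
    public static List<string> Expansions(string abbr, IEnumerable<string> suggestions) =>
        suggestions.Where(w => w.Length > abbr.Length && IsSubsequence(abbr, w))
                   .OrderBy(w => w.Length)
                   .ToList();
}
```

For the input "txt" and the suggestions text, tat, tot, ext and tit, only text survives. When several candidates survive, a distance measure such as the Levenshtein class above could still be used to rank them.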