c# longest common words sample - c#

I am looking for a longest common words C# implementation. Most of the samples I have come across compare character by character.
In other words,
string1 = access
string2 = advised
should return null output from the function.
Any sample code?

I think this problem is usually referred to as the Longest common substring problem. The Wikipedia article contains pseudocode, and C# implementations can be found on the Web.

If by word you mean these letter things, separated from the others by punctuation, try this:
private String longestCommonWord(String s1, String s2)
{
String[] seperators = new String[] { " ", ",", ".", "!", "?", ";" };
var result = from w1 in s1.Split(seperators, StringSplitOptions.RemoveEmptyEntries)
where (from w2 in s2.Split(seperators, StringSplitOptions.RemoveEmptyEntries)
where w2 == w1
select w2).Count() > 0
orderby w1.Length descending
select w1;
if (result.Count() > 0)
{
return result.First();
}
else
{
return null;
}
}
This probably is not the most elegant way to do it, but it works for me. =)

Turning the algorithm which computes LCS of arrays of characters into one that does it to arrays of anything else -- like, say, an array of words -- is usually pretty straightforward. Have you tried that?
If you need some hints, here's an article I wrote a couple years ago on how to implement Longest Common Subsequence on an array of words in JScript. You should be able to adapt it to C# without too much difficulty.
http://blogs.msdn.com/ericlippert/archive/2004/07/21/189974.aspx

Finding the differences between two sequences is commonly framed as the Longest Common Subsequence (LCS) problem. The following is a generic solution to the LCS problem, written in C#:
static int[,] GetLCSDifferenceMatrix<T>(
Collection<T> baseline,
Collection<T> revision)
{
int[,] matrix = new int[baseline.Count + 1, revision.Count + 1];
for (int baselineIndex = 0; baselineIndex < baseline.Count; baselineIndex++)
{
for (int revisionIndex = 0; revisionIndex < revision.Count; revisionIndex++)
{
if (baseline[baselineIndex].Equals(revision[revisionIndex]))
{
matrix[baselineIndex + 1, revisionIndex + 1] =
matrix[baselineIndex, revisionIndex] + 1;
}
else
{
int possibilityOne = matrix[baselineIndex + 1, revisionIndex];
int possibilityTwo = matrix[baselineIndex, revisionIndex + 1];
matrix[baselineIndex + 1, revisionIndex + 1] =
Math.Max(possibilityOne, possibilityTwo);
}
}
}
return matrix;
}
This code gives you a "difference" matrix, which can then be used to construct the difference from the two inputs. For unit tests and example usage, see http://sethflowers.com/2012/01/18/basic-diff-with-a-generic-solution-to-the-longest-common-subsequence-problem.html.
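The matrix can be walked backwards from its bottom-right corner to recover the common items themselves. What follows is a minimal sketch rather than the linked article's code: it assumes the same matrix layout as above (row 0 and column 0 stay zero), and the names WordLcs, BuildMatrix and LongestCommonSubsequence are made up for illustration. It takes IList<T> instead of Collection<T> so that the string[] returned by Split can be passed in directly.

```csharp
using System;
using System.Collections.Generic;

public static class WordLcs
{
    // Same table as GetLCSDifferenceMatrix above, but over IList<T>.
    static int[,] BuildMatrix<T>(IList<T> a, IList<T> b)
    {
        var m = new int[a.Count + 1, b.Count + 1];
        for (int i = 0; i < a.Count; i++)
            for (int j = 0; j < b.Count; j++)
                m[i + 1, j + 1] = EqualityComparer<T>.Default.Equals(a[i], b[j])
                    ? m[i, j] + 1
                    : Math.Max(m[i + 1, j], m[i, j + 1]);
        return m;
    }

    // Walks the table from the bottom-right corner to collect the common items.
    public static List<T> LongestCommonSubsequence<T>(IList<T> a, IList<T> b)
    {
        int[,] m = BuildMatrix(a, b);
        var result = new List<T>();
        int i = a.Count, j = b.Count;
        while (i > 0 && j > 0)
        {
            if (EqualityComparer<T>.Default.Equals(a[i - 1], b[j - 1]))
            {
                result.Add(a[i - 1]); // item is part of the LCS
                i--;
                j--;
            }
            else if (m[i - 1, j] >= m[i, j - 1]) i--; // follow the larger neighbour
            else j--;
        }
        result.Reverse(); // items were collected back to front
        return result;
    }
}
```

With the inputs split into words, for example "the quick brown fox".Split(' ') and "the lazy brown dog".Split(' '), the result contains "the" and "brown". Two single words such as "access" and "advised" share no whole word, so the list comes back empty.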

Related

is there a simple shortcut to get the next to last item from an array?

Is there a simple way to get the next to last item from an array? For example, I might have the following string arrays:
var myArr1 = new string[]{"S1", "S2", "S3"};
var myArr2 = new string[]{"S1", "S2", "S3", "S4"};
I need to write a generic routine which would return "S2.S3" for myArr1 and "S3.S4" for myArr2. Getting the last item is easy enough with Last() but there doesn't appear to be a Last(-1) option. Is there anything similar to this? If not then what would be the easiest and most elegant way to do this?
You can get the second last element of an array using this code...
myArr1[myArr1.Length - 2]
myArr2[myArr2.Length - 2]
Output
S2
S3
Online Demo: http://rextester.com/AERTN64718
 
Updated...
myArr1[myArr1.Length - 2] + "." + myArr1[myArr1.Length - 1]
myArr2[myArr2.Length - 2] + "." + myArr2[myArr2.Length - 1]
or
myArr1[myArr1.Length - 2] + "." + myArr1.Last()
myArr2[myArr2.Length - 2] + "." + myArr2.Last()
Output
S2.S3
S3.S4
Online Demo: http://rextester.com/DJU86580
This is based on @Ehsan's answer (which was in VB, but I translated it to C#)
string LastTwoStrings(string[] array)
{
return (array[array.Length-2] + "." + array[array.Length-1]);
}
However, this WILL throw an exception if the array has fewer than 2 elements.
Using System.Linq you could do:
String.Join(".", arr.Skip(arr.Length - 2));
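One property of the LINQ version is worth spelling out: Enumerable.Skip treats a zero or negative count as "skip nothing", so the expression above does not throw for arrays with fewer than two elements. A small sketch, with TailJoin and LastTwo as illustrative names only:

```csharp
using System;
using System.Linq;

public static class TailJoin
{
    // Joins the last two items with "."; shorter arrays are returned whole.
    public static string LastTwo(string[] arr)
    {
        // Skip ignores counts <= 0, so Length - 2 may safely go negative.
        return string.Join(".", arr.Skip(arr.Length - 2));
    }
}
```

For a one-element array this returns that single element, and for an empty array it returns an empty string. Whether that is the desired behaviour or an exception would be better depends on the caller.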
Depending on your C# version (8.0+), you may also have access to the unary ^ operator ("index from end"), which counts an index backwards from the end of the collection.
Just as you index from the front with [0] to get the first entry, you can get the last entry by counting from the back with [^1].
Read [^1] as "Length - 1" of the current collection.
private readonly string[] _entries = new[]
{
"Hello",
"How's life?",
"Likewise",
"Bye"
};
public string GetLastEntry()
{
return _entries[^1]; // Returns "Bye" and is the same as "_entries[_entries.Length-1]"
}
So for your particular example, you could just use string.Join() for the string-based use case, but let's try the reverse index. With those, we could output the last n entries like here:
private readonly string[] _entries = { "S1", "S2", "S3", "S4" };
public string GetTailingEntries(int depth)
{
if (depth <= 0 || depth > _entries.Length)
{
// Catch this however you require; [^0] is one past the end, so 0 is rejected too
throw new ArgumentOutOfRangeException(nameof(depth));
}
string output = _entries[^depth--];
while (depth > 0)
{
output += $".{_entries[^depth--]}";
}
return output;
}
Essentially we take any depth, let's take '2' like in your question. It would initialize the output with the furthest entry from the end ([^2]) which is "S3" and decrement the depth, putting the next entry at [^1]. In the loop we append the values to the output until we reached our desired depth.
GetTailingEntries(2) // "S3" -> "S3.S4"
GetTailingEntries(4) // "S1" -> "S1.S2" -> "S1.S2.S3" -> "S1.S2.S3.S4"
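On the same C# 8.0+ compilers, the range operator can take the whole tail in one step instead of looping. This is only a sketch under the assumption that the runtime provides System.Index and System.Range (for example .NET Core 3.0 or later); TailRange and LastN are hypothetical names.

```csharp
using System;

public static class TailRange
{
    // Joins the last n entries with "."; arrays shorter than n are returned whole.
    public static string LastN(string[] entries, int n)
    {
        // Clamp n into [0, Length] so the range below can never be out of bounds.
        int take = Math.Max(0, Math.Min(n, entries.Length));
        // entries[^take..] copies the last 'take' elements into a new array.
        return string.Join(".", entries[^take..]);
    }
}
```

The clamping is a design choice made for this sketch; dropping it would make entries[^n..] throw for an n larger than the array, which may be preferable when a short array indicates a bug.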
With LINQ
This need is so basic and intuitive that I would consider writing my own Last(int) extension method-- exactly the one you'd imagine-- and it might look like this:
public static class ExtensionMethods
{
public static IEnumerable<T> Last<T>(this IEnumerable<T> This, int count)
{
return This.Reverse().Take(count).Reverse();
}
}
You can then get what you need with:
Console.WriteLine
(
string.Join(".", myArr1.Last(2))
);
Which outputs:
S2.S3
Using array (more efficient)
On the other hand, if you're looking for efficiency, you should just work with the array and the known indices, which will be more efficient than using IEnumerable (which will have to scan).
public static IEnumerable<T> Last<T>(this T[] This, int count)
{
if (count > This.Length) count = This.Length;
// Walk forward from the first wanted index so the original order is kept
for (var i = This.Length - count; i < This.Length; i++) yield return This[i];
}
...does exactly the same thing, but only for arrays.
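On newer frameworks a hand-written extension may not be needed at all, because LINQ ships Enumerable.TakeLast (in .NET Core 2.0+ and .NET Standard 2.1, but not in the classic .NET Framework). A short sketch, where TakeLastDemo and LastTwo are illustrative names:

```csharp
using System;
using System.Linq;

public static class TakeLastDemo
{
    // TakeLast keeps the original order and returns everything
    // when the source has fewer than the requested number of items.
    public static string LastTwo(string[] arr)
    {
        return string.Join(".", arr.TakeLast(2));
    }
}
```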

C# Reverse() function not working properly

I'm really confused why the reverse function isn't working properly..
I currently have
List<string> decimalVector = new List<string>();
string tempString = "10";
//For Vector Representation
for (int i = 0; i < tempString.Length; i++)
{
//As long as we aren't at the last digit...
if (i != (tempString.Length-1))
{
decimalVector.Add(tempString[i].ToString() + ",");
}
else
{
decimalVector.Add(tempString[i].ToString());
}
}
Console.Write("Decimal: " + decimalOutput);
Console.Write(" Vector Representation: [");
decimalVector.Reverse();
for (int i = 0; i < decimalVector.Count; i++)
{
Console.Write(decimalVector[i]);
}
Console.Write("]");
For some reason instead of the code outputting [0,1] as it should - since that is the reverse of what is currently in the decimalVector ([1,0]) ..It prints out [01,] I am so confused. Why is it randomly moving my comma out of place? Am I doing something really stupid and not seeing it?
You're reversing the order of the elements, not the order of the characters. The list holds "1," followed by "0". When reversed, it is "0" followed by "1,". When you print that, you get 01,.
You should not include the separating , as part of the list elements, but rather only add it when printing.
Btw there is the string.Join method, which solves your problem elegantly:
string.Join(",", tempString.Select(c => c.ToString()).Reverse())
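Wrapped into a small helper, the idea looks roughly like this. It is a sketch only; VectorFormat and ReversedDigits are names invented here, and the square brackets mirror the output format from the question.

```csharp
using System;
using System.Linq;

public static class VectorFormat
{
    // Reverses the digits first, then lets Join place the commas between them.
    public static string ReversedDigits(string digits)
    {
        var reversed = digits.Select(c => c.ToString()).Reverse();
        return "[" + string.Join(",", reversed) + "]";
    }
}
```

Because the separators are added only when joining, they can never end up in the wrong position: ReversedDigits("10") gives [0,1].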
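Try this (List<T>.Reverse() reverses in place and returns void, so the LINQ version from System.Linq has to be called explicitly):
foreach (string s in Enumerable.Reverse(decimalVector))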
{
Console.Write(s);
}

Replace strings in C#

This might be a very basic question. I need to write code that works like a string replace algorithm.
static string stringReplace(string s, string stringOld, string stringNew)
{
string newWord = "";
int oldMax = stringOld.Length;
int index = 0;
for (int i = 0; i < s.Length; i++)
{
if (index != oldMax && s[i] == stringOld[index])
{
if (stringOld[index] < stringNew[index])
{
newWord = newWord + stringNew[index];
index++;
}
else
{
newWord = newWord + stringNew[index];
}
}
else
{
newWord = newWord + s[i];
}
}
return newWord;
}
Since it's 3am the code above is probably bugged. When the new word is shorter than the old one, it goes wrong. Same as when it's longer. When the index variable is equal for both stringOld and stringNew, it will do the swap. I think... Please don't post "use string.Replace()"; I have to write that algorithm myself...
I don't know what you're trying to do with your code, but the problem is not a small one.
Think logically about what you are trying to do.
It is a two-step process:
1. Find the starting index of stringOld in s.
2. If found, replace stringOld with stringNew.
Step 1:
There are many rather complex (and elegant) efficient string search algorithms, you can search for them online or look at popular 'Introduction to Algorithms' by Cormen, Leiserson, Rivest & Stein, but the naive approach involves two loops and is pretty simple. It is also described in that book (and online.)
Step 2:
If a match is found at index i, simply copy characters 0 to i-1 of s to newWord, followed by stringNew and then the rest of the characters in s starting at index i + stringOld.Length.
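The two steps can be sketched with the naive search as follows. This is one possible reading of the description, not the asker's intended solution: NaiveReplace and MatchesAt are hypothetical names, and unlike the single-match wording above, the loop keeps going and replaces every occurrence.

```csharp
using System;
using System.Text;

public static class NaiveReplace
{
    public static string Replace(string s, string stringOld, string stringNew)
    {
        if (string.IsNullOrEmpty(stringOld)) return s; // nothing to search for
        var newWord = new StringBuilder();
        int i = 0;
        while (i < s.Length)
        {
            if (MatchesAt(s, i, stringOld))
            {
                // Step 2: emit the replacement and jump past the old text.
                newWord.Append(stringNew);
                i += stringOld.Length;
            }
            else
            {
                newWord.Append(s[i]); // no match here, copy the character
                i++;
            }
        }
        return newWord.ToString();
    }

    // Step 1 (inner loop of the naive search): does pattern occur in s at start?
    static bool MatchesAt(string s, int start, string pattern)
    {
        if (start + pattern.Length > s.Length) return false;
        for (int k = 0; k < pattern.Length; k++)
            if (s[start + k] != pattern[k]) return false;
        return true;
    }
}
```

Since the old and new strings are handled as whole units, it does not matter whether the replacement is shorter or longer than the text it replaces.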

How to parse the following string

I am trying to generate a formula which could be anything like this, this is just a sample,
A + B + C > D - A
Now, A, B, C, D, etc are Column Names of a sheet (like excel sheet) i will be accessing in memory.
I need to generate a Rule, like the above A + B + C > D - A which will decide what kind of values user can add in a Cell.
Currently this is how i have begun:
string toValidate = "A + B + C > D + E - A";
string lhs = "", rhs = "";
string[] comparisonOperators = new string[] { "=", ">", "<", "<>", "!=" };
char[] arithmeticOperators = { '+', '-', '/', '*' };
toValidate = toValidate.Replace(@" ", "");
for (int i = 0; i < comparisonOperators.Length; i++)
{
if (toValidate.Contains(comparisonOperators[i]))
{
operatorIndex = toValidate.IndexOf(comparisonOperators[i]);
break;
}
}
lhs = toValidate.Substring(0, operatorIndex);
rhs = toValidate.Substring(operatorIndex + 1);
string[] columnLhsList = lhs.Split(arithmeticOperators);
string[] columnRhsList = rhs.Split(arithmeticOperators);
However, even though I have the lhs and rhs strings and the comparison operator (> in the above code), I don't understand how to apply the formula to the sheet itself. I just need to know which operator is associated with which column.
I have the individual column names but not the operator before each of them, e.g.
+ before A in one case, - before A in another.
How do I parse the above? Please help.
It is, however, a very fun question if you want to make simple formula parsers like this yourself.
I advise you to check out this article, which is clearly written and easy to follow:
Shunting-yard Algorithm
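To give an idea of what the algorithm does with a rule like the one in the question, here is a minimal sketch that tokenizes the formula and converts it to postfix order, where every operator directly follows its operands. It is not taken from the linked article; FormulaParser, Tokenize, ToPostfix and the precedence table are assumptions made for illustration, and unary minus and function calls are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FormulaParser
{
    // Higher number binds tighter; comparisons bind loosest.
    static readonly Dictionary<string, int> Precedence = new Dictionary<string, int>
    {
        { "=", 1 }, { ">", 1 }, { "<", 1 }, { "<>", 1 }, { "!=", 1 },
        { "+", 2 }, { "-", 2 },
        { "*", 3 }, { "/", 3 }
    };

    // Splits the text into column names, numbers, operators and parentheses.
    public static List<string> Tokenize(string formula)
    {
        var tokens = new List<string>();
        // Two-character operators come first so "<>" is not read as "<" then ">".
        const string pattern = @"[A-Za-z_]\w*|\d+(?:\.\d+)?|<>|!=|[-+*/=<>()]";
        foreach (Match m in Regex.Matches(formula, pattern))
            tokens.Add(m.Value);
        return tokens;
    }

    // Shunting-yard: converts infix tokens to postfix (RPN) order.
    public static List<string> ToPostfix(IEnumerable<string> tokens)
    {
        var output = new List<string>();
        var operators = new Stack<string>();
        foreach (string token in tokens)
        {
            if (Precedence.ContainsKey(token))
            {
                // All operators here are left-associative: pop while the
                // stack top binds at least as tightly as the new operator.
                while (operators.Count > 0 && operators.Peek() != "("
                       && Precedence[operators.Peek()] >= Precedence[token])
                    output.Add(operators.Pop());
                operators.Push(token);
            }
            else if (token == "(") operators.Push(token);
            else if (token == ")")
            {
                while (operators.Count > 0 && operators.Peek() != "(")
                    output.Add(operators.Pop());
                if (operators.Count == 0) throw new FormatException("Unbalanced ')'");
                operators.Pop(); // discard the "("
            }
            else output.Add(token); // a column name or number
        }
        while (operators.Count > 0)
        {
            string op = operators.Pop();
            if (op == "(") throw new FormatException("Unbalanced '('");
            output.Add(op);
        }
        return output;
    }
}
```

For "A + B + C > D - A" the postfix form is A B + C + D A - >, which can be evaluated with a simple stack: push column values, and on each operator pop two values and push the result.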
Personally, I would never try/dare to create my own formula expression parser. Instead, I would (and did) use one of the many available ones, e.g. NCalc over at CodePlex.com.
Using these tools, it is as easy as writing
Expression e = new Expression("2 + 3 * 5");
Debug.Assert(17 == e.Evaluate());
to get your formula evaluated.
Usually such libraries are very solid, well tested and have a rich function set. It would take ages (if ever) to do such a high quality library on my own.
To further cite the NCalc website, you can even use variables like e.g.:
Expression e = new Expression("Round(Pow([Pi], 2) + Pow([Pi2], 2) + [X], 2)");
e.Parameters["Pi2"] = new Expression("Pi * [Pi]");
e.Parameters["X"] = 10;
e.EvaluateParameter +=
delegate(string name, ParameterArgs args)
{
if (name == "Pi")
args.Result = 3.14;
};
Debug.Assert(117.07 == e.Evaluate());

VB6 to C# InsTR function conversion issue

Is the VB6 code
i = InStr(1, strText, "Mc", CompareMethod.Binary)
If (i <> 0) And (i + 2 <= lngLength) Then Mid(strText, i + 2, 1) = UCase(Mid(strText, i + 2, 1))
doing the same as
i = strText.IndexOf("Mc");
if ((i != 1) && (i + 2 <= lngLength))
{
strText = strText.Substring(i + 2, 1);
strText = strText.ToUpper();
}
in C#? i is an int that has already been initialized. I did adjust the comparison of the returned value from 0 in VB6 to 1 in C#.
It’s not doing the same. The assignment
Mid(strText, i + 2, 1) = UCase(Mid(strText, i + 2, 1))
replaces only that part (i.e. a single character at i+2) inside the string, and leaves the rest untouched. Your C# code throws the rest of the string away.
Since .NET strings are immutable, this approach cannot be directly translated.
The closest translation is to construct the string explicitly, i.e. (with i being the 0-based result of IndexOf) to do
strText = strText.Substring(0, i + 2) +
strText.Substring(i + 2, 1).ToUpper() +
strText.Substring(i + 3);
However, doing this a lot inside a loop is very inefficient, which is why .NET offers the StringBuilder class for repeated constructions of strings. In general, VB6 code which manipulates strings in-place is best translated by using said StringBuilder.
That said, there is probably a simpler translation that goes after the intent of the first code rather than the letter. In both VB6 and C# you wouldn't use InStr followed by substitution; you'd directly use String.Replace.
Also beware of the changed indices: .NET string methods such as IndexOf are 0-based and return -1 when nothing is found, whereas VB6's InStr and Mid are 1-based and InStr returns 0 when nothing is found.
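One way to go after the intent with a single replace call is a regular expression. The sketch below is an assumption about what that intent is (capitalizing the letter after "Mc"), not a literal translation: it changes every occurrence rather than only the first one, and McFixer and CapitalizeAfterMc are made-up names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class McFixer
{
    // Upper-cases the lowercase letter that follows each "Mc" in the text.
    public static string CapitalizeAfterMc(string text)
    {
        // \p{Ll} matches one lowercase letter; it is captured as group 1.
        return Regex.Replace(text, @"Mc(\p{Ll})",
            m => "Mc" + m.Groups[1].Value.ToUpperInvariant());
    }
}
```

Because the pattern requires a character after "Mc", a string that ends in "Mc" is returned unchanged and no length check is needed.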
StringBuilder strText = new StringBuilder("Mcdonald");
int lngLength = strText.Length;
int i = strText.ToString().IndexOf("Mc");
if ((i != -1) && (i + 2 < lngLength)) // IndexOf returns -1 when not found; i + 2 must be a valid index
{
strText[i + 2] = char.ToUpper(strText[i + 2]);
}
Console.WriteLine(strText.ToString()); // prints McDonald
EDIT: Basically, you don't have to get the string before/after the position at which the character is found and do a concatenation of it. StringBuilder helps to modify things in place (string as array of characters) and the ToString method joins the array of characters to a string.
