Parsing string and generating all possible combinations in multiple strings

In my SQL Server database I have strings stored representing the correct solution to a question. In this string a certain format can be used to represent multiple correct solutions. The format:
possible-text [possibility-1/possibility-2] possible-text
This states that either possibility-1 or possibility-2 is correct. There is no limit on the number of possibilities (e.g. [ pos-1 / pos-2 / pos-3 / ... ] is possible).
A possibility can also be empty, e.g.:
I am [un/]certain.
This means the answer could be "I am certain" or "I am uncertain".
The format can also be nested in a sentence, e.g.:
I am [[un/]certain/[un/]sure].
The format can also occur multiple times in one sentence, e.g.:
[I am/I'm] [[un/]certain/[/un]sure].
What I want is to generate all the possible combinations. E.g. the above expression should return:
I am uncertain.
I am certain.
I am sure.
I am unsure.
I'm uncertain.
I'm certain.
I'm sure.
I'm unsure.
There is no limit on the nesting depth or on the number of possibilities. If there is only one possible solution, the string will not use the above format. I'm not sure how to do this.
I have to write this in C#. I think a possible solution could be to write a regular expression that captures the [ / ] format and returns the possible solutions in a list (one for every []-pair), and then to generate the combinations by going over them in a stack-like way (some sort of recursion with backtracking), but I don't have a working solution yet.
I'm at a loss as to how exactly to start on this. If somebody could give me some pointers on how to tackle this problem I'd appreciate it. When I find something I'll add it here.
Note: I noticed there are a lot of similar questions, however the solutions all seem to be specific to the particular problem and I think not applicable to my problem. If however I'm wrong, and you remember a previously answered question that can solve this, could you then tell me? Thanks in advance.
Update: Just to clarify if it was unclear. Every line in code is possible input. So this whole line is input:
[I am/I'm] [[un/]certain/[/un]sure].

This should work. I didn't bother optimizing it or doing error checking (in case the input string is malformed).
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    // Splits the bracket group at the start of input into its top-level alternatives.
    // On return, i is the index just past the group's closing ']'.
    static IEnumerable<string> Parts(string input, out int i)
    {
        var list = new List<string>();
        int level = 1, start = 1;
        i = 1;
        for (; i < input.Length && level > 0; i++)
        {
            if (input[i] == '[')
                level++;
            else if (input[i] == ']')
                level--;
            // A '/' at depth 1 or the group's closing ']' ends the current alternative.
            if (input[i] == '/' && level == 1 || input[i] == ']' && level == 0)
            {
                if (start == i)
                    list.Add(string.Empty);
                else
                    list.Add(input.Substring(start, i - start));
                start = i + 1;
            }
        }
        return list;
    }

    // Consumes input from left to right, accumulating the output in current.
    static IEnumerable<string> Combinations(string input, string current = "")
    {
        if (input == string.Empty)
        {
            // Nested groups were copied into current unexpanded; expand them in another pass.
            if (current.Contains('['))
                return Combinations(current, string.Empty);
            return new List<string> { current };
        }
        else if (input[0] == '[')
        {
            int end;
            var parts = Parts(input, out end);
            return parts.SelectMany(x => Combinations(input.Substring(end, input.Length - end), current + x)).ToList();
        }
        else
            return Combinations(input.Substring(1, input.Length - 1), current + input[0]);
    }

    static void Main(string[] args)
    {
        string s = "[I am/I'm] [[un/]certain/[/un]sure].";
        var list = Combinations(s);
        foreach (var item in list)
            Console.WriteLine(item);
    }
}

You should create a parser that reads the input character by character and builds up a logical tree of the sentence. Once you have the tree, it is easy to generate all possible combinations. There are several parser generators available that you could use, for example ANTLR: http://programming-pages.com/2012/06/28/antlr-with-c-a-simple-grammar/
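For a grammar this small, a hand-written recursive-descent parser may be enough. The following is only a sketch of the tree idea, not the answerer's code: the class name AnswerExpander and the node types are made up for illustration, and it assumes that '[', '/' and ']' never appear as literal text outside their role in the format.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper. Assumed grammar:
//   sequence := (literal | group)*
//   group    := '[' sequence ('/' sequence)* ']'
public static class AnswerExpander
{
    // Every node can list all the strings it may produce.
    private abstract class Node { public abstract IEnumerable<string> Expand(); }

    private sealed class Literal : Node
    {
        private readonly string _text;
        public Literal(string text) { _text = text; }
        public override IEnumerable<string> Expand() { yield return _text; }
    }

    // Items are concatenated: cross product of the items' expansions.
    private sealed class Sequence : Node
    {
        public readonly List<Node> Items = new List<Node>();
        public override IEnumerable<string> Expand()
        {
            IEnumerable<string> results = new[] { "" };
            foreach (var item in Items)
            {
                var expansions = item.Expand().ToList();
                results = results.SelectMany(prefix => expansions.Select(s => prefix + s)).ToList();
            }
            return results;
        }
    }

    // Exactly one alternative is chosen: union of the alternatives' expansions.
    private sealed class Choice : Node
    {
        public readonly List<Node> Alternatives = new List<Node>();
        public override IEnumerable<string> Expand()
        {
            return Alternatives.SelectMany(a => a.Expand());
        }
    }

    public static List<string> Expand(string input)
    {
        int pos = 0;
        Node root = ParseSequence(input, ref pos);
        if (pos != input.Length)
            throw new FormatException("Unexpected '" + input[pos] + "' at index " + pos);
        return root.Expand().ToList();
    }

    // Reads literals and groups until end of input, '/' or ']'.
    private static Node ParseSequence(string s, ref int pos)
    {
        var seq = new Sequence();
        while (pos < s.Length && s[pos] != '/' && s[pos] != ']')
        {
            if (s[pos] == '[')
            {
                seq.Items.Add(ParseChoice(s, ref pos));
            }
            else
            {
                int start = pos;
                while (pos < s.Length && s[pos] != '[' && s[pos] != '/' && s[pos] != ']') pos++;
                seq.Items.Add(new Literal(s.Substring(start, pos - start)));
            }
        }
        return seq;
    }

    // Reads '[' sequence ('/' sequence)* ']'.
    private static Node ParseChoice(string s, ref int pos)
    {
        var choice = new Choice();
        pos++; // skip '['
        choice.Alternatives.Add(ParseSequence(s, ref pos));
        while (pos < s.Length && s[pos] == '/')
        {
            pos++; // skip '/'
            choice.Alternatives.Add(ParseSequence(s, ref pos));
        }
        if (pos >= s.Length || s[pos] != ']')
            throw new FormatException("Missing ']'");
        pos++; // skip ']'
        return choice;
    }
}
```

Calling AnswerExpander.Expand("[I am/I'm] [[un/]certain/[/un]sure].") should give the eight sentences listed in the question. Unlike the string-rewriting approach, malformed input (an unclosed bracket, a stray '/') raises a FormatException instead of silently producing odd output.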

Related

How to generalize my algorithm to detect if one string is a rotation of another

So I've been going through various problems to review for upcoming interviews and one I encountered is determining whether two strings are rotations of each other. Obviously, I'm hardly the first person to solve this problem. In fact, I did discover that my idea for solving this seems similar to the approach taken in this question.
Full disclosure: I do have a related question on Math SE that focuses on the properties from a more mathematical perspective (although it's worth noting that the way I tried to formulate the ideas there ended up being incorrect, for reasons that are explained there).
Here's the idea (and this is similar to the approach taken in the linked question): suppose you have a string abcd and the rotation cdab. Clearly, both cd and ab are substrings of cdab, but if you concatenate them together you get abcd.
So basically, a rotation simply entails moving a substring from the end of the string to the beginning (e.g. we constructed cdab from abcd by moving cd from the end of the string to the beginning of the string).
I came up with an approach that works in a very restricted case (if both of the substrings consist of consecutive letters, like they do in the example there), but it fails otherwise (and I give an example of passing and failing cases and inputs/outputs below the code). I'm trying to figure out if it's possible (or even worthwhile) to try to fix it to work in the general case.
public bool AreRotations(string a, string b)
{
if (a == null)
throw new ArgumentNullException("a");
else if (b == null)
throw new ArgumentNullException("b");
else if (a.Trim().Length == 0)
throw new ArgumentException("a is empty or consists only of whitespace");
else if (b.Trim().Length == 0)
throw new ArgumentException("b is empty or consists only of whitespace");
// Obviously, if the strings are of different lengths, they can't possibly be rotations of each other
if (a.Length != b.Length)
return false;
int[] rotationLengths = new int[a.Length];
/* For rotations of length -2, -2, -2, 2, 2, 2, the distinct rotation lengths are -2, 2
*
* In the example I give below of a non-working input, this contains -16, -23, 16, 23
*
* On the face of it, that would seem like a useful pattern, but it seems like this
* could quickly get out of hand as I discover more edge cases
*/
List<int> distinctRotationLengths = new List<int>();
for (int i = 0; i < a.Length; i++)
{
rotationLengths[i] = a[i] - b[i];
if (i == 0)
distinctRotationLengths.Add(rotationLengths[0]);
else if (rotationLengths[i] != rotationLengths[i - 1])
{
distinctRotationLengths.Add(rotationLengths[i]);
}
}
return distinctRotationLengths.Count == 2;
}
And now for the sample inputs/outputs:
StringIsRotation rot = new StringIsRotation();
// This is the case that doesn't work right - it gives "false" instead of "true"
bool success = rot.AreRotations("acqz", "qzac");
// True
success = rot.AreRotations("abcdef", "cdefab");
// True
success = rot.AreRotations("ablm", "lmab");
// False, but should be true - this is another illustration of the bug
success = rot.AreRotations("baby", "byba");
// True
success = rot.AreRotations("abcdef", "defabc");
//True
success = rot.AreRotations("abcd", "cdab");
// True
success = rot.AreRotations("abc", "cab");
// False
success = rot.AreRotations("abcd", "acbd");
// This is an odd situation - right now it returns "false" but you could
// argue about whether that's correct
success = rot.AreRotations("abcd", "abcd");
Is it possible/worthwhile to salvage this approach and have it still be O(n), or should I just go with one of the approaches described in the post I linked to? (Note that this isn't actually production code or homework, it's purely for my own learning).
Edit: For further clarification based on the comments, there are actually two questions here - first, is this algorithm fixable? Secondly, is it even worth fixing it (or should I just try another approach like one described in the answers or the other question I linked to)? I thought of a few potential fixes but they all involved either inelegant special-case reasoning or making this algorithm O(n^2), both of which would kill the point of the algorithm in the first place.

Suppose the first string is S and the second is S'. If they have different lengths, they cannot be rotations of each other. Otherwise, create the string S'' = SS, i.e. S concatenated with itself. S and S' are rotations of each other exactly when S' occurs as a substring of S'', which the KMP algorithm can check in O(n). By the way, if you are looking for an algorithm that is fast in practice, use the Boyer-Moore algorithm instead of KMP.
To address the question more explicitly: I don't expect there to be an easy algorithm for this special case of the string matching problem, so I don't think an easy modification of your algorithm can work. The field of string matching algorithms is very well developed. Even if there is a somewhat simpler algorithm than something like KMP or suffix-tree-based algorithms for this special case, I think studying those general algorithms would still help.
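The S'' = SS idea with KMP can be sketched as follows. This is an illustrative version rather than code from the answer: the class name RotationCheck is made up, the doubled string is read cyclically (index modulo n) instead of being allocated, and treating two empty strings as rotations is an assumption.

```csharp
using System;

public static class RotationCheck
{
    // Returns true if b is a rotation of a, by running KMP for the pattern b
    // over the text a + a (read cyclically, so the doubled string is never built).
    public static bool IsRotation(string a, string b)
    {
        if (a == null || b == null || a.Length != b.Length) return false;
        int n = a.Length;
        if (n == 0) return true; // assumption: two empty strings count as rotations

        // failure[i] = length of the longest proper prefix of b[0..i] that is also a suffix of it.
        int[] failure = new int[n];
        for (int i = 1, k = 0; i < n; i++)
        {
            while (k > 0 && b[i] != b[k]) k = failure[k - 1];
            if (b[i] == b[k]) k++;
            failure[i] = k;
        }

        // Scan the 2n characters of a + a; the text position never moves backwards.
        for (int i = 0, matched = 0; i < 2 * n; i++)
        {
            char c = a[i % n];
            while (matched > 0 && c != b[matched]) matched = failure[matched - 1];
            if (c == b[matched]) matched++;
            if (matched == n) return true;
        }
        return false;
    }
}
```

Both loops do amortized constant work per character, so the check stays O(n) in time and needs only the O(n) failure table as extra memory. With this sketch, the failing cases from the question ("acqz"/"qzac" and "baby"/"byba") should both come out as rotations.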

Would something like this work?:
private bool IsRotation(string a, string b)
{
if (a.Length != b.Length) { return false; }
for (int i = 0; i < b.Length; ++i)
{
if (GetCharactersLooped(b, i).SequenceEqual(a))
{
return true;
}
}
return false;
}
private IEnumerable<char> GetCharactersLooped(string data, int startPos)
{
for (int i = startPos; i < data.Length; ++i)
{
yield return data[i];
}
for (int i = 0; i < startPos; ++i)
{
yield return data[i];
}
}
P.S. This will return true for abcd = abcd, since you could consider it a full rotation. If this is not desired, change the start of the loop from 0 to 1 in the first function.

If you're looking just for a method that will check if a string is a rotation of another string, this is a C# implementation of the function you linked (which as far as I know is about the fastest way to solve this particular problem):
bool IsRotation(string a, string b)
{
if (a == null || b == null || a.Length != b.Length)
return false;
return (a + a).Contains(b);
}
If you're asking for feedback on your algorithm, I'm not sure I understand what your algorithm is trying to do. It seems like you are trying to detect a rotation by storing the difference of the char values in the string and seeing if they sum to 0? Or if the list of unique differences contains mirror pairs (pairs (x,y) where x = -y)? Or simply if the number of unique differences is even? Or something else entirely that I am missing from your description?
I'm not sure if what you're doing can be generalized, simply because it depends so heavily on the characters within the words that it may not adequately check for the order in which they are presented. And even if you could, it would be a scholarly exercise only, as the above method will be far faster and more efficient than your method could ever be.

how can I improve a method that checks if current item already exist in the database

I have a method that checks if the current item exists in the database before it adds it to the database, if it does exist it deletes the item else it adds it.
Is there any better way of doing this?
Right now the titles have to be exactly the same. If the titles differ by even a character or a word, it won't delete it.
Basically what I mean is this:
If title is "Ronaldo lost his right leg" and there is a title in the database that is "Ronaldo lost his right leg yesterday" it should delete current item.
Another example:
If title is "hello world" and there is a title in the database that is "hello world everyone" it should delete current item.
So basically if the text has common words it should delete the item.
Here is the method I have so far:
public void AddNews(News news)
{
var exists = db.News.Any(x => x.Title == news.Title);
if (exists == false)
{
db.News.AddObject(news);
}
else
{
db.News.DeleteObject(news);
}
}
Any kind of help is appreciated.

First, I agree with @Jonesy that the strings could be split into words using
string[] list1 = myStr.Split(null);
The null forces splitting on whitespace. See: Best way to specify whitespace in a String.Split operation
and those words can be put into lists. The intersection of the lists right away tells you which words match exactly, and how many words match exactly. Any other words are words that don't match.
var result = list1.Intersect(list2, StringComparer.InvariantCultureIgnoreCase);
So for the words that don't match, you can get a score for each word comparison using the Levenshtein distance. I included code below but haven't tested to see if this is a correctly working implementation. Anyway, the reason to use this is that you can compare each word by how many operations it takes to make one word match another. So misspelled words that are very close can be counted as equal.
However, as has been pointed out, the whole process is going to be very error prone. What it sounds like you really want to do is compare the MEANING of the two strings, and while we are making advances in that direction, I am not aware of any C# over the counter AI for parsing meaning from sentences yet.
using System;
/// <summary>
/// Contains approximate string matching
/// </summary>
static class LevenshteinDistance
{
/// <summary>
/// Compute the distance between two strings.
/// </summary>
public static int Compute(string s, string t)
{
int n = s.Length;
int m = t.Length;
int[,] d = new int[n + 1, m + 1];
// Step 1
if (n == 0)
{
return m;
}
if (m == 0)
{
return n;
}
// Step 2
for (int i = 0; i <= n; d[i, 0] = i++)
{
}
for (int j = 0; j <= m; d[0, j] = j++)
{
}
// Step 3
for (int i = 1; i <= n; i++)
{
//Step 4
for (int j = 1; j <= m; j++)
{
// Step 5
int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;
// Step 6
d[i, j] = Math.Min(
Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
d[i - 1, j - 1] + cost);
}
}
// Step 7
return d[n, m];
}
}
Cited from here: http://www.dotnetperls.com/levenshtein
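Putting the two ideas together (word matching plus a per-word edit distance) might look like the sketch below. The class name TitleMatcher, the rule "every word of the shorter title must have a close counterpart in the longer one", and the default tolerance of one edit are all assumptions made for illustration; they would need tuning against real titles. The cast in Split((char[])null, ...) is there because a bare null can be ambiguous between overloads on newer .NET versions.

```csharp
using System;
using System.Linq;

public static class TitleMatcher
{
    // Hypothetical rule: two titles match when every word of the shorter title
    // has a counterpart in the longer one within maxEdits single-character edits.
    public static bool LooksLikeSameTitle(string first, string second, int maxEdits = 1)
    {
        string[] a = Words(first);
        string[] b = Words(second);
        if (a.Length == 0 || b.Length == 0) return false;
        string[] shorter = a.Length <= b.Length ? a : b;
        string[] longer = a.Length <= b.Length ? b : a;
        return shorter.All(w => longer.Any(other => Distance(w, other) <= maxEdits));
    }

    private static string[] Words(string text)
    {
        // A null char[] separator means "split on any whitespace".
        return (text ?? "").ToLowerInvariant()
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }

    // Levenshtein distance, keeping only two rows of the table.
    private static int Distance(string s, string t)
    {
        int[] previous = new int[t.Length + 1];
        int[] current = new int[t.Length + 1];
        for (int j = 0; j <= t.Length; j++) previous[j] = j;
        for (int i = 1; i <= s.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= t.Length; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(previous[j] + 1, current[j - 1] + 1),
                    previous[j - 1] + cost);
            }
            int[] swap = previous; previous = current; current = swap;
        }
        return previous[t.Length];
    }
}
```

With these assumptions, "hello world" matches "hello world everyone", and "helo world" still matches "hello world". Be aware that a tolerance of one edit makes very short words such as "a" and "I" interchangeable, which is one of the ways this kind of matching goes wrong.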

I don't know C#, but BASIC has instr$ and javascript has indexOf()... C# probably has something similar which will check if your string is present in the other string - which means it will show as a match for "hello" or "hello world" or "world hello" if you search "hello", but "hello world" won't find "world hello"... since I don't know C# this isn't valid code but should set you on the right track...
var dbTitle = wherever you get the existing titles from
var yourSearchTerm = what you want to find
if (dbTitle.IndexOf(yourSearchTerm) >= 0) { // IndexOf() returns -1 if no match is found, and 0 for a match at the very start
db.News.DeleteObject(news); // a matching title already exists
}
else {
db.News.AddObject(news);
}
Search string manipulation in your help file to find the right command.

Your question raises more questions than it lets me answer clearly. May I ask why you'd ever want to delete an item in a method that's called AddNews? Don't you mean that you want to update an item when its identifier matches the one that's provided as an argument?
Having said that, there are basically two options for implementing the desired behavior: perform the match in code, or perform the match in the database. Both require you to define exactly what it means to have a match, as the other answers already hint.
The advantage of performing the match in code is flexibility: it's generally easier to program (complex) logic of this kind in C# than it is to write it in SQL. Plus, you get the persistence code for free since you can use EF (which I assume you're using, judging by the code sample).
The advantage of doing it in SQL is that it performs better, because you don't have to retrieve the entire entity from the database before you make the insert/update/delete decision. You can do this by adding an INSTEAD OF INSERT trigger on the entity table and performing an update/delete whenever the provided entity matches an existing one.

I have a method that checks if the current item exists in the database before it adds it to the database, if it does exist it deletes the item else it adds it.
Are you sure you want to delete (and re-add?) the item when found? You probably want to find a way to update the data instead. That will be much more efficient and less error prone. (For example, if you use delete, your record will be missing for everyone for a few milliseconds, and it will be gone forever if the client crashes at the wrong time.)
Also, you probably want to record all the things that the user types.
1) they are helpful for later mapping "what people searched for" to "what people really wanted". If one person typos, it's likely other people will typo that same way. (i.e. people rarely type "tqe" when typing "the". But they type "teh" all the time.)
2) you never know which one is the "best". More words isn't always better.
You're probably better off having a name table with "name, item_id" that allows multiple names to map to the same item in the items table that has the item attributes.

if the title just have a small char difference it wont delete it.
using ToUpper() on both ends will ensure a valid check, even if the casing is different
var exists = db.News.Any(x => x.Title.ToUpper() == news.Title.ToUpper());
if you want other ways to check if an object exists, we need more information.
Update
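Going on your comment, you can strip all non-alphanumeric characters from it. Note that LINQ to Entities cannot translate Regex.Replace into SQL, so the comparison has to run in memory (here via AsEnumerable(), which loads the rows first):
Regex rgx = new Regex("[^a-zA-Z0-9 -]");
var exists = db.News.AsEnumerable().Any(x => rgx.Replace(x.Title.ToUpper(), "") == rgx.Replace(news.Title.ToUpper(), ""));
and "Hello world" will match "Hello World!"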

Constantly Incrementing String

So, what I'm trying to do is something like this: (example)
a,b,c,d.. etc. aa,ab,ac.. etc. ba,bb,bc, etc.
So, this can essentially be explained as generally increasing and just printing all possible variations, starting at a. So far, I've been able to do it with one letter, starting out like this:
for (int i = 97; i <= 122; i++)
{
item = (char)i;
}
But, I'm unable to eventually add the second letter, third letter, and so forth. Is anyone able to provide input? Thanks.

Since there hasn't been a solution so far that would literally "increment a string", here is one that does:
static string Increment(string s) {
if (s.All(c => c == 'z')) {
return new string('a', s.Length + 1);
}
var res = s.ToCharArray();
var pos = res.Length - 1;
do {
if (res[pos] != 'z') {
res[pos]++;
break;
}
res[pos--] = 'a';
} while (true);
return new string(res);
}
The idea is simple: pretend that letters are your digits, and do an increment the way they teach it in elementary school. Start from the rightmost "digit" and increment it. If you hit a nine (which is 'z' in our system), reset it to zero ('a') and move on to the prior digit; otherwise, you are done incrementing.
The obvious special case is when the "number" is composed entirely of nines. This is when your "counter" needs to roll to the next size up, and add a "digit". This special condition is checked at the beginning of the method: if the string is composed of N letters 'z', a string of N+1 letter 'a's is returned.
Here is a link to a quick demonstration of this code on ideone.

Each iteration of your for loop completely overwrites what is in "item"; the loop just assigns one character at a time.
If item is a string, use something like this:
item = "";
for (int i = 97; i <= 122; i++)
{
item += (char)i;
}

Something to the effect of:
public string IncrementString(string value)
{
    if (string.IsNullOrEmpty(value)) return "a";
    char last = value[value.Length - 1];
    // 'z' (character code 122) cannot be incremented, so append a new letter instead
    if (last == 'z')
        return value + "a";
    // replace the last character with its successor
    return value.Substring(0, value.Length - 1) + (char)(last + 1);
}
A char converts to int implicitly, so last + 1 works without a separate conversion helper; only the cast back to char is needed.
StringBuilder is a better choice when building large amounts of strings. so you may want to consider using that instead of string concatenation.

Another way to look at this is that you want to count in base 26. The computer is very good at counting and since it always has to convert from base 2 (binary), which is the way it stores values, to base 10 (decimal--the number system you and I generally think in), converting to different number bases is also very easy.
There's a general base converter here https://stackoverflow.com/a/3265796/351385 which converts an array of bytes to an arbitrary base. Once you have a good understanding of number bases and can understand that code, it's a simple matter to create a base 26 counter that counts in binary, but converts to base 26 for display.
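One caveat: the sequence a, b, ..., z, aa, ab, ... is not ordinary base 26, because there is no zero digit (otherwise "a" and "aa" would be the same number). It is the "bijective" variant, the same scheme spreadsheets use for column names. A minimal sketch of a counter built that way, with the made-up names LetterCounter, ToLetters and Sequence:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class LetterCounter
{
    // Maps 1 -> "a", 26 -> "z", 27 -> "aa", 28 -> "ab", ...
    public static string ToLetters(long number)
    {
        if (number < 1) throw new ArgumentOutOfRangeException("number");
        var sb = new StringBuilder();
        while (number > 0)
        {
            number--;                                  // shift to 0..25 so that no zero digit is needed
            sb.Insert(0, (char)('a' + number % 26));   // least significant letter first, so prepend
            number /= 26;
        }
        return sb.ToString();
    }

    // Yields the first count strings of the sequence a, b, ..., z, aa, ab, ...
    public static IEnumerable<string> Sequence(int count)
    {
        for (long n = 1; n <= count; n++)
            yield return ToLetters(n);
    }
}
```

The counter itself is a plain integer, so "incrementing the string" is just n++ followed by a conversion for display, e.g. foreach (var s in LetterCounter.Sequence(30)) Console.WriteLine(s);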

Neat solution to a counting within a string

I am trying to solve the following problem but cannot find an elegant solution. Any ideas?
Thanks.
Input - a variable length string of numbers, e.g.,
string str = "5557476374202110373551116201";
Task - Check (from left to right) that each number (ignoring repetitions) does not appear in the following 2 indexes. Using the example above, the first number is 5. Ignoring repetitions, the last index of 5 in the group is 2, so we check the next 2 indexes: indexes 3 and 4 should not contain 5. If either does, we count it as an error. The goal is to count such errors in the string.
In the above string the errors are at indexes 3, 10 and 16.

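In addition to the other excellent solutions, you can use a simple regex:
foreach (Match m in Regex.Matches(str, @"(\d)(?!\1)(?=\d\1)"))
Console.WriteLine("Error: " + m.Index);
This returns 3, 10, 16. It matches a digit whose successor is different but whose second successor is the same digit again, using lookaheads with a backreference, so repetitions are handled. .NET supports this. If your regex engine does not, you can use a version without backreferences (note that it matches the reappearing digit itself, so each reported index is two higher):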
(?<=0[^0])0|(?<=1[^1])1|(?<=2[^2])2|(?<=3[^3])3|(?<=4[^4])4|(?<=5[^5])5|(?<=6[^6])6|(?<=7[^7])7|(?<=8[^8])8|(?<=9[^9])9

A simple indexed for loop with a couple of look ahead if checks would work. You can treat a string as a char[] or as an IEnumerable - either way you can use that to loop over all of the characters and perform a lookahead check to see if the following one or two characters is a duplicate.
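A minimal sketch of that loop, assuming the rule can be reduced to "a character differs from its right neighbour but reappears one position after it" (the class and method names are made up):

```csharp
using System.Collections.Generic;

public static class RepeatChecker
{
    // Reports index i when text[i] differs from text[i + 1] but equals text[i + 2],
    // i.e. the pattern "x y x" with y != x. Inside a run of repeated characters the
    // first test fails, so only the last character of a run can be reported.
    public static List<int> FindErrors(string text)
    {
        var errors = new List<int>();
        for (int i = 0; i + 2 < text.Length; i++)
        {
            if (text[i] != text[i + 1] && text[i] == text[i + 2])
                errors.Add(i);
        }
        return errors;
    }
}
```

For the string in the question, RepeatChecker.FindErrors("5557476374202110373551116201") yields 3, 10 and 16. Only the second of the "following 2 indexes" needs an explicit equality test, because a match in the first one would simply be another repetition.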

Sorry, not a C# man, but here's a simple solution in Ruby:
a="5557476374202110373551116201"
0.upto(a.length) do |i|
puts "error at #{i}" if a[i]!=a[i+1] && a[i]==a[i+2]
end
Output:
error at 3
error at 10
error at 16

Here's something I threw together in C# that worked with the example input from the question. I haven't checked it that thoroughly, though...
public static IEnumerable<int> GetErrorIndices(string text) {
if (string.IsNullOrEmpty(text))
yield break;
int i = 0;
while (i < text.Length) {
char c = text[i];
// get the index of the next character that isn't a repetition
int nextIndex = i + 1;
while (nextIndex < text.Length && text[nextIndex] == c)
nextIndex++;
// if we've reached the end of the string, there's no error
if (nextIndex + 1 >= text.Length)
break;
// we actually only care about text[nextIndex + 1],
// NOT text[nextIndex] ... why? because text[nextIndex]
// CAN'T be a repetition (we already skipped to the first
// non-repetition)
if (text[nextIndex + 1] == c)
yield return i;
i = nextIndex;
}
yield break;
}

What is the most efficient (read time) string search method? (C#)

I find that my program is searching through lots of lengthy strings (20,000+) trying to find a particular unique phrase.
What is the most efficient method for doing this in C#?
Below is the current code, which works like this:
The search begins at startPos because the target area is somewhat removed from the start.
It loops through the string; at each step it checks whether the substring from that point starts with startMatchString, which indicates that the start of the target string has been found. (The length of the target string varies.)
From there it creates a new substring (chopping off the 11 characters that mark the start of the target string) and searches for endMatchString.
I already know that this is a horribly complex and possibly very inefficient algorithm.
What is a better way to accomplish the same result?
string result = string.Empty;
for (int i = startPos; i <= response.Length - 1; i++)
{
if (response.Substring(i).StartsWith(startMatchString))
{
string tail = response.Substring(i).Substring(11); // renamed: a second "result" would not compile
for (int j = 0; j <= tail.Length - 1; j++)
{
if (tail.Substring(j).StartsWith(endMatchString))
{
return tail.Remove(j);
}
}
}
}
return result;

You can use String.IndexOf, but make sure you use StringComparison.Ordinal or it may be one order of magnitude slower.
private string Search2(int startPos, string startMatchString, string endMatchString, string response) {
int startMatch = response.IndexOf(startMatchString, startPos, StringComparison.Ordinal);
if (startMatch != -1) {
startMatch += startMatchString.Length;
int endMatch = response.IndexOf(endMatchString, startMatch, StringComparison.Ordinal);
if (endMatch != -1) { return response.Substring(startMatch, endMatch - startMatch); }
}
return string.Empty;
}
Searching 1000 times for a string located about 40% of the way into a 183 KB file took about 270 milliseconds. Without StringComparison.Ordinal it took about 2000 milliseconds.
Searching once with your method took over 60 seconds, because it creates a new string (O(n)) on each iteration, which makes your method O(n^2).

There are a whole bunch of algorithms:
Boyer-Moore
Sunday
Knuth-Morris-Pratt
Rabin-Karp
I would recommend the simplified Boyer-Moore variant, called Boyer–Moore–Horspool.
C code for it is available on Wikipedia.
For Java code, look at
http://www.fmi.uni-sofia.bg/fmi/logic/vboutchkova/sources/BoyerMoore_java.html
A nice article about these is available under
http://www.ibm.com/developerworks/java/library/j-text-searching.html
If you want to use built-in stuff go for regular expressions.
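Since the question is about C#, here is a rough sketch of Boyer–Moore–Horspool in that language. It is not taken from the linked pages: the class name Horspool and the IndexOf signature are invented, comparison is ordinal (character by character), and a Dictionary is used for the shift table on the assumption that the text may contain arbitrary Unicode characters.

```csharp
using System;
using System.Collections.Generic;

public static class Horspool
{
    // Returns the index of the first ordinal occurrence of pattern in text at or
    // after startIndex, or -1 if there is none.
    public static int IndexOf(string text, string pattern, int startIndex = 0)
    {
        if (text == null) throw new ArgumentNullException("text");
        if (pattern == null) throw new ArgumentNullException("pattern");
        if (startIndex < 0 || startIndex > text.Length) throw new ArgumentOutOfRangeException("startIndex");
        int m = pattern.Length;
        if (m == 0) return startIndex;

        // Bad-character table: how far the window may move when the text character
        // under the window's last position is c. Characters that do not occur in
        // pattern[0..m-2] are absent from the table and allow a full shift of m.
        var shift = new Dictionary<char, int>();
        for (int i = 0; i < m - 1; i++)
            shift[pattern[i]] = m - 1 - i;

        int pos = startIndex;
        while (pos <= text.Length - m)
        {
            // Compare the window with the pattern from right to left.
            int j = m - 1;
            while (j >= 0 && text[pos + j] == pattern[j]) j--;
            if (j < 0) return pos;

            int step;
            pos += shift.TryGetValue(text[pos + m - 1], out step) ? step : m;
        }
        return -1;
    }
}
```

The longer the pattern, the larger the typical shift, which is where the speed-up over a naive scan comes from. Whether this beats the built-in String.IndexOf with StringComparison.Ordinal on your data is something to measure rather than assume.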

It depends on what you're trying to find in the string. If you're looking for a specific sequence IndexOf/Contains are fast, but if you're looking for wild card patterns Regex is optimized for this kind of search.

I would try to use a Regular Expression instead of rolling my own string search algorithm. You can precompile the regular expression to make it run faster.

For very long strings it is hard to beat the Boyer-Moore search algorithm. It is more complex than I can explain here, but the CodeProject site has a pretty good article on it.

You could use a regex; it’s optimized for this kind of searching and manipulation.
You could also try IndexOf ...
string result = string.Empty;
if (startPos >= response.Length)
return result;
int startingIndex = response.IndexOf(startMatchString, startPos);
int rightOfStartIndex = startingIndex + startMatchString.Length;
if (startingIndex > -1 && rightOfStartIndex < response.Length)
{
int endingIndex = response.IndexOf(endMatchString, rightOfStartIndex);
if (endingIndex > -1)
result = response.Substring(rightOfStartIndex, endingIndex - rightOfStartIndex);
}
return result;

Here's an example using IndexOf (beware: written from the top of my head, didn't test it):
int skip = 11;
int start = response.IndexOf(startMatchString, startPos);
if (start >= 0)
{
int end = response.IndexOf(endMatchString, start + skip);
if (end >= 0)
return response.Substring(start + skip, end - start - skip);
else
return response.Substring(start + skip);
}
return string.Empty;

As said before regex is your friend.
You might want to look at RegularExpressions.Group.
This way you can name part of the matched resultset.
Here is an example
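The example itself did not survive on this page. As a stand-in, here is a minimal sketch of the named-group idea using the standard System.Text.RegularExpressions API. It is not the answerer's code: the class name MarkerExtractor, the method Between and the group name "target" are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class MarkerExtractor
{
    // Returns the text between startMarker and endMarker, searching from startPos,
    // or an empty string when the markers are not found.
    public static string Between(string response, string startMarker, string endMarker, int startPos = 0)
    {
        // Regex.Escape keeps marker characters such as '[' or '.' literal.
        // Singleline lets '.' cross line breaks; the lazy '*?' stops at the first end marker.
        var regex = new Regex(
            Regex.Escape(startMarker) + "(?<target>.*?)" + Regex.Escape(endMarker),
            RegexOptions.Singleline | RegexOptions.CultureInvariant);

        Match match = regex.Match(response, startPos);
        return match.Success ? match.Groups["target"].Value : string.Empty;
    }
}
```

For instance, MarkerExtractor.Between("xx<b>hello</b>yy", "<b>", "</b>") picks out "hello". The named group saves you from computing offsets by hand, at the cost of building a Regex for each call unless you cache it.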
