So, I am creating a word filter for a game server in C#, and basically I am trying to scour the sentence for banned words and replace them with clean words. I've already done that, but now I'm up to the part where I want to scan the sentence against a list of sentence-banned words. I'm hopeless at this bit, and I can't seem to wrap my head around it.
Basically I am calling CheckSentence(Message) in the ChatManager, and I need the following code to count the matches and continue; if the count is more than 5. So far I have:
public bool CheckSentence(string Message)
{
    foreach (WordFilter Filter in this._filteredWords.ToList())
    {
        if (Message.ToLower().Contains(Filter.Word) && Filter.IsSentence)
        {
            // count Message, if message contains >5
            // from (Message.Contains(Filter.Word))
            // continue; else (ignore)
        }
    }
    return false;
}
I'm not too sure if that makes much sense, but I want it to continue; if there are more than 5 Message.Contains(Filter.Word)
public bool CheckSentence(string rawMessage)
{
    // Lower-case the message once instead of once per filter.
    var lower = rawMessage.ToLower();
    var count = 0;
    foreach (WordFilter Filter in this._filteredWords.ToList())
    {
        if (lower.Contains(Filter.Word) && Filter.IsSentence)
        {
            count++;
        }
    }
    // "More than 5", as the question asks.
    return count > 5;
}
If this becomes too slow, you may be better off caching the filtered words in a HashSet and iterating over each word in the message, checking whether it exists in the HashSet. That gives you O(n) speed, where n is the number of words in the message. Note that this only finds whole words, not substrings.
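The HashSet idea could be sketched roughly like this. It is only an illustration: BannedWords stands in for the Word values of the sentence filters, CountBannedWords is a made-up helper name, and the separator list is an assumption about how messages should be split into words.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashSetFilterSketch
{
    // Hypothetical stand-in for the Word values of the sentence filters.
    private static readonly HashSet<string> BannedWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "spam", "scam", "bot" };

    // Counts how many words of the message are in the banned set.
    // Each lookup is O(1) on average, so the whole scan is O(number of words).
    public static int CountBannedWords(string message)
    {
        // Assumed separators; adjust to whatever the chat messages contain.
        var separators = new[] { ' ', ',', '.', '!', '?', ';', ':' };
        return message
            .Split(separators, StringSplitOptions.RemoveEmptyEntries)
            .Count(word => BannedWords.Contains(word));
    }

    public static void Main()
    {
        Console.WriteLine(CountBannedWords("Spam, spam and more SCAM!")); // 3
    }
}
```

The case-insensitive comparer on the set means the message does not need to be lower-cased first.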
LINQ Version
public bool CheckSentenceLinq(string rawMessage)
{
    var lower = rawMessage.ToLower();
    // "More than 5", as the question asks.
    return _filteredWords
        .Where(x => x.IsSentence)
        .Count(x => lower.Contains(x.Word)) > 5;
}
EDIT 2: LINQ updated as per @S.C.'s comment
By @S.C.:
For the LINQ version, there's no need to count past the first five. return _filteredWords.Where(x => x.IsSentence && lower.Contains(x.Word)).Skip(5).Any();
public bool CheckSentenceLinq(string rawMessage)
{
    var lower = rawMessage.ToLower();
    return _filteredWords
        .Where(x => x.IsSentence)
        .Where(x => lower.Contains(x.Word))
        .Skip(5)
        .Any();
}
ToUpper vs ToLower
As @DevEstacion mentioned, and per Microsoft's best-practice recommendations for using strings, it is best to use ToUpperInvariant() rather than ToLowerInvariant() when normalizing strings for comparison.
EDIT: Using continue
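As a side note, the case conversion can be skipped entirely by asking for a case-insensitive comparison. This is a minimal sketch, not what the answer above uses; ContainsIgnoreCase is a made-up helper name.

```csharp
using System;

public static class CaseInsensitiveSketch
{
    // True if text contains word, ignoring case, without building an
    // upper- or lower-cased copy of the message first.
    public static bool ContainsIgnoreCase(string text, string word)
    {
        return text.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    public static void Main()
    {
        Console.WriteLine(ContainsIgnoreCase("Buy CHEAP gold now", "cheap")); // True
        Console.WriteLine(ContainsIgnoreCase("Hello there", "cheap"));        // False
    }
}
```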
public bool CheckSentenceWithContinue(string rawMessage)
{
    var lower = rawMessage.ToLower();
    var count = 0;
    foreach (WordFilter Filter in this._filteredWords.ToList())
    {
        if (!Filter.IsSentence)
            continue; // Move on to the next filter, as this is not a sentence word filter
        if (!lower.Contains(Filter.Word))
            continue; // Move on to the next filter, as the message does not contain this word
        // If you are here, the filter is a sentence filter and the message contains the word, so increment the counter
        count++;
    }
    // "More than 5", as the question asks.
    return count > 5;
}
I believe someone already posted a correct answer; I'm just here to provide an alternative.
So instead of a for loop or foreach, I'll give you a Regex solution.
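public bool CheckSentence(string rawMessage)
{
    /*
      string.Join("|", ...) builds the Regex pattern from the filtered words.
      '|' means "or", so the Regex looks for any of the words in the raw
      message. Regex.Escape keeps special characters in a word literal.
      Matches counts every occurrence, so a word that appears twice in the
      message is counted twice.
    */
    var pattern = string.Join("|", _filteredWords
        .Where(x => x.IsSentence)
        .Select(x => Regex.Escape(x.Word)));
    if (pattern.Length == 0)
        return false; // no sentence filters; an empty pattern would match everywhere
    return new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled)
        .Matches(rawMessage).Count > 5;
}
Benefits? Much shorter, avoids the explicit loop, and could be faster :)
Don't forget to add these two using directives at the top of the .cs file: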
using System.Linq;
using System.Text.RegularExpressions;
I'm a beginner in C# and I am working through text exercises. I made a method to filter vehicle plate numbers. A plate should consist of 3 letters and 3 digits (AAA:152). My method sends the wrong plate numbers to a file, but it also adds bad numbers to the list of good ones.
private static string[] InvalidPlates(string[] csvLines, int fieldToCorrect)
{
    var toReturn = new List<string>();
    var toSend = new List<string>();
    int wrongCount = 0;
    for (int i = 0; i < csvLines.Length; i++)
    {
        string[] stringFields = csvLines[i].Split(csvSeparator[0]);
        string[] values = stringFields[fieldToCorrect].Split(':');
        if(Regex.IsMatch(values[0], @"^[a-zA-Z]+$") && Regex.IsMatch(values[1], "^[0-9]+$"))
        {
            toReturn.Add(string.Join(csvSeparator, stringFields));
        }
        else
        {
            toSend.Add(string.Join(csvSeparator, stringFields));
            wrongCount++;
        }
    }
    WriteLinesToFile(OutputFile, toSend.ToArray(), wrongCount);
    return toReturn.ToArray();
}
Can somebody help me to fix that?
You need to constrain the length using quantifiers:
^[a-zA-Z]{3}\:\d{3}$
which literally means the following, in strict order:
the string begins with exactly 3 lowercase or uppercase English letters, continues with a colon (:), and ends with exactly three digits.
Remember that \ must be escaped in a regular C# string literal ("\\d"); in a verbatim string (@"...") it is written only once.
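To see what the quantifiers change, here is a small sketch that runs the pattern against a few sample plates. The class name, the IsPlate helper, and the sample values are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PlatePatternSketch
{
    // Verbatim string (@"...") so the backslash in \d is written only once.
    private static readonly Regex Plate = new Regex(@"^[a-zA-Z]{3}:\d{3}$");

    public static bool IsPlate(string s) => Plate.IsMatch(s);

    public static void Main()
    {
        // Only the first sample has exactly 3 letters, a colon, and 3 digits.
        foreach (var s in new[] { "AAA:152", "AB:152", "AAAA:152", "AAA:15", "AAA152" })
            Console.WriteLine($"{s} -> {IsPlate(s)}");
    }
}
```

With the original + quantifiers, "AB:152" and "AAAA:152" would both have been accepted, which is why bad plates ended up in the good list.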
Also, there is no need to join stringFields back into a string when you can use the unsplit csvLines[i]:
if (Regex.IsMatch(stringFields[fieldToCorrect], @"^[a-zA-Z]{3}\:\d{3}$"))
{
    toReturn.Add(csvLines[i]);
}
else
{
    toSend.Add(csvLines[i]);
    wrongCount++;
}
Another important thing is that your code is not well designed in terms of OOP. It is not at all obvious that a method called InvalidPlates will save something to a file. That may confuse you, or other developers, after some time. There should be no "hidden" functionality, and each method should do only one thing.
Here is how I would do this using LINQ:
private static bool IsACorrectPlate(string p) => Regex.IsMatch(p, @"^[a-zA-Z]{3}\:\d{3}$");
private static void SortPlatesOut(string[] csvLines, int column, out string[] correct, out string[] incorrect)
{
    // ToLookup returns an empty sequence for a missing key, so this also
    // works when every plate is correct or every plate is incorrect.
    var isCorrect = csvLines
        .ToLookup(l => IsACorrectPlate(l.Split(';')[column]));
    correct = isCorrect[true].ToArray();
    incorrect = isCorrect[false].ToArray();
}
// Usage:
string[] incorrect, correct;
SortPlatesOut(csvLines, 1, out correct, out incorrect);
File.WriteAllLines("", incorrect);
// do whatever you need with correct
Now the SortPlatesOut method has predictable behavior without side effects. The code is also about half as long and, to me, more readable. If it looks unreadable to you, you can unpack the LINQ and split things up.
I have an equation string, and when I split it with my pattern I get the following string array.
string[] equationList = {"code1","+","code2","-","code3"};
Then from this I create a list which contains only the codes.
List<string> codeList = new List<string> {"code1","code2","code3"};
The existing code then loops through codeList, retrieves the value of each code, and replaces the code in equationList with that value, using the code below.
foreach (var code in codeList)
{
    var codeVal = GetCodeValue(code);
    for (var i = 0; i < equationList.Length; i++)
    {
        if (!equationList[i].Equals(code, StringComparison.InvariantCultureIgnoreCase)) continue;
        equationList[i] = codeVal;
        break;
    }
}
I am trying to improve the efficiency, and I believe I can get rid of the for loop inside the foreach by using LINQ.
My question is: would that actually speed up the process?
If yes, can you please help with the LINQ statement?
Before jumping to LINQ... which doesn't solve any problems you've described, let's look at the logic you have here.
We split a string with a 'pattern'. How?
We then create a new list of codes. How?
We then loop through those codes and decode them. How?
But since we forgot to keep track of where those codes came from, we now loop through the equationList (which is an array, not a List<T>) to substitute the results.
Seems a little convoluted to me.
Maybe a simpler solution would be:
Take in a string, and return IEnumerable<string> of words (similar to what you do now).
Take in a IEnumerable<string> of words, and return a IEnumerable<?> of values.
That is to say with this second step iterate over the strings, and simply return the value you want to return - rather than trying to extract certain values out, parsing them, and then inserting them back into a collection.
//Ideally we return something more specific eg, IEnumerable<Tokens>
public IEnumerable<string> ParseEquation(IEnumerable<string> words)
{
    foreach (var word in words)
    {
        if (IsOperator(word)) yield return ToOperator(word);
        else if (IsCode(word)) yield return ToCode(word);
        else ...;
    }
}
This is quite similar to the LINQ Select Statement... if one insisted I would suggest writing something like so:
var tokens = equationList.Select(ToToken);
...
public Token ToToken(string word)
{
    if (IsOperator(word)) return ToOperator(word);
    else if (IsCode(word)) return ToCode(word);
    else ...;
}
If GetCodeValue(code) doesn't do so already, I suggest it could probably use some sort of caching/dictionary in its implementation, though the specifics dictate this.
The benefits of this approach are that it is flexible (we can easily add more processing steps), simple to follow (we put in these values and get these as a result, with no mutating state), and easy to write. It also breaks the problem down into nice little chunks that each solve their own task, which will help immensely when trying to refactor or to find niggly bugs and performance issues.
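A rough sketch of how the pieces might fit together, with the cache included. Everything here is illustrative: LookUpCodeValue is a placeholder for the real GetCodeValue, IsOperator only knows + and -, and the class and method names are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EquationSketch
{
    // Cache so that each distinct code is looked up only once.
    private static readonly Dictionary<string, string> Cache =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    // Placeholder for the real GetCodeValue: "code1" -> "1".
    private static string LookUpCodeValue(string code) => code.Substring(4);

    // Assumption: only + and - are operators in this example.
    private static bool IsOperator(string word) => word == "+" || word == "-";

    // Maps one word of the equation to its output value, with no shared list to mutate.
    public static string ToValue(string word)
    {
        if (IsOperator(word)) return word;
        string value;
        if (!Cache.TryGetValue(word, out value))
        {
            value = LookUpCodeValue(word);
            Cache[word] = value;
        }
        return value;
    }

    public static string[] Evaluate(IEnumerable<string> words) => words.Select(ToValue).ToArray();

    public static void Main()
    {
        var result = Evaluate(new[] { "code1", "+", "code2", "-", "code3" });
        Console.WriteLine(string.Join(" ", result)); // 1 + 2 - 3
    }
}
```

Because each word is mapped in a single pass, there is no inner search loop over the equation at all.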
If your array always alternates code then operator, this LINQ should do what you want:
string[] equationList = { "code1", "+", "code2", "-", "code3" };
var processedList = equationList.Select((s, j) => (j % 2 == 1) ? s : GetCodeValue(s)).ToArray();
You will need to measure whether it is actually faster.
I think the fastest solution will be this:
var codeCache = new Dictionary<string, string>();
for (var i = equationList.Length - 1; i >= 0; --i)
{
    var item = equationList[i];
    if (! < item is valid >) // you know this because you created the codeList
        continue;
    string codeVal;
    if (!codeCache.TryGetValue(item, out codeVal))
    {
        codeVal = GetCodeValue(item);
        codeCache.Add(item, codeVal);
    }
    equationList[i] = codeVal;
}
You don't need a codeList. If every code is unique, you can remove the codeCache as well.
I'm trying to find words in a List. I have an array of words to search for, a string with the word to find, and a List to which I'm adding letters.
I want to find the words contained in the search array by looking in the wordsCollected List.
How can I do this?
This is what I'm trying:
private string[] search = {"CAKE", "COFFEE"}; //words to search
private string wordFind = "CAKE"; //word find
private List<String> wordsCollected = new List<string>(); //add letters
/** add letters - A B C D E F G H .... */
public void addWordsCollected(string p){
if(!wordsCollected.Contains(p)){
wordsCollected.Add(p);
}
}
/** check if wordFind is found */
public bool isWordFound(){
bool found = false;
for (int x = 0; x < wordsCollected.Count; x++){
found = wordsCollected[x].IndexOf(wordFind);
if(found >= 0){
break;
found = true;
}
}
return found;
}
}
One possibility is to use LINQ to look for a specific word inside an array of strings. But you could also accept a string array of words to search for inside that other array, which gives a more general solution (one word to search for, or many).
Here is some code I put together in LinqPad. It should be much more compact than the code you listed, thanks to LINQ. I have tested it inside LinqPad, and if I have understood the problem you want to solve, it might be a solution:
void Main()
{
    string[] search = { "CAKE", "COFFEE", "TEA", "HONEY", "SUGAR", "CINNEMON" };
    string[] wordsToFind = { "CAKE", "TEAPOT" };
    List<String> wordCollected = search.Where(s => s == wordsToFind[0]).ToList();
    wordCollected.Dump();
    wordCollected = search.Where(x => wordsToFind.Any(w => w == x)).ToList();
    wordCollected.Dump();
}
The first query searches the array search for the single word "CAKE", while the second searches it for both "CAKE" and "TEAPOT".
Please note that the Dump method is a LinqPad extension method for displaying results. If you use the code outside LinqPad (which I guess you want), remove the two Dump calls.
Also note that I suspect the Any operator may be quicker than Contains because it exits early, though I am not sure: both stop at the first match. Am I correct here?
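If the List really holds single letters, as the question's comments suggest, another reading of the problem is "has every letter of the word been collected?". This is a sketch under that assumption only; the class name and sample letters are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LetterSearchSketch
{
    // True if every letter of word is present in the collected letters.
    public static bool IsWordFound(string word, ICollection<string> lettersCollected)
    {
        return word.All(letter => lettersCollected.Contains(letter.ToString()));
    }

    public static void Main()
    {
        var collected = new List<string> { "A", "C", "E", "K" };
        var search = new[] { "CAKE", "COFFEE" };

        // COFFEE needs O and F, which have not been collected.
        foreach (var word in search.Where(w => IsWordFound(w, collected)))
            Console.WriteLine(word); // prints CAKE
    }
}
```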
I'm looking to implement some algorithm to help me match imperfect sequences.
Say I have a stored sequence of ABBABABBA and I want to find something that 'looks like' that in a large stream of characters.
If I allow my algorithm 2 wildcards (differences), how can I use Regex to match something like the following, where ( and ) mark the differences:
A(A)BABAB(A)A
or
(B)BBA(A)ABBA
My Dilemma is that I am looking to find these potential target matches (with imperfections) in a big string of characters.
So in something like:
ABBDBABDBCBDBABDB(A(A)BABAB(A)A)DBDBABDBCBDBAB
ADBDBABDBDBDBCBDBABCBDBABCBDBABCBDBABABBBDBABABBCD
DBABCBDABDBABCBCBDBABABDABDBABCBDBABABDDABCBDBABAB
I must be able to search for these 'near enough' matches.
Where brackets denote: (The Good enough Match with the (Differences))
Edit: To be more formal in this example, A match of Length N can be accepted if N-2 characters are the same as the original (2 Differences)
I've used Regex before, but only to find perfect sequences - not for something that 'looks like' one.
Hope this is clear enough to get some advice on.
Thanks for reading and any help!
You could use LINQ to be nice and expressive.
In order to use this make sure you have a using System.Linq at the top of your code.
Assuming that
source is the stored target pattern
test is the string to test.
Then you can do
public static bool IsValid(string source, string test)
{
    return test != null
        && source != null
        && test.Length == source.Length
        && test.Where((x,i) => source[i] != x).Count() <= 2;
}
There is also a shortcut version that exits false the moment it fails, saving iterating the rest of the string.
public static bool IsValid(string source, string test)
{
    return test != null
        && source != null
        && test.Length == source.Length
        && !test.Where((x,i) => source[i] != x).Skip(2).Any();
}
As requested in comments, a little explanation of how this works:
In C# a string can be treated as a sequence of characters, which means that the LINQ methods can be used on it.
test.Where((x,i) => source[i] != x)
This uses the overload of Where that passes both the element and its index: for each character in test, x is the character and i is its index. If the character at position i in source is not equal to x, then x is output into the result.
Skip(2)
This skips the first 2 results.
Any()
This returns true if there are any results left, or false if not. Because LINQ defers execution, Any() stops as soon as a third difference is found rather than evaluating the rest of the string.
The entire test is then negated by prefixing with a '!' to indicate we want to know whether there are no more results.
Now, in order to match a substring, you need to try every starting position, similar to regex backtracking...
public static IEnumerable<int> GetMatches(string source, string test)
{
    // + 1 so that the last possible window is included; Math.Max guards
    // against a test string that is shorter than source.
    return from i in Enumerable.Range(0, Math.Max(0, test.Length - source.Length + 1))
           where IsValid(source, test.Skip(i).Take(source.Length))
           select i;
}
public static bool IsValid(string source, IEnumerable<char> test)
{
    return !test.Where((x,i) => source[i] != x).Skip(2).Any();
}
UPDATE Explained
Enumerable.Range(0, Math.Max(0, test.Length - source.Length + 1))
This creates the sequence of numbers from 0 to test.Length - source.Length inclusive. There is no need to check starting positions beyond that, because the remaining text is shorter than source and cannot match.
from i in ....
Basically, iterate over the collection, assigning i to be the current value each time.
where IsValid(source, test.Skip(i).Take(source.Length))
Filter the results to include only the ones where there is a match in test starting at index i (hence the Skip) and going on for source.Length chars (hence the Take).
select i
Return i.
This returns an enumerable over the indexes in test where there is a match, you could extract them with
GetMatches(source,test).Select(i =>
new string(test.Skip(i).Take(source.Length).ToArray()));
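For comparison, the same sliding-window search can be written with plain loops, which avoids re-enumerating the string with Skip for every start index. This is only a sketch: FindNearMatches and the sample text are made up, with the near match from the question embedded at index 4.

```csharp
using System;
using System.Collections.Generic;

public static class FuzzySearchSketch
{
    // Yields each start index where the window of text differs from
    // pattern in at most maxDifferences positions.
    public static IEnumerable<int> FindNearMatches(string text, string pattern, int maxDifferences)
    {
        for (int start = 0; start + pattern.Length <= text.Length; start++)
        {
            int differences = 0;
            // Stop comparing this window as soon as it has too many differences.
            for (int i = 0; i < pattern.Length && differences <= maxDifferences; i++)
            {
                if (text[start + i] != pattern[i]) differences++;
            }
            if (differences <= maxDifferences) yield return start;
        }
    }

    public static void Main()
    {
        // "AABABABAA" starts at index 4 and differs from the pattern in 2 places.
        foreach (int index in FindNearMatches("DBDBAABABABAADBDB", "ABBABABBA", 2))
            Console.WriteLine(index); // prints 4
    }
}
```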
I don't think this can be done with regexes (if it can, I'm unfamiliar with the syntax). However, you can use the dynamic programming algorithm for Levenshtein distance.
Edit: If you don't need to handle letters that have switched positions, a much easier approach is to just compare each pair of characters from the two strings, and just count the number of differences.
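For reference, a minimal sketch of the dynamic programming algorithm for Levenshtein distance. It counts insertions, deletions, and substitutions, so it is more permissive than simply counting differing positions; the class name is made up.

```csharp
using System;

public static class EditDistanceSketch
{
    // Classic dynamic programming table: d[i, j] is the edit distance
    // between the first i characters of a and the first j characters of b.
    public static int Levenshtein(string a, string b)
    {
        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i; // delete i characters
        for (int j = 0; j <= b.Length; j++) d[0, j] = j; // insert j characters

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(
                    Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), // deletion, insertion
                    d[i - 1, j - 1] + cost);                    // substitution or match
            }
        }
        return d[a.Length, b.Length];
    }

    public static void Main()
    {
        Console.WriteLine(Levenshtein("kitten", "sitting"));      // 3
        Console.WriteLine(Levenshtein("ABBABABBA", "AABABABAA")); // 2
    }
}
```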
I can't think how you'd do it with regex but it should be pretty simple to code.
I'd probably just split the strings up and compare them character by character. If you get a difference count it and move to the next character. If you exceed 2 differences then move on to the next full string.
I don't think there's a good regular expression to handle this case. (Or at least, there isn't one that won't take up a good three lines of text and cause multiple bullets in your feet.) However, that doesn't mean you can't solve this problem.
Depending on how large your strings are (I'm assuming they won't be millions of characters each), I don't see anything stopping you from using a single loop to compare individual characters in order, while keeping a tally of differences:
// Returns true if a and b differ in at most `tolerance` positions.
static bool CheckStrings(string a, string b, int tolerance)
{
    if (a.Length != b.Length)
        return false; // different lengths cannot be compared position by position
    int differences = 0; // Count of discrepancies you've detected
    for (int i = 0; i < a.Length; i++)
    {
        if (a[i] != b[i])
        {
            differences++;
            if (differences > tolerance) // Limit of discrepancies you'll allow
            {
                return false;
            }
        }
    }
    return true;
}
Most of the time, don't be concerned about your strings being too long to put into a loop. Behind-the-scenes, any code that assesses every character of a string will loop in some form or another. Until you literally have millions of characters to deal with, a loop should do the trick just fine.
I'll bypass the 'regex' part and focus on:
Is there a better way than doing nested loops to wildcard every position?
It sounds like there's a programmatic way that might help you: iterate over both strings at the same time (see this post about iterating over two IEnumerables), which completes the task in O(n) time. Even better, if you know your tolerance (a maximum of 2 errors), you can sometimes stop before reaching the end of the strings.
Here's a simple example that I wrote up. It probably needs tweaking for your own case, but it might be a good starting point.
static void imperfectMatch(String original, String testCase, int tolerance)
{
    int mistakes = 0;
    if (original.Length == testCase.Length)
    {
        using (CharEnumerator enumerator1 = original.GetEnumerator())
        using (CharEnumerator enumerator2 = testCase.GetEnumerator())
        {
            while (enumerator1.MoveNext() && enumerator2.MoveNext())
            {
                // Stop early once the tolerance has been exceeded.
                if (mistakes > tolerance)
                    break;
                if (enumerator1.Current != enumerator2.Current)
                    mistakes++;
            }
        }
    }
    else
        mistakes = -1; // the lengths differ, so the strings cannot be compared
    Console.WriteLine(String.Format("Original String: {0}", original));
    Console.WriteLine(String.Format("Test Case String: {0}", testCase));
    Console.WriteLine(String.Format("Number of errors: {0}", mistakes));
    Console.WriteLine();
}
Does any combination of A, B, ( and ) work?
bool isMatch = Regex.IsMatch(inputString, "^[AB()]+$");
For sufficiently small patterns (ABCD, allowing two differences), you could generate a regexp in which each alternative replaces two positions with the wildcard '.':
..CD|.B.D|.BC.|A..D|A.C.|AB..
You could also code a custom comparison loop.
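Generating such an alternation by hand gets tedious quickly, so here is one way it might be automated. This is a sketch, not part of the answer above: BuildPattern and Combinations are made-up names, and the number of alternatives grows as "n choose k", so it only makes sense for short patterns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WildcardPatternSketch
{
    // Yields every ascending choice of `count` indexes from start..length-1.
    private static IEnumerable<List<int>> Combinations(int start, int length, int count)
    {
        if (count == 0)
        {
            yield return new List<int>();
            yield break;
        }
        for (int i = start; i <= length - count; i++)
        {
            foreach (var rest in Combinations(i + 1, length, count - 1))
            {
                rest.Insert(0, i);
                yield return rest;
            }
        }
    }

    // Builds one alternative per choice of wildcard positions, joined with '|'.
    public static string BuildPattern(string source, int wildcards)
    {
        var alternatives = Combinations(0, source.Length, wildcards).Select(positions =>
        {
            // Escape each character so regex metacharacters stay literal.
            var parts = source.Select(c => Regex.Escape(c.ToString())).ToArray();
            foreach (var p in positions) parts[p] = ".";
            return string.Concat(parts);
        });
        return string.Join("|", alternatives);
    }

    public static void Main()
    {
        Console.WriteLine(BuildPattern("ABCD", 2)); // ..CD|.B.D|.BC.|A..D|A.C.|AB..
    }
}
```

The resulting string can be passed straight to Regex.Matches to search a larger text.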
I want to be able to check whether the string contains all the values held in the list.
So it will only give you a 'correct answer' if you have all the 'key words' from the list in your answer.
Here's something I tried which half fails (it doesn't check for all the key words, and will accept just one).
Code I tried:
foreach (String s in KeyWords)
{
    if (textBox1.Text.Contains(s))
    {
        correct += 1;
        MessageBox.Show("Correct!");
        LoadUp();
    }
    else
    {
        incorrect += 1;
        MessageBox.Show("Incorrect.");
        LoadUp();
    }
}
Essentially what i want to do is:
Question: What is the definition of Psychology?
Key words in arraylist: study,mental process,behaviour,humans
Answer: Psychology is the study of mental process and behaviour of humans
Now if and ONLY if the answer above contains all key words will my code accept the answer.
I hope i have been clear with this.
Edit: Thank you all for your help. All answers have been voted up and i thank everyone for quick answers. I voted up the answer that can be easily adapted to any code. :)
Using LINQ:
// case insensitive check to eliminate user input case differences
var invariantText = textBox1.Text.ToUpperInvariant();
bool matches = KeyWords.All(kw => invariantText.Contains(kw.ToUpperInvariant()));
This should help:
string text = "Psychology is the study of mental process and behaviour of humans";
bool containsAllKeyWords = KeyWords.All(text.Contains);
You can use one of the LINQ methods, like this:
if (KeyWords.All(k => textBox1.Text.Contains(k)))
{
    correct += 1;
    MessageBox.Show("Correct");
}
else
{
    incorrect += 1;
    MessageBox.Show("Incorrect");
}
The All method returns true when the function returns true for all of the items in the list.
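Here is a small console sketch of the same check, using the example from the question. ContainsAll is a made-up helper, and the case-insensitive comparison is an assumption about what the quiz should accept.

```csharp
using System;
using System.Linq;

public static class KeywordCheckSketch
{
    // True only if every key word occurs somewhere in the answer (ignoring case).
    public static bool ContainsAll(string answer, string[] keyWords)
    {
        return keyWords.All(kw => answer.IndexOf(kw, StringComparison.OrdinalIgnoreCase) >= 0);
    }

    public static void Main()
    {
        var keyWords = new[] { "study", "mental process", "behaviour", "humans" };

        Console.WriteLine(ContainsAll(
            "Psychology is the study of mental process and behaviour of humans", keyWords)); // True
        Console.WriteLine(ContainsAll(
            "Psychology is the study of humans", keyWords));                                 // False
    }
}
```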