Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
Improve this question
If the input string is
Cat fish bannedword bread bánnedword mouse bãnnedword
It should output
Cat fish bread mouse
What would be the best way to do this without slowing down the performance?
There are number of ways you can use but non of them (at least as far as I know) will work without certain performance cost.
The most obvious way is to remove the accented characters first and then use simple string.Replace(). As for removing accented characters this or this stackoverflow questions should help you.
Other approach could be splitting the string into an array of strings (each string being separate word) and then removing each word that equals the 'bannedword' using a parameter that makes Equals() method ignore accents.
Something like:
string[] splittedInput = input.Split(' ');
StringBuilder output = new StringBuilder();
foreach(string word in splittedInput)
{
if(string.Compare(word, bannedWord, CultureInfo.CurrentCulture, CompareOptions.IgnoreNonSpace) == false)
{
output.Append(word);
}
}
string s_output = output.ToString();
//I've not tested it in Visual Studio so there might be mistakes... (A LINQ could also simplify it (and potentially enable pluralization)).
And finally, it should be possible to come up with a clever regex solution (probably the fastest way) but not being an expert on regex I can't help you with that (this might point you in the right direction (if you know at least something about regexes)).
Related
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
Improve this question
We are allowing the user to freely type into textbox like this:
SomeText[Foo.Id]-[Bar.Value] (valid)
Xyz[Foo.Id]_[Bar.Value] (valid)
abc-[pqr] (invalid)
Predefined values:
Foo.Id
Bar.Value
What is the best way to ensure that:
Texts inside [] should be matching to a predefined set of values
If any of the invalid text is entered, identify that wrong text
I think Regex would be the right way to go ahead with this.
A flexible way is to extract the text from the [] and verify it against a whitelist of your choice:
var validWords = new HashSet<string> {"[Foo.Id]", "[Bar.Value]"};
foreach (Match match in Regex.Matches("SomeText[Foo.Id]-[Bar.Value]-[Big.Mac]", #"(\[.*?\])")) {
foreach (Capture capture in match.Captures) {
if (!validWords.Contains(capture.Value)) {
Console.WriteLine($"{capture.Value} is not valid (Position {capture.Index})");
}
}
}
Prepare a string with all possible values. The string should look like this:
(?:possible_value1|possible_value2|...|possible_valueN)
Then use that inside this regex:
\w+\[REGEX_FOR_POSSIBLE_VALUES\][_-]\[REGEX_FOR_POSSIBLE_VALUES\]
For example, considering the only possible values are:
Foo.Id
Bar.Value
Then the final regex would be:
\w+\[(?:Foo.Id|Bar.Value)\][_-]\[(?:Foo.Id|Bar.Value)\]
This was considering you allways need two bracket groups.
If only one bracket group is possible, then use:
\w+\[REGEX_FOR_POSSIBLE_VALUES\](?:[_-]\[REGEX_FOR_POSSIBLE_VALUES\])?
If there can exists multiple bracket groups use:
\w+\[REGEX_FOR_POSSIBLE_VALUES\](?:[_-]\[REGEX_FOR_POSSIBLE_VALUES\])*
Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 5 years ago.
Improve this question
I have the following:
The __animal__ jumped over the __object__
I'd like to write a concise method in C# that returns animal and object. How can this be done without RegEx? A dirty solution would be to iterate through the characters one by one until we find a __ and then build a string until we find the closing __ - but I'm looking for a more elegant approach.
You are going to have to iterate in some form - if you don't want to use regex.
string text = "The __animal__ jumped over the __object__";
List<string> words = text.Split(' ').ToList();
words = words.Where(x => x.StartsWith("__") && x.EndsWith("__")).ToList();
You can use extensions of IEnumerable.to search, like above. but the framework is technically iterating.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
I'm returning a string from an API that has a length of 45 characters. There is one word that is unique for one condition that doesn't appear in the other condition.
I'm wondering if using string.contains() is faster performance-wise than comparing the whole string with string.equals() or string == "blah blah".
I don't know the inner workings of any of these methods, but logically, it seems like contains() should be faster because it can stop traversing the string after it finds the match. Is this accurate? Incidentally, the word I want to check is the first word in the string.
I agree with D Stanley (comment). You should use String.StartsWith()
That said, I also don't know the inner working of each method either, but I can see your logic. However "String.Contains()" may still load the entire string before processing it, in which case the performance difference would be very small.
As a final point, with a string length of only 45 characters, the performance difference should me extremely minute. I was shocked when I wrote a junky method to substitute characters and found that is processes ~10kb of text in a fraction of a blink of the eye. So unless you're doing some crazy handling else wise in your app, it shouldn't matter much.
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 7 years ago.
Improve this question
Basically I want to check if a String is a Sentence ("Hello, I am Me!") or Symbol Spam ("HH,,,{''{"), without using the number of symbols as a factor as much as possible. Right now it just detects based on a counter of symbols, but when someone says something with lots of punctuation, they get kicked.
Help?
If the number of symbols in the text is not sufficient, and you don't want to use something too fancy (or bought) could I suggest implementing one or more of these further steps (of increasing difficulty):
Make a count of all A-Za-z and space characters in the string and make a ratio of this to the count of symbols - so if they write a sentence then !!!!!!!!!!!!! at the end it still doesn't snag as the ratio is high enough.
If this still isn't discerning enough, add a further check if you pass item 1...
Count numbers of consecutive A-Za-z characters in the string - work out the average length of these 'words' - if the average is too short then it is probably spam.
These can be done in RegEx reasonably easily - If you want more sophistication then you have to use something written by someone else that has much more developed statistical methods (or start reading lexographical university papers that are beyond me!)
Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 7 years ago.
Improve this question
I have a string "Building1Floor2" and it's always in that format, how do I cleanly get the building number (e.g. 1) and floor number. I'm thinking I need a regex, but not entirely sure that's the best way. I could just use the index if the format stays the same, but if I have have a high floor number e.g. 100 it will break.
P.S. I'm using C#.
Use a regex like this:
Building(\d+)Floor(\d+)
Regex would be an ok option here if "Building" and "Floor" could change. e.g.: "Floor1Room23"
You could use "[A-Za-z]+([0-9]{1,})[A-Za-z]+([0-9]{1,})"
With those groupings, $1 would now be the Building number, and $2 would be Floor.
If "Building" and "Floor" never changed, however, then regex might be overkill.. you could use a string split
Find the index of the "F" and substring on that.
int first = str.IndexOf("F") ;
String building = str.substring(1, first);