How to detect a particular word in a string using C#?

I need to write an if statement using C# code which will detect if the word "any" exists in the string:
string source ="is there any way to figure this out";

Note that if you really want to match the whole word (and not words that merely contain it, such as "anyone"), you can use a regular expression:
string source = "is there any way to figure this out";
string pattern = @"\bany\b";
bool match = Regex.IsMatch(source, pattern);
You can also do a case-insensitive match by passing RegexOptions.IgnoreCase as a third argument.
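A minimal sketch of the case-insensitive variant follows. The class and method names (WordSearch, ContainsWord) are made up for illustration; Regex.Escape is added on the assumption that the word to find may come from user input and could contain regex metacharacters.

```csharp
using System.Text.RegularExpressions;

public static class WordSearch
{
    // Hypothetical helper: true when "word" occurs in "source" as a whole word,
    // ignoring case.
    public static bool ContainsWord(string source, string word)
    {
        // Regex.Escape keeps characters such as '.' or '+' in the word literal.
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.IsMatch(source, pattern, RegexOptions.IgnoreCase);
    }
}
```

With this helper, WordSearch.ContainsWord("Is there ANY way", "any") is true, while WordSearch.ContainsWord("anyone here", "any") is false.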

string stringSource = "is there any way to figure this out";
string valueToCheck = "any";
if (stringSource.Contains(valueToCheck)) {
// Contains is a plain substring test, so this branch also runs for "anyone" or "many"
}

Here is an approach combining and extending the answers of IllidanS4 and smoggers:
public bool IsMatch(string inputSource, string valueToFind, bool matchWordOnly)
{
var regexMatch = matchWordOnly ? string.Format(@"\b{0}\b", valueToFind) : valueToFind;
return System.Text.RegularExpressions.Regex.IsMatch(inputSource, regexMatch);
}
You can now do stuff like:
var source = "is there any way to figure this out";
var value = "any";
var isWordDetected = IsMatch(source, value, true); //returns true, correct
Notes:
if matchWordOnly is set to true, the function will return true for "any way" and false for "anyway"
if matchWordOnly is set to false, the function will return true for both "any way" and "anyway". This is logical as in order for "any" in "any way" to be a word, it needs to be a part of the string in the first place. \B (the negation of \b in the regular expression) can be added to the mix to match non-words only but I do not find it necessary based on your requirements.
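One caveat with \b is that it only marks a boundary between a word character and a non-word character, so it does not behave as expected when the value itself starts or ends with punctuation (for example "C#"). The sketch below is one possible workaround using lookarounds instead of \b; the names TokenSearch and ContainsToken are hypothetical and not part of the answers above.

```csharp
using System.Text.RegularExpressions;

public static class TokenSearch
{
    // Hypothetical helper: true when "token" occurs in "source" and is not
    // directly adjacent to a word character on either side.
    public static bool ContainsToken(string source, string token)
    {
        // (?<!\w) and (?!\w) replace \b so tokens ending in punctuation still match.
        string pattern = @"(?<!\w)" + Regex.Escape(token) + @"(?!\w)";
        return Regex.IsMatch(source, pattern);
    }
}
```

Here TokenSearch.ContainsToken("I like C# a lot", "C#") is true, ContainsToken("any way", "any") is true, and ContainsToken("anyway", "any") is false.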

Related

C# Regex, any more efficient way to parse string enclosed by symbol?

I'm not sure if it's okay to ask... But here goes.
I implemented a method that parses a string using regex; each match is passed to a delegate, in this order (I think the order is not actually important, but I wrote it this way and it's not fully tested):
Pattern Regex.Replace: @"(?<!\\)\$.+?\$" then String.Replace: @"\$", @"$"; replaces a string enclosed by dollar signs, ignoring backslash-escaped ones, then erases the backslash. Ex: "$global name$" -> "motherofglobalvar", "Money \$9000" -> "Money $9000"
Pattern Regex.Replace @"(?<!\\)%.+?%" then String.Replace @"\%", @"%"; replaces a string enclosed by percent signs, ignoring backslash-escaped ones, then erases the backslash. Same as the previous example: "%local var%" -> "lordoflocalvar", "It's over 9000\%" -> "It's over 9000%"
Pattern Regex.Replace @"(?<!\\)#" then String.Replace @"\#", @"#"; replaces the char '#' with a space, ' ', ignoring backslash-escaped ones, then erases the backslash. Ex: "I#hit#the#ground#too#hard" -> "I hit the ground too hard", "qw\#op" -> "qw#op"
What I've done without much experience (I think):
//parse variable
public static string ParseVariable(string text)
{
return Regex.Replace(Regex.Replace(Regex.Replace(text, @"(?<!\\)\$.+?\$", match =>
{
string trim = match.Value.Trim('$');
string trimUpper = trim.ToUpper();
return variableGlobal.ContainsKey(trim) ? variableGlobal[trim] : match.Value;
}).Replace(@"\$", @"$"), @"(?<!\\)%.+?%", match =>
{
string trim = match.Value.Trim('%');
string trimUpper = trim.ToUpper();
return variableLocal.ContainsKey(trim) ? variableLocal[trim] : match.Value;
}).Replace(@"\%", @"%"), @"(?<!\\)#", " ").Replace(@"\#", @"#");
}
In short, what I used is: Regex.Replace().Replace()
Since I need to parse 3 kinds of symbols, I chained it as following: Regex.Replace(Regex.Replace(Regex.Replace().Replace()).Replace()).Replace()
Is there any more efficient way than this? I mean, like without need to go through the text 6 times? (3 times regex.replace, 3 times string.replace, where each replace modifies the text to be used by the next replace )
Or is it the best way it can do?
Thanks.

Here's a different take on the problem, I think. You can build a class that constructs the overall pattern piece by piece. This class is also responsible for generating the MatchEvaluator delegate that will be passed to Replace.
class RegexReplacer
{
public string Pattern { get; private set; }
public string Replacement { get; private set; }
public string GroupName { get; private set; }
public RegexReplacer NextReplacer { get; private set; }
public RegexReplacer(string pattern, string replacement, string groupName, RegexReplacer nextReplacer = null)
{
this.Pattern = pattern;
this.Replacement = replacement;
this.GroupName = groupName;
this.NextReplacer = nextReplacer;
}
public string GetAggregatedPattern()
{
string constructedPattern = this.Pattern;
string alternation = (this.NextReplacer == null ? string.Empty : "|" + this.NextReplacer.GetAggregatedPattern()); // If there isn't another replacer, then we won't have an alternation; otherwise, we build an alternation between this pattern and the next replacer's "full" pattern
constructedPattern = string.Format("(?<{0}>{1}){2}", this.GroupName, this.Pattern, alternation); // The (?<XXX>) syntax builds a named capture group. This is used by our GetReplaceDelegate method.
return constructedPattern;
}
public MatchEvaluator GetReplaceDelegate()
{
return (match) =>
{
if (match.Groups[this.GroupName] != null && match.Groups[this.GroupName].Length > 0) // Did we get a hit on the group name?
{
return this.Replacement;
}
else if (this.NextReplacer != null) // No? Then is there another replacer to inspect?
{
MatchEvaluator next = this.NextReplacer.GetReplaceDelegate();
return next(match);
}
else
{
return match.Value; // No? Then simply return the value
}
};
}
}
It should be obvious what Pattern and Replacement represent. GroupName is kind of a hack to let the replacement evaluator know which RegexReplacer fragment resulted in the match. NextReplacer points to another replacer instance that holds a different pattern fragment (and so on down the chain).
The idea here is to have a kind of linked list of objects that represents the overall pattern. You can call GetAggregatedPattern on the outermost replacer to get the full pattern: each replacer calls the next replacer's GetAggregatedPattern to get that replacer's pattern fragment, to which it concatenates its own fragment. GetReplaceDelegate generates a MatchEvaluator. This MatchEvaluator compares its own GroupName to the Match's captured groups. If the group name was captured, then we have a hit, and we return this replacer's Replacement value. Otherwise, we step into the next replacer (if there is one) and repeat the group name comparison. If there is no hit on any replacer, then we simply yield back the original value (i.e. what was matched by the pattern; this should be rare).
The usage of such might look like this:
string target = @"$global name$ Money \$9000 %local var% It's over 9000\% I#hit#the#ground#too#hard qw\#op";
RegexReplacer dollarWrapped = new RegexReplacer(@"(?<!\\)\$[^$]+\$", "motherofglobalvar", "dollarWrapped");
RegexReplacer slashDollar = new RegexReplacer(@"\\\$", "$", "slashDollar", dollarWrapped); // "\$" becomes "$", as the question asks
RegexReplacer percentWrapped = new RegexReplacer(@"(?<!\\)%[^%]+%", "lordoflocalvar", "percentWrapped", slashDollar);
RegexReplacer slashPercent = new RegexReplacer(@"\\%", "%", "slashPercent", percentWrapped); // "\%" becomes "%"
RegexReplacer singleAt = new RegexReplacer(@"(?<!\\)#", " ", "singleAt", slashPercent);
RegexReplacer slashAt = new RegexReplacer(@"\\#", "#", "slashAt", singleAt);
RegexReplacer replacer = slashAt;
string pattern = replacer.GetAggregatedPattern();
MatchEvaluator evaluator = replacer.GetReplaceDelegate();
string result = Regex.Replace(target, pattern, evaluator);
Because you want each replacer to know if it got a hit, and because we are hacking this by using group names, you want to make sure that each group name is distinct. A simple way to ensure this would be to use a name that's identical to the variable name since you can't have two variables with the same name within the same scope.
You can see above that I am building each part of the pattern separately, but as I build, I pass the previous replacer as the 4th parameter to the current replacer. This builds the chain of replacers. Once built, I use the last replacer constructed to generate the overall pattern and evaluator. If you use any other replacer, you will only get part of the overall pattern. Finally, it's simply a matter of passing the generated pattern and evaluator to the Replace method.
Keep in mind that this approach was targeted more at the problem as described. It may work in more general scenarios, but I've only worked with what you've presented. Also, since this is more of a parsing question, a parser may be the proper route to take--although the learning curve is going to be higher.
Also keep in mind that I haven't profiled this code. It certainly doesn't loop over the target string multiple times, but it does involve additional method calls during replacement. You would certainly want to test it in your environment.
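For comparison, the same single-pass idea can be sketched without the replacer chain, using one alternation and one MatchEvaluator. This is only an illustration under stated assumptions: the class name VariableParser, the method Parse, and the two dictionary parameters are hypothetical stand-ins for the asker's variableGlobal and variableLocal, and the pattern uses [^$]+ and [^%]+ rather than the original lazy .+? quantifiers. It has not been profiled either.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class VariableParser
{
    // The escaped-symbol alternative comes first, so "\$" is consumed as a unit
    // before "$...$" could start at that dollar sign. No lookbehind is needed.
    private static readonly Regex Token = new Regex(
        @"\\(?<esc>[$%#])|\$(?<global>[^$]+)\$|%(?<local>[^%]+)%|#");

    public static string Parse(string text,
        IDictionary<string, string> globals,
        IDictionary<string, string> locals)
    {
        return Token.Replace(text, match =>
        {
            if (match.Groups["esc"].Success)
                return match.Groups["esc"].Value;   // "\$" -> "$", "\%" -> "%", "\#" -> "#"
            if (match.Groups["global"].Success)
                return Lookup(globals, match.Groups["global"].Value, match.Value);
            if (match.Groups["local"].Success)
                return Lookup(locals, match.Groups["local"].Value, match.Value);
            return " ";                             // a bare '#' becomes a space
        });
    }

    // Unknown variable names are left untouched, as in the question's code.
    private static string Lookup(IDictionary<string, string> table, string key, string fallback)
    {
        string value;
        return table.TryGetValue(key, out value) ? value : fallback;
    }
}
```

The text is scanned once, and each match is dispatched on whichever named group succeeded.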

Best way to provide the user an escape string

Suppose I want to ask a user what format they want a certain output to be in and the output will include fill-in fields. So they provide something like this string:
"Output text including some field {FieldName1Value} and another {FieldName2Value} and so on..."
Anything bound by {} should be a column name in a table somewhere; the code I am writing will replace it with the stored value. Seems simple: I could just do a string.Replace on any instance that matches the pattern "{" + FieldName + "}". But what if I also want to give the user the option of using an escape so they can use brackets like any other string? I was thinking they provide "{{" or "}}" to escape that bracket, which is nice and easy for them. So, they could provide something like:
"Output text including some field {FieldName1Value} and another {FieldName2Value} but not this {{FieldName2Value}}"
But now "{{FieldName2Value}}" is to be treated like any other string and ignored by the Replace. Also, if they decided to put something like "{{{FieldName2Value}}}" with the triple brackets, that would be interpreted by the code as the field value wrapped with brackets, and so on.
This is where I get stuck. I am trying with RegEx and came up with this:
public object Convert(object[] values, Type targetType, object parameter, CultureInfo culture)
{
string format = (string)values[0];
ObservableCollection<CalloutFieldAliasMap> oc = (ObservableCollection<CalloutFieldAliasMap>)values[1];
foreach (CalloutFieldMap map in oc)
format = Regex.Replace(format, @"(?<!{){" + map.FieldName + "(?<!})}", " " + map.FieldAlias + " ", RegexOptions.IgnoreCase);
return format;
}
This works in the situation with double brackets {{ }} but NOT if there are three, i.e. {{{ }}}. The triple brackets are treated as a plain string when they should be treated as {FieldValue}.
Thanks for any help.

By expanding on your regular expression, the presence of literals can be accommodated.
format = Regex.Replace(format,
@"(?<!([^{]|^){(?:{{)*){" + Regex.Escape(map.FieldName) + "}",
String.Format(" {0} ", map.FieldAlias),
RegexOptions.IgnoreCase | RegexOptions.Compiled);
The first part of the expression, (?<!([^{]|^){(?:{{)*){, designates that the { must be preceded by an even number of { characters for it to mark the beginning of a field token. Thus, {FieldName} and {{{FieldName} will denote the start of a field name, whereas {{FieldName} and {{{{FieldName} would not.
The closing } simply requires that the end of the field be a single }. There is some ambiguity in the syntax in that {FieldName1Value}}} could be parsed as a token with FieldName1Value (followed by the literal }) or FieldName1Value}. The regex assumes the former. (If the latter is intended, you could replace this with }(?!}(}})*) instead.)
A couple of other notes. I added Regex.Escape(map.FieldName) so that all characters in the field name are treated as literals; and added the RegexOptions.Compiled flag. (Since this is both a complex expression and executed in a loop, it is a good candidate for compilation.)
After the loop executes, a simple:
format = format.Replace("{{", "{").Replace("}}", "}");
can be used to unescape the literal {{ and }} characters.

The simplest way would be to use String.Replace to replace the double brackets with a character sequence that the user can not (or almost certainly will not) enter. Then do the replacement of your fields, and finally convert replacement back to the double brackets.
For example, given:
string replaceOpen = "{x"; // 'x' should be something like \u00ff, for example
string replaceClose = "x}";
string template = "Replace {ThisField} but not {{ThatField}}";
string temp = template.Replace("{{", replaceOpen).Replace("}}", replaceClose);
string converted = temp.Replace("{ThisField}", "Foo");
string final = converted.Replace(replaceOpen, "{{").Replace(replaceClose, "}}");
It's not particularly pretty, but it's effective.
How you go about it is going to depend in large part on how often you call this, and how fast you really need it to be.

I have an extension method I wrote that almost does what you ask, but, while it does escape using double braces, it doesn't do the triple braces like you suggested. Here is the method (also on GitHub at https://github.com/benallred/Icing/blob/master/Icing/Icing.Core/StringExtensions.cs):
private const string FormatTokenGroupName = "token";
private static readonly Regex FormatRegex = new Regex(@"(?<!\{)\{(?<" + FormatTokenGroupName + @">\w+)\}(?!\})", RegexOptions.Compiled);
public static string Format(this string source, IDictionary<string, string> replacements)
{
if (string.IsNullOrWhiteSpace(source) || replacements == null)
{
return source;
}
string replaced = replacements.Aggregate(source,
(current, pair) =>
FormatRegex.Replace(current,
new MatchEvaluator(match =>
(match.Groups[FormatTokenGroupName].Value == pair.Key
? pair.Value : match.Value))));
return replaced.Replace("{{", "{").Replace("}}", "}");
}
Usage:
"This is my {FieldName}".Format(new Dictionary<string, string>() { { "FieldName", "value" } });
Even easier if you add this:
public static string Format(this string source, object replacements)
{
if (string.IsNullOrWhiteSpace(source) || replacements == null)
{
return source;
}
IDictionary<string, string> replacementsDictionary = new Dictionary<string, string>();
foreach (PropertyDescriptor propertyDescriptor in TypeDescriptor.GetProperties(replacements))
{
string token = propertyDescriptor.Name;
object value = propertyDescriptor.GetValue(replacements);
replacementsDictionary.Add(token, (value != null ? value.ToString() : String.Empty));
}
return Format(source, replacementsDictionary);
}
Usage:
"This is my {FieldName}".Format(new { FieldName = "value" });
Unit tests for this method are at https://github.com/benallred/Icing/blob/master/Icing/Icing.Tests/Core/TestOf_StringExtensions.cs
If this doesn't work, what would your ideal solution do for more than three braces? In other words, if {{{FieldName}}} becomes {value}, what does {{{{FieldName}}}} become? What about {{{{{FieldName}}}}} and so on? While those cases are unlikely, they still need to be handled purposefully.

RegEx will not do what you want because it only knows its current state and what transitions are available. It has no concept of memory. The language you're trying to parse is not regular, so you will never be able to write a RegEx to handle the general case. You would need i expressions, where i is the number of matching braces.
There is a lot of theory behind this and I'll provide some links at the bottom if you're curious. But basically the language you're trying to parse is context-free, and to implement a general solution you'll need to model a pushdown automaton, which uses a stack to ensure that an opening brace has a matching closing brace (yes, this is why most languages have matching braces).
Each time you encounter { you push it onto the stack. If you encounter } you pop from the stack. When the stack is empty, you know that you've reached the end of a field. Of course that's a major simplification of the problem, but if you're looking for a general solution it should get you moving in the right direction.
http://en.wikipedia.org/wiki/Regular_language
http://en.wikipedia.org/wiki/Context-free_language
http://en.wikipedia.org/wiki/Pushdown_automaton
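For the specific doubled-brace escape rule described in the question, a hand-written left-to-right scan may already be enough. The sketch below is not the pushdown automaton described above and keeps no stack; it simply assumes that "{{" and "}}" always mean a literal brace and that a single "{" starts a field name running to the next "}". The names TemplateScanner and Expand and the dictionary parameter are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text;

public static class TemplateScanner
{
    // Hypothetical helper: expands {Name} from "fields" and unescapes {{ and }}.
    public static string Expand(string template, IDictionary<string, string> fields)
    {
        var output = new StringBuilder();
        int i = 0;
        while (i < template.Length)
        {
            char c = template[i];
            bool doubled = i + 1 < template.Length && template[i + 1] == c;
            if ((c == '{' || c == '}') && doubled)
            {
                output.Append(c);   // "{{" or "}}" is an escaped literal brace
                i += 2;
            }
            else if (c == '{')
            {
                int close = template.IndexOf('}', i + 1);
                string value;
                if (close > i && fields.TryGetValue(template.Substring(i + 1, close - i - 1), out value))
                {
                    output.Append(value);   // known field: emit the stored value
                    i = close + 1;
                }
                else
                {
                    output.Append(c);       // unknown or unterminated field: keep the brace
                    i++;
                }
            }
            else
            {
                output.Append(c);
                i++;
            }
        }
        return output.ToString();
    }
}
```

With a dictionary mapping "FieldName2Value" to "x", the input "{{{FieldName2Value}}}" expands to "{x}" and "{{FieldName2Value}}" expands to "{FieldName2Value}".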

Replace part of a string with new value

I've got a scenario in which I need to replace part of a string with new text.
For example, if my string is "01HW128120", I will first check if the text contains "01HW". If yes, then replace that with the string "MachineID-".
So eventually I want "01HW128120" to become "MachineID-128120". Sometimes I get the string as "1001HW128120". In this case I also need to replace the "1001HW" with "MachineID-".
I tried the code snippet below, but it does not work as I expect.
string sampleText = "01HW128120";
if(sampleText.Contains("01HW"))
sampleText = sampleText.Replace("01HW","MachineID-");
Any suggestion would be of great help to me.

Few Possible Search Values
If there are only a few possible combinations, you can simply do multiple tests:
string value = "01HW128120";
string replacement = "MachineID-";
// Test the longer prefix first: "1001HW128120" also contains "01HW"
if( value.Contains( "1001HW" ) ) {
value = value.Replace( "1001HW", replacement );
}
else if( value.Contains( "01HW" ) ) {
value = value.Replace( "01HW", replacement );
}
Assert.AreEqual( "MachineID-128120", value );
Many Possible Search Values
Of course, this approach quickly becomes unwieldy if you have a large quantity of possibilities. Another approach is to keep all of the search strings in a list.
string value = "01HW128120";
string replacement = "MachineID-";
var tokens = new List<string> {
"1001HW", // list longer tokens first, because "1001HW" also contains "01HW"
"01HW"
// n number of potential search strings here
};
foreach( string token in tokens ) {
if( value.Contains( token ) ) {
value = value.Replace( token, replacement );
break;
}
}
"Smarter" Matching
A regular expression is well-suited for string replacement if you have a manageable number of search strings but you perhaps need not-exact matches, case-insensitivity, lookaround, or capturing of values to insert into the replaced string.
An extremely simple regex which meets your stated requirements: 1001HW|01HW.
Demo: http://regexr.com?34djm
A slightly smarter regex: ^\d{2,4}HW
Assert position at start of string
Match 2-4 digits
Match the value "HW" literally
See also: Regex.Replace Method
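The "slightly smarter" pattern above could be applied roughly as follows. This is a sketch only; MachineIdFormatter and Normalize are names invented for the example.

```csharp
using System.Text.RegularExpressions;

public static class MachineIdFormatter
{
    // Hypothetical helper: replaces a leading run of 2-4 digits followed by "HW".
    public static string Normalize(string value)
    {
        // ^ anchors at the start, so an "HW" later in the string is never touched.
        return Regex.Replace(value, @"^\d{2,4}HW", "MachineID-");
    }
}
```

Both MachineIdFormatter.Normalize("01HW128120") and MachineIdFormatter.Normalize("1001HW128120") return "MachineID-128120"; a value with no such prefix is returned unchanged.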

If you just want to replace everything up to "01HW" with "MachineID-", you could use a generic regex:
sampleText = Regex.Replace(sampleText, "^.*01HW", "MachineID-");

Is there a way to evaluate more than one string inside of a string.contains() method?

if (description.ToUpper().Contains("BOUGHT") || description.ToUpper().Contains("PURCHASE"))
The code above is what I have and I wondered if I had a longer list of strings for the same condition, how I would do it without making the code too long. Maybe a lambda expression?

No, there is no built in function. But it's not hard to write it yourself:
string[] needles = new string[]{"BOUGHT", "PURCHASE"};
string haystack = description.ToUpperInvariant();
bool found = needles.Any(needle=> haystack.Contains(needle));
I only convert haystack to upper case once to improve performance.
Alternatively you could use IndexOf(needle, StringComparison.OrdinalIgnoreCase)>=0:
string[] needles = new string[]{"BOUGHT", "PURCHASE"};
string haystack = description;
bool found = needles.Any(needle=> haystack.IndexOf(needle, StringComparison.OrdinalIgnoreCase)>=0);
You should not use ToUpper() here, since that uses the current culture. Using the current culture can lead to unexpected problems on some computers, for example i does not uppercase to I when using the Turkish culture.
There might still be some subtle problems remaining where ToUpperInvariant() on both sides and a case-insensitive comparison return different results, but that's only relevant if you have unusual characters in both your haystack and needles.

You can rework the code to something like this:
var words = new[] { "BOUGHT", "PURCHASE" };
var desc = description.ToUpper();
if (words.Any(w => desc.Contains(w))) {
// something matched
}

if (someCollectionOfStrings.Any(s => originalString.Contains(s)))
{
//stuff
}

Use a regular expression:
if (Regex.IsMatch(description, "purchase|bought", RegexOptions.IgnoreCase)) {
// ...
}

Regex.IsMatch(input, string.Join("|", strings));
You might have to escape the strings (for example with Regex.Escape) if they contain regex control characters.
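A minimal sketch of the escaped version, assuming the keywords are plain text rather than regex fragments. KeywordMatcher and ContainsAny are hypothetical names; the empty-list guard is an added assumption, since an empty pattern would otherwise match every input.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordMatcher
{
    // Hypothetical helper: true when "input" contains any keyword, ignoring case.
    public static bool ContainsAny(string input, IEnumerable<string> keywords)
    {
        // Escape each keyword so characters like '+' or '.' are matched literally.
        string[] escaped = keywords.Select(k => Regex.Escape(k)).ToArray();
        if (escaped.Length == 0)
        {
            return false;   // an empty pattern would match any input
        }
        string pattern = string.Join("|", escaped);
        return Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase);
    }
}
```

Without the escaping, a keyword such as "1+1" would be read as "one or more 1s followed by 1" and would match the input "11".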

public static bool ContainsOneOfManyIgnoreCase(this string str, params string [] items)
{
return items.Any(x => str.IndexOf(x, StringComparison.CurrentCultureIgnoreCase) != -1);
}

What's the most efficient way to pull culture information out of a resx's filename in c#?

What's the most efficient way to pull culture information out of a resx's filename using C#? The solution should also handle there not being culture info in the file name (i.e. form1.resx). In that case the string "Default" should be returned.

This seems like it would work:
string GetCulture(string s)
{
var arr=s.Split('.');
if(arr.Length>2)
return arr[1];
else
return "Default";
}

Your best bet is just using regex:
string ParseCulture(string input)
{
var r = new Regex(@"[\w\d]+\.([\w\-]+)\.resx");
// Match the regular expression pattern against a text string.
Match m = r.Match(input);
if (m.Success)
{
return m.Groups[1].Value;
}
else
{
return "Default";
}
}
This gets back a match (with the culture name captured in group 1), or no match (which means you should use "Default").
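Another option, sketched here under the assumption that the base file name itself contains no dots (so "form1.fr-FR.resx" is fine, but "My.Forms.Main.resx" would be misread), is to peel off the extensions with System.IO.Path. ResxCulture and GetCulture are hypothetical names.

```csharp
using System.IO;

public static class ResxCulture
{
    // Hypothetical helper: returns the culture segment of "name.culture.resx",
    // or "Default" when there is no such segment.
    public static string GetCulture(string fileName)
    {
        // "form1.fr-FR.resx" -> "form1.fr-FR"
        string withoutResx = Path.GetFileNameWithoutExtension(fileName);
        // "form1.fr-FR" -> ".fr-FR"; "form1" -> ""
        string culture = Path.GetExtension(withoutResx);
        return string.IsNullOrEmpty(culture) ? "Default" : culture.TrimStart('.');
    }
}
```

ResxCulture.GetCulture("form1.fr-FR.resx") returns "fr-FR", and ResxCulture.GetCulture("form1.resx") returns "Default".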
