I am trying to use Regex to count the number of times a certain string appears in another comma-separated string.
I am using Regex.Matches(comma-separated string, certain string).Count to grab the number. The only issue I have is that I want it to simply count as a match if it lines up at the start of the string.
For instance, if I have the comma separated string
string comma_separated = "dog,cat,bird,blackdog,dog(1)";
and want to see how many times the search string matches with the contents of the comma-separated string
string search = "dog";
I use:
int count = Regex.Matches(comma_separated, search).Count;
I would expect it to be 2 since it matches up with
"dog,cat,bird,blackdog,dog(1)",
however it returns a 3 since it is also matching up with the dog part of blackdog.
Is there any way I can get it to only count as a match when it recognizes a match starting at the start of the string? Or am I just using Regex incorrectly?
As noted in the comments, a regex may not be the most logical way for you to achieve your desired result. However, if you would like to use a regex to find your matches, something like this would provide your desired result
(?<=,|^)dog
This will perform a "positive lookbehind" to ensure that the word "dog" is preceded by either a comma or is at the start of the string you are searching.
More info available on lookarounds in Regex here: https://www.regular-expressions.info/lookaround.html
string comma_separated = "dog,cat,bird,blackdog,dog(1)";
int count = Regex.Matches(comma_separated, string.Format(@"\b{0}\b", Regex.Escape("dog")), RegexOptions.IgnoreCase).Count;
By appending the \b to either side of the text you can find the "EXACT" match within the text.
Try using this pattern: search = @"\bdog";. \b matches a word boundary.
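As a rough sketch of the lookbehind answer next to a regex-free alternative, the snippet below counts items that begin with the search text. The helper names CountWithRegex and CountWithSplit are made up for illustration, and the code assumes a C# version that supports top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: count items that begin with `search`, using a lookbehind.
static int CountWithRegex(string commaSeparated, string search)
{
    // Regex.Escape keeps characters such as '(' or '.' in the search text literal.
    string pattern = "(?<=^|,)" + Regex.Escape(search);
    return Regex.Matches(commaSeparated, pattern).Count;
}

// Hypothetical helper: the same count without a regex.
// Split on commas, then test each item's prefix.
static int CountWithSplit(string commaSeparated, string search)
{
    return commaSeparated
        .Split(',')
        .Count(item => item.StartsWith(search, StringComparison.Ordinal));
}

string commaSeparated = "dog,cat,bird,blackdog,dog(1)";
Console.WriteLine(CountWithRegex(commaSeparated, "dog")); // 2
Console.WriteLine(CountWithSplit(commaSeparated, "dog")); // 2
```

Both versions skip blackdog because the text before dog is neither a comma nor the start of the string.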
Related
I am really struggling with RegEx. I want my RegEx (if possible) to do 2 things:
1- Validate that the whole string respects the format NAME_STKBYGRP.CSV
2- Extract the NAME substring if match
Examples:
TEST_STKBYGRP.CSV -> TEST
other_stkbygrp.csv -> other
test_wrong.csv -> ""
Here is what I tried so far
string input = "NAME_STKBYGRP.CSV";
Regex regex = new Regex("([A-Z])*_STKBYGRP.CSV", RegexOptions.IgnoreCase);
string s = regex.Match(input).Value;
It does return "" if it doesn't match, but it returns the whole input if it matches.
You need to read regex.Match(input).Groups[1].Value if you only want the value of the first group.
You should also add a ^ and $ at the start and end of your regex if you want to rule out strings like evilnumber12345_NAME_STKBYGRP.CSVevilsuffix.
Edit: adv12 also has a good point about the location of the * - it should be inside the parentheses.
First off, your * should be inside the parentheses. Otherwise, you'll capture several single-character groups. Then, use Match.Groups[1] to get just the characters matched by the portion of the regex in the parentheses.
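Putting the suggestions from both answers together, a minimal sketch might look like the following. ExtractName is a hypothetical helper name; using + rather than * (so an empty name is rejected) and escaping the dot are my assumptions, not something the question states. The code assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the NAME part of "NAME_STKBYGRP.CSV",
// or "" when the input does not have that format.
static string ExtractName(string input)
{
    // ^ and $ anchor the whole string; the quantifier sits inside the
    // parentheses so the group captures the full name; \. is a literal dot.
    Match m = Regex.Match(input, @"^([A-Z]+)_STKBYGRP\.CSV$", RegexOptions.IgnoreCase);
    return m.Success ? m.Groups[1].Value : "";
}

Console.WriteLine(ExtractName("TEST_STKBYGRP.CSV"));  // TEST
Console.WriteLine(ExtractName("other_stkbygrp.csv")); // other
Console.WriteLine(ExtractName("test_wrong.csv"));     // prints an empty line
```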
I have a string "myname 18-may 1234" and I want only "myname" from whole string using a regex.
I tried using the \b(^[a-zA-Z]*)\b regex and that gave me "myname" as a result.
But when the string changes to "1234 myname 18-may" the regex does not return "myname". Please suggest the correct way to select only "myname" whole word.
Is it also possible, given the string in "1234 myname 18-may" format, to get myname only, not may?
UPDATE
Judging by your feedback to your other question you might need
(?<!\p{L})\p{L}+(?!\p{L})
ORIGINAL ANSWER
I have come up with a lighter regex that relies on the specific nature of your data (just a couple of words in the string, only one is whole word):
\b(?<!-)\p{L}+\b
Or even a more restrictive regex that finds a match only between (white)spaces and string start/end:
(?<=^|\s)\p{L}+(?=\s|$)
The following regex is context-dependent:
\p{L}+(?=\s+\d{1,2}-\p{L}{3}\b)
This will match only the word myname.
The regex means:
\p{L}+ - Match 1 or more Unicode letters...
(?=\s+\d{1,2}-\p{L}{3}\b) - until it finds 1 or more whitespaces (\s+) followed with 1 or 2 digits, followed with a hyphen and 3 Unicode letters (\p{L}{3}) which is a whole word (\b). This construction is a positive look-ahead that only checks if something can be found after the current position in the string, but it does not "consume" text.
Since the date may come before the string, you can add an alternation:
\p{L}+(?=[ ]+\d{1,2}-\p{L}{3}\b)|(?<=\d{1,2}-\p{L}{3}[ ]+)\p{L}+
The (?<=\d{1,2}-\p{L}{3}[ ]+) is a look-behind that checks for (almost) the same thing as the look-ahead, but before the myname.
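To try one of these patterns against both orderings of the input, here is a small sketch using the whitespace-delimited pattern from this answer. FirstWholeWord is a hypothetical helper name, and the code assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the first run of letters that is delimited
// by whitespace or by the start/end of the string, or "" if there is none.
static string FirstWholeWord(string input)
{
    Match m = Regex.Match(input, @"(?<=^|\s)\p{L}+(?=\s|$)");
    return m.Success ? m.Value : "";
}

Console.WriteLine(FirstWholeWord("myname 18-may 1234")); // myname
Console.WriteLine(FirstWholeWord("1234 myname 18-may")); // myname
```

In both inputs may is skipped because it is preceded by a hyphen rather than by whitespace.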
Here is a solution without regex:
string input = "myname 18-may 1234";
string result = input.Split(' ').Where(x => x.All(y => char.IsLetter(y))).FirstOrDefault();
Do a replace using this regex:
(\s*\d+\-.{3}\s*|\s*.{3}\-\d+\s*)|(\s*\d+\s*)
you will end up with just your name.
I need to recognize the number between the tags [DN]4[-DN] so I wrote this regex:
Regex regexCount = new Regex(@"\[DN]([^)]*)\[-DN]");
Match matchCount = regexCount.Match("[DN]4[-DN]");
However, when I try to convert the matched string to an Int32, I get this error:
Input string was not in a correct format.
This is how I tried converting:
int count = Convert.ToInt32(matchCount.Value);
When I debugged, I saw that the matched value returns {[DN]2[-DN]} instead of 2. However, the regex101 test gave the correct result with the same regex.
What am I doing wrong folks?
You're currently returning the entire match. You need to return the matched context from your capturing group. The Groups property gets the captured groups within the regular expression.
int Count = Convert.ToInt32(matchCount.Groups[1].Value);
Also, the negated character class [^)]* seems incorrect here, since it matches any character other than a closing parenthesis; I would use the regex token \d instead.
@"\[DN](\d+)\[-DN]"
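A minimal sketch of the whole fix, reading the capture group and parsing it defensively, could look like this. ReadDnCount is a hypothetical helper name, and returning null for a missing or unparsable number is my own choice rather than something the question asks for. The code assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the number between [DN] and [-DN],
// or null when the tags are absent or the digits do not fit in an int.
static int? ReadDnCount(string input)
{
    Match m = Regex.Match(input, @"\[DN](\d+)\[-DN]");
    // m.Value (Groups[0]) is the whole match, tags included;
    // Groups[1] holds only the digits inside the parentheses.
    if (m.Success && int.TryParse(m.Groups[1].Value, out int count))
    {
        return count;
    }
    return null;
}

Console.WriteLine(ReadDnCount("[DN]4[-DN]")); // 4
```

int.TryParse is used instead of Convert.ToInt32 so that unexpected input yields null rather than an exception.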
How can I use regex to replace matching strings that do not include a specific string?
input string
Keepword mywordsecond mythirdword myfourthwordKeep
string to replace
word
exclude string
Keep
Desired output
Keepword mysecond mythird myfourthKeep
Will there ever be more than one word in a word? If there are more than one, do you want to replace all of them? If not, this should sort you out:
Regex r = new Regex(@"\b((?:(?!Keep|word)\w)*)word((?:(?!Keep)\w)*)\b");
s1 = r.Replace(s0, "$1$2");
to explain:
First, \b((?:(?!Keep|word)\w)*) captures whatever text precedes the first occurrence of word or Keep.
The next thing it sees must be word. If it sees Keep or the end of the string instead, the match attempt immediately fails.
Then ((?:(?!Keep)\w)*)\b captures the remainder of the text in order to ensure it doesn't contain Keep.
When faced with a problem like this, most users' first impulse is to match (in the sense of consuming) only the part of the string they're interested in, using lookarounds to establish the context. It's usually much easier to write the regex so that it always moves forward through the string as it matches. You capture the parts you want to retain so you can plug them back into the result string by means of group references ($1, $2, etc.).
Given that you're using C#, you could use the lookaround approach:
Regex r = new Regex(@"(?<!Keep\w*)word(?!\w*Keep)");
s1 = r.Replace(s0, "");
But please don't. There are very few regex flavors that support unrestricted lookbehinds like .NET does, and most problems don't work so neatly as this one anyway.
string str = "Keepword mywordsecond mythirdword myfourthwordKeep";
str = Regex.Replace(str, "(?<!Keep)word", "");
I'm also going to link you to a good regular expressions cheat sheet here.
This works in Notepad++:
(?<!Keep)word(?!Keep)
It uses a negative lookbehind and a negative lookahead.
You can use a negative look-behind assertion if you want to remove every "word" that is not preceded by "Keep":
String input = "Keepword mywordsecond mythirdword myfourthwordKeep";
String pattern = "(?<!Keep)word";
String output = Regex.Replace(input, pattern, "");
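The lookbehind-only pattern and the pattern with lookarounds on both sides are not interchangeable: they differ on the last token of the sample input. The sketch below runs both on the question's input so the difference is visible; the variable names are illustrative, and the code assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Keepword mywordsecond mythirdword myfourthwordKeep";

// Lookbehind only: leave "word" alone when "Keep" comes immediately before it.
string lookbehindOnly = Regex.Replace(input, @"(?<!Keep)word", "");

// Lookbehind and lookahead: also leave "word" alone when "Keep" comes
// immediately after it.
string bothSides = Regex.Replace(input, @"(?<!Keep)word(?!Keep)", "");

Console.WriteLine(lookbehindOnly); // Keepword mysecond mythird myfourthKeep
Console.WriteLine(bothSides);      // Keepword mysecond mythird myfourthwordKeep
```

Only the lookbehind-only version reproduces the desired output given in the question, because that output removes word from myfourthwordKeep.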
I have a regex which searches for strings using a Prefix and a Suffix. In its simplest form, \$\$\w+\$\$ will find $$My_Name$$ (in this case the Prefix and Suffix are both equal to $$). Another example would be \[\#\w+\#\] to match [#My_Name#].
The Prefix and Suffix will always be a specific string of 0 to n characters which I can always safely escape for a direct character match.
I extract the Matches in my C# code so I can work with them, but my match contains $$My_Name$$. What I want is just the inner string between the Prefix and Suffix: My_Name.
How do I exclude the Prefix and Suffix from the result?
Change your REGEX to \$\$(\w+)\$\$ and use $1 to get the matching (inner) string.
For example
string pattern = #"\$\$(\w+)\$\$";
string input = "$$My_Name$$";
Regex rgx = new Regex(pattern);
Match result = rgx.Match(input);
Console.WriteLine(result.Groups[1]);
Outputs: "My_Name"
P.S - There's no need to use explicitly typed local variables, but I just wanted the types to be clear.
You can wrap your \w+ in a capturing group, like this: (\w+). Then, when you retrieve the matched string, you can ask for that subgroup.
I may be wrong (you didn't provide any code), but I think this is how it is done: call .Groups[1].Value on the result of Regex.Match.
How about the regex below? It works by capturing the first character into a named group (first_char), then capturing that character plus any repeats into a named group called first_group, which it then uses to match the end of the string. It will work with any number of repeated characters, so long as they are repeated at the end of the word.
'(?<first_group>(?<first_char>.)\k<first_char>+)(?<word>\w+)\k<first_group>+'
You just need to then extract the capture group named word like so:
String sample = "$$My_Name$$";
Regex regex = new Regex(@"(?<first_group>(?<first_char>.)\k<first_char>+)(?<word>\w+)\k<first_group>+");
Match match = regex.Match(sample);
if (match.Success)
{
Console.WriteLine(match.Groups["word"].Value);
}
You can use named group like this:
(\$\$)(?<group1>.+?)\1 -- pattern 1 (first case)
\[(#)(?<group2>.+?)\1\] -- pattern 2 (second case)
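or a combined representation would be the following (note that .NET numbers unnamed groups before named ones, so the (#) group is \2 there; engines that number groups strictly left to right, such as PCRE, need \3 instead):
(\$\$)(?<group1>.+?)\1|\[(#)(?<group2>.+?)\2\]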
I would suggest using .+? (a lazy match), which matches any characters up to the first occurrence of your suffix.
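Since the question says the Prefix and Suffix are arbitrary strings that can be safely escaped, one possible sketch builds the pattern from them at run time and reads the capture group. ExtractInner is a hypothetical helper name, and keeping \w+ for the inner text follows the question's own patterns. The code assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the text between prefix and suffix,
// or "" when the input contains no such match.
static string ExtractInner(string input, string prefix, string suffix)
{
    // Regex.Escape makes delimiters such as "$$" or "[#" match literally.
    string pattern = Regex.Escape(prefix) + @"(\w+)" + Regex.Escape(suffix);
    Match m = Regex.Match(input, pattern);
    // Groups[1] is the part inside the parentheses, without the delimiters.
    return m.Success ? m.Groups[1].Value : "";
}

Console.WriteLine(ExtractInner("$$My_Name$$", "$$", "$$")); // My_Name
Console.WriteLine(ExtractInner("[#My_Name#]", "[#", "#]")); // My_Name
```

A single numbered group avoids having to keep track of how backreferences are numbered when named and unnamed groups are mixed.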