I have the following regex in my C# code:
(?<!\w)M20A\w+
Actual code:
string regex = $@"(?<!\w){prefix}\w+";
Note that the prefix variable holds strings such as M20A and X50G.
It perfectly matches the following cases:
M20A0820
M20A1234
M20A7U8V
But now I got a new requirement from the business to match, for example:
M20A-SDR
It will be the prefix followed by the exact string "-SDR". Not just a dash followed by 3 alphanumerics, but literally "-SDR". The existing matches need to still work, but prefix + "-SDR" must also be matched.
What would be the regex that would match the following:
M20A0820
M20A1234
M20A7U8V
M20A-SDR
You may use
string regex = $@"(?<!\w){prefix}\w*(?:-SDR)?";
Or, to match as a whole word, you may use word boundaries:
string regex = $@"\b{prefix}\w*(?:-SDR)?\b";
The \b word boundary at the start works if every value in prefix starts with a word char (a letter, digit or _). The word boundary at the end only makes sense if no more word chars can follow -SDR.
The (?:-SDR)? part optionally matches the -SDR string.
Details
\b - word boundary
M20A - a literal string
\w* - 0+ word chars
(?:-SDR)? - a non-capturing group that matches 1 or 0 times (as there is a ? after it) an -SDR substring
\b - a word boundary.
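A minimal console sketch (C# 9+ top-level statements) of the word-boundary version applied to the four sample values. The prefix value is hard-coded for illustration, and wrapping it in Regex.Escape is an addition not present in the answer, useful only if a prefix could ever contain regex metacharacters.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical prefix value; in the real code it comes from elsewhere.
string prefix = "M20A";

// Regex.Escape (an addition) makes any metacharacters in the prefix literal.
string pattern = $@"\b{Regex.Escape(prefix)}\w*(?:-SDR)?\b";

string input = "M20A0820 M20A1234 M20A7U8V M20A-SDR";

// Collect every match so the four sample values can be inspected.
string[] found = Regex.Matches(input, pattern)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(", ", found)); // M20A0820, M20A1234, M20A7U8V, M20A-SDR
```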
Related
I need to replace the word PARAM_DATETIME in the string:
string input = "^FT734,274^A0I,28,28^FH\\^FDPARAM_DATETIME^FS";
I'm trying with:
string newstr = Regex.Replace("^FT734,274^A0I,28,28^FH\\^FDPARAM_DATETIME^FS", @"\bPARAM_DATETIME\b", "27-01-2022");
but it doesn't work.
The goal is to match the word PARAM_DATETIME even when it is preceded, within the same word, by F and any uppercase letter (as in FDPARAM_DATETIME).
You can use
Regex.Replace(text, @"\b(F[A-Z])?PARAM_DATETIME\b", "${1}27-01-2022")
Details:
\b - a word boundary
(F[A-Z])? - Group 1 (optional): F and then any one ASCII uppercase letter
PARAM_DATETIME - a word
\b - a word boundary
The match is replaced with Group 1 value (${1}) and the hardcoded string.
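A quick sketch (C# 9+ top-level statements) running this replacement on the string from the question; the variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Input from the question; "\\^" is a single backslash followed by ^.
string input = "^FT734,274^A0I,28,28^FH\\^FDPARAM_DATETIME^FS";

// Group 1 keeps the optional F + uppercase letter (here FD) in front of the date.
string output = Regex.Replace(input, @"\b(F[A-Z])?PARAM_DATETIME\b", "${1}27-01-2022");

Console.WriteLine(output); // ^FT734,274^A0I,28,28^FH\^FD27-01-2022^FS
```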
If you only need to change the word PARAM_DATETIME, isn't it easier to use String.Replace?
string input = "^FT734,274^A0I,28,28^FH\\^FDPARAM_DATETIME^FS";
input = input.Replace("PARAM_DATETIME", "27-01-2022");
I have created a Regex Pattern (?<=[TCC|TCC_BHPB]\s\d{3,4})[-_\s]\d{1,2}[,]
This pattern only matches:
TCC 6005_5,
What should I change at the end so that it matches both of these strings:
TCC 6005-5 ,
TCC 6005_5,
You can add a non-greedy wildcard to your expression (.*?):
(?<=(?:TCC|TCC_BHPB)\s\d{3,4})[-_\s]\d{1,2}.*?[,]
This will now also match any characters between the last digit and the comma.
As has been pointed out in the comments, [TCC|TCC_BHPB] is a character class rather than a literal match, so I've changed this to (?:TCC|TCC_BHPB) which is presumably what your intention was.
This part of the pattern, [TCC|TCC_BHPB], is a character class that matches a single character from the listed ones. It could equally be written as [|_TCBHP].
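To check the lazy-wildcard version, here is a small console sketch (C# 9+ top-level statements) that runs it against both sample strings; the loop and variable names are only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Lazy .*? lets anything (e.g. a space) sit between the last digit and the comma.
string pattern = @"(?<=(?:TCC|TCC_BHPB)\s\d{3,4})[-_\s]\d{1,2}.*?[,]";

string[] inputs = { "TCC 6005-5 ,", "TCC 6005_5," };
string[] results = new string[inputs.Length];
for (int i = 0; i < inputs.Length; i++)
{
    Match m = Regex.Match(inputs[i], pattern);
    results[i] = m.Success ? m.Value : "(no match)";
    Console.WriteLine($"{inputs[i]} -> [{results[i]}]");
}
```

Both inputs should produce a match: "-5 ," and "_5," respectively.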
To "match" both strings, you can match all parts instead of using a positive lookbehind.
\bTCC(?:_BHPB)?\s\d{3,4}[-_\s]\d{1,2}\s?,
\bTCC A word boundary to prevent a partial match, then match TCC
(?:_BHPB)?\s\d{3,4} Optionally match _BHPB, then match a whitespace char and 3-4 digits (in .NET, \d matches any Unicode digit, so use [0-9] to match only 0-9)
[-_\s]\d{1,2} Match one of -, _ or a whitespace char, then 1-2 digits
\s?, Match an optional space and ,
Note that \s can also match a newline.
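A sketch (C# 9+ top-level statements) comparing the full-match pattern with the lookbehind variant. The third input, TCC_BHPB 1234-12, is a hypothetical value added only to exercise the optional _BHPB part.

```csharp
using System;
using System.Text.RegularExpressions;

// The last input is hypothetical, to exercise the optional _BHPB part.
string[] inputs = { "TCC 6005-5 ,", "TCC 6005_5,", "TCC_BHPB 1234-12," };

// Variant 1: match everything instead of using a lookbehind.
string full = @"\bTCC(?:_BHPB)?\s\d{3,4}[-_\s]\d{1,2}\s?,";
// Variant 2: the TCC part is only asserted by a lookbehind, not consumed.
string behind = @"(?<=TCC(?:_BHPB)?\s\d{3,4})[-_\s]\d{1,2}\s?,";

string[] fullValues = new string[inputs.Length];
string[] behindValues = new string[inputs.Length];
for (int i = 0; i < inputs.Length; i++)
{
    // Match.Value is an empty string when there is no match.
    fullValues[i] = Regex.Match(inputs[i], full).Value;
    behindValues[i] = Regex.Match(inputs[i], behind).Value;
    Console.WriteLine($"{inputs[i]} -> [{fullValues[i]}] / [{behindValues[i]}]");
}
```

The full-match pattern returns the whole text including TCC, while the lookbehind variant returns only the part from the separator to the comma.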
Using the lookbehind:
(?<=TCC(?:_BHPB)?\s\d{3,4})[-_\s]\d{1,2}\s?,
Or, if the whitespace should not match a newline, use [\p{Zs}\t] (space separators and tabs) instead of \s:
\bTCC(?:_BHPB)?[\p{Zs}\t][0-9]{3,4}[-_\p{Zs}\t][0-9]{1,2}[\p{Zs}\t]*,
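The difference between \s and [\p{Zs}\t] only shows up when a line break appears where a space is expected. The input below, with a newline between TCC and the number, is hypothetical and only meant to illustrate that difference (C# 9+ top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

string withS = @"\bTCC(?:_BHPB)?\s\d{3,4}[-_\s]\d{1,2}\s?,";
string noNewline = @"\bTCC(?:_BHPB)?[\p{Zs}\t][0-9]{3,4}[-_\p{Zs}\t][0-9]{1,2}[\p{Zs}\t]*,";

// Hypothetical input with a newline between TCC and the number.
string split = "TCC\n6005-5,";

bool sCrossesNewline = Regex.IsMatch(split, withS);           // \s matches \n
bool zsCrossesNewline = Regex.IsMatch(split, noNewline);      // \n is not in [\p{Zs}\t]
bool zsOnOneLine = Regex.IsMatch("TCC 6005-5 ,", noNewline);  // normal single-line case

Console.WriteLine($"{sCrossesNewline} {zsCrossesNewline} {zsOnOneLine}"); // True False True
```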
I have a regex that detects URLs:
@"((http|ftp|https)\:\/\/)?([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])?";
I am using it with Regex.Replace to remove URLs from text.
I do not want it to replace any word that starts with /images
for example if the text is "this is my text here is a link http://dfdf.com and my is /images/dd.gif"
I need the http://dfdf.com replaces but not the /images/dd.gif
My regex replaces the dd.gif part,
so I want to exclude any word after images/.
Any idea how I can fix this?
You may start matching after a word boundary, and fail the match if it is immediately preceded with a whole "word" images/ using
\b(?<!\bimages/)(?:(?:http|ftp)s?://)?([\w-]+(?:\.[\w-]+)+)([\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])?
Details:
\b - a word boundary
(?<!\bimages/) - no images/ as a whole word is allowed immediately on the left
(?:(?:http|ftp)s?://)? - an optional sequence of either http or ftp followed with an optional s and then :// substring
([\w-]+(?:\.[\w-]+)+) - Group 1: one or more word or hyphen chars followed with one or more sequences of a . and then one or more word or hyphen chars
([\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])? - an optional Group 2: zero or more word chars or chars from the .,#?^=%&:/~+#- set and then a word char or a char from the #?^=%&/~+#- set.
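A minimal sketch (C# 9+ top-level statements) applying the lookbehind-based pattern with Regex.Replace to the example text from the question; replacing with an empty string is an assumption about how the URLs should be removed.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "this is my text here is a link http://dfdf.com and my is /images/dd.gif";

// \b(?<!\bimages/) stops a match from starting right after a whole-word images/.
string pattern = @"\b(?<!\bimages/)(?:(?:http|ftp)s?://)?([\w-]+(?:\.[\w-]+)+)([\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])?";

// Remove every URL-like match by replacing it with an empty string.
string result = Regex.Replace(text, pattern, "");

Console.WriteLine(result);
```

The URL disappears (leaving two spaces where it was), while /images/dd.gif is kept.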
As an alternative solution, you could match what you don't want to remove and capture what you do want to remove.
You can use a callback with Replace and test for the existence of group 1. If it is there, return an empty string. If it is not there, return the match to leave it unchanged.
\S*/images\S*|(?<!\S)((?:(?:https?|ftp)://)?[\w-]+(?:(?:\.[\w-]+)+)(?:[\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])?)
Explanation
\S*/images\S* Match /images preceded and followed by optional non-whitespace chars that you want to keep
| Or
(?<!\S) Assert a whitespace boundary to the left
((?:(?:https?|ftp)://)?[\w-]+(?:(?:\.[\w-]+)+)(?:[\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])?) The pattern that you tried with some minor changes to make it a bit shorter
For example
using System;
using System.Text.RegularExpressions;
var s = @"this is my text here is a link http://dfdf.com and my is /images/dd.gif";
var regex = new Regex(@"\S*/images\S*|(?<!\S)((?:(?:https?|ftp)://)?[\w-]+(?:(?:\.[\w-]+)+)(?:[\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])?)");
var result = regex.Replace(s, match => match.Groups[1].Success ? "" : match.Value);
Console.WriteLine(result);
I'm still learning how to write a regex, but this I can't solve on my own.
I have a string that contains a word looking like this: ##companyname##
I have tried the following, but it doesn't work:
content = Regex.Replace(content, @"\b##companyname##\b", setup.Company, RegexOptions.IgnoreCase);
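\b only matches between a word char and a non-word char (or at the string start/end next to a word char), so it won't match right before or after a # that sits next to a space, punctuation or the string edge.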
Use \B instead to match a non-word boundary.
content = Regex.Replace(content, @"\B##companyname##\B", setup.Company, RegexOptions.IgnoreCase);
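To see the \b versus \B difference concretely, here is a small sketch (C# 9+ top-level statements). The template text and the company name Contoso are hypothetical stand-ins for content and setup.Company.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-ins for content and setup.Company.
string content = "Welcome to ##companyname##, enjoy your stay.";
string company = "Contoso";

// \b needs a word char on one side; the outer # chars sit next to " " and ",", so nothing matches.
string withWordBoundary = Regex.Replace(content, @"\b##companyname##\b", company, RegexOptions.IgnoreCase);

// \B succeeds exactly where \b fails, so the placeholder is found and replaced.
string withNonBoundary = Regex.Replace(content, @"\B##companyname##\B", company, RegexOptions.IgnoreCase);

Console.WriteLine(withWordBoundary); // Welcome to ##companyname##, enjoy your stay.
Console.WriteLine(withNonBoundary);  // Welcome to Contoso, enjoy your stay.
```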
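That is because a word boundary is the position between a word character and a non-word character (or between a word character and the start or end of the string); it does not match whitespace itself.
But your pattern itself starts and ends with #, which is not a word character. Just match the literal text:
"##companyname##"
The placeholder is not at a word boundary, so the original regex cannot match it.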
The problem is the meaning of the \b anchor:
Specifies that the match must occur on a boundary between \w (alphanumeric) and \W (nonalphanumeric) characters. The match must occur on word boundaries — that is, at the first or last characters in words separated by any nonalphanumeric characters.
In your case there is no real word boundary, because # (like < and >) is not a word character.
In my opinion, simply replacing ##companyname## will be enough.
Difference between \b and \B in regex
\b matches the empty string at the beginning or end of a word. \B matches the empty string not at the beginning or end of a word.
content = Regex.Replace(content,
@"\B##companyname##\B",
setup.Company,
RegexOptions.IgnoreCase
);
You can test this regex, \B##companyname##\B, here: http://regexr.com/38p8i
P.S: Started learning regex today :)
I want to search for a string like "$12,56,450" using Regex in C#, but the pattern doesn't match.
Here is my code:
string input="Total earn for the year $12,56,450";
string pattern = @"\b(?mi)($12,56,450)\b";
Regex regex = new Regex(pattern);
if (regex.Match(input).Success)
{
return true;
}
This regex will do the job: (?mi)(\$\d{2},\d{2},\d{3}) (tested on Regex 101).
Now let's break it down a little:
\$ matches a literal $ (it must be escaped, because an unescaped $ is an end-of-string/line anchor)
\d{2} matches any two digits
, matches the literal ,
\d{2} matches any two digits
, matches the literal ,
\d{3} matches any three digits
Now, for the purposes of the demonstration I removed the word boundaries, \b, and you don't need them for such a fixed string anyway. In fact, the leading \b is what breaks the original pattern: $ is not a word character, and neither is the space before it, so there is no word boundary at that position. Consider the definition of where a word boundary occurs:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
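The effect of the leading \b can be checked directly. This sketch (C# 9+ top-level statements) uses the input from the question and prints whether each pattern matches; the variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Total earn for the year $12,56,450";

// The space before $ and $ itself are both non-word chars, so a leading \b cannot match there.
bool leadingBoundary = Regex.IsMatch(input, @"\b\$12,56,450\b");
// Without the leading \b; the trailing \b still sits between the final 0 and the end of the string.
bool trailingOnly = Regex.IsMatch(input, @"\$12,56,450\b");
// The \d-based pattern from this answer.
bool generic = Regex.IsMatch(input, @"\$\d{2},\d{2},\d{3}");

Console.WriteLine($"{leadingBoundary} {trailingOnly} {generic}"); // False True True
```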
You need to escape $ and some other special regex characters.
Try this (without the leading \b, because there is no word boundary between a space and $): @"(?mi)(\$12,56,450)\b";
If you want, you can use \d to match a digit, and \d{2,3} to match 2 or 3 digits.