Remove everything that doesn't match - c#

string line = "Rok rok irrelevant text irrelevant;text.irrelevant,text";
string NewLine = Regex.Replace(line, @"\b[rR]\w*", "");
Right now it replaces every word starting with r/R with a blank space, but I want to make everything a blank space EXCEPT words starting with r/R.

Edit
It seems all you want is to extract words starting with r or R and join them with a space. In this case, use a mere \b[rR]\w* regex and the following code:
var result = string.Join(" ", Regex.Matches(line, @"\b[rR]\w*").Cast<Match>().Select(x => x.Value));
See the C# demo.
Original answer
You may use a negative lookahead after a word boundary:
\b(?![rR])\w+
  ^^^^^^^^
Note that the + quantifier is better here since you want to remove at least 1 char found.
Or, in case you also want to remove all non-word chars after the found word, use
\b(?![rR])\w+\W*
See the regex demo #1 and regex demo #2.
If you want to remove any non-word chars before and after a qualifying word, use
var result = Regex.Replace(line, @"\W*\b(?![rR])\w+\W*", " ").Trim();
It replaces each word not starting with r or R, together with any non-word chars before and after it, with a single space; Trim() then strips the spaces left at the ends.
Details
\b - a word boundary
(?![rR]) - a negative lookahead that will fail the match if, immediately to the right of the current location, there is r or R
\w+ - 1+ word chars
\W* - 0+ non-word chars.
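A minimal sketch that runs both approaches from this answer on the sample line from the question. The helper names KeepRWords and BlankOutOtherWords are made up for illustration; the patterns are the ones quoted above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Approach 1: extract the words starting with r/R and join them with a space.
static string KeepRWords(string line) =>
    string.Join(" ", Regex.Matches(line, @"\b[rR]\w*").Cast<Match>().Select(m => m.Value));

// Approach 2: replace every word NOT starting with r/R (plus the non-word
// chars around it) with a space, then trim the ends.
static string BlankOutOtherWords(string line) =>
    Regex.Replace(line, @"\W*\b(?![rR])\w+\W*", " ").Trim();

var line = "Rok rok irrelevant text irrelevant;text.irrelevant,text";
Console.WriteLine(KeepRWords(line));         // Rok rok
Console.WriteLine(BlankOutOtherWords(line)); // Rok rok
```

Both give the same result here because all the unwanted words come last. When unwanted words sit between two kept words, the replace-based version can leave more than one space between them, so the extraction approach is probably the safer default.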

Related

How to negate filename after a specific term in a regex

I have a regex that detects URLs:
@"((http|ftp|https)\:\/\/)?([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?";
I am using it with Regex.Replace to remove URLs from text.
I do not want it to replace any word that starts with /images.
For example, if the text is "this is my text here is a link http://dfdf.com and my is /images/dd.gif"
I need http://dfdf.com to be replaced, but not /images/dd.gif.
My regex replaces the dd.gif part,
so I want to exclude any word that follows images/.
Any idea how I can fix this?
You may start matching after a word boundary, and fail the match if it is immediately preceded with a whole "word" images/ using
\b(?<!\bimages/)(?:(?:http|ftp)s?://)?([\w-]+(?:\.[\w-]+)+)([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?
See the regex demo. Details:
\b - a word boundary
(?<!\bimages/) - no images/ as a whole word is allowed immediately on the left
(?:(?:http|ftp)s?://)? - an optional sequence of either http or ftp followed with an optional s and then :// substring
([\w-]+(?:\.[\w-]+)+) - Group 1: one or more word or hyphen chars followed with one or more sequences of a . and then one or more word or hyphen chars
([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])? - an optional Group 2: zero or more word chars or chars from the .,@?^=%&:/~+#- set and then a word char or a char from the @?^=%&/~+#- set.
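As a rough sketch, the lookbehind pattern above can be dropped straight into Regex.Replace. The helper name StripUrls is hypothetical, and the sample text is the one from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Remove URL-like substrings, but skip anything immediately preceded by
// the whole word "images/".
static string StripUrls(string text)
{
    const string pattern =
        @"\b(?<!\bimages/)(?:(?:http|ftp)s?://)?([\w-]+(?:\.[\w-]+)+)([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?";
    return Regex.Replace(text, pattern, "");
}

var s = "this is my text here is a link http://dfdf.com and my is /images/dd.gif";
Console.WriteLine(StripUrls(s));
```

The link is removed and /images/dd.gif is left in place. The spaces that surrounded the removed link stay, so a follow-up whitespace cleanup may be needed.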
As an alternative solution, you could match what you don't want to remove and capture what you do want to remove.
You can use a callback with Replace and test for the existence of group 1. If it is there, return an empty string. If it is not there, return the match to leave it unchanged.
\S*/images\S*|(?<!\S)((?:(?:https?|ftp)://)?[\w-]+(?:(?:\.[\w-]+)+)(?:[\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?)
Explanation
\S*/images\S* Match /images preceded and followed by optional non-whitespace chars that you want to keep
| Or
(?<!\S) Assert a whitespace boundary to the left
((?:(?:https?|ftp)://)?[\w-]+(?:(?:\.[\w-]+)+)(?:[\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?) The pattern that you tried with some minor changes to make it a bit shorter
Regex demo (Click on the Table tab to see the matches)
For example
var s = @"this is my text here is a link http://dfdf.com and my is /images/dd.gif";
var regex = new Regex(@"\S*/images\S*|(?<!\S)((?:(?:https?|ftp)://)?[\w-]+(?:(?:\.[\w-]+)+)(?:[\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?)");
var result = regex.Replace(s, match => match.Groups[1].Success ? "" : match.Value);
Console.WriteLine(result);
See a C# demo

Alternate regex with -SDR?

I have the following regex in my c#:
(?<!\w)M20A\w+
Actual code:
string regex = $@"(?<!\w){prefix}\w+";
Note that the prefix variable holds strings such as M20A and X50G.
It perfectly matches the following cases:
M20A0820
M20A1234
M20A7U8V
But now I got a new requirement from the business to match, for example:
M20A-SDR
It will be the prefix followed by the exact string "-SDR". Not just a dash followed by 3 alphanumerics, but literally "-SDR". The existing matches need to still work, but prefix + "-SDR" must also be matched.
What would be the regex that would match the following:
M20A0820
M20A1234
M20A7U8V
M20A-SDR
You may use
string regex = $@"(?<!\w){prefix}\w*(?:-SDR)?";
See the regex demo.
Or, to match as a whole word, you may use word boundaries:
string regex = $@"\b{prefix}\w*(?:-SDR)?\b";
See this regex demo
The \b word boundary at the start will work if all the values in prefix start with a word char (a letter, digit or _). The word boundary at the end will make sense if there can be no more word chars after -SDR.
The (?:-SDR)? will match a -SDR string optionally.
Details
\b - word boundary
M20A - a literal string
\w* - 0+ word chars
(?:-SDR)? - a non-capturing group that matches a -SDR substring 1 or 0 times (as there is a ? after it)
\b - a word boundary.
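A small sketch of the whole-word variant applied to the sample values from the question. The helper name FindCodes and the extra X50G0001 value are made up for illustration, and wrapping the prefix in Regex.Escape is an added precaution that is not part of the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Build the whole-word pattern for a given prefix and return all matches.
// Regex.Escape guards against prefixes containing regex metacharacters
// (an assumption about the inputs, not something the question states).
static string[] FindCodes(string text, string prefix)
{
    var pattern = $@"\b{Regex.Escape(prefix)}\w*(?:-SDR)?\b";
    return Regex.Matches(text, pattern).Cast<Match>().Select(m => m.Value).ToArray();
}

var text = "M20A0820 M20A1234 M20A7U8V M20A-SDR X50G0001";
Console.WriteLine(string.Join(", ", FindCodes(text, "M20A")));
// M20A0820, M20A1234, M20A7U8V, M20A-SDR
```

Because \w+ became \w*, this pattern also accepts the bare prefix (M20A on its own). If that is not wanted, an alternation such as (?:\w+|-SDR) after the prefix is one possible tightening.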

Regular expressions \b but with not just alphanumeric characters in c#

I want the same functionality as \b but with other characters.
In C#, I want to have something like
string str = "\\b" + Regex.Escape(string) + "\\b";
However, some of my search strings start with non-word characters, and I want Regex.Escape("@(Something")
to be found in the string Typing @(Something to you.
The problem you are experiencing is that the \b word boundary is context dependent: \b\(abc\b will match (abc in x(abc) but not in :(abc), because \b\( requires a word char immediately before the ( char.
To match any string that is not enclosed with word chars use
var pattern = $@"(?<!\w){Regex.Escape(string)}(?!\w)";
See the regex demo.
Here, (?<!\w) is a negative lookbehind that will make sure there is no word char immediately to the left of the current location, and (?!\w) negative lookahead will make sure there is no word char immediately to the right of the current location.
Other custom "word" boundaries:
Whitespace word boundary: var pattern = $@"(?<!\S){Regex.Escape(string)}(?!\S)"; // Match when enclosed with whitespaces
Word and symbol boundary (if you do not want to find c in c++): var pattern = $@"(?<![\w\p{{S}}]){Regex.Escape(string)}(?![\w\p{{S}}])"; // the braces of \p{S} are doubled because this is an interpolated string
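To make the difference concrete, here is a minimal sketch comparing plain \b with the lookaround-based boundaries. The helper names are hypothetical, and the sample literal @(Something is taken to be the one from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Lookaround version: the literal must not be glued to a word char
// on either side.
static bool ContainsStandalone(string input, string literal) =>
    Regex.IsMatch(input, $@"(?<!\w){Regex.Escape(literal)}(?!\w)");

// Plain \b version, for comparison.
static bool ContainsWithWordBoundary(string input, string literal) =>
    Regex.IsMatch(input, $@"\b{Regex.Escape(literal)}\b");

var input = "Typing @(Something to you.";
Console.WriteLine(ContainsWithWordBoundary(input, "@(Something"));    // False
Console.WriteLine(ContainsStandalone(input, "@(Something"));          // True
Console.WriteLine(ContainsStandalone("x@(Something", "@(Something")); // False
```

The \b version fails because there is no word boundary between a space and @, while the lookbehind only checks that the preceding char is not a word char.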
For this you'd need a conditional word boundary at each end.
It just guards the beginning and end of the string: if an end is a word char, it must be at a word boundary.
If it's not a word char, the default is nothing, as it should be.
(?(?= \w )
    \b
)
(?: @\(Something )
(?(?<= \w )
    \b
)
So, it ends up looking like
string str = "(?(?=\\w)\\b)" + Regex.Escape(string) + "(?(?<=\\w)\\b)";
Regexstorm.net demo
This takes the guesswork out of it.

C# Regular expression to squeeze word where every character is separated by a space

I'm trying to write a regular expression to transform words written like "H e l l o Everyone" to "Hello Everyone".
If it is words separated by spaces like "Hello everyone, how are you?", nothing should happen.
Basically, all single characters should be squeezed together to make a word, and this should apply only when more than 2 single characters follow this pattern.
If it is like "a b cdef" - Nothing should happen
But "a b c def" -> "abc def"
I tried something like this "^\w(?:(\s)\w)*$" but it is matching with "Hello world" as well.
And also, I'm not sure on how to squeeze these single characters.
Any help is greatly appreciated.
Thanks!
I suggest matching chunks of single word chars separated with single whitespaces and then removing the whitespaces inside each match with a match evaluator.
The regex is
(?<!\S)\w(?:\s\w){2,}(?!\S)
See its demo at RegexStorm. The (?<!\S) and (?!\S) make sure these chunks are enclosed with whitespaces (or are at string start/end).
Details:
(?<!\S) - a negative lookbehind making sure there is a whitespace or start of string immediately before the current location
\w - a word char (letter/digit/underscore, to match a letter, use \p{L} instead)
(?:\s\w){2,} - 2 or more sequences of:
\s - a whitespace
\w - a word char
(?!\S) - a negative lookahead making sure there is a whitespace or end of string immediately after the current location
See the C# demo:
var res = Regex.Replace(s, @"(?<!\S)\w(?:\s\w){2,}(?!\S)", m =>
    new string(m.Value
        .Where(c => !Char.IsWhiteSpace(c))
        .ToArray()));
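A self-contained sketch that wraps the snippet above in a helper (the name Squeeze is made up) and runs it on the inputs from the question:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Find runs of 3+ single word chars separated by single whitespaces and
// drop the whitespaces inside each run.
static string Squeeze(string s) =>
    Regex.Replace(s, @"(?<!\S)\w(?:\s\w){2,}(?!\S)", m =>
        new string(m.Value.Where(c => !char.IsWhiteSpace(c)).ToArray()));

Console.WriteLine(Squeeze("H e l l o Everyone"));           // Hello Everyone
Console.WriteLine(Squeeze("a b c def"));                    // abc def
Console.WriteLine(Squeeze("a b cdef"));                     // a b cdef
Console.WriteLine(Squeeze("Hello everyone, how are you?")); // Hello everyone, how are you?
```

The {2,} quantifier is what keeps "a b cdef" untouched: only two single chars are available there, so the minimum of three is not met.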
If you're looking for a pure regex solution,
Regex.Replace(s, @"(?<=^\w|(\s\w)+)\s(?=(\w\s)+|\w$)", string.Empty);
removes every whitespace that has at least one whitespace-and-word-char pair before it and at least one word-char-and-whitespace pair after it (with a little extra to handle the start/end of the string).

Regex that removes the 2 trailing letters from a string not preceded with other letters

This is in C#. I've been banging my head against this, but no luck so far.
So for example
123456BVC --> 123456BVC (keep the same)
123456BV --> 123456 (remove trailing letters)
12345V --> 12345V (keep the same)
12345 --> 12345 (keep the same)
ABC123AB --> ABC123 (remove trailing letters)
It can start with anything.
I've tried @".*[a-zA-Z]{2}$" but no luck
This is in C# so that I always return a string removing the two trailing letters if they do exist and are not preceded with another letter.
Match result = Regex.Match(mystring, pattern);
return result.Value;
Your @".*[a-zA-Z]{2}$" regex matches any 0+ characters other than a newline (as many as possible) and 2 ASCII letters at the end of the string. You do not check the context, so the 2 letters are matched regardless of what comes before them.
You need a regex that will match the last two letters not preceded with a letter:
(?<!\p{L})\p{L}{2}$
See this regex demo.
Details:
(?<!\p{L}) - fails the match if a letter (\p{L}) is found before the current position (you may use [a-zA-Z] if you only want to deal with ASCII letters)
\p{L}{2} - 2 letters
$ - end of string.
In C#, use
var result = Regex.Replace(mystring, @"(?<!\p{L})\p{L}{2}$", string.Empty);
If you're looking to remove those last two letters, you can simply do this:
string result = Regex.Replace(originalString, @"[A-Za-z]{2}$", string.Empty);
Remember that in .NET regex, $ matches at the end of the input or just before a final newline.
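A short sketch that runs the two patterns above on the examples from the question, to show where they differ. The helper names TrimTwoGuarded and TrimTwoUnguarded are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// With the lookbehind: only strip two trailing letters that are not
// preceded by another letter.
static string TrimTwoGuarded(string s) =>
    Regex.Replace(s, @"(?<!\p{L})\p{L}{2}$", string.Empty);

// Without the lookbehind: strip any two trailing ASCII letters.
static string TrimTwoUnguarded(string s) =>
    Regex.Replace(s, @"[A-Za-z]{2}$", string.Empty);

foreach (var s in new[] { "123456BVC", "123456BV", "12345V", "12345", "ABC123AB" })
    Console.WriteLine($"{s} --> {TrimTwoGuarded(s)} (unguarded: {TrimTwoUnguarded(s)})");
```

The two versions agree on every sample except 123456BVC: the guarded pattern leaves it unchanged, as the question requires, while the unguarded one turns it into 123456B.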
