I have this regular expression pattern: .{2}\#.{2}\K|\..*(*SKIP)(?!)|.(?=.*\.)
It works perfectly when I replace each match with * to get:
trabc#abtrec.com.lo => ***bc#ab*****.com.lo
demomail#demodomain.com => ******il#de*********.com
But when I try to use it in C#, \K, (*SKIP) and (*F) are not supported.
What would be the C# version of this pattern? Or do you know a simpler way to mask the email without the unsupported constructs?
UPDATE:
(*SKIP): this verb causes the match to fail at the current starting position in the subject if the rest of the pattern does not match
(*F): forces a matching failure at the given position in the pattern (the same as (?!))
Try this regex:
\w(?=.{2,}#)|(?<=#[^\.]{2,})\w
Explanation:
\w - matches a word character
(?=.{2,}#) - positive lookahead to find the position immediately followed by 2+ occurrences of any character followed by #
| - OR
(?<=#[^\.]{2,}) - positive lookbehind to find the position immediately preceded by # followed by 2+ occurrences of any character that is not a .
\w - matches a word character.
Replace each match with a *
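As a minimal sketch of how that replacement could be wired up in C# (the class and method names here are hypothetical, and # is kept as the separator because that is what the samples in this thread use):

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailMasker
{
    // Pattern from the answer above: a word character that still has 2+ characters
    // before '#', or a word character preceded by '#' plus 2+ non-dot characters.
    private static readonly Regex MaskPattern =
        new Regex(@"\w(?=.{2,}#)|(?<=#[^\.]{2,})\w");

    // Replaces every matched word character with a single '*'.
    public static string Mask(string address)
    {
        return MaskPattern.Replace(address, "*");
    }
}
```

With this sketch, EmailMasker.Mask("trabc#abtrec.com.lo") returns ***bc#ab****.com.lo, i.e. exactly one star per masked character. That is one star fewer in the domain part than the output quoted in the question.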
You can achieve the same result with a regex that matches items in one block, and applying a custom match evaluator:
var res = Regex.Replace(
    s,
    @"^.*(?=.{2}\#.{2})|(?<=.{2}\#.{2}).*(?=.com.*$)",
    // Replace each matched block with the same number of asterisks
    match => new string('*', match.Value.Length)
);
The regex has two parts:
The one on the left ^.*(?=.{2}\#.{2}) matches the user name portion except the last two characters
The one on the right (?<=.{2}\#.{2}).*(?=.com.*$) matches the suffix of the domain up to the ".com..." ending.
I have a regex that detects URLs:
@"((http|ftp|https)\:\/\/)?([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])?";
I am using it with Regex.Replace to remove URLs from text.
I do not want it to replace any word that starts with /images.
For example, if the text is "this is my text here is a link http://dfdf.com and my is /images/dd.gif",
I need http://dfdf.com to be replaced but not /images/dd.gif.
My regex replaces the dd.gif part,
so I want to exclude any word that follows images/.
Any idea how I can fix this?
You may start matching after a word boundary, and fail the match if it is immediately preceded with a whole "word" images/ using
\b(?<!\bimages/)(?:(?:http|ftp)s?://)?([\w-]+(?:\.[\w-]+)+)([\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])?
Details:
\b - a word boundary
(?<!\bimages/) - no images/ as a whole word is allowed immediately on the left
(?:(?:http|ftp)s?://)? - an optional sequence of either http or ftp followed with an optional s and then :// substring
([\w-]+(?:\.[\w-]+)+) - Group 1: one or more word or hyphen chars followed with one or more sequences of a . and then one or more word or hyphen chars
([\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])? - an optional Group 2: zero or more word chars or chars from the .,#?^=%&:/~+#- set and then a word char or a char from the #?^=%&/~+#- set.
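A short sketch of applying this pattern to the sample text from the question. The wrapper class and method names are made up for illustration, and the pattern is copied as posted:

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlStripper
{
    // Pattern from the answer above: the negative lookbehind rejects any
    // candidate that starts right after the whole word "images/".
    private static readonly Regex UrlPattern = new Regex(
        @"\b(?<!\bimages/)(?:(?:http|ftp)s?://)?([\w-]+(?:\.[\w-]+)+)([\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])?");

    // Removes every URL-like match and leaves the rest of the text untouched.
    public static string RemoveUrls(string text)
    {
        return UrlPattern.Replace(text, "");
    }
}
```

For the question's sample, RemoveUrls drops http://dfdf.com and keeps /images/dd.gif. The space on either side of the removed URL stays, so the result contains a double space at that point.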
As an alternative solution, you could match what you don't want to remove and capture what you do want to remove.
You can use a callback with Replace and test for the existence of group 1. If it is there, return an empty string. If it is not there, return the match to leave it unchanged.
\S*/images\S*|(?<!\S)((?:(?:https?|ftp)://)?[\w-]+(?:(?:\.[\w-]+)+)(?:[\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])?)
Explanation
\S*/images\S* Match /images preceded and followed by optional non-whitespace chars that you want to keep
| Or
(?<!\S) Assert a whitespace boundary to the left
((?:(?:https?|ftp)://)?[\w-]+(?:(?:\.[\w-]+)+)(?:[\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])?) The pattern that you tried with some minor changes to make it a bit shorter
For example
var s = @"this is my text here is a link http://dfdf.com and my is /images/dd.gif";
var regex = new Regex(@"\S*/images\S*|(?<!\S)((?:(?:https?|ftp)://)?[\w-]+(?:(?:\.[\w-]+)+)(?:[\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])?)");
var result = regex.Replace(s, match => match.Groups[1].Success ? "" : match.Value);
Console.WriteLine(result);
I have a bunch of URLs that I need to filter out, based on whether it contains the keyword 'staff'
1. /services
2. /services/EarNoseThroat
3. /services/EarNoseThroat/Audiology
4. /services/EarNoseThroat/Audiology/CochlearImplant
5. /services/BehavioralHealth/Clinic
6. /services/BehavioralHealth/Clinic/staff
7. /services/BehavioralHealth/Clinic/staff/Jamie-Hudgins
I want to create one regex pattern to match all the URLs that have /services after the host URL, but not 'staff' anywhere in the URL. Basically match URLS 1 to 5.
I also need a pattern that only matches URLs 6 and 7.
It seems like the negative lookahead will do the trick, except I don't know how to put it together. Can someone help me out?
Something like:
^\/services\/(?:[^\/]+\/?)*$
OR
^/services\/...any Depth here...\/(?!staff)
Regex to match the following:
/services
/services/EarNoseThroat
/services/EarNoseThroat/Audiology
/services/EarNoseThroat/Audiology/CochlearImplant
/services/BehavioralHealth/Clinic
Regex:
^\/services\/(?!.*\bstaff\b).*$
Explanation:
^ - asserts the start of the string
\/services\/ - matches /services/
(?!.*\bstaff\b) - negative lookahead to make sure that the word staff does not appear anywhere in the string
.* - matches 0+ occurrences of any character except a newline character
$ - asserts the end of string
Regex to match the following:
/services/BehavioralHealth/Clinic/staff
/services/BehavioralHealth/Clinic/staff/Jamie-Hudgins
Regex:
^\/services\/(?=.*\bstaff\b).*$
Explanation:
The only difference is the positive lookahead:
(?=.*\bstaff\b) - positive lookahead to make sure that the word staff appears somewhere in the string before the end of the string
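The two lookahead patterns can be tried against the question's list with a sketch like the one below. The class and method names are hypothetical. One caveat worth checking: both posted patterns require a trailing slash after /services, so the bare path /services is not matched by either. The third pattern is my own relaxed variant (an assumption, not part of the answer) that makes everything after /services optional.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ServiceUrlFilter
{
    // Patterns as posted in the answer above.
    private static readonly Regex WithoutStaff =
        new Regex(@"^\/services\/(?!.*\bstaff\b).*$");
    private static readonly Regex WithStaff =
        new Regex(@"^\/services\/(?=.*\bstaff\b).*$");

    // Hypothetical variant: the "/..." tail is optional, so "/services" alone
    // matches, while any tail containing the word staff is still rejected.
    private static readonly Regex WithoutStaffRelaxed =
        new Regex(@"^/services(?:/(?!.*\bstaff\b).*)?$");

    public static bool IsNonStaff(string path) { return WithoutStaff.IsMatch(path); }
    public static bool IsStaff(string path) { return WithStaff.IsMatch(path); }
    public static bool IsNonStaffRelaxed(string path) { return WithoutStaffRelaxed.IsMatch(path); }
}
```

With these helpers, IsNonStaff accepts URLs 2 to 5, IsStaff accepts URLs 6 and 7, and IsNonStaffRelaxed accepts URLs 1 to 5.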
I want to match a word exactly or as a prefix (wildcard match), with one condition: it should not be surrounded by a particular tag.
For example, if the word to match is test, the regular expression should match
test, testing, tester, and testing.aspx, but it should not match test</x>, testing</x>, tester</x>, or other tagged words with the prefix test.
I came up with a regex, but it matches test</x> too.
string regex = string.Format("\\b{0}(\\S)*(?!</x>)", "test");
Can somebody help me in correcting my regex?
The \btest(\S)*(?!</x>) pattern matches test</x> because \btest finds a word starting with test, then matches and repeatedly captures any 0+ non-whitespace chars, and then checks if there is no </x> immediately to the right of the current location. Since (\S)* matches the whole </x> at once the negative lookahead checks for </x> when the regex index is already placed after this </x> - and thus it returns true and the match is a success.
You may use
string regex = string.Format(@"(?>\b{0}[^<\s]*)(?!</x>)", "test");
// or, beginning with C# 6
// var regex = $@"(?>\b{SearchWord}[^<\s]*)(?!</x>)";
Now, it will match like this:
(?>\btest[^<\s]*) - an atomic group matching
\b - a word boundary
test - search term
[^<\s]* - 0+ chars other than < and whitespace
(?!</x>) - a negative lookahead that fails the match if there is a </x> char sequence immediately to the right of the current location
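To see the atomic group at work on the question's examples, here is a minimal sketch. The class and method names are hypothetical, and the call to Regex.Escape is my addition (an assumption that the search word should be treated literally), not part of the answer:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PrefixMatcher
{
    // Returns every word that starts with `word` and is not directly followed by </x>.
    public static List<string> FindUntagged(string text, string word)
    {
        // The atomic group (?>...) stops the engine from giving back characters
        // of the word just to get past the negative lookahead.
        var pattern = $@"(?>\b{Regex.Escape(word)}[^<\s]*)(?!</x>)";
        return Regex.Matches(text, pattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}
```

Called on "test testing tester testing.aspx test</x> testing</x> tester</x>" with the word test, this returns test, testing, tester and testing.aspx, and skips the three tagged words.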
I'm still learning a lot about regex, so please forgive any naivety.
I've been using this site to test:
http://www.systemtextregularexpressions.com/regex.match
Basically, I'm having issues writing a regular expression that will match on any value after a pipe in between brackets.
Given an example string of:
"<div> \n [dont1.dont2|match1|match2] |dont3 [dont4] dont5. \n </div>"
Expected output would be a collection:
match1,
match2
The closest I've been able to get so far is:
(?!\[.*(\|)\])(?:\|)([\w-_.,:']*)
Above gives me the values, including the pipes, and dont3.
I've also tried this guy:
\|(.*(?=\]))
but it outputs:
|match1|match2
Here's one way of doing it:
(?<=\[[^\]]*\|)[^\]|]*
Here's the meaning of the pattern:
(?<=\[[^\]]*\|) - Lookbehind expression to ensure that any match must be preceded by an open bracket, followed by any number of non-close-bracket characters, followed by a pipe character
(?<= ... ) - Declares a lookbehind expression. Something matching the lookbehind must immediately precede the text in order for it to match. However, the part matched by the lookbehind is not included in the resulting match.
\[ - Matches an open bracket character
[^\]]* - Matches any number of non-close-bracket characters
\| - Matches a pipe character
[^\]|]* - Matches any number of characters which are neither close brackets nor pipe characters.
The lookbehind is greedy, so it will allow for any number of pipes between the open bracket and the matching text.
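A small sketch of collecting the values with this pattern; the class and method names are hypothetical. It relies on .NET allowing variable-length lookbehind, which many other regex engines do not:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class BracketPipeLookbehind
{
    // Collects every value that follows a pipe inside a [...] group.
    public static List<string> Extract(string input)
    {
        return Regex.Matches(input, @"(?<=\[[^\]]*\|)[^\]|]*")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}
```

For the question's sample string this yields match1 and match2. Because the final part of the pattern uses *, an input such as [a||b] would also yield an empty string for the empty slot between the pipes.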
Try this:
\[.*?(?:\|(?<mydata>.*?))+\]
Note: the online tool will only show you the last capture inside a quantified () for a given match, but .NET will remember each capture of a group that matches multiple times.
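Reading those repeated captures could look like the sketch below. The class and method names are hypothetical; Group.Captures is the .NET collection that keeps every capture of the mydata group:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class BracketPipeCaptures
{
    // Returns all captures of the "mydata" group from the first bracketed match.
    public static List<string> Extract(string input)
    {
        var match = Regex.Match(input, @"\[.*?(?:\|(?<mydata>.*?))+\]");
        // When nothing matches, the Captures collection is simply empty.
        return match.Groups["mydata"].Captures
            .Cast<Capture>()
            .Select(c => c.Value)
            .ToList();
    }
}
```

For the question's sample string this returns match1 and match2 from a single match. Only the first bracketed group with pipes is inspected here; looping over Regex.Matches would be needed to cover several groups.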
Try this:
^<div>\s*[^|]+|([^|]+)|([^|]+)
What is the difference between these two regexes?
new Regex(@"(([[[{""]))", RegexOptions.Compiled)
and
new Regex(@"(^[[[{""])", RegexOptions.Compiled)
I've used both but can't find the difference; they seem to match almost the same things.
The regex patterns are not well written because
There are duplicate characters in character classes (thus redundant)
The first regex contains duplicate capture group on the whole pattern.
The first regex - (([[[{""])) - matches 1 character, either a [, a {, or a ", and captures it into Group 1 and Group 2. It is equal to
[[{"]
The second regex - (^[[[{""]) - only matches the same characters as the pattern above, but at the beginning of a string (if RegexOptions.Multiline is not set), or the beginning of a line (if that option is set). It is equal to
^[[{"]
You will access the matched characters using Regex.Match(s).Value.
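The difference shows up as soon as the bracket is not the first character. A minimal sketch, with a hypothetical holder class and field names, using the two patterns exactly as written in the question:

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketRegexDemo
{
    // Matches one of [ { " anywhere in the input (two nested capture groups).
    public static readonly Regex Anywhere =
        new Regex(@"(([[[{""]))", RegexOptions.Compiled);

    // Matches one of [ { " only at the very start of the input
    // (RegexOptions.Multiline is not set here).
    public static readonly Regex AtStart =
        new Regex(@"(^[[[{""])", RegexOptions.Compiled);
}
```

For the input x[y, Anywhere.Match finds [ at index 1, while AtStart.IsMatch returns false. For the input {y, both find {.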