Regex to clean repetitions of characters - c#

I have a pattern like this in the string:
T T
and I want it to become T.
It can be any character from [a-z].
I have tried this regex example but was not able to replace it.
EDIT
For example, if I have A Aa ar r, it should become Aar, i.e. each repeated pair is reduced to one character, no matter which of the two (1st or 2nd) is kept.

You can use a backreference for this:
/([a-z])\s*\1\s?/gi
Example
Some more explanation:
( begin matching group 1
[a-z] match any character from a to z
) end matching group 1
\s* match any number of whitespace characters
\1 match exactly what matching group 1 matched, again; this is what detects the repetition
\s? match zero or one whitespace character; this lets the replacement also remove the space that follows a pair
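In C# there are no /.../gi delimiters: the i flag becomes RegexOptions.IgnoreCase, and Regex.Replace already replaces every match, so no g flag is needed. A minimal sketch, assuming each repeated pair should collapse to the captured letter ($1); CollapseRepeats is a hypothetical helper name:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: collapses a repeated letter pair ("T T", "aa") into one letter.
static string CollapseRepeats(string input) =>
    // ([a-z]) captures a letter, \s* allows whitespace between the pair,
    // \1 requires the same letter again, \s? also consumes one trailing whitespace char.
    Regex.Replace(input, @"([a-z])\s*\1\s?", "$1", RegexOptions.IgnoreCase);

Console.WriteLine(CollapseRepeats("T T"));       // T
Console.WriteLine(CollapseRepeats("A Aa ar r")); // Aar
```

Because \s* also accepts zero spaces, genuine double letters are collapsed too (book becomes bok); if the repeated character is always separated by whitespace, \s+ avoids that.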

Related

Find a regex pattern that matches a string with multiple conditions?

I have some strings formatted as follows:
1=case1,case2,..caseN;2=case1,..,caseN;3=case1, ..,caseN
Note: a semicolon ";" separates the numbered groups, a comma separates the cases within a group, and case1, case2 can be anything (strings, numbers; their type doesn't matter).
I want to find a regex pattern that matches strings such as
1=home,house;2=abc;3=2019,2021
but does not match the following:
1=home,;2=abc;3=2019,2021 (Excess comma mark at case 1)
1=;2=abc,2012;3= (must 1=..; not 1=;)
1=home,age;2 (must 2=.. not 2)
2=home;;3=sea (must ;3 not ;;3)
4=flower;k3=sea (must 3= , not k3)
I tried the pattern (\d+={1}[^;]+;), but it still matches even when the rest of the string is invalid.
Please show me the way.
Many thanks!
Maybe this pattern helps you out:
^\b(?:(?:^|;)\d+=[^,;]+(?:,[^,;]+)*)+$
See the Online Demo
^ - Start-of-string anchor.
\b - Word boundary.
(?: - Open 1st non-capture group.
(?: - Open 2nd non-capture group.
^|; - Alternation between the start-of-string anchor and a semicolon.
) - Close 2nd non-capture group.
\d+= - One or more digits followed by a =.
[^,;]+ - Negated character class: one or more characters other than comma or semicolon.
(?: - Open 3rd non-capture group.
, - A comma.
[^,;]+ - Negated character class: one or more characters other than comma or semicolon.
)* - Close 3rd non-capture group and match it zero or more times.
)+ - Close 1st non-capture group and match it one or more times.
$ - End-of-string anchor.
Note: I went with a negated character class since you mentioned "case1, case2 are anything like strings, number doesn't matter their type", therefore I read that there can be spaces, special characters or anything other than comma and semicolon.
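A minimal C# sketch that runs the pattern above against the sample strings from the question with Regex.IsMatch; only the first sample should be accepted:

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer: "digits=case(,case)*" blocks separated by ';'.
var rx = new Regex(@"^\b(?:(?:^|;)\d+=[^,;]+(?:,[^,;]+)*)+$");

string[] samples =
{
    "1=home,house;2=abc;3=2019,2021", // valid
    "1=home,;2=abc;3=2019,2021",      // excess comma
    "1=;2=abc,2012;3=",               // empty case list
    "1=home,age;2",                   // key without '='
    "2=home;;3=sea",                  // double semicolon
    "4=flower;k3=sea",                // non-numeric key
};

// Print the result next to each sample.
foreach (string s in samples)
    Console.WriteLine($"{rx.IsMatch(s),-5} {s}");
```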
This works on regex101 (note that \d= only allows single-digit keys, 1= to 9=):
^(?:\d=(?:\w{1,},)*(?:\w{1,});)*(?:\d=(?:\w{1,},)*\w{1,})$
^(?:\d+=[a-z\d]+(?:,[a-z\d]+)*(?:;|$))+$
Demo
^ : match beginning of string
(?: : begin non-capturing (nc) group
\d+=[a-z\d]+ : match 1+ digits, then '=', then 1+ lowercase letters or digits
(?:,[a-z\d]+) : match ',' then 1+ lowercase letters or digits in an nc group
* : execute nc group 0+ times
(?:;|$) : match ';' or end of string
)+ : end nc group and execute 1+ times
$ : match end of string
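Because each block ends with (?:;|$), a single trailing semicolon (1=home;) is still accepted, and because the class is [a-z\d], uppercase input needs RegexOptions.IgnoreCase. A minimal C# sketch checking both, assuming case-insensitive matching is what you want:

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer; IgnoreCase is an added assumption so uppercase letters pass too.
var rx = new Regex(@"^(?:\d+=[a-z\d]+(?:,[a-z\d]+)*(?:;|$))+$", RegexOptions.IgnoreCase);

Console.WriteLine(rx.IsMatch("1=Home,Age;2=ABC")); // True (only because of IgnoreCase)
Console.WriteLine(rx.IsMatch("1=home,;2=abc"));    // False: excess comma
Console.WriteLine(rx.IsMatch("1=home;"));          // True: the trailing ';' is accepted
```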
If your engine supports recursive subpatterns (PCRE does; .NET, which C# uses, does not), you can use:
^(\d+=\w+(?:,\w+)*)(?:;(?1))*$
Otherwise, as in C#, repeat the subpattern:
^\d+=\w+(?:,\w+)*(?:;\d+=\w+(?:,\w+)*)*$
Demo & explanation
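Since .NET has no (?1)-style recursion, one way to avoid typing the block twice is to build the non-recursive pattern from a single string in C#. A sketch; the Block constant is an illustrative name:

```csharp
using System;
using System.Text.RegularExpressions;

// One "key=case,case,..." block, written once and reused.
const string Block = @"\d+=\w+(?:,\w+)*";

// Expands to ^\d+=\w+(?:,\w+)*(?:;\d+=\w+(?:,\w+)*)*$
var rx = new Regex($"^{Block}(?:;{Block})*$");

Console.WriteLine(rx.IsMatch("1=home,house;2=abc;3=2019,2021")); // True
Console.WriteLine(rx.IsMatch("2=home;;3=sea"));                  // False
```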

How to add custom symbols to Regular Expression? DevExpress Mask

I have the following regular expression
(\w)+(,(\w)+)*
which matches comma-separated words made of letters and digits only, e.g.
test123,test3,test9
I also want to allow symbols like @, #, $ alongside \w.
When I try [(\w)$#], it does not work.
I need to use it in a DevExpress TextEdit mask, and it reports a syntax error.
http://prntscr.com/pbyq7p
There is a reply at the bottom of this page which mentions that special characters cannot be used within [].
The available characters are listed on Mask Type: Extended Regular Expressions.
The advice is to use grouping with an alternation to separate the character class and the special character.
You might try
(\w+|[@#$]+)+(,(\w+|[@#$]+)+)+
In parts
( Group 1
\w+ Match 1+ word chars
| Or
[@#$]+ Match 1+ times any of the listed characters
)+ Close group and repeat 1+ times
( Group 2
, Match literally
(\w+|[@#$]+)+ Same pattern as group 1
)+ Close group and repeat the whole group starting with , 1+ times
Regex demo
If your data consists only of the characters a-z and digits, you could also try
([a-z0-9@#$]+)+(,([a-z0-9@#$]+))+
Regex demo

How to match a string between <>?

I tried \w+\:(\w+\-?\.?(\d+)?) but that is not correct
I have following text
<staticText:HelloWorld>_<xmlNode:Node.03>_<date:yyy-MM-dd>_<time:HH-mm-ss-fff>
The end result I want is something like the following
["staticText:HelloWorld", "xmlNode:Node.03","date:yyy-MM-dd","time:HH-mm-ss-fff"]
You could use the following regex.
<(.*?)>
Then have a look at how groups work to retrieve the result.
using System;
using System.Text.RegularExpressions;

Regex rx = new Regex("<(.*?)>");
string text = "<staticText:HelloWorld>_<xmlNode:Node.03>_<date:yyy-MM-dd>_<time:HH-mm-ss-fff>";
MatchCollection matches = rx.Matches(text);
Console.WriteLine(matches.Count);
foreach (Match match in matches)
{
    // Group 1 holds the text between < and >
    Console.WriteLine(match.Groups[1].Value);
}
This line should be able to match the content:
<(.*?)>
The whole match includes the angle brackets, which you don't seem to want; read capture group 1 instead, or strip them afterwards without a regex.
Consider a website like https://regexr.com: it helps a lot when writing regexes, because you can paste your test cases and see how the pattern handles them.
Matches any string within the <>. Hope this helps.
<(.*?)>
Your pattern does not match the 3rd and the 4th part of the example data because in this part \w+\-?\.?(\d+)? the dash and the digits match only once and are not repeated.
For your example data, you might use a character class [\w.-]+ to match the part after the colon, which makes the match a bit broader:
<(\w+\:[\w.-]+)>
Regex demo | C# demo
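Or, to make it more specific, use a pattern for the Node.03 part and a repeated pattern for the year, month, day, hour etc. parts. Note that this version expects those parts to be digits (for example 2019-10-01); it does not match the literal yyy-MM-dd and HH-mm-ss-fff placeholders from the sample data.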
<(\w+\:\w+(?:\.\d+|\d+(?:-\d+)+)?)>
Explanation
< Match <
( Capturing group
\w+\:\w+ Match 1+ word chars, : and 1+ word chars
(?: Non capturing group
\.\d+ Match . and 1+ digits
| Or
\d+(?:-\d+)+ Match 1+ digits and repeat 1+ times matching - and 1+ digits
)? Close non capturing group and make it optional
) Close capturing group
> Match >
Regex demo | C# Demo
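To get the array from the question, collect the group 1 values of all matches. A minimal C# sketch using the broader <(\w+\:[\w.-]+)> pattern, which matches all four parts of the sample:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "<staticText:HelloWorld>_<xmlNode:Node.03>_<date:yyy-MM-dd>_<time:HH-mm-ss-fff>";

// Group 1 holds the text between < and >, without the angle brackets.
string[] parts = Regex.Matches(text, @"<(\w+\:[\w.-]+)>")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToArray();

Console.WriteLine(string.Join(", ", parts));
// staticText:HelloWorld, xmlNode:Node.03, date:yyy-MM-dd, time:HH-mm-ss-fff
```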

Need value in regex.match group

I want a regex that captures the sign (+ or -) in one group and the number in another group. The number may also come without any sign.
Example
[-] 87.90
[+] 87.78
(-) 87.90
(+) 87.78
89
-89.56
- 89.98
I have used the regular expression below:
^\W*(\-|\+|)\W*(\d+(\.\d+)?)
With it, group 1 comes back empty.
If I use
^\W*(\-|\+)\W*(\d+(\.\d+)?)
then the unsigned figure (89) does not match. In short, I want to match a figure with or without a sign (+ or -).
Group 1 is empty because \W* greedily matches all non-word characters, including the parentheses and the sign, and (\-|\+|) then falls back to its empty alternative.
You should specify the literal parentheses in the pattern, and a character class is a more natural construct for matching either a + or a -:
(?:\(?([-+])\)?)?\p{Zs}*(\d+(\.\d+)?)
See regex demo (if you need a full string match, use ^ at the start and $ at the end of the pattern).
Regex matches:
(?:\(?([-+])\)?)? - an optional non-capturing group ((?:...)) that matches an optional (, then a plus or minus (Group 1), and then an optional )
\p{Zs}* - zero or more space separator characters (Unicode category Zs: regular and non-breaking spaces, but not tabs or line breaks)
(\d+(\.\d+)?) - (Group 2) one or more digits followed by an optional capturing group (Group 3) that matches a period followed by one or more digits.
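The pattern above only allows parentheses around the sign, while the question also shows square brackets ([-] 87.90). A C# sketch of an adapted, anchored version; the [(\[] and [)\]] classes are an adaptation not taken from the answer, and they also accept mismatched pairs such as (-]:

```csharp
using System;
using System.Text.RegularExpressions;

// Group 1: optional sign, optionally wrapped in ( ) or [ ].
// Group 2: the number; the decimal part is non-capturing here, so there is no group 3.
var rx = new Regex(@"^(?:[(\[]?([-+])[)\]]?)?\p{Zs}*(\d+(?:\.\d+)?)$");

string[] samples = { "[-] 87.90", "[+] 87.78", "(-) 87.90", "(+) 87.78", "89", "-89.56", "- 89.98" };

foreach (string s in samples)
{
    Match m = rx.Match(s);
    // An optional group that did not take part has Success == false and an empty Value.
    string sign = m.Groups[1].Success ? m.Groups[1].Value : "(none)";
    Console.WriteLine($"{s,-10} sign={sign} number={m.Groups[2].Value}");
}
```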

Regex for string with spaces and special characters - C#

I have been using Regex to match strings embedded in square brackets [*] as:
new Regex(@"\[(?<name>\S+)\]", RegexOptions.IgnoreCase);
I also need to match some codes that look like:
[TESTTABLE: A, B, C, D]
it has got spaces, comma, colon
Can you please guide me on how to modify the regex above to include such codes?
P.S. Other codes have no spaces or special characters but are always enclosed in [...].
Regex myregex = new Regex(@"\[([^\]]*)]");
will match all characters that are not closing brackets and that are enclosed between brackets. Capture group 1 will contain the content between the brackets.
Explanation (courtesy of RegexBuddy):
Match the character “[” literally «\[»
Match the regular expression below and capture its match into backreference number 1 «([^\]]*)»
Match any character that is NOT a ] character «[^\]]*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character “]” literally «]»
This will also work if you have more than one pair of matching brackets in the string you're looking at. It will not work if brackets can be nested, e.g. [Blah [Blah] Blah].
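A short C# sketch of both points: Matches returns one match per bracket pair, and with nested brackets the match stops at the first ]:

```csharp
using System;
using System.Text.RegularExpressions;

var rx = new Regex(@"\[([^\]]*)]");

// Several bracket pairs in one string: one match per pair.
foreach (Match m in rx.Matches("[NAME] and [TESTTABLE: A, B, C, D]"))
    Console.WriteLine(m.Groups[1].Value);
// NAME
// TESTTABLE: A, B, C, D

// Nested brackets: [^\]]* stops at the first "]", so the inner pair is not respected.
Console.WriteLine(rx.Match("[Blah [Blah] Blah]").Groups[1].Value); // Blah [Blah
```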
/\[([^\]:]*)(?::([^\]]*))?\]/
Capture group 1 will contain the entire tag if it doesn't have a colon, or the part before the colon if it does.
Capture group 2 will contain the part after the colon. You can then split on ',' and trim each entry to get the individual parts.
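A minimal C# sketch of that approach, using the pattern above with the quantifier inside group 1, ([^\]:]*), so that the group captures the whole name rather than only its last character. Group 2 is split on ',' and trimmed when it exists, and Group.Success tells the two kinds of tag apart:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Group 1: tag name (everything before an optional colon).
// Group 2: everything after the colon, if there is one.
var rx = new Regex(@"\[([^\]:]*)(?::([^\]]*))?\]");

foreach (Match m in rx.Matches("[NAME] [TESTTABLE: A, B, C, D]"))
{
    string name = m.Groups[1].Value;
    string[] items = m.Groups[2].Success
        ? m.Groups[2].Value.Split(',').Select(p => p.Trim()).ToArray()
        : Array.Empty<string>();
    Console.WriteLine($"{name}: [{string.Join("|", items)}]");
}
// NAME: []
// TESTTABLE: [A|B|C|D]
```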
