Regular Expression with wildcard - c#

I am trying to replace some content using a regular expression but can't get it to work. Could you please have a look?
My Input: <Tag>E2iamjunkblabla</Tag>
Expected Output: <Tag>E2done</Tag>
I am trying this:
string input = "<Tag>E2iamjunkblabla</Tag>";
string output= System.Text.RegularExpressions.Regex.Replace(input, "<Tag>E2*</Tag>", "<Tag>E2done</Tag>");
What am I doing wrong? Also, is there any way to retain the first 3 characters (digits or letters) after E2?
I mean the output should be
<Tag>E2iam</Tag>

Sounds like you want this:
string input = "<Tag>E2iamjunkblabla</Tag>";
string output = System.Text.RegularExpressions.Regex.Replace(input, "<Tag>E2(...).*</Tag>", @"<Tag>E2$1done</Tag>");
To break it down:
The match:
Match <Tag>, then E2, then any character 3 times (...) (the parentheses store that capture in a group), then any character zero or more times .*, followed by the literal </Tag>.
The replace:
Replace the match with <Tag>E2, then the value of capture group 1 ($1), then the literal done</Tag>. If you only want to keep the three characters, drop done from the replacement.
Let me know if you have issues - and read up on regex! (oh and there are probably a load of ways to do this, this is just one of them)
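To make the difference concrete, here is a minimal sketch showing why the original pattern leaves the input alone and how to keep only the first three characters. The class and method names are made up for illustration, and [A-Za-z0-9] is just my reading of "numbers or alphabets" from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagRewrite
{
    // Hypothetical helper: keeps the three alphanumeric characters after "E2"
    // and drops everything else up to the closing tag.
    public static string KeepFirstThree(string input)
    {
        return Regex.Replace(input, @"<Tag>E2([A-Za-z0-9]{3}).*?</Tag>", "<Tag>E2$1</Tag>");
    }

    public static void Main()
    {
        string input = "<Tag>E2iamjunkblabla</Tag>";

        // "2*" means "zero or more 2s", not "E2 followed by anything",
        // so this pattern does not match and the input comes back unchanged.
        Console.WriteLine(Regex.Replace(input, "<Tag>E2*</Tag>", "<Tag>E2done</Tag>"));

        // Prints <Tag>E2iam</Tag>
        Console.WriteLine(KeepFirstThree(input));
    }
}
```

The wildcard from file globbing (*) corresponds to .* in regex syntax, which is the main thing the original attempt was missing.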

Related

Get substring with RegEx

I am really struggling with RegEx. I want my RegEx (if possible) to do 2 things:
1- Validate that the whole string respects the format NAME_STKBYGRP.CSV
2- Extract the NAME substring if match
Examples:
TEST_STKBYGRP.CSV -> TEST
other_stkbygrp.csv -> other
test_wrong.csv -> ""
Here is what I tried so far
string input = "NAME_STKBYGRP.CSV";
Regex regex = new Regex("([A-Z])*_STKBYGRP.CSV", RegexOptions.IgnoreCase);
string s = regex.Match(input).Value;
It does return "" if it doesn't match, but it returns the whole input if it matches.
You need to read regex.Match(input).Groups[1].Value if you only want the value of the first group.
You should also add a ^ and $ at the start and end of your regex if you want to rule out strings like evilnumber12345_NAME_STKBYGRP.CSVevilsuffix.
Edit: adv12 also has a good point about the location of the * - it should be inside the parentheses.
First off, your * should be inside the parentheses. Otherwise the group is re-captured once per character, and Groups[1].Value ends up holding only the last character. Then, use Match.Groups[1] to get just the characters matched by the portion of the regex in the parentheses.
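Putting both answers together, a sketch could look like the following. The class and method names are hypothetical, and I am assuming NAME is made of ASCII letters only, as in the question's attempt. The dot is also escaped here, since an unescaped . matches any character.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileNameParser
{
    // Anchored and case-insensitive; the quantifier sits inside the group.
    // Assumption: NAME consists of ASCII letters only.
    private static readonly Regex Pattern =
        new Regex(@"^([A-Z]+)_STKBYGRP\.CSV$", RegexOptions.IgnoreCase);

    // Returns the NAME part, or "" when the input does not fit the format.
    public static string ExtractName(string input)
    {
        Match m = Pattern.Match(input);
        return m.Success ? m.Groups[1].Value : "";
    }

    public static void Main()
    {
        Console.WriteLine(ExtractName("TEST_STKBYGRP.CSV"));    // TEST
        Console.WriteLine(ExtractName("other_stkbygrp.csv"));   // other
        Console.WriteLine(ExtractName("test_wrong.csv") == ""); // True
    }
}
```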

How do I exclude a regex value in a replace

I have a regex which searches for strings using a Prefix and a Suffix. In its simplest form, \$\$\w+\$\$ will find $$My_Name$$ (in this case the Prefix and Suffix are both equal to $$). Another example would be \[\#\w+\#\] to match [#My_Name#].
The Prefix and Suffix will always be a specific string of 0 to n characters which I can always safely escape for a direct character match.
I extract the Matches in my C# code so I can work with them but obviously my match contains $$My_Name$$ but what I want is to simply get the inner string between the Suffix and Prefix: My_Name.
How do I exclude the Prefix and Suffix from the result?
Change your REGEX to \$\$(\w+)\$\$ and use $1 to get the matching (inner) string.
For example
string pattern = @"\$\$(\w+)\$\$";
string input = "$$My_Name$$";
Regex rgx = new Regex(pattern);
Match result = rgx.Match(input);
Console.WriteLine(result.Groups[1]);
Outputs: My_Name
P.S - There's no need to use explicitly typed local variables, but I just wanted the types to be clear.
You can wrap your \w+ in a group like this (\w+), then when you retrieve the matched string you can ask for that subgroup.
I may be wrong (you didn't provide any code), but I think this is how it is done: .Groups[1].Value on the result of Regex.Match.
How about the regex below? It captures the first character into a named group (first_char), captures that character plus its repeats into a group called first_group, and then uses first_group to match the end of the string. It works with any prefix made of one character repeated two or more times, as long as the same run is repeated at the end of the word.
'(?<first_group>(?<first_char>.)\k<first_char>+)(?<word>\w+)\k<first_group>+'
You just need to then extract the capture group named word like so:
String sample = "$$My_Name$$";
Regex regex = new Regex(@"(?<first_group>(?<first_char>.)\k<first_char>+)(?<word>\w+)\k<first_group>+");
Match match = regex.Match(sample);
if (match.Success)
{
Console.WriteLine(match.Groups["word"].Value);
}
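You can use a named group like this:
(\$\$)(?<group1>.+?)\1 -- pattern 1 (first case)
\[(#)(?<group2>.+?)\1\] -- pattern 2 (second case)
or, combined:
(\$\$)(?<group1>.+?)\1|\[(#)(?<group2>.+?)\2\]
Note that .NET numbers unnamed groups before named ones, so in the combined pattern (#) is group 2. Engines that number all groups strictly left to right would need \3 there instead.
I would suggest .+? here: being lazy, it stops at the first occurrence of the suffix instead of running on to the last one.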
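Since the question says the Prefix and Suffix are arbitrary strings that can be escaped, here is a rough sketch that builds the pattern with Regex.Escape and reads the inner text from group 1. The class and method names are invented for the example, and it assumes the inner text is \w+ as in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class Delimited
{
    // Hypothetical helper: returns every \w+ token wrapped in prefix ... suffix,
    // without the prefix and suffix themselves.
    public static List<string> InnerValues(string text, string prefix, string suffix)
    {
        // Regex.Escape turns the delimiters into literal matches.
        string pattern = Regex.Escape(prefix) + @"(\w+)" + Regex.Escape(suffix);

        var result = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            result.Add(m.Groups[1].Value); // group 1 is the part between the delimiters
        }
        return result;
    }

    public static void Main()
    {
        // Prints My_Name,Other
        Console.WriteLine(string.Join(",", InnerValues("Hi $$My_Name$$ and $$Other$$", "$$", "$$")));
        // Prints My_Name
        Console.WriteLine(string.Join(",", InnerValues("Hi [#My_Name#]", "[#", "#]")));
    }
}
```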

Regex Substring or Left Equivalent

Greetings beloved comrades.
I cannot figure out how to accomplish the following via a regex.
I need to take this format number 201101234 and transform it to 11-0123401, where digits 3 and 4 become the digits to the left of the dash, and the remaining five digits are inserted to the right of the dash, followed by a hardcoded 01.
I've tried http://gskinner.com/RegExr, but the syntax just defeats me.
This answer, Equivalent of Substring as a RegularExpression, sounds promising, but I can't get it to parse correctly.
I can create a SQL function to accomplish this, but I'd rather not hammer my server in order to reformat some strings.
Thanks in advance.
You can try this:
var input = "201101234";
var output = Regex.Replace(input, @"^\d{2}(\d{2})(\d{5})$", "${1}-${2}01");
Console.WriteLine(output); // 11-0123401
This will match:
two digits, followed by
two digits captured as group 1, followed by
five digits captured as group 2
And return a string which replaces that matched text with
group 1, followed by
a literal hyphen, followed by
group 2, followed by
a literal 01.
The start and end anchors ( ^ / $ ) ensure that if the input string does not exactly match this pattern, it will simply return the original string.
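As a quick check of that last point, the sketch below wraps the same pattern in a helper (the class and method names are hypothetical) and runs it on one input that fits the format and one that does not.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaseNumber
{
    // Hypothetical helper around the pattern from this answer.
    public static string Reformat(string input)
    {
        return Regex.Replace(input, @"^\d{2}(\d{2})(\d{5})$", "${1}-${2}01");
    }

    public static void Main()
    {
        Console.WriteLine(Reformat("201101234")); // 11-0123401
        Console.WriteLine(Reformat("2011012"));   // too short, printed unchanged: 2011012
    }
}
```

If silently passing bad input through is not what you want, calling Regex.IsMatch first and handling the failure explicitly is one option.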
If you can use custom C# scripts, you may want to use Substring instead:
string newStr = string.Format("{0}-{1}01", old.Substring(2,2), old.Substring(4));
I don't think you really need a regex here. Substring would be better. But still if you want regex only, you can use this:
string newString = Regex.Replace(input, @"^\d{2}(\d{2})(\d+)$", "$1-${2}01");
Explanation:
^\d{2} // Match first 2 digits. Will be ignored
(\d{2}) // Match next 2 digits. Capture it in group 1
(\d+)$ // Match rest of the digits. Capture it in group 2
Now, the required digits, are in group 1 and 2, which you use in the replacement string.
Do you even SQL? Pull some levers and stuff.

Regular Expression for string

I have a string like
AHDFFH XXXX
where 'AHDFFH' can be a character string of any length
and 'XXXX' is a run of 'X' characters of any length, which needs to be replaced by an auto-incremented database value from a table.
I need to find the repeated 'X' characters in the above string using a regular expression.
Can anyone please help me figure this out?
Try this:
\b(\p{L})\1+\b
Explanation:
Options: case insensitive; ^ and $ match at line breaks
Assert position at a word boundary «\b»
Match the regular expression below and capture its match into backreference number 1 «(\p{L})»
    A character with the Unicode property “letter” (any kind of letter from any language) «\p{L}»
Match the same text as most recently matched by capturing group number 1 «\1+»
    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Assert position at a word boundary «\b»
Do you mean some characters, then one or more spaces, then some digits?
If so, you can use this regular expression:
\w+\s+(\d+)
C# code like this:
System.Text.RegularExpressions.Regex regex = new System.Text.RegularExpressions.Regex(@"\w+\s+(\d+)");
System.Text.RegularExpressions.Match m = regex.Match("aaaa 3333");
if(m.Success) {
MessageBox.Show(m.Groups[1].Value);
}
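For the replacement step the question describes, a possible sketch is below. It finds a whole word made only of 'X' characters and substitutes a number. The class and method names are hypothetical, the next value is simply passed in rather than read from a database, and zero-padding to the length of the X run is my assumption, not something the question states.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Placeholder
{
    // Matches a whole word consisting of one or more 'X' characters.
    private static readonly Regex Run = new Regex(@"\bX+\b");

    // Hypothetical helper: nextValue stands in for the auto-incremented
    // database value. Padding to the run length is an assumption.
    public static string Fill(string template, int nextValue)
    {
        return Run.Replace(template, m => nextValue.ToString().PadLeft(m.Length, '0'));
    }

    public static void Main()
    {
        Console.WriteLine(Fill("AHDFFH XXXX", 42)); // AHDFFH 0042
    }
}
```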

Fetch values between two [] from a string using Regular expressions

I have a string as follows:
"channel_changes":[[1313571300,26.879846,true],[1313571360,26.901025,true]]
I want to extract each string in square brackets, like 1313571300, 26.879846, true, using a regular expression.
I have tried using
string regexPattern = @"\[(.*?)\]";
but that gives the first string as [[1313571420,26.901025,true]
i.e. with one extra square bracket.
Please help me achieve this.
This seemed to work in Expresso for me:
\[([\w,\.]*?)\]
Literal [
[1]: A numbered capture group. [[\w,.]*?]
- Any character in this class: [\w,.], any number of repetitions, as few as possible
Literal ]
The problem is the "." in your regex: it matches any character, so after the first literal "[" is matched, the following "[" in your input is accepted as part of the content.
I constrained it to just alphanumeric characters, commas and literal full-stops (period mark), since that's all that was present in your example. You could go further and really specify the format of the data inside those inner square brackets assuming it's consistent, and end up with something more like this:
\[[0-9.]+,[0-9.]+,(true|false)\]
Example C# code:
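var matches = Regex.Matches("\"channel_changes\":[[1313571300,26.879846,true],[1313571360,26.901025,true]]", @"\[([\w,\.]*?)\]");
// Type the loop variable as Match so the capture group is accessible.
foreach (Match match in matches)
{
// Groups[1] is the text inside the brackets, without the brackets themselves.
Console.WriteLine(match.Groups[1].Value);
}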
Try this:
@"\[+([^\]]+)\]+"
"[^]]+" - it means any character except right square bracket
Try this
\[([^\[\]]*)\]
See it here online on Regexr
[^\[\]]* is a negated character class, means match any character but [ and ]. With this construct you don't need the ? to make your * ungreedy.
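A small sketch of how this pattern could be used from C#, splitting each bracketed group into its individual values. The class and method names are invented for illustration, and the values are assumed never to contain commas or brackets themselves.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ChannelChanges
{
    // Hypothetical helper: one string[] per innermost [...] group.
    public static List<string[]> Parse(string text)
    {
        var rows = new List<string[]>();
        foreach (Match m in Regex.Matches(text, @"\[([^\[\]]*)\]"))
        {
            // Group 1 excludes the brackets; split it on commas.
            rows.Add(m.Groups[1].Value.Split(','));
        }
        return rows;
    }

    public static void Main()
    {
        string input = "\"channel_changes\":[[1313571300,26.879846,true],[1313571360,26.901025,true]]";
        foreach (string[] row in Parse(input))
        {
            Console.WriteLine(string.Join(" | ", row));
        }
    }
}
```

The outer brackets are skipped automatically, because the character class refuses to cross another [ or ].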
