C# regex won't match newline [duplicate]

This question already has answers here:
C# Regex Match between with or without new lines
(2 answers)
How do I match any character across multiple lines in a regular expression?
(26 answers)
Closed 4 years ago.
I am trying to use a regex in C# to parse a multi-line file (currently just held in a string). This part seems to be where the problem is: (.*?\n). It matches when .v/1L and .v/2L are on the same line, but fails when I put a newline between them. The input file will look something like this:
MSG
.v/1L
.v/2L
.some other data
.and so on
ENDMSG
And this is part of the c# code
string nl = new string(new char[] { '\u000A' });
string pattern = @"(?<group>((?<type>MSG\n)(.*?\n)(?<end>ENDMSG\n)))";
string input = @" MSG" + nl + ".v/1L.v/2L" + nl + "ENDMSG" + nl;
// The line below doesn't work
//string input = @" MSG" + nl + ".v/1L" + nl + ".v/2L" + nl + "ENDMSG" + nl;
RegexOptions options = RegexOptions.Multiline;
foreach (Match m in Regex.Matches(input, pattern, options))
{
Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
}
And this is the output of the one that works:
Starting RegEx !
RegEx : (?<group>((?<type>MSG\n)(.*?\n)(?<end>ENDMSG\n)))
'MSG
.v/1L.v/2L << Should be split into 2 when adding a \n between
ENDMSG
' found at index 1.
Executing finally block.

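If you specify RegexOptions.Multiline, you can use ^ and $ to match the start and end of each line, respectively. Note that this option does not change what . matches: it still stops at \n.
If you don't wish to use this option, remember that a line break may be any one of \n, \r, or \r\n, so instead of looking only for \n, you should perhaps use something like [\n\r]+, or more exactly (\r\n|\n|\r). List \r\n first: alternatives are tried left to right, so (\n|\r|\r\n) would normally consume only the \r of a \r\n pair.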
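The two-line input fails because . never matches \n in .NET unless RegexOptions.Singleline is set, so (.*?\n) can consume only one line. Below is a minimal sketch of one way around that, assuming the body should be captured line by line: a "whole line" group is repeated lazily until ENDMSG can match. The names MsgBlockDemo, BodyLines and the group name body are made up for illustration and are not from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MsgBlockDemo
{
    // Returns the lines between MSG and ENDMSG, or an empty array if no block is found.
    public static string[] BodyLines(string input)
    {
        // (?:.*\r?\n)*? lazily repeats "one whole line" until ENDMSG can match.
        const string pattern = @"MSG\r?\n(?<body>(?:.*\r?\n)*?)ENDMSG";
        Match m = Regex.Match(input, pattern);
        if (!m.Success)
        {
            return new string[0];
        }
        // Split the captured body back into individual lines.
        return m.Groups["body"].Value.Split(
            new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        string input = " MSG\n.v/1L\n.v/2L\nENDMSG\n";
        foreach (string line in BodyLines(input))
        {
            Console.WriteLine(line);
        }
    }
}
```

Setting RegexOptions.Singleline and keeping a single (.*?) would also let the body span several lines, but the body would then come back as one string rather than as separate lines.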

Related

Regex don't match more than once the pattern in incoming string [duplicate]

This question already has answers here:
What do ^ and $ mean in a regular expression?
(2 answers)
Difference between \b and \B in regex
(10 answers)
Closed 2 years ago.
I have a task to write a regex for a valid name. It has to consist of two words separated by a space; each word starts with a capital letter, otherwise contains only lowercase letters, and is at least two letters long.
string input = "Stephen King, Bruce hatow, eddie Lee, AShley Greene, Test Testov, Ann Grey";
string pattern = @"^\b[A-Z][a-z]{1,}\s\b[A-Z][a-z]{1,}$";
MatchCollection output = Regex.Matches(input, pattern);
foreach (Match item in output)
{
Console.Write(item);
}
My pattern matches only if the string contains a single name, e.g. string input = "Stephen King". Is there a way to do this with one string, or should I use a list of strings and check each one of them?
Try removing the starting and ending anchors ^ and $. Instead, replace them with word boundaries, allowing for a first/last name match to occur anywhere in the input string:
string input = "Stephen King, Bruce hatow, eddie Lee, AShley Greene, Test Testov, Ann Grey";
string pattern = @"\b[A-Z][a-z]{1,}\s[A-Z][a-z]{1,}\b";
MatchCollection output = Regex.Matches(input, pattern);
foreach (Match item in output)
{
Console.Write(item + "\n");
}
This prints:
Stephen King
Test Testov
Ann Grey
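The other option raised in the question, checking each name on its own, lets you keep the ^ and $ anchors. A minimal sketch, assuming the names are separated by commas as in the sample input; NameValidationDemo and ValidNames are hypothetical names introduced here:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NameValidationDemo
{
    // Anchored pattern: the whole candidate must be "Xxx Yyy".
    private static readonly Regex FullName = new Regex(@"^[A-Z][a-z]+ [A-Z][a-z]+$");

    // Splits on commas, trims each piece, and keeps only the valid names.
    public static string[] ValidNames(string input)
    {
        return input
            .Split(',')
            .Select(part => part.Trim())
            .Where(part => FullName.IsMatch(part))
            .ToArray();
    }

    public static void Main()
    {
        string input = "Stephen King, Bruce hatow, eddie Lee, AShley Greene, Test Testov, Ann Grey";
        foreach (string name in ValidNames(input))
        {
            Console.WriteLine(name);
        }
    }
}
```

Because each candidate is validated as a whole, a fragment of a longer invalid entry cannot be picked up as a match.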

Replacing multiple occurrences of string using string builder by regex pattern matching

We are trying to replace all matching patterns (regex) in a string builder with their respective "groups".
Firstly, we are trying to find the count of all occurrences of that pattern and loop through them (count - termination condition). For each match we are assigning the match object and replace them using their respective groups.
Here only the first occurrence is replaced and the other matches are never replaced.
*str* - contains the actual string
Regex - ('.*')\s*=\s*(.*)
To match pattern:
'nam_cd'=isnull(rtrim(x.nam_cd),''),
'Company'=isnull(rtrim(a.co_name),'')
Pattern : created using https://regex101.com/
*matches.Count* - gives the correct count (here 2)
String pattern = @"('.*')\s*=\s*(.*)";
MatchCollection matches = Regex.Matches(str, pattern);
StringBuilder sb = new StringBuilder(str);
Match match = Regex.Match(str, pattern);
for (int i = 0; i < matches.Count; i++)
{
String First = String.Empty;
Console.WriteLine(match.Groups[0].Value);
Console.WriteLine(match.Groups[1].Value);
First = match.Groups[2].Value.TrimEnd('\r');
First = First.Trim();
First = First.TrimEnd(',');
Console.WriteLine(First);
sb.Replace(match.Groups[0].Value, First + " as " + match.Groups[1].Value + " ,", match.Index, match.Groups[0].Value.Length);
match = match.NextMatch();
}
Current output:
SELECT DISTINCT
isnull(rtrim(f.fleet),'') as 'Fleet' ,
'cust_clnt_id' = isnull(rtrim(x.cust_clnt_id),'')
Expected output:
SELECT DISTINCT
isnull(rtrim(f.fleet),'') as 'Fleet' ,
isnull(rtrim(x.cust_clnt_id),'') as 'cust_clnt_id'
A regex solution like this is too fragile. If you need to parse any arbitrary SQL, you need a dedicated parser. There are examples on how to parse SQL properly in Parsing SQL code in C#.
If you are sure there are no "wild", unbalanced ( and ) in your input, you may use a regex as a workaround, for a one-off job:
var result = Regex.Replace(s, @"('[^']+')\s*=\s*(\w+\((?>[^()]+|(?<o>\()|(?<-o>\)))*\))", "\n $2 as $1");
See the regex demo.
Details
('[^']+') - Capturing group 1 ($1): ', 1 or more chars other than ' and then '
\s*=\s* - = enclosed with 0+ whitespaces
(\w+\((?>[^()]+|(?<o>\()|(?<-o>\)))*\)) - Capturing group 2 ($2):
\w+ - 1+ word chars
\((?>[^()]+|(?<o>\()|(?<-o>\)))*\) - a (...) substring with any amount of balanced (...)s inside (see my explanation of this pattern).
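To see the pattern above at work, here is a small self-contained sketch that runs it over lines shaped like the ones in the question. The pattern is the one from the answer; the wrapper names AliasRewriteDemo and Rewrite are hypothetical, and the replacement is simplified to "$2 as $1" (without the leading newline) so the result is easier to compare.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AliasRewriteDemo
{
    // 'alias' = func(...)   becomes   func(...) as 'alias'
    // The balancing group o lets group 2 span nested parentheses.
    private const string Pattern =
        @"('[^']+')\s*=\s*(\w+\((?>[^()]+|(?<o>\()|(?<-o>\)))*\))";

    public static string Rewrite(string sql)
    {
        // Regex.Replace rewrites every match in one pass, so no match
        // positions go stale the way they can with StringBuilder.Replace.
        return Regex.Replace(sql, Pattern, "$2 as $1");
    }

    public static void Main()
    {
        string sql =
            "'nam_cd'=isnull(rtrim(x.nam_cd),''),\n" +
            "'Company'=isnull(rtrim(a.co_name),'')";
        Console.WriteLine(Rewrite(sql));
    }
}
```

This is still only a workaround for well-behaved input; it makes no attempt to understand SQL string literals that themselves contain parentheses.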

RegEx in C# Replace Method [duplicate]

This question already has answers here:
C#: How to Delete the matching substring between 2 strings?
(6 answers)
Closed 4 years ago.
I am trying to write a regex to replace the "name" part of the string below.
\profile\name\details
Where name:
- can have special characters
- has no spaces
Let's say I want to replace "name" in above path with ABCD, the result would be
\profile\ABCD\details
What would be the RegEx to be used in Replace for this?
I have tried [a-zA-Z0-9@#$%&*+\-_(),+':;?.,!\[\]\s\\/]+$ but it's not working.
As your dynamic part is surrounded by two static parts, you can use them to find it.
\\profile\\(.*)\\details
Now, if you want to replace only the middle part, you can either use lookarounds:
string pattern = @"(?<=\\profile\\).*(?=\\details)";
string substitution = @"titi";
string input = @"\profile\name\details
\profile\name\details
";
RegexOptions options = RegexOptions.Multiline;
Regex regex = new Regex(pattern, options);
string result = regex.Replace(input, substitution);
Or use numbered replacement patterns such as $1 and $3:
string pattern = @"(\\profile\\)(.*)(\\details)";
string substitution = @"$1Replacement$3";
string input = @"\profile\name\details
\profile\name\details
";
RegexOptions options = RegexOptions.Multiline;
Regex regex = new Regex(pattern, options);
string result = regex.Replace(input, substitution);
For readability, named-group substitution is also a possibility.
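A minimal sketch of the named-group variant follows. It assumes the name segment contains neither a backslash nor whitespace (the question says only "no spaces"); the group names prefix, name and suffix, and the helper ReplaceName, are introduced here for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PathRenameDemo
{
    // Replaces the segment between \profile\ and \details with newName.
    public static string ReplaceName(string path, string newName)
    {
        // Assumption: the name holds no backslash and no whitespace.
        const string pattern =
            @"(?<prefix>\\profile\\)(?<name>[^\\\s]+)(?<suffix>\\details)";
        // "$$" keeps a literal '$' in newName from being read as a substitution.
        string replacement = "${prefix}" + newName.Replace("$", "$$") + "${suffix}";
        return Regex.Replace(path, pattern, replacement);
    }

    public static void Main()
    {
        Console.WriteLine(ReplaceName(@"\profile\name\details", "ABCD"));
    }
}
```

The ${prefix} form also avoids the ambiguity a numbered reference such as $1 can run into when the replacement text that follows it starts with a digit.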

Find hashtags in string

I am working on a Xamarin.Forms PCL project in C# and would like to detect all the hashtags.
I tried splitting at spaces and checking whether each word begins with a #, but the problem is that if the post contains two spaces in a row, like "Hello #World Test", it would lose the double space.
string body = "Example string with a #hashtag in it";
string newbody = "";
foreach (var word in body.Split(' '))
{
if (word.StartsWith("#"))
newbody += "[" + word + "]";
newbody += word;
}
Goal output:
Example string with a [#hashtag] in it
I also only want it to have A-Z a-z 0-9 and _ stopping at any other character
Test #H3ll0_W0rld$%Test => Test [#H3ll0_W0rld]$%Test
Other Stack questions try to detect the string and extract it, I would like it work with it and put it back in the string without losing anything that methods such as splitting by certain characters would lose.
You can use Regex with #\w+ and $&
Explanation
# matches the character # literally
\w+ matches one or more word characters. In .NET, \w also matches Unicode letters and digits by default; use [A-Za-z0-9_] (or RegexOptions.ECMAScript) if only ASCII should count.
+ Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
$& includes a copy of the entire match in the replacement string.
Example
var input = "asdads sdfdsf #burgers, #rabbits dsfsdfds #sdf #dfgdfg";
var regex = new Regex(@"#\w+");
var matches = regex.Matches(input);
foreach (var match in matches)
{
Console.WriteLine(match);
}
or
var result = regex.Replace(input, "[$&]" );
Console.WriteLine(result);
Output
#burgers
#rabbits
#sdf
#dfgdfg
asdads sdfdsf [#burgers], [#rabbits] dsfsdfds [#sdf] [#dfgdfg]
Another Example
Use a regular expression: \#\w* (note that \w* also matches a lone #; use \w+ to require at least one character after it)
string pattern = @"\#\w*";
Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = rgx.Matches(input);
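Putting the pieces together for the stated goal (wrap each hashtag in brackets, keep every other character, and stop at anything outside A-Z, a-z, 0-9 and _), a minimal sketch could look like this. HashtagDemo and WrapHashtags are hypothetical names, and the explicit character class is used instead of \w so that only ASCII characters count.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HashtagDemo
{
    // Wraps every hashtag in [ ] and leaves the rest of the text untouched.
    public static string WrapHashtags(string body)
    {
        // $& is the whole match, so nothing around it is lost or re-joined.
        return Regex.Replace(body, @"#[A-Za-z0-9_]+", "[$&]");
    }

    public static void Main()
    {
        Console.WriteLine(WrapHashtags("Example string with a #hashtag in it"));
        Console.WriteLine(WrapHashtags("Test #H3ll0_W0rld$%Test"));
    }
}
```

Since the input is never split, runs of spaces and punctuation around the hashtags survive unchanged.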

Split a string by Regex [duplicate]

This question already has answers here:
Regular expression to extract text between square brackets
(15 answers)
Closed 5 years ago.
I'm trying to work out how to split this kind of string using a regex in C#.
[01,01,01][02,03,00][03,07,00][04,06,00][05,02,00][06,04,00][07,08,00][08,05,00]
Can someone knowledgeable about regex point me to how to achieve this goal?
Sample regex pattern that doesn't work:
[\dd,\dd,\dd]
sample output:
[01,01,01]
[02,03,00]
[03,07,00]
[04,06,00]
[05,02,00]
[06,04,00]
[07,08,00]
[08,05,00]
This will do the job in C# (\[.+?\]), e.g.:
var s = @"[01,01,01][02,03,00][03,07,00][04,06,00][05,02,00][06,04,00][07,08,00][08,05,00]";
var reg = new Regex(@"(\[.+?\])");
var matches = reg.Matches(s);
foreach(Match m in matches)
{
Console.WriteLine($"{m.Value}");
}
EDIT: This is how the expression (\[.+?\]) works:
first, the outer parentheses, ( and ), capture whatever the inner pattern matched
then the escaped square brackets, \[ and \], match the literal [ and ] in the source string
finally, .+? matches one or more characters, but as few as possible, so that it won't swallow everything between the first [ and the last ]
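The effect of the lazy quantifier is easiest to see by counting matches. The sketch below compares the greedy \[.+\] with the lazy \[.+?\] on a shortened version of the input; LazyVsGreedyDemo and CountMatches are illustrative names only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyVsGreedyDemo
{
    // Counts how many separate matches a pattern produces on the input.
    public static int CountMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Count;
    }

    public static void Main()
    {
        string input = "[01,01,01][02,03,00][03,07,00]";
        // Greedy: .+ runs to the last ], so the whole string is one match.
        Console.WriteLine(CountMatches(input, @"\[.+\]"));
        // Lazy: .+? stops at the first ], giving one match per bracket group.
        Console.WriteLine(CountMatches(input, @"\[.+?\]"));
    }
}
```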
I know you stipulated Regex; however, it's worth looking at Split again, if only for academic purposes:
Code
var input = "[01,01,01][02,03,00][03,07,00][04,06,00][05,02,00][06,04,00][07,08,00][08,05,00]";
var output = input.Split(']',StringSplitOptions.RemoveEmptyEntries)
.Select(x => x + "]") // put the bracket back (Select needs using System.Linq)
.ToList();
foreach(var o in output)
Console.WriteLine(o);
Output
[01,01,01]
[02,03,00]
[03,07,00]
[04,06,00]
[05,02,00]
[06,04,00]
[07,08,00]
[08,05,00]
The Regex solution below is restricted to 3 values of exactly 2 digits each, separated by commas. Inside the foreach loop you can access the matching value via match.Value.
Remember to include using System.Text.RegularExpressions;
var input = "[01,01,01][02,03,00][03,07,00][04,06,00][05,02,00][06,04,00][07,08,00][08,05,00]";
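// No trailing + after the group: with it, all adjacent bracket groups would be returned as a single match.
foreach (Match match in Regex.Matches(input, @"(\[\d{2},\d{2},\d{2}\])"))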
{
// do stuff
}
Thanks all for the answers. I also got it working by using this code:
string pattern = @"\[\d\d,\d\d,\d\d]";
Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = rgx.Matches(myResult);
Debug.WriteLine(matches.Count);
foreach (Match match in matches)
Debug.WriteLine(match.Value);
