.NET Regex: how to retrieve multiple matches on multiple lines

I have the following Regex:
\b((.|\n)*)=((.|\n)*)new((.|\n)*)\(\)
It is used to detect an object assignment in a C# source code string,
like this one: var a = new Person();
It works fine when there is only one match, but if I try to process this:
var a = new Person();
var x = new WebClient();
It returns only one match, like this: {var a = new Person(); var x = new WebClient()}
I need to extract both matches. How do I do that? I'm relatively new to regex and have no idea what to do.
When I test my regex on RegExr, it works just fine (with the global checkbox checked).

This expression should get you started. Try passing in the Multiline regex option rather than trying to deal with newlines in the regex itself:
using System;
using System.Text.RegularExpressions;
var src = @"var a = new Person();
var x = new WebClient();";
var pattern = @"(\w+\s*)(\w*\s*)=\s+new\s+(\w+)\(\)";
var expr = new Regex(pattern, RegexOptions.Multiline);
foreach (Match match in expr.Matches(src))
{
    var assignType = match.Groups[1].Value; // e.g. "var " (includes trailing whitespace)
    var id = match.Groups[2].Value;         // e.g. "a "
    var objType = match.Groups[3].Value;    // e.g. "Person"
    Console.WriteLine($"{assignType.Trim()} {id.Trim()} = new {objType}()");
}
That said, there are (much) better tools than regex for parsing C# source code; are you interested in those?

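The \n alternative in (.|\n) is what lets each group run across line breaks, so the greedy groups stretch from the first line to the last () in the input and produce a single match.
This works for me against your test data in Expresso: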
\b((.)*)=((.)*)new((.)*)\(\)
If you don't need the capturing groups (the parentheses), this seems to work as well:
\b.*=.*new.*\(\)
The following is possibly a better fit than using . (any character):
\b[\w\s]*=[\w\s]*new[\w\s]*\(\)
If you're confident the code base has consistent spacing (e.g. enforced by something like StyleCop), then you can be even more specific about the \w (word character) and \s (whitespace character).
Also, I'm not sure whether it is intentional, but you're not matching the ; at the end of the line.
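To see the difference described above, here is a minimal sketch comparing the original (.|\n) pattern with the \b.*=.*new.*\(\) variant on the two-line input from the question. The counts in the comments follow from . not matching \n by default:

```csharp
using System;
using System.Text.RegularExpressions;

var src = "var a = new Person();\nvar x = new WebClient();";

// Original pattern: (.|\n)* can cross line breaks, so one greedy match spans both lines.
var crossLine = Regex.Matches(src, @"\b((.|\n)*)=((.|\n)*)new((.|\n)*)\(\)");

// Without \n, '.' stops at the line break, so each line yields its own match.
var perLine = Regex.Matches(src, @"\b.*=.*new.*\(\)");

Console.WriteLine(crossLine.Count); // 1
Console.WriteLine(perLine.Count);   // 2
foreach (Match m in perLine)
    Console.WriteLine(m.Value);
```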

You can use named groups. I modified the pattern to the following, and the group named asgn will match a whole assignment:
(?<asgn>\b\w+\s+\w+\s*\=\s*new\s+\w+\([^)]*\)\s*;)
This is how to access the named group:
using System;
using System.Text.RegularExpressions;
string pat = @"(?<asgn>\b\w+\s+\w+\s*\=\s*new\s+\w+\([^)]*\)\s*;)";
string input = @"var a = new Person();
var x = new WebClient();";
foreach (Match m in Regex.Matches(input, pat))
{
    Console.WriteLine(m.Groups["asgn"].Value);
}
If you need to parse and extract each part of the assignment, you can add more named groups to the pattern, as follows:
(?<asgn>\b(?<vtype>\w+)\s+(?<name>\w+)\s*\=\s*new\s+(?<type>\w+)\((?<args>[^)]*)\)\s*;)
With it you can extract the variable type, variable name, object type, and constructor arguments from the matched string.
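A minimal sketch of reading those named groups. The second input line, with constructor arguments, is a made-up example added to show the args group:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string pat = @"(?<asgn>\b(?<vtype>\w+)\s+(?<name>\w+)\s*\=\s*new\s+(?<type>\w+)\((?<args>[^)]*)\)\s*;)";
// Hypothetical input; the second line adds constructor arguments.
string input = "var a = new Person();\nPerson p = new Person(\"Ann\", 42);";

var parts = new List<(string VarType, string Name, string Type, string Args)>();
foreach (Match m in Regex.Matches(input, pat))
{
    // Each named group is read by name instead of by position.
    parts.Add((m.Groups["vtype"].Value, m.Groups["name"].Value,
               m.Groups["type"].Value, m.Groups["args"].Value));
}

foreach (var p in parts)
    Console.WriteLine($"{p.VarType} {p.Name} : {p.Type}({p.Args})");
```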

Related

Can LINQ be used to search for Regex expressions in a string?

I have the following code that works, but I would like to rewrite it using LINQ to check whether any of the regex search strings appear in the target.
foreach (Paragraph comment in
wordDoc.MainDocumentPart.Document.Body.Descendants<Paragraph>().Where<Paragraph>(comment => comment.InnerText.Contains("cmt")))
{
//print values
}
More precisely, I need to use LINQ to select the strings that start with a letter or with one of the symbols - or •.
Is this regex correct for my case?
string pattern = @"^[a-zA-Z-]+$";
Regex rg = new Regex(pattern);
Any suggestions?
Thanks in advance for any help.

You can. It would be better to use query syntax though, as described here: https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/how-to-combine-linq-queries-with-regular-expressions
Example:
// fileList (FileInfo objects) and searchTerm (a Regex) are assumed to be defined as in the linked documentation example.
var queryMatchingFiles =
    from file in fileList
    where file.Extension == ".htm"
    let fileText = System.IO.File.ReadAllText(file.FullName)
    let matches = searchTerm.Matches(fileText)
    where matches.Count > 0
    select new
    {
        name = file.FullName,
        matchedValues = from System.Text.RegularExpressions.Match match in matches
                        select match.Value
    };
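The same query-syntax idea works on plain strings, which is closer to the original question of checking whether any of several regexes matches. A minimal sketch; the sample texts and the two patterns are hypothetical stand-ins for the paragraph texts and the asker's search strings:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-ins for paragraph texts and for the asker's search patterns.
string[] paragraphs = { "cmt: check this", "plain text", "• bullet point", "no match here" };
Regex[] searchTerms = { new Regex("cmt"), new Regex(@"^•") };

// Keep a paragraph if at least one of the regexes matches it.
var matching =
    from text in paragraphs
    where searchTerms.Any(rx => rx.IsMatch(text))
    select text;

foreach (var p in matching)
    Console.WriteLine(p);
```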
Your pattern is fine; just remove the $ from the end and allow any characters after the prefix:
@"^[a-zA-Z-]+.*"
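A quick way to see the effect of dropping the $: with it, the whole string must consist of letters and hyphens; without it, only the beginning is checked. Note that • is not in [a-zA-Z-], so a string starting with • still does not match. The sample strings below are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

string[] samples = { "Hello world", "-dash item", "• bullet", "1st place" };
foreach (var s in samples)
{
    bool strict = Regex.IsMatch(s, @"^[a-zA-Z-]+$");  // whole string must be letters/hyphens
    bool prefix = Regex.IsMatch(s, @"^[a-zA-Z-]+.*"); // only the start is checked
    Console.WriteLine($"{s,-12} strict={strict} prefix={prefix}");
}
```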

Your regex should be modified to
^[\p{L}•-]
To also allow whitespace at the start of the string add \s and use
^[\p{L}\s•-]
Details
^ - start of string
[\p{L}•-] - a letter, • or -
[\p{L}\s•-] - a letter, whitespace, • or -
In C#, use
var reg = new Regex(@"^[\p{L}•-]");
foreach (Paragraph comment in
wordDoc.MainDocumentPart.Document.Body.Descendants<Paragraph>()
.Where<Paragraph>(comment => reg.IsMatch(comment.InnerText)))
{
//print values
}
If you want to match those items containing cmt and also matching this regex, you may adjust the pattern to
var reg = new Regex(@"^(?=.*cmt)[\p{L}\s•-]", RegexOptions.Singleline);
If cmt should only be allowed at the start of the string:
var reg = new Regex(@"^(?:cmt|[\p{L}\s•-])");
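The Open XML types are not needed to try the pattern, so here is a minimal sketch that applies the same ^[\p{L}\s•-] regex with a LINQ Where to a few made-up strings standing in for comment.InnerText:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Plain strings stand in for comment.InnerText, so no Open XML SDK is needed here.
var texts = new[] { "Alpha item", "- dash item", "• bullet item", "42 numbered", "#tag", "Élan" };

// A letter (any script, via \p{L}), whitespace, • or - at the start of the string.
var reg = new Regex(@"^[\p{L}\s•-]");
var selected = texts.Where(t => reg.IsMatch(t)).ToList();

foreach (var s in selected)
    Console.WriteLine(s);
```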

"Cut out" a specific Text of a file and put it into a string

My application saves information to a file like this:
[Name]{ExampleName}
[Path]{ExamplePath}
[Author]{ExampleAuthor}
I want to cut out the [Name]{....} part and just get back "ExampleName".
//This is what the strings should contain in the end.
string name = "ExampleName";
string path = "ExamplePath";
Is there any way to do this in C#?

You can extract the keys and the values and push them into a dictionary that you can later easily access like this:
var text = "[Name]{ExampleName} [Path]{ExamplePath} [Author]{ExampleAuthor}";
// You can use a regex to extract the key/value pairs
var rgx = new Regex(@"\[(?<key>[a-zA-Z]+)\]{(?<value>[a-zA-Z]+)}", RegexOptions.IgnorePatternWhitespace);
var matches = rgx.Matches(text);
// Now you can add the values to a dictionary
var dic = new Dictionary<string, string>();
foreach (Match match in matches)
{
dic.Add(match.Groups["key"].Value, match.Groups["value"].Value);
}
// Then you can access your values like this.
var name = dic["Name"];
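The file in the question stores a path, and real paths contain characters such as : and \ that [a-zA-Z]+ rejects. A variation on the answer above (an assumption, not its original pattern) uses [^}]* for the value; the inline string stands in for File.ReadAllText on the real file, and the indexer assignment avoids an exception if a key appears twice:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// In the real application this would be File.ReadAllText(path); an inline string keeps the sketch self-contained.
string text = "[Name]{ExampleName}\n[Path]{C:\\Data\\Example}\n[Author]{ExampleAuthor}";

// Variation (assumption): [^}]* lets values contain ':' or '\', which [a-zA-Z]+ rejects.
var rgx = new Regex(@"\[(?<key>[^\]]+)\]\{(?<value>[^}]*)\}");
var dic = new Dictionary<string, string>();
foreach (Match match in rgx.Matches(text))
    dic[match.Groups["key"].Value] = match.Groups["value"].Value;

string name = dic["Name"];
string path = dic["Path"];
Console.WriteLine(name);
Console.WriteLine(path);
```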

You could use a regular expression to cut out the string part between brackets:
var regex = new Regex(@"\[.*\]{(?<variableName>.*)}");
If you try matching this regular expression on your strings, you end up with your resulting strings in the match group 'variableName' (match.Groups["variableName"].Value).
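A minimal sketch of applying that regex line by line. Since . does not match a newline, each line yields its own value; if several entries shared one line, the greedy \[.*\] would run to the last ] and return the last value instead. The sample lines are taken from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var regex = new Regex(@"\[.*\]{(?<variableName>.*)}");
string[] lines = { "[Name]{ExampleName}", "[Path]{ExamplePath}" };

var found = new List<string>();
foreach (var line in lines)
{
    Match m = regex.Match(line);
    if (m.Success)
        found.Add(m.Groups["variableName"].Value);
}
Console.WriteLine(string.Join(", ", found));
```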

I'm not sure I understand what you need, but I'll try.
You can use this:
var regex = new Regex(Regex.Escape("Name"));
var newText = regex.Replace("NameExampleName", "", 1);

Would like to split a string using a regex pattern

I have a string that I would like to split into:
var finalQuote = "2012-0001-1";
var quoteNum = "2012-0001";
var revision = "1";
I used something like this:
var quoteNum = finalQuote.Substring(0, 9);
var revision = finalQuote.Substring(finalQuote.LastIndexOf("-") + 1);
But can't this be done more efficiently with a regex? I keep coming across patterns like this that need to be split in two.
var finalQuote = "2012-0001-1";
string pat = @"(\d|[A-Z]){4}-\d{4}";
Regex r = new Regex(pat, RegexOptions.IgnoreCase);
Match m = r.Match(finalQuote);
var quoteNum = m.Value;
This is as far as I've got, but I feel I'm not using the right method. Please guide me.
EDIT: I want to split by the pattern. Splitting on dashes is not an option, because the first part of the result itself contains a dash, i.e. "2012-0001".

I would simply go with:
var quoteNum = finalQuote.Substring(0,9);
var revision = finalQuote.Substring(10);
quoteNum would consist of the first 9 characters, and revision of everything after the dash at index 9 (the 11th character onward), so it still works if the revision is 10 or higher.
Complicated regexes or extension methods quickly become overkill; sometimes the simple methods are efficient enough on their own.
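A minimal sketch of the Substring approach, assuming (as the answer does) that the quote number is always exactly 9 characters; the helper name SplitQuote is hypothetical:

```csharp
using System;

// Assumes the fixed "YYYY-NNNN" prefix from the question: 9 characters, then a dash at index 9.
static (string QuoteNum, string Revision) SplitQuote(string finalQuote) =>
    (finalQuote.Substring(0, 9), finalQuote.Substring(10));

foreach (var q in new[] { "2012-0001-1", "2012-0001-12" })
{
    var (quoteNum, revision) = SplitQuote(q);
    Console.WriteLine($"{quoteNum} / {revision}");
}
```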

I agree with the others that Substring is a better solution than a regex for this.
But if you're insisting on using regex you can use something like:
^(\d{4}-\d{4})-(\d)$
Untested since I don't have a C# environment installed:
var finalQuote = "2012-0001-1";
string pat = @"^(\d{4}-\d{4})-(\d)$";
Regex r = new Regex(pat);
Match m = r.Match(finalQuote);
var quoteNum = m.Groups[1].Value;
var revision = m.Groups[2].Value;
Alternatively, if you want a string[] you could try (again, untested):
string[] data = Regex.Split("2012-0001-1", @"-(?=\d$)");
data[0] would be quoteNum and data[1] would be revision.
Update:
Explanation of the Regex.Split:
From the Regex.Split documentation: The Regex.Split methods are similar to the String.Split method, except that Regex.Split splits the string at a delimiter determined by a regular expression instead of a set of characters.
The regex -(?=\d$) matches a single - only when it is followed by a digit and then the end of the string, so it matches only the last dash. That digit is not consumed, because (?=...) is a zero-width lookahead assertion.
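Both variants above, checked in one minimal sketch. Note that \d matches a single digit, so a revision such as 12 would not split with -(?=\d$); the \d+ used for the Split call here is a variation on the answer, not its original pattern:

```csharp
using System;
using System.Text.RegularExpressions;

var finalQuote = "2012-0001-1";

// Capturing groups: group 1 = quote number, group 2 = revision.
Match m = Regex.Match(finalQuote, @"^(\d{4}-\d{4})-(\d)$");
var quoteNum = m.Groups[1].Value;
var revision = m.Groups[2].Value;

// Split on the dash that is followed only by trailing digits (zero-width lookahead).
// \d+ instead of \d (variation) also accepts multi-digit revisions.
string[] data = Regex.Split("2012-0001-12", @"-(?=\d+$)");

Console.WriteLine($"{quoteNum} | {revision}");
Console.WriteLine(string.Join(" | ", data));
```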

It would be easier to maintain in the future if you used something a newcomer would understand.
You could use:
var finalQuote = "2012-0001-1";
string[] parts = finalQuote.Split('-');
var quoteNum = parts[0] + "-" + parts[1];
var revision = parts[2];
However, if you insist on a regex, then:
(\d{4}-\d{4})-(\d)
There are two groups in this expression: group 1 captures the first part and group 2 captures the second part.
var finalQuote = "2012-0001-1";
string pat = @"(\d{4}-\d{4})-(\d)";
Regex r = new Regex(pat, RegexOptions.IgnoreCase);
Match m = r.Match(finalQuote);
var quoteNum = m.Groups[1].Value;
var revision = m.Groups[2].Value;

Regex Problems, extracting data to groups

How I love regex!
I have a string which will be a mangled form of XML, like:
<Category>DIR</Category><Location>DL123A</Location><Reason>Because</Reason><Qty>42</Qty><Description>Some Desc</Description><IPAddress>127.0.0.1</IPAddress>
Everything will be on one line; however, the 'headers' will often differ.
So what I need to do is extract all the information from the string above and put it into a Dictionary/Hashtable.
--
string myString = @"<Category>DIR</Category><Location>DL123A</Location><Reason>Because</Reason><Qty>42</Qty><Description>Some Desc</Description><IPAddress>127.0.0.1</IPAddress>";
var headers = new List<string>();
var values = new List<string>();
//this will extract the name of the label in the header
Regex r = new Regex(@"(?<header><[A-Za-z]+>?)");
//Create a collection of matches
MatchCollection mc = r.Matches(myString);
foreach (Match m in mc)
{
headers.Add(m.Groups["header"].Value);
}
//this will try and get the values.
r = new Regex(@"(?'val'>[A-Za-z0-9\s]*</?)");
mc = r.Matches(myString);
foreach (Match m in mc)
{
string match = m.Groups["val"].Value;
if (string.IsNullOrEmpty(match) || match == "><" || match == "> <")
continue;
else
values.Add(match);
}
--
I hacked that together from previous regex work as best I could.
But it doesn't really work the way I want.
The 'header' group also pulls in the angle brackets.
The 'value' group pulls in a lot of empty matches (hence the dodgy if statement in the loop). It also doesn't work on strings with periods, commas, spaces, etc.
It would also be much better if I could combine the two statements so I don't have to loop through the regex twice.
Can anyone give me some info where I can improve it?

If it looks like XML, why not use the XML parsing functionality of .NET? All you need to do is add a root element around it:
using System.Collections.Generic;
using System.Xml.Linq;
string myString = @"<Category>DIR</Category><Location>DL123A</Location><Reason>Because</Reason><Qty>42</Qty><Description>Some Desc</Description><IPAddress>127.0.0.1</IPAddress>";
var values = new Dictionary<string, string>();
var xml = XDocument.Parse("<root>" + myString + "</root>");
foreach(var e in xml.Root.Elements()) {
values.Add(e.Name.ToString(), e.Value);
}

This should strip the angle brackets:
Regex r = new Regex(@"<(?<header>[A-Za-z]+)>");
and this should get rid of empty spaces:
r = new Regex(@">\s*(?'val'[A-Za-z0-9\s]*)\s*</");

This will match the headers without <>:
(?<=<)(?<header>[A-Za-z]+)(?=>)
This will get all values (I'm not sure what can be accepted as a value):
(?<=>)(?'val'[^<]*)(?=</)
However, this is all XML, so you can:
XDocument doc = XDocument.Parse(string.Format("<root>{0}</root>",myString));
var pairs = doc.Root.Descendants().Select(node => new KeyValuePair<string, string>(node.Name.LocalName, node.Value));
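The question also asked for a single pass instead of two. One way to do that with a regex (a sketch of a technique the answers above do not use) is a backreference: \k<header> requires the closing tag to repeat the captured name, and [^<]* accepts periods, spaces and other characters. It assumes there are no nested elements:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string myString = "<Category>DIR</Category><Location>DL123A</Location><Reason>Because</Reason><Qty>42</Qty><Description>Some Desc</Description><IPAddress>127.0.0.1</IPAddress>";

// One pass: capture the tag name, then everything up to the matching closing tag.
// \k<header> is a backreference, so </Qty> only closes <Qty>.
var r = new Regex(@"<(?<header>[A-Za-z]+)>(?<val>[^<]*)</\k<header>>");

var values = new Dictionary<string, string>();
foreach (Match m in r.Matches(myString))
    values[m.Groups["header"].Value] = m.Groups["val"].Value;

foreach (var pair in values)
    Console.WriteLine($"{pair.Key} = {pair.Value}");
```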

.NET regular expression find the number and group the number

I have a question about .NET regular expressions.
I have several strings in a list. Each may contain a number, and the rest of the string is the same, for example:
string[] strings = {"var1", "var2", "var3", "array[0]", "array[1]", "array[2]"};
I want the result to be {"var$i", "array[$i]"}, together with a record of the numbers that were matched, like a dictionary:
var$i {1,2,3}
array[$i] {0,1,2}
I defined a regex like this:
var numberReg = new Regex(@".*(<number>\d+).*");
foreach(string str in strings){
var matchResult = numberReg.Match(str);
if(matchResult.Success){
var number = matchResult.Groups["number"].ToString();
//blablabla
But the regex here doesn't seem to work (it never matches successfully). I'm new to regex, and I want to solve this problem as soon as possible.

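The named group is missing its ?, so (<number>\d+) looks for the literal text <number> followed by digits. Try this as your regex: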
(?<number>\d+)
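To go from the fixed regex to the {template: numbers} result the question describes, here is a minimal sketch. It assumes each string contains at most one number, and it builds the template by splicing "$i" in at the match position rather than using Regex.Replace, where $ has a special meaning in replacement strings:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string[] strings = { "var1", "var2", "var3", "array[0]", "array[1]", "array[2]" };
var numberReg = new Regex(@"(?<number>\d+)");

// Maps each template (number replaced by "$i") to the numbers seen for it, in input order.
var templates = new Dictionary<string, List<int>>();
foreach (string str in strings)
{
    Match m = numberReg.Match(str);
    if (!m.Success)
        continue;

    // Assumption: only the first number in each string matters.
    string template = str.Substring(0, m.Index) + "$i" + str.Substring(m.Index + m.Length);
    int number = int.Parse(m.Groups["number"].Value);

    if (!templates.TryGetValue(template, out var numbers))
    {
        numbers = new List<int>();
        templates[template] = numbers;
    }
    numbers.Add(number);
}

foreach (var entry in templates)
    Console.WriteLine(entry.Key + " {" + string.Join(",", entry.Value) + "}");
```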

It is not clear to me exactly what you want. However, looking at your code, I assume you need to extract the numbers (and maybe the variable names) from your list of values. Try this:
// values
string[] myStrings = { "var1", "var2", "var3", "array[0]", "array[1]", "array[2]" };
// matches
Regex x = new Regex(@"(?<pre>\w*)(?<number>\d+)(?<post>\w*)");
MatchCollection matches = x.Matches(String.Join(",", myStrings));
// get the numbers
foreach (Match m in matches)
{
string number = m.Groups["number"].Value;
...
}
