Regex global without collection c#

How can I get a global regex match as one string, rather than as a collection?
Regex r = new Regex(".+");
Match match = r.Match("aaaa \r\n bbbb");
string result=match.Value;
I get result="aaaa " but I want result="aaaa \r\n bbbb".
I know that I can get this from a collection, but I need it as a single Match instead.

. doesn't match newlines (\n) unless you tell it to.
You can use (?s) for that, like: new Regex("(?s).+")
Or the Singleline option, like: new Regex(".+", RegexOptions.Singleline)

The dot does not match newline characters by default, so you need to compile the regex using the RegexOptions.Singleline flag:
Regex r = new Regex(".+", RegexOptions.Singleline);
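As a quick check of both answers, here is a minimal sketch that runs the same pattern with and without RegexOptions.Singleline. The names SinglelineDemo and FirstMatch are made up for illustration. One detail worth noticing: by default the dot excludes only \n, so the default match still ends with the \r, which is invisible when printed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Returns the first match of ".+" under the given options.
    public static string FirstMatch(string input, RegexOptions options)
    {
        return new Regex(".+", options).Match(input).Value;
    }

    public static void Main()
    {
        string input = "aaaa \r\n bbbb";

        string byDefault = FirstMatch(input, RegexOptions.None);
        string singleline = FirstMatch(input, RegexOptions.Singleline);
        string inline = new Regex("(?s).+").Match(input).Value;

        Console.WriteLine(byDefault.Length);    // 6: "aaaa " plus the \r
        Console.WriteLine(singleline == input); // True
        Console.WriteLine(inline == input);     // True
    }
}
```

The result is still a single Match, so no collection is needed.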

C# Regex Match between with or without new lines

I am trying to match text between two delimiters, [% %], and I want to get everything whether the string contains new lines or not.
Code
string strEmailContent = sr.ReadToEnd();
string commentPatt = @"\[%((\r\n?|\n).*(\r\n?|\n))%\]";
Regex commentRgx = new Regex(commentPatt, RegexOptions.Singleline);
Sample Inputs
//Successful
[%
New Comment
%] other content from input
//Match: [%\r\nNew Comment\r\n%]
//Fail
[% New Comment %]
//Match: false
//Successfully match single line with
string commentPatt = @"\[%(.*)%\]";
//Match: [% New Comment %]
I do not know how to combine these two patterns to match both cases. Can anyone provide any assistance?
To get text between two delimiters you need to use lazy matching with .*?, but to also match newline symbols, you need (?s) singleline modifier so that the dot could also match newline symbols:
(?s)\[%(.*?)%]
Note that (?s)\[%(.*?)%] will match even if the % is inside [%...%].
Note that the ] does not have to be escaped since it is situated in an unambiguous position and can only be interpreted as a literal ].
In C#, you can use
var rx = new Regex(@"(?s)\[%(.*?)%]");
var res = rx.Matches(str).Cast<Match>().Select(p => p.Groups[1].Value).ToList();
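To see that one pattern covers both the multi-line and the single-line case, here is a small sketch built around the regex above. The class name CommentExtractor, the method Extract and the sample text are assumptions for illustration, and the Trim() call is my own addition to drop the whitespace and line breaks around each comment.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CommentExtractor
{
    // (?s) lets the dot match newlines; .*? stops at the first closing %].
    private static readonly Regex CommentRx = new Regex(@"(?s)\[%(.*?)%]");

    public static List<string> Extract(string text)
    {
        return CommentRx.Matches(text)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value.Trim())
            .ToList();
    }

    public static void Main()
    {
        string text = "[%\r\nNew Comment\r\n%] other content [% Second Comment %]";
        foreach (string comment in Extract(text))
            Console.WriteLine(comment);
        // New Comment
        // Second Comment
    }
}
```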
Try this pattern:
\[%([^%]*)%\]
It captures all characters between "[%" and "%]" that are not a "%" character.
Tested at Regex101.
If you want to "see" the "\r\n" in your results, you'll have to escape them with a String.Replace().

C# RegEx - get only first match in string

I've got an input string that looks like this:
level=<device[195].level>&name=<device[195].name>
I want to create a RegEx that will parse out each of the <device> tags, for example, I'd expect two items to be matched from my input string: <device[195].level> and <device[195].name>.
So far I've had some luck with this pattern and code, but it always finds both of the device tags as a single match:
var pattern = "<device\\[[0-9]*\\]\\.\\S*>";
Regex rgx = new Regex(pattern);
var matches = rgx.Matches(httpData);
The result is that matches will contain a single result with the value <device[195].level>&name=<device[195].name>
I'm guessing there must be a way to 'terminate' the pattern, but I'm not sure what it is.
Use non-greedy quantifiers:
<device\[\d+\]\.\S+?>
Also, use verbatim strings for escaping regexes, it makes them much more readable:
var pattern = @"<device\[\d+\]\.\S+?>";
As a side note, I guess in your case using \w instead of \S would be more in line with what you intended, but I left the \S because I can't know that.
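To make the difference concrete, here is a minimal sketch that counts matches for the greedy pattern from the question and the lazy one from this answer. The names DeviceTags and Find are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DeviceTags
{
    // Returns every match of the pattern as a string.
    public static List<string> Find(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }

    public static void Main()
    {
        string data = "level=<device[195].level>&name=<device[195].name>";

        // Greedy \S* runs to the last '>' because the input has no whitespace.
        Console.WriteLine(Find(data, @"<device\[[0-9]*\]\.\S*>").Count); // 1

        // Lazy \S+? stops at the first '>' it can.
        foreach (string tag in Find(data, @"<device\[\d+\]\.\S+?>"))
            Console.WriteLine(tag);
        // <device[195].level>
        // <device[195].name>
    }
}
```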
It depends on how much of the structure of the angle-bracket blocks you need to match, but you can do
"\\<device.+?\\>"
I want to create a RegEx that will parse out each of the <device> tags
I'd expect two items to be matched from my input string:
1. <device[195].level>
2. <device[195].name>
This should work. Get the matched group from index 1
(<device[^>]*>)
String literals for use in programs:
@"(<device[^>]*>)"
Change your repetition operator and use \w instead of \S
var pattern = @"<device\[[0-9]+\]\.\w+>";
String s = @"level=<device[195].level>&name=<device[195].name>";
foreach (Match m in Regex.Matches(s, @"<device\[[0-9]+\]\.\w+>"))
    Console.WriteLine(m.Value);
Output
<device[195].level>
<device[195].name>
Use named match groups and create a LINQ entity projection. There will be two matches, thus separating the individual items:
string data = "level=<device[195].level>&name=<device[195].name>";
string pattern = @"
(?<variable>[^=]+) # get the variable name
(?:=<device\[) # static '=<device['
(?<index>[^\]]+) # device number index
(?:]\.) # static ].
(?<sub>[^>]+) # Get the sub command
(?:>&?) # Match but don't capture the > and possible &
";
// IgnorePatternWhitespace only lets us document the pattern; it does not affect matching.
var items = Regex.Matches(data, pattern, RegexOptions.IgnorePatternWhitespace)
.OfType<Match>()
.Select (mt => new
{
Variable = mt.Groups["variable"].Value,
Index = mt.Groups["index"].Value,
Sub = mt.Groups["sub"].Value
})
.ToList();
items.ForEach(itm => Console.WriteLine ("{0}:{1}:{2}", itm.Variable, itm.Index, itm.Sub));
/* Output
level:195:level
name:195:name
*/

Dot word pattern matching

I want to create a regular expression to match a word that begins with a period. The word(s) can exist N times in a string. I want to ensure that the word comes up whether it's at the beginning of a line, the end of a line or somewhere in the middle. The latter part is what I'm having difficulty with.
Here is where I am at so far.
const string pattern = @"(^|(.* ))(?<slickText>\.[a-zA-Z0-9]*)( .*|$)";
public static MatchCollection Find(string input)
{
Regex regex = new Regex(pattern,RegexOptions.IgnoreCase | RegexOptions.Multiline);
MatchCollection collection = regex.Matches(input);
return collection;
}
My test pattern finds .lee and .good. My test pattern fails to find .bruce:
static void Main()
{
MatchCollection results = ClassName.Find("a short stump .bruce\r\nand .lee a small tree\r\n.good roots");
foreach (Match item in results)
{
GroupCollection groups = item.Groups;
Console.WriteLine("{0} ", groups["slickText"].Value);
}
System.Diagnostics.Debug.Assert(results.Count > 0);
}
Maybe all you need is \.\w+ (a literal dot followed by one or more word characters).
Test:
var s = "a short stump .bruce\r\nand .lee a small tree\r\n.good roots";
Regex.Matches(s, @"\.\w+").Dump(); // Dump() is LINQPad's output helper
Result: three matches, .bruce, .lee and .good.
Note:
If you don't want to find .foo in some.foo (because there's no whitespace between some and .foo), you can use (?<=\W|^)\.\w+ instead.
Bizarrely enough, it seems that with RegexOptions.Multiline, ^ and $ additionally match only next to \n; they do not treat \r\n as a line break.
Thus you get .good because it is preceded by \n, after which ^ matches, but you don't get .bruce because it is followed by \r, before which $ does not match.
You could do a .Replace("\r", "") on the input, or rewrite your expression to take individual lines of input.
Edit: Or replace $ with \r?$ in your pattern to explicitly include the \r; thanks to SvenS for the suggestion.
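A stripped-down sketch of that behaviour, using a simplified pattern rather than the one from the question. The names LineEndDemo and Count and the shortened input are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineEndDemo
{
    // Counts matches with RegexOptions.Multiline enabled.
    public static int Count(string input, string pattern)
    {
        return Regex.Matches(input, pattern, RegexOptions.Multiline).Count;
    }

    public static void Main()
    {
        string input = "stump .bruce\r\nroots .good";

        // $ matches before \n or at the very end, but not before \r,
        // so only .good is found.
        Console.WriteLine(Count(input, @"\.\w+$"));    // 1

        // Allowing an optional \r before $ also picks up .bruce.
        Console.WriteLine(Count(input, @"\.\w+\r?$")); // 2
    }
}
```

Note that with \r?$ the matched value for .bruce includes the trailing \r, so capture the word in a group or trim the value if you need it clean.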
In your RegEx, a word has to be terminated by a space, but bruce is terminated by \r instead.
I would give this regex a go:
(?:.*?(\.[A-Za-z]+(?:\b|.\s)).*?)+
And change the RegexOptions from Multiline to Singleline - in this mode dot matches all characters including newline.

Regex for lines that match a pattern

I would like a regex pattern to match all lines in a text file that have the following pattern:
TcQuery {dynamic_content} Alias "{dynamic_content}" New
If the text file has these two lines:
//tcquery c_query alias "qrybklog" new <= This one shouldn't be found because there are two slashes before TcQuery.
tcquery c_query alias "qrybklog" new <= I want a pattern to match this line
I've tried this, but both lines are matched:
var prw = System.IO.File.ReadAllText(@"d:\backlog.prw", Encoding.ASCII);
prw = "//TcQuery c_query Alias teste1 new";
prw += "\nTcQuery c_query Alias teste2 new";
prw += "\nTcQuery c_query Alias teste3 new";
prw += "\n//TcQuery c_query Alias teste4 new";
var regexTcQuery = new Regex("TcQuery+[ *]+[0-9a-zA-Z_]+[ *]+alias+[ *]+[0-9a-zA-Z_\"]+[ *]new$", RegexOptions.IgnoreCase);
var resultTcQuery = regexTcQuery.Matches(prw);
Use the Singleline and IgnoreCase options with this regex:
(?<!\/\/)\s*?\btcquery\b(.*?)\balias\b.*?new
(?<!\/\/) is a negative lookbehind that fails the match when // comes immediately before it.
.*? lazily matches zero or more characters. If you used .* instead, the match would run on to the last tcquery line and skip the tcquery matches in between.
\b is a word boundary. This allows us to match tcquery and alias as separate words.
\s*? lazily matches zero or more whitespace characters, such as any of [ \n\r\t\f].
If you want to get the c_query text you can do this
List<string> lst = Regex.Matches(input, @"(?<!\/\/)\s*?\btcquery\b(.*?)\balias\b.*?new", RegexOptions.Singleline | RegexOptions.IgnoreCase).Cast<Match>().Select(x => x.Groups[1].Value).ToList();
If you want to match the full line (without any sub-groups), you could use:
^tcquery.*$
The ^ anchors the match to the start of the string, or to the start of each line when RegexOptions.Multiline is set; since tcquery is the text immediately following, lines that start with // will be ignored.
If there is any whitespace before tcquery, you can match it with \s*:
^\s*tcquery.*$
If you have lines that can begin with tcquery but don't necessarily follow the format you specified, you can update the pattern with:
^\s*tcquery.*alias.*new.*?$
To match the "dynamic" content portions of the line(s) into groups, you should be able to use:
^\s*tcquery\s+(.*)\s+alias\s+"(.*)"\s+new.*?$
Also worth noting: you should have the ignore-case regex option enabled for each of the examples above.
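Here is a rough sketch of the last pattern applied to a multi-line string. It assumes RegexOptions.Multiline (so that ^ and $ work per line), \n line endings, and a quoted alias as in the format from the question; the names TcQueryLines and Aliases and the sample lines are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TcQueryLines
{
    // Group 1 is the query name, group 2 the alias between the quotes.
    private static readonly Regex LineRx = new Regex(
        @"^\s*tcquery\s+(.*)\s+alias\s+""(.*)""\s+new.*?$",
        RegexOptions.IgnoreCase | RegexOptions.Multiline);

    public static List<string> Aliases(string text)
    {
        return LineRx.Matches(text)
            .Cast<Match>()
            .Select(m => m.Groups[2].Value)
            .ToList();
    }

    public static void Main()
    {
        string text =
            "//TcQuery c_query Alias \"teste1\" new\n" +
            "TcQuery c_query Alias \"teste2\" new\n" +
            "  TcQuery c_query Alias \"teste3\" new\n" +
            "//TcQuery c_query Alias \"teste4\" new";

        foreach (string alias in Aliases(text))
            Console.WriteLine(alias);
        // teste2
        // teste3
    }
}
```

If the file uses \r\n line endings, the .*? before $ absorbs the trailing \r, so the captured groups are unaffected.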
([^\/\/])tcquery (.*) alias \"(.*)\" new$/i
If your language does not support /i for caseless matching, then switch the regex to:
([^\/\/])TcQuery (.*) Alias \"(.*)\" New$
Try this:
^tcquery [^ ]+ alias "[^"]+" new
It's an amateur one, but the main idea is that the line must start with tcquery.

C# - Regex Match whole words

I need to match all the whole words containing a given string.
string s = "ABC.MYTESTING
XYZ.YOUTESTED
ANY.TESTING";
Regex r = new Regex("(?<TM>[!\..]*TEST.*)", ...);
MatchCollection mc = r.Matches(s);
I need the result to be:
MYTESTING
YOUTESTED
TESTING
But I get:
TESTING
TESTED
.TESTING
How do I achieve this with regular expressions?
Edit: Extended sample string.
If you were looking for all words including 'TEST', you should use
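@"(?<TM>\w*TEST\w*)"
\w matches word characters; for ASCII input it is equivalent to [A-Za-z0-9_] (in .NET it also matches other Unicode letters and digits unless RegexOptions.ECMAScript is set).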
Keep it simple: why not just try \w*TEST\w* as the match pattern.
I get the results you are expecting with the following:
string s = @"ABC.MYTESTING
XYZ.YOUTESTED
ANY.TESTING";
var m = Regex.Matches(s, @"(\w*TEST\w*)", RegexOptions.IgnoreCase);
Try using \b. It's the regex anchor for a word boundary. If you wanted to match whole words you could use:
/\b[a-z]+\b/i
BTW, .NET doesn't need the surrounding /, and the i is just a case-insensitive match flag.
.NET alternative:
var re = new Regex(@"\b[a-z]+\b", RegexOptions.IgnoreCase);
Using Groups I think you can achieve it.
string s = @"ABC.TESTING
XYZ.TESTED";
Regex r = new Regex(@"(?<TM>[!\..]*(?<test>TEST.*))", RegexOptions.Multiline);
var mc= r.Matches(s);
foreach (Match match in mc)
{
Console.WriteLine(match.Groups["test"]);
}
Works exactly like you want.
BTW, your regular expression pattern should be a verbatim string (@"")
Regex r = new Regex(@"(?<TM>[^.]*TEST.*)", RegexOptions.IgnoreCase);
First, as @manojlds said, you should use verbatim strings for regexes whenever possible. Otherwise you'll have to use two backslashes in most of your regex escape sequences, not just one (e.g. [!\\..]*).
Second, if you want to match anything but a dot, that part of the regex should be [^.]*. ^ is the metacharacter that inverts the character class, not !, and . has no special meaning in that context, so it doesn't need to be escaped. But you should probably use \w* instead, or even [A-Z]*, depending on what exactly you mean by "word". [!\..] matches either ! or a literal dot.
Regex r = new Regex(@"(?<TM>[A-Z]*TEST[A-Z]*)", RegexOptions.IgnoreCase);
That way you don't need to bother with word boundaries, though they don't hurt:
Regex r = new Regex(@"(?<TM>\b[A-Z]*TEST[A-Z]*\b)", RegexOptions.IgnoreCase);
Finally, if you're always taking the whole match anyway, you don't need to use a capturing group:
Regex r = new Regex(@"\b[A-Z]*TEST[A-Z]*\b", RegexOptions.IgnoreCase);
The matched text will be available via Match's Value property.
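Putting that last pattern to work on the sample from the question, here is a minimal sketch. The helper name WholeWords.Containing and the idea of passing the fragment in as a parameter (escaped with Regex.Escape) are my own assumptions, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WholeWords
{
    // Finds whole words (letters only) that contain the given fragment.
    public static List<string> Containing(string text, string fragment)
    {
        string pattern = @"\b[A-Z]*" + Regex.Escape(fragment) + @"[A-Z]*\b";
        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }

    public static void Main()
    {
        string s = "ABC.MYTESTING\nXYZ.YOUTESTED\nANY.TESTING";
        foreach (string word in Containing(s, "TEST"))
            Console.WriteLine(word);
        // MYTESTING
        // YOUTESTED
        // TESTING
    }
}
```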
