Regex printing newlines C# - c#

I am using Regex to return two lines of text, but it returns newlines with additional text.
//Text is located in txt file and contains
I've been meaning to talk with you.
I've been meaning to talk with you.
I've been meaning to talk with you.
string text = File.ReadAllText(@"C:\...\1.txt");
Regex unit = new Regex(@"(?<Text>.+(\r\n){1}.+)");
MatchCollection mc = unit.Matches(text);
foreach (Match m in mc)
foreach (Group g in m.Groups)
Console.WriteLine(g.Value);

You may use
var m = Regex.Match(text, @"^.+(?:\n.+)?");
if (m.Success)
{
Console.Write(m.Value.Trim());
}
Details
^ - start of string
.+ - any 1+ chars other than an LF symbol
(?:\n.+)? - an optional sequence of:
\n - a newline
.+ - any 1+ chars other than an LF symbol
The .Trim() call strips a possible trailing CR symbol from the result (in a .NET regex, . also matches a CR symbol).
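The CR behaviour can be checked with a small sketch. The class and method names (DotCrDemo, FirstTwoLines, FirstTwoLinesRaw) are made up for illustration, and the input is assumed to use CRLF line endings.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotCrDemo
{
    // Hypothetical helper: first one or two lines, with the trailing CR trimmed.
    public static string FirstTwoLines(string text)
    {
        return Regex.Match(text, @"^.+(?:\n.+)?").Value.Trim();
    }

    // Same match without Trim(): on CRLF input the value still ends with '\r',
    // because '.' matches every character except LF.
    public static string FirstTwoLinesRaw(string text)
    {
        return Regex.Match(text, @"^.+(?:\n.+)?").Value;
    }
}
```

For the input "one\r\ntwo\r\nthree", FirstTwoLinesRaw returns "one\r\ntwo\r" and FirstTwoLines returns "one\r\ntwo".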

Related

Regex - split by "_" and exclude file extension

I need to split the following string AAA_BBB_CCC.extension by "_" and exclude from the results any file extension.
Where A, B and C can be any character or space. I wish to get AAA, BBB and CCC.
I know that \.(?:.(?!\.))+$ will match .extension but I could not combine it with matching "_" for splitting.
Use the Path.GetFileNameWithoutExtension function to strip the extension from the file name.
Then use String.Split to get an array with three items:
var fileName = Path.GetFileNameWithoutExtension(fullName);
var parts = fileName.Split('_');
var partAAA = parts[0];
var partBBB = parts[1];
var partCCC = parts[2];
If the parts are always the same fixed number of characters long, you can as well extract them using the Substring function. No need to resort to regex here.
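A minimal sketch of the Substring variant, assuming exactly three parts of three characters each, separated by single underscores. The class name FixedWidthParts and the offsets are illustrative assumptions, not part of the original question.

```csharp
using System;
using System.IO;

public static class FixedWidthParts
{
    // Assumes a name shaped like "AAA_BBB_CCC.extension":
    // three 3-character parts at offsets 0, 4 and 8.
    public static string[] Split(string fullName)
    {
        string fileName = Path.GetFileNameWithoutExtension(fullName);
        return new[]
        {
            fileName.Substring(0, 3),
            fileName.Substring(4, 3),
            fileName.Substring(8, 3)
        };
    }
}
```

FixedWidthParts.Split("AAA_BBB_CCC.extension") yields AAA, BBB and CCC. Substring throws if the name is shorter than expected, so this only suits strictly fixed layouts.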
Another option is to make use of the .NET Group.Captures property and capture any char except an _ in a named capture group, which you can extract from the match using a named group.
^(?'val'[^_]+)(?:_(?'val'[^_]+))+\.\w+$
Explanation
^ Start of string
(?'val'[^_]+) Named group val, match 1+ chars other than _ using a negated character class
(?: Non-capturing group
_(?'val'[^_]+) Match an _ and capture again 1+ chars other than _ in same named group val
)+ Close the non capture group and repeat 1+ times for at least 1 occurrence with _
\.\w+ Match a . and 1+ word chars
$ End of string
string pattern = @"^(?'val'[^_]+)(?:_(?'val'[^_]+))+\.\w+$";
string input = @"AAA_BBB_CCC.extension";
Match m = Regex.Match(input, pattern);
foreach (Capture capture in m.Groups["val"].Captures) {
Console.WriteLine(capture.Value);
}
Output
AAA
BBB
CCC
If you wanted to use a regex based approach here, you could try doing a find all on the following regex pattern:
[^_]+(?=.*\.\w+$)
This pattern will match every term in between underscore, except for the portion after the extension, which will be excluded by the lookahead.
Regex rx = new Regex(@"[^_]+(?=.*\.\w+$)");
string text = "AAA_BBB_CCC.extension";
MatchCollection matches = rx.Matches(text);
foreach (Match match in matches)
{
Console.WriteLine(match.Groups[0].Value);
}
This prints:
AAA
BBB
CCC

Regex match with Arabic

I have a text in Arabic and I want to use Regex to extract numbers from it. Here is my attempt.
String :
"ما المجموع:
1+2"
Match match = Regex.Match(text, "المجموع: ([^\\r\\n]+)", RegexOptions.IgnoreCase);
It always returns false, and groups.value always returns null.
expected output:
match.Groups[1].Value //returns (1+2)
The regex you wrote matches a word, then a colon, then a space and then 1 or more chars other than backslash, r and n.
You want to match the whole line after the word, colon and any amount of whitespace chars:
var text = "ما المجموع:\n1+2";
var result = Regex.Match(text, @"المجموع:\s*(.+)")?.Groups[1].Value;
Console.WriteLine(result); // => 1+2
Other possible patterns:
@"المجموع:\r?\n(.+)" // To match CRLF or LF line ending only
@"المجموع:\n(.+)" // To match just LF ending only
Also, if you run the regex against a long multiline text with CRLF endings, it makes sense to replace .+ with [^\r\n]+, since . in a .NET regex matches any char except LF and therefore also matches the CR symbol.
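A short sketch of the [^\r\n]+ variant on CRLF input. The class name LineAfterLabel and the sample text are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineAfterLabel
{
    // Captures the rest of the line after the label, stopping before CR or LF.
    // Returns an empty string when there is no match.
    public static string Extract(string text)
    {
        return Regex.Match(text, @"المجموع:\s*([^\r\n]+)").Groups[1].Value;
    }

    // Same idea with (.+): on CRLF input the capture keeps the trailing '\r'.
    public static string ExtractWithDot(string text)
    {
        return Regex.Match(text, @"المجموع:\s*(.+)").Groups[1].Value;
    }
}
```

With the text "ما المجموع:\r\n1+2\r\nnext line", Extract returns "1+2", while ExtractWithDot returns "1+2\r".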

Regex to extract substrings in C#

I have a string as:
string subjectString = @"(((43*('\\uth\Hgh.Green.two.190ITY.PCV')*9.8)/100000+('VBNJK.PVI.10JK.PCV'))*('ASFGED.Height Density.1JKHB01.PCV')/476)";
My expected output is:
Hgh.Green.two.190ITY.PCV
VBNJK.PVI.10JK.PCV
ASFGED.Height Density.1JKHB01.PCV
Here's what I have tried:
Regex regexObj = new Regex(@"'[^\\]*.PCV");
Match matchResults = regexObj.Match(subjectString);
string val = matchResults.Value;
This works when the input string is @"(((43*('\\uth\Hgh.Green.two.190ITY.PCV')*9.8)/100000+", but when the string grows and there is more than one substring to extract, I get undesired results.
How do I extract three substrings from the original string?
It seems you want to match word and . chars before .PCV.
Use
[\w\s.]*\.PCV
To force at least 1 word char at the start use
\w[\w\s.]*\.PCV
Optionally, if needed, add a word boundary at the start: @"\b\w[\w\s.]*\.PCV".
To force \w match only ASCII letters and digits (and _) compile the regex object with RegexOptions.ECMAScript option.
Here,
\w - matches any letter, digit or _
[\w\s.]* - matches 0+ whitespace, word or/and . chars
\. - a literal .
PCV - a PCV substring.
Sample usage:
var results = Regex.Matches(str, @"\w[\w\s.]*\.PCV")
.Cast<Match>()
.Select(m=>m.Value)
.ToList();
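The RegexOptions.ECMAScript remark can be illustrated with a sketch that compares the two modes. The class name WordCharDemo, the ^ and $ anchors, and the sample inputs are assumptions added for the demonstration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordCharDemo
{
    // Anchored variant of the pattern, so the whole input must consist of allowed chars.
    private const string Pattern = @"^\w[\w\s.]*\.PCV$";

    // Default mode: \w matches Unicode letters and digits as well as ASCII ones.
    public static bool MatchesDefault(string s)
    {
        return Regex.IsMatch(s, Pattern);
    }

    // ECMAScript mode: \w is limited to [a-zA-Z0-9_].
    public static bool MatchesEcma(string s)
    {
        return Regex.IsMatch(s, Pattern, RegexOptions.ECMAScript);
    }
}
```

For "VBNJK.PVI.10JK.PCV" both methods return true. For an input with a non-ASCII letter such as "Größe.1.PCV", MatchesDefault returns true and MatchesEcma returns false.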

Match a particular word after double quotes - c#,regex

I want to match a particular word which is followed by double quotes.
I am using regex @"\bspecific\S*id\b" which will match anything that starts with specific and ends with id.
But, I want something which should match
"specific-anything-id"(it should be with double quotes)
**<specific-anything-id>** - should not match
specific-"anything"-id - should not match
You can include the double quotes and use a negated character class [^"] (matching any char but ") rather than \S (that can also match double quotes as it matches any non-whitespace character):
var pattern = @"""specific[^""]*id""";
You do not need word boundaries either here.
A C# demo:
var s = "\"specific-anything-id\" <specific-anything-id> specific-\"anything\"-id";
var matches = Regex.Matches(s, @"""specific[^""]*id""");
foreach (Match m in matches)
Console.WriteLine(m.Value); // => "specific-anything-id"
Do:
"([^"]+)"
the matched group would contain the ID you want.
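A sketch of how this pattern behaves in C#, using the sample string from the other answer. The class name QuotedValues is hypothetical. Note that the pattern captures any quoted text, not only values starting with specific and ending with id.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedValues
{
    // Collects the contents of every double-quoted run in the input.
    public static List<string> Extract(string s)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(s, "\"([^\"]+)\""))
        {
            values.Add(m.Groups[1].Value);
        }
        return values;
    }
}
```

For the input "\"specific-anything-id\" <specific-anything-id> specific-\"anything\"-id" this returns two values, specific-anything-id and anything, so the second case from the question is not excluded by this pattern on its own.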

Regex match between two strings that might contain another string

I'm doing a regex that is trying to match the following string:
.\SQL2012
From the two strings (they are contained within another larger string but that is irrelevant in this case):
/SERVER "\".\SQL2012\""
/SERVER .\SQL2012
So the "\" before and the \"" after the match may both be omitted in some cases. The regex I've come up with (from a previous question here on StackOverflow) is the following:
(?<=\/SERVER\s*(?:[""\\""]+)?)\w+(?=(?:[\\""""]+|$)| )
Which works fine if I'm trying to match TEST_SERVER instead of .\SQL2012 (because \w does not match special characters). Is there a way to match anything until \"" or a whitespace occurs?
I'm doing this in C#, here's my code:
string input = "/SERVER \"\\\".\\SQL2012\\\"\"";
string pattern = @"(?<=\/SERVER\s*(?:[""\\""]+)?)\w+(?=(?:[\\""""]+|$)| )";
Regex regEx = new Regex(pattern);
MatchCollection matches = regEx.Matches(input);
foreach (Match match in matches)
{
Console.WriteLine(match.ToString());
}
Console.ReadKey();
Add a word boundary \b just before the lookahead:
string input = "/SERVER .\\SQL2012";
Regex rgx = new Regex(@"(?<=\/SERVER\s+""\\"").*?\b(?=\\""""|$| )|(?<=\/SERVER\s+).*?\b(?= |$)");
foreach (Match m in rgx.Matches(input))
Console.WriteLine(m.Groups[0].Value);
Console.WriteLine(input);
