C# regular expression


I have a string like this:
{F971h}[0]<0>some result code: 1
and I want to split it into:
F971
0
0
some result code: 1
I know I can first split it on "{|}|[|]|<|>" into:
{F971h}
[0]
<0>
some result code: 1
and next: {F971h} -> F971; [0] -> 0; etc.
But how can I do it with one regular expression?
I tried something like this:
Regex rgx = new Regex(@"(?<timestamp>[0-9A-F]+)" + @"(?<subsystem>\d+)" + @"(?<level>\d+)" + @"(?<messagep>[0-9A-Za-z]+)");
var result = rgx.Matches(input);

You can just use String.Split, without any regular expressions:
string source = "{F971h}[0]<0>some result code: 1";
string[] items = source.Split(new char[] { '{', '}', '[', ']', '<', '>' },
StringSplitOptions.RemoveEmptyEntries);
Test:
// F971h
// 0
// 0
// some result code: 1
Console.Write(String.Join(Environment.NewLine, items));

There are two issues with your regex:
You do not allow lowercase ASCII letters in the first capture group (add a-z or a RegexOptions.IgnoreCase flag)
The delimiting characters are missing in the pattern (<, >, [, ], etc.)
Use
{(?<timestamp>[0-9a-zA-F]+)}\[(?<subsystem>\d+)]<(?<level>\d+)>(?<messagep>.+)
See the regex demo
Since the messagep group should match just the rest of the line, I suggest simply using .+ at the end. Otherwise, you'd need to replace your [0-9A-Za-z]+, which does not allow whitespace, with something like [\w\s:]+ (word characters, whitespace and colons, 1 or more times). A plain [\w\s]+ would stop at the colon in "some result code: 1".
C# code:
var s = @"{F971h}[0]<0>some result code: 1";
var pat = @"{(?<timestamp>[0-9a-zA-F]+)}\[(?<subsystem>\d+)]<(?<level>\d+)>(?<messagep>.+)";
var m = Regex.Match(s, pat);
if (m.Success)
{
    Console.Out.WriteLine(m.Groups["timestamp"].Value);
    Console.Out.WriteLine(m.Groups["subsystem"].Value);
    Console.Out.WriteLine(m.Groups["level"].Value);
    Console.Out.WriteLine(m.Groups["messagep"].Value);
}
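With this pattern the timestamp group captures F971h, because the class [0-9a-zA-F] also accepts lowercase letters, including the trailing h. If you want F971 as in the question, one option is to keep the h outside the group and anchor the pattern to the whole line. This is a sketch that assumes the timestamp is always hex digits followed by a literal h; LogLineParser and Parse are illustrative names.

```csharp
using System.Text.RegularExpressions;

public static class LogLineParser
{
    // Assumption: timestamps are hex digits followed by a literal 'h', e.g. "{F971h}".
    // The 'h' sits outside the named group, so the group captures only "F971".
    // Braces and brackets are escaped so they are matched literally.
    private static readonly Regex Pattern = new Regex(
        @"^\{(?<timestamp>[0-9A-Fa-f]+)h\}\[(?<subsystem>\d+)\]<(?<level>\d+)>(?<message>.*)$");

    // Returns { timestamp, subsystem, level, message }, or null if the line has another shape.
    public static string[] Parse(string line)
    {
        Match m = Pattern.Match(line);
        if (!m.Success)
        {
            return null;
        }

        return new[]
        {
            m.Groups["timestamp"].Value,
            m.Groups["subsystem"].Value,
            m.Groups["level"].Value,
            m.Groups["message"].Value
        };
    }
}
```

For example, LogLineParser.Parse("{F971h}[0]<0>some result code: 1") returns F971, 0, 0 and some result code: 1.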
Or for a multiline string containing multiple matches:
var s = "{F971h}[0]<0>some result code: 1\r\n{FA71h}[0]<0>some result code: 3\r\n{FB72h}[0]<0>some result code: 5";
var pat = @"{(?<timestamp>[0-9a-zA-F]+)}\[(?<subsystem>\d+)]<(?<level>\d+)>(?<messagep>[^\r\n]+)";
var res = System.Text.RegularExpressions.Regex.Matches(s, pat)
.Cast<System.Text.RegularExpressions.Match>()
.Select(x => new[] {
x.Groups["timestamp"].Value,
x.Groups["subsystem"].Value,
x.Groups["level"].Value,
x.Groups["messagep"].Value})
.ToList();

You can get it like that:
string line = @"{F971h}[0]<0>some result code: 1";
var matchCollection = Regex.Matches(line, @"\{(?<timestamp>.*?)\}\[(?<subsystem>.*?)\]<(?<level>.*?)>(?<messagep>.*)");
if (matchCollection.Count > 0)
{
    string timestamp = matchCollection[0].Groups["timestamp"].Value;
    string subsystem = matchCollection[0].Groups["subsystem"].Value;
    string level = matchCollection[0].Groups["level"].Value;
    string messagep = matchCollection[0].Groups["messagep"].Value;
    Console.Out.WriteLine("First part is {0}, second: {1}, third: {2}, last: {3}", timestamp, subsystem, level, messagep);
}
else
{
    Console.Out.WriteLine("No match found.");
}
You can watch it live here on regex storm. You'll have to learn about:
Named capture groups
Repetitions

Thank you all! The code below works for me. I had missed that the input can contain multiple lines:
{F971h}[0]<0>some result code: 1\r\n{FA71h}[0]<0>some result code: 3\r\n{FB72h}[0]<0>some result code: 5
Code:
// [^\r\n]+ instead of .+ keeps the '\r' of each "\r\n" out of the message
var pat = @"{(?<timestamp>[0-9a-zA-F]+)}\[(?<subsystem>\d+)]<(?<level>\d+)>(?<message>[^\r\n]+)";
var collection = Regex.Matches(input, pat);
foreach (Match m in collection)
{
    var timestamp = m.Groups["timestamp"].Value;
    var subsystem = m.Groups["subsystem"].Value;
    var level = m.Groups["level"].Value;
    var message = m.Groups["message"].Value;
}
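One detail worth checking with \r\n line endings: in .NET, . matches every character except \n, so a message group written as .+ also captures the \r that precedes each line break. The sketch below contrasts .+ with [^\r\n]+ on the three-line input from the question; the class and method names are illustrative.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class MultiLineLogParser
{
    // Shared prefix for "{F971h}[0]<0>"; the timestamp keeps its trailing 'h' here.
    private const string Prefix =
        @"\{(?<timestamp>[0-9A-Za-z]+)\}\[(?<subsystem>\d+)\]<(?<level>\d+)>";

    // ".+" stops only at '\n', so every message except the last ends with '\r'.
    public static List<string> MessagesWithDot(string input) =>
        Regex.Matches(input, Prefix + @"(?<message>.+)")
             .Cast<Match>()
             .Select(m => m.Groups["message"].Value)
             .ToList();

    // "[^\r\n]+" stops at either line-break character.
    public static List<string> MessagesWithoutLineBreaks(string input) =>
        Regex.Matches(input, Prefix + @"(?<message>[^\r\n]+)")
             .Cast<Match>()
             .Select(m => m.Groups["message"].Value)
             .ToList();
}
```

An alternative is to keep .+ and call TrimEnd('\r') on each value, but excluding the line-break characters in the pattern keeps the regex self-describing.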

Related

C#: need a regular expression to capture the second occurrence after an underscore

I am using the regular expression below to capture all numbers/letters after an underscore, but I need to capture only the second occurrence, i.e. "00500" as you see below:
Regular expression: (?<=_)[a-zA-Z0-9]+
String:
"-rw-rw-rw- 1 rats rats 31K Sep 17 13:33 /opt/data/automation_sent/20180918/labc/0/20180918_00500.itx"
I am doing this in C#, and I thought the value would be in the second group, [1], but it is not; it only captures the string "_sent":
string temp2 = "";
Regex getValueAfterUnderscore = new Regex(@"(?<=_)[a-zA-Z0-9]+");
Match match2 = getValueAfterUnderscore.Match(line);
if (match2.Success)
{
    temp2 = match2.Groups[1].Value;
    Console.WriteLine(temp2);
}
Any ideas? Thank you!

You can use the following code, which captures the text after the last underscore (in this line, that is also the second one):
var line = "-rw-rw-rw- 1 rats rats 31K Sep 17 13:33 /opt/data/automation_sent/20180918/labc/0/20180918_00500.itx";
string temp2 = "";
Regex getValueAfterUnderscore = new Regex(@"_.+_([a-zA-Z0-9]+)");
Match match2 = getValueAfterUnderscore.Match(line);
if (match2.Success)
{
    temp2 = match2.Groups[1].Value;
    Console.WriteLine(temp2);
}
output:
00500
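Note that _.+_ is greedy: .+ runs to the end of the string and then backtracks, so group 1 follows the last underscore, which in this line happens to be the second one. If a line could contain more underscores, anchoring at the start and excluding underscores with [^_]* pins the capture to the second underscore. A sketch with illustrative names:

```csharp
using System.Text.RegularExpressions;

public static class UnderscoreCapture
{
    // Greedy: ".+" backtracks from the end, so group 1 follows the LAST underscore.
    private static readonly Regex AfterLast = new Regex(@"_.+_([a-zA-Z0-9]+)");

    // Anchored: skip non-underscore text twice, so group 1 follows the SECOND underscore.
    private static readonly Regex AfterSecond = new Regex(@"^[^_]*_[^_]*_([a-zA-Z0-9]+)");

    public static string LastUnderscoreValue(string line)
    {
        Match m = AfterLast.Match(line);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static string SecondUnderscoreValue(string line)
    {
        Match m = AfterSecond.Match(line);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

On the line from the question both methods return 00500; on a string such as a_b_c_d the first returns d and the second returns c.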

Maybe you are confusing "groups" with "matches". You should search for matches of your regular expression. Here's how to list all matches of your regex in a given string:
string str = "-rw-rw-rw- 1 rats rats 31K Sep 17 13:33 /opt/data/automation_sent/20180918/labc/0/20180918_00500.itx";
MatchCollection matches = Regex.Matches(str, @"(?<=_)[a-zA-Z0-9]+");
foreach (Match curMatch in matches)
Console.WriteLine(curMatch.Value);
For your specific case, verify if there are at least 2 matches and retrieve the value of matches[1] (which is the second match).
if (matches.Count >= 2)
Console.WriteLine($"Your result: {matches[1].Value}");
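The original code reads Groups[1], but (?<=_)[a-zA-Z0-9]+ contains no capturing parentheses, so each match has only Groups[0] (the whole match), and Groups[1].Value is an empty string. The sketch below makes the difference between groups and matches explicit; the class and method names are illustrative.

```csharp
using System.Text.RegularExpressions;

public static class NthMatch
{
    // (?<=_) only checks for a preceding underscore; the pattern has no capture groups.
    private static readonly Regex AfterUnderscore = new Regex(@"(?<=_)[a-zA-Z0-9]+");

    // Returns the value of the n-th match (1-based), or null if there are fewer matches.
    public static string ValueAfterUnderscore(string line, int n)
    {
        MatchCollection matches = AfterUnderscore.Matches(line);
        return n >= 1 && n <= matches.Count ? matches[n - 1].Value : null;
    }

    // Groups in a single match: only Groups[0], the whole match, exists.
    public static int GroupCount(string line) => AfterUnderscore.Match(line).Groups.Count;
}
```

On the line from the question, ValueAfterUnderscore(line, 1) is sent and ValueAfterUnderscore(line, 2) is 00500.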

var input = "-rw-rw-rw- 1 rats rats 31K Sep 17 13:33 /opt/data/automation_sent/20180918/labc/0/20180918_00500.itx";
Regex regex = new Regex(@"(?<Identifier1>\d+)_(?<Identifier2>\d+)");
var results = regex.Matches(input);
foreach (Match match in results)
{
    Console.WriteLine(match.Groups["Identifier1"].Value);
    Console.WriteLine(match.Groups["Identifier2"].Value); // second occurrence
}
Tested here: http://rextester.com/SIMXNS63534

If all your strings follow the pattern {SOME_STRING}_{YOUR_NUMBER}.itx, you can use this solution (without regex):
var arr = str.Split(new[] {"_", ".itx"}, StringSplitOptions.RemoveEmptyEntries);
var result = arr[arr.Length - 1];

Get number between characters in Regex

Having difficulty creating a regex.
I have this text:
"L\":0.01690502,\"C\":0.01690502,\"V\":33.76590433"
I need only the number after C\": extracted. This is what I currently have:
var regex = new Regex(@"(?<=C\\"":)\d +.\d + (?=\s *,\\)");
var test = regex.Match(content).ToString();
decimal.TryParse(test, out decimal closingPrice);

To extract the number after C\":, you can capture (\d+\.\d+) in a group (the dot is escaped so that it only matches a literal decimal point):
C\\":(\d+\.\d+)
You could also use a positive lookbehind:
(?<=C\\":)\d+\.\d+
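The question parses the result with decimal.TryParse and no culture, which uses the current culture; where the decimal separator is a comma, "0.01690502" would not parse as intended. The sketch below captures the number with an escaped dot and parses it with the invariant culture. The optional \\? is an assumption: it accepts the key both as C\": and as C":, since it is not clear from the question whether the content really contains the backslashes. PriceExtractor and TryGetClosingPrice are illustrative names.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class PriceExtractor
{
    // 'C', an optional backslash, a double quote and a colon, then digits '.' digits.
    // The dot is escaped so that it only matches a literal decimal point.
    private static readonly Regex ClosePrice = new Regex(@"C\\?"":(\d+\.\d+)");

    public static bool TryGetClosingPrice(string content, out decimal price)
    {
        price = 0m;
        Match m = ClosePrice.Match(content);
        // InvariantCulture: '.' is always the decimal separator, whatever the machine locale.
        return m.Success &&
               decimal.TryParse(m.Groups[1].Value, NumberStyles.Number,
                                CultureInfo.InvariantCulture, out price);
    }
}
```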

You can use this code to fetch all pairs of letter and number.
var regex = new Regex("(?<letter>[A-Z])[^:]+:(?<number>[^,\"]+)");
var input = "L\":0.01690502,\"C\":0.01690502,\"V\":33.76590433";
var matches = regex.Matches(input).Cast<Match>().ToArray();
foreach (var match in matches)
Console.WriteLine($"Letter: {match.Groups["letter"].Value}, number: {match.Groups["number"].Value}");
If you only need the number for the letter "C", you can use this LINQ expression:
var cNumber = matches.FirstOrDefault(m => m.Groups["letter"].Value == "C")?.Groups["number"].Value ?? "";
Regex explanation:
(?<letter>[A-Z]) // capture single letter
[^:]+ // skip all chars until ':'
: // colon
(?<number>[^,"]+) // capture all until ',' or '"'
Working demo

Fixed it with this.
var regex = new Regex("(?<=C\\\":)\\d+\\.\\d+(?=\\s*,)");
var test = regex.Match(content).ToString();

String literal to use for C#:
@"C\\"":([.0-9]*),"
If you wish to filter for valid numbers only:
@"C\\"":([0-9]+\.[0-9]+),"

Trying to get a piece of text out of HTML source with webrequest

So I am making a web request and reading the source.
In the source there is a particular string that I need to have.
The source:
RESPONSIVE.constant.user = {
id: 71723922,
name: 'Raktott',
member: false,
language: 0,
isLoggedIn: 1
};
The part that I need is name: '...', so only the text within the single quotes.
How would I accomplish this?
I have tried regular expressions, HtmlAgilityPack, etc.

Considering this is your string:
id: 71723922,
name: 'Raktott',
member: false,
language: 0,
isLoggedIn: 1
What I would do is split the string into a string array on the , delimiter.
string str = "id: 71723922,name: 'Raktott', member: false,language: 0,isLoggedIn: 1";
string[] arrstr = str.Split(',');
for (int i = 0; i < arrstr.Length; i++)
{
    if (arrstr[i].Contains("name"))
    {
        // "name: 'Raktott'" -> take the part after ':' and strip spaces and quotes
        string name = arrstr[i].Split(':')[1].Trim().Trim('\'');
        // Perform your logic here
        break;
    }
}

You can use Regex with a lazy quantifier (*?) to capture the text between the { curly braces }:
// Don't forget to escape full-stops!
Regex regex = new Regex(@"RESPONSIVE\.constant\.user = {(?<userParams>.*?)}", RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase | RegexOptions.Singleline);
Match match = regex.Match(pageSourceCode);
if (match.Success)
{
    // Split up the values using comma
    var keyValuePairs = match.Groups["userParams"].Value.Split(',');
    // Split each pair using ':' as the delimiter and clean up both sides, removing whitespace and single quote characters
    var dict = keyValuePairs
        .Select(kvp => kvp.Split(':'))
        .ToDictionary(kvp => kvp[0].Trim(), kvp => kvp[1].Trim().Trim(new char[] { '\'' }));
    // Read name
    var name = dict["name"];
}
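If only the name is needed, a single pattern can capture the text between the quotes directly. This is a sketch that assumes the value is wrapped in single quotes and contains no single quote itself; NameExtractor and GetName are illustrative names.

```csharp
using System.Text.RegularExpressions;

public static class NameExtractor
{
    // "name", a colon, optional whitespace, then everything up to the next single quote.
    private static readonly Regex NamePattern = new Regex(@"\bname:\s*'(?<name>[^']*)'");

    // Returns the quoted value, or null when no "name: '...'" entry is present.
    public static string GetName(string pageSource)
    {
        Match m = NamePattern.Match(pageSource);
        return m.Success ? m.Groups["name"].Value : null;
    }
}
```

Applied to the RESPONSIVE.constant.user block from the question, GetName returns Raktott.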

Split constantly on the last delimiter in C#

I have the following string:
string x = "hello;there;;you;;;!;";
The result I want is a list of length four with the following substrings:
"hello"
"there;"
"you;;"
"!"
In other words, how do I split on the last occurrence when the delimiter is repeating multiple times? Thanks.

You need to use a regex based split:
var s = "hello;there;;you;;;!;";
var res = Regex.Split(s, @";(?!;)").Where(m => !string.IsNullOrEmpty(m));
Console.WriteLine(string.Join(", ", res));
// => hello, there;, you;;, !
See the C# demo
The ;(?!;) regex matches any ; that is not followed with ;.
To also avoid matching a ; at the end of the string (and thus keep it attached to the last item in the resulting list) use ;(?!;|$) where $ matches the end of string (can be replaced with \z if the very end of the string should be checked for).
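A small sketch contrasting the two patterns on the input from the question (the class and method names are illustrative):

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class LastDelimiterSplit
{
    // ";(?!;)" splits on every ';' that is not followed by another ';'.
    // The final ';' produces an empty last item, which is filtered out.
    public static string[] Split(string s) =>
        Regex.Split(s, @";(?!;)").Where(part => part.Length > 0).ToArray();

    // ";(?!;|$)" also skips a ';' at the end of the string, so it stays on the last item.
    public static string[] SplitKeepingTrailing(string s) =>
        Regex.Split(s, @";(?!;|$)");
}
```

For "hello;there;;you;;;!;" the first method yields hello, there;, you;; and !, while the second yields hello, there;, you;; and !; (with no empty entry to filter).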

It seems that you don't want to remove empty entries but keep the separators.
You can use this code:
string s = "hello;there;;you;;;!;";
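MatchCollection matches = Regex.Matches(s, @"(.+?);(?!;)");
foreach (Match match in matches)
{
    // Groups[1] is the text before the delimiter; Captures[0] would be the whole match, including the final ';'
    Console.WriteLine(match.Groups[1].Value);
}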

string x = "hello;there;;you;;;!;";
var splitted = x.Split(new char[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
foreach (var s in splitted)
Console.WriteLine("{0}", s);

what regex must i use to split this?

I am very new to C#.
I want a program that, given input like this:
Input: There are 4 numbers in this string 40, 30, and 10
Output:
there = string
are = string
4 = number
numbers = string
in = string
this = string
40 = number
, = symbol
30 = number
, = symbol
and = string
10 = number
I tried this:
{
class Program
{
static void Main(string[] args)
{
string input = "There are 4 numbers in this string 40, 30, and 10.";
// Split on one or more non-digit characters.
string[] numbers = Regex.Split(input, @"(\D+)(\s+)");
foreach (string value in numbers)
{
Console.WriteLine(value);
}
}
}
}
But the output is different from what I want. Please help me, I am stuck :(

The regex parser supports an if conditional and the ability to group items into named capture groups, which I will demonstrate.
Here is an example where the pattern looks for symbols first (only a comma here; add more symbols to the set [,]), then numbers, and drops the rest into words.
string text = @"There are 4 numbers in this string 40, 30, and 10";
string pattern = @"
(?([,])   # If a comma (or any other symbol added to the set) is found, it's a symbol
(?<Symbol>[,]) # Then match the symbol
| # else its not a symbol
(?(\d+) # If a number
(?<Number>\d+) # Then match the numbers
| # else its not a number
(?<Word>[^\s]+) # So it must be a word.
)
)
";
// IgnorePatternWhitespace lets us comment the pattern; it affects only the pattern,
// not the processing of the text!
Regex.Matches(text, pattern, RegexOptions.IgnorePatternWhitespace)
.OfType<Match>()
.Select (mt =>
{
if (mt.Groups["Symbol"].Success)
return "Symbol found: " + mt.Groups["Symbol"].Value;
if (mt.Groups["Number"].Success)
return "Number found: " + mt.Groups["Number"].Value;
return "Word found: " + mt.Groups["Word"].Value;
}
)
.ToList() // Only needed to print the results with ForEach; remove otherwise
.ForEach(rs => Console.WriteLine (rs));
/* Result
Word found: There
Word found: are
Number found: 4
Word found: numbers
Word found: in
Word found: this
Word found: string
Number found: 40
Symbol found: ,
Number found: 30
Symbol found: ,
Word found: and
Number found: 10
*/
Once the regex has tokenized the text, we use LINQ to extract those tokens by identifying which named capture group succeeded. In this example, we take the successful capture group and project it into a string to print out for viewing.
I discuss the regex if conditional on my blog Regular Expressions and the If Conditional for more information.

You could split using this pattern: @"(,)\s?|\s"
This splits on a comma but preserves it, since it is inside a capturing group. The \s? matches an optional space after the comma and excludes it from the result; without it, the space after a comma would be matched separately by \s, leaving an empty string between the comma and the next value. Next, there's an alternation to split on whitespace in general.
To categorize the values, we can take the first character of the string and check for the type using the static Char methods.
string input = "There are 4 numbers in this string 40, 30, and 10";
var query = Regex.Split(input, @"(,)\s?|\s")
.Select(s => new
{
Value = s,
Type = Char.IsLetter(s[0]) ?
"String" : Char.IsDigit(s[0]) ?
"Number" : "Symbol"
});
foreach (var item in query)
{
Console.WriteLine("{0} : {1}", item.Value, item.Type);
}
To use the Regex.Matches method instead, this pattern can be used: @"\w+|,"
var query = Regex.Matches(input, @"\w+|,").Cast<Match>()
.Select(m => new
{
Value = m.Value,
Type = Char.IsLetter(m.Value[0]) ?
"String" : Char.IsDigit(m.Value[0]) ?
"Number" : "Symbol"
});

Well, to match all numbers you could use:
[\d]+
For the strings:
[a-zA-Z]+
And for some of the symbols for example
[,.?\[\]\\\/;:!\*]+
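These three classes can be combined into one alternation with named groups, so each match reports which kind of token it is. The sketch below (Tokenizer and Classify are illustrative names) produces lines in the "token = type" format from the question; unlike the question's sample output, it also lists the word string.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class Tokenizer
{
    // Numbers first, then words, then runs of the symbol characters listed above.
    private static readonly Regex TokenPattern = new Regex(
        @"(?<number>\d+)|(?<string>[a-zA-Z]+)|(?<symbol>[,.?\[\]\\\/;:!\*]+)");

    // Whichever named group succeeded decides the token type.
    public static List<string> Classify(string input) =>
        TokenPattern.Matches(input)
            .Cast<Match>()
            .Select(m => m.Value + " = " +
                (m.Groups["number"].Success ? "number"
                 : m.Groups["string"].Success ? "string"
                 : "symbol"))
            .ToList();
}
```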

You can very easily do this like so:
// Split on whitespace, keeping each comma as a separate token
string[] tokens = Regex.Split(input, @"(,)\s*|\s+");
foreach (string token in tokens)
{
    if (token.Length == 0)
        continue; // skip empty entries produced by the split
    if (Int32.TryParse(token, out _))
    {
        Console.WriteLine(token + " = number");
    }
    else if (Char.IsLetter(token[0]))
    {
        Console.WriteLine(token + " = string");
    }
    else
    {
        Console.WriteLine(token + " = symbol");
    }
}
Essentially, what you are doing is splitting the input on whitespace (keeping each comma as its own token) and then performing some comparisons to determine whether each token is a symbol, string, or number.

If you want to get the numbers
var reg = new Regex(@"\d+");
var matches = reg.Matches(input );
var numbers = matches
.Cast<Match>()
.Select(m=>Int32.Parse(m.Groups[0].Value));
To get your output:
var regSymbols = new Regex(@"(?<number>\d+)|(?<string>\w+)|(?<symbol>(,))");
var sMatches = regSymbols.Matches(input );
var symbols = sMatches
.Cast<Match>()
.Select(m=> new
{
Number = m.Groups["number"].Value,
String = m.Groups["string"].Value,
Symbol = m.Groups["symbol"].Value
})
.Select(
m => new
{
Match = !String.IsNullOrEmpty(m.Number) ?
m.Number : !String.IsNullOrEmpty(m.String)
? m.String : m.Symbol,
MatchType = !String.IsNullOrEmpty(m.Number) ?
"Number" : !String.IsNullOrEmpty(m.String)
? "String" : "Symbol"
}
);
edit
If there are more symbols than a comma, you can group them in a class, as @Bogdan Emil Mariesan did, and the regex will be:
@"(?<number>\d+)|(?<string>\w+)|(?<symbol>[,.\?!])"
edit2
To get the output lines in the "value = type" form:
var outputLines = symbols.Select(m=>
String.Format("{0} = {1}", m.Match, m.MatchType));
