What regex must I use to split this? - C#

I am very new to C#.
I want a program that, given input like this, produces the output below:
input : There are 4 numbers in this string 40, 30, and 10
output :
there = string
are = string
4 = number
numbers = string
in = string
this = string
40 = number
, = symbol
30 = number
, = symbol
and = string
10 = number
This is what I tried:
{
class Program
{
static void Main(string[] args)
{
string input = "There are 4 numbers in this string 40, 30, and 10.";
// Split on one or more non-digit characters.
string[] numbers = Regex.Split(input, @"(\D+)(\s+)");
foreach (string value in numbers)
{
Console.WriteLine(value);
}
}
}
}
But the output is different from what I want. Please help me, I am stuck.

The regex parser has an if conditional and the ability to group items into named capture groups, both of which I will demonstrate.
Here is an example where the pattern looks for symbols first (only a comma here; add more symbols to the set [,] as needed), then numbers, and drops the rest into words.
string text = @"There are 4 numbers in this string 40, 30, and 10";
string pattern = @"
(?([,]) # If a comma (or other then add it) is found its a symbol
(?<Symbol>[,]) # Then match the symbol
| # else its not a symbol
(?(\d+) # If a number
(?<Number>\d+) # Then match the numbers
| # else its not a number
(?<Word>[^\s]+) # So it must be a word.
)
)
";
// Ignore pattern white space allows us to comment the pattern only, does not affect
// the processing of the text!
Regex.Matches(text, pattern, RegexOptions.IgnorePatternWhitespace)
.OfType<Match>()
.Select (mt =>
{
if (mt.Groups["Symbol"].Success)
return "Symbol found: " + mt.Groups["Symbol"].Value;
if (mt.Groups["Number"].Success)
return "Number found: " + mt.Groups["Number"].Value;
return "Word found: " + mt.Groups["Word"].Value;
}
)
.ToList() // To show the result only remove
.ForEach(rs => Console.WriteLine (rs));
/* Result
Word found: There
Word found: are
Number found: 4
Word found: numbers
Word found: in
Word found: this
Word found: string
Number found: 40
Symbol found: ,
Number found: 30
Symbol found: ,
Word found: and
Number found: 10
*/
Once the regex has tokenized the input into matches, we use LINQ to extract those tokens by checking which named capture group succeeded. In this example we take the successful capture group and project it into a string to print for viewing.
I discuss the regex if conditional on my blog Regular Expressions and the If Conditional for more information.

You could split using this pattern: @"(,)\s?|\s"
This splits on a comma, but preserves it since it is within a group. The \s? serves to match an optional space but excludes it from the result. Without it, the split would include the space that occurred after a comma. Next, there's an alternation to split on whitespace in general.
To categorize the values, we can take the first character of the string and check for the type using the static Char methods.
string input = "There are 4 numbers in this string 40, 30, and 10";
var query = Regex.Split(input, @"(,)\s?|\s")
.Select(s => new
{
Value = s,
Type = Char.IsLetter(s[0]) ?
"String" : Char.IsDigit(s[0]) ?
"Number" : "Symbol"
});
foreach (var item in query)
{
Console.WriteLine("{0} : {1}", item.Value, item.Type);
}
To use the Regex.Matches method instead, this pattern can be used: @"\w+|,"
var query = Regex.Matches(input, @"\w+|,").Cast<Match>()
.Select(m => new
{
Value = m.Value,
Type = Char.IsLetter(m.Value[0]) ?
"String" : Char.IsDigit(m.Value[0]) ?
"Number" : "Symbol"
});
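As a minimal sketch, the Regex.Matches variant above can be wrapped in a small program that prints tokens in the "value = type" form the question asks for. The class and method names (TokenClassifier, Classify) are made up for illustration. Note the assumption that only the first character decides the type, so a token such as "4abc" would be reported as a number and a leading underscore as a symbol.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenClassifier
{
    // Returns one "value = type" line per token, in input order.
    public static List<string> Classify(string input)
    {
        var lines = new List<string>();
        // \w+ matches a run of word characters; the alternative matches a lone comma.
        foreach (Match m in Regex.Matches(input, @"\w+|,"))
        {
            char first = m.Value[0];
            string type = char.IsLetter(first) ? "string"
                        : char.IsDigit(first) ? "number"
                        : "symbol";
            lines.Add(m.Value + " = " + type);
        }
        return lines;
    }

    public static void Main()
    {
        string input = "There are 4 numbers in this string 40, 30, and 10";
        foreach (string line in Classify(input))
        {
            Console.WriteLine(line);
        }
    }
}
```

Returning a list instead of printing directly keeps the classification separate from the output format, which makes it easier to change either one.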

Well to match all numbers you could do:
[\d]+
For the strings:
[a-zA-Z]+
And for some of the symbols for example
[,.?\[\]\\\/;:!\*]+

You can very easily do this like so:
string[] tokens = Regex.Split(input, " ");
foreach (string token in tokens)
{
    if (token.Length == 0)
    {
        continue; // consecutive spaces produce empty tokens
    }
    if (int.TryParse(token, out _))
    {
        Console.WriteLine(token + " = number");
    }
    else if (token.Length == 1 && !char.IsLetterOrDigit(token[0]))
    {
        Console.WriteLine(token + " = symbol");
    }
    else
    {
        // Note: a token such as "40," (number plus trailing comma) ends up here.
        Console.WriteLine(token + " = string");
    }
}
I do not have an IDE handy to test that this compiles. Essentially what you are doing is splitting the input on spaces and then performing some comparisons to determine whether each token is a symbol, string, or number.

If you want to get the numbers
var reg = new Regex(@"\d+");
var matches = reg.Matches(input );
var numbers = matches
.Cast<Match>()
.Select(m=>Int32.Parse(m.Groups[0].Value));
To get your output:
var regSymbols = new Regex(@"(?<number>\d+)|(?<string>\w+)|(?<symbol>(,))");
var sMatches = regSymbols.Matches(input );
var symbols = sMatches
.Cast<Match>()
.Select(m=> new
{
Number = m.Groups["number"].Value,
String = m.Groups["string"].Value,
Symbol = m.Groups["symbol"].Value
})
.Select(
m => new
{
Match = !String.IsNullOrEmpty(m.Number) ?
m.Number : !String.IsNullOrEmpty(m.String)
? m.String : m.Symbol,
MatchType = !String.IsNullOrEmpty(m.Number) ?
"Number" : !String.IsNullOrEmpty(m.String)
? "String" : "Symbol"
}
);
edit
If there are more symbols than a comma, you can group them in a character class, as @Bogdan Emil Mariesan did, and the regex will be:
@"(?<number>\d+)|(?<string>\w+)|(?<symbol>[,.\?!])"
edit2
To get the strings with =
var outputLines = symbols.Select(m=>
String.Format("{0} = {1}", m.Match, m.MatchType));

Related

Regex to find all placeholder occurrences in text

I'm struggling to create a regex that finds all placeholder occurrences in a given text. Placeholders will have the following format:
[{PRE.Word1.Word2}]
Rules:
Delimited by "[{PRE." and "}]" ("PRE" upper case)
2 words (at least 1 char long each) separated by a dot. All chars valid on each word apart from newline.
word1: min 1 char, max 15 chars
word2: min 1 char, max 64 chars
word1 cannot have dots, if there are more than 2 dots inside placeholder extra ones will be part of word2. If less than 2 dots, placeholder is invalid.
Looking to get all valid placeholders regardless of what the 2 words are.
I'm not being lazy; I spent a horrible amount of time building the rule on regexr.com, but was unable to satisfy all these rules at once.
Looking forward to checking your suggestions.
The closest I've got to was the below, and any attempt to expand on that breaks all valid matches.
\[\{OEP\.*\.*\}\]
Much appreciated!
Sample text where Regex should find matches:
Random text here
[{Test}] -- NO MATCH
[{PRE.TestTest3}] --NO MATCH
[{PRE.TooLong.12345678901234567890}] --NO MATCH
[{PRE.Address.Country}] --MATCH
[{PRE.Version.1.0}] --MATCH
Random text here
You can use
\[{PRE\.([^][{}.]{1,15})\.(.{1,64}?)}]
See the regex demo
Details
\[{ - a [{ string
PRE\. - PRE. text
([^][{}.]{1,15}) - Group 1: any one to fifteen chars other than [, ], {, } and .
\. - a dot
(.{1,64}?) - any one to 64 chars other than line break chars as few as possible
}] - a }] text.
If you need to get all matches in C#, you can use
var pattern = @"\[{PRE\.([^][{}.]{1,15})\.(.{1,64}?)}]";
var matches = Regex.Matches(text, pattern);
See this C# demo:
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
public class Test
{
public static void Main()
{
var text = "[{PRE.Word1.Word2}] and [{PRE.Word 3.Word..... 2 %%%}]";
var pattern = @"\[{PRE\.([^][{}.]{1,15})\.(.{1,64}?)}]";
var matches = Regex.Matches(text, pattern);
var props = new List<Property>();
foreach (Match m in matches)
props.Add(new Property(m.Groups[1].Value,m.Groups[2].Value));
foreach (var item in props)
Console.WriteLine("Word1 = " + item.Word1 + ", Word2 = " + item.Word2);
}
public class Property
{
public string Word1 { get; set; }
public string Word2 { get; set; }
public Property()
{}
public Property(string w1, string w2)
{
this.Word1 = w1;
this.Word2 = w2;
}
}
}
Output:
Word1 = Word1, Word2 = Word2
Word1 = Word 3, Word2 = Word..... 2 %%%
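To see how the pattern behaves on invalid placeholders as well as valid ones, here is a small sketch. The class and method names (PlaceholderFinder, Find) and the sample text are assumptions for illustration; only the pattern itself comes from the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PlaceholderFinder
{
    // Pattern from the answer: word1 is 1-15 chars without dots or brackets,
    // word2 is 1-64 chars of anything except line breaks, matched lazily.
    private static readonly Regex Pattern =
        new Regex(@"\[{PRE\.([^][{}.]{1,15})\.(.{1,64}?)}]");

    // Returns "word1 | word2" for every valid placeholder found in the text.
    public static List<string> Find(string text)
    {
        var found = new List<string>();
        foreach (Match m in Pattern.Matches(text))
        {
            found.Add(m.Groups[1].Value + " | " + m.Groups[2].Value);
        }
        return found;
    }

    public static void Main()
    {
        // Hypothetical sample: the first two placeholders break the rules
        // (wrong prefix, missing second word); the last two are valid.
        string text = "[{Test}] [{PRE.NoSecondWord}] [{PRE.Address.Country}] [{PRE.Version.1.0}]";
        foreach (string item in Find(text))
        {
            Console.WriteLine(item);
        }
    }
}
```

Because word1 excludes dots, any extra dots inside the placeholder fall into word2, which is what the rules in the question ask for.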
string input = "[{PRE.Word1.Word2}]";
// language=regex
string pattern = @"\[{ PRE \. (?'group1' .{1,15}? ) \. (?'group2' .{1,64}? ) }]";
var match = Regex.Match(input, pattern, RegexOptions.IgnorePatternWhitespace);
Console.WriteLine(match.Groups["group1"].Value);
Console.WriteLine(match.Groups["group2"].Value);

How do I Split the string into two separate variables in C#

I have a string that I want to store in two different variables in C#.
s= "Name=team1; ObjectGUID=d8fd5125-b065-48cb-b5f3-c20f509b7476"
I want Var1 = team1 & Var2 = d8fd5125-b065-48cb-b5f3-c20f509b7476
Here's what I am trying to do:
var1 = s.Replace("Name=","").Replace("; ObjectGUID=", "");
But I am not able to figure out how to bifurcate the Name value to var1 and eliminate the rest. And it is possible that the value of 'Name' could vary so I can't fix the length to chop off.
You could use a regex where the value of Name could be captured in group 1 matching not a ; using a negated character class.
The value of ObjectGUID could be captured in group 2 using a repeated pattern matching 1+ times a digit 0-9 or characters a-f. Then repeat that pattern 1+ times preceded with a -
Name=([^;]+); ObjectGUID=([a-f0-9]+(?:-[a-f0-9]+)+)
.NET regex demo | C# demo
For example:
string pattern = @"Name=([^;]+); ObjectGUID=([a-f0-9]+(?:-[a-f0-9]+)+)";
string s= "Name=team1; ObjectGUID=d8fd5125-b065-48cb-b5f3-c20f509b7476";
Match m = Regex.Match(s, pattern);
string var1 = m.Groups[1].Value;
string var2 = m.Groups[2].Value;
Console.WriteLine(var1);
Console.WriteLine(var2);
Result
team1
d8fd5125-b065-48cb-b5f3-c20f509b7476
Split by ';' then split by '='. Also works for any key/value pairs such as the ones in connection strings.
var values = s.Split(';').Select(kv => kv.Split('=')[1]).ToArray();
var var1 = values[0];
var var2 = values[1];
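A slightly more defensive sketch of the same split-based idea reads the pairs into a dictionary, so values can be looked up by key instead of by position. The names KeyValueParser and Parse are hypothetical, and the sketch assumes keys are unique (ToDictionary throws on duplicates) and that values may themselves contain '='.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeyValueParser
{
    // Splits "key=value; key=value" text into a dictionary.
    public static Dictionary<string, string> Parse(string s)
    {
        return s.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
                // Limit to 2 parts so an '=' inside the value is preserved.
                .Select(pair => pair.Split(new[] { '=' }, 2))
                // Skip fragments that have no '=' at all.
                .Where(kv => kv.Length == 2)
                .ToDictionary(kv => kv[0].Trim(), kv => kv[1].Trim());
    }

    public static void Main()
    {
        string s = "Name=team1; ObjectGUID=d8fd5125-b065-48cb-b5f3-c20f509b7476";
        Dictionary<string, string> values = Parse(s);
        Console.WriteLine(values["Name"]);
        Console.WriteLine(values["ObjectGUID"]);
    }
}
```

Trimming the keys matters here because the separator in the sample is "; " with a trailing space, which would otherwise leave the second key as " ObjectGUID".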
You can use IndexOf to locate the separator and Substring to take the value that follows it.
using System;
public class SubStringTest {
public static void Main() {
string [] info = { "Name: Felica Walker", "Title: Mz.",
"Age: 47", "Location: Paris", "Gender: F"};
int found = 0;
Console.WriteLine("The initial values in the array are:");
foreach (string s in info)
Console.WriteLine(s);
Console.WriteLine("\nWe want to retrieve only the key information. That is:");
foreach (string s in info) {
found = s.IndexOf(": ");
Console.WriteLine(" {0}", s.Substring(found + 2));
}
}
}
The example displays the following output:
The initial values in the array are:
Name: Felica Walker
Title: Mz.
Age: 47
Location: Paris
Gender: F
We want to retrieve only the key information. That is:
Felica Walker
Mz.
47
Paris
F

Get number between characters in Regex

Having difficulty creating a regex.
I have this text:
"L\":0.01690502,\"C\":0.01690502,\"V\":33.76590433"
I need only the number after C\": extracted, this is what I currently have.
var regex = new Regex(@"(?<=C\\"":)\d +.\d + (?=\s *,\\)");
var test = regex.Match(content).ToString();
decimal.TryParse(test, out decimal closingPrice);
To extract the number after C\":, you can capture (\d+\.\d+) in a group (the dot is escaped so it matches only a literal decimal point):
C\\":(\d+\.\d+)
You could also use a positive lookbehind:
(?<=C\\":)\d+\.\d+
You can use this code to fetch all pairs of letter and number.
var regex = new Regex("(?<letter>[A-Z])[^:]+:(?<number>[^,\"]+)");
var input = "L\":0.01690502,\"C\":0.01690502,\"V\":33.76590433";
var matches = regex.Matches(input).Cast<Match>().ToArray();
foreach (var match in matches)
Console.WriteLine($"Letter: {match.Groups["letter"].Value}, number: {match.Groups["number"].Value}");
If you only need the number for the letter "C", you can use this LINQ expression:
var cNumber = matches.FirstOrDefault(m => m.Groups["letter"].Value == "C")?.Groups["number"].Value ?? "";
Regex explanation:
(?<letter>[A-Z]) // capture single letter
[^:]+ // skip all chars until ':'
: // colon
(?<number>[^,"]+) // capture all until ',' or '"'
Working demo
Fixed it with this.
var regex = new Regex("(?<=C\\\":)\\d+.\\d+(?=\\s*,)");
var test = regex.Match(content).ToString();
String literal to use for C#:
@"C\\"":([.0-9]*),"
If you wish to match only valid numbers:
@"C\\"":([0-9]+\.[0-9]+),"
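Putting the capture-group approach together with the decimal parsing from the question, a sketch might look like the following. The names ClosingPriceParser and ExtractC are made up. The backslash before the quote is made optional here as an assumption, since it is unclear whether the real input contains literal backslashes or only escaped quotes. Parsing uses the invariant culture so that "." is always treated as the decimal separator.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class ClosingPriceParser
{
    // Returns the number that follows C": or C\": , or null when it is absent.
    public static decimal? ExtractC(string content)
    {
        // C, an optional literal backslash, a quote, a colon, then digits.digits.
        Match m = Regex.Match(content, @"C\\?"":(\d+\.\d+)");
        if (m.Success &&
            decimal.TryParse(m.Groups[1].Value, NumberStyles.Number,
                             CultureInfo.InvariantCulture, out decimal value))
        {
            return value;
        }
        return null;
    }

    public static void Main()
    {
        // Verbatim string: each "" is one quote, each \ is a literal backslash.
        string content = @"L\"":0.01690502,\""C\"":0.01690502,\""V\"":33.76590433";
        decimal? closingPrice = ExtractC(content);
        Console.WriteLine(closingPrice.HasValue ? "found" : "not found");
    }
}
```

Returning a nullable decimal makes the "no match" case explicit instead of silently leaving the price at zero.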

Splitting a string at first number and then returning 2 strings

Having some trouble adapting my splitting of a string into 2 parts to do it from the first number. It's currently splitting on the first space, but that won't work long term because cities have spaces in them too.
Current code:
var parameters = "Chicago 1234 Anytown, NY";
var commands = parameters.Split(new[] { ' ' }, 2);
var originCity = commands[0];
var destination = commands[1];
This works great for a city that has a single name, but I break on:
var parameters = "Los Angeles 1234 Anytown, NY";
I've tried several different approaches that I just haven't been able to work out. Any ideas on being able to return 2 strings as the following:
originCity = Los Angeles
destination = 1234 Anytown, NY
You can't use .Split() for this.
Instead, you need to find the index of the first digit. You can use .IndexOfAny() with an array of digit characters (a char[]) to do this.
int numberIndex = line.IndexOfAny("0123456789".ToCharArray());
You can then capture two substrings; One before the index, the other after.
string before = line.Substring(0, numberIndex);
string after = line.Substring(numberIndex);
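As a complete sketch of the IndexOfAny approach, the following handles the case where no digit is present (IndexOfAny returns -1, which would make Substring throw). The names AddressSplitter and SplitAtFirstDigit are hypothetical, and trimming the city is an assumption based on the output the question asks for.

```csharp
using System;

public static class AddressSplitter
{
    private static readonly char[] Digits = "0123456789".ToCharArray();

    // Splits the line at its first digit; the digit stays with the second part.
    public static (string City, string Destination) SplitAtFirstDigit(string line)
    {
        int index = line.IndexOfAny(Digits);
        if (index < 0)
        {
            // No digit found: treat the whole line as the city.
            return (line.Trim(), string.Empty);
        }
        // Trim removes the space that separates the city from the number.
        return (line.Substring(0, index).Trim(), line.Substring(index));
    }

    public static void Main()
    {
        var parts = SplitAtFirstDigit("Los Angeles 1234 Anytown, NY");
        Console.WriteLine("originCity = " + parts.City);
        Console.WriteLine("destination = " + parts.Destination);
    }
}
```

A city whose name contains a digit would still be split in the wrong place, so this only fits data where the first digit really marks the start of the destination.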
You could use Regex. In the following, match is the first match in the regex results.
var match = Regex.Match(s, "[0-9]");
if (match.Success)
{
int index = match.Index;
originCity = s.Substring(0, index);
destination = s.Substring(index, s.Length - index);
}
Or you can do it yourself:
int index = -1; // stays -1 when the string contains no digit
for (int i = 0; i < s.Length; i++)
{
    if (char.IsDigit(s[i]))
    {
        index = i; // position of the first digit
        break;
    }
}
...
You should see if using a regular expression will do what you need here. At least with the sample data you're showing, the expression:
(\D+)(\d+)(\D+)
would group the results into non-numeric characters up to the first numeric character, the numeric characters until a non-numeric is encountered, and then the rest of the non-numeric characters. Here is how it would be used in code:
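var pattern = @"(\D+)(\d+)(\D+)";
var input = "Los Angeles 1234 Anytown, NY";
var result = Regex.Match(input, pattern);
var city = result.Groups[1].Value.Trim(); // group 1 ends with the space before the number
var destination = result.Groups[2].Value + result.Groups[3].Value; // group 3 already starts with a space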
This falls apart in cases like "29 Palms, California", or if the numbers contain commas, decimal points, etc., so it is certainly not a silver bullet. But I don't know your data, and such a simple solution may be good enough.

C# regular expression

I have string like this:
{F971h}[0]<0>some result code: 1
and I want to split it into:
F971
0
0
some result code: 1
I know I can first split "{|}|[|]|<|>" it into:
{F971h}
[0]
<0>
some result code: 1
and next: {F971h} -> F971; [0] -> 0; etc.
But how can I do it with one regular expression?
I tried something like this:
Regex rgx = new Regex(@"(?<timestamp>[0-9A-F]+)" + @"(?<subsystem>\d+)" + @"(?<level>\d+)" + @"(?<messagep>[0-9A-Za-z]+)");
var result = rgx.Matches(input);
You can try just Split without any regular expressions:
string source = "{F971h}[0]<0>some result code: 1";
string[] items = source.Split(new char[] { '{', '}', '[', ']', '<', '>' },
StringSplitOptions.RemoveEmptyEntries);
Test:
// F971h
// 0
// 0
// some result code: 1
Console.Write(String.Join(Environment.NewLine, items));
There are two issues with your regex:
You do not allow lowercase ASCII letters in the first capture group (add a-z or a RegexOptions.IgnoreCase flag)
The delimiting characters are missing in the pattern (<, >, [, ], etc.)
Use
{(?<timestamp>[0-9a-zA-F]+)}\[(?<subsystem>\d+)]<(?<level>\d+)>(?<messagep>.+)
^ ^^^ ^^^ ^^ ^
See the regex demo
Since the messagep group should match just the rest of the line, I suggest just using .+ at the end. Else, you'd need to replace your [0-9A-Za-z]+ that does not allow whitespace with something like [\w\s]+ (match all word chars and whitespaces, 1 or more times).
C# code:
var s = @"{F971h}[0]<0>some result code: 1";
var pat = @"{(?<timestamp>[0-9a-zA-F]+)}\[(?<subsystem>\d+)]<(?<level>\d+)>(?<messagep>.+)";
var m = Regex.Match(s, pat);
if (m.Success)
{
Console.Out.WriteLine(m.Groups["timestamp"].Value);
Console.Out.WriteLine(m.Groups["subsystem"].Value);
Console.Out.WriteLine(m.Groups["level"].Value);
Console.Out.WriteLine(m.Groups["messagep"].Value);
}
Or for a multiline string containing multiple matches:
var s = "{F971h}[0]<0>some result code: 1\r\n{FA71h}[0]<0>some result code: 3\r\n{FB72h}[0]<0>some result code: 5";
var pat = @"{(?<timestamp>[0-9a-zA-F]+)}\[(?<subsystem>\d+)]<(?<level>\d+)>(?<messagep>[^\r\n]+)";
var res = System.Text.RegularExpressions.Regex.Matches(s, pat)
.Cast<System.Text.RegularExpressions.Match>()
.Select(x => new[] {
x.Groups["timestamp"].Value,
x.Groups["subsystem"].Value,
x.Groups["level"].Value,
x.Groups["messagep"].Value})
.ToList();
You can get it like that:
string line = @"{F971h}[0]<0>some result code: 1";
var matchCollection = Regex.Matches(line, @"\{(?<timestamp>.*?)\}\[(?<subsystem>.*?)\]<(?<level>.*?)>(?<messagep>.*)");
if (matchCollection.Count > 0)
{
string timestamp = matchCollection[0].Groups["timestamp"].Value;
string subsystem = matchCollection[0].Groups["subsystem"].Value;
string level = matchCollection[0].Groups["level"].Value;
string messagep = matchCollection[0].Groups["messagep"].Value;
Console.Out.WriteLine("First part is {0}, second: {1}, third: {2}, last: {3}", timestamp, subsystem, level, messagep);
}
else
{
Console.Out.WriteLine("No match found.");
}
You can watch it live here on regex storm. You'll have to learn about:
Named capture groups
Repetitions
Thank you all! The code below works for me. I missed that the input can contain multiple lines:
{F971h}[0]<0>some result code: 1\r\n{FA71h}[0]<0>some result code: 3\r\n{FB72h}[0]<0>some result code: 5
code:
var pat = @"{(?<timestamp>[0-9a-zA-F]+)}\[(?<subsystem>\d+)]<(?<level>\d+)>(?<message>.+)";
var collection = Regex.Matches(input, pat);
foreach (Match m in collection)
{
var timestamp = m.Groups["timestamp"];
var subsystem = m.Groups["subsystem"];
var level = m.Groups["level"];
var message = m.Groups["message"];
}
