I have a string containing "0,35mA". The code below splits "0,35mA" into
"0"
","
"35"
"mA"
List<string> splittedString = new List<string>();
foreach (string strItem in strList)
{
    splittedString.AddRange(Regex.Matches(strItem, @"\D+|\d+")
        .Cast<Match>()
        .Select(m => m.Value)
        .ToList());
}
What I want is for the string to be split into
"0,35"
"mA"
How do I achieve this?
It looks like you want to tokenize the string into numbers and everything else.
A better regex approach is to split with a number matching pattern while wrapping the whole pattern into a capturing group so as to also get the matching parts into the resulting array.
Since you have , as a decimal separator, you may use
var results = Regex.Split(s, @"([-+]?[0-9]*,?[0-9]+(?:[eE][-+]?[0-9]+)?)")
    .Where(x => !string.IsNullOrEmpty(x))
    .ToList();
See the regex demo:
The regex is based on the pattern described in Matching Floating Point Numbers with a Regular Expression.
The .Where(x => !string.IsNullOrEmpty(x)) is necessary to get rid of empty items (if any).
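As a minimal sketch of the split-with-a-capturing-group approach, the snippet below wraps the call in a helper and runs it on the string from the question. The helper name Tokenize is hypothetical, and the snippet assumes C# 9 or later so that top-level statements can be used.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: splits text into number tokens and everything else.
// Because the whole pattern sits inside a capturing group, Regex.Split keeps
// the matched numbers in the result instead of discarding them.
static List<string> Tokenize(string text) =>
    Regex.Split(text, @"([-+]?[0-9]*,?[0-9]+(?:[eE][-+]?[0-9]+)?)")
         .Where(part => !string.IsNullOrEmpty(part)) // drop the empty item before a leading number
         .ToList();

List<string> tokens = Tokenize("0,35mA");
Console.WriteLine(string.Join(" | ", tokens)); // prints: 0,35 | mA
```

Without the Where filter the result would start with an empty string, because the number sits at the very beginning of the input.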
I assume that all your strings will have the same format.
So, try using this regex:
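string regex = "[\\d,]{4}|\\w{2}";
It matches a run of exactly four digits or commas, or exactly two word characters, so it only works for inputs shaped like "0,35mA". (Inside a character class, | is a literal pipe, not alternation, so it is left out of the class here.)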
var st = "0,35mA";
var li = Regex.Matches(st, @"([,\d]+)([a-zA-Z]+)").Cast<Match>().ToList();
foreach (var t in li)
{
    Console.WriteLine($"Group 1 {t.Groups[1]}");
    Console.WriteLine($"Group 2 {t.Groups[2]}");
}
Group 1 0,35
Group 2 mA
Related
I'm facing a problem with Regex.Split.
Here is my pattern:
string[] words = Regex.Split(line, "[\\s,.;:/?!()\\-]+");
And this is the text file:
ir KAS gi mus nugales.
jei! mes MIRTI NEBIJOM,
JEIGU mes nugalejom mirti
DZUKAS
My task is to find the last all-uppercase word. Here is the code:
z = words.LastOrDefault(c => c.All(ch => char.IsUpper(ch)));
When a line ends with some kind of delimiter, z is simply not printed. When there is no delimiter at the end (the 3rd and 4th lines), everything works fine.
Why does it happen?
Why not match the words (not split), and take the last one?
string source = @"ir KAS gi mus nugales.
jei!mes MIRTI NEBIJOM,
JEIGU mes nugalejom mirti
DZUKAS";
// or @"\b\p{Lu}+\b", depending on which letters you want to match
string pattern = @"\b[A-Z]+\b";
string result = Regex
    .Matches(source, pattern)
    .OfType<Match>()
    .Select(match => match.Value)
    .LastOrDefault();
Edit: If I understand your requirements correctly (Regex.Split must be kept, and you have to output the last all-caps word on each line), you're looking for something like this:
var result = source
    .Split(new string[] { Environment.NewLine }, StringSplitOptions.None)
    .Select(line => Regex.Split(line, "[\\s,.;:/?!()\\-]+"))
    .Select(words => words
        .Where(word => word.Length > 0 && word.All(c => char.IsUpper(c)))
        .LastOrDefault());
// You may want to filter out lines which don't have any all-caps words:
// .Where(line => line != null);
Test
Console.Write(string.Join(Environment.NewLine, result));
Output
KAS
NEBIJOM
JEIGU
DZUKAS
Please note that .All(c => char.IsUpper(c)) is also true for the empty string, which is why we have to add the explicit word.Length > 0 check. So the problem you faced is with LINQ, not Regex: an empty string satisfies the .All(...) condition.
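The effect can be reproduced in isolation. The sketch below is only an illustration, assuming C# 9 or later for top-level statements. It splits one line from the question that ends with a delimiter and compares the lookup with and without the length guard.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// A trailing delimiter makes Regex.Split emit an empty string as the last item.
string[] words = Regex.Split("jei! mes MIRTI NEBIJOM,", "[\\s,.;:/?!()\\-]+");
Console.WriteLine(words.Length);         // 5: "jei", "mes", "MIRTI", "NEBIJOM", ""

// All(...) is vacuously true for an empty sequence, so "" passes the check.
Console.WriteLine("".All(char.IsUpper)); // True

var naive = words.LastOrDefault(w => w.All(char.IsUpper));
var guarded = words.LastOrDefault(w => w.Length > 0 && w.All(char.IsUpper));
Console.WriteLine($"naive='{naive}', guarded='{guarded}'"); // naive='', guarded='NEBIJOM'
```

The unguarded lookup returns the empty string, which is why nothing visible is printed for lines that end with a delimiter.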
using System;
using System.Text.RegularExpressions;

namespace ConsoleApp
{
    class Program
    {
        static void Main()
        {
            string s = @"ir KAS gi mus nugales.
jei!mes MIRTI NEBIJOM,
JEIGU mes nugalejom mirti
DZUKAS";
            // RightToLeft starts searching at the end, so the first match is the last uppercase run.
            Match result = Regex.Match(s, "([A-Z]+)", RegexOptions.RightToLeft);
            Console.WriteLine(result.Value);
            Console.ReadKey();
        }
    }
}
From the question and comments it's hard to tell what you want, so I'll try to cover both cases.
If you're looking for the last uppercase word in the whole text, you can do something like this:
Regex r = new Regex("[,.;:/?!()\\-]+", RegexOptions.Multiline);
// Split on spaces and line breaks so that words on adjacent lines are not glued together.
string result = r.Replace(source, string.Empty)
    .Split(new[] { ' ', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
    .LastOrDefault(word => word.All(c => char.IsUpper(c)));
If you want the last match from each line:
Regex r = new Regex("[,.;:/?!()\\-]+", RegexOptions.Multiline);
string[] result = r.Replace(source, string.Empty)
    .Split(new[] { Environment.NewLine }, StringSplitOptions.None)
    .Select(line => line.Split(' ').LastOrDefault(word => word.Length > 0 && word.All(c => char.IsUpper(c))))
    .ToArray();
I have the following string:
string x = "hello;there;;you;;;!;";
The result I want is a list of length four with the following substrings:
"hello"
"there;"
"you;;"
"!"
In other words, how do I split on the last occurrence when the delimiter is repeating multiple times? Thanks.
You need to use a regex based split:
var s = "hello;there;;you;;;!;";
var res = Regex.Split(s, @";(?!;)").Where(m => !string.IsNullOrEmpty(m));
Console.WriteLine(string.Join(", ", res));
// => hello, there;, you;;, !
See the C# demo
The ;(?!;) regex matches any ; that is not followed with ;.
To also avoid matching a ; at the end of the string (and thus keep it attached to the last item in the resulting list) use ;(?!;|$) where $ matches the end of string (can be replaced with \z if the very end of the string should be checked for).
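The difference between the two patterns is easiest to see side by side. This is a minimal sketch using the string from the question, assuming C# 9 or later for top-level statements. The variable names are made up for the illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "hello;there;;you;;;!;";

// Split on a ';' that is not followed by another ';'. The final ';' is also a
// split point, which leaves a trailing empty item that is filtered out here.
string[] basic = Regex.Split(input, @";(?!;)").Where(p => p.Length > 0).ToArray();

// Same idea, but a ';' at the end of the string is no longer a split point,
// so it stays attached to the last item.
string[] keepLast = Regex.Split(input, @";(?!;|$)");

Console.WriteLine(string.Join(", ", basic));    // hello, there;, you;;, !
Console.WriteLine(string.Join(", ", keepLast)); // hello, there;, you;;, !;
```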
It seems that you don't want to remove empty entries but keep the separators.
You can use this code:
string s = "hello;there;;you;;;!;";
MatchCollection matches = Regex.Matches(s, @"(.+?);(?!;)");
foreach(Match match in matches)
{
Console.WriteLine(match.Captures[0].Value);
}
string x = "hello;there;;you;;;!;";
var splitted = x.Split(new char[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
foreach (var s in splitted)
    Console.WriteLine("{0}", s);
I want to get the value of gs from the query below.
(2|3|4|5|6|7|8|9|10|11|gs=accountinga sdf* |gs=tax*|12|ic='38')
I have tried the pattern below:
(?<=gs=)(.*)([|])
But this returns gs=accounting asdf* |gs=tax*|12|
The desired output is: accounting asdf*,tax*
Is that possible with a change to the pattern?
This regex will match as you want.
(?<=gs=)([^|)]*)
It will also handle the case where gs is the last clause without including the closing bracket in the group.
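For illustration, here is one way the lookbehind pattern could be applied in C#. This is a sketch, assuming C# 9 or later for top-level statements. The input is copied from the question as-is, and the Trim call is an addition that removes the space before the | separator.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string query = "(2|3|4|5|6|7|8|9|10|11|gs=accountinga sdf* |gs=tax*|12|ic='38')";

// (?<=gs=) requires "gs=" immediately before the match without consuming it;
// [^|)]* then takes everything up to the next '|' or ')'.
var values = Regex.Matches(query, @"(?<=gs=)([^|)]*)")
    .Cast<Match>()
    .Select(m => m.Value.Trim())
    .ToList();

Console.WriteLine(string.Join(",", values)); // accountinga sdf*,tax*
```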
I suggest 2 solutions, see their online demo.
One is regex-based:
var x = "(2|3|4|5|6|7|8|9|10|11|gs=accountinga sdf* |gs=tax*|12|ic='38')";
var result = Regex.Matches(x, @"(?:^|\|)gs=([^|]*)")
    .Cast<Match>()
    .Select(p => p.Groups[1].Value)
    .ToList();
foreach (var s in result)
Console.WriteLine(s);
NOTE that the (?:^|\|)gs=([^|]*) pattern will only match gs= at the string beginning or after |, and then ([^|]*) will capture zero or more chars other than | into Group 1 that you will collect later with Select. See the regex demo.
Or a non-regex based, just split with |, check if the item starts with gs=, and then split with = to get the last part:
var res2 = x.Split('|')
.Where(p => p.StartsWith("gs="))
.Select(n => n.Split('=').LastOrDefault())
.ToList();
foreach (var t in res2)
Console.WriteLine(t);
I want to extract the runs of whitespace that are more than one space long.
The following gives me the empty strings between each letter, as well as the whitespace. However, I only want to extract the two-space string between c and d, and the three-space string between f and g.
string b = "ab c  def   gh";
List<string> c = Regex.Split(b, @"[^\s]").ToList();
UPDATE:
The following works, but I'm looking for a more elegant way of achieving this:
c.RemoveAll(x => x == "" || x == " ");
The desired result would be a List<string> containing "  " and "   "
If you want a List<String> as the result, you could execute this LINQ query:
string b = "ab c  def   gh";
List<String> c = Regex
    .Matches(b, @"\s{2,}")
    .OfType<Match>()
    .Select(match => match.Value)
    .ToList();
This should give you your desired List.
string b = "ab c  def   gh";
var regex = new Regex(@"\s\s+");
var result = new List<string>();
foreach (Match m in regex.Matches(b))
    result.Add(m.Value);
If all you are interested in are these groups of whitespaces, you could use
foreach (Match match in Regex.Matches(b, @"\s\s+")) {
    // ... do something with match
}
This guarantees that you will match at least 2 whitespaces.
Rather than splitting using a Regex, try using Regex.Matches to get all items matching your pattern - in this case I've used a pattern to match two or more whitespace characters, which I think is what you want?
var matchValues = Regex.Matches("ab c  def   gh", "\\s\\s+")
    .OfType<Match>().Select(m => m.Value).ToList();
Annoyingly, the MatchCollection returned by Regex.Matches isn't IEnumerable<Match> in .NET Framework (newer .NET versions do implement it), hence the need to use OfType<> in the LINQ expression.
You can use the following single line:
var list = Regex.Matches(value, @"[ ]{2,}").Cast<Match>().Select(match => match.Value).ToList();
Hope it will help you.
I have a search query:
string input = "FirstName=\"xy z\" LastName=\"Huber\"";
I would like to use Regex to split it.
I would like to have a string array with the following tokens:
FirstName=\"xy z\"
LastName=\"Huber\"
As you can see, the tokens preserve the spaces within double quotes.
My regex
("[^"]+"|\w+)\s*
is not quite what I want and needs more fixing. It gets:
FirstName= \"xy z\" LastName = \"Huber\"
Altering I4V's answer to match the OP's requirements.
It seems the OP wants the strings FirstName=\"xy z\" and LastName=\"Huber\" rather than key-value pairs, so the solution is simply to use the matches from I4V's regex.
string input = "FirstName=\"xy z\" LastName=\"Huber\"";
var matches = Regex.Matches(input, @"(\w+?)=\""(.+?)\""")
    .OfType<Match>()
    .Select(x => x.Value)
    .ToArray();
This will give you a string array of the values.
EDIT For the specific case asked by OP
string input = "FirstName=\"xy z\" LastName=\"Huber\"";
var matches = Regex.Matches(input, @"[^\s\""]+(?:\"".*?\"")?")
    .OfType<Match>()
    .Select(x => x.Value)
    .ToArray();