Extract multiple white spaces from string

I want to extract the runs of white space that are longer than one character.
The following gets me the empty strings between each letter as well as the white spaces. However, I only want to extract the two-space string between c and d and the three-space string between f and g.
string b = "ab c  def   gh";
List<string> c = Regex.Split(b, @"[^\s]").ToList();
UPDATE:
The following works, but I'm looking for a more elegant way of achieving this:
c.RemoveAll(x => x == "" || x == " ");
The desired result would be a List<string> containing "  " and "   ".

If you want a List<string> as the result, you could run this LINQ query:
string b = "ab c  def   gh";
List<string> c = Regex
.Matches(b, @"\s{2,}")
.OfType<Match>()
.Select(match => match.Value)
.ToList();

This should give you your desired List.
string b = "ab c  def   gh";
var regex = new Regex(@"\s\s+");
var result = new List<string>();
foreach (Match m in regex.Matches(b))
result.Add(m.Value);

If all you are interested in are these groups of whitespace, you could use
foreach (Match match in Regex.Matches(b, @"\s\s+")) {
// ... do something with match
}
This guarantees that every match is at least two whitespace characters long.

Rather than splitting using a Regex, try using Regex.Matches to get all items matching your pattern - in this case I've used a pattern to match two or more whitespace characters, which I think is what you want?
var matchValues = Regex.Matches("ab c  def   gh", "\\s\\s+")
.OfType<Match>().Select(m => m.Value).ToList();
Annoyingly, on .NET Framework the MatchCollection returned by Regex.Matches only implements the non-generic IEnumerable rather than IEnumerable<Match>, hence the need for OfType<Match>() in the LINQ expression.
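A minimal sketch tying the answers above together, assuming the input has two spaces between c and d and three between f and g. The helper name ExtractWhitespaceRuns is hypothetical and only used for illustration; it is written with top-level statements, so it needs C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every run of two or more whitespace characters.
static List<string> ExtractWhitespaceRuns(string input)
{
    return Regex.Matches(input, @"\s{2,}")
        .Cast<Match>()              // non-generic MatchCollection -> IEnumerable<Match>
        .Select(m => m.Value)
        .ToList();
}

var runs = ExtractWhitespaceRuns("ab c  def   gh");
foreach (var run in runs)
    Console.WriteLine($"[{run}] length {run.Length}");   // lengths 2 and 3
```

Single spaces never match because the quantifier {2,} requires at least two whitespace characters, so no post-filtering with RemoveAll is needed.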

You can use the following single line:
var list = Regex.Matches(value, @"[ ]{2,}").Cast<Match>().Select(match => match.Value).ToList();
Hope it will help you.

Related

Regular expression to extract based on capital letters

Hi please can someone help with a C# regex to split into just two words as follows:
"SetTable" ->> ["Set", "Table"]
"GetForeignKey" ->> ["Get", "ForeignKey"] //No split on Key!

This can be solved in different ways; one method is the following
string source = "GetForeignKey";
var result = Regex.Matches(source, "[A-Z]").OfType<Match>().Select(x => x.Index).ToArray();
string a, b;
if (result.Length > 1)
{
a = source.Substring(0, result[1]);
b = source.Substring(result[1]);
}

Try the regex below:
(?![A-Z][a-z]+Key)[A-Z][a-z]+|[A-Z][a-z]+Key
C# code (Dump() is LINQPad's output extension method; use Console.WriteLine elsewhere):
var matches = Regex.Matches(input, @"(?![A-Z][a-z]+Key)[A-Z][a-z]+|[A-Z][a-z]+Key");
foreach (Match match in matches)
match.Groups[0].Value.Dump();
To get the parts as an array:
matches.OfType<Match>().Select(x => x.Value).ToArray().Dump();

Split on numeric to letters excluding comma

I have a string containing "0,35mA". The code below currently splits "0,35mA" into
"0"
","
"35"
"mA"
List<string> splittedString = new List<string>();
foreach (string strItem in strList)
{
splittedString.AddRange(Regex.Matches(strItem, @"\D+|\d+")
.Cast<Match>()
.Select(m => m.Value)
.ToList());
}
What I want is for the string to be split into
"0,35"
"mA"
How do I achieve this?

It looks like you want to tokenize the string into numbers and everything else.
A better regex approach is to split with a number matching pattern while wrapping the whole pattern into a capturing group so as to also get the matching parts into the resulting array.
Since you have , as a decimal separator, you may use
var results = Regex.Split(s, @"([-+]?[0-9]*,?[0-9]+(?:[eE][-+]?[0-9]+)?)")
.Where(x => !string.IsNullOrEmpty(x))
.ToList();
See the regex demo.
The regex is based on the pattern described in Matching Floating Point Numbers with a Regular Expression.
The .Where(x => !string.IsNullOrEmpty(x)) is necessary to get rid of empty items (if any).
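To illustrate why the capturing group matters: Regex.Split normally discards the text it splits on, but it includes text captured by groups in the returned array. The sketch below uses a simplified number pattern (digits with an optional comma part), which is an assumption for illustration and not the full floating-point pattern above; the helper name Tokenize is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: splits on numbers, optionally keeping them in the result.
static List<string> Tokenize(string input, bool keepNumbers)
{
    string number = @"[0-9]+(?:,[0-9]+)?";                       // simplified: no sign, no exponent
    string pattern = keepNumbers ? "(" + number + ")" : number;  // capturing group keeps the match
    return Regex.Split(input, pattern)
        .Where(x => !string.IsNullOrEmpty(x))                    // drop the empty leading item
        .ToList();
}

Console.WriteLine(string.Join(" | ", Tokenize("0,35mA", keepNumbers: true)));   // 0,35 | mA
Console.WriteLine(string.Join(" | ", Tokenize("0,35mA", keepNumbers: false)));  // mA
```

The empty item appears because the number sits at the very start of the string, so the split produces an empty string before it.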

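I assume that all your strings will have the same format: four number characters followed by a two-letter unit.
If so, try this regex with Regex.Matches (the | is dropped from the character class, where it would only match a literal pipe):
string regex = "[\\d,]{4}|\\w{2}";
For the sample input it should match "0,35" and then "mA".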

var st = "0,35mA";
var li = Regex.Matches(st, @"([,\d]+)([a-zA-Z]+)").Cast<Match>().ToList();
foreach (var t in li)
{
Console.WriteLine($"Group 1 {t.Groups[1]}");
Console.WriteLine($"Group 2 {t.Groups[2]}");
}
Group 1 0,35
Group 2 mA

Split constantly on the last delimiter in C#

I have the following string:
string x = "hello;there;;you;;;!;";
The result I want is a list of length four with the following substrings:
"hello"
"there;"
"you;;"
"!"
In other words, how do I split on the last occurrence when the delimiter is repeating multiple times? Thanks.

You need to use a regex based split:
var s = "hello;there;;you;;;!;";
var res = Regex.Split(s, @";(?!;)").Where(m => !string.IsNullOrEmpty(m));
Console.WriteLine(string.Join(", ", res));
// => hello, there;, you;;, !
See the C# demo
The ;(?!;) regex matches any ; that is not followed by another ;.
To also avoid matching a ; at the end of the string (and thus keep it attached to the last item in the resulting list) use ;(?!;|$) where $ matches the end of string (can be replaced with \z if the very end of the string should be checked for).
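The difference between the two lookaheads can be sketched as follows. The helper name SplitOnLastDelimiter and its keepTrailing flag are hypothetical, introduced only to compare the two patterns side by side.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: splits on the last ';' of each run of semicolons.
// With keepTrailing = true, a ';' at the end of the string is not a split point.
static List<string> SplitOnLastDelimiter(string input, bool keepTrailing)
{
    string pattern = keepTrailing ? @";(?!;|$)" : @";(?!;)";
    return Regex.Split(input, pattern)
        .Where(part => !string.IsNullOrEmpty(part))
        .ToList();
}

string sample = "hello;there;;you;;;!;";
Console.WriteLine(string.Join(", ", SplitOnLastDelimiter(sample, keepTrailing: false)));
// => hello, there;, you;;, !
Console.WriteLine(string.Join(", ", SplitOnLastDelimiter(sample, keepTrailing: true)));
// => hello, there;, you;;, !;
```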

It seems that you don't want to remove empty entries but keep the separators.
You can use this code:
string s = "hello;there;;you;;;!;";
MatchCollection matches = Regex.Matches(s, @"(.+?);(?!;)");
foreach(Match match in matches)
{
// Group 1 holds the item without the final separator
Console.WriteLine(match.Groups[1].Value);
}

string x = "hello;there;;you;;;!;";
var splitted = x.Split(new char[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
// Note: this drops the repeated delimiters, yielding "hello", "there", "you", "!"
foreach (var s in splitted)
Console.WriteLine("{0}", s);

C# regex. Everything inside curly brackets {} and percent (%) characters

I'm trying to get the values between {} and between %% with a single regex.
This is what I have so far. I can successfully get the values for each pattern individually, but I'd like to learn how to combine both.
var regex = new Regex(@"%(.*?)%|\{([^}]*)\}");
String s = "This is a {test} %String%. %Stack% {Overflow}";
Expected answer for the above string
test
String
Stack
Overflow
Individual regex
@"%(.*?)%" gives me String and Stack
@"\{([^}]*)\}" gives me test and Overflow
Following is my code.
var regex = new Regex(@"%(.*?)%|\{([^}]*)\}");
var matches = regex.Matches(s);
foreach (Match match in matches)
{
Console.WriteLine(match.Groups[1].Value);
}

Similar to your regex. You can use Named Capturing Groups
String s = "This is a {test} %String%. %Stack% {Overflow}";
var list = Regex.Matches(s, @"\{(?<name>.+?)\}|%(?<name>.+?)%")
.Cast<Match>()
.Select(m => m.Groups["name"].Value)
.ToList();

If you want to learn how conditional expressions work, here is a solution using that kind of .NET regex capability:
(?:(?<p>%)|(?<b>{))(?<v>.*?)(?(p)%|})
See the regex demo
Here is how it works:
(?:(?<p>%)|(?<b>{)) - match and capture either Group "p" with % (percentage), or Group "b" (brace) with {
(?<v>.*?) - match and capture into Group "v" (value) any character (even a newline since I will be using RegexOptions.Singleline) zero or more times, but as few as possible (lazy matching with *? quantifier)
(?(p)%|}) - a conditional expression meaning: if "p" group was matched, match %, else, match }.
C# demo:
var s = "This is a {test} %String%. %Stack% {Overflow}";
var regex = "(?:(?<p>%)|(?<b>{))(?<v>.*?)(?(p)%|})";
var matches = Regex.Matches(s, regex, RegexOptions.Singleline);
// var matches_list = Regex.Matches(s, regex, RegexOptions.Singleline)
// .Cast<Match>()
// .Select(p => p.Groups["v"].Value)
// .ToList();
// Or just a demo writeline
foreach (Match match in matches)
Console.WriteLine(match.Groups["v"].Value);

Sometimes the capture is in group 1 and sometimes it's in group 2 because you have two pairs of parentheses.
Your original code will work if you do this instead:
Console.WriteLine(match.Groups[1].Value + match.Groups[2].Value);
because one group will be the empty string and the other will be the value you're interested in.

@"[\{|%](.*?)[\}|%]"
The idea being:
{ or %
anything
} or %
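A minimal sketch of this character-class idea, with two caveats. Inside a character class | is a literal pipe rather than alternation, so the sketch leaves it out. The pattern also does not check that the closing delimiter matches the opening one, so input such as {a% is accepted. The helper name ExtractDelimited is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: captures the text between an opening { or % and the next } or %.
static List<string> ExtractDelimited(string input)
{
    return Regex.Matches(input, @"[{%](.*?)[}%]")
        .Cast<Match>()
        .Select(m => m.Groups[1].Value)   // only one capture group, so always group 1
        .ToList();
}

var values = ExtractDelimited("This is a {test} %String%. %Stack% {Overflow}");
Console.WriteLine(string.Join(", ", values));                    // test, String, Stack, Overflow
Console.WriteLine(string.Join(", ", ExtractDelimited("{a%")));   // a (mismatched pair still matches)
```

If mismatched pairs must be rejected, the alternation or conditional patterns from the other answers are the safer choice.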

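I think you should use a combination of alternation and nested groups, with lazy quantifiers so that each match stops at the first closing delimiter (the value ends up in group 3 or group 5):
((\{(.*?)\})|(%(.*?)%))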

Is there a better way to create acronym from upper letters in C#?

What is the best way to create acronym from upper letters in C#?
Example:
Alfa_BetaGameDelta_Epsilon
Expected result:
ABGDE
My solution works, but it's not nice
var classNameAbbreviationRegex = new Regex("[A-Z]+", RegexOptions.Compiled);
var matches = classNameAbbreviationRegex.Matches(enumTypeName);
var letters = new string[matches.Count];
for (var i = 0; i < matches.Count; i++)
{
letters[i] = matches[i].Value;
}
var abbreviation = string.Join(string.Empty, letters);

string.Join("", s.Where(char.IsUpper));

string.Join("", s.Where(x => char.IsUpper(x)));

string test = "Alfa_BetaGameDelta_Epsilon";
string result = string.Concat(test.Where(char.IsUpper));

You can use the Where method to keep only the upper case characters, and the Char.IsUpper method can be used as a delegate directly without a lambda expression. You can create the resulting string from an array of characters:
string abbreviation = new String(enumTypeName.Where(Char.IsUpper).ToArray());

By using MORE regexes :-)
var ac = string.Join(string.Empty,
Regex.Match("Alfa_BetaGameDelta_Epsilon",
"(?:([A-Z]+)(?:[^A-Z]*))*")
.Groups[1]
.Captures
.Cast<Capture>()
.Select(p => p.Value));
More regexes are always the solution, especially with LINQ! :-)
The regex puts all the [A-Z] in capture group 1 (because all the other () are non-capturing group (?:)) and "skips" all the non [A-Z] ([^A-Z]) by putting them in a non-capturing group. This is done 0-infinite times by the last *. Then a little LINQ to select the value of each capture .Select(p => p.Value) and the string.Join to join them.
Note that this isn't Unicode friendly... ÀÈÌÒÙ will be ignored. A better regex would use @"(?:(\p{Lu}+)(?:[^\p{Lu}]*))*" where \p{Lu} is the Unicode category UppercaseLetter.
(yes, this is useless... The other methods that use LINQ + IsUpper are better :-) but the whole example was built just to show the problems of Regexes with Unicode)
MUCH EASIER:
var ac = Regex.Replace("Alfa_BetaGameDelta_Epsilon", @"[^\p{Lu}]", string.Empty);
simply remove all the non-uppercase letters :-)
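The Unicode point above can be sketched by comparing the ASCII-only class with \p{Lu}. The helper name Acronym and the accented sample string are assumptions made for illustration; the accented letters are written as \u escapes (precomposed À and È) to avoid source-encoding issues.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: removes everything that is not an upper case letter.
static string Acronym(string input, bool unicodeAware)
{
    string nonUpper = unicodeAware ? @"[^\p{Lu}]" : "[^A-Z]";
    return Regex.Replace(input, nonUpper, string.Empty);
}

string accented = "\u00C0lfa_Beta\u00C8psilon";   // "Àlfa_BetaÈpsilon"

Console.WriteLine(Acronym("Alfa_BetaGameDelta_Epsilon", unicodeAware: true));  // ABGDE
Console.WriteLine(Acronym(accented, unicodeAware: false).Length);              // 1 (only "B" survives)
Console.WriteLine(Acronym(accented, unicodeAware: true).Length);               // 3 (À, B, È)
```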

var str = "Alfa_BetaGammaDelta_Epsilon";
var abbreviation = string.Join(string.Empty, str.Where(c => char.IsUpper(c)));
