C# Regex split by delimiter - c#

I'm facing a problem with a regex split.
Here is my pattern:
string[] words = Regex.Split(line, "[\\s,.;:/?!()\\-]+");
And this is text file:
ir KAS gi mus nugales.
jei! mes MIRTI NEBIJOM,
JEIGU mes nugalejom mirti
DZUKAS
My task is to find the last uppercase word. Here is the code:
z = words.LastOrDefault(c => c.All(ch => char.IsUpper(ch)));
When a line ends with a delimiter, z is simply not printed. When there is no delimiter at the end (the 3rd and 4th lines), everything works fine.
Why does this happen?

Why not match the words (not split), and take the last one?
string source = @"ir KAS gi mus nugales.
jei!mes MIRTI NEBIJOM,
JEIGU mes nugalejom mirti
DZUKAS";
// or @"\b\p{Lu}+\b", depending on which letters you want to match
string pattern = @"\b[A-Z]+\b";
string result = Regex
.Matches(source, pattern)
.OfType<Match>()
.Select(match => match.Value)
.LastOrDefault();
Edit: If I understand your requirements correctly (Regex.Split must be preserved, and you have to output the last all-caps word of each line), you're looking for something like this:
var result = source
.Split(new string[] { Environment.NewLine }, StringSplitOptions.None)
.Select(line => Regex.Split(line, "[\\s,.;:/?!()\\-]+"))
.Select(words => words
.Where(word => word.Length > 0 && word.All(c => char.IsUpper(c)))
.LastOrDefault());
// You may want to filter out lines that don't have any all-uppercase words:
// .Where(line => line != null);
Test
Console.Write(string.Join(Environment.NewLine, result));
Output
KAS
NEBIJOM
JEIGU
DZUKAS
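Please note that .All(c => char.IsUpper(c)) returns true for an empty string, which is why we have to add the explicit word.Length > 0 check. Regex.Split produces a trailing empty string when the line ends with a delimiter, and that empty string satisfies the .All(...) condition. So the problem you've faced is with Linq, not with Regex.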
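The pitfall described above can be checked in isolation. The following is a minimal sketch; the class name EmptyWordDemo and the helper LastUpperWord are made up for illustration, and the delimiter pattern is the one from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EmptyWordDemo
{
    // Same delimiter pattern as in the question.
    private const string Delimiters = "[\\s,.;:/?!()\\-]+";

    // Hypothetical helper: last all-uppercase word, ignoring empty entries.
    public static string LastUpperWord(string line) =>
        Regex.Split(line, Delimiters)
             .LastOrDefault(w => w.Length > 0 && w.All(char.IsUpper));

    public static void Main()
    {
        string[] words = Regex.Split("ir KAS gi mus nugales.", Delimiters);

        // The line ends with '.', so the last element is an empty string.
        Console.WriteLine(words[words.Length - 1].Length);            // 0

        // All(...) is vacuously true for an empty sequence of characters.
        Console.WriteLine("".All(char.IsUpper));                       // True

        Console.WriteLine(LastUpperWord("ir KAS gi mus nugales."));   // KAS
    }
}
```

Without the w.Length > 0 check, LastOrDefault would pick the trailing empty string, which prints as nothing.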

using System;
using System.Text.RegularExpressions;
namespace ConsoleApp
{
class Program
{
static void Main()
{
string s = @"ir KAS gi mus nugales.
jei!mes MIRTI NEBIJOM,
JEIGU mes nugalejom mirti
DZUKAS";
Match result = Regex.Match(s, "([A-Z]+)", RegexOptions.RightToLeft);
Console.WriteLine(result.Value);
Console.ReadKey();
}
}
}

From the question and comments it's hard to figure out what you want, but I'll try to cover both cases.
If you're looking for the last uppercase word in the whole text, you can do something like this:
Regex r = new Regex("[,.;:/?!()\\-]+", RegexOptions.Multiline);
// A null separator splits on any whitespace, including line breaks
string result = r.Replace(source, string.Empty).Split((char[])null, StringSplitOptions.RemoveEmptyEntries).LastOrDefault(word => word.All(c => char.IsUpper(c)));
If you want to find the last match in each line:
Regex r = new Regex("[,.;:/?!()\\-]+", RegexOptions.Multiline);
string[] result = r.Replace(source, string.Empty).Split(new[] { Environment.NewLine }, StringSplitOptions.None).Select(line => line.Split(' ').LastOrDefault(word => word.Length > 0 && word.All(c => char.IsUpper(c)))).ToArray();

Related

printing a reverse sentence with punctuation c#

I'm trying to reverse a sentence like the following:
The input:
my name is john. i am 23 years old.
The output:
.old years 23 am i .john is name my
I can't figure out how to move the dot from the end of the word to the front.
I tried using Split, but it always returns the dot at the end of the word.
string[] words = sentence.Split(' ');
Array.Reverse(words);
return string.Join(" ", words);
Add extra logic to move the period ('.') to the start of the word, like this:
var sentence = "my name is john. i am 23 years old."; //Input string
string[] words = sentence.Split(' '); //Split sentence into words
Array.Reverse(words); //Reverse the array of words
//If a word ends with '.', move the period to the start of the word.
var result = words.Select(x => x.EndsWith('.') ? $".{x.Trim('.')}" : x);
//^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^This was missing
Console.WriteLine(string.Join(" ", result));
Elegant one-liner approach suggested by @metroSmurf in the comment section:
var result = sentence.Split(' ') //Split array with space as a delimiter.
.Reverse() //Use IEnumerable<T>.Reverse to reverse the array. No need to use Array.Reverse()
.Select(x => x.EndsWith('.') ? $".{x.Trim('.')}" : x); //Apply same logic mentioned above.
You are reversing words separated by spaces (in "old." the point is part of the word). If you want to reverse the points too, you need to treat them as words of their own (separated by a space):
public static class TextExtensions
{
public static string PointTrick(this string str) => str.Replace(".", " .");
public static string PointUntrick(this string str) => str.Replace(". ", ".");
public static string ReverseWords(this string str) => string.Join(" ", str.Split(" ").Reverse());
}
Those tests pass.
[TestClass]
public class SOTests
{
private string GetReversedWithPointTrick(string input) => input.PointTrick().ReverseWords().PointUntrick();
[TestMethod]
public void ReverseWordsTest()
{
var sut = "three two. one";
var expected = "one two. three";
var result = sut.ReverseWords();
Assert.AreEqual(expected, result);
}
[TestMethod]
public void ReverseWords_PointTrick_Test()
{
var sut = "my name is john. i am 23 years old.";
var expected = ".old years 23 am i .john is name my";
var result = GetReversedWithPointTrick(sut);
Assert.AreEqual(expected, result);
}
}
You can try combination of Linq and Regular Expressions:
using System.Linq;
using System.Text.RegularExpressions;
...
string source = "my name is john. i am 23 years old.";
// .old years 23 am i .john is name my
string result = string.Join(" ", Regex
.Matches(source, @"\S+")
.Cast<Match>()
.Select(m => Regex.Replace(m.Value, @"^(.*?)(\p{P}+)$", "$2$1"))
.Reverse());
Here we use two patterns: a simple one, \S+, which matches any run of non-whitespace characters. The second pattern, ^(.*?)(\p{P}+)$, is worth explaining:
^(.*?)(\p{P}+)$
here
^ - anchor, start of the string
(.*?) - group #1: any symbols, but as few as possible
(\p{P}+) - group #2: one or more punctuation symbols
$ - anchor, end of the string
and when it matches, we swap these groups: "$2$1"
Demo:
private static string Solve(string source) => string.Join(" ", Regex
.Matches(source, @"\S+")
.Cast<Match>()
.Select(m => Regex.Replace(m.Value, @"^(.*?)(\p{P}+)$", "$2$1"))
.Reverse());
...
string[] tests = new string[] {
"my name is john. i am 23 years old.",
"It's me, here am I!",
"Test...",
};
string report = string.Join(Environment.NewLine, tests
.Select(test => $"{test,-35} => {Solve(test)}"));
Console.Write(report);
Outcome:
my name is john. i am 23 years old. => .old years 23 am i .john is name my
It's me, here am I! => !I am here ,me It's
Test... => ...Test
Because someone always has to post a LINQ version of these things 😜
sentence.Split().Reverse().Select(w => w[^1] == '.' ? ('.' + w[..^1]) : w);
Split without any argument splits on whitespace, Reverse is a LINQ method that reverses the input, and then we just have a bit of logic that asks whether the last char is a dot (^1 comes from the indices and ranges feature of C# 8, meaning "one from the end"). If it is, the dot moves to the start (a dot concatenated with the whole string up to 1 from the end); otherwise the word is output unchanged.
And all that remains is to string join it, which you know how to do: string.Join(" ", ...)
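Put together, the LINQ version might look like the sketch below. The class and method names are made up for illustration. Two small deviations from the one-liner are assumptions of this sketch: empty entries are removed so that w[^1] never sees an empty string, and Select is applied before Reverse, which yields the same sequence.

```csharp
using System;
using System.Linq;

public static class ReverseDemo
{
    // Reverses the word order and moves a trailing '.' to the front of its word.
    public static string ReverseSentence(string sentence) =>
        string.Join(" ", sentence
            // A null separator means "split on any whitespace".
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => w[^1] == '.' ? "." + w[..^1] : w)
            .Reverse());

    public static void Main()
    {
        Console.WriteLine(ReverseSentence("my name is john. i am 23 years old."));
        // .old years 23 am i .john is name my
    }
}
```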

Trying to filter only digits in string array using LINQ

I'm trying to filter only digits in string array. This works if I have this array:
12324 asddd 123 123, but if I have chars and digits in one string e.g. asd1234, it does not take it.
Can you help me do this?
int[] result = input
.Where(x => x.All(char.IsDigit))// tried with .Any(), .TakeWhile() and .SkipWhile()
.Select(int.Parse)
.Where(x => x % 2 == 0)
.ToArray();
Something like this should work. The function digitString will select only digits from the input string, and recombine into a new string. The rest is simple, just predicates selecting non-empty strings and even numbers.
var values = new[]
{
"helloworld",
"hello2",
"4",
"hello123world123"
};
bool isEven(int i) => i % 2 == 0;
bool notEmpty(string s) => s.Length > 0;
string digitString(string s) => new string(s.Where(char.IsDigit).ToArray());
var valuesFiltered = values
.Select(digitString)
.Where(notEmpty)
.Select(int.Parse)
.Where(isEven)
.ToArray();
You need to do it in 2 steps: First filter out all the invalid strings, then filter out all the non-digits in the valid strings.
A helper Method would be very readable here, but it is also possible with pure LINQ:
var input = new[]{ "123d", "12e", "pp", "33z3"};
input
.Where(x => x.Any(char.IsDigit))
.Select(str => string.Concat(str.Where(char.IsDigit)));
Possible null values should be dropped to avoid a NullReferenceException.
string.Join() is suitable for concatenating the filtered digits.
Additionally, empty strings should be dropped because they cannot be converted to an integer.
string[] input = new string[] { "1234", "asd124", "2345", "2346", null, "", "asdfas", "2" };
int[] result = input
.Where(s => s != null)
.Select(s => string.Join("", s.Where(char.IsDigit)))
.Where(s => s != string.Empty)
.Select(int.Parse)
.Where(x => x % 2 == 0)
.ToArray();
Using the Linq Aggregate method together with TryParse() also gives the desired result:
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
var input = new string[] { "aaa123aaa124", "aa", "778", "a777", null };
var result = input.Aggregate(
new List<int>(),
(x, y) =>
{
if (y is null)
return x;
var digitOnlyString = Regex.Replace(y, "[^0-9]", string.Empty);
if (int.TryParse(digitOnlyString, out var temp) && temp % 2 == 0)
x.Add(temp);
return x;
})
.ToArray();
Max,
You can do this in a single expression like so:
using System.Linq;
using System.Text.RegularExpressions;
var input = new[] { "aaa123aaa124", "aa", "778", "a777", null };
var rx = new Regex(@"[0-9]+");
var numbersOnly = input.Where(s => !string.IsNullOrEmpty(s) && rx.IsMatch(s))
.Select(s => string.Join("", rx.Matches(s).Cast<Match>().Select(m => m.Value)));
foreach (var number in numbersOnly) Console.WriteLine(number);
Which returns:
123124
778
777
if I have chars and digits in one string e.g. asd1234, it does not take it
Apparently you want to parse this line also. You want to translate "asd1234" to "1234" and then parse it.
But what if your input sequence of strings contains a string with two numbers: "123asd456". Do you want to interpret this as "123", or maybe as "123456", or maybe you consider this as two numbers "123" and "456".
Let's assume you don't have this problem: every string contains at most one number, or, if a string has more than one number, you only want the first number.
In fact, you only want to keep those strings that are "zero or more non-digits, followed by one or more digits, followed by zero or more characters".
Enter Regular Expressions!
const string regexTxt = @"\D*(\d+).*";
Regex regex = new Regex(regexTxt);
\D: any non-digit
*: zero or more
\d: any digit
+: one or more
. any character
(...) capture the parts between the parentheses
So this regular expression matches any string that starts with zero or more non-digits, followed by at least one digit, followed by zero or more characters. Capture the "at least one digit" part.
If you try to Match() an input string with this regular expression, you get a Match object. Property Success tells you whether the input string matches the regular expression.
The Match object has a property Groups, which contains all captured groups. Groups[0] is the complete match; Groups[1] is a Group whose Value property holds the first captured string.
A simple program that shows how to use the regular expression:
const string regexTxt = @"\D*(\d+).*";
Regex regex = new Regex(regexTxt);
var lines = new string[]
{
String.Empty,
"A",
"A lot of text, no numbers",
"1",
"123456",
"Some text and then a number 123",
"Several characters, then a number: 123 followed by another number 456!",
"___123---456...",
};
foreach (var line in lines)
{
Match match = regex.Match(line);
if (match.Success)
{
string capturedDigits = match.Groups[1].Value;
int capturedNumber = Int32.Parse(capturedDigits);
Console.WriteLine("{0} => {1}", line, capturedNumber);
}
}
Or in a LINQ statement:
const string regexTxt = @"\D*(\d+).*";
Regex regex = new Regex(regexTxt);
IEnumerable<string> sourceLines = ...
var numbers = sourceLines
.Select(line => regex.Match(line)) // Match the Regex
.Where(match => match.Success) // Keep only the Matches that succeeded
.Select(match => Int32.Parse(match.Groups[1].Value));
// Finally Parse the captured text to int
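The three possible readings of "123asd456" raised in this answer can be compared side by side. This is only a sketch: the class and method names are hypothetical, and which reading is the right one depends on your requirements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DigitReadings
{
    // Reading 1: only the first run of digits ("123").
    public static string FirstNumber(string s) => Regex.Match(s, @"\d+").Value;

    // Reading 2: all digits glued together ("123456").
    public static string AllDigits(string s) => string.Concat(s.Where(char.IsDigit));

    // Reading 3: every run of digits as its own number ("123", "456").
    public static string[] EachNumber(string s) =>
        Regex.Matches(s, @"\d+").Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        const string input = "123asd456";
        Console.WriteLine(FirstNumber(input));                   // 123
        Console.WriteLine(AllDigits(input));                     // 123456
        Console.WriteLine(string.Join(" ", EachNumber(input)));  // 123 456
    }
}
```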

How to first 'Split a string to an Array' then 'Add something to that Array'? || C# Console App

I'm trying to create a program that splits a string to an array then adds
to that array.
Splitting the string works but adding to the array is really putting up a
fight.
//here i create the text
string text = Console.ReadLine();
Console.WriteLine();
//Here i split my text to elements in an Array
var punctuation = text.Where(Char.IsPunctuation).Distinct().ToArray();
var words = text.Split().Select(x => x.Trim(punctuation));
//here i display the splitted string
foreach (string x in words)
{
Console.WriteLine(x);
}
//Here a try to add something to the Array
Array.words(ref words, words.Length + 1);
words[words.Length - 1] = "addThis";
//I try to display the updated array
foreach (var x in words)
{
Console.WriteLine(x);
}
//Here are the error messages |*error*|
Array.|*words*|(ref words, words.|*Length*| + 1);
words[words.|*Length*| - 1] = "addThis";
'Array' does not contain definition for 'words'
Does not contain definition for Length
Does not contain definition for length */
Convert the IEnumerable to List:
var words = text.Split().Select(x => x.Trim(punctuation)).ToList();
Once it is a list, you can call Add
words.Add("addThis");
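As a complete program, the fix might look like the sketch below. The class name and the SplitWords helper are made up for illustration, and the input is a fixed sample string instead of Console.ReadLine().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AddWordDemo
{
    // Splits on whitespace, trims the punctuation found in the text,
    // and materializes the result as a List<string> so that it can grow.
    public static List<string> SplitWords(string text)
    {
        char[] punctuation = text.Where(char.IsPunctuation).Distinct().ToArray();
        return text.Split().Select(x => x.Trim(punctuation)).ToList();
    }

    public static void Main()
    {
        List<string> words = SplitWords("Hello, world!");
        words.Add("addThis");
        Console.WriteLine(string.Join(" | ", words));  // Hello | world | addThis
    }
}
```

The original code failed because Select returns a lazy IEnumerable<string>, which has neither Length nor an indexer; a List<string> has Count, an indexer, and Add.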
Technically, if you want to split on punctuation, I suggest Regex.Split instead of string.Split
using System.Text.RegularExpressions;
...
string text =
@"Text with punctuation: comma, full stop. Apostroph's and ""quotation?"" - ! Yes!";
var result = Regex.Split(text, @"\p{P}");
Console.Write(string.Join(Environment.NewLine, result));
Outcome:
Text with punctuation # Space is not a punctuation, 3 words combined
comma
full stop
Apostroph # apostroph ' is a punctuation, split as required
s and
quotation
Yes
If you want to append some items, I suggest Linq Concat() and .ToArray():
string text =
string[] words = Regex
.Split(text, @"\p{P}")
.Concat(new string[] {"addThis"})
.ToArray();
However, it seems that you want to extract words rather than split on punctuation, which you can do by matching the words:
using System.Linq;
using System.Text.RegularExpressions;
...
string text =
@"Text with punctuation: comma, full stop. Apostroph's and ""quotation?"" - ! Yes!";
string[] words = Regex
.Matches(text, @"[\p{L}']+") // Let a word be one or more letters or apostrophes
.Cast<Match>()
.Select(match => match.Value)
.Concat(new string[] { "addThis"})
.ToArray();
Console.Write(string.Join(Environment.NewLine, words));
Outcome:
Text
with
punctuation
comma
full
stop
Apostroph's
and
quotation
Yes
addThis

Split on numeric to letters excluding comma

I have a string containing "0,35mA" I now have the code below, which splits "0,35mA" into
"0"
","
"35"
"mA"
List<string> splittedString = new List<string>();
foreach (string strItem in strList)
{
splittedString.AddRange(Regex.Matches(strItem, @"\D+|\d+")
.Cast<Match>()
.Select(m => m.Value)
.ToList());
}
What I want is for the string to be split into
"0,35"
"mA"
How do I achieve this?
It looks like you want to tokenize the string into numbers and everything else.
A better regex approach is to split with a number matching pattern while wrapping the whole pattern into a capturing group so as to also get the matching parts into the resulting array.
Since you have , as a decimal separator, you may use
var results = Regex.Split(s, @"([-+]?[0-9]*,?[0-9]+(?:[eE][-+]?[0-9]+)?)")
.Where(x => !string.IsNullOrEmpty(x))
.ToList();
See the regex demo:
The regex is based on the pattern described in Matching Floating Point Numbers with a Regular Expression.
The .Where(x => !string.IsNullOrEmpty(x)) is necessary to get rid of empty items (if any).
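A runnable sketch of this approach follows. The class name TokenizeDemo and the method Tokenize are made up for illustration; the pattern is the one from this answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenizeDemo
{
    // Number pattern from the answer, wrapped in a capturing group so that
    // Regex.Split keeps the matched numbers in its result.
    private const string NumberPattern =
        @"([-+]?[0-9]*,?[0-9]+(?:[eE][-+]?[0-9]+)?)";

    public static List<string> Tokenize(string s) =>
        Regex.Split(s, NumberPattern)
             .Where(x => !string.IsNullOrEmpty(x))  // drop the empty leading item
             .ToList();

    public static void Main()
    {
        Console.WriteLine(string.Join(" | ", Tokenize("0,35mA")));  // 0,35 | mA
    }
}
```

For "0,35mA" the raw split result is an empty string, "0,35" and "mA"; the empty item appears because the number sits at the very start of the input.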
I assume that all your strings will have the same format.
So, try using this regex:
string regex = "([\\d|,]{4})|[\\w]{2}";
It should work.
var st = "0,35mA";
var li = Regex.Matches(st, @"([,\d]+)([a-zA-Z]+)").Cast<Match>().ToList();
foreach (var t in li)
{
Console.WriteLine($"Group 1 {t.Groups[1]}");
Console.WriteLine($"Group 2 {t.Groups[2]}");
}
Group 1 0,35
Group 2 mA

Split constantly on the last delimiter in C#

I have the following string:
string x = "hello;there;;you;;;!;";
The result I want is a list of length four with the following substrings:
"hello"
"there;"
"you;;"
"!"
In other words, how do I split on the last occurrence when the delimiter is repeating multiple times? Thanks.
You need to use a regex based split:
var s = "hello;there;;you;;;!;";
var res = Regex.Split(s, @";(?!;)").Where(m => !string.IsNullOrEmpty(m));
Console.WriteLine(string.Join(", ", res));
// => hello, there;, you;;, !
See the C# demo
The ;(?!;) regex matches any ; that is not followed with ;.
To also avoid matching a ; at the end of the string (and thus keep it attached to the last item in the resulting list) use ;(?!;|$) where $ matches the end of string (can be replaced with \z if the very end of the string should be checked for).
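The effect of the ;(?!;|$) variant can be seen in the short sketch below; the class and method names are hypothetical. Note that with this pattern the final ';' stays attached to the last item, so the last element is "!;" rather than "!".

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastDelimiterDemo
{
    // Splits on a ';' that is followed neither by another ';' nor by the end of the string.
    public static string[] SplitKeepingTrailing(string s) =>
        Regex.Split(s, @";(?!;|$)");

    public static void Main()
    {
        string[] parts = SplitKeepingTrailing("hello;there;;you;;;!;");
        Console.WriteLine(string.Join(", ", parts));
        // hello, there;, you;;, !;
    }
}
```

Because the trailing ';' no longer matches, no empty item is produced at the end and the Where filter is not needed for this input.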
It seems that you don't want to remove empty entries but keep the separators.
You can use this code:
string s = "hello;there;;you;;;!;";
MatchCollection matches = Regex.Matches(s, @"(.+?);(?!;)");
foreach(Match match in matches)
{
// Group 1 holds the item without the final separator
Console.WriteLine(match.Groups[1].Value);
}
string x = "hello;there;;you;;;!;";
var splitted = x.Split(new char[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
foreach (var s in splitted)
Console.WriteLine("{0}", s);
