Regular expression to break string C# - c#

Here is my string:
1-1 This is my first string. 1-2 This is my second string. 1-3 This is my third string.
How can I break like in C# like;
result[0] = This is my first string.
result[1] = This is my second string.
result[2] = This is my third string.

IEnumerable<string> lines = Regex.Split(text, "(?:^|[\r\n]+)[0-9-]+ ").Skip(1);
EDIT: If you want the result in an array you can do string[] result = lines.ToArray();

Regex regex = new Regex("^(?:[0-9]+-[0-9]+ )(.*?)$", RegexOptions.Multiline);
var str = "1-1 This is my first string.\n1-2 This is my second string.\n1-3 This is my third string.";
var matches = regex.Matches(str);
List<string> strings = matches.Cast<Match>().Select(p => p.Groups[1].Value).ToList();
foreach (var s in strings)
{
Console.WriteLine(s);
}
We use a multiline Regex, so that ^ and $ are the beginning and end of the line. We skip one or more numbers, a -, one or more numbers and a space (?:[0-9]+-[0-9]+ ). We lazily (*?) take everything (.) else until the end of the line (.*?)$, lazily so that the end of the line $ is more "important" than any character .
Then we put the matches in a List<string> using Linq.

Lines will end with newline, carriage-return or both, This splits the string into lines with all line-endings.
using System.Text.RegularExpressions;
...
var lines = Regex.Split( input, "[\r\n]+" );
Then you can do what you want with each line.
var words = Regex.Split( line[i], "\s" );

Related

My Regex.Split with '\n' takes up two spaces instead of 1

I need to split my text into each word, space, and new line.
Although the words and spaces are properly working, the \n is taking up two spaces only if it's not after a word.
Example: "\nTest\nword", here, the first \n takes up two spaces while the second one takes up one.
How would I write the proper regex?
My code:
string delimiterChars = "([ \r\n])";
wordArray = Regex.Split(myTexy, delimiterChars);
For context, I am using Unity.
Input: enter image description here
Output: enter image description here
On the output of the picture: The first element is empty and the second is \n here. I don't want the empty element.
Regex.Split will always produce empty items where the matches are consecutive, or when they are at the start/end of string.
Instead, you can use a matching and extracting approach:
string delimiterChars = "[^ \r\n]+|[ \r\n]";
string[] wordArray = Regex.Matches(myTexy, delimiterChars)
.Cast<Match>()
.Select(m => m.Value)
.ToArray();
The [^ \r\n]+|[ \r\n] regex matches one or more chars other than a space, CR and LF, or a space, CR or an LF char.
You can use regular expressions to remove leading delimiter characters.
var myTexy = "\nTest\nword";
string delimiterChars = "([ \r\n])";
myTexy = Regex.Replace(myTexy, "^" + delimiterChars, "");
var wordArray = Regex.Split(myTexy, delimiterChars);
The "^" regex option says only look for these characters at the beginning of the string.
Also, just so you are aware the behavior you are seeing is intended and is documented here:
If a match is found at the beginning or the end of the input string,
an empty string is included at the beginning or the end of the
returned array.
Let me know if this is what you are looking for -
String text = "\nTest\nword";
string[] words = Regex.Split(text, #"(\n+)");
Output -
Try this :-
string myStr = "This is test text";
wordArray = myStr.Split(new char[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
Output:

Split constantly on the last delimiter in C#

I have the following string:
string x = "hello;there;;you;;;!;"
The result I want is a list of length four with the following substrings:
"hello"
"there;"
"you;;"
"!"
In other words, how do I split on the last occurrence when the delimiter is repeating multiple times? Thanks.
You need to use a regex based split:
var s = "hello;there;;you;;;!;";
var res = Regex.Split(s, #";(?!;)").Where(m => !string.IsNullOrEmpty(m));
Console.WriteLine(string.Join(", ", res));
// => hello, there;, you;;, !
See the C# demo
The ;(?!;) regex matches any ; that is not followed with ;.
To also avoid matching a ; at the end of the string (and thus keep it attached to the last item in the resulting list) use ;(?!;|$) where $ matches the end of string (can be replaced with \z if the very end of the string should be checked for).
It seems that you don't want to remove empty entries but keep the separators.
You can use this code:
string s = "hello;there;;you;;;!;";
MatchCollection matches = Regex.Matches(s, #"(.+?);(?!;)");
foreach(Match match in matches)
{
Console.WriteLine(match.Captures[0].Value);
}
string x = "hello;there;;you;;;!;"
var splitted = x.Split(new char[] { ';' }, StringSplitOptions.RemoveEmptryEntries);
foreach (var s in splitted)
Console.WriteLine("{0}", s);

search string for everything before a set of characters in C#

I'm looking for a way to search a string for everything before a set of characters in C#. For Example, if this is my string value:
This is is a test.... 12345
I want build a new string with all of the characters before "12345".
So my new string would equal "This is is a test.... "
Is there a way to do this?
I've found Regex examples where you can focus on one character but not a sequence of characters.
You don't need to use a Regex:
public string GetBitBefore(string text, string end)
{
var index = text.IndexOf(end);
if (index == -1) return text;
return text.Substring(0, index);
}
You can use a lazy quantifier to match anything, followed by a lookahead:
var match = Regex.Match("This is is a test.... 12345", #".*?(?=\d{5})");
where:
.*? lazily matches everything (up to the lookahead)
(?=…) is a positive lookahead: the pattern must be matched, but is not included in the result
\d{5} matches exactly five digits. I'm assuming this is your lookahead; you can replace it
You can do so with help of regex lookahead.
.*(?=12345)
Example:
var data = "This is is a test.... 12345";
var rxStr = ".*(?=12345)";
var rx = new System.Text.RegularExpressions.Regex (rxStr,
System.Text.RegularExpressions.RegexOptions.IgnoreCase);
var match = rx.Match(data);
if (match.Success) {
Console.WriteLine (match.Value);
}
Above code snippet will print every thing upto 12345:
This is is a test....
For more detail about see regex positive lookahead
This should get you started:
var reg = new Regex("^(.+)12345$");
var match = reg.Match("This is is a test.... 12345");
var group = match.Groups[1]; // This is is a test....
Of course you'd want to do some additional validation, but this is the basic idea.
^ means start of string
$ means end of string
The asterisk tells the engine to attempt to match the preceding token zero or more times. The plus tells the engine to attempt to match the preceding token once or more
{min,max} indicate the minimum/maximum number of matches.
\d matches a single character that is a digit, \w matches a "word character" (alphanumeric characters plus underscore), and \s matches a whitespace character (includes tabs and line breaks).
[^a] means not so exclude a
The dot matches a single character, except line break characters
In your case there many way to accomplish the task.
Eg excluding digit: ^[^\d]*
If you know the set of characters and they are not only digit, don't use regex but IndexOf(). If you know the separator between first and second part as "..." you can use Split()
Take a look at this snippet:
class Program
{
static void Main(string[] args)
{
string input = "This is is a test.... 12345";
// Here we call Regex.Match.
MatchCollection matches = Regex.Matches(input, #"(?<MySentence>(\w+\s*)*)(?<MyNumberPart>\d*)");
foreach (Match item in matches)
{
Console.WriteLine(item.Groups["MySentence"]);
Console.WriteLine("******");
Console.WriteLine(item.Groups["MyNumberPart"]);
}
Console.ReadKey();
}
}
You could just split, not as optimal as the indexOf solution
string value = "oiasjdoiasj12345";
string end = "12345";
string result = value.Split(new string[] { end }, StringSplitOptions.None)[0] //Take first part of the result, not the quickest but fairly simple

Remove numbers in specific part of string (within parentheses)

I have a string Test123(45) and I want to remove the numbers within the parenthesis. How would I go about doing that?
So far I have tried the following:
string str = "Test123(45)";
string result = Regex.Replace(str, "(\\d)", string.Empty);
This however leads to the result Test(), when it should be Test123().
tis replaces all parenthesis, filled with digits by parenthesis
string str = "Test123(45)";
string result = Regex.Replace(str, #"\(\d+\)", "()");
\d+(?=[^(]*\))
Try this.Use with verbatinum mode #.The lookahead will make sure number have ) without ( before it.Replace by empty string.
See demo.
https://regex101.com/r/uE3cC4/4
string str = "Test123(45)";
string result = Regex.Replace(str, #"\(\d+\)", "()");
you can also try this way:
string str = "Test123(45)";
string[] delimiters ={#"("};;
string[] split = str.Split(delimiters, StringSplitOptions.None);
var b=split[0]+"()";
Remove a number that is in fact inside parentheses BUT not the parentheses and keep anything else inside them that is not a number with C# Regex.Replace means matching all parenthetical substrings with \([^()]+\) and then removing all digits inside the MatchEvaluator.
Here is a C# sample program:
var str = "Test123(45) and More (5 numbers inside parentheses 123)";
var result = Regex.Replace(str, #"\([^()]+\)", m => Regex.Replace(m.Value, #"\d+", string.Empty));
// => Test123() and More ( numbers inside parentheses )
To remove digits that are enclosed in ( and ) symbols, the ASh's \(\d+\) solution will work well: \( matches a literal (, \d+ matches 1+ digits, \) matches a literal ).

Regex Ignore first and last terminator

I have string in text that have uses | as a delimiter.
Example:
|2P|1|U|F8|
I want the result to be 2P|1|U|F8. How can I do that?
The regex is very easy, but why not just use Trim():
var str = "|2P|1|U|F8|";
str = str.Trim(new[] {'|'});
or just without new[] {...}:
str = str.Trim('|');
Output:
In case there are leading/trailing whitespaces, you can use chained Trims:
var str = "\r\n |2P|1|U|F8| \r\n";
str = str.Trim().Trim('|');
Output will be the same.
You can use String.Substring:
string str = "|2P|1|U|F8|";
string newStr = str.Substring(1, str.Length - 2);
Just remove the starting and the ending delimiter.
#"^\||\|$"
Use the below regex and then replace the match with an empty string.
Regex rgx = new Regex(#"^\||\|$");
string result = rgx.Replace(input, "");
Use mulitline modifier m when you're dealing with multiple lines.
Regex rgx = new Regex(#"(?m)^\||\|$");
Since | is a special char in regex, you need to escape this in-order to match a literal | symbol.
string input = "|2P|1|U|F8|";
foreach (string item in input.Split("|".ToCharArray(), StringSplitOptions.RemoveEmptyEntries))
{
Console.WriteLine(item);
}
Result is:
2P
1
U
F8
^\||\|$
You can try this.Replace by empty string.Use verbatim mode.See demo.
https://regex101.com/r/oF9hR9/14
For completionists-sake, you can also use Mid
Strings.Mid("|2P|1|U|F8|", 2, s.Length - 2)
This will cut out the part from the second character to the previous to last one and produce the correct output.
I'm assuming that at some point you will want to parse the string to extract its '|' separated components, so here goes another alternative that goes in that direction:
string.Join("|", theString.Split(new[] {'|'}, StringSplitOptions.RemoveEmptyEntries))

Categories