Split string based on the last N delimiters - C#

I need help to develop a logic to split a string, but only based on the last 2 delimiters of the string.
Example inputs:
string s1 = @"Dog \ Cat \ Bird \ Cow";
string s2 = @"Hello \ World \ How \ Are \ You";
string s3 = @"I \ am \ Peter";
Expected Outputs:
string[] newS1 = { "Dog Cat", "Bird", "Cow" };
string[] newS2 = { "Hello World How", "Are", "You" };
string[] newS3 = { "I", "am", "Peter" };
So, as you can see, I only want to split the string on the last 2 "\", and everything else before the last 2 "\" will be concatenated into one string.
I tried the .Split method but it will just split every "\" in a string.
Edited: If the string has fewer than 2 "\", it should simply be split on whatever delimiters it has.
Updates: Wow, these are a bunch of interesting solutions! Thank you a lot!

Try this:
var parts = s1.Split(new[] { " \\ " }, StringSplitOptions.None);
var partsCount = parts.Count();
var result = new[] { string.Join(" ", parts.Take(partsCount - 2)) }.Concat(parts.Skip(partsCount - 2));

Offering a regex solution:
var output = Regex.Split(input, @"\s*\\\s*([^\\]*?)\s*\\\s*(?=[^\\]*$)");
This split finds the second to last element and splits around that, but captures it in a group so it will be included in the output array.
For input "Dog \ Cat \ Bird \ Cow", this will produce { "Dog \ Cat", "Bird", "Cow" }. If you also need to strip the \ out of the first element that can be done with a simple replace:
output[0] = output[0].Replace(" \\", "");
Update: This version will correctly handle strings with only one delimiter:
var output = Regex.Split(str, @"\s*\\\s*([^\\]*?)\s*\\\s*(?=[^\\]*$)|(?<=^[^\\\s]*)\s*\\\s*(?=[^\\\s]*$)");
Update: And to match other delimiters like whitespace, "~", and "%", you can use a character class:
var output = Regex.Split(str, @"(?:[%~\s\\]+([^%~\s\\]+?)[%~\s\\]+|(?<=^[^%~\s\\]+)[%~\s\\]+)(?=[^%~\s\\]+$)");
The structure of this regex is slightly simpler than the previous one, since it treats any sequence of one or more characters in the class [%~\s\\] as a delimiter and any sequence of one or more characters in the negated class [^%~\s\\] as a segment. Note that \s matches any whitespace character.
You might also be able to simplify this further using:
var output = Regex.Split(str, @"(?:\W+(\w+)\W+|(?<=^\w+)\W+)(?=\w+$)");
Where \w matches any 'word' character (letters, digits, or underscores) and \W matches any 'non-word' character.

Looks like you want to Split the string on every <space>\<space>:
string input = @"Dog \ Cat \ Bird \ Cow";
string[] parts = input.Split(new string[]{@" \ "},
    StringSplitOptions.None);
And then Join everything with a space in between, except the final two parts:
// NOTE: Check that there are at least 2 parts.
string part0 = String.Join(" ", parts.Take(parts.Length - 2));
string part1 = parts[parts.Length - 2];
string part2 = parts[parts.Length - 1];
This will give you three strings, which you can put in an array.
string[] newParts = new []{ part0, part1, part2 };
In this example:
new [] { "Dog Cat", "Bird", "Cow" }

How about simply taking the output of split, then taking first N-2 items and Join back together, then create new string array of 3 items, first being output of Join, second being item N-1 of first split, and third being N of first split. I think that'll accomplish what you're trying to do.
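The split-then-join idea described above could be sketched as follows. SplitKeepingLast is a hypothetical helper name, not something from the question; the sketch assumes the delimiter is " \ " and that strings with too few delimiters should simply be split as they are. It is written as C# 9 top-level statements.

```csharp
using System;
using System.Linq;

// Hypothetical helper: split on the delimiter, then merge everything
// before the last `tailCount` parts into a single leading element.
static string[] SplitKeepingLast(string input, string delimiter, int tailCount)
{
    string[] parts = input.Split(new[] { delimiter }, StringSplitOptions.None);

    // Too few parts to merge anything: return the plain split result.
    if (parts.Length <= tailCount + 1)
        return parts;

    int headCount = parts.Length - tailCount;
    string head = string.Join(" ", parts.Take(headCount));
    return new[] { head }.Concat(parts.Skip(headCount)).ToArray();
}

string[] result = SplitKeepingLast(@"Dog \ Cat \ Bird \ Cow", @" \ ", 2);
Console.WriteLine(string.Join(" | ", result)); // Dog Cat | Bird | Cow
```

The early return is what covers the "fewer than 2 delimiters" case from the question: "I \ am \ Peter" and a string with a single delimiter both come back as a plain split.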

Interesting question. My initial solution to this would be:
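// Split on the backslash; note that the tokens keep their surrounding spaces.
string[] tokens = theString.Split('\\');
string[] components = new string[3];
// Concatenate everything before the last two tokens into the first slot.
for (int i = 0; i < tokens.Length - 2; i++)
{
    components[0] += tokens[i];
}
components[1] = tokens[tokens.Length - 2];
components[2] = tokens[tokens.Length - 1];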

Loop from the end of the string and count delimiters until you encounter two.
Record index positions in 2 variables previously set to -1.
After the loop, if first var is -1, nothing happens, return whole string.
If second var is -1, create array of 2 strings, split using substring and return.
Create array of 3 string, split using information from two vars, return.
Hope you understood my pseudocode, give me a comment if you need help.
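A rough C# version of the pseudocode above might look like this. SplitOnLastTwo is a hypothetical name, and the Trim and Replace calls are assumptions added so the result matches the expected output in the question; the pseudocode itself only describes finding the two positions and cutting with Substring.

```csharp
using System;

// Hypothetical helper: locate the last two backslashes by searching
// backwards from the end, then cut the string with Substring.
static string[] SplitOnLastTwo(string input)
{
    int last = input.LastIndexOf('\\');
    if (last < 0)
        return new[] { input };                  // no delimiter at all

    // Guard against a negative start index when the backslash is at index 0.
    int secondLast = last > 0 ? input.LastIndexOf('\\', last - 1) : -1;
    string tail = input.Substring(last + 1).Trim();

    if (secondLast < 0)                          // only one delimiter
        return new[] { input.Substring(0, last).Trim(), tail };

    return new[]
    {
        // Assumption: earlier delimiters are replaced by a single space.
        input.Substring(0, secondLast).Trim().Replace(@" \ ", " "),
        input.Substring(secondLast + 1, last - secondLast - 1).Trim(),
        tail
    };
}

Console.WriteLine(string.Join(" | ", SplitOnLastTwo(@"Dog \ Cat \ Bird \ Cow")));
// Dog Cat | Bird | Cow
```

This avoids allocating an array for every segment, which may matter for long strings with many delimiters.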

Related

C# how to separate a string by numbering (1. 2. ...)

I have this string "1: This 2: Is 3: A 4: Test" and would like to split it based on the numbering, like this:
"1: This"
"2: Is"
"3: A"
"4: Test"
I think this should be possible with a regular expression, but unfortunately I don't understand much about it.
This: string[] result = Regex.Split(input, @"\D+"); just splits the numbers without the colon and the content behind it.
You can use
string[] result = Regex.Split(text, @"(?!^)(?=(?<!\d)\d+:)");
Note that the (?<!\d) negative lookbehind is necessary when a bullet number has two or more digits. Details:
(?!^) - not at the start of string
(?=(?<!\d)\d+:) - the position that is immediately followed with one or more digits (not preceded with any digit) and a : char.
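As a small illustration of the lookahead split above, the following sketch runs it on the string from the question. The Trim step is an addition here, assumed to be wanted because each piece otherwise keeps the space that precedes the next number.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "1: This 2: Is 3: A 4: Test";

// Split at every position that is followed by a "digits + colon" marker,
// except at the very start of the string.
string[] items = Regex.Split(text, @"(?!^)(?=(?<!\d)\d+:)")
    .Select(item => item.Trim())   // drop the trailing space of each piece
    .ToArray();

foreach (string item in items)
    Console.WriteLine(item);
```

Because the pattern is zero-width, nothing is consumed by the split, so the numbering stays attached to its text.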
If you use a capture group () like this:
string[] result = Regex.Split(str, @"(\d+:)");
the captured values will be added to the array too. Then all that is left to do is to merge every first value with every second value (we skip index 0 as it is empty):
List<string> values = new();
for (int i = 1; i < result.Length; i += 2)
{
values.Add(result[i] + result[i + 1]);
}
There are probably cleaner ways to do this, but this works.
Using \D+ matches 1 or more non digits, and will therefore match : This to split on.
Instead of using split, you can also match the parts:
\b[0-9]+:.*?(?=\b[0-9]+:|$)
The pattern matches:
\b A word boundary to prevent a partial word match
[0-9]+: Match 1+ digits and :
.*? Match as few characters as possible
(?=\b[0-9]+:|$) Positive lookahead, assert either 1+ digits and : or the end of the string to the right
Example in C#:
string str = "1: This 2: Is 3: A 4: Test";
string pattern = @"\b[0-9]+:.*?(?=\b[0-9]+:|$)";
MatchCollection matchList = Regex.Matches(str, pattern);
string[] result = matchList.Cast<Match>().Select(match => match.Value).ToArray();
Array.ForEach(result, Console.WriteLine);
Output
1: This
2: Is
3: A
4: Test
Split by space then take each second item. Because if you define the word as something delimited by (white)space, '1.' or '2.' are words too, and you aren't able to distinguish them.
string[] split = content.Split(' ', StringSplitOptions.None);
string[] result = new string[split.Length / 2];
for (int i = 1; i < split.Length; i = i + 2) result[i / 2] = split[i];

C# Unable to split string by new lines \n

I'm facing an odd problem where C# is unable to split a string on new lines. I tried many combinations, such as using only .Split('\n'), but every one returns the whole string unsplit in the first position of the array, so lines[0] is the same as the input string. That has never happened before with other strings I had to parse.
String:
Don't remove the following keywords! These keywords are used in the
"compatible printer" condition of the print and filament profiles to
link the particular print and filament profiles to this printer
profile.\nPRINTER_VENDOR_PRUSA3D\nPRINTER_MODEL_SL1\nPRINTER_VENDOR_EPAX\nPRINTER_MODEL_X1\n\nSTART_CUSTOM_VALUES\nFLIP_XY\nLayerOffTime_0\nBottomLightOffDelay_2\nBottomLiftHeight_5\nLiftHeight_5.5\nBottomLiftSpeed_40.2\nLiftSpeed_60\nRetractSpeed_150\nBottomLightPWM_255\nLightPWM_255\nAntiAliasing_4
; Use 0 or 1 for disable AntiAliasing with "printer gamma correction"
set to 0, otherwise use multiples of 2 and "gamma correction" set to 1
for enable\nEND_CUSTOM_VALUES
Code:
var lines = previousString.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.RemoveEmptyEntries);
Output:
An array of length 1, with lines[0] == previousString
string[] lines = theText.Split(
new[] { Environment.NewLine },
StringSplitOptions.None
);
edit:
string[] lines = theText.Split(
new[] { "\r\n", "\r", "\n" },
StringSplitOptions.None
);
working fiddle: https://dotnetfiddle.net/HNY8a6
Sometimes when you see a \n on screen it really is a backslash (ASCII 92) followed by an n (ASCII 110), not an escape sequence for a newline (ASCII 10). A big hint for that here is that text boxes will usually not display newlines as escape codes but will put in actual new lines.
To split on \n use the string "\\n", which represents a string of two characters: the two backslashes produce a single character (ASCII 92, '\') in the string, followed by a lowercase n.
Alternatively, you could use @"\n". The @ sign tells C# not to process escape sequences in the quoted string.
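To make the difference concrete, here is a minimal sketch. The sample text is a shortened piece of the string from the question, assumed to contain a literal backslash followed by n rather than real line breaks.

```csharp
using System;

// The text contains a literal backslash followed by 'n', not a newline.
string raw = @"profile.\nPRINTER_VENDOR_PRUSA3D\nPRINTER_MODEL_SL1";

// Splitting on a real newline character finds nothing to split on.
string[] byNewline = raw.Split('\n');
Console.WriteLine(byNewline.Length); // 1

// "\\n" and @"\n" are the same two-character string: backslash, then n.
string[] byLiteral = raw.Split(new[] { "\\n" }, StringSplitOptions.RemoveEmptyEntries);
Console.WriteLine(byLiteral.Length); // 3
```

If the input may contain both real line breaks and literal \n sequences, both kinds of separator can be listed in the same array passed to Split.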
I'm not quite sure why you are using the Printer methods but I hope you don't require them.
string test = "Hello \nTest \n123"; // Create test string
string[] separated = test.Split('\n'); // Split string on '\n'
for (int i = 0; i < separated.Length; i++) { // Output substrings
    Console.WriteLine(separated[i]);
}
Output:
Hello
Test
123
I hope this solution works for you!
Edit: Added \r\n and \r support
If you also need to split strings by '\r' or '\r\n' then this code is the one to go with.
string test = "Hello \r\nTest \n123 \rEnd"; // Create test string
test = test.Replace("\r\n", "\n"); // Normalize Windows line endings first
test = test.Replace("\r", "\n"); // Then any remaining bare carriage returns
string[] separated = test.Split('\n'); // Split string on '\n'
for (int i = 0; i < separated.Length; i++) { // Output substrings
    Console.WriteLine(separated[i]);
}
Output:
Hello
Test
123
End
Edit2: Hopefully Solution
So you are saying that
\nPRINTER_VENDOR_PRUSA3D\nPRINTER_MODEL_SL1\nPRINTER_VENDOR_EPAX\nPRINTER_MODEL_X1\n\nSTART_CUSTOM_VALUES\nFLIP_XY\nLayerOffTime_0\nBottomLightOffDelay_2\nBottomLiftHeight_5\nLiftHeight_5.5\nBottomLiftSpeed_40.2\nLiftSpeed_60\nRetractSpeed_150\nBottomLightPWM_255\nLightPWM_255\nAntiAliasing_4 ; Use 0 or 1 for disable AntiAliasing with "printer gamma correction" set to 0, otherwise use multiples of 2 and "gamma correction" set to 1 for enable\nEND_CUSTOM_VALUES
is the string? Then the problem might be that this string contains some " characters, which will interfere with the .Split method.
If you're able to input the string manually, you should replace each plain " with \".

Regex Ignore first and last terminator

I have a string that uses | as a delimiter.
Example:
|2P|1|U|F8|
I want the result to be 2P|1|U|F8. How can I do that?
The regex is very easy, but why not just use Trim():
var str = "|2P|1|U|F8|";
str = str.Trim(new[] {'|'});
or just without new[] {...}:
str = str.Trim('|');
Output:
2P|1|U|F8
In case there are leading/trailing whitespaces, you can use chained Trims:
var str = "\r\n |2P|1|U|F8| \r\n";
str = str.Trim().Trim('|');
Output will be the same.
You can use String.Substring:
string str = "|2P|1|U|F8|";
string newStr = str.Substring(1, str.Length - 2);
Just remove the starting and the ending delimiter.
@"^\||\|$"
Use the regex below and then replace the match with an empty string.
Regex rgx = new Regex(@"^\||\|$");
string result = rgx.Replace(input, "");
Use the multiline modifier m when you're dealing with multiple lines.
Regex rgx = new Regex(@"(?m)^\||\|$");
Since | is a special char in regex, you need to escape this in-order to match a literal | symbol.
string input = "|2P|1|U|F8|";
foreach (string item in input.Split("|".ToCharArray(), StringSplitOptions.RemoveEmptyEntries))
{
Console.WriteLine(item);
}
Result is:
2P
1
U
F8
^\||\|$
You can try this. Replace with an empty string. Use a verbatim string. See the demo.
https://regex101.com/r/oF9hR9/14
For completeness' sake, you can also use Mid (from the Microsoft.VisualBasic namespace):
string s = "|2P|1|U|F8|";
string result = Strings.Mid(s, 2, s.Length - 2);
This cuts out the part from the second character to the second-to-last one and produces the correct output.
I'm assuming that at some point you will want to parse the string to extract its '|' separated components, so here goes another alternative that goes in that direction:
string.Join("|", theString.Split(new[] {'|'}, StringSplitOptions.RemoveEmptyEntries))

Removing White Space: C#

I am trying to remove white space that exists in a String input. My ultimate goal is to create an infix evaluator, but I am having issues with parsing the input expression.
It seems to me that the easy solution to this is using a Regular Expression function, namely Regex.Replace(...)
Here's what I have so far..
infixExp = Regex.Replace(infixExp, "\\s+", string.Empty);
string[] substrings = Regex.Split(infixExp, "(\\()|(\\))|(-)|(\\+)|(\\*)|(/)");
Assuming the user inputs the infix expression (2 + 3) * 4, I would expect that this would break the string into the array {(, 2, +, 3, ), *, 4}; however, after debugging, I am getting the following output:
infixExp = "(2+3)*7"
substrings = {"", (, 2, +, 3, ), "", *, 7}
It appears that the white space is being properly removed from the infix expression, but splitting the resulting string is improper.
Could anyone give me insight as to why? Likewise, if you have any constructive criticism or suggestions, let me know!
If a match is at one end of the string, you will get an empty match next to it. Likewise, if there are two adjacent matches, the string will be split on both of them, so you end up with an empty string in between. Citing MSDN:
If multiple matches are adjacent to one another, an empty string is inserted into the array. For example, splitting a string on a single hyphen causes the returned array to include an empty string in the position where two adjacent hyphens are found [...].
and
If a match is found at the beginning or the end of the input string, an empty string is included at the beginning or the end of the returned array.
Just filter them out in a second step.
Also, please make your life easier and use verbatim strings:
infixExp = Regex.Replace(infixExp, @"\s+", string.Empty);
string[] substrings = Regex.Split(infixExp, @"(\(|\)|-|\+|\*|/)");
The second expression could be simplified even further:
@"([()+*/-])"
Please, ditch Regex. There are better tools to use. You can use String.Trim(), .TrimEnd(), and .TrimStart().
string inputString = " asdf ";
string output = inputString.Trim();
For whitespace within the string, use String.Replace.
string output2 = output.Replace(" ", "");
You will have to expand this to other whitespace characters.
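One way to cover every whitespace character without listing them one by one is to filter on char.IsWhiteSpace. This is only a sketch of that idea, not the answerer's own code; the sample input is made up for illustration.

```csharp
using System;
using System.Linq;

// Made-up input containing spaces, a tab, and a line break.
string input = " (2 +\t3)\r\n * 4 ";

// Keep only the characters that are not whitespace.
string compact = string.Concat(input.Where(c => !char.IsWhiteSpace(c)));

Console.WriteLine(compact); // (2+3)*4
```

char.IsWhiteSpace also covers Unicode whitespace such as the non-breaking space, which a chain of Replace calls would be likely to miss.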
var result = Regex.Split(input, "(\\d+|\\D)")
.Where(x=>x!="").ToArray();
m.buettner's answer is correct. Also consider that you can do this in one step. From MSDN:
If capturing parentheses are used in a Regex.Split expression, any
captured text is included in the resulting string array.
Therefore, if you include the whitespace in the split pattern but outside the capturing parentheses, you can split on it as well but not include it in the result array:
var substrings = Regex.Split("(2 + 3) * 7", @"([()+*/-])|\s+");
The result:
substrings = {"", ( , 2, "", +, "", 3, ), "", "", *, "", 7}
And your final result would be:
substrings.Where(s => s != String.Empty)
Why not just remove the white spaces and then split the string with normal string handling functions? Like this...
string x = "(2 + 3) * 4";
x = x.Replace(" ", "").Replace("\t",""); //etc...
char[] y = x.ToCharArray();
Why bother making this more complicated than it needs to be?
A non-regex solution would probably be String.Replace - you could simply replace " ", "\t", and other whitespace with the empty string "".
I found the solution I was looking for thanks to all of your replies.
// Ignore all whitespace within the expression.
infixExp = Regex.Replace(infixExp, @"\s+", String.Empty);
// Separate the expression based on the tokens (, ), +, -,
// *, /, and ignore any of the empty strings that are added
// due to adjacent matches or matches at either end.
string[] substrings = Regex.Split(infixExp, @"([()+*/-])");
substrings = substrings.Where(s => s != String.Empty).ToArray();
This separates the string into parts based on the regular mathematical operators (+, -, *, /) and parentheses, then removes any remaining empty strings from substrings.

Regular expression to break string C#

Here is my string:
1-1 This is my first string. 1-2 This is my second string. 1-3 This is my third string.
How can I break it up in C#, like this:
result[0] = This is my first string.
result[1] = This is my second string.
result[2] = This is my third string.
IEnumerable<string> lines = Regex.Split(text, "(?:^|[\r\n]+)[0-9-]+ ").Skip(1);
EDIT: If you want the result in an array you can do string[] result = lines.ToArray();
Regex regex = new Regex("^(?:[0-9]+-[0-9]+ )(.*?)$", RegexOptions.Multiline);
var str = "1-1 This is my first string.\n1-2 This is my second string.\n1-3 This is my third string.";
var matches = regex.Matches(str);
List<string> strings = matches.Cast<Match>().Select(p => p.Groups[1].Value).ToList();
foreach (var s in strings)
{
Console.WriteLine(s);
}
We use a multiline Regex, so that ^ and $ are the beginning and end of the line. We skip one or more numbers, a -, one or more numbers and a space (?:[0-9]+-[0-9]+ ). We lazily (*?) take everything (.) else until the end of the line (.*?)$, lazily so that the end of the line $ is more "important" than any character .
Then we put the matches in a List<string> using Linq.
Lines will end with a newline, a carriage return, or both. This splits the string into lines for all of those line endings.
using System.Text.RegularExpressions;
...
var lines = Regex.Split( input, "[\r\n]+" );
Then you can do what you want with each line.
var words = Regex.Split( lines[i], @"\s" );
