Append arrays and lists - c#

For example, if the entered input is:
1 2 3 |4 5 6 | 7 8
we should manipulate it to
1 2 3|4 5 6|7 8
Another example:
7 | 4 5|1 0| 2 5 |3
we should manipulate it to
7|4 5|1 0|2 5|3
This is my idea because I want to exchange some of the subarrays (7; 4 5; 1 0; 2 5; 3).
I'm not sure whether this code works or whether it can serve as the basis for what I want to do, but I'm posting it so you can see my attempt.
static void Main(string[] args)
{
    List<string> arrays = Console.ReadLine()
        .Split(' ', StringSplitOptions.RemoveEmptyEntries)
        .ToList();
    foreach (var element in arrays)
    {
        Console.WriteLine("element: " + element);
    }
}

You need to split your input by "|" first and then by space. After this, you can reassemble your input with string.Join. Try this code:
var input = "1 2 3 |4 5 6 | 7 8";
var result = string.Join("|", input.Split('|')
.Select(part => string.Join(" ",
part.Trim().Split(new []{' '}, StringSplitOptions.RemoveEmptyEntries))));
// now result is "1 2 3|4 5 6|7 8"
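The split-and-rejoin idea can be wrapped in a small helper. This is only a sketch: the class name `PipeNormalizer` and the method name `Normalize` are made up for illustration and do not come from the question.

```csharp
using System;
using System.Linq;

public static class PipeNormalizer
{
    // Splits on '|', collapses runs of spaces inside each part,
    // and rejoins the cleaned parts with '|'.
    public static string Normalize(string input)
    {
        var parts = input.Split('|')
            .Select(part => string.Join(" ",
                part.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)));
        return string.Join("|", parts);
    }
}
```

For example, `PipeNormalizer.Normalize("7 | 4 5|1 0| 2 5 |3")` should return `"7|4 5|1 0|2 5|3"`.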

You could do this with a simple regular expression:
var result = Regex.Replace(input, @"\s?\|\s?", "|");
This will match any (optional) white space character, followed by a | character, followed by an (optional) white space character and replace it with a single | character.
Alternatively, if you need to potentially strip out multiple spaces around the |, replace the zero-or-one quantifiers (?) with zero-or-more quantifiers (*):
var result = Regex.Replace(input, @"\s*\|\s*", "|");
To also deal with multiple spaces between numbers (not just around | characters), I'd recommend something like this:
var result = Regex.Replace(input, @"\s*([\s|])\s*", "$1");
This will match any occurrence of zero or more white space characters, followed by either a white space character or a | character (captured in group 1), followed by zero or more white space characters and replace it with whatever was captured in group 1.
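To see how the three patterns differ, here is a small side-by-side sketch. The class and method names (`PipeRegexDemo`, `TrimOneSpace`, `TrimAllAroundPipe`, `CollapseEverywhere`) are hypothetical and exist only for this comparison.

```csharp
using System.Text.RegularExpressions;

public static class PipeRegexDemo
{
    // Removes at most one whitespace character on each side of '|'.
    public static string TrimOneSpace(string input) =>
        Regex.Replace(input, @"\s?\|\s?", "|");

    // Removes all whitespace on both sides of '|'.
    public static string TrimAllAroundPipe(string input) =>
        Regex.Replace(input, @"\s*\|\s*", "|");

    // Also collapses runs of whitespace between the numbers.
    public static string CollapseEverywhere(string input) =>
        Regex.Replace(input, @"\s*([\s|])\s*", "$1");
}
```

With the input `"1  2 3  |4 5 6 |  7 8"` (note the doubled spaces), the first variant leaves a stray space around the pipes, the second cleans the pipes but keeps the double space after `1`, and only the third yields `"1 2 3|4 5 6|7 8"`.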

Related

Regex split by same character within brackets

I have a long string, like so:
(A) name1, name2, name3, name3 (B) name4, name5, name7 (via name7) ..... (AA) name47, name47 (via name 46) (BB) name48, name49
Currently I split by " (", but that also splits at each "(via ...)" part:
string[] lines = routesRaw.Split(new[] { " (" }, StringSplitOptions.RemoveEmptyEntries);
How can I split the information within the first brackets only? There is no AB, AC, AD, etc. the characters are always the same within the brackets.
Thanks.
You may use a matching approach here, since the pattern you need contains a capturing group (required to match the same character zero or more times), and Regex.Split would output all captured substrings together with the non-matches.
I suggest
(?s)(.*?)(?:\(([A-Z])\2*\)|\z)
Grab all non-empty Group 1 values. See the regex demo.
Details
(?s) - a dotall, RegexOptions.Singleline option that makes . match newlines, too
(.*?) - Group 1: any 0 or more chars, but as few as possible
(?:\(([A-Z])\2*\)|\z) - a non-capturing group that matches:
\(([A-Z])\2*\) - (, then Group 2 capturing any uppercase ASCII letter, then any 0 or more repetitions of this captured letter and then )
| - or
\z - the very end of the string.
In C#, use
var results = Regex.Matches(text, @"(?s)(.*?)(?:\(([A-Z])\2*\)|\z)")
    .Cast<Match>()
    .Select(x => x.Groups[1].Value)
    .Where(z => !string.IsNullOrEmpty(z))
    .ToList();
See the C# demo online.
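As a rough illustration of how this behaves on a shortened version of the input, the sketch below wraps the same pattern in a helper. The names `RouteSplitter` and `Split` are hypothetical, and the extra `Trim()` call is an addition that is not part of the answer above.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RouteSplitter
{
    // Returns the text between markers such as (A), (B), (AA), (BB).
    // Lowercase parentheses like "(via name7)" are not treated as markers
    // because [A-Z] is case-sensitive here.
    public static List<string> Split(string text)
    {
        return Regex.Matches(text, @"(?s)(.*?)(?:\(([A-Z])\2*\)|\z)")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value.Trim())   // Trim is an added convenience
            .Where(s => s.Length > 0)
            .ToList();
    }
}
```

Calling `RouteSplitter.Split("(A) name1, name2 (B) name4 (via name7) (AA) name47 (via name 46) (BB) name48")` should give four items, with the `(via ...)` parts kept inside their items.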

Regex to search for single 0's and add commas

I have the following all number data:
4245 4 0 0242 4424.09 0 422404 5955 0
2234234.234 224 0
2423 234 0
I need to process each line individually. I need to remove all the single 0's and output as follows with commas:
4245,4, 0242,4424.09, 422404,5955
2234234.234,224
2423,234
I got the part of removing the single digits working:
var result = Regex.Replace(inData, @"\b\s0\b", string.Empty);
But cannot figure out how to add the commas in between each number. Any help would be appreciated. Thanks.
You can achieve what you want with one Regex.Replace operation, but with a custom match evaluator:
var input = "4245 4 0 242 4424.09 0 422404 5955 0";
var results = Regex.Replace(input, @"(?:\s+|^)0(\s+)|(\s+0)$|\s+", m =>
    m.Groups[1].Success ? ", " :
    m.Groups[2].Success ? "" : ",");
The point is to match those parts we need and capture into groups, so that they can be further analyzed and an appropriate action could be taken.
Pattern details:
(?:\s+|^)0(\s+) - match 0 that is either at the start or with whitespaces before it and that is followed with 1 or more whitespaces (the whitespaces after 0 are stored in Group 1)
| - or
(\s+0)$ - Group 2 capturing one or more whitespaces, then a 0 at the end ($) of the string
| - or
\s+ - (3rd option) 1 or more whitespaces in all other contexts.
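The three alternatives can be tried line by line with a small helper. This is a sketch only; `ZeroStripper` and `ToCsv` are hypothetical names, and it assumes each line is passed in separately, as the question requires.

```csharp
using System.Text.RegularExpressions;

public static class ZeroStripper
{
    // Applies the three-way pattern to a single line:
    //   group 1 matched -> a lone 0 in the middle (or at the start): emit ", "
    //   group 2 matched -> a lone 0 at the end: emit nothing
    //   otherwise       -> plain whitespace between numbers: emit ","
    public static string ToCsv(string line)
    {
        return Regex.Replace(line, @"(?:\s+|^)0(\s+)|(\s+0)$|\s+", m =>
            m.Groups[1].Success ? ", " :
            m.Groups[2].Success ? "" : ",");
    }
}
```

For the first sample line, `ZeroStripper.ToCsv("4245 4 0 0242 4424.09 0 422404 5955 0")` should produce `"4245,4, 0242,4424.09, 422404,5955"`. Numbers that merely start with 0, such as 0242, are left alone because the 0 must be followed by whitespace to match.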
For a more readable version, here is an alternative: strip the trailing 0 with string methods, replace each " 0 " with ", " using a plain String.Replace, and then use one regex to replace the remaining spaces between digits with commas.
var inp = "4245 4 0 0242 4424.09 0 422404 5955 0";
inp = inp.EndsWith(" 0") ? inp.Substring(0, inp.Length - 2) : inp;
var output = Regex.Replace(inp.Replace(" 0 ", ", "), @"(\d) (\d)", "$1,$2");
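I understand that you want to
Replace spaces with commas ("xy z" => "xy,z")
Replace single zeros with spaces ("xy 0 z" => "xy, z")
Then I would recommend two string replacements:
inData = inData.Replace(" ", ",");
inData = inData.Replace(",0,", ", ");
Matching ",0," rather than ",0" leaves numbers with a leading zero such as 0242 intact; a trailing ",0" still has to be trimmed separately.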
This will replace each run of whitespace with a comma:
var result = Regex.Replace(inData, @"\s+", ",");
\s+ matches one or more whitespace characters.
Then run your other regex to remove the single 0's.
You could just do a string.Replace(" ", ","), right? (if I am understanding your question correctly)
Or you could even do a string.Split(' ') into an array, then string.Join(",", ...). Although this is probably less efficient.

Catching a pattern, but ignoring it within quotes

So, what I need to do in c# regex is basically split a string whenever I find a certain pattern, but ignore that pattern if it is surrounded by double quotes in the string.
Example:
string text = "abc , def , a\" , \"d , oioi";
string pattern = "[ \t]*,[ \t]*";
string[] result = Regex.Split(text, pattern, RegexOptions.ECMAScript);
Wanted result after split (3 splits, 4 strings):
{"abc",
"def",
"a\" , \"d",
"oioi"}
Actual result (4 splits, 5 strings):
{"abc",
"def",
"a\"",
"\"d",
"oioi"}
Another example:
string text = "a%2% 6y % \"ad%t6%&\" %(7y) %";
string pattern = "%";
string[] result = Regex.Split(text, pattern, RegexOptions.ECMAScript);
Wanted result after split (5 splits, 6 strings):
{"a",
"2",
" 6y ",
" \"ad%t6%&\" ",
"(7y) ",
""}
Actual result (7 splits, 8 strings):
{"a",
"2",
" 6y ",
"\"ad",
"t6",
"&\" ",
"(7y) ",
""}
A 3rd example, to exemplify a tricky split where only the first case should be ignored:
string text = "!!\"!!\"!!\"";
string pattern = "!!";
string[] result = Regex.Split(text, pattern, RegexOptions.ECMAScript);
Wanted result after split (2 splits, 3 strings):
{"",
"\"!!\"",
"\""}
Actual result (3 splits, 4 strings):
{"",
"\"",
"\"",
"\"",}
So, how do I move from pattern to a new pattern that achieves the desired result?
Sidenote: If you're going to mark someone's question as duplicate (and I have nothing against that), at least point them to the right answer, not to some random post (yes, I'm looking at you, Mr. Avinash Raj)...
The rules are more or less like in a csv line except that:
the delimiter can be a single character, but it can be a string or a pattern too (in these last cases items must be trimmed if they start or end with the last or first possible tokens of the pattern delimiter),
an orphan quote is allowed for the last item.
First, when you want to separate items (to split) with slightly more advanced rules, the split method is no longer a good choice. The split method is only handy for simple situations, not for your case. (Even without orphan quotes, using split with ,(?=(?:[^"]*"[^"]*")*[^"]*$) is a very bad idea, since the lookahead rescans the rest of the string at every comma and the number of steps grows rapidly with the string size.)
The other approach consists of capturing the items. That is simpler and faster. (Bonus: it checks the format of the whole string at the same time.)
Here is a general way to do it:
^
(?>
(?:delimiter | start_of_the_string)
(
simple_part
(?>
(?: quotes | delim_first_letter_1 | delim_first_letter_2 | etc. )
simple_part
)*
)
)+
$
Example with \s*,\s* as delimiter:
^
# atomic group for one delimiter and one item
(?>
    (?: \s*,\s* | ^ )            # delimiter or start of the string
                                 # (optionally change "^" to "^ \s*" to trim the first item)
    # capture group 1 for the item
    (
        # simple part of the item (may be empty):
        [^\s,"]*                 # anything that is not the quote character or one of the
                                 # possible first characters of the delimiter
        # edge case followed by a simple part
        (?>
            (?:                      # edge cases
                " [^"]* (?:"|$)      # a quoted part or an orphan quote in the last item (*)
              |                      # OR
                (?> \s+ )            # start of the delimiter
                (?!,)                # but not the delimiter
            )
            [^\s,"]*             # simple part
        )*
    )
)+
$
demo (click on the table link)
The pattern is designed for the Regex.Match method since it describes the whole string. All items are available in group 1 since the .NET regex flavor is able to store repeated capture groups.
This example can be easily adapted to all cases.
(*) If you want to allow escaped quotes inside quoted parts, you can apply the same simple_part (?: edge_case simple_part)* structure once more in place of " [^"]* (?:"|$), i.e.: "[^\\"]* (?: \\. [^\\"]*)* (?:"|$)
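A minimal C# sketch of the capture approach with \s*,\s* as the delimiter might look like this. The class name `QuotedSplitter` and the choice to return an empty list when the whole string does not match are assumptions made for illustration, not part of the answer above. Note that inside a verbatim string each double quote is written as "".

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedSplitter
{
    // Same pattern as above, without comments. IgnorePatternWhitespace
    // lets the pattern be laid out over several lines.
    private static readonly Regex ItemPattern = new Regex(@"
        ^
        (?>
            (?: \s*,\s* | ^ )
            (
                [^\s,""]*
                (?>
                    (?: "" [^""]* (?:""|$) | (?> \s+ ) (?!,) )
                    [^\s,""]*
                )*
            )
        )+
        $", RegexOptions.IgnorePatternWhitespace);

    // Returns every capture of group 1; .NET keeps all captures of a
    // repeated group in Group.Captures.
    public static List<string> Split(string text)
    {
        Match m = ItemPattern.Match(text);
        if (!m.Success)
            return new List<string>();   // assumption: treat a malformed line as "no items"
        return m.Groups[1].Captures
            .Cast<Capture>()
            .Select(c => c.Value)
            .ToList();
    }
}
```

For the first example in the question, `QuotedSplitter.Split("abc , def , a\" , \"d , oioi")` should return the four items `abc`, `def`, `a" , "d` and `oioi`.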
I think this is a two step process and it has been overthought trying to make it a one step regex.
Steps
Simply remove any quotes from a string.
Split on the target character(s).
Example of Process
I will split on the , for step 2.
var data = string.Format("abc , def , a{0}, {0}d , oioi", "\"");
// \x22 is the hex escape for a double quote ("); it is easier to read than an escaped quote in C#.
var stage1 = Regex.Replace(data, @"\x22", string.Empty);
// abc , def , a", "d , oioi
// becomes
// abc , def , a, d , oioi
var items = Regex.Matches(stage1, @"([^\s,]+)[\s,]*")
    .OfType<Match>()
    .Select(mt => mt.Groups[1].Value)
    .ToList();

Replace using Regular Expression - fixed digit location

In a 16-digit number, I would like to replace the 5th through 10th digits.
How can that be achieved with a regular expression (C#)?
The way to do it is to capture the inner and outer portions separately, like this:
// Split into 2 groups of 5 digits and 1 of 6
string regex = "(\\d{5})(\\d{5})(\\d{6})";
// Insert ABCDE between
// group 1 and group 3
string replaceRegex = "${1}ABCDE${3}";
string testString = "1234567890999999";
string result = Regex.Replace(testString, regex, replaceRegex);
// result = "12345ABCDE999999"
Why use a regular expression? If by "number of 16 digits", you mean a 16 character long string representation of a number, then you'd probably be better off just using substring.
string input = "0000567890000000";
var output = input.Substring(0, 4) + "222222" + input.Substring(10, 6);
Or did you mean you want to swap the 5th and 10th digits? Your question isn't very clear.
Use the regular expression (?<=^\d{4})\d{6}(?=\d{6}$) to achieve it without capture groups.
It looks for 6 consecutive digits (5th to 10th inclusive) that are preceded by the first 4 digits and followed by the last 6 digits of the string.
Regex.Replace("1234567890123456", @"(?<=^\d{4})\d{6}(?=\d{6}$)", "replacement");
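A short sketch of the lookaround version in use follows. The helper name `DigitMasker.MaskMiddle` and the choice of six asterisks as the replacement are illustrative assumptions.

```csharp
using System.Text.RegularExpressions;

public static class DigitMasker
{
    // Replaces digits 5 through 10 of a 16-digit string with asterisks.
    // The lookbehind pins the match right after the first 4 digits and the
    // lookahead requires exactly 6 more digits up to the end, so inputs of
    // any other shape are returned unchanged.
    public static string MaskMiddle(string digits)
    {
        return Regex.Replace(digits, @"(?<=^\d{4})\d{6}(?=\d{6}$)", "******");
    }
}
```

`DigitMasker.MaskMiddle("1234567890123456")` should return `"1234******123456"`, while a shorter input such as `"12345"` comes back untouched because the pattern cannot match.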
Got it...
By creating 3 capturing groups:
([\d]{5})([\d]{5})([\d]{6})
keep capturing groups 1 and 3 and replace group 2 with stars (or whatever):
$1*****$3
C# code below:
string resultString = null;
try {
    resultString = Regex.Replace(subjectString, @"([\d]{5})([\d]{5})([\d]{6})", "$1*****$3", RegexOptions.Singleline);
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}

Regex to find all matching statements

I have a string:
put 1 in pot put 2 in pot put 3 in pot...
up to
put n in pot
How can I use C# regex to obtain all put statements like:
"put 1 in pot"
"put 2 in pot"
"put 3 in pot"
...
"put n in pot"
for n statements?
Thanks
I probably shouldn't answer this as your question shows no effort at all, but I think a possible regex would be:
string regex = @"put (?<number>\d+) in pot";
Then you can match using:
var matches = Regex.Matches("Put 1 in pot put 2 in pot", @"put (?<number>\d+) in pot", RegexOptions.IgnoreCase);
foreach (Match match in matches)
{
    Console.WriteLine(match.Value);
}
To find the actual number, you can use
int matchNumber = Convert.ToInt32(match.Groups["number"].Value);
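Putting the named group to work, a sketch that collects every number could look like this. `PotParser` and `Numbers` are hypothetical names, and `int.Parse` is used here in place of `Convert.ToInt32`; for digits-only input the two behave the same way.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PotParser
{
    // Extracts the value of the "number" group from every
    // "put <n> in pot" statement, ignoring case.
    public static List<int> Numbers(string text)
    {
        return Regex.Matches(text, @"put (?<number>\d+) in pot", RegexOptions.IgnoreCase)
            .Cast<Match>()
            .Select(m => int.Parse(m.Groups["number"].Value))
            .ToList();
    }
}
```

`PotParser.Numbers("Put 1 in pot put 2 in pot put 3 in pot")` should give the list 1, 2, 3.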
You can also do this
var reg = @"put.*?(?=put|$)";
List<string> puts = Regex.Matches(inp, reg, RegexOptions.Singleline)
    .Cast<Match>()
    .Select(x => x.Value)
    .ToList();
put.*?(?=put|$)
------ --------
   |       |
   |       |-> checks that `.*?` (0 to many characters) is followed by `put` or the end of the string
   |-> matches put followed by 0 to many characters
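One thing to be aware of with the lookahead version is that each match keeps the whitespace in front of the next `put`. The sketch below trims it; the names `PutSplitter` and `Statements` are hypothetical and the `Trim()` call is an addition to the answer's code.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PutSplitter
{
    // Each match starts at "put" and lazily extends until the next "put"
    // or the end of the string. The pattern is case-sensitive.
    public static List<string> Statements(string text)
    {
        return Regex.Matches(text, @"put.*?(?=put|$)", RegexOptions.Singleline)
            .Cast<Match>()
            .Select(m => m.Value.Trim())   // drop the space before the next "put"
            .ToList();
    }
}
```

`PutSplitter.Statements("put 1 in pot put 2 in pot put 3 in pot")` should return the three statements separately.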
