Using regex to split string by Non Digit and Digit - c#

I've seen a few answers that are similar, but none seem to go far enough. I need to split the string wherever the letters change to numbers and back. The trick is that the pattern is variable, meaning there can be any number of letter or number groupings.
For example:
AB1000 => AB 1000
ABC1500 => ABC 1500
DE160V1 => DE 160 V 1
FGG217H5IJ1 => FGG 217 H 5 IJ 1
Etc.

If you want to split the string, one way would be lookarounds:
string[] results = Regex.Split("FGG217H5IJ1", @"(?<=\d)(?=\D)|(?<=\D)(?=\d)");
Console.WriteLine(String.Join(" ", results)); //=> "FGG 217 H 5 IJ 1"

You can use a regex like this:
[A-Z]+|\d+
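The pattern above describes the pieces rather than the boundaries between them, so it fits Regex.Matches better than Regex.Split. A minimal sketch, assuming the letters are uppercase ASCII only; the names TokenSplitter and SplitRuns are made up for illustration:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenSplitter
{
    // Returns each run of uppercase letters or digits as a separate token.
    // Characters that match neither alternative are skipped silently.
    public static string[] SplitRuns(string input)
    {
        return Regex.Matches(input, @"[A-Z]+|\d+")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

Calling string.Join(" ", TokenSplitter.SplitRuns("DE160V1")) should then give "DE 160 V 1".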

Related

How to remove decimals from file (round them)?

I have to open a file, find all decimal numbers, round them to whole numbers, and replace them in the text. The resulting text should be printed to the console.
I tried to do it, but all I managed was to remove the decimal part. Please tell me how to round the numbers and replace them in the resulting text. Here is my code:
Console.WriteLine("Enter path to first file:");
String path1 = Console.ReadLine();
string text = File.ReadAllText(path1);
string pattern = @"(\d+)\.\d+";
if(File.Exists(path1) ){
foreach(string phrase in Regex.Split(text, pattern)){
Console.Write(phrase);
}
Console.Write("Press any key to continue . . . ");
Console.ReadKey(true);
}
You can use the @"\d+([\.\,]\d+)" pattern to capture each number together with its fractional part. Then use Regex.Replace with a MatchEvaluator that parses the matched value as a double and drops the decimals with ToString("F0") (see the Fixed-point format specifier).
The example below handles numbers with either a comma or a dot as the fraction separator: the comma is replaced with a dot, and the value is parsed with the double.TryParse overload that accepts NumberStyles.Any and CultureInfo.InvariantCulture (from the System.Globalization namespace). It also works with negative numbers (e.g. -0.98765 in the example), because the minus sign sits outside the match and is left in place:
var input = "I have 11.23$ and can spend 20,01 of it. "+
"Melons cost 01.25$ per -0.98765 kg, "+
"but my mom ordered me to buy 1234.56789 kg. "+
"Please do something with that decimals.";
var result = Regex.Replace(input, @"\d+([\.\,]\d+)", (match) =>
double.TryParse(match.Value.Replace(",", "."), NumberStyles.Any, CultureInfo.InvariantCulture, out double value)
? value.ToString("F0")
: match.Value);
// Result:
// I have 11$ and can spend 20 of it.
// Melons cost 1$ per -1 kg,
// but my mom ordered me to buy 1235 kg.
// Please do something with that decimals.
It also works on "Aaaa 50.05 bbbb 82.52 cccc 6.8888", giving "Aaaa 50 bbbb 83 cccc 7".
You can apply Math.Round to every match by passing a match evaluator to Regex.Replace as the replacement:
var text = "Aaaa 50.05 bbbb 82.52 cccc 6.8888";
var pattern = @"\d+\.\d+";
var result = Regex.Replace(text, pattern, x => $"{Math.Round(Double.Parse(x.Value))}");
Console.WriteLine(result); // => Aaaa 50 bbbb 83 cccc 7
The \d+\.\d+ regex is simple: it matches one or more digits, a dot, and one or more digits. Double.Parse(x.Value) converts the matched value to a Double, and then Math.Round rounds the number.
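Two details can change the output of this approach: Double.Parse uses the current culture unless told otherwise, and Math.Round with a single argument rounds midpoints to the nearest even number (so 2.5 becomes 2). A minimal sketch that pins both down, assuming dot-separated input and that midpoints should round away from zero; DecimalRounder and RoundAll are illustrative names:

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class DecimalRounder
{
    // Replaces every <digits>.<digits> number in the text with its rounded value.
    public static string RoundAll(string text)
    {
        return Regex.Replace(text, @"\d+\.\d+", m =>
        {
            // Parse with the invariant culture so "." is always the decimal separator.
            double value = double.Parse(m.Value, CultureInfo.InvariantCulture);
            // Assumption: midpoints such as 2.5 should round up to 3, not to the even 2.
            double rounded = Math.Round(value, MidpointRounding.AwayFromZero);
            return rounded.ToString("F0", CultureInfo.InvariantCulture);
        });
    }
}
```

If the default round-half-to-even behaviour is what you want, drop the MidpointRounding argument.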

Append arrays and lists

For example, if the entered input is:
1 2 3 |4 5 6 | 7 8
we should manipulate it to
1 2 3|4 5 6|7 8
Another example:
7 | 4 5|1 0| 2 5 |3
we should manipulate it to
7|4 5|1 0|2 5|3
I want this because I need to exchange some of the subarrays (7; 4 5; 1 0; 2 5; 3).
I'm not sure this code works or whether it can serve as a base for what I want to do, but I'm posting it so you can see my attempt.
static void Main(string[] args)
{
List<string> arrays = Console.ReadLine()
.Split(' ', StringSplitOptions.RemoveEmptyEntries)
.ToList();
foreach (var element in arrays)
{
Console.WriteLine("element: " + element);
}
}
You need to split your input by "|" first and then by space. After this, you can reassemble your input with string.Join. Try this code:
var input = "1 2 3 |4 5 6 | 7 8";
var result = string.Join("|", input.Split('|')
.Select(part => string.Join(" ",
part.Trim().Split(new []{' '}, StringSplitOptions.RemoveEmptyEntries))));
// now result is "1 2 3|4 5 6|7 8"
You could do this with a simple regular expression:
var result = Regex.Replace(input, @"\s?\|\s?", "|");
This will match any (optional) white space character, followed by a | character, followed by an (optional) white space character and replace it with a single | character.
Alternatively, if you need to potentially strip out multiple spaces around the |, replace the zero-or-one quantifiers (?) with zero-or-more quantifiers (*):
var result = Regex.Replace(input, @"\s*\|\s*", "|");
To also deal with multiple spaces between numbers (not just around | characters), I'd recommend something like this:
var result = Regex.Replace(input, @"\s*([\s|])\s*", "$1");
This will match any occurrence of zero or more white space characters, followed by either a white space character or a | character (captured in group 1), followed by zero or more white space characters and replace it with whatever was captured in group 1.
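Putting the last pattern to work, here is a minimal sketch that normalizes the input and then splits it into subarrays. The names SubarrayParser, Normalize and Parse are illustrative. The leading Trim is an added assumption so that whitespace at the ends of the line is dropped too, and Parse assumes the tokens are separated by plain spaces:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class SubarrayParser
{
    // Reduces each whitespace run to a single whitespace character
    // and removes whitespace around '|'.
    public static string Normalize(string input)
    {
        return Regex.Replace(input.Trim(), @"\s*([\s|])\s*", "$1");
    }

    // Splits the normalized text on '|' and then on single spaces.
    public static string[][] Parse(string input)
    {
        return Normalize(input)
            .Split('|')
            .Select(part => part.Split(' '))
            .ToArray();
    }
}
```

With the subarrays in a string[][], exchanging two of them is a plain element swap before joining everything back together.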

How to validate this using Regex C#

I have an input which can be any one of the following three:
1-8, in other words 1, 2, 3, 4, 5, 6, 7, 8
A-Z, in other words A, B, C, D, etc.
01-98, in other words 01, 02, 03, 04, etc.
I came up with this regex, but it's not working and I'm not sure why:
@"[A-Z0-9][1-8]-
I am thinking of checking for corner cases like just 0 and just 9 after the regex check, because the regex isn't validating them.
Not sure I understand, but how about:
^(?:[A-Z]|[1-8]|0[1-9]|[1-8][0-9]|9[0-8])$
Explanation:
(?:...) is a group without capture.
| introduces an alternative
[A-Z] means one letter
[1-8] one digit between 1 and 8
0[1-9] a 0 followed by a digit between 1 and 9
[1-8][0-9] a digit between 1 and 8 followed by a digit between 0 and 9
9[0-8] 9 followed by a digit between 0 and 8
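As a quick check of the pattern explained above, a minimal sketch follows. InputValidator and IsValid are hypothetical names. The sketch ends the pattern with \z instead of $ as an added precaution, since $ in .NET also matches just before a trailing newline:

```csharp
using System.Text.RegularExpressions;

public static class InputValidator
{
    // One letter A-Z, one digit 1-8, or a two-digit value from 01 to 98.
    private static readonly Regex Pattern =
        new Regex(@"^(?:[A-Z]|[1-8]|0[1-9]|[1-8][0-9]|9[0-8])\z");

    public static bool IsValid(string input)
    {
        return Pattern.IsMatch(input);
    }
}
```

Under this pattern "A", "8", "01" and "98" are accepted, while "0", "9", "00" and "99" are rejected.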
Or maybe this, depending on your real needs:
^(?:[A-Z]|[0-9]?[1-8])$
I think you may use this pattern
@"^([1-8A-Z]|0[1-9]|[1-9]{2})$"
What about ^([1-8]|([0-9][0-9])|[A-Z])$ ?
That will give a match for
A
8 (but not 9)
09
[1-8]{0,1}[A-Z]{0,1}\d{1,2}
matches all of the following
8A8 8B9 9 0 00
You can use following pattern:
^[1-8][A-Z](?:0[1-9]|[1-8][0-9]|9[0-8])$
^[1-8] - input should start with number 1-8
[A-Z] - then should be single letter A-Z
(0[1-9]|[1-8][0-9]|9[0-8])$ - and it should end with two digits, which are 01-09, 10-89 or 90-98
Test:
string pattern = @"^[1-8][A-Z](0[1-9]|[1-8][0-9]|9[0-8])$";
Regex regex = new Regex(pattern);
string[] valid = { "1A01", "8Z98" };
bool allMatch = valid.All(regex.IsMatch);
string[] invalid = { "0A01", "11A01", "1A1", "1A99", "1A00", "101", "1AA01" };
bool allNotMatch = !invalid.Any(regex.IsMatch);

Regex "or" Expression

This is probably a really basic question, but I can't find any answers. I need to match a string by either two or more spaces OR an equals sign.
When I split this string: 9 x 13 = (8.9 x 13.4) (89 x 134)
with ( +) I get:
part 0: 9 x 13 = (8.9 x 13.4)
part 1: (89 x 134)
When I split it with (=) I get:
part 0: 9 x 13
part 1: (8.9 x 13.4) (89 x 134)
How can I split by both? Something like: (=)OR( +)
Edit:
This does not work: (=)|( +). I was expecting:
part 0: 9 x 13
part 1: (8.9 x 13.4)
part 2: (89 x 134)
Your regex should have worked, except it would leave the spaces that were before and after the =. That's assuming you really did use two spaces in the ( +) part (which got normalized to one space by SO's formatting). This one yields the exact result you said you want:
@" {2,}|\s*=\s*"
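A minimal sketch of that split, assuming the two bracketed groups in the input are separated by two spaces (the page formatting collapses them, so the literal in the usage note is reconstructed); PartSplitter and SplitParts are illustrative names:

```csharp
using System.Text.RegularExpressions;

public static class PartSplitter
{
    // Splits on runs of two or more spaces, or on '=' together with
    // any whitespace around it.
    public static string[] SplitParts(string input)
    {
        return Regex.Split(input, @" {2,}|\s*=\s*");
    }
}
```

PartSplitter.SplitParts("9 x 13 = (8.9 x 13.4)  (89 x 134)") should return the three parts "9 x 13", "(8.9 x 13.4)" and "(89 x 134)".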
Simply,
Pattern = "\s*=\s*|(?!\))\s+?(?=\()"
(=)|( +)
Is that good for you?
Explanation and example:
http://msdn.microsoft.com/en-us/library/ze12yx1d.aspx , scroll down to the 3rd remark...
You can use a regex like this: [= ]+
var regex = new Regex("[= ]+");
var parts = regex.Split("this is=a test");
// parts = [ "this", "is", "a", "test" ]
If you want to keep the separators enclose the regex in parens: ([= ]+)

How to extract decimal number from string in C#

string sentence = "X10 cats, Y20 dogs, 40 fish and 1 programmer.";
string[] digits = Regex.Split(sentence, @"\D+");
For this code I get these values in the digits array
10,20,40,1
string sentence = "X10.4 cats, Y20.5 dogs, 40 fish and 1 programmer.";
string[] digits = Regex.Split(sentence, @"\D+");
For this code I get these values in the digits array
10,4,20,5,40,1
But I would like to get like
10.4,20.5,40,1
as decimal numbers. How can I achieve this?
Small improvement to @Michael's solution:
// NOTES: about the LINQ:
// .Where() == filters the IEnumerable (which the array is)
// (c=>...) is the lambda for dealing with each element of the array
// where c is an array element.
// .Trim() == trims all blank spaces at the start and end of the string
var doubleArray = Regex.Split(sentence, @"[^0-9\.]+")
.Where(c => c != "." && c.Trim() != "");
Returns:
10.4
20.5
40
1
The original solution was returning
[empty line here]
10.4
20.5
40
1
.
The regex for extracting decimal/float numbers can differ depending on whether and which thousand separators are used, which symbol denotes the decimal separator, whether you also want to match an exponent, whether to match a positive or negative sign, whether to match numbers with the leading 0 omitted, and whether to extract a number that ends with a decimal separator.
A generic regex to match the most common decimal number types is provided in Matching Floating Point Numbers with a Regular Expression:
[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?
I only changed the capturing group to a non-capturing one (added ?: after the opening parenthesis).
If you need to make it even more generic, if the decimal separator can be either a dot or a comma, replace \. with a character class (or a bracket expression) [.,]:
[-+]?[0-9]*[.,]?[0-9]+(?:[eE][-+]?[0-9]+)?
           ^^^^
Note that the expressions above match both integers and floats. To match only float/decimal numbers, make the fractional part obligatory by removing the ? after \.:
[-+]?[0-9]*\.[0-9]+(?:[eE][-+]?[0-9]+)?
           ^
Now, 34 is not matched.
If you do not want to match float numbers without leading zeros (like .5) make the first digit matching pattern obligatory (by adding + quantifier, to match 1 or more occurrences of digits):
[-+]?[0-9]+\.[0-9]+(?:[eE][-+]?[0-9]+)?
          ^
Now it matches far fewer samples.
Now, what if you do not want to match <digits>.<digits> inside <digits>.<digits>.<digits>.<digits>? How to match them as whole words? Use lookarounds:
[-+]?(?<!\d\.)\b[0-9]+\.[0-9]+(?:[eE][-+]?[0-9]+)?\b(?!\.\d)
Now, what about those floats that have thousand separators, like 12 123 456.23 or 34,345,767.678? You may add (?:[,\s][0-9]+)* after the first [0-9]+ to match zero or more sequences of a comma or whitespace followed with 1+ digits:
[-+]?(?<![0-9]\.)\b[0-9]+(?:[,\s][0-9]+)*\.[0-9]+(?:[eE][-+]?[0-9]+)?\b(?!\.[0-9])
Swap the comma and \. if you need to use a comma as the decimal separator and a period as the thousand separator.
Now, how to use these patterns in C#?
var results = Regex.Matches(input, @"<PATTERN_HERE>")
.Cast<Match>()
.Select(m => m.Value)
.ToList();
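To make that concrete, here is a minimal sketch that plugs the first generic pattern into the snippet above and converts the matches to decimal. NumberExtractor and ExtractDecimals are illustrative names. Parsing with NumberStyles.Float and CultureInfo.InvariantCulture is an assumption that the input uses a dot as the decimal separator:

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class NumberExtractor
{
    // Generic pattern from the text: optional sign, optional integer part,
    // optional dot, digits, optional exponent.
    private const string Pattern = @"[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?";

    public static List<decimal> ExtractDecimals(string input)
    {
        return Regex.Matches(input, Pattern)
                    .Cast<Match>()
                    .Select(m => decimal.Parse(m.Value, NumberStyles.Float, CultureInfo.InvariantCulture))
                    .ToList();
    }
}
```

For the sentence in the question, "X10.4 cats, Y20.5 dogs, 40 fish and 1 programmer.", this should yield 10.4, 20.5, 40 and 1.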
Try:
Regex.Split(sentence, @"[^0-9\.]+")
You'll need to allow for decimal places in your regular expression. Try the following:
\d+(\.\d+)?
This will match the numbers rather than everything other than the numbers, but it should be simple to iterate through the matches to build your array.
Something to keep in mind is whether you should also be looking for negative signs, commas, etc.
Check the syntax lexers for most programming languages for a regex for decimals.
Match that regex to the string, finding all matches.
If you have Linq:
stringArray.Select(s=>decimal.Parse(s));
A foreach would also work. You may need to check that each string is actually a number, since decimal.Parse throws a FormatException on invalid input (decimal.TryParse does not throw).
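A minimal sketch of such a check, using decimal.TryParse in a foreach so that non-numeric strings are skipped rather than raising an exception. SafeParser and ParseAll are illustrative names, and the invariant culture is an assumption about the input format:

```csharp
using System.Collections.Generic;
using System.Globalization;

public static class SafeParser
{
    // Converts every string that parses as a decimal and ignores the rest.
    public static List<decimal> ParseAll(IEnumerable<string> items)
    {
        var result = new List<decimal>();
        foreach (string s in items)
        {
            if (decimal.TryParse(s, NumberStyles.Number, CultureInfo.InvariantCulture, out decimal value))
            {
                result.Add(value);
            }
        }
        return result;
    }
}
```

This also takes care of the empty string and the lone "." that splitting on [^0-9\.]+ can leave behind.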
Credit for the following goes to @code4life. All I added is a for loop that parses the integers/decimals before returning.
public string[] ExtractNumbersFromString(string input)
{
input = input.Replace(",", string.Empty);
var numbers = Regex.Split(input, @"[^0-9\.]+").Where(c => !String.IsNullOrEmpty(c) && c != ".").ToArray();
for (int i = 0; i < numbers.Length; i++)
numbers[i] = decimal.Parse(numbers[i]).ToString();
return numbers;
}
