C# [regex] trim spaces before specific word - c#

I want to remove all spaces between numbers that come before the words "usd" and "eur".
I have a regex pattern like this:
@"\b(\d\s*)+\s(usd|eur)"
How can I exclude the space and usd|eur from the resulting match?
String example: "sdklfjsd 10 343 usd ds 232 300 eur"
Result should be: "sdklfjsd 10343 usd ds 232300 eur"
string line = "2 300 $ 12 Asdsfd 2 300 530 usd and 2 351 eur";
MatchCollection matches;
Regex defaultRegex = new Regex(@"\b(\d+\s*)+(usd|eur)");
matches = defaultRegex.Matches(line);
Console.WriteLine("Parsing '{0}'", line);
for (int ctr = 0; ctr < matches.Count; ctr++)
    Console.WriteLine("={0}){1}", ctr, matches[ctr].Value);

There may be a more elegant way, but it can be done easily with a MatchEvaluator:
string result = new Regex(@"\b(\d+\s*)+(?=\s(usd|eur))").
    Replace("sdklfjsd 10 343 usd ds 232 300 eur",
        m => string.Join("", m.Groups[1].Captures.Cast<Capture>().Select(c => c.Value.Trim())));
The Regex \b(\d+\s*)+(?=\s(usd|eur)) uses a look-ahead to only match numbers that are followed by \s(usd|eur) and a grouping to match each consecutive match to \d+\s* (I assume the \b boundary from your question so that with abc12 34 56 eur it would only match 34 56 is desired, remove it otherwise).
Then for each match it gets all of that group's captures, trims them all, and concatenates them together to produce the replacement text.
(Note that currency codes should generally be capitalised, so you may have another issue there.)
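The MatchEvaluator approach described above can be wrapped into a small helper. This is only a sketch: the class and method names are hypothetical, and RegexOptions.IgnoreCase is an added assumption (not part of the original pattern) so that capitalised codes such as "USD" are handled too.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class AmountSpaceTrimmer
{
    // Same pattern as in the answer above. IgnoreCase is an assumption added
    // here so that "USD"/"EUR" are matched as well as "usd"/"eur".
    private static readonly Regex Amount =
        new Regex(@"\b(\d+\s*)+(?=\s(usd|eur))", RegexOptions.IgnoreCase);

    public static string Collapse(string input)
    {
        // Group 1 is captured once per run of digits; trim each capture
        // and concatenate them to build the replacement text.
        return Amount.Replace(input, m =>
            string.Concat(m.Groups[1].Captures.Cast<Capture>().Select(c => c.Value.Trim())));
    }
}
```

Calling AmountSpaceTrimmer.Collapse("sdklfjsd 10 343 usd ds 232 300 eur") should give "sdklfjsd 10343 usd ds 232300 eur". Because of the \b, the "12" in "abc12 34 56 eur" is left alone.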

Try Regex: (\d+) *(\d+)(?= (?:usd|eur))

Assuming there are only two groups of digits, you can use
\b(\d+)\s*(\d+)(?=\s(usd|eur)) with a replacement string of $1$2
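A minimal sketch of the two-group replacement, assuming the input from the question. The class and method names are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class TwoGroupJoiner
{
    public static string Join(string input)
    {
        // $1$2 glues the two captured digit groups together and drops
        // whatever whitespace sat between them.
        return Regex.Replace(input, @"\b(\d+)\s*(\d+)(?=\s(usd|eur))", "$1$2");
    }
}
```

This handles "10 343 usd", but with three groups only the last two are joined: "2 300 530 usd" becomes "2 300530 usd". For an arbitrary number of groups one of the other answers is a better fit.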

You could also use a positive lookbehind and a positive lookahead to match all the spaces you want to remove:
(?<=\d)\s+(?=(?:\d+\s+)*\d+\s+(?:eur|usd)\b)
Explanation
(?<=\d) Positive lookbehind to assert what is on the left is a digit
\s+ Match 1+ whitespace characters
(?= Positive lookahead to assert what is on the right is
(?:\d+\s+)* Repeat 0+ times matching 1+ digits followed by 1+ whitespace characters
\d+\s+(?:eur|usd)\b match 1+ digits followed by 1+ whitespace characters and eur or usd
) Close positive lookahead
string line = "2 300 $ 12 Asdsfd 2 300 530 usd and 2 351 eur";
string result = Regex.Replace(line, @"(?<=\d)\s+(?=(?:\d+\s+)*\d+\s+(?:eur|usd)\b)", "");
Console.WriteLine(result); // 2 300 $ 12 Asdsfd 2300530 usd and 2351 eur

Related

C# Regex Replace sequence of numbers preceded with a space

I have this string:
Hello22, I'm 19 years old
I just want to replace the number with * if its preceded with a space, so it would look like this:
Hello22, I'm ** years old
I've been trying a bunch of regexes but no luck. Hope someone can help out with the correct regex. Thank you.
Regexes which I tried:
Regex.Replace(input, @"([\d-])", "*");
Returns all numbers replaced with *
Regex.Replace(input, @"(\x20[\d-])", "*");
Does not work as expected
You can try (?<= )[0-9]+ pattern where
(?<= ) - look behind for a space
[0-9]+ - one or more digits.
Code:
string source = "Hello22, I'm 19 years old";
string result = Regex.Replace(
source,
"(?<= )[0-9]+",
m => new string('*', m.Value.Length));
Also have a look at \b[0-9]+\b (here \b stands for a word boundary). This pattern
will substitute every 19 in "19, as I say, 19, I'm 19" (note that the first 19 doesn't have a space before it):
string source = "19, as I say, 19, I'm 19";
string result = Regex.Replace(
source,
@"\b[0-9]+\b",
m => new string('*', m.Value.Length));
In C# you could also make use of a pattern with a lookbehind and an infinite quantifier.
(?<= [0-9]*)[0-9]
The pattern matches:
(?<= Positive lookbehind, assert what is to the left of the current position is
[0-9]* Match a space followed by optional digits 0-9
) Close lookbehind
[0-9] Match a single digit 0-9
Example
string s = "Hello22, I'm 19 years old";
string result = Regex.Replace(s, "(?<= [0-9]*)[0-9]", "*");
Console.WriteLine(result);
Output
Hello22, I'm ** years old

find specific pattern of digits in a string

Consider the following strings:
"via caporale degli zuavi 278a , 78329"
and
"autostrada a1 km - 47"
I am looking to isolate a specific sequence that can be present (first example) or not (second example)
In particular, I am looking for a sequence of 1 to 4 digits that can be followed by a single letter, and the string must not contain the substring "km". So in my previous example "278a" is valid but the other digit sequences are not.
What i've done until now is the following:
Since i know that any string that contains "km" is not valid i applied this piece of code:
if(!stripped.ToLower().Contains("km"))
{
// apply Regex
}
else
// string not valid, move on
I know that this regex will give me all the sequences of digits: Regex.Matches(t, @"\d+"); but it is not enough. How can I proceed from here?
Edit: for further clarification, when a sequence of digit is followed by a letter, that letter must be the next char (so no whitespace or anything else)
Edit2: note that the sequence of digit can be followed by a letter or not (so 278a is as valid as 278)
You can assert that km is not present to the left or to the right, and match 1-4 digits 0-9 followed by an optional char a-zA-Z:
(?<!\bkm\b.*)\b[0-9]{1,4}[A-Za-z]?\b(?!.*\bkm)
(?<!\bkm\b.*) Assert not km to the left
\b[0-9]{1,4}[A-Za-z]?\b Match 1-4 digits 0-9 and optionally a single char A-Za-z
(?!.*\bkm) Assert not km to the right
string pattern = @"(?<!\bkm\b.*)\b[0-9]{1,4}[A-Za-z]?\b(?!.*\bkm)";
string input = @"via caporale degli zuavi 278a , 78329
via caporale degli zuavi 277 , 78329
via caporale degli zuavi 279a , 78329 km
km via caporale degli zuavi 280a , 78329
autostrada a1 km - 47";
foreach (Match m in Regex.Matches(input, pattern))
{
Console.WriteLine(m.Value);
}
Output
278a
277
If there is only 1 match expected, you might also rule out km in the whole string, and use a capture group as well with Regex.Match
^(?!.*\bkm\b).*\b([0-9]{1,4}[A-Za-z]?)\b
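A sketch of how this single-match pattern could be used with Regex.Match, reading the result from capture group 1. The helper name TryFind is hypothetical. Note that the pattern is case-sensitive, so "KM" would not be rejected unless RegexOptions.IgnoreCase is added.

```csharp
using System.Text.RegularExpressions;

public static class HouseNumberFinder
{
    // Returns true and sets 'number' when the string has no standalone "km"
    // and contains a 1-4 digit sequence, optionally followed by one letter.
    public static bool TryFind(string address, out string number)
    {
        Match m = Regex.Match(address, @"^(?!.*\bkm\b).*\b([0-9]{1,4}[A-Za-z]?)\b");
        number = m.Success ? m.Groups[1].Value : string.Empty;
        return m.Success;
    }
}
```

Because the leading .* is greedy, the last qualifying sequence in the string is the one captured; "78329" is skipped since no run of up to four of its digits ends on a word boundary.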
You can use
^(?!.*(?<!\p{L})km\b)(?:.*\D)?(\d{1,4})(?=\p{L}?\b)
Details:
^ - start of string
(?!.*(?<!\p{L})km\b) - no km without any letter preceding the word and no alphanumeric/underscore following it is allowed anywhere in the string
(?:.*\D)? - an optional sequence of any zero or more chars other than a newline char, as many as possible, and then a non-digit char
(\d{1,4}) - Group 1: one to four digits
(?=\p{L}?\b) - immediately on the right, there should be an optional letter not followed with any alphanumeric or connector punctuation (like _).
See a C# demo:
var l = new List<string> {"via caporale degli zuavi 278a , 78329","autostrada a1 km - 47"};
foreach (var t in l)
{
var rx = @"^(?!.*(?<!\p{L})km\b)(?:.*\D)?(\d{1,4})(?=\p{L}?\b)";
var match = Regex.Match(t, rx, RegexOptions.ECMAScript)?.Groups[1].Value;
if (!string.IsNullOrEmpty(match))
{
Console.WriteLine($"There is a match in '{t}': {match}");
}
else
{
Console.WriteLine($"There is no match in '{t}'.");
}
}
Output:
There is a match in 'via caporale degli zuavi 278a , 78329': 278
There is no match in 'autostrada a1 km - 47'.
The RegexOptions.ECMAScript option is used to make \d only match ASCII digits (it does not affect \p{L} though).

Regex - Get digits after a colon

I have a regex:
var topPayMatch = Regex.Match(result, @"(?<=Top Pay)(\D*)(\d+(?:\.\d+)?)", RegexOptions.IgnoreCase);
And I have to convert this to int which I did
topPayMatch = Convert.ToInt32(topPayMatchString.Groups[2].Value);
So now...
If the input is Top Pay: 1,000,000 then it currently grabs only the first digit group, which is 1. I want all of 1000000.
If Top Pay: 888,888 then I want all 888888.
What should I add to my regex?
You can use something as simple as @"(?<=Top Pay: )([0-9,]+)". Note that decimals will be ignored with this regex.
This will match all numbers with their commas after Top Pay:, which after you can parse it to an integer.
Example:
Regex rgx = new Regex(@"(?<=Top Pay: )([0-9,]+)");
string str = "Top Pay: 1,000,000";
Match match = rgx.Match(str);
if (match.Success)
{
string val = match.Value;
int num = int.Parse(val, System.Globalization.NumberStyles.AllowThousands);
Console.WriteLine(num);
}
Console.WriteLine("Ended");
Source:
Convert int from string with commas
If you use the lookbehind, you don't need the capture groups and you can move the \D* into the lookbehind.
To get the values, you can match 1+ digits followed by optional repetitions of , and 1+ digits.
Note that your example data contains commas and no dots, and that using ? as a quantifier means 0 or 1 time.
(?<=Top Pay\D*)\d+(?:,\d+)*
The pattern matches:
(?<=Top Pay\D*) Positive lookbehind, assert what is to the left is Top Pay and optional non digits
\d+ Match 1+ digits
(?:,\d+)* Optionally repeat a , and 1+ digits
Example:
string pattern = @"(?<=Top Pay\D*)\d+(?:,\d+)*";
string input = @"Top Pay: 1,000,000
Top Pay: 888,888";
RegexOptions options = RegexOptions.IgnoreCase;
foreach (Match m in Regex.Matches(input, pattern, options))
{
var topPayMatch = int.Parse(m.Value, System.Globalization.NumberStyles.AllowThousands);
Console.WriteLine(topPayMatch);
}
Output
1000000
888888

Regex to match positive and negative numbers and text between "" after a character

I need a regex for an input that contains positive and negative numbers and sometimes a string between " and ". I'm not sure if this can be done in only one pattern. Here's some test cases for the pattern:
*PATH "C:\Users\User\Desktop\Media\SoundBanks\Ambient\WAV_Data\AD_SMP_SFX_WIND0.wav"
*NODECOLOR 0 255 140
*FILEREF -7
*FREQUENCY 22050
The idea would be to use a pattern that returns:
C:\Users\User\Desktop\Media\SoundBanks\Ambient\WAV_Data\AD_SMP_SFX_WIND0.wav
0 255 140
-7
22050
The content always goes after the character *. I've split this into two patterns because I don't know how to do it all in one, but it doesn't work:
MatchCollection NumberMtaches = Regex.Matches(FileLine, @"(?<=[*])-?[0-9]+");
MatchCollection FilePathMatches = Regex.Matches(FileLine, @"/,([^,]*)(?=,)/g");
You may read the file into a string and run the following regex:
var matches = Regex.Matches(filecontents, @"(?m)^\*\w+[\s-[\r\n]]*""?(.*?)""?\r?$")
.Cast<Match>()
.Select(x => x.Groups[1].Value)
.ToList();
Details:
(?m) - RegexOptions.Multiline option on
^ - start of a line
\* - a * char
\w+ - one or more word chars
[\s-[\r\n]]* - zero or more whitespaces other than CR and LF
"? - an optional " char
(.*?) - Group 1: any zero or more chars other than an LF char, as few as possible
"? - an optional " char
\r? - an optional CR
$ - end of a line/string.
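The pattern broken down above can be tried on the question's sample lines with a sketch like the following. The class and method names are hypothetical, and the whole file content is assumed to be already loaded into one string.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DirectiveParser
{
    // Returns the value that follows each "*NAME" directive, one entry per line,
    // with surrounding double quotes (if any) stripped.
    public static List<string> Values(string fileContents)
    {
        return Regex.Matches(fileContents, @"(?m)^\*\w+[\s-[\r\n]]*""?(.*?)""?\r?$")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }
}
```

For the four sample lines, the list should contain the unquoted path, "0 255 140", "-7" and "22050", in that order. Lines that do not start with * are skipped.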

How do I write regex to validate EIN numbers?

I want to validate that a string follows this format (using regex):
valid: 123456789 //9 digits
valid: 12-1234567 // 2 digits + dash + 7 digits
Here's an example, how I would use it:
var r = new Regex(@"^[1-9]\d?-\d{7}$");
Console.WriteLine(r.IsMatch("1-2-3"));
I have the regex for the format with the dash, but I can't figure out how to include the non-dash format.
Regex regex = new Regex("^\\d{2}-?\\d{7}$");
This will accept the two formats you want: 2 digits, then an optional dash, then 7 digits.
^ (?: \d{9} | \d{2} - \d{7} ) $
Remove the spaces; they are there only for readability. The group is needed so that both anchors apply to each alternative.
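A small sketch that checks the optional-dash pattern against both formats from the question. EinValidator is a made-up name, and this only validates the shape of the string, not whether the number is an EIN that was actually issued.

```csharp
using System.Text.RegularExpressions;

public static class EinValidator
{
    // Two digits, an optional dash, then seven digits, anchored at both ends.
    private static readonly Regex Format = new Regex(@"^\d{2}-?\d{7}$");

    public static bool IsValid(string candidate) => Format.IsMatch(candidate);
}
```

EinValidator.IsValid accepts "123456789" and "12-1234567" and rejects "1-2-3".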
