I have some input that is an integer stored as a string that may have 1 or 2 digits. I would like to know if it is possible to come up with a regex pattern and substitution string that allows me to add a 0 at the front of any input that has only one digit.
i.e., I'd like to find a pattern and subst such that:
Regex.Replace("1",pattern,subst); // returns "01"
Regex.Replace("31",pattern,subst); // returns "31"
Edit: the question is specific to C# regex. Please do not answer with alternative methods.
Using regex you can use word boundaries around a single digit:
string num = "5";
Regex.Replace(num, @"\b\d\b", "0$&");
//=> 05
num = "31";
Regex.Replace(num, @"\b\d\b", "0$&");
//=> 31
Regex \b\d\b will match a single digit with word boundaries on either side to ensure we're only matching a single digit.
If the digit can appear in the middle of a word, you can use a lookaround-based regex like this:
Console.WriteLine(Regex.Replace(num, @"(?<!\d)\d(?!\d)", "0$&"));
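To see where the two patterns differ, here is a minimal sketch. The class and method names are illustrative and not part of the answer above, and the "a5" input is an assumed example of a digit glued to a letter.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PadDemo
{
    // Word boundaries: pads only a digit that stands alone as a "word".
    public static string PadWithBoundaries(string s) =>
        Regex.Replace(s, @"\b\d\b", "0$&");

    // Lookarounds: pads any digit that has no digit directly before or after it.
    public static string PadWithLookarounds(string s) =>
        Regex.Replace(s, @"(?<!\d)\d(?!\d)", "0$&");

    public static void Main()
    {
        Console.WriteLine(PadWithBoundaries("5"));   // 05
        Console.WriteLine(PadWithBoundaries("31"));  // 31
        Console.WriteLine(PadWithBoundaries("a5"));  // a5 (no word boundary between 'a' and '5')
        Console.WriteLine(PadWithLookarounds("a5")); // a05
    }
}
```

For input that is always a bare one- or two-digit number, both variants behave the same.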
I have a lot of movie files and I want to get the production year from their file names, as below:
Input: Kingdom.of.Heaven.2005.720p.Dubbed.Film2media
Output: 2005
This code just splits all the numbers:
string[] result = Regex.Split(str, @"(\d+:)");
You must be more specific about which numbers you want. E.g.
Regex to find the year (not for splitting):
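\b(19\d\d|20\d\d)\b
19\d\d selects numbers like 1948, 1989.
20\d\d selects numbers like 2001, 2022.
\b marks a word boundary. It rejects four digits that are part of a longer number or word, such as 12005 or 2048p.
| means "or". Both alternatives sit inside one group so that the \b on each side applies to both of them; written as \b(19\d\d)|(20\d\d)\b, the first \b would only guard the 19xx branch and the second \b only the 20xx branch.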
But it is difficult to make a foolproof algorithm without knowing exactly how the filename is constructed. E.g. the movie "2001: A Space Odyssey" was released in 1968. So, 2001 is not a correct result here.
To omit the movie name, you could search backwards like this:
string productionYear =
Regex.Match(str, @"\b(19\d\d|20\d\d)\b", RegexOptions.RightToLeft).Value;
If instead of 720p we had a resolution of 2048p for instance, this would not be a problem, because the 2nd \b requires the number to be at the word end.
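The effect of searching backwards can be sketched as follows. The file names are hypothetical, the helper names are made up for this illustration, and the pattern used here wraps both alternatives in a single group so that each \b applies to both branches.

```csharp
using System;
using System.Text.RegularExpressions;

public static class YearDemo
{
    // One non-capturing group around the alternation, guarded by \b on both sides.
    private const string YearPattern = @"\b(?:19\d\d|20\d\d)\b";

    // Leftmost year-like number in the name.
    public static string FirstYear(string name) =>
        Regex.Match(name, YearPattern).Value;

    // Rightmost year-like number in the name.
    public static string LastYear(string name) =>
        Regex.Match(name, YearPattern, RegexOptions.RightToLeft).Value;

    public static void Main()
    {
        string name = "2001.A.Space.Odyssey.1968.720p.BluRay"; // hypothetical file name
        Console.WriteLine(FirstYear(name)); // 2001
        Console.WriteLine(LastYear(name));  // 1968
    }
}
```

Both helpers return an empty string when nothing matches, because Match.Value is empty for a failed match.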
If the production year was always the 4th item from the right, then a better way to get this year would be:
string[] parts = str.Split('.');
string productionYear = parts[^4]; // C# 8.0+, .NET Core
// or
string productionYear = parts[parts.Length - 4]; // C# < 8 or .NET Framework
Note that the regex expression you specify in Regex.Split designates the separators, not the returned values.
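A small sketch of that point, with made-up input and helper names: the pattern passed to Regex.Split describes what sits between the returned pieces. In .NET, text captured by a group in the separator pattern is additionally included in the result, which is why the question's code returned the numbers as well.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Digits act purely as separators and are dropped.
    public static string[] SplitOnDigits(string s) => Regex.Split(s, @"\d+");

    // Same separators, but the capturing group puts them into the result too.
    public static string[] SplitKeepingDigits(string s) => Regex.Split(s, @"(\d+)");

    public static void Main()
    {
        Console.WriteLine(string.Join("|", SplitOnDigits("a1b22c")));      // a|b|c
        Console.WriteLine(string.Join("|", SplitKeepingDigits("a1b22c"))); // a|1|b|22|c
    }
}
```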
I would not try to split the string; it is better to match the field you want. Also, consider matching \d{4} rather than \d+ if you want to be sure to get years and not other fields, like the resolution in your example.
You can try this:
string str = "Kingdom.of.Heaven.2005.720p.Dubbed.Film2media";
string year = Regex.Match(str, @"(?<=\.)(\d{4})(?=\.)").Groups[1].Value;
Console.WriteLine("Year: " + year);
Output: Year: 2005
Demo: https://dotnetfiddle.net/KM2PNk
\d{4}: This matches any sequence of four digits.
(?<=\.): This is a positive lookbehind assertion, which means that the preceding pattern must be present, but is not included in the match. In this case, the preceding pattern is a dot, so the regular expression will only match a sequence of four digits if it is preceded by a dot.
(?=\.): This is a positive lookahead assertion, which means that the following pattern must be present, but is not included in the match. In this case, the following pattern is a dot, so the regular expression will only match a sequence of four digits if it is followed by a dot.
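As a hedged sketch, the same pattern can be wrapped in a helper that reports whether a year was found at all. The TryGetYear name and the second file name are assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotYearDemo
{
    // Returns true and sets 'year' when exactly four digits sit between two dots.
    public static bool TryGetYear(string fileName, out string year)
    {
        Match m = Regex.Match(fileName, @"(?<=\.)(\d{4})(?=\.)");
        year = m.Success ? m.Groups[1].Value : "";
        return m.Success;
    }

    public static void Main()
    {
        if (TryGetYear("Kingdom.of.Heaven.2005.720p.Dubbed.Film2media", out string year))
            Console.WriteLine("Year: " + year); // Year: 2005

        // "720p" is not four digits, so nothing matches here.
        Console.WriteLine(TryGetYear("Some.Movie.720p.Dubbed", out _)); // False
    }
}
```

Because both dots are required, a year at the very start or end of the name would not be found by this pattern.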
Imagine a string that contains special characters like $§%%,., numbers and letters.
I want to receive the letter and number chunks of an arbitrary string as an array of strings.
A good solution seems to be the use of regex, but I don't know how to express [numbers and letters]
// example
"abc" = {"abc"};
"ab .c" = {"ab", "c"}
"ab123,cd2, ,,%&$§56" = {"ab123", "cd2", "56"}
// try
string input = "jdahs32455$§&%$§df233§$fd";
string[] output = input.Split(Regex("makejunksfromstring"));
To extract chunks of 1 or more letters/digits you may use
[A-Za-z0-9]+ # ASCII only letters/digits
[\p{L}0-9]+ # Any Unicode letters and ASCII only digits
[\p{L}\p{N}]+ # Any Unicode letters/digits
C# usage:
string[] output = Regex.Matches(input, @"[\p{L}\p{N}]+").Cast<Match>().Select(x => x.Value).ToArray();
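Put together as a runnable sketch (the Chunks helper name is illustrative; the section sign is written as \u00A7 only to keep the source file ASCII), this reproduces the examples from the question:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ChunkDemo
{
    // Collects every run of Unicode letters and digits as a separate string.
    public static string[] Chunks(string input) =>
        Regex.Matches(input, @"[\p{L}\p{N}]+")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        Console.WriteLine(string.Join(" | ", Chunks("abc")));                      // abc
        Console.WriteLine(string.Join(" | ", Chunks("ab .c")));                    // ab | c
        Console.WriteLine(string.Join(" | ", Chunks("ab123,cd2, ,,%&$\u00A756"))); // ab123 | cd2 | 56
    }
}
```

Matching the wanted chunks avoids the empty entries that splitting on the unwanted characters can produce at the start or end of the string.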
Yes, regex is indeed a good solution for this.
And in fact, to just match all standard words in the input sequence, this is all you need:
(\w+)
Let me quickly explain
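\w matches any word character. With RegexOptions.ECMAScript it is equivalent to [a-zA-Z0-9_] - matching a through z or A through Z or 0-9 or _; by default in .NET it also matches Unicode letters and decimal digits. Either way it includes the underscore, so you might want to go with [a-zA-Z0-9] instead to avoid that.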
Wrapping an expression in () means that you want to capture that part as a group.
The + means that you want sequences of 1 or more of the preceding characters.
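The difference between \w+ and an explicit ASCII class can be sketched like this. The helper names and sample strings are assumptions for illustration; note that without RegexOptions.ECMAScript, .NET's \w also accepts non-ASCII letters.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordDemo
{
    // Returns all matches of 'pattern' in 'input' as strings.
    public static string[] Find(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        // \w keeps the underscore inside the word.
        Console.WriteLine(string.Join("|", Find("ab_c d-e", @"\w+")));          // ab_c|d|e
        // The explicit class treats the underscore as a separator.
        Console.WriteLine(string.Join("|", Find("ab_c d-e", @"[a-zA-Z0-9]+"))); // ab|c|d|e
        // \u00EF is a non-ASCII letter: \w accepts it, the ASCII class does not.
        Console.WriteLine(Find("na\u00EFve", @"\w+").Length);          // 1
        Console.WriteLine(Find("na\u00EFve", @"[a-zA-Z0-9]+").Length); // 2
    }
}
```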
Refer to a regular expression cheat sheet to see all the possibilities, such as
https://cheatography.com/davechild/cheat-sheets/regular-expressions/
Or any that you find online.
Also there are tools available to quickly test out your regular expressions, such as
https://regex101.com/ (quite well visualised matching)
or http://regexstorm.net/tester specifically for .NET
We have a security issue where a specific field in a database has some sensitive information in it. I need a way to detect numbers that are between 2 and 8 digits long and replace the digits with a "filler" of the same length.
For instance:
Jim8888Dandy
Mike9999999999Thompson * Note: this is 10 in length and we don't want to replace the digits
123Area Code
Tim Johnson5555555
In these instances anytime we find a number that is between 2 and 8 (inclusive) then I want to replace/fill/substitute that value with the number 0 and keep the length of the original digits
End Result
Jim0000Dandy
Mike9999999999Thompson
000Area Code
Tim Johnson0000000
Is there an easy way to accomplish this using RegEx?
You need to provide a static evaluator method that would do the replacing. It replaces digits in the match with zeroes:
public static string Evaluate(Match m)
{
return Regex.Replace(m.Value, "[0-9]", "0");
}
And then use it with this code:
string input = "9999999099999Thompson534543";
MatchEvaluator evaluator = new MatchEvaluator(Program.Evaluate);
string replaced = Regex.Replace(input, "(?:^|[^0-9])[0-9]{2,8}(?:$|[^0-9])", evaluator);
The regex is:
(?:^|[^0-9]) - should be at the start or preceded by a non-digit
[0-9]{2,8} - the run of 2 to 8 digits to replace
(?:$|[^0-9]) - should be at the end or followed by a non-digit
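One caveat with the pattern above: the non-digit on each side is consumed as part of the match, so two short runs separated by a single character (for example "12a34") cannot both match. A possible variant, offered here as a sketch rather than as the answer's method, uses lookarounds so that only the digits are matched. The MaskShortRuns name and the "12a34" input are assumptions for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MaskDemo
{
    // Replaces every run of 2 to 8 digits with zeroes of the same length.
    // The lookarounds check the neighbours without consuming them.
    public static string MaskShortRuns(string input) =>
        Regex.Replace(input, @"(?<![0-9])[0-9]{2,8}(?![0-9])",
                      m => new string('0', m.Length));

    public static void Main()
    {
        Console.WriteLine(MaskShortRuns("Jim8888Dandy"));           // Jim0000Dandy
        Console.WriteLine(MaskShortRuns("Mike9999999999Thompson")); // Mike9999999999Thompson
        Console.WriteLine(MaskShortRuns("123Area Code"));           // 000Area Code
        Console.WriteLine(MaskShortRuns("Tim Johnson5555555"));     // Tim Johnson0000000
        Console.WriteLine(MaskShortRuns("12a34"));                  // 00a00
    }
}
```

Since the match contains only digits, the evaluator can build the filler from the match length instead of running a second regex.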
Just for the clever regex department. This is not an efficient regex.
(?<=(?>(?'front'\d){0,7}))\d(?=(?'back'(?'-front'\d)){0,7}(?!\d))((?'-front')|(?'-back'))
Replace each match with 0.
/(?<=(?>(?'front'\d){0,7})) # Measure how many digits we're behind.
\d # This digit is matched
(?=
(?'back' # Measure how many digits we're in front of.
(?'-front'\d)){0,7}
# For every digit here, subtract one group from 'front',
# As to assert we'll never go over the < 8 digit requirement.
(?!\d) # no more digits
)
(
(?'-front') # At least one capturing group left for 'front' or 'back'
|(?'-back') # for > 2 digits requirement.
)/x
Greetings beloved comrades.
I cannot figure out how to accomplish the following via a regex.
I need to take this format number 201101234 and transform it to 11-0123401, where digits 3 and 4 become the digits to the left of the dash, and the remaining five digits are inserted to the right of the dash, followed by a hardcoded 01.
I've tried http://gskinner.com/RegExr, but the syntax just defeats me.
This answer, Equivalent of Substring as a RegularExpression, sounds promising, but I can't get it to parse correctly.
I can create a SQL function to accomplish this, but I'd rather not hammer my server in order to reformat some strings.
Thanks in advance.
You can try this:
var input = "201101234";
var output = Regex.Replace(input, @"^\d{2}(\d{2})(\d{5})$", "${1}-${2}01");
Console.WriteLine(output); // 11-0123401
This will match:
two digits, followed by
two digits captured as group 1, followed by
five digits captured as group 2
And return a string which replaces that matched text with
group 1, followed by
a literal hyphen, followed by
group 2, followed by
a literal 01.
The start and end anchors ( ^ / $ ) ensure that if the input string does not exactly match this pattern, it will simply return the original string.
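The anchoring behaviour can be checked with a short sketch. The Reformat helper name and the non-matching inputs are made up for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReformatDemo
{
    // Rewrites a 9-digit number as "<digits 3-4>-<digits 5-9>01".
    // Input that does not match the whole pattern is returned unchanged.
    public static string Reformat(string input) =>
        Regex.Replace(input, @"^\d{2}(\d{2})(\d{5})$", "${1}-${2}01");

    public static void Main()
    {
        Console.WriteLine(Reformat("201101234"));  // 11-0123401
        Console.WriteLine(Reformat("2011012345")); // 2011012345 (ten digits, no match)
        Console.WriteLine(Reformat("2011-01234")); // 2011-01234 (not all digits, no match)
    }
}
```

The braces in ${2} matter here: they separate the group number from the literal 01 that follows it.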
If you can use custom C# scripts, you may want to use Substring instead:
string newStr = string.Format("{0}-{1}01", old.Substring(2,2), old.Substring(4));
I don't think you really need a regex here. Substring would be better. But still if you want regex only, you can use this:
string newString = Regex.Replace(input, @"^\d{2}(\d{2})(\d+)$", "$1-${2}01");
Explanation:
^\d{2} // Match first 2 digits. Will be ignored
(\d{2}) // Match next 2 digits. Capture it in group 1
(\d+)$ // Match rest of the digits. Capture it in group 2
Now, the required digits, are in group 1 and 2, which you use in the replacement string.
Do you even SQL? Pull some levers and stuff.
When the textbox changes I want to add a space between numbers and letters.
For example
34 YT 567 *Allowed*
22 KL 2345 *Allowed*
22KL 2345 *Not Allowed*
22KL2345 *Not Allowed*
22 KL2345 *Not Allowed*
This will fix an incorrect value by inserting spaces where necessary:
var correctedValue = Regex.Replace(
incorrectValue,
"(?<=[0-9])(?=[A-Za-z])|(?<=[A-Za-z])(?=[0-9])",
" ");
You can use the same pattern to detect an incorrect value using Regex.IsMatch if you want to warn the user rather than fix it automatically.
Edit:
Regex.IsMatch(MyTextBox.Text,
"(?<=[0-9])(?=[A-Za-z])|(?<=[A-Za-z])(?=[0-9])|[^a-zA-Z0-9 ]")
will return true if the user inputs a number next to a letter, or inputs any non-alphanumeric (and non-space) character.
If you want to remove non-alphanumeric characters and insert spaces you'll need to do it in two steps; first Regex.Replace with pattern [^a-zA-Z0-9 ], then the Regex.Replace call above.
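The two-step clean-up described above might look like the following sketch. The Normalize and NeedsFix names are hypothetical, and "22-KL#2345" is an assumed example of input with stray characters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpacingDemo
{
    // Zero-width positions where a digit touches a letter or a letter touches a digit.
    private const string BoundaryPattern =
        "(?<=[0-9])(?=[A-Za-z])|(?<=[A-Za-z])(?=[0-9])";

    // Step 1: drop everything except ASCII letters, digits and spaces.
    // Step 2: insert a space at every digit/letter boundary.
    public static string Normalize(string raw)
    {
        string cleaned = Regex.Replace(raw, "[^a-zA-Z0-9 ]", "");
        return Regex.Replace(cleaned, BoundaryPattern, " ");
    }

    // True when a digit and a letter are directly adjacent.
    public static bool NeedsFix(string value) =>
        Regex.IsMatch(value, BoundaryPattern);

    public static void Main()
    {
        Console.WriteLine(Normalize("22KL2345"));   // 22 KL 2345
        Console.WriteLine(Normalize("22-KL#2345")); // 22 KL 2345
        Console.WriteLine(Normalize("34 YT 567"));  // 34 YT 567
        Console.WriteLine(NeedsFix("22 KL2345"));   // True
        Console.WriteLine(NeedsFix("34 YT 567"));   // False
    }
}
```

Removing the stray characters first matters: otherwise a character such as '-' would hide the boundary between "22" and "KL" from the second step.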
You can easily find bad input using RegEx.
Regex rgx = new Regex("[0-9]+[a-zA-Z]+|[a-zA-Z]+[0-9]+");
if (rgx.IsMatch(MyTextBox.Text))
{
//bad input
}
else
{
//input was good.
}
The regular expression is matching one or more numbers followed directly by one or more letters or the other way around (letters then numbers).