Greetings beloved comrades.
I cannot figure out how to accomplish the following via a regex.
I need to take this format number 201101234 and transform it to 11-0123401, where digits 3 and 4 become the digits to the left of the dash, and the remaining five digits are inserted to the right of the dash, followed by a hardcoded 01.
I've tried http://gskinner.com/RegExr, but the syntax just defeats me.
This answer, Equivalent of Substring as a RegularExpression, sounds promising, but I can't get it to parse correctly.
I can create a SQL function to accomplish this, but I'd rather not hammer my server in order to reformat some strings.
Thanks in advance.
You can try this:
var input = "201101234";
var output = Regex.Replace(input, @"^\d{2}(\d{2})(\d{5})$", "${1}-${2}01");
Console.WriteLine(output); // 11-0123401
This will match:
two digits, followed by
two digits captured as group 1, followed by
five digits captured as group 2
And return a string which replaces that matched text with
group 1, followed by
a literal hyphen, followed by
group 2, followed by
a literal 01.
The start and end anchors ( ^ / $ ) ensure that if the input string does not exactly match this pattern, it will simply return the original string.
If you can use custom C# scripts, you may want to use Substring instead:
string newStr = string.Format("{0}-{1}01", old.Substring(2,2), old.Substring(4));
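The Substring one-liner throws an ArgumentOutOfRangeException on inputs shorter than four characters and does not check that the input is numeric. A minimal sketch of a guarded version follows. The helper name ReformatNumber is hypothetical, and it assumes a non-null, nine-digit ASCII input, returning anything else unchanged (the same fallback the anchored regex gives). It is written with top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

// Hypothetical helper; the name and the "return unchanged" fallback are assumptions.
static string ReformatNumber(string old)
{
    // Guard: exactly nine ASCII digits, otherwise leave the input alone.
    bool valid = old.Length == 9 && old.All(c => c >= '0' && c <= '9');
    if (!valid) return old;

    // Digits 3-4, a hyphen, the remaining five digits, then the hardcoded "01".
    return string.Format("{0}-{1}01", old.Substring(2, 2), old.Substring(4));
}

Console.WriteLine(ReformatNumber("201101234")); // 11-0123401
Console.WriteLine(ReformatNumber("2011"));      // 2011 (unchanged)
```

For a handful of strings either approach is cheap; the guard mainly protects against malformed rows.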
I don't think you really need a regex here. Substring would be better. But still if you want regex only, you can use this:
string newString = Regex.Replace(input, @"^\d{2}(\d{2})(\d+)$", "$1-${2}01");
Explanation:
^\d{2} // Match first 2 digits. Will be ignored
(\d{2}) // Match next 2 digits. Capture it in group 1
(\d+)$ // Match rest of the digits. Capture it in group 2
Now the required digits are in groups 1 and 2, which you use in the replacement string.
Do you even SQL? Pull some levers and stuff.
I have a lot of movie files and I want to get their production year from their file names, as below:
Input: Kingdom.of.Heaven.2005.720p.Dubbed.Film2media
Output: 2005
This code just splits all the numbers:
string[] result = Regex.Split(str, @"(\d+:)");
You must be more specific about which numbers you want. E.g.
Regex to find the year (not for splitting):
\b(19\d\d|20\d\d)\b
19\d\d selects numbers like 1948, 1989.
20\d\d selects numbers like 2001, 2022.
\b marks a word boundary. It excludes matches that are part of a longer run of digits or letters, such as numbers with 5 or more digits.
| means or
But it is difficult to make a foolproof algorithm without knowing exactly how the filename is constructed. E.g. the movie "2001: A Space Odyssey" was released in 1968. So, 2001 is not a correct result here.
To omit the movie name, you could search backwards like this:
string productionYear =
Regex.Match(str, @"\b(19\d\d|20\d\d)\b", RegexOptions.RightToLeft).Value;
If instead of 720p we had a resolution of 2048p for instance, this would not be a problem, because the 2nd \b requires the number to be at the word end.
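As a rough sketch of the right-to-left search, the snippet below wraps it in a hypothetical helper named FindYear. It uses the variant \b(?:19|20)\d\d\b so that both word boundaries apply to either century, and it returns an empty string when nothing matches. The second file name is made up for illustration. It is written with top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; returns "" when no plausible year is found.
static string FindYear(string fileName)
{
    // RightToLeft makes the engine report the right-most match first,
    // so a year-like number in the title is skipped when a later one exists.
    Match m = Regex.Match(fileName, @"\b(?:19|20)\d\d\b", RegexOptions.RightToLeft);
    return m.Success ? m.Value : "";
}

Console.WriteLine(FindYear("Kingdom.of.Heaven.2005.720p.Dubbed.Film2media")); // 2005
Console.WriteLine(FindYear("2001.A.Space.Odyssey.1968.2048p.Film2media"));    // 1968
```

The 2048 in 2048p is rejected because the letter p follows it, so there is no word boundary after the digits.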
If the production year was always the 4th item from the right, then a better way to get this year would be:
string[] parts = str.Split('.');
string productionYear = parts[^4]; // C# 8.0+, .NET Core
// or
string productionYear = parts[parts.Length - 4]; // C# < 8 or .NET Framework
Note that the regex expression you specify in Regex.Split designates the separators, not the returned values.
I would not try to split the string; I would match a field instead. Also, consider matching \d{4} rather than \d+ if you want to be sure to get years and not other fields, such as the resolution in your example.
You can try this:
string str = "Kingdom.of.Heaven.2005.720p.Dubbed.Film2media";
string year = Regex.Match(str, @"(?<=\.)(\d{4})(?=\.)").Groups[1].Value;
Console.WriteLine("Year: " + year);
Output: Year: 2005
Demo: https://dotnetfiddle.net/KM2PNk
\d{4}: This matches any sequence of four digits.
(?<=\.): This is a positive lookbehind assertion, which means that the preceding pattern must be present, but is not included in the match. In this case, the preceding pattern is a dot, so the regular expression will only match a sequence of four digits if it is preceded by a dot.
(?=\.): This is a positive lookahead assertion, which means that the following pattern must be present, but is not included in the match. In this case, the following pattern is a dot, so the regular expression will only match a sequence of four digits if it is followed by a dot.
I'm having a hard time with grouping parts of a regex. I want to validate a few things in a string that follows this format: I-XXXXXX.XX.XX.XX
Validate that the first set of 6 X's (I-xxxxxx.XX.XX.XX) does not contain letters and that its length is no more than 6.
Validate that the third set of X's (I-XXXXXX.XX.xx.XX) does not contain letters and is only 1 or 2.
Now, I have already validation on the last set of XX's to make sure the numbers are 1-8 using
string pattern1 = @"^.+\.(0?[1-8])$";
Match match = Regex.Match(TxtWBS.Text, pattern1);
if (!match.Success)
{
    errMessage += "WBS invalid";
    errMessage += Environment.NewLine;
}
I just can't figure out how to target specific parts of the string. Any help would be greatly appreciated, and thank you in advance!
You're having some trouble adding new validation to this string because it's very generic. Let's take a look at what you're doing:
^.+\.(0?[1-8])$
This finds the following:
^ the start of the string
.+ everything it can, other than a newline, basically jumping the engine's cursor to the end of your line
\. the last period in the string, because of the greedy quantifier in the .+ that comes before it
0? a zero, if it can
[1-8] a number between 1 and 8
()$ stores the two previous things in a group, and if the end of the string doesn't come after this, it may even backtrack and try the same thing from the second-to-last period instead, which we know isn't a great strategy.
This ends up matching a lot of weird stuff, for example the string "The number 0.1".
Let's try patterning something more specific, if we can:
^I-(\d{6})\.(\d{2})\.(\d{1,2})\.([1-8]{2})$
This will match:
^I- an I and a hyphen at the start of the string
(\d{6}) six digits, which it stores in a capture group
\. a period. By now, if there was any other number of digits than six, the match fails instead of trying to backtrack all over the place.
(\d{2})\. Same thing, but two digits instead of six.
(\d{1,2})\. Same thing, the comma here meaning it can match between one and two digits.
([1-8]{2}) Two digits that are each between 1 and 8.
$ The end of the string.
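To target the individual parts, the capture groups of the pattern above can be read from Match.Groups. The following is only a sketch: the helper name SplitWbs and the choice to return an empty array on failure are assumptions, and the pattern encodes this answer's reading of the rules (for example, a last field of two digits from 1 to 8). It is written with top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the four captured fields, or an empty array if the input is invalid.
static string[] SplitWbs(string wbs)
{
    Match m = Regex.Match(wbs, @"^I-(\d{6})\.(\d{2})\.(\d{1,2})\.([1-8]{2})$");
    if (!m.Success) return new string[0];

    // Groups[0] is the whole match; groups 1 to 4 are the parenthesised parts, left to right.
    return new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value, m.Groups[4].Value };
}

Console.WriteLine(string.Join(" | ", SplitWbs("I-587954.12.3.56"))); // 587954 | 12 | 3 | 56
Console.WriteLine(SplitWbs("I-58795.12.3.56").Length);               // 0 (only five digits in the first field)
```

Once the fields are separate strings, extra rules such as a value range on the third field can be checked with ordinary code rather than more regex.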
I hope I understood what exactly you're trying to match here. Let me know if this isn't what you had in mind.
This regex:
^.-[0-9]{6}(\.[1-8]{1,2}){3}$
will validate the following:
The first character can be any character, but is of length 1
It is followed by a dash
The dash is followed by exactly 6 numbers 0 - 9. (If this could be less than 6 characters - for example, between 3 and 6 characters - just replace {6} with {3,6}).
This is followed by 3 groups of characters. Each of these groups is preceded by a period, is of length 1 or 2, and can contain any digit 1 - 8.
An example of a valid string is:
I-587954.12.34.56
This is also valid:
I-587954.1.3.5
But this isn't:
I-587954.12.80.356
because the second-to-last group contains a 0, and because the last group is of length 3.
Please let me know if I have misunderstood any of the rules.
^I-([0-9]{1,6})\.(.{1,2})\.(0[1-2])\.(.{1,2})$
groups delimited by . (\.) :
([0-9]{1,6}) - 1-6 digits
(.{1,2}) - 1 or 2 characters of any kind
(0[1-2]) - 01 or 02
(.{1,2}) - 1 or 2 characters of any kind
You can easily write and test a regex against your input data online; just google "regex online".
What I have:
1. 25686-47362-04822-08149-48999-28161-15124-63556
2. 25686-47362-04822-08149-48999-28161-15124-6355654534
3. 54354325686-47362-04822-08149-48999-28161-15124-63556
4. 25686-47362-04822-08149-48999-28161-15124-6355654534fds
5. fdsfds54354325686-47362-04822-08149-48999-28161-15124-63556
6. 25686-47362-04822-08149-48999-28161-15124-63556-63556
What I expect to get:
1. 25686-47362-04822-08149-48999-28161-15124-63556
The nearest I got was ([0-9]{5,5}){8}
I am trying to reject lines 2, 3, 4, 5 and 6.
Try this
string source = @"25686-47362-04822-08149-48999-28161-15124-63556";
bool result = Regex.IsMatch(source, "^[0-9]{5}(-[0-9]{5}){7}$");
Explanation:
^ anchor (beginning of the string)
[0-9]{5} 5 digits group
(-[0-9]{5}){7} 7 more groups of 5 digits
$ anchor (ending of the string)
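A quick way to check this pattern is to run it over all six sample lines from the question (without their list numbers). The sketch below does that; the helper name IsValidKey is hypothetical. It is written with top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the anchored pattern explained above.
static bool IsValidKey(string s) => Regex.IsMatch(s, "^[0-9]{5}(-[0-9]{5}){7}$");

string[] samples =
{
    "25686-47362-04822-08149-48999-28161-15124-63556",          // 1: valid
    "25686-47362-04822-08149-48999-28161-15124-6355654534",     // 2: last block too long
    "54354325686-47362-04822-08149-48999-28161-15124-63556",    // 3: first block too long
    "25686-47362-04822-08149-48999-28161-15124-6355654534fds",  // 4: trailing letters
    "fdsfds54354325686-47362-04822-08149-48999-28161-15124-63556", // 5: leading letters
    "25686-47362-04822-08149-48999-28161-15124-63556-63556"     // 6: nine blocks
};

foreach (string s in samples)
    Console.WriteLine(IsValidKey(s) + "  " + s); // True for the first line only
```

The anchors do the rejecting here: without ^ and $, every sample would match because each contains a valid key as a substring or prefix of digits.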
I am not sure there is a way to ask for it to "repeat" the grouping, but I would write it like this:
/^([0-9]{5}\-[0-9]{5}\-[0-9]{5}\-[0-9]{5}\-[0-9]{5}\-[0-9]{5}\-[0-9]{5}\-[0-9]{5})/
You can use this:
^\d+\.\s(\d{5}-?){8}$
It matches a whole line that meets your criteria: one or more digits, a dot, a whitespace character, then 8 blocks of 5 digits, each optionally followed by a hyphen.
You can qualify that line with:
/^((?:\D|^)\d{5}){8}$/m
Or
/^((?:-|^)\d{5}){8}$/m
To be more specific with hyphen delimiters.
The input string "134.45sdfsf" passed to the following statement
System.Text.RegularExpressions.Regex.Match(input, pattern).Success;
returns true for following patterns.
pattern = "[0-9]+"
pattern = "\\d+"
Q1) I am like, what the hell! I am specifying only digits, not special characters or letters. So what is wrong with my pattern, given that I expect the above statement to return false?
Q2) Once I get the right pattern to match just the digits, how do I extract all the numbers in a string?
Lets say for now I just want to get the integers in a string in the format "int.int^int" (for example, "11111.222^3333", In this case, I want extract the strings "11111", "222" and "3333").
Any idea?
Thanks
You are specifying that the string contains at least one digit anywhere, not that all of its characters are digits. You are looking for the expression ^\d+$. The ^ and $ denote the start and end of the string, respectively. You can read up more on that here.
Use Regex.Split to split by any non-digit strings. For example:
string input = "123&$456";
var isAllDigit = Regex.IsMatch(input, @"^\d+$");
var numbers = Regex.Split(input, @"[^\d]+");
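One caveat with Regex.Split is that it returns empty strings when the input starts or ends with a separator. As an alternative sketch, Regex.Matches can collect the digit runs directly. The helper name ExtractNumbers is hypothetical. It is written with top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every run of digits, in order of appearance.
static string[] ExtractNumbers(string input) =>
    Regex.Matches(input, @"\d+")
         .Cast<Match>()          // MatchCollection is non-generic on older frameworks
         .Select(m => m.Value)
         .ToArray();

Console.WriteLine(string.Join(", ", ExtractNumbers("11111.222^3333"))); // 11111, 222, 3333
Console.WriteLine(string.Join(", ", ExtractNumbers("134.45sdfsf")));    // 134, 45
```

Here the unanchored \d+ is exactly what is wanted: each match is one number, and the non-digit text between them is simply skipped.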
Success is true because the pattern was found somewhere in the input.
If you want the whole string to be checked, use:
^[0-9]+$
Q1) Both patterns are correct.
Q2) Assuming you are looking for a number pattern "5 digits-dot-3 digits-^-4 digits", here is what you're looking for:
var regex = new Regex(@"(?<first>[0-9]{5})\.(?<second>[0-9]{3})\^(?<third>[0-9]{4})");
var match = regex.Match("11111.222^3333");
Debug.Print(match.Groups["first"].ToString());
Debug.Print(match.Groups["second"].ToString());
Debug.Print(match.Groups["third"].ToString());
I prefer named capture groups; they give a clearer way to access the values than numbered groups.
I have the following string:
Fat mass loss was 2121,323.222 greater for GPLC (2–2.4kg vs. 0.5kg)
I want to capture:
2121,323.222
2–2.4
0.5
i.e. I want the above three results from the string.
Can anyone help me with this regex?
I noticed that the hyphen in 2–2.4kg is not really a hyphen; it is the Unicode character U+2013 (EN DASH).
So, here is another regex in C#:
@"[0-9]+([,.\u2013-][0-9]+)*"
Test
MatchCollection matches = Regex.Matches("Fat mass loss was 2121,323.222 greater for GPLC (2–2.4kg vs. 0.5kg)", @"[0-9]+([,.\u2013-][0-9]+)*");
foreach (Match m in matches) {
Console.WriteLine(m.Groups[0]);
}
Here are the results. My console does not support printing the Unicode character U+2013, so it shows "?", but it is properly matched.
2121,323.222
2?2.4
0.5
Okay I didn't notice the C# tag until now. I will leave the answer but I know that's not what you expected, see if you can do something with it. Perhaps the title should have mentioned the programming language?
Sure:
Fat mass loss was (.*) greater for GPLC \((.*) vs. (.*)kg\)
Find your substrings in \1, \2 and \3.
If for Emacs, swap all parentheses and escaped parentheses.
How about something like this:
^.*((?:\d+,)*\d+(?:\.\d+)?).*(\d+(?:\.\d+)?(?:-\d+(?:\.\d+))?).*(\d+(?:\.\d+)).*$
A little more general, I think. I'm a little concerned about .* being greedy.
Fat mass loss was 2121,323.222 greater
for GPLC (2–2.4kg vs. 0.5kg)
a generalized extractor:
/\D+?([\d\,\.\-]+)/g
explanation:
/ # start pattern
\D+? # 1 or more non-digits, as few as possible (lazy)
( # capture group 1
[\d\,\.\-]+ # character class, 1 or more of digits, comma, period, hyphen
) # end capture group 1
/g # trailing regex g modifier (make regex continue after last match)
Sorry, I don't know C# well enough for a full writeup, but the pattern should plug right in.
see: http://www.radsoftware.com.au/articles/regexsyntaxadvanced.aspx for some implementation examples.
I came out with something like this atrocity:
-?\d(?:,?\d)*(?:\.(?:\d(?:,?\d)*\d|\d))?(?:[–-]-?\d(?:,?\d)*(?:\.(?:\d(?:,?\d)*\d|\d))?)?
In it, -?\d(?:,?\d)*(?:\.(?:\d(?:,?\d)*\d|\d))? appears twice, with [–-] in the middle (note that the first character in that class is a long dash, not a hyphen).
This should take care of dots and commas outside of numbers, e.g. hello,23,45.2-7world will capture 23,45.2-7.
It looks like you're trying to find all numbers in the string (possibly with commas inside the number), and all ranges of numbers such as "2-2.4". Here is a regex that should work:
\d+(?:[,.-]\d+)*
From C# 3, you can use it like this:
var input = "Fat mass loss was 2121,323.222 greater for GPLC (2-2.4kg vs. 0.5kg)";
var pattern = @"\d+(?:[,.-]\d+)*";
var matches = Regex.Matches(input, pattern);
foreach ( Match match in matches )
Console.WriteLine(match.Value);
Hmm, this is a tricky question, especially because the input string contains unicode character – (EN DASH) instead of - (HYPHEN-MINUS). Therefore the correct regex to match the numbers in the original string would be:
\d+(?:[\u2013,.]\d+)*
A more generic approach would be:
\d+(?:[\p{Pd}\p{Pc}\p{Po}]\d+)*
which matches dash punctuation, connector punctuation and other punctuation. See here for more information about those.
An implementation in C# would look like this:
string input = "Fat mass loss was 2121,323.222 greater for GPLC (2–2.4kg vs. 0.5kg)";
try {
Regex rx = new Regex(@"\d+(?:[\p{Pd}\p{Pc}\p{Po}\p{C}]\d+)*", RegexOptions.IgnoreCase | RegexOptions.Multiline);
Match match = rx.Match(input);
while (match.Success) {
// matched text: match.Value
// match start: match.Index
// match length: match.Length
match = match.NextMatch();
}
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}
Let's try this one :
(?=\d)([0-9,.-]+)(?<=\d)
It captures all expressions containing only :
"[0-9,.-]" characters,
must start with a digit "(?=\d)",
must finish with a digit "(?<=\d)"
It works with a single digit expression and does not include beginning or trailing [.,-].
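As a small sketch of how this lookaround pattern behaves, the snippet below runs it over a few inputs. The helper name ExtractRanges is hypothetical. Note that the character class only contains the ASCII hyphen, so a range written with an en dash (U+2013) comes out as two separate numbers. It is written with top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper applying the pattern from this answer unchanged.
static string[] ExtractRanges(string input) =>
    Regex.Matches(input, @"(?=\d)([0-9,.-]+)(?<=\d)")
         .Cast<Match>()
         .Select(m => m.Value)
         .ToArray();

// ASCII hyphen in the range: the range stays in one piece.
Console.WriteLine(string.Join(" | ", ExtractRanges("GPLC (2-2.4kg vs. 0.5kg)")));      // 2-2.4 | 0.5

// En dash in the range: it is not in the character class, so the range is split.
Console.WriteLine(string.Join(" | ", ExtractRanges("GPLC (2\u20132.4kg vs. 0.5kg)"))); // 2 | 2.4 | 0.5

// Leading and trailing punctuation is left out of the match.
Console.WriteLine(string.Join(" | ", ExtractRanges("hello,23,45.2-7world")));          // 23,45.2-7
```

Adding \u2013 inside the character class would be one way to keep en-dash ranges together, if that is the desired result.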
Hope this helps.
I got the solution to my problem.
The following is the Regex that gave my desired result:
(([0-9]+)([–.,-]*))+