Regex pattern to extract version number from string - c#

I want to extract a version number from a string.
string a = "Tale: The Secrets 1.6";
string b = " The 34. Mask 1.6.98";
So for a the version number is 1.6 and for b it is 1.6.98.

\d+(\.\d+)+
\d+         : one or more digits
\.           : one point
(\.\d+)+ : one or more occurrences of point-digits
Will find
2.5
3.4.567
3.4.567.001
But will not find
12
3.
.23
If you want to exclude decimal numbers like 2.5 and expect a version number to have at least 3 parts, you can use a quantifier like this
\d+(\.\d+){2,}
After the comma, you can specify a maximum number of occurrences, e.g. {2,3} for version numbers with three or four parts.

Try:
Regex pattern = new Regex(@"\d+(\.\d+)+"); // verbatim string, so the backslashes reach the regex engine
Match m = pattern.Match(a);
string version = m.Value;
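As a rough sketch of how the snippet above could be wrapped up and checked against the two strings from the question (the class and method names here are made up for illustration, and returning null when nothing matches is an assumption, not something the question asks for):

```csharp
using System;
using System.Text.RegularExpressions;

public static class VersionExtractor
{
    // Two or more dot-separated groups of digits, e.g. 1.6 or 1.6.98.
    private static readonly Regex VersionPattern = new Regex(@"\d+(\.\d+)+");

    // Returns the first version-like substring, or null when there is none
    // (the null convention is an assumption made for this sketch).
    public static string ExtractVersion(string input)
    {
        Match m = VersionPattern.Match(input);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractVersion("Tale: The Secrets 1.6")); // 1.6
        Console.WriteLine(ExtractVersion(" The 34. Mask 1.6.98"));  // 1.6.98
    }
}
```

Note that "34." in the second string is skipped because the dot is not followed by a digit.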

You can write
[0-9]+(\.[0-9]+)+$
This should match the format. The $ anchors the match at the end of the string; it can be dropped if not needed.

By version number, do you mean any sequence of digits interspersed with dots?
\d+(\.\d+)+

Related

How to get whats between two numbers in a string?

I have a lot of movie files and I want to get their production year from their file names, as below:
Input: Kingdom.of.Heaven.2005.720p.Dubbed.Film2media
Output: 2005
This code just splits all the numbers:
string[] result = Regex.Split(str, @"(\d+:)");
You must be more specific about which numbers you want. E.g.
Regex to find the year (not for splitting):
\b(19\d\d|20\d\d)\b
19\d\d selects numbers like 1948, 1989.
20\d\d selects numbers like 2001, 2022.
\b specifies the word limits. It excludes four-digit runs that are part of a longer number or word. Both alternatives sit inside one group so that each \b applies to both of them.
| means or
But it is difficult to make a foolproof algorithm without knowing exactly how the filename is constructed. E.g. the movie "2001: A Space Odyssey" was released in 1968. So, 2001 is not a correct result here.
To omit the movie name, you could search backwards like this:
string productionYear =
    Regex.Match(str, @"\b(19\d\d|20\d\d)\b", RegexOptions.RightToLeft).Value;
If instead of 720p we had a resolution of 2048p for instance, this would not be a problem, because the 2nd \b requires the number to be at the word end.
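A small sketch of the right-to-left search, with the two alternatives grouped inside the \b anchors. The class and method names and the second file name are invented for illustration; only the first file name comes from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class YearFinder
{
    // RightToLeft makes the engine start at the end of the input,
    // so the first match returned is the rightmost candidate year.
    private static readonly Regex YearPattern =
        new Regex(@"\b(19\d\d|20\d\d)\b", RegexOptions.RightToLeft);

    // Returns the rightmost four-digit year, or null if none is found
    // (the null convention is an assumption made for this sketch).
    public static string FindLastYear(string fileName)
    {
        Match m = YearPattern.Match(fileName);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(FindLastYear("Kingdom.of.Heaven.2005.720p.Dubbed.Film2media")); // 2005
        // Hypothetical file name: the title contains 2001, the year is 1968.
        Console.WriteLine(FindLastYear("2001.A.Space.Odyssey.1968.720p"));                // 1968
    }
}
```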
If the production year was always the 4th item from the right, then a better way to get this year would be:
string[] parts = str.Split('.');
string productionYear = parts[^4]; // C# 8.0+, .NET Core
// or
string productionYear = parts[parts.Length - 4]; // C# < 8 or .NET Framework
Note that the regex expression you specify in Regex.Split designates the separators, not the returned values.
I would not try to split the string; match the field you want instead. Also, consider matching \d{4} and not \d+ if you want to be sure to get years and not other fields like the resolution in your example.
You can try this:
string str = "Kingdom.of.Heaven.2005.720p.Dubbed.Film2media";
string year = Regex.Match(str, @"(?<=\.)(\d{4})(?=\.)").Groups[1].Value;
Console.WriteLine("Year: " + year);
Output: Year: 2005
Demo: https://dotnetfiddle.net/KM2PNk
\d{4}: This matches any sequence of four digits.
(?<=\.): This is a positive lookbehind assertion, which means that the preceding pattern must be present, but is not included in the match. In this case, the preceding pattern is a dot, so the regular expression will only match a sequence of four digits if it is preceded by a dot.
(?=\.): This is a positive lookahead assertion, which means that the following pattern must be present, but is not included in the match. In this case, the following pattern is a dot, so the regular expression will only match a sequence of four digits if it is followed by a dot.

C# regex to match exact number (including integers and decimals)

Summary
I'm trying to use regex to match an exact number (i.e. the number as a human would understand it, not the digit itself) within a larger string. The number I'm trying to match will vary. It could be an integer or a decimal, and it could be a single digit or multiple digits.
Examples
If trying to match the number 2, I want it to find the 2 in x + 2 + 3 but not in 2.5, 2.52 or 5.2 (because that's the digit 2, not the actual number 2).
If trying to match the number 2.5, I want it to find the 2.5 in x + 2.5 + 3 and 2.5, but not 2.52 or 12.5.
Note that 2 and 2.5 are just examples, I want this to work for any arbitrary positive number (if it works for negative numbers that's not a problem, but it's also not a requirement).
Initial attempt
I started with (\bX\b)+ (where X will be the number I want to match), which works when X is 2.5 but not when X is 2. This is because it's using word breaks to identify the start and end of the number, but a decimal point counts as a word break. This means that if X is 2 (i.e. the regex is (\b2\b)+) it will match the number 2 (correct), but also 2.x (incorrect) and x.2 (also incorrect).
Current attempt
I've fixed the problem of 2.x by changing the expression to (\bX\b(?!\.))+. This excludes numbers where X is followed by a decimal point, so if X is 2 it will match 2 (correct), will not match 2.x (correct) but will still match x.2 (incorrect). If X is a decimal number, this works correctly (so if X is 2.5 it will correctly match 2.5 and exclude 12.5 or 2.51).
How can I avoid matching X when it's preceded by a decimal point?
Real use-case
If it helps, the end goal is to use this with the C# Regex.Replace function as follows:
private static string ReplaceNumberWithinFormula(string originalFormula, double numberToReplace, string textToReplaceNumberWith)
{
    return Regex.Replace(originalFormula, $@"(\b{numberToReplace}\b(?!\.))+", textToReplaceNumberWith);
}
You may use
private static string ReplaceNumberWithinFormula(string originalFormula, double numberToReplace, string textToReplaceNumberWith)
{
    return Regex.Replace(originalFormula, $@"(?<!\d\.?){Regex.Escape(numberToReplace.ToString())}(?!\.?\d)", textToReplaceNumberWith);
}
The (?<!\d\.?){Regex.Escape(numberToReplace.ToString())}(?!\.?\d), given the variable inside is equal to 2.5, translates into (?<!\d\.?)2\.5(?!\.?\d) and matches 2.5 only if
(?<!\d\.?) - not preceded with a digit and an optional .
(?!\.?\d) - not followed with an optional . and then a digit.
A simpler regex, which works only for input like yours, is a pattern based on a word boundary plus lookarounds:
private static string ReplaceNumberWithinFormula(string originalFormula, double numberToReplace, string textToReplaceNumberWith)
{
    return Regex.Replace(originalFormula, $@"\b(?<!\.){Regex.Escape(numberToReplace.ToString())}\b(?!\.)", textToReplaceNumberWith);
}
Here, the regex will look like \b(?<!\.)2\.5\b(?!\.) and will match a word boundary position first (with \b), then make sure there is no . right before that location (with (?<!\.)), then match 2.5, ensure there is no word char (letter, digit, or _) right after the number, and finally check that there is no . after the number.
It is equal to $@"(?<![\w.]){Regex.Escape(numberToReplace.ToString())}(?![\w.])", and is more restrictive than the top solution, which will let you match the exact float or integer number in any context.
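Here is a sketch of the first (lookaround-only) variant run against the examples from the question. The class name is invented, and formatting the number with CultureInfo.InvariantCulture is an addition not present in the answer: it is there on the assumption that the formula always uses . as the decimal separator, whereas a plain ToString() follows the current culture.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class FormulaRewriter
{
    public static string ReplaceNumberWithinFormula(
        string originalFormula, double numberToReplace, string textToReplaceNumberWith)
    {
        // Assumption: the formula uses '.' as decimal separator, so format
        // the number with the invariant culture rather than the current one.
        string number = numberToReplace.ToString(CultureInfo.InvariantCulture);

        // Not preceded by "digit + optional dot", not followed by "optional dot + digit".
        string pattern = $@"(?<!\d\.?){Regex.Escape(number)}(?!\.?\d)";
        return Regex.Replace(originalFormula, pattern, textToReplaceNumberWith);
    }

    public static void Main()
    {
        Console.WriteLine(ReplaceNumberWithinFormula("x + 2 + 3", 2, "N"));
        // x + N + 3
        Console.WriteLine(ReplaceNumberWithinFormula("2.5 + 2.52 + 5.2 + 2", 2, "N"));
        // 2.5 + 2.52 + 5.2 + N
        Console.WriteLine(ReplaceNumberWithinFormula("x + 2.5 + 3 and 2.52 or 12.5", 2.5, "N"));
        // x + N + 3 and 2.52 or 12.5
    }
}
```

One thing to keep in mind is that Regex.Replace treats $ sequences in the replacement text as substitutions, so replacement text containing $ may need escaping.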
Add also a negative lookbehind:
((?<!\.)\b2\b(?!\.))+
I think this is what you are looking for
^\d+(\.\d+)?\b
This matches whole numbers like 2
Matches decimal numbers like 2.1
Does not match patterns like 2. or .3

Regex parse group of numbers

What I have:
1. 25686-47362-04822-08149-48999-28161-15124-63556
2. 25686-47362-04822-08149-48999-28161-15124-6355654534
3. 54354325686-47362-04822-08149-48999-28161-15124-63556
4. 25686-47362-04822-08149-48999-28161-15124-6355654534fds
5. fdsfds54354325686-47362-04822-08149-48999-28161-15124-63556
6. 25686-47362-04822-08149-48999-28161-15124-63556-63556
What I expect to get:
1. 25686-47362-04822-08149-48999-28161-15124-63556
The nearest I got was ([0-9]{5,5}){8}
I am trying to avoid matching 2, 3, 4, 5 and 6.
Try this
string source = @"25686-47362-04822-08149-48999-28161-15124-63556";
bool result = Regex.IsMatch(source, "^[0-9]{5}(-[0-9]{5}){7}$");
Explanation:
^ anchor (beginning of the string)
[0-9]{5} 5 digits group
(-[0-9]{5}){7} 7 more groups of 5 digits
$ anchor (ending of the string)
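To see how the anchored pattern behaves on the samples from the question, it can be wrapped in a small helper like the one below (the class and method names are made up for this sketch):

```csharp
using System;
using System.Text.RegularExpressions;

public static class SerialValidator
{
    // One group of 5 digits, then exactly 7 more groups each preceded by a hyphen.
    private static readonly Regex SerialPattern =
        new Regex(@"^[0-9]{5}(-[0-9]{5}){7}$");

    public static bool IsValid(string candidate)
    {
        return SerialPattern.IsMatch(candidate);
    }

    public static void Main()
    {
        // Sample 1 from the question: accepted.
        Console.WriteLine(IsValid("25686-47362-04822-08149-48999-28161-15124-63556"));           // True
        // Sample 2: the last group is too long.
        Console.WriteLine(IsValid("25686-47362-04822-08149-48999-28161-15124-6355654534"));      // False
        // Sample 5: leading letters and extra digits.
        Console.WriteLine(IsValid("fdsfds54354325686-47362-04822-08149-48999-28161-15124-63556")); // False
        // Sample 6: nine groups instead of eight.
        Console.WriteLine(IsValid("25686-47362-04822-08149-48999-28161-15124-63556-63556"));     // False
    }
}
```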
I am not sure there is a way to ask for it to "repeat" the grouping, but I would type it like this:
/^([0-9]{5}\-[0-9]{5}\-[0-9]{5}\-[0-9]{5}\-[0-9]{5}\-[0-9]{5}\-[0-9]{5}\-[0-9]{5})/
You can use this:
^\d+\.\s(\d{5}-?){8}$
It matches a whole line that meets your criteria: one or more digits, a dot, a whitespace character, then 8 blocks of 5 digits, each optionally followed by a hyphen.
You can qualify that line with:
/^((?:\D|^)\d{5}){8}$/m
Or
/^((?:-|^)\d{5}){8}$/m
To be more specific with hyphen delimiters.

.NET REGEX Matching matches empty strings

I have this
pattern:
[0-9]*\.?[0-9]*
Target:
X=113.3413475 Y=18.2054775
And I want to match the numbers. It matches fine in testing software like http://regexpal.com/ and Regex Coach.
But in .NET and http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx
I get:
Found 11 matches:
1.
2.
3.
4.
5.
6. 113.3413475
7.
8.
9.
10. 18.2054775
11.
String literals for use in programs:
C#
@"[0-9]*[\.]?[0-9]*"
Does anyone have any idea why I'm getting all these empty matches?
Thanks and Regards,
Kevin
Yes, that will match empty string. Look at it:
[0-9]* - zero or more digits
\.? - an optional period
[0-9]* - zero or more digits
Everything's optional, so an empty string matches.
It sounds like you always want there to be digits somewhere, for example:
[0-9]+\.[0-9]*|\.[0-9]+|[0-9]+
(The order here matters, as you want it to take the most possible.)
That works for me:
using System;
using System.Text.RegularExpressions;
class Test
{
    static void Main(string[] args)
    {
        string x = "X=113.3413475 Y=18.2054775";
        Regex regex = new Regex(@"[0-9]+\.[0-9]*|\.[0-9]+|[0-9]+");
        var matches = regex.Matches(x);
        foreach (Match match in matches)
        {
            Console.WriteLine(match);
        }
    }
}
Output:
113.3413475
18.2054775
There may well be better ways of doing it, admittedly :)
Try this one:
[0-9]+(\.[0-9]+)?
It's slightly different than Jon Skeet's answer in that it won't match .45; it requires either a number alone (e.g. 8) or a real decimal (e.g. 8.1 or 0.1).
Another alternative is to keep your original regex, and just assert it must have a number in it (maybe after a dot):
[0-9]*\.?[0-9]*
Goes to:
(?=\.?[0-9])[0-9]*\.?[0-9]*
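A short sketch of the lookahead variant applied to the target string from the question. The class and method names are invented for illustration. The lookahead (?=\.?[0-9]) fails at every position that is not followed by a digit (optionally after a dot), so the empty matches are no longer produced.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NumberScanner
{
    // Original pattern, guarded by a lookahead that demands at least one digit.
    private static readonly Regex NumberPattern =
        new Regex(@"(?=\.?[0-9])[0-9]*\.?[0-9]*");

    public static List<string> FindNumbers(string input)
    {
        var results = new List<string>();
        foreach (Match m in NumberPattern.Matches(input))
        {
            results.Add(m.Value);
        }
        return results;
    }

    public static void Main()
    {
        foreach (string number in FindNumbers("X=113.3413475 Y=18.2054775"))
        {
            Console.WriteLine(number);
        }
        // 113.3413475
        // 18.2054775
    }
}
```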
The key problem is the *, which means "match zero or more of the preceding characters". The empty string matches zero or more digits, which is why you're getting all those matches.
Change your two *s to +s and you'll get what you want for the target above. (Note that the result, [0-9]+\.?[0-9]+, needs at least two digits, so a single-digit integer such as 8 will not match.)
The problem with this regex is that every part of it is optional, so it also matches an empty string. I would list the valid cases explicitly. From the regex, I see you want numbers with or without a dot, and with or without decimal digits. You can first handle those that contain only digits, [0-9]+, then those that contain digits followed by only a dot, [0-9]+\., and so on, and then join them all with | (or).
The regex as written also allows cases that are not real numbers, for example when both sets of digits are empty (just a dot), so you have to spell out the valid cases.
Regex pattern = new Regex(@"[0-9]+[\.][0-9]+");
string info = "X=113.3413475 Y=18.2054775";
MatchCollection matches = pattern.Matches(info);
int count = 1;
foreach(Match match in matches)
{
Console.WriteLine("{0} : {1}", count++, match.Value);
}
//output
//1 : 113.3413475
//2 : 18.2054775
Replace your * with + and remove ? from your period case.
EDIT: from the conversation above: @"[0-9]+\.[0-9]*|\.[0-9]+|[0-9]+" is the better choice. It catches 123, .123, 123.123, etc.

How to use C# Regular Expression to get the match number list?

string inputText = "abc13500008888, *a1c13688886666abc mm13565685555**" ;
How to use C# Regular Expression to get the match number list?
The rule is that each number is a run of 11 consecutive digits and its first digit is 1.
The results should be:
13500008888
13688886666
13565685555
If the numbers are always 11 digits starting with 1, you can just do
Regex.Matches(inputText, @"1\d{10}");
If you want to match other lengths as well, you can either use + for one or more or {min,} where min is the minimum number of digits you want to match.
Regex.Matches(inputText, @"\d+");       // any run of one or more digits
Regex.Matches(inputText, @"1\d{10,}");  // a 1 followed by at least 10 more digits
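For completeness, a sketch that collects the matches of the 11-digit pattern into a list, using the input from the question (the class and method names are invented for illustration). It uses the pattern exactly as given, which does not reject an 11-digit run that is part of a longer run of digits; whether that matters depends on the data.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PhoneNumberFinder
{
    // A literal 1 followed by exactly 10 more digits.
    private static readonly Regex ElevenDigits = new Regex(@"1\d{10}");

    public static List<string> FindNumbers(string inputText)
    {
        var numbers = new List<string>();
        foreach (Match m in ElevenDigits.Matches(inputText))
        {
            numbers.Add(m.Value);
        }
        return numbers;
    }

    public static void Main()
    {
        string inputText = "abc13500008888, *a1c13688886666abc mm13565685555**";
        foreach (string number in FindNumbers(inputText))
        {
            Console.WriteLine(number);
        }
        // 13500008888
        // 13688886666
        // 13565685555
    }
}
```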
