Regular expression for numbers in string - c#

The input string "134.45sdfsf" passed to the following statement
System.Text.RegularExpressions.Regex.Match(input, pattern).Success;
returns true for the following patterns:
pattern = "[0-9]+"
pattern = "\\d+"
Q1) I am like, what the hell! I am specifying only digits, not special characters or letters. So what is wrong with my pattern, if I want the statement above to return false?
Q2) Once I get the right pattern to match just the digits, how do I extract all the numbers in a string?
Let's say for now I just want to get the integers in a string in the format "int.int^int" (for example, "11111.222^3333"; in this case, I want to extract the strings "11111", "222" and "3333").
Any idea?
Thanks

You are specifying that the input contains at least one digit anywhere, not that it consists only of digits. You are looking for the expression ^\d+$. The ^ and $ denote the start and end of the string, respectively. You can read more about anchors in the .NET regular expression documentation.
Use Regex.Split to split by any non-digit strings. For example:
string input = "123&$456";
var isAllDigit = Regex.IsMatch(input, @"^\d+$");
var numbers = Regex.Split(input, @"[^\d]+");
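The following sketch covers both questions together. The class and method names are hypothetical. The unanchored pattern succeeds on "134.45sdfsf" because it finds "134", but the anchored one does not. As an alternative to Regex.Split, Regex.Matches with \d+ returns every run of digits directly:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class DigitChecks
{
    // Unanchored: true if at least one digit appears anywhere in the input.
    public static bool ContainsDigits(string input) => Regex.IsMatch(input, @"\d+");

    // Anchored with ^ and $: true only if the whole input is digits.
    // Note: in .NET, $ also matches just before a final "\n"; use \z to forbid that.
    public static bool IsAllDigits(string input) => Regex.IsMatch(input, @"^\d+$");

    // Returns every run of consecutive digits,
    // e.g. "11111.222^3333" -> "11111", "222", "3333".
    public static string[] ExtractNumbers(string input) =>
        Regex.Matches(input, @"\d+")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();
}
```

Matching the digits, rather than splitting on non-digits, avoids the empty first element that Regex.Split produces when the input starts with a non-digit character.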

It returns true because it found a match: a run of digits somewhere in the string.
If you want the whole input to be checked, anchor the pattern:
^[0-9]+$

Q1) Both patterns are correct.
Q2) Assuming you are looking for the number pattern "5 digits, dot, 3 digits, ^, 4 digits", here is what you're looking for:
using System.Diagnostics;               // Debug.Print
using System.Text.RegularExpressions;   // Regex
var regex = new Regex(@"(?<first>[0-9]{5})\.(?<second>[0-9]{3})\^(?<third>[0-9]{4})");
var match = regex.Match("11111.222^3333");
Debug.Print(match.Groups["first"].ToString());  // 11111
Debug.Print(match.Groups["second"].ToString()); // 222
Debug.Print(match.Groups["third"].ToString());  // 3333
I prefer named capture groups; they give a clearer way to access the groups than numeric indices.

Related

How to structure REGEX in C#

I currently have a regex that checks if a US state is spelled correctly:
var r = new Regex(string.Format(@"\b(?:{0})\b", pattern), RegexOptions.IgnoreCase);
pattern is a pipe delimited string containing all US states.
It was working as intended until today, when one of the states was spelled like "Florida..". I would have liked it to pick up the fact that there was a full stop character.
I found this regex that will only match letters.
^[a-zA-Z]+
How do I combine this with my current Regex or is it not possible?
I tried some variations of this, but they didn't work:
var r = new Regex(string.Format(@"\b^[a-zA-Z]+(?:{0})\b", pattern), RegexOptions.IgnoreCase);
EDIT: Florida.. was in my input string. My pattern string hasn't changed at all. Apologies for not being clearer.
It seems you need start of string (^) and end of string ($) anchors:
var r = new Regex(string.Format(@"^(?:{0})$", pattern), RegexOptions.IgnoreCase);
The regex above would match any string comprising a name of a state only.
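This sketch contrasts the anchored pattern with the original word-boundary one. It assumes a hypothetical, shortened state list in place of the real pipe-delimited `pattern`:

```csharp
using System.Text.RegularExpressions;

public static class StateValidator
{
    // Hypothetical, shortened list; the real code would hold all the states.
    private const string StatePattern = "Colorado|Florida|New Mexico";

    // ^ and $ force the entire input to be one state name, so trailing
    // characters such as ".." make the match fail.
    private static readonly Regex AnchoredState =
        new Regex(string.Format(@"^(?:{0})$", StatePattern), RegexOptions.IgnoreCase);

    // The original \b version, for comparison: "Florida.." still matches,
    // because there is a word boundary between "a" and ".".
    private static readonly Regex BoundedState =
        new Regex(string.Format(@"\b(?:{0})\b", StatePattern), RegexOptions.IgnoreCase);

    public static bool IsExactState(string input) => AnchoredState.IsMatch(input);
    public static bool ContainsState(string input) => BoundedState.IsMatch(input);
}
```

The non-capturing group around the alternation matters. Without it, ^ would bind only to the first state and $ only to the last.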
You should run a replacement on the pattern variable to escape the regex special characters. One of them is the . character. Something similar to pattern.Replace(".", @"\."), but covering all the special characters.
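.NET already provides this escaping as Regex.Escape, which escapes the regex metacharacters (including .) and white space. The sketch below builds the alternation from plain names by escaping each one. The helper name and the "Washington D.C." entry are hypothetical and are only there to show a dot being matched literally:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class PatternBuilder
{
    // Builds "^(?:a|b|c)$" from plain names, escaping each one with
    // Regex.Escape so characters such as '.', '(' or '+' are matched literally.
    public static Regex BuildExactMatcher(params string[] names)
    {
        string alternation = string.Join("|", names.Select(Regex.Escape));
        return new Regex("^(?:" + alternation + ")$", RegexOptions.IgnoreCase);
    }
}
```

Without escaping, the dots in "D.C." would match any character, so "Washington DxCx" would be accepted.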
I believe you can't merge both patterns into one, so you would have to perform two different regex operations: one to split the states into a list, and a subsequent one to validate each item within it.
I'd rather go for something "simpler", such as
var states = input.Split('|').Select(s => new string(s.Where(c => char.IsLetter(c) || c == ' ').ToArray())) // keep letters and spaces, so "New Mexico" survives
.Where(s => !string.IsNullOrWhiteSpace(s));
Basically don't use a regex here.
List<string> values = new List<string>() { "florida", "colorado" /* ...and the other states */ };
string input = "Florida.."; // the value to check
// is any value contained in input? (case-insensitive)
bool correct = values.Any(a =>
input.IndexOf(a, StringComparison.CurrentCultureIgnoreCase) >= 0);
This will be considerably more efficient than a regex-based option. It should match florida, Florida, Florida..., etc.
Don't search for the characters you want directly. Instead, tell the regex to consume everything that is not one of the specific characters you are excluding, such as [^\|.]+. The set [ ] with the negation indicator ^ says: consume anything that is not a literal | or a literal . (period). Hence it consumes just the text needed, such as in
Colorado|Florida..|New Mexico
which returns 3 matches: Colorado, Florida and New Mexico.
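This sketch runs that negated-set pattern through Regex.Matches. The class and method names are hypothetical. Inside a character class, | and . are already literal, so escaping the pipe is optional:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class StateSplitter
{
    // [^|.]+ consumes runs of characters that are neither '|' nor '.',
    // so "Florida.." yields "Florida" and the dots are skipped.
    public static string[] ExtractNames(string input) =>
        Regex.Matches(input, @"[^|.]+")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();
}
```

For the sample input, `StateSplitter.ExtractNames("Colorado|Florida..|New Mexico")` returns "Colorado", "Florida" and "New Mexico". Spaces are kept because they are not in the excluded set.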

How do I exclude a regex value in a replace

I have a regex expression which searches for strings using a Prefix and a Suffix. In its simplest form \$\$\w+\$\$ will find $$My_Name$$ (in this case the Prefix and Suffix are both equal to $$). Another example would be \[\#\w+\#\] to match [#My_Name#].
The Prefix and Suffix will always be a specific string of 0 to n characters which I can always safely escape for a direct character match.
I extract the matches in my C# code so I can work with them, but my match obviously contains $$My_Name$$, whereas what I want is simply the inner string between the Prefix and Suffix: My_Name.
How do I exclude the Prefix and Suffix from the result?
Change your REGEX to \$\$(\w+)\$\$ and use $1 to get the matching (inner) string.
For example
string pattern = @"\$\$(\w+)\$\$";
string input = "$$My_Name$$";
Regex rgx = new Regex(pattern);
Match result = rgx.Match(input);
Console.WriteLine(result.Groups[1]);
Outputs: "My_Name"
P.S. There's no need to use explicitly typed local variables, but I just wanted the types to be clear.
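The question says the Prefix and Suffix can always be escaped for a literal match, so the same capture-group idea can be generalized with Regex.Escape. Here is a sketch; the helper name is hypothetical:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenExtractor
{
    // Returns the text between every prefix/suffix pair in the input.
    // The prefix and suffix are escaped, so "$$" or "[#" are matched literally;
    // only the (\w+) group is collected, not the delimiters.
    public static List<string> InnerValues(string input, string prefix, string suffix)
    {
        string pattern = Regex.Escape(prefix) + @"(\w+)" + Regex.Escape(suffix);
        var values = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            values.Add(m.Groups[1].Value);
        }
        return values;
    }
}
```

For example, `TokenExtractor.InnerValues("[#My_Name#]", "[#", "#]")` should return a single "My_Name".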
You can put your \w+ into a group like this: (\w+). Then, when you retrieve the matched string, you can ask for that subgroup.
I may be wrong (you didn't provide any code whatsoever), but I think this is how it is done: .Groups[1].Value on the result of Regex.Match.
How about the regex below? It works by capturing the first character into a named group (first_char), then capturing that character plus any repeats of it into a named group called first_group, which it then uses to match the end of the string. It will work with any number of repeated characters, so long as the same characters are repeated at the end of the word.
'(?<first_group>(?<first_char>.)\k<first_char>+)(?<word>\w+)\k<first_group>+'
You just need to then extract the capture group named word like so:
String sample = "$$My_Name$$";
Regex regex = new Regex(@"(?<first_group>(?<first_char>.)\k<first_char>+)(?<word>\w+)\k<first_group>+");
Match match = regex.Match(sample);
if (match.Success)
{
Console.WriteLine(match.Groups["word"].Value);
}
You can use named group like this:
(\$\$)(?<group1>.+?)\1 -- pattern 1 (first case)
\[(#)(?<group2>.+?)\1\] -- pattern 2 (second case)
or combined representation would be:
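(\$\$)(?<group1>.+?)\1|\[(#)(?<group2>.+?)\2\]
(In .NET, named groups are numbered after all unnamed groups, so the (#) group is \2 here; \3 would refer to group1.)
I would suggest using .+?: because it is lazy, it stops at the first occurrence of your suffix instead of running on to the last one.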
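This sketch checks the combined named-group pattern against both delimiter styles in .NET. The class name and the Extract helper are hypothetical. One .NET-specific detail matters here: named groups are numbered after all unnamed groups, so the backreference to the (#) group is \2.

```csharp
using System.Text.RegularExpressions;

public static class DelimitedName
{
    // In .NET, unnamed groups are numbered first (left to right) and named
    // groups afterwards, so (\$\$) is \1 and (#) is \2 in this pattern.
    private static readonly Regex Combined =
        new Regex(@"(\$\$)(?<group1>.+?)\1|\[(#)(?<group2>.+?)\2\]");

    // Returns whichever named group took part in the match, or null if nothing matched.
    public static string Extract(string input)
    {
        Match m = Combined.Match(input);
        if (!m.Success) return null;
        return m.Groups["group1"].Success ? m.Groups["group1"].Value
                                          : m.Groups["group2"].Value;
    }
}
```

Checking `Groups[...].Success` tells you which alternative matched, because the group from the other alternative does not participate.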

Regex Substring or Left Equivalent

Greetings beloved comrades.
I cannot figure out how to accomplish the following via a regex.
I need to take a number in this format, 201101234, and transform it to 11-0123401, where digits 3 and 4 become the digits to the left of the dash, and the remaining five digits are inserted to the right of the dash, followed by a hardcoded 01.
I've tried http://gskinner.com/RegExr, but the syntax just defeats me.
This answer, Equivalent of Substring as a RegularExpression, sounds promising, but I can't get it to parse correctly.
I can create a SQL function to accomplish this, but I'd rather not hammer my server in order to reformat some strings.
Thanks in advance.
You can try this:
var input = "201101234";
var output = Regex.Replace(input, @"^\d{2}(\d{2})(\d{5})$", "${1}-${2}01");
Console.WriteLine(output); // 11-0123401
This will match:
two digits, followed by
two digits captured as group 1, followed by
five digits captured as group 2
And return a string which replaces that matched text with
group 1, followed by
a literal hyphen, followed by
group 2, followed by
a literal 01.
The start and end anchors ( ^ / $ ) ensure that if the input string does not exactly match this pattern, it will simply return the original string.
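To see the anchor behaviour described above, this sketch wraps the same Regex.Replace call in a hypothetical helper and feeds it inputs of the wrong length:

```csharp
using System.Text.RegularExpressions;

public static class CaseNumberFormatter
{
    // ^\d{2} skips two digits, (\d{2}) is group 1, (\d{5}) is group 2.
    // ${1} and ${2} use braces so the literal "01" is not read as part of a group number.
    // If the whole input is not exactly nine digits, nothing matches and the input comes back unchanged.
    public static string Reformat(string input) =>
        Regex.Replace(input, @"^\d{2}(\d{2})(\d{5})$", "${1}-${2}01");
}
```

So `CaseNumberFormatter.Reformat("201101234")` gives "11-0123401", while an eight-digit input such as "20110123" is returned as-is.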
If you can use custom C# scripts, you may want to use Substring instead:
string newStr = string.Format("{0}-{1}01", old.Substring(2,2), old.Substring(4));
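Here is a sketch of the Substring approach with a length and digit check added. The class name is hypothetical, and so is the choice to return the input unchanged when it is not nine ASCII digits, which roughly mirrors the anchored regex version:

```csharp
public static class SubstringFormatter
{
    // Characters at index 2-3 go before the dash, everything from index 4 after it,
    // followed by a literal "01". Input that is not exactly nine ASCII digits
    // is returned unchanged.
    public static string Reformat(string old)
    {
        if (old.Length != 9) return old;
        foreach (char c in old)
        {
            if (c < '0' || c > '9') return old;
        }
        return string.Format("{0}-{1}01", old.Substring(2, 2), old.Substring(4));
    }
}
```

Without the guard, Substring(2, 2) throws an ArgumentOutOfRangeException on inputs shorter than four characters.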
I don't think you really need a regex here; Substring would be better. But if you still want a regex-only solution, you can use this:
string newString = Regex.Replace(input, @"^\d{2}(\d{2})(\d+)$", "$1-${2}01");
Explanation:
^\d{2} // Match first 2 digits. Will be ignored
(\d{2}) // Match next 2 digits. Capture it in group 1
(\d+)$ // Match rest of the digits. Capture it in group 2
Now the required digits are in groups 1 and 2, which you use in the replacement string.
Do you even SQL? Pull some levers and stuff.

Using Regex.Split to remove anything non numeric and splitting on -

I'm not sure why, but for some reason the Regex.Split method is going over my head. I'm trying to look through tutorials for what I need and can't seem to find anything.
I am simply reading an Excel doc and want to turn a string such as $145,000-$179,999 into two strings: 145000 and 179999. At the same time I'd like to prune a string such as '$180,000-Limit to simply 180000.
var loanLimits = Regex.Matches(Result.Rows[row + 2 + i][column].ToString(), @"\d+");
The above code seems to chop '$145,000-$179,999 up into 4 parts: 145, 000, 179, 999. Any ideas on how to achieve what I'm asking?
Regular expressions match exactly character by character (there's no knowledge of the concept of a "number" or a "word" in regular expressions - you have to define that yourself in your expression). The expression you are using, \d+, uses the character class \d, which means any digit 0-9 (and + means match one or more). So in the string $145,000, notice that the part you are looking for is not composed only of digits; it also includes a comma. The regular expression therefore finds every contiguous run of characters that matches it, which gives the four groups of digits.
There are a couple of ways to approach the problem.
Include , in your regular expression, so (\d|,)+, which means match as many characters in a row as possible that are either a digit or a comma. There will be two matches, 145,000 and 179,999, from which you can then remove the commas with myStr.Replace(",", "").
Do as you say in the title, and remove all non-numeric characters. You could use Regex.Replace with the expression [^\d-]+, which means match anything that is not a digit or a hyphen, and replace those matches with "". The result would be 145000-179999, which you can split with a simple non-regex split, myStr.Split('-'), to get your two parts.
Note that for your second example ($180,000-Limit), you'll need an extra check: count the number of results returned by Matches in the first approach, or by Split in the second, to determine whether there were two numbers in the range or only a single one. (With Split, "180000-" produces an empty second element.)
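Both approaches can be sketched side by side. The class and method names are hypothetical. The input is assumed to be the cell text as a plain string, in place of `Result.Rows[...][column].ToString()`:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LoanLimitParser
{
    // Approach 1: match runs of digits and commas, then drop the commas.
    // Empty results (e.g. from a stray comma in text) are discarded.
    public static string[] ByMatching(string cell) =>
        Regex.Matches(cell, @"(\d|,)+")
             .Cast<Match>()
             .Select(m => m.Value.Replace(",", ""))
             .Where(s => s.Length > 0)
             .ToArray();

    // Approach 2: delete everything except digits and hyphens, then split on '-'.
    // RemoveEmptyEntries drops the empty piece left by "-Limit", so the array
    // length tells you whether the cell held a range or a single number.
    public static string[] ByStripping(string cell) =>
        Regex.Replace(cell, @"[^\d-]+", "")
             .Split(new[] { '-' }, StringSplitOptions.RemoveEmptyEntries);
}
```

Both methods return two elements for "$145,000-$179,999" and one element ("180000") for "$180,000-Limit", so the length check from the answer reduces to `result.Length == 2`.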
You can treat each part separately by splitting the string on - and extracting only the digits from each part:
List<string> mystrings = new List<string>();
string[] parts = Result.Rows[row + 2 + i][column].ToString().Split('-');
foreach (var item in parts)
{
// remove every non-digit character ($, commas, "Limit", ...)
string result = Regex.Replace(item, @"[^\d]", "");
if (result.Length > 0) // skip pieces that held no digits
mystrings.Add(result);
}
An alternative to using a regex is to use the built-in string and char methods in the .NET Framework. Assuming the input string will always have a single hyphen:
string input = "$145,000-$179,999";
var split = input.Split( '-' )
.Select( x => string.Join( "", x.Where( char.IsLetterOrDigit ) ) )
.ToList();
string first = split.First(); //145000
string second = split.Last(); //179999
first you split the string using the standard Split method
then you create a new string by selectively taking only Letters or Digits from each item in the collection: x.Where...
then you join the selected characters back into a string using the standard Join method
finally, take the first and last item in the collection for your 2 strings.

Regular Expressions (.NET) - How can I match a pattern that contains a variable number of digits at the end of the string?

string src = "portfolio1, portfolio2, portfolio20, portfolio300";
I'd like to match all strings that are of the pattern @"portfolio\d"
where \d can be anywhere from 1-3 digits in length. I have read that the use of {a, b} should work, so I tried:
pattern = @"portfolio\d{1, 3}"
Searching in the string, src, for this pattern returned an empty set. The following patterns worked partially:
pattern = @"portfolio\d"
pattern = @"portfolio\d{1}"
Try this:
pattern = @"portfolio\d{1,3}"
Note that you should not put a space inside the braces, as you have in your example. That's why it didn't work.
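Here is a sketch that runs the corrected quantifier over the question's `src` string with Regex.Matches. The class and method names are hypothetical:

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class PortfolioFinder
{
    // {1,3} (no space) means "between one and three of the preceding token", here \d.
    public static string[] Find(string src) =>
        Regex.Matches(src, @"portfolio\d{1,3}")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();
}
```

On "portfolio1, portfolio2, portfolio20, portfolio300" this should yield four matches. The pattern has no trailing \b, so "portfolio1234" still produces a match ("portfolio123"). Append \b if longer numbers must be rejected.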
String pattern = @"^(?:(?:portfolio\d{1,3})(?:\x2C\s)*)+$";
My attempt. It will match a whole string made of any number of comma-separated portfolio\d{1,3} items.
Running the pattern "portfolio\d{1,3}" in Expresso, I get 4 matches, one for each portfolio. Removing the space was the key.
