Regular Expression for splitting string by number of characters

I have a 2D barcode that I need to parse into two different items. The first expression should read only the first 10 characters (numbers and letters). The second expression should ignore the first 10 characters and read the remaining characters (numbers, letters, _). The number of remaining characters is not consistent.
Here is a sample of what the barcode reads. 20P0000002_0_DP-3_TR_DEBIT
Any suggestions?

You don't need regular expressions, String.Substring will do:
var first = barcode.Substring(0, 10);
var second = barcode.Substring(10);
You can then check whether the first part is just letters and numbers with the concise, though not strictly accurate, check below (char.IsLetterOrDigit also accepts non-ASCII Unicode letters and digits):
var isValid = first.All(char.IsLetterOrDigit);
or with the more prosaic
var acceptable = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
var isValid = first.All(c => acceptable.IndexOf(char.ToUpperInvariant(c)) != -1);

For your first expression you would use this.
^([\dA-Za-z]{10})
^ = match beginning of string
( = begin capture group
[ = begin set of characters to match
\d = match all digits (0-9)
A-Za-z = match all uppercase and lowercase letters
] = end character set
{10} = match exactly 10 of the previous character set
) = end capture group
For your second, this one
^.{10}(.*)$
^.{10} = match the first ten characters of the string (but don't capture them)
(.*)$ = capture all remaining characters until the end of the string
EDIT:
As pointed out in the comments, you could easily combine these two expressions into one like so.
^([\dA-Za-z]{10})(.*)$
This will yield two capture groups with only one match operation.
It's worth noting that using a RegEx might be a good solution since the match will tell you whether or not the initial ten characters are only alphanumeric characters. If you're only seeking to capture the first ten characters regardless of what they are, then a RegEx is overkill. But if you want validation, a RegEx is a nice way to do that. Performance could be argued though, but you're already using .NET which carries some performance impact anyway.
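For illustration, here is a minimal sketch of the combined pattern used from C#. The class and method names (BarcodeParser, TrySplit) are made up for this example, and [0-9A-Za-z] is used in place of [\dA-Za-z] on the assumption that only ASCII digits should be accepted (in .NET, \d also matches other Unicode digits by default).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
static class BarcodeParser
{
    // Group 1: exactly ten ASCII letters or digits. Group 2: everything after them.
    private static readonly Regex Pattern = new Regex(@"^([0-9A-Za-z]{10})(.*)$");

    // Returns false (and null parts) when the first ten characters are not all alphanumeric.
    public static bool TrySplit(string barcode, out string first, out string second)
    {
        Match m = Pattern.Match(barcode);
        first = m.Success ? m.Groups[1].Value : null;
        second = m.Success ? m.Groups[2].Value : null;
        return m.Success;
    }
}
```

Calling BarcodeParser.TrySplit("20P0000002_0_DP-3_TR_DEBIT", out var first, out var second) should give "20P0000002" and "_0_DP-3_TR_DEBIT", so validation and splitting happen in one match operation.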

Related

Regular expression only one character and 7 numbers

I want a regex that matches exactly one letter, in any position of the word, and 7 digits.
Examples that should match:
1111111q
2222222q
111e1111
11e11111
I wrote this pattern, but it does not work for all cases:
[A-Za-z][0-9]{7}

Regular expressions match patterns. In your case, it would seem that the letter can be at any point in your string, which would mean that you would have a multitude of patterns which would need to be taken into consideration.
I think that for this case, you should not use regular expressions, for simplicity's sake. I would recommend you take a look at the Char.IsDigit(Char c) and Char.IsLetter(Char c) methods and use counters to check that the string is in the format you are after.
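A minimal sketch of that counter approach, assuming the string must be exactly 8 characters long (7 digits plus 1 letter). The class and method names are invented for the example.

```csharp
using System;

// Hypothetical helper illustrating the counter-based check.
static class OneLetterSevenDigits
{
    public static bool IsValid(string s)
    {
        if (s == null || s.Length != 8) return false;

        int digits = 0, letters = 0;
        foreach (char c in s)
        {
            if (char.IsDigit(c)) digits++;        // counts any Unicode decimal digit
            else if (char.IsLetter(c)) letters++; // counts any Unicode letter
        }
        // Anything that is neither a digit nor a letter makes the totals fall short.
        return digits == 7 && letters == 1;
    }
}
```

With this, "1111111q" and "111e1111" are accepted, while "11111111" (no letter) and "11ee1111" (two letters) are rejected.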

There are readily available methods in C# for checking the conditions you want. I would only use Regex when there is no parser or simple C# solution.
I would do it like this:
var str = "1111111u";
var isValid = str.Length == 8 &&
str.Count(char.IsDigit) == 7 &&
str.Count(char.IsLetter) == 1;

It is not that difficult in regex:
If the complete string has to match just use:
^(?=.{8}$)\d*[a-zA-Z]\d*$
If this is a word in a larger text use:
\b(?=[a-z0-9]{8}\b)\d*[a-z]\d*\b
\d*[a-z]\d* matches any amount of digits, followed by one letter, then again any amount of digits.
(?=[a-z0-9]{8}\b) is a positive lookahead assertion; it ensures a total length of 8.
Important here is the use of anchors or word boundaries to avoid partial wrong matches.
If you really want to match any letter then use the Unicode property \p{L} instead of the character class:
^(?=.{8}$)\d*\p{L}\d*$
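As a rough sketch, the word-boundary variant could be used from C# to pull such tokens out of a longer text. The class and method names are invented here, and RegexOptions.IgnoreCase is added on the assumption that upper-case letters should be accepted too, since the pattern itself only lists a-z.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper around the lookahead pattern from this answer.
static class CodeFinder
{
    // The lookahead fixes the token length at 8; the body allows exactly one letter among digits.
    private static readonly Regex WordPattern =
        new Regex(@"\b(?=[a-z0-9]{8}\b)\d*[a-z]\d*\b", RegexOptions.IgnoreCase);

    public static List<string> FindCodes(string text)
    {
        var found = new List<string>();
        foreach (Match m in WordPattern.Matches(text))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```

For the input "id 111e1111 and 12345678 and 1e2e3333 ok 1111111Q" this should return 111e1111 and 1111111Q, skipping the all-digit token and the one with two letters.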

I can only come up with a "brute force" regex method:
foundMatch = Regex.IsMatch(subjectString,
@"\b
(?:[a-z]\d{7}|
\d[a-z]\d{6}|
\d{2}[a-z]\d{5}|
\d{3}[a-z]\d{4}|
\d{4}[a-z]\d{3}|
\d{5}[a-z]\d{2}|
\d{6}[a-z]\d{1}|
\d{7}[a-z])
\b",
RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
Note the word boundary anchors, which you should remove if this pattern is part of a longer string.
Also note the IgnoreCase option, which you can remove if all letters will be lower case.
Edit: See @stema's answer for a much more concise regex.

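This matches your examples, with two caveats: \w also matches digits and underscores (use [A-Za-z] to accept letters only), and the letter-first case is not covered:
(\d{1}\w\d{6}|\d{2}\w\d{5}|\d{3}\w\d{4}|\d{4}\w\d{3}|\d{5}\w\d{2}|\d{6}\w\d{1}|\d{7}\w)
I generated it like this, in PowerShell (the loop produces the first six alternatives):
$n = 6
for ($i = 1; $i -le 6; $i++) {
Write-Host "\d{$i}\w\d{$n}"
$n--
}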

Your example will only work when the character is the first character in the string.
The problem you've got is that you need a total of 7 digits and exactly one letter somewhere among them. This is awkward to express with a plain regular expression, because the number of digits allowed after the letter depends on how many came before it; without a lookahead you have to enumerate every position of the letter.
I was wondering if it was possible using a lookahead assertion to ensure there's only one letter, but the best I can do is ensuring there's no instance of two letters in a row, which doesn't cover all possible invalid cases. Thus I think you're going to have to find another method, as npinti suggested. So something like:
public static bool Match(string s) {
return (s.Length == 8) &&
(s.Where(Char.IsDigit).Count() == 7) &&
(s.Where(Char.IsLetter).Count() == 1);
}
But I haven't tested that.

Just use this if you want one letter and 7 digits:
"[A-Za-z]{1}[0-9]{7}|[0-9]{7}[A-Za-z]{1}|[0-9]{1}[A-Za-z]{1}[0-9]{6}|[0-9]{6}[A-Za-z]{1}[0-9]{1}|[0-9]{2}[A-Za-z]{1}[0-9]{5}|[0-9]{3}[A-Za-z]{1}[0-9]{4}|[0-9]{4}[A-Za-z]{1}[0-9]{3}|[0-9]{5}[A-Za-z]{1}[0-9]{2}"
And here is a code snippet showing how to iterate through the results:
string st = "1111111q 2222222q 111e1111 11e11111";
string pattS = @"[A-Za-z]{1}[0-9]{7}|[0-9]{7}[A-Za-z]{1}|[0-9]{1}[A-Za-z]{1}[0-9]{6}|[0-9]{6}[A-Za-z]{1}[0-9]{1}|[0-9]{2}[A-Za-z]{1}[0-9]{5}|[0-9]{3}[A-Za-z]{1}[0-9]{4}|[0-9]{4}[A-Za-z]{1}[0-9]{3}|[0-9]{5}[A-Za-z]{1}[0-9]{2}";
Regex regex = new Regex(pattS);
var res = regex.Matches(st);
foreach (Match re in res)
{
Console.WriteLine(re.Value);
}
It covers all the examples you provided.

You can use this pattern:
^([0-9])(?:\1|[a-z](?!.*[a-z])){7}|[a-z]([0-9])\2{6}$

With Regex, you can do it in two steps. First you can remove the character, in whatever position it is:
string input = "111a1111";
Regex rgx = new Regex(@"[a-zA-Z]");
string output=rgx.Replace(input,"",1); // remove only one character
// output = "1111111"
then you can match with [0-9]{7} (if you don't want all digits to be the same)
or with ^(\d)\1{6}$ (if you want 7 occurrences of the same digit)

Regex Substring or Left Equivalent

Greetings beloved comrades.
I cannot figure out how to accomplish the following via a regex.
I need to take this format number 201101234 and transform it to 11-0123401, where digits 3 and 4 become the digits to the left of the dash, and the remaining five digits are inserted to the right of the dash, followed by a hardcoded 01.
I've tried http://gskinner.com/RegExr, but the syntax just defeats me.
This answer, Equivalent of Substring as a RegularExpression, sounds promising, but I can't get it to parse correctly.
I can create a SQL function to accomplish this, but I'd rather not hammer my server in order to reformat some strings.
Thanks in advance.

You can try this:
var input = "201101234";
var output = Regex.Replace(input, @"^\d{2}(\d{2})(\d{5})$", "${1}-${2}01");
Console.WriteLine(output); // 11-0123401
This will match:
two digits, followed by
two digits captured as group 1, followed by
five digits captured as group 2
And return a string which replaces that matched text with
group 1, followed by
a literal hyphen, followed by
group 2, followed by
a literal 01.
The start and end anchors ( ^ / $ ) ensure that if the input string does not exactly match this pattern, it will simply return the original string.
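Because a non-matching input comes back unchanged, the caller cannot tell from Regex.Replace alone whether the reformat happened. A small sketch of one way to surface that, using the same pattern with Regex.Match; the names CaseNumber and TryReformat are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper: same pattern as above, but reports whether it matched.
static class CaseNumber
{
    private static readonly Regex Pattern = new Regex(@"^\d{2}(\d{2})(\d{5})$");

    public static bool TryReformat(string input, out string output)
    {
        Match m = Pattern.Match(input);
        // On a match, build "<group 1>-<group 2>01"; otherwise hand back the input untouched.
        output = m.Success
            ? m.Groups[1].Value + "-" + m.Groups[2].Value + "01"
            : input;
        return m.Success;
    }
}
```

TryReformat("201101234", out var s) returns true with s set to "11-0123401", while an input such as "2011" returns false and is passed through as-is.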

If you can use custom C# scripts, you may want to use Substring instead:
string newStr = string.Format("{0}-{1}01", old.Substring(2,2), old.Substring(4));

I don't think you really need a regex here. Substring would be better. But still if you want regex only, you can use this:
string newString = Regex.Replace(input, @"^\d{2}(\d{2})(\d+)$", "$1-${2}01");
Explanation:
^\d{2} // Match first 2 digits. Will be ignored
(\d{2}) // Match next 2 digits. Capture it in group 1
(\d+)$ // Match rest of the digits. Capture it in group 2
Now, the required digits, are in group 1 and 2, which you use in the replacement string.

Do you even SQL? Pull some levers and stuff.

Using Regex.Split to remove anything non numeric and splitting on -

I'm not sure why, but for some reason the Regex.Split method is going over my head. I've looked through tutorials for what I need and can't seem to find anything.
I simply am reading an excel doc and want to format a string such as $145,000-$179,999 to give me two strings. 145000 and 179999. At the same time I'd like to prune a string such as '$180,000-Limit to simply 180000.
var loanLimits = Regex.Matches(Result.Rows[row + 2 + i][column].ToString(), @"\d+");
The above code seems to chop '$145,000-$179,999 up into 4 parts: 145, 000, 179, 999. Any ideas on how to achieve what I'm asking?

Regular expressions match exactly character by character (there's no knowledge of the concept of a "number" or a "word" in regular expressions - you have to define that yourself in your expression). The expression you are using, \d+, uses the character class \d, which means any digit 0-9 (and + means match one or more). So in the expression $145,000, notice that the part you are looking for is not just composed of digits; it also includes commas. So the regular expression finds every continuous group of characters that matches your regular expression, which are the four groups of numbers.
There are a couple of ways to approach the problem.
Include , in your regular expression, so (\d|,)+, which means match as many characters in a row that are either a digit or a comma. There will be two matches: 145,000 and 179,999, from which you can further remove the commas with myStr.Replace(",", "").
Do as you say in the title, and remove all non-numeric characters. So you could use Regex.Replace with the expression [^\d-]+ - which means match anything that is not a digit or a hyphen - and then replace those with "". Then the result would be 145000-179999, which you can split with a simple non-regular-expression split, myStr.Split('-'), to get your two parts.
Note that for your second example ($180,000-Limit), you'll need an extra check to count the number of results returned from Matches in the first approach, and from Split in the second approach, to determine whether there were two numbers in the range or only a single number.
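A minimal sketch of the second approach, including the single-number case. The class and method names are invented for the example, and it assumes a hyphen only ever appears as the range separator.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: strip non-numeric characters, then split on the hyphen.
static class LoanLimits
{
    public static string[] Parse(string cell)
    {
        // Drop everything except digits and hyphens: "$145,000-$179,999" -> "145000-179999".
        string cleaned = Regex.Replace(cell, @"[^\d-]+", "");

        // Split on the hyphen and discard empty pieces, such as the one left behind by "-Limit".
        return cleaned.Split(new[] { '-' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

The length of the returned array then tells you whether the cell held a range (two entries) or a single limit (one entry).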

You can treat each part separately by splitting the string on - and extracting only the digits from each part:
ArrayList mystrings = new ArrayList();
List<string> myList = Result.Rows[row + 2 + i][column].ToString().Split('-').ToList();
foreach(var item in myList)
{
string result = Regex.Replace(item, @"[^\d]", "");
mystrings.Add(result);
}

An alternative to using RegEx is to use the built-in string and char methods in the .NET Framework. Assuming the input string will always have a single hyphen:
string input = "$145,000-$179,999";
var split = input.Split( '-' )
.Select( x => string.Join( "", x.Where( char.IsLetterOrDigit ) ) )
.ToList();
string first = split.First(); //145000
string second = split.Last(); //179999
first you split the string using the standard Split method
then you create a new string by selectively taking only Letters or Digits from each item in the collection: x.Where...
then you join the string using the standard Join method
finally, take the first and last item in the collection for your 2 strings.

Match pattern of [0-9]-[0-9]-[0-9], but without matching [0-9]-[0-9]

I'm not sure how to accomplish this with a regular expression (or if I can; I'm new to regex). I have an angle value the user will type in and I'm trying to validate the entry. It is in the form degrees-minutes-seconds. The problem I'm having, is that if the user mistypes the seconds portion, I have to catch that error, but my match for degrees-minutes is a success.
Perhaps the method will explain better:
private Boolean isTextValid(String _angleValue) {
Regex _degreeMatchPattern = new Regex("0*[1-9]");
Regex degreeMinMatchPattern = new Regex("(0*[0-9]-{1}0*[0-9]){1}");
Regex degreeMinSecMatchPattern = new Regex("0*[0-9]-{1}0*[0-9]-{1}0*[0-9]");
Match _degreeMatch, _degreeMinMatch, _degreeMinSecMatch;
_degreeMinSecMatch = degreeMinSecMatchPattern.Match(_angleValue);
if (_degreeMinSecMatch.Success)
return true;
_degreeMinMatch = degreeMinMatchPattern.Match(_angleValue);
if (_degreeMinMatch.Success)
return true;
_degreeMatch = _degreeMatchPattern.Match(_angleValue);
if (_degreeMatch.Success)
return true;
return false;
}
}
I want to check for degrees-minutes if the degrees-minutes-seconds match is unsuccessful, but only if the user didn't enter any seconds data. Can I do this via regex, or do I need to parse the string and evaluate each portion separately? Thanks.
EDIT: Sample data would be 45-23-10 as correct data. The problem is 45-23 is also valid data; the 0 seconds is understood. So if the user types 45-23-1= by accident, the degreeMinMatchPattern regex in my code will match successfully, even though the input is invalid.
Second EDIT: Just to make it clear, the minutes and second portions are both optional. The user can type 45 and that is valid.

You can specify "this part of the pattern must match at least m times" using the {m,} syntax. Since there are hyphens between each component, specify the first part separately, and then each hyphen-digit combination can be grouped together after:
[0-9](-[0-9]){2,}
You can also shorten [0-9] to \d: \d(-\d){2,}

First off, a character in a regex is matched once by default, so {1} is redundant.
Second, since you can apparently isolate this value (you prompt for just this value, instead of having to look for it in a paragraph of entered data) you should include ^ and $ in your string, to enforce that the string should contain ONLY this pattern.
Try "^\d{1,3}-\d{1,2}(-\d{1,2})?$".
Breaking it down: ^ matches the beginning of the string. \d matches any single decimal digit, and then behind that you're specifying {1,3} which will match a run of one to three digits. Then you're looking for one dash, then a similar digit pattern but only one or two digits long. The last term is enclosed in parentheses so we can group the characters. Its form is similar to the first two, then there's a ? which marks the preceding group as optional. The $ at the end indicates that the input should end. Given this, it will match 222-33-44 or 222-33, but not 222-3344 or 222-33-abc.
Keep in mind there are additional rules you might want to incorporate. For instance, seconds can be expressed as a decimal (if you want a resolution smaller than one second). You would need to optionally expect the decimal point and one or more additional digits. Also, you probably have a maximum degree value; the above regex will match the maximum integer DMS value of 359-59-59, however it will also match 999-99-99 which is not valid. You can limit the maximum value using regex (for example "(3[0-5]\d|[1-2]\d{2}|\d{1,2})" will match any number from 0 to 359, by matching a 3, then 0-5, then 0-9, OR any 3-digit number starting with 1 or 2, OR any one- or two-digit number), but as the example shows the regex will get long and messy, so document it well in code as to what you're doing.

Maybe you would do better to just parse the input and check each piece separately.
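A rough sketch of what that could look like, assuming minutes and seconds are optional and that the limits are 359 degrees, 59 minutes and 59 seconds (integer values only). The class and method names are invented for the example.

```csharp
using System;
using System.Linq;

// Hypothetical validator: split on '-' and range-check each component.
static class AngleValidator
{
    // Assumed upper bounds for degrees, minutes and seconds.
    private static readonly int[] Max = { 359, 59, 59 };

    public static bool IsValid(string angle)
    {
        if (string.IsNullOrEmpty(angle)) return false;

        string[] parts = angle.Split('-');
        if (parts.Length > 3) return false;

        for (int i = 0; i < parts.Length; i++)
        {
            // Each component must be a non-empty run of ASCII digits.
            if (parts[i].Length == 0 || !parts[i].All(c => c >= '0' && c <= '9')) return false;

            // TryParse also rejects values too large for an int.
            if (!int.TryParse(parts[i], out int value) || value > Max[i]) return false;
        }
        return true;
    }
}
```

Here "45", "45-23" and "45-23-10" pass, while "45-23-1=" fails on the seconds component instead of falling back to a degrees-minutes match.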

I'm not sure I understand correctly, but I think
(?<degrees>0*[0-9])-?(?<minutes>0*[0-9])(?:-?(?<seconds>0*[0-9]))?$
might work.
But this is quite ambiguous; also I'm wondering why you're only allowing single-digit degree/minute/second values. Please show some examples you do and don't want to match.

Maybe you should try something like this and test for empty/invalid groups:
// Named groups; minutes and seconds are optional.
Regex pattern = new Regex(
@"(?<degrees>\d+)(?:-(?<minutes>\d+))?(?:-(?<seconds>\d+))?");
string[] samples = new []{ "123", "123-456", "123-456-789" };
foreach (var sample in samples)
{
Match m = pattern.Match(sample);
if(m.Success)
{
string degrees = m.Groups["degrees"].Value;
string minutes = m.Groups["minutes"].Value;
string seconds = m.Groups["seconds"].Value;
Console.WriteLine("{0}°{1}'{2}\"", degrees,
String.IsNullOrEmpty(minutes) ? "0" : minutes,
String.IsNullOrEmpty(seconds) ? "0" : seconds
);
}
}

I need a regex that validates for minimum 7 digits in the given string

I want to validate a phone number.
My condition is that I want a minimum of 7 digits in the given string, ignoring separators, X, and parentheses.
Actually I want to achieve this function in regex:
Func<string, bool> Validate = s => s.ToCharArray().Where(char.IsDigit).Count() >= 7;
Func<string, bool> RegexValidate = s => System.Text.RegularExpressions.Regex.IsMatch(s, @"regex pattern should come here.");
string x = "asda 1234567 sdfasdf";
string y = "asda sdfa 123456 sdfasdf";
bool xx = Validate(x); //true
bool yy = Validate(y); //false
The purpose of my need is I want to include this regex in an asp:RegularExpressionValidator

Seven or more digits, mixed with any number of any other kind of character? That doesn't seem like a very useful requirement, but here you go:
^\D*(?:\d\D*){7,}$
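As a quick sketch, this pattern can be dropped into the RegexValidate shape from the question. The class and method names below are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the pattern above.
static class PhoneCheck
{
    // Optional non-digits, then at least seven digits, each followed by optional non-digits.
    private static readonly Regex SevenDigits = new Regex(@"^\D*(?:\d\D*){7,}$");

    public static bool HasSevenDigits(string s)
    {
        return SevenDigits.IsMatch(s);
    }
}
```

With the sample strings from the question, "asda 1234567 sdfasdf" is accepted and "asda sdfa 123456 sdfasdf" is rejected. Because \d and \D never match the same character, the pattern does not backtrack badly on long inputs.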

(?:\d.*){7,}
(?:...) - non-capturing group: treats the contained pattern as a single unit
\d - match a digit
.* - match 0 or more of any character
{7,} - match 7 or more of the preceding pattern
If the only separators you want to ignore are spaces, dashes, parentheses, and the character 'X', then use this instead:
(?:\d[- ()X]*){7,}
[...] creates a character class, matching any one of the contained characters
The difference being, for example, that the first regex will match "a1b2c3d4e5f6g7h", and the second one won't.
As Gregor points out in the comments, the choice of regex depends on what function you're using it with. Some functions expect a regex to match the entire string, in which case you should add an extra .* in front to match any padding before the 7 digits. Some only expect a regex to match part of a string (which is what I expected in my examples).
According to the documentation for IsMatch() it only "indicates whether the regular expression finds a match in the input string," not requires it to match the entire string, so you shouldn't need to modify my examples for them to work.

Why do you want to use regular expressions for this? The first Validate function you posted which simply counts the number of digits is vastly more comprehensible, and probably faster as well. I'd just ditch the unnecessary ToCharArray call, collapse the predicate into the Count function and be done with it:
s.Count(char.IsDigit) >= 7;
Note that if you only want to accept 'normal' numbers (i.e. 0-9) then you'd want to change the validation function, as IsDigit matches many different number representations, e.g.
s.Count(c => c >= '0' && c <= '9') >= 7;
