Regular expression to validate string format to specific length with csv - c#

I am trying to write a regular expression which validates a text box to have only digits with length 5 or 9. I found the below regular expression to get this done
^\d{1,5}([,]\d{5})*$
but it does not meet my requirement. Can anyone please help me modify it, or write a new regular expression, so that it supports the patterns below?
09103,09101, valid (ending with comma)
09103,09101 valid (not ending with comma)
12345,1234567 Invalid (the first value has 5 digits, but the second has neither 5 nor 9)
12345,123456789 valid (each value must be 5 or 9 digits long)

Please try the following:
var lines = new []
{
"09103,09101,",
"09103,09101",
"12345,1234567",
"12345,123456789",
"12345"
};
var re = new Regex(@"^\d{1,5}(,(\d{5}|\d{9}))?,?$");
foreach (var line in lines)
{
Console.WriteLine("{0} = {1}", line, re.IsMatch(line) ? "Valid" : "Invalid");
}
Output
09103,09101, = Valid
09103,09101 = Valid
12345,1234567 = Invalid
12345,123456789 = Valid
12345 = Valid

Just make the part that is preceded by the comma optional, so that the pattern also matches a single value such as 12345 or 12345,
^(?:\d{5}|\d{9})(?:,(?:\d{5}|\d{9}))?,?$

Try this, for your exact test cases.
^\d{5},?$|^\d{5},\d{5},?$|^\d{5},\d{9},?$
It uses the | character to separate 'alternative' patterns, read it as "or". I.e.
^\d{5}$ OR ^\d{5},\d{5}$ OR ^\d{5},\d{9}$
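The answers above allow at most two values. As a minimal sketch, assuming the text box may hold any number of comma-separated values (the `*` in the question's original pattern suggests this, but it is an assumption), the optional group can be repeated instead. The class and method names here are hypothetical, and `\z` is used in place of `$` so that a trailing newline is not accepted.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ZipListValidator
{
    // Each value is 5 or 9 digits; values are separated by commas;
    // one trailing comma is tolerated. Hypothetical helper, not from the thread.
    private static readonly Regex Pattern =
        new Regex(@"^(?:\d{5}|\d{9})(?:,(?:\d{5}|\d{9}))*,?\z");

    public static bool IsValid(string input)
    {
        return Pattern.IsMatch(input);
    }

    public static void Main()
    {
        var lines = new[]
        {
            "09103,09101,",
            "09103,09101",
            "12345,1234567",
            "12345,123456789",
            "12345,123456789,54321"
        };
        foreach (var line in lines)
        {
            Console.WriteLine("{0} = {1}", line, IsValid(line) ? "Valid" : "Invalid");
        }
    }
}
```

If only one or two values should ever be accepted, change the `*` back to `?`.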

Related

REGEX Matching string nonconsecutively

I'm trying to understand how to match a specific string that's held within an array (This string will always be 3 characters long, ex: 123, 568, 458 etc) and I would match that string to a longer string of characters that could be in any order (9841273 for example). Is it possible to check that at least 2 of the 3 characters in the string match (in this example) strMoves? Please see my code below for clarification.
private readonly string[] strSolutions = new string[8] { "123", "159", "147", "258", "357", "369", "456", "789" };
private static string strMoves = "1823742";
foreach (string strResult in strSolutions)
{
Regex rgxMain = new Regex("[" + strMoves + "]{2}");
if (rgxMain.IsMatch(strResult))
{
MessageBox.Show(strResult);
}
}
The portion where I have designated "{2}" in Regex is where I expected the result to check for at least 2 matching characters, but my logic is definitely flawed. It will return true IF the two characters are in consecutive order as compared to the string in strResult. If it's not in the correct order it will return false. I'm going to continue to research on this but if anyone has ideas on where to look in Microsoft's documentation, that would be greatly appreciated!
Correct order where it would return true: "144257" when matched to "123"
incorrect order: "35718" when matched to "123"
The 3 is before the 1, so it won't match.
You can use the following solution if you need to find at least two different not necessarily consecutive chars from a specified set in a longer string:
new Regex($@"([{strMoves}]).*(?!\1)[{strMoves}]", RegexOptions.Singleline)
It will look like
([1823742]).*(?!\1)[1823742]
Pattern details:
([1823742]) - Capturing group 1: one of the chars in the character class
.* - any zero or more chars as many as possible (due to RegexOptions.Singleline, . matches any char including newline chars)
(?!\1) - a negative lookahead that fails the match if the next char is a starting point of the value stored in the Group 1 memory buffer (since it is a single char here, the next char should not equal the text in Group 1, one of the specified digits)
[1823742] - one of the chars in the character class.
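A minimal sketch of how this pattern might be applied to the arrays from the question, next to a non-regex check that counts distinct shared characters. The class and method names are hypothetical, and the sketch assumes `moves` contains only digits, so nothing needs escaping inside the character class.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class MoveMatcher
{
    // Regex version: two different characters from `moves`, in any order,
    // not necessarily adjacent.
    public static bool RegexHasTwo(string candidate, string moves)
    {
        var rx = new Regex($@"([{moves}]).*(?!\1)[{moves}]", RegexOptions.Singleline);
        return rx.IsMatch(candidate);
    }

    // Non-regex version: count the distinct characters of `candidate`
    // that also occur in `moves`.
    public static bool LinqHasTwo(string candidate, string moves)
    {
        return candidate.Distinct().Count(c => moves.Contains(c)) >= 2;
    }

    public static void Main()
    {
        string[] solutions = { "123", "159", "147", "258", "357", "369", "456", "789" };
        string moves = "1823742";
        foreach (string s in solutions)
        {
            Console.WriteLine("{0}: regex={1}, linq={2}",
                s, RegexHasTwo(s, moves), LinqHasTwo(s, moves));
        }
    }
}
```

With the question's data, both checks should agree on every entry: 123, 147, 258, 357 and 789 share at least two digits with 1823742, while 159, 369 and 456 share only one.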

Regex to extract data from card swipe

I'm searching for a regular expression that will extract 0263563
from
;010263563=2119?
and 0267829
from
%00000026782904?;010267829=4119?
(Must be the same regular expression).
Start at the 3rd character after the semicolon and take 7 characters.
;..(\d{7})
or more general:
;..(.{7})
Or, according to the comment "To clarify, characters to take are digits":
;\d\d(\d{7})
Well, you need this:
;\d{2}(\d+)
The first group will contain the number you want.
;[\S]{2}([\S]{7})
Since you said characters and not numbers, but it would work either way
For rules that are that precise (start 3 chars after ;, then next 7), you could use a plain substring:
string s = "%00000026782904?;010267829=4119?";
var pos = s.IndexOf(';');
var number = s.Substring(pos+3, 7);
And of course, test whether that IndexOf really found the ;
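A minimal sketch of that check, wrapped in a hypothetical `TryExtract` helper (the name and the Try-pattern signature are assumptions, not part of the answer). It returns false when there is no `;` or when fewer than 7 characters follow the two skipped ones.

```csharp
using System;

public static class SwipeParser
{
    // Start at the 3rd character after the first ';' and take 7 characters.
    public static bool TryExtract(string swipe, out string number)
    {
        number = null;
        if (swipe == null) return false;

        int pos = swipe.IndexOf(';');
        // IndexOf returns -1 when ';' is absent; also guard against short input.
        if (pos < 0 || pos + 3 + 7 > swipe.Length) return false;

        number = swipe.Substring(pos + 3, 7);
        return true;
    }

    public static void Main()
    {
        string number;
        if (TryExtract("%00000026782904?;010267829=4119?", out number))
            Console.WriteLine(number); // 0267829
        if (TryExtract(";010263563=2119?", out number))
            Console.WriteLine(number); // 0263563
    }
}
```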
The regex below matches exactly 7 digits that are preceded by a ; and any two characters.
(?<=;.{2})\d{7}
Code:
String input = @";010263563=2119?
%00000026782904?;010267829=4119?";
Regex rgx = new Regex(@"(?<=;.{2})\d{7}");
foreach (Match m in rgx.Matches(input))
Console.WriteLine(m.Groups[0].Value);
Output:
0263563
0267829
Assuming regex is not an absolute requirement, you could use:
var input = "%00000026782904?;010267829=4119?";
// output will be: 0267829
var digits = input.SkipWhile(x => x != ';').Skip(3).Take(7).ToArray();
var output = new string(digits);

Simple regex failing test to determine a 3 digit number

I have had a difficult time wrapping my head around regular expressions. In the following code, I used a Regex to determine if the data passed was a 1 to 3 digit number. The expression worked if the data started with a number (e.g. "200"), but it also passed if the data had a letter anywhere after the first digit (e.g. "3A5"). I managed to handle the error with the Int32.TryParse() method, but it seems there should be an easier way.
if (LSK == MainWindow.LSK6R)
{
int ci;
int length = SP_Command.Length;
if (length > 3) return MainWindow.ENTRY_OUT_OF_RANGE; //Cannot be greater than 999
String pattern = @"[0-9]{1,3}"; //RegEx pattern for 1 to 3 digit number
if (Regex.IsMatch(SP_Command, pattern)) //Does not check for ^A-Z. See below.
{
bool test = Int32.TryParse(SP_Command, out ci); //Try to parse; this succeeds
if (test) //only if there is no letter
{
FlightPlan.CostIndex = ci; //Update the flightplan CI
CI.Text = ci.ToString(); //Update the Init page
}
else return MainWindow.FORMAT_ERROR; //It contained a letter
}
else return MainWindow.FORMAT_ERROR; //It didn't fit the RegEx
}
Regex.IsMatch searches the input string for the pattern (and thus returns true for 3A5 because it finds 3).
You should also include start (^) and end ($) of string:
String pattern = @"^[0-9]{1,3}$";
Adding line begin/end should help.
^[0-9]{1,3}$
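A short sketch of the difference the anchors make, using the inputs from the question. The class and method names are hypothetical. One detail beyond the answers above: in .NET, `$` (without RegexOptions.Multiline) also matches just before a final newline, so the helper uses `\z` as a stricter variant; plain `$` is fine if a trailing newline can never occur.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CostIndexCheck
{
    // True only when the whole string is 1 to 3 ASCII digits.
    public static bool IsOneToThreeDigits(string s)
    {
        return Regex.IsMatch(s, @"^[0-9]{1,3}\z");
    }

    public static void Main()
    {
        // Unanchored: finds "3" inside "3A5", so it reports a match.
        Console.WriteLine(Regex.IsMatch("3A5", @"[0-9]{1,3}"));     // True
        // Anchored: the whole input must fit the pattern.
        Console.WriteLine(Regex.IsMatch("3A5", @"^[0-9]{1,3}$"));   // False
        Console.WriteLine(Regex.IsMatch("200", @"^[0-9]{1,3}$"));   // True
        // "$" tolerates one trailing newline; "\z" does not.
        Console.WriteLine(Regex.IsMatch("200\n", @"^[0-9]{1,3}$")); // True
        Console.WriteLine(IsOneToThreeDigits("200\n"));             // False
    }
}
```

With the anchored pattern, the separate length check and the fallback on TryParse failing are no longer needed to reject input such as "3A5" or "1234".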

How to find repeatable characters

I can't understand how to solve the following problem:
I have input string "aaaabaa" and I'm trying to search for string "aa" (I'm looking for positions of characters)
Expected result is
0 1 2 5
aa aabaa
a aa abaa
aa aa baa
aaaab aa
This problem is already solved by me using another approach (non-RegEx).
But I need a RegEx. I'm new to RegEx, so a Google search doesn't really help me.
Any help appreciated! Thanks!
P.S.
I've tried to use (aa)* and "\b(\w+(aa))*\w+" but those expressions are wrong
You can solve this by using a lookahead
a(?=a)
will find every "a" that is followed by another "a".
If you want to do this more generally
(\p{L})(?=\1)
This will find every character that is followed by the same character. Every letter found is stored in a capturing group (because of the parentheses around it), and this capturing group is then reused by the positive lookahead assertion (the (?=...)) by means of \1 (\1 holds the matched character).
\p{L} is a unicode code point with the category "letter"
Code
String text = "aaaabaa";
Regex reg = new Regex(#"(\p{L})(?=\1)");
MatchCollection result = reg.Matches(text);
foreach (Match item in result) {
Console.WriteLine(item.Index);
}
Output
0
1
2
5
The following code should work with any regular expression without having to change the actual expression:
Regex rx = new Regex(@"(a)\1"); // or any other word you're looking for.
int position = 0;
string text = "aaaaabbbbccccaaa";
int textLength = text.Length;
Match m = rx.Match(text, position);
while (m != null && m.Success)
{
Console.WriteLine(m.Index);
if (m.Index <= textLength)
{
m = rx.Match(text, m.Index + 1);
}
else
{
m = null;
}
}
Console.ReadKey();
It uses the option to change the start index of a regex search for each consecutive search. The actual problem comes from the fact that the Regex engine, by default, will always continue searching after the previous match. So it will never find a possible match within another match, unless you instruct it to by using a Look ahead construction or by manually setting the start index.
Another, relatively easy, solution is to just stick the whole expression in a forward look ahead:
string expression = @"(a)\1";
Regex rx2 = new Regex("(?=" + expression + ")");
MatchCollection ms = rx2.Matches(text);
var indexes = ms.Cast<Match>().Select(match => match.Index);
That way the engine will automatically advance the index by one for every match it finds.
From the docs:
When a match attempt is repeated by calling the NextMatch method, the regular expression engine gives empty matches special treatment. Usually, NextMatch begins the search for the next match exactly where the previous match left off. However, after an empty match, the NextMatch method advances by one character before trying the next match. This behavior guarantees that the regular expression engine will progress through the string. Otherwise, because an empty match does not result in any forward movement, the next match would start in exactly the same place as the previous match, and it would match the same empty string repeatedly.
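A minimal, self-contained sketch of the lookahead wrapper applied to the string from the question. The class and method names are hypothetical. Because every match is empty, the engine advances one character at a time, as the quoted documentation describes, so overlapping occurrences are reported.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class OverlapFinder
{
    // Wraps `expression` in a lookahead so each match is zero-width,
    // then returns the start index of every position where it matches.
    public static int[] FindAll(string text, string expression)
    {
        var rx = new Regex("(?=" + expression + ")");
        return rx.Matches(text).Cast<Match>().Select(m => m.Index).ToArray();
    }

    public static void Main()
    {
        // The question's input: "aa" occurs at 0, 1, 2 and 5.
        Console.WriteLine(string.Join(" ", FindAll("aaaabaa", "aa")));
        // The answer's input, with a backreference inside the expression.
        Console.WriteLine(string.Join(" ", FindAll("aaaaabbbbccccaaa", @"(a)\1")));
    }
}
```

The first call prints 0 1 2 5, matching the expected result in the question.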
Try this:
How can I find repeated characters with a regex in Java?
It is in Java, but both the regex and the non-regex approach are covered there. C# regex syntax is very similar to Java's.

Regex: replace inner string

I'm working with X12 EDI Files (Specifically 835s for those of you in Health Care), and I have a particular vendor who's using a non-HIPAA compliant version (3090, I think). The problem is that in a particular segment (PLB- again, for those who care) they're sending a code which is no longer supported by the HIPAA Standard. I need to locate the specific code, and update it with a corrected code.
I think a Regex would be best for this, but I'm still very new to Regex, and I'm not sure where to begin. My current methodology is to turn the file into an array of strings, find the array that starts with "PLB", break that into an array of strings, find the code, and change it. As you can guess, that's very verbose code for something which should be (I'd think) fairly simple.
Here's a sample of what I'm looking for:
~PLB|1902841224|20100228|49>KC15X078001104|.08~
And here's what I want to change it to:
~PLB|1902841224|20100228|CS>KC15X078001104|.08~
Any suggestions?
UPDATE: After review, I found I hadn't quite defined my question well enough. The record above is an example, but it is not necessarily an exact formatting match: there are three things which could change between this record and some other (in another file) I'd have to fix. They are:
The Pipe (|) could potentially be any non-alpha numeric character. The file itself will define which character (normally a Pipe or Asterisk).
The > could also be any other non-alpha numeric character (most often : or >)
The set of numbers immediately following the PLB is an identifier, and could change in format and length. I've only ever seen numeric IDs there, but technically it could be alphanumeric, and it won't necessarily be 10 characters.
My plan is to use String.Format() with my Regex match string so that | and > can be replaced with the correct characters.
And for the record. Yes, I hate ANSI X12.
Assuming that the "offending" code is always 49, you can use the following:
resultString = Regex.Replace(subjectString, @"(?<=~PLB\|\d{10}\|\d{8}\|)49(?=>\w+\|)", "CS");
This looks for 49 if it's the first element after a | delimiter, preceded by a group of 8 digits, another |, a group of 10 digits, yet another |, and ~PLB. It also looks if it is followed by >, then any number of alphanumeric characters, and one more |.
With the new requirements (and the lucky coincidence that .NET is one of the few regex flavors that allow variable repetition inside lookbehind), you can change that to:
resultString = Regex.Replace(subjectString, @"(?<=~PLB\1\w+\1\d{8}(\W))49(?=\W\w+\1)", "CS");
Now any non-alphanumeric character is allowed as separator instead of | or > (but in the case of | it has to be always the same one), and the restrictions on the number of characters for the first field have been loosened.
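If the separators are already known from the file, the String.Format() plan from the question can be sketched as follows. This is a minimal sketch, not the answer's own pattern: the class, method and parameter names are hypothetical, and the separators are passed through Regex.Escape so that characters such as | or * are treated literally.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PlbFixer
{
    // Replaces oldCode with newCode when it is the first component of the
    // element that follows the 8-digit date in a PLB segment.
    public static string ReplaceCode(string text, char elementSep, char componentSep,
                                     string oldCode, string newCode)
    {
        string e = Regex.Escape(elementSep.ToString());
        string c = Regex.Escape(componentSep.ToString());
        // {{8}} is a literal {8} quantifier once String.Format has run.
        string pattern = string.Format(
            @"(?<=~PLB{0}\w+{0}\d{{8}}{0}){2}(?={1}\w+{0})",
            e, c, Regex.Escape(oldCode));
        return Regex.Replace(text, pattern, newCode);
    }

    public static void Main()
    {
        Console.WriteLine(ReplaceCode(
            "~PLB|1902841224|20100228|49>KC15X078001104|.08~", '|', '>', "49", "CS"));
        Console.WriteLine(ReplaceCode(
            "~PLB*1902841224*20100228*49:KC15X078001104*.08~", '*', ':', "49", "CS"));
    }
}
```

Note that newCode is used as a regex replacement string, so a value containing `$` would need escaping; "CS" does not.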
Another, similar approach that works on any valid X12 file to replace a single data value with another on a matching segment:
public void ReplaceData(string filePath, string segmentName,
int elementPosition, int componentPosition,
string oldData, string newData)
{
string text = File.ReadAllText(filePath);
Match match = Regex.Match(text,
@"^ISA(?<e>.).{100}(?<c>.)(?<s>.)(\w+.*?\k<s>)*IEA\k<e>\d*\k<e>\d*\k<s>$");
if (!match.Success)
throw new InvalidOperationException("Not an X12 file");
char elementSeparator = match.Groups["e"].Value[0];
char componentSeparator = match.Groups["c"].Value[0];
char segmentTerminator = match.Groups["s"].Value[0];
var segments = text
.Split(segmentTerminator)
.Select(s => s.Split(elementSeparator)
.Select(e => e.Split(componentSeparator)).ToArray())
.ToArray();
foreach (var segment in segments.Where(s => s[0][0] == segmentName &&
s.Count() > elementPosition &&
s[elementPosition].Count() > componentPosition &&
s[elementPosition][componentPosition] == oldData))
{
segment[elementPosition][componentPosition] = newData;
}
File.WriteAllText(filePath,
string.Join(segmentTerminator.ToString(), segments
.Select(e => string.Join(elementSeparator.ToString(),
e.Select(c => string.Join(componentSeparator.ToString(), c))
.ToArray()))
.ToArray()));
}
The regular expression used validates a proper X12 interchange envelope and assures that all segments within the file contain at least a one character name element. It also parses out the element and component separators as well as the segment terminator.
Assuming that your code is always a two digit number that comes after a pipe character | and before the greater than sign > you can do it like this:
var result = Regex.Replace(yourString, @"(\|)(\d{2})(>)", @"$1CS$3");
You can break it down with a regex, yes.
If I understand your example correctly, the 2 characters between the | and the > need to be letters and not digits.
~PLB\|\d{10}\|\d{8}\|(\d{2})>\w{14}\|\.\d{2}~
This pattern will match the old segment and capture the characters between the | and the >. You can then use the captured value to decide on the replacement (look it up in a database, for example) and do the replace with the following pattern:
(?<=\|)\d{2}(?=>)
This will look for the ~PLB|#|#| at the start and replace the 2 numbers before the > with CS.
Regex.Replace(testString, @"(?<=~PLB\|[0-9]{10}\|[0-9]{8})(\|)([0-9]{2})(>)", @"$1CS$3")
The X12 protocol standard allows the specification of element and component separators in the header, so anything that hard-codes the "|" and ">" characters could eventually break. Since the standard mandates that the characters used as separators (and segment terminators, e.g., "~") cannot appear within the data (there is no escape sequence to allow them to be embedded), parsing the syntax is very simple. Maybe you're already doing something similar to this, but for readability...
// The original segment string (without segment terminator):
string segment = "PLB|1902841224|20100228|49>KC15X078001104|.08";
// Parse the segment into elements, then the fourth element
// into components (bounds checking is omitted for brevity):
var elements = segment.Split('|');
var components = elements[3].Split('>');
// If the first component is the bad value, replace it with
// the correct value (again, not checking bounds):
if (components[0] == "49")
components[0] = "CS";
// Reassemble the segment by joining the components into
// the fourth element, then the elements back into the
// segment string:
elements[3] = string.Join(">", components);
segment = string.Join("|", elements);
Obviously more verbose than a single regular expression but parsing X12 files is as easy as splitting strings on a single character. Except for the fixed length header (which defines the delimiters), an entire transaction set can be parsed with Split:
// Starting with a string that contains the entire 835 transaction set:
var segments = transactionSet.Split('~');
var segmentElements = segments.Select(s => s.Split('|')).ToArray();
// segmentElements contains an array of element arrays,
// each composite element can be split further into components as shown earlier
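The fixed-length header mentioned above can supply the delimiters instead of hard-coding them. The following is a minimal sketch, assuming the character positions implied by the validation regex in the earlier answer (`^ISA(?<e>.).{100}(?<c>.)(?<s>.)`): element separator at index 3, component separator at index 104, segment terminator at index 105. The class and method names are hypothetical, and the header in Main is synthetic filler, not a real ISA segment.

```csharp
using System;

public static class X12Delimiters
{
    // Reads the three delimiters from the start of an interchange.
    public static (char Element, char Component, char Segment) Read(string interchange)
    {
        if (interchange == null || interchange.Length < 106 ||
            !interchange.StartsWith("ISA", StringComparison.Ordinal))
        {
            throw new FormatException("Not an X12 interchange header");
        }
        return (interchange[3], interchange[104], interchange[105]);
    }

    public static void Main()
    {
        // Synthetic 106-character header followed by the sample PLB segment.
        string file = "ISA" + "|" + new string('0', 100) + ">" + "~"
                    + "PLB|1902841224|20100228|49>KC15X078001104|.08~";
        var d = Read(file);
        foreach (string segment in file.Split(d.Segment))
        {
            string[] elements = segment.Split(d.Element);
            Console.WriteLine("{0}: {1} element(s)", elements[0], elements.Length);
        }
    }
}
```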
What I found is working is the following:
parts = original.Split(record);
for(int i = parts.Length -1; i >= 0; i--)
{
string s = parts[i];
string nString =String.Empty;
if (s.StartsWith("PLB"))
{
string[] elems = s.Split(elem);
if (elems[3].Contains("49" + subelem.ToString()))
{
string regex = string.Format(@"(\{0})49({1})", elem, subelem);
nString = Regex.Replace(s, regex, @"$1CS$2");
}
}
}
I'm still having to split my original file into a set of strings and then evaluate each string, but that seems to be working now.
If anyone knows how to get around that string.Split up at the top, I'd love to see a sample.
