Realtime validation and formatting of textbox input with regex


I have a WPF application where users can change the product number of a product connected to the computer via USB. The sticker on the product lists the product number in this format: 111 22 33-44.
Today the users may only enter digits (111223344) in the textbox. The input is validated with a regex that checks for nine digits. But now the client wants the users to be able to either:
Enter the number as digits only, with the string formatted as it is being typed. When the user has typed "1112" it should automatically be formatted as "111 2" in the textbox, and so on. When the user has entered all nine digits it should look like 111 22 33-44.
Enter the number as it is written on the sticker (with spaces, etc.).
But at the same time the product number must be validated to include only nine digits. The spaces and "-" must be invisible to the validation.
I could easily have solved this in code, but the problem here is that this validation/formatting must be fully configurable in a config file. There are various categories of products that can be serviced by this application, and the format of the product number may vary.
Is this solvable in a fairly easy way with regex? I really can't see how I can combine the two, validation and formatting:
^\d{9}$ - for validating nine digits
(\w{3})(\w{2})?(\w{2})?(\w{2})? - for formatting together with replacement pattern $1 $2 $3-$4. This pattern does however only format nine digits without spaces and "-"
Any suggestions?
EDIT:
It seems like I would need at least three regex patterns for this to work:
One for validating the product number itself (not the display value). Is it 9 digits?
One for formatting the display value (123456789 = 123 45 67-89).
One for stripping the characters added by the formatting (blanks and -).
Maybe a simpler solution would be to keep the current validation (for example ^\d{9}$) that validates the raw value, and then simply add a setting called DisplayMask where the people responsible for the configuration can enter something like this:
"### ## ##-##"
And then I write code that uses this mask for formatting the display value. This has several advantages:
Very easy to understand for the people responsible for the configuration.
It also lets me easily retrieve all characters that need to be stripped from the entered value: take the display mask and remove every #. The characters that are left are the ones that must be stripped from the product number before it is written to the product hardware.
It also makes it very easy to set the max length of the textbox: max length of the raw product number + number of added characters from the display mask.
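A minimal sketch of how such a mask could be applied, assuming '#' is the only placeholder and every other mask character is a literal. The class and method names (DisplayMask, Apply, Strip) are made up for illustration and are not part of any framework:

```csharp
using System;
using System.Linq;
using System.Text;

public static class DisplayMask
{
    // Assumed convention: '#' stands for one character of the raw value.
    private const char Placeholder = '#';

    // Removes every literal mask character (for "### ## ##-##": ' ' and '-') from the typed text.
    public static string Strip(string input, string mask)
    {
        char[] literals = mask.Where(c => c != Placeholder).Distinct().ToArray();
        return new string(input.Where(c => !literals.Contains(c)).ToArray());
    }

    // Lays the raw value over the mask and stops as soon as the raw value runs out,
    // so a partially typed number gets a partially filled mask.
    public static string Apply(string raw, string mask)
    {
        var sb = new StringBuilder();
        int next = 0;
        foreach (char m in mask)
        {
            if (next >= raw.Length) break;
            sb.Append(m == Placeholder ? raw[next++] : m);
        }
        return sb.ToString();
    }
}
```

With the mask "### ## ##-##", Apply("1112", mask) gives "111 2" and Apply("111223344", mask) gives "111 22 33-44". The stripped value can still be checked with the existing ^\d{9}$ pattern, and mask.Length (12 here) can serve as the textbox max length.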

If you'd like to make this configurable, the key is the Regex class in the System.Text.RegularExpressions namespace.
A regular expression stored in an external config file can easily be loaded and used for matching with Regex.IsMatch(), in particular the static IsMatch(string, string) overload.
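As a rough sketch of that idea: the dictionary below is only a stand-in for whatever the config file contains, and the category names and the ProductNumberValidator class are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ProductNumberValidator
{
    // Hypothetical stand-in for patterns read from a config file, keyed by product category.
    private static readonly Dictionary<string, string> Patterns = new Dictionary<string, string>
    {
        { "CategoryA", @"^\d{9}$" },
        { "CategoryB", @"^\d{6}$" }
    };

    // Returns true only when the category is known and the raw value matches its pattern.
    public static bool IsValid(string category, string rawValue)
    {
        string pattern;
        return Patterns.TryGetValue(category, out pattern) && Regex.IsMatch(rawValue, pattern);
    }
}
```

Because the pattern is plain data, a new product category only needs a new config entry, not a code change.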

Have a look at this one: "Regex in PreviewTextInput: only decimals between 0.0 and 1.0".
You can use the PreviewTextInput event here, then use Regex.Replace to set the format. Here is an example:
string inputStr = "111223"; // e.Text;
// Drop everything that is not a digit
inputStr = Regex.Replace(inputStr, @"\D", string.Empty);
if (inputStr.Length > 0)
{
    // Keep at most nine digits
    inputStr = inputStr.Substring(0, Math.Min(9, inputStr.Length));
    // Separators to insert after N leading digits, processed from right to left
    List<string[]> tmp = new List<string[]>() { new string[] { "7", "-" }, new string[] { "5", " " }, new string[] { "3", " " } };
    foreach (var arr in tmp)
    {
        inputStr = Regex.Replace(inputStr, @"(?<=^\d{" + arr[0] + "})", arr[1]);
    }
}
Console.WriteLine(inputStr);

First off, after the text is formatted you have an entry with a length of 12. I would set the TextBox.MaxLength = 12 to limit the amount of data that can be entered.
As far as validating, there's probably a "cleaner" way of doing this, but to start with you can have a series of Regex.IsMatch() conditions that will auto format the input.
For example:
1112 => 111 2
111 223 => 111 22 3
111 22 334 => 111 22 33-4
Then there's a final Regex.IsMatch() check that the input is in the format of
### ## ##-##
Code Sample:
private void textBox1_TextChanged(object sender, EventArgs e)
{
    string text = textBox1.Text;
    if (Regex.IsMatch(text, "\\d{4}"))
    {
        textBox1.Text = Regex.Replace(text, "(\\d{3})(\\d)", "$1 $2");
    }
    else if (Regex.IsMatch(text, "\\d{3} \\d{3}"))
    {
        textBox1.Text = Regex.Replace(text, "(\\d{3} \\d{2})(\\d)", "$1 $2");
    }
    else if (Regex.IsMatch(text, "\\d{3} \\d{2} \\d{3}"))
    {
        textBox1.Text = Regex.Replace(text, "(\\d{3} \\d{2} \\d{2})(\\d)", "$1-$2");
    }
    else if (!Regex.IsMatch(text, "\\d{3} \\d{2} \\d{2}-\\d{2}"))
    {
        // Invalid entry
    }
    // Keep the cursor at the end of the input
    textBox1.SelectionStart = textBox1.Text.Length;
}

Related

How to extract digits between two fixed strings in Arabic Language?

I have a string in the format:
خصم بقيمة 108 بتاريخ 31-01-2021
And I want to replace the digits between the words: بقيمة & بتاريخ with a "?" character.
And keep the digits in the date part of the string
I tried using this Regular Expression: (?<=بقيمة)(.*?)(?=بتاريخ)
Which works on https://regex101.com/
But when I implement it in C# in Regex.Replace function, it doesn't have any effect when I use the Arabic words:
e.Row.Cells[3].Text = Regex.Replace(e.Row.Cells[3].Text, "(?<=بقيمة)(.*?)(?=بتاريخ)", "?");
But it works if I use Latin letters:
e.Row.Cells[3].Text = Regex.Replace(e.Row.Cells[3].Text, "(?<=X)(.*?)(?=Y)", "?");
Is there anyway to make the function work with Arabic characters?
Or is there a better approach I can take to achieve the desired result? For example excluding the date part?

Since the digits you need (without "-"s) are bookended by spaces, just use \s(\d+)\s and read the first capture group.
var txt = "خصم بقيمة 108 بتاريخ 12-31-2021";
var pattern = @"\s(\d+)\s";
Console.WriteLine( Regex.Match(txt, pattern).Groups[1].Value ); // 108
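The question asks for a replacement rather than an extraction. One possible sketch uses lookarounds so that only a digit run standing alone between whitespace is replaced with "?"; the AmountMasker class name is hypothetical, and this assumes the amount is always surrounded by whitespace:

```csharp
using System;
using System.Text.RegularExpressions;

public static class AmountMasker
{
    // Replaces a run of digits that has whitespace on both sides with "?".
    // The digits of the date are left alone because they touch a '-'.
    public static string MaskAmount(string text)
    {
        return Regex.Replace(text, @"(?<=\s)\d+(?=\s)", "?");
    }
}
```

This avoids putting the Arabic words into the pattern at all, so it does not depend on how the surrounding text is encoded in the grid cell.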

Display numbers with 20 in front of var

How can I display numbers with 20 in front of the var? (i.e. \20172018\INV instead of \1718\INV)

Insert your 20 at particular places by building a new string from substrings. Assuming your variable is in a string called s:
string newString = s.Substring(0,1) + "20" + s.Substring(1,2) + "20" + s.Substring(3,6);

How do you determine where the 20 goes? Is it after every two digits?
The regular expression var r = Regex.Replace("1718\\NV", "\\d{2}", "20$&"); for example assumes that you are looking for 2 digits each time (that's what the {2} means) and places the 20 in front of each pair of digits. It returns 20172018\\NV. If you want to place the 20 in front of each digit separately, then modify the RegEx pattern (2nd parameter). var r = Regex.Replace("1718\\NV", "\\d", "20$&"); returns 201207201208\\NV

Extracting Titles from strings with RegEx

I'm facing a problem caused by having to extract titles of programs from short strings whose structure can't be predicted at all. There are some patterns, as you can see below, and each string must be evaluated to see if it matches any of those structures so that I can properly get the title.
I've bought Mastering Regular Expressions, but the time I have to accomplish this doesn't allow me to study the book and get the necessary introduction to this (interesting but particular) topic.
Perhaps someone experienced in this area could help me understand how to accomplish this job?
Some random Name 2 - Ep.1
=> Some random Name 2
Some random Name - Ep.1
=> Some random Name
Boff another 2 name! - Ep. 228
=> Boff another 2 name!
Another one & the rest - T1 Ep. 2
=>Another one & the rest
T5 - Ep. 2 Another Name
=> Another Name
T3 - Ep. 3 - One More with an Hyfen
=> One More with an Hyfen
Another one this time with a Date - 02/12/2012
=>Another one this time with a Date
10 Aug 2012 - Some Other 2 - Ep. 2
=> Some Other 2
Ep. 93 - Some program name
=> Some Program name
Someother random name - Epis. 1 e 2
=> Someother random name
The Last one with something inside parenthesis (V.O.)
=> The Last one with something inside parenthesis
As you can see, the titles that I want to extract from the given string may contain numbers, special characters like &, and characters from a-zA-Z (I guess that's all).
The complex part is knowing whether the title is followed by one or more spaces and then a hyphen, and whether there are zero or more spaces before Ep. (I can't explain this well, it's just complex.)

This program will handle your cases. The main principle is that it removes a certain sequence if it is present at the beginning or the end of the string. You'll have to maintain the list of regular expressions if the format of the strings you want to remove changes, or change their order as needed.
using System;
using System.Text.RegularExpressions;
public class MyClass
{
    static string[] strs =
    {
        "Some random Name 2 - Ep.1",
        "Some random Name - Ep.1",
        "Boff another 2 name! - Ep. 228",
        "Another one & the rest - T1 Ep. 2",
        "T5 - Ep. 2 Another Name",
        "T3 - Ep. 3 - One More with an Hyfen",
        @"Another one this time with a Date - 02/12/2012",
        "10 Aug 2012 - Some Other 2 - Ep. 2",
        "Ep. 93 - Some program name",
        "Someother random name - Epis. 1 e 2",
        "The Last one with something inside parenthesis (V.O.)"
    };
    // Each pattern is stripped from the start and from the end of the string, in this order.
    static string[] regexes =
    {
        @"T\d+",
        @"\-",
        @"Ep(i(s(o(d(e)?)?)?)?)?\s*\.?\s*\d+(\s*e\s*\d+)*",
        @"\d{2}\/\d{2}\/\d{2,4}",
        @"\d{2}\s*[A-Z]{3}\s*\d{4}",
        @"T\d+",
        @"\-",
        @"\!",
        @"\(.+\)",
    };
    public static void Main()
    {
        foreach (var str in strs)
        {
            string cleaned = str.Trim();
            foreach (var cleaner in regexes)
            {
                cleaned = Regex.Replace(cleaned, "^" + cleaner, string.Empty, RegexOptions.IgnoreCase).Trim();
                cleaned = Regex.Replace(cleaned, cleaner + "$", string.Empty, RegexOptions.IgnoreCase).Trim();
            }
            Console.WriteLine(cleaned);
        }
        Console.ReadKey();
    }
}

If it's only about checking for patterns, and not actually extracting the title name, let me have a go:
With @"Ep(is)?\.?\s*\d+" you can check for strings such as "Ep1", "Ep01", "Ep.999", "Ep3", "Epis.0", "Ep 11" and similar (it also allows multiple whitespace characters between Ep and the numeral).
You may want to use RegexOptions.IgnoreCase if you want to match "ep1" as well as "Ep1" or "EP1".
If you are certain that no name will include a "-" and that this character separates the name from the episode info, you can try to split the string like this:
string[] splitString = inputString.Split(new char[] {'-'});
for (int i = 0; i < splitString.Length; i++)
{
    // Trim returns a new string, so the result has to be stored back
    splitString[i] = splitString[i].Trim(); // removes all leading and trailing whitespace
}
You'll have the name in either splitString[0] or splitString[1] and the episode info in the other.
To search for dates, you can use this: @"\d{1,4}(\\|/|\.|,)\d{1,2}(\\|/|\.|,)\d{1,4}" which can detect dates with the year at the front or the back written with 1 to 4 digits (except for the center value, which can be 1 to 2 digits long) and separated by a backslash, a slash, a dot or a comma.
Like I mentioned before: this will not allow your program to extract the actual title, only to find out whether such strings exist (those strings may still be part of the title itself).
Edit:
A way to get rid of multiple whitespace characters is to use inputString = Regex.Replace(inputString, @"\s+", " "), which replaces each run of whitespace with a single space. Maybe you have underscores instead of whitespace, such as "This_is_a_name"? In that case you might want to use inputString = Regex.Replace(inputString, "_+", " ") before collapsing the whitespace.

C# - Removing a Line that matches a Regex

I have some data.. it looks similar to this:
0423 222222 ADH, TEXTEXT
0424 1234 ADH,MORE TEXT
0425 98765 ADH, TEXT 3609
2000 98765-4 LBL,IUC,PCA,S/N
0010 99999-27 LBL,IUI,1.0x.25
9000 12345678 HERE IS MORE, TEXT
9010 123-123 SOMEMORE,TEXT1231
9100 SD178 YAYFOR, TEXT01
9999 90123 HEY:HOW-TO DOTHIS
And I would like to remove each entire line that begins with 9xxx. So far I have tried replacing the value using Regex. Here is what I have for that:
output = Regex.Replace(output, @"^9[\d]{3}\s+[\d*\-*\w*]+\s+[\d*\w*\-*\,*\:*\;*\.*\d*\w*]+", "");
However, this is really hard to read and it does not actually delete the entire line.
CODE:
Here is the section of the code I am using:
try
{
    // Resets the formattedTextRichTextBox so multiple files aren't loaded on top of each other.
    formattedTextRichTextBox.ResetText();
    foreach (string line in File.ReadAllLines(openFile.FileName))
    {
        // Uses regular expressions to find a line that has, digit(s), space(s), digit(s) + letter(s),
        // space(s), digit(s), space(s), any character (up to 25 times).
        Match theMatch = Regex.Match(line, @"^[\.*\d]+\s+[\d\w]+\s+[\d\-\w*]+\s+.{25}");
        if (theMatch.Success)
        {
            // Stores the matched value in string output.
            string output = theMatch.Value;
            // Replaces the text with the required layout.
            output = Regex.Replace(output, @"^[\.*\d]+\s+", "");
            //output = Regex.Replace(output, @"^9[\d]{3}\s+[\d*\-*\w*]+\s+[\d*\w*\-*\,*\:*\;*\.*\d*\w*]+", "");
            output = Regex.Replace(output, @"\s+", " ");
            // Sets the formattedTextRichTextBox to the string output.
            formattedTextRichTextBox.AppendText(output);
            formattedTextRichTextBox.AppendText("\n");
        }
    }
}
OUTCOME:
So what I would like the new data to look like is in this format (removed 9xxx):
0423 222222 ADH, TEXTEXT
0424 1234 ADH,MORE TEXT
0425 98765 ADH, TEXT 3609
2000 98765-4 LBL,IUC,PCA,S/N
0010 99999-27 LBL,IUI,1.0x.25
QUESTIONS:
Is there an easier way to go about this?
If so, can I use regex to go about this or must I use a different way?

Just reformulate the regex that tests your format to match everything that doesn't begin with 9 - that way lines starting with 9 are not added to the rich text box.

Try this (uses LINQ):
// Create a regex to identify lines that start with 9XXX
Regex rgx = new Regex(@"^9\d{3}");
// LINQ expression that filters out the lines that start with 9XXX
var validLines =
(
    // Specifies which enumeration to pick the data from
    from ln in File.ReadAllLines(openFile.FileName)
    // Specifies the filter to apply when selecting the data
    where !rgx.IsMatch(ln)
    // Specifies what to select from the filtered data
    select ln
).ToArray(); // Converts the IEnumerable<string> to an array of strings
// Finally, join the filtered entries with "\n" using String.Join and append the result to the text box
formattedTextRichTextBox.AppendText(String.Join("\n", validLines));

Yes, there is a simpler way. Just use Regex.Replace method, and provide Multiline option.
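A minimal sketch of that approach, assuming the whole file content is held in one string. The LineFilter class and its method name are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineFilter
{
    // With RegexOptions.Multiline, ^ matches at the start of every line.
    // The pattern consumes the rest of the line plus its line break (or the end of the text),
    // so no empty line is left behind.
    public static string RemoveNineThousandLines(string text)
    {
        return Regex.Replace(text, @"^9\d{3}.*(\r?\n|$)", string.Empty, RegexOptions.Multiline);
    }
}
```

Lines starting with any other digit are not touched, because the match is anchored to the start of a line.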

Why don't you just match the leading 9xxx part and then use a wildcard to match the rest of the line? It would be a lot more readable.
output = Regex.Replace(output, @"^9\d{3}.*", "");

Converting a normalized phone number to a user-friendly version

In my C# application, I use a regular expression to validate the basic format of a US phone number to make sure that the user isn't just entering bogus data. Then, I strip out everything except numbers, so this:
(123) 456-7890 x1234
becomes
12345678901234
in the database. In various parts of my application, however, I would like to convert this normalized phone number back to
(123) 456-7890 x1234
What's the best way to do such a thing? (Don't worry about accounting for international phone number formats, by the way.)

String.Format("{0:(###) ###-#### x ###}", double.Parse("1234567890123"))
Will result in (123) 456-7890 x 123

Using a regex you can replace:
(\d{3})(\d{3})(\d{4})(\d{4})
with:
(\1) \2-\3 x\4
(Though I'm not familiar with US phone numbers so maybe there's more to it.)
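In .NET the replacement string refers to groups as $1, $2 and so on rather than \1, so the direct equivalent would be Regex.Replace(digits, @"(\d{3})(\d{3})(\d{4})(\d{4})", "($1) $2-$3 x$4"). The sketch below is a slightly more forgiving variant that treats the extension as optional; the PhoneFormatter class name is hypothetical, and it assumes the stored value is ten digits followed by zero or more extension digits:

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneFormatter
{
    // Formats a normalized digit string as "(123) 456-7890" plus " x<ext>" when extra digits exist.
    // Input that does not have at least ten digits is returned unchanged.
    public static string Format(string digits)
    {
        Match m = Regex.Match(digits, @"^(\d{3})(\d{3})(\d{4})(\d*)$");
        if (!m.Success) return digits;

        string result = string.Format("({0}) {1}-{2}",
            m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value);
        if (m.Groups[4].Length > 0)
        {
            result += " x" + m.Groups[4].Value;
        }
        return result;
    }
}
```

Working on the string directly also avoids the round trip through a numeric type, so leading zeros in the stored value are preserved.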

I would just use a custom format string to transform the number back into the string:
class Program
{
static void Main(string[] args)
{
long phoneNumber = 12345678901234;
string phoneNumberString = String.Format("{0:(000) 000-0000 x0000}", phoneNumber);
Console.WriteLine(phoneNumberString);
}
}
Of course, you would factor it out into a function which would take the phone number as a long and then return the string (with the format loaded or stored as a constant in the method, or something appropriate for your situation).
Oh, and if you have it in a string and not a long, you can easily convert the string to a long, and then pass it to the format function. Of course, there are performance considerations here if you are doing it repeatedly (since you are iterating the string to create the long, and then converting it back to a string, when you could just use substring).

If you only support US numbers, you could simply format the digits to show parenthesis and x wherever you want.
I would prefer to store the whole string, I would parse it using a regex to validate it, then store it in a normalized string.
To make it accept any country, I would do this:
I would add the IDD code to all phone numbers, and then hide it from users from that country.
so: (123) 456-7890 x1234 would be stored as +1 (123) 456-7890 x1234
The (perl-compatible) regex would be something like (completely untested and wouldn't work) :
(+\d+)?\s+(((\d{,3}))(?\s+([-.0-9]{6,})\s+((x|ext\w*)\d{,4})
This is an optional number of digits preceded by +
Followed by one or more spaces
Then an optional group of up to 3 digits between parenthesis
Then one or more spaces
Then a group of 6 or more digits, dashes or dots
Then one or more spaces
Then an optional x or a word that begins with ext (ext, extension ...) and a group of up to 4 digits
I would have a database of users including country and area code, then fill those in automatically in case they're missing. The country would have its default digit grouping convention for phone numbers (3,4 for the US).
So if you're in area 123 in the US and enter 456.7890, it would be parsed as +1 (123) 4567890, and you would only see it as 456-7890.
If you're in Qatar and enter the number 4444555 extenshn 33, it is stored as +974 4444555 x33, and you would see it as 4444555 x33.
The international code will not be displayed for users in the same country, and the area code will not be displayed for users in the same country and area code. The full number would be displayed onmouseover (HTML label?)

Do you HAVE to break it down for the DB? If not, don't. If you MUST, then you can store the different parts in different fields (AreaCode, Prefix, SubscriberNum, Extension).
Or, extract the number and begin parsing. If it's only 10 digits, then you know there is no extension. All digits past 10, stick them in the string after an 'x' or something.
I did something similar in a C++ app I wrote that stored different contact mechanisms as a single string, but I did the reverse of what you are doing: I took the fields off a dialog and built the formatted number to store as a string.

Here's an extension method that might help:
public static string InsertStringAtPositions(this string str, string insertStr, IEnumerable<int> positions)
{
    if (str != null && insertStr != null && positions != null)
    {
        string newString = string.Empty;
        int previousPos = 0;
        // Positions are expected in ascending order; positions past the end of the string are skipped.
        foreach (var pos in positions)
        {
            if (pos < str.Length)
            {
                newString += str.Substring(previousPos, pos - previousPos) + insertStr;
                previousPos = pos;
            }
        }
        // Append whatever follows the last insertion point, even when later positions were skipped.
        return newString + str.Substring(previousPos);
    }
    return str;
}
Usage:
// Will convert "0399998888" to "03 9999 8888"
number.InsertStringAtPositions(" ", new[] {2, 6});
