Converting a normalized phone number to a user-friendly version

In my C# application, I use a regular expression to validate the basic format of a US phone number to make sure that the user isn't just entering bogus data. Then, I strip out everything except numbers, so this:
(123) 456-7890 x1234
becomes
12345678901234
in the database. In various parts of my application, however, I would like to convert this normalized phone number back to
(123) 456-7890 x1234
What's the best way to do such a thing? (Don't worry about accounting for international phone number formats, by the way.)

String.Format("{0:(###) ###-#### x ###}", double.Parse("1234567890123"))
Will result in (123) 456-7890 x 123

Using a regex you can replace:
(\d{3})(\d{3})(\d{4})(\d{4})
with:
(\1) \2-\3 x\4
(Though I'm not familiar with US phone numbers so maybe there's more to it.)
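In .NET the replacement syntax uses $1 rather than \1. The sketch below shows one way the regex idea could look in C#, assuming the stored value is ten digits followed by an optional extension of any length; the class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneRegexFormatter
{
    // Ten digits, then an optional extension of any length (an assumption).
    private static readonly Regex Pattern =
        new Regex(@"^(\d{3})(\d{3})(\d{4})(\d*)$");

    public static string Format(string digits)
    {
        Match m = Pattern.Match(digits);
        if (!m.Success)
        {
            return digits; // not a 10+ digit string: leave it unchanged
        }

        // Match.Result expands $1, $2, ... just like Regex.Replace does.
        string formatted = m.Result("($1) $2-$3");

        // Only append the " x" suffix when extension digits are present.
        return m.Groups[4].Length > 0
            ? formatted + " x" + m.Groups[4].Value
            : formatted;
    }
}
```

Calling PhoneRegexFormatter.Format("12345678901234") should give (123) 456-7890 x1234, and a plain ten-digit value comes back without the extension suffix.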

I would just use a custom format string to transform the number back into the string:
class Program
{
static void Main(string[] args)
{
long phoneNumber = 12345678901234;
string phoneNumberString = String.Format("{0:(000) 000-0000 x0000}", phoneNumber);
Console.WriteLine(phoneNumberString);
}
}
Of course, you would factor it out into a function which would take the phone number as a long and then return the string (with the format loaded or stored as a constant in the method, or something appropriate for your situation).
Oh, and if you have it in a string and not a long, you can easily convert the string to a long, and then pass it to the format function. Of course, there are performance considerations here if you are doing it repeatedly (since you are iterating the string to create the long, and then converting it back to a string, when you could just use substring).

If you only support US numbers, you could simply format the digits to show parentheses and an x wherever you want.
I would prefer to store the whole string: parse it with a regex to validate it, then store it as a normalized string.
To make it accept any country, I would do this:
I would add the IDD code to all phone numbers, and then hide it from users from that country.
so: (123) 456-7890 x1234 would be stored as +1 (123) 456-7890 x1234
The (perl-compatible) regex would be something like (completely untested and wouldn't work) :
(+\d+)?\s+(((\d{,3}))(?\s+([-.0-9]{6,})\s+((x|ext\w*)\d{,4})
This is an optional number of digits preceded by +
Followed by one or more spaces
Then an optional group of up to 3 digits between parentheses
Then one or more spaces
Then a group of 6 or more digits, dashes or dots
Then one or more spaces
Then an optional x or a word that begins with ext (ext, extension ...) and a group of up to 4 digits
I would have a database of users including country and area code, then fill those in automatically when they are missing. Each country would have its default digit-grouping convention for phone numbers (3,4 for the US).
So if you're in area 123 in the US and enter 456.7890, it would be parsed as +1 (123) 4567890, and you would only see it as 456-7890.
If you're in Qatar and enter the number 4444555 extenshn 33, it is stored as +974 4444555 x33, and you would see it as 4444555 x33.
The international code will not be displayed for users in the same country, and the area code will not be displayed for users in the same country and area code. The full number would be displayed onmouseover (HTML label?).

Do you HAVE to break it down for the DB? If not, don't. If you MUST, then you can either store the different parts in different fields (AreaCode, Prefix, SubscriberNum, Extension),
or extract the number and begin parsing. If it's only 10 digits, then you know there is no extension. All digits past 10, stick them in the string after an 'x' or something.
I did something similar to this in a C++ app I wrote that stored different contact mechanisms as a single string, but I did the reverse of what you are doing: I took the fields off a dialog and built the formatted number to store as a string.
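The ten-digit rule described above can be sketched with plain Substring calls, which also avoids the round trip through a numeric type. This is only an illustration: the class name is hypothetical, and it assumes the stored string contains digits only.

```csharp
using System;

public static class PhoneSubstringFormatter
{
    public static string Format(string digits)
    {
        // Anything shorter than a full US number is returned untouched.
        if (digits == null || digits.Length < 10)
        {
            return digits;
        }

        string result = "(" + digits.Substring(0, 3) + ") "
            + digits.Substring(3, 3) + "-"
            + digits.Substring(6, 4);

        // Every digit past the tenth is treated as the extension.
        if (digits.Length > 10)
        {
            result += " x" + digits.Substring(10);
        }

        return result;
    }
}
```

With this sketch, "12345678901234" would come back as (123) 456-7890 x1234 and "1234567890" as (123) 456-7890.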

Here's an extension method that might help:
// Requires: using System.Collections.Generic; using System.Text;
// Must be declared inside a static class, like any extension method.
public static string InsertStringAtPositions(this string str, string insertStr, IEnumerable<int> positions)
{
    if (str == null || insertStr == null || positions == null)
    {
        return str;
    }
    var result = new StringBuilder();
    int previousPos = 0;
    foreach (var pos in positions)
    {
        // Skip positions that are out of order or beyond the end of the string.
        if (pos >= previousPos && pos < str.Length)
        {
            result.Append(str, previousPos, pos - previousPos).Append(insertStr);
            previousPos = pos;
        }
    }
    // Append whatever remains after the last insertion point that was used.
    return result.Append(str, previousPos, str.Length - previousPos).ToString();
}
Usage:
// Will convert "0399998888" to "03 9999 8888"
number.InsertStringAtPositions(" ", new[] {2, 6});

Related

Realtime validation and formatting of textbox input with regex

I have WPF application where the users are able to change the product number of a product connected to the computer with USB. The sticker on the product lists the product number in this format: 111 22 33-44.
Today the users may only enter digits (111223344) in the textbox. The input is validated with regex that checks for nine digits. But now the client wants the users to be able to either:
Enter the number as digits only and format the string as it is being typed. When the user has typed "1112" it should automatically be formatted as "111 2" in the textbox and so on. When user has entered all nine digits it should look like 111 22 33-44
Enter the number as it is written on the sticker (with spaces, etc).
But at the same time the product number must be validated to include only nine digits. The spaces and "-" must be invisible
I could've easily solved this in code, but the problem here is that this validation/formatting must be fully configurable in a config file. There are various categories of products that can be serviced by this application and the format of the product number may vary.
Is this solvable in a fairly easy way with regex? I really can't see how i can combine the two, validation and formatting:
^\d{9}$ - for validating nine digits
(\w{3})(\w{2})?(\w{2})?(\w{2})? - for formatting together with replacement pattern $1 $2 $3-$4. This pattern does however only format nine digits without spaces and "-"
Any suggestions?
EDIT:
It seems like i would need to use at least three regex patterns for this to work:
for validating the valid product number (not the display value). Is it 9 digits?
for formatting the display value (123456789 = 123 45 67-89)
stripping the added characters from the formatting (blanks and -)
Maybe a simpler solution would be to keep the current validation (for example ^\d{9}$) that validates the raw value, and then simply add a setting called DisplayMask where the people that are responsible for the configuration can enter something like this:
"### ## ##-##"
And then i write code that uses this mask for formatting the display value. This has several advantages:
Very easy to understand for the people responsible for the configurations
This will also enable me to easily retrieve all characters that need to be stripped from the entered value by simply taking the display mask and removing every #. The characters that are left are the ones that must be stripped from the product number before it is written to the product hardware.
Also makes it very easy to set the max length of the textbox. Max length of product number raw value + number of added characters from display mask.
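A minimal sketch of how such a DisplayMask setting could be applied, assuming '#' marks a slot for one raw character and every other mask character is a literal. The class and method names are hypothetical, not part of WPF or any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class DisplayMask
{
    // Fills the '#' slots of the mask with raw characters and stops as soon
    // as the raw value runs out, so partial input is formatted as typed.
    public static string Apply(string raw, string mask)
    {
        var sb = new StringBuilder();
        int next = 0;
        foreach (char m in mask)
        {
            if (next >= raw.Length)
            {
                break;
            }
            sb.Append(m == '#' ? raw[next++] : m);
        }
        return sb.ToString();
    }

    // Removes every literal mask character (everything except '#')
    // from the displayed value to recover the raw value.
    public static string Strip(string display, string mask)
    {
        var literals = new HashSet<char>(mask.Where(c => c != '#'));
        return new string(display.Where(c => !literals.Contains(c)).ToArray());
    }
}
```

With the mask "### ## ##-##", Apply("1112", mask) yields "111 2" and Apply("111223344", mask) yields "111 22 33-44". The stripped value can then still be validated with the existing ^\d{9}$ pattern, and the mask's length gives the maximum length for the textbox.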

If you'd like to make this configurable, the key is the Regex class in the System.Text.RegularExpressions namespace.
A regular expression stored in an external config file can then easily be loaded and used for matching with Regex.IsMatch(), in particular the static IsMatch(string, string) overload.

Have a look at this one: Regex in PreviewTextInput: only decimals between 0.0 and 1.0
You can use the PreviewTextInput event here, then use Regex.Replace to set the format. Here is an example:
string inputStr = "111223";//e.Text;
inputStr = Regex.Replace(inputStr, @"\D", string.Empty);
if (inputStr.Length > 0)
{
inputStr = inputStr.Substring(0, Math.Min(9, inputStr.Length));
List<string[]> tmp = new List<string[]>() { new string[] { "7", "-" }, new string[] { "5", " " }, new string[] { "3", " " } };
foreach (var arr in tmp)
{
inputStr = Regex.Replace(inputStr, @"(?<=^\d{" + arr[0] + "})", arr[1]);
}
}
Console.WriteLine(inputStr);

First off, after the text is formatted you have an entry with a length of 12. I would set the TextBox.MaxLength = 12 to limit the amount of data that can be entered.
As far as validating, there's probably a "cleaner" way of doing this, but to start with you can have a series of Regex.IsMatch() conditions that will auto format the input.
For example:
1112 => 111 2
111 223 => 111 22 3
111 22 334 => 111 22 33-4
Then there's a final Regex.IsMatch() check that the input is in the format of
### ## ##-##
Code Sample:
private void textBox1_TextChanged(object sender, EventArgs e)
{
string text = textBox1.Text;
if (Regex.IsMatch(text, "\\d{4}"))
{
textBox1.Text = Regex.Replace(text, "(\\d{3})(\\d)", "$1 $2");
}
else if (Regex.IsMatch(text, "\\d{3} \\d{3}"))
{
textBox1.Text = Regex.Replace(text, "(\\d{3} \\d{2})(\\d)", "$1 $2");
}
else if (Regex.IsMatch(text, "\\d{3} \\d{2} \\d{3}"))
{
textBox1.Text = Regex.Replace(text, "(\\d{3} \\d{2} \\d{2})(\\d)", "$1-$2");
}
else if (!Regex.IsMatch(text, "\\d{3} \\d{2} \\d{2}-\\d{2}"))
{
// Invalid entry
}
// Keep the cursor at the end of the input
textBox1.SelectionStart = textBox1.Text.Length;
}

How can I limit user input to a single alpha character in c#?

I'm using the following code to accept user input. I want to limit user input to a single alpha (a-z) character only. I'm finding a lot of validations using IsNumber to validate integer input, and a lot of information about using a regex on an input String, but I've been unable to uncover how I would be able to restrict input possibilities with this code. Can someone point me in the right direction?
public char promptForGuess()
{
Console.Write("\nGuess a letter: ");
String pre = Console.ReadKey().Key.ToString();
string pre2 = pre.ToUpper();
char pre3 = Convert.ToChar(pre2);
}

You cannot restrict the user to typing only a-z characters on the console; you have to check the input, because they can enter any character (just think about when the input is redirected to your program from a file with <, e.g. yourapp.exe < input.dat).
But it's easy to check whether a character is an a-z letter. E.g. with a plain, ASCII, C-style comparison (I will use your defined variables):
if ('A' <= pre3 && pre3 <= 'Z') { // pre3 was made uppercase in your code
// input OK
} else {
// input NOK
}
With regex:
Regex r = new Regex(@"^[a-zA-Z]$");
return r.IsMatch(pre);
If the check has to be case-sensitive, just change the character class in the code I wrote.
Anyway, I think you need Console.Read() (ReadKey also read keys like arrows, F1-F12 etc..., so ALL keys, even tab and caps lock). Refer to MSDN: http://msdn.microsoft.com/en-us/library/system.console.read.aspx
And maybe you should use this function, if you would support unicode letters: http://msdn.microsoft.com/en-us/library/yyxz6h5w.aspx
Note that Unicode letters usually do not fit in one byte! But a char can store them. Examples are the beautiful Hungarian letters with acute accents and that kind of thing: á, é, ő, ű, ö, ü, etc. (French has a lot of them too, and so does Dutch, etc.)

To judge whether a string is valid, you could check
str.Length == 1 && str[0] >= 'a' && str[0] <= 'z'
and for restricting input possibilities, you could write a loop that loops if the input is invalid.
pre = read();
while (!valid(pre))
pre = read();
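The validate-and-retry loop could look roughly like this in C#. It is a sketch under a few assumptions: it reads whole lines instead of single key presses, lowercases the input before checking it, and takes the reader and writer as parameters (hypothetical names) so the same code works with redirected input.

```csharp
using System;
using System.IO;

public static class GuessPrompt
{
    // Exactly one character in the range a-z.
    public static bool IsValid(string s)
    {
        return s != null && s.Length == 1 && s[0] >= 'a' && s[0] <= 'z';
    }

    // Keeps asking until a valid letter is entered.
    public static char PromptForGuess(TextReader input, TextWriter output)
    {
        while (true)
        {
            output.Write("Guess a letter: ");
            string line = input.ReadLine();
            if (line == null)
            {
                // Input was redirected and has run out.
                throw new EndOfStreamException("No more input.");
            }

            line = line.Trim().ToLowerInvariant();
            if (IsValid(line))
            {
                return line[0];
            }

            output.WriteLine("Please enter a single letter (a-z).");
        }
    }
}
```

In a console application this would be called as GuessPrompt.PromptForGuess(Console.In, Console.Out).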

Why don't you use Regex?
if (Regex.IsMatch(pre[0].ToString(), @"[A-Za-z]"))
{
//do something
}
else
{
//do something
}

How can I split part of a string that is inconsistent?

I have the following string:
01-21-27-0000-00-048 and it is easy to split it apart because each section is separated by a -, but sometimes this string is represented as 01-21-27-0000-00048, so splitting it is not as easy because the last 2 parts are combined. How can I handle this? Also, what about the case where it might be something like 01-21-27-0000-00.048
In case anyone is curious, this is a parcel number and it varies from county to county and a county can have 1 format or they can have 100 formats.

This is a very good case for using regular expressions. Your string matches the following regexp:
(\d{2})-(\d{2})-(\d{2})-(\d{4})-(\d{2})[.-]?(\d{3})
Match the input against this expression, and harvest the six groups of digits from the match:
var str = new[] {
"01-21-27-0000-00048", "01-21-27-0000-00.048", "01-21-27-0000-00-048"
};
foreach (var s in str) {
var m = Regex.Match(s, @"(\d{2})-(\d{2})-(\d{2})-(\d{4})-(\d{2})[.-]?(\d{3})");
for (var i = 1 /* one, not zero */ ; i != m.Groups.Count ; i++) {
Console.Write("{0} ", m.Groups[i]);
}
Console.WriteLine();
}
If you would like to allow for other characters, say, letters in the segments that are separated by dashes, you could use \w instead of \d to denote a letter, a digit, or an underscore. If you would like to allow an unspecified number of such characters within a known range, say, two to four, you can use {2,4} in the regexp instead of the more specific {2}, which means "exactly two". For example,
(\w{2,3})-(\w{2})-(\w{2})-(\d{4})-(\d{2})[.-]?(\d{3})
lets the first segment contain two to three digits or letters, and also allow for letters in segments two and three.

Normalize the string first.
I.e. if you know that the last part is always three characters, then insert a - as the fourth-to-last character, then split the resultant string. Along the same line, convert the dot '.' to a dash '-' and split that string.
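A small sketch of this normalize-then-split idea, assuming (as stated above) that the last part is always three characters long. The class name is hypothetical.

```csharp
using System;

public static class ParcelNormalizer
{
    public static string[] Split(string parcel)
    {
        // Treat '.' as just another separator.
        string s = parcel.Replace('.', '-');

        // If the last three characters are not already preceded by a dash,
        // insert one so that every part is dash-separated.
        int cut = s.Length - 3;
        if (cut > 0 && s[cut - 1] != '-')
        {
            s = s.Insert(cut, "-");
        }

        return s.Split('-');
    }
}
```

All three sample inputs (01-21-27-0000-00-048, 01-21-27-0000-00048 and 01-21-27-0000-00.048) should produce the same six parts with this approach.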

Replace every character that is not a digit with the empty string ('').
Then any of your strings takes a format like
012127000000048
Now you can divide it into parts of (2, 2, 2, 4, 2, 3) digits.
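One possible sketch of the digits-only approach: strip everything that is not a digit, then cut the result into parts of the given lengths. The class name and the params signature are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ParcelDigits
{
    public static List<string> Split(string parcel, params int[] lengths)
    {
        // Remove every non-digit character (dashes, dots, spaces, ...).
        string digits = Regex.Replace(parcel, @"\D", string.Empty);

        var parts = new List<string>();
        int pos = 0;
        foreach (int len in lengths)
        {
            if (pos + len > digits.Length)
            {
                throw new FormatException("Parcel number has too few digits.");
            }
            parts.Add(digits.Substring(pos, len));
            pos += len;
        }
        return parts;
    }
}
```

For the examples in the question, ParcelDigits.Split(input, 2, 2, 2, 4, 2, 3) gives the parts 01, 21, 27, 0000, 00 and 048 regardless of which separators were used.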

Regex: replace inner string

I'm working with X12 EDI Files (Specifically 835s for those of you in Health Care), and I have a particular vendor who's using a non-HIPAA compliant version (3090, I think). The problem is that in a particular segment (PLB- again, for those who care) they're sending a code which is no longer supported by the HIPAA Standard. I need to locate the specific code, and update it with a corrected code.
I think a Regex would be best for this, but I'm still very new to Regex, and I'm not sure where to begin. My current methodology is to turn the file into an array of strings, find the array that starts with "PLB", break that into an array of strings, find the code, and change it. As you can guess, that's very verbose code for something which should be (I'd think) fairly simple.
Here's a sample of what I'm looking for:
~PLB|1902841224|20100228|49>KC15X078001104|.08~
And here's what I want to change it to:
~PLB|1902841224|20100228|CS>KC15X078001104|.08~
Any suggestions?
UPDATE: After review, I found I hadn't quite defined my question well enough. The record above is an example, but it is not necessarily a specific formatting match: there are three things which could change between this record and some other (in another file) I'd have to fix. They are:
The Pipe (|) could potentially be any non-alphanumeric character. The file itself will define which character (normally a Pipe or Asterisk).
The > could also be any other non-alphanumeric character (most often : or >).
The set of numbers immediately following the PLB is an identifier, and could change in format and length. I've only ever seen numeric IDs there, but technically it could be alphanumeric, and it won't necessarily be 10 characters.
My plan is to use String.Format() with my Regex match string so that | and > can be replaced with the correct characters.
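A rough sketch of that String.Format() plan, assuming the separators have already been read from the file header and that the offending code is always 49 in the fourth element. The class, method, and parameter names are hypothetical; Regex.Escape is used so that separators such as | or * are treated literally inside the pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PlbFixer
{
    public static string FixCode(string text, char elem, char subelem, char terminator)
    {
        // Escape the separators, since characters like | and * are regex metacharacters.
        string e = Regex.Escape(elem.ToString());
        string c = Regex.Escape(subelem.ToString());
        string t = Regex.Escape(terminator.ToString());

        // {0} = element separator, {1} = component separator, {2} = segment terminator.
        // Matches 49 only when it opens the fourth element of a PLB segment.
        // .NET allows the variable-length lookbehind used here.
        string pattern = string.Format(
            @"(?<={2}\s*PLB{0}\w+{0}\w+{0})49(?={1})", e, c, t);

        return Regex.Replace(text, pattern, "CS");
    }
}
```

For the sample record, FixCode(text, '|', '>', '~') should turn 49>KC15X078001104 into CS>KC15X078001104 while leaving segments other than PLB alone.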
And for the record. Yes, I hate ANSI X12.

Assuming that the "offending" code is always 49, you can use the following:
resultString = Regex.Replace(subjectString, @"(?<=~PLB\|\d{10}\|\d{8}\|)49(?=>\w+\|)", "CS");
This looks for 49 if it's the first element after a | delimiter, preceded by a group of 8 digits, another |, a group of 10 digits, yet another |, and ~PLB. It also looks if it is followed by >, then any number of alphanumeric characters, and one more |.
With the new requirements (and the lucky coincidence that .NET is one of the few regex flavors that allow variable repetition inside lookbehind), you can change that to:
resultString = Regex.Replace(subjectString, @"(?<=~PLB\1\w+\1\d{8}(\W))49(?=\W\w+\1)", "CS");
Now any non-alphanumeric character is allowed as separator instead of | or > (but in the case of | it has to be always the same one), and the restrictions on the number of characters for the first field have been loosened.

Another, similar approach that works on any valid X12 file to replace a single data value with another on a matching segment:
public void ReplaceData(string filePath, string segmentName,
int elementPosition, int componentPosition,
string oldData, string newData)
{
string text = File.ReadAllText(filePath);
Match match = Regex.Match(text,
@"^ISA(?<e>.).{100}(?<c>.)(?<s>.)(\w+.*?\k<s>)*IEA\k<e>\d*\k<e>\d*\k<s>$");
if (!match.Success)
throw new InvalidOperationException("Not an X12 file");
char elementSeparator = match.Groups["e"].Value[0];
char componentSeparator = match.Groups["c"].Value[0];
char segmentTerminator = match.Groups["s"].Value[0];
var segments = text
.Split(segmentTerminator)
.Select(s => s.Split(elementSeparator)
.Select(e => e.Split(componentSeparator)).ToArray())
.ToArray();
foreach (var segment in segments.Where(s => s[0][0] == segmentName &&
s.Count() > elementPosition &&
s[elementPosition].Count() > componentPosition &&
s[elementPosition][componentPosition] == oldData))
{
segment[elementPosition][componentPosition] = newData;
}
File.WriteAllText(filePath,
string.Join(segmentTerminator.ToString(), segments
.Select(e => string.Join(elementSeparator.ToString(),
e.Select(c => string.Join(componentSeparator.ToString(), c))
.ToArray()))
.ToArray()));
}
The regular expression used validates a proper X12 interchange envelope and assures that all segments within the file contain at least a one character name element. It also parses out the element and component separators as well as the segment terminator.

Assuming that your code is always a two digit number that comes after a pipe character | and before the greater than sign > you can do it like this:
var result = Regex.Replace(yourString, @"(\|)(\d{2})(>)", @"$1CS$3");

You can break it down with regex, yes.
If I understand your example correctly, the 2 characters between the | and the > need to be letters and not digits.
~PLB\|\d{10}\|\d{8}\|(\d{2})>\w{14}\|\.\d{2}~
This pattern will match the old one and capture the characters between the | and the >, which you can then use to look up the new value (in a db or something) and do a replace with the following pattern:
(?<=\|)\d{2}(?=>)

This will look for the ~PLB|#|# at the start and replace the 2 numbers before the > with CS.
Regex.Replace(testString, @"(?<=~PLB\|[0-9]{10}\|[0-9]{8})(\|)([0-9]{2})(>)", @"$1CS$3")

The X12 protocol standard allows the specification of element and component separators in the header, so anything that hard-codes the "|" and ">" characters could eventually break. Since the standard mandates that the characters used as separators (and segment terminators, e.g., "~") cannot appear within the data (there is no escape sequence to allow them to be embedded), parsing the syntax is very simple. Maybe you're already doing something similar to this, but for readability...
// The original segment string (without segment terminator):
string segment = "PLB|1902841224|20100228|49>KC15X078001104|.08";
// Parse the segment into elements, then the fourth element
// into components (bounds checking is omitted for brevity):
var elements = segment.Split('|');
var components = elements[3].Split('>');
// If the first component is the bad value, replace it with
// the correct value (again, not checking bounds):
if (components[0] == "49")
components[0] = "CS";
// Reassemble the segment by joining the components into
// the fourth element, then the elements back into the
// segment string:
elements[3] = string.Join(">", components);
segment = string.Join("|", elements);
Obviously more verbose than a single regular expression but parsing X12 files is as easy as splitting strings on a single character. Except for the fixed length header (which defines the delimiters), an entire transaction set can be parsed with Split:
// Starting with a string that contains the entire 835 transaction set:
var segments = transactionSet.Split('~');
var segmentElements = segments.Select(s => s.Split('|')).ToArray();
// segmentElements contains an array of element arrays,
// each composite element can be split further into components as shown earlier

What I found is working is the following:
parts = original.Split(record);
for(int i = parts.Length -1; i >= 0; i--)
{
string s = parts[i];
string nString =String.Empty;
if (s.StartsWith("PLB"))
{
string[] elems = s.Split(elem);
if (elems[3].Contains("49" + subelem.ToString()))
{
string regex = string.Format(@"(\{0})49({1})", elem, subelem);
nString = Regex.Replace(s, regex, @"$1CS$2");
}
I'm still having to split my original file into a set of strings and then evaluate each string, but that seems to be working now.
If anyone knows how to get around that string.Split up at the top, I'd love to see a sample.

Parse the number with Regex with non capturing group

I'm trying to parse a phone number with a regex. Specifically, I want to get a string containing just the phone number, using a function like this:
string phoneRegex = @"^([+]|00)(\d{2,12}(?:\s*-*)){1,5}$";
string formated = Regex.Match(e.Value.ToString(), phoneRegex).Value;
As you can see, I'm trying to use a non-capturing group (?:\s*-*), but I'm doing something wrong.
The expected result should be:
input (e.Value): +48 123 234 344 or +48 123234344 or +48 123-234-345
output: +48123234344
Thanks in advance for any suggestions.

Regex.Match will not alter the string for you; it will simply match it. If you have a phone number string and want to format it by removing unwanted characters, you will want to use the Regex.Replace method:
// pattern for matching anything that is not '+' or a decimal digit
string replaceRegex = @"[^+\d]";
string formated = Regex.Replace("+48 123 234 344", replaceRegex, string.Empty);
In my sample the phone number is hard-coded, but it's just for demonstration purposes.
As a side note; the regex that you have in your code sample above assumes that the country code is 2 digits; this may not be the case. The United States has a one digit code (1) and many countries have 3-digit codes (perhaps there are countries with more digits than that, as well?).

This should work:
Match m = Regex.Match(s, @"^([+]|00)\(?(\d{3})\)?[\s\-]?(\d{3})\-?(\d{4})$");
return String.Format("{0}{1}{2}{3}", m.Groups[1], m.Groups[2], m.Groups[3], m.Groups[4]);
