I have been tasked with the impossible, maybe?
I have a table with telephone numbers. But they are manually entered, and very dirty.
Example:
0711112399
07 1111 3288
07 1111 4832 NIKKI
0711117929
0711113616X123
0
NULL
1300 111 782
.
(Numbers changed to protect the innocent. :))
I need to break these into
CountryCode
AreaCode
Number
Extension
So, 0711112399 would become
CountryCode = +61 (Because there is no code on this number)
AreaCode = 07
Number = 11112399
Extension = NULL
11113616X123 would be
Country +61
AreaCode = NULL
Number = 11113616
Extension = 123
Rules are:
Possible area codes:
02 03 04 07 08
Is this even possible?
For 07 1111 4832 NIKKI - I will remove alphabetic characters, unless it's an X between two numbers.
You can try this
^(00\d{2}|\+\d{2})?(0\d)?([\d ]+)(?:[xX](\d+))?
See it here on Regexr. You can see the content of the groups while hovering over the blue highlighted matches.
It will put the country code in group 1, the area code in group 2, the number in group 3 and the extension in group 4. All parts are optional except the number. When a part is not found, its group is not set, so you have to supply your default values yourself.
I see a problem for the country code. It is hardcoded here with 2 digits, but I know there are also countries with a 3 digit code. For the countries with a 1 digit code, I am not sure, could be that there is a leading 0 then. But I need this to know when the area code/the number is starting.
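As a rough illustration, here is a minimal Python sketch that applies the expression above to the sample values from the question and fills in defaults for unset groups. The function name `split_phone`, the dictionary keys and the `+61` default are assumptions for illustration; the default comes from the question's example, not from any library.

```python
import re

# Pattern copied from the answer above:
# group 1 = country code, group 2 = area code,
# group 3 = number (digits and spaces), group 4 = extension after an X.
PHONE_RE = re.compile(r"^(00\d{2}|\+\d{2})?(0\d)?([\d ]+)(?:[xX](\d+))?")


def split_phone(raw, default_country="+61"):
    """Split a dirty phone string into parts, or return None if nothing matches.

    default_country is an assumption taken from the question's example.
    """
    if raw is None:
        return None
    match = PHONE_RE.match(raw.strip())
    if match is None:
        return None
    country, area, number, extension = match.groups()
    number = number.replace(" ", "")
    if not number:
        return None
    return {
        "country_code": country or default_country,
        "area_code": area,        # None when no area code was found
        "number": number,
        "extension": extension,   # None when no extension was found
    }


for sample in ["0711112399", "07 1111 4832 NIKKI", "0711113616X123", "."]:
    print(repr(sample), "->", split_phone(sample))
```

Note that trailing text such as NIKKI is simply left unmatched, and a value like "0" still comes back as a one-digit number, so a length check on the result is still needed.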
I wouldn't say impossible but it will require rigorous testing. But I wouldn't necessarily focus on regular expressions. It may be simpler to implement using other techniques.
This is an ideal case to approach with Test Driven development. Start by listing all the possible cases, write a unit test for each case, and adjust the sanitizer code for the case.
There are dedicated libraries for normalizing phone numbers, and they're very specialized, but they tend to rely on regex as well. Lync Server (Microsoft's voice-over-IP solution) has a normalization library that relies on regex. Their page contains quite a few samples that will come in handy for you:
http://technet.microsoft.com/en-us/library/gg413082.aspx
In the end, it's probably easier to build a number of expressions that will normalize to a common format, than trying to create one expression to normalize everything.
I have a project where I have an input for a phone number. Said phone number can be from any country. The user selects the country before entering the phone number.
Is there a way to format the phone number as a user is typing it in WPF?
I was playing around with the C# port of Google's library, but to no avail.
Thanks!
I would suggest using two properties, Country and HomeNumber, where the HomeNumber getter uses a regular expression to format the number that was entered.
Example (closer to pseudocode):
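// Requires: using System.Text.RegularExpressions;
private string country;
private string homeNumber;

public string Country
{
    get { return country; }
    set { country = value; }
}

public string HomeNumber
{
    get
    {
        if (homeNumber == null)
        {
            return string.Empty;
        }
        // The digit groupings below are placeholders, not real national formats.
        if (country == "USA")
        {
            return Regex.Replace(homeNumber, @"(\d{2})(\d{2})(\d{2})", "$1-$2-$3");
        }
        else if (country == "Russia")
        {
            return Regex.Replace(homeNumber, @"(\d{3})(\d{2})(\d{2})", "$1-$2-$3");
        }
        // Unknown country: return the number unformatted so every path returns a value.
        return homeNumber;
    }
    set
    {
        homeNumber = value;
    }
}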
If there are many countries to choose from, I would suggest the State pattern to cut down on the number of if statements.
Advantages
simple solution
does not require third-party libraries and projects
Disadvantages
not a serious flaw, but the number is only formatted after the user has finished typing it.
The definition of international phone numbers is a mess for 3 reasons:
1) How to dial an international phone number depends on the country you are in. If your application is used in only one country, then this is not too bad. However, there might be several ways how to get an international connection, like dialing first 00 or simply +.
2) The country code for a phone number is particularly badly designed. Just some examples:
1339 USA
1340 Virgin Islands (Caribbean Islands)
1341 USA
1342 not used
1343 Canada
The US, Canada and some smaller places share 1 as country code and the next 3 digits decide, if it is US or Canada or ... There is no easy way to figure out the country, like the first xxx are Canada, the rest US.
Some countries have a 2-digit code, others 3 or 4 digits.
3) Even within the same country, different regions can have different formats.
The simplest solution is probably:
use a different format for international and national numbers
use for all international numbers the format +ccc dddddddd, where ccc is a variable (!) length country code and dddddddd the digits within the country
depending on how many digits are used for dddddddd, you can write them as ddd dd dd, dddd dddd and ddd ddd ddd.
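A minimal Python sketch of the scheme in the list above, assuming only the three groupings mentioned there (7, 8 and 9 digits). The function names are made up for illustration and are not taken from the linked PhoneTextBox code.

```python
# Digit-group layouts from the list above, keyed by how many digits there are.
LAYOUTS = {
    7: (3, 2, 2),  # ddd dd dd
    8: (4, 4),     # dddd dddd
    9: (3, 3, 3),  # ddd ddd ddd
}


def group_digits(digits):
    """Insert spaces according to LAYOUTS; other lengths are returned unchanged."""
    layout = LAYOUTS.get(len(digits))
    if layout is None:
        return digits
    parts = []
    position = 0
    for size in layout:
        parts.append(digits[position:position + size])
        position += size
    return " ".join(parts)


def format_international(country_code, digits):
    """Format as '+ccc dddddddd' with a variable-length country code."""
    return "+" + country_code + " " + group_digits(digits)


print(format_international("61", "711112399"))
```

Lengths outside the table are left ungrouped on purpose, since the list above does not say how they should be split.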
Of course, you can complicate it further by formatting also the area code within a country. I can't help you with areas, but for the rest you can find the code on Github (too much to post here):
github.com/PeterHuberSg/WpfWindowsLib/blob/master/WpfWindowsLib/PhoneTextBox.cs
A detailed explanation of international phone numbers and the PhoneTextBox is on CodeProject:
International Phone Number Validation Explained
Can I restrict the phone numbers in my console app without regular expressions?
I have this code, but it doesn't work with international numbers beginning with 00.
static public bool CheckPhoneNumb(string phoneNumber)
{
    long lphoneNumber;
    // Accept 9 to 15 characters that parse as a whole number.
    return (phoneNumber.Length >= 9) && (phoneNumber.Length <= 15) &&
        long.TryParse(phoneNumber, out lphoneNumber);
}
Thanks.
If you need to support worldwide calling, you will have a hard time doing so with regular expressions.
I would suggest the Google Phone Number Validation Library.
Parsing/formatting/validating phone numbers for all countries/regions of the world.
https://code.google.com/p/libphonenumber/
There's a C# port linked at the bottom of the page.
International phone numbers don't begin with 00. The 00 part is the code that you use in your part of the world to begin dialing the actual international number (which follows the 00). That code depends on where you are dialing from. For example, in the US, it is 011. In Europe, it is 00. Japan has a different code, and there are a few others.
Yes, you can restrict it without regular expressions, but you would probably find it easier with them. I would also highly recommend that you don't store your (or any) international access code with the number, as it varies according to whom you display it to.
The answer you seek is rather more complex than you might think.
Often the number itself varies depending on the origin and destination locale as prefixes get added/removed.
What kind of phone numbers? NANP (North American Numbering Plan) or somewhere else?
The NANP, which covers the US, Canada and much of the Caribbean, is described at http://www.nanpa.com/. For numbering plans in place around the world, a good place to start is the World Telephone Numbering Guide at http://www.wtng.info/
return ((phoneNumber.Length >= 9) && phoneNumber.Length <= 15) && phoneNumber.All(char.IsNumber);
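Putting the answers above together, here is a hedged Python sketch of a regex-free check that strips a leading + or international access code before validating. The 9 to 15 length limits come from the question; treating 00 as the access code is an assumption that only holds in some regions, and the function name is made up for illustration.

```python
def check_phone_number(phone_number, access_code="00"):
    """Return True if the number, minus any international prefix, is 9-15 digits.

    access_code is an assumption: 00 is common in Europe, but the US uses 011.
    """
    if phone_number.startswith("+"):
        digits = phone_number[1:]
    elif phone_number.startswith(access_code):
        digits = phone_number[len(access_code):]
    else:
        digits = phone_number
    # isascii() rules out non-ASCII digit characters that isdigit() would accept.
    return 9 <= len(digits) <= 15 and digits.isascii() and digits.isdigit()


print(check_phone_number("0034912345678"))
print(check_phone_number("+34912345678"))
print(check_phone_number("12345"))
```

Stripping the prefix before validating also makes it easy to store the number without the access code, as recommended above.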
How to Implement LEFT Padding in Code or Query.
FROM TO
1 000001
2 000002
10 000010
110 000110
1110 001110
99999 099999
I am using MS Access 2007.
Thanks and regards.
If you want to format a number as a string with leading zeros, you can use the d6 format:
int i = 200;
Console.Write(i.ToString("d6")); // prints 000200
Example: http://ideone.com/fScd9
In VBA, use the Format$ function (drop the dollar sign if you are using variants), and use "000000" for the format string.
format$(serial, "000000")
or
format(serial, "000000")
This will format the string to six digits using zeros where there are no leading numbers.
You can also try right("000000" & serial,6). Using Format is more elegant, however if you are running this on really large datasets or ODBC linked datasets it can be quite a bit slower.
SELECT [serial], right("000000" & [serial],6) AS [PaddedSerial]
FROM Table1
What is the right way to verify a credit card with a regex? If there is one, which should I use? There are tons online. If not, how should I verify it?
See this link: Finding or Verifying Credit Card Numbers with Regular Expressions
Visa: ^4[0-9]{12}(?:[0-9]{3})?$ All Visa card numbers start with a 4. New cards have 16 digits. Old cards have 13.
MasterCard: ^5[1-5][0-9]{14}$ All MasterCard numbers start with the numbers 51 through 55. All have 16 digits.
American Express: ^3[47][0-9]{13}$ American Express card numbers start with 34 or 37 and have 15 digits.
Diners Club: ^3(?:0[0-5]|[68][0-9])[0-9]{11}$ Diners Club card numbers begin with 300 through 305, 36 or 38. All have 14 digits. There are Diners Club cards that begin with 5 and have 16 digits. These are a joint venture between Diners Club and MasterCard, and should be processed like a MasterCard.
Discover: ^6(?:011|5[0-9]{2})[0-9]{12}$ Discover card numbers begin with 6011 or 65. All have 16 digits.
JCB: ^(?:2131|1800|35\d{3})\d{11}$ JCB cards beginning with 2131 or 1800 have 15 digits. JCB cards beginning with 35 have 16 digits.
Bye.
How can I use credit card numbers containing spaces? covers everything you should need.
I think you're looking for the Luhn Algorithm. It's a simple checksum formula used to validate a variety of identification numbers.
That depends on how accurate you want your pre-validation to be. To validate everything you can, you need to compute what the last digit of the card should be and compare to what is entered, which a RegEx cannot do.
For the algorithm and other details see this link, which also provides a list of common number prefixes that you could validate against.
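For reference, a minimal Python sketch of the Luhn checksum mentioned above. It only checks the check digit, not card-type prefixes or lengths, and the choice to tolerate spaces and dashes is an assumption made for convenience.

```python
def luhn_valid(card_number):
    """Return True if card_number passes the Luhn checksum.

    Spaces and dashes are stripped first (an assumption for convenience);
    any other non-digit character makes the number invalid.
    """
    cleaned = card_number.replace(" ", "").replace("-", "")
    if len(cleaned) < 2 or not (cleaned.isascii() and cleaned.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))
print(luhn_valid("4111 1111 1111 1112"))
```

A passing checksum only means the number is well formed; whether the card actually exists is still up to the payment provider.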
-- Edit:
In fact, I'll slightly disagree with myself and agree with cletus. Validate as much as you can (without getting into details of specific types of credit cards [IMHO]) before sending it on. And it goes without saying (hopefully) that this validation should be done in JavaScript, to make it fast, then on the server, to double check (and for people with JavaScript disabled).
-- Previous Response:
Don't bother; just let the provider verify it when you actually attempt payment. No legitimate reason to try and verify it yourself. You can use this though, if you really feel like it.
In the code I am debugging, I have this snippet:
ddlExpYear.SelectedItem.Value.Substring(2).PadLeft(2, '0');
What does this return? I really can't run this too much as it is part of a live credit card application. The DropDownList as you could imagine from the name contains the 4-digit year.
UPDATE: Thanks everyone. I don't do a lot of .NET development so setting up a quick test isn't as quick for me.
It takes everything after the first two characters (the last two digits of a 4-digit year) and pads the left side with zeroes to a minimum width of 2 characters. Looks like a "just in case" for expiration years ending in 08, 07, etc., making sure that the leading zero is present.
This prints "98" to the console.
class Program
{
static void Main(string[] args)
{
Console.Write("1998".Substring(2).PadLeft(2, '0'));
Console.Read();
}
}
Of course you can run this. You just can't run it in the application you're debugging. To find out what it's doing, and not just what it looks like it's doing, make a new web application, put in a DropDownList, put a few static years in it, and then put in the code you've mentioned and see what it does. Then you'll know for certain.
Something stupid. It's getting the value of the selected item and taking everything after the first two characters. If that is only one character, then it adds a '0' to the beginning of it, and if it is zero characters, then it returns '00'. The reason I say this is stupid is because if you need the value to be two characters long, why not just set it like that to begin with when you are creating the drop down list?
It looks like it's grabbing the substring from the 3rd character (index 2, zero-based) to the end, then if the substring has a length less than 2 it's making the length equal to 2 by adding 0 to the left side.
PadLeft ensures that you receive at least two characters from the input, padding the input (on the left side) with the appropriate character. So input, in this case, might be 12. You get "12" back. Or input might be 9, in which case, you get "09" back.
This is an example of complex chaining (see "Is there any benefit in Chaining" post) gone awry, and making code appear overly complex.
The substring returns the value with the first two characters skipped, the padleft pads the result with leading zeros:
string s = "2014";
MessageBox.Show(s.Substring(2).PadLeft(2, 'x')); //14
string s2 = "14";
MessageBox.Show(s2.Substring(2).PadLeft(2, 'x')); //xx
My guess is the code is trying to convert the year to a 2 digit value.
The PadLeft only does something if the user enters a year that is either 2 or 3 digits long.
With a 1-digit year, you get an exception (Substring errs).
With a 2-digit year (07, 08, etc), it will return 00. I would say this is an error.
With a 3-digit year (207, 208), which the author may have assumed to be typos, it would return the last digit padded with a zero -- 207 -> 07; 208 -> 08.
As long as the user must choose a year and isn't allowed to enter a year, the PadLeft is unnecessary -- the Substring(2) does exactly what you need given a 4-digit year.
This code seems to be trying to grab a 2 digit year from a four digit year (ddlexpyear is the hint)
It takes strings and returns strings, so I will eschew the string delimiters:
1998 -> 98
2000 -> 00
2001 -> 01
2012 -> 12
Problem is that it doesn't do a good job. In these cases, the padding doesn't actually help. Removing the pad code does not affect the cases it gets correct.
So the code works (with or without the pad) for 4 digit years, what does it do for strings of other lengths?
null: exception
0: exception
1: exception
2: always returns "00". e.g. the year 49 (when the Jews were expelled from Rome) becomes "00". This is bad.
3: saves the last digit, and puts a "0" in front of it. Correct in 10% of cases (when the second digit is actually a zero, like 304, or 908), but quite wrong in the remainder (like 915, 423, and 110)
5: keeps everything from the 3rd digit onward, which is also wrong: "10549" should probably be "49" but instead becomes "549".
As you can expect, the problem continues with longer strings.
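The length cases above can be checked with a small Python emulation of the C# expression. This is a sketch rather than the original code: Python slicing never throws, so the out-of-range behaviour of Substring(2) is reproduced with an explicit check, and the function name is made up for illustration.

```python
def two_digit_year(value):
    """Emulate value.Substring(2).PadLeft(2, '0') from the C# snippet."""
    # C# Substring(2) throws when the start index is past the end of the string.
    if len(value) < 2:
        raise ValueError("start index is larger than the length of the string")
    # rjust pads on the left up to width 2 and never truncates, like PadLeft.
    return value[2:].rjust(2, "0")


for year in ["1998", "2001", "49", "304", "10549"]:
    print(year, "->", two_digit_year(year))
```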
OK so it's taking the value from the drop down, ABCD
Then it takes the substring from position 2, CD
And then it, err, left-pads it with zeros to 2 characters if it needs to, CD
Or, if the value were only three characters, ABX, it would substring to X and pad to 0X
It's taking the last two digits of the year, then padding on the left with a "0" if needed.
So 2010 would be 10, 2009 would be 09.
Not sure why the developer didn't just set the value on the dropdown to the last two digits, or why you would need to left pad it (unless you were dealing with years 0-9 AD).