how to match rules using regex in C# - c#

I am new to regular expressions in C# and am not sure how to use one to validate a client reference number. The reference number comes in three different types: id, mobile number, and serial number.
C#:
string client = "ABC 1234567891233";
//do code stuff here:
if Regex matches 3-4 digits to client, return value = client id
else if Regex matches 8 digits to client, return value = ref no
else if Regex matches 13 digits to client, return value = phone no
I don't know how to count digits using Regex for the different types. Something like Regex("{![\d.....}").

I don't understand why you're bent on using regular expressions here. A simple one-liner would do, eg. even such an extension method:
static int NumbersCount(this string str)
{
return str.ToCharArray().Where(c => Char.IsNumber(c)).Count();
}
It's clearer and more maintainable in my opinion.
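You could probably give it a go with group matching and something along the lines of the following (the lookarounds keep a group from matching only part of a longer run of digits):
"(?<![0-9])(?:(?<client>[0-9]{5,9})|(?<serial>[0-9]{10})|(?<mobile>[0-9]{13,}))(?![0-9])"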
Then you'd check whether you have a match for "client", "serial", "mobile" and interpret the string input on that basis. But is it easier to understand?
Does it express your intentions more clearly for those reading your code later on?
If the requirement is such that these numbers must be consecutive (as @Corak points out)... I'd still write that iteratively, like so:
/// <summary>
/// returns lengths of all the numeric sequences encountered in the string
/// </summary>
static IEnumerable<int> Lengths(string str)
{
var count = 0;
for (var i = 0; i < str.Length; i++)
{
if (Char.IsNumber(str[i]))
{
count++;
}
if ((!Char.IsNumber(str[i]) || i == str.Length - 1) && count > 0)
{
yield return count;
count = 0;
}
}
}
And then you could simply:
bool IsClientID(string str)
{
var lengths = Lengths(str).ToList(); // materialize once instead of enumerating twice
return lengths.Count == 1 && lengths[0] == 5;
}
Is it more verbose? Yes, but chances are that people will still like you more than if you make them fiddle with regex every time the validation rules happen to change, or some debugging is required : ) This includes your future self.
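For comparison, here is a minimal sketch that applies the digit counts from the question (3-4, 8, and 13) rather than the ones used in this answer. The class name ClientReference, the method name Classify, and the "unknown" fallback are made up for illustration, and the sketch assumes the input contains exactly one run of digits.

```csharp
using System;
using System.Text.RegularExpressions;

static class ClientReference
{
    // Hypothetical helper: classifies a reference by the length of its single digit run.
    public static string Classify(string client)
    {
        // [0-9] is used instead of \d so that only ASCII digits count.
        MatchCollection runs = Regex.Matches(client, "[0-9]+");
        if (runs.Count != 1)
        {
            return "unknown"; // no digits at all, or several separate runs
        }

        int length = runs[0].Length;
        if (length == 3 || length == 4) return "client id";
        if (length == 8) return "ref no";
        if (length == 13) return "phone no";
        return "unknown";
    }

    static void Main()
    {
        Console.WriteLine(Classify("ABC 1234567891233")); // phone no
    }
}
```

The regex only finds the digit runs; the rules themselves stay in plain if statements, so changing a length does not mean editing a pattern.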

I'm not sure I understood your question, but if you want the number of numeric characters in a string, you can strip everything that is not a digit and check the length of what remains:
Regex regex = new Regex(@"[^0-9]");
string digitsOnly = regex.Replace(client, "");
if(digitsOnly.Length > 4 && digitsOnly.Length < 10)
//this is a customer id
....

Related

In c# How to convert back unicoded characters to UTF-8?

I have this text Grou00dfbeerenstrau00dfe and I need to convert it to Großbeerenstraße
also Eichstu00e4tt to Eichstätt
But I can't work out how to do this, for these reasons:
ONLY some characters (special characters) are converted, not the whole text
Unicoded texts usually have Escape characters like \u00df instead of u00df
Could you please help me to convert correctly back to its original states?
Basically, how can I convert when there is no escape character?
NOTE: In case it matters, I'm sending strings with special characters to a system that I cannot change. When I request the same string back from that system, Großbeerenstraße comes back as Grou00dfbeerenstrau00dfe, and so on.
Based on David's idea of looking for u and checking if the following 4 characters are valid hex numbers, it would look something like this:
public string FixGermanUnicode(string input) {
var output = new StringBuilder();
for (var i = 0; i < input.Length; i++) {
if (i < input.Length - 4 && input[i] == 'u' && input[i + 1] == '0'
&& int.TryParse(input.Substring(i + 1, 4), NumberStyles.HexNumber, null, out var code)) {
try {
output.Append(char.ConvertFromUtf32(code));
i += 4;
} catch (ArgumentOutOfRangeException) {
//not a valid unicode character
output.Append(input[i]);
}
} else {
output.Append(input[i]);
}
}
return output.ToString();
}
Console.WriteLine(FixGermanUnicode("Grou00dfbeerenstrau00dfe"));
It checks for u0 rather than just u to reduce false positives, where a u happens to be followed by four valid hex digits that should not be replaced. That will work for German at least, since all the special characters in German have Unicode code points starting with 0.
The try/catch also covers the case where the following four characters are valid hex digits, but the resulting number is not a valid Unicode character.
While I completely agree with @Gabriel Luci's answer, I would like to point out a more concise implementation of the same idea (it needs the System.Text.RegularExpressions namespace):
readonly static string unicodePattern = @"u0[0-9a-fA-F]{3}";
public static string FixGermanUnicode(string input)
{
return Regex.Replace(input, unicodePattern, match =>
{
var digits = match.Value.Substring(1);
try
{
return char.ConvertFromUtf32(int.Parse(digits, System.Globalization.NumberStyles.AllowHexSpecifier)).ToString();
}
catch (ArgumentOutOfRangeException)
{
//not a valid unicode character
return match.Value;
}
});
}
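A short usage sketch of the regex approach, run against the two strings from the question plus one input that shows its limit: ordinary text that happens to contain u0 followed by three hex digits is replaced as well. The class name UnicodeFix, the method name Fix, and the sample input "menu0123" are made up for illustration. The cast to char is safe here only because the pattern restricts the value to below 0x1000.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class UnicodeFix
{
    // Same pattern as above: a literal 'u', a '0', then three hex digits.
    static readonly Regex Pattern = new Regex("u0[0-9a-fA-F]{3}");

    public static string Fix(string input)
    {
        return Pattern.Replace(input, match =>
        {
            // Drop the leading 'u' and read the remaining four characters as hex.
            int code = int.Parse(match.Value.Substring(1), NumberStyles.AllowHexSpecifier);
            return ((char)code).ToString();
        });
    }

    static void Main()
    {
        Console.WriteLine(Fix("Grou00dfbeerenstrau00dfe")); // "Großbeerenstraße"
        Console.WriteLine(Fix("Eichstu00e4tt"));            // "Eichstätt"
        // False positive: "u0123" is plain text here, yet it becomes U+0123.
        Console.WriteLine(Fix("menu0123") == "men\u0123");  // True
    }
}
```

Because the backslash is lost before the string comes back, no pattern can tell the two cases apart with certainty; narrowing the pattern to the code points you actually expect would reduce the risk.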

c# compare string irrespective of language

I have a routine that tries to find a specific term in a list of strings.
int FindString(string term, List<string> stringList)
{
for (int i = 0; i < stringList.Count; i++)
{
if (stringList[i].Contains(term))
{
return i;
}
}
return -1;
}
The term is always a Unicode string in English -for example "B4"- while the list of strings contains strings that may be written in other languages. A string might contain "B4" for example but since it was written in Greek, the Contains method returns false when comparing the English and Greek version of basically the same characters.
Is there a way to transform the non-English string so the Contains method will properly return true?
Example term and string (filename in reality):
term: B4
string: 19-299-12-Β4.txt
Basically, you need to "normalize" the strings according to your own custom rules and then perform the search.
Since there is no generally accepted mapping that says, for example, that "Latin B" equals "Greek B", you have to build your own; a basic Dictionary<char,char> may be enough.
As part of that normalization you may also want to map digits. For that, official Unicode information is available through CharUnicodeInfo.GetDigitValue.
So overall code to normalize would look like:
var source = "А9"; // Cyrillic A9 - "\u0410\u0039"
var map = new Dictionary<char,char> { { 'А', 'A' } }; // Cyrillic to Latin
var chars = source.Select( c =>
CharUnicodeInfo.GetUnicodeCategory(c)==UnicodeCategory.DecimalDigitNumber?
CharUnicodeInfo.GetDigitValue(c).ToString()[0] :
map.ContainsKey(c) ? map[c] :
c);
var result = String.Join("", chars);
var term = "\u0041\u0039"; // Latin A9
Console.WriteLine(source.Contains(term));
Console.WriteLine(result.Contains(term));
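Applied to the Greek file name from the question, the same idea could look like the sketch below. The class name Homoglyphs and the four-entry map are assumptions made for illustration; a real map would need every look-alike letter you expect to meet.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class Homoglyphs
{
    // Hypothetical and deliberately incomplete: Greek capitals that look like Latin ones.
    static readonly Dictionary<char, char> Map = new Dictionary<char, char>
    {
        { '\u0391', 'A' }, // Greek capital Alpha
        { '\u0392', 'B' }, // Greek capital Beta
        { '\u0395', 'E' }, // Greek capital Epsilon
        { '\u039A', 'K' }, // Greek capital Kappa
    };

    // Replaces every mapped character and leaves all others untouched.
    public static string Normalize(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            char mapped;
            sb.Append(Map.TryGetValue(c, out mapped) ? mapped : c);
        }
        return sb.ToString();
    }

    // The routine from the question, searching the normalized strings instead.
    public static int FindString(string term, List<string> stringList)
    {
        for (int i = 0; i < stringList.Count; i++)
        {
            if (Normalize(stringList[i]).Contains(term))
            {
                return i;
            }
        }
        return -1;
    }

    static void Main()
    {
        var files = new List<string> { "19-299-12-\u0392" + "4.txt" }; // Greek Beta before the 4
        Console.WriteLine(FindString("B4", files)); // 0
    }
}
```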

Check if string contains at least two alpha chars

I need to check if a string has at least two alpha characters, like a1763r or ab1244
I was thinking I would use something like:
myString = "a123B";
myString.Any(char.IsDigit).Count();
but I'm using .NET 2.0, so the Any() method does not exist.
Is there something equivalent?
Don't know about alpha or what not, but you can count how many characters are digits without Linq like so:
string str = "a123B";
int digits = 0;
foreach (char c in str)
if (char.IsDigit(c))
digits++;
Console.WriteLine(digits); // 3
You can create a simple helper function that loops over your string, taking in a minimum threshold to meet. It returns boolean to match the type of output behavior from .Any()
public bool ContainsMinAlphaCharacters(string input, int threshold)
{
var count = 0;
foreach (var character in input)
{
if (char.IsLetter(character)) count++; // count letters, not digits
if (count >= threshold)
{
return true;
}
}
return false;
}
Use regular expressions
two letters: Regex.IsMatch(myString, "[A-Za-z].*?[A-Za-z]");
two digits: Regex.IsMatch(myString, @"\d.*?\d");
Not really. You will have to loop through the string and check if each character is a digit to get the count.

Double.TryParse() ignores NumberFormatInfo.NumberGroupSizes?

I'd like to know if I'm missing something or not... I'm running under the standard Great British culture.
Double result = 0;
if (Double.TryParse("1,2,3", NumberStyles.Any, CultureInfo.CurrentCulture, out result))
{
Console.WriteLine(result);
}
Expected output would be nothing... "1,2,3" shouldn't parse as a double. However it does. According to the .NET 2.0 MSDN documentation
AllowThousands Indicates that the numeric string can have group
separators; for example, separating the hundreds from the thousands.
Valid group separator characters are determined by the
NumberGroupSeparator and CurrencyGroupSeparator properties of
NumberFormatInfo and the number of digits in each group is determined
by the NumberGroupSizes and CurrencyGroupSizes properties of
NumberFormatInfo.
AllowThousands is included in NumberStyles.Any, and NumberGroupSizes is 3 for my culture. Is this just a bug in Double.Parse? That seems unlikely, but I can't spot what I'm doing wrong.
It just means the input string can contain zero or more instances of NumberFormatInfo.NumberGroupSeparator. This separator can be used to separate groups of numbers of any size; not just thousands. NumberFormatInfo.NumberGroupSeparator and NumberFormatInfo.NumberGroupSizes are used when formatting decimals as strings. Using Reflector it seems like NumberGroupSeparator is only used to determine if the character is a separator, and if it is, it is skipped. NumberGroupSizes is not used at all.
If you want to validate the string, you could do so using RegEx or write a method to do so. Here's one I just hacked together:
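string number = "102,000,000.80";
// Measure only the integer part; the fractional part has no group separators.
var parts = number.Split('.')[0].Split(',');
for (int i = 0; i < parts.Length; i++)
{
var len = parts[i].Length;
// The first group may hold 1 to 3 digits (any length if there are no separators at all);
// every later group must hold exactly 3.
bool valid = (i == 0) ? (len >= 1 && (len <= 3 || parts.Length == 1)) : (len == 3);
if (!valid)
{
Console.WriteLine("error");
}
else
{
Console.WriteLine(parts[i]);
}
}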
// Respecting Culture
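static Boolean CheckThousands(String value)
{
NumberFormatInfo format = CultureInfo.CurrentCulture.NumberFormat;
// Drop the fractional part, which carries no group separators.
String integerPart = value.Split(new string[] { format.NumberDecimalSeparator }, StringSplitOptions.None)[0];
String[] parts = integerPart.Split(new string[] { format.NumberGroupSeparator }, StringSplitOptions.None);
// The first group may be shorter than a full group, so start at the second one.
for (int i = 1; i < parts.Length; i++)
{
if (Array.IndexOf(format.NumberGroupSizes, parts[i].Length) < 0)
{
return false;
}
}
return true;
}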
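The regex route mentioned above could be sketched as follows. It assumes ',' as the group separator, '.' as the decimal separator, and groups of exactly three digits, so it is not culture-aware; the class and method names are made up for illustration. Numbers without any separators are accepted as well.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class GroupedNumber
{
    // Either properly grouped digits (1-3, then one or more ",ddd") or no separators at all,
    // optionally followed by a fractional part.
    static readonly Regex Strict =
        new Regex(@"\A([0-9]{1,3}(,[0-9]{3})+|[0-9]+)(\.[0-9]+)?\z");

    public static bool IsWellGrouped(string value)
    {
        return Strict.IsMatch(value);
    }

    // Parses only when the grouping is valid; the invariant culture uses ',' and '.'.
    public static bool TryParseStrict(string value, out double result)
    {
        result = 0;
        return IsWellGrouped(value)
            && double.TryParse(value, NumberStyles.Number, CultureInfo.InvariantCulture, out result);
    }

    static void Main()
    {
        double d;
        Console.WriteLine(TryParseStrict("102,000,000.80", out d)); // True
        Console.WriteLine(TryParseStrict("1,2,3", out d));          // False
    }
}
```

Checking the shape first and parsing second keeps Double.TryParse responsible for the actual conversion while the pattern only rejects badly grouped input.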

Converting "Bizarre" Chars in String to Roman Chars

I need to be able to convert user input to [a-z] roman characters ONLY (not case sensitive). So, there are only 26 characters that I am interested in.
However, the user can type in any "form" of those characters that they wish. The Spanish "n", the French "e", and the German "u" can all have accents from the user input (which are removed by the program).
I've gotten pretty close with these two extension methods:
public static string LettersOnly(this string Instring)
{
char[] aChar = Instring.ToCharArray();
int intCount = 0;
string strTemp = "";
for (intCount = 0; intCount <= Instring.Length - 1; intCount++)
{
if (char.IsLetter(aChar[intCount]) )
{
strTemp += aChar[intCount];
}
}
return strTemp;
}
public static string RemoveAccentMarks(this string s)
{
string normalizedString = s.Normalize(NormalizationForm.FormD);
StringBuilder sb = new StringBuilder();
char c;
for (int i = 0; i <= normalizedString.Length - 1; i++)
{
c = normalizedString[i];
if (System.Globalization.CharUnicodeInfo.GetUnicodeCategory(c) != System.Globalization.UnicodeCategory.NonSpacingMark)
{
sb.Append(c);
}
}
return sb.ToString();
}
Here is an example test:
string input = "Àlièñ451";
input = input.LettersOnly().RemoveAccentMarks().ToLower();
Console.WriteLine(input);
Result: "alien" (as expected)
This works for 99.9% of the cases. However, a few characters seem to pass all of the checks.
For instance, "ß" (a German double-s, I think). This is considered by .Net to be a letter. This is not considered by the function above to have any accent marks... but it STILL isn't in the range of a-z, like I need it to be. Ideally, I could convert this to a "B" or an "ss" (whichever is appropriate), but I need to convert it to SOMETHING in the range of a-z.
Another example is the diphthong ("æ"). Again, .Net considers this a "letter". The function above doesn't see any accent, but again, it isn't in the roman 26 character alphabet. In this case, I need to convert to the two letters "ae" (I think).
Is there an easy way to convert ANY worldwide input to the closest roman alphabet equivalent? It is expected that this probably won't be a perfectly clean translation, but I need to trust that the inputs at FlipScript.com are ONLY getting the characters a-z... and nothing else.
Any and all help appreciated.
If I were you, I'd create a Dictionary which would contain the mappings from foreign letters to Roman letters. I'd use this for two reasons:
It will make understanding what you want to do easier to someone who is reading your code.
There is a small, finite number of these special letters, so you don't need to worry about maintenance of the data structure.
I'd put the mappings into an xml file then load them into the data structure at run-time. That way, you do not need to modify any code which uses the characters, you only need to specify the mappings themselves.
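A minimal sketch of the dictionary approach, combined with the accent stripping from the question. The class name RomanLetters, the method name ToAtoZ, and the four mappings are assumptions made for illustration, and the map is hard-coded here rather than loaded from XML to keep the example short. Anything that is neither a-z nor in the map is dropped, which is what guarantees the a-z-only output.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class RomanLetters
{
    // Hypothetical, incomplete map for letters that decomposition does not reduce to a-z.
    static readonly Dictionary<char, string> Special = new Dictionary<char, string>
    {
        { '\u00DF', "ss" }, // ß
        { '\u00E6', "ae" }, // æ
        { '\u00F8', "o" },  // ø
        { '\u0153', "oe" }, // œ
    };

    public static string ToAtoZ(string input)
    {
        // Lower-case first, then split accented letters into base letter + combining mark.
        string decomposed = input.ToLowerInvariant().Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder();
        foreach (char c in decomposed)
        {
            string replacement;
            if (c >= 'a' && c <= 'z')
            {
                sb.Append(c);
            }
            else if (Special.TryGetValue(c, out replacement))
            {
                sb.Append(replacement);
            }
            // Everything else (digits, combining marks, unmapped letters) is dropped.
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(ToAtoZ("\u00C0li\u00E8\u00F1451")); // alien
        Console.WriteLine(ToAtoZ("Stra\u00DFe"));              // strasse
    }
}
```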
