How do I include a string in an escape sequence in C#?

I have a string which is a combination of variables and text which looks like this:
string str = $"\x0{str2}{str3}";
As you can see I have a string escape sequence of \x, which requires 1-4 Hexadecimal characters. str2 is a hexadecimal character (e.g. D), while str3 is two decimal characters (e.g. 37). What I expect as an outcome of str = "\x0D37" is str to contain ഷ, but instead I get whitespace, as if str == "\x0". Why is that?

As per the specification, an interpolated string is split into separate tokens before parsing. To parse the \x escape sequence, it needs to be part of the same token, which in this case it is not. And in any case, there is simply no way the interpolated part would ever use escape sequences, as that is not defined for non-literals.
You are better off generating the char value directly, although this is a little more verbose (NumberStyles lives in System.Globalization):
string str = ((char)int.Parse(str2 + str3, NumberStyles.HexNumber)).ToString();
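The same idea can be wrapped in a small helper. This is only a sketch: HexToCharString is a hypothetical name, and it assumes the combined digits form a valid hexadecimal number no larger than FFFF.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: turns up to four hex digits into a one-character string.
static string HexToCharString(string hexDigits)
{
    // Parse the digits as a hexadecimal number at runtime.
    int codeUnit = int.Parse(hexDigits, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    return ((char)codeUnit).ToString();
}

string str2 = "D";   // a hexadecimal digit, as in the question
string str3 = "37";  // two more digits, as in the question
string str = HexToCharString("0" + str2 + str3);

Console.WriteLine(str == "\x0D37"); // True
```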

To answer why your example doesn't work:
The spec calls \x a "hexadecimal escape sequence".
A hexadecimal escape sequence represents a single Unicode character, with the value formed by the hexadecimal number following "\x".
The grammar only permits literal hexadecimal digits to be used in this way. ANTLR grammar:
hexadecimal_escape_sequence
: '\\x' hex_digit hex_digit? hex_digit? hex_digit?;
hex_digit
: '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
| 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'a' | 'b' | 'c' | 'd' | 'e' | 'f';
meaning, "the sequence \x followed immediately by one, two, three, or four hexadecimal digits". Thus, the escape can only be used as a literal in source code, not one concatenated at runtime.
Charlieface provided a good alternative for your case, though depending on the context you may want to rethink your organization and just have a single string holding the characters instead of two.
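To see the difference in practice, compare the literal with the interpolated string. This is a small illustration using the variable names from the question:

```csharp
using System;

string str2 = "D";
string str3 = "37";

// The compiler reads "\x0D37" as one escape sequence: a single char, U+0D37.
string literal = "\x0D37";

// Here the escape stops at '{', so the compiler emits "\x0" (U+0000);
// the interpolated values are appended as ordinary text at runtime.
string interpolated = $"\x0{str2}{str3}";

Console.WriteLine(literal.Length);            // 1
Console.WriteLine(interpolated.Length);       // 4
Console.WriteLine(interpolated[0] == '\0');   // True
Console.WriteLine(interpolated.Substring(1)); // D37
```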

Related

Extract phone numbers and exclude extraneous characters

I'm trying to create a regex which will extract a complete phone number from a string (which is the only thing in the string) but leaving out any cruft like decorative brackets, etc.
The pattern I have mostly appears to work, but returns a list of matches - whereas I want it to return the phone number with the characters removed. Unfortunately, it completely fails if I add the start and end of line matchers...
^(?!\(\d+\)\s*){1}(?:[\+\d\s]*)$
Without the ^ and $ this matches the following numbers:
12345-678-901 returns three groups: 12345 678 901
+44-123-4567-8901 returns four groups: +44 123 4567 8901
(+48) 123 456 7890 returns four groups: +48 123 456 7890
How can I get the groups to be returned as a single, joined up whole?
Other than that, the only change I would like to include is to return nothing if there are any non-numeric, non-bracket, non-+ characters anywhere. So, this should fail:
(+48) 123 burger 7890
I'd keep it simple, which makes it more readable and maintainable:
public string CleanPhoneNumber(string messynumber){
    if(Regex.IsMatch(messynumber, "[a-z]"))
        return "";
    else
        return Regex.Replace(messynumber, "[^0-9+]", "");
}
If any alphabetic characters are present (extend this range if you wish), return blank; otherwise replace every char that is not 0-9 or + with nothing. This produces output like 0123456789 and +481234567 with all the brackets, spaces, hyphens etc. removed too. If you want to keep those in the output, add them to the Regex.
Side note: It's not immediately clear to me what you think is "cruft" that should be stripped (non a-z?) and what you think is "cruft" that should cause blank (a-z?). I struggled with this because you said (paraphrase) "non digit, non bracket, non plus should cause blank", but earlier in your examples your processing permitted numbers that had hyphens and also spaces. Strictly following that spec, hyphens and spaces would be "cruft that causes the whole thing to return blank" too.
I've assumed that it's lowercase chars from the "burger" example, but as noted you can extend the range in the IF part should you need to include other chars that return blank.
If you have a lot of them to do, maybe precompile a regex as a class-level variable and use it in the method:
private static readonly Regex _strip = new Regex("[^0-9+]", RegexOptions.Compiled);
public string CleanPhoneNumber(string messynumber){
    if(Regex.IsMatch(messynumber, "[a-z]"))
        return "";
    else
        return _strip.Replace(messynumber, "");
}
...
for(int x = 0; x < millionStrArray.Length; x++)
    millionStrArray[x] = CleanPhoneNumber(millionStrArray[x]);
I don't think you'll gain much from compiling the IsMatch one, but you could try it in a similar pattern.
Other options exist if you're avoiding regex: you could even do it using LINQ, or by looping over char arrays, StringBuilders etc. Regex is probably the easiest in terms of short, maintainable code.
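For comparison, a regex-free version using LINQ might look like the sketch below. Note one assumption that differs from the regex version above: it rejects any letter via char.IsLetter, not just a-z.

```csharp
using System;
using System.Linq;

// Sketch only: same idea as the regex version, written with LINQ.
// Assumption: any letter (not just a-z) makes the whole input invalid.
static string CleanPhoneNumber(string messyNumber)
{
    if (messyNumber.Any(c => char.IsLetter(c)))
        return "";

    // Keep ASCII digits and '+', drop brackets, spaces, hyphens etc.
    return new string(messyNumber
        .Where(c => (c >= '0' && c <= '9') || c == '+')
        .ToArray());
}

Console.WriteLine(CleanPhoneNumber("(+48) 123 456 7890"));    // +481234567890
Console.WriteLine(CleanPhoneNumber("12345-678-901"));         // 12345678901
Console.WriteLine(CleanPhoneNumber("(+48) 123 burger 7890")); // (empty line)
```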
The strategy here is to use a lookahead and fail the match if any letters are found.
When there are no letters, it captures the + and all the digits into a match group named "Phone". We then extract the captures from the match's "Phone" group and join them like this:
string pattern = @"
^
(?=[\W\d+\s]+\Z) # Only allows Non Words, decimals and spaces; stop match if letters found
(?<Phone>\+?) # If a plus found at the beginning; allow it
( # Group begin
(?:\W*) # Match but don't *capture* any non numbers
(?<Phone>[\d]+) # Put the numbers in.
)+ # 1 to many numbers.
";
var number = "+44-123-33-8901";
var phoneNumber =
string.Join(string.Empty,
Regex.Match(number,
pattern,
RegexOptions.IgnorePatternWhitespace // Allows us to comment the pattern
).Groups["Phone"]
.Captures
.OfType<Capture>()
.Select(cp => cp.Value));
// phoneNumber is `+44123338901`
If one looks at the match structure, the data it houses is this:
Match #0
[0]: +44-123-33-8901
["1"] → [1]: -8901
→1 Captures: 44, -123, -33, -8901
["Phone"] → [2]: 8901
→2 Captures: +, 44, 123, 33, 8901
As you can see match[0] contains the whole match, but we only need the captures under the "Phone" group. With those captures { +, 44, 123, 33, 8901 } we now can bring them all back together by the string.Join.

Regex containing two specific characters

I have the following regex, which is not working:
@"^[a-z]{1}[a-z0-9\-_(%i)]*$"
The user is allowed to use %i, but only in this combination. A % on its own is not allowed. The expression in parentheses does not work.
The user input could be for example:
testing123%i
testing123
testing-%i-123
But this is not allowed:
testing%123
A character class matches exactly one character, so [(%i)] matches a single (, %, i or ). You need to take %i out of the character class if you want to match %i as a sequence:
^[a-z](?:[a-z0-9_-]|%i)*$
Details:
^ - start of a string
[a-z] - a lowercase ASCII letter
(?:[a-z0-9_-]|%i)* - zero or more occurrences of:
[a-z0-9_-] - a lowercase ASCII letter, a digit, _ or -
| - or
%i - a literal char sequence %i
$ - end of string.
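The pattern can be checked against the inputs from the question with a short program. This is a minimal sketch; namePattern is just an illustrative variable name.

```csharp
using System;
using System.Text.RegularExpressions;

// %i is matched as a two-character sequence, outside the character class.
var namePattern = new Regex(@"^[a-z](?:[a-z0-9_-]|%i)*$");

foreach (var input in new[] { "testing123%i", "testing123", "testing-%i-123", "testing%123" })
{
    Console.WriteLine($"{input}: {namePattern.IsMatch(input)}");
}
// testing123%i: True
// testing123: True
// testing-%i-123: True
// testing%123: False
```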
string pattern = @"\b(?!(?:.\B)(.)(?:\B.)\1)[%i]+\b";
string input = "testing123%i";
if (Regex.IsMatch(input, pattern))
{
    return true;
}

How to validate 'live' input field with Regex?

Is there a way to validate 'live' input field using Regex in C#?
'live' means that I don't validate complete string, I want to validate the string while it's typed.
For example, I want my string to match the next pattern lalala111#alalala123, so I have to check each new char - is it # ? if it's # then is there a # already in the string? if yes, then I return a null, if no, then I return the char. And of course I have to check chars - is it letter or digit? if yes, then ok, if not, then not ok.
I hope you got my idea.
At the moment I have this code:
private char ValidateEmail(string input, int charIndex, char charToValidate)
{
    if (char.IsLetterOrDigit(charToValidate) &&
        !(charToValidate >= 'а' && charToValidate <='я') &&
        !(charToValidate >= 'А' && charToValidate <= 'Я') ||
        charToValidate =='#' ||
        "!#$%&'*+-/=?^_`{|}~#.".Contains(charToValidate.ToString()))
    {
        if ((charToValidate == '#' && input.Contains("#")) ||
            (!input.Contains("#") && charIndex>=63) ||
            (input.Contains("#") && charIndex >= 192))
            return '\0';
    }
    else
    {
        return '\0';
    }
    return char.ToUpper(charToValidate);
}
It allows only Latin letters, digits and some special characters, and it limits the first part of the string (before #) to 64 characters and the second part (after #) to 128 characters. But the code looks ugly, doesn't it? So I want to do all these checks in one neat regular expression.
You have to use the following code:
Declare this line at the top:
System.Text.RegularExpressions.Regex remail = new System.Text.RegularExpressions.Regex(@"^([a-zA-Z0-9_\-\.]+)#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$");
Next, in either the button click or the Leave event handler, use the following code to check:
if (textemail.Text != "" && !remail.IsMatch(textemail.Text))
{
    errorProvider1.Clear();
    textemail.Focus();
    errorProvider1.SetError(textemail, "Wrong Email ID");
    MessageBox.Show("Wrong Email ID");
    textemail.SelectAll();
    return;
}
After a character has been typed, you want the string that has been entered to match one of the following:
between 1 and 64 (inclusive) acceptable characters.
between 1 and 64 (inclusive) acceptable characters then an # character.
between 1 and 64 (inclusive) acceptable characters then an # character then 128 or fewer acceptable characters.
Note that the last two clauses can be combined to say:
between 1 and 64 (inclusive) acceptable characters then an # character then between 0 and 128 inclusive acceptable characters.
Hence the entire requirement can be expressed as:
between 1 and 64 (inclusive) acceptable characters, optionally followed by an # character then between 0 and 128 inclusive acceptable characters.
The definition of "acceptable characters" is, however, not at all clear from the question. The code within ValidateEmail does range checks of 'а' to 'я' and of 'А' to 'Я'. It also checks "!#$%&'*+-/=?^_`{|}~#.".Contains(...).
The text below assumes acceptable characters actually means the 26 letters, upper and lower case, plus the 10 digits.
The required regular expression is then ^[A-Za-z0-9]{1,64}(#[A-Za-z0-9]{0,128})?$
(\w is shorter, but in .NET it also matches the underscore and non-ASCII letters and digits, so it accepts more than this assumption allows.)
This regular expression can then be used to check the concatenation of the already validated input text plus the character just typed.
If additional characters are to be accepted, extend both character classes. For example, if underscores (_) and hyphens (-) are to be allowed, change both classes to [A-Za-z0-9_-].
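A minimal sketch of that check is shown below. It assumes "acceptable characters" means ASCII letters and digits and keeps # as the separator used in the question; AcceptsNextChar is a hypothetical helper name, and \z is used instead of $ so that a trailing newline is not accepted.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumption: acceptable characters are ASCII letters and digits only.
var partialInput = new Regex(@"^[A-Za-z0-9]{1,64}(#[A-Za-z0-9]{0,128})?\z");

// Returns true if the text typed so far plus the new character is still acceptable.
bool AcceptsNextChar(string current, char typed) => partialInput.IsMatch(current + typed);

Console.WriteLine(AcceptsNextChar("lalala111", '#'));         // True
Console.WriteLine(AcceptsNextChar("lalala111#", '#'));        // False (second separator)
Console.WriteLine(AcceptsNextChar("", '#'));                  // False (nothing before the separator)
Console.WriteLine(AcceptsNextChar(new string('a', 64), 'b')); // False (65th char before the separator)
```

In a key-press handler you would call AcceptsNextChar with the current text and the typed character, and discard the character when it returns false.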

regex expression help needed

I was wondering if this was possible using Regex. I would like to exclude all letters (upper and lowercase) and the following 14 characters ! “ & ‘ * + , : ; < = > # _
The problem is the equal sign. In the string that I will be validating (which must be either 20 or 37 characters long), the equal sign must be in either the 17th or the 20th position, because it is used as a separator there. So the regex must reject the string if an equal sign appears anywhere other than the 17th or 20th position, and it must not appear in both. The following are some examples:
pass: 1234567890123456=12345678901234567890
pass: 1234567890123456789=12345678901234567
don't pass: 123456=890123456=12345678901234567
don't pass: 1234567890123456=12=45678901234567890
I am having a hard time with the part that I must allow the equal sign in those two positions and not sure if that's possible with Regex. Adding an if-statement would require substantial code change and regression testing because this function that stores this regex currently is used by many different plug-ins.
I'll go for
^([^a-zA-Z!"&'*+,:;<=>#_]{16}=[^a-zA-Z!"&'*+,:;<=>#_]+|[^a-zA-Z!"&'*+,:;<=>#_]{19}=[^a-zA-Z!"&'*+,:;<=>#_]*)$
Explanations :
1) Start with your allowed char :
^[^a-zA-Z!"&'*+,:;<=>#_]$
[^xxx] means any character except xxx; here a-z are the lowercase letters, A-Z the uppercase ones, followed by your other excluded chars
2) Repeat it 16 times, then =, then more allowed chars (the "allowed char" class followed by '+' means it is repeated 1 to n times)
^[^a-zA-Z!"&'*+,:;<=>#_]{16}=[^a-zA-Z!"&'*+,:;<=>#_]+$
At this point you'll match your first case, when = is at position 17.
3) Your second case will be
^[^a-zA-Z!"&'*+,:;<=>#_]{19}=[^a-zA-Z!"&'*+,:;<=>#_]*$
with the last part followed by * instead of + to handle strings that are only 20 chars long and end with =
4) Just use (case1|case2) to handle both
^([^a-zA-Z!"&'*+,:;<=>#_]{16}=[^a-zA-Z!"&'*+,:;<=>#_]+|[^a-zA-Z!"&'*+,:;<=>#_]{19}=[^a-zA-Z!"&'*+,:;<=>#_]*)$
Tested OK with notepad++ and your examples
Edit to match exactly 20 or 37 chars
^([^a-zA-Z!"&'*+,:;<=>#_]{16}=[^a-zA-Z!"&'*+,:;<=>#_]{3}|[^a-zA-Z!"&'*+,:;<=>#_]{16}=[^a-zA-Z!"&'*+,:;<=>#_]{20}|[^a-zA-Z!"&'*+,:;<=>#_]{19}=|[^a-zA-Z!"&'*+,:;<=>#_]{19}=[^a-zA-Z!"&'*+,:;<=>#_]{17})$
More readable view with explanation :
^(
// 20 chars with = at 17
[^a-zA-Z!"&'*+,:;<=>#_]{16} // 16 allowed chars
= // followed by =
[^a-zA-Z!"&'*+,:;<=>#_]{3} // followed by 3 allowed chars
|
[^a-zA-Z!"&'*+,:;<=>#_]{16} // 37 chars with = at 17
=
[^a-zA-Z!"&'*+,:;<=>#_]{20}
|
[^a-zA-Z!"&'*+,:;<=>#_]{19} // 20 chars with = at 20
=
|
[^a-zA-Z!"&'*+,:;<=>#_]{19} // 37 chars with = at 20
=
[^a-zA-Z!"&'*+,:;<=>#_]{17}
)$
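In C#, the exact-length pattern can be assembled from one character-class constant so that the class is not repeated eight times. This is only a sketch checked against the examples from the question; Allowed and IsValid are illustrative names, and the four cases are folded into two branches.

```csharp
using System;
using System.Text.RegularExpressions;

// The class of allowed characters, taken from the pattern above.
const string Allowed = "[^a-zA-Z!\"&'*+,:;<=>#_]";

// = at position 17 followed by 3 or 20 chars, or = at position 20 followed by 0 or 17 chars.
string pattern =
    "^(?:" + Allowed + "{16}=(?:" + Allowed + "{3}|" + Allowed + "{20})" +
    "|" + Allowed + "{19}=(?:" + Allowed + "{17})?)$";
var separatorRegex = new Regex(pattern);

bool IsValid(string s) => separatorRegex.IsMatch(s);

Console.WriteLine(IsValid("1234567890123456=12345678901234567890")); // True
Console.WriteLine(IsValid("1234567890123456789=12345678901234567")); // True
Console.WriteLine(IsValid("123456=890123456=12345678901234567"));    // False
Console.WriteLine(IsValid("1234567890123456=12=45678901234567890")); // False
```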
I've left out the full set of allowed symbols in the second branch and just used [^=] there; you should replace it with a class containing all allowed symbols except =
var r = new Regex(@"^(([0-9\:\<\>]{16,16}=(([0-9\:\<\>]{20})|([0-9\:\<\>]{3})))|(^[^=]{19,19}=(([0-9\:\<\>]{17}))?))$");
/*
@"^(
([0-9\:\<\>]{16,16}
=
(([0-9\:\<\>]{20})|([0-9\:\<\>]{3})))
|
(^[^=]{19,19}
=
(([0-9\:\<\>]{17}))?)
)$"
*/
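{16,16} means "exactly 16 repetitions" (the same as {16}); fixing every repetition count this way is what pins down both the position of = and the overall string length. The ^ at the beginning and the $ at the end are also important.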

How do I write regex to validate EIN numbers?

I want to validate that a string follows this format (using regex):
valid: 123456789 //9 digits
valid: 12-1234567 // 2 digits + dash + 7 digits
Here's an example, how I would use it:
var r = new Regex(@"^[1-9]\d?-\d{7}$");
Console.WriteLine(r.IsMatch("1-2-3"));
I have the regex for the format with the dash, but can't figure out how to include the non-dash format.
Regex regex = new Regex("^\\d{2}-?\\d{7}$");
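This will accept the two formats you want: 2 digits, then an optional dash, then 7 digits.
^ (?: \d{9} | \d{2} - \d{7} ) $
The alternation must be grouped; otherwise ^ applies only to the first alternative and $ only to the second. Remove the spaces (or pass RegexOptions.IgnorePatternWhitespace), they are there for readability.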
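A quick check of the grouped pattern against the formats from the question could look like this. It is a minimal sketch; IsValidEin is a hypothetical helper name.

```csharp
using System;
using System.Text.RegularExpressions;

// Either 9 digits, or 2 digits + dash + 7 digits; the group keeps both anchors on both branches.
var einPattern = new Regex(@"^(?:\d{9}|\d{2}-\d{7})$");

bool IsValidEin(string s) => einPattern.IsMatch(s);

Console.WriteLine(IsValidEin("123456789"));  // True
Console.WriteLine(IsValidEin("12-1234567")); // True
Console.WriteLine(IsValidEin("1-2-3"));      // False
Console.WriteLine(IsValidEin("1234-56789")); // False
```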
