How to add apostrophe as allowed character to existing regular expression? - c#

I have the following regular expression for one of the name fields in my C# web app:
^[A-Za-zÀ-ſ0-9.,#&-/'_!#;]?[a-zA-ZÀ-ſ0-9 '#&-/.,_:!#;]*[A-Za-zÀ-ſ0-9.,#-/_!#;]$
How can I modify it so that the apostrophe/single quote character (') is also allowed?

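' delimits a char literal in C#, so inside a char literal you must put a backslash in front of it to escape it, like this: '\''. (Inside a double-quoted string, \' is also a valid escape, but it is not required.)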
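A minimal C# sketch of where the backslash is actually required: in a char literal the apostrophe must be escaped, while in a string literal (and therefore in a regex pattern written as a string) \' is accepted but not needed. The pattern below is a simplified, hypothetical stand-in, not the one from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// In a char literal, ' is the delimiter, so it must be escaped.
char apostrophe = '\'';

// In a string literal no escape is needed; "\'" is legal and yields the same text.
string plain = "O'Brien";
string escaped = "O\'Brien";
bool sameText = plain == escaped;

// Simplified, hypothetical pattern: letters and apostrophes only.
bool matches = Regex.IsMatch(plain, @"^[A-Za-z']+$");

Console.WriteLine($"{apostrophe} {sameText} {matches}"); // ' True True
```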

It turned out that the regex had been fine all along; the problem was the way the data had been inserted into the database. The INSERT statements should have had the apostrophes escaped. Even though the apostrophes were displayed correctly, they had been failing the regex check because they were not escaped. Thanks for your advice, and sorry for any disappointment!
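That the regex was fine is visible in the pattern itself: &-/ (U+0026 to U+002F) and #-/ (U+0023 to U+002F) are ranges that already contain the apostrophe (U+0027), so every class in the pattern accepts '. A quick check, copying the pattern exactly as it appears in the question; the sample names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern copied from the question.
const string NamePattern =
    @"^[A-Za-zÀ-ſ0-9.,#&-/'_!#;]?[a-zA-ZÀ-ſ0-9 '#&-/.,_:!#;]*[A-Za-zÀ-ſ0-9.,#-/_!#;]$";

// Hypothetical sample inputs.
string[] names = { "O'Brien", "D'Arcy", "Jones'" };
foreach (string name in names)
{
    Console.WriteLine($"{name}: {Regex.IsMatch(name, NamePattern)}"); // all True
}

// The last class has no literal ', but the range #-/ covers it.
Console.WriteLine(Regex.IsMatch("'", "^[#-/]$")); // True

// A character outside every class is still rejected.
Console.WriteLine(Regex.IsMatch("Smith<", NamePattern)); // False
```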

Related

C# regex.escape unexpected behavior when processing "."

Hey, I have an issue with Regex.Escape. I'm trying to feed it an email address from a TextBox control. The function receives "test@test.test". What I expect to get is "test@test\.test", since Regex.Escape escapes the dot character. However, what I get instead is "test@test\\.test", which is very confusing. I plan on handing that string down to an SQL query and I'm worried about users misbehaving.
holder.address = Regex.Escape(EmailAddressInput.Text);
This is how I assign resulting string to field in holder class.
I have been researching this problem on my own, but most sources (including MSDN) suggest prefixing the dot ("the special character") with a single backslash.
As it is, the backslash itself gets escaped, and the result is a badly formatted email address.
var s = "test@test\\.test"; means that s holds the string test@test\.test. Your issue does not exist: there is a single backslash. Click the magnifier button on the right and you will see it in the Text Visualizer.
The displayed value shows \\ because, in that C# literal view, the \ itself is escaped;
the string itself actually contains only one \.
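To confirm the answers, this sketch counts the backslashes actually stored in the result of Regex.Escape. The literal "test@test.test" is a hypothetical stand-in for EmailAddressInput.Text. Note that Regex.Escape only escapes regex metacharacters; it is not a way to make input safe for SQL, where parameterized queries are the usual approach.

```csharp
using System;
using System.Text.RegularExpressions;

// Stand-in for EmailAddressInput.Text (hypothetical value).
string input = "test@test.test";
string escaped = Regex.Escape(input);

// Count the backslash characters actually stored in the result.
int backslashes = 0;
foreach (char c in escaped)
{
    if (c == '\\') backslashes++;
}

Console.WriteLine(escaped);      // test@test\.test
Console.WriteLine(backslashes);  // 1

// Regex.Unescape reverses the transformation.
string roundTrip = Regex.Unescape(escaped);
Console.WriteLine(roundTrip == input); // True
```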

Regex not matching when input string contains an ampersand

I am trying to come up with a regex for strings that start with a letter, followed only by letters, spaces, commas, dots, ampersands, apostrophes and hyphens.
However, the ampersand character is giving me headaches. Whenever it appears in the input string, the regex no longer matches.
I am using the regex in an ASP.NET project written in C#, in the 'Format' property of a TextInput (a custom control created in the project). Inside the control, I call Regex.IsMatch(Text, Format) to test it.
For example, using this regex:
^[a-zA-Z][a-zA-Z&.,'\- ]*$
The results are:
John' william-david Pass
John, william'david allen--tony-'' Pass
John, william&david Fail
Whenever I put a & in the input string the regex no longer matches, but without it everything works fine.
How can I fix my issue? Why would the ampersand be causing a problem?
Notes:
I've tried escaping the ampersand with ^[a-zA-Z][a-zA-Z\&.,'\- ]*$, but it has the same issue
I've tried putting the ampersand at the beginning or end of the class, ^[a-zA-Z][&a-zA-Z.,'\- ]*$ or ^[a-zA-Z][a-zA-Z.,'\-\& ]*$, but that doesn't work either
Your problem is somewhere else. The following expression evaluates to true:
Regex.IsMatch(@"John, william&david", @"^[a-zA-Z][a-zA-Z&.,'\- ]*$")
See https://dotnetfiddle.net/WDvQNP
You mentioned in the comments that your problem pertains to C#, so I'll answer your question in that context. If the ampersand (&) is truly giving you issues in your character class, you should specify it in an alternate manner.
Luckily, C# (that is, the .NET regex engine) supports hex escape sequences, which means that you can specify & as \x26.
For example, instead of:
^[a-zA-Z][a-zA-Z&.,'\- ]*$
use
^[a-zA-Z][a-zA-Z\x26.,'\- ]*$
If that doesn't fix your issue, then your issue is not the &, it's something else.
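A sketch that runs the question's sample inputs through both the literal-& and the \x26 forms of the pattern; both accept all three strings, which supports the point that the problem lies elsewhere. The last check illustrates one possible cause, which is purely an assumption not confirmed by the question: if the custom control hands the regex HTML-encoded text, & arrives as &amp;, and the ; is not in the class.

```csharp
using System;
using System.Text.RegularExpressions;

const string LiteralAmp = @"^[a-zA-Z][a-zA-Z&.,'\- ]*$";
const string HexAmp = @"^[a-zA-Z][a-zA-Z\x26.,'\- ]*$";

// Sample inputs from the question.
string[] inputs =
{
    "John' william-david",
    "John, william'david allen--tony-''",
    "John, william&david",
};

foreach (string s in inputs)
{
    // Both forms accept all three inputs.
    Console.WriteLine($"{s}: {Regex.IsMatch(s, LiteralAmp)} / {Regex.IsMatch(s, HexAmp)}");
}

// Hypothetical cause: HTML-encoded input ("&amp;") contains ';', which the class rejects.
bool encodedMatches = Regex.IsMatch("John, william&amp;david", LiteralAmp);
Console.WriteLine(encodedMatches); // False
```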

Add special characters to alphanumeric regex

I'm brand new to regular expressions. I have one that currently accepts alphanumeric characters only, and I need to add the following special characters to it:
@ #$%*():;"',/? !+=-_
Here is the regular expression:
RegularExpression(@"^[a-zA-Z\s.,0-9-]{1,30}$",
When I try to add the special characters, I alter the Regex like so:
RegularExpression(@"^[a-zA-Z\s.,0-9-@ #$%*():;"',/? !+=-_]{1,30}$"
However, this throws a "Newline in constant" error starting at the ' character.
I've tried escaping both the " and the ' characters, but without any luck.
The problem comes from the double quote, which must be escaped by doubling it ("") inside a verbatim @"..." string, not from the single quote.
@"^[a-zA-Z\s.,0-9@#$%*():;""'/?!+=_-]{1,30}$"
Note that the - must be in the last (or first) position in a character class, since elsewhere it has a special meaning (it defines ranges).
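A small illustration of the point about -, using cut-down hypothetical patterns rather than the full one from the question: with - between = and _, .NET reads =-_ as a range (U+003D to U+005F), which accidentally admits characters such as [ and no longer admits a literal hyphen. Moving it to the end, or escaping it as \-, makes it literal.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical, cut-down patterns.
const string AccidentalRange = @"^[a-zA-Z0-9+=-_]+$"; // "=-_" is the range U+003D..U+005F
const string HyphenLast = @"^[a-zA-Z0-9+=_-]+$";      // '-' right before ']' is literal
const string HyphenEscaped = @"^[a-zA-Z0-9+=\-_]+$";  // '\-' is literal anywhere

// '[' (U+005B) falls inside the accidental range; '-' (U+002D) does not.
Console.WriteLine(Regex.IsMatch("[", AccidentalRange));   // True
Console.WriteLine(Regex.IsMatch("a-b", AccidentalRange)); // False

Console.WriteLine(Regex.IsMatch("[", HyphenLast));        // False
Console.WriteLine(Regex.IsMatch("a-b", HyphenLast));      // True
Console.WriteLine(Regex.IsMatch("a-b", HyphenEscaped));   // True
```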
These regexes are equivalent to yours.
Both use tilde ~ as the delimiter.
Both use double quotes on the regex strings.
Note that in order for the dash - in a class to be interpreted literally and not as a range operator, it must sit somewhere unambiguous, or be escaped.
A good place to put it is between valid ranges (or at the beginning or end of a class).
For example [a-z-0-9] is a good place.
Edit: a literal '-' may have to be escaped or placed at the beginning or end of the class. (This case was for Perl/PCRE engines.)
This one ^[a-z-A-Z0-9_\s.,@#$%*():;"',/?!+=]{1,30}$ is your regex without duplicate characters.
Noting that the word class \w is contained in it, it can be reduced for readability to
^[\w-\s.,@#$%*():;"',/?!+=]{1,30}$
Edit: PHP test cases removed.
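One caveat when using the reduced pattern in .NET rather than PHP: .NET's \w is Unicode-aware, so it also accepts letters such as é that [a-zA-Z0-9_] rejects. This sketch shows the difference, along with RegexOptions.ECMAScript, which restricts \w to [a-zA-Z_0-9].

```csharp
using System;
using System.Text.RegularExpressions;

string e = "\u00E9"; // 'é'

// Default .NET behavior: \w includes Unicode letters.
bool unicodeWord = Regex.IsMatch(e, @"^\w$");

// RegexOptions.ECMAScript restricts \w to [a-zA-Z_0-9].
bool ecmaWord = Regex.IsMatch(e, @"^\w$", RegexOptions.ECMAScript);

// The explicit ASCII class used in the longer pattern.
bool asciiClass = Regex.IsMatch(e, @"^[a-zA-Z0-9_]$");

Console.WriteLine($"{unicodeWord} {ecmaWord} {asciiClass}"); // True False False
```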

Why does .* fail to match the entire (rest of the) string in this regex?

I ran into a problem with my regular expressions. I'm using them to extract data from the string below:
"# DO NOT EDIT THIS MAIL BY HAND #\r\n\r\n[Feedback]:hallo\r\n\r\n# DO NOT EDIT THIS MAIL BY HAND #\r\n\r\n"
So far I got it working with:
String sFeedback = Regex.Match(Message, @"\[Feedback\]\:(?<string>.*?)\r\n\r\t\n# DO NOT EDIT THIS MAIL BY HAND #").Groups[1].Value;
This works unless the header changes, so I want the regex to read everything from [Feedback]: to the end of the string (symbols, ASCII, everything).
I tried: \[Feedback]:(?<string>.*?)$
The above regular expression works in some online regex testers, but in my C# code it doesn't work and returns an empty string. What's wrong?
The problem is that . doesn't match \n unless you pass RegexOptions.Singleline when constructing the regex, or enable that option inline using (?s):
(?s)\[Feedback\]:(.*)$
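A sketch using the sample message from the question. Without Singleline, (.*) stops at the first \n, and $ (without Multiline) only matches at the very end of the string or before a final \n, so the match fails and Groups[1].Value is the empty string the question reports. With (?s), or equivalently RegexOptions.Singleline, the capture runs to the end of the string.

```csharp
using System;
using System.Text.RegularExpressions;

string message = "# DO NOT EDIT THIS MAIL BY HAND #\r\n\r\n[Feedback]:hallo\r\n\r\n# DO NOT EDIT THIS MAIL BY HAND #\r\n\r\n";

// Without Singleline: no match at all, so Groups[1].Value is "".
Match plain = Regex.Match(message, @"\[Feedback\]:(.*)$");

// Inline (?s) lets '.' match '\n', so (.*) runs to the end of the string.
Match dotAll = Regex.Match(message, @"(?s)\[Feedback\]:(.*)$");

// Same thing with the option passed explicitly.
Match withOption = Regex.Match(message, @"\[Feedback\]:(.*)$", RegexOptions.Singleline);

Console.WriteLine(plain.Success);                                        // False
Console.WriteLine(dotAll.Groups[1].Value == withOption.Groups[1].Value); // True
Console.WriteLine(dotAll.Groups[1].Value.StartsWith("hallo"));           // True
```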
You are missing the escape character before the closing ] in \[Feedback].
Also, since you are not referring to the group by name in your C# code, you could simplify your regex further to this:
\[Feedback\]:(.*)$
$ in regex means:
The match must occur at the end of the string or before \n at the end of the line or string.
and . means:
Matches any single character except \n.
Try this simpler regex:
\[Feedback\]:(?<string>.*)
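Following the two definitions quoted above, this simpler pattern always finds a match on the sample message, but it captures only the rest of the [Feedback] line, not the rest of the string. Because . excludes only \n, the \r of a Windows-style line ending is captured too. A sketch using the message from the question; trimming with TrimEnd is just one way to drop the trailing \r.

```csharp
using System;
using System.Text.RegularExpressions;

string message = "# DO NOT EDIT THIS MAIL BY HAND #\r\n\r\n[Feedback]:hallo\r\n\r\n# DO NOT EDIT THIS MAIL BY HAND #\r\n\r\n";

// No $ and no Singleline: (?<string>.*) stops just before the first '\n'.
Match m = Regex.Match(message, @"\[Feedback\]:(?<string>.*)");
string value = m.Groups["string"].Value;

// '.' excludes only '\n', so the '\r' of "\r\n" is part of the capture.
Console.WriteLine(value == "hallo\r");        // True
Console.WriteLine(value.TrimEnd('\r', '\n')); // hallo
```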

Removing String Escape Codes

My program outputs strings like "Wzyryrff}av{v5~fvzu: Bb``igbuz~+\177Ql\027}C5]{H5LqL{" and the problem is the escape codes (\\\ instead of \, \177 instead of the character, etc.)
I need a way to unescape the string of all escape codes (mainly just the \\\ and octal \027 types). Is there something that already does this?
Thanks
Reference: http://www.tailrecursive.org/postscript/escapes.html
The strings are encrypted values and I need to decrypt them, but I'm getting the wrong values because the strings are escaped.
It sounds more like it's encoded rather than simply escaped (if \177 is really a character). So, try decoding it.
There is nothing built in to do exactly this kind of escaping.
You will need to parse and replace these sequences yourself.
The \xxx octal escapes can be found with a regex (\\\d{3}); iterating over the matches lets you parse out the octal part and get the replacement character for each one (then a simple replace will do).
The others appear to be simple to replace with string.Replace.
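A minimal sketch of that approach, folding the \\ case into the same Regex.Replace call. Assumptions: every octal escape has exactly three digits, as in the \\\d{3} above (tightened here to [0-7] so 8 and 9 are not treated as octal), and the other PostScript escapes (\n, \t, \( and so on) as well as the shorter octal forms are not handled. UnescapeOctal is a hypothetical helper name.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: decodes "\\" and three-digit octal escapes such as "\027" in one pass.
static string UnescapeOctal(string input) =>
    Regex.Replace(input, @"\\([0-7]{3}|\\)", m =>
    {
        string body = m.Groups[1].Value;
        // A doubled backslash becomes a single one.
        if (body == @"\") return @"\";
        // Otherwise parse the three digits as base 8 and emit that character.
        return ((char)Convert.ToInt32(body, 8)).ToString();
    });

string decoded = UnescapeOctal(@"a\\b\101\027");
Console.WriteLine((int)decoded[3]); // 65 ('A')
Console.WriteLine((int)decoded[4]); // 23 (octal 027)
```

Handling both escape kinds in a single Regex.Replace avoids an ordering pitfall: collapsing \\ first with string.Replace would turn the escaped backslash in \\101 into \101, which would then be misread as the octal escape for A.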
If the string is encrypted then you probably need to treat it as binary and not text. You need to know how it is encoded and decode it accordingly. The fact that you can view it as text is incidental.
If you want to replace specific content, you can just use the .Replace() method,
e.g. myInput.Replace(@"\\", @"\") to turn a doubled backslash into a single one.
I am not sure why the "\" is a problem for you. If it is actually an escape code, it should be fine, since \\ represents a single \ in a C# string literal.
What is the reason you need to "remove" the escape codes?
