Escaping hash and quote to regular expression

Escaping hash and quote to regular expression - c#

I am trying to define a regular to use with a regular expression validator that limits the content of a textbox to only alphanumeric characters, slash (/), hash (#), left and right parentheses (()), period (.), apostrophe ('), quote ("), hyphen (-) and spaces.
I am having troubles with the hash and quote, the other restrictions are working, but when I insert one of these chars the evaluation fails and I get the error message. I have tried to escape these characters without and also using verbatim which was my last attempt.
#"[ a-zA-ZÀ-ÿ/().\'-""#]"
Any thoughts on these? Thank you

The regex language is smart enough to understand that periods and parentheses within a character class actually refer to the characters and not to the patterns they usually do when they appear outside of character classes.
Within your character class, you need to escape the slash (\) and the hyphen(-), but that's it:
#"[ a-zA-ZÀ-ÿ/().\\'\-""#]"
If you move your hyphen to the end of the character class, you won't even need to escape that:
#"[ a-zA-ZÀ-ÿ/().\\'""#-]"
And of course this still only matches one a single character. If you want to ensure that the entire string consists only of these characters, you'll need to use start (^) and end ($) anchors and a quantifier (* or +) after your character class.
I believe your final pattern should look like this:
#"^[ a-zA-ZÀ-ÿ/().\\'""#-]*$"

Related

c# regex not removing ampersand [duplicate]

$.validator.addMethod('AZ09_', function (value) {
return /^[a-zA-Z0-9.-_]+$/.test(value);
}, 'Only letters, numbers, and _-. are allowed');
When I use somehting like test-123 it still triggers as if the hyphen is invalid. I tried \- and --

Escaping using \- should be fine, but you can also try putting it at the beginning or the end of the character class. This should work for you:
/^[a-zA-Z0-9._-]+$/

Escaping the hyphen using \- is the correct way.
I have verified that the expression /^[a-zA-Z0-9.\-_]+$/ does allow hyphens. You can also use the \w class to shorten it to /^[\w.\-]+$/.
(Putting the hyphen last in the expression actually causes it to not require escaping, as it then can't be part of a range, however you might still want to get into the habit of always escaping it.)

The \- maybe wasn't working because you passed the whole stuff from the server with a string. If that's the case, you should at first escape the \ so the server side program can handle it too.
In a server side string: \\-
On the client side: \-
In regex (covers): -
Or you can simply put at the and of the [] brackets.

Generally with hyphen (-) character in regex, its important to note the difference between escaping (\-) and not escaping (-) the hyphen because hyphen apart from being a character themselves are parsed to specify range in regex.
In the first case, with escaped hyphen (\-), regex will only match the hyphen as in example /^[+\-.]+$/
In the second case, not escaping for example /^[+-.]+$/ here since the hyphen is between plus and dot so it will match all characters with ASCII values between 43 (for plus) and 46 (for dot), so will include comma (ASCII value of 44) as a side-effect.

\- should work to escape the - in the character range. Can you quote what you tested when it didn't seem to? Because it seems to work: http://jsbin.com/odita3

A more generic way of matching hyphens is by using the character class for hyphens and dashes ("\p{Pd}" without quotes). If you are dealing with text from various cultures and sources, you might find that there are more types of hyphens out there, not just one character. You can add that inside the [] expression

Ignore spaces at the end of a string

I use the following regex, which is working, but I want to add a condition so as to accept spaces at the end of the value. Currently it is not working.
What am I missinghere?
^[a-zA-Z][a-zA-Z0-9_]+\s?$[\s]*$

Assumption: you added the two end of string anchors $ by mistake.
? quantifier, matching one or zero repetitions, makes the previous item optional
* quantifier, matching zero or more repetitions
So change your expression to
^[a-zA-Z][a-zA-Z0-9_]+\s*$
this is matching any amount of whitespace at the end of the string.
Be aware, whitespace is not just the space character, it is also tabs and newlines (and more)!
If you really want to match only space, just write a space or make a character class with all the characters you want to match.
^[a-zA-Z][a-zA-Z0-9_]+ *$
or
^[a-zA-Z][a-zA-Z0-9_]+[ \t]*$
Next thing is: Are you sure you only want plain ASCII letters? Today there is Unicode and you can use Unicode properties, scripts and blocks in your regular expressions.
Your expression in Unicode, allowing all letters and digits.
^\p{L}\w+\s*$
\p{L} Unicode property, any kind of letter from any language.
\w shorthand character class for word characters (letters, digits and connector characters like "_") [\p{L}\p{Nd}\p{Pc}] as character class with Unicode properties. Definition on msdn

why two dollars?
^[a-zA-Z][a-zA-Z0-9_]+\s*$
or make it this :
"^[a-zA-Z][a-zA-Z0-9_]+\s?\$\s*$"
if you want to literally match the dollar.

Try this -
"^[a-zA-Z][a-zA-Z0-9_]+(\s)?$"
or this -
"^[a-zA-Z][a-zA-Z0-9_]+((\s){,})$"
$ indicates end of expression, if you are looking $ as character, then escape it with \

Add special characters to alphanumeric regex

Brand new to using Regular Expressions. I have one that currently accepts alphanumeric characters only. I need to add the following special characters to the regex:
# #$%*():;"',/? !+=-_
Here is the regular expression:
RegularExpression(#"^[a-zA-Z\s.,0-9-]{1,30}$",
When I try to add the special characters, I alter the Regex like so:
RegularExpression(#"^[a-zA-Z\s.,0-9-# #$%*():;"',/? !+=-_]{1,30}$"
However this throws an error starting with the ' character that says Newline in constant.
I've tied to escape both the " and the ' characters, however without any luck.

the problem comes from the double quote that need to be escaped (""), not from the single quote.
#"^[a-zA-Z\s.,0-9##$%*():;""'/?!+=_-]{1,30}$"
note that the - must be at the last (or first) position in a character class, since it has a special meaning (define ranges)

These regexs' are equivalent to yours.
Both use tilde ~ as the delimeter.
Both use double quotes on the regex strings.
Note that in order for the the dash - in class to be interpreted literally and not as a range operator, it must exist somewhere disambiguous, or be escaped.
A good place to put it is between valid ranges (or at the beginning or end of a class).
For example [a-z-0-9] is a good place.
Edit - '-' Literal may have to be escaped or beginning/end of class. (This case was for Perl/PCRE engines)
This one ^[a-z-A-Z0-9_\s.,##$%*():;"',/?!+=]{1,30}$ is your regex without duplicate chars.
To make it more readable noting that the word class is contained, it can be reduced to
^[\w-\s.,##$%*():;"',/?!+=]{1,30}$
Edit - Php test cases removed.

Simple C# regex

I have a regex I need to match against a path like so: "C:\Documents and Settings\User\My Documents\ScanSnap\382893.pd~". I need a regex that matches all paths except those ending in '~' or '.dat'. The problem I am having is that I don't understand how to match and negate the exact string '.dat' and only at the end of the path. i.e. I don't want to match {d,a,t} elsewhere in the path.
I have built the regex, but need to not match .dat
[\w\s:\.\\]*[^~]$[^\.dat]
[\w\s:\.\\]* This matches all words, whitespace, the colon, periods, and backspaces.
[^~]$[^\.dat]$ This causes matches ending in '~' to fail. It seems that I should be able to follow up with a negated match for '.dat', but the match fails in my regex tester.
I think my answer lies in grouping judging from what I've read, would someone point me in the right direction? I should add, I am using a file watching program that allows regex matching, I have only one line to specify the regex.
This entry seems similar: Regex to match multiple strings

You want to use a negative look-ahead:
^((?!\.dat$)[\w\s:\.\\])*$
By the way, your character group ([\w\s:\.\\]) doesn't allow a tilde (~) in it. Did you intend to allow a tilde in the filename if it wasn't at the end? If so:
^((?!~$|\.dat$)[\w\s:\.\\~])*$

The following regex:
^.*(?<!\.dat|~)$
matches any string that does NOT end with a '~' or with '.dat'.
^ # the start of the string
.* # gobble up the entire string (without line terminators!)
(?<!\.dat|~) # looking back, there should not be '.dat' or '~'
$ # the end of the string
In plain English: match a string only when looking behind from the end of the string, there is no sub-string '.dat' or '~'.
Edit: the reason why your attempt failed is because a negated character class, [^...] will just negate a single character. A character class always matches a single character. So when you do [^.dat], you're not negating the string ".dat" but you're matching a single character other than '.', 'd', 'a' or 't'.

^((?!\.dat$)[\w\s:\.\\])*$
This is just a comment on an earlier answer suggestion:
. within a character class, [], is a literal . and does not need escaping.
^((?!\.dat$)[\w\s:.\\])*$
I'm sorry to post this as a new solution, but I apparently don't have enough credibility to simply comment on an answer yet.

I believe you are looking for this:
[\w\s:\.\\]*([^~]|[^\.dat])$
which finds, like before, all word chars, white space, periods (.), back slashes. Then matches for either tilde (~) or '.dat' at the end of the string. You may also want to add a caret (^) at the very beginning if you know that the string should be at the beginning of a new line.
^[\w\s:\.\\]*([^~]|[^\.dat])$

why do these regex tests let certain characters pass?

I am checking a string with the following regexes:
[a-zA-Z0-9]+
[A-Za-z]+
For some reason, the characters:
.
-
_
are allowed to pass, why is that?

If you want to check that the complete string consists of only the wanted characters you need to anchor your regex like follows:
^[a-zA-Z0-9]+$
Otherwise every string will pass that contains a string of the allowed characters somewhere. The anchors essentially tell the regular expression engine to start looking for those characters at the start of the string and stop looking at the end of the string.
To clarify: If you just use [a-zA-Z0-9]+ as your regex, then the regex engine would rightfully reject the string -__-- as the regex doesn't match against that. There is no single character from the character class you defined.
However, with the string a-b it's different. The regular expression engine will match the first a here since that matches the expression you entered (at least one of the given characters) and won't care about the - or the b. It has done its job and successfully matched a substring according to your regular expression.
Similarly with _-abcdef- – the regex will match the substring abcdef just fine, because you didn't tell it to match only at the start or end of the string; and ignore the other characters.
So when using ^[a-zA-Z0-9]+$ as your regex you are telling the regex engine definitely that you are looking for one or more letters or digits, starting at the very beginning of the string right until the end of the string. There is no room for other characters to squeeze in or hide so this will do what you apparently want. But without the anchors, the match can be anywhere in your search string. For validation purposes you always want to use those anchors.

In regular expressions the + tells the engine to match one or more characters.
So this expression [A-Za-z]+ passes if the string contains a sequence of 1 or more alphabetic characters. The only strings that wouldn't pass are strings that contain no alphabetic characters at all.
The ^ symbol anchors the character class to the beginning of the string and the $ symbol anchors to the end of the string.
So ^[A-Za-z0-9]+ means 'match a string that begins with a sequence of one or more alphanumeric characters'. But would allow strings that include non-alphanumerics so long as those characters were not at the beginning of the string.
While ^[A-Za-z0-9]+$ means 'match a string that begins and ends with a sequence of one or more alphanumeric characters'. This is the only way to completely exclude non-alphanumerics from a string.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Escaping hash and quote to regular expression - c#

Related

c# regex not removing ampersand [duplicate]

Ignore spaces at the end of a string

Add special characters to alphanumeric regex

Simple C# regex

why do these regex tests let certain characters pass?

Categories

Resources