I'm using C# and I'm trying to allow only alphabetical letters and spaces. My expression at the moment is:
string regex = "^[A-Za-z\s]{1,40}$";
My IDE says that \s is an "Unrecognized escape sequence".
What am I missing?
"\" is a c# escape character as well as a regex escape character. Try:
string regex = @"^[A-Za-z\s]{1,40}$";
You need to put an @ in front of your string to turn it into a verbatim string literal:
string regex = @"^[A-Za-z\s]{1,40}$";
Right now, the \ in your regex is being interpreted as trying to escape the following s, which the compiler doesn't understand.
Alternatively, you can just escape the backslash with another one:
string regex = "^[A-Za-z\\s]{1,40}$";
but in general, prefer the first approach to the second.
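As a quick check of the two spellings above, here is a minimal sketch (the class and constant names are only illustrative) showing that the verbatim literal and the double-backslash literal produce the same pattern:

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimDemo
{
    // The same pattern written two ways (names are hypothetical).
    public const string Verbatim = @"^[A-Za-z\s]{1,40}$";  // verbatim literal: backslash is kept as-is
    public const string Escaped = "^[A-Za-z\\s]{1,40}$";   // regular literal: backslash must be doubled

    public static void Main()
    {
        // Both literals compile to the identical string.
        Console.WriteLine(Verbatim == Escaped);                   // True
        Console.WriteLine(Regex.IsMatch("John Smith", Verbatim)); // True
        Console.WriteLine(Regex.IsMatch("John_Smith", Verbatim)); // False
    }
}
```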
An additional note: your regex doesn't do what you describe. You say a maximum of one space between words. To get that, you need to move the "\s" out of the character class. The pattern you're currently using allows "any letter or whitespace character, 1 to 40 times", which permits multiple successive spaces. You'll need something more like the following:
string regex = @"^(?:[A-Za-z]+\s?)+$";
This means "one or more letters followed by an optional whitespace character, this whole thing one or more times". It does not limit the whole string to 40 characters, because the size of each repetition isn't known in advance. A lookahead such as (?=.{1,40}$) placed right after the ^ can enforce that limit, or you can do it in two steps by checking the string length separately.
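Here is a rough sketch of both ways to add the 40-character limit: a separate length check, and a lookahead at the start of the pattern. The class and method names are made up for illustration. Note that I swapped in a slightly different word pattern, ^[A-Za-z]+(?:\s[A-Za-z]+)*$, which (unlike the pattern above) does not accept a trailing space; I did that because the nested quantifiers in (?:[A-Za-z]+\s?)+ can backtrack heavily on long non-matching input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NameCheck
{
    // Words made of letters, separated by exactly one whitespace character.
    // Assumption: a trailing space is NOT allowed in this variant.
    private static readonly Regex Words =
        new Regex(@"^[A-Za-z]+(?:\s[A-Za-z]+)*$");

    // Same rule, with the 1..40 length limit folded in as a lookahead.
    private static readonly Regex WordsMax40 =
        new Regex(@"^(?=.{1,40}$)[A-Za-z]+(?:\s[A-Za-z]+)*$");

    // Two steps: check the length first, then the shape.
    public static bool IsValidTwoStep(string s) =>
        s.Length >= 1 && s.Length <= 40 && Words.IsMatch(s);

    // One step: the lookahead enforces the length.
    public static bool IsValidOnePattern(string s) => WordsMax40.IsMatch(s);

    public static void Main()
    {
        Console.WriteLine(IsValidTwoStep("John Smith"));            // True
        Console.WriteLine(IsValidOnePattern("John Smith"));         // True
        Console.WriteLine(IsValidTwoStep("John  Smith"));           // False (two spaces)
        Console.WriteLine(IsValidOnePattern(new string('a', 41)));  // False (too long)
    }
}
```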
I would like to make a regex that validate a string is in this format:
".xml;.mp4;.webm;.wmv;.ogg"
file format separated with semicolon.
what is the best way to do this?
We can try using the pattern ^(?:\.[A-Za-z0-9]{3,4})(?:;\.[A-Za-z0-9]{3,4})*$:
Regex regex = new Regex(@"^(?:\.[A-Za-z0-9]{3,4})(?:;\.[A-Za-z0-9]{3,4})*$");
Match match = regex.Match(".xml;.mp4;.webm;.wmv;.ogg");
if (match.Success)
{
    Console.WriteLine("MATCH");
}
Explanation:
^ from the start of the string
(?:\.[A-Za-z0-9]{3,4}) match a dot followed by 3-4 alphanumeric characters
(?:;\.[A-Za-z0-9]{3,4})* then match semicolon, followed by dot and 3-4 alphanumeric
characters, that quantity zero or more times
$ match the end of the string
Side note: I used ?: inside the parenthesized terms, which tells the regex engine not to capture these groups. This might improve performance, though perhaps at the cost of the pattern being slightly less readable.
Something like this. It also handles a list with only one format, which is not followed by a semicolon:
^(?:\.[a-zA-Z0-9]+;)*\.[a-zA-Z0-9]+$
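To see how this last pattern treats the edge cases, here is a small sketch (the ExtensionList class and IsValid method are hypothetical names, not part of any library):

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExtensionList
{
    // Zero or more ".ext;" items followed by one final ".ext" with no trailing semicolon.
    private static readonly Regex List =
        new Regex(@"^(?:\.[a-zA-Z0-9]+;)*\.[a-zA-Z0-9]+$");

    public static bool IsValid(string s) => List.IsMatch(s);

    public static void Main()
    {
        Console.WriteLine(IsValid(".xml;.mp4;.webm;.wmv;.ogg")); // True
        Console.WriteLine(IsValid(".xml"));                      // True  (single format)
        Console.WriteLine(IsValid(".xml;"));                     // False (trailing semicolon)
        Console.WriteLine(IsValid("xml;.mp4"));                  // False (missing leading dot)
    }
}
```

Unlike the {3,4} version earlier, this one accepts extensions of any length, so pick whichever matches your data.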
I have a username validator IsValidUsername, and I am testing "baconman" but it is failing, could someone please help me out with this regex?
if (!Regex.IsMatch(str, @"^[a-zA-Z]\\w+|[0-9][0-9_]*[a-zA-Z]+\\w*$")) {
    isValid = false;
}
I want the restrictions to be: (It's very close)
Be between 5 & 17 characters long
contain at least one letter
no spaces
no special characters
You're escaping unnecessarily: since your pattern is a verbatim string (it starts with @ outside the quotes), you don't need to double the \. One is enough.
Either:
@"\w"
or
"\\w"
Edit: I didn't make this clear: right now, because of the double escaping, your regex looks for a literal \ followed by one or more w characters. So your match would need [some letter]\w to match (for example, "a\w" or "a\wwwwww" would match).
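A small sketch of the difference, using the pattern from the question with double and single backslashes (the class and constant names are just for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Pattern as posted: inside a verbatim string, \\w means "literal backslash, then w".
    public const string DoubleEscaped = @"^[a-zA-Z]\\w+|[0-9][0-9_]*[a-zA-Z]+\\w*$";

    // Same pattern with single backslashes: \w is the word-character class.
    public const string SingleEscaped = @"^[a-zA-Z]\w+|[0-9][0-9_]*[a-zA-Z]+\w*$";

    public static void Main()
    {
        Console.WriteLine(Regex.IsMatch("baconman", DoubleEscaped)); // False
        Console.WriteLine(Regex.IsMatch(@"a\w", DoubleEscaped));     // True
        Console.WriteLine(Regex.IsMatch("baconman", SingleEscaped)); // True
    }
}
```

This only fixes the escaping. The corrected pattern still does not enforce the length limits, and because the first alternative is not anchored at the end, something like "ab!" also passes.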
Your requirements are best taken care of in normal C#. They don't map well to a regular expression. Just code them up using LINQ which works on strings like it would on an IEnumerable<char>.
Also, understanding a query of a string is much easier than understanding a Regex with the requirements that you have.
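For example, a minimal LINQ sketch of the stated rules could look like the following. I'm assuming "no special characters" means only ASCII letters, digits and underscore are allowed (roughly what \w in the original pattern suggests); adjust IsAllowed if your rule is different. The helper names are made up.

```csharp
using System;
using System.Linq;

public static class UsernameRules
{
    // Assumption: only ASCII letters count as "letters".
    private static bool IsAsciiLetter(char c) =>
        (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');

    // Assumption: letters, digits and underscore are the only allowed characters.
    private static bool IsAllowed(char c) =>
        IsAsciiLetter(c) || (c >= '0' && c <= '9') || c == '_';

    public static bool IsValidUsername(string s) =>
        s != null
        && s.Length >= 5 && s.Length <= 17      // between 5 and 17 characters long
        && s.Any(c => IsAsciiLetter(c))         // contains at least one letter
        && s.All(c => IsAllowed(c));            // no spaces, no special characters

    public static void Main()
    {
        Console.WriteLine(IsValidUsername("baconman"));  // True
        Console.WriteLine(IsValidUsername("12345"));     // False (no letter)
        Console.WriteLine(IsValidUsername("bacon man")); // False (space)
        Console.WriteLine(IsValidUsername("abc"));       // False (too short)
    }
}
```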
It is possible to do everything as part of a Regex, however it is not pretty :-)
^(\w(?=\w*[a-zA-Z])|[a-zA-Z]|\w(?<=[a-zA-Z]\w*)){5,17}$
It performs three checks, each of which matches exactly one character (so the length check can be done at the end):
Either the character is any word character \w which is before [a-zA-Z]
Or it is [a-zA-Z]
Or it is any word character \w which is after [a-zA-Z]
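If you want to try the single-regex version, here is a short sketch that runs it against a few inputs (the class and method names are hypothetical). Keep in mind that \w in .NET also matches underscore and non-ASCII letters and digits by default.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UsernameRegex
{
    // Each repetition consumes one character and checks that a letter exists
    // somewhere after it, at it, or before it; {5,17} then enforces the length.
    private static readonly Regex Pattern =
        new Regex(@"^(\w(?=\w*[a-zA-Z])|[a-zA-Z]|\w(?<=[a-zA-Z]\w*)){5,17}$");

    public static bool IsValid(string s) => Pattern.IsMatch(s);

    public static void Main()
    {
        Console.WriteLine(IsValid("baconman"));  // True
        Console.WriteLine(IsValid("1234a"));     // True  (one letter is enough)
        Console.WriteLine(IsValid("12345"));     // False (no letter)
        Console.WriteLine(IsValid("bacon man")); // False (space)
        Console.WriteLine(IsValid("abc"));       // False (too short)
    }
}
```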
I'm trying to write a regular expression that finds C#-style verbatim (unescaped) strings, such as
string x = @"hello
world";
The problem I'm having is how to write a rule that handles double quotes within the string correctly, like in this example
string x = @"before quote ""junk"" after quote";
This should be an easy one, right?
Try this one:
@".*?(""|[^"])"([^"]|$)
The first parentheses mean 'if there is a " before the closing quote, there had better be two of them'; the second parentheses mean 'after the closing quote, there should be either a non-quote character or the end of the line'.
How 'bout the regex @\"([^\"]|\"\")*\"(?=[^\"])
Due to greedy matching, the final lookahead clause is likely not to be needed in your regex engine, although it is more specific.
If I remember correctly, you have to use \"" - the double-double quotes to escape it for C# and the backslash to escape it for regex.
Try this:
@"[^"]*?(""[^"]*?)*";
It looks for the starting characters @", for the ending characters "; (you can leave the semicolon out if you need to) and in between it can have any characters except quotes, or if there are quotes they have to be doubled.
@"(?:""|[^"])*"(?!")
is the right regex for this job. It matches the @, a quote, then either two quotes in a row or any non-quote character, repeated up to the next quote that isn't doubled.
^@"(""|[^"])*"$ is the regex you want, looking for first an at-sign and a double-quote, then a sequence of any characters (except double-quotes) or double double-quotes, and finally a double-quote.
As a string literal in C#, you'd have to write it string regex = "^@\"(\"\"|[^\"])*\"$"; or string regex = @"^@""(""""|[^""])*""$";. Choose your poison.
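Here is a small sketch that tries the @"(?:""|[^"])*"(?!") pattern from one of the answers above on the two examples from the question. I wrote the pattern as a regular C# literal with backslash-escaped quotes; the class and method names are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimStringFinder
{
    // The regex  @"(?:""|[^"])*"(?!")  written as a regular C# string literal,
    // so every double quote in the pattern is escaped with a backslash.
    private static readonly Regex VerbatimLiteral =
        new Regex("@\"(?:\"\"|[^\"])*\"(?!\")");

    // Returns the first verbatim string literal found, or an empty string if none.
    public static string FindFirst(string source)
    {
        Match m = VerbatimLiteral.Match(source);
        return m.Success ? m.Value : string.Empty;
    }

    public static void Main()
    {
        string code = "string x = @\"before quote \"\"junk\"\" after quote\";";
        Console.WriteLine(FindFirst(code)); // @"before quote ""junk"" after quote"

        // [^"] also matches newlines, so multi-line literals are found too.
        string multi = "string x = @\"hello\nworld\";";
        Console.WriteLine(FindFirst(multi) == "@\"hello\nworld\""); // True
    }
}
```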
I have a regex I need to match against a path like so: "C:\Documents and Settings\User\My Documents\ScanSnap\382893.pd~". I need a regex that matches all paths except those ending in '~' or '.dat'. The problem I am having is that I don't understand how to match and negate the exact string '.dat' and only at the end of the path. i.e. I don't want to match {d,a,t} elsewhere in the path.
I have built the regex, but need to not match .dat
[\w\s:\.\\]*[^~]$[^\.dat]
[\w\s:\.\\]* This matches all word characters, whitespace, the colon, periods, and backslashes.
[^~]$[^\.dat]$ This causes matches ending in '~' to fail. It seems that I should be able to follow up with a negated match for '.dat', but the match fails in my regex tester.
I think my answer lies in grouping judging from what I've read, would someone point me in the right direction? I should add, I am using a file watching program that allows regex matching, I have only one line to specify the regex.
This entry seems similar: Regex to match multiple strings
You want to use a negative look-ahead:
^((?!\.dat$)[\w\s:\.\\])*$
By the way, your character group ([\w\s:\.\\]) doesn't allow a tilde (~) in it. Did you intend to allow a tilde in the filename if it wasn't at the end? If so:
^((?!~$|\.dat$)[\w\s:\.\\~])*$
The following regex:
^.*(?<!\.dat|~)$
matches any string that does NOT end with a '~' or with '.dat'.
^ # the start of the string
.* # gobble up the entire string (without line terminators!)
(?<!\.dat|~) # looking back, there should not be '.dat' or '~'
$ # the end of the string
In plain English: match a string only when looking behind from the end of the string, there is no sub-string '.dat' or '~'.
Edit: the reason why your attempt failed is because a negated character class, [^...] will just negate a single character. A character class always matches a single character. So when you do [^.dat], you're not negating the string ".dat" but you're matching a single character other than '.', 'd', 'a' or 't'.
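A quick sketch of the lookbehind version in C#, checked against the path from the question (the class and method names are hypothetical, and the second and third paths are made-up test inputs):

```csharp
using System;
using System.Text.RegularExpressions;

public static class PathFilter
{
    // Matches any single-line string that does NOT end with ".dat" or "~".
    // Note: the comparison is case-sensitive, so ".DAT" would still match.
    private static readonly Regex NotDatOrTilde =
        new Regex(@"^.*(?<!\.dat|~)$");

    public static bool IsWatched(string path) => NotDatOrTilde.IsMatch(path);

    public static void Main()
    {
        Console.WriteLine(IsWatched(@"C:\Documents and Settings\User\My Documents\ScanSnap\382893.pd~")); // False
        Console.WriteLine(IsWatched(@"C:\temp\cache.dat"));   // False
        Console.WriteLine(IsWatched(@"C:\data\report.pdf"));  // True ("dat" elsewhere in the path is fine)
    }
}
```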
^((?!\.dat$)[\w\s:\.\\])*$
This is just a comment on an earlier answer suggestion:
. within a character class, [], is a literal . and does not need escaping.
^((?!\.dat$)[\w\s:.\\])*$
I'm sorry to post this as a new solution, but I apparently don't have enough reputation to simply comment on an answer yet.
I believe you are looking for this:
[\w\s:\.\\]*([^~]|[^\.dat])$
which matches, as before, all word characters, whitespace, colons, periods (.) and backslashes. The final group is then meant to reject a tilde (~) or '.dat' at the end of the string. You may also want to add a caret (^) at the very beginning if you know that the match should start at the beginning of the string.
^[\w\s:\.\\]*([^~]|[^\.dat])$
I am checking a string with the following regexes:
[a-zA-Z0-9]+
[A-Za-z]+
For some reason, the characters:
.
-
_
are allowed to pass, why is that?
If you want to check that the complete string consists of only the wanted characters you need to anchor your regex like follows:
^[a-zA-Z0-9]+$
Otherwise every string will pass that contains a string of the allowed characters somewhere. The anchors essentially tell the regular expression engine to start looking for those characters at the start of the string and stop looking at the end of the string.
To clarify: If you just use [a-zA-Z0-9]+ as your regex, then the regex engine would rightfully reject the string -__-- as the regex doesn't match against that. There is no single character from the character class you defined.
However, with the string a-b it's different. The regular expression engine will match the first a here since that matches the expression you entered (at least one of the given characters) and won't care about the - or the b. It has done its job and successfully matched a substring according to your regular expression.
Similarly with _-abcdef-: the regex will match the substring abcdef just fine and ignore the other characters, because you didn't tell it to match only at the start or end of the string.
So when using ^[a-zA-Z0-9]+$ as your regex you are telling the regex engine definitely that you are looking for one or more letters or digits, starting at the very beginning of the string right until the end of the string. There is no room for other characters to squeeze in or hide so this will do what you apparently want. But without the anchors, the match can be anywhere in your search string. For validation purposes you always want to use those anchors.
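The strings discussed above can be checked directly. This is just a small sketch with made-up method names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Unanchored: true if ANY substring consists of letters or digits.
    public static bool ContainsAlnum(string s) => Regex.IsMatch(s, "[a-zA-Z0-9]+");

    // Anchored: true only if the WHOLE string consists of letters or digits.
    public static bool IsAllAlnum(string s) => Regex.IsMatch(s, "^[a-zA-Z0-9]+$");

    public static void Main()
    {
        Console.WriteLine(ContainsAlnum("-__--"));     // False (no letter or digit at all)
        Console.WriteLine(ContainsAlnum("a-b"));       // True  (matches the substring "a")
        Console.WriteLine(ContainsAlnum("_-abcdef-")); // True  (matches "abcdef")
        Console.WriteLine(IsAllAlnum("a-b"));          // False
        Console.WriteLine(IsAllAlnum("abc123"));       // True
    }
}
```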
In regular expressions the + tells the engine to match one or more occurrences of the preceding element.
So this expression [A-Za-z]+ passes if the string contains a sequence of 1 or more alphabetic characters. The only strings that wouldn't pass are strings that contain no alphabetic characters at all.
The ^ symbol anchors the character class to the beginning of the string and the $ symbol anchors to the end of the string.
So ^[A-Za-z0-9]+ means 'match a string that begins with a sequence of one or more alphanumeric characters'. It would still allow strings that include non-alphanumerics, so long as those characters were not at the beginning of the string.
While ^[A-Za-z0-9]+$ means 'match a string that consists entirely of one or more alphanumeric characters, from beginning to end'. This is the only way to completely exclude non-alphanumerics from a string.