Regex IsMatch Issue - c#

Can someone tell me what is wrong with the regexes in the following cases?
Regex: @"^(tcm:\d+-\d+)"
Input String: tcm:12-123a6
Problem: As far as I know, \d should only match digits. The input string has an 'a' in it, yet it still matches.
Regex: @"^[a-zA-Z0-9,&\s-]*$"
Input String: Transportation, Tourism & Travel, which I am reading from the query string, where it arrives as Transportation%252c%2bTravel%2b%2526%2bTourism
Problem: I think I have included all the characters of the input in the regex. Still it does not match.
Regex: @"^[a-zA-Z0-9=]*$"
Input String: U2VuaW9yIFBhcnRuZXIgJiBNYW5hZ2luZyB&&&EaXJlY3Rvcg==
Problem: Even with '&' in the input, why does it match?

@"^(tcm:\d+-\d+)" will match tcm:12-123 from your string. You need to put $ at the end of your regex to match the whole string:
@"^(tcm:\d+-\d+)$"
If ':' belongs to the string, then you need to add it to your character class (keeping the literal hyphen last):
@"^[a-zA-Z0-9,&\s:-]*$"
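A minimal sketch of the first point, using the pattern and input from the question (the variable names are only illustrative). It shows which part of the input the unanchored pattern actually matched, and that the anchored pattern rejects the input with the stray 'a'.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "tcm:12-123a6";

// Without "$" the pattern only has to match a prefix of the input.
string prefix = Regex.Match(input, @"^(tcm:\d+-\d+)").Value;
Console.WriteLine(prefix);

// With "$" the whole input must fit the pattern, so the 'a' causes a failure.
bool wholeInputOk = Regex.IsMatch(input, @"^(tcm:\d+-\d+)$");
bool cleanInputOk = Regex.IsMatch("tcm:12-123", @"^(tcm:\d+-\d+)$");
Console.WriteLine($"{wholeInputOk} {cleanInputOk}");
```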

Related

Match alphanumeric string of a specific length and string with hyphens using Regex

I am parsing a URL and need help matching an alphanumeric string of a specific length and a string with hyphens.
here is a sample string:
http://www.theplace.com/er5465dF3/2288494033cbbbe3de861f60bdf6934c/zRklm2/The-Post-Is-Here
This is what I have for the alphanumeric, but it is matching any alphanumeric with that length in the URL:
[0-9a-zA-Z]{9}
I tried this, but I do not want the slashes back, nor is it guaranteed that there will be a slash at the end. So I was a bit closer with this:
\/[0-9a-zA-Z]{9}\/
I've tried a few things with the hyphen regex, but they were variations of what I had above without the length attribute. I am stumped on that one. Again, it is not guaranteed that the hyphenated string will be at the end or between slashes.
This is what I am expecting as output for each type:
I should only get back one match for er5465dF3 with an alphanumeric regex with length of 9.
I should only get back The-Post-Is-Here with the hyphen regex.
Just add word boundaries to your regex to get an exact match.
\b[0-9a-zA-Z]{9}\b
Output:
er5465dF3
To get the hyphen-separated string:
\b[A-Za-z]+(?:-[A-Za-z]+)+\b
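Both patterns can be tried against the sample URL from the question with a short sketch like this one (the variable names are illustrative). Each pattern is expected to produce a single match here, because no other run in the URL is exactly nine alphanumerics long between word boundaries and only the last segment contains hyphens.

```csharp
using System;
using System.Text.RegularExpressions;

string url = "http://www.theplace.com/er5465dF3/2288494033cbbbe3de861f60bdf6934c/zRklm2/The-Post-Is-Here";

// Exactly nine alphanumerics with a word boundary on each side.
MatchCollection ids = Regex.Matches(url, @"\b[0-9a-zA-Z]{9}\b");

// Two or more runs of letters joined by hyphens.
MatchCollection slugs = Regex.Matches(url, @"\b[A-Za-z]+(?:-[A-Za-z]+)+\b");

foreach (Match m in ids) Console.WriteLine(m.Value);
foreach (Match m in slugs) Console.WriteLine(m.Value);
```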

Input string that takes in only alphanumeric, # sign and - sign and spaces

I tried using the following regex code, but the - key can't be accepted into my input textbox. Please assist!
My code is as follows:
if (Regex.IsMatch(textBox_address.Text, @"^[a-zA-Z0-9#- ]+$"))
Escape the - by replacing it with \-:
^[a-zA-Z0-9#\- ]+$
Inside a character class, x-y defines a range of characters, so "#- " is read as a range running from # to the space character. To tell the regex parser that the hyphen is meant literally, escape it with \.
The same idea applies if you want a regex that matches only digits and [.
Escape the bracket as well: ^[0-9\[]+$. Some regex engines cannot parse an unescaped [ inside a character class.
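As a rough illustration of the difference, the sketch below builds both versions of the pattern. It assumes the usual .NET behaviour that an invalid pattern makes the Regex constructor throw an ArgumentException (or a subclass of it); the sample address strings are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// "#- " is read as a range from '#' to ' ', which is in reverse order,
// so constructing the regex is expected to fail.
bool unescapedRejected = false;
try
{
    _ = new Regex(@"^[a-zA-Z0-9#- ]+$");
}
catch (ArgumentException)
{
    unescapedRejected = true;
}

// With the hyphen escaped it is an ordinary member of the set.
var address = new Regex(@"^[a-zA-Z0-9#\- ]+$");
bool acceptsValid = address.IsMatch("Blk 12 #03-45 Example Road");   // hypothetical input
bool acceptsInvalid = address.IsMatch("12 Example Road!");           // '!' is not in the set

Console.WriteLine($"{unescapedRejected} {acceptsValid} {acceptsInvalid}");
```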

Validate Unicode Length With Regex

How can I validate ۱۳۹۱/۰۹/۰۹ string with Regex
I want each part between the slashes to have an exact length, i.e. {4}/{2}/{2}.
The Unicode range is [\u06F0-\u06F9].
I have a problem with checking the length.
You can use the following regular expression:
"^[\u06F0-\u06F9]{4}/[\u06F0-\u06F9]{2}/[\u06F0-\u06F9]{2}$"
You're probably missing the ^ to make it start the match at the beginning of the string and the $ to make it end the match at the end of the string. Without these anchors, longer strings that merely contain your pattern would also match.
With this change a match is only successful if the string contains your pattern and does not have any extra characters to the left or to the right of the target pattern.
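Here is a small sketch of the anchored pattern in use. The digits are written as \u escapes so that the example does not depend on the file encoding; the variable names and the extra test strings are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Extended Arabic-Indic digits (U+06F0..U+06F9) forming the date from the question.
string date = "\u06F1\u06F3\u06F9\u06F1/\u06F0\u06F9/\u06F0\u06F9";
string tooLong = "\u06F1" + date;   // five digits in the first part

var pattern = new Regex(@"^[\u06F0-\u06F9]{4}/[\u06F0-\u06F9]{2}/[\u06F0-\u06F9]{2}$");

bool dateOk = pattern.IsMatch(date);
bool tooLongOk = pattern.IsMatch(tooLong);
bool embeddedOk = pattern.IsMatch("x " + date + " y");   // extra text around the date

Console.WriteLine($"{dateOk} {tooLongOk} {embeddedOk}");
```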
This regex should work for you:
"(^|[^\u06F0-\u06F9]{1})[\u06F0-\u06F9]{4}/[\u06F0-\u06F9]{2}/[\u06F0-\u06F9]{2}([^\u06F0-\u06F9]{1}|$)"
Match the date expression under both of the following conditions:
Condition1: It should be either at the beginning of the string or after a single character that's not in the character range [\u06F0-\u06F9]
Condition2: It should be either at the end of the string or before a single character that's not in the character range [\u06F0-\u06F9]
This will not match the expression in this string:
How can I validate ۱۱۳۹۱/۰۹/۰۹ string with Regex
-------------------^5Numbers, not matched
Or this string:
How can I validate ۱۱۳۹۱/۰۹/۰۹۹ string with Regex
------------------------------^Three numbers, not matched
but still will match the date expression in this string:
How can I validate۱۳۹۱/۰۹/۰۹string with Regex
------------------^---------^ No whitespaces above ^, the expression is matched though
If you want to avoid this, i.e., match the date expression only when it has whitespace (or line breaks) before and after it, use the following regex:
(^|[ \t\n]{1})[\u06F0-\u06F9]{4}/[\u06F0-\u06F9]{2}/[\u06F0-\u06F9]{2}([ \t\n]{1}|$)
Hope that's helpful.
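The two patterns from this answer can be compared with a sketch along these lines. The sample sentences are simplified stand-ins for the ones shown above, and the names loose and strict are just labels chosen for the example.

```csharp
using System;
using System.Text.RegularExpressions;

string date = "\u06F1\u06F3\u06F9\u06F1/\u06F0\u06F9/\u06F0\u06F9";

// Any non-digit (or the string boundary) may surround the date.
var loose = new Regex(@"(^|[^\u06F0-\u06F9]{1})[\u06F0-\u06F9]{4}/[\u06F0-\u06F9]{2}/[\u06F0-\u06F9]{2}([^\u06F0-\u06F9]{1}|$)");
// Only whitespace (or the string boundary) may surround the date.
var strict = new Regex(@"(^|[ \t\n]{1})[\u06F0-\u06F9]{4}/[\u06F0-\u06F9]{2}/[\u06F0-\u06F9]{2}([ \t\n]{1}|$)");

string spaced = "validate " + date + " string";
string glued = "validate" + date + "string";
string fiveDigits = "validate \u06F1" + date + " string";

bool looseSpaced = loose.IsMatch(spaced);
bool looseGlued = loose.IsMatch(glued);
bool looseFive = loose.IsMatch(fiveDigits);
bool strictSpaced = strict.IsMatch(spaced);
bool strictGlued = strict.IsMatch(glued);

Console.WriteLine($"{looseSpaced} {looseGlued} {looseFive} {strictSpaced} {strictGlued}");
```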

Simple C# regex

I have a regex I need to match against a path like so: "C:\Documents and Settings\User\My Documents\ScanSnap\382893.pd~". I need a regex that matches all paths except those ending in '~' or '.dat'. The problem I am having is that I don't understand how to match and negate the exact string '.dat' and only at the end of the path. i.e. I don't want to match {d,a,t} elsewhere in the path.
I have built the regex, but need to not match .dat
[\w\s:\.\\]*[^~]$[^\.dat]
[\w\s:\.\\]* This matches all word characters, whitespace, the colon, periods, and backslashes.
[^~]$[^\.dat]$ This causes matches ending in '~' to fail. It seems that I should be able to follow up with a negated match for '.dat', but the match fails in my regex tester.
I think my answer lies in grouping judging from what I've read, would someone point me in the right direction? I should add, I am using a file watching program that allows regex matching, I have only one line to specify the regex.
This entry seems similar: Regex to match multiple strings
You want to use a negative look-ahead:
^((?!\.dat$)[\w\s:\.\\])*$
By the way, your character group ([\w\s:\.\\]) doesn't allow a tilde (~) in it. Did you intend to allow a tilde in the filename if it wasn't at the end? If so:
^((?!~$|\.dat$)[\w\s:\.\\~])*$
The following regex:
^.*(?<!\.dat|~)$
matches any string that does NOT end with a '~' or with '.dat'.
^ # the start of the string
.* # gobble up the entire string (without line terminators!)
(?<!\.dat|~) # looking back, there should not be '.dat' or '~'
$ # the end of the string
In plain English: match a string only when looking behind from the end of the string, there is no sub-string '.dat' or '~'.
Edit: the reason why your attempt failed is because a negated character class, [^...] will just negate a single character. A character class always matches a single character. So when you do [^.dat], you're not negating the string ".dat" but you're matching a single character other than '.', 'd', 'a' or 't'.
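To make this concrete, here is a sketch that runs the look-behind pattern against a few paths and then shows what a negated character class does. Only the .pd~ path comes from the question; the other paths and all variable names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

var notTempOrDat = new Regex(@"^.*(?<!\.dat|~)$");

bool pdfOk = notTempOrDat.IsMatch(@"C:\Documents and Settings\User\My Documents\ScanSnap\382893.pdf");
bool tildeOk = notTempOrDat.IsMatch(@"C:\Documents and Settings\User\My Documents\ScanSnap\382893.pd~");
bool datOk = notTempOrDat.IsMatch(@"C:\Documents and Settings\User\My Documents\ScanSnap\382893.dat");
// "dat" elsewhere in the path is fine; only the ending is checked.
bool dataDirOk = notTempOrDat.IsMatch(@"C:\data\notes.txt");

// A negated class matches exactly one character that is not '.', 'd', 'a' or 't'.
bool singleChar = Regex.IsMatch("x", @"^[^.dat]$");
bool wholeSuffix = Regex.IsMatch(".dat", @"^[^.dat]$");

Console.WriteLine($"{pdfOk} {tildeOk} {datOk} {dataDirOk} {singleChar} {wholeSuffix}");
```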
^((?!\.dat$)[\w\s:\.\\])*$
This is just a comment on an earlier answer suggestion:
. within a character class, [], is a literal . and does not need escaping.
^((?!\.dat$)[\w\s:.\\])*$
I'm sorry to post this as a new solution, but I apparently don't have enough credibility to simply comment on an answer yet.
I believe you are looking for this:
[\w\s:\.\\]*([^~]|[^\.dat])$
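which consumes, like before, all word characters, whitespace, colons, periods (.), and backslashes, and then one final character. Note that [^~] and [^\.dat] are negated character classes that each match a single character, so on its own this does not reject paths ending in '~' or '.dat'; a look-around as in the earlier answers is still needed for that. You may also want to add a caret (^) at the very beginning if the match should start at the beginning of the string.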
^[\w\s:\.\\]*([^~]|[^\.dat])$

why do these regex tests let certain characters pass?

I am checking a string with the following regexes:
[a-zA-Z0-9]+
[A-Za-z]+
For some reason, the characters:
.
-
_
are allowed to pass, why is that?
If you want to check that the complete string consists of only the wanted characters, you need to anchor your regex as follows:
^[a-zA-Z0-9]+$
Otherwise every string will pass that contains a string of the allowed characters somewhere. The anchors essentially tell the regular expression engine to start looking for those characters at the start of the string and stop looking at the end of the string.
To clarify: If you just use [a-zA-Z0-9]+ as your regex, then the regex engine would rightfully reject the string -__-- as the regex doesn't match against that. There is no single character from the character class you defined.
However, with the string a-b it's different. The regular expression engine will match the first a here since that matches the expression you entered (at least one of the given characters) and won't care about the - or the b. It has done its job and successfully matched a substring according to your regular expression.
Similarly with _-abcdef-: the regex will match the substring abcdef just fine and ignore the other characters, because you didn't tell it to match only at the start or end of the string.
So when using ^[a-zA-Z0-9]+$ as your regex you are telling the regex engine definitely that you are looking for one or more letters or digits, starting at the very beginning of the string right until the end of the string. There is no room for other characters to squeeze in or hide so this will do what you apparently want. But without the anchors, the match can be anywhere in your search string. For validation purposes you always want to use those anchors.
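The strings discussed above can be checked with a short sketch like this. The helper names ContainsAlnum and IsAllAlnum are hypothetical, chosen only for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Unanchored: succeeds if any substring matches.
bool ContainsAlnum(string s) => Regex.IsMatch(s, "[a-zA-Z0-9]+");
// Anchored: succeeds only if the whole string matches.
bool IsAllAlnum(string s) => Regex.IsMatch(s, "^[a-zA-Z0-9]+$");

string[] samples = { "abc123", "a-b", "_-abcdef-", "-__--" };

foreach (string s in samples)
{
    Console.WriteLine($"{s,-10} unanchored={ContainsAlnum(s)} anchored={IsAllAlnum(s)}");
}
```

Only abc123 should pass the anchored check, while -__-- should fail both.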
In regular expressions, the + tells the engine to match one or more occurrences of the preceding element.
So this expression [A-Za-z]+ passes if the string contains a sequence of 1 or more alphabetic characters. The only strings that wouldn't pass are strings that contain no alphabetic characters at all.
The ^ symbol anchors the character class to the beginning of the string and the $ symbol anchors to the end of the string.
So ^[A-Za-z0-9]+ means 'match a string that begins with a sequence of one or more alphanumeric characters'. However, it would still allow strings that include non-alphanumerics, so long as those characters were not at the beginning of the string.
While ^[A-Za-z0-9]+$ means 'match a string that consists entirely of one or more alphanumeric characters'. This is the only way to completely exclude non-alphanumerics from a string.
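A brief sketch of the difference between anchoring only the start and anchoring both ends. The input strings and variable names are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Start anchor only: the match stops at the first non-alphanumeric character.
string matchedPart = Regex.Match("abc-def", "^[A-Za-z0-9]+").Value;
bool startOnlyOk = Regex.IsMatch("abc-def", "^[A-Za-z0-9]+");

// A non-alphanumeric character at the very beginning prevents any match.
bool leadingDashOk = Regex.IsMatch("-abc", "^[A-Za-z0-9]+");

// Both anchors: the hyphen in the middle now causes a failure.
bool bothAnchorsOk = Regex.IsMatch("abc-def", "^[A-Za-z0-9]+$");

Console.WriteLine($"{matchedPart} {startOnlyOk} {leadingDashOk} {bothAnchorsOk}");
```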
