Validate Unicode Length With Regex

How can I validate the string ۱۳۹۱/۰۹/۰۹ with a regex?
I want the lengths of the slash-separated parts to be exactly {4}/{2}/{2}.
The Unicode range is [\u06F0-\u06F9].
I have a problem with the length checking.

You can use the following regular expression:
"^[\u06F0-\u06F9]{4}/[\u06F0-\u06F9]{2}/[\u06F0-\u06F9]{2}$"
You're probably missing the ^, which makes the match start at the beginning of the string, and the $, which makes it end at the end of the string. Without these anchors, longer strings that merely contain your pattern would also count as a match.
With this change, a match succeeds only if the string consists of your pattern with no extra characters to its left or right.
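
As a minimal sketch of how the anchored pattern could be used from C#: the helper name IsPersianDate and the sample inputs are illustrative assumptions, not part of the original answer. The inputs are written with \u escapes so the digits are unambiguous.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true only when the whole input is dddd/dd/dd
// written with the digits U+06F0 to U+06F9.
static bool IsPersianDate(string input) =>
    Regex.IsMatch(input, @"^[\u06F0-\u06F9]{4}/[\u06F0-\u06F9]{2}/[\u06F0-\u06F9]{2}$");

// The sample date from the question, spelled out with escapes.
string date = "\u06F1\u06F3\u06F9\u06F1/\u06F0\u06F9/\u06F0\u06F9";

Console.WriteLine(IsPersianDate(date));            // True
Console.WriteLine(IsPersianDate("\u06F1" + date)); // False: five digits in the first group
Console.WriteLine(IsPersianDate("x" + date));      // False: extra leading character
Console.WriteLine(IsPersianDate("2012/11/29"));    // False: ASCII digits are outside the range
```

Note that in .NET, $ also matches just before a single trailing newline; if that matters for your input, \z is the stricter end anchor.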

This regex should work for you:
"(^|[^\u06F0-\u06F9]{1})[\u06F0-\u06F9]{4}/[\u06F0-\u06F9]{2}/[\u06F0-\u06F9]{2}([^\u06F0-\u06F9]{1}|$)"
It matches the date expression only when both of the following conditions hold:
Condition 1: It is either at the beginning of the string or preceded by a single character that's not in the character range [\u06F0-\u06F9]
Condition 2: It is either at the end of the string or followed by a single character that's not in the character range [\u06F0-\u06F9]
This will not match the expression in this string:
How can I validate ۱۱۳۹۱/۰۹/۰۹ string with Regex
-------------------^ 5 digits, not matched
Or this string:
How can I validate ۱۱۳۹۱/۰۹/۰۹۹ string with Regex
------------------------------^ Three digits, not matched
but still will match the date expression in this string:
How can I validate۱۳۹۱/۰۹/۰۹string with Regex
------------------^---------^ No whitespace at the ^ marks, yet the expression is still matched
If you want to avoid this, i.e. match the date expression only when it has whitespace (or line breaks) before and after it, use the following regex:
(^|[ \t\n]{1})[\u06F0-\u06F9]{4}/[\u06F0-\u06F9]{2}/[\u06F0-\u06F9]{2}([ \t\n]{1}|$)
Hope that's helpful.
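
The patterns above consume the neighbouring character as part of the match. As a hedged variation (not the answer's own pattern), lookarounds can express the same "no digit on either side" conditions without consuming anything, so the match value is exactly the date. The helper name FindDate and the sample texts are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Variation, not the answer's pattern: (?<!...) and (?!...) assert that no
// digit from U+06F0 to U+06F9 sits directly before or after the date.
var dateInText = new Regex(
    @"(?<![\u06F0-\u06F9])[\u06F0-\u06F9]{4}/[\u06F0-\u06F9]{2}/[\u06F0-\u06F9]{2}(?![\u06F0-\u06F9])");

// Hypothetical helper: the first date found in the text, or "" when there is none.
string FindDate(string text)
{
    Match m = dateInText.Match(text);
    return m.Success ? m.Value : "";
}

// The sample date from the question, spelled out with escapes.
string date = "\u06F1\u06F3\u06F9\u06F1/\u06F0\u06F9/\u06F0\u06F9";

Console.WriteLine(FindDate("validate" + date + "string") == date); // True: neighbours are not digits in the range
Console.WriteLine(FindDate("\u06F1" + date) == "");                // True: five leading digits, no match
Console.WriteLine(FindDate(date + "\u06F9") == "");                // True: three trailing digits, no match
```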

Related

Regex IsMatch Issue

Can someone tell me what is wrong with the regexes in the following cases?
Regex: @"^(tcm:\d+-\d+)"
Input String: tcm:12-123a6
Problem: As far as I know, \d should only match digits. The input
string has 'a' in it, yet the regex still matches.
Regex: @"^[a-zA-Z0-9,&\s-]*$"
Input String: Transportation, Tourism & Travel, which I am reading
from the query string, where it arrives as
Transportation%252c%2bTravel%2b%2526%2bTourism
Problem: I think I have included all the characters of the input in the
regex, yet it does not match.
Regex: @"^[a-zA-Z0-9=]*$"
Input String: U2VuaW9yIFBhcnRuZXIgJiBNYW5hZ2luZyB&&&EaXJlY3Rvcg==
Problem: Even with '&' in the input, why does it match?

@"^(tcm:\d+-\d+)" will match tcm:12-123 from your string. You need to put $ at the end of your regex to match the whole string:
@"^(tcm:\d+-\d+)$"
If ':' belongs to the string, then you need to add it to your list.
@"^[a-zA-Z0-9,&\s-:]*$"
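
A small sketch of the first case, using the input from the question, shows what the unanchored pattern actually matches and how the trailing $ changes the result. The variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string tcmInput = "tcm:12-123a6";

// No end anchor: the prefix "tcm:12-123" is enough for a successful match.
Console.WriteLine(Regex.IsMatch(tcmInput, @"^(tcm:\d+-\d+)"));    // True
Console.WriteLine(Regex.Match(tcmInput, @"^(tcm:\d+-\d+)").Value); // tcm:12-123

// Anchored at both ends: the trailing "a6" now makes the match fail.
Console.WriteLine(Regex.IsMatch(tcmInput, @"^(tcm:\d+-\d+)$"));   // False
```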

Remove non-alphanumeric characters from start and end of string only

I am trying to clean up some data using a helper exe (C#).
I iterate through each string and I want to remove invalid characters from the start and end of the string i.e. remove the dollar symbols from $$$helloworld$$$.
This works fine using this regular expression: \W.
However, strings which contain invalid character in the middle should be left alone i.e. hello$$$$world is fine and my regular expression should not match this particular string.
So in essence, I am trying to figure out the syntax to match invalid characters at the start and the end of a string, but leave alone the strings which contain invalid characters in their body.
Thanks for your help!

This does it!
(^[\W_]*)|([\W_]*$)
This regex says: match zero or more non-word characters or underscores at the start (^) or (|) at the end ($).

The following should work:
^\W+|\W+$
^ and $ are anchors to the beginning and end of the string respectively. The | in the middle is an OR, so this regex means "either match one or more non-word characters at the start of the string, or match one or more non-word characters at the end of the string".

Use ^ to match the start of the string, and $ to match the end of the string. See the C# Regex Cheat Sheet.

Try this one,
(^[^\w]*)|([^\w]*$)

Use ^ to match 'beginning of line' and $ to match 'end of line', i.e. your code should match and remove ^\W* and \W*$
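
The answers above can be sketched with Regex.Replace. The helper name TrimNonWord is an assumption for illustration; it uses the ^\W+|\W+$ pattern, which leaves underscores alone because _ counts as a word character (the [\W_] variant would strip them too).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: strip non-word characters from both ends only.
static string TrimNonWord(string s) => Regex.Replace(s, @"^\W+|\W+$", "");

Console.WriteLine(TrimNonWord("$$$helloworld$$$")); // helloworld
Console.WriteLine(TrimNonWord("hello$$$$world"));   // hello$$$$world (middle is untouched)
Console.WriteLine(TrimNonWord("__hello__"));        // __hello__ (underscore is a word character)
```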

Nothing else but Regex for matching the string

I want to use a regex to check whether a string starts with a number, optionally followed by a character. What regex matches a string that must start with a number and may or may not be followed by a character? For example, the strings "30a" and "30" should match, but "a", or any other character or series of characters, should not.

It sounds like the string should start with any number of digits, optionally followed by other characters. To match any other characters after a series of digits at the beginning, I would use:
\d+.*
To match only alpha numeric characters after the mandatory numeric beginning I would use:
\d+\w*
Note: as pointed out by Dav, if you add a ^ to the start of the expression and a $ to the end of the expression like this ^\d+\w*$ you will ensure the whole string matches. However if you leave those off, you will be able to search the input string for what you need. It just depends on what your needs are.

^\d.*
The ^ matches the start of the string, \d matches a single digit, and then the .* matches any number of additional characters.
Thus, the net result is that it will only match if the string begins with a digit.
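
As a hedged sketch of the anchored variant ^\d+\w*$ mentioned above, checked against the examples from the question (the helper name StartsWithDigits is illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: one or more digits, then optional word characters,
// and nothing else in the string.
static bool StartsWithDigits(string s) => Regex.IsMatch(s, @"^\d+\w*$");

Console.WriteLine(StartsWithDigits("30a")); // True
Console.WriteLine(StartsWithDigits("30"));  // True
Console.WriteLine(StartsWithDigits("a"));   // False
Console.WriteLine(StartsWithDigits("a30")); // False: does not start with a digit
```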

How to check if a Regex expression matches an entire string in c#?

I am new to regex expressions so sorry if this is a really noob question.
I have a regex expression... What I want to do is check if a string matches the regex expression in its entirety without the regex expression matching any subsets of the string.
For example...
If my regex expression is looking for a match of \sA\s*, it should return a match if the string it is comparing it to is " A " but if it compares to the string " A B" it should not return a match.
Any help would be appreciated? I code in C#.

You would normally use the start and end anchors, ^ and $ respectively:
^\s*A*\s*$
Keep in mind that, if your regex engine is running in multi-line mode, this also matches inputs that span multiple lines as long as one of those lines matches the regex (since ^ then anchors after any newline or at the string start, and $ before any newline or at the string end). If you're only running the regex against a single line, that won't be a problem.
If you want to ensure that the input is a single line consisting only of your pattern, you can use \A and \Z if supported; these anchor to the start and end of the string regardless of line breaks. In .NET, \Z still allows one trailing newline, while \z matches only at the very end.

If you cannot or don't want to change the regular expression, then you can also use:
// "input" is the string being tested; "regex" is your existing Regex instance.
var match = regex.Match(input);
if (match.Success && match.Length == input.Length)
{
    // The entire string was matched, not just a substring.
}
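
One caveat with the length comparison: Match returns the first match the engine finds, which can be shorter than the input even when a full-length match exists (for example a|ab against "ab" yields "a"). A possible alternative, sketched here under the assumption that the pattern can be wrapped as text, is to surround it with \A(?:...)\z. The helper name IsFullMatch is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: wrap a pattern so it must span the entire input.
// \A and \z are absolute start/end anchors; (?:...) keeps alternations grouped.
static bool IsFullMatch(string input, string pattern) =>
    Regex.IsMatch(input, @"\A(?:" + pattern + @")\z");

Console.WriteLine(IsFullMatch(" A ", @"\sA\s*"));  // True
Console.WriteLine(IsFullMatch(" A B", @"\sA\s*")); // False
Console.WriteLine(IsFullMatch("ab", "a|ab"));      // True: the engine backtracks to the longer branch

// For comparison, the length check reports no full match for the same input.
Match first = Regex.Match("ab", "a|ab");
Console.WriteLine(first.Length == "ab".Length);    // False: the first match is just "a"
```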

why do these regex tests let certain characters pass?

I am checking a string with the following regexes:
[a-zA-Z0-9]+
[A-Za-z]+
For some reason, the characters:
.
-
_
are allowed to pass, why is that?

If you want to check that the complete string consists of only the wanted characters you need to anchor your regex like follows:
^[a-zA-Z0-9]+$
Otherwise every string will pass that contains a string of the allowed characters somewhere. The anchors essentially tell the regular expression engine to start looking for those characters at the start of the string and stop looking at the end of the string.
To clarify: If you just use [a-zA-Z0-9]+ as your regex, then the regex engine would rightfully reject the string -__-- as the regex doesn't match against that. There is no single character from the character class you defined.
However, with the string a-b it's different. The regular expression engine will match the first a here since that matches the expression you entered (at least one of the given characters) and won't care about the - or the b. It has done its job and successfully matched a substring according to your regular expression.
Similarly with _-abcdef-: the regex will match the substring abcdef just fine, because you didn't tell it to match only from the start to the end of the string, so it ignores the other characters.
So when using ^[a-zA-Z0-9]+$ as your regex you are telling the regex engine definitely that you are looking for one or more letters or digits, starting at the very beginning of the string right until the end of the string. There is no room for other characters to squeeze in or hide so this will do what you apparently want. But without the anchors, the match can be anywhere in your search string. For validation purposes you always want to use those anchors.
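
The difference can be sketched in C# by running both forms of the pattern over the example strings discussed above. The helper names ContainsAlnum and IsAllAlnum are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Unanchored: true if any run of letters or digits occurs somewhere in the string.
static bool ContainsAlnum(string s) => Regex.IsMatch(s, "[a-zA-Z0-9]+");

// Anchored: true only if the whole string is letters and digits.
static bool IsAllAlnum(string s) => Regex.IsMatch(s, "^[a-zA-Z0-9]+$");

string[] samples = { "-__--", "a-b", "_-abcdef-", "abc123" };
foreach (string s in samples)
{
    Console.WriteLine($"{s}: unanchored={ContainsAlnum(s)}, anchored={IsAllAlnum(s)}");
}
// -__--: unanchored=False, anchored=False
// a-b: unanchored=True, anchored=False
// _-abcdef-: unanchored=True, anchored=False
// abc123: unanchored=True, anchored=True
```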

In regular expressions, the + tells the engine to match one or more of the preceding element.
So the expression [A-Za-z]+ passes if the string contains a sequence of one or more alphabetic characters. The only strings that wouldn't pass are strings that contain no alphabetic characters at all.
The ^ symbol anchors the match to the beginning of the string and the $ symbol anchors it to the end of the string.
So ^[A-Za-z0-9]+ means 'match a string that begins with a sequence of one or more alphanumeric characters', but it still allows strings that include non-alphanumerics, as long as those characters are not at the beginning of the string.
By contrast, ^[A-Za-z0-9]+$ means 'match a string that consists, from beginning to end, of one or more alphanumeric characters'. This is the only way to completely exclude non-alphanumerics from a string.
