Regular expression for non-standard ascii characters

Regular expression for non-standard ascii characters - c#

I need a regular expression that checks a string for any non-standard ASCII characters.

You can specify a character's Unicode code point in a C# string: "[\u0080-\uFFFF]" matches any character whose code point is 128 or above, i.e. anything outside 7-bit ASCII.

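Does this simple one suit your needs?
[^\x20-\x7E]
It matches any character outside the printable ASCII range (space through tilde), so it also flags control characters such as tab and newline.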

Put what you consider the standard characters in a set, then put the negation sign ^ at the start of the set. That will match the non-standard characters. For example, if I consider the standard to be the letters A-Z and a-z, my non-standard match pattern would be
[^A-Za-z]
If that matches, you have a non-standard character.
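A minimal C# sketch comparing the three approaches above. The class and method names (NonAsciiCheck, HasNonAscii and so on) are hypothetical; only System.Text.RegularExpressions.Regex is real .NET API. The patterns differ in what they flag: the first reports only characters above U+007F, the second also reports ASCII control characters, and the third reports anything that is not an ASCII letter.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class wrapping the three patterns from the answers above.
public static class NonAsciiCheck
{
    // Any UTF-16 code unit from U+0080 to U+FFFF, i.e. anything outside 7-bit ASCII.
    // Surrogate halves (U+D800-U+DFFF) fall in this range, so characters such as emoji are caught too.
    private static readonly Regex NonAscii = new Regex(@"[\u0080-\uFFFF]");

    // Anything that is not printable ASCII (space U+0020 through tilde U+007E).
    // Unlike NonAscii, this also flags ASCII control characters such as tab and newline.
    private static readonly Regex NotPrintableAscii = new Regex(@"[^\x20-\x7E]");

    // Anything outside a custom "standard" set, here only the ASCII letters.
    private static readonly Regex NotAsciiLetter = new Regex(@"[^A-Za-z]");

    public static bool HasNonAscii(string s) => NonAscii.IsMatch(s);

    public static bool HasNonPrintableAscii(string s) => NotPrintableAscii.IsMatch(s);

    public static bool HasNonLetter(string s) => NotAsciiLetter.IsMatch(s);
}
```

For example, NonAsciiCheck.HasNonAscii("tab\there") returns false, while HasNonPrintableAscii returns true for the same string.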

Related

Extract string from a pattern preceded by any length

I'm looking for a regular expression to extract a string from a file name
e.g. if the file name format is "anythingatallanylength_123_TESTNAME.docx", I'm interested in extracting "TESTNAME" ... probably a fixed length of 8. (BTW, 123 can be any three-digit number.)
I think I can use regex match ...
".*_[0-9][0-9][0-9]_[A-Z][A-Z][A-Z][A-Z][A-Z][A-Z][A-Z][A-Z].docx$"
However this matches the whole thing. How can I just get "TESTNAME"?
Thanks

Use parentheses (a capturing group) to capture a specific piece of the whole regex.
You can also use curly braces to specify how many times the preceding element must repeat, and \d instead of [0-9] (in .NET, \d also matches non-ASCII decimal digits).
In C#:
var myRegex = new Regex(@".*_\d{3}_([A-Za-z]{8})\.docx$");
Now "TESTNAME", or whatever your 8-letter piece is, will be in the Groups collection of the resulting Match (match.Groups[1].Value).
Also note that the look-ahead and look-behind used in some other solutions may add some performance overhead.
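A minimal sketch of the capturing-group approach, assuming the intended pattern is .*_\d{3}_([A-Za-z]{8})\.docx$ (a leading .* followed by the underscore). FileNameParts and ExtractName are hypothetical names.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the pattern is the capturing-group answer above.
public static class FileNameParts
{
    // Greedy ".*_" skips any prefix (including earlier underscores);
    // group 1 captures the eight letters before ".docx".
    private static readonly Regex Pattern = new Regex(@".*_\d{3}_([A-Za-z]{8})\.docx$");

    // Returns the captured name, or null when the file name does not fit the pattern.
    public static string ExtractName(string fileName)
    {
        Match m = Pattern.Match(fileName);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Groups[0] is always the whole match; Groups[1] is the first parenthesized group. Because .NET's \d also accepts non-ASCII decimal digits, [0-9]{3} is the stricter choice if only ASCII digits should count.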

You can use a look-behind and a look-ahead to check parts without matching them:
(?<=_[0-9]{3}_)[A-Z]{8}(?=\.docx$)
Note that this is case-sensitive; you may want to use other character classes and/or quantifiers to fit your exact pattern.
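The same extraction with the look-behind/look-ahead pattern, as a sketch. Because nothing outside the eight letters is consumed, Match.Value is the name itself and no group is needed. LookaroundExtract and its ignoreCase parameter are hypothetical; RegexOptions.IgnoreCase is the real .NET option for case-insensitive matching.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the pattern is the look-behind/look-ahead answer above.
public static class LookaroundExtract
{
    // The look-behind (?<=...) and look-ahead (?=...) are checked but not consumed,
    // so Match.Value contains only the eight letters.
    private static readonly Regex CaseSensitive =
        new Regex(@"(?<=_[0-9]{3}_)[A-Z]{8}(?=\.docx$)");

    // Same pattern with RegexOptions.IgnoreCase: lower-case names (and ".DOCX") also match.
    private static readonly Regex CaseInsensitive =
        new Regex(@"(?<=_[0-9]{3}_)[A-Z]{8}(?=\.docx$)", RegexOptions.IgnoreCase);

    // Returns the matched name, or null if there is no match.
    public static string ExtractName(string fileName, bool ignoreCase = false)
    {
        Match m = (ignoreCase ? CaseInsensitive : CaseSensitive).Match(fileName);
        return m.Success ? m.Value : null;
    }
}
```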

In your file name format "anythingatallanylength_123_TESTNAME.docx", the part you are trying to match is the string between the last underscore _ and .docx. Keeping in mind that any earlier _ must not be matched, I came up with the following solution.
Regex: (?<=_)[A-Za-z]*(?=\.docx$)
Flags used:
g global search
m multi-line search.
Explanation:
(?<=_) checks if there is an underscore before the file name.
(?=\.docx$) checks for extension at the end.
[A-Za-z]* matches the required name (note that * also allows an empty match).
Regex101 Demo
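In .NET the g and m flags from online testers translate differently: there is no g flag (Regex.Matches returns all matches), and m corresponds to RegexOptions.Multiline. A sketch applying the pattern above to a block of text with one file name per line; MultiLineExtract is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the pattern is the (?<=_)[A-Za-z]*(?=\.docx$) answer above.
public static class MultiLineExtract
{
    // .NET has no "g" flag; Regex.Matches simply returns every match.
    // The "m" flag maps to RegexOptions.Multiline, so $ also matches just before each "\n".
    private static readonly Regex Pattern =
        new Regex(@"(?<=_)[A-Za-z]*(?=\.docx$)", RegexOptions.Multiline);

    // Collects the name from every line that ends in ".docx".
    public static List<string> ExtractAll(string text)
    {
        var names = new List<string>();
        foreach (Match m in Pattern.Matches(text))
        {
            names.Add(m.Value);
        }
        return names;
    }
}
```

With Windows line endings ("\r\n"), $ in multiline mode matches before the \n, not the \r, so the look-ahead would need to be (?=\.docx\r?$). Also, because of the *, a name of zero letters (e.g. "x_.docx") produces an empty match.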

Thanks to @Lucero, @noob and @JamesFaix, I came up with ...
@"(?<=.*_[0-9]{3}_)[A-Z]{8}(?=\.docx$)"
So: a look-behind (in brackets, starting with ?<=) for anything (i.e. zero or more of any character, denoted by ".") followed by an underscore, followed by three digits, followed by an underscore. That's the end of the look-behind. Then comes what I need to match (eight letters). Finally, the look-ahead (in brackets, starting with ?=), which is the .docx extension at the end. This relies on .NET allowing variable-length look-behind such as .*; many other regex engines do not.
Nice work, fellas. Thunderbirds are go.

Regex to match all alphanumeric and math operators

I have the simple regex @"[a-zA-Z]" to match all letters a-z in a string, but I also need the math operators (*, /, +, -). I was reading over the documentation on MSDN, but I got lost relatively fast because the math operators are also used as other tokens in a regex.
This solution works:
@"[A-Za-z\*\+\-\/]"
Thanks for the help and resources everyone.

The correct answer is
@"[A-Za-z*+/-]"
Or @"[A-Za-z-*+/]", or @"[-A-Za-z*+/]", or @"[A-Za-z*\-+/]".
Or shorten it with a case-insensitive modifier: @"(?i)[A-Z*+/-]" (or use the corresponding RegexOptions.IgnoreCase with @"[A-Z*+/-]", since it seems you are using C#).
Inside a character class, an unescaped hyphen is treated as a literal only at the start or end of the class, or right after a range or shorthand class; otherwise it must be escaped. Also, ] must be escaped unless it is the first character in the class. Most other characters do not have to be escaped inside a character class.
To test, use an online regex tester that uses the .NET engine, such as the Regex demo at RegexStorm.
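A sketch checking the character class from the answer above against a sample expression, plus the pitfall the hyphen rule guards against. OperatorClass and its members are hypothetical names.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper comparing the character classes discussed above.
public static class OperatorClass
{
    // Hyphen placed last, so it is a literal "-" rather than a range operator.
    private static readonly Regex Allowed = new Regex(@"[A-Za-z*+/-]");

    // Same set using the inline case-insensitive modifier (?i).
    private static readonly Regex AllowedInline = new Regex(@"(?i)[A-Z*+/-]");

    // Pitfall: here the hyphen sits between * and /, so it forms the range U+002A-U+002F,
    // which also includes ',' and '.'.
    private static readonly Regex AccidentalRange = new Regex(@"[*-/]");

    public static int CountAllowed(string s) => Allowed.Matches(s).Count;

    public static int CountAllowedInline(string s) => AllowedInline.Matches(s).Count;

    public static bool HitsAccidentalRange(string s) => AccidentalRange.IsMatch(s);
}
```

CountAllowed("a+b*c-d/e=2") is 9: the five letters and four operators, but not = or 2.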

Regular expression to match all numeric characters except 5

When I want to match all numeric characters except 5 I use:
[^\D|5]
or
[^\D5]
or
[0-46-9]
or
[012346789]
When I want to match non-numeric characters I can use:
[^\d]
or
[\D]
All of them work well. But when I use [^^\d5] or [^^\d|5] to match all numeric characters except 5, it doesn't work.
I want to use this in many cases. For example, I want to match all \p{P} but not \:. Is there any way to use ^\d to match all numeric characters except 5?

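You could match all digits except 5 using this:
[123467890]
There is no reason to use a shorthand version of everything; it makes no practical difference to the regex engine.
Note that inside a character class, | is a literal pipe rather than alternation, and ^ negates the class only in the first position (a second ^ is just a literal caret). So [^\D|5] also excludes |, which is harmless because | is not a digit, and [^^\d5] means "anything except ^, a digit or 5", which is why it does not work.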
A shorter version would be:
[0-46-9]
Hyphen/Dash behavior inside character classes []
Hyphens specify a range inside character classes. You can look up an ASCII table to see what range you are specifying; for example, [ -Z] actually matches ASCII 32 (space) to 90 (Z).
Edit:
OK, now I have a better understanding of your requirements.
You need to be specific about what you need to match up front.
You can do this using negative/positive lookaheads:
(?!.*?5.*?)(?!.*?\p{Alpha}.*?)(\p{P}*?$|\p{L}*?$)
This will match under the following conditions:
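There is no number 5
There is no character from the POSIX class Alpha (\p{Alpha} is Java syntax; .NET does not support it)
It then matches a (possibly empty) run of Unicode punctuation (\p{P}) or Unicode letters (\p{L}) that extends to the end of the input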

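In Java, \d is just [0-9]; see the Java regex reference for confirmation. In .NET, however, \d matches any Unicode decimal digit (\p{Nd}) unless RegexOptions.ECMAScript is set.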
Just use [0-46-9]. You can try it in a regex fiddle.
UPDATE:
Based on the requirement in the OP's comment to apply De Morgan's laws and use a logical complement, here is my interpretation of the logical complement of [^\D5].
[^\D5] essentially means "NOT (a non-digit character OR 5)". Compare this to "NOT (A OR B)" in the referenced Wikipedia article on De Morgan's laws.
What we need then is "(NOT a non-digit character) AND (NOT 5)". Compare this to "(NOT A) AND (NOT B)" in the referenced Wikipedia article.
Here then is my interpretation of logically complementing [^\D5] using a sequence of lookahead expressions for logical ANDing:
(?!\D)(?!5).
It does not use double negation via ^^, which does not work, as you have found. Instead, the logical complement above expresses what we want in regex terms, "(NOT a non-digit character) AND (NOT 5)", applied to a single character (i.e. .).
You can see in a follow-on regex fiddle that the above logical complement yields the same results as [^\D5] like it should.
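A sketch that runs each single-character pattern from this question against every ASCII character and reports which ones it accepts, so the claimed equivalence can be checked directly. It also includes [0-9-[5]], the .NET-only character class subtraction syntax (used in another answer on this page), and the analogous [\p{P}-[:]] for "all \p{P} but not :". DigitsExcept5 and MatchedAscii are hypothetical names.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper that checks which ASCII characters each single-character pattern accepts.
public static class DigitsExcept5
{
    public static readonly string[] Patterns =
    {
        @"[^\D5]",       // NOT (non-digit OR 5)
        @"[0-46-9]",     // explicit ranges
        @"(?!\D)(?!5).", // (NOT non-digit) AND (NOT 5), via lookaheads
        @"[0-9-[5]]",    // .NET-only character class subtraction
    };

    // .NET subtraction also covers the "all \p{P} but not a colon" case from the question.
    public static readonly Regex PunctuationExceptColon = new Regex(@"[\p{P}-[:]]");

    // Returns, in code-point order, every ASCII character (U+0000-U+007F)
    // that the pattern matches as a complete one-character string.
    public static string MatchedAscii(string pattern)
    {
        var whole = new Regex(@"^(?:" + pattern + @")\z");
        var ascii = Enumerable.Range(0, 128).Select(i => (char)i);
        return new string(ascii.Where(c => whole.IsMatch(c.ToString())).ToArray());
    }
}
```

Each entry in Patterns should yield "012346789". The check is limited to ASCII on purpose: in .NET, \d also matches other Unicode decimal digits, so [^\D5] and [0-46-9] are not equivalent over the full character set.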

Regexp Remove any non alphanumeric, but leave some special characters in one expression

I have this code that replaces all non alphanumeric characters with "-" char.
return Regex.Replace(strIn, @"[\W|_]+", "-", RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);
but I need to change it to let some special characters (one or more) pass through, for example #, * and %.
How should I change this regular expression?

Use
[^\p{L}\p{N}#*%]+
This matches one or more characters that are neither letters nor digits nor any of #, * or %.

Another option is to use character class subtraction, for example to remove # from the character class:
[\W_-[#]]+
Just add other accepted special chars after the #. Live example here: http://rextester.com/rundotnet?code=YFQ40277

How about this one:
[^a-zA-Z0-9#*%]+
If you need to handle Unicode letters and digits as well, you can do (as in Tim's answer):
[^\p{L}\p{N}#*%]+

Use this.
([^\w#*%]|_)
Add any other special characters after the %.
It is basically saying, match any character that is not (^) a word character(\w), #, * or % OR match _.

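If you only need to check that a string contains nothing else, a negative lookahead works:
@"(?!.*[^\w#*%])"
Note that this is zero-width: it matches a position, not the unwanted characters, so it suits validation (anchored as @"^(?!.*[^\w#*%])") rather than Regex.Replace.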

You can use set subtraction for that:
@"[\W_-[#*%]]+"
This matches the set of all non-word characters and the underscore, minus the set of #, * and %.
Note that you don't have to use | for "or" in a character class, since that's implied. In fact, the | in your regex just matches |.
Note also that in .NET, \w matches a few other "connector punctuation" characters besides the underscore. If you want to match the other characters too, you can use
@"[\W\p{Pc}-[#*%]]+"
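A sketch of the replacement from the question, rewritten with the subtraction pattern [\W_-[#*%]]+ and, for comparison, the Unicode-aware negated class [^\p{L}\p{N}#*%]+ from an earlier answer. KeepSpecialChars and its method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper comparing two answers to the "keep #, * and %" question.
public static class KeepSpecialChars
{
    // Non-word characters plus underscore, minus #, * and % (.NET character class subtraction).
    private static readonly Regex BySubtraction = new Regex(@"[\W_-[#*%]]+");

    // Anything that is not a Unicode letter, a Unicode number, #, * or %.
    private static readonly Regex ByNegatedClass = new Regex(@"[^\p{L}\p{N}#*%]+");

    // Each run of unwanted characters becomes a single "-".
    public static string ReplaceWithSubtraction(string input) => BySubtraction.Replace(input, "-");

    public static string ReplaceWithNegatedClass(string input) => ByNegatedClass.Replace(input, "-");
}
```

Both produce "50%-off-#1-deal-now*" for "50% off! #1 deal_now*". They can differ on some characters, for example connector punctuation other than _, which \w includes but \p{L}\p{N} does not.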

regular expression for special chars and numerics

I need a regular expression that accepts alphanumerics and also special characters,
like: abc & def12
Thanks in advance.
Nagesh

This might be the syntax you are looking for: /^[a-zA-Z0-9&:\/ ]+$/. Insert any other characters you want to match between the square brackets.
If you intend to use regular expressions in the future, I recommend reading up on them; check out this tutorial: http://perldoc.perl.org/perlretut.html

^[\w]+$ matches strings made only of word characters: letters, digits and the underscore. If you want to match some other characters as well, just add them inside the [] brackets. For example, to also match an ampersand you would use ^[\w&]+$, and to match whitespace as well (tabs, spaces, line feeds, carriage returns) you add \s and end up with ^[\w&\s]+$, and so on until all your special characters are handled.
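A sketch of the two validation patterns from this question as C# code. A C# pattern is a plain string without Perl/JavaScript-style /.../ delimiters, so the / needs no escaping. InputValidation and its members are hypothetical; \z is swapped in for $ as a deliberate choice, because in .NET $ also matches before a trailing newline.

```csharp
using System.Text.RegularExpressions;

// Hypothetical validators built from the answers above.
public static class InputValidation
{
    // \w (letters, digits, underscore), ampersand and whitespace, for the whole string.
    private static readonly Regex WordAmpSpace = new Regex(@"^[\w&\s]+\z");

    // ASCII letters, digits, & : / and space, for the whole string.
    // \z instead of $, so a trailing "\n" is rejected rather than silently allowed.
    private static readonly Regex AsciiSubset = new Regex(@"^[a-zA-Z0-9&:/ ]+\z");

    public static bool IsValidWordAmpSpace(string s) => WordAmpSpace.IsMatch(s);

    public static bool IsValidAsciiSubset(string s) => AsciiSubset.IsMatch(s);
}
```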

I don't have a specific answer, but regexlib.com has come in handy for me.
