Regex match everything but - c#

I would like to create a regular expression that matches every word, whitespace, punctuation mark and special character in a string except for specific keywords or phrases. Because I can only modify the regex, not the server code, I have to use match instead of replace.
I have something like this so far: (?!(quick|brown|fox|the lazy))\b\w+ but it ignores whitespace and special characters in this tool.
Thanks.

Does this work for you (?!(quick|brown|fox|the lazy))(\b\w+|[^\w])?
Do you have any examples?
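To make the suggestion above easier to try out, here is a minimal C# sketch. The pattern and keyword list come from the thread; the class and method names (KeywordFilter, KeepNonKeywords) are made up for illustration, and joining the matches back into one string is only an assumption about how the results would be used.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordFilter
{
    // Pattern suggested in the answer above; the keyword list is the asker's.
    private const string Pattern =
        @"(?!(quick|brown|fox|the lazy))(\b\w+|[^\w])";

    // Concatenates every match, i.e. everything the pattern did not skip.
    public static string KeepNonKeywords(string input)
    {
        return string.Concat(
            Regex.Matches(input, Pattern).Cast<Match>().Select(m => m.Value));
    }
}
```

With this sketch, KeywordFilter.KeepNonKeywords("a quick dog") returns "a  dog": the keyword is skipped and both surrounding spaces are kept. One caveat: for a multi-word phrase such as "the lazy", only the start of the phrase is rejected by the lookahead, so the word "lazy" itself is still matched.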

Related

Allow double-byte space in Regex

I have a regex in my C# code to check whether the name the end user entered is valid. My regex rejects double-byte characters such as the double-byte space.
The double-byte space like the space between quotation “ “ .
My regex: @"^[\p{L}\p{M}\p{N}' \.\-]+$".
I have already tried to edit this regex to accept the double-byte space, but I did not get a meaningful result.
So if anyone can edit this regex to accept the double-byte space, I will be thankful.
You need to replace the literal space with a pattern that matches any Unicode space separator (category Zs, which includes the ideographic space U+3000). In .NET regex, this can be done with \p{Zs}:
@"^[\p{L}\p{M}\p{N}\p{Zs}'.-]+$"
See the regex demo.
Note this pattern does not match a TAB char. If you need to match a TAB too, just add \t:
@"^[\p{L}\p{M}\p{N}\p{Zs}\t'.-]+$"
Note you do not need to escape . and - in this regex. . inside square brackets is not any special regex metacharacter and - is not special when it is placed at the end of the character class.
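As a quick check of the two patterns above, here is a small C# sketch. The class and method names (NameValidator, IsValidName, IsValidNameAllowingTab) are hypothetical; only the two regex strings come from the answer.

```csharp
using System.Text.RegularExpressions;

public static class NameValidator
{
    // Accepts letters, marks, digits, any Zs space separator, ', . and -
    private static readonly Regex NamePattern =
        new Regex(@"^[\p{L}\p{M}\p{N}\p{Zs}'.-]+$");

    // Same as above, but also accepts a TAB character.
    private static readonly Regex NamePatternWithTab =
        new Regex(@"^[\p{L}\p{M}\p{N}\p{Zs}\t'.-]+$");

    public static bool IsValidName(string name)
    {
        return NamePattern.IsMatch(name);
    }

    public static bool IsValidNameAllowingTab(string name)
    {
        return NamePatternWithTab.IsMatch(name);
    }
}
```

For example, NameValidator.IsValidName("山田\u3000太郎") should return true because U+3000 is in the Zs category, while NameValidator.IsValidName("John\tSmith") returns false and only the TAB-aware variant accepts it.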

Tamil language full-word search with .NET Regex

I have a Grid filled with Tamil words and a search string. I need to implement a full-word search through the Grid records. I'm using .NET Regex class for that approach. It sounds pretty simple, what I used to do is:
string pattern = @"\b" + searchText + @"\b";
It works as expected in Latin languages but for Tamil, this expression returns strange results. I have read about Unicode characters in regular expressions but that doesn't seem quite helpful to me. What I probably need is to determine where is the word boundary found and why.
As an example:
For the "\bஅம்மா\b" pattern Regex found matches in
அம்மாவிடம் and அம்மாக்கள் records but not in the original அம்மா record.
The last char in the "அம்மா" word is U+0BBE TAMIL VOWEL SIGN AA, and it is a combining mark (in regex, it can be matched with \p{M}).
Since \b only matches between the start/end of the string and a word char, or between a word char and a non-word char, it won't match between this mark (which is not treated as a word char here) and the end of the string or another non-word char.
Use the usual workaround in this case:
var pattern = $@"(?<!\w){searchText}(?!\w)";
See this regex demo.
Here, (?<!\w) fails the match if there is a word char before searchText, and (?!\w) fails the match if there is a word char after the text to find. Note you may also use Regex.Escape(searchText) if the text can contain special regex chars.
Or, if you want to avoid matching when inside base letters/diacritics, use
var pattern = $@"(?<![\p{{L}}\p{{M}}]){searchText}(?![\p{{L}}\p{{M}}])";
See this regex demo.
The (?<![\p{L}\p{M}]) and (?![\p{L}\p{M}]) lookarounds work similarly to the ones above, but they fail the match if there is a letter or a combining mark on either side of the search phrase.
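Putting the second variant together with Regex.Escape, a minimal C# sketch could look like this. The helper name ContainsWholeWord and the class name are hypothetical, and the pattern is built with plain concatenation rather than an interpolated string so that the braces do not need doubling.

```csharp
using System.Text.RegularExpressions;

public static class WholeWordSearch
{
    // Returns true when searchText occurs in record with no letter or
    // combining mark directly before or after it.
    public static bool ContainsWholeWord(string record, string searchText)
    {
        string pattern = @"(?<![\p{L}\p{M}])"
                       + Regex.Escape(searchText)   // treat the text literally
                       + @"(?![\p{L}\p{M}])";
        return Regex.IsMatch(record, pattern);
    }
}
```

With the records from the question, ContainsWholeWord("அம்மா", "அம்மா") returns true, while ContainsWholeWord("அம்மாவிடம்", "அம்மா") returns false because a letter follows the search text.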

Extract string from a pattern preceded by any length

I'm looking for a regular expression to extract a string from a file name
eg if filename format is "anythingatallanylength_123_TESTNAME.docx", I'm interested in extracting "TESTNAME" ... probably fixed length of 8. (btw, 123 can be any three digit number)
I think I can use regex match ...
".*_[0-9][0-9][0-9]_[A-Z][A-Z][A-Z][A-Z][A-Z][A-Z][A-Z][A-Z].docx$"
However this matches the whole thing. How can I just get "TESTNAME"?
Thanks
Use parentheses to capture a specific piece of the whole regex.
You can also use curly braces to specify counts of matching characters, and \d for [0-9].
In C#:
var myRegex = new Regex(@".*_\d{3}_([A-Za-z]{8})\.docx$");
Now "TESTNAME", or whatever your 8-letter piece is, will be found in group 1 of the match (match.Groups[1].Value) after using the regex.
Also note that look-ahead and look-behind, as presented in some other solutions, may add some performance overhead.
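For completeness, here is a small sketch of reading the captured group in C#. The wrapper names (FileNameParser, TryExtractName) are made up for this example; the pattern is the capture-group one from this answer with the leading .* in the right order.

```csharp
using System.Text.RegularExpressions;

public static class FileNameParser
{
    // Group 1 captures the 8-letter part between "_ddd_" and ".docx".
    private static readonly Regex NamePart =
        new Regex(@".*_\d{3}_([A-Za-z]{8})\.docx$");

    // Returns true and sets name when the file name fits the format.
    public static bool TryExtractName(string fileName, out string name)
    {
        Match m = NamePart.Match(fileName);
        name = m.Success ? m.Groups[1].Value : string.Empty;
        return m.Success;
    }
}
```

Calling FileNameParser.TryExtractName("anythingatallanylength_123_TESTNAME.docx", out var name) returns true and sets name to "TESTNAME".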
You can use a look-behind and a look-ahead to check parts without matching them:
(?<=_[0-9]{3}_)[A-Z]{8}(?=\.docx$)
Note that this is case-sensitive, you may want to use other character classes and/or quantifiers to fit your exact pattern.
In your file name format "anythingatallanylength_123_TESTNAME.docx", the part you are trying to match is the string between the last underscore _ and .docx. Keeping in mind that any earlier _ should not get matched, I came up with the following solution.
Regex: (?<=_)[A-Za-z]*(?=\.docx$)
Flags used:
g global search
m multi-line search.
Explanation:
(?<=_) checks if there is an underscore before the file name.
(?=\.docx$) checks for extension at the end.
[A-Za-z]* checks the required match.
Regex101 Demo
Thanks to @Lucero @noob @JamesFaix I came up with ...
@"(?<=.*_[0-9]{3}_)[A-Z]{8}(?=.docx$)"
So, a look-behind (in brackets, starting with ?<=) for anything (i.e. zero or more of any char, denoted by ".*"), followed by an underscore, followed by three digits, followed by an underscore. That's the end of the look-behind. Next comes what I need to match (eight letters). Finally, the look-ahead (in brackets, starting with ?=), which is the .docx at the end of the string.
Nice work, fellas. Thunderbirds are go.

Why is this regex not allowing this text?

I have a username validator IsValidUsername, and I am testing "baconman" but it is failing, could someone please help me out with this regex?
if(!Regex.IsMatch(str, @"^[a-zA-Z]\\w+|[0-9][0-9_]*[a-zA-Z]+\\w*$")) {
    isValid = false;
}
I want the restrictions to be: (It's very close)
Be between 5 & 17 characters long
contain at least one letter
no spaces
no special characters
You're escaping unnecessarily: if you write your regex as a verbatim string (starting with @ outside the string), you don't need both backslashes; just one is fine.
Either:
@"\w"
or
"\\w"
Edit: I didn't make this clear: right now, due to the double escaping, your regex is looking for a literal \ followed by a w. So your input would need [a letter]\w to match (example: "a\w" or "a\wwwwww" would match).
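The difference is easy to see side by side. In this sketch, DoubleEscaped is the pattern exactly as written in the question and SingleEscaped is the same pattern with one backslash removed; the class and member names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class EscapingDemo
{
    // As posted: in a verbatim string, \\w means a literal backslash then 'w'.
    public const string DoubleEscaped =
        @"^[a-zA-Z]\\w+|[0-9][0-9_]*[a-zA-Z]+\\w*$";

    // One backslash: \w is the word-character class.
    public const string SingleEscaped =
        @"^[a-zA-Z]\w+|[0-9][0-9_]*[a-zA-Z]+\w*$";

    public static bool Matches(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern);
    }
}
```

Here EscapingDemo.Matches("baconman", EscapingDemo.DoubleEscaped) returns false, while the same call with SingleEscaped returns true. Note that fixing the escaping only explains the failing test; it does not by itself enforce the length limits listed in the question.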
Your requirements are best taken care of in normal C#. They don't map well to a regular expression. Just code them up using LINQ which works on strings like it would on an IEnumerable<char>.
Also, understanding a query of a string is much easier than understanding a Regex with the requirements that you have.
It is possible to do everything as part of a Regex, however it is not pretty :-)
^(\w(?=\w*[a-zA-Z])|[a-zA-Z]|\w(?<=[a-zA-Z]\w*)){5,17}$
It does 3 checks that always result in exactly 1 character being matched (so we can perform the length check at the end):
Either the character is any word character \w which is before [a-zA-Z]
Or it is [a-zA-Z]
Or it is any word character \w which is after [a-zA-Z]
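For comparison, the plain C# / LINQ approach suggested earlier in this thread might look like the sketch below. It assumes that "no special characters" means only ASCII letters, digits and the underscore are allowed; that reading, and every name in the code, is an assumption rather than something the asker confirmed.

```csharp
using System.Linq;

public static class UsernameRules
{
    private static bool IsAsciiLetter(char c)
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Assumed rule set: 5 to 17 chars, only letters/digits/underscore,
    // and at least one letter.
    public static bool IsValidUsername(string str)
    {
        if (str == null) return false;

        bool lengthOk = str.Length >= 5 && str.Length <= 17;
        bool charsOk = str.All(c => IsAsciiLetter(c) || (c >= '0' && c <= '9') || c == '_');
        bool hasLetter = str.Any(IsAsciiLetter);

        return lengthOk && charsOk && hasLetter;
    }
}
```

With these assumptions, UsernameRules.IsValidUsername("baconman") returns true, while "12345" (no letter), "ab" (too short) and "bacon man" (contains a space) are rejected.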

regular expression for special chars and numerics

I need a regular expression that accepts alphanumerics and also special characters
like : abc & def12
thanks in advance
Nagesh
This might be the syntax you are looking for /^[a-zA-Z0-9&:\/ ]+$/, insert any other characters you want to match between the square brackets.
I would recommend you to read up on regular expressions if you intend to use them in the future, check out this tutorial http://perldoc.perl.org/perlretut.html
^[\w]+$ will match all alphanumerics (and the underscore). If you want to match some other chars as well, just specify them inside the [] brackets; e.g. if you also want to match the ampersand you will have ^[\w&]+$, and if you want to match whitespace characters as well (tabs, spaces, line feeds, carriage returns) you add \s and end up with ^[\w&\s]+$, and so on until you have all your special characters handled.
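The question does not name a language, so as an illustration only, here is how the character-class patterns above behave in C#. The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class AllowedChars
{
    // Word characters only.
    private static readonly Regex WordOnly = new Regex(@"^[\w]+$");

    // Word characters, ampersand and whitespace.
    private static readonly Regex WordAmpSpace = new Regex(@"^[\w&\s]+$");

    public static bool IsWordOnly(string input)
    {
        return WordOnly.IsMatch(input);
    }

    public static bool IsAllowed(string input)
    {
        return WordAmpSpace.IsMatch(input);
    }
}
```

For the sample input from the question, AllowedChars.IsAllowed("abc & def12") returns true, whereas AllowedChars.IsWordOnly("abc & def12") returns false because of the spaces and the ampersand.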
not got a specific answer, but regexlib.com has come in handy for me.
