Regex for alphanumeric, at least 1 number and special chars - c#

I am trying to find a regex that performs the following validation:
the string should contain at least 1 digit and at least 1 special character. Alphanumeric characters are allowed.
I tried the following, but it fails:
@"^[a-zA-Z0-9@#$%&*+\-_(),+':;?.,!\[\]\s\\/]+$]"
I tried "password1$", but that failed.
I also tried "Password1!", but that also failed.
Any ideas?
UPDATE
Need the solution to work with C# - currently the suggestions posted as of Oct 22 2013 do not appear to work.

Try this:
Regex rxPassword = new Regex( @"
^ # start-of-line, followed by
(?=.*[0-9]) # a look-ahead requiring at least one decimal digit
(?=.*[!@#]) # a look-ahead requiring at least one of the special characters
(?=.*[a-zA-Z]) # a look-ahead requiring at least one upper- or lower-case letter
[a-zA-Z0-9!@#]+ # a sequence of one or more characters drawn from the set consisting of ASCII letters, digits or the punctuation characters ! @ and #
$ # followed by end-of-line
" , RegexOptions.IgnorePatternWhitespace ) ;
The construct (?=regular-expression) is a zero-width positive look-ahead assertion: it tests what follows without consuming any characters.

Sometimes it's a lot simpler to do things one step at a time. The static constructor builds the escaped character class characters from a simple list of allowed special characters. The built-in Regex.Escape method doesn't work here.
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PasswordValidator {
    private const string ALLOWED_SPECIAL_CHARS = @"@#$%&*+_()':;?.,![]\-";
    private static string ESCAPED_SPECIAL_CHARS;
    static PasswordValidator() {
        // Escape only the characters that are special inside a character class
        var escapedChars = new List<char>();
        foreach (char c in ALLOWED_SPECIAL_CHARS) {
            if (c == '[' || c == ']' || c == '\\' || c == '-')
                escapedChars.AddRange(new[] { '\\', c });
            else
                escapedChars.Add(c);
        }
        ESCAPED_SPECIAL_CHARS = new string(escapedChars.ToArray());
    }
    public static bool IsValidPassword(string input) {
        // Length requirement?
        if (input.Length < 8) return false;
        // First just check for a digit
        if (!Regex.IsMatch(input, @"\d")) return false;
        // Then check for special character
        if (!Regex.IsMatch(input, "[" + ESCAPED_SPECIAL_CHARS + "]")) return false;
        // Require a letter?
        if (!Regex.IsMatch(input, "[a-zA-Z]")) return false;
        // DON'T allow anything else:
        if (Regex.IsMatch(input, @"[^a-zA-Z\d" + ESCAPED_SPECIAL_CHARS + "]")) return false;
        return true;
    }
}

This may work. There are two possibilities: the digit comes before the special character, or the digit comes after it. You should use DOTALL mode (so that the dot matches any character, including newlines); in C# this is RegexOptions.Singleline.
^((.*?[0-9].*?[@#$%&*+\-_(),+':;?.,!\[\]\s\\/].*)|(.*?[@#$%&*+\-_(),+':;?.,!\[\]\s\\/].*?[0-9].*))$

This worked for me:
@"(?=^[!@#$%\^&*()_\-+=\[{\]};:<>|./?a-zA-Z\d]{8,}$)(?=([!@#$%\^&*()_\-+=\[{\]};:<>|./?a-zA-Z\d]\W+){1,})(?=[^0-9][0-9])[!@#$%\^&*()_\-+=\[{\]};:<>|./?a-zA-Z\d]*$"
Alphanumeric, with at least 1 digit and 1 special character, and a minimum length of 8.

This should do the work
(?:(?=.*[0-9]+)(?=.*[a-zA-Z]+)(?=.*[@#$%&*+\-_(),+':;?.,!\[\]\s\\/]+))+
Tested with JavaScript; I'm not sure about C#, so it may need minor adjustments.
It uses positive lookaheads to find the required elements of the password.
EDIT
The regular expression is designed to test whether there is a match. Since all the patterns are lookaheads, no real characters get captured and the matches are empty, but if the expression matches, the password is valid.
But since the question is about C# (sorry, I don't know C#, I'm just improvising and adapting samples):
string input = "password1!";
string pattern = @"^(?:(?=.*[0-9]+)(?=.*[a-zA-Z]+)(?=.*[@#$%&*+\-_(),+':;?.,!\[\]\s\\/]+))+.*$";
Regex rgx = new Regex(pattern, RegexOptions.None);
MatchCollection matches = rgx.Matches(input);
if (matches.Count > 0) {
    Console.WriteLine("{0} ({1} matches):", input, matches.Count);
    foreach (Match match in matches)
        Console.WriteLine(" " + match.Value);
}
With a start-of-line anchor added at the front and .*$ at the end, the expression matches if the password is valid, and the match value will be the password (I guess).
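For C#, the lookahead idea from the answers above can be sketched as a single anchored pattern. This is a minimal sketch, not a definitive validator: the class name PasswordCheck, the method IsValid, and the exact set of special characters (adapted from the question's character class) are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PasswordCheck
{
    // Special characters assumed from the question's character class,
    // already escaped for use inside [...].
    private const string Special = @"@#$%&*+\-_(),':;?.!\[\]\s\\/";

    // ^            start of string
    // (?=.*[0-9])  lookahead: at least one digit somewhere
    // (?=.*[...])  lookahead: at least one special character somewhere
    // [...]+$      only letters, digits and the special characters, to the end
    private static readonly Regex Pattern = new Regex(
        "^(?=.*[0-9])(?=.*[" + Special + "])[a-zA-Z0-9" + Special + "]+$");

    public static bool IsValid(string input)
    {
        return input != null && Pattern.IsMatch(input);
    }
}
```

Because \s is part of the question's set, whitespace counts as a special character here; drop it from Special if that is not wanted.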

Related

How to capitalize 1st letter (ignoring non a-z) with regex in c#?

There are tons of posts about how to capitalize the first letter with C#, but I am specifically struggling with how to do this while ignoring prefixed non-letter characters and tags. E.g.,
<style=blah>capitalize the word, 'capitalize'</style>
How to ignore potential <> tags (or non-letter chars before it, like asterisk *) and the contents within them, THEN capitalize "capitalize"?
I tried:
public static string CapitalizeFirstCharToUpperRegex(string str)
{
    // Check for empty string.
    if (string.IsNullOrEmpty(str))
        return string.Empty;
    // Return char and concat substring.
    // Start @ first char, no matter what (avoid <tags>, etc)
    string pattern = @"(^.*?)([a-z])(.+)";
    // Extract middle, then upper 1st char
    string middleUpperFirst = Regex.Replace(str, pattern, "$2");
    middleUpperFirst = CapitalizeFirstCharToUpper(str); // Works
    // Inject the middle back in
    string final = $"$1{middleUpperFirst}$3";
    return Regex.Replace(str, pattern, final);
}
EDIT:
Input: <style=foo>first non-tagged word 1st char upper</style>
Expected output: <style=foo>First non-tagged word 1st char upper</style>
You may use
<[^<>]*>|(?<!\p{L})(\p{L})(\p{L}*)
The regex does the following:
<[^<>]*> - matches <, any 0+ chars other than < and > and then >
| - or
(?<!\p{L}) - finds a position not immediately preceded with a letter
(\p{L}) - captures into Group 1 any letter
(\p{L}*) - captures into Group 2 any 0+ letters (that is necessary if you want to lowercase the rest of the word).
Then, check if Group 2 matched, and if yes, capitalize the first group value and lowercase the second one, else, return the whole value:
var result = Regex.Replace(s, @"<[^<>]*>|(?<!\p{L})(\p{L})(\p{L}*)", m =>
    m.Groups[1].Success ?
        m.Groups[1].Value.ToUpper() + m.Groups[2].Value.ToLower() :
        m.Value);
If you do not need to lowercase the rest of the word, remove the second group and the code related to it:
var result = Regex.Replace(s, @"<[^<>]*>|(?<!\p{L})(\p{L})", m =>
    m.Groups[1].Success ?
        m.Groups[1].Value.ToUpper() : m.Value);
To only replace the first occurrence using this approach, you need to set a flag and reverse it once the first match is found:
var s = "<style=foo>first non-tagged word 1st char upper</style>";
var found = false;
var result = Regex.Replace(s, @"<[^<>]*>|(?<!\p{L})(\p{L})", m => {
    if (m.Groups[1].Success && !found) {
        found = !found;
        return m.Groups[1].Value.ToUpper();
    } else {
        return m.Value;
    }
});
Console.WriteLine(result); // => <style=foo>First non-tagged word 1st char upper</style>
See the C# demo.
Using a look-behind, you can match the first word after the closing > of the tag and then capitalize the match.
The regex is the following:
(?<=<.*>)\w+
With Regex.Match, the first match is the first word after a closing >.
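As an alternative to tracking a flag or relying on a look-behind, the leading tags and non-letter characters can be captured and put back unchanged. The following is only a sketch under assumptions: the class name FirstLetter, the method Capitalize and the anchored pattern are hypothetical and not taken from the answers above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FirstLetter
{
    // Group 1: any leading run of <...> tags and other non-letter characters.
    // Group 2: the first letter that follows that run.
    private static readonly Regex Pattern =
        new Regex(@"^((?:<[^<>]*>|[^\p{L}<])*)(\p{L})");

    public static string Capitalize(string input)
    {
        if (string.IsNullOrEmpty(input))
            return string.Empty;

        // The pattern is anchored at the start, so at most one match is replaced.
        return Pattern.Replace(input, m =>
            m.Groups[1].Value + m.Groups[2].Value.ToUpperInvariant());
    }
}
```

If the input starts with an unclosed <, the pattern does not match and the input is returned unchanged.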

How to select first sentence in a piece of text using regular expression?

My task is to select first sentence from a text (I'm writing in C#). I suppose that the most appropriate way would be using regex but some troubles occurred. What regex pattern should I use to select the first sentence?
Several examples:
Input: "I am a lion and I want to be free. Do you see a lion when you look inside of me?" Expected result: "I am a lion and I want to be free."
Input: "I drink so much they call me Charlie 4.0 hands. Any text." Expected result: "I drink so much they call me Charlie 4.0 hands."
Input: "So take out your hands and throw the H.U. up. 'Now wave it around like you don't give a fake!'" Expected result: "So take out your hands and throw the H.U. up."
The third is really confusing me.
Since you already provided some assumptions:
sentences are divided by a whitespace
task is to select first sentence
You can use the following regex:
^.*?[.?!](?=\s+(?:$|\p{P}*\p{Lu}))
See RegexStorm demo
Regex breakdown:
^ - start of string (thus, only the first sentence will be matched)
.*? - any number of characters, as few as possible (use RegexOptions.Singleline to also match a newline with .)
[.?!] - a final punctuation symbol
(?=\s+(?:$|\p{P}*\p{Lu})) - a look-ahead making sure that the punctuation is followed by 1 or more whitespace symbols (\s+) and then either the end of the string ($) or optional punctuation (\p{P}*) and a capital letter (\p{Lu}).
UPDATE:
Since it turns out you can have single sentence input, and your sentences can start with any letter or digit, you can use
^.*?[.?!](?=\s+\p{P}*[\p{Lu}\p{N}]|\s*$)
See another demo
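A short C# sketch of how the updated pattern might be applied. The class name Sentences and the method First are assumptions for illustration, as is the choice to return the whole text when no sentence boundary is found.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Sentences
{
    // Singleline lets . match newlines, as suggested in the breakdown above.
    private static readonly Regex FirstSentence = new Regex(
        @"^.*?[.?!](?=\s+\p{P}*[\p{Lu}\p{N}]|\s*$)",
        RegexOptions.Singleline);

    public static string First(string text)
    {
        Match m = FirstSentence.Match(text);
        // Assumption: fall back to the whole text when nothing matches.
        return m.Success ? m.Value : text;
    }
}
```

With the question's second example, the dot in "4.0" is skipped because it is not followed by whitespace, so the match ends at "hands.".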
I came up with a regular expression that uses several negative look-aheads to exclude certain cases, e.g. the punctuation must not be followed by a lowercase character, and a dot before a capital letter does not close a sentence. It splits the whole text into separate sentences. Given a text, just take the first match.
/[\s\S]*?(?![A-Z]+)(?:\.|\?|\!)(?!(?:\d|[A-Z]))(?! [a-z])/gm
Sentence separators should be searched for with the following scanner:
if it's sentence-finisher character (like [.!?])
it must be followed by space or allowed sequence of characters and then space:
like sequence of '.' for '.' (A sentence...)
...or sequence of '!' and/or '?' for '!' and '?' (Exclamation here!?)
then it must be followed by either:
capital character (ignore quotes, if any)
numeric
which must be followed by lowercase or another sentence-finisher
dialog-starter character (Blah blah blah... - And what next, Elric?)
Tip: don't forget to add extra space character to input source string.
Upd:
Some wild pseudocode xD:
func sentence(inputString) {
    finishers = ['.', '!', '?']
    allowedSequences = ['.' => ['..'], '!' => ['!!', '?'], '?' => ['??', '!']]
    input = inputString
    result = ''
    found = false
    while input != '' {
        finisherPos = min(pos(input, finishers))
        if !finisherPos
            return inputString
        result += substr(input, 0, finisherPos + 1)
        input = substr(input, finisherPos)
        p = finisherPos
        finisher = input[p]
        p++
        if input[p] != ' '
            if match = testSequence(substr(input, p), allowedSequences[finisher]) {
                result += match
                found = true
                break
            } else {
                continue
            }
        else {
            p++
            if input[p] in [A-Z] {
                found = true
                break
            }
            if input[p] in [0-9] {
                p++
                if input[p] in [a-z] or input[p] in finishers {
                    found = true
                    break
                }
                p--
            }
            if input[p] in ['-'] {
                found = true
                break
            }
        }
    }
    if !found
        return inputString
    return result
}
func testSequence(str, sequences) {
    foreach (sequence: sequences)
        if startsWith(str, sequence)
            return sequence
    return false
}

Regex Lookahead and lookbehind at most one digit

I'm looking to create a regex pattern that matches:
8 characters [a-zA-Z]
must contain only one digit, anywhere in the string
I created this pattern:
^(?=.*[0-9].*[0-9])[0-9a-zA-Z]{8}$
This pattern works fine, but I want only one digit to be allowed. Example:
aaaaaaa6 match
aaa7aaaa match
aaa88aaa don't match
aaa884aa don't match
aaawwaaa don't match
You could instead use:
^(?=[0-9a-zA-Z]{8}$)[^\d]*\d[^\d]*$
The first part asserts that the string consists of exactly 8 letters or digits. Once this is ensured, the second part ensures that there is only one digit in the match.
EDIT: Explanation:
The anchors ^ and $ denote the start and end of string.
(?=[0-9a-zA-Z]{8}$) asserts that the string consists of exactly 8 letters or digits. Without the $ inside the look-ahead, only the first 8 characters would be checked, and longer inputs could still match.
[^\d]*\d[^\d]* implies that there is exactly one digit character and that the remaining characters are non-digits. Since we have already asserted that the input contains only digits or letters, the non-digit characters here are letters.
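A minimal C# sketch of this approach, with the lookahead anchored by $ so that exactly 8 characters are required. The class name OneDigit and the use of [0-9] instead of \d (which in .NET also matches non-ASCII digits by default) are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OneDigit
{
    // (?=[0-9a-zA-Z]{8}$)    lookahead: exactly 8 ASCII letters or digits
    // [^0-9]*[0-9][^0-9]*    exactly one digit, anywhere in the string
    private static readonly Regex Pattern =
        new Regex(@"^(?=[0-9a-zA-Z]{8}$)[^0-9]*[0-9][^0-9]*$");

    public static bool IsMatch(string s)
    {
        return s != null && Pattern.IsMatch(s);
    }
}
```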
If you want a non-regex solution, I wrote this for a small project:
public static bool ContainsOneDigit(string s)
{
    if (String.IsNullOrWhiteSpace(s) || s.Length != 8)
        return false;
    int nb = 0;
    foreach (char c in s)
    {
        if (!Char.IsLetterOrDigit(c))
            return false;
        if (c >= '0' && c <= '9') // just thought, I could use Char.IsDigit() here ...
            nb++;
    }
    return nb == 1;
}

Regular expression to remove whitespace around a comma, except when quoted

I have a CSV file that has rows resembling this:
1, 4, 2, "PUBLIC, JOHN Q" ,ACTIVE , 1332
I am looking for a regular expression replacement that will match against these rows and spit out something resembling this:
1,4,2,"PUBLIC, JOHN Q",ACTIVE,1332
I thought this would be rather easy: I made the expression ([ \t]+,) and replaced it with ,. I made a complement expression (,[ \t]+) with a replacement of , and I thought I had achieved a good means of right-trimming and left-trimming strings.
...but then I noticed that my "PUBLIC, JOHN Q" was now "PUBLIC,JOHN Q" which isn't what I wanted. (Note the space following the comma is now gone).
What would be the appropriate expression to trim the white space before and after a comma, but leave quoted text untouched?
UPDATE
To clarify, I am using an application to handle the file. This application allows me to define multiple regular expression replacements; it does not provide a parsing capability. While this may not be the ideal mechanism for this, it would sure beat making another application for this one file.
If the engine used by your tool is the C# regular expression engine, then you can try the following expression:
(?<!,\s*"(?:[^\\"]|\\")*)\s+(?!(?:[^\\"]|\\")*"\s*,)
replace with empty string.
The other answers assume the quotes are balanced and use counting to determine whether a space is part of a quoted value.
My expression looks for all spaces that are not part of a quoted value.
RegexHero Demo
Something like this might do the job:
(?<!(^[^"]*"[^"]*(("[^"]*){2})*))[\t ]*,[ \t]*
Which matches [\t ]*,[ \t]*, only when not preceded by an odd number of quotes.
Going with a CSV library or parsing the file yourself would be much easier, and IMO should be the preferred option here.
But if you really insist on a regex, you can use this one:
"\s+(?=([^\"]*\"[^\"]*\")*[^\"]*$)"
And replace it with empty string - ""
This regex matches one or more whitespace characters that are followed by an even number of quotes. This will of course work only if the quotes are balanced.
(?x) # Ignore Whitespace
\s+ # One or more whitespace characters
(?= # Followed by
( # A group - This group captures even number of quotes
[^\"]* # Zero or more non-quote characters
\" # A quote
[^\"]* # Zero or more non-quote characters
\" # A quote
)* # Zero or more repetition of previous group
[^\"]* # Zero or more non-quote characters
$ # Till the end
) # Look-ahead end
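The same even-number-of-quotes look-ahead can be narrowed to whitespace around commas only, so that spaces inside unquoted values are left alone. This is a sketch under assumptions: the class CsvTrim and method TrimAroundCommas are hypothetical names, each line is processed separately, quotes are balanced, and there are no escaped quotes inside values.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CsvTrim
{
    // [ \t]*,[ \t]*   a comma with optional spaces or tabs on either side
    // (?=...$)        followed by an even number of quotes up to the end,
    //                 which means the comma is outside any quoted value
    private static readonly Regex CommaOutsideQuotes = new Regex(
        @"[ \t]*,[ \t]*(?=(?:[^""]*""[^""]*"")*[^""]*$)");

    public static string TrimAroundCommas(string line)
    {
        return CommaOutsideQuotes.Replace(line, ",");
    }
}
```

Leading and trailing whitespace of the whole line is not touched by this pattern.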
string format(string val)
{
    if (val.StartsWith("\"")) val = " " + val;
    // Even-indexed parts lie outside the quotes; strip spaces and tabs from them
    string[] vals = val.Split('\"');
    for (int i = 0; i < vals.Length; i += 2) vals[i] = vals[i].Replace(" ", "").Replace("\t", "");
    // Re-join with the quote character that Split removed
    return string.Join("\"", vals);
}
This will work if the quoted strings in the line are properly closed.
Forget the regex (See Bart's comment on the question, regular expressions aren't suitable for CSV).
public static string ReduceSpaces( string input )
{
    char[] a = input.ToCharArray();
    int placeComma = 0, placeOther = 0;
    bool inQuotes = false;
    bool followedComma = true;
    foreach( char c in a ) {
        inQuotes ^= (c == '\"');
        if (c == ' ' && !inQuotes) {
            // Keep the space tentatively, unless it directly follows a comma
            if (!followedComma)
                a[placeOther++] = c;
        }
        else if (c == ',' && !inQuotes) {
            // Overwrite any tentatively kept spaces before the comma
            a[placeComma++] = c;
            placeOther = placeComma;
            followedComma = true;
        }
        else {
            // Any other character, and everything inside quotes, is copied as-is
            a[placeOther++] = c;
            placeComma = placeOther;
            followedComma = false;
        }
    }
    return new String(a, 0, placeComma);
}
Demo: http://ideone.com/NEKm09

how to prevent non-English characters and allow non-alpha characters

I have a string, and I want to make sure that every letter in it is English.
The other characters, I don't care.
34556#%42%$23$%^*&sdfsfr - valid
34556#%42%$23$%^*&בלה בלה - not valid
Can I do that with Linq? RegEx?
Thanks
You can define in a character class either all characters/character ranges/Unicode-properties/blocks you want to allow or you don't want to allow.
[abc] is a character class that allows a and b and c
[^abc] is a negated character class that matches everything but not a or b or c
Here in your case I would go this way, no need to define every character:
^[\P{L}A-Za-z]*$
Match from the start to the end of the string everything that is not a letter [^\p{L}] or A-Za-z.
\p{L} Is a Unicode property and matches everything that has the property letter. \P{L} is the negated version, everything that is not a letter.
Test code:
string[] StrInputNumber = { "34556#%42%$23$%^*&sdfsfr", "asdf!\"§$%&/()=?*+~#'", "34556#%42%$23$%^*&בלה בלה", "öäü!\"§$%&/()=?*+~#'" };
Regex ASCIILettersOnly = new Regex(@"^[\P{L}A-Za-z]*$");
foreach (String item in StrInputNumber) {
    if (ASCIILettersOnly.IsMatch(item)) {
        Console.WriteLine(item + " ==> Contains only ASCII letters");
    }
    else {
        Console.WriteLine(item + " ==> Contains non ASCII letters");
    }
}
Some more basic regex explanations: What absolutely every Programmer should know about regular expressions
Maybe you could use
using System.Linq;
...
static bool IsValid(string str)
{
    return str.All(c => c <= sbyte.MaxValue);
}
This considers all ASCII chars to be "valid" (even control characters). But punctuation and other special characters outside ASCII are not "valid". If str is null, an exception is thrown.
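If non-letter characters outside ASCII (such as §) should stay valid, as the question suggests, a LINQ check can test only the letters. This is a sketch under assumptions: the class name EnglishLetters and the helper IsAsciiLetter are hypothetical, the check works per UTF-16 char (so letters outside the Basic Multilingual Plane are not detected), and a null str still throws.

```csharp
using System;
using System.Linq;

public static class EnglishLetters
{
    private static bool IsAsciiLetter(char c)
    {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // Valid when every character is either not a letter at all
    // or one of the ASCII letters A-Z / a-z.
    public static bool IsValid(string str)
    {
        return str.All(c => !char.IsLetter(c) || IsAsciiLetter(c));
    }
}
```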
One thing you can try is to list the characters you want to allow in this regex:
bool IsValid(string input) {
    // Regex.IsMatch takes the input first, then the pattern
    return !Regex.IsMatch(input, @"[^A-Za-z0-9'\.&@:?!()$#^]");
}
If the input contains any character other than those specified in the regex, the method returns false.
