Merging 3 Regular Expressions to make a Slug/URL validation check

I am trying to merge a few working RegEx patterns together (AND them). I don't think I am doing this properly; furthermore, the first RegEx might be getting in the way of the next two.
Slug example (no special characters except for - and _):
(^[a-z0-9-_]+$)
Then I would like to ensure the first character is NOT - or _:
(^[^-_])
Then I would like to ensure the last character is NOT - or _:
([^-_]$)
Match (good Alias):
my-new_page
pagename
Not-Match (bad Alias)
-my-new-page
my-new-page_
!##$%^&*()
If this RegExp can be simplified, I am more than happy to use the simpler version. I am trying to create validation on a page URL that the user can provide. I am looking for the user to:
Not start or end with a special character
Start and end with a number or letter
middle (not start and end) can include - and _
Once I get that working, I can tweak it for other characters as needed.
In the end I am applying as an Annotation to my model like so:
[RegularExpression(
@"(^[a-z0-9-_]+$)?(^[^-_])?([^-_]$)",
ErrorMessage = "Alias is not valid")
]
Thank you, and let me know if I should provide more information.

See regex in use here
^[a-z\d](?:[a-z\d_-]*[a-z\d])?$
^ Assert position at the start of the line
[a-z\d] Match any lowercase ASCII letter or digit
(?:[a-z\d_-]*[a-z\d])? Optionally match the following
[a-z\d_-]* Match any character in the set any number of times
[a-z\d] Match any lowercase ASCII letter or digit
$ Assert position at the end of the line
See code in use here
using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        // Verbatim string (@"...") so the backslashes reach the regex engine unchanged
        Regex regex = new Regex(@"^[a-z\d](?:[a-z\d_-]*[a-z\d])?$");
        string[] strings = {"my-new_page", "pagename", "-my-new-page", "my-new-page_", "!##$%^&*()"};
        foreach (string s in strings)
        {
            // Print only the strings that pass validation
            if (regex.IsMatch(s))
            {
                Console.WriteLine(s);
            }
        }
    }
}
Result (only positive matches):
my-new_page
pagename
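Since the question applies the pattern as a data annotation, here is a minimal sketch of how the simplified pattern might be checked through RegularExpressionAttribute directly. The names SlugRule and IsValidAlias are hypothetical and chosen only for illustration. Note that this attribute treats null or empty input as valid, so rejecting an empty alias would need something like [Required] in addition.

```csharp
using System.ComponentModel.DataAnnotations;

public static class SlugRule
{
    // Pattern from the answer above: starts and ends with a lowercase letter or digit,
    // with '-' and '_' allowed only in between.
    public const string Pattern = @"^[a-z\d](?:[a-z\d_-]*[a-z\d])?$";

    // Hypothetical helper: runs the same check that the [RegularExpression]
    // annotation would perform on a model property.
    public static bool IsValidAlias(string alias)
    {
        var attribute = new RegularExpressionAttribute(Pattern);
        return attribute.IsValid(alias);
    }
}
```

On a model, the same constant could be reused as [RegularExpression(SlugRule.Pattern, ErrorMessage = "Alias is not valid")].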

Related

How to match camel case identifiers with a Regular Expression?

I have the need to match camel case variables. I am ignoring variables with numbers in the name.
private const String characters = @"\-:;*+=\[\{\(\/?\s^""'\<\]\}\.\)$\>";
private const String start = @"(?<=[" + characters + "])[_a-z]+";
private const String capsWord = "[_A-Z]{1}[_a-z]+";
private const String end = @"(?=[" + characters + "])";
var regex = new Regex($"{start}{capsWord}{end}",
    RegexOptions.Compiled | RegexOptions.CultureInvariant);
This is great for matching single hump variables! But not with multiple nor does the one that meets the end of the line. I thought $ or ^ in my characters would allow them to match.
abcDef // match
notToday<end of line> // no match
<start of line>intheBeginning // no match
whatIf // match
"howFar" // match
(whatsNext) // match
ohMyGod // two humps don't match
I have also tried wrapping my capsWord like this
"(capsWord)+" but it also doesn't work.
WARNING! Regex tester online matches using this "(capsWord)+" so don't verify and respond by testing from there.
It seems that my deployment wasn't getting the updates when I was making changes so there may not have been an issue after all.
The following almost works, save for the start of line problem. Note that I didn't need the suffix part, because the match ends with [a-z] content.
private const String characters = @"\-:;*+=\[\{\(\/?\s^""'\<\]\}\.\)$\>";
private const String pattern = "(?<=[" + characters + "])[_a-z]+([A-Z][a-z]+)+";
abcDef // match
notToday<end of line> // match
<start of line>intheBeginning // no match
whatIf // match
"howFar" // match
(whatsNext) // match
ohMyGod // match
So, if anyone can solve it let me know.
I have also simplified the other characters to a simpler more concise expression but it still has a problem with matching from the beginning of the line.
private const String pattern = "(?<=[^a-zA-Z])[_a-z]+([A-Z][a-z]+)+";

You can match an empty position between a prefix and a suffix to split the camelCase identifiers
(?<=[_a-z])(?=[_A-Z])
The prefix contains the lower case letters, the suffix the upper case letters.
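As a rough illustration of the splitting idea, the zero-width pattern above can be passed to Regex.Split. The class and method names below are hypothetical; this is only a sketch of one way to use the pattern.

```csharp
using System.Text.RegularExpressions;

public static class CamelCaseSplitter
{
    // Zero-width pattern from above: a position preceded by a lowercase letter or '_'
    // and followed by an uppercase letter or '_'. It consumes no characters.
    private static readonly Regex Boundary = new Regex("(?<=[_a-z])(?=[_A-Z])");

    // Hypothetical helper: splits an identifier at each such position.
    public static string[] Split(string identifier)
    {
        return Boundary.Split(identifier);
    }
}
```

For example, CamelCaseSplitter.Split("ohMyGod") yields the three parts "oh", "My" and "God".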
If you want to match camelCase identifiers, you can use
(?<=^|[^_a-zA-Z])_*[a-z]+[_a-zA-Z]*
How it works:
(?<= Match any position pos following a prefix exp (?<=exp)pos
^ Beginning of line
| OR
[^_a-zA-Z] Not an identifier character
)
_* Any number of underlines
[a-z]+ At least one lower case letter
[_a-zA-Z]* Any number of underlines and lower or upper case letters
So, it basically says: match a sequence optionally starting with underscores, followed by at least one lowercase letter, optionally followed by underscores and letters (upper and lower), and the whole thing must be preceded by either the beginning of the line or a non-identifier character. The lookbehind is necessary so that we do not match just the tail of an identifier that starts with an uppercase letter (or with underscores and an uppercase letter).
var camelCaseExpr = new Regex("(?<=^|[^_a-zA-Z])_*[a-z]+[_a-zA-Z]*");
MatchCollection matches = camelCaseExpr.Matches("whatIf _Abc _abc howFar");
foreach (Match m in matches) {
Console.WriteLine(m.Value);
}
prints
whatIf
_abc
howFar

Had the same problem today, what worked for me:
\b([a-z][a-z0-9]+[A-Z])+[a-z0-9]+\b
Note: this is for PCRE regexes
Explanation:
`(` group begin
`[a-z]` start with a lower-case letter
`[a-z0-9]+` match a string of all lowercase/numbers
`[A-Z]` an upper-case letter
`)+` group end; match one or more of such groups.
Ends with some more lower-case/numbers.
\b for word boundary.
In my case, the camelCase identifiers had only one uppercase letter between words.
So this worked for me, but if you can have (or want to match) more than one
uppercase letter in between, you could use something like [A-Z]{1,2}.
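The answer targets PCRE, but the same pattern also appears to work unchanged in .NET. The sketch below (CamelCaseFinder and Find are hypothetical names) collects all matches in a text. One caveat from tracing the pattern by hand: each lowercase run before a capital must be at least two characters long, so an identifier such as ohMyGod, where "y" stands alone between two capitals, is not matched.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CamelCaseFinder
{
    // Pattern from the answer above, written as a C# verbatim string.
    private static readonly Regex CamelCase =
        new Regex(@"\b([a-z][a-z0-9]+[A-Z])+[a-z0-9]+\b");

    // Hypothetical helper: returns every substring of the text that the pattern matches.
    public static List<string> Find(string text)
    {
        var found = new List<string>();
        foreach (Match m in CamelCase.Matches(text))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```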

String Needs to Contain 2 words

I have a textbox on one of my views, and that textbox should not accept anything that has more than 2 words or less than 2 words. This textbox needs 2 words.
Basically this textbox accepts a person's first and last name. I don't want people to only enter one or the other.
Is there a way to check for a space character between 2 words and another space character along with any letter, number, etc. after the 2nd word if it exists? I think that if the user accidentally 'fat-fingers' an extra space after the 2nd word, that should be fine because there are still only 2 words.
For example:
/* the _ character means space */
John /* not accepted */
John_ /* not accepted */
John_Smith_a /* not accepted */
John Smith_ /* accepted */
Any help is appreciated.

There are multiple approaches that you could use to solve this; I'll go over a few.
Using the String.Split() Method
You could use the String.Split() method to break up a string into its individual components based on a delimiter. In this case, you could use a space as a delimiter to get the individual words :
// Get your words, removing any empty entries along the way
var words = YourTextBox.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
// Determine how many words you have here
if(words.Length != 2)
{
// Tell the user they made a horrible mistake not typing two words here
}
Using a Regular Expression
Additionally, you could attempt to resolve this via a Regular Expression using the Regex.IsMatch() method :
// Check for exactly two words (and allow for leading and trailing spaces)
if(!Regex.IsMatch(input, @"^(\s+)?\w+\s+\w+(\s+)?$"))
{
// There are not two words, do something
}
The expression itself may look a bit scary, but it can be broken down as follows :
^ # This matches the start of your string
(\s+)? # This optionally allows for a single series of one or more whitespace characters
\w+ # This allows for one or more "word" characters that make up your first word
\s+ # Again you allow for a series of whitespace characters, you can drop the + if you just want one
\w+ # Here's your second word, nothing new here
(\s+)? # Finally allow for some trailing spaces (up to you if you want them)
$ # This matches the end of your string, so that a third word is rejected
A "word" character \w is a special character class in Regular Expressions that can represent a digit, letter or an underscore. It is roughly the equivalent of [a-zA-Z0-9_], although in .NET it also matches Unicode letters and digits unless RegexOptions.ECMAScript is used.
Taking Advantage of Regular Expressions using MVC's RegularExpressionAttribute
Finally, since you are using MVC, you could take advantage of the [RegularExpression] attribute (the RegularExpressionAttribute class) on your model itself :
[RegularExpression(@"^(\s+)?\w+\s+\w+(\s+)?", ErrorMessage = "Exactly two words are required.")]
public string YourProperty { get; set; }
This will allow you to simply call the ModelState.IsValid within your Controller Action to see if your Model has any errors or not :
// This will check your validation attributes like the one mentioned above
if(!ModelState.IsValid)
{
// You probably have some errors, like not exactly two words
}
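The split-based and regex-based checks described above can be sketched as a pair of helpers. TwoWordCheck, BySplit and ByRegex are hypothetical names. The regex variant here ends with a $ anchor so that Regex.IsMatch cannot succeed on just the first two words of a longer input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TwoWordCheck
{
    // Split on spaces and drop empty entries, so extra spaces do not count as words.
    public static bool BySplit(string input)
    {
        var words = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return words.Length == 2;
    }

    // Same idea with a regex: optional surrounding whitespace, exactly two runs of
    // word characters, anchored at both ends.
    public static bool ByRegex(string input)
    {
        return Regex.IsMatch(input, @"^\s*\w+\s+\w+\s*$");
    }
}
```

With the examples from the question, "John", "John " and "John Smith a" are rejected by both helpers, while "John Smith " is accepted.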

Use it like this:
string s = "John Smith a"; // the underscores in the question stand for spaces
if (s.Trim().Split(new char[] { ' ' }).Length == 2) // exactly two words
{
}

The tag implies MVC here, so I would recommend using the RegularExpressionAttribute class:
public class YourModel
{
[RegularExpression(@"^\w+\s\w+$", ErrorMessage = "You must have exactly two words separated by a space.")]
public string YourProperty { get; set; }
}

Match m = Regex.Match(this.yourTextBox.Text, @"^\w+\s\w+$");
if (m.Success)
{
//do something
}
else
{
//do something else
}
With my very limited knowledge of regular expressions, I believe that this will solve your issue.

The cleanest way is to use regular expressions with the IsMatch method like this:
Regex.IsMatch("One Two", @"^\w+\s\w+\s?$")
Returns true if the input is a match.

Try this
if (str.Split(' ').Length == 2)
{
//Do Something
}
str is the variable holding your string to compare

How to validate a string contains only latin characters including diacritics [duplicate]

I'd like to restrict my form input from entering non-English characters. For example, all Chinese, Japanese, Cyrillic, but also single characters like: à, â, ù, û, ü, ô, î, ê. Would this be possible? Do I have to set up a locale on my MVC application or rather just do a regex textbox validation? Just a side note, I want to be able to enter numbers and other characters. I only want this to exclude letters.
Please advise, thank you

For this you have to use Unicode character properties and blocks. Each Unicode code point is assigned some properties, e.g. that the code point is a letter. Blocks are code point ranges.
For more details, see:
regular-expressions.info for some general information about Unicode code points, character properties, scripts and blocks
MSDN for the supported properties and blocks in .net
Those Unicode Properties and blocks are written \p{Name}, where "Name" is the name of the property or block.
When it is an uppercase "P" like this \P{Name}, then it is the negation of the property/block, i.e. it matches anything else.
There are e.g. some properties (only a short excerpt):
L ==> All letter characters.
Lu ==> Letter, Uppercase
Ll ==> Letter, Lowercase
N ==> All numbers. This includes the Nd, Nl, and No categories.
Pc ==> Punctuation, Connector
P ==> All punctuation characters. This includes the Pc, Pd, Ps, Pe, Pi, Pf, and Po categories.
Sm ==> Symbol, Math
There are e.g. some blocks (only a short excerpt):
0000 - 007F ==> IsBasicLatin
0400 - 04FF ==> IsCyrillic
1000 - 109F ==> IsMyanmar
What I used in the solution:
\P{L} is a character property that is matching any character that is not a letter ("L" for Letter)
\p{IsBasicLatin} is a Unicode block that matches the code points 0000 - 007F
So your regex would be:
^[\P{L}\p{IsBasicLatin}]+$
In plain words:
This matches a string from the start to the end (^ and $) when it has at least one character and consists only of non-letters or characters from the ASCII table (code points 0000 - 007F).
A short c# test method:
string[] myStrings = { "Foobar",
"Foo#bar!\"§$%&/()",
"Föobar",
"fóÓè"
};
Regex reg = new Regex(@"^[\P{L}\p{IsBasicLatin}]+$");
foreach (string str in myStrings) {
Match result = reg.Match(str);
if (result.Success)
Console.Out.WriteLine("matched ==> " + str);
else
Console.Out.WriteLine("failed ==> " + str);
}
Console.ReadLine();
Prints:
matched ==> Foobar
matched ==> Foo#bar!"§$%&/()
failed ==> Föobar
failed ==> fóÓè
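For use outside a console test, the same pattern could be wrapped in a small reusable helper. This is only a sketch; LatinInputCheck and IsAllowed are hypothetical names, and the empty string is rejected because the pattern requires at least one character.

```csharp
using System.Text.RegularExpressions;

public static class LatinInputCheck
{
    // Pattern from above: every character is either a non-letter (\P{L})
    // or a code point in the Basic Latin block (0000 - 007F).
    private static readonly Regex Allowed = new Regex(@"^[\P{L}\p{IsBasicLatin}]+$");

    // Hypothetical helper: true when the input contains no letters outside Basic Latin.
    public static bool IsAllowed(string input)
    {
        return Allowed.IsMatch(input);
    }
}
```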

You can use a Regular Expression attribute on your ViewModel to restrict that
public class MyViewModel
{
[System.ComponentModel.DataAnnotations.RegularExpression("[a-zA-Z]+")]
public string MyEntry
{
get;
set;
}
}

You can use the regex [\x00-\x7F]+ or [\u0000-\u007F]+ (the ASCII range ends at 0x7F). I haven't tested it, but I think it should work in C# as well.
Adapted from: Regular expression to match non-English characters?
You can use regex validation for textbox and validate on the server also.

Maybe this will help you:
private void Validate(TextBox textBox1)
{
// Matches any character that is not an ASCII letter, a space, or a tab
Regex rx = new Regex("[^A-Za-z \t]");
if (rx.IsMatch(textBox1.Text))
throw new Exception("Your error message");
}
Useful links:
http://social.msdn.microsoft.com/Forums/en-US/csharpgeneral/thread/84e4f7fa-5fff-427f-8c0e-d478cb38fa12
http://www.c-sharpcorner.com/Forums/Thread/177046/allow-only-20-alphabets-and-numbers-in-textbox-using-reg.aspx

This might help. It is not an efficient way, but it is a simple non-regex validation:
foreach (char c in inputTextField)
{
if ((int)(c) > 127)
{
// throw an exception, or apply whatever logic you want here
}
}

Check if an expression is a match with regex

In C# I have two strings: [I/text] and [S/100x20].
So, the first one is [I/ followed by text and ending in ].
And the second is [S/ followed by an integer, then x, then another integer, and ending in ].
I need to check if a given string is a match of one of this formats. I tried the following:
(?<word>.*?) and (?<word>[0-9]x[0-9])
But this does not seem to work and I am missing the [I/...] and [S/...] parts.
How can I do this?

This should do nicely:
Regex rex = new Regex(@"\[I/[^\]]+\]|\[S/\d+x\d+\]");
If the text in [I/text] is supposed to include only alphanumeric characters, then @Oleg's use of \w instead of [^\]] would be better. Also, + means there must be at least one character from the preceding class, whereas * allows the class to match zero characters. Adjust as needed.
And use:
string testString1 = "[I/text]";
if(rex.IsMatch(testString1))
{
// should match..
}
string testString2 = "[S/100x20]";
if(rex.IsMatch(testString2))
{
// should match..
}
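If the parts inside the brackets are needed as well, named groups are one option. The sketch below is an assumption-laden variant of the pattern above: it adds ^ and $ so the whole string must match, and the names TagParser, Describe, text, first and second are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class TagParser
{
    // Either [I/<text>] or [S/<integer>x<integer>], anchored at both ends.
    private static readonly Regex Tag = new Regex(
        @"^(?:\[I/(?<text>[^\]]+)\]|\[S/(?<first>\d+)x(?<second>\d+)\])$");

    // Hypothetical helper: reports which format matched and the captured parts.
    public static string Describe(string input)
    {
        Match m = Tag.Match(input);
        if (!m.Success) return "no match";
        if (m.Groups["text"].Success) return "I:" + m.Groups["text"].Value;
        return "S:" + m.Groups["first"].Value + "x" + m.Groups["second"].Value;
    }
}
```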

The following regex does it. Wrap it in ^(?:...)$ if the whole string must match:
@"(\[I/\w+\])|(\[S/\d+x\d+\])"

(\[I/\w+\])
(\[S/\d+x\d+\])
The above works.
use http://regexr.com?34543 to play with your expressions
