Compare Two Strings With Wildcards - c#

I need to compare two strings, one of which uses '*' as a wildcard. I was thinking of using either an iterative or recursive method when I realized that RegEx would perform the task more quickly. Unfortunately, I am new to RegEx, and am not sure how to do this.
If I sent in the pattern "He**o", then "Hello" and "He7(o" should return true, but "Hhllo" should return false.

Assuming that you mean * to be a single-character wildcard, the correct substitution in a Regex pattern is a dot (.):
string pattern = "He**o";
string regexPattern = pattern.Replace("*",".");
Regex.IsMatch("Hello",regexPattern); // true
Regex.IsMatch("He7)o",regexPattern); // true
Regex.IsMatch("he7)o",regexPattern); // false
Regex.IsMatch("he7)o",regexPattern, RegexOptions.IgnoreCase); // true
You might also want to anchor the pattern with ^ (start of string) and $ (end of string):
regexPattern = String.Format("^{0}$", pattern.Replace("*","."));
If the pattern may contain characters that are special in regular expressions, you can escape everything except the wildcard like this:
string regexPattern = String.Join(".",pattern.Split("*".ToCharArray())
.Select(s => Regex.Escape(s)).ToArray());
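The escaping and the anchoring described above can be combined into one helper. The following is only a minimal sketch: the names WildcardMatcher, ToRegexPattern and IsMatch are made up for illustration, and it assumes that '*' stands for exactly one character, as in this answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WildcardMatcher
{
    // Hypothetical helper: '*' is treated as "exactly one character".
    public static string ToRegexPattern(string pattern)
    {
        // Escape each literal piece, then rejoin the pieces with '.'
        // at the positions where the '*' wildcards were.
        string body = string.Join(".", pattern.Split('*').Select(s => Regex.Escape(s)));

        // Anchor the result so the whole input has to match.
        return "^" + body + "$";
    }

    public static bool IsMatch(string input, string pattern, bool ignoreCase = false)
    {
        RegexOptions options = ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;
        return Regex.IsMatch(input, ToRegexPattern(pattern), options);
    }
}
```

With this sketch, WildcardMatcher.IsMatch("He7(o", "He**o") is true and WildcardMatcher.IsMatch("Hhllo", "He**o") is false. A pattern such as "a.c" only matches the literal text "a.c", because the dot is escaped.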

Compare the strings by using the char index in a for loop. If the pattern char (wildcard) appears, ignore the comparison and move on to the next comparison.
private bool Compare(string pattern, string compare)
{
if (pattern.Length != compare.Length)
//strings don't match
return false;
for (int x = 0; x < pattern.Length; x++)
{
if (pattern[x] != '*')
{
if (pattern[x] != compare[x])
return false;
}
}
return true;
}

Build a Regex from "He..o".
This is a case that will not be recognized:
Regex r = new Regex("He..o");
string test = "Hhleo";
bool success = r.Match(test).Success; // false
This is a case that will be recognized:
Regex r = new Regex("He..o");
string test = "He7)o";
bool success = r.Match(test).Success; // true

That's exactly what I've done in PHP today. If you change the wildcard check to this:
if (pattern[x] != '*' && compare[x] != '*')
then both strings can contain wildcards. (I hope && means logical AND, as in PHP.)

Related

Check if a string contains only letters, digits and underscores

I have to check if a string contains only letters, digits and underscores.
This is how I tried but it doesn't work:
for(int i = 0; i<=snameA.Length-1; i++)
{
validA = validA && (char.IsLetterOrDigit(snameA[i])||snameA[i].Equals("_"));
}
I love Linq for this kind of question:
bool validA = sname.All(c => Char.IsLetterOrDigit(c) || c.Equals('_'));
You are assigning validA every time again, without checking its previous value. Now you always get the value of the last check executed.
You could 'and' the result:
validA &= (char.IsLetterOrDigit(snameA[i]) || snameA[i] == '_');
This would mean you still run all characters, which might be useless if the first check failed. So it is better to simply step out if it fails:
for(int i = 0; i<=snameA.Length-1; i++)
{
validA = (char.IsLetterOrDigit(snameA[i]) || snameA[i] == '_');
if (!validA)
{ break; } // <-- see here
}
Or with LINQ:
validA = snameA.All(c => char.IsLetterOrDigit(c) || c == '_');
You can use a regex:
Regex regex1 = new Regex(@"^[a-zA-Z0-9_]+$");
if(regex1.IsMatch(snameA))
{
}
I would use a Regex
string pattern = @"^[a-zA-Z0-9\_]+$";
Regex regex = new Regex(pattern);
// Compare a string against the regular expression
return regex.IsMatch(stringToTest);
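You could try matching a regular expression. There is a built-in character class for "letters, digits and underscores", which is "\w". Anchor the pattern with ^ and $; otherwise \w* matches an empty substring of any input and IsMatch always returns true.
Regex rgx = new Regex(@"^\w*$");
rgx.IsMatch(yourString);
If you require one or more characters, use "^\w+$" instead.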
Further information here: Regex.IsMatch
First, "letter" is a somewhat vague term: do you mean only the characters a..z and A..Z, or may a letter belong to any alphabet, e.g. а..я and А..Я (Russian, Cyrillic letters)? According to your current implementation, you want the second option.
The typical loop solution checks until the first counterexample is found:
Boolean validA = true; // true - no counterexamples so far
// Why for? foreach is much more readable here
foreach (Char ch in sname)
// "!= '_'" is more readable than "Equals" and needs no boxing
if (!char.IsLetterOrDigit(ch) && ch != '_') {
validA = false; // counterexample (i.e. a character that is neither a letter, a digit nor an underscore)
break; // <- do not forget this: there's no use checking the remaining characters
}
However you can simplify the code with either Linq:
validA = sname.All(ch => Char.IsLetterOrDigit(ch) || ch == '_');
Or regular expression:
validA = Regex.IsMatch(sname, @"^\w*$");
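The two readings of "letter" discussed in this answer give different results for non-ASCII input. The sketch below contrasts them; the class and method names are illustrative only and do not come from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NameValidator
{
    // Accepts letters and digits from any alphabet, plus '_'.
    public static bool IsUnicodeWord(string s)
    {
        return s.All(ch => char.IsLetterOrDigit(ch) || ch == '_');
    }

    // Accepts only ASCII letters, ASCII digits and '_'.
    public static bool IsAsciiWord(string s)
    {
        return Regex.IsMatch(s, @"\A[A-Za-z0-9_]*\z");
    }
}
```

Both methods accept "user_1" and reject "a-b", but only IsUnicodeWord accepts a Cyrillic name such as "имя_1". Both also accept the empty string; change * to + (or check the length first) if at least one character is required.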

Regex. Camel case to underscore. Ignore first occurrence

For example:
thisIsMySample
should be:
this_Is_My_Sample
My code:
System.Text.RegularExpressions.Regex.Replace(input, "([A-Z])", "_$0", System.Text.RegularExpressions.RegexOptions.Compiled);
It works fine, but if the input is changed to:
ThisIsMySample
the output will be:
_This_Is_My_Sample
How can first occurrence be ignored?
Non-Regex solution
string result = string.Concat(input.Select((x,i) => i > 0 && char.IsUpper(x) ? "_" + x.ToString() : x.ToString()));
Seems to be quite fast too: Regex: 2569ms, C#: 1489ms
Stopwatch stp = new Stopwatch();
stp.Start();
for (int i = 0; i < 1000000; i++)
{
string input = "ThisIsMySample";
string result = System.Text.RegularExpressions.Regex.Replace(input, "(?<=.)([A-Z])", "_$0",
System.Text.RegularExpressions.RegexOptions.Compiled);
}
stp.Stop();
MessageBox.Show(stp.ElapsedMilliseconds.ToString());
// Result 2569ms
Stopwatch stp2 = new Stopwatch();
stp2.Start();
for (int i = 0; i < 1000000; i++)
{
string input = "ThisIsMySample";
string result = string.Concat(input.Select((x, j) => j > 0 && char.IsUpper(x) ? "_" + x.ToString() : x.ToString()));
}
stp2.Stop();
MessageBox.Show(stp2.ElapsedMilliseconds.ToString());
// Result: 1489ms
You can use a lookbehind to ensure that each match is preceded by at least one character:
System.Text.RegularExpressions.Regex.Replace(input, "(?<=.)([A-Z])", "_$0",
System.Text.RegularExpressions.RegexOptions.Compiled);
Lookaheads and lookbehinds allow you to make assertions about the text surrounding a match without including that text in the match.
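Because both kinds of assertion are zero-width, they can also be combined so that the match itself is empty and the underscore is simply inserted at that position. This is a variation on the pattern above, not the same pattern, and the names CaseConverter and InsertUnderscores are made up for this sketch.

```csharp
using System.Text.RegularExpressions;

public static class CaseConverter
{
    // Inserts '_' at every position that has a lowercase letter or digit
    // before it and an uppercase letter after it. Both assertions are
    // zero-width, so no character of the input is consumed or replaced.
    public static string InsertUnderscores(string input)
    {
        return Regex.Replace(input, "(?<=[a-z0-9])(?=[A-Z])", "_");
    }
}
```

Here "thisIsMySample" becomes "this_Is_My_Sample" and "ThisIsMySample" becomes "This_Is_My_Sample". A run of capitals such as "ABC" is left unchanged, because no lowercase letter or digit precedes any of them.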
Maybe like this:
var str = Regex.Replace(input, "([A-Z])", "_$0", RegexOptions.Compiled);
if(str.StartsWith("_"))
str = str.Substring(1);
// (Preceded by a lowercase character or digit) (a capital) => The character prefixed with an underscore
var result = Regex.Replace(input, "(?<=[a-z0-9])[A-Z]", m => "_" + m.Value);
result = result.ToLowerInvariant();
This works for both PascalCase and camelCase.
It creates no leading or trailing underscores.
It leaves intact any sequences of non-word characters and underscores in the string, because they would seem intentional, e.g. __HiThere_Guys becomes __hi_there_guys.
Digit suffixes are (intentionally) considered part of the word, e.g. NewVersion3 becomes new_version3.
Digit prefixes follow the original casing, e.g. 3VersionsHere becomes 3_versions_here, but 3rdVersion becomes 3rd_version.
Unfortunately, capitalized two-letter acronyms (e.g. in IDNumber, where ID would be considered a separate word), as suggested in Microsoft's Capitalization Conventions, are not supported, since they conflict with other cases. I recommend, in general, to resist this guideline, as it is a seemingly arbitrary exception to the convention of not capitalizing acronyms. Stick with IdNumber.
Elaborating on sa_ddam213's solution, mine extends this:
public static string GetConstStyleName(this string value)
{
return string.Concat(value.Select((x, i) =>
{
//want to avoid putting underscores between pairs of upper-cases or pairs of numbers, or adding redundant underscores if they already exist.
bool isPrevCharLower = (i == 0) ? false : char.IsLower(value[i - 1]);
bool isPrevCharNumber = (i == 0) ? false : char.IsNumber(value[i - 1]);
return (isPrevCharLower && (char.IsUpper(x) || char.IsNumber(x))) //lower-case followed by upper-case or number needs underscore
|| (isPrevCharNumber && (char.IsUpper(x))) //number followed by upper-case needs underscore
? "_" + x.ToString() : x.ToString();
})).ToUpperInvariant();
}
Use "(.)([A-Z])" for your regular expression, and then "$1_$2" for the replacement. The leading dot ensures that the first character of the string is never matched as the capital, and capturing it lets the replacement put that character back in front of the underscore.
You need to modify your regex so that it cannot match the first character, by requiring some character in front of the capital:
.([A-Z])
The leading dot means that a capital at the very start of the string can never match. Because the dot is outside the parentheses, it is part of the match but not of a capture group, so a replacement that only uses the group would drop that character.
Capture it as well and reference both groups in the replacement, like Bibhu noted:
System.Text.RegularExpressions.Regex.Replace(s, "(.)([A-Z])", "$1_$2", System.Text.RegularExpressions.RegexOptions.Compiled);

Regex class and test in C# .Net

I've copied a RegEx that's working in JavaScript, but when I run it in C# it returns false. I'm not sure if it's my code itself that's incorrect or if it is the RegEx. This is my code.
bool isValid = true;
string nameInput = "Andreas Johansson";
string emailInput = "email@gmail.com";
string passwordInput = "abcABC123";
string namePattern = @"^[A-z]+(-[A-z]+)*$";
string emailPattern = @"^([\w-\.]+@([\w-]+\.)+[\w-]{2,4})?$";
string passwordPattern = @"^(?=.*\d+)(?=.*[a-zA-Z])[0-9a-zA-Z!@#$%]{6,50}$";
Regex nameRegEx = new Regex(namePattern);
Regex emailRegEx = new Regex(emailPattern);
Regex passwordRegEx = new Regex(passwordPattern);
if (model.CreateFullName.Length < 3 || !nameRegEx.IsMatch(nameInput))
isValid = false;
if (model.CreateEmail.Length < 3 || !emailRegEx.IsMatch(emailInput))
isValid = false;
if (model.CreatePassword.Length < 3 || !passwordRegEx.IsMatch(passwordInput))
isValid = false;
Thankful for inputs!
You should remove the boundary slashes from the pattern definitions. They are required for regex literals in JavaScript, not in .NET, e.g.:
string namePattern = @"^[A-z]+(-[A-z]+)*$";
string emailPattern = @"^([\w-\.]+@([\w-]+\.)+[\w-]{2,4})?$";
string passwordPattern = @"^(?=.*\d+)(?=.*[a-zA-Z])[0-9a-zA-Z!@#$%]{6,50}$";
UPDATE: you fixed them in your edit.
The name pattern still does not account for spaces in the input. Try following instead:
^[A-z]+([-\s][A-z]+)*$
Also note that [A-z] is not a correct pattern for matching alphabet letters. Use [A-Za-z] for matching ASCII alphabet letters, or \p{L} for matching any Unicode letter.
The problem with [A-z] is that it also matches the characters that sit between Z and a in the ASCII table:
[\]^_`
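The difference between the three character classes can be checked with a short sketch. The class and method names below are hypothetical and exist only for this illustration.

```csharp
using System.Text.RegularExpressions;

public static class LetterRanges
{
    // The range A-z also covers the six characters between 'Z' and 'a'.
    public static bool MatchesSloppyRange(string s)
    {
        return Regex.IsMatch(s, "^[A-z]+$");
    }

    // Only the 52 ASCII letters.
    public static bool MatchesAsciiLetters(string s)
    {
        return Regex.IsMatch(s, "^[A-Za-z]+$");
    }

    // Any Unicode letter.
    public static bool MatchesAnyLetters(string s)
    {
        return Regex.IsMatch(s, @"^\p{L}+$");
    }
}
```

MatchesSloppyRange accepts "_" and even the whole string "[\]^_`", while MatchesAsciiLetters rejects both. A name such as "Åsa" is rejected by MatchesAsciiLetters but accepted by MatchesAnyLetters.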

C# Regex match all occurrences

I'm trying to make a Regular Expression in C# that will match strings like"", but my Regex stops at the first match, and I'd like to match the whole string.
I've been trying with a lot of ways to do this, currently, my code looks like this:
string sPattern = @"/&#\d{2};/";
Regex rExp = new Regex(sPattern);
MatchCollection mcMatches = rExp.Matches(txtInput.Text);
foreach (Match m in mcMatches) {
if (!m.Success) {
//Give Warning
}
}
And also tried lblDebug.Text = Regex.IsMatch(txtInput.Text, "(&#[0-9]{2};)+").ToString(); but it also only finds the first match.
Any tips?
Edit:
The end result I'm seeking is that strings like &# are labeled as incorrect, as it is now, since only the first match is made, my code marks this as a correct string.
Second Edit:
I changed my code to this
string sPattern = @"&#\d{2};";
Regex rExp = new Regex(sPattern);
MatchCollection mcMatches = rExp.Matches(txtInput.Text);
int iMatchCount = 0;
foreach (Match m in mcMatches) {
if (m.Success) {
iMatchCount++;
}
}
int iTotalStrings = txtInput.Text.Length / 5;
int iVerify = txtInput.Text.Length % 5;
if (iTotalStrings == iMatchCount && iVerify == 0) {
lblDebug.Text = "True";
} else {
lblDebug.Text = "False";
}
And this works the way I expected, but I still think this can be achieved in a better way.
Third Edit:
As @devundef suggested, the expression "^(&#\d{2};)+$" does the job I was hoping for, so my final code looks like this:
string sPattern = @"^(&#\d{2};)+$";
Regex rExp = new Regex(sPattern);
lblDebug.Text = rExp.IsMatch(txtInput.Text).ToString();
I always neglect the start and end of string characters (^ / $).
Remove the / at the start and end of the expression.
string sPattern = @"&#\d{2};";
EDIT
I tested the pattern and it works as expected. Not sure what you want.
Two options:
&#\d{2}; => will give N matches in the string. On the string  it will match 2 groups,  and 
(&#\d{2};)+ => will match the whole string as one single group. On the string  it will match 1 group, 
Edit 2:
What you want is not get the groups but know if the string is in the right format. This is the pattern:
Regex rExp = new Regex(@"^(&#\d{2};)+$");
var isValid = rExp.IsMatch("") // isValid = true
var isValid = rExp.IsMatch("xyz") // isValid = false
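The difference between finding occurrences and validating the whole string can be shown side by side. This is a rough sketch: EntityCheck and its methods are hypothetical names, and the sample inputs mentioned afterwards are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class EntityCheck
{
    // Counts every occurrence of the pattern anywhere in the text,
    // regardless of what surrounds the occurrences.
    public static int CountEntities(string text)
    {
        return Regex.Matches(text, @"&#\d{2};").Count;
    }

    // True only if the text consists of one or more entities and nothing else.
    public static bool IsOnlyEntities(string text)
    {
        return Regex.IsMatch(text, @"^(&#\d{2};)+$");
    }
}
```

For an input such as "&#12;&#3" the count is 1, because the first entity is well formed, but IsOnlyEntities returns false because the remainder of the string does not fit the pattern. For "&#12;&#34;" the count is 2 and IsOnlyEntities returns true.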
Here you go: (&#\d{2};)+ This should work for one or more occurrences.
(&#\d{2};)*
Recommend: http://www.weitz.de/regex-coach/

What code would I use to convert a SQL like expression to a regex on the fly?

I'm looking to convert a SQL like statement on the fly to the equivalent regex i.e.
LIKE '%this%'
LIKE 'Sm_th'
LIKE '[C-P]arsen'
What's the best approach to doing this?
P.S. I'm looking to do this on the .Net Framework (C#).
The following Regex converts an SQL like pattern into a Regex pattern with the help of a MatchEvaluator delegate. It correctly handles square bracket blocks and escapes special Regex characters.
string regexPattern = "^" + Regex.Replace(
likePattern,
@"[%_]|\[[^]]*\]|[^%_[]+",
match =>
{
if (match.Value == "%")
{
return ".*";
}
if (match.Value == "_")
{
return ".";
}
if (match.Value.StartsWith("[") && match.Value.EndsWith("]"))
{
return match.Value;
}
return Regex.Escape(match.Value);
}) + "$";
In addition to @Nathan-Baulch's solution, you can use the code below to also handle the case where a custom escape character has been defined using the LIKE '!%' ESCAPE '!' syntax.
public Regex ConvertSqlLikeToDotNetRegex(string regex, char? likeEscape = null)
{
var pattern = string.Format(@"
{0}[%_]|
[%_]|
\[[^]]*\]|
[^%_[{0}]+
", likeEscape);
var regexPattern = Regex.Replace(
regex,
pattern,
// Pass the escape character along; the evaluator cannot see this method's parameters
match => ConvertWildcardsAndEscapedCharacters(match, likeEscape),
RegexOptions.IgnorePatternWhitespace);
regexPattern = "^" + regexPattern + "$";
return new Regex(regexPattern,
!m_CaseSensitive ? RegexOptions.IgnoreCase : RegexOptions.None);
}
private string ConvertWildcardsAndEscapedCharacters(Match match, char? likeEscape)
{
// Wildcards
switch (match.Value)
{
case "%":
return ".*";
case "_":
return ".";
}
// Remove SQL defined escape characters from C# regex
if (StartsWithEscapeCharacter(match.Value, likeEscape))
{
return match.Value.Remove(0, 1);
}
// Pass anything contained in []s straight through
// (These have the same behaviour in SQL LIKE Regex and C# Regex)
if (StartsAndEndsWithSquareBrackets(match.Value))
{
return match.Value;
}
return Regex.Escape(match.Value);
}
private static bool StartsAndEndsWithSquareBrackets(string text)
{
return text.StartsWith("[", StringComparison.Ordinal) &&
text.EndsWith("]", StringComparison.Ordinal);
}
private bool StartsWithEscapeCharacter(string text, char? likeEscape)
{
return (likeEscape != null) &&
text.StartsWith(likeEscape.ToString(), StringComparison.Ordinal);
}
From your example above, I would attack it like this (I speak in general terms because I do not know C#):
Break it apart by LIKE '...', put the ... pieces into an array.
Replace unescaped % signs by .*, underscores by ., and in this case the [C-P]arsen translates directly into regex.
Join the array pieces back together with a pipe, and wrap the result in parentheses, and standard regex bits.
The result would be:
/^(.*this.*|Sm.th|[C-P]arsen)$/
The most important thing here is to be wary of all the ways you can escape data, and which wildcards translate to which regular expressions.
% becomes .*
_ becomes .
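The translation rules listed above can be sketched as a single pass over the LIKE pattern. This is an illustrative sketch rather than a complete implementation: the names LikeTranslator, ToRegexPattern and IsLike are made up, ESCAPE clauses are not handled, and matching is case-sensitive, whereas real SQL LIKE behaviour depends on the collation.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class LikeTranslator
{
    // Hypothetical helper: translates a LIKE pattern into an anchored regex pattern.
    public static string ToRegexPattern(string like)
    {
        var sb = new StringBuilder("^");
        int i = 0;
        while (i < like.Length)
        {
            char c = like[i];
            if (c == '%')
            {
                sb.Append(".*");   // % matches any run of characters
                i++;
            }
            else if (c == '_')
            {
                sb.Append('.');    // _ matches exactly one character
                i++;
            }
            else if (c == '[' && like.IndexOf(']', i + 1) >= 0)
            {
                // A bracket block such as [C-P] is copied through unchanged.
                int close = like.IndexOf(']', i + 1);
                sb.Append(like, i, close - i + 1);
                i = close + 1;
            }
            else
            {
                // Everything else is literal text and must be escaped.
                sb.Append(Regex.Escape(c.ToString()));
                i++;
            }
        }
        return sb.Append('$').ToString();
    }

    public static bool IsLike(string input, string like)
    {
        // Singleline lets '.' match newlines as well.
        return Regex.IsMatch(input, ToRegexPattern(like), RegexOptions.Singleline);
    }
}
```

For the three patterns from the question this produces "^.*this.*$", "^Sm.th$" and "^[C-P]arsen$". A literal dot in the LIKE pattern is escaped, so IsLike("abc", "a.c") is false.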
I found a Perl module called Regexp::Wildcards. You can try to port it or try Perl.NET. I have a feeling you can write something up yourself too.
