I'm trying to write a regular expression in C# that will match strings like "", but my regex stops at the first match, and I'd like to match the whole string.
I've tried many ways to do this. Currently, my code looks like this:
string sPattern = @"/&#\d{2};/";
Regex rExp = new Regex(sPattern);
MatchCollection mcMatches = rExp.Matches(txtInput.Text);
foreach (Match m in mcMatches) {
if (!m.Success) {
//Give Warning
}
}
I also tried lblDebug.Text = Regex.IsMatch(txtInput.Text, "(&#[0-9]{2};)+").ToString(); but it, too, only finds the first match.
Any tips?
Edit:
The end result I'm seeking is that strings like &# are labeled as incorrect. As it is now, since only the first match is found, my code marks this as a correct string.
Second Edit:
I changed my code to this
string sPattern = @"&#\d{2};";
Regex rExp = new Regex(sPattern);
MatchCollection mcMatches = rExp.Matches(txtInput.Text);
int iMatchCount = 0;
foreach (Match m in mcMatches) {
if (m.Success) {
iMatchCount++;
}
}
int iTotalStrings = txtInput.Text.Length / 5;
int iVerify = txtInput.Text.Length % 5;
if (iTotalStrings == iMatchCount && iVerify == 0) {
lblDebug.Text = "True";
} else {
lblDebug.Text = "False";
}
And this works the way I expected, but I still think this can be achieved in a better way.
Third Edit:
As @devundef suggested, the expression "^(&#\d{2};)+$" does what I was hoping for, so my final code looks like this:
string sPattern = @"^(&#\d{2};)+$";
Regex rExp = new Regex(sPattern);
lblDebug.Text = rExp.IsMatch(txtInput.Text).ToString();
I always neglect the start and end of string characters (^ / $).
Remove the / at the start and end of the expression.
string sPattern = @"&#\d{2};";
EDIT
I tested the pattern and it works as expected. Not sure what you want.
Two options:
&#\d{2}; => will give N matches in the string. On the string it will match 2 groups, and
(&#\d{2};)+ => will match the whole string as one single group. On the string it will match 1 group,
Edit 2:
What you want is not to get the groups but to know whether the string is in the right format. This is the pattern:
Regex rExp = new Regex(@"^(&#\d{2};)+$");
var isValid = rExp.IsMatch("") // isValid = true
var isValid = rExp.IsMatch("xyz") // isValid = false
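To make the difference between the anchored and unanchored patterns concrete, here is a minimal sketch. The class and method names (EntityCheck, IsValid, CountTokens) and the sample inputs are assumptions made up for illustration; only the two patterns come from the answers above.

```csharp
using System.Text.RegularExpressions;

public static class EntityCheck
{
    // Anchored: the whole input must consist of one or more "&#NN;" tokens.
    private static readonly Regex WholeString = new Regex(@"^(&#\d{2};)+$");

    // Unanchored: finds each "&#NN;" token wherever it occurs in the input.
    private static readonly Regex SingleToken = new Regex(@"&#\d{2};");

    // True only when nothing but well-formed tokens is present.
    public static bool IsValid(string input)
    {
        return WholeString.IsMatch(input);
    }

    // Number of well-formed tokens found, regardless of what surrounds them.
    public static int CountTokens(string input)
    {
        return SingleToken.Matches(input).Count;
    }
}
```

With a hypothetical input such as "&#12;&#3", CountTokens still reports one token, which is why the unanchored check looked like a success, while IsValid returns false because the trailing "&#3" is not part of any token.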
Here you go: (&#\d{2};)+ This should work for one or more occurrences.
(&#\d{2};)*
Recommend: http://www.weitz.de/regex-coach/
Related
I need to make sure a field has the proper syntax using Regex in C#, before proceeding. Here is my code:
string Description = "AB1234567,AB3456789;AB2345678";
Regex reg = new Regex("(AB.{7},?)*;?(AB.{7},?)*");
Match match = reg.Match(Description);
if (!match.Success)
{
//code to raise error
}
So, some syntax rules:
The field has elements of 2 letters (in this case AB) followed by 7 characters.
These elements are comma separated, on either the left or the right side of a ";". Which side they are on indicates their properties, but either side can be empty.
If the right side is not empty then ";" is mandatory, if empty it is optional.
The last element of each side cannot end with a ",".
Correct examples:
- AB1234567,AB3456789;AB2345678
- AB1234567,AB3456789;
- AB1234567
- ;AB2345678,AB34567890
Wrong examples:
- AB1234567,;AB2345678
- AB3456789;AB2345678,
My regular expression is not complete, but I cannot come up with how to consider all cases. What is the correct regular expression for this problem?
A straightforward and much simpler version that should work for all cases:
bool IsValid(string line)
{
if (string.IsNullOrEmpty(line)) return true;
return !line.Trim().EndsWith(",");
}
IEnumerable<string> GetTokens(string line)
{
var pattern = @"(AB\d{7}([,;]|[^0-9a-zA-Z]|$))";
var matches = Regex.Matches(line, pattern, RegexOptions.Singleline);
foreach (Match match in matches)
{
yield return match.Value;
}
}
string inputLine = ";AB2345678,AB34567890";
string[] leftRight = inputLine.Split(new[] { ';' });
string left = string.Empty, right = string.Empty;
if (leftRight.Length > 0) left = leftRight[0];
if (leftRight.Length > 1) right = leftRight[1];
bool isLeftValid = IsValid(left);
bool isRightValid = IsValid(right);
IEnumerable<string> leftTokens = null, rightTokens = null;
if (isLeftValid) leftTokens = GetTokens(left);
if (isRightValid) rightTokens = GetTokens(right);
I think your expression is almost correct, you just need to ensure that a comma is followed by another AB group. You can do that with a positive lookahead, like this:
^(AB.{7}(,(?=AB))?)*;?(AB.{7}(,(?=AB))?)*$
You also need to put in the start and end markers otherwise you will get multiple submatches.
This expression will not match the ;AB2345678,AB34567890 sample because its last group has 8 digits instead of 7.
Edit: If you want the AB groups in a nice collection, try
^((?<left>AB.{7})(,(?=AB))?)*;?((?<right>AB.{7})(,(?=AB))?)*$
Then match.Groups["left"]?.Captures and match.Groups["right"]?.Captures will give you the respective matched strings (or null). This is called a named capture.
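As a rough sketch of how the named captures could be collected, the helper below wraps the pattern from this answer. The class name FieldParser and the TryParse signature are hypothetical, chosen only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FieldParser
{
    // Pattern taken from the answer above; a comma must be followed by another AB element.
    private static readonly Regex Pattern = new Regex(
        @"^((?<left>AB.{7})(,(?=AB))?)*;?((?<right>AB.{7})(,(?=AB))?)*$");

    // Returns false for a malformed field; otherwise fills both lists
    // with the elements found on each side of the ';'.
    public static bool TryParse(string field, out List<string> left, out List<string> right)
    {
        left = new List<string>();
        right = new List<string>();

        Match match = Pattern.Match(field);
        if (!match.Success)
        {
            return false;
        }

        // A group that repeats keeps every value it matched in its Captures collection.
        left = match.Groups["left"].Captures.Cast<Capture>().Select(c => c.Value).ToList();
        right = match.Groups["right"].Captures.Cast<Capture>().Select(c => c.Value).ToList();
        return true;
    }
}
```

For "AB1234567,AB3456789;AB2345678" this yields two elements on the left and one on the right, and it rejects "AB1234567,;AB2345678". Note that the sketch inherits whatever leniency the pattern itself has, for example .{7} accepts any seven characters, not only digits.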
I am trying to use regex to help to convert the following string into a Dictionary:
{TheKey|TheValue}{AnotherKey|AnotherValue}
Like such:
["TheKey"] = "TheValue"
["AnotherKey"] = "AnotherValue"
To parse the string for the dictionary, I am using the regex expression:
^(\{(.+?)\|(.+?)\})*?$
But it will only capture the last group of {AnotherKey|AnotherValue}.
How do I get it to capture all of the groups?
I am using C#.
Alternatively, is there a more straightforward way to approach this rather than using Regex?
Code (Properties["PromptedValues"] contains the string to be parsed):
var regex = Regex.Matches(Properties["PromptedValues"], @"^(\{(.+?)\|(.+?)\})*?$");
foreach(Match match in regex) {
if(match.Groups.Count == 4) {
var key = match.Groups[2].Value.ToLower();
var value = match.Groups[3].Value;
values.Add(key, new StringPromptedFieldHandler(key, value));
}
}
This is coded to work for the single value, I would be looking to update it once I can get it to capture multiple values.
The $ says that: The match must occur at the end of the string or before \n at the end of the line or string.
The ^ says that: The match must start at the beginning of the string or line.
Read this for more regex syntax: msdn RegEx
Once you remove the ^ and $, your regex will match all of the sets. You should read up on Match.Groups; you will end up with something like the following:
using System;
using System.Text.RegularExpressions;

public class Example
{
public static void Main()
{
string pattern = @"\{(.+?)\|(.+?)\}";
string input = "{TheKey|TheValue}{AnotherKey|AnotherValue}";
MatchCollection matches = Regex.Matches(input, pattern);
foreach (Match match in matches)
{
Console.WriteLine("The Key: {0}", match.Groups[1].Value);
Console.WriteLine("The Value: {0}", match.Groups[2].Value);
Console.WriteLine();
}
Console.WriteLine();
}
}
Your regex tries to match against the entire line. You can get individual pairs if you don't use anchors:
var input = "{TheKey|TheValue}{AnotherKey|AnotherValue}";
var matches = Regex.Matches(input, @"(\{(.+?)\|(.+?)\})");
Debug.Assert(matches.Count == 2);
It's better to name the fields though:
var matches = Regex.Matches(input, @"\{(?<key>.+?)\|(?<value>.+?)\}");
This allows you to access the fields by name, and even use LINQ:
var pairs= from match in matches.Cast<Match>()
select new {
key=match.Groups["key"].Value,
value=match.Groups["value"].Value
};
Alternatively, you can use the Captures property of your groups to get all of the times they matched.
if (regex.Success)
{
for (var i = 0; i < regex.Groups[1].Captures.Count; i++)
{
var key = regex.Groups[2].Captures[i].Value.ToLower();
var value = regex.Groups[3].Captures[i].Value;
}
}
This has the advantage of still checking that your entire string was made up of matches. Solutions suggesting you remove the anchors will find things that look like matches in a longer string, but will not fail for you if anything was malformed.
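A minimal sketch of the Captures approach, filling a dictionary while keeping the anchors, might look like this. The names PairParser and TryParse are hypothetical, and the pattern is tightened on the assumption that keys and values never contain '{', '}' or '|'; the original question used .+? instead.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PairParser
{
    // Anchored so that any stray text makes the whole match fail.
    // Assumption: keys and values never contain '{', '}' or '|'.
    private static readonly Regex Pattern =
        new Regex(@"^(?:\{(?<key>[^{}|]+)\|(?<value>[^{}|]+)\})+$");

    // Returns false when the input is not made up entirely of {key|value} pairs.
    public static bool TryParse(string input, out Dictionary<string, string> pairs)
    {
        pairs = new Dictionary<string, string>();

        Match match = Pattern.Match(input);
        if (!match.Success)
        {
            return false;
        }

        // Each repetition of the outer group adds one capture to both named groups,
        // so the two collections always have the same length.
        CaptureCollection keys = match.Groups["key"].Captures;
        CaptureCollection values = match.Groups["value"].Captures;
        for (int i = 0; i < keys.Count; i++)
        {
            pairs[keys[i].Value] = values[i].Value;
        }
        return true;
    }
}
```

Given "{TheKey|TheValue}{AnotherKey|AnotherValue}" the dictionary ends up with two entries, while "{TheKey|TheValue}garbage" is rejected as a whole instead of being partially parsed.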
I want to find out, whether my string contains a text like #1, #a, #abc, #123, #abc123dsds and so on... ('#' character with one or more characters (digits and letters).
My code so far won't work:
string test = "#123";
bool matches = test.Contains("#.+");
The matches variable is false.
String.Contains does not accept a regex.
Use Regex.IsMatch:
var matches = Regex.IsMatch(test, "#.+");
test.Contains("#.+"); does not "understand" regular expressions. It checks whether the string test literally contains the character sequence #.+, which #123 does not.
Use Regex.IsMatch instead:
bool matches = Regex.IsMatch(test, "#.+");
Or, without regex, you can use a combination of the StartsWith, Enumerable.All and char.IsLetterOrDigit methods, like this:
var s = "#abc123dsds+";
var matches = s.Length > 1 && s.StartsWith("#") && s.Substring(1).All(char.IsLetterOrDigit);
You need to use Regex in order to use a regex pattern.
string text = "#123";
Regex rgx = new Regex("#[a-zA-Z0-9]+");
var match = rgx.Match(text);
bool matches = match.Success;
Well, this worked for me. \# matches a literal #, and \w+ matches one or more word characters (letters, digits or underscore) after it.
class Program
{
static void Main(string[] args)
{
string text = "#2";
string pat = @"\#(\w+)";
Regex r = new Regex(pat);
Match m = r.Match(text);
Console.WriteLine(m.Success);
Console.ReadKey();
}
}
I need to compare two strings, one of which uses '*' as a wildcard. I was thinking of using either an iterative or recursive method when I realized that RegEx would perform the task more quickly. Unfortunately, I am new to RegEx, and am not sure how to do this.
If I sent in the pattern "He**o", then "Hello" and "He7(o" should return true, but "Hhllo" should return false.
Assuming that you mean * to be a single-character wildcard, the correct substitution in a Regex pattern is a dot (.):
string pattern = "He**o";
string regexPattern = pattern.Replace("*",".");
Regex.IsMatch("Hello",regexPattern); // true
Regex.IsMatch("He7)o",regexPattern); // true
Regex.IsMatch("he7)o",regexPattern); // false
Regex.IsMatch("he7)o",regexPattern, RegexOptions.IgnoreCase); // true
You might also want to anchor the pattern with ^ (start of string) and $ (end of string):
regexPattern = String.Format("^{0}$", pattern.Replace("*","."));
If you expect it to handle patterns that contain regex special characters, you can escape all other characters like this:
string regexPattern = String.Join(".",pattern.Split("*".ToCharArray())
.Select(s => Regex.Escape(s)).ToArray());
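Putting the three steps together (replace * with a dot, escape everything else, anchor both ends) could look like the sketch below. The class name Wildcard and its method names are hypothetical; the technique is the one described above.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class Wildcard
{
    // Converts a pattern in which '*' stands for exactly one character
    // into an anchored regex, escaping every other character.
    public static string ToRegex(string pattern)
    {
        var escapedParts = pattern.Split('*').Select(part => Regex.Escape(part));
        return "^" + string.Join(".", escapedParts) + "$";
    }

    // True when the candidate has the same shape as the wildcard pattern.
    public static bool IsMatch(string pattern, string candidate)
    {
        return Regex.IsMatch(candidate, ToRegex(pattern));
    }
}
```

Here Wildcard.ToRegex("He**o") produces "^He..o$", so "Hello" and "He7(o" match while "Hhllo" does not. Because of the escaping, a literal dot in the pattern (as in "a.c*") only matches a dot. One caveat: by default the regex dot does not match a newline character.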
Compare the strings by using the char index in a for loop. If the pattern char (wildcard) appears, ignore the comparison and move on to the next comparison.
private bool Compare(string pattern, string compare)
{
if (pattern.Length != compare.Length)
//strings don't match
return false;
for (int x = 0; x < pattern.Length; x++)
{
if (pattern[x] != '*')
{
if (pattern[x] != compare[x])
return false;
}
}
return true;
}
Build the regex as "He..o".
This is a case that will not be recognized:
Regex r = new Regex("He..o");
string test = "Hhleo";
bool success = r.Match(test).Success;
This is a case that will be recognized:
Regex r = new Regex("He..o");
string test = "He7)o";
bool success = r.Match(test).Success;
That's exactly what I've done in php today. When you add this:
if (pattern[x] != '*' && compare[x] != '*')
Then both strings can have wildcards. (hope that && means logical AND like in php)
I am trying to find a regex to match any word enclosed in parentheses in a sentence.
Suppose, I have a sentence.
"Welcome, (Hello, All of you) to the Stack Over flow."
If my matching word is Hello,, All, of, or you, it should return true.
A word can contain anything (numbers, symbols) but is separated from other words by whitespace.
I tried \(([^)]*)\), but this returns all of the words enclosed by the parentheses at once.
static void Main(string[] args)
{
string ss = "Welcome, (Hello, All of you) to the Stack Over flow.";
Regex _regex = new Regex(@"\(([^)]*)\)");
Match match = _regex.Match(ss.ToLower());
if (match.Success)
{
ss = match.Groups[0].Value;
}
}
Help and Guidance is very much appreciated.
Thanks.
Thanks, everyone, for your time and answers. I finally solved it by changing my code as suggested in Tim's reply.
For people with a similar problem, here is my final code:
static void Main(string[] args)
{
string ss = "Welcome, (Hello, All of you) to the Stack Over flow.";
Regex _regex = new Regex(@"[^\s()]+(?=[^()]*\))");
Match match = _regex.Match(ss.ToLower());
while (match.Success)
{
ss = match.Groups[0].Value;
Console.WriteLine(ss);
match = match.NextMatch();
}
}
OK, so it seems that a "word" is anything that's not whitespace and doesn't contain parentheses, and that you want to match a word if the next parenthesis character that follows is a closing parenthesis.
So you can use
[^\s()]+(?=[^()]*\))
Explanation:
[^\s()]+ matches a "word" (should be easy to understand), and
(?=[^()]*\)) makes sure that a closing parenthesis follows:
(?= # Look ahead to make sure the following regex matches here:
[^()]* # Any number of characters except parentheses
\) # followed by a closing parenthesis.
) # (End of lookahead assertion)
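To see the lookahead at work, here is a small sketch that collects every matching word and checks whether a given word is among them. The class ParenWords and its methods are hypothetical names, and the case-insensitive comparison is an assumption based on the ToLower() call in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ParenWords
{
    // A run of non-whitespace, non-parenthesis characters whose next
    // parenthesis character is a closing one.
    private static readonly Regex Word = new Regex(@"[^\s()]+(?=[^()]*\))");

    // All words that appear inside parentheses, in order of appearance.
    public static List<string> Extract(string sentence)
    {
        return Word.Matches(sentence).Cast<Match>().Select(m => m.Value).ToList();
    }

    // True when the given word is one of the parenthesized words (ignoring case).
    public static bool Contains(string sentence, string word)
    {
        return Extract(sentence).Contains(word, StringComparer.OrdinalIgnoreCase);
    }
}
```

For "Welcome, (Hello, All of you) to the Stack Over flow." Extract returns the four words "Hello,", "All", "of" and "you"; "Welcome," is skipped because the next parenthesis after it is an opening one.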
I've developed a c# function for you, if you are interested.
public static class WordsHelper
{
public static List<string> GetWordsInsideParenthesis(string s)
{
List<int> StartIndices = new List<int>();
var rtn = new List<string>();
var numOfOpen = s.Where(m => m == '(').ToList().Count;
var numOfClose = s.Where(m => m == ')').ToList().Count;
if (numOfClose == numOfOpen)
{
for (int i = 0; i < numOfOpen; i++)
{
int ss = 0, sss = 0;
if (StartIndices.Count == 0)
{
ss = s.IndexOf('(') + 1; StartIndices.Add(ss);
sss = s.IndexOf(')');
}
else
{
ss = s.IndexOf('(', StartIndices.Last()) + 1; StartIndices.Add(ss); // remember this start so the next pass moves on
sss = s.IndexOf(')', ss);
}
var words = s.Substring(ss, sss - ss).Split(' ');
foreach (string ssss in words)
{
rtn.Add(ssss);
}
}
}
return rtn;
}
}
Just call it this way:
var text = "Welcome, (Hello, All of you) to the (Stack Over flow).";
var words = WordsHelper.GetWordsInsideParenthesis(text);
Now you'll have a list of words in words variable.
In general, I would prefer plain C# code over a regex here, because I find it more readable and it often performs better.
But if you want to stick with regex, keep the pattern from Tim Pietzcker, [^\s()]+(?=[^()]*\)), and use it this way:
var text="Welcome, (Hello, All of you) to the (Stack Over flow).";
var values = Regex.Matches(text, @"[^\s()]+(?=[^()]*\))");
Now values contains a MatchCollection.
You can access each value using an index and the Value property.
Something like this:
string word=values[0].Value;
(?<=[(])[^)]+(?=[)])
Matches the text between parentheses (as one match per pair of parentheses, not word by word)
(?<=[(]) Checks for (
[^)]+ Matches everything up to but not including a )
(?=[)]) Checks for )