C# regular expression example

The following are examples of correct input strings:
1,a
2,a,b
3,a,b,c
4,a,b,c,b
and so on...
The first number indicates how many letters follow in the string, and the rest of the letters can be either a,b or c in any order.
Can a regular expression be used to correctly match and capture the first number and each of the letters into groups (and exclude the commas) using Regex.Match?

You don't need regular expressions to do this; you can just use Split and LINQ.
string[] split = input.Split(',');
string number = split.First();
string[] letters = split.Skip(1).ToArray();

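You can use Split and LINQ methods. First you need to validate the input string:
var parts = input.Split(',');
var items = parts.Skip(1).ToArray();
// Every item must be a single digit, and their number must equal the leading count
bool isMatch = int.TryParse(parts[0], out int expected)
    && items.Length == expected
    && items.All(x => x.Length == 1 && char.IsDigit(x[0]));
if (isMatch)
{
    var digits = parts.Select(int.Parse);
}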
More generally you can write an extension method for that:
public static bool IsMatch(this string source)
{
    if (source == null) throw new ArgumentNullException("source");
    var parts = source.Split(',');
    if (parts.Any())
    {
        return parts.All(x => x.Length > 0 && x.All(char.IsDigit)) && parts.Skip(1).Count() == int.Parse(parts[0]);
    }
    return false;
}
This method, which expects numbers rather than letters after the count, will match the following strings:
1,4
3,456,123,789
5,12,34,11,78,65
And won't match the following strings:
1,2,3
4,2,1
2,1
1,a
2,1,b
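To answer the original question directly: a single pattern can capture the number and each letter, because .NET keeps every value matched by a repeated group in Group.Captures. The pattern on its own cannot verify that the number equals the count of letters, so that check is done in code. The following is only a minimal sketch, assuming the letters are limited to a, b and c as described in the question; the class and method names (CountedLetters, TryParse) are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CountedLetters
{
    // Group 1: the leading number. Group 2: one letter per repetition.
    // The commas sit outside the capturing group, so they are not captured.
    private static readonly Regex Pattern = new Regex(@"^(\d+)(?:,([abc]))+$");

    // Returns true and fills 'letters' when the input is well formed
    // and the leading number equals the number of letters.
    public static bool TryParse(string input, out int count, out string[] letters)
    {
        count = 0;
        letters = new string[0];
        Match m = Pattern.Match(input);
        if (!m.Success || !int.TryParse(m.Groups[1].Value, out count))
            return false;

        // Captures holds every value the repeated group matched, in order.
        string[] captured = m.Groups[2].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
        if (captured.Length != count)
            return false;

        letters = captured;
        return true;
    }

    public static void Main()
    {
        foreach (var s in new[] { "1,a", "4,a,b,c,b", "2,a", "3,a,b,d" })
        {
            bool ok = TryParse(s, out int n, out string[] letters);
            Console.WriteLine("{0} => {1} {2}", s, ok, string.Join("", letters));
        }
    }
}
```

Only the first two inputs in Main are accepted; "2,a" fails the count check and "3,a,b,d" fails the pattern.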

Related

Trying to filter only digits in string array using LINQ

I'm trying to keep only the digits from a string array. This works if I have an array like this:
12324 asddd 123 123
but if a string contains both characters and digits, e.g. asd1234, it is not picked up.
Can you help me with how to do this?
int[] result = input
.Where(x => x.All(char.IsDigit))// tried with .Any(), .TakeWhile() and .SkipWhile()
.Select(int.Parse)
.Where(x => x % 2 == 0)
.ToArray();
Something like this should work. The function digitString will select only digits from the input string, and recombine into a new string. The rest is simple, just predicates selecting non-empty strings and even numbers.
var values = new[]
{
"helloworld",
"hello2",
"4",
"hello123world123"
};
bool isEven(int i) => i % 2 == 0;
bool notEmpty(string s) => s.Length > 0;
string digitString(string s) => new string(s.Where(char.IsDigit).ToArray());
var valuesFiltered = values
.Select(digitString)
.Where(notEmpty)
.Select(int.Parse)
.Where(isEven)
.ToArray();
You need to do it in 2 steps: First filter out all the invalid strings, then filter out all the non-digits in the valid strings.
A helper method would be very readable here, but it is also possible with pure LINQ:
var input = new[]{ "123d", "12e", "pp", "33z3"};
input
.Where(x => x.Any(char.IsDigit))
.Select(str => string.Concat(str.Where(char.IsDigit)));
Null values should be dropped first to avoid a NullReferenceException.
string.Join() is suitable for concatenating the filtered digits.
Additionally, empty strings should be dropped because they cannot be converted to an integer.
string[] input = new string[] { "1234", "asd124", "2345", "2346", null, "", "asdfas", "2" };
int[] result = input
.Where(s => s != null)
.Select(s => string.Join("", s.Where(char.IsDigit)))
.Where(s => s != string.Empty)
.Select(int.Parse)
.Where(x => x % 2 == 0)
.ToArray();
Using the LINQ Aggregate method together with TryParse() also gives the desired result:
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
var input = new string[] { "aaa123aaa124", "aa", "778", "a777", null };
var result = input.Aggregate(
    new List<int>(),
    (x, y) =>
    {
        if (y is null)
            return x;
        var digitOnlyString = Regex.Replace(y, "[^0-9]", string.Empty);
        if (int.TryParse(digitOnlyString, out var temp) && temp % 2 == 0)
            x.Add(temp);
        return x;
    })
    .ToArray();
Max,
You can do this in a single expression like so:
using System;
using System.Linq;
using System.Text.RegularExpressions;
var input = new[] { "aaa123aaa124", "aa", "778", "a777", null };
var rx = new Regex(@"[0-9]+");
var numbersOnly = input.Where(s => !string.IsNullOrEmpty(s) && rx.IsMatch(s))
.Select(s => string.Join("", rx.Matches(s).Cast<Match>().Select(m => m.Value)));
foreach (var number in numbersOnly) Console.WriteLine(number);
Which returns:
123124
778
777
if I have chars and digits in one string e.g. asd1234, it does not take it
Apparently you want to parse this line also. You want to translate "asd1234" to "1234" and then parse it.
But what if your input sequence of strings contains a string with two numbers: "123asd456". Do you want to interpret this as "123", or maybe as "123456", or maybe you consider this as two numbers "123" and "456".
Let's assume you don't have this problem: every string contains at most one number, or if a string contains more than one number, you only want the first one.
In fact, you only want to keep those strings that are "zero or more non-digits, followed by one or more digits, followed by zero or more characters".
Enter Regular Expressions!
const string regexTxt = @"\D*(\d+).*";
Regex regex = new Regex(regexTxt);
\D: any non-digit
*: zero or more
\d: any digit
+: one or more
.: any character
(...): capture the part between the parentheses
So this regular expression matches any string that starts with zero or more non-digits, followed by at least one digit, followed by zero or more characters. It captures the "at least one digit" part.
If you try to Match() an input string with this regular expression, you get a Match object. Its property Success tells you whether the input string matches the regular expression.
The Match object has a property Groups which contains all captured groups. Groups[0] is the complete match, and Groups[1] is the Group whose Value property holds the first captured string.
A simple program that shows how to use the regular expression:
const string regexTxt = @"\D*(\d+).*";
Regex regex = new Regex(regexTxt);
var lines = new string[]
{
String.Empty,
"A",
"A lot of text, no numbers",
"1",
"123456",
"Some text and then a number 123",
"Several characters, then a number: 123 followed by another number 456!",
"___123---456...",
};
foreach (var line in lines)
{
    Match match = regex.Match(line);
    if (match.Success)
    {
        string capturedDigits = match.Groups[1].Value;
        int capturedNumber = Int32.Parse(capturedDigits);
        Console.WriteLine("{0} => {1}", line, capturedNumber);
    }
}
Or in a LINQ statement:
const string regexTxt = @"\D*(\d+).*";
Regex regex = new Regex(regexTxt);
IEnumerable<string> sourceLines = ...
var numbers = sourceLines
    .Select(line => regex.Match(line)) // Match the Regex
    .Where(match => match.Success) // Keep only the successful matches
    .Select(match => Int32.Parse(match.Groups[1].Value));
// Finally parse the captured text to int
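The three readings of a string such as "123asd456" mentioned above (first number only, all digits glued together, or two separate numbers) can be compared side by side. This is just an illustrative sketch: the class and method names are hypothetical, and it deliberately uses [0-9] instead of \d so that only ASCII digits are picked up.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DigitInterpretations
{
    // One or more adjacent ASCII digits.
    private static readonly Regex Digits = new Regex(@"[0-9]+");

    // Interpretation 1: only the first run of digits, or null when there is none.
    public static string FirstNumber(string text)
    {
        Match m = Digits.Match(text);
        return m.Success ? m.Value : null;
    }

    // Interpretation 2: all digits of the string joined into one string.
    public static string AllDigits(string text)
    {
        return string.Concat(Digits.Matches(text).Cast<Match>().Select(m => m.Value));
    }

    // Interpretation 3: every run of digits parsed as its own number.
    public static int[] EachNumber(string text)
    {
        return Digits.Matches(text).Cast<Match>().Select(m => int.Parse(m.Value)).ToArray();
    }

    public static void Main()
    {
        const string text = "123asd456";
        Console.WriteLine(FirstNumber(text));                    // 123
        Console.WriteLine(AllDigits(text));                      // 123456
        Console.WriteLine(string.Join(", ", EachNumber(text)));  // 123, 456
    }
}
```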

C# String and Digits Regular Expression

How can I restrict the input so that users can only type the two letters M and C followed by a five-digit number (0-9)?
For example MC04326
Here is my code so far:
else if (!(new Regex(@"^(MC)(([][0-9])$")).IsMatch(txtStudentIDReg.Text))
{
}
No need for regex with such simple validation:
// str is inputted string
var isValid =
str.StartsWith("MC") && // starts with MC
str.Substring(2).All(ch => char.IsDigit(ch)) && // after second character, all are digits
str.Length == 7; // is of length 7
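If a regular expression is still preferred, the same rule can be written as one anchored pattern. This is a sketch rather than the asker's final code: the class and method names are hypothetical, and it assumes only ASCII digits and upper-case "MC" should be accepted.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StudentIdCheck
{
    // "MC" followed by exactly five ASCII digits, nothing before or after.
    // \z is used instead of $ so that a trailing newline is rejected as well.
    private static readonly Regex Pattern = new Regex(@"^MC[0-9]{5}\z");

    public static bool IsValid(string text)
    {
        return text != null && Pattern.IsMatch(text);
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("MC04326"));  // True
        Console.WriteLine(IsValid("MC0432"));   // False (too short)
        Console.WriteLine(IsValid("mc04326"));  // False (lower case)
    }
}
```

Note that char.IsDigit in the non-regex version also accepts digits from other scripts, whereas [0-9] does not.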

LINQ conditional selection and formatting

I have a string of characters and I'm trying to set up a query that'll substitute a specific sequence of similar characters into a character count. Here's an example of what I'm trying to do:
agd69dnbd555bdggjykbcx555555bbb
In this case, I'm trying to isolate and count ONLY the occurrences of the number 5, so my output should read:
agd69dnbd3bdggjykbcx6bbb
My current code is the following, where GroupAdjacentBy is a function that groups and counts the character occurrences as above.
var res = text
.GroupAdjacentBy((l, r) => l == r)
.Select(x => new { n = x.First(), c = x.Count()})
.ToArray();
The problem is that the above function groups and counts EVERY SINGLE character in my string, not just the one character I'm after. Is there a way to conditionally perform that operation on ONLY the character I need counted?
Regex is a better tool for this job than LINQ.
Have a look at this:
string input = "agd69dnbd555bdggjykbcx555555bbb";
string pattern = @"5+"; // Match 1 or more adjacent 5 characters
string output = Regex.Replace(input, pattern, match => match.Length.ToString());
// output = agd69dnbd3bdggjykbcx6bbb
Not sure if you're intending to replace every 5 character, or only runs of more than one adjacent 5.
If it's the latter, just change your pattern to:
string pattern = @"5{2,}"; // Match 2 or more adjacent 5's
The best answer was already given by Johnathan Barclay. But in case you need something similar using LINQ, here is an alternative solution:
var charToCombine = '5';
var res = text
.GroupAdjacentBy((l, r) => l == r )
.SelectMany(x => x.Count() > 1 && x.First() == charToCombine ? x.Count().ToString() : x)
.ToArray();
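GroupAdjacentBy is not part of standard LINQ, and the question does not show its implementation. The following is a hypothetical stand-in, written only so that the LINQ approach can be tried end to end; the asker's real helper may behave differently. CountRuns is likewise an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AdjacentGrouping
{
    // Hypothetical GroupAdjacentBy: starts a new group whenever the predicate
    // says the previous item and the current item do not belong together.
    public static IEnumerable<List<T>> GroupAdjacentBy<T>(
        this IEnumerable<T> source, Func<T, T, bool> sameGroup)
    {
        List<T> current = null;
        T previous = default(T);
        foreach (T item in source)
        {
            if (current != null && !sameGroup(previous, item))
            {
                yield return current;
                current = null;
            }
            if (current == null) current = new List<T>();
            current.Add(item);
            previous = item;
        }
        if (current != null) yield return current;
    }

    // Replaces each run of 'target' with the length of that run;
    // all other characters are passed through unchanged.
    public static string CountRuns(string text, char target)
    {
        var pieces = text
            .GroupAdjacentBy((l, r) => l == r)
            .Select(g => g[0] == target ? g.Count.ToString() : new string(g.ToArray()));
        return string.Concat(pieces);
    }

    public static void Main()
    {
        Console.WriteLine(CountRuns("agd69dnbd555bdggjykbcx555555bbb", '5'));
        // agd69dnbd3bdggjykbcx6bbb
    }
}
```

Unlike the snippet above it, this version also replaces a single 5 with "1", which matches the behaviour of the 5+ regex pattern.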

Check if a string contains only letters, digits and underscores

I have to check if a string contains only letters, digits and underscores.
This is how I tried but it doesn't work:
for(int i = 0; i<=snameA.Length-1; i++)
{
validA = validA && (char.IsLetterOrDigit(snameA[i])||snameA[i].Equals("_"));
}
I love Linq for this kind of question:
bool validA = sname.All(c => Char.IsLetterOrDigit(c) || c.Equals('_'));
You are assigning validA every time again, without checking its previous value. Now you always get the value of the last check executed.
You could 'and' the result:
validA &= (char.IsLetterOrDigit(snameA[i]) || snameA[i] == '_');
This would mean you still run all characters, which might be useless if the first check failed. So it is better to simply step out if it fails:
for(int i = 0; i<=snameA.Length-1; i++)
{
validA = (char.IsLetterOrDigit(snameA[i]) || snameA[i] == '_');
if (!validA)
{ break; } // <-- see here
}
Or with LINQ:
validA = snameA.All(c => char.IsLetterOrDigit(c) || c == '_');
You can use a regex:
Regex regex1 = new Regex(@"^[a-zA-Z0-9_]+$");
if(regex1.IsMatch(snameA))
{
}
I would use a Regex
string pattern = @"^[a-zA-Z0-9\_]+$";
Regex regex = new Regex(pattern);
// Compare a string against the regular expression
return regex.IsMatch(stringToTest);
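You could try matching a regular expression. There is a built-in character class for "letters, digits and underscores", which is "\w". Anchor the pattern with ^ and $ so that the whole string has to match; an unanchored "\w*" matches every string:
Regex rgx = new Regex(@"^\w*$");
rgx.IsMatch(yourString);
If you require 1 or more characters, then use "^\w+$".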
Further information here: Regex.IsMatch
First, "letter" is a somewhat vague term: do you mean only the characters a..z and A..Z, or can a letter belong to any alphabet, e.g. а..я and А..Я (Russian, Cyrillic letters)? According to your current implementation, you want the second option.
A typical solution with a loop is to check until the first counterexample:
Boolean validA = true; // true - no counterexamples so far
// Why for? foreach is much more readable here
foreach (Char ch in sname)
    // "!= '_'" is more readable than "Equals"
    if (!char.IsLetterOrDigit(ch) && ch != '_') {
        validA = false; // counterexample (a character that is neither letter, digit nor underscore)
        break; // <- do not forget this: there's no use checking the other characters
    }
However, you can simplify the code with either LINQ:
validA = sname.All(ch => Char.IsLetterOrDigit(ch) || ch == '_');
Or regular expression:
validA = Regex.IsMatch(sname, @"^\w*$");
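The difference between "any alphabet" and "a..z only" shows up as soon as the input contains non-Latin letters. The sketch below contrasts the two readings; the class and method names are hypothetical, and the sample strings are chosen only for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NameCheck
{
    // Any Unicode letter or digit, or an underscore.
    // An empty string passes, because All() is true for an empty sequence.
    public static bool IsValidUnicode(string s)
    {
        return s.All(c => char.IsLetterOrDigit(c) || c == '_');
    }

    // Only a-z, A-Z, 0-9 and underscore, and at least one character.
    public static bool IsValidAscii(string s)
    {
        return Regex.IsMatch(s, @"^[a-zA-Z0-9_]+\z");
    }

    public static void Main()
    {
        foreach (var s in new[] { "user_1", "имя_1", "user-1" })
        {
            Console.WriteLine("{0}: unicode={1}, ascii={2}", s, IsValidUnicode(s), IsValidAscii(s));
        }
    }
}
```

"user_1" passes both checks, "имя_1" passes only the Unicode check, and "user-1" fails both.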

Check if a string contains a certain set of characters? (duplicate characters might be needed)

I know there are plenty of ways to check if a string contains certain characters, but I'm trying to figure out a way of excluding duplicate letters.
So for instance, we have the string "doaurid" (random letters entered by the user)
And they type the word "dad" to see if it's valid
I can't figure out a simple solution to check if that string has 2 D's and one A.
The only way I've thought of is to use nested for loops, go through every element in a char array and mark used letters by converting them to 1 or something.
You can use:
var userInput = "doaurid";
var toCheck = "dad";
var check = toCheck.GroupBy(c=> c).ToDictionary(g => g.Key, g => g.Count());
var input = userInput.GroupBy(c=> c).ToDictionary(g => g.Key, g => g.Count());
bool validMatch = check.All(g => input.ContainsKey(g.Key) && input[g.Key] == g.Value);
This will only be valid if the userInput string contains all of the letters in toCheck, and the exact same number of letters.
If the input string can allow more duplicated letters (ie: if "dddoauriddd" should match), the check could be done via:
bool validMatch = check.All(g => input.ContainsKey(g.Key) && input[g.Key] >= g.Value);
Reed Copsey's answer is correct. Anyway, here is another alternative with LINQ:
var userInput = "doaurid";
var searchWord = "dad";
var control = userInput.Where(searchWord.Contains).Count() == searchWord.Length;
One possibility for this is using regular expressions. For instance, the following code will detect whether or not the supplied string contains any duplicate letters.
var expression = new Regex(@"(?<letter>\w).*\k<letter>");
if (expression.IsMatch(userInput)) {
    Console.WriteLine("Found a duplicate letter: {0}", expression.Match(userInput).Groups["letter"].Value);
}
This expression works by first matching any word character and storing that result in the "letter" group. It then skips over 0 or more intervening characters and finally matches a "backreference" to whatever it captured in the "letter" group. If a string doesn't contain any duplicate letters, this regular expression will not match. So if it matches, you know the string contains duplicates, and you know at least one of those duplicates by examining the value captured in the letter group.
In this case, it would be case sensitive. If you wanted it to be case insensitive, you could pass the RegexOptions.IgnoreCase argument to the constructor of your regular expression.
Another approach, using ToLookup:
var charCount = "dad".ToLookup(chr => chr);
bool allSame = charCount.All(g => g.Count() == "doaurid".Count(c => c == g.Key));
I could think of a simpler solution like this:
var userInput = "dddoauriddd"; //"doaurid";
var toCheck = "dad";
var toCheckR = "";
// Build a pattern such as ".*d.*a.*d": each letter of toCheck must appear, in this order
foreach (var c in toCheck)
{
    toCheckR += ".*" + c;
}
Console.WriteLine(Regex.IsMatch(userInput, toCheckR));
