Compiled Regex template with a dynamically passed value [duplicate] - c#

This is the input string: 23x^45*y or 2x^2 or y^4*x^3.
I am matching ^[0-9]+ after the letter x. In other words, I am matching x followed by ^ followed by digits. The problem is that the letter is not always x: it could be any letter stored in my char array.
For example:
foreach (char cEle in myarray) // cEle is letter in char array x, y, z, ...
{
match CEle in regex(input) //PSEUDOCODE
}
I am new to regex. I know that this can be done if I define regex variables, but I don't know how.

You can use a pattern of the form @"[letters]\^\d+", which you can create dynamically from your character array:
string s = "23x^45*y or 2x^2 or y^4*x^3";
char[] letters = { 'e', 'x', 'L' };
string regex = string.Format(@"[{0}]\^\d+",
    Regex.Escape(new string(letters)));
foreach (Match match in Regex.Matches(s, regex))
Console.WriteLine(match);
Result:
x^45
x^2
x^3
A few things to note:
It is necessary to escape the ^ inside the regular expression; otherwise it has the special meaning "start of line".
It is a good idea to use Regex.Escape when inserting literal strings from a user into a regular expression, so that none of the characters they type are misinterpreted as special characters.
This will also match the x at the end of variables with longer names, like tax^2. This can be avoided by requiring a word boundary (\b), though \b also fails between a digit and a letter, so it would reject the x in 23x^45; a negative lookbehind such as (?<![A-Za-z]) avoids that.
If you write x^1 as just x then this regular expression will not match it. This can be fixed by using (\^\d+)?.
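The notes above can be sketched as a small helper. This is only an illustration, written as a C# 9+ top-level program: the name FindPowerTerms is hypothetical, the exponent is kept mandatory, and a negative lookbehind is used in place of \b so that a digit coefficient directly before the letter (as in 23x^45) is still accepted.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: finds "letter^digits" terms for any letter in `letters`.
static List<string> FindPowerTerms(string input, char[] letters)
{
    // Regex.Escape does not escape ']' or '-', so this assumes plain letters.
    string charClass = Regex.Escape(new string(letters));

    // (?<![A-Za-z]) rejects a letter that ends a longer name such as "tax".
    string pattern = @"(?<![A-Za-z])[" + charClass + @"]\^\d+";

    var terms = new List<string>();
    foreach (Match match in Regex.Matches(input, pattern))
        terms.Add(match.Value);
    return terms;
}

var found = FindPowerTerms("23x^45*y or 2x^2 or tax^2 or y^4*x^3", new[] { 'e', 'x', 'L' });
Console.WriteLine(string.Join(", ", found)); // x^45, x^2, x^3
```

The tax^2 term is skipped because its x is preceded by another letter, while y^4 is skipped simply because y is not in the array.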

From my point of view, the easiest and fastest way to implement this is the following:
Input: This?_isWhat?IWANT
string tokenRef = "?";
Regex pattern = new Regex($@"([^{tokenRef}\/>]+)");
The pattern matches each run of characters that contains no tokenRef, / or >, which gives the following successive matches:
Match 1: This
Match 2: _isWhat
Match 3: IWANT

Try using this pattern for capturing the number but excluding the x^ prefix:
(?<=x\^)[0-9]+
string strInput = "23x^45*y or 2x^2 or y^4*x^3";
foreach (Match match in Regex.Matches(strInput, @"(?<=x\^)[0-9]+"))
    Console.WriteLine(match);
This should print:
45
2
3
Do not forget to use the option IgnoreCase for matching, if required.
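To tie this back to the question, the lookbehind can be built from a variable letter in the same way. The following is a minimal sketch under that assumption; FindExponents and its ignoreCase parameter are illustrative names, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the exponents that directly follow "letter^".
static List<string> FindExponents(string input, char letter, bool ignoreCase)
{
    // Escape the letter in case it is a regex metacharacter.
    string pattern = @"(?<=" + Regex.Escape(letter.ToString()) + @"\^)[0-9]+";
    RegexOptions options = ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;

    var exponents = new List<string>();
    foreach (Match match in Regex.Matches(input, pattern, options))
        exponents.Add(match.Value);
    return exponents;
}

string sample = "23x^45*y or 2X^2 or y^4*x^3";
Console.WriteLine(string.Join(", ", FindExponents(sample, 'x', true)));  // 45, 2, 3
Console.WriteLine(string.Join(", ", FindExponents(sample, 'x', false))); // 45, 3
```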

Related

Regex only letters except set of numbers

I'm using Replace(@"[^a-zA-Z]+", ""); to leave only letters, but there is a set of digit sequences that I want to keep as well, e.g. 122456 and 112466. I'm having trouble keeping digits only when they form one of these sequences.
Example input:
abc 1239 asm122456000
Desired output:
abcasm122456
I tried this: ([^a-zA-Z])+|(?!122456)
My answer doesn't use Replace(), but achieves a similar result:
(?:[a-zA-Z]+|\d{6})
This non-capturing group matches either a run of alphabetic characters or a sequence of exactly 6 digits.
Join all the matching values into a single string.
using System.Linq;
Regex regex = new Regex("(?:[a-zA-Z]+|\\d{6})");
string input = "abc 1239 asm12245600";
string output = "";
var matches = regex.Matches(input);
if (matches.Count > 0)
output = String.Join("", matches.Select(x => x.Value));
An alternative, using .Split() and .All():
string input = "abc 1239 asm122456000";
string output = string.Join("", input.Split().Where(x => !x.All(char.IsDigit)));
Note that this only drops tokens made up entirely of digits, so the trailing 000 stays in the result (abcasm122456000).
It is very simple: you need to match and capture what you need to keep, and just match what you need to remove, and then utilize a backreference to the captured group value in the replacement pattern to put it back into the resulting string.
Here is the regex:
(122456|112466)|[^a-zA-Z]
Details:
(122456|112466) - Capturing group with ID 1: either of the two alternatives
| - or
[^a-zA-Z] - a char other than an ASCII letter (use \P{L} if you need to match any char other than any Unicode letter).
Note the removed + quantifier as [^A-Za-z] also matches digits.
You need to use $1 in the replacement:
var result = Regex.Replace(text, @"(122456|112466)|[^a-zA-Z]", "$1");
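If the sequences to keep are not fixed, the same capture-and-restore idea can be built from a list. This is a rough sketch rather than part of the answer above: KeepLettersAnd is a hypothetical helper, and it assumes the list contains at least one non-empty sequence.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: strips every non-letter except the listed sequences.
// Assumes `keep` contains at least one non-empty sequence.
static string KeepLettersAnd(string text, string[] keep)
{
    // Escape each sequence so it is matched literally, then join with '|'.
    string alternation = string.Join("|", keep.Select(s => Regex.Escape(s)));
    string pattern = "(" + alternation + ")|[^a-zA-Z]";

    // "$1" restores a kept sequence; for any other match group 1 is empty.
    return Regex.Replace(text, pattern, "$1");
}

Console.WriteLine(KeepLettersAnd("abc 1239 asm122456000", new[] { "122456", "112466" })); // abcasm122456
```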

Regex for splitting string into a collection of two based on a pattern

Using the C# Regex.Split method, I would like to split strings that always start with RepXYZ, where the XYZ part is a number that always has either 3 or 4 digits.
Examples
"Rep1007$chkCheckBox"
"Rep127_Group_Text"
The results should be:
{"Rep1007","$chkCheckBox"}
{"Rep127","_Group_Text"}
So far I have tried (Rep)[\d]{3,4} and ((Rep)[\d]{3,4})+ but both of those give me unwanted results.
Using Regex.Split often results in empty or unwanted items in the resulting array. Using (Rep)[\d]{3,4} in Regex.Split will put Rep without the numbers into the resulting array. (Rep[\d]{3,4}) will put Rep and the numbers into the result, but since the match is at the start, there will be an empty item in the array.
I suggest using Regex.Match here:
var match = Regex.Match(text, @"^(Rep\d+)(.*)$");
if (match.Success)
{
Console.WriteLine(match.Groups[1].Value);
Console.WriteLine(match.Groups[2].Value);
}
Details:
^ - start of string
(Rep\d+) - capturing group 1: Rep and any one or more digits
(.*) - capturing group 2: any zero or more chars other than a newline, as many as possible
$ - end of string.
A splitting approach is better implemented with a lookaround-based regex:
var results = Regex.Split(text, @"(?<=^Rep\d+)(?=[$_])");
(?<=^Rep\d+)(?=[$_]) splits a string at the location that is immediately preceded with Rep and one or more digits at the start of the string, and immediately followed with $ or _.
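As a quick check of the two behaviours described in this answer, the sketch below runs the lookaround-based split on both sample strings and, for comparison, a split on a capturing group. The helper names SplitAfterRep and SplitOnCapture are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Split right after the leading "Rep" + digits; the delimiter stays on the right.
static string[] SplitAfterRep(string text) =>
    Regex.Split(text, @"(?<=^Rep\d+)(?=[$_])");

// For comparison: splitting on a capturing group keeps the captured text,
// but the match at index 0 also yields a leading empty item.
static string[] SplitOnCapture(string text) =>
    Regex.Split(text, @"(Rep\d{3,4})");

Console.WriteLine(string.Join(" | ", SplitAfterRep("Rep1007$chkCheckBox"))); // Rep1007 | $chkCheckBox
Console.WriteLine(string.Join(" | ", SplitAfterRep("Rep127_Group_Text")));   // Rep127 | _Group_Text
Console.WriteLine(SplitOnCapture("Rep1007$chkCheckBox").Length);             // 3 (first item is empty)
```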
Alternatively, split on the first $ or _ with String.Split; no regex is needed, but note that the delimiter itself is dropped from the result:
string input = "Rep127_Group_Text";
string[] parts = input.Split(new[] { '$', '_' }, 2);
foreach (string part in parts)
{
Console.WriteLine(part);
}
This prints:
Rep127
Group_Text

Removing words with special characters in them

I have a long string composed of a number of different words.
I want to go through all of them, and if the word contains a special character or number (except '-'), or starts with a Capital letter, I want to delete it (the whole word not just that character). For all intents and purposes 'foreign' letters can count as special characters.
The obvious solution is to run a loop through each word (after splitting it) and then a loop through each character - but I'm hoping there's a faster way of doing it? Perhaps using Regex but I've almost no experience with it.
Thanks
ADDED:
(What I want for example:)
Input: "this Is an Example of 5 words in an input like-so from example.com"
Output: {this,an,of,words,in,an,input,like-so,from}
(What I've tried so far)
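List<string> response = new List<string>();
string[] splitString = text.Split(' ');
foreach (string s in splitString)
{
    bool add = true;
    foreach (char c in s.ToCharArray())
    {
        // Reject the word on the first character that is not '-' or a lowercase letter.
        if (!(c.Equals('-') || (Char.IsLetter(c) && Char.IsLower(c))))
        {
            add = false;
            break;
        }
    }
    // Add the word once, after all of its characters have been checked.
    if (add)
    {
        response.Add(s);
    }
}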
Edit 2:
For me a word should be a number of characters (a..z) separated by a space. Trailing punctuation (, . ! ...) shouldn't count towards the 'special character' condition (which is really mostly just there to remove URLs or the like).
So:
"I saw a dog. It was black!"
should result in
{saw,a,dog,was,black}
So you want to find all "words" that only contain characters a-z or -, for words that are separated by spaces?
A regex like this will find such words:
(?<!\S)[a-z-]+(?!\S)
To also allow for words that end with single punctuation, you could use:
(?<!\S)[a-z-]+(?=[,.!?:;]?(?!\S))
Example:
var re = @"(?<!\S)[a-z-]+(?=[,.!?:;]?(?!\S))";
var str = "this, Is an! Example of 5 words in an input like-so from example.com foo: bar?";
var m = Regex.Matches(str, re);
Console.WriteLine("Matched: ");
foreach (Match i in m)
Console.Write(i + " ");
Notice the punctuation in the string.
Output:
Matched:
this an of words in an input like-so from foo bar
How about this?
(?<=^|\s+)(?<word>[a-z-]+)(?=$|\s+)
Edit: Meant (?<=^|\s+)(?<word>[a-z\-]+)(?=(?:\.|,|!|\.\.\.)?(?:$|\s+))
Rules:
Word can only be preceded by start of line or some number of whitespace characters
Word can only be followed by end of line or some number of whitespace characters (Edit supports words ending with periods, commas, exclamation points, and ellipses)
Word can only contain lower case (latin) letters and dashes
The named group containing each word is "word"
Have a look at Microsoft's How to: Search Strings Using Regular Expressions (C# Programming Guide) - it's about regexes in C#.
List<string> strings = new List<string>() {"asdf", "sdf-sd", "sdfsdf"};
// Iterate backwards so removals do not shift the items still to be visited.
for (int i = strings.Count - 1; i >= 0; i--)
{
    if (strings[i].Contains("-"))
    {
        strings.RemoveAt(i);
    }
}
This could be a starting point. Right now it only checks for "." as a special character. Ignoring the leftover whitespace, this outputs: "this an of words in an like-so from"
string pattern = @"[A-Z]\w+|\w*[0-9]+\w*|\w*[\.]+\w*";
string line = "this Is an Example of 5 words in an in3put like-so from example.com";
System.Text.RegularExpressions.Regex r = new System.Text.RegularExpressions.Regex(pattern);
line = r.Replace(line,"");
You can do this in two ways, the white-list way and the black-list way. With a white-list you define the set of characters that you consider acceptable, and with a black-list it's the opposite.
Let's assume the white-list way and that you accept only the characters a-z, A-Z and -. Additionally, you have the rule that the first character of a word cannot be an upper-case character.
With this you can do something like this:
string target = "This is a white-list example: (Foo, bar1)";
var matches = Regex.Matches(target, @"(?:\b)(?<Word>[a-z]{1}[a-zA-Z\-]*)(?:\b)");
string[] words = matches.Cast<Match>().Select(m => m.Value).ToArray();
Console.WriteLine(string.Join(", ", words));
Outputs:
// is, a, white-list, example
You can use look-aheads and look-behinds to do this. Here's a regex that matches your example:
(?<=\s|^)[a-z-]+(?=\s|$)
The explanation is: match one or more alphabetic characters (lowercase only, plus hyphen), as long as what comes before the characters is whitespace (or the start of the string), and as long as what comes after is whitespace or the end of the string.
All you need to do now is plug that into System.Text.RegularExpressions.Regex.Matches(input, regexString) to get your list of words.
Reference: http://www.mikesdotnetting.com/Article/46/CSharp-Regular-Expressions-Cheat-Sheet
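Plugging that lookaround pattern into Regex.Matches might look like the sketch below. The helper name LowercaseWords is an assumption for illustration; the input is the example sentence from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the whitespace-delimited words made only of a-z and '-'.
static List<string> LowercaseWords(string input)
{
    return Regex.Matches(input, @"(?<=\s|^)[a-z-]+(?=\s|$)")
                .Cast<Match>()          // MatchCollection is non-generic on older frameworks
                .Select(m => m.Value)
                .ToList();
}

string text = "this Is an Example of 5 words in an input like-so from example.com";
Console.WriteLine(string.Join(",", LowercaseWords(text)));
// this,an,of,words,in,an,input,like-so,from
```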

How to ignore regex matches in C#?

An input string:
string datar = "aag, afg, agg, arg";
I am trying to get matches: "aag" and "arg", but following won't work:
string regr = "a[a-z&&[^fg]]g";
string regr = "a[a-z[^fg]]g";
What is the correct way of ignoring regex matches in C#?
The obvious way is to use a[a-eh-z]g, but you could also try a negative lookbehind like this:
string regr = "a[a-z](?<!f|g)g";
Explanation:
a Match the character "a"
[a-z] Match a single character in the range between "a" and "z"
(?<!XXX) Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind)
f|g Match the character "f" or match the character "g"
g Match the character "g"
Character classes aren't quite that fancy. The simple solution is:
a[a-eh-z]g
If you really want to explicitly list out the letters that don't belong, you could try something like:
a[^\W\d_A-Zfg]g
This character class matches everything except:
\W excludes non-word characters, i.e. punctuation, whitespace, and other special characters. What's left are letters, digits, and the underscore _.
\d removes digits so now we have letters and the underscore _.
_ removes the underscore so now we only match letters.
A-Z removes uppercase letters so now we only match lowercase letters.
Finally at this point we can list the individual lowercase letters we don't want to match.
All in all way more complicated than we'd likely ever want. That's regular expressions for ya!
What you're using is Java's set intersection syntax:
a[a-z&&[^fg]]g
...meaning the intersection of the two sets ('a' THROUGH 'z') and (ANYTHING EXCEPT 'f' OR 'g'). No other regex flavor that I know of uses that notation. The .NET flavor uses the simpler set subtraction syntax:
a[a-z-[fg]]g
...that is, the set ('a' THROUGH 'z') minus the set ('f', 'g').
Java demo:
String s = "aag, afg, agg, arg, a%g";
Matcher m = Pattern.compile("a[a-z&&[^fg]]g").matcher(s);
while (m.find())
{
System.out.println(m.group());
}
C# demo:
string s = @"aag, afg, agg, arg, a%g";
foreach (Match m in Regex.Matches(s, @"a[a-z-[fg]]g"))
{
Console.WriteLine(m.Value);
}
Output of both is
aag
arg
Try this if you want match arg and aag:
a[ar]g
If you want to match everything except afg and agg, you need this regex:
a[^fg]g
It seems like you're trying to match three-character sequences that start with a and end with g, with the condition that the second character cannot be f or g. If this is the case, why not use the following regular expression:
string regr = "a[a-eh-z]g";
Regex: a[a-eh-z]g.
Then use Regex.Matches to get the matched substrings.
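As a cross-check of the patterns suggested in the answers to this question, the sketch below runs four of them against the same input with Regex.Matches. The helper MatchAll is hypothetical, and the input adds a%g to show that none of these four patterns accepts a non-letter in the middle.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: joins all matches of `pattern` in `input` with ", ".
static string MatchAll(string input, string pattern) =>
    string.Join(", ", Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));

string data = "aag, afg, agg, arg, a%g";
string[] patterns =
{
    @"a[a-eh-z]g",       // explicit ranges that skip f and g
    @"a[a-z-[fg]]g",     // .NET character class subtraction
    @"a[a-z](?<!f|g)g",  // negative lookbehind after the middle letter
    @"a[^\W\d_A-Zfg]g",  // negated class listing everything unwanted
};

foreach (string p in patterns)
    Console.WriteLine(p + " -> " + MatchAll(data, p)); // each line ends with: aag, arg
```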

