Custom string variable in regex - c#

I want to find two or more variables in a string with Regex. For instance, I have a string like this: "Result = Num + 2 ( 6 * Count )". I want to find out whether "Result", "Num" and "Count" are in this string. Suppose I am building a small compiler, these strings are my reserved words, and I want to use regex for these checks.
Case sensitivity matters to me. For example, if the client inputs "num" or "count" in the string, the method must return false.
How can I do it in C#?

Update to use an arbitrary word collection:
var words = new [] {
"Result",
"Num",
"Count"
};
var source = "Result = Num + 2 ( 6 * Count)";
var regex = new Regex(string.Format(@"\b(?<words>(?-i){0})\b", string.Join("|", words)));
var results = (
from m in regex.Matches(source).OfType<Match>()
select m.Groups["words"].Value
).ToArray();
results will be an array of the matching words.
However, if, as you state in your comment on another answer, you are building a small compiler, you would be better off building a tokenising engine. For example, see Build a Better Tokeniser.
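For the yes/no check the question asks for, a minimal sketch might look like the following. The names ReservedWords and ContainsAll are hypothetical; Regex.Escape is applied in case a word contains regex metacharacters, and matching is case-sensitive because RegexOptions.IgnoreCase is not set.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class ReservedWords
{
    // Hypothetical helper: returns true only if every word occurs in
    // source as a whole word, with exactly the given casing.
    public static bool ContainsAll(string source, params string[] words)
    {
        return words.All(w => Regex.IsMatch(source, @"\b" + Regex.Escape(w) + @"\b"));
    }
}
```

Calling ReservedWords.ContainsAll(source, "Result", "Num", "Count") on the sample string should return true, and false when the input uses "num" or "count" instead.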

This is probably one of the first things in a Regex tutorial:
string expression = "Result = Num + 2 ( 6 * Count )";
foreach (Match match in Regex.Matches(expression, "[a-zA-Z]+")) {
Console.WriteLine(match.Value);
}

Related

Replace a part of string containing Password

Slightly similar to this question, I want to replace argv contents:
string argv = "-help=none\n-URL=(default)\n-password=look\n-uname=Khanna\n-p=100";
to this:
"-help=none\n-URL=(default)\n-password=********\n-uname=Khanna\n-p=100"
I have tried very basic string find and search operations (using IndexOf, Substring, etc.). I am looking for a more elegant solution to replace this part of the string:
-password=AnyPassword
to:
-password=*******
and keep the rest of the string intact. I am wondering whether String.Replace or Regex.Replace may help.
What I've tried (without much error checking):
// The sample argv uses a single dash, so search for "-password=".
var pwd_index = argv.IndexOf("-password=");
string converted;
if (pwd_index >= 0)
{
    var leftPart = argv.Substring(0, pwd_index);
    var pwdStr = argv.Substring(pwd_index);
    // Everything after the newline that ends the password line.
    var rightPart = pwdStr.Substring(pwdStr.IndexOf("\n") + 1);
    converted = leftPart + "-password=********\n" + rightPart;
}
else
    converted = argv;
Console.WriteLine(converted);
Solution
Similar to Rubens Farias' solution but a little bit more elegant:
string argv = "-help=none\n-URL=(default)\n-password=\n-uname=Khanna\n-p=100";
string result = Regex.Replace(argv, @"(password=)[^\n]*", "$1********");
It matches password= literally, stores it in capture group $1, and then keeps matching until a \n is reached.
This yields a constant number of *'s, though. But revealing how many characters a password has might already convey too much information to hackers anyway.
Working example: https://dotnetfiddle.net/xOFCyG
Regular expression breakdown
( // Store the following match in capture group $1.
password= // Match "password=" literally.
)
[ // Match one from a set of characters.
^ // Negate a set of characters (i.e., match anything not
// contained in the following set).
\n // The character set: consists only of the new line character.
]
* // Match the preceding element (the character set) zero or more times.
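As a quick check of the breakdown above, the same pattern can be wrapped in a small helper. PasswordMasker and Mask are illustrative names only, not part of the original answer.

```csharp
using System.Text.RegularExpressions;

public static class PasswordMasker
{
    // Keeps "password=" (capture group 1) and replaces everything up to
    // the next newline, or the end of the string, with a fixed mask.
    public static string Mask(string argv)
    {
        return Regex.Replace(argv, @"(password=)[^\n]*", "$1********");
    }
}
```

Because [^\n]* may match zero characters, an empty password is masked with the same eight asterisks as a long one.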
This code replaces the password value by several "*" characters:
string argv = "-help=none\n-URL=(default)\n-password=look\n-uname=Khanna\n-p=100";
string result = Regex.Replace(argv, @"(password=)([\s\S]*?\n)",
match => match.Groups[1].Value + new String('*', match.Groups[2].Value.Length - 1) + "\n");
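You can also drop the new String() part and use a string constant instead. Note that this pattern requires a newline after the password, so it will not match a password on the last line of the string.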

Remove substring if number exists before keyword

I have strings of the form:
5 dogs = 1 medium size house
4 cats = 2 small houses
one bird = 1 bird cage
What I am trying to do is remove the substring that exists before the equals sign, but only if the substring contains a keyword and the data before that keyword is an integer.
So in this example my key words are:
dogs,
cats,
bird
In the above example, the ideal output of my process would be:
1 medium size house
2 small houses
one bird = 1 bird cage
My code so far looks like this (I am hard-coding the keyword values/strings for now):
var originalstring = "5 dogs = 1 medium size house";
int equalsindex = originalstring.IndexOf('=');
var prefix = originalstring.Substring(0, equalsindex);
if (prefix.Contains("dogs"))
{
    // Drop the prefix, then the equals sign.
    var modifiedstring = originalstring.Remove(0, prefix.Length).Replace("=", string.Empty);
    return modifiedstring;
}
return originalstring;
The issue here is that I am removing the whole substring regardless of whether or not the data preceding the keyword is a number.
Would somebody be able to help me with this additional logic?
Thanks so much as always for anybody who takes a few minutes to read this question.
Mick
You can do it with a simple regex of the form
\d+\s+(?:kw1|kw2|kw3|...)\s*=\s*
where kwX is the corresponding keyword.
var data = new[] {
"5 dogs = 1 medium size house",
"4 cats = 2 small houses",
"one bird = 1 bird cage"
};
var keywords = new[] {"dogs", "cats", "bird"};
var regexStr = string.Format(@"\d+\s+(?:{0})\s*=\s*", string.Join("|", keywords));
var regex = new Regex(regexStr);
foreach (var s in data) {
Console.WriteLine("'{0}'", regex.Replace(s, string.Empty));
}
In the example above the call of string.Format pastes the list of keywords joined by | into the "template" of the expression at the top of the post, i.e.
\d+\s+(?:dogs|cats|bird)\s*=\s*
This expression matches
One or more digits \d+, followed by
One or more whitespace characters \s+, followed by
A keyword from the list: dogs, cats, bird (?:dogs|cats|bird), followed by
Zero or more whitespace characters \s*, followed by
An equals sign =, followed by
Zero or more whitespace characters \s*
The rest is easy: since this regex matches the part that you wish to remove, you need to call Replace and pass it string.Empty.
Demo.
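A slightly more defensive variant of the approach above, as a sketch: it escapes each keyword with Regex.Escape and anchors the pattern at the start of the line so that only a leading "number keyword =" prefix is removed. PrefixStripper and Strip are hypothetical names, and the anchoring is an assumption about the desired behaviour.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class PrefixStripper
{
    // Removes a leading "<digits> <keyword> = " prefix, if present.
    public static string Strip(string line, params string[] keywords)
    {
        // Escape keywords so regex metacharacters in them are taken literally.
        var alternatives = string.Join("|", keywords.Select(k => Regex.Escape(k)));
        var pattern = @"^\s*\d+\s+(?:" + alternatives + @")\s*=\s*";
        return Regex.Replace(line, pattern, string.Empty);
    }
}
```

A line such as "one bird = 1 bird cage" is returned unchanged because the text before the keyword is not made of digits.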
You can use regex (System.Text.RegularExpressions) to identify whether or not there is a number in the string.
Regex r = new Regex("[0-9]"); //Look for a number between 0 and 9
bool hasNumber = r.IsMatch(prefix);
This Regex simply searches for any digit in the string. If you want to search for a digit, a space, and then a letter, you could use [0-9] [a-zA-Z]; the character class [a-zA-Z] matches both upper- and lower-case letters.
You can try something like this:
int i;
if(int.TryParse(prefix.Substring(0, 1), out i)) //try to get an int from first char of prefix
{
//remove prefix
}
This will only work for single-digit integers, however.
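To lift the single-digit limitation, one option is to parse the whole first token of the prefix rather than its first character. This is only a sketch; PrefixCheck and StartsWithInteger are hypothetical names, and it assumes the tokens are separated by spaces.

```csharp
using System;

public static class PrefixCheck
{
    // True if the first space-separated token of prefix parses as an int.
    public static bool StartsWithInteger(string prefix)
    {
        var tokens = prefix.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        int ignored;
        return tokens.Length > 0 && int.TryParse(tokens[0], out ignored);
    }
}
```

With this check, "12 dogs " qualifies for removal while "one bird " does not.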

Regex cut number in a string c#

I have a string as following 2 - 5 now I want to get the number 5 with Regex C# (I'm new to Regex), could you suggest me an idea? Thanks
You can use String.Split method simply:
int number = int.Parse("2 - 5".Split('-', ' ').Last());
This will work if there is no space after the last number. If there is one, then:
int number = int.Parse("2 - 5 ".Split('-', ' ')
.Last(x => x.Any() && x.All(char.IsDigit)));
Very simply as follows:
\s-\s(\d)
and extract the first matching group. Use (\d+) instead of (\d) if the number can have more than one digit.
@SShashank has the right of it, but I thought I'd supply some code, since you mentioned you were new to Regex:
string s = "something 2-5 another";
Regex rx = new Regex(@"-(\d)");
if (rx.IsMatch(s))
{
Match m = rx.Match(s);
System.Console.WriteLine("First match: " + m.Groups[1].Value);
}
Groups[0] is the entire match and Groups[1] is the first matched group (stuff in parens).
If you really want to use regex, you can simply do:
string text = "2 - 5";
string found = Regex.Match(text, @"\d+", RegexOptions.RightToLeft).Value;
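The right-to-left search can be wrapped so that the caller gets an integer, or null when the text contains no digits. A minimal sketch; LastNumber and Find are illustrative names, and it assumes the number fits in an int.

```csharp
using System.Text.RegularExpressions;

public static class LastNumber
{
    // Returns the last run of digits in text as an int, or null if none.
    public static int? Find(string text)
    {
        Match m = Regex.Match(text, @"\d+", RegexOptions.RightToLeft);
        return m.Success ? int.Parse(m.Value) : (int?)null;
    }
}
```

Because the pattern is \d+ rather than \d, multi-digit numbers and trailing spaces are both handled.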

RegEx - Find and Replace while ignoring a number in the middle?

In the middle of a long string, I am looking for "No. 1234. "
The number (1234) in my example above can be a whole number of any length. The pattern also has to match the space at the end.
So I am looking for examples:
1) This is a test No. 42. Hello Nice People
2) I have no idea wtf No. 1234412344124. I am doing.
I have figured out a way to match on this pattern with the following regex:
(No. [\d]{1,}. )
What I cannot figure out, though, is how to do one simple thing when finding a match: Replace that last period with a darn comma!
So, with the two examples up above, I want to transform them into:
1) This is a test No. 42, Hello Nice People
2) I have no idea wtf No. 1234412344124, I am doing.
(Notice the commas now after the numbers)
How might one do this in C# and RegEx? Thank you!
EDIT:
Another way of looking at this is...
I can do this easily and have for years:
str = Replace(str, "Find this", "Replace it with this")
However, how can I do that by combining regex and the unknown portion of the string in the middle to replace the last period (not to be confused with the last character since the last character still needs to be a space)
This is a test No. 42. Hello Nice People
This is a test No. (some unknown length number). Hello Nice People
becomes
This is a test No. 42, Hello Nice People
This is a test No. (some unknown length number), Hello Nice People
(Notice the comma)
So you are essentially trying to match two adjacent groups, "\d+" and ". ", and then replace the second with ", ".
var r = new Regex(@"(\d+)(\. )");
var input = "This is a test No. 42. Hello Nice People";
var output = r.Replace(input, "$1, ");
Use the parentheses to capture two groups; then, in the replacement, keep the first group and substitute ", " for the second.
Edit: derp, escape that period.
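A narrower variant of the same idea, as a sketch: including the literal "No. " in the first group means that other "digits, period, space" sequences in the text are left alone. NumberPunctuation and PeriodToComma are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class NumberPunctuation
{
    // Group 1 keeps "No. <digits>"; the ". " that follows becomes ", ".
    public static string PeriodToComma(string input)
    {
        return Regex.Replace(input, @"(No\. \d+)\. ", "$1, ");
    }
}
```

Both periods are escaped here, so the pattern only matches literal dots.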
Edit - #1:
neilh's way is much better!
OK, I know the code looks ugly; I don't know how to edit the last character of a match directly in a regex.
string[] stringhe = new string[5] {
"This is a test No. 42, Hello Nice People",
"I have no idea wtf No. 1234412344124. I am doing.",
"Very long No. 74385748957348957893458934; Hello World",
"Nope No. 48394839!!!",
"Nope"
};
Regex reg = new Regex(@"No.\s*([0-9]+)");
Match match;
int idx = 0;
StringBuilder builder;
foreach(string stringa in stringhe)
{
match = reg.Match(stringa);
if (match.Success)
{
Console.WriteLine("No. Stringa #" + idx + ": " + stringhe[idx]);
int indexEnd = match.Groups[1].Index + match.Groups[1].Length;
builder = new StringBuilder(stringa);
builder[indexEnd] = '.';
stringhe[idx] = builder.ToString();
Console.WriteLine("New String: " + stringhe[idx]);
}
++idx;
}
Console.ReadKey(true);
If you want to edit the character after the number only if it is a ',':
int indexEnd = match.Groups[1].Index + match.Groups[1].Length;
if (stringa[indexEnd] == ',')
{
builder = new StringBuilder(stringa);
builder[indexEnd] = '.';
stringhe[idx] = builder.ToString();
Console.WriteLine("New String: " + stringhe[idx]);
}
Or, better, we can edit the Regex so that it matches only if the number is followed by a comma:
No.\s*([0-9]+),
I'm not the best at Regex, but this should do what you want.
No.\s+([0-9]+)
If you expect zero or more whitespace characters between "No." and the number, this Regex should do the work:
No.\s*([0-9]+)
An example of how the C# code can look:
string[] stringhe = new string[4] {
"This is a test No. 42, Hello Nice People",
"I have no idea wtf No. 1234412344124. I am doing.",
"Very long No. 74385748957348957893458934; Hello World",
"Nope No. 48394839!!!"
};
Regex reg = new Regex(@"No.\s+([0-9]+)");
Match match;
int idx = 0;
foreach(string stringa in stringhe)
{
match = reg.Match(stringa);
if (match.Success)
{
Console.WriteLine("No. Stringa #" + idx + ": " + match.Groups[1].Value);
}
++idx;
}
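Here is the code:
private string Format(string input)
{
    // Escape the dots so they match literal periods only.
    Match m = new Regex(@"No\. [0-9]+\.").Match(input);
    if (!m.Success) return input; // nothing to replace
    int targetIndex = m.Index + m.Length - 1; // index of the trailing period
    return input.Remove(targetIndex, 1).Insert(targetIndex, ",");
}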

C# Regex Split - How do I split string into 2 words

I have the following string:
String myNarrative = "ID: 4393433 This is the best narration";
I want to split this into 2 strings;
myId = "ID: 4393433";
myDesc = "This is the best narration";
How do I do this in Regex.Split()?
Thanks for your help.
If it is a fixed format as shown, use Regex.Match with Capturing Groups (see Matched Subexpressions). Split is useful for dividing up a repeating sequence with unbounded multiplicity; the input does not represent such a sequence but rather a fixed set of fields/values.
var m = Regex.Match(inp, @"ID:\s+(\d+)\s+(.*)");
if (m.Success) {
var number = m.Groups[1].Value;
var rest = m.Groups[2].Value;
} else {
// Failed to match.
}
Alternatively, one could use Named Groups and have a read through the Regular Expression Language quick-reference.
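A sketch of the named-group variant, which also keeps the "ID:" label in the first value as the question asks. NarrativeParser, TryParse, and the group names id and desc are all illustrative assumptions.

```csharp
using System.Text.RegularExpressions;

public static class NarrativeParser
{
    // Splits "ID: <digits> <text>" into the ID part and the description.
    public static bool TryParse(string narrative, out string id, out string description)
    {
        Match m = Regex.Match(narrative, @"^(?<id>ID:\s+\d+)\s+(?<desc>.*)$");
        id = m.Success ? m.Groups["id"].Value : null;
        description = m.Success ? m.Groups["desc"].Value : null;
        return m.Success;
    }
}
```

Named groups make the call site easier to read than numeric indexes such as Groups[1] and Groups[2].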
