Regex string for sample strings - c#

I want a regex for below strings.
string: Some () Text (1)
I want to capture 'Some () Text' and '1'
string: Any () Text
I want to capture 'Any () Text' and '0'
I came up with the following regex to capture 'text' and 'count', but it does not match the second example above.
@"(?<text>.+)\((?<count>\d+)\)"
c#:
string pattern = @"(?<text>.+)\((?<count>\d+)\)";
Match m = Regex.Match(line, pattern);
count = 0;
text = "";
if (m.Success)
{
text = m.Groups["text"].Value.Trim();
int.TryParse(m.Groups["count"].Value, out count);
}

Just make the group optional:
string pattern = @"^(?<text>.+?)(\((?<count>\d+)\))?$";
Match m = Regex.Match(line, pattern);
count = 0;
text = "";
if (m.Success)
{
text = m.Groups["text"].Value.Trim();
if(m.Groups["count"].Success) {
int.TryParse(m.Groups["count"].Value, out count);
}
}
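As a quick check of the optional-group pattern above, here is a minimal sketch that runs it against both sample strings from the question. It assumes C# 9 or later (top-level statements); the helper name ParseLine is made up for illustration and is not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the trimmed text and the count (0 when no "(n)" suffix exists).
static (string Text, int Count) ParseLine(string line)
{
    Match m = Regex.Match(line, @"^(?<text>.+?)(\((?<count>\d+)\))?$");
    if (!m.Success) return ("", 0);

    int count = 0;
    // The count group only participates in the match when the "(n)" suffix is present.
    if (m.Groups["count"].Success)
        int.TryParse(m.Groups["count"].Value, out count);

    return (m.Groups["text"].Value.Trim(), count);
}

foreach (string line in new[] { "Some () Text (1)", "Any () Text" })
{
    var (text, count) = ParseLine(line);
    Console.WriteLine($"'{text}' -> {count}");
}
// 'Some () Text' -> 1
// 'Any () Text' -> 0
```

The lazy .+? together with the ^ and $ anchors is what lets the trailing (1) fall into the count group instead of being swallowed by the text group.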

Try this
(?<group_text>Some Text) (?:\((?<group_count>\d+)\)|(?<group_count>))
update
There are really too many ways to go here, given the information you provided.
This could be the totally flexible version.
(?<group_text>
(?:
(?! \s* \( \s* \d+ \s* \) )
[\s\S]
)*
)
\s*
(?:
\( \s* (?<group_count>\d+ ) \s* \)
)?

Regexp solution:
var s = "Some Text (1)";
var match = System.Text.RegularExpressions.Regex.Match(s, @"(?<text>[^(]+)\((?<d>[^)]+)\)");
var matches = match.Groups;
if(matches["text"].Success && matches["d"].Success) {
int n = int.Parse(matches["d"].Value);
Console.WriteLine("text = {0}, number = {1}", match.Groups["text"].Value, n);
} else {
Console.WriteLine("NOT FOUND");
}
.Split() solution:
var parts = s.Split(new char[] { '(', ')' });
int n;
// Check the length before indexing: Split yields a single element when there are no parentheses
if (parts.Length >= 3 && int.TryParse(parts[1], out n)) {
Console.WriteLine("text = {0}, number = {1}", parts[0], n);
} else {
Console.WriteLine("NOT FOUND");
}
Output:
text = Some Text , number = 1
text = Some Text , number = 1

Related

How do I check if a string contains "(1)" and if it does, increase the number by 1?

If any given string ends with "(" followed by a number and ")", I want to increase that value by one. If not, I'll just add a "(1)".
I've tried something like string.Contains(), but since the value within () can be different, I don't know how to always search like this and get the number.
To find a parentheses-enclosed number at the end of a string and increase it by 1, try this:
Regex.Replace(yourString, @"(?<=\()\d+(?=\)$)", match => (int.Parse(match.Value) + 1).ToString());
Explanation:
(?<=\() is a positive look-behind, which matches an open bracket, but does not include it in the match result.
\d+ matches one or more digits.
(?=\)$) is a positive look-ahead, which matches a closing bracket at the end of the string.
To add a number if none is present, test the match first:
string yourString = "A string with no number at the end";
string pattern = @"(?<=\()\d+(?=\)$)";
if (Regex.IsMatch(yourString, pattern))
{
yourString = Regex.Replace(yourString, pattern, match => (int.Parse(match.Value) + 1).ToString());
}
else
{
yourString += " (1)";
}
You can try regular expressions: Match and Replace the desired fragment, e.g.
using System.Text.RegularExpressions;
...
string[] tests = new string[] {
"abc",
"def (123)",
"pqr (123) def",
"abs (789) (123)",
};
Func<string, string> solution = (line) =>
Regex.Replace(line,
@"\((?<value>[0-9]+)\)$",
m => $"({int.Parse(m.Groups["value"].Value) + 1})");
string demo = string.Join(Environment.NewLine, tests
.Select(test => $"{test,-20} => {solution(test)}"));
Console.Write(demo);
Outcome:
abc => abc # no numbers
def (123) => def (124) # 123 turned into 124
pqr (123) def => pqr (123) def # 123 is not at the end of string
abs (789) (123) => abs (789) (124) # 123 turned into 124, 789 spared
Edit: If we also want to append (1) when there is no match, we can use Match and rebuild the string ourselves:
Func<string, string> solution = (line) => {
Match m = Regex.Match(line, @"\((?<value>[0-9]+)\)$");
return m.Success
? line.Substring(0, m.Index) + $"({int.Parse(m.Groups["value"].Value) + 1})"
: line + " (1)";
};
Outcome:
abc => abc (1)
def (123) => def (124)
pqr (123) def => pqr (123) def (1)
abs (789) (123) => abs (789) (124)
string s = "sampleText";
string pattern = "[(]([0-9]+)[)]$"; // + rather than *? so that an empty "()" is never passed to int.Parse
for (int i = 0; i < 5; i++)
{
var m = Regex.Match(s, pattern);
if (m.Success)
{
int value = int.Parse(m.Groups[1].Value);
s = Regex.Replace(s, pattern, $"({++value})");
}
else
{
s += "(1)";
}
Console.WriteLine(s);
}
If I understand correctly, you have strings such as:
string s1 = "foo(12)";
string s2 = "bar(21)";
string s3 = "foobar";
And you want to obtain the following:
IncrementStringId(s1) == "foo(13)"
IncrementStringId(s2) == "bar(22)"
IncrementStringId(s3) == "foobar(1)"
you could accomplish this by using the following method
public string IncrementStringId(string input)
{
// The regex pattern looks at the very end of the string for a number enclosed in parentheses
string pattern = @"\(\d*\)$";
Regex regex = new Regex(pattern);
Match match = regex.Match(input);
if (match.Success)
if (int.TryParse(match.Value.Replace("(", "").Replace(")", ""), out int index))
// if the pattern is found, parse the detected number and increment it by 1
return Regex.Replace(input, pattern, "(" + ++index + ")");
// In case the pattern is not detected add a (1) to the end of the string
return input + "(1)";
}
Please make sure you are using System.Text.RegularExpressions namespace that includes Regex class.

Get Strings Between Characters in Brackets in C#

I have a string, that looks like a method:
"method(arg1,arg2,arg3);"
And I need to get all of the arguments in it as a string, I.e:
"arg1"
"arg2"
"arg3"
How can I do it? I have tried the following code:
var input = LineText;
var split = input.Split(',');
var result = String.Join(",", split.Skip(1).Take(split.Length - 2));
var split2 = input.Split(',');
var result2 = String.Join(",", split.Skip(2).Take(split.Length - 2));
var split3 = input.Split(',');
var result3 = String.Join(",", split.Skip(3).Take(split.Length - 2));
However it doesn't work correctly.
I don't need a regex.
You can use a regex or do it with plain string methods. First find the beginning and the end of the brackets with String.IndexOf. Then use Substring and Split:
static string[] ExtractArguments(string methodSig)
{
int bracketStart = methodSig.IndexOf('(');
if (bracketStart == -1) return null;
int bracketEnd = methodSig.IndexOf(')', bracketStart + 1);
if (bracketEnd == -1) return null;
string arguments = methodSig.Substring(++bracketStart, bracketEnd - bracketStart);
return arguments.Split(',').Select(s => s.Trim()).ToArray();
}
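For illustration, here is a minimal usage sketch of the ExtractArguments method above, assuming C# 9 or later (top-level statements). The logic is the same; only the start index is computed in a separate variable (start, a name introduced here) so the Substring call is easier to read.

```csharp
using System;
using System.Linq;

// Same logic as the ExtractArguments method above, written as a local function.
static string[] ExtractArguments(string methodSig)
{
    int bracketStart = methodSig.IndexOf('(');
    if (bracketStart == -1) return null;
    int bracketEnd = methodSig.IndexOf(')', bracketStart + 1);
    if (bracketEnd == -1) return null;

    // Skip the opening bracket itself and stop just before the closing one.
    int start = bracketStart + 1;
    string arguments = methodSig.Substring(start, bracketEnd - start);
    return arguments.Split(',').Select(s => s.Trim()).ToArray();
}

string[] parsed = ExtractArguments("method(arg1, arg2,arg3);");
Console.WriteLine(string.Join("|", parsed));                      // arg1|arg2|arg3
Console.WriteLine(ExtractArguments("no brackets here") == null);  // True
```

Because it stops at the first closing bracket, this version is only suitable for argument lists without nested parentheses.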
Here is a regex expression that can handle nested parentheses
static class Program
{
static readonly Regex regex = new Regex(@"
\( # Match (
(
[^()]+ # all chars except ()
| (?<Level>\() # or if ( then Level += 1
| (?<-Level>\)) # or if ) then Level -= 1
)+ # Repeat (to go from inside to outside)
(?(Level)(?!)) # fail the match if any opened parenthesis is still unclosed
\) # Match )",
RegexOptions.IgnorePatternWhitespace);
/// <summary>
/// Program Entry Point
/// </summary>
/// <param name="args">Command Line Arguments</param>
static void Main(string[] args)
{
var input = "method(arg1,arg2,arg3(x));";
var match = regex.Match(input);
if (match.Success) // Regex.Match never returns null, so test Success instead
{
string method = input.Substring(0, match.Index); // "method"
string inside_parens = input.Substring(match.Index+1, match.Length-2); // "arg1,arg2,arg3(x)"
string remainder = input.Substring(match.Index+match.Length); // ";"
string[] arguments = inside_parens.Split(',');
// recreate the input
Debug.WriteLine($"{method}({string.Join(",", arguments)});");
// Output: "method(arg1,arg2,arg3(x));"
}
}
}
Code was blatantly stolen from this SO post.
An alternative solution using regex:
var s = "method(arg1,arg2,arg3);";
// Use regex to match the argument sequence "arg1,arg2,arg3"
var match = Regex.Match(s, @"^\w+\((.+)\);$");
// Split the arguments and put them into an array
string[] arguments = match.Groups[1].ToString().Split(',').ToArray();

How to Regex match on spaces and replace without overwriting spaces

I have a Regex match like the following code:
string[] specials = new string[] { "special1", "special2", "special3" };
for (int i = 0; i < specials.Length; i++)
{
string match = string.Format("(?:\\s)({0})(?:\\s)", specials[i]);
if (Regex.IsMatch(name, match, RegexOptions.IgnoreCase))
{
name = Regex.Replace(name, match, specials[i], RegexOptions.IgnoreCase);
break;
}
}
What I would like is to have the replace operation replace only the matching text and leave the leading and trailing spaces intact. So "This is a Special1 sentence" would become "This is a special1 sentence". With the Replace statement above I get "This is aspecial1sentence".
Solution:
Based on @Jerry's comment, I changed the match to:
(\\s)({0})(\\s)
and the Replace to:
name = Regex.Replace(name, match, "$1" + specials[i] + "$3", RegexOptions.IgnoreCase);
and was able to get the desired results.
You can use a lookbehind and a lookahead to check for the spaces without including them in the match:
string[] specials = new string[] { "special1", "special2", "special3" };
for (int i = 0; i < specials.Length; i++)
{
string match = string.Format("(?<=\\s){0}(?=\\s)", specials[i]);
if (Regex.IsMatch(name, match, RegexOptions.IgnoreCase))
{
name = Regex.Replace(name, match, specials[i], RegexOptions.IgnoreCase);
break;
}
}
This way you don't have to add the spaces back in.
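A minimal sketch of the lookaround approach, assuming C# 9 or later (top-level statements). NormalizeSpecials is a made-up helper name, and unlike the loop above it processes every entry instead of stopping at the first hit. It also escapes each word with Regex.Escape and uses a MatchEvaluator so that a word containing regex or replacement metacharacters would still be treated literally.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: rewrites each special word to its canonical casing,
// but only where it is surrounded by whitespace on both sides.
static string NormalizeSpecials(string name, string[] specials)
{
    foreach (string special in specials)
    {
        // The lookbehind and lookahead check for whitespace without consuming it.
        string pattern = @"(?<=\s)" + Regex.Escape(special) + @"(?=\s)";
        name = Regex.Replace(name, pattern, _ => special, RegexOptions.IgnoreCase);
    }
    return name;
}

string[] specials = { "special1", "special2", "special3" };
Console.WriteLine(NormalizeSpecials("This is a Special1 sentence", specials));
// This is a special1 sentence
```

Since both lookarounds require a whitespace character, a special word at the very start or end of the string is left untouched, which matches the behavior of the original pattern.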

C# extract words using regex

I've found a lot of examples of how to check something using regex, or how to split text using regular expressions.
But how can I extract words out of a string ?
Example:
aaaa 12312 <asdad> 12334 </asdad>
Lets say I have something like this, and I want to extract all the numbers [0-9]* and put them in a list.
Or if I have 2 different kind of elements:
aaaa 1234 ...... 1234 ::::: asgsgd
And I want to choose digits that come after ..... and words that come after ::::::
Can I extract these strings in a single regex ?
Here's a solution for your first problem:
class Program
{
static void Main(string[] args)
{
string data = "aaaa 12312 <asdad> 12334 </asdad>";
Regex reg = new Regex("[0-9]+");
foreach (var match in reg.Matches(data))
{
Console.WriteLine(match);
}
Console.ReadLine();
}
}
In the general case, you can do this using capturing parentheses:
string input = "aaaa 1234 ...... 1234 ::::: asgsgd";
string regex = @"\.\.\.\. (\d+) ::::: (\w+)";
Match m = Regex.Match(input, regex);
if (m.Success) {
int numberAfterDots = int.Parse(m.Groups[1].Value);
string wordAfterColons = m.Groups[2].Value;
// ... Do something with these values
}
But the first part you asked (extract all the numbers) is a bit easier:
string input = "aaaa 1234 ...... 1234 ::::: asgsgd";
var numbers = Regex.Matches(input, @"\d+")
.Cast<Match>()
.Select(m => int.Parse(m.Value))
.ToList();
Now numbers will be a list of integers.
For your specific examples:
string firstString = "aaaa 12312 <asdad> 12334 </asdad>";
Regex firstRegex = new Regex(@"(?<Digits>[\d]+)", RegexOptions.ExplicitCapture);
if (firstRegex.IsMatch(firstString))
{
MatchCollection firstMatches = firstRegex.Matches(firstString);
foreach (Match match in firstMatches)
{
Console.WriteLine("Digits: " + match.Groups["Digits"].Value);
}
}
string secondString = "aaaa 1234 ...... 1234 ::::: asgsgd";
Regex secondRegex = new Regex(@"([\.]+\s(?<Digits>[\d]+))|([\:]+\s(?<Words>[a-zA-Z]+))", RegexOptions.ExplicitCapture);
if (secondRegex.IsMatch(secondString))
{
MatchCollection secondMatches = secondRegex.Matches(secondString);
foreach (Match match in secondMatches)
{
if (match.Groups["Digits"].Success)
{
Console.WriteLine("Digits: " + match.Groups["Digits"].Value);
}
if (match.Groups["Words"].Success)
{
Console.WriteLine("Words: " + match.Groups["Words"].Value);
}
}
}
Hope that helps. The output is:
Digits: 12312
Digits: 12334
Digits: 1234
Words: asgsgd
Something like this will do nicely!
var text = "aaaa 12312 <asdad> 12334 </asdad>";
var matches = Regex.Matches(text, @"\w+");
var arrayOfMatched = matches.Cast<Match>().Select(m => m.Value).ToArray();
Console.WriteLine(string.Join(", ", arrayOfMatched));
\w+ Matches consecutive word characters. Then we just selected the values out of the list of matches and turn them into an array.
Regex itemsRegex = new Regex(@"(\d+)"); // \d+ rather than \d*, which would also yield empty matches that Convert.ToInt32 rejects
MatchCollection matches = itemsRegex.Matches(text);
int[] values = matches.Cast<Match>().Select(m => Convert.ToInt32(m.Value)).ToArray();
// A verbatim string (@"...") passes the backslash through to the regex engine
Regex phoneregex = new Regex(@"[0-9][0-9][0-9]\-[0-9][0-9][0-9][0-9]");
String unicornCanneryDirectory = "unicorn cannery 483-8627 cha...";
String numbersToCall = "";
//the second argument is the position in the input at which to start searching,
//we probably want 0, the first character
Match matchIterator = phoneregex.Match(unicornCanneryDirectory, 0);
//Success tells us whether matchIterator holds a match or not
while (matchIterator.Success) {
String aResult = matchIterator.Value;
//we could manipulate our match now but I'm going to concatenate them all for later
numbersToCall += aResult + " ";
matchIterator = matchIterator.NextMatch();
}
// use my concatenated matches now
String message = "Unicorn rights activists demand more sparkles in the unicorn canneries under the new law...";
// phoneDialer stands in for whatever object consumes the collected numbers
phoneDialer.MassCallWithAutomatedMessage(numbersToCall, message);
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.match.nextmatch.aspx

Find substring ignoring specified characters

Do any of you know of an easy/clean way to find a substring within a string while ignoring some specified characters? I think an example would explain things better:
string: "Hello, -this- is a string"
substring to find: "Hello this"
chars to ignore: "," and "-"
found the substring, result: "Hello, -this"
Using Regex is not a requirement for me, but I added the tag because it feels related.
Update:
To make the requirement clearer: I need the resulting substring with the ignored chars, not just an indication that the given substring exists.
Update 2:
Some of you are reading too much into the example, sorry. I'll give another scenario that should work:
string: "?A&3/3/C)412&"
substring to find: "A41"
chars to ignore: "&", "/", "3", "C", ")"
found the substring, result: "A&3/3/C)41"
And as a bonus (not required per se), it would be great if the solution did not assume that the substring to find is free of the ignored chars, e.g. given the last example we should be able to do:
substring to find: "A3C412&"
chars to ignore: "&", "/", "3", "C", ")"
found the substring, result: "A&3/3/C)412&"
Sorry if I wasn't clear before, or still I'm not :).
Update 3:
Thanks to everyone who helped!, this is the implementation I'm working with for now:
http://www.pastebin.com/pYHbb43Z
And here are some tests:
http://www.pastebin.com/qh01GSx2
I'm using some custom extension methods I'm not including, but I believe they should be self-explanatory (I will add them if you like).
I've taken a lot of your ideas for the implementation and the tests, but I'm giving the answer to @PierrOz because he was one of the first and pointed me in the right direction.
Feel free to keep giving suggestions as alternative solutions or comments on the current state of the impl. if you like.
In your example you would do:
string input = "Hello, -this-, is a string";
string ignore = "[-,]*";
Regex r = new Regex(string.Format("H{0}e{0}l{0}l{0}o{0} {0}t{0}h{0}i{0}s{0}", ignore));
Match m = r.Match(input);
return m.Success ? m.Value : string.Empty;
Dynamically you would build the part [-, ] with all the characters to ignore and you would insert this part between all the characters of your query.
Take care of '-' in the class []: put it at the beginning or at the end
So more generically, it would give something like:
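public string Test(string query, string input, char[] ignoreList)
{
string ignorePattern = "[";
for (int i = 0; i < ignoreList.Length; i++)
{
if (ignoreList[i] == '-')
{
// Strings are immutable, so the result of Insert must be assigned back.
// Putting the dash first keeps it from being read as a range.
ignorePattern = ignorePattern.Insert(1, "-");
}
else
{
ignorePattern += Regex.Escape(ignoreList[i].ToString());
}
}
ignorePattern += "]*";
// Follow every character of the query with the "ignore" class.
string pattern = "";
for (int i = 0; i < query.Length; i++)
{
pattern += Regex.Escape(query[i].ToString()) + ignorePattern;
}
Regex r = new Regex(pattern);
Match m = r.Match(input);
return m.Success ? m.Value : string.Empty;
}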
Here's a non-regex string extension option:
public static class StringExtensions
{
public static bool SubstringSearch(this string s, string value, char[] ignoreChars, out string result)
{
if (String.IsNullOrEmpty(value))
throw new ArgumentException("Search value cannot be null or empty.", "value");
bool found = false;
int matches = 0;
int startIndex = -1;
int length = 0;
for (int i = 0; i < s.Length && !found; i++)
{
if (startIndex == -1)
{
if (s[i] == value[0])
{
startIndex = i;
++matches;
++length;
}
}
else
{
if (s[i] == value[matches])
{
++matches;
++length;
}
else if (ignoreChars != null && ignoreChars.Contains(s[i]))
{
++length;
}
else
{
startIndex = -1;
matches = 0;
length = 0;
}
}
found = (matches == value.Length);
}
if (found)
{
result = s.Substring(startIndex, length);
}
else
{
result = null;
}
return found;
}
}
EDIT: here's an updated solution addressing the points in your recent update. The idea is the same, except that a single-word substring gets the ignore pattern inserted between each of its characters. If the substring contains spaces, it is split on the spaces and the ignore pattern is inserted between the words. If you don't need the latter behavior (which was more in line with your original question), you can remove the Split and the if check that chooses between the two patterns.
Note that this approach is not going to be the most efficient.
string input = @"foo ?A&3/3/C)412& bar A341C2";
string substring = "A41";
string[] ignoredChars = { "&", "/", "3", "C", ")" };
// builds up the ignored pattern and ensures a dash char is placed at the end to avoid unintended ranges
string ignoredPattern = String.Concat("[",
String.Join("", ignoredChars.Where(c => c != "-")
.Select(c => Regex.Escape(c)).ToArray()),
(ignoredChars.Contains("-") ? "-" : ""),
"]*?");
string[] substrings = substring.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
string pattern = "";
if (substrings.Length > 1)
{
pattern = String.Join(ignoredPattern, substrings);
}
else
{
pattern = String.Join(ignoredPattern, substring.Select(c => c.ToString()).ToArray());
}
foreach (Match match in Regex.Matches(input, pattern))
{
Console.WriteLine("Index: {0} -- Match: {1}", match.Index, match.Value);
}
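The pattern-building idea can also be wrapped in a reusable function and checked against the three examples from the question. This is only a sketch under some assumptions: it targets C# 9 or later (top-level statements), FindIgnoring is a made-up name, and instead of Regex.Escape it backslash-escapes just the characters that are special inside a character class.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: finds `query` in `input`, allowing any run of ignored
// characters between consecutive query characters. Returns "" when not found.
static string FindIgnoring(string input, string query, char[] ignoreChars)
{
    string ignoreClass = "";
    if (ignoreChars.Length > 0)
    {
        var cls = new StringBuilder("[");
        foreach (char c in ignoreChars)
        {
            // Escape the characters that have a special meaning inside [...].
            if (c == '\\' || c == ']' || c == '[' || c == '^' || c == '-')
                cls.Append('\\');
            cls.Append(c);
        }
        ignoreClass = cls.Append("]*").ToString();
    }

    // Escape each query character and join them with the ignore class.
    string pattern = string.Join(ignoreClass,
        query.Select(c => Regex.Escape(c.ToString())));

    Match m = Regex.Match(input, pattern);
    return m.Success ? m.Value : string.Empty;
}

char[] ignored = { '&', '/', '3', 'C', ')' };
Console.WriteLine(FindIgnoring("Hello, -this- is a string", "Hello this", new[] { ',', '-' }));
Console.WriteLine(FindIgnoring("?A&3/3/C)412&", "A41", ignored));
Console.WriteLine(FindIgnoring("?A&3/3/C)412&", "A3C412&", ignored));
// Hello, -this
// A&3/3/C)41
// A&3/3/C)412&
```

The third call works even though the query itself contains ignored characters, because the regex engine backtracks until each query character lines up with an occurrence in the input.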
Try this solution out:
string input = "Hello, -this- is a string";
string[] searchStrings = { "Hello", "this" };
string pattern = String.Join(@"\W+", searchStrings);
foreach (Match match in Regex.Matches(input, pattern))
{
Console.WriteLine(match.Value);
}
The \W+ will match one or more non-word characters (anything other than letters, digits, and the underscore). If you feel like specifying them yourself, you can replace it with a character class of the characters to ignore, such as [ ,.-]+ (always place the dash character at the start or end to avoid unintended range specifications). Also, if you need case to be ignored, use RegexOptions.IgnoreCase:
Regex.Matches(input, pattern, RegexOptions.IgnoreCase)
If your substring is in the form of a complete string, such as "Hello this", you can easily get it into the array form used for searchStrings in this way:
string[] searchStrings = substring.Split(new[] { ' ' },
StringSplitOptions.RemoveEmptyEntries);
This code will do what you want, although I suggest you modify it to fit your needs better:
string resultString = null;
try
{
resultString = Regex.Match(subjectString, "Hello[, -]*this", RegexOptions.IgnoreCase).Value;
}
catch (ArgumentException ex)
{
// Syntax error in the regular expression
}
You could do this with a single Regex but it would be quite tedious as after every character you would need to test for zero or more ignored characters. It is probably easier to strip all the ignored characters with Regex.Replace(subject, "[-,]", ""); then test if the substring is there.
Or the single Regex way
Regex.IsMatch(subject, "H[-,]*e[-,]*l[-,]*l[-,]*o[-,]* [-,]*t[-,]*h[-,]*i[-,]*s[-,]*")
Here's a non-regex way to do it using string parsing.
private string GetSubstring()
{
string searchString = "Hello, -this- is a string";
string searchStringWithoutUnwantedChars = searchString.Replace(",", "").Replace("-", "");
string desiredString = string.Empty;
if(searchStringWithoutUnwantedChars.Contains("Hello this"))
desiredString = searchString.Substring(searchString.IndexOf("Hello"), searchString.IndexOf("this") + 4 - searchString.IndexOf("Hello")); // the second argument is a length, not an end index
return desiredString;
}
You could do something like this, since almost all of these answers require rebuilding the string in some form.
myString is the string you want to look through
//Create a List<string> that contains the ignored characters
List<string> ignoredCharacters = new List<string>();
//Add all of the characters you wish to ignore in the method you choose
//Use a function here to get a return
public bool subStringExist(List<string> ignoredCharacters, string myString, string toMatch)
{
//Copy Your string to a temp
string tempString = myString;
bool match = false;
//Replace Everything that you don't want
foreach (string item in ignoredCharacters)
{
tempString = tempString.Replace(item, "");
}
//Check if your substring exist
if (tempString.Contains(toMatch))
{
match = true;
}
return match;
}
You could always use a combination of RegEx and string searching
public class RegExpression {
public static void Example(string input, string ignore, string find)
{
string output = string.Format("Input: {1}{0}Ignore: {2}{0}Find: {3}{0}{0}", Environment.NewLine, input, ignore, find);
if (SanitizeText(input, ignore).ToString().Contains(SanitizeText(find, ignore)))
Console.WriteLine(output + "was matched");
else
Console.WriteLine(output + "was NOT matched");
Console.WriteLine();
}
public static string SanitizeText(string input, string ignore)
{
Regex reg = new Regex("[^" + ignore + "]");
StringBuilder newInput = new StringBuilder();
foreach (Match m in reg.Matches(input))
{
newInput.Append(m.Value);
}
return newInput.ToString();
}
}
Usage would be like
RegExpression.Example("Hello, -this- is a string", "-,", "Hello this"); //Should match
RegExpression.Example("Hello, -this- is a string", "-,", "Hello this2"); //Should not match
RegExpression.Example("?A&3/3/C)412&", "&/3C\\)", "A41"); // Should match
RegExpression.Example("?A&3/3/C) 412&", "&/3C\\)", "A41"); // Should not match
RegExpression.Example("?A&3/3/C)412&", "&/3C\\)", "A3C412&"); // Should match
Output
Input: Hello, -this- is a string
Ignore: -,
Find: Hello this
was matched
Input: Hello, -this- is a string
Ignore: -,
Find: Hello this2
was NOT matched
Input: ?A&3/3/C)412&
Ignore: &/3C)
Find: A41
was matched
Input: ?A&3/3/C) 412&
Ignore: &/3C)
Find: A41
was NOT matched
Input: ?A&3/3/C)412&
Ignore: &/3C)
Find: A3C412&
was matched
