I have regex patterns like the ones below:
Regex rx1 = new Regex(@"<div>/\*(.(?!\*/))*\*/(</div>|<br/></div>|<br></div>)");
Regex rx2 = new Regex(@"/\*[^>]+?\*/(<br/>|<br>)");
Regex rx3 = new Regex(@"/\*[^>]+?\*/");
Can anybody help me combine these regexes into one pattern?
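Your problem with rx1 comes from (.(?!\*/))*\*/. The group matches any character, zero or more times, as long as that character is not followed by */. The character immediately before the closing */ is always followed by */, so it can never be consumed, and the pattern cannot match a comment that has any content.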
UPDATED Answer
@"(?'div'<div>)?/\*((?<!\*/).)*?\*/(?:<br/?>)?(?'-div'</div>)?(?(div)(?!))"
This will capture:
(?'div'<div>)? an optional opening <div>, stored in capture group `div`
/\* the char sequence `/*`
((?<!\*/).)*? zero or more characters, non-greedy, where each character is not preceded by the string `*/`
\*/ the char sequence `*/`
(?:<br/?>)? optionally <br> or <br/>
(?'-div'</div>)? optionally </div>, which removes one capture from capture group `div`
(?(div)(?!)) match only if capture group `div` is empty (i.e. <div> and </div> are balanced)
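As a rough check of the breakdown above, the following sketch runs the combined pattern against a few sample strings. The class name `CommentMatcher`, the helper `FirstMatch`, and the sample inputs are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentMatcher
{
    // The combined pattern from the answer, written as a C# verbatim string.
    public static readonly Regex Combined = new Regex(
        @"(?'div'<div>)?/\*((?<!\*/).)*?\*/(?:<br/?>)?(?'-div'</div>)?(?(div)(?!))");

    // Hypothetical helper: returns the text of the first match, or null when nothing matches.
    public static string FirstMatch(string input)
    {
        Match m = Combined.Match(input);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        string[] samples =
        {
            "<div>/*...*/</div>",
            "<div>/*...*/<br/></div>",
            "/*...*/<br>",
            "/*...*/",
            "<div>/*...*/" // unbalanced <div>: only the comment itself is expected to match
        };
        foreach (string s in samples)
        {
            Console.WriteLine("{0} -> {1}", s, FirstMatch(s));
        }
    }
}
```

In the unbalanced case the conditional `(?(div)(?!))` rejects the attempt that includes `<div>`, so the engine falls back to matching the bare comment.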
I think you need this for combining the patterns:
(pattern1|pattern2|pattern3) means pattern1 or pattern2 or pattern3
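A minimal sketch of that alternation, assuming the two simpler patterns from the question (rx2 and rx3). The `Combine` helper and the sample input are hypothetical. Each pattern is wrapped in a non-capturing group so the `|` applies to the whole pattern, and the more specific pattern goes first because the engine tries alternatives from left to right.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Hypothetical helper: wraps each pattern in (?:...) and joins them with '|'.
    public static Regex Combine(params string[] patterns)
    {
        string[] wrapped = Array.ConvertAll(patterns, p => "(?:" + p + ")");
        return new Regex(string.Join("|", wrapped));
    }

    public static void Main()
    {
        // The pattern that also consumes a trailing <br> comes before the bare comment pattern.
        Regex rx = Combine(@"/\*[^>]+?\*/(<br/>|<br>)", @"/\*[^>]+?\*/");
        foreach (Match m in rx.Matches("/*one*/<br> and /*two*/"))
        {
            Console.WriteLine(m.Value);
        }
    }
}
```

If the order were reversed, the bare comment pattern would win every time and the trailing `<br>` would never be part of a match.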
Try the following. It's Frankenstein code, but it lets you manage each regex as its own variable instead of concatenating all three into one big regex (which is not wrong, but can make the regex hard to change):
CODE:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
namespace BatchRegex
{
class Program
{
static void Main(string[] args)
{
string[] target =
{
"<div>/*...*/</div> <div>/*...*/<br></div> <div>/*...*/<br></div>",
"/*...*/<br></div> or /*...*/<br/></div>"
};
foreach (var tgt in target)
{
var rx1 = new Regex[]{new Regex(@"<div>/\*(.(?!\*/))*\*/(</div>|<br/></div>|<br></div>)", RegexOptions.Multiline),
new Regex(@"/\*[^>]+?\*/(<br/>|<br>)", RegexOptions.Multiline),
new Regex(@"/\*[^>]+?\*/", RegexOptions.Multiline)};
foreach (var rgx in rx1)
{
var rgxMatches = rgx.Matches(tgt).Cast<Match>();
Parallel.ForEach(rgxMatches, match =>
{
Console.WriteLine("Found {0} in target {1}.", match, tgt);
});
}
}
Console.Write("Press any key to exit...");
Console.ReadKey();
}
}
}
I want to extract emoji id from the input.
For example, inputs:
`<eid=1> valid get 1`
`<eid = > invalid `
`<exd = 1> invalid`
`< eid = 1000> valid get 1000`
I know how to match those strings, but I have no idea how to extract the ids from the matched strings.
Use regex
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string[] inputs = {
"<eid=1>",
"<eid = >",
"<exd = 1>",
"< eid = 1000>"
};
string pattern = @"\<\s*eid\s*=\s*(?'number'\d+)\s*\>";
foreach (string input in inputs)
{
Match match = Regex.Match(input, pattern);
if (match.Success)
{
Console.WriteLine("input : '{0}' Does Match, number = '{1}'", input, match.Groups["number"]);
}
else
{
Console.WriteLine("input : '{0}' Does not Match", input);
}
}
Console.ReadLine();
}
}
}
You can do something like this. If you don't want to store each item in an array (e.g. you have HTML code), you can store all the values as one string and use the following:
var input = @"`<eid=1> valid get 1`
`<eid = > invalid `
`<exd = 1> invalid`
`< eid = 1000> valid get 1000`";
var regex = new Regex(@"(?<open>\=).*?(?<final-open>\>)");
var matches = regex.Matches(input).Cast<Match>().Select(m => m.Groups["final"].Value).Distinct().ToList();
foreach (var match in matches)
{
// here you have all the matches
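// int.TryParse skips values that are empty or not numeric, such as the one from `<eid = >`,
// where int.Parse would throw a FormatException
int id;
if (int.TryParse(match.Trim(), out id))
{
// use id here
}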
}
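In this pattern the open group captures the opening delimiter '=' and the balancing group final-open matches the closing delimiter '>', storing the text between the two in the group final. Note that the pattern does not check the tag name, so it also returns 1 for <exd = 1>:
var regex = new Regex(@"(?<open>\=).*?(?<final-open>\>)");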
You need to understand what is a match, what is a capture and how can one do match captures of specific data.
In the realm of regular expressions there is a difference between a match and a capture and basic grouping.
You want to match the whole value <eid=8> but get the value 8 into a capture. That is done by adding a grouping ( ) to the pattern, which establishes a capture group. A match can hold one or more groups, indexed from 1 to N. Group zero is a special group that is created automatically and is explained below.
So for the data <eid=8>, to group capture the value use this regex <\w+=(\d+)\> (instead of the viable pattern <\w+=\d+\>). The grouping is what puts the number into the match capture group of 1 with a value of 8.
So what are groups exactly?
Groups[0] is always the whole match such as what you see of <eid=8>.
Groups[1] to Groups[N] are the individual captures, one for each ( ) construct specified. So for our example Groups[1].Value is the number 8. Nice, that answers your question.
One can do a named match capture with (?<{name here}>...). By that logic we can change our pattern to <\w+=(?<TheNumbers>\d+)\> and then extract the value with Groups["TheNumbers"].Value, or still with Groups[1].Value.
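The three ways of reading the result can be sketched as follows, using the `<eid=8>` example and the named pattern from the explanation above. The class name `GroupDemo` and the helper `ExtractId` are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    private const string Pattern = @"<\w+=(?<TheNumbers>\d+)>";

    // Hypothetical helper: returns the captured digits, or null when the input does not match.
    public static string ExtractId(string input)
    {
        Match m = Regex.Match(input, Pattern);
        return m.Success ? m.Groups["TheNumbers"].Value : null;
    }

    public static void Main()
    {
        Match m = Regex.Match("<eid=8>", Pattern);
        Console.WriteLine(m.Groups[0].Value);            // the whole match: <eid=8>
        Console.WriteLine(m.Groups[1].Value);            // the capture group by index: 8
        Console.WriteLine(m.Groups["TheNumbers"].Value); // the same group by name: 8
    }
}
```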
I am currently writing a program that will check a line from a file and see if a US state is contained in that line and that it is also spelled correctly. I currently have it working for a single state. I would like to be able to check whether any of the US states is in the line.
This is my code so far below
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
using System.IO;
namespace ConsoleApplication3
{
class Program
{
static void Main(string[] args)
{
var r = new Regex(@"\bArizona\b", RegexOptions.IgnoreCase);
string[] lines = File.ReadAllLines(@"C:\sampledata.dat");
foreach (string s in lines)
{
var m = r.Match(s);
Console.WriteLine(m.Success); // false
Console.WriteLine(s);
Console.ReadLine();
}
}
}
}
Essentially I would like to do something like
var r = new Regex(@"\bAll US States.txt\b", RegexOptions.IgnoreCase);
If you wanted to see if any of the states were contained, you could essentially use the string.Join() method to generate an expression that would match any of them:
// Read your lines
var states = File.ReadAllLines(@"C:\states.txt");
// Build an alternation (e.g. \bState1\b|\bState2\b ...) with word boundaries around each name.
// The separator must be a verbatim string: in a regular string literal, "\b" is a backspace character.
var alternatives = string.Join(@"\b|\b", states);
var pattern = $@"\b{alternatives}\b";
// Now use that pattern to build a Regex to check against (using C# string interpolation)
var regex = new Regex(pattern, RegexOptions.IgnoreCase);
// Now loop through your data here and check each line
string[] lines = File.ReadAllLines(@"C:\sampledata.dat");
foreach (string s in lines)
{
var m = regex.Match(s);
Console.WriteLine(m.Success); // false
Console.WriteLine(s);
Console.ReadLine();
}
Additionally, if you aren't able to use string interpolation to build your pattern, simply use the older string.Format() approach:
var pattern = string.Format(@"\b{0}\b", string.Join(@"\b|\b", states));
The regex you're looking for is
\b(?:Alabama|Alaska|Arizona|...)\b
To compose it from a list stored in an All US States.txt file use File.ReadAllLines:
var states = File.ReadAllLines(@"All US States.txt");
var r = new Regex(string.Format(@"\b(?:{0})\b", string.Join("|", states)), RegexOptions.IgnoreCase);
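Here is a small sketch of the same idea with an in-memory list instead of a file, so it can be run without any data on disk. The `StateMatcher` class, the `Build` helper, and the two sample names are assumptions for illustration. The sketch also skips blank lines and escapes each name with `Regex.Escape`, which the original snippet does not do. An empty alternative would match at every word boundary, so a stray blank line in the file would make every line look like a hit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class StateMatcher
{
    // Hypothetical helper: builds \b(?:Name1|Name2|...)\b from a list of names.
    public static Regex Build(IEnumerable<string> names)
    {
        IEnumerable<string> alternatives = names
            .Where(n => !string.IsNullOrWhiteSpace(n)) // ignore blank lines
            .Select(n => Regex.Escape(n.Trim()));      // treat each name as literal text
        string pattern = @"\b(?:" + string.Join("|", alternatives) + @")\b";
        return new Regex(pattern, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        Regex r = Build(new[] { "Arizona", "New York", "" });
        Console.WriteLine(r.IsMatch("I moved to new york last year")); // True
        Console.WriteLine(r.IsMatch("She is an Arizonan"));            // False
    }
}
```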
I have a string which contains some functions (I know their names) and their parameters like this:
translate(700 210) rotate(-30)
I would like to parse each one of them in a string array starting with the function name followed by the parameters.
I don't know much about regex and so far I've got this:
MatchCollection matches = Regex.Matches(attribute.InnerText, @"((translate|rotate|scale|matrix)\s*\(\s*(-?\d+\s*\,*\s*)+\))*");
for (int i = 0; i < matches.Count; i++)
{
Console.WriteLine(matches[i].Value);
}
What this returns is:
translate(700 210)
[blank space]
rotate(-30)
[blank space]
This works for me because I can run another regular expression on each row from the resulting collection and get the contents. What I don't understand is why there are blank rows returned between the methods.
Also, is running a regex twice - once to separate the methods and once to actually parse them a good approach?
Thanks!
Regex.Matches will match your entire regular expression multiple times. It finds one match for the whole thing, then finds the next match for the whole thing.
The outermost parens with * indicate that you're willing to accept zero or more of the preceding group's contents as a match. So when it finds none of them, it happily returns that. That is not your intent. You want exactly one.
The blanks are harmless, but "zero or more" also includes two. Consider this string, with no space between the two functions:
var text = "translate(700 210)rotate(-30)";
That's one match, according to the regex you provided. You'll get "rotate" and "-30". If the missing space is an error, detect it and warn the user. If you're not going to do that, parse it correctly.
So let's get rid of the outermost parens and that *. We'll also name the capturing groups, for readability.
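var matches = Regex.Matches(text, @"(?<funcName>translate|rotate|scale|matrix)\s*\(\s*(?<param>-?\s*\d+\s*\,*\s*)+\)");
foreach (Match match in matches)
{
if (match.Groups["funcName"].Success)
{
var funcName = match.Groups["funcName"].Value;
// The param group repeats, so read every capture; Groups["param"].Value alone holds only the last one.
// Cast and Select require using System.Linq.
var parameters = match.Groups["param"].Captures.Cast<Capture>().Select(c => c.Value.Trim().TrimEnd(',').TrimEnd());
Console.WriteLine($"{funcName}( {string.Join(", ", parameters)} )");
}
}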
I also stuck in \s* after the optional -, just in case.
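The effect of the outer `( ... )*` can be checked by counting zero-length matches. The sketch below compares the pattern from the question with the same pattern without the outer group. The class name `EmptyMatchDemo` and the helper `CountEmpty` are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EmptyMatchDemo
{
    // The pattern from the question: the outer ( ... )* also accepts zero repetitions.
    public const string Original =
        @"((translate|rotate|scale|matrix)\s*\(\s*(-?\d+\s*\,*\s*)+\))*";

    // The same pattern without the outer group and its *.
    public const string Fixed =
        @"(translate|rotate|scale|matrix)\s*\(\s*(-?\d+\s*\,*\s*)+\)";

    // Hypothetical helper: counts the matches that have length zero.
    public static int CountEmpty(string pattern, string text)
    {
        return Regex.Matches(text, pattern).Cast<Match>().Count(m => m.Length == 0);
    }

    public static void Main()
    {
        string text = "translate(700 210) rotate(-30)";
        Console.WriteLine(CountEmpty(Original, text)); // 2, the blank rows from the question
        Console.WriteLine(CountEmpty(Fixed, text));    // 0
    }
}
```

The two empty matches sit at the space between the functions and at the end of the string, which are the positions where the outer group matches zero times.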
I like using Regex with a dictionary
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication56
{
class Program
{
static void Main(string[] args)
{
Dictionary<string, string> dict = new Dictionary<string, string>();
string input = "translate(700 210) rotate(-30)";
string pattern = @"(?'command'[^\(]+)\((?'value'[^\)]+)\)";
MatchCollection matches = Regex.Matches(input, pattern);
foreach(Match match in matches.Cast<Match>())
{
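// Trim the command: for every function after the first, [^\(]+ also picks up the preceding space
dict.Add(match.Groups["command"].Value.Trim(), match.Groups["value"].Value);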
}
}
}
}
How to replace a string between some specific strings in C#
For example:
string temp = "I love ***apple***";
I need to get the value between the "***" strings, i.e. "apple".
I have tried IndexOf, but it only gives me the first index of the selected value.
You should use a regex so that it works for other values such as ***banana*** or ***nut***; the code below may be useful for that. It covers both replacing and extracting the values between *** and ***.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
namespace RegexReplaceTest
{
class Program
{
static void Main(string[] args)
{
string temp = "I love ***apple***, and also I love ***banana***.";
//This is for replacing the values with specific value.
string result = Regex.Replace(temp, @"\*\*\*[a-z]*\*\*\*", "Replacement", RegexOptions.IgnoreCase);
Console.WriteLine("Replacement output:");
Console.WriteLine(result);
//This is for extracting the values
Regex matchValues = new Regex(@"\*\*\*([a-z]*)\*\*\*", RegexOptions.IgnoreCase);
MatchCollection matches = matchValues.Matches(temp);
List<string> matchResult = new List<string>();
foreach (Match match in matches)
{
matchResult.Add(match.Value);
}
Console.WriteLine("Values with *s:");
Console.WriteLine(string.Join(",", matchResult));
Console.WriteLine("Values without *s:");
Console.WriteLine(string.Join(",", matchResult.Select(x => x.Trim('*'))));
}
}
}
And a working example is here: http://ideone.com/FpKaMA
Hope this example helps you with your issue.
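Since the extraction pattern already has a capture group around the value, an alternative to trimming the asterisks afterwards is to read `Groups[1]` directly. This is only a sketch; the class name `StarExtractor` and the helper `Extract` are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class StarExtractor
{
    // Hypothetical helper: returns the values found between *** and ***, without the asterisks.
    public static List<string> Extract(string text)
    {
        return Regex.Matches(text, @"\*\*\*([a-z]*)\*\*\*", RegexOptions.IgnoreCase)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value) // group 1 is the part inside the parentheses
            .ToList();
    }

    public static void Main()
    {
        string temp = "I love ***apple***, and also I love ***banana***.";
        Console.WriteLine(string.Join(",", Extract(temp))); // apple,banana
    }
}
```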
I want to get this bold part from this string:
some other code src='/pages/captcha?t=c&s=**51afb384edfc&h=513cc6f5349b**' `</td><td><input type=text name=captchaenter id=captchaenter size=3`
This is my regex that is not working:
Regex("src=\\'/pages/captcha\\?t=c&s=([\\d\\w&=]+)\\'", RegexOptions.IgnoreCase)
In tool for regex testing it's working.
How can this be fixed?
Your string-based regex is different from the regex you tested in the tool. In your regex, you have [\d\w\W]+, which matches any character and is greedy (i.e. there is no ? after the + to make it lazy). So it may match a very long string, possibly all the way up to the last closing quote.
In your tool you have [\d\w&=] which only matches digits, letters, & and =, so obviously it will stop when hitting the end quote.
The regexes aren't the same. The one in code has a character class ([\\d\\w\\W]+) that is different from the one in the tool ([\\d\\w&=]+).
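The difference only shows up when another single quote appears later in the input. The sketch below uses a made-up input with a second quoted attribute (`alt='x'` is an assumption, not from the question) and compares the greedy any-character class, the restricted class, and a lazy variant. The class name `GreedyDemo` and the helper `Capture` are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Hypothetical helper: plugs a quantified character class into the pattern and returns group 1.
    public static string Capture(string quantifiedClass, string input)
    {
        var rx = new Regex(
            @"src='/pages/captcha\?t=c&s=(" + quantifiedClass + @")'",
            RegexOptions.IgnoreCase);
        Match m = rx.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Made-up input with a second quoted attribute after the one we want.
        string html = "src='/pages/captcha?t=c&s=abc&h=def' alt='x'";

        Console.WriteLine(Capture(@"[\d\w\W]+", html));  // abc&h=def' alt='x  (runs to the last quote)
        Console.WriteLine(Capture(@"[\d\w&=]+", html));  // abc&h=def
        Console.WriteLine(Capture(@"[\d\w\W]+?", html)); // abc&h=def
    }
}
```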
Works perfectly fine with this code
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string s = "src='/pages/captcha?t=c&s=51afb384edfc&h=513cc6f5349b' </td><td><input type=text name=captchaenter id=captchaenter size=3";
Regex rgx = new Regex("src=\\'/pages/captcha\\?t=c&s=([\\d\\w\\W]+)\\'", RegexOptions.IgnoreCase);
Match m = rgx.Match(s);
Console.Write(m.Groups[1]);
}
}
}
It outputs
51afb384edfc&h=513cc6f5349b
I despise regular expressions. I would do it similar to (but safer than) this:
private static string GetStuff(string source)
{
var start = source.IndexOf("s=") + 2;
var end = source.IndexOf('\'', start + 3);
return source.Substring(start, end - start);
}
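One way the "safer" version might look is sketched below. It searches for the marker `&s=` instead of `s=`, which is an assumption about the input format, and returns null instead of throwing when the marker or the closing quote is missing. The class name `CaptchaParser` is made up for illustration.

```csharp
using System;

public static class CaptchaParser
{
    // Returns the text between "&s=" and the next single quote, or null if either is missing.
    // The "&s=" marker is an assumption; the original snippet searches for "s=".
    public static string GetStuff(string source)
    {
        const string marker = "&s=";
        int markerPos = source.IndexOf(marker, StringComparison.Ordinal);
        if (markerPos < 0)
        {
            return null; // marker not found
        }

        int start = markerPos + marker.Length;
        int end = source.IndexOf('\'', start);
        if (end < 0)
        {
            return null; // no closing quote after the marker
        }

        return source.Substring(start, end - start);
    }

    public static void Main()
    {
        string s = "src='/pages/captcha?t=c&s=51afb384edfc&h=513cc6f5349b' </td><td><input type=text name=captchaenter id=captchaenter size=3";
        Console.WriteLine(GetStuff(s)); // 51afb384edfc&h=513cc6f5349b
    }
}
```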