I've run into a string-parsing problem and want to solve it with a regular expression.
The input always has the form: %function_name%(IN: param1, ..., paramN; OUT: param1,..., paramN)
I wrote this pattern:
string pattern = @"[A-za-z][A-za-z0-9]*\(IN:\s*(([A-za-z][A-za-z0-9](,|;))+|;)\s*OUT:(\s*[A-za-z][A-za-z0-9],?)*\)";
This pattern matches my input strings, but what I actually want as output is two arrays of strings. One must contain the input parameters (those after "IN:") and the other must contain the names of the output parameters. Parameter names can contain digits and '_'.
A few examples of real input strings:
Add_func(IN: port_0, in_port_1; OUT: out_port99)
Some_func(IN:;OUT: abc_P1)
Some_func2(IN: input_portA;OUT:)
Please tell me how to write a correct pattern.
You can use this pattern, which captures every function together with its individual parameters in one pass:
(?<funcName>\w+)\(IN: ?|OUT: ?|\G(?<inParam>[^,;()]+)?(?=[^)(;]*;)\s*[,;]\s*|\G(?<outParam>[^,()]+)(?=[^;]*\s*\))\s*[,)]\s*
Pattern details:
(?<funcName>\w+)\(IN: ? # capture the function name and match "(IN: "
| # OR
OUT: ? # match "OUT: "
| # OR
\G(?<inParam>[^,;()]+)? # contiguous match that captures an IN param
(?=[^)(;]*;) # check that it is always followed by ";"
\s*[,;]\s* # match "," or ";" (to be always contiguous)
| # OR
\G(?<outParam>[^,()]+)? # contiguous match that captures an OUT param
(?=[^;]*\s*\)) # check that it is always followed by ")"
\s*[,)]\s* # match "," (to be always contiguous) or ")"
(For a cleaner result, iterate over the matches with a foreach and skip the empty entries.)
Example code:
static void Main(string[] args)
{
    string subject = @"Add_func(IN: port_0, in_port_1; OUT: out_port99)
Some_func(IN:;OUT: abc_P1)
shift_data(IN:po1_p0;OUT: po1_p1, po1_p2)
Some_func2(IN: input_portA;OUT:)";
    string pattern = @"(?<funcName>\w+)\(IN: ?|OUT: ?|\G(?<inParam>[^,;()]+)?(?=[^)(;]*;)\s*[,;]\s*|\G(?<outParam>[^,()]+)(?=[^;]*\s*\))\s*[,)]\s*";
    Match m = Regex.Match(subject, pattern);
    while (m.Success)
    {
        // Each match fills at most one of the three groups; print whichever is non-empty.
        if (m.Groups["funcName"].ToString() != "")
        {
            Console.WriteLine("\nfunction name: " + m.Groups["funcName"]);
        }
        if (m.Groups["inParam"].ToString() != "")
        {
            Console.WriteLine("IN param: " + m.Groups["inParam"]);
        }
        if (m.Groups["outParam"].ToString() != "")
        {
            Console.WriteLine("OUT param: " + m.Groups["outParam"]);
        }
        m = m.NextMatch();
    }
}
Another way is to match all IN parameters and all OUT parameters as two strings, and then split each of those strings on \s*,\s*
Example:
string pattern = @"(?<funcName>\w+)\(\s*IN:\s*(?<inParams>[^;]*?)\s*;\s*OUT\s*:\s*(?<outParams>[^)]*?)\s*\)";
Match m = Regex.Match(subject, pattern);
while (m.Success)
{
    string functionName = m.Groups["funcName"].ToString();
    // Note: splitting an empty parameter list yields an array with one empty string.
    string[] inParams = Regex.Split(m.Groups["inParams"].ToString(), @"\s*,\s*");
    string[] outParams = Regex.Split(m.Groups["outParams"].ToString(), @"\s*,\s*");
    // Why not construct a "function" object to store all these values
    m = m.NextMatch();
}
The way to do this is with capturing groups. Named capturing groups are the easiest to work with:
// a regex surrounded by parens is a capturing group
// a regex surrounded by (?<name> ... ) is a named capturing group
// here I've tried to surround the relevant parts of the pattern with named groups
var pattern = @"[A-Za-z][A-Za-z0-9_]*\(IN:\s*(((?<inValue>[A-Za-z][A-Za-z0-9_]*)\s*(,|;)\s*)+|;)\s*OUT:(\s*(?<outValue>[A-Za-z][A-Za-z0-9_]*),?)*\)";
// get all the matches. ExplicitCapture is just an optimization which tells the engine that it
// doesn't have to save state for non-named capturing groups
var matches = Regex.Matches(input: input, pattern: pattern, options: RegexOptions.ExplicitCapture)
    // convert from IEnumerable to IEnumerable<Match>
    .Cast<Match>()
    // for each match, select out the captured values
    .Select(m => new {
        // m.Groups["inValue"] gets the named capturing group "inValue"
        // for groups that match multiple times in a single match (as in this case), we access
        // group.Captures, which records each capture of the group. .Cast converts to IEnumerable<T>,
        // at which point we can select out capture.Value, which is the actual captured text
        inValues = m.Groups["inValue"].Captures.Cast<Capture>().Select(c => c.Value).ToArray(),
        outValues = m.Groups["outValue"].Captures.Cast<Capture>().Select(c => c.Value).ToArray()
    })
    .ToArray();
I think this is what you are looking for:
[A-Za-z][A-Za-z0-9_]*\(IN:((?:\s*(?:[A-Za-z][A-Za-z0-9_]*(?:[,;])))+|;)\s*OUT:((?:\s*[A-Za-z][A-Za-z0-9_]*,?)*)\)
There were a few problems with the grouping, and your pattern did not allow for the space between multiple IN parameters. It also did not allow the underscore that appears in your examples.
The pattern above works with all of your examples.
Add_func(IN: port_0, in_port_1; OUT: out_port99) will capture:
port_0, in_port_1 and out_port99
Some_func(IN:;OUT: abc_P1) will capture:
; and abc_P1
Some_func2(IN: input_portA; OUT:) will capture:
input_portA and empty.
After getting these capture groups, you can split them on commas to get your arrays.
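The split step can be sketched as follows. This is only a minimal illustration: the class name SignatureParser, the helper names, and the simplified pattern are assumptions made for the example, and the pattern captures the raw IN and OUT lists without validating the individual parameter names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class SignatureParser
{
    // Illustrative pattern (an assumption, not the pattern from the answer above):
    // it captures the raw IN list and the raw OUT list as two strings.
    static readonly Regex Signature = new Regex(
        @"^\s*[A-Za-z]\w*\(\s*IN:(?<in>[^;()]*);\s*OUT:(?<out>[^;()]*)\)\s*$");

    // Splits "a, b ,c" into { "a", "b", "c" }; an empty list gives an empty array.
    public static string[] SplitParams(string list)
    {
        return list.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
                   .Select(p => p.Trim())
                   .Where(p => p.Length > 0)
                   .ToArray();
    }

    // Returns false (and two empty arrays) when the line does not match.
    public static bool TryParse(string line, out string[] inParams, out string[] outParams)
    {
        Match m = Signature.Match(line);
        inParams = m.Success ? SplitParams(m.Groups["in"].Value) : new string[0];
        outParams = m.Success ? SplitParams(m.Groups["out"].Value) : new string[0];
        return m.Success;
    }

    static void Main()
    {
        string[] ins, outs;
        if (TryParse("Add_func(IN: port_0, in_port_1; OUT: out_port99)", out ins, out outs))
        {
            Console.WriteLine("IN:  " + string.Join(", ", ins));   // IN:  port_0, in_port_1
            Console.WriteLine("OUT: " + string.Join(", ", outs));  // OUT: out_port99
        }
    }
}
```

Using StringSplitOptions.RemoveEmptyEntries means that "IN:;" and "OUT:)" produce empty arrays rather than arrays holding one empty string.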
Related
I'm attempting to replace all instances of any special characters between each occurrence of a set of delimiters in a string. I believe the solution will include some combination of a regular expression match to retrieve the text between each set of delimiters and a regular expression replace to replace each offending character within the match with a space. Here’s what I have so far:
string input = "***XX*123456789~N3*123 E. Fake St. Apt# 456~N4*Beverly Hills*CA*902122405~REF*EI*902122405~HL*1*1*50*0~SBR*P*18*******MA~NM1*IL*1*Tom*Thompson*T***MI*123456789A~N3*456 W. False Ave.*Apt. #6B~N4*Beverly Hills*CA*90210~DMG*";
string matchPattern = "(~N3\\*)(.*?)(~N4\\*)";
string replacePattern = "[^0-9a-zA-Z ]?";
var matches = Regex.Matches(input, matchPattern);
foreach (Match match in matches)
{
match.Value = "~N3*" + Regex.Replace(match.Value, replacePattern, " ") + "~N4*";
}
MessageBox.Show(input);
I would expect the message box to show the following:
"***XX*123456789~N3*123 E Fake St Apt 456~N4*Beverly Hills*CA*902122405~REF*EI*902122405~HL*1*1*50*0~SBR*P*18*******MA~NM1*IL*1*Tom*Thompson*T***MI*123456789A~N3*456 W False Ave *Apt 6B~N4*Beverly Hills*CA*90210~DMG*"
Obviously this isn’t working because I can’t assign to the matched value inside the loop, but I hope you can follow my thought process. It is important that any characters which are not between the delimiters remain unchanged. Any direction or advice would be helpful. Thank you so much!
Use a Regex.Replace with a match evaluator where you may call the second Regex.Replace:
string input = "***XX*123456789~N3*123 E. Fake St. Apt# 456~N4*Beverly Hills*CA*902122405~REF*EI*902122405~HL*1*1*50*0~SBR*P*18*******MA~NM1*IL*1*Tom*Thompson*T***MI*123456789A~N3*456 W. False Ave.*Apt. #6B~N4*Beverly Hills*CA*90210~DMG*";
string matchPattern = @"(~N3\*)(.*?)(~N4\*)";
string replacePattern = "[^0-9a-zA-Z ]";
string res = Regex.Replace(input, matchPattern, m =>
string.Format("{0}{1}{2}",
m.Groups[1].Value,
Regex.Replace(m.Groups[2].Value, replacePattern, " "), // Here, you modify just inside the 1st regex matches
m.Groups[3].Value));
Console.Write(res); // Just to print the demo result
// => ***XX*123456789~N3*123 E  Fake St  Apt  456~N4*Beverly Hills*CA*902122405~REF*EI*902122405~HL*1*1*50*0~SBR*P*18*******MA~NM1*IL*1*Tom*Thompson*T***MI*123456789A~N3*456 W  False Ave  Apt   6B~N4*Beverly Hills*CA*90210~DMG*
Actually, since ~N3* and ~N4* are literal strings, you may use a single capturing group in the pattern and then add those delimiters as hard-coded in the match evaluator, but it is up to you to decide what suits you best.
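The single-group variant could look roughly like this. It is a sketch only: the class name AddressCleaner, the method name Clean, and the shortened sample input are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class AddressCleaner
{
    // Only the text between the delimiters is captured; the literal "~N3*" and
    // "~N4*" delimiters are written back by hand in the match evaluator.
    public static string Clean(string input)
    {
        return Regex.Replace(input, @"~N3\*(.*?)~N4\*", m =>
            "~N3*" + Regex.Replace(m.Groups[1].Value, "[^0-9a-zA-Z ]", " ") + "~N4*");
    }

    static void Main()
    {
        // Shortened, made-up sample input.
        Console.WriteLine(Clean("A*1~N3*12 Main St.#4~N4*City*CA~X*"));
        // prints: A*1~N3*12 Main St  4~N4*City*CA~X*
    }
}
```

Each offending character becomes exactly one space, so a character that already sits next to a space leaves two spaces in a row; text outside the delimiters is untouched.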
I've got an input string that looks like this:
level=<device[195].level>&name=<device[195].name>
I want to create a RegEx that will parse out each of the <device> tags, for example, I'd expect two items to be matched from my input string: <device[195].level> and <device[195].name>.
So far I've had some luck with this pattern and code, but it always finds both of the device tags as a single match:
var pattern = "<device\\[[0-9]*\\]\\.\\S*>";
Regex rgx = new Regex(pattern);
var matches = rgx.Matches(httpData);
The result is that matches will contain a single result with the value <device[195].level>&name=<device[195].name>
I'm guessing there must be a way to 'terminate' the pattern, but I'm not sure what it is.
Use non-greedy quantifiers:
<device\[\d+\]\.\S+?>
Also, use verbatim strings for escaping regexes, it makes them much more readable:
var pattern = @"<device\[\d+\]\.\S+?>";
As a side note, I guess in your case using \w instead of \S would be more in line with what you intended, but I left the \S because I can't know that.
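To see the difference between the greedy and the non-greedy quantifier on the input from the question, here is a small sketch; the class name GreedyVsLazy and the helper FindAll are hypothetical names used only for this example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class GreedyVsLazy
{
    // Returns the text of every match of the pattern in the input.
    public static string[] FindAll(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();
    }

    static void Main()
    {
        string httpData = "level=<device[195].level>&name=<device[195].name>";

        // Greedy \S* runs to the last '>' it can reach: one long match.
        Console.WriteLine(FindAll(httpData, @"<device\[\d+\]\.\S*>").Length);   // 1

        // Lazy \S+? stops at the first '>' that completes a match: two matches.
        foreach (string tag in FindAll(httpData, @"<device\[\d+\]\.\S+?>"))
        {
            Console.WriteLine(tag);
        }
        // <device[195].level>
        // <device[195].name>
    }
}
```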
It depends on how much of the structure of the angle-bracket blocks you need to match, but you can do
"\\<device.+?\\>"
I want to create a RegEx that will parse out each of the <device> tags
I'd expect two items to be matched from my input string:
1. <device[195].level>
2. <device[195].name>
This should work. Get the matched group from index 1
(<device[^>]*>)
String literals for use in programs:
@"(<device[^>]*>)"
Change your repetition operator and use \w instead of \S
var pattern = @"<device\[[0-9]+\]\.\w+>";
String s = @"level=<device[195].level>&name=<device[195].name>";
foreach (Match m in Regex.Matches(s, @"<device\[[0-9]+\]\.\w+>"))
    Console.WriteLine(m.Value);
Output
<device[195].level>
<device[195].name>
Use named match groups and create a LINQ entity projection. There will be two matches, which separates the individual items:
string data = "level=<device[195].level>&name=<device[195].name>";
string pattern = @"
(?<variable>[^=]+) # get the variable name
(?:=<device\[) # static '=<device'
(?<index>[^\]]+) # device number index
(?:]\.) # static ].
(?<sub>[^>]+) # Get the sub command
(?:>&?) # Match but don't capture the > and possible &
";
// IgnorePatternWhitespace only lets us lay out and comment the pattern; it does not affect matching.
var items = Regex.Matches(data, pattern, RegexOptions.IgnorePatternWhitespace)
.OfType<Match>()
.Select (mt => new
{
Variable = mt.Groups["variable"].Value,
Index = mt.Groups["index"].Value,
Sub = mt.Groups["sub"].Value
})
.ToList();
items.ForEach(itm => Console.WriteLine ("{0}:{1}:{2}", itm.Variable, itm.Index, itm.Sub));
/* Output
level:195:level
name:195:name
*/
I have the following string:
String myNarrative = "ID: 4393433 This is the best narration";
I want to split this into 2 strings:
myId = "ID: 4393433";
myDesc = "This is the best narration";
How do I do this in Regex.Split()?
Thanks for your help.
If it is a fixed format as shown, use Regex.Match with Capturing Groups (see Matched Subexpressions). Split is useful for dividing up a repeating sequence with unbound multiplicity; the input does not represent such a sequence but rather a fixed set of fields/values.
var m = Regex.Match(inp, @"ID:\s+(\d+)\s+(.*)");
if (m.Success) {
var number = m.Groups[1].Value;
var rest = m.Groups[2].Value;
} else {
// Failed to match.
}
Alternatively, one could use Named Groups and have a read through the Regular Expression Language quick-reference.
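A named-group version might look like the sketch below. The class name NarrativeParser, the method TrySplit, and the group names id and desc are assumptions chosen for the example; the first group keeps the "ID: " prefix so the result matches the two strings asked for in the question.

```csharp
using System;
using System.Text.RegularExpressions;

static class NarrativeParser
{
    // Splits "ID: <digits> <free text>" into the ID part and the description.
    // Returns false (and two empty strings) when the input has another shape.
    public static bool TrySplit(string narrative, out string id, out string desc)
    {
        Match m = Regex.Match(narrative, @"^(?<id>ID:\s*\d+)\s+(?<desc>.*)$");
        id = m.Success ? m.Groups["id"].Value : "";
        desc = m.Success ? m.Groups["desc"].Value : "";
        return m.Success;
    }

    static void Main()
    {
        string myId, myDesc;
        if (TrySplit("ID: 4393433 This is the best narration", out myId, out myDesc))
        {
            Console.WriteLine(myId);    // ID: 4393433
            Console.WriteLine(myDesc);  // This is the best narration
        }
    }
}
```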
I have the following string that would require me to parse it via Regex in C#.
Format: rec_mnd.rate.current_rate.sum.QWD.RET : 214345
I would like to extract the bold characters as Group objects in a GroupCollection.
QWD = 1 group
RET = 1 group
214345 = 1 group
What would the pattern look like?
It would be something like this:
string s = "Format: rec_mnd.rate.current_rate.sum.QWD.RET : 214345";
Match m = Regex.Match(s, @"^Format: rec_mnd\.rate\.current_rate\.sum\.(.+?)\.(.+?) : (\d+)$");
if( m.Success )
{
Console.WriteLine(m.Groups[1].Value);
Console.WriteLine(m.Groups[2].Value);
Console.WriteLine(m.Groups[3].Value);
}
The question mark in the first two groups makes that quantifier lazy: it captures the smallest possible number of characters. In other words, it captures until the first . it sees. Alternatively, you could use ([^.]+) in those groups, which explicitly captures everything except a period.
The last group explicitly only captures decimal digits. If your expression can have other values on the right side of the : you'd have to change that to .+ as well.
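For comparison, here is the same match written with the ([^.]+) alternative. This is a sketch; the class name RateLineParser and the method Extract are hypothetical names introduced for the example.

```csharp
using System;
using System.Text.RegularExpressions;

static class RateLineParser
{
    // Returns the three captured values, or an empty array when the line does not match.
    public static string[] Extract(string line)
    {
        Match m = Regex.Match(line,
            @"^Format: rec_mnd\.rate\.current_rate\.sum\.([^.]+)\.([^.]+) : (\d+)$");
        if (!m.Success)
        {
            return new string[0];
        }
        return new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value };
    }

    static void Main()
    {
        string s = "Format: rec_mnd.rate.current_rate.sum.QWD.RET : 214345";
        foreach (string value in Extract(s))
        {
            Console.WriteLine(value);
        }
        // QWD
        // RET
        // 214345
    }
}
```

With [^.]+ the groups cannot run across a period, so the result does not depend on lazy matching.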
Please, make it a lot easier on yourself and label your groups to make it easier to understand what is going on in code.
Regex myRegex = new Regex(@"rec_mnd\.rate\.current_rate\.sum\.(?<code>[A-Z]{3})\.(?<subCode>[A-Z]{3})\s*:\s*(?<number>\d+)");
var matches = myRegex.Matches(sourceString);
foreach(Match match in matches)
{
//do stuff
Console.WriteLine("Match");
Console.WriteLine("Code: " + match.Groups["code"].Value);
Console.WriteLine("SubCode: " + match.Groups["subCode"].Value);
Console.WriteLine("Number: " + match.Groups["number"].Value);
}
This should give you what you want regardless of what's between the .'s.
@"(?:.+\.){4}(.\w+)\.(\w+)\s?:\s?(\d+)"
I need to match the following strings and returns the values as groups:
abctic
abctac
xyztic
xyztac
ghhtic
ghhtac
The pattern I wrote, with grouping, is as follows:
(?<arch>[abc,xyz,ghh])(?<flavor>[tic,tac]$)
The above returns only parts of the group values (meaning the match is not correct).
If I use * in each sub pattern instead of $ at the end, groups are correct, but that would mean that abcticff will also match.
Please let me know what my correct regex should be.
Your pattern is incorrect because a pipe symbol | is used to specify alternate matches, not a comma in brackets as you were using, i.e., [x,y].
Your pattern should be: ^(?<arch>abc|xyz|ghh)(?<flavor>tic|tac)$
The ^ and $ metacharacters ensure the string matches from start to end. If you need to match text in a larger string you could replace them with \b to match on a word boundary.
Try this approach:
string[] inputs = { "abctic", "abctac", "xyztic", "xyztac", "ghhtic", "ghhtac" };
string pattern = @"^(?<arch>abc|xyz|ghh)(?<flavor>tic|tac)$";
foreach (var input in inputs)
{
var match = Regex.Match(input, pattern);
if (match.Success)
{
Console.WriteLine("Arch: {0} - Flavor: {1}",
match.Groups["arch"].Value,
match.Groups["flavor"].Value);
}
else
Console.WriteLine("No match for: " + input);
}
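The word-boundary variant mentioned above, for finding the values inside a longer string, might look like this sketch; the class name ArchFlavorFinder, the method FindAll, and the sample sentence are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class ArchFlavorFinder
{
    // \b replaces ^ and $, so a hit must be a whole word inside the text.
    public static string[] FindAll(string text)
    {
        return Regex.Matches(text, @"\b(?<arch>abc|xyz|ghh)(?<flavor>tic|tac)\b")
                    .Cast<Match>()
                    .Select(m => m.Groups["arch"].Value + "/" + m.Groups["flavor"].Value)
                    .ToArray();
    }

    static void Main()
    {
        foreach (string hit in FindAll("use abctic or xyztac, but not abcticff"))
        {
            Console.WriteLine(hit);
        }
        // abc/tic
        // xyz/tac
    }
}
```

"abcticff" is skipped because there is no word boundary between "tic" and "ff".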