How to find values based on pattern matching between two strings - C#

I have string
a = "{key1}|{key2}_{key3}-{key4}"
I have another string
b = "abc|qwe_tue-pqr"
I need the output to give the value of each key:
key1="abc", key2="qwe", key3="tue" and key4="pqr"

If the use case is as simple as presented you could perhaps transform a into a regex with named capturing groups:
var r = new Regex(Regex.Escape(a).Replace("\\{","(?<").Replace("}",">[a-z]+)"));
This turns a into a regex like (the | delimiter needed escaping):
(?<key1>[a-z]+)\|(?<key2>[a-z]+)_(?<key3>[a-z]+)-(?<key4>[a-z]+)
You can then get the match and print its groups (group 0 is always the whole match):
var m = r.Match(b);
foreach (Group g in m.Groups)
{
    Console.WriteLine(g.Name + "=" + g.Value);
}
Output:
0=abc|qwe_tue-pqr
key1=abc
key2=qwe
key3=tue
key4=pqr
I can't promise this will reliably handle everything you throw at it, given the very limited input example, but it should give you an idea of where to start.
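As a slightly more defensive variant of the same idea, here is a minimal sketch that anchors the generated pattern and returns only the named groups. The TemplateMatcher class and Extract method are hypothetical names, and using .+? for the values (instead of [a-z]+) is an assumption about what the values may contain.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TemplateMatcher
{
    // Turns a template such as "{key1}|{key2}" into an anchored regex with
    // one named group per placeholder, then reads the values out of input.
    // Assumption: placeholder names are valid regex group names.
    public static Dictionary<string, string> Extract(string template, string input)
    {
        // Regex.Escape escapes '{' and '|' but leaves '}' untouched.
        string pattern = "^" + Regex.Escape(template)
            .Replace("\\{", "(?<")
            .Replace("}", ">.+?)") + "$";

        var regex = new Regex(pattern);
        var values = new Dictionary<string, string>();
        Match match = regex.Match(input);
        if (!match.Success)
        {
            return values; // empty: the input does not fit the template
        }

        foreach (string name in regex.GetGroupNames())
        {
            // Skip numbered groups, including "0" (the whole match).
            if (!int.TryParse(name, out _))
            {
                values[name] = match.Groups[name].Value;
            }
        }
        return values;
    }

    public static void Main()
    {
        var values = Extract("{key1}|{key2}_{key3}-{key4}", "abc|qwe_tue-pqr");
        foreach (var pair in values)
        {
            Console.WriteLine(pair.Key + "=" + pair.Value);
        }
    }
}
```

Because the values are matched lazily, each one stops at the first occurrence of the delimiter that follows it in the template, so a value that itself contains that delimiter would be split in the wrong place.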

Related

C# Regex matching using a variable

I am most familiar with PowerShell and have recently moved into using C# as my primary language. In PowerShell it's possible to do the following
$var1 = "abc"
"abc" -match "$var1"
This results in a true statement.
I would like to be able to do the same thing in C#. I know that you can use interpolation with C#, and I have tried various ways of using Regex.Match() with no luck.
Example:
string toMatch = "abc";
var result = Regex.Match("abc", $"{{toMatch}}");
var a = Regex.Match("abc", $"{{{toMatch}}}");
var b = Regex.Match("abc", $"{toMatch}");
var c = Regex.Match(toMatch,toMatch);
None of the above seems to work. I am not even sure if what I am trying to do is possible in C#. Ideally I'd like to be able to use a combination of variables and Regex for a match. Something even like this Regex.Match(varToMatch,$"{{myVar}}\\d+\\w{4}")
edit:
After reading some answers here and trying some code out it appears that my real issue is trying to match up against a directory path. Something like "C:\temp\abcfile". For example:
string path = @"C:\temp\abc";
string path2 = @"C:\temp\abc";
string fn = path.Split('\\').LastOrDefault();
path = Regex.Escape(path);
path2 = Regex.Escape(path2);
Regex rx = new Regex(path);
var a = Regex.Match(path.Split('\\').Last().ToString(), $"{fn}");
//Example A works if I split and match on just the file name.
var b = Regex.Match(path, $"{rx}");
//Example B does not work, even though it's a regex object.
var c = Regex.Match(path, $"{{path}}");
//Example C: I've tried one, two, and three sets of braces with no luck
var d = Regex.Match(path,path);
// Even a direct variable to variable match returns 0 results.
You seem to have it right in the last example, so perhaps the issue is that you're expecting a bool result instead of a Match result?
Hopefully this small example helps:
int a = 123;
string b = "abc";
string toMatch = "123 and abc";
var result = Regex.Match(toMatch, $"{a}.*{b}");
if (result.Success)
{
    Console.WriteLine("Found a match!");
}
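For the directory path in the question's edit, one likely cause is that both the input and the pattern were passed through Regex.Escape, so the input ends up with doubled backslashes that the pattern no longer matches. The sketch below escapes only the pattern. PathMatchDemo and ContainsLiteral are hypothetical names used for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PathMatchDemo
{
    // Treats 'literal' as plain text: only the pattern is escaped,
    // the input string is searched exactly as it is.
    public static bool ContainsLiteral(string input, string literal)
    {
        return Regex.IsMatch(input, Regex.Escape(literal));
    }

    public static void Main()
    {
        string path = @"C:\temp\abc";

        // Unescaped pattern: "\t" and "\a" are read as tab and bell characters.
        Console.WriteLine(Regex.IsMatch(path, path));        // False

        // Input and pattern both escaped: the input now has doubled backslashes.
        string escaped = Regex.Escape(path);
        Console.WriteLine(Regex.IsMatch(escaped, escaped));  // False

        // Only the pattern escaped.
        Console.WriteLine(ContainsLiteral(path, path));      // True

        // A variable combined with regex syntax in an interpolated string.
        // Literal braces of the quantifier are doubled: {{4}} becomes {4}.
        string pattern = $@"{Regex.Escape(path)}\d{{4}}";
        Console.WriteLine(Regex.IsMatch(path + "1234", pattern)); // True
    }
}
```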

Using RegEx to split strings after specific character

I've been working on trying to get this string split in a couple different places which I managed to get to work, except if the name had a forward-slash in it, it would throw all of the groups off completely.
The string:
123.45.678.90:00000/98765432109876541/[CLAN]PlayerName joined [windows/12345678901234567]
I essentially need the following:
IP group: 123.45.678.90:00000 (without the following /)
id group: 98765432109876541
name group: [CLAN]PlayerName
id1 group: 12345678901234567
The text "joined" also has to be there. However windows does not.
Here is what I have so far:
(?<ip>.*)\/(?<id>.*)\/(.*\/)?(?<name1>.*)( joined.*)\[(.*\/)?(?<id1>.*)\]
This works like a charm unless the player name contains a "/". How would I go about escaping that?
Any help with this would be much appreciated!
Since you tagged your question with C# and Regex, not only Regex, I will propose an alternative. I am not sure whether it will be more efficient, but I find it easier to read and debug if you simply use String.Split():
public void Main()
{
    string input = "123.45.678.90:00000/98765432109876541/[CLAN]Player/Na/me joined [windows/12345678901234567]";
    // we want "123.45.678.90:00000/98765432109876541/[CLAN]Player/Na/me joined" and "12345678901234567]"
    // Also, you can remove " joined" by adding it before " [windows/"
    var content = input.Split(new string[] { " [windows/" }, StringSplitOptions.None);
    // we want ip + groupId + everything else
    var tab = content[0].Split('/');
    var ip = tab[0];
    var groupId = tab[1];
    var groupName = String.Join("/", tab.Skip(2)); // merge everything else. We use Linq to skip ip and groupId
    var groupId1 = RemoveLast(content[1]); // cut the trailing ']'
    Console.WriteLine(groupName);
}

private static string RemoveLast(string s)
{
    return s.Remove(s.Length - 1);
}
Output:
[CLAN]Player/Na/me joined
If you are using a class for ip, groupId, etc. (and I guess you are), just populate it in a constructor that accepts a string as a parameter.
You shouldn't use greedy quantifiers (*) with an open-ended token such as the dot (.). It won't work as intended and will result in a lot of backtracking.
This is slightly more efficient, but not overly strict:
^(?<ip>[^\/\n]+)\/(?<id>[^\/]+)\/(?<name1>\S+)\D+(?<id1>\d+)]$
You basically need to use non-greedy (lazy) quantifiers (*?). Try this:
(?<ip>.*?)\/(?<id>.*?)\/(?<name1>.*?)( joined )\[(.*?\/)?(?<id1>.*?)\]
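To check how the lazy pattern just above behaves when the player name contains slashes, here is a small sketch. JoinLineParser and Parse are hypothetical names, and the test line reuses the slash-containing name from the String.Split answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class JoinLineParser
{
    // The lazy pattern from the answer above, unchanged.
    private static readonly Regex JoinLine = new Regex(
        @"(?<ip>.*?)\/(?<id>.*?)\/(?<name1>.*?)( joined )\[(.*?\/)?(?<id1>.*?)\]");

    // Returns ip, id, name and id1, or an empty array when the line does not match.
    public static string[] Parse(string line)
    {
        Match m = JoinLine.Match(line);
        if (!m.Success)
        {
            return new string[0];
        }
        return new[]
        {
            m.Groups["ip"].Value,
            m.Groups["id"].Value,
            m.Groups["name1"].Value,
            m.Groups["id1"].Value
        };
    }

    public static void Main()
    {
        string line = "123.45.678.90:00000/98765432109876541/[CLAN]Player/Na/me joined [windows/12345678901234567]";
        // Prints: 123.45.678.90:00000 | 98765432109876541 | [CLAN]Player/Na/me | 12345678901234567
        Console.WriteLine(string.Join(" | ", Parse(line)));
    }
}
```

The first two lazy groups stop at the first and second slash, so every later slash ends up in name1, which then runs up to " joined ".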

Replace string with regular expression and my own parameter

In my HTML I have several tokens like this:
{PROP_1_1}, {PROP_1_2}, {PROP_37871_1} ...
Currently I replace those tokens with the following code:
htmlBuffer = htmlBuffer.Replace("{PROP_" + prop.PropertyID + "_1}", prop.PropertyDefaultHtml);
where prop is a custom object. But in this case it affects only the tokens ending with '_1'. I would like to extend this logic to all the other tokens ending with '_X', where X is numeric.
How could I implement a regexp pattern to achieve this?
You can use Regex.Replace():
Regex rgx = new Regex(@"\{PROP_" + prop.PropertyID + @"_\d+\}");
htmlBuffer = rgx.Replace(htmlBuffer, prop.PropertyDefaultHtml);
You can do even better: capture both identifiers in the regular expression. That way you can loop through the references that actually exist in the string and look up the properties for those, instead of looping through all the properties you have and checking whether the string contains a reference to each of them.
Example:
htmlBuffer = Regex.Replace(htmlBuffer, @"{PROP_(\d+)_(\d+)}", m => {
    int id = Int32.Parse(m.Groups[1].Value);
    int suffix = Int32.Parse(m.Groups[2].Value);
    return properties[id].GetValue(suffix);
});
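A self-contained sketch of that evaluator approach might look like the following. The properties lookup is replaced by a plain dictionary from id to HTML, which is an assumption made for illustration, as are the TokenReplacer and ReplaceTokens names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenReplacer
{
    // Replaces every {PROP_<id>_<suffix>} token whose id is in 'htmlById'.
    // Tokens with an unknown id are left untouched. The suffix is captured
    // in group 2 in case the caller needs it; this sketch ignores it.
    public static string ReplaceTokens(string html, IDictionary<int, string> htmlById)
    {
        return Regex.Replace(html, @"\{PROP_(\d+)_(\d+)\}", m =>
        {
            int id = int.Parse(m.Groups[1].Value);
            return htmlById.TryGetValue(id, out var value) ? value : m.Value;
        });
    }

    public static void Main()
    {
        var htmlById = new Dictionary<int, string>
        {
            [1] = "<b>one</b>",
            [37871] = "<i>big</i>"
        };
        string html = "{PROP_1_1}, {PROP_1_2}, {PROP_37871_1}, {PROP_9_1}";
        // Prints: <b>one</b>, <b>one</b>, <i>big</i>, {PROP_9_1}
        Console.WriteLine(ReplaceTokens(html, htmlById));
    }
}
```

A side benefit of the evaluator overload is that the returned string is inserted literally, so a "$" in the replacement HTML is not interpreted as a substitution.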

Match Multiline & IgnoreSome

I'm trying to extract some information from a JCL source using regex in C#
Basically, this is a string I can have:
//JOBNAME0 JOB (BLABLABLA),'SOME TEXT',MSGCLASS=YES,ILIKE=POTATOES, GRMBL
// IALSOLIKE=TOMATOES, ANOTHER GARBAGE
// FINALLY=BYE
//OTHER STUFF
So I need to extract the jobname JOBNAME0, the info (BLABLABLA), the description 'SOME TEXT' and the other parms MSGCLASS=YES ILIKE=POTATOES IALSOLIKE=TOMATOES FINALLY=BYE.
I must ignore everything that is after the space ... like GRMBL or ANOTHER GARBAGE
I must continue to the next line if my last valid char was a , and stop if there was none.
So far, I have successfully managed to get the jobname, the info and the description, which was pretty easy. For the other parms, I'm able to get all the parms and to split them, but I don't know how to get rid of the garbage.
Here is my code:
var regex = "//([^\\s]*) JOB (\\([^)]*\\))?,?(\\'[^']*\\')?,?([^,]*[,|\\s|$])*";
Match match2 = Regex.Match(test5, regex,RegexOptions.Singleline);
string CarteJob2 = match2.Groups[0].Value;
string JobName2 = match2.Groups[1].Value;
string JobInfo2 = match2.Groups[2].Value;
string JobDesc2 = match2.Groups[3].Value;
IEnumerable<string> parms = match2.Groups[4].Captures.OfType<Capture>().Select(x => x.Value);
string JobParms2 = String.Join("|", parms);
Console.WriteLine(CarteJob2 + "|");
Console.WriteLine(JobName2 + "|");
Console.WriteLine(JobInfo2 + "|");
Console.WriteLine(JobDesc2 + "|");
Console.WriteLine(JobParms2 + "|");
The output I get is this one:
//JOBNAME0 JOB (BLABLABLA),'SOME TEXT',MSGCLASS=YES,ILIKE=POTATOES, GRMBL
// IALSOLIKE=TOMATOES, ANOTHER GARBAGE
// FINALLY=BYE
//OTHER |
JOBNAME0|
(BLABLABLA)|
'SOME TEXT'|
MSGCLASS=YES,|ILIKE=POTATOES,| GRMBL
// IALSOLIKE=TOMATOES,| ANOTHER GARBAGE
// FINALLY=BYE
//OTHER |
The output I would like to see is:
//JOBNAME0 JOB (BLABLABLA),'SOME TEXT',MSGCLASS=YES,ILIKE=POTATOES, GRMBL
// IALSOLIKE=TOMATOES, ANOTHER GARBAGE
// FINALLY=BYE|
JOBNAME0|
(BLABLABLA)|
'SOME TEXT'|
MSGCLASS=YES|ILIKE=POTATOES|IALSOLIKE=TOMATOES|FINALLY=BYE|
Is there a way to get what I want ?
I think I'd try and do this with two Regex expressions.
The first one to get all the starting information from the beginning of the string - job name, info, description.
The second one to get all the parameters, which all seem to have a simple pattern of <param name>=<param value>.
The first Regex might look like this:
^//(?<job>[\d\w]+)[ ]+JOB[ ]+\((?<info>[\d\w]+)\),'(?<description>[\d\w ]+)'
I don't know whether the rules permit whitespace in the job name, info or description, so adjust as needed. Also, by using the ^ char I'm assuming this is the start of the file. Finally, this Regex already has named groups defined, so getting the values should be easier in C#.
The second Regex might be something like this:
(?<param>[\w\d]+)=(?<value>[\w\d]+)
Again, grouping is added to help get the parameter names and values.
Hope this helps.
EDIT:
A small tip - you can use the @ sign before a string in C# to make it easier to write such Regex patterns. For example:
Regex reg = new Regex(@"(?<param>[\w\d]+)=(?<value>[\w\d]+)");
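Putting the two patterns together on the sample from the question could look like the sketch below. JobCardParser and GetParams are hypothetical names. Note the assumption that the parameter scan runs over the whole text: it does not stop at the end of the job card, so a NAME=VALUE pair in trailing comment text or in a later statement would be picked up as well.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class JobCardParser
{
    // The two patterns suggested above, written as verbatim strings.
    private static readonly Regex Header = new Regex(
        @"^//(?<job>[\d\w]+)[ ]+JOB[ ]+\((?<info>[\d\w]+)\),'(?<description>[\d\w ]+)'");
    private static readonly Regex Param = new Regex(
        @"(?<param>[\w\d]+)=(?<value>[\w\d]+)");

    // Returns "NAME=VALUE" strings for every parameter found in 'jcl'.
    // Words without '=' (such as GRMBL) never match and are dropped.
    public static List<string> GetParams(string jcl)
    {
        var result = new List<string>();
        foreach (Match m in Param.Matches(jcl))
        {
            result.Add(m.Groups["param"].Value + "=" + m.Groups["value"].Value);
        }
        return result;
    }

    public static void Main()
    {
        string jcl =
            "//JOBNAME0 JOB (BLABLABLA),'SOME TEXT',MSGCLASS=YES,ILIKE=POTATOES, GRMBL\n" +
            "// IALSOLIKE=TOMATOES, ANOTHER GARBAGE\n" +
            "// FINALLY=BYE\n" +
            "//OTHER STUFF";

        Match header = Header.Match(jcl);
        Console.WriteLine(header.Groups["job"].Value);          // JOBNAME0
        Console.WriteLine(header.Groups["info"].Value);         // BLABLABLA
        Console.WriteLine(header.Groups["description"].Value);  // SOME TEXT
        // MSGCLASS=YES|ILIKE=POTATOES|IALSOLIKE=TOMATOES|FINALLY=BYE
        Console.WriteLine(string.Join("|", GetParams(jcl)));
    }
}
```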

Regular expression that returns a constant value as part of a match

I have a regular expression to match 2 different number formats: \=(?<Value>[0-9]+)\?|\+(?<Value>[0-9]+)\?
This should return 9876543 as its Value for ;1234567890123456?+1234567890123456789012345123=9876543? and ;1234567890123456?+9876543?
What I would like is to be able to return another value along with the matched 'Value'.
So, for example, if the first string was matched, I'd like it to return:
Value:
9876543
Format:
LongFormat
And if matched in the second string:
Value:
9876543
Format:
ShortFormat
Is this possible?
Another option, which is not quite the solution you wanted, but saves you using two separate regexes, is to use named groups, if your implementation supports it.
Here is some C#:
var regex = new Regex(@"\=(?<Long>[0-9]+)\?|\+(?<Short>[0-9]+)\?");
string test1 = ";1234567890123456?+1234567890123456789012345123=9876543?";
string test2 = ";1234567890123456?+9876543?";
var match = regex.Match(test1);
Console.WriteLine("Long: {0}", match.Groups["Long"]); // 9876543
Console.WriteLine("Short: {0}", match.Groups["Short"]); // blank
match = regex.Match(test2);
Console.WriteLine("Long: {0}", match.Groups["Long"]); // blank
Console.WriteLine("Short: {0}", match.Groups["Short"]); // 9876543
Basically, just modify your regex to include the names, and then match.Groups[groupName] will either have a value or won't. You could even just use the Success property of the group to know which one matched (match.Groups["Long"].Success).
UPDATE:
You can get the group name out of the match, with the following code:
static void Main(string[] args)
{
    var regex = new Regex(@"\=(?<Long>[0-9]+)\?|\+(?<Short>[0-9]+)\?");
    string test1 = ";1234567890123456?+1234567890123456789012345123=9876543?";
    string test2 = ";1234567890123456?+9876543?";
    ShowGroupMatches(regex, test1);
    ShowGroupMatches(regex, test2);
    Console.ReadLine();
}

private static void ShowGroupMatches(Regex regex, string testCase)
{
    int i = 0;
    foreach (Group grp in regex.Match(testCase).Groups)
    {
        if (grp.Success && i != 0)
        {
            Console.WriteLine(regex.GroupNameFromNumber(i) + " : " + grp.Value);
        }
        i++;
    }
}
I'm ignoring the 0th group, because that is always the entire match in .NET
No, you can't match text that isn't there. The match can only return a substring of the target.
You essentially want to match against two patterns and take different actions in each case. See if you can separate them in your code:
if match(\=(?<Value>[0-9]+)\?) then
    return 'Value: ' + match + 'Format: LongFormat'
else if match(\+(?<Value>[0-9]+)\?) then
    return 'Value: ' + match + 'Format: ShortFormat'
(Excuse the dodgy pseudocode, but you get the idea.)
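In C#, that pseudocode could be sketched roughly as follows. FormatDetector and Describe are hypothetical names, and the group name Value is assumed from the wording of the question. The constant labels come from the calling code, not from the regex.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FormatDetector
{
    // Tries the long pattern first, then the short one, and returns the
    // matched digits together with a constant label chosen by the code.
    public static string Describe(string input)
    {
        Match m = Regex.Match(input, @"=(?<Value>[0-9]+)\?");
        if (m.Success)
        {
            return "Value: " + m.Groups["Value"].Value + " Format: LongFormat";
        }

        m = Regex.Match(input, @"\+(?<Value>[0-9]+)\?");
        if (m.Success)
        {
            return "Value: " + m.Groups["Value"].Value + " Format: ShortFormat";
        }

        return "No match";
    }

    public static void Main()
    {
        // Prints: Value: 9876543 Format: LongFormat
        Console.WriteLine(Describe(";1234567890123456?+1234567890123456789012345123=9876543?"));
        // Prints: Value: 9876543 Format: ShortFormat
        Console.WriteLine(Describe(";1234567890123456?+9876543?"));
    }
}
```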
You can't match text that isn't there - but, depending on what language you're using, you can process what you match, and conditionally add text based on what is there.
With some implementations of regex, you can specify a "callback function" which allows you to run logic against each result.
Here's a pseudo-code example:
Input.replaceAll( /[+=][0-9]+(?=\?)/ , formatValue );
formatValue : function(match,groups)
{
    switch( left(match,1) )
    {
        case '+' : Format = 'Short'; break;
        case '=' : Format = 'Long'; break;
        default : Format = 'Unknown'; break;
    }
    Value = match.replace( /[+=]/ , '' );
    return 'Value: '+Value+' Format: ' + Format;
}
What that will do, in a language that supports regex callbacks, is execute the formatValue function every time it finds a match, and use the result of the function as the replacement text.
You haven't specified which implementation you're using, so this may or may not be possible for you, but it is definitely worth checking out.
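If the implementation turns out to be .NET, the callback idea maps onto the Regex.Replace overload that takes a MatchEvaluator. A minimal sketch, with CallbackFormatter and Annotate as hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CallbackFormatter
{
    // Replaces each "+digits" or "=digits" that is followed by '?' with a
    // description built by the callback. The lookahead keeps the '?' in place.
    public static string Annotate(string input)
    {
        return Regex.Replace(input, @"[+=][0-9]+(?=\?)", m =>
        {
            string format = m.Value[0] == '+' ? "Short" : "Long";
            return "Value: " + m.Value.Substring(1) + " Format: " + format;
        });
    }

    public static void Main()
    {
        // Prints: ;1234567890123456?Value: 9876543 Format: Short?
        Console.WriteLine(Annotate(";1234567890123456?+9876543?"));
    }
}
```

Since the replacement is spliced back into the input, the rest of the string is kept. To get only the value and format, reading them from a Match as in the earlier answers is probably simpler.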
