Making a group with spaces between words and decimals - c#

I'm not new to the concept of regex but the syntax and semantics of everything get confusing for me at times. I have been trying to create a pattern to recognize
Ambient Relative Humidity: 31.59
With the grouping
Ambient Relative Humidity (Group 1)
31.59 (Group 2)
But I also need to be able to match things such as
Operator: Bob
With the grouping
Operator (Group 1)
Bob (Group 2)
Or
Sensor Number: 0001
With the grouping
Sensor Number (Group 1)
0001 (Group 2)
Here is the current pattern I created, which works for the examples involving operator and sensor number but does not match the first example (ambient humidity):
\s*([A-Za-z0-9]*\s*?[A-Za-z0-9]*)\s*:\s*([A-Za-z0-9]*)

You have to add more space-separated key parts to the regex.
Also, you have to add an option for decimal numbers in the value.
Something like this:
([A-Za-z0-9]*(?:\s*[A-Za-z0-9]+)*)\s*:\s*((?:\d+(?:\.\d*)?|\.\d+)|[A-Za-z0-9]+)?
https://regex101.com/r/fl0wtb/1
Explained
( # (1 start), Key
[A-Za-z0-9]*
(?: \s* [A-Za-z0-9]+ )*
) # (1 end)
\s* : \s*
( # (2 start), Value
(?: # Decimal number
\d+
(?: \. \d* )?
| \. \d+
)
| # or,
[A-Za-z0-9]+ # Alpha num's
)? # (2 end)
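As a rough check of the pattern above, here is a minimal sketch that runs it against the three sample lines from the question. The names `pattern`, `samples` and `results` are illustrative only and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Pattern from the answer above: group 1 is the key, group 2 the value.
var pattern = @"([A-Za-z0-9]*(?:\s*[A-Za-z0-9]+)*)\s*:\s*((?:\d+(?:\.\d*)?|\.\d+)|[A-Za-z0-9]+)?";

// Sample inputs taken from the question.
var samples = new[]
{
    "Ambient Relative Humidity: 31.59",
    "Operator: Bob",
    "Sensor Number: 0001"
};

// Collected (key, value) pairs; the list name is illustrative only.
var results = new List<(string Key, string Value)>();
foreach (var line in samples)
{
    var m = Regex.Match(line, pattern);
    if (m.Success)
    {
        results.Add((m.Groups[1].Value, m.Groups[2].Value));
        Console.WriteLine($"Group 1 {m.Groups[1].Value}");
        Console.WriteLine($"Group 2 {m.Groups[2].Value}");
    }
}
```

Because the key group allows any number of space-separated words, the same pattern covers one-word keys such as Operator and three-word keys such as Ambient Relative Humidity.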

I may have posted too soon without thinking; I now have the following expression:
\s*([A-Za-z0-9]*\s*[A-Za-z0-9]*\s*[A-Za-z0-9]*)\s*:\s*([A-Za-z0-9.]*)
The only thing is that it includes spaces sometimes that I was trying to avoid but I can just trim those later. Sorry for posting so soon!

var st = "Ambient Relative Humidity: 31.59 Operator: Bob Sensor Number: 0001";
var li = Regex.Matches(st, @"([\w]+?:)\s+(\d+\.?\d+|\w+)").Cast<Match>().ToList();
foreach (var t in li)
{
    Console.WriteLine($"Group 1 {t.Groups[1]}");
    Console.WriteLine($"Group 2 {t.Groups[2]}");
}
//Group 1 Humidity:
//Group 2 31.59
//Group 1 Operator:
//Group 2 Bob
//Group 1 Number:
//Group 2 0001

Related

Regex split by same character within brackets

I have a long string, like so:
(A) name1, name2, name3, name3 (B) name4, name5, name7 (via name7) ..... (AA) name47, name47 (via name 46) (BB) name48, name49
Currently I split by " (", but it also picks up the "(via ...)" parts as new lines:
string[] lines = routesRaw.Split(new[] { " (" }, StringSplitOptions.RemoveEmptyEntries);
How can I split the information within the first brackets only? There is no AB, AC, AD, etc.; the characters within the brackets are always the same.
Thanks.
You may use a matching approach here, since the pattern you need will contain a capturing group in order to match the same char 0 or more times, and Regex.Split outputs all captured substrings together with non-matches.
I suggest
(?s)(.*?)(?:\(([A-Z])\2*\)|\z)
Grab all non-empty Group 1 values. See the regex demo.
Details
(?s) - a dotall, RegexOptions.Singleline option that makes . match newlines, too
(.*?) - Group 1: any 0 or more chars, but as few as possible
(?:\(([A-Z])\2*\)|\z) - a non-capturing group that matches:
\(([A-Z])\2*\) - (, then Group 2 capturing any uppercase ASCII letter, then any 0 or more repetitions of this captured letter and then )
| - or
\z - the very end of the string.
In C#, use
var results = Regex.Matches(text, @"(?s)(.*?)(?:\(([A-Z])\2*\)|\z)")
    .Cast<Match>()
    .Select(x => x.Groups[1].Value)
    .Where(z => !string.IsNullOrEmpty(z))
    .ToList();
See the C# demo online.

Match only the nth occurrence using a regular expression

I have a string with 3 dates in it like this:
XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx
I want to select the 2nd date in the string, the 20180208 one.
Is there a way to do this purely in the regex, without having to resort to pulling out the 2nd match in code? I'm using C#, if that matters.
Thanks for any help.
You could use
^(?:[^_]+_){2}(\d+)
And take the first group, see a demo on regex101.com.
Broken down, this says
^ # start of the string
(?:[^_]+_){2} # not _ + _, twice
(\d+) # capture digits
C# demo:
var pattern = @"^(?:[^_]+_){2}(\d+)";
var text = "XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx";
var result = Regex.Match(text, pattern)?.Groups[1].Value;
Console.WriteLine(result); // => 20180208
Try this one
MatchCollection matches = Regex.Matches(sInputLine, @"\d{8}");
string sSecond = matches[1].ToString();
You could use the regular expression
^(?:.*?\d{8}_){1}.*?(\d{8})
to save the 2nd date to capture group 1.
Demo
Naturally, for n > 2, replace {1} with {n-1} to obtain the nth date. To obtain the 1st date use
^(?:.*?\d{8}_){0}.*?(\d{8})
Demo
C#'s regex engine performs the following operations.
^ # match the beginning of a line
(?: # begin a non-capture group
.*? # match 0+ chars lazily
\d{8} # match 8 digits
_ # match '_'
) # end non-capture group
{n} # execute non-capture group n (n >= 0) times
.*? # match 0+ chars lazily
(\d{8}) # match 8 digits in capture group 1
The important thing to note is that the first instance of .*?, followed by \d{8}, because it is lazy, will consume only as many characters as it needs until the next 8 characters are digits (and are not preceded or followed by a digit). For example, in the string
_1234abcd_efghi_123456789_12345678_ABC
capture group 1 in (.*?)_\d{8}_ will contain "_1234abcd_efghi_123456789".
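The "replace {1} with {n-1}" rule above can be sketched as a small helper that builds the pattern for a given n. The helper name `NthDate` and its empty-string fallback are assumptions made for illustration, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the n-th (1-based) 8-digit date,
// or an empty string if the pattern does not match.
string NthDate(string input, int n)
{
    if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));
    // Skip n-1 dates that are each followed by '_', then capture the next 8-digit run.
    var pattern = @"^(?:.*?\d{8}_){" + (n - 1) + @"}.*?(\d{8})";
    var m = Regex.Match(input, pattern);
    return m.Success ? m.Groups[1].Value : string.Empty;
}

var text = "XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx";
Console.WriteLine(NthDate(text, 1));
Console.WriteLine(NthDate(text, 2));
Console.WriteLine(NthDate(text, 3));
```

Note that every skipped date must be followed by an underscore, so in this sketch asking for a fourth date in the sample string yields the empty string.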
You can use System.Text.RegularExpressions.Regex
See the following example
Regex regex = new Regex(@"^(?:[^_]+_){2}(\d+)"); //Expression from Jan's answer just showing how to use C# to achieve your goal
GroupCollection groups = regex.Match("XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx").Groups;
if (groups.Count > 1)
{
Console.WriteLine(groups[1].Value);
}

How to match pairs of characters using Regex?

I have a variable containing a string. This string contains only alphabetic and numeric characters and has a fixed length of 32 characters. How can I use a regular expression to check whether this string consists only of paired characters in runs of length 2, 4, 8, or 16?
For example, for strings similar to this:
abcdefghijklmnopqrstuvwxyz012345
Regex.IsMatch must return false.
But for strings similar to this:
aaaaaaaaaaaaaaaa5555555555555555
this is a 16-characters pairs;
aaaaaaaa55555555aaaaaaaa55555555
this is a 8-characters pairs;
aaaa5555aaaa5555aaaa5555aaaa5555
this is a 4-characters pairs;
aa55aa55aa55aa55aa55aa55aa55aa55
this is a 2-characters pairs -
Regex.IsMatch must return true.
EDIT
Apparently, the requirement is simply to match eg aabbccddeeffgghhiijjkkllmmnnoopp, ie the first two characters must be the same, then the next two etc for exactly 32 characters. That can be easily tested for with:
((\w)\2(\w)\3){8}
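A minimal sketch of how the pattern above could be tried against the question's examples. The `^` and `\z` anchors are an added assumption so that the whole string has to match, and the `Repeat` helper and variable names are illustrative only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The pattern from the edit above, anchored (an added assumption) so that
// the entire 32-character string must consist of doubled characters.
var doubled = new Regex(@"^((\w)\2(\w)\3){8}\z");

// Build the sample strings by repetition to avoid typing mistakes.
string Repeat(string unit, int times) => string.Concat(Enumerable.Repeat(unit, times));

var twoCharPairs = Repeat("aa55", 8);                      // aa55aa55...
var sixteenCharPairs = Repeat("a", 16) + Repeat("5", 16);  // 16 a's, then 16 5's
var noPairs = "abcdefghijklmnopqrstuvwxyz012345";

Console.WriteLine(doubled.IsMatch(twoCharPairs));
Console.WriteLine(doubled.IsMatch(sixteenCharPairs));
Console.WriteLine(doubled.IsMatch(noPairs));
```

As the edit says, this only checks that characters come in doubles, so it also accepts strings such as aabbccdd... that do not alternate between exactly two characters.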
This should work (without needing to come up with individual regexes for each possible combination).
public bool isRelevantMatch(string inputString)
{
    int matchCount = Regex.Matches(inputString, @"([a-zA-Z])\1{1}").Count;
    return matchCount == 1 ||
           matchCount == 2 ||
           matchCount == 4 ||
           matchCount == 8 ||
           matchCount == 16;
}
Explanation: get the count of matches of repeated characters (using a backreference regex to match any instance of aa, AA, bb, BB, etc.). If that count is 1, 2, 4, 8, or 16, return true.
Late to this, but I'll throw this out there.
If you want to match repeating 'unique pairs', this works in Perl.
I tried to make it smaller but couldn't figure out how.
The syntax for .NET is probably the same. However, I've reused the capture group names, which works in Perl, but I'm not sure about .NET (it should be OK; if not, change to unique names).
Also, in Perl I could have used a branch reset to overlay capture groups and then tested a single group's length to get the repeat order, but this is not available in .NET.
So you just have to test 4 groups for a match (or length) to get the order.
# (?<A>(?<b>\w)\k<b>(?!\k<b>)(?<c>\w)\k<c>)\k<A>{7}|(?<A>(?<b>\w)\k<b>{3}(?!\k<b>)(?<c>\w)\k<c>{3})\k<A>{3}|(?<A>(?<b>\w)\k<b>{7}(?!\k<b>)(?<c>\w)\k<c>{7})\k<A>{1}|(?<A>(?<b>\w)\k<b>{15}(?!\k<b>)(?<c>\w)\k<c>{15})
(?<A> # (1 start), 2 char pairs, repeating x 8
(?<b> \w ) # (2)
\k<b>
(?! \k<b> )
(?<c> \w ) # (3)
\k<c>
) # (1 end)
\k<A>{7}
|
(?<A> # (4 start), 4 char pairs, repeating x 4
(?<b> \w ) # (5)
\k<b>{3}
(?! \k<b> )
(?<c> \w ) # (6)
\k<c>{3}
) # (4 end)
\k<A>{3}
|
(?<A> # (7 start), 8 char pairs, repeating x 2
(?<b> \w ) # (8)
\k<b>{7}
(?! \k<b> )
(?<c> \w ) # (9)
\k<c>{7}
) # (7 end)
\k<A>{1}
|
(?<A> # (10 start), 16 char pairs
(?<b> \w ) # (11)
\k<b>{15}
(?! \k<b> )
(?<c> \w ) # (12)
\k<c>{15}
) # (10 end)

Parameter regex

I need help with creating regex for validating parameter string.
The parameter string consists of 2 optional groups of explicit characters. The first group can contain only one occurrence of each of the characters P, O, Z (order doesn't matter). The second group has the same restrictions but can contain only the characters t, c, p, m. If both groups are present, they need to be delimited by a single space character.
So valid strings are:
P t
PO t
OZP ct
P tcmp
P
PZ
t
tp
etc.
Why not ditch the regex, stop using a string to represent non-string data, and do this instead:
[Flags]
enum First
{
    None = 0,
    P = 1,
    O = 2,
    Z = 4
}
[Flags]
enum Second
{
    None = 0,
    T = 1,
    C = 2,
    P = 4,
    M = 8
}
void YourMethod(First first, Second second)
{
    bool hasP = first.HasFlag(First.P);
    var hasT = second.HasFlag(Second.T);
}
You could then call YourMethod like this.
// equivalent to "PO mp", but checked at compile time.
YourMethod(First.P | First.O, Second.M | Second.P);
or, if you felt like it
// same as above.
YourMethod((First)3, (Second)12);
If you'd like to know more about how this works see this question.
I don't think a regex is a good solution here, because it will have to be quite complicated:
Regex regexObj = new Regex(
    @"^                     # Start of string
    (?:                     # Start non-capturing group:
     ([POZ])                # Match and capture one of [POZ] in group 1
     (?![POZ]*\1)           # Assert that that character doesn't show up again
    )*                      # Repeat any number of times (including zero)
    (?:                     # Start another non-capturing group:
     (?<!^)                 # Assert that we're not at the start of the string
     \                      # Match a space
     (?!$)                  # Assert that we're also not at the end of the string
    )?                      # Make this group optional.
    (?<!                    # Now assert that we're not right after...
     [POZ]                  # one of [POZ] (i. e. make sure there's a space)
     (?!$)                  # unless we're already at the end of the string.
    )                       # End of negative lookbehind assertion
    (?:                     # Start yet another non-capturing group:
     ([tcpm])               # Match and capture one of [tcpm] in group 2
     (?![tcpm]*\2)          # Assert that that character doesn't show up again
    )*                      # Repeat any number of times (including zero)
    $                       # End of string",
    RegexOptions.IgnorePatternWhitespace);
This should give you what you need:
([POZ]+)? ?([tcpm]+)?
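Note that a short pattern like the one above does not, on its own, reject repeated characters such as PP, and it is not anchored. For comparison, here is a minimal sketch of the rules from the question written as plain string checks instead of a regex. The function names are hypothetical, and treating the empty string (both groups omitted) as valid is an assumption.

```csharp
using System;
using System.Linq;

// True when every character of `part` is in `allowed` and none is repeated.
bool IsUniqueSubset(string part, string allowed) =>
    part.All(c => allowed.IndexOf(c) >= 0) &&
    part.Distinct().Count() == part.Length;

// Hypothetical validator for the parameter string described in the question.
// Assumption: the empty string (both groups omitted) counts as valid.
bool IsValidParameterString(string s)
{
    var parts = s.Split(' ');
    if (parts.Length == 1)
    {
        // Only one group present: it may be either the first or the second.
        return IsUniqueSubset(parts[0], "POZ") || IsUniqueSubset(parts[0], "tcpm");
    }
    if (parts.Length == 2)
    {
        // Both groups present, separated by exactly one space.
        return parts[0].Length > 0 && parts[1].Length > 0
            && IsUniqueSubset(parts[0], "POZ")
            && IsUniqueSubset(parts[1], "tcpm");
    }
    return false; // more than one space
}

Console.WriteLine(IsValidParameterString("OZP ct"));
Console.WriteLine(IsValidParameterString("PP t"));
```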

Find a pattern to match 'a', ignoring that 'a' which lies within 'b' and 'c'

I need a compound expression that matches " from" only when it is not within parentheses (occurrences inside parentheses should be ignored). Here a = " from", b = "(", and c = ")".
The closest (but invalid) pattern I could write is
string pat = @"^((?!\(.* from.*\)).)* from((?!\(.* from.*\)).)*$";
My expression rejects the string if any " from" is present in parentheses, but I want to simply ignore such a " from".
Matches should be found in:
1: " from" (1 time)
2: select field1 from t1 (select field1 from t1) (1 time)
3: select field1 from t1 (select field1 from t1)select field1 from t1 (2 times)
Strings not containing matches (because I want to ignore the " from" within parentheses):
1: select field1 no_f_rom_OutOf_Parenthesis t1 (select field1 from t1)
2: (select field1 from t1)
3: "" (empty string)
4: any string without the word "from"
0 times in all four strings.
Relevant Material: (not much necessary to read)
The most helpful link nearer to my question telling how to match 'pattern' but not 'regular' has been a reply by stanav at Jul 31st, 2009, 08:05 AM in following link...
http://www.vbforums.com/archive/index.php/t-578417.html
Also: Regex in C# that contains "this" but not "that"
Also: Regular expression to match a line that doesn't contain a word?
I have studied and searched for about a week, but it's still complex for me :)
The following should work, even with arbitrarily nested parentheses:
if (Regex.IsMatch(subjectString,
    @"\sfrom                # Match ' from'
    (?=                     # only if the following regex can be matched here:
     (?:                    # The following group, consisting of
      [^()]*                # any number of characters except parentheses,
      \(                    # followed by an opening (
      (?>                   # Now match...
       [^()]+               # one or more characters except parentheses
      |                     # or
       \( (?<DEPTH>)        # a (, increasing the depth counter
      |                     # or
       \) (?<-DEPTH>)       # a ), decreasing the depth counter
      )*                    # any number of times
      (?(DEPTH)(?!))        # until the depth counter is zero again,
      \)                    # then match the closing )
     )*                     # Repeat this any number of times.
     [^()]*                 # Then match any number of characters except ()
     \z                     # until the end of the string.
    )                       # End of lookahead.",
    RegexOptions.IgnorePatternWhitespace))
As a single line regex ("The horror! The horror!"), if you insist:
if (Regex.IsMatch(subjectString, @"\sfrom(?=(?:[^()]*\((?>[^()]+|\((?<DEPTH>)|\)(?<-DEPTH>))*(?(DEPTH)(?!))\))*[^()]*\z)"))
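To see how this behaves on the strings from the question, here is a small sketch that counts matches of the single-line pattern with Regex.Matches. The helper name `CountOutside` and the variable names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// Single-line pattern from the answer above.
var outsideFrom = new Regex(
    @"\sfrom(?=(?:[^()]*\((?>[^()]+|\((?<DEPTH>)|\)(?<-DEPTH>))*(?(DEPTH)(?!))\))*[^()]*\z)");

// Hypothetical helper: number of " from" occurrences that lie outside parentheses.
int CountOutside(string s) => outsideFrom.Matches(s).Count;

// Sample strings taken from the question.
var oneOutside = "select field1 from t1 (select field1 from t1)";
var twoOutside = "select field1 from t1 (select field1 from t1)select field1 from t1";
var onlyInside = "(select field1 from t1)";

Console.WriteLine(CountOutside(oneOutside));
Console.WriteLine(CountOutside(twoOutside));
Console.WriteLine(CountOutside(onlyInside));
```

The lookahead only succeeds when everything after the candidate " from" up to the end of the string is made of balanced parenthesised groups, which is why an occurrence that is followed by an unmatched closing parenthesis is skipped.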
This may be what you want.
string s = "select field1 dfd t1 (select field1 from t1)select field1 from t1";
Regex r = new Regex(@"(?<=\)|^)\bselect\b.*?\bfrom\b.*?(?=\()", RegexOptions.RightToLeft);
r.Replace(s, "HELL yeah");
