Getting string between characters in Regex

I have a list of strings with the output below
stop = F6, quantity ( 1 ) // stop 0
stop = F8, quantity ( 1 ) // stop 1
stop = BN, quantity ( 1 ) // stop 2
stop = F6, quantity ( 1 ) // stop 3
stop = F8, quantity ( 1 ) // stop 4
stop = BN, quantity ( 1 ) // stop 5
stop = F6, quantity ( 1 ) // stop 6
stop = F8, quantity ( 1 ) // stop 7
stop = SC, quantity ( 1 ) // stop 8
etc
Using a foreach loop, I'm retrieving each line in the list, e.g.
`stop = F6, quantity ( 1 ) // stop 0`
However, I only need the value F6.
I know I need to use a regex to retrieve F6 in this instance, but I am unsure of the expression. After a brief regex tutorial, I tried the code below, with no luck:
`Regex.Match(output, @"=\w*,").Value.Replace("\"", "");`
Any help would be appreciated.

You can use this pattern:
"=\\s([A-Za-z0-9]{2}),"
//or
"=\\s(\\w+),"
Code:
string str = "stop = F6, quantity ( 1 ) ";
var res = Regex.Matches(str, "=\\s([A-Za-z0-9]{2}),")[0].Groups[1].Value;

I don't know much C#, but your regex is this: "= (\w+),". That regex gets any run of word characters (letters, digits, underscore) between "= " and ",".
In regex, an expression between parentheses is called a "capturing group". Most languages provide an API to retrieve the content captured by a group. I found this for C#: https://msdn.microsoft.com/fr-fr/library/system.text.regularexpressions.match.groups(v=vs.110).aspx
So the code to retrieve your data looks like this:
string pattern = @"=\s(\w+),";
MatchCollection matches = Regex.Matches(input, pattern);
foreach (Match match in matches)
{
Console.WriteLine("Value : {0}", match.Groups[1].Value);
}
To test your regex live, https://regex101.com/ is very useful! Use it to see visually what the regex does while you write it.
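Putting the answers above together, here is a minimal sketch of the whole loop. The original attempt `=\w*,` fails because it leaves no room for the space after `=`. The class and method names (`StopCodes`, `Extract`) are made up for illustration, and the pattern allows optional whitespace around the code, which is an assumption about the input rather than something the question states.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answers.
static class StopCodes
{
    // "=" then optional whitespace, then the code (captured), then a comma.
    static readonly Regex StopPattern = new Regex(@"=\s*(\w+)\s*,");

    // Returns the captured code of every line that matches; other lines are skipped.
    public static List<string> Extract(IEnumerable<string> lines)
    {
        var codes = new List<string>();
        foreach (string line in lines)
        {
            Match m = StopPattern.Match(line);
            if (m.Success)
                codes.Add(m.Groups[1].Value); // group 1 is the part in parentheses
        }
        return codes;
    }
}

class StopCodesDemo
{
    static void Main()
    {
        var lines = new[]
        {
            "stop = F6, quantity ( 1 ) // stop 0",
            "stop = BN, quantity ( 1 ) // stop 2",
        };
        Console.WriteLine(string.Join(" ", StopCodes.Extract(lines))); // F6 BN
    }
}
```

Checking `m.Success` before reading the group avoids the exception that indexing `Regex.Matches(...)[0]` would throw on a line that does not match.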

Related

C# check if characters occur in a fixed order in a string

I need to check if a user input resembles a parameter or not. It comes as a string (not changeable) and has to look like the following examples:
p123[2] -> writable array index
r23[12] -> read only array index
p3[7].5 -> writable bit in word
r1263[13].24 -> read only bit in word
15 -> simple value
The user is allowed to input any of them and my function has to distinguish them in order to call the proper function.
An idea would be to check for characters in a specific order e.g. "p[]", "r[]", "p[]." etc.
But I am not sure how to achieve that without checking every single character and using multiple cases...
Any other idea of how to make sure that the user input is correct is also welcome.

If you just need to validate user input that should come in one of the five provided formats, use a regex check:
Regex.IsMatch(str, @"^(?:(?<p>[pr]\d+)(?:\[(?<idx>\d+)])?(?:\.(?<inword>\d+))?|(?<simpleval>\d+))$")
See the regex demo
Description:
^ - start of string
(?: - start of the alternation group
(?<p>[pr]\d+) - Group "p" capturing p or r and 1 or more digits after
(?:\[(?<idx>\d+)])? - an optional sequence of [, 1 or more digits (captured into Group "idx") and then ]
(?:\.(?<inword>\d+))? - an optional sequence of a literal ., then 1 or more digits captured into Group "inword"
| - or (then comes the second alternative)
(?<simpleval>\d+) - Group "simpleval" capturing 1 or more digits
) - end of the outer grouping
$ - end of string.
If the p or r can be any ASCII letters, use [a-zA-Z] instead of [pr].
C# demo:
var strs = new List<string> { "p123[2]","r23[12]","p3[7].5","r1263[13].24","15"};
var pattern = @"^(?:(?<p>[pr]\d+)(?:\[(?<idx>\d+)])?(?:\.(?<inword>\d+))?|(?<simpleval>\d+))$";
foreach (var s in strs)
Console.WriteLine("{0}: {1}", s, Regex.IsMatch(s, pattern));

You can check whether the input matches a regex pattern:
1) Regex.IsMatch(input, @"^p\d+\[\d+\]$"); // matches p123[2]
2) Regex.IsMatch(input, @"^r\d+\[\d+\]$"); // matches r23[12]
3) Regex.IsMatch(input, @"^p\d+\[\d+\]\.\d+$"); // matches p3[7].5
4) Regex.IsMatch(input, @"^r\d+\[\d+\]\.\d+$"); // matches r1263[13].24
5) Regex.IsMatch(input, @"^\d+$"); // matches a simple value
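Since the question asks to distinguish the five shapes in order to call the proper function, the checks above can be folded into one pattern whose named groups say which shape was seen. This is only a sketch: the enum `ParamKind`, the class `ParamClassifier`, and the group names `access`, `bit` and `simple` are invented here, and the pattern assumes the index in brackets is mandatory for p/r parameters, as in the examples.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical names, chosen for illustration only.
enum ParamKind { WritableIndex, ReadOnlyIndex, WritableBit, ReadOnlyBit, SimpleValue, Invalid }

static class ParamClassifier
{
    // Either p/r + digits + [digits] with an optional ".digits" suffix, or plain digits.
    static readonly Regex Shape = new Regex(
        @"^(?:(?<access>[pr])\d+\[\d+\](?<bit>\.\d+)?|(?<simple>\d+))$");

    public static ParamKind Classify(string input)
    {
        Match m = Shape.Match(input);
        if (!m.Success) return ParamKind.Invalid;
        if (m.Groups["simple"].Success) return ParamKind.SimpleValue;

        bool writable = m.Groups["access"].Value == "p"; // p = writable, r = read only
        bool isBit = m.Groups["bit"].Success;            // ".5" style suffix present
        if (isBit) return writable ? ParamKind.WritableBit : ParamKind.ReadOnlyBit;
        return writable ? ParamKind.WritableIndex : ParamKind.ReadOnlyIndex;
    }
}

class ParamClassifierDemo
{
    static void Main()
    {
        foreach (string s in new[] { "p123[2]", "r23[12]", "p3[7].5", "r1263[13].24", "15", "x1[2]" })
            Console.WriteLine("{0} -> {1}", s, ParamClassifier.Classify(s));
    }
}
```

A `switch` on the returned `ParamKind` can then dispatch to the matching handler, which keeps the regex in one place instead of five.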

Replace regex Match with other value

I have a query like this:
select * from tdirectories where tdirectories.parent in
(
select max(tdirectories.directoryid) from tdirectories
where tdirectories.ntfsdrivedocuid in
(
select ntfsdrivedocuid from tntfsdrives, tntfsdrivedocu
where tntfsdrivedocu.ntfsdriveid = tntfsdrives.ntfsdriveid and tntfsdrives.hostid in
(
select tdocu.hostid from tdocu, tshares
where tdocu.docuid = tshares.docuid
and tdocu.archiv = 0
)
and tntfsdrivedocu.archiv = 0
)
and tdirectories.pathhash in (select tshares.pathhash from tshares )
)
What I want to do is use a regex to find this part:
select max(tdirectories.directoryid)
Inside the max there can be any value. I want to find it and remove the max, so that the result is
select tdirectories.directoryid
The regex I have created looks like this:
Regex rgx = new Regex("(select\\s.+select)\\smax\\s*\\((?<VAR>[^)]+)\\)");
But this does not solve my issue. What am I missing?

You could go for (in free mode):
select # select literally
\ # a space
max # max literally
\(([^)]+)\) # capture anything inside the parentheses
And use the first group ($1), see a demo on regex101.com.
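In C#, that idea might look like the sketch below. It uses `Regex.Replace` with `$1` in the replacement string to keep only the captured argument. The class name `MaxStripper` is made up, and the pattern is a slightly loosened variant of the one above (it tolerates extra whitespace and ignores case), which is an assumption about how the SQL may be written.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration.
static class MaxStripper
{
    // "select", whitespace, "max", optional whitespace, then the parenthesised argument (group 1).
    static readonly Regex SelectMax = new Regex(
        @"select\s+max\s*\(([^)]+)\)", RegexOptions.IgnoreCase);

    // Rewrites every "select max(x)" to "select x"; $1 refers to group 1.
    public static string Strip(string sql)
    {
        return SelectMax.Replace(sql, "select $1");
    }
}

class MaxStripperDemo
{
    static void Main()
    {
        string sql = "select max(tdirectories.directoryid) from tdirectories";
        Console.WriteLine(MaxStripper.Strip(sql));
        // select tdirectories.directoryid from tdirectories
    }
}
```

Because `[^)]+` stops at the first closing parenthesis, this sketch does not handle an argument that itself contains parentheses.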

Regex to split "&" in URL parameters only if they are followed by content ending with "="

I have a dilemma that I have been trying to resolve with malformed URLs, where specific parameters can have values containing characters that conflict with parsing the URL.
if( remaining.Contains( "?" ) || remaining.Contains( "#" ) )
{
if( remaining.Contains( "?" ) )
{
Path = remaining.Substring( 0, temp = remaining.IndexOf( "?" ) );
remaining = remaining.Substring( temp + 1 );
// Re-encode for URLs
if( remaining.Contains( "?" ) )
{
remaining = URL.Substring( URL.IndexOf( "?" ) + 1 );
}
if( remaining.IndexOf("=") >= 0 )
{
string[] qsps = Regex.Split( remaining, @"[&]\b" );// Original Method: remaining.Split( '&' );
qsps.ToList().ForEach( qsp =>
{
string[] vals = qsp.Split( '=' );
if( vals.Length == 2 )
{
Parameters.Add( vals[0], vals[1] );
}
else
{
string key = (string) vals[0].Clone();
vals[0] = "";
Parameters.Add( key, String.Join( "=", vals ).Substring( 1 ) );
}
} );
}
}
I added the line "Regex.Split( remaining, @"[&]\b" );" to split only on "&" characters that are followed by a word character, which seems useful.
I am just trying to see if there is a better approach to splitting only on the "&"s that actually separate parameters.
Example to test against (which caused this needed update):
www.myURL.com/shop/product?utm_src=bm23&utm_med=email&utm_term=apparel&utm_content=02/15/2016&utm_campaign=Last Chance! Presidents' Day Sales Event: Free Shipping & More!
A working regex should only grab the &'s for the following:
utm_src=bm23
utm_med=email
utm_term=apparel
utm_content=02/15/2016
utm_campaign=Last Chance! Presidents' Day Sales Event: Free Shipping & More!
It should NOT count the "& More" as a match, since the section does not end with "=" afterwards

I would suggest a regex using a look-ahead:
/&(?=[^&=]+=)/
You can see this in effect here: version1. It looks first for the & character, and then "peeks" forward to ensure that a = follows, but only if it does not contain another & or a = in between.
You can also ensure that there are no whitespace characters (like newlines, etc.) which aren't valid in URLs anyway (version 2):
&(?=[^\s&=]+=)
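As a rough sketch of how the second look-ahead pattern could replace the `Regex.Split` call in the question, the helper below splits a query string only at separators that start a new `key=` segment, then cuts each piece at its first `=`. The names `QueryStringSplitter` and `Parse` are hypothetical, and the input is assumed to be the part of the URL after the `?`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration.
static class QueryStringSplitter
{
    // Split only on '&' followed by a run without whitespace, '&' or '=' and then '='.
    static readonly Regex ParamSeparator = new Regex(@"&(?=[^\s&=]+=)");

    public static Dictionary<string, string> Parse(string query)
    {
        var parameters = new Dictionary<string, string>();
        foreach (string pair in ParamSeparator.Split(query))
        {
            int eq = pair.IndexOf('=');
            if (eq < 0) continue; // not in key=value form, skip it
            // Everything after the first '=' is the value, even if it contains '=' or '&'.
            parameters[pair.Substring(0, eq)] = pair.Substring(eq + 1);
        }
        return parameters;
    }
}

class QueryStringSplitterDemo
{
    static void Main()
    {
        string query = "utm_src=bm23&utm_med=email&utm_campaign=Last Chance! Free Shipping & More!";
        foreach (var kv in QueryStringSplitter.Parse(query))
            Console.WriteLine("{0} => {1}", kv.Key, kv.Value);
    }
}
```

Cutting at the first `=` with `IndexOf` also removes the need for the `vals.Length == 2` branch in the original code.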

I'd like to use this regex:
Regex.Split(url, @"(?<=(?:=\S+?))&",
RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
If you pass in your test string as url, which is
www.myURL.com/shop/product?utm_src=bm23&utm_med=email&utm_term=apparel&utm_content=02/15/2016&utm_campaign=Last Chance! Presidents' Day Sales Event: Free Shipping & More!
The output should be.
www.myURL.com/shop/product?utm_src=bm23
utm_med=email
utm_term=apparel
utm_content=02/15/2016
utm_campaign=Last Chance! Presidents' Day Sales Event: Free Shipping & More!
Please note the first line of the output,
www.myURL.com/shop/product?utm_src=bm23
which still contains the path part of the URL, but it can easily be split off at the ?

Not sure what you're trying to do, but if you want to find errant
ampersands, this is a good regex for that.
&(?=[^&=]*(?:&|$))
You could either replace with a %26 or split with it.
If you split with it, just recombine and the errant ampersand will be gone.
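A small sketch of the "replace with %26" option, using the pattern from this answer unchanged. The class name `AmpersandFixer` is invented for illustration, and the input is assumed to be the query string only.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration.
static class AmpersandFixer
{
    // An '&' is errant when no '=' appears before the next '&' or the end of the input.
    static readonly Regex ErrantAmpersand = new Regex(@"&(?=[^&=]*(?:&|$))");

    // Percent-encodes errant ampersands so that a plain Split('&') works afterwards.
    public static string Encode(string query)
    {
        return ErrantAmpersand.Replace(query, "%26");
    }
}

class AmpersandFixerDemo
{
    static void Main()
    {
        string query = "utm_med=email&utm_campaign=Free Shipping & More!";
        Console.WriteLine(AmpersandFixer.Encode(query));
        // utm_med=email&utm_campaign=Free Shipping %26 More!
    }
}
```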

(?<=[?&])([^&]*)(?=.*[&=])
Explanation:
(?<=[?&]) positive lookbehind for either '&' or '?'
([^&]*) capture as many characters as possible that aren't '&'
(?=.*[&=]) positive lookahead for either an '&' or '='
Output:
utm_src=bm23
utm_med=email
utm_term=apparel
utm_content=02/15/2016
utm_campaign=Last Chance! Presidents' Day Sales Event: Free Shipping
Demo
So to get the matches:
string str = "www.myURL.com/...";
Regex reg = new Regex("(?<=[?&])([^&]*)(?=.*[&=])");
List<string> result = reg.Matches(str).Cast<Match>().Select(m => m.Value).ToList();
Edit for the question edit:
(?<=[?&])\S.*?(?=&\S)|(?<=[?&])\S.*(?=\s)

capturing specific or group in regex C#

I'm trying to match a file name like xxxxSystemCheckedOut.png, where xxxx can be any prefix to the file name and System and CheckedOut are keywords to identify.
EDIT: I wasn't clear on all the possible file names and their results. The file names can be
xxxxSystem.png produces (group 1: xxxx group 2: System)
xxxxSystemCheckedOut.png produces (group 1: xxxx group 2: System group 3: CheckedOut)
xxxxCheckedOut.png produces (group 1: xxxx group 2: CheckedOut)
This is my current regex (last line below); it matches the file name like I want it to, but I can't get it to group in the right way.
Using the previous example I'd like the groups to be like this:
xxxx
System
CheckedOut
.png
(?:([\w]*)(CheckedOut|System)+(\.[a-z]*)\Z)

[EDIT]
Give this a try.
Pattern: (.*?)(?:(System)|(CheckedOut)|(Cached))+(.png)\Z
String: xxxxTESTSystemCached.png
Groups:
xxxxTEST
System
Cached
.png
https://regex101.com/r/jE5eA4/1

UPDATE - Based on comments to other answers:
This should work for all combinations of System/CheckedOut/Cached:
(\w+?)(System)?(CheckedOut)?(Cached)?(.png)
https://regex101.com/r/qT2sX9/1
Note that the groups for missing keywords will still exist, so for example:
"abcdSystemCached.png" gives:
Match 1 : "abcd"
Match 2 : "System"
Match 3 :
Match 4 : "Cached"
Match 5 : ".png"
And "1234CheckedOutCached.png" gives:
Match 1 : "1234"
Match 2 :
Match 3 : "CheckedOut"
Match 4 : "Cached"
Match 5 : ".png"
This is kinda nice as you know a particular keyword will always be a certain position, so it becomes like a flag.
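To illustrate the "flag" idea in C#, the sketch below reads `Group.Success` for each optional keyword group. It is not the answer's exact pattern: the dot is escaped and the pattern is anchored with `^` and `$`, both changes made here for the example, and the class name `FileNameFlags` is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration.
static class FileNameFlags
{
    // Group 1 = prefix, groups 2-4 = optional keywords in fixed positions, group 5 = extension.
    static readonly Regex Name = new Regex(
        @"^(\w+?)(System)?(CheckedOut)?(Cached)?(\.png)$");

    public static string Describe(string fileName)
    {
        Match m = Name.Match(fileName);
        if (!m.Success) return "no match";
        // Success is false for an optional group that did not take part in the match.
        return string.Format("prefix={0} System={1} CheckedOut={2} Cached={3}",
            m.Groups[1].Value, m.Groups[2].Success, m.Groups[3].Success, m.Groups[4].Success);
    }
}

class FileNameFlagsDemo
{
    static void Main()
    {
        Console.WriteLine(FileNameFlags.Describe("abcdSystemCached.png"));
        // prefix=abcd System=True CheckedOut=False Cached=True
        Console.WriteLine(FileNameFlags.Describe("1234CheckedOutCached.png"));
        // prefix=1234 System=False CheckedOut=True Cached=True
    }
}
```

Testing `Success` is a little safer than comparing `Value` to an empty string, since it distinguishes "group did not participate" from "group matched nothing".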

From the comments: I actually need the groups separately so I know how to operate on the image, each keyword ends in different operations on the image
You really don't need to use separate capture buffers on the keywords.
If you need the order of the matched keywords relative to one another,
you'd use the below code. Even if you didn't need the order it could be
done like that.
( .*? ) # (1)
( System | CheckedOut )+ # (2)
\.png $
C#:
string fname = "xxxxSystemCheckedOutSystemSystemCheckedOutCheckedOut.png";
Regex RxFname = new Regex( @"(.*?)(System|CheckedOut)+\.png$" );
Match fnameMatch = RxFname.Match( fname );
if ( fnameMatch.Success )
{
Console.WriteLine("Group 0 = {0}", fnameMatch.Groups[0].Value);
Console.WriteLine("Group 1 = {0}", fnameMatch.Groups[1].Value);
Console.WriteLine("Last Group 2 = {0}\n", fnameMatch.Groups[2].Value);
CaptureCollection cc = fnameMatch.Groups[2].Captures;
Console.WriteLine("Array and order of group 2 matches (collection):\n");
for (int i = 0; i < cc.Count; i++)
{
Console.WriteLine("[{0}] = '{1}'", i, cc[i].Value);
}
}
Output:
Group 0 = xxxxSystemCheckedOutSystemSystemCheckedOutCheckedOut.png
Group 1 = xxxx
Last Group 2 = CheckedOut
Array and order of group 2 matches (collection):
[0] = 'System'
[1] = 'CheckedOut'
[2] = 'System'
[3] = 'System'
[4] = 'CheckedOut'
[5] = 'CheckedOut'

I'm no Regex wizard, so if this can be shortened/tidied I'd love to know, but this groups like you want based on the keywords you gave:
Edited based on OPs clarification of the file structure
/(\w+?)(system)?(checkedout)?(cached)?(.png)/ig
Regex101 Demo
Edit: beercohol and jon have me beat ;-)

I read somewhere (can't remember where) the more precise your pattern is, the better performance you'll get from it.
So try this pattern
"(\\w+?)(?:(System)|(CheckedOut))+(.png)"
Code Sample:
List<string> fileNames = new List<string>
{
"xxxxSystemCheckedOut.png", // Good
"SystemCheckedOut.png", // Good
"1afweiljSystemCheckedOutdgf.png", // Bad - Garbage characters before .png
"asdf.png", // Bad - No System or CheckedOut
"xxxxxxxSystemCheckedOut.bmp", // Bad - Wrong file extension
"xxSystem.png", // Good
"xCheckedOut.png" // Good
};
foreach (Match match in fileNames.Select(fileName => Regex.Match(fileName, "(\\w+?)(?:(System)|(CheckedOut))+(.png)")))
{
List<Group> matchedGroups = match.Groups.Cast<Group>().Where(group => !String.IsNullOrEmpty(group.Value)).ToList();
if (matchedGroups.Count > 0)
{
matchedGroups.ForEach(Console.WriteLine);
Console.WriteLine();
}
}
Results:
xxxxSystemCheckedOut.png
xxxx
System
CheckedOut
.png
SystemCheckedOut.png
System
CheckedOut
.png
xxSystem.png
xx
System
.png
xCheckedOut.png
x
CheckedOut
.png

How do I find the Nth occurrence of a pattern with regex?

I have a string of numbers separated by some non-numeric character like this: "16-15316-273"
Is it possible to build a regex that returns the Nth match? I heard that ${n} might help, but it does not work for me, at least in this expression:
// Example: I want to get 15316
var r = new Regex(@"(\d+)${1}");
var m = r.Match("16-15316-273");
(\d+)${0} returns 16, but (\d+)${1} gives me 273 instead of the expected 15316.
So N, the ordinal of the occurrence to extract, and the input string itself ("16-15316-273" is just an example) are dynamic values which might change during app execution. The task is to build a regex in which the only thing that changes is N, and which is applicable to any such string.
Please do not offer solutions with any additional C# code like m.Groups[n] or Split; I'm intentionally asking for a proper regex pattern. In short, I cannot modify the code for every new N value; all I can modify is the regex, which is built dynamically, with N passed as a parameter to the method. All the rest is static, with no way to change it.

Maybe this expression will help you?
(?<=(\d+[^\d]+){1})\d+
You will need to modify {1} according to your N.
I.e.
(?<=(\d+[^\d]+){0})\d+ => 16
(?<=(\d+[^\d]+){1})\d+ => 15316
(?<=(\d+[^\d]+){2})\d+ => 273

Your regular expression
(\d+)${1}
says to match this:
(\d+): match 1 or more decimal digits, followed by
${1}: match the atomic zero-width assertion "end of input string" exactly once.
One should note that the {1} quantifier is redundant since there's normally only one end-of-input-string (unless you've turned on the multiline option).
That's why you're matching '273': it's the longest sequence of digits anchored at end-of-string.
You need to use a zero-width positive look-behind assertion. To capture the Nth field in your string, you need to capture that string of digits that is preceded by N-1 fields. Given this source string:
string input = "1-22-333-4444-55555-666666-7777777-88888888-999999999" ;
The regular expression to match the 3rd field, where the first field is 1 rather than 0 looks like this:
(?<=^(\d+(-|$)){2})\d+
It says to
match the longest sequence of digits that is preceded by
start of text, followed by
a group, consisting of
1 or more decimal digits, followed by
either a - or end-of-text
with that group repeated exactly 2 times
Here's a sample program:
string src = "1-22-333-4444-55555-666666-7777777-88888888-999999999" ;
for ( int n = 1 ; n <= 10 ; ++n )
{
int n1 = n-1 ;
string x = n1.ToString(CultureInfo.InvariantCulture) ;
string regex = @"(?<=^(\d+(-|$)){"+ x + @"})\d+" ;
Console.Write( "regex: {0} ",regex);
Regex rx = new Regex( regex ) ;
Match m = rx.Match( src ) ;
Console.WriteLine( "N={0,-2}, N-1={1,-2}, {2}" ,
n ,
n1 ,
m.Success ? "success: " + m.Value : "failure"
) ;
}
It produces this output:
regex: (?<=^(\d+(-|$)){0})\d+ N= 1, N-1=0 , success: 1
regex: (?<=^(\d+(-|$)){1})\d+ N= 2, N-1=1 , success: 22
regex: (?<=^(\d+(-|$)){2})\d+ N= 3, N-1=2 , success: 333
regex: (?<=^(\d+(-|$)){3})\d+ N= 4, N-1=3 , success: 4444
regex: (?<=^(\d+(-|$)){4})\d+ N= 5, N-1=4 , success: 55555
regex: (?<=^(\d+(-|$)){5})\d+ N= 6, N-1=5 , success: 666666
regex: (?<=^(\d+(-|$)){6})\d+ N= 7, N-1=6 , success: 7777777
regex: (?<=^(\d+(-|$)){7})\d+ N= 8, N-1=7 , success: 88888888
regex: (?<=^(\d+(-|$)){8})\d+ N= 9, N-1=8 , success: 999999999
regex: (?<=^(\d+(-|$)){9})\d+ N=10, N-1=9 , failure

Try this:
string text = "16-15316-273";
Regex r = new Regex(#"\d+");
var m = r.Match(text, text.IndexOf('-'));
The output is 15316 ;)
