How can I match the inner expression of a nested expression with regular expressions? - c#

I have this code in C#.
This works:
string code = "dqwdSTART12sdaSTART12312ENDsdfSTARTasdsaENDasdaENDqwe";
string pattern = "START[^(START)(END)]*END";
But this does not:
string code = "dqwdstart12sdastart12312endsdfstartasdsaendasdaendqwe";
string pattern = "start[^(start)(end)]*end";
How can I make the second case match? (Preferably in C#.)

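The pattern [^(start)(end)] does not mean what you think. It does not mean "none of these words"; it means "none of the characters enclosed between [ and ]".
The uppercase version only worked because the text between START and END happened to contain none of those characters (digits and lowercase letters only). In the lowercase version, the letters a, s and d in asdsa are all in the excluded set, so the match fails.
Use this pattern instead:
START((?:(?!START|END).)*)END
with the gi options (in C#, RegexOptions.IgnoreCase together with Regex.Matches).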
START # "START"
( # Capturing Group (1)
(?: # Non Capturing Group
(?! # Negative Look-Ahead
START # "START"
| # OR
END # "END"
) # End of Negative Look-Ahead
. # Any character except line break
) # End of Non Capturing Group
* # (zero or more)(greedy)
) # End of Capturing Group (1)
END # "END"
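The tempered pattern explained above can be tried out with a short C# sketch. The class and method names here (InnermostBlocks, Find) are made up for illustration; only the pattern and the sample strings come from the question and answer.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class InnermostBlocks
{
    // Tempered dot: a character is consumed only if neither delimiter
    // starts at that position, so a match can never span a nested block.
    private static readonly Regex Pattern = new Regex(
        @"START((?:(?!START|END).)*)END",
        RegexOptions.IgnoreCase);

    // Returns the content (capture group 1) of every START...END block
    // that contains no further START or END inside it.
    public static List<string> Find(string input)
    {
        var result = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }
}
```

Calling InnermostBlocks.Find on the lowercase sample string from the question should return the two innermost contents, 12312 and asdsa; the outer block is skipped because it contains nested delimiters.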

You can also try this if you do not want to include start and end in the match and only want the content between them:
(?<=start)(?:(?!start|end).)*(?=end)
See demo: http://regex101.com/r/yP3iB0/23

Related

Find all words enclosed within #{{Word}}# or {{Word}} from string

I am writing code to extract all the words enclosed within #{{}}# and {{}}. So far I have searched the web and found the code below, which works as expected.
string sampleString = "A #{{Quick}}# brown #{{fox}}# jumps #{{over}}# a lazy {{dog}}.";
List<string> keywordList = new List<string>();
MatchCollection matchedCollection = Regex.Matches(sampleString, @"(#{{(.*?)}}#|{{(.*?)}})");
foreach (Match m in matchedCollection)
{
    keywordList.Add(m.ToString());
}
The above code works fine; it gives me the 4 items listed below, which is correct.
#{{Quick}}#
#{{fox}}#
#{{over}}#
{{dog}}
But, the problem arises when the word is not properly enclosed in the brackets/pattern. For example, if I have improperly formatted string like below, I'll get incorrect result.
string sampleString = "A #{{Quick}}# brown #{{fox jumps #{{over}}# a lazy {{dog}}.";
Code with above string input will give me three items in list.
Current Result:
#{{Quick}}#
#{{fox jumps #{{over}}#
{{dog}}
Expected Result
#{{Quick}}#
#{{over}}#
{{dog}}
Any suggestion to correct this would be really appreciated.

If all you want is words without spaces inside, then you can use \S instead of . (any non-whitespace character instead of any character):
MatchCollection matchedCollection = Regex.Matches(sampleString, @"(#{{(\S*?)}}#|{{(\S*?)}})");

The repetition in the regex seems redundant, unless you really care to know whether the # characters were present or not. If you are stripping them off anyway, then Regex.Matches(sampleString, @"{{(\S*?)}}") will be fine.

If there should be exactly two curly braces on each side, you can use lookarounds to assert that there are no additional curly braces before or after them.
(#?)(?<!{){{(?!{)\S+?(?<!})}}(?!})\1
Explanation
(#?) Capture group 1, match an optional #
(?<!{){{(?!{) Match {{ not preceded or followed by {
\S+? Match 1 or more non-whitespace chars, as few as possible
(?<!})}}(?!}) Match }} not preceded or followed by }
\1 Backreference to what is captured in group 1
If it is not a problem to have more than two curly braces, you can omit the lookarounds:
(#?){{\S+?}}\1
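As a rough illustration of the backreference approach, here is a minimal C# sketch using the shorter pattern without lookarounds. The names BraceWords and Extract are hypothetical; the input is the malformed string from the question.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BraceWords
{
    // Group 1 captures an optional leading '#'; \1 then requires the same
    // text (either "#" or nothing) after the closing braces.
    private static readonly Regex Pattern = new Regex(@"(#?){{\S+?}}\1");

    // Returns every #{{word}}# or {{word}} token whose content has no whitespace.
    public static List<string> Extract(string input)
    {
        var result = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            result.Add(m.Value);
        }
        return result;
    }
}
```

With the improperly formatted input "A #{{Quick}}# brown #{{fox jumps #{{over}}# a lazy {{dog}}.", this should yield #{{Quick}}#, #{{over}}# and {{dog}}, because the unterminated #{{fox is followed by a space before any closing braces appear.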

You should not worry if your matches contain spaces or not, use a proper regex like
(#)?{{(?:(?!#{{).)*?}}(?(1)#|)
See proof.
C# code:
var matchedCollection = Regex.Matches(sampleString, @"(#)?{{(?:(?!#{{).)*?}}(?(1)#|)", RegexOptions.Singleline);
Explanation
(         # group and capture to \1 (optional):
  #       #   '#'
)?        # end of \1
{{        # '{{'
(?:       # group, but do not capture (0 or more times, matching the least amount possible):
  (?!     #   look ahead to see if there is not:
    #{{   #     '#{{'
  )       #   end of look-ahead
  .       #   any character (including \n with RegexOptions.Singleline)
)*?       # end of grouping
}}        # '}}'
(?(1)     # if back-reference \1 matched, then:
  #       #   '#'
|         # else:
          #   succeed
)         # end of conditional on \1

Remove First and Last Specific Char

I don't know whether this is a duplicate; I could not find a similar question (maybe because it is hard to put into a title).
I have this string:
string a = "(Hello(World),World(Hello))";
I need to remove the first opening bracket and the last closing bracket, to get this output:
Hello(World),World(Hello)
I do not want to remove the first and last characters of the string.
I need to remove the first occurrence of a specific character (the opening bracket) and the last occurrence of another (the closing bracket).
That is, if the string is:
string a = "gyfw(Hello(World),World(Hello))";
the output should be:
gyfw Hello(World),World(Hello)

To remove first specific char:
a = a.Remove(a.IndexOf("("), 1);
To remove last specific char:
a = a.Remove(a.LastIndexOf(")"), 1);
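One caveat with the two calls above: IndexOf and LastIndexOf return -1 when the character is absent, and Remove throws for a negative index. A small guarded sketch (the BracketTrimmer and RemoveOuter names are invented for this example) might look like this:

```csharp
public static class BracketTrimmer
{
    // Removes the first '(' and the last ')' if they are present.
    // All other characters, including inner brackets, are left untouched.
    public static string RemoveOuter(string s)
    {
        int open = s.IndexOf('(');
        if (open >= 0)
        {
            s = s.Remove(open, 1);
        }

        int close = s.LastIndexOf(')');
        if (close >= 0)
        {
            s = s.Remove(close, 1);
        }

        return s;
    }
}
```

For "gyfw(Hello(World),World(Hello))" this gives gyfwHello(World),World(Hello) with no space inserted; a string without brackets is returned unchanged.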

If the parentheses must be balanced, it can be done with this regex:
Find: @"\(((?>[^()]+|\((?<Depth>)|\)(?<-Depth>))*(?(Depth)(?!)))\)"
Replace: @"$1"
If the content is required to be non-empty, change the * to a +.
If the pattern should match only once and span the whole string, add ^ and $ to the beginning and end of the regex respectively.
Here is the regex explained
\( # Match ( a open parenth
( # (1 start), Capture the core, to be written back
(?> # Then either match (possessively):
[^()]+ # Any character except parenths
| # or
\( # Open ( increase the paren counter
(?<Depth> )
| # or
\) # Close ) decrease the paren counter
(?<-Depth> )
)* # Repeat as needed.
(?(Depth) # Assert that the paren counter is at zero.
(?!)
)
) # (1 end)
\) # Match ) a closing parenth
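To see the balancing-group pattern in action, here is a minimal sketch. BalancedParens and StripOuter are illustrative names only; the pattern is the one from this answer, and balancing groups such as (?<-Depth>) are specific to the .NET regex engine.

```csharp
using System.Text.RegularExpressions;

public static class BalancedParens
{
    // Each inner '(' pushes onto the Depth stack and each inner ')' pops it.
    // (?(Depth)(?!)) fails the match if anything is still on the stack.
    private static readonly Regex Outer = new Regex(
        @"\(((?>[^()]+|\((?<Depth>)|\)(?<-Depth>))*(?(Depth)(?!)))\)");

    // Replaces each balanced (...) group with its content, dropping the outer pair.
    public static string StripOuter(string s)
    {
        return Outer.Replace(s, "$1");
    }
}
```

For the question's input "(Hello(World),World(Hello))" this should return Hello(World),World(Hello), since the inner pairs are consumed as part of capture group 1.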

If the brackets are always the first and last characters of the string, you can use the String.Substring() method to strip them by position.
So, if your string is stored in a variable myval:
myval = myval.Substring(1, myval.Length - 2);

Validator for file name with custom words between curly braces

I have regex:
[\w,\s-]+\.[A-Za-z]+$
and a filename:
test-file_name-5.pdf
And it works okay. But now I want to add something like this:
my-filename{time}.pdf
or this:
test{word}hello.pdf
and the regex should accept it.
If there is only an opening or only a closing curly brace, it should fail. The braces may contain a-z, A-Z and 0-9.
I tried with RegExr but couldn't get it to work.

You can use the following regex:
^[\w,\s-]+(?:(?:{[A-Za-z\d]+}[\w,\s-]*)?)*\.[A-Za-z]+$
Explanation:
^ # Assert position at the beginning of the string
[\w,\s-]+ # Beginning of the filename
(?: # Begin group
(?: # Begin group
{[A-Za-z\d]+} # Match {...} part
[\w,\s-]* # Followed by optional characters
)? # Make the group optional
)* # Repeat the group zero or more times
\.[A-Za-z]+ # Match the filename extension
$ # Assert position at the end of the string
This matches:
test-file_name-5.pdf
my-filename{23m}.pdf
test{word1}hello{word2}xyz.pdf
test{word}hello.pdf
But doesn't match:
foo-filename{23m.pdf
foo-filename23m}.pdf
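The examples listed above can be checked with a small C# sketch. Two things here are my own assumptions rather than part of the answer: the FileNameValidator and IsValid names, and a slightly simplified pattern in which the nested optional group (?:(?:...)?)* is written as the equivalent (?:...)* and the braces are escaped for clarity.

```csharp
using System.Text.RegularExpressions;

public static class FileNameValidator
{
    // Name part, then zero or more {alphanumeric} placeholders (each optionally
    // followed by more name characters), then a dot and a letters-only extension.
    private static readonly Regex Pattern = new Regex(
        @"^[\w,\s-]+(?:\{[A-Za-z\d]+\}[\w,\s-]*)*\.[A-Za-z]+$");

    public static bool IsValid(string fileName)
    {
        return Pattern.IsMatch(fileName);
    }
}
```

IsValid should accept test-file_name-5.pdf and my-filename{23m}.pdf, and reject foo-filename{23m.pdf and foo-filename23m}.pdf, because a lone brace is not in any of the allowed character classes.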

C# Regular Expression excluding a string

I have a collection of strings, and I want a regex that collects every href that starts with http:
href="http://www.test.com/cat/1-one_piece_episodes/"href="http://www.test.com/cat/2-movies_english_subbed/"href="http://www.test.com/cat/3-english_dubbed/"href="http://www.exclude.com"
This is my regular expression pattern:
href="(.*?)[^#]"
and it returns this:
href="http://www.test.com/cat/1-one_piece_episodes/"
href="http://www.test.com/cat/2-movies_english_subbed/"
href="http://www.xxxx.com/cat/3-english_dubbed/"
href="http://www.exclude.com"
What is the pattern for excluding the last match, or for excluding any match that contains the exclude domain, like href="http://www.exclude.com"?
EDIT:
For multiple exclusions:
href="((?:(?!"|\bexclude\b|\bxxxx\b).)*)[^#]"

@ridgerunner and I would change the regex to:
href="((?:(?!\bexclude\b)[^"])*)[^#]"
It matches all href attributes as long as they don't end in # and don't contain the word exclude.
Explanation:
href=" # Match href="
( # Capture...
(?: # the following group:
(?! # Look ahead to check that the next part of the string isn't...
\b # the entire word
exclude # exclude
\b # (\b are word boundary anchors)
) # End of lookahead
[^"] # If successful, match any character except for a quote
)* # Repeat as often as possible
) # End of capturing group 1
[^#]" # Match a non-# character and the closing quote.
To allow multiple "forbidden words":
href="((?:(?!\b(?:exclude|this|too)\b)[^"])*)[^#]"
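Here is a short C# sketch of how the multi-word pattern could be applied to the input from the question. HrefFilter and Find are hypothetical names; inside the verbatim string each double quote of the pattern is written as "".

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefFilter
{
    // Before each character of the URL, the lookahead checks that none of the
    // forbidden words starts at that position; [^"] keeps the match inside one attribute.
    private static readonly Regex Pattern = new Regex(
        @"href=""((?:(?!\b(?:exclude|this|too)\b)[^""])*)[^#]""");

    // Returns the full href="..." text of every attribute that passes the filter.
    public static List<string> Find(string input)
    {
        var result = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            result.Add(m.Value);
        }
        return result;
    }
}
```

On the sample input, the three www.test.com links should be returned and the www.exclude.com link skipped. Note that capture group 1 stops one character short of the closing quote, because the final [^#] consumes the last character of the URL; use m.Value (as above) when the whole attribute is wanted.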

Your input doesn't look like a valid string literal (unless you escape the quotes in it), but you can also do this without a regex:
string input = "href=\"http://www.test.com/cat/1-one_piece_episodes/\"href=\"http://www.test.com/cat/2-movies_english_subbed/\"href=\"http://www.test.com/cat/3-english_dubbed/\"href=\"http://www.exclude.com\"";
List<string> matches = new List<string>();
// Splitting on a string separator needs the (string[], StringSplitOptions) overload.
foreach (var match in input.Split(new[] { "href" }, StringSplitOptions.RemoveEmptyEntries)) {
    if (!match.Contains("exclude.com"))
        matches.Add("href" + match);
}

Will this do the job? (The dot in exclude\.com is escaped so that it only matches a literal dot.)
href="(?!http://[^/"]+exclude\.com)(.*?)[^#]"

Regular expression to find separator dots in formula

The C# expression library I am using does not directly support my table/field parameter syntax.
The following table/field parameter names are not directly supported:
TableName1.FieldName1
[TableName1].[FieldName1]
[Table Name 1].[Field Name 1]
It accepts alphanumeric parameters without spaces, or most characters enclosed within square brackets. I would like to use C# regular expressions to replace the dot separators and neighboring brackets to a different delimiter, so the results would be as follows:
[TableName1|FieldName1]
[TableName1|FieldName1]
[Table Name 1|Field Name 1]
I also need to skip any string literals within single quotes, like:
'TableName1.FieldName1'
And, of course, ignore any numeric literals like:
12345.6789
EDIT: Thank you for your feedback on improving my question. Hopefully it is clearer now.

I've written a completely new answer, now that the problem is clarified:
You can do this in a single regex. It is quite bulletproof, I think, but as you can see, it's not exactly self-explanatory, which is why I've commented it liberally. Hope it makes sense.
You're lucky that .NET allows re-use of named capturing groups, otherwise you would have had to do this in several steps.
resultString = Regex.Replace(subjectString,
@"(?: # Either match...
(?<before> # (and capture into backref <before>)
(?=\w*\p{L}) # (as long as it contains at least one letter):
\w+ # one or more alphanumeric characters,
) # (End of capturing group <before>).
\. # then a literal dot,
(?<after> # (now capture again, into backref <after>)
(?=\w*\p{L}) # (as long as it contains at least one letter):
\w+ # one or more alphanumeric characters.
) # (End of capturing group <after>) and end of match.
| # Or:
\[ # Match a literal [
(?<before> # (now capture into backref <before>)
[^\]]+ # one or more characters except ]
) # (End of capturing group <before>).
\]\.\[ # Match literal ].[
(?<after> # (capture into backref <after>)
[^\]]+ # one or more characters except ]
) # (End of capturing group <after>).
\] # Match a literal ]
) # End of alternation. The match is now finished, but
(?= # only if the rest of the line matches either...
[^']*$ # only non-quote characters
| # or
(?:[^']*'[^']*')+ # contains an even number of quote characters
[^']* # plus any number of non-quote characters
$ # until the end of the line.
) # End of the lookahead assertion.",
"[${before}|${after}]", RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);
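For experimenting, the same idea can be packed into a compact helper. This is only a sketch under some assumptions: the DotSeparators and Rewrite names are invented, the pattern is condensed onto one line, and the quote check is written as a repeated pair of quotes so that any even number of quotes remaining after the match is accepted.

```csharp
using System.Text.RegularExpressions;

public static class DotSeparators
{
    // Branch 1: word.word where each side contains at least one letter
    //           (this rules out numeric literals such as 12345.6789).
    // Branch 2: [anything].[anything].
    // The final lookahead requires an even number of single quotes after the
    // match, which means the match itself is not inside a quoted literal.
    private static readonly Regex Pattern = new Regex(
        @"(?:(?<before>(?=\w*\p{L})\w+)\.(?<after>(?=\w*\p{L})\w+)" +
        @"|\[(?<before>[^\]]+)\]\.\[(?<after>[^\]]+)\])" +
        @"(?=(?:[^']*'[^']*')*[^']*$)");

    // Rewrites Table.Field and [Table].[Field] as [Table|Field].
    public static string Rewrite(string formula)
    {
        return Pattern.Replace(formula, "[${before}|${after}]");
    }
}
```

For an input such as "TableName1.FieldName1 + [Table Name 1].[Field Name 1] + 12345.6789 + 'A.B'", the first two references should be rewritten with the | delimiter, while the number and the quoted literal stay as they are.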
