match optional special characters - c#

This question has been asked before (in this link), but there is no correct answer there. I have some SQL query text and I want to get the names of all the functions created in it (the whole name, including the schema).
My string may look like this:
create function [SN].[FunctionName] test1 test1 ...
create function SN.FunctionName test2 test2 ...
create function functionName test3 test3 ...
I want to get both [SN].[FunctionName] and SN.FunctionName.
I tried this regex:
create function (.*?\]\.\[.*?\])
but this only matches the first statement. How can I make those brackets optional in the regex?

This one works for me:
create function\s+\[?\w+\]?\.\[?\w+\]?
var regExp = @"create function" + // required string literal
    @"\s+" +  // allow one or more whitespace characters before the function name
    @"\[?" +  // '[' is a special character, so escape it and make it optional with '?'
    @"\w+" +  // one or more word characters (letters, digits or underscore) for the name
    @"\]?" +  // optional closing bracket
    @"\." +   // required dot; escape it with '\' because '.' is a special character
    @"\[?" +  // the same as before for the second part of the name
    @"\w+" +
    @"\]?";
See test example: http://regexr.com/3bo0e
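A minimal C# sketch of how this pattern could be used (not part of the original answer): it runs the regex over the three sample lines from the question, assumed here to be joined with "\n", and cuts the create function prefix off each match.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample SQL from the question, assumed to be separated by "\n".
var sql = "create function [SN].[FunctionName] test1 test1\n" +
          "create function SN.FunctionName test2 test2\n" +
          "create function functionName test3 test3";

// Each '?' makes the preceding bracket optional.
var regExp = new Regex(@"create function\s+\[?\w+\]?\.\[?\w+\]?");

// The match includes the "create function" prefix; cut it off and trim the spaces.
var names = regExp.Matches(sql)
    .Cast<Match>()
    .Select(m => m.Value.Substring("create function".Length).Trim())
    .ToList();

foreach (var name in names)
    Console.WriteLine(name);
// [SN].[FunctionName]
// SN.FunctionName
```

The third line is skipped because functionName is not followed by a dot.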

You can use lookarounds:
(?<=create function )(\s*\S+\..*?)(?=\s)
Demo on regex101.com
It captures the text that follows the create function literal (and any extra spaces) up to the next whitespace character, provided that this text contains at least one dot.
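Because lookbehind and lookahead are zero-width, the match value itself is the function name and no prefix has to be stripped. A small C# sketch, assuming the same three sample lines from the question joined with "\n":

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var sql = "create function [SN].[FunctionName] test1 test1\n" +
          "create function SN.FunctionName test2 test2\n" +
          "create function functionName test3 test3";

// Lookbehind/lookahead must match, but they are not part of m.Value.
var rx = new Regex(@"(?<=create function )(\s*\S+\..*?)(?=\s)");

var names = rx.Matches(sql).Cast<Match>().Select(m => m.Value).ToList();
Console.WriteLine(string.Join(", ", names)); // [SN].[FunctionName], SN.FunctionName
```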

To make some subpattern optional, you need to use the ? quantifier that matches 1 or 0 occurrences of the preceding subpattern.
In your case, you can use
create[ ]function[ ](?<name>\[?[^\]\s.]*\]?\.\[?[^\]\s.]*\]?)
The four ? quantifiers after \[ and \] make each bracket optional.
The regex matches a string starting with create function and then captures the name as follows:
var rx = new Regex(@"create[ ]function[ ]
(?<name>\[? # optional opening square bracket
[^\]\s.]* # 0 or more characters other than `.`, whitespace, or `]`
\]? # optional closing square bracket
\. # a literal `.`
\[? # optional opening square bracket
[^\]\s.]* # 0 or more characters other than `.`, whitespace, or `]`
\]? # optional closing square bracket
)", RegexOptions.IgnorePatternWhitespace);
See demo
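Reading the name group from each match might look like this in C#; a sketch that assumes the question's sample lines joined with "\n" and a condensed form of the pattern above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var sql = "create function [SN].[FunctionName] test1 test1\n" +
          "create function SN.FunctionName test2 test2\n" +
          "create function functionName test3 test3";

// IgnorePatternWhitespace drops the line break and indentation in the pattern,
// while the spaces inside [ ] stay literal.
var rx = new Regex(@"create[ ]function[ ]
    (?<name>\[?[^\]\s.]*\]?\.\[?[^\]\s.]*\]?)", RegexOptions.IgnorePatternWhitespace);

// Read the named group rather than the whole match.
var names = rx.Matches(sql).Cast<Match>().Select(m => m.Groups["name"].Value).ToList();
Console.WriteLine(string.Join(", ", names)); // [SN].[FunctionName], SN.FunctionName
```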

Related

Greedy regex finding latest brace

I'm trying to parse some variable definitions to extract documentation automatically, but I'm having trouble skipping the } characters that may appear in the default values.
Here's a sample...
variable "a" {
type = string
description = "A desc"
default = ""
}
variable "b" {
type = map()
description = "B desc"
default = {}
}
variable "c" {
type = list(string)
description = "C desc"
default = []
}
And the regex I'm using
variable.\"(?<name>\w+)\"(.*?)description.=."(?<desc>[^"\\]*(?:\\.[^"\\]*)*)"(.*?)}
with a replacement of
'* `${name}`: ${desc}
This gives the output
* `a`: A desc
* `b`: B desc
}
* `c`: C desc
I need the regex to be in single-line mode and non-greedy so that it stays within each variable definition, but then I can't stop it from matching the first } it finds. Ideally I could match ^}, but again, we are in single-line mode, so that doesn't apply.
See if this will work for your dataset:
variable.\"(?<name>\w+)\"(.*?)description.=."(?<desc>[^"\\]*(?:\\.[^"\\]*)*)"(.*?)(?=[^}]+?variable|$)
Try it on Regex101
Here I replaced } at the end with (?=[^}]+?variable|$). This should ensure that the last capturing group will keep consuming characters until there are no more closing braces before the next variable (or the end of the input).
You can match the variable and the description values and match all lines in between that do not start with } using a negative lookahead.
variable\s*"(?<name>\w+)"\s*{(?:(?!\r?\n}|\bdescription\b).)*description\s*=\s*"(?<desc>[^"]*(?:\\.[^"]*)*)"(?:(?!\r?\n}).)*\r?\n}
Explanation
variable\s*" Match variable, optional whitespace chars, and then "
(?<name>\w+) Group name, match 1+ word chars
"\s*{ Match the closing ", optional whitespace chars, and {
(?:(?!\r?\n}|\bdescription\b).)* Match any char (in Singleline mode the dot also matches newlines) as long as what is directly to the right is neither a newline followed by } nor the word description
description\s*=\s*" Match description=" with optional whitespace chars around the =
(?<desc>[^"]*(?:\\.[^"]*)*) Named group desc to capture the description
" Match the closing "
(?:(?!\r?\n}).)* Match any char (Singleline mode) as long as what is directly to the right is not a newline followed by }
\r?\n} Match a newline and }
.NET regex demo
It is quite verbose; a somewhat more optimized pattern might be:
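As a rough check (a sketch, not taken from either answer), both the lookahead pattern and the negative-lookahead pattern above can be run through Regex.Replace with RegexOptions.Singleline and the question's replacement string. The sample input is assumed to use "\n" line endings without indentation, as shown in the question.

```csharp
using System;
using System.Text.RegularExpressions;

var hcl = string.Join("\n",
    "variable \"a\" {", "type = string", "description = \"A desc\"", "default = \"\"", "}",
    "variable \"b\" {", "type = map()", "description = \"B desc\"", "default = {}", "}",
    "variable \"c\" {", "type = list(string)", "description = \"C desc\"", "default = []", "}");

// Lookahead version: the last group runs until no '}' remains before the next "variable".
var lookaheadPattern = @"variable.\""(?<name>\w+)\""(.*?)description.=.""(?<desc>[^""\\]*(?:\\.[^""\\]*)*)""(.*?)(?=[^}]+?variable|$)";

// Negative-lookahead version: each match stops at a '}' that starts a line.
var lineBracePattern = @"variable\s*""(?<name>\w+)""\s*{(?:(?!\r?\n}|\bdescription\b).)*description\s*=\s*""(?<desc>[^""]*(?:\\.[^""]*)*)""(?:(?!\r?\n}).)*\r?\n}";

const string replacement = "* `${name}`: ${desc}";
var out1 = Regex.Replace(hcl, lookaheadPattern, replacement, RegexOptions.Singleline);
var out2 = Regex.Replace(hcl, lineBracePattern, replacement, RegexOptions.Singleline);

Console.WriteLine(out1);
// * `a`: A desc
// * `b`: B desc
// * `c`: C desc
Console.WriteLine(out1 == out2); // True
```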
variable\s*"(?<name>\w+)"\s*{[^}d]*(?>}(?<!\r?\n.)[^}]*|(?!\bdescription\s*=\s*"[^"]*")d[^d]*)*\bdescription\s*=\s*"(?<desc>[^"]*(?:\\.[^"]*)*)"[^}]*(?:}(?<!\r?\n.)[^}]*)*\r?\n}
Regex demo
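Regarding the ^} idea in the question: in .NET, RegexOptions.Singleline only changes what . matches, while RegexOptions.Multiline is what makes ^ match at the start of every line, and the two options can be combined. The following is an alternative sketch, not taken from the answers above; it uses lazy .*? and stops at the first } that begins a line. It assumes the question's sample input with "\n" line endings.

```csharp
using System;
using System.Text.RegularExpressions;

var hcl = string.Join("\n",
    "variable \"a\" {", "type = string", "description = \"A desc\"", "default = \"\"", "}",
    "variable \"b\" {", "type = map()", "description = \"B desc\"", "default = {}", "}",
    "variable \"c\" {", "type = list(string)", "description = \"C desc\"", "default = []", "}");

// Singleline: '.' also matches '\n'. Multiline: '^' also matches right after '\n'.
// Together, the final lazy .*? stops at the first '}' that begins a line,
// so the '}' in "default = {}" is skipped.
var pattern = @"variable\s*""(?<name>\w+)"".*?description\s*=\s*""(?<desc>[^""]*)"".*?^}";

var output = Regex.Replace(hcl, pattern, "* `${name}`: ${desc}",
    RegexOptions.Singleline | RegexOptions.Multiline);
Console.WriteLine(output);
// * `a`: A desc
// * `b`: B desc
// * `c`: C desc
```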

Regex to find special pattern

I have a string to parse. First I have to check whether the string contains a special pattern. I want to know if there are substrings that:
start with "$(",
end with ")",
contain no whitespace between those start and end markers, and
do not include the "$" character inside.
I have a small regex for it in C#:
string input = "$(abc)";
string pattern = @"\$\(([^$][^\s]*)\)";
Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = rgx.Matches(input);
foreach (var match in matches)
{
Console.WriteLine("value = " + match);
}
It works for many cases but fails for the input $(a$(), where the inner expression is empty. I want it NOT to match when the input is $() (that is, when there is nothing between the start and end markers).
What is wrong with my regex?
Note: [^$] matches exactly one character, which must not be $.
Use the regex below if you want to match $():
\$\(([^\s$]*)\)
Use the regex below if you don't want to match $():
\$\(([^\s$]+)\)
* repeats the preceding token zero or more times.
+ repeats the preceding token one or more times.
Your regex \$\(([^$][^\s]*)\) is wrong. It won't allow $ as the first character inside the parentheses, but it does allow it as the second, third, etc. See the demo here. You need to combine the negated classes into one in order to match any character that is not whitespace or $.
Your current regex does not match $() because [^$] requires one character. The only way I can think of for it to match would be an input containing more than one pair of parentheses, like:
$()(something)
In those cases, you will also need to exclude at least the closing parenthesis:
string pattern = @"\$\(([^$\s)]+)\)";
For example, the above matches:
abc in $(abc) and
abc and def in $(def)$()$(abc)(something).
Simply replace the * with a + and merge the character classes.
string pattern = @"\$\(([^$\s]+)\)";
+ means 1 or more
* means 0 or more
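The patterns discussed above can be compared side by side. In this sketch, Captures is a hypothetical helper that joins group 1 of every match with |; the inputs are the question's examples plus the one from the answer about excluding ).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: group 1 of every match, joined with '|'.
static string Captures(string pattern, string input) =>
    string.Join("|", Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Groups[1].Value));

var original = @"\$\(([^$][^\s]*)\)"; // pattern from the question
var plus     = @"\$\(([^\s$]+)\)";    // merged class, '+' instead of '*'
var noParen  = @"\$\(([^$\s)]+)\)";   // additionally excludes ')'

foreach (var input in new[] { "$(abc)", "$(a$()", "$(def)$()$(abc)(something)" })
    Console.WriteLine($"{input}: original=[{Captures(original, input)}] plus=[{Captures(plus, input)}] noParen=[{Captures(noParen, input)}]");

// $(abc): original=[abc] plus=[abc] noParen=[abc]
// $(a$(): original=[a$(] plus=[] noParen=[]
// $(def)$()$(abc)(something): original=[def)$()$(abc)(something] plus=[def|abc)(something] noParen=[def|abc]
```

The original pattern does match $(a$(): [^$] takes the a and [^\s]* then runs over the $( before backtracking to the final ).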

Check whether my string is valid in a custom format and count the brackets

I need a regex to check this format:
[some digits][some digits][some digits][some digits][some digits][some digits]#
"Some digits" means one or more digits: a single digit (0, 1, 2, 3, ...), two digits, three digits, or more.
But it's important that each open bracket is closed before another one opens.
Actually, I want to check the format and also get the number of [] pairs.
I tried this code to get the number of []:
Regex.Matches( input, "[]" ).Count
but it didn't work.
Thanks for helping.
This is the regex you're looking for:
^(\[\d+\])+#$
See the demo.
Sample Code for the Count
var myRegex = new Regex(@"^(\[\d+\])+#$");
// Group 1 repeats; each repetition is stored in its Captures collection
int bracketCount = myRegex.Match(yourString).Groups[1].Captures.Count;
Explanation
The ^ anchor asserts that we are at the beginning of the string
( starts capture Group 1
\[ matches the opening bracket
\d+ matches one or more digits
\] matches the closing bracket
) closes Group 1
+ matches this 1 or more times
# matches the literal hash character
The $ anchor asserts that we are at the end of the string
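A C# sketch of validating and counting in one step, using a hypothetical helper CountBrackets that returns -1 when the format does not match. The last line also illustrates that [ begins a character class in .NET regex, so a literal bracket has to be escaped as \[.

```csharp
using System;
using System.Text.RegularExpressions;

var myRegex = new Regex(@"^(\[\d+\])+#$");

// Hypothetical helper: number of [digits] groups, or -1 if the string is not in the format.
int CountBrackets(string s)
{
    var m = myRegex.Match(s);
    // Group 1 is repeated by '+'; every repetition is recorded in Captures.
    return m.Success ? m.Groups[1].Captures.Count : -1;
}

Console.WriteLine(CountBrackets("[1][22][333]#")); // 3
Console.WriteLine(CountBrackets("[1[2]]#"));       // -1 (a bracket opens before the previous one closes)
Console.WriteLine(CountBrackets("[1][2]"));        // -1 (missing #)

// Counting opening brackets alone needs an escaped '['.
Console.WriteLine(Regex.Matches("[1][22][333]#", @"\[").Count); // 3
```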

Regex problems with equal sign?

In C# I'm trying to validate a string that looks like:
I#paramname='test'
or
O#paramname=2827
Here is my code:
string t1 = "I#parameter='test'";
string r = @"^([Ii]|[Oo])#\w=\w";
var re = new Regex(r);
If I take the "=\w" off the end of the variable r, I get True. If I add "=\w" after the \w, it's False. I want the characters between # and = to be able to be any alphanumeric value. Anything after the = sign can contain alphanumerics and ' (single quotes). What am I doing wrong here? I have rarely used regular expressions and can normally find an example, but this is a custom format, and even with cheat sheets I'm having issues.
^([Ii]|[Oo])#\w+=(?<q>'?)[\w\d]+\k<q>$
Regular expression:
^ start of line
([Ii]|[Oo]) either (I or i) or (O or o)
\w+ 1 or more word characters
= equals sign
(?<q>'?) capture 0 or 1 quotes in named group q
[\w\d]+ 1 or more word or digit characters
\k<q> repeat of what was captured in named group q
$ end of line
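A quick C# check of the named-group pattern above; the first two inputs come from the question, the other two are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

var rx = new Regex(@"^([Ii]|[Oo])#\w+=(?<q>'?)[\w\d]+\k<q>$");

// \k<q> must repeat whatever (?<q>'?) captured: either a quote or nothing.
foreach (var s in new[] { "I#parameter='test'", "O#paramname=2827", "I#p='test", "x#p=1" })
    Console.WriteLine($"{s} -> {rx.IsMatch(s)}");
// I#parameter='test' -> True
// O#paramname=2827 -> True
// I#p='test -> False
// x#p=1 -> False
```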
Use \w+ instead of \w to match one or more characters, or \w* to match zero or more:
Try this: Live demo
^([Ii]|[Oo])#\w+=\'*\w+\'*
If you want to be stricter and require the literal parameter name paramname:
^([Ii]|[Oo])#paramname=[']?[\w]+[']?
Here is a demo
You could try something like this:
Regex rx = new Regex(@"^([IO])#(\w+)=(.*)$", RegexOptions.IgnoreCase);
Match group 1 will give you the value of I or O (the parameter direction?)
Match group 2 will give you the name of the parameter
Match group 3 will give you the value of the parameter
You could be stricter about the 3rd group and match it as
(([^']+)|('(('')|([^']+))*'))
The first alternative matches one or more non-quote characters; the second alternative matches a quoted string literal in which any internal (embedded) quotes are escaped by doubling them, so it would match things like
'' (the empty string)
'foo bar'
'That''s All, Folks!'
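The sketch below assembles the two pieces into one anchored pattern, with (.*) replaced by the stricter third group; the test strings are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// (.*) replaced by the stricter group 3, anchored with $.
var rx = new Regex(@"^([IO])#(\w+)=(([^']+)|('(('')|([^']+))*'))$", RegexOptions.IgnoreCase);

foreach (var s in new[] { "I#paramname='That''s All, Folks!'", "o#paramname=2827", "I#paramname=''", "I#p='open" })
{
    var m = rx.Match(s);
    // Group 1: direction, group 2: parameter name, group 3: value (quotes kept)
    Console.WriteLine(m.Success
        ? $"dir={m.Groups[1].Value} name={m.Groups[2].Value} value={m.Groups[3].Value}"
        : $"no match: {s}");
}
// dir=I name=paramname value='That''s All, Folks!'
// dir=o name=paramname value=2827
// dir=I name=paramname value=''
// no match: I#p='open
```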

Replace any character before <usernameredacted#example.com> with an empty string

I have this string
AnyText: "jonathon" <usernameredacted#example.com>
Desired Output Using Regex
AnyText: <usernameredacted#example.com>
Omit everything in between!
I am still a rookie at regular expressions. Could anyone out there help me with the matching & replacing expression for the above scenario?
Try this:
string input = "jonathon <usernameredacted#example.com>";
string output = Regex.Match(input, @"<[^>]+>").Groups[0].Value;
Console.WriteLine(output); //<usernameredacted#example.com>
You could use the following regex to match all the characters that you want to replace with an empty string:
^[^<]*
The first ^ is an anchor to the beginning of the string. The ^ inside the character class negates it, i.e. any character that isn't a < will match. The * is a greedy quantifier. In summary, this regex swallows all characters from the beginning of the string up to the first <.
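In C#, the replacement described above might look like this (a sketch using the question's sample string). Note that it also removes the leading AnyText: text, since that comes before the first <.

```csharp
using System;
using System.Text.RegularExpressions;

var input = "AnyText: \"jonathon\" <usernameredacted#example.com>";

// ^[^<]* consumes everything from the start of the string up to (not including) the first '<'.
var result = Regex.Replace(input, "^[^<]*", "");
Console.WriteLine(result); // <usernameredacted#example.com>
```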
Here is a way to do it in the VBA flavor: replace "^[^""]*" with "".
^ marks the start of the string.
[^""]* matches any run of characters other than a quote sign (inside a VBA string literal, "" stands for a single ").
UPDATE:
Since your additional comment mentioned that you want to grab the "From:" and the email address, but none of the junk in between or after, I figure extracting is better than replacing. Here is a VBA function written for Excel that returns all the subgroup matches (everything you put in parentheses), concatenated, and nothing else.
Function RegexExtract(ByVal text As String, _
                      ByVal extract_what As String) As String
    Application.ScreenUpdating = False
    Dim i As Long
    Dim result As String
    Dim allMatches As Object
    Dim RE As Object
    Set RE = CreateObject("vbscript.regexp")
    RE.Pattern = extract_what
    RE.Global = True
    Set allMatches = RE.Execute(text)
    ' Guard: with no match, allMatches.Item(0) would raise a run-time error
    If allMatches.Count > 0 Then
        ' Concatenate every submatch (parenthesized group) of the first match
        For i = 0 To allMatches.Item(0).SubMatches.Count - 1
            result = result & allMatches.Item(0).SubMatches.Item(i)
        Next
    End If
    RegexExtract = result
    Application.ScreenUpdating = True
End Function
Using this code, your regex call would be: "^(.+: ).+(<.+>).*"
^ denotes the start of the string
(.+: ) denotes the first match group: .+ is one or more characters, followed by : and a space
.+ denotes one or more characters
(<.+>) denotes the second match group: < is <, then .+ for one or more characters, then the final >
.* denotes zero or more characters.
So in Excel you would use (assuming the cell is A1):
=RegexExtract(A1, "^(.+: ).+(<.+>).*")
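Since the question is about C#, a rough C# counterpart of RegexExtract may be useful for comparison. This is a hypothetical port, not part of the answer: it concatenates the capture groups of the first match and returns an empty string when nothing matches.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical C# counterpart of the VBA RegexExtract function.
static string RegexExtract(string text, string extractWhat)
{
    var m = Regex.Match(text, extractWhat);
    if (!m.Success) return "";
    // Groups[0] is the whole match, so skip it, like VBA's SubMatches.
    return string.Concat(m.Groups.Cast<Group>().Skip(1).Select(g => g.Value));
}

var cell = "AnyText: \"jonathon\" <usernameredacted#example.com>";
Console.WriteLine(RegexExtract(cell, "^(.+: ).+(<.+>).*"));
// AnyText: <usernameredacted#example.com>
```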
