regex to get substring before substring - c#

I have a string like the following:
hi,hello,-LSB-,ASPECT,-RSB-,you
I want to extract the substring that comes immediately before -LSB-,ASPECT, going back to the preceding comma (hello in this case).
I have written a regular expression like
\b\w+[/-/,LSB/-/,ASPECT]
however, it extracts the entire substring from the start of the string up to and including -LSB-,ASPECT, like
hi,hello,-LSB-,ASPECT
Any clue??

The regex for this (using a positive lookahead assertion) would be
[^,]*(?=,-LSB-,ASPECT,)
Explanation:
[^,]* # Match any number of characters except commas
(?= # until the following regex can be matched:
,-LSB-,ASPECT, # the literal text ",-LSB-,ASPECT,".
) # (End of lookahead assertion)
Careful, square brackets create a character class which you don't want in this case.
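
As a rough illustration of the lookahead approach above, here is a minimal C# sketch. The variable names are made up for the example, and it assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input taken from the question.
string input = "hi,hello,-LSB-,ASPECT,-RSB-,you";

// [^,]* grabs one comma-free field; the lookahead requires the marker
// to follow immediately, without making it part of the match.
Match m = Regex.Match(input, @"[^,]*(?=,-LSB-,ASPECT,)");
string before = m.Success ? m.Value : string.Empty;

Console.WriteLine(before); // hello
```

Because the lookahead is zero-width, m.Value holds only the field itself, so no trimming is needed afterwards.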

Try this:
(\w+),-LSB-,ASPECT
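
A possible way to use the capture-group variant from C# is sketched below (names are illustrative, top-level statements assumed). Note that \w+ only covers letters, digits and underscores, so this presumably suits fields without spaces or punctuation.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input taken from the question.
string input = "hi,hello,-LSB-,ASPECT,-RSB-,you";

// The marker is part of the overall match here, so the wanted text
// is read from capture group 1 rather than from the whole match.
Match m = Regex.Match(input, @"(\w+),-LSB-,ASPECT");
string field = m.Success ? m.Groups[1].Value : string.Empty;

Console.WriteLine(field); // hello
```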

Related

Regex for escaping some single quotes

I was creating some regex for matching strings like :
'pulkit'
'989'
basically anything in between the two single quotes.
so I created a regex something like ['][^']*['].
But this is not working for cases like:
'burger king's'. The expected output is burger king's but from my logic
it is burger king only.
As another example, for 'pulkit'sharma' the expected output is pulkit'sharma.
Can anyone help me with this? How do I escape single quotes in this case?
Try a positive lookahead that requires whitespace or the end of the line after the closing single quote:
'.+?'(?=\s|$)
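
A small sketch of how this might be tried in C#. The capturing group around .+? is an addition for the example (it is not in the pattern above) so that the text between the quotes can be read directly; names are illustrative and top-level statements are assumed.

```csharp
using System;
using System.Text.RegularExpressions;

// Inputs taken from the question.
string[] samples = { "'pulkit'", "'989'", "'burger king's'", "'pulkit'sharma'" };
var inner = new string[samples.Length];

for (int i = 0; i < samples.Length; i++)
{
    // A quote only counts as the closing one when whitespace
    // or the end of the input follows it.
    Match m = Regex.Match(samples[i], @"'(.+?)'(?=\s|$)");
    inner[i] = m.Groups[1].Value;
    Console.WriteLine(inner[i]);
}
// Prints: pulkit, 989, burger king's, pulkit'sharma (one per line)
```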
You may match a single quote that is not preceded by a word char and is followed by a word char, then match any text up to the ' that is preceded by a word char and not followed by a word char:
(?s)\B'\b(.*?)\b'\B
Note that you do not have to wrap single quotation marks in square brackets; they are not special regex metacharacters.
C# code:
var matches = Regex.Matches(text, @"(?s)\B'\b(.*?)\b'\B")
    .Cast<Match>()
    .Select(x => x.Groups[1].Value)
    .ToList();
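
For a self-contained version of the snippet above, something like the following could be used. The sample sentence is invented for the example, and top-level statements are assumed.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical text with several quoted items, two containing an inner quote.
string text = "He said 'burger king's' and 'pulkit'sharma' then '989'";

// \B'\b : opening quote with a non-word char (or nothing) before it and a word char after it
// \b'\B : closing quote with a word char before it and a non-word char (or nothing) after it
var found = Regex.Matches(text, @"(?s)\B'\b(.*?)\b'\B")
    .Cast<Match>()
    .Select(x => x.Groups[1].Value)
    .ToList();

Console.WriteLine(string.Join(" | ", found));
// burger king's | pulkit'sharma | 989
```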

Regex to insert and replace characters in a string C#

I have a string which looks like this :-
"$.ConfigSettings.DatabaseSettings.DatabaseConnections.SqlConnectionString.0.Id"
and I want the result to look like this :-
"$.ConfigSettings.DatabaseSettings.DatabaseConnections.SqlConnectionString[0].Id"
Basically, wherever there is a single digit preceded and followed by a period, I need to change it to [digit] followed by a period, i.e. [digit]. I have seen tons of examples where people are only replacing the regex string.
How can I do this using Regex.Replace in C#?
Regex.Replace(input, @"\.(\d)(?=\.)", "[$1]")
\. - match a literal "."
(\d) - then a single digit in a capturing group ($1 in the replacement)
(?= - start a positive lookahead
\. - that matches a "."
) - end the lookahead
So, it means : (match a dot followed by a digit in a capturing group) only if it is followed by a dot
So we matched ".0" and captured "0". We replace the entire match with "[$1]", where $1 refers to the first captured group.
See "Grouping Constructs in Regular Expressions" : https://msdn.microsoft.com/en-us/library/bs2twtah(v=vs.110).aspx for information about the different grouping constructs that I use in this solution.
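
To see the replacement end to end, here is a short sketch using the string from the question. The second input ("a.1.b.2") is made up to show that a digit without a trailing dot is left alone; top-level statements are assumed.

```csharp
using System;
using System.Text.RegularExpressions;

string path = "$.ConfigSettings.DatabaseSettings.DatabaseConnections.SqlConnectionString.0.Id";

// ".0" is consumed and rewritten as "[0]"; the dot after it is only
// looked at by the lookahead, so it stays in the output.
string result = Regex.Replace(path, @"\.(\d)(?=\.)", "[$1]");
Console.WriteLine(result);
// $.ConfigSettings.DatabaseSettings.DatabaseConnections.SqlConnectionString[0].Id

// Hypothetical second input: the final ".2" has no dot after it, so it is untouched.
string partial = Regex.Replace("a.1.b.2", @"\.(\d)(?=\.)", "[$1]");
Console.WriteLine(partial); // a[1].b.2
```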

How to insert spaces between characters using Regex?

Trying to learn a little more about using Regex (Regular expressions). Using Microsoft's version of Regex in C# (VS 2010), how could I take a simple string like:
"Hello"
and change it to
"H e l l o"
This could be a string of any letter or symbol, capitals, lowercase, etc., and there are no other letters or symbols following or leading this word. (The string consists of only the one word).
(I have read the other posts, but I can't seem to grasp Regex. Please be kind :) ).
Thanks for any help with this. (an explanation would be most useful).
You could do this through regex only, no need for inbuilt c# functions.
Use the below regexes and then replace the matched boundaries with space.
(?<=.)(?!$)
string result = Regex.Replace(yourString, @"(?<=.)(?!$)", " ");
Explanation:
(?<=.) Positive lookbehind asserts that the match must be preceded by a character.
(?!$) Negative lookahead which asserts that the match won't be followed by an end of the line anchor. So the boundaries next to all the characters would be matched but not the one which was next to the last character.
OR
You could also use word boundaries.
(?<!^)(\B|\b)(?!$)
string result = Regex.Replace(yourString, @"(?<!^)(\B|\b)(?!$)", " ");
Explanation:
(?<!^) Negative lookbehind which asserts that the match won't be at the start.
(\B|\b) Matches the position between two word characters or between two non-word characters (\B), or the boundary between a word character and a non-word character (\b).
(?!$) Negative lookahead asserts that the match won't be followed by an end of the line anchor.
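
Both zero-width patterns can be compared side by side with a sketch like this one. The second input, "a-b!", is made up to include symbols; top-level statements are assumed.

```csharp
using System;
using System.Text.RegularExpressions;

string word = "Hello";

// Each pattern matches the empty position between two characters,
// so the replacement inserts a space without consuming anything.
string viaLookbehind = Regex.Replace(word, @"(?<=.)(?!$)", " ");
string viaBoundaries = Regex.Replace(word, @"(?<!^)(\B|\b)(?!$)", " ");

Console.WriteLine(viaLookbehind); // H e l l o
Console.WriteLine(viaBoundaries); // H e l l o

// Hypothetical input mixing letters and symbols.
string symbols = Regex.Replace("a-b!", @"(?<!^)(\B|\b)(?!$)", " ");
Console.WriteLine(symbols); // a - b !
```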
Regex.Replace("Hello", "(.)", "$1 ").TrimEnd();
Explanation
The dot character class matches every character of your string "Hello".
The parentheses around the dot character are required so that we can refer to the captured character through the $n notation.
Each captured character is replaced by the replacement string. Our replacement string is "$1 " (notice the space at the end). Here $1 represents the first captured group in the input, therefore our replacement string will replace each character by that character plus one space.
This technique will add one space after the final character "o" as well, so we call TrimEnd() to remove that.
For the enthusiast, the same effect can be achieved through LINQ using this one-liner:
String.Join(" ", YourString.AsEnumerable())
or if you don't want to use the extension method:
String.Join(" ", YourString.ToCharArray())
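
The capture-and-trim approach and the String.Join one-liner can be checked against each other with a sketch along these lines (variable names are illustrative, top-level statements assumed).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string word = "Hello";

// Append a space after every character, then drop the trailing one.
string viaRegex = Regex.Replace(word, "(.)", "$1 ").TrimEnd();

// AsEnumerable() yields the characters, which Join separates with spaces.
string viaJoin = string.Join(" ", word.AsEnumerable());

Console.WriteLine(viaRegex); // H e l l o
Console.WriteLine(viaJoin);  // H e l l o
```

One difference worth keeping in mind: TrimEnd() removes all trailing whitespace, so an input that itself ends in a space would lose it with the regex version, while the Join version would keep it.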
It's very simple. To match any character, use the dot (.) and then replace the match with that character plus one extra space.
Here the parentheses (...) are used for grouping, and the group can be accessed by $index.
Find what : "(.)"
Replace with "$1 "

RegEx to match a number in the second line

I need a regex to match a number in the second line. Similar input is like this:
^C1.1
xC20
SS3
M 4
The decimal pattern (-?\d+(\.\d+)?) matches all the numbers, and the second one could be picked out with a loop in the code-behind, but I need a regular expression that directly returns the number in the second line.
/^[^\r\n]*\r?\n\D*?(-?\d+(\.\d+)?)/
This operates by capturing a single line at the beginning of the input:
^ Beginning of the string
[^\r\n]* Anything that isn't a line terminator
\r?\n A newline, optionally preceded by a carriage return
Then all the non digit characters, then your numbers.
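
In C# this could look roughly like the sketch below. The surrounding slashes are regex-literal delimiters from other languages and are dropped here; the \r\n variant of the input is an added test case, and top-level statements are assumed.

```csharp
using System;
using System.Text.RegularExpressions;

// Input from the question, once with \n and once with \r\n line endings.
string unixInput = "^C1.1\nxC20\nSS3\nM 4";
string windowsInput = "^C1.1\r\nxC20\r\nSS3\r\nM 4";

// Skip the whole first line, then skip non-digits until the first number.
const string pattern = @"^[^\r\n]*\r?\n\D*?(-?\d+(\.\d+)?)";

string fromUnix = Regex.Match(unixInput, pattern).Groups[1].Value;
string fromWindows = Regex.Match(windowsInput, pattern).Groups[1].Value;

Console.WriteLine(fromUnix);    // 20
Console.WriteLine(fromWindows); // 20
```

Since \D also matches line breaks, this would move on to a later line if the second line held no digits at all.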
Since you've now repeatedly changed your needs, try this on for size:
/(?<=\n\D*)-?\d+(\.\d+)?/
I was able to capture it with this regex.
.*\n\D*(\d*).*\n
Check out group 1 of anything that this matches:
^.*?\r\n.*?(\d+)
If that doesn't work, try this:
^.*?\r\n.*?(\d+)
Both are with multiline NOT set...
I would probably use the captured group in /^.*?\r?\n.*?(-?\d+(?:\.\d+)?)/ where…
^ # beginning of string
.*? # anything...
\r?\n # followed by a new line
.*? # anything...
( # followed by...
-? # an optional negative sign (minus)
\d+ # a number
(?: # -this part not captured explicitly-
\.\d+ # a dot and a number
)? # -and is optional-
)
If it is a flavor that supports lookbehind then there are other alternatives.
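
In .NET, the commented layout above can be kept almost as-is by passing RegexOptions.IgnorePatternWhitespace, which ignores unescaped whitespace in the pattern and treats # as the start of a comment. A minimal sketch (the second input is invented; top-level statements assumed):

```csharp
using System;
using System.Text.RegularExpressions;

var secondLineNumber = new Regex(@"
    ^          # beginning of string
    .*?        # anything...
    \r?\n      # followed by a new line
    .*?        # anything...
    (          # followed by...
      -?       # an optional minus sign
      \d+      # a number
      (?:      # this part is not captured on its own
        \.\d+  # a dot and a number
      )?       # and is optional
    )",
    RegexOptions.IgnorePatternWhitespace);

string fromQuestion = secondLineNumber.Match("^C1.1\nxC20\nSS3\nM 4").Groups[1].Value;
string negativeDecimal = secondLineNumber.Match("A1\nB-2.5 C7").Groups[1].Value;

Console.WriteLine(fromQuestion);    // 20
Console.WriteLine(negativeDecimal); // -2.5
```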

Regex: How to not match the last character of a word?

I am trying to create a regex that does not match a word (a-z only) if the word has a : on the end but otherwise matches it. However, this word is in the middle of a larger regex, so I don't think you can use a negative lookbehind and the $ metacharacter.
I tried this negative lookahead instead:
([a-z]+)(?!:)
but this test case
example:
just matches to
exampl
instead of failing.
If you are using a negative lookahead, you could put it at the beginning:
(?![a-z]*:)[a-z]+
i.e: "match at least one a-z char, except if the following chars are 0 to n 'a-z' followed by a ':'"
That would support a larger regex:
X(?![a-z]*:)[a-z]+Y
would match in the following string:
Xeee Xrrr:Y XzzzY XfffZ
only 'XzzzY'
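
The claim above can be checked with a short C# sketch (names are illustrative, top-level statements assumed):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Test string from the answer above.
string text = "Xeee Xrrr:Y XzzzY XfffZ";

// The lookahead runs right after X and rejects any word that
// continues with lowercase letters and then a colon.
var hits = Regex.Matches(text, @"X(?![a-z]*:)[a-z]+Y")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

Console.WriteLine(string.Join(", ", hits)); // XzzzY
```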
Try this:
[a-z]\s
([a-z]+\b)(?!:)
asserts a word boundary at the end of the match and thus will fail "exampl"
[a-z]+(?![:a-z])
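
To compare the original attempt with three of the suggestions in this thread, a sketch like the following may help. FirstMatch is a hypothetical helper written for this example; top-level statements are assumed.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: first match of the pattern, or "" when nothing matches.
string FirstMatch(string pattern, string input) => Regex.Match(input, pattern).Value;

string original = FirstMatch(@"([a-z]+)(?!:)", "example:");         // backtracks to "exampl"
string lookaheadFirst = FirstMatch(@"(?![a-z]*:)[a-z]+", "example:"); // ""
string withBoundary = FirstMatch(@"([a-z]+\b)(?!:)", "example:");     // ""
string noLetterAfter = FirstMatch(@"[a-z]+(?![:a-z])", "example:");   // ""

// Without the trailing colon the word should still match.
string plainWord = FirstMatch(@"[a-z]+(?![:a-z])", "example");        // "example"

Console.WriteLine($"[{original}] [{lookaheadFirst}] [{withBoundary}] [{noLetterAfter}] [{plainWord}]");
// [exampl] [] [] [] [example]
```

The original pattern fails because the engine is free to give back the final letter: after "exampl" the next character is "e", not ":", so the lookahead passes. Each fix removes that escape route in a different way.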
