I was writing a regex to match strings like:
'pulkit'
'989'
Basically, anything in between the two single quotes.
So I created a regex like ['][^']*['].
But this does not work for cases like:
'burger king's'. The expected output is burger king's, but my regex
returns burger king only.
As another example, for 'pulkit'sharma' the expected output is pulkit'sharma.
Can anyone help me with this? How do I escape single quotes in this case?
Try a positive lookahead to match a space or end of line for matching the closing single quote
'.+?'(?=\s|$)
You may match single quote that is not preceded with a word char and is followed with a word char, and match any text up to the ' that is preceded with a word char and not followed with a word char:
(?s)\B'\b(.*?)\b'\B
Note you do not have to wrap single quotation marks with square brackets, they are not special regex metacharacters.
C# code:
var matches = Regex.Matches(text, @"(?s)\B'\b(.*?)\b'\B")
.Cast<Match>()
.Select(x => x.Groups[1].Value)
.ToList();
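To see how the word-boundary pattern behaves on the inputs from the question, here is a minimal sketch. The class name QuotedText and the helper Extract are made up for illustration; only the pattern comes from the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedText
{
    // Opening quote: not preceded by a word char, followed by one.
    // Closing quote: preceded by a word char, not followed by one.
    private static readonly Regex Quoted = new Regex(@"(?s)\B'\b(.*?)\b'\B");

    // Hypothetical helper: returns the text between each pair of outer quotes.
    public static List<string> Extract(string text)
    {
        return Quoted.Matches(text)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }

    public static void Main()
    {
        // Sample inputs taken from the question, joined with spaces.
        foreach (var s in Extract("'pulkit' '989' 'burger king's' 'pulkit'sharma'"))
            Console.WriteLine(s);
    }
}
```

One thing to keep in mind: because of the \b checks, the quoted text has to start and end with a word character, so something like '(a)' would not be picked up by this pattern.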
I have created a Regex Pattern (?<=[TCC|TCC_BHPB]\s\d{3,4})[-_\s]\d{1,2}[,]
This Pattern match just:
TCC 6005_5,
What should I change to the end to match these both strings:
TCC 6005-5 ,
TCC 6005_5,
You can add a non-greedy wildcard to your expression (.*?):
(?<=(?:TCC|TCC_BHPB)\s\d{3,4})[-_\s]\d{1,2}.*?[,]
                                           ^^^
This will now also match any characters between the last digit and the comma.
As has been pointed out in the comments, [TCC|TCC_BHPB] is a character class rather than a literal match, so I've changed this to (?:TCC|TCC_BHPB) which is presumably what your intention was.
This part of the pattern [TCC|TCC_BHPB] is a character class that matches one of the listed characters. It might also be written for example as [|_TCBHP]
To "match" both strings, you can match all parts instead of using a positive lookbehind.
\bTCC(?:_BHPB)?\s\d{3,4}[-_\s]\d{1,2}\s?,
\bTCC A word boundary to prevent a partial match, then match TCC
(?:_BHPB)?\s\d{3,4} Optionally match _BHPB, match a whitespace char and 3-4 digits (Use [0-9] to match a digit 0-9)
[-_\s]\d{1,2} Match one of - _ or a whitespace char, then 1-2 digits
\s?, Match an optional space and ,
Note that \s can also match a newline.
Using the lookbehind:
(?<=TCC(?:_BHPB)?\s\d{3,4})[-_\s]\d{1,2}\s?,
Or, if the whitespace must not include a newline, match only a space separator or a tab:
\bTCC(?:_BHPB)?[\p{Zs}\t][0-9]{3,4}[-_\p{Zs}\t][0-9]{1,2}[\p{Zs}\t]*,
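A quick way to compare the full-match and lookbehind variants is to run both against the two sample strings. This is only a sketch; the class TccMatcher and its constant names are invented here, and the patterns are the ones given above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TccMatcher
{
    // Full-match variant: the TCC prefix is part of the match.
    public const string FullPattern = @"\bTCC(?:_BHPB)?\s\d{3,4}[-_\s]\d{1,2}\s?,";

    // Lookbehind variant: only the separator, the digits, the optional
    // space and the comma end up in the match.
    public const string TailPattern = @"(?<=TCC(?:_BHPB)?\s\d{3,4})[-_\s]\d{1,2}\s?,";

    public static void Main()
    {
        foreach (var input in new[] { "TCC 6005-5 ,", "TCC 6005_5," })
        {
            Console.WriteLine("[" + Regex.Match(input, FullPattern).Value + "]");
            Console.WriteLine("[" + Regex.Match(input, TailPattern).Value + "]");
        }
    }
}
```

The lookbehind variant relies on .NET allowing variable-length lookbehind, so it may not port to other regex engines as written.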
I'm trying to write a regular expression to transform words written like "H e l l o Everyone" to "Hello Everyone".
If it is words separated by spaces like "Hello everyone, how are you?", nothing should happen.
Basically, runs of single characters should be squeezed together to make a word, but only when more than 2 characters follow this pattern.
If it is like "a b cdef" - Nothing should happen
But "a b c def" -> "abc def"
I tried something like this "^\w(?:(\s)\w)*$" but it is matching with "Hello world" as well.
And also, I'm not sure on how to squeeze these single characters.
Any help is greatly appreciated.
Thanks!
I suggest matching chunks of single word chars separated by single whitespaces and then removing the spaces inside a match evaluator.
The regex is
(?<!\S)\w(?:\s\w){2,}(?!\S)
See its demo at RegexStorm. The (?<!\S) and (?!\S) make sure these chunks are enclosed with whitespaces (or are at string start/end).
Details:
(?<!\S) - a negative lookbehind making sure there is a whitespace or start of string immediately before the current location
\w - a word char (letter/digit/underscore, to match a letter, use \p{L} instead)
(?:\s\w){2,} - 2 or more sequences of:
\s - a whitespace
\w - a word char
(?!\S) - a negative lookahead making sure there is a whitespace or end of string immediately after the current location
See the C# demo:
var res = Regex.Replace(s, @"(?<!\S)\w(?:\s\w){2,}(?!\S)", m =>
new string(m.Value
.Where(c => !Char.IsWhiteSpace(c))
.ToArray()));
If you're looking for a pure regex solution,
Regex.Replace(s, @"(?<=^\w|(\s\w)+)\s(?=(\w\s)+|\w$)", string.Empty);
replaces every space that has at least one space-and-letter pair on each side with nothing (with a little extra to handle the start/end of the string).
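For anyone who wants to check the match-evaluator approach against the examples from the question, here is a small self-contained sketch. LetterSqueezer and Squeeze are names I made up; the regex is the (?<!\S)\w(?:\s\w){2,}(?!\S) pattern from the first answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LetterSqueezer
{
    // Three or more single word chars separated by single whitespace chars,
    // delimited by whitespace or the string boundaries.
    private static readonly Regex SpacedLetters =
        new Regex(@"(?<!\S)\w(?:\s\w){2,}(?!\S)");

    // Hypothetical helper: strips the whitespace inside each matched chunk.
    public static string Squeeze(string s)
    {
        return SpacedLetters.Replace(s, m =>
            new string(m.Value.Where(c => !char.IsWhiteSpace(c)).ToArray()));
    }

    public static void Main()
    {
        Console.WriteLine(Squeeze("H e l l o Everyone"));
        Console.WriteLine(Squeeze("a b c def"));
        Console.WriteLine(Squeeze("a b cdef"));
        Console.WriteLine(Squeeze("Hello everyone, how are you?"));
    }
}
```

The {2,} quantifier is what enforces the "more than 2 characters" rule: "a b cdef" only offers two isolated letters, so it is left alone.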
Trying to learn a little more about using Regex (Regular expressions). Using Microsoft's version of Regex in C# (VS 2010), how could I take a simple string like:
"Hello"
and change it to
"H e l l o"
This could be a string of any letter or symbol, capitals, lowercase, etc., and there are no other letters or symbols following or leading this word. (The string consists of only the one word).
(I have read the other posts, but I can't seem to grasp Regex. Please be kind :) ).
Thanks for any help with this. (an explanation would be most useful).
You could do this through regex only, no need for inbuilt c# functions.
Use the regex below and replace the matched boundaries with a space.
(?<=.)(?!$)
string result = Regex.Replace(yourString, @"(?<=.)(?!$)", " ");
Explanation:
(?<=.) Positive lookbehind asserts that the match must be preceded by a character.
(?!$) Negative lookahead which asserts that the match won't be followed by an end of the line anchor. So the boundaries next to all the characters would be matched but not the one which was next to the last character.
OR
You could also use word boundaries.
(?<!^)(\B|\b)(?!$)
string result = Regex.Replace(yourString, @"(?<!^)(\B|\b)(?!$)", " ");
Explanation:
(?<!^) Negative lookbehind which asserts that the match won't be at the start.
(\B|\b) Matches the boundary which exists between two word characters or between two non-word characters (\B), or the boundary which exists between a word character and a non-word character (\b).
(?!$) Negative lookahead asserts that the match won't be followed by an end of the line anchor.
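Both zero-width patterns can be tried side by side with a sketch like the following. LetterSpacer and its two method names are hypothetical; the second pattern is assumed to use \b (with the backslash) inside the group, as in the explanation.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LetterSpacer
{
    // Inserts a space at every position that has a character before it
    // and is not at the end of the string.
    public static string ViaLookarounds(string s)
    {
        return Regex.Replace(s, @"(?<=.)(?!$)", " ");
    }

    // Same idea: any boundary or non-boundary position except the start and the end.
    public static string ViaBoundaries(string s)
    {
        return Regex.Replace(s, @"(?<!^)(\B|\b)(?!$)", " ");
    }

    public static void Main()
    {
        Console.WriteLine(ViaLookarounds("Hello"));
        Console.WriteLine(ViaBoundaries("Hello"));
    }
}
```

Since every match is empty, nothing from the input is consumed; the replacement just drops a space into each gap between characters.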
Regex.Replace("Hello", "(.)", "$1 ").TrimEnd();
Explanation
The dot matches any single character (except a newline), so it matches every character of your string "Hello".
The parentheses around the dot are required so that we can refer to the captured character through the $n notation.
Each captured character is replaced by the replacement string. Our replacement string is "$1 " (notice the space at the end). Here $1 represents the first captured group in the input, therefore our replacement string will replace each character by that character plus one space.
This technique will add one space after the final character "o" as well, so we call TrimEnd() to remove that.
For the enthusiast, the same effect can be achieved through LINQ using this one-liner:
String.Join(" ", YourString.AsEnumerable())
or if you don't want to use the extension method:
String.Join(" ", YourString.ToCharArray())
It's very simple. To match any character, use the dot (.) and then replace it with that character plus one extra space.
Here the parentheses (...) are used for grouping, and the group can be accessed by $index.
Find what : "(.)"
Replace with "$1 "
I have a string like following,
hi,hello,-LSB-,ASPECT,-RSB-,you
I want to extract the substring that comes before -LSB-,ASPECT, back to the previous comma: hello in this case.
I have written regular expression like
\b\w+[/-/,LSB/-/,ASPECT]
however it extracts the entire substring from the start up to and including -LSB-,ASPECT, like
hi,hello,-LSB-,ASPECT
Any clue??
The regex for this (using a positive lookahead assertion) would be
[^,]*(?=,-LSB-,ASPECT,)
Explanation:
[^,]* # Match any number of characters except commas
(?= # until the following regex can be matched:
,-LSB-,ASPECT, # the literal text ",-LSB-,ASPECT,".
) # (End of lookahead assertion)
Careful, square brackets create a character class which you don't want in this case.
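Here is a minimal sketch of the lookahead version applied to the string from the question. AspectPrefix and FieldBeforeMarker are names invented for this example; returning null when the marker is missing is just one possible choice.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AspectPrefix
{
    // Hypothetical helper: returns the comma-free field that sits directly
    // before ",-LSB-,ASPECT,", or null when the marker is not present.
    public static string FieldBeforeMarker(string input)
    {
        Match m = Regex.Match(input, @"[^,]*(?=,-LSB-,ASPECT,)");
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(FieldBeforeMarker("hi,hello,-LSB-,ASPECT,-RSB-,you"));
    }
}
```

Because [^,]* cannot cross a comma, the match can only be the single field right before the marker, never "hi,hello".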
Try this:
(\w+),-LSB-,ASPECT
I have been using Regex to match strings embedded in square brackets [*] as:
new Regex(@"\[(?<name>\S+)\]", RegexOptions.IgnoreCase);
I also need to match some codes that look like:
[TESTTABLE: A, B, C, D]
it has got spaces, comma, colon
Can you please guide me on how I can modify the above Regex to include such codes?
P.S. other codes have no spaces/special characters but are always enclosed in [...].
Regex myregex = new Regex(@"\[([^\]]*)]");
will match all characters that are not closing brackets and that are enclosed between brackets. Capture group 1 will contain the content between the brackets.
Explanation (courtesy of RegexBuddy):
Match the character “[” literally «\[»
Match the regular expression below and capture its match into backreference number 1 «([^\]]*)»
    Match any character that is NOT a ] character «[^\]]*»
        Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character “]” literally «]»
This will also work if you have more than one pair of matching brackets in the string you're looking at. It will not work if brackets can be nested, e. g. [Blah [Blah] Blah].
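A short sketch of how this pattern could be used on a string containing both kinds of codes. BracketCodes and Extract are hypothetical names, and the sample input is made up from the codes mentioned in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class BracketCodes
{
    // Everything between "[" and the next "]" goes into group 1.
    private static readonly Regex Code = new Regex(@"\[([^\]]*)]");

    // Hypothetical helper: returns the content of every [...] pair.
    public static List<string> Extract(string text)
    {
        return Code.Matches(text)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }

    public static void Main()
    {
        foreach (var code in Extract("[CODE] and [TESTTABLE: A, B, C, D]"))
            Console.WriteLine(code);
    }
}
```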
/\[([^\]:]*)(?::([^\]]*))?\]/
Capture group 1 will contain the entire tag if it doesn't have a colon, or the part before the colon if it does.
Capture group 2 will contain the part after the colon. You can then split on ',' and trim each entry to get the individual parts.
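The split-and-trim step could look roughly like this in C#. TagParser, Name and Parts are names I made up, and the sketch assumes the quantifier sits inside the first group, i.e. ([^\]:]*), so that group 1 holds the whole name rather than only its last character.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TagParser
{
    // Group 1: text before the colon (or the whole tag if there is none).
    // Group 2: text after the colon, if any.
    private static readonly Regex Tag =
        new Regex(@"\[([^\]:]*)(?::([^\]]*))?\]");

    public static string Name(string tag)
    {
        Match m = Tag.Match(tag);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Splits the part after the colon on ',' and trims each entry.
    public static string[] Parts(string tag)
    {
        Match m = Tag.Match(tag);
        if (!m.Success || !m.Groups[2].Success)
            return new string[0];
        return m.Groups[2].Value.Split(',').Select(p => p.Trim()).ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(Name("[TESTTABLE: A, B, C, D]"));
        Console.WriteLine(string.Join("|", Parts("[TESTTABLE: A, B, C, D]")));
    }
}
```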