I want to match the text between word1 and the first occurrence of word2. What's the best way to do that, considering that the text may include newline characters? Is there a pattern like this: (word1)(not word2)*(word2)?
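You could use a lazy quantifier to match as few characters as possible between word1 and word2:
(word1).*?(word2)
See the quantifiers topic on MSDN. Because the text may contain newlines, combine this with RegexOptions.Singleline so that . also matches \n.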
You can match across newlines using the RegexOptions.Singleline option:
// greedy '(.*)' runs to the last word2, lazy '(.*?)' stops at the first one, e.g. for "word1 aaa word2 bbb word2"
string pattern = "word1(.*)word2";
var m = Regex.Match(text1, pattern, RegexOptions.Singleline);
Console.WriteLine(m.Groups[1].Value); // the text between the words
MSDN on RegexOptions.Singleline:
... causes the regular expression engine to treat the input string as if
it consists of a single line. It does this by changing the behavior of
the period (.) language element so that it matches every character,
instead of matching every character except for the newline character
\n or \u000A.
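A minimal C# sketch combining the lazy quantifier with RegexOptions.Singleline, so the match can cross line breaks but still stops at the first word2. The class and method names (BetweenWords, Between) are hypothetical, and Regex.Escape is used on the assumption that the delimiters should be matched as literal text.

```csharp
#nullable enable
using System;
using System.Text.RegularExpressions;

public static class BetweenWords
{
    // Returns the text between the first occurrence of `start` and the first
    // occurrence of `end` after it, or null when there is no such pair.
    public static string? Between(string text, string start, string end)
    {
        // Regex.Escape treats the delimiters as literal text;
        // the lazy (.*?) stops at the first `end`.
        string pattern = Regex.Escape(start) + "(.*?)" + Regex.Escape(end);

        // Singleline lets '.' match '\n' too, so the captured text may span lines.
        Match m = Regex.Match(text, pattern, RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

For "word1 aaa\nbbb word2 ccc word2" this returns " aaa\nbbb "; with a greedy (.*) it would run on to the last word2 instead.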
I have a regex that detects URLs:
@"((http|ftp|https)\:\/\/)?([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?";
I am using it with Regex.Replace to remove URLs from text.
I do not want it to replace any word that starts with /images.
For example, if the text is "this is my text here is a link http://dfdf.com and my is /images/dd.gif",
I need http://dfdf.com replaced, but not /images/dd.gif.
My regex replaces the dd.gif part,
so I want to exclude anything that follows images/.
Any idea how I can fix this?
You can start matching at a word boundary and fail the match if it is immediately preceded by images/ as a whole word:
\b(?<!\bimages/)(?:(?:http|ftp)s?://)?([\w-]+(?:\.[\w-]+)+)([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?
See the regex demo. Details:
\b - a word boundary
(?<!\bimages/) - no images/ as a whole word is allowed immediately on the left
(?:(?:http|ftp)s?://)? - an optional sequence of either http or ftp, followed by an optional s and then the :// substring
([\w-]+(?:\.[\w-]+)+) - Group 1: one or more word or hyphen chars, followed by one or more sequences of a . and one or more word or hyphen chars
([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])? - an optional Group 2: zero or more word chars or chars from the .,@?^=%&:/~+#- set, then a word char or a char from the @?^=%&/~+#- set.
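A small C# sketch of the lookbehind approach, calling Regex.Replace with an empty replacement on the example text from the question. The class and method names (UrlStripper, RemoveUrls) are hypothetical; the pattern is the one from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlStripper
{
    // \b starts the match at a word boundary; (?<!\bimages/) rejects any start
    // position that directly follows the whole word "images/".
    private static readonly Regex UrlRegex = new Regex(
        @"\b(?<!\bimages/)(?:(?:http|ftp)s?://)?([\w-]+(?:\.[\w-]+)+)([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?");

    // Removes every URL-like token that is not preceded by "images/".
    public static string RemoveUrls(string text) => UrlRegex.Replace(text, string.Empty);
}
```

Only http://dfdf.com is removed: at the d of dd.gif the lookbehind sees images/ and rejects the match, and no later start position inside dd.gif can form a dotted name. The surrounding spaces are left as they were, so a double space remains where the URL used to be.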
As an alternative solution, you could match what you don't want to remove and capture what you do want to remove.
You can pass a callback (a MatchEvaluator) to Regex.Replace and check whether group 1 participated in the match. If it did, return an empty string; if it did not, return the match itself to leave it unchanged.
\S*/images\S*|(?<!\S)((?:(?:https?|ftp)://)?[\w-]+(?:(?:\.[\w-]+)+)(?:[\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?)
Explanation
\S*/images\S* Match /images preceded and followed by optional non-whitespace chars that you want to keep
| Or
(?<!\S) Assert a whitespace boundary to the left
((?:(?:https?|ftp)://)?[\w-]+(?:(?:\.[\w-]+)+)(?:[\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?) The pattern that you tried, with some minor changes to make it a bit shorter
Regex demo (Click on the Table tab to see the matches)
For example
using System;
using System.Text.RegularExpressions;
var s = @"this is my text here is a link http://dfdf.com and my is /images/dd.gif";
var regex = new Regex(@"\S*/images\S*|(?<!\S)((?:(?:https?|ftp)://)?[\w-]+(?:(?:\.[\w-]+)+)(?:[\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?)");
// Group 1 only succeeds for the URL branch; /images tokens are returned unchanged
var result = regex.Replace(s, match => match.Groups[1].Success ? "" : match.Value);
Console.WriteLine(result);
See a C# demo
My pattern ^\s*flood\s{55}\s+\w+ only works sometimes (it matches only the third example below).
I am new to regular expressions and I am trying to write one that matches all of the following rows:
Example 1: flood a)
Example 2: flood As respects
Example 3: flood USD100,000
(it's in a tabular format and there's a lot of space between flood and the next word)
Your expression is saying:
^\s* The start of the string may have zero or more whitespace characters
flood followed by the string flood
\s{55} followed by exactly 55 whitespace characters
\s+\w+ followed by one or more whitespace characters and then one or more word characters.
If you want a minimum number of whitespace characters, say at least 30, followed by one or more word characters, you could do this:
^\s*flood\s{30,}\w+
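To illustrate the breakdown above, a short C# sketch comparing the original pattern with the {30,} version on a row padded with 40 spaces. The padding width is an assumption (the real column width isn't given), and the class name QuantifierCheck is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierCheck
{
    // Original pattern: exactly 55 whitespace chars plus at least one more,
    // i.e. it needs 56 or more whitespace characters after "flood".
    public static readonly Regex Original = new Regex(@"^\s*flood\s{55}\s+\w+");

    // Relaxed pattern: 30 or more whitespace characters after "flood".
    public static readonly Regex AtLeast30 = new Regex(@"^\s*flood\s{30,}\w+");
}
```

Because \s{55}\s+ effectively means "at least 56 whitespace characters", any row with less padding is rejected by the original pattern.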
Try this:
string input =
@" flood a)
flood As respects
flood USD100,000";
string pattern = @"^\s*flood\s+.+$";
MatchCollection matches = Regex.Matches(input, pattern, RegexOptions.Multiline);
foreach (Match m in matches)
    Console.WriteLine(m.Value);
If the number of spaces between flood and the next word varies, you could omit \s{55}, which is a quantifier that matches exactly 55 whitespace characters.
That would leave you with ^\s*flood\s+\w+, which does not yet match the complete values at the end, because \w matches a word character but not a space or either of ) and ,.
To match your values you might use a character class and add the characters that you allow to match:
^\s*flood\s+[\w,) ]+
Or if you want to match any character you could use a dot instead of a character class.
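A minimal C# sketch of the character-class version applied line by line with RegexOptions.Multiline. The names FloodRows and Find are hypothetical, and the test rows use 60 spaces of padding as a stand-in for the real column width.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class FloodRows
{
    // Multiline makes ^ match at the start of every line; after "flood" and its
    // padding, the class accepts word chars, commas, closing parentheses and spaces.
    private static readonly Regex Row =
        new Regex(@"^\s*flood\s+[\w,) ]+", RegexOptions.Multiline);

    // Returns each matched row, trimmed of any spaces the class picked up at the edges.
    public static string[] Find(string input) =>
        Row.Matches(input).Cast<Match>().Select(m => m.Value.Trim()).ToArray();
}
```

The class does not include \n, so each match ends at its own line break.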
According to your comment, you might use a positive lookbehind:
(?<=\(13\. Deductible\))\s*(\s*flood\s+[\w,) ]+)+
Demo
Short version:
How do I match a single, specific character or nothing within a longer, potentially repeating, pattern?
Long version:
I'm forming a regex to count the occurrences of the string 'word' in strings which have the specific format of:
a hyphen followed by an integer number (any length) followed by a hyphen followed by the string 'word' followed by a hyphen, potentially repeating.
E.g.
'-0-word-' (1 match)
'-10-word-' (1 match)
'-999-word-' (1 match)
'-1-word-1-word-' (2 matches)
'-1-word-1-word-222-word-' (3 matches) etc.
If the pattern repeats then I think the leading hyphen has to be optional as it is already the trailing hyphen for the previous match.
The best I have come up with so far is:
[-]?\d+-word-
which gives 3 matches for
'-1-word-1-word-222-word-'
but it also gives 3 matches for
'-1-word-1-word-X222-word-'
because the leading hyphen is optional and the 'X' is ignored. I want the leading hyphen to be only a hyphen or nothing. I want to make sure the whole string is rejected (no matches) if the format is not correct.
Thanks for your help!
^-\d+-word(?:-\d+-word)*-$
Try this. See the demo:
https://regex101.com/r/wU7sQ0/20
If you want to count the number of occurrences and check the string format at the same time, you can do this:
String input = "-1-word-1-word-222-word-";
String pattern = @"\A(-[0-9]+-word)+-\z";
Match m = Regex.Match(input, pattern);
if (m.Success) {
Console.WriteLine(m.Groups[1].Captures.Count);
}
When a capture group is repeated, every capture is stored, and you can access them through the group's Captures property.
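The same idea wrapped in a small C# helper that returns 0 when the format is wrong, checked against the examples from the question. WordCounter and Count are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordCounter
{
    // \A and \z anchor the whole input; each "-<digits>-word" repetition
    // is recorded as one capture of group 1.
    private static readonly Regex Format = new Regex(@"\A(-[0-9]+-word)+-\z");

    // Returns the number of "word" segments, or 0 if the string is malformed.
    public static int Count(string input)
    {
        Match m = Format.Match(input);
        return m.Success ? m.Groups[1].Captures.Count : 0;
    }
}
```

\z is used rather than $ because $ would also accept a trailing \n after the final hyphen.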
Please try this regex demo for the word occurrences.
The pattern is "-\d+-word(-\d+-word)*-"
http://regexr.com/3agkb
I want to search for a string like "$12,56,450" using Regex in C#, but it doesn't match.
Here is my code:
string input="Total earn for the year $12,56,450";
string pattern = @"\b(?mi)($12,56,450)\b";
Regex regex = new Regex(pattern);
if (regex.Match(input).Success)
{
return true;
}
This regex will do the job: (?mi)(\$\d{2},\d{2},\d{3}) (see the Regex 101 demo).
Now let's break it down a little:
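\$ matches a literal $ (unescaped, $ is an anchor that matches only at the end of the input, or of a line with the m option, which is why the original pattern never matches)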
\d{2} matches any two digits
, matches the literal ,
\d{2} matches any two digits
, matches the literal ,
\d{3} matches any three digits
For the purposes of the demonstration I removed the word boundaries, \b, and you don't need them here. In fact, the leading \b prevents the match: $ is not a word character, and neither is the space before it in "year $12,56,450", so there is no word boundary in front of $. Consider the definition of the positions where \b matches:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
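A short C# sketch that checks the reasoning above against the input from the question: the unescaped $ never matches, the escaped $ still fails with a leading \b, and both the boundary-free literal and the \d pattern succeed. AmountSearch and its method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AmountSearch
{
    // The question's pattern: the unescaped $ is an end-of-line anchor here.
    public static bool QuestionPattern(string input) =>
        Regex.IsMatch(input, @"\b(?mi)($12,56,450)\b");

    // Escaped $, but the leading \b still fails: the space and $ are both non-word chars.
    public static bool WithLeadingBoundary(string input) =>
        Regex.IsMatch(input, @"\b\$12,56,450\b");

    // No leading \b; the trailing \b holds after the last digit at the end of the input.
    public static bool WithoutLeadingBoundary(string input) =>
        Regex.IsMatch(input, @"\$12,56,450\b");

    // The general digit pattern from the answer.
    public static bool DigitPattern(string input) =>
        Regex.IsMatch(input, @"\$\d{2},\d{2},\d{3}");
}
```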
You need to escape $ and some other special regex characters.
Try this (without the leading \b, because there is no word boundary between the space and $): @"(?mi)(\$12,56,450)\b";
If you want, you can use \d to match a digit, and \d{2,3} to match two or three digits.
OK, this one is driving me nuts....
I have a string that is formed thus:
var newContent = string.Format("({0})\n{1}", stripped_content, reply);
newContent will display like:
(old text)
new text
I need a regular expression that strips away the text between parentheses with the parenthesis included AND the newline character.
The best I can come up with is:
const string regex = @"^(\(.*\)\s)?(?<capture>.*)";
var match= Regex.Match(original_content, regex);
var stripped_content = match.Groups["capture"].Value;
This works, but I specifically want to match the newline (\n), not any whitespace (\s).
Replacing \s with \n, \\n or \\\n does NOT work.
Please help me hold on to my sanity!
EDIT: an example:
public string Reply(string old,string neww)
{
const string regex = @"^(\(.*\)\s)?(?<capture>.*)";
var match= Regex.Match(old, regex);
var stripped_content = match.Groups["capture"].Value;
var result= string.Format("({0})\n{1}", stripped_content, neww);
return result;
}
Reply("(messageOne)\nmessageTwo","messageThree") returns:
(messageTwo)
messageThree
If you specify RegexOptions.Multiline then you can use ^ and $ to match the start and end of a line, respectively.
If you don't wish to use this option, remember that a line break may be any one of \n, \r or \r\n, so instead of looking only for \n you could use something like [\n\r]+, or more exactly (\r\n|\n|\r). List \r\n first: alternation tries the branches from left to right, so with \r first only the \r of a \r\n pair would be matched.
Actually, it works, but with the opposite option, i.e. RegexOptions.Singleline.
You are probably going to have a \r before your \n. Try replacing the \s with (\r\n).
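A sketch of the question's Reply method that replaces \s with \r?\n, a variant of the suggestions above that accepts both \n and \r\n line endings. ReplyFormatter and newText are illustrative names; the format string is the one from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplyFormatter
{
    // Optional "(old text)" line ending in \n or \r\n, then the next line in "capture".
    private static readonly Regex Quoted = new Regex(@"^(\(.*\)\r?\n)?(?<capture>.*)");

    public static string Reply(string old, string newText)
    {
        var stripped = Quoted.Match(old).Groups["capture"].Value;
        return string.Format("({0})\n{1}", stripped, newText);
    }
}
```

Without RegexOptions.Singleline, . does not cross \n, so \(.*\) stays on the first line and \r?\n consumes exactly one line break.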
Think I may be a bit late to the party, but still hope this helps.
I needed to get multiple tokens between two hash signs.
Example input:
## token1 ##
## token2 ##
## token3_a
token3_b
token3_c ##
This seemed to work in my case:
var matches = Regex.Matches (mytext, "##(.*?)##", RegexOptions.Singleline);
Of course, you may want to replace the double hash signs at both ends with your own delimiters.
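The same call wrapped in a small C# helper that collects the group 1 text of every match, run on the example input above. HashTokens and Extract are hypothetical names, and trimming the tokens is an assumption about what the caller wants.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class HashTokens
{
    // Lazy (.*?) stops at the next "##"; Singleline lets a token span line breaks.
    private static readonly Regex Token = new Regex("##(.*?)##", RegexOptions.Singleline);

    // Returns the text of every ##...## token, trimmed of surrounding whitespace.
    public static string[] Extract(string text) =>
        Token.Matches(text).Cast<Match>().Select(m => m.Groups[1].Value.Trim()).ToArray();
}
```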
HTH.
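Counter-intuitive as it is, you can use both the Multiline and Singleline options.
Regex.Match(input, @"(.+)^(.*)", RegexOptions.Multiline | RegexOptions.Singleline)
The first capturing group will contain everything up to and including the last line break (the \r and \n as well), and the second group will contain the last line. For a two-line input like the one in the question, that is the first and the second line.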
Why:
First of all, RegexOptions is a flags enum, so its values can be combined with the bitwise OR operator (|). Then:
Multiline:
^ and $ match the beginning and end of each line (instead of the beginning and end of the input string).
Singleline:
The period (.) matches every character (instead of every character except \n)
See the RegexOptions documentation.
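A minimal C# sketch of the combined-options trick. TwoLines, Split and the tuple element names are hypothetical; the pattern and options are the ones from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TwoLines
{
    // Singleline lets the greedy (.+) run to the end of the input; it then backtracks
    // until ^ (Multiline) can match at a line start, so group 2 gets the last line.
    public static (string Head, string LastLine) Split(string input)
    {
        Match m = Regex.Match(input, @"(.+)^(.*)",
            RegexOptions.Multiline | RegexOptions.Singleline);
        return (m.Groups[1].Value, m.Groups[2].Value);
    }
}
```

With more than two lines, group 1 takes everything except the last line, because the greedy (.+) gives back characters only as far as the last line start.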