RegEx expression works on regex101 but not in C# [duplicate]
https://regex101.com/r/sB9wW6/1
(?:(?<=\s)|^)#(\S+) <-- the problem is in the positive lookbehind
This works in production: (?:\s|^)#(\S+), but I need the correct start index (without the leading space).
Here it is in JS:
var regex = new RegExp(/(?:(?<=\s)|^)#(\S+)/g);
Error parsing regular expression: Invalid regular expression:
/(?:(?<=\s)|^)#(\S+)/
What am I doing wrong?
UPDATE
OK, so there is no lookbehind in JS :(
But anyway, I need a regex that gives me the proper start and end index of my match, without the leading space.
Make sure you always select the right regex engine at regex101.com. For example, see an issue caused by using the JS-only [^] construct in a Python regex.
JS regex did not support lookbehinds at the time this question was answered. Lookbehind was introduced in ECMAScript 2018 and is now widely supported. You do not really need it here, though, since you can use capturing groups:
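var re = /(?:\s|^)#(\S+)/g;
var str = 's #vln1\n#vln2\n';
var res = [];
var m; // declare m to avoid an implicit global (a ReferenceError in strict mode)
while ((m = re.exec(str)) !== null) {
  res.push(m[1]); // Group 1: the tag text without "#"
}
console.log(res); // ['vln1', 'vln2']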
The (?:\s|^)#(\S+) pattern matches a whitespace char or the start of the string with (?:\s|^), then matches #, and then matches and captures one or more non-whitespace chars into Group 1 with (\S+).
To get the start/end indices, use
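var re = /(\s|^)#\S+/g;
var str = 's #vln1\n#vln2\n';
var pos = [];
var m; // declare m to avoid an implicit global
while ((m = re.exec(str)) !== null) {
  // Skip the captured leading whitespace (an empty string at the start of the input).
  pos.push([m.index + m[1].length, m.index + m[0].length]);
}
console.log(pos); // [[2, 7], [8, 13]]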
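Since lookbehind is now available in JavaScript (ECMAScript 2018 and later, e.g. current Node.js), the asker's original idea also works directly, and m.index is then already the position of #. A minimal sketch, assuming an ES2018+ engine; the helper name hashtagSpans is made up for illustration:

```javascript
// Hypothetical helper: returns [start, end, tag] for every #tag that is
// preceded by whitespace or sits at the start of the string.
function hashtagSpans(str) {
  // (?<=\s|^) checks the preceding context without consuming it (ES2018+).
  const re = /(?<=\s|^)#(\S+)/g;
  const spans = [];
  for (const m of str.matchAll(re)) {
    // The lookbehind is zero-width, so m.index points at "#", not at the space.
    spans.push([m.index, m.index + m[0].length, m[1]]);
  }
  return spans;
}

console.log(hashtagSpans('s #vln1\n#vln2\n'));
// spans: [2, 7, 'vln1'] and [8, 13, 'vln2']
```

Unlike the capturing-group workaround, no index arithmetic is needed. On engines without lookbehind support the regex literal is a syntax error, so the (\s|^) version remains the portable choice.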
BONUS
My regex works at regex101.com, but not in...
First of all, have you checked the Code Generator link in the Tools pane on the left?
All languages - "Literal string" vs. "string literal" alert - Make sure the regex tester gets the same text your code works with, i.e. the literal string, not the string literal from your source code. A common mistake is to copy/paste a string literal value directly into the test string field, with all its string escape sequences like \n (line feed char), \r (carriage return) and \t (tab char). See Regex_search c++, for example. These must be replaced with their literal counterparts. So, if you have text = "Text\n\n abc" in Python, you must put Text, two line breaks, a space and abc into the regex tester text field. Text.*?abc will never match that string in code, even though you might think it "works" in the tester, because . does not match line break chars by default; see How do I match any character across multiple lines in a regular expression?
All languages - Backslash alert - Make sure you use backslashes correctly in your string literals. In most languages, regular string literals require a double backslash, i.e. \d used at regex101.com must be written as \\d. In raw string literals, use a single backslash, same as at regex101. Escaping the word boundary is especially important, since in many languages (C#, Python, Java, JavaScript, Ruby, etc.) "\b" is a valid string escape sequence that denotes a BACKSPACE char. PHP does not support the \b string escape sequence, so "/\b/" = '/\b/' there.
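A minimal JavaScript sketch of the backspace trap (JavaScript is one of the languages listed above); the word cat and the sample text are arbitrary:

```javascript
// In a regular string literal, "\b" is the BACKSPACE char (U+0008), not a regex token.
const broken = new RegExp("\bcat\b"); // pattern text contains two backspace chars
// A double backslash in the string literal reaches the regex engine as \b.
const fixed = new RegExp("\\bcat\\b");
// A regex literal needs only a single backslash, as at regex101.
const literal = /\bcat\b/;

const text = "a cat here";
console.log(broken.test(text));  // false
console.log(fixed.test(text));   // true
console.log(literal.test(text)); // true
```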
All languages - Default flags - Global and Multiline - Note that the m and g flags are enabled by default at regex101.com. So, if you use ^ and $, they will match at the start and end of lines, respectively. If you need the same behavior in your code, check how multiline mode is implemented and either use a specific flag or, if supported, an inline (?m) modifier. The g flag enables matching multiple occurrences; it is often implemented with specific functions/methods, so check your language reference to find the appropriate one.
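To see why the defaults matter, here is a small JavaScript sketch with a made-up two-line input. Without the m flag, ^ and $ anchor to the whole string, so a pattern that finds two matches at regex101 finds none in code:

```javascript
const input = "a1\nb2";

// No m flag: ^ and $ only match at the start and end of the whole string.
const withoutM = input.match(/^\w\d$/g);
// m flag (regex101's default): ^ and $ also match at line starts and ends.
const withM = input.match(/^\w\d$/gm);

console.log(withoutM); // null
console.log(withM);    // [ 'a1', 'b2' ]
```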
line-breaks - Line endings at regex101.com are LF only, so you can't test strings with CRLF endings; see regex101.com VS myserver - different results. Solutions differ by regex library: use \R (PCRE, Java, Ruby), some kind of \v (Boost, PCRE), \r?\n or (?:\r\n?|\n)/(?>\r\n?|\n) (good for .NET), or [\r\n]+ in other libraries (see answers for C#, PHP). Another issue arises because you test your regex against a multiline string (not a list of standalone strings/lines): your pattern may consume the end-of-line char, \n, with negated character classes; see an issue like that. There, \D matched the end-of-line char, and to avoid it, [^\d\n] or other alternatives could be used.
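Both line-break pitfalls in a short JavaScript sketch. The sample text with mixed CRLF/LF endings is invented, and [^\d\r\n] is the [^\d\n] idea extended to carriage returns:

```javascript
// Mixed CRLF and LF endings, as text read from a file or a server might have.
const text = "10 apples\r\n20 pears\n30 plums";

// \r?\n accepts both CRLF and LF, so no line keeps a stray \r.
const lines = text.split(/\r?\n/);

// \D also matches \r and \n, so the first match swallows the line break...
const nonDigitRun = text.match(/\d+\D+/g);
// ...while excluding the line-break chars keeps each match on its own line.
const perLine = text.match(/\d+[^\d\r\n]+/g);

console.log(lines);       // [ '10 apples', '20 pears', '30 plums' ]
console.log(nonDigitRun); // first item is '10 apples\r\n'
console.log(perLine);     // [ '10 apples', '20 pears', '30 plums' ]
```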
php - If you are dealing with Unicode strings, or want shorthand character classes to match Unicode characters too (e.g. \w+ to match Стрибижев or Stribiżew, or \s+ to match hard spaces), you need to use the u modifier, see preg_match() returns 0 although regex testers work - To match all occurrences, use preg_match_all, not preg_match with /...pattern.../g, see PHP preg_match to find multiple occurrences and "Unknown modifier 'g' in..." when using preg_match in PHP? - Does your regex with an inline backreference like \1 refuse to work? Are you using a double-quoted string literal? Use a single-quoted one, see Backreference does not work in PHP
php, laravel - Note that you need regex delimiters around the pattern, see https://stackoverflow.com/questions/22430529
python - Note that re.search, re.match, re.fullmatch, re.findall and re.finditer accept the regex as the first argument and the input string as the second argument: not re.findall("test 200 300", r"\d+"), but re.findall(r"\d+", "test 200 300"). If you test at regex101.com, please check the "Code Generator" page. - Did you use re.match, which only searches for a match at the start of the string? Use re.search: Regex works fine on Pythex, but not in Python - If the regex contains capturing group(s), re.findall returns a list of captures/capture tuples. Either use non-capturing groups or re.finditer, or remove redundant capturing groups, see re.findall behaves weird - If you used ^ in the pattern to denote the start of a line rather than the start of the whole string, or $ to denote the end of a line rather than of the string, pass the re.M or re.MULTILINE flag to the re method, see Using ^ to match beginning of line in Python regex
- If you try to match text across multiple lines, use re.DOTALL or re.S, or [\s\S]* / [\s\S]*?, and still nothing works, check whether you read the file line by line, say, with for line in file:. You must pass the whole file contents as the input to the regex method, see Getting Everything Between Two Characters Across New Lines. - Having trouble adding flags to a regex and trying something like pattern = r"/abc/gi"? See How to add modifers to regex in python?
c#, .net - .NET regex does not support possessive quantifiers like ++, *+, ?+, {1,10}+, see .NET regex matching digits between optional text with possessive quantifer is not working - When you match against a multiline string using the RegexOptions.Multiline option (or the inline (?m) modifier) with a $ anchor in the pattern to match entire lines, and get no match in code, you need to add \r? before $, see .Net regex matching $ with the end of the string and not of line, even with multiline enabled - To get multiple matches, use Regex.Matches, not Regex.Match, see RegEx Match multiple times in string - A similar case: splitting a string into paragraphs by a double line break sequence, see C# / Regex Pattern works in online testing, but not at runtime - Remove the regex delimiters, i.e. @"/\d+/" must actually look like @"\d+", see Simple and tested online regex containing regex delimiters does not work in C# code - If you unnecessarily used Regex.Escape to escape all characters in a regular expression (like Regex.Escape(@"\d+\.\d+")), you need to remove Regex.Escape, see Regular Expression working in regex tester, but not in c#
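The delimiter pitfall is not specific to .NET. A JavaScript sketch of the same mistake with the RegExp constructor, which, like new Regex(...) in C#, expects the bare pattern; the sample strings are arbitrary:

```javascript
// The constructor takes the bare pattern, so "/" chars become literal characters to match.
const withDelimiters = new RegExp("/\\d+/"); // means: "/", one or more digits, "/"
const bare = new RegExp("\\d+");             // means: one or more digits

console.log(withDelimiters.test("abc 123")); // false: there are no slashes
console.log(withDelimiters.test("a/123/b")); // true, but only by accident
console.log(bare.test("abc 123"));           // true

// Multiple matches need the g flag here, much as C# needs Regex.Matches.
console.log("1 22 333".match(/\d+/g));       // [ '1', '22', '333' ]
```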
dart, flutter - Use a raw string literal, RegExp(r"\d"), or double backslashes (RegExp("\\d")) - https://stackoverflow.com/questions/59085824
javascript - Double escape backslashes in a RegExp("\\d"): Why do regex constructors need to be double escaped?
- (Negative) lookbehinds are not supported in older browsers and JS engines: Regex works on browser but not in Node.js - Strings are immutable, so assign the .replace result to a variable, see The .replace() method does change the string in place - Retrieve all matches with str.match(/pat/g), see Regex101 and Js regex search showing different results, or, with RegExp#exec, RegEx to extract all matches from string using RegExp.exec - Replace all pattern matches in a string: Why does javascript replace only first instance when using replace?
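A short JavaScript sketch of the immutability, first-match-only and all-matches points; the sample strings are arbitrary:

```javascript
const s = "a-b-c";

// replace() never modifies s; its result must be assigned.
s.replace(/-/g, "+");              // result discarded, s is unchanged
const first = s.replace("-", "+"); // a string pattern replaces the first hit only
const all = s.replace(/-/g, "+");  // the g flag replaces every hit

// match() with g returns all full matches; matchAll() also exposes groups.
const words = "x1 y2 z3".match(/[a-z]\d/g);
const digits = [..."x1 y2 z3".matchAll(/[a-z](\d)/g)].map((m) => m[1]);

console.log(s, first, all); // a-b-c a+b-c a+b+c
console.log(words, digits); // [ 'x1', 'y2', 'z3' ] [ '1', '2', '3' ]
```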
javascript, angular - Double the backslashes if you define a regex with a string literal, or just use regex literal notation, see https://stackoverflow.com/questions/56097782
java - Word boundary not working? Make sure you use double backslashes, "\\b", see Regex \b word boundary not works - Getting invalid escape sequence exception? Same thing, double backslashes - Java doesn't work with regex \s, says: invalid escape sequence - No match found is bugging you? Run Matcher.find() / Matcher.matches() - Why does my regex work on RegexPlanet and regex101 but not in my code? - .matches() requires a full string match, use .find(): Java Regex pattern that matches in any online tester but doesn't in Eclipse - Access groups using matcher.group(x): Regex not working in Java while working otherwise - Inside a character class, both [ and ] must be escaped - Using square brackets inside character class in Java regex - You should not run matcher.matches() and matcher.find() consecutively, use only if (matcher.matches()) {...} to check if the pattern matches the whole string and then act accordingly, or use if (matcher.find()) to check if there is a single match or while (matcher.find()) to find multiple matches (or Matcher#results()). See Why does my regex work on RegexPlanet and regex101 but not in my code?
scala - Your regex attempts to match several lines, but you read the file line by line (e.g. use for (line <- fSource.getLines))? Read it into a single variable (see matching new line in Scala regex, when reading from file)
kotlin - You have Regex("/^\\d+$/")? Remove the outer slashes, they are regex delimiter chars that are not part of a pattern. See Find one or more word in string using Regex in Kotlin - You expect a partial string match, but .matchEntire requires a full string match? Use .find, see Regex doesn't match in Kotlin
mongodb - Do not enclose /.../ with single/double quotation marks, see mongodb regex doesn't work
c++ - regex_match requires a full string match, use regex_search to find a partial match - Regex not working as expected with C++ regex_match - regex_search finds the first match only. Use sregex_token_iterator or sregex_iterator to get all matches: see What does std::match_results::size return? - When you read a user-defined string using std::string input; std::cin >> input;, note that cin will only get to the first whitespace, to read the whole line properly, use std::getline(std::cin, input); - C++ Regex to match '+' quantifier - "\d" does not work, you need to use "\\d" or R"(\d)" (a raw string literal) - This regex doesn't work in c++ - Make sure the regex is tested against a literal text, not a string literal, see Regex_search c++
go - Double backslashes or use a raw string literal: Regular expression doesn't work in Go - Go regex does not support lookarounds, select the right option (Go) at regex101.com before testing! Regex expression negated set not working golang
groovy - Return all matches: Regex that works on regex101 does not work in Groovy
r - Double escape backslashes in the string literal: "'\w' is an unrecognized escape" in grep - Use perl=TRUE to switch to the PCRE engine ((g)sub/(g)regexpr): Why is this regex using lookbehinds invalid in R?
oracle - The greediness of all quantifiers is set by the first quantifier in the regex, see Regex101 vs Oracle Regex (so you need to make all the quantifiers as greedy as the first one) - \b does not work? Oracle regex does not support word boundaries at all; use workarounds as shown in Regex matching works on regex tester but not in oracle
firebase - Double escape backslashes, make sure ^ only appears at the start of the pattern and $ is located only at the end (if any), and note you cannot use more than 9 inline backreferences: Firebase Rules Regex Birthday
firebase, google-cloud-firestore - In Firestore security rules, the regular expression needs to be passed as a string, which also means it shouldn't be wrapped in / symbols, i.e. use allow create: if docId.matches("^\\d+$").... See https://stackoverflow.com/questions/63243300
google-data-studio - The pattern in REGEXP_REPLACE must not contain / regex delimiters or flags (like g), i.e. use pattern rather than /pattern/g - see How to use Regex to replace square brackets from date field in Google Data Studio?
google-sheets - If you think REGEXEXTRACT does not return full matches or truncates the results, check whether you have redundant capturing groups in your regex and remove them, or convert them to non-capturing groups by adding ?: after the opening (, see Extract url domain root in Google Sheet
sed - Why does my regular expression work in X but not in Y?
word-boundary, pcre, php - [[:<:]] and [[:>:]] do not work in the regex tester, although they are valid constructs in PCRE, see https://stackoverflow.com/questions/48670105
snowflake-cloud-data-platform, snowflake-sql - If you are writing a stored procedure and \\d does not work, you need to double the backslashes again and use \\\\d, see REGEX conversion of VARCHAR value to DATE in Snowflake stored procedure using RLIKE not consistent.
Related
Use OR in Regex Expression
I have a regex to match the following: somedomain.com/services/something Basically I need to ensure that /services is present. The regex I am using and which is working is: \/services* But I need to match /services OR /servicos. I tried the following: (\/services|\/servicos)* But this shows 24 matches?! https://regex101.com/r/jvB1lr/1 How to create this regex?
The (\/services|\/servicos)* matches 0+ occurrences of /services or /servicos, which means it can match an empty string at any position inside the input string. You can group the alternatives like /(services|servicos) and remove the * quantifier, but in this case it is much better to use a character class [eo], since the strings differ in only 1 char. You want to use the following pattern: /servic[eo]s See the regex demo. To make sure you match a whole subpart, you may append (?:/|$) at the end of the pattern: /servic[eo]s(?:/|$). In C#, you may use Regex.IsMatch with the pattern to check whether there is a match in a string: var isFound = Regex.IsMatch(s, @"/servic[eo]s(?:/|$)"); Note that you do not need to escape / in a .NET regex, as it is not a special regex metacharacter. Pattern details: / - a /; servic[eo]s - services or servicos; (?:/|$) - a / or the end of string.
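A JavaScript sketch of both points from this answer. Unlike in .NET, / must be escaped inside a JavaScript regex literal, because it delimits the literal. The URLs are made up:

```javascript
// (x|y)* can match the empty string, so with g it "matches" at every position.
console.log("ab".match(/(x|y)*/g)); // [ '', '', '' ]

// [eo] covers the one-letter difference; (?:\/|$) requires the segment to end
// at a slash or at the end of the string.
const servicesRe = /\/servic[eo]s(?:\/|$)/;

console.log(servicesRe.test("somedomain.com/services/something")); // true
console.log(servicesRe.test("somedomain.com/servicos"));           // true
console.log(servicesRe.test("somedomain.com/servicesx"));          // false
```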
Well the * quantifier means zero or more, so that is the problem. Remove that and it should work fine: (\/services|\/servicos) Keep in mind that in your example, you have a typo in the URL so it will correctly not match anything as it stands. Here is an example with the typo in the URL fixed, so it shows 1 match as expected.
First off, you tagged C# in this post (really, .NET is the library that provides the regex, not the language), but regex101 in your example is set to PHP. That gives you misleading information, such as the need to escape a forward slash / as \/, which is unnecessary in .NET regular expressions. The regex syntax is largely the same, but different engines behave differently, and PHP is not like .NET regex. Secondly, the star * after the ( ) says that the parenthesized part may match nothing, so you get an empty match at every position. Thirdly, one does not need to spell out both whole words. I would just put the part where the words differ into a set [ ]. That gives you the "or-ness" you need to match either services or servicos, such as (/servic[oe]s). It will tell you whether services are found or not. Nothing else is needed.
C# Regex to match attributes [duplicate]
To review regular expressions, I read this tutorial. That tutorial mentions that \b matches a word boundary (between \w and \W characters). It also links to Expresso (a program that helps when creating regular expressions). So I created my regular expression in Expresso, and I do indeed get a match. But when I copy the same regex to Visual Studio, I do not get a match. Take a look: Why am I not getting a match? In the immediate window I am showing the content of the variable output. In Expresso I do get a match, and in Visual Studio I don't. Why?
The C# language and .NET regular expressions each have their own distinct set of backslash escape sequences, but the C# compiler intercepts the "\b" in your string and converts it into an ASCII backspace character, so the Regex class never sees it. You need to make your string verbatim (prefix it with an at-sign) or double the backslash so that it is passed to Regex, like so: @"\bCOMPILATION UNIT"; or "\\bCOMPILATION UNIT". I'll say the .NET Regex documentation does not make this clear. It took me a while to figure this out at first too. Fun fact: the \r and \n characters (carriage return and line feed, respectively) and some others are recognized by both Regex and the C# language, so the end result is the same, even if the compiled string is different.
You should use @"\bCOMPILATION UNIT". This is a verbatim string literal. When you write "\b" instead, the compiler parses \b into a special character (a backspace). You can also write "\\b", where the double backslash is parsed into a real backslash, but it's generally easier to just use verbatim strings when dealing with regex.
Get what is in string from one quotation mark to other [duplicate]
I have a value like this: "Foo Bar" "Another Value" something else What regex will return the values enclosed in the quotation marks (e.g. Foo Bar and Another Value)?
In general, the following regular expression fragment is what you are looking for: "(.*?)" This uses the non-greedy *? operator to capture everything up to, but not including, the next double quote. Then, you use a language-specific mechanism to extract the matched text. In Python 3, you could do:
>>> import re
>>> string = '"Foo Bar" "Another Value"'
>>> print(re.findall(r'"(.*?)"', string))
['Foo Bar', 'Another Value']
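The same extraction in JavaScript, as a sketch, comparing the lazy "(.*?)" with the negated-class form "([^"]*)" that another answer suggests. One practical difference is that . does not match line breaks, while [^"] does:

```javascript
const value = '"Foo Bar" "Another Value" something else';

// Capture group 1 holds the text between each pair of quotes.
const lazy = [...value.matchAll(/"(.*?)"/g)].map((m) => m[1]);
const negated = [...value.matchAll(/"([^"]*)"/g)].map((m) => m[1]);

console.log(lazy);    // [ 'Foo Bar', 'Another Value' ]
console.log(negated); // [ 'Foo Bar', 'Another Value' ]

// A quoted value spanning a line break: "." stops at \n, [^"] does not.
const multiline = '"a\nb"';
console.log([...multiline.matchAll(/"(.*?)"/g)].length);  // 0
console.log([...multiline.matchAll(/"([^"]*)"/g)].length); // 1
```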
I've been using the following with great success: (["'])(?:(?=(\\?))\2.)*?\1 It supports nested quotes as well. For those who want a deeper explanation of how this works, here's an explanation from user ephemient: (["']) match a quote; ((?=(\\?))\2.) if a backslash exists, gobble it, and whether or not that happens, match a character; *? match many times (non-greedily, so as not to eat the closing quote); \1 match the same quote that was used for opening.
I would go for: "([^"]*)" The [^"] is regex for any character except '"'. The reason I use this instead of the non-greedy many operator is that I have to keep looking that up just to make sure I get it right.
Let's look at two efficient ways to deal with escaped quotes. These patterns are not designed to be concise or aesthetic, but to be efficient.
These ways use first-character discrimination to quickly find quotes in the string without the cost of an alternation. (The idea is to quickly discard characters that are not quotes without testing both branches of the alternation.)
Content between quotes is described with an unrolled loop (instead of a repeated alternation) to be more efficient too: [^"\\]*(?:\\.[^"\\]*)*
Obviously, to deal with strings that don't have balanced quotes, you can use possessive quantifiers instead: [^"\\]*+(?:\\.[^"\\]*)*+ or a workaround to emulate them, to prevent too much backtracking. You can also decide that a quoted part runs from an opening quote until the next (non-escaped) quote or the end of the string. In this case, there is no need to use possessive quantifiers; you only need to make the last quote optional.
Notice: sometimes quotes are not escaped with a backslash but by repeating the quote. In this case, the content subpattern looks like this: [^"]*(?:""[^"]*)*
The patterns avoid using a capture group and a backreference (I mean something like (["']).....\1) and use a simple alternation, but with ["'] factored out at the beginning.
Perl like: ["'](?:(?<=")[^"\\]*(?s:\\.[^"\\]*)*"|(?<=')[^'\\]*(?s:\\.[^'\\]*)*')
(Note that (?s:...) is syntactic sugar to switch on the dotall/singleline mode inside the non-capturing group. If this syntax is not supported, you can easily switch this mode on for the whole pattern or replace the dot with [\s\S].)
(The way this pattern is written is totally "hand-driven" and doesn't take possible engine-internal optimizations into account.)
ECMA script: (?=["'])(?:"[^"\\]*(?:\\[\s\S][^"\\]*)*"|'[^'\\]*(?:\\[\s\S][^'\\]*)*')
POSIX extended: "[^"\\]*(\\(.|\n)[^"\\]*)*"|'[^'\\]*(\\(.|\n)[^'\\]*)*'
or simply: "([^"\\]|\\.|\\\n)*"|'([^'\\]|\\.|\\\n)*'
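The "ECMA script" variant can be tried directly in JavaScript. A minimal sketch; String.raw is used only so the invented sample text keeps its backslashes exactly as typed:

```javascript
// (?=["']) rejects non-quote positions quickly; each branch is an unrolled loop
// where \\[\s\S] consumes one escaped char (including an escaped line break).
const quoted = /(?=["'])(?:"[^"\\]*(?:\\[\s\S][^"\\]*)*"|'[^'\\]*(?:\\[\s\S][^'\\]*)*')/g;

const src = String.raw`say "a \"b\" c" and 'it\'s' ok`;
const found = src.match(quoted);

// Two matches, quotes and escapes included: "a \"b\" c" and 'it\'s'
console.log(found);
```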
Peculiarly, none of these answers produce a regex where the returned match is the text inside the quotes, which is what was asked for. MA-Madden tries, but only gets the inside match as a captured group rather than the whole match. One way to actually do it would be: (?<=(["']\b))(?:(?=(\\?))\2.)*?(?=\1) Examples can be seen in this demo: https://regex101.com/r/Hbj8aP/1 The key here is the positive lookbehind at the start (the ?<=) and the positive lookahead at the end (the ?=). The lookbehind checks the character behind the current position for a quote; if one is found, matching starts from there. The lookahead then checks the character ahead for a quote and, if one is found, stops on that character. The lookbehind group (the ["']) is wrapped in brackets to create a group for whichever quote was found at the start; this is then used in the final lookahead (?=\1) to make sure it only stops when it finds the corresponding quote. The only other complication is that, because the lookahead doesn't actually consume the end quote, it will be found again by the starting lookbehind, which causes text between ending and starting quotes on the same line to be matched. Putting a word boundary after the opening quote (["']\b) helps with this, though ideally I'd like to move past the lookahead, but I don't think that is possible. The bit allowing escaped characters in the middle I've taken directly from Adam's answer.
The regex in the accepted answer returns the values including their surrounding quotation marks: "Foo Bar" and "Another Value" as matches. Here are regexes that return only the values between the quotation marks (as the questioner asked): Double quotes only (use the value of capture group #1): "(.*?[^\\])" Single quotes only (use the value of capture group #1): '(.*?[^\\])' Both (use the value of capture group #2): (["'])(.*?[^\\])\1 - All support escaped and nested quotes. Note that [^\\] requires at least one character before the closing quote, so an empty "" pair is not matched as such.
I liked Eugen Mihailescu's solution for matching the content between quotes while allowing escaped quotes. However, I discovered some problems with escaping and came up with the following regex to fix them: (['"])(?:(?!\1|\\).|\\.)*\1 It does the trick and is still pretty simple and easy to maintain. Demo (with some more test cases; feel free to use it and expand on it). PS: If you just want the content between the quotes in the full match ($0), and are not afraid of the performance penalty, use: (?<=(['"])\b)(?:(?!\1|\\).|\\.)*(?=\1) Unfortunately, without the quotes as anchors, I had to add a word boundary \b, which does not play well with spaces and non-word characters after the opening quote. Alternatively, modify the initial version by simply adding a group and extract the string from $2: (['"])((?:(?!\1|\\).|\\.)*)\1 PPS: If your focus is solely on efficiency, go with Casimir et Hippolyte's solution; it's a good one.
A very late answer, but I'd like to add this: (\"[\w\s]+\") http://regex101.com/r/cB0kB8/1
The pattern (["'])(?:(?=(\\?))\2.)*?\1 above does the job, but I am concerned about its performance (it's not bad, but it could be better). Mine below is about 20% faster in my tests. The pattern "(.*?)" is just incomplete. My advice for everyone reading this is: just DON'T USE IT!!! For instance, it cannot capture many strings (if needed, I can provide an exhaustive test case), like the one below: $string = 'How are you? I\'m fine, thank you'; The rest of them are just as "good" as the one above. If you really care about both performance and precision, then start with the one below: /(['"])((\\\1|.)*?)\1/gm In my tests it covered every string I met, but if you find something that doesn't work, I would gladly update it for you. Check my pattern in an online regex tester.
This version accounts for escaped quotes and controls backtracking: /(["'])((?:(?!\1)[^\\]|(?:\\\\)*\\[^\\])*)\1/
MORE ANSWERS! Here is the solution I used: \"([^\"]*?icon[^\"]*?)\" TL;DR: replace the word icon with what you're looking for inside the quotes and voilà! The way this works is that it looks for the keyword and doesn't care what else is between the quotes. E.g.: id="fb-icon" id="icon-close" id="large-icon-close" The regex looks for a quote mark ", then any possible group of characters that is not " until it finds icon, then any possible group of characters that is not ", and then it looks for a closing ".
I liked Axeman's more expansive version, but had some trouble with it (for example, it didn't match foo "string \\ string" bar or foo "string1" bar "string2" correctly), so I tried to fix it. The pattern below is written with # comments, so it needs free-spacing (verbose/x) mode:
# opening quote
(["'])
(
  # repeat (non-greedy, so we don't span multiple strings)
  (?:
    # anything, except not the opening quote, and not
    # a backslash, which are handled separately.
    (?!\1)[^\\]
    |
    # consume any double backslash (unnecessary?)
    (?:\\\\)*
    |
    # Allow backslash to escape characters
    \\.
  )*?
)
# same character as opening quote
\1
import re
string = "\" foo bar\" \"loloo\""
print(re.findall(r'"(.*?)"', string))
Just try this out, it works like a charm! The \ in the string literal escapes the double quote.
Unlike Adam's answer, here is a simple one that works: (["'])(?:\\\1|.)*?\1 Just add parentheses if you want to get the content inside the quotes, like this: (["'])((?:\\\1|.)*?)\1 Then $1 matches the quote char and $2 matches the content string.
All the answers above are good... except that they DO NOT support all Unicode characters in ECMAScript (JavaScript)! If you are a Node user, you might want this modified version of the accepted answer that supports all Unicode characters: /(?<=((?<=[\s,.:;"']|^)["']))(?:(?=(\\?))\2.)*?(?=\1)/gmu Try it here.
My solution is below: (["']).*\1(?![^\s]) Demo link: https://regex101.com/r/jlhQhV/1 Explanation: (["']) -> matches either ' or " and stores it in backreference \1 once a match is found. .* -> greedily matches everything, zero or more times, to the end of the line; the regex engine then backtracks character by character until the rest of the pattern can match. \1 -> matches the same character that was matched earlier by the first capture group. (?![^\s]) -> a negative lookahead to ensure there is no non-space character right after the match.
echo 'junk "Foo Bar" not empty one "" this "but this" and this neither' | sed 's/[^\"]*\"\([^\"]*\)\"[^\"]*/>\1</g'
This will result in: >Foo Bar<><>but this<
Here each result is shown between > and < for clarity. Since sed has no non-greedy quantifier, this command uses negated character classes instead: it throws away the junk before and after the ""s and replaces it with the part between the ""s, surrounded by ><.
Based on Greg H.'s answer, I was able to create this regex to suit my needs. I needed to match a specific value that was qualified by being inside quotes. It must be a full match; no partial match should trigger a hit, e.g. "test" must not match "test2".
reg = r"""(['"])(%s)\1"""
# re.escape keeps regex metacharacters in needle literal
if re.search(reg % re.escape(needle), haystack, re.IGNORECASE):
    print("winning...")
Hunter
If you're trying to find strings that only have a certain suffix, such as dot syntax, you can try this: \"([^\"]*?[^\"]*?)\".localized Where .localized is the suffix. Example: print("this is something I need to return".localized + "so is this".localized + "but this is not") It will capture "this is something I need to return".localized and "so is this".localized but not "but this is not".
A supplementary answer for Microsoft VBA coders only. This one uses the library Microsoft VBScript Regular Expressions 5.5 and gives the following code:
Sub TestRegularExpression()
    Dim oRE As VBScript_RegExp_55.RegExp '* Tools->References: Microsoft VBScript Regular Expressions 5.5
    Set oRE = New VBScript_RegExp_55.RegExp
    oRE.Pattern = """([^""]*)"""
    oRE.Global = True
    Dim sTest As String
    sTest = """Foo Bar"" ""Another Value"" something else"
    Debug.Assert oRE.test(sTest)
    Dim oMatchCol As VBScript_RegExp_55.MatchCollection
    Set oMatchCol = oRE.Execute(sTest)
    Debug.Assert oMatchCol.Count = 2
    Dim oMatch As Match
    For Each oMatch In oMatchCol
        Debug.Print oMatch.SubMatches(0)
    Next oMatch
End Sub
Regex for allowing semi colon
I have a regex for validating a string, but it doesn't accept semicolons. Is it because I have to use some escape sequences? I tested my regex here and it passes, i.e. allows the semicolon, but it doesn't in my C# app. EDIT: I have the following regex ^[A-Za-z0-9]{1}[A-Za-z.&amp;0-9\s\\-]{0,21}$ and tried validating sar232 trading inc;
The &amp; entity hints that you have this regular expression inside some XML attribute, and that this &amp; gets parsed as a single & symbol before the pattern is sent to the regex engine. That means your pattern lacks the semicolon inside the second character class, and that is why your regex does not match the string you provided. (A regex tester that receives the raw &amp; sees a literal ; in the class, which is why the test passed there.) The solution is simple: add the semicolon to the 2nd character class: someattr="^[A-Za-z0-9][;A-Za-z.&amp;0-9\s\\-]{0,21}$" See the regex demo. Please also note that the {1} limiting quantifier is redundant, since [A-Za-z0-9] already matches only 1 symbol from the indicated ranges.
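A JavaScript sketch of this point, using the pattern as the regex engine receives it after XML parsing has turned &amp; into a plain &:

```javascript
// What the engine sees once the XML parser has decoded &amp; to & (no ; left).
const original = /^[A-Za-z0-9][A-Za-z.&0-9\s\\-]{0,21}$/;
// The fix: list ; explicitly in the second character class.
const fixed = /^[A-Za-z0-9][;A-Za-z.&0-9\s\\-]{0,21}$/;

const sample = "sar232 trading inc;";
console.log(original.test(sample)); // false
console.log(fixed.test(sample));    // true
```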
How to use regex to match anything from A to B, where B is not preceded by C
I'm having a hard time with this one. First off, here is the difficult part of the string I'm matching against: "a \"b\" c" What I want to extract from this is the following: a \"b\" c Of course, this is just a substring of a larger string, but everything else works as expected. The problem is making the regex ignore the quotes that are escaped with a backslash. I've looked into various ways of doing it, but nothing has given me the correct results. My most recent attempt looks like this: "((\"|[^"])+?)" In various online testers, this works the way it should, but when I build my ASP.NET page, it cuts off at the first ", leaving me with just the letter a, a space and a backslash. The logic behind the pattern above is to capture all instances of \" or anything that is not ". I was hoping this would search for \", making sure to find those first, but I get the feeling that this is overridden by the second part of the expression, which is only 1 single character. A single backslash does not match 2 characters (\"), but it will match as a non-". And from there, the next character will be a single ", and the matching is completed. (This is just my hypothesis on why my pattern is failing.) Any pointers on this one? I have tried various combinations of lookarounds in regex, but I didn't really get anywhere. I also get the feeling that is what I need.
ORIGINAL ANSWER
To match a string like a \"b\" c, you need to use the following regex declaration: (?:\\"|[^"])+
var rx = new Regex(@"(?:\\""|[^""])+");
See the RegexStorm demo. Here is an IDEONE demo:
var str = "a \\\"b\\\" c";
Console.WriteLine(str);
var rx = new Regex(@"(?:\\""|[^""])+");
Console.WriteLine(rx.Match(str).Value);
Please note the @ in front of the string literal: it makes it a verbatim string literal, where we have to double quotes to match literal quotes and can use single backslashes instead of double ones. This makes regexes easier to read and maintain. If you want to match any escaped entities in your input string, you can use:
var rx = new Regex(@"[^""\\]*(?:\\.[^""\\]*)*");
See the demo on RegexStorm.
UPDATE
To match the quoted strings, just add quotes around the pattern:
var rx = new Regex(@"""(?<res>[^""\\]*(?:\\.[^""\\]*)*)""");
This pattern yields much better performance than Tim Long's suggested regex; see the RegexHero test results.
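JavaScript supports named groups too (ES2018+), so the UPDATE pattern can be tried there as a sketch; in a JS regex literal the quotes need no doubling. The input line is made up:

```javascript
// "res" captures the text between the quotes; \\. consumes any escaped char,
// so an escaped quote (\") does not end the match.
const quotedRe = /"(?<res>[^"\\]*(?:\\.[^"\\]*)*)"/;

const line = String.raw`key = "a \"b\" c"; rest`;
const m = line.match(quotedRe);

console.log(m.groups.res); // a \"b\" c
```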
The following expression worked for me: "(?<Result>(\\"|.)*)" The expression matches as follows: an opening quote (literal "); a named capture (?<name>pattern) consisting of zero or more occurrences * of a literal \" or (|) any single character (.); a final closing quote (literal "). Note that the * (zero or more) quantifier is greedy: the group consumes as much as it can and then backtracks until the literal " can match, so the closing quote is the last quote the engine can reach. That works when the input holds a single quoted string; if it may contain several, use the lazy *? instead. I used ReSharper 9's built-in Regular Expression validator to develop the expression and verify the results. I used the "Explicit Capture" option to reduce cruft in the output (RegexOptions.ExplicitCapture). One thing to note is that I am matching the whole string, but I am only capturing the substring, using a named capture. Using named captures is a really useful way to get at the results you want. In code, it might look something like this:
static string MatchQuotedString(string input)
{
    const string pattern = @"""(?<Result>(\\""|.)*)""";
    const RegexOptions options = RegexOptions.ExplicitCapture;
    Regex regex = new Regex(pattern, options);
    var match = regex.Match(input);
    var substring = match.Groups["Result"].Value;
    return substring;
}
Optimization: If you are planning on using the regex a lot, you could factor it out into a field and use the RegexOptions.Compiled option; this pre-compiles the expression and gives you faster throughput at the expense of longer initialization.
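A JavaScript sketch of the greedy versus lazy behavior of this pattern (JavaScript accepts the same (?<Result>...) named-group syntax); the input with two quoted strings is invented:

```javascript
const input = 'x "a" y "b" z';

// Greedy: (\\"|.)* runs ahead, then backtracks to the LAST quote it can close on.
const greedy = input.match(/"(?<Result>(\\"|.)*)"/).groups.Result;
// Lazy: (\\"|.)*? stops at the first quote that can close the string.
const lazy = input.match(/"(?<Result>(\\"|.)*?)"/).groups.Result;

console.log(greedy); // a" y "b
console.log(lazy);   // a

// The lazy form still skips escaped quotes, because \\" is tried before ".".
const escaped = String.raw`"a \"b\" c"`.match(/"(?<Result>(\\"|.)*?)"/).groups.Result;
console.log(escaped); // a \"b\" c
```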