I am trying to match text between two delimiters, [% %], and I want to get everything whether the string contains new lines or not.
Code
string strEmailContent = sr.ReadToEnd();
string commentPatt = @"\[%((\r\n?|\n).*(\r\n?|\n))%\]";
Regex commentRgx = new Regex(commentPatt, RegexOptions.Singleline);
Sample Inputs
//Successful
[%
New Comment
%] other content from input
//Match: [%\r\nNew Comment\r\n%]
//Fail
[% New Comment %]
//Match: false
//Successfully match single line with
string commentPatt = @"\[%(.*)%\]";
//Match: [% New Comment %]
I do not know how to combine these two patterns to match both cases. Can anyone provide any assistance?
To get the text between two delimiters, use lazy matching with .*?. To also match newline characters, you need the (?s) singleline modifier (the inline form of RegexOptions.Singleline) so that the dot matches newlines as well:
(?s)\[%(.*?)%]
Note that (?s)\[%(.*?)%] will match even if the % is inside [%...%].
Note that the ] does not have to be escaped, since it is in an unambiguous position and can only be interpreted as a literal ].
In C#, you can use
var rx = new Regex(@"(?s)\[%(.*?)%]");
var res = rx.Matches(str).Cast<Match>().Select(p => p.Groups[1].Value).ToList();
Try this pattern:
\[%([^%]*)%\]
It captures all characters between "[%" and "%]" that are not a "%" character, so it will not match if the enclosed text itself contains a "%".
Tested @ Regex101
If you want to "see" the "\r\n" in your results, you'll have to escape them with a String.Replace().
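As a rough comparison of the two suggestions above (the lazy .*? with Singleline, and the negated class [^%]*), here is a minimal sketch. The class name, the method names, and the sample input with a stray % inside the second comment are made up for illustration; only the patterns come from the answers.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class DelimiterDemo
{
    // Lazy dot with Singleline: stops at the first "%]" and tolerates '%' inside.
    public static string[] Lazy(string input) =>
        Regex.Matches(input, @"\[%(.*?)%]", RegexOptions.Singleline)
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToArray();

    // Negated class: spans newlines without Singleline, but rejects any '%' in the body.
    public static string[] Negated(string input) =>
        Regex.Matches(input, @"\[%([^%]*)%\]")
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToArray();

    static void Main()
    {
        // Hypothetical input: one multi-line comment, one single-line comment containing '%'.
        string input = "[%\r\nNew Comment\r\n%] other [% 50% done %]";
        Console.WriteLine(Lazy(input).Length);    // 2
        Console.WriteLine(Negated(input).Length); // 1
    }
}
```

On this input the lazy pattern finds both comments, while the negated class skips the second one because of the % inside it.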
My situation is not about removing empty spaces, but keeping them. I have this string >[database values] which I would like to find. I created this RegEx to find it then go in and remove the >, [, ]. The code below takes a string that is from a document. The first pattern looks for anything that is surrounded by >[some stuff] it then goes in and "removes" >, [, ]
string decoded = "document in string format";
string pattern = @">\[[A-z, /, \s]*\]";
string pattern2 = @"[>, \[, \]]";
Regex rgx = new Regex(pattern);
Regex rgx2 = new Regex(pattern2);
foreach (Match match in rgx.Matches(decoded))
{
    string replacedValue = rgx2.Replace(match.Value, "");
    Console.WriteLine(match.Value);
    Console.WriteLine(replacedValue);
}
What I am getting in my first Console.WriteLine is correct, so I would be getting things like >[123 sesame St]. But my second output shows that my replace removes not just those characters but also the spaces, so I get something like 123sesameSt. I don't see any space being replaced in my Regex. Am I forgetting something? Is it perhaps implicit in a replace?
Inside a character class every character is a literal member, so [A-z, /, \s] and [>, \[, \]] in your patterns also match commas and spaces. Just list the characters without delimiting them, like this: [A-Za-z/\s]
string pattern = @">\[[A-Za-z/\s]*\]";
string pattern2 = @"[>\[\]]";
Edit to include Casimir's tip.
After rereading your question (if I understand it correctly), I realize that your two-step approach is unnecessary. You only need one replacement using a capture group:
string pattern = @">\[([^]]*)]";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(yourtext, "$1");
pattern details:
>\[ # literals: >[
( # open the capture group 1
[^]]* # all that is not a ]
) # close the capture group 1
] # literal ]
the replacement string refers to the capture group 1 with $1
By defining [>, \[, \]] in pattern2 you define a character class whose members are the single characters >, the comma, the space, [ and ]: every character you list between the square brackets is a member. But I guess you don't want to match the space and the comma, so leave them out:
string pattern2 = @"[>\[\]]";
Alternatively, you could use
string pattern2 = @"(>\[|\])";
Thereby, you either match >[ or ] which better expresses your intention.
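To make the difference concrete, the sketch below runs the three variants discussed in the answers above against one sample value. The BracketDemo class and its method names are hypothetical; only the patterns come from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

static class BracketDemo
{
    // Original class: the comma and the space are members, so spaces get removed too.
    public static string StripLoose(string s) => Regex.Replace(s, @"[>, \[, \]]", "");

    // Tightened class: only '>', '[' and ']' are removed.
    public static string StripTight(string s) => Regex.Replace(s, @"[>\[\]]", "");

    // Single pass: match ">[...]" and keep only the captured inner text.
    public static string Unwrap(string s) => Regex.Replace(s, @">\[([^]]*)]", "$1");

    static void Main()
    {
        Console.WriteLine(StripLoose(">[123 sesame St]"));     // 123sesameSt
        Console.WriteLine(StripTight(">[123 sesame St]"));     // 123 sesame St
        Console.WriteLine(Unwrap("Addr: >[123 sesame St]."));  // Addr: 123 sesame St.
    }
}
```

The single-pass version only touches brackets that follow a >, whereas the character-class versions strip every >, [ and ] in whatever string they are given.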
I've got an input string that looks like this:
level=<device[195].level>&name=<device[195].name>
I want to create a RegEx that will parse out each of the <device> tags, for example, I'd expect two items to be matched from my input string: <device[195].level> and <device[195].name>.
So far I've had some luck with this pattern and code, but it always finds both of the device tags as a single match:
var pattern = "<device\\[[0-9]*\\]\\.\\S*>";
Regex rgx = new Regex(pattern);
var matches = rgx.Matches(httpData);
The result is that matches will contain a single result with the value <device[195].level>&name=<device[195].name>
I'm guessing there must be a way to 'terminate' the pattern, but I'm not sure what it is.
Use non-greedy quantifiers:
<device\[\d+\]\.\S+?>
Also, use verbatim strings for escaping regexes, it makes them much more readable:
var pattern = @"<device\[\d+\]\.\S+?>";
As a side note, I guess in your case using \w instead of \S would be more in line with what you intended, but I left the \S because I can't know that.
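A small side-by-side sketch of the greedy pattern from the question and the lazy one suggested here may help. The TagDemo class and its method names are placeholders; the input is the string from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class TagDemo
{
    // Greedy \S* runs to the last '>' in the input, producing one long match.
    public static string[] Greedy(string input) =>
        Regex.Matches(input, @"<device\[[0-9]*\]\.\S*>")
             .Cast<Match>().Select(m => m.Value).ToArray();

    // Lazy \S+? stops at the first '>' it can, producing one match per tag.
    public static string[] Lazy(string input) =>
        Regex.Matches(input, @"<device\[\d+\]\.\S+?>")
             .Cast<Match>().Select(m => m.Value).ToArray();

    static void Main()
    {
        string input = "level=<device[195].level>&name=<device[195].name>";
        Console.WriteLine(Greedy(input).Length); // 1
        foreach (string tag in Lazy(input))
            Console.WriteLine(tag);              // <device[195].level> then <device[195].name>
    }
}
```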
It depends on how much of the structure of the angle blocks you need to match, but you can do
"\\<device.+?\\>"
I want to create a RegEx that will parse out each of the <device> tags
I'd expect two items to be matched from my input string:
1. <device[195].level>
2. <device[195].name>
This should work. Get the matched group from index 1
(<device[^>]*>)
String literals for use in programs:
@"(<device[^>]*>)"
Change your repetition operator and use \w instead of \S
var pattern = @"<device\[[0-9]+\]\.\w+>";
String s = @"level=<device[195].level>&name=<device[195].name>";
foreach (Match m in Regex.Matches(s, @"<device\[[0-9]+\]\.\w+>"))
    Console.WriteLine(m.Value);
Output
<device[195].level>
<device[195].name>
Use named match groups and create a LINQ projection. There will be two matches, thus separating the individual items:
string data = "level=<device[195].level>&name=<device[195].name>";
string pattern = @"
(?<variable>[^=]+) # get the variable name
(?:=<device\[) # static '=<device'
(?<index>[^\]]+) # device number index
(?:]\.) # static ].
(?<sub>[^>]+) # Get the sub command
(?:>&?) # Match but don't capture the > and possible &
";
// IgnorePatternWhitespace lets the pattern be documented inline; it does not affect matching.
var items = Regex.Matches(data, pattern, RegexOptions.IgnorePatternWhitespace)
                 .OfType<Match>()
                 .Select(mt => new
                 {
                     Variable = mt.Groups["variable"].Value,
                     Index = mt.Groups["index"].Value,
                     Sub = mt.Groups["sub"].Value
                 })
                 .ToList();
items.ForEach(itm => Console.WriteLine("{0}:{1}:{2}", itm.Variable, itm.Index, itm.Sub));
/* Output
level:195:level
name:195:name
*/
I would like a regex pattern to match all lines in a text file that has the follow pattern:
TcQuery {dynamic_content} Alias "{dynamic_content}" New
If the text file has these two lines:
//tcquery c_query alias "qrybklog" new <= This one shouldn't be found because there are two slashes before TcQuery.
tcquery c_query alias "qrybklog" new <= I want a pattern to match this line
I've tried this, but both lines are matched:
var prw = System.IO.File.ReadAllText(@"d:\backlog.prw", Encoding.ASCII);
prw = "//TcQuery c_query Alias teste1 new";
prw += "\nTcQuery c_query Alias teste2 new";
prw += "\nTcQuery c_query Alias teste3 new";
prw += "\n//TcQuery c_query Alias teste4 new";
var regexTcQuery = new Regex("TcQuery+[ *]+[0-9a-zA-Z_]+[ *]+alias+[ *]+[0-9a-zA-Z_\"]+[ *]new$", RegexOptions.IgnoreCase);
var resultTcQuery = regexTcQuery.Matches(prw);
Use singleline and ignorecase option with this regex
(?<!\/\/)\s*?\btcquery\b(.*?)\balias\b.*?new
(?<!\/\/) is a negative lookbehind: it rejects a match that is immediately preceded by //
.*? lazily matches zero or more characters. If you used .* instead, a single match would stretch from the first tcquery to the last new, swallowing any tcquery statements in between
\b is a word boundary. This allows us to match separate words
\s*? lazily matches zero or more whitespace characters (space, \n, \r, \t, \f and so on)
If you want to get the c_query text you can do this
List<string> lst = Regex.Matches(input, @"(?<!\/\/)\s*?\btcquery\b(.*?)\balias\b.*?new", RegexOptions.Singleline | RegexOptions.IgnoreCase).Cast<Match>().Select(x => x.Groups[1].Value).ToList();
If you want to match the full line (without any sub-groups), you could use:
^tcquery.*$
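The ^ anchors the match at the beginning of the string, or at the beginning of each line when RegexOptions.Multiline is set, which is what you need if the whole file is read into one string. Since tcquery is the text immediately following, lines that start with // will be ignored.
If there is any whitespace before tcquery, you can match this with \s*: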
^\s*tcquery.*$
If you have lines that can begin with tcquery but don't necessarily follow the format you specified, you can update the pattern with:
^\s*tcquery.*alias.*new.*?$
To match the "dynamic" content portions of the line(s) into groups, you should be able to use:
^\s*tcquery\s+(.*)\s+alias\s+"(.*)"\s+new.*?$
Also, worth noting, you should have the ignore-case regex option enabled for each of my above-examples.
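As a minimal sketch of how the last pattern above could be applied to a multi-line string, assuming "\n" line endings and both RegexOptions.Multiline and RegexOptions.IgnoreCase: the TcQueryDemo class, the Aliases method and the sample text are invented for this example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class TcQueryDemo
{
    // Returns the quoted alias (group 2) of every line that starts with tcquery.
    // Multiline makes ^ and $ work per line; IgnoreCase accepts TcQuery, tcquery, etc.
    public static string[] Aliases(string text) =>
        Regex.Matches(text,
                      @"^\s*tcquery\s+(.*)\s+alias\s+""(.*)""\s+new.*?$",
                      RegexOptions.Multiline | RegexOptions.IgnoreCase)
             .Cast<Match>()
             .Select(m => m.Groups[2].Value)
             .ToArray();

    static void Main()
    {
        // Hypothetical sample: two commented-out lines and two active ones.
        string text =
            "//TcQuery c_query Alias \"teste1\" New\n" +
            "TcQuery c_query Alias \"teste2\" New\n" +
            "  tcquery other alias \"teste3\" new\n" +
            "//TcQuery c_query Alias \"teste4\" New";

        foreach (string alias in Aliases(text))
            Console.WriteLine(alias); // teste2, then teste3
    }
}
```

Without RegexOptions.Multiline the same pattern finds nothing here, because ^ would only match at the very start of the text, where the first line begins with //.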
([^\/\/])tcquery (.*) alias \"(.*)\" new$/i
if your language does not support /i for caseless matching, then switch the regex to:
([^\/\/])TcQuery (.*) Alias \"(.*)\" New$
try this:
^tcquery [^ ]+ alias "[^"]+" new
it's an amateur one but the main idea is that the line must start with tcquery.
Given a string, I want to retrieve a string that is in between the quotation marks, and that is fully capitalized.
For example, if a string of
oqr"awr"q q"ASRQ" asd "qIKQWIR"
has been entered, the regex would only evaluate "ASRQ" as matching string.
What is the best way to approach this?
Edit: Forgot to mention the string takes a numeric input as well I.E: "IO8917AS" is a valid input
EDIT: If you actually want "one or more characters, and none of the characters is a lower-case letter" then you probably want:
Regex regex = new Regex("\"\\P{Ll}+\"");
That will then allow digits as well... and punctuation. If you want to allow digits and upper case letters but nothing else, you can use:
Regex regex = new Regex("\"[\\p{Lu}\\d]+\"");
Or in verbatim string literal form (makes the quotes more confusing, but the backslashes less so):
Regex regex = new Regex(@"""[\p{Lu}\d]+""");
Original answer (before digits were required)
Sounds like you just want (within the pattern)
"[A-Z]*"
So something like:
Regex regex = new Regex("\"[A-Z]*\"");
Or for full Unicode support, use the Lu Unicode character category:
Regex regex = new Regex("\"\\p{Lu}*\"");
EDIT: As noted, if you don't want to match an empty string in quotes (which is still "a string where everything is upper case") then use + instead of *, e.g.
Regex regex = new Regex("\"\\p{Lu}+\"");
Short but complete example of finding and displaying the first match:
using System;
using System.Text.RegularExpressions;
class Program
{
    public static void Main()
    {
        Regex regex = new Regex("\"\\p{Lu}+\"");
        string text = "oqr\"awr\"q q\"ASRQ\" asd \"qIKQWIR\"";
        Match match = regex.Match(text);
        Console.WriteLine(match.Success); // True
        Console.WriteLine(match.Value); // "ASRQ"
    }
}
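For the variant that also allows digits, a sketch along the following lines would collect every match instead of only the first. The QuotedUpperDemo class and FindAll method are made-up names, and the sample input is the question's string with the "IO8917AS" case from the edit appended.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class QuotedUpperDemo
{
    // Captures the text inside quotes when it consists only of
    // upper-case letters (\p{Lu}) and digits (\d).
    public static string[] FindAll(string text) =>
        Regex.Matches(text, @"""([\p{Lu}\d]+)""")
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToArray();

    static void Main()
    {
        string text = "oqr\"awr\"q q\"ASRQ\" asd \"qIKQWIR\" \"IO8917AS\"";
        foreach (string value in FindAll(text))
            Console.WriteLine(value); // ASRQ, then IO8917AS
    }
}
```

Note that the pattern does not track which quote opens and which one closes, so upper-case text sitting between a closing quote and the next opening quote would also be reported.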
Like this:
"\"[A-Z]+\""
The outermost quotes are not part of the regex, they delimit a C# string.
This requires at least one uppercase character between quotes and works for the English language.
Please try the following:
[\w]*"([A-Z0-9]+)"
I need to match all the whole words containing a given string.
string s = "ABC.MYTESTING
XYZ.YOUTESTED
ANY.TESTING";
Regex r = new Regex("(?<TM>[!\..]*TEST.*)", ...);
MatchCollection mc = r.Matches(s);
I need the result to be:
MYTESTING
YOUTESTED
TESTING
But I get:
TESTING
TESTED
.TESTING
How do I achieve this with regular expressions?
Edit: Extended sample string.
If you were looking for all words including 'TEST', you should use
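@"(?<TM>\w*TEST\w*)"
\w matches word characters. For ASCII input it is equivalent to [A-Za-z0-9_]; in .NET it also matches other Unicode letters and digits unless RegexOptions.ECMAScript is used.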
Keep it simple: why not just try \w*TEST\w* as the match pattern.
I get the results you are expecting with the following:
string s = @"ABC.MYTESTING
XYZ.YOUTESTED
ANY.TESTING";
var m = Regex.Matches(s, @"(\w*TEST\w*)", RegexOptions.IgnoreCase);
Try using \b. It's the regex anchor for a word boundary. If you wanted to match both words you could use:
/\b[a-z]+\b/i
BTW, .NET doesn't need the surrounding /, and the i is just a case-insensitive match flag.
.NET Alternative:
var re = new Regex(@"\b[a-z]+\b", RegexOptions.IgnoreCase);
Using Groups I think you can achieve it.
string s = @"ABC.TESTING
XYZ.TESTED";
Regex r = new Regex(@"(?<TM>[!\..]*(?<test>TEST.*))", RegexOptions.Multiline);
var mc = r.Matches(s);
foreach (Match match in mc)
{
    Console.WriteLine(match.Groups["test"]);
}
Works exactly like you want.
BTW, your regular expression pattern should be a verbatim string (@"")
Regex r = new Regex(@"(?<TM>[^.]*TEST.*)", RegexOptions.IgnoreCase);
First, as @manojlds said, you should use verbatim strings for regexes whenever possible. Otherwise you'll have to use two backslashes in most of your regex escape sequences, not just one (e.g. [!\\..]*).
Second, if you want to match anything but a dot, that part of the regex should be [^.]*. ^ is the metacharacter that inverts the character class, not !, and . has no special meaning in that context, so it doesn't need to be escaped. But you should probably use \w* instead, or even [A-Z]*, depending on what exactly you mean by "word". [!\..] matches ! or ..
Regex r = new Regex(@"(?<TM>[A-Z]*TEST[A-Z]*)", RegexOptions.IgnoreCase);
That way you don't need to bother with word boundaries, though they don't hurt:
Regex r = new Regex(@"(?<TM>\b[A-Z]*TEST[A-Z]*\b)", RegexOptions.IgnoreCase);
Finally, if you're always taking the whole match anyway, you don't need to use a capturing group:
Regex r = new Regex(@"\b[A-Z]*TEST[A-Z]*\b", RegexOptions.IgnoreCase);
The matched text will be available via Match's Value property.
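Put together, the final pattern could be exercised against the question's extended sample roughly as follows. This is only a sketch: the WordDemo class and WordsContaining method are hypothetical names, "\n" line endings are assumed, and Regex.Escape is used so that the search fragment is treated literally.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class WordDemo
{
    // Returns every run of letters that contains the given fragment,
    // ignoring case and reading the result from Match.Value.
    public static string[] WordsContaining(string text, string fragment)
    {
        string pattern = @"\b[A-Z]*" + Regex.Escape(fragment) + @"[A-Z]*\b";
        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    static void Main()
    {
        string s = "ABC.MYTESTING\nXYZ.YOUTESTED\nANY.TESTING";
        foreach (string word in WordsContaining(s, "TEST"))
            Console.WriteLine(word); // MYTESTING, YOUTESTED, TESTING
    }
}
```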