Regex to match lines that follow a pattern - C#

I would like a regex pattern to match all lines in a text file that have the following pattern:
TcQuery {dynamic_content} Alias "{dynamic_content}" New
If the text file has these two lines:
//tcquery c_query alias "qrybklog" new <= This one shouldn't be found because there are two slashes before TcQuery.
tcquery c_query alias "qrybklog" new <= I want a pattern to match this line
I've tried this, but both lines are matched:
var prw = System.IO.File.ReadAllText(@"d:\backlog.prw", Encoding.ASCII);
prw = "//TcQuery c_query Alias teste1 new";
prw += "\nTcQuery c_query Alias teste2 new";
prw += "\nTcQuery c_query Alias teste3 new";
prw += "\n//TcQuery c_query Alias teste4 new";
var regexTcQuery = new Regex("TcQuery+[ *]+[0-9a-zA-Z_]+[ *]+alias+[ *]+[0-9a-zA-Z_\"]+[ *]new$", RegexOptions.IgnoreCase);
var resultTcQuery = regexTcQuery.Matches(prw);

Use the Singleline and IgnoreCase options with this regex:
(?<!\/\/)\s*?\btcquery\b(.*?)\balias\b.*?new
(?<!\/\/) is a negative lookbehind: the match fails at any position that is immediately preceded by //.
.*? lazily matches zero or more characters. If you used .* instead, the match would run on to the last tcquery entry, skipping the tcquery matches in between.
\b is a word boundary. This allows us to match tcquery and alias as separate words.
\s matches a single whitespace character (any of [ \n\r\t\f]), so \s*? lazily matches zero or more of them.
If you want to get the c_query text, you can do this:
List<string> lst = Regex.Matches(input, @"(?<!\/\/)\s*?\btcquery\b(.*?)\balias\b.*?new", RegexOptions.Singleline | RegexOptions.IgnoreCase).Cast<Match>().Select(x => x.Groups[1].Value).ToList();
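To see how the lookbehind behaves on lines like the ones in the question, here is a minimal sketch. The names LookbehindDemo and CapturedNames are made up for this example, and the Trim() call is an addition, since group 1 also captures the spaces around the query name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Pattern taken from the answer above; the options follow its advice.
    private const string Pattern = @"(?<!\/\/)\s*?\btcquery\b(.*?)\balias\b.*?new";

    // Hypothetical helper: returns the text captured between tcquery and alias.
    public static List<string> CapturedNames(string input)
    {
        return Regex.Matches(input, Pattern, RegexOptions.Singleline | RegexOptions.IgnoreCase)
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value.Trim())
                    .ToList();
    }

    public static void Main()
    {
        string prw = "//TcQuery q1 Alias teste1 new"
                   + "\nTcQuery q2 Alias teste2 new"
                   + "\nTcQuery q3 Alias teste3 new"
                   + "\n//TcQuery q4 Alias teste4 new";

        // Only the two uncommented lines are captured.
        Console.WriteLine(string.Join(",", CapturedNames(prw))); // prints q2,q3

        // Caveat: a space between // and TcQuery defeats the lookbehind,
        // because the two characters right before TcQuery are "/ " and not "//".
        Console.WriteLine(CapturedNames("// TcQuery q5 Alias teste5 new").Count); // prints 1
    }
}
```

If commented lines may contain a space after the slashes, anchoring the pattern to the start of the line is probably the safer route.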

If you want to match the full line (without any sub-groups), you could use:
^tcquery.*$
The ^ anchors the match to the beginning of the string (or, with the multiline option, to the beginning of each line); since tcquery is the text immediately following, lines that start with // will be ignored.
If there is any whitespace before tcquery, you can match it with \s*:
^\s*tcquery.*$
If you have lines that can begin with tcquery but don't necessarily follow the format you specified, you can update the pattern with:
^\s*tcquery.*alias.*new.*?$
To match the "dynamic" content portions of the line(s) into groups, you should be able to use:
^\s*tcquery\s+(.*)\s+alias\s+"(.*)"\s+new.*?$
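Also worth noting: you should have the ignore-case regex option enabled for each of the examples above. When the whole file is searched as one string, the multiline option is needed as well, so that ^ and $ match at every line instead of only at the start and end of the text.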
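A minimal sketch of the last pattern applied to a multi-line string, assuming RegexOptions.Multiline and RegexOptions.IgnoreCase. The names TcQueryLines and FindQueries are hypothetical, as is the "query -> alias" output format.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TcQueryLines
{
    // Multiline: ^ and $ anchor at each line. IgnoreCase: TcQuery == tcquery.
    private static readonly Regex Pattern = new Regex(
        @"^\s*tcquery\s+(.*)\s+alias\s+""(.*)""\s+new.*?$",
        RegexOptions.Multiline | RegexOptions.IgnoreCase);

    // Hypothetical helper: one "query -> alias" entry per matching line.
    public static string[] FindQueries(string text)
    {
        return Pattern.Matches(text)
                      .Cast<Match>()
                      .Select(m => m.Groups[1].Value + " -> " + m.Groups[2].Value)
                      .ToArray();
    }

    public static void Main()
    {
        string text =
            "//TcQuery c_query Alias \"teste1\" New\n" +
            "TcQuery c_query Alias \"teste2\" New\n" +
            "   tcquery other_q alias \"teste3\" new\n" +
            "// TcQuery c_query Alias \"teste4\" New";

        // Commented lines are skipped; leading whitespace is tolerated.
        foreach (string hit in FindQueries(text))
            Console.WriteLine(hit);
    }
}
```

With the sample text above, only the second and third lines should be reported, because ^ forces tcquery (after optional whitespace) to be the first thing on the line.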

([^\/\/])tcquery (.*) alias \"(.*)\" new$/i
If your language does not support /i for caseless matching, then switch the regex to:
([^\/\/])TcQuery (.*) Alias \"(.*)\" New$

Try this:
^tcquery [^ ]+ alias "[^"]+" new
It's an amateur one, but the main idea is that the line must start with tcquery.

Related

C# Regex Match between with or without new lines

I am trying to match text between two delimiters, [% %], and I want to get everything whether the string contains new lines or not.
Code
string strEmailContent = sr.ReadToEnd();
string commentPatt = @"\[%((\r\n?|\n).*(\r\n?|\n))%\]";
Regex commentRgx = new Regex(commentPatt, RegexOptions.Singleline);
Sample Inputs
//Successful
[%
New Comment
%] other content from input
//Match: [%\r\nNew Comment\r\n%]
//Fail
[% New Comment %]
//Match: false
//Successfully match single line with
string commentPatt = @"\[%(.*)%\]";
//Match: [% New Comment %]
I do not know how to combine these two patterns to match both cases. Can anyone provide any assistance?
To get text between two delimiters you need to use lazy matching with .*?, but to also match newline symbols, you need (?s) singleline modifier so that the dot could also match newline symbols:
(?s)\[%(.*?)%]
Note that (?s)\[%(.*?)%] will match even if the % is inside [%...%].
See regex demo. Note that the ] does not have to be escaped since it is situated in an unambiguous position and can only be interpreted as a literal ].
In C#, you can use
var rx = new Regex(@"(?s)\[%(.*?)%]");
var res = rx.Matches(str).Cast<Match>().Select(p => p.Groups[1].Value).ToList();
Try this pattern:
\[%([^%]*)%\]
It captures all characters between "[%" and "%]" that are not a "%" character.
Tested @ Regex101
If you want to "see" the "\r\n" in your results, you'll have to escape them with a String.Replace().
See Fiddle Demo
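The practical difference between the two suggested patterns shows up when the content itself contains a %. A minimal sketch, assuming made-up names (DelimiterDemo, Lazy, Negated) and a made-up sample input; Trim() is only there to make the output easier to read.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DelimiterDemo
{
    // Lazy dot with (?s): the dot also matches newlines, and stops at the first "%]".
    public static List<string> Lazy(string s)
    {
        return Regex.Matches(s, @"(?s)\[%(.*?)%]")
                    .Cast<Match>().Select(m => m.Groups[1].Value.Trim()).ToList();
    }

    // Negated class: matches newlines too, but cannot cross any '%' in the content.
    public static List<string> Negated(string s)
    {
        return Regex.Matches(s, @"\[%([^%]*)%\]")
                    .Cast<Match>().Select(m => m.Groups[1].Value.Trim()).ToList();
    }

    public static void Main()
    {
        string input = "[%\r\nNew Comment\r\n%] other content [% 50% off %]";

        Console.WriteLine(string.Join(" | ", Lazy(input)));    // prints New Comment | 50% off
        Console.WriteLine(string.Join(" | ", Negated(input))); // prints New Comment
    }
}
```

Both patterns handle the single-line and multi-line cases from the question; they only differ on content that contains a % of its own.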

Regex to find special pattern

I have a string to parse. First I have to check if string contains special pattern:
I wanted to know if there is substrings which starts with "$(",
and end with ")",
and between those start and end special strings,there should not be
any white-empty space,
it should not include "$" character inside it.
I have a little regex for it in C#
string input = "$(abc)";
string pattern = @"\$\(([^$][^\s]*)\)";
Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = rgx.Matches(input);
foreach (var match in matches)
{
Console.WriteLine("value = " + match);
}
It works for many cases but fails for input = $(a$(), where the inner expression is empty. I do NOT want a match when the input is $() [there is nothing between the start and end identifiers].
What is wrong with my regex?
Note: [^$] matches any single character except $.
Use the below regex if you want to match $()
\$\(([^\s$]*)\)
Use the below regex if you don't want to match $(),
\$\(([^\s$]+)\)
* repeats the preceding token zero or more times.
+ repeats the preceding token one or more times.
Your regex \$\(([^$][^\s]*)\) is wrong. It won't allow $ as the first character inside the parentheses, but it allows it as the second, third, and so on. See the demo here. You need to combine the negated classes in your regex in order to match any character that is neither whitespace nor $.
Your current regex does not match $() because the [^$] matches at least 1 character. The only way I can think of where you would have this match would be when you have an input containing more than one parens, like:
$()(something)
In those cases, you will also need to exclude at least the closing paren:
string pattern = @"\$\(([^$\s)]+)\)";
The above matches for example:
abc in $(abc) and
abc and def in $(def)$()$(abc)(something).
Simply replace the * with a + and merge the options.
string pattern = @"\$\(([^$\s]+)\)";
+ means 1 or more
* means 0 or more
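To compare the suggested patterns side by side, here is a minimal sketch. PlaceholderDemo and Extract are hypothetical names; the inputs are the ones mentioned in the answers above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PlaceholderDemo
{
    // Merged negated class: no '$' and no whitespace, at least one character.
    public const string Merged = @"\$\(([^$\s]+)\)";

    // Same idea, but the closing paren is excluded from the content as well.
    public const string NoParen = @"\$\(([^$\s)]+)\)";

    // Hypothetical helper: returns what group 1 captured for every match.
    public static List<string> Extract(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>().Select(m => m.Groups[1].Value).ToList();
    }

    public static void Main()
    {
        // Empty content is rejected by both patterns.
        Console.WriteLine(Extract("$()", Merged).Count);  // prints 0
        Console.WriteLine(Extract("$()", NoParen).Count); // prints 0

        // Without excluding ')', the class can swallow the first closing paren.
        Console.WriteLine(string.Join(",", Extract("$()(something)", Merged)));  // prints )(something
        Console.WriteLine(Extract("$()(something)", NoParen).Count);             // prints 0

        Console.WriteLine(string.Join(",", Extract("$(def)$()$(abc)(something)", NoParen))); // prints def,abc
    }
}
```

Whether the closing paren needs to be excluded depends on the input: it only matters when an empty $() can be followed by more parenthesised text.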

Regex removing empty spaces when using replace

My situation is not about removing empty spaces, but keeping them. I have this string >[database values] which I would like to find. I created this RegEx to find it then go in and remove the >, [, ]. The code below takes a string that is from a document. The first pattern looks for anything that is surrounded by >[some stuff] it then goes in and "removes" >, [, ]
string decoded = "document in string format";
string pattern = @">\[[A-z, /, \s]*\]";
string pattern2 = @"[>, \[, \]]";
Regex rgx = new Regex(pattern);
Regex rgx2 = new Regex(pattern2);
foreach (Match match in rgx.Matches(decoded))
{
    string replacedValue = rgx2.Replace(match.Value, "");
    Console.WriteLine(match.Value);
    Console.WriteLine(replacedValue);
}
What I get from my first Console.WriteLine is correct, so I get things like >[123 sesame St]. But my second output shows that the replace removes not just those characters but also the spaces, so I get something like 123sesameSt. I don't see any space being replaced in my regex. Am I forgetting something? Is it perhaps implicit in a replace?
The [A-z, /, \s] and [>, \[, \]] in your patterns are also looking for commas and spaces. Just list the characters without delimiting them, like this: [A-Za-z/\s]
string pattern = @">\[[A-Za-z/\s]*\]";
string pattern2 = @"[>\[\]]";
Edit to include Casimir's tip.
After rereading your question (if I understand well) I realize that your two steps approach is useless. You only need one replacement using a capture group:
string pattern = @">\[([^]]*)]";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(yourtext, "$1");
pattern details:
>\[ # literals: >[
( # open the capture group 1
[^]]* # all that is not a ]
) # close the capture group 1
] # literal ]
the replacement string refers to the capture group 1 with $1
By defining [>, \[, \]] in pattern2 you define a character class consisting of the single characters >, comma, space, [ and ]: every character you listed inside the square brackets. But I guess you don't want to match the space and the comma, so leave them out:
string pattern2 = @"[>\[\]]";
Alternatively, you could use
string pattern2 = @"(>\[|\])";
Thereby, you either match >[ or ] which better expresses your intention.
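A minimal sketch contrasting the original character class with the single-replacement approach. BracketDemo and its method names are made up for illustration, and the pattern in StripMarkers escapes both brackets, which should be equivalent to the [^]]* form shown above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketDemo
{
    // The original class: the commas and spaces inside [...] are members too,
    // so spaces in the matched text get removed along with >, [ and ].
    public static string StripWithDelimitedClass(string value)
    {
        return Regex.Replace(value, @"[>, \[, \]]", "");
    }

    // Only the three marker characters are listed.
    public static string StripWithFixedClass(string value)
    {
        return Regex.Replace(value, @"[>\[\]]", "");
    }

    // One pass over the whole text: keep what is between >[ and ] via group 1.
    public static string StripMarkers(string text)
    {
        return Regex.Replace(text, @">\[([^\]]*)\]", "$1");
    }

    public static void Main()
    {
        Console.WriteLine(StripWithDelimitedClass(">[123 sesame St]"));     // prints 123sesameSt
        Console.WriteLine(StripWithFixedClass(">[123 sesame St]"));         // prints 123 sesame St
        Console.WriteLine(StripMarkers("Ship to >[123 sesame St] today"));  // prints Ship to 123 sesame St today
    }
}
```

The capture-group version avoids the match-then-replace loop and leaves any other >, [ or ] in the document untouched.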

C# RegEx - get only first match in string

I've got an input string that looks like this:
level=<device[195].level>&name=<device[195].name>
I want to create a RegEx that will parse out each of the <device> tags, for example, I'd expect two items to be matched from my input string: <device[195].level> and <device[195].name>.
So far I've had some luck with this pattern and code, but it always finds both of the device tags as a single match:
var pattern = "<device\\[[0-9]*\\]\\.\\S*>";
Regex rgx = new Regex(pattern);
var matches = rgx.Matches(httpData);
The result is that matches will contain a single result with the value <device[195].level>&name=<device[195].name>
I'm guessing there must be a way to 'terminate' the pattern, but I'm not sure what it is.
Use non-greedy quantifiers:
<device\[\d+\]\.\S+?>
Also, use verbatim strings for escaping regexes, it makes them much more readable:
var pattern = @"<device\[\d+\]\.\S+?>";
As a side note, I guess in your case using \w instead of \S would be more in line with what you intended, but I left the \S because I can't know that.
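The effect of the greedy versus lazy quantifier can be checked with a short sketch like the one below. GreedyLazyDemo and Find are hypothetical names; the input and both patterns come from the question and the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class GreedyLazyDemo
{
    public const string Greedy = @"<device\[[0-9]*\]\.\S*>"; // pattern from the question
    public const string Lazy = @"<device\[\d+\]\.\S+?>";     // pattern from the answer

    // Hypothetical helper: returns every matched value.
    public static List<string> Find(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>().Select(m => m.Value).ToList();
    }

    public static void Main()
    {
        string httpData = "level=<device[195].level>&name=<device[195].name>";

        // \S* runs to the end of the string and backtracks only to the last '>'.
        Console.WriteLine(Find(httpData, Greedy).Count); // prints 1

        // \S+? stops at the first '>' it can reach, so each tag is its own match.
        foreach (string tag in Find(httpData, Lazy))
            Console.WriteLine(tag);
    }
}
```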
It depends on how much of the structure of the angle blocks you need to match, but you can do:
"\\<device.+?\\>"
I want to create a RegEx that will parse out each of the <device> tags
I'd expect two items to be matched from my input string:
1. <device[195].level>
2. <device[195].name>
This should work. Get the matched group from index 1
(<device[^>]*>)
Live demo
String literals for use in programs:
@"(<device[^>]*>)"
Change your repetition operator and use \w instead of \S
var pattern = @"<device\[[0-9]+\]\.\w+>";
String s = @"level=<device[195].level>&name=<device[195].name>";
foreach (Match m in Regex.Matches(s, @"<device\[[0-9]+\]\.\w+>"))
Console.WriteLine(m.Value);
Output
<device[195].level>
<device[195].name>
Use named match groups and create a linq entity projection. There will be two matches, thus separating the individual items:
string data = "level=<device[195].level>&name=<device[195].name>";
string pattern = @"
(?<variable>[^=]+) # get the variable name
(?:=<device\[) # static '=<device['
(?<index>[^\]]+) # device number index
(?:]\.) # static ].
(?<sub>[^>]+) # Get the sub command
(?:>&?) # Match but don't capture the > and possible &
";
// Ignore pattern whitespace is to document the pattern, does not affect processing.
var items = Regex.Matches(data, pattern, RegexOptions.IgnorePatternWhitespace)
.OfType<Match>()
.Select (mt => new
{
Variable = mt.Groups["variable"].Value,
Index = mt.Groups["index"].Value,
Sub = mt.Groups["sub"].Value
})
.ToList();
items.ForEach(itm => Console.WriteLine ("{0}:{1}:{2}", itm.Variable, itm.Index, itm.Sub));
/* Output
level:195:level
name:195:name
*/

Regex global without collection c#

How can I get a global regex match in one string, and not in a collection?
Regex r = new Regex(".+");
Match match = r.Match("aaaa \r\n bbbb");
string result=match.Value;
I get: result="aaaa " and I want: result="aaaa \r\n bbbb"
I know that I can get that in a collection, but I need to get it as a Match instead.
. doesn't match line breaks unless you tell it to.
You can use (?s) for that, like: new Regex("(?s).+")
Or the Singleline option, like: new Regex(".+", RegexOptions.Singleline)
The dot does not match newline characters by default, so you need to compile the regex using the RegexOptions.Singleline flag:
Regex r = new Regex(".+", RegexOptions.Singleline);
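A minimal sketch of the difference the option makes, using the input from the question. SinglelineDemo and FirstMatch are made-up names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Hypothetical helper: value of the first ".+" match under the given options.
    public static string FirstMatch(string input, RegexOptions options)
    {
        return new Regex(".+", options).Match(input).Value;
    }

    public static void Main()
    {
        string input = "aaaa \r\n bbbb";

        // With Singleline the dot also matches '\n', so one Match covers everything.
        Console.WriteLine(FirstMatch(input, RegexOptions.Singleline) == input); // prints True

        // Without it the match stops before the '\n'.
        Console.WriteLine(FirstMatch(input, RegexOptions.None).Contains("\n")); // prints False
    }
}
```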
