Here is the data source, lines stored in a txt file:
servers[i]=["name1", type1, location3];
servers[i]=["name2", type2, location3];
servers[i]=["name3", type1, location7];
Here is my code:
string servers = File.ReadAllText("servers.txt");
string pattern = "^servers[i]=[\"(?<name>.*)\", (.*), (?<location>.*)];$";
Regex reg = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Multiline);
Match m;
for (m = reg.Match(servers); m.Success; m = m.NextMatch()) {
string name = m.Groups["name"].Value;
string location = m.Groups["location"].Value;
}
No lines are matching. What am I doing wrong?
If you don't care about anything except the servername and location, you don't need to specify the rest of the input in your regex. That lets you avoid having to escape the brackets, as Graeme correctly points out. Try something like:
string pattern = "\"(?<name>.+)\".+\\s(?<location>[^ ]+)];$";
That's
\" = quote mark,
(?<name> = start capture group 'name',
.+ = match one or more chars (could use \w+ here for 1+ word chars)
) = end the capture group
\" = ending quote mark
.+\s = one or more chars, ending with a space
(?<location> = start capture group 'location',
[^ ]+ = one or more non-space chars
) = end the capture group
];$ = immediately followed by ]; and end of string
I tested this using your sample data in Rad Software's free Regex Designer, which uses the .NET regex engine.
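For anyone who wants to try the idea end to end, here is a minimal sketch of parsing the sample lines. The ServerLineParser class and its Parse method are names invented for this example, and the optional \r? is an assumption to cope with Windows line endings when the file is read in one go (with RegexOptions.Multiline, $ only matches before \n):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ServerLineParser
{
    // Literal [ and ] are escaped. \r? tolerates CRLF line endings,
    // because with RegexOptions.Multiline '$' only matches before '\n'.
    static readonly Regex LineRegex = new Regex(
        @"^servers\[i\]=\[""(?<name>[^""]*)"", (?<type>[^,]*), (?<location>[^\]]*)\];\r?$",
        RegexOptions.IgnoreCase | RegexOptions.Multiline);

    // Returns one (name, location) pair per matching line.
    public static List<(string Name, string Location)> Parse(string text)
    {
        var result = new List<(string Name, string Location)>();
        foreach (Match m in LineRegex.Matches(text))
        {
            result.Add((m.Groups["name"].Value, m.Groups["location"].Value));
        }
        return result;
    }

    static void Main()
    {
        // Inline sample data standing in for File.ReadAllText("servers.txt").
        string text =
            "servers[i]=[\"name1\", type1, location3];\r\n" +
            "servers[i]=[\"name2\", type2, location3];\r\n" +
            "servers[i]=[\"name3\", type1, location7];";

        foreach (var (name, location) in Parse(text))
        {
            Console.WriteLine($"{name} -> {location}");
        }
    }
}
```

Using negated character classes such as [^""]* instead of .* is a design choice in this sketch, not something the answers above require; it keeps each group from running past its delimiter.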
I don't know if C# regexes are the same as Perl's, but if so, you probably want to escape the [ and ] characters. Also, there are extra characters in there. Try this:
string pattern = "^servers\\[i\\]=\\[\"(?<name>.*)\", (.*), (?<location>.*)\\];$";
Edited to add: After wondering why my answer was downvoted and then looking at Val's answer, I realized that the "extra characters" were there for a reason. They are what Perl calls "named capture buffers", which I have never used but the original code fragment does. I have updated my answer to include them.
Try this (the literal brackets have to be escaped):
string pattern = "servers\\[i\\]=\\[\"(?<name>.*)\", (.*), (?<location>.*)\\];$";
My situation is not about removing empty spaces, but keeping them. I have strings of the form >[database values] which I would like to find. I created a RegEx to find them and then remove the >, [ and ]. The code below takes a string that comes from a document. The first pattern looks for anything of the form >[some stuff]; the second then goes in and "removes" the >, [ and ].
string decoded = "document in string format";
string pattern = @">\[[A-z, /, \s]*\]";
string pattern2 = @"[>, \[, \]]";
Regex rgx = new Regex(pattern);
Regex rgx2 = new Regex(pattern2);
foreach (Match match in rgx.Matches(decoded))
{
    string replacedValue = rgx2.Replace(match.Value, "");
    Console.WriteLine(match.Value);
    Console.WriteLine(replacedValue);
}
What I am getting in my first Console.WriteLine is correct, so I get things like >[123 sesame St]. But my second output shows that the replace removes not just those characters but also the spaces, so I get something like 123sesameSt. I don't see any space being replaced in my Regex. Am I forgetting something? Is it perhaps implicit in a replace?
The [A-z, /, \s] and [>, \[, \]] in your patterns are also looking for commas and spaces. Just list the characters without delimiting them, like this: [A-Za-z/\s]
string pattern = @">\[[A-Za-z/\s]*\]";
string pattern2 = @"[>\[\]]";
Edit to include Casimir's tip.
After rereading your question (if I understand it correctly), I realize that your two-step approach is unnecessary. You only need one replacement using a capture group:
string pattern = @">\[([^]]*)]";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(yourtext, "$1");
pattern details:
>\[ # literals: >[
( # open the capture group 1
[^]]* # all that is not a ]
) # close the capture group 1
] # literal ]
the replacement string refers to the capture group 1 with $1
By defining [>, \[, \]] in pattern2 you define a character class that matches any single one of the characters listed inside the square brackets: >, the comma, the space, [ and ]. But I guess you don't want to match the space and the comma. So if you don't want to match them, leave them out:
string pattern2 = @"[>\[\]]";
Alternatively, you could use
string pattern2 = @"(>\[|\])";
Thereby, you either match >[ or ] which better expresses your intention.
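To see the single-replacement idea in action, here is a small sketch. The BracketStripper class, its Strip method and the sample input are made up for illustration; only the capture-group technique comes from the answers above:

```csharp
using System;
using System.Text.RegularExpressions;

static class BracketStripper
{
    // Matches ">[...]" and captures only the text between the brackets.
    static readonly Regex Tagged = new Regex(@">\[([^\]]*)\]");

    public static string Strip(string text)
    {
        // "$1" re-inserts the captured text, so inner spaces are untouched.
        return Tagged.Replace(text, "$1");
    }

    static void Main()
    {
        Console.WriteLine(Strip("Address: >[123 sesame St] end"));
        // prints: Address: 123 sesame St end
    }
}
```

Because the spaces sit inside the capture group and are copied back verbatim, nothing in the pattern can remove them.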
I've got an input string that looks like this:
level=<device[195].level>&name=<device[195].name>
I want to create a RegEx that will parse out each of the <device> tags, for example, I'd expect two items to be matched from my input string: <device[195].level> and <device[195].name>.
So far I've had some luck with this pattern and code, but it always finds both of the device tags as a single match:
var pattern = "<device\\[[0-9]*\\]\\.\\S*>";
Regex rgx = new Regex(pattern);
var matches = rgx.Matches(httpData);
The result is that matches will contain a single result with the value <device[195].level>&name=<device[195].name>
I'm guessing there must be a way to 'terminate' the pattern, but I'm not sure what it is.
Use non-greedy quantifiers:
<device\[\d+\]\.\S+?>
Also, use verbatim strings for escaping regexes, it makes them much more readable:
var pattern = @"<device\[\d+\]\.\S+?>";
As a side note, I guess in your case using \w instead of \S would be more in line with what you intended, but I left the \S because I can't know that.
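A quick way to convince yourself of the difference is to count the matches for both quantifiers. This is only a sketch; the GreedyVsLazy class and its Find helper are hypothetical names, and the input is the string from the question:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class GreedyVsLazy
{
    // Returns the text of every match of pattern in input.
    public static string[] Find(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    static void Main()
    {
        string input = "level=<device[195].level>&name=<device[195].name>";

        // Greedy \S* runs to the last '>' because the input contains no whitespace.
        string[] greedy = Find(input, @"<device\[\d+\]\.\S*>");
        // Lazy \S+? stops at the first '>' it can reach.
        string[] lazy = Find(input, @"<device\[\d+\]\.\S+?>");

        Console.WriteLine(greedy.Length); // 1
        Console.WriteLine(lazy.Length);   // 2
    }
}
```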
It depends how much of the structure of the angle blocks you need to match, but you can do
"\\<device.+?\\>"
I want to create a RegEx that will parse out each of the <device> tags
I'd expect two items to be matched from my input string:
1. <device[195].level>
2. <device[195].name>
This should work. Get the matched group from index 1
(<device[^>]*>)
String literals for use in programs:
@"(<device[^>]*>)"
Change your repetition operator and use \w instead of \S
var pattern = @"<device\[[0-9]+\]\.\w+>";
String s = @"level=<device[195].level>&name=<device[195].name>";
foreach (Match m in Regex.Matches(s, @"<device\[[0-9]+\]\.\w+>"))
    Console.WriteLine(m.Value);
Output
<device[195].level>
<device[195].name>
Use named match groups and create a linq entity projection. There will be two matches, thus separating the individual items:
string data = "level=<device[195].level>&name=<device[195].name>";
string pattern = @"
(?<variable>[^=]+) # get the variable name
(?:=<device\[) # static '=<device['
(?<index>[^\]]+) # device number index
(?:]\.) # static ].
(?<sub>[^>]+) # get the sub command
(?:>&?) # match but don't capture the > and a possible &
";
// IgnorePatternWhitespace only lets us lay out and comment the pattern; it does not change what is matched.
var items = Regex.Matches(data, pattern, RegexOptions.IgnorePatternWhitespace)
.OfType<Match>()
.Select (mt => new
{
Variable = mt.Groups["variable"].Value,
Index = mt.Groups["index"].Value,
Sub = mt.Groups["sub"].Value
})
.ToList();
items.ForEach(itm => Console.WriteLine ("{0}:{1}:{2}", itm.Variable, itm.Index, itm.Sub));
/* Output
level:195:level
name:195:name
*/
I am trying to find the correct regex syntax for matching and splitting on a word that is surrounded by double brackets.
const string originalString = "I love to [[verb]] while I [[verb]].";
I tried
var arrayOfStrings = Regex.Split(originalString, @"\[\[(.+)\]\]");
But it did not work correctly. I don't know what I am doing wrong
I would like the arrayOfStrings to come out like so
arrayOfStrings[0] = "I love to "
arrayOfStrings[1] = "[[verb]]"
arrayOfStrings[2] = " while I "
arrayOfStrings[3] = "[[verb]]"
arrayOfStrings[4] = "."
I think that is what you need.
string input = "I love to [[verb]] while I [[verb]].";
string pattern = @"(\[\[.+?\]\])";
string[] matches = Regex.Split( input, pattern );
foreach (string match in matches)
{
Console.WriteLine(match);
}
The pattern that produces exactly what you want is @"(?=\[\[.*?\]\])|(?<=\]\])".
This has two parts to it, separated by the | "or" symbol.
(?=\[\[.*?\]\]) matches the position that is immediately followed by [[, some characters, and ]], so the split happens just before the [[.
(?<=\]\]) matches the position that is immediately preceded by ]], so the split happens just after the ]].
These are called "lookahead" and "lookbehind" assertions; because they match positions rather than characters, nothing is removed from the pieces.
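Both approaches from the answers can be compared side by side. The sketch below wraps them in a hypothetical PlaceholderSplitter class (the class and method names are invented here); the patterns themselves are the ones given above:

```csharp
using System;
using System.Text.RegularExpressions;

static class PlaceholderSplitter
{
    // Splits at zero-width positions, so no text is consumed.
    public static string[] SplitWithLookarounds(string input)
    {
        return Regex.Split(input, @"(?=\[\[.*?\]\])|(?<=\]\])");
    }

    // Regex.Split also returns the text captured by a group in the delimiter.
    public static string[] SplitWithCapture(string input)
    {
        return Regex.Split(input, @"(\[\[.+?\]\])");
    }

    static void Main()
    {
        const string input = "I love to [[verb]] while I [[verb]].";
        foreach (string part in SplitWithLookarounds(input))
        {
            // Quotes make leading and trailing spaces visible.
            Console.WriteLine("'" + part + "'");
        }
    }
}
```

For this particular input both methods should return the same five pieces. One difference to be aware of: if the input ends with ]], each variant adds an empty string at the end of the array.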
How can I escape certain characters in a string with a C# Regex?
This is a test for % and ' thing? -> This is a test for \% and \' thing?
resultString = Regex.Replace(subjectString,
    @"(?<! # Match a position before which there is no
    (?<!\\) # odd number of backslashes
    \\ # (it's odd if there is one backslash,
    (?:\\\\)* # followed by an even number of backslashes)
    )
    (?=[%']) # and which is followed by a % or a '",
    @"\", RegexOptions.IgnorePatternWhitespace);
However, if you're trying to protect yourself against malevolent SQL queries, regex is not the right way to go.
var escapedString = Regex.Replace(input, @"[%']", @"\$0");
This is pretty much all you need. Inside the square brackets, you should put every character you wish to escape with a backslash, which may include the backslash character itself.
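As a minimal sketch of the simple replacement approach (the Escaper class and Escape method are names made up for this example), using $0 to refer to the whole match since the pattern has no capture group:

```csharp
using System;
using System.Text.RegularExpressions;

static class Escaper
{
    // Prefixes every % or ' with a backslash; "$0" stands for the whole match.
    public static string Escape(string input)
    {
        return Regex.Replace(input, @"[%']", @"\$0");
    }

    static void Main()
    {
        Console.WriteLine(Escape("This is a test for % and ' thing?"));
        // prints: This is a test for \% and \' thing?
    }
}
```

Note that, unlike the lookbehind version earlier in the thread, this sketch does not check whether a character is already escaped, so running it twice doubles up the backslashes' targets.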
I don't think this could be done with regex in good fashion, but you can simply run a for loop:
var specialChars = new char[]{'%',....};
var stream = "";
for (int i = 0; i < myStr.Length; i++)
{
    if (specialChars.Contains(myStr[i]))
    {
        stream += '\\';
    }
    stream += myStr[i];
}
(1) You can use a StringBuilder to avoid creating too many intermediate strings.
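The StringBuilder variant could look roughly like this. LoopEscaper and Escape are hypothetical names, and the set of special characters in Main is only an example, since the answer above does not list them all:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class LoopEscaper
{
    // Copies input, inserting a backslash before each special character.
    public static string Escape(string input, ISet<char> specialChars)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (specialChars.Contains(c))
            {
                sb.Append('\\');
            }
            sb.Append(c);
        }
        return sb.ToString();
    }

    static void Main()
    {
        // Example set only; substitute whatever characters you need to escape.
        var special = new HashSet<char> { '%', '\'' };
        Console.WriteLine(Escape("100% of O'Brien", special));
        // prints: 100\% of O\'Brien
    }
}
```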
I need to match all the whole words containing a given string.
string s = "ABC.MYTESTING
XYZ.YOUTESTED
ANY.TESTING";
Regex r = new Regex("(?<TM>[!\..]*TEST.*)", ...);
MatchCollection mc = r.Matches(s);
I need the result to be:
MYTESTING
YOUTESTED
TESTING
But I get:
TESTING
TESTED
.TESTING
How do I achieve this with regular expressions?
Edit: Extended sample string.
If you were looking for all words including 'TEST', you should use
@"(?<TM>\w*TEST\w*)"
\w matches word characters and is roughly short for [A-Za-z0-9_] (in .NET it also matches Unicode letters and digits unless RegexOptions.ECMAScript is set).
Keep it simple: why not just try \w*TEST\w* as the match pattern?
I get the results you are expecting with the following:
string s = @"ABC.MYTESTING
XYZ.YOUTESTED
ANY.TESTING";
var m = Regex.Matches(s, @"(\w*TEST\w*)", RegexOptions.IgnoreCase);
Try using \b. It's the regex anchor for a word boundary, i.e. the position between a word character and a non-word character. If you wanted to match both words you could use:
/\b[a-z]+\b/i
BTW, .NET doesn't need the surrounding /, and the i is just a case-insensitive match flag.
.NET Alternative:
var re = new Regex(@"\b[a-z]+\b", RegexOptions.IgnoreCase);
Using Groups I think you can achieve it.
string s = @"ABC.TESTING
XYZ.TESTED";
Regex r = new Regex(@"(?<TM>[!\..]*(?<test>TEST.*))", RegexOptions.Multiline);
var mc = r.Matches(s);
foreach (Match match in mc)
{
    Console.WriteLine(match.Groups["test"]);
}
Works exactly like you want.
BTW, your regular expression pattern should be a verbatim string (@"")
Regex r = new Regex(@"(?<TM>[^.]*TEST.*)", RegexOptions.IgnoreCase);
First, as @manojlds said, you should use verbatim strings for regexes whenever possible. Otherwise you'll have to use two backslashes in most of your regex escape sequences, not just one (e.g. [!\\..]*).
Second, if you want to match anything but a dot, that part of the regex should be [^.]*. ^ is the metacharacter that inverts the character class, not !, and . has no special meaning in that context, so it doesn't need to be escaped. But you should probably use \w* instead, or even [A-Z]*, depending on what exactly you mean by "word". [!\..] matches ! or ..
Regex r = new Regex(@"(?<TM>[A-Z]*TEST[A-Z]*)", RegexOptions.IgnoreCase);
That way you don't need to bother with word boundaries, though they don't hurt:
Regex r = new Regex(@"(?<TM>\b[A-Z]*TEST[A-Z]*\b)", RegexOptions.IgnoreCase);
Finally, if you're always taking the whole match anyway, you don't need to use a capturing group:
Regex r = new Regex(@"\b[A-Z]*TEST[A-Z]*\b", RegexOptions.IgnoreCase);
The matched text will be available via Match's Value property.
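Putting that last pattern to work on the sample from the question might look like the sketch below. WordFinder and WordsContaining are invented names, and passing the fragment through Regex.Escape is an addition of mine so that the fragment is always treated literally:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class WordFinder
{
    // Returns every run of letters that contains fragment, ignoring case.
    public static string[] WordsContaining(string text, string fragment)
    {
        // Regex.Escape guards against metacharacters in the fragment.
        string pattern = @"\b[A-Z]*" + Regex.Escape(fragment) + @"[A-Z]*\b";
        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    static void Main()
    {
        string s = "ABC.MYTESTING\nXYZ.YOUTESTED\nANY.TESTING";
        Console.WriteLine(string.Join(",", WordsContaining(s, "TEST")));
        // prints: MYTESTING,YOUTESTED,TESTING
    }
}
```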