I'm trying to write a VBA parser; in order to create a ConstantNode, I need to be able to match all possible variations of a Const declaration.
These work beautifully:
Const foo = 123
Const foo$ = "123"
Const foo As String = "123"
Private Const foo = 123
Public Const foo As Integer = 123
Global Const foo% = 123
But I have 2 problems:
If there's a comment at the end of the declaration, I'm picking it up as part of the value:
Const foo = 123 'this comment is included as part of the value
If there's two or more constants declared in the same instruction, I'm failing to match the entire instruction:
Const foo = 123, bar = 456
Here is the regular expressions I'm using:
/// <summary>
/// Gets a regular expression pattern for matching a constant declaration.
/// </summary>
/// <remarks>
/// Constants declared in class modules may only be <c>Private</c>.
/// Constants declared at procedure scope cannot have an access modifier.
/// </remarks>
public static string GetConstantDeclarationSyntax()
{
return #"^((Private|Public|Global)\s)?Const\s(?<identifier>[a-zA-Z][a-zA-Z0-9_]*)(?<specifier>[%&#!#$])?(?<as>\sAs\s(?<reference>(((?<library>[a-zA-Z][a-zA-Z0-9_]*))\.)?(?<identifier>[a-zA-Z][a-zA-Z0-9_]*)))?\s\=\s(?<value>.*)$";
}
Obviously both issues are caused by the (?<value>.*)$ part, which matches anything up until the end of the line. I got VariableNode to support multiple declarations in one instruction by enclosing the whole pattern in a capture group and adding an optional comma, but because constants have this value group, doing that resulted in the first constant having all following declarations captured as part of its value... which brings me back to problem #1.
I wonder if it's at all possible to solve problem #1 with a regular expression, given that the value may be a string that contains an apostrophe, and possibly some escaped (doubled-up) double quotes.
I think I can solve it in the ConstantNode class itself, in the getter for Value:
/// <summary>
/// Gets the constant's value. Strings include delimiting quotes.
/// </summary>
public string Value
{
get
{
return RegexMatch.Groups["value"].Value;
}
}
I mean, I could implement some additional logic in here, to do what I can't do with a regex.
If problem #1 can be solved with a regex, then I believe problem #2 can be as well... or am I on the right track here? Should I ditch the [pretty complex] regex patterns and think of another way? I'm not too familiar with greedy subexpressions, backreferences and other more advanced regex features - is this what's limiting me, or it's just that I'm using the wrong hammer for this nail?
Note: it doesn't matter that the patterns potentially match illegal syntax - this code will only run against compilable VBA code.
Let me go ahead and add the disclaimer on this one. This is absolutely not a good idea (but it was a fun challenge). The regex(s) I'm about to present will parse the test cases in the question, but they obviously are not bullet proof. Using a parser will save you a lot of headache later. I did try to find a parser for VBA, but came up empty handed (and I'm assuming everyone else has too).
Regex
For this to work nicely, you need to have some control over the VBA code coming in. If you can't do this, then you truly need to be looking at writing a parser instead of using Regexes. However, judging from what you already said, you may have a little bit of control. So maybe this will help out.
So for this, I had to split the regex into two distinct regexes. The reason for this is the .Net Regex library cannot handle capturing groups within a repeating group.
Capture the line and start parsing, this will place the variables (with the values) into a single group, but the second Regex will parse them. Just fyi, the regexes make use of negative lookbehinds.
^(?:(?<Accessibility>Private|Public|Global)\s)?Const\s(?<variable>[a-zA-Z][a-zA-Z0-9_]*(?:[%&#!#$])?(?:\sAs)?\s(?:(?:[a-zA-Z][a-zA-Z0-9_]*)\s)?=\s[^',]+(?:(?:(?!"").)+"")?(?:,\s)?){1,}(?:'(?<comment>.+))?$
Regex Demo
Here's the regex to parse the variables
(?<identifier>[a-zA-Z][a-zA-Z0-9_]*)(?<specifier>[%&#!#$])?(?:\sAs)?\s(?:(?<reference>[a-zA-Z][a-zA-Z0-9_]*)\s)?=\s(?<value>[^',]+(?:(?:(?!").)+")?),?
Regex Demo
And here's some c# code you can toss in and test everything out. This should make it easy to test any edge cases you have.
static void Main(string[] args)
{
List<String> test = new List<string> {
"Const foo = 123",
"Const foo$ = \"123\"",
"Const foo As String = \"1'2'3\"",
"Const foo As String = \"123\"",
"Private Const foo = 123",
"Public Const foo As Integer = 123",
"Global Const foo% = 123",
"Const foo = 123 'this comment is included as part of the value",
"Const foo = 123, bar = 456",
"'Const foo As String = \"123\"",
};
foreach (var str in test)
Parse(str);
Console.Read();
}
private static Regex parse = new Regex(#"^(?:(?<Accessibility>Private|Public|Global)\s)?Const\s(?<variable>[a-zA-Z][a-zA-Z0-9_]*(?:[%&#!#$])?(?:\sAs)?\s(?:(?:[a-zA-Z][a-zA-Z0-9_]*)\s)?=\s[^',]+(?:(?:(?!"").)+"")?(?:,\s)?){1,}(?:'(?<comment>.+))?$", RegexOptions.Compiled | RegexOptions.Singleline, new TimeSpan(0, 0, 20));
private static Regex variableRegex = new Regex(#"(?<identifier>[a-zA-Z][a-zA-Z0-9_]*)(?<specifier>[%&#!#$])?(?:\sAs)?\s(?:(?<reference>[a-zA-Z][a-zA-Z0-9_]*)\s)?=\s(?<value>[^',]+(?:(?:(?!"").)+"")?),?", RegexOptions.Compiled | RegexOptions.Singleline, new TimeSpan(0, 0, 20));
public static void Parse(String str)
{
Console.WriteLine(String.Format("Parsing: {0}", str));
var match = parse.Match(str);
if (match.Success)
{
//Private/Public/Global
var accessibility = match.Groups["Accessibility"].Value;
//Since we defined this with atleast one capture, there should always be something here.
foreach (Capture variable in match.Groups["variable"].Captures)
{
//Console.WriteLine(variable);
var variableMatch = variableRegex.Match(variable.Value);
if (variableMatch.Success)
{
Console.WriteLine(String.Format("Identifier: {0}", variableMatch.Groups["identifier"].Value));
if (variableMatch.Groups["specifier"].Success)
Console.WriteLine(String.Format("specifier: {0}", variableMatch.Groups["specifier"].Value));
if (variableMatch.Groups["reference"].Success)
Console.WriteLine(String.Format("reference: {0}", variableMatch.Groups["reference"].Value));
Console.WriteLine(String.Format("value: {0}", variableMatch.Groups["value"].Value));
Console.WriteLine("");
}
else
{
Console.WriteLine(String.Format("FAILED VARIABLE: {0}", variable.Value));
}
}
if (match.Groups["comment"].Success)
{
Console.WriteLine(String.Format("Comment: {0}", match.Groups["comment"].Value));
}
}
else
{
Console.WriteLine(String.Format("FAILED: {0}", str));
}
Console.WriteLine("+++++++++++++++++++++++++++++++++++++++++++++");
Console.WriteLine("");
}
The c# code was just what I was using to test my theory, so I apologize for the craziness in it.
For completeness here's a small sample of the output. If you run the code you'll get more output, but this directly shows that it can handle the situations you were asking about.
Parsing: Const foo = 123 'this comment is included as part of the value
Identifier: foo
value: 123
Comment: this comment is included as part of the value
Parsing: Const foo = 123, bar = 456
Identifier: foo
value: 123
Identifier: bar
value: 456
What it handles
Here are the major cases I can think of that you're probably interested in. It should still handle everything you had before as I just added to the regex you provided.
Comments
Multiple variable declarations on a single line
The apostrophe (comment character) within a string value. Ie foo = "She's awesome"
If the line starts with a comment, the line should be ignored
What it doesn't handle
The one thing I didn't really handle was spacing, but it shouldn't be hard add that in yourself if you need it. So for instance if the declare multiple variables there MUST be a space after the comma. ie (VALID: foo = 123, foobar = 124) (INVALID: foo = 123,foobar = 124)
You won't get much leniency on the format from it, but there's not a whole lot you can do with that when using regexes.
Hope this helps you out, and if you need any more explanation on how any of this works just let me know. Just know this is a bad idea. You'll run into situations that the regex can't handle. If I was in your position, I'd be considering writing a simple parser which would give you greater flexibility in the long run. Good luck.
Related
the interpolated string is easy, just a string lead with $ sign. But what if the string template is coming from outside of your code. For example assume you have a XML file containing following line:
<filePath from="C:\data\settle{date}.csv" to="D:\data\settle{date}.csv"/>
Then you can use LINQ to XML read the content of the attributes in.
//assume the ele is the node <filePath></filePath>
string pathFrom = ele.Attribute("from").value;
string pathTo = ele.Attibute("to").value;
string date = DateTime.Today.ToString("MMddyyyy");
Now how can I inject the date into the pathFrom variable and pathTo variable?
If I have the control of the string itself, things are easy. I can just do var xxx=$"C:\data\settle{date}.csv";But now, what I have is only the variable that I know contains the placeholder date
String interpolation is a compiler feature, so it cannot be used at runtime. This should be clear from the fact that the names of the variables in the scope will in general not be availabe at runtime.
So you will have to roll your own replacement mechanism. It depends on your exact requirements what is best here.
If you only have one (or very few replacements), just do
output = input.Replace("{date}", date);
If the possible replacements are a long list, it might be better to use
output = Regex.Replace(input, #"\{\w+?\}",
match => GetValue(match.Value));
with
string GetValue(string variable)
{
switch (variable)
{
case "{date}":
return DateTime.Today.ToString("MMddyyyy");
default:
return "";
}
}
If you can get an IDictionary<string, string> mapping variable names to values you may simplify this to
output = Regex.Replace(input, #"\{\w+?\}",
match => replacements[match.Value.Substring(1, match.Value.Length-2)]);
You can't directly; the compiler turns your:
string world = "world";
var hw = $"Hello {world}"
Into something like:
string world = "world";
var hw = string.Format("Hello {0}", world);
(It chooses concat, format or formattablestring depending on the situation)
You could engage in a similar process yourself, by replacing "{date" with "{0" and putting the date as the second argument to a string format, etc.
SOLUTION 1:
If you have the ability to change something on xml template change {date} to {0}.
<filePath from="C:\data\settle{0}.csv" to="D:\data\settle{0}.csv" />
Then you can set the value of that like this.
var elementString = string.Format(element.ToString(), DateTime.Now.ToString("MMddyyyy"));
Output: <filePath from="C:\data\settle08092020.csv" to="D:\data\settle08092020.csv" />
SOLUTION 2:
If you can't change the xml template, then this might be my personal course to go.
<filePath from="C:\data\settle{date}.csv" to="D:\data\settle{date}.csv" />
Set the placeholder like this.
element.Attribute("to").Value = element.Attribute("to").Value.Replace("{date}", DateTime.Now.ToString("MMddyyyy"));
element.Attribute("from").Value = element.Attribute("from").Value.Replace("{date}", DateTime.Now.ToString("MMddyyyy"));
Output: <filePath from="C:\data\settle08092020.csv" to="D:\data\settle08092020.csv" />
I hope it helps. Kind regards.
If you treat your original string as a user-input string (or anything that is not processed by the compiler to replace the placeholder, then the question is simple - just use String.Replace() to replace the placehoder {date}, with the value of the date as you wish. Now the followup question is: are you sure that the compiler is not substituting it during compile time, and leaving it untouched for handling at the runtime?
String interpolation allows the developer to combine variables and text to form a string.
Example
Two int variables are created: foo and bar.
int foo = 34;
int bar = 42;
string resultString = $"The foo is {foo}, and the bar is {bar}.";
Console.WriteLine(resultString);
Output:
The foo is 34, and the bar is 42.
Problem
I need to sanitize a collection of Strings from user input to a valid property name.
Context
We have a DataGrid that works with runtime generated classes. These classes are generated based on some parameters. Parameter names are converted into Properties. Some of these parameter names are from user input. We implemented this and it all seemed to work great. Our logic to sanitizing strings was to only allow numbers and letters and convert the rest to an X.
const string regexPattern = #"[^a-zA-Z0-9]";
return ("X" + Regex.Replace(input, regexPattern, "X")); //prefix with X in case the name starts with a number
The property names were always correct and we stored the original string in a dictionary so we could still show a user friendly parameter name.
However, where the trouble starts is when a string only differs in illegal characters like this:
Parameter Name
Parameter_Name
These were both converted into:
ParameterXName
A solution would be to just generate some safe, unrelated names like A, B C. etc. But I would prefer the name to still be recognizable in debug. Unless it's too complicated to implement this behavior of course.
I looked at other questions on StackOverflow, but they all seem to remove illegal characters, which has the same problem.
I feel like I'm reinventing the wheel. Is there some standard solution or trick for this?
I can suggest to change algorithm of generating safe, unrelated and recognizable names.
In c# _ is valid symbol for member names. Replace all invalid symbols (chr) not with X but with "_"+(short)chr+"_".
demo
public class Program
{
public static void Main()
{
string [] props = {"Parameter Name", "Parameter_Name"};
var validNames = props.Select(s=>Sanitize(s)).ToList();
Console.WriteLine(String.Join(Environment.NewLine, validNames));
}
private static string Sanitize(string s)
{
return String.Join("", s.AsEnumerable()
.Select(chr => Char.IsLetter(chr) || Char.IsDigit(chr)
? chr.ToString() // valid symbol
: "_"+(short)chr+"_") // numeric code for invalid symbol
);
}
}
prints
Parameter_32_Name
Parameter_95_Name
I'm not sure if it's okay to ask... But here goes.
I implemented a method that parses a string using regex, each matching are parsed through the delegates with an order ( actually, order is not important-- I think, wait, is it? ... But I wrote it this way, and it's not fully tested ):
Pattern Regex.Replace: #"(?<!\\)\$.+?\$" then String.Replace: #"\$", #"$"; Replace string enclosed by dollar sign. Ignores backslash ones, then erases backslash. Ex: "$global name$" -> "motherofglobalvar", "Money \$9000" -> "Money $9000"
Pattern Regex.Replace #"(?<!\\)%.+?%" then String.Replace #"\%", #"%"; Replace string enclosed by percentage sign. Ignores backslash ones, then erase backslash. Same as previous example: "%local var%" -> "lordoflocalvar", "It's over 9000\%" -> "It's over 9000%"
Pattern Regex.Replace #"(?<!\\)#" then String.Replace #"\#", #"#"; Replace char '#' with whitespace, ' '. But ignore backslash ones, then erase the backslash. Ex: "I#hit#the#ground#too#hard" -> "I hit the ground too hard", "qw\#op" -> "qw#op"
What I've done without much experience (I think):
//parse variable
public static string ParseVariable(string text)
{
return Regex.Replace(Regex.Replace(Regex.Replace(text, #"(?<!\\)\$.+?\$", match =>
{
string trim = match.Value.Trim('$');
string trimUpper = trim.ToUpper();
return variableGlobal.ContainsKey(trim) ? variableGlobal[trim] : match.Value;
}).Replace(#"\$", #"$"), #"(?<!\\)%.+?%", match =>
{
string trim = match.Value.Trim('%');
string trimUpper = trim.ToUpper();
return variableLocal.ContainsKey(trim) ? variableLocal[trim] : match.Value;
}).Replace(#"\%", #"%"), #"(?<!\\)#", " ").Replace(#"\#", #"#");
}
In short, what I used is: Regex.Replace().Replace()
Since I need to parse 3 kinds of symbols, I chained it as following: Regex.Replace(Regex.Replace(Regex.Replace().Replace()).Replace()).Replace()
Is there any more efficient way than this? I mean, like without need to go through the text 6 times? (3 times regex.replace, 3 times string.replace, where each replace modifies the text to be used by the next replace )
Or is it the best way it can do?
Thanks.
Here's a unique take on the problem, I think. You can build a class that will be used to construct the overall pattern piece-by-piece. This class will be responsible for the generating of the MatchEvaluator delegate that will be passed to Replace as well.
class RegexReplacer
{
public string Pattern { get; private set; }
public string Replacement { get; private set; }
public string GroupName { get; private set; }
public RegexReplacer NextReplacer { get; private set; }
public RegexReplacer(string pattern, string replacement, string groupName, RegexReplacer nextReplacer = null)
{
this.Pattern = pattern;
this.Replacement = replacement;
this.GroupName = groupName;
this.NextReplacer = nextReplacer;
}
public string GetAggregatedPattern()
{
string constructedPattern = this.Pattern;
string alternation = (this.NextReplacer == null ? string.Empty : "|" + this.NextReplacer.GetAggregatedPattern()); // If there isn't another replacer, then we won't have an alternation; otherwise, we build an alternation between this pattern and the next replacer's "full" pattern
constructedPattern = string.Format("(?<{0}>{1}){2}", this.GroupName, this.Pattern, alternation); // The (?<XXX>) syntax builds a named capture group. This is used by our GetReplacementDelegate metho.
return constructedPattern;
}
public MatchEvaluator GetReplaceDelegate()
{
return (match) =>
{
if (match.Groups[this.GroupName] != null && match.Groups[this.GroupName].Length > 0) // Did we get a hit on the group name?
{
return this.Replacement;
}
else if (this.NextReplacer != null) // No? Then is there another replacer to inspect?
{
MatchEvaluator next = this.NextReplacer.GetReplaceDelegate();
return next(match);
}
else
{
return match.Value; // No? Then simply return the value
}
};
}
}
It should be obvious as to what Pattern and Replacement represent. GroupName is kind of a hack to let the replacement evaluator know which RegexReplacer fragment resulted in the match. NextReplacer points to another replacer instance that holds a different pattern fragment (et al.).
The idea here is to have a kind of linked list of objects that will represent the overall pattern. You can call GetAggregatedPattern on the outer-most replacer to get the full pattern--each replacer calls the next replacer's GetAggregatedPattern to get that replacer's patter fragment, to which it concatenates its own fragment. The GetReplacementDelegate generates a MatchEvaluator. This MatchEvaluator will compare its own GroupName to the Match's captured groups. If the group name was captured, then we have a hit, and we return this replacer's Replacement value. Otherwise, we step into the next replacer (if there is one) and repeat the group name comparison. If there is no hit on any replacer, then we simply yield back the original value (i.e. what was matched by the pattern; this should be rare).
The usage of such might look like this:
string target = #"$global name$ Money \$9000 %local var% It's over 9000\% I#hit#the#ground#too#hard qw\#op";
RegexReplacer dollarWrapped = new RegexReplacer(#"(?<!\\)\$[^$]+\$", "motherofglobalvar", "dollarWrapped");
RegexReplacer slashDollar = new RegexReplacer(#"\\\$", string.Empty, "slashDollar", dollarWrapped);
RegexReplacer percentWrapped = new RegexReplacer(#"(?<!\\)%[^%]+%", "lordoflocalvar", "percentWrapped", slashDollar);
RegexReplacer slashPercent = new RegexReplacer(#"\\%", string.Empty, "slashPercent", percentWrapped);
RegexReplacer singleAt = new RegexReplacer(#"(?<!\\)#", " ", "singleAt", slashPercent);
RegexReplacer slashAt = new RegexReplacer(#"\\#", "#", "slashAt", singleAt);
RegexReplacer replacer = slashAt;
string pattern = replacer.GetAggregatedPattern();
MatchEvaluator evaluator = replacer.GetReplaceDelegate();
string result = Regex.Replace(target, pattern, evaluator);
Because you want each replacer to know if it got a hit, and because we are hacking this by using group names, you want to make sure that each group name is distinct. A simple way to ensure this would be to use a name that's identical to the variable name since you can't have two variables with the same name within the same scope.
You can see above that I am building each part of the pattern separately, but as I build, I pass the previous replacer as a 4th parameter to the current replacer. This builds the chain of replacers. Once built, I use the last replacer constructed in order to generate the overall pattern and evaluator. If you use anything but, then you will only have part of the overall pattern. Finally, it's simply a matter of passing the generated pattern and evaluator to the Replace method.
Keep in mind that this approach was targeted more at the problem as described. It may work in more general scenarios, but I've only worked with what you've presented. Also, since this is more of a parsing question, a parser may be the proper route to take--although the learning curve is going to be higher.
Also keep in mind that I haven't profiled this code. It certainly doesn't loop over the target string multiple times, but it does involve additional method calls during replacement. You would certainly want to test it in your environment.
I have a file composed of several functions:
public int function(param1, param2, param3, param4,paramN)
{
}
public void function1(param1, param2)
{
}
public void function2()
{
}
I've made this code to get the functions name and parameters:
class Program
{
static void Main(string[] args)
{
// Read the file as one string.
System.IO.StreamReader myFile =
new System.IO.StreamReader("E:\\new1.c");
string myString = myFile.ReadToEnd();
string Expression = myString;
///
/// Get function name
///
var func = Regex.Match(Expression, #"\b[^()]+\((.*)\)$");
Console.WriteLine("FuncTag: " + func.Value);
string innerArgs = func.Groups[1].Value;
///
/// Get parameters
///
var paramTags = Regex.Matches(innerArgs, #"([^,]+\(.+?\))|([^,]+)");
Console.WriteLine("Matches: " + paramTags.Count);
foreach (var item in paramTags)
Console.WriteLine("ParamTag: " + item);
myFile.Close();
// Display the file contents.
Console.WriteLine(myString);
// Suspend the screen.
Console.ReadLine();
Console.ReadKey();
}
}
This works fine when I put string Expression ="function(param1, param2, param3, param4,paramN)", but when I put instead the name of the file it doesn't work .I want to read all functions from the file and show all the functions with their parameters. How can I fix this?
Text editors not written for a specific programming language like UltraEdit or Notepad++ use also regular expressions to find names of functions / methods, function parameters, etc.
Therefore it is possible to use the regular expressions of those text editors also in C# coded applications.
Attention!
A regular expression approach to identify symbol names is never perfect and may return false positives or incorrect data.
For example using block or line comments within the parentheses of a function definition containing special characters like ) or { in the comments will always produce wrong results. And of course the regular expression search would not eliminiate block and line comments within a function definition before as a C/C++/C# compiler does.
UltraEdit v21.20 uses in syntax highlighting wordfile c_cplusplus.uew the Perl regular expression string
^(?!if\b|else\b|while\b|[\s*])(?:[\w*~_&]+?\s+){1,6}([\w:*~_&]+\s*)\([^);]*\)[^{;]*?(?:^[^\r\n{]*;?[\s]+){0,10}\{
There is only 1 marking group in this expression which (usually) matches the name of the function / method.
The expression above matches in a C/C++ file everything from beginning of a function definition line up to opening brace, again with the limitation of not working for all function definition formats possible in C/C++ source files.
UltraEdit searches next inside the entire found string for \( and \) and extracts this string for further processing for function parameters using the expression \s*([^,]+).
Therefore it would be possible to use
^(?!if\b|else\b|while\b|[\s*])(?:[\w*~_&]+?\s+){1,6}([\w:*~_&]+\s*)\(([^);]*)\)[^{;]*?(?:^[^\r\n{]*;?[\s]+){0,10}\{
which contains now 2 marking groups:
The first one matches the name of the function / method.
The second one matches everything between the parentheses.
So every second match can be searched next by \s*([^,]+) to get the function parameters with type (and comments) and on all found strings the String.TrimEnd method should be used to remove all trailing spaces and tabs.
Suppose I want to ask a user what format they want a certain output to be in and the output will include fill-in fields. So they provide something like this string:
"Output text including some field {FieldName1Value} and another {FieldName2Value} and so on..."
Anything bound by the {} should be a column name in a table somewhere they will be replaced with the the stored value with the code I am writing. Seems simple, I could just do a string.Replace on any instance that matches the patter "{" + FieldName + "}". But, what if I also want to give the user the option of using an escape so they can use brackets like any other string. I was thinking they provide "{{" or "}}" to escape that bracket - nice and easy for them. So, they could provide something like:
"Output text including some field {FieldName1Value} and another {FieldName2Value} but not this {{FieldName2Value}}"
But now that "{{FieldName2Value}}" is to be treated like any other string and ignored by the by the Replace. Also, if they decided to put something like "{{{FieldName2Value}}}" with the triple brackets, that would be interpreted by the code as the field value wrapped with brackets and so on.
This is where I get stuck. I am trying with RegEx and came up with this:
public object Convert(object[] values, Type targetType, object parameter, CultureInfo culture)
{
string format = (string)values[0];
ObservableCollection<CalloutFieldAliasMap> oc = (ObservableCollection<CalloutFieldAliasMap>)values[1];
foreach (CalloutFieldMap map in oc)
format = Regex.Replace(format, #"(?<!{){" + map.FieldName + "(?<!})}", " " + map.FieldAlias + " ", RegexOptions.IgnoreCase);
return format;
}
This works in the situation with double brackets {{ }} but NOT if there are three, ie {{{ }}}. The triple brackets are treated like string when it should be treated as {FieldValue}.
Thanks for any help.
By expanding on your regular expression, the presence of literals can be accommodated.
format = Regex.Replace(format,
#"(?<!([^{]|^){(?:{{)*){" + Regex.Escape(map.FieldName) + "}",
String.Format(" {0} ", map.FieldAlias),
RegexOptions.IgnoreCase | RegexOptions.Compiled);
The first part of the expression, (?<!([^{]|^){(?:{{)*){, designates that the { must be preceded by an even number of { characters for it to mark the beginning of a field token. Thus, {FieldName} and {{{FieldName} will denote the start of a field name, whereas {{FieldName} and {{{{FieldName} would not.
The closing } simply requires that the end of the field be a simple }. There is some ambiguity in the syntax in that {FieldName1Value}}} could be parsed as a token with FieldName1Value (followed by the literal }) or FieldName1Value}. The regex assumes the former. (If the latter is intended, you could replace this with }(?!}(}})*) instead.
A couple of other notes. I added Regex.Escape(map.FieldName) so that all characters in the field name are treated as literals; and added the RegexOptions.Compiled flag. (Since this is both a complex expression and executed in a loop, it is a good candidate for compilation.)
After the loop executes, a simple:
format = format.Replace("{{", "{").Replace("}}", "}")
can be used to unescape the literal {{ and }} characters.
The simplest way would be to use String.Replace to replace the double brackets with a character sequence that the user can not (or almost certainly will not) enter. Then do the replacement of your fields, and finally convert replacement back to the double brackets.
For example, given:
string replaceOpen = "{x"; // 'x' should be something like \u00ff, for example
string replaceClose = "x}";
string template = "Replace {ThisField} but not {{ThatField}}";
string temp = template.Replace("{{", replaceOpen).Replace("}}", replaceClose);
string converted = temp.Replace("{ThisField}", "Foo");
string final = converted.Replace(replaceOpen, "{{").Replace(replaceClose, "}});
It's not particularly pretty, but it's effective.
How you go about it is going to depend in large part on how often you call this, and how fast you really need it to be.
I have an extension method I wrote that almost does what you ask, but, while it does escape using double braces, it doesn't do the triple braces like you suggested. Here is the method (also on GitHub at https://github.com/benallred/Icing/blob/master/Icing/Icing.Core/StringExtensions.cs):
private const string FormatTokenGroupName = "token";
private static readonly Regex FormatRegex = new Regex(#"(?<!\{)\{(?<" + FormatTokenGroupName + #">\w+)\}(?!\})", RegexOptions.Compiled);
public static string Format(this string source, IDictionary<string, string> replacements)
{
if (string.IsNullOrWhiteSpace(source) || replacements == null)
{
return source;
}
string replaced = replacements.Aggregate(source,
(current, pair) =>
FormatRegex.Replace(current,
new MatchEvaluator(match =>
(match.Groups[FormatTokenGroupName].Value == pair.Key
? pair.Value : match.Value))));
return replaced.Replace("{{", "{").Replace("}}", "}");
}
Usage:
"This is my {FieldName}".Format(new Dictionary<string, string>() { { "FieldName", "value" } });
Even easier if you add this:
public static string Format(this string source, object replacements)
{
if (string.IsNullOrWhiteSpace(source) || replacements == null)
{
return source;
}
IDictionary<string, string> replacementsDictionary = new Dictionary<string, string>();
foreach (PropertyDescriptor propertyDescriptor in TypeDescriptor.GetProperties(replacements))
{
string token = propertyDescriptor.Name;
object value = propertyDescriptor.GetValue(replacements);
replacementsDictionary.Add(token, (value != null ? value.ToString() : String.Empty));
}
return Format(source, replacementsDictionary);
}
Usage:
"This is my {FieldName}".Format(new { FieldName = "value" });
Unit tests for this method are at https://github.com/benallred/Icing/blob/master/Icing/Icing.Tests/Core/TestOf_StringExtensions.cs
If this doesn't work, what would your ideal solution do for more than three braces? In other words, if {{{FieldName}}} becomes {value}, what does {{{{FieldName}}}} become? What about {{{{{FieldName}}}}} and so on? While those cases are unlikely, they still need to be handled purposefully.
RegEx will not do what you want because it only knows it's current state and what transitions are available. It has no concept of memory. The language you're trying parse is not regular so you will never be able to write a RegEx to handle the general case. You would need i expressions where i is the number of matching braces.
There is a lot of theory behind this and I'll provide some links at the bottom if you're curious. But basically the language you're trying to parse is context-free and to implement a general solution you'll need model a push down automaton, which uses a stack to ensure that an opening brace has a matching closing brace (yes, this is why most languages have matching braces).
Each time you encounter { you put it on the stack. If you encounter } you pop from the stack. When you empty the stack you will know that you've reached the end of a field. Of course that's a major simplification of the problem, but if you're looking for a general solution it should get you moving in the right direction.
http://en.wikipedia.org/wiki/Regular_language
http://en.wikipedia.org/wiki/Context-free_language
http://en.wikipedia.org/wiki/Pushdown_automaton