Sanitizing a String for a Property Name - c#

Problem
I need to sanitize a collection of Strings from user input to a valid property name.
Context
We have a DataGrid that works with runtime generated classes. These classes are generated based on some parameters. Parameter names are converted into Properties. Some of these parameter names are from user input. We implemented this and it all seemed to work great. Our logic to sanitizing strings was to only allow numbers and letters and convert the rest to an X.
const string regexPattern = #"[^a-zA-Z0-9]";
return ("X" + Regex.Replace(input, regexPattern, "X")); //prefix with X in case the name starts with a number
The property names were always correct and we stored the original string in a dictionary so we could still show a user friendly parameter name.
However, where the trouble starts is when a string only differs in illegal characters like this:
Parameter Name
Parameter_Name
These were both converted into:
ParameterXName
A solution would be to just generate some safe, unrelated names like A, B C. etc. But I would prefer the name to still be recognizable in debug. Unless it's too complicated to implement this behavior of course.
I looked at other questions on StackOverflow, but they all seem to remove illegal characters, which has the same problem.
I feel like I'm reinventing the wheel. Is there some standard solution or trick for this?

I can suggest to change algorithm of generating safe, unrelated and recognizable names.
In c# _ is valid symbol for member names. Replace all invalid symbols (chr) not with X but with "_"+(short)chr+"_".
demo
public class Program
{
public static void Main()
{
string [] props = {"Parameter Name", "Parameter_Name"};
var validNames = props.Select(s=>Sanitize(s)).ToList();
Console.WriteLine(String.Join(Environment.NewLine, validNames));
}
private static string Sanitize(string s)
{
return String.Join("", s.AsEnumerable()
.Select(chr => Char.IsLetter(chr) || Char.IsDigit(chr)
? chr.ToString() // valid symbol
: "_"+(short)chr+"_") // numeric code for invalid symbol
);
}
}
prints
Parameter_32_Name
Parameter_95_Name

Related

Using C#6 string interpolation like String.Format [duplicate]

C#6.0 have a string interpolation - a nice feature to format strings like:
var name = "John";
WriteLine($"My name is {name}");
The example is converted to
var name = "John";
WriteLine(String.Format("My name is {0}", name));
From the localization point of view, it is much better to store strings like :
"My name is {name} {middlename} {surname}"
than in String.Format notation:
"My name is {0} {1} {2}"
How to use the string interpolation for .NET localization? Is there going to be a way to put $"..." to resource files? Or should strings be stored like "...{name}" and somehow interpolated on fly?
P.S. This question is NOT about "how to make string.FormatIt extension" (there are A LOT of such libraries, SO answers, etc.). This question is about something like Roslyn extension for "string interpolation" in "localization" context (both are terms in MS .NET vocabulary), or dynamic usage like Dylan proposed.
An interpolated string evaluates the block between the curly braces as a C# expression (e.g. {expression}, {1 + 1}, {person.FirstName}).
This means that the expressions in an interpolated string must reference names in the current context.
For example this statement will not compile:
var nameFormat = $"My name is {name}"; // Cannot use *name*
// before it is declared
var name = "Fred";
WriteLine(nameFormat);
Similarly:
class Program
{
const string interpolated = $"{firstName}"; // Name *firstName* does not exist
// in the current context
static void Main(string[] args)
{
var firstName = "fred";
Console.WriteLine(interpolated);
Console.ReadKey();
}
}
To answer your question:
There is no current mechanism provided by the framework to evaluate interpolated strings at runtime. Therefore, you cannot store strings and interpolate on the fly out of the box.
There are libraries that exist that handle runtime interpolation of strings.
According to this discussion on the Roslyn codeplex site, string interpolation will likely not be compatible with resource files (emphasis mine):
String interpolation could be neater and easier to debug than either String.Format or concatenation...
Dim y = $"Robot {name} reporting
{coolant.name} levels are {coolant.level}
{reactor.name} levels are {reactor.level}"
However, this example is fishy. Most professional programmers won't be writing
user-facing strings in code. Instead they'll be storing those strings in resources (.resw, .resx or .xlf) for reasons of localization. So there doesn't seem much use for string interpolation here.
Assuming that your question is more about how to localise interpolated strings in your source code, and not how to handle interpolated string resources...
Given the example code:
var name = "John";
var middlename = "W";
var surname = "Bloggs";
var text = $"My name is {name} {middlename} {surname}";
Console.WriteLine(text);
The output is obviously:
My name is John W Bloggs
Now change the text assignment to fetch a translation instead:
var text = Translate($"My name is {name} {middlename} {surname}");
Translate is implemented like this:
public static string Translate(FormattableString text)
{
return string.Format(GetTranslation(text.Format),
text.GetArguments());
}
private static string GetTranslation(string text)
{
return text; // actually use gettext or whatever
}
You need to provide your own implementation of GetTranslation; it will receive a string like "My name is {0} {1} {2}" and should use GetText or resources or similar to locate and return a suitable translation for this, or just return the original parameter to skip translation.
You will still need to document for your translators what the parameter numbers mean; the text used in the original code string doesn't exist at runtime.
If, for example, in this case GetTranslation returned "{2}. {0} {2}, {1}. Don't wear it out." (hey, localisation is not just about language!) then the output of the full program would be:
Bloggs. John Bloggs, W. Don't wear it out.
Having said this, while using this style of translation is easy to develop, it's hard to actually translate, since the strings are buried in the code and only surface at runtime. Unless you have a tool that can statically explore your code and extract all the translatable strings (without having to hit that code path at runtime), you're better off using more traditional resx files, since they inherently give you a table of text to be translated.
As already said in previous answers: you currently cannot load the format string at runtime (e.g. from resource files) for string interpolation because it is used at compile time.
If you don't care about the compile time feature and just want to have named placeholders, you could use something like this extension method:
public static string StringFormat(this string input, Dictionary<string, object> elements)
{
int i = 0;
var values = new object[elements.Count];
foreach (var elem in elements)
{
input = Regex.Replace(input, "{" + Regex.Escape(elem.Key) + "(?<format>[^}]+)?}", "{" + i + "${format}}");
values[i++] = elem.Value;
}
return string.Format(input, values);
}
Be aware that you cannot have inline expressions like {i+1} here and that this is not code with best performance.
You can use this with a dictionary you load from resource files or inline like this:
var txt = "Hello {name} on {day:yyyy-MM-dd}!".StringFormat(new Dictionary<string, object>
{
["name"] = "Joe",
["day"] = DateTime.Now,
});
String interpolation is difficult to combine with localization because the compiler prefers to translate it to string.Format(...), which does not support localization. However, there is a trick that makes it possible to combine localization and string interpolation; it is described near the end of this article.
Normally string interpolation is translated to string.Format, whose behavior cannot be customized. However, in much the same way as lambda methods sometimes become expression trees, the compiler will switch from string.Format to FormattableStringFactory.Create (a .NET 4.6 method) if the target method accepts a System.FormattableString object.
The problem is, the compiler prefers to call string.Format if possible, so if there were an overload of Localized() that accepted FormattableString, it would not work with string interpolation because the C# compiler would simply ignore it [because there is an overload that accepts a plain string]. Actually, it's worse than that: the compiler also refuses to use FormattableString when calling an extension method.
It can work if you use a non-extension method. For example:
static class Loca
{
public static string lize(this FormattableString message)
{ return message.Format.Localized(message.GetArguments()); }
}
Then you can use it like this:
public class Program
{
public static void Main(string[] args)
{
Localize.UseResourceManager(Resources.ResourceManager);
var name = "Dave";
Console.WriteLine(Loca.lize($"Hello, {name}"));
}
}
It's important to realize that the compiler converts the $"..." string into an old-fashioned format string. So in this example, Loca.lize actually receives "Hello, {0}" as the format string, not "Hello, {name}".
Using the Microsoft.CodeAnalysis.CSharp.Scripting package you can achieve this.
You will need to create an object to store the data in, below a dynamic object is used. You could also create an specific class with all the properties required. The reason to wrap the dynamic object in a class in described here.
public class DynamicData
{
public dynamic Data { get; } = new ExpandoObject();
}
You can then use it as shown below.
var options = ScriptOptions.Default
.AddReferences(
typeof(Microsoft.CSharp.RuntimeBinder.RuntimeBinderException).GetTypeInfo().Assembly,
typeof(System.Runtime.CompilerServices.DynamicAttribute).GetTypeInfo().Assembly);
var globals = new DynamicData();
globals.Data.Name = "John";
globals.Data.MiddleName = "James";
globals.Data.Surname = "Jamison";
var text = "My name is {Data.Name} {Data.MiddleName} {Data.Surname}";
var result = await CSharpScript.EvaluateAsync<string>($"$\"{text}\"", options, globals);
This is compiling the snippet of code and executing it, so it is true C# string interpolation. Though you will have to take into account the performance of this as it is actually compiling and executing your code at runtime. To get around this performance hit if you could use CSharpScript.Create to compile and cache the code.
The C# 6.0 string interpolation won't help you if the format string is not in your C# source code. In that case, you will have to use some other solution, like this library.
If we use interpolation then we are thinking in terms of methods, not constants. In that case we could define our translations as methods:
public abstract class InterpolatedText
{
public abstract string GreetingWithName(string firstName, string lastName);
}
public class InterpolatedTextEnglish : InterpolatedText
{
public override string GreetingWithName(string firstName, string lastName) =>
$"Hello, my name is {firstName} {lastName}.";
}
We can then load an implementation of InterpolatedText for a specific culture. This also provides a way to implement fallback, as one implementation can inherit from another. If English is the default language and other implementations inherit from it, there will at least be something to display until a translation is provided.
This seems a bit unorthodox, but offers some benefits:
Primarily, the string used for interpolation is always stored in a strongly-typed method with clearly-specified arguments.
Given this: "Hello, my name is {0} {1}" can we determine that the placeholders represent first name and last name in that order? There will always be a method which matches values to placeholders, but there's less room for confusion when the interpolated string is stored with its arguments.
Similarly, if we store our translation strings in one place and use them in another, it becomes possible to modify them in a way that breaks the code using them. We can add {2} to a string which will be used elsewhere, and that code will fail at runtime.
Using string interpolation this is impossible. If our translation string doesn't match the available arguments it won't even compile.
There are drawbacks, although I see difficulty in maintaining any solution.
The greatest is portability. If your translation is coded in C# and you switch, it's not the easiest thing to export all of your translations.
It also means that if you wish to farm out translations to different individuals (unless you have one person who speaks everything) then the translators must modify code. It's easy code, but code nonetheless.
Interpolated strings can not refactored out from their (variable) scope because of using of the embedded variables in them.
The only way to relocate the string literal part is passing the scope bound variables as parameter to an other location, and mark their position in the string with special placeholders. However this solution is already "invented" and out there:
string.Format("literal with placeholers", parameters);
or some of advanced library (interpolating runtime), but using the very same concept (passing parameters).
Then you can refactor out the "literal with placeholers" to a resource.

C#6.0 string interpolation localization

C#6.0 have a string interpolation - a nice feature to format strings like:
var name = "John";
WriteLine($"My name is {name}");
The example is converted to
var name = "John";
WriteLine(String.Format("My name is {0}", name));
From the localization point of view, it is much better to store strings like :
"My name is {name} {middlename} {surname}"
than in String.Format notation:
"My name is {0} {1} {2}"
How to use the string interpolation for .NET localization? Is there going to be a way to put $"..." to resource files? Or should strings be stored like "...{name}" and somehow interpolated on fly?
P.S. This question is NOT about "how to make string.FormatIt extension" (there are A LOT of such libraries, SO answers, etc.). This question is about something like Roslyn extension for "string interpolation" in "localization" context (both are terms in MS .NET vocabulary), or dynamic usage like Dylan proposed.
An interpolated string evaluates the block between the curly braces as a C# expression (e.g. {expression}, {1 + 1}, {person.FirstName}).
This means that the expressions in an interpolated string must reference names in the current context.
For example this statement will not compile:
var nameFormat = $"My name is {name}"; // Cannot use *name*
// before it is declared
var name = "Fred";
WriteLine(nameFormat);
Similarly:
class Program
{
const string interpolated = $"{firstName}"; // Name *firstName* does not exist
// in the current context
static void Main(string[] args)
{
var firstName = "fred";
Console.WriteLine(interpolated);
Console.ReadKey();
}
}
To answer your question:
There is no current mechanism provided by the framework to evaluate interpolated strings at runtime. Therefore, you cannot store strings and interpolate on the fly out of the box.
There are libraries that exist that handle runtime interpolation of strings.
According to this discussion on the Roslyn codeplex site, string interpolation will likely not be compatible with resource files (emphasis mine):
String interpolation could be neater and easier to debug than either String.Format or concatenation...
Dim y = $"Robot {name} reporting
{coolant.name} levels are {coolant.level}
{reactor.name} levels are {reactor.level}"
However, this example is fishy. Most professional programmers won't be writing
user-facing strings in code. Instead they'll be storing those strings in resources (.resw, .resx or .xlf) for reasons of localization. So there doesn't seem much use for string interpolation here.
Assuming that your question is more about how to localise interpolated strings in your source code, and not how to handle interpolated string resources...
Given the example code:
var name = "John";
var middlename = "W";
var surname = "Bloggs";
var text = $"My name is {name} {middlename} {surname}";
Console.WriteLine(text);
The output is obviously:
My name is John W Bloggs
Now change the text assignment to fetch a translation instead:
var text = Translate($"My name is {name} {middlename} {surname}");
Translate is implemented like this:
public static string Translate(FormattableString text)
{
return string.Format(GetTranslation(text.Format),
text.GetArguments());
}
private static string GetTranslation(string text)
{
return text; // actually use gettext or whatever
}
You need to provide your own implementation of GetTranslation; it will receive a string like "My name is {0} {1} {2}" and should use GetText or resources or similar to locate and return a suitable translation for this, or just return the original parameter to skip translation.
You will still need to document for your translators what the parameter numbers mean; the text used in the original code string doesn't exist at runtime.
If, for example, in this case GetTranslation returned "{2}. {0} {2}, {1}. Don't wear it out." (hey, localisation is not just about language!) then the output of the full program would be:
Bloggs. John Bloggs, W. Don't wear it out.
Having said this, while using this style of translation is easy to develop, it's hard to actually translate, since the strings are buried in the code and only surface at runtime. Unless you have a tool that can statically explore your code and extract all the translatable strings (without having to hit that code path at runtime), you're better off using more traditional resx files, since they inherently give you a table of text to be translated.
As already said in previous answers: you currently cannot load the format string at runtime (e.g. from resource files) for string interpolation because it is used at compile time.
If you don't care about the compile time feature and just want to have named placeholders, you could use something like this extension method:
public static string StringFormat(this string input, Dictionary<string, object> elements)
{
int i = 0;
var values = new object[elements.Count];
foreach (var elem in elements)
{
input = Regex.Replace(input, "{" + Regex.Escape(elem.Key) + "(?<format>[^}]+)?}", "{" + i + "${format}}");
values[i++] = elem.Value;
}
return string.Format(input, values);
}
Be aware that you cannot have inline expressions like {i+1} here and that this is not code with best performance.
You can use this with a dictionary you load from resource files or inline like this:
var txt = "Hello {name} on {day:yyyy-MM-dd}!".StringFormat(new Dictionary<string, object>
{
["name"] = "Joe",
["day"] = DateTime.Now,
});
String interpolation is difficult to combine with localization because the compiler prefers to translate it to string.Format(...), which does not support localization. However, there is a trick that makes it possible to combine localization and string interpolation; it is described near the end of this article.
Normally string interpolation is translated to string.Format, whose behavior cannot be customized. However, in much the same way as lambda methods sometimes become expression trees, the compiler will switch from string.Format to FormattableStringFactory.Create (a .NET 4.6 method) if the target method accepts a System.FormattableString object.
The problem is, the compiler prefers to call string.Format if possible, so if there were an overload of Localized() that accepted FormattableString, it would not work with string interpolation because the C# compiler would simply ignore it [because there is an overload that accepts a plain string]. Actually, it's worse than that: the compiler also refuses to use FormattableString when calling an extension method.
It can work if you use a non-extension method. For example:
static class Loca
{
public static string lize(this FormattableString message)
{ return message.Format.Localized(message.GetArguments()); }
}
Then you can use it like this:
public class Program
{
public static void Main(string[] args)
{
Localize.UseResourceManager(Resources.ResourceManager);
var name = "Dave";
Console.WriteLine(Loca.lize($"Hello, {name}"));
}
}
It's important to realize that the compiler converts the $"..." string into an old-fashioned format string. So in this example, Loca.lize actually receives "Hello, {0}" as the format string, not "Hello, {name}".
Using the Microsoft.CodeAnalysis.CSharp.Scripting package you can achieve this.
You will need to create an object to store the data in, below a dynamic object is used. You could also create an specific class with all the properties required. The reason to wrap the dynamic object in a class in described here.
public class DynamicData
{
public dynamic Data { get; } = new ExpandoObject();
}
You can then use it as shown below.
var options = ScriptOptions.Default
.AddReferences(
typeof(Microsoft.CSharp.RuntimeBinder.RuntimeBinderException).GetTypeInfo().Assembly,
typeof(System.Runtime.CompilerServices.DynamicAttribute).GetTypeInfo().Assembly);
var globals = new DynamicData();
globals.Data.Name = "John";
globals.Data.MiddleName = "James";
globals.Data.Surname = "Jamison";
var text = "My name is {Data.Name} {Data.MiddleName} {Data.Surname}";
var result = await CSharpScript.EvaluateAsync<string>($"$\"{text}\"", options, globals);
This is compiling the snippet of code and executing it, so it is true C# string interpolation. Though you will have to take into account the performance of this as it is actually compiling and executing your code at runtime. To get around this performance hit if you could use CSharpScript.Create to compile and cache the code.
The C# 6.0 string interpolation won't help you if the format string is not in your C# source code. In that case, you will have to use some other solution, like this library.
If we use interpolation then we are thinking in terms of methods, not constants. In that case we could define our translations as methods:
public abstract class InterpolatedText
{
public abstract string GreetingWithName(string firstName, string lastName);
}
public class InterpolatedTextEnglish : InterpolatedText
{
public override string GreetingWithName(string firstName, string lastName) =>
$"Hello, my name is {firstName} {lastName}.";
}
We can then load an implementation of InterpolatedText for a specific culture. This also provides a way to implement fallback, as one implementation can inherit from another. If English is the default language and other implementations inherit from it, there will at least be something to display until a translation is provided.
This seems a bit unorthodox, but offers some benefits:
Primarily, the string used for interpolation is always stored in a strongly-typed method with clearly-specified arguments.
Given this: "Hello, my name is {0} {1}" can we determine that the placeholders represent first name and last name in that order? There will always be a method which matches values to placeholders, but there's less room for confusion when the interpolated string is stored with its arguments.
Similarly, if we store our translation strings in one place and use them in another, it becomes possible to modify them in a way that breaks the code using them. We can add {2} to a string which will be used elsewhere, and that code will fail at runtime.
Using string interpolation this is impossible. If our translation string doesn't match the available arguments it won't even compile.
There are drawbacks, although I see difficulty in maintaining any solution.
The greatest is portability. If your translation is coded in C# and you switch, it's not the easiest thing to export all of your translations.
It also means that if you wish to farm out translations to different individuals (unless you have one person who speaks everything) then the translators must modify code. It's easy code, but code nonetheless.
Interpolated strings can not refactored out from their (variable) scope because of using of the embedded variables in them.
The only way to relocate the string literal part is passing the scope bound variables as parameter to an other location, and mark their position in the string with special placeholders. However this solution is already "invented" and out there:
string.Format("literal with placeholers", parameters);
or some of advanced library (interpolating runtime), but using the very same concept (passing parameters).
Then you can refactor out the "literal with placeholers" to a resource.

how to use Regex for matching Functions and Capturing their Arguments

I have a file composed of several functions:
public int function(param1, param2, param3, param4,paramN)
{
}
public void function1(param1, param2)
{
}
public void function2()
{
}
I've made this code to get the functions name and parameters:
class Program
{
static void Main(string[] args)
{
// Read the file as one string.
System.IO.StreamReader myFile =
new System.IO.StreamReader("E:\\new1.c");
string myString = myFile.ReadToEnd();
string Expression = myString;
///
/// Get function name
///
var func = Regex.Match(Expression, #"\b[^()]+\((.*)\)$");
Console.WriteLine("FuncTag: " + func.Value);
string innerArgs = func.Groups[1].Value;
///
/// Get parameters
///
var paramTags = Regex.Matches(innerArgs, #"([^,]+\(.+?\))|([^,]+)");
Console.WriteLine("Matches: " + paramTags.Count);
foreach (var item in paramTags)
Console.WriteLine("ParamTag: " + item);
myFile.Close();
// Display the file contents.
Console.WriteLine(myString);
// Suspend the screen.
Console.ReadLine();
Console.ReadKey();
}
}
This works fine when I put string Expression ="function(param1, param2, param3, param4,paramN)", but when I put instead the name of the file it doesn't work .I want to read all functions from the file and show all the functions with their parameters. How can I fix this?
Text editors not written for a specific programming language like UltraEdit or Notepad++ use also regular expressions to find names of functions / methods, function parameters, etc.
Therefore it is possible to use the regular expressions of those text editors also in C# coded applications.
Attention!
A regular expression approach to identify symbol names is never perfect and may return false positives or incorrect data.
For example using block or line comments within the parentheses of a function definition containing special characters like ) or { in the comments will always produce wrong results. And of course the regular expression search would not eliminiate block and line comments within a function definition before as a C/C++/C# compiler does.
UltraEdit v21.20 uses in syntax highlighting wordfile c_cplusplus.uew the Perl regular expression string
^(?!if\b|else\b|while\b|[\s*])(?:[\w*~_&]+?\s+){1,6}([\w:*~_&]+\s*)\([^);]*\)[^{;]*?(?:^[^\r\n{]*;?[\s]+){0,10}\{
There is only 1 marking group in this expression which (usually) matches the name of the function / method.
The expression above matches in a C/C++ file everything from beginning of a function definition line up to opening brace, again with the limitation of not working for all function definition formats possible in C/C++ source files.
UltraEdit searches next inside the entire found string for \( and \) and extracts this string for further processing for function parameters using the expression \s*([^,]+).
Therefore it would be possible to use
^(?!if\b|else\b|while\b|[\s*])(?:[\w*~_&]+?\s+){1,6}([\w:*~_&]+\s*)\(([^);]*)\)[^{;]*?(?:^[^\r\n{]*;?[\s]+){0,10}\{
which contains now 2 marking groups:
The first one matches the name of the function / method.
The second one matches everything between the parentheses.
So every second match can be searched next by \s*([^,]+) to get the function parameters with type (and comments) and on all found strings the String.TrimEnd method should be used to remove all trailing spaces and tabs.

Best way to provide the user an escape string

Suppose I want to ask a user what format they want a certain output to be in and the output will include fill-in fields. So they provide something like this string:
"Output text including some field {FieldName1Value} and another {FieldName2Value} and so on..."
Anything bound by the {} should be a column name in a table somewhere they will be replaced with the the stored value with the code I am writing. Seems simple, I could just do a string.Replace on any instance that matches the patter "{" + FieldName + "}". But, what if I also want to give the user the option of using an escape so they can use brackets like any other string. I was thinking they provide "{{" or "}}" to escape that bracket - nice and easy for them. So, they could provide something like:
"Output text including some field {FieldName1Value} and another {FieldName2Value} but not this {{FieldName2Value}}"
But now that "{{FieldName2Value}}" is to be treated like any other string and ignored by the by the Replace. Also, if they decided to put something like "{{{FieldName2Value}}}" with the triple brackets, that would be interpreted by the code as the field value wrapped with brackets and so on.
This is where I get stuck. I am trying with RegEx and came up with this:
public object Convert(object[] values, Type targetType, object parameter, CultureInfo culture)
{
string format = (string)values[0];
ObservableCollection<CalloutFieldAliasMap> oc = (ObservableCollection<CalloutFieldAliasMap>)values[1];
foreach (CalloutFieldMap map in oc)
format = Regex.Replace(format, #"(?<!{){" + map.FieldName + "(?<!})}", " " + map.FieldAlias + " ", RegexOptions.IgnoreCase);
return format;
}
This works in the situation with double brackets {{ }} but NOT if there are three, ie {{{ }}}. The triple brackets are treated like string when it should be treated as {FieldValue}.
Thanks for any help.
By expanding on your regular expression, the presence of literals can be accommodated.
format = Regex.Replace(format,
#"(?<!([^{]|^){(?:{{)*){" + Regex.Escape(map.FieldName) + "}",
String.Format(" {0} ", map.FieldAlias),
RegexOptions.IgnoreCase | RegexOptions.Compiled);
The first part of the expression, (?<!([^{]|^){(?:{{)*){, designates that the { must be preceded by an even number of { characters for it to mark the beginning of a field token. Thus, {FieldName} and {{{FieldName} will denote the start of a field name, whereas {{FieldName} and {{{{FieldName} would not.
The closing } simply requires that the end of the field be a simple }. There is some ambiguity in the syntax in that {FieldName1Value}}} could be parsed as a token with FieldName1Value (followed by the literal }) or FieldName1Value}. The regex assumes the former. (If the latter is intended, you could replace this with }(?!}(}})*) instead.
A couple of other notes. I added Regex.Escape(map.FieldName) so that all characters in the field name are treated as literals; and added the RegexOptions.Compiled flag. (Since this is both a complex expression and executed in a loop, it is a good candidate for compilation.)
After the loop executes, a simple:
format = format.Replace("{{", "{").Replace("}}", "}")
can be used to unescape the literal {{ and }} characters.
The simplest way would be to use String.Replace to replace the double brackets with a character sequence that the user can not (or almost certainly will not) enter. Then do the replacement of your fields, and finally convert replacement back to the double brackets.
For example, given:
string replaceOpen = "{x"; // 'x' should be something like \u00ff, for example
string replaceClose = "x}";
string template = "Replace {ThisField} but not {{ThatField}}";
string temp = template.Replace("{{", replaceOpen).Replace("}}", replaceClose);
string converted = temp.Replace("{ThisField}", "Foo");
string final = converted.Replace(replaceOpen, "{{").Replace(replaceClose, "}});
It's not particularly pretty, but it's effective.
How you go about it is going to depend in large part on how often you call this, and how fast you really need it to be.
I have an extension method I wrote that almost does what you ask, but, while it does escape using double braces, it doesn't do the triple braces like you suggested. Here is the method (also on GitHub at https://github.com/benallred/Icing/blob/master/Icing/Icing.Core/StringExtensions.cs):
private const string FormatTokenGroupName = "token";
private static readonly Regex FormatRegex = new Regex(#"(?<!\{)\{(?<" + FormatTokenGroupName + #">\w+)\}(?!\})", RegexOptions.Compiled);
public static string Format(this string source, IDictionary<string, string> replacements)
{
if (string.IsNullOrWhiteSpace(source) || replacements == null)
{
return source;
}
string replaced = replacements.Aggregate(source,
(current, pair) =>
FormatRegex.Replace(current,
new MatchEvaluator(match =>
(match.Groups[FormatTokenGroupName].Value == pair.Key
? pair.Value : match.Value))));
return replaced.Replace("{{", "{").Replace("}}", "}");
}
Usage:
"This is my {FieldName}".Format(new Dictionary<string, string>() { { "FieldName", "value" } });
Even easier if you add this:
public static string Format(this string source, object replacements)
{
if (string.IsNullOrWhiteSpace(source) || replacements == null)
{
return source;
}
IDictionary<string, string> replacementsDictionary = new Dictionary<string, string>();
foreach (PropertyDescriptor propertyDescriptor in TypeDescriptor.GetProperties(replacements))
{
string token = propertyDescriptor.Name;
object value = propertyDescriptor.GetValue(replacements);
replacementsDictionary.Add(token, (value != null ? value.ToString() : String.Empty));
}
return Format(source, replacementsDictionary);
}
Usage:
"This is my {FieldName}".Format(new { FieldName = "value" });
Unit tests for this method are at https://github.com/benallred/Icing/blob/master/Icing/Icing.Tests/Core/TestOf_StringExtensions.cs
If this doesn't work, what would your ideal solution do for more than three braces? In other words, if {{{FieldName}}} becomes {value}, what does {{{{FieldName}}}} become? What about {{{{{FieldName}}}}} and so on? While those cases are unlikely, they still need to be handled purposefully.
RegEx will not do what you want because it only knows it's current state and what transitions are available. It has no concept of memory. The language you're trying parse is not regular so you will never be able to write a RegEx to handle the general case. You would need i expressions where i is the number of matching braces.
There is a lot of theory behind this and I'll provide some links at the bottom if you're curious. But basically the language you're trying to parse is context-free and to implement a general solution you'll need model a push down automaton, which uses a stack to ensure that an opening brace has a matching closing brace (yes, this is why most languages have matching braces).
Each time you encounter { you put it on the stack. If you encounter } you pop from the stack. When you empty the stack you will know that you've reached the end of a field. Of course that's a major simplification of the problem, but if you're looking for a general solution it should get you moving in the right direction.
http://en.wikipedia.org/wiki/Regular_language
http://en.wikipedia.org/wiki/Context-free_language
http://en.wikipedia.org/wiki/Pushdown_automaton

Does not match sentence that contains specified character

Im trying to match properties in class. Example class:
public static string ComingSoonPage
{
get { return "/blog-coming-soon.aspx"; }
}
public static string EncodeBase64(string dataToEncode)
{
byte[] bytes = System.Text.ASCIIEncoding.UTF8.GetBytes(dataToEncode);
string returnValue = System.Convert.ToBase64String(bytes);
return returnValue;
}
Im using this kind of regex:
(?:public|private|protected)([\s\w]*)\s+(\w+)[^(]
It matches not only properties but also methods which is wrong. So i want remove from matches sentences that contains (. So it select all but not methods (which contains ( ). How can i achieve that.
Try matching the "{" and the "get {" instead
(public|private|protected|internal)[\s\w]*\s+(\w+)\s*\{\s*get\s*\{
UPDATE
Match only the name of the property
(?<=(public|private|protected|internal)[\s\w]*\s+)\w+(?=\s*\{\s*get\s*\{)
uses the general pattern
(?<=prefix)find(?=suffix)
EDIT
A property might have no modifier (public, private etc.) at all and the type might contain extra characters (e.g. for arrays int[,]. Therefore it would probably be better to test only for the syntax elements following the property name (and the name itself). Also a property could consist of only a setter and be abstract: abstract int[,] Matrix { set; }. I suggest retrieving the property names like this:
\w+(?=\s*\{\s*(get|set)\b)
where \b matches a word beginning or (in this case) a word end.
This may be what you are looking for and this works perfectly! I deserve some treat though :)...
Regex r=new Regex(#"(public|private).*?(?=(public|private|$))",RegexOptions.Singleline);
Regex nr=new Regex(#"\(.*?\)\s+\{",RegexOptions.Singleline);
foreach(Match m in r.Matches(yourCodeFile))//extracts all methods and properties
{
if(!nr.IsMatch(m.Value))//shoots down methods
m.Value;//properties only
}
According to this answer, try using:
for Properties: type and name:
(?:public\s|private\s|protected\s|internal\s)\s*(?:readonly|static\s+)?(?<type>\w+)\s+(?<name>\w+)[\s\r\n]*{
for Fields: type and name:
(?:public\s|private\s|protected\s)\s*(?:readonly|static\s+)?(?<type>\w+)\s+(?<name>\w+);
for Methods: methodName and parameterType and parameter:
(?:public\s|private\s|protected\s|internal\s)?[\s\w]*\s+(?<methodName>\w+)\s*\(\s*(?:(ref\s|/in\s|out\s)?\s*(?<parameterType>\w+)\s+(?<parameter>\w+)\s*,?\s*)+\)
for c# code analysis try Irony or The Roslyn Project, see this sample:
C# and VB.NET Code Searcher - Using Roslyn codeproject

Categories