String.Format with curly braces - c#

Our low level logging library has to cope with all sorts of log messages sent to it.
Some of these messages include curly braces (as part of the text), and some contain parameters to be formatted as part of the string using String.Format.
For example, this string could be an input to the Logger class:
"Parameter: {Hostname} Value: {0}"
It is sent together with the correct argument for the formatter to use.
To do this properly, I must escape the curly braces that are not part of the formatting (by doubling them up).
I thought of doing it with Regex, but this is not as simple as it may seem, since I don't know how to match only the curly-brace groups that String.Format does NOT use for formatting.
Another issue is that the Logger class should be as efficient as possible; handling regular expressions as part of its operation may hinder performance.
Is there any proper and known best practice for this?

Doing it in just one regex:
string input = "Parameter: {Hostname} Value: {0}";
input = Regex.Replace(input, @"\{([^0-9{}]+)\}", @"{{$1}}");
Console.WriteLine(input);
Outputs:
Parameter: {{Hostname}} Value: {0}
This of course only works as long as there aren't any parameters that contain numbers but should still be escaped with {{ }}.

I think that you should look into your logger's interface. Compare with how Console.WriteLine works:
Console.WriteLine(String) outputs exactly the string given, no formatting, nothing special with { and }.
Console.WriteLine(String, Object[]) outputs using formatting. { and } are special characters that the caller must escape to {{ and }}.
I think it's a flawed design to have the logger differentiate between curly brace occurrences to find out what was meant. Put the burden on the caller: any { that should appear in the output must be escaped as {{.
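As a rough sketch of that split, assuming two hypothetical entry points (RenderRaw and RenderFormatted are illustrative names, not part of any real logging library), written as C# 9+ top-level statements:

```csharp
using System;
using System.Globalization;

// Hypothetical entry point: the text is passed through verbatim,
// so braces have no special meaning here.
static string RenderRaw(string message) => message;

// Hypothetical entry point: the text is a composite format string,
// so the caller is responsible for doubling literal braces.
static string RenderFormatted(string format, params object[] args) =>
    string.Format(CultureInfo.InvariantCulture, format, args);

Console.WriteLine(RenderRaw("Parameter: {Hostname} Value: {0}"));
// Parameter: {Hostname} Value: {0}
Console.WriteLine(RenderFormatted("Parameter: {{Hostname}} Value: {0}", 42));
// Parameter: {Hostname} Value: 42
```

With this shape the logger never has to guess which braces are placeholders, and no regex runs on the logging path.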

I would double all the curly braces and then look for the ones to restore with a regex like \{\{\d+\}\}, so that they return to their original form ({{0}} => {0}) in your string.
So for each line I would do something like this:
string s = input.Replace("{", "{{").Replace("}", "}}");
return Regex.Replace(s, @"\{\{(?<val>\d+)\}\}",
    m => "{" + m.Groups["val"].Value + "}");
So that's a technical answer to the original question, but @Anders Abel is perfectly right. It would be worth reconsidering the design.

To allow the caller to have formatted strings and cope with formatting specifiers, e.g.
Logger.Log("{0:dd/mm/yyy} {0:hh:mm:ss} {hostname} Some error {1:x4} happened on {123Component}!", DateTime.UtcNow, 257)
You'd need a regex like:
string input = "{0:dd/mm/yyy} {0:hh:mm:ss} {hostname} Some error {1:x4} happened on {123Component}!";
Regex reg = new Regex(@"(\{[^[0-9}]+?[^}]*\}|\{(?![0-9]+:)[^}]+?\})");
string output = reg.Replace(input, "{$1}");
Console.WriteLine(output);
This outputs:
"{0:dd/mm/yyy} {0:hh:mm:ss} {{hostname}} Some error {1:x4} happened on {{123Component}}!"
But to reiterate, I'd agree with Anders Abel that you ought to redesign to avoid the need for the log library to do this.
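If the library really must do the escaping itself, one possible variation is to match either a complete format item or a single stray brace, and double only the stray ones. This is a sketch under assumptions, not a definitive implementation: EscapeStrayBraces is a hypothetical helper, it only recognises items shaped like {0}, {1,-8} or {2:x4} with no spaces inside, and it assumes callers never pre-escape braces themselves.

```csharp
using System;
using System.Text.RegularExpressions;

// First alternative: a well-formed format item (index, optional alignment,
// optional format string). Second alternative: any single brace.
var braces = new Regex(@"\{\d+(?:,-?\d+)?(?::[^{}]*)?\}|[{}]", RegexOptions.Compiled);

// Hypothetical helper: keeps format items as they are, doubles every other brace.
string EscapeStrayBraces(string format) =>
    braces.Replace(format, m => m.Length > 1 ? m.Value : m.Value + m.Value);

string template = "{0:x4} {hostname} failed on {123Component}, code {1}";
string escaped = EscapeStrayBraces(template);
Console.WriteLine(escaped);
// {0:x4} {{hostname}} failed on {{123Component}}, code {1}
Console.WriteLine(string.Format(escaped, 257, 7));
// 0101 {hostname} failed on {123Component}, code 7
```

Unlike a lookahead that only checks for digits followed by a colon, this also leaves a plain {1} untouched.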

Related

Escape Character Associativity C# 6 String Interpolation

Given:
double price = 5.05;
Console.Write($"{{Price = {price:C}}}");
and the desired output: {Price = $5.05}
Is there any way to associate the last two curly braces as an escaped '}' so the interpolation works as intended? As it stands, the first two are escaped (I assume?), and the output is: {Price = C}
Console.Write($"{{Price = {price:C} }}");
works as expected, but with the extra space. And I can concatenate the tail brace, which I consider a poor man's solution. Is there a colloquial rich man's solution? Thanks.
This arises because of an "oddity" in the behavior of string.Format, and our desire to have a precise 1-to-1 mapping between interpolations and inserts in the generated format string. In short, the language behavior precisely models the behavior of string.Format.
In an interpolation (the thing inside the curly braces), the expression ends either at a colon (which starts a format string), or a close curly brace. In the latter case a doubled curly brace has no special meaning because it isn't inside a literal part of the string. So three curly braces in a row would be interpreted as a close to the interpolation, followed by a literal (escaped by doubling) close curly brace. But after the colon the format string is given for that interpolation, and that format string is any string, and it is terminated by a close curly brace. If you want a close curly brace inside your format string, you simply double it up. Which is what you have unintentionally done.
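The no-colon case described above can be checked with a small sketch. It formats the number first, using a fixed "0.00" format and the invariant culture (both are assumptions made here so the result does not depend on machine settings), and then interpolates the finished string:

```csharp
using System;
using System.Globalization;

double price = 5.05;

// Format up front so the interpolation hole contains no colon.
string p = price.ToString("0.00", CultureInfo.InvariantCulture);

// "{p}" is closed by the first "}", and the following "}}" is an escaped brace.
string text = $"{{Price = {p}}}";
Console.WriteLine(text); // {Price = 5.05}
```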
CoolBots gave the best way of handling this https://stackoverflow.com/a/42993667/241658
Read the "Escaping Braces" section of https://msdn.microsoft.com/en-us/library/txafckwd(v=vs.110).aspx for an explanation of precisely this issue.
Curious workaround:
var p = price.ToString("C");
Console.Write($"{{Price = {p}}}");
For some reason, $"{{Price = {p}}}" and $"{{Price = {p:C}}}" have different associativity outcomes, which feels like a compiler bug. I'll ask around! Note that it is consistent with how string.Format applies the same rule, so it might be intentionally propagating an earlier framework oddity.
You can interpolate instead of concatenate - pass it as a string literal:
double price = 5.05;
Console.Write($"{{Price = {price:C}{"}"}");
Well you can try with less used escape characters. Maybe \b will work as it doesn't print anything and it had no function for a really long time. Something like:
double price = 5.05;
Console.Write($"{{Price = {price:C}\b}}");
If that doesn't work for you, you can try special Unicode characters like U+200B or U+FEFF:
double price = 5.05;
Console.Write($"{{Price = {price:C}\u200B}}");
Escape characters: https://blogs.msdn.microsoft.com/csharpfaq/2004/03/12/what-character-escape-sequences-are-available/
UNICODE space characters: https://www.cs.tut.fi/~jkorpela/chars/spaces.html
When the C# 6 syntax causes problems, why not use the traditional string.Format() instead?
double price = 5.05;
Console.WriteLine(string.Format("{{Price = {0}}}", price.ToString("C")));

Regex for ignoring consecutive quotation marks in string

I have built a parser in Sprache and C# for files using a format I don't control. Using it I can correctly convert:
a = "my string";
into
my string
The parser (for the quoted text only) currently looks like this:
public static readonly Parser<string> QuotedText =
from open in Parse.Char('"').Token()
from content in Parse.CharExcept('"').Many().Text().Token()
from close in Parse.Char('"').Token()
select content;
However the format I'm working with escapes quotation marks using "double doubles" quotes, e.g.:
a = "a ""string"".";
When attempting to parse this nothing is returned. It should return:
a ""string"".
Additionally
a = "";
should be parsed into a string.Empty or similar.
I've tried regexes unsuccessfully based on answers like this doing things like "(?:[^;])*", or:
public static readonly Parser<string> QuotedText =
from content in Parse.Regex("""(?:[^;])*""").Token()
This doesn't work (i.e. no matches are returned in the above cases). I think my beginner's regex skills are getting in the way. Does anybody have any hints?
EDIT: I was testing it here - http://regex101.com/r/eJ9aH1
If I'm understanding you correctly, this is the kind of regex you're looking for:
"(?:""|[^"])*"
See the demo.
1. " matches an opening quote
2. (?:""|[^"])* matches two quotes or any chars that are not a quote (including newlines), repeating
3. " matches the closing quote.
But it's always going to boil down to whether your input is balanced. If not, you'll be getting false positives. And if you have a string such as "string"", which should be matched? "string"", "", or nothing? That's a tough decision, one that, fortunately, you don't have to make if you are sure of your input.
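A small sketch of using that pattern from plain C# (outside Sprache), assuming balanced input. Unquote is a hypothetical helper that also collapses the doubled quotes; skip that step if you want the raw content exactly as written in the question.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern "(?:""|[^"])*" as a verbatim string: every quote is doubled.
var quoted = new Regex(@"""(?:""""|[^""])*""");

// Hypothetical helper: drops the outer quotes and collapses "" to ".
static string Unquote(string literal) =>
    literal.Substring(1, literal.Length - 2).Replace("\"\"", "\"");

string raw = quoted.Match("a = \"a \"\"string\"\".\";").Value;
Console.WriteLine(raw);           // "a ""string""."
string text = Unquote(raw);
Console.WriteLine(text);          // a "string".

// The empty literal matches too and unquotes to an empty string.
string empty = Unquote(quoted.Match("a = \"\";").Value);
Console.WriteLine(empty.Length);  // 0
```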
You can likely adapt your desired output from this pattern:
"(.+".+")"|(".+?")|("")
example:
http://regex101.com/r/lO1vZ4
If you only want to ignore consecutive double quotes, try this:
("{2,})
Live demo
This regex "("+) might help you to match extra unwanted double quotes.
here is the DEMO

C# .NET Regex remove all quotes of quotes excluding one instance in a sentence

I have description field which is:
16" Alloy Upgrade
In CSV format it appears like this:
"16"" Alloy Upgrade "
What would be the best use of regex to maintain the original format? As I'm learning, I would appreciate it being broken down for my understanding.
I'm already using Regex to split some text separating 2 fields which are: code, description. I'm using this:
,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))
My thoughts are to remove the quotes, then remove the delimiter excluding use in sentences.
Thanks in advance.
If you don't want to/can't use a standard CSV parser (which I'd recommend), you can strip all non-doubled quotes using a regex like this:
Regex.Replace(text, @"(?<!"")""(?!"")", string.Empty)
That regex will match every " character not preceded or followed by another ".
I wouldn't use regex since they are usually confusing and totally unclear what they do (like the one in your question for example). Instead this method should do the trick:
public string CleanField(string input)
{
    // A quoted field needs at least an opening and a closing quote.
    if (input.Length >= 2 && input.StartsWith("\"") && input.EndsWith("\""))
    {
        // Drop the outer quotes, then collapse each doubled quote to one.
        string output = input.Substring(1, input.Length - 2);
        output = output.Replace("\"\"", "\"");
        return output;
    }
    else
    {
        // If it doesn't start and end with quotes, it doesn't look escaped, so hand it back unchanged.
        return input;
    }
}
It may need tweaking, but in essence it checks whether the string starts and ends with a quote (which it should if it is an escaped field). If so, it takes the inside part (with Substring) and replaces each doubled quote ("") with a single quote character ("). The code is a bit ugly due to all the escaping, but there is no avoiding that.
The nice thing is this can be used easily with a bit of Linq to take an existing array and convert it.
processedFieldArray = inputfieldArray.Select(CleanField).ToArray();
I'm using arrays here purely because your linked page seems to use them where you are wanting this solution.

regular expression for c# verbatim like strings (processing ""-like escapes)

I'm trying to extract information out of rc-files. In these files, " characters in strings are escaped by doubling them (""), analogous to C# verbatim strings. Is there a way to extract the string?
For example, if I have the following string "this is a ""test""" I would like to obtain this is a ""test"". It also must be non-greedy (very important).
I've tried to use the following regular expression;
"(?<text>[^""]*(""(.|""|[^"])*)*)"
However the performance was awful.
I've based it on the explanation here: http://ad.hominem.org/log/2005/05/quoted_strings.php
Has anybody any idea to cope with this using a regular expression?
You've got some nested repetition quantifiers there. That can be catastrophic for the performance.
Try something like this:
(?<=")(?:[^"]|"")*(?=")
That can now only consume either two quotes at once... or non-quote characters. The lookbehind and lookahead assert, that the actual match is preceded and followed by a quote.
This also gets you around having to capture anything. Your desired result will simply be the full string you want (without the outer quotes).
I do not assert that the outer quotes are not doubled. Because if they were, there would be no way to distinguish them from an empty string anyway.
This turns out to be a lot simpler than you'd expect. A string literal with escaped quotes looks exactly like a bunch of simple string literals run together:
"Some ""escaped"" quotes"
"Some " + "escaped" + " quotes"
So this is all you need to match it:
(?:"[^"]*")+
You'll have to strip off the leading and trailing quotes in a separate step, but that's not a big deal. You would need a separate step anyway, to unescape the escaped quotes (\" or "").
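The match, strip, and unescape steps might look like this. The input line is a made-up example of an rc-file entry (IDS_TEST is a hypothetical identifier), so treat it as a sketch only.

```csharp
using System;
using System.Text.RegularExpressions;

// (?:"[^"]*")+ as a verbatim string: a run of adjacent simple string literals.
var literal = new Regex(@"(?:""[^""]*"")+");

// Hypothetical rc-file line containing: "this is a ""test"""
string line = "IDS_TEST \"this is a \"\"test\"\"\"";

string match = literal.Match(line).Value;
// Separate step 1: strip the leading and trailing quote.
string inner = match.Substring(1, match.Length - 2);
// Separate step 2 (optional): unescape the doubled quotes.
string unescaped = inner.Replace("\"\"", "\"");

Console.WriteLine(inner);      // this is a ""test""
Console.WriteLine(unescaped);  // this is a "test"
```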
Don't know if this is better or worse than m.buettner's (guessing not - he seems to know his stuff) but I thought I'd throw it out there for critique.
"(([^"]+(""[^"]+"")*)*)"
Try this: (?<=^")(.*?"{2}.*?"{2})(?="$)
It may be faster than the two previous patterns, and without any bugs.
Match a " beginning the string
Multiple times match a non-" or two "
Match a " ending the string
"([^"]|(""))*?"

Regex-How to remove comma which is between " and "?

How to remove ,(comma) which is between "(double inverted comma) and "(double inverted comma). Like there is "a","b","c","d,d","e","f" and then from this, between " and " there is one comma which should be removed and after removing that comma it should be "a","b","c","dd","e","f" with the help of the regex in C# ?
EDIT: I forgot to specify that there may be double comma between quotes like "a","b","c","d,d,d","e","f" for it that regex does not work. and there can be any number of comma between quotes.
And there can be string like a,b,c,"d,d",e,f then there should be result like a,b,c,dd,e,f and if string like a,b,c,"d,d,d",e,f then result should be like a,b,c,ddd,e,f.
Assuming the input is as simple as your examples (i.e., not full-fledged CSV data), this should do it:
string input = @"a,b,c,""d,d,d"",e,f,""g,g"",h";
Console.WriteLine(input);
string result = Regex.Replace(input,
    @",(?=[^""]*""(?:[^""]*""[^""]*"")*[^""]*$)",
    String.Empty);
Console.WriteLine(result);
output: a,b,c,"d,d,d",e,f,"g,g",h
a,b,c,"ddd",e,f,"gg",h
The regex matches any comma that is followed by an odd number of quotation marks.
EDIT: If fields are quoted with apostrophes (') instead of quotation marks ("), the technique is exactly the same--except you don't have to escape the quotes:
string input = @"a,b,c,'d,d,d',e,f,'g,g',h";
Console.WriteLine(input);
string result = Regex.Replace(input,
    @",(?=[^']*'(?:[^']*'[^']*')*[^']*$)",
    String.Empty);
Console.WriteLine(result);
If some fields were quoted with apostrophes while others were quoted with quotation marks, a different approach would be needed.
EDIT: Probably should have mentioned this in the previous edit, but you can combine those two regexes into one regex that will handle either apostrophes or quotation marks (but not both):
@",(?=[^']*'(?:[^']*'[^']*')*[^']*$|[^""]*""(?:[^""]*""[^""]*"")*[^""]*$)"
Actually, it will handle simple strings like 'a,a',"b,b". The problem is that there would be nothing to stop you from using one of the quote characters in a quoted field of the other type, like '9" Nails' (sic) or "Kelly's Heroes". That's taking us into full-fledged CSV territory (if not beyond), and we've already established that we're not going there. :D
They're called regular expressions for a reason — they are used to process strings that meet a very specific and academic definition for what is "regular". It looks like you have some fairly typical csv data here, and it happens that csv strings are outside of that specific definition: csv data is not formally "regular".
In spite of this, it can be possible to use regular expressions to handle csv data. However, to do so you must either use certain extensions to normal regular expressions to make them Turing complete, know certain constraints about your specific csv data that are not promised in the general case, or both. Either way, the expressions required to do this are unwieldy and difficult to manage. It's often just not a good idea, even when it's possible.
A much better (and usually faster) solution is to use a dedicated CSV parser. There are two good ones hosted at code project (FastCSV and Linq-to-CSV), there is one (actually several) built into the .Net Framework (Microsoft.VisualBasic.TextFieldParser), and I have one here on Stack Overflow. Any of these will perform better and just plain work better than a solution based on regular expressions.
Note here that I'm not arguing it can't be done. Most regular expression engines today have the necessary extensions to make this possible, and most people parsing csv data know enough about the data they're handling to constrain it appropriately. I am arguing that it's slower to execute, harder to implement, harder to maintain, and more error-prone compared to a dedicated parser alternative, which is likely built into whichever platform you're using, and is therefore not in your best interests.
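As an illustration of the parser route, here is a minimal sketch using the TextFieldParser class mentioned above (it lives in the Microsoft.VisualBasic.FileIO namespace). RemoveCommasInsideFields is a hypothetical helper name, and the sketch assumes one record per input string.

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper: parse one CSV record, strip commas from each field,
// and join the fields back together with commas.
static string RemoveCommasInsideFields(string line)
{
    using var parser = new TextFieldParser(new StringReader(line));
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(",");
    parser.HasFieldsEnclosedInQuotes = true;

    // ReadFields returns the fields with the enclosing quotes already removed.
    string[] fields = parser.ReadFields() ?? Array.Empty<string>();
    return string.Join(",", fields.Select(f => f.Replace(",", "")));
}

Console.WriteLine(RemoveCommasInsideFields("a,b,c,\"d,d,d\",e,f"));
// a,b,c,ddd,e,f
```

Because the parser strips the enclosing quotes, this matches the unquoted output asked for in the question's last example; re-adding quotes around each field would be an extra step.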
var input = "\"a\",\"b\",\"c\",\"d,d\",\"e\",\"f\"";
var regex = new Regex("(\"\\w+),(\\w+\")");
var output = regex.Replace(input,"$1$2");
Console.WriteLine(output);
You'd need to evaluate whether or not \w is what you want to use.
You can use this:
var result = Regex.Replace(yourString, "([a-z]),", "$1");
Sorry, after seeing your edits, regular expressions are not appropriate for this.
This should be very simple using Regex.Replace and a callback:
string pattern = @"
    ""      # open quotes
    [^""]*  # some not quotes
    ""      # closing quotes
    ";
data = Regex.Replace(data, pattern, m => m.Value.Replace(",", ""),
    RegexOptions.IgnorePatternWhitespace);
You can even make a slight modification to allow escaped quotes (here I have \", and the comments explain how to use ""):
string pattern = @"
    \\.                 # escaped character (alternative is """")
    |
    (?<Quotes>
        ""              # open quotes
        (?:\\.|[^""])*  # some not quotes or escaped characters
                        # the alternative is (?:""""|[^""])*
        ""              # closing quotes
    )
    ";
data = Regex.Replace(data, pattern,
    m => m.Groups["Quotes"].Success ? m.Value.Replace(",", "") : m.Value,
    RegexOptions.IgnorePatternWhitespace);
If you need a single quote replace all "" in the pattern with a single '.
Something like the following, perhaps?
"(,)"
