Strange behaviour of String.Format when (mis-)using placeholders

Strange behaviour of String.Format when (mis-)using placeholders - c#

When I learned about the String.Format function, I did the mistake to think that it's acceptable to name the placeholders after the colon, so I wrote code like this:
String.Format("A message: '{0:message}'", "My message");
//output: "A message: 'My message'"
I just realized that the string behind the colon is used to define the format of the placeholder and may not be used to add a comment as I did.
But apparently, the string behind the colon is used for the placeholder if:
I want to fill the placeholder with an integer and
I use an unrecognized formating-string behind the colon
But this doesn't explain to me, why the string behind the colon is used for the placeholder if I provide an integer.
Some examples:
//Works for strings
String.Format("My number is {0:number}!", "10")
//output: "My number is 10!"
//Works without formating-string
String.Format("My number is {0}!", 10)
//output: "My number is 10!"
//Works with recognized formating string
String.Format("My number is {0:d}!", 10)
//output: "My number is 10!"
//Does not work with unrecognized formating string
String.Format("My number is {0:number}!", 10)
//output: "My number is number!"
Why is there a difference between the handling of strings and integers? And why is the fallback to output the formating string instead of the given value?

Just review the MSDN page about composite formatting for clarity.
A basic synopsis, the format item syntax is:
{ index[,alignment][:formatString]}
So what appears after the : colon is the formatString. Look at the "Format String Component" section of the MSDN page for what kind of format strings are predefined. You will not see System.String mentioned in that list. Which is no great surprise, a string is already "formatted" and will only ever appear in the output as-is.
Composite formatting is pretty lenient to mistakes, it won't throw an exception when you specify an illegal format string. That the one you used isn't legal is already pretty evident from the output you get. And most of all, the scheme is extensible. You can actually make a :message format string legal, a class can implement the ICustomFormatter interface to implement its own custom formatting. Which of course isn't going to happen on System.String, you cannot modify that class.
So this works as expected. If you don't get the output you expected then this is pretty easy to debug, you've just go two mistakes to consider. The debugger eliminates one (wrong argument), your eyes eliminates the other.

String.Format article on MSDN has following description:
A format item has this syntax: { index[,alignment][ :formatString] }
...
formatString Optional.
A string that specifies the format of the
corresponding argument's result string. If you omit formatString, the
corresponding argument's parameterless ToString method is called to
produce its string representation. If you specify formatString, the
argument referenced by the format item must implement the IFormattable
interface.
If we directly format the value using the IFormattable we will have the same result:
String garbageFormatted = (10 as IFormattable).ToString("garbage in place of int",
CultureInfo.CurrentCulture.NumberFormat);
Console.WriteLine(garbageFormatted); // Writes the "garbage in place of int"
So it seems that it is something close to the "garbage in, garbage out" problem in the implementation of the IFormattable interface on Int32 type(and possibly on other types as well). The String class does not implement IFormattable, so any format specifier is left unused and .ToString(IFormatProvider) is called instead.
Also:
Ildasm shows that Int32.ToString(String, INumberFormat) internally calls
string System.Number::FormatInt32(int32,
string,
class System.Globalization.NumberFormatInfo)
But it is the internalcall method (extern implemented somewhere in native code), so Ildasm is of no use if we want to determine the source of the problem.
EDIT - CULPRIT:
After reading the How to see code of method which marked as MethodImplOptions.InternalCall? I've used the source code from Shared Source Common Language Infrastructure 2.0 Release (it is .NET 2.0 but nonetheless) in attempt to find a culprit.
Code for the Number.FormatInt32 is located in the ...\sscli20\clr\src\vm\comnumber.cpp file.
The culprit could be deduced from the default section of the format switch statement of the FCIMPL3(Object*, COMNumber::FormatInt32, INT32 value, StringObject* formatUNSAFE, NumberFormatInfo* numfmtUNSAFE):
default:
NUMBER number;
Int32ToNumber(value, &number);
if (fmt != 0) {
gc.refRetString = NumberToString(&number, fmt, digits, gc.refNumFmt);
break;
}
gc.refRetString = NumberToStringFormat(&number, gc.refFormat, gc.refNumFmt);
break;
The fmt var is 0, so the NumberToStringFormat(&number, gc.refFormat, gc.refNumFmt); is being called.
It leads us to nothing else than to the second switch statement default section in the NumberToStringFormat method, that is located in the loop that enumerates every format string character. It is very simple:
default:
*dst++ = ch;
It just plain copies every character from the format string into the output array, that's how the format string ends repeated in the output.
From one point of view it allows to really use garbage format strings that will output nothing useful, but from other point of view it will allow you to use something like:
String garbageFormatted = (1234 as IFormattable).ToString("0 thousands and ### in thousand",
CultureInfo.CurrentCulture.NumberFormat);
Console.WriteLine(garbageFormatted);
// Writes the "1 thousands and 234 in thousand"
that can be handy in some situations.

Interesting behavior indeed BUT NOT unaccounted for.
Your last example works when
if String.Format("My number is {0:n}!", 10)
but revert to the observed beahvior when
if String.Format("My number is {0:nu}!", 10)`.
This prompts to search about the Standard Numeric Format Specifier article on MSDN where you can read
Standard numeric format strings are used to format common numeric
types. A standard numeric format string takes the form Axx, where:
A is a single alphabetic character called the format specifier. Any
numeric format string that contains more than one alphabetic
character, including white space, is interpreted as a custom numeric
format string. For more information, see Custom Numeric Format
Strings.
The same article explains: if you have a SINGLE letter that is not recognized you get an exception.
Indeed
if String.Format("My number is {0:K}!", 10)`.
throws the FormatException as explained.
Now looking in the Custom Numeric Format Strings chapter you will find a table of eligible letters and their possible mixings, but at the end of the table you could read
Other
All other characters
The character is copied to the result string unchanged.
So I think that you have created a format string that cannot in any way print that number because there is no valid format specifier where the number 10 should be 'formatted'.

No it's not acceptable to place anything you like after the colon. Putting anything other than a recognized format specifier is likely to result in either an exception or unpredictable behaviour as you've demonstrated. I don't think you can expect string.Format to behave consistently when you're passing it arguments which are completely inconsistent with the documented formatting types

Related

numeric format strings #,#0.00 vs #,0.00

I tried to figure out the basics of these numeric string formatters. So I think I understand the basics but there is one thing I'm not sure about
So, for example
#,##0.00
It turns out that it produces identical results as
#,#0.00
or
#,0.00
#,#########0.00
So my question is, why are people using the #,## so often (I see it a lot when googling)
Maybe I missed something.
You can try it out yourself here and put the following inside that main function
double value = 1234.67890;
Console.WriteLine(value.ToString("#,0.00"));
Console.WriteLine(value.ToString("#,#0.00"));
Console.WriteLine(value.ToString("#,##0.00"));
Console.WriteLine(value.ToString("#,########0.00"));

Probably because Microsoft uses the same format specifier in their documentation, including the page you linked. It's not too hard to figure out why; #,##0.00 more clearly states the programmer's intent: three-digit groups separated by commas.

What happens?
The following function is called:
public string ToString(string? format)
{
return Number.FormatDouble(m_value, format, NumberFormatInfo.CurrentInfo);
}
It is important to realize that the format is used to format the string, but your formats happen to give the same result.
Examples:
value.ToString("#,#") // 1,235
value.ToString("0,0") // 1,235
value.ToString("#") // 1235
value.ToString("0") // 1235
value.ToString("#.#")) // 1234.7
value.ToString("#.##") // 1234.68
value.ToString("#.###") // 1234.679
value.ToString("#.#####") // 1234.6789
value.ToString("#.######") // = value.ToString("#.#######") = 1234.6789
We see that
it doesn't matter whether you put #, 0, or any other digit for that matter
One occurrence means: any arbitrary large number
double value = 123467890;
Console.WriteLine(value.ToString("#")); // Prints the full number
, and . however, are treated different for double
After a dot or comma, it will only show the amount of character that are provided (or less: as for #.######).
At this point it's clear that it has to do with the programmer's intent. If you want to display the number as 1,234.68 or 1234.67890, you would format it as
"#,###.##" or "#,#.##" // 1,234.68
"####.#####" or "#.#####" // 1234.67890

String interpolation C#: Documentation of colon and semicolon functionality

I found this codegolf answer for the FizzBuzz test, and after examining it a bit I realized I had no idea how it actually worked, so I started investigating:
for(int i=1; i<101;i++)
System.Console.Write($"{(i%3*i%5<1?0:i):#}{i%3:;;Fizz}{i%5:;;Buzz}\n");
I put it into dotnetfiddle and established the 1st part works as follows:
{(BOOL?0:i):#}
When BOOL is true, then the conditional expression returns 0 otherwise the number.
However the number isn't returned unless it's <> 0. I'm guessing this is the job the of :# characters. I can't find any documentation on the :# characters workings. Can anyone explain the colon/hash or point me in the right direction?
Second part:
{VALUE:;;Fizz}
When VALUE = 0 then nothing is printed. I assume this is determined by the first ; character [end statement]. The second ; character determines 'if VALUE <> 0 then print what's after me.'
Again, does anyone have documentation on the use of a semicolon in string interpolation, as I can't find anything useful.

This is all covered in the String Interpolation documentation, especially the section on the Structure of an Interpolated String, which includes this:
{<interpolatedExpression>[,<alignment>][:<formatString>]}
along with a more detailed description for each of those three sections.
The format string portion of that structure is defined on separate pages, where you can use standard and custom formats for numeric types as well as standard and custom formats for date and time types. There are also options for Enum values, and you can even create your own custom format provider.
It's worth taking a look at the custom format provider documentation just because it will also lead you to the FormattableString type. This isn't well-covered by the documentation, but my understanding is this type may in theory allow you to avoid re-parsing the interpolated string for each iteration when used in a loop, thus potentially improving performance (though in practice, there's no difference at this time). I've written about this before, and my conclusion is MS needs to build this into the framework in a better way.

Thanks to all the commenters! Fast response.
The # is defined here (Custom specifier)
https://learn.microsoft.com/en-us/dotnet/standard/base-types/custom-numeric-format-strings#the--custom-specifier
The "#" custom format specifier serves as a digit-placeholder symbol.
If the value that is being formatted has a digit in the position where
the "#" symbol appears in the format string, that digit is copied to
the result string. Otherwise, nothing is stored in that position in
the result string. Note that this specifier never displays a zero that
is not a significant digit, even if zero is the only digit in the
string. It will display zero only if it is a significant digit in the
number that is being displayed.
The ; is defined here (Section Seperator):
https://learn.microsoft.com/en-us/dotnet/standard/base-types/custom-numeric-format-strings#the--section-separator
The semicolon (;) is a conditional format specifier that applies
different formatting to a number depending on whether its value is
positive, negative, or zero. To produce this behavior, a custom format
string can contain up to three sections separated by semicolons...

custom string format puzzler

We have a requirement to display bank routing/account data that is masked with asterisks, except for the last 4 numbers. It seemed simple enough until I found this in unit testing:
string.Format("{0:****1234}",61101234)
is properly displayed as: "****1234"
but
string.Format("{0:****0052}",16000052)
is incorrectly displayed (due to the zeros??): "****1600005252""
If you use the following in C# it works correctly, but I am unable to use this because DevExpress automatically wraps it with "{0: ... }" when you set the displayformat without the curly brackets:
string.Format("****0052",16000052)
Can anyone think of a way to get this format to work properly inside curly brackets (with the full 8 digit number passed in)?
UPDATE: The string.format above is only a way of testing the problem I am trying to solve. It is not the finished code. I have to pass to DevExpress a string format inside braces in order for the routing number to be formatted correctly.

It's a shame that you haven't included the code which is building the format string. It's very odd to have the format string depend on the data in the way that it looks like you have.
I would not try to do this in a format string; instead, I'd write a method to convert the credit card number into an "obscured" string form, quite possibly just using Substring and string concatenation. For example:
public static string ObscureFirstFourCharacters(string input)
{
// TODO: Argument validation
return "****" + input.Substring(4);
}
(It's not clear what the data type of your credit card number is. If it's a numeric type and you need to convert it to a string first, you need to be careful to end up with a fixed-size string, left-padded with zeroes.)

I think you are looking for something like this:
string.Format("{0:****0000}", 16000052);
But I have not seen that with the * inline like that. Without knowing better I probably would have done:
string.Format("{0}{1}", "****", str.Substring(str.Length-4, 4);
Or even dropping the format call if I knew the length.
These approaches are worthwhile to look through: Mask out part first 12 characters of string with *?
As you are alluding to in the comments, this should also work:
string.Format("{0:****####}", 16000052);
The difference is using the 0's will display a zero if no digit is present, # will not. Should be moot in your situation.
If for some reason you want to print the literal zeros, use this:
string.Format("{0:****\0\052}", 16000052);
But note that this is not doing anything with your input at all.

how to change values in string from 0,00 to 0.00

How can I change values in string from 0,00 to 0.00? - only numeric values, not all chars "," to "."
FROM
string myInputString = "<?xml version=\"1.0\"?>\n<List xmlns:Table=\"urn:www.navision.com/Formats/Table\"><Row><HostelMST>12,0000</HostelMST><PublicMST>0,0000</PublicMST><TaxiMST>0,0000</TaxiMST><ParkMST>0,0000</ParkMST><RoadMST>0,0000</RoadMST><FoodMST>0,0000</FoodMST><ErrorCode>0</ErrorCode><ErrorDescription></ErrorDescription></Row></List>\n";
TO
string myInputString = "<?xml version=\"1.0\"?>\n<List xmlns:Table=\"urn:www.navision.com/Formats/Table\"><Row><HostelMST>12.0000</HostelMST><PublicMST>0.0000</PublicMST><TaxiMST>0.0000</TaxiMST><ParkMST>0.0000</ParkMST><RoadMST>0.0000</RoadMST><FoodMST>0.0000</FoodMST><ErrorCode>0</ErrorCode><ErrorDescription></ErrorDescription></Row></List>\n";
Thanks for answers, but I mean to change only numeric values, not all chars "," to "."
I don't want change string from
string = "<Attrib>txt txt, txt</Attrib><Attrib1>12,1223</Attrib1>";
to
string = "<Attrib>txt txt. txt</Attrib><Attrib1>12.1223</Attrib1>";
but this one is ok
string = "<Attrib>txt txt, txt</Attrib><Attrib1>12.1223</Attrib1>";

Try this :
Regex.Replace("attrib1='12,34' attrib2='43,22'", "(\\d),(\\d)", "$1.$2")
output : attrib1='12.34' attrib2='43.22'

The best method depends on the context. Are you parsing the XML? Are you writing the XML. Either way it's all to do with culture.
If you are writing it then I am assuming your culture is set to something which uses commas as decimal seperators and you're not aware of that fact. Firstly go change your culture in Windows settings to something which better fits your culture and the way you do things. Secondly, if you were writing the numbers out for human display then I would leave it as culturally sensative so it will fit whoever is reading it. If it is to be parsed by another machine then you can use the Invariant Culture like so:
12.1223.ToString(CultureInfo.InvariantCulture);
If you are reading (which I assume is what you are doing) then you can use the culture info again. If it was from a human source (e.g. they typed it in a box) then again use their default culture info (default in float.Parse). If it is from a computer then use InvariantCulture again:
float f = float.Parse("12.1223", CultureInfo.InvariantCulture);
Of course, this assumes that the text was written with an invariant culutre. But as you're asking the question it's not (unless you have control over it being written, in which case use InvariantCulture to write it was suggested above). You can then use a specific culture which does understand commas to parse it:
NumberFormatInfo commaNumberFormatInfo = new NumberFormatInfo();
commaNumberFormatInfo.NumberDecimalSeperator = ",";
float f = float.Parse("12,1223", commaNumberFormatInfo);

I strongly recommend joel.neely's regex approach or the one below:
Use XmlReader to read all nodes
Use double.TryParse with the formatter = a NumberFormatInfo that uses a comma as decimal separator, to identify numbers
Use XmlWriter to write a new XML
Use CultureInfo.InvariantCulture to write the numbers on that XML

The answer from ScarletGarden is a start, but you'll need to know the complete context and grammar of "numeric values" in your data.
The problem with the short answer is that cases such as this get modified:
<elem1>quantity<elem2>12,6 of which were broken</elem2></elem1>
Yes, there's probably a typo (missing space after the comma) but human-entered data often has such errors.
If you include more context, you're likely to reduce the false positives. A pattern like
([\s>]-?$?\d+),(\d+[\s<])
(which you can escape to taste for your programming language of choice) would only match when the "digits-comma-digits" portion (with optional sign and currency symbol) was bounded by space or an end of an element. If all of your numeric values are isolated within XML elements, then you'll have an easier time.

string newStr = myInputString.Replace("0,00", "0.00");

While you could theoretically do this using a Regex, the pattern would be complex and hard to to test. ICR is on the right track, you need to do this based on culture.
Do you know that your numbers are always going to be using a comma as a decimal separator instead of a period? It looks like you can, given that Navision is a Danish company.
If so, you'll need to traverse the XML document in the string, and rewrite the numeric values. It appears you can determine this on node name, so this won't be an issue.
When you convert the number, use something similar to this:
here's what you want to do:
internal double ConvertNavisionNumber(string rawValue)
{
double result = 0;
if (double.TryParse(rawValue, NumberStyles.Number, new CultureInfo("da-DK").NumberFormat, out result))
return result;
else
return 0;
}
This tells the TryParse() method that you're converting a number from Danish (da-DK). Once you call the function, you can use ToString() to write the number out in your local format (which I'm assuming is US or Canadian) to get a period for your decimal separator. This will also take into account numbers with different thousands digit separator (1,234.56 in Canada is written as 1 234,56 in Denmark).
ConvertNavisionNumber("4,43").ToString()
will result in "4.43".
ConvertNavisionNumber("1 234").ToString()
will result in "1,234".

if the , is not used anywhere else but number with in the string you can use the following:
string newStr = myInputString.Replace(",", ".");

C#: How do you go upon constructing a multi-lined string during design time?

How would I accomplish displaying a line as the one below in a console window by writing it into a variable during design time then just calling Console.WriteLine(sDescription) to display it?
Options:
-t Description of -t argument.
-b Description of -b argument.

If I understand your question right, what you need is the # sign in front of your string. This will make the compiler take in your string literally (including newlines etc)
In your case I would write the following:
String sDescription =
#"Options:
-t Description of -t argument.";
So far for your question (I hope), but I would suggest to just use several WriteLines.
The performance loss is next to nothing and it just is more adaptable.
You could work with a format string so you would go for this:
string formatString = "{0:10} {1}";
Console.WriteLine("Options:");
Console.WriteLine(formatString, "-t", "Description of -t argument.");
Console.WriteLine(formatString, "-b", "Description of -b argument.");
the formatstring makes sure your lines are formatted nicely without putting spaces manually and makes sure that if you ever want to make the format different you just need to do it in one place.

Console.Write("Options:\n\tSomething\t\tElse");
produces
Options:
Something Else
\n for next line, \t for tab, for more professional layouts try the field-width setting with format specifiers.
http://msdn.microsoft.com/en-us/library/txafckwd.aspx

If this is a /? screen, I tend to throw the text into a .txt file that I embed via a resx file. Then I just edit the txt file. This then gets exposed as a string property on the generated resx class.
If needed, I embed standard string.Format symbols into my txt for replacement.

Personally I'd normally just write three Console.WriteLine calls. I know that gives extra fluff, but it lines the text up appropriately and it guarantees that it'll use the right line terminator for whatever platform I'm running on. An alternative would be to use a verbatim string literal, but that will "fix" the line terminator at compile-time.

I know C# is mostly used on windows machines, but please, please, please try to write your code as platform neutral. Not all platforms have the same end of line character. To properly retrieve the end of line character for the currently executing platform you should use:
System.Environment.NewLine
Maybe I'm just anal because I am a former java programmer who ran apps on many platforms, but you never know what the platform of the future is.

The "best" answer depends on where the information you're displaying comes from.
If you want to hard code it, using an "#" string is very effective, though you'll find that getting it to display right plays merry hell with your code formatting.
For a more substantial piece of text (more than a couple of lines), embedding a text resources is good.
But, if you need to construct the string on the fly, say by looping over the commandline parameters supported by your application, then you should investigate both StringBuilder and Format Strings.
StringBuilder has methods like AppendFormat() that accept format strings, making it easy to build up lines of format.
Format Strings make it easy to combine multiple items together. Note that Format strings may be used to format things to a specific width.
To quote the MSDN page linked above:
Format Item Syntax
Each format item takes the following
form and consists of the following
components:
{index[,alignment][:formatString]}
The matching braces ("{" and "}") are
required.
Index Component
The mandatory index component, also
called a parameter specifier, is a
number starting from 0 that identifies
a corresponding item in the list of
objects ...
Alignment Component
The optional alignment component is a
signed integer indicating the
preferred formatted field width. If
the value of alignment is less than
the length of the formatted string,
alignment is ignored and the length of
the formatted string is used as the
field width. The formatted data in
the field is right-aligned if
alignment is positive and left-aligned
if alignment is negative. If padding
is necessary, white space is used. The
comma is required if alignment is
specified.
Format String Component
The optional formatString component is
a format string that is appropriate
for the type of object being formatted
...

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.