I tried to figure out the basics of these numeric string formatters. So I think I understand the basics but there is one thing I'm not sure about
So, for example
#,##0.00
It turns out that it produces identical results as
#,#0.00
or
#,0.00
#,#########0.00
So my question is, why are people using the #,## so often (I see it a lot when googling)
Maybe I missed something.
You can try it out yourself here and put the following inside that main function
double value = 1234.67890;
Console.WriteLine(value.ToString("#,0.00"));
Console.WriteLine(value.ToString("#,#0.00"));
Console.WriteLine(value.ToString("#,##0.00"));
Console.WriteLine(value.ToString("#,########0.00"));
Probably because Microsoft uses the same format specifier in their documentation, including the page you linked. It's not too hard to figure out why; #,##0.00 more clearly states the programmer's intent: three-digit groups separated by commas.
What happens?
The following function is called:
public string ToString(string? format)
{
return Number.FormatDouble(m_value, format, NumberFormatInfo.CurrentInfo);
}
It is important to realize that the format is used to format the string, but your formats happen to give the same result.
Examples:
value.ToString("#,#") // 1,235
value.ToString("0,0") // 1,235
value.ToString("#") // 1235
value.ToString("0") // 1235
value.ToString("#.#")) // 1234.7
value.ToString("#.##") // 1234.68
value.ToString("#.###") // 1234.679
value.ToString("#.#####") // 1234.6789
value.ToString("#.######") // = value.ToString("#.#######") = 1234.6789
We see that
it doesn't matter whether you put #, 0, or any other digit for that matter
One occurrence means: any arbitrary large number
double value = 123467890;
Console.WriteLine(value.ToString("#")); // Prints the full number
, and . however, are treated different for double
After a dot or comma, it will only show the amount of character that are provided (or less: as for #.######).
At this point it's clear that it has to do with the programmer's intent. If you want to display the number as 1,234.68 or 1234.67890, you would format it as
"#,###.##" or "#,#.##" // 1,234.68
"####.#####" or "#.#####" // 1234.67890
Related
I found this codegolf answer for the FizzBuzz test, and after examining it a bit I realized I had no idea how it actually worked, so I started investigating:
for(int i=1; i<101;i++)
System.Console.Write($"{(i%3*i%5<1?0:i):#}{i%3:;;Fizz}{i%5:;;Buzz}\n");
I put it into dotnetfiddle and established the 1st part works as follows:
{(BOOL?0:i):#}
When BOOL is true, then the conditional expression returns 0 otherwise the number.
However the number isn't returned unless it's <> 0. I'm guessing this is the job the of :# characters. I can't find any documentation on the :# characters workings. Can anyone explain the colon/hash or point me in the right direction?
Second part:
{VALUE:;;Fizz}
When VALUE = 0 then nothing is printed. I assume this is determined by the first ; character [end statement]. The second ; character determines 'if VALUE <> 0 then print what's after me.'
Again, does anyone have documentation on the use of a semicolon in string interpolation, as I can't find anything useful.
This is all covered in the String Interpolation documentation, especially the section on the Structure of an Interpolated String, which includes this:
{<interpolatedExpression>[,<alignment>][:<formatString>]}
along with a more detailed description for each of those three sections.
The format string portion of that structure is defined on separate pages, where you can use standard and custom formats for numeric types as well as standard and custom formats for date and time types. There are also options for Enum values, and you can even create your own custom format provider.
It's worth taking a look at the custom format provider documentation just because it will also lead you to the FormattableString type. This isn't well-covered by the documentation, but my understanding is this type may in theory allow you to avoid re-parsing the interpolated string for each iteration when used in a loop, thus potentially improving performance (though in practice, there's no difference at this time). I've written about this before, and my conclusion is MS needs to build this into the framework in a better way.
Thanks to all the commenters! Fast response.
The # is defined here (Custom specifier)
https://learn.microsoft.com/en-us/dotnet/standard/base-types/custom-numeric-format-strings#the--custom-specifier
The "#" custom format specifier serves as a digit-placeholder symbol.
If the value that is being formatted has a digit in the position where
the "#" symbol appears in the format string, that digit is copied to
the result string. Otherwise, nothing is stored in that position in
the result string. Note that this specifier never displays a zero that
is not a significant digit, even if zero is the only digit in the
string. It will display zero only if it is a significant digit in the
number that is being displayed.
The ; is defined here (Section Seperator):
https://learn.microsoft.com/en-us/dotnet/standard/base-types/custom-numeric-format-strings#the--section-separator
The semicolon (;) is a conditional format specifier that applies
different formatting to a number depending on whether its value is
positive, negative, or zero. To produce this behavior, a custom format
string can contain up to three sections separated by semicolons...
I am having some problems with a quite easy task - i feel like im missing something very obvious here.
I have a .csv file which is semicolon seperated. In this file are several numbers that contain dots like "1.300" but there are also dates included like "2015.12.01". The task is to find and delete all dots but only those that are in numbers and not in dates. The dates and numbers are completely variable and never at the same position in the file.
My question now: What is the 'best' way to handle this problem?
From a programmers point of view: Is it a good solution to just split at every semilicon, count the dots and if there is only one dot, delete it? This is the only way to solve the problem i could think of by now.
Example source file:
2015.12.01;
13.100;
500;
1.200;
100;
Example result:
2015.12.01;
13100;
500;
1200;
100;
If you can rely on the fact that dates have two dots and numbers just one, you can use that as a filter:
string s = "123.45";
if (s.Count(x => x == '.') == 1)
{
s = s.Replace(".", null);
}
The source file looks like a valid file generated by a program running on a machine whose locale uses . as the thousand separator (most of Europe does) and date separator (German locales only I think). Such locales also use ; as the list separator.
If the question was only how to parse such dates, numbers, the answer would be to pass the proper culture to the parse function, eg: decimal.Parse("13.500",new CultureInfo("de-at")) would return 13500. The actual issue though is that the data must be fed to another program that uses . as the decimal separator.
The safest option would be to change the locale used by the exporting program, eg change the thread CultureInfo if the exporter is a .NET program, the locale in an SSIS package etc, to a locale like en-gb to export with . and avoid the weird date format. This assumes that the next program in the pipeline doesn't use German for the date, English for numbers
Another option would be to load the text, parse the fields using the proper locale then export them in the format required by the next program.
Finally, a regular expression could be used to match only the numeric fields and remove the dot. This can be a bit tricky and depends on the actual contents.
For example (\d+)\.(\d{3}) can be used to match numbers if there is only one thousand separator. This can fail if some text field contains similar values. Or ;(\d+)\.(\d{3}); could match only a full field, except the first and last fields, eg:
Regex.Replace("1.457;2016.12.30;13.000;1,50;2015.12.04;13.456",#";(\d+)\.(\d{3});",#"$1$2;")
produces :
1.457;2016.12.3013000;1,50;2015.12.04;13.456
A regular expression that would match either numbers between ; or the first/last field could be
(^|;)(\d+)\.(\d{3})(;|$)
This would produce 1457;2016.12.30;13000;1,50;2015.12.04;13456, eg:
var data="1.457;2016.12.30;13.000;1,50;2015.12.04;13.456";
var pattern=#"(^|;)(\d+)\.(\d{3})(;|$)";
var replacement=#"$1$2$3$4";
var result= Regex.Replace(data,pattern,replacement);
The advantage of a regex over splitting and replacing strings is that it's a lot faster and more memory efficient. Instead of generating temporary strings for each split, manipulation, a Regex only calculates indexes in the source. A string object is generated only when you request the final text result. This results in far fewer allocations and garbage collections.
Even in medium-sized files this can result in 10x better performance
I wouldn't rely on the number of dots as mistakes can be made.
You can use the double.TryParse to safely test if the string is a number
var data = "2015.12.01;13.100;500;1.200;100;";
var dataArray = data.Split(';');
foreach (var s in dataArray)
{
double result;
if(double.TryParse(s,out result))
// implement your logic here
Console.WriteLine(s.Replace(".",string.Empty));
}
Struggling with the basics - I'm trying to code a simple currency converter. The XML provided by external source uses comma as a decimal separator for exchange rate (kurs_sredni):
<pozycja>
<nazwa_waluty>bat (Tajlandia)</nazwa_waluty>
<przelicznik>1</przelicznik>
<kod_waluty>THB</kod_waluty>
<kurs_sredni>0,1099</kurs_sredni>
</pozycja>
I already managed to load the data from XML into a nifty list of objects (kursyAktualne), and now i'm trying to do the math. I'm stuck with conversion.
First of all i'm assigning "kurs_sredni" to a string, trying to replace "," with "." and converting the hell out of it:
string kursS = kursyAktualne[iNa].kurs_sredni;
kursS.Replace(",",".");
kurs = Convert.ToDouble(kursS);
MessageBox.Show(kurs.ToString());
The messagebox show 1099 instead of expected 0.1099 and kursS still has comma, not dot.
Tried toying with some cultureInfo stuff i googled, but that was too random. I need to understand how to control this.
Just use decimal.Parse but specify a CultureInfo. There's nothing "random" about it - pick an appropriate CultureInfo, and then use that. For example:
using System;
using System.Globalization;
class Test
{
static void Main()
{
var french = CultureInfo.GetCultureInfo("fr-FR");
decimal value = decimal.Parse("0,1099", french);
Console.WriteLine(value.ToString(CultureInfo.InvariantCulture)); // 0.1099
}
}
This is just using French as one example of a culture which uses , as a decimal separator. It would probably make sense to use the culture of the origin of the data.
Note that decimal is a better pick for currency values than double - you're trying to represent an "artificial" construct which is naturally specified in base10, rather than a "natural" continuous value such as a weight.
(I would also be wary of a data provider who provides data in a non-standard format. If they're getting that wrong, who knows what else they'll get wrong. It's not like XML doesn't have a well-specified format for numbers...)
It is because Replace method returns new string with replaced characters. It does not modify your original string.
So you need to reassign it:
kursS = kursS.Replace(",",".");
Replace returns a string. So you need an assignment.
kursS = kursS.Replace(",", ".");
There is "neater" way of doing this by using CulturInfo. Look this up on the MSDN website.
You replace result isn't used, but the original value that doesn't contain the replace.
You should do:
kursS = kursS.Replace(",", ".")
In addition this method isn't really safe if there are thousands-separators.
So if you are not using culture settings you should do:
kursS = kursS.Replace(".", "").Replace(",", ".")
When I learned about the String.Format function, I did the mistake to think that it's acceptable to name the placeholders after the colon, so I wrote code like this:
String.Format("A message: '{0:message}'", "My message");
//output: "A message: 'My message'"
I just realized that the string behind the colon is used to define the format of the placeholder and may not be used to add a comment as I did.
But apparently, the string behind the colon is used for the placeholder if:
I want to fill the placeholder with an integer and
I use an unrecognized formating-string behind the colon
But this doesn't explain to me, why the string behind the colon is used for the placeholder if I provide an integer.
Some examples:
//Works for strings
String.Format("My number is {0:number}!", "10")
//output: "My number is 10!"
//Works without formating-string
String.Format("My number is {0}!", 10)
//output: "My number is 10!"
//Works with recognized formating string
String.Format("My number is {0:d}!", 10)
//output: "My number is 10!"
//Does not work with unrecognized formating string
String.Format("My number is {0:number}!", 10)
//output: "My number is number!"
Why is there a difference between the handling of strings and integers? And why is the fallback to output the formating string instead of the given value?
Just review the MSDN page about composite formatting for clarity.
A basic synopsis, the format item syntax is:
{ index[,alignment][:formatString]}
So what appears after the : colon is the formatString. Look at the "Format String Component" section of the MSDN page for what kind of format strings are predefined. You will not see System.String mentioned in that list. Which is no great surprise, a string is already "formatted" and will only ever appear in the output as-is.
Composite formatting is pretty lenient to mistakes, it won't throw an exception when you specify an illegal format string. That the one you used isn't legal is already pretty evident from the output you get. And most of all, the scheme is extensible. You can actually make a :message format string legal, a class can implement the ICustomFormatter interface to implement its own custom formatting. Which of course isn't going to happen on System.String, you cannot modify that class.
So this works as expected. If you don't get the output you expected then this is pretty easy to debug, you've just go two mistakes to consider. The debugger eliminates one (wrong argument), your eyes eliminates the other.
String.Format article on MSDN has following description:
A format item has this syntax: { index[,alignment][ :formatString] }
...
formatString Optional.
A string that specifies the format of the
corresponding argument's result string. If you omit formatString, the
corresponding argument's parameterless ToString method is called to
produce its string representation. If you specify formatString, the
argument referenced by the format item must implement the IFormattable
interface.
If we directly format the value using the IFormattable we will have the same result:
String garbageFormatted = (10 as IFormattable).ToString("garbage in place of int",
CultureInfo.CurrentCulture.NumberFormat);
Console.WriteLine(garbageFormatted); // Writes the "garbage in place of int"
So it seems that it is something close to the "garbage in, garbage out" problem in the implementation of the IFormattable interface on Int32 type(and possibly on other types as well). The String class does not implement IFormattable, so any format specifier is left unused and .ToString(IFormatProvider) is called instead.
Also:
Ildasm shows that Int32.ToString(String, INumberFormat) internally calls
string System.Number::FormatInt32(int32,
string,
class System.Globalization.NumberFormatInfo)
But it is the internalcall method (extern implemented somewhere in native code), so Ildasm is of no use if we want to determine the source of the problem.
EDIT - CULPRIT:
After reading the How to see code of method which marked as MethodImplOptions.InternalCall? I've used the source code from Shared Source Common Language Infrastructure 2.0 Release (it is .NET 2.0 but nonetheless) in attempt to find a culprit.
Code for the Number.FormatInt32 is located in the ...\sscli20\clr\src\vm\comnumber.cpp file.
The culprit could be deduced from the default section of the format switch statement of the FCIMPL3(Object*, COMNumber::FormatInt32, INT32 value, StringObject* formatUNSAFE, NumberFormatInfo* numfmtUNSAFE):
default:
NUMBER number;
Int32ToNumber(value, &number);
if (fmt != 0) {
gc.refRetString = NumberToString(&number, fmt, digits, gc.refNumFmt);
break;
}
gc.refRetString = NumberToStringFormat(&number, gc.refFormat, gc.refNumFmt);
break;
The fmt var is 0, so the NumberToStringFormat(&number, gc.refFormat, gc.refNumFmt); is being called.
It leads us to nothing else than to the second switch statement default section in the NumberToStringFormat method, that is located in the loop that enumerates every format string character. It is very simple:
default:
*dst++ = ch;
It just plain copies every character from the format string into the output array, that's how the format string ends repeated in the output.
From one point of view it allows to really use garbage format strings that will output nothing useful, but from other point of view it will allow you to use something like:
String garbageFormatted = (1234 as IFormattable).ToString("0 thousands and ### in thousand",
CultureInfo.CurrentCulture.NumberFormat);
Console.WriteLine(garbageFormatted);
// Writes the "1 thousands and 234 in thousand"
that can be handy in some situations.
Interesting behavior indeed BUT NOT unaccounted for.
Your last example works when
if String.Format("My number is {0:n}!", 10)
but revert to the observed beahvior when
if String.Format("My number is {0:nu}!", 10)`.
This prompts to search about the Standard Numeric Format Specifier article on MSDN where you can read
Standard numeric format strings are used to format common numeric
types. A standard numeric format string takes the form Axx, where:
A is a single alphabetic character called the format specifier. Any
numeric format string that contains more than one alphabetic
character, including white space, is interpreted as a custom numeric
format string. For more information, see Custom Numeric Format
Strings.
The same article explains: if you have a SINGLE letter that is not recognized you get an exception.
Indeed
if String.Format("My number is {0:K}!", 10)`.
throws the FormatException as explained.
Now looking in the Custom Numeric Format Strings chapter you will find a table of eligible letters and their possible mixings, but at the end of the table you could read
Other
All other characters
The character is copied to the result string unchanged.
So I think that you have created a format string that cannot in any way print that number because there is no valid format specifier where the number 10 should be 'formatted'.
No it's not acceptable to place anything you like after the colon. Putting anything other than a recognized format specifier is likely to result in either an exception or unpredictable behaviour as you've demonstrated. I don't think you can expect string.Format to behave consistently when you're passing it arguments which are completely inconsistent with the documented formatting types
We have a requirement to display bank routing/account data that is masked with asterisks, except for the last 4 numbers. It seemed simple enough until I found this in unit testing:
string.Format("{0:****1234}",61101234)
is properly displayed as: "****1234"
but
string.Format("{0:****0052}",16000052)
is incorrectly displayed (due to the zeros??): "****1600005252""
If you use the following in C# it works correctly, but I am unable to use this because DevExpress automatically wraps it with "{0: ... }" when you set the displayformat without the curly brackets:
string.Format("****0052",16000052)
Can anyone think of a way to get this format to work properly inside curly brackets (with the full 8 digit number passed in)?
UPDATE: The string.format above is only a way of testing the problem I am trying to solve. It is not the finished code. I have to pass to DevExpress a string format inside braces in order for the routing number to be formatted correctly.
It's a shame that you haven't included the code which is building the format string. It's very odd to have the format string depend on the data in the way that it looks like you have.
I would not try to do this in a format string; instead, I'd write a method to convert the credit card number into an "obscured" string form, quite possibly just using Substring and string concatenation. For example:
public static string ObscureFirstFourCharacters(string input)
{
// TODO: Argument validation
return "****" + input.Substring(4);
}
(It's not clear what the data type of your credit card number is. If it's a numeric type and you need to convert it to a string first, you need to be careful to end up with a fixed-size string, left-padded with zeroes.)
I think you are looking for something like this:
string.Format("{0:****0000}", 16000052);
But I have not seen that with the * inline like that. Without knowing better I probably would have done:
string.Format("{0}{1}", "****", str.Substring(str.Length-4, 4);
Or even dropping the format call if I knew the length.
These approaches are worthwhile to look through: Mask out part first 12 characters of string with *?
As you are alluding to in the comments, this should also work:
string.Format("{0:****####}", 16000052);
The difference is using the 0's will display a zero if no digit is present, # will not. Should be moot in your situation.
If for some reason you want to print the literal zeros, use this:
string.Format("{0:****\0\052}", 16000052);
But note that this is not doing anything with your input at all.