How to parse international currency strings reliably with minimal code

A bit of background - I am writing some automated user acceptance tests for a mobile application and I don't always have control over the culture of the device that the tests will be executed on.
This application deals with currencies and I was wondering what the best approach might be to parse a string with a currency amount reliably without knowing the device culture.
For example, I would like to parse €12.34 or $12.34 or £12.34 etc to a double value (or whatever).
A workaround is to ignore the first character in the string but that's not necessarily an ideal solution.

What I eventually did was collect every international currency symbol with the snippet below (thanks to stuard) and then check my own string against each of these values.
// requires: using System.Linq;
string[] currencySymbols = System.Globalization.CultureInfo
    .GetCultures(System.Globalization.CultureTypes.AllCultures)
    .Select(culture => culture.NumberFormat.CurrencySymbol)
    .Distinct()
    .ToArray();
Before doing the comparison, I stripped the digits, '.' and ',' from my own string, leaving just the currency prefix.
I'm not sure if it's the best possible solution but it works!
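The approach described above could be sketched roughly as follows. This is only an illustration: the CurrencyText class and its method names are hypothetical, it enumerates specific cultures rather than all cultures, and it assumes that the symbol is a prefix and that the amount uses '.' as the decimal separator.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class CurrencyText
{
    // Collects the currency symbols of all specific cultures known to the runtime,
    // longest first so that a longer symbol is tried before a shorter one it starts with.
    public static string[] AllCurrencySymbols()
    {
        return CultureInfo.GetCultures(CultureTypes.SpecificCultures)
            .Select(c => c.NumberFormat.CurrencySymbol)
            .Where(s => !string.IsNullOrEmpty(s))
            .Distinct()
            .OrderByDescending(s => s.Length)
            .ToArray();
    }

    // Strips the first matching symbol from the start of the text, then parses
    // what is left with the invariant culture ('.' decimal, ',' grouping).
    public static bool TryParseAmount(string text, IEnumerable<string> symbols, out decimal amount)
    {
        string rest = (text ?? string.Empty).Trim();
        foreach (string symbol in symbols)
        {
            if (rest.StartsWith(symbol, StringComparison.Ordinal))
            {
                rest = rest.Substring(symbol.Length);
                break;
            }
        }
        return decimal.TryParse(rest.Trim(), NumberStyles.Number,
            CultureInfo.InvariantCulture, out amount);
    }
}
```

A call such as CurrencyText.TryParseAmount("€12.34", CurrencyText.AllCurrencySymbols(), out amount) would then replace the "ignore the first character" workaround, while still failing cleanly on text that is not an amount.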

Related

How to format a string into a string with currency symbol and separators

Numeric values (int, double, decimal) can be formatted simply by using the standard numeric format strings. For example, assuming the culture is "en-GB":
int value = 1000;
Console.WriteLine(value.ToString("C0")); // Would output £1,000
However I am wondering if there is a simple way to format a string to something of the same effect as above. For example:
string amount = "£2000"; // Would want to format to "£2,000"
Is there a way to format this string so that the thousands separator is added in the correct position?
Given it's a string I don't think numerical formatting would work without converting the string to a numerical data type beforehand:
var result = Int32.Parse("£2000", NumberStyles.AllowCurrencySymbol, new CultureInfo("en-GB"));
Console.WriteLine(result.ToString("C0", new CultureInfo("en-GB"))); // Outputs £2,000
However, it's a bit verbose to convert the string to an int and then back to a string. Is there a simpler way to do this, given that the starting string has the currency symbol?

Given it's a string I don't think numerical formatting would work without converting the string to a numerical data type beforehand
Indeed.
Is there a simpler way to do this given that the starting string has the currency symbol?
No. And I seriously doubt that such a feature would ever be added and/or welcomed by the developer community. A formal specification for such a feature would be a complexity nightmare.
Of course, in your particular case, if you are sure that your string always consists of "currency symbol + sequence of digits without comma or period", you could develop a string-based solution optimized for your use case (for example, a fancy regular expression). However, I think that your current solution is both readable and maintainable and you would do your future self a favor by keeping it that way. If you need it multiple times, extract it into a method.
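Extracting the parse-then-format round trip into a method might look like the sketch below. The CurrencyFormatting class and the Regroup name are made up for illustration; the body is just the two calls from the question, with the format provider passed in by the caller.

```csharp
using System;
using System.Globalization;

public static class CurrencyFormatting
{
    // Hypothetical helper: parses a whole-number currency string such as "£2000"
    // and formats it again with the provider's group separators.
    public static string Regroup(string amount, IFormatProvider provider)
    {
        int value = int.Parse(amount, NumberStyles.Currency, provider);
        return value.ToString("C0", provider);
    }
}
```

With new CultureInfo("en-GB") as the provider, CurrencyFormatting.Regroup("£2000", provider) corresponds to the en-GB example in the question.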

Delete character out of string

I am having some problems with a fairly easy task, and I feel like I'm missing something very obvious here.
I have a .csv file which is semicolon-separated. The file contains several numbers with dots, like "1.300", but it also contains dates like "2015.12.01". The task is to find and delete all dots, but only those in numbers and not those in dates. The dates and numbers are completely variable and never at the same position in the file.
My question now: What is the 'best' way to handle this problem?
From a programmer's point of view: is it a good solution to just split at every semicolon, count the dots, and delete the dot if there is only one? This is the only way to solve the problem I could think of so far.
Example source file:
2015.12.01;
13.100;
500;
1.200;
100;
Example result:
2015.12.01;
13100;
500;
1200;
100;

If you can rely on the fact that dates have two dots and numbers just one, you can use that as a filter:
string s = "123.45";
if (s.Count(x => x == '.') == 1)
{
    s = s.Replace(".", null);
}

The source file looks like a valid file generated by a program running on a machine whose locale uses . as the thousands separator (most of Europe does) and as the date separator (German locales only, I think). Such locales also use ; as the list separator.
If the question were only how to parse such dates and numbers, the answer would be to pass the proper culture to the parse function, e.g. decimal.Parse("13.500", new CultureInfo("de-at")) would return 13500. The actual issue, though, is that the data must be fed to another program that uses . as the decimal separator.
The safest option would be to change the locale used by the exporting program (e.g. change the thread CultureInfo if the exporter is a .NET program, or the locale in an SSIS package) to a locale like en-gb, so that it exports with . as the decimal separator and avoids the weird date format. This assumes that the next program in the pipeline doesn't use German for dates and English for numbers.
Another option would be to load the text, parse the fields using the proper locale then export them in the format required by the next program.
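A rough sketch of that parse-and-re-export option is shown below. The FieldConverter class is hypothetical, the source format is modelled with a hand-built NumberFormatInfo rather than a named culture, and the date format "yyyy.MM.dd" is an assumption taken from the sample data.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class FieldConverter
{
    // Assumed source number format: '.' groups thousands and ',' marks decimals.
    private static readonly NumberFormatInfo Source = new NumberFormatInfo
    {
        NumberDecimalSeparator = ",",
        NumberGroupSeparator = "."
    };

    public static string ConvertField(string field)
    {
        // Dates are checked first, since lenient handling of group separators
        // could otherwise let "2015.12.01" parse as a number.
        DateTime date;
        if (DateTime.TryParseExact(field, "yyyy.MM.dd", CultureInfo.InvariantCulture,
                DateTimeStyles.None, out date))
        {
            return field;
        }

        decimal number;
        if (decimal.TryParse(field, NumberStyles.Number, Source, out number))
        {
            // Re-export with '.' as the decimal separator and no grouping.
            return number.ToString(CultureInfo.InvariantCulture);
        }

        return field; // not a date or a number: leave untouched
    }

    public static string ConvertLine(string line)
    {
        return string.Join(";", line.Split(';').Select(f => ConvertField(f)));
    }
}
```

Because every field is actually parsed, this variant also converts decimal commas (for example "1,50" becomes "1.50"), which may or may not be what the next program expects.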
Finally, a regular expression could be used to match only the numeric fields and remove the dot. This can be a bit tricky and depends on the actual contents.
For example, (\d+)\.(\d{3}) can be used to match numbers if there is only one thousands separator. This can fail if some text field contains similar values. Alternatively, ;(\d+)\.(\d{3}); matches only a full field, but misses the first and last fields, e.g.:
Regex.Replace("1.457;2016.12.30;13.000;1,50;2015.12.04;13.456", @";(\d+)\.(\d{3});", @";$1$2;")
produces:
1.457;2016.12.30;13000;1,50;2015.12.04;13.456
A regular expression that would match either numbers between ; or the first/last field could be
(^|;)(\d+)\.(\d{3})(;|$)
This would produce 1457;2016.12.30;13000;1,50;2015.12.04;13456, e.g.:
var data = "1.457;2016.12.30;13.000;1,50;2015.12.04;13.456";
var pattern = @"(^|;)(\d+)\.(\d{3})(;|$)";
var replacement = @"$1$2$3$4";
var result = Regex.Replace(data, pattern, replacement);
The advantage of a regex over splitting and replacing strings is that it's a lot faster and more memory efficient. Instead of generating temporary strings for each split and manipulation, a Regex only calculates indexes in the source. A string object is generated only when you request the final text result. This results in far fewer allocations and garbage collections.
Even in medium-sized files this can result in 10x better performance.
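One caveat with a pattern that consumes the ; delimiters is that two numeric fields in a row share a delimiter, so the second one can be skipped. A possible variant, sketched here with a hypothetical ThousandsDotRemover class, uses lookarounds so that the delimiters are checked but not consumed. Like the pattern above, it assumes at most one thousands separator per number.

```csharp
using System.Text.RegularExpressions;

public static class ThousandsDotRemover
{
    // (?<=^|;) and (?=;|$) require a field boundary on each side
    // without making the ';' part of the match.
    private static readonly Regex NumericField =
        new Regex(@"(?<=^|;)(\d+)\.(\d{3})(?=;|$)");

    public static string RemoveDots(string line)
    {
        // Keep the digits on both sides of the dot, drop the dot itself.
        return NumericField.Replace(line, "$1$2");
    }
}
```

For example, ThousandsDotRemover.RemoveDots("1.200;1.300;2015.12.01;500") leaves the date alone and rewrites both adjacent numbers.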

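I wouldn't rely on the number of dots, as mistakes can be made.
You can use double.TryParse to safely test whether the string is a number:
var data = "2015.12.01;13.100;500;1.200;100;";
var dataArray = data.Split(';');
foreach (var s in dataArray)
{
    double result;
    // Parse with the invariant culture so that '.' is always the decimal point:
    // "2015.12.01" (two dots) then fails to parse, while "13.100" succeeds.
    if (double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture, out result))
    {
        // implement your logic here
        Console.WriteLine(s.Replace(".", string.Empty));
    }
}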

Use of CultureInfo.InvariantCulture

I'm working on a project that uses decimals in a textbox. I'm currently developing on a machine that has the decimal separator set to "," instead of ".", so I had to use this statement when parsing the text string into a decimal:
decimal value = Decimal.Parse(number.Text, CultureInfo.InvariantCulture);
Now... is CultureInfo.InvariantCulture the right thing to use, or should I use CurrentCulture instead?
Thank you,
Matias.

For user input, you usually want CultureInfo.CurrentCulture. Using a locale that isn't natural to you is not the usual use case: if the user has ',' as the decimal point across their whole system, they're probably used to typing that rather than your preferred '.'. In other words, while you're testing on a system with a locale like that, learn to use ',' instead of '.'; it's part of what makes locale testing useful :)
On the other hand, if you're e.g. storing the values in a configuration file or something like that, you really want to use CultureInfo.InvariantCulture.
The thinking is rather simple - is it the user's data (=> CurrentCulture), or is it supposed to be global (=> InvariantCulture)?
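As a small illustration of that split, the sketch below keeps the two cases in separate methods. The DecimalIo class and its method names are invented for this example; in a desktop application the provider passed to TryReadUserInput would typically be CultureInfo.CurrentCulture.

```csharp
using System;
using System.Globalization;

public static class DecimalIo
{
    // User's data: parse with the culture the user is working in.
    public static bool TryReadUserInput(string text, IFormatProvider userCulture, out decimal value)
    {
        return decimal.TryParse(text, NumberStyles.Number, userCulture, out value);
    }

    // Global data (configuration files and the like): always invariant,
    // so the stored text reads back the same on every machine.
    public static string ToStorage(decimal value)
    {
        return value.ToString(CultureInfo.InvariantCulture);
    }

    public static decimal FromStorage(string text)
    {
        return decimal.Parse(text, NumberStyles.Number, CultureInfo.InvariantCulture);
    }
}
```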

If you are writing a correctly internationalized program...
A) If you are using Winforms or WPF, or in general the user "works" on the machine the program runs on, then input parsing should be done with the CurrentCulture
B) If you are web-programming, then the user should be able to select their culture, and CurrentCulture (the culture of the web server) should be used only as a default
And then,
A) data you save "internally" (so to be written and then read by your program) should use InvariantCulture,
B) data you interchange with other apps would be better to be written with InvariantCulture (unless the other app is badly written and requires a specific format),
C) files you export to the user should follow the previous rules (the ones about user interface), unless you are using a "standard-defined format" like XML (then use Xml formatters)

How to safely and correctly convert a number from user input to double?

This is, basically, a CultureInfo problem. Formally, in my country, the decimal separator is a comma (,) and a thousands separator is a dot (.). In practice, however, this is only used by accountants and diligent people. Normally people never use a thousands separator, and they use both a comma and a dot interchangeably as a decimal separator. I've seen this being the problem even in some Excel spreadsheets that I received from other people, with Excel not having recognized a dot as a decimal separator, leaving the field formatted as a string, rather than a number.
My "solution" thus far has been to simply replace all commas in user input with dots and then parse the double with InvariantCulture, like so:
string userInput;
...
userInput = userInput.Replace(',', '.');
double result;
double.TryParse(userInput, NumberStyles.Float, CultureInfo.InvariantCulture, out result);
This will obviously fail when someone actually enters the thousands separator and this seems to me more like a hack than a real solution. So, other than making my own parser for doubles, are there any cleaner ways to handle this problem?

If you are using ASP.NET, you can use the AjaxControlToolkit FilteredTextBox. You can also accomplish the task using regular expressions and pattern matching. It is nearly always better to try to get a standard input than to attempt to deal with every possible variation of human input.
Some other links:
MaskedTextBox
WPF Tools FilteredTextBox

If there are rules that can conclusively determine what they meant, then you can code the logic. With this problem, though, it is impossible to know the intent in every case:
1,001 === 1.001 or 1001
Also, even though any "better" logic might assume that numbers like "1,01" are unambiguous, such an entry might be a typo of "1,001." How likely this is depends on what kind of data you're gathering.
If people rarely use a thousands separator, then your existing logic seems good. If you want to be 100% certain of intent, though, the only way to be sure is to ask them what they meant in such cases. E.g. if someone enters 1,001 or 1.001, then fail validation, but recode it as "1,001.0" (or .00 if dealing with currency) to disambiguate it, forcing them to resubmit it.
In practice, you would probably cause more harm than good with this kind of abundance of caution, since people don't really use the thousands separator. I'd stick with what you've got.
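If you did want to flag the ambiguous case for validation, one possible check is sketched below. The AmbiguityCheck class is hypothetical, and the rule it encodes (one to three digits, a single ',' or '.', then exactly three digits) is only one reading of "ambiguous"; it deliberately ignores inputs with a leading zero such as "0,001".

```csharp
using System.Text.RegularExpressions;

public static class AmbiguityCheck
{
    // Matches inputs like "1,001" or "12.345", which could be read either as
    // a grouped integer or as a decimal with three fractional digits.
    private static readonly Regex Ambiguous = new Regex(@"^[1-9]\d{0,2}[.,]\d{3}$");

    public static bool IsAmbiguous(string input)
    {
        return input != null && Ambiguous.IsMatch(input.Trim());
    }
}
```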

Best way to parse float?

What is the best way to parse a float in C#?
I know about TryParse, but what I'm particularly wondering about is dots, commas, etc.
I'm having problems with my website. On my dev server, the ',' is the decimal separator and the '.' is the thousands separator. On the prod server, though, it is the other way round.
How can I best capture this?

I agree with leppie's reply; to put that in terms of code:
string s = "123,456.789";
float f = float.Parse(s, CultureInfo.InvariantCulture);

Depends where the input is coming from.
If your input comes from the user, you should use the CultureInfo the user/page is using (Thread.CurrentThread.CurrentCulture, which is the culture that governs number parsing and formatting).
You can get an indication of the user's culture by looking at the HttpRequest.UserLanguages property. (It's not 100% correct, but I've found it a very good first guess.) With that information, you can set Thread.CurrentThread.CurrentCulture at the start of the page.
If your input comes from an internal source, you can use the InvariantCulture to parse the string.
The Parse method is somewhat easier to use if your input is from a controlled source, that is, you have already validated the string. Parse throws a (slow) exception if it fails.
If the input is uncontrolled (from the user or another Internet source), TryParse looks better to me.
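The idea of deriving a culture from the request and handing it to TryParse might be sketched like this. The RequestCulture class is hypothetical; it takes the language list as a plain string array (the shape of HttpRequest.UserLanguages) so that it does not depend on any web framework, and it assumes entries may carry a quality suffix such as ";q=0.8".

```csharp
using System;
using System.Globalization;

public static class RequestCulture
{
    // Returns the first listed language that maps to a culture, or the fallback.
    public static CultureInfo FromUserLanguages(string[] userLanguages, CultureInfo fallback)
    {
        if (userLanguages != null)
        {
            foreach (string entry in userLanguages)
            {
                if (string.IsNullOrWhiteSpace(entry)) continue;
                string name = entry.Split(';')[0].Trim(); // drop any ";q=..." suffix
                if (name.Length == 0) continue;
                try
                {
                    return CultureInfo.CreateSpecificCulture(name);
                }
                catch (ArgumentException)
                {
                    // Unknown culture name: try the next entry.
                }
            }
        }
        return fallback;
    }

    // TryParse overload that takes the culture explicitly instead of
    // relying on the server's current culture.
    public static bool TryParseFloat(string input, CultureInfo culture, out float value)
    {
        return float.TryParse(input, NumberStyles.Float | NumberStyles.AllowThousands,
            culture, out value);
    }
}
```

The first guess is still only a guess, so the fallback culture and the handling of a failed TryParse remain the caller's decision.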

If you want to persist values (numbers, dates, times, etc.) for internal purposes, always use InvariantCulture for formatting and parsing them. InvariantCulture is the same on every computer and every OS, whatever the user's culture or language settings.
string strFloat = (15.789f).ToString(System.Globalization.CultureInfo.InvariantCulture);
float numFloat = float.Parse(strFloat, System.Globalization.CultureInfo.InvariantCulture);
string strNow = DateTime.Now.ToString(System.Globalization.CultureInfo.InvariantCulture);
DateTime now = DateTime.Parse(strNow, System.Globalization.CultureInfo.InvariantCulture);

You could always use the overload of Parse which includes the culture to use?
For instance:
double number = Double.Parse("42,22", new CultureInfo("nl-NL").NumberFormat); // dutch number formatting
If you have control over all your data, you should use "CultureInfo.InvariantCulture" in all of your code.

Use the invariant culture (or one you know) when parsing with Parse/TryParse.

Pass in a CultureInfo or NumberFormatInfo that represents the culture you want to parse the float as; this controls what characters are used for decimals, group separators, etc.
For example to ensure that the '.' character was treated as the decimal indicator you could pass in CultureInfo.InvariantCulture (this one is typically very useful in server applications where you tend to want things to be the same irrespective of the environment's culture).

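Try to avoid float.Parse for unvalidated input and use TryParse instead: it does the same job, but performs a lot better on invalid input because it reports failure through its return value instead of throwing an exception.
This also applies to double, DateTime, etc.
(Some types also offer TryParseExact, which can perform even better!)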

The source is an input from a website. I can't rely on it being valid. So I went with TryParse as mentioned before.
But I can't figure out how to give the currentCulture to it.
Also, this would give me the culture of the server it's currently running on, but since it's the world wide web, the user can be from anywhere...

You can find out the current culture of your server with a simple statement:
System.Globalization.CultureInfo culture = System.Globalization.CultureInfo.CurrentCulture;
Note that there is also a CurrentUICulture property, but the UI culture is used by ResourceManager for multi-language applications. For number formatting, you must consider CurrentCulture.
I hope this will help you

One approach is to force the application's culture to use a dot instead of a comma as the decimal separator. That way your code works identically on all Windows machines, independently of the selected language and settings.
This approach is applicable to small applications, like test applications, console applications and so on. For an application that actually uses localization it is not so useful, but that depends on the requirements of the application.
var CurrentCultureInfo = new CultureInfo("en", false);
CurrentCultureInfo.NumberFormat.NumberDecimalSeparator = ".";
CurrentCultureInfo.NumberFormat.CurrencyDecimalSeparator = ".";
Thread.CurrentThread.CurrentUICulture = CurrentCultureInfo;
Thread.CurrentThread.CurrentCulture = CurrentCultureInfo;
CultureInfo.DefaultThreadCurrentCulture = CurrentCultureInfo;
This code forces the use of a dot ('.') instead of a comma and needs to be placed at application startup.

Since you don't know the web user's culture, you can do some guesswork. TryParse with a culture that uses , for separators and . for decimal, AND TryParse with a culture that uses . for separators and , for decimal. If they both succeed but yield different answers then you'll have to ask the user which they intended. Otherwise you can proceed normally, given your two equal results or one usable result or no usable result.
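The guesswork described above could look something like the following sketch. The SeparatorGuesser class and the GuessResult enum are hypothetical, and the two formats are built by hand as NumberFormatInfo instances instead of being taken from named cultures.

```csharp
using System.Globalization;

public enum GuessResult { Invalid, Unambiguous, Ambiguous }

public static class SeparatorGuesser
{
    // '.' for decimals and ',' for groups (the invariant format).
    private static readonly NumberFormatInfo DotDecimal = NumberFormatInfo.InvariantInfo;

    // ',' for decimals and '.' for groups.
    private static readonly NumberFormatInfo CommaDecimal = new NumberFormatInfo
    {
        NumberDecimalSeparator = ",",
        NumberGroupSeparator = "."
    };

    public static GuessResult Guess(string input, out double value)
    {
        double a, b;
        bool okA = double.TryParse(input, NumberStyles.Number, DotDecimal, out a);
        bool okB = double.TryParse(input, NumberStyles.Number, CommaDecimal, out b);

        value = okA ? a : b;
        if (!okA && !okB) return GuessResult.Invalid;

        // Both readings are valid but disagree: the caller has to ask the user.
        if (okA && okB && a != b) return GuessResult.Ambiguous;

        return GuessResult.Unambiguous;
    }
}
```

When the result is Ambiguous, the returned value is just the dot-decimal reading and should not be trusted without asking the user.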
