Can somebody explain to me what is the use of globalization in C#?
Is it used for conversion purposes? I mean I want to convert any English word into a selected language.
So will this globalization or cultureinfo help me?
Globalization is a means of formatting text for specific cultures. For example, the number 1000 may be written as 1,000.00 in the UK or 1 000,00 in France. It is quite an in-depth subject, but that is the essential aim.
It is NOT a translation service, but it does allow you to determine the culture under which your application is running and therefore allow you to choose the language you want to display. You will have to provide text translation yourself, however, usually by means of resource files.
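As a minimal sketch of the formatting side, the helper below formats the same value under two cultures. The class and method names are made up for illustration; only CultureInfo.GetCultureInfo and the "N2" format string are framework features.

```csharp
using System.Globalization;

public static class NumberDisplay
{
    // Formats a value with two decimals using the conventions of the named culture.
    // The culture name (e.g. "en-GB", "fr-FR") is supplied by the caller.
    public static string Format(decimal value, string cultureName)
    {
        CultureInfo culture = CultureInfo.GetCultureInfo(cultureName);
        return value.ToString("N2", culture);
    }
}
```

NumberDisplay.Format(1000m, "en-GB") normally yields 1,000.00. For fr-FR the decimal separator is a comma, but the exact group separator character depends on the culture data installed on the machine, so it is better not to hard-code it.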
Globalization is a way of allowing users to adapt the application they are using to the standards of wherever they are. Customization allows the following to be culturally appropriate:
Money formatting
Time
Date
Text orientation
The region that is currently set is handled by the OS and passed to your application. Globalization/internationalization (i18n) also typically motivates the developer to separate the displayed text of the program from the implementation itself.
From MSDN:
System.Globalization - contains
classes that define culture-related
information, including the language,
the country/region, the calendars in
use, the format patterns for dates,
currency and numbers, and the sort
order for strings.
This namespace helps in making your application culture-aware, and is used heavily internally within the .NET framework. For example, when converting from Date to String, Globalization is used to determine what format to use, such as "11/28/2009" or "28-11-2009". Generally this determination is done automatically within the framework without you ever using the namespace directly. However, if you need to, you can use Globalization directly to look up culture-specific information for your own use.
To clear up even more confusion:
Localization (or Localisation for non-US people), L10n for short: the process of adapting a program for a specific location. It consists of translating resources, adapting the UI (if necessary), etc.
Internationalization, i18n for short: the process of adapting a program to support localization, regional characters, formats and so on, but most importantly, the process of allowing the program to work correctly regardless of the current locale settings and OS language version.
Globalization, g11n for short: consists of both i18n and L10n.
To clear some confusion:
Globalisation: Allowing your program to use locale specific resources loaded from an external resource DLL at runtime. This means putting all your strings in resource files rather than hard coding them into the source code.
Localisation: Adapting your program for a specific locale. This could be translating Strings and making dialog boxes read right-to-left for languages such as Arabic.
Here is a link to creating Satellite DLLs. It says C++, but the same principle applies to C#.
Globalization:
Globalization is the process of designing and developing applications for multiple cultures and regions.
Localization:
Localization is the process of customizing an application for a given culture and locale.
Related
I am seeing a different Unicode character as the number group separator for the "de-CH" culture when running on a local desktop and in Azure.
When the following code is run on my desktop in .NET Core 3.1 or .NET Framework 4.7.2 it outputs 2019 which looks like an apostrophe but is not the same.
When run in Azure, for instance in https://try.dot.net or (slightly modified) in an Azure function running on .NET Core 3.1 (on a Windows based App Service) it results in 0027, a standard ASCII apostrophe.
using System;
using System.Globalization;
using System.Linq;

// Print the Unicode code point of the "de-CH" number group separator.
Console.WriteLine(
    ((int)CultureInfo
        .GetCultureInfo("de-CH")
        .NumberFormat
        .NumberGroupSeparator
        .Single())        // Just getting the single character as an int
    .ToString("X4"));     // Unicode value of that character, in hex
The result of this is that trying to parse the string 4'200.000 (where the apostrophe there is Unicode 0027) on local desktop using "de-CH" culture fails, but it works in Azure.
Why the difference?
This Microsoft blog by Shawn Steele explains why you shouldn't rely on a specific culture setting being stable (Fully quoted because it is no longer online at MSDN):
https://web.archive.org/web/20190110065542/https://blogs.msdn.microsoft.com/shawnste/2005/04/05/culture-data-shouldnt-be-considered-stable-except-for-invariant/
CultureInfo and RegionInfo data represents a cultural, regional, admin
or user preference for cultural settings. Applications should NOT
make any assumptions that rely on this data being stable. The only
exception (this is a rule, so of course there's an exception) is for
CultureInfo.InvariantCulture. CultureInfo.InvariantCulture is
supposed to remain stable, even between versions.
There are many reasons that cultural data can change. With Whidbey
and Custom Cultures the list gets a little longer.
The most obvious reason is that there is a bug in the data and we had to make a change. (Believe it or not we make mistakes ;-)) In this case our users (and yours too) want culturally correct data, so we have to fix the bug even if it breaks existing applications.
Another reason is that cultural preferences can change. There're lots of ways this can happen, but it does happen:
Global awareness, cross cultural exchange, the changing role of computers and so forth can all effect a cultural preference.
International treaties, trade, etc. can change values. The adoption of the Euro changed many countries currency symbol to €.
National or regional regulations can impact these values too.
Preferred spelling of words can change over time.
Preferred date formats, etc can change.
Multiple preferences could exist for a culture. The preferred best choice can then change over time.
Users could have overridden some values, like date or time formats. These can be requested without user override, however we recommend that applications consider using user overrides.
Users or administrators could have created a replacement culture, replacing common default values for a culture with company specific, regional specific, or other variations of the standard data.
Some cultures may have preferences that vary depending on the setting. A business might have a more formal form than an Internet Café.
An enterprise may require a specific date format or time format for the entire organization.
Differing versions of the same custom culture, or one that's custom on one machine and a windows only culture on another machine.
So if you format a string with a particular date/time format, and then
try to Parse it later, parse might fail if the version changed, if the
machine changed, if the framework version changed (newer data), or if
a custom culture was changed. If you need to persist data in a
reliable format, choose a binary method, provide your own format or
use the InvariantCulture.
Even without changing data, remembering to use Invariant is still a
good idea. If you have different . and , syntax for something like
1,000.29, then Parsing can get confused if a client was expecting
1.000,29. I've seen this problem with applications that didn't realize that a user's culture would be different than the developer's
culture. Using Invariant or another technique solves this kind of
problem.
Of course you can't have both "correct" display for the current user
and perfect round tripping if the culture data changes. So generally
I'd recommend persisting data using InvariantCulture or another
immutable format, and always using the appropriate formatting APIs for
display. Your application will have its own requirements, so consider
them carefully.
Note that for collation (sort order/comparisons), even Invariant
behavior can change. You'll need to use the Sort Versioning to get
around that if you require consistently stable sort orders.
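The advice quoted above (persist with InvariantCulture, display with the user's culture) could look roughly like this. It is only a sketch: the class and method names are invented for illustration, and "R" is used as the round-trip format for double.

```csharp
using System.Globalization;

public static class InvariantStorage
{
    // Text form meant for files or databases: independent of the machine's culture.
    public static string Persist(double value)
    {
        return value.ToString("R", CultureInfo.InvariantCulture);
    }

    // Reads back a value written by Persist, again ignoring the machine's culture.
    public static double Restore(string stored)
    {
        return double.Parse(stored, NumberStyles.Float, CultureInfo.InvariantCulture);
    }

    // Text form meant for the screen only: follows the current user's culture.
    public static string Display(double value)
    {
        return value.ToString("N2", CultureInfo.CurrentCulture);
    }
}
```

The point of keeping Persist and Display separate is that the displayed string is never parsed again, so a change in culture data cannot break stored values.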
If you need to parse data automatically that is formatted to be user-friendly, there are two approaches:
Allow the user to explicitly specify the used format.
First remove every character except digits, minus sign and the decimal separator from the string before trying to parse this. Note that you need to know the correct decimal separator first. There is no way to guess this correctly and guessing wrong could result in major problems.
Wherever possible try to avoid parsing numbers that are formatted to be user-friendly. Instead whenever possible try to request numbers in a strictly defined (invariant) format.
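The second approach (strip everything except digits, minus sign and the decimal separator) might be sketched as follows. The helper name is hypothetical, and the decimal separator is passed in by the caller because, as said above, it cannot be guessed safely.

```csharp
using System.Globalization;
using System.Text;

public static class UserNumberParser
{
    // Keeps only ASCII digits, '-' and the known decimal separator,
    // then parses the cleaned-up text with the invariant culture.
    // Throws FormatException if what remains is not a valid number.
    public static decimal Parse(string text, char decimalSeparator)
    {
        var cleaned = new StringBuilder();
        foreach (char c in text)
        {
            if (c >= '0' && c <= '9')
                cleaned.Append(c);
            else if (c == '-')
                cleaned.Append('-');
            else if (c == decimalSeparator)
                cleaned.Append('.');   // normalize to the invariant decimal point
            // anything else (group separators, currency symbols, spaces) is dropped
        }
        return decimal.Parse(
            cleaned.ToString(),
            NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint,
            CultureInfo.InvariantCulture);
    }
}
```

With this, UserNumberParser.Parse("4'200.50", '.') works whether the apostrophe is U+0027 or U+2019, because both are simply discarded.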
Background
We have an older webforms application, and we're not using any custom language/globalization code that I'm aware of. We're using resx files for culture-specific resources. For Chinese, we currently have 4 resx files: zh, zh-cn, zh-hans, and zh-hant. I am hoping to pare down to just 2 for simplicity's sake: zh-hans and zh-hant. We don't intend to support any locale-specific variations; just generic Simplified Chinese and generic Traditional Chinese.
Question(s)
My main question is this: Outside of a need to support locale-specific variations, are there any situations where I would need more than the zh-hans and zh-hant files for Chinese?
Basically, I need to guarantee that if a user browses to our site using zh, zh-cn, zh-sg language codes, they will get resources from the zh-hans file, and if they browse using zh-tw, zh-hk, or zh-mo, they get resources from the zh-hant file. These are the only Chinese codes I know of, but if there are others, they should be included here as well - basically a Chinese speaking user with their browser set to any Chinese language, should see the appropriate Simplified or Traditional Chinese - and definitely never see English.
So far my testing seems to indicate that yes, .NET resolves them correctly, but I want to be sure there are no hidden or rare scenarios I am missing.
I am aware that the zh-hans and zh-hant codes are a little newer - and are considered "parent" cultures. I am also aware that the old generic, parent codes were zh-chs and zh-cht. Are there old browsers out there that allow these zh-chs/zh-cht codes to be selected, and if so, will .NET still resolve them appropriately?
I am not entirely sure how .NET resolves the correct resource files, so if someone could point me in the right direction, such as what classes/namespaces are involved, that would also be great.
Resources
Language codes for simplified Chinese and traditional Chinese?
IETF Language Tags
The answer was no - you can't have only zh-hans/zh-hant. Because, if you do, and you pass the base zh code with your browser, it will NOT automatically resolve to the zh-hans or zh-hant resx files.
The reason is due to something called the “Resource Fallback Process”.
Basically, .NET resolves resources through a hierarchy. It will start with the culture code requested, and if a resx for that specific culture code doesn’t exist, it will then begin to travel up the hierarchy, searching for the next culture code that does have a resx. But, it never travels down, only up.
Here is an example of the hierarchy for zh-sg [Singapore]:
Invariant
zh
zh-Hans
zh-chs (legacy generic code)
zh-sg (locale-specific codes)
So first it looks for zh-sg. If there is no resource file for zh-sg, it looks for zh-chs. If there is no resource file for zh-chs, it looks for zh-Hans. And so forth.
Having just zh-Hans and zh-Hant would work for all the locale-specific codes, such as zh-sg, zh-tw, etc. Some browsers allow you to select Generic Simplified (zh-Hans) or Generic Traditional (zh-Hant), and obviously it'd work in those scenarios as well. The only scenario it wouldn't work for is when base Chinese zh was selected and passed in the request. In that scenario, if zh doesn't exist, the next place it looks is the Invariant culture - in this case, that returns English text - which is obviously not what we want.
My solution, then, was to pare down to base zh and zh-Hant as my 2 codes. Base zh is my file containing Simplified text, and zh-Hant of course is my file containing Traditional text. I could've done it the other way around (made zh contain Traditional text and used zh-Hans for Simplified), but I can't find any guidelines about what the base zh should represent. Should it be Simplified, or should it be Traditional? I suspect it's largely a matter of preference. A quick search of several popular websites revealed that in most cases, a request for base zh produces text which Google Translate recognizes as Simplified. In a few cases it did produce Traditional. But I went with what the majority seemed to be doing.
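To see the hierarchy on a given machine, one option is to walk CultureInfo.Parent until the invariant culture is reached. This is only a sketch with an invented helper name; the chain it returns depends on the framework version and the culture data installed (for example, whether the legacy zh-CHS step appears), so it is worth running it on the actual server.

```csharp
using System.Collections.Generic;
using System.Globalization;

public static class CultureChain
{
    // Returns the culture itself followed by each parent,
    // ending with a marker for the invariant culture.
    public static List<string> Parents(string cultureName)
    {
        var chain = new List<string>();
        CultureInfo culture = CultureInfo.GetCultureInfo(cultureName);

        // The invariant culture has an empty name and is its own parent.
        while (!string.IsNullOrEmpty(culture.Name))
        {
            chain.Add(culture.Name);
            culture = culture.Parent;
        }
        chain.Add("(invariant)");
        return chain;
    }
}
```

Calling CultureChain.Parents("zh-SG") and CultureChain.Parents("zh") side by side makes it visible that the base zh code only has the invariant culture above it.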
I am writing a program that needs to parse a bunch of text files generated by some third-party software. Some of these files will be generated in France, where something like "1,5" means "one and a half". Other files will be generated in the US, where "1,5" is not a number, and "one and a half" is "1.5". Of course, "1,234.5" is a legitimate number in the US.
These are just examples; in reality, my program needs to deal with a variety of numbers in a variety of locales; it needs to handle things like "e-5" and "2e10", etc. Unfortunately, there's no way to know ahead of time which file comes from which locale.
Is there some commonly accepted solution to this problem in C#? I realize that I can write my own number-parsing code, but I'd prefer to avoid it unless there's no other way.
Since your entire input file has been generated from one locale, you could look at the problem as having to detect the specific locale from the input file prior to actually parsing it. It's an extra requirement that results from the inadequate input files (which should all use one agreed locale or have a field to specify the locale used).
Language detection is not a complete solution as number formatting is not language-specific but locale-specific. Here is an example: If you detect the language as Spanish, would that be es-ES (Spain) or es-MX (Mexico)? In the former case, the decimal separator is a comma (1,23). In the latter, the decimal separator is a period (1.23).
The solution would be heuristics-based. The simplest is probably this: if you know what your locale generally is (e.g. most of your users use the period), you could have an ordered list of culture identifiers and try them one after the other until you've found one that can be used to interpret all the numbers in the file. It could be as simple as starting with en-US and, failing that, trying a culture that uses the comma as decimal separator such as fr-FR, since for numbers there really aren't many more formats.
This is maybe a slightly over-designed solution, but it could work (in case your text files contain some text apart from numbers):
Detect the language of your text files using letter frequency. Google has open-sourced the code they use in Chrome to detect page language - http://code.google.com/p/chromium-compact-language-detector/. I think I saw a C# wrapper for this, but I can't find it now. If you don't want to use any library, it is not so difficult to implement it on your own. I have done some very simple testing of this algorithm and it seems that it is possible to detect a language from only about 15-20 letters.
Build a regular expression based on the rules for the detected language (or just parse it). This can be a very complex problem, considering that there are many rules for decimal separators, number grouping, negative signs etc. But it is not impossible to implement.
As you see from the comments your problem has no fail safe solution.
The best you can do is minimize the error:
Since each file (hopefully) contains several numbers all from the same locale, try parsing the numbers in file with all the expected distinct locales (i.e. don't parse with en-US and en-AU for instance as the number format for both locales is the same.)
After parsing you'll end up with either of:
A single matching locale.
Multiple locales.
In the second case test whether the results from all locales match (most/all locales parse integers without thousand separators and scientific notation the same way.)
If they match, there is no problem. Otherwise, try to employ heuristics to figure out the correct locale:
Are the values in the expected range?
If there is any other text in the file, you can do a word search in language dictionaries to try and figure out the language.
If everything fails discard the file and mark it for manual processing.
Your program should have a facility that allows marking files as being of a specific culture bypassing the heuristics.
Your best choice is to change the input format so that the file locale is specified somewhere, such as in the data, the name of the file or an accompanying metadata file.
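The "try every distinct locale" step could be sketched like this. The candidate list and all names are assumptions made for illustration; a real list would hold one culture per distinct number format you expect.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class LocaleGuesser
{
    // Hypothetical candidates: one culture per distinct number format.
    private static readonly string[] Candidates = { "en-US", "fr-FR", "de-DE" };

    // Returns the candidate cultures under which every token in the file parses.
    // Zero results: mark the file for manual processing.
    // Several results: compare the parsed values or apply further heuristics.
    public static List<string> MatchingCultures(IEnumerable<string> tokens)
    {
        List<string> all = tokens.ToList();
        const NumberStyles styles = NumberStyles.Float | NumberStyles.AllowThousands;

        return Candidates
            .Where(name =>
            {
                CultureInfo culture = CultureInfo.GetCultureInfo(name);
                return all.All(t => double.TryParse(t, styles, culture, out _));
            })
            .ToList();
    }
}
```

Note that .NET's parser is lenient about where group separators appear, so a token such as "1,5" can parse under more than one culture with different values. That is exactly the "multiple locales" case above, and the reason the range checks and manual override are still needed.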
Does Microsoft's implementation of the C# runtime offer some localization mechanism to translate common strings like Overflow, Stack overflow, Underflow, etc.?
See the code below - it's a part of Mono and Mono itself has a Locale.GetText routine for making such translations.
// Added to avoid possible integer overflow.
if (inputOffset > inputBuffer.Length - inputCount)
    throw new ArgumentException("inputOffset" +
        Locale.GetText("Overflow"));
Now - how is it done in Microsoft's version of the runtime, and how can I use it, for example, to get the localized equivalent of Overflow without adding resource files?
.NET provides a framework that makes it easy to localize your content (ResourceManager) and while it internally maintains some translations for its own purpose (for example DateTime.ToString gives you a textual representation for the date/time that is locally appropriate, which includes the translated month and day names), it does not provide you with any ready-made translations, be they common strings or not. It could hardly do this reliably anyway, as there is a plethora of human languages out there and words can have different translations depending on context etc.
In your example, I would say that you are OK with untranslated exception messages. Although Microsoft recommends that you localize exception descriptions, and they do localize their own (at least for major languages), this advice seems ill-thought-out, as it's not only a waste of effort to translate all this text that users probably should never see, but it can also make debugging a nightmare.
Yes, it does and it's a terrible idea. It makes debugging so much harder.
without adding resource files
What do you have against resource files? Resources are the prescribed way to provide localized and localizable strings, images, and other data for a .NET app or assembly.
Note that single word substitution as shown in your example code will result in poor quality translations. Different languages have different sentence structure and word order which your single word substitution won't accommodate. Non-English languages often involve genders for nouns and declension of words to properly reflect their role and number in a phrase. Single word substitution fails miserably at this.
Your non-English customers will most likely prefer that you not butcher their language by attempting to partially translate text a word here and a word there. If you're going to go to the trouble of supporting localizable messages, do it right and allow the entire string to be translated so that word ordering and declension can be done properly by translators. In cases where the content is variable, make the format string a resource so that the translator can set off the variable data using the conventions of the language.
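A small sketch of the "make the format string a resource" idea follows. A dictionary stands in for the resource files so the example is self-contained; the class name, the keys and the German wording are illustrative assumptions, not taken from any real application.

```csharp
using System.Collections.Generic;
using System.Globalization;

public static class Messages
{
    // Stand-in for per-language resource files: each entry is a complete
    // sentence, so the translator controls word order and placeholder position.
    private static readonly Dictionary<string, string> FilesCopiedFormat =
        new Dictionary<string, string>
        {
            { "en", "Copied {0} files to {1}." },
            { "de", "Nach {1} wurden {0} Dateien kopiert." },
        };

    public static string FilesCopied(string language, int count, string target)
    {
        string format = FilesCopiedFormat[language];
        return string.Format(CultureInfo.GetCultureInfo(language), format, count, target);
    }
}
```

Here the German string puts the target before the count, which a word-by-word substitution could not express. In a real application the dictionary would be replaced by a lookup through ResourceManager.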
We need to have our apps be translated into other languages. This entails renaming the .text properties of our visible controls as well as other literals found within our apps to whatever language we need to translate into.
Is this something that can easily be accomplished with .resx files? I was thinking of creating a master resx key/value list where the key would be the fully qualified name of the control/variable/constant etc. and then refactor our apps to look into this file to get their values based on the cultureinfo found at runtime?
Is there a standard or simpler approach to this problem?
Check out FairlyLocal when you get a chance. It's a library that lets you do i18n using GetText, thus allowing you to follow the best practices from the rest of the industry rather than the .resx stuff that MS tries to force on you.
There are quite a few resources for this:
MSDN guide for ASP.NET applications.
Code Project example for WPF applications.
You are correct in thinking that this can be achieved through the use of .resx files. Basically, you create a .resx file for each language you wish to support, and if you give it a name based on the culture (en-US, de-DE, etc.) then it gets picked up automatically.