odd results when comparing strings based on culture


Is there a reason why:
string s1 = "aéa";
string s2 = "aea";
bool result = s1.Equals(s2, StringComparison.CurrentCultureIgnoreCase);
result = s1.Equals(s2, StringComparison.InvariantCultureIgnoreCase);
gives result = false in both cases, although my current culture is French?
I would expect at least one of the two lines to return true.
On the other hand, I get
int a = string.Compare(s1, s2, CultureInfo.CurrentCulture, CompareOptions.IgnoreNonSpace);
a = 0, meaning equality.
This sounds paradoxical to me. Is there an explanation?
Thanks in advance.

In the first check, StringComparison.CurrentCultureIgnoreCase ignores only case under your current culture (fr). It does not ignore accents, so é and e still differ and the check returns false.
In the second check, StringComparison.InvariantCultureIgnoreCase ignores case under the invariant culture. é is not equal to e in the invariant culture either; the two characters are in fact different (they have different meanings) in most cultures. This check returns false as well.
In the last one, CompareOptions.IgnoreNonSpace tells the comparison to ignore nonspacing combining characters, such as diacritics, so the comparison returns 0 (equal).
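To make the difference concrete, here is a minimal sketch (not from the original posts), written as a C# 9+ top-level program. It assumes a runtime with full globalization data (NLS or ICU, not invariant-globalization mode) and uses the invariant culture so that the result does not depend on the machine's settings. The variable names are illustrative.

```csharp
using System;
using System.Globalization;

string s1 = "a\u00E9a"; // "aéa", with a precomposed é (U+00E9)
string s2 = "aea";

// IgnoreCase folds only upper/lower case, so the accent still distinguishes the strings.
bool equalIgnoringCase = s1.Equals(s2, StringComparison.InvariantCultureIgnoreCase);

// IgnoreNonSpace tells the comparer to disregard combining marks such as the acute accent.
int ignoringAccents = string.Compare(s1, s2, CultureInfo.InvariantCulture, CompareOptions.IgnoreNonSpace);

// The two options can be combined to ignore both case and accents ("AÉA" versus "aea").
int ignoringBoth = string.Compare("A\u00C9A", s2, CultureInfo.InvariantCulture,
    CompareOptions.IgnoreCase | CompareOptions.IgnoreNonSpace);

Console.WriteLine(equalIgnoringCase); // False
Console.WriteLine(ignoringAccents);   // 0 when globalization data is available
Console.WriteLine(ignoringBoth);      // 0 when globalization data is available
```

So an accent-insensitive equality test has to go through string.Compare (or CompareInfo) with CompareOptions, because the StringComparison enumeration has no member that ignores diacritics.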

Related

Why String.Equals is returning false?

I have the following C# code (from a library I'm using) that tries to find a certificate comparing the thumbprint. Notice that in the following code both mycert.Thumbprint and certificateThumbprint are strings.
var certificateThumbprint = AppSettings.CertificateThumbprint;
var cert =
    myStore.Certificates.OfType<X509Certificate2>().FirstOrDefault(
        mycert =>
            mycert.Thumbprint != null && mycert.Thumbprint.Equals(certificateThumbprint)
    );
This fails to find the certificate with the thumbprint because mycert.Thumbprint.Equals(certificateThumbprint) is false even when the strings are equal. mycert.Thumbprint == certificateThumbprint also returns false, while mycert.Thumbprint.CompareTo(certificateThumbprint) returns 0.
I might be missing something obvious, but I can't figure out why the Equals method is failing. Ideas?

CompareTo ignores certain characters:
static void Main(string[] args)
{
    var a = "asdas" + (char)847; // add a hidden character
    var b = "asdas";
    Console.WriteLine(a.Equals(b)); // false
    Console.WriteLine(a.CompareTo(b)); // 0
    Console.WriteLine(a.Length); // 6
    Console.WriteLine(b.Length); // 5
    // watch window shows both a and b as "asdas"
}
(Here, the character added to a is U+034F, Combining Grapheme Joiner.)
So CompareTo's result is not a good indicator of a bug in Equals. The most likely reason for your problem is hidden characters. You can check the lengths to be sure.
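When the lengths differ, printing each code unit shows exactly which character is hiding. The following is a small sketch as a C# 9+ top-level program; DumpCodeUnits is a hypothetical helper name, not a framework method.

```csharp
using System;
using System.Linq;

// Hypothetical helper: renders every UTF-16 code unit as U+XXXX so invisible characters show up.
static string DumpCodeUnits(string s) =>
    string.Join(" ", s.Select(c => $"U+{(int)c:X4}"));

string a = "asdas" + (char)847; // 847 == 0x034F, Combining Grapheme Joiner
string b = "asdas";

Console.WriteLine(a.Length);         // 6
Console.WriteLine(b.Length);         // 5
Console.WriteLine(DumpCodeUnits(a)); // U+0061 U+0073 U+0064 U+0061 U+0073 U+034F
Console.WriteLine(DumpCodeUnits(b)); // U+0061 U+0073 U+0064 U+0061 U+0073
```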

You may wish to try using an overload of String.Equals that accepts a parameter of type StringComparison.
For example:
mycert.Thumbprint.Equals(certificateThumbprint, StringComparison.[SomeEnumeration])
Where [SomeEnumeration] is replaced with one of the following enumerated constants:
- CurrentCulture
- CurrentCultureIgnoreCase
- InvariantCulture
- InvariantCultureIgnoreCase
- Ordinal
- OrdinalIgnoreCase
See the MSDN documentation for the StringComparison enumeration.

Sometimes when data is inserted into a database, trailing spaces are stored with it, like "question ". Comparing that value with "question" returns false. So my suggestion is: check the value in the database, or use the Trim() method.
In your case, please try:
mycert.Thumbprint != null && mycert.Thumbprint.Trim().Equals(certificateThumbprint.Trim())
I think it will return true if a matching record exists.
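Note that Trim() without arguments removes white space only, so it does not help when the stray character is an invisible format character. The sketch below, a C# 9+ top-level program, shows one possible way around that for hexadecimal strings such as thumbprints. NormalizeThumbprint and both sample values are hypothetical and only illustrate the idea.

```csharp
using System;
using System.Linq;

// Hypothetical helper: keeps only hexadecimal digits, dropping spaces and invisible characters.
static string NormalizeThumbprint(string raw) =>
    new string(raw.Where(c => Uri.IsHexDigit(c)).ToArray());

// Illustrative values: the configured one starts with an invisible U+200E and contains spaces.
string configured = "\u200Ea1 b2 c3 d4";
string fromStore = "A1B2C3D4";

// Trim() strips white space only; U+200E is a format character, so it survives.
Console.WriteLine(configured.Trim().Length == configured.Length); // True

// Hex digits carry no linguistic meaning, so compare them ordinally, ignoring case.
bool match = string.Equals(
    NormalizeThumbprint(configured),
    NormalizeThumbprint(fromStore),
    StringComparison.OrdinalIgnoreCase);
Console.WriteLine(match); // True
```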

Why this string ("ʿAbdul-Baha'"^^mso:text#de) doesn't start with "?

"\"ʿAbdul-Baha'\"^^mso:text#de".StartsWith("\"") // is false
"\"Abdul-Baha'\"^^mso:text#de".StartsWith("\"") // is true
(int)'ʿ' // is 703
Could anyone tell me why?

You need to use the second parameter of the StartsWith function: StringComparison.Ordinal (or StringComparison.OrdinalIgnoreCase). This instructs the function to compare by character value and to disregard cultural sorting rules. This quote is from the MSDN link below:
"An operation that uses word sort rules performs a culture-sensitive comparison wherein certain nonalphanumeric Unicode characters might have special weights assigned to them. Using word sort rules and the conventions of a specific culture, the hyphen ("-") might have a very small weight assigned to it so that "coop" and "co-op" appear next to each other in a sorted list."
This seems to affect how StartsWith performs depending on locale/culture (see the comments on the OP's post): it works for some but not for others.
In my example (a unit test) below, I show that if you convert the strings to char arrays and look at the first character, it is actually the same. When calling StartsWith you need to add the Ordinal comparison to get the same result.
For reference, my locale is Swedish.
For further info: MSDN: StringComparison Enumeration
[Test]
public void BeginsWith_test()
{
    const string string1 = "\"ʿAbdul-Baha'\"^^mso:text#de";
    const string string2 = "\"Abdul-Baha'\"^^mso:text#de";
    var chars1 = string1.ToCharArray();
    var chars2 = string2.ToCharArray();
    Assert.That(chars1[0], Is.EqualTo('"'));
    Assert.That(chars2[0], Is.EqualTo('"'));
    Assert.That(string1.StartsWith("\"", StringComparison.InvariantCulture), Is.False);
    Assert.That(string1.StartsWith("\"", StringComparison.CurrentCulture), Is.False);
    Assert.That(string1.StartsWith("\"", StringComparison.Ordinal), Is.True); // Works
    Assert.That(string2.StartsWith("\""), Is.True);
}

Converting string array to int

I'm having a weird problem, trying to take a string from a string array
and convert it to an integer.
Take a look at this code snippet:
string date = "‎21/‎07/‎2010 ‏‎13:50";
var date1 = date.Split(' ')[0];
string[] dateArray = date1.Split('/');
string s = "21";
string t1 = dateArray[0];
bool e = string.Compare(s, t1) == 0; //TRUE
int good = Convert.ToInt32(s); //WORKING!
int bad = Convert.ToInt32(t1); //Format exception - Input string was not in a correct format.
Can someone please explain why the conversion with s works, while with t1 fails?

Your string is full of hidden characters, which is what breaks the conversion. There are four U+200E (left-to-right mark) characters and one U+200F (right-to-left mark).
Here's a clean string to try on:
string date = "21/07/2010 13:50";

Why do you use string.Compare(s, t1) == 0 to test if the strings are equal? This overload of Compare does a culture sensitive comparison. But it doesn't mean that the strings are identical. To check if the strings consist of identical "sequences" of char values, use ordinal comparison. Ordinal comparison can be done, for example, with
bool e = s == t1;
In your case, the strings have different lengths, and they already differ at the first index: s[0] != t1[0].
Your string date contains right-to-left marks and left-to-right marks. This may happen when you copy and paste from Arabic text (or text in another language written right to left).
To remove these characters in the ends of your string (not in the middle), you can use something like
t1 = t1.Trim('\u200E', '\u200F');
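If the marks can also appear in the middle of the string, as they do in the original date, one option is to filter out every character in the Unicode "Format" category before parsing. The following is a minimal sketch as a C# 9+ top-level program; the invisible marks are written as escapes so they can be seen, and the format string "dd/MM/yyyy HH:mm" is an assumption about what the input is meant to contain.

```csharp
using System;
using System.Globalization;
using System.Linq;

// The original string, with the invisible marks written as escapes so they are visible in source.
string date = "\u200E21/\u200E07/\u200E2010 \u200F\u200E13:50";

// U+200E and U+200F both belong to the Unicode "Format" (Cf) category.
string clean = new string(date
    .Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.Format)
    .ToArray());

int day = int.Parse(clean.Split('/')[0], CultureInfo.InvariantCulture);
DateTime parsed = DateTime.ParseExact(clean, "dd/MM/yyyy HH:mm", CultureInfo.InvariantCulture);

Console.WriteLine(clean.Length); // 16
Console.WriteLine(day);          // 21
Console.WriteLine(parsed.ToString("yyyy-MM-dd HH:mm", CultureInfo.InvariantCulture)); // 2010-07-21 13:50
```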

Double.TryParse thousand separator returns unexpected result

I just ran into something very strange, and was just wondering if I was missing something.
I was trying to parse a string (with thousand separators) into a double, and found the below issue.
CultureInfo ci = CultureInfo.CurrentCulture; // en-ZA
string numberGroupSeparator = CultureInfo.CurrentCulture.NumberFormat.NumberGroupSeparator; //numberGroupSeparator = ,
string numberDecimalSeparator = CultureInfo.CurrentCulture.NumberFormat.NumberDecimalSeparator;//numberDecimalSeparator = .
string strValue = "242,445.24";
double try1;
double try2;
bool btry1 = Double.TryParse(strValue, out try1); //try1 = 242445.24 : btry1 = true
bool btry2 = Double.TryParse(strValue, NumberStyles.Any, null, out try2); //try2 = 0.0 : btry2 = false <- STRANGE
double try3 = Convert.ToDouble(strValue); //try3 = 242445.24
The reason I didn't just want to use Convert.ToDouble is scientific notation, which has given me some problems before.
Does anybody know why this might be?
EDIT:
I have updated my current culture info.

It's working on my machine as expected, so I believe it has to do with the current culture. Try using CultureInfo.InvariantCulture instead of null in your TryParse:
Double.TryParse(strValue, NumberStyles.Any, CultureInfo.InvariantCulture, out try2);
It fails for the culture you specified, en-ZA. I tried the following code and try2 holds 0.0:
Double.TryParse(strValue, NumberStyles.Any, new CultureInfo("en-ZA"), out try2);

Updated (correct) answer, after much digging
You say that your current culture is en-ZA, but checking
new System.Globalization.CultureInfo("en-ZA").NumberFormat.NumberGroupSeparator
we see that the value is the empty string and not "," as the question states. So if we set CultureInfo.CurrentCulture to new CultureInfo("en-ZA") then parsing fails even for try1.
After manually setting it to "," with
Thread.CurrentThread.CurrentCulture.NumberFormat.NumberGroupSeparator = ",";
it transpires that parsing into try1 is successful. Parsing into try2 still fails.
For the TryParse overload used in try2 the documentation is pretty clear that the current thread culture is used when the format provider is null, so something else must be going on...
After carefully comparing InvariantCulture.NumberFormat to that of the en-ZA culture, I noticed that the cultures also differ in their currency formats. Trying
Thread.CurrentThread.CurrentCulture.NumberFormat.CurrencyGroupSeparator = ",";
Thread.CurrentThread.CurrentCulture.NumberFormat.CurrencyDecimalSeparator = ".";
hit the jackpot: parsing succeeds! So what's really going on is that when using NumberStyles.Any, the parse treats the number as currency.
The hypothesis can be verified if you try
double.TryParse(strValue,
NumberStyles.Any & ~NumberStyles.AllowCurrencySymbol, null, out try2);
which succeeds without needing to mess with the currency separators (of course the NumberGroupSeparator does have to be appropriate)!
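For input that is known to use "," as the group separator and "." as the decimal separator, a more predictable variant is to pass both the culture and the allowed styles explicitly. This is only a sketch under that assumption, written as a C# 9+ top-level program; the exponent sample value is illustrative.

```csharp
using System;
using System.Globalization;

string strValue = "242,445.24";

// Float allows a sign, a decimal point and an exponent; AllowThousands adds group separators.
// Currency handling is deliberately left out.
NumberStyles styles = NumberStyles.Float | NumberStyles.AllowThousands;

// Passing the culture explicitly removes any dependence on the machine's regional settings.
bool ok = double.TryParse(strValue, styles, CultureInfo.InvariantCulture, out double value);
bool okExp = double.TryParse("2.4244524E5", styles, CultureInfo.InvariantCulture, out double fromExponent);

Console.WriteLine(ok);                                           // True
Console.WriteLine(value.ToString(CultureInfo.InvariantCulture)); // 242445.24
Console.WriteLine(okExp);                                        // True
```

Because NumberStyles.Float includes AllowExponent, scientific notation is accepted by the same call.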

The documentation says that 0.0 is returned when the conversion fails.
Most likely TryParse returns false, and you should try calling Parse, to get an exception message that might tell you what is wrong.

String StartsWith() issue with Danish text

Can anyone explain this behaviour?
var culture = new CultureInfo("da-DK");
Thread.CurrentThread.CurrentCulture = culture;
"daab".StartsWith("da"); //false
I know that it can be fixed by specifying StringComparison.InvariantCulture. But I'm just confused by the behavior.
I also know that "aA" and "AA" are not considered the same in a Danish case-insensitive comparison, see http://msdn.microsoft.com/en-us/library/xk2wykcz.aspx. Which explains this
String.Compare("aA", "AA", new CultureInfo("da-DK"), CompareOptions.IgnoreCase) // -1 (not equal)
Is this linked to the behavior of the first code snippet?

Here is a test that illustrates the problem. daab and dåb (the same word in the old and the modern spelling, respectively) mean baptism/christening.
public class can_handle_remnant_of_danish_language
{
    [Fact]
    public void daab_start_with_då()
    {
        var culture = new CultureInfo("da-DK"); Thread.CurrentThread.CurrentCulture = culture;
        Assert.True("daab".StartsWith("då")); // Fails
    }
    [Fact]
    public void daab_start_with_da()
    {
        var culture = new CultureInfo("da-DK"); Thread.CurrentThread.CurrentCulture = culture;
        Assert.True("daab".StartsWith("da")); // Fails
    }
    [Fact]
    public void daab_start_with_daa()
    {
        var culture = new CultureInfo("da-DK"); Thread.CurrentThread.CurrentCulture = culture;
        Assert.True("daab".StartsWith("daa")); // Succeeds
    }
    [Fact]
    public void dåb_start_with_daa()
    {
        var culture = new CultureInfo("da-DK"); Thread.CurrentThread.CurrentCulture = culture;
        Assert.True("dåb".StartsWith("daa")); // Fails
    }
    [Fact]
    public void dåb_start_with_da()
    {
        var culture = new CultureInfo("da-DK"); Thread.CurrentThread.CurrentCulture = culture;
        Assert.True("dåb".StartsWith("da")); // Fails
    }
    [Fact]
    public void dåb_start_with_då()
    {
        var culture = new CultureInfo("da-DK"); Thread.CurrentThread.CurrentCulture = culture;
        Assert.True("dåb".StartsWith("då")); // Succeeds
    }
}
All of the above tests should succeed according to my understanding of the language, and I'm Danish!
I don't have a degree in grammar, though. :-)
Seems like a bug to me.

Like Nappy said, it's a feature of the Danish language, where "aa" and "å" are still the same. Danish has two more letters, æ and ø, but I am not sure whether they can be written using two letters as well.
I think in the second example "aA" is not changed while "AA" is changed to "Å". Just to confuse things even more, "Aa" is considered equal to "AA" and "aa" only when using a case-insensitive comparison.

The modern spelling of "baptism" in Danish, namely dåb, is certainly not considered to start with da by a Danophone. If daab is supposed to be an old-fashioned spelling of dåb, it is a bit philosophical whether it starts with da or not. But for (modern) collation purposes, it does not (alphabetically, daab in this sense sorts after disk, not before).
However, if your string is not supposed to represent natural language, but is instead some kind of technical code, like hexadecimal digits, surely you do not want to use any culture-specific rules. The solution here is not to use the invariant culture. The invariant culture has (English) rules itself!
Instead, you want to use ordinal comparison.
Ordinal comparison simply compares the strings char by char, without any assumptions of what sequences are "equivalent" in some sense. (Technical remark: Each char is a UTF-16 code unit, not a "character". Ordinal comparison is ignorant of the rules of Unicode normalization.)
I think the confusion arises because, by default, some string methods use a culture-aware comparison, and other string methods use the ordinal comparison.
The following examples all use a culture-aware comparison:
"Straße".StartsWith("Strasse", StringComparison.CurrentCulture)
"Straße".Equals("Strasse", StringComparison.CurrentCulture)
"ne\u0301e".StartsWith("née", StringComparison.CurrentCulture)
"ne\u0301e".Equals("née", StringComparison.CurrentCulture)
"Straße".StartsWith("Strasse") // CurrentCulture is default for 'StartsWith'!
"ne\u0301e".StartsWith("née") // CurrentCulture is default for 'StartsWith'!
Each of the above may depend on the .NET version as well! (As an example, the first one gives true if the current culture is the invariant culture and you are under .NET Framework 4.8; but it gives false if the current culture is the invariant culture and you use .NET 6.)
But these examples use ordinal comparison:
"Straße".StartsWith("Strasse", StringComparison.Ordinal)
"Straße".Equals("Strasse", StringComparison.Ordinal)
"ne\u0301e".StartsWith("née", StringComparison.Ordinal)
"ne\u0301e".Equals("née", StringComparison.Ordinal)
"Straße".Equals("Strasse") // Ordinal is default for 'Equals'!
"ne\u0301e".Equals("née") // Ordinal is default for 'Equals'!
So remember to check what the default comparison is for the string method you use, and specify the opposite one if needed. (Or always specify the comparison, even when redundant, if you prefer.)
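The ordinal cases can be checked without relying on any culture. The sketch below, a C# 9+ top-level program, also shows one way to make an ordinal comparison tolerate different Unicode representations of the same text, by normalizing both sides first. The normalization step is an addition of this sketch, not something the answer prescribes, and the variable names are illustrative.

```csharp
using System;
using System.Text;

string decomposed = "ne\u0301e"; // 'e' followed by U+0301 COMBINING ACUTE ACCENT
string precomposed = "n\u00E9e"; // single code unit U+00E9

// Ordinal comparison sees different char sequences (4 code units versus 3).
Console.WriteLine(string.Equals(decomposed, precomposed, StringComparison.Ordinal)); // False

// Normalizing both sides to the same form first makes the ordinal comparison succeed.
bool equalAfterNormalizing = string.Equals(
    decomposed.Normalize(NormalizationForm.FormC),
    precomposed.Normalize(NormalizationForm.FormC),
    StringComparison.Ordinal);
Console.WriteLine(equalAfterNormalizing); // True

// With an explicit Ordinal argument, StartsWith no longer depends on the current culture.
Console.WriteLine("daab".StartsWith("da", StringComparison.Ordinal)); // True
```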
