Globally set String.Compare/ CompareInfo.Compare to Ordinal

I'm searching for a strategy with which I can set the default sort order of String.CompareTo to bytewise (ordinal). I need to do this without having to specify the sort order in each call to the method.
I have tried out several strategies without satisfactory results. I got as far as this:
CultureAndRegionInfoBuilder crib =
new CultureAndRegionInfoBuilder("foo", CultureAndRegionModifiers.Neutral);
CompareInfo compareInfo = new CustomCompareInfo();
crib.Register();
In this CustomCompareInfo I try to override the default CompareInfo class, but unfortunately this does not compile:
The type 'System.Globalization.CompareInfo' has no constructors defined
I'm stuck here. Got the feeling that a custom implementation of CompareInfo is the solution to my problem.
Got any ideas on this?
Edit: context of my question:
The project I'm working on is quite unusual: a huge codebase has been converted from another programming language to .NET. In that language, string comparison defaults to ordinal, and this difference from .NET is causing bugs in the converted codebase, so I figured the most elegant solution would be to configure .NET to the same default behavior.
Of course it is possible to reconvert the code using a comparison specifier. Or we could introduce an extension method which performs an ordinal (binary) comparison, and so on.
However, as far as I am concerned, from an architectural viewpoint, these solutions are less elegant. This is the reason why I am searching for a solution with which I can set this ordinal comparison globally on the framework.
Thanks in advance!

Sorry, you can't make this work. The CompareInfo class does have a constructor. But it is internal and takes a CultureInfo as an argument. The actual implementation involves private members of CultureInfo that reflect sorting tables built into mscorlib. They are not extensible.
This does actually work in VB.NET, presumably the reason you are pursuing this. It has an Option Compare statement that lets you select binary comparison. This is however not implemented with CultureInfo; it is done by the compiler, which recognizes a string comparison and replaces it with a custom VB.NET string comparison method that is aware of the selected Option Compare. Its name is Microsoft.VisualBasic.CompilerServices.Operators.CompareString()
You cannot coax the C# compiler into the same behavior. You'd have to painstakingly replace comparison expressions in converted vb.net code. A horrible job of course and very prone to mistakes. If the conversion was done by a converter program then you might be better off with a good decompiler, it won't hide the CompareString() calls.

There appears to be no means of setting the default comparison mode (here, to ordinal).
If what you want is always-consistent comparison results, you can set the culture to 'invariant' (a CultureInfo constructed with an empty string) on each thread you create in your app:
Thread.CurrentThread.CurrentCulture = new CultureInfo("");
If you want to perform ordinal comparisons for performance, I really think that nothing can be done globally - you will need to pass this option explicitly each time you perform a string comparison.
Can you tell us what you need exactly?
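For reference, passing the option explicitly could look like the sketch below. The extension method name CompareOrdinalTo is made up for illustration (it is the kind of helper the question mentions); string.CompareOrdinal and StringComparer.Ordinal are the framework's own ordinal entry points.

```csharp
using System;
using System.Collections.Generic;

public static class OrdinalStringExtensions
{
    // Hypothetical helper name; it only forwards to string.CompareOrdinal.
    public static int CompareOrdinalTo(this string left, string right)
        => string.CompareOrdinal(left, right);
}

public static class OrdinalDemo
{
    public static void Main()
    {
        // Culture-sensitive comparison: with normal globalization settings
        // "a" usually sorts before "B", so this is typically negative.
        Console.WriteLine(string.Compare("a", "B", StringComparison.InvariantCulture));

        // Ordinal comparison: 'B' (U+0042) precedes 'a' (U+0061).
        Console.WriteLine("a".CompareOrdinalTo("B") > 0); // True

        // Sorting with an explicit ordinal comparer.
        var words = new List<string> { "b", "B", "a", "A" };
        words.Sort(StringComparer.Ordinal);
        Console.WriteLine(string.Join(",", words)); // A,B,a,b
    }
}
```

This still has to be applied call by call, which is exactly the drawback discussed above, but it keeps the intent visible at each comparison.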

Related

How to use a 'hard-coded' dictionary/enum

I want to create a 'dictionary' of strings, but I have only ever learned how to use strings to reference what I want in a dictionary. I want something with more auto-correct support (as typos can happen in a large table of strings), which is why I want to know how to hard-code the keys. (The values of the strings will be retrieved from a text file, such as JSON.)
I notice that Microsoft uses some type of hard-coding in their String Resource File.
So instead of doing:
string result = strings["Hello"];
I wish to do this:
string result = strings.Hello;
The only thing I can think of is to use some external tool that creates an enum/struct script with the values from the text file. Is there a better option, perhaps one built into .NET?
Edit: I think 'strongly-typed' would be a better description over 'hard-coded'.
Edit 2: Thanks for all the comments and answers. By the looks of it, some code generation is required to achieve this result. I wonder if there are already any tools out there that do this for you (I tried looking but my terminology may be lacking). It doesn't seem too difficult to create such a tool.

There are compile-time constants and run-time constants.
Your wish for autocorrection/IntelliSense support requires compile-time constants. Those are the only ones that IntelliSense, syntax highlighting and the compiler double-check for you.
But your requirement that the values be generated from a third-party text file indicates either a run-time constant or some automatic code generation. Run-time constants would take away the editor support, while code generation would run into the issue of the editor only having an old copy of the file, plus a high risk of breaking tons of code if a string in that one file changes.
So your two requirements are inherently at odds. You need to have your cake and eat it too.
Perhaps my primitive solution to the Enum/ToString() problem might help you?
Enumerations are for the most part groups of constants, integer ones by default, with added type checks on assignment. That makes them a good way around Primitive Obsession. You reference a value from the group like you would any constant, readonly static field or readonly property. (There are other advantages like Flags, but I doubt they matter here.)
While enums have a string you could use for display and input parsing (the name you use in source code), that one is absolutely not suited for display. By default they are all-caps and you would need to support localisation down the line. My primitive solution was a translation layer. I add a Dictionary<someEnum, String> SomeEnumStringRepresentation. This dictionary can be generated and even changed at run time:
I need to display a specific value? It is SomeEnumStringRepresentation[someEnum]. I could add a default behavior that falls back to ToString(), the compiler's representation of the enum.
I need to parse user input? Iterate over the values until you find a match; if there is none, throw a ParseException.
I get to use compile-time checks, without having to deal with the very immutable compile-side strings anywhere else, or with my code-side strings changing all the time.
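A rough sketch of that translation layer, under a few assumptions: the enum Greeting, the class GreetingText and the display strings are invented for illustration, and FormatException stands in for the "ParseException" mentioned above because the .NET base library has no type of that name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enum; the member names are what IntelliSense and the compiler check.
public enum Greeting { Hello, Goodbye }

public static class GreetingText
{
    // Translation layer. It could be filled or replaced at run time,
    // for example from a parsed text or JSON file.
    public static readonly Dictionary<Greeting, string> Display =
        new Dictionary<Greeting, string>
        {
            { Greeting.Hello, "Hello there!" }
        };

    // Falls back to the enum member name when no display string is registered.
    public static string ToDisplay(Greeting value)
        => Display.TryGetValue(value, out string text) ? text : value.ToString();

    // Iterates over all values until a display string matches the input.
    public static Greeting Parse(string input)
    {
        foreach (Greeting value in Enum.GetValues(typeof(Greeting)))
        {
            if (string.Equals(ToDisplay(value), input, StringComparison.OrdinalIgnoreCase))
                return value;
        }
        throw new FormatException("No Greeting matches '" + input + "'.");
    }
}
```

Usage would be GreetingText.ToDisplay(Greeting.Hello), so a typo in the key is a compile error while the text itself stays replaceable.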

I don't quite understand what output you want, but I'll throw an idea out here: how about extending the string class with your own methods, so that when you use strings.Hello it returns what you want?
example :
https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/classes-and-structs/extension-methods

string vs System.String, int vs System.Int32 : another Alias vs Type Name question

Quite often I see source code where the language's keywords are replaced with full type names:
System.String, System.Int32, System.Guid etc.
Moreover, people who do this write complete type names everywhere, making the source full of declarations like this:
System.Collections.Generic.List<System.Reflection.PropertyInfo> list = new System.Collections.Generic.List<System.Reflection.PropertyInfo>(new System.Reflection.PropertyInfo[] { ... });
When I ask them why they do this, I get a wide range of answers: "It helps me avoid type name collisions", "It looks more professional", "my VS plugin does it for me automatically" etc.
I understand that writing full type names sometimes helps you avoid an unnecessary using if you use the type only once in the source file. And sometimes you need to qualify a type explicitly; a great example is the Threading Timer versus the WinForms Timer.
But if your source is full of DB calls and you still write System.Data.SqlClient.SqlCommand instead of 'SqlCommand', it looks quite strange to me.
What do you think? Am I right, or do I just not understand something?
Thank you!
P.S. Another phenomenon is writing if (0 != variable) instead of if (variable != 0).

The if (0 == variable) thing is a C++ convention used to protect against accidentally writing if (variable = 0), which is valid in C++ but doesn't do what's intended. It's completely unnecessary in C# as the wrong version doesn't compile, so you should use the other version instead as it reads better.
Personally I like to write as little as possible, so unless there is a namespace clash I never fully qualify things.
As for string vs String, I always use the alias (string / int) rather than the full class name; however, it's purely a convention and there is no runtime difference.
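The "no runtime difference" point can be checked directly. A tiny sketch (the class and method names are just for illustration):

```csharp
using System;

public static class AliasDemo
{
    // The C# keywords are aliases for the framework types, so the Type objects are identical.
    public static bool StringAliasIsSameType() => typeof(string) == typeof(System.String);
    public static bool IntAliasIsSameType() => typeof(int) == typeof(System.Int32);

    public static void Main()
    {
        Console.WriteLine(StringAliasIsSameType()); // True
        Console.WriteLine(IntAliasIsSameType());    // True
        Console.WriteLine(sizeof(int));             // 4 (int is always 32 bits in C#)
    }
}
```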

I'd argue strongly against "it looks more professional", as frankly it looks the opposite to me.
That said, if I was to use a single member of a namespace in the entire source file, I might use the full name there rather than have a using.
Favouring 0 != x over x != 0 etc. does have some advantages depending on overrides of equals and a few other things. This is more commonly so in some other languages, so can be a hangover from that. It's particularly common to see people favour putting the null first, as that way it's less likely to be turned into passing null to an equality override (again, more commonly a real issue in other languages). It can also avoid accidental assignment due to a typo, though yet again this is rarely an issue in C# (unless the type you are using is bool).
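To illustrate the bool exception mentioned above, here is a small sketch (method and class names are made up). With an int the typo is rejected by the compiler, but with a bool the assignment itself is a valid condition:

```csharp
using System;

public static class AssignmentTypoDemo
{
    // Intended: return ready == expected. The typo assigns instead of comparing.
    public static bool LooksEqual(bool ready, bool expected)
    {
        if (ready = expected) // compiles: the value of the assignment is a bool
            return true;
        return false;
    }

    public static void Main()
    {
        int count = 5;
        // if (count = 0) { }  // does not compile in C#: error CS0029 (cannot convert int to bool)
        if (0 != count) Console.WriteLine("non-zero"); // same meaning as (count != 0)

        Console.WriteLine(LooksEqual(false, true)); // True, although false != true
    }
}
```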

It is a bit subjective, but unless your coding standard says otherwise I think removing the namespace is always better as it is less verbose and makes for easier reading. If there is a namespace collision, use a shorter alias that means something.
As to your last point, compare name.Equals("Paul") with "Paul".Equals(name). They both do the same thing unless name is null. In that case, the first fails with a NullReferenceException, whilst the second (correctly?) returns false.
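Both points in a short sketch. The alias names ThreadingTimer and TimersTimer and the class NamingDemo are my own choices; I used System.Timers.Timer rather than the WinForms timer only so that the example has no UI dependency.

```csharp
using System;
using System.Threading;
// Aliases resolve the clash between two types that are both called Timer.
using ThreadingTimer = System.Threading.Timer;
using TimersTimer = System.Timers.Timer;

public static class NamingDemo
{
    // Literal first: safe when name is null.
    public static bool IsPaul(string name) => "Paul".Equals(name);

    // Variable first: throws NullReferenceException when name is null.
    public static bool IsPaulUnsafe(string name) => name.Equals("Paul");

    public static void Main()
    {
        Console.WriteLine(IsPaul(null)); // False

        try { IsPaulUnsafe(null); }
        catch (NullReferenceException) { Console.WriteLine("NullReferenceException"); }

        // The aliases make it obvious which Timer is meant.
        using (var timer = new ThreadingTimer(_ => { }, null, Timeout.Infinite, Timeout.Infinite))
        {
            Console.WriteLine(timer.GetType() != typeof(TimersTimer)); // True
        }
    }
}
```

The static string.Equals(a, b) is another null-safe option if putting the literal first reads awkwardly.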

For primitive data types: Duplicate questions here - C#, int or Int32? Should I care?
For non-primitive data types: The answers given are valid, especially "It helps to avoid type names collisions"
For if (0 != variable): variable is the subject to compare in the expression, it should go first. So, I would prefer if (variable != 0).

I don't find any of these reasons convincing. Adding using directives is better.
Two minor exceptions:
In generated code, you may see redundant namespace prefixes, but that is OK as this code is not intended to be edited.
Sometimes it is helpful to write Int32 explicitly when you depend on the type being exactly 32 bits.

Make code as readable as possible! See also: http://en.wikipedia.org/wiki/KISS_principle
"It looks professional" is a very bad argument.
BTW, if your code is full of SQL statements, then you may want to refactor that anyway.

About string vs String. I try to use string, when it is that, i.e. a string of characters as in any programming language. But when it is an object (or I am referring to the String class), I try to use String.

Culture Sensitive GetHashCode

I'm writing a C# application that will process some text and provide basic query functions. In order to ensure the best possible support for other languages, I am allowing the users of the application to specify the System.Globalization.CultureInfo (via the "en-GB" style code) and also the full range of collation options using the System.Globalization.CompareOptions flags enum.
For regular string comparison I'm then using a combination of:
a) String.Compare overload that accepts the culture and options
b) For some bulk processes I'm caching the byte data (KeyData) from CompareInfo.GetSortKey (overload that accepts the options) and using a byte-by-byte comparison of the KeyData.
This seemed fine (although please comment if you think these two methods shouldn't be mixed), but then I had reason to use the HashSet<> class which only has an overload for IEqualityComparer<>.
MS documentation seems to suggest that I should use StringComparer (which implements both IEqualityComparer<> and IComparer<>), but this only seems to support the "IgnoreCase" option from CompareOptions and not "IgnoreKanaType", "IgnoreSymbols", "IgnoreWidth" etc.
I'm assuming that a StringComparer that ignores these other options could produce different hashcodes for two strings that might be considered the same using my other comparison options. I'd therefore get incorrect results from my application.
My only thought at the moment is to create my own IEqualityComparer<> that generates a hashcode from the SortKey.KeyData and compares equality by using the String.Compare overload.
Any suggestions?

You will certainly need to implement your own IEqualityComparer<>, but I don't believe the hashcode necessarily has to play into it. Just use the string.Compare overload like you said.
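For what it's worth, the comparer described in the question could be sketched roughly like this. The class name is invented, and one caveat applies: HashSet<> does call GetHashCode before it ever calls Equals, so the hash has to agree with the comparison. Deriving it from the sort key, as the question suggests, is one way to get that.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical name; equality and hashing both honour the culture and options.
public sealed class CultureOptionsComparer : IEqualityComparer<string>
{
    private readonly CompareInfo _compareInfo;
    private readonly CompareOptions _options;

    public CultureOptionsComparer(CultureInfo culture, CompareOptions options)
    {
        _compareInfo = culture.CompareInfo;
        _options = options;
    }

    public bool Equals(string x, string y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return _compareInfo.Compare(x, y, _options) == 0;
    }

    public int GetHashCode(string s)
    {
        if (s is null) return 0;
        // Strings that compare equal under these options are expected to share a sort key,
        // so hashing the key bytes keeps GetHashCode consistent with Equals.
        byte[] key = _compareInfo.GetSortKey(s, _options).KeyData;
        unchecked
        {
            int hash = 17;
            foreach (byte b in key) hash = hash * 31 + b;
            return hash;
        }
    }
}
```

Usage: new HashSet<string>(new CultureOptionsComparer(CultureInfo.InvariantCulture, CompareOptions.IgnoreCase)). Building a sort key per lookup is not free, so caching keys (as the question already does for bulk work) may be worth measuring.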

Is there a way to set a DLL to always follow CultureInfo.InvariantCulture by default, if not specified?

I have a lot of code in a class library that does not specify CultureInfo.InvariantCulture. For example in toString operations, toBool, toInt, etc.
Is there a way I can set a property for the class library so that it always executes using CultureInfo.InvariantCulture, even if it is not explicitly specified everywhere in the code?
Sort of like a global switch?
It is not only messy to have to type it explicitly every time, it also makes my code less readable and is a royal pain. For example:
if (Convert.ToInt16(task.RetryCount, CultureInfo.InvariantCulture) <
Convert.ToInt16(ConfigurationManager.AppSettings["TasksMaxRetry"], CultureInfo.InvariantCulture))

While I agree that Mark's answer is the correct answer to the question posed, I don't think that switching the thread culture is a good design. It could introduce subtle bugs if other parts of the application, most likely the UI, depend on the thread's current culture. Also, I would argue that explicitly stating the culture in Convert calls is good design: it tells the reader of the code that the original programmer has made an active decision about which format to allow, and that the code is not just "working by coincidence".
You will most likely want to have many of your parse operations grouped together in the same class; perhaps one that deals with reading configuration. In that class, you could define a field to contain the culture you would like to use for parsing:
private static readonly IFormatProvider parseFormat = CultureInfo.InvariantCulture;
Then use that field in any calls to Convert methods or similar. Declaring the field as an IFormatProvider, together with a well chosen name, tells the reader of the code very explicitly, that this is a field used to define the parsing format. IMHO, it makes the intent of the code clearer.
Another way to do this would be to make your own Parse / Convert class, that wraps the Convert.ToXxx methods and calls them with the format you intend to use. Then you will have the desired benefit of not having to explicitly state the format in each call.
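A minimal sketch of such a wrapper, assuming a class name of InvariantConvert (my invention) and only a few of the Convert.ToXxx methods:

```csharp
using System;
using System.Globalization;

// Hypothetical wrapper: every call goes through the same, explicitly chosen format.
public static class InvariantConvert
{
    private static readonly IFormatProvider ParseFormat = CultureInfo.InvariantCulture;

    public static short ToInt16(object value) => Convert.ToInt16(value, ParseFormat);

    public static double ToDouble(string value) => Convert.ToDouble(value, ParseFormat);

    public static string ToString(IFormattable value) => value.ToString(null, ParseFormat);
}
```

The comparison from the question would then shrink to something like InvariantConvert.ToInt16(task.RetryCount) < InvariantConvert.ToInt16(ConfigurationManager.AppSettings["TasksMaxRetry"]), with the culture decision still recorded in one place.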

I don't think so, but it is possible to set the CultureInfo on a per-thread basis:
Console.WriteLine(double.Parse("1.000"));
Thread.CurrentThread.CurrentCulture = CultureInfo.InvariantCulture;
Console.WriteLine(double.Parse("1.000"));
Output on my machine (your output may vary depending on your current culture):
1000
1
Is this what you want?
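If the per-thread assignment is too tedious, .NET Framework 4.5 and later also offer CultureInfo.DefaultThreadCurrentCulture, which supplies the culture for threads that have not set one explicitly. Note that this is application-wide rather than scoped to one DLL, so the caveats about UI code still apply. A sketch (the class and method names are made up):

```csharp
using System;
using System.Globalization;
using System.Threading;

public static class CultureSetup
{
    // Call once at startup, before worker threads are created.
    public static void UseInvariantCultureByDefault()
    {
        CultureInfo.DefaultThreadCurrentCulture = CultureInfo.InvariantCulture;
        Thread.CurrentThread.CurrentCulture = CultureInfo.InvariantCulture;
    }

    public static void Main()
    {
        UseInvariantCultureByDefault();

        // The new thread never sets its own culture, so it picks up the default.
        var worker = new Thread(() => Console.WriteLine(double.Parse("1.000"))); // 1
        worker.Start();
        worker.Join();
    }
}
```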

Which is faster/more efficient: Dictionary<string,object> or Dictionary<enum,object>?

Are enum types faster/more efficient than string types when used as dictionary keys?
IDictionary<string,object> or IDictionary<enum,object>
As a matter of fact, which data type is most suitable as a dictionary key and why?
Consider the following: NOTE: Only 5 properties for simplicity
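struct MyKeys
{
// const members: instance field initializers in a struct do not compile without an explicit constructor
public const string Incomplete = "IN";
public const string Submitted = "SU";
public const string Processing = "PR";
public const string Completed = "CO";
public const string Closed = "CL";
}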
and
enum MyKeys
{
Incomplete,
Submitted,
Processing,
Completed,
Closed
}
Which of the above will be better if used as keys in a dictionary?

Certainly the enum version is better (when both are applicable and make sense, of course). Not just for performance (which can be better or worse, see Rashack's very good comment) but because it's checked at compile time and results in cleaner code.
You can circumvent the comparer issue by using Dictionary<int, object> and casting enum keys to ints or specifying a custom comparer.

I think you should start by focusing on correctness. This is far more important than the minor performance differences that may occur within your program. In this case I would focus on the proper representation of your types (enum appears to be best). Then later on, profile your application, and if there is an issue, then and only then should you fix it.
Making code faster later in the process is typically a straightforward process. Take the link that skolima provided. If you had chosen enum, it would have been a roughly 10 minute fix to remove a potential performance problem in your application. I want to stress the word potential here. This was definitely a problem for NHibernate, but whether or not it would be a problem for your program would be determined solely by how it is used.
On the other hand, making code more correct later in the process tends to be more difficult. In a large enough problem you'll find that people start taking dependencies on the side effects of the previous bad behavior. This can make correcting code without breaking other components challenging.

Use enum to get cleaner and nicer code, but remember to provide a custom comparer if you are concerned with performance: http://ayende.com/Blog/archive/2009/02/21/dictionaryltenumtgt-puzzler.aspx .
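A custom comparer for the MyKeys enum from the question might look like the sketch below (the comparer and demo class names are made up). As I understand it, the cost described in the linked post came from boxing in the default comparer on older framework versions; newer runtimes may not need this at all, so measure before adding it.

```csharp
using System;
using System.Collections.Generic;

public enum MyKeys { Incomplete, Submitted, Processing, Completed, Closed }

// Compares and hashes the enum through its underlying int, without boxing.
public sealed class MyKeysComparer : IEqualityComparer<MyKeys>
{
    public bool Equals(MyKeys x, MyKeys y) => x == y;
    public int GetHashCode(MyKeys key) => (int)key;
}

public static class EnumKeyDemo
{
    public static Dictionary<MyKeys, object> Create()
        => new Dictionary<MyKeys, object>(new MyKeysComparer());

    public static void Main()
    {
        var items = Create();
        items[MyKeys.Submitted] = "order 17";
        Console.WriteLine(items[MyKeys.Submitted]);            // order 17
        Console.WriteLine(items.ContainsKey(MyKeys.Closed));   // False
    }
}
```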

I would guess that the enum version is faster. Under the hood the dictionary references everything by hashcode. My guess is that it is slower to generate the hashcode for a string. However, this is probably negligibly slower, and is most certainly faster than anything like a string compare. I agree with the other posters who said that an enum is cleaner.
