I am trying to implement a simple algorithm using C#'s Dictionary:
My 'outer' dictionary looks like this: Dictionary<paramID, Dictionary<string, object>> (where paramID is simply an identifier class holding two strings).
If key 'x' is already in the outer dictionary, add the specific entry to that record's inner dictionary; if it doesn't exist, add a new entry to the outer dictionary and then add the entry to its inner dictionary.
Somehow, when I use TryGetValue it always returns false, so it always creates new entries in the outer dictionary, which produces duplicates.
My code looks more or less like this:
Dictionary<string, object> tempDict = new Dictionary<string, object>();
if(outerDict.TryGetValue(new paramID(xKey, xValue), out tempDict))
{
tempDict.Add(newKey, newValue);
}
The block inside the if is never executed, even when that specific entry exists in the outer Dictionary.
Am I missing something? (If you want, I can post screenshots from the debugger, or anything else you need.)
If you haven't overridden Equals and GetHashCode on your paramID type, and it's a class rather than a struct, then the default meaning of equality is in effect, and each paramID is only equal to itself.
You likely want something like:
public class ParamID : IEquatable<ParamID> // IEquatable makes this faster
{
private readonly string _first; //not necessary, but immutability of keys prevents other possible bugs
private readonly string _second;
public ParamID(string first, string second)
{
_first = first;
_second = second;
}
public bool Equals(ParamID other)
{
//change for case-insensitive, culture-aware, etc.
return other != null && _first == other._first && _second == other._second;
}
public override bool Equals(object other)
{
return Equals(other as ParamID);
}
public override int GetHashCode()
{
//change for case-insensitive, culture-aware, etc.
int fHash = _first.GetHashCode();
return ((fHash << 16) | (fHash >> 16)) ^ _second.GetHashCode();
}
}
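To see why this fixes the question's symptom, here is a minimal sketch (the key strings and entry names are invented for illustration): with Equals and GetHashCode overridden, TryGetValue now finds an entry that was added under a *different* but equal ParamID instance, so no duplicate outer entries appear.

```csharp
using System;
using System.Collections.Generic;

// Two distinct ParamID instances with the same strings now address
// the same entry in the outer dictionary.
var outerDict = new Dictionary<ParamID, Dictionary<string, object>>();
outerDict.Add(new ParamID("x", "1"), new Dictionary<string, object>());

// A different instance with the same state succeeds in the lookup:
if (outerDict.TryGetValue(new ParamID("x", "1"), out var inner))
{
    inner.Add("newKey", "newValue");
}

Console.WriteLine(outerDict.Count); // 1 — no duplicate was created

public class ParamID : IEquatable<ParamID>
{
    private readonly string _first;
    private readonly string _second;

    public ParamID(string first, string second)
    {
        _first = first;
        _second = second;
    }

    public bool Equals(ParamID other)
        => other != null && _first == other._first && _second == other._second;

    public override bool Equals(object other) => Equals(other as ParamID);

    public override int GetHashCode()
    {
        int fHash = _first.GetHashCode();
        return ((fHash << 16) | (fHash >> 16)) ^ _second.GetHashCode();
    }
}
```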
For the requested explanation, I'm going to do a different version of ParamID where the string comparison is case-insensitive and ordinal rather than culture-based (a form appropriate for computer-readable codes, e.g. matching keywords in a case-insensitive computer language or case-insensitive identifiers like language tags, but not for something human-readable: it will not realise that "SS" is a case-insensitive match for "ß"). This version also considers {"A", "B"} to match {"B", "A"}; that is, it doesn't care which way around the strings are. By doing a different version with different rules, it should be possible to touch on a few of the design considerations that come into play.
Let's start with our class containing just the two fields that are its state:
public class ParamID
{
private readonly string _first; //not necessary, but immutability of keys prevents other possible bugs
private readonly string _second;
public ParamID(string first, string second)
{
_first = first;
_second = second;
}
}
At this point if we do the following:
ParamID x = new ParamID("a", "b");
ParamID y = new ParamID("a", "b");
ParamID z = x;
bool a = x == y;//a is false
bool b = z == x;//b is true
This is because, by default, a reference type is only equal to itself. Why? Firstly, sometimes that's just what we want; secondly, it isn't always clear what else we might want without the programmer defining how equality works.
Note also that if ParamID were a struct, it would have equality defined much like what you wanted. However, the implementation would be rather inefficient, and also buggy if it contained a decimal, so either way it's always a good idea to implement equality explicitly.
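As a quick illustration of that struct behaviour (a sketch; the struct name is invented here), the same two-string pair declared as a struct compares by value out of the box, via ValueType.Equals:

```csharp
using System;

var a = new ParamStruct("a", "b");
var b = new ParamStruct("a", "b");

// ValueType.Equals compares the fields (by reflection here, since the
// struct contains reference-type fields), so this prints True:
Console.WriteLine(a.Equals(b));

public struct ParamStruct
{
    public readonly string First;
    public readonly string Second;
    public ParamStruct(string first, string second) { First = first; Second = second; }
}
```

Note that == is still unavailable: structs don't get an == operator unless you define one.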
The first thing we'll do to give this a different concept of equality is to implement IEquatable<ParamID>. This is not strictly necessary (and didn't exist until .NET 2.0), but:
It will be more efficient in a lot of use cases, including when used as the key to a Dictionary<TKey, TValue>.
It's easy to do the next step with this as a starting point.
Now, there are four rules we must follow when we implement an equality concept:
An object must still be always equal to itself.
If X == Y and X != Z, then later if the state of none of those objects has changed, X == Y and X != Z still.
If X == Y and Y == Z, then X == Z.
If X == Y and Y != Z then X != Z.
Most of the time you'll end up following all these rules without even thinking about it; you only need to check them if you're being particularly strange and clever in your implementation. Rule 1 is also something we can take advantage of for a performance boost in some cases:
public class ParamID : IEquatable<ParamID>
{
private readonly string _first; //not necessary, but immutability of keys prevents other possible bugs
private readonly string _second;
public ParamID(string first, string second)
{
_first = first;
_second = second;
}
public bool Equals(ParamID other)
{
if(other == null)
return false;
if(ReferenceEquals(this, other))
return true;
if(string.Compare(_first, other._first, StringComparison.OrdinalIgnoreCase) == 0 && string.Compare(_second, other._second, StringComparison.OrdinalIgnoreCase) == 0)
return true;
return string.Compare(_first, other._second, StringComparison.OrdinalIgnoreCase) == 0 && string.Compare(_second, other._first, StringComparison.OrdinalIgnoreCase) == 0;
}
}
The first thing we've done is see whether we're being compared for equality with null. We almost always want to return false in such cases (not always, but the exceptions are very, very rare, and if you don't know for sure that you're dealing with such an exception, you almost certainly are not), and we certainly don't want to throw a NullReferenceException.
The next thing we do is to see if the object is being compared with itself. This is purely an optimisation. In this case, it's probably a waste of time, but it can be very useful with more complicated equality tests, so it's worth pointing out this trick here. This takes advantage of the rule that identity entails equality, that is, any object is equal to itself (Ayn Rand seemed to think this was somehow profound).
Finally, having dealt with these two special cases, we get to the actual rule for equality. As I said above, my example considers two objects equal if they have the same two strings, in either order, for case-insensitive ordinal comparisons, so I've a bit of code to work that out.
(Note that the order in which we compare component parts can have a performance impact. Not in this case, but with a class that contains both an int and a string we would compare the ints first, because that is faster and we may hence find an answer of false before we even look at the strings.)
Now at this point we've a good basis for overriding the Equals method defined in object:
public override bool Equals(object other)
{
return Equals(other as ParamID);
}
Since as will return a ParamID reference if other is a ParamID, and null for anything else (including when null was what we were passed in the first place), and since we already handle comparison with null, we're all set.
Try to compile at this point and you will get a warning that you have overridden Equals but not GetHashCode (the same is true if you'd done it the other way around).
GetHashCode is used by the dictionary (and other hash-based collections like Hashtable and HashSet<T>) to decide where to place the key internally. It will take the hashcode, re-hash it down to a smaller value in a way that is its business, and use it to place the object in its internal store.
Because of this, it's clear why the following is a bad idea were ParamID not readonly on all fields:
ParamID x = new ParamID("a", "b");
dict.Add(x, 33);
x.First = "c";//x will now likely never be found in dict because its hashcode doesn't match its position!
This means the following rules apply to hash-codes:
Two objects considered equal, must have the same hashcode. (This is a hard rule, you will have bugs if you break it).
While we can't guarantee uniqueness, the more spread out the returned results, the better. (Soft rule, you will have better performance the better you do at it).
(Well, 2½.) While not a strict rule, if we take such a complicated approach to point 2 above that it takes forever to return a result, the net effect will be worse than if we had a poorer-quality hash. So we want to be reasonably quick too, if we can.
Despite the last point, it's rarely worth memoising the results. Hash-based collections will normally memoise the value themselves, so it's a waste to do so in the object.
For the first implementation, because our approach to equality depended upon the default approach to equality of the strings, we could use the strings' default hashcodes. For my different version I'll use another approach that we'll explore more later:
public override int GetHashCode()
{
return StringComparer.OrdinalIgnoreCase.GetHashCode(_first) ^ StringComparer.OrdinalIgnoreCase.GetHashCode(_second);
}
Let's compare this to the first version. In both cases we get hashcodes of the component parts. If the values were integers, chars or bytes we would have worked with the values themselves, but here we build on the work done in implementing the same logic for those parts. In the first version we use the GetHashCode of string itself, but since "a" has a different hashcode to "A" that won't work here, so we use a class that produces a hashcode ignoring that difference.
The other big difference between the two is that in the first case we mix the bits up more with ((fHash << 16) | (fHash >> 16)). The reason for this is to avoid duplicate hashes. We can't produce a perfect hashcode where every different object has a different value, because there are only 4294967296 possible hashcode values but many more possible values for ParamID (including null, which is treated as having a hashcode of 0). (There are cases where perfect hashes are possible, but they bring in different concerns than we have here.) Because of this imperfection we have to think not only about which values are possible, but which are likely. Generally, shifting bits as we've done in the first version prevents common values from having the same hash; in particular, we don't want {"A", "B"} to hash the same as {"B", "A"}.
It's an interesting experiment to produce a deliberately poor GetHashCode that always returns 0. It'll work, but instead of being close to O(1), dictionary operations will be O(n), and poor as O(n) implementations go at that!
The second version doesn't do that, because it has different rules: for it we actually want values that are the same but for being switched around to compare as equal, and hence to have the same hashcode.
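Here's a sketch of that experiment, using a deliberately bad comparer (the class name is invented) so we don't have to modify the key type itself:

```csharp
using System;
using System.Collections.Generic;

// Every key hashes to 0, so all entries land in one bucket and each
// lookup degenerates to a linear scan. Correctness is preserved;
// only performance suffers.
var dict = new Dictionary<string, int>(new ZeroHashComparer());
for (int i = 0; i < 1000; i++)
    dict["key" + i] = i;

Console.WriteLine(dict["key500"]); // 500 — still correct despite the bad hash

class ZeroHashComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) => x == y;
    public int GetHashCode(string obj) => 0; // the deliberately poor hash
}
```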
The other big difference is the use of StringComparer.OrdinalIgnoreCase. This is an instance of StringComparer which, among other interfaces, implements IEqualityComparer<string> and IEqualityComparer. There are two interesting things about the IEqualityComparer<T> and IEqualityComparer interfaces.
The first is that hash-based collections (such as Dictionary<TKey, TValue>) all use them; it's just that unless an instance of one is passed to their constructor, they use EqualityComparer<T>.Default, which calls into the Equals and GetHashCode methods we've described above.
The other, is that it allows us to ignore the Equals and GetHashCode mentioned above, and provide them from another class. There are three advantages to this:
We can use them in cases (string is a classic case) where there is more than one likely definition of "equals".
We can ignore the definition chosen by the class's author, and provide our own.
We can use them to avoid a particular attack. This attack is based on being in a situation where input you provide will be hashed by the code you are attacking. You pick input so as to deliberately provide objects that are different but hash the same. This means that the poor performance we talked about avoiding earlier is hit, and it can be so bad that it becomes a denial-of-service attack. By providing different IEqualityComparer implementations with random elements in the hash code (but the same for every instance of the comparer) we can vary the algorithm enough each time to thwart the attack. The use for this is rare (it has to be something that will hash based purely on outside input that is large enough for the poor performance to really hurt), but vital when it comes up.
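As a concrete example of the first point, the standard Dictionary constructor overload accepts such a comparer, changing what counts as the "same" key without touching string itself:

```csharp
using System;
using System.Collections.Generic;

// The comparer supplied at construction defines key equality and hashing:
var dict = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
dict["Hello"] = 1;

Console.WriteLine(dict.ContainsKey("HELLO")); // True — case is ignored
Console.WriteLine(dict["hello"]);             // 1
```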
Finally, if we override Equals we may or may not want to override == and != too. It can be useful to keep them referring to identity only (there are times when that is what we care most about), but it can also be useful to have them follow the new semantics ("abc" == "ab" + "c" is an example of such an override).
In summary:
The default equality of reference objects is identity (equal only to itself).
The default equality of value types is a simple comparison of all fields (but with poor performance).
We can change the concept of equality for our classes in either case, but this MUST involve both Equals and GetHashCode*
We can override this and provide another concept of equality.
Dictionary, HashSet, ConcurrentDictionary, etc. all depend on this.
Hashcodes represent a mapping from all values of an object to a 32-bit number.
Hashcodes must be the same for objects we consider equal.
Hashcodes must be spread well.
*Incidentally, anonymous classes have a simple comparison like that of value types, but with better performance, which matches almost any case in which we might care about the hash code of an anonymous type.
Most likely, paramID does not implement equality comparison correctly.
It should implement IEquatable<paramID>, and that means especially that the GetHashCode implementation must adhere to the requirements (see "Notes to Implementers").
As for keys in dictionaries, MSDN says:
As long as an object is used as a key in the Dictionary(Of TKey, TValue), it must not change in any way that affects its hash value. Every key in a Dictionary(Of TKey, TValue) must be unique according to the dictionary's equality comparer. A key cannot be Nothing, but a value can be, if the value type TValue is a reference type.
Dictionary(Of TKey, TValue) requires an equality implementation to determine whether keys are equal. You can specify an implementation of the IEqualityComparer(Of T) generic interface by using a constructor that accepts a comparer parameter; if you do not specify an implementation, the default generic equality comparer EqualityComparer(Of T).Default is used. If type TKey implements the System.IEquatable(Of T) generic interface, the default equality comparer uses that implementation.
Since you don't show the paramID type I cannot go into more detail.
As an aside: that's a lot of keys and values getting tangled in there. There's a dictionary inside a dictionary, and the keys of the outer dictionary aggregate some kind of value as well. Perhaps this arrangement can be advantageously simplified? What exactly are you trying to achieve?
Use the Dictionary.ContainsKey method.
And so:
Dictionary<string, object> tempDict;
paramID searchKey = new paramID(xKey, xValue);
if(outerDict.ContainsKey(searchKey))
{
outerDict.TryGetValue(searchKey, out tempDict);
tempDict.Add(newKey, newValue);
}
Also, don't forget to override the Equals and GetHashCode methods in order to compare two paramIDs correctly:
class paramID
{
// rest of things
public override bool Equals(object obj)
{
paramID p = obj as paramID;
if (p == null) return false;
// how do you determine if two paramIDs are the same?
return p.key == this.key;
}
public override int GetHashCode()
{
return this.key.GetHashCode();
}
}
Related
Is there a difference between these two methods?
public class A
{
public int Count { get; set; }
}
public A Increment(A instance)
{
instance.Count++;
return instance;
}
public void Increment(A instance)
{
instance.Count++;
}
I mean, apart from one method returning the same reference and the other not returning anything, both accomplish the same thing: incrementing the Count property of the instance passed as an argument.
Is there an advantage of using one against the other? I generally tend to use the former because of method chaining, but is there a performance tradeoff?
One of the advantages of the latter method, for example, is that the method cannot replace the caller's reference with a new one:
public void Increment(A instance)
{
instance.Count++;
instance = new A(); //This new object has local scope, the original reference is not modified
}
This could be considered a defensive approach against new implementations of an interface.
I don't want this to be opinion based, so I am explicitly looking for concrete advantages (or disadvantages), taken out from the documentation or the language's specification.
One of the advantages of the latter method, for example, is that one cannot create a new reference.
You could consider that one of the disadvantages. Consider:
public A Increment(A instance)
{
return new A { Count = instance.Count +1 };
}
Or
public A Increment()
{
return new A { Count = this.Count +1 };
}
Apply this consistently, and you can have your A classes being immutable, with all the advantages that brings.
It also allows for different types that implement the same interface to be returned. This is how Linq works:
Enumerable.Range(0, 1) // RangeIterator
.Where(i => i % 2 == 0) // WhereEnumerableIterator<int>
.Select(i => i.ToString()) // WhereSelectEnumerableIterator<int, string>
.Where(i => i.Length != 1) // WhereEnumerableIterator<string>
.ToList(); // List<string>
While each operation acts on the type IEnumerable<int> each result is implemented by a different type.
Mutating fluent methods, like the one you suggest, are pretty rare in C#. They are more common in languages without the sort of properties C# supports, where it's convenient to do:
someObject.setHeight(23).setWidth(143).setDepth(10);
But in C# such setXXX methods are rare, with property setters being more common, and they can't be fluent.
The main exception is StringBuilder because its very nature means that repeatedly calling Append() and/or Insert() on it with different values is very common, and the fluent style lends itself well to that.
Otherwise, the fact that mutating fluent methods aren't common means that all you really get by supplying one is the minute extra cost of returning the field. It is minute, but it gains nothing when used in the more idiomatic C# style that is going to ignore it.
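A quick sketch of the StringBuilder case for contrast: each Append returns the same mutated instance, which is exactly what makes the chaining style work there.

```csharp
using System;
using System.Text;

var sb = new StringBuilder();
string result = sb.Append("Hello").Append(", ").Append("world").ToString();
Console.WriteLine(result); // Hello, world

// The chain works because Append returns the very same builder:
Console.WriteLine(ReferenceEquals(sb, sb.Append("!"))); // True
```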
For an external method to both mutate an object and return it would be unusual, and that could lead someone to assume that you didn't mutate the object, since you were returning the result.
E.g. upon seeing:
public static IList<T> SortedList<T>(IList<T> list);
Someone using the code might assume that after the call, list was left alone rather than sorted in place, and also that the two objects would be distinct and could be mutated separately.
For that reason alone it would be a good idea to either return a new object, or to return void to make the mutating nature more obvious.
We could though have short-cuts when returning a new object:
public static T[] SortedArray<T>(T[] array)
{
if (array.Length == 0) return array;
T[] newArray = new T[array.Length];
Array.Copy(array, newArray, array.Length);
Array.Sort(newArray);
return newArray;
}
Here we take advantage of the fact that since empty arrays are essentially immutable (they have no elements to mutate, and they can't be added to) for most uses returning the same array is the same as returning a new array. (Compare with how string implements ICloneable.Clone() by returning this). As well as reducing the amount of work done, we reduce the number of allocations, and hence the amount of GC pressure. Even here though we need to be careful (someone keying a collection on object identity will be stymied by this), but it can be useful in many cases.
Short answer - it depends.
Long answer - I would consider returning the instance of the object if you are using a builder pattern or where you need chaining of methods.
Most other cases look like a code smell: if you are in control of the API and you find many places where the returned object is not used, why bother with the extra effort? You may well create subtle bugs.
Given an instance of an object in C#, how can I determine if that object has value semantics? In other words, I want to guarantee that an object used in my API is suitable to be used as a dictionary key. I was thinking about something like this:
var type = instance.GetType();
var d1 = FormatterServices.GetUninitializedObject(type);
var d2 = FormatterServices.GetUninitializedObject(type);
Assert.AreEqual(d1.GetHashCode(), d2.GetHashCode());
What do you think of that approach?
You can test for implementation of Equals() and GetHashCode() with this:
s.GetType().GetMethod("GetHashCode").DeclaringType == s.GetType()
or rather, per @hvd's suggestion:
s.GetType().GetMethod("GetHashCode").DeclaringType != typeof(object)
For some object s, if GetHashCode() is not implemented by its type, this will be false; otherwise true.
One thing to be careful of is that this will not protect against a poor implementation of Equals() or GetHashCode(): it would evaluate to true even if the implementation were public override int GetHashCode() { return 0; }.
Given the drawbacks, I would tend towards documenting your types ("this type should / should not be used for a dictionary key..."), because this isn't something you could ultimately depend upon. If the implementation of Equals() or GetHashCode() was flawed instead of missing, it would pass this test but still have a run-time error.
FormatterServices.GetUninitializedObject can put the object in an invalid state; it breaks the guaranteed assignment of readonly fields, etc. Any code which assumes that fields will not be null will break. I wouldn't use it.
You can check whether GetHashCode and Equals are overridden via reflection, but that's not enough: the override could simply call the base class method, and that doesn't count as value semantics.
By the way, value semantics doesn't follow from equal hashcodes; that could be a collision too. Value semantics means that two objects with equal properties should return the same hashcode, and their Equals method should evaluate to true.
I suggest you create an instance, assign some properties, and clone it; now both hashcodes should be equal, and object.Equals(original, clone) should evaluate to true.
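A minimal sketch of that suggestion (the helper name is invented): construct two instances with identical state and check that both Equals and GetHashCode agree, which exercises the real implementations rather than uninitialized objects.

```csharp
using System;

// Returns true when two identically constructed instances compare equal
// and hash identically — a rough smoke test for value semantics.
static bool HasValueSemantics<T>(Func<T> makeOne)
{
    T original = makeOne();
    T copy = makeOne();
    return object.Equals(original, copy)
        && original.GetHashCode() == copy.GetHashCode();
}

Console.WriteLine(HasValueSemantics(() => "abc"));        // True: string has value semantics
Console.WriteLine(HasValueSemantics(() => new object())); // False: reference identity only
```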
You can see if an object defines its own Equals and GetHashCode using the DeclaringType property on the corresponding MethodInfo:
bool definesEquality = type.GetMethod("Equals", new[] { typeof(object) }).DeclaringType == type && type.GetMethod("GetHashCode", Type.EmptyTypes).DeclaringType == type;
How do I create a class to store a range of any type, provided that the type allows comparisons, so the constructor can ensure that the first value provided is less than the second?
public class Range<T> where T : IComparable<T>
{
private readonly T lowerBound;
private readonly T upperBound;
/// <summary>
/// Initializes a new instance of the Range class
/// </summary>
/// <param name="lowerBound">The smaller number in the Range tuplet</param>
/// <param name="upperBound">The larger number in the Range tuplet</param>
public Range(T lowerBound, T upperBound)
{
if (lowerBound > upperBound)
{
throw new ArgumentException("lowerBound must be less than upper bound", lowerBound.ToString());
}
this.lowerBound = lowerBound;
this.upperBound = upperBound;
}
}
I am getting the error:
Error 1 Operator '>' cannot be applied to operands of type 'T' and 'T' C:\Source\MLR_Rebates\DotNet\Load_MLR_REBATE_IBOR_INFO\Load_MLR_REBATE_IBOR_INFO\Range.cs 27 17 Load_MLR_REBATE_IBOR_INFO
You could use
where T : IComparable<T>
... or you could just use an IComparer<T> in your code, defaulting to Comparer<T>.Default.
This latter approach is useful as it allows ranges to be specified even for types which aren't naturally comparable to each other, but could be compared in a custom, sensible way.
On the other hand, it does mean that you won't catch incomparable types at compile time.
(As an aside, creating a range type introduces a bunch of interesting API decisions around whether you allow reversed ranges, how you step over them, etc. Been there, done that, was never entirely happy with the results...)
You cannot constrain a T to support a given set of operators, but you can constrain to IComparable<T>
where T : IComparable<T>
Which at least allows you to use first.CompareTo(second). Your basic numeric types, plus strings, DateTimes, etc., implement this interface.
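With that constraint in place, the guard in the question's constructor can be rewritten using CompareTo. A sketch (field names follow the question; the demo values are invented):

```csharp
using System;

var ok = new Range<int>(1, 10); // accepted
bool rejected = false;
try
{
    var bad = new Range<int>(10, 1); // reversed bounds
}
catch (ArgumentException)
{
    rejected = true;
}
Console.WriteLine(rejected); // True

public class Range<T> where T : IComparable<T>
{
    private readonly T lowerBound;
    private readonly T upperBound;

    public Range(T lowerBound, T upperBound)
    {
        // CompareTo replaces the unavailable > operator:
        if (lowerBound.CompareTo(upperBound) > 0)
            throw new ArgumentException("lowerBound must be less than or equal to upperBound");
        this.lowerBound = lowerBound;
        this.upperBound = upperBound;
    }
}
```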
To combine two suggestions already given, we combine the ability to create Ranges with a manually defined comparison rule, with an over-ride for those types that implement IComparable<T>, and with compile-time safety on the latter.
We take much the same approach as the static Tuple class' Create method. This can also offer concision in allowing us to rely upon type inference:
public static class Range // just a class to a hold the factory methods
{
public static Range<T> Create<T>(T lower, T upper) where T : IComparable<T>
{
return new Range<T>(lower, upper, Comparer<T>.Default);
}
//We don't need this overload, but it adds consistency in that we can always
//use Range.Create to create any range we want.
public static Range<T> Create<T>(T lower, T upper, IComparer<T> cmp)
{
return new Range<T>(lower, upper, cmp);
}
}
public class Range<T>
{
private readonly T lowerBound;
private readonly T upperBound;
private readonly IComparer<T> _cmp;
public Range(T lower, T upper, IComparer<T> cmp)
{
if(lower == null)
throw new ArgumentNullException("lower");
if(upper == null)
throw new ArgumentNullException("upper");
if((_cmp = cmp).Compare(lower, upper) > 0)
throw new ArgumentOutOfRangeException("lower", "Argument \"lower\" cannot be greater than \"upper\".");
lowerBound = lower;
upperBound = upper;
}
}
Now we can't accidentally construct a Range with the default comparer where it won't work, but we can also leave the comparer out and have the code compile only if it will work.
Edit:
There are two main approaches to having items comparable in an order-giving way in .NET and this uses both.
One way is to have a type define its own way of being compared with another object of the same type*. This is done by IComparable<T> (or the non-generic IComparable, but then you have to catch type mismatches at run-time, so it isn't as useful post-.NET 1.1).
int for example, implements IComparable<int>, which means we can do 3.CompareTo(5) and receive a negative number indicating that 3 comes before 5 when the two are put into order.
Another way is to have an object that implements IComparer<T> (and likewise a non-generic IComparer that is less useful post .NET1.1). This is used to compare two objects, generally of a different type to the comparer. We explicitly use this either because a type we are interested in doesn't implement IComparable<T> or because we want to override the default sorting order. For example we could create the following class:
public class EvenFirst : IComparer<int>
{
public int Compare(int x, int y)
{
int evenOddCmp = x % 2 - y % 2;
if(evenOddCmp != 0)
return evenOddCmp;
return x.CompareTo(y);
}
}
If we used this to sort a list of integers (list.Sort(new EvenFirst())), it would put all the even numbers first, and all the odd numbers last, but have the even and odd numbers in normal order within their block.
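Concretely, the comparer above sorts a small list like this (a self-contained sketch, repeating the class for completeness):

```csharp
using System;
using System.Collections.Generic;

var list = new List<int> { 3, 1, 4, 2 };
list.Sort(new EvenFirst());
Console.WriteLine(string.Join(",", list)); // 2,4,1,3 — evens first, each group in order

public class EvenFirst : IComparer<int>
{
    public int Compare(int x, int y)
    {
        int evenOddCmp = x % 2 - y % 2; // assumes non-negative ints
        if (evenOddCmp != 0)
            return evenOddCmp;
        return x.CompareTo(y);
    }
}
```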
Okay, so now we've got two different ways of comparing instances of a given type, one which is provided by the type itself and which is generally the "most natural", which is great, and one which gives us more flexibility, which is also great. But this means that we will have to write two versions of any piece of code that cares about such comparisons - one that uses IComparable<T>.CompareTo() and one that uses IComparer<T>.Compare().
It gets worse if we care about two types of objects. Then we need 4 different methods!
The solution is provided by Comparer<T>.Default. This static property gives us an implementation of IComparer<T>.Compare() for a given T that calls into IComparable<T>.CompareTo.
So, now we generally only ever write our methods to make use of IComparer<T>.Compare(). Providing a version that uses CompareTo for the most common sort of comparison is just a matter of an overload that uses the default comparer. E.g. instead of:
public void SortStrings(IComparer<string> cmp)//lets caller decide about case-sensitivity etc.
{
//pretty complicated sorting code that uses cmp.Compare(string1, string2)
}
public void SortStrings()
{
//equally complicated sorting code that uses string.CompareTo()
}
We have:
public void SortStrings(IComparer<string> cmp)//lets caller decide about case-sensitivity etc.
{
//pretty complicated sorting code that uses cmp.Compare(string1, string2)
}
public void SortStrings()
{
SortStrings(Comparer<string>.Default);//simple one-line code to re-use all the above.
}
As you can see, we've the best of both worlds here. Someone who just wants the default behaviour calls SortStrings(), someone who wants a more specific comparison rule to be used calls e.g. SortStrings(StringComparer.CurrentCultureIgnoreCase), and the implementation only had to do a tiny bit of work to offer that choice.
This is what is done with the suggestion for Range here. The constructor always takes an IComparer<T> and always uses its Compare, but there's a factory method that calls it with Comparer<T>.Default to offer the other behaviour.
Note that we don't strictly need this factory method; we can just use an overload of the constructor:
public Range(T lower, T upper)
:this(lower, upper, Comparer<T>.Default)
{
}
The downside, though, is that we can't add a where clause to this overload to restrict it to cases where it'll work. This means that if we called it with a type that didn't implement IComparable<T> we'd get an ArgumentException at runtime rather than a compiler error. This was Jon's point when he said:
On the other hand, it does mean that you won't catch incomparable types at compile time.
The use of the factory method is purely to ensure this can't happen. Personally, I'd probably just go with the constructor overload and try to be sure not to call it inappropriately, but I added the bit with the factory method since it combines two things that had come up in this thread.
*Strictly, there's nothing to stop e.g. A : IComparable<B>, but besides being of little use in the first place, for most uses one doesn't know whether the calling code will end up calling a.CompareTo(b) or b.CompareTo(a), so it doesn't work unless we do the same on both classes. In short, if it can't be pushed up to a common base class it's just going to get messy fast.
You can use IComparable interface which is used widely in the .NET framework.
Say you have two different classes, each with its own implementation of Equals; which one is used? What if only one of them has one? Or neither? Are any of the following lines equivalent?
object .Equals( first, second )
first .Equals( second )
second .Equals( first )
I'm guessing that the first two might be equivalent, but I don't really have a clue.
What does it really do?
Basically it does three things:
Check for reference equality (return true if so)
Check for reference nullity (return false if either value is null; by now the null == null case has been handled)
Check for value equality with first.Equals(second)
The ordering shouldn't matter if both values have well-behaved equality implementations, as equality should be implemented such that x.Equals(y) implies y.Equals(x). However, the offline documentation I've got installed does state that first.Equals(second) (or objA.Equals(objB), to use the real parameter naming) is what's specified. Interestingly enough, the online documentation doesn't mention this.
Just to make all of this concrete, the implementation could look like this:
public static bool Equals(object x, object y)
{
if (x == y) // Reference equality only; overloaded operators are ignored
{
return true;
}
if (x == null || y == null) // Again, reference checks
{
return false;
}
return x.Equals(y); // Safe as we know x != null.
}
By default, object equivalence for reference types is determined by the object's address in memory: if both instances have the same memory address, they are equal.
However, this can be overridden within the object so that developers can compare two objects that aren't at the same memory location and still have them considered equal. For example, if you had a Data Access Layer where each object holds its data record's ID from the database, you could compare object equality based on the ID.
You can overload operators to produce this functionality.
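To sketch the ID-based equality idea above (the Customer class, its Id property, and the values are invented for illustration):

```csharp
using System;

public class Customer
{
    public int Id { get; }     // database record ID
    public string Name { get; }

    public Customer(int id, string name) { Id = id; Name = name; }

    // Two Customer objects are "equal" if they refer to the same record.
    public override bool Equals(object obj) => obj is Customer c && c.Id == Id;
    public override int GetHashCode() => Id.GetHashCode();

    public static bool operator ==(Customer a, Customer b)
        => ReferenceEquals(a, b) || (a is object && a.Equals(b));
    public static bool operator !=(Customer a, Customer b) => !(a == b);
}

class Demo
{
    static void Main()
    {
        var c1 = new Customer(42, "Alice");
        var c2 = new Customer(42, "Alice A."); // different object, same ID

        Console.WriteLine(c1 == c2);                // True: same record ID
        Console.WriteLine(ReferenceEquals(c1, c2)); // False: distinct instances
    }
}
```

Note that C# requires != to be overloaded whenever == is, and that Equals and GetHashCode should be kept consistent with the operator so the type also behaves correctly as a dictionary key.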
Comparing strings in C# is pretty simple. In fact there are several ways to do it. I have listed some in the block below. What I am curious about are the differences between them and when one should be used over the others. Should one be avoided at all costs? Are there more I haven't listed?
string testString = "Test";
string anotherString = "Another";
if (testString.CompareTo(anotherString) == 0) {}
if (testString.Equals(anotherString)) {}
if (testString == anotherString) {}
(Note: I am looking for equality in this example, not less than or greater than but feel free to comment on that as well)
Here are the rules for how these functions work:
stringValue.CompareTo(otherStringValue)
null comes before a string
it uses CultureInfo.CurrentCulture.CompareInfo.Compare, which means it will use a culture-dependent comparison. This might mean that ß will compare equal to SS in Germany, or similar
stringValue.Equals(otherStringValue)
null is not considered equal to anything
unless you specify a StringComparison option, it will use what looks like a direct ordinal equality check, i.e. ß is not the same as SS, in any language or culture
stringValue == otherStringValue
Is not the same as stringValue.Equals().
The == operator calls the static Equals(string a, string b) method (which in turn goes to an internal EqualsHelper to do the comparison).
Calling .Equals() on a null string throws a NullReferenceException, while == does not.
Object.ReferenceEquals(stringValue, otherStringValue)
Just checks that references are the same, i.e. it isn't just two strings with the same contents, you're comparing a string object with itself.
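To make the distinction concrete, here is a minimal sketch; new string('a', 3) is used to force a second, non-interned instance with identical contents:

```csharp
using System;

class Demo
{
    static void Main()
    {
        string a = "aaa";
        string b = new string('a', 3); // same contents, separate object

        Console.WriteLine(string.Equals(a, b));          // True: same characters
        Console.WriteLine(object.ReferenceEquals(a, b)); // False: distinct objects
    }
}
```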
Note that with the options above that use method calls, there are overloads with more options to specify how to compare.
My advice if you just want to check for equality is to make up your mind whether you want to use a culture-dependent comparison or not, and then use .CompareTo or .Equals, depending on the choice.
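For example, a sketch of how that choice changes the result for a simple case difference:

```csharp
using System;

class Demo
{
    static void Main()
    {
        string x = "encyclopedia";
        string y = "ENCYCLOPEDIA";

        // Ordinal: raw code-unit comparison; case differences matter.
        Console.WriteLine(x.Equals(y, StringComparison.Ordinal));                  // False
        // OrdinalIgnoreCase: still culture-independent, but case is folded.
        Console.WriteLine(x.Equals(y, StringComparison.OrdinalIgnoreCase));        // True
        // CurrentCultureIgnoreCase: linguistic comparison, case-insensitive.
        Console.WriteLine(x.Equals(y, StringComparison.CurrentCultureIgnoreCase))); // see note below
    }
}
```

For plain ASCII input like this, the two IgnoreCase variants agree; they diverge for culture-specific characters.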
From MSDN:
"The CompareTo method was designed primarily for use in sorting or
alphabetizing operations. It should not be used when the primary
purpose of the method call is to determine whether two strings are
equivalent. To determine whether two strings are equivalent, call
the Equals method."
They suggest using .Equals instead of .CompareTo when looking solely for equality. I am not sure if there is a difference between .Equals and == for the string class. I will sometimes use .Equals or Object.ReferenceEquals instead of == for my own classes in case someone comes along at a later time and redefines the == operator for that class.
If you are ever curious about differences in BCL methods, Reflector is your friend :-)
I follow these guidelines:
Exact match: EDIT: I previously always used the == operator, on the principle that inside Equals(string, string) the object == operator is used to compare the object references. However, it seems strA.Equals(strB) is still 1-11% faster overall than string.Equals(strA, strB), strA == strB, and string.CompareOrdinal(strA, strB). I loop-tested with a StopWatch on both interned and non-interned string values, with same and different string lengths, and varying sizes (1 B to 5 MB).
strA.Equals(strB)
Human-readable match (Western cultures, case-insensitive):
string.Compare(strA, strB, StringComparison.OrdinalIgnoreCase) == 0
Human-readable match (All other cultures, insensitive case/accent/kana/etc defined by CultureInfo):
string.Compare(strA, strB, true, myCultureInfo) == 0
Human-readable match with custom rules (All other cultures):
CompareOptions compareOptions = CompareOptions.IgnoreCase
| CompareOptions.IgnoreWidth
| CompareOptions.IgnoreNonSpace;
string.Compare(strA, strB, CultureInfo.CurrentCulture, compareOptions) == 0
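Assembled into a runnable sketch (the accented strings are just illustrative; the exact behavior of CompareOptions.IgnoreNonSpace depends on the platform's collation data):

```csharp
using System;
using System.Globalization;

class Demo
{
    static void Main()
    {
        // Same option set as in the guideline above.
        CompareOptions compareOptions = CompareOptions.IgnoreCase
                                      | CompareOptions.IgnoreWidth
                                      | CompareOptions.IgnoreNonSpace;

        // IgnoreNonSpace makes the accents in "résumé" irrelevant,
        // and IgnoreCase handles the different casing of "RESUME".
        bool same = string.Compare("résumé", "RESUME",
                                   CultureInfo.InvariantCulture, compareOptions) == 0;
        Console.WriteLine(same);
    }
}
```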
As Ed said, CompareTo is used for sorting.
There is a difference, however, between .Equals and ==.
== resolves to essentially the following code:
if(object.ReferenceEquals(left, null) &&
object.ReferenceEquals(right, null))
return true;
if(object.ReferenceEquals(left, null))
return right.Equals(left);
return left.Equals(right);
The simple reason is that the following will throw an exception:
string a = null;
string b = "foo";
bool equal = a.Equals(b);
And the following will not:
string a = null;
string b = "foo";
bool equal = a == b;
Good explanation and practices about string comparison issues may be found in the article New Recommendations for Using Strings in Microsoft .NET 2.0 and also in Best Practices for Using Strings in the .NET Framework.
Each of mentioned method (and other) has particular purpose. The key difference between them is what sort of StringComparison Enumeration they are using by default. There are several options:
CurrentCulture
CurrentCultureIgnoreCase
InvariantCulture
InvariantCultureIgnoreCase
Ordinal
OrdinalIgnoreCase
Each of above comparison type targets different use case:
Ordinal
Case-sensitive internal identifiers
Case-sensitive identifiers in standards like XML and HTTP
Case-sensitive security-related settings
OrdinalIgnoreCase
Case-insensitive internal identifiers
Case-insensitive identifiers in standards like XML and HTTP
File paths (on Microsoft Windows)
Registry keys/values
Environment variables
Resource identifiers (handle names, for example)
Case insensitive security related settings
InvariantCulture or InvariantCultureIgnoreCase
Some persisted linguistically-relevant data
Display of linguistic data requiring a fixed sort order
CurrentCulture or CurrentCultureIgnoreCase
Data displayed to the user
Most user input
Note that the StringComparison enumeration, as well as the overloads for the string comparison methods, has existed since .NET 2.0.
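A sketch of why identifiers and file names belong in the Ordinal bucket is the well-known "Turkish I" problem (this assumes the tr-TR culture data is available on the machine):

```csharp
using System;
using System.Globalization;

class Demo
{
    static void Main()
    {
        var turkish = new CultureInfo("tr-TR");

        // In Turkish, 'i' uppercases to 'İ' and 'I' lowercases to 'ı',
        // so 'i' and 'I' are not case variants of each other.
        bool cultural = string.Compare("file", "FILE", turkish,
                                       CompareOptions.IgnoreCase) == 0;
        bool ordinal  = string.Equals("file", "FILE",
                                      StringComparison.OrdinalIgnoreCase);

        Console.WriteLine(cultural); // False under tr-TR rules
        Console.WriteLine(ordinal);  // True: ordinal ignores culture entirely
    }
}
```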
String.CompareTo Method (String)
Is in fact a type-safe implementation of the IComparable.CompareTo method. Default interpretation: CurrentCulture.
Usage:
The CompareTo method was designed primarily for use in sorting or alphabetizing operations
Thus
Implementing the IComparable interface will necessarily use this method
String.Compare Method
A static member of String Class which has many overloads. Default interpretation: CurrentCulture.
Whenever possible, you should call an overload of the Compare method that includes a StringComparison parameter.
String.Equals Method
Overridden from the Object class and overloaded for type safety. Default interpretation: Ordinal.
Notice that:
The String class's equality methods include the static Equals, the static operator ==, and the instance method Equals.
StringComparer class
There is also another way to deal with string comparisons, especially one aimed at sorting:
You can use the StringComparer class to create a type-specific comparison to sort the elements in a generic collection. Classes such as Hashtable, Dictionary<TKey,TValue>, SortedList, and SortedList<TKey,TValue> use the StringComparer class for sorting purposes.
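This ties back to the dictionary question at the top of the thread: passing a StringComparer to the constructor controls key equality as well as sorting. A minimal sketch:

```csharp
using System;
using System.Collections.Generic;

class Demo
{
    static void Main()
    {
        // Keys are compared case-insensitively,
        // so "Key" and "KEY" address the same entry.
        var dict = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

        dict["Key"] = 1;
        dict["KEY"] = 2; // overwrites, doesn't add a duplicate

        Console.WriteLine(dict.Count);  // 1
        Console.WriteLine(dict["key"]); // 2
    }
}
```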
Not that performance usually matters for 99% of the times you need to do this, but if you had to do this in a loop several million times I would highly suggest using .Equals or ==, because as soon as either finds a character that doesn't match it throws the whole thing out as false, whereas CompareTo has to figure out which character is less than the other, leading to slightly worse performance.
If your app will be running in different countries, I'd recommend that you take a look at the CultureInfo implications and possibly use .Equals. Since I only really write apps for the US (and don't much care if it breaks for someone), I always just use ==.
In the forms you listed here, there's not much difference between the two. CompareTo ends up calling a CompareInfo method that does a comparison using the current culture; Equals is called by the == operator.
If you consider overloads, then things get different. CompareTo and == give you no way to choose the comparison; Equals and String.Compare can take a StringComparison enumeration argument that lets you specify culture-insensitive or case-insensitive comparisons. Only String.Compare allows you to specify a CultureInfo and perform comparisons using a culture other than the default culture.
Because of its versatility, I find I use String.Compare more than any other comparison method; it lets me specify exactly what I want.
One BIG difference to note is that .Equals() will throw an exception if the first string is null, whereas == will not.
string s = null;
string a = "a";
//Throws {"Object reference not set to an instance of an object."}
if (s.Equals(a))
Console.WriteLine("s is equal to a");
//no Exception
if(s==a)
Console.WriteLine("s is equal to a");
s1.CompareTo(s2): Do NOT use if primary purpose is to determine whether two strings are equivalent
s1 == s2: Cannot ignore case
s1.Equals(s2, StringComparison): Throws NullReferenceException if s1 is null
String.Equals(s1, s2, StringComparison): By process of elimination, this static method is the WINNER (assuming a typical use case to determine whether two strings are equivalent)!
Using .Equals is also a lot easier to read.
With .Equals you also gain the StringComparison options, which are very handy for ignoring case and other things.
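A sketch of the null-safety difference between the static and instance overloads:

```csharp
using System;

class Demo
{
    static void Main()
    {
        string s1 = null;
        string s2 = "foo";

        // The static method tolerates null on either side...
        Console.WriteLine(string.Equals(s1, s2, StringComparison.OrdinalIgnoreCase)); // False
        Console.WriteLine(string.Equals(null, null, StringComparison.Ordinal));       // True

        // ...while the instance call below would throw NullReferenceException:
        // s1.Equals(s2, StringComparison.OrdinalIgnoreCase);
    }
}
```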
btw, note that for strings this will actually evaluate to true:
string a = "myString";
string b = "myString";
return a == b;
String overloads the == operator to perform value equality (it calls the static Equals method), so two distinct strings with the same contents compare equal. In this particular example a and b even refer to the same object in memory, because string literals are interned. If you want a pure reference comparison, use Object.ReferenceEquals, or type the variables as object, in which case == falls back to comparing references.
a.Equals(b) is likewise true here.
and if you change b to:
b = "MYSTRING";
then a.Equals(b) is false, but
a.Equals(b, StringComparison.OrdinalIgnoreCase)
would be true
a.CompareTo(b) calls the string's CompareTo method, which compares the string values and returns <0 if a sorts before b, 0 if they compare equal, and >0 otherwise. By default this comparison is both culture-sensitive and case-sensitive; the static string.Compare has overloads that take a StringComparison or CompareOptions argument if you need to ignore case.
As others have already stated, this would be done for sorting. Comparing for equality in this manner would incur unnecessary overhead.
I'm sure I'm leaving stuff out, but I think this should be enough info to start experimenting if you need more details.
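To make the CompareTo sign convention concrete (simple ASCII examples, so culture does not change the outcome):

```csharp
using System;

class Demo
{
    static void Main()
    {
        // CompareTo returns a negative, zero, or positive value:
        Console.WriteLine("apple".CompareTo("banana") < 0);  // True: "apple" sorts first
        Console.WriteLine("apple".CompareTo("apple") == 0);  // True: equal
        Console.WriteLine("banana".CompareTo("apple") > 0);  // True: "banana" sorts later

        // For case-insensitive checks, the static string.Compare has overloads:
        Console.WriteLine(string.Compare("APPLE", "apple",
                          StringComparison.OrdinalIgnoreCase) == 0); // True
    }
}
```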