Complex object comparison in C# [duplicate] - c#

This question already has answers here:
C# implementation of deep/recursive object comparison in .net 3.5
(6 answers)
Closed 8 years ago.
I have two complex objects of the same type. I want to compare both objects to determine whether they have exactly the same values. What is an efficient way of doing this?
Sample class structure given below:
class Package
{
public List<GroupList> groupList;
}
class GroupList
{
public List<Feature> featurelist;
}
class Feature
{
public int qty;
}

Okay, so you want deep unordered structural comparison. The "unordered" part is tricky, and in fact it is a strong hint that your classes are not designed right: List<T> is inherently ordered, so perhaps you would rather want to use a HashSet<T> there (if you don't expect to have any duplicates). Doing so would make the comparison both easier to implement, and faster (though insertions would be slower):
class Package
{
public HashSet<GroupList> groupList;
public override bool Equals(object o)
{
Package p = o as Package;
if (p == null) return false;
return groupList.SetEquals(p.groupList);
}
public override int GetHashCode()
{
return groupList.Aggregate(0, (hash, g) => hash ^ g.GetHashCode());
}
}
class GroupList
{
public HashSet<Feature> featureList;
public override bool Equals(object o)
{
GroupList g = o as GroupList;
if (g == null) return false;
return featureList.SetEquals(g.featureList);
}
public override int GetHashCode()
{
return featureList.Aggregate(0, (hash, f) => hash ^ f.GetHashCode());
}
}
class Feature
{
public int qty;
public override bool Equals(object o)
{
Feature f = o as Feature;
if (f == null) return false;
return qty == f.qty;
}
public override int GetHashCode()
{
return qty.GetHashCode();
}
}
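As a small illustration of the unordered comparison that HashSet<T>.SetEquals performs, here is a minimal sketch. The SetEqualsDemo class and its SameQuantities helper are hypothetical names used only for this example, and plain int values stand in for the Feature objects above.

```csharp
using System;
using System.Collections.Generic;

public static class SetEqualsDemo
{
    // Returns true when both inputs contain the same distinct values,
    // regardless of the order in which they were supplied.
    public static bool SameQuantities(IEnumerable<int> first, IEnumerable<int> second)
    {
        var left = new HashSet<int>(first);
        return left.SetEquals(second);
    }

    public static void Main()
    {
        Console.WriteLine(SameQuantities(new[] { 1, 2, 3 }, new[] { 3, 2, 1 })); // True
        Console.WriteLine(SameQuantities(new[] { 1, 2, 3 }, new[] { 1, 2 }));    // False
    }
}
```

Note that a set ignores duplicates, so this only fits if repeated elements are not meaningful in your data.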
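If you want to keep using List<T>, you'll need to use LINQ set operations - note, however, that those are significantly slower. Except only reports elements missing from the second sequence, so it has to be applied in both directions, and because it hashes its elements, GetHashCode must be overridden as well:
class Package
{
public List<GroupList> groupList;
public override bool Equals(object o)
{
Package p = o as Package;
if (p == null) return false;
return !groupList.Except(p.groupList).Any() && !p.groupList.Except(groupList).Any();
}
public override int GetHashCode()
{
// Distinct() keeps the hash consistent with the set-style Equals above.
return groupList.Distinct().Aggregate(0, (hash, g) => hash ^ g.GetHashCode());
}
}
class GroupList
{
public List<Feature> featureList;
public override bool Equals(object o)
{
GroupList g = o as GroupList;
if (g == null) return false;
return !featureList.Except(g.featureList).Any() && !g.featureList.Except(featureList).Any();
}
public override int GetHashCode()
{
// Distinct() keeps the hash consistent with the set-style Equals above.
return featureList.Distinct().Aggregate(0, (hash, f) => hash ^ f.GetHashCode());
}
}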

For complex objects, I would consider operator overloading.
On the overloaded operator, I would define my condition for equality.
http://msdn.microsoft.com/en-us/library/aa288467%28VS.71%29.aspx

We always just end up writing a method on the class that goes through everything and compares it. You could implement this as IComparable, or override Equals.

As the comment said, depends on how "exact" you want to measure.
You could just override equality and implement a GetHashCode method; however, matching hash codes do not guarantee an exact match. They only ensure that the objects are "very likely" an exact match.
Next thing you could do, is to go through every property/field in the class and compare those hash values. This would be "extremely likely" an exact match.
And to truly get an exact match, you have to compare every field and member in a recursive loop...not recommended.

If I were you, I would implement the IComparable Interface on the two types:
http://msdn.microsoft.com/en-us/library/system.icomparable.aspx
From there you can use .CompareTo, and implement the exact comparisons required under your circumstances. This is a general best practice within .NET and I think applies well to your case.

Depends on what you want to do with the comparison. As others have pointed out, IComparer is a good choice. If you are planning on using lambdas and LINQ, I would go with IEqualityComparer
http://msdn.microsoft.com/en-us/library/system.collections.iequalitycomparer.aspx

In general, you need a method to check the two, regardless of whether or not you overload equals, or use IComparer.
You asked how to do it most efficiently, here are some tips:
Your equality method should try to give up quickly, e.g. check if the size of the lists are the same, if they are not then return false right away
If you could implement an efficient hashCode, you could compare the hashes first, if they are not equal then the objects are not equal, if they are equal, then you need to compare the objects to see if the objects are equal
So in general, do the fastest comparisons first to try to return false.
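The tips above can be sketched roughly as follows. This is only an illustration: FastEquality and ListsEqual are hypothetical names, and the comparison is order-sensitive, which may or may not be what you need.

```csharp
using System;
using System.Collections.Generic;

public static class FastEquality
{
    // Ordered comparison of two lists, with the cheapest checks first.
    public static bool ListsEqual<T>(IList<T> a, IList<T> b)
    {
        if (ReferenceEquals(a, b)) return true;    // same instance, or both null
        if (a == null || b == null) return false;  // only one of them is null
        if (a.Count != b.Count) return false;      // sizes differ: give up right away

        var comparer = EqualityComparer<T>.Default;
        for (int i = 0; i < a.Count; i++)
        {
            // Stop at the first mismatch instead of scanning the whole list.
            if (!comparer.Equals(a[i], b[i])) return false;
        }
        return true;
    }

    public static void Main()
    {
        var x = new List<int> { 1, 2, 3 };
        var y = new List<int> { 1, 2, 3 };
        var z = new List<int> { 1, 2 };
        Console.WriteLine(ListsEqual(x, y)); // True
        Console.WriteLine(ListsEqual(x, z)); // False
    }
}
```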

Here is a somewhat simplified way to do it, using reflection. You will probably need to add other checks of datatypes for specific comparisons or loop through lists etc, but this should get you started.
void Mymethod(){
Class1 class1 = new Class1();
//define properties for class1
Class1 class2 = new Class1();
//define properties for class2
PropertyInfo[] properties = class1.GetType().GetProperties();
bool bClassesEqual = true;
foreach (PropertyInfo property in properties)
{
Console.WriteLine(property.Name);
// Use object.Equals here: != on the boxed results of GetValue compares
// references, so two equal int values would be reported as different.
if (!object.Equals(property.GetValue(class1, null), property.GetValue(class2, null)))
{
bClassesEqual = false;
break;
}
}
}

Related

How to compare 2 instance of a class such that they are equal if at-most there Id differ? [duplicate]

I have a class like this
public class TestData
{
public string Name {get;set;}
public string type {get;set;}
public List<string> Members = new List<string>();
public void AddMembers(string[] members)
{
Members.AddRange(members);
}
}
I want to know if it is possible to directly compare two instances of this class to each other and find out whether they are exactly the same. What is the mechanism? I am looking for something like if(testData1 == testData2) //Do Something. And if that is not possible, how can I do it?
You should implement the IEquatable<T> interface on your class, which will allow you to define your equality-logic.
Actually, you should override the Equals method as well.
public class TestData : IEquatable<TestData>
{
public string Name {get;set;}
public string type {get;set;}
public List<string> Members = new List<string>();
public void AddMembers(string[] members)
{
Members.AddRange(members);
}
// Overriding Equals member method, which will call the IEquatable implementation
// if appropriate.
public override bool Equals( Object obj )
{
var other = obj as TestData;
if( other == null ) return false;
return Equals (other);
}
public override int GetHashCode()
{
// Provide own implementation
}
// This is the method that must be implemented to conform to the
// IEquatable contract
public bool Equals( TestData other )
{
if( other == null )
{
return false;
}
if( ReferenceEquals (this, other) )
{
return true;
}
// You can also use a specific StringComparer instead of EqualityComparer<string>
// Check out the specific implementations (StringComparer.CurrentCulture, etc.).
if( EqualityComparer<string>.Default.Equals (Name, other.Name) == false )
{
return false;
}
...
// To compare the Members list, you could perhaps use the
// SequenceEqual method. But be aware that [] {"a", "b"} will not
// be considered equal to [] {"b", "a"}
return true;
}
}
One way of doing it is to implement IEquatable<T>
public class TestData : IEquatable<TestData>
{
public string Name {get;set;}
public string type {get;set;}
public List<string> Members = new List<string>();
public void AddMembers(string[] members)
{
Members.AddRange(members);
}
public bool Equals(TestData other)
{
if (other == null) return false;
if (this.Name != other.Name) return false;
if (this.type != other.type) return false;
// TODO: Compare Members and return false if not the same
return true;
}
}
if (testData1.Equals(testData2))
// classes are the same
You can also just override the Equals(object) method (from System.Object); if you do this, you should also override GetHashCode.
There are three ways objects of some reference type T can be compared to each other:
With the object.Equals method
With an implementation of IEquatable<T>.Equals (only for types that implement IEquatable<T>)
With the comparison operator ==
Furthermore, there are two possibilities for each of these cases:
The static type of the objects being compared is T (or some other base of T)
The static type of the objects being compared is object
The rules you absolutely need to know are:
The default for both Equals and operator== is to test for reference equality
Implementations of Equals will work correctly no matter what the static type of the objects being compared is
IEquatable<T>.Equals should always behave the same as object.Equals, but if the static type of the objects is T it will offer slightly better performance
So what does all of this mean in practice?
As a rule of thumb you should use Equals to check for equality (overriding object.Equals as necessary) and implement IEquatable<T> as well to provide slightly better performance. In this case object.Equals should be implemented in terms of IEquatable<T>.Equals.
For some specific types (such as System.String) it's also acceptable to use operator==, although you have to be careful not to make "polymorphic comparisons". The Equals methods, on the other hand, will work correctly even if you do make such comparisons.
You can see an example of polymorphic comparison and why it can be a problem here.
Finally, never forget that if you override object.Equals you must also override object.GetHashCode accordingly.
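To make the point about static types concrete, here is a minimal sketch using System.String. The StaticTypeDemo class and its method names are hypothetical and exist only for this illustration.

```csharp
using System;

public static class StaticTypeDemo
{
    // Two distinct string instances with identical content.
    static readonly string A = new string(new[] { 'h', 'i' });
    static readonly string B = new string(new[] { 'h', 'i' });

    // Static type is string: string's overloaded operator== compares content.
    public static bool OperatorOnStrings() { return A == B; }

    // Static type is object: == falls back to reference comparison.
    public static bool OperatorOnObjects() { object oa = A, ob = B; return oa == ob; }

    // Equals is virtual, so string.Equals runs whatever the static type is.
    public static bool EqualsOnObjects() { object oa = A, ob = B; return oa.Equals(ob); }

    public static void Main()
    {
        Console.WriteLine(OperatorOnStrings()); // True
        Console.WriteLine(OperatorOnObjects()); // False
        Console.WriteLine(EqualsOnObjects());   // True
    }
}
```

Operators are resolved at compile time from the static types, which is why the second call compares references while Equals still compares content.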
I see many good answers here but just in case you want the comparison to work like
if(testData1 == testData2) // DoSomething
instead of using Equals function you can override == and != operators:
public static bool operator == (TestData left, TestData right)
{
bool comparison = true; //Make the desired comparison
return comparison;
}
public static bool operator != (TestData left, TestData right)
{
return !(left == right);
}
You can override the equals method and inside it manually compare the objects
Also take a look at Guidelines for Overloading Equals() and Operator ==
You will need to define the rules that make object A equal to object B and then override the Equals operator for this type.
http://msdn.microsoft.com/en-us/library/ms173147(v=vs.80).aspx
First of all, equality is difficult to define, and only you can define what equality means for you:
Does it mean the members have the same values,
or that they point to the same location?
There is a discussion with an answer here:
What is "Best Practice" For Comparing Two Instances of a Reference Type?
Implement the IEquatable<T> interface. This defines a generalized method that a value type or class implements to create a type-specific method for determining equality of instances. More information here:
http://msdn.microsoft.com/en-us/library/ms131187.aspx

System.Collections.Immutable types: why no .Equals

var a = ImmutableList<int>.Empty.Add(1).Add(2).Add(3);
var b = ImmutableList<int>.Empty.Add(1).Add(2).Add(3);
Console.WriteLine(a.Equals(b)); // False
In the code above, a.Equals(b) calls Object.Equals, because ImmutableList<T> doesn't override Equals(object), and as ImmutableList<T> is a reference type, Object.Equals does the (useless) reference comparison.
Question: Why doesn't ImmutableList<T> override .Equals? It would be straightforward and expected to have it compare each contained object with .Equals and return the result based on those comparisons. It would even be consistent with the rest of the framework (see class String).
note: the above code is tested with System.Collections.Immutable.1.1.38-beta-23516
What you want to do is test the contents of the collections for equality. No .NET collections override Equals to do this. Instead, use SequenceEqual:
Console.WriteLine(a.SequenceEqual(b));
As to why -- that's a matter of opinion, I suppose. Most reference oriented platforms do their best to not confuse reference equality with content equality.
String is actually very special case and though it does implement IEnumerable, isn't typically treated as a proper container in the sense that List/etc. are.
You 'could' create a wrapper class and override the Equals and GetHashCode methods... This could be helpful for C# records, so that you don't have to keep overriding the auto-generated methods.
public sealed class ImmutableListSequence<T>
{
public ImmutableListSequence(ImmutableList<T> items)
{
Items = items;
}
public ImmutableList<T> Items { get; }
public override int GetHashCode()
{
unchecked
{
return Items.Aggregate(0, (agg, curr) => (agg * 397) ^ (curr != null ? curr.GetHashCode() : 0));
}
}
public override bool Equals(object? obj)
{
if (obj is ImmutableListSequence<T> second)
{
return Items.SequenceEqual(second.Items);
}
return false;
}
}
None of the collections do. They all inherit Object.Equals(Object) and don't override it. Use Enumerable.SequenceEqual method to compare elements of two collections.
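A minimal sketch of the difference, using List<int> so that it needs nothing beyond the base class library. SequenceEqualDemo and SameOrderAndContent are hypothetical names for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SequenceEqualDemo
{
    // True when both sequences have the same elements in the same order.
    public static bool SameOrderAndContent(IEnumerable<int> a, IEnumerable<int> b)
    {
        return a.SequenceEqual(b);
    }

    public static void Main()
    {
        var a = new List<int> { 1, 2, 3 };
        var b = new List<int> { 1, 2, 3 };
        var c = new List<int> { 3, 2, 1 };

        Console.WriteLine(a.Equals(b));               // False: reference comparison
        Console.WriteLine(SameOrderAndContent(a, b)); // True: same elements, same order
        Console.WriteLine(SameOrderAndContent(a, c)); // False: order matters
    }
}
```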

Is it safe to override GetHashCode and get it from string property?

I have a class:
public class Item
{
public string Name { get; set; }
public override int GetHashCode()
{
return Name.GetHashCode();
}
}
The purpose of overriding GetHashCode is that I want to have only one occurrence of an object with a specified name in a Dictionary.
But is it safe to get hash code from string?
In other words, is there any chance that two objects with different values of property Name would return the same hash code?
But is it safe to get hash code from string?
Yes, it is safe. But, what you're doing isn't. You're using a mutable string field to generate your hash code. Let's imagine that you inserted an Item as a key for a given value. Then, someone changes the Name string to something else. You now are no longer able to find the same Item inside your Dictionary, HashSet, or whichever structure you use.
Moreover, you should be relying on immutable types only. I'd also advise you to implement IEquatable<T>:
public class Item : IEquatable<Item>
{
public Item(string name)
{
Name = name;
}
public string Name { get; }
public bool Equals(Item other)
{
if (ReferenceEquals(null, other)) return false;
if (ReferenceEquals(this, other)) return true;
return string.Equals(Name, other.Name);
}
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj)) return false;
if (ReferenceEquals(this, obj)) return true;
if (obj.GetType() != this.GetType()) return false;
return Equals((Item) obj);
}
public static bool operator ==(Item left, Item right)
{
return Equals(left, right);
}
public static bool operator !=(Item left, Item right)
{
return !Equals(left, right);
}
public override int GetHashCode()
{
return (Name != null ? Name.GetHashCode() : 0);
}
}
is there any chance that two objects with different values of property
Name would return the same hash code?
Yes, there is a statistical chance that such a thing will happen. Hash codes do not guarantee uniqueness. They strive for a uniform distribution. Why? Because your upper boundary is Int32, which is 32 bits. Given the Pigeonhole Principle, you may end up with two different strings that have the same hash code.
Your class is buggy, because you have a GetHashCode override, but no Equals override. You also don't consider the case where Name is null.
The rule for GetHashCode is simple:
If a.Equals(b) then it must be the case that a.GetHashCode() == b.GetHashCode().
The more cases where if !a.Equals(b) then a.GetHashCode() != b.GetHashCode() the better, indeed the more cases where !a.Equals(b) then a.GetHashCode() % SomeValue != b.GetHashCode() % SomeValue the better, for any given SomeValue (you can't predict it) so we like to have a good mix of bits in the results. But the vital thing is that two objects considered equal must have equal GetHashCode() results.
Right now this isn't the case, because you've only overridden one of these. However the following is sensible:
public class Item
{
public string Name { get; set; }
public override int GetHashCode()
{
return Name == null ? 0 : Name.GetHashCode();
}
public override bool Equals(object obj)
{
var asItem = obj as Item;
return asItem != null && Name == asItem.Name;
}
}
The following is even better, because it allows for faster strongly-typed equality comparisons:
public class Item : IEquatable<Item>
{
public string Name { get; set; }
public override int GetHashCode()
{
return Name == null ? 0 : Name.GetHashCode();
}
public bool Equals(Item other)
{
return other != null && Name == other.Name;
}
public override bool Equals(object obj)
{
return Equals(obj as Item);
}
}
In other words, is there any chance that two objects with different values of property Name would return the same hash code?
Yes, this can happen, but it won't happen often, so that's fine. The hash-based collections like Dictionary and HashSet can handle a few collisions; indeed there'll be collisions even if the hash codes are all different because they're modulo'd down to a smaller index. It's only if this happens a lot that it impacts performance.
Another danger is that you'll be using a mutable value as a key. There's a myth that you shouldn't use mutable values for hash-codes, which isn't true; if a mutable object has a mutable property that affects what it is considered equal with then it must result in a change to the hash-code.
The real danger is mutating an object that is a key to a hash collection at all. If you are defining equality based on Name and you have such an object as the key to a dictionary then you must not change Name while it is used as such a key. The easiest way to ensure that is to have Name be immutable, so that is definitely a good idea if possible. If it is not possible though, you need to be careful just when you allow Name to be changed.
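Here is a minimal sketch of that danger. MutableKey is a hypothetical class, and it uses an int Id rather than a string Name purely so that the two hash codes are guaranteed to differ.

```csharp
using System;
using System.Collections.Generic;

public sealed class MutableKey
{
    public int Id { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as MutableKey;
        return other != null && other.Id == Id;
    }

    // The hash follows the mutable Id, as equality does.
    public override int GetHashCode() { return Id; }
}

public static class MutatedKeyDemo
{
    public static void Main()
    {
        var key = new MutableKey { Id = 1 };
        var set = new HashSet<MutableKey> { key };
        Console.WriteLine(set.Contains(key)); // True

        key.Id = 2; // mutated while stored in the set

        // The entry was filed under the old hash code, so the lookup misses it.
        Console.WriteLine(set.Contains(key)); // False
        Console.WriteLine(set.Count);         // 1: the object is still in there
    }
}
```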
From a comment:
So, even if there is a collision in hash codes, when Equals returns false (because the names are different), will the Dictionary handle it properly?
Yes, it will handle it, though it's not ideal. We can test this with a class like this:
public class SuckyHashCode : IEquatable<SuckyHashCode>
{
public int Value { get; set; }
public bool Equals(SuckyHashCode other)
{
return other != null && other.Value == Value;
}
public override bool Equals(object obj)
{
return Equals(obj as SuckyHashCode);
}
public override int GetHashCode()
{
return 0;
}
}
Now if we use this, it works:
var dict = Enumerable.Range(0, 1000).Select(i => new SuckyHashCode{Value = i}).ToDictionary(shc => shc);
Console.WriteLine(dict.ContainsKey(new SuckyHashCode{Value = 3})); // True
Console.WriteLine(dict.ContainsKey(new SuckyHashCode{Value = -1})); // False
However, as the name suggests, it isn't ideal. Dictionaries and other hash-based collections all have means to deal with collisions, but those means mean that we no longer have the great nearly O(1) look-up, but rather as the percentage of collisions gets greater the look-up approaches O(n). In the case above where the GetHashCode is as bad as it could be without actually throwing an exception, the look-up would be O(n) which is the same as just putting all the items into an unordered collection and then finding them by looking at every one to see if it matches (indeed, due to differences in overheads, it's actually worse than that).
So for this reason we always want to avoid collisions as much as possible. Indeed, to not just avoid collisions, but to avoid collisions after the result has been modulo'd down to make a smaller hash code (because that's what happens internally to the dictionary).
In your case though, because string.GetHashCode() is reasonably good at avoiding collisions, and because that one string is the only thing that equality is defined by, your code would in turn be reasonably good at avoiding collisions. More collision-resistant code is certainly possible, but comes at a cost to performance in the code itself* and/or is more work than can be justified.
*(Though see https://www.nuget.org/packages/SpookilySharp/ for code of mine that is faster than string.GetHashCode() on large strings on 64-bit .NET and more collision-resistant, though it is slower to produce those hash codes on 32-bit .NET or when the string is short).
Instead of using GetHashCode to prevent duplicates from being added to a dictionary, which is risky in your case as explained already, I would recommend using a (custom) equality comparer for your dictionary.
If the key is an object, you should create your own equality comparer that compares the string Name value. If the key is the string itself, you can use StringComparer.CurrentCulture, for example.
In this case too, it is key to make the string immutable, since otherwise you might invalidate your dictionary by changing the Name.
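A custom comparer along those lines might look like the sketch below. NamedItem and NameComparer are hypothetical names, and the choice of StringComparer.Ordinal is an assumption made for this example; pick whichever string comparison fits your data.

```csharp
using System;
using System.Collections.Generic;

public sealed class NamedItem
{
    public NamedItem(string name) { Name = name; }
    public string Name { get; } // immutable, so a stored key cannot go stale
}

// Compares items by Name only, without touching the item class itself.
public sealed class NameComparer : IEqualityComparer<NamedItem>
{
    public bool Equals(NamedItem x, NamedItem y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return StringComparer.Ordinal.Equals(x.Name, y.Name);
    }

    public int GetHashCode(NamedItem obj)
    {
        return obj == null || obj.Name == null
            ? 0
            : StringComparer.Ordinal.GetHashCode(obj.Name);
    }
}

public static class NameComparerDemo
{
    public static void Main()
    {
        var stock = new Dictionary<NamedItem, int>(new NameComparer());
        stock[new NamedItem("apple")] = 5;
        stock[new NamedItem("apple")] = 7; // same name: overwrites, no second entry

        Console.WriteLine(stock.Count);                               // 1
        Console.WriteLine(stock.ContainsKey(new NamedItem("apple"))); // True
        Console.WriteLine(stock.ContainsKey(new NamedItem("pear")));  // False
    }
}
```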

IEquatable, how to implement this properly [duplicate]

This question already has answers here:
Is there a complete IEquatable implementation reference?
(5 answers)
Closed 2 years ago.
I am using .NET 2.0 and C#, and I have implemented the IEquatable interface in my class like this:
public class MyClass : IEquatable<MyClass>
{
Guid m_id = Guid.NewGuid();
public Guid Id
{
get
{
return m_id;
}
}
#region IEquatable<MyClass> Members
public bool Equals(MyClass other)
{
if (this.Id == other.Id)
{
return true;
}
else
{
return false;
}
}
#endregion
}
Is this bad programming practice? I've read that I also need to implement Object.Equals and Object.GetHashCode as well, but I am not sure why.
I want to be able to check that an instance of MyClass is not already contained in a generic list of type MyClass. Why does the framework only suggests that you implement Equals only?
Any help would be greatly appreciated.
You can check if your list contains an item using a custom predicate for the criteria, using LINQ. In that case you don't need to override Equals nor implement IEquatable:
// check if the list contains an item with a specific ID
bool found = someList.Any(item => item.Id == someId);
Overriding Equals (with GetHashCode) and implementing IEquatable is useful if you need to store your item in a Dictionary or a Hashtable.
Is this bad programming practice?
Implementing IEquatable<T> is great, even more so for structs, but merely doing that much is not enough.
I've read that I also need to implement Object.Equals
Read it here why..
and Object.GetHashCode as well, but I am not sure why.
Read it here and here. Seriously, these have been discussed so many times, and it is pretty simple. In short, you need it for collection types that deal with hashes, like Dictionary<,> or HashSet<>.
I want to be able to check that an instance of MyClass is not already contained in a generic list of type MyClass. Why does the framework only suggests that you implement Equals only?
Depends on the collection type. For a List<T>, equality is checked purely through the Equals method you have defined, for example in the Contains method. For most scenarios you will need Equals only. But if you have a HashSet<T>, then absence and presence checks will use the hash of your objects. The framework indeed asks us to implement good hashing approaches (without re-inventing the wheel) at appropriate places.
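As a rough illustration, once Equals and GetHashCode agree, both kinds of collection find an equal instance. Tagged is a hypothetical stand-in for MyClass that takes its Guid through the constructor so that two equal instances can be created.

```csharp
using System;
using System.Collections.Generic;

public sealed class Tagged : IEquatable<Tagged>
{
    public Tagged(Guid id) { Id = id; }
    public Guid Id { get; }

    public bool Equals(Tagged other)
    {
        return !ReferenceEquals(other, null) && Id == other.Id;
    }

    public override bool Equals(object obj) { return Equals(obj as Tagged); }

    // Must agree with Equals, otherwise hash-based collections miss matches.
    public override int GetHashCode() { return Id.GetHashCode(); }
}

public static class ContainsDemo
{
    public static void Main()
    {
        Guid id = Guid.NewGuid();
        var list = new List<Tagged> { new Tagged(id) };
        var set = new HashSet<Tagged> { new Tagged(id) };

        // List<T>.Contains only needs Equals.
        Console.WriteLine(list.Contains(new Tagged(id))); // True
        // HashSet<T>.Contains uses GetHashCode first, then Equals.
        Console.WriteLine(set.Contains(new Tagged(id)));  // True
    }
}
```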
Any help would be greatly appreciated.
Do as below, but overload the operators == and != only if it makes sense to you. Seeing your class, I assumed it's OK to have value semantics for it. Otherwise just ignore that part (if == should mean reference equality). Getting the hash code from your Guid would suffice, provided that is all you need to test equality.
public sealed class MyClass : IEquatable<MyClass>
{
Guid m_id = Guid.NewGuid();
public Guid Id { get { return m_id; } }
public bool Equals(MyClass other)
{
if (ReferenceEquals(this, other))
return true;
if (ReferenceEquals(null, other))
return false;
return Id == other.Id;
}
public override bool Equals(object obj)
{
return Equals(obj as MyClass);
}
public static bool operator ==(MyClass lhs, MyClass rhs)
{
if (ReferenceEquals(lhs, null))
return ReferenceEquals(rhs, null);
return lhs.Equals(rhs);
}
public static bool operator !=(MyClass lhs, MyClass rhs)
{
return !(lhs == rhs);
}
public override int GetHashCode()
{
return Id.GetHashCode();
}
}
To avoid getting it wrong, make use of the snippet available here. For a good overview, see this SO thread.

Comparing objects

I have a class that contains some string members, some double members and some array objects.
I create two objects of this class. Is there a simple, efficient way of comparing these objects and saying they are equal? Any suggestions?
I know how to write a compare function, but will it be time consuming?
The only way you can really do this is to override bool Object.Equals(object other) to return true when your conditions for equality are met, and return false otherwise. You must also override int Object.GetHashCode() to return an int computed from all of the data that you consider when overriding Equals().
As an aside, note that the contract for GetHashCode() specifies that the return value must be equal for two objects when Equals() would return true when comparing them. This means that return 0; is a valid implementation of GetHashCode() but it will cause inefficiencies when objects of your class are used as dictionary keys, or stored in a HashSet<T>.
The way I implement equality is like this:
public class Foo : IEquatable<Foo>
{
public bool Equals(Foo other)
{
if (other == null)
return false;
if (other == this)
return true; // Same object reference.
// Compare this to other and return true/false as appropriate.
}
public override bool Equals(Object other)
{
return Equals(other as Foo);
}
public override int GetHashCode()
{
// Compute and return hash code.
}
}
A simple way of implementing GetHashCode() is to XOR together the hash codes of all of the data you consider for equality in Equals(). So if, for example, the properties you compare for equality are string FirstName; string LastName; int Id;, your implementation might look like:
public override int GetHashCode()
{
return (FirstName != null ? FirstName.GetHashCode() : 0) ^
(LastName != null ? LastName.GetHashCode() : 0) ^
Id; // Primitives of <= 4 bytes are their own hash codes
}
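One caveat with plain XOR is that it is symmetric, so swapping FirstName and LastName yields the same hash code, and two identical values cancel out to zero. The sketch below contrasts it with a common order-sensitive combination; HashCombineDemo and its method names are hypothetical, and the constants 17 and 31 are a widely used convention rather than anything required by the framework.

```csharp
using System;

public static class HashCombineDemo
{
    // XOR combination as in the snippet above: order-insensitive.
    public static int XorHash(string first, string last)
    {
        return (first != null ? first.GetHashCode() : 0) ^
               (last != null ? last.GetHashCode() : 0);
    }

    // Order-sensitive combination using multiply-and-add.
    public static int OrderedHash(string first, string last)
    {
        unchecked // let the arithmetic wrap around instead of throwing
        {
            int hash = 17;
            hash = hash * 31 + (first != null ? first.GetHashCode() : 0);
            hash = hash * 31 + (last != null ? last.GetHashCode() : 0);
            return hash;
        }
    }

    public static void Main()
    {
        // Swapped names always collide under XOR.
        Console.WriteLine(XorHash("John", "Smith") == XorHash("Smith", "John")); // True
        // Identical values cancel each other out.
        Console.WriteLine(XorHash("Ann", "Ann")); // 0
        // The ordered version will usually tell the swapped names apart.
        Console.WriteLine(OrderedHash("John", "Smith") == OrderedHash("Smith", "John"));
    }
}
```

Both versions still satisfy the GetHashCode contract; the difference is only in how many avoidable collisions they produce.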
I typically do not override the equality operators, as most of the time I'm concerned with equality only for the purposes of dictionary keys or collections. I would only consider overriding the equality operators if you are likely to do more comparisons by value than by reference, as it is syntactically less verbose. However, you have to remember to change all places where you use == or != on your object (including in your implementation of Equals()!) to use Object.ReferenceEquals(), or to cast both operands to object. This nasty gotcha (which can cause infinite recursion in your equality test if you are not careful) is one of the primary reasons I rarely override these operators.
The 'proper' way to do it in .NET is to implement the IEquatable interface for your class:
public class SomeClass : IEquatable<SomeClass>
{
public string Name { get; set; }
public double Value { get; set; }
public int[] NumberList { get; set; }
public bool Equals(SomeClass other)
{
// whatever your custom equality logic is
return other.Name == Name &&
other.Value == Value &&
other.NumberList == NumberList;
}
}
However, if you really want to do it right, this isn't all you should do. You should also override the Equals(object) and GetHashCode() methods so that, no matter how your calling code is comparing equality (perhaps in a Dictionary or perhaps in some loosely-typed collection), your code and not reference-type equality will be the determining factor:
public class SomeClass : IEquatable<SomeClass>
{
public string Name { get; set; }
public double Value { get; set; }
public int[] NumberList { get; set; }
/// <summary>
/// Explicitly implemented IEquatable method.
/// </summary>
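bool IEquatable<SomeClass>.Equals(SomeClass other)
{
// An explicit interface implementation cannot carry an access modifier.
return other != null &&
other.Name == Name &&
other.Value == Value &&
other.NumberList == NumberList; // reference comparison: same array instance
}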
public override bool Equals(object obj)
{
var other = obj as SomeClass;
if (other == null)
return false;
return ((IEquatable<SomeClass>)(this)).Equals(other);
}
public override int GetHashCode()
{
// Determine some consistent way of generating a hash code, such as...
return Name.GetHashCode() ^ Value.GetHashCode() ^ NumberList.GetHashCode();
}
}
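Note that == on arrays, as used above, only checks whether both properties refer to the same array instance. If the array contents should decide equality, a sketch along these lines may help; Reading is a hypothetical class, and the hash combination is just one reasonable choice.

```csharp
using System;
using System.Linq;

public sealed class Reading : IEquatable<Reading>
{
    public string Name { get; set; }
    public int[] Numbers { get; set; }

    public bool Equals(Reading other)
    {
        if (ReferenceEquals(other, null)) return false;
        if (ReferenceEquals(this, other)) return true;
        if (Name != other.Name) return false;
        if (ReferenceEquals(Numbers, other.Numbers)) return true;    // same array or both null
        if (Numbers == null || other.Numbers == null) return false;  // only one is null
        return Numbers.SequenceEqual(other.Numbers);                 // element by element
    }

    public override bool Equals(object obj) { return Equals(obj as Reading); }

    // Hash the array contents too, so that equal objects get equal hash codes.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = Name != null ? Name.GetHashCode() : 0;
            if (Numbers != null)
            {
                foreach (int n in Numbers) hash = hash * 31 + n;
            }
            return hash;
        }
    }
}

public static class ReadingDemo
{
    public static void Main()
    {
        var a = new Reading { Name = "x", Numbers = new[] { 1, 2, 3 } };
        var b = new Reading { Name = "x", Numbers = new[] { 1, 2, 3 } };
        Console.WriteLine(a.Equals(b));                       // True
        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True
    }
}
```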
I just spent the whole day writing an extension method that loops through an object's properties via reflection, with various complex bits of logic to deal with different property types, and actually got it close to good. Then at 16:55 it dawned on me that if you serialize the two objects, you simply need to compare the two strings ... duh
So here is a simple serializer extension method that even works on Dictionaries
public static class TExtensions
{
public static string Serialize<T>(this T thisT)
{
var serializer = new DataContractSerializer(thisT.GetType());
using (var writer = new StringWriter())
using (var stm = new XmlTextWriter(writer))
{
serializer.WriteObject(stm, thisT);
stm.Flush(); // make sure everything has reached the StringWriter
return writer.ToString();
}
}
}
Now your test can be as simple as
Assert.AreEqual(objA.Serialize(), objB.Serialize())
Haven't done extensive testing yet, but looks promising and more importantly, simple. Either way still a useful method to have in your utility set right ?
The best answer is to implement IEquatable for your class - it may not be the answer you want to hear, but that's the best way to implement value equivalence in .NET.
Another option would be computing a unique hash of all of the members of your class and then doing value comparisons against those, but that's even more work than writing a comparison function ;)
Since these are objects, my guess is that you will have to override the Equals method. Otherwise Equals will return true only if both references point to the same object.
I know this is not the answer you want. But since there is little number of properties in your class you can easily override the method.
