Why are two objects' hash codes not the same even though they have the same values? What is the best approach to test value equality between objects without reading each property of one and checking it against the other's?
Person person = new Person();
person.Name = "X";
person.Age = 25;
person.Zip = 600056;
person.Sex = 'M';
Person person1 = new Person();
person1.Name = "X";
person1.Age = 25;
person1.Zip = 600056;
person1.Sex = 'M';
int hashCode1 = person1.Name.GetHashCode();
int hashCode = person.Name.GetHashCode();
// hashCode1 and hashCode values are same.
if (person.GetHashCode() == person1.GetHashCode())
{
// Condition is not satisfied
}
In your code hashCode1 == hashCode is true because hashing the same string will always give you the same result. However, the instances are different, so you have to override GetHashCode() in a way that fits your business logic, for example:
public override int GetHashCode()
{
return Name.GetHashCode() ^ Zip.GetHashCode() ^ Sex ^ Age;
}
I suggest you take a look at the answer "GetHashCode Guidelines in C#"
and at http://musingmarc.blogspot.com/2007/08/vtos-rtos-and-gethashcode-oh-my.html
In your example, both objects are alike but not the same. They are different instances, so they have different hash codes. Also, person == person1 and person.Equals(person1) are both false.
You can override this behavior. If you consider two objects the same when those properties are equal, then you can:
public override bool Equals(object other) {
if(other == this) return true;
var person = other as Person;
if(person == null) return false;
return person.Name == Name && person.Age == Age && person.Zip == Zip && person.Sex == Sex;
}
public override int GetHashCode() {
//some logic to create the Hash Code based on the properties. i.e.
return (Name + Age + Zip + Sex).GetHashCode(); // this is just a bad example!
}
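Putting the two overrides together, a minimal self-contained sketch could look like the following. The Person shape is taken from the question; the IEquatable<Person> implementation, the PersonDemo class and the 17/31 hash-combining constants are illustrative assumptions, not the only valid choice.

```csharp
using System;

public sealed class Person : IEquatable<Person>
{
    public string Name { get; set; }
    public int Age { get; set; }
    public int Zip { get; set; }
    public char Sex { get; set; }

    // Value equality: two persons are equal when all four properties match.
    public bool Equals(Person other)
    {
        if (ReferenceEquals(other, null)) return false;
        if (ReferenceEquals(other, this)) return true;
        return Name == other.Name && Age == other.Age
            && Zip == other.Zip && Sex == other.Sex;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Person);
    }

    // Built from the same properties as Equals, so equal objects hash equally.
    public override int GetHashCode()
    {
        unchecked // integer overflow is harmless when hashing
        {
            int hash = 17;
            hash = hash * 31 + (Name != null ? Name.GetHashCode() : 0);
            hash = hash * 31 + Age;
            hash = hash * 31 + Zip;
            hash = hash * 31 + Sex;
            return hash;
        }
    }
}

public static class PersonDemo
{
    public static void Main()
    {
        var a = new Person { Name = "X", Age = 25, Zip = 600056, Sex = 'M' };
        var b = new Person { Name = "X", Age = 25, Zip = 600056, Sex = 'M' };
        Console.WriteLine(a.Equals(b));                        // True
        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True
        Console.WriteLine(ReferenceEquals(a, b));              // False
    }
}
```

Note that the two instances are still distinct references; only Equals and GetHashCode now treat them as the same value.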
Why are two objects' hash codes not the same even though they have the same values?
Because that's the expected way of identifying them uniquely. AFAIK, most frameworks/libraries use the pointer value (the object's identity) for this by default.
What is the best approach to test value equality between objects without reading each property of one and checking it against the other's?
Depends on what makes them "equal", according to your needs. Basically it's still comparing properties, but maybe just more limited.
I saw this on Artech's blog, and we then had a discussion in the comments. Since that blog is written only in Chinese, I'll give a brief explanation here. Code to reproduce:
[AttributeUsage(AttributeTargets.Class, Inherited = true, AllowMultiple = true)]
public abstract class BaseAttribute : Attribute
{
public string Name { get; set; }
}
public class FooAttribute : BaseAttribute { }
[Foo(Name = "A")]
[Foo(Name = "B")]
[Foo(Name = "C")]
public class Bar { }
//Main method
var attributes = typeof(Bar).GetCustomAttributes(true).OfType<FooAttribute>().ToList<FooAttribute>();
var getC = attributes.First(item => item.Name == "C");
attributes.Remove(getC);
attributes.ForEach(a => Console.WriteLine(a.Name));
The code gets all FooAttribute instances and removes the one whose name is "C". Obviously the output should be "A" and "B"? If everything went that smoothly you wouldn't be seeing this question. In fact you will get "AC", "BC" or, theoretically, even the correct "AB" (I got AC on my machine, and the blog author got BC). The problem results from the implementation of GetHashCode/Equals in System.Attribute. A snippet of the implementation:
[SecuritySafeCritical]
public override int GetHashCode()
{
Type type = base.GetType();
//*****NOTICE*****
FieldInfo[] fields = type.GetFields(BindingFlags.NonPublic
| BindingFlags.Public
| BindingFlags.Instance);
object obj2 = null;
for (int i = 0; i < fields.Length; i++)
{
object obj3 = ((RtFieldInfo) fields[i]).InternalGetValue(this, false, false);
if ((obj3 != null) && !obj3.GetType().IsArray)
{
obj2 = obj3;
}
if (obj2 != null)
{
break;
}
}
if (obj2 != null)
{
return obj2.GetHashCode();
}
return type.GetHashCode();
}
It uses Type.GetFields, which does not return the private backing fields of properties declared on a base class, so the inherited Name property is ignored. Hence the three instances of FooAttribute are considered equal (and the Remove method then removes whichever of them it finds first). So the question is: is there any special reason for this implementation? Or is it just a bug?
A clear bug, no. A good idea, perhaps or perhaps not.
What does it mean for one thing to be equal to another? We could get quite philosophical, if we really wanted to.
Being only slightly philosophical, there are a few things that must hold:
Equality is reflexive: Identity entails equality. x.Equals(x) must hold.
Equality is symmetric. If x.Equals(y) then y.Equals(x) and if !x.Equals(y) then !y.Equals(x).
Equality is transitive. If x.Equals(y) and y.Equals(z) then x.Equals(z).
There are a few others, though only these can directly be reflected by the code for Equals() alone.
If an implementation of an override of object.Equals(object), IEquatable<T>.Equals(T), IEqualityComparer.Equals(object, object), IEqualityComparer<T>.Equals(T, T), == or of != does not meet the above, it's a clear bug.
The other methods that reflect equality in .NET are object.GetHashCode(), IEqualityComparer.GetHashCode(object) and IEqualityComparer<T>.GetHashCode(T). Here there's a simple rule:
If a.Equals(b) then it must hold that a.GetHashCode() == b.GetHashCode(). The equivalent holds for IEqualityComparer and IEqualityComparer<T>.
If that doesn't hold, then again we've got a bug.
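These rules can be spot-checked mechanically over a handful of sample values. The sketch below is only an illustration: EqualityContract, FindViolation and BrokenComparer are hypothetical names, and passing the check on a few samples does not prove a comparer correct in general.

```csharp
using System;
using System.Collections.Generic;

public static class EqualityContract
{
    // Returns a description of the first violation found, or null if none.
    public static string FindViolation<T>(IList<T> samples, IEqualityComparer<T> comparer)
    {
        foreach (T x in samples)
        {
            if (!comparer.Equals(x, x)) return "not reflexive";
            foreach (T y in samples)
            {
                bool xy = comparer.Equals(x, y);
                if (xy != comparer.Equals(y, x)) return "not symmetric";
                if (xy && comparer.GetHashCode(x) != comparer.GetHashCode(y))
                    return "equal items with different hash codes";
                foreach (T z in samples)
                {
                    if (xy && comparer.Equals(y, z) && !comparer.Equals(x, z))
                        return "not transitive";
                }
            }
        }
        return null;
    }

    public static void Main()
    {
        var samples = new List<string> { "a", "A", "b" };
        Console.WriteLine(FindViolation(samples, StringComparer.OrdinalIgnoreCase) ?? "ok");
        // prints "ok"
        Console.WriteLine(FindViolation(samples, new BrokenComparer()) ?? "ok");
        // prints "equal items with different hash codes"
    }
}

// Deliberately broken: case-insensitive Equals, case-sensitive hash code.
public sealed class BrokenComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        return string.Equals(x, y, StringComparison.OrdinalIgnoreCase);
    }

    public int GetHashCode(string s)
    {
        return s.Length == 0 ? 0 : (int)s[0]; // 'a' and 'A' hash differently
    }
}
```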
Beyond that, there are no over-all rules on what equality must mean. It depends on the semantics of the class provided by its own Equals() overrides or by those imposed upon it by an equality comparer. Of course, those semantics should either be blatantly obvious or else documented in the class or the equality comparer.
In all, here is how an Equals and/or a GetHashCode can have a bug:
If it fails to provide the reflexive, symmetric and transitive properties detailed above.
If the relationship between GetHashCode and Equals is not as above.
If it doesn't match its documented semantics.
If it throws an inappropriate exception.
If it wanders off into an infinite loop.
In practice, if it takes so long to return as to cripple things, though one could argue there's a theory vs. practice thing here.
With the overrides on Attribute, Equals does have the reflexive, symmetric and transitive properties, its GetHashCode does match it, and the documentation for its Equals override is:
This API supports the .NET Framework infrastructure and is not intended to be used directly from your code.
You can't really say your example disproves that!
Since the code you complain about doesn't fail on any of these points, it's not a bug.
There's a bug though in this code:
var attributes = typeof(Bar).GetCustomAttributes(true).OfType<FooAttribute>().ToList<FooAttribute>();
var getC = attributes.First(item => item.Name == "C");
attributes.Remove(getC);
You first ask for an item that fulfills a criterion, and then ask for one that is equal to it to be removed. Without examining the semantics of equality for the type in question, there's no reason to expect that getC would be removed.
What you should do is:
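bool calledAlready = false; // must be initialised before the lambda captures it
attributes.RemoveAll(item => {
if(!calledAlready && item.Name == "C")
{
return calledAlready = true;
}
return false; // every other item is kept
});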
That is to say, we use a predicate that matches the first attribute with Name == "C" and no other.
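An alternative that avoids the captured flag is to look up the position of the first match and remove by index. This is only a sketch: the Tag class is a hypothetical stand-in for the attribute type, with a deliberately coarse Equals so that Remove(item) would pick the wrong element.

```csharp
using System;
using System.Collections.Generic;

public sealed class Tag
{
    public string Name { get; set; }

    // Deliberately coarse equality: every Tag equals every other Tag.
    public override bool Equals(object obj) { return obj is Tag; }
    public override int GetHashCode() { return 0; }
}

public static class RemoveDemo
{
    public static void Main()
    {
        var tags = new List<Tag>
        {
            new Tag { Name = "A" },
            new Tag { Name = "B" },
            new Tag { Name = "C" }
        };

        // tags.Remove(c) would delete the first element that Equals(c), i.e. "A".
        // Removing by index sidesteps Equals entirely.
        int index = tags.FindIndex(t => t.Name == "C");
        if (index >= 0) tags.RemoveAt(index);

        tags.ForEach(t => Console.WriteLine(t.Name)); // prints A then B
    }
}
```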
Yep, a bug as others have already mentioned in the comments. I can suggest a few possible fixes:
Option 1: don't use inheritance in the Attribute class; this will allow the default implementation to function. The other option is to use a custom comparer to ensure you are using reference equality when removing the item. You can implement a comparer easily enough: just use Object.ReferenceEquals for the comparison, and for the hash code you could use the type's hash code or System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode.
public sealed class ReferenceEqualityComparer<T> : IEqualityComparer<T>
{
bool IEqualityComparer<T>.Equals(T x, T y)
{
return Object.ReferenceEquals(x, y);
}
int IEqualityComparer<T>.GetHashCode(T obj)
{
return System.Runtime.CompilerServices.RuntimeHelpers.GetHashCode(obj);
}
}
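List<T>.Remove does not accept a comparer, so one way to put such a comparer to work is to filter with it. The sketch below restates the comparer under the hypothetical name RefComparer<T> (with public rather than explicit interface methods, so it can be called directly) and uses a hypothetical Label class whose Equals is deliberately too coarse.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;

public sealed class RefComparer<T> : IEqualityComparer<T> where T : class
{
    public bool Equals(T x, T y) { return ReferenceEquals(x, y); }

    // Identity-based hash code that ignores any GetHashCode override.
    public int GetHashCode(T obj) { return RuntimeHelpers.GetHashCode(obj); }
}

public sealed class Label
{
    public string Name { get; set; }

    // Deliberately coarse equality: all labels compare equal.
    public override bool Equals(object obj) { return obj is Label; }
    public override int GetHashCode() { return 0; }
}

public static class RefComparerDemo
{
    public static void Main()
    {
        var labels = new List<Label>
        {
            new Label { Name = "A" },
            new Label { Name = "B" },
            new Label { Name = "C" }
        };
        Label c = labels.First(l => l.Name == "C");

        // Keep everything that is not the very same instance as c.
        var comparer = new RefComparer<Label>();
        List<Label> rest = labels.Where(l => !comparer.Equals(l, c)).ToList();

        Console.WriteLine(string.Join(",", rest.Select(l => l.Name))); // A,B
    }
}
```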
I have two lists, see below. The result is coming back as empty.
List<Pay> olist = new List<Pay>();
List<Pay> nlist = new List<Pay>();
Pay oldpay = new Pay()
{
EventId = 1,
Number = 123,
Amount = 1
};
olist.Add(oldpay);
Pay newpay = new Pay ()
{
EventId = 1,
Number = 123,
Amount = 100
};
nlist.Add(newpay);
var Result = nlist.Intersect(olist);
Any clue why?
You need to override the Equals and GetHashCode methods in your Pay class, otherwise Intersect doesn't know when two instances are considered equal. How could it guess that it is the EventId that determines equality? oldpay and newpay are different instances, so by default they're not considered equal.
You can override the methods in Pay like this:
public override int GetHashCode()
{
return this.EventId;
}
public override bool Equals(object other)
{
if (other is Pay)
return ((Pay)other).EventId == this.EventId;
return false;
}
Another option is to implement an IEqualityComparer<Pay> and pass it as a parameter to Intersect:
public class PayComparer : IEqualityComparer<Pay>
{
public bool Equals(Pay x, Pay y)
{
if (x == y) // same instance or both null
return true;
if (x == null || y == null) // either one is null but not both
return false;
return x.EventId == y.EventId;
}
public int GetHashCode(Pay pay)
{
return pay != null ? pay.EventId : 0;
}
}
...
var Result = nlist.Intersect(olist, new PayComparer());
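To see both behaviours side by side, here is a small self-contained sketch. The Pay shape follows the question; PayByEventId and IntersectDemo are hypothetical names. It also shows that Intersect yields the matching elements from the first sequence, so the result carries the new Amount.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Pay
{
    public int EventId { get; set; }
    public int Number { get; set; }
    public int Amount { get; set; }
}

// Two Pay objects are treated as equal when their EventId matches.
public class PayByEventId : IEqualityComparer<Pay>
{
    public bool Equals(Pay x, Pay y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false;
        return x.EventId == y.EventId;
    }

    public int GetHashCode(Pay pay)
    {
        return pay != null ? pay.EventId : 0;
    }
}

public static class IntersectDemo
{
    public static void Main()
    {
        var olist = new List<Pay> { new Pay { EventId = 1, Number = 123, Amount = 1 } };
        var nlist = new List<Pay> { new Pay { EventId = 1, Number = 123, Amount = 100 } };

        // Default comparer: reference equality, so nothing matches.
        Console.WriteLine(nlist.Intersect(olist).Count()); // 0

        // Custom comparer: one match, taken from nlist (the first sequence).
        Pay match = nlist.Intersect(olist, new PayByEventId()).Single();
        Console.WriteLine(match.Amount); // 100
    }
}
```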
Intersect is probably only adding objects when the same instance of Pay is in both lists. As oldpay and newpay are instantiated separately, they're considered not equal.
Intersect uses the default equality comparer, which calls GetHashCode and Equals, to compare objects. If you don't override them they keep the behavior of the Object class: two objects are equal only if both are the same instance.
You should override the Equals method (and GetHashCode along with it) in Pay.
//in the Pay class
public override bool Equals(Object o) {
Pay pay = o as Pay;
if (pay == null) return false;
// you haven't said if Number should be included in the comparison
return EventId == pay.EventId; // && Number == pay.Number; (if applies)
}
// must agree with Equals: equal objects have to return equal hash codes
public override int GetHashCode() {
return EventId;
}
Objects are reference types. When you create two objects, you have two unique references. The only way they would ever compare equal is if you did:
object a = new object();
object b = a;
In this case, (a == b) is true. Read up on reference vs. value types, and objects.
And to fix your issue, override Equals and GetHashCode, as Thomas Levesque pointed out.
As others have noted, you need to provide the appropriate overrides to get Intersect to work correctly. But there is another way if you don't want to bother with overrides and your use case is simple. This assumes you want to match items on EventId, but you can modify this to compare any property. Note that this approach is likely more expensive than calling Intersect, but for small data sets it may not matter.
List<Pay> intersectedPays = new List<Pay>();
foreach (Pay o in olist)
{
var intersectedPay = nlist.Where(n => n.EventId == o.EventId).SingleOrDefault();
if (intersectedPay != null)
intersectedPays.Add(intersectedPay);
}
List<Pay> result = intersectedPays;
I have a dialog; when spawned, it gets populated with the data in an object model. At this point the data is copied and stored in a "backup" object model. When the user has finished making their changes and clicks "ok" to dismiss the dialog, I need a quick way of comparing the backup object model with the live one: if anything has changed I can create a new undo state for the user.
I don't want to have to go and write comparison function for every single class in the object model if possible.
If I serialised both object models and they were identical but stored in different memory locations would they be equal? Does some simple way exist to compare two serialised object models?
I didn't bother with a hash string but just a straight Binary serialisation works wonders. When the dialog opens serialise the object model.
BinaryFormatter formatter = new BinaryFormatter();
m_backupStream = new MemoryStream();
formatter.Serialize(m_backupStream,m_objectModel);
The user then may or may not change the object model using the available controls. When the dialog closes you can compare the original serialisation with a new one; for me this is how I decide whether or not an undo state is required.
BinaryFormatter formatter = new BinaryFormatter();
MemoryStream liveStream = new MemoryStream();
formatter.Serialize(liveStream,m_objectModel);
byte[] streamOneBytes = liveStream.ToArray();
byte[] streamTwoBytes = m_backupStream.ToArray();
if(!CompareArrays(streamOneBytes, streamTwoBytes))
AddUndoState();
And the compare-arrays function in case anybody needs it (probably not the best way of comparing two arrays, I'm sure).
private bool CompareArrays(byte[] a, byte[] b)
{
if (a.Length != b.Length)
return false;
for (int i = 0; i < a.Length;i++)
{
if (a[i] != b[i])
return false;
}
return true;
}
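The same snapshot-and-compare idea can be sketched without a hand-written array loop by using LINQ's SequenceEqual. The example below also swaps BinaryFormatter for System.Text.Json, on the assumption that you are on a runtime where BinaryFormatter is obsolete (.NET 5 and later) and that the model's state is exposed through public properties; Settings and SnapshotDemo are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public class Settings
{
    public string Title { get; set; }
    public List<int> Values { get; set; }
}

public static class SnapshotDemo
{
    // Serialises the public properties of the model to UTF-8 JSON bytes.
    static byte[] Snapshot(Settings model)
    {
        return JsonSerializer.SerializeToUtf8Bytes(model);
    }

    public static void Main()
    {
        var model = new Settings { Title = "Draft", Values = new List<int> { 1, 2 } };

        byte[] backup = Snapshot(model); // taken when the dialog opens

        // Nothing changed yet, so the snapshots are byte-for-byte identical.
        Console.WriteLine(backup.SequenceEqual(Snapshot(model))); // True

        model.Values.Add(3); // simulated user edit

        // The edit changes the serialised form, so an undo state is needed.
        Console.WriteLine(backup.SequenceEqual(Snapshot(model))); // False
    }
}
```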
I'd say the best way is to implement the equality operators on all classes in your model (which is usually a good idea anyway if you're going to do comparisons).
class Book
{
public string Title { get; set; }
public string Author { get; set; }
public ICollection<Chapter> Chapters { get; set; }
public bool Equals(Book other)
{
if (ReferenceEquals(null, other)) return false;
if (ReferenceEquals(this, other)) return true;
return Equals(other.Title, Title) && Equals(other.Author, Author) && Equals(other.Chapters, Chapters);
}
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj)) return false;
if (ReferenceEquals(this, obj)) return true;
if (obj.GetType() != typeof (Book)) return false;
return Equals((Book) obj);
}
public override int GetHashCode()
{
unchecked
{
int result = (Title != null ? Title.GetHashCode() : 0);
result = (result*397) ^ (Author != null ? Author.GetHashCode() : 0);
result = (result*397) ^ (Chapters != null ? Chapters.GetHashCode() : 0);
return result;
}
}
}
This snippet is auto-generated by ReSharper, but you can use it as a basis. Basically you will have to extend the non-overridden Equals method with your custom comparison logic.
For instance, you might want to use SequenceEqual from the LINQ extensions to check whether the chapters collections are equal in sequence.
Comparing two books will now be as simple as saying:
Book book1 = new Book();
Book book2 = new Book();
book1.Title = "A book!";
book2.Title = "A book!";
bool equality = book1.Equals(book2); // returns true
book2.Title = "A different Title";
equality = book1.Equals(book2); // returns false
Keep in mind that there's another way of implementing equality: System.IEquatable<T>, which is used by various classes in the System.Collections.Generic namespace for determining equality.
I'd say check that out as well and you're well on your way!
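As a rough sketch of how the two suggestions combine, the variant below implements IEquatable<Book> and compares chapters element by element with SequenceEqual. To stay short it assumes chapters are plain strings in a List<string> rather than a Chapter class, and it drops the Author property; BookDemo is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class Book : IEquatable<Book>
{
    public string Title { get; set; }
    public List<string> Chapters { get; set; }

    public bool Equals(Book other)
    {
        if (ReferenceEquals(other, null)) return false;
        if (ReferenceEquals(other, this)) return true;
        if (Title != other.Title) return false;
        // Two missing chapter lists are equal; one missing list is not.
        if (Chapters == null || other.Chapters == null)
            return Chapters == null && other.Chapters == null;
        // Same elements in the same order.
        return Chapters.SequenceEqual(other.Chapters);
    }

    public override bool Equals(object obj) { return Equals(obj as Book); }

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = Title != null ? Title.GetHashCode() : 0;
            if (Chapters != null)
            {
                // Hash the elements, not the list reference.
                foreach (string chapter in Chapters)
                    hash = (hash * 397) ^ (chapter != null ? chapter.GetHashCode() : 0);
            }
            return hash;
        }
    }
}

public static class BookDemo
{
    public static void Main()
    {
        var a = new Book { Title = "T", Chapters = new List<string> { "one", "two" } };
        var b = new Book { Title = "T", Chapters = new List<string> { "one", "two" } };
        Console.WriteLine(a.Equals(b)); // True: separate lists, same contents
        b.Chapters[1] = "three";
        Console.WriteLine(a.Equals(b)); // False
    }
}
```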
I understand your question to be how one can compare two objects for value equality (as opposed to reference equality) without prior knowledge of the types, such as if they implement IEquatable or override Equals.
To do this I recommend two options:
A. Use an all-purpose serialization class to serialize both objects and compare the results. For example, I have a class called XmlSerializer that takes any object and serializes its public properties as an XML document. Two objects that hold the same values (whether or not they are the same reference) are equal in this sense.
B. Using reflection, compare the values of all of the properties of both objects, like:
bool Equal(object a, object b)
{
// They're both null.
if (a == null && b == null) return true;
// One is null, so they can't be the same.
if (a == null || b == null) return false;
// How can they be the same if they're different types?
if (a.GetType() != b.GetType()) return false;
var Props = a.GetType().GetProperties();
foreach(var Prop in Props)
{
// See notes *
var aPropValue = Prop.GetValue(a) ?? string.Empty;
var bPropValue = Prop.GetValue(b) ?? string.Empty;
if(aPropValue.ToString() != bPropValue.ToString())
return false;
}
return true;
}
Here we're assuming that we can easily compare the properties, like if they all implement IConvertible, or correctly override ToString. If that's not the case I would check if they implement IConvertible and if not, recursively call Equal() on the properties.
This only works if you're content with comparing public properties. Of course you COULD check private and protected fields and properties too, but if you know so little about the objects you're probably asking for trouble by doing so.
Long story short: I have 2 collections of objects. One contains good values (let's call it "Good"), the other default values (Mr. "Default"). I want the Intersect of the Union between Good and Default, and Default. In other words: Intersect(Union(Good, Default), Default). One might think it resolves as Default, but here is where it gets tricky: I use a custom IEqualityComparer.
I have the following classes:
class MyClass
{
public string MyString1;
public string MyString2;
public string MyString3;
}
class MyEqualityComparer : IEqualityComparer<MyClass>
{
public bool Equals(MyClass item1, MyClass item2)
{
if(item1 == null && item2 == null)
return true;
else if((item1 != null && item2 == null) ||
(item1 == null && item2 != null))
return false;
return item1.MyString1.Equals(item2.MyString1) &&
item1.MyString2.Equals(item2.MyString2);
}
public int GetHashCode(MyClass item)
{
return new { item.MyString1, item.MyString2 }.GetHashCode();
}
}
Here are the characteristics of my Good and Default collections:
Default: It's a large set, containing all the wanted { MyString1, MyString2 } pairs, but the MyString3 values are, as you can guess, default values.
Good: It's a smaller set, containing mostly items which are in the Default set, but with some good MyString3 values. It also has some { MyString1, MyString2 } pairs that are outside of the wanted set.
What I want to do is this: take only the items from Good that are in Default, but add the other items in Default to that.
Here is what I think is my best try:
HalfWantedResult = Good.Union(Default, new MyEqualityComparer());
WantedResult= HalfWantedResult.Intersect(Good, new MyEqualityComparer());
I thought it should have worked, but the result I get is basically only the good { MyString1, MyString2 } pairs set, but all coming from the Default set, so I have the default value all across. I also tried switching the Default and Good of the last Intersect, but I get the same result.
First of all this is wrong:
public bool Equals(MyClass item1, MyClass item2)
{
return GetHashCode(item1) == GetHashCode(item2);
}
If the hash codes are different then the two corresponding items are certainly different, but if they're equal it is not guaranteed that the two items are equal.
So this is the correct Equals implementation:
public bool Equals(MyClass item1, MyClass item2)
{
if(object.ReferenceEquals(item1, item2))
return true;
if(item1 == null || item2 == null)
return false;
return item1.MyString1.Equals(item2.MyString1) &&
item1.MyString2.Equals(item2.MyString2);
}
As Slacks suggested (beating me to it), the code is the following:
var Default = new List<MyClass>
{
new MyClass{MyString1="A",MyString2="A",MyString3="-"},
new MyClass{MyString1="B",MyString2="B",MyString3="-"},
new MyClass{MyString1="X",MyString2="X",MyString3="-"},
new MyClass{MyString1="Y",MyString2="Y",MyString3="-"},
new MyClass{MyString1="Z",MyString2="Z",MyString3="-"},
};
var Good = new List<MyClass>
{
new MyClass{MyString1="A",MyString2="A",MyString3="+"},
new MyClass{MyString1="B",MyString2="B",MyString3="+"},
new MyClass{MyString1="C",MyString2="C",MyString3="+"},
new MyClass{MyString1="D",MyString2="D",MyString3="+"},
new MyClass{MyString1="E",MyString2="E",MyString3="+"},
};
var wantedResult = Good.Intersect(Default, new MyEqualityComparer())
.Union(Default, new MyEqualityComparer());
// wantedResult:
// A A +
// B B +
// X X -
// Y Y -
// Z Z -
You need to check for actual equality, not just hashcode equality.
GetHashCode() is not (and cannot be) collision free, which is why the Equals method is required in the first place.
Also, you can do this much more simply by writing
WantedResult = Good.Concat(Default).Distinct();
The Distinct method will return the first item of each pair of duplicates, so this will return the desired result.
EDIT: That should be
WantedResult = Good.Intersect(Default, new MyEqualityComparer())
.Union(Default, new MyEqualityComparer());
I am expecting a HashSet that has been created with a specified EqualityComparer to use that comparer on a Remove operation, especially since the Contains operation returns true!
Here is the code I am using:
public virtual IEnumerable<Allocation> Allocations { get { return _allocations; } }
private ICollection<Allocation> _allocations;
public Activity(IActivitySubject subject) { // constructor
....
_allocations = new HashSet<Allocation>(new DurationExcludedEqualityComparer());
}
public virtual void ClockIn(Allocation a)
{
...
if (_allocations.Contains(a))
_allocations.Remove(a);
_allocations.Add(a);
}
Below is some quick and dirty LINQ that gets me the logic I want, but I am guessing the HashSet remove based on the EqualityComparer would be significantly faster.
public virtual void ClockIn(Allocation a)
{
...
var found = _allocations.Where(x => x.StartTime.Equals(a.StartTime) && x.Resource.Equals(a.Resource)).FirstOrDefault();
if (found != null)
{
if (!Equals(found.Duration, a.Duration))
{
found.UpdateDurationTo(a.Duration);
}
}
else
{
_allocations.Add(a);
}
}
Can anyone suggest why the Remove would fail when the Contains succeeds?
Cheers,
Berryl
=== EDIT === the comparer
public class DurationExcludedEqualityComparer : EqualityComparer<Allocation>
{
public override bool Equals(Allocation lhs, Allocation rhs)
{
if (ReferenceEquals(null, rhs)) return false;
if (ReferenceEquals(lhs, null)) return false;
if (ReferenceEquals(lhs, rhs)) return true;
return
lhs.StartTime.Equals(rhs.StartTime) &&
lhs.Resource.Equals(rhs.Resource) &&
lhs.Activity.Equals(rhs.Activity);
}
public override int GetHashCode(Allocation obj) {
if (ReferenceEquals(obj, null)) return 0;
unchecked
{
var result = 17;
result = (result * 397) ^ obj.StartTime.GetHashCode();
result = (result * 397) ^ (obj.Resource != null ? obj.Resource.GetHashCode() : 0);
result = (result * 397) ^ (obj.Activity != null ? obj.Activity.GetHashCode() : 0);
return result;
}
}
}
=== UPDATE - FIXED ===
Well, the good news is that HashSet is not broken and works exactly as it should. The bad news, for me, is how incredibly stupid I can be when not being able to see the forest while examining the leaves on the trees!
The answer is actually in the posted code above, if you look at the class creating & owning the HashSet, and then taking another look at the Comparer to find out what is wrong with it. Easy points for the first person to spot it.
Thanks to all who looked at the code!
Well, your code that "works" appears to look at StartTime and Resource while ignoring Activity, whereas your IEqualityComparer<Allocation> implementation looks at all three. Could your problem be related to that?
Also: are your StartTime, Resource, and Activity properties unchanging? Otherwise, since they affect your GetHashCode result, I think you run the risk of breaking your HashSet<Allocation>.
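The second point can be demonstrated in isolation. The sketch below uses a hypothetical Slot class and SlotByStart comparer (not the Allocation types from the question) to show what happens when a property that feeds GetHashCode changes after the item has been added to a HashSet.

```csharp
using System;
using System.Collections.Generic;

public sealed class Slot
{
    public int Start { get; set; } // mutable, and used for hashing below
}

public sealed class SlotByStart : IEqualityComparer<Slot>
{
    public bool Equals(Slot x, Slot y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false;
        return x.Start == y.Start;
    }

    public int GetHashCode(Slot s)
    {
        return s == null ? 0 : s.Start;
    }
}

public static class MutationDemo
{
    public static void Main()
    {
        var set = new HashSet<Slot>(new SlotByStart());
        var slot = new Slot { Start = 1 };
        set.Add(slot);
        Console.WriteLine(set.Contains(slot)); // True

        // The set stored the item under the hash code it had when added.
        slot.Start = 2;

        Console.WriteLine(set.Contains(slot)); // False: looked up under the new hash
        Console.WriteLine(set.Remove(slot));   // False: same reason
        Console.WriteLine(set.Count);          // 1: the item is still in there
    }
}
```

Keeping the hashed properties immutable, or removing and re-adding an item around any change to them, avoids this.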