Remove duplicates from a list containing a generic class - c#

I'm trying to remove duplicates from a list containing a generic class. The generic class looks like this (stripped back example):
public class Point2D<T>
{
public T x;
public T y;
public Point2D(T x, T y)
{
this.x = x;
this.y = y;
}
}
and I've created the list like this:
List<Point2D<int>> pointList = new List<Point2D<int>>();
pointList.Add(new Point2D<int>(1,1));
pointList.Add(new Point2D<int>(1,2));
pointList.Add(new Point2D<int>(1,1));
pointList.Add(new Point2D<int>(1,3));
I tried removing the duplicates by:
pointList = pointList.Distinct().ToList();
expecting that pointList would only contain: (1,1), (1,2), (1,3) but it still contains all four points that were entered. I suspect I need my own equals or comparator method in Point2D, but I don't know if this is the case, or how they should be written (unless of course I'm just missing something simple).

To do this, you need to override the Equals method:
public class Point2D<T>
{
public readonly T x;
public readonly T y;
public Point2D(T x, T y)
{
this.x = x;
this.y = y;
}
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj)) return false;
if (ReferenceEquals(this, obj)) return true;
if (obj.GetType() != this.GetType()) return false;
return Equals((Point2D<T>) obj);
}
protected bool Equals(Point2D<T> other)
{
return EqualityComparer<T>.Default.Equals(x, other.x) && EqualityComparer<T>.Default.Equals(y, other.y);
}
public override int GetHashCode()
{
unchecked
{
return (EqualityComparer<T>.Default.GetHashCode(x)*397) ^ EqualityComparer<T>.Default.GetHashCode(y);
}
}
}
You also need to override GetHashCode. For the hash code to stay stable while an object sits in a hash-based collection, x and y should be readonly fields.

You can use an anonymous type. However, this creates new objects, so use it only when you do not need the original references.
pointList = pointList.Select(x => new {x.x,x.y}).Distinct().Select(x => new Point2D<int>(x.x, x.y)).ToList();

You will need to implement the IEquatable<T> interface for this custom class. Check this link for more details and a sample:
https://msdn.microsoft.com/en-us/library/vstudio/bb348436(v=vs.100).aspx
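As a rough sketch of what the IEquatable<T> suggestion could look like for the Point2D<T> class from the question (the Demo class and the decision to seal the type are my own assumptions, not something the linked page prescribes):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch only: Point2D<T> from the question, extended with IEquatable<Point2D<T>>.
public sealed class Point2D<T> : IEquatable<Point2D<T>>
{
    public readonly T x;
    public readonly T y;

    public Point2D(T x, T y)
    {
        this.x = x;
        this.y = y;
    }

    // Typed equality: Distinct() picks this up through EqualityComparer<Point2D<T>>.Default.
    public bool Equals(Point2D<T> other)
    {
        if (ReferenceEquals(other, null)) return false;
        return EqualityComparer<T>.Default.Equals(x, other.x)
            && EqualityComparer<T>.Default.Equals(y, other.y);
    }

    // Keep the untyped overload consistent with the typed one.
    public override bool Equals(object obj)
    {
        return Equals(obj as Point2D<T>);
    }

    // Equal points must produce equal hash codes.
    public override int GetHashCode()
    {
        unchecked
        {
            return (EqualityComparer<T>.Default.GetHashCode(x) * 397)
                 ^ EqualityComparer<T>.Default.GetHashCode(y);
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        var pointList = new List<Point2D<int>>
        {
            new Point2D<int>(1, 1),
            new Point2D<int>(1, 2),
            new Point2D<int>(1, 1),
            new Point2D<int>(1, 3)
        };

        pointList = pointList.Distinct().ToList();
        Console.WriteLine(pointList.Count); // 3
    }
}
```

Implementing only IEquatable<T> without overriding GetHashCode would not be enough here, since Distinct() groups candidates by hash code before it compares them.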

I would suggest overloading the == operator, although Distinct() itself relies on Equals and GetHashCode, so those need to be overridden as well.

Related

Efficient way of retrieving an item from a collection with a mutable key

I have a collection of items Foos that have a property FooPosition, and I need to quickly access Foos by their positions.
For example : retrieve a Foo which is located at X=0 and Y=1.
My first thought was to use a dictionary for that purpose and to use the FooPosition as the dictionary key. I know that every FooPosition in my collection is unique, and I don't mind throwing an exception if that is not the case.
This works well as long as Foos do not move all over the place.
But, as I figured out the hard way, and understood thanks to other posts on the subject, this no longer works once the FooPosition is updated. I shouldn't use mutable keys in a dictionary: the dictionary stores the hash code that the FooPosition had when it was inserted and does not update it when the underlying FooPosition is modified. Therefore, calling dic[Position(0,1)] gives me the Foo which was at this position when the dictionary was built.
So, I am now wondering what should I use to retrieve Foos by their positions efficiently.
By efficiently I mean not walking the whole collection every time I query for a Foo by its position. Is there a suitable structure which would accommodate mutable keys?
Thanks for your help
EDIT
As mentioned rightfully in comments, there is a missing part in my question : I have no control over Foo Moves. The software is actually connected to another software (Excel via VSTO) via a COM Protocol which changes the FooPosition (Excel Ranges) without notifying the change.
Therefore, I cannot take any action when a move happens, because I don't know that a change occurred.
public class FooManager
{
public void DoSomething(IList<Foo> foos) {
Dictionary<FooPosition, Foo> fooPositionDictionary = foos.ToDictionary(x => x.Position, x => x); //I know that position is unique
//Move Foos all around the place by changing their positions.
FooPosition queryPosition = new FooPosition(0, 1);
fooPositionDictionary.TryGetValue(queryPosition, out var foo1); //DOES NOT WORK
var foo2 = foos.FirstOrDefault(x => x.Position == queryPosition); //NOT EFFICIENT
//Any better idea?
}
}
public class Foo
{
public string Name { get; set; }
public FooPosition Position { get; set; }
}
public class FooPosition : IEquatable<FooPosition>
{
public int X { get; set; }
public int Y { get; set; }
public FooPosition(int x, int y)
{
X = x;
Y = y;
}
public void MoveBy(int i)
{
X = X + i;
Y = Y + i;
}
public bool Equals(FooPosition other)
{
if (ReferenceEquals(null, other)) return false;
if (ReferenceEquals(this, other)) return true;
return X == other.X && Y == other.Y;
}
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj)) return false;
if (ReferenceEquals(this, obj)) return true;
if (obj.GetType() != this.GetType()) return false;
return Equals((FooPosition) obj);
}
public override int GetHashCode()
{
unchecked
{
return (X * 397) ^ Y;
}
}
public static bool operator ==(FooPosition left, FooPosition right)
{
return Equals(left, right);
}
public static bool operator !=(FooPosition left, FooPosition right)
{
return !Equals(left, right);
}
}
In some sense a dictionary, like any other hash-based data store, uses a kind of caching. In this case the hashes are cached. However, as with every cache, you need some data that stays constant during the lifetime of that data store. If there is no such constant data, there's no way to cache it efficiently.
So you end up storing all items in some linear collection, e.g. a List<T>, and iterating over that list again and again.
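To make the stale-hash problem concrete, here is a minimal sketch. Pos is a hypothetical stand-in for FooPosition with the same hash formula; the rest is plain Dictionary behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for FooPosition: a mutable key with value-based equality.
public class Pos : IEquatable<Pos>
{
    public int X { get; set; }
    public int Y { get; set; }
    public Pos(int x, int y) { X = x; Y = y; }

    public bool Equals(Pos other) { return other != null && X == other.X && Y == other.Y; }
    public override bool Equals(object obj) { return Equals(obj as Pos); }
    public override int GetHashCode() { unchecked { return (X * 397) ^ Y; } }
}

public static class Demo
{
    public static void Main()
    {
        var key = new Pos(0, 0);
        var dict = new Dictionary<Pos, string> { { key, "foo" } };

        // The key "moves" after insertion; the dictionary is not told about it.
        key.Y = 1;

        // New position: its hash (1) differs from the hash stored at insertion (0).
        Console.WriteLine(dict.ContainsKey(new Pos(0, 1))); // False

        // Old position: the stored hash matches, but Equals now fails.
        Console.WriteLine(dict.ContainsKey(new Pos(0, 0))); // False

        // A linear scan ignores the cached hash and still finds the entry.
        Console.WriteLine(dict.Keys.Any(k => k.Equals(new Pos(0, 1)))); // True
    }
}
```

The entry is still in the dictionary, it just cannot be reached through a hash lookup any more, which is why the answer falls back to iterating.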

How to check if list contains item [duplicate]

Why does this program print "not added" while I think it should print "added"?
using System;
using System.Collections.Generic;
class Element
{
public int id;
public Element(int id)
{
this.id = id;
}
public static implicit operator Element(int d)
{
Element ret = new Element(d);
return ret;
}
public static bool operator ==(Element e1, Element e2)
{
return (e1.id == e2.id);
}
public static bool operator !=(Element e1, Element e2)
{
return !(e1.id == e2.id);
}
}
class MainClass
{
public static void Main(string[] args)
{
List<Element> element = new List<Element>();
element.Add(2);
if(element.Contains(2))
Console.WriteLine("added");
else
Console.WriteLine("not added");
}
}
The Contains method does not use the == operator. What is the problem?
The Contains method does not use the == operator
No - it uses Equals, which you haven't overridden... so you're getting the default behaviour of Equals, which is to check for reference identity instead. You should override Equals(object) and GetHashCode to be consistent with each other - and for sanity's sake, consistent with your == overload too.
I'd also recommend implementing IEquatable<Element>, which List<Element> will use in preference to Equals(object), as EqualityComparer<T>.Default picks it up appropriately.
Oh, and your operator overloads should handle null references, too.
I'd also strongly recommend using private fields instead of public ones, and making your type immutable - seal it and make id readonly. Implementing equality for mutable types can lead to odd situations. For example:
Dictionary<Element, string> dictionary = new Dictionary<Element, string>();
Element x = new Element(10);
dictionary[x] = "foo";
x.id = 100;
Console.WriteLine(dictionary[x]); // No such element!
This would happen because the hash code would change (at least under most implementations), so the hash table underlying the dictionary wouldn't be able to find even a reference to the same object that's already in there.
So your class would look something like this:
internal sealed class Element : IEquatable<Element>
{
private readonly int id;
public int Id { get { return id; } }
public Element(int id)
{
this.id = id;
}
public static implicit operator Element(int d)
{
return new Element(d);
}
public static bool operator ==(Element e1, Element e2)
{
if (object.ReferenceEquals(e1, e2))
{
return true;
}
if (object.ReferenceEquals(e1, null) ||
object.ReferenceEquals(e2, null))
{
return false;
}
return e1.id == e2.id;
}
public static bool operator !=(Element e1, Element e2)
{
// Delegate...
return !(e1 == e2);
}
public bool Equals(Element other)
{
return this == other;
}
public override int GetHashCode()
{
return id;
}
public override bool Equals(object obj)
{
// Delegate...
return Equals(obj as Element);
}
}
(I'm not sure about the merit of the implicit conversion, by the way - I typically stay away from those, myself.)
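For what it's worth, a condensed version of that class (operator overloads left out for brevity, Demo class added by me as an assumption) is enough to make the original program behave as expected:

```csharp
using System;
using System.Collections.Generic;

// Condensed sketch of the immutable Element above, without the == / != overloads.
internal sealed class Element : IEquatable<Element>
{
    private readonly int id;

    public Element(int id) { this.id = id; }

    public static implicit operator Element(int d) { return new Element(d); }

    // List<T>.Contains reaches this through EqualityComparer<Element>.Default.
    public bool Equals(Element other)
    {
        return !ReferenceEquals(other, null) && id == other.id;
    }

    public override bool Equals(object obj) { return Equals(obj as Element); }

    public override int GetHashCode() { return id; }
}

internal static class Demo
{
    private static void Main()
    {
        var elements = new List<Element>();
        elements.Add(2); // implicit conversion creates a new Element(2)

        // Contains(2) creates a second Element(2); the two compare equal by id.
        Console.WriteLine(elements.Contains(2));             // True
        Console.WriteLine(elements.IndexOf(new Element(3))); // -1
    }
}
```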
The Contains method does not use the == operator. What is the problem?
That is correct.
This method [Contains] determines equality by using the default equality comparer, as defined by the object's implementation of the IEquatable<T>.Equals method for T (the type of values in the list).
http://msdn.microsoft.com/en-us/library/bhkz42b3(v=vs.110).aspx
You need to override Equals() as well. Note that when you override Equals(), it is almost always correct to also override GetHashCode().
Override Equals and GetHashCode like:
class Element
{
public int id;
protected bool Equals(Element other)
{
return id == other.id;
}
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj)) return false;
if (ReferenceEquals(this, obj)) return true;
if (obj.GetType() != this.GetType()) return false;
return Equals((Element) obj);
}
public override int GetHashCode()
{
return id; //or id.GetHashCode();
}
//..... rest of the class
}
See: List<T>.Contains Method
This method determines equality by using the default equality
comparer, as defined by the object's implementation of the
IEquatable<T>.Equals method for T (the type of values in the list).

Overriding GetHash() And Equals()

I am having trouble overriding the GetHashCode() method and the Equals() method.
public class Coordinate
{
int x;
int y;
public Coordinate(int p,int q)
{
this.x = p ;
this.y = q;
}
}
Suppose I created two Coordinate point objects with same x and y coordinates .
I want my program to understand that they are equal.
Coordinate point1 = new Coordinate(0,0);
Coordinate point2 = new Coordinate(0,0);
By default they are giving different GetHashCode() as expected.
I want them to give same hash code by overriding it and then use that hash code as a Key to generate values from a Dictionary. After searching about it, I know that I also have to override Equals().
You have to override Equals(), because if two objects have the same hashcode, it doesn't mean they are to be considered equal. The hashcode simply acts as an "index" to speed up searches.
Every time you use new, an instance is created and it will not be the same instance as another instance. This is what ReferenceEquals() checks - imagine two identical bottles of soda - they're the same, but they're not the same bottle.
Equals() is meant to check whether you (the developer) want to consider two instances as equal, even though they are not the same instance.
You can implement something in this vein:
public override bool Equals(object o) {
// Same instance: trivially equal
if (object.ReferenceEquals(o, this))
return true;
// null or a different type: not equal
Coordinate other = o as Coordinate;
if (null == other)
return false;
return x == other.x && y == other.y;
}
public override int GetHashCode() {
return x.GetHashCode() ^ y.GetHashCode();
}
where Equals returns true if and only if the instances are equal, while GetHashCode() gives a quick estimate (instances with different hash codes are never equal; the reverse, however, is not true) and should distribute the hashes as uniformly as possible (so that Dictionary and similar structures hold roughly the same number of values per bucket).
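A small sketch of the "reverse is not true" part, using the XOR-based hash from the snippet above (the Demo class is just scaffolding I added):

```csharp
using System;

public class Coordinate
{
    private readonly int x;
    private readonly int y;

    public Coordinate(int p, int q) { x = p; y = q; }

    public override bool Equals(object o)
    {
        var other = o as Coordinate;
        return other != null && x == other.x && y == other.y;
    }

    // XOR is symmetric, so (1,2) and (2,1) share a hash code.
    public override int GetHashCode() { return x.GetHashCode() ^ y.GetHashCode(); }
}

public static class Demo
{
    public static void Main()
    {
        var a = new Coordinate(1, 2);
        var b = new Coordinate(2, 1);
        var c = new Coordinate(1, 2);

        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True: a collision
        Console.WriteLine(a.Equals(b));                        // False: Equals has the final say
        Console.WriteLine(a.Equals(c));                        // True
        Console.WriteLine(a.GetHashCode() == c.GetHashCode()); // True: equal objects, equal hashes
    }
}
```

The collision is harmless for correctness, but a hash that mixes x and y asymmetrically would spread keys more evenly.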
https://msdn.microsoft.com/en-us/library/336aedhh(v=vs.100).aspx
https://msdn.microsoft.com/en-us/library/system.object.gethashcode(v=vs.110).aspx
I would override the named methods like this. For the GetHashCode method I took one of several options from this question, but you can choose another if you like.
I also changed the class to immutable. You should only use immutable properties/fields to calculate the hashcode.
public class Coordinate {
public Coordinate(int p, int q) {
x = p;
y = q;
}
private readonly int x;
private readonly int y;
public int X { get { return x; } }
public int Y { get { return y; } }
public override int GetHashCode() {
unchecked // Overflow is fine, just wrap
{
int hash = (int) 2166136261;
// Suitable nullity checks etc, of course :)
hash = (hash * 16777619) ^ x.GetHashCode();
hash = (hash * 16777619) ^ y.GetHashCode();
return hash;
}
}
public override bool Equals(object obj) {
if (obj == null)
return false;
var otherCoordinate = obj as Coordinate;
if (otherCoordinate == null)
return false;
return
this.X == otherCoordinate.X &&
this.Y == otherCoordinate.Y;
}
}
Here's a simple way to do it.
First, override the ToString() method of your class to something like this:
public override string ToString()
{
return string.Format("[{0}, {1}]", this.x, this.y);
}
Now you can easily override GetHashCode() and Equals() like this:
public override int GetHashCode()
{
return this.ToString().GetHashCode();
}
public override bool Equals(object obj)
{
// Guard against null and unrelated types that happen to print the same text
return obj is Coordinate && obj.ToString() == this.ToString();
}
Now if you try this:
Coordinate p1 = new Coordinate(5, 0);
Coordinate p2 = new Coordinate(5, 0);
Console.WriteLine(p1.Equals(p2));
you'll get:
True
What you are trying to do is typical for immutable value-like objects. If you don't want to use a struct, you can do it like this:
public class Coord : IEquatable<Coord>
{
public Coord(int x, int y)
{
this.X = x;
this.Y = y;
}
public int X { get; }
public int Y { get; }
public override int GetHashCode()
{
// Just pick numbers that are coprime
int hash = 17;
hash = hash * 23 + this.X.GetHashCode();
hash = hash * 23 + this.Y.GetHashCode();
return hash;
}
public override bool Equals(object obj)
{
var casted = obj as Coord;
if (object.ReferenceEquals(this, casted))
{
return true;
}
return this.Equals(casted);
}
public static bool operator !=(Coord first, Coord second)
{
return !(first == second);
}
public static bool operator ==(Coord first, Coord second)
{
if (object.ReferenceEquals(second, null))
{
if (object.ReferenceEquals(first, null))
{
return true;
}
return false;
}
return first.Equals(second);
}
public bool Equals(Coord other)
{
if (object.ReferenceEquals(other, null))
{
return false;
}
return object.ReferenceEquals(this, other) || (this.X.Equals(other.X) && this.Y.Equals(other.Y));
}
}
Note: you really should make your class immutable if you implement custom equality, since a mutable key can break your code when used in a hash-based collection.
I think it is considered good practice to provide all of those overloads when you want custom equality checking like you do, especially since, when GetHashCode() returns the same value for two objects, Dictionary and other hash-based collections fall back to the default equality comparer, which calls Equals.
Object.ReferenceEquals(Ob,Ob) determines reference equality, i.e. whether both references point to the same allocated object. If two references are equal, they are guaranteed to be the exact same object.
Object.Equals(Ob) is the virtual method on the object class. By default it compares references, just like Object.ReferenceEquals(Ob,Ob).
Object.Equals(Ob,Ob) calls Ob.Equals(Ob), so it is just a static shorthand that checks for null beforehand.
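The three methods can be compared side by side in a short sketch. It uses string because it is a built-in type that overrides Equals; the Demo class is an assumption of mine.

```csharp
using System;

public static class Demo
{
    public static void Main()
    {
        // Two distinct string instances with the same content.
        string s1 = new string('a', 3);
        string s2 = new string('a', 3);

        Console.WriteLine(object.ReferenceEquals(s1, s2)); // False: different objects
        Console.WriteLine(s1.Equals(s2));                  // True: string overrides Equals
        Console.WriteLine(object.Equals(s1, s2));          // True: delegates to s1.Equals(s2)

        // Plain objects keep the default, reference-based Equals.
        object o1 = new object();
        object o2 = new object();
        Console.WriteLine(o1.Equals(o2)); // False
        Console.WriteLine(o1.Equals(o1)); // True

        // The static overload handles null on either side without throwing.
        Console.WriteLine(object.Equals(null, null)); // True
        Console.WriteLine(object.Equals(null, o1));   // False
    }
}
```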

How can I use a HashSet<MyCustomClass> to remove duplicates of MyCustomClass?

I have a HashSet<MyCustomClass> mySet = new HashSet<MyCustomClass>(); and I wish to remove all MyCustomClass's that contain the same values.
Let's say MyCustomClass looks like this:
public class MyCustomClass
{
Point point;
public MyCustomClass(int x, int y)
{
point.X = x;
point.Y = y;
}
// Other methods...
}
I tried to implement IEqualityComparer like MSDN suggests, and pass it to the constructor of the HashSet<MyCustomClass>, but it did not work.
What's the correct approach?
EDIT:
This is my Chain class and my ChainEqualityComparer:
public class Chain
{
HashSet<Mark> chain;
HashSet<Mark> marks;
public Chain(HashSet<Mark> marks)
{
chain = new HashSet<Mark>();
this.marks = marks;
}
// Other methods...
}
public class ChainEqualityComparer : IEqualityComparer<Chain>
{
#region IEqualityComparer<Chain> Members
public bool Equals(Chain x, Chain y)
{
if (x.ChainWithMarks.Count == y.ChainWithMarks.Count)
{
foreach (Mark mark in x.ChainWithMarks)
{
if (!y.ChainWithMarks.Contains(mark))
return false;
}
return true;
}
return false;
}
public int GetHashCode(Chain obj)
{
return obj.GetHashCode() ^ obj.GetType().GetHashCode();
}
#endregion
}
And this is my Mark class:
public class Mark
{
int x;
int y;
public Mark(int x, int y)
{
this.x = x;
this.y = y;
}
public int X
{
get { return x; }
set { x = value; }
}
public int Y
{
get { return y; }
set { y = value; }
}
}
public class MarkEqualityComparer : IEqualityComparer<Mark>
{
#region IEqualityComparer<Mark> Members
public bool Equals(Mark x, Mark y)
{
return (x.X == y.X) && (x.Y == y.Y);
}
public int GetHashCode(Mark obj)
{
return obj.GetHashCode() ^ obj.GetType().GetHashCode();
}
#endregion
}
(I can pastebin the code if it's too much code...)
You can use the EqualityComparer or just override Equals and GetHashCode.
You must make sure that whatever you consider to be a duplicate is identified as having an equivalent hash code, and returning true when tested for equality.
My guess is that you weren't returning equal hash codes. Could you post the code from your equality comparer?
As a test, you could do:
var set = new HashSet<MyCustomClass>();
var a = new MyCustomClass(1,2);
var b = new MyCustomClass(1,2);
set.Add(a);
set.Add(b);
Assert.IsTrue(a.Equals(b));
Assert.IsTrue(b.Equals(a));
Assert.AreEqual(a.GetHashCode(), b.GetHashCode());
Assert.AreEqual(1, set.Count);
A similar set of tests would be applicable to an equality comparer too.
EDIT
Yep, as suspected it's the hash code function. You need to calculate it based on the values of the type itself. A common enough mistake.
public int GetHashCode(MyCustomClass obj)
{
// Assumes point is accessible from the comparer
return obj.point.GetHashCode();
}
That assumes point is the only state field in your type.
I think you are getting tripped up because two Mark instances with the same values won't be equal in your ChainEqualityComparer class. It doesn't appear that MarkEqualityComparer is ever used.
The line:
if (!y.ChainWithMarks.Contains(mark))
will never find a match, because Contains always returns false unless you override Equals and GetHashCode on the Mark class. (The exception is when Chain x and Chain y hold references to the very same Mark object, which I'm presuming is not what you want.)
If y.ChainWithMarks is a HashSet and you want to use MarkEqualityComparer, make sure you create that HashSet with the correct constructor including an instance of MarkEqualityComparer.
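Something along these lines, as a sketch: Mark and MarkEqualityComparer follow the question, except that GetHashCode is computed from X and Y (my assumption about what the fixed comparer should do) instead of from obj.GetHashCode().

```csharp
using System;
using System.Collections.Generic;

public class Mark
{
    public int X { get; set; }
    public int Y { get; set; }
    public Mark(int x, int y) { X = x; Y = y; }
}

public class MarkEqualityComparer : IEqualityComparer<Mark>
{
    public bool Equals(Mark a, Mark b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (ReferenceEquals(a, null) || ReferenceEquals(b, null)) return false;
        return a.X == b.X && a.Y == b.Y;
    }

    // Hash the same values that Equals compares, not the object identity.
    public int GetHashCode(Mark obj)
    {
        unchecked { return (obj.X * 397) ^ obj.Y; }
    }
}

public static class Demo
{
    public static void Main()
    {
        // Without a comparer, two Marks with the same values are distinct entries.
        var plain = new HashSet<Mark> { new Mark(1, 2), new Mark(1, 2) };
        Console.WriteLine(plain.Count); // 2

        // With the comparer passed to the constructor, the duplicate is rejected.
        var set = new HashSet<Mark>(new MarkEqualityComparer());
        set.Add(new Mark(1, 2));
        set.Add(new Mark(1, 2));
        Console.WriteLine(set.Count);                    // 1
        Console.WriteLine(set.Contains(new Mark(1, 2))); // True
    }
}
```

Because Mark still has public setters, the usual caveat applies: changing X or Y while a Mark is in the set leaves it stored under a stale hash.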
Since Mark is conceptually a value, you might consider using a struct to represent it, since the .NET runtime then uses value equality instead of reference equality by default. I think this is actually the most correct implementation of your idea.
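A minimal sketch of the struct variant, where MarkValue is a hypothetical name chosen to avoid clashing with the Mark class:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical struct version of Mark; equality comes from ValueType by default.
public struct MarkValue
{
    public readonly int X;
    public readonly int Y;

    public MarkValue(int x, int y)
    {
        X = x;
        Y = y;
    }
}

public static class Demo
{
    public static void Main()
    {
        // The inherited Equals compares the fields, so equal values are equal.
        Console.WriteLine(new MarkValue(1, 2).Equals(new MarkValue(1, 2))); // True

        var set = new HashSet<MarkValue>();
        set.Add(new MarkValue(1, 2));
        set.Add(new MarkValue(1, 2));
        Console.WriteLine(set.Count); // 1
    }
}
```

The default struct equality works but goes through boxing, so for hot paths it may still be worth implementing IEquatable<MarkValue> and overriding GetHashCode explicitly.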
