HashSet initialized using IEnumerable contains duplicate elements - c#

I want to be able to use HashSet constructed from IEnumerable collection of custom objects without duplicates. My custom object contains id and some other properties which aren't important for this question. I make a query to a database which returns an IEnumerable that I later use to construct a HashSet with the following code:
HashSet<Question> results = new HashSet<Question>(new QuestionComparer());
var result = await query.ExecuteNextAsync<Question>();
results.UnionWith(result);
The problem is there are duplicate records inside the result collection which I do not want.
The QuestionComparer class looks like this:
public class QuestionComparer : IEqualityComparer<Question>
{
public bool Equals(Question x, Question y)
{
return x != null && y != null && x.Id == y.Id;
}
public int GetHashCode(Question obj)
{
return obj.Id.GetHashCode();
}
}
I also tried overriding both Equals and GetHashCode methods inside the Question class but no success. I considered looping through the collection and removing the duplicates, but it seems like it may become a performance problem...
EDIT: Azure DocumentDB that I am using apparently does not currently support a "distinct" type of query.

Instead of writing a public class QuestionComparer you should override the methods of the existing Question class
public class Question
{
public string ID { get; set; }
public override int GetHashCode()
{
return ID.GetHashCode();
}
public override bool Equals(System.Object obj)
{
return (obj != null && obj is Question) ? (this.ID == ((Question)(obj)).ID) : false;
}
}
So duplicates are not possible. Sample:
HashSet<Question> qh = new HashSet<Question>();
qh.Add(new Question() { ID = "1" });
qh.Add(new Question() { ID = "1" }); //will not be added
qh.Add(new Question() { ID = "2" });
https://dotnetfiddle.net/wrFTaA

Related

.Where with LINQ on custom collection?

I'm trying to return a collection of items, from an existing ConcurrentHashSet but only selecting the ones by a certain ID, I've coded the LINQ for it, the only thing I'm struggling with is it returns a IEnumerable and I have a custom ConcurrentHashSet class and I'm unsure how to convert it?
I've tried casting it using (ConcurrentHashSet) but it hasn't worked, my ConcurrentHashSet does implement IEnumerable so I'm guessing it is possible..
public ConcurrentHashSet<Item> GetItemsById(int id)
{
return Items.Where(x => x.HasLoaded && x.Id == id);
}
.Where returns an IEnumerable<T> and you cannot change that. Explicit casting as you tried does not work because through every ConcurrentHashSet is an IEnumerable (inherits), an IEnumerable is not necessarily a ConcurrentHashSet. The compiler can't say.
Either instantiate a new ConcurrentHashSet (assuming it has a
constructor receiving an IEnumerable, which it should have):
return new ConcurrentHashSet<Item>(Items.Where(x => x.HasLoaded && x.Id == id));
Or change your method to return IEnumerable<Item>
Code below returns a collection since it is enumerable in a foreach :
public class Test
{
ConcurrentHashSet hashSet = new ConcurrentHashSet();
public Test()
{
foreach (ConcurrentHashSet set in hashSet.GetItemsById(123))
{
}
}
}
public class ConcurrentHashSet : HashSet<ConcurrentHashSet>
{
public Boolean HasLoaded { get; set; }
public int Id { get; set; }
public List<ConcurrentHashSet> GetItemsById(int id)
{
return this.Where(x => x.HasLoaded && x.Id == id).ToList();
}
}

Value-equals and circular references: how to resolve infinite recursion?

I have some classes that contain several fields. I need to compare them by value, i.e. two instances of a class are equal if their fields contain the same data. I have overridden the GetHashCode and Equals methods for that.
It can happen that these classes contain circular references.
Example: We want to model institutions (like government, sports clubs, whatever). An institution has a name. A Club is an institution that has a name and a list of members. Each member is a Person that has a name and a favourite institution. If a member of a certain club has this club as his favourite institution, we have a circular reference.
But circular references, in conjunction with value equality, lead to infinite recursion. Here is a code example:
interface IInstitution { string Name { get; } }
class Club : IInstitution
{
public string Name { get; set; }
public HashSet<Person> Members { get; set; }
public override int GetHashCode() { return Name.GetHashCode() + Members.Count; }
public override bool Equals(object obj)
{
Club other = obj as Club;
if (other == null)
return false;
return Name.Equals(other.Name) && Members.SetEquals(other.Members);
}
}
class Person
{
public string Name { get; set; }
public IInstitution FavouriteInstitution { get; set; }
public override int GetHashCode() { return Name.GetHashCode(); }
public override bool Equals(object obj)
{
Person other = obj as Person;
if (other == null)
return false;
return Name.Equals(other.Name)
&& FavouriteInstitution.Equals(other.FavouriteInstitution);
}
}
class Program
{
public static void Main()
{
Club c1 = new Club { Name = "myClub", Members = new HashSet<Person>() };
Person p1 = new Person { Name = "Johnny", FavouriteInstitution = c1 }
c1.Members.Add(p1);
Club c2 = new Club { Name = "myClub", Members = new HashSet<Person>() };
Person p2 = new Person { Name = "Johnny", FavouriteInstitution = c2 }
c2.Members.Add(p2);
bool c1_and_c2_equal = c1.Equals(c2); // StackOverflowException!
// c1.Equals(c2) calls Members.SetEquals(other.Members)
// Members.SetEquals(other.Members) calls p1.Equals(p2)
// p1.Equals(p2) calls c1.Equals(c2)
}
}
c1_and_c2_equal should return true, and in fact we (humans) can see that they are value-equal with a little bit of thinking, without running into infinite recursion. However, I can't really say how we figure that out. But since it is possible, I hope that there is a way to resolve this problem in code as well!
So the question is: How can I check for value equality without running into infinite recursions?
Note that I need to resolve circular references in general, not only the case from above. I'll call it a 2-circle since c1 references p1, and p1 references c1. There can be other n-circles, e.g. if a club A has a member M whose favourite is club B which has member N whose favourite club is A. That would be a 4-circle. Other object models might also allow n-circles with odd numbers n. I am looking for a way to resolve all these problems at once, since I won't know in advance which value n can have.
An easy workaround (used in RDBMS) is to use a unique Id to identify a Person(any type). Then you don't need to compare every other property and you never run into such cuircular references.
Another way is to compare differently in Equals, so provide the deep check only for the type of the Equals and not for the referenced types. You could use a custom comparer:
public class PersonNameComparer : IEqualityComparer<Person>
{
public bool Equals(Person x, Person y)
{
if (x == null && y == null) return true;
if (x == null || y == null) return false;
if(object.ReferenceEquals(x, y)) return true;
return x.Name == y.Name;
}
public int GetHashCode(Person obj)
{
return obj?.Name?.GetHashCode() ?? int.MinValue;
}
}
Now you can change the Equals implementation of Club to avoid that the Members(Persons) will use their deep check which includes the institution but only their Name:
public override bool Equals(object obj)
{
if (Object.ReferenceEquals(this, obj))
return true;
Club other = obj as Club;
if (other == null)
return false;
var personNameComparer = new PersonNameComparer();
return Name.Equals(other.Name)
&& Members.Count == other.Members.Count
&& !Members.Except(other.Members, personNameComparer).Any();
}
You notice that i can't use SetEquals because there is no overload for my custom comparer.
Following the suggestion of Dryadwoods, I changed the Equals methods so that I can keep track of the items that were already compared.
First we need an equality comparer that checks reference equality for corresponding elements of pairs:
public class ValuePairRefEqualityComparer<T> : IEqualityComparer<(T,T)> where T : class
{
public static ValuePairRefEqualityComparer<T> Instance
= new ValuePairRefEqualityComparer<T>();
private ValuePairRefEqualityComparer() { }
public bool Equals((T,T) x, (T,T) y)
{
return ReferenceEquals(x.Item1, y.Item1)
&& ReferenceEquals(x.Item2, y.Item2);
}
public int GetHashCode((T,T) obj)
{
return RuntimeHelpers.GetHashCode(obj.Item1)
+ 2 * RuntimeHelpers.GetHashCode(obj.Item2);
}
}
And here is the modified Equals method of Club:
static HashSet<(Club,Club)> checkedPairs
= new HashSet<(Club,Club)>(ValuePairRefEqualityComparer<Club>.Instance);
public override bool Equals(object obj)
{
Club other = obj as Club;
if (other == null)
return false;
if (!Name.Equals(other.Name))
return;
if (checkedPairs.Contains((this,other)) || checkedPairs.Contains((other,this)))
return true;
checkedPairs.Add((this,other));
bool membersEqual = Members.SetEquals(other.Members);
checkedPairs.Clear();
return membersEqual;
}
The version for Person is analogous. Note that I add (this,other) to checkedPairs and check if either (this,other) or (other,this) is contained because it might happen that after the first call of c1.Equals(c2), we end up with a call of c2.Equals(c1) instead of c1.Equals(c2). I am not sure if this actually happens, but since I can't see the implementation of SetEquals, I believe it is a possibility.
Since I am not happy with using a static field for the already checked pairs (it will not work if the program is concurrent!), I asked another question: make a variable last for a call stack.
For the general case that I am interested in
-- where we have classes C1, ..., Cn where each of these classes can have any number of VALUES (like int, string, ...) as well as any number of REFERENCES to any other classes of C1, ..., Cn (e.g. by having for each type Ci a field ICollection<Ci>) --
the question "Are two objects A and B equal?", in the sense of equality that I described here,
seems to be EQUIVALENT to
the question "For two finite, directed, connected, colored graphs G and H, does there exist an isomorphism from G to H?".
Here is the equivalence:
graph vertices correspond to objects (class instances)
graph edges correspond to references to objects
color corresponds to the conglomerate of values and the type itself (i.e. colors of two vertices are the same if their corresponding objects have the same type and the same values)
That's an NP-hard question, so I think I'm going to discard my plan to implement this and go with a circular-reference-free approach instead.

Linq/Enumerable Any Vs Contains

I've solved a problem I was having but although I've found out how something works (or doesn't) I'm not clear on why.
As I'm the type of person who likes to know the "why" I'm hoping someone can explain:
I have list of items and associated comments, and I wanted to differentiate between admin comments and user comments, so I tried the following code:
User commentUser = userRepository.GetUserById(comment.userId);
Role commentUserRole = context.Roles.Single(x=>x.Name == "admin");
if(commentUser.Roles.Contains(commentUserRole)
{
//do stuff
}
else
{
// do other stuff
}
Stepping through the code showed that although it had the correct Role object, it didn't recognise the role in the commentUser.Roles
The code that eventually worked is:
if(commentUser.Roles.Any(x=>x.Name == "admin"))
{
//do stuff
}
I'm happy with this because it's less code and in my opinion cleaner, but I don't understand how contains didn't work.
Hoping someone can clear that up for me.
This is probably because you didn't override the equality comparisons (Equals, GetHashCode, operator==) on your Role class. Therefore, it was doing reference comparison, which really isn't the best idea, as if they're not the same object, it makes it think it's a different. You need to override those equality operators to provide value equality.
You have to override Equals (and always also GetHashCode then) if you want to use Contains. Otherwise Equals will just compare references.
So for example:
public class Role
{
public string RoleName{ get; set; }
public int RoleID{ get; set; }
// ...
public override bool Equals(object obj)
{
Role r2 = obj as Role;
if (r2 == null) return false;
return RoleID == r2.RoleID;
}
public override int GetHashCode()
{
return RoleID;
}
public override string ToString()
{
return RoleName;
}
}
Another option is to implement a custom IEqualityComparer<Role> for the overload of Enumerable.Contains:
public class RoleComparer : IEqualityComparer<Role>
{
public bool Equals(Role x, Role y)
{
return x.RoleID.Equals(y.RoleID);
}
public int GetHashCode(Role obj)
{
return obj.RoleID;
}
}
Use it in this way:
var comparer = new RoleComparer();
User commentUser = userRepository.GetUserById(comment.userId);
Role commentUserRole = context.Roles.Single(x=>x.Name == "admin");
if(commentUser.Roles.Contains(commentUserRole, comparer))
{
// ...
}
When using the Contains-method, you check if the the array Roles of the user-object contains the object you have retrieved from the database beforehand. Though the array contains an object for the role "admin" it does not contain the exact object you fetched before.
When using the Any-method you check if there is any role having the name "admin" - and that delivers the expected result.
To get the same result with the Contains-method implement the IEquatable<Role>-interface on the role-class and compare the name to check whether two instances have actually the same value.
It will be your equality comparison for a Role.
The object in commentUserRole is not the same object as the one you are looking for commentUser.Roles.
Your context object will create a new object when you select from it and populate your Roles property with a collection of new Roles. If your context is not tracking the objects in order to return the same object when a second copy is requested then it will be a different object even though all the properties may be the same. Hence the failure of Contains
Your Any clause is explicitly checking the Name property which is why it works
Try making Role implement IEquatable<Role>
public class Role : IEquatable<Role> {
public bool Equals(Role compare) {
return compare != null && this.Name == compare.Name;
}
}
Whilst MSDN shows you only need this for a List<T> you may actually need to override Equals and GetHashCode to make this work
in which case:
public class Role : IEquatable<Role> {
public bool Equals(Role compare) {
return compare != null && this.Name == compare.Name;
}
public override bool Equals(object compare) {
return this.Equals(compare as Role); // this will call the above equals method
}
public override int GetHashCode() {
return this.Name == null ? 0 : this.Name.GetHashCode();
}
}

Is there a way of comparing all the values within 2 entities?

I'm using EF4.3 so I'm referring to entities, however it could apply to any class containing properties.
I'm trying to figure out if its possible to compare 2 entities. Each entity has properties that are assigned values for clarity let say the entity is 'Customer'.
public partial class Customer
{
public string Name { get; set; }
public DateTime DateOfBirth { get; set; }
...
...
}
The customer visits my website and types in some details 'TypedCustomer'. I check this against the database and if some of the data matches, I return a record from the database 'StoredCustomer'.
So at this point I've identified that its the same customer returning but I wan't to valid the rest of the data. I could check each property one by one, but there are a fair few to check. Is it possible to make this comparison at a higher level which takes into account the current values of each?
if(TypedCustomer == StoredCustomer)
{
.... do something
}
If you're storing these things in the database, it is logical to assume you'd also have a primary key called something like Id.
public partial class Customer
{
public int Id { get; set; }
public string Name { get; set; }
public DateTime DateOfBirth { get; set; }
...
...
}
Then all you do is:
if(TypedCustomer.Id == StoredCustomer.Id)
{
}
UPDATE:
In my project, I have a comparer for these circumstances:
public sealed class POCOComparer<TPOCO> : IEqualityComparer<TPOCO> where TPOCO : class
{
public bool Equals(TPOCO poco1, TPOCO poco2)
{
if (poco1 != null && poco2 != null)
{
bool areSame = true;
foreach(var property in typeof(TPOCO).GetPublicProperties())
{
object v1 = property.GetValue(poco1, null);
object v2 = property.GetValue(poco2, null);
if (!object.Equals(v1, v2))
{
areSame = false;
break;
}
});
return areSame;
}
return poco1 == poco2;
} // eo Equals
public int GetHashCode(TPOCO poco)
{
int hash = 0;
foreach(var property in typeof(TPOCO).GetPublicProperties())
{
object val = property.GetValue(poco, null);
hash += (val == null ? 0 : val.GetHashCode());
});
return hash;
} // eo GetHashCode
} // eo class POCOComparer
Uses an extension method:
public static partial class TypeExtensionMethods
{
public static PropertyInfo[] GetPublicProperties(this Type self)
{
self.ThrowIfDefault("self");
return self.GetProperties(BindingFlags.Public | BindingFlags.Instance).Where((property) => property.GetIndexParameters().Length == 0 && property.CanRead && property.CanWrite).ToArray();
} // eo GetPublicProperties
} // eo class TypeExtensionMethods
Most simple seems to use reflexion : get the properties and/or fields you want to compare, and loop through them to compare your two objects.
This will be done with getType(Customer).getProperties and getType(Customer).getFields, then using getValue on each field/property and comparing.
You might want to add custom informations to your fields/properties to define the ones that needs
comparing. This could be done by defining a AttributeUsageAttribute, that would inherit from FlagsAttribute for instance. You'll then have to retrieve and handle those attributes in your isEqualTo method.
I don't think there's much of a purpose to checking the entire object in this scenario - they'd have to type every property in perfectly exactly as they did before, and a simple "do they match" doesn't really tell you a lot. But assuming that's what you want, I can see a few ways of doing this:
1) Just bite the bullet and compare each field. You can do this by overriding the bool Equals method, or IEquatable<T>.Equals, or just with a custom method.
2) Reflection, looping through the properties - simple if your properties are simple data fields, but more complex if you've got complex types to worry about.
foreach (var prop in typeof(Customer).GetProperties()) {
// needs better property and value validation
bool propertyMatches = prop.GetValue(cust1, null)
.Equals(prop.GetValue(cust2, null));
}
3) Serialization - serialize both objects to XML or JSON, and compare the strings.
// JSON.NET
string s1 = JsonConvert.SerializeObject(cust1);
string s2 = JsonConvert.SerializeObject(cust2);
bool match = s1 == s2;

How does HashSet compare elements for equality?

I have a class that is IComparable:
public class a : IComparable
{
public int Id { get; set; }
public string Name { get; set; }
public a(int id)
{
this.Id = id;
}
public int CompareTo(object obj)
{
return this.Id.CompareTo(((a)obj).Id);
}
}
When I add a list of object of this class to a hash set:
a a1 = new a(1);
a a2 = new a(2);
HashSet<a> ha = new HashSet<a>();
ha.add(a1);
ha.add(a2);
ha.add(a1);
Everything is fine and ha.count is 2, but:
a a1 = new a(1);
a a2 = new a(2);
HashSet<a> ha = new HashSet<a>();
ha.add(a1);
ha.add(a2);
ha.add(new a(1));
Now ha.count is 3.
Why doesn't HashSet respect a's CompareTo method.
Is HashSet the best way to have a list of unique objects?
It uses an IEqualityComparer<T> (EqualityComparer<T>.Default unless you specify a different one on construction).
When you add an element to the set, it will find the hash code using IEqualityComparer<T>.GetHashCode, and store both the hash code and the element (after checking whether the element is already in the set, of course).
To look an element up, it will first use the IEqualityComparer<T>.GetHashCode to find the hash code, then for all elements with the same hash code, it will use IEqualityComparer<T>.Equals to compare for actual equality.
That means you have two options:
Pass a custom IEqualityComparer<T> into the constructor. This is the best option if you can't modify the T itself, or if you want a non-default equality relation (e.g. "all users with a negative user ID are considered equal"). This is almost never implemented on the type itself (i.e. Foo doesn't implement IEqualityComparer<Foo>) but in a separate type which is only used for comparisons.
Implement equality in the type itself, by overriding GetHashCode and Equals(object). Ideally, implement IEquatable<T> in the type as well, particularly if it's a value type. These methods will be called by the default equality comparer.
Note how none of this is in terms of an ordered comparison - which makes sense, as there are certainly situations where you can easily specify equality but not a total ordering. This is all the same as Dictionary<TKey, TValue>, basically.
If you want a set which uses ordering instead of just equality comparisons, you should use SortedSet<T> from .NET 4 - which allows you to specify an IComparer<T> instead of an IEqualityComparer<T>. This will use IComparer<T>.Compare - which will delegate to IComparable<T>.CompareTo or IComparable.CompareTo if you're using Comparer<T>.Default.
Here's clarification on a part of the answer that's been left unsaid: The object type of your HashSet<T> doesn't have to implement IEqualityComparer<T> but instead just has to override Object.GetHashCode() and Object.Equals(Object obj).
Instead of this:
public class a : IEqualityComparer<a>
{
public int GetHashCode(a obj) { /* Implementation */ }
public bool Equals(a obj1, a obj2) { /* Implementation */ }
}
You do this:
public class a
{
public override int GetHashCode() { /* Implementation */ }
public override bool Equals(object obj) { /* Implementation */ }
}
It is subtle, but this tripped me up for the better part of a day trying to get HashSet to function the way it is intended. And like others have said, HashSet<a> will end up calling a.GetHashCode() and a.Equals(obj) as necessary when working with the set.
HashSet uses Equals and GetHashCode().
CompareTo is for ordered sets.
If you want unique objects, but you don't care about their iteration order, HashSet<T> is typically the best choice.
constructor HashSet receive object what implement IEqualityComparer for adding new object.
if you whant use method in HashSet you nead overrride Equals, GetHashCode
namespace HashSet
{
public class Employe
{
public Employe() {
}
public string Name { get; set; }
public override string ToString() {
return Name;
}
public override bool Equals(object obj) {
return this.Name.Equals(((Employe)obj).Name);
}
public override int GetHashCode() {
return this.Name.GetHashCode();
}
}
class EmployeComparer : IEqualityComparer<Employe>
{
public bool Equals(Employe x, Employe y)
{
return x.Name.Trim().ToLower().Equals(y.Name.Trim().ToLower());
}
public int GetHashCode(Employe obj)
{
return obj.Name.GetHashCode();
}
}
class Program
{
static void Main(string[] args)
{
HashSet<Employe> hashSet = new HashSet<Employe>(new EmployeComparer());
hashSet.Add(new Employe() { Name = "Nik" });
hashSet.Add(new Employe() { Name = "Rob" });
hashSet.Add(new Employe() { Name = "Joe" });
Display(hashSet);
hashSet.Add(new Employe() { Name = "Rob" });
Display(hashSet);
HashSet<Employe> hashSetB = new HashSet<Employe>(new EmployeComparer());
hashSetB.Add(new Employe() { Name = "Max" });
hashSetB.Add(new Employe() { Name = "Solomon" });
hashSetB.Add(new Employe() { Name = "Werter" });
hashSetB.Add(new Employe() { Name = "Rob" });
Display(hashSetB);
var union = hashSet.Union<Employe>(hashSetB).ToList();
Display(union);
var inter = hashSet.Intersect<Employe>(hashSetB).ToList();
Display(inter);
var except = hashSet.Except<Employe>(hashSetB).ToList();
Display(except);
Console.ReadKey();
}
static void Display(HashSet<Employe> hashSet)
{
if (hashSet.Count == 0)
{
Console.Write("Collection is Empty");
return;
}
foreach (var item in hashSet)
{
Console.Write("{0}, ", item);
}
Console.Write("\n");
}
static void Display(List<Employe> list)
{
if (list.Count == 0)
{
Console.WriteLine("Collection is Empty");
return;
}
foreach (var item in list)
{
Console.Write("{0}, ", item);
}
Console.Write("\n");
}
}
}
I came here looking for answers, but found that all the answers had too much info or not enough, so here is my answer...
Since you've created a custom class you need to implement GetHashCode and Equals. In this example I will use a class Student instead of a because it's easier to follow and doesn't violate any naming conventions. Here is what the implementations look like:
public override bool Equals(object obj)
{
return obj is Student student && Id == student.Id;
}
public override int GetHashCode()
{
return HashCode.Combine(Id);
}
I stumbled across this article from Microsoft that gives an incredibly easy way to implement these if you're using Visual Studio. In case it's helpful to anyone else, here are complete steps for using a custom data type in a HashSet using Visual Studio:
Given a class Student with 2 simple properties and an initializer
public class Student
{
public int Id { get; set; }
public string Name { get; set; }
public Student(int id)
{
this.Id = id;
}
}
To Implement IComparable, add : IComparable<Student> like so:
public class Student : IComparable<Student>
You will see a red squiggly appear with an error message saying your class doesn't implement IComparable. Click on suggestions or press Alt+Enter and use the suggestion to implement it.
You will see the method generated. You can then write your own implementation like below:
public int CompareTo(Student student)
{
return this.Id.CompareTo(student.Id);
}
In the above implementation only the Id property is compared, name is ignored. Next right-click in your code and select Quick actions and refactorings, then Generate Equals and GetHashCode
A window will pop up where you can select which properties to use for hashing and even implement IEquitable if you'd like:
Here is the generated code:
public class Student : IComparable<Student>, IEquatable<Student> {
...
public override bool Equals(object obj)
{
return Equals(obj as Student);
}
public bool Equals(Student other)
{
return other != null && Id == other.Id;
}
public override int GetHashCode()
{
return HashCode.Combine(Id);
}
}
Now if you try to add a duplicate item like shown below it will be skipped:
static void Main(string[] args)
{
Student s1 = new Student(1);
Student s2 = new Student(2);
HashSet<Student> hs = new HashSet<Student>();
hs.Add(s1);
hs.Add(s2);
hs.Add(new Student(1)); //will be skipped
hs.Add(new Student(3));
}
You can now use .Contains like so:
for (int i = 0; i <= 4; i++)
{
if (hs.Contains(new Student(i)))
{
Console.WriteLine($#"Set contains student with Id {i}");
}
else
{
Console.WriteLine($#"Set does NOT contain a student with Id {i}");
}
}
Output:

Categories