I have two employee lists that I want to get only unique records from but this has a twist to it. Each list has an Employee class in it:
public class Employee
{
// I want to completely ignore ID in the comparison
public int ID{ get; set; }
// I want to use FirstName and LastName in comparison
public string FirstName{ get; set; }
public string LastName{ get; set; }
}
The only properties I want to compare on for a match are FirstName and LastName. I want to completely ignore ID in the comparison. The allFulltimeEmployees list has 3 employees in it and the allParttimeEmployees list has 3 employees in it. The first name and last name match on two items in the lists - Sally Jones and Fred Jackson. There is one item in the list that does not match because FirstName is the same, but LastName differs:
var emp = new Employee(); // ID left unset; it is not used in the comparison
emp.FirstName = "Joe"; // same
emp.LastName = "Smith"; // different
allFulltimeEmployees.Add(emp);
emp = new Employee();
emp.ID = 3; // not used in comparison
emp.FirstName = "Joe"; // a match
emp.LastName = "Williams"; // not a match - different last name
allParttimeEmployees.Add(emp);
So I want to ignore the ID property in the class during the comparison of the two lists. I want to flag Joe Williams as a non-match since the last names of Smith and Williams in the two lists do not match.
// finalResult should only have Joe Williams in it
var finalResult = allFulltimeEmployees.Except(allParttimeEmployees);
I've tried using an IEqualityComparer but it doesn't work since it is using a single Employee class in the parameters rather than an IEnumerable list:
public class EmployeeEqualityComparer : IEqualityComparer<Employee>
{
public bool Equals(Employee x, Employee y)
{
if (x.FirstName == y.FirstName && x.LastName == y.LastName)
{
return true;
}
else
{
return false;
}
}
public int GetHashCode(Employee obj)
{
return obj.GetHashCode();
}
}
How can I successfully do what I want and perform this operation? Thanks for any help!
Your idea of using an IEqualityComparer is fine; it's your execution that is wrong, notably your GetHashCode method.
public int GetHashCode(Employee obj)
{
return obj.GetHashCode();
}
IEqualityComparer<T> defines both Equals and GetHashCode because both are important. Do not ignore GetHashCode when you implement this interface! It plays a pivotal role in equality comparisons. Matching hash codes do not prove that two items are equal, but differing hash codes prove that they are not. Two equal elements must return the same hash code; if the hash codes differ, hash-based methods such as Except treat the elements as unequal without ever calling Equals. Only when the hash codes match do they go on to call Equals.
With your implementation delegating to the GetHashCode method of the actual Employee object, you are relying upon the implementation that the Employee class uses. Employee does not override GetHashCode, so you get the default from Object, which is unrelated to FirstName and LastName, and two employees with the same name will almost always produce different hash codes. Only if that implementation is overridden will it be useful to you, and only if it uses your key fields. And if it does, then it is very likely that you did not need to define your own external comparer in the first place!
Build a GetHashCode method that factors in your key fields and you will be set.
public int GetHashCode(Employee obj)
{
// null handling omitted for brevity, but you will want to
// handle null values appropriately
return obj.FirstName.GetHashCode() * 117
+ obj.LastName.GetHashCode();
}
Once you have this method in place, then use the comparer in your call to Except.
var comparer = new EmployeeEqualityComparer();
var results = allFulltimeEmployees.Except(allParttimeEmployees, comparer);
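Putting the pieces together, here is a minimal, self-contained sketch of the whole comparer with the null handling filled in. The names EmployeeNameComparer and ExceptDemo, the ID values, and the multiplier constants are illustrative assumptions, not part of the original question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Employee
{
    public int ID { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// Hypothetical comparer: compares FirstName and LastName only; ID is ignored.
public class EmployeeNameComparer : IEqualityComparer<Employee>
{
    public bool Equals(Employee x, Employee y)
    {
        if (ReferenceEquals(x, y)) return true;    // same instance, or both null
        if (x is null || y is null) return false;  // exactly one is null
        return x.FirstName == y.FirstName && x.LastName == y.LastName;
    }

    public int GetHashCode(Employee obj)
    {
        if (obj is null) return 0;
        unchecked // let the arithmetic wrap around instead of throwing on overflow
        {
            int hash = 17;
            hash = hash * 23 + (obj.FirstName?.GetHashCode() ?? 0);
            hash = hash * 23 + (obj.LastName?.GetHashCode() ?? 0);
            return hash;
        }
    }
}

public static class ExceptDemo
{
    // Returns "First Last" for every full-time employee with no part-time namesake.
    public static List<string> UniqueFullTimeNames()
    {
        var fullTime = new List<Employee>
        {
            new Employee { ID = 1, FirstName = "Sally", LastName = "Jones" },
            new Employee { ID = 2, FirstName = "Fred", LastName = "Jackson" },
            new Employee { ID = 3, FirstName = "Joe", LastName = "Smith" }
        };
        var partTime = new List<Employee>
        {
            new Employee { ID = 7, FirstName = "Sally", LastName = "Jones" },
            new Employee { ID = 8, FirstName = "Fred", LastName = "Jackson" },
            new Employee { ID = 9, FirstName = "Joe", LastName = "Williams" }
        };

        return fullTime
            .Except(partTime, new EmployeeNameComparer())
            .Select(e => e.FirstName + " " + e.LastName)
            .ToList();
    }
}
```

Note that Except keeps elements of the first sequence, so with this data the result is Joe Smith. To get Joe Williams instead, swap the operands and call partTime.Except(fullTime, comparer).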
You can override Equals and GetHashCode in your Employee class.
For example,
public class Employee
{
// I want to completely ignore ID in the comparison
public int ID { get; set; }
// I want to use FirstName and LastName in comparison
public string FirstName { get; set; }
public string LastName { get; set; }
public override bool Equals(object obj)
{
var other = obj as Employee;
// obj may be null or of a different type; treat both cases as "not equal"
return other != null && this.FirstName == other.FirstName && this.LastName == other.LastName;
}
public override int GetHashCode()
{
return this.FirstName.GetHashCode() ^ this.LastName.GetHashCode();
}
}
I tested with the following data set:
var empList1 = new List<Employee>
{
new Employee{ID = 1, FirstName = "D", LastName = "M"},
new Employee{ID = 2, FirstName = "Foo", LastName = "Bar"}
};
var empList2 = new List<Employee>
{
new Employee { ID = 2, FirstName = "D", LastName = "M" },
new Employee { ID = 1, FirstName = "Foo", LastName = "Baz" }
};
var result = empList1.Except(empList2); // Contained "Foo Bar", ID #2.
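Your IEqualityComparer should work once you pass it to Except (and once its GetHashCode is derived from FirstName and LastName rather than from obj.GetHashCode()):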
var finalResult = allFulltimeEmployees.Except(allParttimeEmployees, new EmployeeEqualityComparer());
Try implementing the IEquatable<T> interface for your Employee class. You need to provide an Equals(Employee) method, which you can define however you want (i.e. ignoring employee IDs). You should also override GetHashCode so that it uses the same fields, because hash-based LINQ methods such as Except check hash codes before they call Equals.
The IEquatable<T> interface is used by generic collection objects such
as Dictionary<TKey, TValue>, List<T>, and LinkedList<T> when testing
for equality in such methods as Contains, IndexOf, LastIndexOf, and
Remove. It should be implemented for any object that might be stored
in a generic collection.
Example implementation of the Equals() method:
public bool Equals(Employee other)
{
return (other != null) && (FirstName == other.FirstName) && (LastName == other.LastName);
}
It's not the most elegant solution, but you could make a function like so:
public string GetKey(Employee emp)
{
return string.Format("{0}#{1}", emp.FirstName, emp.LastName);
}
and then populate everything in allFulltimeEmployees into a Dictionary<string, Employee> where the key of the dictionary is the result of calling GetKey on each employee object. Then you could loop over allParttimeEmployees and call GetKey on each of those, probing into the dictionary (e.g. using TryGetValue or ContainsKey), and taking whatever action was necessary on a duplicate, such as removing the duplicate from the dictionary.
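The dictionary approach described above could be sketched roughly as follows. KeyedLookup and WithoutNameMatches are made-up names, and the sketch assumes that names never contain the '#' separator and are never null.

```csharp
using System;
using System.Collections.Generic;

public class Employee
{
    public int ID { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class KeyedLookup
{
    // Builds the composite key; assumes '#' never occurs inside a name.
    public static string GetKey(Employee emp)
    {
        return string.Format("{0}#{1}", emp.FirstName, emp.LastName);
    }

    // Returns the full-time employees whose name key has no part-time counterpart.
    public static List<Employee> WithoutNameMatches(
        IEnumerable<Employee> fullTime, IEnumerable<Employee> partTime)
    {
        var byKey = new Dictionary<string, Employee>();
        foreach (var emp in fullTime)
        {
            // The indexer overwrites on a repeated key, so a later duplicate wins.
            byKey[GetKey(emp)] = emp;
        }

        foreach (var emp in partTime)
        {
            // Remove simply returns false when the key is absent.
            byKey.Remove(GetKey(emp));
        }

        return new List<Employee>(byKey.Values);
    }
}
```

This does the same job as Except with a name-based comparer; the comparer version is usually preferable because it avoids building and parsing string keys.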
Related
This is my Clients class:
public class Clients
{
public string Email { get; set; }
public string Name { get; set; }
public Clients(string e, string n)
{
Email = e;
Name = n;
}
}
I want to make a new list which contains the clients that appear in both List A and List B.
For example:
List A - John, Jonathan, James ....
List B - Martha, Jane, Jonathan ....
Unsubscribers - Jonathan
public static List<Clients> SameClients(List<Clients> A, List<Clients> B)
{
List<Clients> Unsubscribers = new List<Clients>();
Unsubscribers = A.Intersect(B).ToList();
return Unsubscribers;
}
However, for some reason I get an empty list and I have no idea what's wrong.
The problem is that Equals and GetHashCode are used to compare objects. You can override these two methods and provide your own implementation based on your needs. There is already an answer below covering how to override these two methods.
However, I normally prefer to keep my entities/models (or whatever you want to call them) very simple and keep comparison implementation details away from my models. In that case, you can implement an IEqualityComparer<T> and use the overload of Intersect that takes an IEqualityComparer<T>.
Here's an example implementation of IEqualityComparer<Clients> based on only the Name property...
public class ClientNameEqualityComparer : IEqualityComparer<Clients>
{
public bool Equals(Clients c1, Clients c2)
{
if (c2 == null && c1 == null)
return true;
else if (c1 == null || c2 == null)
return false;
else if(c1.Name == c2.Name)
return true;
else
return false;
}
public int GetHashCode(Clients c)
{
// A null Name hashes to 0 so that GetHashCode never throws
return c.Name == null ? 0 : c.Name.GetHashCode();
}
}
Basically, the implementation above only cares about the Name property: if two instances of Clients have the same value for the Name property, then they are considered equal.
Now you can do the following...
A.Intersect(B, new ClientNameEqualityComparer()).ToList();
And that will produce the results you are expecting...
Intersect uses GetHashCode and Equals by default, but you haven't overridden them, so Object.Equals is used, which just compares references. Since all your client instances are created with new, they are separate instances even if they hold equal values. That's why Intersect "thinks" that there are no common clients.
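A small sketch may make the reference-equality behaviour described above easier to see. PlainClient and ReferenceEqualityDemo are hypothetical names; the class deliberately overrides nothing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// No Equals/GetHashCode overrides, so the defaults from Object apply.
public class PlainClient
{
    public string Name { get; set; }
}

public static class ReferenceEqualityDemo
{
    // Counts the elements common to both lists using the default comparer.
    public static int CommonCount(List<PlainClient> a, List<PlainClient> b)
    {
        return a.Intersect(b).Count();
    }

    public static (int separateInstances, int sharedInstance) Run()
    {
        // Two distinct objects that merely hold the same value.
        int separate = CommonCount(
            new List<PlainClient> { new PlainClient { Name = "Jonathan" } },
            new List<PlainClient> { new PlainClient { Name = "Jonathan" } });

        // The very same object placed in both lists.
        var jonathan = new PlainClient { Name = "Jonathan" };
        int shared = CommonCount(
            new List<PlainClient> { jonathan },
            new List<PlainClient> { jonathan });

        return (separate, shared);
    }
}
```

Run() should report no match for the separately created objects and one match for the shared instance, which is exactly why the question's Intersect call came back empty.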
So you have several options.
Implement a custom IEqualityComparer<Clients> and pass that to Intersect (or many other LINQ methods). This has the advantage that you can implement different comparers for different requirements and you don't need to modify the original class.
Let Clients override Equals and GetHashCode, and/or
let Clients implement IEquatable<Clients>.
For example (showing the last two because another answer already showed IEqualityComparer<T>):
public class Clients : IEquatable<Clients>
{
public string Email { get; set; }
public string Name { get; set; }
public Clients(string e, string n)
{
Email = e;
Name = n;
}
public override bool Equals(object obj)
{
return obj is Clients && this.Equals((Clients)obj);
}
public bool Equals(Clients other)
{
// A null argument is never equal to an existing instance
return other != null
&& Email == other.Email
&& Name == other.Name;
}
public override int GetHashCode()
{
unchecked
{
int hash = 17;
hash = hash * 23 + (Email?.GetHashCode() ?? 0);
hash = hash * 23 + (Name?.GetHashCode() ?? 0);
return hash;
}
}
}
Worth reading:
Differences between IEquatable<T>, IEqualityComparer<T>, and overriding .Equals() when using LINQ on a custom object collection?
I have two collections and I want to loop through each element and compare the corresponding elements in each collection for equality, thus determining if the collections are identical.
Is this possible with a foreach loop or must I use a counter and access the elements by index?
Generally speaking is there a preferred method for comparing collections for equality, like overloading an operator?
TIA.
You can use the SequenceEqual method, which exists for this purpose.
The relevant documentation and examples are reproduced below.
Determines whether two sequences are equal by comparing the elements
by using the default equality comparer for their type.
The SequenceEqual<TSource>(IEnumerable<TSource>, IEnumerable<TSource>)
method enumerates the two source sequences in parallel and compares
corresponding elements by using the default equality comparer for
TSource, EqualityComparer<TSource>.Default. The default equality comparer
uses the type's IEquatable<T> implementation when there is one and
otherwise falls back to Object.Equals and Object.GetHashCode. To compare a
custom data type by value, implement IEquatable<T> and provide your own
GetHashCode and Equals methods for the type.
class Pet
{
public string Name { get; set; }
public int Age { get; set; }
}
public static void SequenceEqualEx1()
{
Pet pet1 = new Pet { Name = "Turbo", Age = 2 };
Pet pet2 = new Pet { Name = "Peanut", Age = 8 };
// Create two lists of pets.
List<Pet> pets1 = new List<Pet> { pet1, pet2 };
List<Pet> pets2 = new List<Pet> { pet1, pet2 };
bool equal = pets1.SequenceEqual(pets2);
Console.WriteLine(
"The lists {0} equal.",
equal ? "are" : "are not");
}
/*
This code produces the following output:
The lists are equal.
*/
If you want to compare the actual data of the objects in the sequences
instead of just comparing their references, you have to implement the
IEquatable<T> generic interface in your class. The following
code example shows how to implement this interface in a custom data
type and provide GetHashCode and Equals methods.
public class Product : IEquatable<Product>
{
public string Name { get; set; }
public int Code { get; set; }
public bool Equals(Product other)
{
//Check whether the compared object is null.
if (Object.ReferenceEquals(other, null)) return false;
//Check whether the compared object references the same data.
if (Object.ReferenceEquals(this, other)) return true;
//Check whether the products' properties are equal.
return Code.Equals(other.Code) && Name.Equals(other.Name);
}
// If Equals() returns true for a pair of objects
// then GetHashCode() must return the same value for these objects.
public override int GetHashCode()
{
//Get hash code for the Name field if it is not null.
int hashProductName = Name == null ? 0 : Name.GetHashCode();
//Get hash code for the Code field.
int hashProductCode = Code.GetHashCode();
//Calculate the hash code for the product.
return hashProductName ^ hashProductCode;
}
}
Usage:
Product[] storeA = { new Product { Name = "apple", Code = 9 },
new Product { Name = "orange", Code = 4 } };
Product[] storeB = { new Product { Name = "apple", Code = 9 },
new Product { Name = "orange", Code = 4 } };
bool equalAB = storeA.SequenceEqual(storeB);
Console.WriteLine("Equal? " + equalAB);
/*
This code produces the following output:
Equal? True
*/
I've been researching IEqualityComparer and IEquatable.
From posts such as What is the difference between IEqualityComparer<T> and IEquatable<T>? the difference between the two is now clear. "IEqualityComparer is an interface for an object that performs the comparison on two objects of the type T."
Following the example at https://msdn.microsoft.com/en-us/library/ms132151(v=vs.110).aspx the purpose of IEqualityComparer is clear and simple.
I've followed the example at https://dotnetcodr.com/2015/05/05/implementing-the-iequatable-of-t-interface-for-object-equality-with-c-net/ to work out how to use it and I get the following code:
class clsIEquitable
{
public static void mainLaunch()
{
Person personOne = new Person() { Age = 6, Name = "Eva", Id = 1 };
Person personTwo = new Person() { Age = 7, Name = "Eva", Id = 1 };
//If Person didn't implement IEquatable<Person> or override Equals, Equals would compare references.
//That comparison would be false, because the two objects are stored in different locations.
//With IEquatable<Person> implemented, the objects are compared by Id instead.
bool p = personOne.Equals(personTwo);
bool o = personOne.Id == personTwo.Id;
//Here a Person is compared with a variable of type object, which would return false without further work.
//To make this work we added an override of Object.Equals.
object personThree = new Person() { Age = 7, Name = "Eva", Id = 1 };
bool p2 = personOne.Equals(personThree);
Console.WriteLine("Equatable Check: {0}", p);
}
}
public class Person : IEquatable<Person>
{
public int Id { get; set; }
public string Name { get; set; }
public int Age { get; set; }
public bool Equals(Person other)
{
if (other == null) return false;
return Id == other.Id;
}
//These are to support creating an object and comparing it to person rather than comparing person to person
public override bool Equals(object obj)
{
if (obj is Person)
{
Person p = (Person)obj;
return Equals(p);
}
return false;
}
public override int GetHashCode()
{
return Id;
}
}
My question is WHY would I use it? It seems like a lot of extra code compared to the simple version below (bool o):
//By using IEquatable on class it compares the objects directly
bool p = personOne.Equals(personTwo);
bool o = personOne.Id == personTwo.Id;
IEquatable<T> is used by generic collections to determine equality.
From this MSDN article: https://msdn.microsoft.com/en-us/library/ms131187.aspx
The IEquatable<T> interface is used by generic collection objects such as Dictionary<TKey, TValue>, List<T>, and LinkedList<T> when testing for equality in such methods as Contains, IndexOf, LastIndexOf, and Remove. It should be implemented for any object that might be stored in a generic collection.
This provides an added benefit when using structs, since calling the IEquatable<T> equals method does not box the struct like calling the base object equals method would.
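To make the "why" concrete, the sketch below shows collections calling Equals on your behalf, which a hand-written personOne.Id == personTwo.Id cannot do. Person mirrors the class from the question; CollectionDemo and its method names are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

public class Person : IEquatable<Person>
{
    public int Id { get; set; }
    public string Name { get; set; }

    public bool Equals(Person other) => other != null && Id == other.Id;
    public override bool Equals(object obj) => Equals(obj as Person);
    public override int GetHashCode() => Id;
}

public static class CollectionDemo
{
    // List<T>.Contains uses EqualityComparer<Person>.Default, which calls
    // Equals(Person), so a different instance with the same Id is found.
    public static bool ListFindsEquivalent()
    {
        var people = new List<Person> { new Person { Id = 1, Name = "Eva" } };
        return people.Contains(new Person { Id = 1, Name = "someone else" });
    }

    // IndexOf relies on the same comparer.
    public static int IndexOfEquivalent()
    {
        var people = new List<Person>
        {
            new Person { Id = 5, Name = "Ann" },
            new Person { Id = 1, Name = "Eva" }
        };
        return people.IndexOf(new Person { Id = 1 });
    }

    // Dictionary keys are located via GetHashCode first, then Equals.
    public static bool DictionaryFindsEquivalentKey()
    {
        var roles = new Dictionary<Person, string>();
        roles[new Person { Id = 1, Name = "Eva" }] = "admin";
        return roles.TryGetValue(new Person { Id = 1 }, out var role) && role == "admin";
    }
}
```

In each case the lookup uses a freshly created Person, and the collection still finds the stored one because equality is defined by Id.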
I have a class that is IComparable:
public class a : IComparable
{
public int Id { get; set; }
public string Name { get; set; }
public a(int id)
{
this.Id = id;
}
public int CompareTo(object obj)
{
return this.Id.CompareTo(((a)obj).Id);
}
}
When I add objects of this class to a hash set:
a a1 = new a(1);
a a2 = new a(2);
HashSet<a> ha = new HashSet<a>();
ha.Add(a1);
ha.Add(a2);
ha.Add(a1);
Everything is fine and ha.Count is 2, but:
a a1 = new a(1);
a a2 = new a(2);
HashSet<a> ha = new HashSet<a>();
ha.Add(a1);
ha.Add(a2);
ha.Add(new a(1));
Now ha.Count is 3.
Why doesn't HashSet respect a's CompareTo method?
Is HashSet the best way to have a list of unique objects?
It uses an IEqualityComparer<T> (EqualityComparer<T>.Default unless you specify a different one on construction).
When you add an element to the set, it will find the hash code using IEqualityComparer<T>.GetHashCode, and store both the hash code and the element (after checking whether the element is already in the set, of course).
To look an element up, it will first use the IEqualityComparer<T>.GetHashCode to find the hash code, then for all elements with the same hash code, it will use IEqualityComparer<T>.Equals to compare for actual equality.
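One way to observe this hash-first, Equals-second behaviour is to pass the set a comparer that counts its own calls. This is only an illustrative sketch: CountingComparer and HashSetProbe are made-up names, and the exact call counts are an implementation detail of HashSet<T> rather than a documented guarantee.

```csharp
using System;
using System.Collections.Generic;

// Wraps plain integer equality and counts how often each method is called.
public class CountingComparer : IEqualityComparer<int>
{
    public int HashCalls { get; private set; }
    public int EqualsCalls { get; private set; }

    public bool Equals(int x, int y)
    {
        EqualsCalls++;
        return x == y;
    }

    public int GetHashCode(int value)
    {
        HashCalls++;
        return value; // distinct values get distinct hash codes here
    }
}

public static class HashSetProbe
{
    // Adds 1, 2 and then 1 again; returns the comparer so the counts can be read.
    public static CountingComparer AddThreeValues(out int finalCount)
    {
        var comparer = new CountingComparer();
        var set = new HashSet<int>(comparer);
        set.Add(1);
        set.Add(2);
        set.Add(1); // same hash code as the first element, so Equals must decide
        finalCount = set.Count;
        return comparer;
    }
}
```

After the three Add calls the set holds two elements. GetHashCode has been consulted for every Add, while Equals is only needed once a matching hash code is found.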
That means you have two options:
Pass a custom IEqualityComparer<T> into the constructor. This is the best option if you can't modify the T itself, or if you want a non-default equality relation (e.g. "all users with a negative user ID are considered equal"). This is almost never implemented on the type itself (i.e. Foo doesn't implement IEqualityComparer<Foo>) but in a separate type which is only used for comparisons.
Implement equality in the type itself, by overriding GetHashCode and Equals(object). Ideally, implement IEquatable<T> in the type as well, particularly if it's a value type. These methods will be called by the default equality comparer.
Note how none of this is in terms of an ordered comparison - which makes sense, as there are certainly situations where you can easily specify equality but not a total ordering. This is all the same as Dictionary<TKey, TValue>, basically.
If you want a set which uses ordering instead of just equality comparisons, you should use SortedSet<T> from .NET 4 - which allows you to specify an IComparer<T> instead of an IEqualityComparer<T>. This will use IComparer<T>.Compare - which will delegate to IComparable<T>.CompareTo or IComparable.CompareTo if you're using Comparer<T>.Default.
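As a rough sketch of the SortedSet<T> alternative, the class below implements IComparable<Item> and nothing else. Item and SortedSetDemo are hypothetical names standing in for the question's class a.

```csharp
using System;
using System.Collections.Generic;

public class Item : IComparable<Item>
{
    public int Id { get; }

    public Item(int id)
    {
        Id = id;
    }

    // Orders items by Id; by convention any instance sorts after null.
    public int CompareTo(Item other)
    {
        return other is null ? 1 : Id.CompareTo(other.Id);
    }
}

public static class SortedSetDemo
{
    public static int CountAfterAdds()
    {
        // No comparer supplied, so Comparer<Item>.Default is used,
        // which delegates to IComparable<Item>.CompareTo.
        var set = new SortedSet<Item>();
        set.Add(new Item(1));
        set.Add(new Item(2));
        set.Add(new Item(1)); // compares as 0 with the first item, so it is rejected
        return set.Count;
    }
}
```

Here the third Add is rejected even though it is a new instance, because the set treats a CompareTo result of 0 as "already present".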
Here's clarification on a part of the answer that's been left unsaid: The object type of your HashSet<T> doesn't have to implement IEqualityComparer<T> but instead just has to override Object.GetHashCode() and Object.Equals(Object obj).
Instead of this:
public class a : IEqualityComparer<a>
{
public int GetHashCode(a obj) { /* Implementation */ }
public bool Equals(a obj1, a obj2) { /* Implementation */ }
}
You do this:
public class a
{
public override int GetHashCode() { /* Implementation */ }
public override bool Equals(object obj) { /* Implementation */ }
}
It is subtle, but this tripped me up for the better part of a day trying to get HashSet to function the way it is intended. And like others have said, HashSet<a> will end up calling a.GetHashCode() and a.Equals(obj) as necessary when working with the set.
HashSet uses Equals and GetHashCode().
CompareTo is for ordered sets.
If you want unique objects, but you don't care about their iteration order, HashSet<T> is typically the best choice.
The HashSet<T> constructor accepts an object that implements IEqualityComparer<T>, which the set then uses when adding new elements.
If you want the set to rely on the type's own methods instead, you need to override Equals and GetHashCode.
namespace HashSet
{
public class Employe
{
public Employe() {
}
public string Name { get; set; }
public override string ToString() {
return Name;
}
public override bool Equals(object obj) {
var other = obj as Employe;
return other != null && this.Name.Equals(other.Name);
}
public override int GetHashCode() {
return this.Name.GetHashCode();
}
}
class EmployeComparer : IEqualityComparer<Employe>
{
public bool Equals(Employe x, Employe y)
{
return x.Name.Trim().ToLower().Equals(y.Name.Trim().ToLower());
}
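public int GetHashCode(Employe obj)
{
// Must normalize exactly as Equals does, or equal names could get different hash codes
return obj.Name.Trim().ToLower().GetHashCode();
}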
}
class Program
{
static void Main(string[] args)
{
HashSet<Employe> hashSet = new HashSet<Employe>(new EmployeComparer());
hashSet.Add(new Employe() { Name = "Nik" });
hashSet.Add(new Employe() { Name = "Rob" });
hashSet.Add(new Employe() { Name = "Joe" });
Display(hashSet);
hashSet.Add(new Employe() { Name = "Rob" });
Display(hashSet);
HashSet<Employe> hashSetB = new HashSet<Employe>(new EmployeComparer());
hashSetB.Add(new Employe() { Name = "Max" });
hashSetB.Add(new Employe() { Name = "Solomon" });
hashSetB.Add(new Employe() { Name = "Werter" });
hashSetB.Add(new Employe() { Name = "Rob" });
Display(hashSetB);
var union = hashSet.Union<Employe>(hashSetB).ToList();
Display(union);
var inter = hashSet.Intersect<Employe>(hashSetB).ToList();
Display(inter);
var except = hashSet.Except<Employe>(hashSetB).ToList();
Display(except);
Console.ReadKey();
}
static void Display(HashSet<Employe> hashSet)
{
if (hashSet.Count == 0)
{
Console.Write("Collection is Empty");
return;
}
foreach (var item in hashSet)
{
Console.Write("{0}, ", item);
}
Console.Write("\n");
}
static void Display(List<Employe> list)
{
if (list.Count == 0)
{
Console.WriteLine("Collection is Empty");
return;
}
foreach (var item in list)
{
Console.Write("{0}, ", item);
}
Console.Write("\n");
}
}
}
I came here looking for answers, but found that all the answers had too much info or not enough, so here is my answer...
Since you've created a custom class you need to implement GetHashCode and Equals. In this example I will use a class Student instead of a because it's easier to follow and doesn't violate any naming conventions. Here is what the implementations look like:
public override bool Equals(object obj)
{
return obj is Student student && Id == student.Id;
}
public override int GetHashCode()
{
return HashCode.Combine(Id);
}
I stumbled across this article from Microsoft that gives an incredibly easy way to implement these if you're using Visual Studio. In case it's helpful to anyone else, here are complete steps for using a custom data type in a HashSet using Visual Studio:
Given a class Student with two simple properties and a constructor:
public class Student
{
public int Id { get; set; }
public string Name { get; set; }
public Student(int id)
{
this.Id = id;
}
}
To implement IComparable<Student>, add : IComparable<Student> like so:
public class Student : IComparable<Student>
You will see a red squiggly appear with an error message saying your class doesn't implement IComparable. Click on suggestions or press Alt+Enter and use the suggestion to implement it.
You will see the method generated. You can then write your own implementation like below:
public int CompareTo(Student student)
{
return this.Id.CompareTo(student.Id);
}
In the above implementation only the Id property is compared; Name is ignored. Next, right-click in your code and select Quick Actions and Refactorings, then Generate Equals and GetHashCode.
A window will pop up where you can select which properties to use for hashing and even implement IEquatable<T> if you'd like:
Here is the generated code:
public class Student : IComparable<Student>, IEquatable<Student> {
...
public override bool Equals(object obj)
{
return Equals(obj as Student);
}
public bool Equals(Student other)
{
return other != null && Id == other.Id;
}
public override int GetHashCode()
{
return HashCode.Combine(Id);
}
}
Now if you try to add a duplicate item like shown below it will be skipped:
static void Main(string[] args)
{
Student s1 = new Student(1);
Student s2 = new Student(2);
HashSet<Student> hs = new HashSet<Student>();
hs.Add(s1);
hs.Add(s2);
hs.Add(new Student(1)); //will be skipped
hs.Add(new Student(3));
}
You can now use .Contains like so:
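for (int i = 0; i <= 4; i++)
{
if (hs.Contains(new Student(i)))
{
Console.WriteLine($"Set contains student with Id {i}");
}
else
{
Console.WriteLine($"Set does NOT contain a student with Id {i}");
}
}
Output:
Set does NOT contain a student with Id 0
Set contains student with Id 1
Set contains student with Id 2
Set contains student with Id 3
Set does NOT contain a student with Id 4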
I am writing some code which maps LDAP property names to friendly names and back. There are just simple classes called DirectoryProperty:
public class DirectoryProperty
{
public string Id { get; set; }
public string Name { get; set; }
public string HelpText {get; set; }
public DirectoryProperty(string id, string name)
{
Id = id;
Name = name;
}
}
I then have code using a HashSet to build up a collection of these objects. I've got a fixed set of properties that I supply but I want to allow others to add their own items. A set seems like a good structure for this because when you query LDAP you don't want to have repeating properties, and this also applies to UI where users select from a list of properties.
public class PropertyMapper
{
readonly HashSet<DirectoryProperty> props = new HashSet<DirectoryProperty>(new DirectoryPropertyComparer());
public PropertyMapper() // Will eventually pass data in here
{
props.Add(new DirectoryProperty("displayName", "Display Name"));
props.Add(new DirectoryProperty("displayName", "Display Name")); //err
props.Add(new DirectoryProperty("xyz", "Profile Path")); //err
props.Add(new DirectoryProperty("samAccountName", "User Account Name"));
props.Add(new DirectoryProperty("mobile", "Mobile Number"));
props.Add(new DirectoryProperty("profilePath", "Profile Path"));
}
public List<string> GetProperties()
{
return props.Select(directoryProperty => directoryProperty.Id).OrderBy(p => p).ToList();
}
public List<string> GetFriendlyNames()
{
return props.Select(directoryProperty => directoryProperty.Name).OrderBy(p => p).ToList();
}
}
As you can see I've got 2 problem data items in the constructor right now. The first of these is an obvious duplicate, and the other is a duplicate based on the Name property of DirectoryProperty.
My initial implementation of IEqualityComparer looks like:
class DirectoryPropertyComparer : IEqualityComparer<DirectoryProperty>
{
public bool Equals(DirectoryProperty x, DirectoryProperty y)
{
if (x.Id.ToLower() == y.Id.ToLower() || x.Name.ToLower() == y.Name.ToLower())
{
return true;
}
return false;
}
public int GetHashCode(DirectoryProperty obj)
{
return (obj.Id.Length ^ obj.Name.Length).GetHashCode();
}
}
Is there anything I can do to ensure that the Id and Name properties of DirectoryProperty are both checked for uniqueness, so that duplicates based on either are caught? I'm probably being too strict here and I could live with my existing code, because it seems to handle duplicate Ids OK, but I'm interested in learning more about this.
It's unclear exactly what you're trying to do:
Your equality method considers two values equal if either their names or their IDs are the same
Your GetHashCode method includes both values, so (accidental collisions aside) you're only matching if both the name and the ID have the same lengths in two objects
Fundamentally, the first approach is flawed. Consider three entries:
A = { Name="N1", ID="ID1" }
B = { Name="N2", ID="ID1" }
C = { Name="N2", ID="ID2" }
You appear to want:
A.Equals(B) - true
B.Equals(C) - true
A.Equals(C) - false
That violates the rules of equality (transitivity).
I strongly suggest you simply have two sets - one comparing values by ID, the other comparing values by Name. Then write a method to only add an entry to both sets if it doesn't occur in either of them.
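The two-set suggestion above might look something like the following sketch. UniquePropertySet, TryAdd and Items are invented names, and the case-insensitive comparison is an assumption carried over from the ToLower() calls in the question.

```csharp
using System;
using System.Collections.Generic;

public class DirectoryProperty
{
    public string Id { get; set; }
    public string Name { get; set; }

    public DirectoryProperty(string id, string name)
    {
        Id = id;
        Name = name;
    }
}

// Hypothetical container that keeps both Id and Name unique.
public class UniquePropertySet
{
    private readonly HashSet<string> ids =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    private readonly HashSet<string> names =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    private readonly List<DirectoryProperty> items = new List<DirectoryProperty>();

    public IReadOnlyList<DirectoryProperty> Items => items;

    // Adds the property only if neither its Id nor its Name is already taken.
    public bool TryAdd(DirectoryProperty prop)
    {
        if (ids.Contains(prop.Id) || names.Contains(prop.Name))
        {
            return false;
        }

        ids.Add(prop.Id);
        names.Add(prop.Name);
        items.Add(prop);
        return true;
    }
}
```

Whichever entry arrives first wins. With the question's insertion order, ("xyz", "Profile Path") would be accepted and the later ("profilePath", "Profile Path") rejected, so the order of the Add calls matters.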
This approach is not going to work. The Equals method needs to define an equivalence relationship, and such a relationship cannot be defined in this manner.
An equivalence relationship must be transitive, but this relation is not transitive.
{"A", "B"} == {"C", "B"}
and
{"A", "B"} == {"A", "D"}
but
{"C", "B"} != {"A", "D"}
A better approach would be to create two dictionaries—one for ID and one for Name—and check both dictionaries for collisions before adding a new value.