Remove duplicates in custom IComparable class - c#

I have a table of paired identifiers (combo pairs), and I use it to go through CSV files looking for matches. I'm collecting the unmatched pairs in a List and sending them to an output box so they can be added later. I would like the output to contain each unique pair only once. The class is declared as follows:
public class Unmatched : IComparable<Unmatched>
{
    public string first_code { get; set; }
    public string second_code { get; set; }

    public int CompareTo(Unmatched other)
    {
        if (this.first_code == other.first_code)
        {
            return this.second_code.CompareTo(other.second_code);
        }
        return other.first_code.CompareTo(this.first_code);
    }
}
One note on the above code: it sorts by first_code in reverse alphabetical order. To sort in alphabetical order, use this as the final return instead:
return this.first_code.CompareTo(other.first_code);
Here is the code that adds it. This is directly after the comparison against the datatable elements
unmatched.Add(new Unmatched()
{
    first_code = fields[clients[global_index].first_match_column],
    second_code = fields[clients[global_index].second_match_column]
});
I would like to remove duplicate pairs from the list, i.e. entries whose first code and second code both match those of an earlier entry. For example:
PTC,138A
PTC,138A
PTC,138A
MA9,5A
MA9,5A
MA9,5A
MA63,138A
MA63,138A
MA59,87BM
MA59,87BM
Should become:
PTC, 138A
MA9, 5A
MA63, 138A
MA59, 87BM
I have tried adding my own Equals and GetHashCode as outlined here:
http://www.morgantechspace.com/2014/01/Use-of-Distinct-with-Custom-Class-objects-in-C-Sharp.html
The SE links I have tried are here:
How would I distinct my list of key/value pairs
Get list of distinct values in List<T> in c#
Get a list of distinct values in List
All of them return a list that still has all the pairs. Here is the current code (Yes, I know there are two distinct lines, neither appears to be working) that outputs the list:
parser.Close();
List<Unmatched> noDupes = unmatched.Distinct().ToList();
noDupes.Sort();
noDupes.Select(x => x.first_code).Distinct();
foreach (var pair in noDupes)
{
txtUnmatchedList.AppendText(pair.first_code + "," + pair.second_code + Environment.NewLine);
}
Here is the Equals/GetHashCode code as requested:
public bool Equals(Unmatched notmatched)
{
    //Check whether the compared object is null.
    if (Object.ReferenceEquals(notmatched, null)) return false;
    //Check whether the compared object references the same data.
    if (Object.ReferenceEquals(this, notmatched)) return true;
    //Check whether the UserDetails' properties are equal.
    return first_code.Equals(notmatched.first_code) && second_code.Equals(notmatched.second_code);
}

// If Equals() returns true for a pair of objects
// then GetHashCode() must return the same value for these objects.
public override int GetHashCode()
{
    //Get hash code for the UserName field if it is not null.
    int hashfirst_code = first_code == null ? 0 : first_code.GetHashCode();
    //Get hash code for the City field.
    int hashsecond_code = second_code.GetHashCode();
    //Calculate the hash code for the GPOPolicy.
    return hashfirst_code ^ hashsecond_code;
}
I have also looked at a couple of answers that are using queries and Tuples, which I honestly don't understand. Can someone point me to a source or answer that will explain the how (And why) of getting distinct pairs out of a custom list?
(Side question-Can you declare a class as both IComparable and IEquatable?)

The problem is you are not implementing IEquatable<Unmatched>.
public class Unmatched : IComparable<Unmatched>, IEquatable<Unmatched>
EqualityComparer<T>.Default uses the Equals(T) method only if you implement IEquatable<T>. You are not doing this, so it will instead use Object.Equals(object) which uses reference equality.
The overload of Distinct you are calling uses EqualityComparer<T>.Default to compare different elements of the sequence for equality. As the documentation states, the returned comparer uses your implementation of GetHashCode to find potentially-equal elements. It then uses the Equals(T) method to check for equality, or Object.Equals(Object) if you have not implemented IEquatable<T>.
You have an Equals(Unmatched) method, but it will not be used since you are not implementing IEquatable<Unmatched>. Instead, the default Object.Equals method is used which uses reference equality.
Note your current Equals method is not overriding Object.Equals since that takes an Object parameter, and you would need to specify the override modifier.
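Putting the points above together, here is a minimal sketch of how the class might look once it implements IEquatable<Unmatched>. The ascending ordinal sort order, the hash-combining constant 397, and the UnmatchedDemo helper are assumptions made for illustration, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Unmatched : IComparable<Unmatched>, IEquatable<Unmatched>
{
    public string first_code { get; set; }
    public string second_code { get; set; }

    // Assumed ordering: ascending by first_code, then by second_code (ordinal).
    public int CompareTo(Unmatched other)
    {
        if (ReferenceEquals(other, null)) return 1;
        int byFirst = string.CompareOrdinal(first_code, other.first_code);
        return byFirst != 0 ? byFirst : string.CompareOrdinal(second_code, other.second_code);
    }

    // Picked up by EqualityComparer<Unmatched>.Default because the class implements IEquatable<Unmatched>.
    public bool Equals(Unmatched other)
    {
        if (ReferenceEquals(other, null)) return false;
        return first_code == other.first_code && second_code == other.second_code;
    }

    // Keep Object.Equals consistent with the typed overload.
    public override bool Equals(object obj)
    {
        return Equals(obj as Unmatched);
    }

    // Equal pairs must produce equal hash codes.
    public override int GetHashCode()
    {
        int h1 = first_code == null ? 0 : first_code.GetHashCode();
        int h2 = second_code == null ? 0 : second_code.GetHashCode();
        unchecked { return (h1 * 397) ^ h2; }
    }
}

// Hypothetical helper that mirrors the question's Distinct/Sort/output steps.
public static class UnmatchedDemo
{
    public static List<string> DistinctSortedPairs()
    {
        var unmatched = new List<Unmatched>
        {
            new Unmatched { first_code = "PTC", second_code = "138A" },
            new Unmatched { first_code = "PTC", second_code = "138A" },
            new Unmatched { first_code = "MA9", second_code = "5A" },
            new Unmatched { first_code = "MA9", second_code = "5A" },
        };
        List<Unmatched> noDupes = unmatched.Distinct().ToList();
        noDupes.Sort();
        return noDupes.Select(p => p.first_code + "," + p.second_code).ToList();
    }
}
```

With this in place, the call to unmatched.Distinct() from the question should no longer need any extra arguments.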

For an example on using Distinct see here.
You have to implement the IEqualityComparer<TSource> and not IComparable<TSource>.
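If you would rather leave the class itself untouched, a separate comparer can be passed to Distinct. The following is only a sketch: the name UnmatchedPairComparer and the ComparerDemo helper are made up for illustration, and the Unmatched class is reduced to its two properties.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Reduced version of the question's class: no Equals or GetHashCode of its own.
public class Unmatched
{
    public string first_code { get; set; }
    public string second_code { get; set; }
}

// Hypothetical comparer: two pairs are equal when both codes match.
public sealed class UnmatchedPairComparer : IEqualityComparer<Unmatched>
{
    public bool Equals(Unmatched x, Unmatched y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false;
        return x.first_code == y.first_code && x.second_code == y.second_code;
    }

    public int GetHashCode(Unmatched obj)
    {
        if (ReferenceEquals(obj, null)) return 0;
        int h1 = obj.first_code == null ? 0 : obj.first_code.GetHashCode();
        int h2 = obj.second_code == null ? 0 : obj.second_code.GetHashCode();
        unchecked { return (h1 * 397) ^ h2; }
    }
}

public static class ComparerDemo
{
    // Counts the distinct pairs in a small sample list.
    public static int CountDistinct()
    {
        var unmatched = new List<Unmatched>
        {
            new Unmatched { first_code = "MA63", second_code = "138A" },
            new Unmatched { first_code = "MA63", second_code = "138A" },
            new Unmatched { first_code = "MA59", second_code = "87BM" },
        };
        return unmatched.Distinct(new UnmatchedPairComparer()).Count();
    }
}
```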

Related

C# List comparison between custom objects

Pair<BoardLocation, BoardLocation> loc = new Pair<BoardLocation, BoardLocation>( this.getLocation(), l );
if(!this.getPlayer().getMoves().Contains( loc )) {
this.getPlayer().addMove( loc );
}
I'm using a type I created called "Pair", and I'm trying to use the Contains method in C# to compare two instances. I have overridden Equals in Pair itself so that it compares the ToString() results of both Pair objects. So there are four strings being compared: the two keys and the two values. If the two keys are equal, then the two values are compared. This makes sense because the key is the originating location for the location (the value) being attacked. If the key and value are the same, the object should not be added.
public override bool Equals( object obj ) {
    Pair<K, V> objNode = (Pair<K, V>)obj;
    if(this.value.ToString().CompareTo( objNode.value.ToString() ) == 0) {
        if(this.key.ToString().CompareTo( objNode.key.ToString() ) == 0) {
            return true;
        } else
            return false;
    } else {
        return false;
    }
}
The question is: is there a better way to do this that doesn't involve stupid amounts of code or creating new objects to deal with it? Of course, if any ideas involve these, I am all ears. The part that confuses me is that, although perhaps I don't understand what is going on, I was hoping that C# offered a method that just checks equivalence of values rather than object memory locations and so on.
I've just ported this from Java as well, and it works exactly the same, but I'm asking this question for C# because I'm hoping there is a better way to compare these objects without using ToString() with generic types.
You can definitely make this code a lot simpler by using && and just returning the value of equality comparisons, instead of all those if statements and return true; or return false; statements.
public override bool Equals (object obj) {
    // Safety first: handle the case where the other object isn't
    // of the same type, or obj is null. In both cases we should
    // return false, rather than throwing an exception
    Pair<K, V> otherPair = obj as Pair<K, V>;
    if (otherPair == null) {
        return false;
    }
    return key.ToString() == otherPair.key.ToString() &&
           value.ToString() == otherPair.value.ToString();
}
In Java you could use equals rather than compareTo.
Note that these aren't exactly the same, as == (and Equals) use an ordinal comparison rather than a culture-sensitive one, but I suspect that's what you want anyway.
I would personally shy away from comparing the values by ToString() representations. I would use the natural equality comparisons of the key and value types instead:
public override bool Equals (object obj) {
    // Safety first: handle the case where the other object isn't
    // of the same type, or obj is null. In both cases we should
    // return false, rather than throwing an exception
    Pair<K, V> otherPair = obj as Pair<K, V>;
    if (otherPair == null) {
        return false;
    }
    return EqualityComparer<K>.Default.Equals(key, otherPair.key) &&
           EqualityComparer<V>.Default.Equals(value, otherPair.value);
}
(As Avner notes, you could just use Tuple of course...)
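Since Equals is being overridden, GetHashCode should be overridden along with it. Below is a rough sketch of a complete Pair type along those lines. The property names Key and Value, the constructor, the constant 397 and the PairDemo helper are assumptions for illustration, and string keys stand in for BoardLocation.

```csharp
using System;
using System.Collections.Generic;

public sealed class Pair<K, V> : IEquatable<Pair<K, V>>
{
    public K Key { get; private set; }
    public V Value { get; private set; }

    public Pair(K key, V value)
    {
        Key = key;
        Value = value;
    }

    // Delegates to the natural equality of the key and value types.
    public bool Equals(Pair<K, V> other)
    {
        if (ReferenceEquals(other, null)) return false;
        return EqualityComparer<K>.Default.Equals(Key, other.Key)
            && EqualityComparer<V>.Default.Equals(Value, other.Value);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Pair<K, V>);
    }

    // Built from the same two members that Equals compares.
    public override int GetHashCode()
    {
        int h1 = EqualityComparer<K>.Default.GetHashCode(Key);
        int h2 = EqualityComparer<V>.Default.GetHashCode(Value);
        unchecked { return (h1 * 397) ^ h2; }
    }
}

public static class PairDemo
{
    // Returns true when a separately constructed but equal pair is found in the list.
    public static bool ContainsEqualPair()
    {
        var moves = new List<Pair<string, string>> { new Pair<string, string>("A1", "B2") };
        return moves.Contains(new Pair<string, string>("A1", "B2"));
    }
}
```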
As noted in comments, I'd also strongly recommend that you start using properties and C# naming conventions, e.g.:
if (!Player.Moves.Contains(loc)) {
Player.AddMove(loc);
}
The simplest way to improve this is to use, instead of your custom Pair class, an instance of the built-in Tuple<T1,T2> class.
The Tuple class, in addition to giving you an easy way to bundle several values together, automatically implements structural equality, meaning that a Tuple object is equal to another if:
It is a Tuple object.
Its two components are of the same types as the current instance.
Its two components are equal to those of the current instance. Equality is determined by the default object equality comparer for each component.
(from MSDN)
This means that instead of your Pair having to compare its values, you're delegating the responsibility to the types held in the Tuple.
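As a small illustration of that structural equality, the sketch below keeps moves in a list of Tuple<string, string>. The string locations and the AddIfMissing helper are hypothetical stand-ins for BoardLocation and the player's move list.

```csharp
using System;
using System.Collections.Generic;

public static class TupleDemo
{
    // Adds the move only if an equal (from, to) tuple is not already present.
    // Returns true when the move was added.
    public static bool AddIfMissing(List<Tuple<string, string>> moves, string from, string to)
    {
        var move = Tuple.Create(from, to);
        if (moves.Contains(move))
        {
            return false;
        }
        moves.Add(move);
        return true;
    }
}
```

Contains finds the existing entry even though the second tuple is a different object, because Tuple compares its components rather than references.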

LINQ Except() Method Does Not Work

I have 2 IList<T> of the same type of object ItemsDTO. I want to exclude one list from another. However this does not seem to be working for me and I was wondering why?
IList<ItemsDTO> related = itemsbl.GetRelatedItems();
IList<ItemsDTO> relating = itemsbl.GetRelatingItems().Except(related).ToList();
I'm trying to remove items in related from the relating list.
Since class is a reference type, your ItemsDTO class must override Equals and GetHashCode for that to work.
From MSDN:
Produces the set difference of two sequences by using the default
equality comparer to compare values.
The default equality comparer is going to be a reference comparison. So if those lists are populated independently of each other, they may contain the same objects from your point of view but different references.
When you use LINQ against SQL Server you have the benefit of LINQ translating your LINQ statement to a SQL query that can perform logical equality for you based on primary keys or value comparators. With LINQ to Objects, you'll need to define what logical equality means for ItemsDTO, and that means overriding Equals() as well as GetHashCode().
Except works well for value types. However, since you are using reference types, you need to override Equals and GetHashCode on your ItemsDTO in order to get this to work.
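A minimal sketch of what that override might look like, assuming ItemsDTO has an integer Id property that identifies an item (the real class is not shown in the question, so Id and the ExceptDemo helper are assumptions):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class ItemsDTO
{
    // Assumed identifying property.
    public int Id { get; set; }

    // Two items are logically equal when their Ids match.
    public override bool Equals(object obj)
    {
        var other = obj as ItemsDTO;
        return other != null && Id == other.Id;
    }

    public override int GetHashCode()
    {
        return Id.GetHashCode();
    }
}

public static class ExceptDemo
{
    // Returns the Ids left in "relating" after removing everything found in "related".
    public static List<int> RemainingIds()
    {
        IList<ItemsDTO> related = new List<ItemsDTO> { new ItemsDTO { Id = 2 } };
        IList<ItemsDTO> relating = new List<ItemsDTO>
        {
            new ItemsDTO { Id = 1 },
            new ItemsDTO { Id = 2 },
            new ItemsDTO { Id = 3 },
        };
        return relating.Except(related).Select(i => i.Id).ToList();
    }
}
```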
I just ran into the same problem. Apparently .NET thinks the items in one list are different from the same items in the other list (even though they are actually the same). This is what I did to fix it:
Have your class implement IEqualityComparer<T>, and pass an instance of it as the second argument to Except (Except only uses a custom comparer when one is supplied), e.g.
public class ItemsDTO : IEqualityComparer<ItemsDTO>
{
    public bool Equals(ItemsDTO x, ItemsDTO y)
    {
        if (x == null || y == null) return false;
        return ReferenceEquals(x, y) || (x.Id == y.Id); // In this example, treat the items as equal if they have the same Id
    }

    public int GetHashCode(ItemsDTO obj)
    {
        // Hash the object passed in, not the comparer instance itself.
        return obj.Id.GetHashCode();
    }
}

Using IndexOf to search a combo box

I've inserted a few StaffRole objects into a combo box using the code below:
for (int i=0; i < staffRoles.Count; i++)
{
user_Role_Combo.Items.Add(staffRoles[i]);
}
I'm trying to find the index of a specific element within the combo box so that it displays the correct element when loaded. I've got this, but it just returns -1 every time:
StaffRole sr = new StaffRole("",roleID);
int comboBoxID = user_Role_Combo.Items.IndexOf(sr);
I'm doing this correct way no?!
In order for your new StaffRole instance to be 'found' in the combobox you need to describe why two StaffRole instances should be considered equivalent.
So you need to override Equals and GetHashCode. Technically, you need only Equals, but these two methods need to be overridden together.
One way to deal with it is to base object equality on roleId equality, like this:
public override int GetHashCode() {
    return roleId.GetHashCode();
}

public override bool Equals(object obj) {
    if (obj == this) return true;
    var other = obj as StaffRole;
    if (other == null) return false;
    return roleId == other.roleId;
}
I'm doing this correct way no?!
No. By default IndexOf will check if the same reference exists in the items list. Since it's a new StaffRole that you just instantiated, it doesn't exist in the list.
I think what you want to do is compare by ID. To do this, you could override Equals and GetHashCode in the StaffRole class. In your custom Equals method, you would compare two objects by role ID. After doing this, IndexOf will work as you expect it to, by comparing using IDs instead of references.
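Here is a rough, self-contained sketch of that idea. A plain List<object> stands in for the combo box's Items collection so the example runs without Windows Forms, and the StaffRole fields, constructor shape and StaffRoleDemo helper are assumptions based on the call new StaffRole("", roleID) in the question.

```csharp
using System;
using System.Collections.Generic;

public class StaffRole
{
    private readonly string name;
    private readonly int roleId;

    // Assumed constructor shape: (name, roleId).
    public StaffRole(string name, int roleId)
    {
        this.name = name;
        this.roleId = roleId;
    }

    // Two roles are equal when their IDs match, regardless of name.
    public override bool Equals(object obj)
    {
        var other = obj as StaffRole;
        return other != null && roleId == other.roleId;
    }

    public override int GetHashCode()
    {
        return roleId.GetHashCode();
    }

    public override string ToString()
    {
        return name;
    }
}

public static class StaffRoleDemo
{
    // Looks up a role by ID using IndexOf, which relies on Equals.
    public static int FindIndex(int roleId)
    {
        var items = new List<object> { new StaffRole("Admin", 1), new StaffRole("Clerk", 2) };
        return items.IndexOf(new StaffRole("", roleId));
    }
}
```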
Perhaps you could use either
FindString(String)
FindStringExact(String)
Both methods will return the index of the element in the list that matches the value of the string parameter that the method receives.
Combobox documentation here.
I didn't want to replace the Equals / GetHashCode methods as I need them to be different for different instances.
So, I used some Linq to find the proper element inside the collection:
this.comboBox_group.SelectedIndex =
this.comboBox_group.Items.IndexOf
(comboBox_group.Items.Cast<Group>().Where(x => x.Id == SelectedId).First());

Union-ing two custom classes returns duplicates

I have two custom classes, ChangeRequest and ChangeRequests, where a ChangeRequests can contain many ChangeRequest instances.
public class ChangeRequests : IXmlSerializable, ICloneable, IEnumerable<ChangeRequest>,
IEquatable<ChangeRequests> { ... }
public class ChangeRequest : ICloneable, IXmlSerializable, IEquatable<ChangeRequest>
{ ... }
I am trying to do a union of two ChangeRequests instances. However, duplicates do not seem to be removed. My MSTest unit test is as follows:
var cr1 = new ChangeRequest { CRID = "12" };
var crs1 = new ChangeRequests { cr1 };
var crs2 = new ChangeRequests
{
cr1.Clone(),
new ChangeRequest { CRID = "34" }
};
Assert.AreEqual(crs1[0], crs2[0], "First CR in both ChangeRequests should be equal");
var unionedCRs = new ChangeRequests(crs1.Union<ChangeRequest>(crs2));
ChangeRequests expected = crs2.Clone();
Assert.AreEqual(expected, unionedCRs, "Duplicates should be removed from a Union");
The test fails in the last line, and unionedCRs contains two copies of cr1. When I tried to debug and step through each line, I had a breakpoint in ChangeRequest.Equals(object) on the first line, as well as in the first line of ChangeRequest.Equals(ChangeRequest), but neither were hit. Why does the union contain duplicate ChangeRequest instances?
Edit: as requested, here is ChangeRequests.Equals(ChangeRequests):
public bool Equals(ChangeRequests other)
{
if (ReferenceEquals(this, other))
{
return true;
}
return null != other && this.SequenceEqual<ChangeRequest>(other);
}
And here's ChangeRequests.Equals(object):
public override bool Equals(object obj)
{
return Equals(obj as ChangeRequests);
}
Edit: I overrode GetHashCode on both ChangeRequest and ChangeRequests but still in my test, if I do IEnumerable<ChangeRequest> unionedCRsIEnum = crs1.Union<ChangeRequest>(crs2);, unionedCRsIEnum ends up with two copies of the ChangeRequest with CRID 12.
Edit: something has to be up with my Equals or GetHashCode implementations somewhere, since Assert.AreEqual(expected, unionedCRs.Distinct(), "Distinct should remove duplicates"); fails, and the string representations of expected and unionedCRs.Distinct() show that unionedCRs.Distinct() definitely has two copies of CR 12.
Make sure your GetHashCode implementation is consistent with your Equals - the Enumerable.Union method does appear to use both.
You should get a warning from the compiler if you've implemented one but not the other; it's still up to you to make sure that both methods agree with each other. Here's a convenient summary of the rules: Why is it important to override GetHashCode when Equals method is overridden?
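To make the point concrete, here is a minimal sketch of a ChangeRequest whose Equals and GetHashCode agree with each other. It assumes CRID alone identifies a request, which may not match the real class, and the UnionDemo helper is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class ChangeRequest : IEquatable<ChangeRequest>
{
    public string CRID { get; set; }

    // Assumption: two requests are equal when their CRIDs match.
    public bool Equals(ChangeRequest other)
    {
        return !ReferenceEquals(other, null) && CRID == other.CRID;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as ChangeRequest);
    }

    // Derived from the same member that Equals compares.
    public override int GetHashCode()
    {
        return CRID == null ? 0 : CRID.GetHashCode();
    }
}

public static class UnionDemo
{
    // Unions two lists that share a logically equal request and returns the resulting CRIDs.
    public static List<string> UnionIds()
    {
        var first = new List<ChangeRequest> { new ChangeRequest { CRID = "12" } };
        var second = new List<ChangeRequest>
        {
            new ChangeRequest { CRID = "12" },
            new ChangeRequest { CRID = "34" },
        };
        return first.Union(second).Select(cr => cr.CRID).ToList();
    }
}
```

If GetHashCode returned different values for the two "12" requests, Union would most likely never call Equals on them, which would match the breakpoints not being hit.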
I don't believe that Assert.AreEqual() examines the contents of the sequence - it compares the sequence objects themselves, which are clearly not equal.
What you want is a SequenceEqual() method that will actually examine the contents of two sequences. This answer may help you. It's a response to a similar question that describes how to compare two IEnumerable<> sequences.
You could easily take the responder's answer, and create an extension method to make the calls look more like assertions:
public static class AssertionExt
{
    // Asserts that both sequences have the same length and equal elements in the same order.
    public static void AreSequencesEqual<T>(this IEnumerable<T> expected,
                                            IEnumerable<T> sequence)
    {
        Assert.AreEqual(expected.Count(), sequence.Count());
        using (IEnumerator<T> e1 = expected.GetEnumerator())
        using (IEnumerator<T> e2 = sequence.GetEnumerator())
        {
            while (e1.MoveNext() && e2.MoveNext())
            {
                Assert.AreEqual(e1.Current, e2.Current);
            }
        }
    }
}
Alternatively you could use SequenceEqual(), to compare the sequences, realizing that it won't provide any information about which elements are not equal.
As LBushkin says, Assert.AreEqual will just call Equals on the sequences.
You can use the SequenceEqual extension method though:
Assert.IsTrue(expected.SequenceEqual(unionedCRs));
That won't give much information if it fails, however.
You may want to use the test code we wrote for MoreLINQ which was sequence-focused - if the sequences aren't equal, it will specify in what way they differ. (I'm trying to get a link to the source file in question, but my network connection is rubbish.)

How to find and remove duplicate objects in a collection using LINQ?

I have a simple class representing an object. It has 5 properties (a date, 2 decimals, an integer and a string). I have a collection class, derived from CollectionBase, which is a container class for holding multiple objects from my first class.
My question is, I want to remove duplicate objects (e.g. objects that have the same date, same decimals, same integers and same string). Is there a LINQ query I can write to find and remove duplicates? Or find them at the very least?
You can remove duplicates using the Distinct operator.
There are two overloads. One uses the default equality comparer for your type (which for a custom type will call the type's Equals() and GetHashCode() methods). The second allows you to supply your own equality comparer. Neither overload modifies your initial collection; both return a new sequence that excludes duplicates.
If you want to just find the duplicates, you can use GroupBy to do so:
var groupsWithDups = list.GroupBy( x => new { A = x.A, B = x.B, ... }, x => x )
.Where( g => g.Count() > 1 );
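A runnable sketch of the GroupBy approach follows. The Reading class and its three properties are made up for illustration, since the question does not show the real type. The anonymous type used as the key compares its members by value, which is what makes the grouping work.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type with a few value-like properties.
public class Reading
{
    public DateTime Date { get; set; }
    public decimal Amount { get; set; }
    public string Label { get; set; }
}

public static class DuplicateFinder
{
    // Returns the label of every group that contains more than one identical reading.
    public static List<string> DuplicateLabels(IEnumerable<Reading> readings)
    {
        return readings
            .GroupBy(r => new { r.Date, r.Amount, r.Label })
            .Where(g => g.Count() > 1)
            .Select(g => g.Key.Label)
            .ToList();
    }
}
```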
To remove duplicates from something like an IList<>, you could replace the list with its distinct elements:
yourList = yourList.Distinct().ToList();
If your simple class uses Equals in a manner that satisfies your requirements then you can use the Distinct method
var col = ...;
var noDupes = col.Distinct();
If not then you will need to provide an instance of IEqualityComparer<T> which compares values in the way you desire. For example (null problems ignored for brevity)
public class MyTypeComparer : IEqualityComparer<MyType> {
    public bool Equals(MyType left, MyType right) {
        return left.Name == right.Name;
    }
    public int GetHashCode(MyType type) {
        return 42;
    }
}

var noDupes = col.Distinct(new MyTypeComparer());
Note the use of a constant for GetHashCode is intentional. Without knowing intimate details about the semantics of MyType it is impossible to write an efficient and correct hashing function. In lieu of an efficient hashing function I used a constant which is correct irrespective of the semantics of the type.
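A constant hash is always correct, but it puts every element in the same bucket, so Distinct ends up comparing each element against all the ones it has already seen. When equality is known to depend only on Name, as in the comparer above, a hash derived from Name is a reasonable alternative. The sketch below assumes MyType has a string Name property and adds the null handling that was skipped for brevity; MyTypeNameComparer is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the type: only Name takes part in equality.
public class MyType
{
    public string Name { get; set; }
}

public class MyTypeNameComparer : IEqualityComparer<MyType>
{
    public bool Equals(MyType left, MyType right)
    {
        if (ReferenceEquals(left, right)) return true;
        if (ReferenceEquals(left, null) || ReferenceEquals(right, null)) return false;
        return left.Name == right.Name;
    }

    // Hash only the member that Equals compares, so equal items share a hash code.
    public int GetHashCode(MyType obj)
    {
        if (ReferenceEquals(obj, null) || obj.Name == null) return 0;
        return obj.Name.GetHashCode();
    }
}
```

Usage is the same as before: col.Distinct(new MyTypeNameComparer()).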
