Union-ing two custom classes returns duplicates - c#

I have two custom classes, ChangeRequest and ChangeRequests, where a ChangeRequests can contain many ChangeRequest instances.
public class ChangeRequests : IXmlSerializable, ICloneable, IEnumerable<ChangeRequest>,
IEquatable<ChangeRequests> { ... }
public class ChangeRequest : ICloneable, IXmlSerializable, IEquatable<ChangeRequest>
{ ... }
I am trying to do a union of two ChangeRequests instances. However, duplicates do not seem to be removed. My MSTest unit test is as follows:
var cr1 = new ChangeRequest { CRID = "12" };
var crs1 = new ChangeRequests { cr1 };
var crs2 = new ChangeRequests
{
cr1.Clone(),
new ChangeRequest { CRID = "34" }
};
Assert.AreEqual(crs1[0], crs2[0], "First CR in both ChangeRequests should be equal");
var unionedCRs = new ChangeRequests(crs1.Union<ChangeRequest>(crs2));
ChangeRequests expected = crs2.Clone();
Assert.AreEqual(expected, unionedCRs, "Duplicates should be removed from a Union");
The test fails on the last line, and unionedCRs contains two copies of cr1. When I stepped through in the debugger, I had breakpoints on the first line of ChangeRequest.Equals(object) and on the first line of ChangeRequest.Equals(ChangeRequest), but neither was hit. Why does the union contain duplicate ChangeRequest instances?
Edit: as requested, here is ChangeRequests.Equals(ChangeRequests):
public bool Equals(ChangeRequests other)
{
if (ReferenceEquals(this, other))
{
return true;
}
return null != other && this.SequenceEqual<ChangeRequest>(other);
}
And here's ChangeRequests.Equals(object):
public override bool Equals(object obj)
{
return Equals(obj as ChangeRequests);
}
Edit: I overrode GetHashCode on both ChangeRequest and ChangeRequests but still in my test, if I do IEnumerable<ChangeRequest> unionedCRsIEnum = crs1.Union<ChangeRequest>(crs2);, unionedCRsIEnum ends up with two copies of the ChangeRequest with CRID 12.
Edit: something has to be up with my Equals or GetHashCode implementations somewhere, since Assert.AreEqual(expected, unionedCRs.Distinct(), "Distinct should remove duplicates"); fails, and the string representations of expected and unionedCRs.Distinct() show that unionedCRs.Distinct() definitely has two copies of CR 12.

Make sure your GetHashCode implementation is consistent with your Equals - the Enumerable.Union method does appear to use both.
You should get a warning from the compiler if you've implemented one but not the other; it's still up to you to make sure that both methods agree with each other. Here's a convenient summary of the rules: Why is it important to override GetHashCode when Equals method is overridden?
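As a rough sketch of what "consistent" means here, assuming CRID is the only member that defines equality (the real ChangeRequest probably has more, so this reduced class and the UnionDemo helper are hypothetical), both methods are derived from the same field so that Union can recognise the clone:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for the question's ChangeRequest; only CRID is modeled.
public sealed class ChangeRequest : IEquatable<ChangeRequest>
{
    public string CRID { get; set; }

    public bool Equals(ChangeRequest other)
    {
        return other != null && CRID == other.CRID;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as ChangeRequest);
    }

    // Derived from the same field as Equals, so equal objects share a hash code.
    public override int GetHashCode()
    {
        return CRID == null ? 0 : CRID.GetHashCode();
    }
}

public static class UnionDemo
{
    public static List<string> UnionedIds()
    {
        var first = new[] { new ChangeRequest { CRID = "12" } };
        var second = new[]
        {
            new ChangeRequest { CRID = "12" },
            new ChangeRequest { CRID = "34" }
        };
        // Union buckets elements by hash code first, then confirms with Equals.
        return first.Union(second).Select(cr => cr.CRID).ToList();
    }
}
```

If GetHashCode were left as the default (identity-based) implementation, the two "12" instances would normally land in different buckets and Equals would never be called, which would match the breakpoints not being hit.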

I don't believe that Assert.AreEqual() examines the contents of the sequence - it compares the sequence objects themselves, which are clearly not equal.
What you want is a SequenceEqual() method that will actually examine the contents of the two sequences. This answer may help you. It's a response to a similar question that describes how to compare two IEnumerable<> sequences.
You could easily take that answer and wrap it in a helper method so the calls look more like assertions:
public static class AssertionExt
{
    // Asserts that both sequences have the same length and pairwise-equal elements.
    public static void AreSequencesEqual<T>(IEnumerable<T> expected,
                                            IEnumerable<T> sequence)
    {
        Assert.AreEqual(expected.Count(), sequence.Count());
        using (IEnumerator<T> e1 = expected.GetEnumerator())
        using (IEnumerator<T> e2 = sequence.GetEnumerator())
        {
            while (e1.MoveNext() && e2.MoveNext())
            {
                Assert.AreEqual(e1.Current, e2.Current);
            }
        }
    }
}
Alternatively, you could use SequenceEqual() to compare the sequences, bearing in mind that it won't tell you which elements are not equal.

As LBushkin says, Assert.AreEqual will just call Equals on the sequences.
You can use the SequenceEqual extension method though:
Assert.IsTrue(expected.SequenceEqual(unionedCRs));
That won't give much information if it fails, however.
You may want to use the test code we wrote for MoreLINQ which was sequence-focused - if the sequences aren't equal, it will specify in what way they differ. (I'm trying to get a link to the source file in question, but my network connection is rubbish.)
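Until that link is available, here is a minimal sketch of a comparison that reports where two sequences differ. It is not the MoreLINQ test code; the SequenceDiff class and FirstDifference method are hypothetical names. It returns a message rather than asserting, so it can be passed to whatever assertion framework is in use:

```csharp
using System.Collections.Generic;

public static class SequenceDiff
{
    // Returns null when the sequences are equal, otherwise a message
    // describing the first position at which they differ.
    public static string FirstDifference<T>(IEnumerable<T> expected, IEnumerable<T> actual)
    {
        var comparer = EqualityComparer<T>.Default;
        using (var e1 = expected.GetEnumerator())
        using (var e2 = actual.GetEnumerator())
        {
            for (int index = 0; ; index++)
            {
                bool has1 = e1.MoveNext();
                bool has2 = e2.MoveNext();
                if (!has1 && !has2) return null;   // both ended together: equal
                if (has1 != has2)
                    return "Sequences differ in length at index " + index;
                if (!comparer.Equals(e1.Current, e2.Current))
                    return "Element " + index + ": expected <" + e1.Current
                        + "> but was <" + e2.Current + ">";
            }
        }
    }
}
```

In an MSTest test this could be used as Assert.IsNull(SequenceDiff.FirstDifference(expected, unionedCRs)), so that a failure message names the offending index.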

Related

Assert.Equal() fails when comparing IEnumerable (using yield return) with itself

Ran into the following issue when writing a unit test for my code. Why does the Assert.Equal() fail when comparing an IEnumerable with itself?
private class ReferenceType { }
[Fact]
public void EnumerableEqualityTest()
{
IEnumerable<ReferenceType> GetEnumerable()
{
yield return new ReferenceType();
}
var enumerable = GetEnumerable();
Assert.Equal(enumerable, enumerable); // fails
}
To understand what is going on we need to understand what Assert.Equal() is actually doing. According to the documentation of Assert.Equal<T>(IEnumerable<T> expected, IEnumerable<T> actual) it "Verifies that two sequences are equivalent, using a default comparer".
The Assert.Equal() in this case iterates the enumerable to check if the individual values are equal. This means that the enumerable is iterated twice for the comparison and that a new instance of ReferenceType is created each time (through yield return). The test fails since the default comparer for a reference type only checks if the instances refer to the same object.
There are at least three ways to get the expected result:
Use the overload for Assert.Equal() that takes an argument of IEqualityComparer<T>.
Override the Equals() method of ReferenceType.
Skip yield return and use a collection that implements IEnumerable instead.
Arguably the first solution is the best since it does not change the implementation of GetEnumerable() or ReferenceType. In this case where GetEnumerable() is only used within the test I would opt for the third option as it is the easiest to do. It could look something like this:
IEnumerable<ReferenceType> GetData()
{
return new[] { new ReferenceType() };
}
or this:
IEnumerable<ReferenceType> GetData()
{
var referenceTypes = new List<ReferenceType>();
// ... add reference types
return referenceTypes;
}
This works since we are now iterating the collection that was created when we got the enumerable and not creating a new instance for each iteration.
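The difference can be made visible with a small sketch (the DeferredDemo class and its method names are made up for illustration): an iterator method re-runs its body on every enumeration, while an array hands back the instance it already holds.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    private sealed class ReferenceType { }

    // Iterator method: the body runs again on every enumeration.
    private static IEnumerable<ReferenceType> Lazy()
    {
        yield return new ReferenceType();
    }

    // Array: the single instance is created once, when the method is called.
    private static IEnumerable<ReferenceType> Materialized()
    {
        return new[] { new ReferenceType() };
    }

    public static bool LazyYieldsSameInstance()
    {
        var items = Lazy();
        // Each First() call starts a new enumeration and creates a new object.
        return ReferenceEquals(items.First(), items.First());
    }

    public static bool MaterializedYieldsSameInstance()
    {
        var items = Materialized();
        // Both First() calls read the same stored element.
        return ReferenceEquals(items.First(), items.First());
    }
}
```

Calling ToList() or ToArray() on the lazy enumerable once, and comparing against that result, has the same effect as the third option.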
I think you can use the function below to solve the problem.
public bool AreEquivalentEnumerator<TSource>(IEnumerator<TSource> first, IEnumerator<TSource> second)
{
while (first.MoveNext())
{
if (!(second.MoveNext() && Equals(first.Current, second.Current))) return false;
}
if (second.MoveNext()) return false;
return true;
}

Getting List.Join to compare properly

I am trying to create a list by joining two lists if a property matches correctly. I am using the following command:
FooList = TrackedStrings.Join (FooList,
str => str,
Foo => Foo.GetString (),
(str, Foo) => Foo,
new Comparer ())
.ToList ();
And the following class to compare:
public class Comparer : IEqualityComparer<string>
{
public bool Equals (string x, string y)
{
return y.Contains (x);
}
public int GetHashCode (string str)
{
return str.GetHashCode ();
}
}
Now, the idea is that I only want to keep the items that have a GetString () containing any one of the strings from TrackedStrings. However, it doesn't work: the comparer only returns true if the strings are equal. For example, let's say that we have two lists:
List<string> TrackedActions = new List<string> { "Created", "Deleted" };
List<Foo> FooList = new List<Foo> { new Foo ("Created"), new Foo ("Deleted Something")};
With the current command, the second Foo is dropped from the list - instead of matching to TrackedActions[1] and being kept.
Thus, my question is: Why is Comparer not working the way I expect it to?
You should not use IEqualityComparer here, because its Equals method is required to be reflexive, symmetric, and transitive (MSDN).
In your case it's not symmetric: Equals(a, b) != Equals(b, a).
Glorfindel's answer is not entirely correct either, because the relation is not transitive:
Equals("abcd","bc") == true
Equals("bcde", "bc") == true
Equals("abcd","bcde") == false
A custom comparer must make sure that the Equals relationship it defines is symmetric. This means that whenever x.Equals(y), y.Equals(x) and vice versa.
The reason for this is that you can never predict in which order the elements are compared, i.e. which one of these is called:
aStringFromLeftList.Equals(aStringFromRightList)
or
aStringFromRightList.Equals(aStringFromLeftList)
Because the relationship you need is neither symmetric nor transitive, you can't use a Comparer for your problem.
Your comparer fails because of its GetHashCode() implementation, regardless of what the right way to implement IEqualityComparer is.
The match is done in two steps:
1. Compare the hash codes of the two strings. In your case "Deleted Something" will almost certainly return a different hash code from "Deleted".
2. If the hash codes from (1) are equal, call Equals() to compare again, because hash codes are fast to compare but can collide.
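Since "contains" is not an equivalence relation, one alternative is to skip Join and filter directly. This is only a sketch: it assumes a Foo class with a GetString() method as in the question, while the Foo constructor shown here and the TrackedFilter.KeepTracked helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Reduced stand-in for the question's Foo class.
public class Foo
{
    private readonly string text;
    public Foo(string text) { this.text = text; }
    public string GetString() { return text; }
}

public static class TrackedFilter
{
    // Keeps every Foo whose string contains at least one tracked string.
    // No hashing is involved, so only Contains decides the match.
    public static List<Foo> KeepTracked(IEnumerable<Foo> foos, IEnumerable<string> tracked)
    {
        var trackedList = tracked.ToList();
        return foos
            .Where(foo => trackedList.Any(t => foo.GetString().Contains(t)))
            .ToList();
    }
}
```

This does a nested scan, so it trades the hash-based lookup of Join for a cost proportional to the product of the two list sizes, which is usually acceptable for short lists of tracked strings.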

Remove duplicates in custom IComparable class

I have a table that has combo pairs identifiers, and I use that to go through CSV files looking for matches. I'm trapping the unidentified pairs in a List, and sending them to an output box for later addition. I would like the output to only have single occurrences of unique pairs. The class is declared as follows:
public class Unmatched:IComparable<Unmatched>
{
public string first_code { get; set; }
public string second_code { get; set; }
public int CompareTo(Unmatched other)
{
if (this.first_code == other.first_code)
{
return this.second_code.CompareTo(other.second_code);
}
return other.first_code.CompareTo(this.first_code);
}
}
One note on the above code: This returns it in reverse alphabetical order, to get it in alphabetical order use this line:
return this.first_code.CompareTo(other.first_code);
Here is the code that adds it. This is directly after the comparison against the datatable elements
unmatched.Add(new Unmatched()
{ first_code = fields[clients[global_index].first_match_column]
, second_code = fields[clients[global_index].second_match_column] });
I would like to remove all pairs from the list where both first code and second code are equal, i.e.;
PTC,138A
PTC,138A
PTC,138A
MA9,5A
MA9,5A
MA9,5A
MA63,138A
MA63,138A
MA59,87BM
MA59,87BM
Should become:
PTC, 138A
MA9, 5A
MA63, 138A
MA59, 87BM
I have tried adding my own Equate and GetHashCode as outlined here:
http://www.morgantechspace.com/2014/01/Use-of-Distinct-with-Custom-Class-objects-in-C-Sharp.html
The SE links I have tried are here:
How would I distinct my list of key/value pairs
Get list of distinct values in List<T> in c#
Get a list of distinct values in List
All of them return a list that still has all the pairs. Here is the current code (Yes, I know there are two distinct lines, neither appears to be working) that outputs the list:
parser.Close();
List<Unmatched> noDupes = unmatched.Distinct().ToList();
noDupes.Sort();
noDupes.Select(x => x.first_code).Distinct();
foreach (var pair in noDupes)
{
txtUnmatchedList.AppendText(pair.first_code + "," + pair.second_code + Environment.NewLine);
}
Here is the Equate/Hash code as requested:
public bool Equals(Unmatched notmatched)
{
//Check whether the compared object is null.
if (Object.ReferenceEquals(notmatched, null)) return false;
//Check whether the compared object references the same data.
if (Object.ReferenceEquals(this, notmatched)) return true;
//Check whether the UserDetails' properties are equal.
return first_code.Equals(notmatched.first_code) && second_code.Equals(notmatched.second_code);
}
// If Equals() returns true for a pair of objects
// then GetHashCode() must return the same value for these objects.
public override int GetHashCode()
{
//Get hash code for the UserName field if it is not null.
int hashfirst_code = first_code == null ? 0 : first_code.GetHashCode();
//Get hash code for the City field.
int hashsecond_code = second_code.GetHashCode();
//Calculate the hash code for the GPOPolicy.
return hashfirst_code ^ hashsecond_code;
}
I have also looked at a couple of answers that are using queries and Tuples, which I honestly don't understand. Can someone point me to a source or answer that will explain the how (And why) of getting distinct pairs out of a custom list?
(Side question-Can you declare a class as both IComparable and IEquatable?)
The problem is you are not implementing IEquatable<Unmatched>.
public class Unmatched : IComparable<Unmatched>, IEquatable<Unmatched>
EqualityComparer<T>.Default uses the Equals(T) method only if you implement IEquatable<T>. You are not doing this, so it will instead use Object.Equals(object) which uses reference equality.
The overload of Distinct you are calling uses EqualityComparer<T>.Default to compare different elements of the sequence for equality. As the documentation states, the returned comparer uses your implementation of GetHashCode to find potentially-equal elements. It then uses the Equals(T) method to check for equality, or Object.Equals(Object) if you have not implemented IEquatable<T>.
You have an Equals(Unmatched) method, but it will not be used since you are not implementing IEquatable<Unmatched>. Instead, the default Object.Equals method is used which uses reference equality.
Note your current Equals method is not overriding Object.Equals since that takes an Object parameter, and you would need to specify the override modifier.
For an example on using Distinct see here.
You have to implement the IEqualityComparer<TSource> and not IComparable<TSource>.
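Putting the first answer together, a sketch of the class with both interfaces might look like the following. It also answers the side question: a class can implement IComparable<T> and IEquatable<T> at the same time. The null handling, the ordinal comparison, and the DistinctDemo helper are assumptions added for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Unmatched : IComparable<Unmatched>, IEquatable<Unmatched>
{
    public string first_code { get; set; }
    public string second_code { get; set; }

    // Alphabetical by first_code, then by second_code.
    public int CompareTo(Unmatched other)
    {
        if (other == null) return 1;
        int byFirst = string.CompareOrdinal(first_code, other.first_code);
        return byFirst != 0 ? byFirst : string.CompareOrdinal(second_code, other.second_code);
    }

    // Picked up by EqualityComparer<Unmatched>.Default because of IEquatable<Unmatched>.
    public bool Equals(Unmatched other)
    {
        return other != null
            && first_code == other.first_code
            && second_code == other.second_code;
    }

    public override bool Equals(object obj) { return Equals(obj as Unmatched); }

    // Built from the same two fields that Equals compares.
    public override int GetHashCode()
    {
        int h1 = first_code == null ? 0 : first_code.GetHashCode();
        int h2 = second_code == null ? 0 : second_code.GetHashCode();
        return unchecked(h1 * 31 + h2);
    }
}

public static class DistinctDemo
{
    public static List<string> DistinctPairs()
    {
        var unmatched = new List<Unmatched>
        {
            new Unmatched { first_code = "PTC", second_code = "138A" },
            new Unmatched { first_code = "PTC", second_code = "138A" },
            new Unmatched { first_code = "MA9", second_code = "5A" },
            new Unmatched { first_code = "MA9", second_code = "5A" }
        };
        return unmatched.Distinct()
            .Select(p => p.first_code + "," + p.second_code)
            .ToList();
    }
}
```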

C# - Asserting two objects are equal in unit tests

Either using Nunit or Microsoft.VisualStudio.TestTools.UnitTesting. Right now my assertion fails.
[TestMethod]
public void GivenEmptyBoardExpectEmptyBoard()
{
var test = new Board();
var input = new Board()
{
Rows = new List<Row>()
{
new Row(){Cells = new List<int>(){0,0,0,0}},
new Row(){Cells = new List<int>(){0,0,0,0}},
new Row(){Cells = new List<int>(){0,0,0,0}},
new Row(){Cells = new List<int>(){0,0,0,0}},
}
};
var expected = new Board()
{
Rows = new List<Row>()
{
new Row(){Cells = new List<int>(){0,0,0,0}},
new Row(){Cells = new List<int>(){0,0,0,0}},
new Row(){Cells = new List<int>(){0,0,0,0}},
new Row(){Cells = new List<int>(){0,0,0,0}},
}
};
var lifeOrchestration = new LifeOrchestration();
var actual = lifeOrchestration.Evolve(input);
Assert.AreEqual(expected, actual);
}
You've got two different Board instances, so your call to Assert.AreEqual will fail. Even if their entire contents appear to be the same, you're comparing references, not the underlying values.
You have to specify what makes two Board instances equal.
You can do it in your test:
Assert.AreEqual(expected.Rows.Count, actual.Rows.Count);
Assert.AreEqual(expected.Rows[0].Cells[0], actual.Rows[0].Cells[0]);
// Lots more tests of equality...
Or you can do it in your classes: (note I wrote this on-the-fly - you'll want to adjust this)
public class Board
{
    public List<Row> Rows = new List<Row>();

    public override bool Equals(object obj)
    {
        var board = obj as Board;
        if (board == null)
            return false;
        // Rows are compared in order, using Row.Equals.
        return board.Rows.SequenceEqual(Rows);
    }

    public override int GetHashCode()
    {
        // Must agree with Equals, so it is derived from the same rows.
        return Rows.Aggregate(17, (hash, row) => unchecked(hash * 31 + row.GetHashCode()));
    }
}
public class Row
{
    public List<int> Cells = new List<int>();

    public override bool Equals(object obj)
    {
        var row = obj as Row;
        if (row == null)
            return false;
        // Compare cell by cell; Except() would ignore order and duplicates.
        return row.Cells.SequenceEqual(Cells);
    }

    public override int GetHashCode()
    {
        // Must agree with Equals, so it is derived from the same cells.
        return Cells.Aggregate(17, (hash, cell) => unchecked(hash * 31 + cell));
    }
}
I used to override GetHashCode and Equals, but I never liked it, since I don't want to change my production code for the sake of unit testing.
It's also kind of a pain.
Then I turned to reflection to compare objects, which was less invasive, but that's quite a lot of work (lots of corner cases).
In the end I use:
http://www.nuget.org/packages/DeepEqual/
Works great.
Update, 6 years later:
I now use the more general library fluentassertions for .NET
it does the same as above but with more features and a nice DSL, the specific replacement would be: https://fluentassertions.com/objectgraphs/
PM> Install-Package FluentAssertions
Also, after some years of experience, I still do not recommend the override route; I'd even consider it bad practice. If you're not careful, you could introduce performance issues when using collections such as Dictionary. And when the time comes that you have a real business case for overriding these methods, you'll be in trouble because the test-driven versions are already there. Production code and test code should be kept separate. Test code should not rely on implementation details or hacks to achieve its goal, since that makes it hard to maintain and understand.
For trivial objects, like domain objects or DTOs or entities, you could simply serialize both instances to a string and compare that string:
var object1Json = JsonConvert.SerializeObject(object1);
var object2Json = JsonConvert.SerializeObject(object2);
Assert.AreEqual(object1Json, object2Json);
This of course has various caveats, so evaluate whether for your classes, the JSON contains the expected values that should be compared.
For example, if your class contains unmanaged resources or otherwise not serializable properties, those won't be properly compared. It also only serializes public properties by default.
ExpectedObjects would help you to compare equality by property value. It supports:
simple object: expected.ToExpectedObject().ShouldEqual(actual);
collection: expected.ToExpectedObject().ShouldEqual(actual);
composite object: expected.ToExpectedObject().ShouldEqual(actual);
partial comparison: build the expected object as an anonymous type and use expected.ToExpectedObject().ShouldMatch(actual)
I love ExpectedObjects because I only need to call two APIs to assert object equality:
ShouldEqual()
ShouldMatch() for partial comparing
I wanted a solution that didn't require adding a dependency, worked with VS unit tests, compared the field values of two objects, and told me all unequal fields. This is what I came up with. Note that it could be extended to work with property values as well.
In my case, this works well for comparing the results of some file-parsing logic to ensure two technically "different" entries have fields with the same values.
public class AssertHelper
{
public static void HasEqualFieldValues<T>(T expected, T actual)
{
var failures = new List<string>();
var fields = typeof(T).GetFields(System.Reflection.BindingFlags.Public | System.Reflection.BindingFlags.Instance);
foreach(var field in fields)
{
var v1 = field.GetValue(expected);
var v2 = field.GetValue(actual);
// object.Equals handles nulls on either side, so a null expected value cannot throw.
if (!object.Equals(v1, v2)) failures.Add(string.Format("{0}: Expected:<{1}> Actual:<{2}>", field.Name, v1, v2));
}
if (failures.Any())
Assert.Fail("AssertHelper.HasEqualFieldValues failed. " + Environment.NewLine+ string.Join(Environment.NewLine, failures));
}
}
[TestClass]
public class AssertHelperTests
{
[TestMethod]
[ExpectedException(typeof(AssertFailedException))]
public void ShouldFailForDifferentClasses()
{
var actual = new NewPaymentEntry() { acct = "1" };
var expected = new NewPaymentEntry() { acct = "2" };
AssertHelper.HasEqualFieldValues(expected, actual);
}
}
Since neither have yet been mentioned on this question, there are a couple of other well adopted libraries out there that can help with this problem:
Fluent Assertions - see object graph comparison
actual.Should().BeEquivalentTo(expected);
Semantic Comparison
new Likeness<MyModel, MyModel>(actual).ShouldEqual(expected);
I personally prefer Fluent Assertions, as it provides greater flexibility with member exclusions etc., and it supports the comparison of nested objects out of the box.
Hope this helps!
Assert methods rely on the object's Equals and GetHashCode. You can implement those, but if this object equality is not needed outside unit tests, I would instead consider comparing the individual primitive values on the object.
Looks like the objects are simple enough and overriding of equals is not really warranted.
If you only want to compare the properties of a complex object, you can iterate over them with reflection, which gives you every public property.
You can try the code below.
//Assert
foreach(PropertyInfo property in Object1.GetType().GetProperties())
{
Assert.AreEqual(property.GetValue(Object1), property.GetValue(Object2));
}

C# Distinct on IEnumerable<T> with custom IEqualityComparer

Here's what I'm trying to do. I'm querying an XML file using LINQ to XML, which gives me an IEnumerable<T> object, where T is my "Village" class, filled with the results of this query. Some results are duplicated, so I would like to perform a Distinct() on the IEnumerable object, like so:
public IEnumerable<Village> GetAllAlliances()
{
try
{
IEnumerable<Village> alliances =
from alliance in xmlDoc.Elements("Village")
where alliance.Element("AllianceName").Value != String.Empty
orderby alliance.Element("AllianceName").Value
select new Village
{
AllianceName = alliance.Element("AllianceName").Value
};
// TODO: make it work...
return alliances.Distinct(new AllianceComparer());
}
catch (Exception ex)
{
throw new Exception("GetAllAlliances", ex);
}
}
As the default comparer would not work for the Village object, I implemented a custom one, as seen here in the AllianceComparer class:
public class AllianceComparer : IEqualityComparer<Village>
{
#region IEqualityComparer<Village> Members
bool IEqualityComparer<Village>.Equals(Village x, Village y)
{
// Check whether the compared objects reference the same data.
if (Object.ReferenceEquals(x, y))
return true;
// Check whether any of the compared objects is null.
if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null))
return false;
return x.AllianceName == y.AllianceName;
}
int IEqualityComparer<Village>.GetHashCode(Village obj)
{
return obj.GetHashCode();
}
#endregion
}
The Distinct() method doesn't work, as I have exactly the same number of results with or without it. Another thing, and I don't know if it's usually possible, but I cannot step into AllianceComparer.Equals() to see what could be the problem.
I've found examples of this on the Internet, but I can't seem to make my implementation work.
Hopefully, someone here might see what could be wrong here!
Thanks in advance!
The problem is with your GetHashCode. You should alter it to return the hash code of AllianceName instead.
int IEqualityComparer<Village>.GetHashCode(Village obj)
{
return obj.AllianceName.GetHashCode();
}
The thing is, if Equals returns true, the objects should have the same hash code which is not the case for different Village objects with same AllianceName. Since Distinct works by building a hash table internally, you'll end up with equal objects that won't be matched at all due to different hash codes.
It is similar to comparing two files: if their hashes are not the same, you don't need to check the files themselves at all, because they must be different. Only when the hashes match do you go on to check whether the contents really are the same. That's exactly how the hash table that Distinct uses behaves.
Or change the line
return alliances.Distinct(new AllianceComparer());
to
return alliances.Select(v => v.AllianceName).Distinct();
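For the comparer-based fix, a self-contained sketch might look like this. The Village class is reduced to the one property used in the question, and the null guards and the AllianceDemo helper are assumptions added for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Reduced stand-in for the question's Village class.
public class Village
{
    public string AllianceName { get; set; }
}

public class AllianceComparer : IEqualityComparer<Village>
{
    public bool Equals(Village x, Village y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.AllianceName == y.AllianceName;
    }

    // Hash the same property that Equals compares, so that equal
    // villages always end up in the same hash bucket.
    public int GetHashCode(Village obj)
    {
        return obj == null || obj.AllianceName == null ? 0 : obj.AllianceName.GetHashCode();
    }
}

public static class AllianceDemo
{
    public static int CountDistinct()
    {
        var villages = new[]
        {
            new Village { AllianceName = "North" },
            new Village { AllianceName = "North" },
            new Village { AllianceName = "South" }
        };
        return villages.Distinct(new AllianceComparer()).Count();
    }
}
```

Note that the Select-based alternative returns a sequence of strings rather than Village objects, so it only fits if the caller needs nothing but the alliance names.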
