Attempting to merge two Lists<> via Union. Still have duplicates - c#

I have two Lists:
List<L1>, List<L2>
L1 = { detailId = 5, fileName = "string 1" }{ detailId = 5, fileName = "string 2" }
L2 = { detailId = 5, fileName = "string 2" }{ detailId = 5, fileName = "string 3" }
That I want to combine them with no duplicates:
List<L3>
L1 = { detailId = 5, fileName = "string 1" }{ detailId = 5, fileName = "string 2" }{ detailId = 5, fileName = "string 3" }
I've tried:
L1.Union(L2).ToList();
L1.Concat(L2).Distinct().ToList();
But both return with duplicates (1, 2, 2, 3).
Not sure what I'm missing.
edit Here's the method. It takes one list and creates another from a delimited string, and tries to combine them.
private List<Files> CombineList(int detailId, string fileNames)
{
List<Files> f1 = new List<Files>();
List<Files> f2 = new List<Files>();
f1 = GetFiles(detailId, false);
if (f1[0].fileNames != "")
{
string[] names = fileNames.Split('|');
for (int i = 0; i < names.Length; i++)
{
Files x = new Files();
x.detailId = detailId;
x.fileNames = names[i];
f2.Add(x);
}
List<Files> f3 = f1.Union(f2).ToList();
}
return f3;
}

From MSDN, for Union :
The default equality comparer, Default, is used to compare values of
the types that implement the IEqualityComparer(Of T) generic
interface. To compare a custom data type, you need to implement this
interface and provide your own GetHashCode and Equals methods for the
type.
link
Since you use a custom type, you need to either override the GetHashCode and Equals or provide an object that implements the IEqualityComparer interface (preferably IEquatable) and provide it as a second parameter to Union.
Here's a simple example of implementing such a class.

I don't like overriding the Files class equals object and getHashCode since you are interfering with the object. Let another object do that and just pass it in. This way if you have an issue with it later on, just swap it out and pass another IEqualityComparer
Here's an example you can just test out
public void MainMethod()
{
List<Files> f1 = new List<Files>() { new Files() { detailId = 5, fileName = "string 1" }, new Files() { detailId = 5, fileName = "string 2" } };
List<Files> f2 = new List<Files>() { new Files() { detailId = 5, fileName = "string 2" }, new Files() { detailId = 5, fileName = "string 3" } };
var f3 = f1.Union(f2, new FilesComparer()).ToList();
foreach (var f in f3)
{
Console.WriteLine(f.detailId + " " + f.fileName);
}
}
public class Files
{
public int detailId;
public string fileName;
}
public class FilesComparer : IEqualityComparer<Files>
{
public bool Equals(Files x, Files y)
{
return x.fileName == y.fileName;
}
public int GetHashCode(Files obj)
{
return obj.fileName.GetHashCode();
}
}

If your elements do not implement some kind of comparison interface (Object.Equals, IEquatable, IComparable, etc.) then any equality test between them will involve ReferenceEquals, in which two different objects are two different objects, even if all their members contain identical values.

If you're merging a list of objects, you will need to define some criteria for the equality comparisons. The example below demonstrates this:
class MyModelTheUniqueIDComparer : IEqualityComparer<MyModel>
{
public bool Equals(MyModel x, MyModel y)
{
return x.SomeValue == y.SomeValue && x.OtherValue == y.OtherValue;
}
// If Equals() returns true for a pair of objects
// then GetHashCode() must return the same value for these objects.
public int GetHashCode(MyModel myModel)
{
unchecked
{
int hash = 17;
hash = hash * 31 + myModel.SomeValue.GetHashCode();
hash = hash * 31 + myModel.OtherValue.GetHashCode();
return hash;
}
}
}
Then you can call to get the result:
var result = q1.Union(q2, new MyModelTheUniqueIDComparer());

From MSDN Enumerable.Union Method:
If you want to compare sequences of objects of a custom data type, you
have to implement the IEqualityComparer < T > generic interface in the
class.
A sample implementation specific to your Files class so that the Union works correctly when merging two collections of custom type:
public class Files : IEquatable<Files>
{
public string fileName { get; set; }
public int detailId { get; set; }
public bool Equals(Files other)
{
//Check whether the compared object is null.
if (Object.ReferenceEquals(other, null)) return false;
//Check whether the compared object references the same data.
if (Object.ReferenceEquals(this, other)) return true;
//Check whether the products' properties are equal.
return detailId.Equals(other.detailId) && fileName.Equals(other.fileName);
}
// If Equals() returns true for a pair of objects
// then GetHashCode() must return the same value for these objects.
public override int GetHashCode()
{
//Get hash code for the fileName field if it is not null.
int hashFileName = fileName == null ? 0 : fileName.GetHashCode();
//Get hash code for the detailId field.
int hashDetailId = detailId.GetHashCode();
//Calculate the hash code for the Files object.
return hashFileName ^ hashDetailId;
}
}

Related

C# Generic variable changes if another same type generic changes

I've this class:
public class Pair<T, V>
{
public T A = default;
public V B = default;
public Pair()
{
A = default;
B = default;
}
public Pair(T a, V b)
{
A = a;
B = b;
}
public override bool Equals(object obj)
{
Pair<T, V> other = obj as Pair<T, V>;
return A.Equals(other.A) && B.Equals(other.B);
}
public override int GetHashCode()
{
return base.GetHashCode();
}
public override string ToString()
{
return "Pair: (" + A.ToString() + " , " + B.ToString() + ")";
}
}
And I have a class with two Pair variables:
public class FakeClass<T>
{
public T LastValue { get; protected set; } = default;
public T CurrentValue = default;
public void Execute()
{
LastValue = CurrentValue
}
}
public class FakeClassWithPair : FakeClass<Pair<int, int>> { }
Now if I execute this code:
FakeClassWithPair fake = new FakeClassWithPair();
fake.CurrentValue.A = 2;
fake.CurrentValue.B = 5;
fake.Execute();
fake.CurrentValue.A = 32;
fake.CurrentValue.B = 53;
In debugging Current Value and Last Value have the same value "32" and "53".
How can I avoid this?
Classes are reference types, so when you set LastValue = CurrentValue, that means both LastValue and CurrentValue refer to the same object.
If you want Value semantics you should declare your Pair as a struct. This means that an assignment does a copy of the value. Except ofc there already are a built in type for this: ValueTuple, with some special syntax that lets you declare types like (int A, int B). There is also a regular Tuple<T1, T2> if you do want a reference type.
Also note that I see no way for your example to run, fake.CurrentValue should be initialized to null and crash when accessed. Using a value type would also solve this, since they cannot be null.
So just change your example to FakeClassWithPair:FakeClass<(int A, int B)> and everything should work as you expect it to.
Definitely do not roll your own class for a pair if you want value semantics. Use the built-in value tuple, defined as (T a, V b).
Also if your content of FakeClass is cloneable then you should take advantage of that (for example arrays are cloneable). So the assignment in Execute() would check if the current value implements ICloneable and proceeds accordingly.
See this example code with output. The first example with fk variable is defined by FakeClass<(int,int)> and the second example with fa variable is defined by FakeClass<int[]>. Some fun code is added to display arrays as list of vales in ToString() in order to mimic the behavior of tuples with arrays.
public class FakeClass<T>
{
public T LastValue { get; protected set; } = default(T);
public T CurrentValue = default(T);
public void Execute()
{
if (CurrentValue is ICloneable cloneable)
{
LastValue = (T)cloneable.Clone();
}
else
{
LastValue = CurrentValue;
}
}
public override string ToString()
{
if (typeof(T).IsArray)
{
object[] last, current;
Array cv = CurrentValue as Array;
if (cv != null)
{
current = new object[cv.Length];
cv.CopyTo(current, 0);
}
else
{
current = new object[0];
}
Array lv = LastValue as Array;
if (lv != null)
{
last = new object[lv.Length];
lv.CopyTo(last, 0);
}
else
{
last = new object[0];
}
return $"Current=[{string.Join(",",current)}], Last=[{string.Join(",",last)}]";
}
return $"Current={CurrentValue}, Last={LastValue}";
}
}
class Program
{
static void Main(string[] args)
{
var fk = new FakeClass<(int a, int b)>();
fk.CurrentValue = (1, 2);
Console.WriteLine(fk);
// Current=(1, 2), Last=(0, 0)
fk.Execute();
fk.CurrentValue = (3, 4);
Console.WriteLine(fk);
// Current=(3, 4), Last=(1, 2)
var fa = new FakeClass<int[]>();
fa.CurrentValue = new int[] { 1, 2 };
Console.WriteLine(fa);
//Current=[1,2], Last=[]
fa.Execute();
fa.CurrentValue = new int[] { 3, 4 };
Console.WriteLine(fa);
//Current=[3,4], Last=[1,2]
}
}

Why does my direct Equals call pass, but fails when nested? [duplicate]

This question already has answers here:
How to compare arrays in C#? [duplicate]
(6 answers)
Closed 3 years ago.
I am attempting to implement Equals overrides for some structs in my code. I have the following "child" struct
public struct ChildStruct
{
public bool Valid;
public int Value1;
public override bool Equals(object obj)
{
if (obj == null || GetType() != obj.GetType())
{
return false;
}
ChildStruct other = (ChildStruct) obj;
return Valid == other.Valid && Surface == other.Value1;
}
}
And this "parent" struct where one member is an array of ChildStructs
public struct ParentStruct
{
public int Id;
public ChildStruct[] children;
public override bool Equals(object obj)
{
if (obj == null || GetType() != obj.GetType())
{
return false;
}
ParentStruct other = (ParentStruct) obj;
// am I comparing ChildStructs array correctly?
return Id == other.Id && children == other.children;
}
}
In my Nunit testing of overriding the Equals methods, directly comparing objects of type ChildStruct pass, but my unit test of the ParentStructs are failing. Am I missing something in the Equals override in the ParentStruct to account for the array? Is the child Equals method not enumerated to all elements in the children array?
Nunit code:
[Test]
public void ChildEqual()
{
var child1 = new ChildStruct{Valid = true, Value1 = 1};
var child2 = new ChildStruct{Valid = true, Value1 = 1};
// passes!
Assert.AreEqual(child1, child2);
}
[Test]
public void ParentEqual()
{
var child1 = new ChildStruct{Valid = true, Value1 = 1};
var child2 = new ChildStruct{Valid = true, Value1 = 1};
var parent1 = new ParentStruct{Id = 1, children = new[] {child1, child2}}
var parent2 = new ParentStruct{Id = 1, children = new[] {child1, child2}}
// fails during checking for equality of children array!
Assert.AreEqual(parent1, parent2);
}
You need to determine what makes two arrays of ChildStructs equal, for the purpose of ParentStruct equality, and change the last line of ParentStruct's equals method accordingly. For example, if they're only supposed to be "equal" if they contain equivalent children in the same order, this would work:
return Id == other.Id && children.SequenceEqual(other.children);

Checking sub-equality of two complex objects

Suppose I have a class like below:
public class A
{
public string prop1 { get; set; }
public List<B> prop2 { get; set; } //B is a user-defined type
// and so on
}
Now suppose I have two objects of type A, a1 and a2. But not all of the properties of them are initialized. I need to check whether a1 has the same values for all the non-null properties of a2 (I don't know what is the term for this, maybe "sub-equality"?).
What I am trying to do is to overwrite Equals() method for all the types used in A, and then iterate through properties of a2 like this, and check the equality of the corresponding properties (if it is a List then I should use HashSet and SetEquals()).
Is there a better (more efficient, less code, etc) way?
There are several techniques, but here is something to start with. I have used an extension method which works all the way down to object (why not), but you could restrict it to higher levels of make it part of your base class itself.
Note that reflection is relatively expensive so if it is the sort of operation that you are doing often, you should consider caching the relevant PropertyInfos.
namespace ConsoleApp1
{
using System;
using System.Collections;
using System.Collections.Generic;
using System.Reflection;
public class BaseClass<T>
{
public string prop1
{
get;
set;
}
public List<T> prop2
{
get;
set;
} //B is a user-defined type
}
public class DescendentClass<T> : BaseClass<T>
{
}
public static class DeepComparer
{
public static bool IsDeeplyEquivalent(this object current, object other)
{
if (current.Equals(other)) return true;
Type currentType = current.GetType();
// Assumption, cannot be equivalent if another class (descendent classes??)
if (currentType != other.GetType())
{
return false;
}
foreach (PropertyInfo propertyInfo in currentType.GetProperties())
{
object currentValue = propertyInfo.GetValue(current, null);
object otherValue = propertyInfo.GetValue(other, null);
// Assumption, nulls for a given property are considered equivalent
if (currentValue == null && otherValue == null)
{
continue;
}
// One is null, the other isn't so are not equal
if (currentValue == null || otherValue == null)
{
return false;
}
ICollection currentCollection = currentValue as ICollection;
if (currentCollection == null)
{
// Not a collection, just check equality
if (!currentValue.Equals(otherValue))
{
return false;
}
}
else
{
// Collection, not interested whether list/array/etc are same object, just that they contain equal elements
// questioner guaranteed that all elements are unique
HashSet<object> elements = new HashSet<object>();
foreach (object o in currentCollection)
{
elements.Add(o);
}
List<object> otherElements = new List<object>();
foreach (object o in (ICollection)otherValue)
{
otherElements.Add(o);
}
// cast below can be safely made because we have already asserted that
// current and other are the same type
if (!elements.SetEquals(otherElements))
{
return false;
}
}
}
return true;
}
}
public class Program
{
public static void Main(string[] args)
{
BaseClass<int> a1 = new BaseClass<int>{prop1 = "Foo", prop2 = new List<int>{1, 2, 3}};
BaseClass<int> a2 = new BaseClass<int>{prop1 = "Foo", prop2 = new List<int>{2, 1, 3}};
BaseClass<int> a3 = new BaseClass<int>{prop1 = "Bar", prop2 = new List<int>{2, 1, 3}};
BaseClass<string> a4 = new BaseClass<string>{prop1 = "Bar", prop2 = new List<string>()};
BaseClass<int> a5 = new BaseClass<int>{prop1 = "Bar", prop2 = new List<int>{1, 3}};
DateTime d1 = DateTime.Today;
DateTime d2 = DateTime.Today;
List<string> s1 = new List<string>{"s1", "s2"};
string[] s2 = {"s1", "s2"};
BaseClass<DayOfWeek> b1 = new BaseClass<DayOfWeek>{prop1 = "Bar", prop2 = new List<DayOfWeek>{DayOfWeek.Monday, DayOfWeek.Tuesday}};
DescendentClass<DayOfWeek> b2 = new DescendentClass<DayOfWeek>{prop1 = "Bar", prop2 = new List<DayOfWeek>{DayOfWeek.Monday, DayOfWeek.Tuesday}};
Console.WriteLine("a1 == a2 : " + a1.IsDeeplyEquivalent(a2)); // true, different order ignored
Console.WriteLine("a1 == a3 : " + a1.IsDeeplyEquivalent(a3)); // false, different prop1
Console.WriteLine("a3 == a4 : " + a3.IsDeeplyEquivalent(a4)); // false, different types
Console.WriteLine("a3 == a5 : " + a3.IsDeeplyEquivalent(a5)); // false, difference in list elements
Console.WriteLine("d1 == d2 : " + d1.IsDeeplyEquivalent(d2)); // true
Console.WriteLine("s1 == s2 : " + s1.IsDeeplyEquivalent(s2)); // false, different types
Console.WriteLine("b1 == b2 : " + s1.IsDeeplyEquivalent(s2)); // false, different types
Console.WriteLine("b1 == b1 : " + b1.IsDeeplyEquivalent(b1)); // true, same object
Console.ReadLine();
}
}
}

Distinct() How to find unique elements in list of objects

There is a very simple class:
public class LinkInformation
{
public LinkInformation(string link, string text, string group)
{
this.Link = link;
this.Text = text;
this.Group = group;
}
public string Link { get; set; }
public string Text { get; set; }
public string Group { get; set; }
public override string ToString()
{
return Link.PadRight(70) + Text.PadRight(40) + Group;
}
}
And I create a list of objects of this class, containing multiple duplicates.
So, I tried using Distinct() to get a list of unique values.
But it does not work, so I implemented
IComparable<LinkInformation>
int IComparable<LinkInformation>.CompareTo(LinkInformation other)
{
return this.ToString().CompareTo(other.ToString());
}
and then...
IEqualityComparer<LinkInformation>
public bool Equals(LinkInformation x, LinkInformation y)
{
return x.ToString().CompareTo(y.ToString()) == 0;
}
public int GetHashCode(LinkInformation obj)
{
int hash = 17;
// Suitable nullity checks etc, of course :)
hash = hash * 23 + obj.Link.GetHashCode();
hash = hash * 23 + obj.Text.GetHashCode();
hash = hash * 23 + obj.Group.GetHashCode();
return hash;
}
The code using the Distinct is:
static void Main(string[] args)
{
string[] filePath = { #"C:\temp\html\1.html",
#"C:\temp\html\2.html",
#"C:\temp\html\3.html",
#"C:\temp\html\4.html",
#"C:\temp\html\5.html"};
int index = 0;
foreach (var path in filePath)
{
var parser = new HtmlParser();
var list = parser.Parse(path);
var unique = list.Distinct();
foreach (var elem in unique)
{
var full = new FileInfo(path).Name;
var file = full.Substring(0, full.Length - 5);
Console.WriteLine((++index).ToString().PadRight(5) + file.PadRight(20) + elem);
}
}
Console.ReadKey();
}
What has to be done to get Distinct() working?
You need to actually pass the IEqualityComparer that you've created to Disctinct when you call it. It has two overloads, one accepting no parameters and one accepting an IEqualityComparer. If you don't provide a comparer the default is used, and the default comparer doesn't compare the objects as you want them to be compared.
If you want to return distinct elements from sequences of objects of some custom data type, you have to implement the IEquatable generic interface in the class.
here is a sample implementation:
public class Product : IEquatable<Product>
{
public string Name { get; set; }
public int Code { get; set; }
public bool Equals(Product other)
{
//Check whether the compared object is null.
if (Object.ReferenceEquals(other, null)) return false;
//Check whether the compared object references the same data.
if (Object.ReferenceEquals(this, other)) return true;
//Check whether the products' properties are equal.
return Code.Equals(other.Code) && Name.Equals(other.Name);
}
// If Equals() returns true for a pair of objects
// then GetHashCode() must return the same value for these objects.
public override int GetHashCode()
{
//Get hash code for the Name field if it is not null.
int hashProductName = Name == null ? 0 : Name.GetHashCode();
//Get hash code for the Code field.
int hashProductCode = Code.GetHashCode();
//Calculate the hash code for the product.
return hashProductName ^ hashProductCode;
}
}
And this is how you do the actual distinct:
Product[] products = { new Product { Name = "apple", Code = 9 },
new Product { Name = "orange", Code = 4 },
new Product { Name = "apple", Code = 9 },
new Product { Name = "lemon", Code = 12 } };
//Exclude duplicates.
IEnumerable<Product> noduplicates =
products.Distinct();
If you are happy with defining the "distinctness" by a single property, you can do
list
.GroupBy(x => x.Text)
.Select(x => x.First())
to get a list of "unique" items.
No need to mess around with IEqualityComparer et al.
Without using Distinct nor the comparer, how about:
list.GroupBy(x => x.ToString()).Select(x => x.First())
I know this solution is not the answer for the exact question, but I think is valid to be open for other solutions.

Implementing correct GetHashCode

I have the following class
public class ResourceInfo
{
public string Id { get; set; }
public string Url { get; set; }
}
which contains information about some resource.
Now I need the possibility to check if two such resources are equal by the following scenario (I`ve implemented IEquatable interface)
public class ResourceInfo : IEquatable<ResourceInfo>
{
public string Id { get; set; }
public string Url { get; set; }
public bool Equals(ResourceInfo other)
{
if (other == null)
return false;
// Try to match by Id
if (!string.IsNullOrEmpty(Id) && !string.IsNullOrEmpty(other.Id))
{
return string.Equals(Id, other.Id, StringComparison.InvariantCultureIgnoreCase);
}
// Match by Url if can`t match by Id
return string.Equals(Url, other.Url, StringComparison.InvariantCultureIgnoreCase);
}
}
Usage: oneResource.Equals(otherResource). And everything is just fine. But some time have passed and now I need to use such eqaulity comparing in some linq query.
As a result I need to implement separate Equality comparer which looks like this:
class ResourceInfoEqualityComparer : IEqualityComparer<ResourceInfo>
{
public bool Equals(ResourceInfo x, ResourceInfo y)
{
if (x == null || y == null)
return object.Equals(x, y);
return x.Equals(y);
}
public int GetHashCode(ResourceInfo obj)
{
if (obj == null)
return 0;
return obj.GetHashCode();
}
}
Seems to be ok: it makes some validation logic and uses the native equality comparing logic. But then I need to implement GetHashCode method in the ResourceInfo class and that is the place where I have some problem.
I don`t know how to do this correctly without changing the class itself.
At first glance, the following example can work
public override int GetHashCode()
{
// Try to get hashcode from Id
if(!string.IsNullOrEmpty(Id))
return Id.GetHashCode();
// Try to get hashcode from url
if(!string.IsNullOrEmpty(Url))
return Url.GetHashCode();
// Return zero
return 0;
}
But this implementation is not very good.
GetHashCode should match the Equals method : if two objects are equal, then they should have the same hashcode, right? But my Equals method uses two objects to compare them. Here is the usecase, where you can see the problem itself:
var resInfo1 = new ResourceInfo()
{
Id = null,
Url = "http://res.com/id1"
};
var resInfo2 = new ResourceInfo()
{
Id = "id1",
Url = "http://res.com/id1"
};
So, what will happen, when we invoke Equals method: obviously they will be equal, because Equals method will try to match them by Id and fail, then it tries matching by Url and here we have the same values. As intended.
resInfo1.Equals(resInfo1 ) -> true
But then, if they are equal, they should have the same hash codes:
var hash1 = resInfo.GetHashCode(); // -263327347
var hash2 = resInfo.GetHashCode(); // 1511443452
hash1.GetHashCode() == hash2.GetHashCode() -> false
Shortly speaking, the problem is that Equals method decides which field to use for equality comparing by looking at two different objects, while GetHashCode method have access only to one object.
Is there a way to implement it correctly or I just have to change my class to avoid such situations?
Many thanks.
Your approach to equality fundamentally breaks the specifications in Object.Equals.
In particular, consider:
var x = new ResourceInfo { Id = null, Uri = "a" };
var y = new ResourceInfo { Id = "yz", Uri = "a" };
var z = new ResourceInfo { Id = "yz", Uri = "b" };
Here, x.Equals(y) would be true, and y.Equals(z) would be true - but x.Equals(z) would be false. That is specifically prohibited in the documentation:
If (x.Equals(y) && y.Equals(z)) returns true, then x.Equals(z) returns true.
You'll need to redesign, basically.

Categories