Linq - Select distinct objects [duplicate] - c#

This question already has answers here:
Distinct not working with LINQ to Objects [duplicate]
I am trying to extract distinct objects by value, to get the unique CurrencyISO values in my .csv file:
public List<CurrencyDetail> InitGSCheckComboCurrency()
{
    var lines = File.ReadAllLines("Data/AccountsGSCheck.csv");
    var data = (from l in lines.Skip(1)
                let split = l.Split(',')
                select new CurrencyDetail
                {
                    ISOName = split[3],
                    ISOCode = split[3]
                }).Distinct();
    List<CurrencyDetail> lstSrv = new List<CurrencyDetail>();
    lstSrv = data.ToList();
    return lstSrv;
}
However, Distinct() has no effect here, and I end up with duplicates.

You need to override Equals and GetHashCode in CurrencyDetail to do what you want. A quick and dirty solution:
var data = (from l in lines.Skip(1)
            let split = l.Split(',')
            select new
            {
                ISOName = split[3],
                ISOCode = split[3]
            }).Distinct()
            .Select(x => new CurrencyDetail
            {
                ISOName = x.ISOName,
                ISOCode = x.ISOCode
            });
Anonymous types (the first new { ... }) automatically get Equals() and GetHashCode() implementations that compare every property. Normally I wouldn't do this, because it creates objects only to discard them; that is why it is a quick and dirty solution.
Note that you are using split[3] twice; is that an error?
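To see why the anonymous type helps, here is a small sketch. PlainCurrency, AnonymousDistinctDemo and the sample codes are hypothetical stand-ins for the question's data: a class without Equals/GetHashCode overrides keeps every instance, while the anonymous type collapses equal values.

```csharp
using System.Linq;

// Hypothetical stand-in for the question's CurrencyDetail: no Equals/GetHashCode overrides.
public class PlainCurrency
{
    public string ISOName { get; set; }
    public string ISOCode { get; set; }
}

public static class AnonymousDistinctDemo
{
    // Hypothetical values from the 4th CSV column; "EUR" appears twice.
    static readonly string[] Codes = { "EUR", "USD", "EUR" };

    // Class instances fall back to reference equality, so nothing is removed.
    public static int CountWithClass() =>
        Codes.Select(c => new PlainCurrency { ISOName = c, ISOCode = c })
             .Distinct()
             .Count();

    // Anonymous types compare their properties, so the second "EUR" is dropped.
    public static int CountWithAnonymousType() =>
        Codes.Select(c => new { ISOName = c, ISOCode = c })
             .Distinct()
             .Count();
}
```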
Now, a fully equatable version of CurrencyDetail could be:
public class CurrencyDetail : IEquatable<CurrencyDetail>
{
    public string ISOName { get; set; }
    public string ISOCode { get; set; }
    public override bool Equals(object obj)
    {
        // obj is object, so we can use its == operator
        if (obj == null)
        {
            return false;
        }
        CurrencyDetail other = obj as CurrencyDetail;
        if (object.ReferenceEquals(other, null))
        {
            return false;
        }
        return this.InnerEquals(other);
    }
    public bool Equals(CurrencyDetail other)
    {
        if (object.ReferenceEquals(other, null))
        {
            return false;
        }
        return this.InnerEquals(other);
    }
    private bool InnerEquals(CurrencyDetail other)
    {
        // Here we know that other != null;
        if (object.ReferenceEquals(this, other))
        {
            return true;
        }
        return this.ISOName == other.ISOName && this.ISOCode == other.ISOCode;
    }
    public override int GetHashCode()
    {
        unchecked
        {
            // From http://stackoverflow.com/a/263416/613130
            int hash = 17;
            hash = hash * 23 + (this.ISOName != null ? this.ISOName.GetHashCode() : 0);
            hash = hash * 23 + (this.ISOCode != null ? this.ISOCode.GetHashCode() : 0);
            return hash;
        }
    }
}
With this, Distinct() works exactly as your code already uses it.
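Here is a condensed sketch of the same idea. It is not the answer's exact code: it keeps the rule that both ISOName and ISOCode must match, but uses HashCode.Combine (assumes .NET Core 2.1 or later) and runs Distinct() over a few hypothetical CSV lines. CurrencyDistinctDemo is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same equality rules as the CurrencyDetail above, written more compactly.
public class CurrencyDetail : IEquatable<CurrencyDetail>
{
    public string ISOName { get; set; }
    public string ISOCode { get; set; }

    // No == overload exists, so "other != null" is a plain reference check.
    public bool Equals(CurrencyDetail other) =>
        other != null && ISOName == other.ISOName && ISOCode == other.ISOCode;

    public override bool Equals(object obj) => Equals(obj as CurrencyDetail);

    // Equal objects must produce equal hash codes; Combine tolerates null strings.
    public override int GetHashCode() => HashCode.Combine(ISOName, ISOCode);
}

public static class CurrencyDistinctDemo
{
    // Hypothetical file contents: a header row, then rows whose 4th column is the currency.
    static readonly string[] Lines =
    {
        "Id,Account,Name,Currency",
        "1,A,Alice,EUR",
        "2,B,Bob,USD",
        "3,C,Carol,EUR"
    };

    public static List<CurrencyDetail> DistinctCurrencies() =>
        Lines.Skip(1)
             .Select(l => l.Split(','))
             .Select(split => new CurrencyDetail { ISOName = split[3], ISOCode = split[3] })
             .Distinct() // default comparer uses IEquatable<CurrencyDetail> and GetHashCode
             .ToList();
}
```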

Related

Why does my direct Equals call pass, but fails when nested? [duplicate]

This question already has answers here:
How to compare arrays in C#? [duplicate]
I am attempting to implement Equals overrides for some structs in my code. I have the following "child" struct:
public struct ChildStruct
{
    public bool Valid;
    public int Value1;
    public override bool Equals(object obj)
    {
        if (obj == null || GetType() != obj.GetType())
        {
            return false;
        }
        ChildStruct other = (ChildStruct) obj;
        return Valid == other.Valid && Value1 == other.Value1;
    }
}
And this "parent" struct, where one member is an array of ChildStructs:
public struct ParentStruct
{
    public int Id;
    public ChildStruct[] children;
    public override bool Equals(object obj)
    {
        if (obj == null || GetType() != obj.GetType())
        {
            return false;
        }
        ParentStruct other = (ParentStruct) obj;
        // am I comparing ChildStructs array correctly?
        return Id == other.Id && children == other.children;
    }
}
In my NUnit tests of the Equals overrides, directly comparing objects of type ChildStruct passes, but my unit test of ParentStruct fails. Am I missing something in the ParentStruct Equals override to account for the array? Is the child Equals method not applied to every element of the children array?
NUnit code:
[Test]
public void ChildEqual()
{
    var child1 = new ChildStruct{Valid = true, Value1 = 1};
    var child2 = new ChildStruct{Valid = true, Value1 = 1};
    // passes!
    Assert.AreEqual(child1, child2);
}
[Test]
public void ParentEqual()
{
    var child1 = new ChildStruct{Valid = true, Value1 = 1};
    var child2 = new ChildStruct{Valid = true, Value1 = 1};
    var parent1 = new ParentStruct{Id = 1, children = new[] {child1, child2}};
    var parent2 = new ParentStruct{Id = 1, children = new[] {child1, child2}};
    // fails during checking for equality of children array!
    Assert.AreEqual(parent1, parent2);
}
You need to determine what makes two arrays of ChildStructs equal for the purpose of ParentStruct equality, and change the last line of ParentStruct's Equals method accordingly. The == operator on arrays only compares references, so two separately created arrays are never equal. For example, if they are only supposed to be "equal" when they contain equivalent children in the same order, this works (SequenceEqual comes from System.Linq):
return Id == other.Id && children.SequenceEqual(other.children);
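A fuller sketch of the same-order approach, assuming .NET Core 2.1+ for System.HashCode. It also handles null arrays and overrides GetHashCode so that equal parents hash equally (the question's structs override Equals without GetHashCode). StructEqualityDemo is a hypothetical helper for the test.

```csharp
using System;
using System.Linq;

public struct ChildStruct : IEquatable<ChildStruct>
{
    public bool Valid;
    public int Value1;

    public bool Equals(ChildStruct other) => Valid == other.Valid && Value1 == other.Value1;
    public override bool Equals(object obj) => obj is ChildStruct other && Equals(other);
    public override int GetHashCode() => HashCode.Combine(Valid, Value1);
}

public struct ParentStruct : IEquatable<ParentStruct>
{
    public int Id;
    public ChildStruct[] children;

    public bool Equals(ParentStruct other)
    {
        if (Id != other.Id) return false;
        // Same array instance (or both null): equal without inspecting elements.
        if (object.ReferenceEquals(children, other.children)) return true;
        if (children == null || other.children == null) return false;
        // Element by element, in order, using ChildStruct.Equals.
        return children.SequenceEqual(other.children);
    }

    public override bool Equals(object obj) => obj is ParentStruct other && Equals(other);

    // Hash the elements (not the array reference) so equal parents hash equally.
    public override int GetHashCode()
    {
        var hash = new HashCode();
        hash.Add(Id);
        if (children != null)
            foreach (var child in children) hash.Add(child);
        return hash.ToHashCode();
    }
}

public static class StructEqualityDemo
{
    // Each call builds a new children array with the same contents.
    public static ParentStruct Make() => new ParentStruct
    {
        Id = 1,
        children = new[]
        {
            new ChildStruct { Valid = true, Value1 = 1 },
            new ChildStruct { Valid = true, Value1 = 1 }
        }
    };
}
```

Assert.AreEqual ends up calling Equals(object), which is why the boxed comparison is exercised as well.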

Can't get distinct items from a list using LINQ [duplicate]

This question already has answers here:
Select distinct using linq [duplicate]
I have the following class in C#, and I'm trying to get a distinct list of items.
The list has 24 elements.
public enum DbObjectType
{
    Unknown,
    Procedure,
    Function,
    View
}
public class DbObject
{
    public string DatabaseName { get; set; }
    public string SchemaName { get; set; }
    public string ObjectName { get; set; }
    public DbObjectType ObjectType { get; set; }
}
I have two approaches and expected them to give the same result, but they don't.
The first expression returns the same list (including duplicates):
var lst1 = from c in DependantObject
           group c by new DbObject
           {
               DatabaseName = c.DatabaseName,
               SchemaName = c.SchemaName,
               ObjectName = c.ObjectName,
               ObjectType = c.ObjectType
           } into grp
           select grp.First();
lst1 has 24 items.
But this one returns the desired result:
var lst2 = from c in DependantObject
           group c by new
           {
               DatabaseName = c.DatabaseName,
               SchemaName = c.SchemaName,
               ObjectName = c.ObjectName,
               ObjectType = c.ObjectType
           } into grp
           select grp.First();
lst2 has 10 items.
The only difference is that the second expression uses an anonymous type, while the first one uses a named type.
I'm interested in understanding this behavior.
Thank you!
I believe my question is not a duplicate of the one mentioned because:
What I'm asking here is not how to get the distinct list. I'm asking why typed and anonymous data return different results.
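LINQ's Distinct() (and GroupBy(), which compares keys the same way) uses the default equality comparer unless you pass one, and that comparer relies on GetHashCode and Equals.
C#'s anonymous types (the new { Name = value } syntax) create classes that override those methods to compare property values, but your own DbObject type does not, so its instances are compared by reference.
You can also create a custom IEqualityComparer<DbObject> type. Also look at StructuralComparisons.StructuralEqualityComparer.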
Option 1:
public class DbObject : IEquatable<DbObject> {
    public override Int32 GetHashCode() {
        // See https://stackoverflow.com/questions/263400/what-is-the-best-algorithm-for-an-overridden-system-object-gethashcode
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + this.DatabaseName.GetHashCode();
            hash = hash * 23 + this.SchemaName.GetHashCode();
            hash = hash * 23 + this.ObjectName.GetHashCode();
            hash = hash * 23 + this.ObjectType.GetHashCode();
            return hash;
        }
    }
    public override Boolean Equals(Object other) {
        return this.Equals( other as DbObject );
    }
    public Boolean Equals(DbObject other) {
        if( other == null ) return false;
        return
            this.DatabaseName.Equals( other.DatabaseName ) &&
            this.SchemaName.Equals( other.SchemaName) &&
            this.ObjectName.Equals( other.ObjectName ) &&
            this.ObjectType.Equals( other.ObjectType);
    }
}
Option 2:
class DbObjectComparer : IEqualityComparer<DbObject> {
    public Boolean Equals(DbObject x, DbObject y) {
        if( Object.ReferenceEquals( x, y ) ) return true;
        if( (x == null) != (y == null) ) return false;
        if( x == null && y == null ) return true;
        return
            x.DatabaseName.Equals( y.DatabaseName ) &&
            x.SchemaName.Equals( y.SchemaName) &&
            x.ObjectName.Equals( y.ObjectName ) &&
            x.ObjectType.Equals( y.ObjectType);
    }
    // Interface implementation, so no "override" keyword here.
    public Int32 GetHashCode(DbObject obj) {
        unchecked
        {
            int hash = 17;
            // Suitable nullity checks etc, of course :)
            hash = hash * 23 + obj.DatabaseName.GetHashCode();
            hash = hash * 23 + obj.SchemaName.GetHashCode();
            hash = hash * 23 + obj.ObjectName.GetHashCode();
            hash = hash * 23 + obj.ObjectType.GetHashCode();
            return hash;
        }
    }
}
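Option 2 usage (the comparer must be passed to GroupBy, and First() is applied to each group):
var query = this.DependantObject
    .GroupBy( c => new DbObject() {
        DatabaseName = c.DatabaseName,
        SchemaName = c.SchemaName,
        ObjectName = c.ObjectName,
        ObjectType = c.ObjectType
    }, new DbObjectComparer() )
    .Select( grp => grp.First() );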
Using GroupBy might be suboptimal; you could use LINQ's Distinct directly:
var query = this.DependantObject
    .Select( c => new DbObject() {
        DatabaseName = c.DatabaseName,
        SchemaName = c.SchemaName,
        ObjectName = c.ObjectName,
        ObjectType = c.ObjectType
    } )
    .Distinct( new DbObjectComparer() );
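The behavior in the question can be reproduced in memory. This sketch groups by a DbObject key, by an anonymous key, and by a DbObject key with a comparer, then calls Distinct with the comparer. The sample data and the DbObjectDistinctDemo helper are hypothetical; HashCode.Combine assumes .NET Core 2.1+, and this comparer tolerates null properties, unlike Option 2.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum DbObjectType { Unknown, Procedure, Function, View }

// Plain class with no equality overrides, as in the question.
public class DbObject
{
    public string DatabaseName { get; set; }
    public string SchemaName { get; set; }
    public string ObjectName { get; set; }
    public DbObjectType ObjectType { get; set; }
}

public class DbObjectComparer : IEqualityComparer<DbObject>
{
    public bool Equals(DbObject x, DbObject y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.DatabaseName == y.DatabaseName
            && x.SchemaName == y.SchemaName
            && x.ObjectName == y.ObjectName
            && x.ObjectType == y.ObjectType;
    }

    public int GetHashCode(DbObject obj) =>
        obj is null ? 0 : HashCode.Combine(obj.DatabaseName, obj.SchemaName, obj.ObjectName, obj.ObjectType);
}

public static class DbObjectDistinctDemo
{
    // Hypothetical data: the first two entries describe the same procedure.
    static List<DbObject> Sample() => new List<DbObject>
    {
        new DbObject { DatabaseName = "Db", SchemaName = "dbo", ObjectName = "GetUsers", ObjectType = DbObjectType.Procedure },
        new DbObject { DatabaseName = "Db", SchemaName = "dbo", ObjectName = "GetUsers", ObjectType = DbObjectType.Procedure },
        new DbObject { DatabaseName = "Db", SchemaName = "dbo", ObjectName = "vUsers", ObjectType = DbObjectType.View },
    };

    // A DbObject key (as in lst1) is compared by reference: every item is its own group.
    public static int GroupByTypedKey() => Sample().GroupBy(c => c).Select(g => g.First()).Count();

    // An anonymous key (as in lst2) is compared by value.
    public static int GroupByAnonymousKey() =>
        Sample().GroupBy(c => new { c.DatabaseName, c.SchemaName, c.ObjectName, c.ObjectType })
                .Select(g => g.First()).Count();

    // A DbObject key plus a comparer is also compared by value.
    public static int GroupByWithComparer() =>
        Sample().GroupBy(c => c, new DbObjectComparer()).Select(g => g.First()).Count();

    public static int DistinctWithComparer() => Sample().Distinct(new DbObjectComparer()).Count();
}
```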

Distinct() How to find unique elements in list of objects

There is a very simple class:
public class LinkInformation
{
    public LinkInformation(string link, string text, string group)
    {
        this.Link = link;
        this.Text = text;
        this.Group = group;
    }
    public string Link { get; set; }
    public string Text { get; set; }
    public string Group { get; set; }
    public override string ToString()
    {
        return Link.PadRight(70) + Text.PadRight(40) + Group;
    }
}
I create a list of objects of this class, containing multiple duplicates.
So I tried using Distinct() to get a list of unique values.
But it does not work, so I implemented
IComparable<LinkInformation>
int IComparable<LinkInformation>.CompareTo(LinkInformation other)
{
    return this.ToString().CompareTo(other.ToString());
}
and then...
IEqualityComparer<LinkInformation>
public bool Equals(LinkInformation x, LinkInformation y)
{
    return x.ToString().CompareTo(y.ToString()) == 0;
}
public int GetHashCode(LinkInformation obj)
{
    int hash = 17;
    // Suitable nullity checks etc, of course :)
    hash = hash * 23 + obj.Link.GetHashCode();
    hash = hash * 23 + obj.Text.GetHashCode();
    hash = hash * 23 + obj.Group.GetHashCode();
    return hash;
}
The code using the Distinct is:
static void Main(string[] args)
{
    string[] filePath = { @"C:\temp\html\1.html",
                          @"C:\temp\html\2.html",
                          @"C:\temp\html\3.html",
                          @"C:\temp\html\4.html",
                          @"C:\temp\html\5.html"};
    int index = 0;
    foreach (var path in filePath)
    {
        var parser = new HtmlParser();
        var list = parser.Parse(path);
        var unique = list.Distinct();
        foreach (var elem in unique)
        {
            var full = new FileInfo(path).Name;
            var file = full.Substring(0, full.Length - 5);
            Console.WriteLine((++index).ToString().PadRight(5) + file.PadRight(20) + elem);
        }
    }
    Console.ReadKey();
}
What has to be done to get Distinct() working?
You need to actually pass the IEqualityComparer that you've created to Distinct when you call it. Distinct has two overloads: one with no parameters and one accepting an IEqualityComparer. If you don't provide a comparer, the default one is used, and the default comparer doesn't compare the objects the way you want them compared.
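A minimal sketch of the two overloads. It assumes a separate comparer class (LinkInformationComparer and LinkDistinctDemo are hypothetical names; HashCode.Combine needs .NET Core 2.1+). The parameterless call ignores any comparer you have written; only the overload that receives it removes duplicates by value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class LinkInformation
{
    public LinkInformation(string link, string text, string group)
    {
        Link = link; Text = text; Group = group;
    }
    public string Link { get; set; }
    public string Text { get; set; }
    public string Group { get; set; }
}

// Compares all three properties by value.
public class LinkInformationComparer : IEqualityComparer<LinkInformation>
{
    public bool Equals(LinkInformation x, LinkInformation y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Link == y.Link && x.Text == y.Text && x.Group == y.Group;
    }

    public int GetHashCode(LinkInformation obj) =>
        obj is null ? 0 : HashCode.Combine(obj.Link, obj.Text, obj.Group);
}

public static class LinkDistinctDemo
{
    // Hypothetical parser output with one duplicate.
    static List<LinkInformation> Sample() => new List<LinkInformation>
    {
        new LinkInformation("https://example.com/a", "A", "g1"),
        new LinkInformation("https://example.com/a", "A", "g1"),
        new LinkInformation("https://example.com/b", "B", "g1"),
    };

    // Parameterless overload: default (reference) equality, nothing is removed.
    public static int DefaultDistinctCount() => Sample().Distinct().Count();

    // Comparer overload: value duplicates are removed.
    public static int ComparerDistinctCount() => Sample().Distinct(new LinkInformationComparer()).Count();
}
```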
If you want to return distinct elements from a sequence of objects of a custom type, you can implement the generic IEquatable<T> interface in the class, together with a matching GetHashCode override.
Here is a sample implementation:
public class Product : IEquatable<Product>
{
    public string Name { get; set; }
    public int Code { get; set; }
    public bool Equals(Product other)
    {
        //Check whether the compared object is null.
        if (Object.ReferenceEquals(other, null)) return false;
        //Check whether the compared object references the same data.
        if (Object.ReferenceEquals(this, other)) return true;
        //Check whether the products' properties are equal.
        return Code.Equals(other.Code) && Name.Equals(other.Name);
    }
    // If Equals() returns true for a pair of objects
    // then GetHashCode() must return the same value for these objects.
    public override int GetHashCode()
    {
        //Get hash code for the Name field if it is not null.
        int hashProductName = Name == null ? 0 : Name.GetHashCode();
        //Get hash code for the Code field.
        int hashProductCode = Code.GetHashCode();
        //Calculate the hash code for the product.
        return hashProductName ^ hashProductCode;
    }
}
And this is how you do the actual distinct:
Product[] products = { new Product { Name = "apple", Code = 9 },
                       new Product { Name = "orange", Code = 4 },
                       new Product { Name = "apple", Code = 9 },
                       new Product { Name = "lemon", Code = 12 } };
//Exclude duplicates.
IEnumerable<Product> noduplicates =
    products.Distinct();
If you are happy with defining the "distinctness" by a single property, you can do
list
    .GroupBy(x => x.Text)
    .Select(x => x.First())
to get a list of "unique" items.
No need to mess around with IEqualityComparer et al.
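A self-contained sketch of the GroupBy/First pattern, using a hypothetical Link class and GroupByFirstDemo helper. Only Text decides uniqueness; GroupBy yields groups in the order their keys first appear, so the first item of each group is the earliest one in the source.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type; Url differs between the two "Home" entries.
public class Link
{
    public string Url { get; set; }
    public string Text { get; set; }
}

public static class GroupByFirstDemo
{
    static List<Link> Sample() => new List<Link>
    {
        new Link { Url = "https://example.com/1", Text = "Home" },
        new Link { Url = "https://example.com/2", Text = "Home" },
        new Link { Url = "https://example.com/3", Text = "About" },
    };

    // Keep the first item of each group of equal Text values.
    public static List<Link> UniqueByText() =>
        Sample().GroupBy(x => x.Text).Select(g => g.First()).ToList();
}
```

On .NET 6 or later, Enumerable.DistinctBy(x => x.Text) gives the same result without building the groups.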
Without using Distinct or a comparer, how about:
list.GroupBy(x => x.ToString()).Select(x => x.First())
I know this is not an answer to the exact question, but I think it is worth being open to other solutions.

Overloading Linq Except to allow custom struct with byte array

I am having a problem with a custom struct and overloading LINQ's Except method to remove duplicates.
My struct is as follows:
public struct hashedFile
{
    string _fileString;
    byte[] _fileHash;
    public hashedFile(string fileString, byte[] fileHash)
    {
        this._fileString = fileString;
        this._fileHash = fileHash;
    }
    public string FileString { get { return _fileString; } }
    public byte[] FileHash { get { return _fileHash; } }
}
Now, the following code works fine:
public static void test2()
{
    List<hashedFile> list1 = new List<hashedFile>();
    List<hashedFile> list2 = new List<hashedFile>();
    hashedFile one = new hashedFile("test1", BitConverter.GetBytes(1));
    hashedFile two = new hashedFile("test2", BitConverter.GetBytes(2));
    hashedFile three = new hashedFile("test3", BitConverter.GetBytes(3));
    hashedFile threeA = new hashedFile("test3", BitConverter.GetBytes(4));
    hashedFile four = new hashedFile("test4", BitConverter.GetBytes(4));
    list1.Add(one);
    list1.Add(two);
    list1.Add(threeA);
    list1.Add(four);
    list2.Add(one);
    list2.Add(two);
    list2.Add(three);
    List<hashedFile> diff = list1.Except(list2).ToList();
    foreach (hashedFile h in diff)
    {
        MessageBox.Show(h.FileString + Environment.NewLine + h.FileHash[0].ToString("x2"));
    }
}
This code shows "threeA" and "four" just fine. But if I do the following:
public static List<hashedFile> list1(var stuff1)
{
    //Generate a List here and return it
}
public static List<hashedFile> list2(var stuff2)
{
    //Generate a List here and return it
}
List<hashedFile> diff = list1.except(list2);
"diff" becomes an exact copy of "list1". I should also mention that the fileHash byte arrays in the list-generating methods come from ComputeHash on System.Security.Cryptography.MD5.
Any ideas on how to overload either the Except or GetHashCode method for linq to successfully exclude the duplicate values from list2?
I'd really appreciate it! Thanks!
~MrFreeman
EDIT: Here is how I was originally trying to do it: List<hashedFile> diff = newList.Except(oldList, new hashedFileComparer()).ToList();
class hashedFileComparer : IEqualityComparer<hashedFile>
{
    public bool Equals(hashedFile x, hashedFile y)
    {
        if (Object.ReferenceEquals(x, y)) return true;
        if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null))
            return false;
        return x.FileString == y.FileString && x.FileHash == y.FileHash;
    }
    public int GetHashCode(hashedFile Hashedfile)
    {
        if (Object.ReferenceEquals(Hashedfile, null)) return 0;
        int hashFileString = Hashedfile.FileString == null ? 0 : Hashedfile.FileString.GetHashCode();
        int hashFileHash = Hashedfile.FileHash.GetHashCode();
        int returnVal = hashFileString ^ hashFileHash;
        if (Hashedfile.FileString.Contains("blankmusic") == true)
        {
            Console.WriteLine(returnVal.ToString());
        }
        return returnVal;
    }
}
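If you want the type to handle its own comparisons in Except, the interface you need is IEquatable<T>. IEqualityComparer<T> is for having another type handle the comparisons, so it can be passed to the Except overload that takes a comparer. Also note that == on two byte[] values compares references, not contents, so x.FileHash == y.FileHash in the comparer above is false for two separately computed hashes; SequenceEqual compares the bytes.
This achieves what you want (assuming you want both the file string and the hash compared):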
public struct hashedFile : IEquatable<hashedFile>
{
    string _fileString;
    byte[] _fileHash;
    public hashedFile(string fileString, byte[] fileHash)
    {
        this._fileString = fileString;
        this._fileHash = fileHash;
    }
    public string FileString { get { return _fileString; } }
    public byte[] FileHash { get { return _fileHash; } }
    public bool Equals(hashedFile other)
    {
        return _fileString == other._fileString && _fileHash.SequenceEqual(other._fileHash);
    }
}
Here is an example in a working console application:
using System;
using System.Collections.Generic;
using System.Linq;
public class Program
{
    public struct hashedFile : IEquatable<hashedFile>
    {
        string _fileString;
        byte[] _fileHash;
        public hashedFile(string fileString, byte[] fileHash)
        {
            this._fileString = fileString;
            this._fileHash = fileHash;
        }
        public string FileString { get { return _fileString; } }
        public byte[] FileHash { get { return _fileHash; } }
        public bool Equals(hashedFile other)
        {
            return _fileString == other._fileString && _fileHash.SequenceEqual(other._fileHash);
        }
    }
    public static void Main(string[] args)
    {
        List<hashedFile> list1 = GetList1();
        List<hashedFile> list2 = GetList2();
        List<hashedFile> diff = list1.Except(list2).ToList();
        foreach (hashedFile h in diff)
        {
            Console.WriteLine(h.FileString + Environment.NewLine + h.FileHash[0].ToString("x2"));
        }
        Console.ReadLine();
    }
    private static List<hashedFile> GetList1()
    {
        hashedFile one = new hashedFile("test1", BitConverter.GetBytes(1));
        hashedFile two = new hashedFile("test2", BitConverter.GetBytes(2));
        hashedFile threeA = new hashedFile("test3", BitConverter.GetBytes(4));
        hashedFile four = new hashedFile("test4", BitConverter.GetBytes(4));
        var list1 = new List<hashedFile>();
        list1.Add(one);
        list1.Add(two);
        list1.Add(threeA);
        list1.Add(four);
        return list1;
    }
    private static List<hashedFile> GetList2()
    {
        hashedFile one = new hashedFile("test1", BitConverter.GetBytes(1));
        hashedFile two = new hashedFile("test2", BitConverter.GetBytes(2));
        hashedFile three = new hashedFile("test3", BitConverter.GetBytes(3));
        var list1 = new List<hashedFile>();
        list1.Add(one);
        list1.Add(two);
        list1.Add(three);
        return list1;
    }
}
This answer is getting quite long, but there is one more issue: the implementation above breaks if hashedFile is a class rather than a struct (and possibly even for a struct, depending on the framework version). Except uses an internal Set class, and the problematic part is that it compares hash codes first and only calls the comparer to check equality when the hash codes match:
int hashCode = this.InternalGetHashCode(value);
for (int i = this.buckets[hashCode % this.buckets.Length] - 1; i >= 0; i = this.slots[i].next)
{
    if ((this.slots[i].hashCode == hashCode) && this.comparer.Equals(this.slots[i].value, value))
    {
        return true;
    }
}
Depending on your performance requirements, the simplest fix is to return a hash code of 0. Every item then has the same hash code, so the comparer's Equals is always used:
public override int GetHashCode()
{
    return 0;
}
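The effect of the hash-first check can be shown with two small classes (hypothetical names) that share the same value-based Equals: one inherits the reference-based GetHashCode from object, the other returns a constant 0.

```csharp
using System;
using System.Linq;

// Value-based Equals via IEquatable<T>, but GetHashCode is inherited from object
// (reference-based), which breaks the Equals/GetHashCode contract.
public class NoHashFile : IEquatable<NoHashFile>
{
    public string Name { get; set; }
    public bool Equals(NoHashFile other) => other != null && Name == other.Name;
}

// Same equality, but every instance returns 0, so hash codes always match
// and Except always falls through to Equals.
public class ConstantHashFile : IEquatable<ConstantHashFile>
{
    public string Name { get; set; }
    public bool Equals(ConstantHashFile other) => other != null && Name == other.Name;
    public override bool Equals(object obj) => Equals(obj as ConstantHashFile);
    public override int GetHashCode() => 0;
}

public static class HashFirstDemo
{
    // Different hash codes: Equals is never reached, so "a" is not removed
    // (barring an unlikely collision of the two reference-based hash codes).
    public static int ExceptCountWithoutHash() =>
        new[] { new NoHashFile { Name = "a" } }
            .Except(new[] { new NoHashFile { Name = "a" } })
            .Count();

    // Constant hash code: Equals is consulted and "a" is removed.
    public static int ExceptCountWithConstantHash() =>
        new[] { new ConstantHashFile { Name = "a" } }
            .Except(new[] { new ConstantHashFile { Name = "a" } })
            .Count();
}
```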
The other option is to generate a proper hash code. This matters sooner than I expected: for 500 items the difference is 7 ms vs 1 ms, and for 5000 items it is 650 ms vs 13 ms. So it is probably best to go with a proper hash code. The byte array hash code function is taken from https://stackoverflow.com/a/7244316/1002621
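public override int GetHashCode()
{
    // Requires using System.Linq; and using System.Text;
    unchecked
    {
        var hashCode = 0;
        // Concat (not Union) keeps every byte; Union would drop repeated byte values.
        var bytes = _fileHash.Concat(Encoding.UTF8.GetBytes(_fileString)).ToArray();
        for (var i = 0; i < bytes.Length; i++)
            hashCode = ((hashCode << 3) | (int)((uint)hashCode >> 29)) ^ bytes[i]; // Rotate left by 3 bits, then XOR the new byte.
        return hashCode;
    }
}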
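If you would rather keep the separate comparer from the question's EDIT, the fix is to compare and hash the contents of the byte arrays instead of their references. A sketch under those assumptions: HashedFileContentComparer and HashedFileExceptDemo are hypothetical names, System.HashCode needs .NET Core 2.1+, and the struct is a trimmed copy of the question's hashedFile.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public struct hashedFile
{
    public hashedFile(string fileString, byte[] fileHash)
    {
        FileString = fileString;
        FileHash = fileHash;
    }
    public string FileString { get; }
    public byte[] FileHash { get; }
}

// Compares the file string and the *contents* of the hash array.
public class HashedFileContentComparer : IEqualityComparer<hashedFile>
{
    public bool Equals(hashedFile x, hashedFile y)
    {
        if (x.FileString != y.FileString) return false;
        if (x.FileHash == null || y.FileHash == null) return x.FileHash == y.FileHash; // both null => equal
        return x.FileHash.SequenceEqual(y.FileHash);
    }

    public int GetHashCode(hashedFile obj)
    {
        var hash = new HashCode();
        hash.Add(obj.FileString);
        if (obj.FileHash != null)
            foreach (byte b in obj.FileHash) hash.Add(b); // hash the bytes, not the array reference
        return hash.ToHashCode();
    }
}

public static class HashedFileExceptDemo
{
    public static List<hashedFile> Diff()
    {
        var list1 = new List<hashedFile>
        {
            new hashedFile("test1", BitConverter.GetBytes(1)),
            new hashedFile("test3", BitConverter.GetBytes(4)),
        };
        // Fresh arrays with equal contents, like two separate ComputeHash calls.
        var list2 = new List<hashedFile>
        {
            new hashedFile("test1", BitConverter.GetBytes(1)),
            new hashedFile("test3", BitConverter.GetBytes(3)),
        };
        return list1.Except(list2, new HashedFileContentComparer()).ToList();
    }
}
```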

remove duplicates from linq query c#

My LINQ query (below) produces lots of duplicate school names,
so I created a regex function to trim the text:
public static string MyTrimmings(string str)
{
    return Regex.Replace(str, @"^\s*$\n", string.Empty, RegexOptions.Multiline).TrimEnd();
}
The text gets trimmed all right; however, the dropdown values are all duplicates! Please help me eliminate the duplicates, oh LINQ joy!!
ViewBag.schools = new[]{new SelectListItem
{
    Value = "",
    Text = "All"
}}.Concat(
    db.Schools.Where(x => (x.name != null)).OrderBy(o => o.name).ToList().Select(s => new SelectListItem
    {
        Value = MyTrimmings(s.name),
        Text = MyTrimmings(s.name)
    }).Distinct()
);
Distinct is poor, GroupBy for the win:
db.Schools.GroupBy(school => school.name).Select(grp => grp.First());
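A related in-memory sketch (SchoolDropdownDemo and the names are hypothetical; the regex is the question's MyTrimmings). Because string already has value equality, trimming first and then deduplicating the strings with Distinct() merges names that differ only in trailing whitespace or blank lines. As in the question, the trimming runs in memory, after the data has been loaded.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SchoolDropdownDemo
{
    // Same regex as MyTrimmings in the question, with a verbatim string literal.
    public static string MyTrimmings(string str) =>
        Regex.Replace(str, @"^\s*$\n", string.Empty, RegexOptions.Multiline).TrimEnd();

    // Hypothetical names: the first two differ only by trailing blank lines.
    static readonly string[] Names = { "Lincoln High", "Lincoln High\n\n", "Roosevelt" };

    // Trim first, then deduplicate the resulting strings by value.
    public static List<string> DistinctTrimmedNames() =>
        Names.Where(n => n != null)
             .Select(n => MyTrimmings(n))
             .Distinct()
             .OrderBy(n => n)
             .ToList();
}
```

Each resulting string can then be turned into a SelectListItem, so no custom equality is needed on the item type.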
Assuming you have a School class, you can write an IEqualityComparer<School>:
class SchoolComparer : IEqualityComparer<School>
{
    public bool Equals(School x, School y)
    {
        //Check whether the compared objects reference the same data.
        if (Object.ReferenceEquals(x, y)) return true;
        //Check whether any of the compared objects is null.
        if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null))
            return false;
        //Check whether the schools' properties are equal.
        return x.name == y.name;
    }
    // If Equals() returns true for a pair of objects
    // then GetHashCode() must return the same value for these objects.
    public int GetHashCode(School school)
    {
        //Check whether the object is null
        if (Object.ReferenceEquals(school, null)) return 0;
        //Get hash code for the name field if it is not null.
        int hashSchoolName = school.name == null ? 0 : school.name.GetHashCode();
        //Calculate the hash code for the school.
        return hashSchoolName;
    }
}
Then your linq query would look like this:
db.Schools.Where(x => x.name != null)
    .OrderBy(o => o.name).ToList()
    .Distinct(new SchoolComparer())
    .Select(s => new SelectListItem
    {
        Value = MyTrimmings(s.name),
        Text = MyTrimmings(s.name)
    });
You could make your class implement the IEquatable<T> interface, so Distinct will know how to compare them. Like this (basic example):
public class SelectListItem : IEquatable<SelectListItem>
{
    public string Value { get; set; }
    public string Text { get; set; }
    public bool Equals(SelectListItem other)
    {
        if (other == null)
        {
            return false;
        }
        return Value == other.Value && Text == other.Text;
    }
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            if (Value != null)
            {
                hash = hash * 23 + Value.GetHashCode();
            }
            if (Text != null)
            {
                hash = hash * 23 + Text.GetHashCode();
            }
            return hash;
        }
    }
}
(GetHashCode taken from Jon Skeet's answer here: https://stackoverflow.com/a/263416/249000)
