Remove duplicates in a list (C#)

I want to remove duplicates from a list using the following code, but it does not work. Could anyone enlighten me? Thanks.
public sealed class Pairing
{
public int Index { get; private set; }
public int Length { get; private set; }
public int Offset { get; private set; }
public Pairing(int index, int length, int offset)
{
Index = index;
Length = length;
Offset = offset;
}
}
class MyComparer : IEqualityComparer<Pairing>
{
public bool Equals(Pairing x, Pairing y)
{
return ((x.Index == y.Index) && (x.Length == y.Length) && (x.Offset == y.Offset));
}
public int GetHashCode(Pairing obj)
{
return obj.GetHashCode();
}
}
class Program
{
static void Main(string[] args)
{
List<Pairing> ps = new List<Pairing>();
ps.Add(new Pairing(2, 4, 14));
ps.Add(new Pairing(1, 2, 4));
ps.Add(new Pairing(2, 4, 14));
var unique = ps.Distinct(new MyComparer());
foreach (Pairing p in unique)
{
Console.WriteLine("{0}\t{1}\t{2}", p.Index, p.Length, p.Offset);
}
Console.ReadLine();
}
}

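According to the example on the Enumerable.Distinct documentation page, you need to implement GetHashCode() so that objects that compare equal return the same hash code. Your comparer calls obj.GetHashCode(), and since Pairing does not override GetHashCode(), that is the default Object.GetHashCode(), which is not guaranteed to return the same hash code for two separate instances holding the same values.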
// If Equals() returns true for a pair of objects
// then GetHashCode() must return the same value for these objects.
public int GetHashCode(Product product)
{
//Check whether the object is null
if (Object.ReferenceEquals(product, null)) return 0;
//Get hash code for the Name field if it is not null.
int hashProductName = product.Name == null ? 0 : product.Name.GetHashCode();
//Get hash code for the Code field.
int hashProductCode = product.Code.GetHashCode();
//Calculate the hash code for the product.
return hashProductName ^ hashProductCode;
}

Defining GetHashCode so that equal objects return the same value makes Distinct work as expected:
public int GetHashCode(Pairing obj)
{
if (obj==null) return 0;
var hc1 = obj.Index.GetHashCode();
var hc2 = obj.Length.GetHashCode();
var hc3 = obj.Offset.GetHashCode();
return hc1 ^ hc2 ^ hc3;
}
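For reference, here is a minimal sketch of the whole comparer put together, assuming the Pairing class from the question. The name PairingComparer, the null handling in Equals, and the 17/31 multiply-and-add combination are choices made for this sketch, not part of the answers above; any combination works as long as equal objects end up with the same hash code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class Pairing
{
    public int Index { get; private set; }
    public int Length { get; private set; }
    public int Offset { get; private set; }

    public Pairing(int index, int length, int offset)
    {
        Index = index;
        Length = length;
        Offset = offset;
    }
}

// Hypothetical name; plays the role of MyComparer from the question.
public sealed class PairingComparer : IEqualityComparer<Pairing>
{
    public bool Equals(Pairing x, Pairing y)
    {
        if (ReferenceEquals(x, y)) return true;    // same instance, or both null
        if (x is null || y is null) return false;  // exactly one is null
        return x.Index == y.Index && x.Length == y.Length && x.Offset == y.Offset;
    }

    public int GetHashCode(Pairing obj)
    {
        if (obj is null) return 0;
        unchecked
        {
            // Built only from the fields that Equals compares,
            // so equal pairings always produce the same hash code.
            int hash = 17;
            hash = hash * 31 + obj.Index;
            hash = hash * 31 + obj.Length;
            hash = hash * 31 + obj.Offset;
            return hash;
        }
    }
}

public static class PairingDemo
{
    // Same data as in the question: two of the three pairings are duplicates.
    public static List<Pairing> Unique()
    {
        var ps = new List<Pairing>
        {
            new Pairing(2, 4, 14),
            new Pairing(1, 2, 4),
            new Pairing(2, 4, 14),
        };
        return ps.Distinct(new PairingComparer()).ToList();
    }
}
```

Distinct first compares hash codes and only calls Equals when they match, which is why the identity-based default hash in the question never got as far as the field comparison.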

Related

Trying to set up a CompareLists method with an overridden Equals/Gethash code methods

I'm trying to set up a way to compare some nested lists with objects that I'm importing from MongoDB. I have already set up the lists object:
public class SecurityGroup
{
public ObjectId Id { get; set; }
public string GroupID { get; set; }
public string GroupName{ get; set; }
public List<IpPermission> IpPermissions { get; set; }
public override string ToString()
{
return string.Format("groupid : {0}, groupname : {1} ", GroupID, GroupName );
}
Within that class I also have an overridden Equals method in place.
public override bool Equals(object obj)
{
SecurityGroup secGroup = obj as SecurityGroup;
if (secGroup == null)
{
return false;
}
if (!string.Equals(GroupID, secGroup.GroupID, StringComparison.OrdinalIgnoreCase))
{
return false;
}
if (!string.Equals(GroupName,secGroup.GroupName, StringComparison.OrdinalIgnoreCase))
{
return false;
}
I'm not sure if I need to post the class used for the nested loop, but there's an IEquatable interface and the entirety of that class is EXACTLY like the SecurityGroup class I just posted.
//Compare IpPermissions
var diff1 = IpPermissions.Except(secGroup.IpPermissions);
var diff2 = secGroup.IpPermissions.Except(IpPermissions);
if (diff1.Any() || diff2.Any())
{
return false;
}
return true;
}
Now here's the Hashcode method that I set up:
public override int GetHashCode()
{
unchecked
{
const int HashingBase = (int)2166136261;
const int HashingMultiplier = 16777619;
int hash = HashingBase;
hash = (hash * HashingMultiplier) ^ (!Object.ReferenceEquals(null, IpPort) ? IpPort.GetHashCode() : 0);
hash = (hash * HashingMultiplier) ^ (!Object.ReferenceEquals(null, IpProtocol) ? IpProtocol.GetHashCode() : 0);
hash = (hash * HashingMultiplier) ^ (!Object.ReferenceEquals(null, IpRanges) ? IpRanges.GetHashCode() : 0);
return hash;
}
}
}
}
That essentially concludes the code I have in place to set up the framework for my CompareLists method. Now here's where I'm having issues. I'm trying to set up the CompareLists method, and it's just giving me red underlines on the 'public static' part. The thing I can't figure out is the error it's giving me for the 'foreach' statements. It says, "Cannot convert element type 'AwsInstanceProfile1.Entity.SecurityGroup' to iterator type 'Amazon.EC2.Model.SecurityGroup'", which is really weird, because in theory I should have the framework set up to allow the lists into these objects. Here's the rest of the method:
public static bool CompareLists(List<Entity.SecurityGroup> list1, List<Entity.SecurityGroup> list2) =>
{
if (list1 == null || list2 == null)
return list1 == list2;
Dictionary<SecurityGroup, int> hash = new Dictionary<SecurityGroup, int>();
foreach (SecurityGroup secGroup in list1)
{
if (hash.ContainsKey(secGroup))
{
hash[secGroup]++;
}
else
{
hash.Add(secGroup, 1);
}
}
foreach (SecurityGroup secGroup in list2)
{
if (!hash.ContainsKey(secGroup) || hash[secGroup] == 0)
{
return false;
}
hash[secGroup]--;
}
return true;
}

How to check equality for a custom class array in C#?

I have a custom class named City, and this class has an Equals method. SequenceEqual works fine when comparing arrays built from the same variables. The problem occurs when comparing two arrays whose elements are each created with new City(...): the result is false.
City class:
interface IGene : IEquatable<IGene>
{
string Name { get; set; }
int Index { get; set; }
}
class City : IGene
{
string name;
int index;
public City(string name, int index)
{
this.name = name;
this.index = index;
}
public string Name
{
get
{
return name;
}
set
{
name = value;
}
}
public int Index
{
get
{
return index;
}
set
{
index = value;
}
}
public bool Equals(IGene other)
{
if (other == null && this == null)
return true;
if((other is City))
{
City c = other as City;
return c.Name == this.Name && c.Index == this.Index;
}
return false;
}
}
In the Test method below, the first comparison result, arrayCompare1, is true and the second, arrayCompare2, is false. Both results should be true, so something abnormal is going on. How can I fix this problem?
Test code:
public void Test()
{
City c1 = new City("A", 1);
City c2 = new City("B", 2);
City[] arr1 = new City[] { c1, c2 };
City[] arr2 = new City[] { c1, c2 };
City[] arr3 = new City[] { new City("A", 1), new City("B", 2) };
City[] arr4 = new City[] { new City("A", 1), new City("B", 2) };
bool arrayCompare1 = arr1.SequenceEqual(arr2);
bool arrayCompare2 = arr3.SequenceEqual(arr4);
MessageBox.Show(arrayCompare1 + " " + arrayCompare2);
}
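SequenceEqual on a City[] uses EqualityComparer<City>.Default, which looks for IEquatable<City>. City only implements IEquatable<IGene>, so the comparer falls back to Object.Equals, which compares references. You need to override Object.Equals, for example like this: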
public override bool Equals(object other)
{
if (other is IGene)
return Equals((IGene)other);
return base.Equals(other);
}
You need to override bool Equals(object obj). The simplest addition to your code:
public override bool Equals(object obj)
{
return Equals(obj as IGene);
}
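Putting the answers together, a sketch of City with the Equals(object) override in place might look like this. The GetHashCode override is an addition made here, not something the answers above include: SequenceEqual does not need it, but the compiler warns when Equals is overridden without it, and hash-based operations such as Distinct or Dictionary lookups depend on it. The auto-properties and the CityDemo helper are likewise just for illustration.

```csharp
using System;
using System.Linq;

public interface IGene : IEquatable<IGene>
{
    string Name { get; set; }
    int Index { get; set; }
}

public class City : IGene
{
    public string Name { get; set; }
    public int Index { get; set; }

    public City(string name, int index)
    {
        Name = name;
        Index = index;
    }

    // Typed comparison required by IEquatable<IGene>.
    public bool Equals(IGene other)
    {
        var c = other as City;  // null when other is null or not a City
        return c != null && c.Name == Name && c.Index == Index;
    }

    // This is the overload that EqualityComparer<City>.Default ends up calling.
    public override bool Equals(object obj)
    {
        return Equals(obj as IGene);
    }

    // Assumption of this sketch: hash the same fields that Equals compares.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (Name != null ? Name.GetHashCode() : 0);
            hash = hash * 31 + Index;
            return hash;
        }
    }
}

public static class CityDemo
{
    // Mirrors arr3/arr4 from the question: separate instances with equal values.
    public static bool CompareFreshArrays()
    {
        City[] arr3 = { new City("A", 1), new City("B", 2) };
        City[] arr4 = { new City("A", 1), new City("B", 2) };
        return arr3.SequenceEqual(arr4);
    }
}
```

Because Name and Index are settable, the hash code changes when they do, so instances should not be modified while they sit in a HashSet or serve as dictionary keys.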

C# Group By Several Nested Properties And List values

I have this object structure:
public class Root
{
public int Value1;
public int Value2;
public List<NestedA> NestedAList;
}
public class NestedA
{
public List<NestedB> NestedBList;
public List<NestedC> NestedCList;
}
public class NestedB{
public int ValueB;
public int ValueB2;
}
public class NestedC{
public int ValueC;
public int ValueC2;
}
I need to group Root objects using all values from the Root class and its nested lists.
I've been playing around for a while and can't figure out how to do this in a single group-by statement, whether that is possible at all, or what the best approach to accomplish this would be.
Edit: I need the items grouped by Root properties, NestedA properties, NestedB properties and NestedC properties.
To put this in context: my real objects have more properties; I'm just showing the ones that I need grouped and that can be used as a starting point.
Thanks in advance.
If we have this element
Root
Value1 = 1
Value2 = 2
NestedAList = [
{NestedBList = [
{ValueB=2, ValueB2=3}
]
NestedCList = [
{ValueC=5, ValueC2=11}
]}
]
it should be grouped with this one:
Root
Value1 = 1
Value2 = 2
NestedAList = [
{NestedBList = [
{ValueB=2, ValueB2=3}
]
NestedCList = [
{ValueC=5, ValueC2=11}
]}
]
but not with this one:
Root
Value1 = 1
Value2 = 2
NestedAList = [
{NestedBList = [
{ValueB=2, ValueB2=3}, { ValueB= 1, ValueB2=4}
]
NestedCList = [
{ValueC=5, ValueC2=11}
]}
]
To accomplish this task, you can override the Equals() and GetHashCode() methods for each class in your hierarchy. It may be a little tricky; for example, like this:
public class Root
{
public int Value1;
public int Value2;
public List<NestedA> NestedAList;
public override bool Equals(object obj)
{
Root other = obj as Root;
if (other == null) return false;
return this.Value1 == other.Value1 && this.Value2 == other.Value2 && this.NestedAList.SequenceEqual(other.NestedAList);
}
public override int GetHashCode()
{
unchecked
{
int hasha = 19;
foreach (NestedA na in NestedAList)
{
hasha = hasha * 31 + na.GetHashCode();
}
return (Value1 ^ Value2 ^ hasha).GetHashCode();
}
}
}
public class NestedA
{
public List<NestedB> NestedBList;
public List<NestedC> NestedCList;
public override bool Equals(object obj)
{
NestedA other = obj as NestedA;
if (other == null) return false;
return NestedBList.SequenceEqual(other.NestedBList) && NestedCList.SequenceEqual(other.NestedCList);
}
public override int GetHashCode()
{
unchecked
{
int hashb = 19;
foreach (NestedB nb in NestedBList)
{
hashb = hashb * 31 + nb.GetHashCode();
}
int hashc = 19;
foreach (NestedC nc in NestedCList)
{
hashc = hashc * 31 + nc.GetHashCode();
}
return (hashb ^ hashc).GetHashCode();
}
}
}
public class NestedB{
public int ValueB;
public int ValueB2;
public override bool Equals(object obj)
{
NestedB other = obj as NestedB;
if (other == null) return false;
return this.ValueB == other.ValueB && this.ValueB2 == other.ValueB2;
}
public override int GetHashCode()
{
return (ValueB ^ ValueB2).GetHashCode();
}
}
public class NestedC{
public int ValueC;
public int ValueC2;
public override bool Equals(object obj)
{
NestedC other = obj as NestedC;
if (other == null) return false;
return this.ValueC == other.ValueC && this.ValueC2 == other.ValueC2;
}
public override int GetHashCode()
{
return (ValueC ^ ValueC2).GetHashCode();
}
}
After that you can easily select unique Roots (each unique Root represents a group):
roots.Distinct().ToList()
Same result using GroupBy():
roots.GroupBy(r => r).Select(g => g.First()).ToList()
Count elements in each group:
roots.GroupBy(r => r).Select(g => g.Count())
Enumerate elements in the first group:
roots.GroupBy(r => r).First().Select(g => g)
If you don't care about element order in the lists, use Enumerable.All instead of SequenceEqual.
EDIT: In this case you also have to change the hash code generation algorithm so that it no longer depends on element order. For example, like this: hashb = hashb + nb.GetHashCode() * 31;
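As a rough sketch of what an order-insensitive pair of Equals/GetHashCode helpers could look like: note that this swaps the Enumerable.All suggestion for a counting dictionary, because All combined with Contains ignores how often an element occurs. The class and method names are made up for this example, and it assumes the lists contain no null elements.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not part of the answer above.
public static class UnorderedList
{
    // True when both collections hold the same elements with the same
    // multiplicities, regardless of order. Assumes no null elements.
    public static bool SameElements<T>(IReadOnlyCollection<T> a, IReadOnlyCollection<T> b)
    {
        if (a.Count != b.Count) return false;

        var counts = new Dictionary<T, int>();
        foreach (T item in a)
        {
            counts.TryGetValue(item, out int n);  // n stays 0 when the key is missing
            counts[item] = n + 1;
        }
        foreach (T item in b)
        {
            if (!counts.TryGetValue(item, out int n) || n == 0) return false;
            counts[item] = n - 1;
        }
        return true;
    }

    // Addition is commutative, so the result does not depend on element order
    // and stays consistent with SameElements.
    public static int Hash<T>(IEnumerable<T> items)
    {
        unchecked
        {
            int hash = 19;
            foreach (T item in items)
            {
                hash += item.GetHashCode() * 31;
            }
            return hash;
        }
    }
}
```

In NestedA, for instance, Equals could then call UnorderedList.SameElements(NestedBList, other.NestedBList) and GetHashCode could combine UnorderedList.Hash(NestedBList) with UnorderedList.Hash(NestedCList).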

Compare two custom LIST objects

I have a scenario to check:
1) If any item (EmployeeObject) from empDb appears in empXml, return true. Otherwise return false.
public class EmployeeObject
{
public Int32 Id { get; set; }
public string Title { get; set; }
public string Desc { get; set; }
.....
}
IList<EmployeeObject> empDb = PopulateFromDb(); //calling ado.net
IList<EmployeeObject> empXml = PopulateFromXml(); //deserializing xml
So let me get this straight: I'm trying to determine if list empXml is a subset of list empDb?
This is what I have tried so far, but it returns false even though I have checked the data in both lists and it should have returned true, unless I am doing something wrong in my expression.
//at least one MATCH
empDb.Any(a => empXml.Contains(a));
or
//at least one EXACT match
empDb.Any(x => empXml.Contains(y => x.Equals(y)));
If you don't have Equals and GetHashCode implemented in the EmployeeObject class, then employees are compared by reference. And you will definitely have different instances here, because the first list is created when you read data from the database and the second list is created when you deserialize the XML. So even employees with the same values in all fields are considered different.
If you want to check matches only by employee Id, you can project the sequences to ids and then use Intersect to check whether a match exists:
// at least one employee with equal Id
empDb.Select(e => e.Id).Intersect(empXml.Select(e => e.Id)).Any()
If you want to compare employees by the values of their fields instead of by reference, you have several options. If you can't or don't want to change the implementation of the EmployeeObject class and override its Equals and GetHashCode methods, you can create a custom comparer for employees:
public class EmployeeComparer : IEqualityComparer<EmployeeObject>
{
public bool Equals(EmployeeObject x, EmployeeObject y)
{
return x.Id == y.Id
&& x.Title == y.Title
&& x.Desc == y.Desc;
}
public int GetHashCode(EmployeeObject obj)
{
int code = 19;
code = code * 23 + obj.Id.GetHashCode();
code = code * 23 + obj.Title.GetHashCode();
code = code * 23 + obj.Desc.GetHashCode();
return code;
}
}
Then you can use this comparer:
empDb.Intersect(empXml, new EmployeeComparer()).Any()
Or you can project your employees to anonymous objects (which have default implementation of Equals and GetHashCode):
empDb.Select(e => new { e.Id, e.Title, e.Desc })
.Intersect(empXml.Select(e => new { e.Id, e.Title, e.Desc })).Any()
Or override these methods:
public class EmployeeObject
{
public Int32 Id { get; set; }
public string Title { get; set; }
public string Desc { get; set; }
public override int GetHashCode()
{
int code = 19;
code = code * 23 + Id.GetHashCode();
code = code * 23 + Title.GetHashCode();
code = code * 23 + Desc.GetHashCode();
return code;
}
public override bool Equals(object obj)
{
EmployeeObject other = obj as EmployeeObject;
if (other == null)
return false;
if (ReferenceEquals(this, other))
return true;
return Id == other.Id &&
Title == other.Title && Desc == other.Desc;
}
}
And your code will work. Or you can use Intersect:
empDb.Intersect(empXml).Any()
If Id is the primary key of the entity you might want to write:
var set = new HashSet<int>(empXml.Select(x => x.Id)); //For faster lookup
empDb.Any(a => set.Contains(a.Id));
But if you need to match on all properties, you need to override Equals and GetHashCode. (This implementation also matches on null values for the properties.)
public class EmployeeObject : IEquatable<EmployeeObject>
{
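public bool Equals(EmployeeObject other)
{
// Guard against a null argument instead of throwing NullReferenceException.
if (ReferenceEquals(null, other)) return false;
return Id == other.Id &&
string.Equals(Title, other.Title) &&
string.Equals(Desc, other.Desc);
}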
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj)) return false;
if (ReferenceEquals(this, obj)) return true;
if (obj.GetType() != this.GetType()) return false;
return Equals((EmployeeObject) obj);
}
public override int GetHashCode()
{
unchecked
{
var hashCode = Id;
hashCode = (hashCode*397) ^ (Title != null ? Title.GetHashCode() : 0);
hashCode = (hashCode*397) ^ (Desc != null ? Desc.GetHashCode() : 0);
return hashCode;
}
}
public Int32 Id { get; set; }
public string Title { get; set; }
public string Desc { get; set; }
}
And then write:
var set = new HashSet<EmployeeObject>(empXml); //For faster lookup
empDb.Any(a => set.Contains(a));
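The question mentions both "at least one match" and "subset", which are different checks. Here is a small sketch of the two side by side, assuming EmployeeObject has value equality along the lines shown above; the EmployeeChecks class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class EmployeeObject : IEquatable<EmployeeObject>
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string Desc { get; set; }

    public bool Equals(EmployeeObject other)
    {
        if (other is null) return false;
        return Id == other.Id && Title == other.Title && Desc == other.Desc;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as EmployeeObject);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = Id;
            hash = (hash * 397) ^ (Title != null ? Title.GetHashCode() : 0);
            hash = (hash * 397) ^ (Desc != null ? Desc.GetHashCode() : 0);
            return hash;
        }
    }
}

// Hypothetical helper class for this sketch.
public static class EmployeeChecks
{
    // True when at least one employee from db also appears in xml.
    public static bool AnyMatch(IEnumerable<EmployeeObject> db, IEnumerable<EmployeeObject> xml)
    {
        var set = new HashSet<EmployeeObject>(xml);
        return db.Any(e => set.Contains(e));
    }

    // True when every employee in xml also appears in db.
    public static bool IsSubset(IEnumerable<EmployeeObject> xml, IEnumerable<EmployeeObject> db)
    {
        var set = new HashSet<EmployeeObject>(db);
        return xml.All(e => set.Contains(e));
    }
}
```

Both checks build the HashSet once, so each lookup is a hash probe rather than a scan of the other list.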

Getting a list of objects that occur exactly twice in a list

I have a List<CustomPoint> points; which contains close to a million objects.
From this list I would like to get the list of objects that occur exactly twice. What would be the fastest way to do this? I would also be interested in a non-LINQ option, since I might have to do this in C++ as well.
public class CustomPoint
{
public double X { get; set; }
public double Y { get; set; }
public CustomPoint(double x, double y)
{
this.X = x;
this.Y = y;
}
}
public class PointComparer : IEqualityComparer<CustomPoint>
{
public bool Equals(CustomPoint x, CustomPoint y)
{
return ((x.X == y.X) && (y.Y == x.Y));
}
public int GetHashCode(CustomPoint obj)
{
int hash = 0;
hash ^= obj.X.GetHashCode();
hash ^= obj.Y.GetHashCode();
return hash;
}
}
Based on this answer, I tried:
list.GroupBy(x => x).Where(x => x.Count() == 2).Select(x => x.Key).ToList();
but this gives zero objects in the new list.
Can someone guide me on this?
You should implement Equals and GetHashCode in the class itself and not in the PointComparer.
To get your code working as it is, you need to pass an instance of your PointComparer as the second argument to GroupBy.
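A minimal sketch of that second suggestion, reusing the CustomPoint and PointComparer classes from the question; the PointQueries wrapper and its method name are only there to make the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class CustomPoint
{
    public double X { get; set; }
    public double Y { get; set; }

    public CustomPoint(double x, double y)
    {
        X = x;
        Y = y;
    }
}

public class PointComparer : IEqualityComparer<CustomPoint>
{
    public bool Equals(CustomPoint x, CustomPoint y)
    {
        return x.X == y.X && x.Y == y.Y;
    }

    public int GetHashCode(CustomPoint obj)
    {
        int hash = 0;
        hash ^= obj.X.GetHashCode();
        hash ^= obj.Y.GetHashCode();
        return hash;
    }
}

// Hypothetical wrapper for the query.
public static class PointQueries
{
    public static List<CustomPoint> OccurringExactlyTwice(IEnumerable<CustomPoint> points)
    {
        return points
            .GroupBy(p => p, new PointComparer())  // group by value, not by reference
            .Where(g => g.Count() == 2)
            .Select(g => g.Key)
            .ToList();
    }
}
```

Without the comparer argument, GroupBy falls back to the default equality of CustomPoint, which is reference equality here, so every group has exactly one element and the filter returns nothing.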
This method works for me:
public class PointCount
{
public CustomPoint Point { get; set; }
public int Count { get; set; }
}
private static IEnumerable<CustomPoint> GetPointsByCount(Dictionary<int, PointCount> pointcount, int count)
{
return pointcount
.Where(p => p.Value.Count == count)
.Select(p => p.Value.Point);
}
private static Dictionary<int, PointCount> GetPointCount(List<CustomPoint> pointList)
{
var allPoints = new Dictionary<int, PointCount>();
foreach (var point in pointList)
{
int hash = point.GetHashCode();
if (allPoints.ContainsKey(hash))
{
allPoints[hash].Count++;
}
else
{
allPoints.Add(hash, new PointCount { Point = point, Count = 1 });
}
}
return allPoints;
}
Called like this:
static void Main(string[] args)
{
List<CustomPoint> list1 = CreateCustomPointList();
var doubles = GetPointsByCount(GetPointCount(list1), 2);
Console.WriteLine("Doubles:");
foreach (var point in doubles)
{
Console.WriteLine("X: {0}, Y: {1}", point.X, point.Y);
}
}
private static List<CustomPoint> CreateCustomPointList()
{
var result = new List<CustomPoint>();
for (int i = 0; i < 5; i++)
{
for (int j = 0; j < 5; j++)
{
result.Add(new CustomPoint(i, j));
}
}
result.Add(new CustomPoint(1, 3));
result.Add(new CustomPoint(3, 3));
result.Add(new CustomPoint(0, 2));
return result;
}
CustomPoint implementation:
public class CustomPoint
{
public double X { get; set; }
public double Y { get; set; }
public CustomPoint(double x, double y)
{
this.X = x;
this.Y = y;
}
public override bool Equals(object obj)
{
var other = obj as CustomPoint;
if (other == null)
{
return base.Equals(obj);
}
return ((this.X == other.X) && (this.Y == other.Y));
}
public override int GetHashCode()
{
int hash = 23;
hash = hash * 31 + this.X.GetHashCode();
hash = hash * 31 + this.Y.GetHashCode();
return hash;
}
}
It prints:
Doubles:
X: 0, Y: 2
X: 1, Y: 3
X: 3, Y: 3
As you can see in GetPointCount(), I create a dictionary with one entry per unique CustomPoint (keyed by its hash code). Each entry is a PointCount object holding a reference to the CustomPoint and a Count that starts at 1; every time the same point is encountered, the Count is increased.
Finally, GetPointsByCount returns the CustomPoints in the dictionary where PointCount.Count == count, in your case 2.
Please also note that I updated the GetHashCode() method, since yours returns the same value for the points (1,2) and (2,1). If you do want that, feel free to restore your own hashing method. You will have to test the hashing function, though: because the dictionary is keyed by hash code alone, two different points with the same hash are counted as one, and it's hard to hash two numbers into one without collisions. How likely a collision is depends on the range of numbers used, so you should implement a hash function that fits your own needs.
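If hash collisions are a concern, one alternative is to key the dictionary by the point itself rather than by its hash code, so the dictionary resolves collisions through Equals. This is a sketch of that variation, not the method from the answer above; the PointCounter class and WithCount method are names made up for the example, and CustomPoint is assumed to override Equals and GetHashCode as shown earlier.

```csharp
using System;
using System.Collections.Generic;

public class CustomPoint
{
    public double X { get; set; }
    public double Y { get; set; }

    public CustomPoint(double x, double y)
    {
        X = x;
        Y = y;
    }

    public override bool Equals(object obj)
    {
        var other = obj as CustomPoint;
        return other != null && X == other.X && Y == other.Y;
    }

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 23;
            hash = hash * 31 + X.GetHashCode();
            hash = hash * 31 + Y.GetHashCode();
            return hash;
        }
    }
}

// Hypothetical helper; plain loops only, no LINQ.
public static class PointCounter
{
    public static List<CustomPoint> WithCount(IEnumerable<CustomPoint> points, int count)
    {
        // Keyed by the point: two points whose hashes collide still get
        // separate entries, because the dictionary also checks Equals.
        var counts = new Dictionary<CustomPoint, int>();
        foreach (CustomPoint p in points)
        {
            counts.TryGetValue(p, out int n);  // n stays 0 when the key is missing
            counts[p] = n + 1;
        }

        var result = new List<CustomPoint>();
        foreach (KeyValuePair<CustomPoint, int> pair in counts)
        {
            if (pair.Value == count) result.Add(pair.Key);
        }
        return result;
    }
}
```

This is still a single pass over the input plus a pass over the distinct points, and the same counting-map idea carries over to C++ with a hash map and a custom hash and equality for the point type.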
