Checking if same object is already present in a list - c#

Here is the story:
I'm trying to make a list of different clusters. I only want to keep the necessary clusters, and clusters can be the same.
How can I add items to a list while checking whether the list already contains the object? (I know objects can't be passed here.)
This is my sample code:
foreach (Cluster cluster in clustersByProgramme)
{
if (!clusterList.Contains(cluster))
{
clusterList.Add(cluster);
}
}

Your code should work; if it hasn't, you might be using different object instances that represent the same actual cluster, and you perhaps haven't provided a suitable Equals implementation (you should also update GetHashCode at the same time).
Also - in .NET 3.5, this could be simply:
var clusterList = clustersByProgramme.Distinct().ToList();
As an example of class that supports equality tests:
class Cluster // possibly also IEquatable<Cluster>
{
public string Name { get { return name; } }
private readonly string name;
public Cluster(string name) { this.name = name ?? ""; }
public override string ToString() { return Name; }
public override int GetHashCode() { return Name.GetHashCode(); }
public override bool Equals(object obj)
{
Cluster other = obj as Cluster;
return other != null && this.Name == other.Name;
}
}
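
To see the effect of value equality on the original loop, here is a minimal sketch. NamedCluster and ClusterDemo are hypothetical names standing in for the question's Cluster type, and equality is assumed to depend on Name only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's Cluster type; equality is by Name only.
public sealed class NamedCluster : IEquatable<NamedCluster>
{
    public string Name { get; }
    public NamedCluster(string name) { Name = name ?? ""; }

    public bool Equals(NamedCluster other) { return other != null && Name == other.Name; }
    public override bool Equals(object obj) { return Equals(obj as NamedCluster); }
    public override int GetHashCode() { return Name.GetHashCode(); }
}

public static class ClusterDemo
{
    // Returns the clusters with duplicates (by Name) removed.
    public static List<NamedCluster> Deduplicate(IEnumerable<NamedCluster> clusters)
    {
        return clusters.Distinct().ToList();
    }
}
```

With these overrides in place, List<T>.Contains and Distinct treat two separate instances with the same Name as the same cluster.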

Your example is about as simple as it is going to get. The only thing I could possibly recommend is that you use the Exists method:
The Predicate is a delegate to a method that returns true if the object passed to it matches the conditions defined in the delegate. The elements of the current List are individually passed to the Predicate delegate, and processing is stopped when a match is found.
This method performs a linear search; therefore, this method is an O(n) operation, where n is Count.
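
A small sketch of the Exists approach quoted above. The ExistsDemo class and the case-insensitive match are assumptions made for illustration; the point is that the predicate lets you decide what "already present" means without touching Equals.

```csharp
using System;
using System.Collections.Generic;

public static class ExistsDemo
{
    // Adds name to the list only if no existing entry matches it, ignoring case.
    // Returns true when the name was added.
    public static bool AddIfMissing(List<string> names, string name)
    {
        bool present = names.Exists(n => string.Equals(n, name, StringComparison.OrdinalIgnoreCase));
        if (!present) names.Add(name);
        return !present;
    }
}
```

This is still a linear search per call, so it suits small lists better than large ones.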

If you're using .NET 3.5, use a HashSet to do this.
HashSet<Cluster> clusterList = new HashSet<Cluster>();
foreach (Cluster cluster in clustersByProgramme)
{
clusterList.Add(cluster);
}
In this case, also make sure that if cluster1 == cluster2, then
cluster1.Equals(cluster2);
cluster2.Equals(cluster1); //yeah, could be different depending on your impl
cluster1.GetHashCode() == cluster2.GetHashCode();

Your code is correct, but it is not very efficient. You could instead use a HashSet<T> like this:
HashSet<Cluster> clusterSet = new HashSet<Cluster>();
foreach (Cluster cluster in clustersByProgramme)
clusterSet.Add(cluster);

Why not just use a dictionary?
Lookup is O(1) as long as your items have a good hash.
It seems a simple solution.
I.e. dictionary.ContainsKey(key) is O(1).
You can then update the existing entry, if needed, or add a new one.
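
A rough sketch of the dictionary idea, assuming clusters are identified by a string name. DictionaryDemo and CountByName are hypothetical names; the value stored per key is just an occurrence count to show the "update existing or add new" step.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryDemo
{
    // Counts how often each cluster name occurs; each key is stored once.
    public static Dictionary<string, int> CountByName(IEnumerable<string> clusterNames)
    {
        var counts = new Dictionary<string, int>();
        foreach (string name in clusterNames)
        {
            int seen;
            if (counts.TryGetValue(name, out seen))
                counts[name] = seen + 1;   // update the existing entry
            else
                counts.Add(name, 1);       // first time we see this key
        }
        return counts;
    }
}
```

TryGetValue does the lookup and the read in one step, which avoids hashing the key twice as ContainsKey followed by the indexer would.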

Related

Is it safe to override GetHashCode and get it from string property?

I have a class:
public class Item
{
public string Name { get; set; }
public override int GetHashCode()
{
return Name.GetHashCode();
}
}
The purpose of overriding GetHashCode is that I want to have only one occurrence of an object with a specified name in a Dictionary.
But is it safe to get hash code from string?
In other words, is there any chance that two objects with different values of property Name would return the same hash code?
But is it safe to get hash code from string?
Yes, it is safe. But, what you're doing isn't. You're using a mutable string field to generate your hash code. Let's imagine that you inserted an Item as a key for a given value. Then, someone changes the Name string to something else. You now are no longer able to find the same Item inside your Dictionary, HashSet, or whichever structure you use.
Moreover, you should rely only on immutable types for keys. I'd also advise you to implement IEquatable<T>:
public class Item : IEquatable<Item>
{
public Item(string name)
{
Name = name;
}
public string Name { get; }
public bool Equals(Item other)
{
if (ReferenceEquals(null, other)) return false;
if (ReferenceEquals(this, other)) return true;
return string.Equals(Name, other.Name);
}
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj)) return false;
if (ReferenceEquals(this, obj)) return true;
if (obj.GetType() != this.GetType()) return false;
return Equals((Item) obj);
}
public static bool operator ==(Item left, Item right)
{
return Equals(left, right);
}
public static bool operator !=(Item left, Item right)
{
return !Equals(left, right);
}
public override int GetHashCode()
{
return (Name != null ? Name.GetHashCode() : 0);
}
}
is there any chance that two objects with different values of property
Name would return the same hash code?
Yes, there is a statistical chance that such a thing will happen. Hash codes do not guarantee uniqueness; they strive for a uniform distribution. Why? Because a hash code is an Int32, which has only 32 bits, while there are far more possible strings than that. By the Pigeonhole Principle, you may end up with two different strings that have the same hash code.
Your class is buggy, because you have a GetHashCode override, but no Equals override. You also don't consider the case where Name is null.
The rule for GetHashCode is simple:
If a.Equals(b) then it must be the case that a.GetHashCode() == b.GetHashCode().
The more cases where if !a.Equals(b) then a.GetHashCode() != b.GetHashCode() the better, indeed the more cases where !a.Equals(b) then a.GetHashCode() % SomeValue != b.GetHashCode() % SomeValue the better, for any given SomeValue (you can't predict it) so we like to have a good mix of bits in the results. But the vital thing is that two objects considered equal must have equal GetHashCode() results.
Right now this isn't the case, because you've only overridden one of these. However the following is sensible:
public class Item
{
public string Name { get; set; }
public override int GetHashCode()
{
return Name == null ? 0 : Name.GetHashCode();
}
public override bool Equals(object obj)
{
var asItem = obj as Item;
return asItem != null && Name == asItem.Name;
}
}
The following is even better, because it allows for faster strongly-typed equality comparisons:
public class Item : IEquatable<Item>
{
public string Name { get; set; }
public override int GetHashCode()
{
return Name == null ? 0 : Name.GetHashCode();
}
public bool Equals(Item other)
{
return other != null && Name == other.Name;
}
public override bool Equals(object obj)
{
return Equals(obj as Item);
}
}
In other words, is there any chance that two objects with different values of property Name would return the same hash code?
Yes, this can happen, but it won't happen often, so that's fine. The hash-based collections like Dictionary and HashSet can handle a few collisions; indeed there'll be collisions even if the hash codes are all different because they're modulo'd down to a smaller index. It's only if this happens a lot that it impacts performance.
Another danger is that you'll be using a mutable value as a key. There's a myth that you shouldn't use mutable values for hash-codes, which isn't true; if a mutable object has a mutable property that affects what it is considered equal with then it must result in a change to the hash-code.
The real danger is mutating an object that is a key to a hash collection at all. If you are defining equality based on Name and you have such an object as the key to a dictionary then you must not change Name while it is used as such a key. The easiest way to ensure that is to have Name be immutable, so that is definitely a good idea if possible. If it is not possible though, you need to be careful just when you allow Name to be changed.
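The danger of mutating a stored key can be shown with a short sketch. MutableKey and MutationDemo are hypothetical, and an int Id is used instead of a string Name purely so that the hash codes involved are predictable.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type: a mutable Id drives both Equals and GetHashCode.
public class MutableKey
{
    public int Id { get; set; }
    public override bool Equals(object obj)
    {
        var other = obj as MutableKey;
        return other != null && Id == other.Id;
    }
    public override int GetHashCode() { return Id; }
}

public static class MutationDemo
{
    // Reports whether a HashSet still finds its own element after the key was changed.
    public static bool FoundAfterMutation()
    {
        var key = new MutableKey { Id = 1 };
        var set = new HashSet<MutableKey> { key };
        key.Id = 2;                 // the set filed this element under the hash for Id = 1
        return set.Contains(key);   // looks under the hash for Id = 2
    }
}
```

FoundAfterMutation returns false even though the object is still physically in the set, which is exactly the situation to avoid with Name.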
From a comment:
So, even if there is a collision in hash codes, when Equals will return false (because the names are different), the Dictionary will handle it properly?
Yes, it will handle it, though it's not ideal. We can test this with a class like this:
public class SuckyHashCode : IEquatable<SuckyHashCode>
{
public int Value { get; set; }
public bool Equals(SuckyHashCode other)
{
return other != null && other.Value == Value;
}
public override bool Equals(object obj)
{
return Equals(obj as SuckyHashCode);
}
public override int GetHashCode()
{
return 0;
}
}
Now if we use this, it works:
var dict = Enumerable.Range(0, 1000).Select(i => new SuckyHashCode{Value = i}).ToDictionary(shc => shc);
Console.WriteLine(dict.ContainsKey(new SuckyHashCode{Value = 3})); // True
Console.WriteLine(dict.ContainsKey(new SuckyHashCode{Value = -1})); // False
However, as the name suggests, it isn't ideal. Dictionaries and other hash-based collections all have means to deal with collisions, but those means mean that we no longer have the great nearly O(1) look-up, but rather as the percentage of collisions gets greater the look-up approaches O(n). In the case above where the GetHashCode is as bad as it could be without actually throwing an exception, the look-up would be O(n) which is the same as just putting all the items into an unordered collection and then finding them by looking at every one to see if it matches (indeed, due to differences in overheads, it's actually worse than that).
So for this reason we always want to avoid collisions as much as possible. Indeed, to not just avoid collisions, but to avoid collisions after the result has been modulo'd down to make a smaller hash code (because that's what happens internally to the dictionary).
In your case though because string.GetHashCode() is reasonably good at avoiding collisions, and because that one string is the only thing that equality is defined by, your code would in turn be reasonably good at avoiding collisions. More collision-resistant code is certainly possible, but comes at a cost to performance in the code itself* and/or is more work than can be justified.
*(Though see https://www.nuget.org/packages/SpookilySharp/ for code of mine that is faster than string.GetHashCode() on large strings on 64-bit .NET and more collision-resistant, though it is slower to produce those hash codes on 32-bit .NET or when the string is short).
Instead of using GetHashCode to prevent duplicates from being added to a dictionary, which is risky in your case as explained already, I would recommend using a (custom) equality comparer for your dictionary.
If the key is an object, you should create your own equality comparer that compares the string Name value. If the key is the string itself, you can use StringComparer.CurrentCulture, for example.
In this case too, it is key to make the string immutable, since otherwise you might invalidate your dictionary by changing the Name.
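A minimal sketch of such a comparer, assuming the key class exposes an immutable Name. PlainItem and PlainItemNameComparer are hypothetical names, and ordinal comparison is chosen here only as an example; pick whichever StringComparer matches your notion of "same name".

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type that does not override Equals or GetHashCode itself.
public sealed class PlainItem
{
    public string Name { get; }
    public PlainItem(string name) { Name = name; }
}

// Compares PlainItem keys by Name using ordinal string comparison.
public sealed class PlainItemNameComparer : IEqualityComparer<PlainItem>
{
    public bool Equals(PlainItem x, PlainItem y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return StringComparer.Ordinal.Equals(x.Name, y.Name);
    }

    public int GetHashCode(PlainItem obj)
    {
        // Hash with the same comparer that Equals uses, so the two stay consistent.
        return obj == null || obj.Name == null ? 0 : StringComparer.Ordinal.GetHashCode(obj.Name);
    }
}
```

Pass it to the constructor, e.g. new Dictionary<PlainItem, int>(new PlainItemNameComparer()), and the dictionary uses it for every lookup and insert.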

NUnit comparing two lists

OK so I'm fairly new to unit testing and everything is going well until now.
I'm simplifying my problem here, but basically I have the following:
[Test]
public void ListTest()
{
var expected = new List<MyClass>();
expected.Add(new MyOtherClass());
var actual = new List<MyClass>();
actual.Add(new MyOtherClass());
Assert.AreEqual(expected,actual);
//CollectionAssert.AreEqual(expected,actual);
}
But the test is failing, shouldn't the test pass? what am I missing?
If you're comparing two lists, you should test using collection constraints.
Assert.That(actual, Is.EquivalentTo(expected));
Also, in your classes, you will need to override the Equals method, otherwise like gleng stated, the items in the list are still going to be compared based on reference.
Simple override example:
public class Example
{
public int ID { get; set; }
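public override bool Equals(object obj)
{
// "as" yields null for null or non-Example arguments, so guard before reading ID.
var other = obj as Example;
return other != null && this.ID == other.ID;
}
// Keep GetHashCode consistent with Equals.
public override int GetHashCode()
{
return ID.GetHashCode();
}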
}
A very simple way to get this test to work is to only create the MyOtherClass instance once. That way, when comparing the item in the two lists they will be "equal" (because they reference the same object). If you do this, CollectionAssert will work just fine.
[Test]
public void ListTest()
{
var thing = new MyOtherClass();
var expected = new List<MyClass>();
expected.Add(thing);
var actual = new List<MyClass>();
actual.Add(thing);
CollectionAssert.AreEqual(expected,actual);
}
If you don't do this, though, you'll need to implement IEquatable<MyOtherClass> in MyOtherClass or override Equals to define what makes two instances of that class the "same".
Try to be a bit more specific about what you are trying to achieve. Explicitly stating that you want to compare the entire sequence will solve the problem. I personally wouldn't rely on NUnit's fancy features to determine what you meant by saying AreEqual. E.g.
Assert.IsTrue(actual.SequenceEqual(expected));
I convert my comment to answer on request.
Well, this fails because AreEqual ends up using reference comparison for the elements. To make it work you need value comparison (your own custom comparison).
You can do that by implementing the IEquatable<T> interface. Keep in mind that when you implement this interface you must also override Object.Equals and Object.GetHashCode to get consistent results.
The .NET Framework also supports doing this without implementing IEquatable<T>: an IEqualityComparer<T> should do the trick, provided NUnit has an overload that accepts one. I'm not certain about NUnit, though.
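Independently of NUnit, the IEqualityComparer<T> route can be sketched with plain LINQ. Widget, WidgetIdComparer and ListComparison are hypothetical names, and equality by Id is an assumption made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical class under test that we cannot (or do not want to) modify.
public class Widget
{
    public int Id { get; set; }
}

// Treats two Widget instances as equal when their Id values match.
public sealed class WidgetIdComparer : IEqualityComparer<Widget>
{
    public bool Equals(Widget x, Widget y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Id == y.Id;
    }
    public int GetHashCode(Widget obj) { return obj == null ? 0 : obj.Id; }
}

public static class ListComparison
{
    // True when both sequences have the same length and equal elements in the same order.
    public static bool SameSequence(IEnumerable<Widget> expected, IEnumerable<Widget> actual)
    {
        return expected.SequenceEqual(actual, new WidgetIdComparer());
    }
}
```

In a test you could then wrap the call in Assert.IsTrue, so the comparison rule lives in the comparer rather than in the class being tested.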
From Nunit documentation:
Starting with version 2.2, special provision is also made for comparing single-dimensioned arrays. Two arrays will be treated as equal by Assert.AreEqual if they are the same length and each of the corresponding elements is equal. Note: Multi-dimensioned arrays, nested arrays (arrays of arrays) and other collection types such as ArrayList are not currently supported.
You have a list of objects ... so it's not the same as comparing 2 ints.
What you should do is probably compare all the objects inside the list ... (Try converting your list to an array ... might actually work :) )
As I said (and most others as well), you'll probably need to override Equals. Here's MSDN page about how to do it (Covers Equals, == operator, and GetHashCode).
Similar question with more info: Compare equality between two objects in NUnit
If you can't modify a class then this example can be helpful:
[Test]
public void Arrays_Should_Be_Equal()
{
MyClass[] array1 = GetTestArrayOfSize(10);
MyClass[] array2 = GetTestArrayOfSize(10);
// DOESN'T PASS
// Assert.That(array1, Is.EquivalentTo(array2));
Func<MyClass, object> selector = i => new { i.Property1, i.Property2 };
Assert.That(array1.Select(selector), Is.EquivalentTo(array2.Select(selector)));
}
private MyClass[] GetTestArrayOfSize(int count)
{
return Enumerable.Range(1, count)
.Select(i => new MyClass { Property1 = "Property1" + i, Property2 = "Property2" + i }).ToArray();
}
Option 1: AreEquivalent
CollectionAssert.AreEquivalent(expectedList, actualList);
Option 2: IEqualityComparer - for custom equality comparisons
Assert.That(expectedList, Is.EqualTo(actualList).Using(new IEqualityComparerImplementation()));
private class IEqualityComparerImplementation : IEqualityComparer<GeometricPlane>
{
public bool Equals(GeometricPlane plane1, GeometricPlane plane2)
{
// obviously add in your own implementation
throw new NotImplementedException();
}
public int GetHashCode(GeometricPlane obj)
{
throw new NotImplementedException();
}
}

List.Contains doesn't work properly

I have a list which contains objects, but these objects aren't unique in the list. I wrote this code to copy the unique ones into another list:
foreach (CategoryProductsResult categoryProductsResult in categoryProductsResults.Where(categoryProductsResult => !resultSet.Contains(categoryProductsResult)))
{
resultSet.Add(categoryProductsResult);
}
But at the end resultSet is the same as categoryProductsResults.
categoryProductsResults' second row:
resultSet's first row:
As you can see, resultSet's first row and categoryProductsResults' second row are the same, but it still adds the second row to resultSet.
Do you have any suggestion?
Contains uses the default comparer which is comparing references since your class does not override Equals and GetHashCode.
class CategoryProductsResult
{
public string Name { get; set; }
// ...
public override bool Equals(object obj)
{
if(obj == null)return false;
CategoryProductsResult other = obj as CategoryProductsResult;
if(other == null)return false;
return other.Name == this.Name;
}
public override int GetHashCode()
{
return Name.GetHashCode();
}
}
Now you can simply use:
resultSet = categoryProductsResults.Distinct().ToList();
List uses the comparer returned by EqualityComparer.Default and according to the documentation for that:
The Default property checks whether type T implements the
System.IEquatable(Of T) interface and, if so, returns an
EqualityComparer(Of T) that uses that implementation. Otherwise, it
returns an EqualityComparer(Of T) that uses the overrides of
Object.Equals and Object.GetHashCode provided by T.
So you can either implement IEquatable on your custom class, or override the Equals (and GetHashCode) methods to do the comparison by the properties you require. Alternatively you could use linq:
bool contains = list.Any(i => i.Id == obj.Id);
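If comparing by a single key is enough, the check can also be done without a linear scan per item. The sketch below assumes an int ProductId identifies a result; ProductResult and DedupByKey are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape of the question's result type; only ProductId matters here.
public class ProductResult
{
    public int ProductId { get; set; }
    public string Name { get; set; }
}

public static class DedupByKey
{
    // Keeps the first result for each ProductId, preserving the input order.
    public static List<ProductResult> DistinctByProductId(IEnumerable<ProductResult> results)
    {
        var seenIds = new HashSet<int>();
        var unique = new List<ProductResult>();
        foreach (ProductResult result in results)
        {
            // HashSet<T>.Add returns false when the value is already present.
            if (seenIds.Add(result.ProductId))
                unique.Add(result);
        }
        return unique;
    }
}
```

This leaves the class itself untouched, which helps when the same type needs different notions of equality in different places.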
Each categoryProductsResult is a different object from the others. It's like something you can see here. If you want something simpler and ProductId is your unique identifier, just use the code below:
foreach (CategoryProductsResult categoryProductsResult in categoryProductsResults.Where(categoryProductsResult => !resultSet.Any(r => r.ProductId == categoryProductsResult.ProductId)))
{
resultSet.Add(categoryProductsResult);
}
Reference-type objects are compared by reference by default. So, Contains will never find a different instance with the same contents (unless you override the GetHashCode and Equals implementation in the class).
This SO answer explains.
You need to check if your current item is contained in your target list for each iteration. Currently you check once at the start of the loop, which means none of your items is in the target list.
I think Distinct is already doing what you want, you might want to use this extension instead of your own loop.

How to eliminate duplicates from a list?

class infoContact
{
private string contacts_first_nameField;
private string contacts_middle_nameField;
private string contacts_last_nameField;
private Phonenumber[] phone_numbersField;
private Emailaddress[] emailField;
}
I have a List<infoContact>. The list contains almost 7000 items, which I get from some other program. Of those 7000, 6500 are duplicates. I am looking for a way to eliminate the duplicates.
An infoContact is a duplicate if first_name, last_name, email addresses and phone numbers are the same.
I thought of using a HashSet<infoContact> and overriding GetHashCode() of infoContact.
I am just curious to know if that is the best way to do it. If it is not a good way, what is a better one?
You can use the Distinct extension method that takes an IEqualityComparer<T>. Just write a class that implements that interface, and does the comparison, and then you can just do something like this:
var filteredList = oldList.Distinct(new InfoContactComparer());
Override the Equals method with the fields you want, so you can compare objects through Equals.
I wrote a class that removes duplicate items from a list before; here is the key part of it:
List<string> list = new List<string>();
foreach (string line in File.ReadAllLines("somefile.txt"))
{
if (!list.Contains(line))
{
list.Add(line);
}
}
Firstly think of extracting the unique values. You could use the Distinct() Linq method with a comparer like:
public class infoContactComparer : IEqualityComparer<infoContact>
{
public bool Equals(infoContact x, infoContact y)
{
return x.contacts_first_nameField == y.contacts_first_nameField
&& x.contacts_last_nameField == y.contacts_last_nameField
&& ...
}
public int GetHashCode(infoContact obj)
{
return obj.contacts_first_nameField.GetHashCode();
}
}
Two options: override GetHashCode and Equals if you control the source of infoContact and your overrides will be true for any particular use of the class.
Otherwise, define a class implementing IEqualityComparer<infoContact>, which also allows you to define proper Equals and GetHashCode methods, and then pass an instance of this into a HashSet<infoContact> constructor or into a listOfContacts.Distinct method call (using Linq).
Note: your question seems to be based on the idea that GetHashCode should determine equality or uniqueness. It shouldn't! It's part of the tool that allows a HashSet to do its job, but it is not required to return unique values for unequal instances. The values should be well distributed, but they can ultimately overlap.
In short, two equal instances should have the same hash code, but two instances sharing the same hash code are not necessarily equal. For more on guidelines for GetHashCode, please visit this blog.
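Putting those two points together, here is a sketch of a comparer for a simplified contact. ContactRecord and ContactRecordComparer are hypothetical, and phone numbers and emails are reduced to plain strings because the real Phonenumber and Emailaddress types are not shown in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified, hypothetical version of infoContact: phone numbers and emails are plain strings.
public class ContactRecord
{
    public string FirstName;
    public string LastName;
    public string[] PhoneNumbers = new string[0];
    public string[] Emails = new string[0];
}

public sealed class ContactRecordComparer : IEqualityComparer<ContactRecord>
{
    public bool Equals(ContactRecord x, ContactRecord y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.FirstName == y.FirstName
            && x.LastName == y.LastName
            && x.PhoneNumbers.SequenceEqual(y.PhoneNumbers)   // element-wise, order-sensitive
            && x.Emails.SequenceEqual(y.Emails);
    }

    public int GetHashCode(ContactRecord obj)
    {
        if (obj == null) return 0;
        unchecked
        {
            // Only the names feed the hash; equal contacts still share a hash code,
            // and Equals settles any collisions.
            int hash = 17;
            hash = hash * 31 + (obj.FirstName == null ? 0 : obj.FirstName.GetHashCode());
            hash = hash * 31 + (obj.LastName == null ? 0 : obj.LastName.GetHashCode());
            return hash;
        }
    }
}
```

The same comparer instance works for both contacts.Distinct(comparer) and new HashSet<ContactRecord>(comparer). Hashing only the names is a deliberate shortcut: it keeps GetHashCode cheap at the cost of more collisions between people who share a name.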
The right way is to override the Equals method!
That way duplicates can be detected, so an element that is already present doesn't get added again.
Implement your class infoContact so that it implements IEquatable<infoContact>:
class InfoContact : IEquatable<InfoContact> {
string contacts_first_nameField;
string contacts_last_nameField;
object[] phone_numbersField;
object[] emailField;
// other fields
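public bool Equals(InfoContact other) {
if (other == null) return false;
// Arrays compare by reference with Equals, so compare their elements instead
// (SequenceEqual needs using System.Linq).
return contacts_first_nameField == other.contacts_first_nameField
&& contacts_last_nameField == other.contacts_last_nameField
&& phone_numbersField.SequenceEqual(other.phone_numbersField)
&& emailField.SequenceEqual(other.emailField);
}
public override bool Equals(object obj) { return Equals(obj as InfoContact); }
// Distinct hashes items first, so the hash must agree with Equals.
public override int GetHashCode() {
return (contacts_first_nameField ?? "").GetHashCode() ^ (contacts_last_nameField ?? "").GetHashCode();
}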
}
and use LINQ's Enumerable.Distinct method in order to filter the duplicates:
var infoContacts = GetInfoContacts().Distinct();

Complex object comparison in C# [duplicate]

This question already has answers here:
C# implementation of deep/recursive object comparison in .net 3.5
(6 answers)
Closed 8 years ago.
I have two complex objects of the same type. I want to compare both objects to determine whether they have exactly the same values. What is an efficient way of doing this?
sample class structure given below:
class Package
{
public List<GroupList> groupList;
}
class GroupList
{
public List<Feature> featurelist;
}
class Feature
{
public int qty;
}
Okay, so you want deep unordered structural comparison. The "unordered" part is tricky, and in fact it is a strong hint that your classes are not designed right: List<T> is inherently ordered, so perhaps you would rather want to use a HashSet<T> there (if you don't expect to have any duplicates). Doing so would make the comparison both easier to implement, and faster (though insertions would be slower):
class Package
{
public HashSet<GroupList> groupList;
public override bool Equals(object o)
{
Package p = o as Package;
if (p == null) return false;
return groupList.SetEquals(p.groupList);
}
public override int GetHashCode()
{
return groupList.Aggregate(0, (hash, g) => hash ^ g.GetHashCode());
}
}
class GroupList
{
public HashSet<Feature> featureList;
public override bool Equals(object o)
{
GroupList g = o as GroupList;
if (g == null) return false;
return featureList.SetEquals(g.featureList);
}
public override int GetHashCode()
{
return featureList.Aggregate(0, (hash, f) => hash ^ f.GetHashCode());
}
}
class Feature
{
public int qty;
public override bool Equals(object o)
{
Feature f = o as Feature;
if (f == null) return false;
return qty == f.qty;
}
public override int GetHashCode()
{
return qty.GetHashCode();
}
}
If you want to keep using List<T>, you'll need to use LINQ set operations - note, however, that those are significantly slower:
class Package
{
public List<GroupList> groupList;
public override bool Equals(object o)
{
Package p = o as Package;
if (p == null) return false;
return !groupList.Except(p.groupList).Any();
}
}
class GroupList
{
public List<Feature> featureList;
public override bool Equals(object o)
{
GroupList g = o as GroupList;
if (g == null) return false;
return !featureList.Except(g.featureList).Any();
}
}
For complex objects, I would consider operator overloading.
On the overloaded operator, I would define my condition for equality.
http://msdn.microsoft.com/en-us/library/aa288467%28VS.71%29.aspx
We always just end up writing a method on the class that goes through everything and compares it. You could implement this as IComparable, or override Equals.
As the comment said, depends on how "exact" you want to measure.
You could just override Equals, implement a GetHashCode method and compare the hash codes; however, equal hash codes do not guarantee that the objects are exact matches. They only make an exact match "very likely".
Next thing you could do, is to go through every property/field in the class and compare those hash values. This would be "extremely likely" an exact match.
And to truly get an exact match, you have to compare every field and member in a recursive loop...not recommended.
If I were you, I would implement the IComparable Interface on the two types:
http://msdn.microsoft.com/en-us/library/system.icomparable.aspx
From there you can use .CompareTo, and implement the exact comparisons required under your circumstances. This is a general best practice within .NET and I think applies well to your case.
Depends on what you want to do with the comparison. As others have pointed out, IComparer is a good choice. If you are planning on using lambdas and LINQ, I would go with IEqualityComparer
http://msdn.microsoft.com/en-us/library/system.collections.iequalitycomparer.aspx
In general, you need a method to check the two, regardless of whether or not you overload equals, or use IComparer.
You asked how to do it most efficiently, here are some tips:
Your equality method should try to give up quickly, e.g. check if the size of the lists are the same, if they are not then return false right away
If you could implement an efficient hashCode, you could compare the hashes first, if they are not equal then the objects are not equal, if they are equal, then you need to compare the objects to see if the objects are equal
So in general, do the fastest comparisons first to try to return false.
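The tips above can be sketched as follows for the innermost level of the question's structure, treating each feature as just its qty. FastCompare and SameQuantities are hypothetical names, and the comparison is assumed to be order-insensitive while still respecting duplicates.

```csharp
using System;
using System.Collections.Generic;

public static class FastCompare
{
    // Order-insensitive comparison of two int lists that respects duplicates.
    public static bool SameQuantities(List<int> left, List<int> right)
    {
        if (ReferenceEquals(left, right)) return true;          // cheapest check first
        if (left == null || right == null) return false;
        if (left.Count != right.Count) return false;            // give up early on size mismatch

        // Count occurrences in left, then consume them with right.
        var counts = new Dictionary<int, int>();
        foreach (int qty in left)
        {
            int n;
            counts.TryGetValue(qty, out n);   // n stays 0 when qty is not yet present
            counts[qty] = n + 1;
        }
        foreach (int qty in right)
        {
            int n;
            if (!counts.TryGetValue(qty, out n) || n == 0) return false;
            counts[qty] = n - 1;
        }
        return true;
    }
}
```

The expensive counting pass only runs when the cheap reference, null and Count checks cannot decide the answer.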
Here is a somewhat simplified way to do it, using reflection. You will probably need to add other checks of datatypes for specific comparisons or loop through lists etc, but this should get you started.
void Mymethod(){
Class1 class1 = new Class1();
//define properties for class1
Class1 class2 = new Class1();
//define properties for class2
PropertyInfo[] properties = class1.GetType().GetProperties();
bool bClassesEqual = true;
foreach (PropertyInfo property in properties)
{
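Console.WriteLine(property.Name);
// GetValue returns object, so != would compare references (boxed values never match); use Equals.
if (!object.Equals(property.GetValue(class1, null), property.GetValue(class2, null)))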
{
bClassesEqual = false;
break;
}
}
}
