How to eliminate duplicates from a list? - c#

class infoContact
{
private string contacts_first_nameField;
private string contacts_middle_nameField;
private string contacts_last_nameField;
private Phonenumber[] phone_numbersField;
private Emailaddress[] emailField;
}
I have a List<infoContact>. The list contains almost 7000 items, which I get from another program. Of those 7000, about 6500 are duplicates. I am looking for a way to eliminate the duplicates.
An infoContact is a duplicate if the first name, last name, email addresses and phone numbers are the same.
I thought of using a HashSet<infoContact> and overriding GetHashCode() in infoContact.
I am curious whether that is the best way to do it. If it is not, what is a better way?

You can use the Distinct extension method that takes an IEqualityComparer<T>. Just write a class that implements that interface, and does the comparison, and then you can just do something like this:
var filteredList = oldList.Distinct(new InfoContactComparer());

Override the Equals method (and GetHashCode along with it) so that it compares the fields you care about; then objects can be compared through Equals.

I wrote a class that removes duplicate items from a list before; here is the key part of it:
List<string> list = new List<string>();
foreach (string line in File.ReadAllLines("somefile.txt"))
{
if (!list.Contains(line))
{
list.Add(line);
}
}

Firstly think of extracting the unique values. You could use the Distinct() Linq method with a comparer like:
public class infoContactComparer : IEqualityComparer<infoContact>
{
public bool Equals(infoContact x, infoContact y)
{
return x.contacts_first_nameField == y.contacts_first_nameField
&& x.contacts_last_nameField == y.contacts_last_nameField
&& ...
}
public int GetHashCode(infoContact obj)
{
return obj.contacts_first_nameField.GetHashCode();
}
}

Two options: override GetHashCode and Equals if you control the source of infoContact and your overrides will be true for any particular use of the class.
Otherwise, define a class implementing IEqualityComparer<infoContact>, which also allows you to define proper Equals and GetHashCode methods, and then pass an instance of this into a HashSet<infoContact> constructor or into a listOfContacts.Distinct method call (using Linq).
Note: your question seems to be based on the idea that GetHashCode should determine equality or uniqueness. It shouldn't! It's part of the tool that allows a HashSet to do its job, but it is not required to return unique values for unequal instances. The values should be well distributed, but they can ultimately overlap.
In short, two equal instances should have the same hash code, but two instances sharing the same hash code are not necessarily equal. For more on guidelines for GetHashCode, please visit this blog.
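The comparer route described above can be sketched roughly as follows. This is only an illustration under assumptions: Contact, ContactComparer and ContactDeduplicator are hypothetical names, and Contact is a simplified stand-in for infoContact with public properties. Note how GetHashCode only uses the names, so two contacts with different email lists collide on the hash and are still told apart by Equals.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for infoContact.
public class Contact
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string[] Emails { get; set; }
}

public class ContactComparer : IEqualityComparer<Contact>
{
    // Equals decides whether two contacts count as duplicates.
    public bool Equals(Contact x, Contact y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.FirstName == y.FirstName
            && x.LastName == y.LastName
            && (x.Emails ?? new string[0]).SequenceEqual(y.Emails ?? new string[0]);
    }

    // Equal contacts must produce equal hash codes; unequal contacts
    // may share a hash code, in which case Equals resolves the collision.
    public int GetHashCode(Contact c)
    {
        if (c == null) return 0;
        return (c.FirstName ?? "").GetHashCode() ^ (c.LastName ?? "").GetHashCode();
    }
}

public static class ContactDeduplicator
{
    // Keeps the first occurrence of each distinct contact.
    public static List<Contact> RemoveDuplicates(IEnumerable<Contact> contacts)
    {
        return contacts.Distinct(new ContactComparer()).ToList();
    }
}
```

The same ContactComparer instance could also be passed to a HashSet<Contact> constructor instead of Distinct.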

The right way is to override the Equals method (together with GetHashCode)!
That way, when you add a duplicate element to a HashSet<infoContact>, it will not be added. Note that List<T>.Add does not check for duplicates by itself.

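Make your class infoContact implement IEquatable<infoContact>:
using System;
using System.Linq;
class InfoContact : IEquatable<InfoContact> {
string contacts_first_nameField;
string contacts_last_nameField;
object[] phone_numbersField;
object[] emailField;
// other fields
public bool Equals(InfoContact other) {
if (other == null) return false;
return contacts_first_nameField == other.contacts_first_nameField
&& contacts_last_nameField == other.contacts_last_nameField
// Equals on an array compares references, so compare the elements instead
&& phone_numbersField.SequenceEqual(other.phone_numbersField)
&& emailField.SequenceEqual(other.emailField);
}
public override bool Equals(object obj) { return Equals(obj as InfoContact); }
// Distinct() hashes first, so equal contacts must return equal hash codes
public override int GetHashCode() {
return (contacts_first_nameField ?? "").GetHashCode()
^ (contacts_last_nameField ?? "").GetHashCode();
}
}
and use LINQ's Enumerable.Distinct method to filter out the duplicates: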
var infoContacts = GetInfoContacts().Distinct();

Related

Creating multiple custom comparators for a dictionary based class

I wish in my class to return a list from a dictionary but allow custom sorting using pre-written comparison methods. In my original java code that I'm converting from, I created compare methods using Google Guava Ordering in my class and then had a single method called the following passing in one of the public comparator methods, kind of declared like this:
public List<Word> getWords(Comparator c) { }
I'm trying to recreate this in C# but I can't figure out how. Essentially in the code below you can see there are three versions for each type of sort, and in addition I end up creating two lists for every return value which seems a bit wasteful.
I looked at creating delegates but got a bit lost, then figured I could implement IComparable, but then saw IComparer, and then saw that the Sort method takes a Comparison.
Can somebody point me in the direction of converting this into a single sort 'GetWords' in the best way, allowing clients to call the GetWords retrieving a sorted list from a pre-supplied set of ordering.
public partial class WordTable
{
private Dictionary<string, Word> words;
public WordTable()
{
//for testing
words = new Dictionary<string, Word>();
words.Add("B", new Word("B", WordTypes.Adjective));
words.Add("A", new Word("A", WordTypes.Noun));
words.Add("D", new Word("D", WordTypes.Verb));
}
public List<Word> GetWords()
{
return words.Values.ToList();
}
public List<Word> GetWordsByName()
{
List<Word> list = words.Values.ToList<Word>();
return list.OrderBy(word => word.Name).ToList();
}
public List<Word> GetWordsByType()
{
List<Word> list = words.Values.ToList<Word>();
return list.OrderBy(word => word.Type).ToList();
}
}
I think you are looking for a way to pass the ordering in as a parameter.
Effectively, you want a predefined set of orderings (one for ByName, one for ByType), and you pass the chosen ordering into the GetWords function.
There are two approaches you can use.
IComparer
This is more closely related to your past Java experience.
The official way is to use IComparer<T>.
Similar to your Comparator in the Java example, this enables you to create different sorting methods which all implement the IComparer<Word> interface, and then you can dynamically choose your sorting method.
As a simple example:
public class WordNameComparer : IComparer<Word>
{
public int Compare(Word word1, Word word2)
{
return word1.Name.CompareTo(word2.Name);
}
}
And then you can do:
public List<Word> GetWords(IComparer<Word> comparer)
{
return words.Values.OrderBy(x => x, comparer).ToList();
}
Which you can call by doing:
var table = new WordTable();
List<Word> sortedWords = table.GetWords(new WordNameComparer());
And of course you change the sorting logic by passing a different IComparer<Word>.
Func parameters
From experience, this is a much preferred approach due to LINQ's enhanced readability and low implementation cost.
Looking at your last two methods, you should see that the only variable part is the lambda that you use to order the data. You can of course turn this variable part into a method parameter:
public List<Word> GetWordsBy<T>(Func<Word,T> orderByPredicate)
{
return words.Values.OrderBy(orderByPredicate).ToList();
}
Because the OrderBy predicate uses a generic parameter for the selected property (e.g. sorting on a string field? an int field? ...), you have to make this method generic, but you don't need to explicitly use the generic parameter when you call the method. For example:
var sortedWordsByName = table.GetWordsBy(w => w.Name);
var sortedWordsByLength = table.GetWordsBy(w => w.Name.Length);
var sortedWordsByType = table.GetWordsBy(w => w.Type);
Note that if the selected key is a custom class that does not implement IComparable<>, you will either have to create and pass an IComparer<> for that class, or make the class itself implement IComparable<> so it can be sorted the way you want it to be.
You can introduce ascending/descending ordering:
public List<Word> GetWordsBy<T>(Func<Word,T> orderByPredicate, bool sortAscending = true)
{
return sortAscending
? words.Values.OrderBy(orderByPredicate).ToList()
: words.Values.OrderByDescending(orderByPredicate).ToList();
}
Update
I was trying to do it with delegates, but avoiding the caller having to roll their own lambda statement and use predefined ones.
You can simply wrap your method with some predefined options:
public List<Word> GetWordsBy<T>(Func<Word,T> orderByPredicate)
{
return words.Values.OrderBy(orderByPredicate).ToList();
}
public List<Word> GetWordsByName()
{
return GetWordsBy(w => w.Name);
}
This way, your external callers don't need to use the lambda if they don't want to; but you still retain the benefits of having reusable code inside your class.
There are many ways to do this. I prefer creating preset methods for readability's sake, but you could instead have an enum which you then map to the correct Func. Or you could create some static preset lambdas which the external caller can reference. Or... The world is your oyster :-)
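The enum option mentioned above could look something like the sketch below. Everything in it is an assumption made for illustration: the WordOrder enum, the Orderings table, the Add method, and the shape of Word and WordTypes (the question does not show their definitions).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed definitions; the original Word and WordTypes are not shown.
public enum WordTypes { Noun, Verb, Adjective }

public class Word
{
    public string Name { get; private set; }
    public WordTypes Type { get; private set; }
    public Word(string name, WordTypes type) { Name = name; Type = type; }
}

// Hypothetical enum naming the orderings offered to callers.
public enum WordOrder { ByName, ByType }

public class WordTable
{
    private readonly Dictionary<string, Word> words = new Dictionary<string, Word>();

    // Each preset maps to a function that sorts a sequence of words.
    private static readonly Dictionary<WordOrder, Func<IEnumerable<Word>, IEnumerable<Word>>> Orderings =
        new Dictionary<WordOrder, Func<IEnumerable<Word>, IEnumerable<Word>>>
        {
            { WordOrder.ByName, ws => ws.OrderBy(w => w.Name, StringComparer.Ordinal) },
            { WordOrder.ByType, ws => ws.OrderBy(w => w.Type) },
        };

    public void Add(Word word) { words[word.Name] = word; }

    // Callers pick a preset; they never have to write a lambda themselves.
    public List<Word> GetWords(WordOrder order)
    {
        return Orderings[order](words.Values).ToList();
    }
}
```

A caller would then write table.GetWords(WordOrder.ByName), and adding a new ordering only means adding an enum value and one entry in the table.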
I hope this works, or even compiles.
class WordTable
{
public List<Word> GetWords(IComparer<Word> comparer)
{
return words.Values.OrderBy(x => x, comparer).ToList();
}
}
class WordsByNameAndThenTypeComparer : IComparer<Word>
{
public int Compare(Word x, Word y)
{
int byName = x.Name.CompareTo(y.Name);
return byName != 0 ? byName : x.Type.CompareTo(y.Type);
}
}
Usage:
WordTable wt = new WordTable();
List<Word> words = wt.GetWords(new WordsByNameAndThenTypeComparer());

How to Make Collection Stop Calling Equals when Performing Collection.Remove

I have a class Test which overrides the Equals method, and a TestCollection class which implements ICollection<Test> and IEnumerable<Test>. In the collection I have implemented a Remove method which just removes the item from the current TestCollection object.
Whenever I call the Remove method on the TestCollection object, it internally calls the Equals method overridden in the Test class.
For one of my scenarios I do not want Equals to be called. What other ways are there to remove the item from my collection without calling Equals?
Below is the sample code for better understanding.
Test Class
public class Test
{
public int Id { get; set; }
private Collection<Test> _entities = new Collection<Test>();
public bool Remove(Test item)
{
return this._entities.Remove(item);
}
public override bool Equals(object obj)
{
Console.WriteLine("Equals inside Test Object");
return true;
}
}
TestCollection class
public class TestCollection : ICollection<Test>, IEnumerable<Test>
{
public TestCollection() : base() { }
private Collection<Test> _entities = new Collection<Test>();
public TestCollection(IList<Test> entityList)
{
this._entities = new Collection<Test>(entityList);
}
public bool Remove(Test item)
{
return this._entities.Remove(item);
}
public override bool Equals(object obj)
{
Console.WriteLine("Equals inside Test Collection Object");
return true;
}
}
I think you are missing the point here. The Equals method implements a mathematical equivalence relation: it must be reflexive, symmetric and transitive. There are no two distinct ways to say that two objects are equal, you see?
Solution for you is to remove implementation of the Equals method. This method is intended to be overridden if and only if there is exactly one definition of equivalence for a class - like integer equality - there is exactly one way to test whether two integers are equal.
Also, that is the reason why Remove method does not accept an additional parameter such as an IComparer or IEqualityComparer - that wouldn't make sense.
On a related note: Entities should never override Equals. There is no equality relation (in mathematical terms) defined for objects that can change their state over time, and entity is defined as an object with lifetime. The trouble there is that you can pick two versions of the same entity and ask whether they are equal. Well, they are both equal (that is the same entity) and not equal (those are two versions of it). Therefore, Equals method is not the way to check equality of entities.
The short answer is that you cannot.
The way that an item is removed from a list is done by doing an equality check for the item in question on each of the entries in the list.
There may be some way to do it, however, but I doubt it's a good practice, or even desirable code.
You could wrap the list into another list that uses a custom IEqualityComparer implementation. Allow that comparer to have two different modes (pass through to object.Equals, or don't) and switch them before remove (and switch back afterwards).
You could find the index of the item you want to remove yourself (without using its Equals) and then call RemoveAt.
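The RemoveAt idea can be sketched as a small helper that locates the item by reference identity, so no Equals override is ever called. ReferenceRemoval and RemoveByReference are hypothetical names, and the sketch assumes that "the same item" means "the same instance".

```csharp
using System;
using System.Collections.Generic;

public static class ReferenceRemoval
{
    // Removes the first element that is the same instance as 'item',
    // without calling any Equals override. Returns true if one was removed.
    public static bool RemoveByReference<T>(IList<T> list, T item) where T : class
    {
        for (int i = 0; i < list.Count; i++)
        {
            if (ReferenceEquals(list[i], item))
            {
                list.RemoveAt(i);
                return true;
            }
        }
        return false;
    }
}
```

Since Collection<T> implements IList<T>, the Remove method in the question could delegate to this helper with its _entities field.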

NUnit comparing two lists

OK so I'm fairly new to unit testing and everything is going well until now.
I'm simplifying my problem here, but basically I have the following:
[Test]
public void ListTest()
{
var expected = new List<MyClass>();
expected.Add(new MyOtherClass());
var actual = new List<MyClass>();
actual.Add(new MyOtherClass());
Assert.AreEqual(expected,actual);
//CollectionAssert.AreEqual(expected,actual);
}
But the test is failing, shouldn't the test pass? what am I missing?
If you're comparing two lists, you should use collection constraints.
Assert.That(actual, Is.EquivalentTo(expected));
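Also, in your classes, you will need to override the Equals method; otherwise, as gleng stated, the items in the list are still going to be compared by reference.
Simple override example:
public class Example
{
public int ID { get; set; }
public override bool Equals(object obj)
{
// obj may be null or of another type, so check before reading ID
var other = obj as Example;
return other != null && this.ID == other.ID;
}
// Keep GetHashCode consistent with Equals
public override int GetHashCode()
{
return ID.GetHashCode();
}
}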
A very simple way to get this test to work is to only create the MyOtherClass instance once. That way, when comparing the item in the two lists they will be "equal" (because they reference the same object). If you do this, CollectionAssert will work just fine.
[Test]
public void ListTest()
{
var thing = new MyOtherClass();
var expected = new List<MyClass>();
expected.Add(thing);
var actual = new List<MyClass>();
actual.Add(thing);
CollectionAssert.AreEqual(expected,actual);
}
If you don't do this, though, you'll need to implement IEquatable<MyOtherClass> in MyOtherClass or override Equals to define what makes two instances of that class the "same".
Try to be a bit more specific about what you are trying to achieve. Explicitly stating that you want to compare the entire sequence will solve the problem. I personally wouldn't rely on NUnit's fancy features to determine what you meant by saying AreEqual. E.g.
Assert.IsTrue(actual.SequenceEqual(expected));
I convert my comment to answer on request.
Well, this fails because the elements are compared with Equals, which defaults to reference comparison for classes. To make it work you need value comparison (your own custom comparison).
You can do that by implementing the IEquatable<T> interface. Keep in mind that when you implement this interface you should also override Object.Equals and Object.GetHashCode to get consistent results.
The .NET Framework also supports doing this without implementing IEquatable<T>: an IEqualityComparer<T> should do the trick, provided NUnit has an overload that accepts one. I am not certain about NUnit, though.
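As a rough illustration of the IEqualityComparer<T> route using only the base class library, the sketch below compares two sequences element by element through a custom comparer. Item, ItemIdComparer and ListComparison are hypothetical names, and the assumption is that two items are "the same" when their Id values match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical class that we assume cannot be changed to override Equals.
public class Item
{
    public int Id { get; set; }
}

public class ItemIdComparer : IEqualityComparer<Item>
{
    public bool Equals(Item x, Item y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Id == y.Id;
    }

    public int GetHashCode(Item item)
    {
        return item == null ? 0 : item.Id.GetHashCode();
    }
}

public static class ListComparison
{
    // True when both sequences have the same length and matching Ids in order.
    public static bool SameSequence(IEnumerable<Item> expected, IEnumerable<Item> actual)
    {
        return expected.SequenceEqual(actual, new ItemIdComparer());
    }
}
```

In a test, the result of SameSequence could be passed to Assert.IsTrue, which keeps the comparison rule explicit and independent of the test framework.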
From Nunit documentation:
Starting with version 2.2, special provision is also made for comparing single-dimensioned arrays. Two arrays will be treated as equal by Assert.AreEqual if they are the same length and each of the corresponding elements is equal. Note: Multi-dimensioned arrays, nested arrays (arrays of arrays) and other collection types such as ArrayList are not currently supported.
You have a list of objects ... so it's not the same as comparing 2 ints.
What you should do is probably compare all the objects inside the list ... (Try converting your list to an array ... might actually work :) )
As I said (and most others as well), you'll probably need to override Equals. Here's MSDN page about how to do it (Covers Equals, == operator, and GetHashCode).
A similar question with more information: Compare equality between two objects in NUnit
If you can't modify a class then this example can be helpful:
[Test]
public void Arrays_Should_Be_Equal()
{
MyClass[] array1 = GetTestArrayOfSize(10);
MyClass[] array2 = GetTestArrayOfSize(10);
// DOESN'T PASS
// Assert.That(array1, Is.EquivalentTo(array2));
Func<MyClass, object> selector = i => new { i.Property1, i.Property2 };
Assert.That(array1.Select(selector), Is.EquivalentTo(array2.Select(selector)));
}
private MyClass[] GetTestArrayOfSize(int count)
{
return Enumerable.Range(1, count)
.Select(i => new MyClass { Property1 = "Property1" + i, Property2 = "Property2" + i }).ToArray();
}
Option 1: AreEquivalent
CollectionAssert.AreEquivalent(expectedList, actualList);
Option 2: IEqualityComparer - for custom equality comparisons
Assert.That(actualList, Is.EqualTo(expectedList).Using(new IEqualityComparerImplementation()));
private class IEqualityComparerImplementation: IEqualityComparer<GeometricPlane>
{
public bool Equals(GeometricPlane plane1, GeometricPlane plane2)
{
// obviously add in your own implementation
throw new NotImplementedException();
}
public int GetHashCode(GeometricPlane obj)
{
throw new NotImplementedException();
}
}

List.Contains doesn't work properly

I have a list which contains objects, but these objects aren't unique in the list. I wrote this code to copy only the unique ones into another list:
foreach (CategoryProductsResult categoryProductsResult in categoryProductsResults.Where(categoryProductsResult => !resultSet.Contains(categoryProductsResult)))
{
resultSet.Add(categoryProductsResult);
}
But at the end resultSet is the same as categoryProductsResults.
categoryProductsResult's second row :
resultSet first row:
As you can see, resultSet's first row and categoryProductsResult's second row are the same, yet it still adds the second row to resultSet.
Do you have any suggestion?
Contains uses the default comparer which is comparing references since your class does not override Equals and GetHashCode.
class CategoryProductsResult
{
public string Name { get; set; }
// ...
public override bool Equals(object obj)
{
if(obj == null)return false;
CategoryProductsResult other = obj as CategoryProductsResult;
if(other == null)return false;
return other.Name == this.Name;
}
public override int GetHashCode()
{
return Name.GetHashCode();
}
}
Now you can simply use:
resultSet = categoryProductsResults.Distinct().ToList();
List uses the comparer returned by EqualityComparer.Default and according to the documentation for that:
The Default property checks whether type T implements the
System.IEquatable(Of T) interface and, if so, returns an
EqualityComparer(Of T) that uses that implementation. Otherwise, it
returns an EqualityComparer(Of T) that uses the overrides of
Object.Equals and Object.GetHashCode provided by T.
So you can either implement IEquatable<T> on your custom class, or override the Equals (and GetHashCode) methods to do the comparison by the properties you require. Alternatively you could use LINQ:
bool contains = list.Any(i => i.Id == obj.Id);
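To illustrate the IEquatable<T> option, here is a minimal sketch in which List<T>.Contains starts treating two different instances as the same item. Product and ProductFilter are hypothetical names, and the sketch assumes ProductId alone identifies an item.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class; ProductId is assumed to be the unique identifier.
public class Product : IEquatable<Product>
{
    public int ProductId { get; set; }

    // Picked up by EqualityComparer<Product>.Default, which List<T>.Contains uses.
    public bool Equals(Product other)
    {
        return other != null && ProductId == other.ProductId;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Product);
    }

    public override int GetHashCode()
    {
        return ProductId.GetHashCode();
    }
}

public static class ProductFilter
{
    // Same loop shape as in the question, now working by value.
    public static List<Product> Unique(IEnumerable<Product> source)
    {
        var result = new List<Product>();
        foreach (Product product in source)
        {
            if (!result.Contains(product))
            {
                result.Add(product);
            }
        }
        return result;
    }
}
```

Contains is still a linear scan, so for large lists Distinct or a HashSet<Product> is likely the better fit.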
Each categoryProductsResult is a different object instance, even when the values are the same. If you want something simpler and ProductId is your unique identifier, just use the code below:
foreach (CategoryProductsResult categoryProductsResult in categoryProductsResults.Where(c => !resultSet.Any(r => r.ProductId == c.ProductId)))
{
resultSet.Add(categoryProductsResult);
}
For reference types, the default Equals compares references, and the default GetHashCode is based on object identity. So Contains will never find a different instance that holds the same values, unless you override the Equals and GetHashCode implementations in the class.
This SO answer explains.
You need to check if your current item is contained in your target list for each iteration. Currently you check once at the start of the loop, which means none of your items is in the target list.
I think Distinct is already doing what you want, you might want to use this extension instead of your own loop.

Checking if same object is already present in a list

Here is the story:
I'm trying to make a list of different clusters. I only want to have the necessary clusters, and clusters can be the same.
How can I add to a list only after checking whether the list already contains the object? (I know objects can't be passed here.)
This is my sample code:
foreach (Cluster cluster in clustersByProgramme)
{
if (!clusterList.Contains(cluster))
{
clusterList.Add(cluster);
}
}
Your code should work; if it hasn't, you might be using different object instances that represent the same actual cluster, and you perhaps haven't provided a suitable Equals implementation (you should also update GetHashCode at the same time).
Also - in .NET 3.5, this could be simply:
var clusterList = clustersByProgramme.Distinct().ToList();
As an example of class that supports equality tests:
class Cluster // possibly also IEquatable<Cluster>
{
public string Name { get { return name; } }
private readonly string name;
public Cluster(string name) { this.name = name ?? ""; }
public override string ToString() { return Name; }
public override int GetHashCode() { return Name.GetHashCode(); }
public override bool Equals(object obj)
{
Cluster other = obj as Cluster;
return other != null && this.Name == other.Name;
}
}
Your example is about as simple as it is going to get. The only thing I could possibly recommend is that you use the Exists method:
The Predicate is a delegate to a
method that returns true if the object
passed to it matches the conditions
defined in the delegate. The elements
of the current List are individually
passed to the Predicate delegate, and
processing is stopped when a match is
found.
This method performs a linear search;
therefore, this method is an O(n)
operation, where n is Count.
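For illustration, the Exists approach could be wrapped in a helper like the one below. ExistsSketch, AddIfNoMatch and the sameItem delegate are hypothetical names; the caller supplies the rule that decides when two elements count as the same.

```csharp
using System;
using System.Collections.Generic;

public static class ExistsSketch
{
    // Adds 'candidate' only if no existing element matches it according to
    // 'sameItem'. Returns true when the candidate was added.
    public static bool AddIfNoMatch<T>(List<T> list, T candidate, Func<T, T, bool> sameItem)
    {
        // List<T>.Exists stops at the first element that satisfies the predicate.
        if (list.Exists(existing => sameItem(existing, candidate)))
        {
            return false;
        }
        list.Add(candidate);
        return true;
    }
}
```

This keeps the comparison rule outside the element class, at the cost of a linear search on every add.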
If you're using .NET 3.5, use a HashSet to do this.
HashSet<Cluster> clusterList = new HashSet<Cluster>();
foreach (Cluster cluster in clustersByProgramme)
{
clusterList.Add(cluster);
}
In this case, also make sure that if cluster1 == cluster2, then
cluster1.Equals(cluster2);
cluster2.Equals(cluster1); //yeah, could be different depending on your impl
cluster1.GetHashCode() == cluster2.GetHashCode();
Your code is correct, but it is not very efficient. You could instead use a HashSet<T> like this:
HashSet<Cluster> clusterSet = new HashSet<Cluster>();
foreach (Cluster cluster in clustersByProgramme)
clusterSet.Add(cluster);
Why not just use a dictionary?
Lookups are O(1) as long as your keys have a good hash.
It seems a simple solution:
i.e. dictionary.ContainsKey(key) is O(1),
and you can then update the existing entry, if needed, or add a new one.
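A minimal sketch of the dictionary idea, assuming each item has some non-null key that identifies it (for a cluster that might be its name). KeyedDeduplicator and IndexByKey are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class KeyedDeduplicator
{
    // Builds a dictionary that keeps the first item seen for each key.
    public static Dictionary<TKey, TItem> IndexByKey<TKey, TItem>(
        IEnumerable<TItem> items, Func<TItem, TKey> keySelector)
    {
        var byKey = new Dictionary<TKey, TItem>();
        foreach (TItem item in items)
        {
            TKey key = keySelector(item);
            // ContainsKey is a hash lookup, not a scan of the whole collection.
            if (!byKey.ContainsKey(key))
            {
                byKey.Add(key, item);
            }
        }
        return byKey;
    }
}
```

The deduplicated items are then available through the Values property of the returned dictionary.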
