C# how to avoid duplicates in a list?

How can I avoid duplicates in a list?
One way is to check whether the element already exists each time I add a new item, but that means more code and iterating over the whole list on every add.
Another way is to use a HashSet: when I try to add an item, it checks for me whether the item exists, adds it if it doesn't, and does nothing if it does.
But I understand that a HashSet is less efficient and needs more resources than a list, so I don't know whether avoiding duplicates is a good use of a HashSet.
Is there any other alternative?
Thanks.

You can achieve this in a single line of code:
List<long> longs = new List<long> { 1, 2, 3, 4, 3, 2, 5 };
List<long> unique = longs.Distinct().ToList();
unique will contain only 1, 2, 3, 4, 5. (Distinct is a LINQ method, so this needs using System.Linq.)

A List cannot prevent duplicates by itself: it does not verify items when they are added.
If you don't care about the order of items, use a HashSet.
If you want to preserve the order of items and still be sure that all items are unique, you should write your own list class, i.e. something which implements the IList<T> interface. (There is a little ambiguity here: should an item appear at the index of its first addition or of its last addition?)
public class ListWithoutDuplicates<T> : IList<T>
You have different options here. For example, you should decide what is more important for you: fast addition or memory consumption. For fast Add and Contains operations you need some hash-based data structure, which is unordered. Here is a sample implementation that uses a HashSet to store the hash codes of all items held in the internal list. You will need the following fields:
private readonly HashSet<int> hashes = new HashSet<int>();
private readonly List<T> items = new List<T>();
private static readonly Comparer<T> comparer = Comparer<T>.Default;
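Adding items is simple (warning: no null checks here or below): use the item's hash code for a quick O(1) check of whether it has already been added. Use the same approach for removing items. Be aware that this sample compares hash codes only, so two different items that happen to share a hash code are treated as duplicates: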
public void Add(T item)
{
var hash = item.GetHashCode();
if (hashes.Contains(hash))
return;
hashes.Add(hash);
items.Add(item);
}
public bool Remove(T item)
{
var hash = item.GetHashCode();
if (!hashes.Contains(hash))
return false;
hashes.Remove(item.GetHashCode());
return items.Remove(item);
}
Some index-based operations (note that Insert below replaces the item at the given index rather than shifting later items):
public int IndexOf(T item)
{
var hash = item.GetHashCode();
if (!hashes.Contains(hash))
return -1;
return items.IndexOf(item);
}
public void Insert(int index, T item)
{
var itemAtIndex = items[index];
if (comparer.Compare(item, itemAtIndex) == 0)
return;
var hash = item.GetHashCode();
if (!hashes.Contains(hash))
{
hashes.Remove(itemAtIndex.GetHashCode());
items[index] = item;
hashes.Add(hash);
return;
}
throw new ArgumentException("Cannot add duplicate item");
}
public void RemoveAt(int index)
{
var item = items[index];
hashes.Remove(item.GetHashCode());
items.RemoveAt(index);
}
And left-overs:
public T this[int index]
{
get { return items[index]; }
set { Insert(index, value); }
}
public int Count => items.Count;
public bool Contains(T item) => hashes.Contains(item.GetHashCode());
public IEnumerator<T> GetEnumerator() => items.GetEnumerator();
IEnumerator IEnumerable.GetEnumerator() => items.GetEnumerator();
That's it. Now you have a list implementation which adds each item only once (the first time). E.g.
var list = new ListWithoutDuplicates<int> { 1, 2, 1, 3, 5, 2, 5, 3, 4 };
This will create a list with items 1, 2, 3, 5, 4. Note: if memory consumption is more important than performance, then instead of storing hashes use the items.Contains operation, which is O(n).
BTW, what we just did is actually an IList decorator.
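As a hedged alternative sketch of the same idea, the class below derives from System.Collections.ObjectModel.Collection<T> and keeps a HashSet<T> of the items themselves, so uniqueness is decided by Equals rather than by hash code alone. The name UniqueCollection is hypothetical and not part of .NET; skipping duplicates silently on Add and throwing on the indexer setter are assumptions carried over from the sample above.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical class name; a sketch, not a library type.
public class UniqueCollection<T> : Collection<T>
{
    // Holds the items themselves, so hash collisions are resolved by Equals.
    private readonly HashSet<T> seen = new HashSet<T>();

    // Collection<T>.Add and Insert both end up here; duplicates are skipped.
    protected override void InsertItem(int index, T item)
    {
        if (seen.Add(item))
            base.InsertItem(index, item);
    }

    // The indexer setter ends up here; replacing with an existing item throws.
    protected override void SetItem(int index, T item)
    {
        T old = this[index];
        if (EqualityComparer<T>.Default.Equals(old, item))
            return;
        if (seen.Contains(item))
            throw new ArgumentException("Cannot add duplicate item");
        seen.Remove(old);
        seen.Add(item);
        base.SetItem(index, item);
    }

    // Remove and RemoveAt both end up here.
    protected override void RemoveItem(int index)
    {
        seen.Remove(this[index]);
        base.RemoveItem(index);
    }

    protected override void ClearItems()
    {
        seen.Clear();
        base.ClearItems();
    }
}
```

With this sketch, new UniqueCollection<int> { 1, 2, 1, 3, 5, 2, 5, 3, 4 } holds 1, 2, 3, 5, 4. Note that Collection<T>.Contains still does a linear search over the inner list, because it is not virtual.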

A List is a data-structure that may contain duplicates. Duplicate elements are disambiguated by their index.
One way is when I will add a new item, check first if the element exists, but this make me use more code and iterate all the list to check if it exists.
This is possible, but it is error-prone and slow. You will need to iterate through the entire list every time you want to add an element. It is also possible that you will forget to check somewhere in your code.
Another way I could use a hashset, that if I try to add a new item, itself check if the item exists, if not, it will add the new item, if exists, then do nothing.
This is the preferred way. It is best to use the standard library to enforce the constraints that you want.
But I know that the hashset is less efficient, need more resources than a list, so I don't know if using a hashset to avoid duplicates it is a good use of the hashset.
The efficiency depends on what you are trying to do; see https://stackoverflow.com/a/23949528/1256041.
There are any other alternative?
You could implement your own ISet using List. This would make insertion much slower (you would need to iterate the whole collection), but you would gain O(1) random-access.

A HashSet is the best way to check whether an item exists because the lookup is O(1).
So you can insert the items into both a list and a HashSet,
and before inserting a new item, check whether it exists in the HashSet.
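A minimal sketch of that list-plus-HashSet idea follows. The class and method names (DedupExample, AddIfNew) are hypothetical; the only library behavior relied on is that HashSet<T>.Add returns false when the item is already present.

```csharp
using System.Collections.Generic;

public static class DedupExample
{
    // Appends item to list only if it has not been seen before.
    // Returns true when the item was added, false when it was a duplicate.
    public static bool AddIfNew<T>(List<T> list, HashSet<T> seen, T item)
    {
        // HashSet<T>.Add does the existence check and the insert in one call.
        if (!seen.Add(item))
            return false;
        list.Add(item);
        return true;
    }
}
```

The list keeps insertion order and index access, while the set pays for the fast duplicate check with extra memory.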

Related

Get last duplicate element in a list

I have a list that contains duplicate items.
List<string> filterList = new List<string>()
{
"postpone", "access", "success", "postpone", "success"
};
I get the output postpone, access, success by using
List<string> filter = filterList.Distinct().ToList();
string a = string.Join(",", filter.Select(a => a).ToArray());
Console.WriteLine(a);
I have seen other examples that use GroupBy to get the latest element, since they have other fields such as an ID. Now I only have the string; how can I get the last occurrence of each item in the list, which is access, postpone, success? Any suggestions?
One way to do this would be to use the index of the item in the original collection along with GroupBy. For example,
var lastDistinct = filterList.Select((x,index)=> new {Value=x,Index=index})
.GroupBy(x=>x.Value)
.Select(x=> x.Last())
.OrderBy(x=>x.Index)
.Select(x=>x.Value);
var result = string.Join(",",lastDistinct);
Output
access,postpone,success
An OrderedDictionary does this. All you have to do is add your items to it with the logic "if it's in the dictionary, remove it; then add it". OrderedDictionary preserves insertion order, so removing an earlier entry and re-adding it moves it to the end of the dictionary.
var d = new OrderedDictionary();
filterList.ForEach(x => { if(d.Contains(x)) d.Remove(x); d[x] = null; });
Your d.Keys is now a list of strings
access
postpone
success
OrderedDictionary is in the System.Collections.Specialized namespace.
If you wanted the keys as a CSV, you can use Cast to turn them from object to string
var s = string.Join(",", d.Keys.Cast<string>());
Your input list is only of type string, so using groupBy doesn't really add anything. If you consider your code, your first line gives you the distinct list, you only lose the distinct items because you did a string.join on line 2. All you need to do is add a line before you join:
List<string> filter = filterList.Distinct().ToList();
string last = filter.LastOrDefault();
string a = string.Join(",", filter.Select(a => a).ToArray());
Console.WriteLine(a);
I suppose you could make your code more terse because you need neither .Select(a => a) nor .ToArray() in your call to string.Join.
GroupBy would be used if you had a list of class/struct/record/tuple items, where you might want to group by a specific key (or keys) rather than using Distinct() on the whole thing. GroupBy is very useful and you should learn that, and also the ToDictionary and ToLookup LINQ helper functionality.
So why shouldn't you return the first occurrence of "postpone"? Because later in the sequence you see the same word "postpone" again. Why would you return the first occurrence of "access"? Because later in the sequence you don't see this word anymore.
So: return a word if the rest of the sequence does not have this word.
This would be easy in LINQ, with recursion, but it is not very efficient: for every word you would have to check the rest of the sequence to see if the word is in the rest.
It would be way more efficient to remember the highest index on which you found a word.
This can be written as an extension method. If you are not familiar with extension methods, see extension methods demystified.
private static IEnumerable<T> FindLastOccurrences<T>(this IEnumerable<T> source)
{
return FindLastOccurrences<T>(source, null);
}
private static IEnumerable<T> FindLastOccurrences<T>(this IEnumerable<T> source,
IEqualityComparer<T> comparer)
{
// TODO: check source not null
if (comparer == null) comparer = EqualityComparer<T>.Default;
Dictionary<T, int> dictionary = new Dictionary<T, int>(comparer);
int index = 0;
foreach (T item in source)
{
// did we already see this T? = is this in the dictionary
if (dictionary.TryGetValue(item, out int highestIndex))
{
// we already saw it at index highestIndex.
dictionary[item] = index;
}
else
{
// it is not in the dictionary, we never saw this item.
dictionary.Add(item, index);
}
++index;
}
// return the keys after sorting by value (which contains the highest index)
return dictionary.OrderBy(keyValuePair => keyValuePair.Value)
.Select(keyValuePair => keyValuePair.Key);
}
So for every item in the source sequence, we check if it is in the dictionary. If not, we add the item as key to the dictionary. The value is the index.
If it is already in the dictionary, then the value was the highest index of where we found this item before. Apparently the current index is higher, so we replace the value in the dictionary.
Finally we order the key value pairs in the dictionary by ascending value, and return only the keys.
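The same result can be reached without sorting, shown here as a hedged sketch rather than the method above: walk the list from the end, keep the first time each item is met (which is its last occurrence), then reverse. The names LastOccurrenceExample and KeepLast are hypothetical, and the sketch assumes the source supports index access (IList<T>).

```csharp
using System.Collections.Generic;

public static class LastOccurrenceExample
{
    // Keeps only the last occurrence of each item, preserving relative order.
    public static List<T> KeepLast<T>(IList<T> source)
    {
        var seen = new HashSet<T>();
        var result = new List<T>();
        // Walking backwards, the first time we meet an item is its last occurrence.
        for (int i = source.Count - 1; i >= 0; i--)
        {
            if (seen.Add(source[i]))
                result.Add(source[i]);
        }
        // The items were collected back to front, so restore the original direction.
        result.Reverse();
        return result;
    }
}
```

For the list postpone, access, success, postpone, success this yields access, postpone, success.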

Iterate over C# dictionary's keys with index?

How do I iterate over a Dictionary's keys while maintaining the index of the key.
What I've done is merge a foreach-loop with a local variable i which gets incremented by one for every round of the loop.
Here's my code that works:
public void IterateOverMyDict()
{
int i=-1;
foreach (string key in myDict.Keys)
{
i++;
Console.Write(i.ToString() + " : " + key);
}
}
However, it seems really low tech to use a local variable i.
I was wondering if there's a way where I don't have to use the "extra" variable?
Not saying this is a bad way, but is there a better one?
There's no such concept as "the index of the key". You should always treat a Dictionary<TKey, TValue> as having an unpredictable order - where the order which you happen to get when iterating over it may change. (So in theory, you could add one new entry, and the entries could be in a completely different order next time you iterated over them. In theory this could even happen without you changing the data, but that's less likely in normal implementations.)
If you really want to get the numeric index which you happened to observe this time, you could use:
foreach (var x in dictionary.Select((Entry, Index) => new { Entry, Index }))
{
Console.WriteLine("{0}: {1} = {2}", x.Index, x.Entry.Key, x.Entry.Value);
}
... but be aware that that's a fairly misleading display, as it suggests an inherent ordering.
From the documentation:
For purposes of enumeration, each item in the dictionary is treated as a KeyValuePair<TKey, TValue> structure representing a value and its key. The order in which the items are returned is undefined.
EDIT: If you don't like the Select call here, you could create your own extension method:
public struct IndexedValue<T>
{
private readonly T value;
private readonly int index;
public T Value { get { return value; } }
public int Index { get { return index; } }
public IndexedValue(T value, int index)
{
this.value = value;
this.index = index;
}
}
public static class Extensions
{
public static IEnumerable<IndexedValue<T>> WithIndex<T>
(this IEnumerable<T> source)
{
return source.Select((value, index) => new IndexedValue<T>(value, index));
}
}
Then your loop would be:
foreach (var x in dictionary.WithIndex())
{
Console.WriteLine("{0}: {1} = {2}", x.Index, x.Value.Key, x.Value.Value);
}
Technically, the key is the index in a Dictionary<TKey, TValue>. You're not guaranteed to get the items in any specific order, so there's really no numeric index to be applied.
Not really. Note that keys in a dictionary are not logically "ordered". They don't have an index. There is no first or last key, from the Dictionary's point of view. You can keep track on your own whether this is the first key returned by the enumerator, as you are doing, but the Dictionary has no concept of "give me the 5th key", so you couldn't use a for loop with an indexer as you could with a list or array.
Dictionaries are not exactly lists, arrays, or vectors. They take those constructs a step further. The key can be the index:
Dictionary<int, string> myDictionary = new Dictionary<int, string>()
{
    {0, "cat"},
    {1, "dog"},
    {2, "pig"},
    {3, "horse"}
};
myDictionary[4] = "hat";
for (int i = 0; i < 5; i++)
{
    Console.WriteLine(myDictionary[i]);
}
At this point you are probably missing most of the benefits of a dictionary (fast lookup by key) and are using it like a list.
The Select((Entry, Index) => new { Entry, Index }) approach is probably best in the specific context of this question but, as an alternative, System.Linq.Enumerable now lets you convert a dictionary into a list. Something like this would work:
var x = dictionary.ToList();
for (int y=0; y<x.Count; y++) Console.WriteLine(y + " = " + x[y].Key);
There are pros & cons to both approaches depending on what you're trying to do.

Adding an IList item to a particular index number

Our Client's database returns a set of prices in an array, but they sometimes don't include all prices, i.e., they have missing elements in their array. We return what we find as an IList, which works great when we retrieve content from the database. However, we are having difficulties setting the elements in the proper position in the array.
Is it possible to create an IList then add an element at a particular position in the IList?
var myList = new List<Model>();
var myModel = new Model();
myList[3] = myModel; // Something like what we would want to do
Use IList<T>.Insert(int index,T item)
IList<string> mylist = new List<string>(15);
mylist.Insert(0, "hello");
mylist.Insert(1, "world");
mylist.Insert(15, "batman"); // Throws ArgumentOutOfRangeException: index 15 is greater than Count (2); the capacity of 15 does not change Count.
From MSDN
If index equals the number of items in the IList, then item is appended to the list.
In collections of contiguous elements, such as lists, the elements that follow the insertion point move down to accommodate the new element. If the collection is indexed, the indexes of the elements that are moved are also updated. This behavior does not apply to collections where elements are conceptually grouped into buckets, such as a hash table.
Use IList.Insert Method.
Lists grow dynamically to accommodate items as they are added. You would have to initialize the list with a predefined size. The easiest way I can think of to do that would be:
var myList = new Model[100].ToList();
That'll give you a list with 100 items, all null. You're then free to assign a value to myList[3].
Note that in your code you are trying to instantiate an IList<Model> which isn't possible - you need a concrete type (like List<Model>) rather than an interface.
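To make the capacity-versus-count point concrete, here is a small sketch. The helper name SetAt is hypothetical: it pads the list with default values until the requested index exists and then assigns to it. It assumes a resizable list; a fixed-size IList such as an array would throw NotSupportedException on Add.

```csharp
using System.Collections.Generic;

public static class SparseListExample
{
    // Hypothetical helper: grows the list with default(T) entries until
    // 'index' is a valid position, then stores 'item' there.
    public static void SetAt<T>(IList<T> list, int index, T item)
    {
        while (list.Count <= index)
            list.Add(default(T));
        list[index] = item;
    }
}
```

For example, new List<string>(15) starts with Count == 0 even though its capacity is 15; after SetAt(list, 3, "x") the list has four entries, with null in positions 0 to 2.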
It will insert and resize if needed
public static IList<T> InsertR<T>(this IList<T> ilist, int index, T item) {
if (!(index < ilist.Count)) {
T[] array = Array.CreateInstance(typeof(T), index + 1) as T[];
ilist.CopyTo(array, 0);
array[index] = item;
if (ilist.GetType().IsArray) {
ilist = array;
} else {
ilist = array.ToList();
}
} else
ilist[index] = item;
return ilist;
}
or
public static IList InsertR<T>(this IList ilist, int index, T item) {
if (!(index < ilist.Count)) {
T[] array = Array.CreateInstance(typeof(T), index + 1) as T[];
ilist.CopyTo(array, 0);
array[index] = item;
if (ilist.GetType().IsArray) {
ilist = array;
} else {
ilist = array.ToList();
}
} else
ilist[index] = item;
return ilist;
}

Can't add/remove items from a collection while foreach is iterating over it

If I make my own implementation of the IEnumerator interface, then I am able (inside a foreach statement) to add or remove items from albumsList without generating an exception. But if the foreach statement uses the IEnumerator supplied by albumsList, then trying to add/delete items from albumsList (inside the foreach) will result in an exception:
class Program
{
static void Main(string[] args)
{
string[] rockAlbums = { "rock", "roll", "rain dogs" };
ArrayList albumsList = new ArrayList(rockAlbums);
AlbumsCollection ac = new AlbumsCollection(albumsList);
foreach (string item in ac)
{
Console.WriteLine(item);
albumsList.Remove(item); //works
}
foreach (string item in albumsList)
{
albumsList.Remove(item); //exception
}
}
class MyEnumerator : IEnumerator
{
ArrayList table;
int _current = -1;
public Object Current
{
get
{
return table[_current];
}
}
public bool MoveNext()
{
if (_current + 1 < table.Count)
{
_current++;
return true;
}
else
return false;
}
public void Reset()
{
_current = -1;
}
public MyEnumerator(ArrayList albums)
{
this.table = albums;
}
}
class AlbumsCollection : IEnumerable
{
public ArrayList albums;
public IEnumerator GetEnumerator()
{
return new MyEnumerator(this.albums);
}
public AlbumsCollection(ArrayList albums)
{
this.albums = albums;
}
}
}
a) I assume code that throws exception ( when using IEnumerator implementation A supplied by albumsList ) is located inside A?
b) If I want to be able to add/remove items from a collection ( while foreach is iterating over it), will I always need to provide my own implementation of IEnumerator interface, or can albumsList be set to allow adding/removing items?
thank you
Easiest way is to either reverse through the items like for(int i = items.Count-1; i >=0; i--), or loop once, gather all the items to remove in a list, then loop through the items to remove, removing them from the original list.
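Both suggestions can be sketched as follows. The class and method names are hypothetical, and "starts with a given prefix" is only a stand-in condition for whatever decides that an item should be removed.

```csharp
using System;
using System.Collections.Generic;

public static class RemoveWhileLoopingExample
{
    // Option 1: iterate from the end, so a removal never shifts an index
    // that has not been visited yet.
    public static void RemoveBackwards(List<string> items, string prefix)
    {
        for (int i = items.Count - 1; i >= 0; i--)
        {
            if (items[i].StartsWith(prefix, StringComparison.Ordinal))
                items.RemoveAt(i);
        }
    }

    // Option 2: collect the items to remove first, then remove them
    // after the enumeration of the original list has finished.
    public static void RemoveCollected(List<string> items, string prefix)
    {
        var toRemove = new List<string>();
        foreach (string item in items)
        {
            if (item.StartsWith(prefix, StringComparison.Ordinal))
                toRemove.Add(item);
        }
        foreach (string item in toRemove)
            items.Remove(item);
    }
}
```

The first option avoids the second list and the linear search inside Remove; the second works for collections that have no index.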
Generally it's discouraged to design collection classes that allow you to modify the collection while enumerating, unless your intention is to design something thread-safe specifically so that this is possible (e.g., adding from one thread while enumerating from another).
The reasons are myriad. Here's one.
Your MyEnumerator class works by incrementing an internal counter. Its Current property exposes the value at the given index in an ArrayList. What this means is that enumerating over the collection and removing "each" item will actually not work as expected (i.e., it won't remove every item in the list).
Consider this possibility:
The code you posted will actually do this:
You start by incrementing your index to 0, which gives you a Current of "rock." You remove "rock."
Now the collection has ["roll", "rain dogs"] and you increment your index to 1, making Current equal to "rain dogs" (NOT "roll"). Next, you remove "rain dogs."
Now the collection has ["roll"], and the next index would be 2 (which is > Count), so your enumerator thinks it's finished even though "roll" was never visited.
There are other reasons this is a problematic implementation, though. For instance someone using your code might not understand how your enumerator works (nor should they -- the implementation should really not matter), and therefore not realize that the cost of calling Remove within a foreach block incurs the penalty of IndexOf -- i.e., a linear search -- on every iteration (see the MSDN documentation on ArrayList.Remove to verify this).
Basically, what I'm getting at is: you don't want to be able to remove items from within a foreach loop (again, unless you're designing something thread-safe... maybe).
OK, so what is the alternative? Here are a few points to get you started:
Don't design your collection to allow -- let alone expect -- modification within an enumeration. It leads to curious behavior such as the example I provided above.
Instead, if you want to provide bulk removal capabilities, consider methods such as Clear (to remove all items) or RemoveAll (to remove items matching a specified filter).
These bulk-removal methods can be implemented fairly easily. ArrayList already has a Clear method, as do most of the collection classes you might use in .NET. Otherwise, if your internal collection is indexed, a common method to remove multiple items is by enumerating from the top index using a for loop and calling RemoveAt on indices where removal is desired (notice this fixes two problems at once: by going backwards from the top, you ensure accessing each item in the collection; moreover, by using RemoveAt instead of Remove, you avoid the penalty of repeated linear searches).
As an added note, I would strongly encourage steering clear of non-generic collections such as ArrayList to begin with. Go with strongly typed, generic counterparts such as List<Album> instead (assuming you had an Album class; otherwise, List<string> is still more typesafe than ArrayList).
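As a brief sketch of the bulk-removal suggestion with a generic list: List<T>.RemoveAll takes a predicate, removes every matching item in one pass, and returns how many were removed. The class name, method name and the "shorter than minLength" filter are hypothetical.

```csharp
using System.Collections.Generic;

public static class BulkRemovalExample
{
    // Removes every title shorter than minLength and returns the number removed.
    public static int RemoveShortTitles(List<string> albums, int minLength)
    {
        // No foreach over the list being modified, so no enumerator is invalidated.
        return albums.RemoveAll(title => title.Length < minLength);
    }
}
```

Called with the albums rock, roll, rain dogs and a minimum length of 5, this removes two items and leaves rain dogs.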
Suppose I have a collection, an array for that matter
int[] a = { 1, 2, 3, 4, 5 };
I have a function
public IList<int> myiterator()
{
List<int> lst = new List<int>();
for (int i = 0; i <= 4; i++)
{
lst.Add(a[i]);
}
return lst;
}
Now I call this function, iterate over the result, and try to add to it:
var list = myiterator();
foreach (var item in list)
{
    list.Add(29);
}
This will cause a runtime exception (InvalidOperationException).
The thing to notice is that if we were allowed to add an item for each element in the list, the list would become something like {1, 2, 3, 4, 5, 29}. The loop would then also visit every newly added element and add yet another one, so we would be stuck in an infinite loop.
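The sketch below shows where the exception surfaces for List<T>; the class and method names are hypothetical. The Add call itself succeeds. The enumerator notices on its next step that the list has changed since enumeration started and throws InvalidOperationException, which is what prevents the endless loop described above.

```csharp
using System;
using System.Collections.Generic;

public static class ModifiedDuringEnumerationExample
{
    // Returns true if adding inside a foreach raised InvalidOperationException.
    public static bool AddInsideForeachThrows()
    {
        var list = new List<int> { 1, 2, 3, 4, 5 };
        try
        {
            foreach (int value in list)
            {
                // The Add itself works; it marks the list as changed.
                list.Add(29);
            }
        }
        catch (InvalidOperationException)
        {
            // Raised by the list's enumerator when the loop asks for the next item.
            return true;
        }
        return false;
    }
}
```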
From the MSDN documentation for INotifyCollectionChanged:
You can enumerate over any collection that implements the IEnumerable interface. However, to set up dynamic bindings so that insertions or deletions in the collection update the UI automatically, the collection must implement the INotifyCollectionChanged interface. This interface exposes the CollectionChanged event that must be raised whenever the underlying collection changes.
WPF provides the ObservableCollection<T> class, which is a built-in implementation of a data collection that exposes the INotifyCollectionChanged interface. For an example, see How to: Create and Bind to an ObservableCollection.
The individual data objects within the collection must satisfy the requirements described in the Binding Sources Overview.
Before implementing your own collection, consider using ObservableCollection<T> or one of the existing collection classes, such as List<T>, Collection<T>, and BindingList<T>, among many others.
If you have an advanced scenario and want to implement your own collection, consider using IList, which provides a non-generic collection of objects that can be individually accessed by index and provides the best performance.
Sounds to me that the problem is in the Collection itself, and not its Enumerator.

Removing duplicate string from List (.NET 2.0!)

I'm having issues finding the most efficient way to remove duplicates from a list of strings (List).
My current implementation is a dual foreach loop checking the instance count of each object being only 1, otherwise removing the second.
I know there are MANY other questions out there, but all of the best solutions require something newer than .NET 2.0, which is the build environment I'm currently working in. (GM and Chrysler are very resistant to changes ... :) )
This limits the possible results by not allowing any LINQ, or HashSets.
The code I'm using is Visual C++, but a C# solution will work just fine as well.
Thanks!
This probably isn't what you're looking for, but if you have control over this, the most efficient way would be to not add them in the first place...
Do you have control over this? If so, all you'd need to do is a myList.Contains(currentItem) call before you add the item and you're set
You could do the following.
List<string> list = GetTheList();
Dictionary<string,object> map = new Dictionary<string,object>();
int i = 0;
while ( i < list.Count ) {
string current = list[i];
if ( map.ContainsKey(current) ) {
list.RemoveAt(i);
} else {
i++;
map.Add(current,null);
}
}
This has the overhead of building a Dictionary<TKey,TValue> object which will duplicate the list of unique values in the list. But it's fairly efficient speed wise.
I'm no Comp Sci PhD, but I'd imagine using a dictionary, with the items in your list as the keys would be fast.
Since a dictionary doesn't allow duplicate keys, you'd only have unique strings at the end of iteration.
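Just remember, when providing a custom class, to override the Equals() method (and GetHashCode(), if the class is used as a dictionary key) so that Contains() works as required.
Example
List<CustomClass> clz = new List<CustomClass>();
public class CustomClass
{
    public override bool Equals(object param)
    {
        // Put equality code here...
    }
    public override int GetHashCode()
    {
        // Return equal hash codes for objects that Equals treats as equal...
    }
}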
If you're going the route of "just don't add duplicates", then checking "List.Contains" before adding an item works, but it's O(n^2), where n is the number of strings you want to add. It's no different from your current solution using two nested loops.
You'll have better luck using a hashset to store items you've already added, but since you're using .NET 2.0, a Dictionary can substitute for a hash set:
static List<T> RemoveDuplicates<T>(List<T> input)
{
List<T> result = new List<T>(input.Count);
Dictionary<T, object> hashSet = new Dictionary<T, object>();
foreach (T s in input)
{
if (!hashSet.ContainsKey(s))
{
result.Add(s);
hashSet.Add(s, null);
}
}
return result;
}
This runs in O(n) time and uses O(n) extra space (the result list plus the dictionary); it will generally work very well for up to 100K items. Actual performance depends on the average length of the strings. If you really need maximum performance, you can exploit more powerful data structures such as tries to make inserts even faster.
