Test if all values in a list are unique - C#

I have a small list of bytes and I want to test that they're all different values.
For instance, I have this:
List<byte> theList = new List<byte> { 1,4,3,6,1 };
What's the best way to check if all values are distinct or not?

bool isUnique = theList.Distinct().Count() == theList.Count();

Here's another approach which is more efficient than Enumerable.Distinct + Enumerable.Count (especially if the sequence is not a collection type). It uses a HashSet<T>, which eliminates duplicates, is very efficient for lookups, and has a Count property:
var distinctBytes = new HashSet<byte>(theList);
bool allDifferent = distinctBytes.Count == theList.Count;
Or another, more subtle and efficient, approach:
var diffChecker = new HashSet<byte>();
bool allDifferent = theList.All(diffChecker.Add);
HashSet<T>.Add returns false if the element could not be added because it was already in the HashSet, and Enumerable.All stops at the first false.

Okay, here is the most efficient method I can think of using standard .NET:
using System;
using System.Collections.Generic;

public static class Extension
{
    public static bool HasDuplicate<T>(
        this IEnumerable<T> source,
        out T firstDuplicate)
    {
        if (source == null)
        {
            throw new ArgumentNullException(nameof(source));
        }

        var checkBuffer = new HashSet<T>();
        foreach (var t in source)
        {
            if (checkBuffer.Add(t))
            {
                continue;
            }

            firstDuplicate = t;
            return true;
        }

        firstDuplicate = default(T);
        return false;
    }
}
Essentially, what is the point of enumerating the whole sequence twice if all you want to do is find the first duplicate?
I could optimise this further by special-casing empty and single-element sequences, but that would detract from readability/maintainability with minimal gain.
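For example, a quick usage sketch with the list from the question:
var theList = new List<byte> { 1, 4, 3, 6, 1 };
if (theList.HasDuplicate(out var firstDup))
    Console.WriteLine("First duplicate: " + firstDup); // prints 1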

Similar logic to Distinct, using GroupBy:
var isUnique = theList.GroupBy(i => i).Count() == theList.Count;

I check if an IEnumerable (array, list, etc.) is unique like this:
var isUnique = someObjectsEnum.GroupBy(o => o.SomeProperty).Max(g => g.Count()) == 1;
(Note that Max throws an InvalidOperationException on an empty sequence.)

One can also use a HashSet:
var uniqueIds = new HashSet<long>(originalList.Select(item => item.Id));
if (uniqueIds.Count != originalList.Count)
{
    // originalList contains duplicate ids
}

There are many solutions, and no doubt more beautiful ones using LINQ, as "juergen d" and "Tim Schmelter" mentioned.
But if you care about complexity and speed, the best solution is to implement it yourself.
One solution is to create an array of N size (for byte it's 256) and loop over the values: for each value, test the cell at the matching index; if it is already marked, the list is not distinct, otherwise mark the cell and continue checking.
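For example, a minimal sketch of that idea (the method name AllDistinct is just for illustration):

static bool AllDistinct(List<byte> values)
{
    var seen = new bool[256]; // one slot per possible byte value
    foreach (var b in values)
    {
        if (seen[b])
            return false; // slot already marked: duplicate found
        seen[b] = true;
    }
    return true;
}

This makes a single pass, allocates one small array, and needs no hashing at all.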

Another solution, if you want to find the duplicated values:
var values = new[] { 9, 7, 2, 6, 7, 3, 8, 2 };

var sorted = values.ToList();
sorted.Sort();

for (var index = 1; index < sorted.Count; index++)
{
    var previous = sorted[index - 1];
    var current = sorted[index];

    if (current == previous)
        Console.WriteLine(string.Format("duplicated value: {0}", current));
}
Output:
duplicated value: 2
duplicated value: 7
http://rextester.com/SIDG48202

Related

What does a combined All with Any LINQ query mean?

I have a line of code:
bool result = array1.All(a => array2.Any(t => t.field == a.field));
I do not understand this combination of All + Any.
Does it mean "if no field of array1 equals any field of array2, then return true"?
Would this not be the same as array1.Except(array2).Any()?
UPDATE
Accidentally, I put a "!" before the .Any()!
I think they are really different; it also depends on how your array is structured, i.e. whether it has only the field property or other properties as well.
Code inspection
array1.All(a => array2.Any(t => t.field == a.field));
Returns true if, for each element in array1, there's at least one element in array2 that has the same value for the field property.
array1.Except(array2).Any();
Returns true if there's at least one element of array1 not present in array2.
Now given your context, if field is the only property of your structure they produce the same result, but not if there are other things going on.
For example
struct Test
{
    public int Field { get; set; }
    public int OtherField { get; set; }
}
//...
var array1 = new Test[]
{
    new Test { Field = 0, OtherField = 1 },
    new Test { Field = 1, OtherField = 2 }
};
var array2 = new Test[]
{
    new Test { Field = 0, OtherField = 1 },
    new Test { Field = 2, OtherField = 2 }
};
First case: is it true that for each element of array1 there's at least one element in array2 with the same value in the field property? False.
Second case: is it true that at least one element of array1 is not present in array2? True.
That means it returns true if, for every item in array1, there is no item in array2 with the same value in field.
Simpler version:
For all items in array1, there is no item in array2 with the same value of field.
UPDATE:
The modified version is much simpler: it returns true if, for all items in array1, there is an item in array2 with the same value for field.
In summary, your first solution checks whether all the elements in array1 have a matching field value in some element of array2, which could also be translated to:
var areContained = !array1.Select(e => e.field).Except(array2.Select(d => d.field)).Any();
Another variant uses a hash set, which can help efficiency if the superset is too big:
HashSet<int> hashSet = new HashSet<int>(array2.Select(d => d.field)); // superset
bool contained = array1.Select(e => e.field).All(i => hashSet.Contains(i));
In your alternative you are comparing the array elements themselves using the default equality comparer, so that can produce a completely different result. Take a look at the example in this link.

How to remove first occurrence of element in List C# with LINQ?

I have the following list
List<string> listString = new List<string>() { "UserId", "VesselId", "AccountId", "VesselId" };
I would like to use a Linq operator which removes only the first occurrence of VesselId.
You cannot change the original collection with LINQ, so the closest thing would be:
var index = listString.IndexOf("VesselId");
if (index > -1)
    listString.RemoveAt(index);
EDIT 1:
As research done by Igor Mesaros shows, the code above re-implements the logic that already resides in the List.Remove method, so an even simpler solution would be:
listString.Remove("VesselId");
If you have a simple list of strings or some other primitive type, then you can just call:
listString.Remove("VesselId");
as mentioned by @Eyal Perry.
If you have a huge non-primitive list, this is the most efficient:
class MyClass
{
    public string Name { get; set; }
}

var listString = new List<MyClass>() { /* fill in with data */ };
var match = listString.FirstOrDefault(x => x.Name == "VesselId");
if (match != null)
    listString.Remove(match);
If what you are looking to do is get a distinct list, then there is an extension method for that: listString.Distinct().
If you absolutely MUST use LINQ, you can use it to find the first instance of "VesselId" inside the Remove() method, like so:
listString.Remove((from a in listString
                   where a == "VesselId"
                   select a).First());
listString = listString.Union(new List<string>()).ToList();
OR
List<string> CopyString = new List<string>();
CopyString.AddRange(listString);

foreach (var item in CopyString)
{
    var index = CopyString.IndexOf(item);
    if (index >= 0 && listString.Count(cnt => cnt == item) > 1)
        listString.RemoveAt(index);
}

C# - fastest way of comparing a collection against itself to find duplicates

public class TestObject
{
    public string TestValue { get; set; }
    public bool IsDuplicate { get; set; }
}

List<TestObject> testList = new List<TestObject>
{
    new TestObject { TestValue = "Matt" },
    new TestObject { TestValue = "Bob" },
    new TestObject { TestValue = "Alice" },
    new TestObject { TestValue = "Matt" },
    new TestObject { TestValue = "Claire" },
    new TestObject { TestValue = "Matt" }
};
Imagine testList is actually millions of objects long.
What's the fastest way to ensure that two of those three TestObjects with a TestValue of "Matt" get their IsDuplicate set to true? No matter how many instances of a given value there are, only one should come out of the process with IsDuplicate set to false.
I am not averse to doing this via threading. And the collection doesn't have to be a list if converting it to another collection type is faster.
I need to keep duplicates and mark them as such, not remove them from the collection.
To expand, this is (as you might imagine) a simple expression of a much more complex problem. The objects in question already have an ordinal which I can use to order them.
After matching initial duplicates on exact string equality, I'm going to have to go back through the collection again and re-try the remainder using some fuzzy matching logic. The collection that exists at the start of this process won't be changed during the deduplication, or afterwards.
Eventually the original collection is going to be written out to a file, with likely duplicates flagged.
As others mentioned, the correct approach here would be to use the HashSet class.
var hashSet = new HashSet<string>();

foreach (var obj in testList)
{
    if (!hashSet.Add(obj.TestValue))
    {
        obj.IsDuplicate = true;
    }
}
When you add a value to the HashSet for the first time, it is added successfully and HashSet.Add() returns true, so you don't make any changes to the item. When you try to add it a second time, HashSet.Add() returns false and you mark the item as a duplicate.
The list will be in the following state after running our duplicate-marking method:
Matt
Bob
Alice
Claire
Matt DUPLICATE
This is probably quite performant:
foreach (var dupe in testList.GroupBy(x => x.TestValue).SelectMany(g => g.Skip(1)))
    dupe.IsDuplicate = true;
[EDIT] This method turns out to be about a third of the speed of the accepted answer above, so that one should be used. This answer is merely of academic interest.
I would probably check for duplicates while building the collection, to avoid looping over millions of elements twice. If that scenario is possible, then I would use a Dictionary<string, List<TestObject>>:
Dictionary<string, List<TestObject>> myList = new Dictionary<string, List<TestObject>>();
while (NotEndOfData())
{
    TestObject obj = GetTestValue();
    if (myList.ContainsKey(obj.TestValue))
    {
        obj.IsDuplicate = true;
        myList[obj.TestValue].Add(obj);
    }
    else
    {
        obj.IsDuplicate = false;
        myList.Add(obj.TestValue, new List<TestObject>() { obj });
    }
}
SortedSet<string> sorted = new SortedSet<string>();
for (int i = 0; i < testList.Count; i++)
    testList[i].IsDuplicate = !sorted.Add(testList[i].TestValue);
As you have allowed in the question, I'd change testList to an array instead of a list to make the indexer faster.
Since you indicated that you have a property that keeps the ordinal of your items, we can use that property to reset the sort order back to the original after marking the items as duplicates.
The code below is self-explanatory, but just let me know in case you need any further explanation.
I have assumed that the property name is SortOrder. Modify the code accordingly.
void MarkDuplicates()
{
    testList = testList.OrderBy(f => f.TestValue).ThenBy(f => f.SortOrder).ToList();

    for (int i = 1; i < testList.Count; i++)
    {
        if (testList[i].TestValue == testList[i - 1].TestValue)
            testList[i].IsDuplicate = true;
    }

    testList = testList.OrderBy(f => f.SortOrder).ToList();
}
I'm not a performance expert. But you can time the various solutions provided here and check the performance for yourself.
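For example, a rough Stopwatch harness (MarkDuplicates here stands in for whichever approach above you want to measure; remember to reset the IsDuplicate flags between runs):

var sw = System.Diagnostics.Stopwatch.StartNew();
MarkDuplicates(); // placeholder for the approach under test
sw.Stop();
Console.WriteLine($"Elapsed: {sw.ElapsedMilliseconds} ms");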

How to iterate through two collections of the same length using a single foreach

I know this question has been asked many times before but I tried out the answers and they don't seem to work.
I have two lists of the same length but not of the same type, and I want to iterate through both of them at the same time, since list1[i] is connected to list2[i].
Eg:
Assuming that I have list1 (as List<string>) and list2 (as List<int>),
I want to do something like
foreach( var listitem1, listitem2 in list1, list2)
{
// do stuff
}
Is this possible?
This is possible using the .NET 4 LINQ Zip() operator, or using the open-source MoreLINQ library, which provides a Zip() operator as well so you can use it in earlier .NET versions.
Example from MSDN:
int[] numbers = { 1, 2, 3, 4 };
string[] words = { "one", "two", "three" };
// The following example concatenates corresponding elements of the
// two input sequences.
var numbersAndWords = numbers.Zip(words, (first, second) => first + " " + second);
foreach (var item in numbersAndWords)
{
    Console.WriteLine(item);
}
// OUTPUT:
// 1 one
// 2 two
// 3 three
Useful links:
Source code of the MoreLINQ Zip() implementation: MoreLINQ Zip.cs
Edit - Iterating whilst positioning at the same index in both collections
If the requirement is to move through both collections in a 'synchronized' fashion, i.e. to use the 1st element of the first collection with the 1st element of the second collection, then the 2nd with the 2nd, and so on, without needing to perform any side-effecting code, then see @sll's answer and use .Zip() to project out pairs of elements at the same index, until one of the collections runs out of elements.
More Generally
Instead of the foreach, you can access the IEnumerator from the IEnumerable of both collections using the GetEnumerator() method and then call MoveNext() on the collection when you need to move on to the next element in that collection. This technique is common when processing two or more ordered streams, without needing to materialize the streams.
var stream1Enumerator = stream1.GetEnumerator();
var stream2Enumerator = stream2.GetEnumerator();
var currentGroupId = -1; // Initial value

// i.e. until stream1Enumerator runs out of elements
while (stream1Enumerator.MoveNext())
{
    // Now you can iterate the collections independently
    if (stream1Enumerator.Current.Id != currentGroupId)
    {
        stream2Enumerator.MoveNext();
        currentGroupId = stream2Enumerator.Current.Id;
    }

    // Do something with stream1Enumerator.Current and stream2Enumerator.Current
}
As others have pointed out, if the collections are materialized and support indexing, such as via the IList interface, you can also use the subscript [] operator, although this feels rather clumsy nowadays:
var smallestUpperBound = Math.Min(collection1.Count, collection2.Count);
for (var index = 0; index < smallestUpperBound; index++)
{
    // Do something with collection1[index] and collection2[index]
}
Finally, there is also an overload of LINQ's .Select() which provides the index ordinal of the element returned, which could also be useful.
E.g. the below will pair up all elements of collection1 alternately with the first two elements of collection2:
var alternatePairs = collection1.Select(
    (item1, index1) => new
    {
        Item1 = item1,
        Item2 = collection2[index1 % 2]
    });
The short answer is no, you can't.
The longer answer is that this is because foreach is syntactic sugar: it gets an enumerator from the collection and calls MoveNext() on it, which is not possible for two collections at the same time.
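Roughly, the compiler expands a foreach over list1 into something like this (simplified; the real expansion also handles disposal via try/finally):

using (var e = list1.GetEnumerator())
{
    while (e.MoveNext())
    {
        var item = e.Current;
        // loop body
    }
}

There is only one enumerator per foreach, which is why a single foreach cannot advance two collections.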
If you just want to have a single loop, you can use a for loop and use the same index value for both collections.
for (int i = 0; i < collectionsLength; i++)
{
    // do stuff with list1[i] and list2[i]
}
An alternative is to merge both collections into one using the LINQ Zip operator (new in .NET 4.0) and iterate over the result.
foreach (var tup in list1.Zip(list2, (i1, i2) => Tuple.Create(i1, i2)))
{
    var listItem1 = tup.Item1;
    var listItem2 = tup.Item2;
    /* The "do stuff" from your question goes here */
}
It may well be, though, that much of your "do stuff" can go into the lambda that here creates a tuple, which would be even better.
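For example (DoStuff is a placeholder for your own logic):

var results = list1.Zip(list2, (item1, item2) => DoStuff(item1, item2)).ToList();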
If the collections are such that they can be indexed, then a for() loop is probably simpler still, though.
Update: Now with the built-in support for ValueTuple in C#7.0 we can use:
foreach ((var listitem1, var listitem2) in list1.Zip(list2, (i1, i2) => (i1, i2)))
{
    /* The "do stuff" from your question goes here */
}
You can wrap the two IEnumerable<>s in a helper class:
var nums = new[] { 1, 2, 3 };
var strings = new[] { "a", "b", "c" };

ForEach(nums, strings).Do((n, s) =>
{
    Console.WriteLine(n + " " + s);
});

//-----------------------------

public static TwoForEach<A, B> ForEach<A, B>(IEnumerable<A> a, IEnumerable<B> b)
{
    return new TwoForEach<A, B>(a, b);
}

public class TwoForEach<A, B>
{
    private IEnumerator<A> a;
    private IEnumerator<B> b;

    public TwoForEach(IEnumerable<A> a, IEnumerable<B> b)
    {
        this.a = a.GetEnumerator();
        this.b = b.GetEnumerator();
    }

    public void Do(Action<A, B> action)
    {
        while (a.MoveNext() && b.MoveNext())
        {
            action.Invoke(a.Current, b.Current);
        }
    }
}
Instead of a foreach, why not use a for()? For example...
int length = list1.Count;
for (int i = 0; i < length; i++)
{
    // do stuff with list1[i] and list2[i] here.
}

Removing duplicates from a list with "priority"

Given a collection of records like this:
    string ID1;
    string ID2;
    string Data1;
    string Data2;
    // :
    string DataN;
Initially Data1..N are null, and can pretty much be ignored for this question. ID1 and ID2 both uniquely identify the record. All records will have an ID2; some will also have an ID1. Given an ID2, there is a (time-consuming) method to get its corresponding ID1. Given an ID1, there is a (time-consuming) method to get Data1..N for the record. Our ultimate goal is to fill in Data1..N for all records as quickly as possible.
Our immediate goal is to (as quickly as possible) eliminate all duplicates in the list, keeping the one with more information.
For example, if Rec1 == {ID1="ABC", ID2="XYZ"} and Rec2 == {ID1=null, ID2="XYZ"}, then these are duplicates, BUT we must specifically remove Rec2 and keep Rec1.
That last requirement eliminates the standard ways of removing Dups (e.g. HashSet), as they consider both sides of the "duplicate" to be interchangeable.
How about you split your original list into 3: ones with all data, ones with ID1, and ones with just ID2.
Then do:
var unique = allData.Concat(id1Data.Except(allData))
                    .Concat(id2Data.Except(id1Data).Except(allData));
having defined equality just on the basis of ID2.
I suspect there are more efficient ways of expressing that, but the fundamental idea is sound as far as I can tell. Splitting the initial list into three is simply a matter of using GroupBy (and then calling ToList on each group to avoid repeated queries).
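For example, a sketch of the three-way split (assuming a non-null Data1 marks a fully populated record; shown with Where for clarity, though GroupBy works equally well):

var allData = records.Where(r => r.Data1 != null).ToList();
var id1Data = records.Where(r => r.Data1 == null && r.ID1 != null).ToList();
var id2Data = records.Where(r => r.Data1 == null && r.ID1 == null).ToList();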
EDIT: Potentially nicer idea: split the data up as before, then do:
var result = new HashSet<...>(allData);
result.UnionWith(id1Data);
result.UnionWith(id2Data);
I believe that UnionWith keeps the existing elements rather than overwriting them with new but equal ones. On the other hand, that's not explicitly specified. It would be nice for it to be well-defined...
(Again, either make your type implement equality based on ID2, or create the hash set using an equality comparer which does so.)
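A minimal ID2-based comparer for that, assuming a Record type with the ID fields from the question:

class ById2Comparer : IEqualityComparer<Record>
{
    public bool Equals(Record x, Record y) => x.ID2 == y.ID2;
    public int GetHashCode(Record obj) => obj.ID2.GetHashCode();
}

// var result = new HashSet<Record>(allData, new ById2Comparer());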
This may smell quite a bit, but I think a LINQ Distinct will still work for you if you ensure that the two compared objects come out to be the same. The following comparer would do this:
private class Comp : IEqualityComparer<Item>
{
    public bool Equals(Item x, Item y)
    {
        var equalityOfB = x.ID2 == y.ID2;
        if (x.ID1 == y.ID1 && equalityOfB)
            return true;

        if (x.ID1 == null && equalityOfB)
        {
            x.ID1 = y.ID1;
            return true;
        }

        if (y.ID1 == null && equalityOfB)
        {
            y.ID1 = x.ID1;
            return true;
        }

        return false;
    }

    public int GetHashCode(Item obj)
    {
        return obj.ID2.GetHashCode();
    }
}
Then you could use it on a list as such...
var l = new[] {
    new Item { ID1 = "a", ID2 = "b" },
    new Item { ID1 = null, ID2 = "b" } };
var l2 = l.Distinct(new Comp()).ToArray();
I had a similar issue a couple of months ago.
Try something like this...
public static List<T> RemoveDuplicateSections<T>(List<T> sections) where T : INamedObject
{
    Dictionary<string, int> uniqueStore = new Dictionary<string, int>();
    List<T> finalList = new List<T>();
    int i = 0;

    foreach (T currValue in sections)
    {
        if (!uniqueStore.ContainsKey(currValue.Name))
        {
            uniqueStore.Add(currValue.Name, 0);
            finalList.Add(sections[i]);
        }
        i++;
    }

    return finalList;
}
records.GroupBy(r => r, new RecordByIDsEqualityComparer())
       .Select(g => g.OrderByDescending(r => r, new RecordByFullnessComparer()).First())
or, if you want to merge the records, then Aggregate instead of OrderByDescending/First.
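For example, a merging sketch that keeps, per ID2, a record that has an ID1 whenever one exists (the merge rule here is an assumption; adapt it to however two duplicates should be combined):

var merged = records
    .GroupBy(r => r.ID2)
    .Select(g => g.Aggregate((best, next) => next.ID1 != null ? next : best))
    .ToList();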
