How do I verify a collection of values is unique (contains no duplicates) in C#

How do I verify a collection of values is unique (contains no duplicates) in C# - c#

Surely there is an easy way to verify a collection of values has no duplicates [using the default Comparison of the collection's Type] in C#/.NET ? Doesn't have to be directly built in but should be short and efficient.
I've looked a lot but I keep hitting examples of using collection.Count() == collection.Distinct().Count() which for me is inefficient. I'm not interested in the result and want to bail out as soon as I detect a duplicate, should that be the case.
(I'd love to delete this question and/or its answer if someone can point out the duplicates)

Okay, if you just want to get out as soon as the duplicate is found, it's simple:
// TODO: add an overload taking an IEqualityComparer<T>
public bool AllUnique<T>(this IEnumerable<T> source)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
var distinctItems = new HashSet<T>();
foreach (var item in source)
{
if (!distinctItems.Add(item))
{
return false;
}
}
return true;
}
... or use All, as you've already shown. I'd argue that this is slightly simpler to understand in this case... or if you do want to use All, I'd at least separate the creation of the set from the method group conversion, for clarity:
public static bool IsUnique<T>(this IEnumerable<T> source)
{
// TODO: validation
var distinctItems = new HashSet<T>();
// Add will return false if the element already exists. If
// every element is actually added, then they must all be unique.
return source.All(distinctItems.Add);
}

Doing it inline, you can replace:
collection.Count() == collection.Distinct().Count()
with
collection.All( new HashSet<T>().Add );
(where T is the type of your collection's elements)
Or you can extract the above to a helper extension method[1] so you can say:
collection.IsUnique()
[1]
static class EnumerableUniquenessExtensions
{
public static bool IsUnique<T>(this IEnumerable<T> that)
{
return that.All( new HashSet<T>().Add );
}
}
(and as Jon has pointed out in his answer, one really should separate and comment the two lines as such 'cuteness' is generally Not A Good Idea)

Related

Does Any() stop on success?

To be more specific: will the Linq extension method Any(IEnumerable collection, Func predicate) stop checking all the remaining elements of the collections once the predicate has yielded true for an item?
Because I don't want to spend to much time on figuring out if I need to do the really expensive parts at all:
if(lotsOfItems.Any(x => x.ID == target.ID))
//do expensive calculation here
So if Any is always checking all the items in the source this might end up being a waste of time instead of just going with:
var candidate = lotsOfItems.FirstOrDefault(x => x.ID == target.ID)
if(candicate != null)
//do expensive calculation here
because I'm pretty sure that FirstOrDefault does return once it got a result and only keeps going through the whole Enumerable if it does not find a suitable entry in the collection.
Does anyonehave information about the internal workings of Any, or could anyone suggest a solution for this kind of decision?
Also, a colleague suggested something along the lines of:
if(!lotsOfItems.All(x => x.ID != target.ID))
since this is supposed to stop once the conditions returns false for the first time but I'm not sure on that, so if anyone could shed some light on this as well it would be appreciated.

As we see from the source code, Yes:
internal static bool Any<T>(this IEnumerable<T> source, Func<T, bool> predicate) {
foreach (T element in source) {
if (predicate(element)) {
return true; // Attention to this line
}
}
return false;
}
Any() is the most efficient way to determine whether any element of a sequence satisfies a condition with LINQ.
also:a colleague suggested something along the lines of
if(!lotsOfItems.All(x => x.ID != target.ID)) since this is supposed to
stop once the conditions returns false for the first time but i'm not
sure on that, so if anyone could shed some light on this as well it
would be appreciated :>]
All() determines whether all elements of a sequence satisfy a condition. So, the enumeration of source is stopped as soon as the result can be determined.
Additional note:
The above is true if you are using Linq to objects. If you are using Linq to Database, then it will create a query and will execute it against database.

You could test it yourself: https://ideone.com/nIDKxr
public static IEnumerable<int> Tester()
{
yield return 1;
yield return 2;
throw new Exception();
}
static void Main(string[] args)
{
Console.WriteLine(Tester().Any(x => x == 1));
Console.WriteLine(Tester().Any(x => x == 2));
try
{
Console.WriteLine(Tester().Any(x => x == 3));
}
catch
{
Console.WriteLine("Error here");
}
}
Yes, it does :-)
also:a colleague suggested something along the lines of
if(!lotsOfItems.All(x => x.ID != target.ID))
since this is supposed to stop once the conditions returns false for the first time but i'm not sure on that, so if anyone could shed some light on this as well it would be appreciated :>]
Using the same reasoning, All() could continue even if one of the element returns false :-) No, even All() is programmed correctly :-)

It does whatever is the quickest way of doing what it has to do.
When used on an IEnumerable this will be along the lines of:
foreach(var item in source)
if(predicate(item))
return true;
return false;
Or for the variant that doesn't take a predicate:
using(var en = source.GetEnumerator())
return en.MoveNext();
When run against at database it will be something like
SELECT EXISTS(SELECT null FROM [some table] WHERE [some where clause])
And so on. How that was executed would depend in turn on what indices were available for fulfilling the WHERE clause, so it could be a quick index lookup, a full table scan aborting on first match found, or an index lookup followed by a partial table scan aborting on first match found, depending on that.
Yet other Linq providers would have yet other implementations, but generally the people responsible will be trying to be at least reasonably efficient.
In all, you can depend upon it being at least slightly more efficient than calling FirstOrDefault, as FirstOrDefault uses similar approaches but does have to return a full object (perhaps constructing it). Likewise !All(inversePredicate) tends to be pretty much on a par with Any(predicate) as per this answer.
Single is an exception to this
Update: The following from this point on no longer applies to .NET Core, which has changed the implementation of Single.
It's important to note that in the case of linq-to objects, the overloads of Single and SingleOrDefault that take a predicate do not stop on identified failure. While the obvious approach to Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) would be something like:
public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
/* do null checks */
using(var en = source.GetEnumerator())
while(en.MoveNext())
{
var val = en.Current;
if(predicate(val))
{
while(en.MoveNext())
if(predicate(en.Current))
throw new InvalidOperationException("too many matching items");
return val;
}
}
throw new InvalidOperationException("no matching items");
}
The actual implementation is something like:
public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
/* do null checks */
var result = default(TSource);
long tally = 0;
for(var item in source)
if(predicate(item))
{
result = item;
checked{++tally;}
}
switch(tally)
{
case 0:
throw new InvalidOperationException("no matching items");
case 1:
return result;
default:
throw new InvalidOperationException("too many matching items");
}
}
Now, while successful Single will have to scan everything, this can mean that an unsucessful Single is much, much slower than it needs to (and can even potentially throw an undocumented error) and if the reason for the unexpected duplicate is a bug which is duplicating items into the sequence - and hence making it far larger than it should be, then the Single that should have helped you find that problem is now dragging away through this.
SingleOrDefault has the same issue.
This only applies to linq-to-objects, but it remains safer to do .Where(predicate).Single() rather than Single(predicate).

Any stops at the first match. All stops at the first non-match.
I don't know whether the documentation guarantees that but this behavior is now effectively fixed for all time due to compatibility reasons. It also makes sense.

Yes it stops when the predicate is satisfied once. Here is code via RedGate Reflector:
[__DynamicallyInvokable]
public static bool Any<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
if (source == null)
{
throw Error.ArgumentNull("source");
}
if (predicate == null)
{
throw Error.ArgumentNull("predicate");
}
foreach (TSource local in source)
{
if (predicate(local))
{
return true;
}
}
return false;
}

Remove specific entry from list (beginner in c#)

I have a simple static inventory class which is a list of custom class Item. I am working on a crafting system and when I craft something I need to remove the required Items from my inventory list.
I tried to create a method that I can call which takes an array of the items to remove as a parameter, but its not working.
I think its because the foreach loop doesn't know which items to remove? I am not getting an error messages, it just doesn't work. How can I accomplish this?
public class PlayerInventory: MonoBehaviour
{
public Texture2D tempIcon;
private static List<Item> _inventory=new List<Item>();
public static List<Item> Inventory
{
get { return _inventory; }
}
public static void RemoveCraftedMaterialsFromInventory(Item[] items)
{
foreach(Item item in items)
{
PlayerInventory._inventory.Remove(item);
}
}
}
Here is the function that shows what items will be removed:
public static Item[] BowAndArrowReqs()
{
Item requiredItem1 = ObjectGenerator.CreateItem(CraftingMatType.BasicWood);
Item requiredItem2 = ObjectGenerator.CreateItem(CraftingMatType.BasicWood);
Item requiredItem3 = ObjectGenerator.CreateItem(CraftingMatType.String);
Item[] arrowRequiredItems = new Item[]{requiredItem1, requiredItem2, requiredItem3};
return arrowRequiredItems;
}
And here is where that is called:
THis is within the RecipeCheck static class:
PlayerInventory.RemoveCraftedMaterialsFromInventory(RecipeCheck.BowAndArrowReqs());

While I like Jame's answer (and it sufficiently covers the contracts), I will talk on how one might implement this equality and make several observations.
For starts, in the list returned there may be multiple objects of the same type - e.g. BasicWood, String. Then there needs to be a discriminator used for each new object.
It would be bad if RemoveCraftedMaterialsFromInventory(new [] { aWoodBlock }) to remove a Wood piece in the same way that two wood pieces were checked ("equals") to each other. This is because being "compatible for crafting" isn't necessarily the same as "being equals".
One simple approach is to assign a unique ID (see Guid.NewGuid) for each specific object. This field would be used (and it could be used exclusively) in the Equals method - however, now we're back at the initial problem, where each new object is different from any other!
So, what's the solution? Make sure to use equivalent (or identical objects) when removing them!
List<Item> items = new List<Item> {
new Wood { Condition = Wood.Rotten },
new Wood { Condition = Wood.Epic },
};
// We find the EXISTING objects that we already have ..
var woodToBurn = items.OfType<Wood>
.Where(w => w.Condition == Wood.Rotten);
// .. so we can remove them
foreach (var wood in woodToBurn) {
items.Remove(wood);
}
Well, okay, that's out of the way, but then we say: "How can we do this with a Recipe such that Equals isn't butchered and yet it will remove any items of the given type?"
Well, we can either do this by using LINQ or a List method that supports predicates (i.e. List.FindIndex) or we can implement a special Equatable to only be used in this case.
An implementation that uses a predicate might look like:
foreach (var recipeItem in recipeItems) {
// List sort of sucks; this implementation also has bad bounds
var index = items.FindIndex((item) => {
return recipeItem.MaterialType == item.MaterialType;
});
if (index >= 0) {
items.RemoveAt(index);
} else {
// Missing material :(
}
}

If class Item doesn't implement IEquatable<Item> and the bool Equals(Item other) method, then by default it will use Object.Equals which checks if they are the same object. (not two objects with the same value --- the same object).
Since you don't say how Item is implemented, I can't suggest how to write it's Equals(), however, you should also override GetHashCode() so that two Items that are Equal return the same hash code.
UPDATE (based on comments):
Essentially, List.Remove works like this:
foreach(var t in theList)
{
if (t.Equals(itemToBeRemove))
PerformSomeMagicToRemove(t);
}
So, you don't have to do anything to the code you've given in your question. Just add the Equals() method to Item.

LINQ's ForEach on HashSet?

I am curious as to what restrictions necessitated the design decision to not have HashSet's be able to use LINQ's ForEach query.
What's really going on differently behind the scenes for these two implementations:
var myHashSet = new HashSet<T>;
foreach( var item in myHashSet ) { do.Stuff(); }
vs
var myHashSet = new HashSet<T>;
myHashSet.ForEach( item => do.Stuff(); }
I'm (pretty) sure that this is just because HashSet does not implement IEnumerable -- but what is a normal ForEach loop doing differently that makes it more supported by a HashSet?
Thanks

LINQ doesn't have ForEach. Only the List<T> class has a ForEach method.
It's also important to note that HashSet does implement IEnumerable<T>.
Remember, LINQ stands for Language INtegrated Query. It is meant to query collections of data. ForEach has nothing to do with querying. It simply loops over the data. Therefore it really doesn't belong in LINQ.

LINQ is meant to query data, I'm guessing it avoided ForEach() because there's a chance it could mutate data that would affect the way the data could be queried (i.e. if you changed a field that affected the hash code or equality).
You may be confused with the fact that List<T> has a ForEach()?
It's easy enough to write one, of course, but it should be used with caution because of those aforementioned concerns...
public static class EnumerableExtensions
{
public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
{
if (source == null) throw new ArgumentNullException("source");
if (action == null) throw new ArgumentNullException("action");
foreach(var item in source)
{
action(item);
}
}
}

var myHashSet = new HashSet<T>;
myHashSet.ToList().ForEach( x => x.Stuff() );

The first use the method GetEnumerator of HashSet
The second the method ForEach
Maybe the second use GetEnumerator behind the scene but I'm not sure.

Is this achievable with a single LINQ query?

Suppose I have a given object of type IEnumerable<string> which is the return value of method SomeMethod(), and which contains no repeated elements. I would like to be able to "zip" the following lines in a single LINQ query:
IEnumerable<string> someList = SomeMethod();
if (someList.Contains(givenString))
{
return (someList.Where(givenString));
}
else
{
return (someList);
}
Edit: I mistakenly used Single instead of First. Corrected now.
I know I can "zip" this by using the ternary operator, but that's just not the point. I would just list to be able to achieve this with a single line. Is that possible?

This will return items with given string or all items if given is not present in the list:
someList.Where(i => i == givenString || !someList.Contains(givenString))

The nature of your desired output requires that you either make two requests for the data, like you are now, or buffer the non-matches to return if no matches are found. The later would be especially useful in cases where actually getting the data is a relatively expensive call (eg: database query or WCF service). The buffering method would look like this:
static IEnumerable<T> AllIfNone<T>(this IEnumerable<T> source,
Func<T, bool> predicate)
{
//argument checking ignored for sample purposes
var buffer = new List<T>();
bool foundFirst = false;
foreach (var item in source)
{
if (predicate(item))
{
foundFirst = true;
yield return item;
}
else if (!foundFirst)
{
buffer.Add(item);
}
}
if (!foundFirst)
{
foreach (var item in buffer)
{
yield return item;
}
}
}
The laziness of this method is either that of Where or ToList depending on if the collection contains a match or not. If it does, you should get execution similar to Where. If not, you will get roughly the execution of calling ToList (with the overhead of all the failed filter checks) and iterating the result.

What is wrong with the ternary operator?
someList.Any(s => s == givenString) ? someList.Where(s => s == givenString) : someList;
It would be better to do the Where followed by the Any but I can't think of how to one-line that.
var reducedEnumerable = someList.Where(s => s == givenString);
return reducedEnumerable.Any() ? reducedEnumerable : someList;

It is not possible to change the return type on the method, which is what you're asking. The first condition returns a string and the second condition returns a collection of strings.
Just return the IEnumerable<string> collection, and call Single on the return value like this:
string test = ReturnCollectionOfStrings().Single(x => x == "test");

What's the fastest way to convert List<string> to List<int> in C# assuming int.Parse will work for every item?

By fastest I mean what is the most performant means of converting each item in List to type int using C# assuming int.Parse will work for every item?

You won't get around iterating over all elements. Using LINQ:
var ints = strings.Select(s => int.Parse(s));
This has the added bonus it will only convert at the time you iterate over it, and only as much elements as you request.
If you really need a list, use the ToList method. However, you have to be aware that the performance bonus mentioned above won't be available then.

If you're really trying to eeke out the last bit of performance you could try doing someting with pointers like this, but personally I'd go with the simple linq implementation that others have mentioned.
unsafe static int ParseUnsafe(string value)
{
int result = 0;
fixed (char* v = value)
{
char* str = v;
while (*str != '\0')
{
result = 10 * result + (*str - 48);
str++;
}
}
return result;
}
var parsed = input.Select(i=>ParseUnsafe(i));//optionally .ToList() if you really need list

There is likely to be very little difference between any of the obvious ways to do this: therefore go for readability (one of the LINQ-style methods posted in other answers).
You may gain some performance for very large lists by initializing the output list to its required capacity, but it's unlikely you'd notice the difference, and readability will suffer:
List<string> input = ..
List<int> output = new List<int>(input.Count);
... Parse in a loop ...
The slight performance gain will come from the fact that the output list won't need to be repeatedly reallocated as it grows.

I don't know what the performance implications are, but there is a List<T>.ConvertAll<TOutput> method for converting the elements in the current List to another type, returning a list containing the converted elements.
List.ConvertAll Method

var myListOfInts = myListString.Select(x => int.Parse(x)).ToList()
Side note: If you call ToList() on ICollection .NET framework automatically preallocates an
List of needed size, so it doesn't have to allocate new space for each new item added to the list.
Unfortunately LINQ Select doesn't return an ICollection (as Joe pointed out in comments).
From ILSpy:
// System.Linq.Enumerable
public static List<TSource> ToList<TSource>(this IEnumerable<TSource> source)
{
if (source == null)
{
throw Error.ArgumentNull("source");
}
return new List<TSource>(source);
}
// System.Collections.Generic.List<T>
public List(IEnumerable<T> collection)
{
if (collection == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.collection);
}
ICollection<T> collection2 = collection as ICollection<T>;
if (collection2 != null)
{
int count = collection2.Count;
this._items = new T[count];
collection2.CopyTo(this._items, 0);
this._size = count;
return;
}
this._size = 0;
this._items = new T[4];
using (IEnumerator<T> enumerator = collection.GetEnumerator())
{
while (enumerator.MoveNext())
{
this.Add(enumerator.Current);
}
}
}
So, ToList() just calls List constructor and passes in an IEnumerable.
The List constructor is smart enough that if it is an ICollection it uses most efficient way of filling a new instance of List

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

How do I verify a collection of values is unique (contains no duplicates) in C# - c#

Related

Does Any() stop on success?

Remove specific entry from list (beginner in c#)

LINQ's ForEach on HashSet?

Is this achievable with a single LINQ query?

What's the fastest way to convert List<string> to List<int> in C# assuming int.Parse will work for every item?

Categories

Resources