I have some ReSharper squiggles here, and they tell me that I have a possible multiple enumeration of IEnumerable going on. However, you can see that this is not true: final is explicitly declared as a list (List<Point2D>), and pointTangents is declared earlier as List<PointVector2D>.
Any idea on why Resharper might be telling me this?
Edit: experiments to see if I can replicate this with simpler code
As you can see below there are no squiggles and no warnings even though Bar is declared to take IEnumerable as arg.
Looks a lot like RSRP-429474 False-positive warning for possible multiple enumeration :
I have this code:
List<string> duplicateLabelsList = allResourcesLookup.SelectMany(x => x).Select(x => x.LoaderOptions.Label).Duplicates<string, string>().ToList();
if (duplicateLabelsList.Any())
throw new DuplicateResourceLoaderLabelsException(duplicateLabelsList);
For both usages of duplicateLabelsList, I'm being warned about
possible multiple enumeration, despite the fact I've called ToList and
therefore there should be no multiple enumeration.
which (currently) has a Fix Version of 9.2, which (currently) isn't yet released.
The extension method public static TSource Last<TSource>(this IEnumerable<TSource> source); is defined for the type IEnumerable<TSource>.
If one looks at the implementation of Last<TSource>:
public static TSource Last<TSource>(this IEnumerable<TSource> source)
{
if (source == null) throw Error.ArgumentNull("source");
IList<TSource> list = source as IList<TSource>;
if (list != null)
{
int count = list.Count;
if (count > 0) return list[count - 1];
}
else
{
using (IEnumerator<TSource> e = source.GetEnumerator())
{
if (e.MoveNext())
{
TSource result;
do
{
result = e.Current;
} while (e.MoveNext());
return result;
}
}
}
throw Error.NoElements();
}
It is clear that if source implements IList then source is not enumerated and therefore your assumption that this is a "bug" in Resharper is correct.
I'd consider it more of a false positive, probably because ReSharper has no general way to know that Last()'s implementation avoids unnecessary enumeration. It probably flags the potential multiple enumeration simply because Last<TSource> is defined on IEnumerable<T>.
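To make concrete what the warning is guarding against, here is a minimal sketch. The class, the counter, and the LazyNumbers method are hypothetical names for illustration only. It counts how often a lazy sequence is run when Any() and Last() are called on it directly, compared with calling them on a list produced by ToList().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MultipleEnumerationDemo
{
    // Hypothetical counter: how many times the lazy source has been run.
    private static int _starts;

    // A lazy sequence; its body runs again for every fresh enumeration.
    private static IEnumerable<int> LazyNumbers()
    {
        _starts++;
        yield return 1;
        yield return 2;
        yield return 3;
    }

    // Any() and Last() each enumerate the lazy source.
    public static int StartsWhenLazy()
    {
        _starts = 0;
        IEnumerable<int> lazy = LazyNumbers();
        _ = lazy.Any();
        _ = lazy.Last();
        return _starts;
    }

    // ToList() enumerates once; Any() and Last() then only read the list.
    public static int StartsWhenMaterialized()
    {
        _starts = 0;
        List<int> list = LazyNumbers().ToList();
        _ = list.Any();
        _ = list.Last();
        return _starts;
    }

    public static void Main()
    {
        Console.WriteLine(StartsWhenLazy());         // 2
        Console.WriteLine(StartsWhenMaterialized()); // 1
    }
}
```

In the second case the source runs only once, which is why a warning on a variable that is already a List<T> looks like a false positive.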
Related
I am using XUnit to test for scenarios where an empty Enumerable list is expected.
I have noticed that in certain scenarios:
Assert.Empty(msgs); fails;
BUT
Assert.False(msgs.Any()); is passing.
This is a bit confusing to me as I anticipated that this was testing for the same thing.
I understand that this is likely because of the differences in expected behaviour between:
Enumerable.Any() (which defines this as "Determines whether a sequence contains any elements.")
AND
The empty expected in XUnit.Empty() (which defines that this is testing for an empty Object).
However, I am not sure exactly the difference as it appeared to me to be essentially testing the same thing.
Could someone please explain the differences in what is being tested for in these two different types of Asserts?
Here is the source for Enumerable.Any (The Assert.False() just validates that this returns false.):
public static bool Any<TSource>(this IEnumerable<TSource> source) {
if (source == null) throw Error.ArgumentNull("source");
using (IEnumerator<TSource> e = source.GetEnumerator()) {
if (e.MoveNext()) return true;
}
return false;
}
Here is the source for Assert.Empty from xUnit:
public static void Empty(IEnumerable collection)
{
Assert.GuardArgumentNotNull("collection", collection);
var enumerator = collection.GetEnumerator();
try
{
if (enumerator.MoveNext())
throw new EmptyException(collection);
}
finally
{
(enumerator as IDisposable)?.Dispose();
}
}
They seem to be using a very similar way of checking for the presence of items in the collection. I'd expect the same result from each method.
Without more details about how you are using each one, it's hard to say why you are getting different results.
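msgs.Any() returns true when msgs has one or more elements and false when it is empty; it throws an ArgumentNullException when msgs is null. Assert.Empty also fails when its argument is null, so it is worth checking whether msgs is null, although a null msgs alone would not explain one assert passing and the other failing.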
With the best regards.
There is a difference between these two methods:
.Any() is an extension method which takes an IEnumerable. An object can implement the IEnumerable interface to let code iterate through it for set operations (like .Any() or .Where()).
Assert.Empty() doesn't appear to check whether an object implements IEnumerable, but only checks against an empty set if the input data is a string or an array.
My guess then is that you're passing in an IEnumerable object, rather than an array.
To get around this you could either use Assert.False(msgs.Any()); as before, or else use something like Assert.Empty(msgs.ToArray());
I'm working on a code generator that validates objects based on certain business rules. As an example, I'm curious to find out the various ways the logic below can be written as a LINQ expression.
Assertion should evaluate to true when collection is null OR when count of "TrueAndCorrect" items is anything but 1. One possible solution is:
bool assertion = report.DeclarationOfTrusteeCollection == null
|| report.DeclarationOfTrusteeCollection.Count(f => f.FTER99.Equals("TrueAndCorrect")) != 1
Are there other ways this LINQ expression can be written, perhaps more compactly, using Any, inverting the operators, or anything else?
The original code is:
bool assertion =
report.DeclarationOfTrusteeCollection == null ||
report.DeclarationOfTrusteeCollection.Count(
f => f.FTER99.Equals("TrueAndCorrect")) != 1;
There are some problems here.
First, the intention of the null check seems to be "a null collection has the same semantics as an empty collection". This is a worst-practice in C#. Never do this! If you want to represent an empty collection, make an empty collection. There's even an Enumerable.Empty helper method for you.
So, start with that; the code should be:
if (report.DeclarationOfTrusteeCollection == null)
throw some appropriate exception
or
Debug.Assert(report.DeclarationOfTrusteeCollection != null);
if the condition is impossible.
That leaves us with
bool assertion =
report.DeclarationOfTrusteeCollection.Count(
f => f.FTER99.Equals("TrueAndCorrect")) != 1;
This is bad. Suppose I show you a jar that contains some number of pennies and I ask you "is there exactly one penny in the jar?" How many pennies do you have to count before you know the answer? Your code here is counting all of them, but you could stop after two.
Enumerable gives you a method which throws if a sequence is not a singleton, but no method that tests it. Fortunately it is easy to write. The best practice here is to write a helper method that has the exact semantics you want:
static class Extensions
{
public static bool IsSingleton<T>(this IEnumerable<T> items)
{
bool seenOne = false;
foreach(T item in items)
{
if (seenOne) return false;
seenOne = true;
}
return seenOne;
}
public static bool IsSingleton<T>(
this IEnumerable<T> items, Func<T, bool> predicate) =>
items.Where(predicate).IsSingleton();
}
Done. And now your code is:
if (report.DeclarationOfTrusteeCollection == null)
throw some appropriate exception
bool assertion =
report.DeclarationOfTrusteeCollection.IsSingleton(f => ...);
Write the code so that it reads like what it is logically doing. That's the beauty and power of LINQ sequence operators.
You could use the null-propagation operator:
bool assertion = report.DeclarationOfTrusteeCollection?.Count(f => f.FTER99.Equals("TrueAndCorrect")) != 1;
Since null is not 1 this is also true if the collection is null.
It would be nice if you didn't need to count the whole collection: you already know it's wrong when there's more than one matching element. But I don't know of a built-in method for that. You could write your own extension:
public static class MyExtensions
{
public static bool IsNullOrHasNotExactlyOneMatching<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
if (source == null) return true;
bool found = false;
foreach(T element in source)
{
if (!predicate(element)) continue;
if (found) return true; // this is the second match!
found = true;
}
return !found; // false if exactly one match was found, true if there was none
}
}
And use it:
bool assertion = report.DeclarationOfTrusteeCollection.IsNullOrHasNotExactlyOneMatching(f => f.FTER99.Equals("TrueAndCorrect"));
As mentioned by Rawling you could shorten the extension using Take():
public static bool IsNullOrHasNotExactlyOneMatching<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
return source?.Where(predicate).Take(2).Count() != 1;
}
or do this directly:
bool assertion = report.DeclarationOfTrusteeCollection?.Where(f => f.FTER99.Equals("TrueAndCorrect"))
.Take(2).Count() != 1;
Both versions only iterate until a second match is found (or until the end if fewer than two matches exist).
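If you want to convince yourself of the early exit, a small sketch along these lines should do. The Pulled counter and the Numbers source are hypothetical helpers that are not part of the answer above; NotExactlyOne is just the Take(2) expression wrapped in a method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EarlyExitDemo
{
    // Hypothetical counter of elements actually pulled from the source.
    public static int Pulled;

    // Lazy source that records every element it hands out.
    public static IEnumerable<int> Numbers(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    // Same expression as the Take(2) version: true for null, zero or many matches.
    public static bool NotExactlyOne<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        return source?.Where(predicate).Take(2).Count() != 1;
    }

    public static void Main()
    {
        Pulled = 0;
        bool result = NotExactlyOne(Numbers(1000), x => x % 2 == 0);
        Console.WriteLine(result);  // True: there is more than one even number
        Console.WriteLine(Pulled);  // stops shortly after the second match

        Pulled = 0;
        bool counted = Numbers(1000).Count(x => x % 2 == 0) != 1;
        Console.WriteLine(counted); // True as well
        Console.WriteLine(Pulled);  // 1000: Count(predicate) walks the whole source
    }
}
```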
To be more specific: will the Linq extension method Any(IEnumerable collection, Func predicate) stop checking all the remaining elements of the collections once the predicate has yielded true for an item?
Because I don't want to spend too much time on figuring out if I need to do the really expensive parts at all:
if(lotsOfItems.Any(x => x.ID == target.ID))
//do expensive calculation here
So if Any is always checking all the items in the source this might end up being a waste of time instead of just going with:
var candidate = lotsOfItems.FirstOrDefault(x => x.ID == target.ID);
if(candidate != null)
//do expensive calculation here
because I'm pretty sure that FirstOrDefault does return once it got a result and only keeps going through the whole Enumerable if it does not find a suitable entry in the collection.
Does anyone have information about the internal workings of Any, or could anyone suggest a solution for this kind of decision?
Also, a colleague suggested something along the lines of:
if(!lotsOfItems.All(x => x.ID != target.ID))
since this is supposed to stop once the condition returns false for the first time, but I'm not sure about that, so if anyone could shed some light on this as well it would be appreciated.
As we see from the source code, Yes:
internal static bool Any<T>(this IEnumerable<T> source, Func<T, bool> predicate) {
foreach (T element in source) {
if (predicate(element)) {
return true; // Attention to this line
}
}
return false;
}
Any() is the most efficient way to determine whether any element of a sequence satisfies a condition with LINQ.
Also: a colleague suggested something along the lines of
if(!lotsOfItems.All(x => x.ID != target.ID))
since this is supposed to stop once the condition returns false for the first time, but I'm not sure about that, so if anyone could shed some light on this as well it would be appreciated.
All() determines whether all elements of a sequence satisfy a condition. Its enumeration of source stops as soon as the result can be determined, that is, at the first element for which the predicate returns false.
Additional note:
The above is true if you are using LINQ to Objects. If you are using LINQ against a database, it will instead build a query and execute it against the database.
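For LINQ to Objects, the short-circuiting of both Any and the !All variant can be observed with a counting source. This is only a sketch; Pulled and Numbers are hypothetical helpers introduced for the demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ShortCircuitDemo
{
    // Hypothetical counter of elements actually pulled from the source.
    public static int Pulled;

    // Lazy source that records every element it hands out.
    public static IEnumerable<int> Numbers(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    public static void Main()
    {
        Pulled = 0;
        bool any = Numbers(1000).Any(x => x == 2);
        Console.WriteLine($"{any} after {Pulled} elements");    // True after 3 elements

        Pulled = 0;
        bool notAll = !Numbers(1000).All(x => x != 2);
        Console.WriteLine($"{notAll} after {Pulled} elements"); // True after 3 elements
    }
}
```

Both forms look at the elements 0, 1 and 2 and then stop, so for in-memory sequences the choice between them is mostly about readability.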
You could test it yourself: https://ideone.com/nIDKxr
public static IEnumerable<int> Tester()
{
yield return 1;
yield return 2;
throw new Exception();
}
static void Main(string[] args)
{
Console.WriteLine(Tester().Any(x => x == 1));
Console.WriteLine(Tester().Any(x => x == 2));
try
{
Console.WriteLine(Tester().Any(x => x == 3));
}
catch
{
Console.WriteLine("Error here");
}
}
Yes, it does :-)
Also: a colleague suggested something along the lines of
if(!lotsOfItems.All(x => x.ID != target.ID))
since this is supposed to stop once the condition returns false for the first time, but I'm not sure about that, so if anyone could shed some light on this as well it would be appreciated.
By the same reasoning, All() could in theory keep going even after the predicate has returned false for one of the elements :-) But no, All() is programmed correctly too :-)
It does whatever is the quickest way of doing what it has to do.
When used on an IEnumerable this will be along the lines of:
foreach(var item in source)
if(predicate(item))
return true;
return false;
Or for the variant that doesn't take a predicate:
using(var en = source.GetEnumerator())
return en.MoveNext();
When run against a database it will be something like
SELECT EXISTS(SELECT null FROM [some table] WHERE [some where clause])
And so on. How that was executed would depend in turn on what indices were available for fulfilling the WHERE clause, so it could be a quick index lookup, a full table scan aborting on first match found, or an index lookup followed by a partial table scan aborting on first match found, depending on that.
Yet other Linq providers would have yet other implementations, but generally the people responsible will be trying to be at least reasonably efficient.
In all, you can depend upon it being at least slightly more efficient than calling FirstOrDefault, as FirstOrDefault uses similar approaches but does have to return a full object (perhaps constructing it). Likewise !All(inversePredicate) tends to be pretty much on a par with Any(predicate) as per this answer.
Single is an exception to this
Update: The following from this point on no longer applies to .NET Core, which has changed the implementation of Single.
It's important to note that in the case of linq-to objects, the overloads of Single and SingleOrDefault that take a predicate do not stop on identified failure. While the obvious approach to Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) would be something like:
public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
/* do null checks */
using(var en = source.GetEnumerator())
while(en.MoveNext())
{
var val = en.Current;
if(predicate(val))
{
while(en.MoveNext())
if(predicate(en.Current))
throw new InvalidOperationException("too many matching items");
return val;
}
}
throw new InvalidOperationException("no matching items");
}
The actual implementation is something like:
public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
/* do null checks */
var result = default(TSource);
long tally = 0;
foreach(var item in source)
if(predicate(item))
{
result = item;
checked{++tally;}
}
switch(tally)
{
case 0:
throw new InvalidOperationException("no matching items");
case 1:
return result;
default:
throw new InvalidOperationException("too many matching items");
}
}
Now, while a successful Single has to scan everything, this means that an unsuccessful Single can be much, much slower than it needs to be (and can even potentially throw an undocumented error, since the tally is incremented in a checked context). If the reason for the unexpected duplicate is a bug that is duplicating items into the sequence, and hence making it far larger than it should be, then the Single that should have helped you find that problem is now dragging its way through the whole thing.
SingleOrDefault has the same issue.
This only applies to linq-to-objects, but it remains safer to do .Where(predicate).Single() rather than Single(predicate).
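A rough way to see the early failure of .Where(predicate).Single() is sketched below. The Pulled counter, the Numbers source and the SingleThrows wrapper are hypothetical names for illustration; the sketch only shows the Where-then-Single form, since the behaviour of Single(predicate) depends on the runtime as noted above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SingleDemo
{
    // Hypothetical counter of elements actually pulled from the source.
    public static int Pulled;

    // Lazy source that records every element it hands out.
    public static IEnumerable<int> Numbers(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    // True if Where(predicate).Single() threw (no match, or more than one match).
    public static bool SingleThrows(IEnumerable<int> source, Func<int, bool> predicate)
    {
        try
        {
            source.Where(predicate).Single();
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Pulled = 0;
        bool threw = SingleThrows(Numbers(1000), x => x % 2 == 0);
        Console.WriteLine(threw);  // True: more than one even number
        Console.WriteLine(Pulled); // stops shortly after the second match
    }
}
```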
Any stops at the first match. All stops at the first non-match.
I don't know whether the documentation guarantees that but this behavior is now effectively fixed for all time due to compatibility reasons. It also makes sense.
Yes it stops when the predicate is satisfied once. Here is code via RedGate Reflector:
[__DynamicallyInvokable]
public static bool Any<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
if (source == null)
{
throw Error.ArgumentNull("source");
}
if (predicate == null)
{
throw Error.ArgumentNull("predicate");
}
foreach (TSource local in source)
{
if (predicate(local))
{
return true;
}
}
return false;
}
Say I have ISet<T> _set = new HashSet<T>();
Now if I do _set.Cast<TInterface>().Contains(obj, comparer); (where T implements TInterface), do I lose the O(1) benefit of the HashSet<T>?
In other words: does calling .Cast<T>() change the underlying type (HashSet<T> in this case) to something else, or is the underlying type preserved?
Logically, a HashSet<T> uses an internal hash-table based on the hashing logic of the comparer that it was created with, so of course it's not possible to do an element-containment test on it with a different comparer and expect O(1) performance.
That said, let's look at things in a bit more detail for your specific scenario:
The Cast<T> method looks like this (from reference-source):
public static IEnumerable<TResult> Cast<TResult>(this IEnumerable source) {
IEnumerable<TResult> typedSource = source as IEnumerable<TResult>;
if (typedSource != null) return typedSource;
if (source == null) throw Error.ArgumentNull("source");
return CastIterator<TResult>(source);
}
As you can see, if the source implements IEnumerable<TResult> it just returns the source directly. Since IEnumerable<> is a covariant interface, this test will pass for your use case (assuming the concrete type implements the interface type) and the hash-set will be returned directly - a good thing as there's still hope of its internal hash-table being used.
However, the overload of Contains you are using looks like this:
public static bool Contains<TSource>(this IEnumerable<TSource> source, TSource value, IEqualityComparer<TSource> comparer)
{
if (comparer == null) comparer = EqualityComparer<TSource>.Default;
if (source == null) throw Error.ArgumentNull("source");
foreach (TSource element in source)
if (comparer.Equals(element, value)) return true;
return false;
}
As you can see, it always loops through the collection to linear-search, which is O(n).
So the entire operation is going to be O(n) regardless.
_set.Cast<TInterface>() will return an IEnumerable<TInterface>, so _set.Cast<TInterface>().Contains(obj, comparer); doesn't invoke HashSet.Contains; rather, it invokes the Enumerable.Contains extension method.
So obviously you don't get O(1) operation anymore.
If you need O(1) you again need to create a HashSet out of it.
var newSet = new HashSet<TInterface>(_set.Cast<TInterface>(), comparer);
newSet.Contains(obj);
The Cast method returns an IEnumerable, so the Contains method will operate on the IEnumerable rather than the HashSet. So I think you'd lose the benefit of HashSet. Why don't you do the cast in the comparer instead?
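The two points made in these answers can be checked with a small sketch: Cast hands back the very same HashSet instance when covariance allows it, and a constant-time lookup with a different comparer needs a new set built with that comparer. Everything here is illustrative: string stands in for T, IComparable for TInterface, and the default equality comparer for your own comparer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CastDemo
{
    // Cast<TResult> returns the source itself when it already is an IEnumerable<TResult>.
    public static bool CastReturnsSameInstance(HashSet<string> set)
    {
        IEnumerable<IComparable> cast = set.Cast<IComparable>();
        return ReferenceEquals(cast, set);
    }

    // Builds a new hash set keyed by the given comparer; building it is O(n), lookups are O(1).
    public static HashSet<IComparable> Rehash(
        HashSet<string> set, IEqualityComparer<IComparable> comparer)
    {
        return new HashSet<IComparable>(set.Cast<IComparable>(), comparer);
    }

    public static void Main()
    {
        var set = new HashSet<string> { "a", "b" };
        Console.WriteLine(CastReturnsSameInstance(set));   // True

        HashSet<IComparable> rehashed = Rehash(set, EqualityComparer<IComparable>.Default);
        Console.WriteLine(rehashed.Contains("a"));          // True
    }
}
```

Rebuilding the set only pays off if you do many lookups; for a single Contains call the linear scan is cheaper than constructing a second set.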
Surely there is an easy way to verify a collection of values has no duplicates [using the default Comparison of the collection's Type] in C#/.NET ? Doesn't have to be directly built in but should be short and efficient.
I've looked a lot but I keep hitting examples of using collection.Count() == collection.Distinct().Count() which for me is inefficient. I'm not interested in the result and want to bail out as soon as I detect a duplicate, should that be the case.
(I'd love to delete this question and/or its answer if someone can point out the duplicates)
Okay, if you just want to get out as soon as the duplicate is found, it's simple:
// TODO: add an overload taking an IEqualityComparer<T>
public static bool AllUnique<T>(this IEnumerable<T> source)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
var distinctItems = new HashSet<T>();
foreach (var item in source)
{
if (!distinctItems.Add(item))
{
return false;
}
}
return true;
}
... or use All, as you've already shown. I'd argue that this is slightly simpler to understand in this case... or if you do want to use All, I'd at least separate the creation of the set from the method group conversion, for clarity:
public static bool IsUnique<T>(this IEnumerable<T> source)
{
// TODO: validation
var distinctItems = new HashSet<T>();
// Add will return false if the element already exists. If
// every element is actually added, then they must all be unique.
return source.All(distinctItems.Add);
}
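The TODO in AllUnique mentions an overload taking an IEqualityComparer<T>. One way it might look is sketched below; the class names are hypothetical and this is not taken from the answer itself. It relies on the HashSet<T> constructor that accepts a comparer, where null means the default comparer.

```csharp
using System;
using System.Collections.Generic;

public static class UniquenessExtensions
{
    // Returns false as soon as a duplicate (according to comparer) is seen.
    public static bool AllUnique<T>(this IEnumerable<T> source, IEqualityComparer<T> comparer)
    {
        if (source == null)
        {
            throw new ArgumentNullException(nameof(source));
        }
        // A null comparer makes HashSet<T> fall back to the default comparer.
        var seen = new HashSet<T>(comparer);
        foreach (T item in source)
        {
            if (!seen.Add(item))
            {
                return false;
            }
        }
        return true;
    }
}

public static class UniquenessDemo
{
    public static void Main()
    {
        var names = new[] { "alice", "Bob", "ALICE" };
        Console.WriteLine(names.AllUnique(StringComparer.Ordinal));           // True
        Console.WriteLine(names.AllUnique(StringComparer.OrdinalIgnoreCase)); // False
    }
}
```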
Doing it inline, you can replace:
collection.Count() == collection.Distinct().Count()
with
collection.All( new HashSet<T>().Add );
(where T is the type of your collection's elements)
Or you can extract the above to a helper extension method[1] so you can say:
collection.IsUnique()
[1]
static class EnumerableUniquenessExtensions
{
public static bool IsUnique<T>(this IEnumerable<T> that)
{
return that.All( new HashSet<T>().Add );
}
}
(and as Jon has pointed out in his answer, one really should separate and comment the two lines as such 'cuteness' is generally Not A Good Idea)