Efficient Linq Enumerable's 'Count() == 1' test

Efficient Linq Enumerable's 'Count() == 1' test - c#

Similar to this question but rephrased for Linq:
You can use Enumerable<T>.Any() to test if the enumerable contains data. But what's the efficient way to test if the enumerable contains a single value (i.e. Enumerable<T>.Count() == 1) or greater than a single value (i.e. Enumerable<T>.Count() > 1) without using an expensive count operation?

int constrainedCount = yourSequence.Take(2).Count();
// if constrainedCount == 0 then the sequence is empty
// if constrainedCount == 1 then the sequence contains a single element
// if constrainedCount == 2 then the sequence has more than one element

One way is to write a new extension method
public static bool IsSingle<T>(this IEnumerable<T> enumerable) {
using (var enumerator = enumerable.GetEnumerator()) {
if (!enumerator.MoveNext()) {
return false;
}
return !enumerator.MoveNext();
}
}

This code take's LukeH's excellent answer and wraps it up as an IEnumerable extension so that your code can deal in terms of None, One and Many rather than 0, 1 and 2.
public enum Multiplicity
{
None,
One,
Many,
}
In a static class, e.g. EnumerableExtensions:
public static Multiplicity Multiplicity<TElement>(this IEnumerable<TElement> #this)
{
switch (#this.Take(2).Count())
{
case 0: return General.Multiplicity.None;
case 1: return General.Multiplicity.One;
case 2: return General.Multiplicity.Many;
default: throw new Exception("WTF‽");
}
}

Another way:
bool containsMoreThanOneElement = yourSequence.Skip(1).Any();
Or for exactly 1 element:
bool containsOneElement = yourSequence.Any() && !yourSequence.Skip(1).Any();

Efficient Count() == n test:
public static bool CountIsEqualTo<T>(this IEnumerable<T> enumerable, int c)
{
using (var enumerator = enumerable.GetEnumerator())
{
for(var i = 0; i < c ; i++)
if (!enumerator.MoveNext())
return false;
return !enumerator.MoveNext();
}
}

With linq to objects, SingleOrDefault throws if there is more than one element, so you're probably best off if you roll your own.
EDIT: Now I've seen LukeH's answer, and I have to say I prefer it. Wish I'd thought of it myself!

bool hasTwo = yourSequence.ElementAtOrDefault(1) != default(T);
...in case of class where values can be null this could maybe we useful.

Related

Does successive calls to a method like Count() reenumerate an IEnumerable<T>?

I'm a bit confused with the IEnumerable and it's deferred execution behaviour.
Let's say I have the following IEnumerable<T>:
IEnumerable<Foo> enumerable = GetFoos();
What happens if I do:
int count = enumerable.Count();
count = enumerable.Count();
I can think of three possibilities:
a) will enumerate the collection again
b) will cache the result to use on the second time (like lazy loading)
c) will depend on the underlining type instantiated in GetFoos() method and how it implemented the IEnumerable interface
Which one is the correct one? Also, if c is the correct one, what happens with an IEnumerable created using yield return?

A quick check on referencesourcecode gives following definition for the .Count<T>(this IEnumerable<T>) extension (simplified):
Disclaimer you should not depend on that implementation nor expect that the implementation will always do something in a certain way.
public static int Count<TSource>(this IEnumerable<TSource> source) {
ICollection<TSource> collectionoft = source as ICollection<TSource>;
if (collectionoft != null) return collectionoft.Count;
ICollection collection = source as ICollection;
if (collection != null) return collection.Count;
int count = 0;
using (IEnumerator<TSource> e = source.GetEnumerator()) {
while (e.MoveNext()) count++;
}
return count;
}
So the answer it is (c), it will depend on the underlying type. And if it's not of type ICollection the IEnumerable will be evaluated a second time (i.e GetEnumerator will be called and the Enumerator will be looped.
So what happens when using yield syntax?
Well yield is just a fancy way to implement GetEnumerator, so what will happen when calling Count() twice is that this pseudo GetEnumerator method will be called twice. I think a code snippet says more than a thousand words:
private static IEnumerable<int> ConstantEnumerable()
{
yield return 1;
yield return 2;
yield return 3;
}
private static int i = 0;
private static IEnumerable<int> ChangingEnumerable()
{
if (i == 0)
{
yield return 1;
i++;
}
else
{
yield return 2;
yield return 3;
}
}
public static void Main()
{
var constant = ConstantEnumerable();
var changing = ChangingEnumerable();
Console.WriteLine("Constant: {0}, {1}", constant.Count(), constant.Count()); // 3, 3
Console.WriteLine("Changing: {0}, {1}", changing.Count(), changing.Count()); // 1, 2
}

Looking for alternative LINQ expression(s)

I'm working on a code generator that validated objects based on certain business rules. As an example, I’m curious to find out various ways below logic can be written as LINQ expression.
Assertion should evaluate to true when collection is null OR when count of "TrueAndCorrect" items is anything but 1. One possible solution is:
bool assertion = report.DeclarationOfTrusteeCollection == null
|| report.DeclarationOfTrusteeCollection.Count(f => f.FTER99.Equals("TrueAndCorrect")) != 1
Are there other ways this LINQ can be expressed as, perhaps more compact, using Any, inverting the operators, or any other?

The original code is:
bool assertion =
report.DeclarationOfTrusteeCollection == null ||
report.DeclarationOfTrusteeCollection.Count(
f => f.FTER99.Equals("TrueAndCorrect")) != 1;
There are some problems here.
First, the intention of the null check seems to be "a null collection has the same semantics as an empty collection". This is a worst-practice in C#. Never do this! If you want to represent an empty collection, make an empty collection. There's even an Enumerable.Empty helper method for you.
So, start with that; the code should be:
if (report.DeclarationOfTrusteeCollection == null)
throw some appropriate exception
or
Debug.Assert(report.DeclarationOfTrusteeCollection != null);
if the condition is impossible.
That leaves us with
bool assertion =
report.DeclarationOfTrusteeCollection.Count(
f => f.FTER99.Equals("TrueAndCorrect")) != 1;
This is bad. Suppose I show you a jar that contains some number of pennies and I ask you "is there exactly one penny in the jar?" How many pennies do you have to count before you know the answer? Your code here is counting all of them, but you could stop after two.
Enumerable gives you a method which throws if a sequence is not a singleton, but no method that tests it. Fortunately it is easy to write. The best practice here is to write a helper method that has the exact semantics you want:
static class Extensions
{
public static bool IsSingleton<T>(this IEnumerable<T> items)
{
bool seenOne = false;
foreach(T item in items)
{
if (seenOne) return false;
seenOne = true;
}
return seenOne;
}
public static bool IsSingleton<T>(
this IEnumerable<T> items, Func<T, bool> predicate) =>
items.Where(predicate).IsSingleton();
}
Done. And now your code is:
if (report.DeclarationOfTrusteeCollection == null)
throw some appropriate exception
bool assertion =
report.DeclarationOfTrusteeCollection.IsSingleton(f => ...);
Write the code so that it reads like what it is logically doing. That's the beauty and power of LINQ sequence operators.

You could use the null-propagation operator:
bool assertion = report.DeclarationOfTrusteeCollection?.Count(f => f.FTER99.Equals("TrueAndCorrect")) != 1;
Since null is not 1 this is also true if the collection is null.
It would be nice if you don't need to count the whole collection, you already know it's wrong when there's more than one matching element. But I don't know of a built-in method for that. You could write your own extension:
public static class MyExtensions
{
public static bool IsNullOrHasNotExactlyOneMatching<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
if (source == null) return true;
bool found = false;
foreach(T element in source)
{
if (!predicate(element)) continue;
if (found) return true; // this is the second match!
found = true;
}
return !found; // one match found (or not)
}
}
And use it:
bool assertion = report.DeclarationOfTrusteeCollection.IsNullOrHasNotExactlyOneMatching(f => f.FTER99.Equals("TrueAndCorrect"));
As mentioned by Rawling you could shorten the extension using Take():
public static bool IsNullOrHasNotExactlyOneMatching<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
return source?.Where(predicate).Take(2).Count() != 1;
}
or do this directly:
bool assertion = report.DeclarationOfTrusteeCollection?.Where(f => f.FTER99.Equals("TrueAndCorrect"))
.Take(2).Count() != 1;
Both versions only iterate until a second match was found (or until the end if no match was found).

How to properly check IEnumerable for existing results

What's the best practice to check if a collection has items?
Here's an example of what I have:
var terminalsToSync = TerminalAction.GetAllTerminals();
if(terminalsToSync.Any())
SyncTerminals(terminalsToSync);
else
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
The GetAllTerminals() method will execute a stored procedure and, if we return a result, (Any() is true), SyncTerminals() will loop through the elements; thus enumerating it again and executing the stored procedure for the second time.
What's the best way to avoid this?
I'd like a good solution that can be used in other cases too; possibly without converting it to List.
Thanks in advance.

I would probably use a ToArray call, and then check Length; you're going to enumerate all the results anyway so why not do it early? However, since you've said you want to avoid early realisation of the enumerable...
I'm guessing that SyncTerminals has a foreach, in which case you can write it something like this:
bool any = false;
foreach(var terminal in terminalsToSync)
{
if(!any)any = true;
//....
}
if(!any)
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
Okay, there's a redundant if after the first loop, but I'm guessing the cost of an extra few CPU cycles isn't going to matter much.
Equally, you could do the iteration the old way and use a do...while loop and GetEnumerator; taking the first iteration out of the loop; that way there are literally no wasted operations:
var enumerator = terminalsToSync.GetEnumerator();
if(enumerator.MoveNext())
{
do
{
//sync enumerator.Current
} while(enumerator.MoveNext())
}
else
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);

How about this, which still defers execution, but buffers it once executed:
var terminalsToSync = TerminalAction.GetAllTerminals().Lazily();
with:
public static class LazyEnumerable {
public static IEnumerable<T> Lazily<T>(this IEnumerable<T> source) {
if (source is LazyWrapper<T>) return source;
return new LazyWrapper<T>(source);
}
class LazyWrapper<T> : IEnumerable<T> {
private IEnumerable<T> source;
private bool executed;
public LazyWrapper(IEnumerable<T> source) {
if (source == null) throw new ArgumentNullException("source");
this.source = source;
}
IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
public IEnumerator<T> GetEnumerator() {
if (!executed) {
executed = true;
source = source.ToList();
}
return source.GetEnumerator();
}
}
}

Personally i wouldnt use an any here, foreach will simply not loop through any items if the collection is empty, so i would just do it like that. However i would recommend that you check for null.
If you do want to pre-enumerate the set use .ToArray() eg will only enumerate once:
var terminalsToSync = TerminalAction.GetAllTerminals().ToArray();
if(terminalsToSync.Any())
SyncTerminals(terminalsToSync);

var terminalsToSync = TerminalAction.GetAllTerminals().ToList();
if(terminalsToSync.Any())
SyncTerminals(terminalsToSync);
else
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);

.Length or .Count is faster since it doesn't need to go through the GetEnumerator()/MoveNext()/Dispose() required by Any()

Here's another way of approaching this problem:
int count = SyncTerminals(terminalsToSync);
if(count == 0) GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
where you change SyncTerminals to do:
int count = 0;
foreach(var obj in terminalsToSync) {
count++;
// some code
}
return count;
Nice and simple.

All the caching solutions here are caching all items when the first item is being retrieved. It it really lazy if you cache each single item while the items of the list are is iterated.
The difference can be seen in this example:
public class LazyListTest
{
private int _count = 0;
public void Test()
{
var numbers = Enumerable.Range(1, 40);
var numbersQuery = numbers.Select(GetElement).ToLazyList(); // Cache lazy
var total = numbersQuery.Take(3)
.Concat(numbersQuery.Take(10))
.Concat(numbersQuery.Take(3))
.Sum();
Console.WriteLine(_count);
}
private int GetElement(int value)
{
_count++;
// Some slow stuff here...
return value * 100;
}
}
If you run the Test() method, the _count is only 10. Without caching it would be 16 and with .ToList() it would be 40!
An example of the implementation of LazyList can be found here.

If you're seeing two procedure calls for the evaluation of whatever GetAllTerminals() returns, this means that the procedure's result isn't being cached. Without knowing what data-access strategy you're using, this is quite hard to fix in a general way.
The simplest solution, as you've alluded, is to copy the result of the call before you perform any other operations. If you wanted to, you could neatly wrap this behaviour up in an IEnumerable<T> which executes the inner enumerable call just once:
public class CachedEnumerable<T> : IEnumerable<T>
{
public CachedEnumerable<T>(IEnumerable<T> enumerable)
{
result = new Lazy<List<T>>(() => enumerable.ToList());
}
private Lazy<List<T>> result;
public IEnumerator<T> GetEnumerator()
{
return this.result.Value.GetEnumerator();
}
System.Collections.IEnumerable GetEnumerator()
{
return this.GetEnumerator();
}
}
Wrap the result in an instance of this type and it will not evaluate the inner enumerable multiple times.

Extensions for IEnumerable generic

I've got two extensions for IEnumerable:
public static class IEnumerableGenericExtensions
{
public static IEnumerable<IEnumerable<T>> InSetsOf<T>(this IEnumerable<T> source, int max)
{
List<T> toReturn = new List<T>(max);
foreach (var item in source)
{
toReturn.Add(item);
if (toReturn.Count == max)
{
yield return toReturn;
toReturn = new List<T>(max);
}
}
if (toReturn.Any())
{
yield return toReturn;
}
}
public static int IndexOf<T>(this IEnumerable<T> source, Predicate<T> searchPredicate)
{
int i = 0;
foreach (var item in source)
if (searchPredicate(item))
return i;
else
i++;
return -1;
}
}
Then I write this code:
Pages = history.InSetsOf<Message>(500);
var index = Pages.IndexOf(x => x == Pages.ElementAt(0));
where
public class History : IEnumerable
But as a result I've got not '0' as I've expected, but '-1'. I cant understand - why so?

When you write Pages.IndexOf(x => x == Pages.ElementAt(0));, you actually run InSetsOf many times, due to deferred execution (aka lazy). To expand:
Pages = history.InSetsOf<Message>(500) - this line doesn't run InSetsOf at all.
Pages.IndexOf - Iterates over Pages, so it starts executing InSetsOf once.
x == Pages.ElementAt(0) - this executes InSetsOf again, once for every element in the collection of Pages (or at least until searchPredicate return true, which doesn't happen here).
Each time you run InSetsOf you create a new list (specifically, a new first list, because you use ElementAt(0)). These are two different objects, so comparison of == between them fails.
An extremely simple fix would be to return a list, so Pages is not a deferred query, but a concrete collection:
Pages = history.InSetsOf<Message>(500).ToList();
Another option is to use SequenceEqual, though I'd recommend caching the first element anyway:
Pages = history.InSetsOf<Message>(500);
var firstPage = Pages.FirstOrDefault();
var index = Pages.IndexOf(x => x.SequenceEqual(firstPage));

Does your class T implement the IComparable? If not, your equality check might be flawed, as the framework does not know exactly when T= T. You would also get by just overriding equals on your class T I would guess.

How do I ensure a sequence has a certain length?

I want to check that an IEnumerable contains exactly one element. This snippet does work:
bool hasOneElement = seq.Count() == 1
However it's not very efficient, as Count() will enumerate the entire list. Obviously, knowing a list is empty or contains more than 1 element means it's not empty. Is there an extension method that has this short-circuiting behaviour?

This should do it:
public static bool ContainsExactlyOneItem<T>(this IEnumerable<T> source)
{
using (IEnumerator<T> iterator = source.GetEnumerator())
{
// Check we've got at least one item
if (!iterator.MoveNext())
{
return false;
}
// Check we've got no more
return !iterator.MoveNext();
}
}
You could elide this further, but I don't suggest you do so:
public static bool ContainsExactlyOneItem<T>(this IEnumerable<T> source)
{
using (IEnumerator<T> iterator = source.GetEnumerator())
{
return iterator.MoveNext() && !iterator.MoveNext();
}
}
It's the sort of trick which is funky, but probably shouldn't be used in production code. It's just not clear enough. The fact that the side-effect in the LHS of the && operator is required for the RHS to work appropriately is just nasty... while a lot of fun ;)
EDIT: I've just seen that you came up with exactly the same thing but for an arbitrary length. Your final return statement is wrong though - it should be return !en.MoveNext(). Here's a complete method with a nicer name (IMO), argument checking and optimization for ICollection/ICollection<T>:
public static bool CountEquals<T>(this IEnumerable<T> source, int count)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
if (count < 0)
{
throw new ArgumentOutOfRangeException("count",
"count must not be negative");
}
// We don't rely on the optimizations in LINQ to Objects here, as
// they have changed between versions.
ICollection<T> genericCollection = source as ICollection<T>;
if (genericCollection != null)
{
return genericCollection.Count == count;
}
ICollection nonGenericCollection = source as ICollection;
if (nonGenericCollection != null)
{
return nonGenericCollection.Count == count;
}
// Okay, we're finally ready to do the actual work...
using (IEnumerator<T> iterator = source.GetEnumerator())
{
for (int i = 0; i < count; i++)
{
if (!iterator.MoveNext())
{
return false;
}
}
// Check we've got no more
return !iterator.MoveNext();
}
}
EDIT: And now for functional fans, a recursive form of CountEquals (please don't use this, it's only here for giggles):
public static bool CountEquals<T>(this IEnumerable<T> source, int count)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
if (count < 0)
{
throw new ArgumentOutOfRangeException("count",
"count must not be negative");
}
using (IEnumerator<T> iterator = source.GetEnumerator())
{
return IteratorCountEquals(iterator, count);
}
}
private static bool IteratorCountEquals<T>(IEnumerator<T> iterator, int count)
{
return count == 0 ? !iterator.MoveNext()
: iterator.MoveNext() && IteratorCountEquals(iterator, count - 1);
}
EDIT: Note that for something like LINQ to SQL, you should use the simple Count() approach - because that'll allow it to be done at the database instead of after fetching actual results.

No, but you can write one yourself:
public static bool HasExactly<T>(this IEnumerable<T> source, int count)
{
if(source == null)
throw new ArgumentNullException("source");
if(count < 0)
return false;
return source.Take(count + 1).Count() == count;
}
EDIT: Changed from atleast to exactly after clarification.
For a more general and efficient solution (which uses only 1 enumerator and checks if the sequence implements ICollection or ICollection<T> in which case enumeration is not necessary), you might want to take a look at my answer here, which lets you specify whether you are looking forExact,AtLeast, orAtMost tests.

seq.Skip(1).Any() will tell you if the list has zero or one elements.
I think the edit you made is about the most efficient way to check the length is n. But there's a logic fault, items less than length long will return true. See what I've done to the second return statement.
public static bool LengthEquals<T>(this IEnumerable<T> en, int length)
{
using (var er = en.GetEnumerator())
{
for (int i = 0; i < length; i++)
{
if (!er.MoveNext())
return false;
}
return !er.MoveNext();
}
}

How about this?
public static bool CountEquals<T>(this IEnumerable<T> source, int count) {
return source.Take(count + 1).Count() == count;
}
The Take() will make sure we never call MoveNext more than count+1 times.
I'd like to note that for any instance of ICollection, the original implementation source.Count() == count should be faster because Count() is optimised to just look at the Count member.

I believe what you're looking for is .Single(). Anything other than exactly one will throw InvalidOperationException that you can catch.
http://msdn.microsoft.com/nb-no/library/bb155325.aspx

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Efficient Linq Enumerable's 'Count() == 1' test - c#

int constrainedCount = yourSequence.Take(2).Count(); // if constrainedCount == 0 then the sequence is empty // if constrainedCount == 1 then the sequence contains a single element // if constrainedCount == 2 then the sequence has more than one element

One way is to write a new extension method public static bool IsSingle<T>(this IEnumerable<T> enumerable) { using (var enumerator = enumerable.GetEnumerator()) { if (!enumerator.MoveNext()) { return false; } return !enumerator.MoveNext(); } }

Another way: bool containsMoreThanOneElement = yourSequence.Skip(1).Any(); Or for exactly 1 element: bool containsOneElement = yourSequence.Any() && !yourSequence.Skip(1).Any();

Efficient Count() == n test: public static bool CountIsEqualTo<T>(this IEnumerable<T> enumerable, int c) { using (var enumerator = enumerable.GetEnumerator()) { for(var i = 0; i < c ; i++) if (!enumerator.MoveNext()) return false; return !enumerator.MoveNext(); } }

With linq to objects, SingleOrDefault throws if there is more than one element, so you're probably best off if you roll your own. EDIT: Now I've seen LukeH's answer, and I have to say I prefer it. Wish I'd thought of it myself!

bool hasTwo = yourSequence.ElementAtOrDefault(1) != default(T); ...in case of class where values can be null this could maybe we useful.

Related

Does successive calls to a method like Count() reenumerate an IEnumerable<T>?

Looking for alternative LINQ expression(s)

How to properly check IEnumerable for existing results

Extensions for IEnumerable generic

How do I ensure a sequence has a certain length?

Categories

Resources