How do I ensure a sequence has a certain length? - c#

I want to check that an IEnumerable contains exactly one element. This snippet does work:
bool hasOneElement = seq.Count() == 1;
However, it's not very efficient, as Count() will enumerate the entire list. Obviously, knowing that a list is empty or contains more than one element means it doesn't contain exactly one. Is there an extension method that has this short-circuiting behaviour?

This should do it:
public static bool ContainsExactlyOneItem<T>(this IEnumerable<T> source)
{
using (IEnumerator<T> iterator = source.GetEnumerator())
{
// Check we've got at least one item
if (!iterator.MoveNext())
{
return false;
}
// Check we've got no more
return !iterator.MoveNext();
}
}
You could elide this further, but I don't suggest you do so:
public static bool ContainsExactlyOneItem<T>(this IEnumerable<T> source)
{
using (IEnumerator<T> iterator = source.GetEnumerator())
{
return iterator.MoveNext() && !iterator.MoveNext();
}
}
It's the sort of trick which is funky, but probably shouldn't be used in production code. It's just not clear enough. The fact that the side-effect in the LHS of the && operator is required for the RHS to work appropriately is just nasty... while a lot of fun ;)
EDIT: I've just seen that you came up with exactly the same thing but for an arbitrary length. Your final return statement is wrong though - it should be return !en.MoveNext(). Here's a complete method with a nicer name (IMO), argument checking and optimization for ICollection/ICollection<T>:
public static bool CountEquals<T>(this IEnumerable<T> source, int count)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
if (count < 0)
{
throw new ArgumentOutOfRangeException("count",
"count must not be negative");
}
// We don't rely on the optimizations in LINQ to Objects here, as
// they have changed between versions.
ICollection<T> genericCollection = source as ICollection<T>;
if (genericCollection != null)
{
return genericCollection.Count == count;
}
ICollection nonGenericCollection = source as ICollection;
if (nonGenericCollection != null)
{
return nonGenericCollection.Count == count;
}
// Okay, we're finally ready to do the actual work...
using (IEnumerator<T> iterator = source.GetEnumerator())
{
for (int i = 0; i < count; i++)
{
if (!iterator.MoveNext())
{
return false;
}
}
// Check we've got no more
return !iterator.MoveNext();
}
}
EDIT: And now for functional fans, a recursive form of CountEquals (please don't use this, it's only here for giggles):
public static bool CountEquals<T>(this IEnumerable<T> source, int count)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
if (count < 0)
{
throw new ArgumentOutOfRangeException("count",
"count must not be negative");
}
using (IEnumerator<T> iterator = source.GetEnumerator())
{
return IteratorCountEquals(iterator, count);
}
}
private static bool IteratorCountEquals<T>(IEnumerator<T> iterator, int count)
{
return count == 0 ? !iterator.MoveNext()
: iterator.MoveNext() && IteratorCountEquals(iterator, count - 1);
}
EDIT: Note that for something like LINQ to SQL, you should use the simple Count() approach - because that'll allow it to be done at the database instead of after fetching actual results.

No, but you can write one yourself:
public static bool HasExactly<T>(this IEnumerable<T> source, int count)
{
if(source == null)
throw new ArgumentNullException("source");
if(count < 0)
return false;
return source.Take(count + 1).Count() == count;
}
EDIT: Changed from atleast to exactly after clarification.
For a more general and efficient solution (which uses only 1 enumerator and checks if the sequence implements ICollection or ICollection<T>, in which case enumeration is not necessary), you might want to take a look at my answer here, which lets you specify whether you are looking for Exact, AtLeast, or AtMost tests.

!seq.Skip(1).Any() will tell you if the list has zero or one elements.
I think the edit you made is about the most efficient way to check the length is n. But there's a logic fault: sequences shorter than length will return true. See what I've done to the second return statement.
public static bool LengthEquals<T>(this IEnumerable<T> en, int length)
{
using (var er = en.GetEnumerator())
{
for (int i = 0; i < length; i++)
{
if (!er.MoveNext())
return false;
}
return !er.MoveNext();
}
}

How about this?
public static bool CountEquals<T>(this IEnumerable<T> source, int count) {
return source.Take(count + 1).Count() == count;
}
The Take() will make sure we never call MoveNext more than count+1 times.
I'd like to note that for any instance of ICollection, the original implementation source.Count() == count should be faster because Count() is optimised to just look at the Count member.
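To see the bound on enumeration in practice, here is a minimal sketch. The Naturals iterator and the Pulled counter are hypothetical names used only for this illustration; the sketch applies the Take(count + 1) approach to an endless sequence, where a plain Count() would never return.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TakeBoundDemo
{
    // Number of elements the iterator below has produced so far.
    public static int Pulled;

    // Hypothetical endless sequence: 0, 1, 2, ...
    public static IEnumerable<int> Naturals()
    {
        for (int n = 0; ; n++)
        {
            Pulled++;
            yield return n;
        }
    }

    // Same technique as above: look at no more than count + 1 elements.
    public static bool CountEquals<T>(IEnumerable<T> source, int count)
    {
        return source.Take(count + 1).Count() == count;
    }
}
```

Calling TakeBoundDemo.CountEquals(TakeBoundDemo.Naturals(), 3) returns false after pulling at most four elements from the endless source.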

I believe what you're looking for is .Single(). Anything other than exactly one will throw InvalidOperationException that you can catch.
http://msdn.microsoft.com/nb-no/library/bb155325.aspx
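As a rough sketch of that approach, the hypothetical helper below (HasExactlyOne is a name invented for this example) wraps Single() in a try/catch. Using exceptions for control flow tends to be slow and noisy, so treat it as an illustration rather than a recommendation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SingleDemo
{
    // Hypothetical helper: true when Single() succeeds, i.e. exactly one element.
    public static bool HasExactlyOne<T>(IEnumerable<T> source)
    {
        try
        {
            source.Single();
            return true;
        }
        catch (InvalidOperationException)
        {
            // Thrown both for an empty sequence and for more than one element.
            return false;
        }
    }
}
```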

Related

Does successive calls to a method like Count() reenumerate an IEnumerable<T>?

I'm a bit confused with the IEnumerable and its deferred execution behaviour.
Let's say I have the following IEnumerable<T>:
IEnumerable<Foo> enumerable = GetFoos();
What happens if I do:
int count = enumerable.Count();
count = enumerable.Count();
I can think of three possibilities:
a) will enumerate the collection again
b) will cache the result to use on the second time (like lazy loading)
c) will depend on the underlying type instantiated in the GetFoos() method and how it implemented the IEnumerable interface
Which one is the correct one? Also, if c is the correct one, what happens with an IEnumerable created using yield return?
A quick check of the reference source gives the following definition for the .Count<T>(this IEnumerable<T>) extension (simplified):
Disclaimer: you should not depend on that implementation, nor expect that it will always behave in a certain way.
public static int Count<TSource>(this IEnumerable<TSource> source) {
ICollection<TSource> collectionoft = source as ICollection<TSource>;
if (collectionoft != null) return collectionoft.Count;
ICollection collection = source as ICollection;
if (collection != null) return collection.Count;
int count = 0;
using (IEnumerator<TSource> e = source.GetEnumerator()) {
while (e.MoveNext()) count++;
}
return count;
}
So the answer is (c): it depends on the underlying type. If the type is not an ICollection, the IEnumerable will be evaluated a second time (i.e. GetEnumerator will be called and the enumerator will be looped over).
So what happens when using yield syntax?
Well yield is just a fancy way to implement GetEnumerator, so what will happen when calling Count() twice is that this pseudo GetEnumerator method will be called twice. I think a code snippet says more than a thousand words:
private static IEnumerable<int> ConstantEnumerable()
{
yield return 1;
yield return 2;
yield return 3;
}
private static int i = 0;
private static IEnumerable<int> ChangingEnumerable()
{
if (i == 0)
{
yield return 1;
i++;
}
else
{
yield return 2;
yield return 3;
}
}
public static void Main()
{
var constant = ConstantEnumerable();
var changing = ChangingEnumerable();
Console.WriteLine("Constant: {0}, {1}", constant.Count(), constant.Count()); // 3, 3
Console.WriteLine("Changing: {0}, {1}", changing.Count(), changing.Count()); // 1, 2
}

How to properly check IEnumerable for existing results

What's the best practice to check if a collection has items?
Here's an example of what I have:
var terminalsToSync = TerminalAction.GetAllTerminals();
if(terminalsToSync.Any())
SyncTerminals(terminalsToSync);
else
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
The GetAllTerminals() method will execute a stored procedure and, if we return a result, (Any() is true), SyncTerminals() will loop through the elements; thus enumerating it again and executing the stored procedure for the second time.
What's the best way to avoid this?
I'd like a good solution that can be used in other cases too; possibly without converting it to List.
Thanks in advance.
I would probably use a ToArray call, and then check Length; you're going to enumerate all the results anyway so why not do it early? However, since you've said you want to avoid early realisation of the enumerable...
I'm guessing that SyncTerminals has a foreach, in which case you can write it something like this:
bool any = false;
foreach(var terminal in terminalsToSync)
{
if(!any)any = true;
//....
}
if(!any)
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
Okay, there's a redundant if after the first loop, but I'm guessing the cost of an extra few CPU cycles isn't going to matter much.
Equally, you could do the iteration the old way and use a do...while loop and GetEnumerator; taking the first iteration out of the loop; that way there are literally no wasted operations:
var enumerator = terminalsToSync.GetEnumerator();
if(enumerator.MoveNext())
{
do
{
//sync enumerator.Current
} while(enumerator.MoveNext());
}
else
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
How about this, which still defers execution, but buffers it once executed:
var terminalsToSync = TerminalAction.GetAllTerminals().Lazily();
with:
public static class LazyEnumerable {
public static IEnumerable<T> Lazily<T>(this IEnumerable<T> source) {
if (source is LazyWrapper<T>) return source;
return new LazyWrapper<T>(source);
}
class LazyWrapper<T> : IEnumerable<T> {
private IEnumerable<T> source;
private bool executed;
public LazyWrapper(IEnumerable<T> source) {
if (source == null) throw new ArgumentNullException("source");
this.source = source;
}
IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
public IEnumerator<T> GetEnumerator() {
if (!executed) {
executed = true;
source = source.ToList();
}
return source.GetEnumerator();
}
}
}
Personally I wouldn't use Any() here; foreach will simply not loop if the collection is empty, so I would just do it like that. However, I would recommend that you check for null.
If you do want to pre-enumerate the set, use .ToArray(), which will only enumerate once, e.g.:
var terminalsToSync = TerminalAction.GetAllTerminals().ToArray();
if(terminalsToSync.Any())
SyncTerminals(terminalsToSync);
var terminalsToSync = TerminalAction.GetAllTerminals().ToList();
if(terminalsToSync.Any())
SyncTerminals(terminalsToSync);
else
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
.Length or .Count is faster since it doesn't need to go through the GetEnumerator()/MoveNext()/Dispose() required by Any()
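The difference between the deferred and the materialised variants can be made visible with a small sketch. Everything here is hypothetical: GetAllTerminals is a stand-in iterator for the stored-procedure call, and Executions counts how often its body actually starts running.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class MaterializeDemo
{
    // Counts how many times the (hypothetical) expensive source is executed.
    public static int Executions;

    // Stand-in for a deferred query such as a stored-procedure call.
    public static IEnumerable<string> GetAllTerminals()
    {
        Executions++;
        yield return "T1";
        yield return "T2";
    }

    // Deferred: Any() and the foreach each start a fresh enumeration.
    public static int SyncDeferred()
    {
        IEnumerable<string> terminals = GetAllTerminals();
        int synced = 0;
        if (terminals.Any())
        {
            foreach (string terminal in terminals) synced++;
        }
        return synced;
    }

    // Materialised: the source runs once; Count is a plain property read.
    public static int SyncMaterialised()
    {
        List<string> terminals = GetAllTerminals().ToList();
        int synced = 0;
        if (terminals.Count > 0)
        {
            foreach (string terminal in terminals) synced++;
        }
        return synced;
    }
}
```

In this sketch SyncDeferred runs the source twice, while SyncMaterialised runs it once; both process the same two items.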
Here's another way of approaching this problem:
int count = SyncTerminals(terminalsToSync);
if(count == 0) GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
where you change SyncTerminals to do:
int count = 0;
foreach(var obj in terminalsToSync) {
count++;
// some code
}
return count;
Nice and simple.
All the caching solutions here cache all items as soon as the first item is retrieved. It is only really lazy if you cache each item individually as the list is iterated.
The difference can be seen in this example:
public class LazyListTest
{
private int _count = 0;
public void Test()
{
var numbers = Enumerable.Range(1, 40);
var numbersQuery = numbers.Select(GetElement).ToLazyList(); // Cache lazy
var total = numbersQuery.Take(3)
.Concat(numbersQuery.Take(10))
.Concat(numbersQuery.Take(3))
.Sum();
Console.WriteLine(_count);
}
private int GetElement(int value)
{
_count++;
// Some slow stuff here...
return value * 100;
}
}
If you run the Test() method, the _count is only 10. Without caching it would be 16 and with .ToList() it would be 40!
An example of the implementation of LazyList can be found here.
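For readers without access to that implementation, here is one possible sketch of a ToLazyList extension. It is not the linked code: the class layout and member names are assumptions, it is not thread-safe, and the source enumerator is only disposed once the source has been read to the end.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class LazyListExtensions
{
    // Hypothetical extension: caches items one at a time, as they are first requested.
    public static IEnumerable<T> ToLazyList<T>(this IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException("source");
        return new LazyList<T>(source);
    }

    private sealed class LazyList<T> : IEnumerable<T>
    {
        private readonly IEnumerable<T> source;
        private readonly List<T> cache = new List<T>();
        private IEnumerator<T> enumerator; // created on first fetch
        private bool exhausted;

        public LazyList(IEnumerable<T> source)
        {
            this.source = source;
        }

        public IEnumerator<T> GetEnumerator()
        {
            for (int index = 0; ; index++)
            {
                // Serve from the cache; touch the source only when the cache runs out.
                if (index >= cache.Count && !TryFetchNext()) yield break;
                yield return cache[index];
            }
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }

        // Pulls exactly one more item from the source into the cache.
        private bool TryFetchNext()
        {
            if (exhausted) return false;
            if (enumerator == null) enumerator = source.GetEnumerator();
            if (enumerator.MoveNext())
            {
                cache.Add(enumerator.Current);
                return true;
            }
            exhausted = true;
            enumerator.Dispose();
            return false;
        }
    }
}
```

Every enumeration of the wrapper shares the same cache, so an item is computed by the source at most once no matter how many times it is read.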
If you're seeing two procedure calls for the evaluation of whatever GetAllTerminals() returns, this means that the procedure's result isn't being cached. Without knowing what data-access strategy you're using, this is quite hard to fix in a general way.
The simplest solution, as you've alluded, is to copy the result of the call before you perform any other operations. If you wanted to, you could neatly wrap this behaviour up in an IEnumerable<T> which executes the inner enumerable call just once:
public class CachedEnumerable<T> : IEnumerable<T>
{
// A constructor is declared without the type parameter list.
public CachedEnumerable(IEnumerable<T> enumerable)
{
result = new Lazy<List<T>>(() => enumerable.ToList());
}
private Lazy<List<T>> result;
public IEnumerator<T> GetEnumerator()
{
return this.result.Value.GetEnumerator();
}
// Explicit implementation of the non-generic interface member.
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
}
Wrap the result in an instance of this type and it will not evaluate the inner enumerable multiple times.

Extensions for IEnumerable generic

I've got two extensions for IEnumerable:
public static class IEnumerableGenericExtensions
{
public static IEnumerable<IEnumerable<T>> InSetsOf<T>(this IEnumerable<T> source, int max)
{
List<T> toReturn = new List<T>(max);
foreach (var item in source)
{
toReturn.Add(item);
if (toReturn.Count == max)
{
yield return toReturn;
toReturn = new List<T>(max);
}
}
if (toReturn.Any())
{
yield return toReturn;
}
}
public static int IndexOf<T>(this IEnumerable<T> source, Predicate<T> searchPredicate)
{
int i = 0;
foreach (var item in source)
if (searchPredicate(item))
return i;
else
i++;
return -1;
}
}
Then I write this code:
Pages = history.InSetsOf<Message>(500);
var index = Pages.IndexOf(x => x == Pages.ElementAt(0));
where
public class History : IEnumerable
But the result I get is not '0', as I expected, but '-1'. I can't understand why.
When you write Pages.IndexOf(x => x == Pages.ElementAt(0));, you actually run InSetsOf many times, due to deferred execution (aka lazy). To expand:
Pages = history.InSetsOf<Message>(500) - this line doesn't run InSetsOf at all.
Pages.IndexOf - Iterates over Pages, so it starts executing InSetsOf once.
x == Pages.ElementAt(0) - this executes InSetsOf again, once for every element in the collection of Pages (or at least until searchPredicate returns true, which doesn't happen here).
Each time you run InSetsOf you create a new list (specifically, a new first list, because you use ElementAt(0)). These are two different objects, so comparison of == between them fails.
An extremely simple fix would be to return a list, so Pages is not a deferred query, but a concrete collection:
Pages = history.InSetsOf<Message>(500).ToList();
Another option is to use SequenceEqual, though I'd recommend caching the first element anyway:
Pages = history.InSetsOf<Message>(500);
var firstPage = Pages.FirstOrDefault();
var index = Pages.IndexOf(x => x.SequenceEqual(firstPage));
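The reference-equality point can be reproduced in isolation. In the sketch below, Pages is a hypothetical stand-in for InSetsOf that builds a new list on every enumeration; the three methods compare the first page by reference, by contents, and by reference again after ToList().

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PageEqualityDemo
{
    // Hypothetical stand-in for InSetsOf: builds a new list on every enumeration.
    public static IEnumerable<IEnumerable<int>> Pages()
    {
        yield return new List<int> { 1, 2, 3 };
    }

    // Two separate enumerations produce two distinct List<int> instances.
    public static bool SameReference()
    {
        IEnumerable<IEnumerable<int>> pages = Pages();
        return pages.First() == pages.First();
    }

    // SequenceEqual compares the elements rather than the references.
    public static bool SameContents()
    {
        IEnumerable<IEnumerable<int>> pages = Pages();
        return pages.First().SequenceEqual(pages.First());
    }

    // After ToList() the pages are created once and then reused.
    public static bool SameReferenceAfterToList()
    {
        List<IEnumerable<int>> pages = Pages().ToList();
        return pages.First() == pages.First();
    }
}
```

Here SameReference() is false, while SameContents() and SameReferenceAfterToList() are both true.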
Does your class T implement IComparable? If not, your equality check might be flawed, as the framework does not know exactly when one T equals another. You could probably also get by just overriding Equals on your class T, I would guess.

Efficient Linq Enumerable's 'Count() == 1' test

Similar to this question but rephrased for Linq:
You can use Enumerable<T>.Any() to test if the enumerable contains data. But what's the efficient way to test if the enumerable contains a single value (i.e. Enumerable<T>.Count() == 1) or greater than a single value (i.e. Enumerable<T>.Count() > 1) without using an expensive count operation?
int constrainedCount = yourSequence.Take(2).Count();
// if constrainedCount == 0 then the sequence is empty
// if constrainedCount == 1 then the sequence contains a single element
// if constrainedCount == 2 then the sequence has more than one element
One way is to write a new extension method
public static bool IsSingle<T>(this IEnumerable<T> enumerable) {
using (var enumerator = enumerable.GetEnumerator()) {
if (!enumerator.MoveNext()) {
return false;
}
return !enumerator.MoveNext();
}
}
This code takes LukeH's excellent answer and wraps it up as an IEnumerable extension so that your code can deal in terms of None, One and Many rather than 0, 1 and 2.
public enum Multiplicity
{
None,
One,
Many,
}
In a static class, e.g. EnumerableExtensions:
public static Multiplicity Multiplicity<TElement>(this IEnumerable<TElement> @this)
{
switch (@this.Take(2).Count())
{
case 0: return General.Multiplicity.None;
case 1: return General.Multiplicity.One;
case 2: return General.Multiplicity.Many;
default: throw new Exception("WTF‽");
}
}
Another way:
bool containsMoreThanOneElement = yourSequence.Skip(1).Any();
Or for exactly 1 element:
bool containsOneElement = yourSequence.Any() && !yourSequence.Skip(1).Any();
Efficient Count() == n test:
public static bool CountIsEqualTo<T>(this IEnumerable<T> enumerable, int c)
{
using (var enumerator = enumerable.GetEnumerator())
{
for(var i = 0; i < c ; i++)
if (!enumerator.MoveNext())
return false;
return !enumerator.MoveNext();
}
}
With linq to objects, SingleOrDefault throws if there is more than one element, so you're probably best off if you roll your own.
EDIT: Now I've seen LukeH's answer, and I have to say I prefer it. Wish I'd thought of it myself!
bool hasTwo = yourSequence.ElementAtOrDefault(1) != default(T);
...this could be useful when the elements are of a class type and can never be null, since a null (default) second element is indistinguishable from a missing one.
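A short sketch of where the ElementAtOrDefault test can go wrong, specialised to int for illustration (both method names are hypothetical). A second element equal to default(int), i.e. 0, looks the same as no second element at all, whereas Skip(1).Any() does not depend on element values.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ElementAtOrDefaultDemo
{
    // The ElementAtOrDefault test, specialised to int, where default(int) is 0.
    public static bool HasTwoViaDefault(IEnumerable<int> source)
    {
        return source.ElementAtOrDefault(1) != default(int);
    }

    // Alternative that does not depend on element values.
    public static bool HasTwoViaSkip(IEnumerable<int> source)
    {
        return source.Skip(1).Any();
    }
}
```

For the sequence { 5, 0 }, HasTwoViaDefault returns false even though a second element exists, while HasTwoViaSkip returns true.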

Calculating Count for IEnumerable (Non Generic)

Can anyone help me with a Count extension method for IEnumerable (non generic interface).
I know it is not supported in LINQ but how to write it manually?
yourEnumerable.Cast<object>().Count()
To the comment about performance:
I think this is a good example of premature optimization but here you go:
static class EnumerableExtensions
{
public static int Count(this IEnumerable source)
{
int res = 0;
foreach (var item in source)
res++;
return res;
}
}
The simplest form would be:
public static int Count(this IEnumerable source)
{
int c = 0;
using (var e = source.GetEnumerator())
{
while (e.MoveNext())
c++;
}
return c;
}
You can then improve on this by querying for ICollection:
public static int Count(this IEnumerable source)
{
var col = source as ICollection;
if (col != null)
return col.Count;
int c = 0;
using (var e = source.GetEnumerator())
{
while (e.MoveNext())
c++;
}
return c;
}
Update
As Gerard points out in the comments, the non-generic IEnumerator does not inherit IDisposable, so the normal using statement won't work. It is probably still important to attempt to dispose of such enumerators if possible - an iterator method implements IEnumerable and so may be passed indirectly to this Count method. Internally, that iterator method will be depending on a call to Dispose to trigger its own try/finally and using statements.
To make this easy in other circumstances too, you can make your own version of the using statement that is less fussy at compile time:
public static void DynamicUsing(object resource, Action action)
{
try
{
action();
}
finally
{
IDisposable d = resource as IDisposable;
if (d != null)
d.Dispose();
}
}
And the updated Count method would then be:
public static int Count(this IEnumerable source)
{
var col = source as ICollection;
if (col != null)
return col.Count;
int c = 0;
var e = source.GetEnumerator();
DynamicUsing(e, () =>
{
while (e.MoveNext())
c++;
});
return c;
}
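Why disposal matters for iterator methods can be shown with a minimal sketch. Items and CleanedUp are hypothetical names for this illustration: the iterator's finally block runs only when the abandoned enumerator is disposed, even though it is exposed through the non-generic IEnumerable.

```csharp
using System;
using System.Collections;

public static class DisposeDemo
{
    // Set by the iterator's finally block.
    public static bool CleanedUp;

    // Hypothetical iterator with cleanup logic, exposed as non-generic IEnumerable.
    public static IEnumerable Items()
    {
        try
        {
            yield return 1;
            yield return 2;
        }
        finally
        {
            CleanedUp = true;
        }
    }

    // Reads one item, then abandons the enumerator; 'dispose' controls the cleanup.
    public static void ReadFirst(bool dispose)
    {
        IEnumerator e = Items().GetEnumerator();
        e.MoveNext();
        if (dispose)
        {
            // The compiler-generated enumerator implements IDisposable.
            IDisposable d = e as IDisposable;
            if (d != null) d.Dispose();
        }
    }
}
```

In the Count method the enumeration normally runs to completion, so the cleanup is mainly a concern when enumeration stops early or an exception is thrown part-way through.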
Different types of IEnumerable have different optimal methods for determining count; unfortunately, there's no general-purpose means of knowing which method will be best for any given IEnumerable, nor is there even any standard means by which an IEnumerable can indicate which of the following techniques is best:
Simply ask the object directly. Some types of objects that support IEnumerable, such as Array, List and Collection, have properties which can directly report the number of elements in them.
Enumerate all items, discarding them, and count the number of items enumerated.
Enumerate all items into a list, and then use the list if it's necessary to use the enumeration again.
Each of the above will be optimal in different cases.
I think the type chosen to represent your sequence of elements should have been ICollection instead of IEnumerable, in the first place.
Both ICollection and ICollection<T> provide a Count property; in addition, every ICollection implements IEnumerable as well.
