Extensions for IEnumerable generic - c#

I've got two extensions for IEnumerable:
public static class IEnumerableGenericExtensions
{
public static IEnumerable<IEnumerable<T>> InSetsOf<T>(this IEnumerable<T> source, int max)
{
List<T> toReturn = new List<T>(max);
foreach (var item in source)
{
toReturn.Add(item);
if (toReturn.Count == max)
{
yield return toReturn;
toReturn = new List<T>(max);
}
}
if (toReturn.Any())
{
yield return toReturn;
}
}
public static int IndexOf<T>(this IEnumerable<T> source, Predicate<T> searchPredicate)
{
int i = 0;
foreach (var item in source)
if (searchPredicate(item))
return i;
else
i++;
return -1;
}
}
Then I write this code:
Pages = history.InSetsOf<Message>(500);
var index = Pages.IndexOf(x => x == Pages.ElementAt(0));
where
public class History : IEnumerable
But as a result I've got not '0' as I've expected, but '-1'. I cant understand - why so?

When you write Pages.IndexOf(x => x == Pages.ElementAt(0));, you actually run InSetsOf many times, due to deferred execution (aka lazy). To expand:
Pages = history.InSetsOf<Message>(500) - this line doesn't run InSetsOf at all.
Pages.IndexOf - Iterates over Pages, so it starts executing InSetsOf once.
x == Pages.ElementAt(0) - this executes InSetsOf again, once for every element in the collection of Pages (or at least until searchPredicate return true, which doesn't happen here).
Each time you run InSetsOf you create a new list (specifically, a new first list, because you use ElementAt(0)). These are two different objects, so comparison of == between them fails.
An extremely simple fix would be to return a list, so Pages is not a deferred query, but a concrete collection:
Pages = history.InSetsOf<Message>(500).ToList();
Another option is to use SequenceEqual, though I'd recommend caching the first element anyway:
Pages = history.InSetsOf<Message>(500);
var firstPage = Pages.FirstOrDefault();
var index = Pages.IndexOf(x => x.SequenceEqual(firstPage));

Does your class T implement the IComparable? If not, your equality check might be flawed, as the framework does not know exactly when T= T. You would also get by just overriding equals on your class T I would guess.

Related

Does successive calls to a method like Count() reenumerate an IEnumerable<T>?

I'm a bit confused with the IEnumerable and it's deferred execution behaviour.
Let's say I have the following IEnumerable<T>:
IEnumerable<Foo> enumerable = GetFoos();
What happens if I do:
int count = enumerable.Count();
count = enumerable.Count();
I can think of three possibilities:
a) will enumerate the collection again
b) will cache the result to use on the second time (like lazy loading)
c) will depend on the underlining type instantiated in GetFoos() method and how it implemented the IEnumerable interface
Which one is the correct one? Also, if c is the correct one, what happens with an IEnumerable created using yield return?
A quick check on referencesourcecode gives following definition for the .Count<T>(this IEnumerable<T>) extension (simplified):
Disclaimer you should not depend on that implementation nor expect that the implementation will always do something in a certain way.
public static int Count<TSource>(this IEnumerable<TSource> source) {
ICollection<TSource> collectionoft = source as ICollection<TSource>;
if (collectionoft != null) return collectionoft.Count;
ICollection collection = source as ICollection;
if (collection != null) return collection.Count;
int count = 0;
using (IEnumerator<TSource> e = source.GetEnumerator()) {
while (e.MoveNext()) count++;
}
return count;
}
So the answer it is (c), it will depend on the underlying type. And if it's not of type ICollection the IEnumerable will be evaluated a second time (i.e GetEnumerator will be called and the Enumerator will be looped.
So what happens when using yield syntax?
Well yield is just a fancy way to implement GetEnumerator, so what will happen when calling Count() twice is that this pseudo GetEnumerator method will be called twice. I think a code snippet says more than a thousand words:
private static IEnumerable<int> ConstantEnumerable()
{
yield return 1;
yield return 2;
yield return 3;
}
private static int i = 0;
private static IEnumerable<int> ChangingEnumerable()
{
if (i == 0)
{
yield return 1;
i++;
}
else
{
yield return 2;
yield return 3;
}
}
public static void Main()
{
var constant = ConstantEnumerable();
var changing = ChangingEnumerable();
Console.WriteLine("Constant: {0}, {1}", constant.Count(), constant.Count()); // 3, 3
Console.WriteLine("Changing: {0}, {1}", changing.Count(), changing.Count()); // 1, 2
}

How to properly check IEnumerable for existing results

What's the best practice to check if a collection has items?
Here's an example of what I have:
var terminalsToSync = TerminalAction.GetAllTerminals();
if(terminalsToSync.Any())
SyncTerminals(terminalsToSync);
else
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
The GetAllTerminals() method will execute a stored procedure and, if we return a result, (Any() is true), SyncTerminals() will loop through the elements; thus enumerating it again and executing the stored procedure for the second time.
What's the best way to avoid this?
I'd like a good solution that can be used in other cases too; possibly without converting it to List.
Thanks in advance.
I would probably use a ToArray call, and then check Length; you're going to enumerate all the results anyway so why not do it early? However, since you've said you want to avoid early realisation of the enumerable...
I'm guessing that SyncTerminals has a foreach, in which case you can write it something like this:
bool any = false;
foreach(var terminal in terminalsToSync)
{
if(!any)any = true;
//....
}
if(!any)
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
Okay, there's a redundant if after the first loop, but I'm guessing the cost of an extra few CPU cycles isn't going to matter much.
Equally, you could do the iteration the old way and use a do...while loop and GetEnumerator; taking the first iteration out of the loop; that way there are literally no wasted operations:
var enumerator = terminalsToSync.GetEnumerator();
if(enumerator.MoveNext())
{
do
{
//sync enumerator.Current
} while(enumerator.MoveNext())
}
else
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
How about this, which still defers execution, but buffers it once executed:
var terminalsToSync = TerminalAction.GetAllTerminals().Lazily();
with:
public static class LazyEnumerable {
public static IEnumerable<T> Lazily<T>(this IEnumerable<T> source) {
if (source is LazyWrapper<T>) return source;
return new LazyWrapper<T>(source);
}
class LazyWrapper<T> : IEnumerable<T> {
private IEnumerable<T> source;
private bool executed;
public LazyWrapper(IEnumerable<T> source) {
if (source == null) throw new ArgumentNullException("source");
this.source = source;
}
IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
public IEnumerator<T> GetEnumerator() {
if (!executed) {
executed = true;
source = source.ToList();
}
return source.GetEnumerator();
}
}
}
Personally i wouldnt use an any here, foreach will simply not loop through any items if the collection is empty, so i would just do it like that. However i would recommend that you check for null.
If you do want to pre-enumerate the set use .ToArray() eg will only enumerate once:
var terminalsToSync = TerminalAction.GetAllTerminals().ToArray();
if(terminalsToSync.Any())
SyncTerminals(terminalsToSync);
var terminalsToSync = TerminalAction.GetAllTerminals().ToList();
if(terminalsToSync.Any())
SyncTerminals(terminalsToSync);
else
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
.Length or .Count is faster since it doesn't need to go through the GetEnumerator()/MoveNext()/Dispose() required by Any()
Here's another way of approaching this problem:
int count = SyncTerminals(terminalsToSync);
if(count == 0) GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
where you change SyncTerminals to do:
int count = 0;
foreach(var obj in terminalsToSync) {
count++;
// some code
}
return count;
Nice and simple.
All the caching solutions here are caching all items when the first item is being retrieved. It it really lazy if you cache each single item while the items of the list are is iterated.
The difference can be seen in this example:
public class LazyListTest
{
private int _count = 0;
public void Test()
{
var numbers = Enumerable.Range(1, 40);
var numbersQuery = numbers.Select(GetElement).ToLazyList(); // Cache lazy
var total = numbersQuery.Take(3)
.Concat(numbersQuery.Take(10))
.Concat(numbersQuery.Take(3))
.Sum();
Console.WriteLine(_count);
}
private int GetElement(int value)
{
_count++;
// Some slow stuff here...
return value * 100;
}
}
If you run the Test() method, the _count is only 10. Without caching it would be 16 and with .ToList() it would be 40!
An example of the implementation of LazyList can be found here.
If you're seeing two procedure calls for the evaluation of whatever GetAllTerminals() returns, this means that the procedure's result isn't being cached. Without knowing what data-access strategy you're using, this is quite hard to fix in a general way.
The simplest solution, as you've alluded, is to copy the result of the call before you perform any other operations. If you wanted to, you could neatly wrap this behaviour up in an IEnumerable<T> which executes the inner enumerable call just once:
public class CachedEnumerable<T> : IEnumerable<T>
{
public CachedEnumerable<T>(IEnumerable<T> enumerable)
{
result = new Lazy<List<T>>(() => enumerable.ToList());
}
private Lazy<List<T>> result;
public IEnumerator<T> GetEnumerator()
{
return this.result.Value.GetEnumerator();
}
System.Collections.IEnumerable GetEnumerator()
{
return this.GetEnumerator();
}
}
Wrap the result in an instance of this type and it will not evaluate the inner enumerable multiple times.

Efficient Linq Enumerable's 'Count() == 1' test

Similar to this question but rephrased for Linq:
You can use Enumerable<T>.Any() to test if the enumerable contains data. But what's the efficient way to test if the enumerable contains a single value (i.e. Enumerable<T>.Count() == 1) or greater than a single value (i.e. Enumerable<T>.Count() > 1) without using an expensive count operation?
int constrainedCount = yourSequence.Take(2).Count();
// if constrainedCount == 0 then the sequence is empty
// if constrainedCount == 1 then the sequence contains a single element
// if constrainedCount == 2 then the sequence has more than one element
One way is to write a new extension method
public static bool IsSingle<T>(this IEnumerable<T> enumerable) {
using (var enumerator = enumerable.GetEnumerator()) {
if (!enumerator.MoveNext()) {
return false;
}
return !enumerator.MoveNext();
}
}
This code take's LukeH's excellent answer and wraps it up as an IEnumerable extension so that your code can deal in terms of None, One and Many rather than 0, 1 and 2.
public enum Multiplicity
{
None,
One,
Many,
}
In a static class, e.g. EnumerableExtensions:
public static Multiplicity Multiplicity<TElement>(this IEnumerable<TElement> #this)
{
switch (#this.Take(2).Count())
{
case 0: return General.Multiplicity.None;
case 1: return General.Multiplicity.One;
case 2: return General.Multiplicity.Many;
default: throw new Exception("WTF‽");
}
}
Another way:
bool containsMoreThanOneElement = yourSequence.Skip(1).Any();
Or for exactly 1 element:
bool containsOneElement = yourSequence.Any() && !yourSequence.Skip(1).Any();
Efficient Count() == n test:
public static bool CountIsEqualTo<T>(this IEnumerable<T> enumerable, int c)
{
using (var enumerator = enumerable.GetEnumerator())
{
for(var i = 0; i < c ; i++)
if (!enumerator.MoveNext())
return false;
return !enumerator.MoveNext();
}
}
With linq to objects, SingleOrDefault throws if there is more than one element, so you're probably best off if you roll your own.
EDIT: Now I've seen LukeH's answer, and I have to say I prefer it. Wish I'd thought of it myself!
bool hasTwo = yourSequence.ElementAtOrDefault(1) != default(T);
...in case of class where values can be null this could maybe we useful.

Calculating Count for IEnumerable (Non Generic)

Can anyone help me with a Count extension method for IEnumerable (non generic interface).
I know it is not supported in LINQ but how to write it manually?
yourEnumerable.Cast<object>().Count()
To the comment about performance:
I think this is a good example of premature optimization but here you go:
static class EnumerableExtensions
{
public static int Count(this IEnumerable source)
{
int res = 0;
foreach (var item in source)
res++;
return res;
}
}
The simplest form would be:
public static int Count(this IEnumerable source)
{
int c = 0;
using (var e = source.GetEnumerator())
{
while (e.MoveNext())
c++;
}
return c;
}
You can then improve on this by querying for ICollection:
public static int Count(this IEnumerable source)
{
var col = source as ICollection;
if (col != null)
return col.Count;
int c = 0;
using (var e = source.GetEnumerator())
{
while (e.MoveNext())
c++;
}
return c;
}
Update
As Gerard points out in the comments, non-generic IEnumerable does not inherit IDisposable so the normal using statement won't work. It is probably still important to attempt to dispose of such enumerators if possible - an iterator method implements IEnumerable and so may be passed indirectly to this Count method. Internally, that iterator method will be depending on a call to Dispose to trigger its own try/finally and using statements.
To make this easy in other circumstances too, you can make your own version of the using statement that is less fussy at compile time:
public static void DynamicUsing(object resource, Action action)
{
try
{
action();
}
finally
{
IDisposable d = resource as IDisposable;
if (d != null)
d.Dispose();
}
}
And the updated Count method would then be:
public static int Count(this IEnumerable source)
{
var col = source as ICollection;
if (col != null)
return col.Count;
int c = 0;
var e = source.GetEnumerator();
DynamicUsing(e, () =>
{
while (e.MoveNext())
c++;
});
return c;
}
Different types of IEnumerable have different optimal methods for determining count; unfortunately, there's no general-purpose means of knowing which method will be best for any given IEnumerable, nor is there even any standard means by which an IEmumerable can indicate which of the following techniques is best:
Simply ask the object directly. Some types of objects that support IEnumerable, such as Array, List and Collection, have properties which can directly report the number of elements in them.
Enumerate all items, discarding them, and count the number of items enumerated.
Enumerate all items into a list, and then use the list if it's necessary to use the enumeration again.
Each of the above will be optimal in different cases.
I think the type chosen to represent your sequence of elements should have been ICollection instead of IEnumerable, in the first place.
Both ICollection and ICollection<T> provide a Count property - plus - every ICollection implements IEnumearable as well.

How to iterate over two arrays at once?

I have two arrays built while parsing a text file. The first contains the column names, the second contains the values from the current row. I need to iterate over both lists at once to build a map. Right now I have the following:
var currentValues = currentRow.Split(separatorChar);
var valueEnumerator = currentValues.GetEnumerator();
foreach (String column in columnList)
{
valueEnumerator.MoveNext();
valueMap.Add(column, (String)valueEnumerator.Current);
}
This works just fine, but it doesn't quite satisfy my sense of elegance, and it gets really hairy if the number of arrays is larger than two (as I have to do occasionally). Does anyone have another, terser idiom?
You've got a non-obvious pseudo-bug in your initial code - IEnumerator<T> extends IDisposable so you should dispose it. This can be very important with iterator blocks! Not a problem for arrays, but would be with other IEnumerable<T> implementations.
I'd do it like this:
public static IEnumerable<TResult> PairUp<TFirst,TSecond,TResult>
(this IEnumerable<TFirst> source, IEnumerable<TSecond> secondSequence,
Func<TFirst,TSecond,TResult> projection)
{
using (IEnumerator<TSecond> secondIter = secondSequence.GetEnumerator())
{
foreach (TFirst first in source)
{
if (!secondIter.MoveNext())
{
throw new ArgumentException
("First sequence longer than second");
}
yield return projection(first, secondIter.Current);
}
if (secondIter.MoveNext())
{
throw new ArgumentException
("Second sequence longer than first");
}
}
}
Then you can reuse this whenever you have the need:
foreach (var pair in columnList.PairUp(currentRow.Split(separatorChar),
(column, value) => new { column, value })
{
// Do something
}
Alternatively you could create a generic Pair type, and get rid of the projection parameter in the PairUp method.
EDIT:
With the Pair type, the calling code would look like this:
foreach (var pair in columnList.PairUp(currentRow.Split(separatorChar))
{
// column = pair.First, value = pair.Second
}
That looks about as simple as you can get. Yes, you need to put the utility method somewhere, as reusable code. Hardly a problem in my view. Now for multiple arrays...
If the arrays are of different types, we have a problem. You can't express an arbitrary number of type parameters in a generic method/type declaration - you could write versions of PairUp for as many type parameters as you wanted, just like there are Action and Func delegates for up to 4 delegate parameters - but you can't make it arbitrary.
If the values will all be of the same type, however - and if you're happy to stick to arrays - it's easy. (Non-arrays is okay too, but you can't do the length checking ahead of time.) You could do this:
public static IEnumerable<T[]> Zip<T>(params T[][] sources)
{
// (Insert error checking code here for null or empty sources parameter)
int length = sources[0].Length;
if (!sources.All(array => array.Length == length))
{
throw new ArgumentException("Arrays must all be of the same length");
}
for (int i=0; i < length; i++)
{
// Could do this bit with LINQ if you wanted
T[] result = new T[sources.Length];
for (int j=0; j < result.Length; j++)
{
result[j] = sources[j][i];
}
yield return result;
}
}
Then the calling code would be:
foreach (var array in Zip(columns, row, whatevers))
{
// column = array[0]
// value = array[1]
// whatever = array[2]
}
This involves a certain amount of copying, of course - you're creating an array each time. You could change that by introducing another type like this:
public struct Snapshot<T>
{
readonly T[][] sources;
readonly int index;
public Snapshot(T[][] sources, int index)
{
this.sources = sources;
this.index = index;
}
public T this[int element]
{
return sources[element][index];
}
}
This would probably be regarded as overkill by most though ;)
I could keep coming up with all kinds of ideas, to be honest... but the basics are:
With a little bit of reusable work, you can make the calling code nicer
For arbitrary combinations of types you'll have to do each number of parameters (2, 3, 4...) separately due to the way generics works
If you're happy to use the same type for each part, you can do better
if there are the same number of column names as there are elements in each row, could you not use a for loop?
var currentValues = currentRow.Split(separatorChar);
for(var i=0;i<columnList.Length;i++){
// use i to index both (or all) arrays and build your map
}
In a functional language you would usually find a "zip" function which will hopefully be part of a C#4.0 . Bart de Smet provides a funny implementation of zip based on existing LINQ functions:
public static IEnumerable<TResult> Zip<TFirst, TSecond, TResult>(
this IEnumerable<TFirst> first,
IEnumerable<TSecond> second,
Func<TFirst, TSecond, TResult> func)
{
return first.Select((x, i) => new { X = x, I = i })
.Join(second.Select((x, i) => new { X = x, I = i }),
o => o.I,
i => i.I,
(o, i) => func(o.X, i.X));
}
Then you can do:
int[] s1 = new [] { 1, 2, 3 };
int[] s2 = new[] { 4, 5, 6 };
var result = s1.Zip(s2, (i1, i2) => new {Value1 = i1, Value2 = i2});
If you're really using arrays, the best way is probably just to use the conventional for loop with indices. Not as nice, granted, but as far as I know .NET doesn't offer a better way of doing this.
You could also encapsulate your code into a method called zip – this is a common higher-order list function. However, C# lacking a suitable Tuple type, this is quite crufty. You'd end up returning an IEnumerable<KeyValuePair<T1, T2>> which isn't very nice.
By the way, are you really using IEnumerable instead of IEnumerable<T> or why do you cast the Current value?
Use IEnumerator for both would be nice
var currentValues = currentRow.Split(separatorChar);
using (IEnumerator<string> valueEnum = currentValues.GetEnumerator(), columnEnum = columnList.GetEnumerator()) {
while (valueEnum.MoveNext() && columnEnum.MoveNext())
valueMap.Add(columnEnum.Current, valueEnum.Current);
}
Or create an extension methods
public static IEnumerable<TResult> Zip<T1, T2, TResult>(this IEnumerable<T1> source, IEnumerable<T2> other, Func<T1, T2, TResult> selector) {
using (IEnumerator<T1> sourceEnum = source.GetEnumerator()) {
using (IEnumerator<T2> otherEnum = other.GetEnumerator()) {
while (sourceEnum.MoveNext() && columnEnum.MoveNext())
yield return selector(sourceEnum.Current, otherEnum.Current);
}
}
}
Usage
var currentValues = currentRow.Split(separatorChar);
foreach (var valueColumnPair in currentValues.Zip(columnList, (a, b) => new { Value = a, Column = b }) {
valueMap.Add(valueColumnPair.Column, valueColumnPair.Value);
}
Instead of creating two seperate arrays you could make a two-dimensional array, or a dictionary (which would be better). But really, if it works I wouldn't try to change it.

Categories