How does Concat() actually join the collections at a lower level? - c#

What is LINQ actually doing?

(I'm assuming this is for LINQ to Objects. Anything else will be implemented differently :)
It's just returning everything from the first, and then everything from the second. All data is streamed. Something like this:
public static IEnumerable<T> Concat<T>(this IEnumerable<T> source1,
    IEnumerable<T> source2)
{
    if (source1 == null)
    {
        throw new ArgumentNullException("source1");
    }
    if (source2 == null)
    {
        throw new ArgumentNullException("source2");
    }
    return ConcatImpl(source1, source2);
}
private static IEnumerable<T> ConcatImpl<T>(IEnumerable<T> source1,
    IEnumerable<T> source2)
{
    foreach (T item in source1)
    {
        yield return item;
    }
    foreach (T item in source2)
    {
        yield return item;
    }
}
I've split this into two methods so that the argument validation can be performed eagerly but I can still use an iterator block. (No code within an iterator block is executed until the first call to MoveNext() on the result.)
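To make the difference between eager and deferred validation concrete, here is a minimal sketch. The names `ConcatEager` and `ConcatLazyCheck` are hypothetical (chosen so they do not clash with `Enumerable.Concat`); this is an illustration of the two-method pattern described above, not the framework's actual source.

```csharp
using System;
using System.Collections.Generic;

public static class ConcatSketch
{
    // Hypothetical name. Validation runs at call time because this
    // method is NOT an iterator block (it contains no yield).
    public static IEnumerable<T> ConcatEager<T>(this IEnumerable<T> first, IEnumerable<T> second)
    {
        if (first == null) throw new ArgumentNullException(nameof(first));
        if (second == null) throw new ArgumentNullException(nameof(second));
        return ConcatImpl(first, second);
    }

    private static IEnumerable<T> ConcatImpl<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        foreach (T item in first) yield return item;
        foreach (T item in second) yield return item;
    }

    // Hypothetical name. The checks live inside the iterator block,
    // so they only run on the first MoveNext().
    public static IEnumerable<T> ConcatLazyCheck<T>(this IEnumerable<T> first, IEnumerable<T> second)
    {
        if (first == null) throw new ArgumentNullException(nameof(first));
        if (second == null) throw new ArgumentNullException(nameof(second));
        foreach (T item in first) yield return item;
        foreach (T item in second) yield return item;
    }
}

public static class Program
{
    public static void Main()
    {
        int[] numbers = { 1, 2 };

        // No exception here: the iterator body has not started running.
        IEnumerable<int> lazy = numbers.ConcatLazyCheck(null);

        try
        {
            numbers.ConcatEager(null); // throws at call time
        }
        catch (ArgumentNullException ex)
        {
            Console.WriteLine("Eager check failed at call time: " + ex.ParamName);
        }

        try
        {
            foreach (int n in lazy) { } // the check finally runs here
        }
        catch (ArgumentNullException ex)
        {
            Console.WriteLine("Lazy check failed on enumeration: " + ex.ParamName);
        }
    }
}
```

With the single-method version, the bad argument is only reported wherever the sequence happens to be enumerated, which can be far from the call that caused it.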

It enumerates each collection in turn and yields each element. Something like this:
public static IEnumerable<T> Concat<T>(this IEnumerable<T> source, IEnumerable<T> other)
{
foreach(var item in source) yield return item;
foreach(var item in other) yield return item;
}
(If you look at the actual implementation using Reflector, you will see that the iterator is actually implemented in a separate method.)

It depends on the LINQ provider you are using. LINQ to SQL or LINQ to Entities (L2E) might translate it into a database UNION ALL, whereas LINQ to Objects just enumerates both collections for you in turn.

Related

Is there really multiple iterations of an IEnumerable in this code?

public static IEnumerable<T> ForEach<T>(this IEnumerable<T> source, Action<T> action)
{
foreach (var element in source) action(element);
return source;
}
The code above gives me a warning that source is potentially iterated multiple times, but is it really? It's iterated once in the foreach statement, but how does the return statement make it iterated again?
The return doesn't iterate the enumerable again. But you are returning it, and since the only way you could possibly do ANYTHING with this result is to iterate it elsewhere, this warning is given.
Why return it if you are not going to iterate it later?
In the code you show there is only one iteration. However, since the same enumerable is returned, it is quite possible that you enumerate it again somewhere else. That is what the warning is telling you.
My personal preference would be to change the return type of the ForEach to void, and add a second one that returns the result of a Func<T, R>.
So:
public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
{
foreach (var element in source)
action(element);
}
And:
public static IEnumerable<R> ForEach<T, R>(this IEnumerable<T> source, Func<T, R> action)
{
foreach (var element in source)
yield return action(element);
}
That way you are never accidentally reusing the same enumerable and you are making it lazy: it won't execute before you actually call the enumerable.
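A small sketch of what "lazy" means for the `Func<T, R>` overload above. The counter and the `Program` class are assumptions added purely for illustration. Note that this overload behaves essentially like `Enumerable.Select`.

```csharp
using System;
using System.Collections.Generic;

public static class ForEachSketch
{
    // Same shape as the lazy overload above: an iterator block, so the
    // body runs only while the result is being enumerated.
    public static IEnumerable<R> ForEach<T, R>(this IEnumerable<T> source, Func<T, R> action)
    {
        foreach (T element in source)
            yield return action(element);
    }
}

public static class Program
{
    public static void Main()
    {
        int calls = 0;
        IEnumerable<int> squares = new[] { 1, 2, 3 }.ForEach(x => { calls++; return x * x; });

        Console.WriteLine(calls); // 0: nothing has executed yet

        foreach (int s in squares) { }
        Console.WriteLine(calls); // 3: the action ran once per element

        foreach (int s in squares) { }
        Console.WriteLine(calls); // 6: a second enumeration runs the action again
    }
}
```

The last line is the situation the "possible multiple enumeration" warning is about: each pass over a deferred sequence repeats the work and any side effects.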

How can I add an IEnumerable<T> to an existing ICollection<T>

Given an existing ICollection<T> instance (e.g. dest) what is the most efficient and readable way to add items from an IEnumerable<T>?
In my use case, I have some kind of utility method Collect(IEnumerable items) which returns a new ICollection with the elements from items, so I am doing it in the following way:
public static ICollection<T> Collect<T>(IEnumerable<T> items) where T:ICollection<T>
{
...
ICollection<T> dest = Activator.CreateInstance<T>();
items.Aggregate(dest, (acc, item) => { acc.Add(item); return acc; });
...
return dest;
}
Question: Is there any “better” way (more efficient or readable) of doing it?
UPDATE: I think the use of Aggregate() is quite fluent and not so inefficient as invoking ToList().ForEach(). But it does not look very readable. Since nobody else agrees with the use of Aggregate() I would like to read your reasons to NOT use Aggregate() for this purpose.
Just use Enumerable.Concat:
IEnumerable<YourType> result = dest.Concat(items);
If you want a List<T> as result use ToList:
List<YourType> result = dest.Concat(items).ToList();
// perhaps:
dest = result;
If dest is actually already a list and you want to modify it use AddRange:
dest.AddRange(items);
Update:
if you have to add items to a ICollection<T> method argument you could use this extension:
public static void AddRange<T>(this ICollection<T> collection, IEnumerable<T> seq)
{
List<T> list = collection as List<T>;
if (list != null)
list.AddRange(seq);
else
{
foreach (T item in seq)
collection.Add(item);
}
}
// ...
public static void Foo<T>(ICollection<T> dest)
{
IEnumerable<T> items = ...
dest.AddRange(items);
}
Personally I'd go with @ckruczek's comment of a foreach loop:
foreach (var item in items)
dest.Add(item);
Simple, clean, and pretty much everybody immediately understands what it does.
If you do insist on some method call hiding the loop, then some people define a custom ForEach extension method for IEnumerable<T>, similar to what's defined for List<T>. The implementation is trivial:
public static void ForEach<T>(this IEnumerable<T> source, Action<T> action) {
    if (source == null) throw new ArgumentNullException(nameof(source));
    if (action == null) throw new ArgumentNullException(nameof(action));
    foreach (T item in source)
        action(item);
}
Given that, you would be able to write
items.ForEach(dest.Add);
I don't see much benefit in it myself, but no drawbacks either.
We actually wrote an extension method for this (along with a bunch of other ICollection extension methods):
public static class CollectionExt
{
public static void AddRange<T>(this ICollection<T> collection, IEnumerable<T> source)
{
Contract.Requires(collection != null);
Contract.Requires(source != null);
foreach (T item in source)
{
collection.Add(item);
}
}
}
So we can just use AddRange() on an ICollection<T>:
ICollection<int> test = new List<int>();
test.AddRange(new [] {1, 2, 3});
Note: If you wanted to use List<T>.AddRange() if the underlying collection was of type List<T> you could implement the extension method like so:
public static void AddRange<T>(this ICollection<T> collection, IEnumerable<T> source)
{
var asList = collection as List<T>;
if (asList != null)
{
asList.AddRange(source);
}
else
{
foreach (T item in source)
{
collection.Add(item);
}
}
}
Most efficient:
foreach (T item in items) dest.Add(item);
Most readable (BUT inefficient because it is creating a throwaway list):
items.ToList().ForEach(dest.Add);
Less readable, but not as inefficient:
items.Aggregate(dest, (acc, item) => { acc.Add(item); return acc; });
items.ToList().ForEach(dest.Add);
If you don't want to create a new collection instance, then create an extension method.
public static class Extension
{
public static void AddRange<T>(this ICollection<T> source, IEnumerable<T> items)
{
if (items == null)
{
return;
}
foreach (T item in items)
{
source.Add(item);
}
}
}
Then you can edit your code like this:
ICollection<T> dest = ...;
IEnumerable<T> items = ...;
dest.AddRange(items);
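As a runnable sketch of the extension-method approach from the answers above, the following combines the `List<T>` fast path with the plain loop. The class name `CollectionSketch` is hypothetical, and throwing on null arguments is one possible design choice (one answer above silently returns instead).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CollectionSketch
{
    public static void AddRange<T>(this ICollection<T> collection, IEnumerable<T> items)
    {
        if (collection == null) throw new ArgumentNullException(nameof(collection));
        if (items == null) throw new ArgumentNullException(nameof(items));

        // Use the built-in bulk insert when the target really is a List<T>.
        if (collection is List<T> list)
        {
            list.AddRange(items);
            return;
        }

        foreach (T item in items)
            collection.Add(item);
    }
}

public static class Program
{
    public static void Main()
    {
        // Works for any ICollection<T>, for example a HashSet<T>.
        ICollection<int> set = new HashSet<int> { 1, 2 };
        set.AddRange(Enumerable.Range(2, 3)); // adds 2, 3, 4
        Console.WriteLine(set.Count); // 4, because the duplicate 2 is ignored

        ICollection<int> numbers = new List<int> { 1 };
        numbers.AddRange(new[] { 2, 3 });
        Console.WriteLine(string.Join(", ", numbers)); // 1, 2, 3
    }
}
```

When the variable is statically typed as `List<T>`, the compiler binds to the instance method `List<T>.AddRange` rather than the extension, so the two do not conflict.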

Does LINQ Any() cast all items in a collection?

I know that when LINQ's Any() extension is used to determine whether an enumerable has at least one element, it will only consume a single element. But how does that actually work? Does it have to cast all items in the enumerable first, or does it just cast them one at a time, starting with the first and stopping there?
Any() works on an IEnumerable<T>, so no cast is required. Its implementation is very simple: it iterates through the enumerable and checks whether it can find any element (or, with a predicate, any element matching the specified criteria).
A simple implementation looks like:
public static bool Any<T>(this IEnumerable<T> list)
{
using (var enumerator = list.GetEnumerator())
{
return enumerator.MoveNext();
}
}
So no casting is required.
Code in the public static class Enumerable:
public static bool Any<TSource>(this IEnumerable<TSource> source) {
if(source==null) {
throw Error.ArgumentNull("source");
}
using(IEnumerator<TSource> enumerator=source.GetEnumerator()) {
if(enumerator.MoveNext()) {
return true;
}
}
return false;
}
public static bool Any<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) {
if(source==null) {
throw Error.ArgumentNull("source");
}
if(predicate==null) {
throw Error.ArgumentNull("predicate");
}
foreach(TSource local in source) {
if(predicate(local)) {
return true;
}
}
return false;
}
There is no casting anywhere in it, only generics.
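A quick way to check the "only consumes what it needs" behaviour yourself is to count how many items a source actually produces. The `Numbers` iterator and the `Produced` counter below are assumptions introduced for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Program
{
    // Counts how many elements the iterator has handed out so far.
    public static int Produced;

    public static IEnumerable<int> Numbers()
    {
        for (int i = 1; i <= 1000; i++)
        {
            Produced++;
            yield return i;
        }
    }

    public static void Main()
    {
        Produced = 0;
        bool any = Numbers().Any();
        // Only one element was pulled before Any() returned.
        Console.WriteLine(any + ", produced " + Produced); // True, produced 1

        Produced = 0;
        bool anyOverThree = Numbers().Any(n => n > 3);
        // The predicate version stops at the first match (the value 4).
        Console.WriteLine(anyOverThree + ", produced " + Produced); // True, produced 4
    }
}
```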

Is it possible to determine if an IEnumerable<T> has deferred execution pending?

I have a function that accepts an IEnumerable<T>. I need to ensure that the enumerable is evaluated, but I'd rather not create a copy of it (e.g. via ToList() or ToArray()) if it is already a List or some other "frozen" collection. By "frozen" I mean collections where the set of items is already established, e.g. List, Array, FSharpSet, Collection etc., as opposed to LINQ constructs like Select() and Where().
Is it possible to create a function "ForceEvaluation" that can determine whether the enumerable has deferred execution pending, and then evaluate the enumerable?
public void Process(IEnumerable<Foo> foos)
{
    IEnumerable<Foo> evaluatedFoos = ForceEvaluation(foos);
    EnterLockedMode(); // all the deferred processing needs to have been done before this line.
    foreach (Foo foo in evaluatedFoos)
    {
        Bar(foo);
    }
}
public IEnumerable<Foo> ForceEvaluation(IEnumerable<Foo> foos)
{
    if (??????)
    {
        return foos;
    }
    else
    {
        return foos.ToList();
    }
}
After some more research I've realized that this is pretty much impossible in any practical sense, and would require complex code inspection of each iterator.
So I'm going to go with a variant of Mark's answer: create a white-list of known safe types and just call ToList() on anything that is not on the white-list.
Thank you all for your help.
Edit*
After even more reflection, I've realized that this is equivalent to the halting problem. So very impossible.
Something that worked for me:
IEnumerable<T> deferred = someArray.Where(somecondition);
if (deferred.GetType().UnderlyingSystemType.Namespace.Equals("System.Linq"))
{
    // this is a deferred-execution IEnumerable
}
You could try a hopeful check against IList<T> or ICollection<T>, but note that these can still be implemented lazily - but it is much rarer, and LINQ doesn't do that - it just uses iterators (not lazy collections). So:
var list = foos as IList<Foo>;
if(list != null) return list; // unchanged
return foos.ToList();
Note that this is different to the regular .ToList(), which gives you back a different list each time, to ensure nothing unexpected happens.
Most concrete collection types (including T[] and List<T>) satisfy IList<T>. I'm not familiar with the F# collections - you'd need to check that.
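The `IList<T>` check above can be wrapped into the `ForceEvaluation` helper the question asks for. This is only a sketch under the assumption stated in the answer (that anything implementing `IList<T>` is already materialized); the `Deferred` iterator and the `Evaluations` counter are hypothetical and exist only to make the behaviour observable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EvaluationSketch
{
    // Counts how many elements the deferred source has produced.
    public static int Evaluations;

    // Returns the input unchanged if it is already an IList<T>;
    // otherwise materializes it into a new List<T>.
    public static IList<T> ForceEvaluation<T>(IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (source is IList<T> list) return list; // assumed to be "frozen"
        return source.ToList();
    }

    // A deferred sequence: nothing runs until it is enumerated.
    public static IEnumerable<int> Deferred()
    {
        for (int i = 0; i < 3; i++)
        {
            Evaluations++;
            yield return i;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        int[] array = { 1, 2, 3 };
        // Arrays implement IList<T>, so no copy is made.
        Console.WriteLine(ReferenceEquals(array, EvaluationSketch.ForceEvaluation(array))); // True

        IList<int> forced = EvaluationSketch.ForceEvaluation(EvaluationSketch.Deferred());
        // The iterator has been fully run by the time ForceEvaluation returns.
        Console.WriteLine(EvaluationSketch.Evaluations); // 3
        Console.WriteLine(forced.Count); // 3
    }
}
```

This does not protect against the caller mutating the original list or array afterwards; if that matters, copying is still the safer option.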
I would avoid it if you want to make sure the collection is "frozen". Both array elements and List<T> contents can be changed at any time (hence the infamous "collection was modified during iteration" exception). If you really need to make sure the IEnumerable is evaluated AND not changing underneath your code, then copy all items into your own List/Array.
There could be other reasons to try it, e.g. some operations inside the runtime do special checks for a collection being an array in order to optimize. Others have special versions for specialized interfaces like ICollection or IQueryable in addition to the generic IEnumerable.
EDIT: Example of collection changing during iteration:
IEnumerable<T> collectionAsEnumerable = collection;
foreach(var i in collectionAsEnumerable)
{
// something like following can be indirectly called by
// synchronous method on the same thread
collection.Add(i.Clone());
collection[3] = 33;
}
If it is possible to use a wrapper in your case, you could do something like this
public class ForceableEnumerable<T> : IEnumerable<T>
{
IEnumerable<T> _enumerable;
IEnumerator<T> _enumerator;
public ForceableEnumerable(IEnumerable<T> enumerable)
{
_enumerable = enumerable;
}
public void ForceEvaluation()
{
if (_enumerator != null) {
while (_enumerator.MoveNext()) {
}
}
}
#region IEnumerable<T> Members
public IEnumerator<T> GetEnumerator()
{
_enumerator = _enumerable.GetEnumerator();
return _enumerator;
}
#endregion
#region IEnumerable Members
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
#endregion
}
Or implement the force method like this if you want to evaluate in any case
public void ForceEvaluation()
{
if (_enumerator == null) {
_enumerator = _enumerable.GetEnumerator();
}
while (_enumerator.MoveNext()) {
}
}
EDIT:
If you want to ensure that the enumeration is evaluated only once in any case, you could change GetEnumerator to
public IEnumerator<T> GetEnumerator()
{
if (_enumerator == null) {
_enumerator = _enumerable.GetEnumerator();
}
return _enumerator;
}

Why is there a different runtime behaviour with deferred execution using the "yield" keyword in c#?

If you call the IgnoreNullItems extension method in the sample code below, deferred execution works as expected. However, when using IgnoreNullItemsHavingDifferentBehaviour, the exception is raised immediately. Why?
List<string> testList = null;
testList.IgnoreNullItems(); //nothing happens as expected
testList.IgnoreNullItems().FirstOrDefault();
//raises ArgumentNullException as expected
testList.IgnoreNullItemsHavingDifferentBehaviour();
//raises ArgumentNullException immediately. not expected behaviour ->
// why is deferred execution not working here?
Thanks for sharing your ideas!
Raffael Zaghet
public static class EnumerableOfTExtension
{
public static IEnumerable<T> IgnoreNullItems<T>(this IEnumerable<T> source)
where T: class
{
if (source == null) throw new ArgumentNullException("source");
foreach (var item in source)
{
if (item != null)
{
yield return item;
}
}
yield break;
}
public static IEnumerable<T> IgnoreNullItemsHavingDifferentBehaviour<T>(
this IEnumerable<T> source)
where T : class
{
if (source == null) throw new ArgumentNullException("source");
return IgnoreNulls(source);
}
private static IEnumerable<T> IgnoreNulls<T>(IEnumerable<T> source)
where T : class
{
foreach (var item in source)
{
if (item != null)
{
yield return item;
}
}
yield break;
}
}
Here is a version that shows the same behaviour. Don't let ReSharper "improve" your foreach statement in this case ;) --> ReSharper changes the foreach to the "IgnoreNullItemsHavingDifferentBehaviour" version with a return statement.
public static IEnumerable<T> IgnoreNullItemsHavingSameBehaviour<T>(this IEnumerable<T> source) where T : class
{
if (source == null) throw new ArgumentNullException("source");
foreach (var item in IgnoreNulls(source))
{
yield return item;
}
yield break;
}
The exception is raised immediately because IgnoreNullItemsHavingDifferentBehaviour doesn't contain any "yield" itself.
Rather, it's IgnoreNulls which gets converted into an iterator block and thus uses deferred execution.
This is actually the approach that Jon Skeet used in his EduLinq series to force immediate null checks for source sequences. See this post for a more detailed explanation (specifically the "Let's implement it" section).
I haven't tested, but I can guess...
With the IgnoreNullItems method, the whole method is deferred until you begin the enumeration. With your alternate method, only the execution of IgnoreNulls is deferred - the null check in IgnoreNullItemsHavingDifferentBehaviour happens immediately.
Deferred execution comes from how yield return works.
The compiler turns a method containing yield return into a state machine, which does not run any of the method's code until you try to enumerate the first item.
When a method contains no yield return, it behaves like a normal method.
This is perfectly explained and shown in Jon Skeet's Edulinq.
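The state-machine behaviour can be observed by stepping through an iterator by hand. The `Numbers` method and the `Log` list below are assumptions made up for this sketch.

```csharp
using System;
using System.Collections.Generic;

public static class Program
{
    // Records which parts of the iterator body have actually run.
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<int> Numbers()
    {
        Log.Add("start");
        yield return 1;
        Log.Add("between");
        yield return 2;
        Log.Add("end");
    }

    public static void Main()
    {
        IEnumerable<int> sequence = Numbers();
        Console.WriteLine(Log.Count); // 0: calling the method runs none of its body

        using (IEnumerator<int> e = sequence.GetEnumerator())
        {
            Console.WriteLine(Log.Count); // 0: still nothing has run

            e.MoveNext(); // runs up to the first yield return
            Console.WriteLine(string.Join(",", Log)); // start

            e.MoveNext(); // resumes after the first yield, stops at the second
            Console.WriteLine(string.Join(",", Log)); // start,between

            e.MoveNext(); // runs to the end of the method and returns false
        }

        Console.WriteLine(string.Join(",", Log)); // start,between,end
    }
}
```

Any argument check placed at the top of `Numbers` would sit in the "start" segment, which is why it would not fire until the first `MoveNext()`.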
