Imagine I have the following:
private IEnumerable MyFunc(parameter a)
{
using(MyDataContext dc = new MyDataContext)
{
return dc.tablename.Select(row => row.parameter == a);
}
}
private void UsingFunc()
{
var result = MyFunc(new a());
foreach(var row in result)
{
//Do something
}
}
According to the documentation the linq execution will defer till I actual enumerate the result, which occurs in the line at the foreach. However the using statement should force the object to be collected reliably at the end of the call to MyFunct().
What actually happens, when will the disposer run and/or the result run?
Only thing I can think of is the deferred execution is computed at compile time, so the actual call is moved by the compiler to the first line of the foreach, causing the using to perform correctly, but not run until the foreach line?
Is there a guru out there who can help?
EDIT: NOTE: This code does work, I just don't understand how.
I did some reading and I realised in my code that I had called the ToList() extension method which of course enumerates the result. The ticked answer's behaviour is perfectly correct for the actual question answered.
Sorry for any confusion.
I would expect that to simply not work; the Select is deferred, so no data has been consumed at this point. However, since you have disposed the data-context (before leaving MyFunc), it will never be able to get data. A better option is to pass the data-context into the method, so that the consumer can choose the lifetime. Also, I would recommend returning IQueryable<T> so that the consumer can "compose" the result (i.e. add OrderBy / Skip / Take / Where etc, and have it impact the final query):
// this could also be an instance method on the data-context
internal static IQueryable<SomeType> MyFunc(
this MyDataContext dc, parameter a)
{
return dc.tablename.Where(row => row.parameter == a);
}
private void UsingFunc()
{
using(MyDataContext dc = new MyDataContext()) {
var result = dc.MyFunc(new a());
foreach(var row in result)
{
//Do something
}
}
}
Update: if you (comments) don't want to defer execution (i.e. you don't want the caller dealing with the data-context), then you need to evaluate the results. You can do this by calling .ToList() or .ToArray() on the result to buffer the values.
private IEnumerable<SomeType> MyFunc(parameter a)
{
using(MyDataContext dc = new MyDataContext)
{
// or ToList() etc
return dc.tablename.Where(row => row.parameter == a).ToArray();
}
}
If you want to keep it deferred in this case, then you need to use an "iterator block":
private IEnumerable<SomeType> MyFunc(parameter a)
{
using(MyDataContext dc = new MyDataContext)
{
foreach(SomeType row in dc
.tablename.Where(row => row.parameter == a))
{
yield return row;
}
}
}
This is now deferred without passing the data-context around.
I just posted another deferred-execution solution to this problem here, including this sample code:
IQueryable<MyType> MyFunc(string myValue)
{
return from dc in new MyDataContext().Use()
from row in dc.MyTable
where row.MyField == myValue
select row;
}
void UsingFunc()
{
var result = MyFunc("MyValue").OrderBy(row => row.SortOrder);
foreach(var row in result)
{
//Do something
}
}
The Use() extension method essentially acts like a deferred using block.
Related
Just wondering why a Select call won't execute if it's called inside of an extended method?
Or is it maybe that I'm thinking Select does one thing, while it's purpose is for something different?
Code Example:
var someList = new List<SomeObject>();
int triggerOn = 5;
/* list gets populated*/
someList.MutateList(triggerOn, "Add something", true);
MutateList method declaration:
public static class ListExtension
{
public static IEnumerable<SomeObject> MutateList(this IEnumerable<SomeObject> objects, int triggerOn, string attachment, bool shouldSkip = false)
{
return objects.Select(obj =>
{
if (obj.ID == triggerOn)
{
if (shouldSkip) shouldSkip = false;
else obj.Name += $" {attachment}";
}
return obj;
});
}
}
The solution without Select works. I'm just doing a foreach instead.
I know that the Select method has a summary saying: "Projects each element of a sequence into a new form." But if that were true, then wouldn't my code example be showing errors?
Solution that I used (Inside of the MutateList method):
foreach(SomeObject obj in objects)
{
if (obj.ID == triggerOn)
{
if (shouldSkip) shouldSkip = false;
else obj.Name += $" {attachment}";
}
});
return objects;
Select uses deferred execution, meaning that it does not actually execute until you try to iterate over the results, with a ForEach, or using Linq methods that require the actual results like ToList or Sum.
Also, it returns an iterator, it does not run on the items in-place, but you're not capturing the return value in your calling code.
For those reasons - I would recommend not using Select to mutate the object in the list. You're just wrapping a ForEach call in a less clean way. I would just use ForEach within the method.
If I am not wrong, the ToList() method iterate on each element of provided collection and add them to new instance of List and return this instance.Suppose an example
//using linq
list = Students.Where(s => s.Name == "ABC").ToList();
//traditional way
foreach (var student in Students)
{
if (student.Name == "ABC")
list.Add(student);
}
I think the traditional way is faster, as it loops only once, where as of above of Linq iterates twice once for Where method and then for ToList() method.
The project I am working on now has extensive use of Lists all over and I see there is alot of such kind of use of ToList() and other Methods that can be made better like above if I take list variable as IEnumerable and remove .ToList() and use it further as IEnumerable.
Do these things make any impact on performance?
Do these things make any impact on performance?
That depends on your code. Most of the time, using LINQ does cause a small performance hit. In some cases, this hit can be significant for you, but you should avoid LINQ only when you know that it is too slow for you (i.e. if profiling your code showed that LINQ is reason why your code is slow).
But you're right that using ToList() too often can cause significant performance problems. You should call ToList() only when you have to. Be aware that there are also cases where adding ToList() can improve performance a lot (e.g. when the collection is loaded from database every time it's iterated).
Regarding the number of iterations: it depends on what exactly do you mean by “iterates twice”. If you count the number of times MoveNext() is called on some collection, then yes, using Where() this way leads to iterating twice. The sequence of operations goes like this (to simplify, I'm going to assume that all items match the condition):
Where() is called, no iteration for now, Where() returns a special enumerable.
ToList() is called, calling MoveNext() on the enumerable returned from Where().
Where() now calls MoveNext() on the original collection and gets the value.
Where() calls your predicate, which returns true.
MoveNext() called from ToList() returns, ToList() gets the value and adds it to the list.
…
What this means is that if all n items in the original collection match the condition, MoveNext() will be called 2n times, n times from Where() and n times from ToList().
var list = Students.Where(s=>s.Name == "ABC");
This will only create a query and not loop the elements until the query is used. By calling ToList() will first then execute the query and thus only loop your elements once.
List<Student> studentList = new List<Student>();
var list = Students.Where(s=>s.Name == "ABC");
foreach(Student s in list)
{
studentList.add(s);
}
this example will also only iterate once. Because its only used once. Keep in mind that list will iterate all students everytime its called.. Not only just those whose names are ABC. Since its a query.
And for the later discussion Ive made a testexample. Perhaps its not the very best implementation of IEnumable but it does what its supposed to do.
First we have our list
public class TestList<T> : IEnumerable<T>
{
private TestEnumerator<T> _Enumerator;
public TestList()
{
_Enumerator = new TestEnumerator<T>();
}
public IEnumerator<T> GetEnumerator()
{
return _Enumerator;
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
throw new NotImplementedException();
}
internal void Add(T p)
{
_Enumerator.Add(p);
}
}
And since we want to count how many times MoveNext is called we have to implement our custom enumerator aswel. Observe in MoveNext we have a counter that is static in our program.
public class TestEnumerator : IEnumerator
{
public Item FirstItem = null;
public Item CurrentItem = null;
public TestEnumerator()
{
}
public T Current
{
get { return CurrentItem.Value; }
}
public void Dispose()
{
}
object System.Collections.IEnumerator.Current
{
get { throw new NotImplementedException(); }
}
public bool MoveNext()
{
Program.Counter++;
if (CurrentItem == null)
{
CurrentItem = FirstItem;
return true;
}
if (CurrentItem != null && CurrentItem.NextItem != null)
{
CurrentItem = CurrentItem.NextItem;
return true;
}
return false;
}
public void Reset()
{
CurrentItem = null;
}
internal void Add(T p)
{
if (FirstItem == null)
{
FirstItem = new Item<T>(p);
return;
}
Item<T> lastItem = FirstItem;
while (lastItem.NextItem != null)
{
lastItem = lastItem.NextItem;
}
lastItem.NextItem = new Item<T>(p);
}
}
And then we have a custom item that just wraps our value
public class Item<T>
{
public Item(T item)
{
Value = item;
}
public T Value;
public Item<T> NextItem;
}
To use the actual code we create a "list" with 3 entries.
public static int Counter = 0;
static void Main(string[] args)
{
TestList<int> list = new TestList<int>();
list.Add(1);
list.Add(2);
list.Add(3);
var v = list.Where(c => c == 2).ToList(); //will use movenext 4 times
var v = list.Where(c => true).ToList(); //will also use movenext 4 times
List<int> tmpList = new List<int>(); //And the loop in OP question
foreach(var i in list)
{
tmpList.Add(i);
} //Also 4 times.
}
And conclusion? How does it hit performance?
The MoveNext is called n+1 times in this case. Regardless of how many items we have.
And also the WhereClause does not matter, he will still run MoveNext 4 times. Because we always run our query on our initial list.
The only performance hit we will take is the actual LINQ framework and its calls. The actual loops made will be the same.
And before anyone asks why its N+1 times and not N times. Its because he returns false the last time when he is out of elements. Making it the number of elements + end of list.
To answer this completely, it depends on the implementation. If you are talking about LINQ to SQL/EF, there will be only one iteration in this case when .ToList is called, which internally calls .GetEnumerator. The query expression is then parsed into TSQL and passed to the database. The resulting rows are then iterated over (once) and added to the list.
In the case of LINQ to Objects, there is only one pass through the data as well. The use of yield return in the where clause sets up a state machine internally which keeps track of where the process is in the iteration. Where does NOT do a full iteration creating a temporary list and then passing those results to the rest of the query. It just determines if an item meets a criteria and only passes on those that match.
First of all, Why are you even asking me? Measure for yourself and see.
That said, Where, Select, OrderBy and the other LINQ IEnumerable extension methods, in general, are implemented as lazy as possible (the yield keyword is used often). That means that they do not work on the data unless they have to. From your example:
var list = Students.Where(s => s.Name == "ABC");
won't execute anything. This will return momentarily even if Students is a list of 10 million objects. The predicate won't be called at all until the result is actually requested somewhere, and that is practically what ToList() does: It says "Yes, the results - all of them - are required immediately".
There is however, some initial overhead in calling of the LINQ methods, so the traditional way will, in general, be faster, but composability and the ease-of-use of the LINQ methods, IMHO, more than compensate for that.
If you like to take a look at how these methods are implemented, they are available for reference from Microsoft Reference Sources.
I have a nested while loop inside a foreach loop where I would like to advance the enumerator indefinitately while a certain condition is met. To do this I try casting the enumerator to IEnumerator< T > (which it must be if it is in a foreach loop) then calling MoveNext() on the casted object but it gives me an error saying I cannot convert it.
Cannot convert type 'System.DateTime' to System.Collections.Generic.IEnumerator via a reference conversion, boxing conversion, unboxing conversion, wrapping conversion, or null type conversion.
foreach (DateTime time in times)
{
while (condition)
{
// perform action
// move to next item
(time as IEnumerator<DateTime>).MoveNext(); // will not let me do this
}
// code to execute after while condition is met
}
What is the best way to manually increment the IEnumerator inside of the foreach loop?
EDIT:
Edited to show there is code after the while loop that I would like executed once the condition is met which is why I wanted to manually increment inside the while then break out of it as opposed to continue which would put me back at the top. If this isn't possible I believe the best thing is to redesign how I am doing it.
Many of the other answers recommend using continue, which may very well help you do what you need to do. However, in the interests of showing manually moving the enumerator, first you must have the enumerator, and that means writing your loop as a while.
using (var enumerator = times.GetEnumerator())
{
DateTime time;
while (enumerator.MoveNext())
{
time = enumerator.Current;
// pre-condition code
while (condition)
{
if (enumerator.MoveNext())
{
time = enumerator.Current;
// condition code
}
else
{
condition = false;
}
}
// post-condition code
}
}
From your comments:
How can the foreach loop advance it if it doesn't implement the IEnumerator interface?
In your loop, time is a DateTime. It is not the object that needs to implement an interface or pattern to work in the loop. times is a sequence of DateTime values, it is the one that must implement the enumerable pattern. This is generally fulfilled by implementing the IEnumerable<T> and IEnumerable interfaces, which simply require T GetEnumerator() and object GetEnumerator() methods. The methods return an object implementing IEnumerator<T> and IEnumerator, which define a bool MoveNext() method and a T or object Current property. But time cannot be cast to IEnumerator, because it is no such thing, and neither is the times sequence.
You cannot modify the enumerator from inside the for loop. The language does not permit this. You need to use the continue statement in order to advance to the next iteration of a loop.
However, I'm not convinced that your loop even needs a continue. Read on.
In the context of your code you would need to convert the while to an if in order to make the continue refer to the foreach block.
foreach (DateTime time in times)
{
if (condition)
{
// perform action
continue;
}
// code to execute if condition is not met
}
But written like this it is clear that the following equivalent variant is simpler still
foreach (DateTime time in times)
{
if (condition)
{
// perform action
}
else
{
// code to execute if condition is not met
}
}
This is equivalent to your pseudo-code because the part marked code to execute after while condition is met is executed for each item for which condition is false.
My assumption in all of this is that condition is evaluated for each item in the list.
Perhaps you can use continue?
You would use the continue statement:
continue;
This is just a guess, but it sounds like what you're trying to do is take a list of datetimes and move past all of them which meet a certain criteria, then perform an action on the rest of the list. If that's what you're trying to do, you probably want something like SkipWhile() from System.Linq. For example, the following code takes a series of datetimes and skips past all of them which are before the cutoff date; then it prints out the remaining datetimes:
var times = new List<DateTime>()
{
DateTime.Now.AddDays(1), DateTime.Now.AddDays(2), DateTime.Now.AddDays(3), DateTime.Now.AddDays(4)
};
var cutoff = DateTime.Now.AddDays(2);
var timesAfterCutoff = times.SkipWhile(datetime => datetime.CompareTo(cutoff) < 1)
.Select(datetime => datetime);
foreach (var dateTime in timesAfterCutoff)
{
Console.WriteLine(dateTime);
}
Console.ReadLine();
Is that the sort of thing you're trying to do?
I definitely do not condone what I am about to suggest, but you can create a wrapper around the original IEnumerable to transform it into something that returns items which can be used to navigate the underlying the enumerator. The end result might look like the following.
public static void Main(string[] args)
{
IEnumerable<DateTime> times = GetTimes();
foreach (var step in times.StepWise())
{
while (condition)
{
step.MoveNext();
}
Console.WriteLine(step.Current);
}
}
Then we need to create our StepWise extension method.
public static class EnumerableExtension
{
public static IEnumerable<Step<T>> StepWise<T>(this IEnumerable<T> instance)
{
using (IEnumerator<T> enumerator = instance.GetEnumerator())
{
while (enumerator.MoveNext())
{
yield return new Step<T>(enumerator);
}
}
}
public struct Step<T>
{
private IEnumerator<T> enumerator;
public Step(IEnumerator<T> enumerator)
{
this.enumerator = enumerator;
}
public bool MoveNext()
{
return enumerator.MoveNext();
}
public T Current
{
get { return enumerator.Current; }
}
}
}
You could use a func as your iterator and keep the state that you are changing in that delegate to be evaluated each iteration.
public static IEnumerable<T> FunkyIEnumerable<T>(this Func<Tuple<bool, T>> nextOrNot)
{
while(true)
{
var result = nextOrNot();
if(result.Item1)
yield return result.Item2;
else
break;
}
yield break;
}
Func<Tuple<bool, int>> nextNumber = () =>
Tuple.Create(SomeRemoteService.CanIContinueToSendNumbers(), 1);
foreach(var justGonnaBeOne in nextNumber.FunkyIEnumerable())
Console.Writeline(justGonnaBeOne.ToString());
One alternative not yet mentioned is to have an enumerator return a wrapper object which allows access to itself in addition to the data element being enumerated. For sample:
struct ControllableEnumeratorItem<T>
{
private ControllableEnumerator parent;
public T Value {get {return parent.Value;}}
public bool MoveNext() {return parent.MoveNext();}
public ControllableEnumeratorItem(ControllableEnumerator newParent)
{parent = newParent;}
}
This approach could also be used by data structures that want to allow collections to be modified in controlled fashion during enumeration (e.g. by including "DeleteCurrentItem", "AddBeforeCurrentItem", and "AddAfterCurrentItem" methods).
What's the best practice to check if a collection has items?
Here's an example of what I have:
var terminalsToSync = TerminalAction.GetAllTerminals();
if(terminalsToSync.Any())
SyncTerminals(terminalsToSync);
else
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
The GetAllTerminals() method will execute a stored procedure and, if we return a result, (Any() is true), SyncTerminals() will loop through the elements; thus enumerating it again and executing the stored procedure for the second time.
What's the best way to avoid this?
I'd like a good solution that can be used in other cases too; possibly without converting it to List.
Thanks in advance.
I would probably use a ToArray call, and then check Length; you're going to enumerate all the results anyway so why not do it early? However, since you've said you want to avoid early realisation of the enumerable...
I'm guessing that SyncTerminals has a foreach, in which case you can write it something like this:
bool any = false;
foreach(var terminal in terminalsToSync)
{
if(!any)any = true;
//....
}
if(!any)
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
Okay, there's a redundant if after the first loop, but I'm guessing the cost of an extra few CPU cycles isn't going to matter much.
Equally, you could do the iteration the old way and use a do...while loop and GetEnumerator; taking the first iteration out of the loop; that way there are literally no wasted operations:
var enumerator = terminalsToSync.GetEnumerator();
if(enumerator.MoveNext())
{
do
{
//sync enumerator.Current
} while(enumerator.MoveNext())
}
else
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
How about this, which still defers execution, but buffers it once executed:
var terminalsToSync = TerminalAction.GetAllTerminals().Lazily();
with:
public static class LazyEnumerable {
public static IEnumerable<T> Lazily<T>(this IEnumerable<T> source) {
if (source is LazyWrapper<T>) return source;
return new LazyWrapper<T>(source);
}
class LazyWrapper<T> : IEnumerable<T> {
private IEnumerable<T> source;
private bool executed;
public LazyWrapper(IEnumerable<T> source) {
if (source == null) throw new ArgumentNullException("source");
this.source = source;
}
IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
public IEnumerator<T> GetEnumerator() {
if (!executed) {
executed = true;
source = source.ToList();
}
return source.GetEnumerator();
}
}
}
Personally i wouldnt use an any here, foreach will simply not loop through any items if the collection is empty, so i would just do it like that. However i would recommend that you check for null.
If you do want to pre-enumerate the set use .ToArray() eg will only enumerate once:
var terminalsToSync = TerminalAction.GetAllTerminals().ToArray();
if(terminalsToSync.Any())
SyncTerminals(terminalsToSync);
var terminalsToSync = TerminalAction.GetAllTerminals().ToList();
if(terminalsToSync.Any())
SyncTerminals(terminalsToSync);
else
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
.Length or .Count is faster since it doesn't need to go through the GetEnumerator()/MoveNext()/Dispose() required by Any()
Here's another way of approaching this problem:
int count = SyncTerminals(terminalsToSync);
if(count == 0) GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
where you change SyncTerminals to do:
int count = 0;
foreach(var obj in terminalsToSync) {
count++;
// some code
}
return count;
Nice and simple.
All the caching solutions here are caching all items when the first item is being retrieved. It it really lazy if you cache each single item while the items of the list are is iterated.
The difference can be seen in this example:
public class LazyListTest
{
private int _count = 0;
public void Test()
{
var numbers = Enumerable.Range(1, 40);
var numbersQuery = numbers.Select(GetElement).ToLazyList(); // Cache lazy
var total = numbersQuery.Take(3)
.Concat(numbersQuery.Take(10))
.Concat(numbersQuery.Take(3))
.Sum();
Console.WriteLine(_count);
}
private int GetElement(int value)
{
_count++;
// Some slow stuff here...
return value * 100;
}
}
If you run the Test() method, the _count is only 10. Without caching it would be 16 and with .ToList() it would be 40!
An example of the implementation of LazyList can be found here.
If you're seeing two procedure calls for the evaluation of whatever GetAllTerminals() returns, this means that the procedure's result isn't being cached. Without knowing what data-access strategy you're using, this is quite hard to fix in a general way.
The simplest solution, as you've alluded, is to copy the result of the call before you perform any other operations. If you wanted to, you could neatly wrap this behaviour up in an IEnumerable<T> which executes the inner enumerable call just once:
public class CachedEnumerable<T> : IEnumerable<T>
{
public CachedEnumerable<T>(IEnumerable<T> enumerable)
{
result = new Lazy<List<T>>(() => enumerable.ToList());
}
private Lazy<List<T>> result;
public IEnumerator<T> GetEnumerator()
{
return this.result.Value.GetEnumerator();
}
System.Collections.IEnumerable GetEnumerator()
{
return this.GetEnumerator();
}
}
Wrap the result in an instance of this type and it will not evaluate the inner enumerable multiple times.
is there a way to break out of the foreach extension method? The "break" keyword doesn't recognize the extension method as a valid scope to break from.
//Doesn't compile
Enumerable.Range(0, 10).ToList().ForEach(i => { System.Windows.MessageBox.Show(i.ToString()); if (i > 2)break; });
Edit: removed "linq" from question
note the code is just an example to show break not working in the extension method... really what I want is for the user to be able to abort processing a list.. the UI thread has an abort variable and the for loop just breaks when the user hits a cancel button. Right now, I have a normal for loop, but I wanted to see if it was possible to do with the extension method.
It's probably more accurate to call this a List<T> Foreach vs. a LINQ one.
In either case though no there is no way to break out of this loop. Primarily because it's not actually a loop per say. It's a method which takes a delegate that is called inside a loop.
Creating a ForEach with break capability is fairly straight forward though
public delegate void ForEachAction<T>(T value, ref bool doBreak);
public static void ForEach<T>(this IEnumerable<T> enumerable, ForEachAction<T> action) {
var doBreak = false;
foreach (var cur in enumerable) {
action(cur, ref doBreak);
if (doBreak) {
break;
}
}
}
You could then rewrite your code as the following
Enumerable.Range(0,10)
.ForEach((int i,ref bool doBreak) => {
System.Windows.MessageBox.Show(i.ToString());
if ( i > 2) {doBreak = true;}
});
I recommend using TakeWhile.
Enumerable.Range(0, 10).TakeWhile(i => i <= 2).ToList().ForEach(i => MessageBox.Show(i.ToString()));
Or, using Rx:
Enumerable.Range(0, 10).TakeWhile(i => i <= 2).Run(i => MessageBox.Show(i.ToString()));
Why not use Where?
Enumerable.Range(0, 10).Where(i => i <= 2).ToList().ForEach(...)
Try a "return" statement instead of "break"; it's a delegate function and not a real loop.
Try:
Enumerable.Range(0, 10).ToList().ForEach(i => { System.Windows.MessageBox.Show(i.ToString()); if (i > 2) return; });
For myself, I used Linq Select + FirstOrDefault. Transform "each" in the list but since we are asking for the First, it will stop transforming them after finding the first one that matches.
var thingOption = list.Select(thing => AsThingOption(thing))
.FirstOrDefault(option => option.HasValue) ?? Option.None<MyThing>;
As you can see I'm transforming each thing into an Option and my method AsThingOption may succeed at transforming and when it does we stop iterating over the list. If the conditions inside AsThingOption method never succeed at transforming, then end result is None.
Hope that helps!
Why not throw new Exception("Specific message") or use a boolean flag?
DataTable dt = new DataTable();
bool error = false;
try
{
dt.AsEnumerable().ToList().ForEach(dr =>
{
if (dr["Col1"].ToString() == "")
{
error = true;
throw new Exception("Specific message");
}
else
// Do something
});
}
catch (Exception ex)
{
if (ex.Message == "Specific message" || error)
// do something
else
throw ex;
}
I have a case where I need to loop actions until a condition is false and each action can modify the value used to track the condition. I came up with this variation of .TakeWhile(). The advantage here is that each item in the list can participate in some logic, then the loop can be aborted on a condition not directly related to the item.
new List<Action> {
Func1,
Func2
}.TakeWhile(action => {
action();
return _canContinue;
});