ReSharper is suggesting that I use the top example over the bottom example. However, I am under the impression that a new list of items will be created first, and thus all of the _executeFunc calls will run before RunStoredProcedure is called.
This would normally not be an issue; however, exceptions are prone to occur, and if my hypothesis is correct then my database will not be updated despite the functions having run.
foreach (var result in rows.Select(row => _executeFunc(row)))
{
    RunStoredProcedure(result);
}
Or
foreach(var row in rows)
{
var result = _executeFunc(row);
RunStoredProcedure(result);
}
The statements are, in this case, semantically the same, because Select (and LINQ in general) uses deferred execution of delegates. It won't run any declared queries until the result is materialised, and depending on how you write the query it will do so in the proper sequence.
A very simple example to show that:
var list = new List<string>{"hello", "world", "example"};
Func<string, string> func = (s) => {
Console.WriteLine(s);
return s.ToUpper();
};
foreach(var item in list.Select(i => func(i)))
{
Console.WriteLine(item);
}
results in
hello
HELLO
world
WORLD
example
EXAMPLE
In your first example, _executeFunc(row) will NOT be called first for each item in rows before your foreach loop begins. LINQ will defer execution. See this answer for more details.
The order of events will be:
Evaluate the first item in rows
Call _executeFunc(row) on that item
Call RunStoredProcedure(result)
Repeat with the next item in rows
Now, if your code were something like this:
foreach (var result in rows.Select(row => _executeFunc(row)).ToList())
{
    RunStoredProcedure(result);
}
Then it WOULD run the LINQ .Select first for every item in rows because the .ToList() causes the collection to be enumerated.
In the top example, using Select will project the rows, yielding them one by one.
So
foreach (var result in rows.Select(row => _executeFunc(row)))
is basically the same as
foreach(var row in rows)
Thus Select is doing something like this
foreach (var row in source)
{
    var result = _executeFunc(row);
    yield return result;
}
That yield is passing each row back one by one (it's a bit more complicated than that, but this explanation should suffice for now).
If you did this instead
foreach (var result in rows.Select(row => _executeFunc(row)).ToList())
Calling ToList() will return a List of rows immediately, and that means _executeFunc() will indeed be called for every row, before you've had a chance to call RunStoredProcedure().
Thus what ReSharper is suggesting is valid. To be fair, I'm sure the JetBrains devs know what they are doing :)
Select uses deferred execution. This means that it will, in order:
take an item from rows
call _executeFunc on it
call RunStoredProcedure on the result of _executeFunc
And then it will do the same for the next item, until all the list has been processed.
The execution will be deferred, meaning the two versions will have the same execution order.
Related
One of my coworkers came to me with a question about this method that results in an infinite loop. The actual code is a bit too involved to post here, but essentially the problem boils down to this:
private IEnumerable<int> GoNuts(IEnumerable<int> items)
{
items = items.Select(item => items.First(i => i == item));
return items;
}
This should (you would think) just be a very inefficient way to create a copy of a list. I called it with:
var foo = GoNuts(new[]{1,2,3,4,5,6});
The result is an infinite loop. Strange.
I think that modifying the parameter is, stylistically, a bad thing, so I changed the code slightly:
var foo = items.Select(item => items.First(i => i == item));
return foo;
That worked. That is, the program completed; no exception.
More experiments showed that this works, too:
items = items.Select(item => items.First(i => i == item)).ToList();
return items;
As does a simple
return items.Select(item => .....);
Curious.
It's clear that the problem has to do with reassigning the parameter, but only if evaluation is deferred beyond that statement. If I add the ToList() it works.
I have a general, vague, idea of what's going wrong. It looks like the Select is iterating over its own output. That's a little bit strange in itself, because typically an IEnumerable will throw if the collection it's iterating changes.
What I don't understand, because I'm not intimately familiar with the internals of how this stuff works, is why re-assigning the parameter causes this infinite loop.
Is there somebody with more knowledge of the internals who would be willing to explain why the infinite loop occurs here?
The key to answering this is deferred execution. When you do this
items = items.Select(item => items.First(i => i == item));
you do not iterate the items array passed into the method. Instead, you assign it a new IEnumerable<int>, which references itself back, and starts iterating only when the caller starts enumerating the results.
That is why all your other fixes dealt with the problem: all you needed to do was stop feeding the IEnumerable<int> back to itself:
Using var foo breaks self-reference by using a different variable,
Using return items.Select... breaks self-reference by not using intermediate variables at all,
Using ToList() breaks self-reference by avoiding deferred execution: by the time items is re-assigned, old items has been iterated over, so you end up with a plain in-memory List<int>.
But if it's feeding on itself, how does it get anything at all?
That's right, it does not get anything! The moment you try iterating items and ask it for the first item, the deferred sequence asks the sequence fed to it for the first item to process, which means that the sequence is asking itself for the first item to process. At this point, it's turtles all the way down, because in order to return the first item to process the sequence must first get the first item to process from itself.
It looks like the Select is iterating over its own output
You are correct. You are returning a query that iterates over itself.
The key is that you reference items within the lambda. The items reference is not resolved ("closed over") until the query iterates, at which point items now references the query instead of the source collection. That's where the self-reference occurs.
Picture a deck of cards with a sign in front of it labelled items. Now picture a man standing beside the deck of cards whose assignment is to iterate the collection called items. But then you move the sign from the deck to the man. When you ask the man for the first "item" - he looks for the collection marked "items" - which is now him! So he asks himself for the first item, which is where the circular reference occurs.
When you assign the result to a new variable, you then have a query that iterates over a different collection, and so does not result in an infinite loop.
When you call ToList, you hydrate the query to a new collection and also do not get an infinite loop.
Other things that would break the circular reference:
Hydrating items within the lambda by calling ToList
Assigning items to another variable and referencing that within the lambda, as sketched below.
After studying the two answers given and poking around a bit, I came up with a little program that better illustrates the problem.
private int GetFirst(IEnumerable<int> items, int foo)
{
Console.WriteLine("GetFirst {0}", foo);
var rslt = items.First(i => i == foo);
Console.WriteLine("GetFirst returns {0}", rslt);
return rslt;
}
private IEnumerable<int> GoNuts(IEnumerable<int> items)
{
items = items.Select(item =>
{
Console.WriteLine("Select item = {0}", item);
return GetFirst(items, item);
});
return items;
}
If you call that with:
var newList = GoNuts(new[]{1, 2, 3, 4, 5, 6});
You'll get this output repeatedly until you finally get StackOverflowException.
Select item = 1
GetFirst 1
Select item = 1
GetFirst 1
Select item = 1
GetFirst 1
...
What this shows is exactly what dasblinkenlight made clear in his updated answer: the query goes into an infinite loop trying to get the first item.
Let's write GoNuts a slightly different way:
private IEnumerable<int> GoNuts(IEnumerable<int> items)
{
var originalItems = items;
items = items.Select(item =>
{
Console.WriteLine("Select item = {0}", item);
return GetFirst(originalItems, item);
});
return items;
}
If you run that, it succeeds. Why? Because in this case it's clear that the call to GetFirst is passing a reference to the original items that were passed to the method. In the first case, GetFirst is passing a reference to the new items collection, which hasn't yet been realized. In turn, GetFirst says, "Hey, I need to enumerate this collection." And thus begins the first recursive call that eventually leads to StackOverflowException.
Interestingly, I was right and wrong when I said that it was consuming its own output. The Select is consuming the original input, as I would expect. The First is trying to consume the output.
Lots of lessons to be learned here. To me, the most important is "don't modify the value of input parameters."
Thanks to dasblinkenlight, D Stanley, and Lucas Trzesniewski for their help.
I have the below, which calculates the running total for a customer account statement; however, the first value is always added to itself and I'm not sure why - though I suspect I've missed something obvious:
decimal? runningTotal = 0;
IEnumerable<StatementModel> statement = sage.Repository<FDSSLTransactionHistory>()
.Queryable()
.Where(x => x.CustomerAccountNumber == sageAccount)
.OrderBy(x=>x.UniqueReferenceNumber)
.AsEnumerable()
.Select(x => new StatementModel()
{
SLAccountId = x.CustomerAccountNumber,
TransactionReference = x.TransactionReference,
SecondReference = x.SecondReference,
Currency = x.CurrencyCode,
Value = x.GoodsValueInAccountCurrency,
TransactionDate = x.TransactionDate,
TransactionType = x.TransactionType,
TransactionDescription = x.TransactionTypeName,
Status = x.Status,
RunningTotal = (runningTotal += x.GoodsValueInAccountCurrency)
});
Which outputs:
29/02/2012 00:00:00 154.80 309.60
30/04/2012 00:00:00 242.40 552.00
30/04/2012 00:00:00 242.40 794.40
30/04/2012 00:00:00 117.60 912.00
Where the 309.60 of the first row should be simply 154.80
What have I done wrong?
EDIT:
As per ahruss's comment below, I was calling Any() on the result in my View, causing the first element to be evaluated twice - to resolve this, I appended ToList() to my query.
Thanks all for your suggestions
Add a ToList() to the end of the call to avoid duplicate invocations of the selector.
This is a stateful LINQ query with side-effects, which is by nature unpredictable. Somewhere else in the code, you called something that caused the first element to be evaluated, like First() or Any(). In general, it is dangerous to have side-effects in LINQ queries, and when you find yourself needing them, it's time to think about whether or not it should just be a foreach instead.
Edit, or Why is this happening?
This is a result of how LINQ queries are evaluated: until you actually use the results of a query, nothing really happens to the collection. It doesn't evaluate any of the elements. Instead, it stores Abstract Expression Trees or just the delegates it needs to evaluate the query. Then, it evaluates those only when the results are needed, and unless you explicitly store the results, they're thrown away afterwards, and re-evaluated the next time.
So the question becomes: why does it have different results each time? The answer is that runningTotal is only initialized the first time around. After that, its value is whatever it was after the last execution of the query, which can lead to strange results.
This means the question could just as easily have been "Why is the total always twice what it should be?" if the asker were doing something like this:
Console.WriteLine(statement.Count()); // this enumerates all the elements!
foreach (var item in statement) { Console.WriteLine(item.Total); }
Because the only way to get the number of elements in the sequence is to actually evaluate all of them.
Similarly, what actually happened in this question was that somewhere there was code like this:
if (statement.Any()) // this actually involves getting the first result
{
// do something with the statement
}
// ...
foreach (var item in statement) { Console.WriteLine(item.Total); }
It seems innocuous, but if you know how LINQ and IEnumerable work, you know that .Any() is basically the same as .GetEnumerator().MoveNext(), which makes it more obvious that it requires getting the first element.
It all boils down to the fact that LINQ is based on deferred execution, which is why the solution is to use ToList, which circumvents that and forces immediate execution.
If you don't want to freeze the results with ToList, a solution to the outer scope variable problem is using an iterator function, like this:
IEnumerable<StatementModel> GetStatement(IEnumerable<DataObject> source) {
decimal runningTotal = 0;
foreach (var x in source) {
yield return new StatementModel() {
...
RunningTotal = (runningTotal += x.GoodsValueInAccountCurrency)
};
}
}
Then pass to this function the source query (not including the Select):
var statement = GetStatement(sage.Repository...AsEnumerable());
Now it is safe to enumerate statement multiple times. Basically, this creates an enumerable that re-executes this entire block on each enumeration, as opposed to executing a selector (which equates to only the foreach part) -- so runningTotal will be reset.
I have the following piece of code which uses the foreach iterator.
foreach (var item in daysOfWeeksList)
{
daysOper |= item;
}
daysOfWeeksList is a list. I want to OR each item in the list and process the result.
This daysOfWeeksList is a
List<int> daysOfWeeksList
Say I want to do something like this. The DoSomething I want to do is the OR operation.
list.ForEach( item =>
{
item.DoSomething();
} );
How would you go about this using the ForEach method available as part of the List collection? I found plenty of examples for two operands but not for a single operand.
Assuming daysOper starts as 0, I wouldn't use ForEach at all - I'd use Aggregate from LINQ:
var daysOper = daysOfWeeksList.Aggregate((current, next) => current | next);
In other words, keep a running "current" value, and keep OR-ing it with the next value each time. (The result of one iteration will be used as the "current" value for the next iteration.)
In general, you want to use the Aggregate method for stuff like this, where the standard aggregators like Sum don't fit.
(Edit: I assumed that the OP was doing the OR operation over a List<bool>, so I edited the below paragraph.)
However, if daysOfWeeksList were a List<bool>, then you would have the opportunity to optimize performance by stopping at the first instance of true. The Any method does this.
var result = daysOfWeeksList.Any(daysOpr => daysOpr);
I am transforming an Excel spreadsheet into a list of "Elements" (this is a domain term). During this transformation, I need to skip the header rows and throw out malformed rows that cannot be transformed.
Now comes the fun part. I need to capture those malformed records so that I can report on them. I constructed a crazy LINQ statement (below). These are extension methods hiding the messy LINQ operations on the types from the OpenXml library.
var elements = sheet
.Rows() <-- BEGIN sheet data transform
.SkipColumnHeaders()
.ToRowLookup()
.ToCellLookup()
.SkipEmptyRows() <-- END sheet data transform
.ToElements(strings) <-- BEGIN domain transform
.RemoveBadRecords(out discard)
.OrderByCompositeKey();
The interesting part starts at ToElements, where I transform the row lookup to my domain object list (details: it's called an ElementRow, which is later transformed into an Element). Bad records are created with just a key (the Excel row index) and are uniquely identifiable vs. a real element.
public static IEnumerable<ElementRow> ToElements(this IEnumerable<KeyValuePair<UInt32Value, Cell[]>> map)
{
return map.Select(pair =>
{
try
{
return ElementRow.FromCells(pair.Key, pair.Value);
}
catch (Exception)
{
return ElementRow.BadRecord(pair.Key);
}
});
}
Then, I want to remove those bad records (it's easier to collect all of them before filtering). That method is RemoveBadRecords, which started like this...
public static IEnumerable<ElementRow> RemoveBadRecords(this IEnumerable<ElementRow> elements)
{
return elements.Where(el => el.FormatId != 0);
}
However, I need to report the discarded elements! And I don't want to muddy my transform extension method with reporting. So, I went to the out parameter (taking into account the difficulties of using an out param in an anonymous block)
public static IEnumerable<ElementRow> RemoveBadRecords(this IEnumerable<ElementRow> elements, out List<ElementRow> discard)
{
var temp = new List<ElementRow>();
var filtered = elements.Where(el =>
{
if (el.FormatId == 0) temp.Add(el);
return el.FormatId != 0;
});
discard = temp;
return filtered;
}
And, lo! I thought I was hardcore and would have this working in one shot...
var discard = new List<ElementRow>();
var elements = data
/* snipped long LINQ statement */
.RemoveBadRecords(out discard)
/* snipped long LINQ statement */
discard.ForEach(el => failures.Add(el));
foreach(var el in elements)
{
/* do more work, maybe add more failures */
}
return new Result(elements, failures);
But, nothing was in my discard list at the time I looped through it! I stepped through the code and realized that I successfully created a fully-streaming LINQ statement.
The temp list was created
The Where filter was assigned (but not yet run)
And the discard list was assigned
Then the streaming thing was returned
When discard was iterated, it contained no elements, because the elements weren't iterated over yet.
Is there a way to fix this problem using the thing I constructed? Do I have to force an iteration of the data before or during the bad record filter? Is there another construction that I've missed?
Some Commentary
Jon mentioned that the assignment /was/ happening. I simply wasn't waiting for it. If I check the contents of discard after the iteration of elements, it is, in fact, full! So, I don't actually have an assignment problem. Unless I take Jon's advice on what's good/bad to have in a LINQ statement.
When the statement was actually iterated, the Where clause ran and temp filled up, but discard was never assigned again!
It doesn't need to be assigned again - the existing list which will have been assigned to discard in the calling code will be populated.
However, I'd strongly recommend against this approach. Using an out parameter here is really against the spirit of LINQ. (If you iterate over your results twice, you'll end up with a list which contains all the bad elements twice. Ick!)
I'd suggest materializing the query before removing the bad records - and then you can run separate queries:
var allElements = sheet
.Rows()
.SkipColumnHeaders()
.ToRowLookup()
.ToCellLookup()
.SkipEmptyRows()
.ToElements(strings)
.ToList();
var goodElements = allElements.Where(el => el.FormatId != 0)
.OrderByCompositeKey();
var badElements = allElements.Where(el => el.FormatId == 0);
By materializing the query in a List<>, you only process each row once in terms of ToRowLookup, ToCellLookup etc. It does mean you need to have enough memory to keep all the elements at a time, of course. There are alternative approaches (such as taking an action on each bad element while filtering it) but they're still likely to end up being fairly fragile.
EDIT: Another option as mentioned by Servy is to use ToLookup, which will materialize and group in one go:
var lookup = sheet
.Rows()
.SkipColumnHeaders()
.ToRowLookup()
.ToCellLookup()
.SkipEmptyRows()
.ToElements(strings)
.OrderByCompositeKey()
.ToLookup(el => el.FormatId == 0);
Then you can use:
foreach (var goodElement in lookup[false])
{
...
}
and
foreach (var badElement in lookup[true])
{
...
}
Note that this performs the ordering on all elements, good and bad. An alternative is to remove the ordering from the original query and use:
foreach (var goodElement in lookup[false].OrderByCompositeKey())
{
...
}
I'm not personally wild about grouping by true/false - it feels like a bit of an abuse of what's normally meant to be a key-based lookup - but it would certainly work.
For example
var query = myDic.Where(x => !blacklist.Contains(x.Key));
foreach (var item in query)
{
    if (condition)
        blacklist.Add(item.Key + 1); // Key is int type
    ret.Add(item);
}
return ret;
Would this code be valid? And how do I improve it?
Updated
I am expecting my blacklist.Add(item.Key + 1) to result in a smaller ret than otherwise; the ToList() approach won't achieve my intention in this sense.
Are there any other, better ideas that are correct and unambiguous?
That is perfectly safe to do and there shouldn't be any problems, as you're not directly modifying the collection that you are iterating over. Though you are making other changes that affect the Where clause, it's not going to blow up on you.
The query (as written) is lazily evaluated so blacklist is updated as you iterate through the collection and all following iterations will see any newly added items in the list as it is iterated.
The above code is effectively the same as this:
foreach (var item in myDic)
{
    if (!blacklist.Contains(item.Key))
    {
        if (condition)
            blacklist.Add(item.Key + 1);
        ret.Add(item);
    }
}
So what you should get out of this is that as long as you are not directly modifying the collection that you are iterating over (the collection named after in in the foreach loop), what you are doing is safe.
If you're still not convinced, consider this and what would be written out to the console:
var blacklist = new HashSet<int>(Enumerable.Range(3, 100));
var query = Enumerable.Range(2, 98).Where(i => !blacklist.Contains(i));
foreach (var item in query)
{
Console.WriteLine(item);
if ((item % 2) == 0)
{
var value = 2 * item;
blacklist.Remove(value);
}
}
Yes. Changing a collection's internal objects is strictly prohibited while iterating over that collection.
UPDATE
I initially made this a comment, but here is a further bit of information:
I should note that my knowledge comes from experience and articles I read a long time ago. There is a chance that you can execute the code above because (I believe) the query contains references to the selected objects within blacklist. blacklist might be able to change, but not query. If you were strictly iterating over blacklist, you would not be able to add to the blacklist collection.
Your code as presented would not throw an exception. The collection being iterated (myDic) is not the collection being modified (blacklist or ret).
What will happen is that each iteration of the loop will evaluate the current item against the query predicate, which inspects the blacklist collection to see if it contains the current item's key. This is lazily evaluated, so a change to blacklist in one iteration can affect subsequent iterations, but it will not be an error. (blacklist is fully re-evaluated upon each iteration; its enumerator is not being held.)