Examine IEnumerable "stack" while debugging?

Suppose we have a source IEnumerable sequence:
IEnumerable<Tuple<int, int>> source = new[] {
    new Tuple<int, int>(1, 2),
    new Tuple<int, int>(2, 3),
    new Tuple<int, int>(3, 2),
    new Tuple<int, int>(5, 2),
    new Tuple<int, int>(2, 0),
};
We want to apply some filters and some transformations:
IEnumerable<int> result1 = source.Where(t => (t.Item1 + t.Item2) % 2 == 0)
                                 .Select(t => t.Item2)
                                 .Select(i => 1 / i);
IEnumerable<int> result2 = from t in source
                           where (t.Item1 + t.Item2) % 2 == 0
                           let i = t.Item2
                           select 1 / i;
These two queries are equivalent, and both will throw a DivideByZeroException on the last item.
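For concreteness, nothing throws until the query is actually iterated; a minimal repro of the failure:
// Deferred execution: the exception surfaces during enumeration,
// when the only surviving element, (2, 0), reaches the 1 / i projection.
foreach (int value in result1)
{
    Console.WriteLine(value); // throws DivideByZeroException here
}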
However, when the second query is enumerated, the VS debugger lets me inspect the entire query, which is very handy for determining the source of the problem.
There is no equivalent help when the first query is enumerated. Stepping into the LINQ implementation yields no useful data, probably because the binary is optimized.
Is there a way to usefully inspect the enumerable values up the "stack" of IEnumerables when not using query syntax? Query syntax is not an option because sharing code is impossible with it (i.e., the transformations are non-trivial and are used more than once).

But you can debug the first one. Just insert a breakpoint on any one of the lambdas and you're free to inspect the values of the parameters or whatever else is in scope.
When debugging you can then inspect the values of (in the case of breaking within the first Where) t, t.Item1, etc.
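If you'd rather watch the values flow without breaking, a small pass-through logging helper also works. This Tap extension is a hypothetical sketch of mine, not a framework method:
public static IEnumerable<T> Tap<T>(this IEnumerable<T> source, string label)
{
    foreach (T item in source)
    {
        // Log each element as it flows between operators.
        Console.WriteLine(label + ": " + item);
        yield return item;
    }
}
Interleaving .Tap("after Where") and so on between the operators then prints each intermediate value as the pipeline pulls it through.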
As for the reason that you can inspect t when performing the final select in your second query, but not your first, it's because you haven't created equivalent queries. The second query you wrote, when written out by the compiler, will not generate something like your first query. It will create something subtly, but still significantly, different. It will create something like this:
IEnumerable<int> result1 = source.Where(t => (t.Item1 + t.Item2) % 2 == 0)
                                 .Select(t => new
                                 {
                                     t,
                                     i = t.Item2,
                                 })
                                 .Select(result => 1 / result.i);
A let clause doesn't just select out that value, as your first query does. It selects out a new anonymous type that pulls out the value from the let clause as well as the previous value, and then rewrites the subsequent clauses to pull out the appropriate variable. That's why the "previous" variables (i.e., t) are still in scope at the end of the query, at both compile time and runtime; that alone should have been a big hint. Using the query I provided above, when breaking in the final Select, you can see the value of result.t through the debugger.

Related

Querying nested lists with LINQ instead of loops

Let's say I have the following setup:
Continent
--Countries
----Provinces
------Cities
A continent contains a list of many countries, each of which contains a list of many provinces, each of which contains a list of many cities.
For each nested list, let's say I want to apply a check (name length greater than 5).
Instead of using this loop structure
var countries = dbSet.Countries.Where(c => c.Name.Length > 5);
foreach (var country in countries)
{
    country.Provinces = country.Provinces.Where(p => p.Name.Length > 5);
    foreach (var province in country.Provinces)
    {
        province.Cities = province.Cities.Where(ci => ci.Name.Length > 5);
    }
}
How could I accomplish the same efficiently with LINQ?
Efficiently? In terms of written code, sure, but we'll call that "cleanly". In terms of execution, that's not a question you should be asking at this point. Focus on getting the job done in code that's understandable and then "race your horses" to see if you really need to improve on it.
One thing I should caution is that LINQ is about querying, and querying doesn't mutate the source sequences. You're assigning the filtered sequences back to the properties, which runs contrary to LINQ principles. The tag shows you're using Entity Framework, where it's definitely not a good idea, because EF uses its own collection types under the hood.
To answer your question, the SelectMany extension method flattens nested sequences, looping over each projected child sequence. When it's translated to a database query, it becomes a join.
dbSet.Countries
     .Where(c => c.Name.Length > 5)
     .SelectMany(c => c.Provinces)
     .Where(p => p.Name.Length > 5)
     .SelectMany(p => p.Cities)
     .Where(ci => ci.Name.Length > 5)
     .Select(ci => ci.Name);
That'll give you the names of all cities where the country, province, and city names are all longer than 5 characters.
But that only gives you the names of the cities. If you want to know each level of information, extension methods are difficult to use because you have to project "transparent identifiers" at each step along the way and it can get pretty cluttered. Let the compiler do that for you by using LINQ syntax.
from c in dbSet.Countries
where c.Name.Length > 5
from p in c.Provinces
where p.Name.Length > 5
from ci in p.Cities
where ci.Name.Length > 5
That will do the same thing as above, except now, all your range variables are carried through the expression so you can do this:
select new
{
    CountryName = c.Name,
    ProvinceName = p.Name,
    CityName = ci.Name
};
...or whatever you want to do with c, p, and ci.
EDIT: Merged the second answer, which addressed questions in the comments, into this one.
In order to preserve the parent levels through the query, you need to project a container for the parent and the child each time you loop through a collection of child objects. When you use LINQ syntax, the compiler does this for you in the form of a "transparent identifier". It's transparent because your references to range variables "go right through" it and you never see it. Jon Skeet touches on them near the end of Reimplementing LINQ to Objects: Part 19 – Join.
To accomplish this, you want to use a different overload of SelectMany this time, one that also takes a lambda to project the container you need. On each iteration through the child items, that lambda is called with two parameters: the parent and the current child item.
var result = dbSet.Countries
    .Where(c => c.Name.Length > 5)
    .SelectMany(c => c.Provinces, (c, p) => new { c, p })
    .Where(x1 => x1.p.Name.Length > 5)
    .SelectMany(x1 => x1.p.Cities, (x1, ci) => new { x1.c, x1.p, ci })
    .Where(x2 => x2.ci.Name.Length > 5)
    .Select(x2 => new
    {
        Country = x2.c,
        Province = x2.p,
        City = x2.ci
    })
    .ToList();
The x1 and x2 lambda arguments are the containers projected by the preceding SelectMany calls. I like to call them "opaque identifiers": they're no longer transparent once you have to refer to them explicitly.
The c, p, and ci range variables are now properties of those containers.
As a bonus note, when you use a let clause, the compiler's doing the exact same thing, creating a container that has all of the available range variables and the new variable that's being introduced.
I want to end this with a word of advice: Use LINQ syntax as much as possible. It's easier to write and get right, and it's easier to read because you don't have all those projections that the compiler can do for you. If you have to resort to extension methods, do so in parts. The two techniques can be mixed. There's art in keeping it from looking like a mess.
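As a rough illustration of mixing the two (my sketch, not part of the original answer), you can parenthesize a query expression and keep chaining extension methods off it:
// Query syntax carries the range variables; extension methods finish the job.
var cityNames = (from c in dbSet.Countries
                 where c.Name.Length > 5
                 from p in c.Provinces
                 where p.Name.Length > 5
                 from ci in p.Cities
                 where ci.Name.Length > 5
                 select ci.Name)
                .Distinct()
                .ToList();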

Why is this output variable in my LINQ expression NOT problematic?

Given the following code:
var strings = Enumerable.Range(0, 100).Select(i => i.ToString());
int outValue = 0;
var someEnumerable = strings.Where(s => int.TryParse(s, out outValue))
                            .Select(s => outValue);
outValue = 3;
//enumerating over someEnumerable here shows ints from 0 to 99
I am able to see a "snapshot" of the out parameter for each iteration. Why does this work correctly instead of me seeing 100 3's (deferred execution) or 100 99's (access to modified closure)?
First you define a query, strings, that knows how to generate a sequence of strings when queried. Each time a value is asked for, it will generate a new number and convert it to a string.
Then you declare a variable, outValue, and assign 0 to it.
Then you define a new query, someEnumerable, that knows how to, when asked for a value, get the next value from the query strings, try to parse it and, if it can be parsed, yield the value of outValue. Once again, we have defined a query that can do this; we have not actually done any of it.
You then set outValue to 3.
Then you ask someEnumerable for its first value; that is, you ask the implementation of Select for its value. To compute that value it will ask the Where for its first value, and the Where will ask strings. (We'll skip a few steps now.) The Where will get a "0". It will call the predicate on "0", specifically calling int.TryParse; a side effect of this is that outValue will be set to 0. TryParse returns true, so the item is yielded. Select then maps that value (the string "0") into a new value using its selector. The selector ignores the value and yields the value of outValue at that point in time, which is 0. Our foreach loop now does whatever with 0.
Now we ask someEnumerable for its second value, on the next iteration of the loop. It asks Select for a value, Select asks Where, Where asks strings, strings yields "1", Where calls the predicate, setting outValue to 1 as a side effect, and Select yields the current value of outValue, which is 1. The foreach loop now does whatever with 1.
So the key point here is that, due to the way in which Where and Select defer execution, performing their work only when values are needed, the side effect of the Where predicate ends up running immediately before each projection in the Select. If you didn't defer execution, and instead performed all of the TryParse calls before any of the projections in Select, then you would see 99 (the last value parsed) for every element. We can simulate this easily enough: materialize the results of the Where into a collection, and then watch the Select produce 99 over and over:
var someEnumerable = strings.Where(s => int.TryParse(s, out outValue))
                            .ToList() // eagerly evaluate the query up to this point
                            .Select(s => outValue);
Having said all of that, the query that you have is not particularly good design. Whenever possible you should avoid queries that have side effects (such as your Where). The fact that the query both causes side effects, and observes the side effects that it creates, makes following all of this rather hard. The preferable design would be to rely on purely functional methods that aren't causing side effects. In this context the simplest way to do that is to create a method that tries to parse a string and returns an int?:
public static int? TryParse(string rawValue)
{
    int output;
    if (int.TryParse(rawValue, out output))
        return output;
    else
        return null;
}
This allows us to write:
var someEnumerable = from s in strings
                     let n = TryParse(s)
                     where n != null
                     select n.Value;
Here there are no observable side effects in the query, nor is the query observing any external side effects. It makes the whole query far easier to reason about.
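For completeness (my sketch, not part of the original answer), the same side-effect-free query in method syntax:
// No out parameter and no shared state; each element carries its own parse result.
var someEnumerable = strings.Select(s => TryParse(s))
                            .Where(n => n != null)
                            .Select(n => n.Value);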
Because when you enumerate, the values are produced one at a time, each one changing the variable on the fly. Due to the lazy nature of LINQ, the Select for the first item executes before the Where for the second item. The variable basically behaves like a foreach loop variable of a kind.
This is what deferred execution buys us. Earlier methods do not have to execute fully before the next method in the chain starts; one value moves through all the methods before the second goes in. This is very useful with methods like First or Take, which stop the iteration early. The exceptions are methods that need to aggregate or sort, like OrderBy (they must look at all elements before knowing which comes first). If you add an OrderBy before the Select, the behavior will probably break.
Of course I wouldn't depend on this behavior in production code.
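To see that breakage concretely (an illustrative sketch, not from the original answer):
// OrderBy must buffer the entire sequence before yielding anything,
// so every TryParse side effect runs before the first projection.
var broken = strings.Where(s => int.TryParse(s, out outValue))
                    .OrderBy(s => s.Length)
                    .Select(s => outValue);
// Enumerating 'broken' yields 99 for every element, because outValue
// already holds the last parsed value by the time Select runs.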
I don't see what's odd here. Suppose you write a loop over the enumerable like this:
foreach (var i in someEnumerable)
{
    Console.WriteLine(outValue);
}
Because LINQ enumerates each Where and Select lazily, yielding one value at a time, outValue steps through 0 to 99 as the loop runs. But if you force eager evaluation by adding ToArray:
var someEnumerable = strings.Where(s => int.TryParse(s, out outValue))
                            .Select(s => outValue).ToArray();
then in the loop you will see 99 every time.
Edit: the full repro below prints 99 every time.
var strings = Enumerable.Range(0, 100).Select(i => i.ToString());
int outValue = 0;
var someEnumerable = strings.Where(s => int.TryParse(s, out outValue))
                            .Select(s => outValue).ToArray();
//outValue = 3;
foreach (var i in someEnumerable)
{
    Console.WriteLine(outValue);
}

Reorder by original index when using Linq

I've looked around for a solution to this problem, and although I've found similar ones I couldn't find an answer to this specific problem. I've generalized it, but it goes something like this:
I have the following int[]
[423]
[234]
[5]
[79]
[211]
[1001]
I would like to use LINQ to select only the entries that are less than 200 or greater than 300, and then order by the original array index so that the final array is guaranteed to be:
[423]
[5]
[79]
[1001]
LINQ to Objects preserves the order of the source when filtering, so a simple where clause will do the job.
Order Preservation in PLINQ
In PLINQ, the goal is to maximize performance while maintaining correctness. A query should run as fast as possible but still produce the correct results. In some cases, correctness requires the order of the source sequence to be preserved; however, ordering can be computationally expensive. Therefore, by default, PLINQ does not preserve the order of the source sequence. In this regard, PLINQ resembles LINQ to SQL, but is unlike LINQ to Objects, which does preserve ordering.
But if you want, you can select the index along with the value and later use OrderBy on the index:
int[] array = new[]
{
    423, 234, 5, 79, 211, 1001
};
var sortedArray = array.Select((r, i) => new { value = r, index = i })
                       .Where(t => t.value < 200 || t.value > 300)
                       .OrderBy(o => o.index)
                       .Select(s => s.value).ToArray();
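Incidentally, if such a query ever does run under PLINQ, you can opt back into source ordering explicitly; a minimal sketch using the same array:
// AsOrdered tells PLINQ to preserve the order of the source sequence.
var filtered = array.AsParallel().AsOrdered()
                    .Where(v => v < 200 || v > 300)
                    .ToArray();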
When you filter objects with Enumerable.Where, the original order is preserved. MSDN:
LINQ to Objects ... does preserve ordering
A few more words. You can think of Where as simply filtering elements in a foreach loop, which returns items one by one, in exactly the same order they come into the loop. Like this:
public static IEnumerable<T> Where<T>(this IEnumerable<T> sequence, Func<T, bool> predicate)
{
    foreach (var item in sequence)
        if (predicate(item))
            yield return item;
}
Read more on Jon Skeet's blog.
No need to do any sorting; the order will be preserved.
var someInts = new int[] { 423, 234, 5, 79, 211, 1001 };
var filteredInts = someInts.Where(i => i < 200 || i > 300);
// filteredInts = [423, 5, 79, 1001]

LINQ to find array indexes of a value

Assuming I have the following string array:
string[] str = new string[] { "max", "min", "avg", "max", "avg", "min" };
Is it possible to use LINQ to get a list of the indexes that match one string?
As an example, I would like to search for the string "avg" and get a list containing
2, 4
meaning that "avg" can be found at str[2] and str[4].
.Select has a seldom-used overload that produces an index. You can use it like this:
str.Select((s, i) => new { i, s })
   .Where(t => t.s == "avg")
   .Select(t => t.i)
   .ToList()
The result will be a list containing 2 and 4.
You can do it like this:
str.Select((v, i) => new { Index = i, Value = v }) // Pair up values and indexes
   .Where(p => p.Value == "avg")                   // Do the filtering
   .Select(p => p.Index);                          // Keep the index and drop the value
The key step is using the overload of Select that supplies the current index to your functor.
You can use the overload of Enumerable.Select that passes the index and then use Enumerable.Where on an anonymous type:
List<int> result = str.Select((s, index) => new { s, index })
                      .Where(x => x.s == "avg")
                      .Select(x => x.index)
                      .ToList();
If you just want to find the first or last index, you also have the built-in methods Array.IndexOf and Array.LastIndexOf:
int firstIndex = Array.IndexOf(str, "avg");
int lastIndex = Array.LastIndexOf(str, "avg");
(or you can use the overloads that take a start index to specify the start position)
First off, your code doesn't actually iterate over the list twice; it only iterates it once.
That said, your Select is really just getting a sequence of all of the indexes; that is more easily done with Enumerable.Range:
var result = Enumerable.Range(0, str.Length)
                       .Where(i => str[i] == "avg")
                       .ToList();
Understanding why the list isn't actually iterated twice will take some getting used to. I'll try to give a basic explanation.
You should think of most of the LINQ methods, such as Select and Where, as a pipeline. Each method does some tiny bit of work. In the case of Select, you give it a method, and it essentially says, "Whenever someone asks me for my next item, I'll first ask my input sequence for an item, then use the method I have to convert it into something else, and then give that item to whoever is using me." Where, more or less, is saying, "Whenever someone asks me for an item, I'll ask my input sequence for an item; if the function says it's good I'll pass it on, if not I'll keep asking for items until I get one that passes."
So when you chain them, what happens is ToList asks for the first item, it goes to Where and asks it for its first item, Where goes to Select and asks it for its first item, and Select goes to the list to ask it for its first item. The list then provides its first item. Select then transforms that item into what it needs to spit out (in this case, just the int 0) and gives it to Where. Where takes that item and runs its function, which determines that it's true, and so spits out 0 to ToList, which adds it to the list. That whole thing then happens 9 more times. This means that Select will end up asking for each item from the list exactly once, and it will feed each of its results directly to Where, which will feed the results that "pass the test" directly to ToList, which stores them in a list. All of the LINQ methods are carefully designed to only ever iterate the source sequence once (when they are iterated once).
Note that, while this seems complicated at first, it's actually pretty easy for the computer to do, and it's not as performance-intensive as it may seem.
While you could use a combination of Select and Where, this is likely a good candidate for making your own function:
public static IEnumerable<int> Indexes<T>(this IEnumerable<T> source, T itemToFind)
{
    if (source == null)
        throw new ArgumentNullException("source");
    int i = 0;
    foreach (T item in source)
    {
        if (object.Equals(itemToFind, item))
        {
            yield return i;
        }
        i++;
    }
}
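Usage might look like this (assuming the method sits in a static class so the extension syntax works):
// Indexes is lazy; ToList materializes the matches.
List<int> indexes = str.Indexes("avg").ToList(); // { 2, 4 }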
If you want a combined Select-and-Where operator: compared to the accepted answer, this will be cheaper, since it won't require intermediate objects:
public static IEnumerable<TResult> SelectWhere<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, bool> filter, Func<TSource, int, TResult> selector)
{
    int index = -1;
    foreach (var s in source)
    {
        checked { ++index; }
        if (filter(s))
            yield return selector(s, index);
    }
}
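For the question's example, that might be used like this (my sketch):
// Filter on the value, select the index; no intermediate anonymous type is allocated.
List<int> indexes = str.SelectWhere(s => s == "avg", (s, i) => i).ToList(); // { 2, 4 }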

Return Modal Average in LINQ (Mode)

I am not sure if CopyMost is the correct term to use here, but it's the term my client used ("CopyMost Data Protocol"). Sounds like he wants the mode? I have a set of data:
Increment  Value
.02        1
.04        1
.06        1
.08        2
.10        2
I need to return the Value that occurs the most ("CopyMost"). In this case, that value is 1. Right now I had planned on writing an extension method for IEnumerable to do this for integer values. Is there something built into LINQ that already does this easily? Or is it best for me to write an extension method that would look something like this
records.CopyMost(x => x.Value);
EDIT
Looks like I am looking for the modal average. I've provided an updated answer that allows for a tiebreaker condition. It's meant to be used like this, and is generic.
records.CopyMost(x => x.Value, x => x == 0);
In this case x.Value would be an int, and if the count of 0s was the same as the counts of 1s and 3s, it would tiebreak on 0.
Well, here's one option:
var query = (from item in data
             group 1 by item.Value into g
             orderby g.Count() descending
             select g.Key).First();
Basically we're using GroupBy to group by the value - but all we're interested in for each group is the size of the group and the key (which is the original value). We sort the groups by size, and take the first element (the one with the most elements).
Does that help?
Jon beat me to it, but the term you're looking for is Modal Average.
Edit:
If I'm right in thinking that it's the modal average you need, then the following should do the trick:
var i = (from t in data
         group t by t.Value into aggr
         orderby aggr.Count() descending
         select aggr.Key).First();
This method has been updated several times in my code over the years. It's become a very important method, and it is much different than it used to be. I wanted to provide the most up-to-date version in case anyone is looking to add CopyMost or a modal average as a LINQ extension.
One thing I did not think I would need was a tiebreaker of some sort. I have now overloaded the method to include a tiebreaker.
public static K CopyMost<T, K>(this IEnumerable<T> records, Func<T, K> propertySelector, Func<K, bool> tieBreaker)
{
    var grouped = records.GroupBy(x => propertySelector(x)).Select(x => new { Group = x, Count = x.Count() });
    var maxCount = grouped.Max(x => x.Count);
    var subGroup = grouped.Where(x => x.Count == maxCount);
    if (subGroup.Count() == 1)
        return subGroup.Single().Group.Key;
    else
        return subGroup.Where(x => tieBreaker(x.Group.Key)).Single().Group.Key;
}
The above assumes the user enters a legitimate tiebreaker condition. You may want to check and see if the tiebreaker returns a valid value, and if not, throw an exception. And here's my normal method.
public static K CopyMost<T, K>(this IEnumerable<T> records, Func<T, K> propertySelector)
{
    return records.GroupBy(x => propertySelector(x)).OrderByDescending(x => x.Count()).Select(x => x.Key).First();
}
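A quick usage sketch against the sample data above (assuming each record exposes an int Value property):
// Plain mode: 1 occurs three times, 2 occurs twice.
var mode = records.CopyMost(x => x.Value); // 1
// With a tiebreaker: if two values are tied for most frequent,
// prefer the one matching the predicate.
var modeOrZero = records.CopyMost(x => x.Value, k => k == 0);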
