Querying nested lists with LINQ instead of loops - c#

Let's say I have the following setup:
Continent
--Countries
----Provinces
------Cities
A continent contains a list of countries, each country contains a list of provinces, and each province contains a list of cities.
For each nested list, let's say I want to run a check (name length is greater than 5).
Instead of using this loop structure:
var countries = dbSet.Countries.Where(c => c.Name.Length > 5);
foreach (var country in countries)
{
country.Provinces = country.Provinces.Where(p => p.Name.Length > 5);
foreach (var province in country.Provinces)
{
province.Cities = province.Cities.Where(ci => ci.Name.Length > 5);
}
}
How could I accomplish the same efficiently with LINQ?

Efficiently? In terms of written code, sure, but we'll call that "cleanly". In terms of execution, that's not a question you should be asking at this point. Focus on getting the job done in code that's understandable and then "race your horses" to see if you really need to improve on it.
One thing I should caution is that LINQ is about querying, which doesn't mutate the source sequences. You're assigning the filtered sequences back to the properties and that's contrary to LINQ principles. The tag shows you're using Entity Framework so it's definitely not a good idea to do that because it uses its own collection types under the hood.
To answer your question, the SelectMany extension method loops on the projected sequence. When it's translated to a database query, it translates to a join.
dbSet.Countries
.Where(c => c.Name.Length > 5)
.SelectMany(c => c.Provinces)
.Where(p => p.Name.Length > 5)
.SelectMany(p => p.Cities)
.Where(ci => ci.Name.Length > 5)
.Select(ci => ci.Name);
That'll give you the names of all cities where the country, province, and city names are all longer than 5 characters.
But that only gives you the names of the cities. If you want to know each level of information, extension methods are difficult to use because you have to project "transparent identifiers" at each step along the way and it can get pretty cluttered. Let the compiler do that for you by using LINQ syntax.
from c in dbSet.Countries
where c.Name.Length > 5
from p in c.Provinces
where p.Name.Length > 5
from ci in p.Cities
where ci.Name.Length > 5
That will do the same thing as above, except now, all your range variables are carried through the expression so you can do this:
select new
{
CountryName = c.Name,
ProvinceName = p.Name,
CityName = ci.Name
};
...or whatever you want to do with c, p, and ci.
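To try the query without a database, here is a minimal in-memory sketch. The Country, Province and City records, the NestedQueryDemo class and the sample data are all made up for illustration; they are not the asker's entity classes, and the query runs as LINQ to Objects rather than through Entity Framework.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the entity classes in the question.
public record City(string Name);
public record Province(string Name, List<City> Cities);
public record Country(string Name, List<Province> Provinces);

public static class NestedQueryDemo
{
    // Returns one row per city whose own name, province name and
    // country name are all longer than five characters.
    public static List<(string Country, string Province, string City)> LongNames(
        IEnumerable<Country> countries)
    {
        var query =
            from c in countries
            where c.Name.Length > 5
            from p in c.Provinces
            where p.Name.Length > 5
            from ci in p.Cities
            where ci.Name.Length > 5
            select (Country: c.Name, Province: p.Name, City: ci.Name);

        return query.ToList();
    }

    // Made-up sample data, only meant to exercise each filter.
    public static List<Country> SampleData() => new List<Country>
    {
        new Country("Canada", new List<Province>
        {
            new Province("Ontario", new List<City> { new City("Toronto"), new City("Ajax") }),
            new Province("Yukon", new List<City> { new City("Whitehorse") }),
        }),
        new Country("Chad", new List<Province>
        {
            new Province("Kanem Region", new List<City> { new City("Mao Town") }),
        }),
    };
}
```

Calling NestedQueryDemo.LongNames(NestedQueryDemo.SampleData()) keeps only the rows where all three names pass the length check, and c, p and ci are all still in scope in the final select.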
EDIT: Merged the second answer, which addressed questions in the comments, into this one.
In order to preserve the parent levels through the query, you need to project a container for the parent and the child each time you loop through a collection of child objects. When you use LINQ syntax, the compiler does this for you in the form of a "transparent identifier". It's transparent because your references to range variables "go right through" it and you never see it. Jon Skeet touches on them near the end of Reimplementing LINQ to Objects: Part 19 – Join.
To accomplish this, you want to use a different overload of SelectMany this time, one that also takes a lambda to project the container you need. Each iteration through the child items, that lambda is called and passed two parameters, the parent and the current iteration's child item.
var result = dbSet.Countries
.Where(c => c.Name.Length > 5)
.SelectMany(c => c.Provinces, (c, p) => new { c, p })
.Where(x1 => x1.p.Name.Length > 5)
.SelectMany(x1 => x1.p.Cities, (x1, ci) => new { x1.c, x1.p, ci })
.Where(x2 => x2.ci.Name.Length > 5)
.Select(x2 => new
{
Country = x2.c,
Province = x2.p,
City = x2.ci
})
.ToList();
The x1 and x2 lambda arguments are the containers projected from the preceding SelectMany call. I like to call them "opaque identifiers". They're no longer transparent if you have to refer to them explicitly.
The c, p, and ci range variables are now properties of those containers.
As a bonus note, when you use a let clause, the compiler's doing the exact same thing, creating a container that has all of the available range variables and the new variable that's being introduced.
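To make the remark about let concrete, the sketch below writes the same small query twice: once with a let clause, and once with roughly the projection the compiler produces for it. The LetDemo class and the name x are hypothetical; the real compiler-generated identifier cannot be typed in source code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LetDemo
{
    // Query syntax: "len" is introduced with a let clause.
    public static List<string> WithLet(IEnumerable<string> names) =>
        (from n in names
         let len = n.Length
         where len > 5
         select n + ":" + len).ToList();

    // Roughly the equivalent method syntax: a Select packs the range
    // variable and the new variable into one container object.
    public static List<string> WithoutLet(IEnumerable<string> names) =>
        names
            .Select(n => new { n, len = n.Length })
            .Where(x => x.len > 5)
            .Select(x => x.n + ":" + x.len)
            .ToList();
}
```

Both methods return the same list for the same input; the only difference is whether you or the compiler writes the container.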
I want to end this with a word of advice: Use LINQ syntax as much as possible. It's easier to write and get right, and it's easier to read because you don't have all those projections that the compiler can do for you. If you have to resort to extension methods, do so in parts. The two techniques can be mixed. There's art in keeping it from looking like a mess.

Related

C#, lambda : How are redundant calls handled?

I'm curious about how the compiler handles the following expression:
var collapsed = elements.GroupBy(elm => elm.OrderIdentifier).Select(group => new ModelsBase.Laser.Element()
{
CuttingDurationInSeconds = group.Sum(itm => itm.CuttingDurationInSeconds),
FladderDurationInSeconds = group.Sum(itm => itm.FladderDurationInSeconds),
DeliveryDate = group.Min(itm => itm.DeliveryDate),
EfterFladderOpstilTid = group.First().EfterFladderOpstilTid,
EfterRadanOpstilTid = group.First().EfterRadanOpstilTid,
});
As you can see, I'm using group.Sum twice. Does anyone know whether the "group" list will be iterated twice to get both sums, or whether it will be optimized so that there is only one complete iteration of the list?
LINQ is often not the best way to get high performance. What you gain is productivity: you get a result without many lines of code.
The possibilities for optimization are limited. For queries against SQL, there is one rule of thumb: one query is better than two.
1) There is only one round trip to the SQL Server.
2) SQL Server is built to optimize such queries, and it optimizes better when it knows what you want to do in the next step. Optimization is done per query, not across multiple queries.
In the case of LINQ to Objects, there is absolutely no gain in building huge queries.
As your example shows, it will probably cause multiple iterations. You keep your code simpler and easier to read, but you give up control and therefore performance.
The compiler certainly won't optimize any of that.
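If this is using LINQ to Objects, and therefore delegates, the delegates will iterate over each group 5 times, once for each of the 5 properties (the two Sum calls and Min walk the whole group; each First() stops after the first element).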
If this is using LINQ to SQL, Entity Framework or something similar, and therefore expression trees, then it's basically up to the query provider to optimize this appropriately.
You can optimise your request by adding two fields to the grouping key:
var collapsed = elements.GroupBy(elm => new{
OrderIdentifier=elm.OrderIdentifier,
EfterFladderOpstilTid=elm.EfterFladderOpstilTid,
EfterRadanOpstilTid=elm.EfterRadanOpstilTid
})
.Select(group => new ModelsBase.Laser.Element()
{
CuttingDurationInSeconds = group.Sum(itm => itm.CuttingDurationInSeconds),
FladderDurationInSeconds = group.Sum(itm => itm.FladderDurationInSeconds),
DeliveryDate = group.Min(itm => itm.DeliveryDate),
EfterFladderOpstilTid = group.Key.EfterFladderOpstilTid,
EfterRadanOpstilTid = group.Key.EfterRadanOpstilTid,
});
Or by using a let clause:
var collapsed = from groupedElement in
(from element in elements
group element by element.OrderIdentifier into g
select g)
let First = groupedElement.First()
select new ModelsBase.Laser.Element()
{
CuttingDurationInSeconds = groupedElement.Sum(itm => itm.CuttingDurationInSeconds),
FladderDurationInSeconds = groupedElement.Sum(itm => itm.FladderDurationInSeconds),
DeliveryDate = groupedElement.Min(itm => itm.DeliveryDate),
EfterFladderOpstilTid = First.EfterFladderOpstilTid,
EfterRadanOpstilTid = First.EfterRadanOpstilTid
};
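If measurements show that the repeated passes over each group actually matter, one option for LINQ to Objects is to fold each group in a single pass with Aggregate. This is only a sketch: the Element record below is a simplified, hypothetical stand-in for ModelsBase.Laser.Element, and SinglePassDemo is an invented name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for ModelsBase.Laser.Element.
public record Element(string OrderIdentifier, double CuttingDurationInSeconds,
                      double FladderDurationInSeconds, DateTime DeliveryDate);

public static class SinglePassDemo
{
    public static List<Element> Collapse(IEnumerable<Element> elements) =>
        elements
            .GroupBy(e => e.OrderIdentifier)
            // Aggregate without a seed starts from the first element of the
            // group and folds the rest into it, so each group is walked once.
            // Groups produced by GroupBy are never empty, so this cannot throw.
            .Select(g => g.Aggregate((acc, e) => acc with
            {
                CuttingDurationInSeconds = acc.CuttingDurationInSeconds + e.CuttingDurationInSeconds,
                FladderDurationInSeconds = acc.FladderDurationInSeconds + e.FladderDurationInSeconds,
                DeliveryDate = e.DeliveryDate < acc.DeliveryDate ? e.DeliveryDate : acc.DeliveryDate
            }))
            .ToList();
}
```

The trade-off is readability: the version with separate Sum and Min calls says what it computes more directly, so the fold is worth it only when profiling points at this query.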

Specific Ordering of a List of Objects using LINQ

I am using LINQ and I want to order a list using one of the columns in my DB table. I can already order a list of people by their branch, but I want the branches to appear in a specific order.
Right now I am using this:
phoneList.OrderBy(e => e.Branch).ThenBy(e => e.FullName)
Say that I have these branches available: 82pk,corp,prfe,hrbd.
My current code will sort the people in this branch order: 82pk,corp,hrbd,prfe
I want to sort the people in this branch order: corp,82pk,prfe,hrbd
How can I use LINQ to order my list manually given my specific order?
This is what I am trying to accomplish:
phoneList.OrderBy(e => e.Branch == "corp").ThenBy(e => e.Branch == "82pk").ThenBy(e => e.Branch == "prfe").ThenBy(e => e.Branch == "hrbd")
Start with a list of your required order:
var branchOrder = new List<string> { "corp", "82pk", "prfe", "hrbd" };
Then order by the index position in this list:
phoneList.OrderBy(e => branchOrder.IndexOf(e.Branch)).ThenBy(e => e.FullName);
This has the added benefit that it works as expected even with EF queries.
This will get slower as the number of items grows. As described in the comments, a simple enhancement is to store each "branch" and its required position in a dictionary:
var branchOrder = new Dictionary<string,int>(){
{"corp",1},
{"82pk",2},
... etc
};
phoneList.OrderBy(
e => branchOrder.ContainsKey(e.Branch)
? branchOrder[e.Branch]
: 0) // give a default to protect against invalid key
.ThenBy(e => e.FullName);
Let your Branch class implement IComparable with the logic you described. Then LINQ's OrderBy will sort as you want.
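If Branch is a plain string rather than a class of its own, a similar effect is available for in-memory lists through the OrderBy overload that takes an IComparer<TKey>. The sketch below assumes that situation; Person, BranchComparer and BranchOrderDemo are hypothetical names. It is meant for LINQ to Objects, since a query provider such as EF is unlikely to be able to translate a custom comparer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the phone list entries in the question.
public record Person(string Branch, string FullName);

// Orders branch codes by their position in a fixed list;
// unknown branches sort after all known ones.
public sealed class BranchComparer : IComparer<string>
{
    private static readonly string[] Order = { "corp", "82pk", "prfe", "hrbd" };

    private static int Rank(string branch)
    {
        int index = Array.IndexOf(Order, branch);
        return index < 0 ? int.MaxValue : index;
    }

    public int Compare(string x, string y) => Rank(x).CompareTo(Rank(y));
}

public static class BranchOrderDemo
{
    public static List<Person> Sort(IEnumerable<Person> phoneList) =>
        phoneList
            .OrderBy(e => e.Branch, new BranchComparer())
            .ThenBy(e => e.FullName)
            .ToList();
}
```

Keeping the order in one comparer class means the rule lives in a single place instead of being repeated at every call to OrderBy.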

Identify items in one list not in another of a different type

I need to identify items from one list that are not present in another list. The two lists are of different entities (ToDo and WorkshopItem). I consider a workshop item to be in the todo list if the Name is matched in any of the todo list items.
The following does what I'm after, but I find it awkward and hard to understand each time I revisit it. I use NHibernate QueryOver syntax to get the two lists and then a LINQ statement to filter down to just the workshop items that meet the requirement (DateDue is in the next two weeks and the Name is not present in the list of ToDo items).
var allTodos = Session.QueryOver<ToDo>().List();
var twoWeeksTime = DateTime.Now.AddDays(14);
var workshopItemsDueSoon = Session.QueryOver<WorkshopItem>()
.Where(w => w.DateDue <= twoWeeksTime).List();
var matches = from wsi in workshopItemsDueSoon
where !(from todo in allTodos
select todo.TaskName)
.Contains(wsi.Name)
select wsi;
Ideally I'd like to have just one NHibernate query that returns a list of WorkshopItems that match my requirement.
I think I've managed to put together a LINQ version of the answer put forward by @CSL, and I will mark that as the accepted answer as it pointed me in the direction of the following.
var twoWeeksTime = DateTime.Now.AddDays(14);
var subquery = NHibernate.Criterion.QueryOver.Of<ToDo>().Select(t => t.TaskName);
var matchingItems = Session.QueryOver<WorkshopItem>()
.Where(w => w.DateDue <= twoWeeksTime &&
w.IsWorkshopItemInProgress == true)
.WithSubquery.WhereProperty(x => x.Name).NotIn(subquery)
.Future<WorkshopItem>();
It returns the results I'm expecting and doesn't rely on magic strings. I'm hesitant because I don't fully understand the WithSubquery (and whether inlining it would be a good thing). It seems to equate to
WHERE WorkshopItem.Name NOT IN (subquery)
Also I don't understand the Future instead of List. If anyone would shed some light on those that would help.
I am not 100% sure how to achieve what you need using LINQ, so to give you an option I am putting up an alternative solution using NHibernate Criteria (this will execute in one database hit):
// Create a query
ICriteria query = Session.CreateCriteria<WorkshopItem>("wsi");
// Restrict to items due within the next 14 days
query.Add(Restrictions.Le("DateDue", DateTime.Now.AddDays(14)));
// Project all TaskNames from the ToDo items
DetachedCriteria allTodos = DetachedCriteria.For(typeof(ToDo)).SetProjection(Projections.Property("TaskName"));
// Keep only workshop items whose Name has no matching ToDo item
query.Add(Subqueries.PropertyNotIn("Name", allTodos));
// Return results
var matchingItems = query.Future<WorkshopItem>().ToList();
I'd recommend
var workshopItemsDueSoon = Session.QueryOver<WorkshopItem>()
.Where(w => w.DateDue <= twoWeeksTime);
var allTodos = Session.QueryOver<ToDo>();
Instead of
var allTodos = Session.QueryOver<ToDo>().List();
var workshopItemsDueSoon = Session.QueryOver<WorkshopItem>()
.Where(w => w.DateDue <= twoWeeksTime).List();
So that the collection isn't iterated until you need it to be.
I've found that it's helpful to use LINQ extension methods to make subqueries more readable and less awkward.
For example:
var matches = from wsi in workshopItemsDueSoon
where !allTodos.Select(it=>it.TaskName).Contains(wsi.Name)
select wsi;
Personally, since the query is fairly simple, I'd prefer to do it like so:
var matches = workshopItemsDueSoon.Where(wsi => !allTodos.Select(it => it.TaskName).Contains(wsi.Name));
The latter seems less verbose to me.
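For the in-memory version of this filter, one further option is to build the set of task names once instead of projecting allTodos again for every workshop item. The sketch below assumes both lists have already been loaded; ToDo and WorkshopItem are simplified, hypothetical versions of the asker's entities, and MissingItemsDemo is an invented name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified, hypothetical versions of the entities in the question.
public record ToDo(string TaskName);
public record WorkshopItem(string Name, DateTime DateDue);

public static class MissingItemsDemo
{
    public static List<WorkshopItem> NotInTodos(
        IEnumerable<WorkshopItem> workshopItemsDueSoon, IEnumerable<ToDo> allTodos)
    {
        // Build the lookup once; HashSet.Contains is a constant-time check on average.
        var todoNames = new HashSet<string>(allTodos.Select(t => t.TaskName));

        return workshopItemsDueSoon
            .Where(wsi => !todoNames.Contains(wsi.Name))
            .ToList();
    }
}
```

This does not reduce the number of database queries; it only makes the in-memory comparison cheaper and, arguably, easier to read.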

Can these two LINQ queries be used interchangeably?

a) Would the following two queries produce the same results:
var query1 = collection_1
.SelectMany(c_1 => c_1.collection_2)
.SelectMany(c_2 => c_2.collection_3)
.Select(c_3 => c_3);
var query2 = collection_1
.SelectMany(c_1 => c_1.collection_2
.SelectMany(c_2 => c_2.collection_3.Select(c_3 => c_3)));
b) I assume the two queries can't always be used interchangeably? For example, if we wanted the output elements to also contain values of c_1 and c_2, then we only achieve this with query2, but not with query1:
var query2 = collection_1
.SelectMany(c_1 => c_1.collection_2
.SelectMany(c_2 => c_2.collection_3.Select(c_3 => new { c_1, c_2, c_3 } )));
?
Thank you
The snippets you've given seem to be invalid. c_3 isn't defined in the scope of the Select statement, so unless I've misunderstood something, this won't compile.
It seems as though you're trying to select the elements of collection_3, but this is done implicitly by SelectMany, and so the final Select statements in both cases are redundant. Take them out, and the two queries are equivalent.
All you need is this:
var query = collection_1
.SelectMany(c_1 => c_1.collection_2)
.SelectMany(c_2 => c_2.collection_3);
Update: x => x is the identity mapping, so Select(x => x) is always redundant, regardless of the context. It just means "for every element in the sequence, select the element".
The second snippet is of course different, and the SelectMany and Select statements indeed need to be nested in order to select all three elements, c_1, c_2, and c_3.
Like Gert says, though, you're probably better off using query comprehension syntax. It's much more succinct and makes it easier to mentally parse the workings of a query.
a. The queries are equal because in both cases you end up with all c_3's in c_1 through c_2.
b. You can't get to c_1 and c_2 with these queries as you suggest. If you want that, you need the overload of SelectMany that also takes a result selector. This "fluent" syntax is quite clumsy though. This is typically a case where query comprehension syntax, which does the same, is much better:
from c_1 in collection_1
from c_2 in c_1.collection_2
from c_3 in c_2.collection_3
select new { c_1.x, c_2.y, c_3.z }
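A quick way to convince yourself of point (a) is to run both shapes over the same data and compare the results. The sketch below uses made-up jagged arrays of ints in place of collection_1, collection_2 and collection_3; SelectManyShapes is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectManyShapes
{
    // Made-up three-level data: arrays of arrays of int arrays.
    public static int[][][] Sample() => new[]
    {
        new[] { new[] { 1, 2 }, new[] { 3 } },
        new[] { new[] { 4 }, new int[0] },
    };

    // Chained form: flatten one level at a time.
    public static List<int> Flat(int[][][] source) =>
        source.SelectMany(l1 => l1).SelectMany(l2 => l2).ToList();

    // Nested form: the inner SelectMany sits inside the outer lambda,
    // so the outer variable l1 is still in scope there.
    public static List<int> Nested(int[][][] source) =>
        source.SelectMany(l1 => l1.SelectMany(l2 => l2)).ToList();
}
```

Both methods yield the same elements in the same order. The difference only shows once the innermost projection needs the outer variables, which the nested form (or query syntax) keeps in scope and the chained form does not.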

Replacing nested foreach with LINQ; modify and update a property deep within

Consider the requirement to change a data member on one or more properties of an object that is 5 or 6 levels deep.
There are sub-collections that need to be iterated through to get to the property that needs inspection & modification.
Here we're calling a method that cleans the street address of an Employee. Since we're changing data within the loops, the current implementation needs a for loop to prevent the exception:
Cannot assign to "someVariable" because it is a 'foreach iteration variable'
Here's the current algorithm (obfuscated) with nested foreach and a for.
foreach (var emp in company.internalData.Emps)
{
foreach (var addr in emp.privateData.Addresses)
{
int numberAddresses = addr.Items.Length;
for (int i = 0; i < numberAddresses; i++)
{
//transform this street address via a static method
if (addr.Items[i].Type =="StreetAddress")
addr.Items[i].Text = CleanStreetAddressLine(addr.Items[i].Text);
}
}
}
Question:
Can this algorithm be reimplemented using LINQ? The requirement is for the original collection to have its data changed by that static method call.
Update: I was thinking/leaning in the direction of a jQuery/selector type solution. I didn't specifically word this question in that way. I realize that I was over-reaching on that idea (no side-effects). Thanks to everyone! If there is such a way to perform a jQuery-like selector, please let's see it!
foreach(var item in company.internalData.Emps
.SelectMany(emp => emp.privateData.Addresses)
.SelectMany(addr => addr.Items)
.Where(addr => addr.Type == "StreetAddress"))
item.Text = CleanStreetAddressLine(item.Text);
var dirtyAddresses = company.internalData.Emps.SelectMany( x => x.privateData.Addresses )
.SelectMany(y => y.Items)
.Where( z => z.Type == "StreetAddress");
foreach(var addr in dirtyAddresses)
addr.Text = CleanStreetAddressLine(addr.Text);
LINQ is not intended to modify sets of objects. You wouldn't expect a SELECT sql statement to modify the values of the rows being selected, would you? It helps to remember what LINQ stands for - Language INtegrated Query. Modifying objects within a linq query is, IMHO, an anti-pattern.
Stan R.'s answer would be a better solution using a foreach loop, I think.
I don't like mixing "query comprehension" syntax and dotted-method-call syntax in the same statement.
I do like the idea of separating the query from the action. These are semantically distinct, so separating them in code often makes sense.
var addrItemQuery = from emp in company.internalData.Emps
from addr in emp.privateData.Addresses
from addrItem in addr.Items
where addrItem.Type == "StreetAddress"
select addrItem;
foreach (var addrItem in addrItemQuery)
{
addrItem.Text = CleanStreetAddressLine(addrItem.Text);
}
A few style notes about your code; these are personal, so you may not agree:
In general, I avoid abbreviations (Emps, emp, addr)
Inconsistent names are more confusing (addr vs. Addresses): pick one and stick with it
The word "number" is ambiguous. It can either be an identity ("Prisoner number 378 please step forward.") or a count ("the number of sheep in that field is 12."). Since we use both concepts in code a lot, it is valuable to get this clear. I often use "index" for the first one and "count" for the second.
Having the type field be a string is a code smell. If you can make it an enum your code will probably be better off.
Dirty one-liner.
company.internalData.Emps.SelectMany(x => x.privateData.Addresses)
.SelectMany(x => x.Items)
.Where(x => x.Type == "StreetAddress")
.Select(x => { x.Text = CleanStreetAddressLine(x.Text); return x; })
.ToList(); // Select is lazy, so force enumeration or the side effect never runs
LINQ does not provide the option of having side effects. However, you could do:
company.internalData.Emps.SelectMany(emp => emp.privateData.Addresses).SelectMany(addr => addr.Items).ToList().ForEach(/*either make an anonymous method or refactor your side effect code out to a method on its own*/);
You can do this, but you don't really want to. Several bloggers have talked about the functional nature of Linq, and if you look at all the MS supplied Linq methods, you will find that they don't produce side effects. They produce return values, but they don't change anything else. Search for the arguments over a Linq ForEach method, and you'll get a good explanation of this concept.
With that in mind, what you probably want is something like this:
var addressItems = company.internalData.Emps.SelectMany(
emp => emp.privateData.Addresses.SelectMany(
addr => addr.Items
)
);
foreach (var item in addressItems)
{
...
}
However, if you do want to do exactly what you asked, then this is the direction you'll need to go:
var addressItems = company.internalData.Emps.SelectMany(
emp => emp.privateData.Addresses.SelectMany(
addr => addr.Items.Select(item =>
{
// Do the stuff
return item;
})
)
);
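One caveat with putting side effects inside Select is deferred execution: the lambda does not run until the resulting sequence is enumerated. The sketch below shows this with a hypothetical Item class and Trim standing in for CleanStreetAddressLine.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an address item.
public class Item
{
    public string Text { get; set; } = "";
}

public static class DeferredDemo
{
    public static (string Before, string After) Run()
    {
        var items = new List<Item> { new Item { Text = " 1 Main St " } };

        // Nothing runs yet: Select only builds a lazy sequence.
        var query = items.Select(item =>
        {
            item.Text = item.Text.Trim();   // stand-in for the real clean-up call
            return item;
        });
        string before = items[0].Text;      // still untrimmed at this point

        // Enumerating the query (here via ToList) triggers the side effect.
        query.ToList();
        string after = items[0].Text;

        return (before, after);
    }
}
```

This is one more reason to prefer a plain foreach over the query result: the loop makes it obvious when the modification happens.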
To update a LINQ result using a foreach loop, I first create a local list variable and then perform the update in the foreach loop. The values are updated this way. Read more here:
How to update value of LINQ results using FOREACH loop
I cloned the list, and this worked on .NET 4.7.2:
List<TrendWords> ListCopy = new List<TrendWords>(sorted);
foreach (var words in stopWords)
{
foreach (var item in ListCopy.Where(w => w.word == words))
{
item.disabled = true;
}
}
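Note that new List<T>(source) makes a shallow copy: when the element type is a class, the copy holds references to the same objects, which is why changes made through the copy show up in the original list as well. A minimal sketch, assuming TrendWords is a class (the TrendWord type below is a hypothetical stand-in):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the TrendWords class in the answer.
public class TrendWord
{
    public string Word { get; set; } = "";
    public bool Disabled { get; set; }
}

public static class ShallowCopyDemo
{
    // Returns true if a change made through the copied list
    // is visible through the original list.
    public static bool OriginalSeesChange()
    {
        var sorted = new List<TrendWord> { new TrendWord { Word = "the" } };

        // The copy constructor duplicates the list, not the objects in it.
        var listCopy = new List<TrendWord>(sorted);

        listCopy[0].Disabled = true;
        return sorted[0].Disabled;
    }
}
```

The copy still has a use: it lets you add or remove entries in the original list while looping over the copy without invalidating the enumerator.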
