I'm curious about how the compiler handles the following expression:
var collapsed = elements.GroupBy(elm => elm.OrderIdentifier).Select(group => new ModelsBase.Laser.Element()
{
CuttingDurationInSeconds = group.Sum(itm => itm.CuttingDurationInSeconds),
FladderDurationInSeconds = group.Sum(itm => itm.FladderDurationInSeconds),
DeliveryDate = group.Min(itm => itm.DeliveryDate),
EfterFladderOpstilTid = group.First().EfterFladderOpstilTid,
EfterRadanOpstilTid = group.First().EfterRadanOpstilTid,
});
As you can see, I'm using group.Sum twice. Does anyone know whether the "group" list will be iterated twice to get both sums, or whether it will be optimized so that there is only one complete iteration of the list?
LINQ is usually not the best way to reach high performance. What you get is programming productivity: a result without many lines of code.
The possibilities for optimization are limited. For queries against SQL, there is one rule of thumb: one query is better than two queries.
1) There is only one round trip to SQL Server.
2) SQL Server is built to optimize those queries, and optimization improves when the server knows what you want to do in the next step. Optimization is done per query, not across multiple queries.
In the case of LINQ to Objects, there is no gain in building huge queries.
As your example shows, it will probably cause multiple iterations. You keep your code simpler and easier to read, but you give up control and therefore performance.
The compiler certainly won't optimize any of that.
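If this is using LINQ to Objects, and therefore delegates, each group will be accessed 5 times, once for each of the 5 properties: a full pass for each of the two Sum calls and for Min, plus two First() calls that only read the first element.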
If this is using LINQ to SQL, Entity Framework or something similar, and therefore expression trees, then it's basically up to the query provider to optimize this appropriately.
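For the LINQ to Objects case, one way to get a single pass per group is to accumulate the values in a plain loop inside the projection. The following is a minimal sketch, not the asker's actual model: the tuple fields (Cutting, Fladder, DeliveryDate) are simplified stand-ins for the properties in the question, and it assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's elements, reduced to a few fields.
var elements = new List<(int OrderIdentifier, double Cutting, double Fladder, DateTime DeliveryDate)>
{
    (1, 10.0, 1.0, new DateTime(2024, 1, 5)),
    (1, 20.0, 2.0, new DateTime(2024, 1, 3)),
    (2, 5.0, 4.0, new DateTime(2024, 2, 1)),
};

var collapsed = elements
    .GroupBy(e => e.OrderIdentifier)
    .Select(g =>
    {
        double cutting = 0, fladder = 0;
        var earliest = DateTime.MaxValue;

        // Single pass over the group: both sums and the minimum are updated together.
        foreach (var item in g)
        {
            cutting += item.Cutting;
            fladder += item.Fladder;
            if (item.DeliveryDate < earliest) earliest = item.DeliveryDate;
        }

        return (Order: g.Key, Cutting: cutting, Fladder: fladder, DeliveryDate: earliest);
    })
    .ToList();

foreach (var row in collapsed)
    Console.WriteLine($"{row.Order}: {row.Cutting} / {row.Fladder}");
```

This trades the readability of separate Sum and Min calls for explicit control over the iteration, so it is probably only worth it when the groups are large enough for the extra passes to matter.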
You can optimise your query by adding the two fields to the grouping key (this assumes both fields have the same value for every element of an order):
var collapsed = elements.GroupBy(elm => new{
OrderIdentifier=elm.OrderIdentifier,
EfterFladderOpstilTid=elm.EfterFladderOpstilTid,
EfterRadanOpstilTid=elm.EfterRadanOpstilTid
})
.Select(group => new ModelsBase.Laser.Element()
{
CuttingDurationInSeconds = group.Sum(itm => itm.CuttingDurationInSeconds),
FladderDurationInSeconds = group.Sum(itm => itm.FladderDurationInSeconds),
DeliveryDate = group.Min(itm => itm.DeliveryDate),
EfterFladderOpstilTid = group.Key.EfterFladderOpstilTid,
EfterRadanOpstilTid = group.Key.EfterRadanOpstilTid,
});
Or by using a let clause:
var collapsed = from groupedElement in
(from element in elements
group element by element.OrderIdentifier into g
select g)
let First = groupedElement.First()
select new ModelsBase.Laser.Element()
{
CuttingDurationInSeconds = groupedElement.Sum(itm => itm.CuttingDurationInSeconds),
FladderDurationInSeconds = groupedElement.Sum(itm => itm.FladderDurationInSeconds),
DeliveryDate = groupedElement.Min(itm => itm.DeliveryDate),
EfterFladderOpstilTid = First.EfterFladderOpstilTid,
EfterRadanOpstilTid = First.EfterRadanOpstilTid
};
I seem to have written a very slow piece of code, which gets even slower when I have to deal with EF Core.
Basically, I have a list of items that store their attributes as a JSON string in the database, because I am storing many different items with different attributes.
I then have another table that contains the display order for each attribute, so when I send the items to the client I order the attributes based on that.
It is rather slow: 700 records take about 18-30 seconds (measured from where I start my timer, not the whole block of code).
var inventoryItemDtos = new List<InventoryItemDto>();
var inventoryItems = dbContext.InventoryItems.Where(x => x.InventoryCategoryId == categoryId);
var inventorySpecifications = dbContext.InventoryCategorySpecifications.Where(x => x.InventoryCategoryId == categoryId).Select(x => x.InventorySpecification);
Stopwatch a = new Stopwatch();
a.Start();
foreach (var item in inventoryItems)
{
var specs = JObject.Parse(item.Attributes);
var specDtos = new List<SpecDto>();
foreach (var inventorySpecification in inventorySpecifications.OrderBy(x => x.DisplayOrder))
{
if (specs.ContainsKey(inventorySpecification.JsonKey))
{
var value = specs.GetValue(inventorySpecification.JsonKey);
var newSpecDto = new SpecDto()
{
Key = inventorySpecification.JsonKey,
Value = value.ToString()
};
specDtos.Add(newSpecDto);
}
}
var dto = new InventoryItemDto()
{
// create dto
};
inventoryItemDtos.Add(dto);
}
Now it gets extremely slow when I add some more columns that I need information from through EF.
In the // create dto area I access some information from other tables:
var dto = new InventoryItemDto()
{
// access brand columns
// access company columns
// access branch columns
// access country columns
// access state columns
};
Accessing these columns in the loop makes it take 6 minutes to process 700 rows.
I don't understand why it is so slow; it's the only change I really made, and I made sure to eager load everything.
It almost makes me think eager loading is not working, but I don't know how to verify whether it is.
var inventoryItems = dbContext.InventoryItems.Include(x => x.Branch).ThenInclude(x => x.Company)
.Include(x => x.Branch).ThenInclude(x => x.Country)
.Include(x => x.Branch).ThenInclude(x => x.State)
.Include(x => x.Brand)
.Where(x => x.InventoryCategoryId == categoryId).ToList();
So I thought that, because of this, the speed would not be much different from the original 18-30 seconds.
I would like to speed up the original code too, but I am not sure how to get rid of the nested foreach loops that are probably slowing it down.
First, loops inside loops are expensive; you should refactor that out and make it a single loop. This should not be a problem because inventorySpecifications is declared outside the loop.
Second, the line
var inventorySpecifications = dbContext.InventoryCategorySpecifications.Where(x => x.InventoryCategoryId == categoryId).Select(x => x.InventorySpecification);
should end with ToList(), because its enumeration happens within the inner foreach, which means that the query runs once for each of the "inventoryItems".
That should save you a good amount of time.
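The cost of re-enumerating a deferred query can be seen with an in-memory stand-in. This is only a sketch: the counter and the integer sequence are hypothetical, and a Select over an array takes the place of the EF query, which would hit the database instead of running a lambda. It assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int evaluations = 0;

// Stand-in for a deferred query: the lambda runs again every time the query is enumerated.
IEnumerable<int> deferred = new[] { 3, 1, 2 }.Select(x => { evaluations++; return x; });

// Enumerating (and sorting) the deferred query inside an outer loop of 4 "items".
for (int i = 0; i < 4; i++)
    foreach (var spec in deferred.OrderBy(x => x)) { }
int lazyEvaluations = evaluations;

// Sorting and materializing once, then reusing the list inside the loop.
evaluations = 0;
List<int> materialized = deferred.OrderBy(x => x).ToList();
for (int i = 0; i < 4; i++)
    foreach (var spec in materialized) { }
int materializedEvaluations = evaluations;

Console.WriteLine($"{lazyEvaluations} vs {materializedEvaluations}"); // prints "12 vs 3"
```

With 700 outer items and a real database query in place of the lambda, the same pattern would mean hundreds of extra round trips rather than a handful of extra lambda calls.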
I'm no expert but this part of your second foreach raises a red flag: inventorySpecifications.OrderBy(x => x.DisplayOrder). Because this is getting called inside another foreach it's doing the .OrderBy call every time you iterate over inventoryItems.
Before your first foreach loop, try this: var orderedInventorySpecs = inventorySpecifications.OrderBy(x => x.DisplayOrder); and then use foreach (var inventorySpec in orderedInventorySpecs) and see if it makes a difference.
To better understand what EF is running behind the scenes, add some logging to expose the SQL being run; that can show you how and where your queries are going wrong, and in particular whether they are hitting the DB too often. As a very general rule you want to hit the DB as few times as possible and retrieve only the information you need, using .Select() to reduce what is being returned. The docs for the logging are: http://learn.microsoft.com/en-us/ef/core/miscellaneous/logging
I obviously cannot test this, and I am a little unsure where your specDtos go once you have them, but I assume they become part of the InventoryItemDto?
var itemDtos = new List<ItemDto>();
var inventoryItems = dbContext.InventoryItems.Where(x => x.InventoryCategoryId == categoryId).Select(x => new InventoryItemDto() {
Attributes = x.Attributes,
//.....
// access brand columns
// access company columns
// access branch columns
// access country columns
// access state columns
}).ToList();
var inventorySpecifications = dbContext.InventoryCategorySpecifications
.Where(x => x.InventoryCategoryId == categoryId)
.Select(x => x.InventorySpecification)
.OrderBy(x => x.DisplayOrder).ToList();
foreach (var item in inventoryItems)
{
var specs = JObject.Parse(item.Attributes);
// Assuming the specs become part of an inventory item?
item.specs = inventorySpecifications.Where(x => specs.ContainsKey(x.JsonKey)).Select(x => new SpecDto() { Key = x.JsonKey, Value = specs.GetValue(x.JsonKey).ToString() }).ToList();
}
The first call to the DB for inventoryItems should produce one SQL query that will pull all the information you need at once to construct your InventoryItemDto and thus only hits the DB once. Then it pulls the specs out and uses OrderBy() before materialising which means the OrderBy will be run as part of the SQL query rather than in memory. Both those results are materialised via .ToList() which will cause EF to pull the results into memory in one go.
Finally, the loop goes over your constructed inventoryItems, parses the JSON and then filters the specs based on that. I am unsure of where you were using the specDtos, so I made the assumption that they were part of the model. I would recommend checking the performance of the JSON work you are doing, as that could be contributing to your slowdown.
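The per-item JSON step can be tried in isolation. The sketch below swaps in System.Text.Json for the JObject API used above so that it runs without extra packages; the spec keys, the attribute string and the tuple shapes are made up for illustration, and the spec list is assumed to be ordered and materialized once before the loop. It assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical specs, already ordered by DisplayOrder and materialized once.
var orderedSpecs = new List<(string JsonKey, int DisplayOrder)>
{
    ("color", 1), ("weight", 2), ("size", 3),
};

// Hypothetical Attributes column value for one item.
string attributes = "{\"size\":\"L\",\"color\":\"red\"}";

// Parse the JSON once per item, then project only the keys that are present,
// keeping the display order of the spec list.
using var doc = JsonDocument.Parse(attributes);
var root = doc.RootElement;

var specDtos = orderedSpecs
    .Where(s => root.TryGetProperty(s.JsonKey, out _))
    .Select(s => (Key: s.JsonKey, Value: root.GetProperty(s.JsonKey).ToString()))
    .ToList();

foreach (var dto in specDtos)
    Console.WriteLine($"{dto.Key} = {dto.Value}");
```

Timing this step on its own for 700 items should show whether the JSON work or the database access dominates.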
A more integrated approach to using Json as part of your EF models can be seen at this answer: https://stackoverflow.com/a/51613611/621524 however you will still be unable to use those properties to offload execution to SQL as accessing properties that are defined within code will cause queries to fragment and run in several parts.
I am trying to use LINQ and lambda expressions to query tables in an Oracle database. When I use group by clauses, the time to fetch data grows considerably.
In the following code block, there is a group by expression that contains a conditional.
using (var entities = new Entities())
{
var result = entities.myTable.Where(a => a.COLUMNONE > 1)
.GroupBy(g => new { columnForGrouping = (g.COLUMNTWO > 50 ? "Group1" : "Group2") })
.Select(sel => new {
columnGroup = sel.Key.columnForGrouping,
count = sel.Count()
}).ToList();
}
I am wondering how efficient this type of group by expression is. Is there a better way to write it?
The following instruction might cause the performance issue:
CAST( "Extent1"."COLUMNTWO" AS number(10,0)))
A CAST in SQL can hurt performance in unexpected ways. I suggest using a different data type so that the cast is not needed.
LINQ-generated SQL queries are often not very efficient, especially with group by and joins. I would highly recommend writing an optimized SQL query and using that directly.
In one of my cases that reduced the time from 12 seconds to 3 seconds.
This is the gist of my query which I'm testing in LinqPad using Linq to Entity Framework.
In my mind the resultant SQL should begin with something like SELECT TableA.ID AS myID. Instead, the SELECT includes all fields from all of the tables. Needless to say this incurs a massive performance hit among other problems. How can I prevent this?
var AnswerList = this.Answers
.Where(x=>
..... various conditions on x and related entities...
)
.GroupBy(x => new {x.TableA,x.TableB,x.TableC})
.Select(g=>new {
myID = g.Key.TableA.ID,
});
AnswerList.Dump();
In practice I'm using a new type instead of an anonymous one but the results are the same either way.
Let me know if you need me to fill in more of the ...'s.
UPDATE
I've noticed I can prevent this problem by explicitly specifying the fields I want returned in the GroupBy method, e.g. new {x.TableA.ID ... }
But I still don't understand why it doesn't work just using the Select method (which DOES work when doing the equivalent in Linq to SQL).
Hi,
Could you please try the following?
var query = from SubCat in mySubCategory
where SubCat.CategoryID == 1
group 1 by SubCat.CategoryID into grouped
select new { Catg = grouped.Key,
Count = grouped.Count() };
Thank you,
Vishal Patel
I need to identify items from one list that are not present in another list. The two lists are of different entities (ToDo and WorkshopItem). I consider a workshop item to be in the todo list if the Name is matched in any of the todo list items.
The following does what I'm after, but I find it awkward and hard to understand each time I revisit it. I use NHibernate QueryOver syntax to get the two lists and then a LINQ statement to filter down to just the workshop items that meet the requirement (DateDue is in the next two weeks and the Name is not present in the list of ToDo items).
var allTodos = Session.QueryOver<ToDo>().List();
var twoWeeksTime = DateTime.Now.AddDays(14);
var workshopItemsDueSoon = Session.QueryOver<WorkshopItem>()
.Where(w => w.DateDue <= twoWeeksTime).List();
var matches = from wsi in workshopItemsDueSoon
where !(from todo in allTodos
select todo.TaskName)
.Contains(wsi.Name)
select wsi;
Ideally I'd like to have just one NHibernate query that returns a list of WorkshopItems that match my requirement.
I think I've managed to put together a QueryOver version of the answer put forward by @CSL, and I will mark that as the accepted answer as it pointed me in the direction of the following.
var twoWeeksTime = DateTime.Now.AddDays(14);
var subquery = NHibernate.Criterion.QueryOver.Of<ToDo>().Select(t => t.TaskName);
var matchingItems = Session.QueryOver<WorkshopItem>()
.Where(w => w.DateDue <= twoWeeksTime &&
w.IsWorkshopItemInProgress == true)
.WithSubquery.WhereProperty(x => x.Name).NotIn(subquery)
.Future<WorkshopItem>();
It returns the results I'm expecting and doesn't rely on magic strings. I'm hesitant because I don't fully understand WithSubquery (and whether inlining it would be a good thing). It seems to equate to
WHERE WorkshopItem.Name NOT IN (subquery)
I also don't understand the use of Future instead of List. If anyone could shed some light on those, that would help.
I am not 100% sure how to achieve what you need using LINQ, so to give you an option I am putting up an alternative solution using NHibernate Criteria (this will execute in one database hit):
// Create a query
ICriteria query = Session.CreateCriteria<WorkshopItem>("wsi");
// Restrict to items due within the next 14 days
query.Add(Restrictions.Le("DateDue", DateTime.Now.AddDays(14)));
// Return all TaskNames from the ToDos
DetachedCriteria allTodos = DetachedCriteria.For(typeof(ToDo)).SetProjection(Projections.Property("TaskName"));
// Filter workshop items for any that do not have a ToDo item
query.Add(Subqueries.PropertyNotIn("Name", allTodos));
// Return results
var matchingItems = query.Future<WorkshopItem>().ToList();
I'd recommend
var workshopItemsDueSoon = Session.QueryOver<WorkshopItem>()
.Where(w => w.DateDue <= twoWeeksTime);
var allTodos = Session.QueryOver<ToDo>();
Instead of
var allTodos = Session.QueryOver<ToDo>().List();
var workshopItemsDueSoon = Session.QueryOver<WorkshopItem>()
.Where(w => w.DateDue <= twoWeeksTime).List();
So that the collection isn't iterated until you need it to be.
I've found that it's helpful to use LINQ extension methods to make subqueries more readable and less awkward.
For example:
var matches = from wsi in workshopItemsDueSoon
where !allTodos.Select(it=>it.TaskName).Contains(wsi.Name)
select wsi;
Personally, since the query is fairly simple, I'd prefer to do it like so:
var matches = workshopItemsDueSoon.Where(wsi => !allTodos.Select(it => it.TaskName).Contains(wsi.Name));
The latter seems less verbose to me.
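When both lists are already in memory, the repeated Select(...).Contains(...) scan can be replaced with a set lookup. This is a minimal sketch with made-up data: the tuple shapes stand in for the ToDo and WorkshopItem entities, and it assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the two entity lists.
var allTodos = new List<(int Id, string TaskName)>
{
    (1, "Oil change"),
    (2, "Brake check"),
};
var workshopItemsDueSoon = new List<(string Name, DateTime DateDue)>
{
    ("Oil change", new DateTime(2024, 3, 1)),
    ("Tyre rotation", new DateTime(2024, 3, 4)),
};

// Build the set of task names once, so each workshop item needs a single hash lookup.
var todoNames = new HashSet<string>(allTodos.Select(t => t.TaskName));

var matches = workshopItemsDueSoon
    .Where(wsi => !todoNames.Contains(wsi.Name))
    .ToList();

foreach (var wsi in matches)
    Console.WriteLine(wsi.Name); // prints "Tyre rotation"
```

The result is the same as with the nested Contains; the difference is that the to-do names are projected once instead of once per workshop item.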
a) Would the following two queries produce the same results:
var query1 = collection_1
.SelectMany(c_1 => c_1.collection_2)
.SelectMany(c_2 => c_2.collection_3)
.Select(c_3 => c_3);
var query2 = collection_1
.SelectMany(c_1 => c_1.collection_2
.SelectMany(c_2 => c_2.collection_3.Select(c_3 => c_3)));
b) I assume the two queries can't always be used interchangeably? For example, if we wanted the output elements to also contain values of c_1 and c_2, then we only achieve this with query2, but not with query1:
var query2 = collection_1
.SelectMany(c_1 => c_1.collection_2
.SelectMany(c_2 => c_2.collection_3.Select(c_3 => new { c_1, c_2, c_3 } )));
?
Thank you
The snippets you've given seem to be invalid. c_3 isn't defined in the scope of the Select statement, so unless I've misunderstood something, this won't compile.
It seems as though you're trying to select the elements of collection_3, but this is done implicitly by SelectMany, and so the final Select statements in both cases are redundant. Take them out, and the two queries are equivalent.
All you need is this:
var query = collection_1
.SelectMany(c_1 => c_1.collection_2)
.SelectMany(c_2 => c_2.collection_3);
Update: x => x is the identity mapping, so Select(x => x) is always redundant, regardless of the context. It just means "for every element in the sequence, select the element".
The second snippet is of course different, and the SelectMany and Select statements indeed need to be nested in order to select all three elements, c_1, c_2, and c_3.
Like Gert says, though, you're probably better off using query comprehension syntax. It's much more succinct and makes it easier to mentally parse the workings of a query.
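The equivalence of the chained and nested forms, and the redundancy of Select(x => x), can be checked on a small example. The sketch below uses a jagged int array as a hypothetical stand-in for the three nested collections, so each level is flattened with an identity selector instead of a property access. It assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;

// Hypothetical three-level nesting: collection_1 is an int[][][].
var collection_1 = new[]
{
    new[] { new[] { 1, 2 }, new[] { 3 } },
    new[] { new[] { 4 } },
};

// Chained form: flatten one level at a time.
var chained = collection_1
    .SelectMany(c_1 => c_1)
    .SelectMany(c_2 => c_2)
    .ToList();

// Nested form: the inner SelectMany runs inside the outer selector.
var nested = collection_1
    .SelectMany(c_1 => c_1.SelectMany(c_2 => c_2))
    .ToList();

// Appending the identity projection changes nothing.
var withIdentity = chained.Select(c_3 => c_3).ToList();

Console.WriteLine(chained.SequenceEqual(nested));       // True
Console.WriteLine(chained.SequenceEqual(withIdentity)); // True
```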
a. The queries are equal because in both cases you end up with all c_3's in c_1 through c_2.
b. You can't get to c_1 and c_2 with these queries as you suggest. If you want that, you need the overload of SelectMany that takes a result selector. This "fluent" syntax is quite clumsy though. This is typically a case where query comprehension syntax, which does the same, is much better:
from c_1 in collection_1
from c_2 in c_1.collection_2
from c_3 in c_2.collection_3
select new { c_1.x, c_2.y, c_3.z }
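To compare the two styles side by side, here is a sketch of the query expression next to a fluent version built on the result-selector overload of SelectMany. The data and the tuple shapes are made up: x and y stand in for properties of c_1 and c_2, and because c_3 is a plain int here, it is selected directly as z. It assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Linq;

// Hypothetical nested data: each c_1 has an x and a collection_2,
// each c_2 has a y and a collection_3 of ints.
var collection_1 = new[]
{
    (x: "a", collection_2: new[] { (y: 1, collection_3: new[] { 10, 11 }) }),
    (x: "b", collection_2: new[] { (y: 2, collection_3: new[] { 20 }) }),
};

// Query comprehension syntax: all range variables stay in scope.
var viaQuery = (from c_1 in collection_1
                from c_2 in c_1.collection_2
                from c_3 in c_2.collection_3
                select (c_1.x, c_2.y, z: c_3)).ToList();

// Fluent syntax: each SelectMany carries the outer elements along
// through its result selector.
var viaFluent = collection_1
    .SelectMany(c_1 => c_1.collection_2, (c_1, c_2) => (c_1, c_2))
    .SelectMany(p => p.c_2.collection_3, (p, c_3) => (p.c_1.x, p.c_2.y, z: c_3))
    .ToList();

Console.WriteLine(viaQuery.SequenceEqual(viaFluent)); // True
```

The intermediate (c_1, c_2) pair in the fluent version plays the role that the compiler-generated transparent identifier plays in the query expression.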