I am trying to use LINQ and LAMBDA expressions for querying tables from ORACLE database. When using group by clauses, the time to fetch data is growing considerably.
In the following code block, there is a group by expression which contains if condition.
using (var entities = new Entities())
{
var result = entities.myTable.Where(a => a.COLUMNONE > 1)
.GroupBy(g => new { columnForGrouping = (g.COLUMNTWO > 50 ? "Group1" : "Group2") })
.Select(sel => new {
columnGroup = sel.Key.columnForGrouping,
count = sel.Count()
}).ToList();
}
I am wondering how efficient is this type of group by expressions? And, does it have a better one?
The following instruction might cause the performance issue:
CAST( "Extent1"."COLUMNTWO" AS number(10,0)))
Cast in sql might produce unexpected behavior performance-wise. I suggest you to use a different data-type.
Linq sql queries are not very efficient, especially with group by and joins. Would highly recommend optimize a query and use that directly.
In one of my cases that reduced time from 12sec to 3secs.
Related
Im curious about how the compiler handles the following expression:
var collapsed = elements.GroupBy(elm => elm.OrderIdentifier).Select(group => new ModelsBase.Laser.Element()
{
CuttingDurationInSeconds = group.Sum(itm => itm.CuttingDurationInSeconds),
FladderDurationInSeconds = group.Sum(itm => itm.FladderDurationInSeconds),
DeliveryDate = group.Min(itm => itm.DeliveryDate),
EfterFladderOpstilTid = group.First().EfterFladderOpstilTid,
EfterRadanOpstilTid = group.First().EfterRadanOpstilTid,
});
As you can see, I'm using group sum twice, so does anyone know if the "group" list will be iterated twice to get both sums, or will it be optimized so there is actually only 1 complete iteration of the list.
LINQ ist most often not the best way to reach high performance, what you get is productivity in programming, you get a result without much lines of code.
The possibilities to optimize is limited. In case of Querys to SQL, there is one rule of thumb: One Query is better than two queries.
1) there is only one round trip to the SQL_Server
2) SQL Server is made to optimize those queries, and optimization is getting better if the server knows, what you want to do in the next step. Optimization is done per query, not over multiple queries.
In case of Linq to Objects, there is absolutely no gain in building huge queries.
As your example shows, it will probably cause multiple iterations. You keep your code simpler and easier to read - but you give up control and therefore performance.
The compiler certainly won't optimize any of that.
If this is using LINQ to Objects, and therefore delegates, the delegate will iterate over each group 5 times, for the 5 properties.
If this is using LINQ to SQL, Entity Framework or something similar, and therefore expression trees, then it's basically up to the query provider to optimize this appropriately.
You can optimise your request by adding two field in the grouping key
var collapsed = elements.GroupBy(elm => new{
OrderIdentifier=elm.OrderIdentifier,
EfterFladderOpstilTid=elm.EfterFladderOpstilTid,
EfterRadanOpstilTid=elm.EfterRadanOpstilTid
})
.Select(group => new ModelsBase.Laser.Element()
{
CuttingDurationInSeconds = group.Sum(itm => itm.CuttingDurationInSeconds),
FladderDurationInSeconds = group.Sum(itm => itm.FladderDurationInSeconds),
DeliveryDate = group.Min(itm => itm.DeliveryDate),
EfterFladderOpstilTid = group.Key.EfterFladderOpstilTid,
EfterRadanOpstilTid = group.Key.EfterRadanOpstilTid,
});
Or by using LET statement
var collapsed = from groupedElement in
(from element in elements
group element by element.OrderIdentifier into g
select g)
let First = groupedElement.First()
select new ModelsBase.Laser.Element()
{
CuttingDurationInSeconds = groupedElement.Sum(itm => itm.CuttingDurationInSeconds),
FladderDurationInSeconds = groupedElement.Sum(itm => itm.FladderDurationInSeconds),
DeliveryDate = groupedElement.Min(itm => itm.DeliveryDate),
EfterFladderOpstilTid = First.EfterFladderOpstilTid,
EfterRadanOpstilTid = First.EfterRadanOpstilTid
};
we are Entity Framework in our project. Need to know the performance impact between .ANY() and Expressions to form Where clause.
In the below function i used two approach to get result:
APPROACH 1 - Form Lambda expression query using ANY()
From my observation using .Any() is not adding where clause when query is executed(checked in sql profiler) what EF does is gets all matched inner joined records store in-memory and then apply condition specified in .ANY()
APPROACH 2 - Form Expression Query Starts
With Expressions i'm explicitly forming where clause and executing.checked the same in SQL query Profiler i'm able to see where clause.
Note: To form expression where clause i'm doing additional loops and "CombinePredicates".
Now, my doubts are:
which approach will improve performance. Do i need to go with Lambda
with .ANY() or Expressions?
what is the right way to from where clause to improve performance?
If not the two approach suggest me the right way to do it
private bool GetClientNotifications(int clientId, IList<ClientNotification> clientNotifications)
{
IList<string> clientNotificationList = null;
var clientNotificationsExists = clientNotifications?.Select(x => new { x.Name, x.notificationId
}).ToList();
if (clientNotificationsExists?.Count > 0)
{
//**Approach 1 => Form Lamada Query starts**
notificationList = this._clientNotificationRepository?.FindBy(x => clientNotificationsExists.Any(x1 => x.notificationId == x1.notificationId && x.clientId ==
clientId)).Select(x => x.Name).ToList();
**//Form Lamada Query Ends**
//**Approach 2 =>Form Expression Query Starts**
var filterExpressions = new List<Expression<Func<DbModel.ClientNotification, bool>>>();
Expression<Func<DbModel.ClientNotification, bool>> predicate = null;
foreach (var clientNotification in clientNotificationsExists)
{
predicate = a => a.clientId == clientId && a.notificationId == clientNotification .notificationId;
filterExpressions.Add(predicate);
}
predicate = filterExpressions.CombinePredicates<DbModel.ClientNotification>(Expression.OrElse);
clientNotificationList = this._clientNotificationRepository?.FindBy(predicate).Select(x => x.Name).ToList();
//**Form Expression Query Ends**
}
return clientNotificationList;
}
If non of the approaches were good please suggest me the right way to do.
I noticed the clientId being used on both approaches is used on conditions linked to clientNotificationsExists but there is no actual relationship with its objects so this condition can be moved up one level. This might bring a minuscule benefit as the overall sql will be smaller without the duplicate clauses.
From the described behavior the first approach filter is not being translated to sql so it is being resolved at the client side however, if it could be translated the performance would be similar or identical.
If you want to generate an IN sql clause you can use an array and use the Contains method within your filter expression. That could perhaps improve a little bit but don't get your hopes too high.
Try the following:
private bool GetClientNotifications(int clientId, IList<ClientNotification> clientNotifications)
{
IList<string> clientNotificationList = null;
var clientNotificationsExists = clientNotifications?
.Select(x => new { x.Name, x.notificationId }).ToList();
if (clientNotificationsExists?.Count > 0)
{
var notificationIds = clientNotificationsExists.Select(x => x.notificationId).ToArray();
clientNotificationList = this._clientNotificationRepository?
.FindBy(x => x.clientId == clientId && notificationIds.Contains(x.notificationId));
}
return clientNotificationList;
}
The following code produces the error
The nested query is not supported. Operation1='Case' Operation2='Collect'
The question is what am i doing so terribly wrong? How can i fix that?
IQueryable<Map.League> v = from ul in userLeagues
select new Map.League
{
id = ul.LeagueID,
seasons =
inc.Seasons ? (from ss in ul.Standings
where ss.LeagueID == ul.LeagueID
select new Map.Season
{
seasonId = ss.Season.SeasonId,
seasonName = ss.Season.SeasonName
}).ToList() : null,
};
Update
what i cannot follow is why this is working as a charm
seasons = (from ss in ul.Standings
where ss.LeagueID == ul.LeagueID
select new Map.Season
{
seasonId = ss.Season.SeasonId,
seasonName = ss.Season.SeasonName
}).Distinct(),
what is wrong with the ternary operator?
The exception indicates that you're using Entity Framework. Always good to mention the LINQ implementation in questions.
When LINQ runs against a SQL backend, the SQL provider tries to translate the whole statement into one SQL statement. This greatly reduces the types of operations that are supported, because SQL is far more restricted than LINQ. Note that the variable inc.Seasons should also part of the SQL statement. Now the problem is that a SQL can't return two different result set depending on a variable that's part of itself: there is always one fixed SELECT clause.
So there is a Case method in the expression in a place where it's not supported (and I guess that hence the subsequent Collect isn't supported either).
You could solve this by making the inclusion part of the where clause:
from ul in userLeagues
select new Map.League
{
id = ul.LeagueID,
seasons = from ss in ul.Standings
where inc.Seasons // here
&& ss.LeagueID == ul.LeagueID
select new Map.Season
{
seasonId = ss.Season.SeasonId,
seasonName = ss.Season.SeasonName
})
}
I think you simply can't put an if-else inside a linq query, at least not in that spot.
See this post for detailed explanation and discussion.
Oh, and specially look at the "hack" by user AyCabron, I think that could resolve your situation neatly (depending on what you exactly want and why you choose to have null pointers).
The problem is not with Linq in general but with Linq to objects.
since you are using IQueryable you expect the query to run in the DB,
in that context, you cannot use many operators including the ternary operator.
if you tried the same code using Linq to objects (i.e Enumerable) it will succeed.
see example here: working example
Error The nested query is not supported. Operation1='Case' Operation2='Collect' is generated by EF when you use null within a ? statement.
EF can not convert statements like condition ? object/list : null.
In your specific example, remove .ToList() as it will also produce error when there is no rows return. EF will automatically give you null when there is no items to select.
This is the gist of my query which I'm testing in LinqPad using Linq to Entity Framework.
In my mind the resultant SQL should begin with something like SELECT TableA.ID AS myID. Instead, the SELECT includes all fields from all of the tables. Needless to say this incurs a massive performance hit among other problems. How can I prevent this?
var AnswerList = this.Answers
.Where(x=>
..... various conditions on x and related entities...
)
.GroupBy(x => new {x.TableA,x.TableB,x.TableC})
.Select(g=>new {
myID = g.Key.TableA.ID,
})
AnswerList.Dump();
In practice I'm using a new type instead of an anonymous one but the results are the same either way.
Let me know if you need me to fill in more of the ...'s.
UPDATE
I've noticed I can prevent this problem by explicitly specifying the fields I want returned in the GroupBy method, e.g. new {x.TableA.ID ... }
But I still don't understand why it doesn't work just using the Select method (which DOES work when doing the equivalent in Linq to SQL).
Hi,
Could you please try below....?
var query = from SubCat in mySubCategory
where SubCat.CategoryID == 1
group 1 by SubCat.CategoryID into grouped
select new { Catg = grouped.Key,
Count = grouped.Count() };
Thank you,
Vishal Patel
Why cant I do this:
usuariosEntities usersDB = new usuariosEntities();
foreach (DataGridViewRow user in dgvUsuarios.Rows)
{
var rowtoupdate =
usersDB.usuarios.Where(
u => u.codigo_usuario == Convert.ToInt32(user.Cells[0].Value)
).First();
rowtoupdate.password = user.Cells[3].Value.ToString();
}
usersDB.SaveChanges();
And have to do this:
usuariosEntities usersDB = new usuariosEntities();
foreach (DataGridViewRow user in dgvUsuarios.Rows)
{
int usercode = Convert.ToInt32(user.Cells[0].Value);
var rowtoupdate =
usersDB.usuarios.Where(u => u.codigo_usuario == usercode).First();
rowtoupdate.password = user.Cells[3].Value.ToString();
}
usersDB.SaveChanges();
I must admit it is a more readable code but why cant this be done?
The thing about it is that LINQ queries are transformed by the compiler into an expression tree. This expression tree is then converted into T-SQL and passed to the server. LINQ to SQL maps certain methods like String.Contains to the T-SQL equivalents.
In the first example, LINQ apparently does not map Convert.ToInt32 to anything and the exception is thrown. The reason it works in your second example is because the Convert.ToInt32 call is done outside of the query so it isn't part of the expression tree and doesn't need to be converted to T-SQL.
This MSDN page describes how LINQ to SQL translates various data types, operators, and methods into T-SQL. (Although the documentation does suggest Convert.ToInt32 is supported so I'm not sure what else might be going on here.)
Sorry just realized this is ADO.NET Entity Framework, not LINQ to SQL. This page lists the ADO.NET Entity Framework mappings. It is a bit more restrictive, mostly because it needs to work with multiple providers.
Because your LINQ to Ent. doesn't compile the query into MSIL with all the meta data, but simply translates the query into a few extantion methods and is bounded to lambda parsing abuilities of the languge. That means that
this code:
var results = from c in SomeCollection
where c.SomeProperty < someValue * 2
select new {c.SomeProperty, c.OtherProperty};
is the same as this:
var results =
SomeCollection
.Where(c => c.SomeProperty < someValue * 2)
.Select(c => new {c.SomeProperty, c.OtherProperty});