SQL equivalent of Count extension method for LINQ isn't obvious - c#

I'm using LINQ to Entities (Entity Framework, EF) to get the count of records in my table with the code below:
using (var db = new StackOverflowEntities())
{
    var empLevelCount = db.employeeLevels.Count();
}
I captured the query EF sent to the database using SQL Server Profiler and got the following:
SELECT
    [GroupBy1].[A1] AS [C1]
FROM ( SELECT
        COUNT(1) AS [A1]
    FROM [dbo].[employeeLevels] AS [Extent1]
) AS [GroupBy1]
This query stays exactly the same for the LongCount extension method, except that the COUNT SQL function is replaced by COUNT_BIG in the query EF creates. The query created by the LINQ to EF provider looks very odd to me. Why doesn't it simply do something like the following to return the scalar count value?
SELECT
COUNT(1) AS [A1]
FROM [dbo].[employeeLevels] AS [Extent1]
It would be really helpful if someone could explain what additional logic EF is handling internally that leads the LINQ to EF provider to create such a query. It seems EF is trying to cover additional use cases through some common algorithm, which results in a generic query like the one above.

Testing both queries (suitably changing the table) in a DB of mine reveals that they both generate exactly the same query plan. So, the structure shouldn't concern you overly much. In SQL, you tell the system what you want, and it works out how best to do it, and here the optimizer is able to generate the optimal plan given either sample.
As to why LINQ generates code like this, I'd suspect it's just a generalized pattern in its code generator that lets it generate similar code for any aggregation and subsequent transformations, not just for unfiltered counts.
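One way to picture such a generalized pattern is to write the count as "group everything, then project the aggregate". The sketch below does this in memory with LINQ to Objects. It assumes, based only on the GroupBy1 alias in the generated SQL, that EF models a scalar Count() as an aggregate over a single group. The array is a hypothetical stand-in for the table, and none of this is taken from EF's actual code generator.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-in for the employeeLevels table.
var employeeLevels = new[] { "Junior", "Mid", "Senior" };

// The direct form, as written in the question.
int direct = employeeLevels.Count();

// The "general aggregation" form: put every row into one group under a
// constant key, then project the aggregate out of that single group.
int viaGrouping = employeeLevels
    .GroupBy(_ => 1)
    .Select(g => g.Count())
    .Single();

Console.WriteLine($"{direct} {viaGrouping}");
```

Both values are the same for this data. The second form is more verbose, but the same shape would also carry a filter, a real grouping key, or a further projection, which may be why a code generator would prefer it.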

Related

What's the best solution to Entity Framework Core's lack of moderate LINQ query support?

So basically I have a table containing a set of data. This data is joined onto an organisation table, which multiple users can be a part of. I'm trying to get all files in the table where the user executing the query has permission to access the organisation. To do this I'm using a where clause that checks the user's permissions from the application against the organisations linked to the files. I then select the top 100 results and count the records returned. (I want to see if the user has access to 100+ files across all the organisations.)
The problem is when I use the following LINQ query:
(from f in File
join o in Organisation on f.OrganisationId equals o.Id
where permissions.Contains(o.Id.ToString())
select f).Take(100).Count();
The Take and the Count aren't executed on the SQL server; they run in memory when I use Contains on a list, which should convert to an IN (VALUES) query in SQL. I have 70,000+ File records, so this is very slow and times out on a web server. This is expected, as Entity Framework Core is in its early stages and does not support moderate or advanced LINQ queries yet.
My question is: is there a better alternative to raw SQL queries that still lets me filter by an array of items while using Entity Framework Core v1.1? Thanks.
Edit: I tried updating to the latest version, but this did not solve my issue; I still got the following output.
The LINQ expression '{permissions => Contains([o].Id.ToString())}' could not be translated and will be evaluated locally.
The LINQ expression 'Contains([o].Id.ToString())' could not be translated and will be evaluated locally.
The LINQ expression 'Take(__p_1)' could not be translated and will be evaluated locally.
The LINQ expression 'Count()' could not be translated and will be evaluated locally.
The warnings are misleading - the problem is the ToString() call which causes client evaluation of the query.
The following should produce the intended SQL query:
var idList = permissions.Select(int.Parse);
var result = (
from f in File
join o in Organisation on f.OrganisationId equals o.Id
where idList.Contains(o.Id)
select f).Take(100).Count();
which in my environment (EF Core v1.1.1) produces the following SQL with no warnings (as expected):
SELECT COUNT(*)
FROM (
    SELECT TOP(@__p_1) [f].[Id], [f].[Name], [f].[OrganisationId]
    FROM [Files] AS [f]
    INNER JOIN [Organisations] AS [o] ON [f].[OrganisationId] = [o].[Id]
    WHERE [o].[Id] IN (1, 3, 4)
) AS [t]
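The shape of that fix can be tried without a database. The following is a minimal in-memory sketch using LINQ to Objects. The data, the anonymous types, and the variable names are hypothetical stand-ins for the Files and Organisations tables. It shows the query structure only and says nothing about how EF Core translates it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: permissions arrive from the application as strings.
var permissions = new List<string> { "1", "3", "4" };

// In-memory stand-ins for the Organisations and Files tables.
var organisations = new[] { 1, 2, 3, 4 }.Select(id => new { Id = id });
var files = Enumerable.Range(1, 500)
    .Select(i => new { Id = i, OrganisationId = i % 4 + 1 });

// Convert once, outside the query, so the predicate compares int to int
// and no ToString() call is needed inside the where clause.
List<int> idList = permissions.Select(p => int.Parse(p)).ToList();

int count = (from f in files
             join o in organisations on f.OrganisationId equals o.Id
             where idList.Contains(o.Id)
             select f).Take(100).Count();

Console.WriteLine(count);
```

Materialising idList with ToList() is a choice made for this sketch; the answer's version passes the unmaterialised Select result and reports that it worked in EF Core v1.1.1.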

C# Linq to MySQL query with join makes bad SQL?

I'm using C# to run LINQ queries against a MySQL database. I think the SQL generated might be wrong for a simple table join that I'm doing.
My NuGet packages are MySql.Data v6.9.9, MySql.Data.Entities v6.8.3, and MySql.Data.Entity v6.9.9.
The LINQ is this:
query = from peopleResult in query
join t in technologyQuery on peopleResult.Company_Id equals t.Company_Id
select peopleResult;
The SQL generated looks like this:
SELECT ...
FROM `people` AS `Extent1`
INNER JOIN `technologies` AS `Extent2` ON (`Extent1`.`Company_Id` = `Extent2`.`Company_Id`) OR ((`Extent1`.`Company_Id` IS NULL) AND (`Extent2`.`Company_Id` IS NULL))
WHERE ...
Is this part of the join right?
(`Extent1`.`Company_Id` IS NULL) AND (`Extent2`.`Company_Id` IS NULL)
The query is incredibly long running when that is included. I pulled that out of the SQL with a regex, and it runs much faster and seems to give the correct results.
Is my LINQ incorrect or missing something? Is there likely a bug in the MySQL LINQ provider?
Thank you for your time thinking about this.
It's not a MySQL connector bug, but an EF feature that tries to emulate the C# equality rules for nullable types.
First, make sure to set DbContext.Configuration.UseDatabaseNullSemantics to true, for instance inside your DbContext derived class constructor:
Configuration.UseDatabaseNullSemantics = true;
In theory this should solve the issue. However, it was implemented for comparison operators but not for joins. So you have to use the alternative join syntax with a where clause:
query =
from peopleResult in query
from t in technologyQuery
where peopleResult.Company_Id == t.Company_Id
select peopleResult;
which will be translated to the desired SQL JOIN without IS NULL part.
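The C# rule that EF emulates by default can be seen with plain nullable values. The sketch below runs in memory with LINQ to Objects, and the arrays are hypothetical Company_Id values. It illustrates the C# side only: two nulls compare equal in C#, whereas a plain SQL equality comparison does not match them, which is what the extra IS NULL branch compensates for.

```csharp
using System;
using System.Linq;

int? left = null;
int? right = null;

// C# lifted equality: two null values compare equal.
bool csharpEqual = left == right;

// Hypothetical Company_Id values for people and technologies.
int?[] peopleCompanyIds = { 1, null, 2 };
int?[] technologyCompanyIds = { null, 2 };

// The from/from/where form of the join, evaluated in memory with C# rules:
// the null/null pair counts as a match alongside the 2/2 pair.
var matches = (from p in peopleCompanyIds
               from t in technologyCompanyIds
               where p == t
               select p).ToList();

Console.WriteLine($"{csharpEqual} {matches.Count}");
```

With database null semantics, the null/null pair would not be expected to match, so the two settings can return different rows whenever both Company_Id columns contain NULLs.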

Can I Insert the Results of a Select Statement Into Another Table Without a Roundtrip?

I have a web application that is written in MVC.Net using C# and LINQ-to-SQL (SQL Server 2008 R2).
I'd like to query the database for some values, and also insert those values into another table for later use. Obviously, I could do a normal select, then take those results and do a normal insert, but that will result in my application sending the values back to the SQL server, which is a waste as the server is where the values came from.
Is there any way I can get the select results in my application and insert them into another table without the information making a roundtrip from the SQL server to my application and back again?
It would be cool if this was in one query, but that's less important than avoiding the roundtrip.
Assume whatever basic schema you like, I'll be extrapolating your simple example to a much more complex query.
Can I Insert the Results of a Select Statement Into Another Table Without a Roundtrip?
From a "single-query" and/or "avoid the round-trip" perspective: Yes.
From a "doing that purely in Linq to SQL" perspective: Well...mostly ;-).
The three pieces required are:
1) The INSERT...SELECT construct: By using this we get half of the goal, in that we have selected data and inserted it. This is the only way to keep the data entirely at the database server and avoid the round-trip. Unfortunately, this construct is not supported by Linq-to-SQL (or Entity Framework): Insert/Select with Linq-To-SQL
2) The T-SQL OUTPUT clause: This allows for doing what is essentially the tee command in Unix shell scripting: save and display the incoming rows at the same time. The OUTPUT clause just takes the set of inserted rows and sends it back to the caller, providing the other half of the goal. Unfortunately, this is also not supported by Linq-to-SQL (or Entity Framework). This type of operation can also be achieved across multiple queries without OUTPUT, but nothing is really gained, since you then need either a) a temp table to hold the initial results, which is then used both to insert into the table and to select back to the caller, or b) some way of knowing which rows in the table were just inserted, so that they can be selected back to the caller.
3) The DataContext.ExecuteQuery<TResult> (String, Object[]) method: This is needed because the two required T-SQL pieces are not supported directly in Linq-to-SQL. Even if the clunky approach to avoiding the OUTPUT clause is used (assuming it could be done in pure Linq/Lambda expressions), there is still no way around the INSERT...SELECT construct that would not be a round-trip.
Hence, multiple queries that are all pure Linq/Lambda expressions equates to a round-trip.
The only way to truly avoid the round-trip should be something like:
var _MyStuff = db.ExecuteQuery<Stuffs>(@"
    INSERT INTO dbo.Table1 (Col1, Col2, Col3)
    OUTPUT INSERTED.*
    SELECT Col1, Col2, Col3
    FROM dbo.Table2 t2
    WHERE t2.Col4 = {0};",
    _SomeID);
And just in case it helps anyone (since I already spent the time looking it up :), the equivalent command for Entity Framework is: Database.SqlQuery<TElement> (String, Object[])
Try this query, adapted to your requirements:
INSERT INTO IndentProcessDetails (DemandId, DemandMasterId, DemandQty)
SELECT DemandId, DemandMasterId, DemandQty
FROM DemandDetails

Why does Entity Framework generate slow overengineered SQL?

I have this code:
DbSet<TableName> table = ...// stored reference
var items = from n in table
            where n.Name.ToUpper().Contains(searchString.ToUpper().Trim())
            select n;
WriteToLog( items.ToString() );
The last line outputs the generated SQL. Here's what I get:
SELECT
    [Extent1].[Name] AS [Name],
    // all the other columns follow
FROM (SELECT
        [TableName].[Name] AS [Name],
        // all the other columns follow
    FROM [dbo].[TableName] AS [TableName]) AS [Extent1]
WHERE ( CAST(CHARINDEX(LTRIM(RTRIM(UPPER(@p__linq__0))), UPPER([Extent1].[Name])) AS int)) > 0
You see, there's a SELECT-from-SELECT although it's completely redundant: one SELECT would be enough. The code using EF runs for longer than half a minute and times out on that query, although the table is rather small.
Why is this overengineered SQL query generated and how do I make EF generate a better query?
It generates the resulting SQL by transforming an expression tree. It appears overengineered (for example, using a subquery) as a side-effect of the way it's done that transformation.
The details of the transformation are proprietary and complex, and the results are not supposed to be human-readable.
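To make "expression tree" concrete, the sketch below builds the tree for the same predicate as in the question and inspects its nodes. It uses only System.Linq.Expressions; the variable names are hypothetical, and it shows the provider's input, not how EF walks the tree or produces SQL.

```csharp
using System;
using System.Linq.Expressions;

string searchString = " abc ";

// Assigning the lambda to Expression<...> makes the compiler build a data
// structure describing the code instead of a compiled delegate.
Expression<Func<string, bool>> predicate =
    name => name.ToUpper().Contains(searchString.ToUpper().Trim());

// The outermost node is the Contains call...
var containsCall = (MethodCallExpression)predicate.Body;
// ...its target is the ToUpper call on the lambda parameter...
var toUpperCall = (MethodCallExpression)containsCall.Object;
// ...and its argument is the Trim call on the captured search string.
var trimCall = (MethodCallExpression)containsCall.Arguments[0];

Console.WriteLine(
    $"{containsCall.Method.Name} <- {toUpperCall.Method.Name}, {trimCall.Method.Name}");
```

Each node maps naturally onto a piece of the generated WHERE clause (CHARINDEX, UPPER, LTRIM/RTRIM), which suggests why the SQL mirrors the structure of the C# rather than what a person would write by hand.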
The question is not entirely clear - and you are trying to solve a problem which I believe may not be a problem. Try comparing the generated query and your own - I would guess the query optimiser will make short work of such an easy optimisation.
My guess (and that's probably the best kind of answer you can get here unless a LINQ to Entities MS dev comes along) is that they're doing exactly that: generating the most effective query, but leaving the head-hurtingly-difficult job of optimising the query to the bit they've already put hundreds or thousands of man-days into: the query optimiser in SQL Server.
It does an extra SELECT, but SELECTs have no cost associated. You can look at the estimated query plan, and it would show 0% cost for it. EF is compatible with various RDBMS systems such as Oracle and SQL Server, and it might be doing this to ensure maximum compatibility.
However I do agree that Entity Framework generates UGLY sql. The example that you gave was a very simple Linq query and you'll see more of that ugliness when your queries start becoming complex.
1) While this may or may not answer your question, I would suggest using a micro ORM like PetaPoco:
https://github.com/toptensoftware/PetaPoco
or Dapper.Net
https://github.com/SamSaffron/dapper-dot-net
I have been using it in one of my projects and I am completely satisfied with the raw speed that you get with plain ADO.NET.
2) My second suggestion would be to always use stored procedures, at least for SELECT statements. For inserts, updates and deletes you should probably use EF and take advantage of its change tracking mechanism, which saves you from writing tedious queries; but at least for SELECT statements you should try to use plain SQL, which gives you more freedom over what SQL is generated.

Combining LINQ queries with entity framework C#

I have a linq query which selects several fields from my Customer table.
Applied to this method are multiple filters, using Func<IQueryable<T>, IQueryable<T>> with .Invoke.
The original query is essentially select * from customer.
The filter method is essentially select top 10
The output SQL is select top 10 * from (select * from customer)
My customer table has over 1,000,000 rows, which causes this query to take about 7 seconds to execute in SSMS. If I alter the output SQL to select top 10 * from (select top 10 * from customer) and run it in SSMS, the query is instant (as you'd expect).
I am wondering if anyone knows what might cause LINQ to not combine these in a nice way, and if there is a best practice/workaround I can implement.
I should note that my actual code isn't select * it is selecting a few fields, but there is nothing more complex.
I am using SQL Server 2008 and MVC 3 with Entity Framework (not sure which version).
Edit: I should add, it's IQueryable all the way, nothing is evaluated until the end, and as a result the long execution is confined to that single line.
I don't know why it's not being optimised.
If the filter method really is equivalent to SELECT TOP 10 then you should be able to do it like this:
return query.Take(10);
which would resolve to select top 10 * from customer rather than the more convoluted thing you ended up with.
If this won't work then I'm afraid I'll need a little more detail.
EDIT: To clarify, if you do this in LINQ:
DataItems.Take(10).Take(10)
you would get this SQL:
SELECT TOP (10) [t1].[col1], [t1].[col2]
FROM (
    SELECT TOP (10) [t0].[col1], [t0].[col2]
    FROM [DataItem] AS [t0]
) AS [t1]
So if you can somehow use a Take(n) you will be okay.
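For the Func<IQueryable<T>, IQueryable<T>> setup described in the question, applying each filter to the query itself keeps everything as one composed IQueryable. The following is a minimal sketch under assumptions: the integer sequence is a hypothetical stand-in for the Customer table, the two filters are invented for illustration, and AsQueryable() over an in-memory sequence shows the composition only, not the SQL a real provider would emit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Customer table, kept as IQueryable throughout.
IQueryable<int> customers = Enumerable.Range(1, 1_000_000).AsQueryable();

// Filters in the shape the question describes (invented for illustration).
var filters = new List<Func<IQueryable<int>, IQueryable<int>>>
{
    q => q.Where(id => id % 2 == 0),
    q => q.Take(10),
};

// Apply each filter to the query in turn; the result is still a single
// composed query and nothing has been evaluated yet.
IQueryable<int> query = filters.Aggregate(customers, (q, filter) => filter(q));

// ToList() evaluates the composed query once.
List<int> firstTen = query.ToList();

Console.WriteLine(string.Join(", ", firstTen));
```

Calling the delegate directly, as in filter(q), is equivalent to filter.Invoke(q). What matters for composition is that every step receives and returns an IQueryable rather than an already evaluated list.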
