Combining LINQ queries with Entity Framework in C#

I have a LINQ query which selects several fields from my Customer table.
Multiple filters are applied to it, each a Func<IQueryable<T>, IQueryable<T>> called with .Invoke.
The original query is essentially select * from customer.
The filter method is essentially select top 10.
The output SQL is select top 10 * from (select * from customer).
My customer table has over 1,000,000 rows, so this query takes about 7 seconds to execute in SSMS. If I alter the output SQL to select top 10 * from (select top 10 * from customer) and run it in SSMS, the query is instant (as you'd expect).
I am wondering if anyone knows what might cause LINQ not to combine these in a nice way, and whether there is a best practice or workaround I can implement.
I should note that my actual code isn't select *; it selects a few fields, but there is nothing more complex.
I am using SQL Server 2008 and MVC 3 with Entity Framework (not sure which version).
Edit: I should add that it's IQueryable all the way; nothing is evaluated until the end, and as a result the long execution is confined to that single line.

I don't know why it's not being optimised.
If the filter method really is equivalent to SELECT TOP 10 then you should be able to do it like this:
return query.Take(10);
which would resolve to select top 10 * from customer rather than the more convoluted thing you ended up with.
If this won't work then I'm afraid I'll need a little more detail.
EDIT: To clarify, if you do this in LINQ:
DataItems.Take(10).Take(10)
you would get this SQL:
SELECT TOP (10) [t1].[col1], [t1].[col2]
FROM (
SELECT TOP (10) [t0].[col1], [t0].[col2]
FROM [DataItem] AS [t0]
) AS [t1]
So if you can somehow use a Take(n) you will be okay.
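A minimal sketch of how such filters might be composed so that every step stays on IQueryable<T>. The names QueryFilters, ApplyAll and Top are hypothetical (they are not from the question), and an in-memory sequence stands in for the Entity Framework set; the SQL a real provider emits depends on that provider and is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryFilters
{
    // Applies each filter in order. Nothing is enumerated here, so the
    // result is still a deferred IQueryable<T> that the provider can translate.
    public static IQueryable<T> ApplyAll<T>(
        IQueryable<T> source,
        IEnumerable<Func<IQueryable<T>, IQueryable<T>>> filters)
    {
        return filters.Aggregate(source, (current, filter) => filter(current));
    }

    // Hypothetical "top N" filter, equivalent to the Take(n) suggested above.
    public static Func<IQueryable<T>, IQueryable<T>> Top<T>(int count)
    {
        return query => query.Take(count);
    }
}
```

With a real context, the source would be the projected customer query, and the result would only hit the database when it is finally enumerated (for example with ToList()).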

Related

Select N Row with DB2 in Visual Studio

I want to get the first 95 rows from DB2 in Visual Studio. I'm using a table adapter and I have this query:
SELECT * FROM ASEINDTA.TRX_BWS WHERE (DKLDATE = '2019-10-31')
Fetch First 95 Rows Only
or this:
SELECT * FROM ASEINDTA.TRX_BWS WHERE (DKLDATE = '2019-10-31')
ORDER BY Col[ 1 ]...Col[ n ]
Fetch First 95 Rows Only
But when I click Query Builder, this error appears.
But when I tried it in DBVisualizer, it worked. How do I get that data? Any help would be appreciated. Thanks
One approach that works on DB2 and most other databases is to use ROW_NUMBER with a subquery:
WITH cte AS (
SELECT t.*, ROW_NUMBER() OVER (ORDER BY some_col) rn
FROM ASEINDTA.TRX_BWS t
WHERE DKLDATE = '2019-10-31'
)
SELECT *
FROM cte
WHERE rn <= 95;
Or, the inlined version that does not use CTE:
SELECT *
FROM
(
SELECT b.*, ROW_NUMBER() OVER (ORDER BY some_col) rn
FROM ASEINDTA.TRX_BWS b
WHERE DKLDATE = '2019-10-31'
) t
WHERE rn <= 95;
Remove the parentheses of the where clause:
SELECT * FROM ASEINDTA.TRX_BWS WHERE DKLDATE = '2019-10-31' Fetch First 95 Rows Only
It seems the issue is not on the DB2 server itself, but in the tool you are using to execute the queries. You said that if you run the same query in DBVisualizer it works properly.
Looking at the DB2 documentation for this error (sqlcode -104, sqlstate 42601),
it seems this error is returned by the SYSPROC.ADMIN_CMD stored procedure.
This procedure is designed to run DB2 administration commands in the target database remotely. It was not designed to run queries, so its parser is just a subset of the DB2 parser, covering only specific admin commands.
So it's complaining about the FETCH token in your query.
It seems that the tool you are using (a table adapter; I have no idea what that is) is calling SYSPROC.ADMIN_CMD to execute the queries, when it should be using a regular CLI interface instead.
I don't know exactly which tool you are using, but see whether it has a setting that lets you change that behavior.
Here is the list of admin commands that the ADMIN_CMD procedure can execute:
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.1.0/com.ibm.db2.luw.sql.rtn.doc/doc/r0012547.html
As you can see, there is no SELECT statement there.
If I try to execute a simple SELECT using this stored procedure, I get the same error from the db2 CLP window, directly on the server.
db2 "call SYSPROC.ADMIN_CMD('SELECT * FROM DEPARTMENT')"
SQL0104N An unexpected token "SELECT" was found following
"BEGIN-OF-STATEMENT". Expected tokens may include: "ADD". SQLSTATE=42601
Regards
Samuel Pizarro
Ok
I had a closer look at the error image you posted...
The query is wrong. It's not exactly the same as the queries you posted in your question.
Have a closer look at the error image and you will see that the words Fetch and Rows are double-quoted.
SELECT ... FROM WHERE (...) "Fetch" First 95 "Rows" Only
Remove the double quotes from them.
If you are not writing them like that, then it looks like your tool is changing the query before submitting it to the DB2 engine. Again, not a DB2 server issue.
Regards

Does anyone know of a way to paginate a call to GetSchema from C#?

I'm using the ADO.NET provider function "GetSchema" to fetch metadata out of a SQL Server database (and an Informix system as well) and want to know if there is any way to paginate the results. I ask because one of the systems has over 3,000 tables (yes, three thousand) and twice that many views, and let's not even talk about the stored procedures.
Needless to say, trying to bring down that list in one shot is too much for the VM I have running (a mere 4GB of memory). I'm already aware of the restrictions that can be applied, these are all tables in the "dbo" schema so there isn't much else that I'm aware of for limiting the result set before it gets to my client.
Instead of using GetSchema, I suggest using the more flexible INFORMATION_SCHEMA system views. These views already separate the information about tables, stored procedures and views, and you can write a specific query to retrieve your data in a paginated way.
For example, to retrieve the first 100 rows of table names you could write a query like this:
SELECT *
FROM ( SELECT ROW_NUMBER() OVER ( ORDER BY TABLE_NAME) AS RowNum, *
FROM INFORMATION_SCHEMA.TABLES
) AS TableWithRowNum
WHERE RowNum >= 1
AND RowNum <= 100
ORDER BY RowNum
Subsequent pages can be fetched by changing the min and max values used by the query.
The same code could be applied to stored procedures (using INFORMATION_SCHEMA.ROUTINES WHERE ROUTINE_TYPE = 'PROCEDURE') or to views (using INFORMATION_SCHEMA.VIEWS).
Note that if you are using SQL Server 2012 or later, the first query could be rewritten to use this syntax:
SELECT *
FROM INFORMATION_SCHEMA.TABLES
ORDER BY TABLE_NAME
OFFSET 0 ROWS FETCH NEXT 100 ROWS ONLY
And the C# code could also use parameters for the FIRST (0) and COUNT(100) values
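The parameterized version might look roughly like the sketch below. SchemaPaging, CreatePagedTablesCommand, FirstRowForPage and the parameter names @first and @count are all hypothetical. The sketch assumes SQL Server 2012 or later (for OFFSET/FETCH) and a provider that uses @-prefixed named parameters; it only builds the command and leaves opening the connection and reading the rows to the caller.

```csharp
using System;
using System.Data;

public static class SchemaPaging
{
    // Assumes SQL Server 2012+; @first and @count are illustrative names.
    public const string PagedTablesSql =
        "SELECT * FROM INFORMATION_SCHEMA.TABLES " +
        "ORDER BY TABLE_NAME " +
        "OFFSET @first ROWS FETCH NEXT @count ROWS ONLY";

    // Zero-based offset of the first row on a zero-based page index.
    public static int FirstRowForPage(int pageIndex, int pageSize)
    {
        return checked(pageIndex * pageSize);
    }

    // Builds (but does not execute) the paged command on an existing connection.
    public static IDbCommand CreatePagedTablesCommand(
        IDbConnection connection, int first, int count)
    {
        IDbCommand command = connection.CreateCommand();
        command.CommandText = PagedTablesSql;
        AddInt32(command, "@first", first);
        AddInt32(command, "@count", count);
        return command;
    }

    private static void AddInt32(IDbCommand command, string name, int value)
    {
        IDbDataParameter parameter = command.CreateParameter();
        parameter.ParameterName = name;
        parameter.DbType = DbType.Int32;
        parameter.Value = value;
        command.Parameters.Add(parameter);
    }
}
```

Fetching one page at a time this way keeps only a page of metadata rows in memory on the client, which is the point of paginating in the first place.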

SQL equivalent of Count extension method for LINQ isn't obvious

I'm using LINQ to Entity Framework (EF) to get the count of records in my table with the code below:
using (var db = new StackOverflowEntities())
{
    var empLevelCount = db.employeeLevels.Count();
}
I captured the query that EF sent to the database using SQL Server Profiler and got the following:
SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM [dbo].[employeeLevels] AS [Extent1]
) AS [GroupBy1]
This query remains exactly the same for the LongCount extension method, except that the COUNT SQL function is replaced by COUNT_BIG in the SQL generated by EF. The query created by the LINQ to EF provider looks very weird to me. Why is it not simply doing something like the following to return the scalar count value?
SELECT
COUNT(1) AS [A1]
FROM [dbo].[employeeLevels] AS [Extent1]
It would be really helpful if someone could help me understand what additional logistics EF is taking care of internally that lead the LINQ to EF provider to create such a query. It seems EF is trying to handle some additional use cases through a common algorithm, which results in a generic query like the one above.
Testing both queries (suitably changing the table) in a DB of mine reveals that they both generate exactly the same query plan. So, the structure shouldn't concern you overly much. In SQL, you tell the system what you want, and it works out how best to do it, and here the optimizer is able to generate the optimal plan given either sample.
As to why LINQ generates code like this, I'd suspect it's just a generalized pattern in its code generator that lets it generate similar code for any aggregation and subsequent transformations, not just for unfiltered counts.

Execute SELECT for all returned rows from another SELECT within the same query

With this query:
SELECT id FROM org.employees WHERE {some_condition}
For every row from the above query, I need to call:
SELECT * FROM org.work_schedule(@employeeId, @fromDate, @toDate)
where org.work_schedule is a table-valued function that processes all of the employee's available work schedules and constraints and returns two DATETIME columns (start, end) representing the availability of the given employee for the provided date range.
I am thinking of using a cursor on the first query to feed a temporary table that would be returned. Is this the only solution?
The project is in C# and I could also accomplish this in C# directly, but I suspect it would be more optimal to do this entirely in SQL (SQL Server 2008).
This seems localized, so I would generalize the question as:
How can I execute a query (SELECT) for every row returned by another query (SELECT) and return the entire results in one call (dynamically do SELECT UNION SELECT UNION ...)?
Thanks
You should use OUTER APPLY or CROSS APPLY instead of a cursor:
SELECT *
FROM ( SELECT id
       FROM org.employees
       WHERE {some_condition}) A
OUTER APPLY org.work_schedule(A.id, @fromDate, @toDate) B
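To illustrate the difference between the two operators, here is a rough in-memory analogue in C# using LINQ to Objects. ApplyDemo and its methods are hypothetical names, and the schedule delegate merely stands in for the table-valued function; this shows the semantics only and is not a replacement for running the APPLY in SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ApplyDemo
{
    // CROSS APPLY analogue: ids whose function result is empty produce no rows.
    public static List<Tuple<int, DateTime?>> CrossApply(
        IEnumerable<int> ids, Func<int, IEnumerable<DateTime>> schedule)
    {
        return ids
            .SelectMany(id => schedule(id)
                .Select(start => Tuple.Create(id, (DateTime?)start)))
            .ToList();
    }

    // OUTER APPLY analogue: ids with an empty result are kept, paired with null.
    public static List<Tuple<int, DateTime?>> OuterApply(
        IEnumerable<int> ids, Func<int, IEnumerable<DateTime>> schedule)
    {
        return ids
            .SelectMany(id => schedule(id)
                .Select(start => (DateTime?)start)
                .DefaultIfEmpty()
                .Select(start => Tuple.Create(id, start)))
            .ToList();
    }
}
```

In other words, OUTER APPLY is the choice when employees without any availability should still appear in the result, and CROSS APPLY when they should be dropped.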

sp_executesql runs in milliseconds in SSMS but takes 3 seconds from ado.net [duplicate]

This question already has an answer here:
Stored Proc slower from application than Management Studio
(1 answer)
Closed 9 years ago.
This is my dynamic query, used on a search form, which runs in SSMS in roughly 300 to 400 ms:
exec sp_executesql N'set arithabort off;
set transaction isolation level read uncommitted;
With cte as
(Select ROW_NUMBER() OVER
(Order By Case When d.OldInstrumentID IS NULL
THEN d.LastStatusChangedDateTime Else d.RecordingDateTime End
desc) peta_rn,
d.DocumentID
From Documents d
Inner Join Users u on d.UserID = u.UserID
Inner Join IGroupes ig on ig.IGroupID = d.IGroupID
Inner Join ITypes it on it.ITypeID = d.ITypeID
Where 1=1
And (CreatedByAccountID = @0 Or DocumentStatusID = @1 Or DocumentStatusID = @2 )
And (d.JurisdictionID = @3 Or DocumentStatusID = @4 Or DocumentStatusID = @5)
AND ( d.DocumentStatusID = 9 )
)
Select d.DocumentID, d.IsReEfiled, d.IGroupID, d.ITypeID, d.RecordingDateTime,
d.CreatedByAccountID, d.JurisdictionID,
Case When d.OldInstrumentID IS NULL THEN d.LastStatusChangedDateTime
Else d.RecordingDateTime End as LastStatusChangedDateTime,
dbo.FnCanChangeDocumentStatus(d.DocumentStatusID,d.DocumentID) as CanChangeStatus,
d.IDate, d.InstrumentID, d.DocumentStatusID,ig.Abbreviation as IGroupAbbreviation,
u.Username, j.JDAbbreviation, inf.DocumentName,
it.Abbreviation as ITypeAbbreviation, d.DocumentDate,
ds.Abbreviation as DocumentStatusAbbreviation,
Upper(dbo.GetFlatDocumentName(d.DocumentID)) as FlatDocumentName
From Documents d
Left Join IGroupes ig On d.IGroupID = ig.IGroupID
Left Join ITypes it On d.ITypeID = it.ITypeID
Left Join Users u On u.UserID = d.UserID
Left Join DocumentStatuses ds On d.DocumentStatusID = ds.DocumentStatusID
Left Join InstrumentFiles inf On d.DocumentID = inf.DocumentID
Left Join Jurisdictions j on j.JurisdictionID = d.JurisdictionID
Inner Join cte on cte.DocumentID = d.DocumentID
Where 1=1
And peta_rn>=@6 AND peta_rn<=@7
Order by peta_rn',
N'@0 int,@1 int,@2 int,@3 int,@4 int,@5 int,@6 bigint,@7 bigint',
@0=44,@1=5,@2=9,@3=1,@4=5,@5=9,@6=94200,@7=94250
This SQL is built in C# code, and the WHERE clauses are added dynamically based on the values the user entered in the search form. It takes roughly 3 seconds to move from one page to the next. I already have the necessary indexes on most of the columns I search on.
Any idea why my ADO.NET code would be slow?
Update: Not sure if execution plans would help but here they are:
It is possible that SQL Server has created an inappropriate query plan for ADO.NET connections. We have seen similar issues with ADO; the usual solution is to clear any query plans and run the slow query again, which may create a better plan.
The most general way to clear query plans is to update statistics for the tables involved. For you, something like:
update statistics documents with fullscan
Do the same for the other tables involved and then run your slow query from ADO.NET (do not run it in SSMS first).
Note that such timing inconsistencies may hint at a bad query or database design; at least for us that is usually so :)
If you run a query repeatedly in SSMS, the database may re-use a previously created execution plan, and the required data may already be cached in memory.
There are a couple of things I notice in your query:
the CTE joins Users, IGroupes and ITypes, but the joined records are not used in the SELECT
the CTE performs an ORDER BY on a calculated expression (notice the 85% cost in (unindexed) Sort)
probably replacing the CASE expression with a computed persisted column which can be indexed speeds up execution.
note that the ORDER BY is executed on data resulting from joining 4 tables
the WHERE condition of the CTE requires d.DocumentStatusID = 9, yet the conditions ANDed with it also test other DocumentStatusID values
paging is performed on the result of 8 JOINed tables.
most likely creating an intermediate CTE which filters the first CTE based on peta_rn improves performance
.NET strings are Unicode (UTF-16) by default, which equates to NVARCHAR as opposed to VARCHAR.
When you are doing a WHERE ID = @foo in .NET, you are likely to be implicitly doing
WHERE CONVERT(NVARCHAR, ID) = @foo
The result is that this WHERE clause can't be indexed, and the table must be scanned. The solution is to pass each parameter into the SqlCommand as a DbParameter with the DbType set to VARCHAR (DbType.AnsiString in the case of a string).
A similar situation could of course occur with int types if the .NET parameter is "wider" than the SQL column equivalent.
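A minimal sketch of typing a string parameter explicitly, using only the provider-neutral System.Data interfaces. ParameterTyping, StringDbTypeFor and AddStringParameter are hypothetical names, and the caller is assumed to know whether the target column is VARCHAR or NVARCHAR and how long it is. DbType.AnsiString corresponds to a non-Unicode (VARCHAR) value and DbType.String to a Unicode (NVARCHAR) one.

```csharp
using System;
using System.Data;

public static class ParameterTyping
{
    // Picks the DbType that matches the column, so the server does not
    // have to convert the column side of the comparison.
    public static DbType StringDbTypeFor(bool columnIsUnicode)
    {
        return columnIsUnicode ? DbType.String : DbType.AnsiString;
    }

    // Adds a string parameter with an explicit type and size to the command.
    public static IDbDataParameter AddStringParameter(
        IDbCommand command, string name, string value, int size, bool columnIsUnicode)
    {
        IDbDataParameter parameter = command.CreateParameter();
        parameter.ParameterName = name;
        parameter.DbType = StringDbTypeFor(columnIsUnicode);
        parameter.Size = size;
        parameter.Value = (object)value ?? DBNull.Value;
        command.Parameters.Add(parameter);
        return parameter;
    }
}
```

Setting the size as well as the type keeps the parameter declaration stable between calls, rather than letting it vary with the length of each value.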
PS The easiest way to "prove" this issue is to run your query in SSMS with the following declarations above it
DECLARE @p0 INT = 123
DECLARE @p1 NVARCHAR(50) = N'foobar' -- etc.
and compare with
DECLARE @p0 INT = 123
DECLARE @p1 VARCHAR(50) = 'foobar' -- etc.
