We have a website which uses LINQ to Entities, and recently we found it has become very slow. After troubleshooting, I found that whenever we use LINQ to Entities to fetch data from the database, it consumes a lot of CPU time, for example in the ToList() call. I know this might be because we have a lot of data in the database, which causes slow responses, but I wonder whether there are any other reasons that might cause this problem.
What should I do to optimize this kind of problem? The following are my suspected causes:
ToList() might load every object's related objects (via foreign keys). How can I force it to load only the object itself?
Is my connection pool too small?
Please let me know if there are any other possible reasons, and point me in the right direction to solve this issue.
In LINQ, a query returns the results of a sequence of manipulations applied to its sources when the query is enumerated.
IQueryable<Customer> myQuery = ...

foreach (Customer c in myQuery) // enumerating the query causes it to be executed
{
}

List<Customer> customers = myQuery.ToList();
// ToList will enumerate the query and put the results in a list.
// Enumerating the query causes it to be executed.
An executing query requires a few things (in no particular order)
A database connection is drawn from the pool.
The query is interpreted by the query provider (in this case the provider is LINQ to Entities, and the interpretation is some form of SQL).
The interpreted form is transmitted to the database, where it does what it does and returns data objects.
Some method must be generated to translate the incoming data objects into the desired query output.
The database connection is returned to the pool.
The desired query output may have state tracking done to it before it is returned to your code.
Additionally, the database has a few steps of its own, listed here from the point of view of querying SQL Server:
The query text is received and checked against the plan cache for an existing query plan.
If no plan exists, a new one is created and stuck into the plan cache by the query optimizer.
The query plan is executed - IO/locks/CPU/Memory - any of these may be bottlenecks
Query results are returned - network may be a bottleneck, particularly if the resultset is large.
So, to find out where the problem with your query is, you need to start measuring. I'll list these targets in the order I'd check them. This is not a complete list.
Get the translated SQL text of the query. You can use SQL Server Profiler for this, or the debugger; there are many ways to go about it. Make sure the query text returns what you require for your objects, no more, no less. Make sure the tables queried match your expectations. Run the query a couple of times.
Look at the result set. Is it reasonable, or are we looking at 500 gigs of results? Was a whole table queried when the whole thing wasn't needed? Was a Cartesian result generated unexpectedly?
Get the execution plan of the query (in SQL Server Management Studio, click the Show Estimated Execution Plan button). Does the query use the indexes you expect it to? Does the plan look weird (possibly a bad plan came from the cache)? Does the query work on tables in the order you expect, and perform nested/merge/hash joins in the way you expect? Is parallelization kicking in when the query doesn't deserve it (a sign of bad indexes/TONS of IO)?
Measure the IO of the query (in SQL Server, issue SET STATISTICS IO ON). Examine the logical IO per table. Which table stands out? Again, look for a wrong order of table access or an index that could support the query.
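As a concrete sketch of this measuring step, run something like the following in SSMS; the Orders table and the query itself are placeholders, not from the original post, so substitute the query text you captured earlier:

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Placeholder query: paste the captured query text here instead.
SELECT o.OrderId, o.CustomerId, o.Total
FROM dbo.Orders AS o
WHERE o.CustomerId = 42;

-- The Messages tab then reports per-table logical reads, e.g.:
-- Table 'Orders'. Scan count 1, logical reads 3, physical reads 0, ...
```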
If you've made it this far, you've likely found and fixed the problem. I'll keep going though, in case you haven't.
Compare the execution time of the query to the execution time of the enumeration. If there's a large difference, it may be that the code which translates the incoming data objects into your results is slow, or that generating that code was slow. It could also be that translating the query took a while. These are tricky problems to solve (in LINQ to SQL we use compiled queries to sort them out).
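To illustrate the compiled-query remark, here is a minimal LINQ to SQL sketch; the DataContext, entity, and property names (NorthwindDataContext, Customer, City) are illustrative assumptions, not from the original post:

```csharp
using System;
using System.Data.Linq;
using System.Linq;

static class CustomerQueries
{
    // CompiledQuery.Compile translates the LINQ expression to SQL once;
    // subsequent invocations skip the translation step entirely.
    public static readonly Func<NorthwindDataContext, string, IQueryable<Customer>>
        ByCity = CompiledQuery.Compile(
            (NorthwindDataContext db, string city) =>
                db.Customers.Where(c => c.City == city));
}

// Usage: the translation cost is paid only on the first call.
// var customers = CustomerQueries.ByCity(db, "London").ToList();
```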
Measure Memory and CPU for the machine the code is running on. If you are capped there, use a code profiler or memory profiler to identify and resolve the issue.
Look at the network stats on the machine; in particular you may want to use TCPView to see the TCP socket connections on the machine. Socket resources may be misused (such as opening and closing thousands per minute).
Examine the database for locks held by other connections.
I guess that's enough. Hope I didn't forget any obvious things to check.
You might find the solution to your problem in Performance Considerations (Entity Framework) on MSDN. In particular
Return the correct amount of data
In some scenarios, specifying a query path using the Include method is
much faster because it requires fewer round trips to the database.
However, in other scenarios, additional round trips to the database to
load related objects may be faster because the simpler queries with
fewer joins result in less redundancy of data. Because of this, we
recommend that you test the performance of various ways to retrieve
related objects. For more information, see Loading Related Objects.
To avoid returning too much data in a single query, consider paging
the results of the query into more manageable groups. For more
information, see How to: Page Through Query Results.
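A hedged sketch of both suggestions in Entity Framework; the context and entity names (context, Customers, Orders, CustomerId) are assumptions, not from the original post:

```csharp
// Eager loading via a query path: one round trip, one larger joined query.
var withOrders = context.Customers
                        .Include("Orders")
                        .ToList();

// Paging: only one page of rows crosses the wire per query.
// An OrderBy is required before Skip/Take.
int pageSize = 50, pageIndex = 2;
var page = context.Customers
                  .OrderBy(c => c.CustomerId)
                  .Skip(pageIndex * pageSize)
                  .Take(pageSize)
                  .ToList();
```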
Related
I have a performance problem we have done a bunch of analysis and are stuck. Hopefully one of you have seen this before.
I'm calling DbContext.Database.SqlQuery; the database portion takes 3 ms, but the full execution takes 9 seconds.
We've used EF Profiler to discover this, and we also ran the SQL directly in SQL Server Management Studio, where it is instantaneous.
We also used glimpse and couldn't see deep enough into the process.
The result type is not an entity from the model and therefore we are confident that tracking is not involved.
We also know that this is not the first query executed against the context therefore we are not paying EF startup cost on this query.
We tried the .net profiler and had so many problems running it that we decided we should just ask.
Any tips on how to dig in and figure this out ?
EDIT: The result set for this query is 1 row with 4 columns (decimal)
The line of code is just:
var list = contextInstance.Database.SqlQuery<nonEntityType>(sqstring).ToList();
The SQL itself is not a very long string. We will use a more detailed profiler to find out where in the process this is getting hung up.
We've used EF profiler to discover this and we also run the SQL
directly in SQL server management studio and it is instantaneous.
This doesn't prove anything. The query might run fast, but the results might amount to 100 MB of data which then has to be transported to the client and materialized into objects. That might take more time than you think.
The query in SSMS might appear instantaneous because it shows only part of the data. You didn't say what the data was.
Use a real .NET profiler, like dotTrace or Ants. This way you can see where time is lost exactly on the line. EF Prof (or my own ORM Profiler: http://www.ormprofiler.com) will tell you which part of the total route taken (ORM->DB->ORM) takes what time. Even EF prof does ;)
If the client for some reason can't use a profiler as Frans suggests, you will have to play the guessing game and exclude possibilities.
First of all, I think a critical piece of information is missing: does it always take around 9 seconds, or does it vary?
First step:
Decide whether the delay occurs before or after the query hits the database. It should be possible to do this with EF Profiler combined with the timestamps in SQL Profiler.
Either way you will have limited the possibilities a bit.
Second step:
Exclude as much as possible
Indexes (No, the query is fast)
Returning too much data (No, according to the info you have)
Slow query compilation (No, raw sql query is used)
Slow data transfer (No, the other queries works well)
Slow DbContext initialization (No, you said it's not the first query)
Row or table locks (Not likely; that would probably show up as a long-running query in the profiler)
Slow materialization (No, too few fields unless there is a serious edge-case bug)
Third step:
What's left? That depends on the answer to #1 and also if it's always 9 seconds.
My prime suspects here are either a connection issue (another call is blocking, so this one has to wait for a connection) or some second-level cache or similar that doesn't play well with this query.
To exclude some more alternatives I would try to run the same query using plain old ADO.NET. If the problem persists, you know it's not an EF problem and is very likely a connection issue. If it goes away, it could still be either, though.
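A minimal sketch of that plain ADO.NET comparison; the connection string and SQL text are placeholders for the values from the question:

```csharp
using System;
using System.Data.SqlClient;
using System.Diagnostics;

static class AdoNetCheck
{
    // Run the same SQL that EF runs, with EF taken out of the equation.
    public static long TimeQuery(string connectionString, string sql)
    {
        var sw = Stopwatch.StartNew();
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // materialize the row (here: the 4 decimal columns)
                }
            }
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```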
Not so much an answer as some rants, but hopefully something you didn't think of already.
I have a simple Windows Forms application written in C# 4.0. The application shows some of the records from a database, and it features a query option which is initiated by the user.
The records in the database can be thought of as jobs.
Consider two columns, JobID and Status.
These are updated by two background services which effectively work as a producer/consumer pair; the status of each job is updated by these services running in the background.
Now the user has the option to query records from the database, for example by status (Submitted, Processing, Completed). This can return thousands of records, and the GUI might face performance glitches displaying that much data.
Hence it's important to display the query results in pages. The GUI isn't refreshed until the user manually refreshes or makes a new query.
Since the jobs are constantly being updated by the services, a job's status can differ at any point in time. The basic requirement is that the pages should show the data as it was when it was fetched from the DB.
I am using LINQ to SQL for fetching data from the DB. It's quite easy to use, but it lacks the mid-tier caching required to meet this demand. Caching the results in process memory can push memory usage to the extreme if the number of records is very high. Unfortunately LINQ to SQL doesn't provide any mid-tier caching facilities with its DataContext objects.
What is the preferable way to implement a paging mechanism with C# 4.0 + SQL Server + a Windows environment?
One alternative I can think of is a duplicate table/DB which can temporarily store the results as a cache, or using the Enterprise Library's Caching Application Block. I believe this is a typical problem faced by many developers. Which is the most efficient way to solve it? (NOTE: my application and DB run on the same box.)
While caching is a sure way to improve performance, implementing a caching strategy properly can be more difficult than it may seem. The problem is managing cache expiration, or essentially ensuring that the cache is synchronized to the desired degree. Therefore, before considering caching, consider whether you need it in the first place. Based on what I can gather from the question, the data model is relatively simple and doesn't require any joins. If that is the case, why not optimize the tables and indexes for pagination? SQL Server and LINQ to SQL will handle pagination for thousands of records transparently and with ease.
You are correct in stating that displaying too many records at once is prohibitive for the GUI and it is also prohibitive for the user. No user will want to see more records than are filling the screen at any given time. Given the constraint that the data doesn't need to be refreshed until requested by the user, it should be safe to assume that the number of queries will be relatively low. The additional constraint that the DB is on the same box as the application further solidifies the point that you don't need caching. SQL server already does caching internally.
All advice about performance tuning states that you should profile and measure performance before attempting optimizations. As stated by Donald Knuth, premature optimization is the root of all evil.
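If pagination is the route taken, a minimal LINQ to SQL sketch would look like the following; the Jobs table and JobID/Status columns come from the question, while dataContext and the page variables are assumptions:

```csharp
// Skip/Take is translated by LINQ to SQL into a ROW_NUMBER-based query,
// so only one page of rows ever crosses the wire.
int pageSize = 100;
int pageIndex = 0; // zero-based page requested by the user

var page = dataContext.Jobs
                      .Where(j => j.Status == "Completed")
                      .OrderBy(j => j.JobID)   // a stable order is required
                      .Skip(pageIndex * pageSize)
                      .Take(pageSize)
                      .ToList();
```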
I have a couple of tables in a SQL Server database, most of them are updated only rarely, i.e. they are mostly-read.
In order not to have to go to the database every time I read an entry, what we have done is, on startup we load all tables completely into memory of our .net process (the data is small enough), and at intervals of 10 seconds we reread the whole thing and replace our in-memory representation of the data.
This in-memory representation of the data is then used for reading, and we don't have to go synchronously to the DB, unless we want to update the data.
Suffice it to say that this currently hand-coded process (for each table we have to write code that does a SELECT * and handles the received rows) is tedious and bound to attract bugs during the maintenance cycle. In addition, it is obviously inefficient to always read the whole DB and reprocess all entries even when nothing has changed.
I can think of a couple of meaningful optimizations to the above procedure, but my point is, I don't want to have to do manually what looks like a feature that could come out of the box: The replication of a set of tables into memory of a process to speed up read access.
I guess if I went ORM and used nhibernate etc., I could get something like that in addition to the ORM layer (by means of caching and eager loading).
Now if I don't want the ORM part, just the replication of the lower relational level, is there anything that I can just switch on?
You can look at the metadata and build something generic that can load any table into whatever structure you like, or simply use an ADO.NET DataSet.
Also, instead of reloading your data on a timer even when it hasn't changed, you can subscribe to changes using SqlDependency.
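A hedged sketch of the SqlDependency approach; the Jobs table, column names, and connection string are placeholders, and note that Service Broker must be enabled on the database for notifications to work:

```csharp
using System.Data.SqlClient;

static class JobCache
{
    static string connectionString = "..."; // placeholder

    // Call SqlDependency.Start(connectionString) once at process startup.
    public static void LoadJobs()
    {
        using (var conn = new SqlConnection(connectionString))
        // Notification queries must use two-part table names and an
        // explicit column list (no SELECT *).
        using (var cmd = new SqlCommand(
            "SELECT JobID, Status FROM dbo.Jobs", conn))
        {
            var dependency = new SqlDependency(cmd);
            dependency.OnChange += (s, e) => LoadJobs(); // re-subscribe + reload
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                // rebuild the in-memory representation here
            }
        }
    }
}
```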
Can anyone help? When I run this code, the program shows me an error stating "There is already an open DataReader associated with this Command which must be closed first".
This is my code.
Parallel.For(0, MthRange, i =>
{
    PSUpfrontFeeForMonth[i] = CommissionSummary
        .Where(s => s.TransDate == oReportCommonFilter.fromDate.AddMonths(i))
        .Sum(s => s.PSUpfrontFee);

    if (!PSUpfrontFeeForMonth[i].HasValue)
    {
        PSUpfrontFeeForMonth[i] = 0;
    }
});
Thanks.
Regards,
Jane
Parallelizing a database query like this is wrong, for the following reasons:
A query is issued against SQL Server from each processor, so multiple data readers are opened on the same connection, which produces the error you see.
No performance gain is achieved; in fact the program becomes slower, because each processor is trying to connect to the database while no parallel processing is actually done, since all query processing happens in SQL Server. The normal serial query is faster in this case.
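The serial equivalent of the loop in the question, reusing the names from the original code, would be:

```csharp
// One enumeration of CommissionSummary at a time means only one DataReader
// is open at a time, which avoids the "already an open DataReader" error.
for (int i = 0; i < MthRange; i++)
{
    var month = oReportCommonFilter.fromDate.AddMonths(i);
    PSUpfrontFeeForMonth[i] = CommissionSummary
        .Where(s => s.TransDate == month)
        .Sum(s => s.PSUpfrontFee) ?? 0;
}
```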
If you really need to have multiple database connections open simultaneously (as others have stated not necessarily a good idea) then you can usually specify in the connection string that this is needed.
A typical scenario in which I've used this is streaming rows from a database table with a DataReader (as I don't know how many rows I need in advance) while I then need to make additional queries on other database tables. I do this rather than a single query because it would require multiple complex joins, and my app has a good caching layer to reduce queries to the database.
For Microsoft SQL Server you add MultipleActiveResultSets=True; to the connection string
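For example (server and database names are placeholders):

```csharp
var connectionString =
    "Server=.;Database=MyDb;Integrated Security=true;" +
    "MultipleActiveResultSets=True;"; // enables MARS on SQL Server 2005+
```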
My guess would be that whatever populates PSUpfrontFeeForMonth is internally using data readers.
Since I have no idea how that works, the first thing I would try would be to initialise PSUpfrontFeeForMonth within the loop. Maybe that will ensure a dedicated data reader for each iteration.
I have a webpage that takes 10 minutes to run one query against a database, but the same query returns in less than a second when run from SQL Server Management Studio.
The webpage is just firing SQL at the database that is executing a stored procedure, which in turn is performing a pretty simple select over four tables. Again the code is basic ADO, setting the CommandText on an SqlCommand and then performing an ExecuteReader to get the data.
The webpage normally works quickly, but when it slows down the only way to speed it up again is to defragment the indexes on the tables being queried (different ones at different times), which doesn't seem to make sense when the same query executes so quickly manually.
I have had a look at this question but it doesn't apply as the webpage is literally just firing text at the database.
Does anyone have any good ideas why this is going slow one way and not the other?
Thanks
I would suspect parameter sniffing.
The cached execution plan used for your application's connection probably won't be usable by your SSMS connection due to different set options so it will generate a new different plan.
You can retrieve the cached plans for the stored procedure by using the query below. Then compare to see if they are different (e.g. is the slow one doing index seeks and bookmark lookups at a place where the other one does a scan?)
USE YourDatabase;

SELECT *
FROM sys.dm_exec_cached_plans
CROSS APPLY sys.dm_exec_sql_text(plan_handle) AS est
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS eqp
CROSS APPLY sys.dm_exec_plan_attributes(plan_handle) AS epa
WHERE est.objectid = OBJECT_ID('YourProcName')
  AND epa.attribute = 'set_options';
Is there any difference between the command text of the query in the app and the query you are executing manually? Since you said that reindexing helps performance (which also updates statistics), it sounds like it may be getting stuck on a bad execution plan.
You might want to run a SQL trace and capture the Showplan XML event to see what the execution plan looks like, and also capture the statement-completed event (though this can slow the server down if a lot of statements are going through the system, so be careful) to be sure the statement sent to SQL Server is the same one you are running manually.