Enumeration of EF stored procedure results - c#

I'm calling a simple stored procedure that returns around 650 rows. There are several joins and the procedure takes about 5-6 seconds. No problem.
Enumerating the results, however, is taking about a minute to complete.
using (var context = new DBContext())
{
    var results = context.GetResults(param); // 5-6 seconds
    var resultList = results.ToList();       // 1 minute+
}
I don't use Entity Framework much, but this seems abnormal. Am I doing something wrong? Is there something I can look at to speed this up? The table is huge, but the way I read it, this code should only be enumerating the 650 results... which should take no time at all.
Note: Not sure if this is related, but the time it takes to select all rows from said table is about the same (around a minute)

The solution to my problem was to work around parameter sniffing by copying the input parameter into a local variable inside the procedure.
alter procedure dbo.procedure
    @param int
as
begin
    set nocount on;

    declare @paramCopy int
    set @paramCopy = @param
    ...

Based on your recent edit, I have an idea of what's happening. I think that the .GetResults() call is simply getting the query ready to be run, utilizing deferred execution. Only when you are calling .ToList() in the next line is it actually going out and trying to build the entities themselves (hence the time difference).
So why is it taking so long to load? That could be for a number of reasons, including:
You might have lazy loading disabled. This will cause all of the records to be fully loaded, with all of their respective navigational properties as well, and have all of that be tracked by the DbContext. That makes for a lot of memory consumption. You might want to consider turning it on (but not everyone likes having lazy loading enabled).
You are allowing the tracker to track all of the records, which takes up memory. Instead of this, if the data you're grabbing is going to be read-only anyway, you might want to consider the use of AsNoTracking, like in this blog post. That should reduce the load time.
You could be grabbing a lot of columns. I don't know what your procedure returns, but if it's a lot of rows with lots of different columns, shoving all of that data into memory will take a loooong time to process. Instead, you might want to consider selecting only the few columns you actually need (by using a .Select() before the call to .ToList()) to grab just what you need; a rough sketch of the last two ideas follows below.
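For illustration, here is a minimal sketch combining AsNoTracking with a narrow projection. The Orders set, CustomerId filter and projected columns are invented for the example, not taken from the question:
// using System.Linq; using System.Data.Entity;
using (var context = new DBContext())
{
    // AsNoTracking stops the change tracker from holding on to every entity,
    // and the Select projection brings back only the columns that are needed.
    var slimResults = context.Orders                 // hypothetical DbSet
        .AsNoTracking()
        .Where(o => o.CustomerId == customerId)      // hypothetical filter
        .Select(o => new { o.Id, o.Total })          // hypothetical columns
        .ToList();
}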


Ideas on incorrect ORDER BY results

I want to emphasize that I'm looking for ideas, not necessarily a concrete answer since it's difficult to show what my queries look like, but I don't believe that's needed.
The process looks like this:
Table A keeps filling up, like a bucket - an SQL job keeps calling SP_Proc1 every minute or less and it inserts multiple records into table A.
At the same time a C# process keeps calling another procedure SP_Proc2 every minute or less that does an ordered TOP 5 select from table A and returns the results to the C# method. After C# code finishes processing the results it deletes the selected 5 records from table A.
The problematic part is this: the records from table A must be processed 5 at a time in the order specified, but a few times a month SP_Proc2 selects the ordered TOP 5 records in the wrong order, even though all the records are present in table A and have correct values in the columns used for ordering.
Something to note:
I'm ordering by integers, not varchar.
The C# part is using 1 thread.
Both SP_Proc1 and SP_Proc2 use a transaction and the READ COMMITTED or READ COMMITTED SNAPSHOT isolation level.
One column that is used for ordering is a computed value, but a very simple one. It just checks if another column in table A is not null and sets the computed column to either 1 or 0.
There's a unique nonclustered index on primary key Id and a clustered index composed of the same columns used for ordering in SP_Proc2.
I'm using SQL Server 2012 (v11.0.3000)
I'm beginning to think that this might be a SQL Server bug, or that the records or the index on table A get corrupted and are then deleted by the C# process, which is why I can't catch it.
Edit:
To clarify: SP_Proc1 commits a big batch of N records to table A at once, and SP_Proc2 pulls the records from table A in batches of 5. It orders the records in the table and selects the TOP 5, and sometimes the wrong batch is selected; the batch itself is ordered correctly, but a different batch should have been selected according to the ORDER BY. I believe Rob Farley might have the right idea.
My guess is that your “out of order TOP 5” is ordered, but that a later five overlaps. Like, one time you get 1231, 1232, 1233, 1234, and 1236, and the next batch is 1235, 1237, and so on.
This can be an issue with locking and blocking. You’ve indicated your processes use transactions, so it wouldn’t surprise me if your 1235 hasn’t been committed yet, but can just be ignored by your snapshot isolation, and your 1236 can get picked up.
It doesn’t sound like there’s a bug here. What I’m describing above is a definite feature of snapshot isolation. If you must have 1235 picked up in an earlier batch than 1236, then don’t use snapshot isolation, and force your table to be locked until each block of inserts is finished.
An alternative suggestion would be to use a table lock (TABLOCK) in both the reading and writing procedures.
Though this is expensive, if you need absolute consistency then it may be the way to go.
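For what it's worth, here is a rough sketch of attacking this from the C# side by raising the isolation level of the reading call. This is not the TABLOCK hint suggested above (that would go inside the procedures themselves), just one way to keep the reader from running alongside a half-committed batch; the connectionString variable is assumed to exist.
// using System.Data; using System.Data.SqlClient;
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();

    // Serializable makes the reader wait for concurrent writers to commit,
    // at the cost of more blocking overall.
    using (var transaction = connection.BeginTransaction(IsolationLevel.Serializable))
    using (var command = new SqlCommand("dbo.SP_Proc2", connection, transaction))
    {
        command.CommandType = CommandType.StoredProcedure;

        using (var reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                // process the TOP 5 batch here
            }
        }

        transaction.Commit();
    }
}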

Where to do pagination/filtering? In the database or in the code?

I have to write the code for the following method:
public IEnumerable<Product> GetProducts(int pageNumber, int pageSize, string sortKey, string sortDirection, string locale, string filterKey, string filterValue)
The method will be used by a web UI and must support pagination, sorting and filtering. The database (SQL Server 2008) has ~250,000 products. My question is the following: where do I implement the pagination, sorting and filtering logic? Should I do it in a T-SQL stored procedure or in the C# code?
I think that it is better if I do it in T-SQL but I will end up with a very complex query. On the other hand, doing that in C# implies that I have to load the entire list of products, which is also bad...
Any idea what is the best option here? Am I missing an option?
You would definitely want to have the DB do this for you. Moving ~250K records up from the database for each request will be a huge overhead. If you are using LINQ-to-SQL, the Skip and Take methods will do this (here is an example), but I don't know exactly how efficient they are.
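As a rough sketch of that approach, a Skip/Take page query might look like the following; the DataContext name and the ordering key are assumptions, not from the question:
// using System.Linq; using System.Collections.Generic;
public IEnumerable<Product> GetProducts(int pageNumber, int pageSize)
{
    using (var db = new ProductsDataContext())   // hypothetical LINQ-to-SQL context
    {
        // The whole chain is translated into a single SQL statement,
        // so only one page of rows ever crosses the wire.
        return db.Products
                 .OrderBy(p => p.Name)           // paging needs a stable ORDER BY
                 .Skip((pageNumber - 1) * pageSize)
                 .Take(pageSize)
                 .ToList();                      // materialize before the context is disposed
    }
}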
I think another (and potentially the best) option is to use a higher-level framework that shields you from the complexity of writing the query. Entity Framework, NHibernate and LINQ to SQL all help a lot here. That said, the database is typically the best place to do it in your case.
I implemented pagination for my website just today. I did it with a stored procedure, even though I am using Entity Framework. I found that executing one complex query is better than fetching all the records and paginating in code, so do it with a stored procedure.
As for the method signature you attached, I implemented mine in the same way.
I would definitely do it in a stored procedure, something along the lines of:
SELECT * FROM (
    SELECT
        ROW_NUMBER() OVER (ORDER BY Quantity) AS row, *
    FROM Products
) AS a WHERE row BETWEEN 11 AND 20
If you are using LINQ, then the Skip and Take methods will take care of this for you.
Definitely in the DB for preference, if at all possible.
Sometimes you can mix things up a bit. For example, if the results come from a database function (not a stored procedure; functions can be part of larger queries in ways that stored procedures cannot), you can have another function do the ordering and pagination, or have Linq2SQL or similar ask for a page of results from that function, producing the correct SQL as needed.
If you can at least get the ordering done in the database, and users will usually only want the first few pages (which is quite often the case in real use), then you can get reasonable performance for those cases, since only enough rows to skip past the earlier pages and then take the wanted page need to be loaded from the db. You of course still need to test that performance is reasonable in those rare cases where someone really does look for page 1,2312!
Still, that's only a compromise for cases where paging is genuinely difficult. As a rule, always page in the DB unless it's either extremely difficult for some reason or the total number of rows is guaranteed to be low.

Can I do a very large insert with Linq-to-SQL?

I've got some text data that I'm loading into a SQL Server 2005 database using Linq-to-SQL with this method (pseudo-code):
Create a DataContext
While (new data exists)
{
    Read a record from the text file
    Create a new Record
    Populate the record
    dataContext.InsertOnSubmit(record);
}
dataContext.SubmitChanges();
The code is a little C# console application. This works fine so far, but I'm about to do an import of the real data (rather than a test subset) and this contains about 2 million rows instead of the 1000 I've tested. Am I going to have to do some clever batching or something similar to avoid the code falling over or performing woefully, or should Linq-to-SQL handle this gracefully?
It looks like this would work; however, the changes (and thus memory) that are kept by the DataContext are going to grow with each InsertOnSubmit. Maybe it's advisable to perform a SubmitChanges every 100 records?
I would also take a look at SqlBulkCopy to see if it doesn't fit your use case better.
If you need to do bulk inserts, you should check out SqlBulkCopy.
Linq-to-SQL is not really suited for doing large-scale bulk inserts.
You would want to call SubmitChanges() every 1000 records or so to flush the changes so far; otherwise you'll run out of memory.
If you want performance, you might want to bypass Linq-To-SQL and go for System.Data.SqlClient.SqlBulkCopy instead.
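For reference, a rough SqlBulkCopy sketch for this kind of load; the DataTable staging, the column names and the target table are assumptions, not from the question:
// using System.Data; using System.Data.SqlClient;
var table = new DataTable();
table.Columns.Add("Name", typeof(string));   // hypothetical columns
table.Columns.Add("Value", typeof(int));

// ... fill 'table' with rows parsed from the text file ...

using (var bulkCopy = new SqlBulkCopy(connectionString))
{
    bulkCopy.DestinationTableName = "dbo.ImportedRecords"; // hypothetical target table
    bulkCopy.BatchSize = 10000;                            // send rows to the server in chunks
    bulkCopy.WriteToServer(table);                         // one bulk operation instead of row-by-row inserts
}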
Just for the record, I did as marc_s and Peter suggested and chunked the data. It's not especially fast (it took about an hour and a half in the Debug configuration, with the debugger attached and quite a lot of console progress output), but it's perfectly adequate for our needs:
Create a DataContext
numRows = 0;
While (new data exists)
{
    Read a record from the text file
    Create a new Record
    Populate the record
    dataContext.InsertOnSubmit(record)

    // Submit the changes in thousand-row batches
    if (numRows % 1000 == 999)
        dataContext.SubmitChanges()

    numRows++
}
dataContext.SubmitChanges()
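As a rough runnable shape of the same loop, assuming a StreamReader over the input file and an invented DataContext/entity (the names here are illustrative only):
// using System.IO;
using (var dataContext = new ImportDataContext())     // hypothetical DataContext
using (var reader = new StreamReader("import.txt"))   // hypothetical input file
{
    int numRows = 0;
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        var record = new Record();                     // hypothetical entity
        // ... populate the record from 'line' ...
        dataContext.Records.InsertOnSubmit(record);

        // Submit the changes in thousand-row batches to keep the pending change set small.
        if (numRows % 1000 == 999)
            dataContext.SubmitChanges();
        numRows++;
    }
    dataContext.SubmitChanges();
}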

LINQ-to-SQL performance issue for mass inserts

I have identified a problem within my application; basically, one sub-routine prepares lots of data that is later inserted into my local database via a LINQ-to-SQL data context. However, even a relatively modest amount of new data (around 100,000 rows) takes a tremendous amount of time to be saved into the database when SubmitChanges() is called. Most of the time, the application actually has to save around 200,000 to 300,000 rows.
According to SQL Server's profiler, all generated queries look like the one below, and there's one for each item the application inserts.
exec sp_executesql N'INSERT INTO [dbo].[AdjectivesExpanded]([Adjective], [Genus], [Casus], [SingularOrPlural], [Kind], [Form])
VALUES (@p0, @p1, @p2, @p3, @p4, @p5)
SELECT CONVERT(BigInt,SCOPE_IDENTITY()) AS [value]',N'@p0 bigint,@p1 char(1),@p2 tinyint,@p3 bit,@p4 tinyint,@p5 nvarchar(4000)',@p0=2777,@p1='n',@p2=4,@p3=0,@p4=3,@p5=N'neugeborener'
Does anyone have an idea how to increase the performance of mass inserts with LINQ-to-SQL data contexts, ideally without getting rid of the strongly-typed DataContext and falling back to hand-written queries? Plus, there's little opportunity or room to tune the underlying database. If anything at all, I could disable integrity constraints, if that helps.
Are you doing something like this:
foreach (var adjective in adjectives) {
    dataContext.AdjectivesExpanded.InsertOnSubmit(adjective);
    dataContext.SubmitChanges();
}
Or:
foreach (var adjective in adjectives) {
    dataContext.AdjectivesExpanded.InsertOnSubmit(adjective);
}
dataContext.SubmitChanges();
If it is similar to the first, I would recommend changing it to something like the second. Each call to SubmitChanges is a look through all the tracked objects to see what has changed.
Either way, I'm not convinced that inserting that volume of items is a good idea for Linq-to-Sql because it has to generate and evaluate the SQL each time.
Could you script a stored procedure and add it as a DataContext method via the designer?
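Very roughly, that route would end up looking something like the sketch below; the InsertAdjectiveExpanded procedure/method name is invented, but the column names come from the profiled INSERT above.
// If dbo.InsertAdjectiveExpanded is scripted and dragged onto the LINQ-to-SQL
// designer, it becomes a method on the DataContext (the name is hypothetical here).
foreach (var adjective in adjectives) {
    dataContext.InsertAdjectiveExpanded(
        adjective.Adjective,
        adjective.Genus,
        adjective.Casus,
        adjective.SingularOrPlural,
        adjective.Kind,
        adjective.Form);
}
// Each call still round-trips to the server, so this mainly avoids the
// change-tracking overhead rather than the per-row insert cost.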
Have a look at the following page for a simple walk-through of how to change your code to use a Bulk Insert.
You just need to add the (provided) BulkInsert class to your code, make a couple of changes, and you'll see a huge improvement in performance.
Mikes Knowledge Base - BulkInserts with LINQ
An ORM is usually not a good idea for mass operations. I'd recommend an old-fashioned bulk insert to get the best performance.

Why is IEnumerable.Count() capped at 200?

Is there a limit to the rows that IEnumerable.Count() (or IQueryable.Count()) using LINQ to SQL? For whatever reason, if my query returns more than 200 records, I only get 200 from IEnumerable.Count(). I've even tried using IEnumerable.LongCount() (even though the number of results shouldn't be high enough to need it). I've also verified that calling COUNT on the database returns more than 200 records.
I've checked MSDN and tried Googling it, but to no avail.
Am I going crazy, or is there something that I'm missing somewhere? I suppose this isn't really a big deal (as the program is inserting the right number of records), but it'd be nice to be able to log the number of records transferred.
Could we see some sample code?
public IEnumerable<Irms_tx_modify_profile_ban> ExtractNewAdmits()
{
    var results = from a in _dc.Applications
                  select (Irms_tx_modify_profile_ban)new RMSProfile
                  {
                      //assign column names to property names
                  };

    //Don't know why, but Bad Things happen if you don't put the
    //OfType call here.
    IEnumerable<Irms_tx_modify_profile_ban> narrowed_results = results.OfType<Irms_tx_modify_profile_ban>();

    Program.log.InfoFormat("Read {0} records from Banner.", narrowed_results.Count());
    return narrowed_results;
}
The reason for the comment about bad things happening is the issues brought up in this thread. What I just found out is that if I call Count on narrowed_results (IEnumerable), it returns the right amount; if I call it on results (IQueryable), it returns just 200. I'll go ahead and accept Skeet's answer (since he mentioned the difference between IQueryable and IEnumerable), but if anyone is able to explain why this is happening, I'd like to hear it.
I've not heard of anything like that, and it does sound very odd.
The most obvious thing to check is what query is being sent to the database. Also, it matters a great deal whether you're calling Enumerable.Count() (i.e. on an IEnumerable<T>) or Queryable.Count() (i.e. on an IQueryable<T>). The former will be iterating through the actual rows in .NET code to retrieve the count; the latter will put the count into the query.
Could we see some sample code?
EDIT: Okay, so having seen the code:
When you didn't call OfType, it was executing the count at the SQL level. That should have been visible in the SQL logged, and should be reproducible with any other SQL tool.
I suspect you didn't really have to call OfType. You could have called AsEnumerable, or just declared results as IEnumerable<Irms_tx_modify_profile_ban>. The important thing is that the type of the variable decides the extension method to use - and thus where the count is executed.
It's worth noting that your current solution is really inefficient - it's fetching all the data, and counting it but ignoring everything but the count. It would be much better to get the count onto the server side - and while I know that doesn't work at the moment, I'm sure with a bit of analysis we can make it work :)
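To make that distinction concrete, here is a minimal sketch; _dc.Applications comes from the question, while the Application entity type name is an assumption:
// using System.Linq;
// Queryable.Count(): the variable is IQueryable<T>, so the count is translated
// into a SELECT COUNT(*) and executed on the server.
IQueryable<Application> queryable = _dc.Applications;            // entity type name assumed
int serverSideCount = queryable.Count();

// Enumerable.Count(): the variable is IEnumerable<T>, so the rows are streamed
// back and counted in .NET code instead.
IEnumerable<Application> enumerable = _dc.Applications.AsEnumerable();
int clientSideCount = enumerable.Count();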
