I work with VS2012 and Entity Framework.
I have a one-to-many table structure mapped in my EDMX.
var query = (
    from bm in this.Context.BilBillMasters.AsEnumerable()
    join g in
    (
        from c in this.Context.BilBillDetails.AsEnumerable()
        group c by new { c.BillID }
    )
    on bm.BillID equals (g == null ? 0 : g.Key.BillID) into bDG
    from billDetailGroup in bDG.DefaultIfEmpty()
    where bm.IsDeleted == false
        && (companyID == 0 || bm.CompanyID == companyID)
        && (userID == 0 || bm.CustomerID == userID)
    select new
    {
        bm.BillID,
        BillNo = bm.CustomCode,
        bm.BillDate,
        BillMonth = bm.MonthFrom,
        TransactionTypeID = bm.TransactionTypeID ?? 0,
        CustomerID = bm.CustomerID,
        Total = billDetailGroup.Sum(p => p.Amount), // group result
        bm.ReferenceID,
        bm.ReferenceTypeID
    }
);
This method takes close to 30 seconds to return its result on the first run.
I'm not sure what is wrong.
I also tried materializing the results as a List and calling ElementAt(0); that is just as slow.
As soon as you use AsEnumerable, your query stops being a "queryable". That means you're downloading the entire BilBillMasters and BilBillDetails tables and then processing them in your application, rather than on the SQL server. This is bound to be slow.
The solution, then, is obvious - don't use AsEnumerable. It moves processing from the SQL server (which has all the data, indexes, etc.) to your application server (which has neither, and must first pull all of the data from the DB server).
At the very least, you want to limit the amount of data downloaded as much as possible, e.g. filter the tables by CompanyID and CustomerID before calling AsEnumerable. However, I see no reason why this query couldn't be executed completely on the SQL server - that is usually the preferred solution, for many reasons.
Overall, it sounds as if you're using AsEnumerable as a fix for another problem, but it's almost certainly a bad solution - at least without filtering the data further before calling it.
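For illustration, here is a minimal sketch of the same query kept entirely in LINQ to Entities (table and column names taken from the question; the left-joined group is replaced with a correlated Sum, which EF can translate to SQL - untested):

// Sketch: no AsEnumerable, so EF composes a single SQL statement.
var query =
    from bm in this.Context.BilBillMasters
    where bm.IsDeleted == false
        && (companyID == 0 || bm.CompanyID == companyID)
        && (userID == 0 || bm.CustomerID == userID)
    select new
    {
        bm.BillID,
        BillNo = bm.CustomCode,
        bm.BillDate,
        BillMonth = bm.MonthFrom,
        TransactionTypeID = bm.TransactionTypeID ?? 0,
        bm.CustomerID,
        // Correlated subquery: the per-bill SUM runs on SQL Server.
        // The decimal? cast assumes Amount is a decimal column and
        // guards against bills that have no detail rows.
        Total = this.Context.BilBillDetails
                    .Where(d => d.BillID == bm.BillID)
                    .Sum(d => (decimal?)d.Amount),
        bm.ReferenceID,
        bm.ReferenceTypeID
    };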
Related
var recordsStates = from s in db.Record
                    join ss in db.Record_State
                        on s.Id equals ss.RecordId
                    join apct in db.RecordTypesContentMapping
                        on new { t1 = ss.RecordId } equals new { t1 = apct.ContentId } into ss_apct
                    from apct in ss_apct.DefaultIfEmpty()
                    where (apct.ContentSourceName == null || apct.ContentSourceName == nameof(Record))
                        && ss.isCurrent && ss.StateId == (int)RecordStateType.Publish
                        && !ss.isDeleated && !s.isDeleted
                        && (searchRecords.CategoryIds.Count == 0 || searchRecords.CategoryIds.Contains((int)s.CategoryId))
                        && (string.IsNullOrEmpty(searchRecords.RecordTitle) || (string.IsNullOrEmpty(s.RecordNameModified)
                            ? s.RecordName.Contains(searchRecords.RecordTitle)
                            : s.RecordNameModified.Contains(searchRecords.RecordTitle)))
Each table has around 1 million records.
The query takes around 7-8 seconds if RecordTitle is empty, and 4-5 seconds if it is not.
I tried applying nonclustered indexes on the title column (it's of type nvarchar(1000)) and whatnot.
Every table is related via a foreign key. At this point I don't know what makes it slow.
The problem is probably Cartesian explosion.
You can either improve the query's performance so that it translates to better SQL (as you should), or you can use a split query.
Note that a split query can cause consistency problems: if the data changes between the first SQL query finishing and the second one starting (e.g. because of another request), the second query will run against data that no longer matches what the first query returned.
Split query usage (MSDN)
Cartesian explosion
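For reference, a minimal sketch of opting a query into split-query mode in EF Core (AsSplitQuery is available from EF Core 5 onwards; the navigation property name here is hypothetical):

// Each Include is issued as its own SQL statement instead of one big join,
// avoiding the Cartesian explosion at the cost of extra round trips.
var records = db.Record
    .Include(r => r.States)   // hypothetical navigation property
    .AsSplitQuery()
    .ToList();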
I am simulating a join on a linked server through LINQ to SQL. My question is: it appears that LINQ to SQL is bringing all the rows of y.XStatuses into memory and then doing the join. If this is true, how do I keep all that work on the SQL server (and still do a cross-DataContext join)? If it's not true, what is going on that's eating all my RAM?
var x = new fooDataContext();
var y = new barDataContext();

var allXNotDeleted = (from w in x.CoolTable
                      where w.IsDeleted != false
                      select w).ToList(); // for our demo this returns 218 records

var allXWithCompleteStatus = (from notDeleted in allXNotDeleted
                              join s in y.XStatuses on notDeleted.StatusID equals s.StatusID
                              where s.StatusID == 1
                              select notDeleted).ToList(); // insert massive memory gobbler here

return allXWithCompleteStatus;
EDIT:
Trying to implement Kevin Babcock's idea:
using (var x = new fooDataContext())
using (var y = new barDataContext())
{
    var n = (from notDeleted in x.GetTable<CoolTable>()
             join z in y.GetTable<XStatus>() on notDeleted.StatusID equals z.StatusID
             where z.StatusID == 1 && notDeleted.IsDeleted != false
             select notDeleted).ToList();
}
This still throws a cross-context query exception.
It is not possible to perform a cross-DataContext query directly on the database.
Fetching one of the record sets into memory (via ToList()) forces the other side of the join to be processed in memory as well.
If you want everything to run on SQL Server, every entity has to live in the same DataContext.
I'd recommend not calling ToList on allXNotDeleted for a start. That will pull those records into memory, which will probably mean that you can't avoid pulling all the other data into memory when you perform your second query.
EDIT:
As an additional note if your tables are particularly big, and you only need data from a few columns, you could set Delay Loaded to True in your database object models for the columns you don't need.
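For illustration, this is roughly what the designer's "Delay Loaded = True" setting generates under the hood - the column is backed by System.Data.Linq.Link<T>, so its value is only fetched from the database on first access (the property and column names here are made up):

// Sketch of a delay-loaded LINQ to SQL column mapping (illustrative names).
private System.Data.Linq.Link<string> _Notes;

[Column(Storage = "_Notes", DbType = "NVarChar(MAX)")]
public string Notes
{
    get { return _Notes.Value; }   // triggers the deferred load on first access
    set { _Notes.Value = value; }
}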
EDIT2:
I have just noticed both queries come from different contexts. In that case I suggest you create a stored procedure and call that from one of the contexts. The sproc should be responsible for spanning the contexts.
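For illustration, a minimal sketch of that approach (GetNotDeletedWithCompleteStatus is a hypothetical stored procedure mapped onto fooDataContext through the DBML designer; the procedure itself would perform the cross-database join on the server, e.g. via the linked server):

using (var x = new fooDataContext())
{
    // Only the final result set crosses the wire; the join happens in the sproc.
    var results = x.GetNotDeletedWithCompleteStatus().ToList();
}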
Do not call ToList() on allXNotDeleted. This materializes those records in memory, which will cause the entire XStatuses table to also be materialized in memory to perform the join.
Try this:
using (var context = new DataContext(connectionString))
{
    var allXNotDeleted =
        from w in context.GetTable<CoolTable>()
        where w.IsDeleted != false
        select w;

    var allXWithCompleteStatus = (
        from notDeleted in allXNotDeleted
        join s in context.GetTable<XStatuses>()
            on notDeleted.StatusID equals s.StatusID
        where s.StatusID == 1
        select notDeleted)
        .ToList();

    return allXWithCompleteStatus;
}
This will only send a single query to SQL Server, and will only materialize the "notDeleted" values returned from the query. Don't forget to wrap your DataContext instance in a using statement so that Dispose() is properly called when it goes out of scope.
Also, did you mean to filter CoolTable with IsDeleted != false? This is equivalent to IsDeleted == true, which to me indicates that you want to join all deleted records (which the name of your variable, allXNotDeleted, seems to contradict).
EDIT: updated code to work with a single DataContext instance, which should eliminate the "query contains a reference to another DataContext" error. You will need to pass in the ConnectionString to the DataContext constructor if you're not using a derived DataContext class.
I am running this query:
List<TransRerocdByGeo> reports = (from p in db.SOMETABLE.ToList()
    where p.colID == prog && p.status != "some_string"
        && p.col_date < enddate && p.col_date > startdate
    group p by new
    {
        country = (p.some_integer_that_represents_an_id <= 0)
            ? "unknown"
            : (from f in db.A_LARGE_TABLE
               where f.ID == p.some_integer_that_represents_an_id
               select f.COUNTRIES_TABLE.COU_Name).FirstOrDefault(),
        p.status
    }
    into g
    select new TransRerocdByGeo
    {
        ColA = g.Sum(x => x.ColA),
        ColB = g.Sum(x => x.ColB),
        Percentage = (g.Sum(x => x.ColA) != null && g.Sum(x => x.ColA) != 0)
            ? (g.Sum(x => x.ColB) / g.Sum(x => x.ColA)) * 100
            : 0,
        Status = g.Key.status,
        Country = g.Key.country
    }).ToList();
A similar query in SQL against the same database runs in a few seconds, while this one takes about 30-60 seconds in the good case.
The table SOMETABLE contains about 10-60K rows,
and the table called A_LARGE_TABLE here contains about 10-20 million rows.
The column some_integer_that_represents_an_id is the ID in the large table, but it can also be 0 or -1, which then needs to map to the "unknown" value, so I cannot create a relationship (or can I? If so, please explain).
The COUNTRIES_TABLE contains 100-200 rows.
The colID and ID columns are identity columns.
Any suggestions?
You're calling ToList on SOMETABLE right at the start. This pulls the entire database table, with all rows and all columns, into memory, and then performs all of the subsequent operations via LINQ-to-Objects on that in-memory data structure.
Not only do you suffer the penalty of transferring far more information across the network than you need to (which is slow), but C# can't perform the operations nearly as efficiently as a database. That's partly because it loses access to any indexes, database caching, and cached compiled queries; it isn't as efficient at dealing with data sets that large to begin with; and it can't apply the higher-level optimizations of the query itself that databases tend to do.
Next, you have a query inside your GroupBy clause (from f in db.A_LARGE_TABLE where [...]) that is executed for every row in the sequence. If the entire query were evaluated at the database level, that could potentially be optimized; and even if it weren't, you wouldn't be crossing the network (which is quite slow) for each record.
from p in db.SOMETABLE.ToList()
This basically says "get every record from SOMETABLE and put it in a List", without filtering at all. This is probably your problem.
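As a sketch of the direction both answers point in: drop the ToList() so the whole statement is translated to SQL, and express the per-row lookup as a left join instead of a nested query (identifiers taken from the question; the Percentage column is omitted for brevity; untested):

// Left-join SOMETABLE to A_LARGE_TABLE so the database resolves the country
// name, falling back to "unknown" for ids <= 0 that match no row.
var reports = (from p in db.SOMETABLE
               where p.colID == prog && p.status != "some_string"
                   && p.col_date < enddate && p.col_date > startdate
               join f in db.A_LARGE_TABLE
                   on p.some_integer_that_represents_an_id equals f.ID into lj
               from f in lj.DefaultIfEmpty()
               group new { p, f } by new
               {
                   country = f == null ? "unknown" : f.COUNTRIES_TABLE.COU_Name,
                   p.status
               } into g
               select new TransRerocdByGeo
               {
                   ColA = g.Sum(x => x.p.ColA),
                   ColB = g.Sum(x => x.p.ColB),
                   Status = g.Key.status,
                   Country = g.Key.country
               }).ToList();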
I have the following code, which is misbehaving:
TPM_USER user = UserManager.GetUser(context, UserId);
var tasks = (from t in user.TPM_TASK
where t.STAGEID > 0 && t.STAGEID != 3 && t.TPM_PROJECTVERSION.STAGEID <= 10
orderby t.DUEDATE, t.PROJECTID
select t);
The first line, UserManager.GetUser, just does a simple lookup in the database to get the correct TPM_USER record. However, the second line causes all sorts of SQL chaos.
First off, it's executing two SQL statements here. The first one grabs every single row in TPM_TASK linked to that user - sometimes tens of thousands of rows:
SELECT
-- Columns
FROM TPMDBO.TPM_USERTASKS "Extent1"
INNER JOIN TPMDBO.TPM_TASK "Extent2" ON "Extent1".TASKID = "Extent2".TASKID
WHERE "Extent1".USERID = :EntityKeyValue1
This query takes about 18 seconds on users with lots of tasks. I would expect the WHERE clause to contain the STAGEID filters too, which would remove the majority of the rows.
Next, it seems to execute a new query for each TPM_PROJECTVERSION pair in the list above:
SELECT
-- Columns
FROM TPMDBO.TPM_PROJECTVERSION "Extent1"
WHERE ("Extent1".PROJECTID = :EntityKeyValue1) AND ("Extent1".VERSIONID = :EntityKeyValue2)
Even though this query is fast, it's executed several hundred times if the user has tasks in a whole bunch of projects.
The query I would like to generate would look something like:
SELECT
-- Columns
FROM TPMDBO.TPM_USERTASKS "Extent1"
INNER JOIN TPMDBO.TPM_TASK "Extent2" ON "Extent1".TASKID = "Extent2".TASKID
INNER JOIN TPMDBO.TPM_PROJECTVERSION "Extent3" ON "Extent2".PROJECTID = "Extent3".PROJECTID AND "Extent2".VERSIONID = "Extent3".VERSIONID
WHERE "Extent1".USERID = 5 and "Extent2".STAGEID > 0 and "Extent2".STAGEID <> 3 and "Extent3".STAGEID <= 10
The query above would run in about 1 second. Normally, I could specify that JOIN using the Include method. However, this doesn't seem to work on properties. In other words, I can't do:
from t in user.TPM_TASK.Include("TPM_PROJECTVERSION")
Is there any way to optimize this LINQ statement? I'm using .NET4 and Oracle as the backend DB.
Solution:
This solution is based on Kirk's suggestions below, and works since context.TPM_USERTASK cannot be queried directly:
var tasks = (from t in context.TPM_TASK.Include("TPM_PROJECTVERSION")
where t.TPM_USER.Any(y => y.USERID == UserId) &&
t.STAGEID > 0 && t.STAGEID != 3 && t.TPM_PROJECTVERSION.STAGEID <= 10
orderby t.DUEDATE, t.PROJECTID
select t);
It does result in a nested SELECT rather than querying TPM_USERTASK directly, but it seems fairly efficient nonetheless.
Yes, you are pulling down a specific user, and then referencing the relationship TPM_TASK. That it is pulling down every task attached to that user is exactly what it's supposed to be doing. There's no ORM SQL translation when you're doing it this way. You're getting a user, then getting all his tasks into memory, and then performing some client-side filtering. This is all done using lazy-loading, so the SQL is going to be exceptionally inefficient as it can't batch anything up.
Instead, rewrite your query to go directly against TPM_TASK and filter against the user:
var tasks = (from t in context.TPM_TASK
where t.USERID == user.UserId && t.STAGEID > 0 && t.STAGEID != 3 && t.TPM_PROJECTVERSION.STAGEID <= 10
orderby t.DUEDATE, t.PROJECTID
select t);
Note how we're checking t.USERID == user.UserId. This produces the same effect as user.TPM_TASK but now all the heavy lifting is done by the database rather than in memory.
Using LINQ to MySQL
MySQL TABLE Definition
ID binary(16) PK
UtcTriggerTime datetime NOT NULL
PersonID binary(16) NOT NULL FK
Status int(11) NOT NULL
I have an array of thousands of PersonIDs (GUIDs), and for each PersonID I would like to pick the matching records from the table with the following criteria:
UtcTriggerTime >= PREDEFINED_DATE_TIME (e.g. UtcNow - 30 days)
AND
(Status = 1 OR Status = 2)
I am currently using a foreach loop:

foreach (var personID in personIDsArray)
{
    var qryResult = (from a in AlertObjects.AlertsTriggered
                     where a.PersonID == personID &&
                           (a.Status == 1 || a.Status == 2) &&
                           a.UtcTriggerTime >= PREDEFINED_DATE_TIME
                     select a).ToArray();
}
What are the possible options to optimise this for performance, if any?
I tried putting an index on (UtcTriggerTime, PersonID, Status) and then using the array of PersonIDs to do it in one query, as follows, but it was even slower - which makes sense when I think about it:
var qryResult = (from a in AlertObjects.AlertsTriggered
                 where personIDsArray.Contains(a.PersonID) &&
                       (a.Status == 1 || a.Status == 2) &&
                       a.UtcTriggerTime >= PREDEFINED_DATE_TIME
                 group a by a.PersonID into alerts
                 select alerts).ToArray();
It seems to me that you are dealing with the typical SELECT N+1 problem, caused by this part:

group a by a.PersonID into alerts
select alerts
Could you look at the generated SQL and see what it looks like?
Also, there is no need to include Status in the index if it has only a few distinct values, because the performance gain will be minimal.
If my guess is right, you can look at these questions to see how the problem can be handled:
How to Detect Select n+1 problems in Linq to SQL?
http://www.west-wind.com/weblog/posts/2009/Oct/12/LINQ-to-SQL-Lazy-Loading-and-Prefetching
I'm not familiar with MySQL, but it seems to me that this link could be helpful in diagnosing which queries are slow.
http://www.electrictoolbox.com/show-running-queries-mysql/
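If the grouping is indeed what triggers one query per person, one alternative sketch is to fetch everything in a single round trip and group in memory (identifiers from the question; untested - and since the question reports the Contains version being slower, measure this against real data):

// One SELECT for all matching rows; the grouping then happens client-side,
// so no per-person queries are issued.
var rows = (from a in AlertObjects.AlertsTriggered
            where personIDsArray.Contains(a.PersonID) &&
                  (a.Status == 1 || a.Status == 2) &&
                  a.UtcTriggerTime >= PREDEFINED_DATE_TIME
            select a).ToArray();

var byPerson = rows.GroupBy(a => a.PersonID)
                   .ToDictionary(g => g.Key, g => g.ToArray());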