Using LINQ to MySQL
MySQL TABLE Definition
ID binary(16) PK
UtcTriggerTime datetime NOT NULL
PersonID binary(16) NOT NULL FK
Status int(11) NOT NULL
I have an array of thousands of PersonIDs (Guids), and for each PersonID I would like to pick the matching records from the table using the following criteria:
UtcTriggerTime >= PREDEFINED_DATE_TIME (e.g. UtcNow - 30days)
AND
Status=1 OR Status=2
I am currently using a loop:
foreach(var personID in personIDsArray){
var qryResult = (from a in AlertObjects.AlertsTriggered
where a.PersonID == personID &&
(a.Status == 1 || a.Status == 2) &&
a.UtcTriggerTime >= PREDEFINED_DATE_TIME
select a).ToArray();
}
What are the options for optimising this for performance, if there are any?
I tried putting an index on (UtcTriggerTime, PersonID, Status) and then used the array of PersonIDs to do it in one query as follows, but it was even slower, which makes sense now that I think about it:
var qryResult = (from a in AlertObjects.AlertsTriggered
where personIDsArray.Contains(a.PersonID) &&
(a.Status == 1 || a.Status == 2) &&
a.UtcTriggerTime >= PREDEFINED_DATE_TIME
group a by a.PersonID into alerts
select alerts).ToArray();
It seems to me that you are dealing with a typical Select N+1 problem, which is caused by the
group a by a.PersonID into alerts
select alerts
part.
Could you look at the generated SQL and post what it looks like?
Also, there is no need to include Status in the index if it has only a few distinct values, because the performance gain will be minimal.
If my guess is right, you can look at these questions to see how the problem can be handled:
How to Detect Select n+1 problems in Linq to SQL?
http://www.west-wind.com/weblog/posts/2009/Oct/12/LINQ-to-SQL-Lazy-Loading-and-Prefetching
I'm not familiar with MySQL, but it seems to me that this link could be helpful in diagnosing which queries are slow:
http://www.electrictoolbox.com/show-running-queries-mysql/
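If the grouped projection does turn out to be the cause, one option is to select flat rows in a single filtered query and group them on the client. The following is only a sketch under stated assumptions: an in-memory list stands in for the AlertsTriggered table, and the Alert class and AlertBatching helper are hypothetical names that do not come from the question. With a real provider, only the Where part would be sent to the database, and the generated SQL should still be checked.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity mirroring the table definition in the question.
public sealed class Alert
{
    public Guid ID { get; set; }
    public DateTime UtcTriggerTime { get; set; }
    public Guid PersonID { get; set; }
    public int Status { get; set; }
}

public static class AlertBatching
{
    // Filters flat rows first, then groups them on the client with ToLookup,
    // so no additional query is needed per group.
    public static ILookup<Guid, Alert> RecentByPerson(
        IEnumerable<Alert> alerts, IEnumerable<Guid> personIds, DateTime cutoffUtc)
    {
        var wanted = new HashSet<Guid>(personIds);
        return alerts
            .Where(a => wanted.Contains(a.PersonID)
                        && (a.Status == 1 || a.Status == 2)
                        && a.UtcTriggerTime >= cutoffUtc)
            .ToLookup(a => a.PersonID);
    }

    public static void Main()
    {
        var now = DateTime.UtcNow;
        var p1 = Guid.NewGuid();
        var p2 = Guid.NewGuid();
        // In-memory stand-in for the table (assumption for illustration).
        var table = new List<Alert>
        {
            new Alert { ID = Guid.NewGuid(), PersonID = p1, Status = 1, UtcTriggerTime = now.AddDays(-1) },
            new Alert { ID = Guid.NewGuid(), PersonID = p1, Status = 3, UtcTriggerTime = now.AddDays(-1) },
            new Alert { ID = Guid.NewGuid(), PersonID = p2, Status = 2, UtcTriggerTime = now.AddDays(-60) },
        };

        var byPerson = RecentByPerson(table, new[] { p1, p2 }, now.AddDays(-30));
        Console.WriteLine(byPerson[p1].Count()); // 1 (the Status 3 row is filtered out)
        Console.WriteLine(byPerson[p2].Count()); // 0 (the only row is older than the cutoff)
    }
}
```

A lookup returns an empty sequence for a PersonID with no matching rows, so callers do not need a separate existence check.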
I am working with Entity Framework in Visual Studio 2012.
I have a one-to-many table mapping in my edmx.
var query = (
from bm in this.Context.BilBillMasters.AsEnumerable ()
join g in
(
from c in this.Context.BilBillDetails.AsEnumerable ()
group c by new { c.BillID }
)
on bm.BillID equals (g == null ? 0 : g.Key.BillID) into bDG
from billDetailGroup in bDG.DefaultIfEmpty()
where bm.IsDeleted == false
&& (companyID == 0 || bm.CompanyID == companyID)
&& (userID == 0 || bm.CustomerID == userID)
select new
{
bm.BillID,
BillNo = bm.CustomCode,
bm.BillDate,
BillMonth = bm.MonthFrom,
TransactionTypeID = bm.TransactionTypeID ?? 0,
CustomerID = bm.CustomerID,
Total = billDetailGroup.Sum(p => p.Amount),//group result
bm.ReferenceID,
bm.ReferenceTypeID
}
);
This method takes close to 30 seconds to return the result on the first run.
I am not sure what is wrong.
I also tried getting a List of results and calling ElementAt(0); that is slow as well.
As soon as you use AsEnumerable, your query stops being a "queryable". That means you are downloading the whole BilBillMasters and BilBillDetails tables and then processing them in your application rather than on the SQL server. This is bound to be slow.
The obvious solution is not to use AsEnumerable. It moves processing from the SQL server (which has all the data, the indexes and so on) to your application server (which has neither, and has to fetch all of the data from the DB server).
At the very least, you want to limit the amount of data downloaded as much as possible, i.e. filter the tables by CompanyID and CustomerID before using AsEnumerable. However, I see no reason why the query couldn't be executed completely on the SQL server, which is usually the preferred solution for many reasons.
Overall, it sounds as if you're using the AsEnumerable as a fix to another problem, but it's almost definitely a bad solution - at least without further filtering of the data before using AsEnumerable.
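As a rough illustration of keeping the whole query composable, the sketch below filters the masters and computes each total with a correlated sub-query, without calling AsEnumerable first. It is an assumption-laden sketch rather than the asker's model: BillMaster, BillDetail and BillQueries are hypothetical names, and in-memory lists wrapped with AsQueryable() stand in for the context sets. The cast to decimal? with ?? 0m is there because a SQL SUM over zero rows yields NULL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entities standing in for BilBillMasters and BilBillDetails.
public sealed class BillMaster
{
    public int BillID { get; set; }
    public int CompanyID { get; set; }
    public int CustomerID { get; set; }
    public bool IsDeleted { get; set; }
}

public sealed class BillDetail
{
    public int BillID { get; set; }
    public decimal Amount { get; set; }
}

public static class BillQueries
{
    // Builds one IQueryable expression: filter the masters, then sum the
    // matching details per bill. Nothing is enumerated until ToDictionary.
    public static Dictionary<int, decimal> Totals(
        IQueryable<BillMaster> masters, IQueryable<BillDetail> details,
        int companyID, int userID)
    {
        var query =
            from bm in masters
            where !bm.IsDeleted
                  && (companyID == 0 || bm.CompanyID == companyID)
                  && (userID == 0 || bm.CustomerID == userID)
            select new
            {
                bm.BillID,
                // Bills without details get a total of 0 instead of null.
                Total = details.Where(d => d.BillID == bm.BillID)
                               .Sum(d => (decimal?)d.Amount) ?? 0m
            };
        return query.ToDictionary(x => x.BillID, x => x.Total);
    }

    public static void Main()
    {
        // In-memory stand-ins for the database tables (assumption).
        var masters = new List<BillMaster>
        {
            new BillMaster { BillID = 1, CompanyID = 7, CustomerID = 3 },
            new BillMaster { BillID = 2, CompanyID = 7, CustomerID = 3 },
            new BillMaster { BillID = 3, CompanyID = 7, CustomerID = 3, IsDeleted = true },
        }.AsQueryable();
        var details = new List<BillDetail>
        {
            new BillDetail { BillID = 1, Amount = 10m },
            new BillDetail { BillID = 1, Amount = 5.5m },
        }.AsQueryable();

        foreach (var pair in Totals(masters, details, 7, 0))
            Console.WriteLine(pair.Key + ": " + pair.Value);
    }
}
```

Whether a given provider translates this exact shape into a single statement should be confirmed by inspecting the generated SQL.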
I have FinancialCompliances and Compliance tables. The query below is meant to fetch the single latest row. The problem is that the table is now empty, because I deleted all of its rows, yet the query still returns one old row.
var Compliance = (from c in datamodel.Compliances
join f in datamodel.FinancialCompliances on c.ComplianceId equals f.ComplianceId
where (c.VerifierId == userId || c.OwnerId == userId || c.UserId == userId) && (f.ComplianceId == c.ComplianceId)
orderby (f.AddedDate)
select f);
financialCompliance = Compliance.ToList().LastOrDefault();
What is the problem?
It sounds like you may be deleting your objects in the datamodel instance but not saving the changes and resetting the datamodel, thereby keeping the old records still in the context even if they aren't in the database. To be safe, try using a new context for this query.
Also, you may want to consider modifying the query to order the results descending and select the first one, rather than ordering them ascending and taking only the last one:
var Compliance = (from c in datamodel.Compliances
join f in datamodel.FinancialCompliances on c.ComplianceId equals f.ComplianceId
where (c.VerifierId == userId || c.OwnerId == userId || c.UserId == userId) && (f.ComplianceId == c.ComplianceId)
orderby (f.AddedDate) descending
select f);
financialCompliance = Compliance.FirstOrDefault();
Perhaps LastOrDefault() is returning the default value. Could you confirm that the object returned actually contains real data? I doubt that it does.
One obvious problem with your code is that you are calling ToList() before LastOrDefault(). This loads all matching rows from storage into your application and context, and only then picks the last object from the result. I suspect that this may cause some problems.
Try dropping .ToList() and calling LastOrDefault() directly. If the provider cannot translate LastOrDefault() (LINQ to SQL and LINQ to Entities both reject it), order descending and call FirstOrDefault() instead.
I have the following code, which is misbehaving:
TPM_USER user = UserManager.GetUser(context, UserId);
var tasks = (from t in user.TPM_TASK
where t.STAGEID > 0 && t.STAGEID != 3 && t.TPM_PROJECTVERSION.STAGEID <= 10
orderby t.DUEDATE, t.PROJECTID
select t);
The first line, UserManager.GetUser, just does a simple lookup in the database to get the correct TPM_USER record. However, the second line causes all sorts of SQL chaos.
First off, it's executing two SQL statements here. The first one grabs every single row in TPM_TASK which is linked to that user, which is sometimes tens of thousands of rows:
SELECT
-- Columns
FROM TPMDBO.TPM_USERTASKS "Extent1"
INNER JOIN TPMDBO.TPM_TASK "Extent2" ON "Extent1".TASKID = "Extent2".TASKID
WHERE "Extent1".USERID = :EntityKeyValue1
This query takes about 18 seconds on users with lots of tasks. I would expect the WHERE clause to contain the STAGEID filters too, which would remove the majority of the rows.
Next, it seems to execute a new query for each TPM_PROJECTVERSION pair in the list above:
SELECT
-- Columns
FROM TPMDBO.TPM_PROJECTVERSION "Extent1"
WHERE ("Extent1".PROJECTID = :EntityKeyValue1) AND ("Extent1".VERSIONID = :EntityKeyValue2)
Even though this query is fast, it's executed several hundred times if the user has tasks in a whole bunch of projects.
The query I would like to generate would look something like:
SELECT
-- Columns
FROM TPMDBO.TPM_USERTASKS "Extent1"
INNER JOIN TPMDBO.TPM_TASK "Extent2" ON "Extent1".TASKID = "Extent2".TASKID
INNER JOIN TPMDBO.TPM_PROJECTVERSION "Extent3" ON "Extent2".PROJECTID = "Extent3".PROJECTID AND "Extent2".VERSIONID = "Extent3".VERSIONID
WHERE "Extent1".USERID = 5 and "Extent2".STAGEID > 0 and "Extent2".STAGEID <> 3 and "Extent3".STAGEID <= 10
The query above would run in about 1 second. Normally, I could specify that JOIN using the Include method. However, this doesn't seem to work on properties. In other words, I can't do:
from t in user.TPM_TASK.Include("TPM_PROJECTVERSION")
Is there any way to optimize this LINQ statement? I'm using .NET4 and Oracle as the backend DB.
Solution:
This solution is based on Kirk's suggestions below, and works since context.TPM_USERTASK cannot be queried directly:
var tasks = (from t in context.TPM_TASK.Include("TPM_PROJECTVERSION")
where t.TPM_USER.Any(y => y.USERID == UserId) &&
t.STAGEID > 0 && t.STAGEID != 3 && t.TPM_PROJECTVERSION.STAGEID <= 10
orderby t.DUEDATE, t.PROJECTID
select t);
It does result in a nested SELECT rather than querying TPM_USERTASK directly, but it seems fairly efficient nonetheless.
Yes, you are pulling down a specific user, and then referencing the relationship TPM_TASK. That it is pulling down every task attached to that user is exactly what it's supposed to be doing. There's no ORM SQL translation when you're doing it this way. You're getting a user, then getting all his tasks into memory, and then performing some client-side filtering. This is all done using lazy-loading, so the SQL is going to be exceptionally inefficient as it can't batch anything up.
Instead, rewrite your query to go directly against TPM_TASK and filter against the user:
var tasks = (from t in context.TPM_TASK
where t.USERID == user.UserId && t.STAGEID > 0 && t.STAGEID != 3 && t.TPM_PROJECTVERSION.STAGEID <= 10
orderby t.DUEDATE, t.PROJECTID
select t);
Note how we're checking t.USERID == user.UserId. This produces the same effect as user.TPM_TASK but now all the heavy lifting is done by the database rather than in memory.
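The difference comes down to which Where overload the compiler binds to. The sketch below is a minimal illustration, assuming a plain List as a stand-in for an already-loaded navigation collection such as user.TPM_TASK, and AsQueryable() as a stand-in for a context set such as context.TPM_TASK. TaskRow and QueryBindingDemo are hypothetical names. It only shows which of the two queries still carries an expression tree that a provider could translate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type; the real entity would be TPM_TASK.
public sealed class TaskRow
{
    public int UserId { get; set; }
    public int StageId { get; set; }
}

public static class QueryBindingDemo
{
    // True when the query is still an IQueryable (expression tree available
    // for translation); false when it is plain in-memory iteration.
    public static bool IsQueryable(IEnumerable<TaskRow> query)
    {
        return query is IQueryable<TaskRow>;
    }

    public static void Main()
    {
        var rows = new List<TaskRow>
        {
            new TaskRow { UserId = 5, StageId = 1 },
            new TaskRow { UserId = 5, StageId = 3 },
        };

        // Like user.TPM_TASK: the source is a materialised collection, so
        // "where" binds to Enumerable.Where and runs on the client.
        var inMemory = from t in rows
                       where t.StageId > 0 && t.StageId != 3
                       select t;

        // Like context.TPM_TASK: the source is IQueryable, so "where" binds
        // to Queryable.Where and the filter stays part of the query.
        var composed = from t in rows.AsQueryable()
                       where t.UserId == 5 && t.StageId > 0 && t.StageId != 3
                       select t;

        Console.WriteLine(IsQueryable(inMemory)); // False
        Console.WriteLine(IsQueryable(composed)); // True
    }
}
```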
The Any() LINQ function seems to load all of the entity's columns even though they're not needed.
The following code:
if(Session.Query<Project>().Any(p=>p.ID == projectID && p.ProjectOwner.Responsible == CurrentUserID))
// Current user is the responsible for this project
Generates the following SQL:
select TOP (1) project0_.ProjectID as ProjectID7_,
project0_.DateCreated as DateCrea2_7_,
project0_.Type as Type7_,
project0_.ProjectOwner_FK as ProjectOy8_7_,
project0_.Address_FK as Address9_7_,
**[Snip -- the statement selects all of Project's columns]**
from [Project] project0_
inner join [OrganizationProject] organizati1_
on project0_.ProjectOwner_FK = organizati1_.OrganizationProjectID
where project0_.ProjectID = 1 /* #p0 */
and organizati1_.Responsible_FK = 1 /* #p1 */
However, the following code:
if(Context.Projects.Where(p=>p.ID == projectID && p.ProjectOwner.Responsible == CurrentUserID).Count() == 1)
// Current user is the responsible for this project
Generates the following SQL, which is what is expected:
select cast(count(*) as INT) as col_0_0_
from [Project] project0_
inner join [OrganizationProject] organizati1_
on project0_.ProjectOwner_FK = organizati1_.OrganizationProjectID
where project0_.ProjectID = 1 /* #p0 */
and organizati1_.Responsible_FK = 1 /* #p1 */
The Count() method does what is expected, but it is a bit less straightforward.
Is the Any() behavior considered normal or is it a bug? It doesn't seem optimal to me, but maybe loading the entity isn't really slower than asking SQL to return the count?
In light of this, what is considered to be the best way to test a condition in Linq to NHibernate?
Interesting. Did you try:
if(Session.Query<Project>()
.FirstOrDefault(p=>p.ID == projectID
&& p.ProjectOwner.Responsible == CurrentUserID) != null)
I think it would be faster than your current options. It doesn't have to check every item (as Count does), and it doesn't fetch all the data.
Something like this should create a query with only a single column:
if(Session
.Query<Project>()
.Where(p=>p.ID == projectID && p.ProjectOwner.Responsible == CurrentUserID)
.Select(p=>new { pID = p.ID })
.Any())
{
...
}
Projection to an anonymous type allows NHibernate to fetch only the specified column (ID, for example).
But it is usually more suitable when there is a significant number of rows to retrieve.
Using LINQ to Entities, I am connecting to a database in which payments have a many-to-many relationship with jobs. This is achieved via an allocs table. I want a list box showing all the jobs, with a column called due price that sums all the payment allocations for the job and subtracts that from the job price. I am using the LINQ to Entities statement below. The problem is that if a job has no allocations, the sum is null and therefore the due price is empty. What I really want is for the due price to be the job price when there are no allocations, but I cannot think of a way around this. Please help before I finally go insane :-(
var jobs = from j in data.jobs
where j.property.customer.id == customerid
&& j.completed != null
select new
{
j.id,
j.price,
dueprice = j.price - ( from a in data.allocs
where a.job.id == j.id
select a.amount ).Sum(),
lineone = j.property.lineone,
postcode = j.property.postcode,
jobtype = j.jobtype.name,
j.completed
};
You can also use the Null Coalescing operator to simplify code like this:
dueprice = j.price - (( from a in data.allocs
where a.job.id == j.id
select a.amount ).Sum() ?? 0)
This operator returns the first value which is not null, so myNum ?? 0 will return myNum if it is not null, and if it is null the operator will return 0.
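To see the operator in isolation, here is a small sketch that takes the allocation sum as a nullable value, which is what the query yields when a job has no allocations. DuePriceDemo and its parameter names are hypothetical and only serve the illustration.

```csharp
using System;

public static class DuePriceDemo
{
    // allocatedSum is null when the sum ran over zero allocation rows;
    // in that case the due price is simply the job price.
    public static decimal DuePrice(decimal price, decimal? allocatedSum)
    {
        return price - (allocatedSum ?? 0m);
    }

    public static void Main()
    {
        Console.WriteLine(DuePrice(100m, null)); // 100
        Console.WriteLine(DuePrice(100m, 30m));  // 70
    }
}
```

Note that the in-memory Enumerable.Sum returns 0 for an empty sequence, so the null case only shows up when the sum is evaluated by the database.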
Alternatively, append a single zero element before summing:
select a.amount ).Concat(/* array with one zero element */).Sum()
I actually found an answer myself. I will try Oleg's as well, as it seems a little more precise, but I thought of doing this:
dueprice = j.price - (( from a in data.allocs
where a.job.id == j.id
select a.amount ).Sum())==null ? j.price:
//sets the due price to the normal sum if it has allocs
j.price - ( from a in data.allocs
where a.job.id == j.id
select a.amount ).Sum(),
However, I like the idea of adding only one more line of code instead of my several (plus mine repeats code). I will try this and let you all know. Thanks for the responses.