Speed up the linq group by statement - c#

I have a table like this
UserID  Year  EffectiveDate  Type  SpecialExpiryDate
1       2015  7/1/2014       A
1       2016  7/1/2015       B     10/1/2015
There is no ExpiryDate column in the table because each record is only valid for one year, so the expiry date can be calculated by adding a year to the effective date.
The result I want to get looks like this (the current year's effective date and the next year's expiry date):
UserID  EffectiveDate  ExpiryDate
1       7/1/2014       7/1/2016
If the user's type is B, there is a special expiry date, so for this person the result will be:
UserID  EffectiveDate  ExpiryDate
1       7/1/2014       10/1/2015
Here is the code I wrote
var result = db.Table1
    .Where(x => x.Year >= 2015 && (x.Type == "A" || x.Type == "B"))
    .GroupBy(y => y.UserID)
    .OrderByDescending(x => x.FirstOrDefault().Year)
    .Select(t => new
    {
        ID = t.Key,
        Type = t.FirstOrDefault().Type,
        EffectiveDate = t.FirstOrDefault().EffectiveDate,
        ExpiryDate = t.FirstOrDefault().SpecialExpiryDate != null ? t.FirstOrDefault().SpecialExpiryDate : (t.Count() >= 2 ? NextExpiryDate : CurrentExpiryDate)
    }
    );
The code produces the result I need, but the result set has about 10,000 records and the query takes about 5 to 6 seconds. The project is a web search API, so I want to speed it up. Is there a better way to write the query?
Edit
Sorry, I made a mistake. In the select clause it should be
EffectiveDate = t.LastOrDefault().EffectiveDate
but the LINQ provider does not support translating LastOrDefault to SQL, which causes a new problem: what is the easiest way to get the second item of the group?

You could generate the calculated data on the fly, using a view in your database.
Something like this (a sketch in T-SQL; the date function differs between databases):
CREATE VIEW vwUsers AS
SELECT
    UserID,
    Year,
    EffectiveDate,
    DATEADD(year, 1, EffectiveDate) AS ExpiryDate, -- <-- calculated column
    Type,
    SpecialExpiryDate
FROM
    tblUsers
And just connect your LINQ query to that.

Try this:
var result =
    db
        .Table1
        .Where(x => x.Year >= 2015 && (x.Type == "A" || x.Type == "B"))
        .GroupBy(y => y.UserID)
        .SelectMany(y => y.Take(1), (y, z) => new
        {
            ID = y.Key,
            z.Type,
            z.EffectiveDate,
            ExpiryDate = z.SpecialExpiryDate != null
                ? z.SpecialExpiryDate
                : (y.Count() >= 2 ? NextExpiryDate : CurrentExpiryDate),
            z.Year,
        })
        .OrderByDescending(x => x.Year);
The .SelectMany(y => y.Take(1), ...) call effectively does the .FirstOrDefault() part of your code. Doing this once, rather than once per property, may improve the speed considerably.
In a test I performed using a similarly structured query I got these sub-queries being run when using your approach:
SELECT t0.increment_id
FROM sales_flat_order AS t0
GROUP BY t0.increment_id
SELECT t0.hidden_tax_amount
FROM sales_flat_order AS t0
WHERE ((t0.increment_id IS NULL AND @n0 IS NULL) OR (t0.increment_id = @n0))
LIMIT 0, 1
-- n0 = [100000001]
SELECT t0.customer_email
FROM sales_flat_order AS t0
WHERE ((t0.increment_id IS NULL AND @n0 IS NULL) OR (t0.increment_id = @n0))
LIMIT 0, 1
-- n0 = [100000001]
SELECT t0.hidden_tax_amount
FROM sales_flat_order AS t0
WHERE ((t0.increment_id IS NULL AND @n0 IS NULL) OR (t0.increment_id = @n0))
LIMIT 0, 1
-- n0 = [100000002]
SELECT t0.customer_email
FROM sales_flat_order AS t0
WHERE ((t0.increment_id IS NULL AND @n0 IS NULL) OR (t0.increment_id = @n0))
LIMIT 0, 1
-- n0 = [100000002]
(This continued on for two sub-queries per record number.)
If I ran my approach I got this single query:
SELECT t0.increment_id, t1.hidden_tax_amount, t1.customer_email
FROM (
SELECT t2.increment_id
FROM sales_flat_order AS t2
GROUP BY t2.increment_id
) AS t0
CROSS APPLY (
SELECT t3.customer_email, t3.hidden_tax_amount
FROM sales_flat_order AS t3
WHERE ((t3.increment_id IS NULL AND t0.increment_id IS NULL) OR (t3.increment_id = t0.increment_id))
LIMIT 0, 1
) AS t1
My approach should be much faster.
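To make the intended result concrete, here is a minimal in-memory sketch (LINQ to Objects, written as top-level statements for C# 9 or later). It is not the asker's database query: the tuples stand in for the entity class, the second user is made-up sample data, and it assumes the expiry date comes from the latest row of each user (its SpecialExpiryDate if set, otherwise its EffectiveDate plus one year). Whether a given LINQ provider can translate the same shape to SQL is not guaranteed.

```csharp
using System;
using System.Linq;

// Sample rows mirroring the question's table; user 2 is hypothetical extra data.
var rows = new[]
{
    (UserID: 1, Year: 2015, EffectiveDate: new DateTime(2014, 7, 1), Type: "A", SpecialExpiryDate: (DateTime?)null),
    (UserID: 1, Year: 2016, EffectiveDate: new DateTime(2015, 7, 1), Type: "B", SpecialExpiryDate: (DateTime?)new DateTime(2015, 10, 1)),
    (UserID: 2, Year: 2015, EffectiveDate: new DateTime(2014, 7, 1), Type: "A", SpecialExpiryDate: (DateTime?)null),
    (UserID: 2, Year: 2016, EffectiveDate: new DateTime(2015, 7, 1), Type: "A", SpecialExpiryDate: (DateTime?)null),
};

var result = rows
    .Where(r => r.Year >= 2015 && (r.Type == "A" || r.Type == "B"))
    .GroupBy(r => r.UserID)
    .Select(g =>
    {
        // Order each group once, then read the earliest and latest rows.
        var ordered = g.OrderBy(r => r.Year).ToList();
        var first = ordered.First();
        var last = ordered.Last();
        return new
        {
            ID = g.Key,
            EffectiveDate = first.EffectiveDate,
            // Assumption: a special expiry date on the latest row wins;
            // otherwise the latest effective date plus one year is used.
            ExpiryDate = last.SpecialExpiryDate ?? last.EffectiveDate.AddYears(1),
        };
    })
    .ToList();

foreach (var r in result)
{
    Console.WriteLine($"{r.ID}: {r.EffectiveDate:d} to {r.ExpiryDate:d}");
}
```

Under these assumptions user 1 gets 7/1/2014 with the special expiry date 10/1/2015, and user 2 gets 7/1/2014 with 7/1/2016, which matches the two result tables in the question.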

Related

Complex Linq Query Update as DateTime

There are two related tables, A and B. I want to write a LINQ query that updates the Status value of a row in table A when every related row in table B (linked through the AID column) has a Date less than or equal to today's date.
For example, according to the tables below, the Status values of the rows with ID 1 (AAA) and ID 2 (BBB) in table A will become 1. The Status of the row with ID 3 (CCC) will not change, because not all of its related rows in table B are on or before the current date.
How can I write the most reliable and best-performing LINQ query for this?
Today: 2018-7-10
A Table
ID  Name  Status
1   AAA   0
2   BBB   0
3   CCC   0
B Table
ID  AID  Date
6   1    2018-5-3
7   2    2018-6-2
8   2    2018-6-4
9   3    2018-10-12
10  3    2018-7-7
Grouping TableB on AID
Selecting the "Max" date in each group.(Each unique AID)
Compares the selected dates with the corresponding Id in Table A.
Sets the Status value to true if the date is less or equal to the current date.
TableB.GroupBy(x => x.AId)
    .Select(group => new { identifier = group.Key, MaxDate = group.Max(m => m.Date) })
    .ToList()
    .ForEach(y =>
    {
        if (y.MaxDate <= DateTime.Now.Date)
        {
            TableA.Where(g => g.Id == y.identifier).First().Status = true;
        }
    });
This selects the AIDs from table B whose Date is earlier than or equal to now.
Then we select the records from table A whose ID is in the list from the previous step.
Then we update the Status value.
A.Where(a => B.Where(b => b.Date <= DateTime.Now).Select(b => b.AID).Contains(a.ID)).ToList().ForEach(a => a.Status = 1);
/* Fetch the A rows that meet the condition. */
var aList1 = (from b in dbSet.Bs.Where(x => x.Date < DateTime.Now) // time zone may vary
              join a in dbSet.As
              on b.AID equals a.ID
              select a);
/* Fetch the A rows that don't meet the condition. */
var aList2 = (from b in dbSet.Bs.Where(x => x.Date >= DateTime.Now) // time zone may vary
              join a in dbSet.As
              on b.AID equals a.ID
              select a);
/* Remove from list 1 the A rows that also occur in list 2. */
var aFinalList = aList1.Except(aList2).ToList();
/* Update the status. */
aFinalList.ForEach(x => x.Status = 1);
dbSet.SaveChanges(); // SaveChanges belongs to the context, not to the list
You can use the GroupJoin extension method to join tables A and B, then use the All extension method with your condition (Date <= today, or any other condition), and then update the Status. Something like:
var lstResult = lstA.GroupJoin(lstB, a => new { a.Id }, b => new { Id = b.AId }, (a, b) => new { a, b })
.Select(x =>
{
if (x.b.All(y => y.Date <= DateTime.Now)) //Actual condition here.
{
x.a.Status = true;
return x.a;
}
else return x.a;
});
C# fiddle with sample data.
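The "all related rows are on or before today" rule can be checked against the question's sample data with a small in-memory sketch (LINQ to Objects, top-level statements for C# 9 or later). The tuples stand in for the entity classes, "today" is fixed to the date given in the question, and the sketch assumes that an A row with no related B rows should be left unchanged; none of this is taken from a real data context.

```csharp
using System;
using System.Linq;

// "Today" as stated in the question, fixed so the result is reproducible.
var today = new DateTime(2018, 7, 10);

var tableA = new[]
{
    (ID: 1, Name: "AAA", Status: 0),
    (ID: 2, Name: "BBB", Status: 0),
    (ID: 3, Name: "CCC", Status: 0),
};

var tableB = new[]
{
    (ID: 6, AID: 1, Date: new DateTime(2018, 5, 3)),
    (ID: 7, AID: 2, Date: new DateTime(2018, 6, 2)),
    (ID: 8, AID: 2, Date: new DateTime(2018, 6, 4)),
    (ID: 9, AID: 3, Date: new DateTime(2018, 10, 12)),
    (ID: 10, AID: 3, Date: new DateTime(2018, 7, 7)),
};

// Group the B dates by their parent A row once.
var datesByAid = tableB.ToLookup(b => b.AID, b => b.Date);

// Status becomes 1 only when the A row has related rows and ALL of them
// are on or before today. (All() alone would also return true for an
// A row with no related rows, hence the extra Any() check.)
var updated = tableA
    .Select(a => (a.ID, a.Name,
        Status: datesByAid[a.ID].Any() && datesByAid[a.ID].All(d => d <= today) ? 1 : a.Status))
    .ToList();

foreach (var a in updated)
{
    Console.WriteLine($"{a.ID} {a.Name} {a.Status}");
}
// Prints:
// 1 AAA 1
// 2 BBB 1
// 3 CCC 0
```

Row 3 keeps Status 0 because one of its related dates (2018-10-12) is after today, even though the other (2018-7-7) is not.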

Convert SQL to EF Linq

I have the following query:
SELECT COUNT(1)
FROM Warehouse.WorkItems wi
WHERE wi.TaskId = (SELECT TaskId
FROM Warehouse.WorkItems
WHERE WorkItemId = @WorkItemId)
AND wi.IsComplete = 0;
And since we are using EF, I'd like to be able to use the Linq functionality to generate this query. (I know that I can give it a string query like this, but I would like to use EF+Linq to generate the query for me, for refactoring reasons.)
I really don't need to know the results of the query. I just need to know if there are any results. (The use of an Any() would be perfect, but I can't get the right code for it.)
So... Basically, how do I write that SQL query as a LINQ query?
Edit: Table Structure
WorkItemId - int - Primary Key
TaskId - int - Foreign Key on Warehouse.Tasks
IsComplete - bool
JobId - int
UserName - string
ReportName - string
ReportCriteria - string
ReportId - int - Foreign Key on Warehouse.Reports
CreatedTime - DateTime
The direct translation could be something like this
var result = db.WorkItems.Any(wi =>
!wi.IsComplete && wi.TaskId == db.WorkItems
.Where(x => x.WorkItemId == workItemId)
.Select(x => x.TaskId)
.FirstOrDefault());
Taking into account the fact that = (subquery), IN (subquery) and EXISTS (subquery) are handled identically by modern databases, you can try this instead
var result = db.WorkItems.Any(wi =>
!wi.IsComplete && db.WorkItems.Any(x => x.WorkItemId == workItemId
&& x.TaskId == wi.TaskId));
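The nested Any() form can be tried without a database. The following is a minimal LINQ to Objects sketch (top-level statements, C# 9 or later); the tuple array and the HasIncomplete helper are hypothetical stand-ins for the WorkItems set, so it only demonstrates the logic, not the SQL that EF generates.

```csharp
using System;
using System.Linq;

// Hypothetical sample data standing in for Warehouse.WorkItems.
var workItems = new[]
{
    (WorkItemId: 1, TaskId: 10, IsComplete: true),
    (WorkItemId: 2, TaskId: 10, IsComplete: false),
    (WorkItemId: 3, TaskId: 20, IsComplete: true),
};

// True when the task that owns the given work item still has
// at least one incomplete work item.
bool HasIncomplete(int workItemId) =>
    workItems.Any(wi => !wi.IsComplete &&
        workItems.Any(x => x.WorkItemId == workItemId && x.TaskId == wi.TaskId));

Console.WriteLine(HasIncomplete(1));  // True: task 10 still has work item 2 open
Console.WriteLine(HasIncomplete(3));  // False: task 20 has only completed items
Console.WriteLine(HasIncomplete(99)); // False: no such work item
```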
Turns out that I just needed to approach the problem from a different angle.
I came up with about three solutions with varying Linq syntaxes:
Mixed query + method chain:
var q1 = Warehouse.WorkItems
    .Where(workItem => workItem.TaskId == (from wis in Warehouse.WorkItems
                                           where wis.WorkItemId == workItemId
                                           select wis.TaskId).First())
    .Any(workItem => !workItem.IsComplete);
Full method chain:
var q2 = Warehouse.WorkItems
.Where(workItem => workItem.TaskId == Warehouse.WorkItems
.Where(wis => wis.WorkItemId == workItemId)
.Select(wis => wis.TaskId)
.First())
.Any(workItem => !workItem.IsComplete);
Full query:
var q3 = (from wi in Warehouse.WorkItems
where wi.TaskId == (from swi in Warehouse.WorkItems
where swi.WorkItemId == workItemId
select swi.TaskId).First()
where !wi.IsComplete
select 1).Any();
The only problem with this is that it comes up with some really jacked up SQL:
SELECT
(CASE
WHEN EXISTS(
SELECT NULL AS [EMPTY]
FROM [Warehouse].[WorkItems] AS [t0]
WHERE (NOT ([t0].[IsComplete] = 1)) AND ([t0].[TaskId] = ((
SELECT TOP (1) [t1].[TaskId]
FROM [Warehouse].[WorkItems] AS [t1]
WHERE [t1].[WorkItemId] = @p0
)))
) THEN 1
ELSE 0
END) AS [value]
You can use the Any() function like so:
var result = Warehouse.WorkItems.Any(x => x.WorkItemId != null);
In short, you pass in your condition, which in this case checks whether any of the items in your collection has an ID.
The variable result will tell you whether at least one item in your collection has an ID.
Here's a helpful webpage to help you get started with LINQ: http://www.dotnetperls.com/linq
The subquery in the original SQL was a useless one, and thus not a good sample for Any() usage. It is simply:
SELECT COUNT(*)
FROM Warehouse.WorkItems wi
WHERE WorkItemId = @WorkItemId
AND wi.IsComplete = 0;
Since the result would be only 0 or 1, and guessing the purpose from the wish to use Any(), it may be written as:
SELECT CASE WHEN EXISTS ( SELECT *
FROM Warehouse.WorkItems wi
WHERE WorkItemId = @WorkItemId AND
wi.IsComplete = 0 ) THEN 1
ELSE 0
END;
Then it makes sense to use Any():
bool exists = db.WorkItems.Any( wi => wi.WorkItemId == workItemId && !wi.IsComplete );
EDIT: I misread the original query in a hurry, sorry. Here is an update on the Linq usage:
bool exists = db.WorkItems.Any( wi =>
    db.WorkItems
      .SingleOrDefault(x => x.WorkItemId == workItemId).TaskId == wi.TaskId
    && !wi.IsComplete );
If the count was needed as in the original SQL:
var count = db.WorkItems.Count( wi =>
    db.WorkItems
      .SingleOrDefault(x => x.WorkItemId == workItemId).TaskId == wi.TaskId
    && !wi.IsComplete );
Sorry again for the confusion.

How to return value from 2 tables in one linq query

Please consider this table:
PK_Id  Number  Year  Month  Value
-----------------------------------
1      1       2000  5      100000
410    4       2000  6      10000
8888   1       2001  5      100
I have Id=8888. I want to select the record with Id=8888 and also the record for the previous year of that record (I mean Id=1). How can I do this with LINQ in one query?
Basically, we have some queries that first need to find a value in a table (possibly not the PK) and then find the corresponding records in other tables. How can I do this with LINQ and a single round trip to the database?
Thanks
from a in Record
where a.PK_Id == 8888
from b in Record
where b.Number == a.Number && b.Year == a.Year - 1
select new { Current = a, Previous = b }
or
Record
    .Where(a => a.PK_Id == 8888)
    .SelectMany(a =>
        Record
            .Where(b => b.Number == a.Number && b.Year == a.Year - 1)
            .Select(b => new { Current = a, Previous = b }));
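As a quick check of the self-join, here is the query-syntax version run over the question's sample rows in memory (LINQ to Objects, top-level statements for C# 9 or later). The tuple array is a stand-in for the Record table, so this shows the pairing logic only, not the SQL a provider would emit.

```csharp
using System;
using System.Linq;

// The three sample rows from the question.
var records = new[]
{
    (PK_Id: 1, Number: 1, Year: 2000, Month: 5, Value: 100000),
    (PK_Id: 410, Number: 4, Year: 2000, Month: 6, Value: 10000),
    (PK_Id: 8888, Number: 1, Year: 2001, Month: 5, Value: 100),
};

// Find the requested row, then pair it with the row that has the
// same Number and the previous Year.
var pairs = (from a in records
             where a.PK_Id == 8888
             from b in records
             where b.Number == a.Number && b.Year == a.Year - 1
             select new { Current = a, Previous = b }).ToList();

foreach (var p in pairs)
{
    Console.WriteLine($"current={p.Current.PK_Id}, previous={p.Previous.PK_Id}");
}
// Prints: current=8888, previous=1
```

Note that this shape returns nothing when no previous-year row exists; a left join would be needed if the current row should still be returned in that case.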
If I understand your question right, then you need to filter the data of one table and join two tables.
You can join the tables and filter your data
var query = from c in Table1
join o in Table2 on c.Col1 equals o.Col2
where o.Col3 == "x"
select c;
or you can filter your data from one table and then join the tables (result will be the same)
var query = from c in Table1.Where(item => item.Col3 == "x")
join o in Table2 on c.Col1 equals o.Col2
select c;

Using Linq to SQL is it possible to retrieve two values and subtract them?

I am populating a class using Linq to SQL.
What I am trying to do is query my database, return two integer values and subtract the two values from each other, producing the result, but I can't think of a smart way to do it.
What can I do in this case ?
If it is not clear, this pseudocode should clarify the functionality I want:
DECLARE @currentVal INT, @previousVal INT
@currentVal = SELECT VALUE
FROM Table1
WHERE Date = CURRDATE()
@previousVal = SELECT VALUE
FROM Table1
WHERE Date = MIN(Date)
RETURN @currentVal - @previousVal
But in Linq to SQL, (from o in context.Table1 where Date = currentDate select Value), how can I subtract the other value from this? Is this possible?
I'd stick to having it as a broken out set of queries, because you can then test if the values were actually returned or not and handle the case where too many values are returned:
var currentValResults = (from row in rows
                         where row.Date == DateTime.Today // date only; DateTime.Now includes the time of day
                         select row.Value)
                        .ToArray();
var previousValResults = (from row in rows
                          let minDate = rows.Min(r => r.Date)
                          where row.Date == minDate
                          select row.Value)
                         .ToArray();
if (currentValResults.Length == 1 && previousValResults.Length == 1)
{
    var diff = currentValResults[0] - previousValResults[0];
}
else
{
    // Error condition?
}
Putting it all into a giant linq statement makes too many assumptions (or at least, my implementation does).
Why not simply do a cross join?
var query =
    from a in Table1 where a.Date == DateTime.Today
    from b in Table1 where b.Date == Table1.Min(c => c.Date)
    select a.Value - b.Value;
var result = query.First();
Something like this would work to keep it into one trip to the db (Keep in mind this assumes that only two results will be returned):
int[] values = (from o in context.Table1
                where o.Date == currentDate || o.Date == context.Table1.Min(x => x.Date)
                orderby o.Date descending
                select o.Value).ToArray();
return values[0] - values[1];
var currentVal = context.Table1.Where(t => t.Date == DateTime.Now.Date).Select(t => t.Value).FirstOrDefault();
var previousVal = context.Table1.Where(t => t.Date == context.Table1.Min(d => d.Date)).Select(t => t.Value).FirstOrDefault();
var result = currentVal - previousVal;
Or
var result = (from d in context.Table1
              let prevVal = context.Table1.Where(t => t.Date == context.Table1.Min(m => m.Date)).Select(t => t.Value).FirstOrDefault()
              where d.Date == DateTime.Now.Date
              select d.Value - prevVal).FirstOrDefault();
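For reference, the subtraction can be exercised in memory with made-up data. This is a hedged LINQ to Objects sketch (top-level statements, C# 9 or later): the rows and the fixed "today" are hypothetical, and nullable values are used so that a missing row yields null instead of a misleading 0.

```csharp
using System;
using System.Linq;

// Hypothetical sample data; "today" is fixed so the result is reproducible.
var today = new DateTime(2024, 3, 3);
var rows = new[]
{
    (Date: new DateTime(2024, 1, 1), Value: 40),
    (Date: new DateTime(2024, 2, 1), Value: 55),
    (Date: today, Value: 100),
};

var minDate = rows.Min(r => r.Date);

// Project to int? so FirstOrDefault() returns null when no row matches.
int? currentVal = rows.Where(r => r.Date == today).Select(r => (int?)r.Value).FirstOrDefault();
int? previousVal = rows.Where(r => r.Date == minDate).Select(r => (int?)r.Value).FirstOrDefault();

// Lifted subtraction: the result is null if either side is null.
int? diff = currentVal - previousVal;

Console.WriteLine(diff.HasValue ? diff.Value.ToString() : "no result"); // prints 60
```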

Grouping by Time ranges in Linq

I've been getting stuck into some LINQ queries for the first time today and I'm struggling with some of the more complicated ones. I'm building a query to extract data from a table to build a graph. The table columns I'm interested in are Id, Time and Value.
The user will select a start time, an end time and the number of intervals (points) to graph. The Value column will be averaged for each interval.
I can do this with a linq request for each interval but I'm trying to write it in one query so I only need to go to the database once.
So far I have got:
var timeSpan = endTime.Subtract(startTime);
var intervalInSeconds = timeSpan.TotalSeconds / intervals;
var wattList = (from t in _table
                where t.Id == id
                && t.Time >= startTime
                && t.Time <= endTime
                group t by intervalInSeconds // This is the bit I'm struggling with
                into g
                orderby g.Key
                select g.Average(a => a.Value)
               ).ToList();
Any help on grouping over time ranges will be most welcome.
I've done this myself for exactly the same situation you describe.
For speed, I modified the database's datapoints table to include an integer-based time column, SecondsSince2000, and then worked with that value in my LINQ to SQL query. SecondsSince2000 is a computed column defined as:
datediff(second, dateadd(month,1200,0), DataPointTimeColumn) PERSISTED
Where DataPointTimeColumn is the name of the column that stores the datapoint's time. The magic function call dateadd(month,1200,0) returns 2000-01-01 at midnight, so the column stores the number of seconds since that time.
The LINQ to SQL query is then made much simpler, and faster:
int timeSlotInSeconds = 60;
var wattList =
    (from t in _table
     where t.Id == id
     && t.Time >= startTime
     && t.Time <= endTime
     group t by t.SecondsSince2000 - (t.SecondsSince2000 % timeSlotInSeconds)
     into g
     orderby g.Key
     select g.Average(a => a.Value)).ToList();
If you can't modify your database, you can still do this:
var baseTime = new DateTime(2000, 1, 1);
var wattList =
    (from t in _table
     where t.Id == id
     && t.Time >= startTime
     && t.Time <= endTime
     let secondsSince2000 = (int)(t.Time - baseTime).TotalSeconds
     group t by secondsSince2000 - (secondsSince2000 % timeSlotInSeconds)
     into g
     orderby g.Key
     select g.Average(a => a.Value)).ToList();
The query will be quite a bit slower.
Check out this example I wrote a while ago. It sounds like what you are trying to do, but I'm not sure if it does the grouping in SQL or by .NET.
http://mikeinmadison.wordpress.com/2008/03/12/datetimeround/
Maybe you can do something like:
var wattList = (from t in _table
                where t.Id == id
                && t.Time >= startTime
                && t.Time <= endTime
                select t
               ).GroupBy(x => (int) ((x.Time - startTime).TotalSeconds / intervalInSeconds))
                .Select(grp => grp.Average(x => x.Value));
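The bucket-index idea can be verified in memory before worrying about SQL translation. Below is a small LINQ to Objects sketch (top-level statements, C# 9 or later) with made-up samples; the Math.Min clamp is an added assumption so that a sample falling exactly on endTime lands in the last interval rather than opening an extra one.

```csharp
using System;
using System.Linq;

// Hypothetical inputs: a four-minute window split into two intervals.
var startTime = new DateTime(2024, 1, 1, 0, 0, 0);
var endTime = startTime.AddMinutes(4);
int intervals = 2;
double intervalInSeconds = (endTime - startTime).TotalSeconds / intervals; // 120 seconds

var samples = new[]
{
    (Time: startTime.AddSeconds(10), Value: 10.0),
    (Time: startTime.AddSeconds(70), Value: 20.0),
    (Time: startTime.AddSeconds(130), Value: 30.0),
    (Time: startTime.AddSeconds(200), Value: 50.0),
};

var averages = samples
    .Where(s => s.Time >= startTime && s.Time <= endTime)
    // Bucket index = elapsed seconds divided by the interval length, truncated.
    .GroupBy(s => Math.Min((int)((s.Time - startTime).TotalSeconds / intervalInSeconds), intervals - 1))
    .OrderBy(g => g.Key)
    .Select(g => g.Average(s => s.Value))
    .ToList();

Console.WriteLine(string.Join(", ", averages)); // prints 15, 40
```

Intervals that contain no samples produce no group at all, so the list can be shorter than the requested number of points; the caller would need to fill those gaps if the graph requires one value per interval.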
