Is this LINQ Query "correct"? - c#

I have the following LINQ query, that is returning the results that I expect, but it does not "feel" right.
Basically it is a left join. I need ALL records from the UserProfile table.
Then the LastWinnerDate is a single record from the winner table (possible multiple records) indicating the DateTime the last record was entered in that table for the user.
WinnerCount is the number of records for the user in the winner table (possible multiple records).
Video1 is basically a bool indicating there is, or is not a record for the user in the winner table matching on a third table Objective (should be 1 or 0 rows).
Quiz1 is same as Video 1 matching another record from Objective Table (should be 1 or 0 rows).
Video and Quiz is repeated 12 times because it is for a report to be displayed to a user listing all user records and indicate if they have met the objectives.
var objectiveIds = new List<int>();
objectiveIds.AddRange(GetObjectiveIds(objectiveName, false));
var q =
from up in MetaData.UserProfile
select new RankingDTO
{
UserId = up.UserID,
FirstName = up.FirstName,
LastName = up.LastName,
LastWinnerDate = (
from winner in MetaData.Winner
where objectiveIds.Contains(winner.ObjectiveID)
where winner.Active
where winner.UserID == up.UserID
orderby winner.CreatedOn descending
select winner.CreatedOn).First(),
WinnerCount = (
from winner in MetaData.Winner
where objectiveIds.Contains(winner.ObjectiveID)
where winner.Active
where winner.UserID == up.UserID
orderby winner.CreatedOn descending
select winner).Count(),
Video1 = (
from winner in MetaData.Winner
join o in MetaData.Objective on winner.ObjectiveID equals o.ObjectiveID
where o.ObjectiveNm == Constants.Promotions.SecVideo1
where winner.Active
where winner.UserID == up.UserID
select winner).Count(),
Quiz1 = (
from winner2 in MetaData.Winner
join o2 in MetaData.Objective on winner2.ObjectiveID equals o2.ObjectiveID
where o2.ObjectiveNm == Constants.Promotions.SecQuiz1
where winner2.Active
where winner2.UserID == up.UserID
select winner2).Count(),
};

You're repeating join winners table part several times. In order to avoid it you can break it into several consequent Selects. So instead of having one huge select, you can make two selects with lesser code. In your example I would first of all select winner2 variable before selecting other result properties:
var q1 =
from up in MetaData.UserProfile
select new {up,
winners = from winner in MetaData.Winner
where winner.Active
where winner.UserID == up.UserID
select winner};
var q = from upWinnerPair in q1
select new RankingDTO
{
UserId = upWinnerPair.up.UserID,
FirstName = upWinnerPair.up.FirstName,
LastName = upWinnerPair.up.LastName,
LastWinnerDate = /* Here you will have more simple and less repeatable code
using winners collection from "upWinnerPair.winners"*/

The query itself is pretty simple: just a main outer query and a series of subselects to retrieve actual column data. While it's not the most efficient means of querying the data you're after (joins and using windowing functions will likely get you better performance), it's the only real way to represent that query using either the query or expression syntax (windowing functions in SQL have no mapping in LINQ or the LINQ-supporting extension methods).
Note that you aren't doing any actual outer joins (left or right) in your code; you're creating subqueries to retrieve the column data. It might be worth looking at the actual SQL being generated by your query. You don't specify which ORM you're using (which would determine how to examine it client-side) or which database you're using (which would determine how to examine it server-side).
If you're using the ADO.NET Entity Framework, you can cast your query to an ObjectQuery and call ToTraceString().
If you're using SQL Server, you can use SQL Server Profiler (assuming you have access to it) to view the SQL being executed, or you can run a trace manually to do the same thing.
To perform an outer join in LINQ query syntax, do this:
Assuming we have two sources alpha and beta, each having a common Id property, you can select from alpha and perform a left join on beta in this way:
from a in alpha
join btemp in beta on a.Id equals btemp.Id into bleft
from b in bleft.DefaultIfEmpty()
select new { IdA = a.Id, IdB = b.Id }
Admittedly, the syntax is a little oblique. Nonetheless, it works and will be translated into something like this in SQL:
select
a.Id as IdA,
b.Id as Idb
from alpha a
left join beta b on a.Id = b.Id

It looks fine to me, though I could see why the multiple sub-queries could trigger inefficiency worries in the eyes of a coder.
Take a look at what SQL is produced though (I'm guessing you're running this against a database source from your saying "table" above), before you start worrying about that. The query providers can be pretty good at producing nice efficient SQL that in turn produces a good underlying database query, and if that's happening, then happy days (it will also give you another view on being sure of the correctness).

Related

Entity Framework LINQ to Entities Join Query Timeout

I am executing the following LINQ to Entities query but it is stuck and does not return response until timeout. I executed the same query on SQL Server and it return 92000 in 3 sec.
var query = (from r in WinCtx.PartsRoutings
join s in WinCtx.Tab_Processes on r.ProcessName equals s.ProcessName
join p in WinCtx.Tab_Parts on r.CustPartNum equals p.CustPartNum
select new { r}).ToList();
SQL Generated:
SELECT [ I omitted columns]
FROM [dbo].[PartsRouting] AS [Extent1]
INNER JOIN [dbo].[Tab_Processes] AS [Extent2] ON ([Extent1].[ProcessName] = [Extent2].[ProcessName]) OR (([Extent1].[ProcessName] IS NULL) AND ([Extent2].[ProcessName] IS NULL))
INNER JOIN [dbo].[Tab_Parts] AS [Extent3] ON ([Extent1].[CustPartNum] = [Extent3].[CustPartNum]) OR (([Extent1].[CustPartNum] IS NULL) AND ([Extent3].[CustPartNum] IS NULL))
PartsRouting Table has 100,000+ records, Parts = 15000+, Processes = 200.
I tried too many things found online but nothing worked for me as to how I can achieve the result with same performance of SQL.
Based on the comments, looks like the issue is caused by the additional OR with IS NULL conditions in joins generated by the EF SQL translator. They were added in EF in order to emulate the C# == operator semantics which are different from SQL = for NULL values.
You can start by turning that EF behavior off through UseDatabaseNullSemantics property (it's false by default):
WinCtx.Configuration.UseDatabaseNullSemantics = true;
Unfortunately that's not enough, because it fixes the normal comparison operators, but they simply forgot to do the same for join conditions.
In case you are using joins just for filtering (as it seems), you can replace them with LINQ Any conditions which translates to SQL EXISTS and nowadays database query optimizers are treating it the same way as if it was an inner join:
var query = (from r in WinCtx.PartsRoutings
where WinCtx.Tab_Processes.Any(s => r.ProcessName == s.ProcessName)
where WinCtx.Tab_Parts.Any(p => r.CustPartNum == p.CustPartNum)
select new { r }).ToList();
You might also consider using just select r since creating anonymous type with single property just introdeces additional memory overhead with no advantages.
Update: Looking at the latest comment, you do need fields from joined tables (that's why it's important to not omit relevant parts of the query in question). In such case, you could try the alternative join syntax with where clauses:
WinCtx.Configuration.UseDatabaseNullSemantics = true;
var query = (from r in WinCtx.PartsRoutings
from s in WinCtx.Tab_Processes where r.ProcessName == s.ProcessName
from p in WinCtx.Tab_Parts where r.CustPartNum == p.CustPartNum
select new { r, s.Foo, p.Bar }).ToList();

LINQ Null Join with Pivot Table

I'm trying to get a list of servers thay may or may not belong to 1 or more groups to display in a grid.
Example
ServerID IP GroupID
1 192.168.1.44 1
1 192.168.1.44 10
2 192.168.1.45 1
3 192.168.1.46 2
4 192.168.1.47 null
5 192.168.1.48 null
If I have no records In the GroupServer Table. (Since there is no groups or groups exist but they are not assigned) I expect to get something like this:
ServerID IP GroupID
1 192.168.1.44 null
2 192.168.1.45 null
3 192.168.1.46 null
4 192.168.1.47 null
5 192.168.1.48 null
Since is a Many-to-Many relationship. I have
Group Table
Server Table
GroupServer Table
I could not find a LINQ Pivot Table example.
So I tried to buid my own.
var query = (from sg in context.ServerGroups
join servers in context.Servers on sg.ServerID equals servers.ID
join groups in context.Groups on sg.GroupID equals groups.ID
into serverxgroup
from gAddrBilling in serverxgroup.DefaultIfEmpty()
select new
{
ServerID = sg.ServerID,
ServerIP = server.IP,
GroupID = sg.GroupID
});
The Query above does not retrieve anything
And I quiet dont understand what the "from gAddrBilling" is for. Since I modify a snippet I was trying to make work. So I wonder if someone has already faced a problem like this and give me some hint, snippet or advice about what is what I'm missing.
Thank you.
First, this is not a pivot query, but a regular query on many-to-may relationship via explicit junction table.
Second, looks like you are using Entity Framework, in which case you'd better define and use navigation properties rather than manual joins.
Third, and the most important, the structure of the query is wrong. If you want to get a list of servers that may or may not belong to 1 or more groups, then you should start your query from Servers (the table which records you want to be always included, not from link table where some ServerID are missing) and then use left outer joins to the other tables like this:
var query =
from s in servers in context.Servers
join sg in context.ServerGroups on s.ID equals sg.ServerID
into s_sg from sg in s_sg.DefaultIfEmpty() // make the above LEFT OUTER JOIN
// You can remove the next two lines if all you need is the GroupId
// and keep them if you need some other Group field in the select
join g in context.Groups on sg.GroupID equals g.ID
into sg_g from g in sg_g.DefaultIfEmpty() // make the above LEFT OUTER JOIN
select new
{
ServerID = s.ID,
ServerIP = s.IP, // or sg.IP?
GroupID = (int?)sg.GroupID
};

Does Entity Framework query the database multiple times if I use different fields of the same Linq query at different times?

I tried the Internet and the SOF but couldn't locate a helpful resource. Perhaps I may not be using correct wording to search. If there are any previous questions I have missed due to this reason please let me know and I will take this question down.
I am dealing with a busy database so I am required to send less queries to the database.
If I access different columns of the same Linq query from different levels of the code then is Entity Framework smart enough to foresee the required columns and bring them all or does it call the db twice?
eg.
var query = from t1 in table_1
join t2 in table_2 on t1.col1 equals t2.col1
where t1.EmployeeId == EmployeeId
group new { t1, t2 } by t1.col2 into grouped
orderby grouped.Count() descending
select new { Column1 = grouped.Key, Column2 = grouped.Sum(g=>g.t2.col4) };
var records = query.Take(10);
// point x
var x = records.Select(a => a.Column1).ToArray();
var y = records.Select(a => a.Column2).ToArray();
Does EF generate query the database twice to faciliate x and y (send a query first to get Column1, and then send another to get Column2) or is it smart enough to know it needs both Columns to be materialised and bring them both at point x?
Added to clarify the intention of the question:
I understand I can simply add a greedy method to the end of query.Take(10) and get it done but I am trying to understand if the approach I try (and in my opinion, more elegant) does work of if not what makes EF to make two queries please.
Yes currently your code will generate 2 queries that will be executed to the database. Reason being is because you have 2 different sqls generated:
First is the top query, taking only 10 records and then only Column1
Second is the top query, taking only 10 records and then only Column2
The reason these are 2 queries is because you have a ToArray over different Select statements -> generating different sql. Most of linq queries are differed executed and will be executed only when you use something like ToArray()/ToList()/FirstOrDefault() and so on - those that actually give you the concrete data. In your original query you have 2 different ToArray on data that has not yet been retrieved - meaning 2 queries (once for the first field and then for the second).
The following code will result in a single query to the database
var records = (from t1 in table_1
join t2 in table_2 on t1.col1 equals t2.col1
where t1.EmployeeId == EmployeeId
group new { t1, t2 } by t1.col2 into grouped
orderby grouped.Count() descending
select new { Column1 = grouped.Key, Column2 = grouped.Sum(g=>g.t2.col4) })
.Take(10).ToList();
var x = records.Select(a => a.Column1).ToArray();
var y = records.Select(a => a.Column2).ToArray();
In my solution above I added a ToList() after filtering out only that data you need (Take(10)) and then at that point it will execute to the database. Then you have all the data in memory and you can do any other linq operation over it without it going again to the database.
Add to your code ToString() so you can check the generated sql at different points. Then you will understand when and what is being executed:
var query = from t1 in table_1
join t2 in table_2 on t1.col1 equals t2.col1
where t1.EmployeeId == EmployeeId
group new { t1, t2 } by t1.col2 into grouped
orderby grouped.Count() descending
select new { Column1 = grouped.Key, Column2 = grouped.Sum(g=>g.t2.col4) };
var generatedSql = query.ToString(); // Here you will see a query that brings all records
var records = query.Take(10);
generatedSql = query.ToString(); // Here you will see it taking only 10 records
// point x
var xQuery = records.Select(a => a.Column1);
generatedSql = xQuery.ToString(); // Here you will see only 1 column in query
// Still nothing has been executed to DB at this point
var x = xQuery.ToArray(); // And that is what will be executed here
// Now you are before second execution
var yQuery = records.Select(a => a.Column2);
generatedSql = yQuery.ToString(); // Here you will see only the second column in query
// Finally, second execution, now with the other column
var y = yQuery.ToArray();
When you are running linq statement on an entity in EF if only prepares the Select statement (thats why the type is IQueryable). The data is loaded lazily. When you try to use a value from that query then only the result gets evaluated using a enumerator.
So when you turn it to a collection (.toList() etc.) explicitly it tries to get data to populate the list and hence the sql command is fired.
It is designed so to enhance the performance. So if a particular property of an entity is to be used EF doesn't get the value for all the columns from that table

LINQ Group by and having where clause

Below is the SQL Query I am trying to translate
SELECT dbo.Contracts.Supplier
FROM dbo.Contracts INNER JOIN dbo.Products ON dbo.Contracts.Product = dbo.Products.Product
where dbo.Products.ProductGroup='Crude'
GROUP BY dbo.Contracts.Supplier
Am I doing something wrong because I do not get same results with the following LINQ
var result = from c in context.Contracts
join p in context.Products on c.Product equals p.Product1
where p.Product1.Equals("Crude")
group c by c.Supplier into g
select new { supplier = g.Key };
It is generating a weird statement
SELECT
1 AS [C1],
[Distinct1].[Supplier] AS [Supplier]
FROM ( SELECT DISTINCT
[Extent1].[Supplier] AS [Supplier]
FROM [dbo].[Contracts] AS [Extent1]
WHERE N'Crude' = [Extent1].[Product]
) AS [Distinct1]
Using distinct would work but to get same results, LINQ should be generating a statement like so (it's like it is ignoring the join):
SELECT distinct dbo.Contracts.Supplier
FROM dbo.Contracts INNER JOIN dbo.Products ON dbo.Contracts.Product = dbo.Products.Product
where dbo.Products.ProductGroup='Crude'
I'm assuming that you are using 'EntityFramework' or 'Linq To SQL'. If so, you should be able to use navigation properties to navigate to product and filter invalit results out. This way your query might look something like this:
var result = (from c in context.Contracts
where c.Products.Any(p => p.ProductGroup == "Crude")
select c.Supplier).Distinct();
It will automatically convert into correct query (in this case possibly without join even, just using Exists sql keyword) and return distinct suppliers. This is if I understand your objective correctly - you want to obtain all suppliers assigned to contracts that contain product from 'Crude' product group.
Basically you should try to avoid using joins from linq to sql or linq to entities as much as possible when you can use navigation properties. System will probably be better at converting them into specific sql.

Advice on removing multiple sub-queries (which contains a join) by optimizing Linq To SQL query

I'm working on adding globalization to my product cataloge and I have made it work. However, I feel that the underlaying SQL query isn't performing as well as it could and I could need some advice on how to change my Linq To SQL query to make it more efficient.
The tables that are used
Product contains a unique id for each product. This column is called EntityID
TextTranslation contains the globalized text. The columns in this table are CultureID (a string), TextID (a reference to the text), Value (the actuall globalized text).
Text contains the mapping between a globalized text and a product. There is also a column which indicates which type of text it is (like name, description and so on)
TextType contains the definition (id, name and description) for a text type.
var culturedTexts =
from translation in ctx.TextTranslations
join text in ctx.Texts on translation.TextId equals text.TextId
where translation.CultureId == "en-EN"
select new
{
text.EntityId,
text.TextTypeId,
translation.Value,
};
var products =
from p in ctx.Products
let texts = culturedTexts.Where(i => i.EntityId == p.EntityId)
select new Model.Product
{
Description = texts.Where(c => c.TextTypeId == (int)TextType.Description).SingleOrDefault().Value,
Name = texts.Where(c => c.TextTypeId == (int)TextType.Name).SingleOrDefault().Value
};
When this is executed I get a query which looks like
SELECT (
SELECT [t1].[Value]
FROM [Common].[TextTranslation] AS [t1]
INNER JOIN [Common].[Text] AS [t2] ON [t1].[TextId] = [t2].[TextId]
WHERE ([t2].[TextTypeId] = 2) AND ([t2].[EntityId] = [t0].[EntityId]) AND ([t1].[CultureId] = 'sv-SE')
) AS [Description], (
SELECT [t3].[Value]
FROM [Common].[TextTranslation] AS [t3]
INNER JOIN [Common].[Text] AS [t4] ON [t3].[TextId] = [t4].[TextId]
WHERE ([t4].[TextTypeId] = 1) AND ([t4].[EntityId] = [t0].[EntityId]) AND ([t3].[CultureId] = 'sv-SE')
) AS [Name]
FROM [Catalog].[Product] AS [t0]
So each globalized text (Name, Description) in the LINQ query gets its own sub query and associated join. Is it possible to streamline this a bit and remove each text type getting its own join and subquery?
Well, given that they are getting different TextTypeId values, how would you prefer the TSQL to look? If you do a single JOIN, you'll have to put in a messy SELECT CASE or similar to discriminate between type "1" and type "2".
One option would be to to simply bring back all the suitable rows and do the final projection in-memory at the client, but to be honest I expect that the SQL optimizer will make light work of that TSQL anyway... especially if that query hits a good spanning index.

Categories