LINQ to Entities performance issue with Where and Contains

LINQ to Entities performance issue with Where and Contains - c#

Table A has - ID(PK), PartNumber, Code1, Code2
Table B has - InventoryID(PK) PartNumber, Part, and a bunch of other columns.
I need to get everything from Table B where Table B's PartNumber is NOT in Table A.
Example: Table B has PartNumber 123. There is no PartNumber in Table A for 123. Get that row.
What I currently have:
using (SomeEntity context = new SomeEntity())
{
var partmasterids = context.PartsMasters.Select(x => x.PartNumber).Distinct().ToList();
var test = context.Parts.Where(x => !partmasterids.Contains(x.PartNumber)).ToList();
}
I currently first get and select all the distinct part numbers from Table A.
Then I check Table A and Table B's partnumbers and get each part from Table B where there that part number is not in Table A.
There are about 11,000 records in table B and 200,000 records in table A.
I should be getting about 9000 parts which are not in table A.
I am running into huge performance issues with that second LINQ statement. If I do a .Take(100), that will even take around 20-30 seconds. Anything above 1000 will take way too long.
Is there a better way to write this LINQ statement?

From what I understand of your question, the equivalent in SQL would be something like
SELECT DISTINCT B.PartNumber AS MissingParts
FROM TableB as B
LEFT OUTER JOIN TableA as A ON B.PartNumber = A.PartNumber
WHERE A.PartNumber IS NULL
Run that SQL and measure the time it takes. Without indexes, that's as fast as it's going to get.
Now, if you really have to do it in EF, you'll need to do an equivalent statement, complete with the left join. Based on this question, it would look something like this
var query = from b in TableB
join a in TableA on b.PartNumber equals a.PartNumber into joind
from existsInA in joind.DefaultIfEmpty()
where existsInA == null
select b.PartNumber;
var missingParts = query.Distinct().ToList();

Related

Does Entity Framework query the database multiple times if I use different fields of the same Linq query at different times?

I tried the Internet and the SOF but couldn't locate a helpful resource. Perhaps I may not be using correct wording to search. If there are any previous questions I have missed due to this reason please let me know and I will take this question down.
I am dealing with a busy database so I am required to send less queries to the database.
If I access different columns of the same Linq query from different levels of the code then is Entity Framework smart enough to foresee the required columns and bring them all or does it call the db twice?
eg.
var query = from t1 in table_1
join t2 in table_2 on t1.col1 equals t2.col1
where t1.EmployeeId == EmployeeId
group new { t1, t2 } by t1.col2 into grouped
orderby grouped.Count() descending
select new { Column1 = grouped.Key, Column2 = grouped.Sum(g=>g.t2.col4) };
var records = query.Take(10);
// point x
var x = records.Select(a => a.Column1).ToArray();
var y = records.Select(a => a.Column2).ToArray();
Does EF generate query the database twice to faciliate x and y (send a query first to get Column1, and then send another to get Column2) or is it smart enough to know it needs both Columns to be materialised and bring them both at point x?
Added to clarify the intention of the question:
I understand I can simply add a greedy method to the end of query.Take(10) and get it done but I am trying to understand if the approach I try (and in my opinion, more elegant) does work of if not what makes EF to make two queries please.

Yes currently your code will generate 2 queries that will be executed to the database. Reason being is because you have 2 different sqls generated:
First is the top query, taking only 10 records and then only Column1
Second is the top query, taking only 10 records and then only Column2
The reason these are 2 queries is because you have a ToArray over different Select statements -> generating different sql. Most of linq queries are differed executed and will be executed only when you use something like ToArray()/ToList()/FirstOrDefault() and so on - those that actually give you the concrete data. In your original query you have 2 different ToArray on data that has not yet been retrieved - meaning 2 queries (once for the first field and then for the second).
The following code will result in a single query to the database
var records = (from t1 in table_1
join t2 in table_2 on t1.col1 equals t2.col1
where t1.EmployeeId == EmployeeId
group new { t1, t2 } by t1.col2 into grouped
orderby grouped.Count() descending
select new { Column1 = grouped.Key, Column2 = grouped.Sum(g=>g.t2.col4) })
.Take(10).ToList();
var x = records.Select(a => a.Column1).ToArray();
var y = records.Select(a => a.Column2).ToArray();
In my solution above I added a ToList() after filtering out only that data you need (Take(10)) and then at that point it will execute to the database. Then you have all the data in memory and you can do any other linq operation over it without it going again to the database.
Add to your code ToString() so you can check the generated sql at different points. Then you will understand when and what is being executed:
var query = from t1 in table_1
join t2 in table_2 on t1.col1 equals t2.col1
where t1.EmployeeId == EmployeeId
group new { t1, t2 } by t1.col2 into grouped
orderby grouped.Count() descending
select new { Column1 = grouped.Key, Column2 = grouped.Sum(g=>g.t2.col4) };
var generatedSql = query.ToString(); // Here you will see a query that brings all records
var records = query.Take(10);
generatedSql = query.ToString(); // Here you will see it taking only 10 records
// point x
var xQuery = records.Select(a => a.Column1);
generatedSql = xQuery.ToString(); // Here you will see only 1 column in query
// Still nothing has been executed to DB at this point
var x = xQuery.ToArray(); // And that is what will be executed here
// Now you are before second execution
var yQuery = records.Select(a => a.Column2);
generatedSql = yQuery.ToString(); // Here you will see only the second column in query
// Finally, second execution, now with the other column
var y = yQuery.ToArray();

When you are running linq statement on an entity in EF if only prepares the Select statement (thats why the type is IQueryable). The data is loaded lazily. When you try to use a value from that query then only the result gets evaluated using a enumerator.
So when you turn it to a collection (.toList() etc.) explicitly it tries to get data to populate the list and hence the sql command is fired.
It is designed so to enhance the performance. So if a particular property of an entity is to be used EF doesn't get the value for all the columns from that table

Convert sql query to lambda expression for table with foreign keys

I have 2 tables. One is a user table that holds userid and userSelection(foreign key to another table) both are primary keys so multiple rows for a user.
The 2nd table holds columns with it's primary id being userSelection.
I want to retrieve all the userSelection rows that a userId has from the 2nd table. I want to use linq lambda expressions too.
I have it working in sql jsut can't convert it for use in c#.
Select * From column
where colID in (
select colId from users
where userID = 'someUser')
Thanks

Assuming you're using Entity Framework, what you're really looking for is an inner join. it would look something like this:
from c in context.Column
join u in context.Users on c.ColId equals u.ColId
where u.UserId = 'SomeUser'
select c;
as a lambda that is something like (syntax might be lacking something) (no where clause here, but easily added)
context.Column.Join( context.Users, u => u.ColId, c => c.ColId).Select

Change this code to two parts
Select * From column
where colID in (
select colId from users
where userID = 'someUser')
First part to get colId list:
var colIds = context.users.Where(x=>x.userID == "someUser").Select(x=>x.colId).ToList();
Second part to get the result use Where and List.Contains
// IQueryable result
var result = context.column.Where(x=>colIds.Contains(x.colID));
You can use it inline, but I recommend it to be two parts.

Is this LINQ Query "correct"?

I have the following LINQ query, that is returning the results that I expect, but it does not "feel" right.
Basically it is a left join. I need ALL records from the UserProfile table.
Then the LastWinnerDate is a single record from the winner table (possible multiple records) indicating the DateTime the last record was entered in that table for the user.
WinnerCount is the number of records for the user in the winner table (possible multiple records).
Video1 is basically a bool indicating there is, or is not a record for the user in the winner table matching on a third table Objective (should be 1 or 0 rows).
Quiz1 is same as Video 1 matching another record from Objective Table (should be 1 or 0 rows).
Video and Quiz is repeated 12 times because it is for a report to be displayed to a user listing all user records and indicate if they have met the objectives.
var objectiveIds = new List<int>();
objectiveIds.AddRange(GetObjectiveIds(objectiveName, false));
var q =
from up in MetaData.UserProfile
select new RankingDTO
{
UserId = up.UserID,
FirstName = up.FirstName,
LastName = up.LastName,
LastWinnerDate = (
from winner in MetaData.Winner
where objectiveIds.Contains(winner.ObjectiveID)
where winner.Active
where winner.UserID == up.UserID
orderby winner.CreatedOn descending
select winner.CreatedOn).First(),
WinnerCount = (
from winner in MetaData.Winner
where objectiveIds.Contains(winner.ObjectiveID)
where winner.Active
where winner.UserID == up.UserID
orderby winner.CreatedOn descending
select winner).Count(),
Video1 = (
from winner in MetaData.Winner
join o in MetaData.Objective on winner.ObjectiveID equals o.ObjectiveID
where o.ObjectiveNm == Constants.Promotions.SecVideo1
where winner.Active
where winner.UserID == up.UserID
select winner).Count(),
Quiz1 = (
from winner2 in MetaData.Winner
join o2 in MetaData.Objective on winner2.ObjectiveID equals o2.ObjectiveID
where o2.ObjectiveNm == Constants.Promotions.SecQuiz1
where winner2.Active
where winner2.UserID == up.UserID
select winner2).Count(),
};

You're repeating join winners table part several times. In order to avoid it you can break it into several consequent Selects. So instead of having one huge select, you can make two selects with lesser code. In your example I would first of all select winner2 variable before selecting other result properties:
var q1 =
from up in MetaData.UserProfile
select new {up,
winners = from winner in MetaData.Winner
where winner.Active
where winner.UserID == up.UserID
select winner};
var q = from upWinnerPair in q1
select new RankingDTO
{
UserId = upWinnerPair.up.UserID,
FirstName = upWinnerPair.up.FirstName,
LastName = upWinnerPair.up.LastName,
LastWinnerDate = /* Here you will have more simple and less repeatable code
using winners collection from "upWinnerPair.winners"*/

The query itself is pretty simple: just a main outer query and a series of subselects to retrieve actual column data. While it's not the most efficient means of querying the data you're after (joins and using windowing functions will likely get you better performance), it's the only real way to represent that query using either the query or expression syntax (windowing functions in SQL have no mapping in LINQ or the LINQ-supporting extension methods).
Note that you aren't doing any actual outer joins (left or right) in your code; you're creating subqueries to retrieve the column data. It might be worth looking at the actual SQL being generated by your query. You don't specify which ORM you're using (which would determine how to examine it client-side) or which database you're using (which would determine how to examine it server-side).
If you're using the ADO.NET Entity Framework, you can cast your query to an ObjectQuery and call ToTraceString().
If you're using SQL Server, you can use SQL Server Profiler (assuming you have access to it) to view the SQL being executed, or you can run a trace manually to do the same thing.
To perform an outer join in LINQ query syntax, do this:
Assuming we have two sources alpha and beta, each having a common Id property, you can select from alpha and perform a left join on beta in this way:
from a in alpha
join btemp in beta on a.Id equals btemp.Id into bleft
from b in bleft.DefaultIfEmpty()
select new { IdA = a.Id, IdB = b.Id }
Admittedly, the syntax is a little oblique. Nonetheless, it works and will be translated into something like this in SQL:
select
a.Id as IdA,
b.Id as Idb
from alpha a
left join beta b on a.Id = b.Id

It looks fine to me, though I could see why the multiple sub-queries could trigger inefficiency worries in the eyes of a coder.
Take a look at what SQL is produced though (I'm guessing you're running this against a database source from your saying "table" above), before you start worrying about that. The query providers can be pretty good at producing nice efficient SQL that in turn produces a good underlying database query, and if that's happening, then happy days (it will also give you another view on being sure of the correctness).

How to make a SQL query like this?

Given 2 base tables, 1 table which stores the relation between them with a few extra attributes; taking a few extra attribute values as user input, based on that extracting relation from relation table.
This information has the ID of the main values (person and animal) not the names. I want to display the names on screen, like according to the input you gave the records which found are this person has this animal with him.
select DISTINCT table0.person_name, table5.animal_name
from table1
INNER JOIN table0, table5
on table1.person_id=table0.person_id
and
table1.animal_id=table5.animal_id
where table1.aa=input1
and table1.bb=input2
and table1.cc=input3
and table1.dd=input4

You have at least three errors.
The WHERE clause should come after the JOIN .. ON clause, not before it.
You cannot refer to columns in table5 because it doesn't appear in the FROM list.
You shouldn't write ON xxx AND ON yyy. Just write ON xxx AND yyy.
Other points to consider:
Are you sure that you meant FULL OUTER JOIN and not INNER JOIN?
Why do you add the distinct? If a person owns two animals with the same name do you really want to return only one row?
Where do the values input1, ..., input4 come from?
I think table0 should be renamed to person, table5 to animal, and table1 to person_animal to make it easier to understand the purpose of each table.
My best guess as to what you meant is this:
SELECT table0.person_name, table5.animal_name
FROM table1
JOIN table0 ON table1.person_id = table0.person_id
JOIN table5 ON table1.animal_id = table5.animal_id
WHERE table1.aa = input1
AND table1.bb = input2
AND table1.cc = input3
AND table1.dd = input4

How do I join tables with a condition using LLBLGen?

I have the following Sql Query that returns the type of results that I want:
SELECT b.ID, a.Name, b.Col2, b.COl3
FROM Table1 a
LEFT OUTER JOIN Table2 b on b.Col4 = a.ID AND b.Col5 = 'test'
In essence, I want a number of rows equal to Table1 (a) while having the data from Table2 (b) listed or NULL if the condition, 'test', doesn't exist in Table2.
I'm rather new to LLBLGen and have tried a few things and it isn't working. I can get it to work if the condition exists; however, when a requirements change came in and caused me to rewrite the query to that above, I'm at a loss.
Below is the old LLBLGen C# code that worked for existing products but not for the above query:
LookupTable2Collection table2col = new LookupTable2Collection();
RelationCollection relationships = new RelationCollection();
relationships.Add(LookupTable2Entity.Relations.LookupTable1EntityUsingTable1ID, JoinHint.Left);
IPredicateExpression filter = new PredicateExpression();
filter.Add(new FieldCompareValuePredicate(LookupTable2Fields.Col5, ComparisonOperator.Equal, "test"));
table2col.GetMulti(filter, relationships);
Table 1 has 3 records in it. I need the 3 records back even if all items from Table 2 are NULL because the condition doesn't exist. Any ideas?

You've to add your filter to the relation join like this:
relationships.Add(LookupTable2Entity.Relations.LookupTable1EntityUsingTable1ID, JoinHint.Left).CustomFilter = new FieldCompareValuePredicate(LookupTable2Fields.Col5, ComparisonOperator.Equal, "test");

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

LINQ to Entities performance issue with Where and Contains - c#

Related

Does Entity Framework query the database multiple times if I use different fields of the same Linq query at different times?

Convert sql query to lambda expression for table with foreign keys

Is this LINQ Query "correct"?

How to make a SQL query like this?

How do I join tables with a condition using LLBLGen?

Categories

Resources