C# LINQ to MySQL query with join generates bad SQL?

I'm using C# to write LINQ queries against a MySQL database. I think the SQL generated for a simple table join might be wrong.
My NuGet packages are Mysql.Data v6.9.9, Mysql.data.entities v6.8.3, and MySql.data.entity v6.9.9.
The LINQ is this:
query = from peopleResult in query
join t in technologyQuery on peopleResult.Company_Id equals t.Company_Id
select peopleResult;
The SQL generated looks like this:
SELECT ...
FROM `people` AS `Extent1`
INNER JOIN `technologies` AS `Extent2` ON (`Extent1`.`Company_Id` = `Extent2`.`Company_Id`) OR ((`Extent1`.`Company_Id` IS NULL) AND (`Extent2`.`Company_Id` IS NULL))
WHERE ...
Is this part of the join right?
(`Extent1`.`Company_Id` IS NULL) AND (`Extent2`.`Company_Id` IS NULL)
The query runs for an extremely long time when that is included. When I strip it out of the SQL with a regex, the query runs much faster and seems to give correct results.
Is my LINQ incorrect or missing something? Or does the MySQL LINQ provider likely have a bug?
Thank you for your time thinking about this.

It's not a MySQL connector bug but an EF feature that emulates C# equality rules for nullable types: in C#, null == null is true, whereas in SQL, NULL = NULL is not true, so EF adds the IS NULL branch to make the two behave alike.
First, make sure to set DbContext.Configuration.UseDatabaseNullSemantics to true, for instance in the constructor of your DbContext-derived class:
Configuration.UseDatabaseNullSemantics = true;
In theory this should solve the issue. However, it is applied to comparison operators but not to joins, so you have to use the alternative join syntax with a where clause:
query =
from peopleResult in query
from t in technologyQuery
where peopleResult.Company_Id == t.Company_Id
select peopleResult;
which is translated to the desired SQL JOIN without the IS NULL part.
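A minimal in-memory sketch of what that extra OR branch is for (plain C#, not EF or connector code). SqlEquals is a hypothetical helper that models SQL's three-valued "=" with a bool?, where null stands for UNKNOWN; EfEmulatedEquals mirrors the ON clause from the question.

```csharp
using System;

static class NullSemanticsDemo
{
    // Models SQL's "a = b": if either side is NULL the result is UNKNOWN,
    // represented here as a null bool?. A JOIN keeps a row only when the result is TRUE.
    public static bool? SqlEquals(int? a, int? b) =>
        a.HasValue && b.HasValue ? a.Value == b.Value : (bool?)null;

    // Mirrors the ON clause EF emitted: (a = b) OR (a IS NULL AND b IS NULL).
    public static bool EfEmulatedEquals(int? a, int? b) =>
        SqlEquals(a, b) == true || (!a.HasValue && !b.HasValue);

    static void Main()
    {
        int? x = null, y = null;
        Console.WriteLine(x == y);                  // C# semantics: True
        Console.WriteLine(SqlEquals(x, y) == true); // plain SQL join: False
        Console.WriteLine(EfEmulatedEquals(x, y));  // EF's emulation: True
    }
}
```

When no row has a NULL Company_Id on both sides, the two predicates keep the same rows, which fits the question's observation that stripping the IS NULL branch still seemed to give correct results.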

Related

What's the best solution to Entity Framework Core's lack of support for moderately complex LINQ queries?

So basically I have a table containing a set of data. This data is joined to an organisation table, and multiple users can be part of an organisation. I'm trying to get all files in the table for the organisations that the user executing the query has permission to access. To do this, I'm using a where clause that checks the user's permissions from the application against the organisations linked to the files. I then select the top 100 results and count the records returned (I want to see whether the user has access to 100+ files across all organisations).
The problem is when I use the following LINQ query:
(from f in File
join o in Organisation on f.OrganisationId equals o.Id
where permissions.Contains(o.Id.ToString())
select f).Take(100).Count();
When I use Contains on a list, which should translate to an IN (...) clause in SQL, the Take and the Count aren't executed on SQL Server but run in memory. I have 70,000+ File records, so this is very slow and times out on the web server. This is expected, as Entity Framework Core is at an early stage and does not yet support moderately complex or advanced LINQ queries.
My question is: is there a better alternative to raw SQL queries that still lets me filter by an array of items while using Entity Framework Core v1.1? Thanks.
Edit: I tried updating to the latest version, but this did not solve my issue; I still get the following output:
The LINQ expression '{permissions => Contains([o].Id.ToString())}' could not be translated and will be evaluated locally.
The LINQ expression 'Contains([o].Id.ToString())' could not be translated and will be evaluated locally.
The LINQ expression 'Take(__p_1)' could not be translated and will be evaluated locally.
The LINQ expression 'Count()' could not be translated and will be evaluated locally.
The warnings are misleading: the problem is the ToString() call, which causes client evaluation of the query.
The following should produce the intended SQL query:
var idList = permissions.Select(int.Parse);
var result = (
from f in File
join o in Organisation on f.OrganisationId equals o.Id
where idList.Contains(o.Id)
select f).Take(100).Count();
which in my environment (EF Core v1.1.1) produces the following SQL with no warnings (as expected):
SELECT COUNT(*)
FROM (
SELECT TOP(@__p_1) [f].[Id], [f].[Name], [f].[OrganisationId]
FROM [Files] AS [f]
INNER JOIN [Organisations] AS [o] ON [f].[OrganisationId] = [o].[Id]
WHERE [o].[Id] IN (1, 3, 4)
) AS [t]
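One caveat with permissions.Select(int.Parse): int.Parse throws a FormatException for any entry that is not a valid integer, and because Select is lazy, that would only surface when the query runs. If the permission strings are not guaranteed to be numeric, converting them with int.TryParse before building the query is one option. ToIdList below is a hypothetical helper (not part of EF Core), and it silently drops invalid entries, which may or may not be what you want.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PermissionIds
{
    // Converts string ids to ints up front, skipping entries that are not
    // valid integers (int.Parse would throw FormatException on them instead).
    public static List<int> ToIdList(IEnumerable<string> permissions)
    {
        var ids = new List<int>();
        foreach (var p in permissions)
        {
            if (int.TryParse(p, out var id))
                ids.Add(id);
        }
        return ids;
    }

    static void Main()
    {
        var permissions = new[] { "1", "3", "oops", "4" };
        var idList = ToIdList(permissions);
        Console.WriteLine(string.Join(", ", idList)); // 1, 3, 4
        // idList can then be used in the query: where idList.Contains(o.Id)
    }
}
```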

EntityFramework Linq Left Join Parsing Error - DotNet Core 1.0

I'm attempting to explicitly join three tables using left outer joins in a LINQ query and am running into LINQ parsing issues. An inner join parses correctly and returns data, but the left outer join fails.
Example:
var query = from p in DatabaseContext.Products
where p.ClientID == clientID
join l in DatabaseContext.Licenses on p.ProductID equals l.ProductID into pl
from pli in pl.DefaultIfEmpty()
join a in DatabaseContext.Articles on p.ArticleID equals a.ArticleID into pa
from pai in pa.DefaultIfEmpty()
select new SomeEntityDTO
{
SomethingFromP = p.Something,
SomethingFromL = pli.Something,
SomethingFromA = pai.Something
};
As both joined tables key off of the first table, I can test each individually by removing the other join, e.g., test the query for p to l and then for p to a. These test queries function perfectly. It's also possible to remove the left outer rule and receive a proper result.
var query = from p in DatabaseContext.Products
where p.ClientID == clientID
join l in DatabaseContext.Licenses on p.ProductID equals l.ProductID
join a in DatabaseContext.Articles on p.ArticleID equals a.ArticleID
select new SomeEntityDTO
... the rest ...
Viewing the offending query in SQL Profiler (top code example) I see that the first two tables are successfully joined, e.g.:
SELECT p.Something, l.Something
FROM Products AS p
LEFT JOIN Licenses AS l ON p.ProductID = l.ProductID
WHERE p.ClientID = 5
ORDER BY p.ProductID
Right after this successful query come another two queries (identical to each other):
SELECT a.ArticleID, a.Something, <all fields, even when not specified in query>
FROM Articles AS a
ORDER BY a.ArticleID
The three-table outer join successfully returns an object as long as I don't try to access a field from the "a" table. When I do, I receive a NullReferenceException, because that table was never really joined.
As stated, removing the outer join rule brings back a successfully joined query.
I tried adjusting the LINQ query, figuring the LINQ parser had an issue, but to no avail:
var query = from p in DatabaseContext.Products
from l in DatabaseContext.Licenses.Where(g => g.ProductID == p.ProductID).DefaultIfEmpty()
from a in DatabaseContext.Articles.Where(g => g.ArticleID == p.ArticleID).DefaultIfEmpty()
where ....
This translates to a set of CROSS APPLYs that doesn't work: when the profiled query is copied into a query editor window, it doesn't run at all (unlike the three individual queries seen in the profiler for the first code example). I also tried the more complicated lambda syntax, which doesn't work either.
Is this an error in the LINQ parser, or am I doing this completely wrong? According to the many answered questions here on explicit left outer joins (as opposed to natural associations), I'm doing it correctly, but it doesn't parse correctly. I've avoided creating the associations so I can join them without explicitly defining the join. Are the associations potentially required here, so that this won't work properly without them?
Note: Each table has complex keys but I only really need to join based on single key values (the DB is part of a product I can't change).
Using: .NET Core, EntityFramework, EntityFrameworkCore.SqlServer, etc., all version 1.0.1.
Help?
The short answer is to use EF6 instead of EF Core if you absolutely need complex LINQ queries on your entities, even after the 1.1 release. There are still too many things missing in EF Core compared to EF6.
See the EF Core roadmap.
In my case, I kept EF Core and used the Context.Entity.FromSql(query) method to get the results. This allowed me to use EF Core for most of the entities, thereby keeping a forward-looking approach for the application, while allowing special exceptions for complicated queries not based on an actual entity. The plan is to replace those FromSql queries as EF Core matures.
Before deciding on .FromSql, I also tested a query on a view and on a stored procedure. Both attempts failed: for stored procedures, named parameters are not yet implemented, and views are not currently supported unless you trick EF into thinking the view is actually a table (which brings its own issues).
In order to access EF Core .FromSql, you need to install the following package:
Microsoft.EntityFrameworkCore.Relational
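The query shape in the question is the standard LINQ left-outer-join pattern (join ... into, then DefaultIfEmpty). Running it against in-memory lists is a quick way to confirm that the LINQ itself is sound, separating it from the EF Core 1.0 translation problem. The classes below are hypothetical stand-ins for the question's entities, and the tuple projection replaces SomeEntityDTO for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's entities.
class Product { public int ProductID; public int ArticleID; public string Something; }
class License { public int ProductID; public string Something; }
class Article { public int ArticleID; public string Something; }

static class LeftJoinDemo
{
    public static List<(string P, string L, string A)> Run(
        List<Product> products, List<License> licenses, List<Article> articles)
    {
        var query =
            from p in products
            join l in licenses on p.ProductID equals l.ProductID into pl
            from pli in pl.DefaultIfEmpty()   // pli is null when no license matches
            join a in articles on p.ArticleID equals a.ArticleID into pa
            from pai in pa.DefaultIfEmpty()   // pai is null when no article matches
            select (P: p.Something,
                    L: pli == null ? null : pli.Something,
                    A: pai == null ? null : pai.Something);
        return query.ToList();
    }

    static void Main()
    {
        var products = new List<Product>
        {
            new Product { ProductID = 1, ArticleID = 10, Something = "p1" },
            new Product { ProductID = 2, ArticleID = 20, Something = "p2" },
        };
        var licenses = new List<License> { new License { ProductID = 1, Something = "l1" } };
        var articles = new List<Article> { new Article { ArticleID = 20, Something = "a2" } };

        foreach (var row in Run(products, licenses, articles))
            Console.WriteLine(row.P + " | " + (row.L ?? "-") + " | " + (row.A ?? "-"));
        // p1 | l1 | -
        // p2 | - | a2
    }
}
```

The explicit null checks matter in memory: on an unmatched row, pli.Something would throw a NullReferenceException.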

SQL equivalent of Count extension method for LINQ isn't obvious

I'm using LINQ to Entity Framework (EF) to get the count of records in my table with the code below:
using (var db = new StackOverflowEntities())
{
var empLevelCount = db.employeeLevels.Count();
}
I captured the query EF sends to the database using SQL Server Profiler and got the following:
SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM [dbo].[employeeLevels] AS [Extent1]
) AS [GroupBy1]
This query stays exactly the same for the LongCount extension method, except that the COUNT SQL function is replaced by COUNT_BIG in the SQL that EF creates. The query created by the LINQ to EF provider looks very weird to me. Why doesn't it simply do something like the following to return the scalar count value?
SELECT
COUNT(1) AS [A1]
FROM [dbo].[employeeLevels] AS [Extent1]
It would be really helpful if someone could help me understand what additional work EF is taking care of internally that makes the LINQ to EF provider create such a query. It seems EF is trying to handle some additional use cases through a common algorithm, which results in a generic query like the one above.
Testing both queries (suitably changing the table) in a DB of mine reveals that they both generate exactly the same query plan. So, the structure shouldn't concern you overly much. In SQL, you tell the system what you want, and it works out how best to do it, and here the optimizer is able to generate the optimal plan given either sample.
As to why LINQ generates code like this, I'd suspect it's just a generalized pattern in its code generator that lets it generate similar code for any aggregation and subsequent transformations, not just for unfiltered counts.
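To illustrate the "generalized pattern" idea, here is a toy generator, a hypothetical sketch and not EF's actual code generator: if every aggregate is always emitted inside a derived table named GroupBy1 with an outer projection, then Count and LongCount differ only in the aggregate function passed in, which matches the two queries described in the question.

```csharp
using System;

static class AggregateSqlSketch
{
    // Hypothetical template: the aggregate always goes into a derived table,
    // and the outer SELECT projects its single column.
    public static string Build(string table, string aggregate) =>
        "SELECT [GroupBy1].[A1] AS [C1] FROM ( SELECT " + aggregate +
        " AS [A1] FROM " + table + " AS [Extent1] ) AS [GroupBy1]";

    static void Main()
    {
        Console.WriteLine(Build("[dbo].[employeeLevels]", "COUNT(1)"));     // shape seen for Count()
        Console.WriteLine(Build("[dbo].[employeeLevels]", "COUNT_BIG(1)")); // shape seen for LongCount()
    }
}
```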

Is it necessary to use joins in LINQ to SQL?

I was practicing today when I realized that there are two ways LINQ to SQL can retrieve data from the database. I created two DataGrids, populated each one using a different approach, and they produced the same result.
The first method uses a join to get data from related tables, and the other accesses the related table through the object, as a property in the LINQ query. The code is shown below:
NorthWindDataContext dbContext = new NorthWindDataContext();
var orders = from ord in dbContext.Orders
select new { ord.ShipCountry , ord.Customer.ContactName};
var orders2 = from ord in dbContext.Orders
join cust in dbContext.Customers on ord.CustomerID equals cust.CustomerID
select new
{
ord.ShipCountry, cust.ContactName
};
var data = orders2;
DataGrid.ItemsSource= orders;
DataGrid2.ItemsSource = orders2;
My question, as the title says, is whether it is really necessary to use joins, because I sometimes find them cumbersome.
You need to use something that gets you from the order to the customer.
Join can do this. This is how the second query works.
Having the order "know" about the customer can do this. This is how the first query works.
If your data provider is aware of the connection between order and customer then these will amount to the same thing.
If your data provider is not aware of the connection, then the approach in the first example would result in N + 1 lookups instead of 1.
A linq-friendly ORM will generally be aware of these connections as long as the appropriate relationship-marking attributes are present (just what that is differs between Linq2SQL, EF, NHibernate, etc.).
It's still important to know the join approach for cases where either the relationship isn't known about by the provider, or you have a reason to join on something other than a foreign-key relationship.
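The two approaches can be compared in memory. In this sketch the classes are hypothetical, with Order.Customer standing in for the association the LINQ to SQL designer would generate; one query follows the reference and the other joins on CustomerID, and for consistent data both give the same rows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the Northwind entities.
class Customer { public string CustomerID; public string ContactName; }
class Order { public string CustomerID; public string ShipCountry; public Customer Customer; }

static class NavigationVsJoin
{
    // Follows the navigation reference: the order already "knows" its customer.
    public static List<string> ViaNavigation(List<Order> orders) =>
        (from ord in orders
         select ord.ShipCountry + "/" + ord.Customer.ContactName).ToList();

    // Uses an explicit join on the key instead.
    public static List<string> ViaJoin(List<Order> orders, List<Customer> customers) =>
        (from ord in orders
         join cust in customers on ord.CustomerID equals cust.CustomerID
         select ord.ShipCountry + "/" + cust.ContactName).ToList();

    static void Main()
    {
        var alfki = new Customer { CustomerID = "ALFKI", ContactName = "Maria Anders" };
        var customers = new List<Customer> { alfki };
        var orders = new List<Order>
        {
            new Order { CustomerID = "ALFKI", ShipCountry = "Germany", Customer = alfki },
        };
        Console.WriteLine(ViaNavigation(orders).SequenceEqual(ViaJoin(orders, customers))); // True
    }
}
```

One difference that only shows up in memory: if an order had a null Customer, ViaNavigation would throw, whereas ViaJoin would just leave that order out.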
The answer is "sort of". Since you're using an ORM such as LINQ to SQL, no, you don't directly need to call join within your LINQ queries to accomplish what you're trying to do.
However, when the ORM executes the query, it generates actual SQL that contains a join to get the results you're querying. Since you're using an ORM, the returned data is mapped to objects, and since Customer and Order are related, that relationship is also carried over from the database INTO the objects.
ord.Customer.ContactName
The above statement is most likely translated to a JOIN statement performing an INNER JOIN between Customer & Orders.
Because of this, both of your LINQ queries most likely generate similar SQL, each containing a JOIN. Because the relationships between your objects also exist within the database (and everything is mapped to reflect this relationship), you don't directly need to use join within a LINQ statement.

How to determine whether to use join in linq to sql?

I'm wondering how we can determine whether or not to use a join in LINQ to SQL.
E.g., let's say we have two tables like this:
Table 1: Customer (id, name)
Table 2: addresstype (id, address1, customerid)
and
var address = from cu in Customer
from ad in addresstype
where cu.id == ad.customerid
select ad;
or
var address = from cu in Customer
join ad in addresstype on cu.id equals ad.customerid
select ad;
Are both ways the same? Is there any difference in performance?
Also, will the second method throw an error if there isn't any match?
Are you using LINQ to Entities or LINQ to SQL? If it's the former, you can avoid both of these by defining your relationships in the model and using navigation properties. This would be the clearest way of doing things.
Basically, these two LINQ queries are equivalent to the following SQL queries:
select ad.*
from Customer cu, AddressType ad
where cu.ID = ad.CustomerID -- I assume this was meant by the OP
and
select ad.*
from Customer cu
inner join AddressType ad on cu.id = ad.CustomerID;
The difference between these two queries is mostly semantic, since the database will do the same thing in both cases and return the same result set for both queries.
I would prefer the join syntax in both SQL and LINQ since it defines an explicit relationship between the two tables/entities, that is only implied in the join-less version.
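The equivalence is easy to check in memory, and the same check answers the last part of the question: when nothing matches, neither form throws; both simply return an empty sequence. The Customer and AddressType classes are hypothetical stand-ins for the question's tables.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's tables.
class Customer { public int id; public string name; }
class AddressType { public int id; public string address1; public int customerid; }

static class JoinVsWhere
{
    // Cross product filtered by a where clause (the "from ... from ... where" form).
    public static List<AddressType> ViaWhere(List<Customer> customers, List<AddressType> addresses) =>
        (from cu in customers
         from ad in addresses
         where cu.id == ad.customerid
         select ad).ToList();

    // Explicit inner join on the same key.
    public static List<AddressType> ViaJoin(List<Customer> customers, List<AddressType> addresses) =>
        (from cu in customers
         join ad in addresses on cu.id equals ad.customerid
         select ad).ToList();

    static void Main()
    {
        var customers = new List<Customer> { new Customer { id = 1, name = "Ann" } };
        var addresses = new List<AddressType> { new AddressType { id = 7, address1 = "Main St", customerid = 1 } };
        Console.WriteLine(ViaWhere(customers, addresses).Count); // 1
        Console.WriteLine(ViaJoin(customers, addresses).Count);  // 1

        // No matching rows: no exception, just an empty result.
        var orphan = new List<AddressType> { new AddressType { id = 8, address1 = "Elm St", customerid = 99 } };
        Console.WriteLine(ViaJoin(customers, orphan).Count);     // 0
    }
}
```

In memory the two are not equally cheap: the from/from/where form tests every customer/address pair, while join first builds a hash lookup over the inner keys. Against a database that difference usually disappears, because both forms become SQL that the optimizer can plan the same way.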
These seem to be the same query and return the same result, but I don't know which one is faster; it should be benchmarked.
However, in the case of LINQ to SQL I prefer a correlated subquery over a join, because currently if you want to compare two elements in a join you have to use the syntax:
new {X,Y} equals new {X',Y'}
and if you need more than these equality checks, you have to convert it into a nested query. So I prefer more readable code that changes as little as possible between different cases.
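For reference, a sketch of the multi-column join syntax that answer mentions. The anonymous types on both sides of equals must have the same property names and types for the keys to compare equal, which is why differently named columns need explicit aliases. OrderLine and Price are hypothetical classes chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical classes with differently named key columns.
class OrderLine { public int ProductId; public string Region; public int Qty; }
class Price { public int Product; public string Area; public decimal Amount; }

static class CompositeKeyJoin
{
    // Both anonymous types must have the same property names and types,
    // so the columns are aliased to PId and Reg on each side.
    public static List<decimal> LineTotals(List<OrderLine> lines, List<Price> prices) =>
        (from l in lines
         join p in prices
             on new { PId = l.ProductId, Reg = l.Region }
             equals new { PId = p.Product, Reg = p.Area }
         select l.Qty * p.Amount).ToList();

    static void Main()
    {
        var lines = new List<OrderLine> { new OrderLine { ProductId = 1, Region = "EU", Qty = 3 } };
        var prices = new List<Price>
        {
            new Price { Product = 1, Area = "EU", Amount = 2m },
            new Price { Product = 1, Area = "US", Amount = 4m },
        };
        Console.WriteLine(LineTotals(lines, prices).Single()); // 6
    }
}
```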
To throw a third, preferable method into the mix with LINQ to SQL, use associations between the tables (even if you don't have them set up in your database). With that in place, you can navigate the object graph rather than using joins:
var query = from cu in Customer
from ad in cu.Addresses
select ad;
Note: when querying the object graph, LINQ to SQL translates the join into a left outer join, whereas the join/where syntax produces an inner join by default.
Joins in LINQ should be used when there isn't a natural relationship between the objects. For example, use a join if you want to see the listing of stores that are in the same city as your customers (join Customer.Address.City with Store.Address.City).
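The stores-in-the-same-city example can be sketched in memory as follows. The classes and data are hypothetical; the point is that no foreign key links customers to stores, so the join matches on a shared value instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical classes: nothing links a Customer to a Store directly.
class Address { public string City; }
class Customer { public string Name; public Address Address; }
class Store { public string Name; public Address Address; }

static class SameCityJoin
{
    // Joins on a shared value (the city) rather than on a foreign key.
    public static List<string> Pairs(List<Customer> customers, List<Store> stores) =>
        (from c in customers
         join s in stores on c.Address.City equals s.Address.City
         select c.Name + " -> " + s.Name).ToList();

    static void Main()
    {
        var customers = new List<Customer>
        {
            new Customer { Name = "Ann", Address = new Address { City = "Oslo" } },
            new Customer { Name = "Bob", Address = new Address { City = "Rome" } },
        };
        var stores = new List<Store>
        {
            new Store { Name = "Store A", Address = new Address { City = "Oslo" } },
        };
        foreach (var pair in Pairs(customers, stores))
            Console.WriteLine(pair); // Ann -> Store A
    }
}
```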
There should not be a difference between these two queries. I actually wondered about this myself a few months ago and verified it with LINQPad, a free tool that lets you see the generated SQL of any LINQ query (the query that is sent to the database).
The generated SQL should be the same for these two queries.
If you're doing this through Visual Studio, there is also a way you can see the generated SQL as well.
