Making a UNION query more efficient in LINQ - c#

I am currently working on a project leveraging EF and I am wondering if there is a more efficient or cleaner way to handle what I have below.
In SQL Server I could get the data I want by doing something like this:
SELECT tbl2.* FROM
dbo.Table1 tbl
INNER JOIN dbo.Table2 tbl2 ON tbl.Column = tbl2.Column
WHERE tbl.Column2 IS NULL
UNION
SELECT * FROM
dbo.Table2
WHERE Column2 = value
Very straightforward. However, in LINQ I have something that looks like this:
var results1 = Repository.Select<Table>()
.Include(t => t.Table2)
.Where(t => t.Column == null);
var table2Results = results1.Select(t => t.Table2);
var results2 = Repository.Select<Table2>().Where(t => t.Column2 == "VALUE");
table2Results = table2Results.Concat(results2);
return table2Results.ToList();
First and foremost, the return type of the method that contains this code is IEnumerable<Table2>, so first I get back all of the Table2 associations where a column in Table1 is null. I then have to select out my Table2 records so that I have a variable of type IEnumerable<Table2>. The rest of the code is fairly straightforward in what it does.
This seems awfully chatty to me, and I think there is a better way to do what I am trying to achieve. The produced SQL isn't terrible (I've omitted the column lists for readability):
SELECT
[UnionAll1].*
FROM (SELECT
[Extent2].*
FROM [dbo].[Table1] AS [Extent1]
INNER JOIN [dbo].[Table2] AS [Extent2] ON [Extent1].[Column] = [Extent2].[Column]
WHERE [Extent1].[Column2] IS NULL
UNION ALL
SELECT
[Extent3].*
FROM [dbo].[Table2] AS [Extent3]
WHERE VALUE = [Extent3].[Column]) AS [UnionAll1]
So is there a cleaner / more efficient way to do what I have described? Thanks!

Well, one problem is that your results may not match your original SQL query: Union will select distinct values, while Union All will select all values. First, I think your code could be made a lot clearer like so:
// Notice the lack of "Include". "Include" only states what should be returned
// *with* the original type, and is not necessary if you only need to select the
// individual property.
var firstResults = Repository.Select<Table>()
.Where(t => t.Column == null)
.Select(t => t.Table2);
var secondResults = Repository.Select<Table2>()
.Where(t => t.Column2 == "Value");
return firstResults.Union(secondResults);
If you know that it's impossible to have duplicates in this query, use Concat instead on the last line (which will produce the UNION ALL that you see in your current code) for reasons described in more detail here. If you want something similar to the original query, continue to use Union like in the example above.
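For reference, here's what the Concat version would look like (same sketch as above, using the Repository API from your question, just swapping the final call):
// Concat keeps duplicates and translates to UNION ALL,
// which matches the SQL your current code produces.
var firstResults = Repository.Select<Table>()
    .Where(t => t.Column == null)
    .Select(t => t.Table2);
var secondResults = Repository.Select<Table2>()
    .Where(t => t.Column2 == "Value");
return firstResults.Concat(secondResults);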
It's important to remember that LINQ-to-Entities is not always going to be able to produce exactly the SQL you desire, since it has to handle so many cases in a generic fashion. The benefit of using EF is that it makes your code more expressive, clearer, strongly typed, etc., so you should favor readability first. Then, if you actually see a performance problem when profiling, you might want to consider alternate ways to query the data. If you profile the two queries first, you might not even care about the answer to this question.

Related

Cosmos DB SQL API NOT IN operator taking a list via WithParameter(List<T>) not working

var ids = IdsList.Select(pID => pID.ID).ToArray();
var response = await MyService.GetByMyQuery(
new QueryDefinition(
"SELECT * FROM p WHERE p.id NOT IN (@ids)"
)
.WithParameter("@ids", string.Join(",", ids)));
So this is not working: the operator returns all the items instead of just the ones not in the list. In the Cosmos DB SQL query editor I can easily do
SELECT * FROM p WHERE p.id NOT IN("id1","id2")
and it returns the expected results without any problems. So I guess the problem is in the code layer, in the way I'm passing the ids to the WithParameter() method.
Any insight is greatly appreciated.
The problem
Your C# code is not sending multiple values as the @ids parameter, but a single string value, effectively like this:
SELECT * FROM p
WHERE p.id NOT IN("id1, id2")
Since this compound id does not exist, the NOT IN condition is true for every item and all items are returned, as you observed.
Solution
It may be possible with the IN keyword as well, but I do know for sure that this pattern will work:
SELECT * FROM p
WHERE NOT ARRAY_CONTAINS(@ids, p.id)
NB! Correct me if I'm mistaken, but most likely this condition will NOT be served by an index. So you may want to reconsider your design unless your real case has an additional, well-indexed predicate; otherwise the query will be slow and costly.
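Assuming the .NET SDK's QueryDefinition and the MyService/IdsList names from your question, wiring this up would look roughly like the sketch below, passing the ids as an actual array rather than a joined string:
// The SDK serializes the array parameter as a JSON array,
// so @ids becomes ["id1", "id2", ...] inside the query.
var ids = IdsList.Select(pID => pID.ID).ToArray();
var response = await MyService.GetByMyQuery(
    new QueryDefinition("SELECT * FROM p WHERE NOT ARRAY_CONTAINS(@ids, p.id)")
        .WithParameter("@ids", ids));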

Multiple Where vs Inner Join

I have a filter where, depending on the user's selection, I conditionally add more Where clauses/joins.
Which method is faster, and why?
Example with Where:
var queryable = db.Sometable.Where(x=> x.Id > 30);
queryable = queryable.Where(x=> x.Name.Contains("something"));
var final = queryable.ToList();
Example with Join:
var queryable1 = db.Sometable.Where(x=> x.Id > 30);
var queryable2 = db.Sometable.Where(x=> x.Name.Contains("something"));
var final = (from q1 in queryable1 join q2 in queryable2 on q1.Id equals q2.Id select q1).ToList();
NOTE: I would have preferred the multiple Where approach, but it causes an error as described in another question, hence the shift to a JOIN. I hope the JOIN code is not slower than multiple WHERE clauses.
I just tried running similar LINQ statements against a SQL Server 2008 table with 10 million rows. I found that the query optimizer converted both statements into similar query plans, and the performance difference was a wash.
I would say that as someone who is reading the code, the first example more clearly states your intentions, and therefore would be preferred. Many times performance is not the best metric to choose when evaluating code.
I would go for the Where clause, avoiding a self-join on the same table and making the code clearer.
You can add a log to your DbContext to see the generated SQL query:
db.Database.Log = System.Diagnostics.Debug.WriteLine;
Anyway, to improve the performance of the query I would:
select ONLY the fields that you actually need, not * (see the sketch after this list)
check the indexes on the table
ask whether you really need the Contains call; if the table grows a lot you will have performance issues, since it translates to a SQL LIKE '%XXX%'
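For the first point, projecting only the needed columns is a small change. A sketch using the column names from your example (swap in whatever fields you actually use):
// Project just the columns you need instead of materializing full entities.
var final = db.Sometable
    .Where(x => x.Id > 30)
    .Where(x => x.Name.Contains("something"))
    .Select(x => new { x.Id, x.Name })
    .ToList();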
I'm sure you already understand that LINQ converts your code into a SQL statement. Your first query would result in something like:
SELECT * FROM Sometable WHERE Id > 30 AND Name LIKE '%something%'
Your second query would result in something like
SELECT q1.*
FROM Sometable q1
JOIN Sometable q2 ON q1.Id = q2.Id
WHERE q1.Id > 30 AND q2.Name LIKE '%something%'
Nearly every time, a select from a single table will return results faster than a join between two tables.
If your LINQ statement is failing to load related tables, be sure you are including them:
var queryable = db.Sometable.Include(i => i.ForeignTable).Where(x=> x.Id > 30);

Join data from IQueryable and IList<KeyValuePair>

I'm trying to join a row of parent data to a related collection which has been squeezed into a single piece of data.
I have an IQueryable of Orders:
IQueryable<Order> orderList = context.Set<Order>()
.Where("OrderDate >= @0", startDate)
.Where("OrderDate <= @0", endDate);
and a related IList<KeyValuePair<int, string>> where each KVP contains the OrderID and a concatenated string of the product names from each Order Line. I want to join the Value from the correct KeyValuePair to the Order info based on the OrderID Key.
To illustrate, the desired output would look something like:
OrderNum  OrderDate   Customer  State  OrderTotal  Products_Ordered
12345     12/12/2012  J.Bloggs  WA     $25.50      Bolts, Hammer, Suregrip Clamp
I am trying a linq join that looks like this:
IQueryable result = from o in orders
join line in orderLines on o.OrderID equals line.Key
select new
{
o.OrderNum,
o.OrderDate,
o.Customer.CustomerFullName,
o.DeliverAddress.State,
o.TotalPrice,
line.Value
};
The method performing the join seems to work, but when I access the returned IQueryable, I get a NotSupportedException: Unable to create a constant value of type 'System.Collections.Generic.KeyValuePair`2'. Only primitive types or enumeration types are supported in this context.
What am I doing wrong?
Your IQueryable is actually a specialized entity framework implementation that, when iterated, attempts to construct an SQL query by examining an expression tree, execute this query, and return an enumerable over the results. This is fragile, and your projections and queries can't be arbitrarily complex, or the expression -> SQL converter has no idea what to do with it.
Fixing this by materializing your IQueryable first is fine, but you don't even really need to do that. Why have a list of what is essentially tuples, when what you want is a dictionary that maps the order ID to a bunch of data?
IDictionary<int, string> orderLines = new Dictionary<int, string>();
// Add a dummy item
orderLines[1234] = "Hello, this, is, a, test";
// Get your combined view
// Assuming you have an order with 1234 as the ID, this should work
var result = from o in orders
select new
{
o.OrderNum,
o.OrderDate,
o.Customer.CustomerFullName,
o.DeliverAddress.State,
o.TotalPrice,
Products = orderLines[o.OrderID]
};
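If you already have the data as the IList<KeyValuePair<int, string>> described in the question, converting it is a one-liner. A sketch, where orderLinePairs stands in for your existing list:
// orderLinePairs is the hypothetical IList<KeyValuePair<int, string>> from the question.
// Build the OrderID -> products lookup once from the existing pairs.
IDictionary<int, string> orderLines = orderLinePairs
    .ToDictionary(kvp => kvp.Key, kvp => kvp.Value);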
Entity Framework has issues with some types, because it tries to push them to the backing database. The quick and easy solution is to pull the data down before performing the join.
var orderList = context.Set<Order>()
.Where("OrderDate >= @0", startDate)
.Where("OrderDate <= @0", endDate)
.ToList(); // ------> Relevant line <------
var result = from o in orderList
join line in orderLines on o.OrderID equals line.Key
select new
{
o.OrderNum,
o.OrderDate,
o.Customer.CustomerFullName,
o.DeliverAddress.State,
o.TotalPrice,
line.Value
};
It's just a change in where the joining happens, RDBMS- or client-side.
Performance won't be an issue, assuming you don't have a ton of data and all rows will be matched. Of course, you will have to pull down data that doesn't get joined, so if only a few rows of orderList will end up in result, that might be worth a second thought, design-wise. The only alternative would be pushing the KeyValuePair items to the server first, which probably isn't what you're looking for.
Joining a query result (IQueryable) with a normal enumerable stored in memory (IEnumerable<...>) doesn't make much sense. Think about it: the query doesn't get processed until it's materialized, and injecting your own data into a query means embedding it in the query text itself -- that's not what you really want, now is it?
I think what you expect from this is best achieved by first materializing the IQueryable into an IEnumerable<>, then doing a plain LINQ join on two IEnumerable<>s, which is trivial.
It's not like you'd be using the IEnumerable<> to filter the result set on the server side, you're not really losing any performance here.
Edit: Note that if you are using the IEnumerable<> to filter the results on the server side, you can do that! EF (I assume that's what you're using) has very strong special cases for things like IQueryable<>.Any<>() with an IEnumerable<>.Contains<>() inside it -- it inserts the literal values into the query text. It's just the actual join that doesn't make much sense in this context.
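For example, pre-filtering the server-side query by the keys you already hold in memory works fine, because the provider inlines the values into the generated SQL. A sketch using the names from the question:
// EF translates Contains over an in-memory collection into an IN (...) clause,
// so only the orders with a matching key are pulled down from the server.
var keys = orderLines.Select(line => line.Key).ToList();
var matchingOrders = orderList
    .Where(o => keys.Contains(o.OrderID))
    .ToList();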

IQueryable<T>.GroupBy() On Reference Type

Using LINQ to SQL, I'm writing queries that take advantage of the IQueryable.GroupBy method.
Even though my query involves many tables and left joins, let's say for illustration that we are only working with two tables: TableA has a one-to-many relationship to TableB.
var queryResults = db.TableA
.Join(db.TableB, tA => tA.ID, tB => tB.TableA_ID, (ta, tb) => tb)
.GroupBy(tb => tb.TableA);
This will give me an
IQueryable<IGrouping<TableA, TableB>>
On the surface this seems to work; however, I'm worried because I'm calling the GroupBy method and passing in a reference type for the keySelector argument.
Please help me understand why this is or isn't a safe thing to do.
Hard to tell what you're trying to do, but I think you might want to use comprehension syntax here to make it easier to write/understand. In particular, do your grouping after you've done whatever joins or other operations you want to do instead of mixing them together.
var query = ... (can be chained methods, whatever)
var grouped = from row in query
group row by row.SomeProperty;
Then you'll likely have an easier time (I would think) writing and reasoning about the query.
Try this:
var queryResults =
from ta in db.TableA
join tb in db.TableB on ta.ID equals tb.TableA_ID
group tb by ta;
And yes, this is safe to do on reference types. The query generated by LINQ to SQL has nothing to do with reference types - it's just SQL.
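Consuming the result is then just nested iteration over each IGrouping. A minimal sketch:
foreach (var grouping in queryResults)
{
    TableA parent = grouping.Key;        // the grouping key is the TableA entity
    foreach (TableB child in grouping)   // the TableB rows joined to that TableA
    {
        // work with the parent/child pair here
    }
}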

Sort Linq list with one column

I guess it should be really simple, but I cannot find out how to do it.
I have a LINQ query that selects one column of type int, and I need it sorted.
var values = (from p in context.Products
where p.LockedSince == null
select Convert.ToInt32(p.SearchColumn3)).Distinct();
values = values.OrderBy(x => x);
SearchColumn3 is of type string, but it only contains integers. So I thought converting to Int32 and ordering would definitely give me a nice 1, 2, 3 sorted list of values. But instead, the list stays ordered as if the values were strings:
199 20 201
Update:
I've done some tests with C# code and LINQPad.
LINQPad generates the following SQL:
SELECT [t2].[value]
FROM (
SELECT DISTINCT [t1].[value]
FROM (
SELECT CONVERT(Int,[t0].[SearchColumn3]) AS [value], [t0].[LockedSince], [t0].[SearchColumn3]
FROM [Product] AS [t0]
) AS [t1]
WHERE ([t1].[LockedSince] IS NULL)
) AS [t2]
ORDER BY [t2].[value]
And my SQL profiler says that my C# code generates this piece of SQL:
SELECT DISTINCT a.[SearchColumn3] AS COL1
FROM [Product] a
WHERE a.[LockedSince] IS NULL
ORDER BY a.[SearchColumn3]
So it looks like the C# LINQ code just omits the Convert.ToInt32.
Can anyone say something useful about this?
[Disclaimer - I work at Telerik]
You can solve this problem with Telerik OpenAccess ORM too. Here is what I would suggest in this case.
var values = (from p in context.Products
where p.LockedSince == null
orderby "cast({0} as integer)".SQL<int>(p.SearchColumn3)
select "cast({0} as integer)".SQL<int>(p.SearchColumn3)).ToList().Distinct();
OpenAccess provides the SQL extension method, which gives you the ability to add some specific SQL code to the generated SQL statement.
We have started working on improving this behavior.
Thank you for pointing this out.
Regards
Ralph
Same answer as for one of my other questions: it turns out that the LINQ provider I'm using, the one that comes with Telerik OpenAccess ORM, does things differently than the standard LINQ to SQL provider! See the SQL I've posted in my opening post. I totally wasn't expecting something like this, but it seems the Telerik OpenAccess provider still needs a lot of improvement, so be careful before you start using it. It looks nice, but it has some serious shortcomings.
I can't replicate this problem. But just make sure you're enumerating the collection when you inspect it. How are you checking the result?
values = values.OrderBy(x => x);
foreach (var v in values)
{
Console.WriteLine(v.ToString());
}
Remember, this won't change the order of the records in the database or anywhere else - only the order that you can retrieve them from the values enumeration.
Your values variable is the result of a LINQ expression, so it does not really have values until you call a method such as ToList, ToArray, etc.
Getting back to your example, the variable x in the OrderBy method is treated as p.SearchColumn3 and is therefore a string.
To avoid that, you need to make p.SearchColumn3 an integer before the OrderBy method.
You should add a let statement to your code as below:
var values = (from p in context.Products
where p.LockedSince == null
let val = Convert.ToInt32(p.SearchColumn3)
select val).Distinct();
values = values.OrderBy(x => x);
In addition, you can combine the OrderBy with the first statement; see the sketch below for what that would look like.
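Combined into a single statement, that would look something like this (same names as in your query):
// Ordering is applied after Distinct so it is preserved in the final result.
var values = (from p in context.Products
              where p.LockedSince == null
              let val = Convert.ToInt32(p.SearchColumn3)
              select val)
             .Distinct()
             .OrderBy(x => x)
             .ToList();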
