LINQ to Entities equivalent of sql "TOP(n) WITH TIES"


I have been searching for a LINQ equivalent of WITH TIES in SQL Server lately. I came across a couple of things, none of which proved useful.
I know this question was asked before and has an accepted answer, but it doesn't work the way WITH TIES does. The solution using GroupBy() doesn't give the expected result for TOP(3) WITH TIES: for a data set of {3 2 2 1 1 0} the result set will be {3 2 2 1 1}, where it should be {3 2 2}.
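To make the difference concrete, here is a minimal LINQ to Objects sketch (in memory only, so it says nothing about what EF would translate). TopGroups is only my assumption about the shape of the GroupBy approach from the other question, and both method names are made up for illustration.

```csharp
using System.Linq;

public static class TiesVersusGroups
{
    // Assumed shape of the GroupBy approach: keep the n highest DISTINCT values.
    public static int[] TopGroups(int[] values, int n)
    {
        return values.GroupBy(v => v)
                     .OrderByDescending(g => g.Key)
                     .Take(n)
                     .SelectMany(g => g)
                     .ToArray();
    }

    // WITH TIES semantics: keep the first n rows plus any rows equal to the n-th one.
    public static int[] TopWithTies(int[] values, int n)
    {
        if (n < 1) return new int[0];
        int[] ordered = values.OrderByDescending(v => v).ToArray();
        if (ordered.Length <= n) return ordered;
        int cutoff = ordered[n - 1];
        return ordered.TakeWhile(v => v >= cutoff).ToArray();
    }
}
```

With {3, 2, 2, 1, 1, 0} and n = 3, TopGroups keeps the three highest distinct values and so returns five rows, while TopWithTies stops at the value found in third position.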
Using the following sample data (taken from this question):
CREATE TABLE Person
(
Id int primary key,
Name nvarchar(50),
Score float
)
INSERT INTO Person VALUES (1, 'Tom',8.9)
INSERT INTO Person VALUES (2, 'Jerry',8.9)
INSERT INTO Person VALUES (3, 'Sharti',7)
INSERT INTO Person VALUES (4, 'Mamuzi',9)
INSERT INTO Person VALUES (5, 'Kamala',9)
A plain OrderByDescending(p => p.Score).Take(3) returns Mamuzi, Kamala and only one of Tom or Jerry, where it should include BOTH.
I know there is no built-in equivalent, and I've found a way to implement it. I don't know if it is the best way to do it, and I am open to alternative solutions.

var query = (from q in list.OrderByDescending(s => s.Score).Take(3).Select(s => s.Score).Distinct()
from i in list
where q == i.Score
select i).ToList();
Edit:
@Zefnus
I wasn't sure in which order you wanted it, but to change the order you can put an OrderBy(s => s.Score) between select i and ToList().
I can't check what SQL statement my LINQ clause would produce, but I think your answer is much better. And your question was also really good; I never thought about TOP WITH TIES in LINQ. ;)
Basically it takes the top 3 scores from the list, compares them with the whole list, and keeps only those rows whose score equals one of those top scores.

Do not use IEnumerable<T> with anything touching a database!
A solution aimed at LinqToSql and LinqToEntities should not be using IEnumerable<T>. Your current self answer will result in every single person being selected from the database and then being queried in memory using LinqToObjects.
To make a solution that is translated to SQL and executed by the database you have to use IQueryable<T> and Expressions instead.
public static class QueryableExtensions
{
public static IQueryable<T> TopWithTies<T, TComparand>(this IQueryable<T> source, Expression<Func<T, TComparand>> topBy, int topCount)
{
if (source == null) throw new ArgumentNullException("source");
if (topBy == null) throw new ArgumentNullException("topBy");
if (topCount < 1) throw new ArgumentOutOfRangeException("topCount", string.Format("topCount must be greater than 0, was {0}", topCount));
var topValues = source.OrderBy(topBy)
.Select(topBy)
.Take(topCount);
var queryableMaxMethod = typeof(Queryable).GetMethods()
.Single(mi => mi.Name == "Max" &&
mi.GetParameters().Length == 1 &&
mi.IsGenericMethod)
.MakeGenericMethod(typeof(TComparand));
var lessThanOrEqualToMaxTopValue = Expression.Lambda<Func<T, bool>>(
Expression.LessThanOrEqual(
topBy.Body,
Expression.Call(
queryableMaxMethod,
topValues.Expression)),
new[] { topBy.Parameters.Single() });
var topNRowsWithTies = source.Where(lessThanOrEqualToMaxTopValue)
.OrderBy(topBy);
return topNRowsWithTies;
}
public static IQueryable<T> TopWithTiesDescending<T, TComparand>(this IQueryable<T> source, Expression<Func<T, TComparand>> topBy, int topCount)
{
if (source == null) throw new ArgumentNullException("source");
if (topBy == null) throw new ArgumentNullException("topBy");
if (topCount < 1) throw new ArgumentOutOfRangeException("topCount", string.Format("topCount must be greater than 0, was {0}", topCount));
var topValues = source.OrderByDescending(topBy)
.Select(topBy)
.Take(topCount);
var queryableMinMethod = typeof(Queryable).GetMethods()
.Single(mi => mi.Name == "Min" &&
mi.GetParameters().Length == 1 &&
mi.IsGenericMethod)
.MakeGenericMethod(typeof(TComparand));
var greaterThanOrEqualToMinTopValue = Expression.Lambda<Func<T, bool>>(
Expression.GreaterThanOrEqual(
topBy.Body,
Expression.Call(queryableMinMethod,
topValues.Expression)),
new[] { topBy.Parameters.Single() });
var topNRowsWithTies = source.Where(greaterThanOrEqualToMinTopValue)
.OrderByDescending(topBy);
return topNRowsWithTies;
}
}
This creates queries of the following form:
SELECT [t0].[Id], [t0].[Name], [t0].[Score]
FROM [Person] AS [t0]
WHERE [t0].[Score] >= ((
SELECT MIN([t2].[Score])
FROM (
SELECT TOP (3) [t1].[Score]
FROM [Person] AS [t1]
ORDER BY [t1].[Score] DESC
) AS [t2]
))
ORDER BY [t0].[Score] DESC
That query is only about 50% worse than the baseline query:
SELECT TOP (3) WITH TIES
[t0].[Id],
[t0].[Name],
[t0].[Score]
FROM
[Person] AS [t0]
ORDER BY [t0].[Score] desc
With a data set consisting of your original 5 records and an additional 10000 records all with scores less than the original both of these are more or less instant (less than 20 milliseconds).
The IEnumerable<T> approach took a whole 2 minutes!
If the expression building and reflection seems scary the same thing can be achieved with a join:
public static IQueryable<T> TopWithTiesDescendingJoin<T, TComparand>(this IQueryable<T> source, Expression<Func<T, TComparand>> topBy, int topCount)
{
if (source == null) throw new ArgumentNullException("source");
if (topBy == null) throw new ArgumentNullException("topBy");
if (topCount < 1) throw new ArgumentOutOfRangeException("topCount", string.Format("topCount must be greater than 0, was {0}", topCount));
var orderedByValue = source.OrderByDescending(topBy);
var topNValues = orderedByValue.Select(topBy).Take(topCount).Distinct();
var topNRowsWithTies = topNValues.Join(source, value => value, topBy, (x, row) => row);
return topNRowsWithTies.OrderByDescending(topBy);
}
With the following query as the result (with about the same performance):
SELECT [t3].[Id], [t3].[Name], [t3].[Score]
FROM (
SELECT DISTINCT [t1].[Score]
FROM (
SELECT TOP (3) [t0].[Score]
FROM [Person] AS [t0]
ORDER BY [t0].[Score] DESC
) AS [t1]
) AS [t2]
INNER JOIN [Person] AS [t3] ON [t2].[Score] = [t3].[Score]
ORDER BY [t3].[Score] DESC
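As a quick sanity check of the join-based method above, here is a sketch that runs it against an in-memory list via AsQueryable(). It only exercises the logic with LINQ to Objects and proves nothing about the generated SQL. The Person class, the JoinSketch class and TopThreeNames are hypothetical names; the extension method is repeated (without the argument checks) so that the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public double Score { get; set; }
}

public static class JoinSketch
{
    // Same shape as the join-based method above, minus the argument checks.
    public static IQueryable<T> TopWithTiesDescendingJoin<T, TComparand>(
        this IQueryable<T> source, Expression<Func<T, TComparand>> topBy, int topCount)
    {
        var topNValues = source.OrderByDescending(topBy).Select(topBy).Take(topCount).Distinct();
        return topNValues.Join(source, value => value, topBy, (x, row) => row)
                         .OrderByDescending(topBy);
    }

    // Runs the method over the sample rows from the question, held in memory.
    public static List<string> TopThreeNames()
    {
        IQueryable<Person> people = new List<Person>
        {
            new Person { Id = 1, Name = "Tom", Score = 8.9 },
            new Person { Id = 2, Name = "Jerry", Score = 8.9 },
            new Person { Id = 3, Name = "Sharti", Score = 7 },
            new Person { Id = 4, Name = "Mamuzi", Score = 9 },
            new Person { Id = 5, Name = "Kamala", Score = 9 }
        }.AsQueryable();

        return people.TopWithTiesDescendingJoin(p => p.Score, 3)
                     .Select(p => p.Name)
                     .ToList();
    }
}
```

The top three scores are 9, 9 and 8.9, so after Distinct() the join keys are 9 and 8.9 and both Tom and Jerry come back along with Mamuzi and Kamala.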

Another solution - which probably is not as efficient as the other solution - is to get TOP(3) Scores and get the rows with Score values contained in the TOP(3).
We can use Contains() as follows;
orderedPerson = datamodel.People.OrderByDescending(p => p.Score);
topPeopleList =
(
from p in orderedPerson
let topNPersonScores = orderedPerson.Take(n).Select(x => x.Score).Distinct()
where topNPersonScores.Contains(p.Score)
select p
).ToList();
What's good about this implementation is that its extension method TopWithTies() can be implemented easily as:
public static IEnumerable<T> TopWithTies<T, TResult>(this IEnumerable<T> enumerable, Func<T, TResult> selector, int n)
{
IEnumerable<T> orderedEnumerable = enumerable.OrderByDescending(selector);
return
(
from p in orderedEnumerable
let topNValues = orderedEnumerable.Take(n).Select(selector).Distinct()
where topNValues.Contains(selector(p))
select p
);
}
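One caveat with the IEnumerable<T> version, as far as I can tell: the let clause is lazy, so every Contains() call enumerates the ordered sequence again, which means the source gets sorted once per row. A possible variant that sorts once and caches the top values is sketched below. TopWithTiesOnce and its class are hypothetical names, and like the method above this is LINQ to Objects only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TopWithTiesInMemory
{
    // Hypothetical variant: sort once, then cache the top-n values in a HashSet.
    public static IEnumerable<T> TopWithTiesOnce<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> selector, int n)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (selector == null) throw new ArgumentNullException("selector");

        List<T> ordered = source.OrderByDescending(selector).ToList();
        var topValues = new HashSet<TKey>(ordered.Take(n).Select(selector));

        // Keep every row whose key is one of the cached top-n values.
        return ordered.Where(item => topValues.Contains(selector(item)));
    }
}
```

The trade-off is that the whole sequence is materialized up front instead of being evaluated lazily.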

I think that maybe you can do something like:
OrderByDescending(p => p.Score).Skip(2).Take(1)
Count the number of occurrences of this element, and then:
OrderByDescending(p => p.Score).Take(2 + "The select with the number of occurrences for the third element")
I think that maybe this works ;)
It's only an idea!
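A rough in-memory sketch of that idea, with one adjustment that I think is needed: a fixed Take(2 + occurrences) overshoots when rows tied with the third one already sit inside the first two positions (for {3 2 2 1 1 0} it would take 4 rows), so the sketch counts the rows strictly above the n-th value instead. All names here are made up, and this is LINQ to Objects, not something checked against EF.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountTiesSketch
{
    public static List<T> TopWithTiesByCount<T, TKey>(
        IEnumerable<T> source, Func<T, TKey> selector, int n)
    {
        if (n < 1) return new List<T>();
        Comparer<TKey> comparer = Comparer<TKey>.Default;
        List<T> ordered = source.OrderByDescending(selector).ToList();
        if (ordered.Count <= n) return ordered;

        // Equivalent of Skip(n - 1).Take(1): the key of the n-th row.
        TKey nth = selector(ordered[n - 1]);

        // Rows strictly above the n-th key, and rows tied with it.
        int higher = ordered.Count(x => comparer.Compare(selector(x), nth) > 0);
        int tied = ordered.Count(x => comparer.Compare(selector(x), nth) == 0);

        return ordered.Take(higher + tied).ToList();
    }
}
```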

I've found a solution taking the Score field value of the Nth row (3rd row in this case) using .Skip(n-1).Take(1) and selecting all rows with score value greater or equal to that as follows:
qryPeopleOrderedByScore = datamodel.People.OrderByDescending(p => p.Score);
topPeopleList =
(
from p in qryPeopleOrderedByScore
let lastPersonInList = qryPeopleOrderedByScore.Skip(2).Take(1).FirstOrDefault()
where lastPersonInList == null || p.Score >= lastPersonInList.Score
select p
).ToList();

Related

Linq to Entities Where clause compare value that can be int or string

I have a drop down list that will provide either a numeric or the word ANY. I need to create a LINQ SELECT containing a WHERE clause that can mimic the following SQL:
var p varchar2(3);
select ... from ...
where (
( (:p = 'ANY') and id in (select distinct id from Ids) )
or
(:p='1' and id = 42)
)
ps: I will be using an expression tree to handle the OR aspect :-)

Something like this?
string input = /***/;
var result = Context.Entities
.Where(ent => (input == "ANY"
&& Context.UserIds.Select(usr => usr.Id)
.Distinct()
.Contains(ent.Id))
|| (input == "1" && ent.Id == 42))
.Select(ent => /***/);
Disclaimer: written from memory, can contain compile-time errors (typo mistakes etc)
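Since the input value is known before the query runs, another option might be to pick the filter in C# rather than inside the predicate. The sketch below uses plain int ids and in-memory queryables purely for illustration; AnyOrOneFilter, Apply and knownIds are hypothetical names, and the magic number 42 is taken from the question.

```csharp
using System.Linq;

public static class AnyOrOneFilter
{
    public static IQueryable<int> Apply(IQueryable<int> ids, IQueryable<int> knownIds, string input)
    {
        if (input == "ANY")
            return ids.Where(id => knownIds.Contains(id)); // id IN (SELECT id FROM Ids)
        if (input == "1")
            return ids.Where(id => id == 42);              // id = 42
        return ids.Where(id => false);                     // neither branch of the OR matches
    }
}
```

Each branch yields a simpler predicate, which may also give the database an easier query to plan, though I have not measured that.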

Unable to form the proper Linq query using an "IN" list

I have the below SQL Query
;with cte as(
select a.*
from [dbo].[AccountViewModel] a
where a.COLLECTORID = 724852
and a.MONTH = 12
and a.YEAR=2015)
select *
from cte c
where c.DispCode in ('Deceased','DND','WN','WI','NC','NORESPONSE','SKIP','SHIFTED','SFU')
OR (c.DispCode in('PTP','DIB','WCE','DP') and convert(varchar(11), c.PTPDate) >=convert(varchar(11), getdate()))
OR (MONTH(c.LastPaymentDate) = 12 and YEAR(c.LastPaymentDate)=2015)
I need to convert this into an equivalent Linq query (C#).
The Cte part is working fine with the below program (I have cross checked the records)
private List<AccountViewModel> GetAllAcountsForLoggedInAgents()
{
var allAcountsForLoggedInAgents = new List<AccountViewModel>();
allAcountsForLoggedInAgents = new ViewModelDatabase()
.Accounts
.Where(a =>
a.COLLECTORID == 724852 &&
a.MONTH == DateTime.Now.Month &&
a.YEAR == DateTime.Now.Year
)
.ToList();
return allAcountsForLoggedInAgents;
}
However, the part outside the CTE is not working correctly (it returns the wrong records):
GetAllAcountsForLoggedInAgents()
.Where
(
a =>
("Deceased,DND,WN,WI,NC,NORESPONSE,SKIP,SHIFTED,SFU".Split(',').Any(x => x.Contains(a.DispCode)))
|| ("PTP,DIB,WCE,DP".Split(',').Any(b => b.Contains(a.DispCode)) && a.PTPDate >= DateTime.Now)
|| (a.LastPaymentDate.Value.Month == 12 && a.LastPaymentDate.Value.Year == 2015)
)
I believe I may be using "Any" in the wrong way.

This condition is not the same as the IN clause
("Deceased,DND,WN,WI,NC,NORESPONSE,SKIP,SHIFTED,SFU".Split(',').Any(x => x.Contains(a.DispCode)))
because it checks whether a.DispCode occurs as a substring of any of the strings. You should use equality instead:
("Deceased,DND,WN,WI,NC,NORESPONSE,SKIP,SHIFTED,SFU".Split(',').Any(x => x == a.DispCode))
This is not ideal, because the Split operation is not free, so you don't want to do it as part of your query. Make a static array of strings instead:
static readonly string[] DispCodeFilter = new string[] {
"Deceased", "DND", "WN", "WI", "NC", "NORESPONSE", "SKIP", "SHIFTED", "SFU"
};
...
(DispCodeFilter.Any(x => x == a.DispCode))
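A small sketch of why the substring check misbehaves, using a shortened code list. The class and method names are made up for the example.

```csharp
using System.Linq;

public static class DispCodeMatching
{
    static readonly string[] Codes = { "Deceased", "DND", "WN", "WI" };

    // What the original query does: is 'code' a substring of any list entry?
    public static bool SubstringMatch(string code)
    {
        return Codes.Any(x => x.Contains(code));
    }

    // What SQL's IN does: is 'code' equal to one of the list entries?
    public static bool ExactMatch(string code)
    {
        return Codes.Any(x => x == code);
    }
}
```

A DispCode of "D" passes the substring check because "DND" contains it, yet it is not in the list. An empty DispCode would pass the substring check too.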

Your In condition is incorrect. It can be fixed by adding an extension method. I am using a generic method, but you could make it type specific if you only need/want it for strings. I am using params, so you can either provide the items one by one or via a split.
public static bool In<T>(this T item, params T[] items) {
return items.Any(i=> Equals(item, i));
}
GetAllAcountsForLoggedInAgents().Where( a => a.DispCode.In
("Deceased","DND","WN","WI","NC","NORESPONSE","SKIP","SHIFTED","SFU")
|| (a.DispCode.In("PTP,DIB,WCE,DP".Split(',')) &&
a.PTPDate >= DateTime.Now)
|| (a.LastPaymentDate.Value.Month == 12 && a.LastPaymentDate.Value.Year == 2015)
)
One difference between this and the sql version, and a reason you may not want it to be generic, is that it is case sensitive: "wi" doesn't equal "WI".
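If case-insensitive matching is wanted for strings, a string-specific overload could pass a comparer to Contains. This is only a sketch for in-memory use (InIgnoreCase is a made-up name); on the SQL side, case sensitivity depends on the column collation, so the two may still not agree.

```csharp
using System;
using System.Linq;

public static class StringInExtensions
{
    // Hypothetical string-only variant of In() that ignores case.
    public static bool InIgnoreCase(this string item, params string[] items)
    {
        return items.Contains(item, StringComparer.OrdinalIgnoreCase);
    }
}
```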

Here are 2 simple rules for converting SQL to Linq
SQL           Linq
============  ==========
IN (...)      Contains
EXISTS (...)  Any
where Contains is the corresponding Enumerable/Queryable method (not to be mixed with string.Contains).
According to this, your Linq criteria should be something like this
var DispCodes1 = new [] { "Deceased", "DND", "WN", "WI", "NC", "NORESPONSE", "SKIP", "SHIFTED", "SFU" };
var DispCodes2 = new [] { "PTP", "DIB", "WCE", "DP" };
GetAllAcountsForLoggedInAgents()
.Where
(
a =>
DispCodes1.Contains(a.DispCode)
|| (DispCodes2.Contains(a.DispCode) && a.PTPDate >= DateTime.Now)
|| (a.LastPaymentDate.Value.Month == 12 && a.LastPaymentDate.Value.Year == 2015)
)
dasblinkenlight's answer makes a good point, so you can make DispCodes1 and DispCodes2 static, but that's not essential.
Another thing to mention is that the way you did the "CTE part" is not equivalent to the SQL query, where cte is just a named subquery and the whole query executes in the database, while in your implementation the cte part is executed in the database, then gets materialized in the memory and the additional query is executed in the memory using Linq To Objects. To make it fully equivalent and let the whole query execute in the database, change the GetAllAcountsForLoggedInAgents result type to IQueryable<AccountViewModel> and remove ToList call.
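To illustrate the point about not materializing too early: as long as everything stays IQueryable<T>, a second Where is appended to the same expression tree rather than being run over an in-memory list. The sketch below checks that with an in-memory AsQueryable() source standing in for the database set; the names are hypothetical.

```csharp
using System.Linq;
using System.Linq.Expressions;

public static class ComposeBeforeMaterializing
{
    // Stands in for the "CTE part": returns a query, not a list.
    public static IQueryable<int> FirstFilter(IQueryable<int> source)
    {
        return source.Where(x => x > 1);
    }

    // True when the second Where wraps the first one in a single expression tree.
    public static bool IsSingleExpressionTree(IQueryable<int> source)
    {
        IQueryable<int> composed = FirstFilter(source).Where(x => x % 2 == 0);
        var outer = composed.Expression as MethodCallExpression;
        var inner = outer == null ? null : outer.Arguments[0] as MethodCallExpression;
        return outer != null && inner != null
            && outer.Method.Name == "Where" && inner.Method.Name == "Where";
    }
}
```

A LINQ provider receives that whole tree at once, which is what gives it the chance to emit one SQL statement.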

How to conveniently rewrite LINQ query?

I have realized that SQL Server is unable to efficiently process some basic SQL queries, such as:
SELECT TOP (1) [t0].[Id], [t0].[L1], [t0].[L2], [t0].[Value]
FROM [Foos] AS [t0]
INNER JOIN [Lookup1] AS [t1] ON [t1].[Id] = [t0].[L2]
INNER JOIN [Lookup2] AS [t2] ON [t2].[Id] = [t0].[L1]
WHERE ([t1].[Name] = 'a') AND ([t2].[Name] = 'b')
ORDER BY [t0].[Value]
which is generated from LINQ expression:
// query 1
Foos
.Where(f => f.Lookup1.Name == "a" && f.Lookup2.Name == "b")
.OrderBy(f => f.Value)
.Take(1)
The schema definition is in the question 'Index over multiple lookup tables in SQL Server'. @Hoots shows in the answer that the SQL query must look like:
SELECT TOP (1) [t0].[Id], [t0].[L1], [t0].[L2], [t0].[Value]
FROM [Foos] AS [t0]
CROSS JOIN (
SELECT TOP (1) [t1].[Id], [t2].[Id] AS [Id2]
FROM [Lookup1] AS [t1], [Lookup2] AS [t2]
WHERE ([t1].[Name] = 'a') AND ([t2].[Name] = 'b')
) AS [t3]
WHERE ([t0].[L1] = [t3].[Id]) AND ([t0].[L2] = [t3].[Id2])
ORDER BY [t0].[Value] DESC
which could be generated from the following LINQ expression:
// query 2
(from f in Foos
from l in (
from l1 in Lookup1s
from l2 in Lookup2s
where l1.Name == "a"
&& l2.Name == "b"
select new { L1 = l1.Id, L2 = l2.Id }).Take(1)
where f.L1 == l.L1 && f.L2 == l.L2
orderby f.Value descending
select f).Take(1)
My question is how to automatically rewrite query 1 into query 2, so that I can compose queries in multiple steps:
void Do()
{
var x = ListFoos("a", "b").OrderBy(f => f.Value).Take(2);
// ...
}
IQueryable<Foos> ListFoos(string l1, string l2)
{
var foos = Foos.AsQueryable();
if (l1 != null)
foos = foos.Where(f => f.Lookup1.Name == l1);
if (l2 != null)
foos = foos.Where(f => f.Lookup2.Name == l2);
return foos;
}
Has someone done that already? Is there a library simplifying the task?
Clarification:
The resulting IQueryable<> expression is translated into a SQL statement which SQL Server is unable to evaluate efficiently. So I need to transform the expression into one that is translated into a better SQL statement for SQL Server.
I think that I am not the first to encounter this issue. LINQ has been around for a long time and the SQL statements are pretty basic, so other developers may already have solved this problem with SQL Server.

I'm not 100% certain I know what you're asking for, but if I am right, you should look at PredicateBuilder. It's very useful. Link here:
http://www.albahari.com/nutshell/predicatebuilder.aspx
It's part of LinqKit:
http://www.albahari.com/nutshell/linqkit.aspx
and here's some info on how it's used:
How does PredicateBuilder work
It will basically let you do something like this:
var predicate = PredicateBuilder.True<Foo>();
if (l1 != null)
predicate = predicate.And(f => f.Lookup1.Name == l1);
if (l2 != null)
predicate = predicate.And(f => f.Lookup2.Name == l2);
return Foos.Where(predicate);
Note the above is from memory.. I have not tested this...so might have some typos...

How to formulate an IQueryable to query a recursive database table?

I have a database table like this:
Entity
---------------------
ID int PK
ParentID int FK
Code varchar
Text text
The ParentID field is a foreign key with another record in the same table (recursive). So the structure represents a Tree.
I'm trying to write a method to query this table and get 1 specific Entity based on a path. A path would be a string representing the Code properties of the Entity and the parent Entities. So an example path would be "foo/bar/baz" which means the one specific Entity of which the Code == "baz", the parent's Code == "bar" and the parent of the parent's Code == "foo".
My attempt:
public Entity Single(string path)
{
string[] pathParts = path.Split('/');
string code = pathParts[pathParts.Length -1];
if (pathParts.Length == 1)
return dataContext.Entities.Single(e => e.Code == code && e.ParentID == 0);
IQueryable<Entity> entities = dataContext.Entities.Where(e => e.Code == code);
for (int i = pathParts.Length - 2; i >= 0; i--)
{
string parentCode = pathParts[i];
entities = entities.Where(e => e.Entity1.Code == parentCode); // incorrect
}
return entities.Single();
}
I know this isn't correct because the Where inside the for loop just adds more conditions to the current Entity instead of the parent Entity, but how do I correct this? In words, I would like the for loop to say "and the parent's code must be x and the parent of that parent's code must be y, and the parent of that parent of that parent's code must be z .... etc". Besides that, for performance reasons I'd like it to be one IQueryable so there will be just 1 query going to the database.

How to formulate an IQueryable to query a recursive database table?
I'd like it to be one IQueryable so there will be just 1 query going
to the database.
I don't think traversing an hierarchical table using a single translated query is currently possible with Entity Framework. The reason is you'll need to implement either a loop or recursion and to my best knowledge neither can be translated into an EF object store query.
UPDATE
@Bazzz and @Steven got me thinking and I have to admit I was completely wrong: it is possible and quite easy to construct an IQueryable for these requirements dynamically.
The following function can be called recursively to build up the query:
public static IQueryable<TestTree> Traverse(this IQueryable<TestTree> source, IQueryable<TestTree> table, LinkedList<string> parts)
{
var code = parts.First.Value;
var query = source.SelectMany(r1 => table.Where(r2 => r2.Code == code && r2.ParentID == r1.ID), (r1, r2) => r2);
if (parts.Count == 1)
{
return query;
}
parts.RemoveFirst();
return query.Traverse(table, parts);
}
The root query is a special case; here's a working example of calling Traverse:
using (var context = new TestDBEntities())
{
var path = "foo/bar/baz";
var parts = new LinkedList<string>(path.Split('/'));
var table = context.TestTrees;
var code = parts.First.Value;
var root = table.Where(r1 => r1.Code == code && !r1.ParentID.HasValue);
parts.RemoveFirst();
foreach (var q in root.Traverse(table, parts))
Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code);
}
The DB is queried only once with this generated code:
exec sp_executesql N'SELECT
[Extent3].[ID] AS [ID],
[Extent3].[ParentID] AS [ParentID],
[Extent3].[Code] AS [Code]
FROM [dbo].[TestTree] AS [Extent1]
INNER JOIN [dbo].[TestTree] AS [Extent2] ON ([Extent2].[Code] = @p__linq__1) AND ([Extent2].[ParentID] = [Extent1].[ID])
INNER JOIN [dbo].[TestTree] AS [Extent3] ON ([Extent3].[Code] = @p__linq__2) AND ([Extent3].[ParentID] = [Extent2].[ID])
WHERE ([Extent1].[Code] = @p__linq__0) AND ([Extent1].[ParentID] IS NULL)',N'@p__linq__1 nvarchar(4000),@p__linq__2 nvarchar(4000),@p__linq__0 nvarchar(4000)',@p__linq__1=N'bar',@p__linq__2=N'baz',@p__linq__0=N'foo'
And while I like the execution plan of the raw query (see below) a bit better, the approach is valid and perhaps useful.
End of UPDATE
Using IEnumerable
The idea is to grab the relevant data from the table in one go and then do the traversing in the application using LINQ to Objects.
Here's a recursive function that will get a node from a sequence:
static TestTree GetNode(this IEnumerable<TestTree> table, string[] parts, int index, int? parentID)
{
var q = table
.Where(r =>
r.Code == parts[index] &&
(r.ParentID.HasValue ? r.ParentID == parentID : parentID == null))
.Single();
return index < parts.Length - 1 ? table.GetNode(parts, index + 1, q.ID) : q;
}
You can use it like this:
using (var context = new TestDBEntities())
{
var path = "foo/bar/baz";
var q = context.TestTrees.GetNode(path.Split('/'), 0, null);
Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code);
}
This will execute one DB query for each path part, so if you want the DB to only be queried once, use this instead:
using (var context = new TestDBEntities())
{
var path = "foo/bar/baz";
var q = context.TestTrees
.ToList()
.GetNode(path.Split('/'), 0, null);
Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code);
}
An obvious optimization is to exclude the codes not present in our path before traversing:
using (var context = new TestDBEntities())
{
var path = "foo/bar/baz";
var parts = path.Split('/');
var q = context
.TestTrees
.Where(r => parts.Any(p => p == r.Code))
.ToList()
.GetNode(parts, 0, null);
Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code);
}
This query should be fast enough unless most of your entities have similar codes. However, if you absolutely need top performance, you could use raw queries.
SQL Server Raw Query
For SQL Server a CTE-based query would probably be best:
using (var context = new TestDBEntities())
{
var path = "foo/bar/baz";
var q = context.Database.SqlQuery<TestTree>(@"
WITH Tree(ID, ParentID, Code, TreePath) AS
(
SELECT ID, ParentID, Code, CAST(Code AS nvarchar(512)) AS TreePath
FROM dbo.TestTree
WHERE ParentID IS NULL
UNION ALL
SELECT TestTree.ID, TestTree.ParentID, TestTree.Code, CAST(TreePath + '/' + TestTree.Code AS nvarchar(512))
FROM dbo.TestTree
INNER JOIN Tree ON Tree.ID = TestTree.ParentID
)
SELECT * FROM Tree WHERE TreePath = @path", new SqlParameter("path", path)).Single();
Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code);
}
Limiting data by the root node is easy and might be quite useful performance-wise:
using (var context = new TestDBEntities())
{
var path = "foo/bar/baz";
var q = context.Database.SqlQuery<TestTree>(@"
WITH Tree(ID, ParentID, Code, TreePath) AS
(
SELECT ID, ParentID, Code, CAST(Code AS nvarchar(512)) AS TreePath
FROM dbo.TestTree
WHERE ParentID IS NULL AND Code = @parentCode
UNION ALL
SELECT TestTree.ID, TestTree.ParentID, TestTree.Code, CAST(TreePath + '/' + TestTree.Code AS nvarchar(512))
FROM dbo.TestTree
INNER JOIN Tree ON Tree.ID = TestTree.ParentID
)
SELECT * FROM Tree WHERE TreePath = @path",
new SqlParameter("path", path),
new SqlParameter("parentCode", path.Split('/')[0]))
.Single();
Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code);
}
Footnotes
All of this was tested with .NET 4.5, EF 5, SQL Server 2012. Data setup script:
CREATE TABLE dbo.TestTree
(
ID int not null IDENTITY PRIMARY KEY,
ParentID int null REFERENCES dbo.TestTree (ID),
Code nvarchar(100)
)
GO
INSERT dbo.TestTree (ParentID, Code) VALUES (null, 'foo')
INSERT dbo.TestTree (ParentID, Code) VALUES (1, 'bar')
INSERT dbo.TestTree (ParentID, Code) VALUES (2, 'baz')
INSERT dbo.TestTree (ParentID, Code) VALUES (null, 'bla')
INSERT dbo.TestTree (ParentID, Code) VALUES (1, 'blu')
INSERT dbo.TestTree (ParentID, Code) VALUES (2, 'blo')
INSERT dbo.TestTree (ParentID, Code) VALUES (null, 'baz')
INSERT dbo.TestTree (ParentID, Code) VALUES (1, 'foo')
INSERT dbo.TestTree (ParentID, Code) VALUES (2, 'bar')
All examples in my test returned the 'baz' entity with ID 3. It's assumed that the entity actually exists. Error handling is out of scope of this post.
UPDATE
To address @Bazzz's comment, the data with paths is shown below. Code is unique by level, not globally.
ID ParentID Code TreePath
---- ----------- --------- -------------------
1 NULL foo foo
4 NULL bla bla
7 NULL baz baz
2 1 bar foo/bar
5 1 blu foo/blu
8 1 foo foo/foo
3 2 baz foo/bar/baz
6 2 blo foo/bar/blo
9 2 bar foo/bar/bar
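For reference, the TreePath column above can also be reproduced in memory once the rows are loaded. The sketch below is just an illustration with made-up names (TreeRow, TreePaths.Build); it assumes every ParentID refers to an existing row and that the data contains no cycles.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class TreeRow
{
    public int ID;
    public int? ParentID;
    public string Code;
}

public static class TreePaths
{
    // Maps each row ID to its full path, e.g. 3 -> "foo/bar/baz".
    public static Dictionary<int, string> Build(IEnumerable<TreeRow> rows)
    {
        Dictionary<int, TreeRow> byId = rows.ToDictionary(r => r.ID);
        var paths = new Dictionary<int, string>();
        foreach (TreeRow row in byId.Values)
            paths[row.ID] = PathOf(row, byId);
        return paths;
    }

    // Walks up the parent chain; assumes the data contains no cycles.
    static string PathOf(TreeRow row, Dictionary<int, TreeRow> byId)
    {
        return row.ParentID.HasValue
            ? PathOf(byId[row.ParentID.Value], byId) + "/" + row.Code
            : row.Code;
    }
}
```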

The trick is to do it the other way around, and build up the following query:
from entity in dataContext.Entities
where entity.Code == "baz"
where entity.Parent.Code == "bar"
where entity.Parent.Parent.Code == "foo"
where entity.Parent.Parent.ParentID == 0
select entity;
A bit naive (hard coded) solution would be like this:
var pathParts = path.Split('/').ToList();
var entities =
from entity in dataContext.Entities
select entity;
pathParts.Reverse();
for (int index = 0; index < pathParts.Count; index++)
{
string pathPart = pathParts[index];
switch (index)
{
case 0:
entities = entities.Where(
entity => entity.Code == pathPart);
break;
case 1:
entities = entities.Where(
entity => entity.Parent.Code == pathPart);
break;
case 2:
entities = entities.Where(
entity => entity.Parent.Parent.Code == pathPart);
break;
case 3:
entities = entities.Where(
entity => entity.Parent.Parent.Parent.Code == pathPart);
break;
default:
throw new NotSupportedException();
}
}
Doing this dynamically by building expression trees isn't trivial, but can be done by looking closely at what the C# compiler generates (using ILDasm or Reflector for instance). Here is an example:
private static Entity GetEntityByPath(DataContext dataContext, string path)
{
List<string> pathParts = path.Split(new char[] { '/' }).ToList<string>();
pathParts.Reverse();
var entities =
from entity in dataContext.Entities
select entity;
// Build up a template expression that will be used to create the real expressions with.
Expression<Func<Entity, bool>> templateExpression = entity => entity.Code == "dummy";
var equals = (BinaryExpression)templateExpression.Body;
var property = (MemberExpression)equals.Left;
ParameterExpression entityParameter = Expression.Parameter(typeof(Entity), "entity");
for (int index = 0; index < pathParts.Count; index++)
{
string pathPart = pathParts[index];
var entityFilterExpression =
Expression.Lambda<Func<Entity, bool>>(
Expression.Equal(
Expression.Property(
BuildParentPropertiesExpression(index, entityParameter),
// property.Member is the PropertyInfo for Entity.Code
(PropertyInfo)property.Member),
Expression.Constant(pathPart),
equals.IsLiftedToNull,
equals.Method),
// The lambda must declare the same parameter that its body uses.
entityParameter);
entities = entities.Where<Entity>(entityFilterExpression);
// TODO: The entity.Parent.Parent.ParentID == 0 part is missing here.
}
return entities.Single<Entity>();
}
private static Expression BuildParentPropertiesExpression(int numberOfParents, ParameterExpression entityParameter)
{
if (numberOfParents == 0)
{
return entityParameter;
}
var getParentMethod = typeof(Entity).GetProperty("Parent").GetGetMethod();
var property = Expression.Property(entityParameter, getParentMethod);
for (int count = 2; count <= numberOfParents; count++)
{
property = Expression.Property(property, getParentMethod);
}
return property;
}
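A stripped-down, self-contained variant of the same technique may be easier to experiment with. It builds entity.Parent...Parent.Code == code by property name and can be compiled and run against plain objects. Node and AncestorFilter are hypothetical stand-ins for the real entity types, and whether a particular LINQ provider translates the resulting tree is something I have not verified.

```csharp
using System;
using System.Linq.Expressions;

public class Node
{
    public string Code { get; set; }
    public Node Parent { get; set; }
}

public static class AncestorFilter
{
    // depth 0 tests entity.Code, depth 1 tests entity.Parent.Code, and so on.
    public static Expression<Func<Node, bool>> CodeAtDepthEquals(int depth, string code)
    {
        ParameterExpression entity = Expression.Parameter(typeof(Node), "entity");
        Expression target = entity;
        for (int i = 0; i < depth; i++)
            target = Expression.Property(target, "Parent");

        Expression body = Expression.Equal(
            Expression.Property(target, "Code"),
            Expression.Constant(code, typeof(string)));

        return Expression.Lambda<Func<Node, bool>>(body, entity);
    }
}
```

When compiled and run in memory, a missing parent at the requested depth throws a NullReferenceException, so callers would need to guard against that; a SQL translation would not have this problem.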

You need a recursive function instead of your loop. Something like this should do the job:
public EntityTable Single(string path)
{
List<string> pathParts = path.Split('/').ToList();
string code = pathParts.Last();
var entities = dataContext.EntityTables.Where(e => e.Code == code);
pathParts.RemoveAt(pathParts.Count - 1);
return GetRecursively(entities, pathParts);
}
private EntityTable GetRecursively(IQueryable<EntityTable> entity, List<string> pathParts)
{
if (!(entity == null || pathParts.Count == 0))
{
string code = pathParts.Last();
if (pathParts.Count == 1)
{
return entity.Where(x => x.EntityTable1.Code == code && x.ParentId == x.Id).FirstOrDefault();
}
else
{
pathParts.RemoveAt(pathParts.Count - 1);
return this.GetRecursively(entity.Where(x => x.EntityTable1.Code == code), pathParts);
}
}
else
{
return null;
}
}
As you see, I am just returning the ultimate parent node. If you wanted to get a list of all EntityTable objects then I would make the recursive method to return a List of Ids of found nodes, and at the end - in the Single(...) method - run a simple LINQ query to get your IQueryable object using this list of IDs.
Edit:
I tried to do your task, but I think there is a fundamental problem: there are cases where you cannot identify a single path. For example, you have two paths "foo/bar/baz" and "foo/bar/baz/bak" where the "baz" entities are different. If you look for the path "foo/bar/baz", you'll always find two matching paths (one being a prefix of the four-entity path). Although you can get your "baz" entity correctly, this is too confusing and I would just redesign it: either add a unique constraint so that each entity can only be used once, or store the full path in the "Code" column.

LINQ extension method help sought

This is by far my toughest question yet, and I'm hoping someone has stumbled upon this issue before and found an elegant answer. Basically, I've got a few LINQ extension methods (which just happen to be in SubSonic but would be applicable in any LINQ derivative) that are working perfectly (extensions for .WhereIn() and .WhereNotIn()). These methods transform the LINQ into the SQL equivalents of in(). Now the code below works perfectly when supplying known typed parameters (i.e. an array or params array):
public static IQueryable<T> WhereIn<T, TValue>(
this IQueryable<T> query,
Expression<Func<T, TValue>> selector,
params TValue[] collection) where T : class
{
if (selector == null) throw new ArgumentNullException("selector");
if (collection == null) throw new ArgumentNullException("collection");
ParameterExpression p = selector.Parameters.Single();
if (!collection.Any()) return query;
IEnumerable<Expression> equals = collection.Select(value =>
(Expression)Expression.Equal(selector.Body,
Expression.Constant(value, typeof(TValue))));
Expression body = equals.Aggregate(Expression.Or);
return query.Where(Expression.Lambda<Func<T, bool>>(body, p));
}
usage:
var args = new [] { 1, 2, 3 };
var bookings = _repository.Find(r => r.id > 0).WhereIn(x => x.BookingTypeID, args);
// OR we could just as easily plug args in as 1,2,3 as it's defined as params
var bookings2 = _repository.Find(r => r.id > 0).WhereIn(x => x.BookingTypeID, 1,2,3,90);
However, now for the complicated part. I'd like to be able to pass an IQueryable object into an overload version of the above that accepts a second linq object as the parameter in order to achieve the equivalent of select * from table1 where table1.id in(select id from table2). here is the method signature that actually compiles ok but has the all important logic missing:
public static IQueryable<T> WhereIn<T, TValue, T2, TValue2>(
this IQueryable<T> query,
Expression<Func<T, TValue>> selector,
T2 entity2,
Expression<Func<T2, TValue2>> selector2) where T : class
{
if (selector == null) throw new ArgumentNullException("selector");
if (selector2 == null) throw new ArgumentNullException("selector2");
ParameterExpression p = selector.Parameters.Single();
ParameterExpression p2 = selector2.Parameters.Single();
/* this is the missing section */
/* i'd like to see the final select generated as
*
* select * from T where T.selector in(select T2.selector2 from T2)
*/
return null;
// this is just to allow it to compile - proper return value pending
}
usage:
var bookings = _repository.Find(r => r.BookingID>0)
.WhereIn(x => x.BookingTypeID, new BookingType(), y => y.BookingTypeID);
Am I barking up a non-existent (expression) tree here :-) or is this pretty doable?
all the best - here's hoping.
jim

Why would you not just use a join?
var query = from x in table1
join y in table2 on x.Id equals y.Id
select x;
Or if there might be multiple y values for each x:
var query = from x in table1
join z in table2.Select(y => y.Id).Distinct() on x.Id equals z
select x;
I would expect queries like that to be well optimized in SQL databases.
Or if you really want to use Where:
var query = table1.Where(x => table2.Select(y => y.Id).Contains(x.Id));
I may be missing something bigger... or it could be that translating the above queries into extension methods is what you're looking for :)
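If an extension method is what you are after, one possible shape is to wrap the Contains form in an expression built by hand. This is only a sketch: WhereInSubquery is a made-up name, it takes the second query already projected to the value column, and I have only reasoned about it against in-memory AsQueryable() sources, not against SubSonic or any other provider.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class WhereInSubqueryExtensions
{
    // Builds: x => values.Contains(selector(x))
    public static IQueryable<T> WhereInSubquery<T, TValue>(
        this IQueryable<T> query,
        Expression<Func<T, TValue>> selector,
        IQueryable<TValue> values)
    {
        if (selector == null) throw new ArgumentNullException("selector");
        if (values == null) throw new ArgumentNullException("values");

        // Queryable.Contains<TValue>(values, <selector body>)
        MethodCallExpression contains = Expression.Call(
            typeof(Queryable), "Contains", new[] { typeof(TValue) },
            values.Expression, selector.Body);

        return query.Where(Expression.Lambda<Func<T, bool>>(contains, selector.Parameters));
    }
}
```

Usage would be along the lines of table1.WhereInSubquery(x => x.Id, table2.Select(y => y.Id)), which is the same query as the Where/Contains line above, just packaged differently.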

I eventually opted for an extension method to achieve this, but it still isn't 100% successful.
I'll drop the actual full working code here at some point later, once I've integrated it with all my other options.
