How to formulate an IQueryable to query a recursive database table? - c#

I have a database table like this:
Entity
---------------------
ID int PK
ParentID int FK
Code varchar
Text text
The ParentID field is a foreign key referencing another record in the same table (recursive), so the structure represents a tree.
I'm trying to write a method to query this table and get 1 specific Entity based on a path. A path would be a string representing the Code properties of the Entity and the parent Entities. So an example path would be "foo/bar/baz" which means the one specific Entity of which the Code == "baz", the parent's Code == "bar" and the parent of the parent's Code == "foo".
My attempt:
public Entity Single(string path)
{
string[] pathParts = path.Split('/');
string code = pathParts[pathParts.Length -1];
if (pathParts.Length == 1)
return dataContext.Entities.Single(e => e.Code == code && e.ParentID == 0);
IQueryable<Entity> entities = dataContext.Entities.Where(e => e.Code == code);
for (int i = pathParts.Length - 2; i >= 0; i--)
{
string parentCode = pathParts[i];
entities = entities.Where(e => e.Entity1.Code == parentCode); // incorrect
}
return entities.Single();
}
I know this isn't correct because the Where inside the for loop just adds more conditions to the current Entity instead of the parent Entity, but how do I correct this? In words, I would like the for loop to say "and the parent's code must be x, and the parent of that parent's code must be y, and the parent of that parent of that parent's code must be z", and so on. Besides that, for performance reasons I'd like it to be one IQueryable so that just 1 query goes to the database.

How to formulate an IQueryable to query a recursive database table?
I'd like it to be one IQueryable so there will be just 1 query going
to the database.
I don't think traversing a hierarchical table using a single translated query is currently possible with Entity Framework. The reason is that you need either a loop or recursion, and to the best of my knowledge neither can be translated into an EF object store query.
UPDATE
@Bazzz and @Steven got me thinking and I have to admit I was completely wrong: it is possible and quite easy to construct an IQueryable for these requirements dynamically.
The following function can be called recursively to build up the query:
public static IQueryable<TestTree> Traverse(this IQueryable<TestTree> source, IQueryable<TestTree> table, LinkedList<string> parts)
{
var code = parts.First.Value;
var query = source.SelectMany(r1 => table.Where(r2 => r2.Code == code && r2.ParentID == r1.ID), (r1, r2) => r2);
if (parts.Count == 1)
{
return query;
}
parts.RemoveFirst();
return query.Traverse(table, parts);
}
The root query is a special case; here's a working example of calling Traverse:
using (var context = new TestDBEntities())
{
var path = "foo/bar/baz";
var parts = new LinkedList<string>(path.Split('/'));
var table = context.TestTrees;
var code = parts.First.Value;
var root = table.Where(r1 => r1.Code == code && !r1.ParentID.HasValue);
parts.RemoveFirst();
foreach (var q in root.Traverse(table, parts))
Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code);
}
The DB is queried only once with this generated code:
exec sp_executesql N'SELECT
[Extent3].[ID] AS [ID],
[Extent3].[ParentID] AS [ParentID],
[Extent3].[Code] AS [Code]
FROM [dbo].[TestTree] AS [Extent1]
INNER JOIN [dbo].[TestTree] AS [Extent2] ON ([Extent2].[Code] = @p__linq__1) AND ([Extent2].[ParentID] = [Extent1].[ID])
INNER JOIN [dbo].[TestTree] AS [Extent3] ON ([Extent3].[Code] = @p__linq__2) AND ([Extent3].[ParentID] = [Extent2].[ID])
WHERE ([Extent1].[Code] = @p__linq__0) AND ([Extent1].[ParentID] IS NULL)',N'@p__linq__1 nvarchar(4000),@p__linq__2 nvarchar(4000),@p__linq__0 nvarchar(4000)',@p__linq__1=N'bar',@p__linq__2=N'baz',@p__linq__0=N'foo'
And while I like the execution plan of the raw query (see below) a bit better, the approach is valid and perhaps useful.
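The same composition can be tried without a database. The sketch below is not the code from this answer: it is an iterative variant of Traverse, run over an in-memory list through AsQueryable(). The PathQuery class and the SampleTable() helper are hypothetical names, and the rows mirror the data setup script from the footnotes. Against a real EF context, table would be context.TestTrees, and the translation to a single SQL statement is assumed to behave as shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class TestTree
{
    public int ID { get; set; }
    public int? ParentID { get; set; }
    public string Code { get; set; }
}

public static class PathQuery
{
    // Composes one query: a root filter plus one self-join per remaining path part.
    public static IQueryable<TestTree> ByPath(IQueryable<TestTree> table, string path)
    {
        string[] parts = path.Split('/');
        string rootCode = parts[0];
        IQueryable<TestTree> query =
            table.Where(r => r.Code == rootCode && r.ParentID == null);

        foreach (string part in parts.Skip(1))
        {
            string code = part; // fresh local per iteration, captured by the lambda
            query = query.SelectMany(
                parent => table.Where(child => child.Code == code && child.ParentID == parent.ID),
                (parent, child) => child);
        }
        return query; // nothing is executed until the caller enumerates it
    }

    // Hypothetical in-memory stand-in for the TestTree table.
    public static IQueryable<TestTree> SampleTable()
    {
        var rows = new List<TestTree>
        {
            new TestTree { ID = 1, ParentID = null, Code = "foo" },
            new TestTree { ID = 2, ParentID = 1, Code = "bar" },
            new TestTree { ID = 3, ParentID = 2, Code = "baz" },
            new TestTree { ID = 4, ParentID = null, Code = "bla" },
            new TestTree { ID = 5, ParentID = 1, Code = "blu" },
            new TestTree { ID = 6, ParentID = 2, Code = "blo" },
            new TestTree { ID = 7, ParentID = null, Code = "baz" },
            new TestTree { ID = 8, ParentID = 1, Code = "foo" },
            new TestTree { ID = 9, ParentID = 2, Code = "bar" }
        };
        return rows.AsQueryable();
    }
}
```

With these rows, PathQuery.ByPath(PathQuery.SampleTable(), "foo/bar/baz").Single() yields the row with ID 3, and the path "baz" alone yields the root row with ID 7.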
End of UPDATE
Using IEnumerable
The idea is to grab the relevant data from the table in one go and then do the traversing in the application using LINQ to Objects.
Here's a recursive function that will get a node from a sequence:
static TestTree GetNode(this IEnumerable<TestTree> table, string[] parts, int index, int? parentID)
{
var q = table
.Where(r =>
r.Code == parts[index] &&
(r.ParentID.HasValue ? r.ParentID == parentID : parentID == null))
.Single();
return index < parts.Length - 1 ? table.GetNode(parts, index + 1, q.ID) : q;
}
You can use it like this:
using (var context = new TestDBEntities())
{
var path = "foo/bar/baz";
var q = context.TestTrees.GetNode(path.Split('/'), 0, null);
Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code);
}
This will execute one DB query for each path part, so if you want the DB to only be queried once, use this instead:
using (var context = new TestDBEntities())
{
var path = "foo/bar/baz";
var q = context.TestTrees
.ToList()
.GetNode(path.Split('/'), 0, null);
Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code);
}
An obvious optimization is to exclude the codes not present in our path before traversing:
using (var context = new TestDBEntities())
{
var path = "foo/bar/baz";
var parts = path.Split('/');
var q = context
.TestTrees
.Where(r => parts.Any(p => p == r.Code))
.ToList()
.GetNode(parts, 0, null);
Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code);
}
This query should be fast enough unless most of your entities have similar codes. However, if you absolutely need top performance, you could use raw queries.
SQL Server Raw Query
For SQL Server a CTE-based query would probably be best:
using (var context = new TestDBEntities())
{
var path = "foo/bar/baz";
var q = context.Database.SqlQuery<TestTree>(@"
WITH Tree(ID, ParentID, Code, TreePath) AS
(
SELECT ID, ParentID, Code, CAST(Code AS nvarchar(512)) AS TreePath
FROM dbo.TestTree
WHERE ParentID IS NULL
UNION ALL
SELECT TestTree.ID, TestTree.ParentID, TestTree.Code, CAST(TreePath + '/' + TestTree.Code AS nvarchar(512))
FROM dbo.TestTree
INNER JOIN Tree ON Tree.ID = TestTree.ParentID
)
SELECT * FROM Tree WHERE TreePath = @path", new SqlParameter("path", path)).Single();
Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code);
}
Limiting data by the root node is easy and might be quite useful performance-wise:
using (var context = new TestDBEntities())
{
var path = "foo/bar/baz";
var q = context.Database.SqlQuery<TestTree>(@"
WITH Tree(ID, ParentID, Code, TreePath) AS
(
SELECT ID, ParentID, Code, CAST(Code AS nvarchar(512)) AS TreePath
FROM dbo.TestTree
WHERE ParentID IS NULL AND Code = @parentCode
UNION ALL
SELECT TestTree.ID, TestTree.ParentID, TestTree.Code, CAST(TreePath + '/' + TestTree.Code AS nvarchar(512))
FROM dbo.TestTree
INNER JOIN Tree ON Tree.ID = TestTree.ParentID
)
SELECT * FROM Tree WHERE TreePath = @path",
new SqlParameter("path", path),
new SqlParameter("parentCode", path.Split('/')[0]))
.Single();
Console.WriteLine("{0} {1} {2}", q.ID, q.ParentID, q.Code);
}
Footnotes
All of this was tested with .NET 4.5, EF 5, SQL Server 2012. Data setup script:
CREATE TABLE dbo.TestTree
(
ID int not null IDENTITY PRIMARY KEY,
ParentID int null REFERENCES dbo.TestTree (ID),
Code nvarchar(100)
)
GO
INSERT dbo.TestTree (ParentID, Code) VALUES (null, 'foo')
INSERT dbo.TestTree (ParentID, Code) VALUES (1, 'bar')
INSERT dbo.TestTree (ParentID, Code) VALUES (2, 'baz')
INSERT dbo.TestTree (ParentID, Code) VALUES (null, 'bla')
INSERT dbo.TestTree (ParentID, Code) VALUES (1, 'blu')
INSERT dbo.TestTree (ParentID, Code) VALUES (2, 'blo')
INSERT dbo.TestTree (ParentID, Code) VALUES (null, 'baz')
INSERT dbo.TestTree (ParentID, Code) VALUES (1, 'foo')
INSERT dbo.TestTree (ParentID, Code) VALUES (2, 'bar')
All examples in my test returned the 'baz' entity with ID 3. It's assumed that the entity actually exists. Error handling is out of scope of this post.
UPDATE
To address @Bazzz's comment, the data with paths is shown below. Code is unique by level, not globally.
ID   ParentID   Code   TreePath
---  ---------  -----  -----------
1    NULL       foo    foo
4    NULL       bla    bla
7    NULL       baz    baz
2    1          bar    foo/bar
5    1          blu    foo/blu
8    1          foo    foo/foo
3    2          baz    foo/bar/baz
6    2          blo    foo/bar/blo
9    2          bar    foo/bar/bar

The trick is to do it the other way around, and build up the following query:
from entity in dataContext.Entities
where entity.Code == "baz"
where entity.Parent.Code == "bar"
where entity.Parent.Parent.Code == "foo"
where entity.Parent.Parent.ParentID == 0
select entity;
A bit naive (hard coded) solution would be like this:
var pathParts = path.Split('/').ToList();
var entities =
from entity in dataContext.Entities
select entity;
pathParts.Reverse();
for (int index = 0; index < pathParts.Count; index++)
{
string pathPart = pathParts[index];
switch (index)
{
case 0:
entities = entities.Where(
entity => entity.Code == pathPart);
break;
case 1:
entities = entities.Where(
entity => entity.Parent.Code == pathPart);
break;
case 2:
entities = entities.Where(entity => entity.Parent.Parent.Code == pathPart);
break;
case 3:
entities = entities.Where(
entity => entity.Parent.Parent.Parent.Code == pathPart);
break;
default:
throw new NotSupportedException();
}
}
Doing this dynamically by building expression trees isn't trivial, but can be done by looking closely at what the C# compiler generates (using ILDasm or Reflector for instance). Here is an example:
private static Entity GetEntityByPath(DataContext dataContext, string path)
{
List<string> pathParts = path.Split(new char[] { '/' }).ToList<string>();
pathParts.Reverse();
var entities =
from entity in dataContext.Entities
select entity;
// Build up a template expression that will be used to create the real expressions with.
Expression<Func<Entity, bool>> templateExpression = entity => entity.Code == "dummy";
var equals = (BinaryExpression)templateExpression.Body;
var property = (MemberExpression)equals.Left;
ParameterExpression entityParameter = Expression.Parameter(typeof(Entity), "entity");
for (int index = 0; index < pathParts.Count; index++)
{
string pathPart = pathParts[index];
var entityFilterExpression =
Expression.Lambda<Func<Entity, bool>>(
Expression.Equal(
Expression.Property(
BuildParentPropertiesExpression(index, entityParameter),
(PropertyInfo)property.Member), // Code is a property, so Member is a PropertyInfo
Expression.Constant(pathPart),
equals.IsLiftedToNull,
equals.Method),
entityParameter); // must be the same parameter instance that the body uses
entities = entities.Where<Entity>(entityFilterExpression);
// TODO: The entity.Parent.Parent.ParentID == 0 part is missing here.
}
return entities.Single<Entity>();
}
private static Expression BuildParentPropertiesExpression(int numberOfParents, ParameterExpression entityParameter)
{
if (numberOfParents == 0)
{
return entityParameter;
}
var getParentMethod = typeof(Entity).GetProperty("Parent").GetGetMethod();
var property = Expression.Property(entityParameter, getParentMethod);
for (int count = 2; count <= numberOfParents; count++)
{
property = Expression.Property(property, getParentMethod);
}
return property;
}
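The missing root check from the TODO can be folded into the same expression. The following is a minimal sketch under stated assumptions, not the code above: it uses a hypothetical Entity class whose Parent is null for a root (the question uses ParentID == 0 instead), builds one combined predicate rather than chaining Where calls, and adds null guards so the predicate can also be evaluated in memory. PathPredicate and Sample() are names invented for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Entity
{
    public int ID { get; set; }
    public string Code { get; set; }
    public Entity Parent { get; set; } // assumption: null marks a root entity
}

public static class PathPredicate
{
    // Builds: e != null && e.Code == last && e.Parent != null && e.Parent.Code == ... && topmost.Parent == null
    public static Expression<Func<Entity, bool>> Build(string path)
    {
        string[] parts = path.Split('/');
        ParameterExpression entity = Expression.Parameter(typeof(Entity), "entity");
        ConstantExpression nullEntity = Expression.Constant(null, typeof(Entity));

        Expression node = entity; // entity, then entity.Parent, entity.Parent.Parent, ...
        Expression body = null;
        for (int i = parts.Length - 1; i >= 0; i--)
        {
            Expression matches = Expression.AndAlso(
                Expression.NotEqual(node, nullEntity),
                Expression.Equal(
                    Expression.Property(node, nameof(Entity.Code)),
                    Expression.Constant(parts[i])));
            body = body == null ? matches : Expression.AndAlso(body, matches);
            node = Expression.Property(node, nameof(Entity.Parent));
        }

        // Root check: the entity matched by the first path part has no parent.
        body = Expression.AndAlso(body, Expression.Equal(node, nullEntity));
        return Expression.Lambda<Func<Entity, bool>>(body, entity);
    }

    // Hypothetical in-memory data: foo/bar/baz plus a second, root-level "baz".
    public static List<Entity> Sample()
    {
        var foo = new Entity { ID = 1, Code = "foo" };
        var bar = new Entity { ID = 2, Code = "bar", Parent = foo };
        var baz = new Entity { ID = 3, Code = "baz", Parent = bar };
        var rootBaz = new Entity { ID = 7, Code = "baz" };
        return new List<Entity> { foo, bar, baz, rootBaz };
    }
}
```

Used as PathPredicate.Sample().AsQueryable().Where(PathPredicate.Build("foo/bar/baz")).Single(), this returns the entity with ID 3. Whether a given LINQ provider translates the null guards cleanly is something to verify against that provider.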

You need a recursive function instead of your loop. Something like this should do the job:
public EntityTable Single(string path)
{
List<string> pathParts = path.Split('/').ToList();
string code = pathParts.Last();
var entities = dataContext.EntityTables.Where(e => e.Code == code);
pathParts.RemoveAt(pathParts.Count - 1);
return GetRecursively(entities, pathParts);
}
private EntityTable GetRecursively(IQueryable<EntityTable> entity, List<string> pathParts)
{
if (!(entity == null || pathParts.Count == 0))
{
string code = pathParts.Last();
if (pathParts.Count == 1)
{
return entity.Where(x => x.EntityTable1.Code == code && x.ParentId == x.Id).FirstOrDefault();
}
else
{
pathParts.RemoveAt(pathParts.Count - 1);
return this.GetRecursively(entity.Where(x => x.EntityTable1.Code == code), pathParts);
}
}
else
{
return null;
}
}
As you see, I am just returning the ultimate parent node. If you wanted to get a list of all EntityTable objects then I would make the recursive method to return a List of Ids of found nodes, and at the end - in the Single(...) method - run a simple LINQ query to get your IQueryable object using this list of IDs.
Edit:
I tried to do your task, but I think there is a fundamental problem: there are cases where you cannot identify a single path. For example, suppose you have two paths "foo/bar/baz" and "foo/bar/baz/bak" where the "baz" entities are different. If you look up the path "foo/bar/baz", you will always find two matching paths (one being a prefix of the four-entity path). You can still get your "baz" entity correctly, but this is too confusing and I would just redesign it: either add a unique constraint so that each entity can only be used once, or store the full path in the "Code" column.

Related

Update IQueryable result before using as join in next query

I need to use Linq to Entity Framework to query a LOCATION table to get the record of the location code with the MAX effective date, then use that result as a join in the next query.
I believe I need to do the conversion before the IQueryable is used, because the last clause of the second query excludes records whose FLOOR code is in the excludedSchools list. That excludedSchools list will have the newLocationCode in it.
So, I need to update the values in the IQueryable result before I use it. Can I do this? Here is my code:
using (var db = new TheContext())
{
IQueryable<LocationTable> locatinWithMaxEffDate =
(from lc in db.LocationTable
where lc.EFF_STATUS == "A" && lc.EFFDT <= DateTime.Now
group lc by lc.LOCATION into g
select g.OrderByDescending(x => x.EFFDT).FirstOrDefault()
);
foreach (var location in locatinWithMaxEffDate.ToList())
{
string newLocationCode;
if(codeMappingDictionary.TryGetValue(location.FLOOR, out newLocationCode))
{
// how do I update locatinWithMaxEffDate FLOOR value
// with newLocationCode so it works in the query below?
location.FLOOR = newLocationCode;
}
}
var query =
(from fim in db.PS_PPS_FIM_EE_DATA
join mloc in locatinWithMaxEffDate on fim.LOCATION equals mloc.LOCATION
where
fim.EMPL_STATUS == PsPpsFimEeData.EmployeeStatusValues.Active
&& fim.AUTO_UPDATE == PsPpsFimEeData.AutoUpdateValues.Enabled
&& includeJobCodes.Contains(fim.JOBCODE)
&& !excludedSchools.Contains(mloc.FLOOR)
select new PpsAdministratorResult
{
SchoolId = mloc.FLOOR,
Login = fim.OPRID,
EmployeeId = fim.EMPLID,
});
}
With the code above, the locatinWithMaxEffDate does not have the updated FLOOR values. I can see why this is, but can't seem to fix it.
So far, I have tried introducing another list to ADD() the new location record to, then casting that as an IQueryable, but I get an error about primitive vs concrete types.
I decided to make things easier on myself. Since both sets of data are very small (fewer than 1000 records each), I can take the entire set of data as an anonymous type:
using (var db = new TheContext())
{
IQueryable<LocationTable> locatinWithMaxEffDate =
(from lc in db.LocationTable
where lc.EFF_STATUS == "A" && lc.EFFDT <= DateTime.Now
group lc by lc.LOCATION into g
select g.OrderByDescending(x => x.EFFDT).FirstOrDefault()
);
var query =
(from fim in db.PS_PPS_FIM_EE_DATA
join mloc in locatinWithMaxEffDate on fim.LOCATION equals mloc.LOCATION
where
fim.EMPL_STATUS == PsPpsFimEeData.EmployeeStatusValues.Active
&& fim.AUTO_UPDATE == PsPpsFimEeData.AutoUpdateValues.Enabled
&& includeJobCodes.Contains(fim.JOBCODE)
select new PpsAdministratorResult
{
SchoolId = mloc.FLOOR,
Login = fim.OPRID,
EmployeeId = fim.EMPLID,
});
}
Then, just work with the two objects:
List<PpsAdministratorResult> administratorList = new List<PpsAdministratorResult>();
foreach (var location in query.ToList())
{
string newLocationCode;
if(schoolCodeMappings.TryGetValue(location.SchoolId, out newLocationCode)) // && newLocationCode.Contains(location.LOCATION))
{
location.SchoolId = newLocationCode;
}
if( !excludedSchools.Contains(location.SchoolId) )
{
administratorList.Add(location);
}
}
Now, I have the list I want.

Querying a list of entities with composite keys in EF [duplicate]

Given a list of ids, I can query all relevant rows by:
context.Table.Where(q => listOfIds.Contains(q.Id));
But how do you achieve the same functionality when the Table has a composite key?
This is a nasty problem for which I don't know any elegant solution.
Suppose you have these key combinations, and you only want to select the marked ones (*).
Id1 Id2
--- ---
1 2 *
1 3
1 6
2 2 *
2 3 *
... (many more)
How to do this in a way that Entity Framework is happy with? Let's look at some possible solutions and see if they're any good.
Solution 1: Join (or Contains) with pairs
The best solution would be to create a list of the pairs you want, for instance Tuples, (List<Tuple<int,int>>) and join the database data with this list:
from entity in db.Table // db is a DbContext
join pair in Tuples on new { entity.Id1, entity.Id2 }
equals new { Id1 = pair.Item1, Id2 = pair.Item2 }
select entity
In LINQ to objects this would be perfect, but, too bad, EF will throw an exception like
Unable to create a constant value of type 'System.Tuple`2 (...) Only primitive types or enumeration types are supported in this context.
which is a rather clumsy way to tell you that it can't translate this statement into SQL, because Tuples is not a list of primitive values (like int or string). For the same reason a similar statement using Contains (or any other LINQ statement) would fail.
Solution 2: In-memory
Of course we could turn the problem into simple LINQ to objects like so:
from entity in db.Table.AsEnumerable() // fetch db.Table into memory first
join pair in Tuples on new { entity.Id1, entity.Id2 }
equals new { Id1 = pair.Item1, Id2 = pair.Item2 }
select entity
Needless to say that this is not a good solution. db.Table could contain millions of records.
Solution 3: Two Contains statements (incorrect)
So let's offer EF two lists of primitive values, [1,2] for Id1 and [2,3] for Id2. We don't want to use join, so let's use Contains:
from entity in db.Table
where ids1.Contains(entity.Id1) && ids2.Contains(entity.Id2)
select entity
But now the results also contain entity {1,3}! Well, of course, this entity perfectly matches the two predicates. But let's keep in mind that we're getting closer. Instead of pulling millions of entities into memory, we now only get four of them.
Solution 4: One Contains with computed values
Solution 3 failed because the two separate Contains statements don't only filter the combinations of their values. What if we create a list of combinations first and try to match these combinations? We know from solution 1 that this list should contain primitive values. For instance:
var computed = ids1.Zip(ids2, (i1,i2) => i1 * i2); // [2,6]
and the LINQ statement:
from entity in db.Table
where computed.Contains(entity.Id1 * entity.Id2)
select entity
There are some problems with this approach. First, you'll see that this also returns entity {1,6}. The combination function (a*b) does not produce values that uniquely identify a pair in the database. Now we could create a list of strings like ["Id1=1,Id2=2","Id1=2,Id2=3"] and do
from entity in db.Table
where computed.Contains("Id1=" + entity.Id1 + "," + "Id2=" + entity.Id2)
select entity
(This would work in EF6, not in earlier versions).
This is getting pretty messy. But a more important problem is that this solution is not sargable, which means: it bypasses any database indexes on Id1 and Id2 that could have been used otherwise. This will perform very very poorly.
Solution 5: Best of 2 and 3
So the most viable solution I can think of is a combination of Contains and a join in memory: First do the contains statement as in solution 3. Remember, it got us very close to what we wanted. Then refine the query result by joining the result as an in-memory list:
var rawSelection = from entity in db.Table
where ids1.Contains(entity.Id1) && ids2.Contains(entity.Id2)
select entity;
var refined = from entity in rawSelection.AsEnumerable()
join pair in Tuples on new { entity.Id1, entity.Id2 }
equals new { Id1 = pair.Item1, Id2 = pair.Item2 }
select entity;
It's not elegant, and maybe just as messy, but so far it's the only scalable [1] solution to this problem that I have found and applied in my own code.
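The two steps of Solution 5 can be sketched as follows. This is an illustration under assumptions, not the author's production code: Row, CompositeKeyFilter and Select are hypothetical names, value tuples stand in for Tuple<int,int>, and a HashSet lookup replaces the in-memory join. The IQueryable is assumed to be a DbSet in real use; here any IQueryable works, including a list wrapped with AsQueryable().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Row
{
    public int Id1 { get; set; }
    public int Id2 { get; set; }
}

public static class CompositeKeyFilter
{
    public static List<Row> Select(
        IQueryable<Row> table,
        IReadOnlyCollection<(int Id1, int Id2)> pairs)
    {
        // Lists of primitive values, which a LINQ provider can turn into IN (...) clauses.
        List<int> ids1 = pairs.Select(p => p.Id1).Distinct().ToList();
        List<int> ids2 = pairs.Select(p => p.Id2).Distinct().ToList();

        // Step 1 (database side): coarse filter, may include pairs like {1,3}.
        IQueryable<Row> coarse =
            table.Where(r => ids1.Contains(r.Id1) && ids2.Contains(r.Id2));

        // Step 2 (in memory): keep only the exact combinations that were asked for.
        var wanted = new HashSet<(int, int)>(pairs);
        return coarse
            .AsEnumerable()
            .Where(r => wanted.Contains((r.Id1, r.Id2)))
            .ToList();
    }
}
```

For the example keys, step 1 returns four rows ({1,2}, {1,3}, {2,2}, {2,3}) and step 2 drops {1,3}, leaving the three marked rows.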
Solution 6: Build a query with OR clauses
Using a Predicate builder like Linqkit or alternatives, you can build a query that contains an OR clause for each element in the list of combinations. This could be a viable option for really short lists. With a couple of hundreds of elements, the query will start performing very poorly. So I don't consider this a good solution unless you can be 100% sure that there will always be a small number of elements. One elaboration of this option can be found here.
Solution 7: Unions
There's also a solution using UNIONs that I posted later here.
[1] As far as the Contains statement is scalable: Scalable Contains method for LINQ against a SQL backend
Solution for Entity Framework Core with SQL Server
🎉 NEW! QueryableValues EF6 Edition has arrived!
The following solution makes use of QueryableValues. This is a library that I wrote to primarily solve the problem of query plan cache pollution in SQL Server caused by queries that compose local values using the Contains LINQ method. It also allows you to compose values of complex types in your queries in a performant way, which will achieve what's being asked in this question.
First you will need to install and set up the library, after doing that you can use any of the following patterns that will allow you to query your entities using a composite key:
// Required to make the AsQueryableValues method available on the DbContext.
using BlazarTech.QueryableValues;
// Local data that will be used to query by the composite key
// of the fictitious OrderProduct table.
var values = new[]
{
new { OrderId = 1, ProductId = 10 },
new { OrderId = 2, ProductId = 20 },
new { OrderId = 3, ProductId = 30 }
};
// Optional helper variable (needed by the second example due to CS0854)
var queryableValues = dbContext.AsQueryableValues(values);
// Example 1 - Using a Join (preferred).
var example1Results = dbContext
.OrderProduct
.Join(
queryableValues,
e => new { e.OrderId, e.ProductId },
v => new { v.OrderId, v.ProductId },
(e, v) => e
)
.ToList();
// Example 2 - Using Any (similar behavior as Contains).
var example2Results = dbContext
.OrderProduct
.Where(e => queryableValues
.Where(v =>
v.OrderId == e.OrderId &&
v.ProductId == e.ProductId
)
.Any()
)
.ToList();
Useful Links
Nuget Package
GitHub Repository
Benchmarks
QueryableValues is distributed under the MIT license.
You can use Union for each composite primary key:
var compositeKeys = new List<CK>
{
new CK { id1 = 1, id2 = 2 },
new CK { id1 = 1, id2 = 3 },
new CK { id1 = 2, id2 = 4 }
};
IQueryable<CK> query = null;
foreach(var ck in compositeKeys)
{
var temp = context.Table.Where(x => x.id1 == ck.id1 && x.id2 == ck.id2);
query = query == null ? temp : query.Union(temp);
}
var result = query.ToList();
You can create a collection of strings with both keys like this (I am assuming that your keys are int type):
var id1id2Strings = listOfIds.Select(p => p.Id1+ "-" + p.Id2);
Then you can just use "Contains" on your db:
using (dbEntities context = new dbEntities())
{
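// ToListAsync comes from System.Data.Entity (EF6) or Microsoft.EntityFrameworkCore (EF Core).
var rec = await context.Table1.Where(entity => id1id2Strings.Contains(entity.Id1 + "-" + entity.Id2)).ToListAsync();
return rec;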
}
You need a set of objects representing the keys you want to query.
class Key
{
public int Id1 { get; set; }
public int Id2 { get; set; }
}
If you have two lists and you simply check that each value appears in their respective list then you are getting the cartesian product of the lists - which is likely not what you want. Instead you need to query the specific combinations required
List<Key> keys = // get keys;
context.Table.Where(q => keys.Any(k => k.Id1 == q.Id1 && k.Id2 == q.Id2));
I'm not completely sure that this is valid use of Entity Framework; you may have issues with sending the Key type to the database. If that happens then you can be creative:
var composites = keys.Select(k => p1 * k.Id1 + p2 * k.Id2).ToList();
context.Table.Where(q => composites.Contains(p1 * q.Id1 + p2 * q.Id2));
You can create an injective (one-to-one) function, something like a hash code, which you can use to compare the pair of values. Prime or co-prime factors p1 and p2 are a natural choice, but on their own they do not make p1*Id1 + p2*Id2 one-to-one: with p1 = 2 and p2 = 3, the pairs (3, 0) and (0, 2) both give 6. The result uniquely identifies Id1 and Id2 only if the IDs are also bounded, for example 0 <= Id1 < p2 with co-prime p1 and p2.
But then you end up in a situation where you're implementing complex concepts and someone is going to have to support this. Probably better to write a stored procedure which takes the valid key objects.
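A quick check of the p1*Id1 + p2*Id2 idea shows why the choice of factors needs care. The sketch below is only an illustration with hypothetical names (CompositeEncoding, Linear, Bounded): Linear is the weighted sum described above, and Bounded is one possible one-to-one variant that assumes Id1 is non-negative and smaller than a known bound. Either way, comparing a computed value in the database is not sargable, as noted for Solution 4 earlier on this page.

```csharp
using System;

public static class CompositeEncoding
{
    // The weighted sum from the text: collisions are possible for unbounded IDs.
    public static long Linear(int id1, int id2, int p1, int p2)
    {
        return (long)p1 * id1 + (long)p2 * id2;
    }

    // Hypothetical variant: one-to-one as long as 0 <= id1 < bound.
    public static long Bounded(int id1, int id2, int bound)
    {
        if (id1 < 0 || id1 >= bound)
        {
            throw new ArgumentOutOfRangeException(nameof(id1));
        }
        return id1 + (long)bound * id2;
    }
}
```

With p1 = 2 and p2 = 3, Linear(3, 0, 2, 3) and Linear(0, 2, 2, 3) both evaluate to 6, while Bounded(3, 0, 1000) is 3 and Bounded(0, 2, 1000) is 2000.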
Ran into this issue as well and needed a solution that both did not perform a table scan and also provided exact matches.
This can be achieved by combining Solution 3 and Solution 4 from Gert Arnold's Answer
var firstIds = results.Select(r => r.FirstId);
var secondIds = results.Select(r => r.SecondId);
var compositeIds = results.Select(r => $"{r.FirstId}:{r.SecondId}");
var query = from e in dbContext.Table
//first check the indexes to avoid a table scan
where firstIds.Contains(e.FirstId) && secondIds.Contains(e.SecondId)
//then compare the compositeId for an exact match
//ToString() must be called unless using EF Core 5+
where compositeIds.Contains(e.FirstId.ToString() + ":" + e.SecondId.ToString())
select e;
var entities = await query.ToListAsync();
For EF Core I use a slightly modified version of the bucketized IN method by EricEJ to map composite keys as tuples. It performs pretty well for small sets of data.
Sample usage
List<(int Id, int Id2)> listOfIds = ...
context.Table.In(listOfIds, q => q.Id, q => q.Id2);
Implementation
public static IQueryable<TQuery> In<TKey1, TKey2, TQuery>(
this IQueryable<TQuery> queryable,
IEnumerable<(TKey1, TKey2)> values,
Expression<Func<TQuery, TKey1>> key1Selector,
Expression<Func<TQuery, TKey2>> key2Selector)
{
if (values is null)
{
throw new ArgumentNullException(nameof(values));
}
if (key1Selector is null)
{
throw new ArgumentNullException(nameof(key1Selector));
}
if (key2Selector is null)
{
throw new ArgumentNullException(nameof(key2Selector));
}
if (!values.Any())
{
return queryable.Take(0);
}
var distinctValues = Bucketize(values);
if (distinctValues.Length > 1024)
{
throw new ArgumentException("Too many parameters for SQL Server, reduce the number of parameters", nameof(values));
}
var predicates = distinctValues
.Select(v =>
{
// Create an expression that captures the variable so EF can turn this into a parameterized SQL query
Expression<Func<TKey1>> value1AsExpression = () => v.Item1;
Expression<Func<TKey2>> value2AsExpression = () => v.Item2;
var firstEqual = Expression.Equal(key1Selector.Body, value1AsExpression.Body);
var visitor = new ReplaceParameterVisitor(key2Selector.Parameters[0], key1Selector.Parameters[0]);
var secondEqual = Expression.Equal(visitor.Visit(key2Selector.Body), value2AsExpression.Body);
return Expression.AndAlso(firstEqual, secondEqual);
})
.ToList();
while (predicates.Count > 1)
{
predicates = PairWise(predicates).Select(p => Expression.OrElse(p.Item1, p.Item2)).ToList();
}
var body = predicates.Single();
var clause = Expression.Lambda<Func<TQuery, bool>>(body, key1Selector.Parameters[0]);
return queryable.Where(clause);
}
class ReplaceParameterVisitor : ExpressionVisitor
{
private ParameterExpression _oldParameter;
private ParameterExpression _newParameter;
public ReplaceParameterVisitor(ParameterExpression oldParameter, ParameterExpression newParameter)
{
_oldParameter = oldParameter;
_newParameter = newParameter;
}
protected override Expression VisitParameter(ParameterExpression node)
{
if (ReferenceEquals(node, _oldParameter))
return _newParameter;
return base.VisitParameter(node);
}
}
/// <summary>
/// Break a list of items into tuples of pairs.
/// </summary>
private static IEnumerable<(T, T)> PairWise<T>(this IEnumerable<T> source)
{
var sourceEnumerator = source.GetEnumerator();
while (sourceEnumerator.MoveNext())
{
var a = sourceEnumerator.Current;
sourceEnumerator.MoveNext();
var b = sourceEnumerator.Current;
yield return (a, b);
}
}
private static TKey[] Bucketize<TKey>(IEnumerable<TKey> values)
{
var distinctValueList = values.Distinct().ToList();
// Calculate bucket size as 1,2,4,8,16,32,64,...
var bucket = 1;
while (distinctValueList.Count > bucket)
{
bucket *= 2;
}
// Fill all slots.
var lastValue = distinctValueList.Last();
for (var index = distinctValueList.Count; index < bucket; index++)
{
distinctValueList.Add(lastValue);
}
var distinctValues = distinctValueList.ToArray();
return distinctValues;
}
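The effect of the bucketing step is easiest to see in isolation. The sketch below restates it as a standalone helper with hypothetical names (Buckets, PadToPowerOfTwo) and an added guard for empty input. The presumed benefit, which is an assumption about the intent rather than something stated above, is that inputs of 5, 6, 7 or 8 distinct keys all produce a predicate with 8 parameter slots, so the database sees fewer distinct SQL shapes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Buckets
{
    // Pads the distinct values to the next power of two by repeating the last value.
    public static T[] PadToPowerOfTwo<T>(IEnumerable<T> values)
    {
        List<T> distinct = values.Distinct().ToList();
        if (distinct.Count == 0)
        {
            return Array.Empty<T>(); // guard added for this sketch
        }

        int bucket = 1;
        while (bucket < distinct.Count)
        {
            bucket *= 2; // 1, 2, 4, 8, 16, ...
        }

        T last = distinct[distinct.Count - 1];
        while (distinct.Count < bucket)
        {
            distinct.Add(last); // duplicates do not change the result of an OR chain
        }
        return distinct.ToArray();
    }
}
```

For example, five distinct keys are padded to eight entries, with the fifth key repeated in the last three slots.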
In the absence of a general solution, I think there are two things to consider:
Avoid multi-column primary keys (this will make unit testing easier too).
But if you have to use them, chances are that one of the key columns will reduce the query result size to O(n), where n is the size of the ideal query result. From there, it's Solution 5 from Gert Arnold above.
For example, the problem leading me to this question was querying order lines, where the key is order id + order line number + order type, and the source had the order type being implicit. That is, the order type was a constant, order ID would reduce the query set to order lines of relevant orders, and there would usually be 5 or fewer of these per order.
To rephrase: if you have a composite key, chances are that one of its columns has very few duplicates. Apply Solution 5 from above with that.
I tried this solution and it worked for me; the output query was perfect, without any parameters.
using LinqKit; // nuget
var customField_Ids = customFields?.Select(t => new CustomFieldKey { Id = t.Id, TicketId = t.TicketId }).ToList();
var uniqueIds1 = customField_Ids.Select(cf => cf.Id).Distinct().ToList();
var uniqueIds2 = customField_Ids.Select(cf => cf.TicketId).Distinct().ToList();
var predicate = PredicateBuilder.New<CustomFieldKey>(false); //LinqKit
var lambdas = new List<Expression<Func<CustomFieldKey, bool>>>();
foreach (var cfKey in customField_Ids)
{
var id = uniqueIds1.Where(uid => uid == cfKey.Id).Take(1).ToList();
var ticketId = uniqueIds2.Where(uid => uid == cfKey.TicketId).Take(1).ToList();
lambdas.Add(t => id.Contains(t.Id) && ticketId.Contains(t.TicketId));
}
predicate = AggregateExtensions.AggregateBalanced(lambdas.ToArray(), (expr1, expr2) =>
{
var invokedExpr = Expression.Invoke(expr2, expr1.Parameters.Cast<Expression>());
return Expression.Lambda<Func<CustomFieldKey, bool>>
(Expression.OrElse(expr1.Body, invokedExpr), expr1.Parameters);
});
var modifiedCustomField_Ids = repository.GetTable<CustomFieldLocal>()
.Select(cf => new CustomFieldKey() { Id = cf.Id, TicketId = cf.TicketId }).Where(predicate).ToArray();
I ended up writing a helper for this problem that relies on System.Linq.Dynamic.Core.
It's a lot of code and I don't have time to refactor it at the moment, but input and suggestions are appreciated.
public static IQueryable<TEntity> WhereIsOneOf<TEntity, TSource>(this IQueryable<TEntity> dbSet,
IEnumerable<TSource> source,
Expression<Func<TEntity, TSource,bool>> predicate) where TEntity : class
{
var (where, pDict) = GetEntityPredicate(predicate, source);
return dbSet.Where(where, pDict);
(string WhereStr, IDictionary<string, object> paramDict) GetEntityPredicate(Expression<Func<TEntity, TSource, bool>> func, IEnumerable<TSource> source)
{
var firstP = func.Parameters[0];
var binaryExpressions = RecurseBinaryExpressions((BinaryExpression)func.Body);
var i = 0;
var paramDict = new Dictionary<string, object>();
var res = new List<string>();
foreach (var sourceItem in source)
{
var innerRes = new List<string>();
foreach (var bExp in binaryExpressions)
{
var emp = ToEMemberPredicate(firstP, bExp);
var val = emp.GetKeyValue(sourceItem);
var pName = $"@{i++}";
paramDict.Add(pName, val);
var str = $"{emp.EntityMemberName} {emp.SQLOperator} {pName}";
innerRes.Add(str);
}
res.Add( "(" + string.Join(" and ", innerRes) + ")");
}
var sRes = string.Join(" || ", res);
return (sRes, paramDict);
}
EMemberPredicate ToEMemberPredicate(ParameterExpression firstP, BinaryExpression bExp)
{
var lMember = (MemberExpression)bExp.Left;
var rMember = (MemberExpression)bExp.Right;
var entityMember = lMember.Expression == firstP ? lMember : rMember;
var keyMember = entityMember == lMember ? rMember : lMember;
return new EMemberPredicate(entityMember, keyMember, bExp.NodeType);
}
List<BinaryExpression> RecurseBinaryExpressions(BinaryExpression e, List<BinaryExpression> runningList = null)
{
if (runningList == null) runningList = new List<BinaryExpression>();
if (e.Left is BinaryExpression lbe)
{
var additions = RecurseBinaryExpressions(lbe);
runningList.AddRange(additions);
}
if (e.Right is BinaryExpression rbe)
{
var additions = RecurseBinaryExpressions(rbe);
runningList.AddRange(additions);
}
if (e.Left is MemberExpression && e.Right is MemberExpression)
{
runningList.Add(e);
}
return runningList;
}
}
Helper class:
public class EMemberPredicate
{
public readonly MemberExpression EntityMember;
public readonly MemberExpression KeyMember;
public readonly PropertyInfo KeyMemberPropInfo;
public readonly string EntityMemberName;
public readonly string SQLOperator;
public EMemberPredicate(MemberExpression entityMember, MemberExpression keyMember, ExpressionType eType)
{
EntityMember = entityMember;
KeyMember = keyMember;
KeyMemberPropInfo = (PropertyInfo)keyMember.Member;
EntityMemberName = entityMember.Member.Name;
SQLOperator = BinaryExpressionToMSSQLOperator(eType);
}
public object GetKeyValue(object o)
{
return KeyMemberPropInfo.GetValue(o, null);
}
private string BinaryExpressionToMSSQLOperator(ExpressionType eType)
{
switch (eType)
{
case ExpressionType.Equal:
return "==";
case ExpressionType.GreaterThan:
return ">";
case ExpressionType.GreaterThanOrEqual:
return ">=";
case ExpressionType.LessThan:
return "<";
case ExpressionType.LessThanOrEqual:
return "<=";
case ExpressionType.NotEqual:
return "<>";
default:
throw new ArgumentException($"{eType} is not a handled Expression Type.");
}
}
}
Use it like so:
// This can be a Tuple or whatever.. If Tuple, then y below would be .Item1, etc.
// This data structure is up to you but is what I use.
[FromBody] List<CustomerAddressPk> cKeys
var res = await dbCtx.CustomerAddress
.WhereIsOneOf(cKeys, (x, y) => y.CustomerId == x.CustomerId
&& x.AddressId == y.AddressId)
.ToListAsync();
Hope this helps others.
In the case of a composite key, you can use a second ID list and add a condition for it in your code:
context.Table.Where(q => listOfIds.Contains(q.Id) && listOfIds2.Contains(q.Id2));
Or you can use another trick: build a list of combined keys by concatenating the key parts:
listofid.add(id+id1+......)
context.Table.Where(q => listOfIds.Contains(q.Id+q.id1+.......));
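One thing to check with the two-list version is that independent Contains checks accept every combination of the two lists, not only the pairs that were requested. The sketch below illustrates this with in-memory data; the table contents and the names CompositeKeyDemo, IndependentContains and PairwiseMatch are hypothetical, and the tuple lookup is LINQ to Objects only (it is not claimed to translate to SQL).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CompositeKeyDemo
{
    // Hypothetical in-memory "table" with a two-part key.
    public static readonly List<(int Id, int Id2)> Table = new List<(int Id, int Id2)>
    {
        (1, 10), (1, 20), (2, 10), (2, 20),
    };

    // Two independent Contains checks: any Id from the first list
    // combined with any Id2 from the second list is accepted.
    public static List<(int Id, int Id2)> IndependentContains(List<int> ids, List<int> ids2) =>
        Table.Where(q => ids.Contains(q.Id) && ids2.Contains(q.Id2)).ToList();

    // Pairwise check: only rows whose (Id, Id2) pair was actually requested.
    public static List<(int Id, int Id2)> PairwiseMatch(HashSet<(int, int)> keys) =>
        Table.Where(q => keys.Contains((q.Id, q.Id2))).ToList();
}
```

Asking for the keys (1, 10) and (2, 20) returns all four rows from IndependentContains, but only the two requested rows from PairwiseMatch.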
I tried this on EF Core 5.0.3 with the Postgres provider.
context.Table
.Select(entity => new
{
Entity = entity,
CompositeKey = entity.Id1 + entity.Id2,
})
.Where(x => compositeKeys.Contains(x.CompositeKey))
.Select(x => x.Entity);
This produced SQL like:
SELECT *
FROM table AS t
WHERE t.Id1 + t.Id2 IN (@__compositeKeys_0)
Caveats:
This should only be used where the combination of Id1 and Id2 always produces a unique result (e.g., they're both UUIDs).
This cannot use indexes, though you could store the composite key in the database as its own indexed column.
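To make the uniqueness caveat concrete, here is a small sketch of how plain concatenation can make two different key pairs collide, and how a separator avoids it. The names ConcatKeyDemo, Plain and Separated are hypothetical, and the separator approach assumes the chosen character never occurs in either key part. Fixed-length keys such as UUIDs do not have this problem.

```csharp
using System;

public static class ConcatKeyDemo
{
    // Plain concatenation of the two key parts.
    public static string Plain(string id1, string id2) => id1 + id2;

    // Concatenation with a separator that is assumed never to occur in either part.
    public static string Separated(string id1, string id2) => id1 + "|" + id2;
}
```

With variable-length parts, Plain("1", "23") and Plain("12", "3") both yield "123", while Separated yields "1|23" and "12|3".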

Matching objects by property name and value using Linq

I need to be able to match an object to a record by matching property names and values using a single Linq query. I don't see why this shouldn't be possible, but I haven't been able to figure out how to make this work. Right now I can do it using a loop but this is slow.
Here's the scenario:
I have tables set up that store records of any given entity by putting their primary keys into an associated table with the key's property name and value.
If I have a random object at run-time, I need to be able to check if a copy of that object exists in the database by checking if the object has property names that match all of the keys of a record in the database (this would mean that they are the same type of object) and then checking if the values for each of the keys match, giving me the same record.
Here's how I got it to work using a loop (simplified a bit):
public IQueryable<ResultDataType> MatchingRecordFor(object entity)
{
var result = Enumerable.Empty<ResultDataType>();
var records = _context.DataBaseRecords;
var entityType = entity.GetType();
var properties = entityType.GetProperties().Where(p => p.PropertyType.Namespace == "System");
foreach (var property in properties)
{
var name = property.Name;
var value = property.GetValue(entity);
if (value != null)
{
var matchingRecords = records.Where(c => c.DataBaseRecordKeys.Any(k => k.DataBaseRecordKeyName == name && k.DataBaseRecordValue == value.ToString()));
if (matchingRecords.Count() > 0)
{
records = matchingRecords;
}
}
}
result = (from c in records
from p in c.DataBaseRecordProperties
select new ResultDataType()
{
ResultDataTypeId = c.ResultDataTypeID,
SubmitDate = c.SubmitDate,
SubmitUserId = c.SubmitUserId,
PropertyName = p.PropertyName
});
return result.AsQueryable();
}
The last statement joins a property table related to the database record with information on all of the properties.
This works well enough for a single record, but I'd like to get rid of that loop so that I can speed things up enough to work on many records.
using System.Reflection;
public IQueryable<ResultDataType> MatchingRecordFor(object entity)
{
var records = _context.DataBaseRecords;
var entityType = entity.GetType();
var properties = entityType.GetProperties().Where(p => p.PropertyType.Namespace == "System");
Func<KeyType, PropertyInfo, bool> keyMatchesProperty =
(k, p) => p.Name == k.DataBaseRecordKeyName && p.GetValue(entity).ToString() == k.DataBaseRecordValue;
var result =
from r in records
where r.DataBaseRecordKeys.All(k => properties.Any(pr => keyMatchesProperty(k, pr)))
from p in r.DataBaseRecordProperties
select new ResultDataType()
{
ResultDataTypeId = r.ResultDataTypeId,
SubmitDate = r.SubmitDate,
SubmitUserId = r.SubmitUserId,
PropertyName = p.PropertyName
};
return result.AsQueryable();
}
Hopefully I got that query language right. You'll have to benchmark it to see if it's more efficient than your original approach.
edit: This is wrong, see comments

Entity Query in Query, is it possible

I'm not really sure how to ask this question. I need to create an object, I believe it is called a projection, that has the result of one query, plus from that need to query another table and get that object into the projection.
This is a C# WCF Service for a Website we are building with HTML5, JS, and PhoneGap.
EDIT: Getting an error on the ToList (see code) - "The method or operation is not implemented."
EDIT3: Changing the Entity Object company_deployed_files to IQueryable and removing the FirstOrDefault caused a different exception: Message = "The 'Distinct' operation cannot be applied to the collection ResultType of the specified argument.\r\nParameter name: argument"
Background: This is a rather messy Entity Model, as it was developed for PostgreSQL, and I don't have access to any tools to update the model except by hand. In addition, some design issues in the database would not allow for a great model even if we did. In other words, my two tables don't have key constraints (in the entity model) that would let me perform a join in the entity model. If someone can show me how to add them, that honestly might be the best solution, but I would need some help with that.
But getting the below code to work would be a great solution.
public List<FileIDResult> GetAllFileIDFromDeviceAndGroup ( int deviceID, int groupID)
{
List<FileIDResult> returnList = null;
using (var db = new PgContext())
{
IQueryable<FileIDResult> query = null;
if (deviceID > 0)
{
var queryForID =
from b in db.device_files
where b.device_id == deviceID
select new FileIDResult
{
file_id = b.file_id,
file_description = b.file_description,
company_deployed_files = (from o in db.company_deployed_files
where o.file_id == b.file_id
select o).FirstOrDefault(),
IsDeviceFile = true
};
if (query == null)
{
query = queryForID;
}
else
{
// query should always be null here
}
}
if (groupID > 0)
{
var queryForfileID =
from b in db.group_files
where b.group_id == groupID
select new FileIDResult
{
file_id = b.file_id,
file_description = b.file_description,
company_deployed_files = (from o in db.company_deployed_files
where o.file_id == b.file_id
select o).FirstOrDefault(),
IsDeviceFile = false
};
if (query != null)
{
// query is not null here: the device query was built above
query = query.Union(queryForfileID);
}
else
{
// query is null here: no device query was built
query = queryForfileID;
}
}
//This query.ToList(); is failing - "The method or operation is not implemented."
returnList = query.ToList ();
}
return returnList;
}
Edit 2
The ToList is throwing an exception.
I'm 98% sure it is the lines: company_deployed_files = (from o in db.company_deployed_files where o.file_id == b.file_id select o).FirstOrDefault()
End Edit 2

LINQ to SQL - Select where starts with any of list

Working on a Linq-to-SQL project and observing some odd behavior with the generated SQL. Basically I have an array of strings, and I need to select all rows where a column starts with one of those strings.
using (SqlConnection sqlConn = new SqlConnection(connString))
{
using (IdsSqlDataContext context = new IdsSqlDataContext(sqlConn))
{
//generated results should start with one of these.
//in real code base they are obviously not hardcoded and list is variable length
string[] args = new string[] { "abc", "def", "hig" };
IQueryable<string> queryable = null;
//loop through the array, the first time through create an iqueryable<>, and subsequent passes union results onto original
foreach (string arg in args)
{
if (queryable == null)
{
queryable = context.IdsForms.Where(f => f.MatterNumber.StartsWith(arg)).Select(f => f.MatterNumber);
}
else
{
queryable = queryable.Union(context.IdsForms.Where(f => f.MatterNumber.StartsWith(arg)).Select(f => f.MatterNumber));
}
}
//actually execute the query.
var result = queryable.ToArray();
}
}
I would expect the sql generated to be functionally equivalent to the following.
select MatterNumber
from IdsForm
where MatterNumber like 'abc%' or MatterNumber like 'def%' or MatterNumber like 'hig%'
But the actual SQL generated is below. Notice that 'hig%' is the argument for all three LIKE clauses.
exec sp_executesql N'SELECT [t4].[MatterNumber]
FROM (
SELECT [t2].[MatterNumber]
FROM (
SELECT [t0].[MatterNumber]
FROM [dbo].[IdsForm] AS [t0]
WHERE [t0].[MatterNumber] LIKE @p0
UNION
SELECT [t1].[MatterNumber]
FROM [dbo].[IdsForm] AS [t1]
WHERE [t1].[MatterNumber] LIKE @p1
) AS [t2]
UNION
SELECT [t3].[MatterNumber]
FROM [dbo].[IdsForm] AS [t3]
WHERE [t3].[MatterNumber] LIKE @p2
) AS [t4]',N'@p0 varchar(4),@p1 varchar(4),@p2 varchar(4)',@p0='hig%',@p1='hig%',@p2='hig%'
Looks like you're closing over the loop variable. This is a common gotcha in C#. What happens is that the value of arg is evaluated when the query is run, not when it is created.
Create a temp variable to hold the value:
foreach (string arg in args)
{
var temp = arg;
if (queryable == null)
{
queryable = context.IdsForms.Where(f => f.MatterNumber.StartsWith(temp)).Select(f => f.MatterNumber);
}
else
{
queryable = queryable.Union(context.IdsForms.Where(f => f.MatterNumber.StartsWith(temp)).Select(f => f.MatterNumber));
}
}
You can read this Eric Lippert post about closing over a loop variable. As Eric notes at the top of the article, and as @Magus points out in a comment, this has changed in C# 5 so that the foreach variable is a new copy on each iteration. Creating a temp variable, like above, is forward compatible though.
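The capture behaviour can be reproduced without a database. The sketch below uses a for loop, whose variable is still shared across iterations in every C# version, so it shows the same effect the foreach loop had before C# 5. The names ClosureDemo, SharedVariable and CopiedVariable are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClosureDemo
{
    // Every lambda captures the same variable i, so all of them
    // see its final value when they are invoked after the loop.
    public static List<int> SharedVariable()
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            funcs.Add(() => i);
        }
        return funcs.Select(f => f()).ToList(); // 3, 3, 3
    }

    // Copying into a variable declared inside the loop body gives
    // each lambda its own captured variable.
    public static List<int> CopiedVariable()
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            int copy = i;
            funcs.Add(() => copy);
        }
        return funcs.Select(f => f()).ToList(); // 0, 1, 2
    }
}
```

A deferred LINQ query behaves like the stored lambdas here: the captured variable is read when the query executes, not when Where is called.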
The UNION in the generated SQL is expected, since you use Union in your LINQ to SQL query. All three parameters are hig% because the lambda f => f.MatterNumber.StartsWith(arg) closes over the loop variable, which holds its last value by the time the query runs. To fix this, declare a local variable inside the loop:
foreach (string arg in args)
{
var _arg = arg;
if (queryable == null)
{
queryable = context.IdsForms.Where(f => f.MatterNumber.StartsWith(_arg)).Select(f => f.MatterNumber);
}
else
{
queryable = queryable.Union(context.IdsForms.Where(f => f.MatterNumber.StartsWith(_arg)).Select(f => f.MatterNumber));
}
}
But I agree the union seems unnecessary. If the array of strings to check against is not going to change, then you can just use a standard where clause. Otherwise, take a look at PredicateBuilder.
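For a variable-length list, the OR-ed predicate can also be assembled by hand with expression trees, which is roughly what a predicate builder does for you. This is a minimal sketch, not PredicateBuilder's own API: PrefixFilter and StartsWithAny are hypothetical names, and whether a given provider translates each StartsWith call into a LIKE clause is an assumption to verify against the generated SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class PrefixFilter
{
    // Builds s => s.StartsWith(p0) || s.StartsWith(p1) || ...
    // Each prefix is embedded as a constant, so no loop variable is captured.
    public static Expression<Func<string, bool>> StartsWithAny(IEnumerable<string> prefixes)
    {
        ParameterExpression param = Expression.Parameter(typeof(string), "s");
        var startsWith = typeof(string).GetMethod(nameof(string.StartsWith), new[] { typeof(string) });

        Expression body = null;
        foreach (string prefix in prefixes)
        {
            Expression call = Expression.Call(param, startsWith, Expression.Constant(prefix));
            body = body == null ? call : Expression.OrElse(body, call);
        }

        // An empty prefix list matches nothing.
        return Expression.Lambda<Func<string, bool>>(body ?? Expression.Constant(false), param);
    }
}
```

In the question's code this would be applied after the projection, for example context.IdsForms.Select(f => f.MatterNumber).Where(PrefixFilter.StartsWithAny(args)), which yields a single WHERE with OR-ed conditions instead of a chain of unions.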
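How about this?
// Note: a lambda with a statement body cannot be converted to an expression tree,
// so this only compiles against IEnumerable<T> (e.g. after AsEnumerable()), not IQueryable<T>.
queryable = context.IdsForms.Where(f =>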
{
foreach (var arg in args)
{
if (f.MatterNumber.StartsWith(arg))
return true;
}
return false;
}).Select(f => f.MatterNumber);
