Retrieving a tree structure from a database using LINQ

I have an organization chart tree structure stored in a database.
It is something like
ID (int);
Name (String);
ParentID (int)
In C# it is represented by a class like
class Employee
{
    int ID;
    string Name;
    IList<Employee> Subs;
}
I am wondering what the best way is to retrieve these values from the database and fill the C# objects using LINQ (I am using Entity Framework).
There must be something better than making a call to get the top level then making repeated calls to get subs and so on.
How best to do it?

You can build a stored proc that has built in recursion. Take a look at http://msdn.microsoft.com/en-us/library/ms190766.aspx for more info on Common Table Expressions in SQL Server
You might want to find a different (better?) way to model your data. http://www.sqlteam.com/article/more-trees-hierarchies-in-sql lists a popular way of modeling hierarchical data in a database. Changing the modeling can allow you to create queries that can be expressed without recursion.

If you're using SQL Server 2008, you could make use of the new HIERARCHYID feature.
Organizations have struggled in past with the representation of tree like structures in the databases, lot of joins lots of complex logic goes into the place, whether it is organization hierarchy or defining a BOM (Bill of Materials) where one finished product is dependent on another semi finished materials / kit items and these kit items are dependent on another semi finished items or raw materials.
SQL Server 2008 has the solution to the problem where we store the entire hierarchy in the data type HierarchyID. HierarchyID is a variable length system data type. HierarchyID is used to locate the position in the hierarchy of the element like Scott is the CEO and Mark as well as Ravi reports to Scott and Ben and Laura report to Mark, Vijay, James and Frank report to Ravi.
So use the new functions available, and simply return the data you need without using LINQ. The drawback is that you'll need to use UDFs or stored procedures for anything beyond a simple root query:
DECLARE @Manager hierarchyid, @FirstChild hierarchyid
SELECT @Manager = CAST('/1/' AS hierarchyid)
SELECT @FirstChild = @Manager.GetDescendant(NULL, NULL)

I'd add a field to the entity to hold the parent ID, then I'd pull the whole table into memory, leaving the Subs list empty. I'd then iterate through the objects and populate the lists using LINQ to Objects. That is only one DB query, so performance should be reasonable.
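A minimal sketch of that in-memory approach, assuming the rows have already been fetched with one query. EmployeeRow, OrgChart and BuildTree are hypothetical names introduced for illustration, and ParentID is assumed to be nullable for the top-level employee:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Flat row as it might come back from the table (assumption: ParentID is null for the root).
public class EmployeeRow
{
    public int ID { get; set; }
    public string Name { get; set; }
    public int? ParentID { get; set; }
}

// Tree node shaped like the Employee class in the question.
public class Employee
{
    public int ID { get; set; }
    public string Name { get; set; }
    public IList<Employee> Subs { get; set; } = new List<Employee>();
}

public static class OrgChart
{
    // Builds the tree from rows fetched in a single query and returns the root nodes.
    public static List<Employee> BuildTree(IEnumerable<EmployeeRow> rows)
    {
        var rowList = rows.ToList();

        // One node per row, indexed by ID so each parent lookup is a dictionary hit.
        var nodes = rowList.ToDictionary(
            r => r.ID,
            r => new Employee { ID = r.ID, Name = r.Name });

        var roots = new List<Employee>();
        foreach (var row in rowList)
        {
            var node = nodes[row.ID];
            if (row.ParentID.HasValue && nodes.TryGetValue(row.ParentID.Value, out var parent))
                parent.Subs.Add(node);   // attach to its manager
            else
                roots.Add(node);         // no parent, or parent not in the result set
        }
        return roots;
    }
}
```

Each row is visited once, so the cost is linear in the number of employees, and rows may arrive in any order because all nodes are created before any are linked.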

An Entity Framework query should allow you to include related entity sets, though in a unary relationship, not sure how it would work...
Check this out for more information on that: http://msdn.microsoft.com/en-us/library/bb896272.aspx

Well... even with LINQ you will need two queries, because any single query will duplicate the main employee and thus will result in multiple employees (that are really the same) being created... However, you can hide this a bit with linq when you create the object, that's when you would execute the second query, something like this:
var v = from u in TblUsers
select new {
SupervisorName = u.DisplayName,
Subs = (from sub in TblUsers where sub.SupervisorID.Value==u.UserID select sub.DisplayName).ToList()
};

Related

EF 5 large data load strategy

Good day,
I have the following tables
PARENT 1=>N CHILDREN 1=>N GRANDCHILDREN.
Both tables have over 30 columns.
I need to select over 50,000 records from PARENT, plus I will need certain fields from CHILDREN and GRANDCHILDREN. The data is needed for in-memory manipulation (complex algorithms on what's been selected).
I am using Entity Framework 5.
I tried various combinations of eager loading (Include, projection etc.), but I am still not able to make it perform better than it performs with LINQ-to-SQL in the following scenario:
"
SELECT from PROJECTS
on binding of each row:
SELECT from CHILDREN
SELECT from GRANDCHILDREN
"
It generates at least 50,001 calls to the DB, but it still performs better than any of my EF approaches, which take over five times longer than the current LINQ-to-SQL design.
The best solution would be to have a WHERE IN query on children, but it's not available in EF 5 in the native implementation (Contains doesn't cut it: it is too slow...).
Any ideas will be greatly appreciated.
Thanks,

I assume you are implementing paging in your grid view and are not putting thousands of rows into a grid view at once. If so, you can select only 10 rows, or however many you are displaying in the grid view, at a time. This will be a lot easier to work with.
I found this example on MSDN that implements paging server side to reduce the number of rows returned in a single query.
You can also consider writing or having a dba write an efficient stored procedure that you can link to your entity framework to control the SQL Code.

I had a similar issue some days ago. EF is very slow. After some experiments I got more or less normal performance with direct queries:
Create ViewModel with needed fields:
public class MyViewModel
{
public string one {get; set;}
public string two {get; set;}
}
Then in controller action:
MyViewModel result = db.Database.SqlQuery<MyViewModel>
("SELECT a.one, b.two" +
" FROM Table1 a, Table2 b" +
" WHERE a.id = @p0", // SQL compares with a single '='
something // placeholder for the id value, passed as a parameter
).FirstOrDefault();

Paging wouldn't work because I need the data sorted on a calculated field. The field can only be calculated in web-server memory because the calculation needs client info (yes, yes, there is a way of passing this info to the DB server, but this wasn't an option).
Solution:
using (var onecontext = new myCTx())
{
    // 1. SELECT all from PROJECTS.
    // 2. Call Context.EntityName.SqlQuery() on all grandchildren, using the good old WHERE IN construct
    //    (I put it all into my entities' partial classes as extensions).
}
this way I get all my data in N db trips, where N is the number of generations, which is fine. The EF context then connects everything together. And then I perform all my r
EF 6 should have WHERE IN built in, so I guess this approach will become more obvious then. Mind you: using Contains() is not an option for large data for it produces multiple OR's instead of the straight IN. Yes, ADO.NET then translates OR's into IN, but before that there is some really heavy lifting being done, which is killing your app server.
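As a rough illustration of the WHERE IN idea above, the helper below builds a parameterized IN statement and splits the key list into batches. InClause, Batch and BuildSql are hypothetical names, and the table and column names in the usage note are assumptions. Batching is included because SQL Server limits a single request to 2100 parameters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InClause
{
    // Splits ids into batches of at most batchSize so each statement stays under the parameter limit.
    public static IEnumerable<List<int>> Batch(IEnumerable<int> ids, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        var batch = new List<int>(batchSize);
        foreach (var id in ids)
        {
            batch.Add(id);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<int>(batchSize);
            }
        }
        if (batch.Count > 0) yield return batch; // trailing partial batch
    }

    // Builds "SELECT * FROM <table> WHERE <column> IN (@p0, @p1, ...)".
    // table and column must come from trusted code, never from user input.
    public static string BuildSql(string table, string column, int count)
    {
        var placeholders = string.Join(", ", Enumerable.Range(0, count).Select(i => "@p" + i));
        return "SELECT * FROM " + table + " WHERE " + column + " IN (" + placeholders + ")";
    }
}
```

With EF, each batch could then be passed to something like context.Children.SqlQuery(InClause.BuildSql("CHILDREN", "ParentId", batch.Count), batch.Cast<object>().ToArray()), where Children and ParentId stand in for the real entity set and key column.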

NHibernate stored procedure returns data from previous query

I use two stored procedures that return the data with the same structure (list of records of the same type).
I call my method Execute(ISession session) twice. First time for the first stored procedure (it returns correct list of 6 rows). Second time - for the second stored procedure (it returns list of 11 rows, but first 6 rows are from the first request that overwrite the correct rows).
I found
Impact on NHibernate caching for searches with results including calculated value mapped as a formula (e.g. rank)
But I can't use it for IQuery
Any ideas or links how it can be fixed ?
public dynamic Execute(ISession session)
{
var query = session.GetNamedQuery(QueryName)
.SetCacheable(false)
.SetCacheMode(CacheMode.Ignore)
.SetReadOnly(true);
var results = query.List<T>();
return results;
}

I'm going to take a stab at answering this, because I think I have a hunch of what's going on, and I want to set you on the right track. I've made a lot of assumptions here, so please don't be too harsh on me if I was completely wrong with my guesses.
It feels like you're trying to use NHibernate as a tool to simply translate rows into objects. Instead, NHibernate is a tool that translates between your object-oriented domain model and your relational database model. It does a lot more than just turn rows into objects. In particular, the NHibernate feature that you're tripping over here is how NHibernate ensures that, within a single NHibernate session, a single row in the database which represents a single entity will correspond to a single instance of an object. It uses its first-level cache to accomplish this.
Let's say you have two queries, QueryA and QueryB. These queries have been constructed so that they each pull from separate tables, TableA and TableB, so really they represent separate entities. However, the queries have also somehow been built so that the results look to NHibernate like the same entity. If QueryA and QueryB happen to return some of the same ids, then NHibernate will combine them into the same instance, so you would see some of the results from QueryA repeated when you run QueryB.
So how do we fix it?
The quick and dirty fix would be to use different sessions for each of those two queries, or throw a session.Clear() in-between them. The more appropriate fix would be to change these named queries so that they actually do return two different entities.
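The effect can be imitated with a toy identity map. This is only a simplified model of the behaviour described above, not NHibernate's actual implementation; Record and ToySession are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public class Record
{
    public int Id { get; set; }
    public string Value { get; set; }
}

// Toy stand-in for a session's first-level cache: at most one instance per id.
public class ToySession
{
    private readonly Dictionary<int, Record> _cache = new Dictionary<int, Record>();

    // Turns rows into objects the way an identity map does: if an id is already
    // cached, the cached instance is returned and the new row's data is ignored.
    public List<Record> Load(IEnumerable<(int Id, string Value)> rows)
    {
        var result = new List<Record>();
        foreach (var row in rows)
        {
            if (!_cache.TryGetValue(row.Id, out var entity))
            {
                entity = new Record { Id = row.Id, Value = row.Value };
                _cache[row.Id] = entity;
            }
            result.Add(entity);
        }
        return result;
    }

    // Equivalent in spirit to clearing the session between the two queries.
    public void Clear()
    {
        _cache.Clear();
    }
}
```

Loading ids 1 and 2 from "QueryA" and then ids 1 and 3 from "QueryB" in the same ToySession returns QueryA's object for id 1 the second time, which mirrors the symptom in the question. Calling Clear() in between, or giving the two result sets non-overlapping identities, avoids it.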

I had the same problem. At first I resolved it with session.Clear(), but that solution led to another bug. Daniel's answer helped me work out that the issue was in the stored procedure: it did not return a unique identifier, and this produced the error when I mapped the ID with NHibernate.

CRM 2011: Limitation of query expression?

I believe the answer to this question may be to use Linq to Sql, but wanted to see if this is something which is possible using QueryExpressions:-
I create a query expression which queries against Entity A, it also links to Entity B (via LinkEntity) and imposes additional criteria. It is possible to retrieve columns from Entity B by adding the appropriate attribute names. However, it will only retrieve the linked entity (inner join).
Is it possible using QueryExpression to retrieve all related records (and required columns) from Entity B related to Entity A (e.g. all cases associated with a contact where the contact passes specified criteria)? Normally I would consider inverting the query and searching for Entity B relating to Entity A with the appropriate LinkEntity conditions, but there are a number of linked entities which I would like to retrieve for the same contact query.
So I'm left with some options:-
(1) Perform a second query (not ideal when iterating over a large number of results from the initial query),
(2) Perform a query using Linq to CRM on the filtered views,
(3) A different method entirely?
Any thoughts would be appreciated.
EDIT:
I ended up using Linq-to-Sql to complete this task and the code used is similar to that below (albeit with a few more joins for the actual query!):-
var dataCollection = (from eA in xrmServiceContext.EntityASet
    join eB in xrmServiceContext.EntityBSet on new EntityReference(EntityA.EntityLogicalName, eA.Id) equals (EntityReference)eB.EntityBLookupToEntityA
    select new
    {
        Id = eA.Id,
        EntityBInterestingAttribute = eB.InterestingAttributeName
    });
So this will bring back a row per Entity A, per Entity B. To make things easier I then defined a custom class "MyEntityAClass" which had properties which were Lists so I could return one object for filling of GridView etc. This is more to do with the processing of these results though so I haven't posted that code here.
I hope that makes sense. Essentially, it is getting the multiple rows per record a la SQL which makes this method work.

QueryExpression can only return fields from one type of entity, the one specified in QueryExpression.EntityName.
You can use FetchXML, which also allows you to get the fields of any link entities; that would be an option 3 for you. Unfortunately it returns the data as XML, which you would then have to parse yourself.
It might be quicker to run the FetchXML, but it will take longer to write and test, and it's not the easiest thing to maintain either.
Sample Code, this gets the first 101 of all Cases that are active for all accounts that are active
string fetch = "<fetch count='101' mapping='logical'><entity name='account'><filter type='and'><condition attribute='statecode' operator='eq' value='1'/></filter><link-entity name='incident' from='customerid' to='accountid'><all-attributes/><filter type='and'><condition attribute='statecode' operator='eq' value='1'/></filter></link-entity></entity></fetch>";
string data = yourCrmServiceObject.Fetch(fetch);

Accessing foreign keys through LINQ

I have a setup on SQL Server 2008. I've got three tables. One has a string identifier as a primary key. The second table holds indices into an attribute table. The third simply holds foreign keys into both tables- so that the attributes themselves aren't held in the first table but are instead referred to. Apparently this is common in database normalization, although it is still insane because I know that, since the key is a string, it would take a maximum of 1 attribute per 30 first table room entries to yield a space benefit, let alone the time and complexity problems.
How can I write a LINQ to SQL query to only return values from the first table, such that they hold only specific attributes, as defined in the list in the second table? I attempted to use a Join or GroupJoin, but apparently SQL Server 2008 cannot use a Tuple as the return value.

"I attempted to use a Join or GroupJoin, but apparently SQL Server 2008 cannot use a Tuple as the return value".
You can use anonymous types instead of Tuples which are supported by Linq2SQL.
IE:
from x in source group x by new {x.Field1, x.Field2}
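A small self-contained illustration of grouping on an anonymous-type key, run here with LINQ to Objects rather than against a database. AttributeLink, GroupingDemo and CountPairs are hypothetical names standing in for the mapping table described in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row of the mapping table: a string key plus an attribute id.
public class AttributeLink
{
    public string ItemKey { get; set; }
    public int AttributeId { get; set; }
}

public static class GroupingDemo
{
    // Counts rows per (ItemKey, AttributeId) pair, using an anonymous type as the composite key.
    public static Dictionary<string, int> CountPairs(IEnumerable<AttributeLink> links)
    {
        var groups = from x in links
                     group x by new { x.ItemKey, x.AttributeId } into g
                     select new { g.Key.ItemKey, g.Key.AttributeId, Count = g.Count() };

        // Flatten the composite key into "ItemKey/AttributeId" for easy lookup.
        return groups.ToDictionary(g => g.ItemKey + "/" + g.AttributeId, g => g.Count);
    }
}
```

Anonymous types compare by the values of their properties, which is why two rows with the same ItemKey and AttributeId land in the same group.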

I'm not quite clear what you're asking for. Some code might help. Are you looking for something like this?
var q = from i in ctx.Items
select new
{
i.ItemId,
i.ItemTitle,
Attributes = from map in i.AttributeMaps
select map.Attribute
};

I use this page all the time for figuring out complex linq queries when I know the sql approach I want to use.
VB http://msdn.microsoft.com/en-us/vbasic/bb688085
C# http://msdn.microsoft.com/en-us/vcsharp/aa336746.aspx
If you know how to write the sql query to get the data you want then this will show you how to get the same result translating it into linq syntax.

How to get linq to produce exactly the sql I want?

It is second nature for me to whip up some elaborate SQL set processing code to solve various domain model questions. However, the trend is not to touch SQL anymore. Is there some pattern reference or conversion tool out there that helps convert the various SQL patterns to Linq syntax?
I would look-up ways to code things like the following code: (this has a sub query):
SELECT * FROM orders X WHERE
(SELECT COUNT(*) FROM orders Y
WHERE Y.totalOrder > X.totalOrder) < 6
(Grab the top five highest total orders with side effects)
Alternatively, how do you know Linq executes as a single statement without using a debugger? I know you need to follow the enumeration, but I would assume just lookup the patterns somewhere.
This is from the MSDN site which is their example of doing a SQL difference. I am probably wrong, but I wouldn't think this uses set processing on the server (I think it pulls both sets locally then takes the difference, which would be very inefficient). I am probably wrong, and this could be one of the patterns on that reference.
SQL difference example:
var differenceQuery =
(from cust in db.Customers
select cust.Country)
.Except
(from emp in db.Employees
select emp.Country);
Thanks
-- Update:
-- Microsoft's 101 Linq Samples in C# is a closer means of constructing linq in a pattern to produce the SQL you want. I will post more as I find them. I am really looking for a methodology (patterns or a conversion tool) to convert SQL to Linq.
-- Update (sql from Microsoft's difference pattern in Linq):
SELECT DISTINCT [t0].[field] AS [Field_Name]
FROM [left_table] AS [t0]
WHERE NOT (EXISTS(
SELECT NULL AS [EMPTY]
FROM [right_table] AS [t1]
WHERE [t0].[field] = [t1].[field]
))
That's what we wanted, not what I expected. So, that's one pattern to memorize.

If you have hand-written SQL, you can use ExecuteQuery, specifying the type of the "row" class as the generic type argument:
var myList = DataContext.ExecuteQuery<MyRow>(
"select * from myview");
The "row" class exposes the columns as public properties. For example:
public class MyRow {
public int Id { get; set; }
public string Name { get; set; }
....
}
You can decorate the columns with more information:
public class MyRow {
....
[Column(Storage="NameColumn", DbType="VarChar(50)")]
public string Name { get; set; }
....
}
In my experience linq to sql doesn't generate very good SQL code, and the code it does generate breaks down for large databases. What linq to sql does very well is expose stored procedures to your client. For example:
var result = DataContext.MyProcedure(a,b,c);
This allows you to store SQL in the database, while having the benefits of an easy to use, automatically generated .NET wrapper.
To see the exact SQL that's being used, you can use the SQL Server Profiler tool:
http://msdn.microsoft.com/en-us/library/ms187929.aspx
The Linq-to-Sql Debug Visualizer:
http://weblogs.asp.net/scottgu/archive/2007/07/31/linq-to-sql-debug-visualizer.aspx
Or you can write custom code to log the queries:
http://goneale.wordpress.com/2008/12/31/log-linq-2-sql-query-execution-to-consoledebug-window/

This is why LINQPad was created in the first place. :) It allows you to easily see what the generated output is, what the results of the query would be, and so on. Best of all, it's free. Maybe not the answer to your question, but I am sure it could help you.

If you know exactly the sql you want, then you should use ExecuteQuery.
I can imagine a few ways to translate the query you've shown, but if you're concerned that "Except" might not be translated, test it. If it works the way you want, great; otherwise, rewrite it with operators you know will translate, for example:
db.Customers.Where(c => !db.Employees.Any(e => c.Country == e.Country) );
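The two shapes can be compared side by side in LINQ to Objects. This sketch says nothing about what SQL a provider generates; it only shows that the results line up once the NOT EXISTS form is projected and de-duplicated. DifferenceDemo, ViaExcept and ViaNotAny are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DifferenceDemo
{
    // Set difference: distinct customer countries that have no employee.
    public static List<string> ViaExcept(IList<string> customerCountries, IList<string> employeeCountries)
    {
        return customerCountries.Except(employeeCountries).ToList();
    }

    // NOT EXISTS style filter; Distinct() is needed to match Except's set semantics.
    public static List<string> ViaNotAny(IList<string> customerCountries, IList<string> employeeCountries)
    {
        return customerCountries
            .Where(c => !employeeCountries.Any(e => e == c))
            .Distinct()
            .ToList();
    }
}
```

Note that the Where/Any rewrite above returns whole customers rather than distinct countries, so it would need a Select on Country followed by Distinct to return the same rows as the Except query.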

If you are concerned about the TSQL generated, then I would suggest formalising the queries into stored procedures or UDFs, and accessing them via the data-context. The UDF approach has slightly better metadata and composability (compared to stored procedures) - for example you can add additional Where/Skip/Take etc. to a UDF query and have it run at the database (but last time I checked, only LINQ-to-SQL (not Entity Framework) supported UDF usage).
You can also use ExecuteQuery, but there are advantages of letting the database own the fixed queries.
Re finding what TSQL executed... with LINQ-to-SQL you can assign any TextWriter (for example, Console.Out) to DataContext.Log.

I believe the best way is to use stored procedures. In this case you have full control over the SQL.
