Linq to SQL Nested Objects - c#

I have an object called Category which has an Id, Name, and OwnerId. I then nest these to create subcategories. If a category has an OwnerId, it is a subcategory. The number of subcategories is unlimited, but each item can have only one parent. Simple enough.
My issue is that I need to access a subcategory after it is loaded. How do I get the owning category using LINQ? I know the OwnerId, but I don't know how many levels deep the owner could be.
Basically I am looking for a way to get the category or subcategory where Id == X, but it can live in a subcategory six or more levels deep.
I am trying to avoid a loop for each subcategory in each subcategory...

There is another way to store/retrieve a tree hierarchy as explained in this fogbugz blog post:
Turns out there's a pretty cool solution for this problem explained by Joe Celko. Instead of attempting to maintain a bunch of parent/child relationships all over your database -- which would necessitate recursive SQL queries to find all the descendents of a node -- we mark each case with a "left" and "right" value calculated by traversing the tree depth-first and counting as we go. A node's "left" value is set whenever it is first seen during traversal, and the "right" value is set when walking back up the tree away from the node. A picture probably makes more sense:
The Nested Set SQL model lets us add case hierarchies without sacrificing performance.
How does this help? Now we just ask for all the cases with a "left" value between 2 and 9 to find all of the descendents of B in one fast, indexed query. Ancestors of G are found by asking for nodes with "left" less than 6 (G's own "left") and "right" greater than 6. Works in all databases. Greatly increases performance -- particularly when querying large hierarchies
Here's another post going into more detail. It's written using SQL and PHP, but I think you can get the gist of it and easily translate it into LINQ to SQL.
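To make the left/right numbering described in the quote concrete, here is a minimal in-memory sketch. The Node class and the NestedSet helper are hypothetical names invented for illustration; in a real schema Left and Right would be two indexed columns, and the same Where conditions would be written against the table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory node; Left/Right mirror the two extra columns.
public class Node
{
    public string Name;
    public int Left, Right;
    public List<Node> Children = new List<Node>();
}

public static class NestedSet
{
    // Depth-first numbering: Left is assigned on the way down,
    // Right on the way back up. Returns the next unused counter value.
    public static int Number(Node node, int counter = 1)
    {
        node.Left = counter++;
        foreach (var child in node.Children)
            counter = Number(child, counter);
        node.Right = counter++;
        return counter;
    }

    // Descendants have a Left value strictly inside the node's interval.
    public static IEnumerable<Node> Descendants(IEnumerable<Node> all, Node n)
    {
        return all.Where(x => x.Left > n.Left && x.Left < n.Right);
    }

    // Ancestors are the nodes whose interval encloses the node's interval.
    public static IEnumerable<Node> Ancestors(IEnumerable<Node> all, Node n)
    {
        return all.Where(x => x.Left < n.Left && x.Right > n.Right);
    }
}
```

The trade-off is that inserting or moving a node means renumbering part of the tree, so this layout tends to suit hierarchies that are read far more often than they are changed.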

In MS SQL 2005 and up you can create recursive queries. In LINQ to SQL, however, you are out of luck. Without restructuring the data in the database, there is no way to traverse the tree in a single database call.
However, there is one workaround I can think of. If you are able to group all Category elements of a single tree (or a part of the tree) together, you can pre-load that part of the complete tree in a single statement. After that, you will be able to traverse that part of the tree without triggering new calls to the database. It would look something like this:
// Load the category that will be used as the starting point.
var subCategory = db.Categories.Single(c => c.Id == 56);

// Performance: load the complete group in one go.
var categories = (
    from category in db.Categories
    where category.GroupId == subCategory.GroupId
    select category)
    .ToArray();

// Traverse the tree and get the top-most parent (if any).
var parent = subCategory.GetParents().LastOrDefault();

// Extension method to get the parents (must live in a static class).
public static class CategoryExtensions
{
    public static IEnumerable<Category> GetParents(this Category category)
    {
        while (category.Parent != null)
        {
            // NOTE: category.Parent will not cause a database call
            // when the Parent is already loaded by L2S.
            yield return category.Parent;
            category = category.Parent;
        }
    }
}
This of course will only work if you will be able to determine elements as a group. Whether this solution will be faster also depends on the size of the group. When the group of objects that you load (and don't use) is very big, it will actually slow the application down.
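If the group has already been loaded as plain objects, the owner chain can also be followed through a dictionary keyed by Id, without relying on a Parent navigation property. The sketch below assumes a simplified Category with a nullable OwnerId (null meaning top level); CategoryTree and Ancestors are hypothetical names, not part of LINQ to SQL.

```csharp
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for the Category entity from the question.
public class Category
{
    public int Id;
    public string Name;
    public int? OwnerId;   // assumption: null for a top-level category
}

public static class CategoryTree
{
    // Yields the owner, then the owner's owner, and so on up to the root.
    // Stops early if an owner is missing from the dictionary.
    // Assumes the data contains no ownership cycles.
    public static IEnumerable<Category> Ancestors(
        IDictionary<int, Category> byId, Category start)
    {
        var current = start;
        Category owner;
        while (current.OwnerId.HasValue &&
               byId.TryGetValue(current.OwnerId.Value, out owner))
        {
            yield return owner;
            current = owner;
        }
    }
}
```

With the loaded categories turned into a dictionary via ToDictionary(c => c.Id), finding any category by Id is a single lookup, and Ancestors(byId, category).LastOrDefault() gives the top-most owner regardless of depth.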

Related

EF 5 large data load strategy

Good day,
I have the following tables
PARENT 1=>N CHILDREN 1=>N GRANDCHILDREN.
Both tables have over 30 columns.
I need to select over 50,000 records from PARENT, plus I will need certain fields from CHILDREN and GRANDCHILDREN. The data is needed for in-memory manipulation (complex algorithms on what's been selected).
I am using Entity Framework 5.
I tried various combinations of eager loading (Include, projection, etc.), but I am still not able to make it perform better than it performs with LINQ-to-SQL in the following scenario:
"
SELECT from PROJECTS
on binding of each row:
SELECT from CHILDREN
SELECT from GRANDCHILDREN
"
It generates at least 50,001 calls to the DB, but it still performs better than any of my EF approaches, which take over 5x longer than the current LINQ-to-SQL design.
The best solution would be to have a WHERE IN query on children, but it's not available in EF 5's native implementation (Contains doesn't cut it - too slow for badly done...).
Any ideas will be greatly appreciated.
Thanks,
I assume you are implementing paging in your grid view and are not putting thousands of rows into a grid view at once. If so, you can select only 10 rows, or however many you are displaying in the grid view at a time. This will be a lot easier to work with.
I found this example on MSDN that implements paging server side to reduce the number of rows returned in a single query.
You can also consider writing, or having a DBA write, an efficient stored procedure that you can link to your Entity Framework model to control the SQL code.
I had a similar issue some days ago. EF is very slow. After some experiments I got more or less normal performance with direct queries:
Create ViewModel with needed fields:
public class MyViewModel
{
    public string one { get; set; }
    public string two { get; set; }
}
Then in controller action:
MyViewModel result = db.Database.SqlQuery<MyViewModel>(
    "SELECT a.one, b.two" +
    " FROM Table1 a, Table2 b" +
    " WHERE a.id = @p0",   // SQL uses a single '=' for comparison
    something               // value bound to @p0
    ).FirstOrDefault();
Paging wouldn't work because I need the data to be sorted on a calculated field. The field can only be calculated in web-server memory because the calculation needs client info (yes, yes, there is a way of passing this info to the DB server, but this wasn't an option).
Solution:
using (var onecontext = new myCTx())
{
    // SELECT all from PROJECTS,
    // and implement Context.EntityName.SqlQuery() on all grandchildren,
    // using the good old WHERE IN construct (I put it all into my
    // entities' partial classes as extensions).
}
this way I get all my data in N db trips, where N is the number of generations, which is fine. The EF context then connects everything together. And then I perform all my r
EF 6 should have WHERE IN built in, so I guess this approach will become more obvious then. Mind you: using Contains() is not an option for large data for it produces multiple OR's instead of the straight IN. Yes, ADO.NET then translates OR's into IN, but before that there is some really heavy lifting being done, which is killing your app server.
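The "one trip per generation" idea can be sketched in memory as follows. In the answer the EF context wires the generations together by itself; the sketch below only shows what that stitching amounts to if the rows come back as plain objects. Parent, Child and Stitcher are hypothetical names, and Batches is an assumed helper for splitting a long ID list into several WHERE IN queries.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical plain objects standing in for two generations of rows.
public class Parent
{
    public int Id;
    public List<Child> Children = new List<Child>();
}

public class Child
{
    public int Id;
    public int ParentId;
}

public static class Stitcher
{
    // Attaches each child to its parent in a single pass over both lists.
    public static void Attach(IEnumerable<Parent> parents, IEnumerable<Child> children)
    {
        var byParent = children.ToLookup(c => c.ParentId);
        foreach (var p in parents)
            p.Children.AddRange(byParent[p.Id]); // empty sequence when no match
    }

    // Splits a long ID list into batches, e.g. one WHERE IN query per batch.
    public static IEnumerable<List<int>> Batches(IEnumerable<int> ids, int size)
    {
        var batch = new List<int>(size);
        foreach (var id in ids)
        {
            batch.Add(id);
            if (batch.Count == size)
            {
                yield return batch;
                batch = new List<int>(size);
            }
        }
        if (batch.Count > 0)
            yield return batch;
    }
}
```

Each generation is still fetched with its own query; the lookup just replaces the per-row queries that the original 50,001-call design issued.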

Entity Framework 5 (Code First) Navigation Properties

Is it the correct behaviour of entity framework to load all items with the given foreign key for a navigation property before querying/filtering?
For example:
myUser.Apples.First(a => a.Id == 1 && !a.Expires.HasValue);
Will load all apples associated with that user. (The SQL query doesn't query the ID or Expires fields).
There are two other ways of doing it (which generate the correct SQL) but neither as clean as using the navigation properties:
myDbContext.Entry(myUser).Collection(u => u.Apples).Query().First(a => a.Id == 1 && !a.Expires.HasValue);
myDbContext.Apples.First(a => a.UserId == myUser.Id && a.Id == 1 && !a.Expires.HasValue);
Things I've Checked
Lazy load is enabled and is not disabled anywhere.
The navigation properties are virtual.
EDIT:
OK, based on your edit I think I had the wrong idea about what you were asking (which makes a lot more sense now). I'll leave the previous answer around, as I think it's probably useful as an explanation, but it is much less relevant to your specific question as it stands.
From what you've posted, your user object is enabled for lazy loading. EF enables lazy loading by default; however, there is one requirement for lazy loading, which is to mark navigation properties as virtual (which you have done).
Lazy loading works by attaching to the get method on a navigation property and performing a SQL query at that point to retrieve the foreign entity. Navigation properties are also not queryable collections, which means that when you execute the get method your query will be executed immediately.
In your example above, the Apples collection on User is enumerated before you execute the .First call (which then runs using plain old LINQ to Objects). This means that SQL will return all of the apples associated with the user, and they will be filtered in memory on the querying machine (as you have observed). It also means you need two queries to pull down the apples you are interested in (one for the user and one for the navigation property), which may not be efficient if all you want is apples.
A perhaps better way of doing this is to keep the whole expression as a query for as long as possible. An example of this would be something like the following:
myDbContext.Users
    .Where(u => u.Id == userId)
    .SelectMany(u => u.Apples)
    .Where(a => a.Id == 1 && !a.Expires.HasValue);
This should execute as a single SQL statement and only pull down the apples you care about.
HTH
OK, from what I can understand of your question, you are asking why EF appears to allow you to use navigation properties in a query even though they may be null in the result set.
In answer to your question: yes, this is expected behavior. Here's why:
When you write a query it is translated into SQL, for example something like
myDbContext.Apples.Where(a=>a.IsRed)
will turn into something like
Select * from Apples
where [IsRed] = 1
similarly something like the following will also be translated directly to SQL
myDbContext.Apples.Where(a=>a.Tree.Height > 100)
will turn into something like
Select a.* from Apples as a
inner join Tree as t on a.TreeId = t.Id
where t.Height > 100
However, it's a bit of a different story when we actually pull down the result sets.
To avoid pulling down too much data and making things slow, EF offers several mechanisms for specifying what comes back in the result set. One is lazy loading (which incidentally needs to be used carefully if you want to avoid performance issues) and the second is the Include syntax. These methods restrict what we pull back so that queries are quick and don't consume unneeded resources.
For example in the above you will note that only Apple fields are returned.
If we were to add an include to that as below you could get a different result:
myDbContext.Apples.Include(a=>a.Tree).Where(a=>a.Tree.Height > 100)
will translate to SQL similar to:
Select a.*, t.* from Apples as a
inner join Tree as t on a.TreeId = t.Id
where t.Height > 100
In your example above (which I'm fairly sure isn't syntactically correct, as myContext.Users should be a collection and therefore shouldn't have a .Apples) you are creating a query, therefore all variables are available. When you enumerate that query you have to be explicit about what is returned.
For more details on navigation properties and how they work (and the .Include syntax) check out my blog: http://blog.staticvoid.co.nz/2012/07/entity-framework-navigation-property.html

LINQ Intersection of two different types

I have two different list types. I need to keep only the elements of list1 that also appear in list2 and whose list2 counterpart satisfies certain criteria.
Here is what I tried. It seems to work, but each element is listed twice.
var filteredTracks =
    from mtrack in mTracks
    join ftrack in tracksFileStatus on mtrack.id equals ftrack.Id
    where mtrack.id == ftrack.Id && ftrack.status == "ONDISK" && ftrack.content_type_id == 234
    select mtrack;
Ideally I don't want to create a new copy as filteredTracks. Is it possible to modify mTracks in place?
If you're getting duplicates, it's because your id fields are not unique in one or both of the two sequences. Also, you don't need to say where mtrack.id == ftrack.Id since that condition already has to be met for the join to succeed.
I would probably use loops here, but if you are dead set on LINQ, you may need to group tracksFileStatus by its Id field. It's hard to tell by what you posted.
As far as "modifying mTracks in place", this is probably not possible or worthwhile (I'm assuming that mTracks is some type derived from IEnumerable<T>). If you're worried about the efficiency of this approach, then you may want to consider using another kind of data structure, like a dictionary with Id values as the keys.
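As a rough sketch of the keyed-lookup idea: if mTracks happens to be a List<T> (an assumption, since the question does not say), List<T>.RemoveAll can filter it in place, and a HashSet of the qualifying Ids sidesteps the duplicates that the join produces. MTrack, FTrack and TrackFilter are hypothetical stand-ins for the types in the question.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the two element types in the question.
public class MTrack { public int id; }

public class FTrack
{
    public int Id;
    public string status;
    public int content_type_id;
}

public static class TrackFilter
{
    // Removes, in place, every track that has no qualifying file-status entry.
    public static void KeepOnDisk(List<MTrack> mTracks, IEnumerable<FTrack> tracksFileStatus)
    {
        // A set holds each qualifying Id once, so duplicate status rows
        // cannot cause a track to be listed twice.
        var keep = new HashSet<int>(
            tracksFileStatus
                .Where(f => f.status == "ONDISK" && f.content_type_id == 234)
                .Select(f => f.Id));

        mTracks.RemoveAll(m => !keep.Contains(m.id));
    }
}
```

Building the set costs one pass over tracksFileStatus, after which each membership check is a hash lookup rather than a scan.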
Since the Q was about lists primarily...
this is probably better linq wise...
var test = (from m in mTracks
from f in fTracks
where m.Id == f.Id && ...
select m);
However you should optimize, e.g.
Are your lists sorted? If they are, see e.g. Best algorithm for synchronizing two IList in C# 2.0
If it's coming from Db (it's not clear here), then you need to build your linq query based on the SQL / relations and indexes you have in the Db and go a bit different route.
If I were you, I'd make a query (for each of the lists, presuming it's not DB-bound) so that the tracks are sorted in the first place (usually on whatever is used to compare them),
then enumerate both in parallel (using enumerators), comparing other things in the process (as in that link).
That's likely the most efficient way.
If/when the data comes from a database, optimize at the 'source', i.e. fetch the data already sorted and filtered as much as you can. Basically, build the SQL first, or inspect the SQL generated by the LINQ query (let me know if you need the link).
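The parallel enumeration described above could look roughly like the following. This is only a sketch under two assumptions that the answer does not state explicitly: both sequences are sorted ascending by Id, and neither contains duplicate Ids. SortedIntersect and CommonIds are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SortedIntersect
{
    // Walks two ascending, duplicate-free ID sequences in lockstep and
    // yields the IDs present in both. Each input is enumerated only once.
    public static IEnumerable<int> CommonIds(IEnumerable<int> a, IEnumerable<int> b)
    {
        using (var ea = a.GetEnumerator())
        using (var eb = b.GetEnumerator())
        {
            bool hasA = ea.MoveNext();
            bool hasB = eb.MoveNext();
            while (hasA && hasB)
            {
                if (ea.Current < eb.Current)
                {
                    hasA = ea.MoveNext();      // a is behind, advance it
                }
                else if (ea.Current > eb.Current)
                {
                    hasB = eb.MoveNext();      // b is behind, advance it
                }
                else
                {
                    yield return ea.Current;   // match found
                    hasA = ea.MoveNext();
                    hasB = eb.MoveNext();
                }
            }
        }
    }
}
```

If the inputs are not already sorted, the cost of sorting them first may outweigh the benefit compared with a hash-based lookup.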

Retrieving a tree structure from a database using LINQ

I have an organization chart tree structure stored in a database.
It is something like
ID (int);
Name (String);
ParentID (int)
In C# it is represented by a class like
class Employee
{
    public int ID { get; set; }
    public string Name { get; set; }
    public IList<Employee> Subs { get; set; }
}
I am wondering how is the best way to retrieve these values from the database to fill up the C# Objects using LINQ (I am using Entity Framework)
There must be something better than making a call to get the top level then making repeated calls to get subs and so on.
How best to do it?
You can build a stored proc that has built in recursion. Take a look at http://msdn.microsoft.com/en-us/library/ms190766.aspx for more info on Common Table Expressions in SQL Server
You might want to find a different (better?) way to model your data. http://www.sqlteam.com/article/more-trees-hierarchies-in-sql lists a popular way of modeling hierarchical data in a database. Changing the modeling can allow you to create queries that can be expressed without recursion.
If you're using SQL Server 2008, you could make use of the new HIERARCHYID feature.
Organizations have struggled in past with the representation of tree like structures in the databases, lot of joins lots of complex logic goes into the place, whether it is organization hierarchy or defining a BOM (Bill of Materials) where one finished product is dependent on another semi finished materials / kit items and these kit items are dependent on another semi finished items or raw materials.
SQL Server 2008 has the solution to the problem where we store the entire hierarchy in the data type HierarchyID. HierarchyID is a variable length system data type. HierarchyID is used to locate the position in the hierarchy of the element like Scott is the CEO and Mark as well as Ravi reports to Scott and Ben and Laura report to Mark, Vijay, James and Frank report to Ravi.
So use the new functions available, and simply return the data you need without using LINQ. The drawback is that you'll need to use a UDF or stored procedures for anything beyond a simple root query:
SELECT @Manager = CAST('/1/' AS hierarchyid)
SELECT @FirstChild = @Manager.GetDescendant(NULL, NULL)
I'd add a field to the entity to include the parent ID, then I'd pull the whole table into memory, leaving the Subs list null. I'd then iterate through the objects and populate the lists using LINQ to Objects. Only one DB query, so it should be reasonable.
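A minimal sketch of that single-query approach, working on rows that are already in memory. EmployeeRow is a hypothetical flat shape for the table, and the nullable ParentID (null for the top of the chart) is an assumption, since the question declares the column as a plain int.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat row as it would come back from the one query.
public class EmployeeRow
{
    public int ID;
    public string Name;
    public int? ParentID;   // assumption: null marks the top of the chart
}

public class Employee
{
    public int ID;
    public string Name;
    public IList<Employee> Subs = new List<Employee>();
}

public static class OrgChart
{
    // Builds the tree in two passes: create every node, then link it to its parent.
    public static List<Employee> Build(IEnumerable<EmployeeRow> rows)
    {
        var list = rows.ToList();
        var byId = list.ToDictionary(
            r => r.ID,
            r => new Employee { ID = r.ID, Name = r.Name });

        var roots = new List<Employee>();
        foreach (var r in list)
        {
            Employee parent;
            if (r.ParentID.HasValue && byId.TryGetValue(r.ParentID.Value, out parent))
                parent.Subs.Add(byId[r.ID]);
            else
                roots.Add(byId[r.ID]);   // no parent, or parent not in the set
        }
        return roots;
    }
}
```

Because every row is touched a constant number of times, the work after the query is linear in the number of employees.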
An Entity Framework query should allow you to include related entity sets, though in a unary relationship, not sure how it would work...
Check this out for more information on that: http://msdn.microsoft.com/en-us/library/bb896272.aspx
Well... even with LINQ you will need two queries, because any single query will duplicate the main employee and thus will result in multiple employees (that are really the same) being created... However, you can hide this a bit with linq when you create the object, that's when you would execute the second query, something like this:
var v = from u in TblUsers
        select new {
            SupervisorName = u.DisplayName,
            Subs = (from sub in TblUsers
                    where sub.SupervisorID.Value == u.UserID
                    select sub.DisplayName).ToList()
        };

Linq2Entities, many to many and dynamic where clause

I'm fairly new to Linq and struggling using dynamic where over a many to many relationship.
Database tables are like so:
Products <-> Products_SubCategories <-> SubCategories
with Products_SubCategories being a link table.
My full linq statement is
db.Products.Where("it.SubCategories.SubCategoryID = 2")
.Include("SubCategories")
.OrderBy(searchOrderBy)
.Skip(currentPage * pageSize)
.Take(pageSize)
.ToList()
.ForEach(p => AddResultItem(items, p));
So ignoring everything bar the Where() I'm just trying to pull out all products which are linked to sub category ID 2, this fails with
To extract properties out of collections, you must use a sub-query to iterate over the collection., near multipart identifier, line 8, column 1.
I think using the SQL-esque syntax I can do a subquery as per this link. However I'm not sure how to do that in the lambda / chaining syntax.
This is the start of a search function and I would like to build up the where string dynamically, as I have with the searchOrderBy string to avoid a large SELECT CASE. Products is linked to another table via a link table that I will need to include once I understand how to do this example.
Any help would be much appreciated!
Thanks
This is wrong:
db.Products.Where("it.SubCategories.SubCategoryID = 2")
SubCategories is a list. It does not have a property called SubCategoryID. Rather, it contains a group of entities which each have a property called SubCategoryID. That's a critical distinction.
When you run into a situation where you don't know how to proceed because there are multiple problems, it is good to break the problem down into several smaller problems.
Let's start by removing the dynamic query. It will be easier to solve the problem with a non-dynamic query. Once you've done that, you can go back and make it dynamic again.
So start by using the non-dynamic syntax. Type something like this in Visual Studio, and see what IntelliSense does for you:
db.Products.Where(p => p.SubCategories.
You will quickly see that there is no SubCategoryID property. Instead, you will see a bunch of LINQ API methods for working with lists. If you know LINQ well, you will recognize that the Any() method is what you want here:
db.Products.Where(p => p.SubCategories.Any(sc => sc.SubCategoryID == 2))
Go ahead and run that query. Does it work? If so, you can move ahead to making it dynamic. I'm no ESQL expert, but I'd start with something along the lines of:
db.Products.Where("EXISTS(SELECT SC FROM it.SubCategories AS SC WHERE SC.SubCategoryID = 2)");
As an aside, I use MS Dynamic Query ("Dynamic LINQ") for this sort of thing rather than Query Builder, as it's more testable.
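The distinction between a collection and its elements can be tried out with LINQ to Objects before touching the database. In the sketch below Product and SubCategory are simplified stand-ins for the entities, and ProductSearch.InAnySubCategory is a hypothetical helper that takes the wanted IDs at run time instead of hard-coding 2.

```csharp
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for the two entities.
public class SubCategory { public int SubCategoryID; }

public class Product
{
    public string Name;
    public List<SubCategory> SubCategories = new List<SubCategory>();
}

public static class ProductSearch
{
    // Returns products linked to at least one of the given subcategory IDs.
    public static IEnumerable<Product> InAnySubCategory(
        IEnumerable<Product> products, params int[] ids)
    {
        var wanted = new HashSet<int>(ids);
        // Any() asks the question per element; the collection itself
        // has no SubCategoryID property to compare against.
        return products.Where(
            p => p.SubCategories.Any(sc => wanted.Contains(sc.SubCategoryID)));
    }
}
```

Whether the same shape translates to SQL depends on the provider, so against Entity Framework it is worth checking the generated query.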
It worked for me.
db.Products.Where("SubCategories.Any(SubCategoryID = 2)")
