EF 5 large data load strategy

Good day,
I have the following tables:
PARENT 1=>N CHILDREN 1=>N GRANDCHILDREN.
Each of them has over 30 columns.
I need to select over 50,000 records from PARENT, plus certain fields from CHILDREN and GRANDCHILDREN. The data has to be manipulated in memory (complex algorithms run over what has been selected).
I am using Entity Framework 5.
I tried various combinations of eager loading (Include, projection etc.), but I still can't make it perform better than LINQ-to-SQL does in the following scenario:
"
SELECT from PROJECTS
on binding of each row:
SELECT from CHILDREN
SELECT from GRANDCHILDREN
"
It generates at least 50,001 calls to the DB, but it still performs better than any of my EF approaches, which take more than five times longer than the current LINQ-to-SQL design.
The best solution would be a WHERE IN query on the children, but that isn't natively available in EF 5 (Contains() doesn't cut it: it is too slow because of how badly the query gets built).
Any ideas will be greatly appreciated.
Thanks,

I assume you are implementing paging in your grid view and are not putting thousands of rows into it at once. If so, you only need to select the 10 (or however many) rows the grid view displays at a time, which will be a lot easier to work with.
I found this example on MSDN that implements server-side paging to reduce the number of rows returned by a single query.
You can also consider writing (or having a DBA write) an efficient stored procedure and mapping it into your Entity Framework model, so that you control the SQL.
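A minimal sketch of server-side paging, assuming the source is an IQueryable (for example an EF DbSet); here it runs against an in-memory list through AsQueryable(). The `Project` type and the `GetPage` helper are hypothetical names chosen for illustration. The OrderBy before Skip matters: LINQ to Entities refuses Skip on an unsorted sequence, and ordering by a unique key keeps page N stable between requests.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity, used only to illustrate the paging call.
public class Project
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class Paging
{
    // Returns one page of rows. Against an EF IQueryable, OrderBy/Skip/Take are
    // composed into the SQL statement, so only pageSize rows leave the database.
    public static List<T> GetPage<T, TKey>(
        IQueryable<T> source,
        Expression<Func<T, TKey>> orderBy,
        int pageIndex,
        int pageSize)
    {
        if (pageIndex < 0) throw new ArgumentOutOfRangeException(nameof(pageIndex));
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));

        // LINQ to Entities only supports Skip on sorted input; ordering by a
        // unique key also keeps each page consistent between requests.
        return source.OrderBy(orderBy)
                     .Skip(pageIndex * pageSize)
                     .Take(pageSize)
                     .ToList();
    }
}
```

With 25 projects and a page size of 10, page index 2 returns the last 5 rows.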

I had a similar issue some days ago; EF was very slow. After some experiments I got more or less acceptable performance with direct SQL queries.
Create a view model with the fields you need:
public class MyViewModel
{
    public string one { get; set; }
    public string two { get; set; }
}
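Then, in the controller action, run the query through Database.SqlQuery<T>, which maps each returned column onto the view-model property with the same name:
// The join column (b.a_id) is hypothetical: the original query had no join
// condition and would have produced a cross join. "something" is passed as
// the @p0 parameter instead of being concatenated into the SQL text.
MyViewModel result = db.Database.SqlQuery<MyViewModel>(
    "SELECT a.one, b.two" +
    " FROM Table1 a" +
    " JOIN Table2 b ON b.a_id = a.id" +
    " WHERE a.id = @p0",
    something).FirstOrDefault();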

Paging wouldn't work, because I need the data sorted by a calculated field. That field can only be calculated in the web server's memory, since the calculation needs client info (yes, yes, there is a way of passing this info to the database server, but that wasn't an option).
Solution:
using (var onecontext = new myCTx())
{
    // 1. SELECT all from PROJECTS.
    // 2. For every further generation down to the grandchildren, call
    //    Context.EntityName.SqlQuery() with the good old WHERE IN construct
    //    (I put all of this into my entities' partial classes as extensions).
}
This way I get all my data in N database trips, where N is the number of generations, which is fine. The EF context then connects everything together. And then I perform all my r
EF 6 should have WHERE IN built in, so I guess this approach will become more obvious then. Mind you: Contains() is not an option for large data sets, because it produces a chain of ORs instead of a straight IN. Yes, ADO.NET then translates the ORs into an IN, but before that some really heavy lifting is done, and that is what kills your application server.
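A sketch of the "one round trip per generation" idea, assuming integer keys and SQL Server, which rejects commands with more than 2,100 parameters (hence the batching). `Parent`, `Child`, `GenerationLoader`, `BuildInQueries` and `Stitch` are hypothetical names. Each generated SQL string and its values would then be passed to something like `context.Children.SqlQuery(sql, values)`; entities returned through DbSet.SqlQuery are tracked, so EF fixes up the navigation properties itself, while plain DTOs from Database.SqlQuery need to be stitched by hand, which is what `Stitch` does with a lookup.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row types standing in for the PARENT and CHILDREN tables.
public class Parent
{
    public int Id { get; set; }
    public List<Child> Children { get; set; } = new List<Child>();
}

public class Child
{
    public int Id { get; set; }
    public int ParentId { get; set; }
}

public static class GenerationLoader
{
    // SQL Server allows at most 2,100 parameters per command; stay below it.
    public const int MaxParametersPerBatch = 2000;

    // Builds one "... WHERE key IN (@p0, @p1, ...)" statement per batch of keys.
    // selectPrefix and keyColumn must be trusted strings, never user input;
    // only the key values travel as parameters.
    public static List<Tuple<string, object[]>> BuildInQueries(
        string selectPrefix, string keyColumn, IEnumerable<int> keys)
    {
        var distinctKeys = keys.Distinct().ToList();
        var batches = new List<Tuple<string, object[]>>();
        for (int start = 0; start < distinctKeys.Count; start += MaxParametersPerBatch)
        {
            int size = Math.Min(MaxParametersPerBatch, distinctKeys.Count - start);
            int[] batch = distinctKeys.GetRange(start, size).ToArray();
            string placeholders = string.Join(", ", batch.Select((_, i) => "@p" + i));
            string sql = selectPrefix + " WHERE " + keyColumn + " IN (" + placeholders + ")";
            batches.Add(Tuple.Create(sql, batch.Cast<object>().ToArray()));
        }
        return batches;
    }

    // Attaches children to their parents in memory: one pass, no extra queries.
    public static void Stitch(IEnumerable<Parent> parents, IEnumerable<Child> children)
    {
        var byParent = children.ToLookup(c => c.ParentId);
        foreach (var parent in parents)
        {
            parent.Children = byParent[parent.Id].ToList();
        }
    }
}
```

The same two steps repeat for GRANDCHILDREN, using the child ids collected from the previous generation as the keys.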

Related

NHibernate stored procedure returns data from previous query

I use two stored procedures that return data with the same structure (a list of records of the same type).
I call my method Execute(ISession session) twice: first for the first stored procedure (it returns the correct list of 6 rows), then for the second stored procedure (it returns a list of 11 rows, but the first 6 rows are the ones from the first request, overwriting the correct rows).
I found "Impact on NHibernate caching for searches with results including calculated value mapped as a formula (e.g. rank)", but I can't use that solution for IQuery.
Any ideas or links on how this can be fixed?
public dynamic Execute(ISession session)
{
    var query = session.GetNamedQuery(QueryName)
        .SetCacheable(false)
        .SetCacheMode(CacheMode.Ignore)
        .SetReadOnly(true);
    var results = query.List<T>();
    return results;
}

I'm going to take a stab at answering this, because I think I have a hunch of what's going on, and I want to set you on the right track. I've made a lot of assumptions here, so please don't be too harsh on me if I was completely wrong with my guesses.
It feels like you're trying to use NHibernate as a tool to simply translate rows into objects. Instead, NHibernate is a tool that translates between your object-oriented domain model and your relational database model. It does a lot more than just turn rows into objects. In particular, the NHibernate feature you're tripping over here is how NHibernate ensures that, within a single NHibernate session, a single row in the database representing a single entity corresponds to a single object instance. It uses its first-level cache to accomplish this.
Let's say you have two queries, QueryA and QueryB. These queries have been constructed so that they each pull from separate tables, TableA and TableB, so really they represent separate entities. However, the queries have also somehow been built so that the results look to NHibernate like the same entity. If QueryA and QueryB happen to return some of the same ids, NHibernate will combine them into the same instance, so you would see some of the results from QueryA repeated when you run QueryB.
So how do we fix it?
The quick and dirty fix would be to use a different session for each of those two queries, or to call session.Clear() in between them. The more appropriate fix would be to change these named queries so that they actually return two different entities.
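To make the identity-map behaviour concrete, here is a toy model of a session-scoped first-level cache keyed by id. It is not NHibernate's implementation, and `ReportRow` and `ToySession` are hypothetical names; it only reproduces the symptom from the question. A row whose id is already known in the session comes back as the cached instance with its old values, and clearing the map (the analogue of session.Clear()) makes the next query materialize fresh objects.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity that both stored procedures are mapped to.
public class ReportRow
{
    public int Id { get; set; }
    public string Value { get; set; }
}

// Toy first-level cache: one instance per id for the lifetime of the "session".
public class ToySession
{
    private readonly Dictionary<int, ReportRow> identityMap = new Dictionary<int, ReportRow>();

    // Materializes raw (id, value) rows the way an ORM with an identity map does:
    // an id already seen in this session returns the cached instance unchanged,
    // so the new row's values are ignored.
    public List<ReportRow> Materialize(IEnumerable<(int Id, string Value)> rows)
    {
        var result = new List<ReportRow>();
        foreach (var (id, value) in rows)
        {
            if (!identityMap.TryGetValue(id, out var entity))
            {
                entity = new ReportRow { Id = id, Value = value };
                identityMap[id] = entity;
            }
            result.Add(entity);
        }
        return result;
    }

    // Analogue of session.Clear(): forget every tracked instance.
    public void Clear() => identityMap.Clear();
}
```

Running two "queries" whose ids overlap shows the stale rows; the lasting fix is still to give each procedure's results their own, genuinely unique identifiers.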

I had the same problem. At first I solved it with session.Clear(), but that led to another bug. Daniel's answer helped me find that the issue was in the stored procedure: it did not return a unique identifier, and that caused the error when I mapped the ID with NHibernate.

Most efficient way to sum data in C#

I am trying to create a friendly report summing enrollment (number of students) by time of day. I initially started with loops for campus name, then time, then day, but it was extremely inefficient and slow. I decided to take another approach: select all the data I need in one SELECT and organize it in C#.
Raw Data View
My problem is that I am not sure whether to put this into arrays, lists, a dictionary or a DataTable to sum the enrollment and organize it as seen below (mockup, not calculated). Any guidance would be appreciated.
Friendly View

Well, if you only need to show the user some data (and not edit it) you may want to create a report.
Otherwise, if you only need sums, you could load all the data into an IEnumerable and call .Sum(). And, as colinsmith pointed out, you can run LINQ in parallel.
One thing is certain, though: if you have a lot of data, you don't want to run many queries. Either use a SUM query in SQL (if the data is stored in a database) or compute the sum over a collection you have fetched once.
You don't want to fetch the data in a loop. Processing data in memory is much faster than querying the database many times and processing each result.

Normally I would advise you to do this in the database, i.e. a SELECT using GROUP BY etc. I'm having a bit of trouble figuring out how your first picture relates to the second with regard to the days, so I can't offer an example.
You could of course do this in C# as well, using LINQ to Objects, but I would first try to solve it in the DB; you are better off that way, both performance-wise and bandwidth-wise.
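If the aggregation does end up in C#, one fetch followed by LINQ to Objects grouping is enough. This sketch assumes a hypothetical flat `SectionRow` (campus, begin time, one flag per meeting day, enrollment count) loosely modelled on the ssrmeet_* columns mentioned in another answer; the real schema was only shown as pictures, so the field names and the per-day summing rule are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat row, one per course section, returned by a single SELECT.
// Wednesday..Sunday flags are omitted for brevity; they follow the same pattern.
public class SectionRow
{
    public string CampusName { get; set; }
    public TimeSpan BeginTime { get; set; }
    public bool Monday { get; set; }
    public bool Tuesday { get; set; }
    public int Enrollment { get; set; }
}

// One line of the "friendly" report: campus + start time, enrollment per day.
public class EnrollmentSummary
{
    public string CampusName { get; set; }
    public TimeSpan BeginTime { get; set; }
    public int MondayTotal { get; set; }
    public int TuesdayTotal { get; set; }
}

public static class EnrollmentReport
{
    public static List<EnrollmentSummary> Summarize(IEnumerable<SectionRow> rows) =>
        rows.GroupBy(r => new { r.CampusName, r.BeginTime })
            .Select(g => new EnrollmentSummary
            {
                CampusName = g.Key.CampusName,
                BeginTime = g.Key.BeginTime,
                // Sum enrollment only over sections that meet on that day.
                MondayTotal = g.Where(r => r.Monday).Sum(r => r.Enrollment),
                TuesdayTotal = g.Where(r => r.Tuesday).Sum(r => r.Enrollment)
            })
            .OrderBy(s => s.CampusName)
            .ThenBy(s => s.BeginTime)
            .ToList();
}
```

A List of rows is enough as the container; the grouping produces the report rows directly, so no DataTable or dictionary is needed.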

I am not quite sure exactly what you are after, but from my understanding I would suggest creating a class to represent your enrollment:
public class Enrollment
{
    public string CampusName { get; set; }
    public DateTime DateEnrolled { get; set; }
}
Then get all enrollment details from the database into a collection of this class:
List<Enrollment> enrollments = db.GetEnrollments();
Now you can run many operations on this collection to get the data you want.
For example, to get all enrollments that happened on a Friday:
var fridayEnrollments = enrollments
    .Where(x => x.DateEnrolled.DayOfWeek == DayOfWeek.Friday)
    .ToList();
To count the Friday enrollments on the AA campus:
var fridayCount = fridayEnrollments.Count(d => d.CampusName == "AA");

Something like
select campusname, ssrmeet_begin_time, count(ssrmeet_monday), count(ssrmeet_tue_day) ..
from the_table
group by campusname, ssrmeet_begin_time
order by campusname, ssrmeet_begin_time
/
should be close to what you want. COUNT only counts non-NULL values. It is also much faster than first fetching all the data to the client: let the database do the analysis for you, since it already has all the data.
BTW: instead of those pictures, it is smarter to provide some DDL and INSERT statements with data to work on. That invites more people to help answer the question.

CRM 2011: Limitation of query expression?

I believe the answer to this question may be to use LINQ to SQL, but I wanted to see whether this is possible using QueryExpressions:
I create a query expression which queries against Entity A and also links to Entity B (via LinkEntity), imposing additional criteria. It is possible to retrieve columns from Entity B by adding the appropriate attribute names. However, it will only retrieve the linked entity (inner join).
Is it possible, using QueryExpression, to retrieve all related records (and required columns) from Entity B related to Entity A (e.g. all cases associated with a contact, where the contact passes specified criteria)? Normally I would consider inverting the query and searching for Entity B relating to Entity A with the appropriate LinkEntity conditions, but there are a number of linked entities which I would like to retrieve for the same contact query.
So I'm left with these options:
(1) Perform a second query (not ideal when iterating over a large number of results from the initial query),
(2) Perform a query using Linq to CRM on the filtered views,
(3) A different method entirely?
Any thoughts would be appreciated.
EDIT:
I ended up using LINQ to SQL to complete this task; the code I used is similar to the following (albeit with a few more joins for the actual query!):
var dataCollection = (from eA in xrmServiceContext.EntityASet
                      join eB in xrmServiceContext.EntityBSet
                          on new EntityReference(EntityA.EntityLogicalName, eA.Id)
                          equals (EntityReference)eB.EntityBLookupToEntityA
                      select new
                      {
                          Id = eA.Id,
                          EntityBInterestingAttribute = eB.InterestingAttributeName
                      });
This brings back one row per Entity A per Entity B. To make things easier, I then defined a custom class "MyEntityAClass" with List properties, so I could return one object per Entity A for filling a GridView etc. That part is more about processing the results, though, so I haven't posted that code here.
I hope that makes sense. Essentially, it is getting multiple rows per record, à la SQL, that makes this method work.
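The unposted post-processing step (folding one row per Entity A/Entity B pair into one object per Entity A with List properties) could look roughly like this GroupBy. `JoinedRow`, `RowFolder` and the members of `MyEntityAClass` are hypothetical; the flat rows would come from the anonymous-type projection above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat result: one row per (Entity A, Entity B) pair.
public class JoinedRow
{
    public Guid EntityAId { get; set; }
    public string EntityBInterestingAttribute { get; set; }
}

// One object per Entity A, with the related Entity B values collected in a list.
public class MyEntityAClass
{
    public Guid Id { get; set; }
    public List<string> InterestingAttributes { get; set; } = new List<string>();
}

public static class RowFolder
{
    // GroupBy keeps groups in the order their keys first appear in the input.
    public static List<MyEntityAClass> Fold(IEnumerable<JoinedRow> rows) =>
        rows.GroupBy(r => r.EntityAId)
            .Select(g => new MyEntityAClass
            {
                Id = g.Key,
                InterestingAttributes = g.Select(r => r.EntityBInterestingAttribute).ToList()
            })
            .ToList();
}
```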

QueryExpression can only return fields from one type of entity, the one specified in QueryExpression.EntityName.
You can use FetchXML, which also lets you get the fields of any linked entities; that would be your option 3. Unfortunately, it returns the data as XML, which you would then have to parse yourself.
It might be quicker to run the FetchXML, but it will take longer to write and test, and it's not the easiest thing to maintain either.
Sample code: this gets the first 101 active cases for all active accounts (statecode 0 is the Active state for both account and incident):
string fetch =
    "<fetch count='101' mapping='logical'>" +
    "<entity name='account'>" +
    "<filter type='and'><condition attribute='statecode' operator='eq' value='0'/></filter>" +
    "<link-entity name='incident' from='customerid' to='accountid'>" +
    "<all-attributes/>" +
    "<filter type='and'><condition attribute='statecode' operator='eq' value='0'/></filter>" +
    "</link-entity>" +
    "</entity>" +
    "</fetch>";
string data = yourCrmServiceObject.Fetch(fetch);
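One way to ease the maintenance concern is to build the FetchXML with LINQ to XML instead of a hand-written string, so the element nesting is checked as you write it. This sketch builds the same account/incident query as the sample; `FetchBuilder` and `CasesForAccounts` are hypothetical names, and the statecode is taken as a parameter so you can pick whichever state your entities need.

```csharp
using System;
using System.Xml.Linq;

public static class FetchBuilder
{
    // Accounts with the given statecode, inner-joined to their incidents (cases)
    // with the same statecode, returning all case attributes.
    public static string CasesForAccounts(int count, int stateCode)
    {
        // A fresh <filter> element each time, since both entities need one.
        XElement StateFilter() =>
            new XElement("filter", new XAttribute("type", "and"),
                new XElement("condition",
                    new XAttribute("attribute", "statecode"),
                    new XAttribute("operator", "eq"),
                    new XAttribute("value", stateCode)));

        var fetch = new XElement("fetch",
            new XAttribute("count", count),
            new XAttribute("mapping", "logical"),
            new XElement("entity", new XAttribute("name", "account"),
                StateFilter(),
                new XElement("link-entity",
                    new XAttribute("name", "incident"),
                    new XAttribute("from", "customerid"),
                    new XAttribute("to", "accountid"),
                    new XElement("all-attributes"),
                    StateFilter())));

        // DisableFormatting yields a single-line string, like the hand-written one.
        return fetch.ToString(SaveOptions.DisableFormatting);
    }
}
```

The returned string can be passed wherever the hand-written `fetch` string was used.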

How to optimize this linq query? (with OrderBy and ThenBy)

I currently have about 90k rows in a table, and it will grow to about 1 to 5 million rows before I run a cleanup and move all rows into a "historical table". When I run the following query (MyEntities is an ObjectSet):
MyEntities.Skip(amount * page).Take(amount).ToList();
it takes about 1.2s. But when I run the following query with OrderBy and ThenBy:
MyEntities.OrderBy(b => b.Day).ThenBy(b => b.InitialHour).Skip(amount * page).Take(amount).ToList();
it takes about 5.7s. Is there a way to optimize the second query?

A few suggestions:
Check that it really is happening in the database (instead of fetching all entities, then sorting)
Make sure that both Day and InitialHour are indexed.
Check the generated SQL isn't doing anything crazy (check the query plan)
EDIT: Okay, so it looks like MyEntities is actually declared as IEnumerable<MyEntity>, which means everything will be done in-process... all your LINQ calls will be via Enumerable.Select etc, rather than Queryable.Select etc. Just change the declared type of MyEntities to IQueryable<MyEntity> and watch it fly...
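The fix hinges on which extension methods the compiler binds, which depends only on the declared type. In this sketch (the `MyEntity` properties `Day` and `InitialHour` are assumed types, and an in-memory list stands in for the ObjectSet), the IQueryable version records OrderBy/ThenBy/Skip/Take as an expression tree that a provider such as EF translates into one SQL statement, while the identical-looking IEnumerable version runs as LINQ to Objects over every row already fetched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical shape of MyEntity; the property types are assumptions.
public class MyEntity
{
    public DateTime Day { get; set; }
    public int InitialHour { get; set; }
}

public static class QueryShape
{
    // Binds to Queryable.*: builds an expression tree for the provider to translate.
    public static IQueryable<MyEntity> PageQueryable(IQueryable<MyEntity> source, int page, int amount) =>
        source.OrderBy(b => b.Day).ThenBy(b => b.InitialHour).Skip(amount * page).Take(amount);

    // Binds to Enumerable.*: same-looking code, executed in memory.
    public static IEnumerable<MyEntity> PageEnumerable(IEnumerable<MyEntity> source, int page, int amount) =>
        source.OrderBy(b => b.Day).ThenBy(b => b.InitialHour).Skip(amount * page).Take(amount);

    // Lists the query operators recorded in the expression tree, outermost first.
    public static List<string> RecordedOperators(IQueryable query)
    {
        var names = new List<string>();
        var node = query.Expression as MethodCallExpression;
        while (node != null)
        {
            names.Add(node.Method.Name);
            node = node.Arguments[0] as MethodCallExpression;
        }
        return names;
    }
}
```

For the queryable version the recorded operators are Take, Skip, ThenBy and OrderBy, which is exactly what EF needs in order to push the ORDER BY and paging into SQL.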

For reading data from your DB, it's usually a good idea to create custom SQL Views, say one View per grid and one View per Form that you want to populate.
In this example, you would create a view that does the sorting for you, then map that View to an Entity in Entity Framework, then query that Entity using LINQ.
This is nice, clean, readable, maintainable and as optimal as you can make it.
Good luck!

Retrieving a tree structure from a database using LINQ

I have an organization chart tree structure stored in a database.
It is something like
ID (int);
Name (String);
ParentID (int)
In C# it is represented by a class like
class Employee
{
    public int ID { get; set; }
    public string Name { get; set; }
    public IList<Employee> Subs { get; set; }
}
I am wondering what the best way is to retrieve these values from the database and fill the C# objects using LINQ (I am using Entity Framework).
There must be something better than making a call to get the top level then making repeated calls to get subs and so on.
How best to do it?

You can build a stored procedure with built-in recursion, using a recursive Common Table Expression (CTE). Take a look at http://msdn.microsoft.com/en-us/library/ms190766.aspx for more info on Common Table Expressions in SQL Server.
You might want to find a different (better?) way to model your data. http://www.sqlteam.com/article/more-trees-hierarchies-in-sql lists a popular way of modeling hierarchical data in a database. Changing the modeling can allow you to create queries that can be expressed without recursion.

If you're using SQL Server 2008, you could make use of the new HIERARCHYID feature.
Organizations have struggled in past with the representation of tree like structures in the databases, lot of joins lots of complex logic goes into the place, whether it is organization hierarchy or defining a BOM (Bill of Materials) where one finished product is dependent on another semi finished materials / kit items and these kit items are dependent on another semi finished items or raw materials.
SQL Server 2008 has the solution to the problem where we store the entire hierarchy in the data type HierarchyID. HierarchyID is a variable length system data type. HierarchyID is used to locate the position in the hierarchy of the element like Scott is the CEO and Mark as well as Ravi reports to Scott and Ben and Laura report to Mark, Vijay, James and Frank report to Ravi.
So use the new functions available, and simply return the data you need without using LINQ. The drawback is that you'll need UDFs or stored procedures for anything beyond a simple root query:
DECLARE @Manager hierarchyid, @FirstChild hierarchyid;
SELECT @Manager = CAST('/1/' AS hierarchyid);
SELECT @FirstChild = @Manager.GetDescendant(NULL, NULL);

I'd add a field to the entity for the parent ID, then pull the whole table into memory, leaving the Subs list null. I'd then iterate through the objects and populate the lists using LINQ to Objects. That is only one DB query, so it should be reasonable.
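A sketch of that single-query approach, assuming the flat rows have already been loaded with one ToList() over the table. The `Employee` shape follows the question; making `ParentID` nullable for the top of the chart and the `OrgChart.BuildTree` helper are assumptions for illustration. A lookup keyed by parent id links every node in one pass instead of scanning the list once per employee.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Employee
{
    public int ID { get; set; }
    public string Name { get; set; }
    public int? ParentID { get; set; }   // assumed null for the top of the chart
    public IList<Employee> Subs { get; set; } = new List<Employee>();
}

public static class OrgChart
{
    // Links every employee to its direct reports and returns the root(s).
    public static List<Employee> BuildTree(IEnumerable<Employee> flat)
    {
        var all = flat.ToList();
        var byParent = all.Where(e => e.ParentID.HasValue)
                          .ToLookup(e => e.ParentID.Value);
        foreach (var employee in all)
        {
            employee.Subs = byParent[employee.ID].ToList();
        }
        return all.Where(e => !e.ParentID.HasValue).ToList();
    }
}
```

Using the org chart from the HierarchyID quote, Scott ends up as the single root with Mark and Ravi as direct reports.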

An Entity Framework query lets you include related entity sets, though I am not sure how that would work for a unary (self-referencing) relationship.
Check this out for more information on that: http://msdn.microsoft.com/en-us/library/bb896272.aspx

Well... even with LINQ you will need two queries, because any single query will duplicate the main employee, which results in multiple employee objects (that are really the same) being created. However, you can hide this a bit with LINQ by running the second query while you create the object, something like this:
var v = from u in TblUsers
        select new
        {
            SupervisorName = u.DisplayName,
            Subs = (from sub in TblUsers
                    where sub.SupervisorID.Value == u.UserID
                    select sub.DisplayName).ToList()
        };
