Entity Framework - Lazy Loading working even with ToList() - c#

First of all, I am using EF 6.0 with Code First approach.
My context configuration has "Proxy Creation" and "Lazy Loading" enabled.
My question is:
Does lazy loading work with the results of a method that returns IEnumerable (and not IQueryable)?
I think the code below is more explanatory:
public void Test()
{
    var company = GetCompanies().FirstOrDefault();
    if (company.Employees.Count() > 0)
    {
        // I got here without errors!
    }
}

public IEnumerable<Company> GetCompanies()
{
    var company = context.Companies.ToList();
    // Note that I did not Include the Employee (child table)
    return company;
}
Note the comment where I say "I got here without errors!". It means that lazy loading is working even after the ToList() call. I thought that after converting the IQueryable to a List or IEnumerable, EF would lose the capability of doing lazy loading.
I have noticed that the proxy is still enabled for the entities returned by the GetCompanies method (in debug mode I can see that ugly hash-like name: System.Data.Entity.DynamicProxies.Company_7035BEA374959AC1...).
The lazy loading even works when calling it from a different DLL. Is this correct? I mean, can a different DLL make subsequent calls to my database even if my method returns an IEnumerable (and not an IQueryable)?
Any clarification will be greatly appreciated.

Note the comment where I say "I got here without errors!". It means that lazy loading is working even after the ToList() call.
That's the whole point of lazy loading: you can get entities from the DB when they are required (i.e. when you access the property), rather than only when you execute the query for the first time (i.e. your call to .ToList()).
The lazy loading even works when calling it from a different DLL. Is this correct? I mean, can a different DLL make subsequent calls to my database even if my method returns an IEnumerable (and not an IQueryable)?
Yes, that's correct, but be careful: if you dispose your context, lazy loading won't work any more (it will throw an ObjectDisposedException).
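For instance, a minimal sketch of that pitfall (AppDbContext is a hypothetical context type; the Company/Employees entities follow the question):

public IEnumerable<Company> GetCompaniesAndDispose()
{
    using (var context = new AppDbContext())    // hypothetical context type
    {
        return context.Companies.ToList();      // proxies are created while the context is alive
    }                                           // the context is disposed here
}

public void Consume()
{
    var company = GetCompaniesAndDispose().First();

    // Throws ObjectDisposedException: the proxy needs the (now disposed)
    // context to lazily load the Employees collection.
    var count = company.Employees.Count();
}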
Also, while your code will work, you might run into performance issues because of the number of SQL queries generated.
Side note: personally, I recommend not using lazy loading. See https://stackoverflow.com/a/21379510/870604

Related

Understanding lazy loading optimization in C#

After reading a bit about how yield, foreach, LINQ deferred execution and iterators work in C#, I decided to give it a try and optimize an attribute-based validation mechanism inside a small project. The result:
private IEnumerable<string> GetPropertyErrors(PropertyInfo property)
{
    // where Entity is the current object instance
    string propertyValue = property.GetValue(Entity)?.ToString();
    foreach (var attribute in property.GetCustomAttributes().OfType<ValidationAttribute>())
    {
        if (!attribute.IsValid(propertyValue))
        {
            yield return $"Error: {property.Name} {attribute.ErrorMessage}";
        }
    }
}

// inside another method
foreach (string error in GetPropertyErrors(property))
{
    // Some display/insert log operation
}
I find this slow, but that could also be due to reflection or the large number of properties to process.
So my question is: is this optimal, or at least a good use of the lazy-loading mechanic? Or am I missing something and just wasting tons of resources?
NOTE: The code's intention itself is not important; my concern is the use of lazy loading in it.
Lazy loading is not something specific to C# or to Entity Framework. It's a common pattern which allows deferring some data loading. Deferring means not loading immediately. Some examples of when you need that:
Loading images in a (Word) document. A document may be big and contain thousands of images. If you load them all when the document is opened, it might take a long time, and nobody wants to sit and watch a loading screen for 30 seconds. The same approach is used in web browsers: resources are not sent with the body of the page; the browser defers loading them.
Loading graphs of objects. These may be objects from a database, file system objects, etc. Loading the full graph might be equivalent to loading the entire database content into memory. How long would that take? Is it efficient? No. If you are building a file system explorer, will you load info about every file in the system before you start using it? It's much faster to load info about the current directory only (and probably its direct children).
Lazy loading does not always mean deferring loading until you really need the data: loading might occur on a background thread before you actually need it. For example, you might never scroll to the bottom of a web page to see the footer image. Lazy loading means only deferring, and C# enumerators can help you with that. Consider getting the list of files in a directory:
string[] files = Directory.GetFiles("D:");
IEnumerable<string> filesEnumerator = Directory.EnumerateFiles("D:");
The first approach returns an array of files. That means the directory has to get all its files and save their names into an array before you can get even the first file name. It's like loading all the images before you see the document.
The second approach uses an enumerator: it returns files one by one, as you ask for the next file name. That means the enumerator is returned immediately, without getting all the files and saving them into some collection, and you can process the files one by one when you need to. Here, getting the file list is deferred.
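For illustration, a small sketch of why the deferred version can pay off (System.IO and System.Linq assumed):

// Deferred: enumeration stops as soon as a match is found;
// the directory does not have to be listed completely.
string firstLog = Directory.EnumerateFiles(@"D:\")
                           .FirstOrDefault(f => f.EndsWith(".log"));

// Not deferred: the full file list is built before the filter runs.
string firstLogEager = Directory.GetFiles(@"D:\")
                                .FirstOrDefault(f => f.EndsWith(".log"));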
But you should be careful. If the underlying operation is not deferred, then returning an enumerator gives you no benefit, e.g.:
public IEnumerable<string> EnumerateFiles(string path)
{
    foreach (string file in Directory.GetFiles(path))
        yield return file;
}
Here you use the GetFiles method, which fills an array of file names before returning them, so yielding the files one by one gives you no speed benefit.
By the way, in your case you have exactly the same problem: the GetCustomAttributes extension internally uses the Attribute.GetCustomAttributes method, which returns an array of attributes, so you will not reduce the time it takes to get the first result.
This isn't quite how the term "lazy loading" is generally used in .NET. "Lazy loading" is most often used for something like:
public SomeType SomeValue
{
    get
    {
        if (_backingField == null)
            _backingField = RelativelyLengthyCalculationOrRetrieval();
        return _backingField;
    }
}
As opposed to just having _backingField set when an instance is constructed. Its advantage is that it costs nothing in the cases where SomeValue is never accessed, at the expense of a slightly greater cost when it is. It's therefore advantageous when the chances of SomeValue never being called are relatively high, and generally disadvantageous otherwise, with some exceptions (when we might care about how quickly things are done between instance creation and the first call to SomeValue).
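For reference (not part of the pattern above), the framework's Lazy<T> type captures the same idea and is thread-safe by default; a minimal sketch:

private readonly Lazy<SomeType> _lazyValue;

public SomeContainingType()   // hypothetical constructor of the containing class
{
    _lazyValue = new Lazy<SomeType>(() => RelativelyLengthyCalculationOrRetrieval());
}

// Computed on first access, cached for every access after that.
public SomeType SomeValue => _lazyValue.Value;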
Here we have deferred execution. It's similar, but not quite the same. When you call GetPropertyErrors(property) rather than receiving a collection of all of the errors you receive an object that can find those errors when asked for them.
It will always save the time taken to get the first such item, because it allows you to act upon it immediately rather than waiting until it has finished processing.
It will always reduce memory use, because it isn't spending memory on a collection.
It will also save time in total, because no time is spent creating a collection.
However, if you need to access the results more than once, then while a collection will still hold the same items, the deferred enumerable will have to calculate them all again (unlike lazy loading, which loads its results once and stores them for subsequent reuse).
If you're rarely going to want to hit the same set of results, it's generally always a win.
If you're almost always going to want to hit the same set of results, it's generally a lose.
If you are sometimes going to want to hit the same set of results, though, you can pass the decision on whether to cache or not up to the caller: a single-use caller can call GetPropertyErrors() and act on the results directly, while a repeated-use caller can call ToList() on it and then act repeatedly on that list.
As such, the approach of not sending a list is the more flexible, allowing the calling code to decide which approach is the more efficient for its particular use of it.
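A sketch of the two call patterns (DisplayErrors and LogErrors are hypothetical helpers):

// Single use: act on each error as it is produced; no collection is built.
foreach (string error in GetPropertyErrors(property))
    Console.WriteLine(error);

// Repeated use: materialise once with ToList(), then reuse the list.
List<string> errors = GetPropertyErrors(property).ToList();
DisplayErrors(errors);   // hypothetical helper
LogErrors(errors);       // hypothetical helper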
You could also combine it with lazy loading:
private IEnumerable<string> _store;   // cache, populated once the results have been fully enumerated

private IEnumerable<string> LazyLoadedEnumerator()
{
    if (_store == null)
        return StoringCalculatingEnumerator();
    return _store;
}

private IEnumerable<string> StoringCalculatingEnumerator()
{
    List<string> store = new List<string>();
    foreach (string str in SomethingThatCalculatesTheseStrings())
    {
        yield return str;
        store.Add(str);
    }
    // Note: the cache is only set if the caller enumerates all the way to the end.
    _store = store;
}
This combination is rarely useful in practice though.
As a rule, start with deferred evaluation as the normal approach and decide further up the call chain whether to store the results or not. An exception is when you can know the size of the results before you begin (you can't here, because you don't know whether an element will be added until you've examined the property). In that case there is the possibility of a performance improvement in just how you create that list, because you can set its capacity ahead of time. This, though, is a micro-optimisation that only applies if you also know you'll always want to work on a list, and it doesn't save much in the grand scheme of things.
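A sketch of that capacity micro-optimisation, assuming the count is known up front (expectedCount and ComputeResult are hypothetical):

var results = new List<string>(expectedCount);   // pre-sized: no internal array resizes
for (int i = 0; i < expectedCount; i++)
    results.Add(ComputeResult(i));               // hypothetical per-item work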

Eager loading and navigation properties

var ret = (from f in context.foo
           join b in context.bar on f.barid equals b.barid
           select f).ToList();
my returned list contains all foos that have a barId, and it also contains all navigation properties. What I mean by that is,
context.foo.mark is populated even though I did not explicitly Include it, nor did I access it during the query. I have lazy loading turned on; why is this occurring?
To elaborate on my question: somehow my related entities are getting loaded by the above query. I am curious as to how that is occurring, since I have lazy loading enabled and I am not accessing any of the related objects.
Inspecting lazy loading is kind of a "catch-22" type problem. With lazy loading turned on, even a call to the property from the debugger will load the results, as long as your context is still hanging around. Furthermore, if your context is still open from other queries, EF will maintain the state of those objects automatically and include them.
The only real way I can think of to determine if it is being lazily loaded or not is to inspect the SQL code sent to your database.
First, add this line to your DbContext constructor:
this.Database.Log = s => System.Diagnostics.Debug.WriteLine(s); //remove in production
Then, run your code as normal (but don't stop in the debugger to inspect your object). Look at your debug console and inspect the SQL calls made. My bet is that the SQL will not include the related properties.
If you run the code again, and stop the debugger to inspect the object properties, you should see another SQL call in the debug console fetching the related entities.
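For contrast, if you actually do want the related data loaded up front, the usual approach in EF6 is an explicit Include; a sketch (the 'mark' navigation property name is taken from the question):

using System.Data.Entity;   // brings in the lambda-based Include overload (EF6)

// Eagerly loads the 'mark' navigation property together with each foo in one query.
var foos = context.foo
                  .Include(f => f.mark)
                  .ToList();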

Unused results from Entity Framework database call

Any decent compiler should eliminate dead code, at least to a certain extent. However, I am curious how a compiler (specifically MSBuild) handles a situation like the following:
// let's assume LazyLoadingEnabled = false;
var users = db.Users.ToList();
// more code that never touches 'users'
Since LazyLoadingEnabled = false, will the compiled code:
1. eagerly load the results from the database call,
2. make the call to the database without storing the results, or
3. never make the call to begin with?
I was cleaning up some old code at work and I found several cases of this occurring, so I'm curious as to whether we've been wasting resources or not.
It feels like the right answer is number 3, but I haven't found any solid evidence to back up my claims. Thank you for your help!
The answer is #1.
Not only will this execute the database query to select all the records from the Users table, it will also fetch all those records and construct an entity for each of them. Very expensive if you have many records. Of course, the GC will eventually collect the wasted objects.
If you want to prove the above for yourself, just add the following line after you create your DbContext to log the SQL being executed:
db.Database.Log = s => Console.WriteLine(s);
BTW, the LazyLoadingEnabled setting has no effect on the observed behavior. LazyLoadingEnabled determines whether navigation properties are loaded lazily on first access or not. In this case, db.Users is not a navigation property, so the setting has no effect.
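A sketch of why option 3 cannot happen, and what to do instead if the data is not actually needed (same db context as above assumed):

// ToList() is an ordinary method call with side effects: it enumerates the query
// immediately, so the compiler cannot remove it as dead code. Every row is fetched
// and materialised even though 'users' is never used afterwards.
var users = db.Users.ToList();

// If only a cheap check is needed, ask the database for exactly that instead:
bool anyUsers = db.Users.Any();   // translated to an EXISTS-style query, no entities materialised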

Filtering IQueryable<T> returns wrong result

I have just run into quite a surprising problem.
The case is simple: return all entities that are currently active, which means: filter all results returned by the GetAll() method according to their Boolean Active property.
public IQueryable<T> GetAllActive()
{
    return implementation.GetAll().Where(a => ((IDeactivable)a).Active);
}
where GetAll() method of implementation object is defined as:
public IQueryable<T> GetAll();
The problem is that GetAllActive() returns all the records, regardless of the value of their Active property, just as if there were no Where clause at all.
What could be the reason for it?
Note: The code is simplified; the T type is checked to implement the IDeactivable interface. Also, no exception is thrown at runtime.
Edit: The IQueryable returned by the implementation object comes from NHibernate.
Edit 2: I have used the following code to check the actual values of the entities (besides using the VS debugger):
foreach (var a in active)   // active -> the filtered IQueryable before return
{
    _logger.Warn(a.Id);
    _logger.Warn(((IDeactivable)a).Active);
}
the result was:
11/30/2011 18:10:00 WARN xxx.Repository`1.GetAllActive: 70db43fa-2361-4c1f-a8e5-9fab012b5a2b
11/30/2011 18:10:01 WARN xxx.Repository`1.GetAllActive: False
11/30/2011 18:10:02 WARN xxx.Repository`1.GetAllActive: 5493c9bb-ec6e-4690-b5d6-9fab012b5b16
11/30/2011 18:10:02 WARN xxx.Repository`1.GetAllActive: True
When you return an IQueryable<T>, you are not actually returning a result set. What you are returning is an object that can be queried.
Execution of the .Where() method is deferred until you (or someone calling your method) actually compel execution of the LINQ chain. This is what makes it possible for downstream clients to apply their own additional LINQ methods to the result and still get lazy evaluation for the entire chain.
So when you say that the IQueryable<T> is returning all records, you're probably looking at the result in the debugger, and it's showing you the original data set without the filtering (since the .Where() hasn't executed yet).
The reason casting to IEnumerable appears to work is that enumerating it triggers execution of the LINQ command chain, and the result is a bona fide list rather than an object that can be queried. Calling ToList() or ToArray() will also trigger execution.
In short, the only way you can be sure you're seeing the correct result from your Linq methods during your testing process is to force execution of the Linq chain:
foreach (var record in GetAllActive().ToList())
{
    // Display each record
}
For a little flavor of how this works, see Working with Deferred Execution. It contains an example showing how you can actually get into trouble returning an IQueryable from a using block, because the underlying context gets disposed before the query executes.
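A sketch of that pitfall (ShopContext and Customer are hypothetical names):

public IQueryable<Customer> GetCustomersBroken()
{
    using (var context = new ShopContext())
    {
        // Nothing has executed yet; only a query definition is returned.
        return context.Customers.Where(c => c.IsActive);
    }   // the context is disposed here, before the caller ever enumerates
}

public List<Customer> GetCustomersSafe()
{
    using (var context = new ShopContext())
    {
        // ToList() forces execution while the context is still alive.
        return context.Customers.Where(c => c.IsActive).ToList();
    }
}

// GetCustomersBroken().ToList() fails at enumeration time (disposed context);
// GetCustomersSafe() returns fully materialised results.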
I have tried several different approaches, and finally I found a part of my code that had not been tested yet. It turned out that LINQ queries to NHibernate caused some issues with the Where clause that I had not noticed before.
Eventually I figured out that I was using the wrong version of the LINQ to NHibernate query provider (not the one included in NH 3.0), which is a known issue. Now that I have gotten rid of it, everything works fine. Thank you for your help, guys! You pointed me in the right direction.
The mentioned issue is described in the following thread:
Problem with linq query

Why use AsQueryable() instead of List()?

I'm getting into using the Repository pattern for data access, with Entity Framework and LINQ as the underpinning of the non-test repository implementation. Most samples I see return AsQueryable() when the call returns N records, instead of List<T>. What is the advantage of doing this?
AsQueryable just creates a query: the instructions needed to get a list. You can make further changes to the query later, such as adding new Where clauses, and they get sent all the way down to the database level.
ToList returns an actual list with all the items in memory. If you add a new Where clause to it, you don't get the fast filtering the database provides; instead you get all the information in the list and then filter out what you don't need in the application.
So basically it comes down to waiting until the last possible moment before committing yourself.
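A sketch of the difference, assuming an EF context with a Users set and a hypothetical IsAdmin column:

// Query composition: the Where is translated to SQL; only matching rows come back.
IQueryable<User> query = context.Users;                  // nothing executed yet
var admins = query.Where(u => u.IsAdmin).ToList();       // SELECT ... WHERE IsAdmin = 1

// List: every row is pulled into memory first, then filtered in the application.
List<User> all = context.Users.ToList();                 // SELECT every user
var adminsInMemory = all.Where(u => u.IsAdmin).ToList();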
Returning IQueryable<T> has the advantage that execution is deferred until you actually start enumerating the result, and you can compose the query with other queries and still get server-side execution.
The problem is that you cannot control the lifetime of the database context in this method: you need an open context and must ensure that it stays open until the query gets executed, and then you must ensure that the context is disposed. If you return the result as a List<T>, T[], or something similar, you lose deferred execution and server-side execution of composed queries, but you gain control over the lifetime of the database context.
What fits best depends, of course, on the actual requirements. It's another question without a single truth.
AsQueryable is an extension method for IEnumerable<T> that can do two things:
If the IEnumerable<T> already implements IQueryable<T>, it just casts and does nothing.
Otherwise it creates a "fake" IQueryable<T> (EnumerableQuery<T>) that implements every method by compiling the lambda expressions and calling the Enumerable extension methods.
So in most cases using AsQueryable is useless, unless you are forced to pass an IQueryable to a method and you only have an IEnumerable; it's a hack.
NOTE: AsQueryable is a hack, IQueryable of course is not!
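A small sketch of the second case: wrapping an in-memory sequence so it can be passed to an API that wants an IQueryable:

using System.Collections.Generic;
using System.Linq;

List<int> numbers = new List<int> { 1, 2, 3, 4, 5 };

// EnumerableQuery<int> wrapper: looks like an IQueryable, but every operator
// is executed in memory via LINQ to Objects.
IQueryable<int> queryable = numbers.AsQueryable();

var evens = queryable.Where(n => n % 2 == 0).ToList();   // { 2, 4 }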
Returning IQueryable<T> will defer execution of the query until its results are actually used. Until then, you can also perform additional database query operations on the IQueryable<T>; on a List you're restricted to generally less-efficient in-memory operations.
IQueryable: execution is deferred (lazy), so the query is only evaluated and sent to the database when it is enumerated. If you add an additional clause, the database is hit with a query that already includes the filters.
IEnumerable: the records are loaded into memory first, and any further operations are then performed in memory.
So, depending on the use case: when paging over a huge number of records, use IQueryable<T>; for a short operation that doesn't require huge amounts of memory, IEnumerable is fine.
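A sketch of the paging case (an EF context with an Orders set is assumed; pageIndex and pageSize are hypothetical parameters):

int pageIndex = 3, pageSize = 50;

// Skip/Take are translated to SQL, so only the requested page is fetched.
var page = context.Orders
                  .OrderBy(o => o.Id)
                  .Skip(pageIndex * pageSize)
                  .Take(pageSize)
                  .ToList();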
