Performance with Entity Framework - c#

I have another question about performance with EF.
There's one method to get an object from context:
tDocumentTyp DocumentTypObject = Context.tDocumentTyps.Where(s => s.DocumentTypID == iTypID).FirstOrDefault();
This method takes ~2979 ms.
Then I wrote a method that gets the DbSet via reflection and is called this way:
tDocumentTyp DocumentTypObject = Context.GetEntries<tDocumentTyp>().Where(s => s.DocumentTypID == iTypID).FirstOrDefault();
My method needs ~222 ms to execute.
So my question now is, why is my method much faster than the original one? Or is there anything wrong with my method?
To make this a bit easier, here is my method for getting DBSet via reflection:
public static IEnumerable<T> GetEntries<T>(this AppContext DataContext,
string PropertyName = null, IEnumerable<string> includes = null) where T : IEntity
{
Type ContextType = typeof(AppContext);
PropertyInfo Entity = null;
if (null == PropertyName)
Entity = ContextType.GetProperty(typeof(T).Name)
?? ContextType.GetProperty(typeof(T).Name + "s");
else
Entity = ContextType.GetProperty(PropertyName);
if (null == Entity)
throw new Exception("Could not find the property. If the property is not equal to the tablesname, you have to parametrize it.");
DbQuery<T> set = ((DbSet<T>)Entity.GetValue(DataContext, null));
if (includes != null)
includes.ForEach(f => set = set.Include(f));
return set;
}

The second example is getting the entire table and applying the Where in memory. You are applying the extension method System.Linq.Enumerable.Where, which operates on IEnumerable<T>. Note that this is an in-memory implementation. In the first example you are using the extension method System.Linq.Queryable.Where, which operates on IQueryable<T>. This is a different method, even though the two share the same name.
If you inspect closely you will also find that in the first example the method parameter is of type Expression<Func<T, bool>> whilst in the second example it is simply Func<T, bool>. This is a very important difference: the expression can be processed to produce a SQL-query.
So why is the second one faster? Well, that is hard to answer without more information about your data source. But as others have noted in the comments, if the database is not indexed then it may well be quicker to select the entire table and execute the filter in memory than to have the SQL server apply the filtering.
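The overload selection described above depends only on the static type of the variable, which can be checked without a database. The following is a minimal sketch that assumes an in-memory array wrapped with AsQueryable() as a stand-in for the DbSet; the class name WhereOverloadDemo is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WhereOverloadDemo
{
    public static (bool viaEnumerable, bool viaQueryable) Run()
    {
        // Stand-in for a DbSet<T>: an in-memory IQueryable<int> (assumption for this sketch).
        IQueryable<int> asQueryable = new[] { 1, 2, 3 }.AsQueryable();

        // The same object, but its static type is now IEnumerable<int>,
        // just like the return value of GetEntries<T>() in the question.
        IEnumerable<int> asEnumerable = asQueryable;

        // Binds to Enumerable.Where: the lambda is a delegate and the filter runs in memory.
        var filteredInMemory = asEnumerable.Where(n => n > 1);

        // Binds to Queryable.Where: the lambda is an expression tree handed to the provider.
        var filteredByProvider = asQueryable.Where(n => n > 1);

        return (filteredInMemory is IQueryable<int>, filteredByProvider is IQueryable<int>);
    }
}
```

If this is what happens in the question, declaring GetEntries<T>() to return IQueryable<T> instead of IEnumerable<T> should let the Where be translated by the provider again.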

Related

How to get an overloaded == operator to work with LINQ and EF Core?

so basically, I have a project which uses EF Core. In order to shorten my lambdas when comparing if two objects (class Protocol) are equal, I've overridden my Equals method and overloaded the == and != operators. However, LINQ doesn't seem to care about it, and still uses reference for determining equality. Thanks
As I've said before, I've overridden the Equals method and overloaded the == and != operators. With no luck. I've also tried implementing the IEquatable interface. Also no luck.
I am using:
EF Core 2.2.4
//the protocol class
[Key]
public int ProtocolId {get;set;}
public string Repnr {get;set;}
public string Service {get;set;}
public override bool Equals(object obj)
{
if (obj is Protocol other)
{
return this.Equals(other);
}
return false;
}
public override int GetHashCode()
{
return $"{Repnr}-{Service}".GetHashCode();
}
public bool Equals(Protocol other)
{
return this?.Repnr == other?.Repnr && this?.Service == other?.Service;
}
public static bool operator ==(Protocol lhs, Protocol rhs)
{
return lhs.Equals(rhs);
}
public static bool operator !=(Protocol lhs, Protocol rhs)
{
return !lhs.Equals(rhs);
}
//the problem
using (var db = new DbContext())
{
var item1 = new Protocol() { Repnr = "1666", Service = "180" };
db.Protocols.Add(item1 );
db.SaveChanges();
var item2 = new Protocol() { Repnr = "1666", Service = "180" };
var result1 = db.Protocols.FirstOrDefault(a => a == item2);
var result2 = db.Protocols.FirstOrDefault(a => a.Equals(item2));
//both result1 and result2 are null
}
I would expect both result1 and result2 to be item1. However, they're both null. I know I could just do a.Repnr == b.Repnr && a.Service == b.Service, but that just isn't as clean. Thanks
To understand why the incorrect equality comparer is used, you have to be aware of the difference between IEnumerable<...> and IQueryable<...>.
IEnumerable
An object that implements IEnumerable<...>, is an object that represents a sequence of similar objects. It holds everything to fetch the first item of the sequence, and once you've got an item of the sequence you can get the next item, as long as there is a next item.
You start enumerating either explicitly, by calling GetEnumerator() and repeatedly calling MoveNext(), or, more commonly, implicitly, by using foreach or LINQ terminating statements like ToList(), ToDictionary(), FirstOrDefault(), Count() or Any(). This group of LINQ methods internally uses either foreach, or GetEnumerator() and MoveNext() / Current.
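The two ways of enumerating can be put side by side. This is a small sketch with made-up names (EnumerationDemo, SumWithForeach, SumWithEnumerator); both methods walk the sequence exactly once and compute the same result.

```csharp
using System.Collections.Generic;

public static class EnumerationDemo
{
    // Implicit enumeration: foreach calls GetEnumerator / MoveNext / Current for you.
    public static int SumWithForeach(IEnumerable<int> numbers)
    {
        int sum = 0;
        foreach (int n in numbers)
        {
            sum += n;
        }
        return sum;
    }

    // Explicit enumeration: the same loop written out by hand.
    public static int SumWithEnumerator(IEnumerable<int> numbers)
    {
        int sum = 0;
        using (IEnumerator<int> enumerator = numbers.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                sum += enumerator.Current;
            }
        }
        return sum;
    }
}
```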
IQueryable
An object that implements IQueryable<...> also represents an enumerable sequence. The difference however, is that this sequence usually is not held by your process, but by a different process, like a database management system.
The IQueryable does not (necessarily) hold everything to enumerate. Instead it holds an Expression and a Provider. The Expression is a generic description about what must be queried. The Provider knows which process will execute the query (usually a database management system) and how to communicate with this process (usually something SQL-like).
An IQueryable<..> also implements IEnumerable<..>, so you can start enumerating the sequence as if it was a standard IEnumerable. Once you start enumerating an IQueryable<...> by calling (internally) GetEnumerator(), the Expression is sent to the Provider, who translates the Expression into SQL and executes the query. The result is presented as an enumerator, which can be enumerated using MoveNext() / Current.
This means, that if you want to enumerate an IQueryable<...>, the Expression must be translated into a language that the Provider supports. As the compiler does not really know who will execute the query, the compiler can't complain if your Expression holds methods or classes that your Provider doesn't know how to translate to SQL. In such cases you'll get a run-time error.
It is easy to see, that SQL does not know your own defined Equals method. In fact, there are even several standard LINQ functions that are not supported. See Supported and Unsupported LINQ Methods (LINQ to Entities).
So what should I do if I want to use an unsupported function?
One of the things that you could do is move the data to your local process, and then call the unsupported function.
This can be done using ToList, but if you will only use one or a few of the fetched items, this would be a waste of processing power.
One of the slower parts of a database query is the transport of the selected data to your local process. Hence it is wise to limit the data to the data that you actually plan to use.
A smarter solution would be to use AsEnumerable. This will fetch the selected data "per page". It will fetch the first page, and once you've enumerated through the fetched page (using MoveNext), it will fetch the next page.
So if you only use a few of the fetched items, you will have fetched some items that are not used, but at least you won't have fetched all of them.
Example
Suppose you have a local function that takes a Student as input and returns a Boolean
bool HasSpecialAbility(Student student);
Requirement: give me three Students that live in New York City that have the special Ability.
Alas, HasSpecialAbility is a local function, so it can't be translated into SQL. You'll have to get the Students to your local process before calling it.
var result = dbContext.Students
// limit the transported data as much as you can:
.Where(student => student.CityCode == "NYC")
// transport to local process per page:
.AsEnumerable()
// now you can call HasSpecialAbility:
.Where(student => HasSpecialAbility(student))
.Take(3)
.ToList();
Ok, you might have fetched a page of 100 Students while you only needed 3, but at least you haven't fetched all 25000 students.
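For the Protocol comparison in the question there may be no need to fetch anything locally. One option is to keep the call site short by moving the property comparison into a helper that returns an expression. The sketch below is an assumption-laden illustration: SameAs is a hypothetical helper that is not part of EF Core, and an in-memory array wrapped with AsQueryable() stands in for db.Protocols.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public class Protocol
{
    public int ProtocolId { get; set; }
    public string Repnr { get; set; }
    public string Service { get; set; }

    // Hypothetical helper: builds a predicate that uses only property comparisons,
    // which is the kind of expression a query provider can usually translate.
    public static Expression<Func<Protocol, bool>> SameAs(Protocol other)
    {
        // Copy the values to locals so the expression captures plain strings, not the entity.
        string repnr = other.Repnr;
        string service = other.Service;
        return p => p.Repnr == repnr && p.Service == service;
    }
}

public static class ProtocolDemo
{
    public static Protocol Find()
    {
        // In-memory stand-in for db.Protocols (assumption for this sketch).
        IQueryable<Protocol> protocols = new[]
        {
            new Protocol { ProtocolId = 1, Repnr = "1666", Service = "180" },
            new Protocol { ProtocolId = 2, Repnr = "1700", Service = "200" },
        }.AsQueryable();

        var item2 = new Protocol { Repnr = "1666", Service = "180" };

        // Reads almost as compactly as a => a == item2.
        return protocols.FirstOrDefault(Protocol.SameAs(item2));
    }
}
```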

Can I clone an IQueryable to run on a DbSet for another DbContext?

Suppose I have built up, through some conditional logic over many steps, an IQueryable<T> instance we'll call query.
I want to get a count of total records and a page of data, so I want to call query.CountAsync() and query.Skip(0).Take(10).ToListAsync(). I cannot call these in succession, because a race condition occurs where they both try to run a query on the same DbContext at the same time. This is not allowed:
"A second operation started on this context before a previous asynchronous operation completed. Use 'await' to ensure that any asynchronous operations have completed before calling another method on this context. Any instance members are not guaranteed to be thread safe."
I do not want to 'await' the first before even starting the second. I want to fire off both queries as soon as possible. The only way to do this is to run them from separate DbContexts. It seems ridiculous that I might have to build the entire query (or 2, or 3) side-by-side starting with different instances of DbSet. Is there any way to clone or alter an IQueryable<T> (not necessarily that interface, but its underlying implementation) such that I can have one copy that runs on DbContext "A", and another that will run on DbContext "B", so that both queries can be executing simultaneously? I'm just trying to avoid recomposing the query X times from scratch just to run it on X contexts.
There is no standard way of doing that. The problem is that EF6 query expression trees contain constant nodes holding ObjectQuery instances which are bound to the DbContext (actually the underlying ObjectContext) used when creating the query. Also there is a runtime check before executing the query if there are such expressions bound to a different context than the one executing the query.
The only idea that comes to my mind is to process the query expression tree with ExpressionVisitor and replace these ObjectQuery instances with new ones bound to the new context.
Here is a possible implementation of the aforementioned idea:
using System.Data.Entity.Core.Objects;
using System.Data.Entity.Infrastructure;
using System.Linq;
using System.Linq.Expressions;
namespace System.Data.Entity
{
public static class DbQueryExtensions
{
public static IQueryable<T> BindTo<T>(this IQueryable<T> source, DbContext target)
{
var binder = new DbContextBinder(target);
var expression = binder.Visit(source.Expression);
var provider = binder.TargetProvider;
return provider != null ? provider.CreateQuery<T>(expression) : source;
}
class DbContextBinder : ExpressionVisitor
{
ObjectContext targetObjectContext;
public IQueryProvider TargetProvider { get; private set; }
public DbContextBinder(DbContext target)
{
targetObjectContext = ((IObjectContextAdapter)target).ObjectContext;
}
protected override Expression VisitConstant(ConstantExpression node)
{
if (node.Value is ObjectQuery objectQuery && objectQuery.Context != targetObjectContext)
return Expression.Constant(CreateObjectQuery((dynamic)objectQuery));
return base.VisitConstant(node);
}
ObjectQuery<T> CreateObjectQuery<T>(ObjectQuery<T> source)
{
var parameters = source.Parameters
.Select(p => new ObjectParameter(p.Name, p.ParameterType) { Value = p.Value })
.ToArray();
var query = targetObjectContext.CreateQuery<T>(source.CommandText, parameters);
query.MergeOption = source.MergeOption;
query.Streaming = source.Streaming;
query.EnablePlanCaching = source.EnablePlanCaching;
if (TargetProvider == null)
TargetProvider = ((IQueryable)query).Provider;
return query;
}
}
}
}
One difference from standard EF6 LINQ queries is that this produces ObjectQuery<T> rather than DbQuery<T>. Apart from ToString() not returning the generated SQL, I haven't noticed any difference in further query building or execution. It seems to work, but use it with care and at your own risk.
You could write a function to build up your query, taking DbContext as a parameter.
public IQueryable<T> MyQuery(DbContext<T> db)
{
return db.Table
.Where(p => p.reallycomplex)
....
...
.OrderBy(p => p.manythings);
}
I've done this many times and it works well.
Now it's easy to make queries with two different contexts:
IQueryable<T> q1 = MyQuery(dbContext1);
IQueryable<T> q2 = MyQuery(dbContext2);
If your concern was the execution time taken to build the IQueryable objects, then my only suggestion is don't worry about it.
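The same idea can be tried without a database. In this sketch the query is described once as a function over IQueryable<int> and then applied to two independent sources; two in-memory arrays wrapped with AsQueryable() stand in for the sets of two contexts, and the names QueryShapeDemo and EvenDescending are invented for the example.

```csharp
using System;
using System.Linq;

public static class QueryShapeDemo
{
    // Describes the query once, independent of where the data comes from.
    public static IQueryable<int> EvenDescending(IQueryable<int> source)
    {
        return source
            .Where(n => n % 2 == 0)
            .OrderByDescending(n => n);
    }

    public static int[][] Run()
    {
        // Stand-ins for dbContext1.Table and dbContext2.Table (assumption for this sketch).
        IQueryable<int> sourceA = new[] { 1, 2, 3, 4 }.AsQueryable();
        IQueryable<int> sourceB = new[] { 10, 11, 12 }.AsQueryable();

        // Each call builds a separate query bound to its own source and provider.
        IQueryable<int> q1 = EvenDescending(sourceA);
        IQueryable<int> q2 = EvenDescending(sourceB);

        return new[] { q1.ToArray(), q2.ToArray() };
    }
}
```

Because each call produces an independent query object, the two results could be executed on separate contexts without sharing any state.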
So you have an IQueryable<T> that will be performed on DbContext A as soon as the query is executed and you want the same query to run on DbContext B when the query is executed.
For this you'll have to understand the difference between an IEnumerable<T> and an IQueryable<T>.
An IEnumerable<T> holds all code to enumerate over the elements that the enumerable represents. The enumeration starts when GetEnumerator and MoveNext are called. This can be done explicitly. However it is usually done implicitly by functions like foreach, ToList, FirstOrDefault, etc.
An IQueryable does not hold the code to enumerate, it holds an Expression and a Provider. The Provider knows who will execute the query, and it knows how to translate the Expression into the language that is understood by the query executioner.
Due to this separation, it is possible to let the same Expression be executed by different data sources. They don't even have to be of the same type: one data source can be a database management system that understands SQL, the other one could be a comma separated file.
As long as you concatenate Linq statements that return an IQueryable, the query is not executed, only the Expression is changed.
As soon as enumeration starts, either by calling GetEnumerator / MoveNext, or by using foreach or one of the LINQ functions that do not return an IQueryable, the Provider will translate the Expression into the language that the data source understands and communicate with the data source to execute the query. The result of the query is an IEnumerable, which can be enumerated as if all data was in local code.
Some Providers are smart and use some buffering, so that not all data is transferred to local memory, but only part of the data. New data is queried when needed. So if you do a foreach in a database with a zillion elements, only the first few (thousands) elements are queried. More data is queried if your foreach runs out of fetched data.
So you already have one IQueryable<T>, therefore you have an Expression, a Provider and an ElementType. You want the same Expression / ElementType to be executed by a different Provider. You even want to change the Expression slightly before you execute it.
Therefore you need to be able to create an object that implements IQueryable<T>, and you want to be able to set its Expression, ElementType and Provider:
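class MyQueryable<T> : IQueryable<T>
{
public Type ElementType {get; set;}
public Expression Expression {get; set;}
public IQueryProvider Provider {get; set;}
// IQueryable<T> also requires GetEnumerator; here enumeration is delegated to a query created by the Provider:
public IEnumerator<T> GetEnumerator() { return Provider.CreateQuery<T>(Expression).GetEnumerator(); }
IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}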
IQueryable<T> queryOnDbContextA = dbContextA ...
IQueryable<T> setInDbContextB = dbContextB.Set<T>();
IQueryable<T> queryOnDbContextB = new MyQueryable<T>()
{
ElementType = queryOnDbContextA.ElementType,
Expression = queryOnDbContextA.Expression,
Provider = setInDbContextB.Provider,
};
If desired you can adjust the query on the other context before executing it:
var getPageOnContextB = queryOnDbContextB
.Skip(...)
.Take(...);
Both queries are still not executed yet. Execute them:
var countA = await queryOnDbContextA.CountAsync();
var fetchedPageContextB = await getPageOnContextB.ToListAsync();

How to combine Find() and AsNoTracking()?

How to combine Find() with AsNoTracking() when making queries to an EF context to prevent the returned object from being tracked. This is what I can't do
_context.Set<Entity>().AsNoTracking().Find(id);
How can I do that? I am using EF version 6.
Note: I do not want to use SingleOrDefault(), or Where. I just can't because the parameter Id is generic and it's a struct and I can not apply operator == for generics in that case.
So instead of using AsNoTracking() what you can do is Find() and then detach it from the context. I believe that this gives you the same result as AsNoTracking() besides the additional overhead of getting the entity tracked. See EntityState for more information.
var entity = Context.Set<T>().Find(id);
Context.Entry(entity).State = EntityState.Detached;
return entity;
Edit: This has some potential issues. If the context hasn't loaded some relationships, those navigation properties will not work, and you will be confused and frustrated about why everything is returning null! See https://stackoverflow.com/a/10343174/2558743 for more info. For now, I'm overriding the FindNoTracking() method in the repositories where I need it.
<context>.<Entity>.AsNoTracking().Where(s => s.Id == id);
Find() does not make sense with AsNoTracking() because Find is supposed to be able to return tracked entities without going to the database. Your only option with AsNoTracking is Where, First or Single.
The accepted answer has the issue that if the item you are trying to find is already being tracked, it will return that item then mark it as untracked (which may mess up other parts of the code).
Akos is on the right track with his suggestion to build the expression yourself, but the example only works for entities that have a single primary key (which covers most cases).
This extension method works in EF Core and effectively matches the signature for the DbSet<T>.Find(object []). But it is an extension method for DbContext instead of DbSet because it needs access to the Entity's metadata from the DbContext.
public static T FindNoTracking<T>(this DbContext source, params object[] keyValues)
where T : class
{
DbSet<T> set = source.Set<T>();
if (keyValues == null || !keyValues.Any())
{
throw new Exception("No Keys Provided.");
}
PropertyInfo[] keyProps = GetKeyProperties<T>(source);
if (keyProps.Count() != keyValues.Count())
{
throw new Exception("Incorrect Number of Keys Provided.");
}
ParameterExpression prm = Expression.Parameter(typeof(T));
Expression body = null;
for (int i = 0; i < keyProps.Count(); i++)
{
PropertyInfo pi = keyProps[i];
object value = keyValues[i];
Expression propertyEx = Expression.Property(prm, pi);
Expression valueEx = Expression.Constant(value);
Expression condition = Expression.Equal(propertyEx, valueEx);
body = body == null ? condition : Expression.AndAlso(body, condition);
}
var filter = Expression.Lambda<Func<T, bool>>(body, prm);
return set.AsNoTracking().SingleOrDefault(filter);
}
public static PropertyInfo[] GetKeyProperties<T>(this DbContext source)
{
return source.Model.FindEntityType(typeof(T)).FindPrimaryKey().Properties.Select(p => p.PropertyInfo).ToArray();
}
You can then use the method directly on the DbContext. For example, if your entity has a composite key consisting of two strings:
context.FindNoTracking<MyEntity>("Key Value 1", "Key Value 2");
If you really want the Extension method to be on DbSet instead of the DbContext, you can do so but you'll need to get the context from the set in order to gain access to the metadata about the entity. Currently there isn't a good way to do this. There are some hacky ways to do this, but they involve using reflection to access private fields of framework classes, so I'd advise against it.
Alternatively...
If you have a way to figure out what the Key properties are without using the DbContext/Metadata, you can make it an extension for DbSet instead. For example, if all of your Key properties are marked with the [Key] attribute, you can use this code:
public static T FindNoTracking<T>(this DbSet<T> source, params object[] keyValues)
where T : class
{
//Pretty much the same...
}
public static PropertyInfo[] GetKeyProperties<T>()
{
return typeof(T).GetProperties()
.Where(pi => pi.GetCustomAttribute<KeyAttribute>() != null).ToArray();
}
This would also work in both Entity Framework and EF Core.
Back in 2015, an official request was made to include the functionality, i.e. combine Find() and AsNoTracking(). The issue was immediately closed after giving this argument:
AsNoTracking doesn't really make sense for Find since one of the key features of find is that it will return the already tracked version of the entity without hitting the database if it is already in memory. If you want to load an entity by key without tracking it then use Single.
Hence, you could replace:
_context.Set<Entity>().AsNoTracking().Find(id); // Invalid
with something like this:
_context.Set<Entity>().AsNoTracking().Single(e => e.Id == id);
Well, I guess if you really want to do this, you can try creating your expression yourself. I assume you have a base entity class that's generic and that's where the generic key property comes from. I named that class KeyedEntityBase<TKey>, TKey is the type of the key (if you don't have such a class, that's fine, the only thing that I used that for is the generic constraint). Then you can create an extension method like this to build the expression yourself:
public static class Extensions
{
public static IQueryable<TEntity> WhereIdEquals<TEntity, TKey>(
this IQueryable<TEntity> source,
Expression<Func<TEntity, TKey>> keyExpression,
TKey otherKeyValue)
where TEntity : KeyedEntityBase<TKey>
{
var memberExpression = (MemberExpression)keyExpression.Body;
var parameter = Expression.Parameter(typeof(TEntity), "x");
var property = Expression.Property(parameter, memberExpression.Member.Name);
var equal = Expression.Equal(property, Expression.Constant(otherKeyValue));
var lambda = Expression.Lambda<Func<TEntity, bool>>(equal, parameter);
return source.Where(lambda);
}
}
And then, you can use it like this (for an integer key type):
context.Set<MyEntity>().AsNoTracking().WhereIdEquals(m => m.Id, 9).ToList();

Why does this linq extension method hit the database twice?

I have an extension method called ToListIfNotNullOrEmpty(), which is hitting the DB twice, instead of once. The first time it returns one result, the second time it returns all the correct results.
I'm pretty sure the first time it hits the database, is when the .Any() method is getting called.
here's the code.
public static IList<T> ToListIfNotNullOrEmpty<T>(this IEnumerable<T> value)
{
if (value.IsNullOrEmpty())
{
return null;
}
if (value is IList<T>)
{
return (value as IList<T>);
}
return new List<T>(value);
}
public static bool IsNullOrEmpty<T>(this IEnumerable<T> value)
{
if (value != null)
{
return !value.Any();
}
return true;
}
I'm hoping to refactor it so that, before the .Any() method is called, it actually enumerates through the entire list.
If I do the following, only one DB call is made, because the list is already enumerated.
var pewPew = (from x in whatever
select x)
.ToList() // This enumerates.
.ToListIfNotNullOrEmpty(); // This checks the enumerated result.
I sorta don't really want to call ToList() then my extension method.
Any ideas, folks?
I confess that I see little point in this method. Surely if you simply do a ToList(), a check to see if the list is empty suffices as well. It's arguably harder to handle the null result when you expect a list because then you always have to check for null before you iterate over it.
I think that:
var query = (from ...).ToList();
if (query.Count == 0) {
...
}
works as well and is less burdensome than
var query = (from ...).ToListIfNotNullOrEmpty();
if (query == null) {
...
}
and you don't have to implement (and maintain) any code.
How about something like this?
public static IList<T> ToListIfNotNullOrEmpty<T>(this IEnumerable<T> value)
{
if (value == null)
return null;
var list = value.ToList();
return (list.Count > 0) ? list : null;
}
To actually answer your question:
This method hits the database twice because the extension methods provided by the System.Linq.Enumerable class exhibit what is called deferred execution. Essentially, this is to eliminate the need for constructing a string of temporarily cached collections for every part of a query. To understand this, consider the following example:
var firstMaleTom = people
.Where(p => p.Gender == Gender.Male)
.Where(p => p.FirstName == "Tom")
.FirstOrDefault();
Without deferred execution, the above code might require that the entire collection people be enumerated over, populating a temporary buffer array with all the individuals whose Gender is Male. Then it would need to be enumerated over again, populating another buffer array with all of the individuals from the first buffer whose first name is Tom. After all that work, the last part would return the first item from the resulting array.
That's a lot of pointless work. The idea with deferred execution is that the above code really just sets up the firstMaleTom variable with the information it needs to return what's being requested with the minimal amount of work.
Now, there's a flip side to this: in the case of querying a database, deferred execution means that the database gets queried when the return value is evaluated. So, in your IsNullOrEmpty method, when you call Any, the value parameter is actually being evaluated right then and there -- hence a database query. After this, in your ToListIfNotNullOrEmpty method, the line return new List<T>(value) also evaluates the value parameter -- because it's enumerating over the values and adding them to the newly created List<T>.
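The double evaluation can be reproduced without a database. In this sketch an iterator method stands in for the database query and counts how often enumeration starts; the names DeferredDemo, Source and Starts are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Counts how many times the sequence has started to be enumerated.
    public static int Starts;

    // Stand-in for a database query: the body only runs when someone enumerates.
    static IEnumerable<int> Source()
    {
        Starts++;
        yield return 1;
        yield return 2;
    }

    public static int CountEnumerations()
    {
        Starts = 0;
        IEnumerable<int> value = Source();   // nothing has run yet
        bool isEmpty = !value.Any();         // first enumeration (the Any() call)
        var list = new List<int>(value);     // second enumeration (copying into the list)
        return Starts;
    }
}
```

Calling ToList() once and checking Count on the result, as suggested in the other answers, enumerates the source a single time.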
You could stick the .ToList() call inside the extension, the effect is slightly different, but does this still work in the cases you have?
public static IList<T> ToListIfNotNullOrEmpty<T>(this IEnumerable<T> value)
{
if(value == null)
{
return null;
}
var result = value.ToList();
return result.IsNullOrEmpty() ? null : result;
}

How can I extend DynamicQuery.cs to implement a .Single method?

I need to write some dynamic queries for a project I'm working on. I'm finding out that a significant amount of time is being spent by my program on the Count and First methods, so I started to change to .Single, only to find out that there is no such method.
The code below was my first attempt at creating one (mostly copied from the Where method), but it's not working. Help?
public static object Single(this IQueryable source, string predicate, params object[] values)
{
if (source == null) throw new ArgumentNullException("source");
if (predicate == null) throw new ArgumentNullException("predicate");
LambdaExpression lambda = DynamicExpression.ParseLambda(source.ElementType, typeof(bool), predicate, values);
return source.Provider.CreateQuery(
Expression.Call(
typeof(Queryable), "Single",
new Type[] { source.ElementType },
source.Expression, Expression.Quote(lambda)));
}
IMHO, you can simply use Single or SingleOrDefault when you are executing the query.
// build your dynamic query
var query = NorthwindConext.Products.Categories
.Where("CategoryID = @0", 2);
// now you can simply get the single item by
var category = query.SingleOrDefault();
So, I do not see the necessity for a "Single" operator for dynamic LINQ, especially as the IEnumerable or IQueryable returned by the query should contain only one item.
I don't understand what difference you feel there is between Single (SingleOrDefault) and First (FirstOrDefault).
Moreover, EF does not implement the former, so you have to use First (FirstOrDefault) instead.
Also, why do you feel you will gain a performance improvement by creating your own implementation of Single, which by your own comment is almost a copy of Where, which is almost the same as First?
Why not use it, and look at the queries being generated and analyse them?
I think Queryable.Single is what you're looking for.
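One likely reason the attempt in the question fails is that Single returns one element rather than a sequence, so the call expression probably needs to go through Provider.Execute instead of Provider.CreateQuery. The sketch below shows that variation under some assumptions: it takes an already built LambdaExpression instead of calling DynamicExpression.ParseLambda, it uses an in-memory array wrapped with AsQueryable() as the source, and the class names are made up.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class DynamicSingleSketch
{
    // Same shape as the method in the question, but the predicate arrives as a
    // ready-made lambda and the call is executed rather than turned into a query.
    public static object Single(IQueryable source, LambdaExpression predicate)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (predicate == null) throw new ArgumentNullException(nameof(predicate));

        // Execute runs the expression and returns the single matching element.
        return source.Provider.Execute(
            Expression.Call(
                typeof(Queryable), "Single",
                new Type[] { source.ElementType },
                source.Expression, Expression.Quote(predicate)));
    }
}

public static class DynamicSingleDemo
{
    public static object Run()
    {
        IQueryable source = new[] { 1, 2, 3 }.AsQueryable();
        Expression<Func<int, bool>> isTwo = n => n == 2;
        return DynamicSingleSketch.Single(source, isTwo);
    }
}
```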
