In my UserRepository I have a GetActive method:
public IEnumerable<User> GetActive()
{
    var users = context.Set<UserTbl>().Where(x => x.IsActive);
    foreach(var user in users)
    {
        yield return entityMapper.CreateFrom(user);
    }
}
The entityMapper is used to map from an EF-generated UserTbl to the User domain entity.
There are thousands of users, so I want GetActive to defer execution while returning IEnumerable<User>, so that the entire list isn't pulled unnecessarily. I have done that above with foreach and yield.
When testing, it seems that all the data is being fetched regardless. The following two calls take the same time:
// Get only 5 users in memory
var someUsers = UserRepository.GetActive().Take(5).ToList();
// Get all 100,000 users into memory
var allUsers = UserRepository.GetActive().ToList();
What am I doing wrong?
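The moment you use foreach, the query is executed exactly as it stands. You have to keep working with IQueryable right up until the ToList call. Your idea of deferring execution by returning IEnumerable seems nice, but it doesn't help here: Take(5) is applied to the IEnumerable, so it runs in memory and never reaches the database, and the SQL that is executed still selects every active user. IEnumerable only means the provider doesn't have to hold all the data in memory at once. If you want the provider to return just part of the data, you have to use IQueryable. But then you can't use foreach and yield, because foreach executes the query in its parameter as it is, without anything the caller adds afterwards.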
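A minimal, database-free sketch of what happens with the original GetActive. Assumptions: an in-memory IQueryable<int> from AsQueryable() stands in for context.Set<UserTbl>(), an even id stands in for IsActive, and a string stands in for the mapped User. The executedQuery variable (illustration only) records the expression the provider receives when enumeration starts; its outermost operator is Where, because the caller's Take(5) is applied afterwards to the IEnumerable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Stand-in for context.Set<UserTbl>(): 100,000 ids; an even id plays "IsActive".
IQueryable<int> userTable = Enumerable.Range(1, 100_000).AsQueryable();
Expression? executedQuery = null;

// Same shape as the original GetActive: filter on IQueryable, then foreach + yield.
IEnumerable<string> GetActive()
{
    var users = userTable.Where(u => u % 2 == 0);
    executedQuery = users.Expression; // what a real provider would translate into SQL
    foreach (var id in users)
    {
        yield return $"User {id}"; // stands in for entityMapper.CreateFrom(user)
    }
}

// This Take(5) is Enumerable.Take: it runs after the query was handed to the provider.
var someUsers = GetActive().Take(5).ToList();
string? outermost = (executedQuery as MethodCallExpression)?.Method.Name;

Console.WriteLine($"{someUsers.Count} users; provider saw ...{outermost}(...)");
```

With a real EF provider the same thing happens: the SQL it generates contains the IsActive filter but no TOP/LIMIT.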
The only way to do what you want is to pass the required query operators into the GetActive method.
public IEnumerable<User> GetActive(Func<IQueryable<UserTbl>, IQueryable<UserTbl>> modifier)
{
    // the caller's operators (Take, Skip, ...) are composed onto the query before it runs
    var users = modifier(context.Set<UserTbl>().Where(x => x.IsActive));
    foreach(var user in users)
    {
        yield return entityMapper.CreateFrom(user);
    }
}
// Get only 5 users in memory
var someUsers = UserRepository.GetActive(q=>q.Take(5)).ToList();
// Get all 100,000 users into memory
var allUsers = UserRepository.GetActive(q=>q).ToList();
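The same database-free stand-ins applied to the modifier version (again a sketch: AsQueryable() replaces the EF DbSet, ints replace UserTbl rows, strings replace User). It only shows that the caller's Take(5) now becomes the outermost operator of the expression handed to the provider, which is what allows EF to emit TOP/LIMIT.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Stand-in for context.Set<UserTbl>(): 100,000 ids; an even id plays "IsActive".
IQueryable<int> userTable = Enumerable.Range(1, 100_000).AsQueryable();
Expression? executedQuery = null;

// Modifier version: the caller's operators are composed onto the IQueryable before enumeration.
IEnumerable<string> GetActive(Func<IQueryable<int>, IQueryable<int>> modifier)
{
    var users = modifier(userTable.Where(u => u % 2 == 0));
    executedQuery = users.Expression; // now includes whatever the caller added
    foreach (var id in users)
    {
        yield return $"User {id}"; // stands in for entityMapper.CreateFrom(user)
    }
}

var someUsers = GetActive(q => q.Take(5)).ToList();
string? someOuter = (executedQuery as MethodCallExpression)?.Method.Name;
var allUsers = GetActive(q => q).ToList();

Console.WriteLine($"{someUsers.Count} of {allUsers.Count} active users; outermost operator: {someOuter}");
```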
But I would really recommend not having repositories in your architecture at all. They introduce needless complexity on top of an already complex ORM. See more in Repository pattern with Entity framework.
Related
First I created a database view in which the records are ordered.
But when I try to do "Skip" and "Take", they are not ordered.
var query = dbContext.UserView.OrderBy(x => x.Id);
for (int i = 0; i < 10; i++)
{
    var users = await query
        .Skip(i)
        .Take(1)
        .ToListAsync();
    await SendMessage(users);
}
I am trying to take and send records in chunks, but I don't want to load them all into memory.
If I don't order with var query = dbContext.UserView.OrderBy(x => x.Id); here, I receive a different order each time in my for loop, even though I created my database view with "order by".
When I call ToListAsync(), will it sort every time and make the query slower?
Is there a way to create the database view so that every time I ask for records they keep the same order?
Thank you
Database views are NEVER ordered by definition. In SQL Server, the CREATE or ALTER statement for a view fails if it contains an ORDER BY clause, unless TOP, OFFSET or FOR XML is also specified, and even then querying the view does not guarantee ordered results. In this case, though, you cover for it with the .OrderBy() call on the dbContext query.
Additionally, it's usually best to avoid calling ToList() or ToListAsync() before you need to, and in this case you probably don't need to.
As for the ordering... this is what async code is for: it lets you do work on one thing while still await-ing another. The result is that things don't always run in the order you expected. But there are some things we can do to help.
Finally, the way the loop is structured this would have sent 10 separate queries to the database. That's bad. In fact, it's probably the main source of the strange ordering, as the other async operations should all finish nearly instantly and so avoid most of the reordering issues.
You want it to look more like this instead:
var query = dbContext.UserView.OrderBy(x => x.Id).Take(10);
foreach(var user in query)
{
    await SendMessage(user);
}
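A rough in-memory illustration of the round-trip difference. The UserView local function is a hypothetical stand-in for the database view, and each enumeration of it counts as one query. The loop from the question triggers ten enumerations, while a single ordered Take(10) triggers one and still hands the items to the (placeholder) send step one at a time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int queriesRun = 0;

// Hypothetical stand-in for dbContext.UserView: each enumeration counts as one database query.
IEnumerable<int> UserView()
{
    queriesRun++;
    for (int id = 1000; id >= 1; id--) // deliberately not in id order, like an unordered view
    {
        yield return id;
    }
}

// Question's approach: Skip(i).Take(1) inside a loop, one query per iteration.
var sentByLoop = new List<int>();
for (int i = 0; i < 10; i++)
{
    sentByLoop.AddRange(UserView().OrderBy(x => x).Skip(i).Take(1).ToList());
}
int loopQueries = queriesRun;

// Answer's approach: one ordered query, streamed item by item.
queriesRun = 0;
var sentByStream = new List<int>();
foreach (var user in UserView().OrderBy(x => x).Take(10))
{
    sentByStream.Add(user); // SendMessage(user) would go here
}
int streamQueries = queriesRun;

Console.WriteLine($"loop: {loopQueries} queries, streamed: {streamQueries} query");
```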
Or if SendMessage() can only accept a sequence:
var query = dbContext.UserView.OrderBy(x => x.Id).Take(10);
await SendMessage(query);
And if (and only if!) SendMessage() absolutely demands a list (and given the requirement from the comments to still only have one item in memory), we can still improve performance by only running ONE query:
var query = dbContext.UserView.OrderBy(x => x.Id).Take(10);
foreach(var user in query)
{
    // guessing at the type name here
    var list = new List<User>() { user };
    await SendMessage(list);
}
Again: the above only runs ONE query on the database, but still only has ONE item in memory at a time.
But in this last situation I'd probably first explore whether I could also change or overload SendMessage() to accept an IEnumerable<User>. That is, the method probably looks something like this:
public static async Task SendMessage(List<User> users)
{
    foreach(var user in users)
    {
        //send the message
    }
}
And if so you can change it like this:
public static async Task SendMessage(IEnumerable<User> users)
{
    foreach(var user in users)
    {
        //send the message
    }
}
Note that the code inside the method did not change... only the signature. Best of all, you can still pass a list to the method when you call it. And now you can also pass any other enumerable, including the query from the second code sample in this answer.
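A small self-contained check of that claim. Assumptions: strings stand in for User, the sent list stands in for the real sending, and Task.Yield() stands in for the real asynchronous send. The sketch returns Task rather than void so callers can await it. The same widened method accepts a List<T>, an array and a lazily evaluated LINQ query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

var sent = new List<string>();

// Widened signature: any IEnumerable<string> is accepted; the body is unchanged.
async Task SendMessage(IEnumerable<string> users)
{
    foreach (var user in users)
    {
        await Task.Yield(); // placeholder for the real asynchronous send
        sent.Add(user);
    }
}

await SendMessage(new List<string> { "ann" });                             // a List still works
await SendMessage(new[] { "bob", "cy" });                                   // so does an array
await SendMessage(new[] { "dee", "ed", "flo" }.Where(n => n.Length == 3)); // and a deferred query

Console.WriteLine(string.Join(", ", sent));
```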
A few years ago I took over a project from a previous developer and have been working my way through this system and adding to it as well as reconfiguring some aspects of it. The latest thing I've uncovered was a many to many relationship that I haven't handled yet.
I have two tables:
ECAnalysis - which contains active and valid analyses for a given instrument.
ECAnalyte - which contains active and valid analytes.
One analysis can have many analytes whereas one analyte can be on many analyses (if I'm thinking about this in a correct way from what was left for me).
There is also an intermediate table in my MySQL database called ECAnalysisToECAnalyte which just contains the primary keys of each table.
Because of the large number of entities, the EF code is set up as a generic repository. To get data out of the system along with its navigation properties, there is a method that fetches all of the data:
public virtual async Task<IList<T>> GetAllAsync<T>(params Expression<Func<T, object>>[] navigationProperties) where T : class
{
    try
    {
        using (var db = new EntitiesContext())
        {
            List<T> list;
            IQueryable<T> dbQuery = db.Set<T>();
            foreach (Expression<Func<T, object>> navigationProperty in navigationProperties)
                dbQuery = dbQuery.Include<T, object>(navigationProperty);
            list = await dbQuery.AsNoTracking().ToListAsync<T>();
            return list;
        }
    }
    catch (ArgumentNullException ex)
    {
        throw new InvalidOperationException($"Invalid state: {typeof(T).Name} DbSet is null.", ex);
    }
}
This method works correctly for all normal types of relationships and entities but it doesn't seem to work for a many to many relationship.
And to call it in my main application it's used in this manner:
var data = (from a in await Manager.GetAllAsync<ECAnalysis>(n => n.ECAnalyte)
            select a);
My task now has been to get a list of analytes for a given analysis and pass them to a view. My first issue is when I use the method as is with the AsNoTracking() I get an exception stating that:
When an object is returned with a NoTracking merge option, Load can only be called when the EntityCollection or EntityReference does not contain objects.
If I remove the AsNoTracking() I get the correct list of analytes to my view but along with another exception:
The ObjectContext instance has been disposed and can no longer be used for operations that require a connection.
How I'm getting this list of analytes is like this...I know it's probably not the cleanest or most efficient so bear with me:
public async Task<JsonResult> GetAnalytes(string sampleType)
{
    var analysis = (from a in await Manager.AllAsync<ECAnalysis>(b => b.ECAnalyte)
                    where a.Method == "IC" && a.Media == sampleType + " - ANIONS"
                    select a).ToList();
    var ecAnalytes = analysis.FirstOrDefault().ECAnalyte.ToList();
    var analyteList = new List<string>();
    foreach (var item in ecAnalytes)
    {
        analyteList.Add(item.Name);
    };
    return Json(analysis, JsonRequestBehavior.AllowGet);
    //this is being called from an Ajax method
}
So my questions are:
Is the many to many relationship configured properly for how I want to use it?
Is there a better way to get a list of analytes for a given analysis that doesn't give me an exception?
So basically, I have a project that uses EF Core. To shorten my lambdas when checking whether two objects (class Protocol) are equal, I've overridden my Equals method and overloaded the == and != operators. However, LINQ doesn't seem to care about it, and still uses reference equality. Thanks
As I've said before, I've overridden the Equals method and overloaded the == and != operators. With no luck. I've also tried implementing the IEquatable interface. Also no luck.
I am using:
EF Core 2.2.4
//the protocol class
public class Protocol : IEquatable<Protocol>
{
    [Key]
    public int ProtocolId {get;set;}
    public string Repnr {get;set;}
    public string Service {get;set;}
    public override bool Equals(object obj)
    {
        return obj is Protocol other && Equals(other);
    }
    public override int GetHashCode()
    {
        return $"{Repnr}-{Service}".GetHashCode();
    }
    public bool Equals(Protocol other)
    {
        // "other is null" avoids calling the overloaded == (which would recurse)
        return !(other is null) && Repnr == other.Repnr && Service == other.Service;
    }
    public static bool operator ==(Protocol lhs, Protocol rhs)
    {
        // null-safe: lhs.Equals(rhs) alone throws when lhs is null
        return lhs is null ? rhs is null : lhs.Equals(rhs);
    }
    public static bool operator !=(Protocol lhs, Protocol rhs)
    {
        return !(lhs == rhs);
    }
}
//the problem
using (var db = new DbContext())
{
    var item1 = new Protocol() { Repnr = "1666", Service = "180" };
    db.Protocols.Add(item1);
    db.SaveChanges();
    var item2 = new Protocol() { Repnr = "1666", Service = "180" };
    var result1 = db.Protocols.FirstOrDefault(a => a == item2);
    var result2 = db.Protocols.FirstOrDefault(a => a.Equals(item2));
    //both result1 and result2 are null
}
I would expect both result1 and result2 to be item1. However, they're both null. I know I could just do a.Repnr == b.Repnr && a.Service == b.Service, but that just isn't as clean. Thanks
To understand why the wrong equality comparer is used, you have to be aware of the difference between IEnumerable<...> and IQueryable<...>.
IEnumerable
An object that implements IEnumerable<...> is an object that represents a sequence of similar objects. It holds everything needed to fetch the first item of the sequence, and once you've got an item of the sequence you can get the next one, as long as there is a next item.
You can start enumerating explicitly by calling GetEnumerator() and then repeatedly calling MoveNext(). More commonly, enumeration starts implicitly through foreach, or through terminating LINQ methods like ToList(), ToDictionary(), FirstOrDefault(), Count() or Any(). These LINQ methods internally use either foreach, or GetEnumerator() and MoveNext() / Current.
IQueryable
An object that implements IQueryable<...> also represents an enumerable sequence. The difference however, is that this sequence usually is not held by your process, but by a different process, like a database management system.
The IQueryable does not (necessarily) hold everything to enumerate. Instead it holds an Expression and a Provider. The Expression is a generic description about what must be queried. The Provider knows which process will execute the query (usually a database management system) and how to communicate with this process (usually something SQL-like).
An IQueryable<..> also implements IEnumerable<..>, so you can start enumerating the sequence as if it was a standard IEnumerable. Once you start enumerating an IQueryable<...> by calling (internally) GetEnumerator(), the Expression is sent to the Provider, who translates the Expression into SQL and executes the query. The result is presented as an enumerator, which can be enumerated using MoveNext() / Current.
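A database-free sketch of that sequence, assuming LINQ's in-memory provider (EnumerableQuery<T>, created by AsQueryable()) as a stand-in for EF's. Composing Where and Take only builds a bigger Expression; the source is not read until enumeration starts. The Rows iterator and its pulled counter exist only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

int pulled = 0;

// Hypothetical data source that counts how many rows are actually read.
IEnumerable<int> Rows()
{
    for (int i = 1; i <= 1_000; i++)
    {
        pulled++;
        yield return i;
    }
}

IQueryable<int> query = Rows().AsQueryable()
    .Where(x => x % 10 == 0)
    .Take(3);

// Nothing has been read yet: the query is only an Expression plus a Provider.
int pulledBeforeEnumeration = pulled;
string outermost = ((MethodCallExpression)query.Expression).Method.Name;

// ToList calls GetEnumerator/MoveNext, so the Provider now executes the Expression.
var result = query.ToList();

Console.WriteLine($"before: {pulledBeforeEnumeration}, after: {pulled}, outermost: {outermost}");
```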
This means that if you want to enumerate an IQueryable<...>, the Expression must be translated into a language that the Provider supports. As the compiler does not know who will execute the query, it can't complain if your Expression contains methods or classes that your Provider doesn't know how to translate to SQL. In such cases you'll typically get a run-time error (EF Core 1.x and 2.x may instead evaluate the untranslatable part in memory and log a warning).
It is easy to see that SQL does not know your own Equals method. In fact, there are even several standard LINQ functions that are not supported. See Supported and Unsupported LINQ Methods (LINQ to Entities).
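To make that concrete: a lambda passed to an IQueryable operator is compiled into an expression tree, so the provider receives a description of the call (which method, on which type), never the method's body. This sketch uses string.Equals, which EF does know how to translate; a call to Protocol.Equals would appear in the tree the same way, but there is no SQL translation for it. The repnr parameter name is only for illustration.

```csharp
using System;
using System.Linq.Expressions;

// The lambda is stored as data (an expression tree), not only as executable code.
Expression<Func<string, bool>> predicate = repnr => repnr.Equals("1666");

// The provider sees a MethodCallExpression naming the exact method that was called.
var call = (MethodCallExpression)predicate.Body;
string calledMethod = $"{call.Method.DeclaringType?.Name}.{call.Method.Name}";

// Locally, the same tree can still be compiled and run as ordinary code.
bool matches = predicate.Compile()("1666");

Console.WriteLine($"{calledMethod}: {matches}");
```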
So what should I do if I want to use an unsupported function?
One of the things that you could do is move the data to your local process, and then call the unsupported function.
This can be done using ToList, but if you will only use one or a few of the fetched items, this would be a waste of processing power.
One of the slower parts of a database query is the transport of the selected data to your local process. Hence it is wise to limit the data to the data that you actually plan to use.
A smarter solution would be to use AsEnumerable. This will fetch the selected data "per page". It will fetch the first page, and once you've enumerated through the fetched page (using MoveNext), it will fetch the next page.
So if you only use a few of the fetched items, you will have fetched some items that are not used, but at least you won't have fetched all of them.
Example
Suppose you have a local function that takes a Student as input and returns a Boolean:
bool HasSpecialAbility(Student student);
Requirement: give me three Students that live in New York City and have the special ability.
Alas, HasSpecialAbility is a local function, so it can't be translated into SQL. You'll have to get the Students to your local process before calling it.
var result = dbContext.Students
    // limit the transported data as much as you can:
    .Where(student => student.CityCode == "NYC")
    // transport to local process per page:
    .AsEnumerable()
    // now you can call HasSpecialAbility:
    .Where(student => HasSpecialAbility(student))
    .Take(3)
    .ToList();
Ok, you might have fetched a page of 100 Students while you only needed 3, but at least you haven't fetched all 25000 students.
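An in-memory sketch of that trade-off. Assumptions: the Students iterator stands in for rows streamed from the database, an int id stands in for Student, and HasSpecialAbility is a made-up local rule. Because AsEnumerable().Where(...).Take(3) is lazy, it stops reading rows soon after the third match, whereas calling ToList() first reads all 25,000.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int pulled = 0;

// Hypothetical stand-in for NYC students streamed from the database one row at a time.
IEnumerable<int> Students()
{
    for (int id = 1; id <= 25_000; id++)
    {
        pulled++;
        yield return id;
    }
}

// Made-up local rule that SQL could not evaluate.
bool HasSpecialAbility(int studentId) => studentId % 7 == 0;

// Streaming: rows are read only until three matches have been found.
var streamed = Students().AsEnumerable().Where(HasSpecialAbility).Take(3).ToList();
int pulledStreaming = pulled;

// Buffering: ToList() reads every row before the local filter runs.
pulled = 0;
var buffered = Students().ToList().Where(HasSpecialAbility).Take(3).ToList();
int pulledBuffered = pulled;

Console.WriteLine($"streamed read {pulledStreaming} rows, buffered read {pulledBuffered} rows");
```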
Suppose I have built up, through some conditional logic over many steps, an IQueryable<T> instance we'll call query.
I want to get a count of total records and a page of data, so I want to call query.CountAsync() and query.Skip(0).Take(10).ToListAsync(). I cannot call these in succession, because a race condition occurs where they both try to run a query on the same DbContext at the same time. This is not allowed:
"A second operation started on this context before a previous asynchronous operation completed. Use 'await' to ensure that any asynchronous operations have completed before calling another method on this context. Any instance members are not guaranteed to be thread safe."
I do not want to 'await' the first before even starting the second. I want to fire off both queries as soon as possible. The only way to do this is to run them from separate DbContexts. It seems ridiculous that I might have to build the entire query (or 2, or 3) side-by-side starting with different instances of DbSet. Is there any way to clone or alter an IQueryable<T> (not necessarily that interface, but its underlying implementation) such that I can have one copy that runs on DbContext "A", and another that will run on DbContext "B", so that both queries can be executing simultaneously? I'm just trying to avoid recomposing the query X times from scratch just to run it on X contexts.
There is no standard way of doing that. The problem is that EF6 query expression trees contain constant nodes holding ObjectQuery instances which are bound to the DbContext (actually the underlying ObjectContext) used when creating the query. Also there is a runtime check before executing the query if there are such expressions bound to a different context than the one executing the query.
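The same kind of binding can be observed with LINQ's in-memory provider, used here as a stand-in for EF6 (an assumption: it has no DbContext and no runtime check). The source of a query is stored as a constant node inside its Expression, so handing that Expression to another query's Provider still reads the original source.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

IQueryable<int> sourceA = new[] { 1, 2, 3 }.AsQueryable();
IQueryable<int> sourceB = new[] { 10, 20, 30 }.AsQueryable();

IQueryable<int> queryOnA = sourceA.Where(x => x > 1);

// Reuse A's expression with B's provider: the tree still contains A as a constant.
IQueryable<int> swapped = sourceB.Provider.CreateQuery<int>(queryOnA.Expression);

// Root of the tree: Where(<constant>, lambda), and that constant is sourceA itself.
var root = (ConstantExpression)((MethodCallExpression)queryOnA.Expression).Arguments[0];

Console.WriteLine(string.Join(", ", swapped)); // reads from sourceA, not sourceB
Console.WriteLine(ReferenceEquals(root.Value, sourceA));
```

Rebinding therefore means rewriting those constant nodes, which is what the ExpressionVisitor approach does.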
The only idea that comes to mind is to process the query expression tree with an ExpressionVisitor and replace these ObjectQuery instances with new ones bound to the new context.
Here is a possible implementation of the aforementioned idea:
using System.Data.Entity.Core.Objects;
using System.Data.Entity.Infrastructure;
using System.Linq;
using System.Linq.Expressions;
namespace System.Data.Entity
{
    public static class DbQueryExtensions
    {
        public static IQueryable<T> BindTo<T>(this IQueryable<T> source, DbContext target)
        {
            var binder = new DbContextBinder(target);
            var expression = binder.Visit(source.Expression);
            var provider = binder.TargetProvider;
            return provider != null ? provider.CreateQuery<T>(expression) : source;
        }
        class DbContextBinder : ExpressionVisitor
        {
            ObjectContext targetObjectContext;
            public IQueryProvider TargetProvider { get; private set; }
            public DbContextBinder(DbContext target)
            {
                targetObjectContext = ((IObjectContextAdapter)target).ObjectContext;
            }
            protected override Expression VisitConstant(ConstantExpression node)
            {
                if (node.Value is ObjectQuery objectQuery && objectQuery.Context != targetObjectContext)
                    return Expression.Constant(CreateObjectQuery((dynamic)objectQuery));
                return base.VisitConstant(node);
            }
            ObjectQuery<T> CreateObjectQuery<T>(ObjectQuery<T> source)
            {
                var parameters = source.Parameters
                    .Select(p => new ObjectParameter(p.Name, p.ParameterType) { Value = p.Value })
                    .ToArray();
                var query = targetObjectContext.CreateQuery<T>(source.CommandText, parameters);
                query.MergeOption = source.MergeOption;
                query.Streaming = source.Streaming;
                query.EnablePlanCaching = source.EnablePlanCaching;
                if (TargetProvider == null)
                    TargetProvider = ((IQueryable)query).Provider;
                return query;
            }
        }
    }
}
One difference from standard EF6 LINQ queries is that this produces ObjectQuery<T> rather than DbQuery<T>. Apart from ToString() not returning the generated SQL, I haven't noticed any difference in further query building or execution. It seems to work, but use it with care and at your own risk.
You could write a function to build up your query, taking DbContext as a parameter.
// MyEntity and MyDbContext are placeholders for your own entity and context types
public IQueryable<MyEntity> MyQuery(MyDbContext db)
{
    return db.Table
        .Where(p => p.reallycomplex)
        ....
        ...
        .OrderBy(p => p.manythings);
}
I've done this many times and it works well.
Now it's easy to make queries with two different contexts:
var q1 = MyQuery(dbContext1);
var q2 = MyQuery(dbContext2);
If your concern was the execution time taken to build the IQueryable objects, then my only suggestion is don't worry about it.
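A database-free sketch of this approach combined with the goal from the question (count and page running at the same time). Assumptions: two in-memory sources stand in for the two DbContexts, BuildQuery is a hypothetical name playing the role of MyQuery, and Task.Run with Count/ToArray stands in for EF's CountAsync/ToListAsync, which in real code would each be awaited on their own context.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-ins for dbContext1 and dbContext2: same data, separate sources.
IQueryable<int> context1Table = Enumerable.Range(1, 1_000).AsQueryable();
IQueryable<int> context2Table = Enumerable.Range(1, 1_000).AsQueryable();

// Build the (possibly complex) query once, parameterised by its source.
IQueryable<int> BuildQuery(IQueryable<int> table) =>
    table.Where(x => x % 3 == 0)
         .OrderByDescending(x => x);

IQueryable<int> q1 = BuildQuery(context1Table);
IQueryable<int> q2 = BuildQuery(context2Table);

// Start both executions before awaiting either.
Task<int> countTask = Task.Run(() => q1.Count());
Task<int[]> pageTask = Task.Run(() => q2.Skip(0).Take(10).ToArray());
await Task.WhenAll(countTask, pageTask);

Console.WriteLine($"total: {countTask.Result}, first page starts at {pageTask.Result[0]}");
```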
So you have an IQueryable<T> that will be performed on DbContext A as soon as the query is executed and you want the same query to run on DbContext B when the query is executed.
For this you'll have to understand the difference between an IEnumerable<T> and an IQueryable<T>.
An IEnumerable<T> holds all the code needed to enumerate over the elements that the enumerable represents. The enumeration starts when GetEnumerator and MoveNext are called. This can be done explicitly. However, it is usually done implicitly by foreach or by functions like ToList, FirstOrDefault, etc.
An IQueryable does not hold the code to enumerate, it holds an Expression and a Provider. The Provider knows who will execute the query, and it knows how to translate the Expression into the language that is understood by the query executioner.
Due to this separation, it is possible to let the same Expression be executed by different data sources. They don't even have to be of the same type: one data source can be a database management system that understands SQL, the other one could be a comma separated file.
As long as you concatenate Linq statements that return an IQueryable, the query is not executed, only the Expression is changed.
As soon as enumeration starts, either by calling GetEnumerator / MoveNext, or by using foreach or one of the LINQ functions that do not return an IQueryable, the Provider translates the Expression into the language the data source understands and communicates with the data source to execute the query. The result of the query is an IEnumerable, which can be enumerated as if all data was in local code.
Some Providers are smart and use some buffering, so that not all data is transferred to local memory, but only part of the data. New data is queried when needed. So if you do a foreach in a database with a zillion elements, only the first few (thousands) elements are queried. More data is queried if your foreach runs out of fetched data.
So you already have one IQueryable<T>, and therefore you have an Expression, a Provider and an ElementType. You want the same Expression / ElementType to be executed by a different Provider. You may even want to change the Expression slightly before you execute it.
Therefore you need to be able to create an object that implements IQueryable<T>, on which you can set the Expression, the ElementType and the Provider:
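// needs: using System; using System.Collections; using System.Collections.Generic; using System.Linq; using System.Linq.Expressions;
class MyQueryable<T> : IQueryable<T>
{
    public Type ElementType {get; set;}
    public Expression Expression {get; set;}
    public IQueryProvider Provider {get; set;}
    // enumeration is delegated to the Provider, which executes the Expression
    public IEnumerator<T> GetEnumerator() => Provider.Execute<IEnumerable<T>>(Expression).GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}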
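IQueryable<T> queryOnDbContextA = dbContextA ...
IQueryable<T> setInDbContextB = dbContextB.Set<T>();
// EF's async extensions (CountAsync, ToListAsync) need EF's own query type;
// setInDbContextB.Provider.CreateQuery<T>(queryOnDbContextA.Expression) returns one
IQueryable<T> queryOnDbContextB = new MyQueryable<T>()
{
    ElementType = queryOnDbContextA.ElementType,
    Expression = queryOnDbContextA.Expression,   // the Expression comes from context A
    Provider = setInDbContextB.Provider,         // the Provider comes from context B
};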
If desired you can adjust the query on the other context before executing it:
var getPageOnContextB = queryOnDbContextB
.Skip(...)
.Take(...);
Both queries are still not executed yet. Execute them:
var countA = await queryOnDbContextA.CountAsync();
var fetchedPageContextB = await getPageOnContextB.ToListAsync();
Let's say I have a User entity, and I created a partial User class so I can add some methods (like with NHibernate). I added GetByID to make getting a user easier:
public static User GetByID(int userID)
{
    using (var context = new MyEntities())
    {
        return context.Users.Where(qq => qq.UserID == userID).Single();
    }
}
Now, somewhere in business logic I'd like to do something like this:
var user = User.GetByID(userID);
var posts = user.GetAllPostsForThisMonth();
foreach(var post in posts)
{
    Console.WriteLine(post.Answers.Count);
}
GetAllPostsForThisMonth() is similar to GetByID: it creates a context and disposes of it right after execution.
Normally I can't do this, because the context has already been disposed when I access post.Answers.Count. This, I think, renders my methods useless... Or am I missing something? Can I somehow use my entities like this? Or should I create a method for every single query I use (like post.GetAnswersCount())? Thanks in advance!
The behavior you're lamenting is actually good, because it keeps you from shooting yourself in the foot. If you had been allowed to do this, it would have caused n round-trips to the database (where n is the number of posts), and each one of those round-trips would have pulled all the data for all the Answers, when all you wanted was the Count. This could have an enormous performance impact.
What you want to do is construct an object that represents all the information you expect to need from the database, and then construct a LINQ query that will actually load in all the information you expect to use.
public class PostSummary
{
    public Post Post {get;set;}
    public int AnswerCount {get;set;}
}
public IEnumerable<PostSummary> GetPostSummariesByUserAndDateRange(
    int userId, DateTime start, DateTime end)
{
    using (var context = new MyEntities())
    {
        return context.Posts
            .Where(p => p.UserId == userId)
            .Where(p => p.TimeStamp >= start && p.TimeStamp < end)
            .Select(p => new PostSummary { Post = p, AnswerCount = p.Answers.Count() })
            .ToList();
    }
}
This produces a single SQL query and, in a single round-trip, produces exactly the information you wanted without loading in a ton of information you didn't want.
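A rough in-memory model of the difference. Assumptions: posts are plain ints, answers are (AnswerId, PostId) tuples, and each call to one of the Load... local functions counts as one database round-trip; none of these names are EF APIs. The lazy-loading style makes one trip per post and pulls every answer row just to count it, while the projection returns only (PostId, AnswerCount) pairs in one trip.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int roundTrips = 0;

// Hypothetical in-memory "tables": 50 posts, 200 answers spread evenly (4 per post).
var posts = Enumerable.Range(1, 50).ToList();
var answers = Enumerable.Range(1, 200)
    .Select(a => (AnswerId: a, PostId: a % 50 + 1))
    .ToList();

// Lazy-loading style: one simulated round-trip per post, pulling every answer row to count it.
List<(int AnswerId, int PostId)> LoadAnswersFor(int postId)
{
    roundTrips++;
    return answers.Where(a => a.PostId == postId).ToList();
}
var lazyCounts = posts.Select(p => LoadAnswersFor(p).Count).ToList();
int lazyTrips = roundTrips;

// Projection style: one simulated round-trip returning only (PostId, AnswerCount) pairs.
List<(int PostId, int AnswerCount)> LoadSummaries()
{
    roundTrips++;
    var counts = answers.GroupBy(a => a.PostId).ToDictionary(g => g.Key, g => g.Count());
    return posts
        .Select(p => (PostId: p, AnswerCount: counts.TryGetValue(p, out var n) ? n : 0))
        .ToList();
}
roundTrips = 0;
var summaries = LoadSummaries();
int projectedTrips = roundTrips;

Console.WriteLine($"lazy: {lazyTrips} round-trips, projection: {projectedTrips} round-trip");
```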
Update
If NHibernate works anything like Java's Hibernate, it won't do lazy loading after the context is disposed, either. Entity Framework does give you a lot of options along these lines: which one works best for you will depend on your particular situation. For example:
Rather than keeping your context scoped inside the data-access method, you can make the context last longer (like once per request in a web application), so that lazy loading of properties will continue to work beyond your data-access methods.
You can eagerly load whatever entity associations you know you're going to need.
Here's an example of eager loading:
public List<Post> GetAllPostsAndAnswersForThisMonth()
{
    using (var context = new MyEntities())
    {
        // UserID is this User's own property (the method lives in the partial User class)
        return context.Posts.Include("Answers")
            .Where(p => p.UserID == UserID)
            .ToList();
    }
}
However, since Entity Framework basically constitutes your "Data Access" tier, I would still argue that the best practice will be to create a class or set of classes that accurately model what your business layer actually wants out of the data tier, and then have the data access method produce objects of those types.
One method is to explicitly load related objects that you know you will need before you dispose the context. This will make the related data available, with the downside that if you don't need the related info it is wasted time and memory to retrieve. Of course, you can also handle this with flags:
... GetAllPostsForThisMonth(bool includeAnswers)
{
    using (var context = new MyEntities())
    {
        context.ContextOptions.LazyLoadingEnabled = false;
        // code to get all posts for this month here
        var posts = ...;
        if (includeAnswers)
        {
            // load each post's answers while the context is still alive
            foreach (var post in posts)
                if (!post.Answers.IsLoaded)
                    post.Answers.Load();
        }
        return posts;
    }
}