How to project computed properties to SQL in EF Core? - c#

How can EF Core project computed properties to SQL instead of calculating them on the client side?
public enum OrderType
{
Normal,
Special,
}
public class Order
{
public int? SpecialContainerId { get; set; }
public OrderType CalculatedOrderType { get => SpecialContainerId == null ? OrderType.Normal : OrderType.Special; }
}
// Version 1
dbContext.Order.Where(o => o.CalculatedOrderType == OrderType.Special);
// Version 2
dbContext.Order.Where(o => o.SpecialContainerId != null);
There are two versions of my EF Core query.
The first one is slow and the second one fast, because the first is evaluated on the client side while the second runs in the database.
Is it possible to write computed properties on the Order class that are projected to SQL in an easy way? I do not want to repeat my query all the time.
Writing a method on Order which returns an expression would work, of course. But is there an easier way available in the meantime?
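For reference, a minimal sketch of the expression-based approach mentioned above; the static IsSpecial member is an illustrative name, not part of the original code:
using System;
using System.Linq.Expressions;
public class Order
{
    public int? SpecialContainerId { get; set; }
    // An expression EF Core can inline and translate to SQL, unlike the computed property.
    public static Expression<Func<Order, bool>> IsSpecial =>
        o => o.SpecialContainerId != null;
}
// Usage: the filter runs in the database, equivalent to Version 2 above.
var specialOrders = dbContext.Order.Where(Order.IsSpecial);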

Related

EF Core: How to best get average value in a model of a related model

I've got a Blazor Server App using the Entity Framework (EF Core).
I use a code first approach with two models, Entry and Target.
Each entry has a target. So a target can have more than one entry pointing to it.
The model for the Target looks like this:
public class Target
{
public string TargetId { get; set; }
[Required]
public string Name { get; set; }
[InverseProperty("Target")]
public List<Entry> Entries { get; set; }
[NotMapped]
public double AverageEntryRating => Entries != null ? Entries.Where(e => e.Rating > 0).Select(e => e.Rating).Average() : 0;
}
An entry can have a rating, the model Entry looks like this:
public class Entry
{
public string EntryId { get; set; }
public int Rating { get; set; }
[Required]
public string TargetId { get; set; }
[ForeignKey("TargetId")]
public Target Target { get; set; }
}
As you can see in my Target model, I would like to know for each Target, what the average rating for it is, based on the average of all entries that point to the Target - that's why there is this (NotMapped) property in the target:
public double AverageEntryRating => Entries != null ? Entries.Where(e => e.Rating > 0).Select(e => e.Rating).Average() : 0;
But this does (of course) not always work, as the Entries of the target are not guaranteed to be loaded at the time the property is accessed.
I tried to solve it differently, for example by having a method in my TargetService where I can pass in a targetId and it gives me the result:
public double GetTargetMedianEntryRating(string targetId) {
var median = _context.Entries
.Where(e => e.TargetId == targetId && e.Rating > 0)
.Select(e => e.Rating)
.DefaultIfEmpty()
.Average();
return median;
}
But when I list my targets in a table and then want to display this value in a cell (passing in the current targetId of the foreach loop), I get a concurrency exception, as the database context is used by multiple threads (I guess one from looping through the rows/targets and another from getting the average value)... so this leads me into new trouble.
Personally I would prefer to work with the AverageEntryRating property on the Target model, as it seems natural to me and it would also be convenient to access the value just like this.
But how would I make sure that the entries are loaded when I access this property? Or is this not a good approach, because it would mean loading the Entries for all targets anyway, which would lead to performance degradation? If so, what would be a good way to get the average/median value?
There are a couple of options I could think of, and it depends on your situation what to do. There might be more alternatives, but at least I hope that this can give you some options you hadn't considered.
Have a BaseQuery extension method that always includes all Entries
You could make sure to call .Include(x => x.Entries) whenever you are querying for Target. You can even create an extension method on the database context called something like TargetBaseQuery() that includes all necessary relationships whenever you use it (a sketch follows below). Then you will be sure that the Entries list of each Target will be loaded when you access the property AverageEntryRating.
The downside will be a performance hit, since every time you load a Target you will need to load all its entries... and that's for every Target you query.
However, if you need to get it working fast, this would be probably the easiest. The pragmatic approach would be to do this, measure the performance hit, and if it is too slow then try something else, instead of doing premature optimization. The risk of course would be that it might work fast now, but it might scale badly in the future. So it's up to you to decide.
Another thing to consider would be to not Include the Entries every single time, but only in those places where you know you need the average. It might however become a maintainability issue.
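As a rough sketch of that idea; TargetBaseQuery is just the illustrative name used above, and the context and DbSet names are assumptions:
using System.Linq;
using Microsoft.EntityFrameworkCore;
public static class TargetQueryExtensions
{
    // Always load the Entries relationship when querying for Target.
    public static IQueryable<Target> TargetBaseQuery(this MyDbContext context)
    {
        return context.Targets.Include(x => x.Entries);
    }
}
// Usage: AverageEntryRating can now be read safely, because Entries is loaded.
var targets = await _context.TargetBaseQuery().ToListAsync();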
Have a different model and service method to calculate the TargetStats
You could create another model class that stores the related data of a Target but is not persisted in the database. For example:
public class TargetStats
{
public Target Target { get; set; }
public double AverageEntryRating { get; set; }
}
Then in your service you could have a method roughly like this (haven't tested, so it might not work as is, but you get the idea):
public List<TargetStats> GetTargetStats() {
var targetStats = _context.Target
.Include(x => x.Entries)
.Select(x => new TargetStats
{
Target = x,
AverageEntryRating = x.Entries.Where(e => e.Rating > 0).Select(e => e.Rating).Average(),
})
.ToList();
return targetStats;
}
The only advantage of this is that you don't have to degrade the performance of all Target related queries, but only of those that require the average rating.
But this query in particular might still be slow. What you could do to further tweak it is to write raw SQL instead of LINQ, or to have a view in the database that you can query.
Store and update the Target's average rating as a column
Probably the best you could do to keep the code clean and have good performance while reading, is to store the average as a column in the Target table. This will move the performance cost of the calculation to the saving/updating of a Target or its related Entries, but the readings will be super fast since the data is already available. If the readings happen way more often than the updates, then it's probably worth doing it.
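A rough sketch of that idea, assuming the [NotMapped] attribute is removed so AverageEntryRating becomes a regular mapped property on Target, and a service method recalculates it whenever an entry is saved (the DbSet and method names are illustrative):
public async Task SaveEntryAsync(Entry entry)
{
    _context.Entries.Add(entry);
    await _context.SaveChangesAsync();

    // Recalculate the stored average for the affected target.
    var target = await _context.Targets.FindAsync(entry.TargetId);
    target.AverageEntryRating = await _context.Entries
        .Where(e => e.TargetId == entry.TargetId && e.Rating > 0)
        .Select(e => (double?)e.Rating)
        .AverageAsync() ?? 0;
    await _context.SaveChangesAsync();
}
Reads then simply use target.AverageEntryRating with no extra query.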
You could take a look at the EF Core docs on performance, since they talk a little bit about the different performance tuning alternatives.

How do you make EntityFramework generate efficient SQL queries for related objects?

I am trying to work out how to use the .NET EntityFramework to generate both readable and natural code and efficient SQL query statements when fetching related entities. For example, given the following code-first definition
public class WidgetContext : DbContext
{
public DbSet<Widget> Widgets { get; set; }
public DbSet<Gizmo> Gizmos { get; set; }
}
public class Widget
{
public virtual int Id { get; set; }
[Index]
[MaxLength(512)]
public virtual string Name { get; set; }
public virtual ICollection<Gizmo> Gizmos { get; set; }
}
public class Gizmo
{
public virtual long Id { get; set; }
[Index]
[MaxLength(512)]
public virtual string Name { get; set; }
public virtual Widget Widget { get; set; }
public virtual int WidgetId { get; set; }
}
I want to be able to write code like
using (var wc = new WidgetContext())
{
var widget = wc.Widgets.First(x => x.Id == 123);
var gizmo = widget.Gizmos.First(x => x.Name == "gizmo 99");
}
and see a SQL query created along the lines of
SELECT TOP (1) * from Gizmos WHERE WidgetId = 123 AND Name = 'gizmo 99'
So that the work of picking the right Gizmo is performed by the database. This is important because in my use case each Widget could have thousands of related Gizmos and in a particular request I only need to retrieve one at a time. Unfortunately the code above causes the EntityFramework to create SQL like this instead
SELECT * from Gizmos WHERE WidgetId = 123
The match on Gizmo.Name is then being performed in memory by scanning the complete set of related Gizmo entities.
After a good deal of experimentation, I have found ways of creating the efficient SQL use I am looking for in the entity framework, but only by using ugly code which is much less natural to write. The example below illustrates this.
using System.Data.Entity;
using System.Data.Entity.Core.Objects.DataClasses;
using System.Linq;
static void Main(string[] args)
{
Database.SetInitializer(new DropCreateDatabaseAlways<WidgetContext>());
using (var wc = new WidgetContext())
{
var widget = new Widget() { Name = "my widget"};
wc.Widgets.Add(widget);
wc.SaveChanges();
}
using (var wc = new WidgetContext())
{
var widget = wc.Widgets.First();
for (int i = 0; i < 1000; i++)
widget.Gizmos.Add(new Gizmo() { Name = string.Format("gizmo {0}", i) });
wc.SaveChanges();
}
using (var wc = new WidgetContext())
{
wc.Database.Log = Console.WriteLine;
var widget = wc.Widgets.First();
Console.WriteLine("=====> Query 1");
// queries all gizmos associated with the widget and then runs the 'First' query in memory. Nice code, ugly database usage
var g1 = widget.Gizmos.First(x => x.Name == "gizmo 99");
Console.WriteLine("=====> Query 2");
// queries on the DB with two terms in the WHERE clause - only pulls one record, good SQL, ugly code
var g2 = ((EntityCollection<Gizmo>) widget.Gizmos).CreateSourceQuery().First(x => x.Name == "gizmo 99");
Console.WriteLine("=====> Query 3");
// queries on the DB with two terms in the WHERE clause - only pulls one record, good SQL, ugly code
var g3 = wc.Gizmos.First(x => x.Name == "gizmo 99" && x.WidgetId == widget.Id);
Console.WriteLine("=====> Query 4");
// queries on the DB with two terms in the WHERE clause - only pulls one record, also good SQL, ugly code
var g4 = wc.Entry(widget).Collection(x => x.Gizmos).Query().First(x => x.Name == "gizmo 99");
}
Console.ReadLine();
}
Query 1 demonstrates the 'fetch everything and filter' approach that is generated by the natural usage of the entity objects.
Queries 2,3 and 4 above all generate what I would consider to be an efficient SQL query - one that returns a single row and has two terms in the WHERE clause, but they all involve very stilted C# code.
Does anyone have a solution that will allow natural C# code to be written and generate efficient SQL utilization in this case?
I should note that I have tried replacing ICollection with EntityCollection in my Widget object to allow the cast to be removed from the Query 2 code above. Unfortunately this leads to an EntityException telling me that
The object could not be added to the EntityCollection or
EntityReference. An object that is attached to an ObjectContext cannot
be added to an EntityCollection or EntityReference that is not
associated with a source object.
when I try to retrieve any related objects.
Any suggestions appreciated.
Ok, further digging has let me get as close as I think is possible to where I want to be (which, to reiterate, is code that looks OO but generates efficient DB usage patterns).
It turns out that Query2 above (casting the related collection to an EntityCollection) actually isn't a good solution, since although it generates the desired query type against the database, the mere act of fetching the Gizmos collection from the widget is enough to make the entity framework go off to the database and fetch all related Gizmos - i.e. performing the query that I am trying to avoid.
However, it's possible to get the EntityCollection for a relationship without calling the getter of the collection property, as described here http://blogs.msdn.com/b/alexj/archive/2009/06/08/tip-24-how-to-get-the-objectcontext-from-an-entity.aspx. This approach sidesteps the entity framework fetching related entities when you access the Gizmos collection property.
So, an additional read-only property on the Widget can be added like this
public IQueryable<Gizmo> GizmosQuery
{
get
{
var relationshipManager = ((IEntityWithRelationships)this).RelationshipManager;
return (IQueryable<Gizmo>) relationshipManager.GetAllRelatedEnds().First( x => x is EntityCollection<Gizmo>).CreateSourceQuery();
}
}
and then the calling code can look like this
var g1 = widget.GizmosQuery.First(x => x.Name == "gizmo 99");
This approach generates SQL that efficiently fetches only a single row from the database, but depends on the following conditions holding true:
There must be only one relationship from the source to the target type. Having multiple relationships linking a Widget to Gizmos would mean a more complicated predicate would be needed in the .First() call in GizmosQuery.
Proxy creation is enabled for the DbContext and the Widget class is eligible for proxy generation (https://msdn.microsoft.com/en-us/library/vstudio/dd468057%28v=vs.100%29.aspx)
The GizmosQuery property must not be called on objects that are newly created using new Widget() since these will not be proxies and will not implement IEntityWithRelationships. New objects that are valid proxies can be created using wc.Widgets.Create() instead if necessary.
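A brief illustration of that last point, assuming the WidgetContext from the question; Create() returns a change-tracking proxy that implements IEntityWithRelationships, so GizmosQuery is safe to call on it:
using (var wc = new WidgetContext())
{
    // wc.Widgets.Create() builds an EF proxy, unlike "new Widget()".
    var widget = wc.Widgets.Create();
    widget.Name = "proxied widget";
    wc.Widgets.Add(widget);
    wc.SaveChanges();

    // Safe: widget is a proxy, so GizmosQuery can build its source query.
    var gizmos = widget.GizmosQuery.Where(x => x.Name.StartsWith("gizmo")).ToList();
}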

Entity Framework inheritance

SQL Layer:
I have a table
Entity Framework Layer:
I have the following rule: all Offers with State null are Outstanding offers, those with State true are Accepted offers, and those with State false are Declined offers. Also, some fields are used only for Outstanding offers, some only for Accepted offers, etc. I use the Database First approach, so I updated the EF model from the DB, renamed the Offer entity to OfferBase and created 3 child classes:
It works fine for adding/selecting entities/records. Right now I want to "move" an offer from outstanding to accepted, so I need to set State=true (from State is null) for the appropriate record. But how do I do that with Entity Framework? If I try to select an outstanding offer as an accepted offer I get a null reference (and it's clear why):
// record with ID=1 exists, but State is null, so EF cannot find this record and offer will be null after the following line
var offer = (from i in _db.OfferBases.OfType<EFModels.OfferAccepted>() where i.ID == 1 select i).FirstOrDefault();
if I try to select as OfferBase entity I get the following error:
Unable to cast object of type
'System.Data.Entity.DynamicProxies.OfferOutstanding_9DD3E4A5D716F158C6875FA0EDF5D0E52150A406416D4D641148F9AFE2B5A16A'
to type 'VTS.EFModels.OfferAccepted'.
var offerB = (from i in _db.OfferBases where i.ID == 1 select i).FirstOrDefault();
var offer = (EFModels.OfferAccepted)offerB;
ADDED NOTES ABOUT ARCHITECTURE:
I have 3 types of Offer entity. There are: AcceptOffer, DeclineOffer and OutstandingOffer.
AcceptOffer:
UserID
ContactID
Notes
FirstContactDate
LastContactDate
[... and 5-10 the unique fields...]
DeclineOffer:
UserID
ContactID
Notes
[... and 5-10 the unique fields...]
OutstandingOffer:
UserID
ContactID
FirstContactDate
LastContactDate
[... and 5-10 the unique fields...]
How do I do this correctly? Of course, I could select a record, remove it from the DB and add a new one with the appropriate state value, but how is this normally done?
You can't change the type of an object once it's created. Your object model seems wrong.
Either you delete the outstanding offer and create an accepted offer from it (which looks like what you are currently doing), but you may lose relations since you create a new object with a new identity (you can also copy them over before removing the old object); or you keep the same object and change its state.
If you want to keep the same identity, then prefer composition over inheritance.
Your code could look like this:
public class Offer
{
public int Id { get; set; }
public virtual OfferState State { get; set; }
}
public class OfferState
{
public int OfferId { get; set; }
public string Notes { get; set; }
}
public class AcceptedOfferState : OfferState
{
public DateTimeOffset AcceptDate { get; set; }
}
public class DeclinedOfferState : OfferState
{
public DateTimeOffset DeclinedDate { get; set; }
}
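With that model, moving an offer from outstanding to accepted becomes a matter of replacing its State object instead of changing the entity's type; a rough sketch, where the DbSet names and the method are illustrative:
public void AcceptOffer(int offerId)
{
    var offer = _db.Offers.Include(o => o.State).Single(o => o.Id == offerId);

    // Remove the old state row and attach a new one; the Offer keeps its identity.
    if (offer.State != null)
        _db.OfferStates.Remove(offer.State);

    offer.State = new AcceptedOfferState
    {
        OfferId = offerId,
        AcceptDate = DateTimeOffset.UtcNow
    };
    _db.SaveChanges();
}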
If you still want to change the type of the object and keep its identity, then you may use stored procedures, as stated by Noam Ben-Ami (PM owner for EF): Changing the type of an entity.
Rather than trying to add these custom classes to your Entity Framework model, just create them as normal C# classes and then use a projection to convert from the Entity Framework generated class to your own class, e.g.
var acceptedOffers = from i in _db.Offers
where i.ID == 1 && i.Status == true
select new OfferAccepted { AcceptDate = i.AcceptDate, StartTime = i.StartTime /* map all relevant fields here */ };

Best way to query using EF

Using LINQ, I am having trouble querying my DbContext in an efficient way.
The database contains over 700,000 entities which have a date, a name and other information.
In my code, I have a new list of objects (which can potentially have 100,000 elements) coming in, and I would like to query my database and deduce which items are new entities and which are existing entities that need to be updated.
I would like to do it in a very efficient way (with a single query if possible).
This is my code :
public class MyDbContext : DbContext
{
public DbSet<MyEntity> MyEntities { get; set; }
}
public class MyEntity
{
[Key]
public Guid Id { get; set; }
public DateTime Date { get; set; }
public string Name { get; set; }
public double Amount { get; set; }
public string Description { get; set; }
}
public class IncomingInfo
{
public DateTime Date { get; set; }
public string Name { get; set; }
public double Amount { get; set; }
}
public class Modifier
{
public void AddOrUpdate(IList<IncomingInfo> info)
{
using (var context = new MyDbContext())
{
//Find the new information
//to add as new entities
IEnumerable<MyEntity> EntitiesToAdd = ??
//Find the information
//to update in existing entities
IEnumerable<MyEntity> EntitiesToUpdate = ??
}
}
}
Can someone help me construct my query?
Thank you very much.
Edit :
Sorry, I forgot to explain how I consider two entities equal.
They are equal if the Date and the Name properties are identical.
I first tried to build a predicate using LinqKit's PredicateBuilder without much success (I encountered an error about the parameter being too large and had to make multiple queries, which took time).
So far the most successful way I found was to implement a LEFT OUTER JOIN between the incoming list and the DbSet,
which I implemented this way:
var values = info.GroupJoin(context.MyEntities,
inf => inf.Name + inf.Date.ToString(),
ent => ent.Name + ent.Date.ToString(),
(inf, ents) => new { Info = inf, Entities = ents })
.SelectMany(i => i.Entities.DefaultIfEmpty(),
(i, ent) => new { i.Info.Name, i.Info.Amount, i.Info.Date, ToBeAdded = ent == null ? true : false });
IEnumerable<MyEntity> EntitiesToAdd = values.Where(i => i.ToBeAdded)
.Select(i => new MyEntity
{
Id = Guid.NewGuid(),
Amount = i.Amount,
Date = i.Date,
Name = i.Name,
Description = null
}).ToList();
My test contains 700,000 entities in database. The incoming info list contains 70,000 items; where 50,000 are existing entities and 20,000 are new entities.
This query takes around 15 seconds to execute, which does not seem right to me.
Hopefully this is enough information to ask for help. Can someone help me on this?
Thank you very much.
I read the pastebin response from #Leniency and it covers some of the same stuff I was going to say, like querying a date range and doing the comparison there. The problem with that method, though, is that (depending on how those dates are set) it might return all 700K+ records in the database, which would give you the absolute worst performance.
My suggestion is that you analyze your network topology to see how expensive your calls to the database really are. I'm assuming this is running on a (web) server which is receiving these IncomingInfo objects from clients. If this server is closely connected to your database server (or on the same machine) then you might be better off not optimizing your calls to the database.
Also, if you have control over the behavior of the clients, you might want to force them to send only like 25 to 100 records with each request. This would make it so that you could deal with them in much more manageable chunks. The client might have to send 100 or more requests to the server (which you could do async so that they get sent ~5 at a time, depending on expected load profiles), but at least it wouldn't be sitting there for 5+ minutes waiting to get a response back from the server for a single request.
BTW, the GroupJoin call that you said took 15 seconds probably has to download all 700K records before doing the join. You see, joins can't be done on objects that don't exist on the same machine: it either has to send all the IncomingInfo objects (or at least the Name+Date.ToString() concatenations) to the database, or it has to request all the records from the database before any joins can be done. You would probably have to look at the SQL that is being sent to the database to tell which method is being used. But you would likely find that querying the database for matches one at a time would be faster than the join in this case.
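As an illustration of the chunking idea, here is a rough sketch that processes the incoming list in small batches and issues one narrow query per batch; the batch size and the candidate-matching strategy are assumptions, not a tested solution:
public void AddOrUpdate(IList<IncomingInfo> info)
{
    const int batchSize = 100; // roughly the 25-100 range suggested above
    using (var context = new MyDbContext())
    {
        foreach (var batch in info
            .Select((item, index) => new { item, index })
            .GroupBy(p => p.index / batchSize, p => p.item))
        {
            var names = batch.Select(b => b.Name).Distinct().ToList();
            var dates = batch.Select(b => b.Date).Distinct().ToList();

            // One database round trip per batch; Contains translates to an IN clause.
            var candidates = context.MyEntities
                .Where(e => names.Contains(e.Name) && dates.Contains(e.Date))
                .ToList();

            foreach (var item in batch)
            {
                var existing = candidates.FirstOrDefault(
                    e => e.Name == item.Name && e.Date == item.Date);
                if (existing == null)
                    context.MyEntities.Add(new MyEntity
                    {
                        Id = Guid.NewGuid(),
                        Name = item.Name,
                        Date = item.Date,
                        Amount = item.Amount
                    });
                else
                    existing.Amount = item.Amount; // update whatever needs updating
            }
        }
        context.SaveChanges();
    }
}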
Hope that helps! ;)

Optimizing Repository’s SubmitChanges Method

I have the following repository. I have a mapping between LINQ to SQL generated classes and domain objects using a factory.
The following code works, but I am seeing two potential issues:
1) It uses a SELECT query before the UPDATE statement.
2) It needs to update all the columns (not only the changed columns), because we don't know which columns were changed in the domain object.
How can I overcome these shortcomings?
Note: There can be scenarios (like triggers) which are executed based on a specific column update, so I cannot update a column unnecessarily.
REFERENCE:
LINQ to SQL: Updating without Refresh when “UpdateCheck = Never”
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=113917
CODE
namespace RepositoryLayer
{
public interface ILijosBankRepository
{
void SubmitChangesForEntity(DomainEntitiesForBank.IBankAccount iBankAcc);
}
public class LijosSimpleBankRepository : ILijosBankRepository
{
private IBankAccountFactory bankFactory = new MySimpleBankAccountFactory();
public System.Data.Linq.DataContext Context
{
get;
set;
}
public virtual void SubmitChangesForEntity(DomainEntitiesForBank.IBankAccount iBankAcc)
{
//Does not get help from automated change tracking (due to mapping)
//Selecting the required entity
DBML_Project.BankAccount tableEntity = Context.GetTable<DBML_Project.BankAccount>().SingleOrDefault(p => p.BankAccountID == iBankAcc.BankAccountID);
if (tableEntity != null)
{
//Setting all the values to update (except the primary key)
tableEntity.Status = iBankAcc.AccountStatus;
//Type Checking
if (iBankAcc is DomainEntitiesForBank.FixedBankAccount)
{
tableEntity.AccountType = "Fixed";
}
if (iBankAcc is DomainEntitiesForBank.SavingsBankAccount)
{
tableEntity.AccountType = "Savings";
}
Context.SubmitChanges();
}
}
}
}
namespace DomainEntitiesForBank
{
public interface IBankAccount
{
int BankAccountID { get; set; }
double Balance { get; set; }
string AccountStatus { get; set; }
void FreezeAccount();
}
public class FixedBankAccount : IBankAccount
{
public int BankAccountID { get; set; }
public string AccountStatus { get; set; }
public double Balance { get; set; }
public void FreezeAccount()
{
AccountStatus = "Frozen";
}
}
}
If I understand your question, you are being passed an entity that you need to save to the database without knowing what the original values were, or which of the columns have actually changed.
If that is the case, then you have four options
You need to go back to the database to see the original values, i.e. perform the select, as your code is doing. This allows you to set all your entity values, and LINQ to SQL will take care of which columns have actually changed. So if none of your columns have actually changed, then no update statement is triggered.
You need to avoid the select and just update the columns. You already know how to do this (but for others, see this question and answer). Since you don't know which columns have changed, you have no option but to set them all. This will produce an update statement even if no columns have actually changed, and this can fire any database triggers. Apart from disabling the triggers, about the only thing you can do here is make sure that the triggers are written to check the old and new column values, to avoid any further unnecessary updates.
You need to change your requirements/program so that you receive both the old and the new entity values, so you can determine which columns have changed without going back to the database.
Don't use LINQ for your updates. LINQ stands for Language Integrated Query and it is (IMHO) brilliant at querying, but I have always looked on the updating/deleting features as an extra bonus, not something it was designed for. Also, if timing/performance is critical, then there is no way that LINQ will match properly hand-crafted SQL (a sketch follows after this list).
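For that last option, a targeted hand-written update through the same DataContext might look roughly like this; the table and column names are assumptions based on the code above:
// Updates only the columns you name, with no prior SELECT; bypasses change tracking.
Context.ExecuteCommand(
    "UPDATE BankAccount SET Status = {0} WHERE BankAccountID = {1}",
    iBankAcc.AccountStatus,
    iBankAcc.BankAccountID);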
This isn't really a DDD question; from what I can tell you are asking:
Use linq to generate direct update without select
where the accepted answer was that no, it's not possible, but there's a higher-voted answer that suggests you can attach an object to your context to initiate the data context's change tracking.
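The Attach-based approach that answer describes looks roughly like this in LINQ to SQL; note it requires either a version/timestamp column or UpdateCheck = Never on the mapped columns, and the mapping below is a sketch, not tested code:
public virtual void SubmitChangesForEntity(DomainEntitiesForBank.IBankAccount iBankAcc)
{
    var tableEntity = new DBML_Project.BankAccount
    {
        BankAccountID = iBankAcc.BankAccountID,
        Status = iBankAcc.AccountStatus,
        AccountType = iBankAcc is DomainEntitiesForBank.FixedBankAccount ? "Fixed" : "Savings"
    };

    // Attach as modified: LINQ to SQL generates an UPDATE without a prior SELECT.
    Context.GetTable<DBML_Project.BankAccount>().Attach(tableEntity, true);
    Context.SubmitChanges();
}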
Your second point about disabling triggers has been answered here and here. But as others have commented, do you really need the triggers? Should you not be controlling these updates in code?
In general I think you're looking at premature optimization. You're using an ORM, and as part of that you're trusting L2S to make the database plumbing decisions for you. But remember that, where appropriate, you can use stored procedures to execute specific SQL.
