I've got a list of Individual entity objects for an employee survey app; an Individual represents an employee or an outside rater. An Individual has the parent objects Team and Team.Organization, and the child objects Surveys and Surveys.Responses. Responses, in turn, are related to Questions.
So usually, when I want to check the complete information about an Individual, I need to fetch Individuals.Include(Team.Organization).Include(Surveys.Responses.Question).
That's obviously a lot of includes, and has a performance cost, so when I fetch a list of Individuals and don't need their related objects, I don't bother with the Includes... but then the user wants to manipulate an Individual. So here's the challenge. I seem to have 3 options, all bad:
1) Modify the query that downloads the big list of Individuals to .Include(Team.Organization).Include(Surveys.Responses.Question). This gives it bad performance.
2) Individuals.Load(), TeamReference.Load(), OrganizationReference.Load(), Surveys.Load(), (and iterate through the list of Surveys and load their Responses and the Responses' Questions).
3) When a user wishes to manipulate an Individual, I drop that reference and fetch a whole brand new Individual from the database by its primary key. This works, but is ugly because it means I have two different kinds of Individuals, and I can never use one in place of the other. It also creates ugly problems if I'm iterating across a list repeatedly, as it's tricky to avoid loading and dropping the fully-included Individuals repeatedly, which is wasteful.
Is there any way to say
myIndividual.Include("Team.Organization").Include("Surveys.Responses.Question");
with an existing Individual entity, instead of taking approach (3)?
That is, is there any middle-ground between "fetch everything from the database up-front" and "late-load one relationship at a time"?
Possible solution that I'm hoping I could get insight about:
So there's no way to do a manually-implemented explicit load on a navigational-property? No way to have the system interpret
Individual.Surveys = from survey in MyEntities.Surveys.Include("Responses.Question")
where survey.IndividualID == Individual.ID
select survey; //Individual.Surveys is the navigation collection property holding Surveys on the Individual.
Individual.Team = from team in MyEntities.Teams.Include("Organization")
where team.ID == Individual.TeamID
select team;
as just loading Individual's related objects from the database instead of being an assignment/update operation? If this means no actual change in X and Y, can I just do that?
I want a way to manually implement a lazy or explicit load that doesn't do it the dumb (one relation at a time) way. Really, the Teams and Organizations aren't the problem, but the Survey.Responses.Questions are a massive buttload of database hits.
I'm using 3.5, but for the sake of others (and when my project finally migrates to 4) I'm sure responses relevant to 4 would be appreciated. In that context, similar customization of lazy loading would be good to hear about too.
edit: Switched the alphabet soup to my problem domain, edited for clarity.
Thanks
The Include statement is designed to do exactly what you're hoping to do. Having multiple includes does indeed eager load the related entities.
Here is a good blog post about it:
http://thedatafarm.com/blog/data-access/the-cost-of-eager-loading-in-entity-framework/
In addition, you can use strongly typed "Includes" using some nifty ObjectContext extension methods. Here is an example:
http://blogs.microsoft.co.il/blogs/shimmy/archive/2010/08/06/say-goodbye-to-the-hard-coded-objectquery-t-include-calls.aspx
I have a question about saving a list of objects in ASP.NET MVC.
First, I'm not using Entity Framework or NHibernate as an ORM tool, just ADO.NET.
Suppose I have an object Product, and I want to collect all the product data via JavaScript and batch-update the product list in one call.
My question is: when should I differentiate which items are inserted, updated, or deleted?
1) Have an enum property on the DTO object and also on the JavaScript view model. When I add an item to the view model, I mark it as added, and if I change an item, I mark it as updated. When the request reaches the action, I know which items to insert or update.
pros: it's easy on the server side; the server doesn't need to work out each object's status.
cons: if I want to publish this action as a Web API called by third parties, the third-party users would need to track the state of each object themselves.
2) Don't differentiate on the client: just send a list of objects. On the server side, first retrieve the current data from the database, compare it with the submitted data, and then work out which records to insert or update.
pros: all the comparison is done on the server side.
cons: performance issues.
3) Whatever data is passed from the client, just remove the current data and insert the new data.
I hope someone can give me some advice on the best practice for handling this situation. I think it's quite common, but I can't find a good solution.
I've seen option 1, where added/deleted/modified items are maintained in JavaScript arrays and posted back to the server. But I didn't like it, maybe because it means writing client-side code to maintain state.
So I used the second option, and thanks to LINQ the task was easy. Assuming each list item has a unique id, the pseudo code is below. Note: newly added items should have unique random ids, otherwise they might be treated as already existing items. In my case the id is a GUID, so there was no chance of a collision.
var submittedIds = vmList.Select(a=>a.Id).ToList();
var dbIds = dbList.Select(d=>d.Id).ToList();
//Added items
var newIds = submittedIds.Except(dbIds).ToList();
//loop over newIds and construct list object to contain newly added items
//Deleted items
var deletedIds = dbIds.Except(submittedIds).ToList();
//Modified items
var modifiedIds = dbIds.Intersect(submittedIds).ToList();//if the values don't change, update statement won't do any harm here
This approach gives reasonable performance unless you are dealing with huge lists.
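To make the pseudo code above a bit more concrete, here is a minimal self-contained sketch of the same comparison. ProductDto, ChangeSet and ProductDiff are names I made up for illustration, and the stored ids are passed in as a plain collection instead of being read from the database, so treat it as one possible shape and not a finished implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical DTO; only Id matters for the comparison.
public class ProductDto
{
    public Guid Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical result type holding the three groups of changes.
public class ChangeSet
{
    public List<ProductDto> Added { get; set; }
    public List<ProductDto> Modified { get; set; }
    public List<Guid> DeletedIds { get; set; }
}

public static class ProductDiff
{
    // Classifies submitted items by comparing their ids with the ids already stored.
    public static ChangeSet Compare(IEnumerable<ProductDto> submitted, IEnumerable<Guid> storedIds)
    {
        var submittedList = submitted.ToList();
        var stored = new HashSet<Guid>(storedIds);
        var submittedIds = new HashSet<Guid>(submittedList.Select(p => p.Id));

        return new ChangeSet
        {
            // Submitted but not stored yet: insert.
            Added = submittedList.Where(p => !stored.Contains(p.Id)).ToList(),
            // Present on both sides: update (harmless if nothing changed).
            Modified = submittedList.Where(p => stored.Contains(p.Id)).ToList(),
            // Stored but no longer submitted: delete.
            DeletedIds = stored.Where(id => !submittedIds.Contains(id)).ToList()
        };
    }
}
```

The controller action would call ProductDiff.Compare with the posted list and the ids loaded from the database, then run the inserts, updates and deletes inside one transaction. Using hash sets keeps the comparison roughly linear in the list size.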
I think the third option is not good. For example, if you plan to implement audit features on your tables, it will give you wrong results: when one new record is inserted, the audit will show all existing records as deleted and then inserted again, which is wrong because only one record was actually inserted.
The 3rd strategy is suitable for simple situations, e.g. when you want to update the items of a Purchase Order; an Order will not have too many OrderLineItems. However, you have to take care of concurrency issues.
I think your first strategy is the most suitable in the general case. It's also easy to implement. When you publish your service to a 3rd party, it's usual that a client must follow the service definition and requirements.
Update
For the 1st strategy: if you don't want your clients to have to specify a status for their data, then do it for them. You can split the SaveOrder service into smaller services: CreateOrder, UpdateOrder, DeleteOrder.
I'm having a problem with Entity Framework and filtering architecture.
Let's say that I have a couple of related entities, and I want to do some changes to them, based on a filter.
So, for example I have Orders and Orderlines (to put a simple example)
I have order1, with orderline1, orderline2, orderline3 relationships in the DB
Then I receive an update request for order1 but only for orderline1 and orderline3
I get the data from the db using entity framework, which retrieves an objectgraph of the order and its lines.
Is there a way to filter these entity objects so that I can work with an objectgraph that contains order1 and orderline1 and orderline3, but NOT orderline2 without that being a problem later?
Because if I remove orderline2 from the EntityCollection, I later get concurrency errors (or deleted entities, which is something I don't want).
I hope the question is clear, I know that there could be other ways (iterating and not performing updates on orderline2, so it remains the same and no changes are made) but the way the architecture was made doesn't let me do that right now.
If I could say "don't track any more changes to orderline2, just ignore any changes that I do to this particular object and descendants, just leave it in the DB the way it is", so that I can just remove it from the collection and move forward, that'd be perfect
Thanks!
You can go multiple ways as you already described yourself as well:
Iterating through all orderlines and only modifying those that need to be modified (but that isn't an option as you stated)
The alternative you described, to specifically not track changes for orderline2, is not possible in a "normal" EF situation where the ObjectStateManager is responsible for change tracking (as far as I know). In a scenario with Self-Tracking Entities it's easier, because every STE has its own ChangeTracker on board which can easily be switched off.
But the easiest option would be to exclude the order lines you don't want to modify in the "select" statement, i.e. when retrieving the entities. Something like:
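private void ModifyOrderLines(int orderID, List<int> orderlineIds)
{
    using (Context context = new Context())
    {
        // Load only the lines of this order that are part of the update request.
        List<OrderLines> orderlines =
            context.OrderLines
                .Where(orderLine => orderLine.OrderID == orderID && orderlineIds.Contains(orderLine.ID))
                .ToList();
        // Modify the loaded lines here, then call context.SaveChanges().
    }
}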
This assumes you have set up clean foreign key relationships, which were translated into navigation properties in EF. What you do is get the list of OrderLines that belong to a certain order and whose IDs are in your list of order lines that need to be modified.
Afterwards you change the orderlines and apply the changes to the context and call SaveChanges. This is just a basic way of how you could do things. I don't know your exact setup but I hope this helps.
EDIT
Based on your comment, I would just go for the easy way and write a loop as you already proposed. Why not? I don't think there are many alternatives, and if there are, they would overcomplicate things.
So something like this might just work:
ObjectContext.OrderLines.Where(o => orderlineIds.Contains(o.ID)).ToList().ForEach(o => o.SomeProperty = SomeValue);
Or you could just write the loop yourself.
EDIT2
You already mentioned detaching from the ObjectContext in the title of your post. Why not go that way then? You say that you have no control over the ObjectContext you get, that it is passed into several methods, and that you get update requests for certain entities. Detaching the entities that are not needed for the update request can be an option too. Maybe this topic on MSDN might help you decide. Afterwards you might attach the detached objects again, as they may be needed for subsequent "client" calls. But this depends on how you manage the ObjectContext.
Do you keep the ObjectContext "alive" over multiple "client" calls, or do you instantiate it over and over again for specific client calls? The situation is not totally clear to me...
I'm using the PetaPoco mini-ORM, which in my implementation runs stored procedures and maps them to object models I've defined. This works very intuitively for queries that pull out singular tables (i.e. SELECT * FROM Orders), but less so when I start writing queries that pull aggregate results. For example, say I've got a Customers table and Orders table, where the Orders table contains a foreign key reference to a CustomerID. I want to retrieve a list of all orders, but in the view of my application, display the Customer name as well as all the other order fields, i.e.
SELECT
Customers.Name,
Orders.*
FROM
Orders
INNER JOIN Customers
ON Orders.CustomerID = Customers.ID
Having not worked with an ORM of any sort before, I'm unsure of the proper method to handle this sort of data. I see two options right now:
Create a new aggregate model for the specific operation. I feel like I would end up with a ton of models in any large application by doing this, but it would let me map a query result directly to an object.
Have two separate queries, one that retrieves Orders, another that retrieves Customers, then join them via LINQ. This seems a better alternative than #1, but similarly seems obtuse as I am pulling out 30 columns when I desire one (although my particular mini-ORM allows me to pull out just one row and bind it to a model).
Is there a preferred method of doing this, either of the two I mentioned, or a better way I haven't thought of?
Option #1 is common in CQRS-based architectures. It makes sense when you think about it: even though it requires some effort, it maps intuitively to what you are doing, and it doesn't impact other pieces of your solution. So if you have to change it, you can do so without breaking anything elsewhere.
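As a rough illustration of what option #1 looks like, here is a small sketch of a dedicated read model for the order list. OrderListItem and OrderQueries are hypothetical names, and the Total column is invented for the example. In the real application the joined SQL result would be mapped straight onto the read model by the mini-ORM; the in-memory LINQ join below only stands in for that query so the example runs on its own.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public int ID { get; set; }
    public string Name { get; set; }
}

public class Order
{
    public int ID { get; set; }
    public int CustomerID { get; set; }
    public decimal Total { get; set; } // invented column, for illustration only
}

// Read model shaped for one specific view: order fields plus the customer name.
public class OrderListItem
{
    public int OrderID { get; set; }
    public string CustomerName { get; set; }
    public decimal Total { get; set; }
}

public static class OrderQueries
{
    // In-memory stand-in for the INNER JOIN; orders without a matching customer are dropped.
    public static List<OrderListItem> ListOrders(IEnumerable<Order> orders, IEnumerable<Customer> customers)
    {
        return orders
            .Join(customers,
                  o => o.CustomerID,
                  c => c.ID,
                  (o, c) => new OrderListItem { OrderID = o.ID, CustomerName = c.Name, Total = o.Total })
            .ToList();
    }
}
```

The view binds only to OrderListItem, so changing the query or the view later does not touch the Order and Customer models used elsewhere.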
I currently have a repository for just about every table in the database and would like to further align myself with DDD by reducing them to aggregate roots only.
Let’s assume that I have the following tables, User and Phone. Each user might have one or more phones. Without the notion of aggregate root I might do something like this:
//assuming I have the userId in session for example and I want to update a phone number
List<Phone> phones = PhoneRepository.GetPhoneNumberByUserId(userId);
phones[0].Number = "911";
PhoneRepository.Update(phones[0]);
The concept of aggregate roots is easier to understand on paper than in practice. I will never have phone numbers that do not belong to a User, so would it make sense to do away with the PhoneRepository and incorporate phone related methods into the UserRepository? Assuming the answer is yes, I’m going to rewrite the prior code sample.
Am I allowed to have a method on the UserRepository that returns phone numbers? Or should it always return a reference to a User, and then traverse the relationship through the User to get to the phone numbers:
List<Phone> phones = UserRepository.GetPhoneNumbers(userId);
// Or
User user = UserRepository.GetUserWithPhoneNumbers(userId); //this method will join to Phone
Regardless of which way I acquire the phones, assuming I modified one of them, how do I go about updating them? My limited understanding is that objects under the root should be updated through the root, which would steer me towards choice #1 below. Although this will work perfectly well with Entity Framework, this seems extremely un-descriptive, because reading the code I have no idea what I’m actually updating, even though Entity Framework is keeping tab on changed objects within the graph.
UserRepository.Update(user);
// Or
UserRepository.UpdatePhone(phone);
Lastly, assuming I have several lookup tables that are not really tied to anything, such as CountryCodes, ColorsCodes, SomethingElseCodes. I might use them to populate drop downs or for whatever other reason. Are these standalone repositories? Can they be combined into some sort of logical grouping/repository such as CodesRepository? Or is that against best practices.
You are allowed to have any method you want in your repository :) In both of the cases you mention, it makes sense to return the user with the phone list populated. Normally the user object would not be fully populated with all the sub-information (say, all addresses and phone numbers), and we may have different methods for getting the user object populated with different kinds of information. This is in the spirit of lazy loading: only the related data a caller actually needs gets loaded.
User GetUserDetailsWithPhones()
{
// Populate User along with Phones
}
For updating, in this case, the user is being updated, not the phone number itself. Storage model may store the phones in different table and that way you may think that just the phones are being updated but that is not the case if you think from DDD perspective. As far as readability is concerned, while the line
UserRepository.Update(user)
alone doesn't convey what is being updated, the code above it would make it clear. Also, it would most likely be part of a front-end method call that signifies what is being updated.
For the lookup tables, and actually even otherwise, it is useful to have GenericRepository and use that. The custom repository can inherit from the GenericRepository.
public class UserRepository : GenericRepository<User>
{
IEnumerable<User> GetUserByCustomCriteria()
{
}
User GetUserDetailsWithPhones()
{
// Populate User along with Phones
}
User GetUserDetailsWithAllSubInfo()
{
// Populate User along with all sub information e.g. phones, addresses etc.
}
}
Search for Generic Repository Entity Framework and you will find many nice implementations. Use one of those or write your own.
Your example on the Aggregate Root repository is perfectly fine i.e any entity that cannot reasonably exist without dependency on another shouldn't have its own repository (in your case Phone). Without this consideration you can quickly find yourself with an explosion of Repositories in a 1-1 mapping to db tables.
You should look at using the Unit of Work pattern for data changes rather than the repositories themselves as I think they're causing you some confusion around intent when it comes to persisting changes back to the db. In an EF solution the Unit of Work is essentially an interface wrapper around your EF Context.
With regards to your repository for lookup data we simply create a ReferenceDataRepository that becomes responsible for data that doesn't specifically belong to a domain entity (Countries, Colours etc).
If phone makes no sense w/o user, it's an entity (if You care about its identity) or a value object, and it should always be modified through user and retrieved/updated together with it.
Think about aggregate roots as context definers - they draw local contexts but are in global context (Your application) themselves.
If You follow domain driven design, repositories are supposed to be 1:1 per aggregate roots.
No excuses.
I bet these are problems You are facing:
technical difficulties - object-relational impedance mismatch. You are struggling to persist whole object graphs with ease, and Entity Framework kind of fails to help.
domain model is data centric (as opposed to behavior centric). Because of that, You lose knowledge about object hierarchy (the previously mentioned contexts) and magically everything becomes an aggregate root.
I'm not sure how to fix the first problem, but I've noticed that fixing the second one fixes the first well enough. To understand what I mean by behavior centric, give this paper a try.
P.s. Reducing repository to aggregate root makes no sense.
P.p.s. Avoid "CodeRepositories". That leads to data centric -> procedural code.
P.p.p.s Avoid unit of work pattern. Aggregate roots should define transaction boundaries.
This is an old question, but thought worth posting a simple solution.
EF Context is already giving you both Unit of Work (tracks changes) and Repositories (in-memory reference to stuff from DB). Further abstraction is not mandatory.
Remove the Phone DbSet from your context class, as Phone is not an aggregate root.
Use the 'Phones' navigation property on User instead.
static void updateNumber(int userId, string oldNumber, string newNumber)
{
using (MyContext uow = new MyContext()) // Unit of Work
{
DbSet<User> repo = uow.Users; // Repository
User user = repo.Find(userId);
Phone oldPhone = user.Phones.Where(x => x.Number.Trim() == oldNumber).SingleOrDefault();
oldPhone.Number = newNumber;
uow.SaveChanges();
}
}
If a Phone entity only makes sense together with its aggregate root User, then I would also think that adding a new Phone record is the responsibility of the User domain object, through a specific method (DDD behavior). That makes sense for several reasons. The immediate one is that we should check that the User object exists, since the Phone entity depends on its existence, and perhaps keep a transaction lock on it while doing further validation, to ensure no other process has deleted the root aggregate before we are done validating the operation. With other kinds of root aggregates you might want to aggregate or calculate some value and persist it in columns of the root aggregate, for more efficient processing by other operations later on. Note that although I suggest the User domain object should have a method that adds the Phone, this doesn't mean it should know about the database or EF. One of the great features of EF and Hibernate is that they can track changes made to entity classes transparently, and that includes new related entities added through their navigation collection properties.
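A minimal sketch of what such a method on the root could look like, assuming hypothetical AddPhone and RemovePhone methods and a simple "no duplicate numbers" rule that I picked only to have something to validate. It is deliberately persistence-agnostic: how the Phones collection is mapped by the ORM is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Phone
{
    public string Number { get; set; }
}

public class User
{
    private readonly List<Phone> _phones = new List<Phone>();

    public int Id { get; set; }

    // Read-only view; callers must go through AddPhone/RemovePhone to change it.
    public IEnumerable<Phone> Phones
    {
        get { return _phones; }
    }

    // The root validates the change before the child is attached to it.
    public Phone AddPhone(string number)
    {
        if (string.IsNullOrWhiteSpace(number))
            throw new ArgumentException("A phone number is required.", "number");

        string normalized = number.Trim();
        if (_phones.Any(p => p.Number == normalized))
            throw new InvalidOperationException("This user already has that phone number.");

        var phone = new Phone { Number = normalized };
        _phones.Add(phone);
        return phone;
    }

    // Returns true if a phone with that number was found and removed.
    public bool RemovePhone(string number)
    {
        if (number == null) return false;
        string normalized = number.Trim();
        return _phones.RemoveAll(p => p.Number == normalized) > 0;
    }
}
```

Calling code then reads as user.AddPhone("555-0100") followed by a save on the context or repository, which says what is being changed more clearly than a bare Update(user).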
Also, if you want methods that retrieve all phones regardless of the users owning them, you could still do it through the User repository. You only need one method that returns all users as an IQueryable; then you can map them to get all user phones and run a refined query on that. So you don't even need a PhoneRepository in this case. Besides, if I wanted to abstract queries behind methods, I would rather use a class with extension methods for IQueryable that I can use anywhere, not just from a Repository class.
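For example, the extension-method idea could look roughly like this. AllPhones and WithPrefix are names I invented for the sketch, and the User and Phone classes are stripped down to what the queries need. With an ORM-backed IQueryable the same calls would be translated into SQL, provided the provider supports the operators used.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Phone
{
    public string Number { get; set; }
}

public class User
{
    public int Id { get; set; }
    public List<Phone> Phones { get; set; }
}

public static class UserQueryExtensions
{
    // Flattens the users into all phones they own.
    public static IQueryable<Phone> AllPhones(this IQueryable<User> users)
    {
        return users.SelectMany(u => u.Phones);
    }

    // Narrows a phone query to numbers starting with the given prefix.
    public static IQueryable<Phone> WithPrefix(this IQueryable<Phone> phones, string prefix)
    {
        return phones.Where(p => p.Number.StartsWith(prefix));
    }
}
```

Given the IQueryable of users returned by the User repository, users.AllPhones().WithPrefix("555") composes into a single query, and the same extensions can be reused wherever an IQueryable of users is available.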
Just one caveat: to be able to delete Phone entities using only the domain object and not a Phone repository, you need to make sure the UserId is part of the Phone primary key. In other words, the primary key of a Phone record is a composite key made up of UserId and some other property in the Phone entity (I suggest an auto-generated identity). This makes sense intuitively, as the Phone record is "owned" by the User record, and its removal from the User navigation collection would equal its complete removal from the database.
I have a Linq-To-Sql based repository class which I have been successfully using. I am adding some functionality to the solution, which will provide WCF based access to the database.
I have not exposed the generated Linq classes as DataContracts, I've instead created my own "ViewModel" as a POCO for each entity I am going to be returning.
My question is: in order to do updates and take advantage of some of the Linq-To-Sql features like cyclic references from within my Service, do I need to add a Rowversion/Timestamp field to each table in my database so I can use code like dc.Table.Attach(myDisconnectedObject)? The alternative seems ugly:
var updateModel = dc.Table.SingleOrDefault(t => t.ID == myDisconnectedObject.ID);
updateModel.PropertyA = myDisconnectedObject.PropertyA;
updateModel.PropertyB = myDisconnectedObject.PropertyB;
updateModel.PropertyC = myDisconnectedObject.PropertyC;
// and so on and so forth
dc.SubmitChanges();
I guess a RowVersion/TimeStamp column on each table might be the best and least intrusive option - just basically check for that one value, and you're sure whether or not your data might have been modified in the mean time. All other columns can be set to Update Check=Never. This will take care of handling the possible concurrency issues when updating your database from "returning" objects.
However, the other thing you should definitely check out is AutoMapper - it's a great little component to ease those left-right-assignment orgies you have to go through when using ViewModels / Data Transfer Objects by making this mapping between two object types a snap. It's well used, well tested, used by many and very stable - a winner!
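To show the kind of work such a mapper takes off your hands, here is a hand-rolled sketch based on reflection. It is not AutoMapper's API (which is not shown here), just an illustration of copying same-named properties from a view model onto a loaded entity. PropertyCopier, CustomerViewModel and CustomerRecord are hypothetical names.

```csharp
using System;
using System.Reflection;

// Hypothetical view model returned by the service.
public class CustomerViewModel
{
    public int ID { get; set; }
    public string PropertyA { get; set; }
    public int PropertyB { get; set; }
}

// Hypothetical stand-in for the generated Linq-To-Sql entity.
public class CustomerRecord
{
    public int ID { get; set; }
    public string PropertyA { get; set; }
    public int PropertyB { get; set; }
    public byte[] RowVersion { get; set; }
}

public static class PropertyCopier
{
    // Copies every public readable property of source onto the property of
    // target that has the same name, a public setter and a compatible type.
    public static void CopyMatching(object source, object target)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (target == null) throw new ArgumentNullException("target");

        const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance;
        foreach (PropertyInfo sourceProp in source.GetType().GetProperties(flags))
        {
            // Skip write-only properties and indexers.
            if (!sourceProp.CanRead || sourceProp.GetIndexParameters().Length > 0) continue;

            PropertyInfo targetProp = target.GetType().GetProperty(sourceProp.Name, flags);
            if (targetProp == null || targetProp.GetSetMethod() == null) continue;
            if (!targetProp.PropertyType.IsAssignableFrom(sourceProp.PropertyType)) continue;

            targetProp.SetValue(target, sourceProp.GetValue(source, null), null);
        }
    }
}
```

With this, the block of PropertyA/PropertyB/PropertyC assignments collapses into one PropertyCopier.CopyMatching(myDisconnectedObject, updateModel) call before dc.SubmitChanges(). Properties that exist only on the entity, such as the row version, are left untouched.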