This is a general architecture question, aimed at people who are already using EF in production applications.
We have a typical N-Tier application:
WPF Client
WCF Services
EF STE DTOs
EF Data Layer
The application loads all known business types at startup (while the user logs in), then loads a very large "Work Batch" on demand. This batch is around 4-8 MB and is composed of over 1,000 business objects. When we finish loading this "Batch" we link everything with the previously loaded business types, etc.
In the end we have around 2K-5K business objects in memory, all correctly referenced, so we can use and abuse LINQ on the client side. We also do some complex math on all these objects on the client side, so we really need the large graph.
The issue comes when we want to save changes to the Database. With such a large object graph, we hardly want to send over everything again through the Network.
Our current approach, which I dislike given the complexity of the T4 templates so far, is to detach and attach everything on update. We basically want to update a given object, detach it from the rest of the graph, send it over the network, update it on the WCF side, and then reattach it on the client side. The main problem is when you want to update linked objects: let's say you add something that has a reference to something that is also added, then another reference to something modified, etc. This forces a lot of client code to make sure we don't break anything.
All this is done with generated code, so we are talking about 200-800 lines of T4 code per template.
What I'm looking at right now is a way to customize serialization and deserialization of the STEs, so that I can control what is sent over the network and update batches instead of just a single STE. The idea is to check each reference and see whether it is Unchanged: if it is, don't serialize it; if it isn't, serialize it and update everything just by attaching it to the context on the WCF side.
After some studying I found two possible solutions to this problem.
One is by writing a custom DataContractSerializer.
The second one is by changing the STE template created by EF and playing around with the KnownTypeAttribute: instead of generating it for each reference type, have it reference a method that inspects the object and marks for serialization only the references that are not Unchanged.
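The filtering idea behind both options can be sketched independently of EF. The following is a minimal, language-neutral sketch in Python, not STE or DataContractSerializer code: `Entity`, `state`, `refs`, `collect_changed` and `build_payload` are hypothetical names, and it assumes that unchanged references may travel as keys only.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Entity:
    """Hypothetical stand-in for a self-tracking entity."""
    id: int
    state: str = "Unchanged"  # "Unchanged", "Added", "Modified" or "Deleted"
    refs: list = field(default_factory=list)


def collect_changed(root):
    """Walk the graph once (cycles are fine) and return entities that are not Unchanged."""
    seen, changed, stack = set(), [], [root]
    while stack:
        entity = stack.pop()
        if id(entity) in seen:
            continue
        seen.add(id(entity))
        if entity.state != "Unchanged":
            changed.append(entity)
        stack.extend(entity.refs)
    return changed


def build_payload(changed):
    """Changed entities are sent in full; every reference is sent as a key only."""
    return [
        {"id": e.id, "state": e.state, "ref_ids": [r.id for r in e.refs]}
        for e in sorted(changed, key=lambda e: e.id)
    ]


# Sample graph: one unchanged business type, one added and one modified object.
business_type = Entity(1)
added = Entity(2, "Added", [business_type])
modified = Entity(3, "Modified", [added, business_type])
added.refs.append(modified)  # circular reference between the two changed objects
root = Entity(4, refs=[modified, business_type])

payload = build_payload(collect_changed(root))
```

Here only entities 2 and 3 end up in the payload; the unchanged business type and the unchanged root appear at most as ids. Whether the service can reattach such a partial graph safely is exactly the open question of this post.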
Has anyone ever come across this issue before?
What solutions did you use?
What problems did you encounter down the line?
How easy was it to maintain the templates created?
I don't know the whole application design, but if you generally load the work batch into the service and then send it to the client to play with, it looks like the service layer is somewhat unnecessary and you could load the data directly from the database (and you would get much better performance). Depending on the complexity of the computation, you could also do some of it directly in the database, and you would again get much better performance.
Your approach of saving only part of the graph is an abuse of the STE concept. STEs work like this: you load the graph, modify the graph and save the same graph. If you want a big data set for reading but save only small chunks, it is probably better to load the data set for reading and, once you decide to update a chunk, load only that chunk again, modify it and send it back.
Interfering with the internal STE behavior is, IMHO, the best way to lose changes in some corner-case or unexpected scenarios.
Btw. this somehow looks like a scenario for syncing local database with a global one - I have never done that but it is quite common in smart-clients.
I'm new to DDD and I would like your advice.
In my UI I need to show data from 2 aggregates. I'm using EF Core and, as I have read, it's better to keep only one navigation between entities, so as not to mix two aggregates and to avoid serialization issues due to circular references.
How should I make the query?
Do I need to create a new view whenever I need data from 2 aggregates?
If I need to create views, in which layer should a view live? In the infrastructure persistence layer or in the domain?
Thank you
How should I make the query?
With the simplest and fastest technology you can use. I mean: if building the query with EF Core requires several steps and a lot of extra objects, change approach and try a direct SQL request. It's a query, something you can test quickly and change equally quickly whenever you need to.
Do I need to create a new view whenever I need data from 2 aggregates?
You don't. With a view you hide the complexity of the data read inside the view (at the cost of changing the DB every time the data to show changes), with the illusion that you are managing an entity. Of course, it should be clear that the data comes from a view. A query, on the other hand, is more code-related (to change the data shown you just change the query), and it also shows directly that the data comes from several sources.
Note: I used EF Core years ago, and only for a really simple project. If by view you mean a view in EF Core instead, then I would say yes, but only if building it doesn't require several steps/joins to gather the information. I would always consider a direct approach when the code starts to look a bit too complex just to show some data.
Here, anyway, things can go really deep: do you have all your entities (roots) in the same project? Or do you have several microservices? With microservices, how do you share the data and how do you store it? Maybe a query is not viable, or it reads partially stale data. As you can see, there are several things to take into account when you have to read the data.
If I need to create views, in which layer should a view live? In the infrastructure persistence layer or in the domain?
As stated before, if you mean a view within EF Core, I would put it really close to the layer where you're going to use it. But it could depend. You could have a look here.
Personally I use 3 layers: domain, application and infrastructure. My views are in the application layer, because I have several queries that I reuse for different purposes. But before going into the infrastructure (where the requests are) I transform the results into the format required for UI.
DDD is about putting together all the business logic that otherwise is spread around several entities, services and even controllers. With this solution, all the actions that the domain offers could be performed without requiring extra logic outside the domain itself. Of course you need to implement the services that the domain is going to use, this is obvious.
On the other hand, it is clear, at least to me, that reuse is limited to the domain itself. I mean:
I can build a big query that collects a lot of information from different sources and reuse it for several UI views, but I have to be ready to pay the price of something that I have to touch every time something in the UI changes (and I still need to transform the result into a view-related object);
I can build small, specialized queries that I use for 1 or 2 UI views (if they are the same), paying the price of more code to maintain, although that code is simple, specialized and really fast to test (here the query can produce something close or equal to the view-related object).
The second approach is the basis of CQRS, and I prefer that one. Remember, you can do CQRS even without an event store and eventual consistency: you just take part of it, not the whole. We design to simplify our lives, not to make them harder.
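A small, specialized query of the second kind could look like the sketch below. It uses Python and an in-memory SQLite database only to stay runnable; the `customers` and `orders` tables, the `OrderSummary` type and the `order_summaries` function are hypothetical and stand in for two aggregates and one UI view.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderSummary:
    """Read model shaped for one UI view; it is not a domain entity."""
    order_id: int
    customer_name: str
    total: float


def order_summaries(conn):
    """One direct SQL query that reads from both aggregates' tables."""
    rows = conn.execute(
        "SELECT o.id, c.name, o.total "
        "FROM orders o JOIN customers c ON c.id = o.customer_id "
        "ORDER BY o.id"
    )
    return [OrderSummary(*row) for row in rows]


# Throwaway schema and data so the query can be tried immediately.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        total REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Linus');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.5), (12, 1, 9.5);
    """
)
summaries = order_summaries(conn)
```

The query returns objects that are already close to what the view needs, and the domain entities keep their single navigation because the read side never goes through them.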
The Problem
We have an app that stores hierarchical data in a database. We have defined a POCO object which represents a row of data.
The problem is we need certain properties to be dependent on the item's children and others on their ancestors. As an example, if a ((great)grand)child has incomplete state, then implicitly all of its parents are also incomplete. Similarly, if a parent has a status of disabled, then all children should be implicitly disabled as well.
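To make the two rules concrete, here is a minimal in-memory sketch in Python. `Node`, `effectively_incomplete` and `effectively_disabled` are hypothetical names, and the sketch assumes the relevant part of the tree is already loaded, which is precisely what becomes hard once data is loaded lazily.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    name: str
    complete: bool = True
    disabled: bool = False
    parent: Optional["Node"] = field(default=None, repr=False)
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a child and return it, so trees can be built inline."""
        child.parent = self
        self.children.append(child)
        return child

    @property
    def effectively_incomplete(self):
        # Incompleteness bubbles up: any incomplete descendant marks this node.
        return (not self.complete) or any(
            c.effectively_incomplete for c in self.children
        )

    @property
    def effectively_disabled(self):
        # Disabled flows down: any disabled ancestor marks this node.
        node = self
        while node is not None:
            if node.disabled:
                return True
            node = node.parent
        return False


root = Node("root")
child = root.add(Node("child"))
sibling = root.add(Node("sibling"))
grandchild = child.add(Node("grandchild"))

grandchild.complete = False  # makes child and root implicitly incomplete
child.disabled = True        # makes grandchild implicitly disabled
```

With this data, `root.effectively_incomplete` and `grandchild.effectively_disabled` are both true, while `sibling` is affected by neither rule.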
On the database side of things, everything works thanks to triggers. However, the issue we're having is then synching those changes to any in-memory ORM objects that may have been affected.
That's why we think that, to do all of this, we need to ensure there is only ever one model instance in memory for any specific row in the database. That's the crux of the entire problem.
We're currently doing that with triggers in the DB, and one giant hash-set of weak references to the objects keyed on the database's ID for the in-memory ORM objects, but we're not sure that's the proper way to go.
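The weak-reference lookup described above is essentially an identity map. A minimal sketch in Python, using the standard `weakref.WeakValueDictionary`; the `Row` and `IdentityMap` classes and the `fake_loader` function are hypothetical, and the loader stands in for the real database read:

```python
import weakref


class Row:
    """Hypothetical model object for one database row."""

    def __init__(self, row_id, data):
        self.row_id = row_id
        self.data = data


class IdentityMap:
    """Hands out at most one live Row instance per database id."""

    def __init__(self, loader):
        self._loader = loader
        # Entries vanish on their own once nothing else references the Row.
        self._cache = weakref.WeakValueDictionary()

    def get(self, row_id):
        obj = self._cache.get(row_id)
        if obj is None:
            obj = Row(row_id, self._loader(row_id))
            self._cache[row_id] = obj
        return obj


calls = []


def fake_loader(row_id):
    """Stands in for a database read and records every call."""
    calls.append(row_id)
    return {"name": f"row {row_id}"}


rows = IdentityMap(fake_loader)
first = rows.get(7)
second = rows.get(7)  # same instance, no second load
```

Because the map holds only weak references, it never keeps a row alive by itself, so memory use follows what the UI actually holds on to.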
Initial Design
Our 'rookie' design started by loading all objects from the database which quickly blew out the memory, let alone took a lot of time loading data that may never actually be displayed in the UI as the user may never navigate to it.
Attempt 2
Our next attempt expanded on the former by dynamically loading only the levels needed for actual display in the UI, which greatly sped up loading, but now doesn't allow the state of the hierarchy to be polled without several calls to the database.
Attempt 2B
Similar to above, but we added persistent 'implicit status' fields which were updated via triggers in the database. That way if a parent was disabled, a trigger updated all children accordingly. Then the model objects simply refreshed themselves with the latest values from the database. This has the down-side of putting some business logic in the model layer and some in the database triggers as well as making both database writes and reads needed for every operation.
Fully Dynamic
This time we tried to make our models 'dumb' and removed our business layer completely from the code, moving that logic entirely to the database. That way there was only single-ownership of the business rules. Plus, this guaranteed bad data couldn't be inserted into the database in the first place. However, here too we needed to constantly poll the database for the 'current' values, meaning some logic did have to be built in to know which objects needed to be refreshed.
Fully Dynamic with Metadata
Similar to above, but all write calls to the database returned an update token that told the models if they had to refresh any loaded parents or children.
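A rough sketch of the update-token idea, in Python with a fake in-memory database. `UpdateToken`, `FakeDb` and `refresh_stale` are hypothetical names, and only the downward (disabled) rule is shown:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateToken:
    """Returned by every write: ids whose derived values may have changed."""
    stale_ids: frozenset


class FakeDb:
    def __init__(self, parents):
        self.parents = parents  # child id -> parent id (None for the root)
        self.disabled = {node_id: False for node_id in parents}

    def descendants(self, node_id):
        found, frontier = set(), [node_id]
        while frontier:
            current = frontier.pop()
            for child, parent in self.parents.items():
                if parent == current and child not in found:
                    found.add(child)
                    frontier.append(child)
        return found

    def read(self, node_id):
        """Implicit status: disabled if the node or any ancestor is disabled."""
        current = node_id
        while current is not None:
            if self.disabled[current]:
                return {"implicit_disabled": True}
            current = self.parents[current]
        return {"implicit_disabled": False}

    def set_disabled(self, node_id, value):
        self.disabled[node_id] = value
        return UpdateToken(frozenset(self.descendants(node_id)))


def refresh_stale(token, loaded, db):
    """Re-read only the models that are both loaded and reported stale."""
    refreshed = sorted(token.stale_ids.intersection(loaded))
    for node_id in refreshed:
        loaded[node_id] = db.read(node_id)
    return refreshed


db = FakeDb({1: None, 2: 1, 3: 2, 4: 1})
loaded = {node_id: db.read(node_id) for node_id in (1, 3, 4)}
token = db.set_disabled(2, True)
refreshed = refresh_stale(token, loaded, db)
```

Only node 3 is re-read here: it is the one loaded model below node 2. Nodes 1 and 4 are loaded but not stale, so they cost no extra round trip.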
I'm hoping to get some feedback from the SO community on how to solve this issue.
Essentially, we have a database with a recurring template pattern and instances of this template. Templates live indefinitely, while the instances are bound in time. One group of users work only with templates and one group of users work only with "answer" entities connected to the instances. When a change is made to the template, the instances that are currently active automatically receive the changes from the templates (including cloning related entities or bringing existing clones into sync), while older instances are left alone "as you left them", which is an absolute requirement in order to not retroactively change history. When you go back to 2013, you want to see the data that was current as of the last change in 2013, not anything newer. Thus the cloning.
This all sounds good, except that making the clone involves cloning an involved graph of entities, sometimes including many-to-many relationships. Making sure that the information of the just-updated version of the template is used involves passing around that specific as-yet-unsaved entity object or saving at every step, forgetting all objects and making a new context every time. This code is hard to write, harder to get right and a nightmare to maintain.
I have desperately been looking for suitable literature about this and have been unable to even find something written up about the database modelling pattern (or for that matter better alternatives), never mind what to do in EF to work as efficiently as possible. Am I missing something, or is this just a case of it being a problem with inherent complexity?
There is nothing built in to help with this specific scenario. I'd consider a solution based on reflection and on the Entity Framework metadata model to automate a lot of this. That makes it easier to get right as well.
Cloning a graph of objects should be automatable and has little inherent complexity. But if you want to clone only specific parts, I can see complexity creeping in easily. That's likely going to be inherent complexity. On the other hand, if you find yourself writing the same cloning code and copy loops all over the place, that's a missed abstraction and is artificial complexity.
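As an illustration of how mechanical a full graph clone can be, here is a sketch in Python built on the standard `copy.deepcopy`. The `Tag`, `Question` and `Template` classes and the `clone_graph` helper are hypothetical, and the sketch ignores database keys and change tracking, which a real EF solution would have to handle.

```python
import copy


class Tag:
    def __init__(self, name):
        self.name = name


class Question:
    def __init__(self, text, tags):
        self.text = text
        self.tags = tags  # many-to-many: several questions can share a tag


class Template:
    def __init__(self, name, questions):
        self.name = name
        self.questions = questions


def clone_graph(root, keep_shared=()):
    """Deep-copy everything reachable from root, except objects in keep_shared.

    deepcopy consults its memo (keyed by object id) before copying, so
    pre-seeding it makes the clone point at the original shared objects.
    """
    memo = {id(obj): obj for obj in keep_shared}
    return copy.deepcopy(root, memo)


urgent = Tag("urgent")
template = Template(
    "survey",
    [Question("Q1", [urgent]), Question("Q2", [urgent])],
)

full_clone = clone_graph(template)
partial_clone = clone_graph(template, keep_shared=[urgent])
```

In `full_clone` both questions share one new copy of the tag, so the shape of the graph is preserved. In `partial_clone` they still point at the original `urgent` object. Deciding which objects belong in `keep_shared` is where the inherent complexity lives.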
Making sure that the information of the just-updated version of the template is used involves passing around that specific as-yet-unsaved entity object or saving at every step, forgetting all objects and making a new context every time.
I did not quite understand what you mean here. But talking about multiple contexts makes me very alert because that's a common anti-pattern. Normally, you want to have one context per logical unit of work. Often, that UOW is an HTTP request or a WCF request or a user interaction. When all entities are part of the same context many issues go away.
Also, it's not necessary to keep objects unsaved. Generally, the database should be synchronized with the in-memory entity state. So when you create fresh objects as part of your template cloning procedure there should be no reason to not save them. It's not necessary to save after each new entity. For performance reasons try not to save too often.
If you elaborate more on specific issues I can add commentary.
Sounds daft, I know, but I want to do something a bit out of the ordinary ...
Essentially I'm looking to build a solution that has a WCF data service at the back end (or something of that ilk at least) that allows me to query my database using simple URL syntax.
The problem I have is that when my db schema changes I have to recompile the entire back end, and that's not good because the solution I'm building allows the definition of "entities", so to speak.
Essentially what i want to do is have the model update every time the db updates ... as a sort of triggered event.
I'm thinking that EF won't do this which leads me to my actual question ...
How would you solve this problem?
I need exactly what a wcf data service offers out of the box ... just with a more dynamic data model beneath it.
You need to change the O/RM to something more dynamic ... something like Massive could be used instead of EF.
Someone looks to be doing similar with WebWCF ... Massive with WCF Web Api to return dynamic types/Expandos?.
If you use data services then you'd need to figure out some way to represent the Massive as a 'DataContext'. WebWCF on the other hand would serialise dynamic objects as a lump of JSON or XML where required.
The problem with your proposed approach is one where the Web Service contract is dynamic and not versioned. This means that if you delete/rename/change a field you essentially have created a change to the 'Contract' that the clients use to consume the web service. This can lead to a client breaking unless updated at the same time.
If you are looking for a low-friction way of managing model changes and updating the database, I have found that EF Code First 4.2 and EF Migrations work pretty well for me. EF Migrations 0.7.0.1 is reasonably stable, and both are available from NuGet.
background: we've got a number of server processes and client apps that are used entirely internally, in a fairly controlled environment. we capture a significant amount of data every day that goes into a couple database machines. most everything is c#, with a few c++ apps.
just about every app has some basic (if not extensive) dependence on database data, whether it's for historical data, daily-calculated values, or assorted parameters. as the whole environment has gotten a bit more sprawling, I've been wondering about the sense in sticking an intermediary in between all client and server apps and the database, a sort of "database data broker". any app that needs values from the db makes a request to the data broker, instead of a dll wrapper function that calls a stored proc.
one immediate downside is that the data would make two trips across the network: from db to broker, and from broker to calling app. seems like poor form, but the amount of data would be small enough in each request that I'm ok with it as far as performance goes.
one (seeming) upside is that it would be trivial to set up a test environment, as it would entail just setting up a test data broker, and there's no maintaining of db connection strings locally anywhere else. also, I've been pondering creating a mini request language so you wouldn't have to enumerate functions for each dataset you might request (instead of GetX() and GetY(), there would be Get("name = X")).
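A mini request language of that kind could start as small as the sketch below. It is written in Python purely for illustration; the `DataBroker` class, the `parse_request` function, the `" and "` separator and the registered dataset names are all assumptions, not part of the original design.

```python
def parse_request(text):
    """Parse 'key = value and key = value' into a dict of criteria."""
    criteria = {}
    for clause in text.split(" and "):
        key, sep, value = clause.partition("=")
        if not sep or not key.strip() or not value.strip():
            raise ValueError(f"malformed clause: {clause!r}")
        criteria[key.strip()] = value.strip()
    return criteria


class DataBroker:
    """Routes a textual request to whichever provider serves the named dataset."""

    def __init__(self):
        self._providers = {}

    def register(self, name, provider):
        self._providers[name] = provider

    def get(self, request):
        criteria = parse_request(request)
        name = criteria.pop("name", None)
        if name not in self._providers:
            raise KeyError(f"unknown dataset: {name!r}")
        # Remaining criteria are passed to the provider as keyword arguments.
        return self._providers[name](**criteria)


broker = DataBroker()
broker.register("X", lambda: [1, 2, 3])
broker.register("daily", lambda date: {"date": date, "value": 42})

x_values = broker.get("name = X")
daily = broker.get("name = daily and date = 2012-01-05")
```

A test broker would register fake providers under the same names, which is where the test-environment benefit mentioned above would come from.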
am I over-engineering this, or is it possibly a worthy architecture?
edit: thanks for all the great comments so far, great food for thought.
It depends on what you're trying to accomplish with it. According to Rocky Lhotka, you should only add a tier if you are forced to, kicking and screaming all the way.
I agree with him: don't tier unless you need to. I think there are valid reasons to add additional tiers, usually for purposes of security, scalability and maintainability. The question becomes: is yours a valid reason?
It looks like the major reason is maintainability. Does it outweigh the benefits you get by not having the tier?
only you can answer these:
what are the benefits of doing this?
what are the problems/risks of doing this?
do you need this to make testing easier or even possible?
if you make this change and it crashes when it goes live, will you be fired?
if you make the changes and it goes live will you get a promotion?
etc...
As the former architect of a system that also used a database heavily as a "hub," I can say that there are several drawbacks that you should be aware of. Our system used databases:
As a transaction store (typical OLTP stuff)
As a staging queue (submitted but unprocessed transactions)
As a historical data store (results of processed transactions)
As an interoperation layer (untranslated commands or transactions issued from other systems)
One of the major drawbacks is ownership costs. When your databases become the single point of failure for so many types of operations, it becomes necessary to ensure that they are all hosted in high-availability environments. This is not only expensive from a hardware perspective, but it is also expensive to support deployments to HA environments, since developers typically have very limited visibility into the internals.
A second drawback is that you have to seriously design integrity in to all of your tables. In a typical SOA environment, you have complete control over how data is modified. When you expose it through database tables, you must consider that any application with the right credentials will have the ability to modify data. Because of this, you must carefully consider utilitarian implementations of constraints. If you had a single service managing persistence, you could be much looser in constraints on the database and enforce them in code.
Third, if you ever want to expose any functionality that the database tables currently allow you to provide to outside parties, you must write service code anyway, so you might be better served doing it strategically as opposed to reacting to requests.
Fourth, UI interaction directly with the data layer creates security risks, especially if the client is a thick client.
Finally, writing code that responds to events (service calls) is much easier than polling code. Typically, organizations that rely heavily on database polling end up reinventing the wheel every time a new project requires a new "monitoring service." It can be avoided by creating a "framework," but those have their own pitfalls (primarily around prescription versus adoption).
This is just a laundry list of problems I have encountered. It's not necessarily meant to dissuade you from using databases for these functions, but it helps to know the dangers ahead of time so you can at least plan for them if they ever do become issues.
EDIT
Just thought of another scenario that caused us pains. Versioning your changes can be difficult. For example, if you need to change the shape of a table (normalize/denormalize), it has a cascading effect if multiple applications rely on it. In a SOA scenario, it is much easier, because you can keep your old API, change the internal interaction so that it works with the changed tables, and allow consumers to migrate to the new version on their own schedule.
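The versioning point can be illustrated with a small Python sketch. All names here (`StorageV2`, `get_person_v1`, `get_person_v2`) are hypothetical: the storage has been normalized into two tables, the old API keeps returning the flat record that existing consumers expect, and the new API exposes the new shape.

```python
class StorageV2:
    """New internal shape: the city was normalized out of the people table."""

    def __init__(self):
        self.people = {1: {"name": "Ada", "city_id": 10}}
        self.cities = {10: {"name": "London"}}


def get_person_v1(storage, person_id):
    """Old contract: a flat record, unchanged for existing consumers."""
    person = storage.people[person_id]
    city = storage.cities[person["city_id"]]
    return {"id": person_id, "name": person["name"], "city": city["name"]}


def get_person_v2(storage, person_id):
    """New contract: the city is a nested record with its own id."""
    person = storage.people[person_id]
    city_id = person["city_id"]
    return {
        "id": person_id,
        "name": person["name"],
        "city": {"id": city_id, "name": storage.cities[city_id]["name"]},
    }


storage = StorageV2()
old_shape = get_person_v1(storage, 1)
new_shape = get_person_v2(storage, 1)
```

Applications that read the table directly would all have had to change on the same day; behind a service, only the body of `get_person_v1` changes.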
A data broker sounds like a really good way to abstract out the multiple data sources for your apps. It would be easy to consolidate, change repositories, or otherwise move data around if needed in the future.
I may be misunderstanding something, but it seems to me that you should consider some kind of entity framework. That is a framework you can use to "map" your interaction with the db to some domain objects. That way you work locally on domain objects that get filled from your db, and when it is time to persist the state of your objects to the database, the framework handles all the connections back and forth. In this way you can also easily mock up these domain objects for unit testing without needing a db connection.
Check out NHibernate for a good entity framework alternative.
If you already have the database-related know-how, I think it's not a bad decision.
Good things that I can think of:
if the data model is consistent you can plug in new tools easily without making any changes in the other apps.
maybe you can run the database more reliably than your apps, so if one of them fails, the others can keep working.
you can make backups and rollbacks using the database tools.
you can do emergency fixes manipulating the data directly with sql or some visual tool.
But if you have to learn new frameworks along the way, maybe the benefits are not worth the extra initial effort.
"any app that needs values from the db makes a request to the data broker"
When database technology was being invented over 40 years ago, the people doing that inventing had ideas along the lines of "any app that needs values from the db makes a request to the dbms".
Have you ever pondered the possibility that YOU ALREADY HAVE a "data broker", and that there might be very little added value in creating a second one of your own ?