Using multiple DbContexts in the same database with EF 6 - C#

I've seen a few answers that get in the general area of this question, but none that directly gets at what I'm seeing. I've inherited a recent code base with a charter to make the code run faster. It looks to me like each of the tables in the database (one database) is given its own DbContext. These tables are highly relational, and the major issue I keep seeing is that decisions need to be made based on foreign key relationships between tables. With this design approach, every time two tables are involved, the developer nests using statements for each context (good on him for auto cleanup, right), grabs the data from table A, then loops through table B row by row to find matches (usually we're interested in the non-matches so that we can just update table B), and then proceeds. I have certainly seen multiple contexts used in an app, but I've never seen this "pattern" and cannot figure out why one would use it. Before I rewrite this part of the app so that each table is in the same context, I felt I should do a sanity/reality check to make sure that there isn't some really bad monster waiting for me. So far, I cannot find anything.

Here's a pseudo snippet (exact code with table and column name changes):
var beers = from b in beerContext.AllBeersOffered
            where b.CurrentStock == 1
            select b;

foreach (var beer in beers)
{
    Bars bars = barContext.AllBeersCarried.FirstOrDefault(x => x.BeerId == beer.Id);
    if (bars == null)
    {
        // Create a list of beers offered but not carried in that bar,
        // which is then added to another table.
    }
}
Again, my inclination is very much Whiskey-Tango-Foxtrot: move the tables to one DbContext and handle this in one insert, but that requires assuming a deeper level of really, really bad coding than I'm comfortable making. Obviously, the original programmer is long gone from the business and, we think, from the profession entirely. But he apparently was a pretty experienced developer (Java, though), so I don't like to assume incompetence when it could be something I'm missing. My question is: could you justify this design, and if so, what should I look for?

This is an anti-pattern, big time!
I've seen beginners do this, esp. when coming from a DAO background, in which each entity has its own data-access mirror.
It thoroughly defeats the purpose of a mature OR mapper like Entity Framework, which is designed to cross the chasm between a relational model and an object model. Such OR mappers are designed especially to map associations in tabular form (child refers to parent) to associations in object-oriented form (parent owns children).
So yes, go ahead and grab related entities into contexts that represent natural aggregates in the application.
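For illustration, here is a minimal sketch of what the same check could look like once both tables live in one context (the context name is hypothetical; the entity and property names are taken from the snippet in the question):

using (var context = new BreweryContext())   // hypothetical single context exposing both sets
{
    // Beers offered and in stock that no bar carries, resolved by the database in one query.
    var beersNotCarried = context.AllBeersOffered
        .Where(b => b.CurrentStock == 1)
        .Where(b => !context.AllBeersCarried.Any(c => c.BeerId == b.Id))
        .ToList();

    // beersNotCarried can then be written to the other table with a single SaveChanges call.
}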

I would recommend making coarse-grained DbContexts, but not moving the entire schema into one DbContext.
You can follow the DDD practice, where one important table is taken as the root, and then it goes together with a couple of related tables that strongly depend on it. That structure is called an aggregate. In my designs, I usually make one DbContext per aggregate.
On the other hand, operations should be redefined where needed, so that each operation only deals with one aggregate. It may pick IDs from other aggregates and make queries to collect all the data it needs, but it then makes changes to only one aggregate. That approach to design leads to great simplifications in code.
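As a rough illustration of the idea, here is a sketch of two aggregate-scoped contexts (all class and property names are hypothetical):

using System.Data.Entity;

public class Order { public int Id { get; set; } }
public class OrderLine { public int Id { get; set; } public int OrderId { get; set; } }
public class OrderDiscount { public int Id { get; set; } public int OrderId { get; set; } }
public class Customer { public int Id { get; set; } }
public class CustomerAddress { public int Id { get; set; } public int CustomerId { get; set; } }

// One context per aggregate: the Order root plus the tables that strongly depend on it.
public class OrderContext : DbContext
{
    public DbSet<Order> Orders { get; set; }
    public DbSet<OrderLine> OrderLines { get; set; }
    public DbSet<OrderDiscount> OrderDiscounts { get; set; }
}

// A different aggregate gets its own, separate context; an operation that changes
// orders may read customer IDs, but it never modifies Customer rows.
public class CustomerContext : DbContext
{
    public DbSet<Customer> Customers { get; set; }
    public DbSet<CustomerAddress> CustomerAddresses { get; set; }
}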

Related

Is there a strategy or design pattern for making large-scale graph clone operations in Entity Framework/EF Core?

Essentially, we have a database with a recurring template pattern and instances of this template. Templates live indefinitely, while the instances are bound in time. One group of users works only with templates and another group works only with "answer" entities connected to the instances. When a change is made to the template, the instances that are currently active automatically receive the changes from the templates (including cloning related entities or bringing existing clones into sync), while older instances are left alone "as you left them", which is an absolute requirement in order to not retroactively change history. When you go back to 2013, you want to see the data that was current as of the last change in 2013, not anything newer. Thus the cloning.
This all sounds good, except that making the clone involves cloning an involved graph of entities, sometimes including many-to-many relationships. Making sure that the information of the just-updated version of the template is used involves passing around that specific as-yet-unsaved entity object or saving at every step, forgetting all objects and making a new context every time. This code is hard to write, harder to get right and a nightmare to maintain.
I have desperately been looking for suitable literature about this and have been unable to even find something written up about the database modelling pattern (or for that matter better alternatives), never mind what to do in EF to work as efficiently as possible. Am I missing something, or is this just a case of it being a problem with inherent complexity?
There is nothing built in to help with this specific scenario. I'd consider a solution based on reflection and on the entity framework metadata model to automate a lot of this. That makes it easier to get right as well.
Cloning a graph of objects should be automatable and has little inherent complexity. But if you want to clone only specific parts, I can see complexity creep in easily. That's likely going to be inherent complexity. On the other hand, if you find yourself writing the same cloning code and copy loops all over the place, that's a missed abstraction and artificial complexity.
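As one possible starting point, here is a minimal reflection-based sketch that copies only scalar properties and leaves the graph-walking policy to the caller (the Entity Framework metadata model could be consulted instead to decide which navigation properties to follow; this is an illustration, not EF-specific code):

using System.Linq;

public static class EntityCloner
{
    // Copies value-type and string properties onto a fresh instance.
    // Navigation properties are deliberately skipped; the caller decides
    // which parts of the graph to walk and clone.
    public static T CloneScalars<T>(T source) where T : class, new()
    {
        var clone = new T();
        var scalarProperties = typeof(T).GetProperties()
            .Where(p => p.CanRead && p.CanWrite &&
                        (p.PropertyType.IsValueType || p.PropertyType == typeof(string)));

        foreach (var property in scalarProperties)
        {
            property.SetValue(clone, property.GetValue(source, null), null);
        }
        return clone;
    }
}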
Making sure that the information of the just-updated version of the template is used involves passing around that specific as-yet-unsaved entity object or saving at every step, forgetting all objects and making a new context every time.
I did not quite understand what you mean here. But talking about multiple contexts makes me very alert because that's a common anti-pattern. Normally, you want to have one context per logical unit of work. Often, that UOW is an HTTP request or a WCF request or a user interaction. When all entities are part of the same context many issues go away.
Also, it's not necessary to keep objects unsaved. Generally, the database should be synchronized with the in-memory entity state. So when you create fresh objects as part of your template cloning procedure there should be no reason to not save them. It's not necessary to save after each new entity. For performance reasons try not to save too often.
If you elaborate more on specific issues I can add commentary.

Should you have one-database-to-rule-them-all setup or separated database for each bounded context?

DDD, as far as I understand it, helps or guides you in structuring a complex application. Now, in an application, you should identify your Bounded Contexts. Say you have more than 10 BCs.
I read somewhere (forgive me, I cannot give any links) that it's not ideal to have one big database for a complex application, and that it should instead be separated per BC, if that's the easier route to take. How should one structure an app if each BC has its own database?
I tried searching on GitHub but could not find an example.
It depends on whether they only share the same database or also some tables - i.e. data.
Sharing a database but not tables can be perfectly fine. Except if you aim for scalability and intend to make your BC's independently deployable and runnable units like microservices, in which case they should probably have their own data store instance.
I see a few more drawbacks to database tables shared by 2 or more Bounded Contexts:
Tight coupling. The reason we have distinct BC's is that they represent different domain spaces that are likely to diverge their own way. Changing a concept in one of the BC's might impact the underlying table, forcing the other BC's that use this table to change as well. You get rigidity where there should be suppleness. You might also have inconsistencies or "holes" in the data due to the multiple possible sources of change.
Concurrency. In highly concurrent systems, some entities and the tables underneath are subject to strong contention. Bounded Contexts are one of the ways to lighten the load by separating different types of writes, but that only works if they don't lock the same data at the end of the day. Same is true for reads in non-CQRS systems where they query the same database where writes are done.
ORM friendliness. Most ORMs won't allow you to map to 2 or more classes from the same database table without a lot of convolutions and workarounds.
How should one structure an app if each BC has its own database?
To some extent (e.g. that may include the UI layer or not), just as if you had multiple separate applications. Please be more specific if you have precise questions in mind.
The idea of having this vertical slice per bounded-context is so the relationship of each BC to every other BC and the communication between them should be considered and designed based on the domain knowledge and not on the technical merits of a persistence technology.
If you have a Customer in 2 different BCs it causes a kind-of actor pattern situation. If the Support BC needs to know about the new Customer when it is created in the Sales BC, then the Sales BC needs to connect up to a known interface on the Support BC and pass it this new information. One domain talking to another. It models quite closely how things work in real life when people from different departments talk to each other.
If you share a big database (you're talking bespoke enterprise software here so there won't be many examples in the wild) then the temptation is to bypass all the domain expertise that is captured in the domain layers and meddle in another BC's database. Things become a big ball of mud very quickly.
Surprisingly I see this sort of thing too often in the real world and I consider it very bad practice.
It depends a little bit on the reason why they have their own database. The idea of a bounded context is that you have a set of entities that are related together and solve a problem together. If you look at the link Chaim Eliyah provided, you can have a Sales and a Support context: http://martinfowler.com/bliki/BoundedContext.html
Now there is no reason a product for Sales and a product for Support should look the same in a database. What is important is that if Support wants to add a property (say "Low quality"), it can do so, while Sales might not want that property. Also, downtime on your sales application should probably not affect your support application.
That said, entities don't care where they are stored. If you already have a huge product database, you can certainly build your entities for different bounded contexts based on the same database. The thing to remember is that a database table is not the same as an entity. Entities are what your business/application needs; the database is just what's needed to store things.
That said, separate if you can. If that's not feasible, try to define ownership. You make your life a lot easier if everyone agrees that Product is the product as defined by Sales, and that Support can have a "productfactsheetTable" augmenting the product. That way you avoid conflicting changes from each bounded context. (A follow-up is that Support can only read products, never write them.) Table prefixes might help here to make this clear.
And this problem already exists with 2 related bounded contexts. By 10, you'll have a nightmare if multiple contexts try to write to the same table.
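To illustrate that ownership split, here is a sketch of two bounded-context DbContexts over the same database, where Sales owns the Product table and Support only adds its own prefixed table (all names are hypothetical):

using System.Data.Entity;

public class Product { public int Id { get; set; } public string Name { get; set; } }
public class ProductFactSheet { public int Id { get; set; } public int ProductId { get; set; } public string Notes { get; set; } }

// Sales owns the Product table and is the only context that writes to it.
public class SalesContext : DbContext
{
    public DbSet<Product> Products { get; set; }
}

// Support reads products (by convention, never writes) and owns its own table.
public class SupportContext : DbContext
{
    public DbSet<Product> Products { get; set; }
    public DbSet<ProductFactSheet> ProductFactSheets { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // The prefix makes the ownership boundary visible in the schema.
        modelBuilder.Entity<ProductFactSheet>().ToTable("support_ProductFactSheet");
    }
}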

SQL Server and Entity Framework - Dynamic Columns

I use SQL Server and Entity Framework as ORM.
Currently I have a table Product which contains all products of any kind. The different kinds of products possess different attributes.
For example:
All products of kind TV have attributes title, resolution and contrast
Whereas all products of kind Car have attributes like model and horsepower
Based on this scenario I created a table called Attribute which contains all the attributes of a product.
Now to fetch a product from database I always have to join all the attributes.
To insert a product I have to insert all the attributes one by one as single rows.
The application is not just a shop or anything like it. It should be possible to add/remove an attribute to/from a kind of product on the fly without changing the db.
But my questions to you are still:
Is this a bad design?
Is there another way of doing it?
Will my solution slow things down significantly? (e.g. an insert taking several seconds, assuming the product has hundreds of attributes...)
Update
The problem is that my application is really complex. There are a lot of huge algorithms. The software is used for statistical purposes.
One problem, for example, is the following: in an algorithm table I'm storing which attributes are used for filters. Say an administrator wants to filter all cars that have less than 100 horsepower. The filters are dynamic, which means that I have a filter table which stores the filter type (lessThan) and the attribute (horsepower). How can I keep this flexibility with the suggested approaches (with "hardcoded" columns)?
There is a thing about EF that I don't think everybody is aware of when designing the relations.
When you query something, EF (at least <= 4) wants to create a single SELECT for that query.
What that implies is that if you have entity A that has a one-to-many relationship to entity B (say Item to Attributes), then EF joins the two together such that there will be a returned row for every dependent B of each A. If A has many properties or multiple dependencies, or even worse if B has many sub-dependencies, then the returned table will be quite massive, since all of A's properties are copied for each row of dependent B. Over time, when your entity models grow in complexity, this can turn into a real performance problem.
EF only includes the Bs if you explicitly tell it to eager-load the dependencies via Includes. If the Includes are omitted, your data will initially load faster, but once you access your attributes they will be lazy-loaded by EF. This is known as the SELECT N+1 problem (each A will require N lazy B-queries, which can be a huge overhead).
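To make the two behaviours concrete, here is a sketch using hypothetical Product/Attribute entities along the lines of the question (EF 6; the lambda Include overload requires using System.Data.Entity):

// Lazy loading (the default with virtual navigation properties): each access to
// product.Attributes fires its own query - the SELECT N+1 problem.
var products = context.Products.Where(p => p.Kind == "TV").ToList();
foreach (var product in products)
{
    var attributeCount = product.Attributes.Count;   // one extra query per product
}

// Eager loading: a single, potentially very wide, joined result set instead.
var productsWithAttributes = context.Products
    .Include(p => p.Attributes)
    .Where(p => p.Kind == "TV")
    .ToList();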
While this is not a straight answer to your question, it is something to consider when designing your tables.
Also note that EF supports several alternatives for base-classing. One strategy is to have a common table that is automatically joined with the sub-entity tables (table per type). The alternative, which typically performs better but is harder to upgrade, is to have one table with a super-set of all properties of all sub-classes (table per hierarchy).
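For reference, a rough code-first sketch of the two strategies, using product kinds borrowed from the question (names are illustrative):

using System.Data.Entity;

public abstract class Product
{
    public int Id { get; set; }
    public string Title { get; set; }
}

public class Tv : Product
{
    public string Resolution { get; set; }
    public int Contrast { get; set; }
}

public class Car : Product
{
    public string Model { get; set; }
    public int Horsepower { get; set; }
}

public class ShopContext : DbContext
{
    public DbSet<Product> Products { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // Default is table per hierarchy: one wide table with a discriminator column.
        // Uncommenting the lines below switches to table per type: joined tables.
        // modelBuilder.Entity<Tv>().ToTable("Tvs");
        // modelBuilder.Entity<Car>().ToTable("Cars");
    }
}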
More (over) generalized database design considerations:
The devil is in the details. You can make a whole career out of making good database design choices. There are no silver-bullet database patterns.
EF comes with a lot of limitations. This is the price for the convenience. If the model suits EF well, then EF is quite good, but do consider more flexible alternatives like NHibernate. Sometimes even plain old data tables with views and stored procedures are to be preferred.
EF is not efficient if your model has a lot of small dependents (like a ton of attributes on an item table). It will result in either a monster query and return table, or the SELECT N+1 problem. You can write some multi-part LINQ queries to somewhat compensate, but it is tricky.
SQL's strength is in integrity and reporting, which work best with rather rigid data models.
Depending on the details, your model looks like a great candidate for a NoSQL backend, like RavenDB or MongoDB. NoSQL is much better for dynamic data models and scales really well.

How to design a Data Access Layer for a database table that may change in the future?

Introduction:
I'm refactoring (pretty much rewriting) a legacy application in my current internship. The part this question is concerned with is the database it uses and the way data is retrieved from it.
The database structure is:
There's a table that has the main records. Let's say each record is a measurement. It has some info about the measured material and different measurement information.
There's a table view they use that has the same information columns, plus some extra columns that contain data calculated from the given measurements. It also filters out some of the data from the table.
So let's say we have the main table with columns:
Measurement ID
Measurement A
Measurement B
The view has something like this:
Measurement ID
Measurement A
Measurement B
Some extra data (for example Measurement A * Measurement B)
The guy that is leading the development only knows some SQL, so he likes adding new columns that are calculated from columns in the main table, for experimenting. And this is definitely a need at the moment.
Requirements are:
Different types of databases should be supported (like SQL Server, Oracle, and probably some others).
The frontend should be able to show the view, which means even though some main columns will always stay the same, there may be some new columns including newly calculated values.
My question is:
What kind of system should I use to accommodate the needs of this application? I wanted to use Entity Framework, but the fact that the view may have new columns in the future is, I think, a problem. As far as I understand, I should map my classes to the database before compiling.
The other thing that I'm considering is maybe using Entity Framework to get data from the main table and do the calculations and the filtering that is currently done in the table view directly in the frontend, and skip the view altogether. Which sounds fine, though I don't know if they will allow me to do that.
What would you do in my case? Please take into account that I have virtually no experience with databases and ORMs.
You are correct in that using Entity Framework will be a problem if the underlying DB schema is always changing. It will require you to update the EF model on your end every time to grab those new columns.
Ideally, all of your database access is hidden behind the interface to your DAL, so that your application doesn't need to know about which ORM is being used -- if any -- or which database it's connecting to.
I hate to say it, but given your requirements, an ORM might not make sense. You might want to go with something more generic without any strong-typing. You could just simply always return a DataTable to your application layer, and it could loop through the columns and values to display whatever is returned. If there are fields you know will never change, you could create a manual mapping for those fields only into your application object(s).
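As a minimal sketch of that kind of generic, weakly-typed data access (the provider name, connection string and query are placeholders; ADO.NET's provider factories keep the code database-agnostic, which also helps with the multiple-database requirement):

using System.Data;
using System.Data.Common;

public class GenericDal
{
    private readonly DbProviderFactory _factory;
    private readonly string _connectionString;

    public GenericDal(string providerInvariantName, string connectionString)
    {
        _factory = DbProviderFactories.GetFactory(providerInvariantName);
        _connectionString = connectionString;
    }

    // Returns whatever columns the view currently exposes, so new calculated
    // columns show up without recompiling anything.
    public DataTable GetTable(string selectSql)
    {
        using (var connection = _factory.CreateConnection())
        using (var command = connection.CreateCommand())
        using (var adapter = _factory.CreateDataAdapter())
        {
            connection.ConnectionString = _connectionString;
            command.CommandText = selectSql;
            adapter.SelectCommand = command;

            var table = new DataTable();
            adapter.Fill(table);
            return table;
        }
    }
}

// Usage (placeholder values):
// var dal = new GenericDal("System.Data.SqlClient", "connection string here");
// DataTable measurements = dal.GetTable("SELECT * FROM MeasurementView");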
You may have a look at NoSQL systems, which are a lot more flexible with the schema. Or have a look at document databases like RavenDB. All these systems allow the schema to change dynamically. You need to check the pros and cons to see if they can fulfil your requirements.
(This answer is a bit out of subject as it's about replacing the SQL server and not really creating a DAL, but other answers cover the subject well and I would like to propose another way that may help.)
If your schema is unstable, then using Entity Framework as a beginner is going to be a headache. The assumption is that you can just refresh the design canvas periodically to let the tool handle database table changes. You can try that for a time to see when it becomes too much of a pain, but without any prior experience using ORMs or Entity Framework it may not be worth the effort.
I would probably use something like Rob Conery's Massive ORM (https://github.com/robconery/massive). It gives you more flexibility with the underlying database schema and is a very small library. I remember it being ~300 lines of code and very easy to use. It uses C# dynamics so you'll have to be using >= C# 4.0 and be comfortable with that one concept but IMO it's worth it for the low-overhead. A full-fledged ORM like Entity Framework or NHibernate is going to cost a lot of learning cycles.
You could, of course, just stick to ADO.NET DataTables. They're a bit ugly and verbose, but they'll do the job.
You can use Entity Framework - Database First if the DB is changing. Of course, you will have to regenerate your classes when you want to be able to access new columns, when the DB schema changes.
If you need to accommodate different database servers, then you should take a look into implementing a repository pattern and abstract all your data access that way.
Your comment
it involves write operations to the main table but the main table never changes
confirms what I was hoping for. It means you can use Entity Framework as the core of your application and a different route to display data.
Suppose that for display (of the view) you use a classic DataTable (because all common grids support them, as opposed to dynamic objects). I don't know how create/update/delete will be done, but saving changes will at some point involve mapping a DataRow to a MainEntity object. You can write one method for that, like:
MainEntity DataRowToEntity(DataRow row)
{
    var entity = new MainEntity();
    entity.PropertyA = row.Field<string>("PropertyA");   // use the column's actual type
    // ... map the remaining fixed columns
    return entity;
}
The MainEntity can be attached to a context, its status changed to Modified, and saved.
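In EF 6 that last step might look roughly like this (EntityState lives in System.Data.Entity; the context type and the changedRow variable are placeholders, while MainEntity and DataRowToEntity come from the snippet above):

using (var context = new AppDbContext())
{
    MainEntity entity = DataRowToEntity(changedRow);

    context.Set<MainEntity>().Attach(entity);
    context.Entry(entity).State = EntityState.Modified;   // marks all mapped columns as changed
    context.SaveChanges();
}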

Need advice on combining ORM and SQL with legacy system

We are in the process of porting a legacy system to .NET, both to clean up architecture but also to take advantage of lots of new possibilities that just aren't easily done in the legacy system.
Note: when reading my post before submitting it, I noticed that I may have described things a bit too fast in places, i.e. glossed over details. If there is anything that is unclear, leave a comment (not an answer) and I'll augment as much as possible.
The legacy system uses a database, and 100% custom-written SQL all over the place. This has led to wide tables (i.e. many columns), since code that needs data only retrieves what is necessary for the job.
As part of the port, we introduced an ORM layer which we can use in addition to custom SQL. The ORM we chose is DevExpress XPO, and one of its features has also led to some problems for us: namely that when we define an ORM class for, say, the Employee table, we have to add properties for all the columns, otherwise it won't retrieve them for us.
This also means that when we retrieve an Employee, we get all the columns, even if we only need a few.
One nice thing about having the ORM is that we can put some property-related logic into the same classes, without having to duplicate it all over the place. For instance, the simple expression that combines first, middle and last name into a "display name" can live there.
However, if we write SQL code somewhere, either in a DAL-like construct or, well, wherever, we need to duplicate this expression. This feels wrong and looks like a recipe for bugs and maintenance nightmare.
However, that gives us two choices:
ORM, fetches everything, can have logic written once
SQL, fetches what we need, need to duplicate logic
So we came up with an alternative. Since the ORM objects are code-generated from a dictionary, we decided to generate a set of dumb classes as well. These have the same number of properties, but aren't tied to the ORM in the same manner. Additionally, we added interfaces for all of the objects, also generated, and made both the ORM objects and the dumb objects implement them.
This allowed us to move some of this logic out into extension methods tied to the interfaces. The dumb objects carry enough information for us to plug them into our SQL classes, so instead of getting a DataTable back we can get a List back, with the logic available; this looks to be working.
However, this has led to another issue. If I want to write a piece of code that only displays or processes employees, in a context where I need to know who they are (i.e. their identifier in the system) as well as their name (first, middle and last), then if I use this dumb object I have no guarantee from the compiler that the calling code is really providing all of this.
One solution is for us to make the object know which properties have been assigned values, and an attempt to read an unassigned property crashes with an exception. This gives us an opportunity at runtime to catch contract breaches where code is not passing along enough information.
This also looks clunky to us.
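A minimal sketch of that "crash on unassigned property" idea for a single property (the DTO name is hypothetical; in practice the boilerplate would come out of the same code generator):

using System;

public class EmployeeDto
{
    private string _firstName;
    private bool _firstNameAssigned;

    public string FirstName
    {
        get
        {
            if (!_firstNameAssigned)
                throw new InvalidOperationException(
                    "FirstName was never loaded - the calling code broke the contract.");
            return _firstName;
        }
        set
        {
            _firstName = value;
            _firstNameAssigned = true;
        }
    }

    // ... repeated (or generated) for the remaining properties.
}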
So basically what I want advice on is if anyone else has been in, or are in, this situation and any tips or advice you can give.
We cannot, at the present time, break up the tables. The legacy application will still have to exist for a number of years due to the size of the port, and the .NET code is not an in-3-years-release type of project but will be phased in in releases along the way. As such, both the legacy system and the .NET code need to work with the same tables.
We are also aware that this is not an ideal solution so please refrain from advice like "you shouldn't have done it like this". We are well aware of this :)
One thing we've looked into is to create an XML file, or similar, with "contracts". So we could put into this XML file something like this:
There is an Employee class with these 50 properties
Additionally, we have these 7 variations, for various parts of the program
Additionally, we have these 10 pieces of logic, that each require property X, Y and Z (X, Y and Z varies between those 10)
This could allow us to code-generate those 8 classes (the full class + 7 smaller variations), and have the generator detect that for variation #3, properties X, Y and K are present, and then tie either the code for the logic or the interfaces the logic needs into this class automagically. This would allow us to have a number of different types of employee classes, with varying degrees of property coverage, and have the generator automatically add to each class all the logic it can support.
My code could then say that I need an employee of type IEmployeeWithAddressAndPhoneNumbers.
This too looks clunky.
I would suggest that eventually a database refactoring (normalization) is probably in order. You could work on the refactoring and use views to provide the legacy application with an interface to the database consistent with what it expects. That is, for example, break the employee table down into employee_info, employee_contact_info, employee_assignments, and then provide the legacy application with a view named employee that does a join across these three tables (or maybe a table-based function if the logic is more complex). This would potentially allow you to move ahead with a fully ORM-based solution, which is what I would prefer, and keep your legacy application happy. I would not proceed with a mixed solution of ORM/direct SQL, although you might be able to augment your ORM by having some entity classes which provide different views of the same data (say a join across a couple of tables for read-only display).
"We can not, at the present time, break up the tables. The legacy application will still have to exist for a number of years due to the size of the port, and the .NET code is not a in-3-years-release type of project but will be phased in in releases along the way. As such, both the legacy system and the .NET code need to work with the same tables."
Two words: materialized views.
You have several ways of "normalizing in place".
Materialized Views, a/k/a indexed views. This is a normalized clone of your source tables.
Explicit copying from old tables to new tables. "Ick" you say. However, consider that you'll be incrementally removing functionality from the old app. That means that you'll have some functionality in new, normalized tables, and the old tables can be gracefully ignored.
Explicit 2-way synch. This is hard, but not impossible. You normalize via copy from your legacy tables to correctly designed tables. You can -- as a temporary solution -- use Stored Procedures and Triggers to clone transactions into the legacy tables. You can then retire these kludges as your conversion proceeds.
You'll be happiest to do this in two absolutely distinct schemas. Since the old database probably doesn't have a well-designed schema, your new database will have one or more named schema so that you can maintain some version control over the definitions.
Although I haven't used this particular ORM, views can be useful in some cases in providing lighter-weight objects for display and reporting in these types of databases. According to their documentation they do support such a concept: XPView Concepts
