Is there a performance hit in having a large DbContext? - C#

I'm building a web app that will have 30-35 tables in one database. The thing is, I want to split the app into 3 different front ends (different teams want different things), as 3 different projects.
App1 might use 15-20 tables, App2 might use 10, App3 might use 15.
I was planning on making a project called Models that has a dbContext with all the tables in the database and use that for the web app projects. If I need to add or update the database I can just update that one models project.
A colleague mentioned that you should only include what you need, so I should make 3 separate DbContexts, one for each web project, or there will be a performance hit from including unnecessary tables.

To answer the question in the title: no, I haven't seen any performance hit with extremely large DbContexts. In one project I've worked on, the DbContext was defined with close to a thousand DbSets, and the configuration time (the time taken to perform the calls to OnConfiguring and OnModelCreating) was around 2 seconds, with every single entity configured through the Fluent API; so you can say that the hit is negligible (if there is one at all) for only 35 entities.
That said, whether you use one or more DbContexts depends on how you will use them. If there's a clear separation of data where you can clearly say "this table will only be used here" and you will not end up with repeated DbSets, you could keep them separated.
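If you'd rather measure the configuration cost for your own model than take anyone's word for it, here is a rough sketch (assuming EF Core; AppDbContext and the connection string are placeholders) that times the first model build, which is when OnModelCreating runs:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using Microsoft.EntityFrameworkCore;

// Hypothetical context standing in for the shared "Models" project context;
// the connection string is a placeholder.
public class AppDbContext : DbContext
{
    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options.UseSqlServer("Server=.;Database=AppDb;Trusted_Connection=True;");

    // DbSet<...> properties for all 30-35 entities would live here.
}

public static class ModelBuildTimer
{
    public static void Main()
    {
        var sw = Stopwatch.StartNew();
        using (var context = new AppDbContext())
        {
            // Accessing Model forces the model to be built (OnModelCreating runs here);
            // the result is cached, so only the first context instance pays this cost.
            var entityCount = context.Model.GetEntityTypes().Count();
            sw.Stop();
            Console.WriteLine($"Model with {entityCount} entity types built in {sw.ElapsedMilliseconds} ms");
        }
    }
}
```

For a few dozen entities this typically reports a number you will never notice next to the cost of opening a connection and running a query.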

A colleague mentioned [...] there will be a performance hit for including unnecessary tables
When colleagues say things like that, you tell them to either back such claims with evidence or to shut up. Seriously, there's enough cargo cult programming in the world already. It's the same as colleagues forcing you to use String.Empty because it's faster than using "", because they read that on a blog once. Hint: it isn't.
It's very healthy to apply criticism to every claim you hear, especially if that claim is not grounded in any reality whatsoever.
Yes, loading a type with more properties will require more disk I/O and more CPU cycles. This will be extremely negligible though. You will not notice this on the grand scale of things.*
It becomes quite a different story if you're using an EDMX though, as loading and parsing that 5 MB of metadata will literally add seconds to the loading time of your application.*
*: yes, I'm looking for sources for both those claims at the moment.

I think it's not a problem from a performance perspective, but I definitely see a challenge from a maintenance perspective.
I experienced a similar situation where we had one EDMX-based data model shared across different capabilities; however, each capability only focused on a specific set of tables.
The problem we started facing was that changing any table specific to one capability required touching that single shared data model, and it also led to unnecessary merge conflicts during check-ins.

Related

Should you have one-database-to-rule-them-all setup or separated database for each bounded context?

In DDD, as far as I understand it, it helps or guides you on how to structure a complex application. Now in an application, you should identify your Bounded Contexts. Say you have more than 10 BCs.
I read somewhere (forgive me, I cannot give any links) that it's not ideal to have one big database for a complex application, and that it should be separated per BC, if that's the easier route to take. How should one structure an app if each BC has its own database?
I tried searching on GitHub but could not find an example.
It depends on whether they only share the same database or also share some tables - i.e. data.
Sharing a database but not tables can be perfectly fine. Except if you aim for scalability and intend to make your BC's independently deployable and runnable units like microservices, in which case they should probably have their own data store instance.
I see a few more drawbacks to database tables shared by 2 or more Bounded Contexts :
Tight coupling. The reason we have distinct BC's is that they represent different domain spaces that are likely to diverge their own way. Changing a concept in one of the BC's might impact the underlying table, forcing the other BC's that use this table to change as well. You get rigidity where there should be suppleness. You might also have inconsistencies or "holes" in the data due to the multiple possible sources of change.
Concurrency. In highly concurrent systems, some entities and the tables underneath are subject to strong contention. Bounded Contexts are one of the ways to lighten the load by separating different types of writes, but that only works if they don't lock the same data at the end of the day. Same is true for reads in non-CQRS systems where they query the same database where writes are done.
ORM friendliness. Most ORMs won't allow you to map to 2 or more classes from the same database table without a lot of convolutions and workarounds.
How should one structure an app if each BC has its own database?
To some extent (e.g. that may include the UI layer or not), just as if you had multiple separate applications. Please be more specific if you have precise questions in mind.
The idea of having this vertical slice per bounded-context is so the relationship of each BC to every other BC and the communication between them should be considered and designed based on the domain knowledge and not on the technical merits of a persistence technology.
If you have a Customer in 2 different BCs it causes a kind-of actor pattern situation. If the Support BC needs to know about the new Customer when it is created in the Sales BC, then the Sales BC needs to connect up to a known interface on the Support BC and pass it this new information. One domain talking to another. It models quite closely how things work in real life when people from different departments talk to each other.
If you share a big database (you're talking bespoke enterprise software here so there won't be many examples in the wild) then the temptation is to bypass all the domain expertise that is captured in the domain layers and meddle in another BC's database. Things become a big ball of mud very quickly.
Surprisingly I see this sort of thing too often in the real world and I consider it very bad practice.
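As a rough illustration of the "one domain talking to another" idea above (the interface and type names here are hypothetical, not from the question): the Sales BC notifies the Support BC through an explicit, published contract rather than by writing into Support's tables.

```csharp
using System;

// Contract published by the Support bounded context.
public interface ISupportNotifications
{
    void CustomerRegistered(Guid customerId, string name, string email);
}

// Inside the Sales bounded context: Sales persists its own Customer and then
// tells Support through the published contract, never by reaching into
// Support's database directly.
public class SalesCustomerService
{
    private readonly ISupportNotifications _support;

    public SalesCustomerService(ISupportNotifications support)
    {
        _support = support;
    }

    public void RegisterCustomer(Guid id, string name, string email)
    {
        // ... save the Sales-side Customer aggregate here ...
        _support.CustomerRegistered(id, name, email);
    }
}
```

The same shape works whether the contract is an in-process interface, a service call, or a message on a queue; the point is that the domain knowledge stays behind the contract.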
It depends a little bit on the reason why they are their own databases. The idea of a bounded context is that you have a set of entities that are related and solve a problem together. If you look at the link Chaim Eliyah provided, you can have a sales and a support context: http://martinfowler.com/bliki/BoundedContext.html
Now there is no reason a product for sales and a product for support should look the same in a database. What is important is that if support wants to add a property (say "Low quality"), it can do so while sales might not want that property. Also, downtime on your sales application should probably not affect your support application.
That said, entities don't care where they are stored. If you already have a huge product database you can certainly build your entities for different bounded contexts based on the same database. The thing to remember is that a database table is not the same as an entity. Entities are what your business/application needs; the database is just what's needed to store things.
That said, separate if you can. If that's not feasible, try to define ownership. You make your life a lot easier if everyone agrees that the product is the product as defined by sales, and that support can have a "productfactsheet" table augmenting the product (see the sketch below). That way you avoid conflicting changes from each bounded context. (A follow-up is that support can only ever read products, never write them.) Table prefixes might help to make this clear.
And this problem already exists with 2 related bounded contexts. By 10 you'll have a nightmare if multiple contexts try to write to the same table.
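A minimal sketch of that ownership idea (assuming EF Core; the table, class, and property names are made up): Sales owns and writes the product table, while Support maps a read-only view of the same table plus its own augmenting table behind a prefix.

```csharp
using Microsoft.EntityFrameworkCore;

// Sales bounded context: owns and writes the product table.
public class SalesProduct
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal Price { get; set; }
}

public class SalesContext : DbContext
{
    public DbSet<SalesProduct> Products { get; set; }

    // Connection configuration omitted for brevity.
    protected override void OnModelCreating(ModelBuilder modelBuilder)
        => modelBuilder.Entity<SalesProduct>().ToTable("sales_Product");
}

// Support bounded context: reads the sales-owned table, owns only its fact sheet.
public class SupportProduct
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class ProductFactSheet
{
    public int Id { get; set; }
    public int ProductId { get; set; }
    public string Notes { get; set; }
}

public class SupportContext : DbContext
{
    public DbSet<SupportProduct> Products { get; set; }
    public DbSet<ProductFactSheet> FactSheets { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<SupportProduct>().ToTable("sales_Product");              // shared table; read-only by convention
        modelBuilder.Entity<ProductFactSheet>().ToTable("support_ProductFactSheet"); // owned by support
    }
}
```

Nothing here enforces the "support never writes" rule; that stays a team agreement (or a database permission), which is exactly why agreeing on ownership up front matters.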

How should I handle a potentially large number of edits in entity framework?

I'm using .NET 4.5.1 with EF 6.0.2 and db-first.
The use case is something like this:
Roughly 50k entities are loaded
A set of these entities are displayed for the user, others are required for displaying the items correctly
The user may perform heavy actions on the entities, meaning the user chooses to perform one action which cascades to actually affect potentially hundreds of entities.
The changes are saved back to database.
The question, then, is what is the best way to handle this? So far I've come up with 2 different solutions, but don't really like either:
Create a DbContext at step 1. Keep it around during the whole process, then finally save changes. The reason I don't necessarily like this is that the process might take hours, and as far as I know, DbContexts should not be preserved for this long.
Create a DbContext at step 1. Discard it right after. At step 4, create a new DbContext, attach the modified entities to it and save changes. The big problem I see with this approach is: how do I figure out which entities have actually been changed? Do I need to build a ChangeTracker of my own to be able to do this?
So is there a better alternative for handling this, or should I use one of the solutions above (perhaps with some changes)?
I would go with option number 1 - use a DbContext for the entire process.
The problem I have is with the assertion that the process might take hours. I don't think this is something you want to do. Imagine what happens when your user has been editing the data for 3 hours and then faces a power blackout before clicking the final save. You'll have users running after you with pitchforks.
You're also facing a lot of concurrency issues - what if two users perform the same lengthy process at once? Handling collisions after a few hours of work is going to be a problem, especially if you tell users changes they've made hours ago can't be saved. Pitchforks again.
So, I think you should go with number 3 - save incremental changes of the editing process, so the user's work isn't lost if something bad happens, and so that you can handle collisions if two users are updating the data at the same time.
You would probably want to keep the incremental changes in a separate place, not your main tables, because the business change hasn't been finalized yet.
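A rough sketch of what "incremental changes in a separate place" could look like (EF6 syntax; the DraftChange table and its columns are hypothetical): each user action is serialized into a draft table, and the real tables are only touched at the final save.

```csharp
using System;
using System.Data.Entity;

// Hypothetical staging table for unfinished edits.
public class DraftChange
{
    public int Id { get; set; }
    public string UserName { get; set; }
    public string EntityType { get; set; }   // e.g. "Order"
    public string EntityKey { get; set; }     // primary key of the edited row
    public string PayloadJson { get; set; }   // serialized new values
    public DateTime CreatedUtc { get; set; }
}

public class DraftContext : DbContext
{
    public DbSet<DraftChange> DraftChanges { get; set; }
}

public static class DraftSaver
{
    // Called after every user action, so hours of work survive a crash.
    public static void SaveDraft(string user, string entityType, string key, string payloadJson)
    {
        using (var db = new DraftContext())
        {
            db.DraftChanges.Add(new DraftChange
            {
                UserName = user,
                EntityType = entityType,
                EntityKey = key,
                PayloadJson = payloadJson,
                CreatedUtc = DateTime.UtcNow
            });
            db.SaveChanges();
        }
    }
}
```

At final save you replay the drafts against the business tables in one short-lived context and transaction, which also gives you a natural place to detect and resolve collisions with other users' drafts.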
and as far as I know, DbContexts should not be preserved for this long.
Häh?
There is nothing about a DbContext that says it cannot be preserved that long. You may get problems with other people having already edited the item, but that is an inherent architectural problem - generally, neither optimistic nor pessimistic locking is advisable in a "multi-hour edit marathon".
The only sensible approach if you have editing over hours is to use your own change tracker and proper logic for when changes collide - and/or use a logical locking mechanism (a flag in the database).
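For the second option in the question (a fresh DbContext at save time), a minimal EF6 sketch looks like this, assuming you tracked the modified entities yourself during the session; MyDbContext is a placeholder for your generated db-first context.

```csharp
using System.Collections.Generic;
using System.Data.Entity;

// Placeholder for your generated db-first context.
public class MyDbContext : DbContext { }

public static class DeferredSave
{
    // 'modified' is whatever your own change tracker collected during the session.
    public static void SaveModified<TEntity>(IEnumerable<TEntity> modified) where TEntity : class
    {
        using (var db = new MyDbContext())
        {
            foreach (var entity in modified)
            {
                db.Set<TEntity>().Attach(entity);
                // Marks every mapped property as changed; if you tracked per-property
                // changes, set db.Entry(entity).Property(...).IsModified instead
                // to get smaller UPDATE statements.
                db.Entry(entity).State = EntityState.Modified;
            }
            db.SaveChanges();
        }
    }
}
```

Pair this with a concurrency token (e.g. a rowversion column) or the logical lock flag mentioned above so that collisions surface as exceptions instead of silent overwrites.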

Creating a clear abstraction layer over a convoluted and large SQL database

Almost all of the applications I write at work get their data from a central MSSQL database. This database has about 70 tables, and on average I'd say 25 or so columns per table. The database has developed over 5-10 years (I'm not entirely sure) and is full of idiosyncrasies and quirks. Foreign keys are implemented irregularly when it comes to naming and so on, and table and column names mix case and language.
I am not able to restructure the database itself as it would break a ton of backwards compatibility for applications needed in the daily work of most people in the office.
I've almost exclusively been using LINQ2SQL for interacting with the database and it works fine, but always requires a lot of manual joining of tables, either in some db repository or 'inline' when coding. So I've finally decided that I have to do something to once and for all ease the pain of working with this leviathan. This would preferably include implementing a clear naming scheme, joining relevant tables with foreign keys properly once and for all etc.
The three routes I can see are:
Creating a number of views, stored procedures and functions in the SQL to ease up my interaction with the DB. This obviously has the bonus of being usable in many languages, as opposed to a solution implemented in e.g. C#. The biggest drawback I can see here is that it would probably take a lot of time to do this properly, as well as being a bit harder to service a year down the road when I haven't looked at the SQL queries for a while. I would also need to implement another DB abstraction step inside my applications as I wouldn't want to work with just straight up DB calls (abstraction upon abstraction seems bad in this case, but maybe I'm wrong?)
Continuing on my LINQ2SQL road, but creating a once-and-for-all repository class that hides all the underlying tables in abstracted calls only. This idea seems more feasible in terms of development time, maintenance and single-point-abstraction.
Pulling off some EF4 reverse-engineering magic, using the designer to hook up relevant foreign keys and renaming table classes to fit my taste.
Any input on how this should/could be done, as well as any recommended reading you might have, would be most appreciated.
We have a very similar situation with our database. We went the EF route, but we used Code First. I know it sounds weird to use Code First when your database already exists, but due to the size of the tables and the number of tables, trying to do it all in the designer was not feasible.
You can use the "Reverse Engineer Code First" option in Entity Framework Power Tools to generate everything you need from your database.
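Once the classes are generated you can also clean up the naming. A small sketch (EF Code First Fluent API; the legacy table and column names are invented for illustration) mapping a tidy class onto an idiosyncratic table:

```csharp
using System.Data.Entity;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string City { get; set; }
}

public class LegacyContext : DbContext
{
    public DbSet<Customer> Customers { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // Map the clean class and property names onto the quirky legacy schema.
        modelBuilder.Entity<Customer>().ToTable("tbl_KUNDE_Alt");
        modelBuilder.Entity<Customer>().Property(c => c.Id).HasColumnName("KundeID");
        modelBuilder.Entity<Customer>().Property(c => c.Name).HasColumnName("kunde_navn");
        modelBuilder.Entity<Customer>().Property(c => c.City).HasColumnName("BY_NAVN");
    }
}
```

The database keeps its historical names (so existing apps keep working), while everything new codes against the sane names.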
I think a well-thought-out abstraction layer better suits the needs of the application if it is not based on the physical schema of the DB. I mean, the main goal of a DAL is to hide the tables from consumers, leaving them only valid "activities" through stored procedures. In most cases this will outperform direct data access, and it gives you one more degree of freedom: you can change the T-SQL code and implement additional logic or schema changes without needing to change the application.
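If you go that stored-procedure route, the application side can still stay thin. A sketch of one such "activity" wrapper (plain ADO.NET; the procedure name, parameters, and result columns are made up for illustration):

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public class OrderDal
{
    private readonly string _connectionString;

    public OrderDal(string connectionString)
    {
        _connectionString = connectionString;
    }

    // Wraps a hypothetical dbo.GetOpenOrdersForCustomer procedure.
    public IList<(int OrderId, decimal Total)> GetOpenOrders(int customerId)
    {
        var result = new List<(int, decimal)>();
        using (var conn = new SqlConnection(_connectionString))
        using (var cmd = new SqlCommand("dbo.GetOpenOrdersForCustomer", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@CustomerId", customerId);
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    result.Add((reader.GetInt32(0), reader.GetDecimal(1)));
            }
        }
        return result;
    }
}
```

The procedure can join and rename as many quirky legacy tables as it needs to; the application only ever sees the clean shape.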

is a database intermediary good system design?

background: we've got a number of server processes and client apps that are used entirely internally, in a fairly controlled environment. we capture a significant amount of data every day that goes into a couple database machines. most everything is c#, with a few c++ apps.
just about every app has some basic (if not extensive) dependence on database data, whether it's for historical data, daily-calculated values, or assorted parameters. as the whole environment has gotten a bit more sprawling, I've been wondering about the sense in sticking an intermediary in between all client and server apps and the database, a sort of "database data broker". any app that needs values from the db makes a request to the data broker, instead of a dll wrapper function that calls a stored proc.
one immediate downside is that the data would make two trips across the network: from db to broker, and from broker to calling app. seems like poor form, but the amount of data would be small enough in each request that I'm ok with it as far as performance goes.
one (seeming) upside is that it would be trivial to set up a test environment, as it would entail just setting up a test data broker, and there's no maintaining of db connection strings locally anywhere else. also, I've been pondering creating a mini request language so you wouldn't have to enumerate functions for each dataset you might request (instead of GetX() and GetY(), there would be Get("name = X"))
am I over-engineering this, or is it possibly a worthy architecture?
edit: thanks for all the great comments so far, great food for thought.
It depends on what you're trying to accomplish with it. According to Rocky Lhotka, you should only add a tier if you are forced to, kicking and screaming all the way.
I agree with him: don't tier unless you need to. I think there are valid reasons to add additional tiers, usually for purposes of security, scalability and maintainability. The question becomes: is yours a valid reason?
It looks like the major reason is maintainability. Does it outweigh the benefits you get by not having the tier?
only you can answer these:
what are the benefits of doing this?
what are the problems/risks of doing this?
do you need this to make testing easier or even possible?
if you make this change and it crashes when it goes live, will you be fired?
if you make the changes and it goes live, will you get a promotion?
etc...
As the former architect of a system that also used a database heavily as a "hub," I can say that there are several drawbacks that you should be aware of. Our system used databases:
As a transaction store (typical OLTP stuff)
As a staging queue (submitted but unprocessed transactions)
As a historical data store (results of processed transactions)
As an interoperation layer (untranslated commands or transactions issued from other systems)
One of the major drawbacks is ownership costs. When your databases become the single point of failure for so many types of operations, it becomes necessary to ensure that they are all hosted in high-availability environments. This is not only expensive from a hardware perspective, but it is also expensive to support deployments to HA environments, since developers typically have very limited visibility into the internals.
A second drawback is that you have to seriously design integrity into all of your tables. In a typical SOA environment, you have complete control over how data is modified. When you expose it through database tables, you must consider that any application with the right credentials will have the ability to modify data. Because of this, you must carefully consider utilitarian implementations of constraints. If you had a single service managing persistence, you could be much looser with constraints on the database and enforce them in code.
Third, if you ever want to expose any functionality that the database tables currently allow you to provide to outside parties, you must write service code anyway, so you might be better served doing it strategically as opposed to reacting to requests.
Fourth, UI interaction directly with the data layer creates security risks, especially if the client is a thick client.
Finally, writing code that responds to events (service calls) is much easier than polling code. Typically, organizations that rely heavily on database polling end up reinventing the wheel every time a new project requires a new "monitoring service." It can be avoided by creating a "framework," but those have their own pitfalls (primarily around prescription versus adoption).
This is just a laundry list of problems I have encountered. It's not necessarily meant to dissuade you from using databases for these functions, but it helps to know the dangers ahead of time so you can at least plan for them if they ever do become issues.
EDIT
Just thought of another scenario that caused us pains. Versioning your changes can be difficult. For example, if you need to change the shape of a table (normalize/denormalize), it has a cascading effect if multiple applications rely on it. In a SOA scenario, it is much easier, because you can keep your old API, change the internal interaction so that it works with the changed tables, and allow consumers to migrate to the new version on their own schedule.
A data broker sounds like a really good way to abstract out the multiple data sources for your apps. It would be easy to consolidate, change repositories, or otherwise move data around if needed in the future.
I may be misunderstanding something, but it seems to me like you should consider some entity framework. That is, a framework you can use to "map" your interaction with the db to some domain objects. That way you work locally on domain objects that get filled from your db, and when it is time to persist the state of your objects to the database, the framework handles all the connections back and forth. In this way you can also easily mock these domain objects for unit testing without needing a db connection.
Check out NHibernate for a good entity framework alternative.
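To make the unit-testing point above concrete, here is a rough sketch (all names are hypothetical) of a small repository abstraction that an ORM-backed class and an in-memory fake can both implement:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Parameter
{
    public string Name { get; set; }
    public double Value { get; set; }
}

// Apps depend on this interface, not on the database or the broker directly.
public interface IParameterRepository
{
    Parameter Get(string name);
}

// Test double: no database connection needed.
public class InMemoryParameterRepository : IParameterRepository
{
    private readonly Dictionary<string, Parameter> _data;

    public InMemoryParameterRepository(IEnumerable<Parameter> seed)
    {
        _data = seed.ToDictionary(p => p.Name);
    }

    public Parameter Get(string name) => _data[name];
}
```

The production implementation would map Get to whatever the ORM or the data broker exposes; the calling apps never know the difference.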
If you already have the database-related know-how, I think it's not a bad decision.
Good things that I can think of:
if the data model is consistent you can plug in new tools easily without making any changes in the other apps.
maybe you can keep the database running more reliably than your apps, so if one of them fails, the others can still keep working.
you can make backups and rollbacks using the database tools.
you can do emergency fixes manipulating the data directly with sql or some visual tool.
But if you have to learn new frameworks along the way, maybe the benefits are not worth the extra initial effort.
"any app that needs values from the db makes a request to the data broker"
When database technology was being invented over 40 years ago, the people doing that inventing had ideas along the lines of "any app that needs values from the db makes a request to the dbms".
Have you ever pondered the possibility that YOU ALREADY HAVE a "data broker", and that there might be very little added value in creating a second one of your own ?

Sometimes Connected CRUD application DAL

I am working on a Sometimes Connected CRUD application that will be primarily used by teams (2-4) of Social Workers and Nurses to track patient information in the form of a plan. The application is a revisualization of an ASP.NET app that was created before my time. There are approx. 200 tables across 4 databases. The web app version relied heavily on SPs, but since this version is a WinForms app that will be pointing to a local db, I see no reason to continue with SPs. Also of note, I had planned to use Merge Replication to handle the syncing portion, and there seem to be some issues with those two together.
I am trying to understand what approach to use for the DAL. I originally had planned to use LINQ to SQL, but I have read tidbits that state it doesn't work in a Sometimes Connected setting. I have therefore been trying to read and experiment with numerous solutions: SubSonic, NHibernate, Entity Framework. This is a relatively simple application, and due to a "looming" version 3 redesign this effort can be borderline "throwaway." The emphasis here is on getting a desktop version up and running ASAP.
What I am asking here is for anyone with experience using any of these technologies (or one I didn't list) to lend me your hard-earned wisdom. What, in your opinion, is the best approach for me to pursue? Any other insights on creating this kind of app? I am really struggling with the DAL portion of this program.
Thank you!
If the stored procedures do what you want them to, I would have to say I'm dubious that you will get benefits by throwing them away and reimplementing them. Moreover, it shouldn't matter if you use stored procedures or LINQ to SQL style data access when it comes time to replicate your data back to the master database, so worrying about which DAL you use seems to be a red herring.
The tricky part about sometimes connected applications is coming up with a good conflict resolution system. My suggestions:
Always use RowGuids as your primary keys to tables. Merge replication works best if you always have new records uniquely keyed.
Realize that merge replication can only do so much: it is great for bringing new data in disparate systems together. It can even figure out one-sided updates. It can't magically determine that your new record and my new record are actually the same, nor can it really deal with changes on both sides without human intervention or priority rules.
Because of this, you will need "matching" rules to resolve records that are claiming to be new, but actually aren't. Note that this is a fuzzy step: rarely can you rely on a unique key to actually be entered exactly the same on both sides and without error. This means giving weighted matches where many of your indicators are the same or similar.
The user interface for resolving conflicts and matching up "new" records with the original needs to be easy to operate. I use something that looks similar to the classic three way merge that many source control systems use: Record A, Record B, Merged Record. They can default the Merged Record to A or B by clicking a header button, and can select each field by clicking against them as well. Finally, Merged Records fields are open for edit, because sometimes you need to take parts of the address (say) from A and B.
None of this should affect your data access layer in the slightest: this is all either lower level (merge replication, provided by the database itself) or higher level (conflict resolution, provided by your business rules for resolution) than your DAL.
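A toy sketch of the "weighted match" idea above (the field names, weights, and threshold are invented; a real matcher would use fuzzier string comparison than exact equality):

```csharp
using System;

public class PatientRecord
{
    public string LastName { get; set; }
    public DateTime DateOfBirth { get; set; }
    public string Phone { get; set; }
}

public static class RecordMatcher
{
    // Returns a score between 0 and 1; above some agreed threshold the two
    // "new" records are shown to a human as a probable duplicate.
    public static double MatchScore(PatientRecord a, PatientRecord b)
    {
        double score = 0;
        if (string.Equals(a.LastName, b.LastName, StringComparison.OrdinalIgnoreCase)) score += 0.4;
        if (a.DateOfBirth.Date == b.DateOfBirth.Date) score += 0.4;
        if (a.Phone == b.Phone) score += 0.2;
        return score;
    }
}
```

The point is that matching is a scoring problem, not a key-equality problem, which is why it belongs in your business rules rather than in the DAL or in replication.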
If you can install a db system locally, go for something you feel familiar with. The greatest problem, I think, will be the syncing and merging part. You must think of several possibilities: for example, you changed something that someone else deleted on the server. Who decides?
Never used the Sync Framework myself, just read an article, but it may give you a solid foundation to build on. Whichever way you go with data access, though, the solution for the business logic will probably have a much wider impact...
There is a sample app called issueVision that Microsoft put out back in 2004.
http://windowsclient.net/downloads/folders/starterkits/entry1268.aspx
Found link on old thread in joelonsoftware.com. http://discuss.joelonsoftware.com/default.asp?joel.3.25830.10
Other ideas...
What about mobile broadband? A couple of 3G cellular cards would work tomorrow, and your app would need no changes apart from large pages/graphics.
Excel spreadsheet used in the field. DTS or SSIS to import data into application. While a "better" solution is created.
Good luck!
If by SPs you mean stored procedures... I'm not sure I understand your reasoning for trying to move away from them, considering that they're fast, proven, and already written for you (i.e. tested).
Surely, if you're making an app that will mimic the original, there are definite merits to keeping as much of the original (working) codebase as possible - the least of which is speed.
I'd try installing a local copy of the db, and then pushing all affected records since the last connected period to the master db when it does get connected.
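A rough sketch of that push (EF6; the Patient entity, the two contexts, and the LastModifiedUtc audit column are assumptions for illustration, and merge replication would normally do this work for you):

```csharp
using System;
using System.Data.Entity;
using System.Data.Entity.Migrations;   // AddOrUpdate extension
using System.Linq;

// These types stand in for your real entities and generated contexts.
public class Patient
{
    public Guid Id { get; set; }                  // RowGuid-style key, per the advice above
    public string Name { get; set; }
    public DateTime LastModifiedUtc { get; set; } // assumed audit column
}

public class LocalContext : DbContext { public DbSet<Patient> Patients { get; set; } }
public class MasterContext : DbContext { public DbSet<Patient> Patients { get; set; } }

public static class SyncPusher
{
    public static void PushChangesSince(DateTime lastConnectedUtc)
    {
        using (var local = new LocalContext())
        using (var master = new MasterContext())
        {
            var changed = local.Patients
                .Where(p => p.LastModifiedUtc > lastConnectedUtc)
                .ToList();

            foreach (var patient in changed)
            {
                // Upsert by key; real conflict resolution (see the answer above)
                // still has to decide what happens when both sides changed a row.
                master.Patients.AddOrUpdate(p => p.Id, patient);
            }

            master.SaveChanges();
        }
    }
}
```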
