Creating a clear abstraction layer over a convoluted and large SQL database

Creating a clear abstraction layer over a convoluted and large SQL database - c#

Almost all of the applications I write at work get their data from a central MSSQL database. This database has about 70 tables, and on average I'd say 25 or so columns per table. The database has developed over 5-10 years (I'm not entirely sure) and is full of idiosyncrasies and quirks. Foreign keys are irregularly implemented when it comes to naming and so on, as well as case and language mixing in table and column names.
I am not able to restructure the database itself as it would break a ton of backwards compatibility for applications needed in the daily work of most people in the office.
I've almost exclusively been using LINQ2SQL for interacting with the database and it works fine, but always requires a lot of manual joining of tables, either in some db repository or 'inline' when coding. So I've finally decided that I have to do something to once and for all ease the pain of working with this leviathan. This would preferably include implementing a clear naming scheme, joining relevant tables with foreign keys properly once and for all etc.
The three routes I can see are:
Creating a number of views, stored procedures and functions in the SQL to ease up my interaction with the DB. This obviously has the bonus of being usable in many languages, as opposed to a solution implemented in e.g. C#. The biggest drawback I can see here is that it would probably take a lot of time to do this properly, as well as being a bit harder to service a year down the road when I haven't looked at the SQL queries for a while. I would also need to implement another DB abstraction step inside my applications as I wouldn't want to work with just straight up DB calls (abstraction upon abstraction seems bad in this case, but maybe I'm wrong?)
Continuing on my LINQ2SQL road, but creating a once-and-for-all repository class that hides all the underlying tables in abstracted calls only. This idea seems more feasible in terms of development time, maintenance and single-point-abstraction.
Pulling off some EF4 reverse-engineering magic, using the designer to hook up relevant foreign keys and renaming table classes to fit my taste.
Any input on how this should/could be done, as well as any recommended reading you might have, would be most appreciated.

We have a very similar situation with our database. We went the EF route, but we used Code First. I know it sounds weird to use Code First when your database already exists, but due to the size of the tables and the number of tables, trying to do it all in the designer was not feasible.
You can use the "Reverse Engineer Code First" option in Entity Framework Power Tools to generate everything you need from your database.

I think that well thought out abstraction layer is better suits the needs of application if it is not based on physical schema of DB. I mean - the main goal of DAL is to hide tables from users leaving to them only valid "activities" thru stored procedures. In most cases this will outperform the direct data access and gives to you one more degree of freedom - to play with TSQL code and to implement additional logic/schema changes without needing to change the application.

Related

Is there a program that finds the most optimal database indexes based on your code (C# for example)?

I know there are programs that give you a selection of indexes based on your database schema, but I can't find any that do it based on your code (i.e. based on what tables are accessed in queries the most). I mean, if in C# you opt for the code-first approach and Entity Framework handles the database, does someone start manually re-reading the code or watching the server-side requests for more optimal indexes, or you just don't care, since EF does a good enough job with the default schema?
Am I just not finding them or is there a huge and obvious reason (besides not wanting to disclose your code to third parties) that there are virtually none?

Entity Framework VS pure Ado.Net

EF is so widely used staff but I don't realize how I should use it. I met a lot of issues with EF on different projects with different approaches. So some questions brought together in my head. And answers leads me to use pure ado.net with stored procedures.
So the questions are:
How to deal with EF in n-tier application?
For example, we have some DAL with EF. I saw a lot of articles and projects that used repository, unit of work patterns as some kind of abstraction for EF. I think such approach kills most of benefits that increase development speed and leads to few things:
remapping of EF load results in some DTO that kills performance(call some select to get table data - first loop, second loop - map results to some composite type generated by ef, next - filter mapped data using linq and, at last, map it to some DTO). Exactly remapping to DTO is killer of one of the biggest efs benefit;
or
leads to strong cohesion between EF (and it's version) and app. It will be something like 2-tier app with dal and presentation with bll or dal with bll and presentation. I guess it's not best practice. And the same loading process as we have for previous thing except mapping, so again performance issue raised up. We could try to use EF as DAL without any abstraction under them. But we will get similar issues in some other way.
Should I use one context per app\thread\atomic operation? Using approach - one context per app\thread may slightly increase performance and possibilities to call navigation properties, but we meet another problem - updating this context and growing loaded data in context, also I'm not sure about concurrency with one dbcontext per app\thread. Using context per operation will lead us to remapping EF results to our DTO's. So you see that we again pushed back to question no.1.
Could we try to use EF + stored procedures only? Again we have issues from previous questions. What is the reason to use EF if the biggest part of functionality will not be used?
So, yes EF is great to start project. It so convenient when we have few screens and crud operations.
But what next?
All this text is just unsorted thoughts. I know that pure ado.net will lead to another kind of challenges.
So, what is your opinion about this topic?

By following the naming conventions , you will find it's called : ADO.NET Entity Framework , which means that Entity Framework sits on top of ADO.NET so it can't be faster , It may perform both in equal time , but let's look at EF provides :
You will no more get stuck with writing queries without any clue about if what you're writing is going to compile or not .
It makes you rely on C# or your favorite .NET language on writing your own data constraints that you wish to accept from the target user directly inside your model classes .
Finally : EF and LINQ give a lot of power in maintaining your applications later .
There are three different models with the Entity Framework : Model First , Database First and Code First get to know each of 'em .
-The Point about killing performance when remapping is on process , it's because that on the first run , EF loads metadata into memory and that takes time as it builds in-memory representation of model from edmx file.

ADO. Net is an object oriented framework that allows you to interact with database system (SQL, Oracle, etc).
Entity framework is a techniques of manipulating data in databases like (collection of queries (inert table name , select * from like this )).
it is uses with LINQ.

Entity Framework is not efficient in any case as in most tools or toolboxes designed to achieve 'faster' results.
Access to database should be viewed as a separate tier using store procedures as the interface. There is no reason for any application to have more than absolutely require CRUD operations. Less is more principle. Stored procedures are easy to write, secure, maintain and is de facto fastest way. It's easy to write tools to generate desired codes for POCO and DbContext through stored procedures.
Application well designed should have a limited numbers of connection strings to database and none of which should be the all mighty God. Using schema to support connection rights.
Lazy loading are false statements added to solve a problem that should never exist and introduced with ORM and its plug and play features. Data should only be read when needed. Developers should be responsible to implement this logic base on application context.
If your application logic has a problem to maintain states, no tool will help. It will in fact, make it worse by cover up the real problem until it's too late.
Database first is the only solution for a well designed application. Civilization realized long time ago the important of solid aqueduct and sewer system. High level code can and will be replaced anytime but data stays. Rewrite an entire application is matter of days if database is well designed.
Applications are just glorified database access. Still true in most cases.
This is my conclusion after many years in business applications debugging through codes produced by many different tools or toolboxes. The faster results advertised are not even close to cover the amount of time/energy wasted later trying to clean up the mess. Performance issues are rarely if not ever caused by high demand but the sum of all 'features' added through unusable tools.

ADO.NET provides consistent access to data sources such as SQL Server and XML, and to data sources exposed through OLE DB and ODBC. Data-sharing consumer applications can use ADO.NET to connect to these data sources and retrieve, handle, and update the data that they contain.
Entity Framework 6 (EF6) is a tried and tested object-relational mapper (O/RM) for .NET with many years of feature development and stabilization. An ORM like EF has the following advantage
ORM lets developers focus on the business logic of the application thereby facilitating huge reduction in code.
It eliminates the need for repetitive SQL code and provides many benefits to development speed.
Prevents writing manual SQL queries; & many more..
In an n-tier application,it depends on the amount of data your application is handling and your database is managing. According to my knowledge DTO's don't kill performance. They are data container for moving data between layers and are only used to pass data and does not contain any business logic. They are mostly used in service classes.See DTO.
One DBContext is always a best practice.
There is no such combination of EF + SP(Stored Procedure) as per my knowledge. If you wish to use an ORM like EF and an SP at the same time try micro-ORMs like Dapper,BLToolkit, etc..It was build for that purpose and is heck lotta fast than EF. Here is a good article on Dapper ORM.
Here is a related thread on a similar topic: What is the difference between an orm and ADO.net?

How to design a Data Access Layer for a database table that may change in the future?

Introduction:
I'm refactoring (pretty much rewriting) a legacy application in my current internship. The part that this question will be concerned about is the database it uses and the way they retrieve data from it.
The database structure is:
There's a table that has the main records. Let's say each record is a measurement. It has some info about the measured material and different measurement information.
There's a table view they use that has the same information columns, plus some extra columns that contains data calculated from the given measurements. And it also filters some of the data from the table.
So let's say we have the main table with columns:
Measurement ID
Measurement A
Measurement B
The view has something like this:
Measurement ID
Measurement A
Measurement B
Some extra data (for example Measurement A * Measurement B)
The guy that is leading the development only knows some SQL, so he likes adding new columns that is calculated by some columns in the main table for experimenting. And this is definitely a need at the moment.
Requirements are:
Different types of databases should be supported (like SQL Server, Oracle, and probably some others).
The frontend should be able to show the view, which means even though some main columns will always stay the same, there may be some new columns including newly calculated values.
My question is:
What kind of system should I use to accommodate the needs of this application? I wanted to use Entity Framework, but the fact that the view may have new columns in the future is I think a problem. As far as I understand, I should map my classes to the database before compiling.
The other thing that I'm considering is maybe using Entity Framework to get data from the main table and do the calculations and the filtering that is currently done in the table view directly in the frontend, and skip the view altogether. Which sounds fine, though I don't know if they will allow me to do that.
What would you do in my case? Please take into account that I have virtually no experience with databases and ORMs.

You are correct in that using Entity Framework will be a problem if the underlying DB schema is always changing. It will require you to update the EF model on your end every time to grab those new columns.
Ideally, all of your database access is hidden behind the interface to your DAL, so that your application doesn't need to know about which ORM is being used -- if any -- or which database it's connecting to.
I hate to say it, but given your requirements, an ORM might not make sense. You might want to go with something more generic without any strong-typing. You could just simply always return a DataTable to your application layer, and it could loop through the columns and values to display whatever is returned. If there are fields you know will never change, you could create a manual mapping for those fields only into your application object(s).

You may have a look to NoSQL system that are a lot more flexible on the schema. Or have a look to document database like RavenDB. All these systems allow the schema to change dynamically. You need to check the Pro's and Con's to see if it can fill you requirements.
(This answer is a bit out of subject as it's about replacing the SQL server and not really creating a DAL, but other answers cover the subject well and I would like to propose another way that may help.)

If your schema is unstable, then using Entity Framework as a beginner is going to be a headache. The assumption is that you can just refresh the design canvas periodically to let the tool handle database table changes. You can try that for a time to see when it becomes too much of a pain, but without any prior experience using ORMs or Entity Framework it may not be worth the effort.
I would probably use something like Rob Conery's Massive ORM (https://github.com/robconery/massive). It gives you more flexibility with the underlying database schema and is a very small library. I remember it being ~300 lines of code and very easy to use. It uses C# dynamics so you'll have to be using >= C# 4.0 and be comfortable with that one concept but IMO it's worth it for the low-overhead. A full-fledged ORM like Entity Framework or NHibernate is going to cost a lot of learning cycles.
You could, of course, just stick to ADO.NET DataTables. They're a bit ugly and verbose, but they'll do the job.

You can use Entity Framework - Database First if the DB is changing. Of course, you will have to regenerate your classes when you want to be able to access new columns, when the DB schema changes.
If you need to accomodate different database servers, then you should take a look into implementing a repository pattern and abstract all your data access that way.

Your comment
it involves write operations to the main table but the main table never changes
confirms what I was hoping for. It means you can use Entity Framework as the core of you application and a different route to display data.
Suppose that for display (of the view) you use a classic DataTable (because all common grids support them, contrary to displaying dynamic objects). I don't know how create/update/delete will be done, but saving changes will at some point involve mapping a DataRow to a MainEntity object. You can write one method for that like
MainEntity DataRowToEntity(DataRow row)
{
var entity = new MainEntity();
entity.PropertyA = row["PropertyA"];
....
}
The MainEntity can be attached to a context, its status changed to Modified, and saved.

Win-based application (C#) needs a class for handling DB Connection

I think about having a class clsConnection which we can take advantage of in order to execute every SQL query like select, insert, update, delete, .... is pretty good.
But how complete it could be? How?

You could use LINQ to SQL as AB Kolan suggested or, if you don't have time for the learning curve, I'd suggest taking a look at the Microsoft Enterprise Library Data Access Application Blocks.

You can use the DAB (SQlHelper) from the enterprise Library. This has all the methods/properties necessary for database operation. You dont need to create you own code.
Alternately you can use a ORM like LINQ or NHibernate.

It sounds to me like you're just re-writing the ADO.NET SqlConnection (which already has an attached property of type SqlCommand). Or Linq to SQL (or, even, Linq to Entities).

When doing data access i tend to split it into 2 tiers - purely for testability.
totally seperate the logic for getting key values and managing the really low level data collection from the atomic inserts, updates, selects deletes etc.
This way you can test the logic of the low level data collection very easily without needing to read and write from a database.
this way one layer of classes effectively manages writes to individual tables whilst the other is concerned with getting the data from lookups etc to populate these tables
The Business logic layer that sits on top of these 2 dal layers obviously manages the actual business logic - this means that the datastructure is as seperated from the business logic as is realistically possible ... Ie you could replace the dal and not feel the pain so much.
the 2 routes you can take that work well are
ADO.Net
this is very powerful as you have total control, but at the same time it is time consuming and feels repetative. Also its old school so most people are bored of it hence all the linq 2 sql comments. With this you open a connection to the DB and then execute a command against it.
Basically you create a class to interface with the database and use this to use stored procedures that are in the database. The lowest level class essentially fires off the command with its parameters and then populates itself with the returned values.
and Linq 2 SQL
This is a cool system. Essentially it makes SP's redundant for 90% of cases in return for allowing strongly typed sqlesque statements in your code - save time and are more reliable. I still use 2 dal layers with this but take advantage of the fact that it will generate the basic class with properties for you and simply add functionality to actually do the atomic operations. The higher level then implements the read and write logic for multiple objects.
The nicest part is that you can generate collections of collections easily with linq 2 sql and then write all the inserts and updates with one command (altohguh in reality you tend to do things seperatley).
L2S is powerful once you start playing with it wheras generating a collection of objects from ado.net can be a real pain in comparison - especially when you have to do it again and again.
Another alternative is Linq 2 entities
I ahve had problems with this due to linked servers, also it doesn't like views much and if your tables dont have pk's or constraints then it doesn't like life much either. Id stay clear of it for a while.
Of course if you mean that you want a generic class for writing and reading data from a database I think you will be adding complexity rather than solving a problem. Really you can;t avoid writing code ;) - each bit of data access is unique, trying to genericise it past ado.net or l2s is really asking for trouble imo.

Small project:
A singleton class (like DatabaseConnection) might be good for what you're doing.
Large project:
Enterprise Library has some database code; NHibernate or Entities Framework, perhaps.
Your question wasn't specific enough to give a very definitive answer on this.

Sometimes Connected CRUD application DAL

I am working on a Sometimes Connected CRUD application that will be primarily used by teams(2-4) of Social Workers and Nurses to track patient information in the form of a plan. The application is a revisualization of a ASP.Net app that was created before my time. There are approx 200 tables across 4 databases. The Web App version relied heavily on SP's but since this version is a winform app that will be pointing to a local db I see no reason to continue with SP's. Also of note, I had planned to use Merge Replication to handle the Sync'ing portion and there seems to be some issues with those two together.
I am trying to understand what approach to use for the DAL. I originally had planned to use LINQ to SQL but I have read tidbits that state it doesn't work in a Sometimes Connected setting. I have therefore been trying to read and experiment with numerous solutions; SubSonic, NHibernate, Entity Framework. This is a relatively simple application and due to a "looming" verion 3 redesign this effort can be borderline "throwaway." The emphasis here is on getting a desktop version up and running ASAP.
What i am asking here is for anyone with any experience using any of these technology's(or one I didn't list) to lend me your hard earned wisdom. What is my best approach, in your opinion, for me to pursue. Any other insights on creating this kind of App? I am really struggling with the DAL portion of this program.
Thank you!

If the stored procedures do what you want them to, I would have to say I'm dubious that you will get benefits by throwing them away and reimplementing them. Moreover, it shouldn't matter if you use stored procedures or LINQ to SQL style data access when it comes time to replicate your data back to the master database, so worrying about which DAL you use seems to be a red herring.
The tricky part about sometimes connected applications is coming up with a good conflict resolution system. My suggestions:
Always use RowGuids as your primary keys to tables. Merge replication works best if you always have new records uniquely keyed.
Realize that merge replication can only do so much: it is great for bringing new data in disparate systems together. It can even figure out one sided updates. It can't magically determine that your new record and my new record are actually the same nor can it really deal with changes on both sides without human intervention or priority rules.
Because of this, you will need "matching" rules to resolve records that are claiming to be new, but actually aren't. Note that this is a fuzzy step: rarely can you rely on a unique key to actually be entered exactly the same on both sides and without error. This means giving weighted matches where many of your indicators are the same or similar.
The user interface for resolving conflicts and matching up "new" records with the original needs to be easy to operate. I use something that looks similar to the classic three way merge that many source control systems use: Record A, Record B, Merged Record. They can default the Merged Record to A or B by clicking a header button, and can select each field by clicking against them as well. Finally, Merged Records fields are open for edit, because sometimes you need to take parts of the address (say) from A and B.
None of this should affect your data access layer in the slightest: this is all either lower level (merge replication, provided by the database itself) or higher level (conflict resolution, provided by your business rules for resolution) than your DAL.

If you can install a db system locally, go for something you feel familiar with. The greatest problem I think will be the syncing and merging part. You must think of several possibilities: Changed something that someone else deleted on the server. Who does decide?
Never used the Sync framework myself, just read an article. But this may give you a solid foundation to built on. But each way you go with data access, the solution to the businesslogic will probably have a much wider impact...

There is a sample app called issueVision Microsoft put out back in 2004.
http://windowsclient.net/downloads/folders/starterkits/entry1268.aspx
Found link on old thread in joelonsoftware.com. http://discuss.joelonsoftware.com/default.asp?joel.3.25830.10
Other ideas...
What about mobile broadband? A couple 3G cellular cards will work tomorrow and your app will need no changes sans large pages/graphics.
Excel spreadsheet used in the field. DTS or SSIS to import data into application. While a "better" solution is created.
Good luck!

If by SP's you mean stored procedures... I'm not sure I understand your reasoning from trying to move away from them. Considering that they're fast, proven, and already written for you (ie. tested).
Surely, if you're making an app that will mimic the original, there are definite merits to keeping as much of the original (working) codebase as possible - the least of which is speed.
I'd try installing a local copy of the db, and then pushing all affected records since the last connected period to the master db when it does get connected.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.