Using NHibernate with ancient database with some "dynamic" tables - c#

I have a legacy database with a pretty evil design that I need to write some applications for. I am not allowed to touch the database design at all, seeing how this is a fragile old system held together by spit and prayers. I am of course very aware that this is not how the database should have been designed in the first place, but real life sometimes gets in the way.
For my new application I am using NHibernate (with Fluent for mappings and NHibernate LINQ for querying) and trying to Do Things Right. So there is IoC and repositories and more interfaces than I can count. However, the DB structure is giving me some headaches.
The system is very much focused around the concept of customers, and each customer lives in a campaign. These campaigns are created by one of the old applications. Each campaign in the system is defined in a table called CampaignSettings. One of the columns of this table is simply a text column called "Table", which refers to a database table that is created at the same time as the campaign entry in CampaignSettings. The name of this table is related to the name of the campaign, which can pretty much be anything the customer wants (within the constraints given by SQL Server (2000 or 2005)). In these tables the customers live.
So that is challenge #1 - I won't know the table names until runtime. And they will change from site to site - no static mapping, I guess.
To make it even worse, we have challenge #2 - this campaign table is also dynamic in structure, meaning it has a certain number of columns that are always there (customer id, name, phone number, email address and other housekeeping stuff), and then there are two other sets of columns, added depending on the requirements of the customer on a case-by-case basis.
The old applications use SQL to get the column names present in the table, then add the ones they don't know about as "custom fields" in the application. I need to handle this.
I know I probably can't handle these challenges simply by using mapping magic, and I am prepared to do some ugly SQL in addition to the ORM goodness that I get from NHibernate (there are 20-some "static" tables in here as well which NHibernate handles beautifully) - but how?
I will create a Customer entity that I guess I can populate manually by doing direct SQL like
SELECT * FROM SomeCampaignTable WHERE id=<?>
and then going through the columns one by one and putting stuff where it belongs. Not fun, but necessary.
And then I guess to discover the structure of the table in the first place, I could run SQL like this:
SELECT COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'SomeCampaignTable'
ORDER BY ORDINAL_POSITION
And again do some manual work to configure my object to handle the custom fields.
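In code, I imagine the discovery step would look roughly like this (just a sketch, going through the NHibernate session's ADO.NET connection; the helper name is made up):

using System.Collections.Generic;
using NHibernate;

public static class CampaignTableInspector
{
    // Returns the column names of a campaign table, discovered at runtime.
    public static IList<string> GetColumns(ISession session, string tableName)
    {
        var columns = new List<string>();
        using (var cmd = session.Connection.CreateCommand())
        {
            cmd.CommandText =
                "SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS " +
                "WHERE TABLE_NAME = @table ORDER BY ORDINAL_POSITION";
            var p = cmd.CreateParameter();
            p.ParameterName = "@table";
            p.Value = tableName;
            cmd.Parameters.Add(p);

            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    columns.Add(reader.GetString(0));
        }
        return columns;
    }
}

Any column not in the known housekeeping set would then become a "custom field" on my Customer object.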
My question is simply - how can I do this in NHibernate? Is it a simple matter of finding a way to run my own SQL, then looping through the results, or is there a more elegant way to take the pain out of it?
While I appreciate that this database design belongs in some kind of Museum of Torture somewhere, answers like "Add some views" or "Change the DB" won't help me - I will be shot if I suggest something like that.
Thanks for anything that could help save my sanity here!

You might be able to do this in NHibernate using native SQL entity queries. Forget Linq2NH for this - not that I would recommend Linq2NH for any serious application anyway.
Check this page.
13.1.2. Entity queries
https://www.hibernate.org/hib_docs/nhibernate/1.2/reference/en/html/querysql.html
You could maybe do something like this:
Map your entities based on a 'fake' table to keep NHibernate happy when it compiles the mapping documents (I know you said you can't change the DB, but hopefully it's OK to create an empty table to keep NH happy).
Then run a query like this, as per 13.1.2 above:
sess.CreateSQLQuery("SELECT tempColumn1 as mappingFileColumn1, tempColumn2 as mappingFileColumn2, tempColumn3 as mappingFileColumn3 FROM tempTableName").AddEntity(typeof(Cat));
NHibernate should stitch together the columns you've returned with the mapped entity and give you the entity of type 'Cat' with all the properties populated. I am speculating here though; I don't know for sure that this will work, but it's the only way I can think of to use NHibernate for this, given that you don't know the tables/columns at compile time. You definitely cannot use HQL, Criteria, or Linq2NH, since they all translate your mappings into the mapped table and column names to produce the underlying SQL, and those names aren't known until runtime. Native SQL queries are the only way, I think.
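To make that concrete, something along these lines (purely illustrative - the repository, column aliases and table name are invented, and the Customer mapping is assumed to point at the placeholder table):

using System.Collections.Generic;
using NHibernate;

public class CampaignCustomerRepository
{
    // Loads customers from whichever table CampaignSettings names at runtime.
    public IList<Customer> GetCustomers(ISession session, string campaignTable)
    {
        // Alias the dynamic table's columns to the column names used in the
        // mapping so NHibernate can hydrate the entity from the result set.
        var sql = string.Format(
            "SELECT CustomerId AS Id, Name AS Name, Phone AS PhoneNumber FROM [{0}]",
            campaignTable);

        return session.CreateSQLQuery(sql)
                      .AddEntity(typeof(Customer))
                      .List<Customer>();
    }
}

The per-campaign custom columns would still need separate handling (e.g. read into a name/value dictionary with a plain ADO.NET command), since they can't appear in a static mapping.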

Related

linq2sql C#: How to query from a table with changing schema name

I have a webservice which tries to connect to the database of a desktop accounting application.
It has tables with the same name but different schema names, such as:
[DatabaseName].[202001].[CustomerCredit]
[DatabaseName].[202002].[CustomerCredit]
...
[DatabaseName].[202014].[CustomerCredit]
[DatabaseName].[202015].[CustomerCredit]
[DatabaseName].[202016].[CustomerCredit]
...
[DatabaseName].[2020xx].[CustomerCredit]
The schema name is in the format [Year+IncrementalNumber], such as [202014], [202015], [202016], etc.
Whenever I want to query customer credit information in the database, I should fetch it from the schema with the highest number - e.g. [DatabaseName].[202016].[CustomerCredit] if [202016] is the latest schema in my DB.
Note:
Creation of new schemas in the accounting application database follows no rules; it is completely decided by the user of the accounting application, and every instance of the application installed somewhere else may have a different number of schemas.
So when I'm developing my webservice, I have no idea which schema to connect to prior to development. At run time I can find the correct schema to query, but I don't know how to get my queries to use that schema name against the right tables.
I usually create a LINQ-to-SQL DBML class and use its definitions to read information from the DB, but I don't know how to manage the schema change this way.
The DBML designer manages schema names like this:
[global::System.Data.Linq.Mapping.TableAttribute(Name="[202001].CustomerCredit")]
However since my app can retrieve schema name in run time, I don't know how to fix table declaration in my special case.
It is so easy to handle in ADO.NET, but I don't know its equivalent in LINQ-to-SQL:

var sql = "select count(*) from [" + Variables.FinancialYearSchemaName + "].CustomerCredit where SFC_Status = 100;";
Ultimately, no: most ORMs do not expect the schema to vary at runtime, so most - including EF and LINQ-to-SQL - do not support this scenario. One possible option would be to have different connection strings, each using a different user account, where each account has a different default schema configured at the database - and initialize your DB context with the connection string or connection that matches the required account. Then if EF asks the RDBMS for [CustomerCredit], it will look first in that account's default schema ([202014].[CustomerCredit]). You should probably avoid having a [dbo].[CustomerCredit] in that scenario, to prevent confusion. This is, however, a pretty hacky and ugly solution. But... it should work.
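To illustrate the idea (very much a sketch - the login naming convention, connection string, and mapping are all invented, and the mapping deliberately omits the schema so the emitted SQL is unqualified):

using System.Data.Linq;
using System.Data.Linq.Mapping;
using System.Linq;

// No schema in the mapping: LINQ-to-SQL emits a plain [CustomerCredit],
// which SQL Server resolves against the connecting user's default schema.
[Table(Name = "CustomerCredit")]
public class CustomerCredit
{
    [Column(IsPrimaryKey = true)] public int ID;
    [Column] public int SFC_Status;
}

public static class SchemaTrick
{
    public static int CountActive(string latestSchema)
    {
        // Assumes one SQL user per schema, created up front, e.g.:
        //   CREATE USER credit_202016 FOR LOGIN credit_202016
        //       WITH DEFAULT_SCHEMA = [202016];
        var cs = "Server=.;Database=DatabaseName;" +
                 "User Id=credit_" + latestSchema + ";Password=...;";
        using (var db = new DataContext(cs))
        {
            return db.GetTable<CustomerCredit>().Count(c => c.SFC_Status == 100);
        }
    }
}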
Alternatively, you would have to take more control over the data access, essentially writing your own SQL (presumably with a token replacement for the schema, which has problems of its own).
That schema is essentially a manual partitioning of the CustomerCredit table. The best solution would be one that makes the partitioning transparent to all users. The code shouldn't know how the data is partitioned.
Database Solutions
The benefit of database solutions is that they are transparent, or almost transparent, to users and require minimal maintenance.
Table Partitioning
The clean solution would be to use table partitioning, making the different partitions transparent to all users. Table partitioning used to be an Enterprise-only feature but it became available in all editions since SQL Server 2016 SP1, even Express. This means it's free in all versions still in mainstream support.
The table is partitioned based on a function (e.g. a date-based function) and stored in different files. Whenever possible, the query optimizer can check the partition boundaries against the query conditions and scan only the files that contain the relevant data. E.g. in a date-partitioned table, queries that contain a date filter can search only the relevant partitions.
Partitioned views
Another option, available since SQL Server 2000 at least, is to use partitioned views, essentially a UNION ALL view that combines all table partitions, e.g.:
SELECT <select_list1>
FROM [202001].[CustomerCredit]
UNION ALL
SELECT <select_list2>
FROM [202002].[CustomerCredit]
UNION ALL
...
SELECT <select_listn>
FROM [2020xx].[CustomerCredit];
EF can map entities to views instead of tables. If the criteria for updatable views are met, the partitioned view itself will be updatable and any modifications will be made to the correct table.
The query optimizer can take advantage of CHECK constraints on the tables to search only one table at a time, similar to how partitioned tables work.
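On the EF side, the mapping can simply point the entity at the view. A minimal sketch (the view name and connection string are invented; writes only go through if the view meets SQL Server's updatable-view criteria):

using Microsoft.EntityFrameworkCore;

public class CustomerCredit
{
    public int ID { get; set; }
    public int SFC_Status { get; set; }
}

public class CreditContext : DbContext
{
    public DbSet<CustomerCredit> CustomerCredits { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options.UseSqlServer("Server=.;Database=DatabaseName;Trusted_Connection=True;");

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Map the entity to the UNION ALL view instead of a base table.
        // Mapping it as a "table" keeps inserts/updates enabled; SQL Server
        // routes them to the correct partition when the view is updatable.
        modelBuilder.Entity<CustomerCredit>().ToTable("CustomerCreditAll");
    }
}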
Code solutions
This requires raw SQL queries, and a way to identify the correct table/schema each time a change is made. It requires modifications to the application each time the table partitioning changes, whether those are code modifications, or changes in a configuration file.
In all cases, one query can only read from one table at a time.
Keep ADO.NET
One possibility is to keep using ADO.NET, replacing the table/schema name in a query template. The code will have to map to objects if needed, the same way it already did.
EF Raw SQL
Another is to use EF's raw SQL features, e.g. EF Core's FromSqlRaw, to query from a specific table, the same way ADO.NET would. The benefit is that EF will map the query results to objects. In EF Core, the raw query can be combined with LINQ operators:
var query=$"select * from [DatabaseName].[{schemaName}].[CustomerCredit]"
var credits = context.CustomerCredits
.FromSqlRaw(query)
.Where(...)
.ToList();
Dapper
Another option is to use Dapper or another micro-ORM with an ad-hoc query, similar to ADO.NET, and map the results to objects:
var query=$"select * from [DatabaseName].[{schemaName}].[CustomerCredit] where customerID=#ID";
var credits=connection.Query<CustomerCredit>(query,new {ID=someID});

SQL Server and Entity Framework - Dynamic Columns

I use SQL Server and Entity Framework as ORM.
Currently I have a table Product which contains all products of any kind. The different kinds of products possess different attributes.
For example:
All products of kind TV have attributes title, resolution and contrast
Whereas all products of kind Car have attributes like model and horsepower
Based on this scenario I created a table called Attribute which contains all the attributes of a product.
Now, to fetch a product from the database, I always have to join all the attributes.
To insert a product I have to insert all the attributes one by one as single rows.
The application is not just a shop or anything like it. It should be possible to add/remove an attribute to/from a kind of product on the fly without changing the db.
But my questions to you are still:
Is this a bad design?
Is there another way of doing it?
Will my solution slow things down significantly? (e.g. an insert taking several seconds, assuming the product has hundreds of attributes...)
Update
The problem is that my application is really complex. There are a lot of huge algorithms. The software is used for statistical purposes.
One problem, for example, is the following: in an algorithm table I'm storing which attributes are used for filters. Say an administrator wants to filter all cars that have less than 100 horsepower. The filters are dynamic, which means that I have a filter table which stores the filter type (lessThan) and the attribute (horsepower). How can I keep this flexibility with the suggested approaches (with "hardcoded" columns)?
There is a thing about EF that I don't think everybody is aware of when designing the relations.
When you query something, EF (at least <= 4) wants to create a single SELECT for that query.
What that implies is that if you have entity A with a one-to-many relationship to entity B (say Item to Attributes), then EF joins the two together such that there will be a returned row for every dependent B of each A. If A has many properties, multiple dependencies, or even worse if B has many sub-dependencies, then the returned table will be quite massive, since all A-properties are copied into each row of a dependent B. Over time, as your entity models grow in complexity, this can turn into a real performance problem.
EF only includes the Bs if you explicitly tell it to eager-load the dependencies via "Include"s. If the Includes are omitted, your stuff will initially load faster, but once you access your attributes they will be lazy-loaded by EF. This is known as the SELECT N+1 problem: loading N As triggers N additional lazy queries for their Bs, which can be a huge overhead.
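A quick sketch of the difference (the context and entity types here are invented for illustration):

using System.Collections.Generic;
using System.Data.Entity;  // EF6; EF Core's Include works the same way
using System.Linq;

public class Item
{
    public int Id { get; set; }
    public virtual ICollection<ItemAttribute> Attributes { get; set; }  // virtual enables lazy loading
}

public class ItemAttribute
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Value { get; set; }
}

public class ShopContext : DbContext
{
    public DbSet<Item> Items { get; set; }
}

public static class LoadingDemo
{
    public static void Compare(ShopContext context)
    {
        // Eager loading: one big joined SELECT. Every Item's columns are
        // repeated once per dependent attribute row, inflating the result set.
        var eager = context.Items.Include(i => i.Attributes).ToList();

        // Lazy loading: small initial query, then one extra query per item
        // the moment .Attributes is touched - the SELECT N+1 problem.
        var lazy = context.Items.ToList();
        foreach (var item in lazy)
        {
            var count = item.Attributes.Count;
        }
    }
}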
While this is not a straight answer to your question, it is something to consider when designing your tables.
Also note that EF supports several alternatives for base-classing. One strategy is to have a common table that is automatically joined together with the sub-entities. The alternative, which typically performs better but is harder to upgrade, is to have one table with a super-set of all properties of all sub-classes.
More (over) generalized database design considerations:
The devil is in the details. You can make a whole career out of making good database design choices. There are no silver-bullet database patterns.
EF comes with a lot of limitations. This is the price for the convenience. If the model suits EF well, then EF is quite good, but do consider more flexible alternatives like NHibernate. Sometimes even plain old data tables with views and stored procedures are to be preferred.
EF is not efficient if your model has a lot of small dependents (like a ton of attributes on an item table). It will result in either a monster query and result table, or the SELECT N+1 problem. You can write some tricky multi-part LINQ queries to somewhat compensate, but it is tricky.
SQL's strength is in integrity and reporting which works best for rather rigid data models.
Depending on the details, your model looks like a great candidate for a NoSQL backend, like RavenDB or MongoDB. NoSQL is much better for dynamic data models and scales really well.

How to design a Data Access Layer for a database table that may change in the future?

Introduction:
I'm refactoring (pretty much rewriting) a legacy application in my current internship. The part this question is concerned with is the database the application uses and the way data is retrieved from it.
The database structure is:
There's a table that has the main records. Let's say each record is a measurement. It has some info about the measured material and different measurement information.
There's a table view they use that has the same information columns, plus some extra columns that contain data calculated from the given measurements. It also filters out some of the data from the table.
So let's say we have the main table with columns:
Measurement ID
Measurement A
Measurement B
The view has something like this:
Measurement ID
Measurement A
Measurement B
Some extra data (for example Measurement A * Measurement B)
The guy that is leading the development only knows some SQL, so he likes adding new columns that are calculated from columns in the main table, for experimenting. And this is definitely a need at the moment.
Requirements are:
Different types of databases should be supported (like SQL Server, Oracle, and probably some others).
The frontend should be able to show the view, which means even though some main columns will always stay the same, there may be some new columns including newly calculated values.
My question is:
What kind of system should I use to accommodate the needs of this application? I wanted to use Entity Framework, but the fact that the view may get new columns in the future is, I think, a problem. As far as I understand, I have to map my classes to the database before compiling.
The other thing that I'm considering is maybe using Entity Framework to get data from the main table and do the calculations and the filtering that is currently done in the table view directly in the frontend, and skip the view altogether. Which sounds fine, though I don't know if they will allow me to do that.
What would you do in my case? Please take into account that I have virtually no experience with databases and ORMs.
You are correct in that using Entity Framework will be a problem if the underlying DB schema is always changing. It will require you to update the EF model on your end every time to grab those new columns.
Ideally, all of your database access is hidden behind the interface to your DAL, so that your application doesn't need to know about which ORM is being used -- if any -- or which database it's connecting to.
I hate to say it, but given your requirements, an ORM might not make sense. You might want to go with something more generic, without any strong typing. You could simply always return a DataTable to your application layer, and it could loop through the columns and values to display whatever is returned. If there are fields you know will never change, you could create a manual mapping for those fields only into your application object(s).
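For instance, the DAL contract might look something like this (a sketch; the names are invented, with the fixed columns taken from your main table):

using System.Data;

// The application layer sees only this interface; it neither knows nor
// cares whether ADO.NET, an ORM, or raw SQL sits behind it.
public interface IMeasurementRepository
{
    // Returns whatever columns the view currently has; the UI can loop
    // over table.Columns to render new ones without recompiling.
    DataTable GetMeasurementView();

    // Strongly typed access for the fields known to never change.
    void SaveMeasurement(Measurement measurement);
}

public class Measurement
{
    public int MeasurementId { get; set; }
    public double MeasurementA { get; set; }
    public double MeasurementB { get; set; }
}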
You may have a look at NoSQL systems, which are a lot more flexible about schema. Or have a look at a document database like RavenDB. All these systems allow the schema to change dynamically. You need to check the pros and cons to see if one can fulfill your requirements.
(This answer is a bit out of subject as it's about replacing the SQL server and not really creating a DAL, but other answers cover the subject well and I would like to propose another way that may help.)
If your schema is unstable, then using Entity Framework as a beginner is going to be a headache. The assumption is that you can just refresh the design canvas periodically to let the tool handle database table changes. You can try that for a time to see when it becomes too much of a pain, but without any prior experience using ORMs or Entity Framework it may not be worth the effort.
I would probably use something like Rob Conery's Massive ORM (https://github.com/robconery/massive). It gives you more flexibility with the underlying database schema and is a very small library. I remember it being ~300 lines of code and very easy to use. It uses C# dynamics so you'll have to be using >= C# 4.0 and be comfortable with that one concept but IMO it's worth it for the low-overhead. A full-fledged ORM like Entity Framework or NHibernate is going to cost a lot of learning cycles.
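From memory, using Massive looks roughly like this (treat the details as approximate and check the project page):

using Massive;  // the single-file micro-ORM from the GitHub repo above

// Massive wraps a table (or view) in a DynamicModel; rows come back as
// 'dynamic', so new view columns simply show up as new properties at runtime.
public class Measurements : DynamicModel
{
    public Measurements()
        : base("MyConnectionStringName",
               tableName: "MeasurementView",
               primaryKeyField: "MeasurementID") { }
}

// Usage:
//   var table = new Measurements();
//   foreach (var row in table.All())
//   {
//       // row.MeasurementA, row.BrandNewColumn - all resolved at runtime
//   }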
You could, of course, just stick to ADO.NET DataTables. They're a bit ugly and verbose, but they'll do the job.
You can use Entity Framework Database First if the DB is changing. Of course, you will have to regenerate your classes whenever the DB schema changes and you want access to the new columns.
If you need to accommodate different database servers, then you should take a look at implementing a repository pattern and abstract all your data access that way.
Your comment
it involves write operations to the main table but the main table never changes
confirms what I was hoping for. It means you can use Entity Framework as the core of your application and a different route to display the data.
Suppose that for display (of the view) you use a classic DataTable (because all common grids support them, unlike dynamic objects). I don't know how create/update/delete will be done, but saving changes will at some point involve mapping a DataRow to a MainEntity object. You can write one method for that, like:
MainEntity DataRowToEntity(DataRow row)
{
    var entity = new MainEntity();
    entity.PropertyA = (string)row["PropertyA"];  // cast to the property's actual type
    // ... map the remaining fixed columns the same way
    return entity;
}
The MainEntity can then be attached to a context, its state changed to Modified, and saved, roughly as sketched below.
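A minimal sketch of that last step (assuming an EF6-style context; MyContext and MainEntities are invented names):

using System.Data;
using System.Data.Entity;  // EF6

public class MainEntitySaver
{
    public void SaveEditedRow(DataRow row)
    {
        var entity = DataRowToEntity(row);  // the mapping method from above, assumed in scope
        using (var context = new MyContext())
        {
            context.MainEntities.Attach(entity);
            context.Entry(entity).State = EntityState.Modified;  // marks every column dirty
            context.SaveChanges();
        }
    }
}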

Create a database and entity model based on user input

Currently I'm adjusting a system that works with Entity Framework to connect to a SQL Server 2008 R2 database.
For the new part, the key users need to add, change and remove entities that the normal users can use. Before I build a system that saves objects with names and attributes, I wanted to see whether it is possible to create the database dynamically from the entities that the key users define (through a simplified entity designer).
I've searched a little bit on the internet but didn't find anything quite like this. Maybe someone here knows something to push me in the right direction?
It sounds like you are best off really defining tables, columns, indexes and foreign keys dynamically. If you were to use a "database of databases" schema with entities and attributes, you would be unable to index the database effectively. Queries become extraordinarily slow and nasty.
You can query and change the database schema using SQL Server Management Objects (SMO). I have used them multiple times. They work and are quite nice to work with.
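For example, creating a table at runtime with SMO goes roughly like this (a sketch from memory; the database and column names are made up):

using Microsoft.SqlServer.Management.Smo;

public static class SchemaBuilder
{
    // Creates a user-defined table on the fly via SMO.
    public static void CreateEntityTable(string tableName)
    {
        var server = new Server(".");                   // local default instance
        var database = server.Databases["MyDatabase"];  // made-up database name

        var table = new Table(database, tableName);

        var id = new Column(table, "Id", DataType.Int) { Nullable = false };
        id.Identity = true;
        id.IdentitySeed = 1;
        id.IdentityIncrement = 1;
        table.Columns.Add(id);

        table.Columns.Add(new Column(table, "Name", DataType.NVarChar(200)));

        table.Create();  // emits and executes the CREATE TABLE statement
    }
}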
I'm not convinced that Entity Framework brings much to the table here. EF is good for expressing queries and DML on a static schema. If you were to use a dynamic schema you lose most of the benefits. Of course, some benefits remain such as entity key management and being able to use Entity SQL instead of T-SQL. On the downside you have to create all EF metadata at runtime (probably generate EDMX files or dynamic assemblies).
I think it is not worth it. I'd strongly consider building a database schema at runtime and executing queries against it using dynamically built T-SQL. It is much easier to do this than work against the system with EF.
In that sense you are back to DataTables and GridViews, which were considered good style even 5 years ago. It's probably not too bad.

DB design when data is unknown about an entity?

I'm wondering if the following DB schema would have repercussions later. Let's say I'm writing a place entity. I'm not certain what properties of place will be stored in the DB. I'm thinking of making two tables: one to hold the required (or common) info, and one to hold additional info.
Table 1 - Place
PK PlaceId
Name
Lat
Lng
etc... (all the common fields)
Table 2 - PlaceData
PK DataId
PK FieldName
PK FK PlaceId
FieldData
Usage Scenario
I want certain visitors to have the capability of entering custom fields about a place. For example, a restaurant is a place that may have the following fields: HasParking, HasDriveThru, RequiresReservation, etc... but a car dealer is also a place, and those fields wouldn't make sense for a car dealer.
I want to support any type of place, from a single table (well, 2nd table has custom fields), because I don't know the number of types of places that will eventually be added to my site.
Overall goal
On my ASP.NET MVC (C#/Razor) site, where I display a place, it will show the attributes as an unordered list populated by: SELECT * FROM PlaceData WHERE PlaceId = #0.
This way, I wouldn't need to show empty field names on the view (or do a string.IsNullOrWhiteSpace() check for each and every field, which I would be forced to do if every attribute were a column on the table).
I'm assuming this scenario is quite common, but are there better ways to do it? Particularly from a performance perspective? What are the major drawbacks of this schema?
Your idea is referred to as an Entity-Attribute-Value (EAV) table and is generally bad news in an RDBMS. RDBMSes are geared toward highly structured data.
The overall options are:
Model the db further in an RDBMS, which is most likely if someone is holding back specs from you.
Stick with the RDBMS, using XML columns for the data whose structure is variable. This makes the most sense if a relatively small portion of your data storage schema is semi- or un-structured. Speaking from a MS SQL Server perspective, this data can be indexed and you can perform checks that your data complies with an XML schema definition.
Move to a non-relational DB such as MongoDB, Cassandra, CouchDB, etc. This is what a lot of social sites and I suspect blog sites run with. Also, it is within reason to use a combination of RDBMS and non-relational stores if that's what your needs call for.
EAV gets to be a mess because you're creating a database within a database: you lose all of the benefits an RDBMS can provide (foreign keys, data type enforcement, etc.), and the SQL code needed to reconstruct your objects goes from lasagna to fettuccine to spaghetti in the blink of an eye.
Given the information that's been added to the question, it would seem a good fit to create a PlaceDetails column of type XML in the Place table. You could also split that column into another table with a 1:1 relationship if performance requirements dictate it.
The upside to doing it that way is that you can retrieve the data using very simple SQL code, even using the xml data type's methods for searching the data. But that approach also allows you to do the more complex presentation-oriented data parsing in C#, which is better suited to that purpose than T-SQL is.
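For instance, once the PlaceDetails XML is back in C#, turning it into name/value pairs for that unordered list is a few lines of LINQ to XML (a sketch; the XML shape is invented):

using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class PlaceDetailsParser
{
    // Expects XML like: <details><HasParking>true</HasParking>...</details>
    public static Dictionary<string, string> Parse(string placeDetailsXml)
    {
        return XDocument.Parse(placeDetailsXml)
                        .Root
                        .Elements()
                        .ToDictionary(e => e.Name.LocalName, e => e.Value);
    }
}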
If you want your application to be able to create its own custom fields, this is a fine model. The Mantis Bugtracker uses this as well to allow Admins to add custom fields to their tickets.
If, in any case, it's going to be the programmer who creates the field, I must agree with pst that this is more of a premature optimization.
At any given time you can add new columns to the database (always watching out for third normal form), so you should go with what you want and only create a second table if needed, or if such columns would break any of the normal forms.
