How to save unknown properties in a database? - C#

I'm writing a housekeeping-book application.
Some properties, like Name and Category, are defined in code, but others need to be created at run-time.
How should I store these in a SQL Server database in a human-readable way?
My suggestion is to create a table called Properties with two columns (Id, Name) and store all my properties in that table, but then it wouldn't be human-readable anymore.
I'm also not sure it would be wise to create a column for each property in one big table.
I could also create an XML "file" and store it in my DB, but I don't think that's a good idea either.
Any advice is greatly appreciated.

There are basically three approaches to this:
A column for every value.
The one you are suggesting, which is called an Entity-Attribute-Value (EAV) model.
Or the one you discounted, which would be XML (or serialised objects).
They all have pros and cons, and some of the cons can get quite severe.
A column for every value means you have to change your DB and model every time you want to store more data, which makes it very fragile and high-maintenance.
EAV can easily lead to queries becoming huge joins, and imposing data integrity on it is a hiding to nothing.
Object-based storage can also lead to significant optimisation and maintenance issues: having to open every object to see if something is in it, for instance.
Now any one of these might be the best of a bad lot at the time you make the decision (they are all fragile in one respect or another), IF you insist on using a relational database.
Look at one of the NoSQL alternatives; they are designed for this sort of data.
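To illustrate the EAV option in particular, here is a rough sketch of what reading such a table back into the application tends to look like (the table and column names are made up, and every value comes back as an untyped string):

using System.Collections.Generic;
using System.Data.SqlClient;

// Reads all attribute rows for one item from a hypothetical PropertyValues(ItemId, Name, Value)
// table into a simple property bag. Everything is a string, so typing and constraints are on you.
static Dictionary<string, string> LoadProperties(string connectionString, int itemId)
{
    var properties = new Dictionary<string, string>();
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(
        "SELECT Name, Value FROM PropertyValues WHERE ItemId = @ItemId", connection))
    {
        command.Parameters.AddWithValue("@ItemId", itemId);
        connection.Open();
        using (var reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                properties[reader.GetString(0)] = reader.GetString(1);
            }
        }
    }
    return properties;
}

As soon as you need several attributes side by side in SQL (for filtering or reporting), you end up joining that table once per attribute, which is where the huge joins come from.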

Related

SQL - Storing values that have dynamic properties and constraints

I'm not sure if what I'm attempting to do is simply incorrect/impossible, or if there is an easier way and I'm missing the point.
I'm using SQL Server 2012.
What I would like to do is have a table that can store rows with values relating to properties stored in another table - basically, key-value pairs. The thing is, I would like to determine which key values can be used by which entities.
For example,
I would like:
one table listing various companies,
another to store 'files' created for each company - this is used to store historical information,
another listing various production departments (stages in production),
another listing production figures (KGs, Units, etc.),
and one listing the actual production captured against these figures for each month.
There are also tables in place to show which production departments can use which production figures, as well as which company has which production departments.
Some of the companies have the same stages in production as well as additional stages that the others don't.
These figures are captured on a monthly basis ONLY, so I have a table describing all the months of a year.
Each production department may have similar types of recordings to be captured, though they don't all have the same production readings.
Here's a link to a graphical representation of the table layouts:
http://tinypic.com/r/30a51mx/8
My end result is to auto-populate/update the table with newly added figures as the user enters this section of the program (by passing through the FileID), and to allow the user to edit this using a DataGridView (or at least select a value to be edited from the DataGridView).
I will then need to write reports later on that will need to pivot on this information.
Any help or suggestions would be greatly appreciated.
Thanks
For an effective DB design it is very important to understand two major requirements:
Should the DB design be done with ease of use from the application's point of view in mind, or with efficient storage in mind?
This point is by and large decided by the following factors:
How much data are we going to store? We need some idea of the cost of storage, factoring in redundancy; a well-normalised DB reduces redundancy.
Your DB is normalised very well, but is that really needed? Storage is typically very cheap these days, so a slightly more redundant design should be OK - unless of course you plan to use the Standard edition of SQL Server, which has its own limitations in terms of DB size.
Is data retrieval and update slow or fast? The more normalised the DB is, the more JOINs are expected. In your case, if you want to return values for multiple properties, say n, in a single result, you'd need to make n joins on the ProductionProperty table (see the sketch below this answer), which will reduce query performance and hence slow the user experience. So as long as your UI is not very demanding and your users can live with a small lag, go ahead with a normalised DB design.
ORM mismatch - the relational model and the object model (assuming your programming language follows OOP concepts) usually mismatch, and they will mismatch heavily in a normalised scenario like this; you'll need to spend more hours coding through or troubleshooting scenarios which may make you squirm in pain when making changes to either of these models. I suggest you use a good ORM framework to counter this and be aware of the ORM-mismatch scenarios.
Will you have a separate reporting DB or reporting tables? Basically this translates into: is your DB an OLTP database or a reporting database? If this DB is going to be worked on heavily by data-entry people day in and day out, normalised form should be suitable, provided point #1 is satisfied. If however reporting is a major need, then a de-normalised form should be preferred (which means you do not need so many separate tables).
PS: Master data should be kept in a table of its own. Months are definitely master data, and so is UoM, unless you plan to do CRUD on the UoM measures too. Also note that it hardly matters keeping Month in a separate table, especially when the same business logic/constraints can be enforced on columns in SQL.
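To make the n-joins point concrete, here is roughly what returning three properties as columns looks like when each property is stored as a row; the table, column and property names below are hypothetical:

// Hypothetical schema: ProductionCapture(FileId, PropertyId, MonthId, Value).
// Every extra property you want as a column in the result adds another join (or a PIVOT).
const string sql = @"
    SELECT f.FileId,
           kg.Value    AS Kilograms,
           units.Value AS Units,
           hours.Value AS Hours
    FROM   Files f
    LEFT JOIN ProductionCapture kg    ON kg.FileId = f.FileId    AND kg.PropertyId = 1 AND kg.MonthId = @MonthId
    LEFT JOIN ProductionCapture units ON units.FileId = f.FileId AND units.PropertyId = 2 AND units.MonthId = @MonthId
    LEFT JOIN ProductionCapture hours ON hours.FileId = f.FileId AND hours.PropertyId = 3 AND hours.MonthId = @MonthId
    WHERE  f.FileId = @FileId";

Three properties means three joins; thirty properties means thirty, which is the performance trade-off to weigh against the flexibility.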

How to design a Data Access Layer for a database table that may change in the future?

Introduction:
I'm refactoring (pretty much rewriting) a legacy application in my current internship. The part this question is concerned with is the database it uses and the way data is retrieved from it.
The database structure is:
There's a table that has the main records. Let's say each record is a measurement. It has some info about the measured material and different measurement information.
There's a view they use that has the same columns, plus some extra columns that contain data calculated from the given measurements. It also filters out some of the data from the table.
So let's say we have the main table with columns:
Measurement ID
Measurement A
Measurement B
The view has something like this:
Measurement ID
Measurement A
Measurement B
Some extra data (for example Measurement A * Measurement B)
The guy who is leading the development only knows some SQL, so he likes adding new columns that are calculated from columns in the main table, for experimenting. And this is definitely a need at the moment.
Requirements are:
Different types of databases should be supported (like SQL Server, Oracle, and probably some others).
The frontend should be able to show the view, which means even though some main columns will always stay the same, there may be some new columns including newly calculated values.
My question is:
What kind of system should I use to accommodate the needs of this application? I wanted to use Entity Framework, but the fact that the view may get new columns in the future is, I think, a problem. As far as I understand, I should map my classes to the database before compiling.
The other thing that I'm considering is maybe using Entity Framework to get data from the main table and do the calculations and the filtering that is currently done in the table view directly in the frontend, and skip the view altogether. Which sounds fine, though I don't know if they will allow me to do that.
What would you do in my case? Please take into account that I have virtually no experience with databases and ORMs.
You are correct in that using Entity Framework will be a problem if the underlying DB schema is always changing. It will require you to update the EF model on your end every time to grab those new columns.
Ideally, all of your database access is hidden behind the interface to your DAL, so that your application doesn't need to know about which ORM is being used -- if any -- or which database it's connecting to.
I hate to say it, but given your requirements, an ORM might not make sense. You might want to go with something more generic, without any strong typing. You could simply always return a DataTable to your application layer, and it could loop through the columns and values to display whatever is returned. If there are fields you know will never change, you could create a manual mapping for those fields only into your application object(s).
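For example, a rough sketch of that approach (the view name, column names and the Measurement class are made up - map only what you know is stable):

using System.Data;
using System.Data.SqlClient;

public class Measurement
{
    public int Id { get; set; }
    public double A { get; set; }
    public double B { get; set; }
}

public class MeasurementDal
{
    private readonly string _connectionString;

    public MeasurementDal(string connectionString)
    {
        _connectionString = connectionString;
    }

    // Returns whatever columns the view has today, including any newly added calculated ones.
    public DataTable GetMeasurementView()
    {
        var table = new DataTable();
        using (var connection = new SqlConnection(_connectionString))
        using (var adapter = new SqlDataAdapter("SELECT * FROM MeasurementView", connection))
        {
            adapter.Fill(table);
        }
        return table;
    }

    // Maps only the columns that are guaranteed to exist; the UI can still loop over
    // row.Table.Columns to display the experimental ones generically.
    public Measurement MapKnownColumns(DataRow row)
    {
        return new Measurement
        {
            Id = (int)row["MeasurementID"],
            A = (double)row["MeasurementA"],
            B = (double)row["MeasurementB"]
        };
    }
}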
You may have a look at NoSQL systems, which are a lot more flexible about the schema, or at a document database like RavenDB. All these systems allow the schema to change dynamically. You need to check the pros and cons to see whether they can fulfil your requirements.
(This answer is a bit off-topic, as it's about replacing SQL Server rather than creating a DAL, but other answers cover that subject well and I would like to propose another way that may help.)
If your schema is unstable, then using Entity Framework as a beginner is going to be a headache. The assumption is that you can just refresh the design canvas periodically to let the tool handle database table changes. You can try that for a time to see when it becomes too much of a pain, but without any prior experience using ORMs or Entity Framework it may not be worth the effort.
I would probably use something like Rob Conery's Massive ORM (https://github.com/robconery/massive). It gives you more flexibility with the underlying database schema and is a very small library. I remember it being ~300 lines of code and very easy to use. It uses C# dynamics so you'll have to be using >= C# 4.0 and be comfortable with that one concept but IMO it's worth it for the low-overhead. A full-fledged ORM like Entity Framework or NHibernate is going to cost a lot of learning cycles.
You could, of course, just stick to ADO.NET DataTables. They're a bit ugly and verbose, but they'll do the job.
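This isn't Massive's actual API, but here's a small illustration of the dynamic idea it builds on: projecting a DataRow into an ExpandoObject so your code doesn't have to name the columns at compile time.

using System.Collections.Generic;
using System.Data;
using System.Dynamic;

static class DynamicRows
{
    // Wraps a DataRow in an ExpandoObject keyed by column name.
    public static dynamic ToDynamic(DataRow row)
    {
        IDictionary<string, object> expando = new ExpandoObject();
        foreach (DataColumn column in row.Table.Columns)
        {
            expando[column.ColumnName] = row[column];
        }
        return expando;
    }
}

// Usage (column names are resolved at run time, so a schema change doesn't break compilation):
// dynamic m = DynamicRows.ToDynamic(row);
// var id = m.MeasurementID;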
You can use Entity Framework Database First if the DB is changing. Of course, you will have to regenerate your classes whenever the DB schema changes and you want to be able to access the new columns.
If you need to accommodate different database servers, then you should take a look into implementing the repository pattern and abstract all your data access that way.
Your comment
it involves write operations to the main table but the main table never changes
confirms what I was hoping for. It means you can use Entity Framework as the core of your application and a different route to display data.
Suppose that for display (of the view) you use a classic DataTable (because all common grids support them, which is not the case for dynamic objects). I don't know how create/update/delete will be done, but saving changes will at some point involve mapping a DataRow to a MainEntity object. You can write one method for that, like:
MainEntity DataRowToEntity(DataRow row)
{
    var entity = new MainEntity();
    entity.PropertyA = (string)row["PropertyA"];   // cast (or use row.Field<T>) for each mapped column
    // ... map the remaining stable columns the same way
    return entity;
}
The MainEntity can then be attached to a context, its state changed to Modified, and saved.
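Roughly, that last step could look like this (EF6-style; MyContext and its MainEntities set are placeholder names for your own model):

using (var context = new MyContext())
{
    // editedRow is the DataRow the grid changed; DataRowToEntity is the method above.
    MainEntity entity = DataRowToEntity(editedRow);

    context.MainEntities.Attach(entity);
    context.Entry(entity).State = System.Data.Entity.EntityState.Modified;
    context.SaveChanges();
}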

DB design when data is unknown about an entity?

I'm wondering if the following DB schema would have repercussions later. Let's say I'm writing a place entity. I'm not certain what properties of place will be stored in the DB. I'm thinking of making two tables: one to hold the required (or common) info, and one to hold additional info.
Table 1 - Place
PK PlaceId
Name
Lat
Lng
etc... (all the common fields)
Table 2 - PlaceData
PK DataId
PK FieldName
PK FK PlaceId
FieldData
Usage Scenario
I want certain visitors to have the capability of entering custom fields about a place. For example, a restaurant is a place that may have the following fields: HasParking, HasDriveThru, RequiresReservation, etc... but a car dealer is also a place, and those fields wouldn't make sense for a car dealer.
I want to support any type of place, from a single table (well, 2nd table has custom fields), because I don't know the number of types of places that will eventually be added to my site.
Overall goal
On my ASP.NET MVC (C#/Razor) site, where I display a place, it will show the attributes as an unordered list populated by: SELECT * FROM PlaceData WHERE PlaceId = #0.
This way, I wouldn't need to show empty field names in the view (or do a string.IsNullOrWhiteSpace() check for each and every field, which I would be forced to do if every attribute were a column on the table).
I'm assuming this scenario is quite common, but are there better ways to do it? Particularly from a performance perspective? What are the major drawbacks of this schema?
Your idea is referred to as an Entity-Attribute-Value table and is generally bad news in an RDBMS. RDBMSes are geared toward highly structured data.
The overall options are:
Model the db further in an RDBMS, which is most likely if someone is holding back specs from you.
Stick with the RDBMS, using XML columns for the data whose structure is variable. This makes the most sense if a relatively small portion of your data storage schema is semi- or un-structured. Speaking from a MS SQL Server perspective, this data can be indexed and you can perform checks that your data complies with an XML schema definition.
Move to a non-relational DB such as MongoDB, Cassandra, CouchDB, etc. This is what a lot of social sites and I suspect blog sites run with. Also, it is within reason to use a combination of RDBMS and non-relational stores if that's what your needs call for.
EAV gets to be a mess because you're creating a database within a database and lose all of the benefits a RDBMS can provide (foreign keys, data type enforcement, etc.) and the SQL code needed to reconstruct your objects goes from lasagna to fettuccine to spaghetti in the blink of an eye.
Given the information that's been added to the question, it would seem a good fit to create a PlaceDetails column of type XML in the Place table. You could also split that column into another table with a 1:1 relationship if performance requirements dictate it.
The upside to doing it that way is that you can retrieve the data using very simple SQL code, even using the xml data type's methods for searching the data. But that approach also allows you to do the more complex presentation-oriented data parsing in C#, which is better suited to that purpose than T-SQL is.
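A rough sketch of that approach, assuming a PlaceDetails column of type xml on the Place table (the element names below are made-up custom fields):

using System.Data;
using System.Data.SqlClient;
using System.Xml.Linq;

static class PlaceDetailsStore
{
    // details might look like: <Details><HasParking>true</HasParking><HasDriveThru>false</HasDriveThru></Details>
    public static void SaveDetails(string connectionString, int placeId, XElement details)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "UPDATE Place SET PlaceDetails = @Details WHERE PlaceId = @PlaceId", connection))
        {
            command.Parameters.Add("@Details", SqlDbType.Xml).Value = details.ToString();
            command.Parameters.AddWithValue("@PlaceId", placeId);
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}

// You can still query inside the column on the SQL side, e.g. (T-SQL):
// SELECT PlaceId FROM Place WHERE PlaceDetails.exist('/Details[HasParking="true"]') = 1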
If you want your application to be able to create its own custom fields, this is a fine model. The Mantis Bugtracker uses this as well to allow Admins to add custom fields to their tickets.
If, on the other hand, it's going to be the programmer who creates the fields, I must agree with pst that this is more of a premature optimization.
At any given time you can add new columns to the database (always keeping the third normal form in mind), so you should go with what you want and only create a second table if needed, or if such columns would break any of the normal forms.

Customizeable database

What would be the best database/technique to use if I'd like to create a database that can "add", "remove" and "edit" tables and columns?
I'd like it to be scaleable and fast.
Should I use one table with the columns (Id, Table, Column, Type, Value) for this? Are there any good articles about this, or are there other solutions?
Maybe three tables: One that holds the tables, one that holds the columns and one for the values?
Maybe someone already has created a db for this purpose?
My requirement is that I'm using .NET (I guess the database doesn't have to be on Windows, but I would prefer that).
Since (in comments on the question) you are aware of the pitfalls of the "inner platform effect", it is also true that this is a very common requirement - in particular to store custom user-defined columns. And indeed, most teams have needed this. Having tried various approaches, the one which I have found most successful is to keep the extra data in-line with the record - in particular, this makes it simple to obtain the data without requiring extra steps like a second complex query on an external table, and it means that all the values share things like timestamp/rowversion for concurrency.
In particular, I've found a CustomValues column (for example text or binary; typically json / xml, but could be more exotic) a very effective way to work, acting as a property-bag for the additional data. And you don't have to parse it (or indeed, SELECT it) until you know you need the extra data.
All you then need is a way to tie named keys to expected types, but you need that metadata anyway.
I will, however, stress the importance of making the data portable; don't (for example) store any specific platform-bespoke serialization (for example, BinaryFormatter for .NET) - things like xml / json are fine.
Finally, your RDBMS may also work with this column; for example, SQL Server has the xml data type that allows you to run specific queries and other operations on xml data. You must make your own decision whether that is a help or a hindrance ;p
If you also need to add tables, I wonder if you are truly using the RDBMS as an RDBMS; at that point I would consider switching from an RDBMS to a document-database such as CouchDB or Raven DB
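As a rough sketch of the in-line property-bag idea (the Customer class and CustomValues column are made-up names here; Json.NET is used purely as an example of a portable serializer):

using System.Collections.Generic;
using Newtonsoft.Json;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }

    // Persisted as-is in the CustomValues column, loaded with the rest of the row.
    public string CustomValues { get; set; }

    // Parsed only when the extra data is actually needed.
    public IDictionary<string, string> GetCustomValues()
    {
        return string.IsNullOrEmpty(CustomValues)
            ? new Dictionary<string, string>()
            : JsonConvert.DeserializeObject<Dictionary<string, string>>(CustomValues);
    }

    public void SetCustomValue(string key, string value)
    {
        var values = GetCustomValues();
        values[key] = value;
        CustomValues = JsonConvert.SerializeObject(values);
    }
}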

Is there a way to not break code if columns in a database change?

Assume I have declared a DataTable and assigned to it the result returned from calling a stored procedure, so now my DataTable contains something like the following when accessing a row from it:
string name = (string)dr["firstname"];
int age = (int)dr["age"];
If firstname is changed to first_name and age is removed, the code will obviously break because the schema has changed. So is there a way to always keep the code in sync with the schema automatically, without doing it manually? Is there some sort of meta-description file that describes the columns in the database table and updates them accordingly? Is this a case where LINQ can be helpful because of its strongly typed nature?
What about good old-fashioned views that select by column name? They always output the columns with the specified names in the specified order. If the table underneath needs to change, the view is modified if necessary but still outputs the same as it did before the underlying table change - just like interfaces for your objects. The application references the views instead of the tables and carries on working as normal. This comes down to standard database application design, which should be taught in any (even basic) data architect course - but I rarely see views actually used in business applications. In fact, the project I'm currently working on is the first where I've seen this approach taken, and it's refreshing to see it used properly.
Use stored procs: if your table changes, modify the stored proc so the output is still the same. Used in a similar manner, they shield the application from the underlying table, insulating it from any table changes. Not sufficient if you're looking to do dynamic joins, filters and aggregates, where a view would be more appropriate.
If you want to do it application-side, specify the names of the fields you're querying right in the query rather than using "select *" and relying on the field names to exist. However, if the field names on the table change, or a column is deleted, you're still stuck; you've got to modify your query.
If the names of fields will change, but all of the fields will always exist, the content of those fields will remain the same, and the fields will remain in the same order, you could reference the fields by index instead of by name.
Use an object-relational mapper as others have suggested, but I don't think this necessarily teaches good design so much as it hopes the design of the framework is good enough and appropriate for what you're doing, which may or may not be the case. I'm not really of the opinion this is a good approach, though.
About the only way to prevent this is through the use of Stored Procedures which select the columns and rename them to a standard name that is returned to your application. However, this does add another layer of maintenance to the database.
This was the reason ORM solutions such as NHibernate were created.
That or a code generator based on the database schema.
Why would you not want to change the code? If age is removed why would you want to still attempt to grab it in your code?
What Linq does is try to keep all the business logic in one location, the source code, rather than splitting between Database and Source Code.
You should change the code when the data columns are removed.
As you can see from all the answers given, what you are looking for doesn't exist. The reason is that programs are essentially data-processing routines, so you can't change your data without changing something else in the program. What if it isn't the name of the column but its type that's changing? Or what would happen if the column was deleted?
In sum, there's no good solution for such problems. Data is an integral part of the application - if it changes, expect at least some work. However, if you expect names to change (the database isn't yours, for example, and you have been informed by the owner that names might change in the future), and you don't want to re-deploy the application because of that, the alternatives to recompiling your source code, as stated in the other answers, include:
Use Stored Procedures
You can use stored procedures to provide data to the application. In the case of the proposed change (renaming a column), the DBA or whoever is in charge of the database schema would change the stored procedure as well.
Pros: No need for recompilation due to minor changes in the database
Cons: More artifacts that now become part of the application design; application understanding is blurred.
Use a Mapping File
You can create a mapping file that maps the name your application expects a certain column to have to the actual name the column has. Such a file is very inexpensive and easy to create.
Pros: No need for recompilation due to minor changes in the database
Cons: Extra entity (class) in your design, application understanding is blurred, you need to re-deploy the mapping file on change.
Use column position instead of column name
Instead of referencing the name of the column, use a positional argument (dr[1]).
Pros: Keeps you safe from name changes.
Cons: Everything else. If your table changes to accommodate more data (a new column), there's a chance the numbering of the columns will also change; if any of the columns is deleted, you will also have a numbering problem; etc.
One suggestion, though: instead of accessing the column directly through a literal, use constants with a good naming standard. So
string name = (string)dr["firstname"];
int age = (int)dr["age"];
Becomes
private const string CUSTOMER_COLUMN_FIRST_NAME = "firstname";
private const string CUSTOMER_COLUMN_AGE = "age";
string name = (string)dr[CUSTOMER_COLUMN_FIRST_NAME];
int age = (int)dr[CUSTOMER_COLUMN_AGE];
This doesn't solve your problem, but it enables you to add better meaning to the code (even if you decide to abbreviate the constants' names) and makes changing the name easier, since it's centralized. Also, if you want, Visual Studio can generate a class (inheriting from DataTable) that statically defines your database rows, which also makes the code's semantics clearer.
Apparently you have to introduce another layer of abstraction between your database and your application. Yes, this layer can be Linq2Sql, Entity Framework, NHibernate or any other ORM (object-relational mapping) framework.
Now, about that 'automatically'... maybe this kind of small change (renaming a column) can be handled automatically by some tools/frameworks, but I don't think any framework can guarantee proper handling of changes automatically. In many cases you will have to do the "mapping" between your database and that new layer manually, so that you can keep the rest of your application unaffected.
Yes - use stored procedures for all access, and alias the actual attribute names in the table for output to the client code. Then if the actual column names in the table change, you just have to change the SQL in the stored proc and leave the aliases the same as they were, and the client code can stay the same.
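For instance, a rough sketch of that insulation (the GetCustomer procedure, table and column names are hypothetical): suppose the column firstname was renamed to first_name, and the procedure keeps the old alias, so the client code never notices.

using System;
using System.Data;
using System.Data.SqlClient;

static class CustomerReader
{
    // Assumed procedure on the database side (T-SQL), keeping the old alias after the rename:
    //     CREATE PROCEDURE GetCustomer @Id int AS
    //         SELECT first_name AS firstname, age FROM Customer WHERE CustomerId = @Id;
    public static void PrintCustomer(string connectionString, int customerId)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("GetCustomer", connection) { CommandType = CommandType.StoredProcedure })
        {
            command.Parameters.AddWithValue("@Id", customerId);
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    string name = (string)reader["firstname"];   // the alias, not the physical column name
                    int age = (int)reader["age"];
                    Console.WriteLine("{0} ({1})", name, age);
                }
            }
        }
    }
}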
Is there a way to not break code if columns in a database change?
No
It is a very, very good thing that you can't do this (completely) automatically.
If the database changes such that an application feature is no longer valid, you don't want the application to continue to expose the feature if the database no longer supports the feature.
In the best case, you want database changes to cause your code to no longer compile, so you can catch the problems at compile time rather than run time. Linq will help you catch these kinds of issues at compile time and there are many other ways to increase the agility of your code base such that database changes can be somewhat quickly propagated through the entire code base. Yes, ORMs can help with this. While views and stored procedures may make the problem better, they may also make it worse by increasing the complexity and amount of code that needs to react to changes to columns in tables.
Using code generation of some sort to generate (at least some part of) your data layer is your best bet to getting compile time errors when your application and database get out of sync. You should probably also have unit tests around your data layer to detect as many run-time type inconsistencies as possible when it's difficult to find the errors at compile time (for example, things like size constraints on columns).
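For example, a rough sketch of such a data-layer test (NUnit-style; CustomerRepository and TestConfig are placeholder names for your own data access code and test configuration):

using System.Data;
using NUnit.Framework;

[TestFixture]
public class DataLayerSchemaTests
{
    [Test]
    public void CustomerQuery_ReturnsExpectedColumns()
    {
        // Fails fast if the columns the application depends on disappear or change type.
        DataTable table = new CustomerRepository(TestConfig.ConnectionString).GetCustomers();

        Assert.IsTrue(table.Columns.Contains("firstname"));
        Assert.AreEqual(typeof(string), table.Columns["firstname"].DataType);
        Assert.IsTrue(table.Columns.Contains("age"));
        Assert.AreEqual(typeof(int), table.Columns["age"].DataType);
    }
}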
It won't help when "age" is removed, but if you know that the columns will always be returned in the same order - even if the names change - then you could reference them by column index instead, like:
string name = (string)dr[0];
int age = (int)dr[1];
Depending on your DB version, you could also check out a Data Access generator such as SubSonic.
