What would be the best database/technique to use if I'd like to create a database that can "add", "remove" and "edit" tables and columns?
I'd like it to be scalable and fast.
Should I use one table with five columns for this (Id, Table, Column, Type, Value)? Are there any good articles about this? Or are there other solutions?
Maybe three tables: One that holds the tables, one that holds the columns and one for the values?
Maybe someone already has created a db for this purpose?
My requirement is that I'm using .NET (I guess the database doesn't have to be on Windows, but I would prefer that).
Since (per your comments on the question) you are already aware of the pitfalls of the "inner platform effect", I'll only add that this is a very common requirement, in particular for storing custom user-defined columns. Indeed, most teams have needed this. Having tried various approaches, the one I have found most successful is to keep the extra data in-line with the record. This makes it simple to obtain the data without extra steps such as a second complex query on an external table, and it means that all the values share things like timestamp/rowversion for concurrency.
In particular, I've found a CustomValues column (for example text or binary; typically json / xml, but could be more exotic) a very effective way to work, acting as a property-bag for the additional data. And you don't have to parse it (or indeed, SELECT it) until you know you need the extra data.
All you then need is a way to tie named keys to expected types, but you need that metadata anyway.
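To make the property-bag idea concrete, here is a minimal sketch, assuming the CustomValues column holds a JSON object and that System.Text.Json is available. The class name, the field names (LoyaltyPoints, Nickname) and the hard-coded metadata dictionary are all hypothetical; in a real system the key-to-type metadata would come from wherever you already keep it.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class CustomValuesBag
{
    // Hypothetical metadata: custom key -> expected CLR type.
    public static readonly Dictionary<string, Type> FieldTypes =
        new Dictionary<string, Type>
        {
            ["LoyaltyPoints"] = typeof(int),
            ["Nickname"] = typeof(string),
        };

    // Parse the column text only when the extra data is actually needed.
    public static Dictionary<string, object> Parse(string json)
    {
        var result = new Dictionary<string, object>();
        if (string.IsNullOrWhiteSpace(json)) return result;

        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            foreach (JsonProperty prop in doc.RootElement.EnumerateObject())
            {
                // Keys that the metadata does not know about are skipped.
                if (!FieldTypes.TryGetValue(prop.Name, out var type)) continue;
                result[prop.Name] = JsonSerializer.Deserialize(prop.Value.GetRawText(), type);
            }
        }
        return result;
    }

    // Produce the text to store back in the CustomValues column.
    public static string Serialize(Dictionary<string, object> values) =>
        JsonSerializer.Serialize(values);
}
```

Because the bag travels with the row, reading it is just a matter of calling `CustomValuesBag.Parse(row.CustomValues)` on the records where you need the extra fields (`row.CustomValues` being an assumed string property on your record type).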
I will, however, stress the importance of making the data portable; don't (for example) store any specific platform-bespoke serialization (for example, BinaryFormatter for .NET) - things like xml / json are fine.
Finally, your RDBMS may also work with this column; for example, SQL Server has the xml data type that allows you to run specific queries and other operations on xml data. You must make your own decision whether that is a help or a hindrance ;p
If you also need to add tables, I wonder if you are truly using the RDBMS as an RDBMS; at that point I would consider switching from an RDBMS to a document database such as CouchDB or RavenDB.
Introduction:
I'm refactoring (pretty much rewriting) a legacy application in my current internship. The part that this question will be concerned about is the database it uses and the way they retrieve data from it.
The database structure is:
There's a table that has the main records. Let's say each record is a measurement. It has some info about the measured material and different measurement information.
There's a table view they use that has the same information columns, plus some extra columns that contain data calculated from the given measurements. It also filters out some of the data from the table.
So let's say we have the main table with columns:
Measurement ID
Measurement A
Measurement B
The view has something like this:
Measurement ID
Measurement A
Measurement B
Some extra data (for example Measurement A * Measurement B)
The guy who is leading the development only knows some SQL, so he likes adding new columns that are calculated from columns in the main table for experimenting. And this is definitely a need at the moment.
Requirements are:
Different types of databases should be supported (like SQL Server, Oracle, and probably some others).
The frontend should be able to show the view, which means even though some main columns will always stay the same, there may be some new columns including newly calculated values.
My question is:
What kind of system should I use to accommodate the needs of this application? I wanted to use Entity Framework, but the fact that the view may have new columns in the future is I think a problem. As far as I understand, I should map my classes to the database before compiling.
The other thing that I'm considering is maybe using Entity Framework to get data from the main table and do the calculations and the filtering that is currently done in the table view directly in the frontend, and skip the view altogether. Which sounds fine, though I don't know if they will allow me to do that.
What would you do in my case? Please take into account that I have virtually no experience with databases and ORMs.
You are correct in that using Entity Framework will be a problem if the underlying DB schema is always changing. It will require you to update the EF model on your end every time to grab those new columns.
Ideally, all of your database access is hidden behind the interface to your DAL, so that your application doesn't need to know about which ORM is being used -- if any -- or which database it's connecting to.
I hate to say it, but given your requirements, an ORM might not make sense. You might want to go with something more generic without any strong-typing. You could just simply always return a DataTable to your application layer, and it could loop through the columns and values to display whatever is returned. If there are fields you know will never change, you could create a manual mapping for those fields only into your application object(s).
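As a minimal sketch of the "loop through the columns and values" idea, the helper below formats whatever columns a DataTable happens to contain. The class and method names are hypothetical, and the table is assumed to have been filled elsewhere (for example with `DataTable.Load` on a data reader).

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;

public static class DataTableDisplay
{
    // Builds one display line per row without knowing the columns at compile time.
    public static List<string> ToLines(DataTable table)
    {
        var lines = new List<string>();
        foreach (DataRow row in table.Rows)
        {
            var cells = new List<string>();
            foreach (DataColumn column in table.Columns)
            {
                object value = row[column];
                // Database NULLs arrive as DBNull.Value, not as null.
                string text = value == DBNull.Value
                    ? "(null)"
                    : Convert.ToString(value, CultureInfo.InvariantCulture);
                cells.Add(column.ColumnName + "=" + text);
            }
            lines.Add(string.Join(", ", cells));
        }
        return lines;
    }
}
```

A new calculated column added to the view simply shows up as one more `name=value` pair, with no recompilation of a mapped model.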
You may have a look at NoSQL systems, which are a lot more flexible about schema, or at a document database like RavenDB. All these systems allow the schema to change dynamically. You need to check the pros and cons to see if one can fulfil your requirements.
(This answer is a bit off topic, as it's about replacing the SQL server rather than creating a DAL, but other answers cover that subject well and I would like to propose another way that may help.)
If your schema is unstable, then using Entity Framework as a beginner is going to be a headache. The assumption is that you can just refresh the design canvas periodically to let the tool handle database table changes. You can try that for a time to see when it becomes too much of a pain, but without any prior experience using ORMs or Entity Framework it may not be worth the effort.
I would probably use something like Rob Conery's Massive ORM (https://github.com/robconery/massive). It gives you more flexibility with the underlying database schema and is a very small library. I remember it being ~300 lines of code and very easy to use. It uses C# dynamics so you'll have to be using >= C# 4.0 and be comfortable with that one concept but IMO it's worth it for the low-overhead. A full-fledged ORM like Entity Framework or NHibernate is going to cost a lot of learning cycles.
You could, of course, just stick to ADO.NET DataTables. They're a bit ugly and verbose, but they'll do the job.
You can use Entity Framework - Database First if the DB is changing. Of course, you will have to regenerate your classes when you want to be able to access new columns, when the DB schema changes.
If you need to accommodate different database servers, then you should look into implementing a repository pattern and abstracting all your data access that way.
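A minimal sketch of what such a repository abstraction could look like is shown here. All type and member names are hypothetical, and the in-memory class only stands in for a real SQL Server or Oracle implementation that would satisfy the same interface.

```csharp
using System.Collections.Generic;
using System.Linq;

public sealed class Measurement
{
    public int Id { get; set; }
    public double A { get; set; }
    public double B { get; set; }
}

// The application depends only on this interface, not on a specific database or ORM.
public interface IMeasurementRepository
{
    Measurement GetById(int id);
    IReadOnlyList<Measurement> GetAll();
    void Save(Measurement measurement);
}

// In-memory stand-in; a database-backed class would implement the same members.
public sealed class InMemoryMeasurementRepository : IMeasurementRepository
{
    private readonly Dictionary<int, Measurement> _items = new Dictionary<int, Measurement>();

    // Returns null when no measurement with that id has been saved.
    public Measurement GetById(int id) =>
        _items.TryGetValue(id, out var found) ? found : null;

    public IReadOnlyList<Measurement> GetAll() =>
        _items.Values.OrderBy(m => m.Id).ToList();

    // Inserts or overwrites, keyed by Id.
    public void Save(Measurement measurement) => _items[measurement.Id] = measurement;
}
```

Swapping database servers then means writing another class that implements `IMeasurementRepository`, while the calling code stays the same.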
Your comment
it involves write operations to the main table but the main table never changes
confirms what I was hoping for. It means you can use Entity Framework as the core of your application and a different route to display data.
Suppose that for display (of the view) you use a classic DataTable (because all common grids support them, contrary to displaying dynamic objects). I don't know how create/update/delete will be done, but saving changes will at some point involve mapping a DataRow to a MainEntity object. You can write one method for that like
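MainEntity DataRowToEntity(DataRow row)
{
    var entity = new MainEntity();
    // row[...] returns object, so cast to the property's actual type (string is assumed here)
    entity.PropertyA = (string)row["PropertyA"];
    ....
    return entity;
}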
The MainEntity can be attached to a context, its status changed to Modified, and saved.
I'm going to code a housekeeping book
I create some properties in code, like Name and Category, and need to create some others at run-time.
How should I save that in a human-readable way in a SQL Server database?
My suggestion is to create a table called Properties with 2 columns (Id, Name), and in that table I can store all my properties, but it wouldn't be human-readable anymore
I'm also not sure if it would be wise to create a column for each property in one big table
I could also create an XML "file" and store this in my DB, but I don't think this is a good idea either
Any advice is greatly appreciated
There are basically three approaches to this
A column for every value
The one you are suggesting which is called an Entity Attribute Value model
Or the one you discounted which would be xml (or serialised objects)
They all have pros and cons, and some of the cons can get quite severe.
A column for every value means you have to change your db and model every time you want to store more data, which makes it very fragile and high maintenance.
EAV can easily lead to the queries becoming huge joins, and imposing data integrity on it is a hiding to nothing.
Object based can also lead to significant optimisation and maintenance issues, having to open every object to see if something is in it, for instance.
Now any one of these might be the best of a bad lot at the time you make the decision (they are all fragile in one respect or another), IF you insist on using a relational database.
Look at one of the NoSQL alternatives; they are designed for this sort of data.
We have a requirement on our project for custom fields. We have some standard fields on the table and each customer wants to be able to add their own custom fields. At the moment I am not interested in how this will work in the UI, but I want to know what the options are for the back end storage and retrieval of the data. The last time I did something like this was about 10 years ago in VB6 so I would be interested to know what the options are for this problem in today's .Net world.
The project is using SQL Server for the backend, LINQ to SQL for the ORM and a C# ASP.NET front end.
What are my options for this?
Thanks
There are four main options here:
actually change the schema (DDL) at runtime - however, pretty much no ORM will like that, and it generally has security problems, as your "app" account shouldn't normally be redefining the database; it does, however, avoid the "inner platform" effect inherent in the next two
use a key-value store as rows, i.e. a Customer table might have a CustomerValues table with pairs like "dfeeNumber"=12345 (one row per custom key/value pair) - but a pain to work with (instead of a "get", this is a "get" and a "list" per entity)
use a single hunk of data (xml, json, etc) in a single CustomFields cell - again, not ideal to work with, but it is easier to store atomically with the main record (downside: forces you to load all the custom fields to read a single one)
use a document database (no schema at all) - but then: no ORM
I've used all 4 at different points. All 4 can work. YMMV.
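To illustrate why the key-value option means a "get" plus a "list" per entity, here is a minimal sketch that regroups the listed rows into one dictionary per customer. The row class and its property names are hypothetical, and the rows are assumed to have been loaded already from something like the CustomerValues table.

```csharp
using System.Collections.Generic;
using System.Linq;

// One row per custom key/value pair, as in the CustomerValues example.
public sealed class CustomerValueRow
{
    public int CustomerId { get; set; }
    public string Key { get; set; }
    public string Value { get; set; }
}

public static class CustomerValues
{
    // Groups the listed rows by owner so each customer ends up with
    // a single dictionary of its custom fields.
    public static Dictionary<int, Dictionary<string, string>> GroupByCustomer(
        IEnumerable<CustomerValueRow> rows) =>
        rows.GroupBy(r => r.CustomerId)
            .ToDictionary(
                g => g.Key,
                g => g.ToDictionary(r => r.Key, r => r.Value));
}
```

The inner `ToDictionary` throws if a customer has the same key twice, so this sketch assumes a unique constraint on (CustomerId, Key).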
I have a similar situation on the project I'm working on now.
Forget about LINQ to SQL when you have a flexible database schema. There is no way to update the LINQ to SQL models on the fly when the DB schema changes.
Solutions:
Keep an extra table holding the name of the table the values belong to, the column name, the value, etc.
Totally dynamically change your table schema each time they add a field.
Use a NoSQL solution like MongoDB or Azure Table Storage. A NoSQL solution doesn't require a schema and can be changed on the fly.
This is a handy link to read:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084117056
You're referring to an EAV model (entity-attribute-value).
Here's an article: http://hanssens.org/post/Generic-Entity-Attribute-Value-Model-e28093-A-POCO-Implementation.aspx
I'm trying to figure out which is the "correct" way to do this. I have a bunch of lookup tables in my database and would like to place an enum on top of those values so that, when coding, it's easier to read (as well as not using hard-coded values).
I'm wondering if I should generate my table values based on an existing enumeration or if I should generate my enumeration from my table's values.
EDIT
Based on the first couple of comments, here are some clarifications:
Changes to the values could be rather frequent, as they are intended to be rather dynamic. That being said, a compile will be necessary before adding any of these either way, because the enumeration needs to be updated to expose the new values.
The main reason for this need is that we don't want to tie people down to a specific list of values; we would like the applications to have the ability to add new entries as and when they need to.
In the past, we have generated the data from enumerations, but I'm second-guessing myself.
We usually generate enums from the database. We use CodeSmith, which allows us to create project files that can easily regenerate the enums as needed.
We've gone the other way occasionally, usually for reporting purposes (when existing enum values are persisted).
And, of course, we have enums whose values are never persisted.
In general the only reason to generate enums from the database is if code needs to make decisions based on them. If you just want to populate a ComboBox and persist the user's choice, don't generate an enum.
Obviously making decisions based on enums (or strings) whose values can change is fragile. You may want to consider including expiration dates (or "from" and "through" dates) in your database schema, so that existing values are not deleted. Filter expired values when populating UI selectors. This also makes it easier to have referential integrity.
As always in C#, you have to be aware that enum values may fall outside of the expected range. Include a default on your switch.
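A minimal sketch of that last point, using a hypothetical OrderStatus enum: casting a stored integer to an enum never fails, so the switch needs a default branch for values the enum does not define (for example, rows added to the lookup table after the enum was generated).

```csharp
using System;

// Hypothetical enum generated from a lookup table.
public enum OrderStatus
{
    Pending = 1,
    Shipped = 2,
    Cancelled = 3,
}

public static class OrderStatusText
{
    public static string Describe(int storedValue)
    {
        // The cast succeeds even when storedValue matches no named member.
        var status = (OrderStatus)storedValue;
        switch (status)
        {
            case OrderStatus.Pending: return "Waiting to ship";
            case OrderStatus.Shipped: return "On its way";
            case OrderStatus.Cancelled: return "Cancelled";
            default:
                // Reached for any value outside the generated enum.
                return "Unknown status (" + storedValue + ")";
        }
    }
}
```

`Enum.IsDefined(typeof(OrderStatus), storedValue)` is another way to detect such values up front if you would rather reject them than display them.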
We came up with helper classes for creating cached lookup lists that make these easier to use.
I'm not advocating going down this route. If you have to, this is how we did it.
There's also a third option: have an explicit model that describes the schema in the level of detail you require, and then generate both the data and the schema from that model.
Regarding your question, I think you should consider the problem in your own context, list the pros and cons of each alternative, and decide what makes the most sense for you and your business.
I have worked with all three strategies for different applications; the one I personally prefer is having an explicit model, but it depends on the context.
Sorry for being fuzzy, but I think for these kinds of questions there's no real golden rule that applies in all cases.
I'm wondering if the following DB schema would have repercussions later. Let's say I'm writing a place entity. I'm not certain what properties of place will be stored in the DB. I'm thinking of making two tables: one to hold the required (or common) info, and one to hold additional info.
Table 1 - Place
PK PlaceId
Name
Lat
Lng
etc... (all the common fields)
Table 2 - PlaceData
PK DataId
PK FieldName
PK FK PlaceId
FieldData
Usage Scenario
I want certain visitors to have the capability of entering custom fields about a place. For example, a restaurant is a place that may have the following fields: HasParking, HasDriveThru, RequiresReservation, etc... but a car dealer is also a place, and those fields wouldn't make sense for a car dealer.
I want to support any type of place, from a single table (well, 2nd table has custom fields), because I don't know the number of types of places that will eventually be added to my site.
Overall goal
On my asp.net MVC (C#/Razor) site, where I display a place, it will show the attributes as an unordered list populated by: SELECT * FROM PlaceData WHERE PlaceId = #0.
This way, I wouldn't need to show empty field names on the view (or do a string.IsNullOrWhiteSpace() check for each and every field), which I would be forced to do if every attribute was a column on the table.
I'm assuming this scenario is quite common, but are there better ways to do it? Particularly from a performance perspective? What are the major drawbacks of this schema?
Your idea is referred to as an Entity-Attribute-Value table and is generally bad news in an RDBMS. RDBMSes are geared toward highly structured data.
The overall options are:
Model the db further in an RDBMS, which is most likely if someone is holding back specs from you.
Stick with the RDBMS, using XML columns for the data whose structure is variable. This makes the most sense if a relatively small portion of your data storage schema is semi- or un-structured. Speaking from a MS SQL Server perspective, this data can be indexed and you can perform checks that your data complies with an XML schema definition.
Move to a non-relational DB such as MongoDB, Cassandra, CouchDB, etc. This is what a lot of social sites and, I suspect, blog sites run with. Also, it is within reason to use a combination of RDBMS and non-relational stores if that's what your needs call for.
EAV gets to be a mess because you're creating a database within a database and lose all of the benefits an RDBMS can provide (foreign keys, data type enforcement, etc.), and the SQL code needed to reconstruct your objects goes from lasagna to fettuccine to spaghetti in the blink of an eye.
Given the information that's been added to the question, it would seem a good fit to create a PlaceDetails column of type XML in the Place table. You could also split that column into another table with a 1:1 relationship if performance requirements dictate it.
The upside to doing it that way is that you can retrieve the data using very simple SQL code, even using the xml data type's methods for searching the data. But that approach also allows you to do the more complex presentation-oriented data parsing in C#, which is better suited to that purpose than T-SQL is.
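As a minimal sketch of the C# side, the helper below turns the contents of such a PlaceDetails column into name/value pairs ready for display. The XML shape (`<details><field name="...">value</field></details>`) and the class name are assumptions made for illustration; the question does not prescribe a format.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class PlaceDetailsParser
{
    // Assumed shape: <details><field name="HasParking">true</field>...</details>
    public static List<KeyValuePair<string, string>> Parse(string xml)
    {
        // A place with no custom fields yields an empty list.
        if (string.IsNullOrWhiteSpace(xml))
            return new List<KeyValuePair<string, string>>();

        return XElement.Parse(xml)
            .Elements("field")
            .Select(f => new KeyValuePair<string, string>(
                (string)f.Attribute("name"), f.Value))
            .ToList();
    }
}
```

Only the fields actually present in the XML come back, so the view can render the list directly without per-field empty checks.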
If you want your application to be able to create its own custom fields, this is a fine model. The Mantis Bugtracker uses this as well to allow Admins to add custom fields to their tickets.
If in any case, it's going to be the programmer that is going to create the field, I must agree with pst that this is more a premature optimization.
At any time you can add new columns to the database (always keeping third normal form in mind), so you should go with what you want and only create a second table if needed, or if such columns break any of the normal forms.