SQL - Storing values that have dynamic properties and constraints - C#

I'm not sure if what I'm attempting is simply incorrect/impossible, or if there is an easier way that I'm missing.
I'm using SQL Server 2012.
What I would like is a table that can store rows whose values relate to properties stored in another table - basically key-value pairs. The thing is, I would like to control which keys can be used by which entities.
For example, I would like one table listing various companies; another storing 'files' created for each company (used to hold historical information); another listing production departments (stages in production); another listing production figures (KGs, Units, etc.); and one capturing the actual production recorded against these figures for each month. There are also tables recording which production departments can use which production figures, and which companies have which production departments.
Some of the companies have the same stages in production as well as additional stages that the others don't.
These figures are captured on a monthly basis ONLY, so I have a table describing all the months of a year.
The production departments may have similar types of recordings to capture, though they don't all have the same production readings.
Here's a link to a graphical representation of the table layouts:
http://tinypic.com/r/30a51mx/8
My end goal is to auto-populate/update the table with newly added figures as the user enters this section of the program (by passing in the FileID), and to allow the user to edit the data using a DataGridView (or at least select a value to be edited from the DataGridView).
I will then need to write reports later on that will need to pivot on this information.
Any help or suggestions would be greatly appreciated.
Thanks

For an effective DB design it is very important to understand two major requirements:
Should the DB design favour ease of use from the application's point of view, or efficient storage?
This is by and large decided by the following factors:
How much data are we going to store? We need some idea of the cost of storage, factoring in redundancy; a well-normalised DB reduces redundancy.
Your DB is normalised very well, but is that really needed? Storage is quite cheap these days, so a slightly more redundant design should be OK - unless, of course, you plan to use the Standard edition of SQL Server, which has its own limitations on DB size.
Is data retrieval and update slow or fast? The more normalised the DB, the more JOINs are required. In your case, if you want to return values for multiple properties, say n, in a single result, you'd need n joins on the ProductionProperty table (see the sketch at the end of this answer), which will reduce query performance and hence slow the user experience. So if your UI is not overly demanding and your users can live with a small lag, go ahead with a normalised DB design.
ORM mismatch: the relational model and the object model (assuming the programming language follows OOP concepts) usually mismatch, and they will do so heavily in a normalised scenario like this; you'll need to spend more hours coding or troubleshooting scenarios that may make you squirm in pain when changing either model. I suggest you use a good ORM framework to counter this, and be aware of the common ORM-mismatch scenarios.
Will you have a separate reporting DB or reporting tables? In other words, is this an OLTP database or a reporting database? If it is going to be worked on heavily by data-entry people day in and day out, the normalised form is suitable, provided point #1 is satisfied. If, however, reporting is a major need, the denormalised form should be preferred (which means you do not need so many separate tables).
PS: Master data should be kept in tables of its own. Months are definitely master data, and so is UoM, unless you plan to do CRUD on the UoM measures too. Also note that it hardly matters whether months are kept in a separate table, especially when the same business logic/constraints can be enforced on columns in SQL.
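To make the join cost concrete, here is a minimal sketch of fetching two properties as one row from a key-value layout. The table and column names (CompanyFile, ProductionCapture, PropertyName, FigureValue) are assumptions standing in for the actual schema in your diagram:

using System.Data;
using System.Data.SqlClient;

static DataTable GetMonthlyFigures(string connectionString, int fileId)
{
    // Every additional property costs one more join against the key-value table.
    const string sql = @"
        SELECT f.FileID,
               kg.FigureValue AS Kgs,
               un.FigureValue AS Units
        FROM   CompanyFile f
        JOIN   ProductionCapture kg ON kg.FileID = f.FileID AND kg.PropertyName = 'KGs'
        JOIN   ProductionCapture un ON un.FileID = f.FileID AND un.PropertyName = 'Units'
        WHERE  f.FileID = @fileId;";

    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(sql, connection))
    {
        command.Parameters.AddWithValue("@fileId", fileId);
        var result = new DataTable();
        new SqlDataAdapter(command).Fill(result); // Fill opens and closes the connection itself
        return result;
    }
}

Each extra property means one more self-join, which is why reporting over this kind of layout tends to degrade as properties are added.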

Related

How to model data using MongoDB

We have a relatively large-scale application that uses a relational DB (MSSQL).
After a lot of reading I've decided I want to examine using MongoDB instead of MSSQL, mainly because of performance and scale issues.
I've read and studied Mongo but couldn't figure out the answers to the following questions:
Should we do it? Bear in mind we have the time to invest; the only question is "is it good for us?"
How to model our data?
My problem with Mongo is that we have a lot of one-to-many relations in our DB.
After reading this great post (and the second part as well), I've realized a good practice will be to divide the decision into 3 scenarios:
1 to few
1 to many
1 to squillions.
In our DB we use one-to-many most of the time, but the problem is that it's usually the same "one".
For example, we have users and transactions tables.
Each user can perform transactions, so basically I should model the user as follows:
{
    "name": "John",
    ...,
    "Transactions": [ObjectId("..."), ObjectId("..."), ...]
}
So far so good; the problem is that we have a lot more than just transactions. For example, we could have posts, requests, and many more features like transactions, and then my users collection becomes huge (more than 25 "columns"). Also, when I want to retrieve a data set I have to do several queries, unlike in MSSQL where I would just use a JOIN statement.
Another issue is that I'll have to save a lot of extra data. For example, for each transaction I have to save the terminal ID, and in the report I'll have to show the terminal name. In that case (as I understand it) I have two choices: do 2 queries, or save the terminal name as well. In a relational DB this is a simple join.
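To illustrate, the second choice means each transaction document carries a duplicated copy of the name. A sketch as a C# document class (only the terminal ID comes from your description; the other members are made up):

using MongoDB.Bson;

// Transaction document with the terminal name denormalized into it.
public class Transaction
{
    public ObjectId Id { get; set; }
    public ObjectId TerminalId { get; set; }
    public string TerminalName { get; set; } // duplicated from the terminals collection at write time
    public decimal Amount { get; set; }      // made-up example field
}

The trade-off is that renaming a terminal then means updating all of its transactions (or accepting stale names in old reports).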
So maybe for schemas like ours, Mongo (or any other document-based DB) is not the best choice?
I know those are a newbie questions :)
We use C# for our server side (ASP.NET Web API).
Thanks in advance!
You can face some serious issues when modeling your data with approaches 2 and 3:
With one-to-many you may face data inconsistency and/or eventual consistency. Here you store inside the document an index (an array of references) to external documents. So, for your example, adding a new transaction takes two requests: create the transaction, then add its reference to the user (update the document). MongoDB has ACID guarantees only at the document level, so the application could, for some reason, create a transaction but fail to add its reference to the user: app failures, network problems, bugs, and so on. Of course, you can simulate a DB transaction in the app with a try/catch block that cleans up data when an error occurs (see the sketch below). It helps, but not fully, because the app can fall down between the two requests.
So, if your app is under high load, after some time you may have a number of "dead" transactions that are not linked to any user. That may not be a big problem if your app never queries transactions directly - only via users - and you merely accumulate useless data in the DB. Otherwise you will have data inconsistency.
To fix that you need a background job that does the cleanup. So for some period of time your data can be inconsistent: eventual consistency. For some applications that is OK; for others it is not.
You can face the same problem when deleting transactions.
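A rough sketch of that try/catch compensation with the C# driver (the User and Transaction types, the collections, and the TransactionIds array are assumptions for illustration):

using System.Threading.Tasks;
using MongoDB.Driver;

// User is assumed to have an Id and a TransactionIds array,
// as in the question's document sketch.
static async Task AddTransactionAsync(
    IMongoCollection<User> users,
    IMongoCollection<Transaction> transactions,
    string userId,
    Transaction transaction)
{
    await transactions.InsertOneAsync(transaction);
    try
    {
        await users.UpdateOneAsync(
            u => u.Id == userId,
            Builders<User>.Update.Push(u => u.TransactionIds, transaction.Id));
    }
    catch
    {
        // Best-effort compensation; not atomic - the app can still die right
        // here, which is why the background cleanup job is needed anyway.
        await transactions.DeleteOneAsync(t => t.Id == transaction.Id);
        throw;
    }
}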
I agree that a document with 25 arrays of references ("columns") does not look very good. Working with such objects manually will be harder (testing, manual data fixes, and so on).
One-to-squillions doesn't have this effect, but you need indexes to query efficiently, and for a large, sharded DB you can still get bad performance.
In general, I'd say document DBs are pretty good if your app works mostly with one document (aggregate), doesn't have a lot of references to other docs, and doesn't need transactions between docs. Denormalization can also be a source of inconsistency.
Key-value data is very easy to scale. Document DBs are one step closer to a key-value data store. Column-oriented DBs are even closer to key-value, so they can be scaled even better.
Also, I recommend you consider the following measures to improve your SQL Server DB performance:
Caching: perhaps you can cache some of your app aggregates instead of gathering (joining) them in the SQL DB all the time (see the sketch after this list). For instance, Stack Overflow uses a SQL Server DB plus Redis for caching aggregates (questions with answers, comments, and so on).
Tune query performance with indexes, DB structure, denormalization, and so on.
If your DB is hosted on an on-premises SQL Server, then additional memory, SSDs, table partitioning, data compression, and replication can help. As a rule, SQL Server gives good performance with these approaches for DBs up to 1 TB.
CQRS approach.
Consider storing your app data in different databases. Every type of DB has its own strong and weak sides: a document DB is good for storing aggregates, a SQL DB for relational data, and so on. Complex apps, as a rule, use a few DB types.
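To sketch the caching point from the list above (StackExchange.Redis; the key scheme, aggregate type, and loader method are made up):

using System;
using Newtonsoft.Json;
using StackExchange.Redis;

// cache comes from ConnectionMultiplexer.Connect("localhost").GetDatabase().
// QuestionAggregate and LoadQuestionWithAnswersFromSql are made-up stand-ins.
static QuestionAggregate GetQuestion(IDatabase cache, int questionId)
{
    string key = "question:" + questionId;   // made-up key scheme
    string cached = cache.StringGet(key);
    if (cached != null)
        return JsonConvert.DeserializeObject<QuestionAggregate>(cached);

    var question = LoadQuestionWithAnswersFromSql(questionId); // the expensive joins
    cache.StringSet(key, JsonConvert.SerializeObject(question),
                    TimeSpan.FromMinutes(5));                  // short TTL bounds staleness
    return question;
}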

Should you have a one-database-to-rule-them-all setup or a separate database for each bounded context?

As far as I understand it, DDD helps or guides you in structuring a complex application. Now, in an application, you should identify your Bounded Contexts. Say you have more than 10 BCs.
I read somewhere (forgive me, I cannot give any links) that it's not ideal to have one big database for a complex application; that it should be separated per BC, if that's the easier route to take. How should one structure an app if each BC has its own database?
I tried searching on GitHub but could not find an example.
It depends if they only share the same database or also some tables - i.e. data.
Sharing a database but not tables can be perfectly fine, except if you aim for scalability and intend to make your BCs independently deployable and runnable units, like microservices, in which case they should probably have their own data store instance.
I see a few more drawbacks to database tables shared by 2 or more Bounded Contexts:
Tight coupling. The reason we have distinct BCs is that they represent different domain spaces that are likely to diverge in their own ways. Changing a concept in one of the BCs might impact the underlying table, forcing the other BCs that use this table to change as well. You get rigidity where there should be suppleness. You might also get inconsistencies or "holes" in the data due to the multiple possible sources of change.
Concurrency. In highly concurrent systems, some entities and the tables underneath are subject to strong contention. Bounded Contexts are one of the ways to lighten the load by separating different types of writes, but that only works if they don't lock the same data at the end of the day. The same is true for reads in non-CQRS systems, where queries hit the same database that writes go to.
ORM friendliness. Most ORMs won't let you map the same database table to 2 or more classes without a lot of convolution and workarounds.
How should one structure an app if each BC has its own database?
To some extent (e.g. that may include the UI layer or not), just as if you had multiple separate applications. Please be more specific if you have precise questions in mind.
The idea of having a vertical slice per bounded context is that the relationship of each BC to every other BC, and the communication between them, should be considered and designed based on domain knowledge, not on the technical merits of a persistence technology.
If you have a Customer in 2 different BCs, it causes a kind of actor-pattern situation. If the Support BC needs to know about a new Customer created in the Sales BC, then the Sales BC needs to connect to a known interface on the Support BC and pass it this new information (see the sketch below). One domain talking to another. It models quite closely how things work in real life when people from different departments talk to each other.
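A minimal sketch of that "known interface" idea, with all names hypothetical; the point is that Sales depends on a contract published by Support, never on Support's tables:

using System;

// Hypothetical contract owned by the Support BC; Sales only sees this interface.
public interface ISupportCustomerRegistry
{
    void RegisterCustomer(Guid customerId, string name);
}

// Inside the Sales BC, after Sales has persisted its own Customer:
public class CustomerSignup
{
    private readonly ISupportCustomerRegistry support;

    public CustomerSignup(ISupportCustomerRegistry support)
    {
        this.support = support;
    }

    public void Complete(Guid customerId, string name)
    {
        // Sales tells Support about the new customer through the published
        // interface; it never reaches into Support's database.
        support.RegisterCustomer(customerId, name);
    }
}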
If you share a big database (you're talking bespoke enterprise software here so there won't be many examples in the wild) then the temptation is to bypass all the domain expertise that is captured in the domain layers and meddle in another BC's database. Things become a big ball of mud very quickly.
Surprisingly I see this sort of thing too often in the real world and I consider it very bad practice.
It depends a little bit on the reason why they are their own databases. The idea of a bounded context is that you have a set of entities that are related and solve a problem together. If you look at the link Chaim Eliyah provided, you can have a sales and a support context: http://martinfowler.com/bliki/BoundedContext.html
Now, there is no reason a product for sales and a product for support should look the same in a database. What is important is that if support wants to add a property (say "low quality"), it can do so while sales might not want that property. Also, downtime on your sales application should probably not affect your support application.
That said, entities don't care where they are stored. If you already have a huge product database, you can certainly build your entities for different bounded contexts from the same database. The thing to remember is that a database table is not the same as an entity. Entities are what your business/application needs; the database is just what's needed to store things.
That said, separate if you can. If that's not feasible, try to define ownership. You make your life a lot easier if everyone agrees that product is the product as defined by sales, and that support can have a "productfactsheetTable" augmenting the product (see the sketch below). That way you avoid conflicting changes from each bounded context (a follow-up rule being that support can only read products, never write them). Table prefixes might help here to make this clear.
And this problem already exists with 2 related bounded contexts. With 10 you'll have a nightmare if multiple contexts try to write to the same table.
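One hedged way to express that ownership split with EF6-style mappings (the schemas, tables, and entity classes are all assumptions; the stub classes stand in for real entities):

using System.Data.Entity;

public class Product { public int Id { get; set; } }
public class SupportProduct { public int Id { get; set; } }
public class ProductFactSheet { public int Id { get; set; } }

// Sales owns the product table and is the only writer.
public class SalesContext : DbContext
{
    public DbSet<Product> Products { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Product>().ToTable("Product", "sales");
    }
}

// Support maps the same table (read-only by team agreement, not enforced here)
// and owns its own augmenting fact sheet table.
public class SupportContext : DbContext
{
    public DbSet<SupportProduct> Products { get; set; }
    public DbSet<ProductFactSheet> FactSheets { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        modelBuilder.Entity<SupportProduct>().ToTable("Product", "sales");
        modelBuilder.Entity<ProductFactSheet>().ToTable("ProductFactSheet", "support");
    }
}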

How to design a Data Access Layer for a database table that may change in the future?

Introduction:
I'm refactoring (pretty much rewriting) a legacy application in my current internship. The part this question is concerned with is the database it uses and the way data is retrieved from it.
The database structure is:
There's a table that has the main records. Let's say each record is a measurement. It has some info about the measured material and different measurement information.
There's a table view they use that has the same information columns, plus some extra columns containing data calculated from the given measurements. It also filters out some of the rows of the table.
So let's say we have the main table with columns:
Measurement ID
Measurement A
Measurement B
The view has something like this:
Measurement ID
Measurement A
Measurement B
Some extra data (for example Measurement A * Measurement B)
The guy leading the development only knows some SQL, so he likes experimenting by adding new columns calculated from columns in the main table. This is definitely a need at the moment.
Requirements are:
Different types of databases should be supported (like SQL Server, Oracle, and probably some others).
The frontend should be able to show the view, which means that even though some main columns will always stay the same, there may be new columns, including newly calculated values.
My question is:
What kind of system should I use to accommodate the needs of this application? I wanted to use Entity Framework, but the fact that the view may gain new columns in the future is, I think, a problem. As far as I understand, I have to map my classes to the database before compiling.
The other option I'm considering is using Entity Framework to get data from the main table, doing the calculations and filtering that are currently done in the table view directly in the frontend, and skipping the view altogether. That sounds fine, though I don't know if they will allow me to do it.
What would you do in my case? Please take into account that I have virtually no experience with databases and ORMs.
You are correct that using Entity Framework will be a problem if the underlying DB schema keeps changing. It will require you to update the EF model on your end every time to grab those new columns.
Ideally, all of your database access is hidden behind the interface to your DAL, so that your application doesn't need to know about which ORM is being used -- if any -- or which database it's connecting to.
I hate to say it, but given your requirements, an ORM might not make sense. You might want to go with something more generic, without strong typing. You could simply always return a DataTable to your application layer and loop through the columns and values to display whatever is returned (see the sketch below). If there are fields you know will never change, you could create a manual mapping for those fields only into your application object(s).
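A sketch of that combination (all type and column names assumed): hand the raw DataTable to the UI for the dynamic part, and map only the stable columns by hand.

using System;
using System.Data;

// Assumed stable fields only; everything else stays in the DataTable.
class Measurement
{
    public int Id;
    public double A;
    public double B;
}

static Measurement MapKnownFields(DataRow row)
{
    return new Measurement
    {
        Id = (int)row["MeasurementID"],
        A = Convert.ToDouble(row["MeasurementA"]),
        B = Convert.ToDouble(row["MeasurementB"])
    };
}

static void Display(DataTable view)
{
    foreach (DataColumn column in view.Columns) // whatever columns the view has today
        Console.Write(column.ColumnName + "\t");
    Console.WriteLine();

    foreach (DataRow row in view.Rows)
    {
        foreach (DataColumn column in view.Columns)
            Console.Write(row[column] + "\t");
        Console.WriteLine();
    }
}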
You might have a look at NoSQL systems, which are a lot more flexible about schema, or at a document database like RavenDB. These systems allow the schema to change dynamically. Check the pros and cons to see whether one can fulfil your requirements.
(This answer is a bit off-topic, as it's about replacing SQL Server rather than creating a DAL, but the other answers cover that subject well and I would like to propose another approach that may help.)
If your schema is unstable, then using Entity Framework as a beginner is going to be a headache. The assumption is that you can just refresh the design canvas periodically to let the tool pick up database table changes. You can try that for a while to see when it becomes too much of a pain, but without any prior experience with ORMs or Entity Framework it may not be worth the effort.
I would probably use something like Rob Conery's Massive ORM (https://github.com/robconery/massive). It gives you more flexibility with the underlying database schema and is a very small library; I remember it being ~300 lines of code and very easy to use. It uses C# dynamics, so you'll have to be on C# 4.0 or later and comfortable with that one concept, but IMO it's worth it for the low overhead. A full-fledged ORM like Entity Framework or NHibernate is going to cost a lot of learning cycles.
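From what I remember of Massive's README, usage looks roughly like this (the view and column names are assumed; verify the exact API against the repo):

using System;

// API from memory of Massive's README - verify before relying on it.
public class Measurements : Massive.DynamicModel
{
    public Measurements()
        : base("MyConnectionString", tableName: "MeasurementView",
               primaryKeyField: "MeasurementID") { }
}

static void PrintInteresting()
{
    var table = new Measurements();
    foreach (dynamic m in table.All(where: "WHERE MeasurementA > @0", args: 10))
    {
        // Newly added view columns simply show up as new dynamic members.
        Console.WriteLine(m.MeasurementID + ": " + m.SomeNewColumn);
    }
}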
You could, of course, just stick to ADO.NET DataTables. They're a bit ugly and verbose, but they'll do the job.
You can use Entity Framework Database First if the DB is changing; of course, you will have to regenerate your classes whenever the schema changes and you want access to the new columns.
If you need to accommodate different database servers, then you should look into implementing the repository pattern and abstracting all your data access that way.
Your comment
it involves write operations to the main table but the main table never changes
confirms what I was hoping for. It means you can use Entity Framework as the core of your application and a different route to display the data.
Suppose that for displaying the view you use a classic DataTable (because all common grids support them, contrary to displaying dynamic objects). I don't know how create/update/delete will be done, but saving changes will at some point involve mapping a DataRow to a MainEntity object. You can write one method for that, like:
MainEntity DataRowToEntity(DataRow row)
{
    var entity = new MainEntity();
    entity.PropertyA = (string)row["PropertyA"]; // cast each column to the entity property's type
    // ... map the remaining stable columns the same way
    return entity;
}
The MainEntity can then be attached to a context, its state changed to Modified, and saved.
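In code, that last step is just the following sketch (the context and set names are assumptions):

using System.Data.Entity;

// editedRow comes from the grid's DataTable; MyDbContext/MainEntities are assumed names.
using (var context = new MyDbContext())
{
    MainEntity entity = DataRowToEntity(editedRow);
    context.MainEntities.Attach(entity);
    context.Entry(entity).State = EntityState.Modified;
    context.SaveChanges();
}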

Creating a clear abstraction layer over a convoluted and large SQL database

Almost all of the applications I write at work get their data from one central MSSQL database. The database has about 70 tables with, on average, 25 or so columns per table. It has developed over 5-10 years (I'm not entirely sure) and is full of idiosyncrasies and quirks. Foreign keys are implemented irregularly when it comes to naming and so on, and table and column names mix case and language.
I am not able to restructure the database itself as it would break a ton of backwards compatibility for applications needed in the daily work of most people in the office.
I've almost exclusively been using LINQ2SQL for interacting with the database, and it works fine, but it always requires a lot of manual joining of tables, either in some DB repository or 'inline' when coding. So I've finally decided I have to do something to once and for all ease the pain of working with this leviathan. This would preferably include implementing a clear naming scheme, joining relevant tables with foreign keys properly once and for all, etc.
The three routes I can see are:
Creating a number of views, stored procedures, and functions in SQL to ease my interaction with the DB. This has the obvious bonus of being usable from many languages, as opposed to a solution implemented in e.g. C#. The biggest drawback I can see is that it would probably take a lot of time to do properly, and it would be a bit harder to maintain a year down the road when I haven't looked at the SQL queries for a while. I would also need to implement another DB abstraction step inside my applications, as I wouldn't want to work with straight-up DB calls (abstraction upon abstraction seems bad in this case, but maybe I'm wrong?).
Continuing down my LINQ2SQL road, but creating a once-and-for-all repository class that hides all the underlying tables behind abstracted calls only. This idea seems more feasible in terms of development time, maintenance, and single-point abstraction.
Pulling off some EF4 reverse-engineering magic, using the designer to hook up relevant foreign keys and renaming table classes to fit my taste.
Any input on how this should/could be done, as well as any recommended reading you might have, would be most appreciated.
We have a very similar situation with our database. We went the EF route, but we used Code First. I know it sounds weird to use Code First when your database already exists, but due to the size and number of the tables, trying to do it all in the designer was not feasible.
You can use the "Reverse Engineer Code First" option in Entity Framework Power Tools to generate everything you need from your database.
I think a well-thought-out abstraction layer better suits the needs of the application when it is not based on the physical schema of the DB. I mean, the main goal of a DAL is to hide the tables from users, exposing only valid "activities" through stored procedures. In most cases this will outperform direct data access, and it gives you one more degree of freedom: you can play with the T-SQL code and implement additional logic or schema changes without needing to change the application.
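For illustration, here is what such an "activity" can look like from the C# side; the procedure and parameter names are invented:

using System;
using System.Data;
using System.Data.SqlClient;

// The application calls an "activity", not tables; all names here are invented.
static void RegisterShipment(string connectionString, int orderId, DateTime shippedOn)
{
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("dbo.usp_RegisterShipment", connection))
    {
        command.CommandType = CommandType.StoredProcedure;
        command.Parameters.AddWithValue("@OrderId", orderId);
        command.Parameters.AddWithValue("@ShippedOn", shippedOn);
        connection.Open();
        command.ExecuteNonQuery(); // the procedure owns the table updates and validation
    }
}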

DB design when data is unknown about an entity?

I'm wondering if the following DB schema would have repercussions later. Let's say I'm writing a place entity. I'm not certain what properties of a place will be stored in the DB, so I'm thinking of making two tables: one to hold the required (or common) info, and one to hold additional info.
Table 1 - Place
PK PlaceId
Name
Lat
Lng
etc... (all the common fields)
Table 2 - PlaceData
PK DataId
PK FieldName
PK FK PlaceId
FieldData
Usage Scenario
I want certain visitors to have the capability of entering custom fields about a place. For example, a restaurant is a place that may have the following fields: HasParking, HasDriveThru, RequiresReservation, etc... but a car dealer is also a place, and those fields wouldn't make sense for a car dealer.
I want to support any type of place from a single table (well, the second table holds the custom fields), because I don't know how many types of places will eventually be added to my site.
Overall goal
On my ASP.NET MVC (C#/Razor) site, where I display a place, the attributes will be shown as an unordered list populated by: SELECT * FROM PlaceData WHERE PlaceId = #0.
This way, I wouldn't need to show empty field names in the view (or do a String.IsNullOrWhiteSpace() check for each and every field), which I would be forced to do if every attribute were a column on the table.
I'm assuming this scenario is quite common, but are there better ways to do it? Particularly from a performance perspective? What are the major drawbacks of this schema?
Your idea is referred to as an Entity-Attribute-Value (EAV) table and is generally bad news in an RDBMS. RDBMSes are geared toward highly structured data.
The overall options are:
Model the db further in an RDBMS, which is most likely if someone is holding back specs from you.
Stick with the RDBMS, using XML columns for the data whose structure is variable. This makes the most sense if a relatively small portion of your data storage schema is semi- or unstructured. Speaking from an MS SQL Server perspective, this data can be indexed, and you can check that your data complies with an XML schema definition.
Move to a non-relational DB such as MongoDB, Cassandra, CouchDB, etc. This is what a lot of social sites and I suspect blog sites run with. Also, it is within reason to use a combination of RDBMS and non-relational stores if that's what your needs call for.
EAV gets to be a mess because you're creating a database within a database: you lose all of the benefits an RDBMS can provide (foreign keys, data type enforcement, etc.), and the SQL code needed to reconstruct your objects goes from lasagna to fettuccine to spaghetti in the blink of an eye.
Given the information that's been added to the question, it would seem a good fit to create a PlaceDetails column of type XML in the Place table. You could also split that column into another table with a 1:1 relationship if performance requirements dictate it.
The upside to doing it that way is that you can retrieve the data using very simple SQL code, even using the xml data type's methods for searching the data. That approach also lets you do the more complex presentation-oriented data parsing in C#, which is better suited to that purpose than T-SQL is.
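A sketch of reading that xml column and doing the parsing in C# with LINQ to XML (the Place table, PlaceDetails column, and element layout are assumptions):

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;
using System.Xml.Linq;

// Assumed: Place has an xml column PlaceDetails holding e.g.
// <details><HasParking>true</HasParking><HasDriveThru>false</HasDriveThru></details>
static Dictionary<string, string> LoadPlaceDetails(string connectionString, int placeId)
{
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(
        "SELECT PlaceDetails FROM Place WHERE PlaceId = @id", connection))
    {
        command.Parameters.AddWithValue("@id", placeId);
        connection.Open();
        var xml = (string)command.ExecuteScalar(); // SqlClient returns xml as a string

        // Only non-empty fields exist in the XML, so the view never sees blanks.
        return XElement.Parse(xml)
                       .Elements()
                       .ToDictionary(e => e.Name.LocalName, e => e.Value);
    }
}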
If you want your application to be able to create its own custom fields, this is a fine model. The Mantis Bugtracker uses this as well to allow Admins to add custom fields to their tickets.
If, in any case, it's going to be the programmer who creates the field, I must agree with pst that this is more of a premature optimization.
At any given time you can add new columns to the database (always watching the third normal form), so you should go with what you want and only create a second table if needed, or if such columns would break any of the normal forms.
