Sorry if this has been asked elsewhere, but I couldn't find a clear answer anywhere.
I have decided to start learning relational databases a bit more seriously, namely SQL. This is a major beginner's question, but it's probably essential to get started on.
I'm basically a little confused about the best practice for how to use SQL (or another database). At college I have accessed databases (using JSON strings) for things such as mobile apps, but I have never actually designed and built a database myself, as my tutor built the database in question for us to access.
Let's say I have a C# application that holds genealogy information (i.e. families and their members) and I wanted to store each individual in a database. Would I simply use the structure I already have, but save to fields in a database instead of an XML or text document? Or does it work the other way, i.e. do I create a database with the required fields, then just retrieve the data from the database in a C# application and manipulate it as I wish, so that the application would be entirely different (the C# application basically doesn't hold/store any data itself and just works on what's fed to it from the database)?
What's troubling me is this: where I would usually store my C# objects in a dictionary or list, for example, would I instead just retrieve straight from the database? Or would I retrieve from the database, store the data in a normal structure, and work from there (surely this would defeat the point of fast searching in a database)?
I may be over-thinking it slightly. Hope that makes sense. Thanks in advance.
Would I simply use the structure I already...
or
do I create a database with required fields...
I think that is the crux of your question.
Starting from the database
For me, when building an application that uses a backend database, an Entity-Relationship diagram is pretty crucial. I found quite a nice little tutorial for you here: http://www.sum-it.nl/cursus/dbdesign/english/index.php3 but you can easily find one that suits your learning style. The key point is that you are trying to model the problem domain (the real world out there that needs your application) in a way that your application can somehow capture. Once you have an E-R diagram of related tables, it is easier to figure out the details. Using SQL Management Studio for SQL Server 2008 (Express edition) you can create a few basic tables, build the E-R diagram right there, and have it generate relationships for you. You can then, at your leisure, examine the SQL used to achieve that and refine accordingly.
Personally, I always start by examining the problem domain, then I build the E-R diagram, then I build the database. I start building the C# application when I'm reasonably confident the database reflects the problem domain.
Starting from your C# application
However, what really matters is that you model the real world in a meaningful and effective way. In your case you already have a starting point in structures you've created in C# and you can use them to give you a starting point to build the E-R diagram. If you find it easier to get a C# application going and then build a database that reflects it, that should be fine. Perhaps you already have an approach that helps you capture the problem domain effectively. It's an iterative process whatever you do: building the C# code might reveal problems with the underlying database design and vice versa.
Diagramming - E-R or UML?
I'm personally convinced that this whole business is so complicated that you really need some diagrams.
to visualise your database, use an E-R diagram
to visualise your C# application use a UML class diagram
As you head towards a working application, you'll see how these two diagrams begin to match, or at least reflect each other pretty closely. In both cases (entities or classes), understanding the relationships between objects will be really important when you query the database, because it is crucial to understand relationships between tables (especially using 1-to-many relationships to resolve a complex many-to-many relationship) and the various techniques for joining tables in queries (INNER or OUTER joins, etc.). No matter how clever your C# application is, you will at some point need to understand at least some of the complexities of the SQL language - and it is easier if you can refer to an E-R diagram.
Where to store?
What's troubling me is this: where I would usually store my C# objects in a dictionary or list, for example, would I instead just retrieve straight from the database?
In the database, without a doubt. A C# class called Family would have, say, a FamilyName property with a setter method built in. If you discover a spelling mistake and want to change the name, the setter method would open a connection to the database, run an UPDATE query with the specified family name (and probably the family ID) as parameters, and update the underlying field accordingly. Retrieving data would involve running a SELECT query, and so on.
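As a minimal sketch of that idea, using plain ADO.NET with a parameterized query (the table, column and class names here are assumptions for illustration, not something from your application):

    using System.Data.SqlClient;

    public class Family
    {
        private readonly string connectionString;
        private string familyName;

        public Family(int familyId, string connectionString)
        {
            FamilyId = familyId;
            this.connectionString = connectionString;
        }

        public int FamilyId { get; private set; }

        public string FamilyName
        {
            get { return familyName; }
            set
            {
                // Persist the change with a parameterized UPDATE
                // (hypothetical table/column names: Family, FamilyName, FamilyId)
                using (var connection = new SqlConnection(connectionString))
                using (var command = new SqlCommand(
                    "UPDATE Family SET FamilyName = @name WHERE FamilyId = @id",
                    connection))
                {
                    command.Parameters.AddWithValue("@name", value);
                    command.Parameters.AddWithValue("@id", FamilyId);
                    connection.Open();
                    command.ExecuteNonQuery();
                }
                familyName = value;
            }
        }
    }

In practice you might batch such updates or go through an ORM, but the shape (SELECT to read, UPDATE to write, always parameterized) stays the same.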
Conclusion
Do some tutorials on how to examine a problem domain, create an entity-relationship diagram and build a set of related tables based on the diagram. I'm convinced that way you'll find it much easier to keep track of the C# classes that you build to communicate with the backend database.
Here's an example of a simple E-R diagram for families and their members:
To begin with you might think members and families could go in one table, but then you discover that this creates a lot of duplication, so you separate them out into a family table and a member table with a one-to-many relationship. But then you realise that, through marriage for instance, people can belong to more than one family, so you need a many-to-many relationship. I think the E-R diagram is the best place to work out that kind of complexity.
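In table terms, resolving the many-to-many means a third "junction" table alongside the Family and Member tables; sketched as C# classes (all names invented for illustration):

    // One row per person
    public class Member
    {
        public int MemberId { get; set; }
        public string FirstName { get; set; }
        public string LastName { get; set; }
    }

    // Junction table: one row per (family, member) pair, so a person can
    // belong to several families and a family can have several people
    public class FamilyMembership
    {
        public int FamilyId { get; set; }  // foreign key to Family
        public int MemberId { get; set; }  // foreign key to Member
        public string Role { get; set; }   // e.g. "parent", "child", "spouse"
    }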
Not knowing what your structures look like or how your DB will be designed, this is hard to answer. But you should be able to use your existing data structures and just pipe the data from the database instead of the XML file.
Look into LINQ to SQL (and LINQ to XML for your existing documents); C# has a strong library for interacting with SQL. It may be a bit confusing at first, but it is very powerful once you learn it.
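For a flavour of what that looks like, here is a minimal LINQ to SQL sketch; the table name, columns and connection string are all assumptions for illustration:

    using System.Data.Linq;
    using System.Data.Linq.Mapping;
    using System.Linq;

    [Table(Name = "Member")]  // hypothetical table
    public class Member
    {
        [Column(IsPrimaryKey = true)]
        public int MemberId { get; set; }

        [Column]
        public string LastName { get; set; }
    }

    public static class Program
    {
        public static void Main()
        {
            // Point this connection string at your own database
            using (var db = new DataContext(@"Server=.\SQLEXPRESS;Database=Genealogy;Integrated Security=true"))
            {
                // The same query shape you would use over an XML document with LINQ to XML
                var smiths = from m in db.GetTable<Member>()
                             where m.LastName == "Smith"
                             select m;

                foreach (var member in smiths)
                    System.Console.WriteLine(member.MemberId);
            }
        }
    }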
If I understand correctly, you are also asking whether you should retrieve all the records from the database and store them as objects in a collection, or retrieve selected records from the database and use the dataset results without placing them in a purpose-defined structure.
I tend to select the records I want from the database and then load the results into my purpose-defined classes/structures. This allows you to add your manipulation methods to the class holding a record result, etc., without needing to pass dataset results into each method. However, you will find yourself doing singular updates all the time when a batch update might be more efficient... if that makes sense.
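A rough sketch of that pattern (the query, table and class names are assumptions): selected rows are read once and materialized into a purpose-defined class that also carries the manipulation logic:

    using System.Collections.Generic;
    using System.Data.SqlClient;

    public class MemberRecord
    {
        public int MemberId { get; set; }
        public string LastName { get; set; }

        // Manipulation logic lives on the class itself
        public string DisplayName() { return MemberId + ": " + LastName; }
    }

    public static class MemberRepository
    {
        public static List<MemberRecord> LoadByFamily(string connectionString, int familyId)
        {
            var results = new List<MemberRecord>();
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand(
                "SELECT MemberId, LastName FROM Member WHERE FamilyId = @familyId",
                connection))
            {
                command.Parameters.AddWithValue("@familyId", familyId);
                connection.Open();
                using (var reader = command.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        results.Add(new MemberRecord
                        {
                            MemberId = reader.GetInt32(0),
                            LastName = reader.GetString(1)
                        });
                    }
                }
            }
            return results;
        }
    }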
Take a look at Entity Framework Code First. If your data structures are classes in your application, there are techniques to create your database schema from them. As for the data: store it in your database and populate your lists and dictionaries with it, or populate a list of your genealogy individual class with it.
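A minimal Code First sketch (class names are made up for illustration); Entity Framework can generate the database schema from classes like these on first use:

    using System.Data.Entity;  // Entity Framework, Code First

    public class Individual
    {
        public int Id { get; set; }
        public string FirstName { get; set; }
        public string LastName { get; set; }
    }

    public class GenealogyContext : DbContext
    {
        // Each DbSet becomes a table in the generated schema
        public DbSet<Individual> Individuals { get; set; }
    }

    // Usage: with the default initializer, EF creates the database on first access:
    // using (var db = new GenealogyContext())
    // {
    //     db.Individuals.Add(new Individual { FirstName = "Ada", LastName = "Lovelace" });
    //     db.SaveChanges();
    // }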
If you want to write your own data classes, there's a free tutorial here, written by myself. What I would definitely not do is use the data sources in ASP.NET, as these wizards are the Barty Crouches of the ASP.NET world - they appear good, but turn out to be evil, as inevitably you'll want to tweak them and you won't understand how to do so.
Related
Introduction:
I'm refactoring (pretty much rewriting) a legacy application in my current internship. The part this question is concerned with is the database it uses and the way data is retrieved from it.
The database structure is:
There's a table that has the main records. Let's say each record is a measurement. It has some info about the measured material and different measurement information.
There's a view they use that has the same information columns, plus some extra columns containing data calculated from the given measurements. It also filters out some of the data from the table.
So let's say we have the main table with columns:
Measurement ID
Measurement A
Measurement B
The view has something like this:
Measurement ID
Measurement A
Measurement B
Some extra data (for example Measurement A * Measurement B)
The guy leading the development only knows some SQL, so he likes adding new columns, calculated from columns in the main table, for experimenting. And this is definitely a need at the moment.
Requirements are:
Different types of databases should be supported (like SQL Server, Oracle, and probably some others).
The frontend should be able to show the view, which means that even though some main columns will always stay the same, there may be some new columns, including newly calculated values.
My question is:
What kind of system should I use to accommodate the needs of this application? I wanted to use Entity Framework, but the fact that the view may gain new columns in the future is, I think, a problem. As far as I understand, I have to map my classes to the database before compiling.
The other thing that I'm considering is maybe using Entity Framework to get data from the main table and do the calculations and the filtering that is currently done in the table view directly in the frontend, and skip the view altogether. Which sounds fine, though I don't know if they will allow me to do that.
What would you do in my case? Please take into account that I have virtually no experience with databases and ORMs.
You are correct in that using Entity Framework will be a problem if the underlying DB schema is always changing. It will require you to update the EF model on your end every time to grab those new columns.
Ideally, all of your database access is hidden behind the interface to your DAL, so that your application doesn't need to know about which ORM is being used -- if any -- or which database it's connecting to.
I hate to say it, but given your requirements, an ORM might not make sense. You might want to go with something more generic without any strong typing. You could simply always return a DataTable to your application layer, and it could loop through the columns and values to display whatever is returned. If there are fields you know will never change, you could create a manual mapping for those fields only into your application object(s).
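A rough sketch of that generic approach (the view name and connection handling are assumptions):

    using System.Data;
    using System.Data.SqlClient;

    public static class GenericDal
    {
        // Returns whatever columns the view currently has; no strong typing
        public static DataTable GetMeasurementView(string connectionString)
        {
            var table = new DataTable();
            using (var connection = new SqlConnection(connectionString))
            using (var adapter = new SqlDataAdapter(
                "SELECT * FROM MeasurementView", connection))  // hypothetical view name
            {
                adapter.Fill(table);
            }
            return table;
        }
    }

    // The application layer can then display columns it has never seen before:
    // foreach (DataColumn column in table.Columns)
    //     Console.WriteLine(column.ColumnName);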
You may have a look at NoSQL systems, which are a lot more flexible about the schema, or at document databases like RavenDB. All these systems allow the schema to change dynamically. You need to check the pros and cons to see if they can fulfil your requirements.
(This answer is a bit off-topic, as it's about replacing the SQL server rather than creating a DAL, but other answers cover the subject well and I would like to propose another approach that may help.)
If your schema is unstable, then using Entity Framework as a beginner is going to be a headache. The assumption is that you can just refresh the design canvas periodically to let the tool handle database table changes. You can try that for a time to see when it becomes too much of a pain, but without any prior experience using ORMs or Entity Framework it may not be worth the effort.
I would probably use something like Rob Conery's Massive ORM (https://github.com/robconery/massive). It gives you more flexibility with the underlying database schema and is a very small library. I remember it being ~300 lines of code and very easy to use. It uses C# dynamics so you'll have to be using >= C# 4.0 and be comfortable with that one concept but IMO it's worth it for the low-overhead. A full-fledged ORM like Entity Framework or NHibernate is going to cost a lot of learning cycles.
You could, of course, just stick to ADO.NET DataTables. They're a bit ugly and verbose, but they'll do the job.
You can use Entity Framework Database First if the DB is changing. Of course, you will have to regenerate your classes when the DB schema changes and you want to be able to access the new columns.
If you need to accommodate different database servers, then you should look into implementing a repository pattern and abstracting all your data access that way.
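As a sketch of that abstraction (all names invented): the application depends only on an interface, and each database server gets its own implementation that you swap via configuration or dependency injection:

    using System.Data;

    // The application programs against this, never against a concrete database
    public interface IMeasurementRepository
    {
        DataTable GetMeasurementView();
        void AddMeasurement(double measurementA, double measurementB);
    }

    public class SqlServerMeasurementRepository : IMeasurementRepository
    {
        public DataTable GetMeasurementView() { /* SQL Server-specific code */ return new DataTable(); }
        public void AddMeasurement(double a, double b) { /* SQL Server-specific code */ }
    }

    public class OracleMeasurementRepository : IMeasurementRepository
    {
        public DataTable GetMeasurementView() { /* Oracle-specific code */ return new DataTable(); }
        public void AddMeasurement(double a, double b) { /* Oracle-specific code */ }
    }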
Your comment
it involves write operations to the main table but the main table never changes
confirms what I was hoping for. It means you can use Entity Framework as the core of your application and a different route to display data.
Suppose that for displaying the view you use a classic DataTable (because all common grids support them, unlike dynamic objects). I don't know how create/update/delete will be done, but saving changes will at some point involve mapping a DataRow to a MainEntity object. You can write one method for that, something like:
MainEntity DataRowToEntity(DataRow row)
{
    var entity = new MainEntity();
    // Cast each column value back to the property's type
    entity.PropertyA = (string)row["PropertyA"];
    // ... map the remaining columns the same way
    return entity;
}
The MainEntity can be attached to a context, its status changed to Modified, and saved.
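With the DbContext API, that last step looks roughly like this (a sketch continuing the method above; MyDbContext and its MainEntities set are assumed names):

    // requires: using System.Data; using System.Data.Entity;
    public void SaveRow(DataRow changedRow)
    {
        var entity = DataRowToEntity(changedRow);
        using (var context = new MyDbContext())
        {
            context.MainEntities.Attach(entity);
            context.Entry(entity).State = EntityState.Modified; // mark all columns as changed
            context.SaveChanges();                              // issues the UPDATE
        }
    }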
Almost all of the applications I write at work get their data from a central MSSQL database. This database has about 70 tables, averaging 25 or so columns per table. It has grown over the last 5-10 years (I'm not entirely sure) and is full of idiosyncrasies and quirks: foreign keys are irregularly implemented when it comes to naming and so on, and table and column names mix case and language.
I am not able to restructure the database itself as it would break a ton of backwards compatibility for applications needed in the daily work of most people in the office.
I've been using LINQ2SQL almost exclusively for interacting with the database and it works fine, but it always requires a lot of manual joining of tables, either in some DB repository or 'inline' when coding. So I've finally decided to do something to ease the pain of working with this leviathan once and for all. This would preferably include implementing a clear naming scheme, properly joining relevant tables with foreign keys, etc.
The three routes I can see are:
Creating a number of views, stored procedures and functions in SQL to ease my interaction with the DB. This has the obvious bonus of being usable from many languages, as opposed to a solution implemented in e.g. C#. The biggest drawback I can see is that it would probably take a lot of time to do properly, and it would be a bit harder to maintain a year down the road when I haven't looked at the SQL queries for a while. I would also need to implement another DB abstraction step inside my applications, as I wouldn't want to work with straight-up DB calls (abstraction upon abstraction seems bad in this case, but maybe I'm wrong?).
Continuing down my LINQ2SQL road, but creating a once-and-for-all repository class that hides all the underlying tables behind abstracted calls. This idea seems more feasible in terms of development time, maintenance and single-point abstraction.
Pulling off some EF4 reverse-engineering magic, using the designer to hook up relevant foreign keys and renaming table classes to fit my taste.
Any input on how this should/could be done, as well as any recommended reading you might have, would be most appreciated.
We have a very similar situation with our database. We went the EF route, but we used Code First. I know it sounds weird to use Code First when your database already exists, but due to the size of the tables and the number of tables, trying to do it all in the designer was not feasible.
You can use the "Reverse Engineer Code First" option in Entity Framework Power Tools to generate everything you need from your database.
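The generated output is just ordinary Code First classes, conceptually something like this (all names here are invented); the mapping attributes are also a handy place to hide ugly legacy names behind clean class names:

    using System.ComponentModel.DataAnnotations.Schema;
    using System.Data.Entity;

    [Table("tblEMP_master")]      // the legacy table name stays in the mapping
    public class Employee
    {
        [Column("emp_ID")]
        public int Id { get; set; }

        [Column("emp_NAMN")]      // mixed-language legacy column, clean C# name
        public string Name { get; set; }
    }

    public class LegacyContext : DbContext
    {
        public DbSet<Employee> Employees { get; set; }
    }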
I think a well-thought-out abstraction layer better suits the needs of the application if it is not based on the physical schema of the DB. I mean, the main goal of a DAL is to hide the tables from users, exposing only valid "activities" through stored procedures. In most cases this will outperform direct data access, and it gives you one more degree of freedom: you can play with the T-SQL code and implement additional logic or schema changes without needing to change the application.
I'm trying to build a web application that lets the administrator talk to the database through C# and add new tables and columns to fit his requirements (sort of a very simple database studio), but I'm not trying to just create some spaghetti application.
So I'm trying to figure out how to generate the following dynamically (automatically) when he creates a table, using the table to build them:
1- The business objects or entities (the classes, their objects and properties).
2- The data access layer (some simple methods that connect to the database and add, update, delete and retrieve items (objects)).
Is this possible? Any pointers on how to achieve it?
EDIT
Just opened your link!! .. it's talking about data-bound controls and stuff! .. my question is way more advanced than that!
When you build an N-layered application, you start with the database schema and implementation, which is easy to do programmatically. Then you start building the DAL classes that handle the CRUD operations (add, edit, etc.) in and from this database.
What I want to do is allow the web administrator to add a new table through my application; then, dynamically, the application would take the table and column names as parameters and create new classes, defining within them the CRUD methods that implement the SQL CRUD operations.
Then it would also dynamically create the classes and define within them the variables, properties and methods to call and use the DAL methods .. all this based on the table and column names.
NOTE: All this happens at run-time!
You might want to look into ASP.Net Dynamic Data. It's a RAD tool which very easily gives you CRUD functionality for your entities and more. Check it out.
Some time back I had also asked a similar question on SO. I got only one reply.
Today I was digging for information on MSDN and, as I had guessed, the MS CRM entity model works based on metadata. So basically whatever a CRM developer works against is just metadata; they are not real objects as such. Following is the MSDN link.
Extend MS CRM Metadata and here is the MS CRM 4.0 SDK.
I hope this should get you started.
Update: Recently hit upon Visual Studio LightSwitch. I think this is what we wanted to build. A UI which will pick up table information from DB and then create all CRUD screens. VS LightSwitch is in its Beta1 and has quite a lot of potential. Should be a nice starting point.
First, any man trying to create MS Access is doomed to recreate MS Access. Badly.
You are better off using ASP.NET Dynamic Data (as suggested) or ASP.NET MVC Scaffolding. But runtime-generated platforms that actually make decent applications are really pipe dreams. You will need developer time to do anything complex, or to do it well.
What you are asking is nonsense. Why? Because the idea behind a BLL and n-tier design is that you know your data model well and can create a static class model to represent it.
If your data model is dynamic and changing, then you cannot create a static BLL (which is what a BLL is). What you will have to do is dynamically build your queries at run-time. This is not something any of the traditional methods are designed to handle, so you must do everything yourself.
While it's possible to dynamically generate classes at run-time, this is probably not the approach you want to take, because even if you manage to make your BLL adapt to your dynamic database, the code that calls the BLL will not know anything about it, so it will never get called.
This is not a problem you will solve overnight, or by copying any existing solution. You will have to design it from scratch, using low level ADO calls rather than relying on ORM's or any automation.
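As a sketch of what "doing it yourself" means here: nothing about the table is known at compile time, so the query is assembled at run-time. The helper below (names invented) validates the table name against INFORMATION_SCHEMA before concatenating it, since unchecked identifiers invite SQL injection:

    using System.Data;
    using System.Data.SqlClient;

    public static class DynamicDal
    {
        public static DataTable SelectAll(string connectionString, string tableName)
        {
            var table = new DataTable();
            using (var connection = new SqlConnection(connectionString))
            {
                connection.Open();

                // The table only exists at run-time, so check that it really exists
                var exists = new SqlCommand(
                    "SELECT COUNT(*) FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = @t",
                    connection);
                exists.Parameters.AddWithValue("@t", tableName);
                if ((int)exists.ExecuteScalar() == 0)
                    throw new System.ArgumentException("Unknown table: " + tableName);

                using (var adapter = new SqlDataAdapter(
                    "SELECT * FROM [" + tableName + "]", connection))
                {
                    adapter.Fill(table);
                }
            }
            return table;
        }
    }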
We are in the process of porting a legacy system to .NET, both to clean up architecture but also to take advantage of lots of new possibilities that just aren't easily done in the legacy system.
Note: When reading my post before submitting it I notice that I may have described things a bit too fast in places, ie. glossed over details. If there is anything that is unclear, leave a comment (not an answer) and I'll augment as much as possible
The legacy system uses a database, and 100% custom-written SQL all over the place. This has led to wide tables (i.e. many columns), since code that needs data only retrieves what is necessary for the job.
As part of the port, we introduced an ORM layer which we can use in addition to custom SQL. The ORM we chose is DevExpress XPO, and one of its features has also led to some problems for us: when we define an ORM class for, say, the Employee table, we have to add properties for all the columns, otherwise it won't retrieve them for us.
This also means that when we retrieve an Employee, we get all the columns, even if we only need a few.
One nice thing about having the ORM is that we can put some property-related logic into the same classes, without having to duplicate it all over the place. For instance, the simple expression to combine first, middle and last name into a "display name" can be put down there, as an example.
However, if we write SQL code somewhere, either in a DAL-like construct or, well, wherever, we need to duplicate this expression. This feels wrong and looks like a recipe for bugs and maintenance nightmare.
So we seemed to have only these two choices:
ORM, fetches everything, can have logic written once
SQL, fetches what we need, need to duplicate logic
Then we came up with an alternative: since the ORM objects are code-generated from a dictionary, we decided to generate a set of dumb classes as well. These have the same properties but aren't tied to the ORM in the same manner. Additionally, we added interfaces for all of the objects, also generated, and made both the ORM and the dumb objects implement them.
This allowed us to move some of this logic out into extension methods tied to the interfaces. The dumb objects carry enough information for us to plug them into our SQL classes and get a List<T> back instead of a DataTable, with the logic still available, so this looks to be working.
However, this has led to another issue. Suppose I want to write a piece of code that only displays or processes employees in a context where I need to know who they are (i.e. their identifier in the system) as well as their name (first, middle and last). If I use this dumb object, I have no guarantee from the compiler that the code calling me is really providing all this information.
One solution is for us to make the object know which properties have been assigned values, and an attempt to read an unassigned property crashes with an exception. This gives us an opportunity at runtime to catch contract breaches where code is not passing along enough information.
This also looks clunky to us.
So basically what I want advice on is if anyone else has been in, or are in, this situation and any tips or advice you can give.
We cannot, at the present time, break up the tables. The legacy application will still have to exist for a number of years due to the size of the port, and the .NET code is not an in-3-years-release type of project but will be phased in through releases along the way. As such, both the legacy system and the .NET code need to work with the same tables.
We are also aware that this is not an ideal solution so please refrain from advice like "you shouldn't have done it like this". We are well aware of this :)
One thing we've looked into is to create an XML file, or similar, with "contracts". So we could put into this XML file something like this:
There is an Employee class with these 50 properties
Additionally, we have these 7 variations, for various parts of the program
Additionally, we have these 10 pieces of logic, that each require property X, Y and Z (X, Y and Z varies between those 10)
This could allow us to code-generate those 8 classes (the full class + 7 smaller variations) and have the generator detect that, for variation #3, properties X, Y and K are present; it could then tie either the code for the logic or the interfaces the logic needs into this class automagically. This would allow us to have a number of different employee classes with varying degrees of property coverage, and have the generator automatically add to each class all the logic that class can support.
My code could then say that I need an employee of type IEmployeeWithAddressAndPhoneNumbers.
This too looks clunky.
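For what it's worth, the compile-time guarantee you describe can also be expressed with small composable interfaces instead of one generated class per variation; a sketch using your IEmployeeWithAddressAndPhoneNumbers example (the member names are invented):

    // Small facets, one per cluster of columns
    public interface IEmployeeId
    {
        int EmployeeId { get; }
    }

    public interface IEmployeeName
    {
        string FirstName { get; }
        string MiddleName { get; }
        string LastName { get; }
    }

    // A "variation" is just a composition of facets
    public interface IEmployeeWithAddressAndPhoneNumbers : IEmployeeId, IEmployeeName
    {
        string Address { get; }
        string PhoneNumber { get; }
    }

    // Code that needs only id + name declares exactly that, and the
    // compiler rejects callers that cannot supply both facets
    public static class EmployeeDisplay
    {
        public static string DisplayName<T>(T e) where T : IEmployeeId, IEmployeeName
        {
            return e.EmployeeId + ": " + e.FirstName + " " + e.LastName;
        }
    }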
I would suggest that eventually a database refactoring (normalization) is probably in order. You could work on the refactoring and use views to provide the legacy application with an interface to the database consistent with what it expects. That is, for example, break the employee table down into employee_info, employee_contact_info and employee_assignments, and then provide the legacy application with a view named employee that does a join across these three tables (or maybe a table-based function if the logic is more complex). This would potentially allow you to move ahead with a fully ORM-based solution, which is what I would prefer, and keep your legacy application happy. I would not proceed with a mixed ORM/direct SQL solution, although you might be able to augment your ORM with some entity classes that provide different views of the same data (say, a join across a couple of tables for read-only display).
"We can not, at the present time, break up the tables. The legacy application will still have to exist for a number of years due to the size of the port, and the .NET code is not a in-3-years-release type of project but will be phased in in releases along the way. As such, both the legacy system and the .NET code need to work with the same tables."
Two words: materialized views.
You have several ways of "normalizing in place".
Materialized Views, a/k/a indexed views. This is a normalized clone of your source tables.
Explicit copying from old tables to new tables. "Ick" you say. However, consider that you'll be incrementally removing functionality from the old app. That means that you'll have some functionality in new, normalized tables, and the old tables can be gracefully ignored.
Explicit 2-way synch. This is hard, but not impossible. You normalize via copy from your legacy tables to correctly designed tables. You can, as a temporary solution, use Stored Procedures and Triggers to clone transactions into the legacy tables. You can then retire these kludges as your conversion proceeds.
You'll be happiest doing this in two absolutely distinct schemas. Since the old database probably doesn't have a well-designed schema, your new database should have one or more named schemas so that you can maintain some version control over the definitions.
Although I haven't used this particular ORM, views can be useful in some cases in providing lighter-weight objects for display and reporting in these types of databases. According to their documentation they do support such a concept: XPView Concepts
I've been taking a look at some different products for .NET which propose to speed up development time by providing a way for business objects to map seamlessly to an automatically generated database. I've never had a problem writing a data access layer, but I'm wondering if this type of product will really save the time it claims. I also worry that I will be giving up too much control over the database and make it harder to track down any data level problems. Do these type of products make it better or worse in the already tough case that the database and business object structure must change?
For example:
Object Relation Mapping from Dev Express
In essence, is it worth it? Will I save "THAT" much time, effort, and future bugs?
I have used SubSonic and EntitySpaces. Once you get the hang of them, I believe they can save you time, but as the complexity of your app and the volume of data grow, you may outgrow these tools. You start to lose time trying to figure out whether something like a performance issue is related to the ORM or to your code. So, to answer your question, I think it depends. I tend to agree with Eric on this: high-volume enterprise apps are not a good place for general-purpose ORMs, but in standard-fare smaller CRUD-type apps, you might see some time saved.
I've found iBatis from the Apache group to be an excellent solution to this problem. My team is currently using iBatis to map all of our calls from Java to our MySQL backend. It's been a huge benefit as it's easy to manage all of our SQL queries and procedures because they're all located in XML files, not in our code. Separating SQL from your code, no matter what the language, is a great help.
Additionally, iBatis allows you to write your own data mappers to map data to and from your objects to the DB. We wanted this flexibility, as opposed to a Hibernate type solution that does everything for you, but also (IMO) limits your ability to perform complex queries.
There is a .NET version of iBatis as well.
I've recently set up ActiveRecord from the Castle Project for an app. It was pretty easy to get going. After creating a new app with it, I even used MyGeneration to script out class files for a legacy app that ActiveRecord could use in a pretty short time. It uses NHibernate to interact with the database, but takes away all the xml mapping that comes with NHibernate. The nice thing is though, if necessary, you already have NHibernate in your project, you can use its full power if you have some special cases. I'd suggest taking a look at it.
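Roughly, an ActiveRecord class looks like the sketch below; I'm writing the attribute names from memory, so they may differ slightly between versions, and the table and column names are invented:

    using Castle.ActiveRecord;

    [ActiveRecord("Families")]  // hypothetical table name
    public class Family : ActiveRecordBase<Family>
    {
        [PrimaryKey]
        public virtual int Id { get; set; }

        [Property]
        public virtual string Name { get; set; }
    }

    // Usage is then Family.FindAll(), family.Save(), and so on; the XML
    // mapping NHibernate would normally need is generated from the attributes.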
There are lots of choices of ORMs. Linq to Sql, nHibernate. For pure object databases there is db4o.
It depends on the application, but for a high volume enterprise application, I would not go this route. You need more control of your data.
I was discussing this with a friend over the weekend and it seems like the gains you make on ease of storage are lost if you need to be able to query the database outside of the application. My understanding is that these databases work by storing your object data in a de-normalized fashion. This makes it fast to retrieve entire sets of objects, but if you need to select data from a perspective that doesn't match your object model, the odbms might have a hard time getting at the particular data you want.