DataSet or Entity Data Model

DataSet or Entity Data Model - c#

Please excuse the noob question as I am new to integrating data with my applications. I've tried to find answers on the net, but not there yet.
I have an application I'm developing in C# on VS2010 which requires data in/out from a database. I am trying to figure out if its a DataSet or Entity Data Model I need to use when setting up a data source. My understanding was that it was the EDM which allowed me to treat tables/fields in a database as objects, but somehow it looks like I can do that with a DataSet too.
Some sources explain that a DataSet makes a cached copy of the Database which can then be manipulated.
Essentially my question is which should I use and what are the (dis)advantages of one over the other.

You have several options open to you when it comes to storing and retrieving data to/from a database:
At the very simplest level, use ADO.NET to open a connection to the DB, create a command and execute it. If you expect results back (i.e. SELECT ...) then you could call the command's ExecuteReader(...). Working in this manner results in very quick execution and the minimum of overhead, but you have to do more of the heavy lifting. If your app is simple, this is probably a good way to go. If your app is, or is likely to be more complex, you may want to consider other options...
ADO.NET DataSets are a reasonable DB IO mechanism, particularly for reading data from a DB. However, they can be a little cumbersome when trying to update the DB.
You could use an Object-Relational Mapper (ORM) like nHibernate or Entity Framework, but, frankly, that often results in your learning curve increasing dramatically while you figure out how to plug together the moving parts and make them work well together.
You might also consider a new variant of Entity Framework called Code First (CF): This allows you to pretty much design your code and CF will generate your EDM and handle the majority of the DB operations required for you to build your system. Scott Hanselman wrote up a nice intro into EF CF.
Having used practically every DB API and ORM on Windows over the last 20+ years, I am delighted with how CF is shaping up! EF 4.3 that shipped just a couple of weeks ago includes some key new improvements to CF including migrations which allow you to handle changes to your DB schema as it evolves. I've build 3-4 systems using EF CF over the last couple of months and am very happy - it's my favorite relational database IO mechanism at present.
If you want to really get into EF CF, I strongly recommend Julia Lerman's book EF CF - it's a short, nicely written, very useful guide that should take you no more than a day or two to work through the main sections of.
Hope this helps.

If you add a LocalDB data source to your project (because you want a small local database file) then when the Data Source Configuration Wizard pops up, it explicitly asks you whether you want to use a Dataset or Entity Data Model database model. Is this the situation you were facing? That was the problem I had that brought me to this entry.
There is no question that for an enterprise class application, or a website, you would want to investigate ADO.NET or an ORM, but it doesn't help answer this question, which has to do with what are the differences between choosing Dataset vs Entity Data Model in the wizard.
Essentially, Entity Data Model is the more recent technology. If you are unfamiliar with Dataset, then this is probably not the time to start using it.

If you're asking what are the pros and cons for ADO.NET (DataSet) vs EntityFramework (Entity Data Model) then there is a discussion that may help at ADO.NET Entity Framework or ADO.NET
EF will get you up and running pretty quickly but in my (very limited) experience its been a pain to maintain.
What is it that has determined that these are your only two options? There are far more available to you including many ORMs.

If your application is supporting a business application than queries get complex pretty soon. In such scenario, stored-procedures save a lot of time and are much easier to maintain and they work better with ADO.NET. In almost all scenarios, I would suggest using stored-procedures and ADO.NET. Move as much of the business rules and logic to stored procedures as you can...much easier to maintain this way.
Use Datasets (datatables) only to retrieve and read data. Any data that needs to be saved to database should be directly manipulated in the database ... no point doing it in dataset and then saving the same. In a multi-user environment it is almost always better to save the changes to database as soon as the user has clicked "save".
You may (should) use business objects within the application for business-logic processes.
Let us take a simple example of where you are saving a Contact (name, phone, email, address etc) and then retrieving a list of contacts added today...I would suggest you do it as follows:
1) Adding the contact - Client (web or otherwise) collects data --> data is saved in a Contact business object --> validate Contact object --> Call repository layer to save Contact object (adding a repository layer is useful but not-necessary to keep the data layer abstract from the client) --> Repository calls the data layer to save the contact object (here a simple ADO.NET call, using Command object, can be made to call the stored procedure to save the contact in database). No dataset was used in this use case.
2) Retrieving list of contacts -- Client calls the repository layer to get the list of contacts --> repository layer call the data layer to retrieve the data --> here the list of data is retrieved as a dataset(datatable) --> return the datatable back to the client and let the client read the data directly from datatable while rendering the data. Even a single contact can be retrieved as a dataset.
P.S: ORM is almost always an overkill. It is almost always used because certain developers like to keep everything object-oriented...so an extra layer gets added even though it does nothing useful (IMHO).

But, what if you have business logic (stored procedures) which can be used in many different applications.
So depends: if you make your application for different users with different backend storage, or you make many applications for users which doesn't change backend storage so often.
It is very important to have database integrity and rules independent from application (inner or outsource)

Related

Entity Framework VS pure Ado.Net

EF is so widely used staff but I don't realize how I should use it. I met a lot of issues with EF on different projects with different approaches. So some questions brought together in my head. And answers leads me to use pure ado.net with stored procedures.
So the questions are:
How to deal with EF in n-tier application?
For example, we have some DAL with EF. I saw a lot of articles and projects that used repository, unit of work patterns as some kind of abstraction for EF. I think such approach kills most of benefits that increase development speed and leads to few things:
remapping of EF load results in some DTO that kills performance(call some select to get table data - first loop, second loop - map results to some composite type generated by ef, next - filter mapped data using linq and, at last, map it to some DTO). Exactly remapping to DTO is killer of one of the biggest efs benefit;
or
leads to strong cohesion between EF (and it's version) and app. It will be something like 2-tier app with dal and presentation with bll or dal with bll and presentation. I guess it's not best practice. And the same loading process as we have for previous thing except mapping, so again performance issue raised up. We could try to use EF as DAL without any abstraction under them. But we will get similar issues in some other way.
Should I use one context per app\thread\atomic operation? Using approach - one context per app\thread may slightly increase performance and possibilities to call navigation properties, but we meet another problem - updating this context and growing loaded data in context, also I'm not sure about concurrency with one dbcontext per app\thread. Using context per operation will lead us to remapping EF results to our DTO's. So you see that we again pushed back to question no.1.
Could we try to use EF + stored procedures only? Again we have issues from previous questions. What is the reason to use EF if the biggest part of functionality will not be used?
So, yes EF is great to start project. It so convenient when we have few screens and crud operations.
But what next?
All this text is just unsorted thoughts. I know that pure ado.net will lead to another kind of challenges.
So, what is your opinion about this topic?

By following the naming conventions , you will find it's called : ADO.NET Entity Framework , which means that Entity Framework sits on top of ADO.NET so it can't be faster , It may perform both in equal time , but let's look at EF provides :
You will no more get stuck with writing queries without any clue about if what you're writing is going to compile or not .
It makes you rely on C# or your favorite .NET language on writing your own data constraints that you wish to accept from the target user directly inside your model classes .
Finally : EF and LINQ give a lot of power in maintaining your applications later .
There are three different models with the Entity Framework : Model First , Database First and Code First get to know each of 'em .
-The Point about killing performance when remapping is on process , it's because that on the first run , EF loads metadata into memory and that takes time as it builds in-memory representation of model from edmx file.

ADO. Net is an object oriented framework that allows you to interact with database system (SQL, Oracle, etc).
Entity framework is a techniques of manipulating data in databases like (collection of queries (inert table name , select * from like this )).
it is uses with LINQ.

Entity Framework is not efficient in any case as in most tools or toolboxes designed to achieve 'faster' results.
Access to database should be viewed as a separate tier using store procedures as the interface. There is no reason for any application to have more than absolutely require CRUD operations. Less is more principle. Stored procedures are easy to write, secure, maintain and is de facto fastest way. It's easy to write tools to generate desired codes for POCO and DbContext through stored procedures.
Application well designed should have a limited numbers of connection strings to database and none of which should be the all mighty God. Using schema to support connection rights.
Lazy loading are false statements added to solve a problem that should never exist and introduced with ORM and its plug and play features. Data should only be read when needed. Developers should be responsible to implement this logic base on application context.
If your application logic has a problem to maintain states, no tool will help. It will in fact, make it worse by cover up the real problem until it's too late.
Database first is the only solution for a well designed application. Civilization realized long time ago the important of solid aqueduct and sewer system. High level code can and will be replaced anytime but data stays. Rewrite an entire application is matter of days if database is well designed.
Applications are just glorified database access. Still true in most cases.
This is my conclusion after many years in business applications debugging through codes produced by many different tools or toolboxes. The faster results advertised are not even close to cover the amount of time/energy wasted later trying to clean up the mess. Performance issues are rarely if not ever caused by high demand but the sum of all 'features' added through unusable tools.

ADO.NET provides consistent access to data sources such as SQL Server and XML, and to data sources exposed through OLE DB and ODBC. Data-sharing consumer applications can use ADO.NET to connect to these data sources and retrieve, handle, and update the data that they contain.
Entity Framework 6 (EF6) is a tried and tested object-relational mapper (O/RM) for .NET with many years of feature development and stabilization. An ORM like EF has the following advantage
ORM lets developers focus on the business logic of the application thereby facilitating huge reduction in code.
It eliminates the need for repetitive SQL code and provides many benefits to development speed.
Prevents writing manual SQL queries; & many more..
In an n-tier application,it depends on the amount of data your application is handling and your database is managing. According to my knowledge DTO's don't kill performance. They are data container for moving data between layers and are only used to pass data and does not contain any business logic. They are mostly used in service classes.See DTO.
One DBContext is always a best practice.
There is no such combination of EF + SP(Stored Procedure) as per my knowledge. If you wish to use an ORM like EF and an SP at the same time try micro-ORMs like Dapper,BLToolkit, etc..It was build for that purpose and is heck lotta fast than EF. Here is a good article on Dapper ORM.
Here is a related thread on a similar topic: What is the difference between an orm and ADO.net?

How to design a Data Access Layer for a database table that may change in the future?

Introduction:
I'm refactoring (pretty much rewriting) a legacy application in my current internship. The part that this question will be concerned about is the database it uses and the way they retrieve data from it.
The database structure is:
There's a table that has the main records. Let's say each record is a measurement. It has some info about the measured material and different measurement information.
There's a table view they use that has the same information columns, plus some extra columns that contains data calculated from the given measurements. And it also filters some of the data from the table.
So let's say we have the main table with columns:
Measurement ID
Measurement A
Measurement B
The view has something like this:
Measurement ID
Measurement A
Measurement B
Some extra data (for example Measurement A * Measurement B)
The guy that is leading the development only knows some SQL, so he likes adding new columns that is calculated by some columns in the main table for experimenting. And this is definitely a need at the moment.
Requirements are:
Different types of databases should be supported (like SQL Server, Oracle, and probably some others).
The frontend should be able to show the view, which means even though some main columns will always stay the same, there may be some new columns including newly calculated values.
My question is:
What kind of system should I use to accommodate the needs of this application? I wanted to use Entity Framework, but the fact that the view may have new columns in the future is I think a problem. As far as I understand, I should map my classes to the database before compiling.
The other thing that I'm considering is maybe using Entity Framework to get data from the main table and do the calculations and the filtering that is currently done in the table view directly in the frontend, and skip the view altogether. Which sounds fine, though I don't know if they will allow me to do that.
What would you do in my case? Please take into account that I have virtually no experience with databases and ORMs.

You are correct in that using Entity Framework will be a problem if the underlying DB schema is always changing. It will require you to update the EF model on your end every time to grab those new columns.
Ideally, all of your database access is hidden behind the interface to your DAL, so that your application doesn't need to know about which ORM is being used -- if any -- or which database it's connecting to.
I hate to say it, but given your requirements, an ORM might not make sense. You might want to go with something more generic without any strong-typing. You could just simply always return a DataTable to your application layer, and it could loop through the columns and values to display whatever is returned. If there are fields you know will never change, you could create a manual mapping for those fields only into your application object(s).

You may have a look to NoSQL system that are a lot more flexible on the schema. Or have a look to document database like RavenDB. All these systems allow the schema to change dynamically. You need to check the Pro's and Con's to see if it can fill you requirements.
(This answer is a bit out of subject as it's about replacing the SQL server and not really creating a DAL, but other answers cover the subject well and I would like to propose another way that may help.)

If your schema is unstable, then using Entity Framework as a beginner is going to be a headache. The assumption is that you can just refresh the design canvas periodically to let the tool handle database table changes. You can try that for a time to see when it becomes too much of a pain, but without any prior experience using ORMs or Entity Framework it may not be worth the effort.
I would probably use something like Rob Conery's Massive ORM (https://github.com/robconery/massive). It gives you more flexibility with the underlying database schema and is a very small library. I remember it being ~300 lines of code and very easy to use. It uses C# dynamics so you'll have to be using >= C# 4.0 and be comfortable with that one concept but IMO it's worth it for the low-overhead. A full-fledged ORM like Entity Framework or NHibernate is going to cost a lot of learning cycles.
You could, of course, just stick to ADO.NET DataTables. They're a bit ugly and verbose, but they'll do the job.

You can use Entity Framework - Database First if the DB is changing. Of course, you will have to regenerate your classes when you want to be able to access new columns, when the DB schema changes.
If you need to accomodate different database servers, then you should take a look into implementing a repository pattern and abstract all your data access that way.

Your comment
it involves write operations to the main table but the main table never changes
confirms what I was hoping for. It means you can use Entity Framework as the core of you application and a different route to display data.
Suppose that for display (of the view) you use a classic DataTable (because all common grids support them, contrary to displaying dynamic objects). I don't know how create/update/delete will be done, but saving changes will at some point involve mapping a DataRow to a MainEntity object. You can write one method for that like
MainEntity DataRowToEntity(DataRow row)
{
var entity = new MainEntity();
entity.PropertyA = row["PropertyA"];
....
}
The MainEntity can be attached to a context, its status changed to Modified, and saved.

c#/.net project how to save/organize database queries

In my first c# project, I need to connect to a database server for multiple read only queries. Would anyone share experiences on how to organize the queries into the project? currently I just hardcoded query strings in the c# source files whenever needed. but it is hard to maintain and once something changes on the database server side I am in trouble. Or should I put all query strings in the .config file using appsettings? Are there better ways? I do not have rights to save stored procedures on the server. thanks.

There are different answers with varying levels of sophistication based on your needs. Except in the very smallest of projects, I create two class library projects for database access: one that contains the data model and queries and another test project that exercises the first project's queries. In simple solutions, you use this library in an ASP.NET or other project.
You should strongly consider learning an ORM like NHibernate or VS 2008/.NET 3.5's Linq-To-SQL or Entity Framework. Minimally, you MUST remember to use parameterized queries if you have a web-facing app.
In more sophisticated solutions you will completely encapsulate the database into it's own service, or tier. In my experience I had a data access tier that ran in it's own Windows Communication Foundation service, as a Windows Service, and it was the only service that could talk directly to the database or knew the database's data model. It would do all the interaction with the database, and then transform the data into different data models that are read by the other tiers. I typically create a project called "Contracts" that contains all the interfaces and data models that are communicated from the data tier to the rest of the system. The reason you do this is so that you avoid the pain you have mentioned: you can update the underlying database, ORM layer, and "common data models" and then not change the other tiers at all.

If this is your first project, try to keep thinks simple. If you add too much variables probably you'll end thinking more in technology than in solutions.
That said, if your queries don't expect to change it's parameters, you can use stored procedures. This approach also will help boost your queries as the execution plan will be kept in the database.

Strategies for replacing legacy data layer with Entity framework and POCO classes

We are using .net C# 4.0, VS 2010, EF 4.1 and legacy code in this project we are working on.
I'm working on a win form project where I have made a decision to start using entity framework 4.1 for accessing an ms sql db. The code base is quite old and we have an existing data layer that uses data adapters. These data adapters are used all over the place (in web apps and win form apps) My plan is to replace the old db access code with EF over time and get rid for the tight coupling between UI layers and data layer.
So my idea is to more or less combine EF with the legacy data access layer and slowly replace the legacy data layer with a more modern take on things using EF. So for now we need to use both EF and the legacy db access code.
What I have done so far is to add a project containing the edmx file and context. The edmx is generated using database first approach. I have also added another project that contains the POCO classes (by using ADO.NET POCO Entity Generator). I have more or less followed Julia Lerman's approach in her book "Programming Entity Framework" on how to split the model and the generated POCO classes. The database model has been set for years and it's not an option the change the table and the relationships, triggers, stored procedures, etc, so I'm basically stuck with the db model as it is.
I have read about the repository pattern and unit of work and I kind of like the patterns, but I struggle to implement them when I have both EF and the legacy db access code to deal with. Specially when I don't have the time to replace all of the legacy db access code with a pure EF implementation. In an perfect world I would start all over again with a fresh take one the data model, but that is not an option here.
Is the repository and unit of work patterns the way to go here? In order to use the POCO classes in my business layer, I sometimes need to use both EF and the legacy db code to populate my POCO classes. In another words, I can sometimes use EF to retrieve a part of the data I need and the use the old db access layer to retrieve the rest of the data and then map the data to my POCO classes. When I want to update some data I need to pick data from the POCO classes and use the legacy data access code to store the data in the database. So I need to map the data retrieved from the legacy data access layer to my POCO classes when I want to display the data in the UI and vice versa when I want to save data to the data base.
To complicate things we store some data in tables that we don't know the name of before runtime (Please don't ask me why:-) ). So in the old db access layer, we had to create sql statements on the fly where we inserted the table and column names based on information from other tables.
I also find that the relationships between the POCO classes are somewhat too data base centric. In another words, I feel that I need to have a more simplified domain model to work with. Perhaps I should create a domain model that fits the bill and then use the POCO classes as "DAO's" to populate the domain model classes?
How would you implement this using the Repository pattern and Unit of Work pattern? (if that is the way to go)

Alarm bells are ringing for me! We tried to do something similar a while ago (only with nHibernate not EF4). We had several problems running ADO.NET along side an ORM - database concurrency being a big one.
The database model has been set for
years and it's not an option the
change the table and the
relationships, triggers, stored
procedures, etc, so I'm basically
stuck with the db model as it is.
Yep. Same thing! The problem was that our stored procs contained a lot of business logic and weren't simple CRUD procs so keeping the ORM updated with the various updates performed by a stored procedure was not easy at all - Single Responsibility Principle - not a good one to break!
My plan is to replace the old db
access code with EF over time and get
rid for the tight coupling
between UI layers and data layer.
Maybe you could decouple without the need for an ORM - how about putting a service/facade layer infront of your UI layer to coordinate all interactions with the underlying domain and hide it from the UI.
If your database is 'king' and your app is highly data driven I think you will always be fighting an uphill battle implementing the patterns you mention.
Embrace ado.net for this project - use EF4 and DDD patterns on your next green field proj :)

EDMX + POCO class generator results in EFv4 code, not EFv4.1 code but you don't have to bother with these details. EFv4.1 offers just different API which does exactly the same (and it is only wrapper around EFv4 API).
Depending on the way how you use datasets you can reach some very hard problems. Datasets are representation of the change set pattern. They know what changes were done to data and they are able to store just these changes. EF entities know this only if they are attached to the context which loaded them from the database. Once you work with detached entities you must make a big effort to tell EF what has changed - especially when modifying relations (detached entities are common scenario in web applications and web services). For those purposes EF offers another template called Self-tracking entities but they have another problems and limitations (for example missing lazy loading, you cannot apply changes when entity with the same key is attached to the context, etc.).
EF also doesn't support several features used in datasets - for example unique keys and batch updates. It's fun that newer MS APIs usually solve some pains of previous APIs but in the same time provide much less features then previous APIs which introduces new pains.
Another problem can be with performance - EF is slower then direct data access with datasets and have higher memory consumption (and yes there are some memory leaks reported).
You can forget about using EF for accessing tables which you don't know at design time. EF doesn't allow any dynamic behavior. Table names and the type of database server are fixed in mapping. Another problems can be with the way how you use triggers - ORM tools don't like triggers and EF has limited features when working with database computed values (possibility to fill value in the database or in the application is disjunctive).
The way of filling POCOs from EF + Datasets sounds like this will not be possible when using only EF. EF has some allowed mapping patterns but possibilities to map several tables to single POCO class are extremely limited and constrained (if you want to have these tables editable). If you mean just loading one entity from EF and another entity from data adapter and just make reference between them you should be OK - in this scenario repository sounds like reasonable pattern because the purpose of the repository is exactly this: load or persist data. Unit of work can be also usable because you will most probably want to reuse single database connection between EF and data adapters to avoid distributed transaction during saving changes. UoW will be the place responsible for handling this connection.
EF mapping is related to database design - you can introduce some object oriented modifications but still EF is closely dependent on the database. If you want to use some advanced domain model you will probably need separate domain classes filled from EF and datasets. Again it will be responsibility of repository to hide these details.

From how much we have implemented, I have learned following things.
POCO and Self Tracking objects are difficult to deal with, as if you do not have easy understanding of what goes inside, there will be number of unexpected behavior which may have worked well in your previous project.
Changing pattern is not easy, so far we have been managing simple CRUD without unit of work and identity map pattern. Now lot of legacy code that we wrote in past does not consider these new patterns and the logic will not work correctly.
In our previous code, we were simply using transactions and single insert/update/delete statement that was directly sent to database assuming transactions on server side will take care of all operations.
In such conditions, we were directly dealing with IDs all the time, newly generated IDs were immediately available after single insert statement, however this is not case with EF.
In EF, we are not dealing with IDs, we are dealing with navigation properties, which is a huge change from earlier ADO.NET programming methods.
From our experience we found that only replacing EF with earlier data access code will result in chaos. But EF + RIA Services offer you a completely new solution where you will probably get everything you need and your UI will very easily bind to it. So if you are thinking about complete rewriting using UI + RIA Services + EF, then it is worth, because lot of dependency in query management reduces automatically. You will be focusing only on business logic, but this is a big decision and the amount of man hours required in complete rewriting or just replacing EF is almost same.
So we went UI + RIA Services + EF way, and we started replacing one one module. Mostly EF will easily co-exist with your existing infrastructure so there is no harm.

How to handle default data in an application using NHibernate

I'm working on an application that uses SQL Server and NHibernate. We have the concept of default data (complex entities) that needs to be created for each new entity. This data can be changed on a per-user basis. However, we're struggling with the best way to create this data.
For example, lets say my application has a Store entity which has several default Products that I want to create when a new Store gets created. Anything about aProduct can be modified by managers of each Store.
As I see it, there are two main options:
Keep the default data in code and write it to the database once the new entity is created.
Keep the default data in the database and move it over with a stored procedure/raw SQL when the entity is created.
Instinctively, I lean toward option two, since databases are great at moving and manipulating sets of data, and option one would require a ton of messy code that could get out of hand.
However, writing a stored procedure or raw SQL presents its own issues:
We would have to re-write the stored procedure or SQL depending on the database we're using
We would be subverting the ORM in a way (not sure if this is actually wrong). That is, we'd be moving data around without using NHibernate
I found this article by Ayende Rahien which outlines how to perform a bulk delete. I am thinking that doing something similar for inserting default data would be fine. I also found an nhibernate users groups post (called "Schema export and default data"--SO won't let me post two links) that describes a similar situation, but it doesn't seem like there's a consensus on what the right solution is (although Ayende does offer some feedback and suggests that the data live in the database).
After writing this, I'm leaning even more toward using a stored procedure, I'm just worried about possible pitfalls of mixing two database access strategies (directly calling SProcs and using an ORM).
Any feedback is appreciated!
Edit: Removed "immutable" language. I'm specifically talking about default data that can change so I think this term was incorrect/confusing here.

I would create a default data service that creates those data in code, and use a factory to create your store and use the default data service to generate the default entities.
Using a Stored Procedure definitely defeats the point of having an ORM.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.