I have a three-layer application in which all database-related operations are performed in the database layer.
For some queries a huge amount of data is fetched into a SqlDataReader (around 10 million rows with 32 columns). My question is how I can pass this big data set to the presentation layer, where I am generating some reports.
After analyzing, I have the options below; please share your input on them.
1. Pass the SqlDataReader itself. This is actually not ideal, as I would have to keep the connection open the whole time.
2. Load the SqlDataReader into a DataTable and return that. This sounds good, but I am not sure it is the proper approach; I would like to know whether it will affect the overall performance of the application (see the sketch after this list).
3. Load the data into a List of custom objects.
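For option 2, this is roughly what I mean; a minimal sketch, where the connection string and the query are just placeholders:

    using System.Data;
    using System.Data.SqlClient;

    public DataTable GetReportData(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("SELECT * FROM ReportSource", conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                var table = new DataTable();
                table.Load(reader); // pulls every row into memory, then the reader and connection can close
                return table;
            }
        }
    }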
This is a WinForms application installed on a single machine only, and I am using .NET Framework 3.5.
Your input and feedback are greatly appreciated.
Is it really necessary to pass so much data at one moment? Wouldn't paging be better? It would solve many problems for you. Anyway, who wants to see 10 million rows on a single page? Who would want to wait five minutes until the page has loaded all the data?
If you have a three-layer architecture, you probably want to use business objects instead of objects for direct communication with the database. The presentation layer should know nothing, or very little, about the database being used. So in your DB layer, take the loaded data, store it in business objects, and pass those objects to the presentation layer.
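A minimal sketch of that idea; the ReportRow business class and the query are invented for the example, and the point is that the presentation layer only ever sees ReportRow objects:

    using System.Collections.Generic;
    using System.Data.SqlClient;

    public class ReportRow   // hypothetical business object
    {
        public int Id { get; set; }
        public decimal Amount { get; set; }
    }

    public List<ReportRow> GetReportRows(string connectionString)
    {
        var rows = new List<ReportRow>();
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("SELECT Id, Amount FROM ReportSource", conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    rows.Add(new ReportRow
                    {
                        Id = reader.GetInt32(0),
                        Amount = reader.GetDecimal(1)
                    });
                }
            }
        }
        return rows; // no ADO.NET types leak out of the DB layer
    }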
Passing the SqlDataReader will cost you a lot of network usage, and it is not at all recommended to keep the connection open for such a long time.
Using a DataTable is a preferable option, but it depends on what kind of operations you want to perform on this huge data set. By default, a DataTable can hold up to 16,777,216 rows.
Of these two options, the second is preferable and gives good performance.
You shouldn't pass a SqlDataReader or a DataTable on to another application layer; that will expose your DAL and database model implementation and will damage the separation of application layers.
That being said, a SqlDataReader keeps an open connection to the database. Therefore, you should read all the rows you need and close it as early as you can.
Design-wise, you should represent the data using your own classes.
Anyway, are you sure you need that many rows in the presentation layer? It sounds peculiar to me.
Related
I've been reviewing examples of three-layer design on the web, and I've noticed that most samples return either DataSets or DataTables. What confuses me is this: what if you would rather return a generic List of a type, so you can utilize properties or methods from within the type your list is based on? For example, a Name property that concatenates various fields in a specific way depending on the data; if the List is bound to a control on a form, then the Name property can be used as the data field. If you wanted to accomplish the same thing using a DataSet or DataTable, you'd have to return the already-combined data from the database (I try not to use DataSets or DataTables, so I'm probably wrong about this statement).
The part that is really confusing me is reusing code. It seems the only way to reuse code is to retrieve the data into either a DataSet or DataTable and then loop through the data and add it to a List. Is this generally the best practice for three-layer design, or is there a way to do this without DataSets and DataTables?
The example in the link below essentially demonstrates using DataSets or DataTables and then adding the data to an object, but I'm forced to ask: is this the best practice?
http://www.codeproject.com/Articles/36847/Three-Layer-Architecture-in-C-NET
Thanks
Using DataTables is a specific .NET-ism. The reason behind it is that they contain metadata about the structure of the data, which lets a DataGrid (and other such components) display the data automatically without using reflection or the like. My guess is that this is, among other things, a heritage of the MS Access approach to RAD, where the intent was to enable "business people" to create apps by generating the user interface directly from a SQL schema, essentially doing the opposite of a tiered design. This heritage then seems to have leaked into the hivemind.
There's nothing wrong about using "plain" data structures, as long as you're willing to give up the RAD features, and the trend lately seems to have been to get rid of this tradeoff too. (For instance with Web Forms' strongly typed data controls, and MVC's model binding features.)
Also, speaking more generally, Code Project articles from before MVC was established are not really a good source of wisdom on general software architecture.
What you should carry your data in depends entirely on your needs.
If you retrieve data from the DB and bind it to a DataGrid, DataSets might be the perfect solution. If you want a method where data tracks its own update status, you should look into Entity Framework. If you retrieve data and send it through a web service for cross-platform or cross-domain processing, you need to load your data into serializable classes of your own.
Take a look at the article below. It is a little old and targeted at EF4, but it summarizes the pros and cons of the different strategies very well. (There are three articles in the series; I suggest you read them all.)
http://msdn.microsoft.com/en-us/magazine/ee335715.aspx
I think the samples you're finding used DataTables and DataSets because they are a simple way to show three-tier design. Nowadays Entity Framework has largely replaced the "data access layer" mentioned in the samples.
Before Entity Framework, when I wrote a data access layer I would return a generic list that I built from the database. To run an update, delete, or insert I would pass an object in as the parameter to the method, then use the object's properties as the values in the SQL statement (a sketch of this follows). I preferred doing it that way for the reasons you mentioned, but also because it allowed me to change the object definitions or DB schema (or even use a different DB altogether) independently of each other.
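Something along these lines; the Customer class and the table/column names are invented for the sketch:

    using System.Data.SqlClient;

    public class Customer   // illustrative object definition
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    public void UpdateCustomer(Customer c, string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "UPDATE Customers SET Name = @Name WHERE Id = @Id", conn))
        {
            // the object's properties supply the parameter values
            cmd.Parameters.AddWithValue("@Name", c.Name);
            cmd.Parameters.AddWithValue("@Id", c.Id);
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }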
Please excuse the noob question, as I am new to integrating data with my applications. I've tried to find answers on the net, but I'm not there yet.
I have an application I'm developing in C# on VS2010 which requires data in/out from a database. I am trying to figure out whether it's a DataSet or an Entity Data Model I need to use when setting up a data source. My understanding was that it was the EDM which allowed me to treat tables/fields in a database as objects, but somehow it looks like I can do that with a DataSet too.
Some sources explain that a DataSet makes a cached copy of the Database which can then be manipulated.
Essentially my question is which should I use and what are the (dis)advantages of one over the other.
You have several options open to you when it comes to storing and retrieving data to/from a database:
At the very simplest level, use ADO.NET to open a connection to the DB, create a command and execute it. If you expect results back (i.e. SELECT ...) then you could call the command's ExecuteReader(...). Working in this manner results in very quick execution and the minimum of overhead, but you have to do more of the heavy lifting. If your app is simple, this is probably a good way to go. If your app is, or is likely to be more complex, you may want to consider other options...
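For example, that "very simplest level" looks roughly like this (the connection string, table, and columns are placeholders for the sketch):

    using System;
    using System.Data.SqlClient;

    public void DumpWidgets(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("SELECT Id, Name FROM Widgets", conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    Console.WriteLine("{0}: {1}", reader.GetInt32(0), reader.GetString(1));
            }
        }
    }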
ADO.NET DataSets are a reasonable DB I/O mechanism, particularly for reading data from a DB. However, they can be a little cumbersome when trying to update the DB.
You could use an Object-Relational Mapper (ORM) like NHibernate or Entity Framework, but, frankly, that often means your learning curve increases dramatically while you figure out how to plug the moving parts together and make them work well with each other.
You might also consider a newer variant of Entity Framework called Code First (CF): this allows you to pretty much just design your code, and CF will generate your EDM and handle the majority of the DB operations required for you to build your system. Scott Hanselman wrote a nice intro to EF CF.
Having used practically every DB API and ORM on Windows over the last 20+ years, I am delighted with how CF is shaping up! EF 4.3, which shipped just a couple of weeks ago, includes some key new improvements to CF, including migrations, which allow you to handle changes to your DB schema as it evolves. I've built 3-4 systems using EF CF over the last couple of months and am very happy; it's my favorite relational database I/O mechanism at present.
If you want to really get into EF CF, I strongly recommend Julia Lerman's book on EF CF; it's a short, nicely written, very useful guide that should take you no more than a day or two to work through the main sections of.
Hope this helps.
If you add a LocalDB data source to your project (because you want a small local database file), then when the Data Source Configuration Wizard pops up, it explicitly asks whether you want to use a DataSet or an Entity Data Model as the database model. Is this the situation you were facing? That was the problem that brought me to this entry.
There is no question that for an enterprise-class application or a website you would want to investigate ADO.NET or an ORM, but that doesn't help answer this question, which is about the differences between choosing DataSet vs. Entity Data Model in the wizard.
Essentially, the Entity Data Model is the more recent technology. If you are unfamiliar with DataSets, then this is probably not the time to start using them.
If you're asking about the pros and cons of ADO.NET (DataSet) vs. Entity Framework (Entity Data Model), there is a discussion that may help at ADO.NET Entity Framework or ADO.NET.
EF will get you up and running pretty quickly, but in my (very limited) experience it's been a pain to maintain.
What is it that has determined that these are your only two options? There are far more available to you, including many ORMs.
If your application supports a business, queries get complex pretty soon. In such scenarios, stored procedures save a lot of time, are much easier to maintain, and work better with ADO.NET. In almost all scenarios I would suggest using stored procedures and ADO.NET. Move as much of the business rules and logic into stored procedures as you can; it is much easier to maintain things that way.
Use DataSets (DataTables) only to retrieve and read data. Any data that needs to be saved to the database should be manipulated directly in the database; there is no point doing it in a DataSet and then saving the same changes. In a multi-user environment it is almost always better to save changes to the database as soon as the user has clicked "save".
You may (and should) use business objects within the application for business-logic processing.
Let us take a simple example where you are saving a Contact (name, phone, email, address, etc.) and then retrieving a list of contacts added today. I would suggest you do it as follows (a sketch follows the two steps):
1) Adding the contact: the client (web or otherwise) collects data --> the data is stored in a Contact business object --> validate the Contact object --> call the repository layer to save the Contact object (adding a repository layer is useful, but not necessary, to keep the data layer abstracted from the client) --> the repository calls the data layer to save the Contact object (here a simple ADO.NET call, using a Command object, can invoke the stored procedure that saves the contact in the database). No DataSet is used in this use case.
2) Retrieving the list of contacts: the client calls the repository layer to get the list of contacts --> the repository layer calls the data layer to retrieve the data --> here the data is retrieved as a DataSet (DataTable) --> return the DataTable to the client and let the client read the data directly from the DataTable while rendering it. Even a single contact can be retrieved as a DataSet.
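A sketch of both steps; the stored procedure names, the parameters, and the Contact class are all assumptions for the example:

    using System.Data;
    using System.Data.SqlClient;

    public class Contact   // illustrative business object
    {
        public string Name { get; set; }
        public string Phone { get; set; }
    }

    public void SaveContact(Contact contact, string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("usp_SaveContact", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@Name", contact.Name);
            cmd.Parameters.AddWithValue("@Phone", contact.Phone);
            conn.Open();
            cmd.ExecuteNonQuery(); // no DataSet anywhere on the save path
        }
    }

    public DataTable GetContactsAddedToday(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("usp_GetContactsAddedToday", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            var table = new DataTable();
            new SqlDataAdapter(cmd).Fill(table); // the read path hands back a DataTable
            return table;
        }
    }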
P.S.: An ORM is almost always overkill. It is almost always used because certain developers like to keep everything object-oriented, so an extra layer gets added even though it does nothing useful (IMHO).
But what if you have business logic (stored procedures) that can be used in many different applications?
So it depends: do you make your application for different users with different back-end storage, or do you make many applications for users who don't change their back-end storage very often?
It is very important to have database integrity and rules independent of the application (in-house or outsourced).
When using a class to get one row of data from the database, what is best to use:
A DataSet?
A reader, storing the data in a structure?
Something else?
Thanks for your time, Nathan
A DataReader is always your best choice, provided that it is compatible with your usage. DataReaders are very fast, efficient, and lightweight, but they carry the requirement that you maintain an active/open DB connection for their lifecycle, which means they can't be marshalled across AppDomains (or across web services, etc.).
DataSets are actually populated by DataReaders; they are eager-loaded (all data is populated before any is accessed) and are therefore less performant, but they have the added benefit of being serializable (they're essentially just a DTO), which means they're easy to carry across AppDomains or web services.
The difference is sometimes summed up by saying "DataReaders are ideal for ADO.NET ONLINE (implying that it's fine to keep the DB connection open), whereas DataSets are ideal for ADO.NET OFFLINE (where the consumer can't necessarily connect directly to the database)."
A DataAdapter (which fills a DataSet) uses a DataReader to do so.
So a DataReader is always more lightweight and easier to use than a DataAdapter. DataSets and DataTables always carry a large overhead in terms of memory usage. It makes no difference if you are fetching a single row, but it makes a huge difference for bigger result sets.
If you are fetching a fixed number of items, then in MS SQL Server output variables from a stored proc (or a parameterized command) usually perform best (a sketch follows).
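For instance, a hedged sketch of the output-variable approach; the procedure usp_GetOrderTotal and its parameters are assumptions:

    using System.Data;
    using System.Data.SqlClient;

    public decimal GetOrderTotal(int orderId, string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("usp_GetOrderTotal", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@OrderId", orderId);

            SqlParameter total = cmd.Parameters.Add("@Total", SqlDbType.Money);
            total.Direction = ParameterDirection.Output;

            conn.Open();
            cmd.ExecuteNonQuery();       // no reader, no DataSet
            return (decimal)total.Value; // read the output variable
        }
    }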
If you use a reader, you must keep an open connection to your database. Generally a DataReader is used to fetch data for a combo box or DataGrid, but if you want to keep your data in memory after closing the database connection, you should use a DataTable.
If you just want read-only access to the data, then go with a raw DataReader; it's the fastest and most lightweight data access method.
However, if you intend to alter the data and save it back to the database, then I would recommend using a DataAdapter and a DataSet (even a typed DataSet), because the DataSet class takes care of tracking changes, additions, and deletions to the set, which makes saves much easier. Additionally, if you have multiple tables in the DataSet, you can model the referential constraints between them (a sketch of the round trip follows).
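A rough sketch of that round trip; the table and column names are invented, and a SqlCommandBuilder is one way (not the only way) to get the update commands generated:

    using System.Data;
    using System.Data.SqlClient;

    public void RenameFirstCustomer(string connectionString)
    {
        var adapter = new SqlDataAdapter("SELECT Id, Name FROM Customers", connectionString);
        var builder = new SqlCommandBuilder(adapter); // wires INSERT/UPDATE/DELETE commands onto the adapter

        var ds = new DataSet();
        adapter.Fill(ds, "Customers");

        // edit in memory; the DataSet marks the row as Modified
        ds.Tables["Customers"].Rows[0]["Name"] = "New name";

        adapter.Update(ds, "Customers"); // pushes only the tracked changes back
    }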
I just saw this topic: Datatable vs Dataset
but it didn't resolve my doubt. Let me explain better: I was connecting to a database and needed to show the results in a GridView. (I used a Recordset when I worked with VB6 a while ago, and a DataSet is pretty similar to it, so it was much easier to use a DataSet.)
Then a guy told me a DataSet wasn't the best way to do it.
So, should I "learn" DataReader or keep using DataSet? DataTable?
What are the pros/cons ?
That is essentially: "which is better: a bucket or a hose?"
A DataSet is the bucket here; it allows you to carry around a disconnected set of data and work with it - but you will incur the cost of carrying the bucket (so best to keep it to a size you are comfortable with).
A data-reader is the hose: it provides one-way/once-only access to data as it flies past you; you don't have to carry all of the available water at once, but it needs to be connected to the tap/database.
And in the same way that you can fill a bucket with a hose, you can fill the DataSet with the data-reader.
The point I'm trying to make is that they do different things...
I don't personally use DataSets very often - but some people love them. I do, however, make use of data-readers for BLOB access etc.
It depends on your needs. One of the most important differences is that a DataReader retains an open connection to your database until you're done with it, while a DataSet is an in-memory object. If you bind a control to a DataReader, the connection stays open. In addition, a DataReader is a forward-only approach to reading data, and that data can't be manipulated. With a DataSet you can move back and forth and manipulate the data as you see fit.
Some additional features: DataSets can be serialized and represented in XML and, therefore, easily passed around to other tiers. DataReaders can't be serialized.
On the other hand, if you have a large number of rows to read from the database and you hand them off to some process for a business rule, a DataReader may make more sense than loading a DataSet with all the rows, taking up memory and possibly affecting scalability.
Here's a link that's a little dated but still useful: Contrasting the ADO.NET DataReader and DataSet.
Further to Marc's point: you can use a DataSet with no database at all.
You can fill it from an XML file, or just from a program. Fill it with rows from one database, then turn around and write it out to a different database.
A DataSet is a totally in-memory representation of a relational schema. Whether or not you ever use it with an actual relational database is up to you.
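For example (a sketch; every name here is invented), you can build a DataSet entirely in memory and persist it as XML without a database anywhere:

    using System.Data;

    public void BuildCatalogWithoutADatabase()
    {
        var ds = new DataSet("Catalog");
        DataTable books = ds.Tables.Add("Books");
        books.Columns.Add("Title", typeof(string));
        books.Columns.Add("Pages", typeof(int));
        books.Rows.Add("Some Title", 123);

        ds.WriteXml("catalog.xml", XmlWriteMode.WriteSchema); // schema + data

        var roundTrip = new DataSet();
        roundTrip.ReadXml("catalog.xml"); // rebuilt entirely from the file
    }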
Different needs, different solutions.
As you said, a DataSet is most similar to the VB6 Recordset. That is, pull down the data you need, pass it around, and do with it what you will. Oh, and then eventually get rid of it when you're done.
A DataReader is more limited, but it gives MUCH better performance when all you need is to read through the data once. For instance, if you're filling a grid yourself - i.e. pull the data, run through it, populate the grid for each row, then throw out the data - a DataReader is much better than a DataSet. On the other hand, don't even try using a DataReader if you have any intention of updating the data.
So, yes, learn it - but only use it when appropriate. A DataSet gives you much more flexibility.
DataReader vs. DataSet
1) A DataReader is designed for a connection-oriented architecture; a DataSet is designed for a disconnected architecture.
2) A DataReader gives forward-only access to the data; a DataSet gives scrollable navigation.
3) A DataReader is read-only, so you can't change the data it exposes; a DataSet is updatable, so you can change the data it holds and send those changes back to the data source.
4) A DataReader does not provide options like searching and sorting of data; a DataSet does.
To answer your second question: yes, you should learn about DataReaders, if only so you understand how to use them.
I think you're better off in this situation using DataSets, since you're doing data binding and all (I'm thinking CPU cycles vs. human effort).
As to which one will give better performance, it very much depends on your situation. For example, if you're editing the data you're binding and batching up the changes, then you will be better off with DataSets.
A DataReader is used to retrieve read-only, forward-only data from a database. It reads one row at a time and can only move forward; it cannot read backward or jump to a random row, and it cannot update or manipulate data back to the database. It works on a single result set at a time, and since it is a connected architecture, the data is available only as long as the connection exists.
A DataSet is a set of in-memory tables. It is a disconnected architecture: it automatically opens the connection, retrieves the data into memory, and closes the connection when done, fetching all the data at once from the data source. A DataSet can fetch data from multiple tables, can be navigated forward, backward, or randomly, and can update/insert/manipulate data.
Is there a general consensus out there for libraries that call stored procedures? Return DataSets, or use a SqlDataReader to populate custom objects?
Is the cost of serializing your data transport object less than that of a DataSet?
Personally, I use a SqlDataAdapter with DataTables. DataTables have WAY less overhead than DataSets. My entity objects only contain business rules, they aren't used to transport data across tiers.
You may want to think about skipping the Data-Access Library; instead, have your business objects automatically there for you, populated with data, when you need them. NHibernate.
I'd have to agree with Justice - not necessarily about NHibernate (although it's a great option), but I would definitely look at using some sort of ORM like NHibernate, SubSonic, LINQ to SQL, LLBLGen, or any other of the ORMs around.
As Jeremy Miller states: "if you're writing ADO.NET code by hand, you're stealing from your employer or client."
And to that end, I'd have to recommend returning objects as opposed to DataSets or DataTables.
Also, if you're returning DataSets, then unless you strongly type each DataSet you're going to have to write a lot of "lifting" code in your library to get the values out of them. With an ORM and objects, all that heavy lifting is done for you.
Finally, with LINQ in C# you now get much better functionality for working with collections (aggregates, grouping, sorting, filtering, etc.), which takes away whatever advantage DataSets may have had there.
I also use the DataReader, but be aware: if you do, you must be careful to close it (and the connection it is using) as quickly as possible after you are done populating the custom object(s). One gotcha to watch for: when you call ExecuteReader(), make sure you pass the optional CommandBehavior parameter set to CommandBehavior.CloseConnection, or, even if you close the reader, the underlying connection will not be closed and released to the pool until it gets picked up by the GC, which can easily cause you to run out of available connections if you're creating multiple reader objects in a loop.
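The gotcha in code (the query is a placeholder): passing CommandBehavior.CloseConnection ties the connection's lifetime to the reader:

    using System.Data;
    using System.Data.SqlClient;

    public void ReadWidgets(string connectionString)
    {
        var conn = new SqlConnection(connectionString); // deliberately not in a using block;
        conn.Open();                                    // the reader will own the connection
        var cmd = new SqlCommand("SELECT Id FROM Widgets", conn);

        using (var reader = cmd.ExecuteReader(CommandBehavior.CloseConnection))
        {
            while (reader.Read())
            {
                // populate your custom objects here
            }
        } // disposing the reader now also closes and releases the connection to the pool
    }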
To some extent it depends on the purpose of the library, or I should say, the functionality of the library. Since the OOP hype, the "general consensus" has been to first retrieve/fetch the data in the DAL using DataReaders, as they are faster, then load up your objects and close your readers; however, this is not always the case. To keep things simple, some pass back the DataSet so that GridViews can be bound and paging/sorting enabled with minimal code. Remember, simple is best.
In reporting applications, however, I've noticed a liking for DataSets, especially if the data is exposed by web services.
The cost of serialization will depend on the usage of the application and also on experience. An inexperienced developer may return 3,000 to 50,000 rows of unnecessary data in a DataSet. Remember, the DataSet is an animal, but one with a lot of functionality. Use it wisely.
Most ORMs do serialization behind the scenes (I stand to be corrected here), so it could be fair to say that it wouldn't cost that much, but then again it depends on the application.