I am creating a C# plugin for Excel and have noticed that there seem to be many ways to import data from a database into a sheet in Excel.
I am targeting Excel 2010 and wondered whether anyone has already done this research and knows what the quickest way to load the data is?
I can already guess that anything crossing the COM boundary is going to be slow, so I have to minimize that. I can stick all the data into one 2D array and load it that way. Loading 0.5 million rows with 10 columns takes around 5.5 seconds (assuming I already have all the data in the array). I don't know whether that is good or bad.
...but like I said, there are a lot of ways to get the data in, and I would like to use the fastest.
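For reference, the single-2D-array approach mentioned above looks roughly like this (a sketch only; the range coordinates are placeholders, and the interop assignment is shown as a comment since it needs a live Excel instance and a reference to Microsoft.Office.Interop.Excel):

```csharp
using System;

class BulkWrite
{
    static void Main()
    {
        // Scaled down from the 500,000 rows in the question.
        const int rows = 1000, cols = 10;

        // Pack everything into one object[,] so only a single call
        // crosses the COM boundary, instead of one call per cell.
        var data = new object[rows, cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                data[r, c] = r * cols + c;

        // With a live Excel instance (Microsoft.Office.Interop.Excel):
        //   Excel.Range target = sheet.Range[sheet.Cells[1, 1],
        //                                    sheet.Cells[rows, cols]];
        //   target.Value2 = data;   // one COM call for the whole block
        Console.WriteLine(data.Length);
    }
}
```

The single `Value2` assignment is what keeps the COM overhead constant regardless of row count; the time is then dominated by Excel itself ingesting the array.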
Have you ever tried SqlBulkCopy?
Create a database query in the Excel sheet, specifying a connection string, target range and query string. Have the query executed from within Excel.
See http://www.dicks-clicks.com/excel/ExternalData3.htm for an example.
I would like to have fields in my Excel sheets be bound (in both directions) to a data source, in this case an Access DB.
For example, I would like to have an Excel sheet 'select' a particular record, say a customer, and then load information on that customer into the worksheet. Then, any changes made to that worksheet would be pushed back to the data store, making Excel a nice front end to the data.
Can this be done? From what I can tell, the "Get External Data" options in Excel are one-way routes. My development background is heavy in ASP.NET, C#, and SQL.
Excel is designed to deal with datasets, not so much single records. For what you are trying to do with a single record, you would be far better off building a form in Access, but as I don't know your environment/organisation's limitations, I'll make a suggestion.
Since you've obviously got a bit of SQL and coding skill, check out this post for an option that would work for you - Updating Access Database from Excel Worksheet Data
You can get or put as much data as you want, and can join tables too. It's a good basic get-and-then-push setup.
I have a test bed application available to 3 users.
If all users run the app at the same time, a stored procedure runs and each of them gets a result set (currently an ADO.NET DataTable of approximately 30,000 records) which the app then needs to move into an Excel template for each of them.
The template is an xlsm file which contains some VBA that needs to run after importing the data. This template is saved in the solution.
I'm going to attempt to move the data from the DataTable to a worksheet using Excel interop.
Has anyone any experience of moving this amount of data from Datatable into Excel?
#slugster suggested "setting up a datasource from Excel and just run the query using a dirty read" ... is it possible to set up a datasource in Excel linked to a non-materialized datatable?
Will looping through a table that is 30,000 rows by 10 columns via Excel interop run into problems?
Has anyone any experience of moving this amount of data from a DataTable into Excel?
Not from a DataTable object, but that amount of data using Excel's built in ability to import data, yes.
#slugster suggested "setting up a datasource from Excel and just run the query using a dirty read" ... is it possible to set up a datasource in Excel linked to a non-materialized datatable?
I would suggest this as well. To go further, I'd suggest creating a stored procedure and then calling that. You should see better performance using a stored procedure: the procedure can collect and prepare the data, then return it to Excel.

You may also be able to build a caching mechanism into the procedure. For example, if your data only changes daily, you only rebuild the data in the source table once per day, so only the first user to request the data takes the initial performance hit. And depending on what type of post-processing you are doing in Excel VBA, maybe that could be handled in the procedure as well.

The procedure will also help reduce the possibility of locking issues if you add SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED to the top of the procedure, or use the (NOLOCK) hint on the tables you are willing to allow dirty reads from.
Here's a nice article regarding using stored procedures in Excel: http://codebyjoshua.blogspot.com/2012/01/get-data-from-sql-server-stored.html
Will looping through a table that is 30000 rows by 10 columns via xl
interop run into problems?
This depends on your definition of "problems." I could see possible performance implications; however, if you handle as much as you can in the stored procedure, you should be fine. In the world of data, that's really teeny tiny.
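On the interop side, looping cell by cell is where the problems would come from; converting the DataTable to an `object[,]` and writing it in one assignment avoids that. A minimal sketch (the worksheet/range part is commented out since it needs a live Excel instance):

```csharp
using System;
using System.Data;

class Program
{
    // Copy a DataTable's contents into a rectangular object array,
    // the shape Excel interop expects for a block write.
    public static object[,] ToArray(DataTable table)
    {
        var result = new object[table.Rows.Count, table.Columns.Count];
        for (int r = 0; r < table.Rows.Count; r++)
            for (int c = 0; c < table.Columns.Count; c++)
                result[r, c] = table.Rows[r][c];
        return result;
    }

    static void Main()
    {
        // Tiny stand-in for the 30,000-row result set.
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add(1, "alpha");
        table.Rows.Add(2, "beta");

        object[,] data = ToArray(table);

        // With interop, a single assignment then writes the whole block:
        //   sheet.Range[sheet.Cells[1, 1],
        //               sheet.Cells[data.GetLength(0), data.GetLength(1)]]
        //        .Value2 = data;
        Console.WriteLine($"{data.GetLength(0)}x{data.GetLength(1)}");
    }
}
```

At 30,000 × 10 the array copy itself is negligible; the per-cell interop loop is the version that would take minutes.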
I have a SQL proc that returns 20,000+ records and want to get this data into a CSV for a SQL 2005 bulk load operation.
I think using a DataSet is overkill, since I only need forward-only read access to the data.
I have a DataReader now, but I don't think iterating the DataReader is a good idea, because it will lock the Oracle DB I am getting the 20,000 records from for however long it takes to finish.
Logically, I am thinking of creating a disconnected snapshot of the data, maybe in a DataTable, and using that to generate my CSV file.
I don't often develop such ETL apps, so I wanted to know what the gold standard is for this type of operation.
Thoughts?
Also, allow me to mention that this needs to be a console app, since corporate rules won't allow linked servers or anything cool - so that means SSIS is out.
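The disconnected-snapshot idea above can be sketched like this: stream any `IDataReader` straight to CSV, one row at a time. Here a `DataTableReader` over an in-memory table stands in for the real Oracle reader (the quoting helper is minimal and the table contents are made up for illustration):

```csharp
using System;
using System.Data;
using System.IO;
using System.Text;

class CsvExport
{
    // Stream any IDataReader to CSV one row at a time, so the full
    // result set is never held in memory.
    public static void WriteCsv(IDataReader reader, TextWriter writer)
    {
        var fields = new string[reader.FieldCount];
        for (int i = 0; i < reader.FieldCount; i++)
            fields[i] = Quote(reader.GetName(i));
        writer.WriteLine(string.Join(",", fields));

        while (reader.Read())
        {
            for (int i = 0; i < reader.FieldCount; i++)
                fields[i] = Quote(Convert.ToString(reader.GetValue(i)));
            writer.WriteLine(string.Join(",", fields));
        }
    }

    // Minimal CSV quoting: wrap values containing a comma, quote, or
    // newline in quotes, doubling any embedded quotes.
    public static string Quote(string value)
    {
        if (value.IndexOfAny(new[] { ',', '"', '\n', '\r' }) < 0)
            return value;
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }

    static void Main()
    {
        // Stand-in for the real Oracle reader: a DataTableReader over
        // an in-memory snapshot.
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add(1, "plain");
        table.Rows.Add(2, "needs,quoting");

        var sb = new StringBuilder();
        using (IDataReader reader = table.CreateDataReader())
            WriteCsv(reader, new StringWriter(sb));
        Console.Write(sb.ToString());
    }
}
```

Against the real database you would pass the reader from `command.ExecuteReader()` instead; snapshotting into a DataTable first (as the question suggests) frees the connection at the cost of holding all 20,000 rows in memory.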
Since you are worried about iterating the DataReader yourself, I would recommend the SqlBulkCopy class.
It lets you load data into a SQL Server database from any source that can be read with an IDataReader instance.
That might solve your potential locking issue.
I just saw this topic: Datatable vs Dataset
But it didn't resolve my doubt. Let me explain better: I was connecting to a database and needed to show the results in a GridView. (I used a Recordset when I worked with VB6 a while ago, and DataSet is pretty similar to it, so it was much easier to use a DataSet.)
Then someone told me a DataSet wasn't the best way to do it...
So, should I learn DataReader, or keep using DataSet? Or DataTable?
What are the pros/cons ?
That is essentially: "which is better: a bucket or a hose?"
A DataSet is the bucket here; it allows you to carry around a disconnected set of data and work with it - but you will incur the cost of carrying the bucket (so best to keep it to a size you are comfortable with).
A data-reader is the hose: it provides one-way/once-only access to data as it flies past you; you don't have to carry all of the available water at once, but it needs to be connected to the tap/database.
And in the same way that you can fill a bucket with a hose, you can fill the DataSet with the data-reader.
The point I'm trying to make is that they do different things...
I don't personally use DataSet very often - but some people love them. I do, however, make use of data-readers for BLOB access etc.
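The "fill the bucket with the hose" point above can be sketched in a few lines; here an in-memory table stands in for the database, but against a real one the "hose" would be `command.ExecuteReader()`:

```csharp
using System;
using System.Data;

class BucketFromHose
{
    static void Main()
    {
        // The "tap": an in-memory table stands in for the database.
        var source = new DataTable("Customers");
        source.Columns.Add("Id", typeof(int));
        source.Columns.Add("Name", typeof(string));
        source.Rows.Add(1, "Ada");
        source.Rows.Add(2, "Grace");

        // The "hose": a forward-only reader over the source.
        // Against a real database this would be command.ExecuteReader().
        var bucket = new DataTable();
        using (IDataReader reader = source.CreateDataReader())
        {
            // The "bucket": Load drains the reader into a
            // disconnected table you can carry around.
            bucket.Load(reader);
        }

        Console.WriteLine(bucket.Rows.Count);       // 2
        Console.WriteLine(bucket.Rows[1]["Name"]);  // Grace
    }
}
```

Once `Load` returns, the reader (and the connection behind it) can be closed; the bucket lives on independently.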
It depends on your needs. One of the most important differences is that a DataReader will retain an open connection to your database until you're done with it, while a DataSet is an in-memory object. If you bind a control to a DataReader, the connection is still open. In addition, a DataReader is a forward-only approach to reading data that can't be manipulated. With a DataSet you can move back and forth and manipulate the data as you see fit.
Some additional features: DataSets can be serialized and represented in XML and, therefore, easily passed around to other tiers. DataReaders can't be serialized.
On the other hand, if you have a large number of rows to read from the database and hand off to some process for a business rule, a DataReader may make more sense than loading a DataSet with all the rows, taking up memory and possibly affecting scalability.
Here's a link that's a little dated but still useful: Contrasting the ADO.NET DataReader and DataSet.
Further to Marc's point: you can use a DataSet with no database at all.
You can fill it from an XML file, or just from a program. Fill it with rows from one database, then turn around and write it out to a different database.
A DataSet is a totally in-memory representation of a relational schema. Whether or not you ever use it with an actual relational database is up to you.
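A minimal sketch of a database-free DataSet: build it entirely in code, serialize it to XML, and read it back (the table and column names here are made up for illustration):

```csharp
using System;
using System.Data;
using System.IO;

class NoDatabase
{
    static void Main()
    {
        // Build a DataSet entirely in memory -- no database involved.
        var ds = new DataSet("Shop");
        var orders = ds.Tables.Add("Orders");
        orders.Columns.Add("Id", typeof(int));
        orders.Columns.Add("Item", typeof(string));
        orders.Rows.Add(1, "widget");
        orders.Rows.Add(2, "gadget");

        // Serialize schema + data to XML...
        var sw = new StringWriter();
        ds.WriteXml(sw, XmlWriteMode.WriteSchema);

        // ...and read it back into a fresh DataSet.
        var copy = new DataSet();
        copy.ReadXml(new StringReader(sw.ToString()));

        Console.WriteLine(copy.Tables["Orders"].Rows.Count);       // 2
        Console.WriteLine(copy.Tables["Orders"].Rows[1]["Item"]);  // gadget
    }
}
```

The same `WriteXml`/`ReadXml` pair is what makes DataSets easy to pass between tiers, as mentioned earlier in the thread.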
Different needs, different solutions.
As you said, a DataSet is most similar to the VB6 Recordset. That is, pull down the data you need, pass it around, do with it what you will. Oh, and then eventually get rid of it when you're done.
A DataReader is more limited, but it gives MUCH better performance when all you need is to read through the data once. For instance, if you're filling a grid yourself - i.e. pull the data, run through it, populate the grid for each row, then throw out the data - a DataReader is much better than a DataSet. On the other hand, don't even try using a DataReader if you have any intention of updating the data...
So, yes, learn it - but only use it when appropriate. DataSet gives you much more flexibility.
DataReader vs DataSet
1) - DataReader is designed for a connection-oriented architecture
- DataSet is designed for a disconnected architecture
2) - DataReader gives forward-only access to the data
- DataSet gives scrollable navigation through the data
3) - DataReader is read-only; we can't make changes to the data it exposes
- DataSet is updatable; we can make changes to the data it holds and send those changes back to the data source
4) - DataReader does not provide options such as searching and sorting of data
- DataSet provides options such as searching and sorting of data
To answer your second question - Yes, you should learn about DataReaders. If anything, so you understand how to use them.
I think you're better off in this situation using DataSets - since you're doing data binding and all (I'm thinking CPU cycles vs human effort).
As to which one will give better performance: it very much depends on your situation. For example, if you're editing the data you're binding and batching up the changes, then you will be better off with DataSets.
DataReader is used to retrieve read-only, forward-only data from a database. It reads only one row at a time and can only move forward, not backward or randomly. A DataReader cannot update/manipulate data back to the database, and it retrieves data from a single table. As it is a connected architecture, the data is available only as long as the connection exists.
DataSet is a set of in-memory tables. It is a disconnected architecture: it automatically opens the connection, retrieves the data into memory, and closes the connection when done. It fetches all the data at once from the data source into memory. A DataSet can fetch data from multiple tables, navigate back/forth/randomly, and can update/insert/manipulate data.