I have a test bed application available to 3 users.
If all users run the app at the same time, a stored procedure runs and each of the three gets a result set, currently held in an ADO.NET DataTable of approximately 30,000 records, which the app then needs to move into an Excel template for each of them.
The template is an xlsm file which contains some VBA that needs to run after importing the data. This template is saved in the solution.
I'm going to attempt to move the data from the DataTable to a worksheet using Excel Interop.
Has anyone any experience of moving this amount of data from a DataTable into Excel?
#slugster suggested "setting up a datasource from Excel and just run the query using a dirty read" ... is it possible to set up a datasource in Excel linked to a non-materialized datatable?
Will looping through a table that is 30000 rows by 10 columns via xl interop run into problems?
Has anyone any experience of moving this amount of data from a DataTable into Excel?
Not from a DataTable object, but I have moved that amount of data using Excel's built-in ability to import data, yes.
#slugster suggested "setting up a datasource from Excel and just run the query using a dirty read" ... is it possible to set up a datasource in Excel linked to a non-materialized datatable?
I would suggest this as well. To go further, I'd recommend creating a stored procedure and calling that; you should see better performance using a stored procedure. The procedure can collect and prepare the data, then return it to Excel. You may also be able to build a caching mechanism into the procedure. For example, if your data only changes daily, you only rebuild the data in the source table once per day, so only the first user to request the data takes the initial performance hit. And depending on what type of post-processing you are doing in Excel VBA, maybe some of that could be handled in the procedure as well.
The procedure will also help reduce the possibility of locking issues if you add SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED at the top of the procedure, or use the (NOLOCK) hint on the tables you are willing to allow dirty reads from.
Here's a nice article regarding using stored procedures in Excel: http://codebyjoshua.blogspot.com/2012/01/get-data-from-sql-server-stored.html
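For what it's worth, here's a rough sketch of calling such a procedure from the C# side and loading the result into a DataTable; the procedure name, parameter and connection string are just placeholders:

```csharp
using System.Data;
using System.Data.SqlClient;

// Hypothetical procedure name, parameter and connection string, for illustration only.
static DataTable LoadReportData(string connectionString, int userId)
{
    var table = new DataTable();
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("dbo.usp_GetReportData", connection))
    {
        command.CommandType = CommandType.StoredProcedure;
        command.Parameters.AddWithValue("@UserId", userId);

        // The procedure itself would start with
        //   SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
        // so the three concurrent users don't block one another.
        using (var adapter = new SqlDataAdapter(command))
        {
            adapter.Fill(table);   // Fill opens and closes the connection itself
        }
    }
    return table;
}
```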
Will looping through a table that is 30000 rows by 10 columns via xl interop run into problems?
This depends on your definition of "problems." There could be performance implications; however, if you handle as much as you can in the stored procedure, you should be fine. In the world of data, 30,000 rows by 10 columns is really teeny tiny.
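On the interop side specifically, the thing to avoid is writing cell by cell, since every cell access is a separate COM call. Pushing the whole block through a single 2-D array assignment is far faster. A minimal sketch, assuming a reference to Microsoft.Office.Interop.Excel and that the target sheet already exists:

```csharp
using System.Data;
using Excel = Microsoft.Office.Interop.Excel;

static void WriteDataTableToSheet(DataTable data, Excel.Worksheet sheet)
{
    int rows = data.Rows.Count;
    int cols = data.Columns.Count;

    // Copy the DataTable into a plain 2-D array first.
    var values = new object[rows, cols];
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            values[r, c] = data.Rows[r][c];

    // One COM call writes the whole block, instead of ~300,000 per-cell writes.
    Excel.Range start = (Excel.Range)sheet.Cells[2, 1];           // assumes row 1 holds headers
    Excel.Range end = (Excel.Range)sheet.Cells[rows + 1, cols];
    sheet.Range[start, end].Value2 = values;
}
```

After the data is in place, the template's VBA could be triggered with something like sheet.Application.Run("YourMacroName"), where the macro name is whatever your xlsm actually defines.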
Related
So I'm using C# and Visual Studio 2013. I currently have an importer that uses System.Data.OleDb to connect to MS Access databases and System.Data.SqlClient for the Sql Server connection.
What I was originally doing was reading data from MS Access into a DataTable and then storing the data from the DataTable into SQL Server. That was working well until I finally hit a table with some 30-odd columns and almost 1 million rows, and I got an OutOfMemoryException.
So now I'm trying to think of a workaround. I'm thinking of checking the row count of an MS Access table before I attempt to load it into a DataTable; if it has a certain number of rows or more, I plan to write the data to an external file and then import that file.
So what I'm asking is, does anyone know how I can go about this? The only solutions I've seen use Interop, and I've heard that as a practice you don't want to use Interop in code because it's slow and not terribly reliable. I was attempting to export from MS Access to a .csv or .txt file, but if a table doesn't have a primary key, I'm not sure how to iterate over it when it's not already in a DataTable.
If you are doing an import of large data, you could use an OleDbDataReader. When using an OleDbDataReader, memory is not a problem, because you read one record at a time and insert it into the other database as you go.
It may take slightly longer, but it will complete without an OutOfMemoryException.
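A rough sketch of that pattern, streaming straight from Access into SQL Server with SqlBulkCopy so nothing is ever held in a DataTable (connection strings and the table name are placeholders):

```csharp
using System.Data.OleDb;
using System.Data.SqlClient;

static void CopyAccessTableToSqlServer(
    string accessConnectionString, string sqlConnectionString, string tableName)
{
    using (var source = new OleDbConnection(accessConnectionString))
    using (var command = new OleDbCommand("SELECT * FROM [" + tableName + "]", source))
    {
        source.Open();
        using (OleDbDataReader reader = command.ExecuteReader())
        using (var bulkCopy = new SqlBulkCopy(sqlConnectionString))
        {
            bulkCopy.DestinationTableName = tableName;
            bulkCopy.BatchSize = 10000;     // commit in chunks
            bulkCopy.BulkCopyTimeout = 0;   // no timeout for very large tables
            // Rows are pulled from the reader one at a time, so memory use stays flat.
            bulkCopy.WriteToServer(reader);
        }
    }
}
```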
I'm currently working with an import file that has 460,000 rows of data in it. Each row consists of an ID and a quantity (e.g. "1,120"). This information is read from the file and then used to update the corresponding row in the database (e.g. UPDATE item SET quantity = 120 WHERE id = 1).
The problem I'm having, though, is actually being able to run the updates efficiently. Running an individual query for each line is really not going to work (as I've found out the hard way).
I'm not in any way a SQL user and I'm currently learning, but from what I've seen, the web doesn't seem to have any useful results on this.
I was wondering if anybody had experience with updating such a large dataset, and if so, would they be willing to share the methods that they used to achieve this?
460k rows isn't a lot, so you should be okay there.
I'd recommend importing the entire dataset into a temporary table or table variable. To get the solution working, start by creating an actual physical table, which you can DROP or TRUNCATE while you get things working.
Create the table, then import all the data into it. Then, do your table update based on a join to this import table.
Discard the import table when appropriate. Once this is all working how you want it to, you can do the entire thing using a stored procedure, and use a temporary table to handle the imported data while you are working with it.
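As an illustration of that flow from the C# side, assuming a staging table dbo.ItemImport with (id, quantity) columns already exists, it might look something like this:

```csharp
using System.Data;
using System.Data.SqlClient;

static void UpdateQuantities(DataTable importRows, string connectionString)
{
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();

        // 1. Bulk load the 460k (id, quantity) pairs into the staging table.
        using (var bulkCopy = new SqlBulkCopy(connection))
        {
            bulkCopy.DestinationTableName = "dbo.ItemImport";   // staging table: (id INT, quantity INT)
            bulkCopy.WriteToServer(importRows);
        }

        // 2. One set-based update joined to the staging table, then clean up.
        const string sql = @"
            UPDATE i
            SET    i.quantity = s.quantity
            FROM   dbo.item AS i
            JOIN   dbo.ItemImport AS s ON s.id = i.id;

            TRUNCATE TABLE dbo.ItemImport;";
        using (var command = new SqlCommand(sql, connection))
        {
            command.CommandTimeout = 0;   // don't time out on the big update
            command.ExecuteNonQuery();
        }
    }
}
```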
460000 rows is a small dataset. Really small.
Bulk insert into a temporary table, then use a single UPDATE command to update the original data in one pass.
I would like to have fields in my Excel sheets be bound (in both directions) to a data source, in this case an Access DB.
For example, I would like to have an Excel sheet 'select' a particular record, say a customer, and then load information on that customer into the worksheet. Then, any changes made to that worksheet would be pushed back to the data store, making Excel a nice front end to the data.
Can this be done? From what I can tell, the "Get External Data" options in Excel are one-way routes. My development background is heavy in ASP.NET C# and SQL.
Excel is designed to deal with datasets and not so much single records. For what you are trying to do with a single record you would be far better off building a form in Access, but as I don't know your environment or your organisation's limitations, I'll make a suggestion.
Since you've obviously got a bit of SQL and coding skill, check out this post for an option that would work for you: Updating Access Database from Excel Worksheet Data
You can get or put as much data as you want and can join tables too. It's a good basic get-and-then-push setup.
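For the data access itself, whether it ends up driven from VBA or from a C#/VSTO add-in, the round trip is just a select followed by an update. A sketch against the Access OLE DB provider, where the file path, table and column names are all placeholders:

```csharp
using System.Data.OleDb;

// Placeholder connection string - adjust the provider and path for your .accdb/.mdb file.
const string ConnStr =
    @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Data\Customers.accdb;";

// Pull one customer record to show in the sheet...
static string GetCustomerName(int customerId)
{
    using (var connection = new OleDbConnection(ConnStr))
    using (var command = new OleDbCommand(
        "SELECT CompanyName FROM Customers WHERE CustomerId = ?", connection))
    {
        command.Parameters.AddWithValue("?", customerId);   // OleDb parameters are positional
        connection.Open();
        return (string)command.ExecuteScalar();
    }
}

// ...and push an edited value back to the database.
static void SaveCustomerName(int customerId, string newName)
{
    using (var connection = new OleDbConnection(ConnStr))
    using (var command = new OleDbCommand(
        "UPDATE Customers SET CompanyName = ? WHERE CustomerId = ?", connection))
    {
        command.Parameters.AddWithValue("?", newName);
        command.Parameters.AddWithValue("?", customerId);
        connection.Open();
        command.ExecuteNonQuery();
    }
}
```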
I have a table with about 100 columns and about 10000 rows.
Periodically, I will receive an Excel file with similar data, and I then need to update the table.
If new rows exist in Excel, I have to add them to the db.
If old rows have been updated, I need to update the rows in the db.
If some rows have been deleted, I need to delete the row from my main table and add to another table.
I have thought about proceeding as follows:
Fetch all rows from db into a DataSet.
Import all rows from Excel into a DataSet.
Compare these 2 DataSets now using joins and perform the required operations.
I have never worked with data of this magnitude and am worried about the performance. Let me know the ideal way to go about realizing this requirement.
Thanks in advance. :)
Don't worry about the performance with 10k records; you will not notice it...
Maybe a better way to do it is to import the Excel file into a temp table and do the processing with a couple of simple SQL queries. You'll save on dev time and it will potentially perform better...
In my experience, this is quite simple if you choose to do the work in T-SQL, as follows:
You can use OPENROWSET, OPENQUERY, linked servers, DTS and many other things in SQL Server to import the Excel file into a temporary table.
From there you can write some simple queries to do the comparison. If you are using SQL Server 2008, MERGE was made for exactly this kind of problem.
Another thing is that the performance is far different from doing it in C#. You can use the TOP clause to chunk the comparison, and do many other things.
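For illustration, once the Excel rows are sitting in a staging table (via OPENROWSET or a bulk load), one MERGE statement can cover the insert, update and delete cases, and its OUTPUT clause can feed the deleted rows into your archive table. A sketch run from C#, with all table and column names assumed:

```csharp
using System.Data.SqlClient;

// Assumes the Excel rows were already loaded into dbo.ImportStaging (e.g. via
// OPENROWSET or a bulk load); all table and column names here are placeholders.
const string MergeSql = @"
    INSERT INTO dbo.DeletedRowsArchive (Id, Col1, Col2)
    SELECT Id, Col1, Col2
    FROM (
        MERGE dbo.MainTable AS target
        USING dbo.ImportStaging AS source
            ON target.Id = source.Id
        WHEN MATCHED THEN
            UPDATE SET target.Col1 = source.Col1, target.Col2 = source.Col2
        WHEN NOT MATCHED BY TARGET THEN
            INSERT (Id, Col1, Col2) VALUES (source.Id, source.Col1, source.Col2)
        WHEN NOT MATCHED BY SOURCE THEN
            DELETE
        -- OUTPUT exposes every affected row so the deletes can be archived
        OUTPUT $action AS MergeAction, deleted.Id, deleted.Col1, deleted.Col2
    ) AS changes
    WHERE changes.MergeAction = 'DELETE';";

static void SyncFromStaging(string connectionString)
{
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(MergeSql, connection))
    {
        connection.Open();
        command.ExecuteNonQuery();
    }
}
```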
Hope it helps.
Cheers
I'm currently working on a project where we have a large data warehouse which imports several GB of data on a daily basis from a number of different sources. We have a lot of files with different formats and structures all being imported into a couple of base tables which we then transpose/pivot through stored procs. This part works fine. The initial import however, is awfully slow.
We can't use SSIS File Connection Managers as the columns can be totally different from file to file so we have a custom object model in C# which transposes rows and columns of data into two base tables; one for column names, and another for the actual data in each cell, which is related to a record in the attribute table.
Example data files and DB table layouts: (not reproduced here)
The SQL insert is performed currently by looping through all the data rows and appending the values to a SQL string. This constructs a large dynamic string which is then executed at the end via SqlCommand.
The problem is that even a 1MB file takes about a minute to run in, so when it comes to large files (200MB, etc.) it takes hours to process a single file. I'm looking for suggestions on other ways to approach the insert that will improve performance and speed up the process.
There are a few things I can do with the structure of the loop to cut down on the string size and the number of SQL commands in the string, but ideally I'm looking for a cleaner, more robust approach. Apologies if I haven't explained myself well; I'll try to provide more detail if required.
Any ideas on how to speed up this process?
The dynamic string is going to be SLOW. Each SQLCommand is a separate call to the database. You are much better off streaming the output as a bulk insertion operation.
I understand that all your files are different formats, so you are having to parse and unpivot in code to get it into your EAV database form.
However, because the output is in a consistent schema, you would be better off either using separate connection managers and the built-in Unpivot transformation, or using a script task that adds multiple rows to the data flow in the common output (just like you are currently doing when building your SQL INSERT...INSERT...INSERT for each input row), and then letting it all stream into a destination.
That is: read your data in the script source and assign the FileID, RowId, AttributeName and Value to multiple output rows (so the unpivot is still done in code, but instead of generating a varying number of INSERT statements, you are just adding a varying number of rows to the data flow for each input row).
Then pass that through a lookup to get from AttributeName to AttributeID (erroring the rows with invalid attributes).
Stream straight into an OLEDB destination, and it should be a lot quicker.
One thought: are you repeatedly going back to the database to find the appropriate attribute value? If so, switching those repeated queries to lookups against a recordset that you keep on the client side will speed things up enormously.
This is something I have done before, with four reference tables involved. Creating a local recordset and filtering it as appropriate sped the process up from 2.5 hours to about 3 minutes.
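In C# terms that could be as simple as loading the attribute table into a dictionary once per run and resolving every row against it in memory, rather than issuing a query per row (table and column names assumed):

```csharp
using System;
using System.Collections.Generic;
using System.Data.SqlClient;

// Load the attribute lookup once per import run, then reuse it for every row.
static Dictionary<string, int> LoadAttributeLookup(string connectionString)
{
    var lookup = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(
        "SELECT AttributeName, AttributeId FROM dbo.Attribute", connection))
    {
        connection.Open();
        using (var reader = command.ExecuteReader())
        {
            while (reader.Read())
                lookup[reader.GetString(0)] = reader.GetInt32(1);
        }
    }
    return lookup;
}

// While parsing a file: one dictionary probe per cell instead of one query per cell.
// int attributeId = attributeLookup[columnName];
```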
Why not store whatever reference tables are needed within each database and perform all lookups on the database end? Or it may even be better to pass a table type into each database where keys are needed, store all reference data in one central database and then perform your lookups there.
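If you go the table-type route, the keys can be shipped to the server in a single round trip as a table-valued parameter. A sketch, assuming a user-defined table type such as dbo.KeyList already exists on that central database:

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

// Assumes a table type exists on the server, e.g.
//   CREATE TYPE dbo.KeyList AS TABLE (AttributeName NVARCHAR(100));
static DataTable ResolveAttributeIds(string connectionString, IEnumerable<string> names)
{
    // Shape the keys as a DataTable matching the table type.
    var keys = new DataTable();
    keys.Columns.Add("AttributeName", typeof(string));
    foreach (string name in names)
        keys.Rows.Add(name);

    var results = new DataTable();
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(
        @"SELECT a.AttributeName, a.AttributeId
          FROM dbo.Attribute AS a
          JOIN @Keys AS k ON k.AttributeName = a.AttributeName", connection))
    {
        var parameter = command.Parameters.AddWithValue("@Keys", keys);
        parameter.SqlDbType = SqlDbType.Structured;
        parameter.TypeName = "dbo.KeyList";     // the user-defined table type

        using (var adapter = new SqlDataAdapter(command))
            adapter.Fill(results);              // one round trip for all the keys
    }
    return results;
}
```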