How can I improve the performance of exporting a database whose tables have one-to-many relationships (and in some cases many-to-many relationships) into a single Excel file?
Right now, I get all the data from the database and process it into a table using a few for loops, then I change the header of the HTML file so that it downloads as an Excel file. But it takes a while for the number of records I have (about 300).
I was just wondering if there is a faster way to improve performance.
Thanks
It sounds like you're loading each table into memory with your C# code, and then building a flat table by looping through the data. A vastly simpler and faster way to do that would be to use a SQL query with a few JOINs in it:
http://www.w3schools.com/sql/sql_join.asp
http://en.wikipedia.org/wiki/Join_(SQL)
Also, I get the impression that you're rendering the resulting flat table to HTML, and then saving that as an Excel file. There are several ways to create that Excel (or CSV) file directly, without having to turn it into an HTML table first.
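For example, here's a rough sketch of both ideas combined, flattening with a JOIN and writing CSV directly. The Orders/OrderItems tables, column names and connection string are made-up placeholders, not your actual schema:

    using System.Data.SqlClient;
    using System.IO;
    using System.Text;

    // Placeholder connection string -- substitute your own.
    string connectionString = @"Server=.;Database=MyDb;Integrated Security=true;";

    // Hypothetical one-to-many schema: Orders -> OrderItems. The JOIN flattens
    // the relationship on the server instead of nested for loops in C#.
    const string query = @"
        SELECT o.OrderId, o.CustomerName, i.ProductName, i.Quantity
        FROM Orders o
        JOIN OrderItems i ON i.OrderId = o.OrderId
        ORDER BY o.OrderId";

    var csv = new StringBuilder("OrderId,CustomerName,ProductName,Quantity\r\n");
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(query, connection))
    {
        connection.Open();
        using (var reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                // Real code should escape commas/quotes inside the values.
                csv.AppendFormat("{0},{1},{2},{3}\r\n",
                    reader["OrderId"], reader["CustomerName"],
                    reader["ProductName"], reader["Quantity"]);
            }
        }
    }

    // Excel opens .csv files directly, so no HTML step is needed.
    File.WriteAllText("export.csv", csv.ToString());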
Related
I have an Excel file and I am querying it from my C# program with SQL using OleDb.
But I am facing a problem: my file has about 300K rows and querying takes too long. I have googled this issue and tried some libraries such as SpreadsheetLight and EPPlus, but they don't have a query feature.
Can anyone advise me on the fastest way to query my file?
Thanks in advance.
I have worked with 400-800K row Excel files. The task was to read all the rows and insert them into a SQL Server DB. In my experience, OleDb was not able to process such big files in a timely manner, so we had to fall back to importing the Excel file directly into the DB using SQL Server facilities, e.g. OPENROWSET.
Even smaller files, around 260K rows, took approximately an hour to import row by row into a DB table with OleDb on Core 2 Duo generation hardware.
So, in your case you can consider the following:
1. Try reading the Excel file in chunks using a ranged SELECT:
    OleDbCommand command = new OleDbCommand(
        "SELECT [" + date + "] FROM [Sheet1$A1:Z10000] " +
        "WHERE [" + key + "] = " + array[i].ToString(), connection);
Note that [Sheet1$A1:Z10000] tells OleDb to process only the first 10K rows of columns A to Z of the sheet instead of the whole sheet. You can use this approach if, for example, your Excel file is sorted and you know you don't need to check ALL rows but only, say, this year's. Or you can change Z10000 dynamically to read the next chunk of the file and combine the result with the previous one.
2. Get all your Excel file contents directly into the DB using a direct DB import, such as the mentioned OPENROWSET in MS SQL Server, and then run your search queries against the RDBMS instead of the Excel file.
I would personally suggest option #2. Comment on whether you can use a DB at all and which RDBMS product/version is available to you, if any.
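For what it's worth, here is a minimal sketch of option #2 run from C#. It assumes SQL Server with "Ad Hoc Distributed Queries" enabled and the ACE OLE DB provider installed; the file path, sheet name, table name and connection string are placeholders:

    using System.Data.SqlClient;

    // Placeholder connection string -- point it at your own server/database.
    string connectionString = @"Server=.\SQLEXPRESS;Database=MyDb;Integrated Security=true;";

    // Pull the whole sheet into a real table once; afterwards every search
    // query runs against the (indexable) table instead of the Excel file.
    const string importSql = @"
        SELECT * INTO dbo.ImportedRows
        FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
                        'Excel 12.0;Database=C:\data\MyFile.xlsx;HDR=YES',
                        'SELECT * FROM [Sheet1$]');";

    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(importSql, connection))
    {
        connection.Open();
        command.ExecuteNonQuery();
        // e.g. SELECT ... FROM dbo.ImportedRows WHERE [SomeColumn] = @value
    }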
Hope this helps!
I have a table with about 100 columns and about 10000 rows.
Periodically, I will receive an Excel file with similar data and I now need to update the table.
If new rows exist in Excel, I have to add them to the db.
If old rows have been updated, I need to update the rows in the db.
If some rows have been deleted, I need to delete the row from my main table and add it to another table.
I have thought about proceeding as follows:
Fetch all rows from db into a DataSet.
Import all rows from Excel into a DataSet.
Compare these 2 DataSets now using joins and perform the required operations.
I have never worked with data of this magnitude and am worried about the performance. Let me know the ideal way to go about realizing this requirement.
Thanks in advance. :)
Don't worry about performance with 10k records; you will not notice it.
A better way might be to import the Excel file into a temp table and do the processing with a couple of simple SQL queries. You'll save on dev time and it will potentially perform better.
In my experience, it's quite simple if you choose to do the work in T-SQL, as follows:
You can use "OPENROWSET", "OPENQUERY", linked servers, DTS and many other things in SQL Server to import the Excel file into a temporary table.
You can then write some simple queries to do the comparison. If you are using SQL Server 2008, "MERGE" was made for exactly this scenario (see the sketch below).
Another thing is that the performance is far better than doing the same work in C#. You can use the "TOP" clause to chunk the comparison, among other things.
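For illustration, a minimal sketch of the staging-table-plus-MERGE idea run from C#. The table and column names are hypothetical, and the staging table is assumed to have been loaded already (via OPENROWSET, SSIS, bulk copy, etc.):

    using System.Data.SqlClient;

    // Placeholder connection string and hypothetical table/column names.
    string connectionString = @"Server=.;Database=MyDb;Integrated Security=true;";

    const string mergeSql = @"
        MERGE dbo.MainTable AS target
        USING dbo.Staging AS source
            ON target.Id = source.Id
        WHEN MATCHED THEN                -- row exists in both: update it
            UPDATE SET target.Col1 = source.Col1, target.Col2 = source.Col2
        WHEN NOT MATCHED BY TARGET THEN  -- row only in Excel: insert it
            INSERT (Id, Col1, Col2) VALUES (source.Id, source.Col1, source.Col2)
        WHEN NOT MATCHED BY SOURCE THEN  -- row only in the DB: delete it
            DELETE;
        -- an OUTPUT clause on the MERGE can additionally capture the deleted
        -- rows into the separate archive table mentioned in the question";

    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(mergeSql, connection))
    {
        connection.Open();
        command.ExecuteNonQuery();
    }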
Hope it helps.
Cheers
I use the C# SqlBulkCopy class to load a big XML file into SQL Server. I've implemented IDataReader, which loops through the XML and gets the values.
The file contains a lot of tables, so I have to call the SqlBulkCopy.WriteToServer method once for each table in the source XML file. Each time, the DataReader loops through the whole file, which takes a lot of time.
How can I improve the performance of my app? Is there a better way to do what I want?
Here is a plan of my program:
Loop through the source file - determine the tables and their columns (and datatypes).
Create the tables on SQL Server.
Load the data into SQL Server by looping through the source file and getting the values for each table I've determined, one by one.
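One possible way to avoid re-reading the file, sketched under some assumptions (a flat <row table="..."> layout, string columns, and a placeholder connection string): buffer the rows per table during a single XmlReader pass, then call WriteToServer once per table. For files too big to buffer, the same idea works with per-table batches flushed periodically.

    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;
    using System.Xml;

    // Placeholder connection string.
    string connectionString = @"Server=.;Database=MyDb;Integrated Security=true;";

    // Buffer rows per target table, keyed by table name.
    var buffers = new Dictionary<string, DataTable>();

    // Single pass over the XML with XmlReader.
    // Hypothetical layout: <row table="Customers" Name="..." City="..." />
    using (var xml = XmlReader.Create("source.xml"))
    {
        while (xml.Read())
        {
            if (xml.NodeType != XmlNodeType.Element || xml.Name != "row")
                continue;

            string tableName = xml.GetAttribute("table");
            if (!buffers.TryGetValue(tableName, out DataTable dt))
                buffers[tableName] = dt = new DataTable(tableName);

            // Collect the attribute values, creating columns on first sight
            // (or build the columns up front in the schema pass from step 1).
            var values = new Dictionary<string, string>();
            for (int i = 0; i < xml.AttributeCount; i++)
            {
                xml.MoveToAttribute(i);
                if (xml.Name == "table") continue;
                if (!dt.Columns.Contains(xml.Name))
                    dt.Columns.Add(xml.Name, typeof(string));
                values[xml.Name] = xml.Value;
            }
            xml.MoveToElement();

            DataRow row = dt.NewRow();
            foreach (var kv in values)
                row[kv.Key] = kv.Value;
            dt.Rows.Add(row);
        }
    }

    // Still one WriteToServer call per table, but the file is only read once.
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        foreach (DataTable dt in buffers.Values)
            using (var bulk = new SqlBulkCopy(connection) { DestinationTableName = dt.TableName })
                bulk.WriteToServer(dt);
    }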
I'm currently working on a project where we have a large data warehouse which imports several GB of data on a daily basis from a number of different sources. We have a lot of files with different formats and structures all being imported into a couple of base tables which we then transpose/pivot through stored procs. This part works fine. The initial import however, is awfully slow.
We can't use SSIS File Connection Managers as the columns can be totally different from file to file, so we have a custom object model in C# which transposes rows and columns of data into two base tables: one for column names, and another for the actual data in each cell, which is related to a record in the attribute table.
Example - Data Files:
Example - DB tables:
Currently, the SQL insert is performed by looping through all the data rows and appending the values to a SQL string. This constructs a large dynamic string which is then executed at the end via SqlCommand.
The problem is, even a 1MB file takes about a minute to run, so when it comes to large files (200MB etc.) it takes hours to process a single file. I'm looking for suggestions for other ways to approach the insert that will improve performance and speed up the process.
There are a few things I can do with the structure of the loop to cut down on the string size and the number of SQL commands in the string, but ideally I'm looking for a cleaner, more robust approach. Apologies if I haven't explained myself well; I'll try to provide more detail if required.
Any ideas on how to speed up this process?
The dynamic string is going to be SLOW. Each SqlCommand is a separate call to the database. You are much better off streaming the output as a bulk insertion operation.
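For example, a rough sketch of swapping the concatenated INSERT string for SqlBulkCopy; the EAV table and column names below are guesses for illustration only:

    using System.Data;
    using System.Data.SqlClient;

    // Placeholder connection string; table/column names are guesses at the
    // "data in each cell" table described above.
    string connectionString = @"Server=.;Database=Warehouse;Integrated Security=true;";

    // Build the unpivoted rows in memory (or in batches), then stream them to
    // the server in one bulk operation instead of thousands of INSERT statements.
    var facts = new DataTable();
    facts.Columns.Add("FileId", typeof(int));
    facts.Columns.Add("RowId", typeof(int));
    facts.Columns.Add("AttributeId", typeof(int));
    facts.Columns.Add("Value", typeof(string));

    // ... inside the existing loop that currently appends to the SQL string:
    // facts.Rows.Add(fileId, rowId, attributeId, value);

    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        using (var bulk = new SqlBulkCopy(connection))
        {
            bulk.DestinationTableName = "dbo.DataValues";  // hypothetical table name
            bulk.BatchSize = 10000;                        // send rows in chunks
            bulk.WriteToServer(facts);
        }
    }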
I understand that all your files are in different formats, so you are having to parse and unpivot in code to get the data into your EAV database form.
However, because the output is in a consistent schema, you would be better off either using separate connection managers and the built-in unpivot operator, or, in a script task, adding multiple rows to the data flow in the common output (just like you are currently doing when building your SQL INSERT...INSERT...INSERT for each input row) and then letting it all stream into a destination.
i.e. read your data in the script source and assign the FileID, RowId, AttributeName and Value to multiple rows (so this is doing the unpivot in code, but instead of generating a varying number of inserts, you are just inserting a varying number of rows into the data flow based on the input row).
Then pass that through a lookup to get from AttributeName to AttributeID (erroring the rows with invalid attributes).
Stream straight into an OLEDB destination, and it should be a lot quicker.
One thought - are you repeatedly going back to the database to find the appropriate attribute value? If so, switching the repeated queries to a query against a recordset that you keep on the client side will speed things up enormously.
This is something I have done before, with four reference tables involved. Creating a local recordset and filtering it as appropriate sped a process up from 2.5 hours to about 3 minutes.
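A minimal sketch of that client-side caching idea in C#, assuming a hypothetical Attributes reference table and a placeholder connection string:

    using System.Collections.Generic;
    using System.Data.SqlClient;

    // Placeholder connection string and hypothetical reference table.
    string connectionString = @"Server=.;Database=Warehouse;Integrated Security=true;";

    // Load the reference table once; every later lookup is an in-memory
    // dictionary hit instead of a round trip to the database.
    var attributeIds = new Dictionary<string, int>();

    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(
        "SELECT AttributeName, AttributeId FROM dbo.Attributes", connection))
    {
        connection.Open();
        using (var reader = command.ExecuteReader())
            while (reader.Read())
                attributeIds[reader.GetString(0)] = reader.GetInt32(1);
    }

    // Later, inside the per-row loop:
    // int id = attributeIds[attributeName];   // no database query needed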
Why not store whatever reference tables are needed within each database and perform all lookups on the database end? Or it may even be better to pass a table type into each database where keys are needed, store all reference data in one central database and then perform your lookups there.
I'm building an application to import data into a SQL Server 2008 Express database.
This database is being used by an application that is currently in production.
The data that needs to be imported comes from various sources, mostly Excel sheets and XML files.
The database has the following tables:
tools
powertools
strikingtools
owners
Each row, or XML tag, in the source files has information about one tool:
name, tooltype, weight, wattage, owner, material, etc...
Each of these rows has the name of the tool's owner; this name has to be inserted into the owners table, but only if it isn't already there.
For each of these rows a new row needs to be inserted in the tools table.
The tools table has an owner_id field with a foreign key to the owners table, which needs to be set to the primary key of the corresponding row in the owners table.
Depending on the tooltype a new row must be created in either the powertools table or the strikingtools table. These 2 tables also have a tool_id field with a foreign key to the tools table that must be filled in.
The tools table has a tool_owner_id field with a foreign key to the owners table that must be filled in.
If any of the rows in the import file fails to import for some reason, the entire import needs to be rolled back.
Currently I'm using a DataSet to do this, but for some large files (over 200,000 tools) this requires quite a lot of memory. Can anybody think of a better approach for this?
There are two main issues to be solved:
Parsing a large XML document efficiently.
Adding a large amount of records to the database.
XML Parsing
Although the DataSet approach works, the whole XML document is loaded into memory. To improve the efficiency of working with large XML documents, you might want to look at the XmlReader class. The API is slightly more difficult to use than what DataSet provides, but you get the benefit of not loading the whole DOM into memory at once.
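For example, a minimal XmlReader sketch, assuming the source uses <tool> elements with child elements named after the fields listed above (the element names are guesses):

    using System.Xml;
    using System.Xml.Linq;

    using (XmlReader reader = XmlReader.Create("tools.xml"))
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "tool")
            {
                // Materialise only the current <tool> subtree; ReadFrom also
                // advances the reader past it, so memory use stays flat.
                var tool = (XElement)XNode.ReadFrom(reader);
                string name  = (string)tool.Element("name");
                string owner = (string)tool.Element("owner");
                string type  = (string)tool.Element("tooltype");
                // ... look up / insert the owner, then insert the tool row ...
            }
            else
            {
                reader.Read();
            }
        }
    }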
Inserting records into the DB
To satisfy your atomicity requirement you could use a single database transaction, but the large number of records you are dealing with is not ideal for a single transaction. You will most likely run into issues like:
Database having to deal with a large number of locks
Database locks that might escalate from row locks to page locks and even table locks.
Concurrent use of the database will be severely affected during the import.
I would recommend the following instead of a single DB transaction:
See if it is possible to create smaller transaction batches, maybe 100 records at a time (a sketch follows after this list). Perhaps it is possible to logically load sections of the XML file together, where it would be acceptable to load a subset of the data as a unit into the system.
Validate as much of your data up front, e.g. check that required fields are filled in and that FKs are correct.
Make the upload repeatable. Skip over existing data.
Provide a manual undo strategy. I know this is easier said than done, but might even be required as an additional business rule. For example the upload was successful but someone realises a couple of hours later that the wrong file was uploaded.
It might be useful to upload your data to an initial staging area in your DB to perform validations and to mark which records have been processed.
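A rough sketch of the batching idea. The record type, batch size, connection string and table/column names are all placeholders, and the owner lookup and powertools/strikingtools inserts are elided:

    using System.Collections.Generic;
    using System.Data.SqlClient;

    // Hypothetical minimal record for one parsed tool row.
    record ToolRow(string Name, string ToolType, string Owner);

    class BatchedImporter
    {
        const int BatchSize = 100;
        readonly string connectionString =
            @"Server=.\SQLEXPRESS;Database=Tools;Integrated Security=true;";

        // rows would come from a streaming parser such as the XmlReader sketch above
        public void Import(IEnumerable<ToolRow> rows)
        {
            var batch = new List<ToolRow>(BatchSize);
            foreach (ToolRow row in rows)
            {
                batch.Add(row);
                if (batch.Count == BatchSize) { SaveBatch(batch); batch.Clear(); }
            }
            if (batch.Count > 0) SaveBatch(batch);
        }

        void SaveBatch(List<ToolRow> batch)
        {
            using var connection = new SqlConnection(connectionString);
            connection.Open();
            // One short transaction per batch keeps lock counts low and avoids
            // escalation; a failed batch only rolls back its own rows, which is
            // why the upload also needs to be repeatable, as suggested above.
            using var transaction = connection.BeginTransaction();
            foreach (ToolRow row in batch)
            {
                using var cmd = new SqlCommand(
                    "INSERT INTO tools (name, tooltype) VALUES (@name, @type)",
                    connection, transaction);
                cmd.Parameters.AddWithValue("@name", row.Name);
                cmd.Parameters.AddWithValue("@type", row.ToolType);
                cmd.ExecuteNonQuery();
                // ... owner lookup/insert and powertools/strikingtools insert go here ...
            }
            transaction.Commit();
        }
    }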
Use SSIS, and create an ETL package.
Use transactions for the rollback feature, and stored procedures that handle creating/checking the foreign keys.