I have an Excel file which I am required to parse, validate and then load into a SQL Server database using Interop. I have the application working: it reads a sheet, reads each line (row by row, column by column) and adds that line to a List as an Insert statement. When I reach the end of the worksheet, I execute all of the Insert statements as one batch.
The problem is that it uses a lot of RAM when the worksheet is big (1000+ rows). Is there a better or more efficient strategy for larger data? Should I be committing more frequently and clearing the List?
I don't think there is much you can do on the parsing side (unless you are coding it all yourself), but I'd INSERT the data as soon as you have a row available. No need to store it in a list. In your solution, you are basically storing all data twice (once in the "Excel memory" and once in "database insert memory").
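A sketch of that approach (the connection string, sheet layout, table and column names here are assumptions, not from the question): execute a parameterized INSERT for each row as you read it from the worksheet, so nothing accumulates in a List:

```csharp
// Assumes: using Excel = Microsoft.Office.Interop.Excel;
// and that `worksheet` is an open Excel.Worksheet.
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    using (var cmd = new SqlCommand(
        "INSERT INTO dbo.TargetTable (Col1, Col2) VALUES (@p1, @p2)", conn))
    {
        cmd.Parameters.Add("@p1", SqlDbType.NVarChar, 100);
        cmd.Parameters.Add("@p2", SqlDbType.NVarChar, 100);

        Excel.Range used = worksheet.UsedRange;
        for (int row = 2; row <= used.Rows.Count; row++)   // row 1 = headers
        {
            cmd.Parameters["@p1"].Value = (used.Cells[row, 1] as Excel.Range).Value2;
            cmd.Parameters["@p2"].Value = (used.Cells[row, 2] as Excel.Range).Value2;
            cmd.ExecuteNonQuery();   // insert immediately; nothing accumulates
        }
    }
}
```

Wrapping every N rows in a transaction keeps per-insert overhead down while memory use stays flat.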
I have an Excel file and I am querying it from my C# program with SQL using OleDB.
But I've run into a problem. My file has about 300K rows and querying takes a very long time. I have googled this issue and tried libraries such as SpreadsheetLight and EPPlus, but they don't have a query feature.
Can anyone advise me on the fastest way to query my file?
Thanks in advance.
I have worked with 400-800K row Excel files. The task was to read all rows and insert them into a SQL Server DB. In my experience, OleDB was not able to process such big files in a timely manner, so we had to fall back to importing the Excel file directly into the DB using SQL Server's own means, e.g. OPENROWSET.
Even smaller files, like 260K rows, took approximately an hour with OleDB to import row-by-row into a DB table on Core 2 Duo generation hardware.
So, in your case you can consider the following:
1. Try reading the Excel file in chunks using a ranged SELECT:

// Note: string concatenation is kept from the original; consider
// OleDbParameter instead to avoid injection and quoting problems.
OleDbCommand command = new OleDbCommand(
    "SELECT [" + date + "] FROM [Sheet1$A1:Z10000] WHERE [" + key + "] = " + array[i].ToString(),
    connection);

Note, [Sheet1$A1:Z10000] tells OleDB to process only the first 10K rows of columns A to Z of the sheet instead of the whole sheet. You can use this approach if, for example, your Excel file is sorted and you know you don't need to check ALL rows, only those for this year. Or you can change Z10000 dynamically to read the next chunk of the file and combine the result with the previous one.
2. Get all your Excel file contents directly into the DB using a direct DB import, such as the aforementioned OPENROWSET in MS SQL Server, and then run your search queries against the RDBMS instead of the Excel file.
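A minimal sketch of option #2, assuming SQL Server with ad hoc distributed queries enabled, the ACE OLEDB provider installed, and a file path of C:\Data\Input.xlsx (all of these are assumptions, not from the original post):

```sql
-- Import the whole sheet into a staging table in one shot.
SELECT *
INTO dbo.ExcelStaging
FROM OPENROWSET(
    'Microsoft.ACE.OLEDB.12.0',
    'Excel 12.0;Database=C:\Data\Input.xlsx;HDR=YES',
    'SELECT * FROM [Sheet1$]');

-- Then search against the staging table instead of the file:
SELECT * FROM dbo.ExcelStaging WHERE SomeKeyColumn = 'some value';
```

With an index on the key column, the search that took minutes against the file becomes near-instant against the table.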
I would personally suggest option #2. Comment if you can use DB at all and what RDBMS product/version is available to you, if any.
Hope this helps!
I have a test bed application available to 3 users.
If all users run the app at the same time, a stored procedure runs and each of the three gets a result set, currently an ADO.NET DataTable of approximately 30,000 records, which the app then needs to move into an Excel template for each of them.
The template is an xlsm file which contains some VBA that needs to run after importing the data. This template is saved in the solution.
I'm going to attempt to move the data from the DataTable to a worksheet using Excel Interop.
Has anyone any experience of moving this amount of data from Datatable into Excel?
#slugster suggested "setting up a datasource from Excel and just run the query using a dirty read" ... is it possible to set up a datasource in Excel linked to a non-materialized datatable?
Will looping through a table that is 30000 rows by 10 columns via xl interop run into problems?
Has anyone any experience of moving this amount of data from Datatable into Excel?
Not from a DataTable object, but that amount of data using Excel's built-in ability to import data, yes.
#slugster suggested "setting up a datasource from Excel and just run the query using a dirty read" ... is it possible to set up a datasource in Excel linked to a non-materialized datatable?
I would suggest this as well. To go further, I'd suggest creating a stored procedure and then calling that. You should see better performance using a stored procedure. The procedure could collect and prepare the data, then return it to Excel.

Also, you may be able to build a caching mechanism into the procedure. For example, if your data only changes daily, you only rebuild the data in the source table once per day, so only the first user to request the data takes the initial performance hit. And depending on what type of post-processing you are doing in Excel in VBA, maybe that could be handled in the procedure as well.

The procedure will also help reduce the possibility of locking issues if you add SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED to the top of the procedure, or use the (NOLOCK) hint on the tables you are willing to allow dirty reads from.
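A minimal sketch of such a procedure (the procedure, table and column names are illustrative, not from the original posts):

```sql
CREATE PROCEDURE dbo.GetReportData
AS
BEGIN
    -- Allow dirty reads so the three concurrent users don't block each other.
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

    SELECT Col1, Col2, Col3
    FROM dbo.ReportSource;   -- rebuilt once per day by a scheduled job
END
```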
Here's a nice article regarding using stored procedures in Excel: http://codebyjoshua.blogspot.com/2012/01/get-data-from-sql-server-stored.html
Will looping through a table that is 30000 rows by 10 columns via xl interop run into problems?
This depends on your definition of "problems." I could see possible performance implications, however if you handle as much as you can in the stored procedure, you should be fine. In the world of data, that's really teeny tiny.
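If you do stay with interop, the cell-by-cell loop is what usually hurts: each cell assignment is a cross-process COM call. A common workaround (a sketch, assuming Excel Interop is referenced, `table` is your DataTable and `ws` is an open Worksheet) is to copy the DataTable into an object[,] and assign it to a Range in one call:

```csharp
// Assumes: using Excel = Microsoft.Office.Interop.Excel;
// Copy the DataTable into a 2-D array first.
object[,] buffer = new object[table.Rows.Count, table.Columns.Count];
for (int r = 0; r < table.Rows.Count; r++)
    for (int c = 0; c < table.Columns.Count; c++)
        buffer[r, c] = table.Rows[r][c];

// Write the whole block in a single interop call,
// instead of 30,000 x 10 individual cell assignments.
Excel.Range target = ws.Range[
    ws.Cells[2, 1],                                        // row 1 = headers
    ws.Cells[table.Rows.Count + 1, table.Columns.Count]];
target.Value2 = buffer;
```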
How can I improve performance when exporting a database whose tables have one-to-many (and in some cases many-to-many) relationships into a single Excel file?
Right now, I get all the data from the database and process it into a table using a few for loops, then I change the header of the HTML file to download it as an Excel file. But it takes a while for the number of records I have (about 300 records).
I was just wondering if there is a faster way to improve performance.
Thanks
It sounds like you're loading each table into memory with your c# code, and then building a flat table by looping through the data. A vastly simpler and faster way to do that would be to use a SQL query with a few JOINs in it:
http://www.w3schools.com/sql/sql_join.asp
http://en.wikipedia.org/wiki/Join_(SQL)
Also, I get the impression that you're rendering the resulting flat table to html, and then saving that as an excel file. There are several ways that you can create that excel (or csv) file directly, without having to turn it into an html table first.
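For example, a single query with JOINs can produce the flat table directly, instead of nested loops in C# (the table and column names here are illustrative):

```sql
SELECT o.OrderId, o.OrderDate, c.CustomerName, i.ProductName, i.Quantity
FROM Orders o
JOIN Customers  c ON c.CustomerId = o.CustomerId   -- one-to-many
JOIN OrderItems i ON i.OrderId    = o.OrderId      -- many side flattened into rows
ORDER BY o.OrderId;
```

The database does the matching with its indexes; your code just streams the already-flat result into the output file.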
I have a table with about 100 columns and about 10000 rows.
Periodically, I will receive an Excel file with similar data, and I now need to update the table.
If new rows exist in Excel, I have to add them to the db.
If old rows have been updated, I need to update the rows in the db.
If some rows have been deleted, I need to delete the row from my main table and add to another table.
I have thought about proceeding as follows:
Fetch all rows from db into a DataSet.
Import all rows from Excel into a DataSet.
Compare these 2 DataSets now using joins and perform the required operations.
I have never worked with data of this magnitude and am worried about the performance. Let me know the ideal way to go about realizing this requirement.
Thanks in advance. :)
Don't worry about performance with 10k records; you will not notice it...
Maybe a better way to do it is to import the Excel file into a temp table and do the processing with a couple of simple SQL queries... you'll save on dev time and it will potentially perform better...
In my experience, this is quite simple if you choose to do the work in T-SQL, as follows:
You can use OPENROWSET, OPENQUERY, linked servers, DTS and many other SQL Server features to import the Excel file into a temporary table.
Then you can write some simple queries to do the comparison. If you are using SQL Server 2008, MERGE was made exactly for your scenario.
Another point is that the performance is far better than doing the same work in C#. You can use the TOP clause to chunk the comparison, and much more.
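A minimal MERGE sketch for the insert/update/delete requirement (table and column names are illustrative; the OUTPUT clause surfaces the deleted rows so they can be copied to your archive table):

```sql
-- dbo.ExcelStaging holds the rows imported from the Excel file.
MERGE dbo.MainTable AS t
USING dbo.ExcelStaging AS s
    ON t.Id = s.Id
WHEN MATCHED THEN                       -- row exists in both: update
    UPDATE SET t.Col1 = s.Col1, t.Col2 = s.Col2
WHEN NOT MATCHED BY TARGET THEN         -- new row in Excel: insert
    INSERT (Id, Col1, Col2) VALUES (s.Id, s.Col1, s.Col2)
WHEN NOT MATCHED BY SOURCE THEN         -- row gone from Excel: delete
    DELETE
OUTPUT $action, deleted.Id, deleted.Col1, deleted.Col2;
```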
Hope it helps.
Cheers
My Windows app reads a text file and inserts it into the database. The problem is that the text file is extremely big (at least for our low-end machines). It has 100 thousand rows and it takes a long time to write it into the database.
Can you guys suggest how I should read and write the data efficiently so that it does not hog machine memory?
FYI...
Column delimiter : '|'
Row delimiter : NewLine
It has approximately 10 columns.. (It has an information of clients...like first name, last name, address, phones, emails etc.)
Note: I am RESTRICTED from using BULK commands.
You don't say what kind of database you're using, but if it is SQL Server, then you should look into the BULK INSERT command or the BCP utility.
Given that there is absolutely no chance of getting help from your security folks to use BULK commands, here is the approach I would take:
1. Read the entire text file first, before inserting into the database, to avoid interleaving file I/O with database I/O.
2. Check what indexes you have on the destination table. Can you insert into a temporary table with no indexes or dependencies, so that the individual inserts are fast?
3. Does this data need to be visible immediately after the insert? If not, you can have a scheduled job read from the temp table in step 2 and insert into the destination table (the one with indexes, foreign keys, etc.).
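A sketch of steps 1-2, assuming SQL Server and a pipe-delimited file (the file path, connection string, staging table and column layout are all placeholders): read the whole file up front, then insert into an index-free staging table in transactional batches with a reused parameterized command:

```csharp
// Step 1: read the whole file first, then do all DB work.
string[] lines = File.ReadAllLines(@"C:\data\clients.txt");

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    // Step 2: batched inserts into an index-free staging table.
    for (int start = 0; start < lines.Length; start += 1000)
    {
        using (var tx = conn.BeginTransaction())
        using (var cmd = new SqlCommand(
            "INSERT INTO dbo.ClientStaging (FirstName, LastName, Email) " +
            "VALUES (@fn, @ln, @em)", conn, tx))
        {
            cmd.Parameters.Add("@fn", SqlDbType.NVarChar, 100);
            cmd.Parameters.Add("@ln", SqlDbType.NVarChar, 100);
            cmd.Parameters.Add("@em", SqlDbType.NVarChar, 200);

            int end = Math.Min(start + 1000, lines.Length);
            for (int i = start; i < end; i++)
            {
                string[] cols = lines[i].Split('|');   // '|' column delimiter
                cmd.Parameters["@fn"].Value = cols[0]; // hypothetical column order
                cmd.Parameters["@ln"].Value = cols[1];
                cmd.Parameters["@em"].Value = cols[2];
                cmd.ExecuteNonQuery();
            }
            tx.Commit();   // one transaction per 1,000 rows
        }
    }
}
```

One transaction per batch avoids both the per-row commit overhead of autocommit and the log growth of a single 100K-row transaction.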
Is it possible for you to register a custom assembly in SQL Server? (I'm assuming it's SQL Server because you've already said you tried BULK INSERT earlier.)
Then you can call your assembly to do (mostly) whatever you need, like getting the file from some service (or whatever your option is), parsing it and inserting directly into the tables.
This is not an option I like, but it can be a life saver sometimes.