I have an Excel file and I am querying it from my C# program with SQL using OleDb.
But I have run into a problem: my file has about 300K rows and querying it takes far too long. I have googled this issue and tried some libraries such as SpreadsheetLight and EPPlus, but they don't have a query feature.
Can anyone advise me on the fastest way to query my file?
Thanks in advance.
I have worked with 400-800K-row Excel files. The task was to read all rows and insert them into a SQL Server DB. In my experience OleDb was not able to process such big files in a timely manner, so we had to fall back to importing the Excel file directly into the DB using SQL Server means, e.g. OPENROWSET.
Even smaller files, like 260K rows, took approx. an hour with OleDb to import row by row into a DB table on Core2 Duo-generation hardware.
So, in your case you can consider the following:
1. Try reading the Excel file in chunks using a ranged SELECT:
OleDbCommand cmd = new OleDbCommand("SELECT [" + date + "] FROM [Sheet1$A1:Z10000] WHERE [" + key + "] = " + array[i].ToString(), connection);
Note, [Sheet1$A1:Z10000] tells OleDb to process only the first 10K rows of columns A to Z of the sheet instead of the whole sheet. You can use this approach if, for example, your Excel file is sorted and you know you don't need to check ALL rows but only those for this year. Or you can change Z10000 dynamically to read the next chunk of the file and combine the result with the previous one (a sketch of that loop follows this list).
2. Get all of your Excel file's contents directly into the DB using a direct DB import, such as the mentioned OPENROWSET in MS SQL Server, and then run your search queries against the RDBMS instead of the Excel file.
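For illustration, here is a minimal sketch of the chunked loop from option #1. The file path, sheet name, chunk size and key column (F2, i.e. column B, because HDR=NO names the columns F1..F26) are all assumptions you would adapt to your file:

using System.Data;
using System.Data.OleDb;

// HDR=NO: every row in the range is data; IMEX=1 keeps the inferred column types consistent (text).
string connStr = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\data\big.xlsx;" +
                 "Extended Properties='Excel 12.0 Xml;HDR=NO;IMEX=1'";
string keyValue = "12345";                  // the value you are searching for
var results = new DataTable();
const int chunkSize = 10000;

using (var connection = new OleDbConnection(connStr))
{
    connection.Open();
    for (int first = 1; first <= 300000; first += chunkSize)
    {
        string range = "[Sheet1$A" + first + ":Z" + (first + chunkSize - 1) + "]";
        using (var cmd = new OleDbCommand("SELECT * FROM " + range + " WHERE F2 = ?", connection))
        {
            cmd.Parameters.AddWithValue("?", keyValue);   // OleDb parameters are positional
            using (var adapter = new OleDbDataAdapter(cmd))
                adapter.Fill(results);                    // append this chunk's matches
        }
    }
}
// results now holds every matching row from the whole sheet

Each pass asks the driver for only 10K rows, so memory stays flat, and you can break out of the loop early once you have found what you need.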
I would personally suggest option #2. Comment whether you can use a DB at all and what RDBMS product/version is available to you, if any; a rough sketch of the OPENROWSET import follows.
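Here is a minimal sketch of option #2, assuming MS SQL Server with the ACE OLE DB provider installed on the server and 'Ad Hoc Distributed Queries' enabled. The server, database, file path and table name are made up for the example, and the path must be readable by the SQL Server service, not by your client machine:

using System.Data.SqlClient;

string importSql = @"
    SELECT * INTO dbo.ImportedSheet
    FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
                    'Excel 12.0;Database=C:\data\big.xlsx;HDR=YES',
                    'SELECT * FROM [Sheet1$]');";

using (var conn = new SqlConnection("Server=.;Database=MyDb;Integrated Security=true"))
using (var cmd = new SqlCommand(importSql, conn))
{
    conn.Open();
    cmd.CommandTimeout = 0;        // a 300K-row import can take a while
    cmd.ExecuteNonQuery();         // afterwards, search dbo.ImportedSheet with normal (indexable) T-SQL
}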
Hope this helps!
Hello,
I have an Oracle DB table with over 200M records and need to migrate it to MSSQL. This process is repeated once a month and should be done more or less overnight. The fastest method seemed to be using the bcp utility.
I tried to create the bcp file using C# code, but when I run a query to get the data it always ends with exhausted DB temp space, which cannot be increased. When I split this single query using OFFSET and FETCH, I have to add a row number, which also takes a lot of time because of the sorting.
The next solution could be a direct export from Oracle SQL Developer with delimited columns to create a CSV or bcp file, but some encoding errors appeared. It is also quite tedious, and this manual export takes about a third of a day.
My question is: is there any way to create the bcp file using C# in a reasonable time, or any other way to achieve this?
Thanks.
I have a huge amount of data in Excel files, with at least 20 columns per file.
I am working with .NET (C#); my task is to import the rows that meet certain conditions into a SQL database. For example, I need to insert only rows from the current year (or a selected year), and I have a column named 'Full Employee Name' that I need to check for existence in the Resource Human table.
Another condition is to check whether the column name is the same as in the SQL table.
I have managed to do it in code, but it takes at least 200 lines to handle all the possible checks. I read about SSIS (Integration Services, a BI tool), and it looks like it could help me with this task.
My question is how to do it; I am stuck with this new concept.
I think the best approach depends on your needs:
If you are looking to create automated jobs and perform the Excel-to-SQL data import periodically, I think it is better to go with SSIS.
If you are trying to create a small tool that converts an Excel file to a SQL table, then working with .NET is fine (a rough .NET sketch follows this answer).
If you are looking to loop over Excel files with different structures, then you should use .NET, or you have to convert the files to .csv and then use SSIS.
You can also refer to the following Microsoft documentation for more options for importing Excel files into SQL (SQL queries, linked servers, OPENROWSET, ...):
Import data from Excel to SQL Server or Azure SQL Database
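If you go the .NET route, a rough sketch of the usual pattern, read the sheet with OleDb, filter, then push the surviving rows with SqlBulkCopy, could look like this. The file path, sheet name, the 'Hire Date' column, the connection strings and the target table are assumptions, not your real names:

using System;
using System.Data;
using System.Data.OleDb;
using System.Data.SqlClient;

string excelConn = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\data\employees.xlsx;" +
                   "Extended Properties='Excel 12.0 Xml;HDR=YES'";
string targetConn = "Server=.;Database=HR;Integrated Security=true";
int selectedYear = DateTime.Now.Year;

var rows = new DataTable();
using (var conn = new OleDbConnection(excelConn))
using (var cmd = new OleDbCommand(
    "SELECT * FROM [Sheet1$] WHERE YEAR([Hire Date]) = " + selectedYear, conn))
using (var adapter = new OleDbDataAdapter(cmd))
{
    conn.Open();
    adapter.Fill(rows);                           // only the rows that pass the year filter
}

using (var bulk = new SqlBulkCopy(targetConn))
{
    bulk.DestinationTableName = "dbo.Employees";
    foreach (DataColumn col in rows.Columns)      // map by name, so a mismatched column fails loudly
        bulk.ColumnMappings.Add(col.ColumnName, col.ColumnName);
    bulk.WriteToServer(rows);
}

The existence check against the Resource Human table can then be done in T-SQL (e.g. load into a staging table first and insert with a JOIN), which is usually shorter than doing it row by row in C#.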
If you've already got a working .NET solution, I wouldn't bother looking into SSIS to replace it; 200 lines of code doesn't sound that bad to me.
So I'm using C# and Visual Studio 2013. I currently have an importer that uses System.Data.OleDb to connect to MS Access databases and System.Data.SqlClient for the Sql Server connection.
What I was originally doing was reading data from MS Access into a DataTable and then storing the data from that DataTable into SQL Server. It was going well until I finally hit a table with some 30-odd columns and almost 1 million rows, and I got an OutOfMemoryException.
So now I'm trying to think of a workaround. I am thinking of setting a row-count check on an MS Access table before I attempt to load it into a DataTable, and if it has a certain number of rows or more, I plan to write to an external file and then do an import on that file.
So what I'm asking is: does anyone know how I can go about this? The only solutions I've seen use Interop, and I've heard that as a practice you don't want to use Interop in code because it's slow and not terribly reliable. I was attempting to export from MS Access to a .csv or .txt file, but if a table doesn't have a primary key, I'm not sure how to iterate over it when it isn't currently in a DataTable.
If you are importing large amounts of data, you could use an OleDbDataReader. When using an OleDbDataReader, it will not strain your memory, because you read through one record at a time and insert it into the other database.
It may take slightly longer, but it will ensure completion without an OutOfMemoryException.
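A minimal sketch of that reader-based approach, assuming the ACE provider on the Access side and SqlBulkCopy on the SQL Server side; the connection strings and table names are placeholders:

using System.Data.OleDb;
using System.Data.SqlClient;

string accessConn = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\data\legacy.accdb";
string sqlConn = "Server=.;Database=Target;Integrated Security=true";

using (var source = new OleDbConnection(accessConn))
using (var cmd = new OleDbCommand("SELECT * FROM BigTable", source))
{
    source.Open();
    using (OleDbDataReader reader = cmd.ExecuteReader())
    using (var bulk = new SqlBulkCopy(sqlConn))
    {
        bulk.DestinationTableName = "dbo.BigTable";
        bulk.BulkCopyTimeout = 0;      // large tables can exceed the 30-second default
        bulk.BatchSize = 10000;        // commit in batches instead of one huge transaction
        bulk.WriteToServer(reader);    // streams row by row; no primary key required, nothing buffered in a DataTable
    }
}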
For some time I have been struggling with exporting data from a DB (stored procedures) to Excel files. As I am working with loads of data (around 500K rows per sheet, a business requirement unfortunately), I was wondering whether it is possible to execute parallel writes to a single worksheet. I am using EPPlus and OpenXML for writing the files. Reading around the web, I gathered that user-defined functions can be executed in parallel, but I didn't find anything about parallel writing. Any help/advice/hint/etc. will be much appreciated!
To my knowledge, no. You can program it, but it's hell.
If you want to export to Excel quickly, I think you have two ways:
CSV (fields separated by a semicolon).
Use Excel with ADO. You can insert records with SQL statements (a rough sketch follows the links below).
More info:
MSDN: How To Transfer Data from ADO Data Source to Excel with ADO
StackOverflow: ado in excel, inserting records into access database
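For the ADO option, a minimal sketch; the file path, sheet layout and the sample rows are assumptions standing in for whatever your stored procedure returns, and the sheet's first row must already contain the Id/Name headers:

using System.Data.OleDb;

string connStr = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\data\export.xlsx;" +
                 "Extended Properties='Excel 12.0 Xml;HDR=YES'";
var rowsToExport = new[] { new { Id = 1, Name = "Alice" }, new { Id = 2, Name = "Bob" } };

using (var conn = new OleDbConnection(connStr))
using (var cmd = new OleDbCommand("INSERT INTO [Sheet1$] ([Id], [Name]) VALUES (?, ?)", conn))
{
    conn.Open();
    cmd.Parameters.Add("@Id", OleDbType.Integer);
    cmd.Parameters.Add("@Name", OleDbType.VarChar, 255);
    foreach (var row in rowsToExport)
    {
        cmd.Parameters[0].Value = row.Id;
        cmd.Parameters[1].Value = row.Name;
        cmd.ExecuteNonQuery();               // one record per INSERT statement
    }
}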
Unfortunately, you won't be able to write to a single file in parallel. A better way is to create separate files (you can write to them in parallel :)) and finally merge the sheets to get a single document. You can do that using OpenXML. Here is an example: http://blogs.technet.com/b/ptsblog/archive/2011/01/24/open-xml-sdk-merging-documents.aspx
Well, "parallel" would be ambiguous; I'm thinking of multi-threading, which is concurrent.
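To illustrate the separate-files idea from the answer above, here is a minimal EPPlus sketch; each thread owns its own ExcelPackage, so nothing is shared. The part count, file names and dummy cell values are assumptions; plug in your stored-procedure results instead:

using System.IO;
using System.Threading.Tasks;
using OfficeOpenXml;

Parallel.For(0, 4, part =>
{
    using (var package = new ExcelPackage())
    {
        var sheet = package.Workbook.Worksheets.Add("Data");
        for (int r = 1; r <= 125000; r++)                    // ~125K rows per part, ~500K in total
        {
            sheet.Cells[r, 1].Value = part * 125000 + r;     // stand-ins for your real columns
            sheet.Cells[r, 2].Value = "row " + r;
        }
        package.SaveAs(new FileInfo("export_part" + part + ".xlsx"));
    }
});
// Then merge export_part0..3.xlsx into one workbook (e.g. with the Open XML SDK, as in the
// linked article) or deliver them as separate files/sheets.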
How can I increase performance when exporting a database with tables that have one-to-many relationships (in some cases many-to-many relationships) into a single Excel file?
Right now, I get all the data from the database and process it into a table using a few for loops, then I change the header of the HTML file to download it as an Excel file. But it takes a while for the number of records I have (about 300 records).
I was just wondering if there is a faster way to improve performance.
Thanks
It sounds like you're loading each table into memory with your C# code, and then building a flat table by looping through the data. A vastly simpler and faster way to do that would be to use a SQL query with a few JOINs in it:
http://www.w3schools.com/sql/sql_join.asp
http://en.wikipedia.org/wiki/Join_(SQL)
Also, I get the impression that you're rendering the resulting flat table to HTML, and then saving that as an Excel file. There are several ways to create that Excel (or CSV) file directly, without having to turn it into an HTML table first; one sketch follows.
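As an illustration of that last point, a minimal sketch that runs one JOINed query and streams the result straight to a CSV file (which Excel opens directly); the connection string, query and output path are assumptions, and real CSV output would need proper quoting/escaping:

using System.Data.SqlClient;
using System.IO;

string query = @"SELECT o.OrderId, o.OrderDate, i.ProductName, i.Quantity
                 FROM Orders o
                 JOIN OrderItems i ON i.OrderId = o.OrderId";

using (var conn = new SqlConnection("Server=.;Database=Shop;Integrated Security=true"))
using (var cmd = new SqlCommand(query, conn))
using (var writer = new StreamWriter("export.csv"))
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        var names = new string[reader.FieldCount];
        for (int i = 0; i < reader.FieldCount; i++)
            names[i] = reader.GetName(i);
        writer.WriteLine(string.Join(",", names));           // header row

        while (reader.Read())
        {
            var values = new object[reader.FieldCount];
            reader.GetValues(values);
            writer.WriteLine(string.Join(",", values));      // one data row per line
        }
    }
}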