I am loading data from Excel into a DataTable, but I run into performance problems when the Excel file contains a large number of rows.
public DataView LoadFromExcel()
{
Microsoft.Office.Interop.Excel.Application application =
new Microsoft.Office.Interop.Excel.Application();
Workbook workbook = null;
Worksheet worksheet = null;
string filename = null;
OpenFileDialog file = new OpenFileDialog();
if (true == file.ShowDialog())
{
filename = file.FileName;
}
workbook = application.Workbooks.Open(filename, true, true);
worksheet = workbook.Sheets[1];
Range range = worksheet.UsedRange;
int row = range.Rows.Count;
int columns = range.Columns.Count;
System.Data.DataTable dt = new System.Data.DataTable();
for (int i = 1; i <= columns; i++)
{
dt.Columns.Add((range.Cells[1, i] as Range).Value2.ToString());
}
for (row = 2; row <= range.Rows.Count; row++)
{
DataRow dr = dt.NewRow();
for (int column = 1; column <= range.Columns.Count; column++)
{
dr[column - 1] = (range.Cells[row, column] as
Microsoft.Office.Interop.Excel.Range).Value2.ToString();
}
dt.Rows.Add(dr);
dt.AcceptChanges();
}
workbook.Close(true, Missing.Value, Missing.Value);
application.Quit();
return dt.DefaultView;
}
Is there any way I can solve this problem? Please help.
I don't think this is the right approach.
To insert a large amount of data into a table, use your database's bulk-insert feature, and turn off the database log and rollback features for the duration of the load. Otherwise the bulk insert behaves just like a series of ordinary inserts.
Oracle and SQL Server have this feature, and so do some NoSQL databases. Since you have not said which database you use, look up its bulk-insert documentation.
You can do it with the OLE DB provider. I have tried it with 50,000 records. It may help you; try the code below:
// txtPath.Text is the path to the excel file.
string conString = @"Provider=Microsoft.ACE.OLEDB.12.0;" + "Data Source=" + txtPath.Text + ";" + "Extended Properties=" + "\"" + "Excel 12.0;HDR=YES;" + "\"";
OleDbConnection oleCon = new OleDbConnection(conString);
OleDbCommand oleCmd = new OleDbCommand("SELECT field1, field2, field3 FROM [Sheet1$]", oleCon);
DataTable dt = new DataTable();
oleCon.Open();
dt.Load(oleCmd.ExecuteReader());
oleCon.Close();
You have to take care of a few things:
The sheet must be named Sheet1; otherwise, use the actual sheet name in the query.
The workbook should not be open in Excel while you read it.
The column names in the query must match the column headers in the sheet.
The column names must be in the first row of the sheet.
I hope it helps.
Let me know if you need anything more. :)
You can use SqlBulkCopy for this kind of operation.
Read the values into variables and filter them first, so that invalid values do not reach your database.
Saving unvalidated data to a database, MS SQL especially, is a mistake. Filtering up front makes the save easier and preserves the health of your database.
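The filtering step can be sketched as below. This is only an illustration under stated assumptions: the class name `ImportFilter`, the method `KeepValidRows`, and the rule "required columns must not be blank" are all hypothetical, and a real import would probably apply stricter, type-specific checks.

```csharp
using System;
using System.Data;

public static class ImportFilter
{
    // Returns a copy of `source` that keeps only the rows whose required
    // columns are all non-null and non-blank. The column names are supplied
    // by the caller; nothing here is specific to Excel or SQL Server.
    public static DataTable KeepValidRows(DataTable source, params string[] requiredColumns)
    {
        DataTable result = source.Clone(); // same schema, no rows

        foreach (DataRow row in source.Rows)
        {
            bool valid = true;
            foreach (string column in requiredColumns)
            {
                object value = row[column];
                if (value == DBNull.Value || string.IsNullOrWhiteSpace(value.ToString()))
                {
                    valid = false;
                    break;
                }
            }

            if (valid)
            {
                result.ImportRow(row);
            }
        }

        return result;
    }
}
```

The filtered table could then be handed to `SqlBulkCopy.WriteToServer(DataTable)` instead of the raw one.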
I have created a trigger on my database table that stores the data in a new physical table. I have also written a console application that extracts the data from that physical table into an Excel sheet using Excel interop.
Each time I run the application, I want the new Excel file to contain only the data that has not been exported yet, instead of everything. In other words, I want to compare against the previously generated Excel file and leave out the rows that are already in it.
For example:
Stock.xls data:
A
B
C
Database Table data :
A
B
C
When I run the application a second time (after adding a new row to the physical table in the database), the new Excel sheet should omit A, B and C and show only D:
Stock.xls data:
D
Database Table data :
A
B
C
D
This is my code :
string connectionstring = System.Configuration.ConfigurationManager.ConnectionStrings["IntegrationConnection"].ConnectionString;
string sql2 = null;
string data2 = null;
int k = 0;
int l = 0;
string Filename2 = @"D:\Integration\Stock.xls";
if (!File.Exists(Filename2))
{
File.Create(Filename2).Dispose();
using (TextWriter tw = new StreamWriter(Filename2))
{
tw.WriteLine("Please run the program again");
tw.Close();
}
}
////*** Preparing excel Application
Excel.Application xlApp2;
Excel.Workbook xlWorkBook2;
Excel.Worksheet xlWorkSheet2;
object misValue2 = System.Reflection.Missing.Value;
///*** Opening Excel application
xlApp2 = new Microsoft.Office.Interop.Excel.Application();
xlWorkBook2 = xlApp2.Workbooks.Open(Filename2);
xlWorkSheet2 = (Excel.Worksheet)(xlWorkBook2.ActiveSheet as Excel.Worksheet);
xlApp2.DisplayAlerts = false;
SqlConnection conn2 = new SqlConnection(connectionstring);
conn2.Open();
sql2 = "SELECT * from tblMPartHistory";
///*** Preparing to retrieve value from the database
DataTable dtable2 = new DataTable();
SqlDataAdapter dscmd2 = new SqlDataAdapter(sql2, conn2);
DataSet ds2 = new DataSet();
dscmd2.Fill(dtable2);
////*** Generating the column Names here
string[] colNames2 = new string[dtable2.Columns.Count];
int col2 = 0;
foreach (DataColumn dc in dtable2.Columns)
colNames2[col2++] = dc.ColumnName;
char lastColumn2 = (char)('A' + dtable2.Columns.Count - 1); // last column letter; only valid for up to 26 columns
xlWorkSheet2.get_Range("A1", lastColumn2 + "1").Value2 = colNames2;
xlWorkSheet2.get_Range("A1", lastColumn2 + "1").Font.Bold = true;
xlWorkSheet2.get_Range("A1", lastColumn2 + "1").VerticalAlignment
= Excel.XlVAlign.xlVAlignCenter;
/////*** Inserting the Column and Values into Excel file
for (k = 0; k <= dtable2.Rows.Count - 1; k++)
{
for (l = 0; l <= dtable2.Columns.Count - 1; l++)
{
data2 = dtable2.Rows[k].ItemArray[l].ToString();
xlWorkSheet2.Cells[k + 2, l + 1] = data2;
xlWorkBook2.Save();
}
}
xlWorkBook2.Close(true, misValue2, misValue2);
xlApp2.Quit();
System.Runtime.InteropServices.Marshal.ReleaseComObject(xlWorkSheet2);
System.Runtime.InteropServices.Marshal.ReleaseComObject(xlWorkBook2);
System.Runtime.InteropServices.Marshal.ReleaseComObject(xlApp2);
So this is how I understand your question:
You have a database table, and every time new rows have been committed to it you want to generate a new Excel sheet that reflects only the recently added data.
Of course, you could go with your approach and compare the data against all of your previous Excel sheets to remove duplicates.
A better approach, though, would be to add either a timestamp or a session ID to each row.
You can then query the database for it and generate the new sheet from all the rows with the latest timestamp or the highest session ID.
This way you not only spare yourself the extra work of removing duplicates, you can also regenerate all the sheets if they get lost somehow.
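The timestamp idea can be sketched as follows. The helper below is hypothetical: the class `IncrementalExport`, the method `RowsAddedSince`, and the `CreatedAt` column used in the usage note are assumptions, since the original table schema is not shown. In practice the same filter would more likely live in the SQL query itself, as a WHERE clause on the timestamp column.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class IncrementalExport
{
    // Returns the rows whose timestamp is strictly later than the moment of
    // the previous export. `timestampColumn` must name a DateTime column.
    public static List<DataRow> RowsAddedSince(DataTable table, string timestampColumn, DateTime lastExport)
    {
        var fresh = new List<DataRow>();

        foreach (DataRow row in table.Rows)
        {
            object value = row[timestampColumn];
            if (value != DBNull.Value && (DateTime)value > lastExport)
            {
                fresh.Add(row);
            }
        }

        return fresh;
    }
}
```

A caller would remember the latest timestamp it exported (in a file or a small table, for example) and pass it in on the next run, e.g. `RowsAddedSince(dtable2, "CreatedAt", lastExport)`.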
I want to read a lot of cells from Excel into a two-dimensional array in C#.
Using Microsoft.Office.Interop.Excel and reading the cells one by one is too slow. I know how to write an array to a range (Microsoft.Office.Interop.Excel really slow), but I would like to go in the opposite direction.
_Excel.Application xlApp = new _Excel.Application();
_Excel.Workbook xlWorkBook;
_Excel.Worksheet xlWorkSheet;
object misValue = System.Reflection.Missing.Value;
xlWorkBook = xlApp.Workbooks.Open(path);
xlWorkSheet = xlWorkBook.Worksheets["Engineering BOM"];
_Excel.Range range = (_Excel.Range)xlWorkSheet.Cells[1, 1];
range = range.get_Resize(13000, 9);
string[,] indexMatrix = new string[13000, 9];
// below code should be much faster
for (int i = 1; i < 1300; i++)
{
for (int j = 1; j < 9; j++)
{
indexMatrix[i, j] = xlWorkSheet.Cells[i, j].Value2;
}
}
As a result, I want the values from the cell range in an array (the range is exactly the same size as the array). Right now the app reads cell by cell and writes the data to the array, but that is too slow. Is there any way to copy the whole range into the array directly?
Thank you in advance :)
You can try this; it should be faster, with two things to note:
You have to use a DataTable (in this case it is a better fit than a multidimensional array).
You no longer need to care about the range.
So what are we going to do? Connect to Excel, run a query that selects all the data, and fill a DataTable. What do we need? A few lines of code.
First we declare our connection string:
For Excel 2007 or above (*.XLSX files)
string connectionString = string.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0 Xml;HDR=No;IMEX=1\";", fullPath);
For Excel 2003 (*.XLS files)
string connectionString = string.Format("Provider=Microsoft.Jet.OLEDB.4.0; data source={0}; Extended Properties=\"Excel 8.0;HDR=No;IMEX=1\";", fullPath);
where fullPath is the full file path of your excel file
Now we have to create the connection and fill the data table:
OleDbConnection SQLConn = new OleDbConnection(connectionString);
SQLConn.Open();
OleDbDataAdapter SQLAdapter = new OleDbDataAdapter();
string sql = "SELECT * FROM [" + sheetName + "$]";
OleDbCommand selectCMD = new OleDbCommand(sql, SQLConn);
SQLAdapter.SelectCommand = selectCMD;
SQLAdapter.Fill(dtXLS);
SQLConn.Close();
where sheetName is the name of your sheet and dtXLS is the DataTable that ends up populated with all the Excel values.
This should be faster.
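Choosing between the two connection strings above can be folded into a small helper. This is a sketch only: `ExcelConnection.Build` and its `hasHeaderRow` parameter are hypothetical names, and the provider strings are simply the two shown above with the HDR value made configurable.

```csharp
using System;
using System.IO;

public static class ExcelConnection
{
    // Builds an OLE DB connection string for an Excel file, picking the
    // provider from the file extension (.xlsx -> ACE, .xls -> Jet).
    public static string Build(string fullPath, bool hasHeaderRow)
    {
        string hdr = hasHeaderRow ? "Yes" : "No";
        string extension = Path.GetExtension(fullPath).ToLowerInvariant();

        switch (extension)
        {
            case ".xlsx":
                return string.Format(
                    "Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0 Xml;HDR={1};IMEX=1\";",
                    fullPath, hdr);
            case ".xls":
                return string.Format(
                    "Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=\"Excel 8.0;HDR={1};IMEX=1\";",
                    fullPath, hdr);
            default:
                throw new ArgumentException("Unsupported Excel file extension: " + extension, "fullPath");
        }
    }
}
```

With HDR=No the provider names the columns F1, F2, and so on, and the header row arrives as ordinary data; with HDR=Yes the first row supplies the column names instead.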
I guess the range more or less defines a 'data table'. If that is right, the fastest way would be to read it as data using OleDb or ODBC (which does not even need Excel to be installed):
DataTable tbl = new DataTable();
using (OleDbConnection con = new OleDbConnection(@"Provider=Microsoft.ACE.OLEDB.12.0;" +
$"Data Source={path};" +
@"Extended Properties=""Excel 12.0;HDR=Yes"""))
using (OleDbCommand cmd = new OleDbCommand(@"Select * from [Engineering BOM$A1:i13000]", con))
{
con.Open();
tbl.Load(cmd.ExecuteReader());
}
If it is not, you could do this:
Excel.Application xl = new Excel.Application();
var wb = xl.Workbooks.Open(path);
Excel.Worksheet ws = (Excel.Worksheet)wb.Worksheets["Engineering BOM"];
var v = ws.Range["A1:I13000"].Value;
(I am not sure whether Excel itself can allocate such a big array.)
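For a multi-cell range, the interop `Value` property hands back an `object[,]` whose indices start at 1 rather than 0, with `null` for empty cells (a single-cell range returns a scalar instead, which this sketch does not handle). Converting that into the zero-based `string[,]` the question asks for could look like this; `RangeValues.ToStringMatrix` is a hypothetical helper name.

```csharp
using System;
using System.Globalization;

public static class RangeValues
{
    // Copies a two-dimensional object array with arbitrary lower bounds
    // (Excel interop uses 1) into a zero-based string matrix.
    // Empty cells (null) stay null.
    public static string[,] ToStringMatrix(object[,] values)
    {
        int rowLow = values.GetLowerBound(0);
        int colLow = values.GetLowerBound(1);
        int rows = values.GetLength(0);
        int cols = values.GetLength(1);

        var result = new string[rows, cols];
        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < cols; c++)
            {
                object cell = values[rowLow + r, colLow + c];
                result[r, c] = cell == null
                    ? null
                    : Convert.ToString(cell, CultureInfo.InvariantCulture);
            }
        }

        return result;
    }
}
```

With the snippet above this would be called as `RangeValues.ToStringMatrix((object[,])v)`; the loop then runs over managed memory only, with no further COM calls per cell.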
private void OnCreated(object sender, FileSystemEventArgs e)
{
excelDataSet.Clear();
string extension = Path.GetExtension(e.FullPath);
if (extension == ".xls" || extension == ".xlsx")
{
string ConnectionString = "";
if (extension == ".xls") { ConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0; Data Source = '" + e.FullPath + "';Extended Properties=\"Excel 8.0;HDR=YES;\""; }
if (extension == ".xlsx") { ConnectionString = "Provider=Microsoft.ACE.OLEDB.12.0; Data Source = '" + e.FullPath + "';Extended Properties=\"Excel 12.0;HDR=YES;\""; }
using (OleDbConnection conn = new OleDbConnection(ConnectionString))
{
conn.Open();
OleDbDataAdapter objDA = new OleDbDataAdapter("select * from [Sheet1$]", conn);
objDA.Fill(excelDataSet);
conn.Close();
conn.Dispose();
}
}
}
This is my code. It runs when my FileSystemWatcher fires. The problem is that the Excel file I read has 1 header row and 3 rows of values, yet when I use this code and check the DataSet row count I get 9. I have no idea where that 9 comes from. Am I doing something wrong? I have been checking my code for the last 30-35 minutes and still cannot find the mistake.
I get the columns right, but the rows are off. I don't need the header line, by the way.
Update: my example Excel file had 3 rows and I was getting 9 as the row count. I then copied those rows until the file had 24 rows plus 1 header row, and Rows.Count returned 24. So it worked fine? Is that normal?
There is a NuGet package called Linq to Excel. I have used it in several projects to query the data inside .csv and .xlsx files without any difficulty, and it is easy to set up. Its performance may be poor, but it can solve your problem.
Here is the documentation of Linq to Excel
I would highly recommend taking a look at the EPPlus library: https://github.com/JanKallman/EPPlus/wiki
I had plenty of trouble with OLE DB until I found EPPlus. It is really easy to use for creating and updating Excel files. There are plenty of good examples out there, like the one below, which is from "How do I iterate through rows in an excel table using epplus?"
var package = new ExcelPackage(new FileInfo("sample.xlsx"));
ExcelWorksheet workSheet = package.Workbook.Worksheets[1];
var start = workSheet.Dimension.Start;
var end = workSheet.Dimension.End;
for (int row = start.Row; row <= end.Row; row++)
{ // Row by row...
for (int col = start.Column; col <= end.Column; col++)
{ // ... Cell by cell...
object cellValue = workSheet.Cells[row, col].Text; // This got me the actual value I needed.
}
}
I am creating a test framework that should read parameters from an excel sheet. I would like to be able to :
Get a row count of test records in the sheet
Get column count
Reference a particular cell, e.g. A23, and read or write values to it.
I found this code on the internet. It's great, but it appears to have been written to work with a form component. I don't necessarily need to show the Excel sheet in a data grid.
This is the code I found. It works fine, but I need to add the functionality above. Thanks for your help :)
using System.Data;
using System.Data.OleDb;
...
OleDbConnection con = new OleDbConnection(@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Book1.xls;Extended Properties=Excel 8.0");
OleDbDataAdapter da = new OleDbDataAdapter("select * from MyObject", con);
DataTable dt = new DataTable();
da.Fill(dt);
Count rows:
sheet.Range["A11"].Formula = "=COUNT(A1:A10)";
Count columns:
sheet.Range["A12"].Formula = "=COUNT(A1:F1)";
.NET Excel component
You can reference particular cells with a query like this:
Select * from [Sheet1$A1:B10]
For example, the query above reads cells A1 through B10.
see here
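Once the sheet is in a DataTable, `dt.Rows.Count` and `dt.Columns.Count` give the row and column counts, and a reference such as A23 can be mapped to indexes. The helper below is a sketch: `CellReference` and its methods are hypothetical names, and it assumes the table was filled with HDR=No so that sheet row 1 is table row 0. Writing through it changes only the in-memory table, not the Excel file.

```csharp
using System;
using System.Data;
using System.Globalization;

public static class CellReference
{
    // Converts an A1-style reference into zero-based indexes,
    // e.g. "A23" -> row 22, column 0 and "AB1" -> row 0, column 27.
    public static void Parse(string reference, out int row, out int column)
    {
        int i = 0;
        int columnNumber = 0; // 1-based while accumulating letters

        while (i < reference.Length)
        {
            char c = char.ToUpperInvariant(reference[i]);
            if (c < 'A' || c > 'Z') break;
            columnNumber = columnNumber * 26 + (c - 'A' + 1);
            i++;
        }

        if (i == 0 || i == reference.Length)
        {
            throw new FormatException("Expected letters followed by digits: " + reference);
        }

        row = int.Parse(reference.Substring(i), NumberStyles.None, CultureInfo.InvariantCulture) - 1;
        column = columnNumber - 1;
    }

    // Reads the value at the given reference from an in-memory table.
    public static object Read(DataTable table, string reference)
    {
        int row, column;
        Parse(reference, out row, out column);
        return table.Rows[row][column];
    }

    // Writes a value at the given reference in the in-memory table.
    public static void Write(DataTable table, string reference, object value)
    {
        int row, column;
        Parse(reference, out row, out column);
        table.Rows[row][column] = value;
    }
}
```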
You can use this method:
private DataTable LoadXLS(string filePath)
{
DataTable table = new DataTable();
table.Columns.Add("Tags", typeof(string));
table.Columns.Add("ReplaceWords", typeof(string));
// The Jet provider reads .xls (Excel 97-2003) files; the sheet is expected to be named Sheet1.
string connectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source='" + filePath + "';Extended Properties=Excel 8.0;";
using (OleDbConnection cnLogin = new OleDbConnection(connectionString))
using (OleDbCommand comDB = new OleDbCommand("SELECT * FROM [Sheet1$]", cnLogin))
{
cnLogin.Open();
using (OleDbDataReader drJobs = comDB.ExecuteReader(CommandBehavior.Default))
{
while (drJobs.Read())
{
// Copy the first two columns of each sheet row into the table.
DataRow row = table.NewRow();
row["Tags"] = drJobs[0].ToString();
row["ReplaceWords"] = drJobs[1].ToString();
table.Rows.Add(row);
}
}
}
return table;
}
And use like this:
DataTable dtXLS = LoadXLS(path);
//and do what you need
If you need to write to Excel, check this out: http://msdn.microsoft.com/en-us/library/dd264733.aspx
An easy way to handle Excel files and operations on them is the following:
Add the Microsoft.Office.Interop.Excel reference to your project (Add Reference... => search under the .NET tab => add the reference).
Create a new Excel application and open the workbook:
Excel.Application application = new Excel.Application();
Excel.Workbook workbook = application.Workbooks.Open(workBookPath);
Excel.Worksheet worksheet = workbook.Sheets[worksheetNumber];
You can get the row and column counts with the following lines:
var endColumn = worksheet.Columns.CurrentRegion.EntireColumn.Count;
var endRow = worksheet.Rows.CurrentRegion.EntireRow.Count;
Reading values from a cell or a range of cells can be done as follows (rowIndex is the number of the row whose cells you want to read):
System.Array values = (System.Array)worksheet.get_Range("A" +
rowIndex.ToString(), "D" + rowIndex.ToString()).Cells.Value;
I'm importing data into a SQL Server 2008 database from an Excel file whose first row contains the headers (HDR=1). The thing is that the second row is also a kind of header, which I don't need to import. So how do I ignore the second row of that Excel file (I guess that if the first row is the header, the actual second row in Excel counts as the first data row)?
In MySQL it is just a matter of adding IGNORE 1 LINES at the end of the import command. How do I do it in SQL Server?
Here is the part of the code that does the import:
//Create Connection to Excel work book
OleDbConnection excelConnection = new OleDbConnection(excelConnectionString);
//Create OleDbCommand to fetch data from Excel
OleDbCommand cmd = new OleDbCommand("Select [task_code],[status_code],[wbs] from [task$]", excelConnection);
excelConnection.Open();
OleDbDataReader dReader;
dReader = cmd.ExecuteReader();
SqlBulkCopy sqlBulk = new SqlBulkCopy(connectionString);
//Give your Destination table name
sqlBulk.DestinationTableName = "task";
sqlBulk.WriteToServer(dReader);
sqlBulk.Close();
Thanks
Use the following:
...
OleDbDataReader dReader;
dReader = cmd.ExecuteReader();
if( !dReader.Read() || !dReader.Read())
return "No data";
SqlBulkCopy sqlBulk = new SqlBulkCopy(connectionString);
...
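The idea behind the snippet above is that each successful `Read()` moves the reader forward one row, so whatever consumes the reader afterwards continues from the following row. A small helper makes the number of skipped rows explicit; `ReaderHelper.SkipRows` is a hypothetical name, and how many rows actually need skipping depends on whether the provider has already consumed the header row (the HDR setting), so treat the count as something to verify against your own file.

```csharp
using System.Data;

public static class ReaderHelper
{
    // Advances the reader past `count` rows.
    // Returns false if the reader ran out of rows before skipping them all.
    public static bool SkipRows(IDataReader reader, int count)
    {
        for (int i = 0; i < count; i++)
        {
            if (!reader.Read())
            {
                return false;
            }
        }

        return true;
    }
}
```

In the import code this would sit between `cmd.ExecuteReader()` and `sqlBulk.WriteToServer(dReader)`, for example `if (!ReaderHelper.SkipRows(dReader, 1)) return "No data";`.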
A quick solution would be to:
Copy the file
Use Office interop to delete the second line of the spreadsheet
Import the amended spreadsheet
To delete the line from the spreadsheet:
public static void DeleteRow(string pathToFile, string sheetName, string cellRef)
{
Application app= new Application();
Workbook workbook = app.Workbooks.Open(pathToFile);
for (int sheetNum = 1; sheetNum < workbook.Sheets.Count + 1; sheetNum++)
{
Worksheet sheet = (Worksheet)workbook.Sheets[sheetNum];
if (sheet.Name != sheetName)
{
continue;
}
Range secondRow = sheet.Range[cellRef];
secondRow.EntireRow.Delete();
}
workbook.Save();
workbook.Close();
app.Quit();
}