I'm writing a program in C# that uses a DataSet, among other things. I have about 200,000 values that I want to export to an .xlsx document.
My code:
using Excel = Microsoft.Office.Interop.Excel;
...
Excel.Application excelApp = new Excel.Application();
Excel.Workbook excelworkbook = excelApp.Workbooks.Open(/location/);
Excel._Worksheet excelworkSheet = (Excel.Worksheet)excelApp.ActiveSheet;
...
excelApp.Visible = true;
...
// Excel cell indices are 1-based, so offset the 0-based loop counters
for (int i = 0; i < /value/; i++)
    for (int j = 0; j < /value/; j++)
        excelworkSheet.Cells[i + 1, j + 1] = /value/;
It works well, but it is too slow (at least 5-10 minutes).
Have you got any advice?
I just ran into the same performance hit and wrote this to benchmark it:
[Test]
public void WriteSpeedTest()
{
var excelApp = new Application();
var workbook = excelApp.Workbooks.Add();
var sheet = (Worksheet)workbook.Worksheets[1];
int n = 1000;
var stopwatch = Stopwatch.StartNew();
SeparateWrites(sheet, n);
Console.WriteLine("SeparateWrites(sheet, " + n + "); took: " + stopwatch.ElapsedMilliseconds + " ms");
stopwatch.Restart();
BatchWrite(sheet, n);
Console.WriteLine("BatchWrite(sheet, " + n + "); took: " + stopwatch.ElapsedMilliseconds + " ms");
workbook.SaveAs(Path.Combine(@"C:\TEMP", "Test"));
workbook.Close(false);
// Quit Excel before releasing the COM object, otherwise the process stays alive
excelApp.Quit();
Marshal.FinalReleaseComObject(excelApp);
}
private static void BatchWrite(Worksheet sheet, int n)
{
string[,] strings = new string[n, 1];
var array = Enumerable.Range(1, n).ToArray();
for (var index = 0; index < array.Length; index++)
{
strings[index, 0] = array[index].ToString();
}
sheet.Range["B1", "B" + n].set_Value(null, strings);
}
private static void SeparateWrites(Worksheet sheet, int n)
{
for (int i = 1; i <= n; i++)
{
sheet.Cells[i, 1].Value = i.ToString();
}
}
Results:
                             n = 100    n = 1 000    n = 10 000
SeparateWrites(sheet, n);     180 ms      1125 ms      10972 ms
BatchWrite(sheet, n);           3 ms         4 ms         14 ms
I have only programmed Excel through VBA, so I cannot give you the exact C# syntax.
What I notice, though, is that you are doing something many people are tempted to do:
writing to each cell in Excel separately.
Read/write operations against the sheet are slow compared with operations performed in memory.
It is more efficient to pass an array of data to a function that writes all of it to a defined range in one step. Before doing so, of course, you need to set the dimensions of the range correctly (equal to the size of the array).
Done this way, performance should improve.
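Sizing the range to match the array can be sketched in C# as follows. This is only an illustration: the class name RangeAddress and its two methods are hypothetical helpers, not part of the Excel interop API. They compute an A1-style address that has exactly the same dimensions as a two-dimensional array.

```csharp
using System;

public static class RangeAddress
{
    // Converts a 1-based column number to Excel letters (1 -> "A", 27 -> "AA").
    public static string ColumnLetters(int column)
    {
        if (column < 1) throw new ArgumentOutOfRangeException(nameof(column));
        string letters = "";
        while (column > 0)
        {
            int remainder = (column - 1) % 26;
            letters = (char)('A' + remainder) + letters;
            column = (column - 1) / 26;
        }
        return letters;
    }

    // Returns the address of the block that exactly fits `values` when its
    // top-left cell is at (firstRow, firstColumn), both 1-based.
    public static string For(object[,] values, int firstRow, int firstColumn)
    {
        int lastRow = firstRow + values.GetLength(0) - 1;
        int lastColumn = firstColumn + values.GetLength(1) - 1;
        return ColumnLetters(firstColumn) + firstRow + ":" +
               ColumnLetters(lastColumn) + lastRow;
    }
}
```

With interop, the resulting string could then be used for a single assignment along the lines of sheet.Range[RangeAddress.For(values, 1, 1)].Value = values; so the whole array crosses the COM boundary once.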
I have an input table in a DataGridView (the output is shown in green) and I need to get this output:
'Start of block'     'Size'   'TypKar'
1.2.2017 0:00:02     14       6280
1.2.2017 0:03:33     2        3147
1.2.2017 0:04:17     2        4147
1.2.2017 0:04:28     2        6280
1.2.2017 0:04:59     10       3147
Right now I use a for loop in which I write the first entry and then count until the value in column TypKar changes. When it changes, I write the date and type and start counting from 1 again.
for(int i = 0; i < dviewExport.RowCount; i++)
{
//first line in excel
if(totalCount == 0)
{
totalCount = 32;
signCount = 1;
excelWsExport.Cells[totalCount, 2] = (DateTime)dviewExport[0, i].Value;
excelWsExport.Cells[totalCount, 3] = 1;
excelWsExport.Cells[totalCount, 4] = dviewExport["TypKar", i].Value;
continue;
}
//value is same = just increment
if((excelWsExport.Cells[totalCount, 4] as Excel.Range).Value.ToString() == dviewExport["TypKar", i].Value.ToString())
{
excelWsExport.Cells[totalCount, 3] = (excelWsExport.Cells[totalCount, 3] as Excel.Range).Value + 1;
signCount++;
if(maxCount < signCount)
maxCount = signCount;
}
//value changed = write new line and restart incrementing
else
{
totalCount++;
signCount = 1;
excelWsExport.Cells[totalCount, 2] = (DateTime)dviewExport[0, i].Value;
excelWsExport.Cells[totalCount, 3] = 1;
excelWsExport.Cells[totalCount, 4] = dviewExport["TypKar", i].Value;
}
}
The problem is that I write each value to Excel directly, and when the data has several thousand rows it takes a lot of time.
Is it possible to speed this up with Excel interop (write to an array and then paste the array into Excel), or with SQL, LINQ, or anything else?
I tried to find a similar problem with answers, but I don't know how to describe my problem.
In one of the applications I'm working on right now I use something similar to:
string connectionString = "my connection string";
for (int i = 0; i < dataGridView1.RowCount - 1; i++)
{
    DataGridViewRow row = dataGridView1.Rows[i];
    SqlConnection conn = new SqlConnection(connectionString);
    conn.Open();
    try
    {
        var queryString = "INSERT INTO [SQLdb] " +
                          "(columnNamesInDB) " +
                          "VALUES (@dataBeingRead)";
        SqlCommand comm = new SqlCommand(queryString, conn);
        // bind @dataBeingRead from `row` here before executing
        comm.ExecuteNonQuery();
    }
    catch (Exception e)
    {
        //catch behavior
    }
    finally
    {
        // SqlCommand has no Close(); close the connection instead
        conn.Close();
    }
}
This loops through every row in the grid view and inserts it into SQL Server. It works pretty quickly for our purposes (currently in the ~1000 range).
Based on Export a C# List of Lists to Excel, I managed to speed things up by filling generic lists, copying them into two-dimensional object arrays, and then assigning those arrays to Excel ranges. This is much faster than writing to an Excel cell each time.
The problem is that Excel does not accept a List<T> or a plain one-dimensional array here. You have to send Excel an object[,] (two-dimensional), and since I had just one dimension, I made the second dimension 1.
//create generic lists
List<DateTime> listDate = new List<DateTime>();
List<int> listSize = new List<int>();
List<string> listSign = new List<string>();
//fill lists with data from wherever
for(int i = 0; i < dviewExport.RowCount; i++)
{
if(listSign.Count == 0)
{
signCount = 1;
listDate.Add((DateTime)dviewExport[0, i].Value);
listSize.Add(1);
listSign.Add((string)dviewExport[$"{Sign}", i].Value);
continue;
}
if(listSign[listSign.Count - 1] == dviewExport[$"{Sign}", i].Value.ToString())
{
listSize[listSize.Count - 1] += 1;
signCount++;
if(maxCount < signCount)
maxCount = signCount;
}
else
{
signCount = 1;
listDate.Add((DateTime)dviewExport[0, i].Value);
listSize.Add(1);
listSign.Add((string)dviewExport[$"{Sign}", i].Value);
}
}
//create two dimensional object lists with size of generic lists
object[,] outDate = new object[listDate.Count, 1];
object[,] outSize = new object[listSize.Count, 1];
object[,] outSign = new object[listSign.Count, 1];
//fill two dimensional object lists with data from generic lists
for(int row = 0; row < listDate.Count; row++)
{
outDate[row, 0] = listDate[row];
outSize[row, 0] = listSize[row];
outSign[row, 0] = listSign[row];
}
//set Excel ranges and paste lists
//the last row is 32 + Count - 1, so each range has exactly as many rows as its array
range = excelWsExport.get_Range($"B32:B{32 + listDate.Count - 1}", Type.Missing);
range.NumberFormat = "d.MM.yyyy H:mm:ss";
range.Value = outDate;
range = excelWsExport.get_Range($"C32:C{32 + listSize.Count - 1}", Type.Missing);
range.Value = outSize;
range = excelWsExport.get_Range($"D32:D{32 + listSign.Count - 1}", Type.Missing);
range.Value = outSign;
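As a possible variation on the code above, the grouping can be separated from the Excel calls and the result kept in a single three-column object[,], so that only one range assignment is needed instead of three. This is a minimal sketch, assuming the rows have already been read out of the grid into (time, type) pairs; the class name BlockGrouper and its Group method are made up for the illustration.

```csharp
using System;
using System.Collections.Generic;

public static class BlockGrouper
{
    // Collapses consecutive rows with the same type into one output row:
    // column 0 = start of block, column 1 = block size, column 2 = type.
    public static object[,] Group(IReadOnlyList<(DateTime Time, string Type)> rows)
    {
        var blocks = new List<(DateTime Start, int Size, string Type)>();
        foreach (var row in rows)
        {
            int last = blocks.Count - 1;
            if (last >= 0 && blocks[last].Type == row.Type)
            {
                // Same type as the current block: just extend it.
                var block = blocks[last];
                blocks[last] = (block.Start, block.Size + 1, block.Type);
            }
            else
            {
                // Type changed (or first row): start a new block.
                blocks.Add((row.Time, 1, row.Type));
            }
        }

        // Copy the blocks into the two-dimensional shape Excel expects.
        var output = new object[blocks.Count, 3];
        for (int i = 0; i < blocks.Count; i++)
        {
            output[i, 0] = blocks[i].Start;
            output[i, 1] = blocks[i].Size;
            output[i, 2] = blocks[i].Type;
        }
        return output;
    }
}
```

The returned array could then be assigned to one range covering columns B to D, starting at row 32 and ending at row 32 + rowCount - 1, provided there is at least one block.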
I am trying to extract all text data from an Excel document in C# and am having performance issues. In the following code I open the Workbook, loop over all worksheets, and loop over all cells in the used range, extracting the text from each cell as I go. The problem is, this takes 14 seconds to execute.
public class ExcelFile
{
public string Path = @"C:\test.xlsx";
private Excel.Application xl = new Excel.Application();
private Excel.Workbook WB;
public string FullText;
private Excel.Range rng;
private Dictionary<string, string> Variables;
public ExcelFile()
{
WB = xl.Workbooks.Open(Path);
xl.Visible = true;
foreach (Excel.Worksheet CurrentWS in WB.Worksheets)
{
rng = CurrentWS.UsedRange;
for (int i = 1; i < rng.Count; i++)
{ FullText += rng.Cells[i].Value; }
}
WB.Close(false);
xl.Quit();
}
}
Whereas in VBA I would do something like this, which takes ~1 second:
Sub run()
Dim strText As String
For Each ws In ActiveWorkbook.Sheets
For Each c In ws.UsedRange
strText = strText & c.Text
Next c
Next ws
End Sub
Or, even faster (less than 1 second):
Sub RunFast()
Dim strText As String
Dim varCells As Variant
For Each ws In ActiveWorkbook.Sheets
varCells = ws.UsedRange
For i = 1 To UBound(varCells, 1)
For j = 1 To UBound(varCells, 2)
strText = strText & CStr(varCells(i, j))
Next j
Next i
Next ws
End Sub
Perhaps something is happening in the for loop in C# that I'm not aware of? Is it possible to load a range into an array-type object (as in my last example) to allow iteration over just the values, not the cell objects?
Excel and C# run in different environments completely. C# runs in the .NET framework using managed memory while Excel is a native C++ application and runs in unmanaged memory. Translating data between these two (a process called "marshaling") is extremely expensive in terms of performance.
Tweaking your code isn't going to help. For loops, string construction, etc. are all blazingly fast compared to the marshaling process. The only way you are going to get significantly better performance is to reduce the number of trips that have to cross the interprocess boundary. Extracting data cell by cell is never going to get you the performance you want.
Here are a couple options:
Write a sub or function in VBA that does everything you want, then call that sub or function via interop. Walkthrough.
Use interop to save the worksheet to a temporary file in CSV format, then open the file using C#. You will need to loop through and parse the file to get it into a useful data structure, but this loop will go much faster.
Use interop to save a range of cells to the clipboard, then use C# to read the clipboard directly.
I use this function. The loops are only for converting to array starting at index 0, the main work is done in object[,] tmp = range.Value.
public object[,] GetTable(int row, int col, int width, int height)
{
object[,] arr = new object[height, width];
Range c1 = (Range)Worksheet.Cells[row + 1, col + 1];
Range c2 = (Range)Worksheet.Cells[row + height, col + width];
Range range = Worksheet.get_Range(c1, c2);
object[,] tmp = range.Value;
for (int i = 0; i < height; ++i)
{
for (int j = 0; j < width; ++j)
{
arr[i, j] = tmp[i + tmp.GetLowerBound(0), j + tmp.GetLowerBound(1)];
}
}
return arr;
}
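To build the full text from such an array without going back to Excel for each cell, something like the following sketch could be used. It assumes the array came from Range.Value on a multi-cell range, which interop returns with a lower bound of 1 in both dimensions (a single-cell range returns a scalar instead, so that case would need separate handling). The names RangeText, Concat, and NewOneBased are hypothetical; NewOneBased only exists so the loop can be tried without Excel.

```csharp
using System;
using System.Text;

public static class RangeText
{
    // Concatenates every non-null entry of a 2D array, row by row.
    // Uses the array's own bounds, so it works for 0-based arrays and for
    // the 1-based arrays that Excel interop returns.
    public static string Concat(object[,] values)
    {
        var sb = new StringBuilder();
        for (int r = values.GetLowerBound(0); r <= values.GetUpperBound(0); r++)
        {
            for (int c = values.GetLowerBound(1); c <= values.GetUpperBound(1); c++)
            {
                if (values[r, c] != null)
                    sb.Append(values[r, c]);
            }
        }
        return sb.ToString();
    }

    // Builds an empty 1-based array shaped like an interop result.
    public static object[,] NewOneBased(int rows, int columns)
    {
        return (object[,])Array.CreateInstance(
            typeof(object), new[] { rows, columns }, new[] { 1, 1 });
    }
}
```

Because the loop reads the bounds from the array itself, the copy to a 0-based array is not strictly needed when the values are only being read.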
One thing which will speed it up is to use a StringBuilder instead of += on the previous string. Strings are immutable in C# and therefore you are creating a ton of extra strings during your process of creating the final string.
Additionally you may improve performance looping over the row, column positions instead of looping over the index.
Here is the code changed with a StringBuilder and row, column positional looping:
public class ExcelFile
{
public string Path = @"C:\test.xlsx";
private Excel.Application xl = new Excel.Application();
private Excel.Workbook WB;
public string FullText;
private Excel.Range rng;
private Dictionary<string, string> Variables;
public ExcelFile()
{
StringBuilder sb = new StringBuilder();
WB = xl.Workbooks.Open(Path);
xl.Visible = true;
foreach (Excel.Worksheet CurrentWS in WB.Worksheets)
{
rng = CurrentWS.UsedRange;
for (int i = 1; i <= rng.Rows.Count; i++)
{
for (int j = 1; j <= rng.Columns.Count; j++)
{
sb.Append(rng.Cells[i, j].Value);
}
}
}
FullText = sb.ToString();
WB.Close(false);
xl.Quit();
}
}
I sympathize with you, pwwolff. Looping through Excel cells can be expensive. Antonio and Max are both correct, but John Wu's answer sums it up nicely. Using a StringBuilder may speed things up, and building an object array from the used range is, IMHO, about as fast as you are going to get using interop. I understand there are third-party libraries that may perform better. With interop, looping through each cell will take an unacceptable amount of time if the file is large.
On the tests below I used a workbook with a single sheet where the sheet has 11 columns and 100 rows of used range data. Using an object array implementation this took a little over a second. With 735 rows it took around 40 seconds.
I put three buttons on a form with a multi-line text box. The first button uses your posted code. The second button takes the ranges out of the loops. The third button uses an object array approach. Each one is significantly faster than the one before it. I used a text box on the form to output the data; you can keep using a string as you are, but a StringBuilder would be better if you must have one big string.
Again, if the files are large you may want to consider another implementation. Hope this helps.
private void button1_Click(object sender, EventArgs e) {
Stopwatch sw = new Stopwatch();
MessageBox.Show("Start DoExcel...");
sw.Start();
DoExcel();
sw.Stop();
MessageBox.Show("End DoExcel...Took: " + sw.Elapsed.Seconds + " seconds and " + sw.Elapsed.Milliseconds + " Milliseconds");
}
private void button2_Click(object sender, EventArgs e) {
MessageBox.Show("Start DoExcel2...");
Stopwatch sw = new Stopwatch();
sw.Start();
DoExcel2();
sw.Stop();
MessageBox.Show("End DoExcel2...Took: " + sw.Elapsed.Seconds + " seconds and " + sw.Elapsed.Milliseconds + " Milliseconds");
}
private void button3_Click(object sender, EventArgs e) {
MessageBox.Show("Start DoExcel3...");
Stopwatch sw = new Stopwatch();
sw.Start();
DoExcel3();
sw.Stop();
MessageBox.Show("End DoExcel3...Took: " + sw.Elapsed.Seconds + " seconds and " + sw.Elapsed.Milliseconds + " Milliseconds");
}
// object[,] array implementation
private void DoExcel3() {
textBox1.Text = "";
string Path = @"D:\Test\Book1 - Copy.xls";
Excel.Application xl = new Excel.Application();
Excel.Workbook WB;
Excel.Range rng;
WB = xl.Workbooks.Open(Path);
xl.Visible = true;
int totalRows = 0;
int totalCols = 0;
foreach (Excel.Worksheet CurrentWS in WB.Worksheets) {
rng = CurrentWS.UsedRange;
totalCols = rng.Columns.Count;
totalRows = rng.Rows.Count;
object[,] objectArray = (object[,])rng.Cells.Value;
// The array returned by Excel is 1-based, so include the last row and column
for (int row = 1; row <= totalRows; row++) {
for (int col = 1; col <= totalCols; col++) {
if (objectArray[row, col] != null)
textBox1.Text += objectArray[row,col].ToString();
}
textBox1.Text += Environment.NewLine;
}
}
WB.Close(false);
xl.Quit();
Marshal.ReleaseComObject(WB);
Marshal.ReleaseComObject(xl);
}
// Range taken out of loops
private void DoExcel2() {
textBox1.Text = "";
string Path = @"D:\Test\Book1 - Copy.xls";
Excel.Application xl = new Excel.Application();
Excel.Workbook WB;
Excel.Range rng;
WB = xl.Workbooks.Open(Path);
xl.Visible = true;
int totalRows = 0;
int totalCols = 0;
foreach (Excel.Worksheet CurrentWS in WB.Worksheets) {
rng = CurrentWS.UsedRange;
totalCols = rng.Columns.Count;
totalRows = rng.Rows.Count;
// Cell indices are 1-based, so include the last row and column
for (int row = 1; row <= totalRows; row++) {
for (int col = 1; col <= totalCols; col++) {
textBox1.Text += rng.Rows[row].Cells[col].Value;
}
textBox1.Text += Environment.NewLine;
}
}
WB.Close(false);
xl.Quit();
Marshal.ReleaseComObject(WB);
Marshal.ReleaseComObject(xl);
}
// original posted code
private void DoExcel() {
textBox1.Text = "";
string Path = @"D:\Test\Book1 - Copy.xls";
Excel.Application xl = new Excel.Application();
Excel.Workbook WB;
Excel.Range rng;
WB = xl.Workbooks.Open(Path);
xl.Visible = true;
foreach (Excel.Worksheet CurrentWS in WB.Worksheets) {
rng = CurrentWS.UsedRange;
for (int i = 1; i < rng.Count; i++) {
textBox1.Text += rng.Cells[i].Value;
}
}
WB.Close(false);
xl.Quit();
Marshal.ReleaseComObject(WB);
Marshal.ReleaseComObject(xl);
}
I'm using the following code snippet to write some data into an Excel file using EPPlus. My application does some big data processing, and since Excel has a limit of ~1 million rows, space runs out from time to time. What I am trying to achieve is this: once a System.ArgumentException (row out of range) is detected, in other words when no space is left in the worksheet, the remainder of the data is written to the second worksheet in the same workbook. I have tried the following code but with no success yet. Any help will be appreciated!
try
{
for (int i = 0; i < data.Count(); i++)
{
var cell1 = ws.Cells[rowIndex, colIndex];
cell1.Value = data[i];
colIndex++;
}
rowIndex++;
}
catch (System.ArgumentException)
{
for (int i = 0; i < data.Count(); i++)
{
var cell2 = ws1.Cells[rowIndex, colIndex];
cell2.Value = data[i];
colIndex++;
}
rowIndex++;
}
You shouldn't use a catch to handle that kind of logic; it is more of a last resort. It is better to engineer your code to deal with the situation, since it is very predictable.
The Excel 2007 format has a hard limit of 1,048,576 rows per sheet. With that, you know exactly how many rows you can write before moving to a new sheet. From there it is simple for loops and math:
[TestMethod]
public void Big_Row_Count_Test()
{
var existingFile = new FileInfo(@"c:\temp\temp.xlsx");
if (existingFile.Exists)
existingFile.Delete();
const int maxExcelRows = 1048576;
using (var package = new ExcelPackage(existingFile))
{
//Assume a data row count
var rowCount = 2000000;
//Determine number of sheets
var sheetCount = (int)Math.Ceiling((double)rowCount/ maxExcelRows);
for (var i = 0; i < sheetCount; i++)
{
var ws = package.Workbook.Worksheets.Add(String.Format("Sheet{0}", i));
var sheetRowLimit = Math.Min((i + 1)*maxExcelRows, rowCount);
//Remember +1 for 1-based excel index
for (var j = i * maxExcelRows + 1; j <= sheetRowLimit; j++)
{
var cell1 = ws.Cells[j - (i*maxExcelRows), 1];
cell1.Value = j;
}
}
package.Save();
}
}
I'm trying to save values from a List<string> to an Excel worksheet using EPPlus, so I wrote this code:
private void button3_Click(object sender, EventArgs e)
{
int value = bdCleanList.Count() / Int32.Parse(textBox7.Text);
string bases_generadas = System.IO.Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "bases_generadas");
var package = new ExcelPackage();
package.Workbook.Worksheets.Add("L1");
ExcelWorksheet worksheet = package.Workbook.Worksheets[1];
worksheet.Name = "L1";
int j = 2;
int col = 1;
for (int i = 1; i < bdCleanList.Count(); i++)
{
if (i%Int32.Parse(textBox7.Text) == 0)
{
package.Workbook.Worksheets.Add("L" + j);
worksheet = package.Workbook.Worksheets[j];
worksheet.Name = "L" + j;
j += 1;
worksheet.Cells[i, col].Value = bdCleanList[i];
}
else
{
worksheet.Cells[i, col].Value = bdCleanList[i];
}
}
Byte[] bin = package.GetAsByteArray();
File.WriteAllBytes(System.IO.Path.Combine(bases_generadas, "bases_generadas_" + DateTime.Now.Ticks.ToString() + DateTime.Now.ToString("dd-MM-yyyy-hh-mm-ss") + ".xlsx"), bin);
MessageBox.Show("Se generaron un total de " + value + " bases y puede encontrarlas en la siguiente ruta: " + System.IO.Path.Combine(bases_generadas, "bases_generadas_" + DateTime.Now.Ticks.ToString() + DateTime.Now.ToString("dd-MM-yyyy-hh-mm-ss") + ".xlsx"), "Información", MessageBoxButtons.OK, MessageBoxIcon.Information);
}
In the sample I'm running, bdCleanList.Count() is 2056 and Int32.Parse(textBox7.Text) is 500, so value evaluates to 4 (integer division) and the loop creates five worksheets, L1 to L5. The problem is that the values for L2, L3 ... L5 aren't saved and I don't know why. The values for the first worksheet are saved fine but the rest aren't. What's wrong in my code? How do I set the active worksheet in order to save values on it? How do I move between worksheets?
OK, after some days of headache, many hours rereading my code, and searching the Internet, I found the solution: my code was "good", but I didn't see the mistake until I debugged it line by line several times. The line worksheet.Cells[i, col].Value = bdCleanList[i]; is where I set the cell values, and it does work. For L1 everything looked fine: the cells start at row 1 and end at row 499. For L2, however, i was already at 500 when the sheet was created, so the values were written from row 500 onwards, and I never scrolled that far down the column. That was the problem. So I changed my code to this:
int pos = 1;
for (int i = 0; i < bdCleanList.Count(); i++)
{
if ((i + 1) % Int32.Parse(textBox7.Text) == 0)
{
package.Workbook.Worksheets.Add("B" + j);
worksheet = package.Workbook.Worksheets[j];
worksheet.Name = "B" + j;
j += 1;
pos = 1;
}
worksheet.Cells[pos, 1].Value = bdCleanList[i];
pos++;
}
And that does the job as I want. Thanks to everyone here for trying to help me.
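The same idea (a row position that restarts on every sheet) can also be expressed by splitting the list into per-sheet chunks first and only then writing each chunk from row 1. The following is a rough sketch and not EPPlus code; SheetSplitter and Split are hypothetical names. Note that it starts a new chunk whenever i % perSheet == 0, so each sheet holds exactly perSheet items, which is a slightly different boundary from the (i + 1) check used above.

```csharp
using System;
using System.Collections.Generic;

public static class SheetSplitter
{
    // Splits `items` into consecutive chunks of at most `perSheet` entries.
    // Each chunk is meant for one worksheet, written from row 1 downwards.
    public static List<List<T>> Split<T>(IReadOnlyList<T> items, int perSheet)
    {
        if (perSheet < 1) throw new ArgumentOutOfRangeException(nameof(perSheet));
        var sheets = new List<List<T>>();
        for (int i = 0; i < items.Count; i++)
        {
            // Start a new chunk every `perSheet` items; the row inside the
            // sheet is then (i % perSheet) + 1, independent of i itself.
            if (i % perSheet == 0)
                sheets.Add(new List<T>());
            sheets[sheets.Count - 1].Add(items[i]);
        }
        return sheets;
    }
}
```

With 2056 values and 500 per sheet this gives five chunks, the last one holding the remaining 56 values; chunk k would go to the worksheet named "L" + (k + 1), with its n-th value in row n + 1.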
A rather huge dataset with 16000 x 12 entries needs to be dumped into a worksheet.
I use the following function now:
for (int r = 0; r < dt.Rows.Count; ++r)
{
for (int c = 0; c < dt.Columns.Count; ++c)
{
worksheet.Cells[c + 1][r + 1] = dt.Rows[r][c].ToString();
}
}
I reduced the example to its core.
Here is what I implemented after reading the suggestion from Dave Zych.
This works great.
private static void AppendWorkSheet(Excel.Workbook workbook, DataSet data, String tableName)
{
Excel.Worksheet worksheet;
if (UsedSheets == 0) worksheet = workbook.Worksheets[1];
else worksheet = workbook.Worksheets.Add();
UsedSheets++;
DataTable dt = data.Tables[0];
var valuesArray = new object[dt.Rows.Count, dt.Columns.Count];
for (int r = 0; r < dt.Rows.Count; ++r)
{
for (int c = 0; c < dt.Columns.Count; ++c)
{
valuesArray[r, c] = dt.Rows[r][c].ToString();
}
}
Excel.Range c1 = (Excel.Range)worksheet.Cells[1, 1];
Excel.Range c2 = (Excel.Range)worksheet.Cells[dt.Rows.Count, dt.Columns.Count];
Excel.Range range = worksheet.get_Range(c1, c2);
range.Cells.Value2 = valuesArray;
worksheet.Name = tableName;
}
Build a 2D array of your values from your DataSet, and then you can set a range of values in Excel to the values of the array.
object[,] valuesArray = new object[dt.Rows.Count, dt.Columns.Count];
for(int i = 0; i < dt.Rows.Count; i++)
{
//If you know the number of columns you have, you can specify them this way
//Otherwise use an inner for loop on columns
valuesArray[i, 0] = dt.Rows[i]["ColumnName"].ToString();
valuesArray[i, 1] = dt.Rows[i]["ColumnName2"].ToString();
...
}
//Calculate the second column value by the number of columns in your dataset
//"O" is just an example in this case
//Also note: Excel is 1 based index
var sheetRange = worksheet.get_Range("A2:O2",
string.Format("A{0}:O{0}", dt.Rows.Count + 1));
sheetRange.Cells.Value2 = valuesArray;
This is much, much faster than looping and setting each cell individually. If you're setting each cell individually, you have to talk to Excel through COM (for lack of a better phrase) for each cell (which in your case is ~192,000 times), which is incredibly slow. Looping, building your array and only talking to Excel once removes much of that overhead.