Fastest way to drop a DataSet into a worksheet - c#

A fairly large dataset with 16000 x 12 entries needs to be dumped into a worksheet.
I use the following function now:
for (int r = 0; r < dt.Rows.Count; ++r)
{
for (int c = 0; c < dt.Columns.Count; ++c)
{
worksheet.Cells[c + 1][r + 1] = dt.Rows[r][c].ToString();
}
}
I reduced the example to its central piece.
Here is what I implemented after reading the suggestion from Dave Zych.
This works great.
private static void AppendWorkSheet(Excel.Workbook workbook, DataSet data, String tableName)
{
Excel.Worksheet worksheet;
if (UsedSheets == 0) worksheet = workbook.Worksheets[1];
else worksheet = workbook.Worksheets.Add();
UsedSheets++;
DataTable dt = data.Tables[0];
var valuesArray = new object[dt.Rows.Count, dt.Columns.Count];
for (int r = 0; r < dt.Rows.Count; ++r)
{
for (int c = 0; c < dt.Columns.Count; ++c)
{
valuesArray[r, c] = dt.Rows[r][c].ToString();
}
}
Excel.Range c1 = (Excel.Range)worksheet.Cells[1, 1];
Excel.Range c2 = (Excel.Range)worksheet.Cells[dt.Rows.Count, dt.Columns.Count];
Excel.Range range = worksheet.get_Range(c1, c2);
range.Cells.Value2 = valuesArray;
worksheet.Name = tableName;
}

Build a 2D array of your values from your DataSet, and then you can set a range of values in Excel to the values of the array.
var valuesArray = new object[dt.Rows.Count, dt.Columns.Count];
for(int i = 0; i < dt.Rows.Count; i++)
{
//If you know the number of columns you have, you can specify them this way
//Otherwise use an inner for loop on columns
valuesArray[i, 0] = dt.Rows[i]["ColumnName"].ToString();
valuesArray[i, 1] = dt.Rows[i]["ColumnName2"].ToString();
...
}
//Calculate the second column value by the number of columns in your dataset
//"O" is just an example in this case
//Also note: Excel is 1 based index
var sheetRange = worksheet.get_Range("A2:O2",
string.Format("A{0}:O{0}", dt.Rows.Count + 1));
sheetRange.Cells.Value2 = valuesArray;
This is much, much faster than looping and setting each cell individually. If you're setting each cell individually, you have to talk to Excel through COM (for lack of a better phrase) for each cell (which in your case is ~192,000 times), which is incredibly slow. Looping, building your array and only talking to Excel once removes much of that overhead.
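If you don't want to list the columns by hand, the array-building step can be wrapped in a small helper. This is only a sketch: the names DataTableExport and ToValuesArray are made up for illustration, and mapping DBNull to null is my own choice, not something from the answer above. It uses nothing but System.Data, so it can be tried without Excel installed.

```csharp
using System;
using System.Data;

public static class DataTableExport
{
    // Copies every cell of the table into a rectangular object[,] array.
    // DBNull is mapped to null (an illustrative choice, not a requirement).
    public static object[,] ToValuesArray(DataTable table)
    {
        int rows = table.Rows.Count;
        int cols = table.Columns.Count;
        var values = new object[rows, cols];

        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < cols; c++)
            {
                object cell = table.Rows[r][c];
                values[r, c] = cell == DBNull.Value ? null : cell;
            }
        }
        return values;
    }
}
```

The result can then be assigned to a range of matching size in one call, the same way valuesArray is assigned to Value2 above.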

Related

How to copy data from one column to another at the same datatable in c#

I have a DataTable filled with information from an Excel file. It has more than four columns, but for the sake of the example I'm only showing four of them. I have to write a program in which, if the value of the cell in column C is 0, I copy column B to column A. If the value of the cell in column C is > 0, I have to copy column B to A and also add another row in which the value of column C is copied to A.
What I have so far is:
for (int r = 2; r <= ws.UsedRange.Rows.Count; r++)
{ if (ws.UsedRange.Cells[r, 3].Text == "0")
{
DataRow row = dt.NewRow();
for (int c = 1; c < ws.UsedRange.Columns.Count; c++)
{
string cell = ws.Cells[r, c].Text;
row[c - 1] = cell;
}
}
So my questions are:
How can I copy one column to another in the same DataTable (copy B to A)?
How can I add another row and copy the value of C to A only for that row?
Here is the full code:
public DataTable ReadExcel2(string file)
{
ExcelI.Application app = new ExcelI.Application(); //create an excel instance
ExcelI.Workbook wb = app.Workbooks.Open(file, ReadOnly: true); //open a file
ExcelI.Worksheet ws = wb.Worksheets[1]; //choose a sheet. The first one
var rng = ws.UsedRange;
//takes the index of the columns that are going to be filtered
int service = ColumnIndexByName(ws.Cells[1, 1].EntireRow, "Service");
int status = ColumnIndexByName(ws.Cells[1, 1].EntireRow, "Status");
int code = ColumnIndexByName(ws.Cells[1, 1].EntireRow, "Code");
DataTable dt = new DataTable();
dt.Columns.Add("A", typeof(string));
for (int c = 1; c < ws.UsedRange.Columns.Count; c++)
{
string colName = ws.Cells[1, c].Text;
int i = 2;
while (dt.Columns.Contains(colName))
{
colName = ws.Cells[1, c].Text + "{" + i.ToString() + "}";
i++;
}
dt.Columns.Add(colName);
}
//loop over the rows and skip the ones we don't need
for (int r = 2; r <= ws.UsedRange.Rows.Count; r++)
{
if (ws.UsedRange.Cells[r, 3].Text == "0")
{
DataRow row = dt.NewRow();
for (int c = 1; c < ws.UsedRange.Columns.Count; c++)
{
string cell = ws.Cells[r, c].Text;
row[c - 1] = cell;
}
dt.Rows.Add(row);
row["A"] = row["C"];
}
}
//Close the file
wb.Close();
//release the excel objects from use
Marshal.ReleaseComObject(wb);
Marshal.ReleaseComObject(ws);
//take the id of excel process
int pid = app.PID();
app.Quit();
StartProc("taskkill", $"/f /pid {pid}");
return dt;
}
To add a row, use dt.Rows.Add(row). If by "copy column B to A" you mean copying the value, just assign row[0] = row[2]. By the way, your example is missing a bracket.
I think you should review your code against the conditions in your question; you can work it out yourself. Pay attention to the condition you described in the question versus the comparison you actually check in the code.
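For what it's worth, the rule from the question can be sketched on a plain DataTable like this. Assumptions on my side: the table really has columns named "A", "B" and "C", the extra row keeps the other column values of the original row (the question doesn't say), and RowExpander / Expand are made-up names.

```csharp
using System;
using System.Data;
using System.Globalization;

public static class RowExpander
{
    // Builds a new table from 'source'. Columns "A", "B" and "C" are assumed to exist.
    // Every source row is copied with A = B; if C > 0, a second row with A = C follows.
    public static DataTable Expand(DataTable source)
    {
        DataTable result = source.Clone(); // same schema, no rows

        foreach (DataRow src in source.Rows)
        {
            DataRow first = result.NewRow();
            first.ItemArray = src.ItemArray; // ItemArray returns a copy of the values
            first["A"] = src["B"];
            result.Rows.Add(first);

            // Unparseable or empty C values are treated as 0 here.
            decimal.TryParse(Convert.ToString(src["C"]), NumberStyles.Any,
                             CultureInfo.InvariantCulture, out decimal c);

            if (c > 0)
            {
                DataRow extra = result.NewRow();
                extra.ItemArray = src.ItemArray;
                extra["A"] = src["C"];
                result.Rows.Add(extra);
            }
        }
        return result;
    }
}
```

Building a second table avoids adding rows to a collection while iterating over it.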

iteration through datagrid rows for export to excel (again)

I am trying to export dataGrid rows to an Excel sheet.
Since I am switching from a WinForms(dataGridView) to WPF
(dataGrid) and basically I have no clue about WPF so far
I need your help.
Maybe somebody can either tell me how to change my loop
or what I have to do instead to get the rows filled into
the cells of the Excel sheet.
I have read all articles on SO covering this problem but
don't seem to find a topic suiting my issue.
This is what I did for the filling of the column names, which
works perfectly:
for (int i = 1; i < dataGrid.Columns.Count + 1; i++)
{
Excel.Range BackgroundColor;
BackgroundColor = xlWorkSheet.get_Range("a9", "j9");
BackgroundColor.Interior.Color = System.Drawing.ColorTranslator.ToOle(System.Drawing.Color.RoyalBlue);
AxlEx.Cells[9, i] = dataGrid.Columns[i - 1].Header;
}
When it comes to filling the cells with rows, I have made numerous attempts to get it working:
for (int i = 0; i < dataGrid.Items.Count; i++)
{
DataRowView aux = (DataRowView)dataGrid.Items[i];
for (int j = 0; j < aux.Row.ItemArray.Length; j++)
{
//Console.WriteLine(string.Format("{0}-{1}", j, aux.Row.ItemArray[j]));
AxlEx.Cells[i + 10, j + 1] = aux.Row.ItemArray[j];
}
}
This throws a System.InvalidCastException because of a type mismatch,
which is obvious... but I don't know how to convert. The matching
topics on SO also didn't have an example I could understand well enough to change my code.
Before I had this:
for (int i = 0; i < dataGrid.Items.Count; i++)
{
for (int j = 0; j < dataGrid.Columns.Count; j++)
{
AxlEx.Cells[i + 10, j + 1] = dataRow.Row.ItemArray[j].ToString();
}
}
which then works for one row if I refer to
DataRowView dataRow = (DataRowView)dataGrid.SelectedItem;
How can I get this to work?
I do not know whether it is necessary to debug your code. However, I would like to show my working code for exporting data from a DataGrid to MS Excel.
It is better to move this work from the UI thread to the ThreadPool:
using Excel = Microsoft.Office.Interop.Excel;//add this library
Task.Run(() => {
// load excel, and create a new workbook
Excel.Application excelApp = new Excel.Application();
excelApp.Workbooks.Add();
// single worksheet
Excel._Worksheet workSheet = excelApp.ActiveSheet;
// column headings
for (int i = 0; i < YourDataTable.Columns.Count; i++)
{
workSheet.Cells[1, (i + 1)] = YourDataTable.Columns[i].ColumnName;
}
// rows
for (int i = 0; i < YourDataTable.Rows.Count; i++)
{
// to do: format datetime values before printing
for (int j = 0; j < YourDataTable.Columns.Count; j++)
{
workSheet.Cells[(i + 2), (j + 1)] = YourDataTable.Rows[i][j];
}
}
excelApp.Visible = true;
});
I found the problem....
for (int i = 0; i < dataGrid.Items.Count-1; i++)
{
DataRowView aux = (DataRowView)dataGrid.Items[i];
for (int j = 0; j < aux.Row.ItemArray.Length; j++)
{
//Console.WriteLine(string.Format("{0}-{1}", j, aux.Row.ItemArray[j]));
AxlEx.Cells[i + 10, j + 1] = aux.Row.ItemArray[j];
}
}
I had to subtract one (dataGrid.Items.Count-1) because there was an additional blank line in the dataGrid which seemed to cause the problem.
Probably due to a NULL field return value?
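One possible explanation (my assumption, the post doesn't confirm it): when CanUserAddRows is enabled, a WPF DataGrid appends a new-item placeholder to Items, and that object is not a DataRowView, so the cast fails on the last item. Instead of relying on Count-1, the loop could filter by type. A minimal sketch, with GridRows and DataRowsOnly as made-up names:

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class GridRows
{
    // Returns only the items that really are DataRowView instances and
    // skips anything else (for example a placeholder object).
    public static List<DataRowView> DataRowsOnly(IEnumerable items)
    {
        return items.OfType<DataRowView>().ToList();
    }
}
```

With that, the export loop would iterate over GridRows.DataRowsOnly(dataGrid.Items) and would keep working if the blank row is switched off later.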

How to write on multiple worksheets using EPPlus

I'm using the following code snippet to write some data into an Excel file using EPPlus. My application does some big data processing, and since Excel has a limit of ~1 million rows, the worksheet runs out of space from time to time. What I am trying to achieve is this: once a System.ArgumentException: row out of range is detected (in other words, no space is left in the worksheet), the remainder of the data should be written to a second worksheet in the same workbook. I have tried the following code but with no success yet. Any help will be appreciated!
try
{
for (int i = 0; i < data.Count(); i++)
{
var cell1 = ws.Cells[rowIndex, colIndex];
cell1.Value = data[i];
colIndex++;
}
rowIndex++;
}
catch (System.ArgumentException)
{
for (int i = 0; i < data.Count(); i++)
{
var cell2 = ws1.Cells[rowIndex, colIndex];
cell2.Value = data[i];
colIndex++;
}
rowIndex++;
}
You shouldn't use a catch to handle that kind of logic; it is more of a last resort. It is better to engineer your code to deal with the situation, since it is very predictable.
The excel 2007 format has a hard limit of 1,048,576 rows. With that, you know exactly how many rows you should put before going to a new sheet. From there it is simple for loops and math:
[TestMethod]
public void Big_Row_Count_Test()
{
var existingFile = new FileInfo(@"c:\temp\temp.xlsx");
if (existingFile.Exists)
existingFile.Delete();
const int maxExcelRows = 1048576;
using (var package = new ExcelPackage(existingFile))
{
//Assume a data row count
var rowCount = 2000000;
//Determine number of sheets
var sheetCount = (int)Math.Ceiling((double)rowCount/ maxExcelRows);
for (var i = 0; i < sheetCount; i++)
{
var ws = package.Workbook.Worksheets.Add(String.Format("Sheet{0}", i));
var sheetRowLimit = Math.Min((i + 1)*maxExcelRows, rowCount);
//Remember +1 for 1-based excel index
for (var j = i * maxExcelRows + 1; j <= sheetRowLimit; j++)
{
var cell1 = ws.Cells[j - (i*maxExcelRows), 1];
cell1.Value = j;
}
}
package.Save();
}
}
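The index arithmetic in that test can also be pulled out into a small function that says, for any record, which sheet and which row it belongs to. This is just a sketch of the same math; SheetMath and Locate are hypothetical names and nothing here touches EPPlus.

```csharp
public static class SheetMath
{
    // Row limit of the Excel 2007+ (.xlsx) format.
    public const int MaxExcelRows = 1048576;

    // Maps a zero-based record index to a zero-based sheet index
    // and a one-based row number on that sheet.
    public static (int SheetIndex, int Row) Locate(long recordIndex, int rowsPerSheet = MaxExcelRows)
    {
        int sheet = (int)(recordIndex / rowsPerSheet);
        int row = (int)(recordIndex % rowsPerSheet) + 1;
        return (sheet, row);
    }
}
```

In the writing loop you would then pick the worksheet by SheetIndex and write to Row, so no exception handling is needed to detect a full sheet.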

GemBox - For loop for rows and cols?

I have a question. Is there a way that I could go through all the cols/rows in a spreadsheet using a for loop?? Right now I am using foreach loops like this in my code: (You can just ignore what's going on inside).
foreach (ExcelRow row in w1.Rows)
{
foreach (ExcelCell cell in row.AllocatedCells)
{
Console.Write("row: {0}", globalVar.iRowActual);
if (globalVar.iRowActual > 1)
{
cellValue = SafeCellValue(cell);
Console.WriteLine("value is: {0}", cellValue);
}
}
globalVar.iRowActual++;
}
The problem is that I would like to assign the value of each cell to a new variable and pass it to another method. I would like to use for loops for this and I know I can use CalculateMaxUsedColumns as the limit for the cols but is there a property like that, that I could use for the rows?!
This is what I would like to do:
int columnCount = ws.CalculateMaxUsedColumns();
int rowCount = ws.CalculateMaxUsedRows(); // <-- the part I need help with
for(int i=0; i <columnCount; i++){
for(int j = 0; j<rowCount; j++){
.....
}
}
Any kind of help would be greatly appreciated. Thanks!!!
Here is a way you can iterate in GemBox.Spreadsheet through all the columns / rows in a spreadsheet using a for loop.
Go through the CellRange which is returned by ExcelWorksheet.GetUsedCellRange method.
ExcelFile workbook = ExcelFile.Load("Sample.xlsx");
ExcelWorksheet worksheet = workbook.Worksheets[0];
CellRange range = worksheet.GetUsedCellRange(true);
for (int r = range.FirstRowIndex; r <= range.LastRowIndex; r++)
{
for (int c = range.FirstColumnIndex; c <= range.LastColumnIndex; c++)
{
ExcelCell cell = range[r - range.FirstRowIndex, c - range.FirstColumnIndex];
string cellName = CellRange.RowColumnToPosition(r, c);
string cellRow = ExcelRowCollection.RowIndexToName(r);
string cellColumn = ExcelColumnCollection.ColumnIndexToName(c);
Console.WriteLine(string.Format("Cell name: {1}{0}Cell row: {2}{0}Cell column: {3}{0}Cell value: {4}{0}",
Environment.NewLine, cellName, cellRow, cellColumn, (cell.Value) ?? "Empty"));
}
}
EDIT
In newer versions there are some additional APIs which can simplify this. For instance, you can now use foreach and still retrieve the row and column indexes with ExcelCell.Row.Index and ExcelCell.Column.Index, and you can retrieve the names without using those static methods (without RowColumnToPosition, RowIndexToName and ColumnIndexToName).
ExcelFile workbook = ExcelFile.Load("Sample.xlsx");
ExcelWorksheet worksheet = workbook.Worksheets[0];
foreach (ExcelRow row in worksheet.Rows)
{
foreach (ExcelCell cell in row.AllocatedCells)
{
Console.WriteLine($"Cell value: {cell.Value ?? "Empty"}");
Console.WriteLine($"Cell name: {cell.Name}");
Console.WriteLine($"Row index: {cell.Row.Index}");
Console.WriteLine($"Row name: {cell.Row.Name}");
Console.WriteLine($"Column index: {cell.Column.Index}");
Console.WriteLine($"Column name: {cell.Column.Name}");
Console.WriteLine();
}
}
Also, here are two other ways how you can iterate through sheet cells in for loop.
1) Use ExcelWorksheet.Rows.Count and ExcelWorksheet.CalculateMaxUsedColumns() to get the last used row and column.
ExcelFile workbook = ExcelFile.Load("Sample.xlsx");
ExcelWorksheet worksheet = workbook.Worksheets[0];
int rowCount = worksheet.Rows.Count;
int columnCount = worksheet.CalculateMaxUsedColumns();
for (int r = 0; r < rowCount; r++)
{
for (int c = 0; c < columnCount; c++)
{
ExcelCell cell = worksheet.Cells[r, c];
Console.WriteLine($"Cell value: {cell.Value ?? "Empty"}");
Console.WriteLine($"Cell name: {cell.Name}");
Console.WriteLine($"Row name: {cell.Row.Name}");
Console.WriteLine($"Column name: {cell.Column.Name}");
Console.WriteLine();
}
}
If you have a non-uniform spreadsheet in which rows have different column count (for instance, first row has 10 cells, second row has 100 cells, etc.), then you could use the following change in order to avoid iterating through non-allocated cells:
int rowCount = worksheet.Rows.Count;
for (int r = 0; r < rowCount; r++)
{
ExcelRow row = worksheet.Rows[r];
int columnCount = row.AllocatedCells.Count;
for (int c = 0; c < columnCount; c++)
{
ExcelCell cell = row.Cells[c];
// ...
}
}
2) Use CellRange.GetReadEnumerator method, it iterates through only already allocated cells in the range.
ExcelFile workbook = ExcelFile.Load("Sample.xlsx");
ExcelWorksheet worksheet = workbook.Worksheets[0];
CellRangeEnumerator enumerator = worksheet.Cells.GetReadEnumerator();
while (enumerator.MoveNext())
{
ExcelCell cell = enumerator.Current;
Console.WriteLine($"Cell value: {cell.Value ?? "Empty"}");
Console.WriteLine($"Cell name: {cell.Name}");
Console.WriteLine($"Row name: {cell.Row.Name}");
Console.WriteLine($"Column name: {cell.Column.Name}");
Console.WriteLine();
}

Convert a jagged array to a 2D array directly without iterating each item?

I am trying to save a DataTable into an excel sheet.. my code is like this..
Excel.Range range = xlWorkSheet.get_Range("A2");
range = range.get_Resize(dtExcel.Rows.Count, dtExcel.Columns.Count);
object[,] rng1 = new object[dtExcel.Rows.Count, dtExcel.Columns.Count];
Excel range requires range value as array[,] but I have the DataTable as jagged array[][].
object[][] rng2 = dtExcel.AsEnumerable().Select(x => x.ItemArray).ToArray();
Is there any built-in function to directly convert the jagged array[][] to a 2D array[,]?
Iterating through Excel, DataTable and assigning seems slower with bulk data..
Also I don't want to setup querying with DSN for excel.. I chose excel storage to avoid the configuring of any databases.. :P
I found a detailed explanation of ways of writing data to excel here..
http://support.microsoft.com/kb/306023
In the end I used the NPOI library for this. It is quite simple and free.
The code to convert DataTable to excel as follows.
HSSFWorkbook hssfworkbook = new HSSFWorkbook();
foreach (DataTable dt in DataSource.Tables)
{
ISheet sheet1 = hssfworkbook.CreateSheet(dt.TableName);
//Set column titles
IRow headRow = sheet1.CreateRow(0);
for (int colNum = 0; colNum < dt.Columns.Count; colNum++)
{
ICell cell = headRow.CreateCell(colNum);
cell.SetCellValue(dt.Columns[colNum].ColumnName);
}
//Set values in cells
for (int rowNum = 1; rowNum <= dt.Rows.Count; rowNum++)
{
IRow row = sheet1.CreateRow(rowNum);
for (int colNum = 0; colNum < dt.Columns.Count; colNum++)
{
ICell cell = row.CreateCell(colNum);
cell.SetCellValue(dt.Rows[rowNum - 1][colNum].ToString());
}
}
// Resize column width to show all data
for (int colNum = 0; colNum < dt.Columns.Count; colNum++)
{
sheet1.AutoSizeColumn(colNum);
}
}
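As for the original question: as far as I know there is no single built-in call that turns object[][] into object[,], so a plain nested loop is the usual way. A minimal sketch (ArrayConvert and To2D are made-up names; it assumes every inner array has the same length and throws otherwise):

```csharp
using System;

public static class ArrayConvert
{
    // Copies a jagged array into a rectangular one.
    // All inner arrays must have the same length as the first one.
    public static object[,] To2D(object[][] jagged)
    {
        int rows = jagged.Length;
        int cols = rows == 0 ? 0 : jagged[0].Length;
        var result = new object[rows, cols];

        for (int r = 0; r < rows; r++)
        {
            if (jagged[r].Length != cols)
                throw new ArgumentException("All rows must have the same length.", nameof(jagged));

            for (int c = 0; c < cols; c++)
                result[r, c] = jagged[r][c];
        }
        return result;
    }
}
```

The loop runs entirely in memory, so it is cheap compared with setting Excel cells one at a time; rng2 from the question could be passed straight to To2D.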
