OutOfMemoryException while trying to read big Excel file into DataTable - c#

I'm using an SSIS package to clean and load data from an .xlsx file into a SQL Server table.
I also have to highlight the cells containing wrong data in the .xlsx file. To do that, I need to get the column and row indexes from the column name and row id (which I have in my data spreadsheet). I compare each column name from my first spreadsheet (Error_Sheet) with the rows of a column that I added in a second spreadsheet, and do the same for rows. When the cell values match, I take the column and row indexes from my data spreadsheet and highlight the cell at that position. The script worked fine, but after I tried to run it from a server I got an OutOfMemoryException, and now I also get it on my workstation, where it was working fine before.
I've tried reducing the range I read from AC1:AC10000 to AC1:AC100. It worked only the first time after compilation, and then it started throwing the exception again.
string strSQLErrorColumns = "Select * From [" + Error_Sheet + "AC1:AC100]";
OleDbConnection cn = new OleDbConnection(strCn);
OleDbDataAdapter objAdapterErrorColumns = new OleDbDataAdapter(strSQLErrorColumns, cn);
System.Data.DataSet dsErrorColumns = new DataSet();
objAdapterErrorColumns.Fill(dsErrorColumns, Error_Sheet);
System.Data.DataTable dtErrorColumns = dsErrorColumns.Tables[Error_Sheet];
dsErrorColumns.Dispose();
objAdapterErrorColumns.Dispose();
foreach (DataColumn ColumnData in dtDataColumns.Columns){
ColumnDataCellsValue = dtDataColumns.Columns[iCntD].ColumnName.ToString();
iCntE = 0;
foreach (DataRow ColumnError in dtErrorColumns.Rows){
ColumnErrorCellsValue = dtErrorColumns.Rows[iCntE].ItemArray[0].ToString();
if (ColumnDataCellsValue.Equals(ColumnErrorCellsValue)){
ColumnIndex = ColumnData.Table.Columns[ColumnDataCellsValue].Ordinal;
iCntE = iCntE + 1;
break;
}
}
iCntD = iCntD + 1;
}
ColumnIndexHCell = ColumnIndex + 1;
RowIndexHCell = RowIndex + 2;
Range rng = xlSheets.Cells[RowIndexHCell, ColumnIndexHCell] as Excel.Range;
rng.Interior.Color = System.Drawing.ColorTranslator.ToOle(System.Drawing.Color.Yellow);
Is there any other way to load the data into a DataTable and get the column and row indexes without using a lot of memory, or a way to use Excel.Range.Cells instead of a DataSet and DataTable to get the cell value, column index and row index from the xlsx file?
I didn't show the whole code because it's long. Please let me know if more information is needed.

When reading data from an Excel file with a huge number of rows, it is better to read the data in chunks. With OleDbDataAdapter you can use the paging overload of Fill to achieve that.
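// daGetDataFromSheet is the OleDbDataAdapter created for the sheet query
int result = 1;
int intPagingIndex = 0;
int intPagingInterval = 1000;
while (result > 0){
// Fill returns the number of rows added; 0 means there are no more rows
result = daGetDataFromSheet.Fill(dsErrorColumns, intPagingIndex, intPagingInterval, Error_Sheet);
System.Data.DataTable dtErrorColumns = dsErrorColumns.Tables[Error_Sheet];
//Implement your logic here
// Drop the processed rows so the DataSet holds only one page at a time
if (dtErrorColumns != null) dtErrorColumns.Clear();
intPagingIndex += intPagingInterval;
}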
This prevents the OutOfMemoryException as long as each page is processed and released before the next one is loaded, and there is no longer any need to specify a range such as AC1:AC10000.
References
Paging Through a Query Result
Fill(DataSet, Int32, Int32, String)
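For the matching step described in the question (finding which data columns are named in the error sheet), a set lookup can replace the nested loops. This is only a minimal sketch under assumptions: the class and method names are hypothetical, the error names are assumed to sit in the first column of the error table, and the comparison is case-sensitive like the Equals call in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class ErrorColumnLookup
{
    // Returns the zero-based ordinals of the data columns whose names
    // appear in the first column of the error table.
    public static List<int> FindErrorColumnOrdinals(DataTable dataTable, DataTable errorTable)
    {
        // Collect the error column names once so each lookup is a hash lookup.
        var errorNames = new HashSet<string>(StringComparer.Ordinal);
        foreach (DataRow row in errorTable.Rows)
        {
            if (row[0] != DBNull.Value)
            {
                errorNames.Add(row[0].ToString());
            }
        }

        // Walk the data columns in order and keep the ones that are flagged.
        var ordinals = new List<int>();
        foreach (DataColumn column in dataTable.Columns)
        {
            if (errorNames.Contains(column.ColumnName))
            {
                ordinals.Add(column.Ordinal);
            }
        }
        return ordinals;
    }
}
```

Each returned ordinal is zero-based, so the Excel column index is the ordinal plus 1, as in the question's ColumnIndexHCell calculation.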

Related

NPOI export excel all columns but last become blank

I am investigating the AutoSizeColumn() method of NPOI Excel exporting. From this SO question, I know that when exporting a DataTable to Excel, I have to write all the data of a column before calling AutoSizeColumn() on it. Example (from this SO answer):
HSSFWorkbook spreadsheet = new HSSFWorkbook();
DataSet results = GetSalesDataFromDatabase();
//here, we must insert at least one sheet to the workbook. otherwise, Excel will say 'data lost in file'
HSSFSheet sheet1 = spreadsheet.CreateSheet("Sheet1");
foreach (DataColumn column in results.Tables[0].Columns)
{
int rowIndex = 0;
foreach (DataRow row in results.Tables[0].Rows)
{
HSSFRow dataRow = sheet1.CreateRow(rowIndex);
dataRow.CreateCell(column.Ordinal).SetCellValue(row[column].ToString());
rowIndex++;
}
sheet1.AutoSizeColumn(column.Ordinal);
}
//Write the stream data of workbook to the file 'test.xls' in the temporary directory
FileStream file = new FileStream(Path.Combine(Path.GetTempPath(), "test.xls") , FileMode.Create);
spreadsheet.Write(file);
file.Close();
When I open test.xls, only the last column has values. All the previous columns are empty! (The size of each column is adjusted, however.)
FYI: 1) I use the code in GetTable() from https://www.dotnetperls.com/datatable
2) I am using C#.
"When I open test.xls, only the last column has values. All the previous columns are empty! (The size of each column is adjusted, however.)"
That's because you're looping through the columns and (re)creating ALL the rows on each iteration. This row (re)creation loop effectively overwrites the old rows (with their previously set cell values) with blank ones. Thus all columns but the last become blank.
Try switching the loop order so that rows are iterated first then columns:
HSSFWorkbook spreadsheet = new HSSFWorkbook();
DataSet results = GetSalesDataFromDatabase();
//here, we must insert at least one sheet to the workbook. otherwise, Excel will say 'data lost in file'
HSSFSheet sheet1 = spreadsheet.CreateSheet("Sheet1");
int rowIndex = 0;
foreach (DataRow row in results.Tables[0].Rows)
{
HSSFRow dataRow = sheet1.CreateRow(rowIndex);
foreach (DataColumn column in results.Tables[0].Columns)
{
dataRow.CreateCell(column.Ordinal).SetCellValue(row[column].ToString());
}
rowIndex++;
}
for(var i = 0; i< results.Tables[0].Columns.Count; i++)
{
sheet1.AutoSizeColumn(i);
}
//Write the stream data of workbook to the file 'test.xls' in the temporary directory
FileStream file = new FileStream(Path.Combine(Path.GetTempPath(), "test.xls"), FileMode.Create);
spreadsheet.Write(file);
file.Close();

How can I get the actual used range for modified Excel files using EPPlus?

I am reading data from Excel into a DataTable using EPPlus.
After reading an Excel sheet with 10 rows of records, I modified the sheet by removing the existing data and kept data in only one row.
But when I read the modified file, it still reads 10 rows (1 with values and the rest as null fields) into the DataTable.
How can I limit this?
I am using the following code to read the Excel file.
using (var pck = new OfficeOpenXml.ExcelPackage())
{
using (var stream = File.OpenRead(FilePath))
{
pck.Load(stream);
}
var ws = pck.Workbook.Worksheets.First();
bool hasHeader = true; // adjust it accordingly(this is a simple approach)
foreach (var firstRowCell in ws.Cells[1, 1, 1, ws.Dimension.End.Column])
{
DSClientTransmittal.Tables[0].Columns.Add(hasHeader ? firstRowCell.Text : string.Format("Column {0}", firstRowCell.Start.Column));
}
var startRow = hasHeader ? 2 : 1;
for (var rowNum = startRow; rowNum <= ws.Dimension.End.Row; rowNum++)
{
//var wsRow = ws.Cells[rowNum, 1, rowNum, ws.Dimension.End.Column];
var wsRow = ws.Cells[rowNum, 1, rowNum, DSClientTransmittal.Tables[0].Columns.Count];
var row = DSClientTransmittal.Tables[0].NewRow();
foreach (var cell in wsRow)
{
try
{
object cellValue = cell.Value;
//row[cell.Start.Column - 1] = cell.Text;
row[cell.Start.Column - 1] = cellValue.ToString().Trim();
//cell.Style.Numberformat.Format = "#";
//row[cell.Start.Column - 1] = cell.Text;
}
catch (Exception ex) { }
}
DSClientTransmittal.Tables[0].Rows.Add(row);
}
pck.Dispose();
}
When I was using Excel Interop to read the file, I overcame the same issue with the
ClearFormats() method, like this:
ws.Columns.ClearFormats();
xlColCount = ws.UsedRange.Columns.Count;
Is there any equivalent for this in Epplus open xml?
How can I get actual used range for modified excels?
There is no built-in way of indicating that a row shouldn't be accounted for when only deleting data in some cells.
Dimension is as close as you can get, but rows are included in the Dimension if any column contains data or if any row above or below contains data.
You could however try to find out if you should skip a row in the for loop.
For example if you always delete data in the first 4 columns only, then you could try:
if(!ws.Cells[rowNum, 1, rowNum, 4].All(c => c.Value == null))
{
//Continue adding the row to the table
}
The question doesn't state the criteria for skipping a row, but you get the idea.
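If the criterion turns out to be simply "every field in the row is empty", the same check can also be applied after loading, on the DataTable itself. The following is a minimal sketch that uses only System.Data; the class and method names are hypothetical, and it assumes that blank worksheet rows arrive as rows whose fields are all DBNull or whitespace.

```csharp
using System;
using System.Data;
using System.Linq;

public static class EmptyRowFilter
{
    // True when every field in the row is null, DBNull or whitespace-only text.
    public static bool IsBlank(DataRow row)
    {
        return row.ItemArray.All(v =>
            v == null || v == DBNull.Value || string.IsNullOrWhiteSpace(v.ToString()));
    }

    // Removes blank rows in place and returns how many were removed.
    public static int RemoveBlankRows(DataTable table)
    {
        int removed = 0;
        // Iterate backwards so removals do not shift the indexes still to visit.
        for (int i = table.Rows.Count - 1; i >= 0; i--)
        {
            if (IsBlank(table.Rows[i]))
            {
                table.Rows.RemoveAt(i);
                removed++;
            }
        }
        return removed;
    }
}
```

Calling EmptyRowFilter.RemoveBlankRows(DSClientTransmittal.Tables[0]) after the loading loop would drop the null-only rows, at the cost of still reading them from the worksheet first.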
To start with, I am not a C# programmer, but I think I have a solution that works using an Excel VBA script. You may be able to run this Excel VBA code from C#, or get insight into how to accomplish the same thing in C#.
The problem you are having is related to the way Excel handles the working size of a worksheet. If you enter data in the 1 millionth row and then delete that cell, Excel still shows the worksheet as having 1 million rows.
I tested out this Excel VBA code and it successfully deleted all rows that were completely empty, and then reset the worksheet size.
Sub DelEmptyRowsResizeWorksheet()
Dim i As Long, iLimit As Long
iLimit = ActiveSheet.UsedRange.Rows.Count
For i = iLimit To 1 Step -1
If Application.CountA(Cells(i, 1).EntireRow) = 0 Then
Cells(i, 1).EntireRow.Delete
End If
Next i
iLimit = ActiveSheet.UsedRange.Rows.Count ' resize the worksheet based on the last row with data
End Sub
To do this manually without a script, first delete all empty rows at the bottom (or columns on the right side) of a worksheet, save it, then close and reopen the workbook. I found that this also resets the Excel workbook size.

Reading from Excel File using ClosedXML

I am trying to read from an Excel file whose data is not tabular as a whole.
There are sections within the file that are tabular.
I need to loop through rows 3 to 20, which are tabular, and read the data.
Here is part of my code:
string fileName = "C:\\Folder1\\Prev.xlsx";
var workbook = new XLWorkbook(fileName);
var ws1 = workbook.Worksheet(1);
How do I loop through rows 3 to 20 and read columns 3, 4, 6, 7 and 8?
Also, if a row is empty, how do I detect that so I can skip it without checking whether each column in that row has a value?
To access a row:
var row = ws1.Row(3);
To check if the row is empty:
bool empty = row.IsEmpty();
To access a cell (column) in a row:
var cell = row.Cell(3);
To get the value from a cell:
object value = cell.Value;
// or
string value = cell.GetValue<string>();
For more information see the documentation.
Here's my jam.
var rows = worksheet.RangeUsed().RowsUsed().Skip(1); // Skip header row
foreach (var row in rows)
{
var rowNumber = row.RowNumber();
// Process the row
}
If you just use .RowsUsed(), your range will contain a huge number of columns. Way more than are actually filled in!
So use .RangeUsed() first to limit the range. This will help you process the file faster!
You can also use .Skip(1) to skip over the column header row (if you have one).
I'm not sure if this will solve the OP's problem, but I prefer using the RowsUsed method. It returns only the rows that are non-empty or have been edited by the user, so I can avoid checking for emptiness while processing each row.
The code snippet below processes rows 3 to 20 out of all the non-empty rows. The empty rows are filtered out before the foreach loop starts. Bear in mind that filtering out empty rows first changes the total number of rows that get processed, so be careful with any logic inside the foreach loop that depends on that total.
string fileName = "C:\\Folder1\\Prev.xlsx";
using (var excelWorkbook = new XLWorkbook(fileName))
{
var nonEmptyDataRows = excelWorkbook.Worksheet(1).RowsUsed();
foreach (var dataRow in nonEmptyDataRows)
{
//for row number check
if(dataRow.RowNumber() >=3 && dataRow.RowNumber() <= 20)
{
//to get column # 3's data
var cell = dataRow.Cell(3).Value;
}
}
}
RowsUsed method is helpful in commonly faced problems which require processing the rows of an excel sheet.
It works easily
XLWorkbook workbook = new XLWorkbook(FilePath);
var rowCount = workbook.Worksheet(1).LastRowUsed().RowNumber();
var columnCount = workbook.Worksheet(1).LastColumnUsed().ColumnNumber();
int column = 1;
int row = 1;
List<string> ll = new List<string>();
while (row <= rowCount)
{
while (column <= columnCount)
{
string title = workbook.Worksheets.Worksheet(1).Cell(row, column).GetString();
ll.Add(title);
column++;
}
row++;
column = 1;
}

How to copy specific rows from DataTable to another one

I want to copy some rows of a DataTable to another one. I tried this code:
DataTable Result = new DataTable();
for(int i = 5; i < PageSize && i < TransactionDataTable.Rows.Count ; i++)
{
DataRow dr = (DataRow)TransactionDataTable.Rows[i];
Result.ImportRow(dr);
}
string MobileNumber = TransactionDataTable.Rows[0]["MobileRegistration_MobileNumber"].ToString();
string MobileNumber2 = Result.Rows[0]["MobileRegistration_MobileNumber"].ToString();
TransactionDataTable is a DataTable with more than 1000 rows.
In the code above, MobileNumber has the proper value, but MobileNumber2 doesn't.
I get this error on the last line (when assigning the MobileNumber2 value):
Additional information: Column 'MobileRegistration_MobileNumber' does not belong to table .
It seems that the rows weren't copied properly into the Result DataTable.
What's wrong with this code?
I also tried Result.Rows.Add(dr); instead of Result.ImportRow(dr);
but an exception was thrown with this message:
This row already belongs to another table.
Thanks for any help.
You are creating the new DataTable Result without any columns, so the imported rows have nowhere to store their values. Make sure Result has all the columns of the table you are copying from.
In your case, you can clone TransactionDataTable (Clone copies the schema but not the rows) and then import the rows.
The updated copy implementation looks like this:
DataTable Result = TransactionDataTable.Clone();
for(int i = 5; i < PageSize && i < TransactionDataTable.Rows.Count ; i++)
{
DataRow dr = (DataRow)TransactionDataTable.Rows[i];
Result.ImportRow(dr);
}
string MobileNumber = TransactionDataTable.Rows[0]["MobileRegistration_MobileNumber"].ToString();
string MobileNumber2 = Result.Rows[0]["MobileRegistration_MobileNumber"].ToString();
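The Clone-then-ImportRow pattern above can be wrapped in a small helper. This is a sketch only: the class name, method name and parameters are hypothetical and not part of the original answer, and it relies solely on System.Data.

```csharp
using System;
using System.Data;

public static class DataTableSlice
{
    // Copies up to 'count' rows starting at 'startIndex' into a new table
    // that has the same schema as 'source'.
    public static DataTable CopyRows(DataTable source, int startIndex, int count)
    {
        // Clone() copies the columns and constraints but no rows.
        DataTable result = source.Clone();
        int end = Math.Min(source.Rows.Count, startIndex + count);
        for (int i = Math.Max(startIndex, 0); i < end; i++)
        {
            // ImportRow copies the values; the source row stays in its own table.
            result.ImportRow(source.Rows[i]);
        }
        return result;
    }
}
```

For example, DataTableSlice.CopyRows(TransactionDataTable, 5, 10) would return a table with the same columns holding rows 5 to 14, or fewer if the source is shorter.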

How to remove AutoFilter Using C# in EPPlus

I have tried the C# code below:
wsDt.Cells["A10:G10"].AutoFilter = false;
but the filter is not removed from my Excel file.
Is there any other way to remove it?
Thanks...
In Excel, when you use the Format as Table option it will not only style the data but will also create a Named Range - Table1. This option also automatically enables the Filter Buttons. After formatting as a table, you can uncheck Filter Buttons in the Table Tools -> Table Styles Options.
What works for me is doing the same programmatically.
LoadFromDataTable(DataTable, bool, TableStyles) basically:
- pastes data to the worksheet starting at the ExcelRange
- applies a Table Format
- uses the DataTable.TableName to name the range
- enables Filter Button
To disable the Filter Button:
- use the DataTable.TableName to reference the named range
- set ShowFilter to false
//imagine a table with 5 columns
DataTable dt = new DataTable();
dt.TableName = "UniqueTableName";
//define the cells where the headers will appear
int topRow = 1;
int leftMostColumn = 1;
int rightMostColumn = 5;
//bind the DataTable using LoadFromDataTable()
OfficeOpenXml.ExcelRange excelRange = worksheet.Cells[topRow, leftMostColumn, topRow, rightMostColumn];
excelRange.LoadFromDataTable(dt, true, OfficeOpenXml.Table.TableStyles.Light8);
//turn off the filtering
OfficeOpenXml.Table.ExcelTable table = worksheet.Tables[dt.TableName];
table.ShowFilter = false;
This seems to be an EPPlus bug, and I don't think it has been resolved as of the latest release (4.04); at least I couldn't figure out a solution. My workaround is to simply load the spreadsheet values a row at a time with a loop:
int sheetRow = 3;
for (int outer = 0; outer < outerSourceTable.Rows.Count; outer++)
{
var outerThingId = Convert.ToInt32(outerSourceTable.Rows[outer]["OuterThingId"]);
var outerThingName = Convert.ToString(outerSourceTable.Rows[outer]["OuterThing"]);
var innerThingsTable = _repository.GetInnerThings(outerThingId);
if (innerThingsTable.Rows.Count > 0)
{
myWorksheet.Cells[sheetRow, 1].Value = outerThingName;
// Load the data into the worksheet. We need to load a row at a time
// to avoid the auto-filter bug
for (int inner = 0; inner < innerThingsTable.Rows.Count; inner++)
{
var innerName = Convert.ToString(innerThingsTable.Rows[inner]["Name"]);
var innerDescr = Convert.ToString(innerThingsTable.Rows[inner]["Description"]);
myWorksheet.Cells[sheetRow, 2].Value = innerName;
myWorksheet.Cells[sheetRow, 3].Value = innerDescr;
sheetRow++;
}
sheetRow++;
}
}
If you populate your Excel data using the LoadFromCollection() call, you can then reference it using the default Excel table name of "Table1".
This is the same idea as Patrick's answer, but demonstrates the use without a DataTable.
excelWorksheet.Cells.LoadFromCollection(myCollection);
ExcelTable table = excelWorksheet.Tables["Table1"];
table.ShowFilter = false;
Sometimes excel creates the Table as a named range. In my instance the Table was the only thing in the first worksheet, so the following helped me:
var ws = wb.Worksheets.First();
ws.NamedRanges.FirstOrDefault()?.Ranges.FirstOrDefault()?.SetAutoFilter(false);
