I am investigating NPOI's AutoSizeColumn() for Excel export. From this SO question, I know that when exporting a DataTable to Excel, all the data in a column has to be written before calling AutoSizeColumn(). Example (from this SO answer):
HSSFWorkbook spreadsheet = new HSSFWorkbook();
DataSet results = GetSalesDataFromDatabase();
//here, we must insert at least one sheet to the workbook. otherwise, Excel will say 'data lost in file'
HSSFSheet sheet1 = spreadsheet.CreateSheet("Sheet1");
foreach (DataColumn column in results.Tables[0].Columns)
{
int rowIndex = 0;
foreach (DataRow row in results.Tables[0].Rows)
{
HSSFRow dataRow = sheet1.CreateRow(rowIndex);
dataRow.CreateCell(column.Ordinal).SetCellValue(row[column].ToString());
rowIndex++;
}
sheet1.AutoSizeColumn(column.Ordinal);
}
//Write the stream data of workbook to the file 'test.xls' in the temporary directory
FileStream file = new FileStream(Path.Combine(Path.GetTempPath(), "test.xls") , FileMode.Create);
spreadsheet.Write(file);
file.Close();
When I open test.xls, only the last column has values. None of the previous columns have anything in them! (The width of each column is adjusted, however.)
FYI: 1) I use the code in GetTable() from https://www.dotnetperls.com/datatable
2) I am using C#.
Regarding "only the last column has values, and none of the previous columns have anything in them":
That's because you're looping through the columns and (re)creating ALL the rows on each iteration. Calling CreateRow again for an existing row index replaces the old row (with its previously set cell values) with a blank one. Thus every column but the last ends up blank.
Try switching the loop order so that rows are iterated first, then columns:
HSSFWorkbook spreadsheet = new HSSFWorkbook();
DataSet results = GetSalesDataFromDatabase();
//here, we must insert at least one sheet to the workbook. otherwise, Excel will say 'data lost in file'
HSSFSheet sheet1 = spreadsheet.CreateSheet("Sheet1");
int rowIndex = 0;
foreach (DataRow row in results.Tables[0].Rows)
{
HSSFRow dataRow = sheet1.CreateRow(rowIndex);
foreach (DataColumn column in results.Tables[0].Columns)
{
dataRow.CreateCell(column.Ordinal).SetCellValue(row[column].ToString());
}
rowIndex++;
}
for(var i = 0; i< results.Tables[0].Columns.Count; i++)
{
sheet1.AutoSizeColumn(i);
}
//Write the stream data of workbook to the file 'test.xls' in the temporary directory
FileStream file = new FileStream(Path.Combine(Path.GetTempPath(), "test.xls"), FileMode.Create);
spreadsheet.Write(file);
file.Close();
Related
I need to delete empty rows from my Excel file using ClosedXML. These rows previously had some data but are now empty. I have tried using row.IsEmpty(), but it does not delete any rows at all. Below is what I have tried so far:
using (XLWorkbook workBook = new XLWorkbook(destinationPath))
{
int worksheetCount = workBook.Worksheets.Count;
for(int worksheetIndex = 1; worksheetIndex <= worksheetCount; worksheetIndex++)
{
//Read the first Sheet from Excel file.
IXLWorksheet workSheet = workBook.Worksheet(1);
int rowCount = workSheet.RowsUsed().Count();
//Loop through the Worksheet rows.
foreach (IXLRow row in workSheet.RowsUsed())
{
if (row.IsEmpty())
{
row.Delete();
}
}
workBook.Save();
int rowCountNew = workSheet.RowsUsed().Count();
My rowCount and rowCountNew have the same value when they should actually be different. Also, even though I have empty rows in the Excel file, my if condition always remains false and so never reaches row.Delete(). I hope my question makes sense.
Thanks In Advance!
Your core issue is to find out why row.IsEmpty() returns false. For the parameterless .IsEmpty() overload, ClosedXML uses the XLCellsUsedOptions.AllContents value to determine whether cells are empty. This means that if a cell contains values or comments, it is considered non-empty. My guess is that you forgot to clear some comments.
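As a rough sketch of one way around this: RowsUsed() appears to return only rows that ClosedXML already considers non-empty, so calling IsEmpty() on them will not find anything to delete. The code below instead walks the sheet by row number, from the last used row upwards, and applies its own definition of "blank" (every used cell holds empty or whitespace-only text). The class and method names are hypothetical, and the whitespace rule is an assumption about what "empty" means for your file. Note that a row holding only comments would also count as blank here.

```csharp
using System.Linq;
using ClosedXML.Excel;

public static class EmptyRowCleaner
{
    // Hypothetical helper: a row is "blank" when every used cell
    // contains an empty or whitespace-only string.
    public static bool IsBlank(IXLRow row)
    {
        return row.CellsUsed().All(c => string.IsNullOrWhiteSpace(c.GetString()));
    }

    // Deletes blank rows and returns how many were removed.
    // Iterating bottom-up keeps row numbers stable while deleting.
    public static int DeleteBlankRows(IXLWorksheet sheet)
    {
        IXLRow last = sheet.LastRowUsed();
        if (last == null) return 0; // nothing on the sheet at all

        int deleted = 0;
        for (int r = last.RowNumber(); r >= 1; r--)
        {
            IXLRow row = sheet.Row(r);
            if (IsBlank(row))
            {
                row.Delete();
                deleted++;
            }
        }
        return deleted;
    }
}
```

Call EmptyRowCleaner.DeleteBlankRows(workSheet) for each worksheet before saving the workbook.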
Using the ClosedXML library, we can iterate through each worksheet in the workbook and delete a given row by its row number:
string srcFile ="srcfile.xlsx";
string dstFile = "destnation file.xlsx";
using (XLWorkbook wb = new XLWorkbook(srcFile))
{
foreach (var item in wb.Worksheets)
{
item.Row(2).Delete();
}
wb.SaveAs(dstFile);
}
I'm using an SSIS package to clean and load data from an .xlsx file into a SQL Server table.
I also have to highlight the cells containing wrong data in the .xlsx file. To do this, I need to get the column and row indexes from the column name and row ID (which I have in my data spreadsheet). I compare each column name from my first spreadsheet (Error_Sheet) with the rows of a column that I added in a second spreadsheet, and do the same for the rows. When the cell values match, I take the column and row indexes from my data spreadsheet and highlight the cell at that position. The script worked fine, but after I tried to run it from a server I got an out-of-memory exception, and it now also happens on my workstation, where it was working fine before.
I've tried reducing the range I take data from, from AC1:AC10000 to AC1:AC100. It worked only the first time after compiling, and then it kept throwing the exception again.
string strSQLErrorColumns = "Select * From [" + Error_Sheet + "AC1:AC100]";
OleDbConnection cn = new OleDbConnection(strCn);
OleDbDataAdapter objAdapterErrorColumns = new OleDbDataAdapter(strSQLErrorColumns, cn);
System.Data.DataSet dsErrorColumns = new DataSet();
objAdapterErrorColumns.Fill(dsErrorColumns, Error_Sheet);
System.Data.DataTable dtErrorColumns = dsErrorColumns.Tables[Error_Sheet];
dsErrorColumns.Dispose();
objAdapterErrorColumns.Dispose();
foreach (DataColumn ColumnData in dtDataColumns.Columns){
ColumnDataCellsValue = dtDataColumns.Columns[iCntD].ColumnName.ToString();
iCntE = 0;
foreach (DataRow ColumnError in dtErrorColumns.Rows){
ColumnErrorCellsValue = dtErrorColumns.Rows[iCntE].ItemArray[0].ToString();
if (ColumnDataCellsValue.Equals(ColumnErrorCellsValue)){
ColumnIndex = ColumnData.Table.Columns[ColumnDataCellsValue].Ordinal;
iCntE = iCntE + 1;
break;
}
}
iCntD = iCntD + 1;
}
ColumnIndexHCell = ColumnIndex + 1;
RowIndexHCell = RowIndex + 2;
Range rng = xlSheets.Cells[RowIndexHCell, ColumnIndexHCell] as Excel.Range;
rng.Interior.Color = System.Drawing.ColorTranslator.ToOle(System.Drawing.Color.Yellow);
Is there another way to load the data into a DataTable and get the column and row indexes without using a lot of memory, or a way to use Excel.Range.Cell instead of a DataSet and DataTable to get the cell value and the column and row indexes from the .xlsx file?
I haven't shown the whole code because it's long. Please let me know if more information is needed.
When reading data from an Excel file with a huge number of rows, it is better to read the data in chunks (with OleDbDataAdapter you can use the paging overload of Fill to achieve that).
int result = 1;
int intPagingIndex = 0;
int intPagingInterval = 1000;
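while (result > 0){
// Fill returns the number of rows added; 0 means there is nothing left to read
result = daGetDataFromSheet.Fill(dsErrorColumns,intPagingIndex, intPagingInterval , Error_Sheet);
System.Data.DataTable dtErrorColumns = dsErrorColumns.Tables[Error_Sheet];
//Implement your logic here
// Drop the processed rows so that only one page is held in memory at a time
dtErrorColumns.Clear();
intPagingIndex += intPagingInterval ;
}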
This helps prevent an OutOfMemoryException, and there is no longer any need to specify a range like AC1:AC10000.
References
Paging Through a Query Result
Fill(DataSet, Int32, Int32, String)
I'm using EPPlus and iterating through all the columns of each worksheet in a workbook. I'm trying to format every column with a header that contains the word "NUMBER" as a number format. It runs and it hits the breakpoint where I set the number format but, when I open the spreadsheet, the columns are still formatted as text. Any help would very much be appreciated.
private void cleanSpreadSheet(string fileName)
{
// set all columns with a header of number to numeric type
FileInfo existingFile = new FileInfo(fileName);
var package = new ExcelPackage(existingFile);
ExcelWorkbook wb = package.Workbook;
foreach (ExcelWorksheet workSheet in wb.Worksheets)
{
var start = workSheet.Dimension.Start;
var end = workSheet.Dimension.End;
for (int col = start.Column; col <= end.Column; col++)
{ // col by col
if (workSheet.Cells[1, col].Text.ToUpper().Contains("NUMBER"))
{
workSheet.Column(col).Style.Numberformat.Format = "0";
}
}
}
package.Save();
package.Dispose();
wb.Dispose();
}
The issue may be that your existing Excel workbook has data formatted as Excel "Text". If this is the case, you likely won't be able to simply change each cell's format to a number, as Excel doesn't know how to convert "Text" to "Numbers".
Instead, you may need to iterate over each column and row in EPPlus and replace each value after casting it. The code below could use some error checking on the casts, but it gives you a sense of the concept.
//foreach row ... foreach column...
ExcelRange cell = worksheet.Cells[row, col];
cell.Value = double.Parse((string)cell.Value);
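The error checking mentioned above could look roughly like the following sketch. It uses double.TryParse so that cells which don't hold a valid number are left untouched instead of throwing. The helper name ToNumberOrOriginal is hypothetical, and parsing with the invariant culture is an assumption; use whichever culture matches your data.

```csharp
using System.Globalization;

public static class CellValueConverter
{
    // Hypothetical helper: returns a double when the value is numeric text,
    // otherwise returns the original value (including null) unchanged.
    public static object ToNumberOrOriginal(object value)
    {
        if (value is string text &&
            double.TryParse(text.Trim(),
                            NumberStyles.Float | NumberStyles.AllowThousands,
                            CultureInfo.InvariantCulture,
                            out double number))
        {
            return number;
        }
        return value;
    }
}
```

Inside the row and column loops, the assignment would then become cell.Value = CellValueConverter.ToNumberOrOriginal(cell.Value);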
I am reading data from Excel into a DataTable using EPPlus.
After reading an Excel sheet with 10 rows of records, I modified the sheet by removing the existing data and keeping data in only one row.
But when I read the modified Excel file, it still reads 10 rows (1 with values and the rest as null fields) into the DataTable.
How can I limit this?
I am using the following code to read the Excel file.
using (var pck = new OfficeOpenXml.ExcelPackage())
{
using (var stream = File.OpenRead(FilePath))
{
pck.Load(stream);
}
var ws = pck.Workbook.Worksheets.First();
bool hasHeader = true; // adjust it accordingly(this is a simple approach)
foreach (var firstRowCell in ws.Cells[1, 1, 1, ws.Dimension.End.Column])
{
DSClientTransmittal.Tables[0].Columns.Add(hasHeader ? firstRowCell.Text : string.Format("Column {0}", firstRowCell.Start.Column));
}
var startRow = hasHeader ? 2 : 1;
for (var rowNum = startRow; rowNum <= ws.Dimension.End.Row; rowNum++)
{
//var wsRow = ws.Cells[rowNum, 1, rowNum, ws.Dimension.End.Column];
var wsRow = ws.Cells[rowNum, 1, rowNum, DSClientTransmittal.Tables[0].Columns.Count];
var row = DSClientTransmittal.Tables[0].NewRow();
foreach (var cell in wsRow)
{
try
{
object cellValue = cell.Value;
//row[cell.Start.Column - 1] = cell.Text;
row[cell.Start.Column - 1] = cellValue.ToString().Trim();
//cell.Style.Numberformat.Format = "#";
//row[cell.Start.Column - 1] = cell.Text;
}
catch (Exception ex) { }
}
DSClientTransmittal.Tables[0].Rows.Add(row);
}
pck.Dispose();
}
When I was using Excel Interop to read the file, I overcame the same issue with the ClearFormats() method, like this:
ws.Columns.ClearFormats();
xlColCount = ws.UsedRange.Columns.Count;
Is there an equivalent for this in EPPlus?
How can I get the actual used range of a modified Excel file?
There is no built-in way of indicating that a row shouldn't be accounted for when only deleting data in some cells.
Dimension is as close as you can get, but rows are included in the Dimension if any column contains data or if any row above or below contains data.
You could however try to find out if you should skip a row in the for loop.
For example if you always delete data in the first 4 columns only, then you could try:
if(!ws.Cells[rowNum, 1, rowNum, 4].All(c => c.Value == null))
{
//Continue adding the row to the table
}
The question doesn't state the criteria for skipping a row, but you get the idea.
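If the criterion is simply "skip rows where every cell is null or blank", the check can be factored into a small helper that works on plain values. This is only a sketch: the name IsBlankRow is hypothetical, and treating whitespace-only text as blank is an assumption.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class RowFilter
{
    // Hypothetical helper: true when every value is null
    // or converts to an empty/whitespace-only string.
    public static bool IsBlankRow(IEnumerable<object> values)
    {
        return values.All(v => v == null || string.IsNullOrWhiteSpace(v.ToString()));
    }
}
```

In the question's for loop, this could be used as if (RowFilter.IsBlankRow(wsRow.Select(c => c.Value))) continue; placed before the row is added to the table.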
To start with, I am not a C# programmer, but I think I have a solution that works using an Excel VBA script. You may be able to run this Excel VBA code from C#, or get insight into how to accomplish the same thing in C#.
The problem you are having is related to the way Excel handles the working size of a worksheet. If you enter data in the 1 millionth row and then delete that cell, Excel still shows the worksheet as having 1 million rows.
I tested out this Excel VBA code and it successfully deleted all rows that were completely empty, and then reset the worksheet size.
Sub DelEmptyRowsResizeWorksheet()
Dim i As Long, iLimit As Long
iLimit = ActiveSheet.UsedRange.Rows.Count
For i = iLimit To 1 Step -1
If Application.CountA(Cells(i, 1).EntireRow) = 0 Then
Cells(i, 1).EntireRow.Delete
End If
Next i
iLimit = ActiveSheet.UsedRange.Rows.Count ' resize the worksheet based on the last row with data
End Sub
To do this manually without a script, first delete all empty rows at the bottom (or columns on the right side) of a worksheet, save it, then close and reopen the workbook. I found that this also resets the Excel workbook size.
I am trying to read from an Excel file whose data is not tabular as a whole.
There are sections within the file that are tabular.
I need to loop through rows 3 to 20, which are tabular, and read the data.
Here is part of my code:
string fileName = "C:\\Folder1\\Prev.xlsx";
var workbook = new XLWorkbook(fileName);
var ws1 = workbook.Worksheet(1);
How do I loop through rows 3 to 20 and read columns 3,4, 6, 7, 8?
Also, if a row is empty, how do I determine that so I can skip over it without checking whether each column in that row has a value?
To access a row:
var row = ws1.Row(3);
To check if the row is empty:
bool empty = row.IsEmpty();
To access a cell (column) in a row:
var cell = row.Cell(3);
To get the value from a cell:
object value = cell.Value;
// or
string value = cell.GetValue<string>();
For more information see the documentation.
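Putting those pieces together for the question's case (rows 3 to 20, columns 3, 4, 6, 7 and 8) might look like the sketch below. The class and method names are hypothetical, and reading every cell as a string via GetString() is an assumption; pick a typed getter if you need numbers or dates.

```csharp
using System.Collections.Generic;
using ClosedXML.Excel;

public static class SectionReader
{
    // Reads columns 3, 4, 6, 7 and 8 from rows 3 to 20, skipping empty rows.
    public static List<string[]> ReadSection(IXLWorksheet sheet)
    {
        int[] columns = { 3, 4, 6, 7, 8 };
        var result = new List<string[]>();

        for (int rowNumber = 3; rowNumber <= 20; rowNumber++)
        {
            IXLRow row = sheet.Row(rowNumber);
            if (row.IsEmpty()) continue; // nothing in this row, skip it

            var values = new string[columns.Length];
            for (int i = 0; i < columns.Length; i++)
            {
                values[i] = row.Cell(columns[i]).GetString();
            }
            result.Add(values);
        }
        return result;
    }
}
```

With the question's variables, this would be called as SectionReader.ReadSection(ws1).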
Here's my jam.
var rows = worksheet.RangeUsed().RowsUsed().Skip(1); // Skip header row
foreach (var row in rows)
{
var rowNumber = row.RowNumber();
// Process the row
}
If you just use .RowsUsed(), your range will contain a huge number of columns. Way more than are actually filled in!
So use .RangeUsed() first to limit the range. This will help you process the file faster!
You can also use .Skip(1) to skip over the column header row (if you have one).
I'm not sure whether this will solve the OP's problem, but I prefer using the RowsUsed method. It returns only those rows that are non-empty or have been edited by the user, so I can avoid checking each row for emptiness while processing it.
The code snippet below processes the rows numbered 3 to 20 out of all the non-empty rows. The empty rows are filtered out before the foreach loop starts. Bear in mind that filtering out empty rows beforehand changes the total number of rows that get processed, so be careful with any logic inside the foreach loop that depends on the total number of rows processed.
string fileName = "C:\\Folder1\\Prev.xlsx";
using (var excelWorkbook = new XLWorkbook(fileName))
{
var nonEmptyDataRows = excelWorkbook.Worksheet(1).RowsUsed();
foreach (var dataRow in nonEmptyDataRows)
{
//for row number check
if(dataRow.RowNumber() >=3 && dataRow.RowNumber() <= 20)
{
//to get column # 3's data
var cell = dataRow.Cell(3).Value;
}
}
}
The RowsUsed method is helpful for many common problems that involve processing the rows of an Excel sheet.
This works easily:
XLWorkbook workbook = new XLWorkbook(FilePath);
var rowCount = workbook.Worksheet(1).LastRowUsed().RowNumber();
var columnCount = workbook.Worksheet(1).LastColumnUsed().ColumnNumber();
int column = 1;
int row = 1;
List<string> ll = new List<string>();
while (row <= rowCount)
{
while (column <= columnCount)
{
string title = workbook.Worksheets.Worksheet(1).Cell(row, column).GetString();
ll.Add(title);
column++;
}
row++;
column = 1;
}