I'm using EPPlus and iterating through all the columns of each worksheet in a workbook. I'm trying to format every column with a header that contains the word "NUMBER" as a number format. It runs and it hits the breakpoint where I set the number format but, when I open the spreadsheet, the columns are still formatted as text. Any help would very much be appreciated.
private void cleanSpreadSheet(string fileName)
{
// set all columns with a header of number to numeric type
FileInfo existingFile = new FileInfo(fileName);
var package = new ExcelPackage(existingFile);
ExcelWorkbook wb = package.Workbook;
foreach (ExcelWorksheet workSheet in wb.Worksheets)
{
var start = workSheet.Dimension.Start;
var end = workSheet.Dimension.End;
for (int col = start.Column; col <= end.Column; col++)
{ // col by col
if (workSheet.Cells[1, col].Text.ToUpper().Contains("NUMBER"))
{
workSheet.Column(col).Style.Numberformat.Format = "0";
}
}
}
package.Save();
package.Dispose();
wb.Dispose();
}
The issue may be that your existing Excel workbook has data formatted as Excel "Text". If this is the case, you won't likely be able to simply convert each cells' format to a number as Excel doesn't know how to convert "Text" to "Numbers".
Instead, you may need to iterate over each column and row in EPPlus and replace each value after casting them. The code below could use some error checking on the casts, but gives you a sense of the concept.
//foreach row ... foreach column...
ExcelRange cell = worksheet.Cells[row, col];
cell.Value = double.Parse((string)cell.Value);
Related
I need to delete empty rows from my excel using closed XML. These rows previously had some data but are now empty. I have tried using row.IsEmpty() but it does not delete any rows at all. Below is what I have tried so far:
using (XLWorkbook workBook = new XLWorkbook(destinationPath))
{
int worksheetCount = workBook.Worksheets.Count;
for(int worksheetIndex = 1; worksheetIndex <= worksheetCount; worksheetIndex++)
{
//Read the first Sheet from Excel file.
IXLWorksheet workSheet = workBook.Worksheet(1);
int rowCount = workSheet.RowsUsed().Count();
//Loop through the Worksheet rows.
foreach (IXLRow row in workSheet.RowsUsed())
{
if (row.IsEmpty())
{
row.Delete();
}
}
workBook.Save();
int rowCountNew = workSheet.RowsUsed().Count();
My rowCount and rowCountNew have the same values when actually they should be different. Also,even though I have empty rows in the excel my if condition continues to remain false and hence never hits row.delete(). Hope my question makes sense.
Thanks In Advance!
Your core issue is to find out why row.IsEmpty() returns false. For the parameterless .IsEmpty() overload, ClosedXML uses the XLCellsUsedOptions.AllContents value to determine whether cells are empty. This means that if a cell contains values or comments, it is considered non-empty. My guess is that you forgot to clear some comments.
Using closed XML library we can iterate through each workbook and delete a given row based on the row number
string srcFile ="srcfile.xlsx";
string dstFile = "destnation file.xlsx";
using (XLWorkbook wb = new XLWorkbook(srcFile))
{
foreach (var item in wb.Worksheets)
{
item.Row(2).Delete();
}
wb.SaveAs(dstFile);
}
I am investing the AutoSizeColumn() of NPOI Excel exporting. From this SO question, I know that when exporting a DataTable to excel, have to write all data inside one column before calling the AutoSizeColumn(). Example: (as from this SO answer:
HSSFWorkbook spreadsheet = new HSSFWorkbook();
DataSet results = GetSalesDataFromDatabase();
//here, we must insert at least one sheet to the workbook. otherwise, Excel will say 'data lost in file'
HSSFSheet sheet1 = spreadsheet.CreateSheet("Sheet1");
foreach (DataColumn column in results.Tables[0].Columns)
{
int rowIndex = 0;
foreach (DataRow row in results.Tables[0].Rows)
{
HSSFRow dataRow = sheet1.CreateRow(rowIndex);
dataRow.CreateCell(column.Ordinal).SetCellValue(row[column].ToString());
rowIndex++;
}
sheet1.AutoSizeColumn(column.Ordinal);
}
//Write the stream data of workbook to the file 'test.xls' in the temporary directory
FileStream file = new FileStream(Path.Combine(Path.GetTempPath(), "test.xls") , FileMode.Create);
spreadsheet.Write(file);
file.Close();
When I open the test.xls, only the last column has value. All those previous columns does not have anything in it! (The size of each column is adjusted however.)
FYI: 1) I use the code in GetTable() from https://www.dotnetperls.com/datatable
2) I am using C#.
When I open the test.xls, only the last column has value. All those previous columns does not have anything in it! (The size of each column is adjusted however.)
That's because you're looping through the column and (re)creating ALL the rows on each iteration. This row (re)creation loop effectively overwrites the old rows (with their previous cell value set) with a blank one. Thus all columns but last become blank.
Try switching the loop order so that rows are iterated first then columns:
HSSFWorkbook spreadsheet = new HSSFWorkbook();
DataSet results = GetSalesDataFromDatabase();
//here, we must insert at least one sheet to the workbook. otherwise, Excel will say 'data lost in file'
HSSFSheet sheet1 = spreadsheet.CreateSheet("Sheet1");
int rowIndex = 0;
foreach (DataRow row in results.Tables[0].Rows)
{
HSSFRow dataRow = sheet1.CreateRow(rowIndex);
foreach (DataColumn column in results.Tables[0].Columns)
{
dataRow.CreateCell(column.Ordinal).SetCellValue(row[column].ToString());
}
rowIndex++;
}
for(var i = 0; i< results.Tables[0].Columns.Count; i++)
{
sheet1.AutoSizeColumn(i);
}
//Write the stream data of workbook to the file 'test.xls' in the temporary directory
FileStream file = new FileStream(Path.Combine(Path.GetTempPath(), "test.xls"), FileMode.Create);
spreadsheet.Write(file);
file.Close();
I am reading data from excel to datable using EPPlus.
After reading an excel sheet with 10 rows of record, I modified the excel sheet by removing existing data and kept data for only one row.
But when I am reading the modified excel it still reading 10 rows (1 with value and remaining as null fields) to data table.
How can limit this?
I am using following code for reading Excel.
using (var pck = new OfficeOpenXml.ExcelPackage())
{
using (var stream = File.OpenRead(FilePath))
{
pck.Load(stream);
}
var ws = pck.Workbook.Worksheets.First();
bool hasHeader = true; // adjust it accordingly(this is a simple approach)
foreach (var firstRowCell in ws.Cells[1, 1, 1, ws.Dimension.End.Column])
{
DSClientTransmittal.Tables[0].Columns.Add(hasHeader ? firstRowCell.Text : string.Format("Column {0}", firstRowCell.Start.Column));
}
var startRow = hasHeader ? 2 : 1;
for (var rowNum = startRow; rowNum <= ws.Dimension.End.Row; rowNum++)
{
//var wsRow = ws.Cells[rowNum, 1, rowNum, ws.Dimension.End.Column];
var wsRow = ws.Cells[rowNum, 1, rowNum, DSClientTransmittal.Tables[0].Columns.Count];
var row = DSClientTransmittal.Tables[0].NewRow();
foreach (var cell in wsRow)
{
try
{
object cellValue = cell.Value;
//row[cell.Start.Column - 1] = cell.Text;
row[cell.Start.Column - 1] = cellValue.ToString().Trim();
//cell.Style.Numberformat.Format = "#";
//row[cell.Start.Column - 1] = cell.Text;
}
catch (Exception ex) { }
}
DSClientTransmittal.Tables[0].Rows.Add(row);
}
pck.Dispose();
}
When I was using Interop excel to read excel, same issue was overcame by
clearformat() method like
ws.Columns.ClearFormats();
xlColCount = ws.UsedRange.Columns.Count;
Is there any equivalent for this in Epplus open xml?
How can I get actual used range for modified excels?
There is no built-in way of indicating that a row shouldn't be accounted for when only deleting data in some cells.
Dimension is as close as you can get, but rows are included in the Dimension if any column contains data or if any row above or below contains data.
You could however try to find out if you should skip a row in the for loop.
For example if you always delete data in the first 4 columns only, then you could try:
if(!ws.Cells[rowNum, 1, rowNum, 4].All(c => c.Value == null))
{
//Continue adding the row to the table
}
The description isn't indicating the criteria for skipping a row, but you get the idea.
To start with, I am not a C# programmer, but I think I have a solution that works using an Excel VBA script. You may be able to run this Excel VBA code with C, or get insight in how to accomplish the same thing with C+.
The problem you are having is related to the way Excel handles the working size of a worksheet. If you enter data in the 1 millionth row and then delete that cell, Excel still shows the worksheet as having 1 million rows.
I tested out this Excel VBA code and it successfully deleted all rows that were completely empty, and then reset the worksheet size.
Sub DelEmptyRowsResizeWorksheet()
Dim i As Long, iLimit As Long
iLimit = ActiveSheet.UsedRange.Rows.Count
For i = iLimit To 1 Step -1
If Application.CountA(Cells(i, 1).EntireRow) = 0 Then
Cells(i, 1).EntireRow.Delete
End If
Next i
iLimit = ActiveSheet.UsedRange.Rows.Count ' resize the worksheet based on the last row with data
End Sub
To do this manually without a script, first delete all empty rows at the bottom (or columns on the right side) of a worksheet, save it, then close and reopen the workbook. I found that this also resets the Excel workbook size.
My Excel file is not in tabular data. I am trying to read from an excel file.
I have sections within my excel file that are tabular.
I need to loop through rows 3 to 20 which are tabular and read the data.
Here is party of my code:
string fileName = "C:\\Folder1\\Prev.xlsx";
var workbook = new XLWorkbook(fileName);
var ws1 = workbook.Worksheet(1);
How do I loop through rows 3 to 20 and read columns 3,4, 6, 7, 8?
Also if a row is empty, how do I determine that so I can skip over it without reading that each column has a value for a given row.
To access a row:
var row = ws1.Row(3);
To check if the row is empty:
bool empty = row.IsEmpty();
To access a cell (column) in a row:
var cell = row.Cell(3);
To get the value from a cell:
object value = cell.Value;
// or
string value = cell.GetValue<string>();
For more information see the documentation.
Here's my jam.
var rows = worksheet.RangeUsed().RowsUsed().Skip(1); // Skip header row
foreach (var row in rows)
{
var rowNumber = row.RowNumber();
// Process the row
}
If you just use .RowsUsed(), your range will contain a huge number of columns. Way more than are actually filled in!
So use .RangeUsed() first to limit the range. This will help you process the file faster!
You can also use .Skip(1) to skip over the column header row (if you have one).
I'm not sure if this solution will solve OP's problem but I prefer using RowsUsed method. It can be used to get the list of only those rows which are non-empty or has been edited by the user. This way I can avoid making emptiness check while processing each row.
Below code snippet can process 3rd to 20th row numbers out of all the non-empty rows. I've filtered the empty rows before starting the foreach loop. Please bear in mind that filtering the non-empty rows before starting to process the rows can affect the total count of rows which will get processed. So you need to be careful while applying any logic which is based on the total number of rows processed inside foreach loop.
string fileName = "C:\\Folder1\\Prev.xlsx";
using (var excelWorkbook = new XLWorkbook(fileName))
{
var nonEmptyDataRows = excelWorkbook.Worksheet(1).RowsUsed();
foreach (var dataRow in nonEmptyDataRows)
{
//for row number check
if(dataRow.RowNumber() >=3 && dataRow.RowNumber() <= 20)
{
//to get column # 3's data
var cell = dataRow.Cell(3).Value;
}
}
}
RowsUsed method is helpful in commonly faced problems which require processing the rows of an excel sheet.
It works easily
XLWorkbook workbook = new XLWorkbook(FilePath);
var rowCount = workbook.Worksheet(1).LastRowUsed().RowNumber();
var columnCount = workbook.Worksheet(1).LastColumnUsed().ColumnNumber();
int column = 1;
int row = 1;
List<string> ll = new List<string>();
while (row <= rowCount)
{
while (column <= columnCount)
{
string title = workbook.Worksheets.Worksheet(1).Cell(row, column).GetString();
ll.Add(title);
column++;
}
row++;
column = 1;
}
my requirement to is to insert formula for a cell. i am using below method to insert formula. And its inserting formula corectly and formula working fine.
but when i insert formula my excel file got corrpted and showing the message
"Excel found unreadable content in "exceltemplate.xlsx"
Do you want to recover the contents of...".
I searched lot,but not getting resolved.
Please help to resolve this
public void InsertFormula(string filepath, string SheetName, string strCellIndex, string strFormula)
{
using (SpreadsheetDocument document = SpreadsheetDocument.Open(filepath, true))
{
IEnumerable<Sheet> sheets = document.WorkbookPart.Workbook.Descendants<Sheet>().Where(s => s.Name == SheetName);
if (sheets.Count() == 0)
{
// The specified worksheet does not exist.
return;
}
WorksheetPart worksheetPart = (WorksheetPart)document.WorkbookPart.GetPartById(sheets.First().Id);
Worksheet worksheet = worksheetPart.Worksheet;
SheetData sheetData = worksheet.GetFirstChild<SheetData>();
Row row1 = new Row()
{
RowIndex = (UInt32Value)4U,
Spans = new ListValue<StringValue>()
};
Cell cell = new Cell() { CellReference = strCellIndex };
CellFormula cellformula = new CellFormula();
cellformula.Text = strFormula;
cell.DataType = CellValues.Number;
CellValue cellValue = new CellValue();
cellValue.Text = "0";
cell.Append(cellformula);
cell.Append(cellValue);
row1.Append(cell);
sheetData.Append(row1);
worksheet.Save();
document.Close();
}
}
There are 2 problems with the function.
1st problem is that you explicitly set the RowIndex to 4U. The cell that you're assigning the formula to has to be on row 4, say cell C4. Since the cell reference is passed in as a parameter (strCellIndex), that's not guaranteed.
And even if you fixed that, we have the next (and more insidious) problem...
2nd problem is a little harder to fix. The Row class has to be inserted in order within the SheetData class (as child objects), ordered by RowIndex. Let's assume you still want RowIndex to be hard-coded as 4U. This means if the existing Excel file has rows 2, 3 and 7, you have to insert the Row class behind the Row class with RowIndex 3. This is important, otherwise Excel will puke blood (as you've already experienced).
The solution to the 2nd problem requires a little more work. Consider the functions InsertAt(), InsertBefore() and InsertAfter() of the SheetData class (or most of the Open XML SDK classes actually). Iterate through the child classes of SheetData till you find a Row class with a RowIndex greater than the Row class you're inserting. Then use InsertBefore().
I will leave you with the fun task of error checking, such as if there are no Row classes to begin with, or all Row classes have RowIndex-es less than your to-be-inserted Row class, or (here's the fun one) an existing Row class with the same RowIndex as the Row class you want to insert.