I have a sample.xlsx file generated by a program (over which I have no control) with data spanning 200 rows. However, the file also has another 1800 "trailing" empty rows (as shown in the image) saved into it, making a total of 2000 rows. That is, if I read this file using OpenXML and get the row count, it is 2000 instead of 200.
I am trying to figure out a way to delete these rows so that the Excel file contains only the rows with data.
NOTE : There are no empty rows in between.
All the empty rows come at the end of the sheet. How can I check for empty rows and delete them, or find the rows that have values and delete the remaining rows by row index?
PS : I can't use Interop Services.
And also the counts 200 and 2000 are for example purpose, it may vary from one file to another.
Below is my code:
using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(@"C:\sample.xlsx", true))
{
    WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
    var thisSheet = workbookPart.Workbook.Descendants<Sheet>()
        .FirstOrDefault(s => s.Name == "sheet1");
    // SheetData lives in the worksheet part, not under the workbook's Sheet element
    var worksheetPart = (WorksheetPart)workbookPart.GetPartById(thisSheet.Id);
    SheetData sheetData = worksheetPart.Worksheet.GetFirstChild<SheetData>();
    IEnumerable<Row> rows = sheetData.Descendants<Row>();
    foreach (var row in rows)
    {
        //method to find empty rows or row index of empty rows
        // rows.Remove() or RemoveAllChildren()
    }
}
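One possible approach, given that the empty rows only ever appear at the end, is to walk the rows from the bottom up and remove them until the first row with data is reached. The following is a minimal sketch under some assumptions: `TrailingRowTrimmer` and `IsEmpty` are hypothetical names, and "empty" is taken to mean that no cell in the row has a value, a formula, or an inline string (so a row containing only styled blank cells is treated as empty).

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Spreadsheet;

// Hypothetical helper, not part of the Open XML SDK.
static class TrailingRowTrimmer
{
    // Assumption: a row is "empty" when none of its cells has a value,
    // a formula, or an inline string. Styled but blank cells count as empty.
    static bool IsEmpty(Row row) =>
        !row.Elements<Cell>().Any(c =>
            !string.IsNullOrEmpty(c.CellValue?.Text) ||
            c.CellFormula != null ||
            c.InlineString != null);

    // Removes the empty rows at the end of the sheet and returns how many were removed.
    public static int TrimTrailingEmptyRows(SheetData sheetData)
    {
        // Walk from the last row upwards and stop at the first row that has data.
        List<Row> trailing = sheetData.Elements<Row>()
            .Reverse()
            .TakeWhile(r => IsEmpty(r))
            .ToList();

        foreach (Row row in trailing)
        {
            row.Remove();
        }
        return trailing.Count;
    }
}
```

With the `sheetData` obtained from the worksheet part, this would be called as `TrailingRowTrimmer.TrimTrailingEmptyRows(sheetData);` followed by saving the worksheet. Note that other parts of the file, such as the sheet's dimension element, may still refer to the old range, so this is a starting point rather than a complete cleanup.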
Related
I'm trying to make a template excel file and I need to put data at various parts of the file. I have 2 fields where the data I'm importing is from a list so in the cell I do something like this:
{Item.Name}
and I of course name the range of cells that will be populated by this list. I have run into an issue where only the first record in my list gets the correct format and cell merge. Every record after the first breaks all of my merged cells, so the formatting is wrong. Any ideas on how to get ClosedXML.Excel to recognize that there are merged cells?
I don't know if there is a way to get only the merged cells, but you can check if a cell is merged:
using (var excelFileStream = new FileStream("excelfile.xlsx", FileMode.Open, FileAccess.Read))
{
    using IXLWorkbook workbook = new XLWorkbook(excelFileStream);
    IXLWorksheet worksheet = workbook.Worksheets.Worksheet(1);
    IXLCell cell = worksheet.Cell(row: 1, column: 1);
    // IsMerged() avoids dereferencing MergedRange(), which is null for a non-merged cell
    if (cell.IsMerged())
    {
        //merged cell
    }
    else
    {
        //non-merged cell
    }
}
I iterate through rows using DocumentFormat.OpenXml. Sometimes I need to start from the 4th, 8th, or 11th row. I define how many rows should be skipped with "skipRows", and the "if" below lets me skip the unnecessary rows:
var rows = sheet.Descendants<Row>();
foreach (Row row in rows)
{
    if (dataRowIndex < skipRows)
    {
        dataRowIndex++;
        continue;
    }
    // ... process the row
}
The problem is that sometimes when row is completely empty it automatically doesn't iterate through it. Sometimes when it's empty it iterates through it. It always iterates when there is any cell written in said row. Why is that? How can I ensure that it always skips for example 6 rows no matter if there is any data in cells in those rows?
Sometimes when it's empty it iterates through it. It always iterates when there is any cell written in said row. Why is that?
This is due to the way the XML schema is defined. A row is completely optional in the schema; if there's no data in a row then there's no requirement to write it to the XML (although there's nothing to stop it being written either). If there is a cell in a row then the row must be written to the XML as a cell is a child of a row; without the row there would be nowhere to write the cell.
How can I ensure that it always skips for example 6 rows no matter if there is any data in cells in those rows?
You can use the RowIndex property of the Row to find out the actual index of the Row being read.
The following example should do what you're after:
using (SpreadsheetDocument document = SpreadsheetDocument.Open(filePath, false))
{
    WorkbookPart workbookPart = document.WorkbookPart;
    WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
    SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
    SharedStringTablePart stringTable = workbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
    var rows = sheetData.Descendants<Row>();
    foreach (Row row in rows)
    {
        if (row.RowIndex <= skipRows)
        {
            continue;
        }
        //this is just to show that it's outputting from the first non-skipped row
        Cell cell = row.GetFirstChild<Cell>();
        if (cell == null)
        {
            //a row element can be written without any cells
            continue;
        }
        string contents;
        //DataType is optional, so check it for null before reading its value
        if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString && stringTable != null)
        {
            int index = int.Parse(cell.CellValue.InnerText);
            contents = stringTable.SharedStringTable.ElementAt(index).InnerText;
        }
        else
        {
            contents = cell.InnerText;
        }
        Console.WriteLine(contents);
    }
}
I am investigating the AutoSizeColumn() method of NPOI's Excel export. From this SO question, I know that when exporting a DataTable to Excel, I have to write all the data in a column before calling AutoSizeColumn() on it. Example (from this SO answer):
HSSFWorkbook spreadsheet = new HSSFWorkbook();
DataSet results = GetSalesDataFromDatabase();
//here, we must insert at least one sheet to the workbook. otherwise, Excel will say 'data lost in file'
HSSFSheet sheet1 = spreadsheet.CreateSheet("Sheet1");
foreach (DataColumn column in results.Tables[0].Columns)
{
    int rowIndex = 0;
    foreach (DataRow row in results.Tables[0].Rows)
    {
        HSSFRow dataRow = sheet1.CreateRow(rowIndex);
        dataRow.CreateCell(column.Ordinal).SetCellValue(row[column].ToString());
        rowIndex++;
    }
    sheet1.AutoSizeColumn(column.Ordinal);
}
//Write the stream data of workbook to the file 'test.xls' in the temporary directory
FileStream file = new FileStream(Path.Combine(Path.GetTempPath(), "test.xls"), FileMode.Create);
spreadsheet.Write(file);
file.Close();
When I open test.xls, only the last column has values. None of the previous columns have anything in them! (The size of each column is adjusted, however.)
FYI: 1) I use the code in GetTable() from https://www.dotnetperls.com/datatable
2) I am using C#.
When I open test.xls, only the last column has values. None of the previous columns have anything in them! (The size of each column is adjusted, however.)
That's because you're looping through the columns and (re)creating ALL the rows on each iteration. Each pass of this row (re)creation loop effectively overwrites the old rows (with their previously set cell values) with blank ones, so every column but the last ends up blank.
Try switching the loop order so that rows are iterated first then columns:
HSSFWorkbook spreadsheet = new HSSFWorkbook();
DataSet results = GetSalesDataFromDatabase();
//here, we must insert at least one sheet to the workbook. otherwise, Excel will say 'data lost in file'
HSSFSheet sheet1 = spreadsheet.CreateSheet("Sheet1");
int rowIndex = 0;
foreach (DataRow row in results.Tables[0].Rows)
{
    HSSFRow dataRow = sheet1.CreateRow(rowIndex);
    foreach (DataColumn column in results.Tables[0].Columns)
    {
        dataRow.CreateCell(column.Ordinal).SetCellValue(row[column].ToString());
    }
    rowIndex++;
}
for (var i = 0; i < results.Tables[0].Columns.Count; i++)
{
    sheet1.AutoSizeColumn(i);
}
//Write the stream data of workbook to the file 'test.xls' in the temporary directory
FileStream file = new FileStream(Path.Combine(Path.GetTempPath(), "test.xls"), FileMode.Create);
spreadsheet.Write(file);
file.Close();
A spreadsheet contains 5 columns and 8 rows. The first row is the header of the sheet. I have to read the data of all the cells. The last column contains blank values. C# is unable to read the 5th column of the 7th and 8th rows. The code below throws an index-out-of-range exception.
rows[7].Descendants().ElementAt(4)
I have found that the font size of the 5th column in the 7th and 8th rows is different from the others. If I change the font size, it works fine. I don't have any explanation for this strange behaviour.
Please find the code base below
using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(filePath, false))
{
    WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
    var workBookSheet = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>().Where(s => s.Name.ToString() == "Feuil1").FirstOrDefault();
    if (workBookSheet != null)
    {
        string relationshipId = workBookSheet.Id.Value;
        WorksheetPart worksheetPart = (WorksheetPart)spreadSheetDocument.WorkbookPart.GetPartById(relationshipId);
        Worksheet workSheet = worksheetPart.Worksheet;
        SheetData sheetData = workSheet.GetFirstChild<SheetData>();
        IEnumerable<Row> rows = sheetData.Descendants<Row>();
        var data = rows.ElementAt(2).Descendants<Cell>().ElementAt(4);
    }
}
The spreadsheet looks like this:
[screenshot of the spreadsheet, not included]
According to the sheet, I get the error at row 3, column 4.
Try this to get the value
string val = ((Range)xlWorkSheet.Cells[7, 5]).Value2.ToString();
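For the Open XML SDK code in the question, a likely explanation is that blank cells are simply not written to the file, so a row can contain fewer than five cell elements and `ElementAt(4)` runs past the end. A blank cell that carries its own formatting (such as a different font size) can still be written, which would account for the behaviour changing with the font. The sketch below looks a cell up by its reference instead of by position. `CellLookup` and `FindCell` are hypothetical names, and the code assumes the row and its cells carry the `RowIndex` and `CellReference` attributes, which files saved by Excel normally do.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Spreadsheet;

// Hypothetical helper, not part of the Open XML SDK.
static class CellLookup
{
    // Returns the cell in the given column (e.g. "E") of this row,
    // or null when the file contains no element for that cell.
    // Assumption: RowIndex and CellReference are present on the elements.
    public static Cell FindCell(Row row, string columnName)
    {
        string reference = columnName + row.RowIndex.Value;
        return row.Elements<Cell>()
            .FirstOrDefault(c => c.CellReference?.Value == reference);
    }
}
```

A call such as `CellLookup.FindCell(rows.ElementAt(7), "E")` then yields null for a blank cell rather than throwing, and the caller can treat null as an empty value.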
My requirement is to insert a formula into a cell. I am using the method below to insert the formula. It inserts the formula correctly and the formula works fine,
but after I insert the formula my Excel file gets corrupted and shows the message
"Excel found unreadable content in "exceltemplate.xlsx"
Do you want to recover the contents of...".
I searched a lot, but could not resolve it.
Please help me resolve this.
public void InsertFormula(string filepath, string SheetName, string strCellIndex, string strFormula)
{
    using (SpreadsheetDocument document = SpreadsheetDocument.Open(filepath, true))
    {
        IEnumerable<Sheet> sheets = document.WorkbookPart.Workbook.Descendants<Sheet>().Where(s => s.Name == SheetName);
        if (sheets.Count() == 0)
        {
            // The specified worksheet does not exist.
            return;
        }
        WorksheetPart worksheetPart = (WorksheetPart)document.WorkbookPart.GetPartById(sheets.First().Id);
        Worksheet worksheet = worksheetPart.Worksheet;
        SheetData sheetData = worksheet.GetFirstChild<SheetData>();
        Row row1 = new Row()
        {
            RowIndex = (UInt32Value)4U,
            Spans = new ListValue<StringValue>()
        };
        Cell cell = new Cell() { CellReference = strCellIndex };
        CellFormula cellformula = new CellFormula();
        cellformula.Text = strFormula;
        cell.DataType = CellValues.Number;
        CellValue cellValue = new CellValue();
        cellValue.Text = "0";
        cell.Append(cellformula);
        cell.Append(cellValue);
        row1.Append(cell);
        sheetData.Append(row1);
        worksheet.Save();
        document.Close();
    }
}
There are 2 problems with the function.
The 1st problem is that you explicitly set the RowIndex to 4U. That only works if the cell you're assigning the formula to is on row 4, say cell C4. Since the cell reference is passed in as a parameter (strCellIndex), that's not guaranteed.
And even if you fixed that, we have the next (and more insidious) problem...
The 2nd problem is a little harder to fix. The Row objects have to be inserted in order within the SheetData object (as child elements), ordered by RowIndex. Let's assume you still want RowIndex to be hard-coded as 4U. This means if the existing Excel file has rows 2, 3 and 7, you have to insert your Row right after the Row with RowIndex 3. This is important, otherwise Excel will puke blood (as you've already experienced).
The solution to the 2nd problem requires a little more work. Consider the functions InsertAt(), InsertBefore() and InsertAfter() of the SheetData class (or of most of the Open XML SDK classes, actually). Iterate through the children of SheetData till you find a Row with a RowIndex greater than that of the Row you're inserting. Then use InsertBefore().
I will leave you with the fun task of error checking, such as if there are no Row classes to begin with, or all Row classes have RowIndex-es less than your to-be-inserted Row class, or (here's the fun one) an existing Row class with the same RowIndex as the Row class you want to insert.
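As a starting point for that task, here is a minimal sketch of the approach described above. The class and method names (`FormulaInserter`, `GetOrCreateRow`, `GetOrCreateCell`, `SetFormula`) are hypothetical, and the code assumes a simple cell reference such as "C4" with no sheet name or "$" signs. It derives the row index from the cell reference rather than hard-coding it, reuses an existing row or cell when one is already present, and otherwise inserts the new element in order. Cells within a row are kept in column order as well, on the assumption that Excel expects that too.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Spreadsheet;

// Hypothetical helper, not part of the Open XML SDK.
static class FormulaInserter
{
    // Returns the row with this index, inserting a new one in RowIndex order if needed.
    public static Row GetOrCreateRow(SheetData sheetData, uint rowIndex)
    {
        Row next = null;
        foreach (Row existing in sheetData.Elements<Row>())
        {
            uint? index = existing.RowIndex?.Value;
            if (index == null) continue;                 // row without an explicit index
            if (index == rowIndex) return existing;      // row already exists: reuse it
            if (index > rowIndex) { next = existing; break; }
        }
        var row = new Row { RowIndex = rowIndex };
        if (next != null) sheetData.InsertBefore(row, next);
        else sheetData.Append(row);                      // no rows, or all rows come earlier
        return row;
    }

    static string ColumnOf(string cellReference) =>
        new string(cellReference.TakeWhile(c => char.IsLetter(c)).ToArray());

    // "Z" sorts before "AA": compare by length first, then alphabetically.
    static int CompareColumns(string a, string b) =>
        a.Length != b.Length ? a.Length.CompareTo(b.Length) : string.CompareOrdinal(a, b);

    // Returns the cell with this reference, inserting a new one in column order if needed.
    public static Cell GetOrCreateCell(Row row, string cellReference)
    {
        string column = ColumnOf(cellReference);
        Cell next = null;
        foreach (Cell existing in row.Elements<Cell>())
        {
            string reference = existing.CellReference?.Value;
            if (reference == null) continue;
            int order = CompareColumns(ColumnOf(reference), column);
            if (order == 0) return existing;
            if (order > 0) { next = existing; break; }
        }
        var cell = new Cell { CellReference = cellReference };
        if (next != null) row.InsertBefore(cell, next);
        else row.Append(cell);
        return cell;
    }

    // Assumption: cellReference looks like "C4" (letters followed by digits).
    public static void SetFormula(SheetData sheetData, string cellReference, string formula)
    {
        uint rowIndex = uint.Parse(
            new string(cellReference.SkipWhile(c => char.IsLetter(c)).ToArray()));
        Row row = GetOrCreateRow(sheetData, rowIndex);
        Cell cell = GetOrCreateCell(row, cellReference);
        cell.CellFormula = new CellFormula(formula);
        cell.CellValue = null;   // drop any stale cached value
        cell.DataType = null;
    }
}
```

Inside InsertFormula, the block that builds and appends row1 could then be replaced by a call like `FormulaInserter.SetFormula(sheetData, strCellIndex, strFormula);` before saving the worksheet.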