How to always skip specified number of Rows using DocumentFormat.OpenXml


I iterate through rows using DocumentFormat.OpenXml, and sometimes I need to start from the 4th, 8th, or 11th row. I define how many rows should be skipped with "skipRows", and the "if" below lets me skip the unnecessary rows:
var rows = sheet.Descendants<Row>();
foreach (Row row in rows)
{
    if (dataRowIndex < skipRows)
    {
        dataRowIndex++;
        continue;
    }
    // ... process the row ...
}
The problem is that sometimes, when a row is completely empty, the loop doesn't iterate through it. Sometimes when it's empty it iterates through it. It always iterates when there is any cell written in said row. Why is that? How can I ensure that it always skips for example 6 rows no matter if there is any data in cells in those rows?

Sometimes when it's empty it iterates through it. It always iterates when there is any cell written in said row. Why is that?
This is due to the way the XML schema is defined. A row is completely optional in the schema; if there's no data in a row then there's no requirement to write it to the XML (although there's nothing to stop it being written either). If there is a cell in a row then the row must be written to the XML as a cell is a child of a row; without the row there would be nowhere to write the cell.
How can I ensure that it always skips for example 6 rows no matter if there is any data in cells in those rows?
You can use the RowIndex property of the Row to find out the actual index of the Row being read.
The following example should do what you're after:
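using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

// filePath and skipRows are assumed to be defined by the caller
using (SpreadsheetDocument document = SpreadsheetDocument.Open(filePath, false))
{
    WorkbookPart workbookPart = document.WorkbookPart;
    WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
    SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
    SharedStringTablePart stringTable = workbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
    var rows = sheetData.Descendants<Row>();
    foreach (Row row in rows)
    {
        // RowIndex is the 1-based row number stored in the file, so rows
        // that were never written to the XML still count towards skipRows
        if (row.RowIndex != null && row.RowIndex.Value <= skipRows)
        {
            continue;
        }
        //this is just to show that it's outputting from the first non-skipped row
        Cell cell = row.GetFirstChild<Cell>();
        if (cell == null)
        {
            // the row element was written but contains no cells
            continue;
        }
        string contents;
        if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
        {
            int index = int.Parse(cell.CellValue.InnerText);
            contents = stringTable.SharedStringTable.ElementAt(index).InnerText;
        }
        else
        {
            contents = cell.InnerText;
        }
        Console.WriteLine(contents);
    }
}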
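
To see why counting enumerated rows is unreliable, here is a minimal in-memory sketch. The class and method names (RowSkipDemo, RowsAfterSkip) are hypothetical and only illustrate the idea; the sheet is built in code instead of being read from a file.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Spreadsheet;

public static class RowSkipDemo
{
    // Hypothetical helper: returns the RowIndex of every row that lies
    // below the first skipRows rows of the sheet, whether or not the
    // skipped rows were actually written to the XML.
    public static List<uint> RowsAfterSkip(SheetData sheetData, uint skipRows)
    {
        return sheetData.Elements<Row>()
            .Where(r => r.RowIndex != null && r.RowIndex.Value > skipRows)
            .Select(r => r.RowIndex.Value)
            .ToList();
    }

    // Builds a sheet in which only rows 2, 7 and 8 exist as XML elements,
    // which is what a file with empty rows 1 and 3 to 6 may look like.
    public static SheetData BuildSparseSheet()
    {
        var sheetData = new SheetData();
        sheetData.Append(
            new Row { RowIndex = 2U },
            new Row { RowIndex = 7U },
            new Row { RowIndex = 8U });
        return sheetData;
    }
}
```

With this sheet and skipRows = 6, RowsAfterSkip returns rows 7 and 8. A counter that is incremented once per enumerated row would instead skip all three rows, because only three Row elements exist in the XML.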

Related

C# : OpenXML "Remove" empty rows from excel

I have a sample.xlsx file generated by a program (over which I have no control) with data spanning 200 rows. However, the file has another 1800 "trailing" empty rows (as shown in the image) saved into it, making a total of 2000 rows; i.e., if I read this file using OpenXML and get the row count, it is 2000 instead of 200.
I am trying to figure out a way to delete these rows so that the Excel file contains only rows with data.
NOTE: There are no empty rows in between.
All the empty rows come at the end of the sheet. How can I check for empty rows and delete them, or find the rows with values and delete the remaining rows by row index?
PS: I can't use Interop Services.
Also, the counts 200 and 2000 are only examples; they may vary from one file to another.
Below is my code:
using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(@"C:\sample.xlsx", true))
{
    WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
    var thisSheet = workbookPart.Workbook.Descendants<Sheet>()
        .FirstOrDefault(s => s.Name == "sheet1");
    // A Sheet element only references the worksheet; SheetData lives in the related WorksheetPart
    WorksheetPart worksheetPart = (WorksheetPart)workbookPart.GetPartById(thisSheet.Id);
    SheetData sheetData = worksheetPart.Worksheet.GetFirstChild<SheetData>();
    IEnumerable<Row> rows = sheetData.Descendants<Row>();
    foreach (var row in rows)
    {
        //method to find empty rows or row index of empty rows
        // rows.Remove() or RemoveAllChildren()
    }
}
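
One possible way to fill in the loop body is sketched below. It is not taken from an answer to this question: the helper name RemoveEmptyRows is hypothetical, and "empty" is assumed to mean that no cell in the row has a value, a formula, or an inline string (cells that only carry a style count as empty).

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Spreadsheet;

public static class EmptyRowCleaner
{
    // Assumption: a cell has content if it holds a non-empty value,
    // a formula, or an inline string.
    private static bool HasContent(Cell cell)
    {
        bool hasValue = cell.CellValue != null
            && !string.IsNullOrEmpty(cell.CellValue.Text);
        return hasValue || cell.CellFormula != null || cell.InlineString != null;
    }

    // Removes every Row element without content and returns how many
    // rows were removed. The caller still has to save the worksheet.
    public static int RemoveEmptyRows(SheetData sheetData)
    {
        // Materialize the list first so that rows are not removed
        // from the collection while it is being enumerated.
        var emptyRows = sheetData.Elements<Row>()
            .Where(row => !row.Elements<Cell>().Any(HasContent))
            .ToList();

        foreach (Row row in emptyRows)
        {
            row.Remove();
        }
        return emptyRows.Count;
    }
}
```

In the code above this would be called as EmptyRowCleaner.RemoveEmptyRows(sheetData), followed by worksheetPart.Worksheet.Save(). Because the question states that all empty rows are at the end, no RowIndex or cell reference needs to be renumbered afterwards.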

How can I programmatically delete Rows from an excel that are currently empty but have been edited once , using closed xml

I need to delete empty rows from my Excel file using ClosedXML. These rows previously had some data but are now empty. I have tried using row.IsEmpty(), but it does not delete any rows at all. Below is what I have tried so far:
using (XLWorkbook workBook = new XLWorkbook(destinationPath))
{
    int worksheetCount = workBook.Worksheets.Count;
    for (int worksheetIndex = 1; worksheetIndex <= worksheetCount; worksheetIndex++)
    {
        //Read the first Sheet from Excel file.
        IXLWorksheet workSheet = workBook.Worksheet(1);
        int rowCount = workSheet.RowsUsed().Count();
        //Loop through the Worksheet rows.
        foreach (IXLRow row in workSheet.RowsUsed())
        {
            if (row.IsEmpty())
            {
                row.Delete();
            }
        }
        workBook.Save();
        int rowCountNew = workSheet.RowsUsed().Count();
    }
}
My rowCount and rowCountNew have the same value when they should actually be different. Also, even though I have empty rows in the Excel file, my if condition remains false and so never reaches row.Delete(). I hope my question makes sense.
Thanks In Advance!

Your core issue is to find out why row.IsEmpty() returns false. For the parameterless .IsEmpty() overload, ClosedXML uses the XLCellsUsedOptions.AllContents value to determine whether cells are empty. This means that if a cell contains values or comments, it is considered non-empty. My guess is that you forgot to clear some comments.
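
If leftover comments are indeed the cause, one option is to test for emptiness with an explicit option instead of the default. The sketch below assumes a ClosedXML version that offers the IsEmpty(XLCellsUsedOptions) overload and the XLCellsUsedOptions.Contents flag; the helper name DeleteRowsWithoutContents is hypothetical.

```csharp
using System.Linq;
using ClosedXML.Excel;

public static class ClosedXmlRowCleaner
{
    // Deletes every row, up to the last used row, whose cells hold no
    // values. Comments and formatting are ignored by this check.
    // Returns the number of deleted rows.
    public static int DeleteRowsWithoutContents(IXLWorksheet workSheet)
    {
        IXLRow lastRow = workSheet.LastRowUsed();
        if (lastRow == null)
        {
            return 0; // nothing is used on this sheet
        }

        // Collect row numbers first and delete from the bottom up, so
        // that deleting a row never shifts a row still to be deleted.
        var emptyRowNumbers = workSheet.Rows(1, lastRow.RowNumber())
            .Where(row => row.IsEmpty(XLCellsUsedOptions.Contents))
            .Select(row => row.RowNumber())
            .OrderByDescending(number => number)
            .ToList();

        foreach (int number in emptyRowNumbers)
        {
            workSheet.Row(number).Delete();
        }
        return emptyRowNumbers.Count;
    }
}
```

Unlike the loop in the question, this walks every row between 1 and the last used row rather than RowsUsed(), and it does not delete rows while enumerating them.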

Using the ClosedXML library, we can iterate through each worksheet of a workbook and delete a given row by its row number:
string srcFile = "srcfile.xlsx";
string dstFile = "destination file.xlsx";
using (XLWorkbook wb = new XLWorkbook(srcFile))
{
    foreach (var item in wb.Worksheets)
    {
        item.Row(2).Delete();
    }
    wb.SaveAs(dstFile);
}

Can not read spread sheet cell with different font in C#

A spreadsheet contains 5 columns and 8 rows. The first row is the header of the sheet. I have to read the data from all the cells. The last column contains blank values. My C# code is unable to read the 5th column of the 7th and 8th rows. The code below throws an index-out-of-range exception.
rows[7].Descendants().ElementAt(4)
I have found that the font size of the 5th column in the 7th and 8th rows is different from the others. If I change the font size, it works fine. I don't have any explanation for this strange behaviour.
Please find the code below:
using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(filePath, false))
{
    WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
    var workBookSheet = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>().Where(s => s.Name.ToString() == "Feuil1").FirstOrDefault();
    if (workBookSheet != null)
    {
        string relationshipId = workBookSheet.Id.Value;
        WorksheetPart worksheetPart = (WorksheetPart)spreadSheetDocument.WorkbookPart.GetPartById(relationshipId);
        Worksheet workSheet = worksheetPart.Worksheet;
        SheetData sheetData = workSheet.GetFirstChild<SheetData>();
        IEnumerable<Row> rows = sheetData.Descendants<Row>();
        var data = rows.ElementAt(2).Descendants<Cell>().ElementAt(4);
    }
}
The spreadsheet looks like this: (image not available)
For this sheet, I get the error at row 3, column 4.

Try this to get the value (using Microsoft.Office.Interop.Excel):
string val = ((Range)xlWorkSheet.Cells[7, 5]).Value2.ToString();

Reading from Excel File using ClosedXML

My Excel file is not laid out as a single table. I am trying to read from it.
I have sections within the file that are tabular.
I need to loop through rows 3 to 20, which are tabular, and read the data.
Here is part of my code:
string fileName = "C:\\Folder1\\Prev.xlsx";
var workbook = new XLWorkbook(fileName);
var ws1 = workbook.Worksheet(1);
How do I loop through rows 3 to 20 and read columns 3, 4, 6, 7, and 8?
Also, if a row is empty, how do I detect that so I can skip it without checking whether each column in that row has a value?

To access a row:
var row = ws1.Row(3);
To check if the row is empty:
bool empty = row.IsEmpty();
To access a cell (column) in a row:
var cell = row.Cell(3);
To get the value from a cell:
object value = cell.Value;
// or
string text = cell.GetValue<string>();
For more information see the documentation.
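
Put together, the calls above could be combined as in the following sketch. The class and method names (RangeReader, ReadRows) are hypothetical; only Row, IsEmpty, Cell and GetString are ClosedXML calls.

```csharp
using System.Collections.Generic;
using ClosedXML.Excel;

public static class RangeReader
{
    // Reads the given columns from every non-empty row between
    // firstRow and lastRow (both inclusive). Each entry of the result
    // holds the cell texts of one row, in the order of "columns".
    public static List<string[]> ReadRows(
        IXLWorksheet worksheet, int firstRow, int lastRow, int[] columns)
    {
        var result = new List<string[]>();
        for (int rowNumber = firstRow; rowNumber <= lastRow; rowNumber++)
        {
            IXLRow row = worksheet.Row(rowNumber);
            if (row.IsEmpty())
            {
                continue; // skip rows without any content
            }

            var values = new string[columns.Length];
            for (int i = 0; i < columns.Length; i++)
            {
                values[i] = row.Cell(columns[i]).GetString();
            }
            result.Add(values);
        }
        return result;
    }
}
```

For the question above, the call would be RangeReader.ReadRows(ws1, 3, 20, new[] { 3, 4, 6, 7, 8 }).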

Here's my jam.
var rows = worksheet.RangeUsed().RowsUsed().Skip(1); // Skip header row
foreach (var row in rows)
{
    var rowNumber = row.RowNumber();
    // Process the row
}
If you just use .RowsUsed(), your range will contain a huge number of columns. Way more than are actually filled in!
So use .RangeUsed() first to limit the range. This will help you process the file faster!
You can also use .Skip(1) to skip over the column header row (if you have one).

I'm not sure whether this will solve the OP's problem, but I prefer using the RowsUsed method. It returns only those rows that are non-empty or have been edited by the user, so I can avoid an emptiness check while processing each row.
The code snippet below processes the rows numbered 3 to 20 among the non-empty rows. The empty rows are filtered out before the foreach loop starts. Bear in mind that filtering out empty rows beforehand changes the total number of rows processed, so be careful with any logic inside the foreach loop that depends on that count.
string fileName = "C:\\Folder1\\Prev.xlsx";
using (var excelWorkbook = new XLWorkbook(fileName))
{
    var nonEmptyDataRows = excelWorkbook.Worksheet(1).RowsUsed();
    foreach (var dataRow in nonEmptyDataRows)
    {
        //for row number check
        if (dataRow.RowNumber() >= 3 && dataRow.RowNumber() <= 20)
        {
            //to get column # 3's data
            var cell = dataRow.Cell(3).Value;
        }
    }
}
The RowsUsed method is helpful for common problems that involve processing the rows of an Excel sheet.

It works easily
XLWorkbook workbook = new XLWorkbook(FilePath);
var rowCount = workbook.Worksheet(1).LastRowUsed().RowNumber();
var columnCount = workbook.Worksheet(1).LastColumnUsed().ColumnNumber();
int column = 1;
int row = 1;
List<string> ll = new List<string>();
while (row <= rowCount)
{
    while (column <= columnCount)
    {
        string title = workbook.Worksheets.Worksheet(1).Cell(row, column).GetString();
        ll.Add(title);
        column++;
    }
    row++;
    column = 1;
}

Inserting formula giving error.. Excel found unreadable content in "ab.xlsx". Do you want to recover the

My requirement is to insert a formula into a cell. I am using the method below to insert the formula. It inserts the formula correctly and the formula works fine.
But when I insert the formula, my Excel file gets corrupted and shows the message
"Excel found unreadable content in "exceltemplate.xlsx"
Do you want to recover the contents of...".
I have searched a lot but could not resolve it.
Please help me resolve this.
public void InsertFormula(string filepath, string SheetName, string strCellIndex, string strFormula)
{
    using (SpreadsheetDocument document = SpreadsheetDocument.Open(filepath, true))
    {
        IEnumerable<Sheet> sheets = document.WorkbookPart.Workbook.Descendants<Sheet>().Where(s => s.Name == SheetName);
        if (sheets.Count() == 0)
        {
            // The specified worksheet does not exist.
            return;
        }
        WorksheetPart worksheetPart = (WorksheetPart)document.WorkbookPart.GetPartById(sheets.First().Id);
        Worksheet worksheet = worksheetPart.Worksheet;
        SheetData sheetData = worksheet.GetFirstChild<SheetData>();
        Row row1 = new Row()
        {
            RowIndex = (UInt32Value)4U,
            Spans = new ListValue<StringValue>()
        };
        Cell cell = new Cell() { CellReference = strCellIndex };
        CellFormula cellformula = new CellFormula();
        cellformula.Text = strFormula;
        cell.DataType = CellValues.Number;
        CellValue cellValue = new CellValue();
        cellValue.Text = "0";
        cell.Append(cellformula);
        cell.Append(cellValue);
        row1.Append(cell);
        sheetData.Append(row1);
        worksheet.Save();
        document.Close();
    }
}

There are 2 problems with the function.
1st problem is that you explicitly set the RowIndex to 4U. The cell that you're assigning the formula to has to be on row 4, say cell C4. Since the cell reference is passed in as a parameter (strCellIndex), that's not guaranteed.
And even if you fixed that, we have the next (and more insidious) problem...
2nd problem is a little harder to fix. The Row class has to be inserted in order within the SheetData class (as child objects), ordered by RowIndex. Let's assume you still want RowIndex to be hard-coded as 4U. This means if the existing Excel file has rows 2, 3 and 7, you have to insert the Row class behind the Row class with RowIndex 3. This is important, otherwise Excel will puke blood (as you've already experienced).
The solution to the 2nd problem requires a little more work. Consider the functions InsertAt(), InsertBefore() and InsertAfter() of the SheetData class (or most of the Open XML SDK classes actually). Iterate through the child classes of SheetData till you find a Row class with a RowIndex greater than the Row class you're inserting. Then use InsertBefore().
I will leave you with the fun task of error checking, such as if there are no Row classes to begin with, or all Row classes have RowIndex-es less than your to-be-inserted Row class, or (here's the fun one) an existing Row class with the same RowIndex as the Row class you want to insert.
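
The ordered insert described above could be sketched as follows. This is only one way to do it: the helper name GetOrInsertRow is hypothetical, and it assumes the existing rows are already sorted by RowIndex. It reuses an existing Row with the same RowIndex instead of adding a duplicate.

```csharp
using DocumentFormat.OpenXml.Spreadsheet;

public static class SheetDataExtensions
{
    // Returns the Row with the given index, creating it at the correct
    // position if it does not exist yet.
    public static Row GetOrInsertRow(SheetData sheetData, uint rowIndex)
    {
        Row firstRowAfter = null;
        foreach (Row existing in sheetData.Elements<Row>())
        {
            if (existing.RowIndex == null)
            {
                continue; // row without an explicit index: ignored here
            }
            if (existing.RowIndex.Value == rowIndex)
            {
                return existing; // same index already present: reuse it
            }
            if (existing.RowIndex.Value > rowIndex)
            {
                firstRowAfter = existing;
                break;
            }
        }

        var newRow = new Row { RowIndex = rowIndex };
        if (firstRowAfter != null)
        {
            // Keep the rows ordered by RowIndex
            sheetData.InsertBefore(newRow, firstRowAfter);
        }
        else
        {
            // No rows at all, or all existing rows come before the new one
            sheetData.Append(newRow);
        }
        return newRow;
    }
}
```

In InsertFormula, the returned row would replace the hard-coded row1, with rowIndex parsed from strCellIndex. The same ordering concern applies to the cells inside a row, which this sketch does not handle.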
