Cannot read spreadsheet cell with a different font in C#

A spreadsheet contains 5 columns and 8 rows. The first row is the header of the sheet. I have to read the data from every cell. The last column contains blank values. My C# code is unable to read the 5th column of the 7th and 8th rows. The code below throws an index out of range exception.
rows[7].Descendants().ElementAt(4)
I have found that the font size of the 5th column in rows 7 and 8 differs from the other cells. If I change the font size, it works fine. I don't have any explanation for this strange behaviour.
Please find the code below:
using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(filePath, false))
{
    WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
    var workBookSheet = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>().Where(s => s.Name.ToString() == "Feuil1").FirstOrDefault();
    if (workBookSheet != null)
    {
        string relationshipId = workBookSheet.Id.Value;
        WorksheetPart worksheetPart = (WorksheetPart)spreadSheetDocument.WorkbookPart.GetPartById(relationshipId);
        Worksheet workSheet = worksheetPart.Worksheet;
        SheetData sheetData = workSheet.GetFirstChild<SheetData>();
        IEnumerable<Row> rows = sheetData.Descendants<Row>();
        var data = rows.ElementAt(2).Descendants<Cell>().ElementAt(4);
    }
}
The spreadsheet looks like this: [image: screenshot of the sheet, not preserved]
According to the sheet, I get the error on row 3, column 4.

Try this to get the value (note that it uses Excel Interop rather than the Open XML SDK):
string val = ((Range)xlWorkSheet.Cells[7, 5]).Value2.ToString();
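For the Open XML code in the question, a likely explanation (not confirmed in this thread) is that the file only stores cell elements that carry a value or formatting, so a row can contain fewer than five cells and ElementAt(4) runs past the end. A minimal sketch of looking a cell up by its reference instead of by position follows; the CellLookup class and FindCell method are hypothetical names introduced here for illustration.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Spreadsheet;

static class CellLookup
{
    // Returns the cell whose reference is e.g. "E7", or null when the file
    // stores no cell element for that position.
    public static Cell FindCell(SheetData sheetData, string cellReference)
    {
        return sheetData.Descendants<Cell>().FirstOrDefault(c =>
            c.CellReference != null &&
            string.Equals(c.CellReference.Value, cellReference,
                          StringComparison.OrdinalIgnoreCase));
    }
}
```

With the sheetData variable from the question, CellLookup.FindCell(sheetData, "E7") would return null for a cell that was never written, which the caller can treat as blank instead of catching an exception. Shared-string cells still need to be resolved through the shared string table.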

Related

C# : OpenXML "Remove" empty rows from excel

I have a sample.xlsx file generated by a program (over which I have no control) with data spanning 200 rows. However, the file has another 1800 "trailing" empty rows (as shown in the image) saved into it, making a total of 2000 rows. That is, if I read this file using Open XML and get the row count, it is 2000 instead of 200.
I am trying to figure out a way to delete these rows so that the Excel file contains only rows with data.
NOTE: There are no empty rows in between.
All the empty rows come at the end of the sheet. How can I check for empty rows and delete them, or find the rows with values and delete the remaining rows by row index?
PS: I can't use Interop Services.
Also, the counts 200 and 2000 are only examples; they may vary from one file to another.
Below is my code:
using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(@"C:\sample.xlsx", true))
{
    WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
    var thisSheet = workbookPart.Workbook.Descendants<Sheet>()
        .FirstOrDefault(s => s.Name == "sheet1");
    // SheetData lives in the WorksheetPart that the Sheet entry points to, not in the Sheet element itself
    WorksheetPart worksheetPart = (WorksheetPart)workbookPart.GetPartById(thisSheet.Id);
    SheetData sheetData = worksheetPart.Worksheet.GetFirstChild<SheetData>();
    IEnumerable<Row> rows = sheetData.Descendants<Row>();
    foreach (var row in rows)
    {
        //method to find empty rows or row index of empty rows
        // rows.Remove() or RemoveAllChildren()
    }
}
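One way to approach this, sketched under assumptions rather than taken from an answer in this thread: walk the rows from the end, collect every row whose cells hold no value, formula, or inline string, stop at the first row with data, and remove the collected rows. The EmptyRowCleaner class and its method names are hypothetical, and the definition of "empty" used here is an assumption that may need adjusting for a given file.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Spreadsheet;

static class EmptyRowCleaner
{
    // Assumption: a row counts as empty when none of its cells has a
    // formula, an inline string, or a non-empty stored value.
    private static bool IsEmpty(Row row)
    {
        return row.Elements<Cell>().All(c =>
            c.CellFormula == null &&
            c.InlineString == null &&
            string.IsNullOrEmpty(c.CellValue?.Text));
    }

    // Removes the empty rows at the end of the sheet and returns how many
    // were removed. The caller is responsible for saving the worksheet.
    public static int RemoveTrailingEmptyRows(SheetData sheetData)
    {
        // Materialize the list first so the tree is not modified while
        // it is being enumerated.
        var trailing = sheetData.Elements<Row>()
            .Reverse()
            .TakeWhile(r => IsEmpty(r))
            .ToList();

        foreach (Row row in trailing)
        {
            row.Remove();
        }
        return trailing.Count;
    }
}
```

After calling EmptyRowCleaner.RemoveTrailingEmptyRows(sheetData), the worksheet still has to be saved (for example with worksheetPart.Worksheet.Save()). If the worksheet carries a dimension element describing the used range, it may also need updating; that step is not shown here.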

How to change the Selection activecell in sheetview using OpenXml?

I need to remove this Selection active cell, which can be done by setting it to A1, but I could not figure out the tag hierarchy I should be using.
I used the following code, but it is not working.
WorkbookPart wbPart = doc.WorkbookPart;
var workbook = doc.WorkbookPart.Workbook;
WorksheetPart worksheetPart = wbPart.WorksheetParts.First();
Worksheet worksheet = worksheetPart.Worksheet;
SheetViews sheetViews = worksheet.SheetViews;
SheetView sheetView = sheetViews.Descendants<SheetView>().First();
var selection = sheetView.Descendants<Selection>().First();
selection.ActiveCell = "A1";
If all you want to do is remove the selected cell, then you just need to replace your last two lines with
sheetView.RemoveAllChildren();
If you also want to make sure that A1 is the top-left cell in view, then you should also add the following line
sheetView.TopLeftCell = "A1";
Also be sure to save the worksheet with
worksheet.Save();
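If the Selection element should be kept rather than removed, a possible alternative is to update both of its cell attributes. This is a sketch under the assumption that the original attempt failed because only ActiveCell was changed while the selected range (SequenceOfReferences) still pointed at the old cells, or because the worksheet was never saved. SelectionReset and MoveSelectionToA1 are hypothetical names.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Spreadsheet;

static class SelectionReset
{
    // Points every Selection in the worksheet at A1. A sheet view with
    // panes can hold several Selection elements, so all of them are updated.
    public static void MoveSelectionToA1(Worksheet worksheet)
    {
        foreach (Selection selection in worksheet.Descendants<Selection>())
        {
            selection.ActiveCell = "A1";
            selection.SequenceOfReferences =
                new ListValue<StringValue> { InnerText = "A1" };
        }
    }
}
```

As with the approach above, the change only reaches the file once worksheet.Save() has been called.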

How to check Self Referencing cell formula in excel using OpenXML

Is there any way to find a self-referencing cell formula in Excel using Open XML in C#? I could not work out how to detect one. For example, cell C3 has the formula =SUM(C1:C5). How can I recognize that this formula references its own cell? Or can I get all the cell names from this formula?
My code is:
SpreadsheetDocument excelDocument = SpreadsheetDocument.Open(FilePath, true);
WorkbookPart wbPart = excelDocument.WorkbookPart;
string sheetId = wbPart.Workbook.Descendants<Sheet>().First().Id;
WorksheetPart wsPart = (WorksheetPart)(wbPart.GetPartById(sheetId));
IEnumerable<Cell> theCells = wsPart.Worksheet.Descendants<Cell>();
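foreach (Cell thecell in theCells)
{
    // CellFormula is null for cells that do not contain a formula
    if (thecell.CellFormula == null) continue;
    string strFormula = thecell.CellFormula.InnerText;
}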
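The Open XML SDK exposes the formula only as text, so one possible approach is to extract the A1-style references from that text and test whether the cell's own address falls inside any of them. The sketch below is a simplification and not a full formula parser: it ignores references qualified with a sheet name, named ranges, whole-row and whole-column references, and it does not skip text inside string literals. FormulaReferences and IsSelfReferencing are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

static class FormulaReferences
{
    // Matches references such as C3, $C$3 or C1:C5. The look-arounds skip
    // function names like LOG10( and sheet-qualified references like Sheet2!C3.
    private static readonly Regex RefPattern = new Regex(
        @"(?<![A-Za-z0-9_.!$])\$?([A-Z]{1,3})\$?(\d{1,7})(?::\$?([A-Z]{1,3})\$?(\d{1,7}))?(?![A-Za-z0-9_(])",
        RegexOptions.Compiled);

    // Converts column letters to a 1-based number: A -> 1, Z -> 26, AA -> 27.
    private static int ColumnNumber(string letters)
    {
        int n = 0;
        foreach (char ch in letters)
        {
            n = n * 26 + (ch - 'A' + 1);
        }
        return n;
    }

    // True when the formula text contains a reference or range that
    // includes the cell identified by cellReference (e.g. "C3").
    public static bool IsSelfReferencing(string cellReference, string formula)
    {
        Match own = RefPattern.Match(cellReference);
        if (!own.Success)
        {
            throw new ArgumentException("Not an A1-style reference.", nameof(cellReference));
        }
        int col = ColumnNumber(own.Groups[1].Value);
        int row = int.Parse(own.Groups[2].Value);

        foreach (Match m in RefPattern.Matches(formula))
        {
            int c1 = ColumnNumber(m.Groups[1].Value);
            int r1 = int.Parse(m.Groups[2].Value);
            int c2 = m.Groups[3].Success ? ColumnNumber(m.Groups[3].Value) : c1;
            int r2 = m.Groups[4].Success ? int.Parse(m.Groups[4].Value) : r1;

            if (col >= Math.Min(c1, c2) && col <= Math.Max(c1, c2) &&
                row >= Math.Min(r1, r2) && row <= Math.Max(r1, r2))
            {
                return true;
            }
        }
        return false;
    }
}
```

Inside the loop from the question, this could be called as FormulaReferences.IsSelfReferencing(thecell.CellReference.Value, strFormula), assuming the cell has a CellReference. Indirect circular references (A1 depends on B1, which depends on A1) are outside what this check detects.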

How to always skip specified number of Rows using DocumentFormat.OpenXml

I iterate through rows using DocumentFormat.OpenXml, and sometimes I need to start from the 4th, 8th, or 11th row. I define how many rows should be skipped with "skipRows", and the "if" below lets me skip the unnecessary rows:
var rows = sheet.Descendants<Row>();
foreach (Row row in rows)
{
if (dataRowIndex < skipRows)
{
dataRowIndex++;
continue;
}
The problem is that when a row is completely empty, sometimes the loop does not iterate through it and sometimes it does. It always iterates through the row when any cell in it has content. Why is that? How can I ensure that it always skips, for example, 6 rows, no matter whether there is any data in the cells of those rows?
Sometimes when it's empty it iterates through it. It always iterates when there is any cell written in said row. Why is that?
This is due to the way the XML schema is defined. A row is completely optional in the schema; if there's no data in a row then there's no requirement to write it to the XML (although there's nothing to stop it being written either). If there is a cell in a row then the row must be written to the XML as a cell is a child of a row; without the row there would be nowhere to write the cell.
How can I ensure that it always skips for example 6 rows no matter if there is any data in cells in those rows?
You can use the RowIndex property of the Row to find out the actual index of the Row being read.
The following example should do what you're after:
using (SpreadsheetDocument document = SpreadsheetDocument.Open(filePath, false))
{
    WorkbookPart workbookPart = document.WorkbookPart;
    WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
    SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
    SharedStringTablePart stringTable = workbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
    var rows = sheetData.Descendants<Row>();
    foreach (Row row in rows)
    {
        // RowIndex is the row number stored in the file, so rows missing from the XML are still accounted for
        if (row.RowIndex <= skipRows)
        {
            continue;
        }
        //this is just to show that it's outputting from the first non-skipped row
        Cell cell = row.GetFirstChild<Cell>();
        if (cell == null)
        {
            // the row element exists but contains no cells
            continue;
        }
        string contents;
        if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
        {
            int index = int.Parse(cell.CellValue.InnerText);
            contents = stringTable.SharedStringTable.ElementAt(index).InnerText;
        }
        else
        {
            contents = cell.InnerText;
        }
        Console.WriteLine(contents);
    }
}

Inserting formula giving error.. Excel found unreadable content in "ab.xlsx". Do you want to recover the

My requirement is to insert a formula into a cell. I am using the method below to insert the formula. It inserts the formula correctly and the formula works fine,
but when I insert the formula my Excel file gets corrupted and shows the message
"Excel found unreadable content in "exceltemplate.xlsx"
Do you want to recover the contents of...".
I have searched a lot but could not resolve it.
Please help me resolve this.
public void InsertFormula(string filepath, string SheetName, string strCellIndex, string strFormula)
{
    using (SpreadsheetDocument document = SpreadsheetDocument.Open(filepath, true))
    {
        IEnumerable<Sheet> sheets = document.WorkbookPart.Workbook.Descendants<Sheet>().Where(s => s.Name == SheetName);
        if (sheets.Count() == 0)
        {
            // The specified worksheet does not exist.
            return;
        }
        WorksheetPart worksheetPart = (WorksheetPart)document.WorkbookPart.GetPartById(sheets.First().Id);
        Worksheet worksheet = worksheetPart.Worksheet;
        SheetData sheetData = worksheet.GetFirstChild<SheetData>();
        Row row1 = new Row()
        {
            RowIndex = (UInt32Value)4U,
            Spans = new ListValue<StringValue>()
        };
        Cell cell = new Cell() { CellReference = strCellIndex };
        CellFormula cellformula = new CellFormula();
        cellformula.Text = strFormula;
        cell.DataType = CellValues.Number;
        CellValue cellValue = new CellValue();
        cellValue.Text = "0";
        cell.Append(cellformula);
        cell.Append(cellValue);
        row1.Append(cell);
        sheetData.Append(row1);
        worksheet.Save();
        document.Close();
    }
}
There are 2 problems with the function.
1st problem is that you explicitly set the RowIndex to 4U. The cell that you're assigning the formula to has to be on row 4, say cell C4. Since the cell reference is passed in as a parameter (strCellIndex), that's not guaranteed.
And even if you fixed that, we have the next (and more insidious) problem...
2nd problem is a little harder to fix. The Row class has to be inserted in order within the SheetData class (as child objects), ordered by RowIndex. Let's assume you still want RowIndex to be hard-coded as 4U. This means if the existing Excel file has rows 2, 3 and 7, you have to insert the Row class right after the Row class with RowIndex 3 (and before the one with RowIndex 7). This is important, otherwise Excel will puke blood (as you've already experienced).
The solution to the 2nd problem requires a little more work. Consider the functions InsertAt(), InsertBefore() and InsertAfter() of the SheetData class (or most of the Open XML SDK classes actually). Iterate through the child classes of SheetData till you find a Row class with a RowIndex greater than the Row class you're inserting. Then use InsertBefore().
I will leave you with the fun task of error checking, such as if there are no Row classes to begin with, or all Row classes have RowIndex-es less than your to-be-inserted Row class, or (here's the fun one) an existing Row class with the same RowIndex as the Row class you want to insert.
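The ordered insertion described above can be sketched as follows. This is one possible implementation, not the answerer's own code; RowInserter and GetOrInsertRow are hypothetical names. It covers the three cases mentioned: no rows at all, all existing rows having a smaller RowIndex, and a row with the same RowIndex already being present.

```csharp
using DocumentFormat.OpenXml.Spreadsheet;

static class RowInserter
{
    // Returns the existing row with the given index, or creates a new one
    // and places it so that SheetData stays ordered by RowIndex.
    public static Row GetOrInsertRow(SheetData sheetData, uint rowIndex)
    {
        Row next = null;
        foreach (Row row in sheetData.Elements<Row>())
        {
            if (row.RowIndex == null)
            {
                continue; // assumption: rows without an index are skipped
            }
            if (row.RowIndex.Value == rowIndex)
            {
                return row; // reuse the row instead of adding a duplicate
            }
            if (row.RowIndex.Value > rowIndex)
            {
                next = row; // first row that must come after the new one
                break;
            }
        }

        var newRow = new Row { RowIndex = rowIndex };
        if (next != null)
        {
            sheetData.InsertBefore(newRow, next);
        }
        else
        {
            // No rows yet, or every existing row has a smaller index.
            sheetData.Append(newRow);
        }
        return newRow;
    }
}
```

In InsertFormula, the row number would then be derived from strCellIndex (for example 4 for "C4") and passed to GetOrInsertRow instead of hard-coding 4U. Cells inside a row are generally expected to follow column order as well, so appending the new cell to an existing row may need the same kind of ordered insertion.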
