Getting incorrect cell value while parsing Excel with OpenXML - C#

I am trying to parse an Excel file and load the result into a DataTable using C# and OpenXML.
Below is my code snippet.
value = cell.CellValue.InnerText;
if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
{
return doc.WorkbookPart.SharedStringTablePart.SharedStringTable.ChildElements.GetItem(int.Parse(value)).InnerText;
}
return value;
But if the cell value is 80.3600, it is parsed as 80.36.
Also, if the value is 03-Jan-2018, it is parsed as 43103.
The problem is that the Excel file I am trying to parse is dynamically generated, and at run time I won't know which columns are dates and which are numeric.
Is there any way to get the value as it is, or to get every value as a string, i.e. with no formatting?

I've noticed that numeric and date-time cells have different StyleIndex values.
Starting from the StyleIndex, you can look up the cell's number format, which for custom formats is defined in doc.WorkbookPart.WorkbookStylesPart.Stylesheet.NumberingFormats.
var doc = SpreadsheetDocument.Open(File.Open("D:\\123.xlsx", FileMode.Open), false);
var sheet = doc.WorkbookPart.Workbook.Descendants<Sheet>().FirstOrDefault();
WorksheetPart wsPart = (WorksheetPart)(doc.WorkbookPart.GetPartById(sheet.Id));
var cells = wsPart.Worksheet.Descendants<Cell>().ToList();
var numberingFormats = doc.WorkbookPart.WorkbookStylesPart.Stylesheet.NumberingFormats.ToList();
var stringTable = doc.WorkbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
foreach (var cell in cells)
{
if (cell.DataType == null)
{
//DateTime
if (cell.StyleIndex != null)
{
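// StyleIndex is a zero-based index into CellFormats (cellXfs), not into NumberingFormats.
// The cell format's NumberFormatId then identifies the number format; built-in formats
// (ids below 164) have no NumberingFormat entry, so numerFormat can be null.
var cellFormat = (CellFormat) doc.WorkbookPart.WorkbookStylesPart.Stylesheet.CellFormats.ElementAt((int) cell.StyleIndex.Value);
var numerFormat = numberingFormats.OfType<NumberingFormat>().FirstOrDefault(f => f.NumberFormatId?.Value == cellFormat.NumberFormatId?.Value);
if (numerFormat?.FormatCode?.Value == "[$-409]mmmm\\ d\\,\\ yyyy;#")
{
Console.WriteLine(DateTime.FromOADate(double.Parse(cell.InnerText)).ToString("MMMM dd,yyyy"));
}
else if (numerFormat?.FormatCode?.Value == "[$-409]dd\\-mmm\\-yy;#")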
{
Console.WriteLine(DateTime.FromOADate(double.Parse(cell.InnerText)).ToString("dd-MMM-yy"));
}
}
else
{
//Numeric
Console.WriteLine(double.Parse(cell.InnerText, System.Globalization.CultureInfo.InvariantCulture));
}
}
else if (cell.DataType.Value == CellValues.SharedString)
{
Console.WriteLine(stringTable.SharedStringTable.ElementAt(int.Parse(cell.InnerText)).InnerText);
}
}
See also: Excel Interop cell formatting of Dates
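The two conversions discussed in this thread can be tried without opening a workbook. What follows is a minimal sketch, assuming the raw cell text has already been read and that you have already decided which display format applies to the cell. The helper names `FormatSerialDate` and `FormatNumber` and the .NET format strings are illustrative and are not part of the Open XML SDK.

```csharp
using System;
using System.Globalization;

static class RawCellFormatting
{
    // Interprets the raw cell text as an Excel date serial (OLE Automation date)
    // and renders it with a .NET format string.
    public static string FormatSerialDate(string raw, string netFormat)
    {
        double serial = double.Parse(raw, CultureInfo.InvariantCulture);
        return DateTime.FromOADate(serial).ToString(netFormat, CultureInfo.InvariantCulture);
    }

    // Re-applies a fixed number of decimals that the stored value does not carry.
    public static string FormatNumber(string raw, string netFormat)
    {
        double number = double.Parse(raw, CultureInfo.InvariantCulture);
        return number.ToString(netFormat, CultureInfo.InvariantCulture);
    }

    static void Main()
    {
        Console.WriteLine(FormatSerialDate("43103", "dd-MMM-yyyy")); // 03-Jan-2018
        Console.WriteLine(FormatNumber("80.36", "0.0000"));          // 80.3600
    }
}
```

The trailing zeros in 80.3600 are not stored in the file; they only come back once a format with four decimals is applied to the parsed number.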

Related

Find linked formula values from worksheets and replace with actual cell value

In an OOXML spreadsheet (.xlsx) you can use a linking formula to fetch values from another spreadsheet and show them in your worksheet as values, which are updated whenever the values in the other spreadsheet change.
I am using Open Xml SDK and I basically want to do what this does: https://www.e-iceblue.com/Tutorials/Spire.XLS/Spire.XLS-Program-Guide/Formula/Remove-Formulas-from-Cells-but-Keep-Values-in-Excel-in-C.html
How do I:
Find a value whose formula links to a cell in another spreadsheet
Replace the formula with the actual cell value
Do this for each cell in each worksheet of a spreadsheet
I have tried this so far: https://learn.microsoft.com/en-us/office/open-xml/how-to-retrieve-the-values-of-cells-in-a-spreadsheet
But I am receiving a NullReferenceException each time the cell does not contain a formula or any value. I have tried try-catch and several other ways to avoid this exception, but nothing works.
But back to the challenge as outlined above; can anyone help me out?
Basic stuff such as using SOME DIRECTIVE, foreach loop, Open(), Save() I know how to do.
This worked for me:
public void Remove_CellReferences(string filepath)
{
using (SpreadsheetDocument spreadsheet = SpreadsheetDocument.Open(filepath, true))
{
// Delete all cell references in worksheet
List<WorksheetPart> worksheetparts = spreadsheet.WorkbookPart.WorksheetParts.ToList();
foreach (WorksheetPart part in worksheetparts)
{
Worksheet worksheet = part.Worksheet;
var rows = worksheet.GetFirstChild<SheetData>().Elements<Row>(); // Find all rows
foreach (var row in rows)
{
var cells = row.Elements<Cell>();
foreach (Cell cell in cells)
{
if (cell.CellFormula != null)
{
string formula = cell.CellFormula.InnerText;
if (formula.Length > 0)
{
string hit = formula.Substring(0, 1); // Transfer first 1 characters to string
if (hit == "[")
{
CellValue cellvalue = cell.CellValue; // Save current cell value
cell.CellFormula = null; // Remove RTD formula
// If cellvalue does not have a real value
if (cellvalue != null && cellvalue.Text == "#N/A")
{
cell.DataType = CellValues.String;
cell.CellValue = new CellValue("Invalid data removed");
}
else
{
cell.CellValue = cellvalue; // Insert saved cell value
}
}
}
}
}
}
}
// Delete all external link references
List<ExternalWorkbookPart> extwbParts = spreadsheet.WorkbookPart.ExternalWorkbookParts.ToList();
if (extwbParts.Count > 0)
{
foreach (ExternalWorkbookPart extpart in extwbParts)
{
var elements = extpart.ExternalLink.ChildElements.ToList();
foreach (var element in elements)
{
if (element.LocalName == "externalBook")
{
spreadsheet.WorkbookPart.DeletePart(extpart);
}
}
}
}
// Delete calculation chain
CalculationChainPart calc = spreadsheet.WorkbookPart.CalculationChainPart;
if (calc != null) // the calculation chain part is optional and may be absent
{
spreadsheet.WorkbookPart.DeletePart(calc);
}
}
}
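The method above treats a formula as an external link only when its first character is "[". A formula that wraps the reference in a function would not start with that character. The sketch below shows one possible broader check. It assumes that external workbook references are stored in the formula text as a bracketed numeric index such as `[1]Sheet1!A1`; `IsExternalReference` is a hypothetical helper, not an Open XML SDK method.

```csharp
using System;
using System.Text.RegularExpressions;

static class ExternalFormulaCheck
{
    // Hypothetical pattern: a bracketed workbook index such as [1] or [12]
    // that is not directly preceded by an identifier character, so a structured
    // reference like Table1[2019] is not mistaken for an external link.
    private static readonly Regex ExternalIndex =
        new Regex(@"(?<![A-Za-z0-9_.])\[\d+\]", RegexOptions.Compiled);

    public static bool IsExternalReference(string formula)
    {
        return !string.IsNullOrEmpty(formula) && ExternalIndex.IsMatch(formula);
    }

    static void Main()
    {
        Console.WriteLine(IsExternalReference("[1]Sheet1!$A$1"));     // True
        Console.WriteLine(IsExternalReference("SUM([2]Data!A1:A3)")); // True
        Console.WriteLine(IsExternalReference("A1+B1"));              // False
        Console.WriteLine(IsExternalReference("Table1[2019]"));       // False
    }
}
```

A check like this could replace the `hit == "["` test, but it remains a heuristic and should be validated against the formulas in your own files.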

Reading cell value with applied Text formatting with OpenXML

I am trying to read an Excel sheet that contains cells with Text formatting.
One of the columns has the values 1, 1.1, 1.2, etc.
In Excel all of these values look fine in cells with Text formatting: 1, 1.1, 1.2.
But when I read those cells with OpenXML, I get the values 1, 1.1000000000000001, 1.2; some of them have extra decimal digits.
I checked xl\worksheets\sheet1.xml in the *.xlsx file, and it really does contain the value 1.1000000000000001:
<row r="3" spans="1:20" ht="15" x14ac:dyDescent="0.25">
<c r="A3" s="2">
<v>1.1000000000000001</v>
</c>
My code is:
List<List<string>> rows = new List<List<string>>();
List<string> cols;
spreadsheetDocument = SpreadsheetDocument.Open(excelFilePath, false);
WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
SharedStringTablePart sstpart = workbookPart.GetPartsOfType<SharedStringTablePart>().First();
SharedStringTable sst = sstpart.SharedStringTable;
foreach (Row r in sheetData.Elements<Row>())
{
cols = new List<string>();
foreach (Cell c in r.Elements<Cell>())
{
if (c.DataType != null && c.DataType == CellValues.SharedString)
{
int ssid = int.Parse(c.CellValue.Text);
string str = sst.ChildElements[ssid].InnerText;
cols.Add(str);
}
else
{
cols.Add(c.CellValue?.InnerText);
}
}
rows.Add(cols);
}
spreadsheetDocument.Close();
How can I get the correct value from such cells, for example 1.1 rather than 1.1000000000000001?
First create this method to get the value of a cell.
public static string GetCellValue(WorkbookPart workbookPart, Cell cell)
{
string value = null;
if (cell.InnerText.Length > 0)
{
value = cell.InnerText;
// If the cell represents an integer number, you are done.
// For dates, this code returns the serialized value that
// represents the date. The code handles strings and
// Booleans individually. For shared strings, the code
// looks up the corresponding value in the shared string
// table. For Booleans, the code converts the value into
// the words TRUE or FALSE.
if (cell.DataType != null)
{
switch (cell.DataType.Value)
{
case CellValues.SharedString:
// For shared strings, look up the value in the
// shared strings table.
var stringTable =
workbookPart.GetPartsOfType<SharedStringTablePart>()
.FirstOrDefault();
// If the shared string table is missing, something
// is wrong. Return the index that is in
// the cell. Otherwise, look up the correct text in
// the table.
if (stringTable != null)
{
value =
stringTable.SharedStringTable
.ElementAt(int.Parse(value)).InnerText;
}
break;
case CellValues.Boolean:
value = value switch
{
"0" => "FALSE",
_ => "TRUE",
};
break;
}
}
}
return value;
}
Then use this code for your inner for loop:
foreach (Cell c in r.Elements<Cell>())
{
string str = GetCellValue(workbookPart, c);
cols.Add(str);
}
The GetCellValue method is based on the Microsoft documentation for the Open XML SDK.
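Note that GetCellValue returns the stored text unchanged for numeric cells, so a value saved as 1.1000000000000001 still comes back in that form. One possible way to get the shorter form is to parse the text as a double and render it with at most 15 significant digits, which is the precision Excel displays. The sketch below assumes that approach; `Normalize` is a hypothetical helper name.

```csharp
using System;
using System.Globalization;

static class StoredNumberCleanup
{
    // Hypothetical helper: parses the stored text as a double and re-renders it
    // with at most 15 significant digits; non-numeric text is returned unchanged.
    public static string Normalize(string raw)
    {
        if (double.TryParse(raw, NumberStyles.Float, CultureInfo.InvariantCulture, out double number))
        {
            return number.ToString("G15", CultureInfo.InvariantCulture);
        }
        return raw;
    }

    static void Main()
    {
        Console.WriteLine(Normalize("1.1000000000000001")); // 1.1
        Console.WriteLine(Normalize("1.2"));                // 1.2
        Console.WriteLine(Normalize("abc"));                // abc
    }
}
```

It is probably safest to apply this only to cells whose DataType is null, since a shared string such as "007" would otherwise be rewritten as "7". Very large or very small values may also come out in scientific notation with the "G15" format.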

How to get the value of cell containing a date and keep the original formatting using NPOI

I have an Excel file that I edited using DevExpress and I am reading using NPOI. When I try to get the value of a date cell as string, it does not keep the original value.
For example:
In a DevExpress grid I set this value: 2016-08-12. I want to obtain the same value in my string but instead I get 42689.
My code to get the cell value is like this:
ICell cell = row.GetCell(i);
cell.SetCellType(CellType.String);
string fieldString = cell.StringCellValue;
result = result + ";" + fieldString;
How can I get the original formatted date value?
In Excel, dates are stored as numbers. If you want to get a formatted date, you'll need to check whether the cell contains a date (there's a utility method for that), then get the date value of the cell, get the data format, and finally convert the date to string using the format. You should not force the CellType to string or else you will no longer be able to tell that the cell originally held a date. I would recommend making an extension method like this to get the formatted cell value based on its type:
using NPOI.SS.UserModel;
public static class NpoiExtensions
{
public static string GetFormattedCellValue(this ICell cell, IFormulaEvaluator eval = null)
{
if (cell != null)
{
switch (cell.CellType)
{
case CellType.String:
return cell.StringCellValue;
case CellType.Numeric:
if (DateUtil.IsCellDateFormatted(cell))
{
DateTime date = cell.DateCellValue;
ICellStyle style = cell.CellStyle;
// Excel uses lowercase m for month whereas .Net uses uppercase
string format = style.GetDataFormatString().Replace('m', 'M');
return date.ToString(format);
}
else
{
return cell.NumericCellValue.ToString();
}
case CellType.Boolean:
return cell.BooleanCellValue ? "TRUE" : "FALSE";
case CellType.Formula:
if (eval != null)
return GetFormattedCellValue(eval.EvaluateInCell(cell));
else
return cell.CellFormula;
case CellType.Error:
return FormulaError.ForInt(cell.ErrorCellValue).String;
}
}
// null or blank cell, or unknown cell type
return string.Empty;
}
}
Then, use it like this:
ICell cell = row.GetCell(i);
string fieldString = cell.GetFormattedCellValue();
result = result + ";" + fieldString;
Optional: If you have any formulas in your cells and you want those formulas to be evaluated, then create an IFormulaEvaluator based on your workbook type and pass the evaluator to the GetFormattedCellValue() method. For example:
IFormulaEvaluator eval;
if (workbook is XSSFWorkbook)
eval = new XSSFFormulaEvaluator(workbook);
else
eval = new HSSFFormulaEvaluator(workbook);
...
ICell cell = row.GetCell(i);
string fieldString = cell.GetFormattedCellValue(eval);
result = result + ";" + fieldString;
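One caveat with `Replace('m', 'M')` is that it also upper-cases minute tokens, so an Excel format such as `hh:mm` would print the month where the minutes should be. The sketch below shows one way to restore minutes after the replacement. It assumes that minutes are the `m` tokens directly after `h:` or directly before `:s`; `ToNetDateFormat` is a hypothetical helper and is not part of NPOI.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class ExcelDateFormatSketch
{
    // Hypothetical helper: upper-cases month tokens, then restores lowercase
    // minute tokens that follow "h:" or precede ":s".
    public static string ToNetDateFormat(string excelFormat)
    {
        string result = excelFormat.Replace('m', 'M');
        result = Regex.Replace(result, @"(?<=[hH]+:)M+", m => m.Value.ToLowerInvariant());
        result = Regex.Replace(result, @"M+(?=:s)", m => m.Value.ToLowerInvariant());
        return result;
    }

    static void Main()
    {
        var date = new DateTime(2016, 8, 12, 9, 5, 0);
        Console.WriteLine(ToNetDateFormat("yyyy-mm-dd hh:mm")); // yyyy-MM-dd hh:mm
        Console.WriteLine(date.ToString(ToNetDateFormat("yyyy-mm-dd"), CultureInfo.InvariantCulture)); // 2016-08-12
    }
}
```

This does not handle escaped literals, AM/PM markers, or locale prefixes such as `[$-409]`, so treat it as a starting point rather than a complete format translator.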
You can use this code.
DateTime.FromOADate(row.Cells[0].NumericCellValue)
If the file has custom formatted dates, you will need to test for those, otherwise the function returns a numeric. This version of Brian Rogers' answer will check:
public static string GetFormattedCellValue(this ICell cell, IFormulaEvaluator eval = null)
{
// https://github.com/tonyqus/npoi/blob/master/main/SS/UserModel/BuiltinFormats.cs
//*The first user-defined format starts at 164.
// var dataformatNumber = cell.CellStyle.DataFormat;
//var formatstring = cell.CellStyle.GetDataFormatString();
//e.g. m/d/yyyy\ h:mm:ss\ \ AM/PM\ #164
//e.g m/d/yyyy\ hh:mm #165
if (cell != null)
{
switch (cell.CellType)
{
case CellType.String:
return cell.StringCellValue;
case CellType.Numeric:
if (DateUtil.IsCellDateFormatted(cell))
{
DateTime date = cell.DateCellValue;
ICellStyle style = cell.CellStyle;
// Excel uses lowercase m for month whereas .Net uses uppercase
string format = style.GetDataFormatString().Replace('m', 'M');
return date.ToString(format);
}
else if(cell.CellStyle.DataFormat>=164 && DateUtil.IsValidExcelDate(cell.NumericCellValue) && cell.DateCellValue != null)
{
return cell.DateCellValue.ToString();
}
else
{
return cell.NumericCellValue.ToString();
}
case CellType.Boolean:
return cell.BooleanCellValue ? "TRUE" : "FALSE";
case CellType.Formula:
if (eval != null)
return GetFormattedCellValue(eval.EvaluateInCell(cell));
else
return cell.CellFormula;
case CellType.Error:
return FormulaError.ForInt(cell.ErrorCellValue).String;
}
}
// null or blank cell, or unknown cell type
return string.Empty;
}

In ClosedXML, is there any way to get the column letter from column header name?

I have an Excel worksheet that has column headers, and I don't want to hard-code the column letter or index, so I am trying to figure out how to make it dynamic. I am looking for something like this:
var ws = wb.Worksheet("SheetName");
var range = ws.RangeUsed();
var table = range.AsTable();
string colLetter = table.GetColumnLetter("ColHeader");
foreach (var row in table.Rows())
{
if (i > 1)
{
string val = row.Cell(colLetter).Value.ToString();
}
i++;
}
Does ClosedXML support anything like the made up GetColumnLetter() function above so I don't have to hard code column letters?
Sure, get the cell you want using a predicate on the CellsUsed collection on the row with the headers, then return the column letter from the column.
public string GetColumnName(IXLTable table, string columnHeader)
{
var cell = table.HeadersRow().CellsUsed(c => c.Value.ToString() == columnHeader).FirstOrDefault();
if (cell != null)
{
return cell.WorksheetColumn().ColumnLetter();
}
return null;
}
For version 0.95.4.0 I used the following steps:
var ws = wb.Worksheet("SheetName");
var range = ws.RangeUsed();
var table = range.AsTable();
var cell = table.FindColumn(c => c.FirstCell().Value.ToString() == yourColumnName);
if (cell != null)
{
var columnLetter = cell.RangeAddress.FirstAddress.ColumnLetter;
}
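For comparison, the same lookup can be sketched without ClosedXML, which may help when the header names are already in memory. This assumes the header row starts in column A and that headers are matched exactly; `FindColumnLetter` and `ToColumnLetter` are hypothetical helper names.

```csharp
using System;

static class HeaderLookup
{
    // Converts a 1-based column number to its letter form (1 -> A, 27 -> AA).
    public static string ToColumnLetter(int columnNumber)
    {
        if (columnNumber < 1) throw new ArgumentOutOfRangeException(nameof(columnNumber));
        string letters = string.Empty;
        while (columnNumber > 0)
        {
            int remainder = (columnNumber - 1) % 26;
            letters = (char)('A' + remainder) + letters;
            columnNumber = (columnNumber - 1) / 26;
        }
        return letters;
    }

    // Returns the letter of the first header equal to columnHeader, or null.
    public static string FindColumnLetter(string[] headers, string columnHeader)
    {
        int index = Array.IndexOf(headers, columnHeader);
        return index < 0 ? null : ToColumnLetter(index + 1);
    }

    static void Main()
    {
        string[] headers = { "Id", "Name", "Price", "Quantity" };
        Console.WriteLine(FindColumnLetter(headers, "Quantity")); // D
        Console.WriteLine(ToColumnLetter(27));                    // AA
    }
}
```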

Get Column number from cell value, openxml

I have an Excel file in which I want to get the column number, e.g. for the image below:
In the image above, I know that the records will appear in the 1st row, but I am unsure of the column number. In this example the column value "Quantity" appears in "D1". I know the row number; how can I find the column ("D" in this case) using Open XML? The column name "Quantity" might appear anywhere in the Excel file, and I need to find only the values that correspond to it.
Unfortunately there's not a single method you can call to find the correct cell. Instead you'll need to iterate over the cells to find the matching text. To complicate things slightly, the value in the cell is not always the actual text. Instead strings can be stored in the SharedStringTablePart and the value of the cell is an index into the contents of that table.
Something like the following should do what you're after:
private static string GetCellReference(string filename, string sheetName, int rowIndex, string textToFind)
{
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(filename, false))
{
WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
//get the correct sheet
Sheet sheet = workbookPart.Workbook.Descendants<Sheet>().Where(s => s.Name == sheetName).FirstOrDefault();
if (sheet != null)
{
WorksheetPart worksheetPart = workbookPart.GetPartById(sheet.Id) as WorksheetPart;
SharedStringTablePart stringTable = workbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
Row row = sheetData.Elements<Row>().Where(r => r.RowIndex == rowIndex).FirstOrDefault();
if (row != null)
{
foreach (Cell c in row.Elements<Cell>())
{
string cellText;
if (c.DataType != null && c.DataType.Value == CellValues.SharedString)
{
//the value will be a number which is an index into the shared strings table
int index = int.Parse(c.CellValue.InnerText);
cellText = stringTable.SharedStringTable.ElementAt(index).InnerText;
}
else
{
//just take the value from the cell (note this won't work for dates and other types)
cellText = c.CellValue?.InnerText;
}
if (cellText == textToFind)
{
return c.CellReference;
}
}
}
}
}
return null;
}
This can then be called like this:
string cellReference = GetCellReference(@"c:\temp\test.xlsx", "Sheet1", 1, "Quantity");
Console.WriteLine(cellReference); //prints D1 for your example
If you just want D rather than D1 you can use a simple regex to remove the numbers:
private static string GetColumnName(string cellReference)
{
if (cellReference == null)
return null;
return Regex.Replace(cellReference, "[0-9]", "");
}
And then use it like this:
string cellReference = GetCellReference(@"c:\temp\test.xlsx", "Sheet1", 1, "Quantity");
Console.WriteLine(GetColumnName(cellReference)); //prints D for your example
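Since the question asks for a column number, the letters can also be converted to a 1-based index. The sketch below does this without a regex, assuming the reference has the usual letters-then-digits form such as "D1"; `ColumnLetters` and `ColumnNumber` are hypothetical helper names.

```csharp
using System;
using System.Linq;

static class CellReferenceSketch
{
    // Keeps only the leading letters of a reference such as "D1" or "AA10".
    public static string ColumnLetters(string cellReference)
    {
        return new string(cellReference.TakeWhile(c => char.IsLetter(c)).ToArray());
    }

    // Converts column letters to a 1-based column number (A -> 1, D -> 4, AA -> 27).
    public static int ColumnNumber(string cellReference)
    {
        int number = 0;
        foreach (char letter in ColumnLetters(cellReference).ToUpperInvariant())
        {
            number = number * 26 + (letter - 'A' + 1);
        }
        return number;
    }

    static void Main()
    {
        Console.WriteLine(ColumnLetters("D1"));  // D
        Console.WriteLine(ColumnNumber("D1"));   // 4
        Console.WriteLine(ColumnNumber("AA10")); // 27
    }
}
```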
