Get Column number from cell value, openxml - c#

I have an Excel file in which I want to find the column of a particular header, as in the image below:
In the image above, I know the headers appear in the first row, but I don't know which column each one is in. In this example the value "Quantity" appears in D1. Given the row number, how can I find the column ("D" in this case) using Open XML? The "Quantity" header might appear in any column, and I need to read only the values that belong to it.

Unfortunately there's not a single method you can call to find the correct cell. Instead you'll need to iterate over the cells to find the matching text. To complicate things slightly, the value in the cell is not always the actual text. Instead strings can be stored in the SharedStringTablePart and the value of the cell is an index into the contents of that table.
Something like the following should do what you're after:
// Requires: using System.Linq; using DocumentFormat.OpenXml.Packaging; using DocumentFormat.OpenXml.Spreadsheet;
private static string GetCellReference(string filename, string sheetName, int rowIndex, string textToFind)
{
    using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(filename, false))
    {
        WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
        //get the correct sheet (FirstOrDefault returns null instead of throwing when nothing matches)
        Sheet sheet = workbookPart.Workbook.Descendants<Sheet>().FirstOrDefault(s => s.Name == sheetName);
        if (sheet != null)
        {
            WorksheetPart worksheetPart = workbookPart.GetPartById(sheet.Id) as WorksheetPart;
            SharedStringTablePart stringTable = workbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
            SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
            Row row = sheetData.Elements<Row>().FirstOrDefault(r => r.RowIndex != null && r.RowIndex.Value == rowIndex);
            if (row != null)
            {
                foreach (Cell c in row.Elements<Cell>())
                {
                    string cellText;
                    if (c.DataType != null && c.DataType.Value == CellValues.SharedString)
                    {
                        //the value will be a number which is an index into the shared strings table
                        int index = int.Parse(c.CellValue.InnerText);
                        cellText = stringTable.SharedStringTable.ElementAt(index).InnerText;
                    }
                    else
                    {
                        //just take the value from the cell (note this won't work for dates and other types);
                        //an empty cell has no CellValue, so cellText is null in that case
                        cellText = c.CellValue?.InnerText;
                    }
                    if (cellText == textToFind)
                    {
                        return c.CellReference;
                    }
                }
            }
        }
    }
    return null;
}
This can then be called like this:
string cellReference = GetCellReference(@"c:\temp\test.xlsx", "Sheet1", 1, "Quantity");
Console.WriteLine(cellReference); //prints D1 for your example
If you just want D rather than D1 you can use a simple regex to remove the numbers:
private static string GetColumnName(string cellReference)
{
if (cellReference == null)
return null;
return Regex.Replace(cellReference, "[0-9]", "");
}
And then use it like this:
string cellReference = GetCellReference(@"c:\temp\test.xlsx", "Sheet1", 1, "Quantity");
Console.WriteLine(GetColumnName(cellReference)); //prints D for your example
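Since the question asks for a column number rather than a letter, the column name can be converted in one more step. The following is only a sketch: the class and method names (ColumnMath, ColumnNameToNumber) are made up for illustration and are not part of the Open XML SDK. It treats the letters as digits of a base-26 number in which A is 1 and Z is 26.

```csharp
using System;

public static class ColumnMath
{
    // Converts column letters such as "D" or "AA" to a 1-based column number.
    // Hypothetical helper for illustration, not an Open XML SDK method.
    public static int ColumnNameToNumber(string columnName)
    {
        if (string.IsNullOrEmpty(columnName))
            throw new ArgumentException("Column name must not be empty.", nameof(columnName));

        int number = 0;
        foreach (char ch in columnName.ToUpperInvariant())
        {
            if (ch < 'A' || ch > 'Z')
                throw new ArgumentException("Column name may only contain the letters A-Z.", nameof(columnName));
            // Each letter is one base-26 digit where A = 1 ... Z = 26.
            number = number * 26 + (ch - 'A' + 1);
        }
        return number;
    }
}
```

With this in place, ColumnMath.ColumnNameToNumber(GetColumnName(cellReference)) would give 4 for the "D1" example.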

Related

Find linked formula values from worksheets and replace with actual cell value

In an OOXML spreadsheet (.xlsx) you can use a linking formula to fetch values from another spreadsheet and show them in your worksheet as values, which are updated whenever the values in the other spreadsheet change.
I am using Open Xml SDK and I basically want to do what this does: https://www.e-iceblue.com/Tutorials/Spire.XLS/Spire.XLS-Program-Guide/Formula/Remove-Formulas-from-Cells-but-Keep-Values-in-Excel-in-C.html
How do I:
Find a value that has formula linking value to a cell in another spreadsheet
Replace the formula value with the actual cell value
Do this foreach cell in each worksheet in a spreadsheet
I have tried this so far: https://learn.microsoft.com/en-us/office/open-xml/how-to-retrieve-the-values-of-cells-in-a-spreadsheet
But I am receiving a NullReferenceException each time a cell does not contain a formula or any value at all. I have tried try-catch and several other ways to avoid this exception, but nothing works.
But back to the challenge as outlined above; can anyone help me out?
I already know the basics, such as using directives, foreach loops, Open() and Save().
This worked for me:
// Requires: using System; using System.Collections.Generic; using System.Linq;
// using DocumentFormat.OpenXml.Packaging; using DocumentFormat.OpenXml.Spreadsheet;
public void Remove_CellReferences(string filepath)
{
    using (SpreadsheetDocument spreadsheet = SpreadsheetDocument.Open(filepath, true))
    {
        // Replace every formula that links to another workbook with its cached value
        List<WorksheetPart> worksheetparts = spreadsheet.WorkbookPart.WorksheetParts.ToList();
        foreach (WorksheetPart part in worksheetparts)
        {
            Worksheet worksheet = part.Worksheet;
            var rows = worksheet.GetFirstChild<SheetData>().Elements<Row>(); // Find all rows
            foreach (var row in rows)
            {
                foreach (Cell cell in row.Elements<Cell>())
                {
                    // Skip cells without a formula; this is what avoids the NullReferenceException
                    if (cell.CellFormula == null)
                        continue;
                    string formula = cell.CellFormula.InnerText;
                    // Only formulas that start with "[" are treated as links to another workbook
                    if (!formula.StartsWith("[", StringComparison.Ordinal))
                        continue;
                    CellValue cellvalue = cell.CellValue; // Current cached value (may be null)
                    cell.CellFormula = null; // Remove the linking formula
                    // If the cell has no cached value or only an error value
                    if (cellvalue == null || cellvalue.Text == "#N/A")
                    {
                        cell.DataType = CellValues.String;
                        cell.CellValue = new CellValue("Invalid data removed");
                    }
                    // Otherwise the cached value simply stays in the cell
                }
            }
        }
        // Delete all external link references
        List<ExternalWorkbookPart> extwbParts = spreadsheet.WorkbookPart.ExternalWorkbookParts.ToList();
        foreach (ExternalWorkbookPart extpart in extwbParts)
        {
            // Delete each part only once, even if it holds several externalBook elements
            if (extpart.ExternalLink.ChildElements.Any(e => e.LocalName == "externalBook"))
            {
                spreadsheet.WorkbookPart.DeletePart(extpart);
            }
        }
        // Delete calculation chain, if the workbook has one
        CalculationChainPart calc = spreadsheet.WorkbookPart.CalculationChainPart;
        if (calc != null)
        {
            spreadsheet.WorkbookPart.DeletePart(calc);
        }
    }
}
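The code above recognises a linking formula by checking whether it starts with "[". As a hedged variation, the check could instead look for a bracketed workbook index anywhere in the formula, which would also cover a link nested inside a function or a quoted sheet name. This sketch assumes that links to other workbooks are stored with a numeric index in square brackets, such as [1]Sheet1!A1; the class and method names are invented for illustration, and a structured table reference to a column named with digits could produce a false positive.

```csharp
using System.Text.RegularExpressions;

public static class ExternalFormula
{
    // Matches a bracketed numeric workbook index such as [1] or [12].
    private static readonly Regex ExternalIndex = new Regex(@"\[\d+\]");

    // Returns true when the formula text appears to point at another workbook.
    // Hypothetical helper; the bracketed-index format is an assumption.
    public static bool ReferencesExternalWorkbook(string formula)
    {
        return !string.IsNullOrEmpty(formula) && ExternalIndex.IsMatch(formula);
    }
}
```

In the loop above, the StartsWith test could be swapped for ExternalFormula.ReferencesExternalWorkbook(formula) if nested links need to be caught as well.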

Reading cell value with applied Text formatting with OpenXML

I am trying to read an Excel sheet that contains cells with Text formatting.
One of the columns has values such as 1, 1.1, 1.2 and so on.
In Excel all of these values look fine in the Text-formatted cells: 1, 1.1, 1.2.
But when I read those cells with OpenXML, I get the values 1, 1.1000000000000001, 1.2; some of them have extra decimal digits.
I checked xl\worksheets\sheet1.xml in the *.xlsx file and saw that it really does contain the value 1.1000000000000001:
<row r="3" spans="1:20" ht="15" x14ac:dyDescent="0.25">
<c r="A3" s="2">
<v>1.1000000000000001</v>
</c>
My code is:
List<List<string>> rows = new List<List<string>>();
List<string> cols;
spreadsheetDocument = SpreadsheetDocument.Open(excelFilePath, false);
WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
SharedStringTablePart sstpart = workbookPart.GetPartsOfType<SharedStringTablePart>().First();
SharedStringTable sst = sstpart.SharedStringTable;
foreach (Row r in sheetData.Elements<Row>())
{
cols = new List<string>();
foreach (Cell c in r.Elements<Cell>())
{
if (c.DataType != null && c.DataType == CellValues.SharedString)
{
int ssid = int.Parse(c.CellValue.Text);
string str = sst.ChildElements[ssid].InnerText;
cols.Add(str);
}
else
{
cols.Add(c.CellValue?.InnerText);
}
}
rows.Add(cols);
}
spreadsheetDocument.Close();
How shall I get correct value from such cells? For example, 1.1, but not 1.1000000000000001.
First create this method to get the value of a cell.
public static string GetCellValue(WorkbookPart workbookPart, Cell cell)
{
string value = null;
if (cell.InnerText.Length > 0)
{
value = cell.InnerText;
// If the cell represents an integer number, you are done.
// For dates, this code returns the serialized value that
// represents the date. The code handles strings and
// Booleans individually. For shared strings, the code
// looks up the corresponding value in the shared string
// table. For Booleans, the code converts the value into
// the words TRUE or FALSE.
if (cell.DataType != null)
{
switch (cell.DataType.Value)
{
case CellValues.SharedString:
// For shared strings, look up the value in the
// shared strings table.
var stringTable =
workbookPart.GetPartsOfType<SharedStringTablePart>()
.FirstOrDefault();
// If the shared string table is missing, something
// is wrong. Return the index that is in
// the cell. Otherwise, look up the correct text in
// the table.
if (stringTable != null)
{
value =
stringTable.SharedStringTable
.ElementAt(int.Parse(value)).InnerText;
}
break;
case CellValues.Boolean:
value = value switch
{
"0" => "FALSE",
_ => "TRUE",
};
break;
}
}
}
return value;
}
Then use this code for your inner foreach loop:
foreach (Cell c in r.Elements<Cell>())
{
    string str = GetCellValue(workbookPart, c);
    cols.Add(str);
}
The GetCellValue method is inspired by the Microsoft documentation.
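Note that GetCellValue returns the stored text of a numeric cell unchanged, so a value saved as 1.1000000000000001 would still come back with all of its digits. One possible way to shorten it, shown here only as a sketch, is to parse the text as a double and format it again with at most 15 significant digits. The class and method names (NumericCellText, Normalize) are invented for illustration, and the sketch assumes the cell really holds a number written with a "." decimal separator, as in the sheet XML above.

```csharp
using System.Globalization;

public static class NumericCellText
{
    // Returns a shorter form of a raw numeric cell value,
    // or the input unchanged when it is not a number.
    public static string Normalize(string rawValue)
    {
        if (double.TryParse(rawValue, NumberStyles.Float, CultureInfo.InvariantCulture, out double number))
        {
            // "G15" keeps at most 15 significant digits, which drops the
            // binary rounding noise at the end of 1.1000000000000001.
            return number.ToString("G15", CultureInfo.InvariantCulture);
        }
        return rawValue;
    }
}
```

It would be applied only to cells whose DataType is null or numeric, for example cols.Add(NumericCellText.Normalize(str)); shared strings should be left as they are.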

How to get a specific row by worksheet and rowindex in excel file

I need to select a specific cell in my Excel file using OpenXML:
Worksheet workSheet = workSheetPart.Worksheet;
Cell cell = GetCell(workSheet, "B", 2);
private static Cell GetCell(Worksheet worksheet,
string columnName, uint rowIndex)
{
Row row = GetRow(worksheet, rowIndex);
if (row == null)
return null;
return row.Elements<Cell>().Where(c => string.Compare
(c.CellReference.Value, columnName +
rowIndex, true) == 0).First();
}
private static Row GetRow(Worksheet worksheet, uint rowIndex)
{
var test = worksheet.GetFirstChild<SheetData>().
Elements<Row>().Where(r => r.RowIndex == rowIndex).First(); //Here is the problem.
return worksheet.GetFirstChild<SheetData>().
Elements<Row>().Where(r => r.RowIndex == rowIndex).First();
}
When debugging I noticed that RowIndex is null, so I guess this is what causes the problem in the LINQ query.
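Use ElementAt. Note that it takes a zero-based position rather than a RowIndex, so it only returns the right row when the sheet stores every row starting from row 1:
var test = worksheet.GetFirstChild<SheetData>().Elements<Row>().ElementAt((int)rowIndex - 1);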
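A position in the list of rows and a RowIndex are not the same thing, because a sheet may leave out rows that hold no data. The following sketch illustrates the difference with plain numbers standing in for the RowIndex values of the stored rows; RowLookup and FindPosition are made-up names, not SDK members.

```csharp
using System.Collections.Generic;

public static class RowLookup
{
    // storedRowIndexes stands in for the RowIndex of each Row element, in file order.
    // Returns the zero-based position of the row with the given index,
    // or -1 when the sheet does not store that row at all.
    public static int FindPosition(IReadOnlyList<uint> storedRowIndexes, uint rowIndex)
    {
        for (int i = 0; i < storedRowIndexes.Count; i++)
        {
            if (storedRowIndexes[i] == rowIndex)
                return i;
        }
        return -1;
    }
}
```

For a sheet that stores only rows 1, 3 and 7, row 3 sits at position 1 and row 2 is not found, so looking rows up by position alone would return the wrong row.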
When you create a row, you have to set the RowIndex value for each row you add to the Excel file.
Row headerRow = new Row() { RowIndex = new UInt32Value(1) }; //Here I used as 1(one), use accordingly
When you create a cell, you have to set the CellReference value for each cell you created.
Cell cell = new Cell() { };
cell.CellReference = new StringValue("A1");
cell.DataType = CellValues.String;
cell.CellValue = new CellValue("Test value");
If the row index and cell reference are not set, the values will be null when you query them.
Some other useful methods for working with Excel:
//To get the CellReference
private static string GetExcelCellReference(uint columnNumber, uint rowNumber)
{
return $"{GetExcelColumnName(columnNumber)}{rowNumber}";
}
//To get the excel column name using column number
private static string GetExcelColumnName(uint columnNumber)
{
int dividend = (int)columnNumber;
string columnName = String.Empty;
int modulo;
while (dividend > 0)
{
modulo = (dividend - 1) % 26;
columnName = Convert.ToChar(65 + modulo).ToString() + columnName;
dividend = (int)((dividend - modulo) / 26);
}
return columnName;
}
//To get the excel column name from cellname or cellreference number
private static string GetColumnName(string cellName)
{
// Create a regular expression to match the column name portion of the cell name.
Regex regex = new Regex("[A-Za-z]+");
Match match = regex.Match(cellName);
return match.Value;
}
//To get the row index from cell name or cell reference number
private static uint GetRowIndex(string cellName)
{
// Create a regular expression to match the row index portion the cell name.
Regex regex = new Regex(@"\d+");
Match match = regex.Match(cellName);
return uint.Parse(match.Value);
}

Add Cell and Row in openXML

I have a predefined Excel template and need to write data into it. I am able to get the particular sheet, but I don't know how to write the data to a cell.
var excelDocument = new ExcelDocument();
var fileName = Guid.NewGuid();
string filePath = HttpContext.Current.Server.MapPath("~/Uploads/TemplateFiles/test.xlsx");
using (SpreadsheetDocument document =
SpreadsheetDocument.Open(filePath, true)) // open as editable so that the changes can be saved
{
WorkbookPart workbookPart = document.WorkbookPart;
Workbook workbook = document.WorkbookPart.Workbook;
string sheetName = workbookPart.Workbook.Descendants<Sheet>().ElementAt(1).Name;
IEnumerable<Sheet> sheets = document.WorkbookPart.Workbook.Descendants<Sheet>().Where(s => s.Name == "Census Template for Import");
if (sheets.Count() == 0)
{
// The specified worksheet does not exist.
return null;
}
WorksheetPart worksheetPart = (WorksheetPart)document.WorkbookPart.GetPartById(sheets.First().Id);
SheetData sheetData = worksheetPart.Worksheet.GetFirstChild<SheetData>();
var excelRows = sheetData.Descendants<DocumentFormat.OpenXml.Spreadsheet.Row>().ToList();
int rowindex = 10;
foreach (var item in census)
{
//how to write the data in cell
rowindex++;
}
worksheetPart.Worksheet.Save();
workbookPart.Workbook.Save();
document.Close();
//worksheetPart.Worksheet.Save();
}
return filePath;
Here is a method for getting a cell, or adding a new one if the cell does not exist, when you know both the row and column indexes.
Note that:
rowIndex and columnIndex should start with 1
property RowIndex of a Row should be initialized during the creation of the row
property CellReference of a Cell should be initialized during the creation of the cell
If RowIndex or CellReference is null, then NullReferenceException will be thrown.
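private Cell InsertCell(uint rowIndex, uint columnIndex, Worksheet worksheet)
{
    var sheetData = worksheet.GetFirstChild<SheetData>();
    // Check if the worksheet contains a row with the specified row index.
    Row row = sheetData.Elements<Row>().FirstOrDefault(r => r.RowIndex == rowIndex);
    if (row == null)
    {
        row = new Row() { RowIndex = rowIndex };
        // Rows must stay in ascending order, so insert before the first row with a higher index.
        Row nextRow = sheetData.Elements<Row>().FirstOrDefault(r => r.RowIndex.Value > rowIndex);
        if (nextRow == null)
            sheetData.Append(row);
        else
            sheetData.InsertBefore(row, nextRow);
    }
    // Convert column index to column name for cell reference.
    var columnName = GetExcelColumnName(columnIndex);
    var cellReference = columnName + rowIndex; // e.g. A1
    // Check if the row contains a cell with the specified column name.
    var cell = row.Elements<Cell>()
        .FirstOrDefault(c => c.CellReference.Value == cellReference);
    if (cell == null)
    {
        cell = new Cell() { CellReference = cellReference };
        // Cells must stay in column order, so find the first existing cell to the right.
        // A longer column name is always further right ("AA" after "Z").
        Cell nextCell = row.Elements<Cell>().FirstOrDefault(c =>
        {
            string other = new string(c.CellReference.Value.TakeWhile(ch => char.IsLetter(ch)).ToArray());
            return other.Length > columnName.Length
                || (other.Length == columnName.Length && string.CompareOrdinal(other, columnName) > 0);
        });
        if (nextCell == null)
            row.AppendChild(cell);
        else
            row.InsertBefore(cell, nextCell);
    }
    return cell;
}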
Here you will find the code of GetExcelColumnName() method.
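When a cell is added to a row that already has cells, it helps to know whether one column lies to the left of another, because plain alphabetical comparison puts "AA" before "Z". The sketch below shows one way to compare column names; ColumnOrder and Compare are illustrative names rather than SDK members, and uppercase column letters are assumed.

```csharp
using System;

public static class ColumnOrder
{
    // Returns a negative number if left is to the left of right,
    // zero if they are the same column, and a positive number otherwise.
    // Assumes both names consist of uppercase letters only.
    public static int Compare(string left, string right)
    {
        // A shorter name always comes first ("Z" before "AA");
        // names of equal length compare alphabetically.
        if (left.Length != right.Length)
            return left.Length.CompareTo(right.Length);
        return string.CompareOrdinal(left, right);
    }
}
```

A new cell can then be placed before the first existing cell for which ColumnOrder.Compare(existingColumn, newColumn) is positive, or appended if there is none.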
I can't tell whether you are creating a new file or appending to an existing one, but:
SheetData sheetData = spreadSheet.WorkbookPart.WorksheetParts.First().Worksheet.GetFirstChild<SheetData>();
sheetData.AppendChild(new Row());
sheetData.Elements<Row>().Last().AppendChild(new Cell() { DataType = CellValues.String, CellValue = new CellValue("test") });
This should work in both cases; the new cell is placed in a new row appended after the last existing row of the first sheet.
I had the same issue, and this article helped: How to: Insert text into a cell in a spreadsheet document (Open XML SDK). I guess you need to insert a new Cell object into your worksheet and then insert the specified data (assuming it is a string or has already been converted to a string) into that cell.
It seems you define rowindex = 10. There are two ways to add rows.
If row 10 is the last row in your Excel file, you can simply append new rows like this:
foreach (var item in census)
{
//how to write the data in cell
Row row = new Row();
row.RowIndex = (UInt32)rowindex;
Cell cell = new Cell()
{
DataType = CellValues.String,
CellValue = new CellValue("value")
};
row.Append(cell);
sheetData.Append(row);
rowindex++;
}
If there are rows after row 10, you have to insert the new row and then manually shift the row index and cell references of every row after it, like this:
// rowIndex is assumed to be declared as a uint, e.g. uint rowIndex = 10;
foreach (var item in census)
{
    //how to write the data in cell
    Row refRow = GetRow(sheetData, rowIndex);
    ++rowIndex;
    // Snapshot the rows to move first, so that no row is shifted twice
    var rowsToShift = sheetData.Elements<Row>().Where(r => r.RowIndex.Value >= rowIndex).ToList();
    foreach (Row row in rowsToShift)
    {
        row.RowIndex = row.RowIndex.Value + 1;
        foreach (Cell c in row.Elements<Cell>())
        {
            string refer = c.CellReference.Value;
            string letters = Regex.Replace(refer, @"[^A-Z]", "");
            uint num = uint.Parse(Regex.Replace(refer, @"[^\d]", ""));
            c.CellReference = letters + (num + 1);
        }
    }
    Cell cell1 = new Cell() { CellReference = "A" + rowIndex };
    CellValue cellValue1 = new CellValue();
    cellValue1.Text = "";
    cell1.Append(cellValue1);
    Row newRow = new Row()
    {
        RowIndex = rowIndex
    };
    newRow.Append(cell1);
    // The new row goes directly below the reference row
    sheetData.InsertAfter(newRow, refRow);
}
static Row GetRow(SheetData wsData, UInt32 rowIndex)
{
var row = wsData.Elements<Row>().
Where(r => r.RowIndex.Value == rowIndex).FirstOrDefault();
if (row == null)
{
row = new Row();
row.RowIndex = rowIndex;
wsData.Append(row);
}
return row;
}
This is a prototype. You might need to change some code or variable name to fit your project.
References:
append rows in Excel by OpenXML
insert rows in Excel by OpenXML
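Shifting a row down means rewriting each cell reference in it, for example "B12" to "B13". This step can be isolated in a small helper, sketched below without regular expressions. The names CellReferenceShift and ShiftRow are invented for illustration, and the sketch assumes plain references made of letters followed by digits (no "$" markers or ranges).

```csharp
using System;

public static class CellReferenceShift
{
    // Moves a reference such as "B12" by the given number of rows,
    // keeping the column letters unchanged.
    public static string ShiftRow(string cellReference, int offset)
    {
        // Find where the column letters end and the row digits begin.
        int split = 0;
        while (split < cellReference.Length && char.IsLetter(cellReference[split]))
            split++;
        if (split == 0 || split == cellReference.Length)
            throw new FormatException("Expected letters followed by digits, e.g. \"B12\".");

        string column = cellReference.Substring(0, split);
        int row = int.Parse(cellReference.Substring(split)) + offset;
        if (row < 1)
            throw new ArgumentOutOfRangeException(nameof(offset), "The shifted row must be 1 or greater.");
        return column + row;
    }
}
```

Inside a loop over the cells of a moved row, this would be used as c.CellReference = CellReferenceShift.ShiftRow(c.CellReference.Value, 1);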

In ClosedXML, is there anyway to get the column letter from column header name?

I have an excel worksheet that has column headers and I don't want to hard code the column letter or index so I am trying to figure out how I could make it dynamic. I am looking for something like this:
var ws = wb.Worksheet("SheetName");
var range = ws.RangeUsed();
var table = range.AsTable();
string colLetter = table.GetColumnLetter("ColHeader");
foreach (var row in table.Rows())
{
if (i > 1)
{
string val = row.Cell(colLetter).Value.ToString();
}
i++;
}
Does ClosedXML support anything like the made up GetColumnLetter() function above so I don't have to hard code column letters?
Sure, get the cell you want using a predicate on the CellsUsed collection on the row with the headers, then return the column letter from the column.
public string GetColumnName(IXLTable table, string columnHeader)
{
var cell = table.HeadersRow().CellsUsed(c => c.Value.ToString() == columnHeader).FirstOrDefault();
if (cell != null)
{
return cell.WorksheetColumn().ColumnLetter();
}
return null;
}
For version 0.95.4.0 I used the following steps:
var ws = wb.Worksheet("SheetName");
var range = ws.RangeUsed();
var table = range.AsTable();
var cell = table.FindColumn(c => c.FirstCell().Value.ToString() == yourColumnName);
if (cell != null)
{
var columnLetter = cell.RangeAddress.FirstAddress.ColumnLetter;
}
