Populate an excel sheet column with directory files using OpenXML

I have an Excel workbook with two worksheets, "Tourist Information" and "Documents". In the "Documents" sheet I need to fill the "Scanned Document" column, starting at cell C3, with the names of all the files found in a directory. No other column needs to be filled. I can't work out how to write the file names into the sheet. Could you please help me populate the column with the file names?
My code is:
// Open the Excel file in edit mode using OpenXML
using (SpreadsheetDocument doc = SpreadsheetDocument.Open(@"C:\TouristRecord.xlsx", true))
{
    WorksheetPart documents = GetWorksheetPart(doc.WorkbookPart, "Documents");
    Worksheet documentsWorksheet = documents.Worksheet;
    IEnumerable<Row> documentsRows = documentsWorksheet.GetFirstChild<SheetData>().Descendants<Row>();
    // Loop through the files and the worksheet rows
    foreach (var files in Directory.GetFiles(@"C:\DocumentsFolder"))
    {
        foreach (Row row in documentsRows)
        {
            // I am unable to write logic to update the excel sheet value here.
        }
    }
    doc.Save();
}
And the GetWorksheetPart method is:
public WorksheetPart GetWorksheetPart(WorkbookPart workbookPart, string sheetName)
{
    string relId = workbookPart.Workbook.Descendants<Sheet>().First(s => sheetName.Equals(s.Name)).Id;
    return (WorksheetPart)workbookPart.GetPartById(relId);
}

To add a cell to C3 you will need to create a new Cell object, assign it a cell reference of C3, set its value and then add it to the Row that represents row 3 on the sheet. We can wrap that logic into a method like this:
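private void AddCellToRow(Row row, string value, string cellReference)
{
    // the cell might already exist; if it does we should use it.
    Cell cell = row.Descendants<Cell>().FirstOrDefault(c => c.CellReference == cellReference);
    if (cell == null)
    {
        cell = new Cell();
        cell.CellReference = cellReference;
        // only append a newly created cell; an existing cell is already part of the row.
        // Append puts the cell last, so this assumes the row has no cells to the right
        // of cellReference (cells within a row should stay in column order).
        row.Append(cell);
    }
    cell.CellValue = new CellValue(value);
    cell.DataType = CellValues.String;
}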
If we assume that the current worksheet has a contiguous set of rows then the logic of what to write is pretty straightforward:
1. Iterate over each row in the document.
2. Check whether the row index is greater than 2 (as you want to start writing from row 3 onwards). If it is:
   - Grab the third Cell, or create it if it doesn't exist.
   - Add the nth element of your file list to the Cell.
   - Increment n.
3. Iterate over the remaining files in your file list (as you may have more files than rows in the original document). For each one:
   - Add a new Row.
   - Add a new Cell to the Row with the file name as the cell's value.
Putting that into code you end up with:
using (SpreadsheetDocument doc = SpreadsheetDocument.Open(@"C:\TouristRecord.xlsx", true))
{
    WorksheetPart documents = GetWorksheetPart(doc.WorkbookPart, "Documents");
    // get the SheetData, as that's where we need to add rows
    SheetData sheetData = documents.Worksheet.GetFirstChild<SheetData>();
    IEnumerable<Row> documentsRows = sheetData.Descendants<Row>();
    // get all of the files into an array (GetFiles returns full paths;
    // use Path.GetFileName on each entry if only the name is wanted)
    var filenames = Directory.GetFiles(@"C:\DocumentsFolder");
    if (filenames.Length > 0)
    {
        int currentFileIndex = 0;
        // keep the row index in case the RowIndex property is null anywhere.
        // the spec allows for it to be null, in which case the row
        // index is one more than the previous row (or 1 if this is the first row)
        uint currentRowIndex = 0;
        foreach (var documentRow in documentsRows)
        {
            if (documentRow.RowIndex.HasValue)
            {
                currentRowIndex = documentRow.RowIndex.Value;
            }
            else
            {
                currentRowIndex++;
            }
            if (currentRowIndex <= 2)
            {
                // this is row 1 or 2 so we can ignore it
                continue;
            }
            AddCellToRow(documentRow, filenames[currentFileIndex], "C" + currentRowIndex);
            currentFileIndex++;
            if (filenames.Length <= currentFileIndex)
            {
                // there are no more files so we can stop
                break;
            }
        }
        // rows 1 and 2 are skipped, so any new rows must start at row 3 at the earliest
        if (currentRowIndex < 2)
        {
            currentRowIndex = 2;
        }
        // now output any files we haven't already output. These will need a new row as there isn't one
        // in the document as yet.
        for (int i = currentFileIndex; i < filenames.Length; i++)
        {
            // there are more files than there were rows in the sheet, so add more rows
            Row row = new Row();
            currentRowIndex++;
            row.RowIndex = currentRowIndex;
            AddCellToRow(row, filenames[i], "C" + currentRowIndex);
            sheetData.Append(row);
        }
    }
}
There's an assumption above that the current worksheet has a contiguous set of rows. This might not always be true as the spec allows for empty rows to not be written to the XML. In that case, you could end up with gaps in your output. Imagine the original file has data in rows 1, 2 and 5; in that scenario the foreach would cause you to skip writing to rows 3 and 4. This can be solved by checking the currentRowIndex inside the loop and adding a new Row for any gaps that may occur. I haven't added that code as it's a complication that detracts from the fundamentals of the answer.
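For anyone who does need the gap handling, it could be sketched roughly as follows. GetOrCreateRow is a hypothetical helper name, not part of the Open XML SDK, and the sketch assumes that the existing rows are stored in ascending order and that any row without a RowIndex can simply be skipped.

```csharp
using DocumentFormat.OpenXml.Spreadsheet;

public static class RowHelper
{
    // Hypothetical helper: returns the Row with the given index, creating it
    // in the right position if the sheet has no such row yet.
    // Assumes existing rows are in ascending RowIndex order.
    public static Row GetOrCreateRow(SheetData sheetData, uint rowIndex)
    {
        Row nextRow = null;
        foreach (Row existing in sheetData.Elements<Row>())
        {
            if (existing.RowIndex == null)
            {
                continue; // rows without an explicit index are ignored in this sketch
            }
            if (existing.RowIndex.Value == rowIndex)
            {
                return existing; // the row is already there
            }
            if (existing.RowIndex.Value > rowIndex)
            {
                nextRow = existing; // first row after the gap
                break;
            }
        }

        Row row = new Row { RowIndex = rowIndex };
        if (nextRow != null)
        {
            sheetData.InsertBefore(row, nextRow); // fill the gap, keeping rows ordered
        }
        else
        {
            sheetData.Append(row); // past the last existing row
        }
        return row;
    }
}
```

With a helper like this, the two loops could probably collapse into a single loop over the files that calls AddCellToRow(RowHelper.GetOrCreateRow(sheetData, rowIndex), ...) for rowIndex = 3, 4, 5 and so on.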

Related

Find linked formula values from worksheets and replace with actual cell value

In an OOXML spreadsheet (.xlsx) you can use a linking formula to fetch values from another spreadsheet and show them in your worksheet. Those values are updated whenever the values in the other spreadsheet change.
I am using the Open XML SDK and I basically want to do what this does: https://www.e-iceblue.com/Tutorials/Spire.XLS/Spire.XLS-Program-Guide/Formula/Remove-Formulas-from-Cells-but-Keep-Values-in-Excel-in-C.html
How do I:
- Find a cell whose formula links its value to a cell in another spreadsheet
- Replace the formula with the actual cell value
- Do this for each cell in each worksheet of a spreadsheet
I have tried this so far: https://learn.microsoft.com/en-us/office/open-xml/how-to-retrieve-the-values-of-cells-in-a-spreadsheet
But I am receiving a NullReferenceException each time a cell does not contain a formula or any value at all. I have tried try-catch and several other ways to avoid this exception, but it is not working.
But back to the challenge as outlined above: can anyone help me out?
I already know the basics, such as using directives, foreach loops, Open() and Save().

This worked for me:
public void Remove_CellReferences(string filepath)
{
    using (SpreadsheetDocument spreadsheet = SpreadsheetDocument.Open(filepath, true))
    {
        // Replace all external-link formulas in every worksheet with their stored values
        List<WorksheetPart> worksheetparts = spreadsheet.WorkbookPart.WorksheetParts.ToList();
        foreach (WorksheetPart part in worksheetparts)
        {
            Worksheet worksheet = part.Worksheet;
            var rows = worksheet.GetFirstChild<SheetData>().Elements<Row>(); // Find all rows
            foreach (var row in rows)
            {
                var cells = row.Elements<Cell>();
                foreach (Cell cell in cells)
                {
                    // Cells without a formula are skipped
                    if (cell.CellFormula != null)
                    {
                        string formula = cell.CellFormula.InnerText;
                        // Only formulas that start with "[" (a link to another workbook) are handled
                        if (formula.StartsWith("["))
                        {
                            CellValue cellvalue = cell.CellValue; // Save current cell value (may be null)
                            cell.CellFormula = null; // Remove the linking formula
                            // If cellvalue does not have a real value
                            if (cellvalue?.Text == "#N/A")
                            {
                                cell.DataType = CellValues.String;
                                cell.CellValue = new CellValue("Invalid data removed");
                            }
                            else
                            {
                                cell.CellValue = cellvalue; // Insert saved cell value
                            }
                        }
                    }
                }
            }
        }
        // Delete all external link references
        List<ExternalWorkbookPart> extwbParts = spreadsheet.WorkbookPart.ExternalWorkbookParts.ToList();
        foreach (ExternalWorkbookPart extpart in extwbParts)
        {
            var elements = extpart.ExternalLink.ChildElements.ToList();
            foreach (var element in elements)
            {
                if (element.LocalName == "externalBook")
                {
                    spreadsheet.WorkbookPart.DeletePart(extpart);
                    break; // the part is gone, so stop inspecting its elements
                }
            }
        }
        // Delete calculation chain, if the workbook has one
        CalculationChainPart calc = spreadsheet.WorkbookPart.CalculationChainPart;
        if (calc != null)
        {
            spreadsheet.WorkbookPart.DeletePart(calc);
        }
    }
}

Reading cell value with applied Text formatting with OpenXML

I am trying to read an Excel sheet that contains cells with Text formatting.
One of the columns has values such as 1, 1.1, 1.2 and so on.
In Excel all of these values look right in the Text-formatted cells: 1, 1.1, 1.2.
But when I read those cells with OpenXML, I get the values 1, 1.1000000000000001, 1.2; some of them have extra decimal digits.
I checked xl\worksheets\sheet1.xml in the *.xlsx file and saw that it really does contain the value 1.1000000000000001:
<row r="3" spans="1:20" ht="15" x14ac:dyDescent="0.25">
<c r="A3" s="2">
<v>1.1000000000000001</v>
</c>
My code is:
List<List<string>> rows = new List<List<string>>();
List<string> cols;
spreadsheetDocument = SpreadsheetDocument.Open(excelFilePath, false);
WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
SharedStringTablePart sstpart = workbookPart.GetPartsOfType<SharedStringTablePart>().First();
SharedStringTable sst = sstpart.SharedStringTable;
foreach (Row r in sheetData.Elements<Row>())
{
    cols = new List<string>();
    foreach (Cell c in r.Elements<Cell>())
    {
        if (c.DataType != null && c.DataType == CellValues.SharedString)
        {
            int ssid = int.Parse(c.CellValue.Text);
            string str = sst.ChildElements[ssid].InnerText;
            cols.Add(str);
        }
        else
        {
            cols.Add(c.CellValue?.InnerText);
        }
    }
    rows.Add(cols);
}
spreadsheetDocument.Close();
How can I get the correct value from such cells, for example 1.1 rather than 1.1000000000000001?

First create this method to get the value of a cell.
public static string GetCellValue(WorkbookPart workbookPart, Cell cell)
{
    string value = null;
    if (cell.InnerText.Length > 0)
    {
        value = cell.InnerText;
        // If the cell represents an integer number, you are done.
        // For dates, this code returns the serialized value that
        // represents the date. The code handles strings and
        // Booleans individually. For shared strings, the code
        // looks up the corresponding value in the shared string
        // table. For Booleans, the code converts the value into
        // the words TRUE or FALSE.
        if (cell.DataType != null)
        {
            switch (cell.DataType.Value)
            {
                case CellValues.SharedString:
                    // For shared strings, look up the value in the
                    // shared strings table.
                    var stringTable =
                        workbookPart.GetPartsOfType<SharedStringTablePart>()
                            .FirstOrDefault();
                    // If the shared string table is missing, something
                    // is wrong. Return the index that is in
                    // the cell. Otherwise, look up the correct text in
                    // the table.
                    if (stringTable != null)
                    {
                        value =
                            stringTable.SharedStringTable
                                .ElementAt(int.Parse(value)).InnerText;
                    }
                    break;
                case CellValues.Boolean:
                    value = value switch
                    {
                        "0" => "FALSE",
                        _ => "TRUE",
                    };
                    break;
            }
        }
    }
    return value;
}
Then use this code for your inner foreach loop:
foreach (Cell c in r.Elements<Cell>())
{
    string str = GetCellValue(workbookPart, c);
    cols.Add(str);
}
The GetCellValue method is inspired by Microsoft's documentation.
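Note that for a cell stored as a number, GetCellValue returns the raw text from the XML, so a stored value such as 1.1000000000000001 would come back unchanged. One possible workaround is to parse the text and reformat it with 15 significant digits. This is only a sketch: NormalizeNumber is a hypothetical helper name, and it assumes the cell text is a plain number in invariant-culture format and that 15 significant digits are enough for your data.

```csharp
using System.Globalization;

public static class NumberText
{
    // Hypothetical helper: re-formats numeric cell text with 15 significant
    // digits, which drops floating-point noise such as 1.1000000000000001.
    // Text that does not parse as a number is returned unchanged.
    public static string NormalizeNumber(string raw)
    {
        if (double.TryParse(raw, NumberStyles.Float, CultureInfo.InvariantCulture, out double number))
        {
            return number.ToString("G15", CultureInfo.InvariantCulture);
        }
        return raw;
    }
}
```

Here NumberText.NormalizeNumber("1.1000000000000001") returns "1.1". It is probably safest to apply this only to cells whose DataType is null or Number, so that text such as "007" is not turned into 7.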

Iterating and adding all column names to a ComboBox using EPPlus

I need to add the column names within a sheet to a ComboBox.
I have tried the following:
var pck = new OfficeOpenXml.ExcelPackage();
pck.Load(new System.IO.FileInfo("test.xlsx").OpenRead());
var ws = pck.Workbook.Worksheets[1];
int totalCols = ws.Dimension.End.Column;
for (int i = 1; i <= totalCols; i++)
{
    comboBox1.Items.Add(ws.Column(i).ToString());
}
But this produces a Null Reference Exception.
Why is that happening?

Ensure that you're loading the package correctly and selecting the values correctly:
// Select workbook
var fileInfo = new FileInfo(@"yourfile.xlsx");
// Load workbook
using (var package = new ExcelPackage(fileInfo))
{
    // Iterate through workbook sheets
    foreach (var sheet in package.Workbook.Worksheets)
    {
        // Dimension is null for a sheet with no data, so skip empty sheets
        if (sheet.Dimension == null) continue;
        // Iterate through each column until the final column
        for (int i = 1; i <= sheet.Dimension.End.Column; i++)
        {
            // The column names are the values in the first row
            comboBox1.Items.Add(sheet.Cells[1, i].Text);
        }
    }
}
This runs correctly in a new workbook with two sheets and values in the columns of each sheet.

Why would there be no data in the visualiser when there is valid data in the DataTable?

I'm trying to build a wrapper for SpreadsheetLight that returns a DataSet from any .xlsx document passed through it. However, I seem to be having a problem with DataRows not being added to a temporary DataTable.
Here's part of the code that parses a worksheet and generates a DataTable from it:
public DataSet ReadToDataSet(string fileName)
{
    using (var wb = new SLDocument(fileName))
    {
        var set = new DataSet(GenerateTitle(wb.DocumentProperties.Title));
        foreach (var wsName in wb.GetWorksheetNames())
        {
            var ws = wb.SelectWorksheet(wsName);
            // SelectWorksheet returns a bool, so if it comes back false, try the next worksheet instead.
            if (!ws) continue;
            // Statistics gives the indices of the first and last data cells
            var stats = wb.GetWorksheetStatistics();
            // Create a new DataTable for each worksheet
            var dt = new DataTable(wsName);
            //var addDataColumns = true;
            for (var colIdx = stats.StartColumnIndex; colIdx < stats.EndColumnIndex; colIdx++)
                dt.Columns.Add(colIdx.ToString(), typeof(string));
            // Scan each row
            for (var rowIdx = stats.StartRowIndex; rowIdx < stats.EndRowIndex; rowIdx++)
            {
                //dt.Rows.Add();
                var newRow = dt.NewRow();
                // And each column for data
                for (var colIdx = stats.StartColumnIndex; colIdx < stats.EndColumnIndex; colIdx++)
                {
                    //if (addDataColumns)
                    //    dt.Columns.Add();
                    newRow[colIdx - 1] = wb.GetCellValueAsString(rowIdx, colIdx);
                    //if (colIdx >= stats.EndColumnIndex)
                    //    addDataColumns = false;
                }
                dt.Rows.Add(newRow);
            }
            set.Tables.Add(dt);
        }
        // Debug output
        foreach (DataRow row in set.Tables[0].Rows)
        {
            foreach (var output in row.ItemArray)
            {
                Console.WriteLine(output.ToString());
            }
        }
        return set;
    }
}
Note: SpreadsheetLight indices start from 1 instead of 0.
Now, I've tried replacing dt.Rows.Add() with new object[stats.EndColumnIndex -1];, as well as a temporary variable from var newRow = dt.NewRow(); and then passing them into the DataTable afterwards, but still get the same end result. The row objects are populating correctly, but aren't transferring to the DataTable at the end.
When you explore the object during runtime, it shows the correct number of rows and columns in the relevant properties. But when you open it up in the DataVisualiser you can only see the columns, no rows.
I must be missing something obvious.
Update
I looped through the resulting table and output the values to the console as a test. All the correct values appear, but the visualiser remains empty.
I guess the question now is, why would there be no data in the visualiser when there is valid data in the DataTable?
Update 2
Added the full method for reference, including a simple set of for loops to loop through all rows and columns in the first DataTable. Note: I also experimented with pulling the column creation out of the loop and even setting the datatypes. Made no difference. Commented code shows the original.

OK, it turns out the problem most likely came from the columns being added. Either there were too many columns for the visualiser to handle (1024), which I find hard to believe, or there was a bug in Visual Studio that has randomly corrected itself.
There's also a bug in SpreadsheetLight that lists all columns as having data when you call GetWorksheetStatistics(), so I've used a workaround that takes the total number of cells available or stats.NumberOfColumns, whichever is smaller.
Either way, the below code now functions.
public DataSet ReadToDataSet(string fileName)
{
    using (var wb = new SLDocument(fileName))
    {
        var set = new DataSet(GenerateTitle(wb.DocumentProperties.Title));
        foreach (var wsName in wb.GetWorksheetNames())
        {
            var ws = wb.SelectWorksheet(wsName);
            // SelectWorksheet returns a bool, so if it comes back false, try the next worksheet instead.
            if (!ws) continue;
            // Statistics gives the indices of the first and last data cells
            var stats = wb.GetWorksheetStatistics();
            // There is a bug with the stats columns. Take the total number of cells available
            // or the number of columns from the stats, whichever is smaller
            var newColumnIndex = stats.NumberOfCells < stats.NumberOfColumns
                ? stats.NumberOfCells
                : stats.NumberOfColumns;
            // Create a new DataTable for each worksheet
            var dt = new DataTable(wsName);
            var addDataColumns = true;
            // Scan each row
            for (var rowIdx = stats.StartRowIndex; rowIdx < stats.EndRowIndex; rowIdx++)
            {
                var newRow = dt.NewRow();
                // And each column for data
                for (var colIdx = stats.StartColumnIndex; colIdx < newColumnIndex; colIdx++)
                {
                    // Columns are only added while scanning the first row
                    if (addDataColumns)
                        dt.Columns.Add();
                    newRow[colIdx - 1] = wb.GetCellValueAsString(rowIdx, colIdx);
                }
                addDataColumns = false;
                dt.Rows.Add(newRow);
            }
            set.Tables.Add(dt);
        }
        return set;
    }
}
Hopefully someone else finds this a useful reference in the future, either for SpreadsheetLight or for the DataVisualiser in Visual Studio. If anyone knows of any limits for the visualiser, I'm all ears!

Get Column number from cell value, openxml

I have an Excel file in which I want to find the column that holds a given header value.
I know that the headers appear in the first row, but I don't know which column a given header is in. For example, the value "Quantity" appears in D1. Knowing the row number, how can I find the column ("D" in this case) using Open XML? The "Quantity" header might appear anywhere in the row, and I need to find the values that belong to Quantity only.

Unfortunately there's not a single method you can call to find the correct cell. Instead you'll need to iterate over the cells to find the matching text. To complicate things slightly, the value in the cell is not always the actual text. Instead strings can be stored in the SharedStringTablePart and the value of the cell is an index into the contents of that table.
Something like the following should do what you're after:
private static string GetCellReference(string filename, string sheetName, int rowIndex, string textToFind)
{
    using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(filename, false))
    {
        WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
        // get the correct sheet (FirstOrDefault returns null rather than throwing if there is no match)
        Sheet sheet = workbookPart.Workbook.Descendants<Sheet>().FirstOrDefault(s => s.Name == sheetName);
        if (sheet != null)
        {
            WorksheetPart worksheetPart = workbookPart.GetPartById(sheet.Id) as WorksheetPart;
            SharedStringTablePart stringTable = workbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
            SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
            Row row = sheetData.Elements<Row>().FirstOrDefault(r => r.RowIndex != null && r.RowIndex.Value == rowIndex);
            if (row != null)
            {
                foreach (Cell c in row.Elements<Cell>())
                {
                    string cellText;
                    if (c.DataType != null && c.DataType.Value == CellValues.SharedString && stringTable != null)
                    {
                        // the value will be a number which is an index into the shared strings table
                        int index = int.Parse(c.CellValue.InnerText);
                        cellText = stringTable.SharedStringTable.ElementAt(index).InnerText;
                    }
                    else
                    {
                        // just take the value from the cell (note this won't work for dates and other types);
                        // a cell without a value yields null
                        cellText = c.CellValue?.InnerText;
                    }
                    if (cellText == textToFind)
                    {
                        return c.CellReference;
                    }
                }
            }
        }
    }
    return null;
}
This can then be called like this:
string cellReference = GetCellReference(@"c:\temp\test.xlsx", "Sheet1", 1, "Quantity");
Console.WriteLine(cellReference); //prints D1 for your example
If you just want D rather than D1 you can use a simple regex to remove the numbers:
private static string GetColumnName(string cellReference)
{
    if (cellReference == null)
        return null;
    return Regex.Replace(cellReference, "[0-9]", "");
}
And then use it like this:
string cellReference = GetCellReference(@"c:\temp\test.xlsx", "Sheet1", 1, "Quantity");
Console.WriteLine(GetColumnName(cellReference)); //prints D for your example
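If a numeric column index is needed rather than the letter, the letters could be converted along these lines. GetColumnNumber is a hypothetical helper, not part of the Open XML SDK, and the sketch assumes the input contains only the letters A to Z (for example the output of GetColumnName).

```csharp
using System;

public static class ColumnHelper
{
    // Hypothetical helper: converts column letters to a 1-based column number,
    // treating the letters as a base-26 number (A = 1, Z = 26, AA = 27).
    public static int GetColumnNumber(string columnName)
    {
        if (string.IsNullOrEmpty(columnName))
            throw new ArgumentException("Column name must not be empty.", nameof(columnName));

        int number = 0;
        foreach (char ch in columnName.ToUpperInvariant())
        {
            if (ch < 'A' || ch > 'Z')
                throw new ArgumentException("Column name may only contain letters A-Z.", nameof(columnName));
            number = number * 26 + (ch - 'A' + 1);
        }
        return number;
    }
}
```

For the example above, ColumnHelper.GetColumnNumber("D") returns 4.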
