OpenXML WorksheetParts in reverse? - c#

I have the following code
SpreadsheetDocument doc = SpreadsheetDocument.Open(name, true);
foreach (WorksheetPart wsP in doc.WorkbookPart.WorksheetParts)
{
SheetData sData = wsP.Worksheet.Descendants<SheetData>().First();
var cells = sData.Descendants<Cell>().Where(c => c.CellReference.Value == "A1");
if (cells.Count<Cell>() == 1)
{
int index = Convert.ToInt32(cells.First().CellValue.Text);
SharedStringItem str = doc.WorkbookPart.SharedStringTablePart.SharedStringTable.Elements<SharedStringItem>().ElementAt(index);
}
}
Sheet sheet = doc.WorkbookPart.Workbook.Sheets.FirstChild as Sheet;
doc.Close();
So far, I'm exploring OpenXML and just seeing how things work. I've come upon what seems to be an odd behavior. The file I am opening contains three sheets:
"Instructions", whose A1 cell is "fck ="
"Geometry", whose A1 cell is "PullDate"
"Stresses", whose A1 cell is "Section"
If I tinker with the latter part of the code (Sheet) to go to different sheets (by appending a .NextSibling at the end), that is precisely the order of sheet titles I get (by using the debugger).
However, the foreach WorksheetPart section on top goes backwards, returning me the A1 cells as "Section" and then "PullDate" and then "fck =".
Is this the expected behavior or am I doing something foolishly wrong and not noticing?
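One way to avoid depending on the enumeration order of WorksheetParts is to walk the Sheet elements of the workbook, which is where the question observed the expected order, and resolve each one to its part through its relationship id. The following is a minimal sketch under that assumption; SheetOrderDemo and GetSheetsInTabOrder are illustrative names that do not appear in the original code, and the file path is taken from the command line.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public static class SheetOrderDemo
{
    // Returns (sheet name, worksheet part) pairs in the order of the Sheet elements.
    // Assumes every sheet is a worksheet; a chart sheet would need a different cast.
    public static List<KeyValuePair<string, WorksheetPart>> GetSheetsInTabOrder(SpreadsheetDocument doc)
    {
        WorkbookPart workbookPart = doc.WorkbookPart;
        return workbookPart.Workbook.Sheets
            .Elements<Sheet>()
            .Select(sheet => new KeyValuePair<string, WorksheetPart>(
                sheet.Name.Value,
                (WorksheetPart)workbookPart.GetPartById(sheet.Id.Value)))
            .ToList();
    }

    public static void Main(string[] args)
    {
        using (SpreadsheetDocument doc = SpreadsheetDocument.Open(args[0], false))
        {
            foreach (var pair in GetSheetsInTabOrder(doc))
            {
                // Look up A1 on this sheet, if it exists. For a shared-string cell
                // the raw text is an index into the shared string table.
                Cell a1 = pair.Value.Worksheet.Descendants<Cell>()
                    .FirstOrDefault(c => c.CellReference?.Value == "A1");
                Console.WriteLine("{0}: A1 raw value = {1}", pair.Key, a1?.CellValue?.Text ?? "(empty)");
            }
        }
    }
}
```

Because each WorksheetPart is looked up by the id stored on its Sheet element, the result follows the Sheets list regardless of how the package enumerates its parts.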

Related

How to get a merged cell from Excel using DocumentFormat.OpenXML or ClosedXML C#

For each cell, I need to get the merged area it belongs to and the row number on which that area ends, using only DocumentFormat.OpenXml or ClosedXML. How can I do this?
Using ClosedXML, this could be done with:
var ws = workbook.Worksheet("Sheet 1");
var cell = ws.Cell("A2");
var mergedRange = cell.MergedRange();
var lastCell = mergedRange.LastCell();
// or
var lastCellAddress = mergedRange.RangeAddress.LastAddress;
I found this to be a little clunky but I believe this to be the correct approach.
private static void GetMergedCells()
{
var fileName = $"c:\\temp\\Data.xlsm";
// Open the document.
using (SpreadsheetDocument document = SpreadsheetDocument.Open(fileName, false))
{
// Get the WorkbookPart object.
var workbookPart = document.WorkbookPart;
// Get the first worksheet in the document. You can change this as need be.
var worksheet = workbookPart.Workbook.Descendants<Sheet>().FirstOrDefault();
// Retrieve the WorksheetPart using the Part ID from the previous "Sheet" object.
var worksheetPart = (WorksheetPart)workbookPart.GetPartById(worksheet.Id);
// Retrieve the MergeCells element, this will contain all MergeCell elements.
var mergeCellsList = worksheetPart.Worksheet.Elements<MergeCells>();
// Now loop through and spit out each range reference for the merged cells.
// You'll need to process the range either as a string or turn it into another
// object that gives you the end row.
foreach (var mergeCells in mergeCellsList)
{
foreach (MergeCell mergeCell in mergeCells)
{
Console.WriteLine(mergeCell.Reference);
}
}
}
}
If you couldn't already tell, this is using DocumentFormat.OpenXml.Spreadsheet
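The comment in the code above leaves the processing of the range string open. A minimal sketch of one way to pull the start and end rows out of a reference such as "B2:D7" follows; MergeRangeDemo and GetRowSpan are hypothetical names, and the regular expression is an assumption about the reference format (letters followed by digits, with optional $ markers).

```csharp
using System;
using System.Text.RegularExpressions;

public static class MergeRangeDemo
{
    // Splits a reference such as "B2:D7" into its first and last row numbers.
    public static (int FirstRow, int LastRow) GetRowSpan(string reference)
    {
        // Absolute markers ($) are tolerated; a single-cell reference has no colon.
        Match m = Regex.Match(reference, @"^\$?[A-Za-z]+\$?(\d+)(?::\$?[A-Za-z]+\$?(\d+))?$");
        if (!m.Success)
            throw new FormatException("Not a cell or range reference: " + reference);

        int firstRow = int.Parse(m.Groups[1].Value);
        int lastRow = m.Groups[2].Success ? int.Parse(m.Groups[2].Value) : firstRow;
        return (firstRow, lastRow);
    }

    public static void Main()
    {
        var span = GetRowSpan("B2:D7");
        Console.WriteLine("Merged area starts on row {0} and ends on row {1}", span.FirstRow, span.LastRow);
    }
}
```

In the loop above, the string to pass in would be mergeCell.Reference.Value.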

Getting float number instead of exact string while reading from excel using openxml

I have code that reads an Excel file into a DataTable using OpenXml. It works fine, except that when I read a value like "7.5" it comes back as "7.4999999" in float format. Why is that?
I just want to read the value exactly as it appears, as a string, so "7.5" should come back as 7.5 and not 7.499 or anything else.
Any help would be appreciated.
Here is my code to read the Excel file into a DataTable:
private static string ConvertFileToDataTable(Stream fileStream, DataTable framworkDatatable, List<FileErrorModel> fileErrorModels)
{
string frameworkName;
InitializeDataTable(framworkDatatable);
using (var document = SpreadsheetDocument.Open(fileStream, true))
{
var workbookPart = document.WorkbookPart;
var mysheet = (Sheet)document.WorkbookPart.Workbook.Sheets.ChildElements.GetItem(0);
frameworkName = mysheet.Name;
var worksheet = ((WorksheetPart)workbookPart.GetPartById(mysheet.Id)).Worksheet;
var sheetData = (SheetData)worksheet.ChildElements.GetItem(4);
foreach (var row in sheetData.Descendants<Row>())
{
if (row == null) continue;
if (row.RowIndex.Value == Constants.ColumnNameRowIndex)
{
if (row.Descendants<Cell>().Count() > Constants.ValidColumnCount || row.Descendants<Cell>().Count() < Constants.ValidColumnCount)
{
var error = new FileErrorModel
{
ErrorText = ValidationMessages.IncompatibleCoulmns,
RowNumber = null,
ColumnNumber = null
};
fileErrorModels.Add(error);
}
}
if (row.RowIndex.Value == Constants.TableNameRowIndex ||
row.RowIndex.Value == Constants.ColumnNameRowIndex) continue;
framworkDatatable.Rows.Add();
var i = 0;
foreach (var cell in row.Descendants<Cell>())
{
framworkDatatable.Rows[framworkDatatable.Rows.Count - 1][i] = GetCellValue(document, cell);
i++;
}
}
}
return frameworkName;
}
this is how I initialize Datatable
private static void InitializeDataTable(DataTable framworkDatatable)
{
framworkDatatable.Columns.AddRange(new[] {
new DataColumn(ImportSheetColumnNames.ControlFamilyIdentifier, typeof(string)),
new DataColumn(ImportSheetColumnNames.ControlFamilyShortName, typeof(string)),
new DataColumn(ImportSheetColumnNames.ControlFamilyDescription,typeof(string)),
new DataColumn(ImportSheetColumnNames.ControlIdentifier,typeof(string)),
new DataColumn(ImportSheetColumnNames.ControlType,typeof(string)),
new DataColumn(ImportSheetColumnNames.ControlDescription,typeof(string)),
new DataColumn(ImportSheetColumnNames.TestProcedure,typeof(string)),
new DataColumn(ImportSheetColumnNames.BestPractice,typeof(string)),
new DataColumn(ImportSheetColumnNames.Help,typeof(string)),
new DataColumn(ImportSheetColumnNames.ActionPlan,typeof(string)),
new DataColumn(ImportSheetColumnNames.Question,typeof(string))
});
}
Ignore "ImportSheetColumnNames"; they are just constants holding the column names.
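In short: binary floating-point numbers cannot exactly represent most decimal fractions. (7.5 itself happens to be exactly representable, but a value such as 7.4999999 is not.) There is no reason to believe that something is wrong with your code. More details are here, for example: Why can't decimal numbers be represented exactly in binary?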
What do you see if you open the sheet in Excel (not in the cell, but in the formula bar)?
If I create a new Excel spreadsheet and enter the following values in to the corresponding cells:
A1 = 7.5
A2 = 7.4999999
A3 = 7.49 (but formatted to have a single decimal point showing)
All three cells display as 7.5 in the sheet, but the formula bar shows the values exactly as I entered them.
Now, if I look in the file (using the Open XML Productivity Tool, downloadable from Microsoft.com and essential for any Open XML work), I see the following saved in the file, under filename > /xl/workbook.xml > /xl/worksheets/sheet1.xml > x:worksheet > x:sheetData:
<x:sheetData xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<x:row r="1" spans="1:1" x14ac:dyDescent="0.25" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac">
<x:c r="A1">
<x:v>7.5</x:v>
</x:c>
</x:row>
<x:row r="2" spans="1:1" x14ac:dyDescent="0.25" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac">
<x:c r="A2">
<x:v>7.4999998999999997</x:v>
</x:c>
</x:row>
<x:row r="3" spans="1:1" x14ac:dyDescent="0.25" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac">
<x:c r="A3" s="1">
<x:v>7.49</x:v>
</x:c>
</x:row>
</x:sheetData>
That's what's in the file (the <x:v/> element represents "value"). Excel doesn't store the "string in the cell", it stores the value, and it also stores everything in the environment that describes how to display things to the user. A value of 7.4999999 is pretty much 7.5, and is shown as 7.5 in Excel, but it's stored as 7.4999999. The 7.5 is not stored, and there's no way to recreate it unless you try to recreate all of Excel's display rules.
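To illustrate the gap between the stored value and the displayed text, here is a minimal sketch that parses the raw <x:v> text from the XML above and applies a one-decimal format. The .NET format string "0.0" is only a stand-in assumed for this example; it is not how Excel stores its number formats, and StoredValueDemo and FormatStoredValue are hypothetical names.

```csharp
using System;
using System.Globalization;

public static class StoredValueDemo
{
    // Parses the raw cell text culture-independently and formats it with a
    // .NET format string standing in for the cell's display format.
    public static string FormatStoredValue(string rawText, string format)
    {
        double value = double.Parse(rawText, NumberStyles.Float, CultureInfo.InvariantCulture);
        return value.ToString(format, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        // The three stored values from the sheetData XML above.
        Console.WriteLine(FormatStoredValue("7.5", "0.0"));
        Console.WriteLine(FormatStoredValue("7.4999998999999997", "0.0"));
        Console.WriteLine(FormatStoredValue("7.49", "0.0"));
    }
}
```

All three lines print 7.5, which mirrors what the sheet shows: the displayed text depends on the format applied, while the file keeps the underlying value.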

Read from Excel file — specific sheet

Simply put: I need to read from an xlsx file (rows), in the simplest way. That is, preferably without using third-party tools, or at least things that aren't available as nuget packages.
I've been trying for a while with IExcelDatareader, but I cannot figure out how to get data from a specific sheet.
This simple code snippet works, but it just reads the first worksheet:
FileStream stream = File.Open("C:\\test\\test.xlsx", FileMode.Open, FileAccess.Read);
IExcelDataReader excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
excelReader.IsFirstRowAsColumnNames = true;
while (excelReader.Read()) {
Console.WriteLine(excelReader.GetString(0));
}
This prints the rows in the first worksheet, but ignores the others. Of course, there is nothing to suggest otherwise, but I cannot seem to find out how to specify the sheet name.
It strikes me that this should be quite easy?
Sorry for asking something which has been asked several times before, but the answers (here and elsewhere on the net) are a jungle of bad, plain wrong and outdated half-answers that is a nightmare to make sense of. Especially since almost everyone answering assumes that you know some specific details that are not always easy to find.
UPDATE:
As per daniell89's suggestion below, I've tried this:
FileStream stream = File.Open("C:\\test\\test.xlsx", FileMode.Open, FileAccess.Read);
IExcelDataReader excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
excelReader.IsFirstRowAsColumnNames = true;
// Select the first or second sheet - this works:
DataTable specificWorkSheet = excelReader.AsDataSet().Tables[1];
// This works: Printing the first value in each column
foreach (var col in specificWorkSheet.Columns)
Console.WriteLine(col.ToString());
// This does NOT work: Printing the first value in each row
foreach (var row in specificWorkSheet.Rows)
Console.WriteLine(row.ToString());
Printing each column heading with col.ToString() works fine.
Printing the first cell of each row with row.ToString() results in this output:
System.Data.DataRow
System.Data.DataRow
System.Data.DataRow
...
One per row, so it's obviously getting the rows. But how to get the contents, and why does ToString() work for the columns and not for the rows?
Maybe look at this answer: https://stackoverflow.com/a/32522041/5358389
DataSet workSheets= reader.AsDataSet();
And then specific sheet:
DataTable specificWorkSheet = reader.AsDataSet().Tables[yourValue];
Enumerating rows:
foreach (var row in specificWorkSheet.Rows)
Console.WriteLine(((DataRow)row)[0]); // column identifier in square brackets
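On the question of why ToString() behaves differently: DataColumn overrides ToString() to return the column name, whereas DataRow does not override it, so the default type name System.Data.DataRow is printed. A minimal sketch of reading row contents follows; the table here is made-up stand-in data for excelReader.AsDataSet().Tables[1], and DataRowDemo and RowToText are hypothetical names.

```csharp
using System;
using System.Data;

public static class DataRowDemo
{
    // Joins every cell of a row into one tab-separated line.
    public static string RowToText(DataRow row)
    {
        return string.Join("\t", row.ItemArray);
    }

    public static void Main()
    {
        // Stand-in for the worksheet table; the data is invented for illustration.
        var table = new DataTable("Cars");
        table.Columns.Add("Make", typeof(string));
        table.Columns.Add("Year", typeof(int));
        table.Rows.Add("Volvo", 2015);
        table.Rows.Add("Saab", 2009);

        // Declaring the loop variable as DataRow avoids the cast, because
        // DataRowCollection only exposes its items as object to "var".
        foreach (DataRow row in table.Rows)
        {
            Console.WriteLine(row[0]);          // first cell only
            Console.WriteLine(RowToText(row));  // whole row
        }
    }
}
```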
You need to get the Worksheet for the sheet you want to read data from. To get range A1 from Cars, for example:
var app = new Application();
Workbooks workbooks = app.Workbooks;
Workbook workbook = workbooks.Open(@"C:\MSFT Site Account Updates_May 2015.xlsx");
Worksheet sheet = workbook.Sheets["Cars"];
Range range = sheet.Range["A1"];
This is a late reply, but I hope it helps someone.
The code below retrieves the data from the first sheet and also collects the headers from its first row.
if (upload != null && upload.ContentLength > 0)
{
// ExcelDataReader works with the binary Excel file, so it needs a FileStream
// to get started. This is how we avoid dependencies on ACE or Interop:
Stream stream = upload.InputStream;
// We return the interface, so that
IExcelDataReader reader = null;
if (upload.FileName.EndsWith(".xls"))
{
reader = ExcelReaderFactory.CreateBinaryReader(stream);
}
else if (upload.FileName.EndsWith(".xlsx"))
{
reader = ExcelReaderFactory.CreateOpenXmlReader(stream);
}
else
{
ModelState.AddModelError("File", "This file format is not supported");
return View();
}
var result = reader.AsDataSet(new ExcelDataSetConfiguration()
{
ConfigureDataTable = (_) => new ExcelDataTableConfiguration()
{
UseHeaderRow = true
}
}).Tables[0];// get the first sheet data with index 0
var tables = result.Rows[0].Table.Columns;//we have to get a list of table headers here "first row" from 1 row
foreach(var rue in tables)// iterate through the header list and add it to variable 'Headers'
{
Headers.Add(rue.ToString());//Headers has been treated as a global variable "private List<string> Headers = new List<string>();"
}
var count = Headers.Count();// test if the headers have been added using count
reader.Close();
return View(result);
}
else
{
ModelState.AddModelError("File", "Please Upload Your file");
}

Using Open XML to read an Excel spreadsheet, how do I determine the sheet that a Table is on?

If I have a loaded SpreadsheetDocument instance:
SpreadsheetDocument spreadsheetDocument
and iterate over the WorksheetParts:
foreach (var wp in spreadsheetDocument.WorkbookPart.WorksheetParts)
for every part that is a "Table" I can get to the table definition with:
wp.TableDefinitionParts
and grab the first entry. At this point I can grab the table name:
var tableName = tableDefinitionPart.Table.Name;
But how do I determine which sheet this table is located in?
Given a WorksheetPart (as assigned to wp in your code), the first entry in its Parts list will be a Packaging.IdPartPair object:
var parts = wp.Parts.ToList();
var idPartPair = parts[0];
If you take a look at the value of
idPartPair.OpenXmlPart.Uri.OriginalString
it will be a string that looks like this:
/xl/tables/table2.xml
The only thing you care about is the number 2 in that string. Believe it or not, that's actually saying that the table is in the third sheet of the workbook (zero-based).
At this point, write your favorite code to extract the 2 out of the above string. My version is this, but I'm sure someone else can make this shorter:
var sheetNo = int.Parse(string.Concat(Path.GetFileNameWithoutExtension(idPartPair.OpenXmlPart.Uri.OriginalString).Skip(5)));
Next, get the list of sheets:
var sheets = spreadsheetDocument.WorkbookPart.Workbook.Sheets.ToList();
Then use sheetNo to index into it:
var sheet = (Sheet)sheets[sheetNo];
Then you can easily get the sheet name:
var sheetName = sheet.Name;
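An alternative that does not rely on part file names is to start from the Sheet elements and follow each one to its WorksheetPart, then list that part's TableDefinitionParts. This is offered as a sketch, since the number in a part name is assigned by whatever wrote the file and may not match the sheet position; TableSheetDemo and MapTablesToSheets are hypothetical names, and the file path is taken from the command line.

```csharp
using System;
using System.Collections.Generic;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public static class TableSheetDemo
{
    // Maps each table name to the name of the sheet whose WorksheetPart owns it.
    public static Dictionary<string, string> MapTablesToSheets(SpreadsheetDocument doc)
    {
        var result = new Dictionary<string, string>();
        WorkbookPart workbookPart = doc.WorkbookPart;
        foreach (Sheet sheet in workbookPart.Workbook.Sheets.Elements<Sheet>())
        {
            // Chart sheets resolve to other part types, so skip anything
            // that is not a WorksheetPart.
            var worksheetPart = workbookPart.GetPartById(sheet.Id.Value) as WorksheetPart;
            if (worksheetPart == null) continue;

            foreach (TableDefinitionPart tablePart in worksheetPart.TableDefinitionParts)
            {
                result[tablePart.Table.Name.Value] = sheet.Name.Value;
            }
        }
        return result;
    }

    public static void Main(string[] args)
    {
        using (SpreadsheetDocument doc = SpreadsheetDocument.Open(args[0], false))
        {
            foreach (var pair in MapTablesToSheets(doc))
            {
                Console.WriteLine("Table {0} is on sheet {1}", pair.Key, pair.Value);
            }
        }
    }
}
```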

How do you preserve OpenXml cell attributes after an edit when Excel saves the document?

I want to export an excel file with data so that my users can:
Download an Excel file (export from my program)
Edit the data in the file
Save it
Upload the Excel file (reimport it into my program)
Basically I will give them the experience of having an offline file that they can edit if they do not have any internet access (as ours is a web application)
When creating Excel files using the OpenXml SDK I use the OpenXmlElement.SetAttribute method to add attributes to the columns, rows and cells of the worksheet. The attributes I add are used so that on reimport I can match the edited data with the location where it should be stored.
The attributes I export are:
Database Id's
Original value (database value at time of export to allow for easy synchronisation on import)
The date the exported data was last modified
The export routine looks like this for a cell:
var cell = new Cell {
CellReference = string.Format("{0}{1}", Column.Reference, Row.Index),
DataType = this.CellDataType
};
foreach (var keyValuePair in this.AttributeDictionary) {
cell.SetAttribute(new OpenXmlAttribute {
LocalName = keyValuePair.Key,
Value = keyValuePair.Value.ToString()
});
}
This export works fine. When examining the exported file in the OpenXml Productivity Tool I can see the attributes are added correctly. When the file is saved after editing in Excel the attributes are not preserved. Is there a way to tell Excel to preserve the attributes or is there another procedure that would be best used here to preserve the data I need for easy re-importation of the data?
Side Question:
What are the attributes for if Excel does not preserve them?
I don't think you can force Excel to round-trip unknown attributes, but you can add extension elements using ExtensionLists and Extensions. Excel will round-trip these elements, and they are designed (as far as I can make out) for storing application-specific data, which is just what you are after.
There doesn't seem to be too much documentation around that I can find but Part 3 of the ECMA-376 spec mentions extensions.
The following code will create a sheet with a value in cell A1 and an ExtensionList with one Extension in it as a child of that cell:
public static void CreateSpreadsheetWorkbook(string filepath)
{
if (File.Exists(filepath))
File.Delete(filepath);
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Create(filepath, SpreadsheetDocumentType.Workbook))
{
// Add a WorkbookPart to the document.
WorkbookPart workbookpart = spreadsheetDocument.AddWorkbookPart();
workbookpart.Workbook = new Workbook();
// Add a WorksheetPart to the WorkbookPart.
WorksheetPart worksheetPart = workbookpart.AddNewPart<WorksheetPart>();
SheetData sheetData = new SheetData();
worksheetPart.Worksheet = new Worksheet(sheetData);
// Add Sheets to the Workbook.
Sheets sheets = spreadsheetDocument.WorkbookPart.Workbook.AppendChild<Sheets>(new Sheets());
// Append a new worksheet and associate it with the workbook.
Sheet sheet = new Sheet()
{
Id = spreadsheetDocument.WorkbookPart.
GetIdOfPart(worksheetPart),
SheetId = 1,
Name = "Sheet1"
};
sheets.Append(sheet);
Row row = new Row()
{
RowIndex = 1U
};
Cell cell = new Cell()
{
CellReference = "A1",
CellValue = new CellValue("A Test"),
DataType = CellValues.String
};
ExtensionList extensions = new ExtensionList();
Extension extension = new Extension()
{
Uri = "Testing1234"
};
extensions.AppendChild(extension);
extension.AddNamespaceDeclaration("ns", "http://tempuri/someUrl");
cell.AppendChild(extensions);
row.Append(cell);
sheetData.Append(row);
workbookpart.Workbook.Save();
// Close the document.
spreadsheetDocument.Close();
}
}
The following will read the value back again, even if the file has been round tripped through Excel.
public static void ReadSheet(string filename, string sheetName)
{
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(filename, false))
{
WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
//get the correct sheet
Sheet sheet = workbookPart.Workbook.Descendants<Sheet>().FirstOrDefault(s => s.Name == sheetName);
if (sheet != null)
{
WorksheetPart worksheetPart = workbookPart.GetPartById(sheet.Id) as WorksheetPart;
foreach (Cell cell in worksheetPart.Worksheet.Descendants<Cell>())
{
ExtensionList extensions = cell.GetFirstChild<ExtensionList>();
if (extensions != null)
{
Extension extension = extensions.GetFirstChild<Extension>();
if (extension != null)
{
Console.WriteLine("Cell {0} has value {1}", cell.CellReference, extension.Uri);
}
}
}
}
}
}
The output from which is
Cell A1 has value Testing1234
As for your side question:
What are the attributes for if Excel does not preserve them?
I'm not too sure - the only time I've used the OpenXmlAttribute class is when I've used a SAX approach to write a document. In that case you need to write the attributes explicitly along with the elements. For example:
List<OpenXmlAttribute> oxa = new List<OpenXmlAttribute>();
//cell reference attribute
oxa.Add(new OpenXmlAttribute("r", "", "A1"));
//cell type attribute
oxa.Add(new OpenXmlAttribute("t", "", "str"));
//write the start element of a cell with the above attributes
oxw.WriteStartElement(new Cell(), oxa);
//write a value to the cell
oxw.WriteElement(new CellValue("Test"));
//write the end element
oxw.WriteEndElement();
My answer here has a full example of using a SAX approach.
