I have a spreadsheet document that has 182 columns in it. I need to place the spreadsheet data into a data table, tab by tab, but i need to find out as I'm adding data from each tab, what is the tab name, and add the tab name to a column in the data table.
This is how I set up the data table.
I then loop in the workbook and drill down to the sheetData object and walk through each row and column, getting cell data.
DataTable dt = new DataTable();
for (int i = 0; i <= col.GetUpperBound(0); i++)
{
try
{
dt.Columns.Add(new DataColumn(col[i].ToString(), typeof(string)));
}
catch (Exception e)
{
MessageBox.Show("Uploader Error" + e.ToString());
return null;
}
}
dt.Columns.Add(new DataColumn("SheetName", typeof(string)));
However at the end of the string array that I use for the Data Table, I need to add the tab name. How can I find out the tab name as I'm looping in the sheet in Open XML?
Here is my code so far:
using (SpreadsheetDocument spreadSheetDocument =
SpreadsheetDocument.Open(Destination, false))
{
WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
Workbook workbook = spreadSheetDocument.WorkbookPart.Workbook;
Sheets sheets =
spreadSheetDocument
.WorkbookPart
.Workbook
.GetFirstChild<DocumentFormat.OpenXml.Spreadsheet.Sheets>();
OpenXmlElementList list = sheets.ChildElements;
foreach (WorksheetPart worksheetpart in workbook.WorkbookPart.WorksheetParts)
{
Worksheet worksheet = worksheetpart.Worksheet;
foreach (SheetData sheetData in worksheet.Elements<SheetData>())
{
foreach (Row row in sheetData.Elements())
{
string[] thisarr = new string[183];
int index = 0;
foreach (Cell cell in row.Elements())
{
thisarr[(index)] = GetCellValue(spreadSheetDocument, cell);
index++;
}
thisarr[182] = ""; //need to add tabname here
if (thisarr[0].ToString() != "")
{
dt.Rows.Add(thisarr);
}
}
}
}
}
return dt;
Just a note: I did previously get the tab names from the InnerXML property of "list" in
OpenXmlElementList list = sheets.ChildElements;
however I noticed as I'm looping in the spreadsheet it does not get the tab names in the right order.
Here is a handy helper method to get the Sheet corresponding to a WorksheetPart:
public static Sheet GetSheetFromWorkSheet
(WorkbookPart workbookPart, WorksheetPart worksheetPart)
{
string relationshipId = workbookPart.GetIdOfPart(worksheetPart);
IEnumerable<Sheet> sheets = workbookPart.Workbook.Sheets.Elements<Sheet>();
return sheets.FirstOrDefault(s => s.Id.HasValue && s.Id.Value == relationshipId);
}
Then you can get the name from the sheets Name-property:
Sheet sheet = GetSheetFromWorkSheet(myWorkbookPart, myWorksheetPart);
string sheetName = sheet.Name;
...this will be the "tab name" OP referred to.
For the record the opposite method would look like:
public static Worksheet GetWorkSheetFromSheet(WorkbookPart workbookPart, Sheet sheet)
{
var worksheetPart = (WorksheetPart)workbookPart.GetPartById(sheet.Id);
return worksheetPart.Worksheet;
}
...and with that we can also add the following method:
public static IEnumerable<KeyValuePair<string, Worksheet>> GetNamedWorksheets
(WorkbookPart workbookPart)
{
return workbookPart.Workbook.Sheets.Elements<Sheet>()
.Select(sheet => new KeyValuePair<string, Worksheet>
(sheet.Name, GetWorkSheetFromSheet(workbookPart, sheet)));
}
Now you can easily enumerate through all Worksheets including their name.
Throw it all into a dictionary for name-based lookup if you prefer that:
IDictionary<string, WorkSheet> wsDict = GetNamedWorksheets(myWorkbookPart)
.ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
...or if you just want one specific sheet by name:
public static Sheet GetSheetFromName(WorkbookPart workbookPart, string sheetName)
{
return workbookPart.Workbook.Sheets.Elements<Sheet>()
.FirstOrDefault(s => s.Name.HasValue && s.Name.Value == sheetName);
}
(Then call GetWorkSheetFromSheet to get the corresponding Worksheet.)
The sheet names are stored in the WorkbookPart in a Sheets element which has children of element Sheet which corresponds to each worksheet in the Excel file. All you have to do is grab the correct index out of that Sheets element and that will be the Sheet you are on in your loop. I added a snippet of code below to do what you want.
int sheetIndex = 0;
foreach (WorksheetPart worksheetpart in workbook.WorkbookPart.WorksheetParts)
{
Worksheet worksheet = worksheetpart.Worksheet;
// Grab the sheet name each time through your loop
string sheetName = workbookPart.Workbook.Descendants<Sheet>().ElementAt(sheetIndex).Name;
foreach (SheetData sheetData in worksheet.Elements<SheetData>())
{
...
}
sheetIndex++;
}
Using spreadsheetDocument As SpreadsheetDocument = spreadsheetDocument.Open("D:\Libro1.xlsx", True)
Dim workbookPart As WorkbookPart = spreadsheetDocument.WorkbookPart
workbookPart.Workbook.Descendants(Of Sheet)()
Dim worksheetPart As WorksheetPart = workbookPart.WorksheetParts.Last
Dim text As String
For Each Sheet As Sheet In spreadsheetDocument.WorkbookPart.Workbook.Sheets
Dim sName As String = Sheet.Name
Dim sID As String = Sheet.Id
Dim part As WorksheetPart = workbookPart.GetPartById(sID)
Dim actualSheet As Worksheet = part.Worksheet
Dim sheetData As SheetData = part.Worksheet.Elements(Of SheetData)().First
For Each r As Row In sheetData.Elements(Of Row)()
For Each c As Cell In r.Elements(Of Cell)()
text = c.CellValue.Text
Console.Write(text & " ")
Next
Next
Next
End Using
Console.Read()
worksheet.GetAttribute("name","").Value
Related
There is code which works with Excel like this:
using SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open("filePath", true);
WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart!;
SheetData sheetData = GetSheetData(workbookPart, outFile);
Worksheet worksheet = sheetData.Parent as Worksheet;
List<SharedStringItem> sharedStringTableElements = workbookPart.SharedStringTablePart!.SharedStringTable.Elements<SharedStringItem>().ToList();
List<Row> sheetRows = sheetData.Elements<Row>().ToList();
...
But it breaks when I try to open my generated file. SharedStringTablePart is null.
This is how I create Excel files:
using SpreadsheetDocument spreadSheet = CreateSpreadsheet(filePath);
WorkbookPart workbookPart = spreadSheet.AddWorkbookPart();
workbookPart.Workbook = new Workbook();
WorksheetPart worksheetPart = workbookPart.AddNewPart<WorksheetPart>();
SheetData sheetData = new();
worksheetPart.Worksheet = new Worksheet(sheetData);
Sheets sheets = workbookPart.Workbook.AppendChild(new Sheets());
string relationshipId = workbookPart.GetIdOfPart(worksheetPart);
Sheet sheet = new() {
Id = relationshipId,
SheetId = FirstSheetId,
Name = "SheetName",
};
sheets.Append(sheet);
//fill sheetData here
workbookPart.Workbook.Save();
spreadSheet.Close();
How do I change this code so it doesn't break when accessing a SharedStringTablePart? How can I fill it in correctly?
Thanks in advance.
You create a new SpreadsheetDocument and add the WorkbookPart. Right after being created, that new WorkbookPart does not have any associated parts. Thus, you'll have to create all required parts, including the SharedStringTablePart, like so:
// Let's assume you created the WorkbookPart as follows.
WorkbookPart workbookPart = spreadSheet.AddWorkbookPart();
// Create and initialize the associated SharedStringTablePart as well.
var sharedStringTablePart = workbookPart.AddNewPart<SharedStringTablePart>();
sharedStringTablePart.SharedStringTable = new SharedStringTable();
Once added as shown above, workbookPart.SharedStringTablePart will no longer be null.
I came back to write how I did it.
First of all, we need to create SharedStringTablePart as Thomas said:
private static SharedStringTablePart GetSharedStringTablePart(SpreadsheetDocument spreadSheet)
{
if (spreadSheet.WorkbookPart!.GetPartsOfType<SharedStringTablePart>().Any())
{
return spreadSheet.WorkbookPart.GetPartsOfType<SharedStringTablePart>().First();
}
else
{
return spreadSheet.WorkbookPart.AddNewPart<SharedStringTablePart>();
}
}
Then I created a helper method to insert the data into this table:
private static string InsertSharedStringItem(string text, SharedStringTablePart shareStringPart)
{
shareStringPart.SharedStringTable ??= new SharedStringTable();
int i = 0;
foreach (SharedStringItem item in shareStringPart.SharedStringTable.Elements<SharedStringItem>())
{
if (item.InnerText == text)
{
return i.ToString();
}
i++;
}
shareStringPart.SharedStringTable.AppendChild(new SharedStringItem(new Text(text)));
shareStringPart.SharedStringTable.Save();
return i.ToString();
}
So next time when I inserting cell value I do it like this:
string index = InsertSharedStringItem("some text", sharedTablePart);
Cell cell = new() {
DataType = CellValues.SharedString,
CellValue = new CellValue(index),
...
};
row.AppendChild(cell);
I'm fairly new to creating Excel spreadsheets in C# and I'm looking for advice.
I've spent 2 or 3 days now looking through documentation and blogs etc but I cannot seem to find an answer to a common task.
I need to insert a text value into a specific worksheet. I cant easily post my code at the moment but it appears to be an issue with every example I've seen of the definition of a sheet, it always gets the first child.
I need to iterate through all sheets and dependant on the sheet name then go and insert a value.
I.e. If the sheet name = "testA" then write TestA, if the sheet name = "testB" then write "TestB".
Currently I can insert a value for the workbook but it inserts the same value for every sheet.
Sheets sheets = workbookPart.Workbook.GetFirstChild<Sheets>();
[Microsoft documentation] (https://learn.microsoft.com/en-us/office/open-xml/how-to-insert-text-into-a-cell-in-a-spreadsheet)
Please note I'm not just giving up on this I've just reached a bit of a wall with it and need some pointers.
Many Thanks, J
Namespaces I'm using:
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;
private void btnExport_Click(object sender, EventArgs e)
{
//Using Microsofts Interop class to create Excel files gives mixed resuts and requires excel to be installed.
// This uses OpenXML library to achieve this and doesnt require excel to be installed: https://medium.com/swlh/openxml-sdk-brief-guide-c-7099e2391059
string filepath = "Test.xlsx";
// Create a spreadsheet document by supplying the filepath.
// By default, AutoSave = true, Editable = true, and Type = xlsx.
SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.
Create(filepath, SpreadsheetDocumentType.Workbook);
// Add a WorkbookPart to the document.
WorkbookPart workbookpart = spreadsheetDocument.AddWorkbookPart();
workbookpart.Workbook = new Workbook();
// Add a WorksheetPart to the WorkbookPart.
WorksheetPart worksheetPart = workbookpart.AddNewPart<WorksheetPart>();
worksheetPart.Worksheet = new Worksheet(new SheetData());
// Add Sheets to the Workbook.
Sheets sheets = spreadsheetDocument.WorkbookPart.Workbook.
AppendChild<Sheets>(new Sheets());
// Append a new worksheet and associate it with the workbook.
Sheet sheetOne = new Sheet()
{
Id = spreadsheetDocument.WorkbookPart.
GetIdOfPart(worksheetPart),
SheetId = 1,
Name = "Sheet1"
};
Sheet sheetTwo= new Sheet()
{
Id = spreadsheetDocument.WorkbookPart.
GetIdOfPart(worksheetPart),
SheetId = 2,
Name = "Sheet2"
};
Sheet sheetParameters = new Sheet()
{
Id = spreadsheetDocument.WorkbookPart.
GetIdOfPart(worksheetPart),
SheetId = 3,
Name = "Parameters"
};
Sheet sheetFour= new Sheet()
{
Id = spreadsheetDocument.WorkbookPart.
GetIdOfPart(worksheetPart),
SheetId = 4,
Name = "Sheet4"
};
Sheet sheetFive= new Sheet()
{
Id = spreadsheetDocument.WorkbookPart.
GetIdOfPart(worksheetPart),
SheetId = 5,
Name = "Sheet5"
};
sheets.Append(sheetOne);
sheets.Append(sheetTwo);
sheets.Append(sheetParameters);
sheets.Append(sheetFour);
sheets.Append(sheetFive);
workbookpart.Workbook.Save();
// Close the document.
spreadsheetDocument.Close();
//InsertTextExistingExcel(filepath, "test1", "A", 1, "Parameters");
InsertInSheet(filepath, "test1", "A", 1, "Parameters");
//Any of these lists empty?
}
public static void InsertInSheet(string filePath, string value, string columnName, uint rowIndex, string sheetName)
{
using (SpreadsheetDocument spreadSheet = SpreadsheetDocument.Open(filePath, true))
{
getSheetDetails(spreadSheet);
}
}
//work here
public static void getSheetDetails(SpreadsheetDocument doc)
{
WorksheetPart worksheetPart;
//iterate through sheets
foreach (Sheet sheetDetail in doc.WorkbookPart.Workbook.Sheets)
{
if (sheetDetail.Name == "Parameters")
{
//each sheet has a worksheetPart
worksheetPart = (WorksheetPart)doc.WorkbookPart.GetPartById(sheetDetail.Id);
Cell cell = InsertCellInWorksheet("A", 1, worksheetPart);
cell.CellValue = new CellValue("test");
cell.DataType = new EnumValue<CellValues>(CellValues.String);
worksheetPart.Worksheet.Save();
}
}
}
private static Cell InsertCellInWorksheet(string columnName, uint rowIndex, WorksheetPart worksheetPart)
{
//fix earlier worksheetPart only has a count of 1
Worksheet worksheet = worksheetPart.Worksheet;
//fix this to point to sheet
//SheetData sheetData = worksheet.Descendants<SheetData>();
SheetData sheetData = worksheet.GetFirstChild<SheetData>();
MessageBox.Show(sheetData.InnerXml.ToString());
string cellReference = columnName + rowIndex;
Row row;
if (sheetData.Elements<Row>().Where(r => r.RowIndex == rowIndex).Count() != 0)
{
row = sheetData.Elements<Row>().Where(r => r.RowIndex == rowIndex).First();
}
else
{
row = new Row() { RowIndex = rowIndex };
sheetData.Append(row);
}
Cell refCell = row.Descendants<Cell>().LastOrDefault();
Cell newCell = new Cell() { CellReference = cellReference };
row.InsertAfter(newCell, refCell);
worksheet.Save();
return newCell;
}
I am new at OpenXML c# and I want to read rows from excel file. But I need to read excel sheet by name. this is my sample code that reads first sheet:
using (var spreadSheet = SpreadsheetDocument.Open(path, true))
{
WorkbookPart workbookPart = spreadSheet.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
foreach (Row r in sheetData.Elements<Row>())
{
foreach (Cell c in r.Elements<Cell>())
{
if (c.DataType != null && c.DataType == CellValues.SharedString)
{
// reading cells
}
}
}
But how can I find by sheet name and read cells.
I've done it like in the code snippet below. It's basically Workbook->Spreadsheet->Sheet then getting the Name attribute of the sheet.
The basic underling xml looks like this:
<x:workbook>
<x:sheets>
<x:sheet name="Sheet1" sheetId="1" r:id="rId1" />
<x:sheet name="TEST sheet Name" sheetId="2" r:id="rId2" />
</x:sheets>
</x:workbook>
The id value is what the Open XML package uses internally to identify each sheet and link it with the other XML parts. That's why the line of code that follows identifying the name uses GetPartById to pick up the WorksheetPart.
using (SpreadsheetDocument doc = SpreadsheetDocument.Open(path, false))
{
WorkbookPart bkPart = doc.WorkbookPart;
DocumentFormat.OpenXml.Spreadsheet.Workbook workbook = bkPart.Workbook;
DocumentFormat.OpenXml.Spreadsheet.Sheet s = workbook.Descendants<DocumentFormat.OpenXml.Spreadsheet.Sheet>().Where(sht => sht.Name == "Sheet1").FirstOrDefault();
WorksheetPart wsPart = (WorksheetPart)bkPart.GetPartById(s.Id);
DocumentFormat.OpenXml.Spreadsheet.SheetData sheetdata = wsPart.Worksheet.Elements<DocumentFormat.OpenXml.Spreadsheet.SheetData>().FirstOrDefault();
foreach (DocumentFormat.OpenXml.Spreadsheet.Row r in sheetdata.Elements<DocumentFormat.OpenXml.Spreadsheet.Row>())
{
DocumentFormat.OpenXml.Spreadsheet.Cell c = r.Elements<DocumentFormat.OpenXml.Spreadsheet.Cell>().First();
txt += c.CellValue.Text + Environment.NewLine;
}
this.txtMessages.Text += txt;
}
I switched from Interop library to OpenXML, because I need to read large Excel files. Before that I could use:
worksheet.UsedRange.Rows.Count
to get the number of rows with data on the worksheet. I used this information to make a progressbar. In OpenXML I do not know how to get the same information about the worksheet. What I have now is this code:
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(path, false))
{
WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
int row_count = 0, col_count;
// here I would like to get the info about the number of rows
foreach (Row r in sheetData.Elements<Row>())
{
col_count = 0;
if (row_count > 10)
{
foreach (Cell c in r.Elements<Cell>())
{
// do some stuff
// update progressbar
}
}
row_count++;
}
}
It's not that hard (When you use LINQ),
using (SpreadsheetDocument myDoc = SpreadsheetDocument.Open("PATH", true))
{
//Get workbookpart
WorkbookPart workbookPart = myDoc.WorkbookPart;
//then access to the worksheet part
IEnumerable<WorksheetPart> worksheetPart = workbookPart.WorksheetParts;
foreach (WorksheetPart WSP in worksheetPart)
{
//find sheet data
IEnumerable<SheetData> sheetData = WSP.Worksheet.Elements<SheetData>();
// Iterate through every sheet inside Excel sheet
foreach (SheetData SD in sheetData)
{
IEnumerable<Row> row = SD.Elements<Row>(); // Get the row IEnumerator
Console.WriteLine(row.Count()); // Will give you the count of rows
}
}
}
Edited with Linq now it's straight forward.
I have been struggling to find a solution on how to read a large xlsx file with OpenXml. I have tried the microsoft samples without luck. I simply need to read an excel file into a DataTable in c#. I am not concerned with value types in the datatable, everything can be stored as a string values.
The samples I have found so far don't retain the structure of the spreadsheet and only return the values of the cells.
Any ideas?
The open xml SDK can be a little hard to understand. However, I have found it useful to use http://simpleooxml.codeplex.com/ this code plex project. It adds a thin layer over the sdk to more easily parse through excel files and work with styles.
Then you can use something like the following with their worksheet reader to recurse through and grab the values you want
System.IO.MemoryStream ms = Utility.StreamToMemory(xslxTemplate);
using (SpreadsheetDocument document = SpreadsheetDocument.Open(ms, true))
{
IEnumerable<Sheet> sheets = document.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
if (sheets.Count() == 0)
{
// The specified worksheet does not exist.
return null;
}
string relationshipId = sheets.First().Id.Value;
WorksheetPart worksheetPart = (WorksheetPart)document.WorkbookPart.GetPartById(relationshipId);
string myval =WorksheetReader.GetCell("A", 0, worksheetPart).CellValue.InnerText;
// Put in a loop to go through contents of document
}
You can get DataTable this way:
using (SpreadsheetDocument spreadsheet = SpreadsheetDocument.Open(fileName, false))
{
DataTable data = ToDataTable(spreadsheet, "Employees");
}
This method will read Excel sheet data as DataTable
public DataTable ToDataTable(SpreadsheetDocument spreadsheet, string worksheetName)
{
var workbookPart = spreadsheet.WorkbookPart;
var sheet = workbookPart
.Workbook
.Descendants<Sheet>()
.FirstOrDefault(s => s.Name == worksheetName);
var worksheetPart = sheet == null
? null
: workbookPart.GetPartById(sheet.Id) as WorksheetPart;
var dataTable = new DataTable();
if (worksheetPart != null)
{
var sheetData = worksheetPart.Worksheet.GetFirstChild<SheetData>();
foreach (Row row in sheetData.Descendants<Row>())
{
var values = row
.Descendants<Cell>()
.Select(cell =>
{
var value = cell.CellValue.InnerXml;
if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
{
value = workbookPart
.SharedStringTablePart
.SharedStringTable
.ChildElements[int.Parse(value)]
.InnerText;
}
return (object)value;
})
.ToArray();
dataTable.Rows.Add(values);
}
}
return dataTable;
}