Read excel by sheet name with OpenXML - c#

I am new to OpenXML in C# and I want to read rows from an Excel file, but I need to select the sheet by name. This is my sample code, which reads the first sheet:
using (var spreadSheet = SpreadsheetDocument.Open(path, true))
{
    WorkbookPart workbookPart = spreadSheet.WorkbookPart;
    WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
    SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
    foreach (Row r in sheetData.Elements<Row>())
    {
        foreach (Cell c in r.Elements<Cell>())
        {
            if (c.DataType != null && c.DataType == CellValues.SharedString)
            {
                // reading cells
            }
        }
    }
}
But how can I find a sheet by name and read its cells?

I've done it as in the code snippet below. It's basically Workbook -> Sheets -> Sheet, then getting the Name attribute of the sheet.
The basic underlying XML looks like this:
<x:workbook>
  <x:sheets>
    <x:sheet name="Sheet1" sheetId="1" r:id="rId1" />
    <x:sheet name="TEST sheet Name" sheetId="2" r:id="rId2" />
  </x:sheets>
</x:workbook>
The id value is what the Open XML package uses internally to identify each sheet and link it with the other XML parts. That's why the line of code that follows identifying the name uses GetPartById to pick up the WorksheetPart.
using (SpreadsheetDocument doc = SpreadsheetDocument.Open(path, false))
{
    WorkbookPart bkPart = doc.WorkbookPart;
    DocumentFormat.OpenXml.Spreadsheet.Workbook workbook = bkPart.Workbook;
    DocumentFormat.OpenXml.Spreadsheet.Sheet s = workbook.Descendants<DocumentFormat.OpenXml.Spreadsheet.Sheet>().Where(sht => sht.Name == "Sheet1").FirstOrDefault();
    if (s == null)
    {
        // FirstOrDefault returns null when no sheet has that name
        throw new ArgumentException("Sheet not found: Sheet1");
    }
    WorksheetPart wsPart = (WorksheetPart)bkPart.GetPartById(s.Id);
    DocumentFormat.OpenXml.Spreadsheet.SheetData sheetdata = wsPart.Worksheet.Elements<DocumentFormat.OpenXml.Spreadsheet.SheetData>().FirstOrDefault();
    string txt = string.Empty; // declared here so the snippet stands alone
    foreach (DocumentFormat.OpenXml.Spreadsheet.Row r in sheetdata.Elements<DocumentFormat.OpenXml.Spreadsheet.Row>())
    {
        // Take the first cell of each row; CellValue is null for cells without a value
        DocumentFormat.OpenXml.Spreadsheet.Cell c = r.Elements<DocumentFormat.OpenXml.Spreadsheet.Cell>().FirstOrDefault();
        if (c != null && c.CellValue != null)
        {
            txt += c.CellValue.Text + Environment.NewLine;
        }
    }
    this.txtMessages.Text += txt;
}

Related

Could not set innerText for twoCellAnchor using openxml excel C#

My requirement is to change the text of a textbox in Excel using OpenXML in C#. I can find the textbox using the code below:
WorkbookPart workbookPart = document.WorkbookPart;
Sheets sheets = workbookPart.Workbook.GetFirstChild<Sheets>();
//To add the month in the first KPMG sheet
string sheetName = "Test";
Sheet sheet1 = document.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>().Where(s => s.Name == sheetName).FirstOrDefault();
string relationshipId = sheet1.Id.Value;
WorksheetPart worksheetPart = (WorksheetPart)document.WorkbookPart.GetPartById(relationshipId);
var ocaElems = worksheetPart.DrawingsPart.WorksheetDrawing.Elements<TwoCellAnchor>();
foreach (TwoCellAnchor twoCellAnchor in ocaElems)
{
    if (twoCellAnchor.InnerText.Contains("google"))
    {
        //twoCellAnchor.InnerText = "Ted Report - August";
        //while setting innerText its showing error as set is not accessible
    }
}

DocumentFormat.OpenXml Excel File Reading Issue

I have used the DocumentFormat.OpenXml DLL in one of my projects for reading and writing Excel files.
Suppose that some column, say Column1, has the cell values "TRUE" and "FALSE". When I read this Excel file using the following code:
private SharedStringTable sharedStringTable;
using (SpreadsheetDocument doc = SpreadsheetDocument.Open(fileName, false))
{
    WorkbookPart workbookPart = doc.WorkbookPart;
    WorksheetPart worksheetPart = workbookPart.WorksheetParts.FirstOrDefault();
    SharedStringTablePart sharedStringTablePart = workbookPart.SharedStringTablePart;
    if (sharedStringTablePart != null)
    {
        sharedStringTable = sharedStringTablePart.SharedStringTable;
    }
    var sheets = workbookPart.Workbook.Sheets;
    foreach (Sheet sheet in sheets)
    {
        Worksheet requiredItem = (doc.WorkbookPart.GetPartById(sheet.Id.Value) as WorksheetPart).Worksheet;
        var sheetData = requiredItem.Elements<SheetData>().First();
        foreach (var rowItem in sheetData.Elements<Row>())
        {
            foreach (var item in rowItem.Elements<Cell>())
            {
                string requiredText = string.Empty;
                if (item.CellValue != null)
                {
                    requiredText = item.CellValue.InnerText;
                }
            }
        }
    }
}
the cell values "TRUE" and "FALSE" come back as 1 and 0 respectively.
Can anyone suggest a way to get the values "TRUE" and "FALSE" instead of 1 and 0?
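One possible approach, offered as a sketch rather than a definitive answer: SpreadsheetML stores a boolean cell with the type attribute t="b" and a value of 1 or 0, so the text Excel displays has to be rebuilt from the cell's data type. In the SDK that type corresponds to CellValues.Boolean on Cell.DataType. The helper below is hypothetical (the name CellTextHelper.DisplayText is not part of the SDK) and works on the raw attribute and value strings so that it stands on its own.

```csharp
using System;

public static class CellTextHelper
{
    // dataType is the cell's "t" attribute ("b" for boolean, "s" for shared
    // string, null when the attribute is absent); rawValue is the text of
    // the cell's <v> element, or null when the cell has no value.
    public static string DisplayText(string dataType, string rawValue)
    {
        if (dataType == "b")
        {
            // Boolean cells are stored as 1 (true) or 0 (false)
            return rawValue == "1" ? "TRUE" : "FALSE";
        }
        // Every other type is passed through unchanged in this sketch
        return rawValue ?? string.Empty;
    }
}
```

Inside the question's inner loop, the same check could be written against the SDK types as item.DataType != null && item.DataType.Value == CellValues.Boolean, mapping "1" to "TRUE" and anything else to "FALSE".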

How to count rows per worksheet in OpenXML

I switched from Interop library to OpenXML, because I need to read large Excel files. Before that I could use:
worksheet.UsedRange.Rows.Count
to get the number of rows with data on the worksheet. I used this information to make a progressbar. In OpenXML I do not know how to get the same information about the worksheet. What I have now is this code:
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(path, false))
{
    WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
    WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
    SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
    int row_count = 0, col_count;
    // here I would like to get the info about the number of rows
    foreach (Row r in sheetData.Elements<Row>())
    {
        col_count = 0;
        if (row_count > 10)
        {
            foreach (Cell c in r.Elements<Cell>())
            {
                // do some stuff
                // update progressbar
            }
        }
        row_count++;
    }
}
It's not that hard when you use LINQ:
using (SpreadsheetDocument myDoc = SpreadsheetDocument.Open("PATH", false)) // read-only is enough for counting
{
    // Get the workbook part
    WorkbookPart workbookPart = myDoc.WorkbookPart;
    // Then access the worksheet parts
    IEnumerable<WorksheetPart> worksheetPart = workbookPart.WorksheetParts;
    foreach (WorksheetPart WSP in worksheetPart)
    {
        // Find the sheet data
        IEnumerable<SheetData> sheetData = WSP.Worksheet.Elements<SheetData>();
        // Iterate through every SheetData element of the worksheet
        foreach (SheetData SD in sheetData)
        {
            IEnumerable<Row> row = SD.Elements<Row>(); // Get the rows
            Console.WriteLine(row.Count()); // Will give you the count of rows
        }
    }
}
Edited: with LINQ it's now straightforward.
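A possible shortcut for the progress bar, sketched under assumptions: a worksheet may contain an optional <dimension ref="A1:C10"/> element, which the SDK models as SheetDimension with the range in its Reference property. When the program that wrote the file included it, the row span can be read without enumerating the rows. The helper name DimensionHelper.RowCount below is hypothetical and only parses the reference string. Because the element is optional and not guaranteed to be accurate, counting Row elements as shown above remains the fallback.

```csharp
using System;
using System.Globalization;

public static class DimensionHelper
{
    // Returns the number of rows spanned by a dimension reference such as
    // "A1:C10" or "B2:D11". A single-cell reference such as "A1" spans one row.
    public static int RowCount(string dimensionReference)
    {
        if (string.IsNullOrWhiteSpace(dimensionReference))
        {
            throw new ArgumentException("Dimension reference is empty.", nameof(dimensionReference));
        }
        string[] corners = dimensionReference.Split(':');
        int firstRow = RowNumber(corners[0]);
        int lastRow = RowNumber(corners[corners.Length - 1]);
        return lastRow - firstRow + 1;
    }

    // Extracts the numeric row part of a cell reference, e.g. 10 from "C10".
    private static int RowNumber(string cellReference)
    {
        int i = 0;
        while (i < cellReference.Length && !char.IsDigit(cellReference[i]))
        {
            i++;
        }
        return int.Parse(cellReference.Substring(i), CultureInfo.InvariantCulture);
    }
}
```

With a worksheet in hand, the input would come from something like worksheet.SheetDimension?.Reference?.Value, with a null result meaning the element is missing and the rows have to be counted instead.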

OpenXML, SAX, and Simply Reading an Xlsx file

I have been struggling to find a solution for reading a large xlsx file with OpenXml. I have tried the Microsoft samples without luck. I simply need to read an Excel file into a DataTable in C#. I am not concerned with value types in the DataTable; everything can be stored as string values.
The samples I have found so far don't retain the structure of the spreadsheet and only return the values of the cells.
Any ideas?
The Open XML SDK can be a little hard to understand. However, I have found the CodePlex project at http://simpleooxml.codeplex.com/ useful. It adds a thin layer over the SDK that makes it easier to parse Excel files and work with styles.
Then you can use something like the following with its worksheet reader to loop through the sheet and grab the values you want:
System.IO.MemoryStream ms = Utility.StreamToMemory(xslxTemplate);
using (SpreadsheetDocument document = SpreadsheetDocument.Open(ms, true))
{
    IEnumerable<Sheet> sheets = document.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
    if (sheets.Count() == 0)
    {
        // The specified worksheet does not exist.
        return null;
    }
    string relationshipId = sheets.First().Id.Value;
    WorksheetPart worksheetPart = (WorksheetPart)document.WorkbookPart.GetPartById(relationshipId);
    string myval = WorksheetReader.GetCell("A", 0, worksheetPart).CellValue.InnerText;
    // Put in a loop to go through contents of document
}
You can get DataTable this way:
using (SpreadsheetDocument spreadsheet = SpreadsheetDocument.Open(fileName, false))
{
    DataTable data = ToDataTable(spreadsheet, "Employees");
}
This method will read Excel sheet data as DataTable
public DataTable ToDataTable(SpreadsheetDocument spreadsheet, string worksheetName)
{
    var workbookPart = spreadsheet.WorkbookPart;
    var sheet = workbookPart
        .Workbook
        .Descendants<Sheet>()
        .FirstOrDefault(s => s.Name == worksheetName);
    var worksheetPart = sheet == null
        ? null
        : workbookPart.GetPartById(sheet.Id) as WorksheetPart;
    var dataTable = new DataTable();
    if (worksheetPart != null)
    {
        var sheetData = worksheetPart.Worksheet.GetFirstChild<SheetData>();
        foreach (Row row in sheetData.Descendants<Row>())
        {
            var values = row
                .Descendants<Cell>()
                .Select(cell =>
                {
                    // CellValue is null for cells that carry no value
                    var value = cell.CellValue?.InnerText ?? string.Empty;
                    if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
                    {
                        // Shared strings store an index into the shared string table
                        value = workbookPart
                            .SharedStringTablePart
                            .SharedStringTable
                            .ChildElements[int.Parse(value)]
                            .InnerText;
                    }
                    return (object)value;
                })
                .ToArray();
            // Rows.Add throws if the array is longer than the column count,
            // so add string columns on demand
            while (dataTable.Columns.Count < values.Length)
            {
                dataTable.Columns.Add("Column" + (dataTable.Columns.Count + 1), typeof(string));
            }
            dataTable.Rows.Add(values);
        }
    }
    return dataTable;
}
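One reason samples lose the structure of the spreadsheet is that empty cells are omitted from a row's XML, so the n-th Cell element of a row is not necessarily the n-th column. Each cell carries its own address in its CellReference property (for example "C5"), and the column letters can be turned into a position. The sketch below shows only that conversion; the name CellReferenceHelper.ColumnIndex is hypothetical and not part of the SDK.

```csharp
using System;

public static class CellReferenceHelper
{
    // Converts the column letters of a cell reference such as "C5" or "AA10"
    // into a zero-based column index (A -> 0, B -> 1, ..., Z -> 25, AA -> 26).
    public static int ColumnIndex(string cellReference)
    {
        if (string.IsNullOrEmpty(cellReference))
        {
            throw new ArgumentException("Cell reference is empty.", nameof(cellReference));
        }

        int index = 0;
        int letters = 0;
        foreach (char ch in cellReference)
        {
            char upper = char.ToUpperInvariant(ch);
            if (upper < 'A' || upper > 'Z')
            {
                break; // the row number starts here
            }
            // Column letters form a base-26 number with digits A=1 .. Z=26
            index = index * 26 + (upper - 'A' + 1);
            letters++;
        }

        if (letters == 0)
        {
            throw new ArgumentException("Cell reference has no column letters.", nameof(cellReference));
        }
        return index - 1;
    }
}
```

A row could then be filled by writing each value into an array slot chosen by ColumnIndex(cell.CellReference.Value) instead of by the cell's position in the enumeration, leaving the slots of missing cells empty.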

How to retrieve Tab names from excel sheet using OpenXML

I have a spreadsheet document that has 182 columns in it. I need to place the spreadsheet data into a data table, tab by tab, but as I'm adding data from each tab I need to find out the tab name and add it to a column in the data table.
This is how I set up the data table.
I then loop in the workbook and drill down to the sheetData object and walk through each row and column, getting cell data.
DataTable dt = new DataTable();
for (int i = 0; i <= col.GetUpperBound(0); i++)
{
    try
    {
        dt.Columns.Add(new DataColumn(col[i].ToString(), typeof(string)));
    }
    catch (Exception e)
    {
        MessageBox.Show("Uploader Error" + e.ToString());
        return null;
    }
}
dt.Columns.Add(new DataColumn("SheetName", typeof(string)));
However at the end of the string array that I use for the Data Table, I need to add the tab name. How can I find out the tab name as I'm looping in the sheet in Open XML?
Here is my code so far:
using (SpreadsheetDocument spreadSheetDocument =
    SpreadsheetDocument.Open(Destination, false))
{
    WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
    Workbook workbook = spreadSheetDocument.WorkbookPart.Workbook;
    Sheets sheets =
        spreadSheetDocument
            .WorkbookPart
            .Workbook
            .GetFirstChild<DocumentFormat.OpenXml.Spreadsheet.Sheets>();
    OpenXmlElementList list = sheets.ChildElements;
    foreach (WorksheetPart worksheetpart in workbook.WorkbookPart.WorksheetParts)
    {
        Worksheet worksheet = worksheetpart.Worksheet;
        foreach (SheetData sheetData in worksheet.Elements<SheetData>())
        {
            foreach (Row row in sheetData.Elements())
            {
                string[] thisarr = new string[183];
                int index = 0;
                foreach (Cell cell in row.Elements())
                {
                    thisarr[(index)] = GetCellValue(spreadSheetDocument, cell);
                    index++;
                }
                thisarr[182] = ""; //need to add tabname here
                if (thisarr[0].ToString() != "")
                {
                    dt.Rows.Add(thisarr);
                }
            }
        }
    }
}
return dt;
Just a note: I did previously get the tab names from the InnerXML property of "list" in
OpenXmlElementList list = sheets.ChildElements;
however I noticed as I'm looping in the spreadsheet it does not get the tab names in the right order.
Here is a handy helper method to get the Sheet corresponding to a WorksheetPart:
public static Sheet GetSheetFromWorkSheet
    (WorkbookPart workbookPart, WorksheetPart worksheetPart)
{
    string relationshipId = workbookPart.GetIdOfPart(worksheetPart);
    IEnumerable<Sheet> sheets = workbookPart.Workbook.Sheets.Elements<Sheet>();
    return sheets.FirstOrDefault(s => s.Id.HasValue && s.Id.Value == relationshipId);
}
Then you can get the name from the sheet's Name property:
Sheet sheet = GetSheetFromWorkSheet(myWorkbookPart, myWorksheetPart);
string sheetName = sheet.Name;
...this will be the "tab name" the OP referred to.
For the record, the opposite method would look like this:
public static Worksheet GetWorkSheetFromSheet(WorkbookPart workbookPart, Sheet sheet)
{
    var worksheetPart = (WorksheetPart)workbookPart.GetPartById(sheet.Id);
    return worksheetPart.Worksheet;
}
...and with that we can also add the following method:
public static IEnumerable<KeyValuePair<string, Worksheet>> GetNamedWorksheets
    (WorkbookPart workbookPart)
{
    return workbookPart.Workbook.Sheets.Elements<Sheet>()
        .Select(sheet => new KeyValuePair<string, Worksheet>
            (sheet.Name, GetWorkSheetFromSheet(workbookPart, sheet)));
}
Now you can easily enumerate all Worksheets together with their names.
Throw it all into a dictionary for name-based lookup if you prefer that:
IDictionary<string, Worksheet> wsDict = GetNamedWorksheets(myWorkbookPart)
    .ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
...or if you just want one specific sheet by name:
public static Sheet GetSheetFromName(WorkbookPart workbookPart, string sheetName)
{
    return workbookPart.Workbook.Sheets.Elements<Sheet>()
        .FirstOrDefault(s => s.Name.HasValue && s.Name.Value == sheetName);
}
(Then call GetWorkSheetFromSheet to get the corresponding Worksheet.)
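The sheet names are stored in the WorkbookPart, in a Sheets element whose Sheet children each correspond to one worksheet in the Excel file. All you have to do is grab the Sheet at the current index out of that Sheets element, and that will be the sheet you are on in your loop. Note that this assumes WorksheetParts enumerates the parts in the same order as the Sheets element, which may not hold for every file (the asker saw names come back in the wrong order); matching on the relationship id is more robust. I added a snippet of code below to do what you want.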
int sheetIndex = 0;
foreach (WorksheetPart worksheetpart in workbook.WorkbookPart.WorksheetParts)
{
    Worksheet worksheet = worksheetpart.Worksheet;
    // Grab the sheet name each time through your loop
    string sheetName = workbookPart.Workbook.Descendants<Sheet>().ElementAt(sheetIndex).Name;
    foreach (SheetData sheetData in worksheet.Elements<SheetData>())
    {
        ...
    }
    sheetIndex++;
}
Using spreadsheetDocument As SpreadsheetDocument = spreadsheetDocument.Open("D:\Libro1.xlsx", True)
    Dim workbookPart As WorkbookPart = spreadsheetDocument.WorkbookPart
    workbookPart.Workbook.Descendants(Of Sheet)()
    Dim worksheetPart As WorksheetPart = workbookPart.WorksheetParts.Last
    Dim text As String
    For Each Sheet As Sheet In spreadsheetDocument.WorkbookPart.Workbook.Sheets
        Dim sName As String = Sheet.Name
        Dim sID As String = Sheet.Id
        Dim part As WorksheetPart = workbookPart.GetPartById(sID)
        Dim actualSheet As Worksheet = part.Worksheet
        Dim sheetData As SheetData = part.Worksheet.Elements(Of SheetData)().First
        For Each r As Row In sheetData.Elements(Of Row)()
            For Each c As Cell In r.Elements(Of Cell)()
                text = c.CellValue.Text
                Console.Write(text & " ")
            Next
        Next
    Next
End Using
Console.Read()
worksheet.GetAttribute("name","").Value
