I switched from Interop library to OpenXML, because I need to read large Excel files. Before that I could use:
worksheet.UsedRange.Rows.Count
to get the number of rows with data on the worksheet. I used this information to make a progressbar. In OpenXML I do not know how to get the same information about the worksheet. What I have now is this code:
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(path, false))
{
WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
int row_count = 0, col_count;
// here I would like to get the info about the number of rows
foreach (Row r in sheetData.Elements<Row>())
{
col_count = 0;
if (row_count > 10)
{
foreach (Cell c in r.Elements<Cell>())
{
// do some stuff
// update progressbar
}
}
row_count++;
}
}
It's not that hard (When you use LINQ),
using (SpreadsheetDocument myDoc = SpreadsheetDocument.Open("PATH", true))
{
//Get workbookpart
WorkbookPart workbookPart = myDoc.WorkbookPart;
//then access to the worksheet part
IEnumerable<WorksheetPart> worksheetPart = workbookPart.WorksheetParts;
foreach (WorksheetPart WSP in worksheetPart)
{
//find sheet data
IEnumerable<SheetData> sheetData = WSP.Worksheet.Elements<SheetData>();
// Iterate through every sheet inside Excel sheet
foreach (SheetData SD in sheetData)
{
IEnumerable<Row> row = SD.Elements<Row>(); // Get the row IEnumerator
Console.WriteLine(row.Count()); // Will give you the count of rows
}
}
}
Edited with Linq now it's straight forward.
Related
The code below only count the total number of rows in sheet 1 but I also want to count the rows in sheet 2 and sheet 3 and so on
using (SpreadsheetDocument myDoc = SpreadsheetDocument.Open("PATH", true))
{
//Get workbookpart
WorkbookPart workbookPart = myDoc.WorkbookPart;
//then access to the worksheet part
IEnumerable<WorksheetPart> worksheetPart = workbookPart.WorksheetParts;
foreach (WorksheetPart WSP in worksheetPart)
{
//find sheet data
IEnumerable<SheetData> sheetData = WSP.Worksheet.Elements<SheetData>();
// Iterate through every sheet inside Excel sheet
foreach (SheetData SD in sheetData)
{
IEnumerable<Row> row = SD.Elements<Row>(); // Get the row IEnumerator
Console.WriteLine(row.Count()); // Will give you the count of rows
}
}
}
I am new at OpenXML c# and I want to read rows from excel file. But I need to read excel sheet by name. this is my sample code that reads first sheet:
using (var spreadSheet = SpreadsheetDocument.Open(path, true))
{
WorkbookPart workbookPart = spreadSheet.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
foreach (Row r in sheetData.Elements<Row>())
{
foreach (Cell c in r.Elements<Cell>())
{
if (c.DataType != null && c.DataType == CellValues.SharedString)
{
// reading cells
}
}
}
But how can I find by sheet name and read cells.
I've done it like in the code snippet below. It's basically Workbook->Spreadsheet->Sheet then getting the Name attribute of the sheet.
The basic underling xml looks like this:
<x:workbook>
<x:sheets>
<x:sheet name="Sheet1" sheetId="1" r:id="rId1" />
<x:sheet name="TEST sheet Name" sheetId="2" r:id="rId2" />
</x:sheets>
</x:workbook>
The id value is what the Open XML package uses internally to identify each sheet and link it with the other XML parts. That's why the line of code that follows identifying the name uses GetPartById to pick up the WorksheetPart.
using (SpreadsheetDocument doc = SpreadsheetDocument.Open(path, false))
{
WorkbookPart bkPart = doc.WorkbookPart;
DocumentFormat.OpenXml.Spreadsheet.Workbook workbook = bkPart.Workbook;
DocumentFormat.OpenXml.Spreadsheet.Sheet s = workbook.Descendants<DocumentFormat.OpenXml.Spreadsheet.Sheet>().Where(sht => sht.Name == "Sheet1").FirstOrDefault();
WorksheetPart wsPart = (WorksheetPart)bkPart.GetPartById(s.Id);
DocumentFormat.OpenXml.Spreadsheet.SheetData sheetdata = wsPart.Worksheet.Elements<DocumentFormat.OpenXml.Spreadsheet.SheetData>().FirstOrDefault();
foreach (DocumentFormat.OpenXml.Spreadsheet.Row r in sheetdata.Elements<DocumentFormat.OpenXml.Spreadsheet.Row>())
{
DocumentFormat.OpenXml.Spreadsheet.Cell c = r.Elements<DocumentFormat.OpenXml.Spreadsheet.Cell>().First();
txt += c.CellValue.Text + Environment.NewLine;
}
this.txtMessages.Text += txt;
}
I search lot about how to read and copy data in multiple Excel sheets to data-set using open XML. Only get how to parse single sheet excel to data table. I have a Excel file with multiple sheets. I want to take the data to data-set, so below I pasted the code for copying one sheet data. Please help me to read multiple sheet excel to data-set.
public static DataTable ReadAsDataTable(string fileName)
{
DataTable dataTable = new DataTable();
using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(fileName, false))
{
WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
IEnumerable<Sheet> sheets = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
string relationshipId = sheets.First().Id.Value;
WorksheetPart worksheetPart = (WorksheetPart)spreadSheetDocument.WorkbookPart.GetPartById(relationshipId);
Worksheet workSheet = worksheetPart.Worksheet;
SheetData sheetData = workSheet.GetFirstChild<SheetData>();
IEnumerable<Row> rows = sheetData.Descendants<Row>();
foreach (Cell cell in rows.ElementAt(0))
{
dataTable.Columns.Add(GetCellValue(spreadSheetDocument, cell));
}
foreach (Row row in rows)
{
DataRow dataRow = dataTable.NewRow();
for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
dataRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
}
dataTable.Rows.Add(dataRow);
}
}
dataTable.Rows.RemoveAt(0);
return dataTable;
}
I understand your requirement clearly and it is very simple to read multiple worksheets in excel using open xml.
you just need to add one for each loop.
I have already create a article with the same requirement in my website,kindly refer that and surely you will get a solution.
http://www.learnandshare-karthik.com/2016/08/29/open-xml-sdk-to-read-workbook-with-multiple-worksheets/
if you still need further help,please let me know
thanks
karthik
I have used DocumentFormat.OpenXml dll in one of my project for reading and writing excel file.
During Reading of Excel File, Let's say for some column say Column1 I am having cell values as "TRUE" and "FALSE". When I read this Excel File using Following Code
private SharedStringTable sharedStringTable;
using (SpreadsheetDocument doc = SpreadsheetDocument.Open(fileName, false))
{
WorkbookPart workbookPart = doc.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.FirstOrDefault();
SharedStringTablePart sharedStringTablePart = workbookPart.SharedStringTablePart;
if (sharedStringTablePart != null)
{
sharedStringTable = sharedStringTablePart.SharedStringTable;
}
var sheets = workbookPart.Workbook.Sheets;
foreach (Sheet sheet in sheets)
{
Worksheet requiredItem = (doc.WorkbookPart.GetPartById(sheet.Id.Value) as WorksheetPart).Worksheet;
var sheetData = requiredItem.Elements<SheetData>().First();
foreach (var rowItem in sheetData.Elements<Row>())
{
foreach (var item in rowItem.Elements<Cell>())
{
string requiredText = string.empty;
if (item.CellValue != null)
{
requiredText = item.CellValue.InnerText;
}
}
}
}
}
At that time for Cell Values "TRUE" and "FALSE" i am getting values 1 and 0 Respectively.
Can anyone provide me any way so that I can get values "TRUE" and "FALSE" instead of 1 and 0 ?
I'm parsing a large file using the SAX approach offered on:
Parsing and Reading Large Excel Files with the Open XML SDK
This is my modified version (only getting the row number for simplicity)
using (SpreadsheetDocument myDoc = SpreadsheetDocument.Open("BigFile.xlsx", true))
{
WorkbookPart workbookPart = myDoc.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
String rowNum;
while (reader.Read())
{
if (reader.ElementType == typeof(Row))
{
if (reader.HasAttributes)
rowNum = reader.Attributes.First(a => a.LocalName == "r").Value
}
}
}
The problem is that this loops through every item/cell/column/whatnot and only acts when the element type is Row.
Is there a SAX way to loop only through the rows and not every item in the worksheet?
Thanks,
The key is to use the Skip() and ReadNextSibling() methods of the reader...
using (SpreadsheetDocument myDoc = SpreadsheetDocument.Open("BigFile.xlsx", true))
{
WorkbookPart workbookPart = myDoc.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
String rowNum;
while (reader.Read())
{
if (reader.ElementType == typeof(Row))
{
do
{
if (reader.HasAttributes)
rowNum = reader.Attributes.First(a => a.LocalName == "r").Value;
} while (reader.ReadNextSibling()); // Skip to the next row
break; // We just looped through all the rows so no need to continue reading the worksheet
}
if (reader.ElementType != typeof(Worksheet)) // Dont' want to skip the contents of the worksheet
reader.Skip(); // Skip contents of any node before finding the first row.
}
}