open xml reading from excel file - c#

I want to implement openXml sdk 2.5 into my project. I do everything in this link
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;
using System.IO.Packaging;
static void Main(string[] args)
{
String fileName = #"C:\OPENXML\BigData.xlsx";
// Comment one of the following lines to test the method separately.
ReadExcelFileDOM(fileName); // DOM
//ReadExcelFileSAX(fileName); // SAX
}
// The DOM approach.
// Note that the code below works only for cells that contain numeric values.
//
static void ReadExcelFileDOM(string fileName)
{
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(fileName, false))
{
WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
string text;
int rowCount= sheetData.Elements<Row>().Count();
foreach (Row r in sheetData.Elements<Row>())
{
foreach (Cell c in r.Elements<Cell>())
{
text = c.CellValue.Text;
Console.Write(text + " ");
}
}
Console.WriteLine();
Console.ReadKey();
}
}
But i am not getting any row. It hasn't entered loop. Note: I also set up openXml sdk 2.5 my computer
And I find below code this is work for numeric value.For string value it writes 0 1 2 ...
private static void Main(string[] args)
{
var filePath = #"C:/OPENXML/BigData.xlsx";
using (var document = SpreadsheetDocument.Open(filePath, false))
{
var workbookPart = document.WorkbookPart;
var workbook = workbookPart.Workbook;
var sheets = workbook.Descendants<Sheet>();
foreach (var sheet in sheets)
{
var worksheetPart = (WorksheetPart)workbookPart.GetPartById(sheet.Id);
var sharedStringPart = workbookPart.SharedStringTablePart;
//var values = sharedStringPart.SharedStringTable.Elements<SharedStringItem>().ToArray();
string text;
var rows = worksheetPart.Worksheet.Descendants<Row>();
foreach (var row in rows)
{
Console.WriteLine();
int count = row.Elements<Cell>().Count();
foreach (Cell c in row.Elements<Cell>())
{
text = c.CellValue.InnerText;
Console.Write(text + " ");
}
}
}
}
Console.ReadLine();
}

Your approach seemed to work ok for me - in that it did "enter the loop".
Nevertheless you could also try something like the following:
void Main()
{
string fileName = #"c:\path\to\my\file.xlsx";
using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
using (SpreadsheetDocument doc = SpreadsheetDocument.Open(fs, false))
{
WorkbookPart workbookPart = doc.WorkbookPart;
SharedStringTablePart sstpart = workbookPart.GetPartsOfType<SharedStringTablePart>().First();
SharedStringTable sst = sstpart.SharedStringTable;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
Worksheet sheet = worksheetPart.Worksheet;
var cells = sheet.Descendants<Cell>();
var rows = sheet.Descendants<Row>();
Console.WriteLine("Row count = {0}", rows.LongCount());
Console.WriteLine("Cell count = {0}", cells.LongCount());
// One way: go through each cell in the sheet
foreach (Cell cell in cells)
{
if ((cell.DataType != null) && (cell.DataType == CellValues.SharedString))
{
int ssid = int.Parse(cell.CellValue.Text);
string str = sst.ChildElements[ssid].InnerText;
Console.WriteLine("Shared string {0}: {1}", ssid, str);
}
else if (cell.CellValue != null)
{
Console.WriteLine("Cell contents: {0}", cell.CellValue.Text);
}
}
// Or... via each row
foreach (Row row in rows)
{
foreach (Cell c in row.Elements<Cell>())
{
if ((c.DataType != null) && (c.DataType == CellValues.SharedString))
{
int ssid = int.Parse(c.CellValue.Text);
string str = sst.ChildElements[ssid].InnerText;
Console.WriteLine("Shared string {0}: {1}", ssid, str);
}
else if (c.CellValue != null)
{
Console.WriteLine("Cell contents: {0}", c.CellValue.Text);
}
}
}
}
}
}
I used the filestream approach to open the workbook because this allows you to open it with shared access - so that you can have the workbook open in Excel at the same time. The Spreadsheet.Open(... method won't work if the workbook is open elsewhere.
Perhaps that is why your code didn't work.
Note, also, the use of the SharedStringTable to get the cell text where appropriate.
EDIT 2018-07-11:
Since this post is still getting votes I should also point out that in many cases it may be a lot easier to use ClosedXML to manipulate/read/edit your workbooks. The documentation examples are pretty user friendly and the coding is, in my limited experience, much more straight forward. Just be aware that it does not (yet) implement all the Excel functions (for example INDEX and MATCH) which may or may not be an issue. [Not that I would want to be trying to deal with INDEX and MATCH in OpenXML anyway.]

I had the same issue as the OP, and the answer above did not work for me.
I think this is the issue: when you create a document in Excel (not programmatically), you have 3 sheets by default and the WorksheetParts that has the row data for Sheet1 is the last WorksheetParts element, not the first.
I figured this out by putting a watch for document.WorkbookPart.WorksheetParts in Visual Studio, expanding Results, then looking at all of the sub elements until I found a SheetData object where HasChildren = true.
Try this:
// open the document read-only
SpreadSheetDocument document = SpreadsheetDocument.Open(filePath, false);
SharedStringTable sharedStringTable = document.WorkbookPart.SharedStringTablePart.SharedStringTable;
string cellValue = null;
foreach (WorksheetPart worksheetPart in document.WorkbookPart.WorksheetParts)
{
foreach (SheetData sheetData in worksheetPart.Worksheet.Elements<SheetData>())
{
if (sheetData.HasChildren)
{
foreach (Row row in sheetData.Elements<Row>())
{
foreach (Cell cell in row.Elements<Cell>())
{
cellValue = cell.InnerText;
if (cell.DataType == CellValues.SharedString)
{
Console.WriteLine("cell val: " + sharedStringTable.ElementAt(Int32.Parse(cellValue)).InnerText);
}
else
{
Console.WriteLine("cell val: " + cellValue);
}
}
}
}
}
}
document.Close();

Read Large Excel :
openxml has two approaches of DOM and SAX to read an excel. the DOM one consume more RAM resource since it loads the whole xml content(Excel file) in Memory but its strong typed approach.
SAX in other hand is event base parse. more here
so if you are facing large excel file its better to use SAX.
the below code sample uses SAX approach and also handle two important scenario in excel file reading.
open xml skips the empty cells so your dataset faces displacement and wrong index.
you need to skip the empty rows also.
this function returns the exact actual index of the cell at the time and handle the first scenario.
from here
private static int CellReferenceToIndex(Cell cell)
{
int index = 0;
string reference = cell.CellReference.ToString().ToUpper();
foreach (char ch in reference)
{
if (Char.IsLetter(ch))
{
int value = (int)ch - (int)'A';
index = (index == 0) ? value : ((index + 1) * 26) + value;
}
else
return index;
}
return index;
}
code to read excel sax approach.
//i want to import excel to data table
dt = new DataTable();
using (SpreadsheetDocument document = SpreadsheetDocument.Open(path, false))
{
WorkbookPart workbookPart = document.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
//row counter
int rcnt = 0;
while (reader.Read())
{
//find xml row element type
//to understand the element type you can change your excel file eg : test.xlsx to test.zip
//and inside that you may observe the elements in xl/worksheets/sheet.xml
//that helps to understand openxml better
if (reader.ElementType == typeof(Row))
{
//create data table row type to be populated by cells of this row
DataRow tempRow = dt.NewRow();
//***** HANDLE THE SECOND SENARIO*****
//if row has attribute means it is not a empty row
if (reader.HasAttributes)
{
//read the child of row element which is cells
//here first element
reader.ReadFirstChild();
do
{
//find xml cell element type
if (reader.ElementType == typeof(Cell))
{
Cell c = (Cell)reader.LoadCurrentElement();
string cellValue;
int actualCellIndex = CellReferenceToIndex(c);
if (c.DataType != null && c.DataType == CellValues.SharedString)
{
SharedStringItem ssi = workbookPart.SharedStringTablePart.SharedStringTable.Elements<SharedStringItem>().ElementAt(int.Parse(c.CellValue.InnerText));
cellValue = ssi.Text.Text;
}
else
{
cellValue = c.CellValue.InnerText;
}
//if row index is 0 its header so columns headers are added & also can do some headers check incase
if (rcnt == 0)
{
dt.Columns.Add(cellValue);
}
else
{
// instead of tempRow[c.CellReference] = cellValue;
tempRow[actualCellIndex] = cellValue;
}
}
}
while (reader.ReadNextSibling());
//if its not the header row so append rowdata to the datatable
if (rcnt != 0)
{
dt.Rows.Add(tempRow);
}
rcnt++;
}
}
}
}

Everything is explained in the accepted answer.
Here is just an extension method to solve the problem
public static string GetCellText(this Cell cell, in SharedStringTable sst)
{
if (cell.CellValue is null)
return string.Empty;
if ((cell.DataType is not null) &&
(cell.DataType == CellValues.SharedString))
{
int ssid = int.Parse(cell.CellValue.Text);
return sst.ChildElements[ssid].InnerText;
}
return cell.CellValue.Text;
}

Related

How to convert the data that contains 9 header rows from excel to XML in C#?

I'm new in C#. I'm trying to read the data from excel file to convert to XML in C#.
However, I encounter issue to read the two separate data (columns and rows) from the table.
Refer to the picture below, I would like to read the data from the first two columns then only read the data from the fifth rows onwards. After that, combine the data and display in XML format. But the data mess up after convert to XML.
It would be appreciated if anyone could help.
Requirement:
excelData
I have tried to skip the first 4 rows then read the data starting from fifth rows with 9 header rows, but it failed to get the correct value from the cell.
private void btnConvert_Click(object sender, EventArgs e)
{
try
{
if (string.IsNullOrEmpty(txtXMLPath.Text) || string.IsNullOrEmpty(txtNodeName.Text))
{
throw new Exception("The input cannot be empty. Please fill in.");
}
string XMLpath = #"C:\temp\" + txtXMLPath.Text + ".xml";
// Read the Excel file with skip of 4 rows and 9 header rows
(int headerRowCount, int skipRowCount) = (9, 4);
(DataTable dataTable, List<string> headerRow) = ReadExcelFile(txtFilePath.Text, skipRowCount, headerRowCount);
WriteDataTableToXmlFile_Attribute(dataTable, XMLpath, txtNodeName.Text);
MessageBox.Show("Conversion completed successfully! \nXML file saved in " + XMLpath, "Success", MessageBoxButtons.OK, MessageBoxIcon.Information);
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
}
static (DataTable, List<string>) ReadExcelFile(string filePath, int skipRowCount, int headerRowCount)
{
DataTable dataTable = new DataTable();
List<string> headerRow = new List<string>();
using (SpreadsheetDocument document = SpreadsheetDocument.Open(filePath, false))
{
WorkbookPart workbookPart = document.WorkbookPart;
Sheet sheet = workbookPart.Workbook.Descendants<Sheet>().First();
WorksheetPart worksheetPart = (WorksheetPart)workbookPart.GetPartById(sheet.Id);
SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
IEnumerable<Row> rows = sheetData.Descendants<Row>();
int columnCount = 0;
foreach (Row row in rows)
{
if (row.RowIndex <= skipRowCount)
{
// Skip this row
continue;
}
else if (row.RowIndex <= headerRowCount)
{
foreach (Cell cell in row.Descendants<Cell>())
{
string cellValue = GetCellValue(workbookPart, cell);
if (!string.IsNullOrEmpty(cellValue))
{
headerRow.Add(cellValue);
if (!dataTable.Columns.Contains(cellValue))
{
dataTable.Columns.Add(cellValue);
columnCount++;
}
}
}
}
else
{
DataRow dataRow = dataTable.NewRow();
int columnIndex = 0;
foreach (Cell cell in row.Descendants<Cell>())
{
while (columnIndex >= columnCount)
{
columnIndex--;
}
string cellValue = GetCellValue(workbookPart, cell);
dataRow[columnIndex++] = cellValue;
}
dataTable.Rows.Add(dataRow);
}
}
}
return (dataTable, headerRow);
}
Expected Output in XML:
<Root>
<Data Lot="#01" Walson Lot No.="8A0M008888" SPEC="Return Loss" Frequency[ GHz ]="2.405" Specification="AAA11" Specification="<-12" UNIT="dB" MAX="-16.01" MIN="-34.70" AVG="-21.67" SD="3.46" 1="-16.49" /></Root>

How to load on demand excel rows in a data table c#

I have a requirement where-in I have to fill dataTable from a sheet of Microsoft excel.
The sheet may have lots of data so the requirement is that when a foreach loop is iterated over the data table which is supposed to hold the data from Microsoft excel sheet should fill the table on demand.
Meaning if there are 1000000 records in the sheet the data table should fetch data in batches of 100 depending on the current position of the foreach current item in the loop.
Any pointer or suggestion will be appreciated.
I would suggest you to use OpenXML to parse and read your excel data from file.
This will also allow you to read out specific sections/regions from your workbook.
You will find more information and also an example at this link:
Microsoft Docs - Parse and read a large spreadsheet document (Open XML SDK)
This will be more efficiently and easier to develop than use the official microsoft office excel interop.
**I am not near a PC with Visual stuido, so this code is untested, and may have syntax errors until I can test it later.
It will still give you the main idea of what needs to be done.
private void ExcelDataPages(int firstRecord, int numberOfRecords)
{
Excel.Application dataApp = new Excel.Application();
Excel.Workbook dataWorkbook = new Excel.Workbook();
int x = 0;
dataWorkbook.DisplayAlerts = false;
dataWorkbook.Visible = false;
dataWorkbook.AutomationSecurity = Microsoft.Office.Core.MsoAutomationSecurity.msoAutomationSecurityLow;
dataWorkbook = dataApp.Open(#"C:\Test\YourWorkbook.xlsx");
try
{
Excel.Worksheet dataSheet = dataWorkbook.Sheet("Name of Sheet");
while (x < numberOfRecords)
{
Range currentRange = dataSheet.Rows[firstRecord + x]; //For all columns in row
foreach (Range r in currentRange.Cells) //currentRange represents all the columns in the row
{
// do what you need to with the Data here.
}
x++;
}
}
catch (Exception ex)
{
//Enter in Error handling
}
dataWorkbook.Close(false); //Depending on how quick you will access the next batch of data, you may not want to close the Workbook, reducing load time each time. This may also mean you need to move the open of the workbook to a higher level in your class, or if this is the main process of the app, make it static, stopping the garbage collector from destroying the connection.
dataApp.Quit();
}
Give the following a try--it uses NuGet package DocumentFormat.OpenXml The code is from Using OpenXmlReader. However, I modified it to add data to a DataTable. Since you're reading data from the same Excel file multiple times, it's faster to open the Excel file once using an instance of SpreadSheetDocument and dispose of it when finished. Since the instance of SpreedSheetDocument needs to be disposed of before your application exits, IDisposable is used.
Where it says "ToDo", you'll need to replace the code that creates the DataTable columns with your own code to create the correct columns for your project.
I tested the code below with an Excel file containing approximately 15,000 rows. When reading 100 rows at a time, the first read took approximately 500 ms - 800 ms, whereas subsequent reads took approximately 100 ms - 400 ms.
Create a class (name: HelperOpenXml)
HelperOpenXml.cs
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;
using System.Data;
using System.Diagnostics;
namespace ExcelReadSpecifiedRowsUsingOpenXml
{
public class HelperOpenXml : IDisposable
{
public string Filename { get; private set; } = string.Empty;
public int RowCount { get; private set; } = 0;
private SpreadsheetDocument spreadsheetDocument = null;
private DataTable dt = null;
public HelperOpenXml(string filename)
{
this.Filename = filename;
}
public void Dispose()
{
if (spreadsheetDocument != null)
{
try
{
spreadsheetDocument.Dispose();
dt.Clear();
}
catch(Exception ex)
{
throw ex;
}
}
}
public DataTable GetRowsSax(int startRow, int endRow, bool firstRowIsHeader = false)
{
int startIndex = startRow;
int endIndex = endRow;
if (firstRowIsHeader)
{
//if first row is header, increment by 1
startIndex = startRow + 1;
endIndex = endRow + 1;
}
if (spreadsheetDocument == null)
{
//create new instance
spreadsheetDocument = SpreadsheetDocument.Open(Filename, false);
//create new instance
dt = new DataTable();
//ToDo: replace 'dt.Columns.Add(...)' below with your code to create the DataTable columns
//add columns to DataTable
dt.Columns.Add("A");
dt.Columns.Add("B");
dt.Columns.Add("C");
dt.Columns.Add("D");
dt.Columns.Add("E");
dt.Columns.Add("F");
dt.Columns.Add("G");
dt.Columns.Add("H");
dt.Columns.Add("I");
dt.Columns.Add("J");
dt.Columns.Add("K");
}
else
{
//remove existing data from DataTable
dt.Rows.Clear();
}
WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
int numWorkSheetParts = 0;
foreach (WorksheetPart worksheetPart in workbookPart.WorksheetParts)
{
using (OpenXmlReader reader = OpenXmlReader.Create(worksheetPart))
{
int rowIndex = 0;
//use the reader to read the XML
while (reader.Read())
{
if (reader.ElementType == typeof(Row))
{
reader.ReadFirstChild();
List<string> cValues = new List<string>();
int colIndex = 0;
do
{
//only get data from desired rows
if ((rowIndex > 0 && rowIndex >= startIndex && rowIndex <= endIndex) ||
(rowIndex == 0 && !firstRowIsHeader && rowIndex >= startIndex && rowIndex <= endIndex))
{
if (reader.ElementType == typeof(Cell))
{
Cell c = (Cell)reader.LoadCurrentElement();
string cellRef = c.CellReference; //ex: A1, B1, ..., A2, B2
string cellValue = string.Empty;
//string/text data is stored in SharedString
if (c.DataType != null && c.DataType == CellValues.SharedString)
{
SharedStringItem ssi = workbookPart.SharedStringTablePart.SharedStringTable.Elements<SharedStringItem>().ElementAt(int.Parse(c.CellValue.InnerText));
cellValue = ssi.Text.Text;
}
else
{
cellValue = c.CellValue.InnerText;
}
//Debug.WriteLine("{0}: {1} ", c.CellReference, cellValue);
//add value to List which is used to add a row to the DataTable
cValues.Add(cellValue);
}
}
colIndex += 1; //increment
} while (reader.ReadNextSibling());
if (cValues.Count > 0)
{
//if List contains data, use it to add row to DataTable
dt.Rows.Add(cValues.ToArray());
}
rowIndex += 1; //increment
if (rowIndex > endIndex)
{
break; //exit loop
}
}
}
}
numWorkSheetParts += 1; //increment
}
DisplayDataTableData(dt); //display data in DataTable
return dt;
}
private void DisplayDataTableData(DataTable dt)
{
foreach (DataColumn dc in dt.Columns)
{
Debug.WriteLine("colName: " + dc.ColumnName);
}
foreach (DataRow r in dt.Rows)
{
Debug.WriteLine(r[0].ToString() + " " + r[1].ToString());
}
}
}
}
Usage:
private string excelFilename = #"C:\Temp\Test.xlsx";
private HelperOpenXml helperOpenXml = null;
...
private void GetData(int startIndex, int endIndex, bool firstRowIsHeader)
{
helperOpenXml.GetRowsSax(startIndex, endIndex, firstRowIsHeader);
}
Note: Make sure to call Dispose() (ex: helperOpenXml.Dispose();) before your application exits.
Update:
OpenXML stores dates as the number of days since 01 Jan 1900. For dates prior to 01 Jan 1900, they are stored in SharedString. For more info see Reading a date from xlsx using open xml sdk
Here's a code snippet:
Cell c = (Cell)reader.LoadCurrentElement();
...
string cellValue = string.Empty
...
cellValue = c.CellValue.InnerText;
double dateCellValue = 0;
Double.TryParse(cellValue, out dateCellValue);
DateTime dt = DateTime.FromOADate(dateCellValue);
cellValue = dt.ToString("yyyy/MM/dd");
Another simple alternative is this: Take a look at the NUGET package ExcelDataReader, with additional information on
https://github.com/ExcelDataReader/ExcelDataReader
Usage example:
[Fact]
void Test_ExcelDataReader()
{
System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);
var scriptPath = Path.GetDirectoryName(Util.CurrentQueryPath); // LinqPad script path
var filePath = $#"{scriptPath}\TestExcel.xlsx";
using (var stream = File.Open(filePath, FileMode.Open, FileAccess.Read))
{
// Auto-detect format, supports:
// - Binary Excel files (2.0-2003 format; *.xls)
// - OpenXml Excel files (2007 format; *.xlsx, *.xlsb)
using (var reader = ExcelDataReader.ExcelReaderFactory.CreateReader(stream))
{
var result = reader.AsDataSet();
// The result of each spreadsheet is in result.Tables
var t0 = result.Tables[0];
Assert.True(t0.Rows[0][0].Dump("R0C0").ToString()=="Hello", "Expected 'Hello'");
Assert.True(t0.Rows[0][1].Dump("R0C1").ToString()=="World!", "Expected 'World!'");
} // using
} // using
} // fact
Before you start reading, you need to set and encoding provider as follows:
System.Text.Encoding.RegisterProvider(
System.Text.CodePagesEncodingProvider.Instance);
The cells are addressed the following way:
var t0 = result.Tables[0]; // table 0 is the first worksheet
var cell = t0.Rows[0][0]; // on table t0, read cell row 0 column 0
And you can easily loop through the rows and columns in a for loop as follows:
for (int r = 0; r < t0.Rows.Count; r++)
{
var row = t0.Rows[r];
var columns = row.ItemArray;
for (int c = 0; c < columns.Length; c++)
{
var cell = columns[c];
cell.Dump();
}
}
I use this code with EPPlus DLL, Don't forget to add reference. But should check to match with your requirement.
public DataTable ReadExcelDatatable(bool hasHeader = true)
{
using (var pck = new OfficeOpenXml.ExcelPackage())
{
using (var stream = File.OpenRead(this._fullPath))
{
pck.Load(stream);
}
var ws = pck.Workbook.Worksheets.First();
DataTable tbl = new DataTable();
int i = 1;
foreach (var firstRowCell in ws.Cells[1, 1, 1, ws.Dimension.End.Column])
{
//table head
tbl.Columns.Add(hasHeader ? firstRowCell.Text : string.Format("Column {0}", firstRowCell.Start.Column));
tbl.Columns.Add(_tableHead[i]);
i++;
}
var startRow = hasHeader ? 2 : 1;
for (int rowNum = startRow; rowNum <= ws.Dimension.End.Row; rowNum++)
{
var wsRow = ws.Cells[rowNum, 1, rowNum, ws.Dimension.End.Column];
DataRow row = tbl.Rows.Add();
foreach (var cell in wsRow)
{
row[cell.Start.Column - 1] = cell.Text;
}
}
return tbl;
}
}
I'm going to give you a different answer. If the performance is bad loading a million rows into a DataTable resort to using a Driver to load the data: How to open a huge excel file efficiently
DataSet excelDataSet = new DataSet();
string filePath = #"c:\temp\BigBook.xlsx";
// For .XLSXs we use =Microsoft.ACE.OLEDB.12.0;, for .XLS we'd use Microsoft.Jet.OLEDB.4.0; with "';Extended Properties=\"Excel 8.0;HDR=YES;\"";
string connectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source='" + filePath + "';Extended Properties=\"Excel 12.0;HDR=YES;\"";
using (OleDbConnection conn = new OleDbConnection(connectionString))
{
conn.Open();
OleDbDataAdapter objDA = new System.Data.OleDb.OleDbDataAdapter
("select * from [Sheet1$]", conn);
objDA.Fill(excelDataSet);
//dataGridView1.DataSource = excelDataSet.Tables[0];
}
Next filter the DataSet's DataTable using a DataView. Using a DataView's RowFilter property you can specify subsets of rows based on their column values.
DataView prodView = new DataView(excelDataSet.Tables[0],
"UnitsInStock <= ReorderLevel",
"SupplierID, ProductName",
DataViewRowState.CurrentRows);
Ref: https://www.c-sharpcorner.com/article/dataview-in-C-Sharp/
Or you could use the DataTables' DefaultView RowFilter directly:
excelDataSet.Tables[0].DefaultView.RowFilter = "Amount >= 5000 and Amount <= 5999 and Name = 'StackOverflow'";

Reading large Excel files with c# and get the indexes

I tried to use Microsoft.Office.Interop.Excel but it's too slow when it comes to reading large excel documents (it was taking over 5 minutes for me). I read that DocumentFormat.OpenXml is faster when it comes to reading large excel documents but in the documentation it doesn't appear that I can't store the columns and row indexes.
For now, I am also only interested in the first row to get the column headers and I will be reading the rest of the document after some logic. I haven't been able to find a way to read only a portion of the excel document. I want to do something similar to this:
int r = 1; //row index
int c = 1; //column index
while (xlRange.Cells[r,c] != null && xlRange.Cells[r, c].Value2 != null)
{
TagListData.Add(new TagClass { IsTagSelected = false, TagName = xlRange[r, c].Value2.toString(), rIndex = r, cIndex = c });
c += 3;
}
Users will be picking excel documents through openFileDialog so there's no fixed number of rows of columns I can use. Is there a way I could make this work?
Thank you
In OpenXML if a cell has no text it might or might not appear in the list of cells (depends on whether it ever had text or not). Therefore the while (...Value2 != null) type of approach isn't really a safe way to do things in OpenXML.
Here is a very simple approach to reading the first row (written using LINQPad hence the Main and the Dump). Note the (simplified) use of the SharedStringTable to get the real text of the cell:
void Main()
{
var fileName = #"c:\temp\openxml-read-row.xlsx";
using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
using (SpreadsheetDocument doc = SpreadsheetDocument.Open(fs, false))
{
// Get the necessary bits of the doc
WorkbookPart workbookPart = doc.WorkbookPart;
SharedStringTablePart sstpart = workbookPart.GetPartsOfType<SharedStringTablePart>().First();
SharedStringTable sst = sstpart.SharedStringTable;
WorkbookStylesPart ssp = workbookPart.GetPartsOfType<WorkbookStylesPart>().First();
Stylesheet ss = ssp.Stylesheet;
// Get the first worksheet
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
Worksheet sheet = worksheetPart.Worksheet;
var rows = sheet.Descendants<Row>();
var row = rows.First();
foreach (var cell in row.Descendants<Cell>())
{
var txt = GetCellText(cell, sst);
// LINQPad specific method .Dump()
$"{cell.CellReference} = {txt}".Dump();
}
}
}
}
// Very basic way to get the text of a cell
private string GetCellText(Cell cell, SharedStringTable sst)
{
if (cell == null)
return "";
if ((cell.DataType != null) && (cell.DataType == CellValues.SharedString))
{
int ssid = int.Parse(cell.CellValue.Text);
string str = sst.ChildElements[ssid].InnerText;
return str;
}
else if (cell.CellValue != null)
{
return cell.CellValue.Text;
}
return "";
}
However... there's potentially a lot of work involved with OpenXML and you'd be well advised to try and use something like ClosedXML or EPPlus instead.
eg using ClosedXML
using (var workbook = new XLWorkbook(fileName))
{
var worksheet = workbook.Worksheets.First();
var row = worksheet.Row(1);
foreach (var cell in row.CellsUsed())
{
var txt = cell.Value.ToString();
// LINQPad specific method .Dump()
$"{cell.Address.ToString()} = {txt}".Dump();
}
}

OpenXML read performance too slow while parsing excel file

I am reading excel files using C# and OpenXML (SAX). The file size ranges between 5-10 mb and has 5-6 sheets. Number of rows per sheet vary between 25-100K.
I am using the following code to fetch the data. It's reading about 100 rows per second. In comparison, Apache POI is able to read a thousand rows in the same time.
I expected better results as both products are from Microsoft. Am I doing something wrong?
using (SpreadsheetDocument excelDoc = SpreadsheetDocument.Open(filename, true))
{
WorkbookPart workbookPart = excelDoc.WorkbookPart;
WorksheetPart mainSheet = (WorksheetPart)workbookPart.GetPartById(sheetIds[0].ToString());
OpenXmlReader reader = OpenXmlReader.Create(mainSheet);
//CellType c;
SharedStringTable t = workbookPart.SharedStringTablePart.SharedStringTable;
while (reader.Read())
{
if (reader.ElementType == typeof(Row))
{
reader.ReadFirstChild();
do
{
if (reader.ElementType == typeof(Cell))
{
Cell c = (Cell)reader.LoadCurrentElement();
string cellValue;
if (c.DataType != null && c.DataType == CellValues.SharedString)
{
int index = int.Parse(c.CellValue.InnerText);
SharedStringItem ssi = workbookPart.SharedStringTablePart.SharedStringTable.Elements<SharedStringItem>().ElementAt(int.Parse(c.CellValue.InnerText));
cellValue = ssi.Text.Text;
}
else
{
cellValue = c.CellValue.InnerText;
}
//Console.Out.Write("{0}: {1} ", c.CellReference, cellValue);
}
}
while (reader.ReadNextSibling());
}
}
}
Please try below code.
using (SpreadsheetDocument excelDoc = SpreadsheetDocument.Open(filename, true))
{
WorkbookPart workbookPart = excelDoc.WorkbookPart;
WorksheetPart mainSheet = (WorksheetPart)workbookPart.GetPartById(sheetIds[0].ToString());
OpenXmlReader reader = OpenXmlReader.Create(mainSheet);
//CellType c;
SharedStringTable t = workbookPart.SharedStringTablePart.SharedStringTable;
while (reader.Read())
{
if (reader.ElementType == typeof(Row))
{
reader.ReadFirstChild();
do
{
if (reader.ElementType == typeof(Cell))
{
Cell c = (Cell)reader.LoadCurrentElement();
string cellValue;
if (c.DataType != null && c.DataType == CellValues.SharedString)
{
int index = int.Parse(c.CellValue.InnerText);
SharedStringItem ssi = t.Elements<SharedStringItem>().ElementAt(int.Parse(c.CellValue.InnerText));
cellValue = ssi.Text.Text;
}
else
{
cellValue = c.CellValue.InnerText;
}
//Console.Out.Write("{0}: {1} ", c.CellReference, cellValue);
}
}
while (reader.ReadNextSibling());
}
}
}

DocumentFormat.openxml Excel File Reading Issue

I have used DocumentFormat.OpenXml dll in one of my project for reading and writing excel file.
During Reading of Excel File, Let's say for some column say Column1 I am having cell values as "TRUE" and "FALSE". When I read this Excel File using Following Code
private SharedStringTable sharedStringTable;
using (SpreadsheetDocument doc = SpreadsheetDocument.Open(fileName, false))
{
WorkbookPart workbookPart = doc.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.FirstOrDefault();
SharedStringTablePart sharedStringTablePart = workbookPart.SharedStringTablePart;
if (sharedStringTablePart != null)
{
sharedStringTable = sharedStringTablePart.SharedStringTable;
}
var sheets = workbookPart.Workbook.Sheets;
foreach (Sheet sheet in sheets)
{
Worksheet requiredItem = (doc.WorkbookPart.GetPartById(sheet.Id.Value) as WorksheetPart).Worksheet;
var sheetData = requiredItem.Elements<SheetData>().First();
foreach (var rowItem in sheetData.Elements<Row>())
{
foreach (var item in rowItem.Elements<Cell>())
{
string requiredText = string.empty;
if (item.CellValue != null)
{
requiredText = item.CellValue.InnerText;
}
}
}
}
}
At that time for Cell Values "TRUE" and "FALSE" i am getting values 1 and 0 Respectively.
Can anyone provide me any way so that I can get values "TRUE" and "FALSE" instead of 1 and 0 ?

Categories