How to get data from Excel using the OpenXML C library# - c#

I wanted to get data from a table in Excel, but I get them in the form
Instead of
Program code
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;
static void ReadExcelFileDOM(string fileName)
{
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(#"PATH", false))
{
WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
string text;
foreach (Row r in sheetData.Elements<Row>())
{
foreach (Cell c in r.Elements<Cell>())
{
text = c.CellValue.Text;
Console.Write(text + " ");
}
}
Console.WriteLine();
Console.ReadKey();
}
}
I need to get a table using the openxml library

using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
using (SpreadsheetDocument doc = SpreadsheetDocument.Open(fs, false))
{
WorkbookPart workbookPart = doc.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
SharedStringTablePart sstpart = workbookPart.GetPartsOfType<SharedStringTablePart>().First();
SharedStringTable sst = sstpart.SharedStringTable;
foreach (var row in sheetData.Elements<Row>())
{
foreach (var cell in row.Elements<Cell>())
{
if (cell.DataType != null && cell.DataType == CellValues.SharedString)
{
int ssid = int.Parse(cell.CellValue.Text);
string str = sst.ChildElements[ssid].InnerText;
Console.WriteLine("Shared string {0}: {1}", ssid, str);
}
else
{
Console.WriteLine("Shared string {0}: {1}", );
}
}
}

Related

How to create excel file with SharedStringTablePart using DocumentFormat.OpenXml

There is code which works with Excel like this:
using SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open("filePath", true);
WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart!;
SheetData sheetData = GetSheetData(workbookPart, outFile);
Worksheet worksheet = sheetData.Parent as Worksheet;
List<SharedStringItem> sharedStringTableElements = workbookPart.SharedStringTablePart!.SharedStringTable.Elements<SharedStringItem>().ToList();
List<Row> sheetRows = sheetData.Elements<Row>().ToList();
...
But it breaks when I try to open my generated file. SharedStringTablePart is null.
This is how I create Excel files:
using SpreadsheetDocument spreadSheet = CreateSpreadsheet(filePath);
WorkbookPart workbookPart = spreadSheet.AddWorkbookPart();
workbookPart.Workbook = new Workbook();
WorksheetPart worksheetPart = workbookPart.AddNewPart<WorksheetPart>();
SheetData sheetData = new();
worksheetPart.Worksheet = new Worksheet(sheetData);
Sheets sheets = workbookPart.Workbook.AppendChild(new Sheets());
string relationshipId = workbookPart.GetIdOfPart(worksheetPart);
Sheet sheet = new() {
Id = relationshipId,
SheetId = FirstSheetId,
Name = "SheetName",
};
sheets.Append(sheet);
//fill sheetData here
workbookPart.Workbook.Save();
spreadSheet.Close();
How do I change this code so it doesn't break when accessing a SharedStringTablePart? How can I fill it in correctly?
Thanks in advance.
You create a new SpreadsheetDocument and add the WorkbookPart. Right after being created, that new WorkbookPart does not have any associated parts. Thus, you'll have to create all required parts, including the SharedStringTablePart, like so:
// Let's assume you created the WorkbookPart as follows.
WorkbookPart workbookPart = spreadSheet.AddWorkbookPart();
// Create and initialize the associated SharedStringTablePart as well.
var sharedStringTablePart = workbookPart.AddNewPart<SharedStringTablePart>();
sharedStringTablePart.SharedStringTable = new SharedStringTable();
Once added as shown above, workbookPart.SharedStringTablePart will no longer be null.
I came back to write how I did it.
First of all, we need to create SharedStringTablePart as Thomas said:
private static SharedStringTablePart GetSharedStringTablePart(SpreadsheetDocument spreadSheet)
{
if (spreadSheet.WorkbookPart!.GetPartsOfType<SharedStringTablePart>().Any())
{
return spreadSheet.WorkbookPart.GetPartsOfType<SharedStringTablePart>().First();
}
else
{
return spreadSheet.WorkbookPart.AddNewPart<SharedStringTablePart>();
}
}
Then I created a helper method to insert the data into this table:
private static string InsertSharedStringItem(string text, SharedStringTablePart shareStringPart)
{
shareStringPart.SharedStringTable ??= new SharedStringTable();
int i = 0;
foreach (SharedStringItem item in shareStringPart.SharedStringTable.Elements<SharedStringItem>())
{
if (item.InnerText == text)
{
return i.ToString();
}
i++;
}
shareStringPart.SharedStringTable.AppendChild(new SharedStringItem(new Text(text)));
shareStringPart.SharedStringTable.Save();
return i.ToString();
}
So next time when I inserting cell value I do it like this:
string index = InsertSharedStringItem("some text", sharedTablePart);
Cell cell = new() {
DataType = CellValues.SharedString,
CellValue = new CellValue(index),
...
};
row.AppendChild(cell);

Is there anyway to upload a large Excel into a database quickly using C#?

I am trying to upload a large Ms Excel file into a relational database using C#.
Using following code:
But the process takes a very long time to read the excel (Approximated 45 hours to read an excel file with 185,000 record).
public static DataTable GetSpreadsheetWorkbookSheet(string filepath)
{
DataTable dataTable = new DataTable();
List<string> sheetList = new List<string>();
using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(filepath, true))
{
WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
IEnumerable<Sheet> sheets = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
int sheetCount = workbookPart.Workbook.Descendants<Sheet>().Count();
string relationshipId = "";
dataTable.Columns.Add("Sheet");
dataTable.Columns.Add("Path");
for (int i = 0; i < sheetCount; i++)
{
DataRow dataRow = dataTable.NewRow();
string sheetName = workbookPart.Workbook.Descendants<Sheet>().ElementAt(i).Name;
relationshipId = workbookPart.Workbook.Descendants<Sheet>().ElementAt(i).Id;
dataTable.Rows.Add(sheetName, filepath);
}
}
return dataTable;
}
public static DataTable CreateSpreadsheetWorkbook(string filepath, string sheetName)
{
DataTable dataTable = new DataTable();
List<string> sheetList = new List<string>();
using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(filepath, true))
{
WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
IEnumerable<Sheet> sheets = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
string relationshipId = "";
int x = sheets.ToList().Count;
if (sheets.ToList().Count > 1)
{
relationshipId = sheets.ToList().Find(io => io.Name.ToString().Equals(sheetName)).Id;
}
else
{
relationshipId = sheets.ToList().First().Id.Value;
}
WorksheetPart worksheetPart = (WorksheetPart)spreadSheetDocument.WorkbookPart.GetPartById(relationshipId);
Worksheet workSheet = worksheetPart.Worksheet;
SheetData sheetData = workSheet.GetFirstChild<SheetData>();
IEnumerable<Row> rows = sheetData.Descendants<Row>();
foreach (Cell cell in rows.ElementAt(0))
{
dataTable.Columns.Add(GetCellValue(spreadSheetDocument, cell));
}
foreach (Row row in rows)
{
int t = row.Descendants<Cell>().Count();
DataRow dataRow = dataTable.NewRow();
for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
Thread.Sleep(1);
dataRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
Thread.Sleep(1);
}
dataTable.Rows.Add(dataRow);
Thread.Sleep(1);
}
}
dataTable.Rows.RemoveAt(0);
return dataTable;
}
private static string GetCellValue(SpreadsheetDocument document, Cell cell)
{
SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
string value = cell.InnerText;
if (cell.DataType != null && (cell.DataType.Value == CellValues.SharedString))
{
string txt = stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText.ToString();
Thread.Sleep(1);
return txt;
}
else if (cell.DataType != null && (cell.DataType.Value == CellValues.String))
{
string txt = value;
Thread.Sleep(1);
return txt;
}
else if (cell.DataType != null && (cell.DataType.Value == CellValues.InlineString))
{
string txt = stringTablePart.ToString();
Thread.Sleep(1);
return txt;
}
else if (cell.DataType == null)
{
string txt = value;
Thread.Sleep(1);
return txt;
}
else
{
if (String.IsNullOrEmpty(value) || String.IsNullOrWhiteSpace(value))
{
value = "0";
}
return Convert.ToDecimal(value).ToString("N4");
}
}
I need to upload the file quickly, Is there anyway to doing it quickly, Please help?

OpenXML read performance too slow while parsing excel file

I am reading excel files using C# and OpenXML (SAX). The file size ranges between 5-10 mb and has 5-6 sheets. Number of rows per sheet vary between 25-100K.
I am using the following code to fetch the data. It's reading about 100 rows per second. In comparison, Apache POI is able to read a thousand rows in the same time.
I expected better results as both products are from Microsoft. Am I doing something wrong?
using (SpreadsheetDocument excelDoc = SpreadsheetDocument.Open(filename, true))
{
WorkbookPart workbookPart = excelDoc.WorkbookPart;
WorksheetPart mainSheet = (WorksheetPart)workbookPart.GetPartById(sheetIds[0].ToString());
OpenXmlReader reader = OpenXmlReader.Create(mainSheet);
//CellType c;
SharedStringTable t = workbookPart.SharedStringTablePart.SharedStringTable;
while (reader.Read())
{
if (reader.ElementType == typeof(Row))
{
reader.ReadFirstChild();
do
{
if (reader.ElementType == typeof(Cell))
{
Cell c = (Cell)reader.LoadCurrentElement();
string cellValue;
if (c.DataType != null && c.DataType == CellValues.SharedString)
{
int index = int.Parse(c.CellValue.InnerText);
SharedStringItem ssi = workbookPart.SharedStringTablePart.SharedStringTable.Elements<SharedStringItem>().ElementAt(int.Parse(c.CellValue.InnerText));
cellValue = ssi.Text.Text;
}
else
{
cellValue = c.CellValue.InnerText;
}
//Console.Out.Write("{0}: {1} ", c.CellReference, cellValue);
}
}
while (reader.ReadNextSibling());
}
}
}
Please try below code.
using (SpreadsheetDocument excelDoc = SpreadsheetDocument.Open(filename, true))
{
WorkbookPart workbookPart = excelDoc.WorkbookPart;
WorksheetPart mainSheet = (WorksheetPart)workbookPart.GetPartById(sheetIds[0].ToString());
OpenXmlReader reader = OpenXmlReader.Create(mainSheet);
//CellType c;
SharedStringTable t = workbookPart.SharedStringTablePart.SharedStringTable;
while (reader.Read())
{
if (reader.ElementType == typeof(Row))
{
reader.ReadFirstChild();
do
{
if (reader.ElementType == typeof(Cell))
{
Cell c = (Cell)reader.LoadCurrentElement();
string cellValue;
if (c.DataType != null && c.DataType == CellValues.SharedString)
{
int index = int.Parse(c.CellValue.InnerText);
SharedStringItem ssi = t.Elements<SharedStringItem>().ElementAt(int.Parse(c.CellValue.InnerText));
cellValue = ssi.Text.Text;
}
else
{
cellValue = c.CellValue.InnerText;
}
//Console.Out.Write("{0}: {1} ", c.CellReference, cellValue);
}
}
while (reader.ReadNextSibling());
}
}
}

OpenXML Add Cell to WorkSheet

I am currently doing my head in working over the OpenXML 2.5 Framework on the MSDN site here, https://msdn.microsoft.com/en-us/library/office/cc861607.aspx
All methods I have tried to add a cell to an existing worksheet corrupt the workbook as the MSDN site only outlines creating the worksheet and not modifying it.
Everytime I add a cell the system wants a whole new worksheet and will not allow the addition of a cell to an existing worksheet. I have been banging my head for hours going over MSDN and Googling this with no luck.
The problem is I need a class that can receiving strings and update the excel file. Has anyone been able to add a cell to an existing worksheet? My issue seems to be due to a string by string solution.
Working input (PowerShell) only works if a new Worksheet is created for the Cell,
[CmdletBinding(SupportsShouldProcess=$true, ConfirmImpact='Medium')]
$cSharpData = (
[Reflection.Assembly]::LoadWithPartialName("DocumentFormat.OpenXml"),
[Reflection.Assembly]::LoadWithPartialName("WindowsBase"),
[Reflection.Assembly]::LoadWithPartialName("System.Linq")
)
[String]$cSharpClass = Get-Content .\method.cs
$cSharpType = Add-Type -ReferencedAssemblies $cSharpData -TypeDefinition $cSharpClass
$testData = Get-WmiObject Win32_QuickFixEngineering
[DoExcelMethod]::CreateXLSX('.\test.xlsx')
$locNo = 1
[DoExcelMethod]::AddSheetData('.\test.xlsx', $testData, 'TestWS', 'A', $locNo)
The file this is point at has the following,
using System;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;
public class DoExcelMethod {
private static int SharedDataItem(string sData, SharedStringTablePart ssPart) {
if (ssPart.SharedStringTable == null) {
ssPart.SharedStringTable = new SharedStringTable();
}
int cnt = 0;
foreach (SharedStringItem sspItem in ssPart.SharedStringTable.Elements<SharedStringItem>()) {
if (sspItem.InnerText == sData) {
return cnt;
}
cnt++;
}
ssPart.SharedStringTable.AppendChild(new SharedStringItem(new DocumentFormat.OpenXml.Spreadsheet.Text(sData)));
ssPart.SharedStringTable.Save();
return cnt;
}
private static WorksheetPart InsertWorksheet(string wsName, WorkbookPart wbPart) {
WorksheetPart newWsPart = wbPart.AddNewPart<WorksheetPart>();
newWsPart.Worksheet = new Worksheet(new SheetData());
newWsPart.Worksheet.Save();
Sheets sheets = wbPart.Workbook.GetFirstChild<Sheets>();
string relId = wbPart.GetIdOfPart(newWsPart);
uint sheetId = 1;
if (sheets.Elements<Sheet>().Count() > 0) {
sheetId = sheets.Elements<Sheet>().Select(s => s.SheetId.Value).Max() + 1;
}
Sheet sheet = new Sheet() { Id = relId, SheetId = sheetId, Name = wsName };
sheets.Append(sheet);
wbPart.Workbook.Save();
return newWsPart;
}
private static Cell InsertCellInWorksheet(string columnName, uint rowIndex, WorksheetPart worksheetPart) {
Worksheet worksheet = worksheetPart.Worksheet;
SheetData sheetData = worksheet.GetFirstChild<SheetData>();
string cellReference = columnName + rowIndex;
Row row;
if (sheetData.Elements<Row>().Where(r => r.RowIndex == rowIndex).Count() != 0) {
row = sheetData.Elements<Row>().Where(r => r.RowIndex == rowIndex).First();
} else {
row = new Row() { RowIndex = rowIndex };
sheetData.Append(row);
}
if (row.Elements<Cell>().Where(c => c.CellReference.Value == columnName + rowIndex).Count() > 0) {
return row.Elements<Cell>().Where(c => c.CellReference.Value == cellReference).First();
} else {
Cell refCell = null;
foreach (Cell cell in row.Elements<Cell>()) {
if (string.Compare(cell.CellReference.Value, cellReference, true) > 0) {
refCell = cell;
break;
}
}
Cell newCell = new Cell() { CellReference = cellReference };
row.InsertBefore(newCell, refCell);
worksheet.Save();
return newCell;
}
}
public static void CreateXLSX(string xlsxFile) {
SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Create(xlsxFile, SpreadsheetDocumentType.Workbook);
WorkbookPart workbookpart = spreadsheetDocument.AddWorkbookPart();
workbookpart.Workbook = new Workbook();
WorksheetPart worksheetPart = workbookpart.AddNewPart<WorksheetPart>();
worksheetPart.Worksheet = new Worksheet(new SheetData());
Sheets sheets = spreadsheetDocument.WorkbookPart.Workbook.AppendChild<Sheets>(new Sheets());
Sheet sheet = new Sheet() { Id = spreadsheetDocument.WorkbookPart.GetIdOfPart(worksheetPart), SheetId = 1, Name = "Default" };
sheets.Append(sheet);
workbookpart.Workbook.Save();
spreadsheetDocument.Close();
}
public static void AddSheetData(string xlsxFile, string psData, string wsName, string psCol, uint psRow) {
using (SpreadsheetDocument sSheet = SpreadsheetDocument.Open(xlsxFile, true)) {
SharedStringTablePart ssPart;
if (sSheet.WorkbookPart.GetPartsOfType<SharedStringTablePart>().Count() > 0) {
ssPart = sSheet.WorkbookPart.GetPartsOfType<SharedStringTablePart>().First();
} else {
ssPart = sSheet.WorkbookPart.AddNewPart<SharedStringTablePart>();
}
int ssIns = SharedDataItem(psData, ssPart);
WorksheetPart wsPart = InsertWorksheet(wsName, sSheet.WorkbookPart);
Cell cell = InsertCellInWorksheet(psCol, psRow, wsPart);
cell.CellValue = new CellValue(ssIns.ToString());
cell.DataType = new EnumValue<CellValues>(CellValues.SharedString);
wsPart.Worksheet.Save();
}
}
}
So despite this working I cannot get a cell into an existing worksheet, can anyone help as I am going insane :(
Thanks all
The issue you have is in the call to InsertWorksheet in AddSheetData. You are calling the InsertWorksheet method irrespective of whether or not the worksheet already exists. Instead of doing that, you can first search for the worksheet then if it exists you can use it and if it doesn't you can create a new one.
Firstly, you can search for a WorksheetPart by its name using a method such as this one (taken from my answer here):
private static WorksheetPart GetWorksheetPartBySheetName(WorkbookPart workbookPart, string sheetName)
{
WorksheetPart worksheetPart = null;
//find the sheet (note this is case-sensitive)
IEnumerable<Sheet> sheets = workbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>().Where(s => s.Name == sheetName);
if (sheets.Count() > 0)
{
string relationshipId = sheets.First().Id.Value;
worksheetPart = (WorksheetPart)workbookPart.GetPartById(relationshipId);
}
return worksheetPart;
}
If that method finds the WorksheetPart then it will return it, if not it will return null.
Once you have that you just need a small tweak to AddSheetData to call GetWorksheetPartBySheetName then only call InsertWorksheet if that method returns null. To do that you can replace this line
WorksheetPart wsPart = InsertWorksheet(wsName, sSheet.WorkbookPart);
with this
WorksheetPart wsPart = GetWorksheetPartBySheetName(sSheet.WorkbookPart, wsName);
if (wsPart == null)
wsPart = InsertWorksheet(wsName, sSheet.WorkbookPart);

DocumentFormat.openxml Excel File Reading Issue

I have used DocumentFormat.OpenXml dll in one of my project for reading and writing excel file.
During Reading of Excel File, Let's say for some column say Column1 I am having cell values as "TRUE" and "FALSE". When I read this Excel File using Following Code
private SharedStringTable sharedStringTable;
using (SpreadsheetDocument doc = SpreadsheetDocument.Open(fileName, false))
{
WorkbookPart workbookPart = doc.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.FirstOrDefault();
SharedStringTablePart sharedStringTablePart = workbookPart.SharedStringTablePart;
if (sharedStringTablePart != null)
{
sharedStringTable = sharedStringTablePart.SharedStringTable;
}
var sheets = workbookPart.Workbook.Sheets;
foreach (Sheet sheet in sheets)
{
Worksheet requiredItem = (doc.WorkbookPart.GetPartById(sheet.Id.Value) as WorksheetPart).Worksheet;
var sheetData = requiredItem.Elements<SheetData>().First();
foreach (var rowItem in sheetData.Elements<Row>())
{
foreach (var item in rowItem.Elements<Cell>())
{
string requiredText = string.empty;
if (item.CellValue != null)
{
requiredText = item.CellValue.InnerText;
}
}
}
}
}
At that time for Cell Values "TRUE" and "FALSE" i am getting values 1 and 0 Respectively.
Can anyone provide me any way so that I can get values "TRUE" and "FALSE" instead of 1 and 0 ?

Categories