I tried to use Microsoft.Office.Interop.Excel but it's too slow when it comes to reading large excel documents (it was taking over 5 minutes for me). I read that DocumentFormat.OpenXml is faster when it comes to reading large excel documents but in the documentation it doesn't appear that I can't store the columns and row indexes.
For now, I am also only interested in the first row to get the column headers and I will be reading the rest of the document after some logic. I haven't been able to find a way to read only a portion of the excel document. I want to do something similar to this:
int r = 1; //row index
int c = 1; //column index
while (xlRange.Cells[r,c] != null && xlRange.Cells[r, c].Value2 != null)
{
TagListData.Add(new TagClass { IsTagSelected = false, TagName = xlRange[r, c].Value2.toString(), rIndex = r, cIndex = c });
c += 3;
}
Users will be picking excel documents through openFileDialog so there's no fixed number of rows of columns I can use. Is there a way I could make this work?
Thank you
In OpenXML if a cell has no text it might or might not appear in the list of cells (depends on whether it ever had text or not). Therefore the while (...Value2 != null) type of approach isn't really a safe way to do things in OpenXML.
Here is a very simple approach to reading the first row (written using LINQPad hence the Main and the Dump). Note the (simplified) use of the SharedStringTable to get the real text of the cell:
void Main()
{
var fileName = #"c:\temp\openxml-read-row.xlsx";
using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
using (SpreadsheetDocument doc = SpreadsheetDocument.Open(fs, false))
{
// Get the necessary bits of the doc
WorkbookPart workbookPart = doc.WorkbookPart;
SharedStringTablePart sstpart = workbookPart.GetPartsOfType<SharedStringTablePart>().First();
SharedStringTable sst = sstpart.SharedStringTable;
WorkbookStylesPart ssp = workbookPart.GetPartsOfType<WorkbookStylesPart>().First();
Stylesheet ss = ssp.Stylesheet;
// Get the first worksheet
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
Worksheet sheet = worksheetPart.Worksheet;
var rows = sheet.Descendants<Row>();
var row = rows.First();
foreach (var cell in row.Descendants<Cell>())
{
var txt = GetCellText(cell, sst);
// LINQPad specific method .Dump()
$"{cell.CellReference} = {txt}".Dump();
}
}
}
}
// Very basic way to get the text of a cell
private string GetCellText(Cell cell, SharedStringTable sst)
{
if (cell == null)
return "";
if ((cell.DataType != null) && (cell.DataType == CellValues.SharedString))
{
int ssid = int.Parse(cell.CellValue.Text);
string str = sst.ChildElements[ssid].InnerText;
return str;
}
else if (cell.CellValue != null)
{
return cell.CellValue.Text;
}
return "";
}
However... there's potentially a lot of work involved with OpenXML and you'd be well advised to try and use something like ClosedXML or EPPlus instead.
eg using ClosedXML
using (var workbook = new XLWorkbook(fileName))
{
var worksheet = workbook.Worksheets.First();
var row = worksheet.Row(1);
foreach (var cell in row.CellsUsed())
{
var txt = cell.Value.ToString();
// LINQPad specific method .Dump()
$"{cell.Address.ToString()} = {txt}".Dump();
}
}
Related
I want to implement openXml sdk 2.5 into my project. I do everything in this link
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;
using System.IO.Packaging;
static void Main(string[] args)
{
String fileName = #"C:\OPENXML\BigData.xlsx";
// Comment one of the following lines to test the method separately.
ReadExcelFileDOM(fileName); // DOM
//ReadExcelFileSAX(fileName); // SAX
}
// The DOM approach.
// Note that the code below works only for cells that contain numeric values.
//
static void ReadExcelFileDOM(string fileName)
{
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(fileName, false))
{
WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
string text;
int rowCount= sheetData.Elements<Row>().Count();
foreach (Row r in sheetData.Elements<Row>())
{
foreach (Cell c in r.Elements<Cell>())
{
text = c.CellValue.Text;
Console.Write(text + " ");
}
}
Console.WriteLine();
Console.ReadKey();
}
}
But i am not getting any row. It hasn't entered loop. Note: I also set up openXml sdk 2.5 my computer
And I find below code this is work for numeric value.For string value it writes 0 1 2 ...
private static void Main(string[] args)
{
var filePath = #"C:/OPENXML/BigData.xlsx";
using (var document = SpreadsheetDocument.Open(filePath, false))
{
var workbookPart = document.WorkbookPart;
var workbook = workbookPart.Workbook;
var sheets = workbook.Descendants<Sheet>();
foreach (var sheet in sheets)
{
var worksheetPart = (WorksheetPart)workbookPart.GetPartById(sheet.Id);
var sharedStringPart = workbookPart.SharedStringTablePart;
//var values = sharedStringPart.SharedStringTable.Elements<SharedStringItem>().ToArray();
string text;
var rows = worksheetPart.Worksheet.Descendants<Row>();
foreach (var row in rows)
{
Console.WriteLine();
int count = row.Elements<Cell>().Count();
foreach (Cell c in row.Elements<Cell>())
{
text = c.CellValue.InnerText;
Console.Write(text + " ");
}
}
}
}
Console.ReadLine();
}
Your approach seemed to work ok for me - in that it did "enter the loop".
Nevertheless you could also try something like the following:
void Main()
{
string fileName = #"c:\path\to\my\file.xlsx";
using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
using (SpreadsheetDocument doc = SpreadsheetDocument.Open(fs, false))
{
WorkbookPart workbookPart = doc.WorkbookPart;
SharedStringTablePart sstpart = workbookPart.GetPartsOfType<SharedStringTablePart>().First();
SharedStringTable sst = sstpart.SharedStringTable;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
Worksheet sheet = worksheetPart.Worksheet;
var cells = sheet.Descendants<Cell>();
var rows = sheet.Descendants<Row>();
Console.WriteLine("Row count = {0}", rows.LongCount());
Console.WriteLine("Cell count = {0}", cells.LongCount());
// One way: go through each cell in the sheet
foreach (Cell cell in cells)
{
if ((cell.DataType != null) && (cell.DataType == CellValues.SharedString))
{
int ssid = int.Parse(cell.CellValue.Text);
string str = sst.ChildElements[ssid].InnerText;
Console.WriteLine("Shared string {0}: {1}", ssid, str);
}
else if (cell.CellValue != null)
{
Console.WriteLine("Cell contents: {0}", cell.CellValue.Text);
}
}
// Or... via each row
foreach (Row row in rows)
{
foreach (Cell c in row.Elements<Cell>())
{
if ((c.DataType != null) && (c.DataType == CellValues.SharedString))
{
int ssid = int.Parse(c.CellValue.Text);
string str = sst.ChildElements[ssid].InnerText;
Console.WriteLine("Shared string {0}: {1}", ssid, str);
}
else if (c.CellValue != null)
{
Console.WriteLine("Cell contents: {0}", c.CellValue.Text);
}
}
}
}
}
}
I used the filestream approach to open the workbook because this allows you to open it with shared access - so that you can have the workbook open in Excel at the same time. The Spreadsheet.Open(... method won't work if the workbook is open elsewhere.
Perhaps that is why your code didn't work.
Note, also, the use of the SharedStringTable to get the cell text where appropriate.
EDIT 2018-07-11:
Since this post is still getting votes I should also point out that in many cases it may be a lot easier to use ClosedXML to manipulate/read/edit your workbooks. The documentation examples are pretty user friendly and the coding is, in my limited experience, much more straight forward. Just be aware that it does not (yet) implement all the Excel functions (for example INDEX and MATCH) which may or may not be an issue. [Not that I would want to be trying to deal with INDEX and MATCH in OpenXML anyway.]
I had the same issue as the OP, and the answer above did not work for me.
I think this is the issue: when you create a document in Excel (not programmatically), you have 3 sheets by default and the WorksheetParts that has the row data for Sheet1 is the last WorksheetParts element, not the first.
I figured this out by putting a watch for document.WorkbookPart.WorksheetParts in Visual Studio, expanding Results, then looking at all of the sub elements until I found a SheetData object where HasChildren = true.
Try this:
// open the document read-only
SpreadSheetDocument document = SpreadsheetDocument.Open(filePath, false);
SharedStringTable sharedStringTable = document.WorkbookPart.SharedStringTablePart.SharedStringTable;
string cellValue = null;
foreach (WorksheetPart worksheetPart in document.WorkbookPart.WorksheetParts)
{
foreach (SheetData sheetData in worksheetPart.Worksheet.Elements<SheetData>())
{
if (sheetData.HasChildren)
{
foreach (Row row in sheetData.Elements<Row>())
{
foreach (Cell cell in row.Elements<Cell>())
{
cellValue = cell.InnerText;
if (cell.DataType == CellValues.SharedString)
{
Console.WriteLine("cell val: " + sharedStringTable.ElementAt(Int32.Parse(cellValue)).InnerText);
}
else
{
Console.WriteLine("cell val: " + cellValue);
}
}
}
}
}
}
document.Close();
Read Large Excel :
openxml has two approaches of DOM and SAX to read an excel. the DOM one consume more RAM resource since it loads the whole xml content(Excel file) in Memory but its strong typed approach.
SAX in other hand is event base parse. more here
so if you are facing large excel file its better to use SAX.
the below code sample uses SAX approach and also handle two important scenario in excel file reading.
open xml skips the empty cells so your dataset faces displacement and wrong index.
you need to skip the empty rows also.
this function returns the exact actual index of the cell at the time and handle the first scenario.
from here
private static int CellReferenceToIndex(Cell cell)
{
int index = 0;
string reference = cell.CellReference.ToString().ToUpper();
foreach (char ch in reference)
{
if (Char.IsLetter(ch))
{
int value = (int)ch - (int)'A';
index = (index == 0) ? value : ((index + 1) * 26) + value;
}
else
return index;
}
return index;
}
code to read excel sax approach.
//i want to import excel to data table
dt = new DataTable();
using (SpreadsheetDocument document = SpreadsheetDocument.Open(path, false))
{
WorkbookPart workbookPart = document.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
//row counter
int rcnt = 0;
while (reader.Read())
{
//find xml row element type
//to understand the element type you can change your excel file eg : test.xlsx to test.zip
//and inside that you may observe the elements in xl/worksheets/sheet.xml
//that helps to understand openxml better
if (reader.ElementType == typeof(Row))
{
//create data table row type to be populated by cells of this row
DataRow tempRow = dt.NewRow();
//***** HANDLE THE SECOND SENARIO*****
//if row has attribute means it is not a empty row
if (reader.HasAttributes)
{
//read the child of row element which is cells
//here first element
reader.ReadFirstChild();
do
{
//find xml cell element type
if (reader.ElementType == typeof(Cell))
{
Cell c = (Cell)reader.LoadCurrentElement();
string cellValue;
int actualCellIndex = CellReferenceToIndex(c);
if (c.DataType != null && c.DataType == CellValues.SharedString)
{
SharedStringItem ssi = workbookPart.SharedStringTablePart.SharedStringTable.Elements<SharedStringItem>().ElementAt(int.Parse(c.CellValue.InnerText));
cellValue = ssi.Text.Text;
}
else
{
cellValue = c.CellValue.InnerText;
}
//if row index is 0 its header so columns headers are added & also can do some headers check incase
if (rcnt == 0)
{
dt.Columns.Add(cellValue);
}
else
{
// instead of tempRow[c.CellReference] = cellValue;
tempRow[actualCellIndex] = cellValue;
}
}
}
while (reader.ReadNextSibling());
//if its not the header row so append rowdata to the datatable
if (rcnt != 0)
{
dt.Rows.Add(tempRow);
}
rcnt++;
}
}
}
}
Everything is explained in the accepted answer.
Here is just an extension method to solve the problem
public static string GetCellText(this Cell cell, in SharedStringTable sst)
{
if (cell.CellValue is null)
return string.Empty;
if ((cell.DataType is not null) &&
(cell.DataType == CellValues.SharedString))
{
int ssid = int.Parse(cell.CellValue.Text);
return sst.ChildElements[ssid].InnerText;
}
return cell.CellValue.Text;
}
I have been trying for a couple of days now to insert a string into an openxml spreadsheet. Everything else (so far) works, everything but that.
This is the code i'm currently running (note, this is purely for testing purposes and is pretty basic):
using (SpreadsheetDocument spreadSheet = SpreadsheetDocument.Create(file + "test.zip", SpreadsheetDocumentType.Workbook))
{
spreadSheet.AddWorkbookPart();
spreadSheet.WorkbookPart.Workbook = new Workbook();
spreadSheet.WorkbookPart.AddNewPart<SharedStringTablePart>();
spreadSheet.WorkbookPart.SharedStringTablePart.SharedStringTable = new SharedStringTable() {Count=1, UniqueCount=1};
spreadSheet.WorkbookPart.SharedStringTablePart.SharedStringTable.AppendChild(new SharedStringItem(new Text("test")));
spreadSheet.WorkbookPart.SharedStringTablePart.SharedStringTable.Save();
preadSheet.WorkbookPart.AddNewPart<WorksheetPart>();
spreadSheet.WorkbookPart.WorksheetParts.First().Worksheet = new Worksheet();
spreadSheet.WorkbookPart.WorksheetParts.First().Worksheet.AppendChild(new SheetData());
spreadSheet.WorkbookPart.WorksheetParts.First().Worksheet.First().AppendChild(new Row());
Row r2 = new Row() { RowIndex = 5 };
spreadSheet.WorkbookPart.WorksheetParts.First().Worksheet.First().AppendChild(r2);
r2.AppendChild(new Cell() { CellReference = "A5", CellValue = new CellValue("0"), DataType = new EnumValue<CellValues>(CellValues.SharedString) });
spreadSheet.WorkbookPart.WorksheetParts.First().Worksheet.Save();
spreadSheet.WorkbookPart.Workbook.GetFirstChild<Sheets>().AppendChild(new Sheet()
{
Id = spreadSheet.WorkbookPart.GetIdOfPart(spreadSheet.WorkbookPart.WorksheetParts.First()),
SheetId = 1,
Name = "test"
});
spreadSheet.WorkbookPart.Workbook.Save();
}
Everything seems to work, the file saves where i want it to and, generally, looks the way i expect it to. The "only" issue is that, when i add the string to the cell, excel will give me an error saying that the file is corrupt and continues to delete said cell.
Am i doing something wrong?
Try this, in your method..
if (spreadSheet.WorkbookPart.GetPartsOfType<SharedStringTablePart>().Count() > 0)
{
shareStringPart = spreadSheet.WorkbookPart.GetPartsOfType<SharedStringTablePart>().First();
}
else
{
shareStringPart = spreadSheet.WorkbookPart.AddNewPart<SharedStringTablePart>();
}
index = InsertSharedStringItem(cell_value, shareStringPart);
cell.CellValue = new CellValue(index.ToString());
cell.DataType = new EnumValue<CellValues>(CellValues.SharedString);
InsertSharedString Method:
private static int InsertSharedStringItem(string text, SharedStringTablePart shareStringPart)
{
// If the part does not contain a SharedStringTable, create one.
if (shareStringPart.SharedStringTable == null)
{
shareStringPart.SharedStringTable = new SharedStringTable();
}
int i = 0;
// Iterate through all the items in the SharedStringTable. If the text already exists, return its index.
foreach (SharedStringItem item in shareStringPart.SharedStringTable.Elements<SharedStringItem>())
{
if (item.InnerText == text)
{
return i;
}
i++;
}
// The text does not exist in the part. Create the SharedStringItem and return its index.
shareStringPart.SharedStringTable.AppendChild(new SharedStringItem(new DocumentFormat.OpenXml.Spreadsheet.Text(text)));
shareStringPart.SharedStringTable.Save();
return i;
}
I think you should create a spreadsheet (using Excel), add the text into the cell and then open this spreadsheet in "OpenXML 2.5 Productivity Tool". There is a "Reflect code" button in the productivity tool that would help you replicate in code what needs to be done. That's the easiest way, I've found to solve such bugs.
I know that there are different ways to read an Excel file:
Iterop
Oledb
Open Xml SDK
Compatibility is not a question because the program will be executed in a controlled environment.
My Requirement :
Read a file to a DataTable / CUstom Entities (I don't know how to make dynamic properties/fields to an object[column names will be variating in an Excel file])
Use DataTable/Custom Entities to perform some operations using its data.
Update DataTable with the results of the operations
Write it back to excel file.
Which would be simpler.
Also if possible advice me on custom Entities (adding properties/fields to an object dynamically)
Take a look at Linq-to-Excel. It's pretty neat.
var book = new LinqToExcel.ExcelQueryFactory(#"File.xlsx");
var query =
from row in book.Worksheet("Stock Entry")
let item = new
{
Code = row["Code"].Cast<string>(),
Supplier = row["Supplier"].Cast<string>(),
Ref = row["Ref"].Cast<string>(),
}
where item.Supplier == "Walmart"
select item;
It also allows for strongly-typed row access too.
I realize this question was asked nearly 7 years ago but it's still a top Google search result for certain keywords regarding importing excel data with C#, so I wanted to provide an alternative based on some recent tech developments.
Importing Excel data has become such a common task to my everyday duties, that I've streamlined the process and documented the method on my blog: best way to read excel file in c#.
I use NPOI because it can read/write Excel files without Microsoft Office installed and it doesn't use COM+ or any interops. That means it can work in the cloud!
But the real magic comes from pairing up with NPOI Mapper from Donny Tian because it allows me to map the Excel columns to properties in my C# classes without writing any code. It's beautiful.
Here is the basic idea:
I create a .net class that matches/maps the Excel columns I'm interested in:
class CustomExcelFormat
{
[Column("District")]
public int District { get; set; }
[Column("DM")]
public string FullName { get; set; }
[Column("Email Address")]
public string EmailAddress { get; set; }
[Column("Username")]
public string Username { get; set; }
public string FirstName
{
get
{
return Username.Split('.')[0];
}
}
public string LastName
{
get
{
return Username.Split('.')[1];
}
}
}
Notice, it allows me to map based on column name if I want to!
Then when I process the excel file all I need to do is something like this:
public void Execute(string localPath, int sheetIndex)
{
IWorkbook workbook;
using (FileStream file = new FileStream(localPath, FileMode.Open, FileAccess.Read))
{
workbook = WorkbookFactory.Create(file);
}
var importer = new Mapper(workbook);
var items = importer.Take<CustomExcelFormat>(sheetIndex);
foreach(var item in items)
{
var row = item.Value;
if (string.IsNullOrEmpty(row.EmailAddress))
continue;
UpdateUser(row);
}
DataContext.SaveChanges();
}
Now, admittedly, my code does not modify the Excel file itself. I am instead saving the data to a database using Entity Framework (that's why you see "UpdateUser" and "SaveChanges" in my example). But there is already a good discussion on SO about how to save/modify a file using NPOI.
Using OLE Query, it's quite simple (e.g. sheetName is Sheet1):
DataTable LoadWorksheetInDataTable(string fileName, string sheetName)
{
DataTable sheetData = new DataTable();
using (OleDbConnection conn = this.returnConnection(fileName))
{
conn.Open();
// retrieve the data using data adapter
OleDbDataAdapter sheetAdapter = new OleDbDataAdapter("select * from [" + sheetName + "$]", conn);
sheetAdapter.Fill(sheetData);
conn.Close();
}
return sheetData;
}
private OleDbConnection returnConnection(string fileName)
{
return new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + fileName + "; Jet OLEDB:Engine Type=5;Extended Properties=\"Excel 8.0;\"");
}
For newer Excel versions:
return new OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + fileName + ";Extended Properties=Excel 12.0;");
You can also use Excel Data Reader an open source project on CodePlex. Its works really well to export data from Excel sheets.
The sample code given on the link specified:
FileStream stream = File.Open(filePath, FileMode.Open, FileAccess.Read);
//1. Reading from a binary Excel file ('97-2003 format; *.xls)
IExcelDataReader excelReader = ExcelReaderFactory.CreateBinaryReader(stream);
//...
//2. Reading from a OpenXml Excel file (2007 format; *.xlsx)
IExcelDataReader excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
//...
//3. DataSet - The result of each spreadsheet will be created in the result.Tables
DataSet result = excelReader.AsDataSet();
//...
//4. DataSet - Create column names from first row
excelReader.IsFirstRowAsColumnNames = true;
DataSet result = excelReader.AsDataSet();
//5. Data Reader methods
while (excelReader.Read())
{
//excelReader.GetInt32(0);
}
//6. Free resources (IExcelDataReader is IDisposable)
excelReader.Close();
Reference: How do I import from Excel to a DataSet using Microsoft.Office.Interop.Excel?
Try to use this free way to this, https://freenetexcel.codeplex.com
Workbook workbook = new Workbook();
workbook.LoadFromFile(#"..\..\parts.xls",ExcelVersion.Version97to2003);
//Initialize worksheet
Worksheet sheet = workbook.Worksheets[0];
DataTable dataTable = sheet.ExportDataTable();
If you can restrict it to just (Open Office XML format) *.xlsx files, then probably the most popular library would be EPPLus.
Bonus is, there are no other dependencies. Just install using nuget:
Install-Package EPPlus
Try to use Aspose.cells library (not free, but trial is enough to read), it is quite good
Install-package Aspose.cells
There is sample code:
using Aspose.Cells;
using System;
namespace ExcelReader
{
class Program
{
static void Main(string[] args)
{
// Replace path for your file
readXLS(#"C:\MyExcelFile.xls"); // or "*.xlsx"
Console.ReadKey();
}
public static void readXLS(string PathToMyExcel)
{
//Open your template file.
Workbook wb = new Workbook(PathToMyExcel);
//Get the first worksheet.
Worksheet worksheet = wb.Worksheets[0];
//Get cells
Cells cells = worksheet.Cells;
// Get row and column count
int rowCount = cells.MaxDataRow;
int columnCount = cells.MaxDataColumn;
// Current cell value
string strCell = "";
Console.WriteLine(String.Format("rowCount={0}, columnCount={1}", rowCount, columnCount));
for (int row = 0; row <= rowCount; row++) // Numeration starts from 0 to MaxDataRow
{
for (int column = 0; column <= columnCount; column++) // Numeration starts from 0 to MaxDataColumn
{
strCell = "";
strCell = Convert.ToString(cells[row, column].Value);
if (String.IsNullOrEmpty(strCell))
{
continue;
}
else
{
// Do your staff here
Console.WriteLine(strCell);
}
}
}
}
}
}
Read from excel, modify and write back
/// <summary>
/// /Reads an excel file and converts it into dataset with each sheet as each table of the dataset
/// </summary>
/// <param name="filename"></param>
/// <param name="headers">If set to true the first row will be considered as headers</param>
/// <returns></returns>
public DataSet Import(string filename, bool headers = true)
{
var _xl = new Excel.Application();
var wb = _xl.Workbooks.Open(filename);
var sheets = wb.Sheets;
DataSet dataSet = null;
if (sheets != null && sheets.Count != 0)
{
dataSet = new DataSet();
foreach (var item in sheets)
{
var sheet = (Excel.Worksheet)item;
DataTable dt = null;
if (sheet != null)
{
dt = new DataTable();
var ColumnCount = ((Excel.Range)sheet.UsedRange.Rows[1, Type.Missing]).Columns.Count;
var rowCount = ((Excel.Range)sheet.UsedRange.Columns[1, Type.Missing]).Rows.Count;
for (int j = 0; j < ColumnCount; j++)
{
var cell = (Excel.Range)sheet.Cells[1, j + 1];
var column = new DataColumn(headers ? cell.Value : string.Empty);
dt.Columns.Add(column);
}
for (int i = 0; i < rowCount; i++)
{
var r = dt.NewRow();
for (int j = 0; j < ColumnCount; j++)
{
var cell = (Excel.Range)sheet.Cells[i + 1 + (headers ? 1 : 0), j + 1];
r[j] = cell.Value;
}
dt.Rows.Add(r);
}
}
dataSet.Tables.Add(dt);
}
}
_xl.Quit();
return dataSet;
}
public string Export(DataTable dt, bool headers = false)
{
var wb = _xl.Workbooks.Add();
var sheet = (Excel.Worksheet)wb.ActiveSheet;
//process columns
for (int i = 0; i < dt.Columns.Count; i++)
{
var col = dt.Columns[i];
//added columns to the top of sheet
var currentCell = (Excel.Range)sheet.Cells[1, i + 1];
currentCell.Value = col.ToString();
currentCell.Font.Bold = true;
//process rows
for (int j = 0; j < dt.Rows.Count; j++)
{
var row = dt.Rows[j];
//added rows to sheet
var cell = (Excel.Range)sheet.Cells[j + 1 + 1, i + 1];
cell.Value = row[i];
}
currentCell.EntireColumn.AutoFit();
}
var fileName="{somepath/somefile.xlsx}";
wb.SaveCopyAs(fileName);
_xl.Quit();
return fileName;
}
I used Office's NuGet Package: DocumentFormat.OpenXml and pieced together the code from that component's doc site.
With the below helper code, was similar in complexity to my other CSV file format parsing in that project...
public static async Task ImportXLSX(Stream stream, string sheetName) {
{
// This was necessary for my Blazor project, which used a BrowserFileStream object
MemoryStream ms = new MemoryStream();
await stream.CopyToAsync(ms);
using (var document = SpreadsheetDocument.Open(ms, false))
{
// Retrieve a reference to the workbook part.
WorkbookPart wbPart = document.WorkbookPart;
// Find the sheet with the supplied name, and then use that
// Sheet object to retrieve a reference to the first worksheet.
Sheet theSheet = wbPart?.Workbook.Descendants<Sheet>().Where(s => s?.Name == sheetName).FirstOrDefault();
// Throw an exception if there is no sheet.
if (theSheet == null)
{
throw new ArgumentException("sheetName");
}
WorksheetPart wsPart = (WorksheetPart)(wbPart.GetPartById(theSheet.Id));
// For shared strings, look up the value in the
// shared strings table.
var stringTable =
wbPart.GetPartsOfType<SharedStringTablePart>()
.FirstOrDefault();
// I needed to grab 4 cells from each row
// Starting at row 11, until the cell in column A is blank
int row = 11;
while (true) {
var accountNameCell = GetCell(wsPart, "A" + row.ToString());
var accountName = GetValue(accountNameCell, stringTable);
if (string.IsNullOrEmpty(accountName)) {
break;
}
var investmentNameCell = GetCell(wsPart, "B" + row.ToString());
var investmentName = GetValue(investmentNameCell, stringTable);
var symbolCell = GetCell(wsPart, "D" + row.ToString());
var symbol = GetValue(symbolCell, stringTable);
var marketValue = GetCell(wsPart, "J" + row.ToString()).InnerText;
// DO STUFF with data
row++;
}
}
}
private static string? GetValue(Cell cell, SharedStringTablePart stringTable) {
try {
return stringTable.SharedStringTable.ElementAt(int.Parse(cell.InnerText)).InnerText;
} catch (Exception) {
return null;
}
}
private static Cell GetCell(WorksheetPart wsPart, string cellReference) {
return wsPart.Worksheet.Descendants<Cell>().Where(c => c.CellReference.Value == cellReference)?.FirstOrDefault();
}
I have been struggling to find a solution on how to read a large xlsx file with OpenXml. I have tried the microsoft samples without luck. I simply need to read an excel file into a DataTable in c#. I am not concerned with value types in the datatable, everything can be stored as a string values.
The samples I have found so far don't retain the structure of the spreadsheet and only return the values of the cells.
Any ideas?
The open xml SDK can be a little hard to understand. However, I have found it useful to use http://simpleooxml.codeplex.com/ this code plex project. It adds a thin layer over the sdk to more easily parse through excel files and work with styles.
Then you can use something like the following with their worksheet reader to recurse through and grab the values you want
System.IO.MemoryStream ms = Utility.StreamToMemory(xslxTemplate);
using (SpreadsheetDocument document = SpreadsheetDocument.Open(ms, true))
{
IEnumerable<Sheet> sheets = document.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
if (sheets.Count() == 0)
{
// The specified worksheet does not exist.
return null;
}
string relationshipId = sheets.First().Id.Value;
WorksheetPart worksheetPart = (WorksheetPart)document.WorkbookPart.GetPartById(relationshipId);
string myval =WorksheetReader.GetCell("A", 0, worksheetPart).CellValue.InnerText;
// Put in a loop to go through contents of document
}
You can get DataTable this way:
using (SpreadsheetDocument spreadsheet = SpreadsheetDocument.Open(fileName, false))
{
DataTable data = ToDataTable(spreadsheet, "Employees");
}
This method will read Excel sheet data as DataTable
public DataTable ToDataTable(SpreadsheetDocument spreadsheet, string worksheetName)
{
var workbookPart = spreadsheet.WorkbookPart;
var sheet = workbookPart
.Workbook
.Descendants<Sheet>()
.FirstOrDefault(s => s.Name == worksheetName);
var worksheetPart = sheet == null
? null
: workbookPart.GetPartById(sheet.Id) as WorksheetPart;
var dataTable = new DataTable();
if (worksheetPart != null)
{
var sheetData = worksheetPart.Worksheet.GetFirstChild<SheetData>();
foreach (Row row in sheetData.Descendants<Row>())
{
var values = row
.Descendants<Cell>()
.Select(cell =>
{
var value = cell.CellValue.InnerXml;
if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
{
value = workbookPart
.SharedStringTablePart
.SharedStringTable
.ChildElements[int.Parse(value)]
.InnerText;
}
return (object)value;
})
.ToArray();
dataTable.Rows.Add(values);
}
}
return dataTable;
}
I need to embed an Excel document in a Word document. I used the answer on this SO question -> How can I embed any file type into Microsoft Word using OpenXml 2.0;
Everything works fine except that:
DrawAspect = OVML.OleDrawAspectValues.Icon lets you edit the Excel object by opening a new Excel instance. However, when I edit the data, it is not updated in the Word document.
DrawAspect = OVML.OleDrawAspectValues.Content lets you edit the Excel object directly in the Word document.
My question is, what do I have to change in the code so can I edit the Excel object in the new instance and have it properly reflected in the Word document? I tried everything to no avail.
Something tells me that DrawAspect = OVML.OleDrawAspectValues.Icon suggests that the object acts as an Icon, and changes cannot be properly reflected in this icon.
You could try a way I prorosed here:
How to insert an Image in WORD after a bookmark using OpenXML.
In short, use Open XML SDK 2.0 Productivity Tool (which is a part of Open XML SDK 2.0). Do whatever you need to do with document manually in MS Word. Then open this file in Open XML SDK 2.0 Productivity Tool. Then find the edits you are interested in to see how it is represented in OpenXML format , as well as how to do that programmaticaly.
Hope that helps!
UPDATED:
Okay - I have got better now what's the problem is... So in addition to my advice above I would recommend you to look at this threads on MSDN Forum:
How to modify the imbedded Excel in a Word document supplying chart
data
Embedded excel sheet in word
I let myself to repost the code sample (posted on MSDN Forum by Ji Zhou) just to avoid the deletion of original thread there.
Hope it is helpful enough to retrieve the Excel object from Word, change some cells and embed it back into Word.
public static void Main()
{
using (WordprocessingDocument wDoc = WordprocessingDocument.Open(#"D:\test.docx", true))
{
Stream stream = wDoc.MainDocumentPart.ChartParts.First().EmbeddedPackagePart.GetStream();
using (SpreadsheetDocument ssDoc = SpreadsheetDocument.Open(stream, true))
{
WorkbookPart wbPart = ssDoc.WorkbookPart;
Sheet theSheet = wbPart.Workbook.Descendants<Sheet>().
Where(s => s.Name == "Sheet1").FirstOrDefault();
if (theSheet != null)
{
Worksheet ws = ((WorksheetPart)(wbPart.GetPartById(theSheet.Id))).Worksheet;
Cell theCell = InsertCellInWorksheet("C", 2, ws);
theCell.CellValue = new CellValue("5");
theCell.DataType = new EnumValue<CellValues>(CellValues.Number);
ws.Save();
}
}
}
}
private static Cell InsertCellInWorksheet(string columnName, uint rowIndex, Worksheet worksheet)
{
SheetData sheetData = worksheet.GetFirstChild<SheetData>();
string cellReference = columnName + rowIndex;
Row row;
if (sheetData.Elements<Row>().Where(r => r.RowIndex == rowIndex).Count() != 0)
{
row = sheetData.Elements<Row>().Where(r => r.RowIndex == rowIndex).First();
}
else
{
row = new Row() { RowIndex = rowIndex };
sheetData.Append(row);
}
if (row.Elements<Cell>().Where(c => c.CellReference.Value == columnName + rowIndex).Count() > 0)
{
return row.Elements<Cell>().Where(c => c.CellReference.Value == cellReference).First();
}
else
{
Cell refCell = null;
foreach (Cell cell in row.Elements<Cell>())
{
if (string.Compare(cell.CellReference.Value, cellReference, true) > 0)
{
refCell = cell;
break;
}
}
Cell newCell = new Cell() { CellReference = cellReference };
row.InsertBefore(newCell, refCell);
worksheet.Save();
return newCell;
}
}