Copying a worksheet in OpenXML that has previously been modified

I have followed the routine used here to copy a worksheet in a workbook into the same workbook by using .AddPart<>() and a temporary workbook to serve as a middleman, but have come up with some issues. Prior to copying the worksheet, I have made modifications to it. However, the copied version of the worksheet retains the original sheet, not the modified one. Here's the code:
Document = SpreadsheetDocument.Open(filename, true);
SetupPages(0, nSections);
CopySheet(1);
SetupPages() simply gets a worksheet which has a table with only one row and then copies that row nSections - 1 times, so that the table ends up with nSections rows. This works.
CopySheet() has the following code:
private void CopySheet(int sNum)
{
var tempSheet = SpreadsheetDocument.Create(new MemoryStream(), SpreadsheetDocumentType.Workbook);
WorkbookPart tempWBP = tempSheet.AddWorkbookPart();
var part = Document.XGetWorkSheetPart(sNum);
WorksheetPart tempWSP = tempWBP.AddPart<WorksheetPart>(part);
var copy = Document.WorkbookPart.AddPart<WorksheetPart>(tempWSP);
var sheets = Document.WorkbookPart.Workbook.GetFirstChild<Sheets>();
var sheet = new Sheet();
sheet.Id = Document.WorkbookPart.GetIdOfPart(copy);
sheet.Name = "AAA";
sheet.SheetId = (uint)sheets.ChildElements.Count + 1;
sheets.Append(sheet);
}
This method gets the desired workSheetPart and uses tempSheet.AddPart<>() to perform a deep copy of the desired part and then uses it again to deep copy it back into the original document. This way all the referenced objects contained within the desired workSheetPart are also copied.
This works, in as much as I now have a copy of the workSheet in my original document. However, all the modifications made by SetupPages() prior to calling CopySheet() do not appear in the copied workSheet. So I now have the original Sheet with (in this example) 11 rows and the copied Sheet with only 1 row.
I have tried placing a Document.WorkBookPart.WorkBook.Save() between SetupPages() and CopySheet(), to see if "saving the changes" would make them available to be copied as well, but to no avail.
Is there any OpenXML trick I am unaware of?
EDIT: Using the debugger I have just noticed that part.WorkSheet.ChildElements.Count = 15 (the expected number of rows after SetupPages). However, tempWBP.WorkSheet.ChildElements.Count = 5 (the number of rows in the original sheet prior to the modifications of SetupPages). For whatever reason, the new Rows added by my program are not being deep-copied. Now this I just don't understand.
Should it have something to do with some improper binding of the rows on my part or something, here's the definition of SetupPages():
public void SetupPages(int p, int nSections)
{
Regex rn = new Regex("[0-9]+");
Regex rs = new Regex("[A-Z]+");
SheetData sData = Document.XGet<SheetData>(2 * p + 1);
for (uint i = 1; i < nSections; i++)
{
var r = sData.ChildElements.GetItem(4).Clone() as Row;
r.RowIndex.Value += i;
foreach (OpenXmlElement _c in r.ChildElements)
{
var c = _c as Cell;
Match mn = rn.Match(c.CellReference.Value);
Match ms = rs.Match(c.CellReference.Value);
string str = (int.Parse(mn.Value) + i).ToString();
c.CellReference.Value = ms.Value + str;
}
(r.FirstChild as Cell).CellValue = new CellValue((i + 1).ToString());
sData.Append(r);
}
}
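The reference arithmetic inside SetupPages() can be isolated into a small helper, which makes it easier to check on its own. This is only a sketch: `CellRefHelper` and `ShiftRow` are hypothetical names, not part of the original code, and it assumes plain A1-style references (no `$` anchors, no sheet prefix).

```csharp
using System;
using System.Text.RegularExpressions;

public static class CellRefHelper
{
    // Splits an A1-style reference into column letters and a row number.
    private static readonly Regex A1 = new Regex(@"^([A-Z]+)([0-9]+)$");

    // Returns the reference moved down by rowOffset rows; the column is unchanged.
    public static string ShiftRow(string cellReference, uint rowOffset)
    {
        Match m = A1.Match(cellReference);
        if (!m.Success)
            throw new ArgumentException("Not an A1-style reference: " + cellReference);

        uint row = uint.Parse(m.Groups[2].Value) + rowOffset;
        return m.Groups[1].Value + row;
    }
}
```

Inside the loop, the body of the foreach would then reduce to something like `c.CellReference.Value = CellRefHelper.ShiftRow(c.CellReference.Value, i);`.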
EDIT 2: I've been able to make the copy work fully, but the fix is inelegant: it doesn't correct what is clearly an error in the code, it just papers over the problem. I basically take a clone of the original SheetData (which contains the modifications done in SetupPages) and set it as the SheetData of the copied Sheet. I'm posting this for information only; if someone could still point out what is wrong in the code that I'm not seeing, I'd be very thankful. Here's the "hacked" version of CopySheet(), if anyone is interested.
private void CopySheet(int sNum)
{
var tempSheet = SpreadsheetDocument.Create(new MemoryStream(), SpreadsheetDocumentType.Workbook);
WorkbookPart tempWBP = tempSheet.AddWorkbookPart();
var part = Document.XGetWorkSheetPart(sNum);
var b = part.Worksheet.ChildElements[5].Clone() as SheetData;
WorksheetPart tempWSP = tempWBP.AddPart<WorksheetPart>(part);
var copy = Document.WorkbookPart.AddPart<WorksheetPart>(tempWSP);
copy.Worksheet.RemoveChild<SheetData>(copy.Worksheet.ChildElements[5] as SheetData);
copy.Worksheet.InsertAt<SheetData>(b, 5);
var sheets = Document.WorkbookPart.Workbook.GetFirstChild<Sheets>();
var sheet = new Sheet();
sheet.Id = Document.WorkbookPart.GetIdOfPart(copy);
sheet.Name = "AAA";
sheet.SheetId = (uint)sheets.ChildElements.Count + 1;
sheets.Append(sheet);
}

I've solved this as best I can (and from what I've gathered from other forums I've asked, as best "they" can) by closing and opening the file prior to creating the copies. This makes the changes permanent and then, when copied, the changes are copied, too. With this, the "hack" described above becomes unnecessary. The final version of the code therefore became (with a change to avoid SheetID and Sheet.Name conflicts):
private void CopySheet(int sNum, int pNum, string type)
{
var tempSheet = SpreadsheetDocument.Create(new MemoryStream(), SpreadsheetDocumentType.Workbook);
WorkbookPart tempWBP = tempSheet.AddWorkbookPart();
var part = Document.XGetWorkSheetPart(sNum);
WorksheetPart tempWSP = tempWBP.AddPart<WorksheetPart>(part);
var copy = Document.WorkbookPart.AddPart<WorksheetPart>(tempWSP);
var sheets = Document.WorkbookPart.Workbook.GetFirstChild<Sheets>();
var sheet = new Sheet();
sheet.Id = Document.WorkbookPart.GetIdOfPart(copy);
sheet.Name = "Phase " + pNum + " " + type;
uint id = 1;
bool valid = false;
while (!valid)
{
uint temp = id;
foreach (OpenXmlElement e in sheets.ChildElements)
{
var s = e as Sheet;
if (id == s.SheetId.Value)
{
id++;
break;
}
}
if (temp == id)
valid = true;
}
sheet.SheetId = id;
sheets.Append(sheet);
}
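The SheetId search in the final CopySheet() looks for the smallest positive id that no existing sheet uses. The same idea can be sketched more compactly with a set lookup. `SheetIdHelper` and `FirstFreeSheetId` are hypothetical names introduced here for illustration; the helper takes plain `uint` values so it does not depend on the OpenXML types.

```csharp
using System.Collections.Generic;

public static class SheetIdHelper
{
    // Returns the smallest positive id that is not already in use.
    public static uint FirstFreeSheetId(IEnumerable<uint> usedIds)
    {
        var used = new HashSet<uint>(usedIds);
        uint id = 1;
        while (used.Contains(id))
            id++;
        return id;
    }
}
```

In CopySheet() it could presumably be called as `sheet.SheetId = SheetIdHelper.FirstFreeSheetId(sheets.Elements<Sheet>().Select(s => s.SheetId.Value));`, which needs `using System.Linq;`.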

Related

Reading docx file with table

I have a simple document with one table in it. I would like to read its cells content. I found many tutorials for writing, but none for reading.
I suppose I should enumerate sections, but how to know which contains a table?
var document = DocX.Create(@"mydoc.docx");
var s = document.GetSections();
foreach (var item in s)
{
}

I'm using the following namespace aliases:
using excel = Microsoft.Office.Interop.Excel;
using word = Microsoft.Office.Interop.Word;
You can specifically grab the tables using this code:
private void WordRunButton_Click(object sender, EventArgs e)
{
var excelApp = new excel.Application();
excel.Workbooks workbooks = excelApp.Workbooks;
var wordApp = new word.Application();
word.Documents documents = wordApp.Documents;
wordApp.Visible = false;
excelApp.Visible = false;
// You don't want your computer to actually load each one visibly; would ruin performance.
string[] fileDirectories = Directory.GetFiles("Some Directory", "*.doc*",
SearchOption.AllDirectories);
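int tableCount = 1; // Running number used to name each exported table (declaration assumed; missing in the original).
foreach (var item in fileDirectories)
{
word._Document document = documents.Open(item);
foreach (word.Table table in document.Tables)
{
string wordFile = item;
string appendName = Path.GetFileNameWithoutExtension(wordFile) + " Table " + tableCount + ".xlsx";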
//Not needed if you're not going to save each table individually
var workbook = excelApp.Workbooks.Add(1);
excel._Worksheet worksheet = (excel.Worksheet)workbook.Sheets[1];
for (int row = 1; row <= table.Rows.Count; row++)
{
for (int col = 1; col <= table.Columns.Count; col++)
{
var cell = table.Cell(row, col);
var range = cell.Range;
var text = range.Text;
var cleaned = excelApp.WorksheetFunction.Clean(text);
worksheet.Cells[row, col] = cleaned;
}
}
workbook.SaveAs(Path.Combine("Some Directory", Path.GetFileName(appendName)), excel.XlFileFormat.xlWorkbookDefault);
//Last arg can be whatever file extension you want
//just make sure it matches what you set above.
workbook.Close();
Marshal.ReleaseComObject(workbook);
tableCount++;
}
document.Close();
Marshal.ReleaseComObject(document);
}
//Microsoft apps are picky with memory. Make sure you close and release each instance once you're done with it.
//Failure to do so will result in many lingering apps in the background
// Close collections first, then quit each application once, then release the COM references.
workbooks.Close();
excelApp.Quit();
Marshal.ReleaseComObject(workbooks);
Marshal.ReleaseComObject(excelApp);
wordApp.Quit();
Marshal.ReleaseComObject(documents);
Marshal.ReleaseComObject(wordApp);
}
The document is the actual word document type (word.Document). Make sure you check for split cells if you have them!
Hope this helps!

If you only have one table in document it should be rather simple. Try this:
DocX doc = DocX.Load("C:\\Temp\\mydoc.docx");
Table t = doc.Tables[0];
//read cell content
string someText = t.Rows[0].Cells[0].Paragraphs[0].Text;
You can loop through table rows and table cells inside each row, and also through Paragraphs inside each Cells[i] if there are more paragraphs. You can do that with simple for loop:
for (int i = 0; i < t.Rows.Count; i++)
{
someText = t.Rows[i].Cells[0].Paragraphs[0].Text;
}
Hope it helps.

Reading large Excel files with c# and get the indexes

I tried to use Microsoft.Office.Interop.Excel but it's too slow when it comes to reading large Excel documents (it was taking over 5 minutes for me). I read that DocumentFormat.OpenXml is faster for large Excel documents, but from the documentation it doesn't appear that I can store the column and row indexes.
For now, I am also only interested in the first row to get the column headers and I will be reading the rest of the document after some logic. I haven't been able to find a way to read only a portion of the excel document. I want to do something similar to this:
int r = 1; //row index
int c = 1; //column index
while (xlRange.Cells[r,c] != null && xlRange.Cells[r, c].Value2 != null)
{
TagListData.Add(new TagClass { IsTagSelected = false, TagName = xlRange.Cells[r, c].Value2.ToString(), rIndex = r, cIndex = c });
c += 3;
}
Users will be picking Excel documents through openFileDialog, so there's no fixed number of rows or columns I can use. Is there a way I could make this work?
Thank you

In OpenXML if a cell has no text it might or might not appear in the list of cells (depends on whether it ever had text or not). Therefore the while (...Value2 != null) type of approach isn't really a safe way to do things in OpenXML.
Here is a very simple approach to reading the first row (written using LINQPad hence the Main and the Dump). Note the (simplified) use of the SharedStringTable to get the real text of the cell:
void Main()
{
var fileName = @"c:\temp\openxml-read-row.xlsx";
using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
using (SpreadsheetDocument doc = SpreadsheetDocument.Open(fs, false))
{
// Get the necessary bits of the doc
WorkbookPart workbookPart = doc.WorkbookPart;
SharedStringTablePart sstpart = workbookPart.GetPartsOfType<SharedStringTablePart>().First();
SharedStringTable sst = sstpart.SharedStringTable;
WorkbookStylesPart ssp = workbookPart.GetPartsOfType<WorkbookStylesPart>().First();
Stylesheet ss = ssp.Stylesheet;
// Get the first worksheet
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
Worksheet sheet = worksheetPart.Worksheet;
var rows = sheet.Descendants<Row>();
var row = rows.First();
foreach (var cell in row.Descendants<Cell>())
{
var txt = GetCellText(cell, sst);
// LINQPad specific method .Dump()
$"{cell.CellReference} = {txt}".Dump();
}
}
}
}
// Very basic way to get the text of a cell
private string GetCellText(Cell cell, SharedStringTable sst)
{
if (cell == null)
return "";
if ((cell.DataType != null) && (cell.DataType == CellValues.SharedString))
{
int ssid = int.Parse(cell.CellValue.Text);
string str = sst.ChildElements[ssid].InnerText;
return str;
}
else if (cell.CellValue != null)
{
return cell.CellValue.Text;
}
return "";
}
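Because empty cells may be missing from a row, a cell's position is more safely derived from its CellReference than from its ordinal position in the row. The following is a minimal sketch of that conversion, giving the row and column indexes the question asks for. `CellPosition` and `Parse` are hypothetical names, and the code assumes plain A1-style references without `$` anchors.

```csharp
using System;

public static class CellPosition
{
    // Converts an A1-style reference such as "AA10" into 1-based (row, column) indexes.
    public static (int Row, int Column) Parse(string cellReference)
    {
        int i = 0;
        int column = 0;

        // Column letters form a base-26 number with digits A=1 .. Z=26.
        while (i < cellReference.Length)
        {
            char ch = char.ToUpperInvariant(cellReference[i]);
            if (ch < 'A' || ch > 'Z')
                break;
            column = column * 26 + (ch - 'A' + 1);
            i++;
        }

        // Reject references with no letters or no row number.
        if (i == 0 || i == cellReference.Length)
            throw new FormatException("Not an A1-style reference: " + cellReference);

        int row = int.Parse(cellReference.Substring(i));
        return (row, column);
    }
}
```

Inside the foreach over `row.Descendants<Cell>()`, something like `var pos = CellPosition.Parse(cell.CellReference.Value);` would give `pos.Row` and `pos.Column` for each cell that is actually present.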
However... there's potentially a lot of work involved with OpenXML and you'd be well advised to try and use something like ClosedXML or EPPlus instead.
eg using ClosedXML
using (var workbook = new XLWorkbook(fileName))
{
var worksheet = workbook.Worksheets.First();
var row = worksheet.Row(1);
foreach (var cell in row.CellsUsed())
{
var txt = cell.Value.ToString();
// LINQPad specific method .Dump()
$"{cell.Address.ToString()} = {txt}".Dump();
}
}

Split a large Excel file into multiple, based on row count

I have a C# console application which needs a large Excel to be split into multiple Excel files based on the row count. The code below shows a source file with only 51 rows (including the header column rows) but the final source file will have 100,000+ rows.
The code is trying to skip the very first (header) row and then should copy from rows 2 through 11 and so on--I have the target files set to only 10 rows per file, to make developing faster.
Question So how do I copy rows 2 through 11 and subsequent 10 rows from the source Excel file and paste to multiple target Excel files so that the target files each will have 10 rows?
Here is the almost newly written code. It is loosely based on copying of specific range of excel cells from one worksheet to another worksheet and https://social.msdn.microsoft.com/Forums/vstudio/en-US/afd01976-63d0-4f96-9ba4-e3e2b6cf8d55/excel-with-c-how-to-specify-a-range-?forum=vsto
Now I am able to write 5 Excel files. But the first file has 9 rows (starting from row 2) while 2nd file has only 3 rows, starting with row 10, the 3rd has 13 rows starting, again, with row 10; the last two files have incrementally more rows, both starting with row 10.
So something wrong with my For Loop? Or the way I am selecting the ranges?
string startPath = System.IO.Path.GetDirectoryName(System.Diagnostics.Process.GetCurrentProcess().MainModule.FileName);
string filePath_source = Path.Combine(startPath, @"Source_Files\Offers_Source_Temp.xlsx");
string filePath_copiedinto = Path.Combine(startPath, @"Source_Files\ToBeCopiedInto.xlsx");
app = new Excel.Application();
app.DisplayAlerts = false;
book = app.Workbooks.Open(filePath_source);
sheet = (Excel.Worksheet)book.Worksheets.get_Item((1));
int iRowCount = sheet.UsedRange.Rows.Count;
int maxrows = 10;//change this to something like 50,000 later. 01/16/18
int maxloops = iRowCount / maxrows;
int beginrow = 2; //skipping the header row.
Excel.Application destxlApp;
Excel.Workbook destworkBook;
Excel.Worksheet destworkSheet;
Excel.Range destrange;
string srcPath;
string destPath;
//Opening of first worksheet and copying
srcPath = filePath_source;
for (int i = 1; i <= maxloops; i++) {
Excel.Range rng = (Excel.Range)sheet.Range[sheet.Cells[beginrow, 1], sheet.Cells[maxrows, 3]];
rng.Copy(Type.Missing);
//opening of the second worksheet and pasting
destPath = filePath_copiedinto;
destxlApp = new Excel.Application();
destxlApp.DisplayAlerts = false;
destworkBook = destxlApp.Workbooks.Open(destPath, 0, false);
destworkSheet = destworkBook.Worksheets.get_Item(1);
destrange = destworkSheet.Cells[1, 1];
destrange.Select();
destworkSheet.Paste(Type.Missing, Type.Missing);
destworkBook.SaveAs(startPath + "\\Output_Files\\" + beginrow + ".xlsx");
destworkBook.Close(true, null, null);
destxlApp.Quit();
beginrow = beginrow + maxrows;
string blah = null;
}

I would suggest using the OpenXml library for this task. It has no dependency on an installed copy of Excel and supports the whole OpenXml structure.
Here is a starting point for reading and writing rows:
using System;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;
// Open the document read-only (the second argument, isEditable, is false).
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(fileName, false))
{
WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
foreach (Row r in sheetData.Elements<Row>())
{
}
}
Now, writing is very similar:
using (SpreadsheetDocument spreadSheet = SpreadsheetDocument.Create(fileName,
SpreadsheetDocumentType.Workbook))
{
// create the workbook
spreadSheet.AddWorkbookPart();
spreadSheet.WorkbookPart.Workbook = new Workbook();
// create the worksheet
spreadSheet.WorkbookPart.AddNewPart<WorksheetPart>();
spreadSheet.WorkbookPart.WorksheetParts.First().Worksheet = new Worksheet();
// create sheet data
spreadSheet.WorkbookPart.WorksheetParts.First().Worksheet.AppendChild(new SheetData());
// create row
spreadSheet.WorkbookPart.WorksheetParts.First().Worksheet.First().AppendChild(new Row());
}

Got it! In my revised code in the Question, I came close but had some problem in the For Loop; fixed it per the code below. So here is the almost complete code. Thanks everyone for your help!!
try
{
string startPath = System.IO.Path.GetDirectoryName(System.Diagnostics.Process.GetCurrentProcess().MainModule.FileName);
string filePath_source = Path.Combine(startPath, @"Source_Files\Offers_Source_Temp.xlsx");
string filePath_copiedinto = Path.Combine(startPath, @"Source_Files\ToBeCopiedInto.xlsx");
app = new Excel.Application();
app.DisplayAlerts = false;
book = app.Workbooks.Open(filePath_source);
sheet = (Excel.Worksheet)book.Worksheets.get_Item((1));
int iRowCount = sheet.UsedRange.Rows.Count;
int countColumns = sheet.UsedRange.Columns.Count;
int maxrows = 10;//change this to something like 50,000 later. 01/16/18
int maxloops = iRowCount / maxrows;
int beginrow = 2; //skipping the header row.
Excel.Application destxlApp;
Excel.Workbook destworkBook;
Excel.Worksheet destworkSheet;
Excel.Range destrange;
string srcPath;
string destPath;
//Opening of first worksheet and copying
srcPath = filePath_source;
for (int i = 1; i <= maxloops; i++) {
/// Excel.Range rng = (Excel.Range)sheet.Range[sheet.Cells[beginrow, 1], sheet.Cells[maxrows, 3]];
Excel.Range startCell = sheet.Cells[beginrow, 1];//not sure the second parameter needed?
Excel.Range endCell = sheet.Cells[beginrow+maxrows-1, 3];//not sure the second parameter needed?
Excel.Range rng = sheet.Range[startCell, endCell];
rng = rng.EntireRow;//so second parameters above should not be needed. But doesn't work without it!
rng.Copy(Type.Missing);
//opening of the second worksheet and pasting
destPath = filePath_copiedinto;
destxlApp = new Excel.Application();
destxlApp.DisplayAlerts = false;
destworkBook = destxlApp.Workbooks.Open(destPath, 0, false);
destworkSheet = destworkBook.Worksheets.get_Item(1);
destrange = destworkSheet.Cells[1, 1];
destrange.Select();
destworkSheet.Paste(Type.Missing, Type.Missing);
destworkBook.SaveAs(startPath + "\\Output_Files\\" + beginrow + ".xlsx");
destworkBook.Close(true, null, null);
destxlApp.Quit();
beginrow = beginrow + maxrows;
}//for loop
}
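One thing to watch in the loop above: `maxloops = iRowCount / maxrows` is integer division, so when the row count is not an exact fit, the trailing rows are never copied. A minimal sketch of computing the row ranges up front, including a shorter final chunk, is shown here. `RowChunks` and `Split` are hypothetical names, not part of the original code.

```csharp
using System;
using System.Collections.Generic;

public static class RowChunks
{
    // Yields inclusive (first, last) row numbers for each output file.
    // firstDataRow skips any header rows; lastRow is the last used row in the sheet.
    public static IEnumerable<(int First, int Last)> Split(int firstDataRow, int lastRow, int rowsPerFile)
    {
        if (rowsPerFile <= 0)
            throw new ArgumentOutOfRangeException(nameof(rowsPerFile));

        for (int first = firstDataRow; first <= lastRow; first += rowsPerFile)
        {
            // The final chunk may be shorter than rowsPerFile.
            int last = Math.Min(first + rowsPerFile - 1, lastRow);
            yield return (first, last);
        }
    }
}
```

With the 51-row source file, `RowChunks.Split(2, 51, 10)` yields five ranges starting at rows 2, 12, 22, 32 and 42; each pair could then feed `sheet.Cells[first, 1]` and `sheet.Cells[last, 3]` in place of the manual `beginrow` bookkeeping.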

openxml 2.5, how to insert a string into a cell?

I have been trying for a couple of days now to insert a string into an openxml spreadsheet. Everything else (so far) works, everything but that.
This is the code I'm currently running (note: this is purely for testing purposes and is pretty basic):
using (SpreadsheetDocument spreadSheet = SpreadsheetDocument.Create(file + "test.zip", SpreadsheetDocumentType.Workbook))
{
spreadSheet.AddWorkbookPart();
spreadSheet.WorkbookPart.Workbook = new Workbook();
spreadSheet.WorkbookPart.AddNewPart<SharedStringTablePart>();
spreadSheet.WorkbookPart.SharedStringTablePart.SharedStringTable = new SharedStringTable() {Count=1, UniqueCount=1};
spreadSheet.WorkbookPart.SharedStringTablePart.SharedStringTable.AppendChild(new SharedStringItem(new Text("test")));
spreadSheet.WorkbookPart.SharedStringTablePart.SharedStringTable.Save();
spreadSheet.WorkbookPart.AddNewPart<WorksheetPart>();
spreadSheet.WorkbookPart.WorksheetParts.First().Worksheet = new Worksheet();
spreadSheet.WorkbookPart.WorksheetParts.First().Worksheet.AppendChild(new SheetData());
spreadSheet.WorkbookPart.WorksheetParts.First().Worksheet.First().AppendChild(new Row());
Row r2 = new Row() { RowIndex = 5 };
spreadSheet.WorkbookPart.WorksheetParts.First().Worksheet.First().AppendChild(r2);
r2.AppendChild(new Cell() { CellReference = "A5", CellValue = new CellValue("0"), DataType = new EnumValue<CellValues>(CellValues.SharedString) });
spreadSheet.WorkbookPart.WorksheetParts.First().Worksheet.Save();
spreadSheet.WorkbookPart.Workbook.GetFirstChild<Sheets>().AppendChild(new Sheet()
{
Id = spreadSheet.WorkbookPart.GetIdOfPart(spreadSheet.WorkbookPart.WorksheetParts.First()),
SheetId = 1,
Name = "test"
});
spreadSheet.WorkbookPart.Workbook.Save();
}
Everything seems to work: the file saves where I want it to and, generally, looks the way I expect it to. The "only" issue is that, when I add the string to the cell, Excel gives me an error saying that the file is corrupt and proceeds to delete said cell.
Am I doing something wrong?

Try this, in your method..
if (spreadSheet.WorkbookPart.GetPartsOfType<SharedStringTablePart>().Count() > 0)
{
shareStringPart = spreadSheet.WorkbookPart.GetPartsOfType<SharedStringTablePart>().First();
}
else
{
shareStringPart = spreadSheet.WorkbookPart.AddNewPart<SharedStringTablePart>();
}
index = InsertSharedStringItem(cell_value, shareStringPart);
cell.CellValue = new CellValue(index.ToString());
cell.DataType = new EnumValue<CellValues>(CellValues.SharedString);
InsertSharedStringItem method:
private static int InsertSharedStringItem(string text, SharedStringTablePart shareStringPart)
{
// If the part does not contain a SharedStringTable, create one.
if (shareStringPart.SharedStringTable == null)
{
shareStringPart.SharedStringTable = new SharedStringTable();
}
int i = 0;
// Iterate through all the items in the SharedStringTable. If the text already exists, return its index.
foreach (SharedStringItem item in shareStringPart.SharedStringTable.Elements<SharedStringItem>())
{
if (item.InnerText == text)
{
return i;
}
i++;
}
// The text does not exist in the part. Create the SharedStringItem and return its index.
shareStringPart.SharedStringTable.AppendChild(new SharedStringItem(new DocumentFormat.OpenXml.Spreadsheet.Text(text)));
shareStringPart.SharedStringTable.Save();
return i;
}

I think you should create a spreadsheet (using Excel), add the text into the cell and then open this spreadsheet in "OpenXML 2.5 Productivity Tool". There is a "Reflect code" button in the productivity tool that would help you replicate in code what needs to be done. That's the easiest way, I've found to solve such bugs.

Optimal way to Read an Excel file (.xls/.xlsx)

I know that there are different ways to read an Excel file:
Interop
Oledb
Open Xml SDK
Compatibility is not a question because the program will be executed in a controlled environment.
My Requirement :
Read a file to a DataTable / Custom Entities (I don't know how to add dynamic properties/fields to an object; the column names will vary between Excel files)
Use DataTable/Custom Entities to perform some operations using its data.
Update DataTable with the results of the operations
Write it back to excel file.
Which would be simplest?
Also, if possible, please advise me on custom entities (adding properties/fields to an object dynamically).

Take a look at Linq-to-Excel. It's pretty neat.
var book = new LinqToExcel.ExcelQueryFactory(@"File.xlsx");
var query =
from row in book.Worksheet("Stock Entry")
let item = new
{
Code = row["Code"].Cast<string>(),
Supplier = row["Supplier"].Cast<string>(),
Ref = row["Ref"].Cast<string>(),
}
where item.Supplier == "Walmart"
select item;
It also allows for strongly-typed row access too.

I realize this question was asked nearly 7 years ago but it's still a top Google search result for certain keywords regarding importing excel data with C#, so I wanted to provide an alternative based on some recent tech developments.
Importing Excel data has become such a common task to my everyday duties, that I've streamlined the process and documented the method on my blog: best way to read excel file in c#.
I use NPOI because it can read/write Excel files without Microsoft Office installed and it doesn't use COM+ or any interops. That means it can work in the cloud!
But the real magic comes from pairing up with NPOI Mapper from Donny Tian because it allows me to map the Excel columns to properties in my C# classes without writing any code. It's beautiful.
Here is the basic idea:
I create a .net class that matches/maps the Excel columns I'm interested in:
class CustomExcelFormat
{
[Column("District")]
public int District { get; set; }
[Column("DM")]
public string FullName { get; set; }
[Column("Email Address")]
public string EmailAddress { get; set; }
[Column("Username")]
public string Username { get; set; }
public string FirstName
{
get
{
return Username.Split('.')[0];
}
}
public string LastName
{
get
{
return Username.Split('.')[1];
}
}
}
Notice, it allows me to map based on column name if I want to!
Then when I process the excel file all I need to do is something like this:
public void Execute(string localPath, int sheetIndex)
{
IWorkbook workbook;
using (FileStream file = new FileStream(localPath, FileMode.Open, FileAccess.Read))
{
workbook = WorkbookFactory.Create(file);
}
var importer = new Mapper(workbook);
var items = importer.Take<CustomExcelFormat>(sheetIndex);
foreach(var item in items)
{
var row = item.Value;
if (string.IsNullOrEmpty(row.EmailAddress))
continue;
UpdateUser(row);
}
DataContext.SaveChanges();
}
Now, admittedly, my code does not modify the Excel file itself. I am instead saving the data to a database using Entity Framework (that's why you see "UpdateUser" and "SaveChanges" in my example). But there is already a good discussion on SO about how to save/modify a file using NPOI.

Using OLE Query, it's quite simple (e.g. sheetName is Sheet1):
DataTable LoadWorksheetInDataTable(string fileName, string sheetName)
{
DataTable sheetData = new DataTable();
using (OleDbConnection conn = this.returnConnection(fileName))
{
conn.Open();
// retrieve the data using data adapter
OleDbDataAdapter sheetAdapter = new OleDbDataAdapter("select * from [" + sheetName + "$]", conn);
sheetAdapter.Fill(sheetData);
conn.Close();
}
return sheetData;
}
private OleDbConnection returnConnection(string fileName)
{
return new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + fileName + "; Jet OLEDB:Engine Type=5;Extended Properties=\"Excel 8.0;\"");
}
For newer Excel versions:
return new OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + fileName + ";Extended Properties=Excel 12.0;");
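Since the two connection strings differ only by file format, the choice between them can be made from the file extension. This is a minimal sketch that reuses the two strings shown above unchanged. `ExcelConnectionStrings` and `For` are hypothetical names, and mapping `.xls` to Jet and `.xlsx` to ACE is an assumption drawn from the "newer Excel versions" remark.

```csharp
using System;
using System.IO;

public static class ExcelConnectionStrings
{
    // Picks the OLE DB provider by file extension, reusing the two strings from the answer.
    public static string For(string fileName)
    {
        string ext = Path.GetExtension(fileName);

        if (ext.Equals(".xls", StringComparison.OrdinalIgnoreCase))
            return "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + fileName +
                   "; Jet OLEDB:Engine Type=5;Extended Properties=\"Excel 8.0;\"";

        if (ext.Equals(".xlsx", StringComparison.OrdinalIgnoreCase))
            return "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + fileName +
                   ";Extended Properties=Excel 12.0;";

        throw new NotSupportedException("Unsupported Excel extension: " + ext);
    }
}
```

returnConnection() could then be reduced to `return new OleDbConnection(ExcelConnectionStrings.For(fileName));`.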
You can also use Excel Data Reader, an open source project on CodePlex. It works really well for exporting data from Excel sheets.
The sample code given on the link specified:
FileStream stream = File.Open(filePath, FileMode.Open, FileAccess.Read);
//1. Reading from a binary Excel file ('97-2003 format; *.xls)
IExcelDataReader excelReader = ExcelReaderFactory.CreateBinaryReader(stream);
//...
//2. Reading from a OpenXml Excel file (2007 format; *.xlsx)
IExcelDataReader excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
//...
//3. DataSet - The result of each spreadsheet will be created in the result.Tables
DataSet result = excelReader.AsDataSet();
//...
//4. DataSet - Create column names from first row
excelReader.IsFirstRowAsColumnNames = true;
DataSet result = excelReader.AsDataSet();
//5. Data Reader methods
while (excelReader.Read())
{
//excelReader.GetInt32(0);
}
//6. Free resources (IExcelDataReader is IDisposable)
excelReader.Close();
Reference: How do I import from Excel to a DataSet using Microsoft.Office.Interop.Excel?

Try to use this free way to this, https://freenetexcel.codeplex.com
Workbook workbook = new Workbook();
workbook.LoadFromFile(@"..\..\parts.xls",ExcelVersion.Version97to2003);
//Initialize worksheet
Worksheet sheet = workbook.Worksheets[0];
DataTable dataTable = sheet.ExportDataTable();

If you can restrict it to just (Office Open XML format) *.xlsx files, then probably the most popular library would be EPPlus.
Bonus is, there are no other dependencies. Just install using nuget:
Install-Package EPPlus

Try the Aspose.Cells library (not free, but the trial is enough for reading); it is quite good.
Install-Package Aspose.Cells
There is sample code:
using Aspose.Cells;
using System;
namespace ExcelReader
{
class Program
{
static void Main(string[] args)
{
// Replace path for your file
readXLS(@"C:\MyExcelFile.xls"); // or "*.xlsx"
Console.ReadKey();
}
public static void readXLS(string PathToMyExcel)
{
//Open your template file.
Workbook wb = new Workbook(PathToMyExcel);
//Get the first worksheet.
Worksheet worksheet = wb.Worksheets[0];
//Get cells
Cells cells = worksheet.Cells;
// Get row and column count
int rowCount = cells.MaxDataRow;
int columnCount = cells.MaxDataColumn;
// Current cell value
string strCell = "";
Console.WriteLine(String.Format("rowCount={0}, columnCount={1}", rowCount, columnCount));
for (int row = 0; row <= rowCount; row++) // Numeration starts from 0 to MaxDataRow
{
for (int column = 0; column <= columnCount; column++) // Numeration starts from 0 to MaxDataColumn
{
strCell = "";
strCell = Convert.ToString(cells[row, column].Value);
if (String.IsNullOrEmpty(strCell))
{
continue;
}
else
{
// Do your stuff here
Console.WriteLine(strCell);
}
}
}
}
}
}

Read from excel, modify and write back
/// <summary>
/// Reads an Excel file and converts it into a DataSet, with each sheet becoming one table of the DataSet
/// </summary>
/// <param name="filename"></param>
/// <param name="headers">If set to true the first row will be considered as headers</param>
/// <returns></returns>
public DataSet Import(string filename, bool headers = true)
{
var _xl = new Excel.Application();
var wb = _xl.Workbooks.Open(filename);
var sheets = wb.Sheets;
DataSet dataSet = null;
if (sheets != null && sheets.Count != 0)
{
dataSet = new DataSet();
foreach (var item in sheets)
{
var sheet = (Excel.Worksheet)item;
DataTable dt = null;
if (sheet != null)
{
dt = new DataTable();
var ColumnCount = ((Excel.Range)sheet.UsedRange.Rows[1, Type.Missing]).Columns.Count;
var rowCount = ((Excel.Range)sheet.UsedRange.Columns[1, Type.Missing]).Rows.Count;
for (int j = 0; j < ColumnCount; j++)
{
var cell = (Excel.Range)sheet.Cells[1, j + 1];
var column = new DataColumn(headers ? cell.Value : string.Empty);
dt.Columns.Add(column);
}
for (int i = 0; i < rowCount; i++)
{
var r = dt.NewRow();
for (int j = 0; j < ColumnCount; j++)
{
var cell = (Excel.Range)sheet.Cells[i + 1 + (headers ? 1 : 0), j + 1];
r[j] = cell.Value;
}
dt.Rows.Add(r);
}
}
dataSet.Tables.Add(dt);
}
}
_xl.Quit();
return dataSet;
}
public string Export(DataTable dt, bool headers = false)
{
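var _xl = new Excel.Application(); // Import() declares _xl locally, so Export() needs its own instance
var wb = _xl.Workbooks.Add();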
var sheet = (Excel.Worksheet)wb.ActiveSheet;
//process columns
for (int i = 0; i < dt.Columns.Count; i++)
{
var col = dt.Columns[i];
//added columns to the top of sheet
var currentCell = (Excel.Range)sheet.Cells[1, i + 1];
currentCell.Value = col.ToString();
currentCell.Font.Bold = true;
//process rows
for (int j = 0; j < dt.Rows.Count; j++)
{
var row = dt.Rows[j];
//added rows to sheet
var cell = (Excel.Range)sheet.Cells[j + 1 + 1, i + 1];
cell.Value = row[i];
}
currentCell.EntireColumn.AutoFit();
}
var fileName="{somepath/somefile.xlsx}";
wb.SaveCopyAs(fileName);
_xl.Quit();
return fileName;
}

I used Office's NuGet Package: DocumentFormat.OpenXml and pieced together the code from that component's doc site.
With the helper code below, it was similar in complexity to the CSV parsing I do elsewhere in that project...
public static async Task ImportXLSX(Stream stream, string sheetName)
{
// This was necessary for my Blazor project, which used a BrowserFileStream object
MemoryStream ms = new MemoryStream();
await stream.CopyToAsync(ms);
using (var document = SpreadsheetDocument.Open(ms, false))
{
// Retrieve a reference to the workbook part.
WorkbookPart wbPart = document.WorkbookPart;
// Find the sheet with the supplied name, and then use that
// Sheet object to retrieve a reference to the first worksheet.
Sheet theSheet = wbPart?.Workbook.Descendants<Sheet>().Where(s => s?.Name == sheetName).FirstOrDefault();
// Throw an exception if there is no sheet.
if (theSheet == null)
{
throw new ArgumentException("sheetName");
}
WorksheetPart wsPart = (WorksheetPart)(wbPart.GetPartById(theSheet.Id));
// For shared strings, look up the value in the
// shared strings table.
var stringTable =
wbPart.GetPartsOfType<SharedStringTablePart>()
.FirstOrDefault();
// I needed to grab 4 cells from each row
// Starting at row 11, until the cell in column A is blank
int row = 11;
while (true) {
var accountNameCell = GetCell(wsPart, "A" + row.ToString());
var accountName = GetValue(accountNameCell, stringTable);
if (string.IsNullOrEmpty(accountName)) {
break;
}
var investmentNameCell = GetCell(wsPart, "B" + row.ToString());
var investmentName = GetValue(investmentNameCell, stringTable);
var symbolCell = GetCell(wsPart, "D" + row.ToString());
var symbol = GetValue(symbolCell, stringTable);
var marketValue = GetCell(wsPart, "J" + row.ToString()).InnerText;
// DO STUFF with data
row++;
}
}
}
private static string? GetValue(Cell cell, SharedStringTablePart stringTable) {
try {
return stringTable.SharedStringTable.ElementAt(int.Parse(cell.InnerText)).InnerText;
} catch (Exception) {
return null;
}
}
private static Cell GetCell(WorksheetPart wsPart, string cellReference) {
return wsPart.Worksheet.Descendants<Cell>().Where(c => c.CellReference.Value == cellReference)?.FirstOrDefault();
}
