Reading docx file with table - c#

I have a simple document with one table in it, and I would like to read the content of its cells. I found many tutorials for writing, but none for reading.
I suppose I should enumerate the sections, but how do I know which one contains a table?
// DocX.Load opens an existing file (DocX.Create would make a new, empty document)
var document = DocX.Load(@"mydoc.docx");
var s = document.GetSections();
foreach (var item in s)
{
}

I'm using the following namespace aliases:
using excel = Microsoft.Office.Interop.Excel;
using word = Microsoft.Office.Interop.Word;
With Word interop, you can grab the tables directly through the document's Tables collection using this code:
private void WordRunButton_Click(object sender, EventArgs e)
{
    // Also requires: using System.IO; using System.Runtime.InteropServices;
    var excelApp = new excel.Application();
    excel.Workbooks workbooks = excelApp.Workbooks;
    var wordApp = new word.Application();
    word.Documents documents = wordApp.Documents;
    // You don't want your computer to actually load each one visibly; it would ruin performance.
    wordApp.Visible = false;
    excelApp.Visible = false;
    string[] fileDirectories = Directory.GetFiles("Some Directory", "*.doc*",
        SearchOption.AllDirectories);
    int tableCount = 1; // running number that keeps each output file name unique
    foreach (var item in fileDirectories)
    {
        word._Document document = documents.Open(item);
        foreach (word.Table table in document.Tables)
        {
            // Not needed if you're not going to save each table individually
            string appendName = Path.GetFileNameWithoutExtension(item) + " Table " + tableCount + ".xlsx";
            // One new workbook with a single worksheet per table
            excel.Workbook workbook = workbooks.Add(excel.XlWBATemplate.xlWBATWorksheet);
            excel._Worksheet worksheet = (excel.Worksheet)workbook.Sheets[1];
            for (int row = 1; row <= table.Rows.Count; row++)
            {
                for (int col = 1; col <= table.Columns.Count; col++)
                {
                    // Word cell text ends with the "\r\a" end-of-cell marker; CLEAN strips it
                    string text = table.Cell(row, col).Range.Text;
                    worksheet.Cells[row, col] = excelApp.WorksheetFunction.Clean(text);
                }
            }
            // The file format argument can be whatever you want,
            // just make sure it matches the extension used in appendName.
            workbook.SaveAs(Path.Combine("Some Directory", appendName), excel.XlFileFormat.xlWorkbookDefault);
            workbook.Close();
            Marshal.ReleaseComObject(worksheet);
            Marshal.ReleaseComObject(workbook);
            tableCount++;
        }
        document.Close();
        Marshal.ReleaseComObject(document);
    }
    // Microsoft apps are picky with memory. Make sure you close and release each instance once you're done with it.
    // Failure to do so will result in many lingering apps in the background.
    workbooks.Close();
    excelApp.Quit();
    Marshal.ReleaseComObject(workbooks);
    Marshal.ReleaseComObject(excelApp);
    // Cast avoids the ambiguity between the Quit method and the Quit event
    ((word._Application)wordApp).Quit();
    Marshal.ReleaseComObject(documents);
    Marshal.ReleaseComObject(wordApp);
}
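Here, document is the Word document object (word._Document). Make sure you check for merged or split cells if your tables have them: Word interop can throw when you ask for a cell position that doesn't exist in an irregular row, for example through table.Cell(row, col).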
Hope this helps!
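The loop above sends every cell through Excel's WorksheetFunction.Clean only to remove the "\r\a" end-of-cell marker (characters 13 and 7) that Word appends to each cell's Range.Text. If you'd rather not make a COM round trip to Excel for every cell, a local helper can do the same job. This is a minimal sketch; the class name WordCellText is hypothetical, Clean assumes you want CLEAN's behavior (dropping character codes 0 to 31, which also removes line breaks between paragraphs), and StripEndOfCellMarker keeps the inner line breaks.

```csharp
using System;
using System.Linq;

public static class WordCellText
{
    // Mimics Excel's CLEAN: drops every character with a code below 32,
    // including the "\r\a" end-of-cell marker and any paragraph breaks.
    public static string Clean(string text)
    {
        if (text == null) return string.Empty;
        return new string(text.Where(c => c >= 32).ToArray());
    }

    // Removes only the trailing end-of-cell marker and keeps inner line breaks.
    // Ordinal comparison matters: culture-aware EndsWith may ignore control characters.
    public static string StripEndOfCellMarker(string text)
    {
        if (text == null) return string.Empty;
        return text.EndsWith("\r\a", StringComparison.Ordinal)
            ? text.Substring(0, text.Length - 2)
            : text;
    }
}
```

In the loop, worksheet.Cells[row, col] = WordCellText.Clean(text); would then replace the WorksheetFunction call.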

If you only have one table in the document, it should be rather simple. Try this:
DocX doc = DocX.Load("C:\\Temp\\mydoc.docx");
Table t = doc.Tables[0];
//read the content of the first cell
string someText = t.Rows[0].Cells[0].Paragraphs[0].Text;
You can loop through the table rows, through the cells inside each row, and through the Paragraphs inside each cell if a cell holds more than one paragraph. Simple for loops do the job:
for (int i = 0; i < t.Rows.Count; i++)
{
    for (int j = 0; j < t.Rows[i].Cells.Count; j++)
    {
        someText = t.Rows[i].Cells[j].Paragraphs[0].Text;
    }
}
Hope it helps.
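If you'd rather not depend on DocX or Word interop just to read cell text, it helps to know that a .docx file is a ZIP package whose main part, word/document.xml, stores tables as w:tbl, w:tr and w:tc elements. The sketch below swaps in that technique, reading the part with System.IO.Compression and LINQ to XML. It is a minimal sketch for simple tables: it ignores merged-cell properties and content controls, nested tables show up as separate entries, and the class name DocxTableReader is hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class DocxTableReader
{
    // Namespace of the WordprocessingML elements in word/document.xml
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns every table as a list of rows; each row is a list of cell texts.
    // Paragraphs inside one cell are joined with "\n".
    public static List<List<List<string>>> ReadTables(Stream docx)
    {
        using var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true);
        var entry = zip.GetEntry("word/document.xml")
            ?? throw new InvalidDataException("word/document.xml not found");

        XDocument xml;
        using (var s = entry.Open()) xml = XDocument.Load(s);

        return xml.Descendants(W + "tbl")
            .Select(tbl => tbl.Elements(W + "tr")
                .Select(tr => tr.Elements(W + "tc")
                    .Select(tc => string.Join("\n", tc.Elements(W + "p")
                        // a paragraph's text is split across runs (w:r/w:t)
                        .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)))))
                    .ToList())
                .ToList())
            .ToList();
    }
}
```

Usage: using (var fs = File.OpenRead("mydoc.docx")) { var table = DocxTableReader.ReadTables(fs)[0]; } gives the first table's rows.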

Related

Extract values from a list and save it to an existing Excel sheet in c#

I have an existing Excel sheet which has headers. I get data from my server and place it in my WPF DataGrid.
On a button click, I need to place the values from my list into a particular sheet of my existing Excel workbook. I can get the values from a WinForms DataGridView like this:
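var xlApp = new Excel.Application();
xlApp.Visible = true;
var path = @"D:\Reports\Tag_History.xlsx";
// No need to construct a Worksheet first; take the existing sheet from the opened workbook
Excel.Worksheet sheet = xlApp.Workbooks.Open(path).Worksheets["Summary"];
var rowCount = dataGrid.Rows.Count;
var columnCount = dataGrid.Columns.Count;
// rowCount - 1 skips the empty "new row" at the bottom of the DataGridView
for (int i = 0; i < rowCount - 1; i++)
{
    for (int j = 0; j < columnCount; j++)
    {
        // The DataGridView indexer is [columnIndex, rowIndex]; data starts at Excel row 2, below the headers
        if (dataGrid[j, i].ValueType == typeof(string))
        {
            // The leading apostrophe makes Excel keep the value as text
            sheet.Cells[i + 2, j + 1] = "'" + Convert.ToString(dataGrid[j, i].Value);
        }
        else
        {
            sheet.Cells[i + 2, j + 1] = Convert.ToString(dataGrid[j, i].Value);
        }
    }
}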
But since I am trying to do this in WPF, this code does not work anymore. That approach transfers the DataGrid data to an existing Excel file; since I think transferring the list itself to the existing Excel file is better, I am trying that instead. This is what I have so far:
var xlApp = new Excel.Application();
xlApp.Visible = true;
var path = @"D:\Reports\Tag_History.xlsx";
Excel.Worksheet sheet = xlApp.Workbooks.Open(path).Worksheets["Summary"];
var range = sheet.Range["A2", "A2"];
foreach (var item in summaryList)
{
range.Value2 = item.TagNumber;
}
This code runs, but it only updates a single cell of the Excel file.
Can you please show me how to do this? Thank you.
Install the Microsoft.Office.Interop.Excel NuGet package in your application: right-click your project (or its "References" node), choose "Manage NuGet Packages...", then search for Excel. Alternatively, open Tools -> NuGet Package Manager -> Package Manager Console and install the Excel package from there (https://www.nuget.org/packages/Microsoft.Office.Interop.Excel/).
Bind the items to the DataGrid, then export the data to Excel like this:
private void btnExport_Click(object sender, RoutedEventArgs e)
{
Microsoft.Office.Interop.Excel.Application excel = null;
Microsoft.Office.Interop.Excel.Workbook wb = null;
object missing = Type.Missing;
Microsoft.Office.Interop.Excel.Worksheet ws = null;
Microsoft.Office.Interop.Excel.Range rng = null;
// DataTable holding the rows shown in the DataGrid (ExcelTimeReport is your own data method)
var dtExcelDataTable = ExcelTimeReport(txtFrmDte.Text, txtToDte.Text, strCondition);
excel = new Microsoft.Office.Interop.Excel.Application();
wb = excel.Workbooks.Add();
ws = (Microsoft.Office.Interop.Excel.Worksheet)wb.ActiveSheet;
ws.Columns.AutoFit();
ws.Columns.EntireColumn.ColumnWidth = 25;
// Header row
for (int Idx = 0; Idx < dtExcelDataTable.Columns.Count; Idx++)
{
ws.Range["A1"].Offset[0, Idx].Value = dtExcelDataTable.Columns[Idx].ColumnName;
}
// Data Rows
for (int Idx = 0; Idx < dtExcelDataTable.Rows.Count; Idx++)
{
ws.Range["A2"].Offset[Idx].Resize[1, dtExcelDataTable.Columns.Count].Value = dtExcelDataTable.Rows[Idx].ItemArray;
}
excel.Visible = true;
wb.Activate();
wb.SaveCopyAs("excel file location");
wb.Saved = true;
excel.Quit();
}
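As for why the loop in the question only updates one cell: range always points at A2, so each iteration overwrites the previous value. You can either move the target row on each iteration (for example sheet.Cells[row, 1] = item.TagNumber with row incremented), or build a 2-D array and assign it to a resized range in one call, which crosses the COM boundary only once and is usually much faster. The helper below only builds the array, so it needs no Excel; the class name SheetArray and its selector-based signature are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class SheetArray
{
    // Builds a [rows, columns] array: one row per item, one column per selector.
    // This is the shape Excel interop expects when assigning to Range.Value2.
    public static object[,] Build<T>(IList<T> items, params Func<T, object>[] columns)
    {
        var data = new object[items.Count, columns.Length];
        for (int r = 0; r < items.Count; r++)
        {
            for (int c = 0; c < columns.Length; c++)
            {
                data[r, c] = columns[c](items[r]);
            }
        }
        return data;
    }
}
```

With the question's list this might look like var data = SheetArray.Build(summaryList, x => x.TagNumber); followed by sheet.Range["A2"].Resize[data.GetLength(0), data.GetLength(1)].Value2 = data; which mirrors the Resize call used in the answer above.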

Trying to read excel file with epplus and getting System.NullException error?

Edit
Based on the replies below, the error I am seeing may or may not be what prevents me from reading my Excel file. That is, I am not getting any data from worksheet.Cells[row, col].Value in the for loop below.
Problem
I am trying to return a DataTable with information from an Excel file. Specifically, I believe it is an .xlsx file from Excel 2013. Please see the code below:
private DataTable ImportToDataTable(string Path)
{
DataTable dt = new DataTable();
FileInfo fi = new FileInfo(Path);
if(!fi.Exists)
{
throw new Exception("File " + Path + " Does not exist.");
}
using (ExcelPackage xlPackage = new ExcelPackage(fi))
{
//Get the worksheet in the workbook
ExcelWorksheet worksheet = xlPackage.Workbook.Worksheets.First();
//Obtain the worksheet size
ExcelCellAddress startCell = worksheet.Dimension.Start;
ExcelCellAddress endCell = worksheet.Dimension.End;
//Create the data column
for(int col = startCell.Column; col <= endCell.Column; col++)
{
dt.Columns.Add(col.ToString());
}
for(int row = startCell.Row; row <= endCell.Row; row++)
{
DataRow dr = dt.NewRow(); //Create a row
int i = 0;
for(int col = startCell.Column; col <= endCell.Column; col++)
{
dr[i++] = worksheet.Cells[row, col].Value.ToString();
}
dt.Rows.Add(dr);
}
}
return dt;
}
Error
This is where things get weird. I can see the proper values in startCell and endCell. However, when I inspect worksheet and look under Cells, I see something I don't understand:
'worksheet.Cells.Current' threw an exception of type 'System.NullReferenceException'
Attempts
Reformatted my Excel file so that all fields use the General format.
Made sure no field in my Excel file was empty.
RTFM'ed the EPPlus documentation; nothing in it suggests this error.
Looked at EPPlus errors on Stack Overflow; none of them match my problem.
Honestly, I am having trouble figuring out what this error is really saying. Is something wrong with my format? Is something wrong with EPPlus? I have read here that people had no problems reading Excel 2013 .xlsx files with EPPlus, and I am only trying to parse the Excel file row by row. If someone could help me understand what this error means and how to fix it, I would be most grateful. I've spent quite a long time trying to figure this out.
In this line:
dr[i++] = worksheet.Cells[row, col].Value.ToString();
the code reads the Value of that cell. If the cell is empty, Value is null, and calling ToString() on it throws a NullReferenceException.
Try instead:
dr[i++] = worksheet.Cells[row, col].Text;
Hope this will help
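If you need the raw Value rather than the formatted Text (for example to avoid number formatting), another option is to convert it null-safely. A minimal sketch, taking the value as a plain object so it runs without EPPlus; the name CellValue.AsString is hypothetical. Convert.ToString returns an empty string for null instead of throwing, and the invariant culture keeps numbers like 2.5 independent of the machine's regional settings.

```csharp
using System;
using System.Globalization;

public static class CellValue
{
    // Null (an empty cell) becomes "" instead of causing a NullReferenceException.
    public static string AsString(object value) =>
        Convert.ToString(value, CultureInfo.InvariantCulture);
}
```

In the question's loop this would read dr[i++] = CellValue.AsString(worksheet.Cells[row, col].Value);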
Like @Thorians said, Current is really meant to be used while you are enumerating the cells. If you want to use it in its purest form and actually be able to read Current, you would need something like this:
using (var pck = new ExcelPackage(existingFile))
{
var worksheet = pck.Workbook.Worksheets.First();
//this is important to hold onto the range reference
var cells = worksheet.Cells;
//this is important to start the cellEnum object (the Enumerator)
cells.Reset();
//Can now loop the enumerator
while (cells.MoveNext())
{
//Current can now be used thanks to MoveNext
Console.WriteLine("Cell [{0}, {1}] = {2}"
, cells.Current.Start.Row
, cells.Current.Start.Column
, cells.Current.Value);
}
}
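Note that you have to hold the range in a local variable (cells) for this to work properly. Otherwise Current will be null if you try worksheet.Cells.Current, apparently because each access to worksheet.Cells hands back a new range whose enumerator was never started.
But it is simpler to use a foreach loop and let the compiler generate the MoveNext/Current calls for you.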
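The debugger message makes more sense once you see that Current only has a defined value after a successful MoveNext(), which holds for .NET enumerators in general, not just EPPlus ranges. The sketch below uses plain arrays instead of an EPPlus range so it runs without the library: reading Current before MoveNext() throws InvalidOperationException for an array's enumerator, and a manual MoveNext/Current loop visits the same items a foreach would. The class name EnumeratorDemo is hypothetical.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class EnumeratorDemo
{
    // Returns true if reading Current before the first MoveNext() throws
    // InvalidOperationException for this sequence's enumerator.
    public static bool CurrentThrowsBeforeMoveNext(IEnumerable source)
    {
        IEnumerator e = source.GetEnumerator();
        try
        {
            _ = e.Current;
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // Manual equivalent of: foreach (var item in source) result.Add(item);
    public static List<object> ManualLoop(IEnumerable source)
    {
        var result = new List<object>();
        IEnumerator e = source.GetEnumerator();
        while (e.MoveNext())
        {
            result.Add(e.Current); // valid only after MoveNext() returned true
        }
        return result;
    }
}
```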
UPDATE, based on the comments: your code should work fine as is, so could it be your Excel file? This test runs your code against a freshly generated file:
[TestMethod]
public void Current_Cell_Test()
{
//http://stackoverflow.com/questions/32516676/trying-to-read-excel-file-with-epplus-and-getting-system-nullexception-error
//Throw in some data
var datatable = new DataTable("tblData");
datatable.Columns.AddRange(new[] { new DataColumn("Col1", typeof (int)), new DataColumn("Col2", typeof (int)),new DataColumn("Col3", typeof (object)) });
for (var i = 0; i < 10; i++)
{
var row = datatable.NewRow(); row[0] = i; row[1] = i * 10; row[2] = Path.GetRandomFileName(); datatable.Rows.Add(row);
}
//Create a test file
var fi = new FileInfo(@"c:\temp\test1.xlsx");
if (fi.Exists)
fi.Delete();
using (var pck = new ExcelPackage(fi))
{
var worksheet = pck.Workbook.Worksheets.Add("Sheet1");
worksheet.Cells.LoadFromDataTable(datatable, true);
pck.Save();
}
var dt = new DataTable();
using (ExcelPackage xlPackage = new ExcelPackage(fi))
{
//Get the worksheet in the workbook
ExcelWorksheet worksheet = xlPackage.Workbook.Worksheets.First();
//Obtain the worksheet size
ExcelCellAddress startCell = worksheet.Dimension.Start;
ExcelCellAddress endCell = worksheet.Dimension.End;
//Create the data column
for (int col = startCell.Column; col <= endCell.Column; col++)
{
dt.Columns.Add(col.ToString());
}
for (int row = startCell.Row; row <= endCell.Row; row++)
{
DataRow dr = dt.NewRow(); //Create a row
int i = 0;
for (int col = startCell.Column; col <= endCell.Column; col++)
{
dr[i++] = worksheet.Cells[row, col].Value.ToString();
}
dt.Rows.Add(dr);
}
}
Console.Write("{{dt Rows: {0} Columns: {1}}}", dt.Rows.Count, dt.Columns.Count);
}
This gives the following output:
{dt Rows: 11 Columns: 3}
Current is the current range while enumerating.
There is nothing wrong with it throwing an exception when you inspect it in the debugger outside of an enumeration.
Code sample:
var range = ws.Cells[1,1,1,100];
foreach (var cell in range)
{
var a = range.Current.Value; // a is same as b
var b = cell.Value;
}
I was also getting the same issue while reading an Excel file, and none of the solutions provided worked for me. Here is working code:
public void readXLS(string FilePath)
{
FileInfo existingFile = new FileInfo(FilePath);
using (ExcelPackage package = new ExcelPackage(existingFile))
{
//get the first worksheet in the workbook (EPPlus 4.x indexes worksheets from 1; EPPlus 5 and later from 0)
ExcelWorksheet worksheet = package.Workbook.Worksheets[1];
int colCount = worksheet.Dimension.End.Column; //get Column Count
int rowCount = worksheet.Dimension.End.Row; //get row count
for (int row = 1; row <= rowCount; row++)
{
for (int col = 1; col <= colCount; col++)
{
//Value is null for empty cells, so ?. avoids a NullReferenceException
Console.WriteLine(" Row:" + row + " column:" + col + " Value:" + worksheet.Cells[row, col].Value?.ToString().Trim());
}
}
}
}

transfer from c# to excel with multiple sheets

I have a few different dictionaries with different categories of information, and I need to output them all into an xls or csv file with multiple spreadsheets. Currently, I have to download each Excel file for a specific date range individually and then copy and paste them together so they're on different sheets of the same file. Is there any way to download all of them together in one document? Currently, I use the following code to output the files:
writeCsvToStream(
organize.ToDictionary(k => k.Key, v => v.Value as IacTransmittal), writer
);
ms.Seek(0, SeekOrigin.Begin);
Response.Clear();
Response.AddHeader("Content-Disposition", "attachment; filename=" + fileName);
Response.AddHeader("Content-Length", ms.Length.ToString());
Response.ContentType = "application/octet-stream";
ms.CopyTo(Response.OutputStream);
Response.End();
where writeCsvToStream just creates the text for the individual file.
There are some different options you could use.
ADO.NET Excel driver - with this API you can populate data into Excel documents using SQL style syntax. Each worksheet in the workbook is a table, each column header in a worksheet is a column in that table etc.
Here is a CodeProject article on exporting to Excel using ADO.NET:
http://www.codeproject.com/Articles/567155/Work-with-MS-Excel-and-ADO-NET
The ADO.NET approach is safe to use in a multi-user, web app environment.
Use OpenXML to export the data
OpenXML is a schema definition for different types of documents and the later versions of Excel (the ones that use .xlsx, .xlsm etc. instead of just .xls) use this format for the documents. The OpenXML schema is huge and somewhat cumbersome, however you can do pretty much anything with it.
Here is a CodeProject article on exporting data to Excel using OpenXML:
http://www.codeproject.com/Articles/692121/Csharp-Export-data-to-Excel-using-OpenXML-librarie
The OpenXML approach is safe to use in a multi-user, web app environment.
A third approach is to use COM automation which is the same as programmatically running an instance of the Excel desktop application and using COM to control the actions of that instance.
Here is an article on that topic:
http://support.microsoft.com/kb/302084
Note that this third approach (office automation) is not safe in a multi-user, web app environment. I.e. it should not be used on a server, only from standalone desktop applications.
If you're open to learning a new library, I highly recommend EPPlus.
I'm making a few assumptions here since you didn't post much code to translate, but an example of usage may look like this:
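using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Reflection;
using OfficeOpenXml;
using OfficeOpenXml.Style;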
public static void WriteXlsOutput(Dictionary<string, IacTransmittal> collection) //accepting one dictionary as a parameter
{
using (FileStream outFile = new FileStream("Example.xlsx", FileMode.Create))
{
using (ExcelPackage ePackage = new ExcelPackage(outFile))
{
//group the collection by the date property on your class; the element selector (i => i.Value) makes each group hold IacTransmittal objects instead of KeyValuePairs
foreach (IGrouping<DateTime, IacTransmittal> collectionByDate in collection
.OrderBy(i => i.Value.Date.Date)
.GroupBy(i => i.Value.Date.Date, i => i.Value)) //assuming the property is named Date, using Date property of DateTime so we only create new worksheets for individual days
{
ExcelWorksheet eWorksheet = ePackage.Workbook.Worksheets.Add(collectionByDate.Key.Date.ToString("yyyyMMdd")); //add a new worksheet for each unique day
Type iacType = typeof(IacTransmittal);
PropertyInfo[] iacProperties = iacType.GetProperties();
int colCount = iacProperties.Count(); //number of properties determines how many columns we need
//set column headers based on properties on your class
for (int col = 1; col <= colCount; col++)
{
eWorksheet.Cells[1, col].Value = iacProperties[col - 1].Name ; //assign the value of the cell to the name of the property
}
int rowCounter = 2;
foreach (IacTransmittal iacInfo in collectionByDate) //iterate over each instance of this class in this igrouping
{
int interiorColCount = 1;
foreach (PropertyInfo iacProp in iacProperties) //iterate over properties on the class
{
eWorksheet.Cells[rowCounter, interiorColCount].Value = iacProp.GetValue(iacInfo, null); //assign cell values by getting the value of each property in the class
interiorColCount++;
}
rowCounter++;
}
}
ePackage.Save();
}
}
}
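The grouping step is the part most likely to trip you up: grouping a Dictionary<string, IacTransmittal> directly yields groups of KeyValuePair<string, IacTransmittal>, so the element selector kvp => kvp.Value is what makes each group contain the transmittals themselves. The BCL-only sketch below isolates that step so it can run without EPPlus. Transmittal is a hypothetical stand-in for IacTransmittal, assumed to have a DateTime Date property, and SheetGrouping.ByDay is a hypothetical name; the result maps each worksheet name (yyyyMMdd) to that day's items.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical stand-in for IacTransmittal; only Date matters for grouping.
public class Transmittal
{
    public string Id { get; set; }
    public DateTime Date { get; set; }
}

public static class SheetGrouping
{
    // Groups dictionary values by calendar day (time of day ignored) and keys
    // each group by the worksheet name the EPPlus answer would use ("yyyyMMdd").
    public static SortedDictionary<string, List<Transmittal>> ByDay(Dictionary<string, Transmittal> collection)
    {
        var result = new SortedDictionary<string, List<Transmittal>>(StringComparer.Ordinal);
        foreach (IGrouping<DateTime, Transmittal> day in collection
            .GroupBy(kvp => kvp.Value.Date.Date, kvp => kvp.Value))
        {
            // Invariant culture keeps the Gregorian year regardless of regional settings
            result[day.Key.ToString("yyyyMMdd", CultureInfo.InvariantCulture)] = day.ToList();
        }
        return result;
    }
}
```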
Thanks for the ideas! I was eventually able to figure out the following:
using Excel = Microsoft.Office.Interop.Excel;
Excel.Application ExcelApp = new Excel.Application();
Excel.Workbook ExcelWorkBook = null;
Excel.Worksheet ExcelWorkSheet = null;
ExcelApp.Visible = true;
ExcelWorkBook = ExcelApp.Workbooks.Add(Excel.XlWBATemplate.xlWBATWorksheet);
List<string> SheetNames = new List<string>()
{ "Sheet1", "Sheet2", "Sheet3", "Sheet4", "Sheet5", "Sheet6", "Sheet7"};
string [] headers = new string []
{ "Field 1", "Field 2", "Field 3", "Field 4", "Field 5" };
for (int i = 1; i < SheetNames.Count; i++) //the new workbook already has one sheet, so add SheetNames.Count - 1 more
ExcelWorkBook.Worksheets.Add(); //Adding New sheet in Excel Workbook
for (int k = 0; k < SheetNames.Count; k++ )
{
int r = 1; // Initialize Excel Row Start Position = 1
ExcelWorkSheet = ExcelWorkBook.Worksheets[k + 1];
//Writing Columns Name in Excel Sheet
for (int col = 1; col < headers.Length + 1; col++)
ExcelWorkSheet.Cells[r, col] = headers[col - 1];
r++;
switch (k)
{
case 0:
foreach (var kvp in Sheet1)
{
ExcelWorkSheet.Cells[r, 1] = kvp.Value.Field1;
ExcelWorkSheet.Cells[r, 2] = kvp.Value.Field2;
ExcelWorkSheet.Cells[r, 3] = kvp.Value.Field3;
ExcelWorkSheet.Cells[r, 4] = kvp.Value.Field4;
ExcelWorkSheet.Cells[r, 5] = kvp.Value.Field5;
r++;
}
break;
}
ExcelWorkSheet.Name = SheetNames[k];//Renaming the ExcelSheets
}
//Activate the first worksheet by default.
((Excel.Worksheet)ExcelApp.ActiveWorkbook.Sheets[1]).Activate();
//Save As the excel file.
//SaveCopyAs keeps the workbook's current (.xlsx) format, so give the copy a matching extension
ExcelApp.ActiveWorkbook.SaveCopyAs(@"out_My_Book1.xlsx");

How to set directory for Excel.WorkBook.SaveAs()?

I have code that converts a .CSV file to a .XLSX file. The conversion goes well, but the WorkBook.SaveAs(@"file.xlsx") method seems to only save the file to C:\Users\[MyName]\Documents\file.xlsx. When I use Excel.Application.GetSaveAsFilename(), the Save As dialog defaults to C:\Users\[MyName]\Documents.
Furthermore, setting Excel.Application.DefaultFilePath doesn't seem to help unless I explicitly state F:.........
I have a relative working directory set, where a.csv is read from:
using Excel = Microsoft.Office.Interop.Excel;
StreamReader a = new StreamReader(@"a.csv");
var CSVContent = new List<string[]>();
Excel.Application excel = new Excel.Application();
excel.DefaultFilePath = @"Output\"; //doesn't do anything
Excel.Workbook workBook = excel.Workbooks.Add();
Excel.Worksheet sheet = workBook.ActiveSheet;
while (!a.EndOfStream)
{
string read = a.ReadLine();
CSVContent.Add(read.Split(','));
}
for (int i = 0; i < CSVContent.Count; i++) //write List contents to xlsx Line by Line
{
string[] csvLine = CSVContent[i];
for (int j = 0; j < csvLine.Length; j++)
{
sheet.Cells[i + 1, j + 1] = csvLine[j]; //Cells begin at 1 in Excel
}
}
var b = excel.GetSaveAsFilename("a.xlsx");
workBook.SaveAs(b);
workBook.Close();
How do I get workbook.SaveAs() to save into the solution relative working directory?
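Relying on relative paths tends to be problematic here: Excel runs in its own process, so a relative file name passed to SaveAs is resolved by Excel (apparently against its default Documents folder), not against your program's working directory. If you want to save the file to your application's current directory, pass an absolute path built from Environment.CurrentDirectory:
var b = Path.Combine(Environment.CurrentDirectory, "a.xlsx");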
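To cover the Output subfolder from the question as well, you can resolve the folder in your own process and create it if needed before handing the path to Excel. A minimal sketch; the helper name OutputPath.For is hypothetical, and it assumes the folder should live under the current directory.

```csharp
using System.IO;

public static class OutputPath
{
    // Resolves folder against this process's current directory, creates it if it
    // is missing, and returns an absolute path that Excel cannot reinterpret.
    public static string For(string folder, string fileName)
    {
        string dir = Path.GetFullPath(folder);
        Directory.CreateDirectory(dir); // no-op if the folder already exists
        return Path.Combine(dir, fileName);
    }
}
```

Usage: workBook.SaveAs(OutputPath.For("Output", "a.xlsx"));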

Create Excel VBA code and button programmatically from C#

I am in the middle of writing a simple method that saves my DataGridView into an Excel document (one sheet only) and also adds VBA code and a button to run that VBA code.
public void SaveFile(string filePath)
{
// Use the Application interface; ApplicationClass cannot be used when interop types are embedded
Microsoft.Office.Interop.Excel.Application ExcelApp = new Microsoft.Office.Interop.Excel.Application();
ExcelApp.Application.Workbooks.Add(Type.Missing);
//Change Workbook-properties.
ExcelApp.Columns.ColumnWidth = 20;
// Storing header part in Excel.
for (int i = 1; i < gridData.Columns.Count + 1; i++)
{
ExcelApp.Cells[1, i] = gridData.Columns[i - 1].HeaderText;
}
//Storing Each row and column value to excel sheet
for (int row = 0; row < gridData.Rows.Count; row++)
{
gridData.Rows[row].Cells[0].Value = "Makro";
for (int column = 0; column < gridData.Columns.Count; column++)
{
ExcelApp.Cells[row + 2, column + 1] = Convert.ToString(gridData.Rows[row].Cells[column].Value); //Convert.ToString returns "" for empty (null) cells instead of throwing
}
}
ExcelApp.ActiveWorkbook.SaveCopyAs(filePath);
ExcelApp.ActiveWorkbook.Saved = true;
ExcelApp.Quit();
}
I only implemented DataGridView export.
EDIT: Thanks to Joel, I now know the right terms and could search again for a solution. I think the following may be helpful. Could you correct me or give me a tip or two about what I should look for?
I just wrote a small example that adds a new button to an existing workbook and then adds a macro, which is called when the button is clicked.
using Excel = Microsoft.Office.Interop.Excel;
using VBIDE = Microsoft.Vbe.Interop;
...
private static void excelAddButtonWithVBA()
{
Excel.Application xlApp = new Excel.Application();
Excel.Workbook xlBook = xlApp.Workbooks.Open(@"PATH_TO_EXCEL_FILE");
Excel.Worksheet wrkSheet = xlBook.Worksheets[1];
Excel.Range range;
try
{
//set range for insert cell
range = wrkSheet.get_Range("A1:A1");
//add a button on top of that cell
Excel.Buttons xlButtons = wrkSheet.Buttons();
Excel.Button xlButton = xlButtons.Add((double)range.Left, (double)range.Top, (double)range.Width, (double)range.Height);
//set the name of the new button
xlButton.Name = "btnDoSomething";
xlButton.Text = "Click me!";
xlButton.OnAction = "btnDoSomething_Click";
buttonMacro(xlButton.Name, xlApp, xlBook, wrkSheet);
}
catch (Exception ex)
{
Debug.WriteLine(ex.Message);
}
xlApp.Visible = true;
}
And here is the buttonMacro(..) method:
private static void buttonMacro(string buttonName, Excel.Application xlApp, Excel.Workbook wrkBook, Excel.Worksheet wrkSheet)
{
StringBuilder sb;
VBIDE.VBComponent xlModule;
VBIDE.VBProject prj;
prj = wrkBook.VBProject;
sb = new StringBuilder();
// build string with module code
sb.Append("Sub " + buttonName + "_Click()" + "\n");
sb.Append("\t" + "msgbox \"" + buttonName + "\"\n"); // add your custom vba code here
sb.Append("End Sub");
// set an object for the new module to create
xlModule = wrkBook.VBProject.VBComponents.Add(VBIDE.vbext_ComponentType.vbext_ct_StdModule);
// add the macro to the spreadsheet
xlModule.CodeModule.AddFromString(sb.ToString());
}
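I found this information in the KB article "How To Create an Excel Macro by Using Automation from Visual C# .NET". Note that Excel only allows access to wrkBook.VBProject when "Trust access to the VBA project object model" is enabled in the Trust Center, and the workbook must be saved in a macro-enabled format (such as .xlsm) for the macro to be kept.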
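buttonMacro assembles the VBA source by plain string concatenation, so a button name or message containing a double quote would produce invalid VBA; inside a VBA string literal a quote is written by doubling it. The sketch below moves the string-building step into a separate method that can be tested without Excel. The names VbaSnippets.Quote and VbaSnippets.ClickHandler, and the separate message parameter, are assumptions for illustration; the output keeps the "<buttonName>_Click" naming so it still matches the OnAction value the answer assigns.

```csharp
using System.Text;

public static class VbaSnippets
{
    // Wraps text in a VBA string literal, doubling any embedded quotes.
    public static string Quote(string text) => "\"" + text.Replace("\"", "\"\"") + "\"";

    // Builds "Sub <buttonName>_Click()" showing a message box,
    // using "\n" line breaks like the original buttonMacro.
    public static string ClickHandler(string buttonName, string message)
    {
        var sb = new StringBuilder();
        sb.Append("Sub ").Append(buttonName).Append("_Click()\n");
        sb.Append("\tMsgBox ").Append(Quote(message)).Append('\n');
        sb.Append("End Sub");
        return sb.ToString();
    }
}
```

In buttonMacro, xlModule.CodeModule.AddFromString(VbaSnippets.ClickHandler(buttonName, buttonName)); would replace the StringBuilder code.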
