Extracting data stored in tables in a .docx file with C#

When I run the following code on a .docx file, child2.InnerText prints out the information I am after (although without separating the text of the different columns of the table).
My problem is that the associated InnerXml is completely incomprehensible to me. I thought there would be a 'get cell from table' method or something similar. I have literally no idea how to extract columns/rows from the XML I've been given, though.
I'm completely new to C#, so I might be missing something obvious.
I am using the Open XML SDK, and the file is a .docx.
using (WordprocessingDocument wdoc = WordprocessingDocument.Open(pathToMiniToktrapport, false))
{
    var table = wdoc.MainDocumentPart.Document.Body.Elements<Table>();
    foreach (var child in table)
    {
        foreach (var child2 in child)
        {
            System.Console.WriteLine(child2.InnerXml);
            System.Console.WriteLine(child2.InnerText);
            System.Console.ReadLine();
        }
    }
}
Thanks!

Please go through this article on MSDN, which shows the typical structure and sample code:
http://msdn.microsoft.com/en-us/library/office/cc850835.aspx
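The structure in that article is a Table containing TableRow elements, each of which contains TableCell elements (w:tbl, w:tr and w:tc in the InnerXml from the question). As far as I know there is no single "get cell" method, but walking those three levels gives you the cells one by one. The sketch below assumes the DocumentFormat.OpenXml NuGet package and only looks at top-level tables in the body; DocxTableReader and ReadTables are hypothetical names, not SDK members.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper (not part of the SDK): reads every top-level table in a
// .docx as a list of rows, where each row is a list of cell texts.
public static class DocxTableReader
{
    public static List<List<List<string>>> ReadTables(Stream docx)
    {
        var tables = new List<List<List<string>>>();
        // false = open read-only; nothing is written back.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docx, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;
            // Elements<Table>() returns only tables that are direct children of the body.
            foreach (Table table in body.Elements<Table>())
            {
                var rows = new List<List<string>>();
                // Each w:tr is a TableRow; each w:tc inside it is a TableCell.
                foreach (TableRow row in table.Elements<TableRow>())
                {
                    // One string per cell keeps the columns separate. A cell can hold
                    // several paragraphs, which are joined with a newline here.
                    List<string> cells = row.Elements<TableCell>()
                        .Select(cell => string.Join("\n", cell.Elements<Paragraph>().Select(p => p.InnerText)))
                        .ToList();
                    rows.Add(cells);
                }
                tables.Add(rows);
            }
        }
        return tables;
    }
}
```

To print the question's table with the columns separated, you could open the file with File.OpenRead(pathToMiniToktrapport), pass the stream to ReadTables, and write each row with string.Join("\t", row). Tables nested inside cells are skipped, and merged cells are not expanded.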

Related

How to insert hyperlink in powerpoint slide using Open XML?

I have a scenario where I have to replace certain variables from a slide template with tabular data, but in this case the data and the slide text overlap, and after some research I found out that PowerPoint is not designed for such cases [MS Link] (img 1). To overcome this, I thought that instead of replacing the variables with tabular data, I should replace the variable with a link pointing to a newly created slide where I can put my tabular data (img 2).
So, coming back to my question: is there any way I can write the data without messing up the template? Or, how can I replace the variable with a hyperlink to the slide?

According to the documentation, you can do something like this. The sample collects the URIs of the external hyperlinks on every slide; adjust it to your case and to how you find variable1.
using System.Collections.Generic;
using DocumentFormat.OpenXml.Packaging;
using Drawing = DocumentFormat.OpenXml.Drawing;

// Holds the URIs of the external hyperlinks found on the slides.
var ret = new List<string>();
// Open the presentation file as read-only.
using (PresentationDocument document = PresentationDocument.Open(fileName, false))
{
    // Iterate through all the slide parts in the presentation part.
    foreach (SlidePart slidePart in document.PresentationPart.SlideParts)
    {
        IEnumerable<Drawing.HyperlinkType> links = slidePart.Slide.Descendants<Drawing.HyperlinkType>();
        // Iterate through all the links in the slide part.
        foreach (Drawing.HyperlinkType link in links)
        {
            // Iterate through all the external relationships in the slide part.
            foreach (HyperlinkRelationship relation in slidePart.HyperlinkRelationships)
            {
                // If the relationship ID matches the link ID…
                if (relation.Id.Equals(link.Id))
                {
                    // Add the URI of the external relationship to the list of strings.
                    ret.Add(relation.Uri.AbsoluteUri);
                }
            }
        }
    }
}
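The sample above only reads existing links. For the second part of the question, one approach is a slide-jump link: the run that held the variable gets an a:hlinkClick element whose action is ppaction://hlinksldjump and whose r:id points at a relationship from the source slide to the target slide. This is a minimal sketch, assuming the DocumentFormat.OpenXml package, a placeholder that sits entirely inside one text run, and run properties without a:rtl or a:extLst children; SlideLinker and LinkPlaceholderToSlide are hypothetical names, so treat it as a starting point under those assumptions.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using A = DocumentFormat.OpenXml.Drawing;
using P = DocumentFormat.OpenXml.Presentation;

// Hypothetical helper: replaces a placeholder on one slide with link text and turns
// that text into a hyperlink that jumps to another slide in the same presentation.
public static class SlideLinker
{
    // sourceIndex and targetIndex are zero-based positions in the slide order.
    // Returns how many text runs were changed.
    public static int LinkPlaceholderToSlide(PresentationDocument document, int sourceIndex,
        int targetIndex, string placeholder, string linkText)
    {
        PresentationPart presentationPart = document.PresentationPart;
        // SlideParts is not guaranteed to follow the on-screen order,
        // so resolve slides through the SlideIdList (p:sldIdLst) instead.
        var slideIds = presentationPart.Presentation.SlideIdList.Elements<P.SlideId>().ToList();
        var source = (SlidePart)presentationPart.GetPartById(slideIds[sourceIndex].RelationshipId.Value);
        var target = (SlidePart)presentationPart.GetPartById(slideIds[targetIndex].RelationshipId.Value);

        string relId = null;
        int changed = 0;
        // Assumption: the placeholder is inside a single run (a:r). PowerPoint can split
        // typed text across several runs, which this sketch does not handle.
        foreach (A.Run run in source.Slide.Descendants<A.Run>().ToList())
        {
            A.Text text = run.Text;
            if (text == null || !text.Text.Contains(placeholder)) continue;

            text.Text = text.Text.Replace(placeholder, linkText);
            // Relationship from the source slide to the target slide, created on first use.
            relId ??= source.CreateRelationshipToPart(target);
            if (run.RunProperties == null) run.RunProperties = new A.RunProperties();
            // Appending assumes rPr has no a:rtl or a:extLst child yet, because the
            // schema requires hlinkClick to come before those elements.
            run.RunProperties.AppendChild(new A.HyperlinkOnClick { Id = relId, Action = "ppaction://hlinksldjump" });
            changed++;
        }
        if (changed > 0) source.Slide.Save();
        return changed;
    }
}
```

You would open the file with PresentationDocument.Open(fileName, true) and pass the index of the slide holding the variable and the index of the slide with the tabular data; with the default AutoSave setting, the package is written back when the document is disposed.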

openXML tables not being read in MS word

I'm trying to read all the tables from a Word file into a list, but for some reason the count is 0 regardless of how many tables are in the file. Here's my code:
public void FindAndReplace(string DocPath)
{
    using (WordprocessingDocument doc = WordprocessingDocument.Open(DocPath, true))
    {
        using (StreamReader reader = new StreamReader(doc.MainDocumentPart.GetStream()))
        {
            //Text titlePlaceholder = doc.MainDocumentPart.Document.Body.Descendants<Text>().Where((x) => x.Text == "Compliance Review By:").First();
            List<Table> tables = doc.MainDocumentPart.Document.Descendants<Table>().ToList();
            System.Console.WriteLine(tables.Count);
        }
    }
}
tables.Count is 0. What am I doing wrong?

If all you're trying to do is READ the tables, then there's no need to open the document for editing (which is what you're doing currently).
Set the second parameter of WordprocessingDocument.Open() to false to open the document for reading only. This will prevent the error about opening an entry more than once in Update mode (I assume that's what you're running into, anyway).
Solution based on the comments:
The real culprit here is using the wrong Open XML namespace when looking for tables in the document. When looking for Descendants of type Table, the type argument must be DocumentFormat.OpenXml.Wordprocessing.Table, NOT DocumentFormat.OpenXml.Drawing.Table.
I don't know what type of object DocumentFormat.OpenXml.Drawing.Table is used for. I'll ask about this in a separate question.

You are probably referencing the wrong Table type. This should work:
var tables = doc.MainDocumentPart.Document.Descendants<DocumentFormat.OpenXml.Wordprocessing.Table>().ToList();

Anu start had the answer in the comments. The problem was that I was using an incorrect namespace: instead of DocumentFormat.OpenXml.Wordprocessing.Table, I was using DocumentFormat.OpenXml.Drawing.Table.
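The two Table types map to different XML elements: DocumentFormat.OpenXml.Wordprocessing.Table is w:tbl, the table element in a Word body, while DocumentFormat.OpenXml.Drawing.Table is the DrawingML a:tbl element (the kind PowerPoint uses inside a graphic frame). Giving each namespace an alias makes the choice explicit at every call site. A minimal sketch, assuming the DocumentFormat.OpenXml package; TableCounter and CountTables are hypothetical names:

```csharp
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using D = DocumentFormat.OpenXml.Drawing;
using W = DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper: counts tables with both Table types to show why the
// Drawing one finds nothing in a Word document body.
public static class TableCounter
{
    public static (int WordTables, int DrawingTables) CountTables(Stream docx)
    {
        // Read-only is enough for counting.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docx, false))
        {
            W.Document document = doc.MainDocumentPart.Document;
            // W.Table matches w:tbl; Descendants also finds tables nested in cells.
            int word = document.Descendants<W.Table>().Count();
            // D.Table matches the DrawingML a:tbl element, so it does not match w:tbl.
            int drawing = document.Descendants<D.Table>().Count();
            return (word, drawing);
        }
    }
}
```

For a document containing two ordinary Word tables, this should report two Wordprocessing tables and zero Drawing tables, which matches the count of 0 in the question.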

Iterate through a Microsoft Word document to find and replace tables

I have some VBA code that iterates through a document to remove tables from a document. The following code works fine in VBA:
Set wrdDoc = ThisDocument
With wrdDoc
    For Each tbl In wrdDoc.Tables
        tbl.Select
        Selection.Delete
    Next tbl
End With
Unfortunately, I cannot easily translate this code to C#, presumably because there is a newer Range.Find method. Here are three things I tried, each failing.
First attempt (re-write of the VBA code):
foreach (var item in doc.Tables)
{
    item.Delete; //NOPE! No "Delete" function.
}
I tried this:
doc = app.Documents.Open(sourceFolderAndFile); //sourceFolderAndFile opens a standard word document.
var rng = doc.Tables;
foreach (var item in rng)
{
    item.Delete; //NOPE! No "Delete" function.
}
I also tried this:
doc = app.Documents.Open(sourceFolderAndFile); //sourceFolderAndFile opens a standard word document.
var rng = doc.Tables;
Range.Find.Execute(... //NOPE! No Range.Find available for the table collection.
...
Could someone please help me understand how I can use C# and Word Interop (Word 2013 and 2016) to iterate through a document, find a table, and then perform a function, like selecting it, deleting it, or replacing it?
Thanks!

It took me some time to figure this answer out. With all the code samples online, I missed the need to create an app. For posterity, here is how I resolved the problem.
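Make sure you have a using directive, like this:
using MsWord = Microsoft.Office.Interop.Word;
Open the document and then work with the MsWord alias, the range, and the table. Declaring the loop variable as MsWord.Table is what makes Delete() available: doc.Tables is a non-generic COM collection, so foreach (var item in doc.Tables) types item as object, which has no Delete method. I provide a basic example below:
//start Word; this is the "app" that is easy to miss in online samples.
var app = new MsWord.Application();
//open the document.
MsWord.Document doc = app.Documents.Open(sourceFolderAndFile, ReadOnly: true, ConfirmConversions: false);
//iterate through the tables and delete them.
foreach (MsWord.Table table in doc.Tables)
{
    //select the area where the table is located and delete it.
    MsWord.Range rng = table.Range;
    rng.SetRange(table.Range.End, table.Range.End);
    table.Delete();
}
//clean up: close the document without saving and quit Word to release memory.
doc.Close(SaveChanges: false);
app.Quit();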
You can use the Range (rng) to replace the table with other items, like text, images, etc.

c# CSVHelper read CSV with variable headers

First time using CsvReader. Note that it requires a custom class that defines the headers found in the CSV file:
class DataRecord
{
    //Should have properties which correspond to the Column Names in the file
    public String Amount { get; set; }
    public String InvoiceDate { get; set; }
    //......
}
The example then uses the class like this:
using (var sr = new StreamReader(@"C:\Data\Invoices.csv"))
{
    var reader = new CsvReader(sr);
    //GetRecords returns a lazily evaluated enumerable; rows are read as it is iterated
    IEnumerable<DataRecord> records = reader.GetRecords<DataRecord>();
    //First 5 records in CSV file will be printed to the Output Window
    foreach (DataRecord record in records.Take(5))
    {
        Debug.Print("{0} {1}, {2}", record.Amount, record.InvoiceDate, ....);
    }
}
Two questions:
1. The app will be loading files with differing headers, so I need to be able to update this class on the fly. Is this possible, and how? (I am able to extract the headers from the CSV file.)
2. The CSV file is potentially multiple millions of rows (GB in size), so is this the best / most efficient way of importing the file? The destination is a SQLite DB; the debug line is used as an example.
Thanks

The app will be loading in files with differing headers so I need to be able to update this class on the fly - is this possible & how?
Although it is definitely possible with reflection or third-party libraries, creating an object for each row will be inefficient for files this big. Moreover, using C# for such a scenario is a bad idea (unless you have some business data transformation). I would consider something like this, or perhaps an SSIS package.
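If you do stay in C#, one way to cope with headers that are only known at run time is to skip the record class entirely and read raw fields by position. A minimal sketch, assuming a recent CsvHelper release in which CsvReader takes a CultureInfo and exposes HeaderRecord (older versions differ, so check against yours); VariableCsv and ReadWithHeader are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using CsvHelper;

// Hypothetical helper: streams a CSV whose columns are only known at run time.
// The first array yielded is the header row; every later array is one data row.
public static class VariableCsv
{
    public static IEnumerable<string[]> ReadWithHeader(TextReader input)
    {
        // Disposing the CsvReader also disposes the TextReader by default.
        using (var csv = new CsvReader(input, CultureInfo.InvariantCulture))
        {
            if (!csv.Read()) yield break; // empty file
            csv.ReadHeader();
            string[] headers = csv.HeaderRecord;
            yield return headers;

            // Rows are read one at a time as the caller iterates; nothing is buffered.
            while (csv.Read())
            {
                var values = new string[headers.Length];
                for (int i = 0; i < headers.Length; i++)
                    values[i] = csv.GetField(i);
                yield return values;
            }
        }
    }
}
```

Every data array lines up with the header array by index, so you could build the SQLite column list from the first array and insert the remaining rows in batches. Because the rows are read lazily, memory use should not grow with the size of the file.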

How to get tables from a word file and store them into a datagridview?

I am working with C# in VS 2013. In my program, I want to get a Word file as input from an OpenFileDialog. Then I want to open it, extract the tables it contains, and finally store them in a DataGridView.
Please, I need a tutorial to follow.
Thank you!!

I presume you are working with the Open XML SDK. In that case, something like this will give you access to all of the tables:
Body body = doc.MainDocumentPart.Document.Body;
foreach (Table t in body.Descendants<Table>())
{
    ...
}
See this as well: https://msdn.microsoft.com/en-us/library/office/cc850835(v=office.14).aspx
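To get from there to the DataGridView, one option is to copy a table into a System.Data.DataTable and bind it. A minimal sketch, assuming the DocumentFormat.OpenXml package and a table whose first row holds unique column headers; WordTableLoader and LoadTable are hypothetical names:

```csharp
using System.Data;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using W = DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper: copies one Word table into a System.Data.DataTable, which a
// WinForms DataGridView can display through its DataSource property.
public static class WordTableLoader
{
    // Assumptions: the first row holds unique column headers, and merged cells are
    // not expanded (a short row leaves its trailing columns as DBNull).
    public static DataTable LoadTable(Stream docx, int tableIndex)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docx, false))
        {
            W.Table table = doc.MainDocumentPart.Document.Body.Elements<W.Table>().ElementAt(tableIndex);
            var rows = table.Elements<W.TableRow>().ToList();
            var result = new DataTable();
            if (rows.Count == 0) return result;

            // Header row becomes the columns; an empty header gets a default name like "Column1".
            foreach (W.TableCell cell in rows[0].Elements<W.TableCell>())
                result.Columns.Add(cell.InnerText);

            // Remaining rows become data rows; cells beyond the header width are ignored.
            foreach (W.TableRow row in rows.Skip(1))
            {
                string[] values = row.Elements<W.TableCell>().Select(c => c.InnerText).ToArray();
                DataRow dataRow = result.NewRow();
                for (int i = 0; i < values.Length && i < result.Columns.Count; i++)
                    dataRow[i] = values[i];
                result.Rows.Add(dataRow);
            }
            return result;
        }
    }
}
```

In a WinForms form you could then write dataGridView1.DataSource = WordTableLoader.LoadTable(stream, 0); with stream coming from File.OpenRead(openFileDialog1.FileName). The names dataGridView1 and openFileDialog1 are the designer's default control names, assumed here.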
