How to "Insert text from file" using OpenXml SDK 2.0


The Word 2010 GUI has an option, "Insert text from file...", which does exactly that: it inserts the text from the main part of another document at the current location in your document.
I would like to do the same using C# and the OpenXml SDK 2.0:
using (var mainDocument = WordprocessingDocument.Open("MainFile.docx", true))
{
    var mainPart = mainDocument.MainDocumentPart;
    var bookmarkStart = mainPart
        .Document
        .Body
        .Descendants<BookmarkStart>()
        .SingleOrDefault(b => b.Name == "ExtraContentBookmark");
    var extraContent = GetTextFromFile("ExtraFile.docx");
    bookmarkStart.InsertAfterSelf(extraContent);
}
I have tried using plain Xml (XElement), using OpenXmlElement (MainDocumentPart.Document.Body.Descendants), and using AltChunk. Every alternative so far has yielded a non-conformant docx-file.
What should the method GetTextFromFile look like?

This is how I implemented it. The solution was to use AltChunk as described by Eric White. I had already tried it, but as Bradley said in his answer, a bookmark may be anywhere in a document, and mine was inside a paragraph. As soon as I inserted the text before the containing paragraph, everything worked fine.
Here is the (simplified) code:
using (var mainDocument = WordprocessingDocument.Open("MainFile.docx", true))
{
    var mainPart = mainDocument.MainDocumentPart;
    var bookmarkStart = mainPart
        .Document
        .Body
        .Descendants<BookmarkStart>()
        .SingleOrDefault(b => b.Name == "ExtraContentBookmark");
    var altChunk = GetAltChunkFromFile("ExtraFile.docx", mainPart);
    // The bookmark sits inside a paragraph, so insert the chunk before that paragraph.
    var containingParagraph = bookmarkStart.Ancestors<Paragraph>().FirstOrDefault();
    containingParagraph.InsertBeforeSelf(altChunk);
}
...
private AltChunk GetAltChunkFromFile(string filename, MainDocumentPart mainDocumentPart)
{
    var altChunkId = "AltChunkId1";
    var chunk = mainDocumentPart.AddAlternativeFormatImportPart(
        AlternativeFormatImportPartType.WordprocessingML, altChunkId);
    // Dispose the file stream once its contents have been fed into the part.
    using (var stream = File.Open(filename, FileMode.Open))
    {
        chunk.FeedData(stream);
    }
    return new AltChunk { Id = altChunkId };
}

It is not as simple as inserting the descendants of the document body tag at the bookmark location. Some reasons:
- The two documents may be using different styles; you would have to copy across dependent styles, or update the references to use the styles in the destination document.
- The <bookmarkStart> tag can appear almost anywhere in a document, including inside a paragraph, a run, a table cell, etc. Since you cannot nest paragraphs or runs, you will have to determine where the bookmark is situated, then ascend/descend the XML tree until you find an appropriate place to insert the content.
What you're trying to do becomes quite a complicated task when using the OpenXml SDK. It requires an in-depth understanding of the format and its schema.
I would almost advise using VSTO/OLE automation instead, as it enables you to use the functionality that is built into Word.
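The "ascend the tree until you find an appropriate place" step can be sketched on the raw WordprocessingML with System.Xml.Linq alone. This is only an illustration under assumptions: the class and method names are made up for this example, the relationship id must already point at an alternative format import part in the real package, and the sketch simply climbs from the bookmark to the direct child of w:body and inserts a w:altChunk element before it.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of the OpenXml SDK.
public static class BookmarkInsertSketch
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    static readonly XNamespace R =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Inserts <w:altChunk r:id="..."/> before the top-level block (paragraph,
    // table, ...) of the given w:body element that contains the named bookmark.
    // Returns false when no bookmark with that name exists.
    public static bool InsertAltChunkBeforeBookmarkBlock(
        XElement body, string bookmarkName, string relationshipId)
    {
        XElement bookmark = body
            .Descendants(W + "bookmarkStart")
            .FirstOrDefault(b => (string)b.Attribute(W + "name") == bookmarkName);
        if (bookmark == null)
            return false;

        // Climb until the current element is a direct child of w:body.
        // w:altChunk is allowed there, whereas it is not allowed inside
        // a paragraph or a run.
        XElement anchor = bookmark;
        while (anchor.Parent != body)
            anchor = anchor.Parent;

        anchor.AddBeforeSelf(
            new XElement(W + "altChunk", new XAttribute(R + "id", relationshipId)));
        return true;
    }
}
```

If the bookmark sits in a table cell, this sketch places the chunk before the whole table; inserting into the cell itself would need extra handling.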

Related

Fill word template data using openXML SDK

I have a Word file template and an XML file for data. I want to find a content control in the Word document, get the data from the XML, and then replace the text in the Word template. I'm using the following code, but it is not updating the Word file.
using (WordprocessingDocument document = WordprocessingDocument.CreateFromTemplate(txtWordFile.Text))
{
MainDocumentPart mainPart = document.MainDocumentPart;
IEnumerable<SdtBlock> block = mainPart.Document.Body.Descendants<SdtBlock>().Where
(r => r.SdtProperties.GetFirstChild<Tag>().Val == "TotalClose");
Text t = block.Descendants<Text>().Single();
t.Text = "13,450,542";
mainPart.Document.Save();
}

For anyone still struggling with this - you can check out this library https://github.com/antonmihaylov/OpenXmlTemplates
With it you can replace the text inside all content controls of the document based on a JSON object (or a basic C# dictionary) without writing specific code, instead you specify the variable name in the tag of the content control.
(Note - I am the maker of that library, but it is open source and licensed under LGPLv3)

I think you should write changes to temporary file.
See Save modified WordprocessingDocument to new file or my code from work project:
MemoryStream yourDocStream = new MemoryStream();
... // populate yourDocStream with .docx bytes
using (Package package = Package.Open(yourDocStream, FileMode.Open, FileAccess.ReadWrite))
{
// Load the document XML in the part into an XDocument instance.
PackagePart packagePart = LoadXmlPackagePart(package);
XDocument xDocument = XDocument.Load(XmlReader.Create(packagePart.GetStream()));
// making changes
// Save the XML into the package
using (XmlWriter xw = XmlWriter.Create(packagePart.GetStream(FileMode.Create, FileAccess.Write)))
{
xDocument.Save(xw);
}
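}
// Read the bytes only after the package has been closed, so that all changes have been flushed to the stream.
var resultDocumentBytes = yourDocStream.ToArray();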

The basic approach you use works fine, but I'm surprised you're not getting any compile-time errors because
IEnumerable<SdtBlock> block = mainPart.Document
.Body
.Descendants<SdtBlock>()
.Where(r => r.SdtProperties.GetFirstChild<Tag>().Val == "TotalClose");
is not compatible with
Text t = block.Descendants<Text>().Single();
An IEnumerable<SdtBlock> has no Descendants method. You either need to loop through all the items in the IEnumerable and perform this on each item, or you need to define and instantiate a single item, like this:
using (WordprocessingDocument document = WordprocessingDocument.CreateFromTemplate(txtWordFile.Text))
{
MainDocumentPart mainPart = document.MainDocumentPart;
SdtBlock block = mainPart.Document.Body.Descendants<SdtBlock>().Where
(r => r.SdtProperties.GetFirstChild<Tag>().Val == "test1").FirstOrDefault();
Text t = block.Descendants<Text>().Single();
t.Text = "13,450,542";
mainPart.Document.Save();
}

c# create word document with openXML : XML Parsing Error (when replacement string contains spaces)

I am trying to create a word document using a word template in my C# application using openXML. Here is my code so far:
DirectoryInfo tempDir = new DirectoryInfo(Server.MapPath("~\\Files\\WordTemplates\\"));
DirectoryInfo docsDir = new DirectoryInfo(Server.MapPath("~\\Files\\FinanceDocuments\\"));
string ype = "test Merge"; //if ype string contains spaces then I get this error
string sourceFile = tempDir + "\\PaymentOrderTemplate.dotx";
string destinationFile = docsDir + "\\" + "PaymentOrder.doc";
// Create a copy of the template file and open the copy
File.Copy(sourceFile, destinationFile, true);
// create key value pair, key represents words to be replace and
//values represent values in document in place of keys.
Dictionary<string, string> keyValues = new Dictionary<string, string>();
keyValues.Add("ype", ype);
SearchAndReplace(destinationFile, keyValues);
Process.Start(destinationFile);
And the SearchAndReplace function:
public static void SearchAndReplace(string document, Dictionary<string, string> dict)
{
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
{
string docText = null;
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
{
docText = sr.ReadToEnd();
}
foreach (KeyValuePair<string, string> item in dict)
{
Regex regexText = new Regex(item.Key);
docText = regexText.Replace(docText, item.Value);
}
using (StreamWriter sw = new StreamWriter(
wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
{
sw.Write(docText);
}
}
}
But when I try to open the exported file I get this error:
XML parsing error
Location: Part: /word/document.xml, line: 2, Column: 2142
Document.xml first lines:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" 
xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 w16se w16cid wp14">
<w:body>
<w:tbl>
<w:tblPr>
<w:tblW w:w="10348" w:ttest Merge="dxa"/>
<w:tblInd w:w="108" w:ttest Merge="dxa"/>
<w:tblBorders>
Edit
I found out that the problem occurred because I was using merge fields in the Word template. If I use plain text it works. But in this case it will be slow, because it has to check every single word in the template and replace it if it matches. Is it possible to do it in another way?

Disclaimer: You seem to be using the OpenXML SDK, because your code looks virtually identical to that found here: https://msdn.microsoft.com/en-us/library/bb508261(v=office.12).aspx - I've never in my life used this SDK and I'm basing this answer on an educated guess at what's happening
It seems that the operation you're carrying out on this Word document is affecting parts of the document that you didn't intend.
I believe that calling document.MainDocumentPart.GetStream() just gives you more or less raw, direct access to the XML of the document; you're then treating it as a plain XML file, manipulating it as text, and carrying out a list of straight text replacements. I think that is the likely cause of the problem: you intend to edit document text, but accidentally damage the XML node structure in the process.
By way of an example, here is a simple HTML document:
<html>
<head><title>Damage report</title></head>
<body>
<p>The soldier was shot once in the body and twice in the head</p>
</body>
</html>
You decide to run a find/replace to make the places the soldier was shot, a bit more specific:
var html = File.ReadAllText(@"c:\my.html");
html = html.Replace("body", "chest");
html = html.Replace("head", "forehead");
File.WriteAllText(@"c:\my.html", html);
Only thing, your document is now ruined:
<html>
<forehead><title>Damage report</title></forehead>
<chest>
<p>The soldier was shot once in the chest and twice in the forehead</p>
</chest>
</html>
A browser can't parse it (well, it's still valid I suppose, but it's meaningless) any more because the replacement operation broke some things.
You're replacing "ype" with "test Merge" but this seems to be clobbering an occurrence of the word "type" - something that it seems pretty likely would appear in the XML attribute or element names - and turning it into "ttest Merge".
To correctly change the content of an XML document's node texts, it should be parsed from text to an XML document object model representation, the nodes iterated, the texts altered, and the whole thing re-serialized back to xml text. Office SDK does seem to provide ways to do this, because you can treat a document like a collection of class object instances, and say things like this code snippet (also from MSDN):
// Create a Wordprocessing document.
using (WordprocessingDocument myDoc = WordprocessingDocument.Create(docName, WordprocessingDocumentType.Document))
{
// Add a new main document part.
MainDocumentPart mainPart = myDoc.AddMainDocumentPart();
//Create DOM tree for simple document.
mainPart.Document = new Document();
Body body = new Body();
Paragraph p = new Paragraph();
Run r = new Run();
Text t = new Text("Hello World!");
//Append elements appropriately.
r.Append(t);
p.Append(r);
body.Append(p);
mainPart.Document.Append(body);
// Save changes to the main document part.
mainPart.Document.Save();
}
You should be looking for another way, not using streams/direct low level xml access, to access the document elements. Something like these:
https://blogs.msdn.microsoft.com/brian_jones/2009/01/28/traversing-in-the-open-xml-dom/
https://www.gemboxsoftware.com/document/articles/find-replace-word-csharp
Or possibly starting with a related SO question like this: Search And Replace Text in OPENXML (Added file) (though the answer you need may be in the something linked inside this question)
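As a rough illustration of the "parse, alter the node texts, re-serialize" idea, here is a minimal sketch that uses only System.Xml.Linq on the document.xml content. The class and method names are invented for this example, and it assumes each placeholder sits entirely within a single w:t element; Word often splits text across several runs, which this sketch does not handle.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of the OpenXml SDK.
public static class TextNodeReplaceSketch
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Applies the replacements only to the text of w:t elements, so element
    // and attribute names such as w:type are never touched.
    // Note: the returned string does not include an XML declaration.
    public static string ReplaceInTextNodes(
        string documentXml, IDictionary<string, string> replacements)
    {
        XDocument doc = XDocument.Parse(documentXml, LoadOptions.PreserveWhitespace);

        // Materialize the list first so the tree is not modified while
        // it is being enumerated.
        foreach (XElement t in doc.Descendants(W + "t").ToList())
        {
            string value = t.Value;
            foreach (KeyValuePair<string, string> item in replacements)
                value = value.Replace(item.Key, item.Value);
            t.Value = value;
        }

        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

With the replacement "ype" -> "test Merge" from the question, a w:type="dxa" attribute stays intact while a w:t element containing "ype" is updated.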

Add HTML String to OpenXML (*.docx) Document

I am trying to use Microsoft's OpenXML 2.5 library to create an OpenXML document. Everything works great, until I try to insert an HTML string into my document. I have scoured the web and here is what I have come up with so far (snipped to just the portion I am having trouble with):
Paragraph paragraph = new Paragraph();
Run run = new Run();
string altChunkId = "id1";
AlternativeFormatImportPart chunk =
document.MainDocumentPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.Html, altChunkId);
chunk.FeedData(new MemoryStream(Encoding.UTF8.GetBytes(ioi.Text)));
AltChunk altChunk = new AltChunk { Id = altChunkId };
run.AppendChild(new Break());
paragraph.AppendChild(run);
body.AppendChild(paragraph);
Obviously, I haven't actually added the altChunk in this example, but I have tried appending it everywhere - to the run, paragraph, body, etc. In every case, I am unable to open the docx file in Word 2010.
This is making me a little nutty because it seems like it should be straightforward (I will admit that I'm not fully understanding the AltChunk "thing"). Would appreciate any help.
Side Note: One thing I did find that was interesting, and I don't know if it's actually a problem or not, is this response which says AltChunk corrupts the file when working from a MemoryStream. Can anybody confirm that this is/isn't true?

I can reproduce the error "... there is a problem with the content" by using
an incomplete HTML document as the content of the alternative format import part.
For example if you use the following HTML snippet <h1>HELLO</h1>
MS Word is unable to open the document.
The code below shows how to add an AlternativeFormatImportPart to a word document.
(I've tested the code with MS Word 2013).
using (WordprocessingDocument doc = WordprocessingDocument.Open(@"test.docx", true))
{
string altChunkId = "myId";
MainDocumentPart mainDocPart = doc.MainDocumentPart;
var run = new Run(new Text("test"));
var p = new Paragraph(new ParagraphProperties(
new Justification() { Val = JustificationValues.Center }),
run);
var body = mainDocPart.Document.Body;
body.Append(p);
MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes("<html><head></head><body><h1>HELLO</h1></body></html>"));
// Uncomment the following line to create an invalid word document.
// MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes("<h1>HELLO</h1>"));
// Create alternative format import part.
AlternativeFormatImportPart formatImportPart =
mainDocPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.Html, altChunkId);
//ms.Seek(0, SeekOrigin.Begin);
// Feed HTML data into format import part (chunk).
formatImportPart.FeedData(ms);
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;
mainDocPart.Document.Body.Append(altChunk);
}
According to the Office OpenXML specification valid parent elements for the
w:altChunk element are body, comment, docPartBody, endnote, footnote, ftr, hdr and tc.
So, I've added the w:altChunk to the body element.
For more information on the w:altChunk element see this MSDN link.
EDIT
As pointed out by @user2945722, to make sure that the OpenXml library correctly interprets the byte array as UTF-8, you should add the UTF-8 preamble. This can be done this way:
MemoryStream ms = new MemoryStream(new UTF8Encoding(true).GetPreamble().Concat(Encoding.UTF8.GetBytes(htmlEncodedString)).ToArray());
This will prevent your é's from being rendered as Ã©'s, your ä's as Ã¤'s, etc.
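The preamble step can be wrapped in a small helper so the bytes are easy to inspect. This is a sketch using only the .NET base class library; the class and method names are made up for this example.

```csharp
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper, not part of the OpenXml SDK.
public static class HtmlChunkStreamSketch
{
    // Returns a stream holding the UTF-8 preamble (bytes EF BB BF)
    // followed by the UTF-8 encoded HTML.
    public static MemoryStream ToUtf8StreamWithPreamble(string html)
    {
        byte[] preamble = new UTF8Encoding(true).GetPreamble();
        byte[] content = Encoding.UTF8.GetBytes(html);
        return new MemoryStream(preamble.Concat(content).ToArray());
    }
}
```

The returned stream is positioned at 0, so it can be passed directly to FeedData on the alternative format import part.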

Had the same problem here, but a totally different cause. Worth a try if the accepted solution doesn't help. Try closing the file after saving. In my case, it happened to be the difference between a corrupt and a clean docx file. Oddly, most other operations work with only a Save() and program exit.
String cid = "chunkid";
WordprocessingDocument document = WordprocessingDocument.Open("somefile.docx", true);
Body body = document.MainDocumentPart.Document.Body;
MemoryStream ms = new MemoryStream(System.Text.Encoding.UTF8.GetBytes("<html><head></head><body>hi</body></html>"));
AlternativeFormatImportPart formatImportPart = document.MainDocumentPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Html, cid);
formatImportPart.FeedData(ms);
AltChunk altChunk = new AltChunk();
altChunk.Id = cid;
document.MainDocumentPart.Document.Body.Append(altChunk);
document.MainDocumentPart.Document.Save();
// here's the magic!
document.Close();

Replace a bookmark in a Word document with the contents of another word document

I'm looking to replace a bookmark in a word document with the entire contents of another word document. I was hoping to do something along the lines of the following, but appending the xml does not seem to be enough as it does not include pictures.
using Word = Microsoft.Office.Interop.Word;
...
Word.Application wordApp = new Word.Application();
Word.Document doc = wordApp.Documents.Add(filename);
var bookmark = doc.Bookmarks.OfType<Bookmark>().First();
var doc2 = wordApp.Documents.Add(filename2);
bookmark.Range.InsertXML(doc2.Contents.XML);
The second document contains a few images and a few tables of text.
Update: Progress made by using XML, but still doesn't satisfy adding pictures as well.

You've jumped in deep.
If you're using the object model (bookmark.Range) and trying to insert a picture you can use the clipboard or bookmark.Range.InlineShapes.AddPicture(...). If you're trying to insert a whole document you can copy/paste the second document:
Object objUnit = Word.WdUnits.wdStory;
wordApp.Selection.EndKey(ref objUnit, ref oMissing);
wordApp.ActiveWindow.Selection.PasteAndFormat(Word.WdRecoveryType.wdPasteDefault);
If you're using XML there may be other problems, such as formatting, images, headers/footers not coming in correctly.
Depending on the task, it may be better to use DocumentBuilder and the OpenXML SDK. If you're writing a Word add-in you can use the object API, and it will likely perform the same; if you're processing documents without Word, go with the OpenXML SDK and DocumentBuilder. The issue with DocumentBuilder is that if it doesn't work, there aren't many workarounds to try. It's open source, but not the cleanest piece of code if you try troubleshooting it.

You can do this with the OpenXML SDK and DocumentBuilder. In outline, here is what you will need:
1> Inject insert key in main doc
public WmlDocument GetProcessedTemplate(string templatePath, string insertKey)
{
WmlDocument templateDoc = new WmlDocument(templatePath);
using (MemoryStream mem = new MemoryStream())
{
mem.Write(templateDoc.DocumentByteArray, 0, templateDoc.DocumentByteArray.Length);
using (WordprocessingDocument doc = WordprocessingDocument.Open([source], true))
{
XDocument xDoc = doc.MainDocumentPart.GetXDocument();
XElement bookMarkPara = [get bookmarkPara to replace];
bookMarkPara.ReplaceWith(new XElement(PtOpenXml.Insert, new XAttribute("Id", insertKey)));
doc.MainDocumentPart.PutXDocument();
}
templateDoc.DocumentByteArray = mem.ToArray();
}
return templateDoc;
}
2> Use document builder to merge
List<Source> documentSources = new List<Source>();
var insertKey = "INSERT_HERE_1";
var processedTemplate = GetProcessedTemplate([docPath], insertKey);
documentSources.Add(new Source(processedTemplate, true));
documentSources.Add(new Source(new WmlDocument([docToInsertFilePath]), insertKey));
DocumentBuilder.BuildDocument(documentSources, [outputFilePath]);

Populate a word template using C# in ASP.NET MVC3

I have read some posts about populating Word documents, but I need to populate a Word document (Office 2007) using C#. For example, I want to have a Word document with a label [NAME], use that label in C# to insert my value, and do all this in an ASP.NET MVC3 controller. Any ideas?

You could use the OpenXML SDK provided by Microsoft to manipulate Word documents. And here's a nice article (it's actually the third of a series of 3 articles) with a couple of examples.

You can do it like this:
- Add bookmarks ("signets" in French versions of Word) to your Word document template
- Work on a copy of your Word template
- Modify the bookmark values from C# code and save or print your file.
Be careful to release your Word process correctly if you handle several documents in your application :)

OP's solution extracted from the question:
The solution I found is this:
static void Main(string[] args)
{
Console.WriteLine("Starting up Word template updater ...");
//get path to template and instance output
string docTemplatePath = @"C:\Users\user\Desktop\Doc Offices XML\earth.docx";
string docOutputPath = @"C:\Users\user\Desktop\Doc Offices XML\earth_Instance.docx";
//create copy of template so that we don't overwrite it
File.Copy(docTemplatePath, docOutputPath);
Console.WriteLine("Created copy of template ...");
//stand up object that reads the Word doc package
using (WordprocessingDocument doc = WordprocessingDocument.Open(docOutputPath, true))
{
//create XML string matching custom XML part
string newXml = "<root>" +
"<Earth>Outer Space</Earth>" +
"</root>";
MainDocumentPart main = doc.MainDocumentPart;
main.DeleteParts<CustomXmlPart>(main.CustomXmlParts);
//MainDocumentPart mainPart = doc.AddMainDocumentPart();
//add and write new XML part
CustomXmlPart customXml = main.AddCustomXmlPart(CustomXmlPartType.CustomXml);
using (StreamWriter ts = new StreamWriter(customXml.GetStream()))
{
ts.Write(newXml);
}
//closing WordprocessingDocument automatically saves the document
}
Console.WriteLine("Done");
Console.ReadLine();
}
