I'm updating a Word document by rewriting its CustomXMLPart file. I've basically followed this tutorial: http://blogs.msdn.com/b/brian_jones/archive/2009/01/05/taking-advantage-of-bound-content-controls.aspx
private bool _makeDoc()
{
var path = HttpContext.Current.Server.MapPath("~/Classes/Word/template.docx");
using (WordprocessingDocument myDoc = WordprocessingDocument.Open(path, true))
{
//create new XML string
//these values will populate the template word doc
string newXML = "<root>";
newXML += "<name>";
newXML += "name goes here";
newXML += "</name>";
newXML += "<bio>";
newXML += "text" + "more text";
newXML += "</bio>";
newXML += "</root>";
MainDocumentPart mainPart = myDoc.MainDocumentPart;
//delete old xml part
mainPart.DeleteParts<CustomXmlPart>(mainPart.CustomXmlParts);
//add new xml part
CustomXmlPart customXml = mainPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
using(StreamWriter ts = new StreamWriter(customXml.GetStream()))
{
ts.Write(newXML);
}
myDoc.Close();
}
return true;
}
The problem is that I can't figure out how to add a line break between "text" and "more text". I've tried Environment.NewLine, I've tried wrapping it in <w:p><w:r><w:t> tags. I can't seem to get it to produce a valid docx file.
Any help would be appreciated.
The content control's properties dialog has an "Allow carriage returns" option. Turning this on and using Environment.NewLine worked perfectly.
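A minimal sketch of how the custom XML string could be built with a line break in it, assuming the bound content control has "Allow carriage returns" enabled. The CustomXmlBuilder class and its Build method are hypothetical names, not part of the Open XML SDK; only System.Xml.Linq is used, which also escapes characters such as & and < that would otherwise break a hand-concatenated string.

```csharp
using System;
using System.Xml.Linq;

public static class CustomXmlBuilder
{
    // Builds the <root> payload that is written into the custom XML part.
    // XElement escapes reserved characters in the values automatically.
    public static string Build(string name, params string[] bioLines)
    {
        var root = new XElement("root",
            new XElement("name", name),
            // Lines are joined with Environment.NewLine; the bound content
            // control must allow carriage returns for them to show as breaks.
            new XElement("bio", string.Join(Environment.NewLine, bioLines)));

        return root.ToString(SaveOptions.DisableFormatting);
    }
}
```

In the question's method this would replace the manual concatenation, for example: string newXML = CustomXmlBuilder.Build("name goes here", "text", "more text");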
I believe you'll have to wrap them in paragraphs in order to get the returns, as far as I know at least. So your resulting OOXML would look something like,
<w:p><w:r><w:t>Text</w:t></w:r></w:p>
<w:p><w:r><w:t>More text</w:t></w:r></w:p>
As for it not resulting in valid OOXML when you do this, have you opened the package's "document.xml" part and seen exactly where the XML is invalid?
Edit:
The OOXML SDK 2.0 comes with some validation tools you might find useful.
via raw XML you can add:
<w:r>
<w:br />
</w:r>
via OOXML SDK:
Paragraph paragraph1 = new Paragraph();
Run breakRun = new Run();
breakRun.Append( new Break() );
paragraph1.Append( breakRun );
// Paragraphs belong in the Body element, not directly under Document.
_document.MainDocumentPart.Document.Body.AppendChild(paragraph1);
//where _document is the WordprocessingDocument instance
I'm new to VSTO and OpenXML and I would like to develop a Word add-in that uses OpenXML to add a MergeField to the document. I can already add a MergeField from a console app, but I want to insert it from the Word add-in into the currently open document.
So I have this code in the ButtonClick handler:
// take current file location
var fileFullName = Globals.ThisAddIn.Application.ActiveDocument.FullName;
Globals.ThisAddIn.Application.ActiveDocument.Close(WdSaveOptions.wdSaveChanges, WdOriginalFormat.wdOriginalDocumentFormat, true);
// function to insert new field here
OpenAndAddTextToWordDocument(fileFullName, "username");
Globals.ThisAddIn.Application.Documents.Open(fileFullName);
And I created the function that should add the new MergeField:
public static DocumentFormat.OpenXml.Wordprocessing.Paragraph OpenAndAddTextToWordDocument(string filepath, string txt)
{
// Open a WordprocessingDocument for editing using the filepath.
WordprocessingDocument wordprocessingDocument =
WordprocessingDocument.Open(filepath, true);
// Assign a reference to the existing document body.
Body body = wordprocessingDocument.MainDocumentPart.Document.Body;
// add text
string instructionText = String.Format(" MERGEFIELD {0} \\* MERGEFORMAT", txt);
SimpleField simpleField1 = new SimpleField() { Instruction = instructionText };
Run run1 = new Run();
RunProperties runProperties1 = new RunProperties();
NoProof noProof1 = new NoProof();
runProperties1.Append(noProof1);
Text text1 = new Text();
text1.Text = String.Format("«{0}»", txt);
run1.Append(runProperties1);
run1.Append(text1);
simpleField1.Append(run1);
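DocumentFormat.OpenXml.Wordprocessing.Paragraph paragraph = new DocumentFormat.OpenXml.Wordprocessing.Paragraph();
paragraph.Append(new OpenXmlElement[] { simpleField1 });
// Attach the paragraph to the body; without this the document is never modified.
body.AppendChild(paragraph);
// Save and close before returning; statements placed after a return never run.
wordprocessingDocument.Close();
return paragraph;
}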
But something is not working here: when I use the add-in, it doesn't do anything.
Thanks for the help.
Add a try/catch and you'll probably find that it can't open the file because it's currently open for editing.
The OpenXML SDK is a library for writing to Office files without going through Office's interfaces. But you're trying to do so while also using Office's interfaces, so you're essentially trying to take two approaches at once. This isn't going to work unless you first close the document.
But what you probably want to do is use VSTO. In VSTO, each document has a Fields collection, that you can use to add fields.
Fields.Add(Range, Type, Text, PreserveFormatting)
I am trying to create a word document using a word template in my C# application using openXML. Here is my code so far:
DirectoryInfo tempDir = new DirectoryInfo(Server.MapPath("~\\Files\\WordTemplates\\"));
DirectoryInfo docsDir = new DirectoryInfo(Server.MapPath("~\\Files\\FinanceDocuments\\"));
string ype = "test Merge"; //if ype string contains spaces then I get this error
string sourceFile = tempDir + "\\PaymentOrderTemplate.dotx";
string destinationFile = docsDir + "\\" + "PaymentOrder.doc";
// Create a copy of the template file and open the copy
File.Copy(sourceFile, destinationFile, true);
// create key value pair, key represents words to be replace and
//values represent values in document in place of keys.
Dictionary<string, string> keyValues = new Dictionary<string, string>();
keyValues.Add("ype", ype);
SearchAndReplace(destinationFile, keyValues);
Process.Start(destinationFile);
And the SearchAndReplace function:
public static void SearchAndReplace(string document, Dictionary<string, string> dict)
{
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
{
string docText = null;
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
{
docText = sr.ReadToEnd();
}
foreach (KeyValuePair<string, string> item in dict)
{
Regex regexText = new Regex(item.Key);
docText = regexText.Replace(docText, item.Value);
}
using (StreamWriter sw = new StreamWriter(
wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
{
sw.Write(docText);
}
}
}
But when I try to open the exported file I get this error:
XML parsing error
Location: Part: /word/document.xml, line: 2, Column: 2142
Document.xml first lines:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" 
xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 w16se w16cid wp14">
<w:body>
<w:tbl>
<w:tblPr>
<w:tblW w:w="10348" w:ttest Merge="dxa"/>
<w:tblInd w:w="108" w:ttest Merge="dxa"/>
<w:tblBorders>
Edit
I found out that the problem occurred because I was using merge fields in the Word template. If I use plain text it works. But in that case it will be slow, because it has to check every single word in the template and replace it if it matches. Is it possible to do this another way?
Disclaimer: You seem to be using the OpenXML SDK, because your code looks virtually identical to that found here: https://msdn.microsoft.com/en-us/library/bb508261(v=office.12).aspx - I've never in my life used this SDK and I'm basing this answer on an educated guess at what's happening
It seems that the operation you're carrying out on this Word document is affecting parts of the document that you didn't intend.
I believe that calling document.MainDocumentPart.GetStream() just gives you more or less raw, direct access to the XML of the document, and you're then treating it as a plain XML file, manipulating it as text, and carrying out a list of straight text replacements. I think that's the likely cause of the problem: you intend to edit document text, but you're accidentally damaging the XML node structure in the process.
By way of an example, here is a simple HTML document:
<html>
<head><title>Damage report</title></head>
<body>
<p>The soldier was shot once in the body and twice in the head</p>
</body>
</html>
You decide to run a find/replace to make the places the soldier was shot, a bit more specific:
var html = File.ReadAllText(@"c:\my.html");
html = html.Replace("body", "chest");
html = html.Replace("head", "forehead");
File.WriteAllText(@"c:\my.html", html);
Only thing, your document is now ruined:
<html>
<forehead><title>Damage report</title></forehead>
<chest>
<p>The soldier was shot once in the chest and twice in the forehead</p>
</chest>
</html>
A browser can't parse it (well, it's still valid I suppose, but it's meaningless) any more because the replacement operation broke some things.
You're replacing "ype" with "test Merge" but this seems to be clobbering an occurrence of the word "type" - something that it seems pretty likely would appear in the XML attribute or element names - and turning it into "ttest Merge".
To correctly change the content of an XML document's node texts, it should be parsed from text to an XML document object model representation, the nodes iterated, the texts altered, and the whole thing re-serialized back to xml text. Office SDK does seem to provide ways to do this, because you can treat a document like a collection of class object instances, and say things like this code snippet (also from MSDN):
// Create a Wordprocessing document.
using (WordprocessingDocument myDoc = WordprocessingDocument.Create(docName, WordprocessingDocumentType.Document))
{
// Add a new main document part.
MainDocumentPart mainPart = myDoc.AddMainDocumentPart();
//Create DOM tree for simple document.
mainPart.Document = new Document();
Body body = new Body();
Paragraph p = new Paragraph();
Run r = new Run();
Text t = new Text("Hello World!");
//Append elements appropriately.
r.Append(t);
p.Append(r);
body.Append(p);
mainPart.Document.Append(body);
// Save changes to the main document part.
mainPart.Document.Save();
}
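To connect this back to the find/replace problem, here is a minimal sketch that parses the document XML and replaces text only inside w:t elements, so attribute names such as w:type are left alone. It uses LINQ to XML rather than the Open XML SDK classes so that it stands alone; TextNodeReplacer and Replace are hypothetical names. It assumes each placeholder sits entirely within a single w:t element, which Word does not guarantee (a word can be split across several runs).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TextNodeReplacer
{
    // Namespace of the WordprocessingML main document part.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Applies each replacement to the text content of <w:t> elements only.
    // Element names, attribute names and attribute values are not touched.
    public static string Replace(string documentXml,
                                 IDictionary<string, string> replacements)
    {
        XDocument doc = XDocument.Parse(documentXml, LoadOptions.PreserveWhitespace);

        // ToList() so the tree is not modified while it is being enumerated.
        foreach (XElement t in doc.Descendants(W + "t").ToList())
        {
            foreach (KeyValuePair<string, string> pair in replacements)
            {
                t.Value = t.Value.Replace(pair.Key, pair.Value);
            }
        }

        // Note: ToString() omits the XML declaration; when writing back to a
        // real part, saving the XDocument to the part stream is the safer route.
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

With the question's input, replacing "ype" with "test Merge" this way changes the visible text but leaves w:type="dxa" intact.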
You should be looking for another way, not using streams/direct low level xml access, to access the document elements. Something like these:
https://blogs.msdn.microsoft.com/brian_jones/2009/01/28/traversing-in-the-open-xml-dom/
https://www.gemboxsoftware.com/document/articles/find-replace-word-csharp
Or possibly starting with a related SO question like this: Search And Replace Text in OPENXML (Added file) (though the answer you need may be in the something linked inside this question)
I have a Word document that contains many pages. One of those pages contains a placeholder instead of other content. I want to replace that placeholder with another doc file without losing formatting. The doc file to be inserted may itself have many pages. How can I replace the placeholder with this doc file programmatically? I have searched a lot but could not find any way to insert a doc file in place of a placeholder. Thank you in advance.
Alternatively, how can we copy the contents of the doc to be inserted and then replace the placeholder with the copied content?
I found a post here. The code below is from that post.
With the library, you can do the following to replace text from a Word document, considering that documentByteArray is your document byte content taken from database:
using (MemoryStream mem = new MemoryStream())
{
mem.Write(documentByteArray, 0, (int)documentByteArray.Length);
// Open the in-memory copy; the original passed an undefined variable named document.
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(mem, true))
{
string docText = null;
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
{
docText = sr.ReadToEnd();
}
Regex regexText = new Regex("Hello world!");
docText = regexText.Replace(docText, "Hi Everyone!");
using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
{
sw.Write(docText);
}
}
}
If, instead of "Hi Everyone!", we want to replace it with binary data, which is an array of bytes:
byte[] binarydata = File.ReadAllBytes(filepaths);
How can we modify the program?
First of all, you should get a NuGet package called DocX (by Novacode); this is what I have found to be the best document creator and editor in the last few years.
using Novacode; // the DocX class lives in the Novacode namespace
void Main()
{
var doc = DocX.Load(@"c:\temp\existingDoc.docx");
var docToAdd = DocX.Load(@"c:\temp\docToAdd.docx");
doc.InsertDocument(docToAdd, true); //version 1.0.0.22
doc.InsertDocument(docToAdd); //version 1.0.0.19
}
This is the simplest implementation of what you're after, but it works.
For anything else, take a look at the documentation at
https://docx.codeplex.com/
or
http://cathalscorner.blogspot.co.uk/
These are the best places to start. If you do use this library, I would also recommend version 1.0.0.19, as there are some formatting issues in 1.0.0.22.
I am trying to use Microsoft's OpenXML 2.5 library to create an OpenXML document. Everything works great until I try to insert an HTML string into my document. I have scoured the web, and here is what I have come up with so far (snipped to just the portion I am having trouble with):
Paragraph paragraph = new Paragraph();
Run run = new Run();
string altChunkId = "id1";
AlternativeFormatImportPart chunk =
document.MainDocumentPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.Html, altChunkId);
chunk.FeedData(new MemoryStream(Encoding.UTF8.GetBytes(ioi.Text)));
AltChunk altChunk = new AltChunk { Id = altChunkId };
run.AppendChild(new Break());
paragraph.AppendChild(run);
body.AppendChild(paragraph);
Obviously, I haven't actually added the altChunk in this example, but I have tried appending it everywhere: to the run, paragraph, body, etc. In every case, I am unable to open the docx file in Word 2010.
This is making me a little nutty because it seems like it should be straightforward (I will admit that I'm not fully understanding the AltChunk "thing"). Would appreciate any help.
Side Note: One thing I did find that was interesting, and I don't know if it's actually a problem or not, is this response which says AltChunk corrupts the file when working from a MemoryStream. Can anybody confirm that this is/isn't true?
I can reproduce the error "... there is a problem with the content" by using
an incomplete HTML document as the content of the alternative format import part.
For example if you use the following HTML snippet <h1>HELLO</h1>
MS Word is unable to open the document.
The code below shows how to add an AlternativeFormatImportPart to a word document.
(I've tested the code with MS Word 2013).
using (WordprocessingDocument doc = WordprocessingDocument.Open(@"test.docx", true))
{
string altChunkId = "myId";
MainDocumentPart mainDocPart = doc.MainDocumentPart;
var run = new Run(new Text("test"));
var p = new Paragraph(new ParagraphProperties(
new Justification() { Val = JustificationValues.Center }),
run);
var body = mainDocPart.Document.Body;
body.Append(p);
MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes("<html><head></head><body><h1>HELLO</h1></body></html>"));
// Uncomment the following line to create an invalid word document.
// MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes("<h1>HELLO</h1>"));
// Create alternative format import part.
AlternativeFormatImportPart formatImportPart =
mainDocPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.Html, altChunkId);
//ms.Seek(0, SeekOrigin.Begin);
// Feed HTML data into format import part (chunk).
formatImportPart.FeedData(ms);
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;
mainDocPart.Document.Body.Append(altChunk);
}
According to the Office OpenXML specification valid parent elements for the
w:altChunk element are body, comment, docPartBody, endnote, footnote, ftr, hdr and tc.
So, I've added the w:altChunk to the body element.
For more information on the w:altChunk element see this MSDN link.
EDIT
As pointed out by @user2945722, to make sure that the OpenXml library correctly interprets the byte array as UTF-8, you should add the UTF-8 preamble. This can be done as follows (Concat requires using System.Linq):
MemoryStream ms = new MemoryStream(new UTF8Encoding(true).GetPreamble().Concat(Encoding.UTF8.GetBytes(htmlEncodedString)).ToArray());
This will prevent your é's from being rendered as Ã©'s, your ä's as Ã¤'s, etc.
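The two points from this answer (a complete HTML document and a UTF-8 preamble) can be combined in one small helper. This is only a sketch: HtmlChunk and ToStream are hypothetical names, and the wrapper markup is the minimal one used in the example above.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class HtmlChunk
{
    // Wraps an HTML fragment in a complete document and prefixes the
    // UTF-8 byte order mark, so the importer does not guess the encoding.
    public static MemoryStream ToStream(string htmlFragment)
    {
        string html = "<html><head></head><body>" + htmlFragment + "</body></html>";

        byte[] preamble = new UTF8Encoding(true).GetPreamble(); // EF BB BF
        byte[] content = Encoding.UTF8.GetBytes(html);

        // The stream is positioned at 0 and ready to be passed to FeedData.
        return new MemoryStream(preamble.Concat(content).ToArray());
    }
}
```

The result could then be fed to the import part, for example formatImportPart.FeedData(HtmlChunk.ToStream("<h1>HELLO</h1>"));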
Had the same problem here, but a totally different cause. Worth a try if the accepted solution doesn't help. Try closing the file after saving. In my case, it happened to be the difference between a corrupt and a clean docx file. Oddly, most other operations work with only a Save() and program exit.
String cid = "chunkid";
WordprocessingDocument document = WordprocessingDocument.Open("somefile.docx", true);
Body body = document.MainDocumentPart.Document.Body;
MemoryStream ms = new MemoryStream(System.Text.Encoding.UTF8.GetBytes("<html><head></head><body>hi</body></html>"));
AlternativeFormatImportPart formatImportPart = document.MainDocumentPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Html, cid);
formatImportPart.FeedData(ms);
AltChunk altChunk = new AltChunk();
altChunk.Id = cid;
document.MainDocumentPart.Document.Body.Append(altChunk);
document.MainDocumentPart.Document.Save();
// here's the magic!
document.Close();
I have a Word document with a bunch of content controls in it. These are mapped to a custom XML part. To build the document on the fly, I simply overwrite the custom XML part.
The problem I'm having is that if I don't define a particular item, its space is still visible in the document, pushing down the content below it and looking inconsistent with the rest of the document.
Here's a basic example of my code:
var path = HttpContext.Current.Server.MapPath("~/Classes/Word/LawyerBio.docx");
using (WordprocessingDocument myDoc = WordprocessingDocument.Open(path, true))
{
//create new XML string
//these values will populate the template word doc
string newXML = "<root>";
if (!String.IsNullOrEmpty(_lawyer["Recognition"]))
{
newXML += "<recognition>";
newXML += _text.Field("Recognition Title");
newXML += "</recognition>";
}
if (!String.IsNullOrEmpty(_lawyer["Board Memberships"]))
{
newXML += "<boards>";
newXML += _text.Field("Board Memberships Title");
newXML += "</boards>";
}
newXML += "</root>";
MainDocumentPart mainPart = myDoc.MainDocumentPart;
//delete old xml part
mainPart.DeleteParts<CustomXmlPart>(mainPart.CustomXmlParts);
//add new xml part
CustomXmlPart customXml = mainPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
using(StreamWriter ts = new StreamWriter(customXml.GetStream()))
{
ts.Write(newXML);
}
myDoc.Close();
}
Is there any way to make these content controls actually collapse/hide?
I think you will have to do either some preprocessing before the docx is opened in Word, or some postprocessing (eg via a macro).
As an example of the preprocessing approach, OpenDoPE defines a "condition" which you could use to exclude the undefined stuff.
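As a rough illustration of the preprocessing idea (this is not OpenDoPE, just a hand-rolled sketch), the document XML could be scanned for content controls whose data binding XPath selects nothing in the new custom XML, and those controls removed before the file is opened in Word. EmptyControlRemover and Remove are hypothetical names. The sketch assumes block-level w:sdt elements, binding XPaths without namespace prefixes, and that removing a control does not leave an invalid container behind (for example an empty table cell).

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class EmptyControlRemover
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Removes every <w:sdt> whose w:dataBinding XPath matches no element
    // in the supplied custom XML; unbound controls are left untouched.
    public static string Remove(string documentXml, string customXml)
    {
        XDocument doc = XDocument.Parse(documentXml, LoadOptions.PreserveWhitespace);
        XDocument data = XDocument.Parse(customXml);

        var orphaned = doc.Descendants(W + "sdt")
            .Where(sdt =>
            {
                string xpath = (string)sdt.Element(W + "sdtPr")
                    ?.Element(W + "dataBinding")
                    ?.Attribute(W + "xpath");
                return xpath != null && data.XPathSelectElement(xpath) == null;
            })
            .ToList(); // materialize before modifying the tree

        foreach (XElement sdt in orphaned)
        {
            sdt.Remove();
        }

        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

Whether this is preferable to a macro or to OpenDoPE conditions depends on how the template is laid out; controls nested inside tables would need extra care.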