I am trying to use Microsoft's Open XML SDK 2.5 to create an OpenXML document. Everything works great until I try to insert an HTML string into my document. I have scoured the web, and here is what I have come up with so far (snipped to just the portion I am having trouble with):
Paragraph paragraph = new Paragraph();
Run run = new Run();
string altChunkId = "id1";
AlternativeFormatImportPart chunk =
document.MainDocumentPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.Html, altChunkId);
chunk.FeedData(new MemoryStream(Encoding.UTF8.GetBytes(ioi.Text)));
AltChunk altChunk = new AltChunk { Id = altChunkId };
run.AppendChild(new Break());
paragraph.AppendChild(run);
body.AppendChild(paragraph);
Obviously, I haven't actually added the altChunk in this example, but I have tried appending it everywhere: to the run, the paragraph, the body, etc. In every case, I am unable to open the docx file in Word 2010.
This is making me a little nutty because it seems like it should be straightforward (I will admit that I'm not fully understanding the AltChunk "thing"). Would appreciate any help.
Side Note: One thing I did find that was interesting, and I don't know if it's actually a problem or not, is this response which says AltChunk corrupts the file when working from a MemoryStream. Can anybody confirm that this is/isn't true?
I can reproduce the error "... there is a problem with the content" by using
an incomplete HTML document as the content of the alternative format import part.
For example if you use the following HTML snippet <h1>HELLO</h1>
MS Word is unable to open the document.
The code below shows how to add an AlternativeFormatImportPart to a word document.
(I've tested the code with MS Word 2013).
using (WordprocessingDocument doc = WordprocessingDocument.Open(@"test.docx", true))
{
string altChunkId = "myId";
MainDocumentPart mainDocPart = doc.MainDocumentPart;
var run = new Run(new Text("test"));
var p = new Paragraph(new ParagraphProperties(
new Justification() { Val = JustificationValues.Center }),
run);
var body = mainDocPart.Document.Body;
body.Append(p);
MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes("<html><head></head><body><h1>HELLO</h1></body></html>"));
// Uncomment the following line to create an invalid word document.
// MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes("<h1>HELLO</h1>"));
// Create alternative format import part.
AlternativeFormatImportPart formatImportPart =
mainDocPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.Html, altChunkId);
//ms.Seek(0, SeekOrigin.Begin);
// Feed HTML data into format import part (chunk).
formatImportPart.FeedData(ms);
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;
mainDocPart.Document.Body.Append(altChunk);
}
According to the Office OpenXML specification valid parent elements for the
w:altChunk element are body, comment, docPartBody, endnote, footnote, ftr, hdr and tc.
So, I've added the w:altChunk to the body element.
For more information on the w:altChunk element see this MSDN link.
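Since a bare fragment such as <h1>HELLO</h1> is what triggers the error, one option is to wrap fragments before feeding them to the import part. The following is only a minimal sketch: the class and method names (HtmlChunk, EnsureHtmlDocument) are hypothetical, and the check for an existing <html> element is deliberately naive.

```csharp
using System;

public static class HtmlChunk
{
    // Hypothetical helper (not part of the Open XML SDK): returns the input
    // unchanged if it already contains an <html> element, otherwise wraps it
    // in the same minimal document skeleton used in the answer above.
    public static string EnsureHtmlDocument(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));

        // Naive detection: assumed to be good enough for simple input.
        bool hasHtmlElement =
            html.IndexOf("<html", StringComparison.OrdinalIgnoreCase) >= 0;

        return hasHtmlElement
            ? html
            : "<html><head></head><body>" + html + "</body></html>";
    }
}
```

The result can then be passed to Encoding.UTF8.GetBytes(...) exactly as in the code above.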
EDIT
As pointed out by @user2945722, to make sure that the OpenXml library correctly interprets the byte array as UTF-8, you should add the UTF-8 preamble. This can be done this way (Concat and ToArray need using System.Linq):
MemoryStream ms = new MemoryStream(new UTF8Encoding(true).GetPreamble().Concat(Encoding.UTF8.GetBytes(htmlEncodedString)).ToArray());
This will prevent your é's from being rendered as Ã©'s, your ä's as Ã¤'s, etc.
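To make the preamble step easier to reuse, it can be pulled into a small helper. This is just a sketch; HtmlEncoding and ToUtf8WithPreamble are names made up for illustration, and only standard .NET calls are used.

```csharp
using System;
using System.Linq;
using System.Text;

public static class HtmlEncoding
{
    // Hypothetical helper: returns the UTF-8 bytes of the HTML, prefixed with
    // the UTF-8 preamble (byte order mark, 0xEF 0xBB 0xBF).
    public static byte[] ToUtf8WithPreamble(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));

        byte[] preamble = new UTF8Encoding(true).GetPreamble();
        byte[] content = Encoding.UTF8.GetBytes(html);
        return preamble.Concat(content).ToArray();
    }
}
```

Usage would look like new MemoryStream(HtmlEncoding.ToUtf8WithPreamble(htmlEncodedString)), with the stream then passed to FeedData as before.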
Had the same problem here, but a totally different cause. Worth a try if the accepted solution doesn't help. Try closing the file after saving. In my case, it happened to be the difference between a corrupt and a clean docx file. Oddly, most other operations work with only a Save() and program exit.
String cid = "chunkid";
WordprocessingDocument document = WordprocessingDocument.Open("somefile.docx", true);
Body body = document.MainDocumentPart.Document.Body;
MemoryStream ms = new MemoryStream(System.Text.Encoding.UTF8.GetBytes("<html><head></head><body>hi</body></html>"));
AlternativeFormatImportPart formatImportPart = document.MainDocumentPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Html, cid);
formatImportPart.FeedData(ms);
AltChunk altChunk = new AltChunk();
altChunk.Id = cid;
document.MainDocumentPart.Document.Body.Append(altChunk);
document.MainDocumentPart.Document.Save();
// here's the magic!
document.Close();
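An alternative that avoids forgetting the Close() call is to wrap the document in a using block, since disposing a WordprocessingDocument also closes the package. A sketch under that assumption follows; the HtmlAppender class and the AppendHtml method with its parameters are hypothetical names, and the SDK calls are the same ones used above.

```csharp
using System.IO;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class HtmlAppender
{
    // Hypothetical helper: appends a complete HTML document to the end of
    // the body of an existing docx file as an altChunk.
    public static void AppendHtml(string docxPath, string html, string chunkId)
    {
        using (WordprocessingDocument document = WordprocessingDocument.Open(docxPath, true))
        using (MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes(html)))
        {
            MainDocumentPart mainPart = document.MainDocumentPart;

            // Store the HTML in its own part, addressed by chunkId.
            AlternativeFormatImportPart importPart =
                mainPart.AddAlternativeFormatImportPart(
                    AlternativeFormatImportPartType.Html, chunkId);
            importPart.FeedData(ms);

            // Reference that part from the body.
            mainPart.Document.Body.Append(new AltChunk { Id = chunkId });
            mainPart.Document.Save();
        } // Leaving the using block disposes the document, which closes the file.
    }
}
```

It would be called as HtmlAppender.AppendHtml("somefile.docx", "<html><head></head><body>hi</body></html>", "chunkid").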
Related
So I need to generate a docx file for reporting purposes. This report contains text, tables and a lot of images.
So far, I managed to add text and a table (and populate it based on the content of my xml using an xslt transform).
However, I am stuck on adding images. I found some examples of how to add images using C# but I don't think this is what I need. I need to format the document using my xslt and add the images in the right places (for instance in a table cell). Is it somehow possible to add a container using xslt which uses the filepath to display/embed the image similar to the <img> tag in html?
I know that the docx format is basically a zip containing a file structure and to embed the image I should add it to this file structure also.
Any examples or references are appreciated.
To give you an idea of my code:
XslCompiledTransform transform = new XslCompiledTransform();
transform.Load(xsltFile);
StringWriter stringWriter = new StringWriter();
XmlWriter xmlWriter = XmlWriter.Create(stringWriter);
transform.Transform(xmlFile, xmlWriter);
XmlDocument newWordContent = new XmlDocument();
newWordContent.LoadXml(stringWriter.ToString());
File.Copy(docXtemplate, outputFilename, true);
using (WordprocessingDocument myDoc = WordprocessingDocument.Open(outputFilename, true))
{
MainDocumentPart mainPart = myDoc.MainDocumentPart;
Body body = new Body(newWordContent.DocumentElement.InnerXml);
DocumentFormat.OpenXml.Wordprocessing.Document document = new DocumentFormat.OpenXml.Wordprocessing.Document(body);
document.Save(mainPart);
}
It basically replaces the body of an existing docx file. This enables me to use all the formatting, etc.
The xslt file is generated by adjusting the document.xml file from the docx.
Update
Ok, so I figured out how to add an image to the docx file directory, see below
using (WordprocessingDocument myDoc = WordprocessingDocument.Open(outputFilename, true))
{
MainDocumentPart mainPart = myDoc.MainDocumentPart;
ImagePart imagePart = mainPart.AddImagePart(ImagePartType.Png);
using (FileStream stream = new FileStream(imageFile, FileMode.Open))
{
imagePart.FeedData(stream);
}
Body body = new Body(newWordContent.DocumentElement.InnerXml);
DocumentFormat.OpenXml.Wordprocessing.Document document = new
DocumentFormat.OpenXml.Wordprocessing.Document(body);
document.Save(mainPart);
}
This will add the image to the docx structure. I also checked the relationship, and it is present in the 'document.xml.rels' file. When I take this id and use it in my xslt to add the image to the document (for testing), I do see an area where the image should be when opening the file with Word, but it shows a red cross and says the image cannot be displayed.
One difference I do notice is that images which were in the original docx are saved in "word\media", while the image added with the code above ends up in "media". Not sure if this is a problem.
Ok, So I think I figured it out.
XslCompiledTransform transform = new XslCompiledTransform();
transform.Load(xsltFile);
StringWriter stringWriter = new StringWriter();
XmlWriter xmlWriter = XmlWriter.Create(stringWriter);
transform.Transform(xmlFile, xmlWriter);
XmlDocument newWordContent = new XmlDocument();
newWordContent.LoadXml(stringWriter.ToString());
using (WordprocessingDocument myDoc = WordprocessingDocument.Open(outputFilename, true))
{
MainDocumentPart mainPart = myDoc.MainDocumentPart;
ImagePart imagePart = mainPart.AddImagePart(ImagePartType.Png, "imgId");
using (FileStream stream = new FileStream(imageFile, FileMode.Open))
{
imagePart.FeedData(stream);
}
Body body = new Body(newWordContent.DocumentElement.InnerXml);
DocumentFormat.OpenXml.Wordprocessing.Document document = new
DocumentFormat.OpenXml.Wordprocessing.Document(body);
document.Save(mainPart);
}
The above code will add an image to your docx file structure with a specific id. You can use this id to refer to in your xsl transform. In the code example from my question I didn't set the id but used the one that was generated. However, each time you run this code the image will be added to the file with a new id resulting in a "not able to display" error. Not one of my sharpest moments;-).
For my use case I have to add multiple images to a large document so that code will be different but I think that based on the above code this can be achieved.
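For the multi-image case, the same idea might be extended roughly as follows. This is only a sketch: ImageEmbedder, AddPngImages and the "imgId1", "imgId2", ... naming scheme are my own assumptions, and it presumes a fresh copy of the template on each run (adding a part under an id that already exists would fail).

```csharp
using System.Collections.Generic;
using System.IO;
using DocumentFormat.OpenXml.Packaging;

public static class ImageEmbedder
{
    // Hypothetical helper: embeds each PNG under a predictable relationship
    // id and returns a map from image file path to the id that was used, so
    // the xslt can refer to those ids.
    public static Dictionary<string, string> AddPngImages(
        MainDocumentPart mainPart, IEnumerable<string> imageFiles)
    {
        var ids = new Dictionary<string, string>();
        int index = 1;

        foreach (string imageFile in imageFiles)
        {
            // Assumed naming scheme; any scheme the xslt agrees on would do.
            string id = "imgId" + index++;

            ImagePart imagePart = mainPart.AddImagePart(ImagePartType.Png, id);
            using (FileStream stream = new FileStream(imageFile, FileMode.Open, FileAccess.Read))
            {
                imagePart.FeedData(stream);
            }

            ids[imageFile] = id;
        }

        return ids;
    }
}
```

The returned dictionary is only a convenience; the important part is that the ids are fixed in advance instead of being generated.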
I have a Word document which contains many pages. One of those pages contains a placeholder instead of other content, and I want to replace that placeholder with another doc file without losing formatting. The doc file to be inserted may itself have many pages. How can I replace the placeholder with this doc file programmatically? I searched a lot but could not find any way to insert a doc file in place of a placeholder. Thank you in advance.
Alternatively, how can we copy the contents of the doc to be inserted and then replace the placeholder with the copied content?
I found a post here. The code below is from that post.
With the library, you can do the following to replace text from a Word document, considering that documentByteArray is your document byte content taken from database:
using (MemoryStream mem = new MemoryStream())
{
mem.Write(documentByteArray, 0, (int)documentByteArray.Length);
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(mem, true))
{
string docText = null;
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
{
docText = sr.ReadToEnd();
}
Regex regexText = new Regex("Hello world!");
docText = regexText.Replace(docText, "Hi Everyone!");
using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
{
sw.Write(docText);
}
}
}
Suppose that, instead of "Hi Everyone!", we want to insert binary data, which is an array of bytes:
byte[] binarydata = File.ReadAllBytes(filepaths);
How can we modify the program?
First of all, you should get the NuGet package DocX (its classes live in the Novacode namespace). This is what I have found to be the best document creator and editor in the last few years.
using Novacode;
void Main()
{
var doc = DocX.Load(@"c:\temp\existingDoc.docx");
var docToAdd = DocX.Load(@"c:\temp\docToAdd.docx");
// Use the overload that matches your installed version (pick one):
doc.InsertDocument(docToAdd, true); // version 1.0.0.22
// doc.InsertDocument(docToAdd); // version 1.0.0.19
doc.Save(); // write the merged document back to existingDoc.docx
}
This is the simplest, most basic implementation of what you're after, but it works.
For anything else, take a look at the documentation at
https://docx.codeplex.com/
or
http://cathalscorner.blogspot.co.uk/
These are the best places to start. If you do use this library, I would also recommend version 1.0.0.19, as there are some formatting issues in 1.0.0.22.
I am trying to create a new MS Word document and insert a new merge field using the Microsoft Open XML SDK (v2.0), but the code below does not work (the merge field doesn't appear when I save the resulting stream to a file and open it). I have searched the internet but have been unable to resolve the issue.
MemoryStream stream = new MemoryStream();
using (
WordprocessingDocument wordDocument = WordprocessingDocument.Create(stream,
DocumentFormat.OpenXml.WordprocessingDocumentType.Document, true))
{
MainDocumentPart mainPart = wordDocument.AddMainDocumentPart();
mainPart.Document = new Document();
Body body = mainPart.Document.AppendChild(new Body());
FieldCode fieldCode = new FieldCode(" MERGEFIELD MyMergeField ");
body.AppendChild(fieldCode);
mainPart.Document.Save();
}
return stream;
Thanks for your help.
I'm looking to replace a bookmark in a word document with the entire contents of another word document. I was hoping to do something along the lines of the following, but appending the xml does not seem to be enough as it does not include pictures.
using Word = Microsoft.Office.Interop.Word;
...
Word.Application wordApp = new Word.Application();
Word.Document doc = wordApp.Documents.Add(filename);
var bookmark = doc.Bookmarks.OfType<Bookmark>().First();
var doc2 = wordApp.Documents.Add(filename2);
bookmark.Range.InsertXML(doc2.Contents.XML);
The second document contains a few images and a few tables of text.
Update: Progress made by using XML, but still doesn't satisfy adding pictures as well.
You've jumped in deep.
If you're using the object model (bookmark.Range) and trying to insert a picture you can use the clipboard or bookmark.Range.InlineShapes.AddPicture(...). If you're trying to insert a whole document you can copy/paste the second document:
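// Assumes the content of the second document is already on the clipboard.
object objUnit = Word.WdUnits.wdStory;
object oMissing = System.Reflection.Missing.Value;
wordApp.Selection.EndKey(ref objUnit, ref oMissing);
wordApp.ActiveWindow.Selection.PasteAndFormat(Word.WdRecoveryType.wdPasteDefault);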
If you're using XML there may be other problems, such as formatting, images, headers/footers not coming in correctly.
Depending on the task, it may be better to use DocumentBuilder and the OpenXML SDK. If you're writing a Word add-in, you can use the object API; it will likely perform the same. If you're processing documents without Word, go with the OpenXML SDK and DocumentBuilder. The issue with DocumentBuilder is that if it doesn't work, there aren't many workarounds to try. It's open source, but it's not the cleanest piece of code if you try troubleshooting it.
You can do this with the OpenXML SDK and DocumentBuilder. In outline, here is what you will need:
1> Inject an insert key into the main doc
public WmlDocument GetProcessedTemplate(string templatePath, string insertKey)
{
WmlDocument templateDoc = new WmlDocument(templatePath);
using (MemoryStream mem = new MemoryStream())
{
mem.Write(templateDoc.DocumentByteArray, 0, templateDoc.DocumentByteArray.Length);
using (WordprocessingDocument doc = WordprocessingDocument.Open([source], true))
{
XDocument xDoc = doc.MainDocumentPart.GetXDocument();
XElement bookMarkPara = [get bookmarkPara to replace];
bookMarkPara.ReplaceWith(new XElement(PtOpenXml.Insert, new XAttribute("Id", insertKey)));
doc.MainDocumentPart.PutXDocument();
}
templateDoc.DocumentByteArray = mem.ToArray();
}
return templateDoc;
}
2> Use document builder to merge
List<Source> documentSources = new List<Source>();
var insertKey = "INSERT_HERE_1";
var processedTemplate = GetProcessedTemplate([docPath], insertKey);
documentSources.Add(new Source(processedTemplate, true));
documentSources.Add(new Source(new WmlDocument([docToInsertFilePath]), insertKey));
DocumentBuilder.BuildDocument(documentSources, [outputFilePath]);
In the Word 2010 GUI, there is an option to "Insert text from file...", which does exactly that: it inserts the text from the main part of another document at the current location in your document.
I would like to do the same using C# and the OpenXml SDK 2.0.
using (var mainDocument = WordprocessingDocument.Open("MainFile.docx", true))
{
var mainPart = mainDocument.MainDocumentPart;
var bookmarkStart = mainPart
.Document
.Body
.Descendants<BookmarkStart>()
.SingleOrDefault(b => b.Name == "ExtraContentBookmark");
var extraContent = GetTextFromFile("ExtraFile.docx");
bookmarkStart.InsertAfterSelf(extraContent);
}
I have tried using plain Xml (XElement), using OpenXmlElement (MainDocumentPart.Document.Body.Descendants), and using AltChunk. Every alternative so far has yielded a non-conformant docx-file.
What should the method GetTextFromFile look like?
This is how I implemented it. The solution was to use AltChunk as described by Eric White. I had already tried it, but as Bradley said in his answer, a bookmark may be anywhere in a document, and mine was inside a paragraph. As soon as I inserted the text before the containing paragraph, everything worked fine.
Here is the (simplified) code:
using (var mainDocument = WordprocessingDocument.Open("MainFile.docx", true))
{
var mainPart = mainDocument.MainDocumentPart;
var bookmarkStart = mainPart
.Document
.Body
.Descendants<BookmarkStart>()
.SingleOrDefault(b => b.Name == "ExtraContentBookmark");
var altChunk = GetAltChunk("ExtraFile.docx", mainPart);
var containingParagraph = bookmarkStart.Ancestors<Paragraph>().FirstOrDefault();
containingParagraph.InsertBeforeSelf(altChunk);
}
...
private AltChunk GetAltChunk(string filename, MainDocumentPart mainDocumentPart)
{
var altChunkId = "AltChunkId1";
var chunk = mainDocumentPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.WordprocessingML, altChunkId);
// Dispose the file stream once its content has been copied into the part.
using (var fileStream = File.Open(filename, FileMode.Open))
{
chunk.FeedData(fileStream);
}
var altChunk = new AltChunk { Id = altChunkId };
return altChunk;
}
It is not as simple as inserting the descendants of the document body tag at the bookmark location. Some reasons:
The two documents may be using different styles; you would have to copy across dependent styles, or update the references to use the styles in the destination document.
The <bookmarkStart> tag can appear almost anywhere in a document, including inside a paragraph, a run, a table cell, etc. Since you cannot nest paragraphs or runs, you will have to determine where the bookmark is situated, then ascend/descend the XML tree until you find an appropriate place to insert the content.
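The "ascend the XML tree" step from the second point might look roughly like the sketch below when the content is inserted as an altChunk. AltChunkPlacement and FindInsertionAnchor are hypothetical names, and the list of permitted parents is the one quoted from the specification in another answer on this page (body, comment, docPartBody, endnote, footnote, ftr, hdr and tc); treat it as an assumption.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class AltChunkPlacement
{
    // True if the element is one of the containers assumed to accept a
    // w:altChunk child.
    private static bool CanContainAltChunk(OpenXmlElement element)
    {
        return element is Body || element is Comment || element is DocPartBody
            || element is Endnote || element is Footnote
            || element is Footer || element is Header || element is TableCell;
    }

    // Hypothetical helper: walks up from the given element (for example a
    // BookmarkStart) and returns the first element on that path whose parent
    // can hold an altChunk, or null if there is none.
    public static OpenXmlElement FindInsertionAnchor(OpenXmlElement start)
    {
        for (OpenXmlElement current = start; current != null; current = current.Parent)
        {
            if (current.Parent != null && CanContainAltChunk(current.Parent))
            {
                return current;
            }
        }
        return null;
    }
}
```

For a bookmark inside a paragraph in the body, this returns the paragraph, so anchor.InsertBeforeSelf(altChunk) places the chunk before it, which matches what worked in the accepted answer. It does nothing about the styles issue in the first point.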
What you're trying to do becomes quite a complicated task when using the OpenXml SDK. It requires an in-depth understanding of the format and its schema.
I would almost advise using VSTO/OLE automation instead, as it enables you to use the functionality that is built into Word.