So I need to generate a docx file for reporting purposes. This report contains text, tables and a lot of images.
So far, I managed to add text and a table (and populate it based on the content of my xml using an xslt transform).
However, I am stuck on adding images. I found some examples of how to add images using C# but I don't think this is what I need. I need to format the document using my xslt and add the images in the right places (for instance in a table cell). Is it somehow possible to add a container using xslt which uses the filepath to display/embed the image similar to the <img> tag in html?
I know that the docx format is basically a zip containing a file structure and to embed the image I should add it to this file structure also.
Any examples or references are appreciated.
To give you an idea of my code:
XslCompiledTransform transform = new XslCompiledTransform();
transform.Load(xsltFile);
StringWriter stringWriter = new StringWriter();
XmlWriter xmlWriter = XmlWriter.Create(stringWriter);
transform.Transform(xmlFile, xmlWriter);
XmlDocument newWordContent = new XmlDocument();
newWordContent.LoadXml(stringWriter.ToString());
File.Copy(docXtemplate, outputFilename, true);
using (WordprocessingDocument myDoc = WordprocessingDocument.Open(outputFilename, true))
{
MainDocumentPart mainPart = myDoc.MainDocumentPart;
Body body = new Body(newWordContent.DocumentElement.InnerXml);
DocumentFormat.OpenXml.Wordprocessing.Document document = new DocumentFormat.OpenXml.Wordprocessing.Document(body);
document.Save(mainPart);
}
It basically replaces the body of an existing docx file. This enables me to use all the formatting, etc.
The xslt file is generated by adjusting the document.xml file from the docx.
Update
Ok, so I figured out how to add an image to the docx file directory, see below
using (WordprocessingDocument myDoc = WordprocessingDocument.Open(outputFilename, true))
{
MainDocumentPart mainPart = myDoc.MainDocumentPart;
ImagePart imagePart = mainPart.AddImagePart(ImagePartType.Png);
using (FileStream stream = new FileStream(imageFile, FileMode.Open))
{
imagePart.FeedData(stream);
}
Body body = new Body(newWordContent.DocumentElement.InnerXml);
DocumentFormat.OpenXml.Wordprocessing.Document document = new
DocumentFormat.OpenXml.Wordprocessing.Document(body);
document.Save(mainPart);
}
This will add the image to the docx structure. I also checked the relationship, and it is present in the 'document.xml.rels' file. When I take this id and use it in my xslt to add the image to the document (for testing), I do see an area where the image should be when opening the file with Word, but it shows a red cross and says it cannot display the image.
One difference I do notice is that images which were in the original docx are saved in "word\media", while the image added with the code above ends up in "media". Not sure if this is a problem.
Ok, so I think I figured it out.
XslCompiledTransform transform = new XslCompiledTransform();
transform.Load(xsltFile);
StringWriter stringWriter = new StringWriter();
XmlWriter xmlWriter = XmlWriter.Create(stringWriter);
transform.Transform(xmlFile, xmlWriter);
XmlDocument newWordContent = new XmlDocument();
newWordContent.LoadXml(stringWriter.ToString());
using (WordprocessingDocument myDoc = WordprocessingDocument.Open(outputFilename, true))
{
MainDocumentPart mainPart = myDoc.MainDocumentPart;
ImagePart imagePart = mainPart.AddImagePart(ImagePartType.Png, "imgId");
using (FileStream stream = new FileStream(imageFile, FileMode.Open))
{
imagePart.FeedData(stream);
}
Body body = new Body(newWordContent.DocumentElement.InnerXml);
DocumentFormat.OpenXml.Wordprocessing.Document document = new
DocumentFormat.OpenXml.Wordprocessing.Document(body);
document.Save(mainPart);
}
The above code will add an image to your docx file structure with a specific id. You can refer to this id in your xsl transform. In the code example from my question I didn't set the id but used the one that was generated. However, each time you run that code the image is added to the file with a new id, so the id hard-coded in the xslt no longer matches, resulting in a "not able to display" error. Not one of my sharpest moments ;-).
For my use case I have to add multiple images to a large document so that code will be different but I think that based on the above code this can be achieved.
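For the multiple-image case, a minimal sketch of how this could look, assuming every image is a PNG and the xslt emits the same ids in its r:embed attributes. The class name, method name and the ids in the dictionary are hypothetical; only AddImagePart, FeedData and Parts come from the Open XML SDK.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;

static class ImageEmbedder
{
    // imagesById maps a relationship id (hypothetical, e.g. "img1") to a PNG file path.
    // The xslt is assumed to reference exactly these ids.
    public static void AddImages(string docxPath, IDictionary<string, string> imagesById)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docxPath, true))
        {
            MainDocumentPart mainPart = doc.MainDocumentPart;
            foreach (KeyValuePair<string, string> entry in imagesById)
            {
                // Skip ids that are already present, so the same id is never added twice.
                if (mainPart.Parts.Any(p => p.RelationshipId == entry.Key))
                {
                    continue;
                }

                ImagePart imagePart = mainPart.AddImagePart(ImagePartType.Png, entry.Key);
                using (FileStream stream = new FileStream(entry.Value, FileMode.Open, FileAccess.Read))
                {
                    imagePart.FeedData(stream);
                }
            }
        }
    }
}
```

Calling it once after copying the template, with something like { "img1", "chart.png" }, should leave one relationship per id in document.xml.rels.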
Related
I have a Word file template and an XML file for data. I want to find a content control in the Word file, get data from the XML, and then replace the text in the Word template. I'm using the following code, but it is not updating the Word file.
using (WordprocessingDocument document = WordprocessingDocument.CreateFromTemplate(txtWordFile.Text))
{
MainDocumentPart mainPart = document.MainDocumentPart;
IEnumerable<SdtBlock> block = mainPart.Document.Body.Descendants<SdtBlock>().Where
(r => r.SdtProperties.GetFirstChild<Tag>().Val == "TotalClose");
Text t = block.Descendants<Text>().Single();
t.Text = "13,450,542";
mainPart.Document.Save();
}
For anyone still struggling with this - you can check out this library https://github.com/antonmihaylov/OpenXmlTemplates
With it you can replace the text inside all content controls of the document based on a JSON object (or a basic C# dictionary) without writing specific code, instead you specify the variable name in the tag of the content control.
(Note - I am the maker of that library, but it is open source and licensed under LGPLv3)
I think you should write the changes to a temporary file.
See "Save modified WordprocessingDocument to new file", or my code from a work project:
MemoryStream yourDocStream = new MemoryStream();
... // populate yourDocStream with .docx bytes
using (Package package = Package.Open(yourDocStream, FileMode.Open, FileAccess.ReadWrite))
{
// Load the document XML in the part into an XDocument instance.
PackagePart packagePart = LoadXmlPackagePart(package);
XDocument xDocument;
using (Stream partStream = packagePart.GetStream())
{
xDocument = XDocument.Load(partStream);
}
// making changes
// Save the XML into the package
using (XmlWriter xw = XmlWriter.Create(packagePart.GetStream(FileMode.Create, FileAccess.Write)))
{
xDocument.Save(xw);
}
}
// Read the bytes only after the package has been disposed, so that all changes are flushed to the stream.
var resultDocumentBytes = yourDocStream.ToArray();
The basic approach you use works fine, but I'm surprised you're not getting any compile-time errors because
IEnumerable<SdtBlock> block = mainPart.Document
.Body
.Descendants<SdtBlock>()
.Where(r => r.SdtProperties.GetFirstChild<Tag>().Val == "TotalClose");
is not compatible with
Text t = block.Descendants<Text>().Single();
block, being an IEnumerable<SdtBlock>, has no Descendants method. You either need to loop through all the items in the IEnumerable and perform this on each item, or you need to define and instantiate a single item, like this:
using (WordprocessingDocument document = WordprocessingDocument.CreateFromTemplate(txtWordFile.Text))
{
MainDocumentPart mainPart = document.MainDocumentPart;
SdtBlock block = mainPart.Document.Body.Descendants<SdtBlock>().Where
(r => r.SdtProperties.GetFirstChild<Tag>().Val == "test1").FirstOrDefault();
Text t = block.Descendants<Text>().Single();
t.Text = "13,450,542";
mainPart.Document.Save();
}
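The looping variant could be sketched roughly as follows. This is only an illustration under a few assumptions: the file is opened in place with Open rather than CreateFromTemplate, content controls without a tag are simply skipped, and the class and method names are hypothetical. It iterates over SdtElement (the base class of SdtBlock and SdtRun) so inline content controls are covered too.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class ContentControlFiller
{
    // Sets the first text node of every content control whose tag equals tagName.
    // Returns the number of content controls that were updated.
    public static int SetText(string docxPath, string tagName, string value)
    {
        int updated = 0;
        using (WordprocessingDocument document = WordprocessingDocument.Open(docxPath, true))
        {
            Body body = document.MainDocumentPart.Document.Body;
            foreach (SdtElement sdt in body.Descendants<SdtElement>())
            {
                // Content controls without properties or without a tag are skipped
                // instead of causing a NullReferenceException.
                Tag tag = sdt.SdtProperties?.GetFirstChild<Tag>();
                if (tag?.Val?.Value != tagName)
                {
                    continue;
                }

                Text text = sdt.Descendants<Text>().FirstOrDefault();
                if (text == null)
                {
                    continue;
                }

                text.Text = value;
                updated++;
            }
            document.MainDocumentPart.Document.Save();
        }
        return updated;
    }
}
```

A return value of 0 is a quick hint that no content control carries the expected tag.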
I have a Word document that contains many pages. One of those pages contains a placeholder instead of other content, and I want to replace that placeholder with another doc file without losing formatting. The doc file to be inserted may have many pages. How can I replace the placeholder with this doc file programmatically? I searched a lot but could not find any way to insert a doc file in place of a placeholder. Thank you in advance.
Or how can we copy the contents of the doc to be inserted and then replace the placeholder with the copied content?
I found a post here. The code below is from that post.
With the library, you can do the following to replace text from a Word document, considering that documentByteArray is your document byte content taken from database:
using (MemoryStream mem = new MemoryStream())
{
mem.Write(documentByteArray, 0, (int)documentByteArray.Length);
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(mem, true))
{
string docText = null;
using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
{
docText = sr.ReadToEnd();
}
Regex regexText = new Regex("Hello world!");
docText = regexText.Replace(docText, "Hi Everyone!");
using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
{
sw.Write(docText);
}
}
}
What if, instead of "Hi Everyone!", we want to insert binary data, which is an array of bytes:
byte[] binarydata = File.ReadAllBytes(filepaths);
How can we modify the program?
First of all, you should get a NuGet package called Novacode.Docx; it is the best document creator and editor I have found in the last few years.
using Novacode.Docx;
void Main()
{
var doc = DocX.Load(@"c:\temp\existingDoc.docx");
var docToAdd = DocX.Load(@"c:\temp\docToAdd.docx");
doc.InsertDocument(docToAdd, true); //version 1.0.0.22
doc.InsertDocument(docToAdd); //version 1.0.0.19
}
This is the simplest, most basic implementation of what you're after, but it works.
For anything else, take a look at the documentation at
https://docx.codeplex.com/
or
http://cathalscorner.blogspot.co.uk/
These are the best places to start. If you do use this library, I would also recommend version 1.0.0.19, as there are some formatting issues in 1.0.0.22.
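For comparison, a rough sketch of replacing the placeholder with the Open XML SDK alone, using an altChunk that imports the second file. It assumes both files are .docx (not legacy .doc), that the placeholder text sits in a paragraph of its own, and that chunkId is not already used in the target; the class and method names are hypothetical. Word merges the imported content when it opens the file, so the result depends on Word doing that step.

```csharp
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class PlaceholderReplacer
{
    // Replaces the first paragraph containing placeholder with an altChunk
    // that points at the content of sourcePath. Returns false if no such paragraph exists.
    public static bool ReplaceWithDocument(string targetPath, string placeholder,
                                           string sourcePath, string chunkId)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(targetPath, true))
        {
            MainDocumentPart mainPart = doc.MainDocumentPart;
            Paragraph marker = mainPart.Document.Body
                .Descendants<Paragraph>()
                .FirstOrDefault(p => p.InnerText.Contains(placeholder));
            if (marker == null)
            {
                return false;
            }

            // Store the whole source document as an alternative format import part.
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                AlternativeFormatImportPartType.WordprocessingML, chunkId);
            using (FileStream source = File.OpenRead(sourcePath))
            {
                chunk.FeedData(source);
            }

            // Put the altChunk where the placeholder paragraph was.
            marker.InsertAfterSelf(new AltChunk { Id = chunkId });
            marker.Remove();
            mainPart.Document.Save();
            return true;
        }
    }
}
```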
I am trying to create a new MS Word document and insert a new merge field using the Microsoft Open XML SDK (v2.0), but the code below does not work (the merge field doesn't appear when I save the resulting stream to a file and open it). I have searched the internet but have been unable to resolve the issue.
MemoryStream stream = new MemoryStream();
using (
WordprocessingDocument wordDocument = WordprocessingDocument.Create(stream,
DocumentFormat.OpenXml.WordprocessingDocumentType.Document, true))
{
MainDocumentPart mainPart = wordDocument.AddMainDocumentPart();
mainPart.Document = new Document();
Body body = mainPart.Document.AppendChild(new Body());
FieldCode fieldCode = new FieldCode(" MERGEFIELD MyMergeField ");
body.AppendChild(fieldCode);
mainPart.Document.Save();
}
return stream;
Thanks for your help.
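One thing that may be worth trying: as far as I know, a field code (w:instrText) only has meaning inside a run between field characters, so appending it directly to the body gives Word nothing to render. A minimal sketch that uses a simple field (w:fldSimple) inside a paragraph instead; the class and method names are hypothetical, and the text in guillemets is just the placeholder Word shows until the merge runs.

```csharp
using System.IO;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class MergeFieldSketch
{
    // Creates a new document containing one paragraph with a MERGEFIELD
    // and returns the .docx bytes.
    public static byte[] CreateDocumentWithMergeField(string fieldName)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            using (WordprocessingDocument wordDocument = WordprocessingDocument.Create(
                stream, WordprocessingDocumentType.Document, true))
            {
                MainDocumentPart mainPart = wordDocument.AddMainDocumentPart();
                mainPart.Document = new Document();
                Body body = mainPart.Document.AppendChild(new Body());

                // The simple field carries the instruction; the run inside it
                // is the text displayed for the field.
                SimpleField field = new SimpleField(
                    new Run(new Text("«" + fieldName + "»")))
                {
                    Instruction = " MERGEFIELD " + fieldName + " "
                };
                body.AppendChild(new Paragraph(field));
                mainPart.Document.Save();
            }

            // Read the bytes after the document has been disposed so the package is complete.
            return stream.ToArray();
        }
    }
}
```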
I have an XFA PDF file (which I did not author). It's a third-party form which I'm trying to fill out. I filled out the form manually, then I used iTextSharp to save the full XML DomDocument from it. Now I'm trying to apply that same XML file programmatically. However, the resulting PDF doesn't have any of the fields filled in. This is the code I'm using to apply the XML file:
PdfReader pdfReader = new PdfReader(inputPdf);
using (MemoryStream ms = new MemoryStream())
{
using (PdfStamper stamper = new PdfStamper(pdfReader, ms, '\0', true))
{
XfaForm xfaForm = new XfaForm(pdfReader);
XmlDocument doc = new XmlDocument();
doc.Load(inputXml);
xfaForm.DomDocument = doc;
xfaForm.Changed = true;
XfaForm.SetXfa(xfaForm, stamper.Reader, stamper.Writer);
}
var bytes = ms.ToArray();
System.IO.File.WriteAllBytes(outputPdf, bytes);
}
inputPdf is the path to the original empty PDF file.
inputXml is the path to the XML file extracted from the filled out PDF file. This is the entire XML file, and not just the datasets section.
What's interesting is that if I create the PdfStamper object like this instead:
new PdfStamper(pdfReader, ms);
then I see the data in the fields, but of course then I have the associated issues with not appending.
Any suggestions on what I might be doing wrong? I just can't seem to get any of the changes to the DomDocument to save.
I am trying to use Microsoft's OpenXML 2.5 library to create an OpenXML document. Everything works great until I try to insert an HTML string into my document. I have scoured the web and here is what I have come up with so far (snipped to just the portion I am having trouble with):
Paragraph paragraph = new Paragraph();
Run run = new Run();
string altChunkId = "id1";
AlternativeFormatImportPart chunk =
document.MainDocumentPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.Html, altChunkId);
chunk.FeedData(new MemoryStream(Encoding.UTF8.GetBytes(ioi.Text)));
AltChunk altChunk = new AltChunk { Id = altChunkId };
run.AppendChild(new Break());
paragraph.AppendChild(run);
body.AppendChild(paragraph);
Obviously, I haven't actually added the altChunk in this example, but I have tried appending it everywhere - to the run, paragraph, body, etc. In every case, I am unable to open the docx file in Word 2010.
This is making me a little nutty because it seems like it should be straightforward (I will admit that I'm not fully understanding the AltChunk "thing"). Would appreciate any help.
Side Note: One thing I did find that was interesting, and I don't know if it's actually a problem or not, is this response which says AltChunk corrupts the file when working from a MemoryStream. Can anybody confirm that this is/isn't true?
I can reproduce the error "... there is a problem with the content" by using
an incomplete HTML document as the content of the alternative format import part.
For example if you use the following HTML snippet <h1>HELLO</h1>
MS Word is unable to open the document.
The code below shows how to add an AlternativeFormatImportPart to a word document.
(I've tested the code with MS Word 2013).
using (WordprocessingDocument doc = WordprocessingDocument.Open(@"test.docx", true))
{
string altChunkId = "myId";
MainDocumentPart mainDocPart = doc.MainDocumentPart;
var run = new Run(new Text("test"));
var p = new Paragraph(new ParagraphProperties(
new Justification() { Val = JustificationValues.Center }),
run);
var body = mainDocPart.Document.Body;
body.Append(p);
MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes("<html><head></head><body><h1>HELLO</h1></body></html>"));
// Uncomment the following line to create an invalid word document.
// MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes("<h1>HELLO</h1>"));
// Create alternative format import part.
AlternativeFormatImportPart formatImportPart =
mainDocPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.Html, altChunkId);
//ms.Seek(0, SeekOrigin.Begin);
// Feed HTML data into format import part (chunk).
formatImportPart.FeedData(ms);
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;
mainDocPart.Document.Body.Append(altChunk);
}
According to the Office OpenXML specification valid parent elements for the
w:altChunk element are body, comment, docPartBody, endnote, footnote, ftr, hdr and tc.
So, I've added the w:altChunk to the body element.
For more information on the w:altChunk element see this MSDN link.
EDIT
As pointed out by @user2945722, to make sure that the OpenXml library correctly interprets the byte array as UTF-8, you should add the UTF-8 preamble. This can be done this way:
MemoryStream ms = new MemoryStream(new UTF8Encoding(true).GetPreamble().Concat(Encoding.UTF8.GetBytes(htmlEncodedString)).ToArray());
This will prevent your é's from being rendered as Ã©'s, your ä's as Ã¤'s, etc.
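Both points (a complete HTML document and the UTF-8 preamble) could be wrapped in a small helper along these lines. This is just a sketch: the class and method names are hypothetical, and it assumes the input is a bare fragment that does not already contain html or body tags. Concat needs System.Linq.

```csharp
using System.IO;
using System.Linq;
using System.Text;

static class HtmlChunk
{
    // Wraps an HTML fragment in a complete document and returns it as a
    // UTF-8 stream that starts with the byte order mark (preamble).
    public static MemoryStream ToStream(string htmlFragment)
    {
        string html = "<html><head></head><body>" + htmlFragment + "</body></html>";
        byte[] preamble = new UTF8Encoding(true).GetPreamble();
        byte[] content = Encoding.UTF8.GetBytes(html);
        return new MemoryStream(preamble.Concat(content).ToArray());
    }
}
```

The returned stream can then be passed to formatImportPart.FeedData in place of the hand-built MemoryStream.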
Had the same problem here, but a totally different cause. Worth a try if the accepted solution doesn't help. Try closing the file after saving. In my case, it happened to be the difference between a corrupt and a clean docx file. Oddly, most other operations work with only a Save() and program exit.
String cid = "chunkid";
WordprocessingDocument document = WordprocessingDocument.Open("somefile.docx", true);
Body body = document.MainDocumentPart.Document.Body;
MemoryStream ms = new MemoryStream(System.Text.Encoding.UTF8.GetBytes("<html><head></head><body>hi</body></html>"));
AlternativeFormatImportPart formatImportPart = document.MainDocumentPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Html, cid);
formatImportPart.FeedData(ms);
AltChunk altChunk = new AltChunk();
altChunk.Id = cid;
document.MainDocumentPart.Document.Body.Append(altChunk);
document.MainDocumentPart.Document.Save();
// here's the magic!
document.Close();