Open XML parts are missing in dynamically created Word document - c#

I'm creating WordprocessingDocuments in C# with the Open XML SDK and then converting them to PDF. Initially, I used Interop to save the document in PDF format, but that is no longer an option. I found that LibreOffice can convert documents by calling soffice.exe from the command line, and it gave good results with ordinary documents. However, when I tested the LibreOffice converter with my dynamically generated documents, it crashed.
I opened one of these documents in LibreOffice Writer, and its structure was wrong; the same document opened in Microsoft Word looked fine. Finally, I saved it with Microsoft Word and opened both versions as ZIP files, as shown below:
This is the good one:
And this is the bad one:
I noticed that when I save the document in Microsoft Word, these Open XML parts (which I called "files" in an earlier version of this question) are appearing. When I open the document previously saved with Microsoft Word in LibreOffice, the document is fine again.
Thus, is there a way to generate these Open XML parts (inside the Word document) without opening Microsoft Word?
I use the following code (to check if it is creating all the files):
using (MemoryStream mem = new MemoryStream())
{
    // Create Document
    using (WordprocessingDocument wordDocument =
        WordprocessingDocument.Create(mem, WordprocessingDocumentType.Document, true))
    {
        // Add a main document part.
        MainDocumentPart mainPart = wordDocument.AddMainDocumentPart();
        // Create the document structure and add some text.
        mainPart.Document = new Document();
        Body docBody = new Body();
        // Add your docx content here
        CreateParagraph(docBody);
        CreateStyledParagraph(docBody);
        CreateTable(docBody);
        CreateList(docBody);
        Paragraph pImg = new Paragraph();
        ImagePart imagePart = mainPart.AddImagePart(ImagePartType.Jpeg);
        string imgPath = "https://cdn.pixabay.com/photo/2019/11/15/05/23/dog-4627679_960_720.png";
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(imgPath);
        req.UseDefaultCredentials = true;
        req.PreAuthenticate = true;
        req.Credentials = CredentialCache.DefaultCredentials;
        HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
        imagePart.FeedData(resp.GetResponseStream());
        // 1500000 and 1092000 are img width and height
        Run rImg = new Run(DrawingManager(mainPart.GetIdOfPart(imagePart), "PictureName", 1500000, 1092000, string.Empty));
        pImg.Append(rImg);
        docBody.Append(pImg);
        Paragraph pLink = new Paragraph();
        // For the mainpart see above
        pLink.Append(HyperLinkManager("http://YourLink", "My awesome link", mainPart));
        docBody.Append(pLink);
        mainPart.Document.Append(docBody);
        mainPart.Document.Save();
        wordDocument.Close();
    }
    result = Convert.ToBase64String(mem.ToArray());
}
The code above creates a Word document named Result.docx with the following structure:
But there aren't any other Open XML parts (like app.xml or styles.xml)

You need to distinguish between:
the Open XML standard and its minimum requirements for a WordprocessingDocument, and
the "minimum" document created by Microsoft Word or other applications.
As per the standard, the minimum WordprocessingDocument only needs a main document part (MainDocumentPart, document.xml) with the following content:
<w:document xmlns:w="...">
<w:body>
<w:p />
</w:body>
</w:document>
Further parts such as the StyleDefinitionsPart (styles.xml) or the NumberingDefinitionsPart (numbering.xml) are only required if you have styles or numbering, in which case you must explicitly create them in your code.
Next, looking at your sample code, it seems you are creating:
paragraphs that reference styles (see CreateStyledParagraph(docBody)), which would have to be defined in the StyleDefinitionsPart (styles.xml); and
numbered lists (e.g., CreateList(docBody)), which would have to be defined in the NumberingDefinitionsPart (numbering.xml).
However, your code creates neither a StyleDefinitionsPart nor a NumberingDefinitionsPart, which means your document is likely not a valid Open XML document.
Now, Word is very forgiving and fixes various issues silently, ignoring parts of your Open XML markup (e.g., the styles you might have assigned to your paragraphs).
By contrast, depending on how fault-tolerant LibreOffice is, invalid Open XML markup might lead to a crash. For example, if LibreOffice simply assumes that a StyleDefinitionsPart exists when it finds an element like <w:pStyle w:val="MyStyleName" /> in your w:document and then does not check whether it gets a null reference when asking for the StyleDefinitionsPart, it could crash.
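Before handing a generated document to LibreOffice, it can help to run it through the SDK's own validator. The following is a minimal sketch, assuming the document has been written to disk; the class name DocxValidation and the method name PrintValidationErrors are made up for illustration. Note that the validator mainly checks the markup against the schema, so it may not flag a paragraph that references a style that is not defined anywhere.

```csharp
using System;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;

public static class DocxValidation
{
    // Hypothetical helper: prints every violation the SDK validator reports
    // for the document at 'path' and returns the number of violations.
    public static int PrintValidationErrors(string path)
    {
        // Open read-only; validation does not modify the package.
        using WordprocessingDocument wordDocument = WordprocessingDocument.Open(path, false);

        var validator = new OpenXmlValidator();
        int count = 0;
        foreach (ValidationErrorInfo error in validator.Validate(wordDocument))
        {
            count++;
            // Path may be null for package-level errors, hence the null-conditional.
            Console.WriteLine($"{error.Path?.XPath}: {error.Description}");
        }
        return count;
    }
}
```

A result of zero does not guarantee that LibreOffice will accept the file, but any reported error is worth fixing first.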
Finally, to add parts to your Word document, you would use the Open XML SDK as follows:
[Fact]
public void CanAddParts()
{
    const string path = "Document.docx";
    const WordprocessingDocumentType type = WordprocessingDocumentType.Document;
    using WordprocessingDocument wordDocument = WordprocessingDocument.Create(path, type);
    // Create minimum main document part.
    MainDocumentPart mainDocumentPart = wordDocument.AddMainDocumentPart();
    mainDocumentPart.Document = new Document(new Body(new Paragraph()));
    // Create empty style definitions part.
    var styleDefinitionsPart = mainDocumentPart.AddNewPart<StyleDefinitionsPart>();
    styleDefinitionsPart.Styles = new Styles();
    // Create empty numbering definitions part.
    var numberingDefinitionsPart = mainDocumentPart.AddNewPart<NumberingDefinitionsPart>();
    numberingDefinitionsPart.Numbering = new Numbering();
}
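An empty styles part is still not enough if paragraphs reference a style by ID: the style itself has to be defined in it. The following is a rough sketch of how such a definition could be added, assuming the ID you pass matches what CreateStyledParagraph writes into w:pStyle. The helper name AddParagraphStyle and the bold formatting are placeholders chosen for illustration, not part of the question's code.

```csharp
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class StyleSetup
{
    // Hypothetical helper: defines a simple paragraph style so that
    // <w:pStyle w:val="styleId" /> resolves to an existing definition.
    public static void AddParagraphStyle(
        MainDocumentPart mainDocumentPart, string styleId, string styleName)
    {
        // Reuse the styles part if it already exists, otherwise create it.
        StyleDefinitionsPart part = mainDocumentPart.StyleDefinitionsPart
            ?? mainDocumentPart.AddNewPart<StyleDefinitionsPart>();
        part.Styles ??= new Styles();

        var style = new Style
        {
            Type = StyleValues.Paragraph,
            StyleId = styleId,
            CustomStyle = true
        };
        // Child order matters in the schema: the name comes before the run properties.
        style.Append(new StyleName { Val = styleName });
        style.Append(new StyleRunProperties(new Bold())); // placeholder formatting

        part.Styles.Append(style);
    }
}
```

It would be called once per style before the document is saved, for example AddParagraphStyle(mainPart, "MyStyleName", "My Style Name").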

Related

Duplicate Word Document Using OpenXML While Open Original Document

I need to create an exact copy of an existing Word document and open it as another instance while the original document is still open. The second document should not be saved automatically; the user should have the option to save it or not.
This needs to be done using OpenXML.
I have attached the current implementation here. It has several issues:
The first document needs to be closed before it can be used in the WordprocessingDocument using statement.
The newly created second document needs to be saved in a local folder.
Code Initiation
var doc = Globals.ThisAddIn.Application.ActiveDocument;
doc.Save();
string fileName = doc.FullName;
doc.Close();
using (WordprocessingDocument document = WordprocessingDocument.Create(fileName, WordprocessingDocumentType.Document))
{
}
Why do you need to use OpenXML? With Interop you could simply:
Open the existing document
Copy everything within the document range
Create a new document
Paste the other document in the new one
It's done quickly and does the job perfectly

excluding header, footer and watermark from last page with openxml

I'm using Open XML (the DocumentFormat.OpenXml NuGet package) to generate a docx file. Here is my approach:
I have a file named template.docx. In this file I have a cover page and a blank page which has a header, a footer, and a background image. I first open the document, then append some text to it, then close it.
On the other hand, I have a file named template-back.docx which I want to append to the end of the modified document (template.docx) above.
I'm able to do that, by using this snippet:
public static void MergeDocumentWithPagebreak(string sourceFile, string destinationFile, string altChunkID) {
    using (var myDoc = WordprocessingDocument.Open(sourceFile, true)) {
        var mainPart = myDoc.MainDocumentPart;
        //Append page break
        var para = new Paragraph(new Run((new Break() { Type = BreakValues.Page })));
        mainPart.Document.Body.InsertAfter(para, mainPart.Document.Body.LastChild);
        //Append file
        var chunk = mainPart.AddAlternativeFormatImportPart(
            AlternativeFormatImportPartType.WordprocessingML, altChunkID);
        using (var fileStream = File.Open(destinationFile, FileMode.Open))
            chunk.FeedData(fileStream);
        var altChunk = new AltChunk{
            Id = altChunkID
        };
        mainPart.Document
            .Body
            .InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
        mainPart.Document.Save();
    }
}
But when I do that, the header, footer, and background image are also applied to the last page. I want to exclude the last page from getting those designs; I want it to be clean, simple and white. Googling the issue turned up nothing helpful. Do you have any idea, please? Thanks in advance.
P.S.
The original article about merging documents here:
It's a little bit tricky, but not so complicated.
First you have to understand how Word works:
By default, a Word document is a single section, and that section shares one header and footer. If you want different headers / footers, you have to create a break at the end of a page to indicate "the next page is a new section".
Once a new section is created, you must indicate that "the new section doesn't share the same header / footer".
Some documentation on "how to create different headers in Word": http://www.techrepublic.com/blog/microsoft-office/accommodate-different-headers-and-footers-in-a-word-document/
If we translate to your code, before inserting your document at the end of the other, you have to:
Create a section break
Insert a new header / footer in this section (an empty one)
Insert your new document in the new section
To create the new header, some other documentation: https://msdn.microsoft.com/en-us/library/office/cc546917.aspx
Trick: if the document you insert doesn't contain a header / footer, create empty ones and copy them over.
Information: I tried to delete the <w:headerReference r:id="rIdX" w:type="default"/> or to set the r:id to 0, but it doesn't work. Creating an empty header is the fastest way.
Replace your Page break with the following code
Paragraph PageBreakParagraph = new Paragraph(new DocumentFormat.OpenXml.Wordprocessing.Run(new DocumentFormat.OpenXml.Wordprocessing.Break() { Type = BreakValues.Page }));
I also saw that you are inserting after the last child, which is not quite the same as appending, although it works in your case. Use this instead:
wordprocessingDocument.MainDocumentPart.Document.Body.Append(PageBreakParagraph)
You need to add the section break to the section properties. You then need to append the section properties to the paragraph properties. Followed by appending the paragraph properties to a paragraph.
Paragraph paragraph232 = new Paragraph();
ParagraphProperties paragraphProperties220 = new ParagraphProperties();
SectionProperties sectionProperties1 = new SectionProperties();
SectionType sectionType1 = new SectionType(){ Val = SectionMarkValues.NextPage };
sectionProperties1.Append(sectionType1);
paragraphProperties220.Append(sectionProperties1);
paragraph232.Append(paragraphProperties220);
// Add the section-break paragraph to the body; without this it never reaches the document.
mainPart.Document.Body.Append(paragraph232);
//Replace your last but one line with this one.
mainPart.Document
.Body
.Append(altChunk);
The resulting Open XML is:
<w:p>
<w:pPr>
<w:sectPr>
<w:type w:val="nextPage" />
</w:sectPr>
</w:pPr>
</w:p>
The easiest way to do it is to create the document in Word and then open it in the Open XML Productivity Tool; you can reflect the code and see what C# code would generate the various Open XML elements you are trying to achieve. Hope this helps!

Word document orientation lost after using OpenXML SDK AddAlternativeFormatImportPart

I am attempting to merge several Word documents together into a single Word document. I am using the AltChunk capability from Microsoft's OpenXML SDK 2.5. The final report needs to be in landscape orientation, thus we have put each component document into landscape mode. I am merging the documents using the following code.
for (int i = 0; i < otherDocs.Length; i++)
{
    using (var headerDoc = WordprocessingDocument.Open(headerPath, true))
    {
        var mainPart = headerDoc.MainDocumentPart;
        string altChunkId = "AltChunkId" + i;
        var chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML, altChunkId);
        using (var fileStream = File.Open(otherDocs[i], FileMode.Open))
        {
            chunk.FeedData(fileStream);
        }
        var altChunk = new AltChunk();
        altChunk.Id = altChunkId;
        mainPart.Document.Body.InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
        mainPart.Document.Save();
        DocumentWriter.SetPrintOrientation(headerDoc, PageOrientationValues.Landscape);
        headerDoc.Close();
    }
}
When I run this code, the final output document has a mix of landscape and portrait orientation if the component documents each have at least one section break.
The DocumentWriter.SetPrintOrientation() method is implemented according to instructions from MSDN. It seems to have no effect on the actual orientation of the document. I have also examined the underlying XML files, and all "orient" attributes are set to landscape.
Is there some configuration option or API call I can use to ensure the final document will have landscape orientation across all sections?
An OpenXML Word document is a zipped collection of XML documents that define its content, formatting, and metadata. When merging Word documents together using AddAlternativeFormatImportPart, the Word documents being merged into the original document (AltChunks) are copied into the zip archive, and XML elements referencing the documents are added to the XML document definition. Then, the next time anyone opens the resulting document in Microsoft Word (or any other OpenXML-compatible document editor), the application handles merging in the AltChunks. In this case, Microsoft Word 2010 (the version we are using) has a bug causing Word to ignore some formatting information defined in Word document sections. One of these pieces of information is orientation, making the AltChunk approach ineffective at preserving orientation information.
Instead, we used DocumentBuilder from the OpenXml Power Tools project. This resulted in much simpler code that solved our problem in two lines:
var sourceList = documentPaths.Select(doc => new Source(new WmlDocument(doc), true)).ToList();
DocumentBuilder.BuildDocument(sourceList, outputPath);

Converting html strings in Excel file to formatted word file with .NET

Input are Excel files - the cells may contain some basic HTML formatting like <b>, <br>, <h2>.
I want to read the strings and insert the text as formatted text into word documents, i.e. <b>Foo</b> would be shown as a bold string in Word.
I don't know which tags are used so I need a "generic solution", a find/replace approach does not work for me.
I found a solution from January 2011 using the WebBrowser component. So the HTML is converted to RTF and the RTF is inserted into Word. I was wondering if there is a better solution today.
Using a commercial component is fine for me.
Update
I came across Matthew Manela's MarkupConverter class. It converts HTML to RTF. Then I use the clipboard to insert the snippet into the Word file:
// rtf contains the converted html string using MarkupConverter
Clipboard.SetText(rtf, TextDataFormat.Rtf);
// objTable is a table in my word file
objTable.Cell(1, 1).Range.Paste();
This works, but will copy/pasting up to a few thousand strings using the clipboard break anything?
You will need the OpenXML SDK in order to work with OpenXML. It can be quite tricky getting into, but it is very powerful, and a whole lot more stable and reliable than Office Automation or Interop.
The following will open a document, create an AltChunk part, add the HTML to it, and embed it into the document. For a broader overview of AltChunk, see Eric White's blog.
using (var wordDoc = WordprocessingDocument.Open("DocumentName.docx", true))
{
    var altChunkId = "AltChunkId1";
    var mainPart = wordDoc.MainDocumentPart;
    var chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Html, altChunkId);
    using (var textStream = new MemoryStream())
    {
        var html = "<html><body>...</body></html>";
        var data = Encoding.UTF8.GetBytes(html);
        textStream.Write(data, 0, data.Length);
        textStream.Position = 0;
        chunk.FeedData(textStream);
    }
    var altChunk = new AltChunk();
    altChunk.Id = altChunkId;
    mainPart.Document.Body.InsertAt(altChunk, 0);
    mainPart.Document.Save();
}
Obviously for your case, you will want to find (or build) the table you want and insert the AltChunk there instead of at the first position in the body. Note that the HTML you insert into the Word document must be a full HTML document, with an <html> tag. I'm not sure if <body> is required, but it doesn't hurt. If you just have HTML-formatted text, simply wrap the text in these tags and insert it into the document.
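Since each Excel cell holds only an HTML fragment, the wrapping step could be folded into a small helper that returns a ready-to-place AltChunk. This is only a sketch: the class name HtmlChunks and the method name CreateHtmlChunk are invented for illustration, and the caller is assumed to supply an altChunkId that is unique within the document (for example by appending a counter).

```csharp
using System.IO;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class HtmlChunks
{
    // Hypothetical helper: wraps an HTML fragment in <html><body> tags,
    // stores it as an import part, and returns the AltChunk that references it.
    public static AltChunk CreateHtmlChunk(
        MainDocumentPart mainPart, string htmlFragment, string altChunkId)
    {
        string html = "<html><body>" + htmlFragment + "</body></html>";

        AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
            AlternativeFormatImportPartType.Html, altChunkId);

        // The stream starts at position 0, so it can be fed to the part directly.
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(html)))
        {
            chunk.FeedData(stream);
        }

        // The caller decides where the chunk goes (body, table cell, ...).
        return new AltChunk { Id = altChunkId };
    }
}
```

The returned element can then be inserted wherever the formatted text should appear, for example with mainPart.Document.Body.Append(...), followed by mainPart.Document.Save().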
It seems that you will need to use Office Automation/Interop to get the table heights. See this answer which says that the OpenXML SDK does not update the heights, only Word does.
Use this code; it works.
Response.AppendHeader("content-disposition", "attachment;filename=FileEName.xls");
Response.Charset = "";
Response.Cache.SetCacheability(HttpCacheability.NoCache);
Response.ContentType = "application/vnd.ms-excel";
this.EnableViewState = false;
//Response.Write("Your HTML Code");
Response.Write("<table border='1 px solid'><tr><th>sfsd</th><th>sfsdfssd</th></tr><tr>" +
    "<td>ssfsdf</td><td><table border='1 px solid'><tr><th>sdf</th><th>hhsdf</th></tr><tr>" +
    "<td>sdfds</td><td>sdhjhfds</td></tr></table></td></tr></table>");
Response.End();
Why not let Word do its own translation, since it understands HTML?
Read your Excel cells.
Write your values into an HTML text file as if it were a Word document.
Open Word and let it read that HTML file.
Instruct Word to save the document as a new Word document (if that is required).

How to do a Mail Merge in C# using Interop.Word?

I have a template in Word that would be used to print out invoices.
Now, I would like to know how to create a Word document programmatically and copy the template content into the new document so that I still have a clean template, then replace the placeholders that I have typed by using MailMerge. I found similar MailMerge questions, but most require Spire components, which I am not interested in since they have to be paid for and I am only a student. The others don't actually help that much.
The problems I am facing now are as follows:
Create a Word document
Copy template content into new document
How to add placeholder names into MailMerge since I'm very confused about this.
Do MailMerge
Here is the current code that I have concocted, this is actually the first time I have used Interops
Document document = new Document();
Microsoft.Office.Interop.Word.Application wordApp = new Microsoft.Office.Interop.Word.Application();
document = wordApp.Documents.Open(fileName);
string[] fieldNames = {"date_issued", "month_covered", "tuition", "lunchfee", "discount", "in_no", "student_no", "student_name", "amount_due", "amount_paid", "balance", "penalty", "status"};
string[] data = new string[25];
Range range;
document.MailMerge.Fields.Add(range, fieldNames[0]);
document.MailMerge.Execute();
I'm really confused on this part
document.MailMerge.Fields.Add(range, fieldNames[0]);
and I don't know what range is for
If your template is an actual template and not a plain Word Document (with the extension .dotx or .dot, and not .docx/.doc), you don't need to copy anything. Just create a new document based on the template:
var doc = wordApp.Documents.Add("put template path here");
This will have the contents of your template. If you have used the Word UI to setup a mailmerge in the template (including the location of the data for the merge), that will also be carried over into the new document.
Then you can execute the mailmerge from C#:
doc.MailMerge.Execute();
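As for the range argument in MailMerge.Fields.Add: it is the location in the document where the merge field is inserted. The following is a minimal sketch of adding a field at the end of a document, assuming the Microsoft.Office.Interop.Word assembly is referenced; the class name MergeFieldSketch and the method name AppendMergeField are made up for illustration. In a real template you would more likely locate each placeholder (for example with Find) and pass that range instead.

```csharp
using Word = Microsoft.Office.Interop.Word;

public static class MergeFieldSketch
{
    // Hypothetical helper: inserts a merge field named 'fieldName'
    // at the end of the document and returns the new field.
    public static Word.MailMergeField AppendMergeField(Word.Document doc, string fieldName)
    {
        Word.Range range = doc.Content;                         // the whole main text
        range.Collapse(Word.WdCollapseDirection.wdCollapseEnd); // shrink to an insertion point at the end
        return doc.MailMerge.Fields.Add(range, fieldName);
    }
}
```

Keep in mind that MailMerge.Execute() still needs a data source attached to the document; adding the fields only marks where the values will go.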
