I'm using Open XML (DocumentFormat.OpenXml nuget package) to generating a docx file. Here is my approach:
I have a file, named template.docx. In this file I have a Cover Page and a blank page which has header, footer, and a background image. Anyway, I first open the document, then append some text to the document, then close it.
In the other hand, I have a file named template-back.docx which I want to append that at the end of modified document (template.docx) above.
I'm able to do that, by using this snippet:
public static void MergeDocumentWithPagebreak(string sourceFile, string destinationFile, string altChunkID) {
using (var myDoc = WordprocessingDocument.Open(sourceFile, true)) {
var mainPart = myDoc.MainDocumentPart;
//Append page break
var para = new Paragraph(new Run((new Break() { Type = BreakValues.Page })));
mainPart.Document.Body.InsertAfter(para, mainPart.Document.Body.LastChild);
//Append file
var chunk = mainPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.WordprocessingML, altChunkID);
using (var fileStream = File.Open(destinationFile, FileMode.Open))
chunk.FeedData(fileStream);
var altChunk = new AltChunk{
Id = altChunkID
};
mainPart.Document
.Body
.InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
mainPart.Document.Save();
}
}
But, when I do that, the header, footer, and background image, are applied to the last page. I want to be able to exclude last page from getting those designs. I want it to be clean, simple and white. But googling the issue, had nothing to help. Do you have any idea please? Thanks in advance.
P.S.
The original article about merging documents here:
It's a little bit tricky, but not so complicated.
First you have to understand how word works:
By default, a word document is one section, and this one section share header and footer. If you want differents header / footer, you have to create a break at the end of a page to indicate "the next page is a new section".
Once a new section is create, you must indicate "the new section don't share the same header / footer"
Some documentation on "how to create different header in word". http://www.techrepublic.com/blog/microsoft-office/accommodate-different-headers-and-footers-in-a-word-document/
If we translate to your code, before inserting your document at the end of the other, you have to:
Create a section break
Inserting a new Header / footer in this section (an empty one)
Insert your new document in the new section
To create the new header, some other documentation: https://msdn.microsoft.com/en-us/library/office/cc546917.aspx
Trick: if the document you insert don't contain header / footer, create empty ones and recopy them
Information: I tried to delete the <w:headerReference r:id="rIdX" w:type="default"/> or to set the r:id to 0 but it don't work. Create an empty header is the fastest way
Replace your Page break with the following code
Paragraph PageBreakParagraph = new Paragraph(new DocumentFormat.OpenXml.Wordprocessing.Run(new DocumentFormat.OpenXml.Wordprocessing.Break() { Type = BreakValues.Page }));
I also saw that you are inserting after the last child which is not essential same as Appending but works well for you! Use this instead.
wordprocessingDocument.MainDocumentPart.Document.Body.Append(PageBreakParagraph)
You need to add the section break to the section properties. You then need to append the section properties to the paragraph properties. Followed by appending the paragraph properties to a paragraph.
Paragraph paragraph232 = new Paragraph();
ParagraphProperties paragraphProperties220 = new ParagraphProperties();
SectionProperties sectionProperties1 = new SectionProperties();
SectionType sectionType1 = new SectionType(){ Val = SectionMarkValues.NextPage };
sectionProperties1.Append(sectionType1);
paragraphProperties220.Append(sectionProperties1);
paragraph232.Append(paragraphProperties220);
//Replace your last but one line with this one.
mainPart.Document
.Body
.Append(altChunk);
The resulting Open XML is:
<w:p>
<w:pPr>
<w:sectPr>
<w:type w:val="nextPage" />
</w:sectPr>
</w:pPr>
</w:p>
The Easiest way to do it is to actually create the document in word and then open in it in the Open XML Productivity Tool, you can reflect the code and see what C# code would generate the various Open XML elements you are trying to achieve. Hope this helps!
Related
I'm creating WordprocessingDocuments in C# with the Open XML SDK and then converting them to pdf. Initially, I was using Interop to save the document in PDF format, but now that is not an option. I found that LibreOffice can convert documents calling soffice.exe from cmd, and I had wonderful results with normal documents. Still, then, when I tested LibreOffice converter with my dynamic documents, the converter crashed.
I copied one of these documents and opened it with LibreOffice Writer, its structure was wrong, then I opened the same document with Microsoft Word and its structure was fine. Finally, I saved it with Microsoft Word and opened both documents as ZIP files as below:
This is the good one:
And this is the bad one:
I noticed that when I save the document in Microsoft Word, these Open XML parts (which I called "files" in an earlier version of this question) are appearing. When I open the document previously saved with Microsoft Word in LibreOffice, the document is fine again.
Thus, is there a way to generate these Open XML parts (inside the Word document) without opening Microsoft Word?
I use the following code (to check if it is creating all the files):
using (MemoryStream mem = new MemoryStream())
{
// Create Document
using (WordprocessingDocument wordDocument =
WordprocessingDocument.Create(mem, WordprocessingDocumentType.Document, true))
{
// Add a main document part.
MainDocumentPart mainPart = wordDocument.AddMainDocumentPart();
// Create the document structure and add some text.
mainPart.Document = new Document();
Body docBody = new Body();
// Add your docx content here
CreateParagraph(docBody);
CreateStyledParagraph(docBody);
CreateTable(docBody);
CreateList(docBody);
Paragraph pImg = new Paragraph();
ImagePart imagePart = mainPart.AddImagePart(ImagePartType.Jpeg);
string imgPath = "https://cdn.pixabay.com/photo/2019/11/15/05/23/dog-4627679_960_720.png";
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(imgPath);
req.UseDefaultCredentials = true;
req.PreAuthenticate = true;
req.Credentials = CredentialCache.DefaultCredentials;
HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
imagePart.FeedData(resp.GetResponseStream());
// 1500000 and 1092000 are img width and height
Run rImg = new Run(DrawingManager(mainPart.GetIdOfPart(imagePart), "PictureName", 1500000, 1092000, string.Empty));
pImg.Append(rImg);
docBody.Append(pImg);
Paragraph pLink = new Paragraph();
// For the mainpart see above
pLink.Append(HyperLinkManager("http://YourLink", "My awesome link", mainPart));
docBody.Append(pLink);
mainPart.Document.Append(docBody);
mainPart.Document.Save();
wordDocument.Close();
}
result = Convert.ToBase64String(mem.ToArray());
}
The code above creates a Word document named Result.docx with the following structure:
But there aren't any other Open XML parts (like app.xml or styles.xml)
You need to make a difference between:
the Open XML standard and its minimum requirements on a WordprocessingDocument and
the "minimum" document created by Microsoft Word or other applications.
As per the standard, the minimum WordprocessingDocument only needs a main document part (MainDocumentPart, document.xml) with the following content:
<w:document xmlns:w="...">
<w:body>
<w:p />
</w:body>
</w:document>
Further parts such as the StyleDefinitionsPart (styles.xml) or the NumberingDefintionsPart (numbering.xml) are only required if you have styles or numbering, in which case you must explicitly create them in your code.
Next, looking at your sample code, it seems you are creating:
paragraphs that reference styles (see CreateStyledParagraph(docBody)), which would have to be defined in the StyleDefinitionsPart (styles.xml); and
numbered lists (e.g., CreateList(docBody)), which would have to be defined in the NumberingDefinitionsPart (numbering.xml).
However, your code neither creates a StyleDefinitionsPart nor a NumberingDefintionsPart, which means your document is likely not a valid Open XML document.
Now, Word is very forgiving and fixes various issues silently, ignoring parts of your Open XML markup (e.g., the styles you might have assigned to your paragraphs).
By contrast, depending on how fault-tolerant LibreOffice is, invalid Open XML markup might lead to a crash. For example, if LibreOffice simply assumes that a StyleDefinitionsPart exists when it finds an element like <w:pStyle w:val="MyStyleName" /> in your w:document and then does not check whether it gets a null reference when asking for the StyleDefinitionsPart, it could crash.
Finally, to add parts to your Word document, you would use the Open XML SDK as follows:
[Fact]
public void CanAddParts()
{
const string path = "Document.docx";
const WordprocessingDocumentType type = WordprocessingDocumentType.Document;
using WordprocessingDocument wordDocument = WordprocessingDocument.Create(path, type);
// Create minimum main document part.
MainDocumentPart mainDocumentPart = wordDocument.AddMainDocumentPart();
mainDocumentPart.Document = new Document(new Body(new Paragraph()));
// Create empty style definitions part.
var styleDefinitionsPart = mainDocumentPart.AddNewPart<StyleDefinitionsPart>();
styleDefinitionsPart.Styles = new Styles();
// Create empty numbering definitions part.
var numberingDefinitionsPart = mainDocumentPart.AddNewPart<NumberingDefinitionsPart>();
numberingDefinitionsPart.Numbering = new Numbering();
}
I get count of pages in next code:
using DocumentFormat.OpenXml.Packaging;
WordprocessingDocument doc = WordprocessingDocument.Open( #"D:\2pages.docx", false );
Console.WriteLine(
doc.ExtendedFilePropertiesPart.Properties.Pages.InnerText.ToString()
);
can I get in this way height and width of file?
or in another way but without using office.
Aspose.Word (which is not free) has these things built in:
https://docs.aspose.com/display/wordsnet/Changing+Page+Setup+for+Whole+Document+using+Aspose.Words
Document doc = new Document();
// This truly makes the document empty. No sections (not possible in Microsoft Word).
doc.RemoveAllChildren();
// Create a new section node.
// Note that the section has not yet been added to the document,
// but we have to specify the parent document.
Section section = new Section(doc);
// Append the section to the document.
doc.AppendChild(section);
// Lets set some properties for the section.
section.PageSetup.SectionStart = SectionStart.NewPage;
section.PageSetup.PaperSize = PaperSize.Letter;
Someone had a similar problem and this is the discussion on SO (but with OpenXML):
Change Page size of Wor Document using Open Xml SDK 2.0
Maybe you can deduct your answer from this.
Please use PageSetup.PageHeight and PageSetup.PageWidth properties to get the page's height and width. Hope this helps you.
Document doc = new Document(MyDir + "input.docx");
Console.WriteLine(doc.FirstSection.PageSetup.PageHeight);
Console.WriteLine(doc.FirstSection.PageSetup.PageWidth);
I work with Aspose as Developer Evangelist.
Input are Excel files - the cells may contain some basic HTML formatting like <b>, <br>, <h2>.
I want to read the strings and insert the text as formatted text into word documents, i.e. <b>Foo</b> would be shown as a bold string in Word.
I don't know which tags are used so I need a "generic solution", a find/replace approach does not work for me.
I found a solution from January 2011 using the WebBrowser component. So the HTML is converted to RTF and the RTF is inserted into Word. I was wondering if there is a better solution today.
Using a commercial component is fine for me.
Update
I came across Matthew Manela's MarkupConverter class. It converts HTML to RTF. Then I use the clipboard to insert the snippet into the word file
// rtf contains the converted html string using MarkupConverter
Clipboard.SetText(rtf, TextDataFormat.Rtf);
// objTable is a table in my word file
objTable.Cell(1, 1).Range.Paste();
This works, but will copy/pasting up to a few thousand strings using the clipboard break anything?
You will need the OpenXML SDK in order to work with OpenXML. It can be quite tricky getting into, but it is very powerful, and a whole lot more stable and reliable than Office Automation or Interop.
The following will open a document, create an AltChunk part, add the HTML to it, and embed it into the document. For a broader overview of AltChunk see Eric White's blog
using (var wordDoc = WordprocessingDocument.Open("DocumentName.docx", true))
{
var altChunkId = "AltChunkId1";
var mainPart = wordDoc.MainDocumentPart;
var chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Html, altChunkId);
using (var textStream = new MemoryStream())
{
var html = "<html><body>...</body></html>";
var data = Encoding.UTF8.GetBytes(html);
textStream.Write(data, 0, data.Length);
textStream.Position = 0;
chunk.FeedData(textStream);
}
var altChunk = new AltChunk();
altChunk.Id = altChunkId;
mainPart.Document.Body.InsertAt(altChunk, 0);
mainPart.Document.Save();
}
Obviously for your case, you will want to find (or build) the table you want and insert the AltChunk there instead of at the first position in the body. Note that the HTML that you insert into the word doc must be full HTML documents, with an <html> tag. I'm not sure if <body> is required, but it doesn't hurt. If you just have HTML formatted text, simply wrap the text in these tags and insert into the doc.
It seems that you will need to use Office Automation/Interop to get the table heights. See this answer which says that the OpenXML SDK does not update the heights, only Word does.
Use this code it is working..
Response.AppendHeader("content-disposition", "attachment;filename=FileEName.xls");
Response.Charset = "";
Response.Cache.SetCacheability(HttpCacheability.NoCache);
Response.ContentType = "application/vnd.ms-excel";
this.EnableViewState = false;
//Response.Write("Your HTML Code");
Response.Write("<table border='1 px solid'><tr><th>sfsd</th><th>sfsdfssd</th></tr><tr>
<td>ssfsdf</td><td><table border='1 px solid'><tr><th>sdf</th><th>hhsdf</th></tr><tr>
<td>sdfds</td><td>sdhjhfds</td></tr></table></td></tr></table>");
Response.End();
Why not let WORD do its owns translation since it understands HTML.
Read your Excel cells
Write your values into a HTML textfile as it would be a WORD document.
Open WORD and let it read that HTML file.
Instruct WORD to save the document as a new WORD document (if that is required).
I need to insert an image based on a generated barcode file.
The problem I'm having is when using the iTextSharp library I can normally fill in text such as
PdfReader pdfReader = new PdfReader(oldFile);
PdfStamper pdfStamper = new PdfStamper(pdfReader, outFile);
AcroFields fields = pdfStamper.AcroFields;
fields.SetField("topmostSubform[0].Page1[0].BARCODE[0]", "X974005-1");
though there's one field where in pdf if I click onto it it prompts me for an image to insert into field, but I can't seem to programmatically accomplish this. Based on some google searches and stumbling upon a stackoverflow page, I inserted the following code expecting it to work as desired:
string fieldName = "topmostSubform[0].Page1[0].BARCODE[0]";
string imageFile = "test-barcode.jpg";
AcroFields.FieldPosition fieldPosition = pdfStamper.AcroFields.GetFieldPositions(fieldName)[0];
PushbuttonField imageField = new PushbuttonField(pdfStamper.Writer, fieldPosition.position, fieldName);
imageField.Layout = PushbuttonField.LAYOUT_ICON_ONLY;
imageField.Image = iTextSharp.text.Image.GetInstance(imageFile);
imageField.ScaleIcon = PushbuttonField.SCALE_ICON_ALWAYS;
imageField.ProportionalIcon = false;
imageField.Options = BaseField.READ_ONLY;
pdfStamper.AcroFields.RemoveField(fieldName);
pdfStamper.AddAnnotation(imageField.Field, fieldPosition.page);
The problem I am having is while it removes the existing field as intended, when I open the newly created PDF file I don't see this new push button field with the intended image file but rather as a blank but when I perform this through debug mode I can see that it's at least picking up the correct dimensions of the image file, so I don't know what I'm doing wrong here.
Please advise, thanks.
If you read the official documentation (that is: my book), you'll find this example: ReplaceIcon.cs
You're removing the field using pdfStamper.AcroFields.RemoveField(fieldName); and subsequently you try adding the new field using pdfStamper.AddAnnotation(imageField.Field, fieldPosition.page);
That's wrong. You should replace the field using pdfStamper.AcroFields.ReplacePushbuttonField(fieldname, imageField.Field);
The ReplacePushbuttonField() method copies plenty of settings behind the scenes.
I have a template in Word that would be used to print out invoices.
Now, I would like to know how to create a Word Document programmatically and copy the template content into the new document so that I still have a clean template, then replace placeholders that I have typed by using Mail.Merge. I found similar Mail.Merge questions but most require Spire components and I am not interested since it needs to be paid for. I am only a student. Others though, actually doesn't help that much.
The problems I am facing now are as follows:
Create a Word document
Copy template content into new document
How to add placeholder names into MailMerge since I'm very confused about this.
Do MailMerge
Here is the current code that I have concocted, this is actually the first time I have used Interops
Document document = new Document();
Microsoft.Office.Interop.Word.Application wordApp = new Microsoft.Office.Interop.Word.Application();
document = wordApp.Documents.Open(fileName);
string[] fieldNames = {"date_issued", "month_covered", "tuition", "lunchfee", "discount", "in_no", "student_no", "student_name", "amount_due", "amount_paid", "balance", "penalty", "status"};
string[] data = new string[25];
Range range;
document.MailMerge.Fields.Add(range, fieldNames[0]);
document.MailMerge.Execute();
I'm really confused on this part
document.MailMerge.Fields.Add(range, fieldNames[0]);
and I don't know what range is for
If your template is an actual template and not a plain Word Document (with the extension .dotx or .dot, and not .docx/.doc), you don't need to copy anything. Just create a new document based on the template:
var doc = wordApp.Documents.Add("put template path here");
This will have the contents of your template. If you have used the Word UI to setup a mailmerge in the template (including the location of the data for the merge), that will also be carried over into the new document.
Then you can execute the mailmerge from C#:
doc.MailMerge.Execute();