Generating a DOC or DOCX using MigraDoc - c#

I am working on a project where I need to create a Word file. For this purpose, I am using MigraDoc library for C#.
Using this library, I am easily able to generate a RTF file by writing :
Document document = CreateDocument();
RtfDocumentRenderer rtf = new RtfDocumentRenderer();
rtf.Render(document, "test.rtf", null);
Process.Start("test.rtf");
But the requirement now asks me to get a DOC or DOCX file rather than a RTF file. Is there a way to generate a DOC or DOCX file using MigraDoc? And if so, how may I achieve this?

MigraDoc cannot generate DOC or DOCX files. Since MigraDoc is open source, you could add a renderer for DOCX if you have the knowledge and the time.
MigraDoc as it is cannot generate DOC/DOCX, but maybe you can invoke an external conversion tool after generating the RTF file.
I don't know any such tools. Word can open RTF quickly and so far our customers never complained about getting RTF, not DOC or DOCX.
Update (2019-07-29): The website mentions "Word", but this only refers to RTF. There never was an implementation for .DOC or .DOCX.

It seems no any MigraDoc renders that support DOC or DOCX formats.
On documentation page we can see one MigraDoc feature:
Supports different output formats (PDF, Word, HTML, any printer supported by Windows)
But seems documentation says about RTF format that perfectly works with Word. I have reviewed MigraDoc repository and I do not see any DOC renders. We can use only RTF converter for Word supporting. So we can't generate DOC file directly using this package.
But we can convert RTF to DOC or DOCX easily (and for free) using FreeSpire.Doc nuget package.
Full code example is here:
using MigraDoc.DocumentObjectModel;
using MigraDoc.RtfRendering;
using Spire.Doc;
using System.IO;
namespace MigraDocTest
{
class Program
{
static void Main(string[] args)
{
using (var stream = new MemoryStream())
{
// Generate RTF (using MigraDoc)
var migraDoc = new MigraDoc.DocumentObjectModel.Document();
var section = migraDoc.AddSection();
var paragraph = section.AddParagraph();
paragraph.AddFormattedText("Hello World!", TextFormat.Bold);
var rtfDocumentRenderer = new RtfDocumentRenderer();
rtfDocumentRenderer.Render(migraDoc, stream, false, null);
// Convert RTF to DOCX (using Spire.Doc)
var spireDoc = new Spire.Doc.Document();
spireDoc.LoadFromStream(stream, FileFormat.Auto);
spireDoc.SaveToFile("D:\\example.docx", FileFormat.Docx );
}
}
}
}

You can use Microsoft's DocumentFormat.OpenXML library which has a NuGet package.

Related

How to open an existing PDF file with Migradoc PDF library

I am trying to use the Migradoc library from PDFSharp (http://www.pdfsharp.net/) to print pdf files. So far I have found that Migradoc does support printing through its MigraDoc.Rendering.Printing.MigraDocPrintDocument class. However, I have not found a way to actually open an existing PDF file with MigraDoc.
I did find a way to open an existing PDF file using PDFSharp, but I cannot successfully convert a PDFSharp.Pdf.PdfDocument into a MigraDoc.DocumentObjectModel.Document object. So far I have not found the MigraDoc and PDFSharp documentation to be very helpful.
Does anyone have any experience using these libraries to work with existing PDF files?
I wrote the following code with help from this sample, but the result when my input PDF is 2 pages is an output PDF with 2 blank pages.
using MigraDoc.DocumentObjectModel;
using MigraDoc.Rendering;
using PdfSharp.Drawing;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;
...
public void PrintPDF(string filePath, string outFilePath)
{
var document = new Document();
var docRenderer = new DocumentRenderer(document);
docRenderer.PrepareDocument();
var inPdfDoc = PdfReader.Open(filePath, PdfDocumentOpenMode.Modify);
for (var i = 0; i < inPdfDoc.PageCount; i++)
{
document.AddSection();
docRenderer.PrepareDocument();
var page = inPdfDoc.Pages[i];
var gfx = XGraphics.FromPdfPage(page);
docRenderer.RenderPage(gfx, i+1);
}
var renderer = new PdfDocumentRenderer();
renderer.Document = document;
renderer.RenderDocument();
renderer.PdfDocument.Save(outFilePath);
}
Your code modifies the inPdfDoc in memory without saving the changes. Complicated code without any visual effect.
MigraDoc cannot open PDF files, MigraDoc cannot print PDF files, PDFsharp cannot print PDF files.
http://www.pdfsharp.net/wiki/PDFsharpFAQ.ashx

How to convert docx to html file using open xml with formatting

I know there are lot of question having same title but I am currently having some issue for them I didn't get the correct way to go.
I am using Open xml sdk 2.5 along with Power tool to convert .docx file to .html file which uses HtmlConverter class for conversion.
I am successfully able to convert the docx file into the Html file but the problem is, html file doesn't retain the original formatting of the document file. eg. Font-size,color,underline,bold etc doesn't reflect into the html file.
Here is my existing code:
public void ConvertDocxToHtml(string fileName)
{
byte[] byteArray = File.ReadAllBytes(fileName);
using (MemoryStream memoryStream = new MemoryStream())
{
memoryStream.Write(byteArray, 0, byteArray.Length);
using (WordprocessingDocument doc = WordprocessingDocument.Open(memoryStream, true))
{
HtmlConverterSettings settings = new HtmlConverterSettings()
{
PageTitle = "My Page Title"
};
XElement html = HtmlConverter.ConvertToHtml(doc, settings);
File.WriteAllText(#"E:\Test.html", html.ToStringNewLineOnAttributes());
}
}
}
So I just want to know if is there any way by which I can retain the formatting in converted HTML file.
I know about some third party APIs which does the same thing. But I would prefer if there any way using open xml or any other open source to do this.
PowerTools for Open XML just released a new HtmlConverter module. It now contains an open source, free implementation of a conversion from DOCX to HTML formatted with CSS. The module HtmlConverter.cs supports all paragraph, character, and table styles, fonts and text formatting, numbered and bulleted lists, images, and more. See https://openxmldeveloper.org/
Your end result will not look exactly the way your Word Document turns out, but this link might help.
You might want to find an external tool to help you do this, like Aspose Words
You can use OpenXML Viewer extension for Firefox for Converting with formatting.
http://openxmlviewer.codeplex.com
This works for me. Hope this helps.

Creating report with Aspose.Word without losing formatting

I am using Aspose.Words to create reports from a template file (.docx filetype).
After using Aspose.Words to modify the template file and saving it into a new file, the formatting of the template file were lost (such as bold text, comments, etc).
I have tried:
Aspose.Words.Document doc = new Document(inputStream);
var outputStream = new MemoryStream();
doc.Save(outputStream, SaveFormat.docx);
What I did not expect is that outputStream is much less bytes than inputStream although I have yet to make any modification on doc. It may the reason why the report file lose their formatting.
What should I try now?
Ok, the problem is because the current version of Aspose.Words I'm using does not support docx filetype. But it still can read text of a .docx file, and only text(without any associated formatting).

Converting HTML to Word Docx with style intact

I know there are already questions similar to this, and suggested Open XML and all.
I am using Open XMl but it work only with inline style.
is there any solution to this, or any other better way to convert html to docx other than Open XML.
Thanks!
You can inline a CSS file using a tool like the one described here.
Then, to perform the conversion (adapted from Eric White's blog):
using (WordprocessingDocument myDoc =
WordprocessingDocument.Open("ConvertedDocument.docx", true))
{
string altChunkId = "AltChunkId1";
MainDocumentPart mainPart = myDoc.MainDocumentPart;
var chunk = mainPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.Html, altChunkId);
using (FileStream fileStream = File.Open("YourHtmlDocument.html", FileMode.Open))
{
chunk.FeedData(fileStream);
}
AltChunk altChunk = new AltChunk() {Id = altChunkId};
mainPart.Document.Body.InsertAfter(
altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
mainPart.Document.Save();
}
This isn't exactly converting HTML to DOCX. It's appending YourHtmlDocument.html to ConvertedDocument.docx. If ConvertedDocument.docx is initially empty this approach is effectively a conversion.
Whenever you use an AltChunk to build a document, your HTML is embedded in the document until the next time the document is opened in Word. At that point, the HTML is converted to WordProcessingML markup. This is really only an issue if the document won't be opened in MS Word. If you were uploading to Google docs, opening in OpenOffice, or using COM to convert to a PDF, OpenXML won't be sufficient. In that case, you'll probably need to resort to a paid tool like Aspose.Words.

Embed contents of a RTF file into a DOCX file using OpenXML SDK

In our old MSWord-97 based system we use COM to interact with a .doc file, and embed an OLE object, so the embedded document is visible in the parent (not as an icon).
We're replacing this with a system using OpenXML SDK since it requires having Word on our server, which generates .docx files. however we still need to embed the contents of RTF files into the generated DOCX... specifically we replace a bookmark with the contents of the file.
I found a few examples online but they all differ. When I create a simple example in Word and view the XML, there's a lot of stuff to position/display the embedded object's visual representation, while the embedding itself doesn't seem too horrific. What's the easiest way to do this?
You could embed the content of a RTF document into a OpenXML DOCX file
by using the AltChunk anchor for external content. The AltChunk (w:altChunk) element specifies
a location in your OpenXML WordprocessingML document to insert external content such as a RTF document.
The code below uses the AltChunk class in conjunction with the AlternativeFormatImportPart class
to embed the content of a RTF document into a DOCX file after the last paragraph:
using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(#"your_docx_file.docx", true))
{
string altChunkId = "AltChunkId5";
MainDocumentPart mainDocPart = wordDocument.MainDocumentPart;
AlternativeFormatImportPart chunk = mainDocPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.Rtf, altChunkId);
// Read RTF document content.
string rtfDocumentContent = File.ReadAllText("your_rtf_document.rtf", Encoding.ASCII);
using (MemoryStream ms = new MemoryStream(Encoding.ASCII.GetBytes(rtfDocumentContent)))
{
chunk.FeedData(ms);
}
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;
// Embed AltChunk after the last paragraph.
mainDocPart.Document.Body.InsertAfter(
altChunk, mainDocPart.Document.Body.Elements<Paragraph>().Last());
mainDocPart.Document.Save();
}
If you want to embed an Unicode RTF string into a DOCX file then you have to escape the Unicode characters. For an example please refer to the following stackoverflow answer.
When you encounter the error "the file is corrupt" then ensure that you Dispose() or Close() the WordprocessingDocument. If you do not Close() the document then the releationship for the w:altchunk is not stored in the Document.xml.rels file.
This fella seemed to have figured it out with his own question and answer at How can I embed any file type into Microsoft Word using OpenXml 2.0

Categories