I'm currently trying this:
using DocumentFormat.OpenXml.Packaging;
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(fileNameDocx as string, true))
{
var xdoc = wordDoc.MainDocumentPart;
mainWebBrowser.NavigateToString(xdoc.Document.OuterXml.ToString());
}
But this just gives me the text and none of the formatting. Is it possible to show a ".docx" in a webbrowser control like this?
There are some articles related to DOCX to HTML converition:
Transforming Open XML WordprocessingML to XHTML Using the Open XML SDK 2.0
Transforming Open XML WordprocessingML to XHtml
Batch conversion of docx to clean HTML
Please try these approaches above.
Hope that helps
If you'd like to show XML in web browser control, you might need to:
load this XML to XDocument,
prepare HTML template with block,
and then - write formatted XML from XDocument into PRE block.
You could either read the final HTML into string and use 'mainWebBrowser.NavigateToString' as you mentioned, or write HTML file to drive and read it from there into your mainWebBrowser
Hope that helps
Related
I am trying to save xml file as PDF as it is. In other words, I am trying to create PDF file that shows content of XML like a screenshot (like raw screenshot). My client somehow needs it like this. I couldn't really find the same question on stackoverflow. Is there anyway I can do this using iText or some other library?
Thank you!
First extract your text from XML file:
XmlDocument doc = new XmlDocument();
doc.Load("c:\\temp.xml");
string myText;
foreach(XmlNode node in doc.DocumentElement.ChildNodes){
myText= node.InnerText; //or loop through its children as well
}
Then create a PDF file and pass this text into it
in this case here I use PDFFlow library to create pdf documents
var DocumentBuilder.New()
.AddSection().AddParagraphToSection(myText).ToDocument()
.Build("Result.PDF");
If you load the xml in a browser you can easily save to searchable PDF
In a shell call (replace msedge with chrome if necessary)
"path to\msedge.exe" --headless --disable-gpu --print-to-pdf="out path\xml.pdf" --enable-logging "file://path to A\file.xml"
Enable logging helps as it can take a long time to process without any visual progress.
[0818/231038.640:INFO:headless_shell.cc(648)] Written to file ...\xml.pdf.
You can also add --print-to-pdf-no-header. Also if adding some style consider --run-all-compositor-stages-before-draw but I have no idea if that works for xml.
ForGet Image --screenshot as a 40 Page high XML as JPEG does NOT translate well to PDF. I tried :-)
If you want that as an image PDF then Re-Print the PDF to PDF using a command line viewer such as this since it is ONLY Print As Image output :-) also note it can in addition read the XML in Black and White (NO linting).
But have not tested how well it does XML2PDF via command line print
SumatraPDF -print-to "My Print to PDF" "path to\filename.pdf" (or xml in mono)
Note "My Print to PDF" is a promptless port you need to configure as required.
I am trying to save xml file as PDF as it is. In other words, I am trying to create PDF file that shows content of XML like a screenshot (like raw screenshot). My client somehow needs it like this. I couldn't really find the same question on stackoverflow. Is there anyway I can do this using iText or some other library?
Thank you!
First extract your text from XML file:
XmlDocument doc = new XmlDocument();
doc.Load("c:\\temp.xml");
string myText;
foreach(XmlNode node in doc.DocumentElement.ChildNodes){
myText= node.InnerText; //or loop through its children as well
}
Then create a PDF file and pass this text into it
in this case here I use PDFFlow library to create pdf documents
var DocumentBuilder.New()
.AddSection().AddParagraphToSection(myText).ToDocument()
.Build("Result.PDF");
If you load the xml in a browser you can easily save to searchable PDF
In a shell call (replace msedge with chrome if necessary)
"path to\msedge.exe" --headless --disable-gpu --print-to-pdf="out path\xml.pdf" --enable-logging "file://path to A\file.xml"
Enable logging helps as it can take a long time to process without any visual progress.
[0818/231038.640:INFO:headless_shell.cc(648)] Written to file ...\xml.pdf.
You can also add --print-to-pdf-no-header. Also if adding some style consider --run-all-compositor-stages-before-draw but I have no idea if that works for xml.
ForGet Image --screenshot as a 40 Page high XML as JPEG does NOT translate well to PDF. I tried :-)
If you want that as an image PDF then Re-Print the PDF to PDF using a command line viewer such as this since it is ONLY Print As Image output :-) also note it can in addition read the XML in Black and White (NO linting).
But have not tested how well it does XML2PDF via command line print
SumatraPDF -print-to "My Print to PDF" "path to\filename.pdf" (or xml in mono)
Note "My Print to PDF" is a promptless port you need to configure as required.
I am working on a project where I need to create a Word file. For this purpose, I am using MigraDoc library for C#.
Using this library, I am easily able to generate a RTF file by writing :
Document document = CreateDocument();
RtfDocumentRenderer rtf = new RtfDocumentRenderer();
rtf.Render(document, "test.rtf", null);
Process.Start("test.rtf");
But the requirement now asks me to get a DOC or DOCX file rather than a RTF file. Is there a way to generate a DOC or DOCX file using MigraDoc? And if so, how may I achieve this?
MigraDoc cannot generate DOC or DOCX files. Since MigraDoc is open source, you could add a renderer for DOCX if you have the knowledge and the time.
MigraDoc as it is cannot generate DOC/DOCX, but maybe you can invoke an external conversion tool after generating the RTF file.
I don't know any such tools. Word can open RTF quickly and so far our customers never complained about getting RTF, not DOC or DOCX.
Update (2019-07-29): The website mentions "Word", but this only refers to RTF. There never was an implementation for .DOC or .DOCX.
It seems no any MigraDoc renders that support DOC or DOCX formats.
On documentation page we can see one MigraDoc feature:
Supports different output formats (PDF, Word, HTML, any printer supported by Windows)
But seems documentation says about RTF format that perfectly works with Word. I have reviewed MigraDoc repository and I do not see any DOC renders. We can use only RTF converter for Word supporting. So we can't generate DOC file directly using this package.
But we can convert RTF to DOC or DOCX easily (and for free) using FreeSpire.Doc nuget package.
Full code example is here:
using MigraDoc.DocumentObjectModel;
using MigraDoc.RtfRendering;
using Spire.Doc;
using System.IO;
namespace MigraDocTest
{
class Program
{
static void Main(string[] args)
{
using (var stream = new MemoryStream())
{
// Generate RTF (using MigraDoc)
var migraDoc = new MigraDoc.DocumentObjectModel.Document();
var section = migraDoc.AddSection();
var paragraph = section.AddParagraph();
paragraph.AddFormattedText("Hello World!", TextFormat.Bold);
var rtfDocumentRenderer = new RtfDocumentRenderer();
rtfDocumentRenderer.Render(migraDoc, stream, false, null);
// Convert RTF to DOCX (using Spire.Doc)
var spireDoc = new Spire.Doc.Document();
spireDoc.LoadFromStream(stream, FileFormat.Auto);
spireDoc.SaveToFile("D:\\example.docx", FileFormat.Docx );
}
}
}
}
You can use Microsoft's DocumentFormat.OpenXML library which has a NuGet package.
I know there are lot of question having same title but I am currently having some issue for them I didn't get the correct way to go.
I am using Open xml sdk 2.5 along with Power tool to convert .docx file to .html file which uses HtmlConverter class for conversion.
I am successfully able to convert the docx file into the Html file but the problem is, html file doesn't retain the original formatting of the document file. eg. Font-size,color,underline,bold etc doesn't reflect into the html file.
Here is my existing code:
public void ConvertDocxToHtml(string fileName)
{
byte[] byteArray = File.ReadAllBytes(fileName);
using (MemoryStream memoryStream = new MemoryStream())
{
memoryStream.Write(byteArray, 0, byteArray.Length);
using (WordprocessingDocument doc = WordprocessingDocument.Open(memoryStream, true))
{
HtmlConverterSettings settings = new HtmlConverterSettings()
{
PageTitle = "My Page Title"
};
XElement html = HtmlConverter.ConvertToHtml(doc, settings);
File.WriteAllText(#"E:\Test.html", html.ToStringNewLineOnAttributes());
}
}
}
So I just want to know if is there any way by which I can retain the formatting in converted HTML file.
I know about some third party APIs which does the same thing. But I would prefer if there any way using open xml or any other open source to do this.
PowerTools for Open XML just released a new HtmlConverter module. It now contains an open source, free implementation of a conversion from DOCX to HTML formatted with CSS. The module HtmlConverter.cs supports all paragraph, character, and table styles, fonts and text formatting, numbered and bulleted lists, images, and more. See https://openxmldeveloper.org/
Your end result will not look exactly the way your Word Document turns out, but this link might help.
You might want to find an external tool to help you do this, like Aspose Words
You can use OpenXML Viewer extension for Firefox for Converting with formatting.
http://openxmlviewer.codeplex.com
This works for me. Hope this helps.
In our old MSWord-97 based system we use COM to interact with a .doc file, and embed an OLE object, so the embedded document is visible in the parent (not as an icon).
We're replacing this with a system using OpenXML SDK since it requires having Word on our server, which generates .docx files. however we still need to embed the contents of RTF files into the generated DOCX... specifically we replace a bookmark with the contents of the file.
I found a few examples online but they all differ. When I create a simple example in Word and view the XML, there's a lot of stuff to position/display the embedded object's visual representation, while the embedding itself doesn't seem too horrific. What's the easiest way to do this?
You could embed the content of a RTF document into a OpenXML DOCX file
by using the AltChunk anchor for external content. The AltChunk (w:altChunk) element specifies
a location in your OpenXML WordprocessingML document to insert external content such as a RTF document.
The code below uses the AltChunk class in conjunction with the AlternativeFormatImportPart class
to embed the content of a RTF document into a DOCX file after the last paragraph:
using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(#"your_docx_file.docx", true))
{
string altChunkId = "AltChunkId5";
MainDocumentPart mainDocPart = wordDocument.MainDocumentPart;
AlternativeFormatImportPart chunk = mainDocPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.Rtf, altChunkId);
// Read RTF document content.
string rtfDocumentContent = File.ReadAllText("your_rtf_document.rtf", Encoding.ASCII);
using (MemoryStream ms = new MemoryStream(Encoding.ASCII.GetBytes(rtfDocumentContent)))
{
chunk.FeedData(ms);
}
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;
// Embed AltChunk after the last paragraph.
mainDocPart.Document.Body.InsertAfter(
altChunk, mainDocPart.Document.Body.Elements<Paragraph>().Last());
mainDocPart.Document.Save();
}
If you want to embed an Unicode RTF string into a DOCX file then you have to escape the Unicode characters. For an example please refer to the following stackoverflow answer.
When you encounter the error "the file is corrupt" then ensure that you Dispose() or Close() the WordprocessingDocument. If you do not Close() the document then the releationship for the w:altchunk is not stored in the Document.xml.rels file.
This fella seemed to have figured it out with his own question and answer at How can I embed any file type into Microsoft Word using OpenXml 2.0