i've got a task to convert docx document to Pdf. I've decided to take this approach:
convert docx to html, then pass html to ItextSharp. Few week i've been looking all over google, codeplex, sourceforge and stackoverflow and so on for a solution thow to make this conversion, until I found Eric White blog. At firs impression he made great tool for working with OpenXml documents. But when I tried to test it I've got an error about null reference. Error occurs when reading header (RevisionAccepter class)
public static void AcceptRevisions(WordprocessingDocument doc)
{
AcceptRevisionsForPart(doc.MainDocumentPart);
foreach (var part in doc.MainDocumentPart.HeaderParts) //part is null
AcceptRevisionsForPart(part); //null ref exception here
foreach (var part in doc.MainDocumentPart.FooterParts)
AcceptRevisionsForPart(part);
if (doc.MainDocumentPart.EndnotesPart != null)
AcceptRevisionsForPart(doc.MainDocumentPart.EndnotesPart);
if (doc.MainDocumentPart.FootnotesPart != null)
AcceptRevisionsForPart(doc.MainDocumentPart.FootnotesPart);
}
code I use for conversion (same as in example)
private void conv()
{
byte[] byteArray = File.ReadAllBytes(textBox1.Text);
using (MemoryStream memoryStream = new MemoryStream())
{
memoryStream.Write(byteArray, 0, byteArray.Length);
using (WordprocessingDocument doc =
WordprocessingDocument.Open(memoryStream, true))
{
HtmlConverterSettings settings = new HtmlConverterSettings()
{
PageTitle = "My Page Title"
};
XElement html = HtmlConverter.ConvertToHtml(doc, settings);
File.WriteAllText("Test.html", html.ToStringNewLineOnAttributes());
}
}
}
nameSpaces:
using System.Xml;
using System.Xml.Xsl;
using OpenXmlPowerTools;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
I tried to pass documents created by word 2010, with header and without and still getting errors in the same place. Maybe I'm passing document incorrectly or something with the document itself.
Maybe there is another way to convert docx to pdf without using comercial components, like Apose.
found the problem. Errors was occuring because of different references between power tools project and main project.
DocumentFormat.OpenXml version on my project was 2.5.5513.0 and on power tools - 2.0.5022.0
After making references to the same resource everything went fine.
Related
I have to copy selected text from activedocument to new file (at the end of target file). Both (source and target) are docx files. The source file is opened in Word and the user is working with it.
I would like to copy the selection without opening the target file as Microsoft.Office.Interop.Word.Document and copy-paste (for performance reasons).
I don't know how to change the "Selection" in open document to xml understandable by DocumentOpenXml and how to inject this xml into target file.
using DocumentFormat.OpenXml.Wordprocessing;
using DocumentFormat.OpenXml.Packaging;
using Range = Microsoft.Office.Interop.Word.Range;
public void RangeToNewDocument(string documentPath, Range range)
{
string selectedXML = range.WordOpenXML; //??????????
using (WordprocessingDocument doc = WordprocessingDocument.Open(documentPath, true))
{
Body body = doc.MainDocumentPart.Document.Body;
//body.Append(selectedXML); ??????
doc.SaveAs(documentPath + ".RangeToNewDocumentTest.docx");
}
}
Many example codes are for "How to add something" but there are new objects (as 'Runs' or all 'Paragraphs') but I couldn't find anything about an existing object.
The only link I found is:
https://learn.microsoft.com/en-us/office/open-xml/how-to-copy-the-contents-of-an-open-xml-package-part-to-a-document-part-in-a-dif
but there replace one ThemePart with another and I have no idea how to adjust it for me.
Currently i am unable to load original pdf document using GemBox. it gives me below error in image. and I am using Acrobat 9.
I have tried using 8/16/2018 fixes too. Any suggestion will be highly appreciated.
Basic Code i am using is,
using GemBox.Document;
using System;
namespace Pdf2Text
{
class Program
{
[STAThread]
static void Main(string[] args)
{
ComponentInfo.SetLicense("My-License");
DocumentModel document = null;
document = DocumentModel.Load(#"E:\data\testing\HA021.pdf");
document.Save(#"E:\data\testing\HA021.docx");
}
}
}
The current implementation of PDF reader in GemBox.Document is still in beta and cannot handle this PDF feature, an "iref streams" which are a cross-reference tables stored in streams.
However, GemBox.Pdf can handle cross-reference streams so as a workaround what you could do is something like the following:
// Load PDF with GemBox.Pdf.
var pdfDocument = PdfDocument.Load("Sample.pdf");
pdfDocument.SaveOptions.CrossReferenceType = PdfCrossReferenceType.Table;
// Save PDF with GemBox.Pdf.
var pdfStream = new MemoryStream();
pdfDocument.Save(pdfStream);
// Load PDF with GemBox.Document.
var document = DocumentModel.Load(pdfStream, LoadOptions.PdfDefault);
Last regarding the conversion of PDF to DOCX, GemBox.Document's PDF reader is currently intended for extracting text and tables from PDF files, it's not intended for any high fidelity requirement.
How can we convert the excel file and word file to .pdf format from c#. i tried the following code but it shows the an error
this is my code:
Microsoft.Office.Interop.Word.Application appWord = new Microsoft.Office.Interop.Word.Application();
wordDocument = appWord.Documents.Open(#"C:\Users\ITPro2\Documents\test.docx");
wordDocument.ExportAsFixedFormat(#"D:\desktop\DocTo.pdf", Microsoft.Office.Interop.Word.WdExportFormat.wdExportFormatPDF);
and i got the following Error
The export failed because this feature is not installed. during export to pdf from word from c#
While not directly related the documentation under
https://msdn.microsoft.com/en-us/library/office/ff198122.aspx
gives a note, that if the pdf add-in is not installed, exactly this error will occur. So check your prerequisites, i.e. Office installed and the add-in, too.
1) Excel 2013 Primary Interop Assembly Class Library and it works perfectly fine under .NET 4.5.1 Just add Microsoft.Office.Interop.Excel assembly to your references and you are ready to go.
using System;
using Microsoft.Office.Interop.Excel;
namespace officeInterop
{
class Program
{
static void Main(string[] args)
{
Application app = new Application();
Workbook wkb = app.Workbooks.Open("d:\\x.xlsx");
wkb.ExportAsFixedFormat(XlFixedFormatType.xlTypePDF, "d:\\x.pdf");
}
}
}
OR
2) refer this link to convert DOC or DOCx file into PDF
http://www.rasteredge.com/how-to/csharp-imaging/pdf-convert-word-to-pdf/
Since my other comment got deleted, here is the updated version.
For converting my files to pdf in c# i used the metamorphosis libary, this could also be a solution for you.
Below is a code example from me where i used a blobstorage to download PDF files from regular files.
var converter = new SautinSoft.PdfMetamorphosis();
var ms = new MemoryStream();
await blob.DownloadToStreamAsync(ms);
ms.Seek(0, SeekOrigin.Begin);
var pdfStream = converter.DocxToPdfConvertStream(ms);
if (pdfStream != null)
{
await _storageProvider.SaveFileAsync(containerName, fileName, pdfStream);
}
ms.Close();
var result = await _storageProvider.GetFileWithAttributesAsync(containerName, fileName);
return new ServiceResponse<CloudBlockBlob>(result);
Below i posted some links with the sample code from the library itself:
Word to PDF example
Excel to PDF example
I am working on a project where I need to create a Word file. For this purpose, I am using MigraDoc library for C#.
Using this library, I am easily able to generate a RTF file by writing :
Document document = CreateDocument();
RtfDocumentRenderer rtf = new RtfDocumentRenderer();
rtf.Render(document, "test.rtf", null);
Process.Start("test.rtf");
But the requirement now asks me to get a DOC or DOCX file rather than a RTF file. Is there a way to generate a DOC or DOCX file using MigraDoc? And if so, how may I achieve this?
MigraDoc cannot generate DOC or DOCX files. Since MigraDoc is open source, you could add a renderer for DOCX if you have the knowledge and the time.
MigraDoc as it is cannot generate DOC/DOCX, but maybe you can invoke an external conversion tool after generating the RTF file.
I don't know any such tools. Word can open RTF quickly and so far our customers never complained about getting RTF, not DOC or DOCX.
Update (2019-07-29): The website mentions "Word", but this only refers to RTF. There never was an implementation for .DOC or .DOCX.
It seems no any MigraDoc renders that support DOC or DOCX formats.
On documentation page we can see one MigraDoc feature:
Supports different output formats (PDF, Word, HTML, any printer supported by Windows)
But seems documentation says about RTF format that perfectly works with Word. I have reviewed MigraDoc repository and I do not see any DOC renders. We can use only RTF converter for Word supporting. So we can't generate DOC file directly using this package.
But we can convert RTF to DOC or DOCX easily (and for free) using FreeSpire.Doc nuget package.
Full code example is here:
using MigraDoc.DocumentObjectModel;
using MigraDoc.RtfRendering;
using Spire.Doc;
using System.IO;
namespace MigraDocTest
{
class Program
{
static void Main(string[] args)
{
using (var stream = new MemoryStream())
{
// Generate RTF (using MigraDoc)
var migraDoc = new MigraDoc.DocumentObjectModel.Document();
var section = migraDoc.AddSection();
var paragraph = section.AddParagraph();
paragraph.AddFormattedText("Hello World!", TextFormat.Bold);
var rtfDocumentRenderer = new RtfDocumentRenderer();
rtfDocumentRenderer.Render(migraDoc, stream, false, null);
// Convert RTF to DOCX (using Spire.Doc)
var spireDoc = new Spire.Doc.Document();
spireDoc.LoadFromStream(stream, FileFormat.Auto);
spireDoc.SaveToFile("D:\\example.docx", FileFormat.Docx );
}
}
}
}
You can use Microsoft's DocumentFormat.OpenXML library which has a NuGet package.
I know there are lot of question having same title but I am currently having some issue for them I didn't get the correct way to go.
I am using Open xml sdk 2.5 along with Power tool to convert .docx file to .html file which uses HtmlConverter class for conversion.
I am successfully able to convert the docx file into the Html file but the problem is, html file doesn't retain the original formatting of the document file. eg. Font-size,color,underline,bold etc doesn't reflect into the html file.
Here is my existing code:
public void ConvertDocxToHtml(string fileName)
{
byte[] byteArray = File.ReadAllBytes(fileName);
using (MemoryStream memoryStream = new MemoryStream())
{
memoryStream.Write(byteArray, 0, byteArray.Length);
using (WordprocessingDocument doc = WordprocessingDocument.Open(memoryStream, true))
{
HtmlConverterSettings settings = new HtmlConverterSettings()
{
PageTitle = "My Page Title"
};
XElement html = HtmlConverter.ConvertToHtml(doc, settings);
File.WriteAllText(#"E:\Test.html", html.ToStringNewLineOnAttributes());
}
}
}
So I just want to know if is there any way by which I can retain the formatting in converted HTML file.
I know about some third party APIs which does the same thing. But I would prefer if there any way using open xml or any other open source to do this.
PowerTools for Open XML just released a new HtmlConverter module. It now contains an open source, free implementation of a conversion from DOCX to HTML formatted with CSS. The module HtmlConverter.cs supports all paragraph, character, and table styles, fonts and text formatting, numbered and bulleted lists, images, and more. See https://openxmldeveloper.org/
Your end result will not look exactly the way your Word Document turns out, but this link might help.
You might want to find an external tool to help you do this, like Aspose Words
You can use OpenXML Viewer extension for Firefox for Converting with formatting.
http://openxmlviewer.codeplex.com
This works for me. Hope this helps.