How to open an existing PDF file with the MigraDoc PDF library

I am trying to use the MigraDoc library from PDFsharp (http://www.pdfsharp.net/) to print PDF files. So far I have found that MigraDoc does support printing through its MigraDoc.Rendering.Printing.MigraDocPrintDocument class. However, I have not found a way to actually open an existing PDF file with MigraDoc.
I did find a way to open an existing PDF file using PDFsharp, but I cannot convert a PdfSharp.Pdf.PdfDocument into a MigraDoc.DocumentObjectModel.Document object. So far I have not found the MigraDoc and PDFsharp documentation to be very helpful.
Does anyone have any experience using these libraries to work with existing PDF files?
I wrote the following code with help from this sample, but when my input PDF has 2 pages, the result is an output PDF with 2 blank pages.
using MigraDoc.DocumentObjectModel;
using MigraDoc.Rendering;
using PdfSharp.Drawing;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;
...
public void PrintPDF(string filePath, string outFilePath)
{
    var document = new Document();
    var docRenderer = new DocumentRenderer(document);
    docRenderer.PrepareDocument();
    var inPdfDoc = PdfReader.Open(filePath, PdfDocumentOpenMode.Modify);
    for (var i = 0; i < inPdfDoc.PageCount; i++)
    {
        document.AddSection();
        docRenderer.PrepareDocument();
        var page = inPdfDoc.Pages[i];
        var gfx = XGraphics.FromPdfPage(page);
        docRenderer.RenderPage(gfx, i+1);
    }
    var renderer = new PdfDocumentRenderer();
    renderer.Document = document;
    renderer.RenderDocument();
    renderer.PdfDocument.Save(outFilePath);
}

Your code modifies inPdfDoc in memory without ever saving the changes, so all that complicated code has no visible effect. What gets saved is the empty MigraDoc document, which is why the output pages are blank.
MigraDoc cannot open PDF files. MigraDoc cannot print PDF files, and neither can PDFsharp.
See the PDFsharp FAQ: http://www.pdfsharp.net/wiki/PDFsharpFAQ.ashx
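
To illustrate the point about unsaved changes, here is a minimal sketch (not the answerer's code) that opens an existing PDF with PDFsharp, draws on each page, and then saves that same document object. The class name PdfStampSketch, the method StampAndSave, the label text, the font, and the position are all placeholders chosen for illustration. It assumes PDFsharp 1.5 on .NET Framework, where the "Arial" font resolves without extra setup.

```csharp
using PdfSharp.Drawing;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

public static class PdfStampSketch
{
    // Hypothetical helper: draws a short label on every page of an existing
    // PDF and writes the modified document to outputPath.
    public static void StampAndSave(string inputPath, string outputPath, string label)
    {
        // Modify mode lets us draw on the pages of the opened document.
        PdfDocument input = PdfReader.Open(inputPath, PdfDocumentOpenMode.Modify);
        var font = new XFont("Arial", 12); // assumed font, pick any installed one

        for (int i = 0; i < input.PageCount; i++)
        {
            PdfPage page = input.Pages[i];
            // Dispose the graphics object so its content is flushed to the page.
            using (XGraphics gfx = XGraphics.FromPdfPage(page))
            {
                gfx.DrawString(label, font, XBrushes.Black, new XPoint(40, 40));
            }
        }

        // The step missing from the question: save the document that was modified.
        input.Save(outputPath);
    }
}
```

This only adds drawing on top of existing pages. It does not turn the PDF into a MigraDoc Document, and it does not print anything.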

Related

GemBox DocumentModel.Load() cannot read Pdf file

Currently I am unable to load the original PDF document using GemBox. It gives me the error shown in the image below. I am using Acrobat 9.
I have tried the 8/16/2018 fixes too. Any suggestion will be highly appreciated.
The basic code I am using is:
using GemBox.Document;
using System;
namespace Pdf2Text
{
    class Program
    {
        [STAThread]
        static void Main(string[] args)
        {
            ComponentInfo.SetLicense("My-License");
            DocumentModel document = null;
            document = DocumentModel.Load(@"E:\data\testing\HA021.pdf");
            document.Save(@"E:\data\testing\HA021.docx");
        }
    }
}

The current implementation of the PDF reader in GemBox.Document is still in beta and cannot handle one feature of this PDF: "iref streams", which are cross-reference tables stored in streams.
However, GemBox.Pdf can handle cross-reference streams, so as a workaround you could do something like the following:
// Load PDF with GemBox.Pdf.
var pdfDocument = PdfDocument.Load("Sample.pdf");
pdfDocument.SaveOptions.CrossReferenceType = PdfCrossReferenceType.Table;
// Save PDF with GemBox.Pdf.
var pdfStream = new MemoryStream();
pdfDocument.Save(pdfStream);
// Load PDF with GemBox.Document.
var document = DocumentModel.Load(pdfStream, LoadOptions.PdfDefault);
Lastly, regarding the conversion of PDF to DOCX: GemBox.Document's PDF reader is currently intended for extracting text and tables from PDF files; it is not intended for any high-fidelity requirement.
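
To check whether a given file is affected, one rough approach is to look at what the startxref offset at the end of the file points to: a classic table begins with the keyword xref, while a cross-reference stream begins with an object header. The following is a sketch using only the .NET base class library; the class XrefProbe and its method are hypothetical names, and it assumes a well-formed file and ignores hybrid files that carry both forms.

```csharp
using System;
using System.IO;
using System.Text;

public static class XrefProbe
{
    // Hypothetical helper: returns true when the last cross-reference section
    // is a classic "xref" table, false when it is something else (typically
    // a cross-reference stream object).
    public static bool UsesClassicXrefTable(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        // ISO-8859-1 maps one byte to one char, so string indexes equal byte offsets.
        string text = Encoding.GetEncoding("ISO-8859-1").GetString(bytes);

        int marker = text.LastIndexOf("startxref", StringComparison.Ordinal);
        if (marker < 0)
            throw new InvalidDataException("No startxref keyword found.");

        // Skip whitespace, then read the decimal byte offset that follows.
        int pos = marker + "startxref".Length;
        while (pos < text.Length && char.IsWhiteSpace(text[pos])) pos++;
        int start = pos;
        while (pos < text.Length && char.IsDigit(text[pos])) pos++;
        if (pos == start)
            throw new InvalidDataException("No offset after startxref.");

        int offset = int.Parse(text.Substring(start, pos - start));
        if (offset < 0 || offset + 4 > text.Length)
            throw new InvalidDataException("startxref offset is out of range.");

        return string.CompareOrdinal(text, offset, "xref", 0, 4) == 0;
    }
}
```

If this returns false for a file that GemBox.Document rejects, the GemBox.Pdf workaround above is the thing to try.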

Aspose HTML to PDF conversion- hyperlinks to content on same file not working

I am using Aspose.PDF for .NET version 17.3 for bulk conversion of a large number of HTML files to PDF. I have an existing HTML file with hyperlinks to content in the same file. Below is a sample of the HTML in the file.
Link:
Section 5
Content:
<a name="#bg880016"><p>section 5 content is here</p></a>
When this is converted to PDF the local links are not working anymore. Below is the conversion code:
public Stream ConvertHtmlToPDF(Stream inputStream, string docTitle)
{
    Stream pdfStream = new MemoryStream();
    inputStream.Position = 0;
    var options = new HtmlLoadOptions();
    var pdfDocument = new Aspose.Pdf.Document(inputStream, options);
    pdfDocument.Info.Title = docTitle;
    pdfDocument.Save(pdfStream);
    // Rewind and return the result; the original snippet omitted the return.
    pdfStream.Position = 0;
    return pdfStream;
}
Any help is much appreciated. I have also posted a question in their support forum.

I've found you need to convert it first to a Word document and then convert that to a PDF to get it to work as desired. Do you have Aspose.Words also?

Generating a DOC or DOCX using MigraDoc

I am working on a project where I need to create a Word file. For this purpose, I am using the MigraDoc library for C#.
Using this library, I can easily generate an RTF file by writing:
Document document = CreateDocument();
RtfDocumentRenderer rtf = new RtfDocumentRenderer();
rtf.Render(document, "test.rtf", null);
Process.Start("test.rtf");
But the requirement now asks for a DOC or DOCX file rather than an RTF file. Is there a way to generate a DOC or DOCX file using MigraDoc? And if so, how can I achieve this?

MigraDoc cannot generate DOC or DOCX files. Since MigraDoc is open source, you could add a renderer for DOCX if you have the knowledge and the time.
MigraDoc as it is cannot generate DOC/DOCX, but maybe you can invoke an external conversion tool after generating the RTF file.
I don't know any such tools. Word can open RTF quickly and so far our customers never complained about getting RTF, not DOC or DOCX.
Update (2019-07-29): The website mentions "Word", but this only refers to RTF. There never was an implementation for .DOC or .DOCX.

It seems that no MigraDoc renderer supports the DOC or DOCX formats.
The documentation page lists this MigraDoc feature:
Supports different output formats (PDF, Word, HTML, any printer supported by Windows)
But the documentation appears to mean the RTF format, which works perfectly well with Word. I have reviewed the MigraDoc repository and I do not see any DOC renderers; only the RTF renderer is available for Word support. So we can't generate a DOC file directly with this package.
However, we can convert RTF to DOC or DOCX easily (and for free) using the FreeSpire.Doc NuGet package.
A full code example follows:
using MigraDoc.DocumentObjectModel;
using MigraDoc.RtfRendering;
using Spire.Doc;
using System.IO;
namespace MigraDocTest
{
    class Program
    {
        static void Main(string[] args)
        {
            using (var stream = new MemoryStream())
            {
                // Generate RTF (using MigraDoc)
                var migraDoc = new MigraDoc.DocumentObjectModel.Document();
                var section = migraDoc.AddSection();
                var paragraph = section.AddParagraph();
                paragraph.AddFormattedText("Hello World!", TextFormat.Bold);
                var rtfDocumentRenderer = new RtfDocumentRenderer();
                rtfDocumentRenderer.Render(migraDoc, stream, false, null);
                // Rewind so the reader starts at the beginning of the RTF data
                stream.Position = 0;
                // Convert RTF to DOCX (using Spire.Doc)
                var spireDoc = new Spire.Doc.Document();
                spireDoc.LoadFromStream(stream, FileFormat.Auto);
                spireDoc.SaveToFile("D:\\example.docx", FileFormat.Docx);
            }
        }
    }
}

You can use Microsoft's DocumentFormat.OpenXML library which has a NuGet package.
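
As a rough illustration of that suggestion, the sketch below writes a one-paragraph DOCX with the Open XML SDK (the DocumentFormat.OpenXml NuGet package). It bypasses MigraDoc entirely, so a MigraDoc document model would have to be mapped to Open XML elements by hand. The class name DocxSketch and the method WriteHelloWorld are made up for this example.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class DocxSketch
{
    // Hypothetical helper: creates a DOCX file containing one bold paragraph.
    public static void WriteHelloWorld(string path)
    {
        using (WordprocessingDocument package =
            WordprocessingDocument.Create(path, WordprocessingDocumentType.Document))
        {
            MainDocumentPart mainPart = package.AddMainDocumentPart();

            // Run properties must come before the text inside a run.
            var run = new Run(
                new RunProperties(new Bold()),
                new Text("Hello World!"));

            mainPart.Document = new Document(new Body(new Paragraph(run)));
            mainPart.Document.Save();
        }
    }
}
```

Calling DocxSketch.WriteHelloWorld("example.docx") should produce a file that Word opens directly.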

How to read a PDF Portfolio using iTextSharp

I'm using iTextSharp, in a C# app that reads PDF files and breaks out the pages as separate PDF documents. It works well, except in the case of portfolios. Now I'm trying to figure out how to read a PDF portfolio (or Collection, as they seem to be called in iText) that contains two embedded PDF documents. I want to simply open the portfolio, enumerate the embedded files and then save them as separate, simple PDF files.
There's a good example of how to programmatically create a PDF portfolio, here:
Kubrick Collection Example
But I haven't seen any examples that read portfolios. Any help would be much appreciated!

The example you referenced adds the embedded files as document-level attachments. So you can extract the files like this:
PdfReader reader = new PdfReader(readerPath);
PdfDictionary root = reader.Catalog;
PdfDictionary documentnames = root.GetAsDict(PdfName.NAMES);
PdfDictionary embeddedfiles =
    documentnames.GetAsDict(PdfName.EMBEDDEDFILES);
PdfArray filespecs = embeddedfiles.GetAsArray(PdfName.NAMES);
for (int i = 0; i < filespecs.Size; ) {
    filespecs.GetAsString(i++);
    PdfDictionary filespec = filespecs.GetAsDict(i++);
    PdfDictionary refs = filespec.GetAsDict(PdfName.EF);
    foreach (PdfName key in refs.Keys) {
        PRStream stream = (PRStream) PdfReader.GetPdfObject(
            refs.GetAsIndirectObject(key)
        );
        using (FileStream fs = new FileStream(
            filespec.GetAsString(key).ToString(), FileMode.OpenOrCreate
        )){
            byte[] attachment = PdfReader.GetStreamBytes(stream);
            fs.Write(attachment, 0, attachment.Length);
        }
    }
}
Pass the output file from the Kubrick Collection Example you referenced to the PdfReader constructor (readerPath) if you want to test this.
Java version: part4.chapter16.KubrickDocumentary
C# version.
Hopefully I'll have time to update the C# examples this month from version 5.2.0.0 (the iTextSharp version is about three weeks behind the Java version right now).

how to rotate pdf using itextsharp library

Hi,
How do we rotate a PDF using the iText library?
Thanks

If you are writing to a new PDF document, the following line will create a new A4 page rotated into landscape:
Document doc = new Document(PageSize.A4.Rotate());

This is a very simple example:
Document pdf = new Document(PageSize.A4);
PdfWriter.GetInstance(pdf, new FileStream("file.pdf", System.IO.FileMode.Create));
pdf.Open();
pdf.Add(new Paragraph("This is a pdf document!"));
pdf.Close();
Edit: My mistake, I read "how to create pdf ...". That is what the example above does. I am sorry.
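
Neither answer above covers rotating the pages of an existing PDF. One common approach with iTextSharp 5 is to change each page's /Rotate entry and write the result through a PdfStamper. The sketch below assumes iTextSharp 5.x; the class name RotateSketch and the method RotateAllPages are placeholders, and the rotation should be a multiple of 90.

```csharp
using System.IO;
using iTextSharp.text.pdf;

public static class RotateSketch
{
    // Hypothetical helper: adds `degrees` (a multiple of 90) to the rotation
    // of every page and writes the result to outputPath.
    public static void RotateAllPages(string inputPath, string outputPath, int degrees)
    {
        var reader = new PdfReader(inputPath);
        try
        {
            for (int pageNumber = 1; pageNumber <= reader.NumberOfPages; pageNumber++)
            {
                PdfDictionary page = reader.GetPageN(pageNumber);
                int current = reader.GetPageRotation(pageNumber);
                // Normalize into the range 0..359, also for negative input.
                int rotation = (((current + degrees) % 360) + 360) % 360;
                page.Put(PdfName.ROTATE, new PdfNumber(rotation));
            }

            using (var output = new FileStream(outputPath, FileMode.Create))
            {
                // The stamper copies the modified reader content to the output.
                var stamper = new PdfStamper(reader, output);
                stamper.Close();
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

For example, RotateSketch.RotateAllPages("file.pdf", "rotated.pdf", 90) would turn every page a quarter turn clockwise when viewed.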
