GemBox DocumentModel.Load() cannot read Pdf file - c#

Currently I am unable to load the original PDF document using GemBox; it gives me the error shown in the image below. I am using Acrobat 9.
I have also tried the 8/16/2018 fixes. Any suggestions will be highly appreciated.
The basic code I am using is:
using GemBox.Document;
using System;
namespace Pdf2Text
{
class Program
{
[STAThread]
static void Main(string[] args)
{
ComponentInfo.SetLicense("My-License");
DocumentModel document = null;
document = DocumentModel.Load(@"E:\data\testing\HA021.pdf");
document.Save(@"E:\data\testing\HA021.docx");
}
}
}

The current implementation of the PDF reader in GemBox.Document is still in beta and cannot handle one feature of this PDF: "iref streams", which are cross-reference tables stored in streams.
However, GemBox.Pdf can handle cross-reference streams, so as a workaround you could do something like the following:
// Load PDF with GemBox.Pdf.
var pdfDocument = PdfDocument.Load("Sample.pdf");
pdfDocument.SaveOptions.CrossReferenceType = PdfCrossReferenceType.Table;
// Save PDF with GemBox.Pdf.
var pdfStream = new MemoryStream();
pdfDocument.Save(pdfStream);
// Load PDF with GemBox.Document.
var document = DocumentModel.Load(pdfStream, LoadOptions.PdfDefault);
Lastly, regarding the conversion of PDF to DOCX: GemBox.Document's PDF reader is currently intended for extracting text and tables from PDF files; it is not intended for high-fidelity conversion.

Related

Split large PDF file in to multiple pdfs in C#

I have a large PDF file that I need to split into multiple PDFs or chunks before I upload it to the server (another WCF service).
I see two approaches for sending large files (> 2 MB) to the server: splitting them into multiple PDFs, or splitting one PDF into chunks. Can anyone tell me how to achieve this?
The articles I found use iTextSharp, which is deprecated, and I don't use the licensed version. Is there a feasible way to achieve this?
For example, I followed this article, but it uses iTextSharp:
https://www.c-sharpcorner.com/article/splitting-pdf-file-in-c-sharp-using-itextsharp/
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;
using System.IO;
class Program
{
// Output Folder
static string outputFolder = @"D:\PDFSplit\Example\outputFolder";
static void Main(string[] args)
{
// Input Folder
var inputFolder = @"D:\PDFSplit\Example\inputFolder";
// Input File name
var inputPDFFileName = "sample.pdf";
// Input file path
string inputPDFFilePath = Path.Combine(inputFolder, inputPDFFileName);
// Open the input file in Import Mode
PdfDocument inputPDFFile = PdfReader.Open(inputPDFFilePath, PdfDocumentOpenMode.Import);
//Get the total pages in the PDF
var totalPagesInInputPDFFile = inputPDFFile.PageCount;
while(totalPagesInInputPDFFile !=0)
{
//Create an instance of the PDF document in memory
PdfDocument outputPDFDocument = new PdfDocument();
// Add a specific page to the PdfDocument instance
outputPDFDocument.AddPage(inputPDFFile.Pages[totalPagesInInputPDFFile-1]);
//save the PDF document
SaveOutputPDF(outputPDFDocument, totalPagesInInputPDFFile);
totalPagesInInputPDFFile--;
}
}
private static void SaveOutputPDF(PdfDocument outputPDFDocument,int pageNo)
{
// Output file path
string outputPDFFilePath = Path.Combine(outputFolder, pageNo.ToString() + ".pdf");
//Save the document
outputPDFDocument.Save(outputPDFFilePath);
}
}
To split the PDF into several smaller files, first load the PDF document that you need to split, and then call the Split function with the pattern for the output file names.
PdfLoadedDocument document = new PdfLoadedDocument("sample.pdf");
document.Split("Document-{0}.pdf");
document.Close(true);
There is another way: first load the PDF document using the Document class, then choose the pages to split out and put them into a Page[] array. After that, create a new Document and add the pages to it using the Document.Pages.Add(Page[]) method.
Save the PDF file using the Document.Save(String) method.
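The load, select, add and save steps described above can also be sketched with PDFsharp, the library used in the question, instead of the Document class from this answer. This is a minimal sketch rather than a definitive implementation: the PdfPageGroupSplitter class, the pagesPerFile parameter and the part-N.pdf file names are assumptions made for illustration.

```csharp
using System;
using System.IO;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

static class PdfPageGroupSplitter
{
    // Splits inputPath into files of at most pagesPerFile pages each and
    // returns the number of files written.
    // The "part-N.pdf" naming scheme is hypothetical.
    public static int Split(string inputPath, string outputFolder, int pagesPerFile)
    {
        if (pagesPerFile <= 0)
            throw new ArgumentOutOfRangeException(nameof(pagesPerFile));

        // Import mode is required to copy pages into another document.
        PdfDocument input = PdfReader.Open(inputPath, PdfDocumentOpenMode.Import);
        int fileCount = 0;

        for (int first = 0; first < input.PageCount; first += pagesPerFile)
        {
            // Exclusive upper bound of the current page group.
            int end = Math.Min(first + pagesPerFile, input.PageCount);

            var output = new PdfDocument();
            for (int i = first; i < end; i++)
                output.AddPage(input.Pages[i]);

            fileCount++;
            output.Save(Path.Combine(outputFolder, "part-" + fileCount + ".pdf"));
        }

        return fileCount;
    }
}
```

Calling PdfPageGroupSplitter.Split(inputPath, outputFolder, 10) would write one file per group of ten pages; the last file holds whatever pages remain.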
Try using streams, for instance StreamReader.
See meatvest's answer here
And the docs
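The stream suggestion above can be sketched as follows. StreamReader is meant for text, so this sketch reads raw bytes from a Stream instead. It is an illustration under assumptions, not the linked answer's code: the FileChunker class, the ReadChunks method and the chunkSize parameter are hypothetical names. The chunks are plain byte ranges, not valid PDFs on their own, so the receiving service would have to append them in order to rebuild the file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class FileChunker
{
    // Lazily yields the content of 'input' as arrays of at most chunkSize bytes.
    public static IEnumerable<byte[]> ReadChunks(Stream input, int chunkSize)
    {
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var buffer = new byte[chunkSize];
        while (true)
        {
            // Read may return fewer bytes than requested, so keep reading
            // until the buffer is full or the stream ends.
            int filled = 0;
            while (filled < chunkSize)
            {
                int read = input.Read(buffer, filled, chunkSize - filled);
                if (read == 0) break;
                filled += read;
            }

            if (filled == 0) yield break;

            var chunk = new byte[filled];
            Array.Copy(buffer, chunk, filled);
            yield return chunk;

            // A partially filled buffer means the end of the stream was reached.
            if (filled < chunkSize) yield break;
        }
    }
}
```

For a file on disk, the stream would come from File.OpenRead(path), and each yielded array could be sent in its own service call, for example with a chunkSize of 2 * 1024 * 1024 to stay at the 2 MB limit mentioned in the question.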

how can we convert word to .pdf format and excel to .pdf format from c#

How can we convert Excel and Word files to .pdf format from C#? I tried the following code, but it throws an error.
This is my code:
Microsoft.Office.Interop.Word.Application appWord = new Microsoft.Office.Interop.Word.Application();
wordDocument = appWord.Documents.Open(@"C:\Users\ITPro2\Documents\test.docx");
wordDocument.ExportAsFixedFormat(@"D:\desktop\DocTo.pdf", Microsoft.Office.Interop.Word.WdExportFormat.wdExportFormatPDF);
and I got the following error when exporting from Word to PDF:
The export failed because this feature is not installed.
While not directly related, the documentation at
https://msdn.microsoft.com/en-us/library/office/ff198122.aspx
includes a note that exactly this error occurs if the PDF add-in is not installed. So check your prerequisites, i.e. that Office and the add-in are both installed.
1) Use the Excel 2013 Primary Interop Assembly class library; it works perfectly fine under .NET 4.5.1. Just add the Microsoft.Office.Interop.Excel assembly to your references and you are ready to go.
using System;
using Microsoft.Office.Interop.Excel;
namespace officeInterop
{
class Program
{
static void Main(string[] args)
{
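Application app = new Application();
Workbook wkb = app.Workbooks.Open("d:\\x.xlsx");
wkb.ExportAsFixedFormat(XlFixedFormatType.xlTypePDF, "d:\\x.pdf");
// Close the workbook without saving and quit Excel so that no
// background Excel process is left running.
wkb.Close(false);
app.Quit();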
}
}
}
OR
2) Refer to this link to convert a DOC or DOCX file to PDF:
http://www.rasteredge.com/how-to/csharp-imaging/pdf-convert-word-to-pdf/
Since my other comment got deleted, here is the updated version.
For converting my files to PDF in C#, I used the Metamorphosis library; this could also be a solution for you.
Below is a code example of mine that downloads a regular file from blob storage and converts it to PDF.
var converter = new SautinSoft.PdfMetamorphosis();
var ms = new MemoryStream();
await blob.DownloadToStreamAsync(ms);
ms.Seek(0, SeekOrigin.Begin);
var pdfStream = converter.DocxToPdfConvertStream(ms);
if (pdfStream != null)
{
await _storageProvider.SaveFileAsync(containerName, fileName, pdfStream);
}
ms.Close();
var result = await _storageProvider.GetFileWithAttributesAsync(containerName, fileName);
return new ServiceResponse<CloudBlockBlob>(result);
Below are some links to the sample code from the library itself:
Word to PDF example
Excel to PDF example

Generating a DOC or DOCX using MigraDoc

I am working on a project where I need to create a Word file. For this purpose, I am using MigraDoc library for C#.
Using this library, I can easily generate an RTF file by writing:
Document document = CreateDocument();
RtfDocumentRenderer rtf = new RtfDocumentRenderer();
rtf.Render(document, "test.rtf", null);
Process.Start("test.rtf");
But the requirement now asks for a DOC or DOCX file rather than an RTF file. Is there a way to generate a DOC or DOCX file using MigraDoc? And if so, how may I achieve this?
MigraDoc cannot generate DOC or DOCX files. Since MigraDoc is open source, you could add a renderer for DOCX if you have the knowledge and the time.
MigraDoc as it is cannot generate DOC/DOCX, but maybe you can invoke an external conversion tool after generating the RTF file.
I don't know any such tools. Word can open RTF quickly and so far our customers never complained about getting RTF, not DOC or DOCX.
Update (2019-07-29): The website mentions "Word", but this only refers to RTF. There never was an implementation for .DOC or .DOCX.
It seems there are no MigraDoc renderers that support the DOC or DOCX formats.
The documentation page lists this MigraDoc feature:
Supports different output formats (PDF, Word, HTML, any printer supported by Windows)
But it seems the documentation is referring to the RTF format, which works perfectly with Word. I have reviewed the MigraDoc repository and I do not see any DOC renderers. Only the RTF renderer is available for Word support, so we can't generate a DOC file directly using this package.
But we can convert RTF to DOC or DOCX easily (and for free) using the FreeSpire.Doc NuGet package.
A full code example:
using MigraDoc.DocumentObjectModel;
using MigraDoc.RtfRendering;
using Spire.Doc;
using System.IO;
namespace MigraDocTest
{
class Program
{
static void Main(string[] args)
{
using (var stream = new MemoryStream())
{
// Generate RTF (using MigraDoc)
var migraDoc = new MigraDoc.DocumentObjectModel.Document();
var section = migraDoc.AddSection();
var paragraph = section.AddParagraph();
paragraph.AddFormattedText("Hello World!", TextFormat.Bold);
var rtfDocumentRenderer = new RtfDocumentRenderer();
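rtfDocumentRenderer.Render(migraDoc, stream, false, null);
// Rewind so the RTF data is read from the beginning
// (defensive; harmless if the loader already seeks to the start).
stream.Position = 0;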
// Convert RTF to DOCX (using Spire.Doc)
var spireDoc = new Spire.Doc.Document();
spireDoc.LoadFromStream(stream, FileFormat.Auto);
spireDoc.SaveToFile("D:\\example.docx", FileFormat.Docx );
}
}
}
}
You can use Microsoft's DocumentFormat.OpenXML library which has a NuGet package.
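As a rough illustration of that suggestion, here is a minimal sketch that writes a one-paragraph DOCX with the DocumentFormat.OpenXml package. It builds the file directly and does not convert a MigraDoc document; the DocxWriter class and the WriteHelloWorld method are hypothetical names chosen for this example.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class DocxWriter
{
    // Creates a DOCX file at 'path' containing one bold "Hello World!" paragraph.
    public static void WriteHelloWorld(string path)
    {
        using (WordprocessingDocument package =
            WordprocessingDocument.Create(path, WordprocessingDocumentType.Document))
        {
            // Every Word document needs a main document part with a body.
            MainDocumentPart mainPart = package.AddMainDocumentPart();
            mainPart.Document = new Document(
                new Body(
                    new Paragraph(
                        new Run(
                            new RunProperties(new Bold()),
                            new Text("Hello World!")))));
            mainPart.Document.Save();
        }
    }
}
```

The trade-off is that the document content has to be built again with Open XML elements instead of reusing the MigraDoc object model, so this fits best when DOCX is the only required output.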

Read a stored PDF from memory stream

I'm working on a database project using C# and SQLServer 2012. In one of my forms I have a PDF file with some other information that is stored in a table. This is working successfully, but when I want to retrieve the stored information I have a problem with displaying the PDF file, because I can't display it and I don't know how to display it.
I read some articles that said a PDF cannot be displayed with the Adobe PDF viewer from a memory stream. Is there any way to do that?
This is my code for retrieving the data from the database:
sql_com.CommandText = "select * from incoming_boks_tbl where [incoming_bok_id]=@incoming_id and [incoming_date]=@incoming_date";
sql_com.Parameters.AddWithValue("incoming_id",up_inco_num_txt.Text);
sql_com.Parameters.AddWithValue("incoming_date", up_inco_date_txt.Text);
sql_dr = sql_com.ExecuteReader();
if(sql_dr.HasRows)
{
while(sql_dr.Read())
{
up_incoming_id_txt.Text = sql_dr[0].ToString();
up_inco_num_txt.Text = sql_dr[1].ToString();
up_inco_date_txt.Text = sql_dr[2].ToString();
up_inco_reg_txt.Text = sql_dr[3].ToString();
up_inco_place_txt.Text = sql_dr[4].ToString();
up_in_out_txt.Text = sql_dr[5].ToString();
up_subj_txt.Text = sql_dr[6].ToString();
up_note_txt.Text = sql_dr[7].ToString();
string file_ext = sql_dr[8].ToString();//pdf file extension
byte[] inco_file = (byte[])(sql_dr[9]);//the pdf file
MemoryStream ms = new MemoryStream(inco_file);
//here I don't know what to do with memory stream file data and where to store it. How can i display it?
}
}
This answer should give you some options: How to render pdfs using C#
In the past I have used Google's open source PDF rendering project, PDFium.
There is a C# NuGet package called PdfiumViewer, which provides a C# wrapper around PDFium and allows PDFs to be displayed and printed.
It works directly with streams, so it doesn't require any data to be written to disk.
This is my example from a WinForms app:
public void LoadPdf(byte[] pdfBytes)
{
var stream = new MemoryStream(pdfBytes);
LoadPdf(stream);
}
public void LoadPdf(Stream stream)
{
// Create PDF Document
var pdfDocument = PdfDocument.Load(stream);
// Load PDF Document into WinForms Control
pdfRenderer.Load(pdfDocument);
}
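If adding a viewer control is not an option, a possible fallback is to write the bytes to a temporary file and let the operating system open it in the default PDF application. This is a different technique from the PdfiumViewer example above, since it does write to disk. It is only a sketch, and the PdfTempFile class and its method names are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class PdfTempFile
{
    // Writes the PDF bytes to a uniquely named file in the temp folder
    // and returns the full path of that file.
    public static string WriteToTempFile(byte[] pdfBytes)
    {
        string path = Path.Combine(
            Path.GetTempPath(),
            Guid.NewGuid().ToString("N") + ".pdf");
        File.WriteAllBytes(path, pdfBytes);
        return path;
    }

    // Asks the shell to open the file with the application registered for .pdf.
    public static void OpenInDefaultViewer(string path)
    {
        Process.Start(new ProcessStartInfo(path) { UseShellExecute = true });
    }
}
```

With the question's variables, this would be PdfTempFile.OpenInDefaultViewer(PdfTempFile.WriteToTempFile(inco_file)). The temporary file is not deleted automatically, so the application would need to clean it up later.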

How to open an existing PDF file with Migradoc PDF library

I am trying to use the MigraDoc library from PDFsharp (http://www.pdfsharp.net/) to print PDF files. So far I have found that MigraDoc does support printing through its MigraDoc.Rendering.Printing.MigraDocPrintDocument class. However, I have not found a way to actually open an existing PDF file with MigraDoc.
I did find a way to open an existing PDF file using PDFSharp, but I cannot successfully convert a PDFSharp.Pdf.PdfDocument into a MigraDoc.DocumentObjectModel.Document object. So far I have not found the MigraDoc and PDFSharp documentation to be very helpful.
Does anyone have any experience using these libraries to work with existing PDF files?
I wrote the following code with help from this sample, but the result when my input PDF is 2 pages is an output PDF with 2 blank pages.
using MigraDoc.DocumentObjectModel;
using MigraDoc.Rendering;
using PdfSharp.Drawing;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;
...
public void PrintPDF(string filePath, string outFilePath)
{
var document = new Document();
var docRenderer = new DocumentRenderer(document);
docRenderer.PrepareDocument();
var inPdfDoc = PdfReader.Open(filePath, PdfDocumentOpenMode.Modify);
for (var i = 0; i < inPdfDoc.PageCount; i++)
{
document.AddSection();
docRenderer.PrepareDocument();
var page = inPdfDoc.Pages[i];
var gfx = XGraphics.FromPdfPage(page);
docRenderer.RenderPage(gfx, i+1);
}
var renderer = new PdfDocumentRenderer();
renderer.Document = document;
renderer.RenderDocument();
renderer.PdfDocument.Save(outFilePath);
}
Your code modifies the inPdfDoc in memory without saving the changes. Complicated code without any visual effect.
MigraDoc cannot open PDF files, MigraDoc cannot print PDF files, PDFsharp cannot print PDF files.
http://www.pdfsharp.net/wiki/PDFsharpFAQ.ashx
