Unable to convert PCL file to PDF using Filestream and PDFSharp - c#

I was unable to find a free library that can directly convert a PCL file to a PDF file, so I thought of reading the PCL file into a FileStream and saving it to a PDF document using PDFsharp.
I tried the following code, but it gives me a blank PDF document.
Can someone tell me what I am doing wrong?
private void pdfSharpPclToPDF(string localPCLPath)
{
    using (FileStream fleStream = new FileStream(localPCLPath, FileMode.Open))
    {
        using (PdfDocument newPDF = new PdfDocument(fleStream))
        {
            PdfPage pdfPage = new PdfPage(newPDF);
            newPDF.Pages.Add(pdfPage);
            newPDF.Save("D:\\Research\\PDF_Files\\output.pdf");
        }
    }
}
It would also help if someone could suggest any other open source library that can do the job for me.

PDFsharp does not use the filestream you create. If you invoke Save() without filename, PDFsharp will save the PDF document to the stream. If you invoke Save(<filename>) the document will be saved in that file.
PDFsharp cannot read PCL files. You are trying something that cannot work with PDFsharp.

Most applications include some way to export content as printer output. The higher-end output formats are generically referred to as PDLs (Page Description Languages), and historically they were stored as filename.prn with nothing to distinguish the content.
The contents of a PRN file could traditionally be Esc codes as used by Epson, PostScript programs, or PCL (Printer Command Language), plus many others. Nowadays the list often includes PDF, which high-end printers accept directly.
So PDFsharp can export to PCL via a printer driver, but it is not designed to import PCL for conversion into PDF output.
One application that can convert bidirectionally between PDL formats is Artifex GhostPDL (it does NOT import Epson Esc code, but it can export Epson code). Ghostscript is open source, as you request, but note that it is licensed under the AGPL, with a commercial license available from Artifex. However, it is the most capable option, with a few decades of history.
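As a rough illustration, GhostPDL can be driven from C# by launching its command-line executable. The sketch below assumes the PCL interpreter from the GhostPDL distribution is installed; the executable name (for example gpcl6win64.exe on 64-bit Windows) and all paths are assumptions to adjust for your machine. The -sDEVICE=pdfwrite and -o switches are the standard Ghostscript options for PDF output.

```csharp
using System;
using System.Diagnostics;

public static class PclToPdf
{
    // Builds the command line: select the pdfwrite device, then name the
    // output file with -o (which also implies -dBATCH -dNOPAUSE).
    public static string BuildArguments(string pclPath, string pdfPath)
    {
        return "-sDEVICE=pdfwrite -o \"" + pdfPath + "\" \"" + pclPath + "\"";
    }

    // Runs the GhostPDL executable and returns its exit code (0 on success).
    // ghostPclExe is an assumed path, e.g. to gpcl6win64.exe.
    public static int Run(string ghostPclExe, string pclPath, string pdfPath)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = ghostPclExe,
            Arguments = BuildArguments(pclPath, pdfPath),
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

A call might look like PclToPdf.Run(pathToGpcl6, localPCLPath, "D:\\Research\\PDF_Files\\output.pdf"), where pathToGpcl6 is wherever the executable lives on your system. Keep the AGPL terms in mind if you ship the executable with your application.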

Related

SILENT conversion of PDF to XPS with "Microsoft XPS Document Writer"

I need to convert a bunch of PDFs to XPS documents programmatically (in C#). Therefore, I tried calling several command-line tools:
AcroRd32.exe (Adobe Reader)
SumatraPDF.exe (neat portable tool)
PDF2Printer for Windows 10
and many, many others.
However, none of them seems to support specifying a filename for the generated XPS, which causes the "Save" dialog to pop up on every call. I am looking for a completely silent way to convert a PDF to XPS without any user interaction. Furthermore, I wish to use only tools that are free for commercial use (without any AGPL licensing). Are there any workarounds for this issue?
The following library can convert PDF to XPS and lets you specify the filename of the generated XPS file, but with limitations: http://freepdf.codeplex.com
PdfDocument doc = new PdfDocument();
doc.LoadFromFile("FileName.pdf");
doc.SaveToFile("FileName.xps", FileFormat.XPS);

Reading a part of PDF file in c#

I have many large PDF files, and I only need to read part of each one. I want to start reading a PDF file and write its content to another file, such as a txt file or any other file type.
However, I want to limit the size of the file I am writing to. When the txt file reaches about 15 MB, I should stop reading the PDF document and keep the txt file created so far.
Can anyone help me do this in C#?
Thanks in advance for your help.
Here is the code that I use for reading the whole file (image content is not important to me):
using (StreamReader sr = new StreamReader(@"F:\1.pdf"))
{
    using (StreamWriter sw = new StreamWriter(@"F:\test.txt"))
    {
        while (!sr.EndOfStream)
        {
            string line = sr.ReadLine();
            sw.WriteLine(line);
            sw.Flush();
        }
    }
}
You have to use a PDF library to do this. There are a lot of free and paid PDF libraries out there that can handle this task. Recently I used the EO.Pdf library to read PDF pages and extract page content. The best part is that it has a NuGet package and is under continuous development. The downside is that you have to pay for commercial use.
PDF can't be read directly as text using .NET. You should first convert the PDF to text (or XML, or HTML).
There are a lot of PDF libraries capable of converting PDF to text, such as iTextSharp (the most popular, and open source), along with many other tools.
To control the size of the output text files, you should:
get the number of pages from the PDF
run the PDF-to-text conversion page by page, checking the output text file size as you go
once the file size is over 15 MB, stop the conversion and move on to the next file
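The steps above can be sketched roughly as follows. To keep the example independent of any particular PDF library, the per-page extraction is passed in as a delegate; the class name, method name, and parameters are hypothetical. Note one small variation: this version stops just before a page would push the output past the limit, instead of stopping after the limit has been exceeded.

```csharp
using System;
using System.IO;
using System.Text;

public static class CappedTextExport
{
    // Writes the text of pages 1..pageCount to output, one page at a time,
    // and stops before the page that would exceed maxBytes (counted as UTF-8).
    // Returns the number of pages actually written.
    public static int Write(TextWriter output, int pageCount,
                            Func<int, string> getPageText, long maxBytes)
    {
        long written = 0;
        for (int page = 1; page <= pageCount; page++)
        {
            string text = getPageText(page) + "\n";
            long size = Encoding.UTF8.GetByteCount(text);
            if (written + size > maxBytes)
            {
                return page - 1; // limit reached, keep what we have
            }
            output.Write(text);
            written += size;
        }
        return pageCount;
    }
}
```

With iTextSharp 5, for example, the delegate could wrap PdfTextExtractor.GetTextFromPage(reader, page) and the page count could come from reader.NumberOfPages, with output being a StreamWriter opened on the txt file and maxBytes set to 15 * 1024 * 1024.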

Possibility to convert HTM file to PDF

Is there a way to convert an HTM file to a PDF? Based on my understanding, the HTML and HTM file extensions are the same. With that in mind, I tried the following code using Spire, but my output was a blank PDF.
if (filelist[f].EndsWith(".htm"))
{
PdfDocument doc = new PdfDocument();
string filext = System.IO.Path.GetExtension(filelist[f]);
string outputDocName = filelist[f].Replace(filext, ".pdf");
doc.SaveToFile(outputDocName);
doc.Close();
}
I have searched on Google but couldn't find much on converting an HTML file to a PDF. I have even looked into Python using ImageMagick, but there are multiple steps, so I will try that once I run out of options. Is iTextSharp a possibility? Do I need to convert the HTM file to another file type before turning that into a PDF, or does what I am trying to do not exist?
HTML (with file extension of .htm or .html) is in a sense a plain text file which needs parsing and rendering to produce "visual" output. So ImageMagick or similar tools will not work if they have no concept of rendering HTML.
If this is a one-off requirement - get a PDF printer driver (for example CUPS PDF) and just "print" the pages from your browser.
If this needs to be an automated process, my personal suggestion would be PhantomJS. But search and Google are your friends - Converting HTML to PDF using PHP?

Converting Image-Based PDF to Text-Based PDF

How can I convert an image-based PDF to a text-based PDF? There are a lot of tools available, but I am looking for C# code to build an application. I heard about Tessara, but I could not find code for C#; it is available only for C/C++.
I used the MODI DLL to convert an image to text. The process is to convert each page of the PDF to an image (using the Acrobat DLL), and then use MODI on that output image (bmp/tif) to get the text. Is there any way to turn the MODI object into a PDF?
MODI.Document doc = new MODI.Document();
doc.Create(ImagePath);
doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, false, false);
doc.SaveAs("c://.../test.pdf", MODI.MiFILE_FORMAT.miFILE_FORMAT_DEFAULTVALUE, MODI.MiCOMP_LEVEL.miCOMP_LEVEL_HIGH);
// This line creates a PDF, but the PDF cannot be opened because of an error.
If you have any other way to do this, please let me know.
Regards,
R.Balajiprasad
You can use Google's Tesseract-OCR; the documentation can be found here. It's free and works perfectly. There is a NuGet package (IronOcr) that uses Tesseract, and it can be found here.

Free way to convert PDF to XPS with C#

Are there any free tools that I can use to convert a PDF document into an XPS document? Although a nice programmatic API would be nice, I'm not opposed to shelling out to a command line tool to do the conversion.
Thanks!
ABCpdf version 7 includes this functionality; if you link back to their site, you can get a free license key. Use the Save method to perform PDF-to-XPS conversions.
XPS is exported only if the supplied path ends with ".xps", and it requires the .NET Framework 3.0. XPS is supported via Doc.SaveOptions.FileExtension of the Save method when saving to a stream. Set this property to either ".xps" or "xps" otherwise a conventional PDF output will be generated. XPS streams must have FileAccess.ReadWrite and not simply FileAccess.Write otherwise the operation will fail.
virtual void Save(Stream stream)
