GhostscriptRasterizer.PageCount always returns zero

GhostscriptRasterizer.PageCount always returns zero - c#

This problem has already been discussed here:GhostscriptRasterizer Objects Returns 0 as PageCount value
But the answer to this question did not help me solve the problem.
In my case, it doesn’t help from kat to an older version of Ghostscript. 26 and 25. I always have PageCount = 0, and if the version is lower than 27, I get an error "Native Ghostscript library not found."
private static void PdfToPng(string inputFile, string outputFileName)
{
var xDpi = 100; //set the x DPI
var yDpi = 100; //set the y DPI
var pageNumber = 1; // the pages in a PDF document
using (var rasterizer = new GhostscriptRasterizer()) //create an instance for GhostscriptRasterizer
{
rasterizer.Open(inputFile); //opens the PDF file for rasterizing
//set the output image(png's) complete path
var outputPNGPath = Path.Combine(outputFolder, string.Format("{0}_Page{1}.png", outputFileName,pageNumber));
//converts the PDF pages to png's
var pdf2PNG = rasterizer.GetPage(xDpi, yDpi, pageNumber);
//save the png's
pdf2PNG.Save(outputPNGPath, ImageFormat.Png);
Console.WriteLine("Saved " + outputPNGPath);
}
}

I was struggling with the same problem and ended up using iTextSharp just to get the page count. Below is a snippet from the production code:
using (var reader = new PdfReader(pdfFile))
{
// as a matter of fact we need iTextSharp PdfReader (and all of iTextSharp) only to get the page count of PDF document;
// unfortunately GhostScript itself doesn't know how to do it
pageCount = reader.NumberOfPages;
}
Not a perfect solution but this is exactly what solved my problem. I left that comment there to remind myself that I have to find a better way somehow but I’ve never bothered to come back because it just works fine as it is...
PdfReader class is defined in iTextSharp.text.pdf namespace.
And I'm using Ghostscript.NET.GhostscriptPngDevice instead of GhostscriptRasterizer to rasterize the specific page of PDF document.
Here is my method that rasterizes the page and saves it to PNG file
private static void PdfToPngWithGhostscriptPngDevice(string srcFile, int pageNo, int dpiX, int dpiY, string tgtFile)
{
GhostscriptPngDevice dev = new GhostscriptPngDevice(GhostscriptPngDeviceType.PngGray);
dev.GraphicsAlphaBits = GhostscriptImageDeviceAlphaBits.V_4;
dev.TextAlphaBits = GhostscriptImageDeviceAlphaBits.V_4;
dev.ResolutionXY = new GhostscriptImageDeviceResolution(dpiX, dpiY);
dev.InputFiles.Add(srcFile);
dev.Pdf.FirstPage = pageNo;
dev.Pdf.LastPage = pageNo;
dev.CustomSwitches.Add("-dDOINTERPOLATE");
dev.OutputPath = tgtFile;
dev.Process();
}
Hope that would help...

Related

Reduce the DPI of pdf stream in c#

I have a PDF document with an image. And tried to preview it using PdfDocumentProcessor from DevExpress. But due to memory constraints want to reduce the DPI value of this pdf stream. So is there any possibility to do that?
public static void Main(string[] args)
{
var processor = new PdfDocumentProcessor();
var newDocument = new MemoryStream();
for (int i = 10; i <= 19; i++)
{
using (var stream = File.OpenRead($"D:\\Office\\photos\\BigPDF\\{i}.pdf"))
{
processor.AppendDocument(stream);
processor.SaveDocument(newDocument);
}
}
}

You can only control the DPI when printing, not when saving a local file.
You can however tweak the images within the PDF to convert them to JPEG and change the file quality of those.
See:
PdfGraphics.ConvertImagesToJpeg property
JpegImageQuality property

Moving from itext5 to itext7: How to replace a certain image inside a pdf?

I am just evaluating iText7 for a small tool we need to write about adding/replacing image-watermarks inside pdf files.
I could get the adding part to work with the samples from the official website but am struggeling about removing/replacing my added images.
There is an similar post for itext5 here: https://stackoverflow.com/a/4308500
But I fail to adapt this to itext7 .. the API seems to have changed a lot and the official documentation is not very helpful either (or I just not find the right place to look at)
Basically I need to find my own added image from the pdf back which means to identify it either by an ID that I could give it during adding OR by identify it through its height/width (this would be unique enough for us).
Any help would be very welcome!
update
Code for adding the watermark (interesting might be how to mark the image to find it back easier later, but in general this works already fine):
var reader = new PdfReader($"c:\\testconv\\test.pdf");
PdfDocument pdfDoc = new PdfDocument(reader, new PdfWriter($"c:\\testconv\\test+watermark.pdf"));
Rectangle pageSize;
PdfCanvas canvas;
int n = pdfDoc.GetNumberOfPages();
iText.IO.Image.ImageData myImageData = ImageDataFactory.Create($"c:\\testconv\\label#0,25x.jpg");
var height = 0.45f/2.54f*72f;
for (int i = 1; i <= n; i++)
{
var page = pdfDoc.GetPage(i);
pageSize = page.GetPageSizeWithRotation();
var top = 1.2f/2.54f*72f - height;
var left = pageSize.GetWidth() - myImageData.GetWidth() / myImageData.GetHeight() * height - 1.2f / 2.54f * 72f;
canvas = new PdfCanvas(page);
canvas.AddImage(myImageData, left, top, height, false, false);
}
pdfDoc.Close();
for finding/removing the label again I dont have any working code really yet:
var page = pdfDoc.GetPage(i);
var res = page.GetResources();
var xobj = res.GetResource(PdfName.XObject);
foreach (var xobject in xobj.KeySet())
{
PdfObject obj = xobj.Get(xobject);
if (obj.IsIndirect())
{
var stream = xobj.GetAsStream(xobject);
var subtype = stream.GetAsName(PdfName.Subtype);
if (PdfName.Image.Equals(subtype)) //direct image
{
}
else if (PdfName.Form.Equals(subtype)) // check for nested image?
{
}
}
}
I can't figure out how to actually find the images and then compare their properties to find out if its the right or wrong one to delete (for example by a name or ID, or properties like width/height.

ITextSharp/Pdftk: place Base64 Image from Web on PDF as Pseude-Signature

I am trying to conceptualize a way to get base64 image onto an already rendered PDF in iText. The goal is to have the PDF save to disk then reopen to apply the "signature" in the right spot.
I haven't had any success with finding other examples online so I'm asking Stack.
My app uses .net c#.
Any advice on how to get started?

As #mkl mentioned the question is a confusing, especially the title - usually base64 and signature do not go together. Guessing you want to place a base64 image from web on the PDF as a pseudo signature?!?!
A quick working example to get you started:
static void Main(string[] args)
{
string currentDir = AppDomain.CurrentDomain.BaseDirectory;
// 'INPUT' => already rendered pdf in iText
PdfReader reader = new PdfReader(INPUT);
string outputFile = Path.Combine(currentDir, OUTPUT);
using (var stream = new FileStream(outputFile, FileMode.Create))
{
using (PdfStamper stamper = new PdfStamper(reader, stream))
{
AcroFields form = stamper.AcroFields;
var fldPosition = form.GetFieldPositions("lname")[0];
Rectangle rectangle = fldPosition.position;
string base64Image = "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==";
Regex regex = new Regex(#"^data:image/(?<mediaType>[^;]+);base64,(?<data>.*)");
Match match = regex.Match(base64Image);
Image image = Image.GetInstance(
Convert.FromBase64String(match.Groups["data"].Value)
);
// best fit if image bigger than form field
if (image.Height > rectangle.Height || image.Width > rectangle.Width)
{
image.ScaleAbsolute(rectangle);
}
// form field top left - change parameters as needed to set different position
image.SetAbsolutePosition(rectangle.Left + 2, rectangle.Top - 2);
stamper.GetOverContent(fldPosition.page).AddImage(image);
}
}
}
If you're not working with a PDF form template, (AcroFields in code snippet) explicitly set the absolute position and scale the image as needed.

How to convert PDF files to images

I need to convert PDF files to images. If the PDF file is multi-page，I just need one image that contains all of the PDF pages.
Is there an open source solution which is not charged like the Acrobat product?

The thread "converting PDF file to a JPEG image" is suitable for your request.
One solution is to use a third-party library. ImageMagick is a very popular and is freely available too. You can get a .NET wrapper for it here. The original ImageMagick download page is here.
Convert PDF pages to image files using the Solid Framework Convert PDF pages to image files using the Solid Framework (dead link, the deleted document is available on Internet Archive).
Convert PDF to JPG Universal Document Converter
6 Ways to Convert a PDF to a JPG Image
And you also can take a look at the thread
"How to open a page from a pdf file in pictureBox in C#".
If you use this process to convert a PDF to tiff, you can use this class to retrieve the bitmap from TIFF.
public class TiffImage
{
private string myPath;
private Guid myGuid;
private FrameDimension myDimension;
public ArrayList myImages = new ArrayList();
private int myPageCount;
private Bitmap myBMP;
public TiffImage(string path)
{
MemoryStream ms;
Image myImage;
myPath = path;
FileStream fs = new FileStream(myPath, FileMode.Open);
myImage = Image.FromStream(fs);
myGuid = myImage.FrameDimensionsList[0];
myDimension = new FrameDimension(myGuid);
myPageCount = myImage.GetFrameCount(myDimension);
for (int i = 0; i < myPageCount; i++)
{
ms = new MemoryStream();
myImage.SelectActiveFrame(myDimension, i);
myImage.Save(ms, ImageFormat.Bmp);
myBMP = new Bitmap(ms);
myImages.Add(myBMP);
ms.Close();
}
fs.Close();
}
}
Use it like so:
private void button1_Click(object sender, EventArgs e)
{
TiffImage myTiff = new TiffImage("D:\\Some.tif");
//imageBox is a PictureBox control, and the [] operators pass back
//the Bitmap stored at that position in the myImages ArrayList in the TiffImage
this.pictureBox1.Image = (Bitmap)myTiff.myImages[0];
this.pictureBox2.Image = (Bitmap)myTiff.myImages[1];
this.pictureBox3.Image = (Bitmap)myTiff.myImages[2];
}

You can use Ghostscript to convert PDF to images.
To use Ghostscript from .NET you can take a look at Ghostscript.NET library (managed wrapper around the Ghostscript library).
To produce image from the PDF by using Ghostscript.NET, take a look at RasterizerSample.
To combine multiple images into the single image, check out this sample: http://www.niteshluharuka.com/2012/08/combine-several-images-to-form-a-single-image-using-c/#

As for 2018 there is still not a simple answer to the question of how to convert a PDF document to an image in C#; many libraries use Ghostscript licensed under AGPL and in most cases an expensive commercial license is required for production use.
A good alternative might be using the popular 'pdftoppm' utility which has a GPL license; it can be used from C# as command line tool executed with System.Diagnostics.Process. Popular tools are well known in the Linux world, but a windows build is also available.
If you don't want to integrate pdftoppm by yourself, you can use my PdfRenderer popular wrapper (supports both classic .NET Framework and .NET Core) - it is not free, but pricing is very affordable.

I used PDFiumSharp and ImageSharp in a .NET Standard 2.1 class library.
/// <summary>
/// Saves a thumbnail (jpg) to the same folder as the PDF file, using dimensions 300x423,
/// which corresponds to the aspect ratio of 'A' paper sizes like A4 (ratio h/w=sqrt(2))
/// </summary>
/// <param name="pdfPath">Source path of the pdf file.</param>
/// <param name="thumbnailPath">Target path of the thumbnail file.</param>
/// <param name="width"></param>
/// <param name="height"></param>
public static void SaveThumbnail(string pdfPath, string thumbnailPath = "", int width = 300, int height = 423)
{
using var pdfDocument = new PdfDocument(pdfPath);
var firstPage = pdfDocument.Pages[0];
using var pageBitmap = new PDFiumBitmap(width, height, true);
firstPage.Render(pageBitmap);
var imageJpgPath = string.IsNullOrWhiteSpace(thumbnailPath)
? Path.ChangeExtension(pdfPath, "jpg")
: thumbnailPath;
var image = Image.Load(pageBitmap.AsBmpStream());
// Set the background to white, otherwise it's black. https://github.com/SixLabors/ImageSharp/issues/355#issuecomment-333133991
image.Mutate(x => x.BackgroundColor(Rgba32.White));
image.Save(imageJpgPath, new JpegEncoder());
}

Searching for a powerful and free solution in dotnet core that works on Windows and Linux got me to https://github.com/Dtronix/PDFiumCore and https://github.com/GowenGit/docnet. As PDFiumCore use a much newer version of Pdfium (that seems to be a critical point for using a pdf library) I ended up using it.
Note: If you want to use it on Linux you should install 'libgdiplus' as https://stackoverflow.com/a/59252639/6339469 suggests.
Here's a simple single thread code:
var pageIndex = 0;
var scale = 2;
fpdfview.FPDF_InitLibrary();
var document = fpdfview.FPDF_LoadDocument("test.pdf", null);
var page = fpdfview.FPDF_LoadPage(document, pageIndex);
var size = new FS_SIZEF_();
fpdfview.FPDF_GetPageSizeByIndexF(document, 0, size);
var width = (int)Math.Round(size.Width * scale);
var height = (int)Math.Round(size.Height * scale);
var bitmap = fpdfview.FPDFBitmapCreateEx(
width,
height,
4, // BGRA
IntPtr.Zero,
0);
fpdfview.FPDFBitmapFillRect(bitmap, 0, 0, width, height, (uint)Color.White.ToArgb());
// | | a b 0 |
// | matrix = | c d 0 |
// | | e f 1 |
using var matrix = new FS_MATRIX_();
using var clipping = new FS_RECTF_();
matrix.A = scale;
matrix.B = 0;
matrix.C = 0;
matrix.D = scale;
matrix.E = 0;
matrix.F = 0;
clipping.Left = 0;
clipping.Right = width;
clipping.Bottom = 0;
clipping.Top = height;
fpdfview.FPDF_RenderPageBitmapWithMatrix(bitmap, page, matrix, clipping, (int)RenderFlags.RenderAnnotations);
var bitmapImage = new Bitmap(
width,
height,
fpdfview.FPDFBitmapGetStride(bitmap),
PixelFormat.Format32bppArgb,
fpdfview.FPDFBitmapGetBuffer(bitmap));
bitmapImage.Save("test.jpg", ImageFormat.Jpeg);
For a thread safe implementation see this:
https://github.com/hmdhasani/DtronixPdf/blob/master/src/DtronixPdfBenchmark/Program.cs

The PDF engine used in Google Chrome, called PDFium, is open source under the "BSD 3-clause" license. I believe this allows redistribution when used in a commercial product.
There is a .NET wrapper for it called PdfiumViewer (NuGet) which works well to the extent I have tried it. It is under the Apache license which also allows redistribution.
(Note that this is NOT the same 'wrapper' as https://pdfium.patagames.com/ which requires a commercial license)
(There is one other PDFium .NET wrapper, PDFiumSharp, but I have not evaluated it.)
In my opinion, so far, this may be the best choice of open-source (free as in beer) PDF libraries to do the job which do not put restrictions on the closed-source / commercial nature of the software utilizing them. I don't think anything else in the answers here satisfy that criteria, to the best of my knowledge.

Regarding PDFiumSharp: After elaboration I was able to create PNG files from a PDF solution.
This is my code:
using PDFiumSharp;
using System.Collections.Generic;
using System.Drawing;
using System.IO;
public class Program
{
static public void Main(String[] args)
{
var renderfoo = new Renderfoo()
renderfoo.RenderPDFAsImages(#"C:\Temp\example.pdf", #"C:\temp");
}
}
public class Renderfoo
{
public void RenderPDFAsImages(string Inputfile, string OutputFolder)
{
string fileName = Path.GetFileNameWithoutExtension(Inputfile);
using (PDFiumSharp.PdfDocument doc = new PDFiumSharp.PdfDocument(Inputfile))
{
for (int i = 0; i < doc.Pages.Count; i++)
{
var page = doc.Pages[i];
using (var bitmap = new System.Drawing.Bitmap((int)page.Width, (int)page.Height))
{
var grahpics = Graphics.FromImage(bitmap);
grahpics.Clear(Color.White);
page.Render(bitmap);
var targetFile = Path.Combine(OutputFolder, fileName + "_" + i + ".png");
bitmap.Save(targetFile);
}
}
}
}
}
For starters, you need to take the following steps to get the PDFium wrapper up and running:
Run the Custom Code tool for both tt files via right click in Visual Studio
Compile the GDIPlus Project
Copy the compiled assemblies (from the GDIPlus project) to your project
Reference both PDFiumSharp and PDFiumsharp.GdiPlus assemblies in your project
Make sure that pdfium_x64.dll and/or pdfium_x86.dll are both found in your project output directory.

You may check Freeware.Pdf2Png MIT license.
Just find in nuget those name.
var dd = System.IO.File.ReadAllBytes("pdffile.pdf");
byte[] pngByte = Freeware.Pdf2Png.Convert(dd, 1);
System.IO.File.WriteAllBytes(Path.Combine(#"C:\temp", "dd.png"), pngByte );

The NuGet package Pdf2Png is available for free and is only protected by the MIT License, which is very open.
I've tested around a bit and this is the code to get it to convert a PDF file to an image (tt does save the image in the debug folder).
using cs_pdf_to_image;
using PdfToImage;
private void BtnConvert_Click(object sender, EventArgs e)
{
if(openFileDialog1.ShowDialog() == DialogResult.OK)
{
try
{
string PdfFile = openFileDialog1.FileName;
string PngFile = "Convert.png";
List<string> Conversion = cs_pdf_to_image.Pdf2Image.Convert(PdfFile, PngFile);
Bitmap Output = new Bitmap(PngFile);
PbConversion.Image = Output;
}
catch(Exception E)
{
MessageBox.Show(E.Message);
}
}
}

Apache PDFBox also works great for me.
Usage with the command line tool:
javar -jar pdfbox-app-2.0.19.jar PDFToImage -quality 1.0 -dpi 150 -prefix out_dir/page -format png

There is a free nuget package (Pdf2Image), which allows the extraction of pdf pages to jpg files or to a collection of images (List ) in just one line
string file = "c:\\tmp\\test.pdf";
List<System.Drawing.Image> images = PdfSplitter.GetImages(file, PdfSplitter.Scale.High);
PdfSplitter.WriteImages(file, "c:\\tmp", PdfSplitter.Scale.High, PdfSplitter.CompressionLevel.Medium);
All source is also available on github Pdf2Image

Using Android default libraries like AppCompat, you can convert all the PDF pages into images. This way is very fast and optimized. The below code is for getting separate images of a PDF page. It is very fast and quick.
ParcelFileDescriptor fileDescriptor = ParcelFileDescriptor.open(new File("pdfFilePath.pdf"), MODE_READ_ONLY);
PdfRenderer renderer = new PdfRenderer(fileDescriptor);
final int pageCount = renderer.getPageCount();
for (int i = 0; i < pageCount; i++) {
PdfRenderer.Page page = renderer.openPage(i);
Bitmap bitmap = Bitmap.createBitmap(page.getWidth(), page.getHeight(),Bitmap.Config.ARGB_8888);
Canvas canvas = new Canvas(bitmap);
canvas.drawColor(Color.WHITE);
canvas.drawBitmap(bitmap, 0, 0, null);
page.render(bitmap, null, null, PdfRenderer.Page.RENDER_MODE_FOR_DISPLAY);
page.close();
if (bitmap == null)
return null;
if (bitmapIsBlankOrWhite(bitmap))
return null;
String root = Environment.getExternalStorageDirectory().toString();
File file = new File(root + filename + ".png");
if (file.exists()) file.delete();
try {
FileOutputStream out = new FileOutputStream(file);
bitmap.compress(Bitmap.CompressFormat.PNG, 100, out);
Log.v("Saved Image - ", file.getAbsolutePath());
out.flush();
out.close();
} catch (Exception e) {
e.printStackTrace();
}
}
=======================================================
private static boolean bitmapIsBlankOrWhite(Bitmap bitmap) {
if (bitmap == null)
return true;
int w = bitmap.getWidth();
int h = bitmap.getHeight();
for (int i = 0; i < w; i++) {
for (int j = 0; j < h; j++) {
int pixel = bitmap.getPixel(i, j);
if (pixel != Color.WHITE) {
return false;
}
}
}
return true;
}

I kind of bumped into this project at SourceForge. It seems to me it's still active.
PDF convert to JPEG at SourceForge
Developer's site
My two cents.

https://www.codeproject.com/articles/317700/convert-a-pdf-into-a-series-of-images-using-csharp
I found this GhostScript wrapper to be working like a charm for converting the PDFs to PNGs, page by page.
Usage:
string pdf_filename = #"C:\TEMP\test.pdf";
var pdf2Image = new Cyotek.GhostScript.PdfConversion.Pdf2Image(pdf_filename);
for (var page = 1; page < pdf2Image.PageCount; page++)
{
string png_filename = #"C:\TEMP\test" + page + ".png";
pdf2Image.ConvertPdfPageToImage(png_filename, page);
}
Being built on GhostScript, obviously for commercial application the licensing question remains.

(Disclaimer I worked on this component at Software Siglo XXI)
You could use Super Pdf2Image Converter to generate a TIFF multi-page file with all the rendered pages from the PDF in high resolution. It's available for both 32 and 64 bit and is very cheap and effective. I'd recommend you to try it.
Just one line of code...
GetImage(outputFileName, firstPage, lastPage, resolution, imageFormat)
Converts specifies pages to image and save them to outputFileName (tiff allows multi-page or creates several files)
You can take a look here: http://softwaresigloxxi.com/SuperPdf2ImageConverter.html

pdf to image issues

am trying to convert pdftoimage using the below link
http://threebit.net/mail-archive/itext-questions/msg00436.html
but i get this error how to get this code to work ?
"The type or namespace name 'PdfDecoder' could not be found"
am looking for open source .
this ghostscript dint work on server ,
http://www.codeproject.com/KB/webforms/aspnetpdfviewer.aspx
help me.

you Can try this.....
pdfDoc = (Acrobat.CAcroPDDoc)
Microsoft.VisualBasic.Interaction.CreateObject("Ac roExch.PDDoc", "");
int ret = pdfDoc.Open(inputFile);
if (ret == 0)
{
throw new FileNotFoundException();
}
// Get the number of pages (to be used later if you wanted to store that information)
int pageCount = pdfDoc.GetNumPages();
// Get the first page
pdfPage = (Acrobat.CAcroPDPage)pdfDoc.AcquirePage(0);
pdfPoint = (Acrobat.CAcroPoint)pdfPage.GetSize();
pdfRect = (Acrobat.CAcroRect)
Microsoft.VisualBasic.Interaction.CreateObject("Ac roExch.Rect", "");
pdfRect.Left = 0;
pdfRect.right = pdfPoint.x;
pdfRect.Top = 0;
pdfRect.bottom = pdfPoint.y;
// Render to clipboard, scaled by 100 percent (ie. original size)
// Even though we want a smaller image, better for us to scale in .NET
// than Acrobat as it would greek out small text
pdfPage.CopyToClipboard(pdfRect, 0, 0, 100);
IDataObject clipboardData = Clipboard.GetDataObject();
if (clipboardData.GetDataPresent(DataFormats.Bitmap))
{
Bitmap pdfBitmap =
(Bitmap)clipboardData.GetData(DataFormats.Bitmap);
}
pls take a look at this link For more info
you can try this one also
SautinSoft.PdfFocus f = new SautinSoft.PdfFocus();
f.ConvertPdfToImage(#"c:\sample.pdf", #"c:\pages\",
SautinSoft.PdfFocus.eImageFormat.Jpeg, 200);
pls go through this link for more info

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

GhostscriptRasterizer.PageCount always returns zero - c#

Related

Reduce the DPI of pdf stream in c#

Moving from itext5 to itext7: How to replace a certain image inside a pdf?

ITextSharp/Pdftk: place Base64 Image from Web on PDF as Pseude-Signature

How to convert PDF files to images

pdf to image issues

Categories

Resources