I'm looking for a way to get PDF page color information using iTextSharp. I need to know whether a page is black and white or color.
Any help would be great.
To the best of my knowledge, PDFs don't have a "page color" or a "background color". The white canvas you see when you open a PDF in Acrobat is an implementation detail, albeit one that virtually every viewer shares. (It can actually be changed by turning on some accessibility options in the preferences.)
Instead, any PDF that looks like it has a different background color probably has an image or a full color shape stretched across it. Using iTextSharp you could probably enumerate all of the images and shapes and look for any that are the same size or larger than the actual page, but I'm not sure how reliable that would be.
The only way I can think of that would actually work is to convert the PDF to an image and sample one or more of the corners where (hopefully) no one has any content. This link shows how to convert a PDF to JPG.
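The corner-sampling idea can be sketched without committing to a particular rasterizer. The sketch below assumes the page has already been rendered to a grid of (R, G, B) tuples by some other tool; the names `guess_background` and `is_grayscale` and the `tolerance` parameter are hypothetical and are not part of iTextSharp or any other library.

```python
from collections import Counter


def guess_background(pixels):
    """Return the most common colour among the four corner pixels.

    `pixels` is a list of rows, each row a list of (r, g, b) tuples.
    This layout is an assumption standing in for a rasterizer's output.
    """
    corners = [
        pixels[0][0], pixels[0][-1],
        pixels[-1][0], pixels[-1][-1],
    ]
    return Counter(corners).most_common(1)[0][0]


def is_grayscale(pixels, tolerance=0):
    """True if every pixel has (nearly) equal R, G and B channels."""
    return all(
        max(p) - min(p) <= tolerance
        for row in pixels
        for p in row
    )


# Tiny 3x3 "pages" used only for illustration.
white, black, red = (255, 255, 255), (0, 0, 0), (255, 0, 0)
bw_page = [[white, white, white], [white, black, white], [white, white, white]]
color_page = [[white, white, white], [white, red, white], [white, white, white]]
```

A small `tolerance` may be needed in practice, since JPEG compression can introduce slight color noise into areas that were originally pure gray.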
Related
I have the problem of converting PDFs that may contain images which use transparency. In those documents, after conversion, the images will show black in place of the transparent areas. The target of the conversion is not to have transparent areas, but rather the actual page color - usually white.
As I'm using the GhostscriptRasterizer to allow conversion directly into an Image object and subsequent in-memory encoding to either JPEG or PNG, I can't use the recommended workaround of using GhostscriptPngDevice, or at least I'd rather not use that method and write temporary PNGs just for some on-demand PDF conversion.
I have already played around in the Ghostscript.NET source, trying different ways to inject a BackgroundColor or to influence the value of MaxBitmap, to no avail: the default BackgroundColor is already white, and Ghostscript.NET already sets MaxBitmap to 1g by itself.
Right now I'm working around the problem by opening offending documents in Acrobat and applying the "Fix transparency" Preflight option to flatten any transparent objects inside the PDF, but I want a more permanent solution that doesn't require manual intervention.
If anyone has ideas, I'd be glad to hear them.
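For reference, the result being asked for (transparent areas showing the page color instead of black) corresponds to compositing each pixel over an opaque background. The following is a minimal sketch of that arithmetic only. It assumes the raster is available as straight-alpha (R, G, B, A) tuples, which the question does not confirm for GhostscriptRasterizer, and the function names are hypothetical.

```python
def flatten_over(pixel, background=(255, 255, 255)):
    """Composite one straight-alpha RGBA pixel over an opaque background."""
    r, g, b, a = pixel
    alpha = a / 255
    return tuple(
        round(channel * alpha + bg * (1 - alpha))
        for channel, bg in zip((r, g, b), background)
    )


def flatten_image(pixels, background=(255, 255, 255)):
    """Apply flatten_over to every pixel in a grid of RGBA tuples."""
    return [[flatten_over(p, background) for p in row] for row in pixels]
```

A fully transparent pixel becomes the background color, while a fully opaque pixel is left unchanged. Dropping the alpha channel without this step is what turns a transparent black pixel into visible black.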
If you search for "add image to pdf" on the Internet, you will find many useful articles. However, none of them meets my requirements.
I want to add an image at a specific place inside an existing PDF file, for instance inside a text box.
I am not certain exactly how you need the image added to your PDF, but there are a number of approaches you can consider:
1- Load the PDF as a rasterized image and draw the image at your desired location.
2- Add the image as an annotation to the PDF.
3- Convert the PDF to a format that allows easy modification of text and insertion of images.
Loading the PDF as a rasterized image is the most direct approach. However, your text will no longer be searchable and any other PDF objects (Annotations, Hyperlinks) will all become part of one image (no longer objects). But using this approach you can simply draw the image at the exact place you need. If you want to restore text searchability after doing this, you can use an OCR engine to process the text in the resulting image.
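Once the page is a raster, "drawing the image at the exact place you need" amounts to copying pixels at an offset. A minimal sketch, assuming both the page and the image are plain grids of pixel values; the `paste` helper is hypothetical and stands in for whatever drawing call your imaging library provides.

```python
def paste(page, image, left, top):
    """Return a copy of `page` with `image` drawn at (left, top).

    Both arguments are lists of rows of pixel values. Any part of the
    image that falls outside the page is clipped.
    """
    result = [row[:] for row in page]  # leave the original page untouched
    for dy, image_row in enumerate(image):
        y = top + dy
        if not 0 <= y < len(result):
            continue
        for dx, pixel in enumerate(image_row):
            x = left + dx
            if 0 <= x < len(result[y]):
                result[y][x] = pixel
    return result
```

Note that raster coordinates usually start at the top-left corner, whereas PDF page coordinates start at the bottom-left by default, so a position taken from the PDF needs its y value flipped first.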
The ImageMagick library uses the Ghostscript common engine for dealing with PDF, and it can convert PDF pages to images. There's a .NET wrapper for ImageMagick to use with C#. For OCR, there are free engines like MODI or Tesseract.
Adding the image as an annotation allows you to maintain the original format and text in the PDF, though the image will be treated as a separate object from the text and will not be “in-line”. Annotations can also be drawn at the exact location you need without too much difficulty.
LibreOffice Draw and Okular are options you can consider for drawing annotations.
Finally, you could simply convert the PDF to a format that is easier to process and edit, like DOC, add your image, and then convert it back to PDF.
I'm using PDFsharp to use one PDF as a watermark in another PDF. This is mostly working. The watermark PDF is placed "behind" the content of each page in the target PDF. However, the watermark content needs to be partially transparent (or screened) in order for the resulting PDF to be legible.
How do I go about using PDFsharp to globally adjust the transparency of a PDF?
You can check the documentation here for details on adding a watermark to a PDF using PDFsharp. From the link:
Note: Technically the watermarks in this sample are simple graphical output. They have nothing to do with the Watermark Annotations introduced in PDF 1.5.
Here is another link which claims to have 3 different methods of applying watermarks. Have you tried any of these? It looks like you may need to use MigraDoc as well as PDFsharp to achieve this.
You didn't specify what your watermark looks like. Does it need to support any custom PDF you can create, or is it just some text going across the page? The latter definitely looks possible using the links I posted.
If you want to create custom objects, maybe you can check this link (XForms), where it talks about drawing transparent custom shapes:
This sample shows how to create an XForm object from scratch. You can think of such an object as a template, that, once created, can be drawn frequently anywhere in your PDF document.
I think that instead of having 2 PDFs (1 main and 1 watermark), it is probably going to be easier to have 1 PDF and then create the watermark either with the built-in methods or by creating an XForm object and placing it on the PDF.
I want to generate a PDF file with tables, etc. in it, so I used an HTML-to-PDF converter (EVO PDF). That works great, except that the PDF has a white background color.
What I wanted to do is take a PDF document (our company paperwork) and put the HTML on top of it. But because the HTML has a white background, I can't get it to work.
I'm now using EVO PDF to render the HTML and Syncfusion to overlay the company paperwork.
There must be an easier way.
Convert the company stationery into a flat image, and then set that as the background of the page in CSS. As long as care is taken to measure and set the sizing correctly, that should work for you.
I'm using iTextSharp to generate PDFs, but I need to change some text dynamically.
I know this is possible when the PDF contains AcroFields, but my PDF doesn't have any. It just contains plain text, and I need to change some of it.
Does anyone know how to do it?
Actually, I have a blog post on how to do it! But like IanGilham said, it depends on whether you have control over the original PDF. The basic idea is that you set up a form on the page and replace the form fields with the text you want. (You can style the form so it doesn't look like a form.)
If you don't have control over the PDF, let me know how to do it!
Here is a link to the full post:
Using a template to programmatically create PDFs with C# and iTextSharp
I haven't used iTextSharp, but I have been using the PDFNet SDK to explore the content of a large pile of PDFs for localisation over the last few weeks.
I would say that what you require is absolutely achievable, but how difficult it is will depend entirely on how much control you have over the quality of the files. In my case, the files can be constructed from any combination of images, text in any random order, tables, forms, paths, single pixel graphics and scanned pages, some of which are composed from hundreds of smaller images. Let's just say we're having fun with it.
In the PDFTron way of doing things, you would have to implement a viewer (sample available), and add some code over a text selection. Given the complexities of the format, it may be necessary to implement a simple editor in a secondary dialog with the ability to expand the selection to the next line (or whatever other fundamental object is used to make up text). The string could then be edited and applied by copying the entire page of the document into a new page, replacing the selected elements with your new string. You would probably have to do some mathematics to get this to work well though, as just about everything in PDF is located on the page by means of an affine transform.
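The "mathematics" mentioned here is mostly 2D affine transforms. PDF writes a transform as six numbers `[a b c d e f]`, and a point (x, y) maps to (a·x + c·y + e, b·x + d·y + f). The sketch below is independent of PDFNet and iTextSharp; `apply_matrix` and `concat` are hypothetical helper names, not functions from either SDK.

```python
def apply_matrix(m, point):
    """Map a point through a PDF-style matrix (a, b, c, d, e, f)."""
    a, b, c, d, e, f = m
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)


def concat(m1, m2):
    """Return the single matrix equivalent to applying m1 first, then m2."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (
        a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2,
    )


scale = (2, 0, 0, 2, 0, 0)          # double both axes
translate = (1, 0, 0, 1, 10, 20)    # shift by (10, 20)
combined = concat(scale, translate) # scale first, then shift
```

To place replacement text where the old text was, you would typically need the combined matrix in effect at that point in the content stream, which is why copying elements without their transforms tends to put them in the wrong place.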
Good luck. I'm sure there are people on here with some experience of iTextSharp and PDF in general.
This question comes up from time to time on the mailing list. The same answer is given time and time again - NO. See this thread for the official answer from the person who created iText.
This question should be a FAQ on the itextsharp tag wiki.