Converting a PDF containing PNGs with Alpha in GhostScript.NET - c#

I have the problem of converting PDFs that may contain images which use transparency. In those documents, after conversion, the images will show black in place of the transparent areas. The target of the conversion is not to have transparent areas, but rather the actual page color - usually white.
As I'm using the GhostscriptRasterizer to allow conversion directly into an Image object and subsequent in-memory encoding to either JPEG or PNG, I can't use the recommended workaround of using GhostscriptPngDevice, or at least I'd rather not use that method and write temporary PNGs just for some on-demand PDF conversion.
I have already experimented with the Ghostscript.NET source, trying different ways to inject a BackgroundColor or to influence the value of MaxBitmap, to no avail. The default BackgroundColor is already white, and Ghostscript.NET already sets MaxBitmap to 1g by itself.
Right now I'm working around the problem by opening offending documents in Acrobat and applying the "Fix transparency" Preflight option to flatten any transparent objects inside the PDF, but I want a more permanent solution that doesn't require manual intervention.
If anyone has ideas, I'd be glad to hear them.
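For reference, this is the kind of in-memory flattening the question is after, written as a minimal sketch with plain System.Drawing. It assumes the Image that comes back still carries an alpha channel; if Ghostscript has already rendered the transparent areas as opaque black, compositing afterwards will not help. The class and method names (AlphaFlattener, Flatten) are made up for illustration and are not part of Ghostscript.NET.

```csharp
using System.Drawing;
using System.Drawing.Imaging;

public static class AlphaFlattener
{
    // Composites `source` over an opaque background colour and returns
    // a new 24-bit bitmap with no alpha channel.
    public static Bitmap Flatten(Image source, Color background)
    {
        var result = new Bitmap(source.Width, source.Height, PixelFormat.Format24bppRgb);
        // Match the source resolution so the image is drawn 1:1.
        result.SetResolution(source.HorizontalResolution, source.VerticalResolution);
        using (Graphics g = Graphics.FromImage(result))
        {
            g.Clear(background);                                     // paint the page colour first
            g.DrawImage(source, 0, 0, source.Width, source.Height);  // alpha-blend the source on top
        }
        return result;
    }
}
```

The flattened bitmap can then be encoded to JPEG or PNG into a MemoryStream without touching the disk.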

Related

add image to a specific place inside a PDF file

If you search for "add image to pdf" on the Internet, you will find many useful articles. However, none of them meet my requirements.
I want to add an image to a certain place inside an existing PDF file, for instance inside a textbox.
I am not certain exactly how you need the image added to your PDF, but there are a number of approaches you can consider:
1- Load the PDF as a rasterized image and draw the image at your desired location.
2- Add the image as an annotation to the PDF.
3- Convert the PDF to a format that allows easy modification of text and insertion of images.
Loading the PDF as a rasterized image is the most direct approach. However, your text will no longer be searchable and any other PDF objects (Annotations, Hyperlinks) will all become part of one image (no longer objects). But using this approach you can simply draw the image at the exact place you need. If you want to restore text searchability after doing this, you can use an OCR engine to process the text in the resulting image.
The ImageMagick library uses the Ghostscript common engine for dealing with PDF, and it can convert PDF pages to images. There's a .NET wrapper for ImageMagick to use with C#. For OCR, there are free engines like MODI or Tesseract.
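Once a page has been rasterized, drawing the image at an exact position is a few lines of System.Drawing. The sketch below assumes the page is already available as a Bitmap in a non-indexed pixel format (Graphics.FromImage rejects indexed formats) and that the target rectangle is given in page pixels. PageStamper and DrawOverlay are illustrative names only.

```csharp
using System.Drawing;

public static class PageStamper
{
    // Draws `overlay` onto `page`, scaled to fit `target` (pixel coordinates
    // measured from the top-left corner of the page image).
    public static void DrawOverlay(Bitmap page, Image overlay, Rectangle target)
    {
        using (Graphics g = Graphics.FromImage(page))
        {
            g.DrawImage(overlay, target);
        }
    }
}
```

For example, DrawOverlay(page, logo, new Rectangle(100, 150, 200, 80)) would place a logo 100 pixels from the left and 150 pixels from the top of the page image.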
Adding the image as an annotation allows you to maintain the original format and text in the PDF, though the image will be treated as a separate object from the text and will not be “in-line”. Annotations can also be drawn at the exact location you need without too much difficulty.
LibreOffice Draw and Okular are options you can consider for drawing annotations.
Finally, you could simply convert the PDF to a format that is easier to process and edit, like DOC, add your image, then convert it back to PDF.

remove Image meta data from Image in C# [duplicate]

I'm writing a service for a project that's going to handle our image processing. One such process is supposed to strip all metadata from the byte[] provided and return the same image as a byte[].
The method I'm currently working on involves always converting the image to a Bitmap, then converting it back to the original format and returning the data from a MemoryStream.
I haven't been able to test it yet but something tells me I'm going to experience some quality loss.
How can I remove all metadata from any image with a common format?
(bmp, gif, png, jpg, icon, tiff)
Not sure how I can narrow that down any further. Would be nice if I got some feedback regarding the downvotes.
For the lossless formats (everything on your list except JPEG), your idea of loading the image as a bitmap and re-saving it is fine. Not sure if .NET natively supports TIFFs (I doubt it does).
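The load-and-re-save idea can be sketched as follows. It relies on new Bitmap(Image) copying only the pixels into a fresh 32-bit bitmap, so the property items (EXIF and similar) of the original are left behind. Treat it as a sketch under assumptions: the pixel format changes to 32bpp ARGB, multi-frame files lose all but the first frame, and formats that lack a GDI+ encoder (icons are the likely case) would need separate handling. MetadataStripper and StripByReencoding are illustrative names.

```csharp
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

public static class MetadataStripper
{
    // Decodes the image, copies its pixels into a fresh Bitmap and
    // re-encodes it in the requested format.
    public static byte[] StripByReencoding(byte[] input, ImageFormat format)
    {
        using (var source = new MemoryStream(input))
        using (Image original = Image.FromStream(source))
        using (var clean = new Bitmap(original))   // pixel copy only, no property items
        using (var output = new MemoryStream())
        {
            clean.Save(output, format);
            return output.ToArray();
        }
    }
}
```

Calling StripByReencoding(bytes, ImageFormat.Png) on a PNG keeps the pixels intact; doing the same with ImageFormat.Jpeg re-compresses the image and therefore loses quality.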
For JPEGs, as you suggested there may be quality loss if you're re-compressing the file after decompressing it. For that, you might try the ExifLibrary and see if that has anything. If not, there are command line tools (like ImageMagick) that can strip metadata. (If you use ImageMagick, you're all set, since it supports all of your required formats. The command you want is convert -strip.)
For TIFFs, .NET has built-in TiffBitmapDecoder and TiffBitmapEncoder classes you might be able to use.
In short, using an external tool like ImageMagick is definitely the easiest solution. If you can't use an external tool, you're almost certainly going to need to special-case the formats that .NET doesn't support natively (and the lossy JPEG).
EDIT: I just read that ImageMagick doesn't do lossless stripping with JPEGs, sorry. I guess using the library I linked above, or some other JPEG library, is the best I can think of.

Load jpg preview from DNG into picturebox

I have written a program that has a feature for viewing images. It works fine for common file types; I simply use a picturebox. But I want to be able to load DNG (Digital Negative) images too.
I don't need to load the whole thing. Loading just the baked-in jpg preview would be fine; even if not all DNGs have it, I am willing to settle for this to avoid a huge hassle. Plus, since DNGs are so big and my program is meant to browse through them quickly, this might be best anyway.
I have tried using WindowsAPICodePack.Shell to get the Extra Large Bitmap or Icon from the file. I am already using this for my "explorer" thumbnails. But there is nothing larger than "Large", which I believe is around 256px. I know that the image has a jpg preview of around 1024px.
Is there an easy way to get at the jpg preview of a DNG file?

.NET component for color PDF to grayscale conversion

Currently I use Ghostscript to convert color PDFs to grayscale PDFs. Now I'm looking for a reliable .NET component/library, commercial or not, to replace Ghostscript. I googled and did not find any component/library that is able to do that easily, or at all.
EDIT #1:
Why Ghostscript does not work for me:
I implemented Ghostscript and I'm using its native APIs. The problem is that Ghostscript does not support multiple instances of the interpreter within a single process. -dJOBSERVER mode also does not work for me because I don't collect all the jobs and then process them all at once. It can happen that Ghostscript is processing a large job that takes around 20 minutes and meanwhile I get a smaller job that has to be processed ASAP and cannot wait 20 minutes. Another problem is that Ghostscript's page-processed events are not easy to catch. I wrote a parser for Ghostscript's stdout messages and I can read out the processed page number, but not for each page as it is processed, because Ghostscript pushes messages for groups of processed pages. There are a couple more problems with Ghostscript, like producing bad PDFs and duplicating fonts.
You can find one more problem I had with Ghostscript here: Ghostscript - PS to PDF - Inverted images problem
-
UPDATE, a year later:
I asked this question a year ago. Later I made my own solution using iTextSharp.
You can take a look at the converting PDF to grayscale solution here:
http://habjan.blogspot.com/2013/09/proof-of-concept-converting-pdf-files.html
or
https://itextsharpextended.codeplex.com/
Works for me in most cases :)
Not quite an answer, but I think you dismiss Ghostscript too quickly.
Are you aware of the GhostScript API (for in-process Ghostscript)? Or of the -dJOBSERVER mode that can take a series of PS commands piped to its standard in?
That still won't get you your callbacks however, and it's still not multi-threaded.
As previously stated, iText could do it, but it would be a matter of walking through all the content and images looking for non-grayscale color spaces and converting them in a space-specific manner.
You'd also have to replace the pixel data in any images you might find.
The good news is that iText[Sharp] is capable of operating in multiple threads, provided each document is used from one thread at a time.
I suspect this is also the case for the suggested commercial library, which isn't such a good deal.
And then a light went on above my head... drawn in gray scale.
Blending modes and transparency groups!
Take all the current page content and stick it in a transparency group that is blended with a solid black rectangle that covers the page. I think there's even a luminosity to alpha blend mode... let's see here.
Yep, PDF reference section 11.6.5.2 "Soft Mask Dictionaries". You'll want a "luminosity" group.
Now, the bad news. If your goal in switching to gray scale is to save space, this will fail utterly. It'll actually make each file a little larger... say about 100 bytes per page, give or take.
The software rendering the PDF better be pretty hot stuff too. Your cousin's undergrad rendering project need not apply. This is advanced graphics stuff here, infrequently used by Common PDF Files, so the last sort of thing to be implemented.
So... for each original page:
1- Create a new page.
2- Cover it with a black background.
3- Cover it with a white rectangle (had it backwards earlier) in a transparency group that uses a soft mask dictionary set to be the luminosity of the original page's content (now stashed in an XObject Form).
Because this is all your own code, you'll have ample opportunity to do whatever it is you want to do at the beginning or end of each page.
By golly, that's just crazy enough to work! It does require some PDF-Fu, but not nearly as much as the "convert each color space and image in various ways as I step through the document". Deeper knowledge, less code to write.
This isn't a .net library, but rather a potential work-around. You could install a virtual printer that is capable of writing PDF files. I would suggest CutePDF, as it's free, easy to use and does a great job 'printing' a large number of file formats to PDF. You can do nearly everything with CutePDF that you can do with a normal printer, including printing to grayscale.
After the virtual printer is installed, you can use C# to 'print' a grayscale version.
Edit: I just remembered that the free version is not silent. Once you print to the CutePDF printer, it will ask you to 'Save As'. They do have an SDK available for purchase, but I couldn't say whether it would be able to help you convert to grayscale.
If a commercial product is a valid option for you, allow me to recommend Amyuni PDF Creator .Net. With it you can enumerate all items inside the page and change their colors accordingly; images can also be set to grayscale. Usual disclaimers apply.
Sample code using Amyuni PDF Creator ActiveX, the .Net version would be similar:
pdfdoc.ReportState = ReportStateConstants.acReportStateDesign;
object[] page_items = (object[])pdfdoc.get_ObjectAttribute("Pages[1]", "Objects");
string[] color_attributes = new string[] { "TextColor", "BackColor", "BorderColor", "StrokeColor" };
foreach (acObject page_item in page_items)
{
    object _type = page_item["ObjectType"];
    if ((ACPDFCREACTIVEX.ObjectTypeConstants)_type == ACPDFCREACTIVEX.ObjectTypeConstants.acObjectTypePicture)
    {
        page_item["GrayScale"] = true;
    }
    else
    {
        foreach (string attr_name in color_attributes)
        {
            try
            {
                Color color = System.Drawing.ColorTranslator.FromWin32((int)page_item[attr_name]);
                int grayColor = (int)(0.3 * color.R + 0.59 * color.G + 0.11 * color.B);
                int newColorRef = System.Drawing.ColorTranslator.ToWin32(Color.FromArgb(grayColor, grayColor, grayColor));
                page_item[attr_name] = newColorRef;
            }
            catch { } //not all items have all kinds of color attributes
        }
    }
}
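The color-to-gray step in the sample is a weighted sum of the channels (30% red, 59% green, 11% blue). Pulled out on its own, with no Amyuni dependency, it could look like the sketch below. This version uses integer arithmetic with rounding, so it can differ by one level from the truncating floating-point cast in the sample, and it keeps the alpha value where the sample discards it. GrayMath and ToGray are illustrative names.

```csharp
using System.Drawing;

public static class GrayMath
{
    // Weighted luminance: 0.30 R + 0.59 G + 0.11 B, rounded to the nearest integer.
    public static Color ToGray(Color c)
    {
        int gray = (30 * c.R + 59 * c.G + 11 * c.B + 50) / 100;
        return Color.FromArgb(c.A, gray, gray, gray);
    }
}
```

With these weights, pure red (255, 0, 0) maps to gray level 77 and white stays at 255.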
iTextPdf is a good product for creating and managing PDFs; it has both commercial and free versions.
Have a look at Aspose.Pdf for .NET; it provides the features below and a lot more.
Add and remove watermarks from PDF document
Set page margin, size, orientation, transition type, zoom factor and appearance of PDF document
..
And here is a list of open source pdf libraries.
After a lot of investigation I found out about ABCpdf from Websupergoo. Their component can easily convert any PDF page to grayscale with a simple call to the Recolor method. The component is commercial.

Getting page color information with ITextSharp

I'm looking for a way to get the PDF page color information using ITextSharp. I need to know whether the page is black and white or color.
Any help would be great.
To the best of my knowledge, PDFs don't have a "page color" or a "background color". The fact that you see a white canvas when you open a PDF in Acrobat is actually an implementation detail, albeit one that every viewer follows. (Actually, this can be changed by turning on some accessibility options in preferences.)
Instead, any PDF that looks like it has a different background color probably has an image or a full color shape stretched across it. Using iTextSharp you could probably enumerate all of the images and shapes and look for any that are the same size or larger than the actual page, but I'm not sure how reliable that would be.
The only way I can think of that would actually work is to convert the PDF to an image and sample one or more of the corners where (hopefully) no one has any content. This link shows how to convert a PDF to JPG.
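Once the page is an image, both checks are short. The sketch below assumes the page has already been rendered to a Bitmap by some other tool. It samples the four corners for a background color and, for the black-and-white question, scans a grid of pixels for any whose channels differ by more than a tolerance. The tolerance and step defaults are guesses (JPEG compression adds slight color noise to gray areas, and a coarse step can miss small colored details). PageColorCheck and its methods are illustrative names.

```csharp
using System;
using System.Drawing;

public static class PageColorCheck
{
    // Returns the pixels at the four corners of the page image.
    public static Color[] SampleCorners(Bitmap page)
    {
        int right = page.Width - 1, bottom = page.Height - 1;
        return new[]
        {
            page.GetPixel(0, 0), page.GetPixel(right, 0),
            page.GetPixel(0, bottom), page.GetPixel(right, bottom)
        };
    }

    // True if any sampled pixel has channels that differ by more than `tolerance`,
    // i.e. the page is not purely gray or black and white.
    public static bool HasColor(Bitmap page, int tolerance = 8, int step = 4)
    {
        for (int y = 0; y < page.Height; y += step)
        {
            for (int x = 0; x < page.Width; x += step)
            {
                Color p = page.GetPixel(x, y);
                int max = Math.Max(p.R, Math.Max(p.G, p.B));
                int min = Math.Min(p.R, Math.Min(p.G, p.B));
                if (max - min > tolerance) return true;
            }
        }
        return false;
    }
}
```

GetPixel is slow on large pages; rendering at a low resolution before checking keeps this cheap.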
