I am scanning with the NTwain library from NuGet.
I handle the DataTransferred event to save the resulting image.
What I get in the result is an ImageInfo and a null byte[] array of data.
Is anyone familiar with this library who can tell me whether I am doing something wrong?
void session_DataTransferred(object sender, NTwain.DataTransferredEventArgs e)
{
Image img = ImageFromBytes(e.MemoryData);
myDS.Close();
session.Close();
}
But e arrives with only ImageInfo populated.
Update
Argument screenshot if useful ...
For NTwain, you should have more than just ImageInfo for that event. Specifically, e should have ImageInfo, MemData, and NativeData as you are showing in the screenshot.
I haven't done a whole lot with it, but in a console utility I check whether e.NativeData != IntPtr.Zero and pull a bitmap from the DIB pointer (on Windows; on Linux it is a TIFF). For this I use another dependency, CommonWin32.dll. I believe this is similar to the examples included in NTwain's starting solution package (look under Tests for sample Console, WinForms, and WPF projects).
If I want to save out a different file type, I do the encoding at that point. You can save a System.Drawing.Image with a given encoding. Obviously, this could be a lot better (set the type and compression to make the transfer smaller), but it is a working example.
if (e.NativeData != IntPtr.Zero)
{
Bitmap img = null;
if (this._commands.CheckForDebug())
{
Console.WriteLine("Image data transferred.");
}
//Need to save out the data.
img = e.NativeData.GetDrawingBitmap();
if (img != null)
{
string fileName = "RandomFileName.";
string fileType = this._commands.GetFileType();
switch (fileType)
{
case "png":
fileName += "png";
ImageExtensions.SavePNG(img, fileName, 50L);
break;
case "jpeg":
fileName += "jpeg";
ImageExtensions.SaveJPEG(img, fileName, 50L);
break;
default:
fileName += "png";
ImageExtensions.SavePNG(img, fileName, 50L);
break;
}
}
}
public static void SaveJPEG(Image img, string filePath, long quality)
{
var encoderParameters = new EncoderParameters(1);
encoderParameters.Param[0] = new EncoderParameter(System.Drawing.Imaging.Encoder.Quality, quality);
img.Save(filePath, GetEncoder(ImageFormat.Jpeg), encoderParameters);
}
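SaveJPEG above calls a GetEncoder helper that is not shown in the answer. A minimal sketch of such a helper, assuming it only needs to look up the installed GDI+ encoder for a given ImageFormat (the class name ImageEncoderLookup is hypothetical), might look like this:

```csharp
using System.Drawing.Imaging;

public static class ImageEncoderLookup
{
    // Returns the installed GDI+ encoder for the given format,
    // or null if no installed encoder matches.
    public static ImageCodecInfo GetEncoder(ImageFormat format)
    {
        foreach (ImageCodecInfo codec in ImageCodecInfo.GetImageEncoders())
        {
            if (codec.FormatID == format.Guid)
            {
                return codec;
            }
        }
        return null;
    }
}
```

Image.Save rejects a null encoder, so callers may want to check the result before saving.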
Related
I am working on reading text from an image through OCR. It only supports TIFF format images.
So, I need to convert other formats to TIFF format. Can it be done? Please help by providing some references.
If you create an Image object in .NET, you can save it as a TIFF. It is one of the many ImageFormat choices at your disposal.
Example:
var png = Image.FromFile("some.png");
png.Save("a.tiff", ImageFormat.Tiff);
You'll need to include the System.Drawing assembly in your project. That assembly will give you a lot of image manipulation capabilities. Hope that helps.
Intro Note:
This answer covers the bounty question, which is: How do we convert multiple files into 1 tiff? For example, let's say have pdfs, jpegs, pngs, and I'd like to create 1 tiff out of them?
In this answer I use the .NET implementation of ImageMagick (https://imagemagick.org/index.php) for image manipulation and Ghostscript to read AI/EPS/PDF/PS files so they can be translated to image files. Both are credible, official sources.
After I answered this question I received some extra questions by email asking about other merging options, so I have extended my answer.
IMO there are 2 steps to your goal:
Install required tools for pdf conversion
Take all images, including PDF-formatted files, from the source and merge them together into one TIFF file.
1. Install tools that help with PDF-to-image conversion:
Step 1 is only required if you intend to convert AI/EPS/PDF/PS file formats. Otherwise, jump to step 2.
To convert a PDF to any image format, we need a library that can read PDF files and a tool that converts them to an image type. For this purpose, we need to install Ghostscript (GNU Affero General Public License).
After that, we need to install Magick.NET, the .NET library for ImageMagick, in Visual Studio via NuGet.
So far so good.
2. Code part
The second and last step is to read the files (png, jpg, bmp, pdf, etc.) from a folder and add each one to a MagickImageCollection. We then have several options for merging: AppendHorizontally, AppendVertically, Montage, or a multi-page TIFF. ImageMagick has tons of features, such as resizing and setting the resolution; this is just an example to demonstrate the merging features:
public static void MergeImage(string src, string dest, MergeType type = MergeType.MultiplePage)
{
var files = new DirectoryInfo(src).GetFiles();
using (var images = new MagickImageCollection())
{
foreach (var file in files)
{
var image = new MagickImage(file)
{
Format = MagickFormat.Tif,
Depth = 8,
};
images.Add(image);
}
switch (type)
{
case MergeType.Vertical:
using (var result = images.AppendVertically())
{
result.AdaptiveResize(new MagickGeometry(){Height = 600, Width = 800});
result.Write(dest);
}
break;
case MergeType.Horizontal:
using (var result = images.AppendHorizontally())
{
result.AdaptiveResize(new MagickGeometry(){Height = 600, Width = 800});
result.Write(dest);
}
break;
case MergeType.Montage:
var settings = new MontageSettings
{
BackgroundColor = new MagickColor("#FFF"),
Geometry = new MagickGeometry("1x1<")
};
using (var result = images.Montage(settings))
{
result.Write(dest);
}
break;
case MergeType.MultiplePage:
images.Write(dest);
break;
default:
throw new ArgumentOutOfRangeException(nameof(type), type, "Unsupported choice");
}
// No explicit Dispose() here: the enclosing using block disposes the collection.
}
}
public enum MergeType
{
MultiplePage,
Vertical,
Horizontal,
Montage
}
To run the code
public static void Main(string[] args)
{
var src = @"C:\temp\Images";
var dest1 = @"C:\temp\Output\MultiplePage.tiff";
var dest2 = @"C:\temp\Output\Vertical.tiff";
var dest3 = @"C:\temp\Output\Horizontal.tiff";
var dest4 = @"C:\temp\Output\Montage.tiff";
MergeImage(src, dest1);
MergeImage(src, dest2, MergeType.Vertical);
MergeImage(src, dest3, MergeType.Horizontal);
MergeImage(src, dest4, MergeType.Montage);
}
Here are the 4 input files in C:\temp\Images:
After running the code, we get 4 new files under C:\temp\Output that look like this:
4 page Multiple Page Tiff
4 image Vertical Merge
4 image Horizontal Merge
4 image Montage Merge
Final note:
It is possible to merge multiple images into a TIFF with System.Drawing and System.Drawing.Imaging, without using ImageMagick, but PDF requires a third-party conversion library or tool, which is why I use Ghostscript and ImageMagick for C#.
ImageMagick has many features, so you can change the resolution, the size of the output file, and so on. It is a well-recognized library.
Disclaimer: Part of this answer is taken from my personal web site https://itbackyard.com/how-to-convert-ai-eps-pdf-ps-to-image-file/ with the source code on GitHub.
To convert an image to TIFF format: the example below saves the image as a .tif file and writes the full path of the new file to a text box. This source code works.
private void btn_Convert(object sender, EventArgs e)
{
string newName = System.IO.Path.GetFileNameWithoutExtension(CurrentFile);
newName = newName + ".tif";
try
{
img.Save(newName, ImageFormat.Tiff);
}
catch (Exception ex)
{
// Show the failure reason to the user.
MessageBox.Show(ex.Message, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error);
}
textBox2.Text = System.IO.Path.GetFullPath(newName);
}
I tested this with jpg, bmp, png, and gif. Works for single and multipage creation of tiffs. Pass it a full pathname to the file. Hope it helps someone. (extracted from MSDN)
public static string[] ConvertJpegToTiff(string[] fileNames, bool isMultipage)
{
EncoderParameters encoderParams = new EncoderParameters(1);
ImageCodecInfo tiffCodecInfo = ImageCodecInfo.GetImageEncoders()
.First(ie => ie.MimeType == "image/tiff");
string[] tiffPaths = null;
if (isMultipage)
{
tiffPaths = new string[1];
System.Drawing.Image tiffImg = null;
try
{
for (int i = 0; i < fileNames.Length; i++)
{
if (i == 0)
{
tiffPaths[i] = String.Format("{0}\\{1}.tif",
Path.GetDirectoryName(fileNames[i]),
Path.GetFileNameWithoutExtension(fileNames[i]));
// Initialize the first frame of multipage tiff.
tiffImg = System.Drawing.Image.FromFile(fileNames[i]);
encoderParams.Param[0] = new EncoderParameter(
System.Drawing.Imaging.Encoder.SaveFlag, (long)EncoderValue.MultiFrame);
tiffImg.Save(tiffPaths[i], tiffCodecInfo, encoderParams);
}
else
{
// Add additional frames.
encoderParams.Param[0] = new EncoderParameter(
System.Drawing.Imaging.Encoder.SaveFlag, (long)EncoderValue.FrameDimensionPage);
using (System.Drawing.Image frame = System.Drawing.Image.FromFile(fileNames[i]))
{
tiffImg.SaveAdd(frame, encoderParams);
}
}
if (i == fileNames.Length - 1)
{
// When it is the last frame, flush the resources and closing.
encoderParams.Param[0] = new EncoderParameter(
System.Drawing.Imaging.Encoder.SaveFlag, (long)EncoderValue.Flush);
tiffImg.SaveAdd(encoderParams);
}
}
}
finally
{
if (tiffImg != null)
{
tiffImg.Dispose();
tiffImg = null;
}
}
}
else
{
tiffPaths = new string[fileNames.Length];
for (int i = 0; i < fileNames.Length; i++)
{
tiffPaths[i] = String.Format("{0}\\{1}.tif",
Path.GetDirectoryName(fileNames[i]),
Path.GetFileNameWithoutExtension(fileNames[i]));
// Save as individual tiff files.
using (System.Drawing.Image tiffImg = System.Drawing.Image.FromFile(fileNames[i]))
{
tiffImg.Save(tiffPaths[i], ImageFormat.Tiff);
}
}
}
return tiffPaths;
}
ImageMagick command line can do that easily. It is supplied on most Linux systems and is available for Mac or Windows also. See https://imagemagick.org/script/download.php
convert image.suffix -compress XXX image.tiff
or you can process a whole folder of files using
mogrify -format tiff -path path/to/output_directory *
ImageMagick supports combining multiple images into a multi-page TIFF. And the images can be of mixed types even including PDF.
convert image1.suffix1 image2.suffix2 ... -compress XXX imageN.suffixN output.tiff
You can choose from a number of compression formats or no compression.
See
https://imagemagick.org/script/command-line-processing.php
https://imagemagick.org/Usage/basics/
https://imagemagick.org/Usage/basics/#mogrify
https://imagemagick.org/script/command-line-options.php#compress
Or you can use Magick.Net for a C# interface. See https://github.com/dlemstra/Magick.NET
Main ImageMagick page is at https://imagemagick.org.
Supported formats are listed at https://imagemagick.org/script/formats.php
You can easily process your images to resize them, convert to grayscale, filter (sharpen), threshold, etc, all in the same command line.
See
https://imagemagick.org/Usage/
https://imagemagick.org/Usage/reference.html
This is how I convert images that are uploaded to a website. I changed it so it outputs TIFF files. The method takes and returns a byte array, so it can be used in a variety of ways, and you can easily modify it.
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;
using System.IO;
public byte[] ConvertImageToTiff(byte[] SourceImage)
{
//create a new byte array
byte[] bin = new byte[0];
//check if there is data
if (SourceImage == null || SourceImage.Length == 0)
{
return bin;
}
//convert the byte array to a bitmap; GDI+ needs the source stream open while the bitmap is in use
using (MemoryStream input = new MemoryStream(SourceImage))
using (Bitmap NewImage = new Bitmap(input))
using (Bitmap TempImage = new Bitmap(NewImage.Width, NewImage.Height))
{
//set some properties
using (Graphics g = Graphics.FromImage(TempImage))
{
g.CompositingMode = CompositingMode.SourceCopy;
g.CompositingQuality = CompositingQuality.HighQuality;
g.SmoothingMode = SmoothingMode.HighQuality;
g.InterpolationMode = InterpolationMode.HighQualityBicubic;
g.PixelOffsetMode = PixelOffsetMode.HighQuality;
g.DrawImage(NewImage, 0, 0, NewImage.Width, NewImage.Height);
}
//save the redrawn image to a stream
using (MemoryStream ms = new MemoryStream())
{
EncoderParameters encoderParameters = new EncoderParameters(1);
encoderParameters.Param[0] = new EncoderParameter(Encoder.Quality, 80L);
TempImage.Save(ms, GetEncoderInfo("image/tiff"), encoderParameters);
bin = ms.ToArray();
}
}
//cleanup is handled by the using blocks above
//return data
return bin;
}
//get the correct encoder info
public ImageCodecInfo GetEncoderInfo(string MimeType)
{
ImageCodecInfo[] encoders = ImageCodecInfo.GetImageEncoders();
for (int j = 0; j < encoders.Length; ++j)
{
if (encoders[j].MimeType.ToLower() == MimeType.ToLower())
return encoders[j];
}
return null;
}
To test
var oldImage = File.ReadAllBytes(Server.MapPath("OldImage.jpg"));
var newImage = ConvertImageToTiff(oldImage);
File.WriteAllBytes(Server.MapPath("NewImage.tiff"), newImage);
I have a problem with my code. I've tried string originalImage = null;,
but it is not really working, because it somehow isn't picking up the original file name.
Code:
private void textBox1_Click(object sender, EventArgs e)
{
FolderBrowserDialog fbd = new FolderBrowserDialog();
fbd.RootFolder = Environment.SpecialFolder.Desktop;
fbd.Description = "+++ Select path +++";
fbd.ShowNewFolderButton = false;
if (fbd.ShowDialog() == DialogResult.OK)
{
textBox1.Text = fbd.SelectedPath;
}
string[] originalImage = Directory.GetFiles(textBox1.Text, "*.JPG");
foreach (var filename in originalImage)
{
Bitmap bitmap = new Bitmap(filename);
//DefaultCompressionJpeg(bitmap);
VariousQuality(bitmap);
}
}
string originalImage = null;
public void VariousQuality(Image original)
{
ImageCodecInfo jpgEncoder = null;
ImageCodecInfo[] codecs = ImageCodecInfo.GetImageEncoders();
foreach (ImageCodecInfo codec in codecs)
{
if (codec.FormatID == ImageFormat.Jpeg.Guid)
{
jpgEncoder = codec;
break;
}
}
if (jpgEncoder != null)
{
Encoder encoder = Encoder.Quality;
EncoderParameters encoderParameters = new EncoderParameters(1);
for (long quality = 90; quality <= 90;)
{
EncoderParameter encoderParameter = new EncoderParameter(encoder, quality);
encoderParameters.Param[0] = encoderParameter;
string fileOut = Path.Combine(@"C:\Users\Kristen\Desktop\pilt2", originalImage + ".jpeg");
Debug.WriteLine(fileOut);
FileStream ms = new FileStream(fileOut, FileMode.Create, FileAccess.Write);
original.Save(ms, jpgEncoder, encoderParameters);
ms.Flush();
ms.Close();
}
}
}
Kind regards,
In your click event handler you have a local variable string[] originalImage, which you initialize with all the file names in some directory.
At class level you have a field string originalImage, which you initialize with null.
These two have nothing to do with each other; they are completely unrelated.
So in your compression method you use originalImage to construct a file name. The only entity with this name known in this method is the string field of the class, which has a null value.
You should add a second parameter to your compression method through which you pass the current file name, and remove the field from the class.
public void VariousQuality (Bitmap original, string filename) {
...
string fileOut = Path.Combine(@"C:\Users\Kristen\Desktop\pilt2", filename + ".jpeg");
}
Call the method as follows
foreach (var filename in originalImage) {
Bitmap bitmap = new Bitmap(filename);
//DefaultCompressionJpeg(bitmap);
string fn = Path.GetFileNameWithoutExtension(filename);
VariousQuality(bitmap, fn);
}
As someone suggested in the comments (which they mysteriously deleted because they were definitely on to something), it seems like you're trying to refer to the original argument:
public void VariousQuality(Image original)
Just rename either that arg to originalImage or rename the other in your code to original.
I extract the pictures found in a PDF document with iTextSharp using this snippet (thanks @Scott Stanford from this topic):
private static IList<System.Drawing.Image> GetImagesFromPdfDict(PdfDictionary dict, PdfReader doc)
{
List<Image> images = new List<Image>();
if (dict == null)
return images;
PdfDictionary res = (PdfDictionary)(PdfReader.GetPdfObject(dict.Get(PdfName.RESOURCES)));
if (res == null)
return images;
PdfDictionary xobj = (PdfDictionary)(PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT)));
if (xobj == null)
return images;
foreach (PdfName name in xobj.Keys)
{
PdfObject obj = xobj.Get(name);
if (obj.IsIndirect())
{
PdfDictionary tg = (PdfDictionary)(PdfReader.GetPdfObject(obj));
PdfName subtype = (PdfName)(PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE)));
if (PdfName.IMAGE.Equals(subtype))
{
int xrefIdx = ((PRIndirectReference)obj).Number;
PdfObject pdfObj = doc.GetPdfObject(xrefIdx);
PdfStream str = (PdfStream)(pdfObj);
iTextSharp.text.pdf.parser.PdfImageObject pdfImage =
new iTextSharp.text.pdf.parser.PdfImageObject((PRStream)str);
System.Drawing.Image img = pdfImage.GetDrawingImage();
images.Add(img);
}
else if (PdfName.FORM.Equals(subtype) || PdfName.GROUP.Equals(subtype))
{
images.AddRange(GetImagesFromPdfDict(tg, doc));
}
}
}
return images;
}
Then I save the extracted System.Drawing.Image into jpeg files like this :
image.Save(path, ImageFormat.Jpeg);
This works well for most pictures, but in some rare cases, the saved pictures look like this :
(I have added the black stroke after the generation of the image because these pictures concern real people).
The white color turns into pink, and the black colors turn into green shades.
I tried saving the System.Drawing.Image with several encodings (System.Drawing.Imaging.EncoderParameter, and also as PNG), but I did not manage to change the output. So I think this problem comes from the extraction of the image from the PDF and the creation of the System.Drawing.Image.
To test if the pictures are not corrupted, I tried with the online PDF extractor http://www.extractpdf.com/. This tool managed to extract these pictures without any problem.
Does anybody have an idea how to solve this issue?
I am having a bit of trouble saving an Image: it says "Bad parameter" on the line where I try to save the image.
I'm not sure if it's how I am creating the image or if it's just saving that's the problem.
public static void Fullscreen()
{
string fileName = Helper.RandomStr(10) + ".png";
try
{
var image = ScreenCapture.CaptureFullscreen();
image.Save(fileName, ImageFormat.Png);
System.Diagnostics.Process.Start(fileName);
}
catch (Exception ex)
{
MessageBox.Show("Unable to capture fullscreen because: " + ex.ToString() + "\r\n\r\nFile: " + fileName);
}
}
Edit:
Here is the method that gets the Bitmap
public static Bitmap CaptureFullscreen()
{
using (Bitmap bmp = new Bitmap(ScreenDimensions.Width, ScreenDimensions.Height))
{
using (Graphics g = Graphics.FromImage(bmp))
{
g.CopyFromScreen(Point.Empty, Point.Empty, bmp.Size);
}
return bmp;
}
}
Bad parameter is how GDI+ tells you that there was some problem.
It's a shame that the errors are not very descriptive.
First, try wrapping the image in a new Bitmap constructor call, like:
image = new Bitmap(image);
This forces the bitmap to be processed immediately.
It was even simpler: remove the using on the bitmap.
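The reason this works: the using block in CaptureFullscreen disposes the Bitmap before the caller ever receives it, so the later Save call operates on a dead object. A sketch of the corrected pattern follows; the width and height parameters stand in for the question's ScreenDimensions, and the class and method names are hypothetical.

```csharp
using System.Drawing;

public static class ScreenCaptureSketch
{
    // Returns a live bitmap. The caller owns it and is responsible for disposing it.
    public static Bitmap CaptureRegion(int width, int height)
    {
        var bmp = new Bitmap(width, height);
        try
        {
            // Only the Graphics object is scoped to this method.
            using (Graphics g = Graphics.FromImage(bmp))
            {
                g.CopyFromScreen(Point.Empty, Point.Empty, bmp.Size);
            }
            return bmp;
        }
        catch
        {
            // Dispose only when the capture fails and nothing is returned.
            bmp.Dispose();
            throw;
        }
    }
}
```

The caller can then wrap the returned bitmap in its own using block around the image.Save(fileName, ImageFormat.Png) call, so disposal happens after saving rather than before.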
Try using a known path and see if it starts working. If so, then you might want a new random string generator that makes valid paths, or a different way of naming the file.
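If the random name is the suspect, one option is to let the framework generate it. The sketch below assumes the question's Helper.RandomStr can simply be replaced; Path.GetRandomFileName, Path.ChangeExtension, and Path.GetTempPath are standard .NET calls, while the class and method names are hypothetical.

```csharp
using System.IO;

public static class CaptureFileNames
{
    // Builds a full path in the temp directory with a random, file-system-safe name.
    public static string NewPngPath()
    {
        // GetRandomFileName returns a random name with a random extension;
        // replace that extension with .png.
        string name = Path.ChangeExtension(Path.GetRandomFileName(), ".png");
        return Path.Combine(Path.GetTempPath(), name);
    }
}
```

Using a rooted path also removes any dependence on the process's current working directory.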
I am trying to extract all the images from a PDF using iTextSharp but can't seem to overcome this one hurdle.
The error occurs on the line System.Drawing.Image ImgPDF = System.Drawing.Image.FromStream(MS);, giving an error of "Parameter is not valid".
I think it works when the image is a bitmap but not for any other format.
I have the following code (sorry for the length):
private void Form1_Load(object sender, EventArgs e)
{
FileStream fs = File.OpenRead(@"reader.pdf");
byte[] data = new byte[fs.Length];
fs.Read(data, 0, (int)fs.Length);
List<System.Drawing.Image> ImgList = new List<System.Drawing.Image>();
iTextSharp.text.pdf.RandomAccessFileOrArray RAFObj = null;
iTextSharp.text.pdf.PdfReader PDFReaderObj = null;
iTextSharp.text.pdf.PdfObject PDFObj = null;
iTextSharp.text.pdf.PdfStream PDFStremObj = null;
try
{
RAFObj = new iTextSharp.text.pdf.RandomAccessFileOrArray(data);
PDFReaderObj = new iTextSharp.text.pdf.PdfReader(RAFObj, null);
for (int i = 0; i <= PDFReaderObj.XrefSize - 1; i++)
{
PDFObj = PDFReaderObj.GetPdfObject(i);
if ((PDFObj != null) && PDFObj.IsStream())
{
PDFStremObj = (iTextSharp.text.pdf.PdfStream)PDFObj;
iTextSharp.text.pdf.PdfObject subtype = PDFStremObj.Get(iTextSharp.text.pdf.PdfName.SUBTYPE);
if ((subtype != null) && subtype.ToString() == iTextSharp.text.pdf.PdfName.IMAGE.ToString())
{
byte[] bytes = iTextSharp.text.pdf.PdfReader.GetStreamBytesRaw((iTextSharp.text.pdf.PRStream)PDFStremObj);
if ((bytes != null))
{
try
{
System.IO.MemoryStream MS = new System.IO.MemoryStream(bytes);
MS.Position = 0;
System.Drawing.Image ImgPDF = System.Drawing.Image.FromStream(MS);
ImgList.Add(ImgPDF);
}
catch (Exception)
{
}
}
}
}
}
PDFReaderObj.Close();
}
catch (Exception ex)
{
throw new Exception(ex.Message);
}
} //Form1_Load
Resolved...
I got the same "Parameter is not valid" exception too, and after a lot of work, with the help of the link provided by der_chirurg (http://kuujinbo.info/iTextSharp/CCITTFaxDecodeExtract.aspx), I resolved it. The code follows:
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using iTextSharp.text.pdf.parser;
using Dotnet = System.Drawing.Image;
using iTextSharp.text.pdf;
namespace PDF_Parsing
{
partial class PDF_ImgExtraction
{
string imgPath;
private void ExtractImage(string pdfFile)
{
// Open the PDF once and reuse the reader for every page.
PdfReader pdfReader = new PdfReader(pdfFile);
for (int pageNumber = 1; pageNumber <= pdfReader.NumberOfPages; pageNumber++)
{
PdfDictionary pg = pdfReader.GetPageN(pageNumber);
PdfDictionary res = (PdfDictionary)PdfReader.GetPdfObject(pg.Get(PdfName.RESOURCES));
PdfDictionary xobj = (PdfDictionary)PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT));
foreach (PdfName name in xobj.Keys)
{
PdfObject obj = xobj.Get(name);
if (obj.IsIndirect())
{
PdfDictionary tg = (PdfDictionary)PdfReader.GetPdfObject(obj);
string width = tg.Get(PdfName.WIDTH).ToString();
string height = tg.Get(PdfName.HEIGHT).ToString();
ImageRenderInfo imgRI = ImageRenderInfo.CreateForXObject(new Matrix(float.Parse(width), float.Parse(height)), (PRIndirectReference)obj, tg);
RenderImage(imgRI);
}
}
}
}
private void RenderImage(ImageRenderInfo renderInfo)
{
PdfImageObject image = renderInfo.GetImage();
using (Dotnet dotnetImg = image.GetDrawingImage())
{
if (dotnetImg != null)
{
using (MemoryStream ms = new MemoryStream())
{
dotnetImg.Save(ms, ImageFormat.Tiff);
Bitmap d = new Bitmap(dotnetImg);
d.Save(imgPath);
}
}
}
}
}
}
You need to check the stream's /Filter to see what image format a given image uses. It may be a standard image format:
DCTDecode (jpeg)
JPXDecode (jpeg 2000)
JBIG2Decode (jbig is a B&W only format)
CCITTFaxDecode (fax format, PDF supports group 3 and 4)
Other than that, you'll need to get the raw bytes (as you are), and build an image using the image stream's width, height, bits per component, number of color components (could be CMYK, indexed, RGB, or Something Weird), and a few others, as defined in section 8.9 of the ISO PDF SPECIFICATION (available for free).
So in some cases your code will work, but in others, it'll fail with the exception you mentioned.
PS: When you have an exception, PLEASE include the stack trace every single time. Pretty please with sugar on top?
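As a rough sketch of that check, the helper below maps a /Filter name to the image format listed above. It is deliberately independent of iTextSharp: it assumes you have already read the filter name as a string (for example by calling ToString() on the name stored under /Filter), and the class and method names are hypothetical. Streams with a filter array or with no filter at all are not handled here.

```csharp
public static class PdfImageFilters
{
    // Maps a PDF /Filter name to the image format it denotes.
    // Returns null for any other filter, in which case the image has to be
    // rebuilt from the raw samples (width, height, bits per component, color space).
    public static string Describe(string filterName)
    {
        if (string.IsNullOrEmpty(filterName))
        {
            return null;
        }
        // Accept the name with or without its leading slash.
        switch (filterName.TrimStart('/'))
        {
            case "DCTDecode": return "JPEG";
            case "JPXDecode": return "JPEG 2000";
            case "JBIG2Decode": return "JBIG2";
            case "CCITTFaxDecode": return "CCITT fax";
            default: return null;
        }
    }
}
```

Of these, only the JPEG case is a format that System.Drawing.Image.FromStream can decode directly from the raw bytes, which is consistent with the exception appearing for some images and not others.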
Works for me like this, using these two methods:
public static List<System.Drawing.Image> ExtractImagesFromPDF(byte[] bytes)
{
var imgs = new List<System.Drawing.Image>();
var pdf = new PdfReader(bytes);
try
{
for (int pageNumber = 1; pageNumber <= pdf.NumberOfPages; pageNumber++)
{
PdfDictionary pg = pdf.GetPageN(pageNumber);
List<PdfObject> objs = FindImageInPDFDictionary(pg);
foreach (var obj in objs)
{
if (obj != null)
{
int XrefIndex = Convert.ToInt32(((PRIndirectReference)obj).Number.ToString(System.Globalization.CultureInfo.InvariantCulture));
PdfObject pdfObj = pdf.GetPdfObject(XrefIndex);
PdfStream pdfStrem = (PdfStream)pdfObj;
var pdfImage = new PdfImageObject((PRStream)pdfStrem);
var img = pdfImage.GetDrawingImage();
imgs.Add(img);
}
}
}
}
finally
{
pdf.Close();
}
return imgs;
}
private static List<PdfObject> FindImageInPDFDictionary(PdfDictionary pg)
{
var res = (PdfDictionary)PdfReader.GetPdfObject(pg.Get(PdfName.RESOURCES));
var xobj = (PdfDictionary)PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT));
var pdfObgs = new List<PdfObject>();
if (xobj != null)
{
foreach (PdfName name in xobj.Keys)
{
PdfObject obj = xobj.Get(name);
if (obj.IsIndirect())
{
var tg = (PdfDictionary)PdfReader.GetPdfObject(obj);
var type = (PdfName)PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE));
if (PdfName.IMAGE.Equals(type)) // image at the root of the pdf
{
pdfObgs.Add(obj);
}
else if (PdfName.FORM.Equals(type)) // image inside a form
{
FindImageInPDFDictionary(tg).ForEach(o => pdfObgs.Add(o));
}
else if (PdfName.GROUP.Equals(type)) // image inside a group
{
FindImageInPDFDictionary(tg).ForEach(o => pdfObgs.Add(o));
}
}
}
}
return pdfObgs;
}
In newer versions of iTextSharp, the first parameter of ImageRenderInfo.CreateForXObject is no longer a Matrix but a GraphicsState. @der_chirurg's approach should work. I tested it myself with the information from the following link and it worked beautifully:
http://www.thevalvepage.com/swmonkey/2014/11/26/extract-images-from-pdf-files-using-itextsharp/
To extract all images on all pages, it is not necessary to implement different filters. iTextSharp has an image renderer, which saves all images in their original image type.
Just do what is described here: http://kuujinbo.info/iTextSharp/CCITTFaxDecodeExtract.aspx You don't need to implement the HttpHandler...
I added a library on GitHub that extracts the images in a PDF and compresses them.
It could be useful when you start playing with the very powerful iTextSharp library.
Here is the link: https://github.com/rock-walker/PdfCompression
This works for me and I think it's a simple solution:
Write a custom RenderListener and implement its RenderImage method, something like this
public void RenderImage(ImageRenderInfo info)
{
PdfImageObject image = info.GetImage();
Parser.Matrix matrix = info.GetImageCTM();
var fileType = image.GetFileType();
ImageFormat format;
switch (fileType)
{//you may add more types here
case "jpg":
case "jpeg":
format = ImageFormat.Jpeg;
break;
case "png":
format = ImageFormat.Png;
break;
case "bmp":
format = ImageFormat.Bmp;
break;
case "tiff":
format = ImageFormat.Tiff;
break;
case "gif":
format = ImageFormat.Gif;
break;
default:
format = ImageFormat.Jpeg;
break;
}
var pic = image.GetDrawingImage();
var x = matrix[Parser.Matrix.I31];
var y = matrix[Parser.Matrix.I32];
var width = matrix[Parser.Matrix.I11];
var height = matrix[Parser.Matrix.I22];
if (x < <some value> && y < <some value>)
{
return;//ignore these images
}
pic.Save(<path and name>, format);
}
I have used this library in the past without any problems.
http://www.winnovative-software.com/PdfImgExtractor.aspx
private void btnExtractImages_Click(object sender, EventArgs e)
{
if (pdfFileTextBox.Text.Trim().Equals(String.Empty))
{
MessageBox.Show("Please choose a source PDF file", "Choose PDF file", MessageBoxButtons.OK);
return;
}
// the source pdf file
string pdfFileName = pdfFileTextBox.Text.Trim();
// start page number
int startPageNumber = int.Parse(textBoxStartPage.Text.Trim());
// end page number
// when it is 0 the extraction will continue up to the end of document
int endPageNumber = 0;
if (textBoxEndPage.Text.Trim() != String.Empty)
endPageNumber = int.Parse(textBoxEndPage.Text.Trim());
// create the PDF images extractor object
PdfImagesExtractor pdfImagesExtractor = new PdfImagesExtractor();
pdfImagesExtractor.LicenseKey = "31FAUEJHUEBQRl5AUENBXkFCXklJSUlQQA==";
// the demo output directory
string outputDirectory = Path.Combine(Application.StartupPath, @"DemoFiles\Output");
Cursor = Cursors.WaitCursor;
// set the handler to be called when an image was extracted
pdfImagesExtractor.ImageExtractedEvent += pdfImagesExtractor_ImageExtractedEvent;
try
{
// start images counting
imageIndex = 0;
// call the images extractor to raise the ImageExtractedEvent event when an image is extracted from a PDF page
// the pdfImagesExtractor_ImageExtractedEvent handler below will be executed for each extracted image
pdfImagesExtractor.ExtractImagesInEvent(pdfFileName, startPageNumber, endPageNumber);
// Alternatively you can use the ExtractImages() and ExtractImagesToFile() methods
// to extract the images from a PDF document in memory or to image files in a directory
// uncomment the line below to extract the images to an array of ExtractedImage objects
//ExtractedImage[] pdfPageImages = pdfImagesExtractor.ExtractImages(pdfFileName, startPageNumber, endPageNumber);
// uncomment the lines below to extract the images to image files in a directory
//string outputDirectory = System.IO.Path.Combine(Application.StartupPath, @"DemoFiles\Output");
//pdfImagesExtractor.ExtractImagesToFile(pdfFileName, startPageNumber, endPageNumber, outputDirectory, "pdfimage");
}
catch (Exception ex)
{
// The extraction failed
MessageBox.Show(String.Format("An error occurred. {0}", ex.Message), "Error");
return;
}
finally
{
// uninstall the event handler
pdfImagesExtractor.ImageExtractedEvent -= pdfImagesExtractor_ImageExtractedEvent;
Cursor = Cursors.Arrow;
}
try
{
System.Diagnostics.Process.Start(outputDirectory);
}
catch (Exception ex)
{
MessageBox.Show(string.Format("Cannot open output folder. {0}", ex.Message));
return;
}
}
/// <summary>
/// The ImageExtractedEvent event handler called after an image was extracted from a PDF page.
/// The event is raised when the ExtractImagesInEvent() method is used
/// </summary>
/// <param name="args">The handler argument containing the extracted image and the PDF page number</param>
void pdfImagesExtractor_ImageExtractedEvent(ImageExtractedEventArgs args)
{
// get the image object and page number from the event handler argument
Image pdfPageImageObj = args.ExtractedImage.ImageObject;
int pageNumber = args.ExtractedImage.PageNumber;
// save the extracted image to a PNG file
string outputPageImage = Path.Combine(Application.StartupPath, @"DemoFiles\Output",
"pdfimage_" + pageNumber.ToString() + "_" + imageIndex++ + ".png");
pdfPageImageObj.Save(outputPageImage, ImageFormat.Png);
args.ExtractedImage.Dispose();
}