I am trying to extract images from a PDF file using the PDFsharp library. As mentioned in the sample program, the library does not support extracting non-JPEG images, so I am trying to do it myself.
I found a non-working sample program for the same purpose. I am using the following code to extract a 400 x 400 PNG image embedded in a PDF file (the image was first inserted into an MS Word file, which was then saved as a PDF file).
PDF File Link:
https://drive.google.com/open?id=1aB-SrMB3eu00BywliOBC8AW0JqRa0Hbd
EXTRACTION CODE:
static void ExportAsPngImage(PdfDictionary image, ref int count)
{
int width = image.Elements.GetInteger(PdfSharp.Pdf.Advanced.PdfImage.Keys.Width);
int height = image.Elements.GetInteger(PdfSharp.Pdf.Advanced.PdfImage.Keys.Height);
System.Drawing.Imaging.PixelFormat pixelFormat = System.Drawing.Imaging.PixelFormat.Format8bppIndexed;
byte[] original_byte_boundary = image.Stream.UnfilteredValue;
byte[] result_byte_boundary = null;
//Image data in BMP files always starts at a DWORD boundary, in PDF it starts at a BYTE boundary.
//You must copy the image data line by line and start each line at the DWORD boundary.
byte[, ,] copy_dword_boundary = new byte[3, height, width];
for (int y = 0; y < height; y++)
{
for (int x = 0; x < width; x++)
{
if (x <= width && (x + (y * width) != original_byte_boundary.Length))
// while not at end of line, take original array
{
copy_dword_boundary[0, y, x] = original_byte_boundary[3*x + (y * width)];
copy_dword_boundary[1, y, x] = original_byte_boundary[3*x + (y * width) + 1];
copy_dword_boundary[2, y, x] = original_byte_boundary[3*x + (y * width) + 2];
}
else //fill new array with ending 0
{
copy_dword_boundary[0, y, x] = 0;
copy_dword_boundary[1, y, x] = 0;
copy_dword_boundary[2, y, x] = 0;
}
}
}
result_byte_boundary = new byte[3 * width * height];
int counter = 0;
int n_width = copy_dword_boundary.GetLength(2);
int n_height = copy_dword_boundary.GetLength(1);
for (int x = 0; x < width; x++)
{
for (int y = 0; y < height; y++)
{ //put 3dim array back in 1dim array
result_byte_boundary[counter] = copy_dword_boundary[0, x, y];
result_byte_boundary[counter + 1] = copy_dword_boundary[1, x, y];
result_byte_boundary[counter + 2] = copy_dword_boundary[2, x, y];
//counter++;
counter = counter + 3;
}
}
Bitmap bmp = new Bitmap(width, height, pixelFormat);
System.Drawing.Imaging.BitmapData bmd = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height), ImageLockMode.WriteOnly, bmp.PixelFormat);
System.Runtime.InteropServices.Marshal.Copy(result_byte_boundary, 0, bmd.Scan0, result_byte_boundary.Length);
bmp.UnlockBits(bmd);
using (FileStream fs = new FileStream(@"D:\TestPdf\" + String.Format("Image{0}.png", count), FileMode.Create, FileAccess.Write))
{
bmp.Save(fs, ImageFormat.Png);
count++;
}
}
PROBLEM:
Whatever PixelFormat format I choose, the saved PNG image does not look correct.
Original PNG IMAGE (Bit Depth-32):
Result of PixelFormat = Format24bppRgb
You can get the pixelformat from the PDF file. Since you did not include the PDF in your post, I cannot tell you which format would be correct.
PDF files do not contain PNG images, instead images use a special PDF image format which is somewhat similar to the BMP files used by Windows, but without any headers in the binary data. Instead the "header" information can be found with the properties of the Image object. See the PDF Reference for further details.
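To make the difference between the two layouts concrete, here is a minimal sketch of the row repacking. It assumes the image dictionary reports /DeviceRGB with 8 bits per component and that the stream has already been fully decoded; the class and method names (RowRepacker, AlignedStride, PackedRgbToPaddedBgr) are made up for illustration and are not part of PDFsharp. PDF stores each row tightly packed as R, G, B, while a Format24bppRgb bitmap expects B, G, R with every row padded to a multiple of 4 bytes.

```csharp
using System;

static class RowRepacker
{
    // Bytes per row when every row is padded to a 4-byte (DWORD) boundary.
    public static int AlignedStride(int width, int bitsPerPixel)
        => ((width * bitsPerPixel + 31) / 32) * 4;

    // Copies tightly packed 8-bit R,G,B rows (PDF layout, assumed DeviceRGB)
    // into padded B,G,R rows (the layout a Format24bppRgb bitmap uses).
    public static byte[] PackedRgbToPaddedBgr(byte[] rgb, int width, int height)
    {
        int srcStride = width * 3;                 // no padding in the PDF data
        int dstStride = AlignedStride(width, 24);  // padded rows in the bitmap
        if (rgb.Length < srcStride * height)
            throw new ArgumentException("Not enough sample data for the given size.", nameof(rgb));

        var bgr = new byte[dstStride * height];    // padding bytes stay 0
        for (int y = 0; y < height; y++)
        {
            int src = y * srcStride;
            int dst = y * dstStride;
            for (int x = 0; x < width; x++, src += 3, dst += 3)
            {
                bgr[dst] = rgb[src + 2];     // blue
                bgr[dst + 1] = rgb[src + 1]; // green
                bgr[dst + 2] = rgb[src];     // red
            }
        }
        return bgr;
    }
}
```

For a 400-pixel-wide RGB image the packed row is 1200 bytes, which is already a multiple of 4, so under these assumptions no padding is added and only the channel order changes.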
I wrote some code to show an array of bytes as an image. Every element of the array represents one pixel of an 8-bit grayscale image: 0 is black and 255 is white. My goal is to convert this w*w-pixel grayscale image into something accepted by pictureBox1.Image.
This is my code:
namespace ShowRawImage
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void button1_Click(object sender, EventArgs e)
{
int i = 0, j = 0, w = 256;
byte[] rawIm = new byte[256 * 256];
for(i = 0; i < w; ++i)
{
for (j = 0; j < w; ++j)
{
rawIm[i * w + j] = (byte)j; // BitConverter.GetBytes(j);
}
}
MemoryStream mStream = new MemoryStream();
mStream.Write(rawIm, 0, Convert.ToInt32(rawIm.Length));
Bitmap bm = new Bitmap(mStream, false);// the error occurs here
mStream.Dispose();
pictureBox1.Image = bm;
}
}
}
However I get this error:
Parameter is not valid.
Where is my mistake?
EDIT:
In next step I am going to display 16-bit grayscale images.
The Bitmap(Stream, bool) constructor expects a stream containing an actual image file format (e.g. PNG, GIF, etc.), including the header, the palette, and possibly compressed image data.
To create a Bitmap from raw data, you need to use the Bitmap(int width, int height, int stride, PixelFormat format, IntPtr scan0) constructor, but that is also quite inconvenient because you need pinned raw data that you can pass as scan0.
The best option is to create an 8bpp bitmap with a grayscale palette and set the pixels manually:
var bmp = new Bitmap(256, 256, PixelFormat.Format8bppIndexed);
// making it grayscale
var palette = bmp.Palette;
for (int i = 0; i < 256; i++) // all 256 palette entries, including 255 (white)
palette.Entries[i] = Color.FromArgb(i, i, i);
bmp.Palette = palette;
Now you can access its raw content as bytes where 0 is black and 255 is white:
var bitmapData = bmp.LockBits(new Rectangle(Point.Empty, bmp.Size), ImageLockMode.WriteOnly, PixelFormat.Format8bppIndexed);
for (int y = 0; y < bitmapData.Height; y++)
{
for (int x = 0; x < bitmapData.Width; x++)
{
unsafe
{
((byte*) bitmapData.Scan0)[y * bitmapData.Stride + x] = (byte)x;
}
}
}
bmp.UnlockBits(bitmapData);
The result image:
But if you don't want to use unsafe code, or you want to set pixels by colors, you can use this library (disclaimer: written by me) that supports efficient manipulation regardless of the actual PixelFormat. Using that library the last block can be rewritten like this:
using (IWritableBitmapData bitmapData = bmp.GetWritableBitmapData())
{
IWritableBitmapDataRow row = bitmapData.FirstRow;
do
{
for (int x = 0; x < bitmapData.Width; x++)
row[x] = Color32.FromGray((byte)x); // this works for any pixel format
// row.SetColorIndex(x, x); // for the grayscale 8bpp bitmap created above
} while (row.MoveNextRow());
}
Or like this, using Parallel.For (this works only because in your example all rows are the same so the image is a horizontal gradient):
using (IWritableBitmapData bitmapData = bmp.GetWritableBitmapData())
{
Parallel.For(0, bitmapData.Height, y =>
{
var row = bitmapData[y];
for (int x = 0; x < bitmapData.Width; x++)
row[x] = Color32.FromGray((byte)x); // this works for any pixel format
// row.SetColorIndex(x, x); // for the grayscale 8bpp bitmap created above
});
}
As said in the comments - bitmap is not just an array. So to reach your goal you can create bitmap of needed size and set pixels with Bitmap.SetPixel:
Bitmap bm = new Bitmap(w, w);
for(var i = 0; i < w; ++i)
{
for (var j = 0; j < w; ++j)
{
bm.SetPixel(j, i, Color.FromArgb(j, j, j)); // SetPixel takes (x, y): j is the column, i is the row
}
}
I have a System.Drawing.Bitmap with PixelFormat of Format32bppRgb.
I want this image to be converted to a bitmap of 8bit.
The following is the code to convert a 32-bit image to an 8-bit grayscale image:
public static Bitmap ToGrayscale(Bitmap bmp)
{
int rgb;
System.Drawing.Color c;
for (int y = 0; y < bmp.Height; y++)
for (int x = 0; x < bmp.Width; x++)
{
c = bmp.GetPixel(x, y);
rgb = (int)((c.R + c.G + c.B) / 3);
bmp.SetPixel(x, y, System.Drawing.Color.FromArgb(rgb, rgb, rgb));
}
return bmp;
}
However, the Bitmap I end up with still has the PixelFormat Property of Format32bppRgb.
Also,
How can I convert a 32-bit color image into an 8-bit color image?
Thanks for any input!
Related.
- Convert RGB image to RGB 16-bit and 8-bit
- C# - How to convert an Image into an 8-bit color Image?
- C# Convert Bitmap to indexed colour format
- Color Image Quantization in .NET
- quantization (Reduction of colors of image)
- The best way to reduce quantity of colors in bitmap palette
You must create (and return) new instance of Bitmap.
PixelFormat is specified in constructor of Bitmap and can not be changed.
Edit (30 March 2022): fixed byte array access expression from x * data.Stride + y to y * data.Stride + x and changed the palette to be grayscale.
EDIT:
Sample code based on this answer on MSDN:
public static Bitmap ToGrayscale(Bitmap bmp) {
var result = new Bitmap(bmp.Width, bmp.Height, PixelFormat.Format8bppIndexed);
var resultPalette = result.Palette;
for (int i = 0; i < 256; i++)
{
resultPalette.Entries[i] = Color.FromArgb(255, i, i, i);
}
result.Palette = resultPalette;
BitmapData data = result.LockBits(new Rectangle(0, 0, result.Width, result.Height), ImageLockMode.WriteOnly, PixelFormat.Format8bppIndexed);
// Copy the bytes from the image into a byte array
byte[] bytes = new byte[data.Height * data.Stride];
Marshal.Copy(data.Scan0, bytes, 0, bytes.Length);
for (int y = 0; y < bmp.Height; y++) {
for (int x = 0; x < bmp.Width; x++) {
var c = bmp.GetPixel(x, y);
var rgb = (byte)((c.R + c.G + c.B) / 3);
bytes[y * data.Stride + x] = rgb;
}
}
// Copy the bytes from the byte array into the image
Marshal.Copy(bytes, 0, data.Scan0, bytes.Length);
result.UnlockBits(data);
return result;
}
The C# software I'm involved with writing has a component that involves the reading of barcodes from scanned documents. The PDFs themselves are opened using PDFSharp.
Unfortunately we're encountering an issue with the process when it involves Flate Decoding of PDFs. Basically, all we get is a bunch of fuzz, which means there is no barcode to check and the document is not recognised.
Our code (which we shamelessly "borrowed" from another Stack Overflow case!) is as follows:
private FileInfo ExportAsPngImage(PdfDictionary image, string sourceFileName, ref int count)
{
//This code basically comes from http://forum.pdfsharp.net/viewtopic.php?f=2&t=2338#p6755
//and http://stackoverflow.com/questions/10024908/how-to-extract-flatedecoded-images-from-pdf-with-pdfsharp
string tempFile = string.Format("{0}_Image{1}.png", sourceFileName, count);
int width = image.Elements.GetInteger(PdfImage.Keys.Width);
int height = image.Elements.GetInteger(PdfImage.Keys.Height);
int bitsPerComponent = image.Elements.GetInteger(PdfImage.Keys.BitsPerComponent);
var pixelFormat = new PixelFormat();
switch (bitsPerComponent)
{
case 1:
pixelFormat = System.Drawing.Imaging.PixelFormat.Format1bppIndexed;
break;
case 8:
pixelFormat = System.Drawing.Imaging.PixelFormat.Format8bppIndexed;
break;
case 24:
pixelFormat = System.Drawing.Imaging.PixelFormat.Format24bppRgb;
break;
default:
throw new Exception("Unknown pixel format " + bitsPerComponent);
}
var fd = new FlateDecode();
byte[] decodedBytes = fd.Decode(image.Stream.Value);
byte[] resultBytes = null;
int newWidth = width;
int alignment = 4;
if (newWidth % alignment != 0)
//Image data in BMP files always starts at a DWORD boundary, in PDF it starts at a BYTE boundary.
//Most images have a width that is a multiple of 4, so there is no problem with them.
//You must copy the image data line by line and start each line at the DWORD boundary.
{
while (newWidth % alignment != 0)
{
newWidth++;
}
var copy_dword_boundary = new byte[height, newWidth];
for (int y = 0; y < height; y++)
{
for (int x = 0; x < newWidth; x++)
{
if (x <= width && (x + (y * width) < decodedBytes.Length))
// while not at end of line, take original array
copy_dword_boundary[y, x] = decodedBytes[x + (y * width)];
else //fill new array with ending 0
copy_dword_boundary[y, x] = 0;
}
}
resultBytes = new byte[newWidth * height];
int counter = 0;
for (int x = 0; x < copy_dword_boundary.GetLength(0); x++)
{
for (int y = 0; y < copy_dword_boundary.GetLength(1); y++)
{ //put 2dim array back in 1dim array
resultBytes[counter] = copy_dword_boundary[x, y];
counter++;
}
}
}
else
{
resultBytes = new byte[decodedBytes.Length];
decodedBytes.CopyTo(resultBytes, 0);
}
//Create a new bitmap and shove the bytes into it
var bitmap = new Bitmap(newWidth, height, pixelFormat);
BitmapData bitmapData = bitmap.LockBits(new Rectangle(0, 0, bitmap.Width, bitmap.Height), ImageLockMode.WriteOnly, bitmap.PixelFormat);
int length = (int)Math.Ceiling(width * bitsPerComponent / 8.0);
for (int i = 0; i < height; i++)
{
int offset = i * length;
int scanOffset = i * bitmapData.Stride;
Marshal.Copy(resultBytes, offset, new IntPtr(bitmapData.Scan0.ToInt32() + scanOffset), length);
}
bitmap.UnlockBits(bitmapData);
//Now save the bitmap to memory
using (var fs = new FileStream(String.Format(tempFile, count++), FileMode.Create, FileAccess.Write))
{
bitmap.Save(fs, ImageFormat.Png);
}
return new FileInfo(tempFile);
}
Unfortunately, all we get out of it is this http://i.stack.imgur.com/FwatQ.png
Any ideas on where we're going wrong, or suggestions for things we might try would be greatly appreciated.
Cheers
Thanks for the suggestions guys. One of the other developers managed to crack it - it was (as Jongware suggested) a JPEG, but it was actually zipped as well! Once unzipped it could be processed and recognised as normal.
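For anyone hitting the same symptom, here is a minimal sketch of how such a case could be detected: inflate the stream, then check whether the result starts with the JPEG start-of-image marker instead of raw pixel samples. The helper names (StreamSniffer, Inflate, LooksLikeJpeg) are made up for illustration and are not PDFsharp APIs, and ZLibStream requires .NET 6 or later.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class StreamSniffer
{
    // Inflates zlib-wrapped data, which is what a /FlateDecode stream contains.
    public static byte[] Inflate(byte[] compressed)
    {
        using var input = new MemoryStream(compressed);
        using var zlib = new ZLibStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        zlib.CopyTo(output);
        return output.ToArray();
    }

    // A JPEG starts with the SOI marker FF D8, followed by another marker byte FF.
    public static bool LooksLikeJpeg(byte[] data)
        => data.Length >= 3 && data[0] == 0xFF && data[1] == 0xD8 && data[2] == 0xFF;
}
```

If LooksLikeJpeg returns true for the inflated bytes, they can be written to disk as a .jpg file as they are, rather than being copied into a Bitmap as raw pixels.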
I need to extract RGB byte values of each pixel of a small GIF stored on a PC (16x16 pixels) as I need to send them to a LED display that accepts RGB 6 byte color code.
After opening the test file and converting it to a 1D byte array I get some byte values, but I am not sure if that decodes the GIF frame and as a result will return my desired pure 192 byte RGB array?
img = Image.FromFile("mygif.gif");
FrameDimension dimension = new FrameDimension(img.FrameDimensionsList[0]);
int frameCount = img.GetFrameCount(dimension);
img.SelectActiveFrame(dimension, 0);
gifarray = imageToByteArray(img);
//image to array conversion func.
public byte[] imageToByteArray(System.Drawing.Image imageIn)
{
MemoryStream ms = new MemoryStream();
imageIn.Save(ms, System.Drawing.Imaging.ImageFormat.Gif);
return ms.ToArray();
}
Or maybe there is another method for doing that?
Use this method to get a 2d array containing the pixels:
//using System.Drawing;
Color[,] getPixels(Image image)
{
Bitmap bmp = (Bitmap)image;
Color[,] pixels = new Color[bmp.Width, bmp.Height];
for (int x = 0; x < bmp.Width; x++)
for (int y = 0; y < bmp.Height; y++)
pixels[x, y] = bmp.GetPixel(x, y);
return pixels;
}
Using the data returned by this method, you can get each pixel's R, G, B, and A (each are a single byte) and do whatever you want with them.
If you want the end result to be a byte[] containing values like this: { R0, G0, B0, R1, G1, B1, ... }, and the pixels need to be written to the byte[] in row-major order, then you do this:
byte[] getImageBytes(Image image)
{
Bitmap bmp = (Bitmap)image;
byte[] bytes = new byte[(bmp.Width * bmp.Height) * 3]; // 3 for R+G+B
for (int x = 0; x < bmp.Width; x++)
{
for (int y = 0; y < bmp.Height; y++)
{
Color pixel = bmp.GetPixel(x, y);
int offset = (x + y * bmp.Width) * 3; // 3 bytes per pixel, row-major order
bytes[offset + 0] = pixel.R;
bytes[offset + 1] = pixel.G;
bytes[offset + 2] = pixel.B;
}
}
return bytes;
}
You can then send the result of getImageBytes to your LED (assuming that that's how you're supposed to send images to it).
Your way will not decode it to raw RGB byte data. It will most likely output the same data that you loaded in the beginning (GIF encoded).
You will need to extract the data pixel by pixel:
public byte[] imageToByteArray(Image imageIn)
{
Bitmap lbBMP = new Bitmap(imageIn);
List<byte> lbBytes = new List<byte>();
for(int liY = 0; liY < lbBMP.Height; liY++)
for(int liX = 0; liX < lbBMP.Width; liX++)
{
Color lcCol = lbBMP.GetPixel(liX, liY);
lbBytes.AddRange(new[] { lcCol.R, lcCol.G, lcCol.B });
}
return lbBytes.ToArray();
}
public unsafe Bitmap MedianFilter(Bitmap Img)
{
int Size =2;
List<byte> R = new List<byte>();
List<byte> G = new List<byte>();
List<byte> B = new List<byte>();
int ApetureMin = -(Size / 2);
int ApetureMax = (Size / 2);
BitmapData imageData = Img.LockBits(new Rectangle(0, 0, Img.Width, Img.Height), ImageLockMode.ReadOnly, PixelFormat.Format32bppRgb);
byte* start = (byte*)imageData.Scan0.ToPointer ();
for (int x = 0; x < imageData.Width; x++)
{
for (int y = 0; y < imageData.Height; y++)
{
for (int x1 = ApetureMin; x1 < ApetureMax; x1++)
{
int valx = x + x1;
if (valx >= 0 && valx < imageData.Width)
{
for (int y1 = ApetureMin; y1 < ApetureMax; y1++)
{
int valy = y + y1;
if (valy >= 0 && valy < imageData.Height)
{
Color tempColor = Img.GetPixel(valx, valy);// error come from here
R.Add(tempColor.R);
G.Add(tempColor.G);
B.Add(tempColor.B);
}
}
}
}
}
}
R.Sort();
G.Sort();
B.Sort();
Img.UnlockBits(imageData);
return Img;
}
I tried to do this, but I get an error saying "Bitmap region is already locked". Can anyone help me solve this? (The error position is marked with a comment in the code.)
GetPixel is the slooow way to access the image and doesn't work (as you noticed) anymore if someone else starts messing with the image buffer directly. Why would you want to do that?
Check Using the LockBits method to access image data for some good insight into fast image manipulation.
In this case, use something like this instead:
int pixelSize = 4; /* Check below or the site I linked to and make sure this is correct */
byte* color = (byte*)imageData.Scan0 + (y * imageData.Stride) + x * pixelSize;
Note that this gives you the first byte for that pixel. Depending on the color format you are looking at (ARGB? RGB? ..) you need to access the following bytes as well. That seems to suit your use case anyway, since you just care about byte values, not the Color value.
So, after having some spare minutes, this is what I came up with (please take your time to understand and check it, I just made sure it compiles):
public void SomeStuff(Bitmap image)
{
var imageWidth = image.Width;
var imageHeight = image.Height;
var imageData = image.LockBits(new Rectangle(0, 0, imageWidth, imageHeight), ImageLockMode.ReadOnly, PixelFormat.Format32bppRgb);
var imageByteCount = imageData.Stride*imageData.Height;
var imageBuffer = new byte[imageByteCount];
Marshal.Copy(imageData.Scan0, imageBuffer, 0, imageByteCount);
for (int x = 0; x < imageWidth; x++)
{
for (int y = 0; y < imageHeight; y++)
{
var pixelColor = GetPixel(imageBuffer, imageData.Stride, x, y);
// Do your stuff
}
}
}
private static Color GetPixel(byte[] imageBuffer, int imageStride, int x, int y)
{
int pixelBase = y*imageStride + x*4; // Format32bppRgb uses 4 bytes per pixel (B, G, R, unused)
byte blue = imageBuffer[pixelBase];
byte green = imageBuffer[pixelBase + 1];
byte red = imageBuffer[pixelBase + 2];
return Color.FromArgb(red, green, blue);
}
This:
- Relies on the PixelFormat you used in your sample (regarding both the pixel size/bytes per pixel and the order of the values). If you change the PixelFormat this will break.
- Doesn't need the unsafe keyword. I doubt that it makes a lot of difference, but you are free to use the pointer-based access instead; the method would be the same.
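As a rough illustration of what could go in the "Do your stuff" part for a median filter, here is a minimal sketch that works on the copied byte buffer only, so it never calls Bitmap.GetPixel while the bitmap is locked. The names (MedianSketch, MedianAt) and the radius-based, edge-clipped window are my own assumptions, not taken from the question's code.

```csharp
using System;

static class MedianSketch
{
    // Median of one channel over a (2*radius+1) x (2*radius+1) window around (x, y).
    // Pixels outside the image are skipped, so windows at the edges are smaller.
    // buffer: pixel bytes copied out with Marshal.Copy; stride: bytes per row;
    // bytesPerPixel: 4 for Format32bppRgb; channel: 0 = blue, 1 = green, 2 = red.
    public static byte MedianAt(byte[] buffer, int stride, int width, int height,
                                int bytesPerPixel, int x, int y, int channel, int radius)
    {
        int side = 2 * radius + 1;
        var window = new byte[side * side];
        int n = 0;
        for (int dy = -radius; dy <= radius; dy++)
        {
            int yy = y + dy;
            if (yy < 0 || yy >= height) continue;
            for (int dx = -radius; dx <= radius; dx++)
            {
                int xx = x + dx;
                if (xx < 0 || xx >= width) continue;
                window[n++] = buffer[yy * stride + xx * bytesPerPixel + channel];
            }
        }
        Array.Sort(window, 0, n); // sort only the samples actually collected
        return window[n / 2];     // upper median when n is even
    }
}
```

The results should be written to a separate output buffer rather than back into the input, because each median has to be computed from the unfiltered neighbours.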