Extract FlateDecode images using iTextSharp - c#

I want to extract images from a PDF. I'm using iTextSharp right now.
Some images are extracted correctly, but most of them have the wrong colors and are distorted.
I experimented with different PixelFormats, but I didn't find a solution for my problem.
This is the code that separates the image types:
if (filter == "/FlateDecode")
{
// ...
int w = int.Parse(width);
int h = int.Parse(height);
int bpp = tg.GetAsNumber(PdfName.BITSPERCOMPONENT).IntValue;
byte[] rawBytes = PdfReader.GetStreamBytesRaw((PRStream)tg);
byte[] decodedBytes = PdfReader.FlateDecode(rawBytes);
byte[] streamBytes = PdfReader.DecodePredictor(decodedBytes, tg.GetAsDict(PdfName.DECODEPARMS));
PixelFormat[] pixFormats = new PixelFormat[23] {
PixelFormat.Format24bppRgb,
// ... all Pixel Formats
};
for (int i = 0; i < pixFormats.Length; i++)
{
Program.ToPixelFormat(w, h, pixFormats[i], streamBytes, bpp, images);
}
}
This is the code that saves the image to a MemoryStream. Saving the image to a folder will be implemented later.
private static void ToPixelFormat(int width, int height, PixelFormat pixelformat, byte[] bytes, int bpp, IList<Image> images)
{
Bitmap bmp = new Bitmap(width, height, pixelformat);
BitmapData bmd = bmp.LockBits(new Rectangle(0, 0, width, height),
ImageLockMode.WriteOnly, pixelformat);
Marshal.Copy(bytes, 0, bmd.Scan0, bytes.Length);
bmp.UnlockBits(bmd);
using (var ms = new MemoryStream())
{
bmp.Save(ms, System.Drawing.Imaging.ImageFormat.Tiff);
bytes = ms.GetBuffer();
}
images.Add(bmp);
}
Please help me.

Even though you have found a solution for your problem, let me suggest a fix for your code above.
I believe the distortion is caused by a mismatch in row alignment. PdfReader returns rows aligned to a byte boundary: for a grayscale image (8 bits per pixel) that is 21 pixels wide, you get 21 bytes of data for each image row. The Bitmap class aligns rows to a 32-bit (4-byte) boundary, so a 21-pixel-wide grayscale bitmap has a stride (row width in bytes) of 24 bytes. This means you cannot simply copy the bytes retrieved from PdfReader into a new bitmap with a single Marshal.Copy() call, as your ToPixelFormat() does.
The first pixel of the second row is the 22nd byte in the source array, but the destination Bitmap expects it at the 25th byte because of the 32-bit alignment. To solve this, I had to create a byte array sized for the 32-bit alignment of each data row.
I then copied the data row by row from the byte array retrieved from PdfReader into the new, padded byte array. Now the row layout matched what the Bitmap class expects, so I could copy it into the new Bitmap using Marshal.Copy().
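The row-by-row copy described above can be sketched as follows. This is a minimal illustration, not the answerer's actual code: the class and method names (RowPadding, AlignedStride, PadRows) are hypothetical, and it assumes the source rows are packed to a byte boundary as described.

```csharp
using System;

static class RowPadding
{
    // Bytes per row when rows are rounded up to a multiple of 4 bytes (32 bits).
    public static int AlignedStride(int width, int bitsPerPixel)
    {
        return ((width * bitsPerPixel + 31) / 32) * 4;
    }

    // Copies byte-aligned rows into a new buffer whose rows are padded
    // to the 4-byte-aligned stride. Padding bytes are left as zero.
    public static byte[] PadRows(byte[] packed, int width, int height, int bitsPerPixel)
    {
        int packedStride = (width * bitsPerPixel + 7) / 8; // assumption: source rows are byte-aligned
        int alignedStride = AlignedStride(width, bitsPerPixel);
        if (packed.Length < packedStride * height)
            throw new ArgumentException("Source buffer is shorter than the image dimensions imply.", nameof(packed));

        byte[] padded = new byte[alignedStride * height];
        for (int row = 0; row < height; row++)
        {
            Buffer.BlockCopy(packed, row * packedStride, padded, row * alignedStride, packedStride);
        }
        return padded;
    }
}
```

The padded array can then be passed to Marshal.Copy(), provided BitmapData.Stride of the locked bitmap equals the computed stride. Row padding only addresses the distortion; wrong colors can have a separate cause, for example Format24bppRgb storing each pixel as blue, green, red in memory.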

I found a solution for my own problem.
To extract all Images on all Pages, it is not necessary to implement different filters.
iTextSharp has an Image Renderer, which saves all Images in their original image type.
Just do the following found here: http://kuujinbo.info/iTextSharp/CCITTFaxDecodeExtract.aspx
You don't need to implement HttpHandler...

PDF supports a pretty wide variety of image formats. I don't think I would take the approach you've chosen here. You need to determine the image format from the bytes in the stream itself. For example, a JPEG starts with the bytes FF D8, and JFIF files carry the ASCII string JFIF at offset 6.
.NET (3.0+) does come with a method that will attempt to pick the right decoder: BitmapDecoder.Create. See http://msdn.microsoft.com/en-us/library/system.windows.media.imaging.bitmapdecoder.aspx
If that doesn't work you may want to consider some third-party imaging libraries. I've used ImageMagick.NET and LeadTools (way overpriced).
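Checking the leading bytes of a stream can be sketched like this. It is only an illustration under stated assumptions: the ImageSniffer class and its Detect method are hypothetical names, and only a handful of well-known signatures are covered.

```csharp
using System;

static class ImageSniffer
{
    // Returns a short format name based on well-known leading signature bytes,
    // or "unknown" when none of them match.
    public static string Detect(byte[] data)
    {
        if (StartsWith(data, 0xFF, 0xD8, 0xFF)) return "jpeg";
        if (StartsWith(data, 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) return "png";
        if (StartsWith(data, 0x47, 0x49, 0x46, 0x38)) return "gif"; // "GIF8"
        if (StartsWith(data, 0x42, 0x4D)) return "bmp";             // "BM"
        if (StartsWith(data, 0x49, 0x49, 0x2A, 0x00)) return "tiff"; // little-endian TIFF
        if (StartsWith(data, 0x4D, 0x4D, 0x00, 0x2A)) return "tiff"; // big-endian TIFF
        return "unknown";
    }

    private static bool StartsWith(byte[] data, params byte[] signature)
    {
        if (data == null || data.Length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i]) return false;
        }
        return true;
    }
}
```

Note that a FlateDecode image stream holds raw sample data once it is inflated, so it carries no such signature and a check like this would report "unknown" for it.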

Related

File.ReadAllBytes doesn't read the PNG image pixels properly

I am trying to read the bytes of a .png image using File.ReadAllBytes(string) method without success.
My images are of size 2464x2056x3 (15.197.952 bytes), but this method returns an array of about 12.000.000 bytes.
I tried with a white image of the same size and got a byte array of 25.549 bytes. Inspecting the array, I see all kinds of values, which obviously cannot be right for a white image.
The code I am using is:
var frame = File.ReadAllBytes("C:\\workspace\\white.png");
I've also tried to open the image first as an Image object then get the byte array with the following:
using (var ms = new MemoryStream())
{
var imageIn = Image.FromFile("C:\\workspace\\white.png");
imageIn.Save(ms, imageIn.RawFormat);
var array = ms.ToArray();
}
But the result is the same as before...
Any idea of what's happening?
How can I read the byte array?
PNG is a compressed format.
See some info about it: Portable Network Graphics - Wikipedia.
This means that the binary representation is not the actual pixel values that you expect.
You need some kind of a PNG decoder to get the pixel values from the compressed data.
This post might steer you in the right direction: Reading a PNG image file in .Net 2.0. Note that it's quite old, maybe there are newer methods for doing it.
On a side note: even a non compressed format like BMP has a header, so you can't simply read the binary file and get the pixel values in a trivial way.
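To illustrate that the file bytes are a container rather than a pixel array, here is a small sketch that reads the image dimensions from the start of a PNG file. The PngHeader class and TryReadSize method are hypothetical names; the layout assumed is the 8-byte PNG signature followed by the IHDR chunk, whose width and height are big-endian 32-bit integers.

```csharp
using System;

static class PngHeader
{
    private static readonly byte[] Signature = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

    // Reads width and height from the IHDR chunk that follows the PNG signature.
    // Returns false if the buffer does not look like the start of a PNG file.
    public static bool TryReadSize(byte[] file, out int width, out int height)
    {
        width = 0;
        height = 0;
        if (file == null || file.Length < 24) return false;
        for (int i = 0; i < Signature.Length; i++)
        {
            if (file[i] != Signature[i]) return false;
        }
        // Bytes 8..11 are the chunk length, bytes 12..15 the chunk type.
        if (file[12] != (byte)'I' || file[13] != (byte)'H' || file[14] != (byte)'D' || file[15] != (byte)'R')
            return false;
        width = ReadBigEndianInt32(file, 16);
        height = ReadBigEndianInt32(file, 20);
        return true;
    }

    private static int ReadBigEndianInt32(byte[] b, int offset)
    {
        return (b[offset] << 24) | (b[offset + 1] << 16) | (b[offset + 2] << 8) | b[offset + 3];
    }
}
```

Everything after the header chunks is compressed data, which is why the length of the array returned by File.ReadAllBytes has no fixed relation to width x height x channels.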
Update:
One way to get the pixel values from a PNG file is demonstrated below:
using System;
using System.Drawing;
using System.Drawing.Imaging;
byte[] GetPngPixels(string filename)
{
byte[] rgbValues = null;
// Load the png and convert to Bitmap. This will use a .NET PNG decoder:
using (var imageIn = Image.FromFile(filename))
using (var bmp = new Bitmap(imageIn))
{
// Lock the pixel data to gain low level access:
BitmapData bmpData = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height), ImageLockMode.ReadWrite, bmp.PixelFormat);
// Get the address of the first line.
IntPtr ptr = bmpData.Scan0;
// Declare an array to hold the bytes of the bitmap.
int bytes = Math.Abs(bmpData.Stride) * bmp.Height;
rgbValues = new byte[bytes];
// Copy the RGB values into the array.
System.Runtime.InteropServices.Marshal.Copy(ptr, rgbValues, 0, bytes);
// Unlock the pixel data:
bmp.UnlockBits(bmpData);
}
// Here rgbValues is an array of the pixel values.
return rgbValues;
}
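This method returns the decoded pixel data, Stride x Height bytes in total. Note that new Bitmap(Image) normally creates a 32-bit bitmap, so expect 4 bytes per pixel rather than 3 unless you convert to a 24-bit pixel format first.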
In order to use the data with opencv (or any similar usage), I advise you to enhance my code example and return also the image metadata (width, height, stride, pixel-format). You will need this metadata to construct a cv::Mat.

Convert ByteArray into png image and add it to ImageView in Xamarin.Android

I have an image as a byte array and want to convert the byte array into a PNG image and show it in an ImageView, as you see in the code below.
byte[] imageBytes = await webClient.DownloadDataTaskAsync(uri);
ImageView view = new ImageView(this.Context);
//Here need to add the converted image into ImageView
view.SetImageSource();
I achieved this by converting the image bytes into a bitmap and setting the bitmap on the ImageView, but it has a memory problem. Because this code runs many times in quick succession, I can't keep using a bitmap for the ImageView due to the memory exception.
So please help me.
Thanks.
You can do it by creating the bitmap from a Stream, like this:
using(var ms = new MemoryStream(imageBytes))
{
var bitmap = BitmapFactory.DecodeStream(ms);
// ...
// rest of your logic here...
// ...
}
Hope this helps
It should be as easy as calling
var bitmap = BitmapFactory.DecodeByteArray(imageBytes, 0, imageBytes.Length);
Android.Graphics.BitmapFactory.DecodeByteArray Method
Decode an immutable bitmap from the specified byte array.
Parameters
data: byte array of compressed image data
offset: offset into imageData for where the decoder should begin parsing.
length: the number of bytes, beginning at offset, to parse
opts: null-ok; Options that control downsampling and whether the image should be completely decoded, or just is size returned.
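Since the opts parameter controls downsampling, one common way to reduce memory use is to decode at a reduced size. The sketch below only computes a power-of-two sample factor; the class and method names (SampleSize, Calculate) and the requested dimensions are illustrative assumptions, not part of the Android API.

```csharp
using System;

static class SampleSize
{
    // Returns the largest power-of-two factor that keeps both decoded
    // dimensions at or above the requested width and height.
    public static int Calculate(int width, int height, int reqWidth, int reqHeight)
    {
        int inSampleSize = 1;
        if (height > reqHeight || width > reqWidth)
        {
            int halfHeight = height / 2;
            int halfWidth = width / 2;
            while ((halfHeight / inSampleSize) >= reqHeight
                && (halfWidth / inSampleSize) >= reqWidth)
            {
                inSampleSize *= 2;
            }
        }
        return inSampleSize;
    }
}
```

In Xamarin.Android this would typically be combined with BitmapFactory.Options: decode once with InJustDecodeBounds = true to read OutWidth and OutHeight, then decode again with InSampleSize set to the computed factor. Whether this resolves the memory exception depends on how the bitmaps are held and released elsewhere in the app.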

Encode JPG image file as DICOM PixelData using ClearCanvas

I have a set of JPG images that are actually slices of a CT scan, which I want to reconstruct into DICOM image files and import into a PACS.
I am using ClearCanvas, and have set all of the requisite tags (and confirmed them by converting one of my JPG files to DICOM using a proprietary application to make sure they are the same). I am just not sure how I should be processing my JPG file to get it into the PixelData tag?
Currently I am converting it to a Byte array, on advice from ClearCanvas forums, but the image is just garbled in the DICOM viewer. How should I be processing the image data to get it into a readable format?
public DicomFile CreateFileFromImage(Image image)
{
int height = image.Height;
int width = image.Width;
short bitsPerPixel = (short)Image.GetPixelFormatSize(image.PixelFormat);
byte[] imageBuffer = ImageToByteArray(image);
DicomFile dicomFile = new DicomFile();
dicomFile.DataSet[DicomTags.Columns].SetInt32(0, width);
dicomFile.DataSet[DicomTags.Rows].SetInt32(0, height);
dicomFile.DataSet[DicomTags.BitsStored].SetInt16(0, bitsPerPixel);
dicomFile.DataSet[DicomTags.BitsAllocated].SetInt16(0, bitsPerPixel);
dicomFile.DataSet[DicomTags.HighBit].SetInt16(0, 7);
//other tags not shown
dicomFile.DataSet[DicomTags.PixelData].Values = imageBuffer;
return dicomFile;
}
public static byte[] ImageToByteArray(Image imageIn)
{
MemoryStream ms = new MemoryStream();
imageIn.Save(ms, System.Drawing.Imaging.ImageFormat.Jpeg);
return ms.ToArray();
}
The ClearCanvas library has two helper classes that make it easier to encode and decode pixel data within a DicomFile. They are the DicomCompressedPixelData class and the DicomUncompressedPixelData class. You can use these to set the parameters for the image, and encode them into the DicomFile object.
In your case, since you're encoding a compressed object, you should use the DicomCompressedPixelData class. There are properties on the class that can be set. Calling the UpdateMessage method will copy these property values over to the DicomFile object. Also, this class has an AddFrameFragment method that properly encodes the pixel data. Note that compressed pixel data has to have some specific binary wrappers around each frame of data. This was the part missing from your previous code. The code below shows how to set this up.
short bitsPerPixel = (short)Image.GetPixelFormatSize(image.PixelFormat);
var dicomFile = new DicomFile();
var pd = new DicomCompressedPixelData(dicomFile);
pd.ImageWidth = (ushort)image.Width;
pd.ImageHeight = (ushort) image.Height;
pd.BitsStored = (ushort)bitsPerPixel;
pd.BitsAllocated = (ushort) bitsPerPixel;
pd.HighBit = 7;
pd.SamplesPerPixel = 3;
pd.PlanarConfiguration = 0;
pd.PhotometricInterpretation = "YBR_FULL_422";
byte[] imageBuffer = ImageToByteArray(image);
pd.AddFrameFragment(imageBuffer);
pd.UpdateMessage(dicomFile);
return dicomFile;
I ended up processing the bitmap manually and creating an array out of the Red Channel in the image, following some code in a plugin:
int size = rows * columns;
byte[] pData = new byte[size];
int i = 0;
for (int row = 0; row < rows; ++row)
{
for (int column = 0; column < columns; column++)
{
pData[i++] = image.GetPixel(column, row).R;
}
}
It does work, but it is horribly slow and creates bloated DICOM files. I'd love to get the inbuilt DicomCompressedPixelData class working.
Any further suggestions would be very welcome.
It is important to know the bit depth and color components of your JPEG CT image before inserting it into a DICOM dataset. It could be 8-bit lossy (JPEG Compression Process 2), 12-bit lossy (JPEG Compression Process 4), or 8-, 12- or 16-bit lossless grayscale JPEG (JPEG Compression Process 14 - lossless, non-hierarchical). This information is critical for setting the Pixel Data related attributes, such as Photometric Interpretation, Samples per Pixel, Planar Configuration, Bits Allocated and High Bit, as well as the Transfer Syntax.

Adding an image from memory stream to Excel document

I have an image in a memory stream and I want to write this to an MS Excel document, the PIA only exposes the AddPicture method which takes a file path.
Is there a way to add a picture without having to write the image to disk?
MSDN
http://msdn.microsoft.com/en-us/library/microsoft.office.interop.excel.shapes.addpicture(v=office.14).aspx
Well, a bit of blind flying but assuming a thing or two about your code (e.g. the source of your stream, data type, etc) this could be a solution:
First, you need to create bitmap image data from the stream (which I assume is a byte stream, also assuming that the stream describes a bitmap image). There's already a solution for that here on Stack Overflow: Byte Array to Bitmap Image. I copy-paste the code from that solution:
int w = 100;
int h = 200;
int ch = 3; // number of channels (i.e. assuming 24-bit RGB in this case)
byte[] imageData = new byte[w * h * ch]; // your image data here
Bitmap bitmap = new Bitmap(w, h, PixelFormat.Format24bppRgb);
BitmapData bmData = bitmap.LockBits(new System.Drawing.Rectangle(0, 0, bitmap.Width, bitmap.Height), ImageLockMode.ReadWrite, bitmap.PixelFormat);
IntPtr pNative = bmData.Scan0;
Marshal.Copy(imageData, 0, pNative, w * h * ch);
bitmap.UnlockBits(bmData);
Also assuming you have objects for your workbook and the worksheet you are about to work with, something like this:
xlBook = (Excel.Workbook)objExcel.Workbooks.Add("");
xlSheet = (Excel.Worksheet)xlBook.Worksheets[1];
xlSheet.Activate();
Now that you have a Bitmap-type variable and a worksheet, all you need is to paste the image into the sheet:
System.Windows.Forms.Clipboard.SetDataObject(bitmap, false);
xlsRange = xlSheet.get_Range((Excel.Range)xlSheet.Cells[5, 15], (Excel.Range)xlSheet.Cells[5, 15]);
xlSheet.Paste(xlsRange, bitmap);
So the key is this guy here (instead of using "AddPicture"): Worksheet.Paste Method
Hope this helps!

Got Reversed Image from Byte Array when converting to Base64

I need to get the raw bitmap data only (no header, or other information). I used the following code to get the bitmap data:
using (Bitmap bitmap = svgDocument.Draw())
{
Rectangle rect = new Rectangle(0, 0, bitmap.Width, bitmap.Height);
BitmapData bitmapData = bitmap.LockBits(rect, ImageLockMode.ReadWrite, bitmap.PixelFormat);
var length = Math.Abs(bitmapData.Stride) * bitmapData.Height;
byte[] bytes = new byte[length];
Marshal.Copy(bitmapData.Scan0, bytes, 0, length);
bitmap.UnlockBits(bitmapData);
MemoryStream memoryStream = new MemoryStream();
string filename = DateTime.Now.Ticks.ToString() + ".bmp"; // this works fine
bitmap.Save(filename, ImageFormat.Bmp);
string base64 = Convert.ToBase64String(bytes, Base64FormattingOptions.InsertLineBreaks); // the base64 is reversed.
}
When I save the bitmap, everything looks fine. The image is not reversed. However when I use the bytes only to convert the data to Base64, then the image is reversed.
Edit 1:
I think this has nothing to do with the Base64 conversion. It seems that the bytes are in reversed order.
When I save the image using the code, the image looks like this:
When I use the bytes, then I see this:
Solution:
I found the solution. Instead of creating a new bitmap, I just skipped the first 54 bytes of header information and then stored the byte array.
MemoryStream memoryStream = new MemoryStream();
bitmap.Save(memoryStream, ImageFormat.Bmp);
// Skip header
IEnumerable<byte> bytes = memoryStream.ToArray().Skip(54);
The standard BMP format allows the rows of the image to be stored either in top-down order or in bottom-up order.
The way to tell how your image is stored is to check the sign of the Height field in the BMP header:
If Height > 0, the rows are stored bottom-up, which is the default for BMP: the first row in the file is the bottom row of the image.
If Height < 0, then the height of your image is abs(Height) and the rows are stored in 'normal' order, top to bottom.
I would say that what happens in your case is that the two ways of getting at the pixels disagree about row order. The raw bytes you copy out carry no header, so nothing tells the consumer which order the rows are in.
When you save through the Bitmap object, it writes a header that matches the data for you. But when you export just the pixels, the third-party application that loads them has to assume a row order, and if it assumes the other one, it shows your image upside down.
You can find details about this pixel ordering in the Wikipedia page for BMP file format.
Edit:
So, when you write a BMP file to your disk, you have to do the following:
Check whether your bytes are in top to bottom order (a) or in bottom to top order (b)
If (a): put -height (a negative value) in the BMP header, so that the third-party program that shows your image knows the rows are top-down.
If (b): put the height of your image as a positive value in the BMP header.
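As a rough sketch of working with row order, the code below reads the signed height and the pixel-data offset from a BMP file's header and reverses the row order of a raw pixel buffer. The BmpRows class and its method names are hypothetical; it assumes a file with the common 14-byte file header followed by a BITMAPINFOHEADER, in which a positive height means bottom-up rows and a negative height means top-down rows.

```csharp
using System;

static class BmpRows
{
    // Offset of the pixel data from the start of the file (bfOffBits, bytes 10..13).
    public static int ReadPixelDataOffset(byte[] bmpFile)
    {
        return ReadInt32LittleEndian(bmpFile, 10);
    }

    // Signed height from the info header (bytes 22..25).
    public static int ReadHeight(byte[] bmpFile)
    {
        return ReadInt32LittleEndian(bmpFile, 22);
    }

    // A negative height marks rows stored top to bottom.
    public static bool IsTopDown(byte[] bmpFile)
    {
        return ReadHeight(bmpFile) < 0;
    }

    // Returns a copy of the pixel buffer with the order of its rows reversed.
    public static byte[] FlipRows(byte[] pixels, int stride, int height)
    {
        if (pixels.Length < stride * height)
            throw new ArgumentException("Pixel buffer is shorter than stride x height.", nameof(pixels));
        byte[] flipped = new byte[stride * height];
        for (int row = 0; row < height; row++)
        {
            Buffer.BlockCopy(pixels, row * stride, flipped, (height - 1 - row) * stride, stride);
        }
        return flipped;
    }

    private static int ReadInt32LittleEndian(byte[] b, int offset)
    {
        return b[offset] | (b[offset + 1] << 8) | (b[offset + 2] << 16) | (b[offset + 3] << 24);
    }
}
```

Reading the offset from the header avoids hard-coding a header length, which can differ when a color palette or a larger info header is present.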
I ran into the same problem but still needed a base64 string (for an HTML5 canvas). So I used the Image class to rotate the image, stored it in a new stream, then converted it to a base64 string.
var image = System.Drawing.Image.FromStream(new MemoryStream(bytes));
//Rotate and save to new stream
image.RotateFlip(System.Drawing.RotateFlipType.Rotate180FlipNone);
MemoryStream streamOut = new MemoryStream();
image.Save(streamOut, System.Drawing.Imaging.ImageFormat.Jpeg);
//Convert to base64 string
string base64String = Convert.ToBase64String(streamOut.ToArray());
