I am writing an image processing program whose express purpose is to alter large images; the one I'm working with is 8165 pixels by 4915 pixels. I was told to implement GPU processing, so after some research I decided to go with OpenCL. I started using the OpenCL C# wrapper OpenCLTemplate.
My code takes in a bitmap and uses LockBits to lock its memory location. I then copy the pixel bytes, in order, into an array and run the array through the OpenCL kernel, which inverts each color byte in the array. I then write the inverted bytes back into the memory location of the image. I split this process into ten chunks so that I can increment a progress bar.
My code works perfectly with smaller images, but when I try to run it with my big image I keep getting a MemObjectAllocationFailure when trying to execute the kernel. I don't know why it's doing this, and I would appreciate any help in figuring out why or how to fix it.
using OpenCLTemplate;
public static void Invert(Bitmap image, ToolStripProgressBar progressBar)
{
string openCLInvert = @"
__kernel void Filter(__global uchar * Img0,
__global float * ImgF)
{
// Gets information about work-item
int x = get_global_id(0);
int y = get_global_id(1);
// Gets information about work size
int width = get_global_size(0);
int height = get_global_size(1);
int ind = 4 * (x + width * y );
// Inverts image colors
ImgF[ind]= 255.0f - (float)Img0[ind];
ImgF[1 + ind]= 255.0f - (float)Img0[1 + ind];
ImgF[2 + ind]= 255.0f - (float)Img0[2 + ind];
// Leave alpha component equal
ImgF[ind + 3] = (float)Img0[ind + 3];
}";
//Lock the image in memory and get image lock data
var imageData = image.LockBits(new Rectangle(0, 0, image.Width, image.Height), ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
CLCalc.InitCL();
for (int i = 0; i < 10; i++)
{
unsafe
{
int adjustedHeight = (((i + 1) * imageData.Height) / 10) - ((i * imageData.Height) / 10);
int count = 0;
byte[] Data = new byte[(4 * imageData.Stride * adjustedHeight)];
var startPointer = (byte*)imageData.Scan0;
for (int y = ((i * imageData.Height) / 10); y < (((i + 1) * imageData.Height) / 10); y++)
{
for (int x = 0; x < imageData.Width; x++)
{
byte* Byte = (byte*)(startPointer + (y * imageData.Stride) + (x * 4));
Data[count] = *Byte;
Data[count + 1] = *(Byte + 1);
Data[count + 2] = *(Byte + 2);
Data[count + 3] = *(Byte + 3);
count += 4;
}
}
CLCalc.Program.Compile(openCLInvert);
CLCalc.Program.Kernel kernel = new CLCalc.Program.Kernel("Filter");
CLCalc.Program.Variable CLData = new CLCalc.Program.Variable(Data);
float[] imgProcessed = new float[Data.Length];
CLCalc.Program.Variable CLFiltered = new CLCalc.Program.Variable(imgProcessed);
CLCalc.Program.Variable[] args = new CLCalc.Program.Variable[] { CLData, CLFiltered };
kernel.Execute(args, new int[] { imageData.Width, adjustedHeight });
CLCalc.Program.Sync();
CLFiltered.ReadFromDeviceTo(imgProcessed);
count = 0;
for (int y = ((i * imageData.Height) / 10); y < (((i + 1) * imageData.Height) / 10); y++)
{
for (int x = 0; x < imageData.Width; x++)
{
byte* Byte = (byte*)(startPointer + (y * imageData.Stride) + (x * 4));
*Byte = (byte)imgProcessed[count];
*(Byte + 1) = (byte)imgProcessed[count + 1];
*(Byte + 2) = (byte)imgProcessed[count + 2];
*(Byte + 3) = (byte)imgProcessed[count + 3];
count += 4;
}
}
}
progressBar.Owner.Invoke((Action)progressBar.PerformStep);
}
//Unlock image
image.UnlockBits(imageData);
}
You may have reached a memory allocation limit of your OpenCL driver/device. Check the values returned by clGetDeviceInfo, in particular CL_DEVICE_MAX_MEM_ALLOC_SIZE (the limit for the size of a single memory object) and CL_DEVICE_GLOBAL_MEM_SIZE. The OpenCL driver may allow the total size of all allocated memory objects to exceed the memory size of your device, and will copy them to/from host memory when needed, but a single object larger than the per-object limit will still fail to allocate.
To process large images, you may have to split them into smaller pieces and process the pieces separately.
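To see how close the question's code gets to such a limit, the buffer sizes can be worked out on the host before anything is sent to the device. The sketch below is only arithmetic: the class and method names are hypothetical, and it assumes a 32bpp image with no row padding, so that Stride is 4 * Width. Under that assumption, the question's expression `4 * imageData.Stride * adjustedHeight` counts the 4 bytes per pixel twice, and the float output buffer is four times larger again.

```csharp
using System;

// Hypothetical helper, not part of OpenCLTemplate: host-side size arithmetic only.
public static class ClBufferSizes
{
    // Rows in band i when the image is split into `chunks` horizontal bands
    // (same integer arithmetic as adjustedHeight in the question).
    public static int ChunkRows(int height, int chunks, int i) =>
        (i + 1) * height / chunks - i * height / chunks;

    // Bytes allocated by the question's expression 4 * Stride * rows.
    public static long InputBytesAsWritten(int stride, int rows) => 4L * stride * rows;

    // Bytes actually needed for the band: Stride already counts bytes per row.
    public static long InputBytesNeeded(int stride, int rows) => (long)stride * rows;

    // A float output buffer with one float per input byte is 4x the input size.
    public static long FloatOutputBytes(long inputBytes) => inputBytes * sizeof(float);
}
```

For the 8165 x 4915 image, the largest band has 492 rows. The input buffer as written is about 64 MB and the float output about 257 MB per chunk, against roughly 16 MB that the band actually occupies. Whether that exceeds the limit depends on the value your device reports for CL_DEVICE_MAX_MEM_ALLOC_SIZE.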
I am getting image data from a camera which comes as an array of ushorts from an unmanaged dll.
I have managed to get the data into managed land at a speed which is pretty good.
// cpp .NET
static void imageCallback(unsigned short * rawData, unsigned int length) {
array<unsigned short>^ imageData = gcnew array<unsigned short>(length);
unsigned int headLength = 512; // header length in shorts
pin_ptr<unsigned short> imageDataStart = &imageData[0];
memcpy(imageDataStart, rawData + headLength, length);
callBackDelegate(imageData);
}
The data comes ordered as "RGBRGBRGB...." ushorts for each color channel.
The managed array is then sent to C# via a delegate. Then I have to convert the raw data and stuff it into a (8 bit valued) bitmap via the usual methods like so:
public static Bitmap RGBDataToBitmap(ushort[] data, int Width, int Height, int bitDepth)
{
Bitmap bmp = new Bitmap(Width, Height, PixelFormat.Format32bppArgb);
var rawdata = bmp.LockBits(new Rectangle(Point.Empty, bmp.Size), ImageLockMode.ReadWrite, bmp.PixelFormat);
var pixelSize = rawdata.PixelFormat == PixelFormat.Format32bppArgb ? 4 : 3; // only works with 32 or 24 pixel-size bitmap!
var padding = rawdata.Stride - (rawdata.Width * pixelSize);
var bytes = new byte[rawdata.Height * rawdata.Stride];
var index = 0;
var pixel = 0;
// scale to 8 bits
var scalar = Math.Pow(2, -(bitDepth - 8));
for (var y = 0; y < Height; y++)
{
for (var x = 0; x < Width; x++)
{
int Rlevel = (int)Math.Round(data[pixel + 0] * scalar);
int Glevel = (int)Math.Round(data[pixel + 1] * scalar);
int Blevel = (int)Math.Round(data[pixel + 2] * scalar);
pixel += 3;
bytes[index + 3] = 255; // A component
bytes[index + 2] = Convert.ToByte(Blevel); // B component
bytes[index + 1] = Convert.ToByte(Glevel); // G component
bytes[index + 0] = Convert.ToByte(Rlevel); // R component
index += pixelSize;
}
index += padding;
}
// copy back the bytes from array to the bitmap
System.Runtime.InteropServices.Marshal.Copy(bytes, 0, rawdata.Scan0, bytes.Length);
bmp.UnlockBits(rawdata);
return bmp;
}
If I time this operation (raw data to bitmap), it takes ~0.5 seconds. The frames are coming in at ~12 per second, so this is too slow.
Does anyone see a way that I can make this operation faster in C#? Or does anyone have any guidance for another method?
The goal is to have a live video image in C#.
Thanks!
EDIT:
Per the suggestions below, if I change the for loop to this:
// scale to 8 bits
var bitminus8 = bitDepth - 8;
var scalar = Math.Pow(2, -(bitminus8));
Parallel.For(0, Height, y =>
{
var index = y * Width;
for (var x = 0; x < Width; x++)
{
var idx = index + x;
byte Rlevel = (byte)(data[idx * 3 + 0] >> bitminus8);
byte Glevel = (byte)(data[idx * 3 + 1] >> bitminus8);
byte Blevel = (byte)(data[idx * 3 + 2] >> bitminus8);
bytes[idx * 4 + 3] = 255; // A component
bytes[idx * 4 + 2] = Blevel; // B component
bytes[idx * 4 + 1] = Glevel; // G component
bytes[idx * 4 + 0] = Rlevel; // R component
}
});
This goes from 0.5 seconds to 0.04 seconds. Nice bit about the byte conversion, that made a big difference.
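The two conversions are not quite equivalent, which is worth knowing before swapping one for the other. The shift truncates, while the original loop rounds; and for a full-scale sample the rounded value can reach 256, which `Convert.ToByte` rejects with an OverflowException. The sketch below puts the two side by side. The class and method names are hypothetical, and it assumes bitDepth is at least 8 and that samples fit in bitDepth bits.

```csharp
using System;

// Hypothetical helpers comparing the two scaling approaches from the question.
public static class SampleScaling
{
    // Truncating conversion, as in the edited loop: drop the low (bitDepth - 8) bits.
    public static byte ShiftTo8Bit(ushort sample, int bitDepth) =>
        (byte)(sample >> (bitDepth - 8));

    // Rounding conversion, as in the original loop, but clamped to 255 so that
    // a full-scale sample (e.g. 4095 at 12 bits -> 255.9375 -> 256) stays in range.
    public static byte RoundTo8Bit(ushort sample, int bitDepth)
    {
        double scaled = Math.Round(sample * Math.Pow(2, -(bitDepth - 8)));
        return (byte)Math.Min(255.0, scaled);
    }
}
```

For live video the one-level difference between truncating and rounding is usually invisible, so the shift is a reasonable trade for the speed-up.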
I'm trying to write an efficient algorithm to scale down YUV 4:2:2 by a factor of 2 without requiring a conversion to RGB (which is CPU intensive).
I've seen plenty of code on Stack Overflow for YUV to RGB conversion, but only one example of scaling, for YUV 4:2:0, here, on which I have based my code. However, this produces an image which is effectively 3 columns of the same image with corrupt colours, so something is wrong with the algorithm when applied to 4:2:2.
Can anybody see what is wrong with this code?
public static byte[] HalveYuv(byte[] data, int imageWidth, int imageHeight)
{
byte[] yuv = new byte[imageWidth / 2 * imageHeight / 2 * 3 / 2];
int i = 0;
for (int y = 0; y < imageHeight; y += 2)
{
for (int x = 0; x < imageWidth; x += 2)
{
yuv[i] = data[y * imageWidth + x];
i++;
}
}
for (int y = 0; y < imageHeight / 2; y += 2)
{
for (int x = 0; x < imageWidth; x += 4)
{
yuv[i] = data[(imageWidth * imageHeight) + (y * imageWidth) + x];
i++;
yuv[i] = data[(imageWidth * imageHeight) + (y * imageWidth) + (x + 1)];
i++;
}
}
return yuv;
}
A fast way to generate a low quality thumbnail is to discard half of the data in each dimension.
We break the image into 4x2 grids of pixels (4 wide, 2 high); each pair of horizontally adjacent pixels in the grid is represented by 4 bytes, so one grid is 16 bytes. In the down-scaled image, we take the color values for the first 2 pixels in the grid by copying the first 4 bytes, whilst discarding the other 12 bytes worth of data.
This scaling can be generalized to any power of 2 (1/2, 1/4, 1/8, ...). The method is quick because it doesn't use any interpolation. However, it gives a lower quality image which appears blocky; for better results, consider a sampling approach that averages the discarded pixels.
public static byte[] FastResize(
byte[] data,
int imageWidth,
int imageHeight,
int scaleDownExponent)
{
var scaleDownFactor = (uint)Math.Pow(2, scaleDownExponent);
var outputImageWidth = imageWidth / scaleDownFactor;
var outputImageHeight = imageHeight / scaleDownFactor;
// 2 bytes per pixel.
byte[] yuv = new byte[outputImageWidth * outputImageHeight * 2];
var pos = 0;
// Keep one line out of every scaleDownFactor lines (every other line for a factor of 2).
for (uint pixelY = 0; pixelY < imageHeight; pixelY += scaleDownFactor)
{
// Work in blocks of 2 pixels (4 bytes): keep one block, then skip the following ones.
for (uint pixelX = 0; pixelX < imageWidth; pixelX += 2*scaleDownFactor)
{
// Position of pixel bytes.
var start = ((pixelY * imageWidth) + pixelX) * 2;
yuv[pos] = data[start];
yuv[pos + 1] = data[start + 1];
yuv[pos + 2] = data[start + 2];
yuv[pos + 3] = data[start + 3];
pos += 4;
}
}
return yuv;
}
I assume that the original data is in the following order (as it seems so from your example code): first come the luminance (Y) values of the pixels of the image (size = imageWidth*imageHeight bytes). After that come the chrominance components U and V, interleaved so that the two values for a single pixel follow each other. This means that the total size of the original image is 3*size.
Now, 4:2:2 subsampling means that every other horizontal value of each chrominance component is discarded. This reduces the data to size + 0.5*size + 0.5*size = 2*size, i.e., luminance is kept completely and both chrominance components are halved. Therefore, the result image should be allocated as:
byte[] yuv = new byte[2*imageWidth*imageHeight];
As the first part of the image is copied in full the first loop becomes:
int i = 0;
for (int y = 0; y < imageHeight; y++)
{
for (int x = 0; x < imageWidth; x++)
{
yuv[i] = data[y * imageWidth + x];
i++;
}
}
Because this just copies the beginning of data this can be simplified to
int size = imageHeight*imageWidth;
int i = 0;
for (; i < size; i++)
{
yuv[i] = data[i];
}
Now to copy the rest we need to skip every other horizontal coordinate
for (int y = 0; y < imageHeight; y++)
{
for (int x = 0; x < imageWidth; x += 2) // +2 skip each other horizontal component
{
yuv[i] = data[size + y*2*imageWidth + 2*x];
i++;
yuv[i] = data[size + y*2*imageWidth + 2*x + 1];
i++;
}
}
The factor two in data-array index is needed because there are 2 bytes for each pixel (both chrominance components), so each "row" has 2*imageWidth bytes of data.
I have the following code:
if (source != null)
{
int count = 0;
int stride = (source.PixelWidth * source.Format.BitsPerPixel + 7) / 8;
byte[] pixels = new byte[source.PixelHeight * stride];
source.CopyPixels(pixels, stride, 0);
for (int y = 0; y < source.PixelHeight; y = y + 2)
{
for (int x = 0; x < source.PixelWidth; x = x + 2)
{
int index = y * stride + 4 * x;
count = index;
byte red = pixels[index];
byte green = pixels[index + 1];
byte blue = pixels[index + 2];
byte alpha = pixels[index + 3];
}
}
MessageBox.Show("Array Length, pixels: " + pixels.Count() + "," + count);
}
However, I am having an issue where certain bitmap images, when stepped through, throw a "System.IndexOutOfRangeException" as the index passes the end of the pixels[] array. Does anyone know how to solve this efficiently without oversizing the array?
I want to display the progress as I go along, hence the need for an accurate array :)
Thanks in advance.
You should be able to step through your code and figure out when index is larger than your pixels buffer size.
Add some debugging output to your code and step through it. Something like:
if (source != null)
{
int count = 0;
int stride = (source.PixelWidth * source.Format.BitsPerPixel + 7) / 8;
byte[] pixels = new byte[source.PixelHeight * stride];
source.CopyPixels(pixels, stride, 0);
for (int y = 0; y < source.PixelHeight; y = y + 2)
{
for (int x = 0; x < source.PixelWidth; x = x + 2)
{
int index = y * stride + 4 * x;
count = index;
int bufsize = source.PixelHeight * stride;
System.Diagnostics.Debug.WriteLine($"bufsize={bufsize}, index={index}, x={x}, y={y}");
// index + 3 is the last byte read below, so it must be strictly inside the buffer.
System.Diagnostics.Debug.Assert((index + 3) < bufsize);
byte red = pixels[index];
byte green = pixels[index + 1];
byte blue = pixels[index + 2];
byte alpha = pixels[index + 3];
}
}
MessageBox.Show("Array Length, pixels: " + pixels.Count() + "," + count);
}
A big part about writing code is learning how to debug, use the debugger, and to verify the correctness of your algorithms. Good luck with your project.
This code will crash on any image where Format.BitsPerPixel is less than 32, e.g. 24-bit RGB with no alpha, because the index calculation and the four reads per pixel assume 4 bytes per pixel. You also shouldn't assume that stride is what you think it is; you should use the value returned from LockBits.
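A small sketch of the arithmetic makes the failure concrete. The helper names below are hypothetical, and the code assumes formats whose pixels occupy a whole number of bytes (24 or 32 bits per pixel). The stride formula is the one from the question.

```csharp
using System;

// Hypothetical helpers: index arithmetic only, no imaging API involved.
public static class PixelLayout
{
    // Bytes per row with no padding, same formula as in the question.
    public static int Stride(int pixelWidth, int bitsPerPixel) =>
        (pixelWidth * bitsPerPixel + 7) / 8;

    // Bytes per pixel, assuming whole-byte pixel formats (24 or 32 bpp).
    public static int BytesPerPixel(int bitsPerPixel) => (bitsPerPixel + 7) / 8;

    // Offset of the first byte of pixel (x, y).
    public static int Offset(int x, int y, int stride, int bytesPerPixel) =>
        y * stride + x * bytesPerPixel;
}
```

For a 5 x 2 image at 24 bits per pixel, the stride is 15 and the buffer holds 30 bytes. Stepping 3 bytes per pixel, the last pixel starts at offset 27 and ends at 29, which fits. Stepping 4 bytes per pixel, as the question's `4 * x` does, the same pixel starts at offset 31, which is already past the end. In the loop, `4 * x` would become `x * bytesPerPixel`, and the alpha read would be skipped when there are only 3 bytes per pixel.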
I am trying to save an image as monochrome (black & white, 1 bit-depth), but I'm lost as to how to do it.
I am starting with a PNG and converting it to a bitmap for printing (it's a thermal printer and only supports black anyway, plus it's slow as hell for large images if I try to send them as color/grayscale).
My code so far is dead simple and converts it to a bitmap, but it retains the original colour depth.
Image image = Image.FromFile("C:\\test.png");
byte[] bitmapFileData = null;
int bitsPerPixel = 1;
int bitmapDataLength;
using (MemoryStream str = new MemoryStream())
{
image.Save(str, ImageFormat.Bmp);
bitmapFileData = str.ToArray();
}
Here's some code I put together that takes a full colour (24 bits/pixel) image, and converts it to a 1 bit/pixel output bitmap, applying a standard RGB to greyscale conversion, and then using Floyd-Steinberg to convert greyscale to the 1 bit/pixel output.
Note that this should by no means be considered an "ideal" implementation, but it does work. There are a number of improvements that could be applied if you wanted. For example, it copies the entire input image into the data array, whereas we really only need to keep two lines in memory (the "current" and "next" lines) for accumulating the error data. Despite this, performance seems acceptable.
public static Bitmap ConvertTo1Bit(Bitmap input)
{
var masks = new byte[] { 0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01 };
var output = new Bitmap(input.Width, input.Height, PixelFormat.Format1bppIndexed);
var data = new sbyte[input.Width, input.Height];
var inputData = input.LockBits(new Rectangle(0, 0, input.Width, input.Height), ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);
try
{
var scanLine = inputData.Scan0;
var line = new byte[inputData.Stride];
for (var y = 0; y < inputData.Height; y++, scanLine += inputData.Stride)
{
Marshal.Copy(scanLine, line, 0, line.Length);
for (var x = 0; x < input.Width; x++)
{
data[x, y] = (sbyte)(64 * (GetGreyLevel(line[x * 3 + 2], line[x * 3 + 1], line[x * 3 + 0]) - 0.5));
}
}
}
finally
{
input.UnlockBits(inputData);
}
var outputData = output.LockBits(new Rectangle(0, 0, output.Width, output.Height), ImageLockMode.WriteOnly, PixelFormat.Format1bppIndexed);
try
{
var scanLine = outputData.Scan0;
for (var y = 0; y < outputData.Height; y++, scanLine += outputData.Stride)
{
var line = new byte[outputData.Stride];
for (var x = 0; x < input.Width; x++)
{
var j = data[x, y] > 0;
if (j) line[x / 8] |= masks[x % 8];
var error = (sbyte)(data[x, y] - (j ? 32 : -32));
if (x < input.Width - 1) data[x + 1, y] += (sbyte)(7 * error / 16);
if (y < input.Height - 1)
{
if (x > 0) data[x - 1, y + 1] += (sbyte)(3 * error / 16);
data[x, y + 1] += (sbyte)(5 * error / 16);
if (x < input.Width - 1) data[x + 1, y + 1] += (sbyte)(1 * error / 16);
}
}
Marshal.Copy(line, 0, scanLine, outputData.Stride);
}
}
finally
{
output.UnlockBits(outputData);
}
return output;
}
public static double GetGreyLevel(byte r, byte g, byte b)
{
return (r * 0.299 + g * 0.587 + b * 0.114) / 255;
}
What you want is a good dithering algorithm like Floyd-Steinberg or Bayer ordered. You can either implement the binarization yourself or use a library like AForge.NET to do it for you (download the image processing samples). You can find the binarization documentation here.
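For comparison with the Floyd-Steinberg code above, here is a minimal sketch of the Bayer ordered alternative. It uses the standard 4x4 Bayer index matrix. The class and method names are hypothetical, and the grey level is assumed to be in the range 0 to 1, as returned by GetGreyLevel in the answer above. Each pixel is compared with a threshold that depends only on its position, so no error is carried between pixels.

```csharp
using System;

// Hypothetical sketch of ordered (Bayer) dithering; not taken from AForge.NET.
public static class OrderedDither
{
    // Standard 4x4 Bayer index matrix (values 0..15).
    private static readonly int[,] Bayer4 =
    {
        {  0,  8,  2, 10 },
        { 12,  4, 14,  6 },
        {  3, 11,  1,  9 },
        { 15,  7, 13,  5 },
    };

    // Returns true when the pixel at (x, y) with grey level in [0, 1]
    // should be white in the 1 bit/pixel output.
    public static bool IsWhite(double grey, int x, int y)
    {
        double threshold = (Bayer4[y % 4, x % 4] + 0.5) / 16.0;
        return grey > threshold;
    }
}
```

In the output loop of the answer above, the result of `IsWhite` would take the place of `j`, and the error-diffusion lines would be dropped. The result shows a regular cross-hatch pattern rather than the noisier Floyd-Steinberg texture, which may or may not suit a thermal printer.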
I've been playing with Huffman Compression on images to reduce size while maintaining a lossless image, but I've also read that you can use predictive coding to further compress image data by reducing entropy.
From what I understand, in the lossless JPEG standard, each pixel is predicted as the weighted average of the adjacent 4 pixels already encountered in raster order (three above and one to the left). e.g., trying to predict the value of a pixel a based on preceding pixels, x, to the left as well as above a :
x x x
x a
Then calculate and encode the residual (difference between predicted and actual value).
But what I don't get is this: if the sum of the 4 neighboring pixels isn't a multiple of 4, the average is a fraction, right? Should that fraction be ignored? If so, would the proper encoding of an 8-bit image (saved in a byte[]) be something like:
public static void Encode(byte[] buffer, int width, int height)
{
var tempBuff = new byte[buffer.Length];
for (int i = 0; i < buffer.Length; i++)
{
tempBuff[i] = buffer[i];
}
for (int i = 1; i < height; i++)
{
for (int j = 1; j < width - 1; j++)
{
int offsetUp = ((i - 1) * width) + (j - 1);
int offset = (i * width) + (j - 1);
int a = tempBuff[offsetUp];
int b = tempBuff[offsetUp + 1];
int c = tempBuff[offsetUp + 2];
int d = tempBuff[offset];
int pixel = tempBuff[offset + 1];
var ave = (a + b + c + d) / 4;
var val = (byte)(ave - pixel);
buffer[offset + 1] = val;
}
}
}
public static void Decode(byte[] buffer, int width, int height)
{
for (int i = 1; i < height; i++)
{
for (int j = 1; j < width - 1; j++)
{
int offsetUp = ((i - 1) * width) + (j - 1);
int offset = (i * width) + (j - 1);
int a = buffer[offsetUp];
int b = buffer[offsetUp + 1];
int c = buffer[offsetUp + 2];
int d = buffer[offset];
int pixel = buffer[offset + 1];
var ave = (a + b + c + d) / 4;
var val = (byte)(ave - pixel);
buffer[offset + 1] = val;
}
}
}
I don't see how this really will reduce entropy? How will this help compress my images further while still being lossless?
Thanks for any enlightenment
EDIT:
So after playing with predictive coding on some images, I noticed that the histogram of the encoded data shows a lot of +-1 values for the various pixels. This reduces entropy quite a bit in some cases. Here is a screenshot:
Yes, just truncate. Doesn't matter because you store the difference. It reduces entropy because you only store small values, a lot of them will be -1, 0 or 1. There are a couple of off-by-one bugs in your snippet btw.
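As a sketch of how the round trip can stay lossless, here is one way to write it. This is an illustration under my own assumptions, not the lossless JPEG reference algorithm: the class name is hypothetical, border pixels (first row, first column, last column) are stored unchanged, and residuals wrap modulo 256 so they fit in a byte. The key points are that the decoder adds the residual back rather than subtracting again, and that it predicts from pixels it has already reconstructed.

```csharp
using System;

// Hypothetical sketch of truncating-average predictive coding on an 8-bit image.
public static class PredictiveCoding
{
    // Truncated average of the three pixels above and the one to the left.
    private static int Predict(byte[] px, int width, int row, int col)
    {
        int up = (row - 1) * width + col;
        int left = row * width + col - 1;
        return (px[up - 1] + px[up] + px[up + 1] + px[left]) / 4;
    }

    public static byte[] Encode(byte[] pixels, int width, int height)
    {
        var residuals = (byte[])pixels.Clone(); // border pixels stay as they are
        for (int row = 1; row < height; row++)
            for (int col = 1; col < width - 1; col++)
            {
                int i = row * width + col;
                // Residual = actual - predicted, wrapped modulo 256.
                residuals[i] = unchecked((byte)(pixels[i] - Predict(pixels, width, row, col)));
            }
        return residuals;
    }

    public static byte[] Decode(byte[] residuals, int width, int height)
    {
        var pixels = (byte[])residuals.Clone();
        // Raster order guarantees every neighbour used by Predict is already decoded.
        for (int row = 1; row < height; row++)
            for (int col = 1; col < width - 1; col++)
            {
                int i = row * width + col;
                pixels[i] = unchecked((byte)(residuals[i] + Predict(pixels, width, row, col)));
            }
        return pixels;
    }
}
```

Because encoder and decoder truncate the average in exactly the same way, the dropped fraction cancels out, and the modulo-256 wrap is undone by the matching addition. On smooth image regions most residuals land near 0 (or near 255 for small negative differences), which is what gives the Huffman stage a more skewed distribution to exploit.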