Algorithm to scale down YUV 4:2:2 - c#

Trying to write an efficient algorithm to scale down YUV 4:2:2 by a factor of 2 - and which doesn't require a conversion to RGB (which is CPU intensive).
I've seen plenty of code on stack overflow for YUV to RGB conversion - but only an example of scaling for YUV 4:2:0 here which I have started based my code on. However, this produces an image which is effectively 3 columns of the same image with corrupt colours, so something is wrong with the algo when applied to 4:2:2.
Can anybody see what is wrong with this code?
public static byte[] HalveYuv(byte[] data, int imageWidth, int imageHeight)
{
byte[] yuv = new byte[imageWidth / 2 * imageHeight / 2 * 3 / 2];
int i = 0;
for (int y = 0; y < imageHeight; y += 2)
{
for (int x = 0; x < imageWidth; x += 2)
{
yuv[i] = data[y * imageWidth + x];
i++;
}
}
for (int y = 0; y < imageHeight / 2; y += 2)
{
for (int x = 0; x < imageWidth; x += 4)
{
yuv[i] = data[(imageWidth * imageHeight) + (y * imageWidth) + x];
i++;
yuv[i] = data[(imageWidth * imageHeight) + (y * imageWidth) + (x + 1)];
i++;
}
}
return yuv;
}

A fast way to generate a low quality thumbnail would be to discard half of the data in each dimension.
We break the image in 4x2 grid of pixels - each pair of pixels in the grid is represented by 4 bytes. In the down-scaled image, we take the color values for the first 2 pixels in the grid by copying the first 4 bytes, whilst discarding the other 12 bytes worth of data.
This scaling can be generalized to any power of 2 (1/2, 1/4, 1/8, ...) - this method is quick because it doesn't use any interpolation. This will give a lower quality image which appears blocky however - for better results consider some sampling approach.
public static byte[] FastResize(
byte[] data,
int imageWidth,
int imageHeight,
int scaleDownExponent)
{
var scaleDownFactor = (uint)Math.Pow(2, scaleDownExponent);
var outputImageWidth = imageWidth / scaleDownFactor;
var outputImageHeight = imageHeight / scaleDownFactor;
// 2 bytes per pixel.
byte[] yuv = new byte[outputImageWidth * outputImageHeight * 2];
var pos = 0;
// Process every other line.
for (uint pixelY = 0; pixelY < imageHeight; pixelY += scaleDownFactor)
{
// Work in blocks of 2 pixels, we discard the second.
for (uint pixelX = 0; pixelX < imageWidth; pixelX += 2*scaleDownFactor)
{
// Position of pixel bytes.
var start = ((pixelY * imageWidth) + pixelX) * 2;
yuv[pos] = data[start];
yuv[pos + 1] = data[start + 1];
yuv[pos + 2] = data[start + 2];
yuv[pos + 3] = data[start + 3];
pos += 4;
}
}
return yuv;
}

I assume that the original data is in the following order (as it seems so from your example code): First there are the luminance (Y) values of the pixels of the image (size = imageWidth*imageHeight bytes). After that there are the chrominance components UV, s.t., the values for a single pixel are given after each other. This means that the total size of the original image is 3*size.
Now for 4:2:2 subsampling means that every other value of the horizontal chrominance component are discarded. This reduces the data to size size + 0.5*size + 0.5*size = 2*size, i.e., luminance is kept completely and both chrominance components are divided to half. Therefore, the result image should be allocated as:
byte[] yuv = new byte[2*imageWidth*imageHeight];
As the first part of the image is copied in full the first loop becomes:
int i = 0;
for (int y = 0; y < imageHeight; y++)
{
for (int x = 0; x < imageWidth; x++)
{
yuv[i] = data[y * imageWidth + x];
i++;
}
}
Because this just copies the beginning of data this can be simplified to
int size = imageHeight*imageWidth;
int i = 0;
for (; i < size; i++)
{
yuv[i] = data[i];
}
Now to copy the rest we need to skip every other horizontal coordinate
for (int y = 0; y < imageHeight; y++)
{
for (int x = 0; x < imageWidth; x += 2) // +2 skip each other horizontal component
{
yuv[i] = data[size + y*2*imageWidth + 2*x];
i++;
yuv[i] = data[size + y*2*imageWidth + 2*x + 1];
i++;
}
}
The factor two in data-array index is needed because there are 2 bytes for each pixel (both chrominance components), so each "row" has 2*imageWidth bytes of data.

Related

Fast data manipulation in C# .NET: RRGGBB to Bitmap / C# Video / Image Processing

I am getting image data from a camera which comes as an array of ushorts from an unmanaged dll.
I have managed to get the data into managed land at a speed which is pretty good.
// cpp .NET
static void imageCallback(unsigned short * rawData, unsigned int length) {
array<unsigned short>^ imageData = gcnew array<unsigned short>(length);
unsigned int headLength = 512; // header length in shorts
pin_ptr<unsigned short> imageDataStart = &imageData[0];
memcpy(imageDataStart, rawData + headLength, length);
callBackDelegate(imageData);
}
The data comes ordered as "RGBRGBRGB...." ushorts for each color channel.
The managed array is then sent to C# via a delegate. Then I have to convert the raw data and stuff it into a (8 bit valued) bitmap via the usual methods like so:
public static Bitmap RGBDataToBitmap(ushort[] data, int Width, int Height, int bitDepth)
{
Bitmap bmp = new Bitmap(Width, Height, PixelFormat.Format32bppArgb);
var rawdata = bmp.LockBits(new Rectangle(Point.Empty, bmp.Size), ImageLockMode.ReadWrite, bmp.PixelFormat);
var pixelSize = rawdata.PixelFormat == PixelFormat.Format32bppArgb ? 4 : 3; // only works with 32 or 24 pixel-size bitmap!
var padding = rawdata.Stride - (rawdata.Width * pixelSize);
var bytes = new byte[rawdata.Height * rawdata.Stride];
var index = 0;
var pixel = 0;
// scale to 8 bits
var scalar = Math.Pow(2, -(bitDepth - 8));
for (var y = 0; y < Height; y++)
{
for (var x = 0; x < Width; x++)
{
int Rlevel = (int)Math.Round(data[pixel + 0] * scalar);
int Glevel = (int)Math.Round(data[pixel + 1] * scalar);
int Blevel = (int)Math.Round(data[pixel + 2] * scalar);
pixel += 3;
bytes[index + 3] = 255; // A component
bytes[index + 2] = Convert.ToByte(Blevel); // B component
bytes[index + 1] = Convert.ToByte(Glevel); // G component
bytes[index + 0] = Convert.ToByte(Rlevel); // R component
index += pixelSize;
}
index += padding;
}
// copy back the bytes from array to the bitmap
System.Runtime.InteropServices.Marshal.Copy(bytes, 0, rawdata.Scan0, bytes.Length);
bmp.UnlockBits(rawdata);
return bmp;
}
If I time this operation (rawdata to bitmap) it takes ~0.5 seconds. The frames are coming in at ~12 times per second so this is too slow.
Does anyone see a way that I make this operation faster in C#? Or does anyone have any guidance for another method?
The goal is to have a live video image in C#.
Thanks!
EDIT:
Per the suggestions below, if I change the for loop to this:
// scale to 8 bits
var bitminus8 = bitDepth - 8;
var scalar = Math.Pow(2, -(bitminus8));
Parallel.For(0, Height, y =>
{
var index = y * Width;
for (var x = 0; x < Width; x++)
{
var idx = index + x;
byte Rlevel = (byte)(data[idx * 3 + 0] >> bitminus8);
byte Glevel = (byte)(data[idx * 3 + 1] >> bitminus8);
byte Blevel = (byte)(data[idx * 3 + 2] >> bitminus8);
bytes[idx * 4 + 3] = 255; // A component
bytes[idx * 4 + 2] = Blevel; // B component
bytes[idx * 4 + 1] = Glevel; // G component
bytes[idx * 4 + 0] = Rlevel; // R component
}
});
This goes from 0.5 seconds to 0.04 seconds. Nice bit about the byte conversion, that made a big difference.

Bitmap outside array bounds

I have the following code:
if (source != null)
{
int count = 0;
int stride = (source.PixelWidth * source.Format.BitsPerPixel + 7) / 8;
byte[] pixels = new byte[source.PixelHeight * stride];
source.CopyPixels(pixels, stride, 0);
for (int y = 0; y < source.PixelHeight; y = y + 2)
{
for (int x = 0; x < source.PixelWidth; x = x + 2)
{
int index = y * stride + 4 * x;
count = index;
byte red = pixels[index];
byte green = pixels[index + 1];
byte blue = pixels[index + 2];
byte alpha = pixels[index + 3];
}
}
MessageBox.Show("Array Length, pixels: " + pixels.Count() + "," + count);
}
However, i am having an issue where certain bitmap images, when stepped through throw an exception
"System.IndexOutOfRangeException" as the index passes the pixel [ ] array count, does anyone know how to solve this efficiently without oversizing the array?
I want to display the progress as i go along hence the need for an accurate array :)
Thanks in advance.
You should be able to step through your code and figure out when index is larger than your pixels buffer size.
Add some debugging output to your code and step through it. Something like:
if (source != null)
{
int count = 0;
int stride = (source.PixelWidth * source.Format.BitsPerPixel + 7) / 8;
byte[] pixels = new byte[source.PixelHeight * stride];
source.CopyPixels(pixels, stride, 0);
for (int y = 0; y < source.PixelHeight; y = y + 2)
{
for (int x = 0; x < source.PixelWidth; x = x + 2)
{
int index = y * stride + 4 * x;
count = index;
int bufsize = source.PixelHeight * stride;
System.Diagnostics.Debug.WriteLine($"bufsize={bufsize}, index={index}, x={x}, y={y}");
System.Diagnostics.Debug.Assert((index+3) <= bufsize);
byte red = pixels[index];
byte green = pixels[index + 1];
byte blue = pixels[index + 2];
byte alpha = pixels[index + 3];
}
}
MessageBox.Show("Array Length, pixels: " + pixels.Count() + "," + count);
}
A big part about writing code is learning how to debug, use the debugger, and to verify the correctness of your algorithms. Good luck with your project.
This code will crash on any image where Format.BitsPerPixel is less than 32 e.g. 24-bit RGB with no alpha. You also shouldn't assume that stride is what you think it is, you should use the value returned from LockBits.

Fast inversion of an axis of 3D data stored in a 1D array

I have a large dataset that represents a 3D image (approx 100,000,000 pixels). I want to invert the pixels along the 'z' axis of the image. My data is stored in a byte array where the data is ordered x, y, z (i.e [] = { (x=0,y,z=0), (x=1,y=0,z=0), (x=2,y=0,z=0) ...)
I can easily sort them using the following code, however I am looking to reduce computation time where possible (currently reporting around 7 seconds). I am considering using an array 'sort' function, but am not sure how to handle the indexing.
Here is my current code:
private int GetIndex(Image _image, int _x, int _y, int _z)
{
return (_z * _image.Size.X * _image.Size.Y) + (_y * _image.Size.X) + _x;;
}
private void InvertZ(Image _image)
{
for (int z = 0; z < _image.Size.Z/2; z++)
{
for (int y = 0; y < _image.Size.Y; y++)
{
for (int x = 0; x < _image.Size.X; x++)
{
int srcIndex = GetIndex(_image, x, y, z);
int destIndex = GetIndex(_image, x, y, _image.Size.Z - z - 1);
byte src = _image.Buffer[srcIndex];
byte dest = _image.Buffer[destIndex];
_image.Buffer[srcIndex] = dest;
_image.Buffer[destIndex] = src;
}
}
}
}
One solution is to copy each frame. Reduces the number of iterations drastically.
private void InvertZ(Image _image)
{
int frameSize = _image.Size.X * _image.Size.Y;
byte[] temp = new byte[frameSize];
for (int z = 0; z < _image.Size.Z / 2; z++)
{
int inverseZ = _image.Size.Z - z - 1;
Array.Copy(_image.Buffer, z * frameSize, temp, 0, frameSize);
Array.Copy(_image.Buffer, inverseZ * frameSize, _image.Buffer, z * frameSize, frameSize);
Array.Copy(temp, 0, _image.Buffer, inverseZ * frameSize, frameSize);
}
}
Runtime approx < 18 ms compared to 3175 ms.
This might be faster that your one loop solution even if it uses several loops, one for each axis.
There are 2 major speed enhancements (very common algorithm enhancements within image processing).
The Image is converted to a array for quick legwork
The index become partly precomputed within the the loops making sure the you calculate as much as possible as few times as possible. (no more use of the built-in thing in Image which is nice for one and another pixel but not suitable for the whole map)
I usually only do this work on 2d so the z axis is just added. it might be faster having it as the last loop. (fringe example running in 2d if y is before x then you loose about 30-40% speed because of the way memory/cache is built Src this happens at 3d as well (that's why I placed the z (levels/pages at the front))
Here is the code I would base my solution on.
private void InvertZ(Image _image)
{
byte[] array =imageToByteArray(_image);
int pageSize = _Image.Size.Y * _Image.Size.X;
for (int z = 0; z < _image.Size.Z/2; z++)
{
int level = z * pageSize;
int dstLevel = (_image.Size.Z - z - 1) * pageSize;
for (int x = 0; x < _image.Size.X; x++)
{
int Row = x*_Image.Size.Y;
int RowOnLevel = level + Row ;
int dstRowOnLevel = dstLevel + xRow;
for (int y = 0; y < _image.Size.Y; y++)
{
int srcIndex = RowOnLevel + y;
int destIndex = dstRowOnLevel + y;
byte tmpDest = array[destIndex];
array[destIndex] = array[srcIndex];
array[srcIndex] = tmpDest;
}
}
}
return byteArrayToImage(array);
}
public Image byteArrayToImage(byte[] byteArrayIn)
{
MemoryStream ms = new MemoryStream(byteArrayIn);
Image returnImage = Image.FromStream(ms);
return returnImage;
}
public byte[] imageToByteArray(System.Drawing.Image imageIn)
{
MemoryStream ms = new MemoryStream();
imageIn.Save(ms,System.Drawing.Imaging.ImageFormat.Gif);
return ms.ToArray();
}
You can use the fact that the array layout is like this
Z Length
==== ======
[0] x * y
[1] x * y
[2] x * y
...
[z-1] x * y
This allows us to greatly reduce index calculations. We can use a variation of the classic O(N) reverse algorithm like this
static void InvertZ(Image _image)
{
int len = _image.Size.X * _image.Size.Y;
for (int lo = 0, hi = (_image.Size.Z - 1) * len; lo < hi; lo += len, hi -= len)
for (int i = 0; i < len; i++)
Swap(ref _image.Buffer[lo + i], ref _image.Buffer[hi + i]);
}
static void Swap<T>(ref T a, ref T b) { var c = a; a = b; b = c; }

Memory Object Allocation failure using c# and opencl

I am writing an image processing program with the express purpose to alter large images, the one I'm working with is 8165 pixels by 4915 pixels. I was told to implement gpu processing, so after some research I decided to go with OpenCL. I started implementing the OpenCL C# wrapper OpenCLTemplate.
My code takes in a bitmap and uses lockbits to lock its memory location. I then copy the order of each bit into an array, run the array through the openCL kernel, and it inverts each bit in the array. I then run the inverted bits back into the memory location of the image. I split this process into ten chunks so that i can increment a progress bar.
My code works perfectly with smaller images, but when I try to run it with my big image I keep getting a MemObjectAllocationFailure when trying to execute the kernel. I don't know why its doing this and i would appreciate any help in figuring out why or how to fix it.
using OpenCLTemplate;
public static void Invert(Bitmap image, ToolStripProgressBar progressBar)
{
string openCLInvert = #"
__kernel void Filter(__global uchar * Img0,
__global float * ImgF)
{
// Gets information about work-item
int x = get_global_id(0);
int y = get_global_id(1);
// Gets information about work size
int width = get_global_size(0);
int height = get_global_size(1);
int ind = 4 * (x + width * y );
// Inverts image colors
ImgF[ind]= 255.0f - (float)Img0[ind];
ImgF[1 + ind]= 255.0f - (float)Img0[1 + ind];
ImgF[2 + ind]= 255.0f - (float)Img0[2 + ind];
// Leave alpha component equal
ImgF[ind + 3] = (float)Img0[ind + 3];
}";
//Lock the image in memory and get image lock data
var imageData = image.LockBits(new Rectangle(0, 0, image.Width, image.Height), ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);
CLCalc.InitCL();
for (int i = 0; i < 10; i++)
{
unsafe
{
int adjustedHeight = (((i + 1) * imageData.Height) / 10) - ((i * imageData.Height) / 10);
int count = 0;
byte[] Data = new byte[(4 * imageData.Stride * adjustedHeight)];
var startPointer = (byte*)imageData.Scan0;
for (int y = ((i * imageData.Height) / 10); y < (((i + 1) * imageData.Height) / 10); y++)
{
for (int x = 0; x < imageData.Width; x++)
{
byte* Byte = (byte*)(startPointer + (y * imageData.Stride) + (x * 4));
Data[count] = *Byte;
Data[count + 1] = *(Byte + 1);
Data[count + 2] = *(Byte + 2);
Data[count + 3] = *(Byte + 3);
count += 4;
}
}
CLCalc.Program.Compile(openCLInvert);
CLCalc.Program.Kernel kernel = new CLCalc.Program.Kernel("Filter");
CLCalc.Program.Variable CLData = new CLCalc.Program.Variable(Data);
float[] imgProcessed = new float[Data.Length];
CLCalc.Program.Variable CLFiltered = new CLCalc.Program.Variable(imgProcessed);
CLCalc.Program.Variable[] args = new CLCalc.Program.Variable[] { CLData, CLFiltered };
kernel.Execute(args, new int[] { imageData.Width, adjustedHeight });
CLCalc.Program.Sync();
CLFiltered.ReadFromDeviceTo(imgProcessed);
count = 0;
for (int y = ((i * imageData.Height) / 10); y < (((i + 1) * imageData.Height) / 10); y++)
{
for (int x = 0; x < imageData.Width; x++)
{
byte* Byte = (byte*)(startPointer + (y * imageData.Stride) + (x * 4));
*Byte = (byte)imgProcessed[count];
*(Byte + 1) = (byte)imgProcessed[count + 1];
*(Byte + 2) = (byte)imgProcessed[count + 2];
*(Byte + 3) = (byte)imgProcessed[count + 3];
count += 4;
}
}
}
progressBar.Owner.Invoke((Action)progressBar.PerformStep);
}
//Unlock image
image.UnlockBits(imageData);
}
You may have reached a memory allocation limit of your OpenCL driver/device. Check the values returned by clGetDeviceInfo. There is a limit for the size of one single memory object. The OpenCL driver may allow the total size of all allocated memory objects to exceed the memory size on your device, and will copy them to/from host memory when needed.
To process large images, you may have to split them into smaller pieces, and process them separately.

C# predictive coding for image compression

I've been playing with Huffman Compression on images to reduce size while maintaining a lossless image, but I've also read that you can use predictive coding to further compress image data by reducing entropy.
From what I understand, in the lossless JPEG standard, each pixel is predicted as the weighted average of the adjacent 4 pixels already encountered in raster order (three above and one to the left). e.g., trying to predict the value of a pixel a based on preceding pixels, x, to the left as well as above a :
x x x
x a
Then calculate and encode the residual (difference between predicted and actual value).
But what I don't get is if the average 4 neighbor pixels aren't a multiple of 4, you'd get a fraction right? Should that fraction be ignored? If so, would the proper encoding of an 8 bit image (saved in a byte[]) be something like:
public static void Encode(byte[] buffer, int width, int height)
{
var tempBuff = new byte[buffer.Length];
for (int i = 0; i < buffer.Length; i++)
{
tempBuff[i] = buffer[i];
}
for (int i = 1; i < height; i++)
{
for (int j = 1; j < width - 1; j++)
{
int offsetUp = ((i - 1) * width) + (j - 1);
int offset = (i * width) + (j - 1);
int a = tempBuff[offsetUp];
int b = tempBuff[offsetUp + 1];
int c = tempBuff[offsetUp + 2];
int d = tempBuff[offset];
int pixel = tempBuff[offset + 1];
var ave = (a + b + c + d) / 4;
var val = (byte)(ave - pixel);
buffer[offset + 1] = val;
}
}
}
public static void Decode(byte[] buffer, int width, int height)
{
for (int i = 1; i < height; i++)
{
for (int j = 1; j < width - 1; j++)
{
int offsetUp = ((i - 1) * width) + (j - 1);
int offset = (i * width) + (j - 1);
int a = buffer[offsetUp];
int b = buffer[offsetUp + 1];
int c = buffer[offsetUp + 2];
int d = buffer[offset];
int pixel = buffer[offset + 1];
var ave = (a + b + c + d) / 4;
var val = (byte)(ave - pixel);
buffer[offset + 1] = val;
}
}
}
I don't see how this really will reduce entropy? How will this help compress my images further while still being lossless?
Thanks for any enlightenment
EDIT:
So after playing with the predictive coding images, I noticed that the histogram data shows a lot of +-1's of the varous pixels. This reduces entropy quite a bit in some cases. Here is a screenshot:
Yes, just truncate. Doesn't matter because you store the difference. It reduces entropy because you only store small values, a lot of them will be -1, 0 or 1. There are a couple of off-by-one bugs in your snippet btw.

Categories