I've been working on an edge detection program in C#, and to make it run faster, I recently switched it to use LockBits. However, it's still not as fast as I would like. The problem could be my general algorithm, but I'm also wondering if there is anything better than LockBits I can use for image processing.
In case the problem is the algorithm, here's a basic explanation: go through an array of Colors (built from the LockBits data, one per pixel), and for each Color, check the colors of the eight pixels around it. If those pixels do not match the current pixel closely enough, consider the current pixel an edge.
Here's the basic code that decides whether a pixel is an edge. It takes a Color[] of nine colors, the first of which is the pixel to check.
public Boolean isEdgeOptimized(Color[] colors)
{
    //colors[0] should be the checking pixel
    Boolean returnBool = true;
    float percentage = percentageInt; //the percentage used is set
                                      //equal to the global variable percentageInt
    if (isMatching(colors[0], colors[1], percentage) &&
        isMatching(colors[0], colors[2], percentage) &&
        isMatching(colors[0], colors[3], percentage) &&
        isMatching(colors[0], colors[4], percentage) &&
        isMatching(colors[0], colors[5], percentage) &&
        isMatching(colors[0], colors[6], percentage) &&
        isMatching(colors[0], colors[7], percentage) &&
        isMatching(colors[0], colors[8], percentage))
    {
        returnBool = false;
    }
    return returnBool;
}
This code is applied to every pixel, the colors of which are fetched using LockBits.
So basically, the question is: how can I get my program to run faster? Is it my algorithm, or is there something faster than LockBits that I can use?
By the way, the project is on GitHub, here
Are you really passing in a floating point number as a percentage to isMatching?
I looked at your code for isMatching on GitHub and, well, yikes. You ported this from Java, right? C# uses bool, not Boolean, and while I don't know for sure, I don't like the looks of code that does that much boxing and unboxing. Further, you're doing a ton of floating point multiplication and comparison when you don't need to:
public static bool IsMatching(Color a, Color b, float percent)
{
    //this method is used to identify whether two pixels,
    //of color a and b, match, as in they can be considered
    //a solid color based on the acceptance value (percent, 0..1)
    int thresh = (int)(percent * 255);

    return Math.Abs(a.R - b.R) < thresh &&
           Math.Abs(a.G - b.G) < thresh &&
           Math.Abs(a.B - b.B) < thresh;
}
This will cut down the amount of work you're doing per pixel. I still don't like it, because I try to avoid method calls in the middle of a per-pixel loop, especially an 8x-per-pixel loop. I made the method static to avoid passing in an instance that isn't used. These changes alone will probably double your performance, since we're doing only one multiply, no boxing, and are now using the inherent short-circuiting of && to cut down the work.
If I were doing this, I'd be more likely to do something like this:
// assert: bitmap.Height > 2 && bitmap.Width > 2
BitmapData data = bitmap.LockBits(new Rectangle(0, 0, bitmap.Width, bitmap.Height),
    ImageLockMode.ReadWrite, PixelFormat.Format24bppRgb);
int scaledPercent = (int)(percent * 255);

unsafe
{
    byte* prevLine = (byte*)data.Scan0;
    byte* currLine = prevLine + data.Stride;
    byte* nextLine = currLine + data.Stride;

    for (int y = 1; y < bitmap.Height - 1; y++)
    {
        byte* pp = prevLine + 3;
        byte* cp = currLine + 3;
        byte* np = nextLine + 3;

        for (int x = 1; x < bitmap.Width - 1; x++)
        {
            if (IsEdgeOptimized(pp, cp, np, scaledPercent))
            {
                // do what you need to do
            }
            pp += 3; cp += 3; np += 3;
        }
        prevLine = currLine;
        currLine = nextLine;
        nextLine += data.Stride;
    }
}
private unsafe static bool IsEdgeOptimized(byte* pp, byte* cp, byte* np, int scaledPercent)
{
    // an edge is a pixel that does NOT match all eight of its neighbors
    return !(IsMatching(cp, pp - 3, scaledPercent) &&
             IsMatching(cp, pp, scaledPercent) &&
             IsMatching(cp, pp + 3, scaledPercent) &&
             IsMatching(cp, cp - 3, scaledPercent) &&
             IsMatching(cp, cp + 3, scaledPercent) &&
             IsMatching(cp, np - 3, scaledPercent) &&
             IsMatching(cp, np, scaledPercent) &&
             IsMatching(cp, np + 3, scaledPercent));
}
private unsafe static bool IsMatching(byte* p1, byte* p2, int thresh)
{
    return Math.Abs(*p1++ - *p2++) < thresh &&
           Math.Abs(*p1++ - *p2++) < thresh &&
           Math.Abs(*p1 - *p2) < thresh;
}
Which now does all kinds of horrible pointer mangling to cut down on array accesses and so on. If all of this pointer work makes you uncomfortable, you can allocate byte arrays for prevLine, currLine, and nextLine and do a Marshal.Copy for each row as you go, as sketched below.
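Here's a minimal sketch of that managed variant (my own illustration, assuming 24bpp; rotating the three row buffers replaces the pointer swaps above):
// requires System.Drawing.Imaging and System.Runtime.InteropServices
BitmapData data = bitmap.LockBits(new Rectangle(0, 0, bitmap.Width, bitmap.Height),
    ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);
int stride = data.Stride;
byte[] prevLine = new byte[stride];
byte[] currLine = new byte[stride];
byte[] nextLine = new byte[stride];
Marshal.Copy(data.Scan0, prevLine, 0, stride);
Marshal.Copy(IntPtr.Add(data.Scan0, stride), currLine, 0, stride);

for (int y = 1; y < bitmap.Height - 1; y++)
{
    // pull in the row below the current one
    Marshal.Copy(IntPtr.Add(data.Scan0, (y + 1) * stride), nextLine, 0, stride);
    for (int x = 1; x < bitmap.Width - 1; x++)
    {
        int i = x * 3; // byte offset of this pixel within its row (24bpp)
        // index prevLine/currLine/nextLine at i - 3, i, i + 3,
        // exactly like pp/cp/np in the pointer version
    }
    // rotate the buffers instead of copying them
    byte[] tmp = prevLine; prevLine = currLine; currLine = nextLine; nextLine = tmp;
}
bitmap.UnlockBits(data);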
The algorithm is this: start one pixel in from the top and left, and iterate over every pixel in the image except the outside edge (no edge conditions! Yay!). I keep pointers to the start of each line: prevLine, currLine, and nextLine. Then when I start the x loop, I set up pp, cp, and np, which are the previous pixel, current pixel, and next pixel. The current pixel is really the one we care about; pp is the pixel directly above it, and np the one directly below it. I pass those into IsEdgeOptimized, which looks around cp, calling IsMatching for each neighbor.
Now this all assumes 24 bits per pixel. If you're looking at 32 bits per pixel, all those magic 3s need to be 4s, but other than that the code doesn't change. You could parameterize the number of bytes per pixel if you want, so it could handle either.
FYI, the channels in the pixels are typically b, g, r, (a).
Colors are stored as bytes in memory. Your actual Bitmap, if it is a 24 bit image is stored as a block of bytes. Scanlines are data.Stride bytes wide, which is at least as large as 3 * the number of pixels in a row (it may be larger because scan lines are often padded).
When I declare a variable of type byte * in C#, I'm doing a few things. First, I'm saying that this variable contains the address of a location of a byte in memory. Second, I'm saying that I'm about to violate all the safety measures in .NET because I could now read and write any byte in memory, which can be dangerous.
So when I have something like:
Math.Abs(*p1++ - *p2++) < thresh
What it says is (and this will be long):
1. Take the byte that p1 points to and hold onto it.
2. Add 1 to p1 (this is the ++; it makes the pointer point to the next byte).
3. Take the byte that p2 points to and hold onto it.
4. Add 1 to p2.
5. Subtract the value from step 3 from the value from step 1.
6. Pass that to Math.Abs.
The reasoning behind this is that, historically, reading the contents of a byte and moving forward is a very common operation, one that many CPUs fold into a couple of instructions that pipeline into roughly a single cycle.
When we enter IsMatching, p1 points to pixel 1, p2 points to pixel 2 and in memory they are laid out like this:
p1 : B
p1 + 1: G
p1 + 2: R
p2 : B
p2 + 1: G
p2 + 2: R
So IsMatching just does the absolute difference while stepping through memory.
Your follow-on question tells me that you don't really understand pointers. That's OK; you can probably learn them. Honestly, the concepts aren't that hard, but without a lot of experience you are quite likely to shoot yourself in the foot. Perhaps you should just run a profiling tool on your code, cool down the worst hot spots, and call it good.
For example, you'll note that I iterate from the second row to the penultimate row, and from the second column to the penultimate column. This is intentional, to avoid having to handle the case of "I can't read above the 0th line". It eliminates a big class of potential bugs that involve reading outside a legal memory block, which may appear benign under many runtime conditions.
Instead of copying each image to a byte[], then copying that to a Color[], creating another temp Color[9] for each pixel, and then using SetPixel to set the color: compile using the /unsafe flag, mark the method as unsafe, and replace the Marshal.Copy into a byte[] with direct pointer access:
byte* bytePtr = (byte*)ptr; // ptr = BitmapData.Scan0
//code goes here
Make sure you replace the SetPixel call with setting the proper bytes. The issue isn't LockBits (you need LockBits); the issue is that you're being inefficient with everything else around processing the image.
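For example, writing one 24bpp pixel directly might look like this (a sketch of the idea; bmp, x, y, and newColor are placeholder names, not from the project):
BitmapData data = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height),
    ImageLockMode.ReadWrite, PixelFormat.Format24bppRgb);
unsafe
{
    byte* px = (byte*)data.Scan0 + y * data.Stride + x * 3; // 3 bytes per pixel
    px[0] = newColor.B; // channels are stored B, G, R
    px[1] = newColor.G;
    px[2] = newColor.R;
}
bmp.UnlockBits(data);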
If you want to use parallel task execution, you can use the Parallel class in the System.Threading.Tasks namespace. The following links have samples and explanations.
http://csharpexamples.com/fast-image-processing-c/
http://msdn.microsoft.com/en-us/library/dd460713%28v=vs.110%29.aspx
You can split the image into 10 bitmaps and process each one, then finally combine them (just an idea); a simpler row-parallel sketch follows.
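You don't even need separate bitmaps for that. A rough sketch (my own, assuming 24bpp) that splits the work by rows over one locked bitmap; each row is touched by exactly one task, so no locking is needed on the pixel data:
BitmapData data = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height),
    ImageLockMode.ReadWrite, PixelFormat.Format24bppRgb);
int stride = data.Stride, width = bmp.Width;
IntPtr scan0 = data.Scan0; // capture the IntPtr, not a raw pointer
Parallel.For(0, bmp.Height, y =>
{
    unsafe
    {
        byte* row = (byte*)scan0 + y * stride;
        for (int x = 0; x < width; x++)
        {
            byte* px = row + x * 3; // B, G, R at 24bpp
            // per-pixel work goes here; rows don't overlap
        }
    }
});
bmp.UnlockBits(data);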
Related
I am new to image processing. I have a portion of an image that I have to search in the whole image by comparing pixels. I need to get the coordinates of the small image present in the complete image.
So, I am doing
for i = 0 to Complete_Image.Width
    for j = 0 to Complete_Image.Height
        for x = 0 to Small_Image.Width
            for y = 0 to Small_Image.Height
                if Complete_Image[i+x][j+y] == Small_Image[x][y]
                    Message "image found at coordinate i, j"
                    Break
It is a simple pixel-matching algorithm that finds a certain portion of an image in a complete image by comparing pixels.
It is very time-consuming: for example, finding the coordinates of a 50 x 50 image in a 1000 x 1000 image takes 1000 x 1000 x 50 x 50 pixel color comparisons.
So:
Is there a better way to do image comparison in C#?
Can I use AMD Radeon 460 GPU to do this comparison thing in parallel? Or at least some part of the algorithm using GPU power?
Anyway, I have run out of time; I might be able to finish the parallel version later.
The premise is to walk across the main image, and when a pixel matches the first pixel of the sub image, do a sub loop that compares the whole sub image.
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static unsafe bool CheckSubImage(int* m0, int* s0, Rectangle mR, Rectangle sR, int x, int y, out Point? result)
{
    result = null;
    // bail out if the sub image would run off the main image
    if (x + sR.Width > mR.Right || y + sR.Height > mR.Bottom)
        return false;
    for (int sX = 0, mX = x; sX < sR.Width; sX++, mX++)
        for (int sY = 0, mY = y; sY < sR.Height; sY++, mY++)
            if (*(m0 + mX + mY * mR.Width) != *(s0 + sX + sY * sR.Width))
                return false;
    result = new Point(x, y);
    return true;
}
protected override unsafe Point? GetPoint(string main, string sub)
{
    using (Bitmap m = new Bitmap(main), s = new Bitmap(sub))
    {
        Rectangle mR = new Rectangle(Point.Empty, m.Size), sR = new Rectangle(Point.Empty, s.Size);
        var mD = m.LockBits(mR, ImageLockMode.ReadOnly, PixelFormat.Format32bppPArgb);
        var sD = s.LockBits(sR, ImageLockMode.ReadOnly, PixelFormat.Format32bppPArgb);
        try
        {
            int* m0 = (int*)mD.Scan0, s0 = (int*)sD.Scan0;
            for (var x = mR.Left; x < mR.Right; x++)
                for (var y = mR.Top; y < mR.Bottom; y++)
                    if (*(m0 + x + y * mR.Width) == *s0)
                        if (CheckSubImage(m0, s0, mR, sR, x, y, out var result))
                            return result;
        }
        finally
        {
            // unlock even when we return early on a match
            m.UnlockBits(mD);
            s.UnlockBits(sD);
        }
    }
    return null;
}
Usage
var result = GetPoint(@"D:\TestImages\Main.bmp", @"D:\TestImages\1159-980.bmp");
The result is about 100 times faster than the simple 4-loop approach you had.
Note: this uses the unsafe keyword, so you will have to set the project to allow unsafe code.
Disclaimer: this could be optimized further, it could be done in parallel, and it would obviously be faster on the GPU. The point is that it's the algorithm that matters, not the processor.
Yep, you can use the GPU from C#.
cmsoft has a tutorial for their library here.
You will need to write some instructions in OpenCL.
You may also need to check that you have a driver/runtime for OpenCL (or for AMD).
The code is mostly boilerplate stuff and pretty straightforward. You may spend more time getting the dependencies installed than writing the code.
This method is called correlation, and it can be accelerated by the FFT.
Correlation can be converted to convolution by rotating the kernel. So if both Complete_Image and Small_Image are padded with enough zeros and Small_Image is rotated by 180 degrees, then the IFFT of the product of the FFTs of the two images computes the entire correlation image in O(n log n), where n is the size of the image (Length * Height). The naive correlation has order of nearly n².
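In symbols, the standard identity (with $\star$ denoting cross-correlation, $\mathcal{F}$ the Fourier transform, and the bar complex conjugation, both images zero-padded to a common size) is:

$$f \star g = \mathcal{F}^{-1}\left\{\overline{\mathcal{F}\{f\}} \cdot \mathcal{F}\{g\}\right\}$$

The peak of the resulting correlation image then gives the best-match coordinates.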
I am working on photo software for desktop PCs that runs on Windows 8. I would like to be able to remove the green background from a photo by means of chroma keying.
I'm a beginner in image manipulation. I found some cool links (like http://www.quasimondo.com/archives/000615.php), but I can't translate them into C# code.
I'm using a webcam (with AForge.NET) to see a preview and take a picture.
I tried color filters, but the green background isn't really uniform, so this doesn't work.
How do I do this properly in C#?
It will work even if the background isn't uniform; you just need a strategy that is generous enough to grab all of your greenscreen without replacing anything else.
Since at least some links on your linked page are dead, I tried my own approach:
The basics are simple: Compare the image pixel's color with some reference value or apply some other formula to determine whether it should be transparent/replaced.
The most basic formula would involve something as simple as "determine whether green is the biggest value". While this would work with very basic scenes, it can screw you up (e.g. white or gray will be filtered as well).
I've toyed around a bit with some simple sample code. While I used Windows Forms, it should port without problems, and I'm pretty sure you'll be able to interpret the code. Just note that this isn't necessarily the most performant way to do it.
Bitmap input = new Bitmap(@"G:\Greenbox.jpg");
Bitmap output = new Bitmap(input.Width, input.Height);

// Iterate over all pixels from top to bottom...
for (int y = 0; y < output.Height; y++)
{
    // ...and from left to right
    for (int x = 0; x < output.Width; x++)
    {
        // Determine the pixel color
        Color camColor = input.GetPixel(x, y);

        // Every component (red, green, and blue) can have a value from 0 to 255, so determine the extremes
        byte max = Math.Max(Math.Max(camColor.R, camColor.G), camColor.B);
        byte min = Math.Min(Math.Min(camColor.R, camColor.G), camColor.B);

        // Should the pixel be masked/replaced?
        bool replace =
            camColor.G != min // green is not the smallest value
            && (camColor.G == max // green is the biggest value
                || max - camColor.G < 8) // or at least almost the biggest value
            && (max - min) > 96; // minimum difference between smallest/biggest value (avoid grays)

        if (replace)
            camColor = Color.Magenta;

        // Set the output pixel
        output.SetPixel(x, y, camColor);
    }
}
I've used an example image from Wikipedia and got the following result:
Just note that you might need different thresholds (8 and 96 in my code above); you might even want to use a different term to determine whether a pixel should be replaced. You can also add smoothing between frames, blending (where there's less green difference), etc. to reduce the hard edges.
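For example, blending could replace the hard magenta mask by scaling alpha with how dominant green is. A sketch of an alternative loop body (my own illustration; the lo/hi thresholds are arbitrary placeholders):
// Soft key: fade alpha in proportion to green dominance instead of a
// hard replace (output must be 32bpp ARGB, which new Bitmap(w, h) is).
int greenness = camColor.G - Math.Max(camColor.R, camColor.B);
const int lo = 16, hi = 96;  // arbitrary example thresholds
int key = Math.Min(Math.Max(greenness - lo, 0) * 255 / (hi - lo), 255);
// key == 0 keeps the pixel fully, key == 255 keys it out entirely
output.SetPixel(x, y, Color.FromArgb(255 - key, camColor.R, camColor.G, camColor.B));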
I've tried Mario's solution and it worked perfectly, but it's a bit slow for me.
I looked for a different solution and found a project that uses a more efficient method here.
Github postworthy GreenScreen
That project takes a folder and processes all its files; I just needed an image, so I did this:
private Bitmap RemoveBackground(Bitmap input)
{
    Bitmap clone = new Bitmap(input.Width, input.Height, PixelFormat.Format32bppArgb);

    using (input)
    using (Graphics gr = Graphics.FromImage(clone))
    {
        gr.DrawImage(input, new Rectangle(0, 0, clone.Width, clone.Height));
    }

    var data = clone.LockBits(new Rectangle(0, 0, clone.Width, clone.Height), ImageLockMode.ReadWrite, clone.PixelFormat);
    var bytes = Math.Abs(data.Stride) * clone.Height;
    byte[] rgba = new byte[bytes];
    System.Runtime.InteropServices.Marshal.Copy(data.Scan0, rgba, 0, bytes);

    var pixels = Enumerable.Range(0, rgba.Length / 4).Select(x => new
    {
        B = rgba[x * 4],
        G = rgba[(x * 4) + 1],
        R = rgba[(x * 4) + 2],
        A = rgba[(x * 4) + 3],
        MakeTransparent = new Action(() => rgba[(x * 4) + 3] = 0)
    });

    pixels
        .AsParallel()
        .ForAll(p =>
        {
            byte max = Math.Max(Math.Max(p.R, p.G), p.B);
            byte min = Math.Min(Math.Min(p.R, p.G), p.B);
            if (p.G != min && (p.G == max || max - p.G < 7) && (max - min) > 20)
                p.MakeTransparent();
        });

    System.Runtime.InteropServices.Marshal.Copy(rgba, 0, data.Scan0, bytes);
    clone.UnlockBits(data);
    return clone;
}
Do not forget to dispose of your input Bitmap and of the return value of this method.
If you need to save the image, just use the Save method of Bitmap.
clone.Save(#"C:\your\folder\path", ImageFormat.Png);
Here you can find methods to process an image even faster: Fast Image Processing in C#
Chromakey on a photo should assume an analog input. In the real world, exact values are very rare.
How do you compensate for this? Provide a threshold around the green of your choice in both hue and tone. Any colour within this threshold (inclusive) should be replaced by your chosen background; transparent may be best. In the first link, the Mask In and Mask Out parameters achieve this. The pre and post blur parameters attempt to make the background more uniform to reduce encoding noise side effects so that you can use a narrower (preferred) threshold.
For performance, you may want to write a pixel shader to zap the 'green' to transparent, but that is a consideration for after you get it working.
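A minimal sketch of such a hue/tone threshold in C# (my own illustration, not from the linked page; pure green sits at a hue of about 120°, and all the tolerances are placeholders to tune):
// True if the pixel is close enough to green in hue, and saturated and
// bright enough that it is plausibly greenscreen rather than gray.
static bool IsGreenKey(Color c, float hueTol = 35f, float minSat = 0.3f, float minBright = 0.2f)
{
    float hue = c.GetHue();            // 0..360, pure green ~120
    float sat = c.GetSaturation();     // 0..1
    float bright = c.GetBrightness();  // 0..1
    return Math.Abs(hue - 120f) <= hueTol && sat >= minSat && bright >= minBright;
}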
I'm a newbie to using OpenCL (with the OpenCL.NET library) with Visual Studio C#, and am currently working on an application that computes a large 3D matrix. At each pixel in the matrix, 192 unique values are computed and then summed to yield the final value for that pixel. So, functionally, it is like a 4-D matrix, (161 x 161 x 161) x 192.
Right now I'm calling the kernel from my host code like this:
//C# host code
...
float[] BigMatrix = new float[161*161*161]; //1-D result array
CLCalc.Program.Variable dev_BigMatrix = new CLCalc.Program.Variable(BigMatrix);
CLCalc.Program.Variable dev_OtherArray = new CLCalc.Program.Variable(otherArray);
//...load some other variables here too.
CLCalc.Program.Variable[] args = new CLCalc.Program.Variable[7] { /* stuff... */ };
//Here, I execute the kernel, with a 2-dimensional worker pool:
BigMatrixCalc.Execute(args, new int[2]{N*N*N,192});
dev_BigMatrix.ReadFromDeviceTo(BigMatrix);
Sample kernel code is posted below.
__kernel void MyKernel(
    __global float * BigMatrix,
    __global float * otherArray
    //various other variables...
)
{
    int N = 161; //Size of matrix edges
    int pixel_id = get_global_id(0); //The location of the pixel in the 1D array
    int array_id = get_global_id(1); //The location within otherArray

    //Finding the x,y,z values of the pixel_id.
    float3 p;
    p.x = pixel_id % N;
    p.y = ((pixel_id % (N*N)) - p.x) / N;
    p.z = (pixel_id - p.x - p.y*N) / (N*N);

    float result;
    //...
    //Some long calculation for 'result' involving otherArray and p...
    //...
    BigMatrix[pixel_id] += result;
}
My code currently works; however, I'm looking for more speed, and I'm not sure if my worker/group setup is the best approach (i.e. 161*161*161 by 192 for the dimensions of the worker pool).
I've seen other examples of organizing the global worker pool into local worker groups to increase efficiency, but I'm not quite sure how to implement that in OpenCL.NET. I'm also not sure how this is different than just creating another dimension in the worker pool.
So, my question is: Can I use local groups here, and if so how would I organize them? In general, how is using local groups different than just calling an n-dimensional worker pool? (i.e. calling Execute(args, new int[]{(N*N*N),192}), versus having a local workgroup size of 192?)
Thanks for all the help!
I think a lot of performance is lost waiting on memory access. I have answered a similar SO question. I hope my post helps you out. Please ask any questions you have.
Optimizations:
- The big boost in my version of your kernel comes from reading otherArray into local memory.
- Each work item computes 4 values in BigMatrix. This means they can be written at the same time, on the same cache line. There is minimal loss of parallelism, because there are still over 1M work items to execute.
...
#define N 161
#define Nsqr (N*N)  /* parenthesized so (x % Nsqr) expands correctly */
#define Ncub (N*N*N)
#define otherSize 192

__kernel void MyKernel(__global float * BigMatrix, __global float * otherArray)
{
    //using 1 quarter of the total size of the matrix
    //this work item will be responsible for computing 4 consecutive values in BigMatrix
    //also reduces global size to (N^3)/4 ~= 1043000 for N=161
    int global_id = get_global_id(0) * 4; //The location of the first pixel in the 1D array
    int pixel_id;
    //array_id won't be used anymore. work items will process BigMatrix[pixel_id] entirely
    int local_id = get_local_id(0); //work item id within the group
    int local_size = get_local_size(0); //size of group
    float result[4]; //result cached for 4 global values
    int i, j;
    float3 p;

    //cache the values in otherArray to local memory
    //now each work item in the group will be able to read the values efficiently
    //each element in otherArray will be read a total of N^3 times, so this is important
    //opencl specifies at least 16kb of local memory, so up to 4k floats will work fine
    __local float otherValues[otherSize];
    for(i = local_id; i < otherSize; i += local_size){
        otherValues[i] = otherArray[i];
    }
    barrier(CLK_LOCAL_MEM_FENCE); //barrier (not mem_fence) so the whole group sees the cached values

    //now this work item can compute the complete result for pixel_id
    for(j = 0; j < 4; j++){
        result[j] = 0;
        pixel_id = global_id + j;

        //Finding the x,y,z values of the pixel_id.
        //TODO: optimize the calculation of p.y and p.z
        //they will be the same most of the time for a given work item
        p.x = pixel_id % N;
        p.y = ((pixel_id % Nsqr) - p.x) / N;
        p.z = (pixel_id - p.x - p.y*N) / Nsqr;

        for(i = 0; i < otherSize; i++){
            //...
            //Some long calculation for 'result' involving otherValues[i] and p...
            //...
            //result[j] += ...
        }
    }
    //4 consecutive writes to BigMatrix will fall in the same cacheline (faster)
    BigMatrix[global_id] += result[0];
    BigMatrix[global_id + 1] += result[1];
    BigMatrix[global_id + 2] += result[2];
    BigMatrix[global_id + 3] += result[3];
}
Notes:
- The global work size needs to be a multiple of four, ideally a multiple of 4 * workgroupsize, because there is no error checking to see whether each pixel_id falls within the range 0..N^3-1. Unprocessed elements can be crunched by the CPU while you wait for the kernel to execute.
- The work group size should be fairly large. This will force the cached values to be used more heavily, and the benefit of caching the data in LDS will grow.
- There is a further optimization to be done with the calculation of p.x/y/z, to avoid too many costly division and modulo operations. See the code below.
__kernel void MyKernel(__global float * BigMatrix, __global float * otherArray)
{
    int global_id = get_global_id(0) * 4; //The location of the first pixel in the 1D array
    int pixel_id = global_id;
    int local_id = get_local_id(0); //work item id within the group
    int local_size = get_local_size(0); //size of group
    float result[4]; //result cached for 4 global values
    int i, j;
    float3 p;

    //Finding the initial x,y,z values of the pixel_id.
    p.x = pixel_id % N;
    p.y = ((pixel_id % Nsqr) - p.x) / N;
    p.z = (pixel_id - p.x - p.y*N) / Nsqr;

    //cache the values here. same as above...

    //now this work item can compute the complete result for pixel_id
    for(j = 0; j < 4; j++){
        result[j] = 0;

        for(i = 0; i < otherSize; i++){
            //same i loop as above...
        }

        //increment the x, y, and z values instead of computing them all from scratch
        //(done after the i loop, so the first iteration uses the values for global_id)
        p.x += 1;
        if(p.x >= N){
            p.x = 0;
            p.y += 1;
            if(p.y >= N){
                p.y = 0;
                p.z += 1;
            }
        }
    }
}
I have a few suggestions for you:
- I think your code has a race condition: the last line has the same element of BigMatrix being modified by multiple different work items.
- If your matrix is truly 161x161x161, there are plenty of work items here to use those dimensions as your only dimensions. You already have more than 4 million work items, which should be plenty of parallelism for your machine; you don't need 192 times that. Plus, if you don't split the computation of an individual pixel across multiple work items, you won't need to synchronize the final add.
- If your global work size is not a nice multiple of a big power of 2, you might try padding it out so that it becomes one (see the sketch below). Even if you pass NULL as your local work size, some OpenCL implementations choose inefficient local sizes for global sizes that don't divide well.
- If you don't need local memory or barriers for your algorithm, you can pretty much skip local workgroups.
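Rounding the global size up is simple integer arithmetic; for example (a sketch, with the kernel expected to check its id against the real total and return early on the padded tail):
int total = N * N * N;    // real number of work items
int localSize = 64;       // whatever work-group size you settle on
// round total up to the next multiple of localSize
int globalSize = ((total + localSize - 1) / localSize) * localSize;
// kernel side: if (get_global_id(0) >= total) return;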
Hope this helps!
I was reading the "C# 3.0 in a Nutshell" book and came across the following piece of code, which interested me.
unsafe void RedFilter(int[,] bitmap)
{
    int length = bitmap.Length;
    fixed (int* b = bitmap)
    {
        int* p = b;
        for (int i = 0; i < length; i++)
            *p++ &= 0xFF;
    }
}
Could anyone explain to me how "*p++ &= 0xFF" works?
The function is presumably meant to take a bitmap image, and filter out all the colors except red.
This assumes that it's a 32-bit bitmap, where each pixel is represented by an int.
You're dereferencing the memory location currently pointed to by p (which is an int), and ANDing it with 0xFF, which effectively leaves only the red component of the pixel (assuming that the lowest byte is the red component). You're also automatically incrementing the pointer to the next int (with ++). Does that answer it?
It's the same as this (IMO the original *p++ &= 0xFF; trick is a little nasty -- it's one line of code that's doing two things):
*p = *p & 0xFF;
p++;
An expression like a = a & 0xFF sets all but the bottom 8 bits of variable a to zero.
It's the kind of syntax you'd find in the C language, frowned upon there as well but not uncommon. Written out:
int temp = *p;
temp = temp & 0x000000ff;
*p = temp;
p = p + 1; // Move pointer by 4 bytes.
Whether this works out well depends on the bitmap format. Commonly, the code also resets the alpha of the pixels to zero, producing a blank image.
This code masks off everything but the least significant byte of the pixel p points to, and increments p so it points to the next pixel in the bitmap. The pixel data, in Microsoft BMP format (and I assume in other nonstandard implementations), is in BGR order.
This has the effect of removing all of the color components except for red.
As you probably know,
x &= y
is the same as
x = x & y
An int ANDed with 0xff has all the bits in its upper 3 bytes set to zero (a mask).
The asterisk dereferences the pointer, so
*p
is an integer.
The post-increment updates the pointer itself to point to the next integer in memory.
So in total, the code runs through 'length' integers in memory and masks out all but the lowest byte of each (i.e., the 'red' byte if these are [A]BGR colour values).
I'm trying to write a program to programmatically determine the tilt or angle of rotation in an arbitrary image.
Images have the following properties:
Consist of dark text on a light background
Occasionally contain horizontal or vertical lines which only intersect at 90 degree angles.
Skewed between -45 and 45 degrees.
See this image as a reference (it's been skewed 2.8 degrees).
So far, I've come up with this strategy: Draw a route from left to right, always selecting the nearest white pixel. Presumably, the route from left to right will prefer to follow the path between lines of text along the tilt of the image.
Here's my code:
private bool IsWhite(Color c) { return c.GetBrightness() >= 0.5 || c == Color.Transparent; }
private bool IsBlack(Color c) { return !IsWhite(c); }
private double ToDegrees(decimal slope) { return (180.0 / Math.PI) * Math.Atan(Convert.ToDouble(slope)); }

private void GetSkew(Bitmap image, out double minSkew, out double maxSkew)
{
    decimal minSlope = 0.0M;
    decimal maxSlope = 0.0M;
    for (int start_y = 0; start_y < image.Height; start_y++)
    {
        int end_y = start_y;
        for (int x = 1; x < image.Width; x++)
        {
            int above_y = Math.Max(end_y - 1, 0);
            int below_y = Math.Min(end_y + 1, image.Height - 1);

            Color center = image.GetPixel(x, end_y);
            Color above = image.GetPixel(x, above_y);
            Color below = image.GetPixel(x, below_y);

            if (IsWhite(center)) { /* no change to end_y */ }
            else if (IsWhite(above) && IsBlack(below)) { end_y = above_y; }
            else if (IsBlack(above) && IsWhite(below)) { end_y = below_y; }
        }

        decimal slope = (Convert.ToDecimal(start_y) - Convert.ToDecimal(end_y)) / Convert.ToDecimal(image.Width);
        minSlope = Math.Min(minSlope, slope);
        maxSlope = Math.Max(maxSlope, slope);
    }

    minSkew = ToDegrees(minSlope);
    maxSkew = ToDegrees(maxSlope);
}
This works well on some images, not so well on others, and it's slow.
Is there a more efficient, more reliable way to determine the tilt of an image?
I've made some modifications to my code, and it certainly runs a lot faster, but it's not very accurate.
I've made the following improvements:
Using Vinko's suggestion, I avoid GetPixel in favor of working with bytes directly; now the code runs at the speed I needed.
My original code simply used "IsBlack" and "IsWhite", but this isn't granular enough. The original code traces the following paths through the image:
http://img43.imageshack.us/img43/1545/tilted3degtextoriginalw.gif
Note that a number of paths pass through the text. Instead, I now compare the center, above, and below positions by their actual brightness values and select the brightest pixel. Basically, I'm treating the bitmap as a heightmap, and the path from left to right follows the contours of the image, resulting in a better path:
http://img10.imageshack.us/img10/5807/tilted3degtextbrightnes.gif
As suggested by Toaomalkster, a Gaussian blur smooths out the height map, and I get even better results:
http://img197.imageshack.us/img197/742/tilted3degtextblurredwi.gif
Since this is just prototype code, I blurred the image using GIMP; I did not write my own blur function.
The selected path is pretty good for a greedy algorithm.
As Toaomalkster suggested, choosing the min/max slope is naive; a simple linear regression provides a better approximation of the slope of a path. Additionally, I should cut a path short once it runs off the edge of the image; otherwise the path will hug the top of the image and give an incorrect slope.
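For reference, the least-squares slope over the path points $(x_i, y_i)$ (the same formula as the comment in the code below, with $N$ the number of points in the set) is:

$$m = \frac{N\sum x_i y_i - \sum x_i \sum y_i}{N\sum x_i^2 - \left(\sum x_i\right)^2}$$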
Code
private double ToDegrees(double slope) { return (180.0 / Math.PI) * Math.Atan(slope); }

private double GetSkew(Bitmap image)
{
    BrightnessWrapper wrapper = new BrightnessWrapper(image);
    LinkedList<double> slopes = new LinkedList<double>();

    for (int y = 0; y < wrapper.Height; y++)
    {
        int endY = y;

        long sumOfX = 0;
        long sumOfY = y;
        long sumOfXY = 0;
        long sumOfXX = 0;
        int itemsInSet = 1;
        for (int x = 1; x < wrapper.Width; x++)
        {
            int aboveY = endY - 1;
            int belowY = endY + 1;
            if (aboveY < 0 || belowY >= wrapper.Height)
            {
                break; // path ran off the edge of the image
            }

            int center = wrapper.GetBrightness(x, endY);
            int above = wrapper.GetBrightness(x, aboveY);
            int below = wrapper.GetBrightness(x, belowY);

            if (center >= above && center >= below) { /* no change to endY */ }
            else if (above >= center && above >= below) { endY = aboveY; }
            else if (below >= center && below >= above) { endY = belowY; }

            itemsInSet++;
            sumOfX += x;
            sumOfY += endY;
            sumOfXX += (x * x);
            sumOfXY += (x * endY);
        }

        // least squares slope = (NΣ(XY) - (ΣX)(ΣY)) / (NΣ(X^2) - (ΣX)^2), where N = elements in set
        if (itemsInSet > image.Width / 2) // path covers at least half of the image
        {
            decimal sumOfX_d = Convert.ToDecimal(sumOfX);
            decimal sumOfY_d = Convert.ToDecimal(sumOfY);
            decimal sumOfXY_d = Convert.ToDecimal(sumOfXY);
            decimal sumOfXX_d = Convert.ToDecimal(sumOfXX);
            decimal itemsInSet_d = Convert.ToDecimal(itemsInSet);
            decimal slope =
                ((itemsInSet_d * sumOfXY_d) - (sumOfX_d * sumOfY_d))
                /
                ((itemsInSet_d * sumOfXX_d) - (sumOfX_d * sumOfX_d));

            slopes.AddLast(Convert.ToDouble(slope));
        }
    }

    double mean = slopes.Average();
    double sumOfSquares = slopes.Sum(d => Math.Pow(d - mean, 2));
    double stddev = Math.Sqrt(sumOfSquares / (slopes.Count - 1));

    // select items within 1 standard deviation of the mean
    var testSample = slopes.Where(x => Math.Abs(x - mean) <= stddev);
    return ToDegrees(testSample.Average());
}

class BrightnessWrapper
{
    byte[] rgbValues;
    int stride;
    public int Height { get; private set; }
    public int Width { get; private set; }

    public BrightnessWrapper(Bitmap bmp)
    {
        Rectangle rect = new Rectangle(0, 0, bmp.Width, bmp.Height);

        System.Drawing.Imaging.BitmapData bmpData =
            bmp.LockBits(rect,
                System.Drawing.Imaging.ImageLockMode.ReadOnly,
                bmp.PixelFormat);

        IntPtr ptr = bmpData.Scan0;
        int bytes = bmpData.Stride * bmp.Height;
        this.rgbValues = new byte[bytes];
        System.Runtime.InteropServices.Marshal.Copy(ptr, rgbValues, 0, bytes);
        bmp.UnlockBits(bmpData); // we copied the data, so release the lock

        this.Height = bmp.Height;
        this.Width = bmp.Width;
        this.stride = bmpData.Stride;
    }

    public int GetBrightness(int x, int y)
    {
        int position = (y * this.stride) + (x * 3); // assumes 24bpp data
        int b = rgbValues[position];
        int g = rgbValues[position + 1];
        int r = rgbValues[position + 2];
        return (r + r + b + g + g + g) / 6;
    }
}
The code is good, but not great. Large amounts of whitespace cause the program to trace a relatively flat line, resulting in a slope near 0, which makes the code underestimate the actual tilt of the image.
There is no appreciable difference in the accuracy of the tilt between selecting random sample points and sampling all points, because the ratio of "flat" paths selected by random sampling is the same as the ratio of "flat" paths in the entire image.
GetPixel is slow. You can get an order of magnitude speed up using the approach listed here.
If the text is left (or right) aligned, you can determine the slope by measuring the distance between the left (right) edge of the image and the first dark pixel in two random places, and calculate the slope from that, as sketched below. Additional measurements would lower the error while taking additional time.
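A sketch of that calculation (my own; FirstDarkX is a hypothetical helper that scans a row from the left edge for the first dark pixel):
// slope from two left-margin measurements at rows y1 and y2
int y1 = image.Height / 4, y2 = (3 * image.Height) / 4;
int d1 = FirstDarkX(image, y1); // distance from the left edge to the first dark pixel
int d2 = FirstDarkX(image, y2);
double slope = (double)(d2 - d1) / (y2 - y1);
double degrees = (180.0 / Math.PI) * Math.Atan(slope);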
First, I must say I like the idea. But I've never had to do this before, and I'm not sure what to suggest to improve reliability. The first thing I can think of is the idea of throwing out statistical anomalies: if the slope suddenly changes sharply, you know you've found a white section of the image that dips into the edge, skewing (no pun intended) your results. So you'd want to throw that stuff out somehow.
But from a performance standpoint, there are a number of optimizations you could make, and they may add up.
Namely, I'd change this snippet in your inner loop from this:
Color center = image.GetPixel(x, end_y);
Color above = image.GetPixel(x, above_y);
Color below = image.GetPixel(x, below_y);
if (IsWhite(center)) { /* no change to end_y */ }
else if (IsWhite(above) && IsBlack(below)) { end_y = above_y; }
else if (IsBlack(above) && IsWhite(below)) { end_y = below_y; }
To this:
Color center = image.GetPixel(x, end_y);

if (IsWhite(center)) { /* no change to end_y */ }
else
{
    Color above = image.GetPixel(x, above_y);
    Color below = image.GetPixel(x, below_y);

    if (IsWhite(above) && IsBlack(below)) { end_y = above_y; }
    else if (IsBlack(above) && IsWhite(below)) { end_y = below_y; }
}
It's the same effect but should drastically reduce the number of calls to GetPixel.
Also consider putting the values that don't change into variables before the madness begins. Things like image.Height and image.Width have a slight overhead every time you call them, so store those values in your own variables before the loops begin. The thing I always tell myself when dealing with nested loops is to optimize everything inside the innermost loop at the expense of everything else.
Also... as Vinko Vrsalovic suggested, you may look at his GetPixel alternative for yet another boost in speed.
At first glance, your code looks overly naive, which explains why it doesn't always work. I like the approach Steve Wortham suggested, but it might run into problems if you have background images.
Another approach that often helps with images is to blur them first. If you blur your example image enough, each line of text will end up as a blurry smooth line. You then apply some sort of algorithm to basically do a regression analysis. There are lots of ways to do that, and lots of examples on the net. Edge detection might be useful, or it might cause more problems than it's worth.
By the way, a Gaussian blur can be implemented very efficiently if you search hard enough for the code. Otherwise, I'm sure there are lots of libraries available. I haven't done much of this lately, so I don't have any links on hand, but a search for an image processing library will get you good results.
I'm assuming you're enjoying the fun of solving this, so not much in the way of actual implementation details here.
Measuring the angle of every line seems like overkill, especially given the performance of GetPixel.
I wonder if you would have better luck looking for a white triangle in the upper-left or upper-right corner (depending on the slant direction) and measuring the angle of the hypotenuse. All text should follow the same angle on the page, and the upper-left corner of a page won't get tricked by the descenders or whitespace of content above it.
Another tip to consider: rather than blurring, work within a greatly-reduced resolution. That will give you both the smoother data you need, and fewer GetPixel calls.
For example, I once made a blank-page detection routine in .NET for faxed TIFF files that simply resampled the entire page to a single pixel and tested that value against a white threshold.
What are your constraints in terms of time?
The Hough transform is a very effective mechanism for determining the skew angle of an image. It can be costly in time, but if you're going to use a Gaussian blur, you're already burning a pile of CPU time. There are also other ways to accelerate the Hough transform that involve creative image sampling; a sketch follows.
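For reference, a bare-bones accumulator restricted to the ±45° range might look like this (my own sketch, reusing the BrightnessWrapper from your update; the angle step and dark threshold are placeholders, and precomputing the sin/cos tables keeps the inner loop cheap):
static double DetectSkewHough(BrightnessWrapper w)
{
    const double maxDeg = 45.0, stepDeg = 0.25;
    int angleBins = (int)(2 * maxDeg / stepDeg) + 1;
    double[] sinT = new double[angleBins], cosT = new double[angleBins];
    for (int a = 0; a < angleBins; a++)
    {
        double theta = (-maxDeg + a * stepDeg) * Math.PI / 180.0;
        sinT[a] = Math.Sin(theta);
        cosT[a] = Math.Cos(theta);
    }

    int maxRho = w.Width + w.Height; // |x*sin + y*cos| can never exceed this
    int[,] acc = new int[angleBins, 2 * maxRho + 1];

    for (int y = 0; y < w.Height; y++)
        for (int x = 0; x < w.Width; x++)
        {
            if (w.GetBrightness(x, y) >= 128) continue; // only dark pixels vote
            for (int a = 0; a < angleBins; a++)
            {
                int rho = (int)Math.Round(x * sinT[a] + y * cosT[a]);
                acc[a, rho + maxRho]++;
            }
        }

    // the angle whose single strongest line collected the most votes wins
    int bestA = 0, best = -1;
    for (int a = 0; a < angleBins; a++)
        for (int r = 0; r < 2 * maxRho + 1; r++)
            if (acc[a, r] > best) { best = acc[a, r]; bestA = a; }

    return -maxDeg + bestA * stepDeg; // degrees of skew
}
Sampling only a fraction of the dark pixels, or only a horizontal band of the image, cuts the cost dramatically; that's the creative-sampling idea.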
Your latest output is confusing me a little.
When you superimposed the blue lines on the source image, did you offset them a bit? It looks like the blue lines are about 5 pixels above the centre of the text.
Not sure about that offset, but you definitely have a problem with the derived line "drifting" away at the wrong angle. It seems to have too strong a bias towards producing a horizontal line.
I wonder if increasing your mask window from 3 pixels (centre, one above, one below) to 5 (two above, two below) might improve this. You'll also get this effect if you follow richardtallent's suggestion and resample the image smaller.
Very cool path finding application.
I wonder if this other approach would help or hurt with your particular data set.
Assume a black and white image:
1. Project all black pixels to the right (EAST). This should give a one-dimensional array with a size of IMAGE_HEIGHT. Call the array CANVAS.
2. As you project all the pixels EAST, keep track numerically of how many pixels project into each bin of CANVAS.
3. Rotate the image an arbitrary number of degrees and re-project.
4. Pick the result that gives the highest peaks and lowest valleys for values in CANVAS.
I imagine this will not work well if you really do have to account for -45 to +45 degrees of tilt. If the actual range is smaller (say +/- 10 degrees), this might be a pretty good strategy. Once you have an initial result, you could re-run with a smaller increment of degrees to fine-tune the answer. I might therefore write this as a function that accepts a float degree_tick as a parameter, so I could run both a coarse and a fine pass (or a spectrum of coarseness or fineness) with the same code.
This might be computationally expensive. To optimize, you might consider selecting just a portion of the image to project-test-rotate-repeat on; a sketch of the scoring step follows.
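Here's a sketch of that scoring step in C# (my own illustration; rather than rotating the bitmap, it shears each dark pixel's projection by tan(angle), which is equivalent for this purpose, and it reuses the BrightnessWrapper class from the question's update):
// Score one candidate angle: project every dark pixel EAST along that angle
// into CANVAS bins; a spiky profile (high peaks, deep valleys) scores high.
static double ProjectionScore(BrightnessWrapper w, double degrees)
{
    double t = Math.Tan(degrees * Math.PI / 180.0);
    int offset = w.Width; // keeps bin indices non-negative for |angle| <= 45
    int[] canvas = new int[w.Height + 2 * w.Width];
    for (int y = 0; y < w.Height; y++)
        for (int x = 0; x < w.Width; x++)
            if (w.GetBrightness(x, y) < 128) // dark pixel
                canvas[(int)Math.Round(y - x * t) + offset]++;
    double score = 0; // sum of squares rewards concentration into few bins
    foreach (int c in canvas) score += (double)c * c;
    return score;
}

// Coarse pass, then a fine pass around the winner (the degree_tick idea).
static double BestAngle(BrightnessWrapper w, double lo, double hi, double tick)
{
    double best = lo, bestScore = double.MinValue;
    for (double a = lo; a <= hi; a += tick)
    {
        double s = ProjectionScore(w, a);
        if (s > bestScore) { bestScore = s; best = a; }
    }
    return best;
}
Usage would be something like BestAngle(w, -45, 45, 1.0) for the coarse pass, then BestAngle(w, coarse - 1, coarse + 1, 0.1) to fine-tune.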