I am working on photo software for desktop PCs running Windows 8. I would like to be able to remove the green background from a photo by means of chroma keying.
I'm a beginner in image manipulation. I found some cool links (like http://www.quasimondo.com/archives/000615.php), but I can't translate them into C# code.
I'm using a webcam (with aforge.net) to see a preview and take a picture.
I tried color filters but the green background isn't really uniform, so this doesn't work.
How can I do that properly in C#?
It will work even if the background isn't uniform; you just need a strategy that is generous enough to grab all of your green screen without replacing anything else.
Since at least some links on your linked page are dead, I tried my own approach:
The basics are simple: Compare the image pixel's color with some reference value or apply some other formula to determine whether it should be transparent/replaced.
The most basic formula would involve something as simple as "determine whether green is the biggest value". While this would work with very basic scenes, it can screw you up (e.g. white or gray will be filtered as well).
I've toyed around a bit using some simple sample code. While I used Windows Forms, it should be portable without problems and I'm pretty sure you'll be able to interpret the code. Just note that this isn't necessarily the most performant way to do this.
Bitmap input = new Bitmap(@"G:\Greenbox.jpg");
Bitmap output = new Bitmap(input.Width, input.Height);

// Iterate over all pixels from top to bottom...
for (int y = 0; y < output.Height; y++)
{
    // ...and from left to right
    for (int x = 0; x < output.Width; x++)
    {
        // Determine the pixel color
        Color camColor = input.GetPixel(x, y);

        // Every component (red, green, and blue) can have a value from 0 to 255, so determine the extremes
        byte max = Math.Max(Math.Max(camColor.R, camColor.G), camColor.B);
        byte min = Math.Min(Math.Min(camColor.R, camColor.G), camColor.B);

        // Should the pixel be masked/replaced?
        bool replace =
            camColor.G != min // green is not the smallest value
            && (camColor.G == max // green is the biggest value
                || max - camColor.G < 8) // or at least almost the biggest value
            && (max - min) > 96; // minimum difference between smallest/biggest value (avoid grays)

        if (replace)
            camColor = Color.Magenta;

        // Set the output pixel
        output.SetPixel(x, y, camColor);
    }
}
I've used an example image from Wikipedia and got the following result:
Just note that you might need different thresholds (8 and 96 in my code above), and you might even want to use a different formula to decide whether a pixel should be replaced. You can also add smoothing between frames, blending (where there's less green difference), etc. to reduce the hard edges.
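For the blending part, a rough and untested sketch might derive a soft alpha from how strongly green dominates, instead of the hard yes/no above; the 32 and 96 below are made-up thresholds to tune:

// Untested sketch: fade pixels out based on how strongly green dominates,
// instead of replacing them outright. 32 and 96 are arbitrary thresholds to tune.
Color Blend(Color camColor)
{
    byte max = Math.Max(Math.Max(camColor.R, camColor.G), camColor.B);
    byte min = Math.Min(Math.Min(camColor.R, camColor.G), camColor.B);
    int dominance = camColor.G - Math.Max(camColor.R, camColor.B); // how "green" the pixel is

    if (dominance <= 32 || (max - min) <= 96)
        return camColor; // treat as foreground, keep it unchanged

    // Map dominance 32..96 onto alpha 255..0 so edges fade out instead of cutting hard
    int alpha = 255 - Math.Min(255, (dominance - 32) * 255 / 64);
    return Color.FromArgb(alpha, camColor.R, camColor.G, camColor.B);
}

Since the output Bitmap created above defaults to a 32bpp ARGB format, SetPixel keeps the alpha value, and saving as PNG preserves the transparency.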
I've tried Mario's solution and it worked perfectly, but it's a bit slow for me.
I looked for a different solution and found a project that uses a more efficient method here:
GitHub: postworthy GreenScreen
That project takes a folder and processes all of its files; I just need an image, so I did this:
private Bitmap RemoveBackground(Bitmap input)
{
    Bitmap clone = new Bitmap(input.Width, input.Height, PixelFormat.Format32bppArgb);

    using (input)
    using (Graphics gr = Graphics.FromImage(clone))
    {
        gr.DrawImage(input, new Rectangle(0, 0, clone.Width, clone.Height));
    }

    var data = clone.LockBits(new Rectangle(0, 0, clone.Width, clone.Height), ImageLockMode.ReadWrite, clone.PixelFormat);
    var bytes = Math.Abs(data.Stride) * clone.Height;
    byte[] rgba = new byte[bytes];
    System.Runtime.InteropServices.Marshal.Copy(data.Scan0, rgba, 0, bytes);

    var pixels = Enumerable.Range(0, rgba.Length / 4).Select(x => new {
        B = rgba[x * 4],
        G = rgba[(x * 4) + 1],
        R = rgba[(x * 4) + 2],
        A = rgba[(x * 4) + 3],
        MakeTransparent = new Action(() => rgba[(x * 4) + 3] = 0)
    });

    pixels
        .AsParallel()
        .ForAll(p =>
        {
            byte max = Math.Max(Math.Max(p.R, p.G), p.B);
            byte min = Math.Min(Math.Min(p.R, p.G), p.B);

            if (p.G != min && (p.G == max || max - p.G < 7) && (max - min) > 20)
                p.MakeTransparent();
        });

    System.Runtime.InteropServices.Marshal.Copy(rgba, 0, data.Scan0, bytes);
    clone.UnlockBits(data);
    return clone;
}
Do not forget to dispose of your input Bitmap and of the Bitmap this method returns.
If you need to save the image, just use Bitmap's Save method:
clone.Save(@"C:\your\folder\path", ImageFormat.Png);
Here you can find methods to process an image even faster: Fast Image Processing in C#
Chroma keying on a photo should assume an analog input; in the real world, exact values are very rare.
How do you compensate for this? Provide a threshold around the green of your choice in both hue and tone. Any colour within this threshold (inclusive) should be replaced by your chosen background; transparent may be best. In the first link, the Mask In and Mask Out parameters achieve this. The pre and post blur parameters attempt to make the background more uniform to reduce encoding noise side effects so that you can use a narrower (preferred) threshold.
For performance, you may want to write a pixel shader to zap the 'green' to transparent but that is a consideration for after you get it working.
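For example, a rough (untested) sketch of such a hue/tone threshold using System.Drawing.Color's HSB helpers; keyHue, hueTolerance and minSaturation are placeholder values you would tune for your particular green screen:

// Untested sketch of a hue/tone threshold.
bool IsChromaKey(Color c, float keyHue = 120f, float hueTolerance = 40f, float minSaturation = 0.3f)
{
    float hueDiff = Math.Abs(c.GetHue() - keyHue); // GetHue() returns 0..360 degrees
    if (hueDiff > 180f) hueDiff = 360f - hueDiff;  // hue wraps around the color circle

    return hueDiff <= hueTolerance
        && c.GetSaturation() >= minSaturation      // skip grays/whites
        && c.GetBrightness() >= 0.15f;             // skip very dark pixels
}

Any pixel for which this returns true gets replaced with your chosen background or made transparent; blurring before the test, as mentioned above, lets you keep the tolerances narrower.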
As a follow-up to this question (How can I draw legible text on a bitmap (Winforms)?), I'm drawing legible but small text on top of a bitmap by calculating the "average" color beneath the text and choosing an appropriately contrasting color for the text.
I've stolen Till's code from https://stackoverflow.com/a/6185448/3784949 for calculating the "average" bitmap color. Now I'm looking at the "color difference" algorithm suggested by http://www.w3.org/TR/AERT#color-contrast.
This suggests that I need a color brightness difference of at least 125 "units" and a color difference of at least 500 units, where brightness and difference are calculated like this:
Color brightness is determined by the following formula:
((Red value X 299) + (Green value X 587) + (Blue value X 114)) / 1000
Color difference is determined by the following formula:
(maximum (Red value 1, Red value 2) - minimum (Red value 1, Red value 2)) + (maximum (Green value 1, Green value 2) - minimum (Green value 1, Green value 2)) + (maximum (Blue value 1, Blue value 2) - minimum (Blue value 1, Blue value 2))
How do I implement this? I can set my color by ARGB (I believe, it's a label foreground color); but how do I calculate how much to change each individual value to achieve the difference being required here? I'm not familiar with the math required to break the "difference" units down into their component parts.
As an example, my "average" for one bitmap is: Color [A=255, R=152, G=138, B=129]. How do I "add" enough to each part to achieve the two differences?
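For what it's worth, plugging that example average into the two formulas (a quick, untested snippet) already suggests part of the problem:

Color avg = Color.FromArgb(255, 152, 138, 129);

// W3C brightness: (R*299 + G*587 + B*114) / 1000  ->  (45448 + 81006 + 14706) / 1000 ≈ 141
double brightness = (avg.R * 299 + avg.G * 587 + avg.B * 114) / 1000.0;

// Best possible colour difference from this colour: push every channel to whichever
// extreme (0 or 255) is farther away: (152 - 0) + (138 - 0) + (129 - 0) = 419 < 500
int maxDifference = Math.Max(avg.R, 255 - avg.R)
                  + Math.Max(avg.G, 255 - avg.G)
                  + Math.Max(avg.B, 255 - avg.B);

So even the best case (pure black) only reaches 419, which is why I suspect the 500 threshold is unreachable here.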
EDIT: To be specific, my confusion lies here:
1. it looks like I need to add to three separate values (R, G, B) to achieve two different goals (new RGB adds up to original plus 125, and new RGB adds up to original plus 500)
2. it looks like I may need to "weight" my added brightness values to add more to G than to R than to B.
I have no idea how to address #1. And I'm not positive I'm correct about #2.
EDIT: Proposed solution
I'm currently experimenting with this:
private Color GetContrastingFontColor(Color AverageColorOfBitmap, List<Color> FavoriteColors)
{
    IEnumerable<Color> AcceptableColors = FavoriteColors
        .Where(clr =>
            (GetColorDifferenceAboveTarget(AverageColorOfBitmap, clr, (float)200) > 0)
            && (GetBrightnessAboveTarget(AverageColorOfBitmap, clr, (float).125) > 0))
        .OrderBy(clr => GetColorDifferenceAboveTarget(AverageColorOfBitmap, clr, (float)200));

    return AcceptableColors.DefaultIfEmpty(Color.Aqua).First();
}
It's a good framework, but I need to work on selecting the "best" candidate from the list. Right now it's just returning "the qualifying color with the greatest color difference that meets the brightness criteria". However, this allows me to modify the float values (W3's "500 color difference required" is complete crap, zero KnownColors qualify) and experiment.
Support code:
private float GetBrightnessAboveTarget(Color AverageColorOfBitmap,
Color proposed, float desiredDifference)
{
float result = proposed.GetBrightness() - AverageColorOfBitmap.GetBrightness();
return result - desiredDifference;
}
private float GetColorDifferenceAboveTarget(Color avg, Color proposed,
float desiredDifference)
{
float r1 = Convert.ToSingle(MaxByte(Color.Red, avg, proposed));
float r2 = Convert.ToSingle(MinByte(Color.Red, avg, proposed));
float r3 = Convert.ToSingle(MaxByte(Color.Green, avg, proposed));
float r4 = Convert.ToSingle(MinByte(Color.Green, avg, proposed));
float r5 = Convert.ToSingle(MaxByte(Color.Blue, avg, proposed));
float r6 = Convert.ToSingle(MinByte(Color.Blue, avg, proposed));
float result = (r1 - r2) + (r3 - r4) + (r5 - r6);
return result - desiredDifference;
}
private byte MaxByte(Color rgb, Color x, Color y)
{
if (rgb == Color.Red) return (x.R >= y.R) ? x.R : y.R;
if (rgb == Color.Green) return (x.G >= y.G) ? x.G : y.G;
if (rgb == Color.Blue) return (x.B >= y.B) ? x.B : y.B;
return byte.MinValue;
}
private byte MinByte(Color rgb, Color x, Color y)
{
if (rgb == Color.Red) return (x.R <= y.R) ? x.R : y.R;
if (rgb == Color.Green) return (x.G <= y.G) ? x.G : y.G;
if (rgb == Color.Blue) return (x.B <= y.B) ? x.B : y.B;
return byte.MinValue;
}
This is more of an answer to the original question. I call it a homemade outline.
Using transparency plus the maximum and minimum brightness you can get (white & black), it creates good contrast; at least it looks pretty good on my screen.
It is a mixture of shadowing and transparency. I have subtracted a little from the red component to get the aqua you thought about.
It first creates a darker version of the background by printing the text 1 pixel up-left and 1 pixel down-right. Finally it prints a bright version on top of that. Note that it is not really using black and white, because with its semi-transparent pixels the hue really is that of each background pixel.
For an actual printout you will have to experiment, especially with the font but also with the two transparencies!
Also, you should maybe switch between white on a black shadow and black on a white highlight, depending on the brightness of the spot you print on. But with this homemade outline it really will work on both dark and bright backgrounds; it'll just look a little less elegant on a bright background.
using (Graphics G = Graphics.FromImage(pictureBox1.Image))
{
    Font F = new Font("Arial", 8);
    SolidBrush brush0 = new SolidBrush(Color.FromArgb(150, 0, 0, 0));
    SolidBrush brush1 = new SolidBrush(Color.FromArgb(200, 255, 255, 222));

    G.DrawString(textBox1.Text, F, brush0, new Point(x - 1, y - 1));
    G.DrawString(textBox1.Text, F, brush0, new Point(x + 1, y + 1));
    G.DrawString(textBox1.Text, F, brush1, new Point(x, y));
}
Edit: This is called from a button click, but it really should be in the Paint event. There, the Graphics object G and its using block would be replaced simply by the e.Graphics event parameter.
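For reference, a rough (untested) sketch of the Paint event version; x, y and textBox1 are assumed to be fields available to the form:

// Untested sketch of the Paint event version of the same drawing code.
private void pictureBox1_Paint(object sender, PaintEventArgs e)
{
    using (Font F = new Font("Arial", 8))
    using (SolidBrush brush0 = new SolidBrush(Color.FromArgb(150, 0, 0, 0)))
    using (SolidBrush brush1 = new SolidBrush(Color.FromArgb(200, 255, 255, 222)))
    {
        e.Graphics.DrawString(textBox1.Text, F, brush0, new Point(x - 1, y - 1));
        e.Graphics.DrawString(textBox1.Text, F, brush0, new Point(x + 1, y + 1));
        e.Graphics.DrawString(textBox1.Text, F, brush1, new Point(x, y));
    }
}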
I noticed that you are using a 'transparent' label to display the data to avoid the details of Graphics.DrawString and the Paint event.
Well that can be done and the result looks rather similar:
string theText ="123 - The quick brown fox..";
Label L1, L2, L3;
pictureBox1.Controls.Add(new trLabel());
L1 = (trLabel)pictureBox1.Controls[pictureBox1.Controls.Count - 1];
L1.Text = theText;
L1.ForeColor = Color.FromArgb(150, 0, 0, 0);
L1.Location = new Point(231, 31); // <- position in the image, change!
L1.Controls.Add(new trLabel());
L2 = (trLabel)L1.Controls[pictureBox1.Controls.Count - 1];
L2.Text = theText;
L2.ForeColor = Color.FromArgb(150, 0, 0, 0);
L2.Location = new Point(2, 2); // do not change relative postion in the 1st label!
L2.Controls.Add(new trLabel());
L3 = (trLabel)L2.Controls[pictureBox1.Controls.Count - 1];
L3.Text = theText;
L3.ForeColor = Color.FromArgb(200, 255, 255, 234);
L3.Location = new Point(-1,-1); // do not change relative postion in the 2nd label!
However, you will note that due to the impossibility of having really transparent controls in WinForms, we need a little extra effort. You probably use a label subclass like this:
public partial class trLabel : Label
{
    public trLabel()
    {
        SetStyle(ControlStyles.SupportsTransparentBackColor | ControlStyles.UserPaint, true);
        BackColor = Color.Transparent;
        Visible = true;
        AutoSize = true;
    }
}
This seems to work. But in reality it only seems that way: upon creation, each label gets a copy of its current background from its parent, and that copy never gets updated. That is why I have to add the 2nd and 3rd labels not to the PictureBox I display the image in, but to the 1st and 2nd 'transparent' labels respectively.
There simply is no real transparency between WinForms controls unless you draw things yourself.
So the DrawString solution is not really complicated, and it gives you the bonus of allowing you to tweak several properties of the Graphics object, like SmoothingMode, TextContrast or InterpolationMode.
Short suggestion: Just use black or white.
The algorithm gives you passing criteria, but not a way to determine which colors meet those criteria. So you will have to create such an algorithm. A naive approach would be to loop through every possible color, calculate the color difference, and see if the difference is greater than 125; if so, you have a good color to use. Better, you could search for the color with the maximum difference.
But that's foolish: if I gave you the color R=152, G=138, B=129, what do YOU think is a very good color to contrast it with? Just by gut, I'm going to guess 0,0,0. I picked a color with the farthest possible R value, G value, and B value. If you gave me the color 50,200,75, I'd pick R=255, G=0, B=255. Same logic. So my algorithm is: if R < 128, choose R = 255, else choose R = 0. Same thing for G and B.
Now that algorithm only picks RGB values that are 0 or 255. But if you don't like that, now you need a mathematical definition for what is "pretty" and I'll leave you to figure that out on your own. :-)
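If you want that rule as code, a rough (untested) sketch:

// Untested sketch of the rule above: flip each channel to whichever extreme is
// farthest from the average colour.
private Color GetMaxContrastColor(Color avg)
{
    return Color.FromArgb(
        avg.R < 128 ? 255 : 0,
        avg.G < 128 ? 255 : 0,
        avg.B < 128 ? 255 : 0);
}

// GetMaxContrastColor(Color.FromArgb(152, 138, 129)) -> (0, 0, 0), i.e. black
// GetMaxContrastColor(Color.FromArgb(50, 200, 75))   -> (255, 0, 255), i.e. magenta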
Within an RGB image (from a webcam) I'm looking for a way to increase the intensity/brightness of green. I'd be glad if anyone can give me a starting point.
I'm using AFORGE.NET in C# and/or OpenCV directly in C++.
In general, multiplication of pixel values is thought of as an increase in contrast, and addition is thought of as an increase in brightness.
In C#, where you have an array pointing to the first pixel of the image, such as this:
byte[] pixelsIn;
byte[] pixelsOut; //assuming RGB ordered data
and contrast and brightness values such as this:
float gC = 1.5f;
float gB = 50f;
you can multiply and/or add to the green channel to achieve your desired effect (r = row, c = column, w = image width, ch = number of channels):
pixelsOut[r*w*ch + c*ch] = pixelsIn[r*w*ch + c*ch]; // red
int newGreen = (int)(pixelsIn[r*w*ch + c*ch+1] * gC + gB); // green
pixelsOut[r*w*ch + c*ch+1] = (byte)(newGreen > 255 ? 255 : newGreen < 0 ? 0 : newGreen); // check for overflow
pixelsOut[r*w*ch + c*ch+2] = pixelsIn[r*w*ch + c*ch+2]; // blue
obviously you would want to use pointers here to speed things up.
(Please note: this code has NOT BEEN TESTED)
For AForge.NET, I suggest using the ColorRemapping class to map the values in your green channel to other values. The mapping function should be a concave function from [0,255] to [0,255] if you want to increase the brightness without losing detail.
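A rough (untested) sketch of what I mean, assuming the ColorRemapping constructor that takes the three channel maps; the 0.75 exponent is just one example of a concave curve:

// Untested sketch: build a concave lookup table (a gamma-style curve) for the
// green channel and leave red/blue untouched.
byte[] identity = new byte[256];
byte[] greenMap = new byte[256];
for (int i = 0; i < 256; i++)
{
    identity[i] = (byte)i;
    // concave: 0 -> 0 and 255 -> 255, mid-tones get pushed up
    greenMap[i] = (byte)Math.Round(255.0 * Math.Pow(i / 255.0, 0.75));
}

var remap = new ColorRemapping(identity, greenMap, identity);
remap.ApplyInPlace(image);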
This is what I came up with after reading through many pages of the AForge.NET and OpenCV documentation. If you apply the saturation filter first, you might get a dizzy image; if you apply it afterwards, you will get a much clearer image, but some "light green" pixels might already have been lost while applying the HSL filter.
// apply saturation filter to increase green intensity
var f1 = new SaturationCorrection(0.5f);
f1.ApplyInPlace(image);
var filter = new HSLFiltering();
filter.Hue = new IntRange(83, 189); // all green (large range)
//filter.Hue = new IntRange(100, 120); // light green (small range)
// this will convert all pixels outside the range into gray-scale
//filter.UpdateHue = false;
//filter.UpdateLuminance = false;
// this will convert all pixels outside that range to blank (filter.FillColor)
filter.Saturation = new Range(0.4f, 1);
filter.Luminance = new Range(0.4f, 1);
// apply the HSL filter to get only the green pixels
filter.ApplyInPlace(image);
I'm having an issue with creating a histogram representation of an image in a WinRT app. What I'd like to make consists of four histogram plots (Red, Green, Blue, and Luminosity) for an image.
My main issue is how to actually draw a picture of that histogram so I can show it on screen. My code so far is pretty messy. I've searched a lot on this topic; mostly my results were code in Java, which I'm trying to translate into C#, but the API is pretty different. I also had an attempt based on AForge, but that's WinForms...
Here's my messy code; I know it looks bad, but I'm striving to make this work first:
public static WriteableBitmap CreateHistogramRepresentation(long[] histogramData, HistogramType type)
{
//I'm trying to determine a max height of a histogram bar, so
//I could determine a max height of the image that then I'll remake it
//at a lower resolution :
var max = histogramData[0];
//Determine the max value, the highest bar in the histogram, the initial height of the image.
for (int i = 0; i < histogramData.Length; i++)
{
if (histogramData[i] > max)
max = histogramData[i];
}
var bitmap = new WriteableBitmap(256, 500);
//Set a color to draw with according to the type of the histogram :
var color = Colors.White;
switch (type)
{
case HistogramType.Blue :
{
color = Colors.RoyalBlue;
break;
}
case HistogramType.Green:
{
color = Colors.OliveDrab;
break;
}
case HistogramType.Red:
{
color = Colors.Firebrick;
break;
}
case HistogramType.Luminosity:
{
color = Colors.DarkSlateGray;
break;
}
}
//Compute a scaler to scale the bars to the actual image dimensions :
var scaler = 1;
while (max/scaler > 500)
{
scaler++;
}
var stream = bitmap.PixelBuffer.AsStream();
var streamBuffer = new byte[stream.Length];
//Make a white image initially :
for (var i = 0; i < streamBuffer.Length; i++)
{
streamBuffer[i] = 255;
}
//Color the image :
for (var i = 0; i < 256; i++) // i = column
{
for (var j = 0; j < histogramData[i] / scaler; j++) // j = line
{
streamBuffer[j*256*4 + i*4] = color.B; // the image is 256 pixels wide, 4 bytes (BGRA) per pixel
streamBuffer[j*256*4 + i*4 + 1] = color.G;
streamBuffer[j*256*4 + i*4 + 2] = color.R;
streamBuffer[j*256*4 + i*4 + 3] = color.A;
}
}
//Write the Pixel Data into the Pixel Buffer of the future Histogram image :
stream.Seek(0, 0);
stream.Write(streamBuffer, 0, streamBuffer.Length);
return bitmap.Flip(WriteableBitmapExtensions.FlipMode.Horizontal);
}
This creates a pretty bad histogram representation; it's not working properly yet, and I'm working on fixing it...
If you can contribute a link or any code you might know of for a histogram representation in WinRT apps, or anything else, it is greatly appreciated.
While you could use a charting control as JP Alioto pointed out, histograms tend to represent a lot of data. In your sample alone you're rendering 256 bars * 4 axes (R, G, B, L). The problem with charting controls is that they usually like to be handed collections (or arrays) of hydrated data, which they draw and tend to keep in memory. A histogram like yours would need 1024 objects (256 * 4) held in memory and passed to the chart as a whole. It's just not a good use of memory.
The alternative of course is to draw it yourself. But as you've found, pixel-by-pixel drawing can be a bit of a pain. The best answer, in my opinion, is to agree with Shahar and recommend you use WriteableBitmapEx on CodePlex.
http://writeablebitmapex.codeplex.com
WriteableBitmapEx includes methods for drawing shapes like lines and rectangles that are very, very fast. You can draw the data as you enumerate it (instead of having to have it all in memory at one time), and the result is a nice compact image that is already "bitmap cached" (meaning it renders very fast, since it doesn't have to be redrawn on each frame).
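For example, a rough (untested) sketch of drawing the bars with WriteableBitmapEx's Clear and FillRectangle extensions, reusing histogramData, scaler and color from your code:

// Untested sketch: let WriteableBitmapEx draw the bars instead of poking the
// pixel buffer by hand. histogramData, scaler and color are the same as in your code.
var bitmap = new WriteableBitmap(256, 500);
bitmap.Clear(Colors.White);

for (int i = 0; i < 256; i++)
{
    int barHeight = (int)(histogramData[i] / scaler);
    if (barHeight > 0)
    {
        // FillRectangle takes two opposite corners; draw each bar from the bottom up
        bitmap.FillRectangle(i, 500 - barHeight, i + 1, 500, color);
    }
}

return bitmap;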
How would I go about mimicking this halftone effect in GDI+?
It almost looks like a Floyd–Steinberg dithered version of the image overlaid on the full one, but I'm not convinced.
I gave this a try and got this result:
It may be a place to start. I did it like this:
1. Draw the original picture with low saturation (using a color matrix)
2. Draw the original image onto 1) with high saturation, using a pattern mask (i.e. the dots)
I created the pattern mask like this:
using (var g = Graphics.FromImage(bmpPattern))
{
    g.Clear(Color.Black);
    g.SmoothingMode = SmoothingMode.HighQuality;
    for (var y = 0; y < bmp.Height; y += 10)
        for (var x = 0; x < bmp.Width; x += 6)
        {
            g.FillEllipse(Brushes.White, x, y, 4, 4);
            g.FillEllipse(Brushes.White, x + 3, y + 5, 4, 4);
        }
}
And then I imposed it over the oversaturated bitmap using this technique.
Update: an elaboration on how the images get merged. Let's be a little more general and say that we want to combine two different colorized versions of the same image using a pattern mask, resulting in a new image. We could do it like this:
1. Create THREE new bitmaps, all with the same size as the original image. Call them bmpA, bmpB and bmpMask.
2. Draw one colored/effect version into bmpA.
3. Draw the other colored/effect version into bmpB.
4. Create the mask in bmpMask (black and white).
5. Push one of the R/G/B channels of bmpMask into the alpha channel of bmpB using the transferOneARGBChannelFromOneBitmapToAnother method (a rough sketch of such a helper follows below).
6. Draw bmpB over bmpA (since bmpB now has transparent parts in it).
7. The result is now bmpA; bmpB and bmpMask can be disposed. Done.
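For reference, a rough (untested) sketch of what such a channel-transfer helper could look like; this is not the exact code behind the link, and it assumes both bitmaps are the same size and 32bppArgb (using System.Drawing.Imaging and System.Runtime.InteropServices):

// Untested sketch: copy one channel of "source" into the alpha channel of "dest".
static void TransferChannelToAlpha(Bitmap source, Bitmap dest, int channelOffset) // 0 = B, 1 = G, 2 = R
{
    var rect = new Rectangle(0, 0, source.Width, source.Height);
    BitmapData srcData = source.LockBits(rect, ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);
    BitmapData dstData = dest.LockBits(rect, ImageLockMode.ReadWrite, PixelFormat.Format32bppArgb);

    int bytes = Math.Abs(srcData.Stride) * source.Height;
    byte[] src = new byte[bytes];
    byte[] dst = new byte[bytes];
    Marshal.Copy(srcData.Scan0, src, 0, bytes);
    Marshal.Copy(dstData.Scan0, dst, 0, bytes);

    for (int i = 0; i < bytes; i += 4)
        dst[i + 3] = src[i + channelOffset]; // alpha of dest := chosen channel of source

    Marshal.Copy(dst, 0, dstData.Scan0, bytes);
    source.UnlockBits(srcData);
    dest.UnlockBits(dstData);
}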
I'm trying to write a program to programmatically determine the tilt or angle of rotation in an arbitrary image.
Images have the following properties:
Consist of dark text on a light background
Occasionally contain horizontal or vertical lines which only intersect at 90 degree angles.
Skewed between -45 and 45 degrees.
See this image as a reference (it's been skewed 2.8 degrees).
So far, I've come up with this strategy: Draw a route from left to right, always selecting the nearest white pixel. Presumably, the route from left to right will prefer to follow the path between lines of text along the tilt of the image.
Here's my code:
private bool IsWhite(Color c) { return c.GetBrightness() >= 0.5 || c == Color.Transparent; }
private bool IsBlack(Color c) { return !IsWhite(c); }
private double ToDegrees(decimal slope) { return (180.0 / Math.PI) * Math.Atan(Convert.ToDouble(slope)); }
private void GetSkew(Bitmap image, out double minSkew, out double maxSkew)
{
decimal minSlope = 0.0M;
decimal maxSlope = 0.0M;
for (int start_y = 0; start_y < image.Height; start_y++)
{
int end_y = start_y;
for (int x = 1; x < image.Width; x++)
{
int above_y = Math.Max(end_y - 1, 0);
int below_y = Math.Min(end_y + 1, image.Height - 1);
Color center = image.GetPixel(x, end_y);
Color above = image.GetPixel(x, above_y);
Color below = image.GetPixel(x, below_y);
if (IsWhite(center)) { /* no change to end_y */ }
else if (IsWhite(above) && IsBlack(below)) { end_y = above_y; }
else if (IsBlack(above) && IsWhite(below)) { end_y = below_y; }
}
decimal slope = (Convert.ToDecimal(start_y) - Convert.ToDecimal(end_y)) / Convert.ToDecimal(image.Width);
minSlope = Math.Min(minSlope, slope);
maxSlope = Math.Max(maxSlope, slope);
}
minSkew = ToDegrees(minSlope);
maxSkew = ToDegrees(maxSlope);
}
This works well on some images, not so well on others, and it's slow.
Is there a more efficient, more reliable way to determine the tilt of an image?
I've made some modifications to my code, and it certainly runs a lot faster, but it's still not as accurate as I'd like.
I've made the following improvements:
Using Vinko's suggestion, I avoid GetPixel in favor of working with bytes directly; now the code runs at the speed I needed.
My original code simply used "IsBlack" and "IsWhite", but this isn't granular enough. The original code traces the following paths through the image:
http://img43.imageshack.us/img43/1545/tilted3degtextoriginalw.gif
Note that a number of paths pass through the text. Instead, I now compare the actual brightness values of the center, above, and below pixels and select the brightest one. Basically, I'm treating the bitmap as a heightmap, and the path from left to right follows the contours of the image, resulting in a better path:
http://img10.imageshack.us/img10/5807/tilted3degtextbrightnes.gif
As suggested by Toaomalkster, a Gaussian blur smooths out the heightmap, and I get even better results:
http://img197.imageshack.us/img197/742/tilted3degtextblurredwi.gif
Since this is just prototype code, I blurred the image using GIMP; I did not write my own blur function.
The selected path is pretty good for a greedy algorithm.
As Toaomalkster suggested, choosing the min/max slope is naive. A simple linear regression provides a better approximation of the slope of a path. Additionally, I should cut a path short once I run off the edge of the image, otherwise the path will hug the top of the image and give an incorrect slope.
Code
private double ToDegrees(double slope) { return (180.0 / Math.PI) * Math.Atan(slope); }
private double GetSkew(Bitmap image)
{
BrightnessWrapper wrapper = new BrightnessWrapper(image);
LinkedList<double> slopes = new LinkedList<double>();
for (int y = 0; y < wrapper.Height; y++)
{
int endY = y;
long sumOfX = 0;
long sumOfY = y;
long sumOfXY = 0;
long sumOfXX = 0;
int itemsInSet = 1;
for (int x = 1; x < wrapper.Width; x++)
{
int aboveY = endY - 1;
int belowY = endY + 1;
if (aboveY < 0 || belowY >= wrapper.Height)
{
break;
}
int center = wrapper.GetBrightness(x, endY);
int above = wrapper.GetBrightness(x, aboveY);
int below = wrapper.GetBrightness(x, belowY);
if (center >= above && center >= below) { /* no change to endY */ }
else if (above >= center && above >= below) { endY = aboveY; }
else if (below >= center && below >= above) { endY = belowY; }
itemsInSet++;
sumOfX += x;
sumOfY += endY;
sumOfXX += (x * x);
sumOfXY += (x * endY);
}
// least squares slope = (NΣ(XY) - (ΣX)(ΣY)) / (NΣ(X^2) - (ΣX)^2), where N = elements in set
if (itemsInSet > image.Width / 2) // path covers at least half of the image
{
decimal sumOfX_d = Convert.ToDecimal(sumOfX);
decimal sumOfY_d = Convert.ToDecimal(sumOfY);
decimal sumOfXY_d = Convert.ToDecimal(sumOfXY);
decimal sumOfXX_d = Convert.ToDecimal(sumOfXX);
decimal itemsInSet_d = Convert.ToDecimal(itemsInSet);
decimal slope =
((itemsInSet_d * sumOfXY) - (sumOfX_d * sumOfY_d))
/
((itemsInSet_d * sumOfXX_d) - (sumOfX_d * sumOfX_d));
slopes.AddLast(Convert.ToDouble(slope));
}
}
double mean = slopes.Average();
double sumOfSquares = slopes.Sum(d => Math.Pow(d - mean, 2));
double stddev = Math.Sqrt(sumOfSquares / (slopes.Count - 1));
// select items within 1 standard deviation of the mean
var testSample = slopes.Where(x => Math.Abs(x - mean) <= stddev);
return ToDegrees(testSample.Average());
}
class BrightnessWrapper
{
byte[] rgbValues;
int stride;
public int Height { get; private set; }
public int Width { get; private set; }
public BrightnessWrapper(Bitmap bmp)
{
Rectangle rect = new Rectangle(0, 0, bmp.Width, bmp.Height);
System.Drawing.Imaging.BitmapData bmpData =
bmp.LockBits(rect,
System.Drawing.Imaging.ImageLockMode.ReadOnly,
bmp.PixelFormat);
IntPtr ptr = bmpData.Scan0;
int bytes = bmpData.Stride * bmp.Height;
this.rgbValues = new byte[bytes];
System.Runtime.InteropServices.Marshal.Copy(ptr,
rgbValues, 0, bytes);
this.Height = bmp.Height;
this.Width = bmp.Width;
this.stride = bmpData.Stride;
}
public int GetBrightness(int x, int y)
{
int position = (y * this.stride) + (x * 3);
int b = rgbValues[position];
int g = rgbValues[position + 1];
int r = rgbValues[position + 2];
return (r + r + b + g + g + g) / 6;
}
}
The code is good, but not great. Large amounts of whitespace cause the program to draw a relatively flat line, resulting in a slope near 0, which causes the code to underestimate the actual tilt of the image.
There is no appreciable difference in the accuracy of the tilt by selecting random sample points vs sampling all points, because the ratio of "flat" paths selected by random sampling is the same as the ratio of "flat" paths in the entire image.
GetPixel is slow. You can get an order of magnitude speed up using the approach listed here.
If the text is left- (or right-) aligned, you can determine the slope by measuring the distance between the left (right) edge of the image and the first dark pixel at two random heights, and calculating the slope from that. Additional measurements would lower the error at the cost of additional time.
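A rough (untested) sketch of that idea for left-aligned text; IsBlack is the helper from the question, and both rows should land on lines of text:

// Untested sketch: derive the slope from the left margin at two different rows.
private double GetSkewFromLeftEdge(Bitmap image, int row1, int row2)
{
    int x1 = FirstDarkColumn(image, row1);
    int x2 = FirstDarkColumn(image, row2);

    // Angle of the (assumed straight) left text edge between the two rows
    return (180.0 / Math.PI) * Math.Atan2(x2 - x1, row2 - row1);
}

private int FirstDarkColumn(Bitmap image, int y)
{
    for (int x = 0; x < image.Width; x++)
        if (IsBlack(image.GetPixel(x, y))) return x;
    return image.Width; // the row is completely white
}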
First, I must say I like the idea. But I've never had to do this before and I'm not sure what to suggest to improve reliability. The first thing I can think of is the idea of throwing out statistical anomalies: if the slope suddenly changes sharply, you know you've found a white section of the image that dips into the edge, skewing (no pun intended) your results. So you'd want to throw that stuff out somehow.
But from a performance standpoint there are a number of optimizations you could make which may add up.
Namely, I'd change this snippet in your inner loop from this:
Color center = image.GetPixel(x, end_y);
Color above = image.GetPixel(x, above_y);
Color below = image.GetPixel(x, below_y);
if (IsWhite(center)) { /* no change to end_y */ }
else if (IsWhite(above) && IsBlack(below)) { end_y = above_y; }
else if (IsBlack(above) && IsWhite(below)) { end_y = below_y; }
To this:
Color center = image.GetPixel(x, end_y);
if (IsWhite(center)) { /* no change to end_y */ }
else
{
    Color above = image.GetPixel(x, above_y);
    Color below = image.GetPixel(x, below_y);
    if (IsWhite(above) && IsBlack(below)) { end_y = above_y; }
    else if (IsBlack(above) && IsWhite(below)) { end_y = below_y; }
}
It's the same effect but should drastically reduce the number of calls to GetPixel.
Also consider putting the values that don't change into variables before the madness begins. Things like image.Height and image.Width have a slight overhead every time you call them, so store those values in your own variables before the loops begin. The thing I always tell myself when dealing with nested loops is to optimize everything inside the innermost loop at the expense of everything else.
Also... as Vinko Vrsalovic suggested, you may look at his GetPixel alternative for yet another boost in speed.
At first glance, your code looks overly naive, which explains why it doesn't always work. I like the approach Steve Wortham suggested, but it might run into problems if you have background images.
Another approach that often helps with images is to blur them first. If you blur your example image enough, each line of text will end up as a blurry smooth line. You then apply some sort of algorithm to basically do a regression analysis. There are lots of ways to do that, and lots of examples on the net.
Edge detection might be useful, or it might cause more problems than it's worth.
By the way, a Gaussian blur can be implemented very efficiently if you search hard enough for the code. Otherwise, I'm sure there are lots of libraries available. I haven't done much of that lately, so I don't have any links on hand, but a search for an image processing library will get you good results.
I'm assuming you're enjoying the fun of solving this, so not much in the way of actual implementation details here.
Measuring the angle of every line seems like overkill, especially given the performance of GetPixel.
I wonder if you would have better performance luck by looking for a white triangle in the upper-left or upper-right corner (depending on the slant direction) and measuring the angle of the hypotenuse. All text should follow the same angle on the page, and the upper-left corner of a page won't get tricked by the descenders or whitespace of content above it.
Another tip to consider: rather than blurring, work within a greatly-reduced resolution. That will give you both the smoother data you need, and fewer GetPixel calls.
For example, I once made a blank-page detection routine in .NET for faxed TIFF files that simply resampled the entire page to a single pixel and tested that value against a threshold for white.
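For example, a rough (untested) sketch of that resampling trick; page is assumed to be your source Bitmap, and 0.95 is an arbitrary whiteness threshold:

// Untested sketch: let GDI+ shrink the page, then work on the tiny result.
static Bitmap Downscale(Bitmap source, int width, int height)
{
    var small = new Bitmap(width, height);
    using (var g = Graphics.FromImage(small))
    {
        g.InterpolationMode = System.Drawing.Drawing2D.InterpolationMode.HighQualityBicubic;
        g.DrawImage(source, new Rectangle(0, 0, width, height));
    }
    return small;
}

// Blank-page test along the lines described above: at 1x1 the single pixel is
// effectively the average of the whole page.
bool isBlank;
using (var onePixel = Downscale(page, 1, 1))
    isBlank = onePixel.GetPixel(0, 0).GetBrightness() > 0.95f;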
What are your constraints in terms of time?
The Hough transform is a very effective mechanism for determining the skew angle of an image. It can be costly in time, but if you're going to use Gaussian blur, you're already burning a pile of CPU time. There are also other ways to accelerate the Hough transform that involve creative image sampling.
Your latest output is confusing me a little.
When you superimposed the blue lines on the source image, did you offset it a bit? It looks like the blue lines are about 5 pixels above the centre of the text.
Not sure about that offset, but you definitely have a problem with the derived line "drifting" away at the wrong angle. It seems to have too strong a bias towards producing a horizontal line.
I wonder if increasing your mask window from 3 pixels (centre, one above, one below) to 5 might improve this (two above, two below). You'll also get this effect if you follow richardtallent's suggestion and resample the image smaller.
Very cool path finding application.
I wonder if this other approach would help or hurt with your particular data set.
Assume a black and white image:
1. Project all black pixels to the right (EAST). This should give a one-dimensional array with a size of IMAGE_HEIGHT. Call the array CANVAS.
2. As you project all the pixels EAST, keep track numerically of how many pixels project into each bin of CANVAS.
3. Rotate the image an arbitrary number of degrees and re-project.
4. Pick the result that gives the highest peaks and lowest valleys for the values in CANVAS.
I imagine this will not work well if you actually have to account for a full -45 to +45 degrees of tilt. If the actual range is smaller (say, +/- 10 degrees), this might be a pretty good strategy. Once you have an initial result, you could consider re-running with a smaller increment of degrees to fine-tune the answer. I might therefore try to write this with a function that accepts a float degree_tick as a parameter, so I could run both a coarse and a fine pass (or a spectrum of coarseness and fineness) with the same code.
This might be computationally expensive. To optimize, you might consider selecting just a portion of the image to project-test-rotate-repeat on.
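A rough (untested) sketch of that projection idea, using a small shear instead of a real rotation (close enough for small angles); isDark stands in for whatever dark-pixel test you already have, and degreeTick plays the role of the float degree_tick parameter:

// Untested sketch: for each candidate angle, project dark pixels into row bins
// and score how "spiky" the resulting CANVAS is.
static double EstimateSkewByProjection(Func<int, int, bool> isDark, int width, int height,
                                       double minDeg, double maxDeg, double degreeTick)
{
    double bestAngle = 0, bestScore = double.MinValue;

    for (double deg = minDeg; deg <= maxDeg; deg += degreeTick)
    {
        double tan = Math.Tan(deg * Math.PI / 180.0);
        var canvas = new int[height]; // the CANVAS array described above

        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
            {
                if (!isDark(x, y)) continue;
                int bin = (int)Math.Round(y - x * tan); // shear instead of a real rotation
                if (bin >= 0 && bin < height) canvas[bin]++;
            }

        // "Highest peaks and lowest valleys" ~ maximise the variance of the bins
        double mean = 0;
        foreach (int c in canvas) mean += c;
        mean /= height;

        double score = 0;
        foreach (int c in canvas) score += (c - mean) * (c - mean);

        if (score > bestScore) { bestScore = score; bestAngle = deg; }
    }
    return bestAngle;
}

Running it once with a coarse degreeTick and then again with a finer tick around the best angle gives the coarse/fine pass mentioned above.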