Given a list of x,y coordinates and a known width & height how can the NUMBER of the enclosed areas be determined (in C#)?
For Example:
In this image 5 ENCLOSED AREAS are defined:
Face (1)
Eyes (2)
Nose (1)
Right of face (1)
The list of x,y points would be any pixel in black, including the mouth.
You can use this simple algorithm, based on the idea of flood fill with a helper bitmap:
// backColor is the ARGB int of the color at fillPoint at the start.
// Returns the size in pixels of the enclosed shape.
private int GetFillSize(Bitmap b, Point fillPoint)
{
    int count = 0;
    Stack<Point> pixels = new Stack<Point>();
    int backColor = b.GetPixel(fillPoint.X, fillPoint.Y).ToArgb();
    // The fill color must differ from backColor, otherwise the loop never terminates.
    Color fillColor = Color.FromArgb(backColor ^ 0x00FFFFFF);
    pixels.Push(fillPoint);
    while (pixels.Count != 0)
    {
        Point p = pixels.Pop();
        if (b.GetPixel(p.X, p.Y).ToArgb() != backColor)
            continue; // already filled (a pixel can be pushed more than once)
        count++;
        b.SetPixel(p.X, p.Y, fillColor);
        if (p.X > 0 && b.GetPixel(p.X - 1, p.Y).ToArgb() == backColor)
            pixels.Push(new Point(p.X - 1, p.Y));
        if (p.Y > 0 && b.GetPixel(p.X, p.Y - 1).ToArgb() == backColor)
            pixels.Push(new Point(p.X, p.Y - 1));
        if (p.X < b.Width - 1 && b.GetPixel(p.X + 1, p.Y).ToArgb() == backColor)
            pixels.Push(new Point(p.X + 1, p.Y));
        if (p.Y < b.Height - 1 && b.GetPixel(p.X, p.Y + 1).ToArgb() == backColor)
            pixels.Push(new Point(p.X, p.Y + 1));
    }
    return count;
}
UPDATE
The code above works only for 4-connected ("quadruply-linked") enclosed areas. The following code also works with 8-connected ("octuply-linked") enclosed areas.
// Offset points: the eight neighbours of a pixel.
Point[] Offsets = new Point[]
{
    new Point(-1, -1),
    new Point( 0, -1),
    new Point(+1, -1),
    new Point(+1,  0),
    new Point(+1, +1),
    new Point( 0, +1),
    new Point(-1, +1),
    new Point(-1,  0),
};
...
private int Fill(Bitmap b, Point fillPoint)
{
    int count = 0;
    Stack<Point> pixels = new Stack<Point>();
    int backColor = b.GetPixel(fillPoint.X, fillPoint.Y).ToArgb();
    // Again, fill with a color different from backColor so pixels are not revisited.
    Color fillColor = Color.FromArgb(backColor ^ 0x00FFFFFF);
    pixels.Push(fillPoint);
    while (pixels.Count != 0)
    {
        Point p = pixels.Pop();
        if (b.GetPixel(p.X, p.Y).ToArgb() != backColor)
            continue; // already filled
        count++;
        b.SetPixel(p.X, p.Y, fillColor);
        foreach (var offset in Offsets)
        {
            int x = p.X + offset.X, y = p.Y + offset.Y;
            if (x >= 0 && y >= 0 && x < b.Width && y < b.Height
                && b.GetPixel(x, y).ToArgb() == backColor)
                pixels.Push(new Point(x, y));
        }
    }
    return count;
}
The picture below clearly demonstrates what I mean. One could also add more distant points to the offset array in order to be able to fill areas with gaps, as sketched below.
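For instance, a hypothetical extension (my own illustration, not from the original answer) adds distance-2 offsets so the fill can jump across one-pixel gaps, at the risk of leaking through thin borders:

// Also consider neighbours two pixels away, so the fill crosses 1-pixel gaps.
Point[] WideOffsets = new Point[]
{
    new Point(-2, 0), new Point(+2, 0),
    new Point(0, -2), new Point(0, +2),
};
// Then iterate over Offsets.Concat(WideOffsets) (System.Linq) instead of Offsets,
// with the same bounds checks as above.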
I've had great success using OpenCV. There is a wrapper library for .NET called Emgu CV.
Here is a question covering alternatives to Emgu CV: .Net (dotNet) wrappers for OpenCV?
That library contains functions for identifying contours and computing properties of them. You can search for cvContourArea to find more information.
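As a rough illustration (untested; exact namespaces and signatures depend on your Emgu CV version, so verify against its docs), counting the enclosed regions might look something like this:

// Sketch only: assumes Emgu CV 4.x, where CvInvoke.FindContours and
// CvInvoke.ContourArea are available.
using Emgu.CV;
using Emgu.CV.CvEnum;
using Emgu.CV.Structure;
using Emgu.CV.Util;

static int CountEnclosedAreas(string imagePath)
{
    using (var gray = new Image<Gray, byte>(imagePath))
    using (var binary = gray.ThresholdBinary(new Gray(128), new Gray(255)))
    using (var contours = new VectorOfVectorOfPoint())
    {
        // Each detected contour bounds one region of the binarized image.
        CvInvoke.FindContours(binary, contours, null, RetrType.Ccomp,
                              ChainApproxMethod.ChainApproxSimple);
        return contours.Size;
        // CvInvoke.ContourArea(contours[i]) would give the area of region i.
    }
}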
If you are looking for a quick solution to this specific problem and want to write your own code rather than reusing others, I do not have an algorithm I could give that does that. Sorry.
There are a couple of special cases in the sample image; you would have to decide how to deal with them.
Generally you will start by converting the raster image into a series of polygons. Then it is a fairly trivial matter to calculate area (see Servy's comment).
The special cases are the side of the face and the mouth. Both are open shapes, not closed, so you need to figure out how to close them.
I think this comes down to counting the number of (non-black) pixels in each region. Choose one pixel that is not black and add it to a HashSet<>, then check whether the pixels above, below, to the left, and to the right of it are also non-black.
Every time you find a new non-black pixel (by going up/down/left/right), add it to your set. When you have found them all, count them.
Your region's area is count / (pixelWidthOfTotalDrawing * pixelHeightOfTotalDrawing), multiplied by the area of the full rectangle (depending on the units you want). A sketch of the region growth follows.
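A minimal sketch of that idea (my own illustration; the IsBlack helper and the 0.5 brightness threshold are assumptions):

using System.Collections.Generic;
using System.Drawing;

static bool IsBlack(Color c) { return c.GetBrightness() < 0.5f; }

// Counts the non-black pixels reachable from `start` (4-connected),
// growing the region via a queue and remembering visits in a HashSet.
static int RegionPixelCount(Bitmap bmp, Point start)
{
    var seen = new HashSet<Point>();
    var queue = new Queue<Point>();
    seen.Add(start);
    queue.Enqueue(start);
    while (queue.Count > 0)
    {
        Point p = queue.Dequeue();
        foreach (var n in new[]
        {
            new Point(p.X - 1, p.Y), new Point(p.X + 1, p.Y),
            new Point(p.X, p.Y - 1), new Point(p.X, p.Y + 1)
        })
        {
            if (n.X < 0 || n.Y < 0 || n.X >= bmp.Width || n.Y >= bmp.Height)
                continue;                                  // outside the drawing
            if (seen.Contains(n)) continue;                // already visited
            if (IsBlack(bmp.GetPixel(n.X, n.Y))) continue; // region border
            seen.Add(n);
            queue.Enqueue(n);
        }
    }
    return seen.Count;
}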
Comment: I don't think this looks like a polygon. That's why I had the "fill with paint" function of simple drawing software in mind.
Related
I have a data type representing the maximum forces that can be applied per-axis to a machine in a game I'm making. This data type individually stores six axes -- X, Y, and Z, both positive and negative separately. This is used to define the maximum forces the device can apply in a given direction in 3D space. Since all parts that can apply force to this machine are axis-aligned (and by extension, can only apply force on one or more world-space normals), this does a perfect job of representing its limits.
Using this information, I construct a sort of bounding box. This bounding box is intended to constrain the amount of force that is required to move the machine into a smaller value representing what the machine can actually output.
My current method is to create this virtual bounding box from this data and see where my required force (visualized as a line segment where the first point is the origin of 3D space and the second point is the required force vector itself) intersects at least one of the faces on this virtual rectangular prism. I haven't quite gotten this method working due to some issues understanding how to pick apart this problem on my end, which is part of what I aim to remedy by asking this question. The bounding box always contains the origin of 3D space, but the origin of the box itself is not guaranteed to be equal to the origin of 3D space.
I think this box analogy is the best way to go so far, given that my limited force is best represented as where a line intersects this virtual box. Is there a better way to constrain a point within a box in the manner I need, without simply constraining it per-axis? Here are the constants of the problem:
The origin of the line segment representing my force will always be equal to the origin of 3D space.
As a result, the combined direction * magnitude of the line segment representing force will always be equal to the force itself.
The origin of 3D space will never be outside of this virtual box.
I think I have the right idea, but it feels far too complicated. There is probably a lot I can cut out to make this calculation easier; the issue is that I'm not certain where to start to make it as clean and efficient as possible, let alone how best to actually solve the problem.
Here's the code I've tried.
public Vector3f ConstrainedWithinScaled(Vector3f value) {
    // Vector3f is a simple class that contains x, y, z, basic arithmetic operators (*/+-),
    // Magnitude/Normalized properties, and Dot/Cross methods.
    // This method lives in the DualVector3f class described above. It contains properties
    // PosX, PosY, PosZ, NegX, NegY, and NegZ -- all of these have positive values, as they
    // describe magnitude on that specific face.
    if (IsInBounds(value)) {
        // The input value is already within the constraints of the virtual box.
        return value;
    }

    // Get all intersections.
    // The first returned intersection that resides on this box's surface is correct.
    // Only when the point resides on a corner or edge will multiple of these conditions
    // be true, and in those cases the returned point is identical for all 2 or 3 faces.
    Vector3f center = Center;
    Vector3f size = Size;
    // Cache these so that I don't calculate them every single time.
    // They are calculated from the minimum/maximum coordinates to give the center and the
    // size of the virtual box respectively.
    // public Vector3f Size => Negative + Positive; (Negative and Positive both only have positive components, as outlined up top)
    // public Vector3f Center => Positive - (Size / 2); (Positive is always the maximum)
    // As should be evident, Positive is composed of PosX, PosY, and PosZ.
    // Likewise, Negative is composed of NegX, NegY, and NegZ.

    Vector3f topIntersection = IntersectPoint(value, new Vector3f(0, 1, 0), center + new Vector3f(0, size.y, 0));
    if (topIntersection.y == PosY) return topIntersection;
    Vector3f bottomIntersection = IntersectPoint(value, new Vector3f(0, 1, 0), center - new Vector3f(0, size.y, 0));
    if (bottomIntersection.y == -NegY) return bottomIntersection;
    Vector3f leftIntersection = IntersectPoint(value, new Vector3f(-1, 0, 0), center - new Vector3f(size.x, 0, 0));
    if (leftIntersection.x == -NegX) return leftIntersection;
    Vector3f rightIntersection = IntersectPoint(value, new Vector3f(1, 0, 0), center + new Vector3f(size.x, 0, 0));
    if (rightIntersection.x == PosX) return rightIntersection;
    Vector3f frontIntersection = IntersectPoint(value, new Vector3f(0, 0, 1), center + new Vector3f(0, 0, size.z));
    if (frontIntersection.z == PosZ) return frontIntersection;
    Vector3f backIntersection = IntersectPoint(value, new Vector3f(0, 0, -1), center - new Vector3f(0, 0, size.z));
    if (backIntersection.z == -NegZ) return backIntersection;

    return new Vector3f(); // Fallback. This should theoretically never occur; it simply satisfies the need to return.
}

// Derived from https://rosettacode.org/wiki/Find_the_intersection_of_a_line_with_a_plane#C.23;
// the "rayOrigin" parameter is omitted because it is always a zero vector here.
private static Vector3f IntersectPoint(Vector3f dirWithMag, Vector3f planeNormal, Vector3f planeCenter) {
    Vector3f diff = -planeCenter;
    float prod1 = diff.Dot(planeNormal);
    float prod2 = dirWithMag.Dot(planeNormal);
    float prod3 = prod1 / prod2;
    return dirWithMag * -prod3;
}
Thanks.
I found that the best way to solve this problem is to take the vector and find which component is furthest out of bounds. Once I've found that component, I divide all components by the factor representing how far out of bounds it is, which scales the whole vector down to fit within the box while preserving its direction. This satisfies the constraint because it places the point on the boundary of the box. For example, with PosX = 10 (and the other limits large) a requested force of (20, 5, 0) gives a largest factor of 20 / 10 = 2, so the result is (10, 2.5, 0).
Here's the code that does it:
public Vector3f ConstrainedWithinScaled(Vector3f value) {
    if (IsInBounds(value)) {
        // The input value is already within the constraints of the virtual box.
        return value;
    }

    float factorX = 1;
    float factorY = 1;
    float factorZ = 1;
    if (value.x > PosX) {
        factorX = value.x / PosX;
    } else if (value.x < -NegX) {
        factorX = value.x / -NegX;
    }
    if (value.y > PosY) {
        factorY = value.y / PosY;
    } else if (value.y < -NegY) {
        factorY = value.y / -NegY;
    }
    if (value.z > PosZ) {
        factorZ = value.z / PosZ;
    } else if (value.z < -NegZ) {
        factorZ = value.z / -NegZ;
    }

    float largestFactor = Mathf.Max(factorX, factorY, factorZ);
    // Catch case: box has zero size.
    if (largestFactor == 0) return Vector3f.Zero;
    return value / largestFactor;
}
My robot needs to move from a source to a target with obstacle avoidance. I can detect the obstacle (rectangular shape) and the target (circular shape) in pixels, but I don't know how to find the path from source to target. Please help me.
Here is the code for finding obstacle and target.
for (int i = 0, n = blobs.Length; i < n; i++)
{
    List<IntPoint> edgePoints = blobCounter.GetBlobsEdgePoints(blobs[i]);
    AForge.Point center;
    float radius;

    // Is it a circle?
    if (shapeChecker.IsCircle(edgePoints, out center, out radius))
    {
        g.DrawEllipse(whitePen, (float)(center.X - radius), (float)(center.Y - radius),
            (float)(radius * 2), (float)(radius * 2));
        target.Add(center.ToString());
    }
    else
    {
        List<IntPoint> corners;

        // Is it a triangle or quadrilateral?
        if (shapeChecker.IsConvexPolygon(edgePoints, out corners))
        {
            // Get the sub-type.
            PolygonSubType subType = shapeChecker.CheckPolygonSubType(corners);
            Pen pen;
            if (subType == PolygonSubType.Unknown)
            {
                pen = (corners.Count == 4) ? redPen : bluePen;
            }
            else
            {
                pen = (corners.Count == 4) ? greenPen : brownPen;
            }
            g.DrawPolygon(pen, ToPointsArray(corners));
        }
    }
}
The code above detects the obstacle and target position pixel values and stores them in separate arrays. But from these pixel values, how do I calculate the path? I'm waiting for your suggestions.
Try looking up the A* search algorithm.
I have not looked into your code, but this is a classic pathfinding problem. One suggestion: map the entire area the robot moves in onto a grid of discrete cells, and then use any graph search algorithm to find a path from the start cell to the goal cell.
You can use one of several algorithms, such as Dijkstra's, best-first, or A* search. It turns out that A* is efficient and easy to implement. Check this link; it contains a nice explanation of A*. A minimal grid-based sketch follows.
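To make the grid idea concrete, here is a minimal A* sketch on a boolean occupancy grid (my own illustration, not tied to the AForge code above; it uses .NET 6's PriorityQueue, but any min-heap works on older frameworks):

using System;
using System.Collections.Generic;
using System.Drawing;

// blocked[x, y] == true marks an obstacle cell. Returns the cell path, or null if none.
static List<Point> FindPath(bool[,] blocked, Point start, Point goal)
{
    int w = blocked.GetLength(0), h = blocked.GetLength(1);
    int H(Point p) => Math.Abs(p.X - goal.X) + Math.Abs(p.Y - goal.Y); // Manhattan heuristic
    var cameFrom = new Dictionary<Point, Point>();
    var g = new Dictionary<Point, int> { [start] = 0 };   // best known cost to each cell
    var open = new PriorityQueue<Point, int>();
    open.Enqueue(start, H(start));
    var dirs = new[] { new Point(1, 0), new Point(-1, 0), new Point(0, 1), new Point(0, -1) };
    while (open.Count > 0)
    {
        Point cur = open.Dequeue();
        if (cur == goal)
        {
            // Walk the cameFrom chain back to the start and reverse it.
            var path = new List<Point> { cur };
            while (cameFrom.TryGetValue(cur, out Point prev)) { cur = prev; path.Add(cur); }
            path.Reverse();
            return path;
        }
        foreach (var d in dirs)
        {
            var next = new Point(cur.X + d.X, cur.Y + d.Y);
            if (next.X < 0 || next.Y < 0 || next.X >= w || next.Y >= h || blocked[next.X, next.Y])
                continue;
            int tentative = g[cur] + 1;
            if (!g.TryGetValue(next, out int old) || tentative < old)
            {
                g[next] = tentative;
                cameFrom[next] = cur;
                open.Enqueue(next, tentative + H(next)); // f = g + h
            }
        }
    }
    return null; // no path exists
}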
I have certain objects in a 2D plane. I'm using a drawing technology (DrawingVisual) which draws the elements as if it pushes them onto a stack: the first element is on the bottom, the second on top of it, and so on. The problem is that I need all objects except one (the background) on the same Z level, because in the current state of my program everything appears to rotate in a sort of 3D way, when it is supposed to rotate in a 2D way. I understand that this explanation is not good, so please refer to the images below.
Before rotating by theta angle :
After rotating by a theta angle :
You can see how the two lines start to overlap, and this mustn't happen. They get closer to each other as I rotate the figure, and there's a certain angle at which the two lines become fully overlapped and look like one line. I want to avoid that.
The formulae I use for the rotation:
foreach (var item in Visuals)
{
    var p = new Point(item.Position.X - center.X, item.Position.Y - center.Y);
    var xnew = p.X * cos - p.Y * sin;
    var ynew = p.X * sin + p.Y * cos;
    p.X = xnew + center.X;
    p.Y = ynew + center.Y;
    item.Update(p.X, p.Y);
}
This is how I get the sin and cos of the angle:
var pos = new Point(position.Y - center.Y, position.X - center.X);
var rad = Math.Atan2(pos.Y, pos.X);
var deg = rad.ToDegrees();
var diff = RotationLastAngle - deg;//The last angle that we rotated to.
RotationLastAngle = deg;
var ans = diff.ToRadians();
Host.Representation.Rotate(Math.Cos(ans), Math.Sin(ans), center);
Update() basically sets the coordinates of item in a single line.
What I think is causing the issue is that the DrawingVisual renders items on layers, and thus one of the lines is higher than the other one (correct me if I'm wrong). I need to find a way to avoid this.
This is how I draw the lines :
var dx = FromAtom.Atom.X - ToAtom.Atom.X;
var dy = FromAtom.Atom.Y - ToAtom.Atom.Y;
var slope = dy / dx;
if (slope > 0)
{
    context.DrawLine(new Pen(Brushes.Black, Thickness),
        new Point(FromAtom.Position.X + 3, FromAtom.Position.Y + 3),
        new Point(ToAtom.Position.X + 3, ToAtom.Position.Y + 3));
    context.DrawLine(new Pen(Brushes.Black, Thickness),
        new Point(FromAtom.Position.X - 3, FromAtom.Position.Y - 3),
        new Point(ToAtom.Position.X - 3, ToAtom.Position.Y - 3));
}
else
{
    context.DrawLine(new Pen(Brushes.Black, Thickness),
        new Point(FromAtom.Position.X + 3, FromAtom.Position.Y - 3),
        new Point(ToAtom.Position.X + 3, ToAtom.Position.Y - 3));
    context.DrawLine(new Pen(Brushes.Black, Thickness),
        new Point(FromAtom.Position.X - 3, FromAtom.Position.Y + 3),
        new Point(ToAtom.Position.X - 3, ToAtom.Position.Y + 3));
}
Adam Nathan's WPF 4.5 Unleashed says: "Later drawings are placed on top of earlier drawings, so they preserve proper Z ordering." This refers to GeometryDrawing, but I think it holds for drawing lines too.
In your case the result fails because you are offsetting the lines by 3 pixels vertically or horizontally with no consideration of the actual orientation of the line. The offset vector needs to be rotated with the line in order to achieve this effect. The problem then is that the lines will not meet cleanly at joints and will have gaps or overlaps. To solve that problem you will need some math.
Drawing parallel lines (or polygons) is not a trivial problem. See this answer for how to do it, but in your case you might be able to use compound lines. A sketch of the rotated-offset idea follows the example below.
Parallel Line Example
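Here is a small sketch of that rotated offset (my own illustration, assuming WPF's DrawingContext as in the question; the gap parameter replaces the hard-coded 3):

using System;
using System.Windows;
using System.Windows.Media;

// Draws two lines offset perpendicular to the segment's direction,
// instead of by a fixed (±3, ±3) as in the original code.
static void DrawDoubleLine(DrawingContext context, Pen pen, Point from, Point to, double gap)
{
    double dx = to.X - from.X, dy = to.Y - from.Y;
    double len = Math.Sqrt(dx * dx + dy * dy);
    if (len < 1e-9) return; // degenerate segment
    // Unit normal: the direction vector rotated by 90 degrees, scaled to `gap`.
    var offset = new Vector(-dy / len * gap, dx / len * gap);
    context.DrawLine(pen, from + offset, to + offset);
    context.DrawLine(pen, from - offset, to - offset);
}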
I am trying to draw a rectangular object that allows the user to click on a corner-point to resize and also rotate the rectangle in a 2D space.
Therefore I am using an array of four points ordered A, B, C, D (or 0, 1, 2, 3) from top-left to bottom-left in clockwise order.
The rotation works fine, I calculate the center point and rotate each point around it by an angle.
The resizing is done by determining which point was pressed down, and then setting its new position to the position of the mouse on each MouseMove event. The two adjacent points then need to be updated to stay in a rectangular shape. The resizing is intermittently not working. I have tried many ways to error-check, but all leave me with the same problem where if I move the mouse back and forth over the opposing point while moving a point, the points get distorted and are no longer a rectangular shape.
SOURCE CODE HERE
https://www.assembla.com/code/moozhe-testing/subversion/nodes/rotateRectangle
EXCERPT OF PROBLEM CODE
private void MovePoint(int id, PointF newPoint)
{
    PointF oldPoint = points[id];
    PointF delta = newPoint.Substract(oldPoint);
    PointF pointPrevious = points[(id + 3) % 4];
    PointF pointNext = points[(id + 1) % 4];
    PointF sidePrevious = pointPrevious.Substract(oldPoint);
    PointF sideNext = pointNext.Substract(oldPoint);
    PointF previousProjection = Projection(delta, sidePrevious);
    PointF nextProjection = Projection(delta, sideNext);

    pointNext = pointNext.AddPoints(previousProjection);
    pointPrevious = pointPrevious.AddPoints(nextProjection);

    points[(id + 3) % 4] = pointPrevious;
    points[(id + 1) % 4] = pointNext;
    points[id] = newPoint;
}

private PointF Projection(PointF vectorA, PointF vectorB)
{
    PointF vectorBUnit = new PointF(vectorB.X, vectorB.Y);
    vectorBUnit = vectorBUnit.Normalize();
    float dotProduct = vectorA.X * vectorBUnit.X + vectorA.Y * vectorBUnit.Y;
    return vectorBUnit.MultiplyByDecimal(dotProduct);
}
It sounds like you might want to be using a transformation matrix, instead of updating X/Y coordinates manually. Please check out this link:
Comparing GDI mapping modes with GDI+ transforms
Here's the MSDN reference:
https://learn.microsoft.com/en-us/dotnet/api/system.drawing.graphics.transform
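For example (a sketch assuming GDI+, i.e. System.Drawing, to match the PointF code above), the rotation can be delegated entirely to the Matrix class:

using System.Drawing;
using System.Drawing.Drawing2D;

// Rotates the corner points in place around `center` by `angleDegrees`.
static void RotatePoints(PointF[] points, PointF center, float angleDegrees)
{
    using (var m = new Matrix())
    {
        m.RotateAt(angleDegrees, center);
        m.TransformPoints(points);
    }
}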
I'm trying to write a program to programmatically determine the tilt or angle of rotation in an arbitrary image.
Images have the following properties:
Consist of dark text on a light background
Occasionally contain horizontal or vertical lines which only intersect at 90 degree angles.
Skewed between -45 and 45 degrees.
See this image as a reference (it's been skewed 2.8 degrees).
So far, I've come up with this strategy: Draw a route from left to right, always selecting the nearest white pixel. Presumably, the route from left to right will prefer to follow the path between lines of text along the tilt of the image.
Here's my code:
private bool IsWhite(Color c) { return c.GetBrightness() >= 0.5 || c == Color.Transparent; }
private bool IsBlack(Color c) { return !IsWhite(c); }
private double ToDegrees(decimal slope) { return (180.0 / Math.PI) * Math.Atan(Convert.ToDouble(slope)); }

private void GetSkew(Bitmap image, out double minSkew, out double maxSkew)
{
    decimal minSlope = 0.0M;
    decimal maxSlope = 0.0M;
    for (int start_y = 0; start_y < image.Height; start_y++)
    {
        int end_y = start_y;
        for (int x = 1; x < image.Width; x++)
        {
            int above_y = Math.Max(end_y - 1, 0);
            int below_y = Math.Min(end_y + 1, image.Height - 1);
            Color center = image.GetPixel(x, end_y);
            Color above = image.GetPixel(x, above_y);
            Color below = image.GetPixel(x, below_y);
            if (IsWhite(center)) { /* no change to end_y */ }
            else if (IsWhite(above) && IsBlack(below)) { end_y = above_y; }
            else if (IsBlack(above) && IsWhite(below)) { end_y = below_y; }
        }
        decimal slope = (Convert.ToDecimal(start_y) - Convert.ToDecimal(end_y)) / Convert.ToDecimal(image.Width);
        minSlope = Math.Min(minSlope, slope);
        maxSlope = Math.Max(maxSlope, slope);
    }
    minSkew = ToDegrees(minSlope);
    maxSkew = ToDegrees(maxSlope);
}
This works well on some images, not so well on others, and it's slow.
Is there a more efficient, more reliable way to determine the tilt of an image?
I've made some modifications to my code, and it certainly runs a lot faster, but it's not very accurate.
I've made the following improvements:
Using Vinko's suggestion, I avoid GetPixel in favor of working with bytes directly; now the code runs at the speed I needed.
My original code simply used "IsBlack" and "IsWhite", but this isn't granular enough. The original code traces the following paths through the image:
http://img43.imageshack.us/img43/1545/tilted3degtextoriginalw.gif
Note that a number of paths pass through the text. By comparing the center, above, and below pixels by their actual brightness values and selecting the brightest one, I get a much better route. Basically I'm treating the bitmap as a heightmap, and the path from left to right follows the contours of the image, resulting in a better path:
http://img10.imageshack.us/img10/5807/tilted3degtextbrightnes.gif
As suggested by Toaomalkster, a Gaussian blur smooths out the height map, and I get even better results:
http://img197.imageshack.us/img197/742/tilted3degtextblurredwi.gif
Since this is just prototype code, I blurred the image using GIMP, I did not write my own blur function.
The selected path is pretty good for a greedy algorithm.
As Toaomalkster suggested, choosing the min/max slope is naive. A simple linear regression provides a better approximation of the slope of a path. Additionally, I should cut a path short once I run off the edge of the image; otherwise the path will hug the top of the image and give an incorrect slope.
Code
private double ToDegrees(double slope) { return (180.0 / Math.PI) * Math.Atan(slope); }

// (Uses System.Linq for Average/Sum/Where.)
private double GetSkew(Bitmap image)
{
    BrightnessWrapper wrapper = new BrightnessWrapper(image);
    LinkedList<double> slopes = new LinkedList<double>();
    for (int y = 0; y < wrapper.Height; y++)
    {
        int endY = y;
        long sumOfX = 0;
        long sumOfY = y;
        long sumOfXY = 0;
        long sumOfXX = 0;
        int itemsInSet = 1;
        for (int x = 1; x < wrapper.Width; x++)
        {
            int aboveY = endY - 1;
            int belowY = endY + 1;
            if (aboveY < 0 || belowY >= wrapper.Height)
            {
                break;
            }
            int center = wrapper.GetBrightness(x, endY);
            int above = wrapper.GetBrightness(x, aboveY);
            int below = wrapper.GetBrightness(x, belowY);
            if (center >= above && center >= below) { /* no change to endY */ }
            else if (above >= center && above >= below) { endY = aboveY; }
            else if (below >= center && below >= above) { endY = belowY; }
            itemsInSet++;
            sumOfX += x;
            sumOfY += endY;
            sumOfXX += (x * x);
            sumOfXY += (x * endY);
        }
        // least squares slope = (NΣ(XY) - (ΣX)(ΣY)) / (NΣ(X^2) - (ΣX)^2), where N = elements in set
        if (itemsInSet > image.Width / 2) // path covers at least half of the image
        {
            decimal sumOfX_d = Convert.ToDecimal(sumOfX);
            decimal sumOfY_d = Convert.ToDecimal(sumOfY);
            decimal sumOfXY_d = Convert.ToDecimal(sumOfXY);
            decimal sumOfXX_d = Convert.ToDecimal(sumOfXX);
            decimal itemsInSet_d = Convert.ToDecimal(itemsInSet);
            decimal slope =
                ((itemsInSet_d * sumOfXY_d) - (sumOfX_d * sumOfY_d))
                /
                ((itemsInSet_d * sumOfXX_d) - (sumOfX_d * sumOfX_d));
            slopes.AddLast(Convert.ToDouble(slope));
        }
    }
    double mean = slopes.Average();
    double sumOfSquares = slopes.Sum(d => Math.Pow(d - mean, 2));
    double stddev = Math.Sqrt(sumOfSquares / (slopes.Count - 1));
    // select items within 1 standard deviation of the mean
    var testSample = slopes.Where(x => Math.Abs(x - mean) <= stddev);
    return ToDegrees(testSample.Average());
}

class BrightnessWrapper
{
    // Assumes 24bpp RGB pixel data (3 bytes per pixel).
    byte[] rgbValues;
    int stride;
    public int Height { get; private set; }
    public int Width { get; private set; }

    public BrightnessWrapper(Bitmap bmp)
    {
        Rectangle rect = new Rectangle(0, 0, bmp.Width, bmp.Height);
        System.Drawing.Imaging.BitmapData bmpData =
            bmp.LockBits(rect,
                         System.Drawing.Imaging.ImageLockMode.ReadOnly,
                         bmp.PixelFormat);
        IntPtr ptr = bmpData.Scan0;
        int bytes = bmpData.Stride * bmp.Height;
        this.rgbValues = new byte[bytes];
        System.Runtime.InteropServices.Marshal.Copy(ptr, rgbValues, 0, bytes);
        bmp.UnlockBits(bmpData); // release the bitmap; we work from our copy of the bytes
        this.Height = bmp.Height;
        this.Width = bmp.Width;
        this.stride = bmpData.Stride;
    }

    public int GetBrightness(int x, int y)
    {
        int position = (y * this.stride) + (x * 3);
        int b = rgbValues[position];
        int g = rgbValues[position + 1];
        int r = rgbValues[position + 2];
        // quick luma approximation: green weighted most, blue least
        return (r + r + b + g + g + g) / 6;
    }
}
The code is good, but not great. Large amounts of whitespace cause the program to trace a relatively flat line, resulting in a slope near 0, which causes the code to underestimate the actual tilt of the image.
There is no appreciable difference in the accuracy of the tilt by selecting random sample points vs sampling all points, because the ratio of "flat" paths selected by random sampling is the same as the ratio of "flat" paths in the entire image.
GetPixel is slow. You can get an order of magnitude speed up using the approach listed here.
If the text is left (right) aligned, you can determine the slope by measuring the distance between the left (right) edge of the image and the first dark pixel at two random places, and calculating the slope from that. Additional measurements would lower the error at the cost of additional time. A sketch follows.
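A sketch of that measurement (my own illustration; it assumes left-aligned text and treats anything below 0.5 brightness as dark):

using System;
using System.Drawing;

// Horizontal drift of the left text margin per row, measured between
// two sample scanlines y1 and y2; pass through Math.Atan for an angle.
static double LeftMarginSlope(Bitmap b, int y1, int y2)
{
    int FirstDarkX(int y)
    {
        for (int x = 0; x < b.Width; x++)
            if (b.GetPixel(x, y).GetBrightness() < 0.5f) return x;
        return -1; // blank row, no dark pixel
    }
    int x1 = FirstDarkX(y1), x2 = FirstDarkX(y2);
    if (x1 < 0 || x2 < 0 || y1 == y2) return double.NaN; // need two usable rows
    return (double)(x2 - x1) / (y2 - y1);
}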
First I must say I like the idea. But I've never had to do this before, and I'm not sure what to suggest to improve reliability. The first thing I can think of is the idea of throwing out statistical anomalies: if the slope suddenly changes sharply, then you know you've found a white section of the image that dips into the edge, skewing (no pun intended) your results. So you'd want to throw that stuff out somehow.
But from a performance standpoint there are a number of optimizations you could make which may add up.
Namely, I'd change this snippet in your inner loop from this:
Color center = image.GetPixel(x, end_y);
Color above = image.GetPixel(x, above_y);
Color below = image.GetPixel(x, below_y);
if (IsWhite(center)) { /* no change to end_y */ }
else if (IsWhite(above) && IsBlack(below)) { end_y = above_y; }
else if (IsBlack(above) && IsWhite(below)) { end_y = below_y; }
To this:
Color center = image.GetPixel(x, end_y);
if (IsWhite(center)) { /* no change to end_y */ }
else
{
    Color above = image.GetPixel(x, above_y);
    Color below = image.GetPixel(x, below_y);
    if (IsWhite(above) && IsBlack(below)) { end_y = above_y; }
    else if (IsBlack(above) && IsWhite(below)) { end_y = below_y; }
}
It's the same effect but should drastically reduce the number of calls to GetPixel.
Also consider putting the values that don't change into variables before the madness begins. Things like image.Height and image.Width have a slight overhead every time you call them. So store those values in your own variables before the loops begin. The thing I always tell myself when dealing with nested loops is to optimize everything inside the most inner loop at the expense of everything else.
Also... as Vinko Vrsalovic suggested, you may look at his GetPixel alternative for yet another boost in speed.
At first glance, your code looks overly naive, which explains why it doesn't always work. I like the approach Steve Wortham suggested, but it might run into problems if you have background images.
Another approach that often helps with images is to blur them first. If you blur your example image enough, each line of text will end up as a blurry smooth line. You then apply some sort of algorithm to basically do a regression analysis. There are lots of ways to do that, and lots of examples on the net. Edge detection might be useful, or it might cause more problems than it's worth.
By the way, a Gaussian blur can be implemented very efficiently if you search hard enough for the code. Otherwise, I'm sure there are lots of libraries available. I haven't done much of that lately, so I don't have any links on hand, but a search for an image processing library will get you good results.
I'm assuming you're enjoying the fun of solving this, so there's not much in the way of actual implementation details here.
Measuring the angle of every line seems like overkill, especially given the performance of GetPixel.
I wonder if you would have better performance luck by looking for a white triangle in the upper-left or upper-right corner (depending on the slant direction) and measuring the angle of the hypotenuse. All text should follow the same angle on the page, and the upper-left corner of a page won't get tricked by the descenders or whitespace of content above it.
Another tip to consider: rather than blurring, work within a greatly-reduced resolution. That will give you both the smoother data you need, and fewer GetPixel calls.
For example, I made a blank page detection routine once in .NET for faxed TIFF files that simply resampled the entire page to a single pixel and tested the value for a threshold value of white.
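For the resolution-reduction tip above, a one-liner is enough in GDI+, since the scaling Bitmap constructor resamples with interpolation (a sketch; `original` stands for your Bitmap, and GetSkew for the routine above):

// Analyze a 1/8-scale copy: smoother data and far fewer GetPixel calls.
using (var small = new Bitmap(original, original.Width / 8, original.Height / 8))
{
    double skew = GetSkew(small);
}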
What are your constraints in terms of time?
The Hough transform is a very effective mechanism for determining the skew angle of an image. It can be costly in time, but if you're going to use Gaussian blur, you're already burning a pile of CPU time. There are also other ways to accelerate the Hough transform that involve creative image sampling.
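A rough, GetPixel-based sketch of the idea (my own illustration, slow but short): accumulate ρ = x·cos θ + y·sin θ for every dark pixel over candidate angles near horizontal, then pick the angle whose vote histogram is peakiest. At the correct angle, the text lines concentrate votes at a few ρ values.

using System;
using System.Drawing;

static double HoughSkewDegrees(Bitmap b, double maxDeg = 45, double stepDeg = 0.25)
{
    int nAngles = (int)(2 * maxDeg / stepDeg) + 1;
    int diag = (int)Math.Ceiling(Math.Sqrt((double)b.Width * b.Width + (double)b.Height * b.Height));
    int bins = 2 * diag + 1;
    var acc = new int[nAngles, bins];
    // Precompute angles: horizontal lines have normals near 90°, so scan 90° ± maxDeg.
    var cosT = new double[nAngles];
    var sinT = new double[nAngles];
    for (int a = 0; a < nAngles; a++)
    {
        double theta = (90 - maxDeg + a * stepDeg) * Math.PI / 180.0;
        cosT[a] = Math.Cos(theta);
        sinT[a] = Math.Sin(theta);
    }
    for (int y = 0; y < b.Height; y++)
        for (int x = 0; x < b.Width; x++)
        {
            if (b.GetPixel(x, y).GetBrightness() >= 0.5f) continue; // light pixel, skip
            for (int a = 0; a < nAngles; a++)
            {
                int rho = (int)Math.Round(x * cosT[a] + y * sinT[a]);
                acc[a, rho + diag]++; // vote
            }
        }
    int bestA = 0;
    double bestScore = double.MinValue;
    for (int a = 0; a < nAngles; a++)
    {
        // Peaky histogram (sharp text lines) => high variance of the bin counts.
        double sum = 0, sumSq = 0;
        for (int r = 0; r < bins; r++) { sum += acc[a, r]; sumSq += (double)acc[a, r] * acc[a, r]; }
        double variance = sumSq / bins - (sum / bins) * (sum / bins);
        if (variance > bestScore) { bestScore = variance; bestA = a; }
    }
    return bestA * stepDeg - maxDeg; // skew angle relative to horizontal
}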
Your latest output is confusing me a little.
When you superimposed the blue lines on the source image, did you offset it a bit? It looks like the blue lines are about 5 pixels above the centre of the text.
Not sure about that offset, but you definitely have a problem with the derived line "drifting" away at the wrong angle. It seems to have too strong a bias towards producing a horizontal line.
I wonder if increasing your mask window from 3 pixels (centre, one above, one below) to 5 might improve this (two above, two below). You'll also get this effect if you follow richardtallent's suggestion and resample the image smaller.
Very cool path finding application.
I wonder if this other approach would help or hurt with your particular data set.
Assume a black and white image:
1. Project all black pixels to the right (EAST). The result is a one-dimensional array of size IMAGE_HEIGHT; call the array CANVAS.
2. As you project all the pixels EAST, keep track numerically of how many pixels project into each bin of CANVAS.
3. Rotate the image by an arbitrary number of degrees and re-project.
4. Pick the result that gives the highest peaks and lowest valleys for values in CANVAS.
I imagine this will not work well if you really have to account for a full -45 to +45 degrees of tilt. If the actual range is smaller (say, +/- 10 degrees), this might be a pretty good strategy. Once you have an initial result, you could re-run with a smaller increment of degrees to fine-tune the answer. I might therefore write this as a function that accepts a float degree_tick as a parameter, so I could run both a coarse and a fine pass (or a spectrum of coarseness or fineness) with the same code.
This might be computationally expensive. To optimize, you might consider selecting just a portion of the image to project-test-rotate-repeat on.
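A sketch of that project-test-rotate loop (my own illustration; degreeTick is the coarseness parameter described above, and variance of the CANVAS bins stands in for "highest peaks and lowest valleys"):

using System;
using System.Drawing;
using System.Linq;

static double BestProjectionAngle(Bitmap b, double maxDeg, double degreeTick)
{
    double bestDeg = 0, bestScore = double.MinValue;
    double cx = b.Width / 2.0, cy = b.Height / 2.0;
    for (double deg = -maxDeg; deg <= maxDeg; deg += degreeTick)
    {
        double rad = deg * Math.PI / 180.0, sin = Math.Sin(rad), cos = Math.Cos(rad);
        var canvas = new int[b.Height]; // one bin per row, as described above
        for (int y = 0; y < b.Height; y++)
            for (int x = 0; x < b.Width; x++)
            {
                if (b.GetPixel(x, y).GetBrightness() >= 0.5f) continue; // not a black pixel
                // Row this pixel lands in after rotating about the image centre.
                int ry = (int)Math.Round((x - cx) * sin + (y - cy) * cos + cy);
                if (ry >= 0 && ry < canvas.Length) canvas[ry]++;
            }
        double mean = canvas.Average();
        double score = canvas.Sum(v => (v - mean) * (v - mean)); // peaks + valleys => high variance
        if (score > bestScore) { bestScore = score; bestDeg = deg; }
    }
    return bestDeg;
}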