C#: Moving through an image in a spiral motion?

I want to move through the pixels of an image, not line by line and column by column in the "normal" way, but starting at the center pixel and moving outward in a spiral. I'm not sure how to do this.
Any suggestions on how this can be done?

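You can do this by using parametric functions. Note that a constant radius, r(t) = R with x(t) = Rcos(t) and y(t) = Rsin(t), only traces a circle; for a spiral the radius must grow with t, e.g. r(t) = a·t, giving x(t) = a·t·cos(t) and y(t) = a·t·sin(t).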
Do you mean something like this?

It would be helpful to think about this in reverse.
For example, starting at the top left corner and moving in a clockwise direction you would move along the top row, then down the right hand side, along the bottom, and up the left edge to the pixel under the starting point.
Then move along the second row, and continue in a spiral.
Depending on the dimensions of the image you will end up with either a single column of pixels or a single row of pixels and will be moving either up/down or left/right.
From this finishing point you can then follow your steps backwards and process all the pixels as you need to.
To work out your starting position mathematically you would need to know the width/height of the image as well as which pixel you would like to end on and the direction you want to be travelling in when you get to the last pixel.
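A minimal sketch of this reverse approach, assuming (x, y) pixel coordinates with y growing downwards. The hypothetical helper `ReverseSpiral.CenterOut` peels off rings clockwise from the top-left corner and then reverses the list, so the walk starts where the outside-in spiral ends and moves outward counter-clockwise. As described above, that starting point depends on the image dimensions (for an odd-sized square image it is the exact center).

```csharp
using System.Collections.Generic;

static class ReverseSpiral
{
    // Clockwise, outside-in walk from the top-left corner,
    // reversed so it begins at the inner end of the spiral.
    public static List<(int X, int Y)> CenterOut(int width, int height)
    {
        var order = new List<(int X, int Y)>(width * height);
        int top = 0, bottom = height - 1, left = 0, right = width - 1;
        while (top <= bottom && left <= right)
        {
            for (int x = left; x <= right; x++) order.Add((x, top));            // along the top row
            for (int y = top + 1; y <= bottom; y++) order.Add((right, y));      // down the right side
            if (top < bottom)
                for (int x = right - 1; x >= left; x--) order.Add((x, bottom)); // back along the bottom
            if (left < right)
                for (int y = bottom - 1; y > top; y--) order.Add((left, y));    // up the left side
            top++; bottom--; left++; right--;                                   // move in to the next ring
        }
        order.Reverse(); // now starts at the innermost pixel and ends at (0, 0)
        return order;
    }
}
```

Each pixel is produced exactly once, so you can process them in order with a plain `foreach`.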

Something like this should do it:
int x = width / 2;
int y = height / 2;
int left = width * height;
int dir = 0;
int cnt = 1;
int len = 2;
int[] move = { 1, 0, -1, 0, 1 };
while (left > 0) {
if (x >= 0 && x < width && y >= 0 && y < height) {
// here you do something with the pixel at x,y
left--;
}
x += move[dir % 4];
y += move[(dir % 4) + 1];
if (--cnt == 0) {
cnt = len++ / 2;
dir++;
}
}
If the image is not square, the spiral will continue outside the coordinates of the image until the entire image has been covered. The condition in the if statement makes sure that only coordinates that are part of the image are processed.
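The same loop can be wrapped as an iterator so the traversal is kept separate from the per-pixel work. This sketch reuses the answer's logic unchanged (run lengths 1, 1, 2, 2, 3, 3, ... with a turn after each run); the names `CenterSpiral` and `Walk` are made up for illustration.

```csharp
using System.Collections.Generic;

static class CenterSpiral
{
    // Yields every in-bounds pixel, starting at the center and spiralling outward.
    public static IEnumerable<(int X, int Y)> Walk(int width, int height)
    {
        int x = width / 2, y = height / 2;
        int left = width * height;         // pixels still to visit
        int dir = 0, cnt = 1, len = 2;
        int[] move = { 1, 0, -1, 0, 1 };   // dx = move[d], dy = move[d + 1]
        while (left > 0)
        {
            if (x >= 0 && x < width && y >= 0 && y < height)
            {
                yield return (x, y);       // only coordinates inside the image are reported
                left--;
            }
            x += move[dir % 4];
            y += move[(dir % 4) + 1];
            if (--cnt == 0)                // end of the current run: turn
            {
                cnt = len++ / 2;
                dir++;
            }
        }
    }
}
```

For example: `foreach (var (px, py) in CenterSpiral.Walk(bmp.Width, bmp.Height)) { /* use bmp.GetPixel(px, py) */ }`.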

Related

Find the direction of one rectangle relative to another

Problem
How can I find the direction of one rectangle with respect to another? The directions I am interested in are up, down, left, and right. My rectangle is represented by a Cell class, and I am trying to write a method in that class. The method accepts a parameter of type Cell and returns the direction of the passed cell with respect to the calling cell: 1 (up), 2 (down), 3 (left), or 4 (right).
What I tried
I found the midpoints of both rectangles and then compared their x and y coordinates, but this technique does not work in all cases. Whenever I find a missing case, I have to add more if statements, which I think is not good programming practice: the code is becoming more and more error-prone and difficult to understand.
While searching for a solution, I thought Math.Atan2() might work in my case: I could find the angle between the midpoints of the two rectangles and use that angle to determine the direction. But I am not sure whether my thinking is correct.
Please guide me. Should I keep using my function and fix it, or is there a better solution, such as Math.Atan2()? An image for better understanding and the required results are shown below, after the code.
Code
public int dirOfThisCell(Cell cell)
{
int dir = 0;
//find mid points of both cells
PointF midPointThis = this.computeAndGetMidPoint();
PointF midPointCell = cell.computeAndGetMidPoint();
//MessageBox.Show(mess);
//if x of both points is same or with little variance because of variance in sizes of cells
//Comparison Starts!!
if (midPointThis.X > midPointCell.X)
{
//this cell is to the right.
if ((midPointCell.Y) == (midPointThis.Y))
{
dir = 3;
}
else if (Math.Abs(midPointCell.Y) - Math.Abs(midPointThis.Y) < 5) { dir = 3; }
else if (Math.Abs(midPointCell.Y) - Math.Abs(midPointThis.Y) > 5) {
if (midPointThis.Y > midPointCell.Y) { dir = 1; }
else if (midPointThis.Y < midPointCell.Y) dir = 2;
}
// a considerable difference
else { dir = 3; }
//some small variations in y
//else if(Math.Abs()
}
else if (midPointThis.X < midPointCell.X)
{
// this cell is to the left
if ((midPointCell.Y) == (midPointThis.Y))
{
dir = 4;
}
else if (Math.Abs(midPointCell.Y) - Math.Abs(midPointThis.Y) <= 10)
{
dir = 4;
}
}
//if this cell is below
else if (midPointThis.Y > midPointCell.Y)
{
//this cell is down than the cell
if ((midPointCell.X) == (midPointThis.X))
{
dir = 1;
}
//else if (Math.Abs(midPointCell.X) - Math.Abs(midPointThis.X) < 2) { dir = 1; }
}
else if (midPointThis.Y < midPointCell.Y)
{
if ((midPointCell.X) == (midPointThis.X))
{
dir = 2;
}
}
return dir;
}
Image
Sample Image
The sample rectangles are numbered in the picture. A rectangle can be a single cell or a combination of multiple numbered cells.
Sample Solution required
Direction of cell 18 w.r.t. cell 8 should be up (1)
Direction of cell 18 w.r.t. cell 10 should be up (1)
Direction of cell 14 w.r.t. cell 13 should be right (4)
Direction of cell 9 w.r.t. cell 31 should be down (2)
Direction of cell 15 w.r.t. cell 9 should be left (3)
I am working in C#.
Any help would be much appreciated.
Thank You
Calculating the angle of the center of one rectangle relative to the center of another does not seem like a very good idea, because the centers only tell you where each center is: the approach completely ignores the width, height, and orientation of the sides. I know the third one is not a concern in your specific case, since you can safely assume that each side is either horizontal or vertical, but in general it could be an issue as well. To calculate the position of Shape1 (a rectangle in our case) relative to Shape2, you need the minimum and maximum x and y of both.
Shape1 is to the left of Shape2 <=> Shape1.maxX <= Shape2.minX
Shape1 is to the right of Shape2 <=> Shape1.minX >= Shape2.maxX
Shape1 is above Shape2 <=> Shape1.maxY <= Shape2.minY (y grows downwards, as in screen coordinates)
Shape1 is below Shape2 <=> Shape1.minY >= Shape2.maxY
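A sketch of those four tests that returns the asker's codes (1 up, 2 down, 3 left, 4 right). The `Box` struct is a hypothetical stand-in for the `Cell` class, whose members aren't shown, and it assumes screen coordinates where y grows downwards. Returning 0 for overlapping or diagonal rectangles is also an assumption, since the question doesn't say what should happen in that case.

```csharp
// Hypothetical axis-aligned rectangle in screen coordinates (y grows downwards).
readonly struct Box
{
    public readonly float MinX, MinY, MaxX, MaxY;
    public Box(float x, float y, float width, float height)
    {
        MinX = x; MinY = y; MaxX = x + width; MaxY = y + height;
    }
}

static class RelativePosition
{
    public const int None = 0, Up = 1, Down = 2, Left = 3, Right = 4;

    // Direction of 'other' as seen from 'self', using the min/max tests above.
    public static int Direction(Box self, Box other)
    {
        bool left  = other.MaxX <= self.MinX;
        bool right = other.MinX >= self.MaxX;
        bool up    = other.MaxY <= self.MinY;
        bool down  = other.MinY >= self.MaxY;

        bool horizontal = left || right;
        bool vertical = up || down;
        if (horizontal == vertical) return None; // overlapping, or diagonal (e.g. above AND left)
        if (up) return Up;
        if (down) return Down;
        return left ? Left : Right;
    }
}
```

Because adjacent cells share an edge, the tests use `<=` and `>=`, so rectangles that merely touch still count as up, down, left, or right. No midpoint tolerances are needed, however wide or tall the merged cells are.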

Rasterizing image with colored fixed-width chars

I am making a program to turn an image into coloured 0's, but the 0's are not being coloured properly. To get anything resembling the image, I have to start my for loop at 2 and increase it by 3 each time. This is my current code:
public partial class MainWindow : Window
{
public MainWindow()
{
TextSelection textRange;
TextPointer start;
TextPointer startPos;
TextPointer endPos;
System.Drawing.Color x;
int pixelX = 3;
int pixelY = 8;
InitializeComponent();
Bitmap b = new Bitmap(@"E:\Documents\Visual Studio 2015\Projects\RichTextBox Image to ASCII\RichTextBox Image to ASCII\Akarin.jpg");
for (int i = 2; i < 8000; i += 3)
{
textRange = richTextBox1.Selection;
start = richTextBox1.Document.ContentStart;
startPos = start.GetPositionAtOffset(i);
endPos = start.GetPositionAtOffset(i + 1);
textRange.Select(startPos, endPos);
x = b.GetPixel(pixelX, pixelY);
textRange.ApplyPropertyValue(TextElement.ForegroundProperty, new SolidColorBrush(System.Windows.Media.Color.FromArgb(x.A, x.R, x.G, x.B)));
pixelX += 6;
if (pixelX > 1267)
{
pixelX = 3;
pixelY += 16;
}
i += 3;
textRange = richTextBox1.Selection;
start = richTextBox1.Document.ContentStart;
startPos = start.GetPositionAtOffset(i);
endPos = start.GetPositionAtOffset(i + 1);
textRange.Select(startPos, endPos);
x = b.GetPixel(pixelX, pixelY);
textRange.ApplyPropertyValue(TextElement.ForegroundProperty, new SolidColorBrush(System.Windows.Media.Color.FromArgb(x.A, x.R, x.G, x.B)));
pixelX += 7;
if (pixelX > 1267)
{
pixelX = 3;
pixelY += 16;
}
}
}
}
I am putting the code in the for loop twice because, when you take the number of 0's that fit horizontally and work out how many pixels each 0 takes up, it comes to about 6.5 because of the space between the 0's.
EDIT: Something else that is also strange: if you look at the top left corner where the colouring starts, 4 0's in a row are coloured properly, but after that only every other one is coloured.
I see a few serious problems here. Normally when rasterizing, you loop either through the source pixels or through the target pixels. You, however, loop a fixed number of times, coloring roughly 2666 characters ((8000 - 2) / 3). It's also a very bad idea to do things twice in a loop and to change the loop variable (i) inside it. Furthermore, since you have only one loop, you have to handle both axes in a single pass, which is very error-prone.
How about this approach?
Your source image is 1280 × 720 square pixels
Since your zeros are not square you have to know their aspect ratio. If you know that you can calculate how many rows and columns you need. You probably don't want to match them 1:1 as this would give you a huge and stretched image.
Once you know how many rows and columns you need, do two loops, one inside the other and call the loop variables targetX and targetY
If your target image is supposed to be let's say 400 zeroes long in the x-axis, make the first loop go from 1 to 400
Inside the loop, pick one pixel (color) from the source at 1280/400 * targetX. Your first target pixel would be at x position 1280/400 * 1 = 3.2, which is roughly 3 (round the number after calculating it). The second would be at 1280/400 * 2 = 6.4, roughly 6, and so on. I think this is the biggest pain in your algorithm, since you're trying to work around the 6.5 px width. Just round after calculating! If the first is 6.5, make it 7; the second is 13... you get the idea.
Same logic goes for Y axis, but you handle this with targetY.
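The index mapping above can be precomputed once per axis. A minimal sketch: the hypothetical helper `SourceIndices` returns, for each output position 1..n, the rounded source coordinate, clamped so the last one stays inside the image. `MidpointRounding.AwayFromZero` is used so that 6.5 rounds to 7 as suggested, rather than to the even neighbour (the default for `Math.Round`).

```csharp
using System;

static class GridSampler
{
    // Source coordinate for each target position 1..targetCount:
    // round(sourceSize / targetCount * target), clamped to the last pixel.
    public static int[] SourceIndices(int sourceSize, int targetCount)
    {
        var result = new int[targetCount];
        double step = (double)sourceSize / targetCount;  // e.g. 1280 / 400 = 3.2 source px per zero
        for (int target = 1; target <= targetCount; target++)
        {
            int source = (int)Math.Round(step * target, MidpointRounding.AwayFromZero);
            result[target - 1] = Math.Min(source, sourceSize - 1);  // 400 * 3.2 = 1280 is past the edge
        }
        return result;
    }
}
```

With `xs = SourceIndices(1280, 400)` and `ys = SourceIndices(720, rows)`, the two nested loops become `for (int ty = 0; ty < rows; ty++) for (int tx = 0; tx < 400; tx++) { var c = b.GetPixel(xs[tx], ys[ty]); ... }`, and neither loop variable is touched inside the body.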

Select pixels in a hexagon around a center point

I would like to implement a simple function in my code to get a list of the pixel coordinates inside a (hypothetical) hexagon of a certain size around a center point (x,y, or alternatively a linear RGBA byte array, but I can convert later).
Maybe there's a simple solution I have not thought about. Could you think of a neat way to implement this?
All you really need is a list of pixels in one quadrant of the hexagon. Then you can simply "reflect" the x and y coordinates to get the full hexagon (subject to screen bounds of course).
First, I would assume that one of my hexagon's sides is horizontally aligned. I would also assume that by "size of my hexagon" I mean the length (let's call it L) of the vertical line from the center of my hexagon to the bottom side (which is horizontal). Then I would do some algebra and trig on a hexagon with this alignment and size L, taking the origin (0,0) as my center point.
I know, then, that all of the points within the bounds [0, 0, L/sqrt(3), L] (that is, [x-offset, y-offset, width, height]) are definitely inside my hexagon, so add all of these points to my list.
List<Point> pointsInHexagonQuadrant = new List<Point>();
for (int i = 0; i < L/Math.Sqrt(3); i++) //I'm ignoring any casting, you may have to fix.
{
for (int j = 0; j <= L; j++)
{
pointsInHexagonQuadrant.Add(new Point(i,j));
}
}
I know by trig and algebra that the right-most point of my hexagon is at (2*L/sqrt(3), 0), and that from x = L/sqrt(3) to x = 2*L/sqrt(3) the hexagon's sloped side is the line y = 2*L - sqrt(3)*x (it passes through (L/sqrt(3), L) and (2*L/sqrt(3), 0)). I want all the points whose y coordinate is at most that.
for (int i = (int)Math.Ceiling(L/Math.Sqrt(3)); i <= 2*L/Math.Sqrt(3); i++) // continue where the first loop stopped
{
for (int j = 0; j <= 2*L - Math.Sqrt(3)*i; j++) // below the sloped side
{
pointsInHexagonQuadrant.Add(new Point(i,j));
}
}
At this point you have one quadrant of the hexagon, like this:
(0,0)          (2L/sqrt(3),0)
---------------
|             /
|            /
|           /
|          /
|         /
|--------/
(0,L)    (L/sqrt(3),L)
To get the full hexagon, you "reflect" across the x and y axes...
List<Point> pointsInMyHexagon = new List<Point>();
foreach (Point p in pointsInHexagonQuadrant)
{
pointsInMyHexagon.Add(new Point(p.X,p.Y));
pointsInMyHexagon.Add(new Point(-p.X,p.Y));
pointsInMyHexagon.Add(new Point(p.X,-p.Y));
pointsInMyHexagon.Add(new Point(-p.X,-p.Y));
}
Now offset the hexagon to put the center back on your (x,y) point.
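for (int k = 0; k < pointsInMyHexagon.Count; k++)
{
Point p = pointsInMyHexagon[k];
p.Offset(myCenterPoint.X, myCenterPoint.Y); // Point is a struct, so offset a copy...
pointsInMyHexagon[k] = p; // ...and write it back (a foreach variable can't be modified)
}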
It might be crude but the concept should work.
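As an alternative to building a quadrant and reflecting it (which adds the points on the axes more than once), the same boundary conditions can be tested pixel by pixel over the hexagon's bounding box. This is a sketch of that swapped-in approach, not the answer's code; it assumes the same flat-topped orientation with L as the center-to-bottom distance, and the names `Hexagon.Pixels`, `cx`, and `cy` are illustrative.

```csharp
using System;
using System.Collections.Generic;

static class Hexagon
{
    // Pixels inside a flat-topped hexagon centered on (cx, cy) with center-to-bottom distance L.
    // Relative to the center, (dx, dy) is inside when |dy| <= L and |dy| <= 2L - sqrt(3)|dx|.
    public static List<(int X, int Y)> Pixels(int cx, int cy, double L)
    {
        var points = new List<(int X, int Y)>();
        int halfWidth = (int)Math.Floor(2 * L / Math.Sqrt(3)); // right-most vertex is at 2L/sqrt(3)
        int halfHeight = (int)Math.Floor(L);                   // keeps |dy| <= L
        for (int dy = -halfHeight; dy <= halfHeight; dy++)
        {
            for (int dx = -halfWidth; dx <= halfWidth; dx++)
            {
                // the sloped sides, mirrored into all four quadrants via Abs
                if (Math.Abs(dy) <= 2 * L - Math.Sqrt(3) * Math.Abs(dx))
                    points.Add((cx + dx, cy + dy));
            }
        }
        return points;
    }
}
```

Each pixel is visited once, so the list has no duplicates, and the center offset is applied directly instead of in a second pass.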

Is there an efficient algorithm for segmentation of handwritten text?

I want to automatically divide an image of ancient handwritten text by lines (and by words in future).
The first obvious part is preprocessing the image...
I'm just using a simple digitization (based on pixel brightness). After that I store the data in a two-dimensional array.
The next obvious part is analyzing the binary array.
My first algorithm was pretty simple - if there are more black pixels in a row of the array than the root-mean-square of Maximum and Minimum value, then this row is part of line.
After forming the list of lines I cut off lines with height that is less than average.
In the end it turned into a kind of linear regression, trying to minimize the difference between the blank rows and the text rows (I assumed that fact).
My second attempt: I tried to use a genetic algorithm (GA) with several fitness functions.
The chromosome contained 3 values, xo, x1, x2, with xo ∈ [-1, 0], x1 ∈ [0, 0.5], x2 ∈ [0, 0.5].
The function that decides whether a row belongs to a line is (xo + α1·x1 + α2·x2) > 0, where α1 is the scaled sum of black pixels in the row and α2 is the median of the ranges between the extreme black pixels in the row (α1, α2 ∈ [0, 1]).
Other functions I tried are (x1 < α1 OR x2 > α2) and (1/xo + [α1·x1] / [α2·x2]) > 0.
The last function is the most efficient.
The fitness function is
1 / (HeightRange + SpacesRange)
where each range is the difference between the maximum and the minimum. It represents the homogeneity of the text. The global optimum of this function is the smoothest way to divide the image into lines.
I am using C# with my self-coded GA (classical, with 2-point crossover, Gray-code chromosomes, a maximum population of 40, and a mutation rate of 0.05).
Now I have run out of ideas for dividing this image into lines with ~100% accuracy.
What is an efficient algorithm to do this?
UPDATE:
Original BMP (1.3 MB)
UPDATE2:
Improved results on this text to 100%
How I did it:
fixed minor bug in range count
changed the fitness function to 1/((distancesRange+1)*(heightsRange+1))
simplified the classifying function to (1/xo + x2/range) > 0 (the number of points in a row no longer affects classification)
(i.e. optimized input data and made fitness function optimizations more explicit)
Problem:
The GA surprisingly failed to recognize this line. I looked at the debug data of the 'find ranges' function and found that there is too much noise in the 'unrecognized' place.
The function code is below:
public double[] Ranges()
{
var ranges = new double[_original.Height];
for (int y = 0; y < _original.Height; y++ )
{
ranges[y] = 0;
var dx = new List<int>();
int last = 0;
int x = 0;
while (last == 0 && x<_original.Width)
{
if (_bit[x, y])
last = x;
x++;
}
if (last == 0)
{
ranges[y] = 0;
continue;
}
for (x = last; x<_original.Width; x++)
{
if (!_bit[x, y]) continue;
if (last != x - 1)
{
dx.Add((x-last)+1);
}
last = x;
}
if (dx.Count > 2)
{
dx.Sort();
ranges[y] = dx[dx.Count / 2];
//ranges[y] = dx.Average();
}
else
ranges[y] = 0;
}
var maximum = ranges.Max();
for (int i = 0; i < ranges.Length; i++)
{
if (Math.Abs(ranges[i] - 0) < 0.9)
ranges[i] = maximum;
}
return ranges;
}
I'm using some hacks in this code. The main reason: I want to minimize the range between the nearest black pixels, but if there are no pixels the value becomes '0', and then it becomes impossible to solve the problem by finding optima. The second reason: this code changes too frequently.
I'd like to rewrite this code completely, but I have no idea how to do it.
Q:
Is there a more efficient fitness function?
How can I find a more versatile determination function?
Although I'm not sure how to translate the following algorithm into GA (and I'm not sure why you need to use GA for this problem), and I could be off base in proposing it, here goes.
The simple technique I would propose is to count the number of black pixels per row. (Actually it's the dark pixel density per row.) This requires very few operations, and with a few additional calculations it's not difficult to find peaks in the pixel-sum histogram.
A raw histogram will look something like this, where the profile along the left side shows the number of dark pixels in a row. For visibility, the actual count is normalized to stretch out to x = 200.
After some additional, simple processing is added (described below), we can generate a histogram like this that can be clipped at some threshold value. What remains are peaks indicating the center of lines of text.
From there it's a simple matter to find the lines: just clip (threshold) the histogram at some value such as 1/2 or 2/3 the maximum, and optionally check that the width of the peak at your clipping threshold is some minimum value w.
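A minimal sketch of this clipping step, assuming the histogram is a `float[]` indexed by row. The hypothetical `FindLines` returns the row ranges that stay above `fraction * max` for at least `minWidth` rows; the midpoint of each range approximates the center of a line of text.

```csharp
using System.Collections.Generic;
using System.Linq;

static class LineFinder
{
    // Clips the row histogram at fraction * max and returns the row ranges
    // [Start, End] that stay above the clip level for at least minWidth rows.
    public static List<(int Start, int End)> FindLines(float[] histogram, double fraction, int minWidth)
    {
        var lines = new List<(int Start, int End)>();
        if (histogram.Length == 0) return lines;
        double threshold = fraction * histogram.Max();
        int start = -1;  // -1 = not currently inside a peak
        for (int y = 0; y <= histogram.Length; y++)
        {
            // the position past the last row counts as "below", so a peak at the bottom gets closed
            bool above = y < histogram.Length && histogram[y] > threshold;
            if (above && start < 0)
            {
                start = y;  // entering a peak
            }
            else if (!above && start >= 0)
            {
                if (y - start >= minWidth) lines.Add((start, y - 1));  // wide enough: keep it
                start = -1;
            }
        }
        return lines;
    }
}
```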
One implementation of the full (yet still simple!) algorithm to find the nicer histogram is as follows:
Binarize the image using a "moving average" threshold or similar local thresholding technique in case a standard Otsu threshold operating on pixels near edges isn't satisfactory. Or, if you have a nice black-on-white image, just use 128 as your binarization threshold.
Create an array to store your histogram. This array's length will be the height of the image.
For each pixel (x,y) in the binarized image, find the number of dark pixels above and below (x,y) within some radius R. That is, count the number of dark pixels from (x, y - R) to (x, y + R), inclusive.
If the number of dark pixels within the vertical radius R is equal to or greater than R (that is, roughly half of the 2R + 1 pixels are dark), then pixel (x,y) has sufficient vertical dark neighbors. Increment your bin count for row y.
As you march along each row, track the leftmost and rightmost x-values of pixels with sufficient neighbors. As long as the width (right - left + 1) exceeds some minimum value, divide the total count of dark pixels by this width. This normalizes the count so that short lines, like the very last line of text, are included.
(Optional) Smooth the resulting histogram. I just used the mean over 3 rows.
The "vertical count" (step 3) eliminates horizontal strokes that happen to lie above or below the center line of the text. A more sophisticated algorithm would check not just directly above and below (x,y), but also to the upper left, upper right, lower left, and lower right.
With my rather crude implementation in C# I was able to process the image in less than 75 milliseconds. In C++, and with some basic optimization, I've little doubt the time could be cut down considerably.
This histogram method assumes the text is horizontal. Since the algorithm is reasonably fast, you may have enough time to calculate pixel count histograms at increments of every 5 degrees from the horizontal. The scan orientation with the greatest peak/valley differences would indicate the rotation.
I'm not familiar with GA terminology, but if what I've suggested is of some value I'm sure you can translate it into GA terms. In any case, I was interested in this problem anyway, so I might as well share.
EDIT: to use a GA, it may be better to think in terms of "distance since the previous dark pixel in X" (or along angle theta) and "distance since the previous dark pixel in Y" (or along angle [theta - pi/2]). You might also check the distance from a white pixel to a dark pixel in all radial directions (to find loops).
byte[,] arr = get2DArrayFromBitamp(); //source array from originalBitmap
int w = arr.GetLength(0); //width of 2D array
int h = arr.GetLength(1); //height of 2D array
//we can use a second 2D array of dark pixels that belong to vertical strokes
byte[,] bytes = new byte[w, h]; //dark pixels in vertical strokes
//initial morph
int r = 4; //radius to check for dark pixels
int count = 0; //number of dark pixels within radius
//fill the bytes[,] array only with pixels belonging to vertical strokes
for (int x = 0; x < w; x++)
{
//for the first r rows, just set pixels to white
for (int y = 0; y < r; y++)
{
bytes[x, y] = 255;
}
//assume pixels of value < 128 are dark pixels in text
for (int y = r; y < h - r - 1; y++)
{
count = 0;
//count the dark pixels above and below (x,y)
//total range of check is 2r + 1 pixels, from -r to +r
for (int j = -r; j <= r; j++)
{
if (arr[x, y + j] < 128) count++;
}
//if half the pixels are dark, [x,y] is part of vertical stroke
bytes[x, y] = count >= r ? (byte)0 : (byte)255;
}
//for the last r rows, just set pixels to white
for (int y = h - r - 1; y < h; y++)
{
bytes[x, y] = 255;
}
}
//count the number of valid dark pixels in each row
float max = 0;
float[] bins = new float[h]; //normalized "dark pixel strength" for all h rows
int left, right, width; //leftmost and rightmost dark pixels in row
bool dark = false; //tracking variable
for (int y = 0; y < h; y++)
{
//initialize values at beginning of loop iteration
left = 0;
right = 0;
width = 100;
for (int x = 0; x < w; x++)
{
//use value of 128 as threshold between light and dark
dark = bytes[x, y] < 128;
//increment bin if pixel is dark
bins[y] += dark ? 1 : 0;
//update leftmost and rightmost dark pixels
if (dark)
{
if (left == 0) left = x;
if (x > right) right = x;
}
}
width = right - left + 1;
//for bins with few pixels, treat them as empty
if (bins[y] < 10) bins[y] = 0;
//normalize value according to width
//divide bin count by width (leftmost to rightmost)
bins[y] /= width;
//calculate the maximum bin value so that bins can be scaled when drawn
if (bins[y] > max) max = bins[y];
}
//calculate the smoothed value of each bin i by averaging bins i-1, i, and i+1
float[] smooth = new float[bins.Length];
smooth[0] = bins[0];
smooth[smooth.Length - 1] = bins[bins.Length - 1];
for (int i = 1; i < bins.Length - 1; i++)
{
smooth[i] = (bins[i - 1] + bins[i] + bins[i + 1])/3;
}
//create a new bitmap based on the original bitmap, then draw bins on top
Bitmap bmp = new Bitmap(originalBitmap);
using (Graphics gr = Graphics.FromImage(bmp))
{
for (int y = 0; y < bins.Length; y++)
{
//scale each bin so that it is drawn 200 pixels wide from the left edge
float value = 200 * (float)smooth[y] / max;
gr.DrawLine(Pens.Red, new PointF(0, y), new PointF(value, y));
}
}
pictureBox1.Image = bmp;
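The rotation check mentioned in this answer (pixel count histograms at 5 degree increments) might look roughly like this. It is a sketch under a few assumptions: the dark pixels have already been extracted as (x, y) pairs, y grows downwards, and the "best" orientation is scored as the sum of squared bin counts, a common projection-profile criterion used here in place of the peak/valley difference the answer describes. `ProjectionSkew.EstimateDegrees` is a made-up name.

```csharp
using System;
using System.Collections.Generic;

static class ProjectionSkew
{
    // Returns the candidate angle (degrees) whose row histogram is the most sharply peaked.
    public static double EstimateDegrees(IReadOnlyList<(int X, int Y)> darkPixels,
                                         double maxDegrees = 45, double stepDegrees = 5)
    {
        int steps = (int)Math.Round(2 * maxDegrees / stepDegrees);
        double bestAngle = 0, bestScore = 0;
        for (int k = 0; k <= steps; k++)
        {
            double deg = -maxDegrees + k * stepDegrees;
            double t = deg * Math.PI / 180.0;
            double sin = Math.Sin(t), cos = Math.Cos(t);
            var bins = new Dictionary<int, int>();  // rotated row -> number of dark pixels
            foreach (var (x, y) in darkPixels)
            {
                // row index measured perpendicular to lines tilted by 'deg'
                int row = (int)Math.Floor(-x * sin + y * cos);
                bins.TryGetValue(row, out int current);
                bins[row] = current + 1;
            }
            // large when dark pixels collapse into a few narrow rows
            double score = 0;
            foreach (int n in bins.Values) score += (double)n * n;
            if (score > bestScore) { bestScore = score; bestAngle = deg; }
        }
        return bestAngle;
    }
}
```

A positive result means the text lines run downwards to the right (y increases with x). A finer step around the winning angle can refine the estimate.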
After fiddling around with this for a while, I found that I simply need to count the number of crossings in each line: a switch from white to black counts as one, and a switch from black to white increments the count by one again. By highlighting each line with a count > 66, I got close to 100% accuracy, except for the bottom-most line.
Of course, this would not be robust to slightly rotated scanned documents, and there is the disadvantage of having to determine the correct threshold.
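The crossing count is straightforward to compute from a binarized image. A sketch, assuming a `bool[,]` indexed as [x, y] (the same layout as `arr` in the previous answer) where `true` means dark. The threshold of 66 is passed in as a parameter because it will depend on the image; the class and method names are hypothetical.

```csharp
using System.Collections.Generic;

static class CrossingCounter
{
    // Number of light<->dark switches in each row of a binarized image indexed [x, y].
    public static int[] CountPerRow(bool[,] dark)
    {
        int w = dark.GetLength(0), h = dark.GetLength(1);
        var counts = new int[h];
        for (int y = 0; y < h; y++)
            for (int x = 1; x < w; x++)
                if (dark[x, y] != dark[x - 1, y]) counts[y]++; // white->black and black->white both count
        return counts;
    }

    // Rows whose crossing count exceeds the threshold (66 in the answer above).
    public static List<int> TextRows(bool[,] dark, int threshold)
    {
        var rows = new List<int>();
        int[] counts = CountPerRow(dark);
        for (int y = 0; y < counts.Length; y++)
            if (counts[y] > threshold) rows.Add(y);
        return rows;
    }
}
```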
IMHO, with the image shown, it would be very hard to do this 100% perfectly.
My answer is to give you alternate ideas.
Idea 1:
Make your own version of ReCaptcha (to put on your very own pron site) - and make it a fun game.. "Like cut out a word (edges should all be white space - with some tolerance for overlapping chars on above and below lines)."
Idea 2:
This was a game we played as kids, the wire of a coat hanger was all bent in waves and connected to a buzzer and you had to navigate a wand with a ring in the end with the wire through it, across one side to the other without making the buzzer go off. Perhaps you could adapt this idea and make a mobile game where people trace out the lines without touching black text (with tolerance for overlapping chars)... when they can do a line they get points and get to new levels where you give them harder images..
Idea 3:
Research how google/recaptcha got around it
Idea 4:
Get the SDK for Photoshop and master the functionality of its Extract Edges tool
Idea 5:
Stretch the image a lot along the Y axis, which should help; apply the algorithm; then scale the location measurements back down and apply them to the normal-sized image.

Efficient ways to determine tilt of an image

I'm trying to write a program to programmatically determine the tilt or angle of rotation in an arbitrary image.
Images have the following properties:
Consist of dark text on a light background
Occasionally contain horizontal or vertical lines which only intersect at 90 degree angles.
Skewed between -45 and 45 degrees.
See this image as a reference (it's been skewed 2.8 degrees).
So far, I've come up with this strategy: Draw a route from left to right, always selecting the nearest white pixel. Presumably, the route from left to right will prefer to follow the path between lines of text along the tilt of the image.
Here's my code:
private bool IsWhite(Color c) { return c.GetBrightness() >= 0.5 || c == Color.Transparent; }
private bool IsBlack(Color c) { return !IsWhite(c); }
private double ToDegrees(decimal slope) { return (180.0 / Math.PI) * Math.Atan(Convert.ToDouble(slope)); }
private void GetSkew(Bitmap image, out double minSkew, out double maxSkew)
{
decimal minSlope = 0.0M;
decimal maxSlope = 0.0M;
for (int start_y = 0; start_y < image.Height; start_y++)
{
int end_y = start_y;
for (int x = 1; x < image.Width; x++)
{
int above_y = Math.Max(end_y - 1, 0);
int below_y = Math.Min(end_y + 1, image.Height - 1);
Color center = image.GetPixel(x, end_y);
Color above = image.GetPixel(x, above_y);
Color below = image.GetPixel(x, below_y);
if (IsWhite(center)) { /* no change to end_y */ }
else if (IsWhite(above) && IsBlack(below)) { end_y = above_y; }
else if (IsBlack(above) && IsWhite(below)) { end_y = below_y; }
}
decimal slope = (Convert.ToDecimal(start_y) - Convert.ToDecimal(end_y)) / Convert.ToDecimal(image.Width);
minSlope = Math.Min(minSlope, slope);
maxSlope = Math.Max(maxSlope, slope);
}
minSkew = ToDegrees(minSlope);
maxSkew = ToDegrees(maxSlope);
}
This works well on some images, not so well on others, and it's slow.
Is there a more efficient, more reliable way to determine the tilt of an image?
I've made some modifications to my code, and it certainly runs a lot faster, but it's not very accurate.
I've made the following improvements:
Using Vinko's suggestion, I avoid GetPixel in favor of working with bytes directly, now the code runs at the speed I needed.
My original code simply used "IsBlack" and "IsWhite", but this isn't granular enough. The original code traces the following paths through the image:
http://img43.imageshack.us/img43/1545/tilted3degtextoriginalw.gif
Note that a number of paths pass through the text. Instead, I now compare the center, above, and below pixels by their actual brightness values and select the brightest one. Basically, I'm treating the bitmap as a heightmap, and the path from left to right follows the contours of the image, resulting in a better path:
http://img10.imageshack.us/img10/5807/tilted3degtextbrightnes.gif
As Toaomalkster suggested, a Gaussian blur smooths out the height map, and I get even better results:
http://img197.imageshack.us/img197/742/tilted3degtextblurredwi.gif
Since this is just prototype code, I blurred the image using GIMP rather than writing my own blur function.
The selected path is pretty good for a greedy algorithm.
As Toaomalkster suggested, choosing the min/max slope is naive. A simple linear regression provides a better approximation of the slope of a path. Additionally, I should cut a path short once I run off the edge of the image, otherwise the path will hug the top of the image and give an incorrect slope.
Code
private double ToDegrees(double slope) { return (180.0 / Math.PI) * Math.Atan(slope); }
private double GetSkew(Bitmap image)
{
BrightnessWrapper wrapper = new BrightnessWrapper(image);
LinkedList<double> slopes = new LinkedList<double>();
for (int y = 0; y < wrapper.Height; y++)
{
int endY = y;
long sumOfX = 0;
long sumOfY = y;
long sumOfXY = 0;
long sumOfXX = 0;
int itemsInSet = 1;
for (int x = 1; x < wrapper.Width; x++)
{
int aboveY = endY - 1;
int belowY = endY + 1;
if (aboveY < 0 || belowY >= wrapper.Height)
{
break;
}
int center = wrapper.GetBrightness(x, endY);
int above = wrapper.GetBrightness(x, aboveY);
int below = wrapper.GetBrightness(x, belowY);
if (center >= above && center >= below) { /* no change to endY */ }
else if (above >= center && above >= below) { endY = aboveY; }
else if (below >= center && below >= above) { endY = belowY; }
itemsInSet++;
sumOfX += x;
sumOfY += endY;
sumOfXX += (x * x);
sumOfXY += (x * endY);
}
// least squares slope = (NΣ(XY) - (ΣX)(ΣY)) / (NΣ(X^2) - (ΣX)^2), where N = elements in set
if (itemsInSet > image.Width / 2) // path covers at least half of the image
{
decimal sumOfX_d = Convert.ToDecimal(sumOfX);
decimal sumOfY_d = Convert.ToDecimal(sumOfY);
decimal sumOfXY_d = Convert.ToDecimal(sumOfXY);
decimal sumOfXX_d = Convert.ToDecimal(sumOfXX);
decimal itemsInSet_d = Convert.ToDecimal(itemsInSet);
decimal slope =
((itemsInSet_d * sumOfXY_d) - (sumOfX_d * sumOfY_d))
/
((itemsInSet_d * sumOfXX_d) - (sumOfX_d * sumOfX_d));
slopes.AddLast(Convert.ToDouble(slope));
}
}
double mean = slopes.Average();
double sumOfSquares = slopes.Sum(d => Math.Pow(d - mean, 2));
double stddev = Math.Sqrt(sumOfSquares / (slopes.Count - 1));
// select items within 1 standard deviation of the mean
var testSample = slopes.Where(x => Math.Abs(x - mean) <= stddev);
return ToDegrees(testSample.Average());
}
class BrightnessWrapper
{
byte[] rgbValues;
int stride;
public int Height { get; private set; }
public int Width { get; private set; }
public BrightnessWrapper(Bitmap bmp)
{
Rectangle rect = new Rectangle(0, 0, bmp.Width, bmp.Height);
System.Drawing.Imaging.BitmapData bmpData =
bmp.LockBits(rect,
System.Drawing.Imaging.ImageLockMode.ReadOnly,
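System.Drawing.Imaging.PixelFormat.Format24bppRgb); // GetBrightness assumes 3 bytes per pixel
IntPtr ptr = bmpData.Scan0;
int bytes = bmpData.Stride * bmp.Height;
this.rgbValues = new byte[bytes];
System.Runtime.InteropServices.Marshal.Copy(ptr,
rgbValues, 0, bytes);
bmp.UnlockBits(bmpData); // release the lock once the pixels have been copied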
this.Height = bmp.Height;
this.Width = bmp.Width;
this.stride = bmpData.Stride;
}
public int GetBrightness(int x, int y)
{
int position = (y * this.stride) + (x * 3);
int b = rgbValues[position];
int g = rgbValues[position + 1];
int r = rgbValues[position + 2];
return (r + r + b + g + g + g) / 6;
}
}
The code is good, but not great. Large amounts of whitespace cause the program to draw relatively flat lines, resulting in a slope near 0 and causing the code to underestimate the actual tilt of the image.
There is no appreciable difference in the accuracy of the tilt by selecting random sample points vs sampling all points, because the ratio of "flat" paths selected by random sampling is the same as the ratio of "flat" paths in the entire image.
GetPixel is slow. You can get an order of magnitude speed up using the approach listed here.
If text is left (right) aligned you can determine the slope by measuring the distance between the left (right) edge of the image and the first dark pixel in two random places and calculate the slope from that. Additional measurements would lower the error while taking additional time.
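A sketch of the margin measurement for left-aligned text, assuming a binarized `bool[,]` indexed [x, y] and two sample rows that both actually contain text starting at the left margin. The angle is measured from vertical, so a positive value means the margin drifts to the right going down the page. The names are hypothetical.

```csharp
using System;

static class MarginSkew
{
    // x of the first dark pixel in row y, or -1 if the row has none.
    static int FirstDark(bool[,] dark, int y)
    {
        for (int x = 0; x < dark.GetLength(0); x++)
            if (dark[x, y]) return x;
        return -1;
    }

    // Angle of the left margin, in degrees from vertical, estimated from two sample rows.
    public static double MarginAngleDegrees(bool[,] dark, int y1, int y2)
    {
        int x1 = FirstDark(dark, y1), x2 = FirstDark(dark, y2);
        if (x1 < 0 || x2 < 0 || y1 == y2)
            throw new ArgumentException("Pick two different rows that both contain text.");
        // the margin moves (x2 - x1) pixels sideways over (y2 - y1) rows
        return Math.Atan((double)(x2 - x1) / (y2 - y1)) * 180.0 / Math.PI;
    }
}
```

Averaging over several row pairs, as the answer suggests, reduces the error from individual rows that start with indented or short words.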
First, I must say I like the idea. But I've never had to do this before, and I'm not sure what to suggest to improve reliability. The first thing I can think of is throwing out statistical anomalies. If the slope suddenly changes sharply, you know you've found a white section of the image that dips into the edge, skewing (no pun intended) your results. So you'd want to throw that stuff out somehow.
But from a performance standpoint there are a number of optimizations you could make which may add up.
Namely, I'd change this snippet in your inner loop from this:
Color center = image.GetPixel(x, end_y);
Color above = image.GetPixel(x, above_y);
Color below = image.GetPixel(x, below_y);
if (IsWhite(center)) { /* no change to end_y */ }
else if (IsWhite(above) && IsBlack(below)) { end_y = above_y; }
else if (IsBlack(above) && IsWhite(below)) { end_y = below_y; }
To this:
Color center = image.GetPixel(x, end_y);
if (IsWhite(center)) { /* no change to end_y */ }
else
{
Color above = image.GetPixel(x, above_y);
Color below = image.GetPixel(x, below_y);
if (IsWhite(above) && IsBlack(below)) { end_y = above_y; }
else if (IsBlack(above) && IsWhite(below)) { end_y = below_y; }
}
It's the same effect but should drastically reduce the number of calls to GetPixel.
Also consider putting the values that don't change into variables before the madness begins. Things like image.Height and image.Width have a slight overhead every time you call them, so store those values in your own variables before the loops begin. The thing I always tell myself when dealing with nested loops is to optimize everything inside the innermost loop at the expense of everything else.
Also, as Vinko Vrsalovic suggested, you may want to look at his GetPixel alternative for yet another speed boost.
At first glance, your code looks overly naive, which explains why it doesn't always work. I like the approach Steve Wortham suggested, but it might run into problems if you have background images.
Another approach that often helps with images is to blur them first. If you blur your example image enough, each line of text will end up as a blurry, smooth line. You then apply some algorithm that essentially performs a regression analysis on each line. There are lots of ways to do that, and lots of examples on the net.
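As one concrete (and assumed) instance of that regression step: after blurring, take the darkest row in each column within a band that contains a single text line, then fit a least-squares line through those points. The band limits, the `maxDark` cut-off for skipping blank columns, the `byte[,]` grayscale layout (0 = black, indexed `[y, x]`), and the `LineFit` name are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class LineFit
{
    // For each column, the row in [yStart, yEnd] with the lowest (darkest) value.
    // Columns whose darkest value is above maxDark are treated as blank and skipped.
    public static List<(double X, double Y)> DarkestRowPerColumn(byte[,] gray, int yStart, int yEnd, byte maxDark)
    {
        int width = gray.GetLength(1);
        var points = new List<(double X, double Y)>();
        for (int x = 0; x < width; x++)
        {
            int bestY = yStart;
            for (int y = yStart + 1; y <= yEnd; y++)
                if (gray[y, x] < gray[bestY, x]) bestY = y;
            if (gray[bestY, x] <= maxDark) points.Add((x, bestY));
        }
        return points;
    }

    // Ordinary least-squares slope (dy/dx) of the points.
    public static double Slope(IReadOnlyList<(double X, double Y)> points)
    {
        int n = points.Count;
        if (n < 2) throw new ArgumentException("Need at least two points.", nameof(points));
        double meanX = 0, meanY = 0;
        foreach (var p in points) { meanX += p.X; meanY += p.Y; }
        meanX /= n; meanY /= n;
        double sxy = 0, sxx = 0;
        foreach (var p in points)
        {
            double dx = p.X - meanX;
            sxy += dx * (p.Y - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0) throw new ArgumentException("All x values are equal.", nameof(points));
        return sxy / sxx;
    }
}
```

The skew angle in degrees is then `Math.Atan(slope) * 180 / Math.PI`.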
Edge detection might be useful, or it might cause more problems than it's worth.
By the way, a Gaussian blur can be implemented very efficiently if you search hard enough for the code. Otherwise, I'm sure there are lots of libraries available. I haven't done much of that lately, so I don't have any links on hand, but a search for "image processing library" will get you good results.
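If you'd rather not pull in a library, one well-known cheap approximation of a Gaussian blur is to apply a box blur several times (three passes is a common choice). Each box blur is split into a horizontal and a vertical pass that use a running sum, so the cost per pixel doesn't depend on the radius. This is a swapped-in technique, not specific code the answer had in mind; the `double[,]` grayscale layout and the names are hypothetical.

```csharp
using System;

public static class BoxBlur
{
    // Horizontal box filter of width 2r+1 using a running sum; borders are clamped.
    static double[,] BlurRows(double[,] src, int r)
    {
        int h = src.GetLength(0), w = src.GetLength(1);
        var dst = new double[h, w];
        for (int y = 0; y < h; y++)
        {
            double sum = 0;
            for (int k = -r; k <= r; k++) sum += src[y, Math.Clamp(k, 0, w - 1)];
            for (int x = 0; x < w; x++)
            {
                dst[y, x] = sum / (2 * r + 1);
                // Slide the window one pixel right: drop column x - r, add column x + r + 1.
                sum += src[y, Math.Min(x + r + 1, w - 1)] - src[y, Math.Max(x - r, 0)];
            }
        }
        return dst;
    }

    static double[,] Transpose(double[,] src)
    {
        int h = src.GetLength(0), w = src.GetLength(1);
        var dst = new double[w, h];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                dst[x, y] = src[y, x];
        return dst;
    }

    // One 2-D box blur: a horizontal pass, then a vertical pass done via transposition.
    public static double[,] Box(double[,] src, int r) =>
        Transpose(BlurRows(Transpose(BlurRows(src, r)), r));

    // Repeated box blurs approach a Gaussian; three passes is a common compromise.
    public static double[,] ApproxGaussian(double[,] src, int r, int passes = 3)
    {
        double[,] result = src;
        for (int i = 0; i < passes; i++) result = Box(result, r);
        return result;
    }
}
```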
I'm assuming you're enjoying the fun of solving this, so there aren't many actual implementation details here.
Measuring the angle of every line seems like overkill, especially given the performance of GetPixel.
I wonder if you would have better luck, performance-wise, by looking for a white triangle in the upper-left or upper-right corner (depending on the slant direction) and measuring the angle of its hypotenuse. All text should follow the same angle on the page, and the upper-left corner of a page won't get tricked by the descenders or whitespace of content above it.
Another tip to consider: rather than blurring, work at a greatly reduced resolution. That will give you both the smoother data you need and fewer GetPixel calls.
For example, I once made a blank-page detection routine in .NET for faxed TIFF files that simply resampled the entire page to a single pixel and tested whether that pixel's value passed a threshold for white.
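A sketch of both ideas, assuming an 8-bit grayscale page in a `byte[,]` indexed `[y, x]` (0 = black, 255 = white). Block averaging is only one way to resample, and this is not the routine described above; the threshold value and the helper names are hypothetical.

```csharp
using System;

public static class Resample
{
    // Averages each factor x factor block; partial blocks at the right/bottom edges
    // are averaged over the pixels they actually contain.
    public static double[,] BlockAverage(byte[,] gray, int factor)
    {
        if (factor < 1) throw new ArgumentOutOfRangeException(nameof(factor));
        int h = gray.GetLength(0), w = gray.GetLength(1);
        int oh = (h + factor - 1) / factor, ow = (w + factor - 1) / factor;
        var sum = new double[oh, ow];
        var count = new int[oh, ow];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
            {
                sum[y / factor, x / factor] += gray[y, x];
                count[y / factor, x / factor]++;
            }
        for (int y = 0; y < oh; y++)
            for (int x = 0; x < ow; x++)
                sum[y, x] /= count[y, x];
        return sum;
    }

    // Blank-page test: shrink the whole page to one pixel and compare it with a white threshold.
    public static bool LooksBlank(byte[,] gray, double whiteThreshold)
    {
        int factor = Math.Max(gray.GetLength(0), gray.GetLength(1));
        return BlockAverage(gray, factor)[0, 0] >= whiteThreshold;
    }
}
```

For skew detection you would pick a moderate `factor` (so text lines survive as smooth dark bands) and run the line-finding code on the smaller grid.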
What are your constraints in terms of time?
The Hough transform is a very effective mechanism for determining the skew angle of an image. It can be costly in time, but if you're going to use Gaussian blur, you're already burning a pile of CPU time. There are also other ways to accelerate the Hough transform that involve creative image sampling.
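A minimal Hough-transform sketch for skew, assuming a thresholded `bool[,]` page (`true` = dark, indexed `[y, x]`) and searching only near-horizontal lines. Every dark pixel votes for one (θ, ρ) cell per candidate angle, and the angle owning the single strongest cell wins. The `maxAngle`/`step` defaults and the names are hypothetical; a practical version would subsample the pixels, which is the kind of creative sampling mentioned above.

```csharp
using System;

public static class HoughSkew
{
    // Returns the skew in degrees; positive when lines descend to the right (y grows downward).
    public static double Estimate(bool[,] dark, double maxAngle = 10.0, double step = 0.5)
    {
        int h = dark.GetLength(0), w = dark.GetLength(1);
        int diag = (int)Math.Ceiling(Math.Sqrt((double)w * w + (double)h * h));
        int steps = (int)Math.Round(2 * maxAngle / step);
        double bestAngle = 0;
        int bestVotes = 0;
        for (int i = 0; i <= steps; i++)
        {
            double skew = -maxAngle + i * step;
            // A near-horizontal line has its normal near 90 degrees.
            double theta = (90.0 + skew) * Math.PI / 180.0;
            double cos = Math.Cos(theta), sin = Math.Sin(theta);
            var votes = new int[2 * diag + 1]; // rho ranges over [-diag, diag]
            for (int y = 0; y < h; y++)
                for (int x = 0; x < w; x++)
                {
                    if (!dark[y, x]) continue;
                    int cell = (int)Math.Round(x * cos + y * sin) + diag;
                    if (++votes[cell] > bestVotes) { bestVotes = votes[cell]; bestAngle = skew; }
                }
        }
        return bestAngle;
    }
}
```

The accumulator is one row per angle rather than a full θ × ρ table, because only the winning angle matters for deskewing.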
Your latest output is confusing me a little.
When you superimposed the blue lines on the source image, did you offset it a bit? It looks like the blue lines are about 5 pixels above the centre of the text.
Not sure about that offset, but you definitely have a problem with the derived line "drifting" away at the wrong angle. It seems to have too strong a bias towards producing a horizontal line.
I wonder if increasing your mask window from 3 pixels (centre, one above, one below) to 5 pixels (centre, two above, two below) might improve this. You'll also get this effect if you follow richardtallent's suggestion and resample the image to a smaller size.
Very cool path finding application.
I wonder if this other approach would help or hurt with your particular data set.
Assume a black and white image:
Project all black pixels to the right (EAST). This should result in a one-dimensional array of size IMAGE_HEIGHT. Call the array CANVAS.
As you project all the pixels EAST, keep track numerically of how many pixels project into each bin of CANVAS.
Rotate the image an arbitrary number of degrees and re-project.
Pick the result that gives the highest peaks and lowest valleys for values in CANVAS.
I imagine this will not work well if you really have to account for a full -45 to +45 degrees of tilt. If the actual range is smaller (perhaps +/- 10 degrees?), this might be a pretty good strategy. Once you have an initial result, you could consider re-running with a smaller increment of degrees to fine-tune the answer. I might therefore write this as a function that accepts a float degree_tick as a parameter, so I could run both a coarse and a fine pass (or a whole spectrum of coarseness or fineness) with the same code.
This might be computationally expensive. To optimize, you might consider selecting just a portion of the image to project-test-rotate-repeat on.
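A sketch of the projection approach, with two assumptions of mine. First, instead of physically rotating the image, each dark pixel's row is computed in a coordinate frame rotated by the candidate angle. Second, "highest peaks and lowest valleys" is scored as the sum of squared bin counts in CANVAS, which grows when the counts are concentrated in a few bins. `degreeTick` plays the role of the `degree_tick` parameter described above; the page layout (`bool[,]`, `true` = dark, indexed `[y, x]`) and the method names are hypothetical.

```csharp
using System;

public static class ProjectionSkew
{
    // Projects dark pixels EAST into CANVAS (one bin per rotated row) for a candidate
    // angle and returns the sum of squared bin counts; sharper peaks give a larger score.
    public static double Score(bool[,] dark, double degrees)
    {
        int h = dark.GetLength(0), w = dark.GetLength(1);
        double a = degrees * Math.PI / 180.0, cos = Math.Cos(a), sin = Math.Sin(a);
        int pad = w + h; // |y*cos - x*sin| < w + h, so every bin index stays in range
        var canvas = new int[2 * pad + 1];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (dark[y, x])
                    canvas[(int)Math.Round(y * cos - x * sin) + pad]++;
        double score = 0;
        foreach (int n in canvas) score += (double)n * n;
        return score;
    }

    // Tries every angle from `from` to `to` in steps of degreeTick (> 0) and returns the best one.
    public static double Search(bool[,] dark, double from, double to, double degreeTick)
    {
        int steps = (int)Math.Floor((to - from) / degreeTick + 1e-9);
        double best = from, bestScore = double.NegativeInfinity;
        for (int i = 0; i <= steps; i++)
        {
            double degrees = from + i * degreeTick;
            double score = Score(dark, degrees);
            if (score > bestScore) { bestScore = score; best = degrees; }
        }
        return best;
    }

    // Coarse pass over [-range, +range], then a fine pass around the coarse winner.
    public static double CoarseThenFine(bool[,] dark, double range, double coarseTick, double fineTick)
    {
        double coarse = Search(dark, -range, range, coarseTick);
        return Search(dark, coarse - coarseTick, coarse + coarseTick, fineTick);
    }
}
```

To test on just a portion of the image, pass a cropped array; the scoring is unchanged.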
