I am trying to code a desktop app that calculates the score of a paper shooting-range target.
After researching, I found some articles that can help, but the problem remains how to work with OpenCV or EmguCV; I am good at C#, but C++ would take time to learn.
Another question: what is the best approach to detect overlapping bullet holes in a shooting target?
Like this image:
In rings 7 and 8 there are two overlapping bullet holes. In this case it would be easy to solve by simply performing an erosion.
However, in cases where the circles are almost completely overlapped, I don't see how I can identify them.
Some links that may help:
Detecting circles and shots from paper target
http://www.emgu.com/wiki/index.php/Shape_(Triangle,_Rectangle,_Circle,_Line)_Detection_in_CSharp
You can isolate overlapping bullets by following these steps:
Isolate your bullets from the rest of the image
Apply opening on the bullets (erode then dilate)
Compute the distance for each white pixel to the closest black pixel
Apply thresholding
The C++ code:
cv::Mat preprocess(const cv::Mat& image) {
    display(image, "Original");  // display() is the answerer's helper for showing an image in a window

    // Color thresholds (BGR)
    cv::Scalar minColor(141, 0, 0);
    cv::Scalar maxColor(255, 255, 124);
    cv::Mat filtered;

    // Isolate the interesting range of colors
    cv::inRange(image, minColor, maxColor, filtered);
    filtered.convertTo(filtered, CV_8U);

    // Apply opening (erode then dilate)
    cv::Mat opening;
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));
    cv::morphologyEx(filtered, opening, cv::MORPH_OPEN, kernel, cv::Point(-1, -1), 2);

    // Compute the distance to the closest zero pixel (Euclidean)
    cv::Mat distance;
    cv::distanceTransform(opening, distance, CV_DIST_L2, 5);
    cv::normalize(distance, distance, 0, 1.0, cv::NORM_MINMAX);
    display(distance, "Distance");

    // Thresholding using the longest distance
    double min, max;
    cv::minMaxLoc(distance, &min, &max);
    cv::Mat thresholded;
    cv::threshold(distance, thresholded, 0.7 * max, 255, CV_THRESH_BINARY);
    thresholded.convertTo(thresholded, CV_8U);

    // Find connected components
    cv::Mat labels;
    int nbLabels = cv::connectedComponents(thresholded, labels);

    // Assign a random gray value to each label
    std::vector<int> colors(nbLabels, 0);
    for (int label = 1; label < nbLabels; ++label) {
        colors[label] = rand() & 255;
    }

    cv::Mat result(distance.size(), CV_8U);
    for (int r = 0; r < result.rows; ++r) {
        for (int c = 0; c < result.cols; ++c) {
            int label = labels.at<int>(r, c);
            result.at<uchar>(r, c) = colors[label];
        }
    }
    display(result, "Labels");
    return result;
}
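Since you're working in C#, a rough Emgu CV translation of the same pipeline is sketched below. This assumes Emgu CV 3.x and is only a starting point, not drop-in code: the exact method and enum names (ScalarArray, MorphOp.Open, DistType.L2, ThresholdType.Binary, and so on) may need adjusting for your version.
using System.Drawing;
using Emgu.CV;
using Emgu.CV.CvEnum;
using Emgu.CV.Structure;

static Mat Preprocess(Mat image)
{
    // Isolate the interesting range of colors (same BGR thresholds as the C++ version)
    Mat filtered = new Mat();
    CvInvoke.InRange(image,
        new ScalarArray(new MCvScalar(141, 0, 0)),
        new ScalarArray(new MCvScalar(255, 255, 124)),
        filtered);

    // Opening (erode then dilate) with a 3x3 rectangular kernel, 2 iterations
    Mat kernel = CvInvoke.GetStructuringElement(ElementShape.Rectangle, new Size(3, 3), new Point(-1, -1));
    Mat opening = new Mat();
    CvInvoke.MorphologyEx(filtered, opening, MorphOp.Open, kernel, new Point(-1, -1), 2, BorderType.Default, new MCvScalar());

    // Distance to the closest zero pixel, normalized to [0, 1]
    Mat distance = new Mat();
    CvInvoke.DistanceTransform(opening, distance, null, DistType.L2, 5);
    CvInvoke.Normalize(distance, distance, 0, 1.0, NormType.MinMax);

    // Threshold at 70% of the maximum distance to split the overlapping blobs
    double min = 0, max = 0;
    Point minLoc = Point.Empty, maxLoc = Point.Empty;
    CvInvoke.MinMaxLoc(distance, ref min, ref max, ref minLoc, ref maxLoc);
    Mat thresholded = new Mat();
    CvInvoke.Threshold(distance, thresholded, 0.7 * max, 255, ThresholdType.Binary);
    thresholded.ConvertTo(thresholded, DepthType.Cv8U);

    // Label the remaining connected components (ideally one label per bullet hole)
    Mat labels = new Mat();
    int nbLabels = CvInvoke.ConnectedComponents(thresholded, labels);
    return labels;
}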
There are two ways in which you can complete your task.
The simpler method is to subtract the images. Take an ideal target image and subtract from it the target image after each hit, or at the latest after completing the entire shoot.
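A minimal sketch of that subtraction idea with Emgu CV (assuming the two images are the same size and already aligned; the variable names and the threshold value here are made up):
using Emgu.CV;
using Emgu.CV.CvEnum;

// blankTarget = reference image of the unshot target, currentTarget = image after shooting
static Mat NewHolesMask(Mat blankTarget, Mat currentTarget)
{
    Mat diff = new Mat();
    CvInvoke.AbsDiff(blankTarget, currentTarget, diff);            // changed pixels = new holes
    CvInvoke.Threshold(diff, diff, 40, 255, ThresholdType.Binary); // 40 is an arbitrary cutoff
    return diff;
}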
The other method would be to separate colors. If the color of the bullet is blue, then you can use the inRange function to filter out that color. You could even build a library of different bullet colors so the user can choose one from an option list. I have recently done a similar project in C#. For more details contact me at my email (rajkumarm704#gmail.com).
Well, I'm continuing this question, which got no answer (Smoothing random noises with different amplitudes), and I have another question.
I have opted to use the contour/shadow of a shape (translating/transforming? the list of points from its center with an offset/distance).
This contour/shadow is bigger than the current path. I used this repository (https://github.com/n-yoda/unity-vertex-effects) to recreate the shadow, and this works pretty well, except for one thing.
To know the height of all points (obtained by this shadow algorithm (line 13 of ModifiedShadow.cs & line 69 of CircleOutline.cs)) I get the distance of the current point to the center and divide it by the maximum distance to the center:
float dist = orig.Max(v => (v - Center).magnitude);
foreach (Point p in poly) { float d = 1f - (Center - p).magnitude / dist; }
Where orig is the entire list of points obtained by the shadow algorithm.
d is the height of the shadow.
But the problem is obvious: I get a perfect circle:
In red and black to see the contrast:
And this is not what I want:
As you can see, this is not a perfect gradient. Let me explain what's happening.
I use this library to generate noises: https://github.com/Auburns/FastNoise_CSharp
Note: if you want to know what I use to get noises with different amplitudes, see Smoothing random noises with different amplitudes (first block of code); to see this in action, see this repo.
The green background color represents noises with a mean height of -0.25 and an amplitude of 0.3.
The white background color represents noises with a mean height of 0 and an amplitude of 0.1.
Red means 1 (total interpolation for noises corresponding to white pixels)
Black means 0 (total interpolation for noises corresponding to green pixels)
That's why we have this output:
Actually, I have tried comparing the distance of each individual point to the center, but this outputs a weird and unexpected result.
Actually, I don't know what to try...
The problem is that the lerp percentage (e.g., from high/low or "red" to "black" in your visualization) is only a function of the point's distance from the center, which is divided by a constant (which happens to be the maximum distance of any point from the center). That's why it appears circular.
For instance, the centermost point on the left side of the polygon might be 300 pixels away from the center, while the centermost point on the right might be 5 pixels. Both need to be red, but basing it off of 0 distance from center = red won't have either be red, and basing it off the min distance from center = red will only have red on the right side.
The relevant minimum and maximum distances will change depending on where the point is.
One alternative method: for each point, find the closest white pixel and the closest green pixel (or the closest shadow pixel that is adjacent to green/white, as here). Then choose your redness depending on how the distances from the current point to those two pixels compare.
Therefore, you could do this (pseudo-C#):
foreach pixel p in shadow_region {
    // technically: distances to the closest shadow pixel which is adjacent to a green/white pixel
    float closestGreen_distance = +inf;
    float closestWhite_distance = +inf;
    // Possibly: find all shadow-adjacent pixels prior to the outer loop
    // and cache them. Then, you only have to loop through those pixels.
    foreach pixel p2 in shadow {
        float p2Dist = (p - p2).magnitude;
        if (p2 is adjacent to green) {
            if (p2Dist < closestGreen_distance) {
                closestGreen_distance = p2Dist;
            }
        }
        if (p2 is adjacent to white) {
            if (p2Dist < closestWhite_distance) {
                closestWhite_distance = p2Dist;
            }
        }
    }
    float d = 1f - closestWhite_distance / (closestWhite_distance + closestGreen_distance);
}
Using the code you've posted in the comments, this might look like:
foreach (Point p in value)
{
    float minOuterDistance = outerPoints.Min(p2 => (p - p2).magnitude);
    float minInnerDistance = innerPoints.Min(p2 => (p - p2).magnitude);
    float d = 1f - minInnerDistance / (minInnerDistance + minOuterDistance);
    Color32? colorValue = func?.Invoke(p.x, p.y, d);
    if (colorValue.HasValue)
        target[F.P(p.x, p.y, width, height)] = colorValue.Value;
}
The approach above is what was ultimately used; the alternative below, mentioned as another option, turned out to be unnecessary.
If you can't determine if a shadow pixel is adjacent to white/green, here's an alternative that only requires the calculation of the normals of each vertex in your pink (original) outline.
Create outer "yellow" vertices by going to each pink vertex and following its normal outward. Create inner "blue" vertices by going to each pink vertex and following its normal inward.
Then, when looping through each pixel in the shadow, loop through the yellow vertices to get your "closest to green" and through the blue to get "closest to white".
The problem is that since your shapes aren't fully convex, these projected blue and yellow outlines might be inside-out in some places, so you would need to deal with that somehow. I'm having trouble determining an exact method of dealing with that but here's what I have so far:
One step is to ignore any blues/yellows that have outward-normals that point towards the current shadow pixel.
However, if the current pixel is inside a region where the yellow/blue shape is inside out, I'm not sure how to proceed. There might be something to ignoring blue/yellow vertices that are closer to the closest pink vertex than they should be.
extremely rough pseudocode:
list yellow_vertex_list = new list
list blue_vertex_list = new list

foreach pink vertex p:
    given float dist;
    vertex yellowvertex = new vertex(p + normal * dist)
    vertex bluevertex = new vertex(p - normal * dist)
    yellow_vertex_list.add(yellowvertex)
    blue_vertex_list.add(bluevertex)

create shadow

for each pixel p in shadow:
    foreach vertex v in blue_vertex_list:
        if v.normal points towards p: continue;
        if v is on the wrong side of an inside-out region: continue;
        if v is closest so far:
            closest_blue = v
            closest_blue_dist = (v - p).magnitude
    foreach vertex v in yellow_vertex_list:
        if v.normal points towards p: continue;
        if v is on the wrong side of an inside-out region: continue;
        if v is closest so far:
            closest_yellow = v
            closest_yellow_dist = (v - p).magnitude
    float d = 1f - closest_blue_dist / (closest_blue_dist + closest_yellow_dist)
I've been making a top-down shooter game in XNA that requires rectangular collision for the map.
The collision walls for a map are stored in a text file in the format: rect[0,0,1024,8]
The values correspond to defining a rectangle (x, y, width, height).
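(For reference, one minimal way to parse that format, assuming one rect[...] entry per line and XNA's Rectangle type:)
// Parses a line like "rect[0,0,1024,8]" into a Rectangle (x, y, width, height).
static Rectangle ParseRect(string line)
{
    string inner = line.Substring(line.IndexOf('[') + 1).TrimEnd(']');
    string[] parts = inner.Split(',');
    return new Rectangle(int.Parse(parts[0]), int.Parse(parts[1]),
                         int.Parse(parts[2]), int.Parse(parts[3]));
}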
I've been thinking that I could write a separate application that can iterate through the pixel data of the map image, find the pixels that are black (or whatever the wall color is), and make rectangles there. Basically, this program would generate the rectangles required for the collision. Ideally, it would be pixel perfect, which would require something like a thousand rectangles, each 1 pixel wide, to cover all the walls.
Is there a possible way to detect which of these rectangles (or squares, I should say) are adjacent to one another, and then merge them into a bigger rectangle that still covers the same area?
E.g., let's say I have a wall that is 10 by 2. The program would generate 20 different rectangles, each 1 pixel high. How would I efficiently detect that these rectangles are adjacent and automatically make a 10 by 2 rectangle covering the whole wall, instead of having 20 different little pixel rectangles?
EDIT: I've worked out a solution that fits my purposes; for future reference, my code is below:
//map is a bitmap; horizontalCollisions and collisions are List<Rectangle>s
for (int y = 0; y < map.Height; y++) //loop through pixels
{
    for (int x = 0; x < map.Width; x++)
    {
        if (map.GetPixel(x, y).Name == "ff000000") //wall color
        {
            // walk right until we leave the wall or reach the edge of the image
            int i = 1;
            while (x + i < map.Width && map.GetPixel(x + i, y).Name == "ff000000")
            {
                i++;
            }
            Rectangle r = new Rectangle(x, y, i, 1); //create and add
            x += i - 1;
            horizontalCollisions.Add(r);
        }
    }
}
for (int j = 0; j < horizontalCollisions.Count; j++)
{
    int i = 1;
    Rectangle current = horizontalCollisions[j];
    Rectangle r = new Rectangle(current.X, current.Y + 1, current.Width, 1);
    // keep absorbing identical-width strips directly below this one
    while (horizontalCollisions.Contains(r))
    {
        i++;
        horizontalCollisions.Remove(r);
        r = new Rectangle(current.X, current.Y + i, current.Width, 1);
    }
    Rectangle add = new Rectangle(current.X, current.Y, current.Width, i);
    collisions.Add(add);
}
//collisions now has all the rectangles
Basically, it loops through the pixel data horizontally. When it encounters a wall pixel, it starts a counter and (using a while loop) moves to the right, one pixel at a time, until it hits a non-wall pixel. Then it creates a rectangle of that width and continues on. After this process, there is a big list of rectangles, each 1 px tall - basically a bunch of horizontal lines. The next loop runs through the horizontal lines and, using the same process as above, finds out whether there are any rectangles with the same X value and the same Width value directly below (y + 1). This keeps incrementing until there are none, at which point one big rectangle is created and the used rectangles are deleted from the list. The final resulting list contains all the rectangles that make up all the black pixels on the image (pretty efficiently, I think).
Etiquette may suggest that I should post this as a comment instead of an answer, but I do not yet have that capability, so bear with me.
I'm afraid I am not able to translate this into code for you, but I can send you towards some academic papers that discuss algorithms that can do some of the things that you're asking.
Other times this question has appeared:
Find the set of largest contiguous rectangles to cover multiple areas
Puzzle: Find largest rectangle (maximal rectangle problem)
Papers linked in those questions:
Fast Algorithms To Partition Simple Rectilinear Polygons
Polygon Decomposition
The Maximal Rectangle Problem
Hopefully these questions and papers can help lead you to the answer you're looking for, or at least scare you off toward finding another solution.
I am trying to extract the 3D distance in mm between two known points in a 2D image. I am using square AR markers in order to get the camera coordinates relative to the markers in the scene. The points are the corners of these markers.
An example is shown below:
The code is written in C# and I am using XNA. I am using AForge.net for the CoPlanar POSIT.
The steps I take in order to work out the distance:
1. Mark corners on screen. Corners are represented in 2D vector form, Image centre is (0,0). Up is positive in the Y direction, right is positive in the X direction.
2. Use AForge.net Co-Planar POSIT algorithm to get pose of each marker:
float focalLength = 640; //Needed for POSIT
float halfCornerSize = 50; //Represents 1/2 an edge i.e. 50mm
AVector3[] modelPoints = new AVector3[]
{
new AVector3( -halfCornerSize, 0, halfCornerSize ),
new AVector3( halfCornerSize, 0, halfCornerSize ),
new AVector3( halfCornerSize, 0, -halfCornerSize ),
new AVector3( -halfCornerSize, 0, -halfCornerSize ),
};
CoplanarPosit coPosit = new CoplanarPosit(modelPoints, focalLength);
coPosit.EstimatePose(cornersToEstimate, out marker1Rot, out marker1Trans);
3. Convert to XNA rotation/translation matrix (AForge uses OpenGL matrix form):
float yaw, pitch, roll;
marker1Rot.ExtractYawPitchRoll(out yaw, out pitch, out roll);
Matrix xnaRot = Matrix.CreateFromYawPitchRoll(-yaw, -pitch, roll);
Matrix xnaTranslation = Matrix.CreateTranslation(marker1Trans.X, marker1Trans.Y, -marker1Trans.Z);
Matrix transform = xnaRot * xnaTranslation;
4. Find 3D coordinates of the corners:
//Model corner points
cornerModel = new Vector3[]
{
new Vector3(halfCornerSize,0,-halfCornerSize),
new Vector3(-halfCornerSize,0,-halfCornerSize),
new Vector3(halfCornerSize,0,halfCornerSize),
new Vector3(-halfCornerSize,0,halfCornerSize)
};
Matrix markerTransform = Matrix.CreateTranslation(cornerModel[i].X, cornerModel[i].Y, cornerModel[i].Z);
cornerPositions3d1[i] = (markerTransform * transform).Translation;
//DEBUG: project corner onto screen - represented by brown dots
Vector3 t3 = viewPort.Project(markerTransform.Translation, projectionMatrix, viewMatrix, transform);
cornersProjected1[i].X = t3.X; cornersProjected1[i].Y = t3.Y;
5. Look at the 3D distance between two corners on a marker; this represents 100mm. Find the scaling factor needed to convert this 3D distance to 100mm. (I actually take the average scaling factor):
for (int i = 0; i < 4; i++)
{
//Distance scale;
distanceScale1 += (halfCornerSize * 2) / Vector3.Distance(cornerPositions3d1[i], cornerPositions3d1[(i + 1) % 4]);
}
distanceScale1 /= 4;
6. Finally I find the 3D distance between related corners and multiply by the scaling factor to get distance in mm:
for(int i = 0; i < 4; i++)
{
distance[i] = Vector3.Distance(cornerPositions3d1[i], cornerPositions3d2[i]) * scalingFactor;
}
The distances acquired are never truly correct. I used the cutting board as it allowed me to easily calculate what the distances should be. The above image calculated a distance of 147mm (expected 150mm) for corner 1 (red to purple). The image below shows 188mm (expected 200mm).
What is also worrying is the fact that when measuring the distance between marker corners sharing an edge on the same marker, the 3D distances obtained are never the same. Another thing I noticed is that the brown dots never seem to exactly match up with the colored dots. The colored dots are the coordinates used as input to the CoPlanar posit. The brown dots are the calculated positions from the center of the marker calculated via POSIT.
Does anyone have any idea what might be wrong here? I am pulling out my hair trying to figure it out. The code should be quite simple, I don't think I have made any obvious mistakes with the code. I am not great at maths so please point out where my basic maths might be wrong as well...
You are using way too many black boxes in your question. What is the focal length in the second step? Why go through yaw/pitch/roll in step 3? How do you calibrate? I recommend starting over from scratch without using libraries that you do not understand.
Step 1: Create a camera model. Understand the errors, build a projection. If needed, apply a 2D filter for lens distortion. This might be hard.
Step 2: Find your markers in 2D, after removing lens distortion. Make sure you know the error and that you get the center. Maybe over multiple frames.
Step 3: Un-project to 3d. After 1 and 2 this should be easy.
Step 4: ???
Step 5: Profit! (Measure distance in 3d and know your error)
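To make step 3 concrete, here is a minimal pinhole back-projection sketch. It assumes the question's convention of image coordinates measured from the image centre, a focal length in pixels, no lens distortion, and a known depth z; it illustrates the model only, not the full calibration described above.
using Microsoft.Xna.Framework; // Vector3, since the question uses XNA

static class Pinhole
{
    // Back-project an image point (u, v), measured from the image centre, into a 3D point
    // in camera space at depth z. The result is in the same units as z (e.g. mm).
    public static Vector3 Unproject(float u, float v, float focalLengthPx, float z)
    {
        return new Vector3(u * z / focalLengthPx, v * z / focalLengthPx, z);
    }
}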
I think you need a 3D photo (two photos taken a set distance apart) so you can get the distance from the parallax between the images.
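For reference, the standard relation behind that suggestion (assuming two calibrated pinhole cameras with parallel optical axes) is depth = focal length x baseline / disparity:
// Z = f * B / d, with f = focal length in pixels, B = baseline between the two camera
// positions (e.g. mm) and d = disparity in pixels between the matched points.
static float DepthFromDisparity(float focalLengthPx, float baselineMm, float disparityPx)
{
    return focalLengthPx * baselineMm / disparityPx; // depth in the same units as the baseline
}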
I have looked on Google, but the only thing I could find was a tutorial on how to create one using Photoshop. That's of no interest to me! I need the logic behind it.
(And I don't need the logic of how to 'use' a bump map; I want to know how to 'make' one!)
I am writing my own HLSL shader and have come as far as realizing that there is some kind of gradient between two pixels which gives the normal, and thus, with the position of the light, the surface can be lit accordingly.
I want to do this real time so that when the texture changes, the bumpmap does too.
Thanks
I realize that I'm way WAY late to this party, but I, too, ran into the same situation recently while attempting to write my own normal map generator for 3ds Max. There are bulky and unnecessary libraries for C#, but nothing in the way of a simple, math-based solution.
So I ran with the math behind the conversion: the Sobel operator. That's what you're looking to employ in the shader script.
The following Class is about the simplest implementation I've seen for C#. It does exactly what it's supposed to do and achieves exactly what is desired: a normal map based on either a heightmap, texture or even a programmatically-generated procedural that you provide.
As you can see in the code, I've implemented if / else to mitigate exceptions thrown on edge detection width and height limits.
What it does: it samples the HSB brightness of each pixel and its adjoining pixels to determine the scale of the output hue/saturation values, which are subsequently converted to RGB for the SetPixel operation.
As an aside: you could implement an input control to scale the intensity of the output hue/saturation values, to scale the subsequent effect that the output normal map will have on your geometry/lighting.
And that's it. No more having to deal with that deprecated, tiny-windowed PhotoShop plugin. Sky's the limit.
Screenshot of C# winforms implementation (source / output):
C# Class to achieve a Sobel-based normal map from source image:
using System.Drawing;
using System.Windows.Forms;

namespace heightmap.Class
{
    class Normal
    {
        // Generates a Sobel-style normal map from the bitmap passed in and shows it in pic_normal.
        public void calculate(Bitmap image, PictureBox pic_normal)
        {
            #region Global Variables
            int w = image.Width - 1;
            int h = image.Height - 1;
            float sample_l;
            float sample_r;
            float sample_u;
            float sample_d;
            float x_vector;
            float y_vector;
            Bitmap normal = new Bitmap(image.Width, image.Height);
            #endregion
            for (int y = 0; y < h + 1; y++)
            {
                for (int x = 0; x < w + 1; x++)
                {
                    // sample the brightness of the four neighbours, clamping at the image edges
                    if (x > 0) { sample_l = image.GetPixel(x - 1, y).GetBrightness(); }
                    else { sample_l = image.GetPixel(x, y).GetBrightness(); }
                    if (x < w) { sample_r = image.GetPixel(x + 1, y).GetBrightness(); }
                    else { sample_r = image.GetPixel(x, y).GetBrightness(); }
                    if (y > 0) { sample_u = image.GetPixel(x, y - 1).GetBrightness(); }
                    else { sample_u = image.GetPixel(x, y).GetBrightness(); }
                    if (y < h) { sample_d = image.GetPixel(x, y + 1).GetBrightness(); }
                    else { sample_d = image.GetPixel(x, y).GetBrightness(); }
                    // map the horizontal and vertical gradients into the 0..255 red and green channels
                    x_vector = (((sample_l - sample_r) + 1) * .5f) * 255;
                    y_vector = (((sample_u - sample_d) + 1) * .5f) * 255;
                    Color col = Color.FromArgb(255, (int)x_vector, (int)y_vector, 255);
                    normal.SetPixel(x, y, col);
                }
            }
            pic_normal.Image = normal; // set as PictureBox image
        }
    }
}
A sampler to read your height or depth map.
/// same data as HeightMap, but in a format that the pixel shader can read
/// the pixel shader dynamically generates the surface normals from this.
extern Texture2D HeightMap;
sampler2D HeightSampler = sampler_state
{
Texture=(HeightMap);
AddressU=CLAMP;
AddressV=CLAMP;
Filter=LINEAR;
};
Note that my input map is a 512x512 single-component grayscale texture. Calculating the normals from that is pretty simple:
#define HALF2 ((float2)0.5)
#define GET_HEIGHT(heightSampler,texCoord) (tex2D(heightSampler,texCoord+HALF2))
///calculate a normal for the given location from the height map
/// basically, this calculates the X- and Z- surface derivatives and returns their
/// cross product. Note that this assumes the heightmap is a 512 pixel square for no particular
/// reason other than that my test map is 512x512.
float3 GetNormal(sampler2D heightSampler, float2 texCoord)
{
/// normalized size of one texel. this would be 1/1024.0 if using 1024x1024 bitmap.
float texelSize=1/512.0;
float n = GET_HEIGHT(heightSampler,texCoord+float2(0,-texelSize));
float s = GET_HEIGHT(heightSampler,texCoord+float2(0,texelSize));
float e = GET_HEIGHT(heightSampler,texCoord+float2(-texelSize,0));
float w = GET_HEIGHT(heightSampler,texCoord+float2(texelSize,0));
float3 ew = normalize(float3(2*texelSize,e-w,0));
float3 ns = normalize(float3(0,s-n,2*texelSize));
float3 result = cross(ew,ns);
return result;
}
and a pixel shader to call it:
#define LIGHT_POSITION (float3(0,2,0))
float4 SolidPS(float3 worldPosition : NORMAL0, float2 texCoord : TEXCOORD0) : COLOR0
{
    /// calculate a normal from the height map
    float3 normal = GetNormal(HeightSampler, texCoord);
    /// return it as a color. (Since the normal components can range from -1 to +1, this
    /// will probably return a lot of "black" pixels if rendered as-is to screen.)
    return float4(normal, 1);
}
LIGHT_POSITION could (and probably should) be input from your host code, though I've cheated and used a constant here.
Note that this method requires 4 texture lookups per normal, not counting one to get the color. That may not be an issue for you (depending on whatever else you're doing). If it becomes too much of a performance hit, you can instead recalculate only when the texture changes: render to a target and capture the result as a normal map.
An alternative would be to draw a screen-aligned quad textured with the heightmap to a render target and use the ddx/ddy HLSL intrinsics to generate the normals without having to resample the source texture. Obviously you'd do this in a pre-pass step, read the resulting normal map back, and then use it as an input to your later stages.
In any case, this has proved fast enough for me.
The short answer is: there's no way to do this reliably that produces good results, because there's no way to tell the difference between a diffuse texture that has changes in color/brightness due to bumpiness, and a diffuse texture that has changes in color/brightness because the surface is actually a different colour/brightness at that point.
Longer answer:
If you were to assume that the surface were actually a constant colour, then any changes in colour or brightness must be due to shading effects due to bumpiness. Calculate how much brighter/darker each pixel is from the actual surface colour; brighter values indicate parts of the surface that face 'towards' the light source, and darker values indicate parts of the surface that face 'away' from the light source. If you also specify the direction the light is coming from, you can calculate a surface normal at each point on the texture such that it would result in the shading value you calculated.
That's the basic theory. Of course, in reality, the surface is almost never a constant colour, which is why this approach of using purely the diffuse texture as input tends not to work very well. I'm not sure how things like CrazyBump do it but I think they're doing things like averaging the colour over local parts of the image rather than the whole texture.
Ordinarily, normal maps are created from actual 3D models of the surface that are 'projected' onto lower-resolution geometry. Normal maps are just a technique for faking that high-resolution geometry, after all.
Quick answer: It's not possible.
A simple generic (diffuse) texture simply does not contain this information. I haven't looked at exactly how Photoshop does it (I've seen an artist use it once), but I think they simply do something like depth = r + g + b + a, which basically gives a heightmap/gradient, and then convert the heightmap to a normal map using a simple edge-detect effect to get a tangent-space normal map.
Just keep in mind that in most cases you use a normal map to simulate a high-res 3D geometry mesh, as it fills in the blank spots vertex normals leave behind. If your scene heavily relies on lighting, this is a no-go, but if it's a simple directional light, this 'might' work.
Of course, this is just my experience, you might just as well be working on a completely different type of project.
I'm trying to write a program to programmatically determine the tilt or angle of rotation in an arbitrary image.
Images have the following properties:
Consist of dark text on a light background
Occasionally contain horizontal or vertical lines which only intersect at 90 degree angles.
Skewed between -45 and 45 degrees.
See this image as a reference (it's been skewed 2.8 degrees).
So far, I've come up with this strategy: Draw a route from left to right, always selecting the nearest white pixel. Presumably, the route from left to right will prefer to follow the path between lines of text along the tilt of the image.
Here's my code:
private bool IsWhite(Color c) { return c.GetBrightness() >= 0.5 || c == Color.Transparent; }
private bool IsBlack(Color c) { return !IsWhite(c); }
private double ToDegrees(decimal slope) { return (180.0 / Math.PI) * Math.Atan(Convert.ToDouble(slope)); }

private void GetSkew(Bitmap image, out double minSkew, out double maxSkew)
{
    decimal minSlope = 0.0M;
    decimal maxSlope = 0.0M;
    for (int start_y = 0; start_y < image.Height; start_y++)
    {
        int end_y = start_y;
        for (int x = 1; x < image.Width; x++)
        {
            int above_y = Math.Max(end_y - 1, 0);
            int below_y = Math.Min(end_y + 1, image.Height - 1);
            Color center = image.GetPixel(x, end_y);
            Color above = image.GetPixel(x, above_y);
            Color below = image.GetPixel(x, below_y);
            if (IsWhite(center)) { /* no change to end_y */ }
            else if (IsWhite(above) && IsBlack(below)) { end_y = above_y; }
            else if (IsBlack(above) && IsWhite(below)) { end_y = below_y; }
        }
        decimal slope = (Convert.ToDecimal(start_y) - Convert.ToDecimal(end_y)) / Convert.ToDecimal(image.Width);
        minSlope = Math.Min(minSlope, slope);
        maxSlope = Math.Max(maxSlope, slope);
    }
    minSkew = ToDegrees(minSlope);
    maxSkew = ToDegrees(maxSlope);
}
This works well on some images, not so well on others, and it's slow.
Is there a more efficient, more reliable way to determine the tilt of an image?
I've made some modifications to my code, and it certainly runs a lot faster, but it's not very accurate.
I've made the following improvements:
Using Vinko's suggestion, I avoid GetPixel in favor of working with bytes directly, now the code runs at the speed I needed.
My original code simply used "IsBlack" and "IsWhite", but this isn't granular enough. The original code traces the following paths through the image:
http://img43.imageshack.us/img43/1545/tilted3degtextoriginalw.gif
Note that a number of paths pass through the text. Now I compare the center, above, and below pixels by their actual brightness values and select the brightest. Basically I'm treating the bitmap as a heightmap, and the path from left to right follows the contours of the image, resulting in a better path:
http://img10.imageshack.us/img10/5807/tilted3degtextbrightnes.gif
As suggested by Toaomalkster, applying a Gaussian blur smooths out the height map, and I get even better results:
http://img197.imageshack.us/img197/742/tilted3degtextblurredwi.gif
Since this is just prototype code, I blurred the image using GIMP, I did not write my own blur function.
The selected path is pretty good for a greedy algorithm.
As Toaomalkster suggested, choosing the min/max slope is naive. A simple linear regression provides a better approximation of the slope of a path. Additionally, I should cut a path short once I run off the edge of the image, otherwise the path will hug the top of the image and give an incorrect slope.
Code
private double ToDegrees(double slope) { return (180.0 / Math.PI) * Math.Atan(slope); }
private double GetSkew(Bitmap image)
{
BrightnessWrapper wrapper = new BrightnessWrapper(image);
LinkedList<double> slopes = new LinkedList<double>();
for (int y = 0; y < wrapper.Height; y++)
{
int endY = y;
long sumOfX = 0;
long sumOfY = y;
long sumOfXY = 0;
long sumOfXX = 0;
int itemsInSet = 1;
for (int x = 1; x < wrapper.Width; x++)
{
int aboveY = endY - 1;
int belowY = endY + 1;
if (aboveY < 0 || belowY >= wrapper.Height)
{
break;
}
int center = wrapper.GetBrightness(x, endY);
int above = wrapper.GetBrightness(x, aboveY);
int below = wrapper.GetBrightness(x, belowY);
if (center >= above && center >= below) { /* no change to endY */ }
else if (above >= center && above >= below) { endY = aboveY; }
else if (below >= center && below >= above) { endY = belowY; }
itemsInSet++;
sumOfX += x;
sumOfY += endY;
sumOfXX += (x * x);
sumOfXY += (x * endY);
}
// least squares slope = (NΣ(XY) - (ΣX)(ΣY)) / (NΣ(X^2) - (ΣX)^2), where N = elements in set
if (itemsInSet > image.Width / 2) // path covers at least half of the image
{
decimal sumOfX_d = Convert.ToDecimal(sumOfX);
decimal sumOfY_d = Convert.ToDecimal(sumOfY);
decimal sumOfXY_d = Convert.ToDecimal(sumOfXY);
decimal sumOfXX_d = Convert.ToDecimal(sumOfXX);
decimal itemsInSet_d = Convert.ToDecimal(itemsInSet);
decimal slope =
((itemsInSet_d * sumOfXY_d) - (sumOfX_d * sumOfY_d))
/
((itemsInSet_d * sumOfXX_d) - (sumOfX_d * sumOfX_d));
slopes.AddLast(Convert.ToDouble(slope));
}
}
double mean = slopes.Average();
double sumOfSquares = slopes.Sum(d => Math.Pow(d - mean, 2));
double stddev = Math.Sqrt(sumOfSquares / (slopes.Count - 1));
// select items within 1 standard deviation of the mean
var testSample = slopes.Where(x => Math.Abs(x - mean) <= stddev);
return ToDegrees(testSample.Average());
}
class BrightnessWrapper
{
byte[] rgbValues;
int stride;
public int Height { get; private set; }
public int Width { get; private set; }
public BrightnessWrapper(Bitmap bmp)
{
Rectangle rect = new Rectangle(0, 0, bmp.Width, bmp.Height);
System.Drawing.Imaging.BitmapData bmpData =
bmp.LockBits(rect,
System.Drawing.Imaging.ImageLockMode.ReadOnly,
bmp.PixelFormat);
IntPtr ptr = bmpData.Scan0;
int bytes = bmpData.Stride * bmp.Height;
this.rgbValues = new byte[bytes];
System.Runtime.InteropServices.Marshal.Copy(ptr,
rgbValues, 0, bytes);
this.Height = bmp.Height;
this.Width = bmp.Width;
this.stride = bmpData.Stride;
}
public int GetBrightness(int x, int y)
{
int position = (y * this.stride) + (x * 3);
int b = rgbValues[position];
int g = rgbValues[position + 1];
int r = rgbValues[position + 2];
return (r + r + b + g + g + g) / 6;
}
}
The code is good, but not great. Large amounts of whitespace cause the program to draw a relatively flat line, resulting in a slope near 0 and causing the code to underestimate the actual tilt of the image.
There is no appreciable difference in the accuracy of the tilt by selecting random sample points vs sampling all points, because the ratio of "flat" paths selected by random sampling is the same as the ratio of "flat" paths in the entire image.
GetPixel is slow. You can get an order of magnitude speed up using the approach listed here.
If the text is left (or right) aligned, you can determine the slope by measuring the distance between the left (right) edge of the image and the first dark pixel at two random heights, and calculating the slope from that. Additional measurements would lower the error at the cost of additional time.
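A minimal sketch of that idea, assuming left-aligned dark text on a light background and reusing a BrightnessWrapper-style accessor like the one in the question (the helper names and threshold are made up):
static double EstimateSkewFromLeftMargin(BrightnessWrapper img, int row1, int row2, int darkThreshold = 128)
{
    int x1 = FirstDarkColumn(img, row1, darkThreshold);
    int x2 = FirstDarkColumn(img, row2, darkThreshold);
    // horizontal drift of the left margin divided by the vertical distance between the samples;
    // the sign convention may need flipping to match the slope-based code above
    double slope = (double)(x2 - x1) / (row2 - row1);
    return (180.0 / Math.PI) * Math.Atan(slope);
}

static int FirstDarkColumn(BrightnessWrapper img, int y, int threshold)
{
    for (int x = 0; x < img.Width; x++)
        if (img.GetBrightness(x, y) < threshold) return x;
    return img.Width; // no dark pixel found in this row
}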
First I must say I like the idea. But I've never had to do this before, and I'm not sure what to suggest to improve reliability. The first thing I can think of is the idea of throwing out statistical anomalies. If the slope suddenly changes sharply, then you know you've found a white section of the image that dips into the edge, skewing (no pun intended) your results. So you'd want to throw that stuff out somehow.
But from a performance standpoint there are a number of optimizations you could make which may add up.
Namely, I'd change this snippet in your inner loop from this:
Color center = image.GetPixel(x, end_y);
Color above = image.GetPixel(x, above_y);
Color below = image.GetPixel(x, below_y);
if (IsWhite(center)) { /* no change to end_y */ }
else if (IsWhite(above) && IsBlack(below)) { end_y = above_y; }
else if (IsBlack(above) && IsWhite(below)) { end_y = below_y; }
To this:
Color center = image.GetPixel(x, end_y);
if (IsWhite(center)) { /* no change to end_y */ }
else
{
Color above = image.GetPixel(x, above_y);
Color below = image.GetPixel(x, below_y);
if (IsWhite(above) && IsBlack(below)) { end_y = above_y; }
else if (IsBlack(above) && IsWhite(below)) { end_y = below_y; }
}
It's the same effect but should drastically reduce the number of calls to GetPixel.
Also consider putting the values that don't change into variables before the madness begins. Things like image.Height and image.Width have a slight overhead every time you call them. So store those values in your own variables before the loops begin. The thing I always tell myself when dealing with nested loops is to optimize everything inside the most inner loop at the expense of everything else.
Also... as Vinko Vrsalovic suggested, you may look at his GetPixel alternative for yet another boost in speed.
At first glance, your code looks overly naive, which explains why it doesn't always work. I like the approach Steve Wortham suggested, but it might run into problems if you have background images.
Another approach that often helps with images is to blur them first. If you blur your example image enough, each line of text will end up as a blurry smooth line. You then apply some sort of algorithm to basically do a regression analysis. There are lots of ways to do that, and lots of examples on the net. Edge detection might be useful, or it might cause more problems than it's worth.
By the way, a Gaussian blur can be implemented very efficiently if you search hard enough for the code. Otherwise, I'm sure there are lots of libraries available. I haven't done much of that lately, so I don't have any links on hand, but a search for an image processing library will get you good results.
I'm assuming you're enjoying the fun of solving this, so not much in the way of actual implementation details here.
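If you do end up rolling your own, a minimal separable Gaussian blur over a brightness array might look like the sketch below (the radius and sigma are arbitrary choices, and the values are assumed to be floats in [0, 1]):
static float[,] GaussianBlur(float[,] src, int radius = 3, double sigma = 1.5)
{
    int h = src.GetLength(0), w = src.GetLength(1);

    // build and normalize a 1D Gaussian kernel
    double[] k = new double[2 * radius + 1];
    double sum = 0;
    for (int i = -radius; i <= radius; i++)
    {
        k[i + radius] = Math.Exp(-(i * i) / (2 * sigma * sigma));
        sum += k[i + radius];
    }
    for (int i = 0; i < k.Length; i++) k[i] /= sum;

    // horizontal pass, then vertical pass (separability keeps it cheap)
    float[,] tmp = new float[h, w];
    float[,] dst = new float[h, w];
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
        {
            double acc = 0;
            for (int i = -radius; i <= radius; i++)
            {
                int xx = Math.Min(w - 1, Math.Max(0, x + i)); // clamp at the edges
                acc += src[y, xx] * k[i + radius];
            }
            tmp[y, x] = (float)acc;
        }
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
        {
            double acc = 0;
            for (int i = -radius; i <= radius; i++)
            {
                int yy = Math.Min(h - 1, Math.Max(0, y + i));
                acc += tmp[yy, x] * k[i + radius];
            }
            dst[y, x] = (float)acc;
        }
    return dst;
}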
Measuring the angle of every line seems like overkill, especially given the performance of GetPixel.
I wonder if you would have better performance luck by looking for a white triangle in the upper-left or upper-right corner (depending on the slant direction) and measuring the angle of the hypotenuse. All text should follow the same angle on the page, and the upper-left corner of a page won't get tricked by the descenders or whitespace of content above it.
Another tip to consider: rather than blurring, work within a greatly-reduced resolution. That will give you both the smoother data you need, and fewer GetPixel calls.
For example, I made a blank page detection routine once in .NET for faxed TIFF files that simply resampled the entire page to a single pixel and tested the value for a threshold value of white.
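That single-pixel test can be as small as the following sketch (letting GDI+ do the averaging; the threshold here is an arbitrary choice):
static bool IsProbablyBlank(Bitmap page, float whiteThreshold = 0.95f)
{
    using (Bitmap onePixel = new Bitmap(1, 1))
    using (Graphics g = Graphics.FromImage(onePixel))
    {
        // scale the whole page down to one pixel; the high-quality modes prefilter,
        // which approximates averaging every source pixel
        g.InterpolationMode = System.Drawing.Drawing2D.InterpolationMode.HighQualityBicubic;
        g.DrawImage(page, new Rectangle(0, 0, 1, 1));
        return onePixel.GetPixel(0, 0).GetBrightness() >= whiteThreshold;
    }
}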
What are your constraints in terms of time?
The Hough transform is a very effective mechanism for determining the skew angle of an image. It can be costly in time, but if you're going to use Gaussian blur, you're already burning a pile of CPU time. There are also other ways to accelerate the Hough transform that involve creative image sampling.
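A rough sketch of a Hough-based skew estimate, assuming dark text on a light background, a BrightnessWrapper-style accessor as in the question, and skew within +/- 45 degrees (for speed you would sample only a subset of the dark pixels, and the sign convention may need flipping to match the slope-based code):
static double EstimateSkewWithHough(BrightnessWrapper img, int darkThreshold = 128)
{
    const double stepDegrees = 0.2;                    // angular resolution
    int angleBins = (int)(90.0 / stepDegrees) + 1;     // covers -45..+45 degrees
    int rhoOffset = img.Width + img.Height;
    int[,] acc = new int[angleBins, 2 * rhoOffset + 1];

    for (int y = 0; y < img.Height; y++)
    {
        for (int x = 0; x < img.Width; x++)
        {
            if (img.GetBrightness(x, y) >= darkThreshold) continue; // only dark pixels vote
            for (int a = 0; a < angleBins; a++)
            {
                double theta = (-45.0 + a * stepDegrees) * Math.PI / 180.0;
                // all pixels lying on a line of slope tan(theta) share the same rho
                double rho = y * Math.Cos(theta) - x * Math.Sin(theta);
                acc[a, (int)Math.Round(rho) + rhoOffset]++;
            }
        }
    }

    // text lines concentrate votes, so the angle owning the single strongest bin wins
    int bestAngle = 0, bestVotes = -1;
    for (int a = 0; a < angleBins; a++)
        for (int r = 0; r < acc.GetLength(1); r++)
            if (acc[a, r] > bestVotes) { bestVotes = acc[a, r]; bestAngle = a; }

    return -45.0 + bestAngle * stepDegrees;
}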
Your latest output is confusing me a little.
When you superimposed the blue lines on the source image, did you offset them a bit? It looks like the blue lines are about 5 pixels above the centre of the text.
Not sure about that offset, but you definitely have a problem with the derived line "drifting" away at the wrong angle. It seems to have too strong a bias towards producing a horizontal line.
I wonder if increasing your mask window from 3 pixels (centre, one above, one below) to 5 might improve this (two above, two below). You'll also get this effect if you follow richardtallent's suggestion and resample the image smaller.
Very cool path finding application.
I wonder if this other approach would help or hurt with your particular data set.
Assume a black and white image:
Project all black pixels to the right (EAST). This should give a one-dimensional array of size IMAGE_HEIGHT. Call the array CANVAS.
As you project all the pixels EAST, keep track numerically of how many pixels project into each bin of CANVAS.
Rotate the image an arbitrary number of degrees and re-project.
Pick the result that gives the highest peaks and lowest valleys for values in CANVAS.
I imagine this will not work well if you actually have to account for a full -45 to +45 degrees of tilt. If the actual range is smaller (say, +/- 10 degrees), this might be a pretty good strategy. Once you have an initial result, you could consider re-running with a smaller increment of degrees to fine-tune the answer. I might therefore try to write this with a function that accepts a float degree_tick as a parameter, so I could run both a coarse and a fine pass (or a spectrum of coarseness or fineness) with the same code.
This might be computationally expensive. To optimize, you might consider selecting just a portion of the image to project-test-rotate-repeat on.
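Here is a sketch of that idea, with the float degree_tick parameter suggested above. Instead of physically rotating the bitmap, it rotates each dark pixel's coordinates, drops the rotated y into a histogram (the CANVAS), and keeps the angle whose histogram has the highest peaks and deepest valleys, scored here by variance. It assumes a BrightnessWrapper-style accessor like the one in the question.
static double EstimateSkewByProjection(BrightnessWrapper img,
    double maxDegrees = 10, double degreeTick = 0.5, int darkThreshold = 128)
{
    double bestAngle = 0, bestScore = double.MinValue;
    for (double deg = -maxDegrees; deg <= maxDegrees; deg += degreeTick)
    {
        double rad = deg * Math.PI / 180.0;
        double sin = Math.Sin(rad), cos = Math.Cos(rad);
        int[] canvas = new int[img.Height + 2 * img.Width]; // generous bin count
        int offset = img.Width;                             // keeps rotated y non-negative
        for (int y = 0; y < img.Height; y++)
            for (int x = 0; x < img.Width; x++)
            {
                if (img.GetBrightness(x, y) >= darkThreshold) continue;
                canvas[(int)(y * cos - x * sin) + offset]++; // project EAST after rotation
            }
        // variance of the bins: peaky histograms (text lines aligned) score higher
        double mean = 0;
        foreach (int v in canvas) mean += v;
        mean /= canvas.Length;
        double score = 0;
        foreach (int v in canvas) score += (v - mean) * (v - mean);
        if (score > bestScore) { bestScore = score; bestAngle = deg; }
    }
    return bestAngle;
}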