I'm feeding in a Bitmap image to my C# program to be able to perform OCR to identify the characters in the image. I can do this fairly well if the image is not rotated. One of the program requirements, however, is that the program automatically determines if the image has been rotated, and that it automatically corrects these rotations.
I've tried implementing a simple method where lines are traced across the image and points which contact a character are recorded, and then performing a simple linear regression on the line points. This works to an extent, although it has not proven very accurate due to curvature of characters, etc.
I was wondering if there was a better method to solve this problem? Many thanks in advance! :)
I use the gmseDeskew algorithm to deskew images in my program. It works very well.
It's an interesting problem to be sure. I'd look for certain letters whose rotation is easier to tell. For example, a capital A, R or K should have both of its lower parts on roughly the same horizontal line. Another option is to take letters that cannot be identified, rotate them in various ways and re-attempt to identify them. If a letter that could not be identified in the raw scan CAN be identified when you rotate it, that's a pretty big clue. Once you have identified the "correction" rotation that makes a non-recognizable character into a recognizable one, apply the same rotation value to the others.
If the image contains recognizable lines of text, try blurring it so that each line becomes a mostly solid band, then find the direction of those bands (either by analyzing the Fourier transform or by ridge detection).
If the text is formatted like a printed document (column(s) and lines of text) then you can take advantage of this.
An approach that I've often seen used for document text is to do projection profiles:
Scan a document at a specific orientation and sum up the number of "black" pixels on each scan line (creating a 1D array of counts, each index representing a Y coordinate, the profile).
Calculate the variance of the counts (profile).
Repeat for multiple angles (this can be done in a binary-search fashion to reduce processing).
The angle that results in the greatest variance is the correct angle (due to the text lines creating large peaks from the printed text, and low valleys due to the absence of text between the lines)
Then after finding this angle you can adjust your image accordingly and do your awesome OCR.
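The projection-profile steps above can be sketched roughly as follows. This is a minimal illustration in Python rather than C#, and it makes a few assumptions that are not in the answer: the image has already been thresholded into a list of (x, y) coordinates of "black" pixels, the candidate angles are swept in fixed steps instead of a binary search, and the helper names (profile_variance, estimate_skew) are made up for the example. The sign of the returned angle depends on your coordinate convention, so check it on a known image.

```python
import math

def profile_variance(points, angle_deg):
    """Variance of the per-row black-pixel counts after rotating by -angle_deg."""
    a = math.radians(angle_deg)
    sin_a, cos_a = math.sin(a), math.cos(a)
    counts = {}
    for x, y in points:
        # Only the rotated y coordinate matters for a horizontal profile.
        row = round(-x * sin_a + y * cos_a)
        counts[row] = counts.get(row, 0) + 1
    # Rows with no ink count as zeros, so use the full row span.
    n_rows = max(counts) - min(counts) + 1
    mean = len(points) / n_rows
    return sum(c * c for c in counts.values()) / n_rows - mean * mean

def estimate_skew(points, max_angle=10.0, step=0.5):
    """Return the candidate angle (degrees) whose profile has the largest variance."""
    steps = int(round(2 * max_angle / step))
    candidates = [-max_angle + i * step for i in range(steps + 1)]
    return max(candidates, key=lambda a: profile_variance(points, a))
```

Rotating the image by the negative of the returned angle should then level the text lines. A coarse sweep followed by a finer sweep around the best candidate is one way to reduce the number of angles tested.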
It might be easier to find the vertical-ish lines that are adjacent to the text (i.e., the left margin). For each scanline, record the first black pixel. Put all of those in a linear regression, and you should get a near vertical line. Measure its angle from true vertical and you should be able to unrotate the text. You could imagine doing the same thing for the top, bottom, and right sides, too, and taking an average.
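A rough sketch of the left-margin idea, in Python for brevity. It assumes you have already collected, for each scanline, the x coordinate of the first black pixel (or None for blank rows); the function name margin_skew_degrees is hypothetical. Because the margin is nearly vertical, the regression fits x as a function of y rather than the other way round.

```python
import math

def margin_skew_degrees(first_black_x):
    """first_black_x[y] is the x of the first black pixel on row y, or None.

    Fits x = slope * y + intercept by least squares and returns the angle
    of that line from true vertical, in degrees.
    """
    samples = [(y, x) for y, x in enumerate(first_black_x) if x is not None]
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two rows containing black pixels")
    mean_y = sum(y for y, _ in samples) / n
    mean_x = sum(x for _, x in samples) / n
    syy = sum((y - mean_y) ** 2 for y, _ in samples)
    sxy = sum((y - mean_y) * (x - mean_x) for y, x in samples)
    slope = sxy / syy  # change in x per row
    return math.degrees(math.atan(slope))
```

Indented paragraphs and short lines will act as outliers in a plain least-squares fit, so in practice you may want to discard rows that lie far from the fitted line and fit again.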
We faced a similar problem before. We searched for an easy and quick solution and ended up using a commercial toolkit (leadtools). You can use it to pre-process the image automatically before running OCR on it. You can check this help topic to see how to use the toolkit to process and scan images.
Suppose I have a bitmap like this from laser scanning, with a red line from the laser on it. What would be the right way to find the center of that line? I'd like either to store its coordinates in an array or just to draw a thin line over it.
What approach would you suggest? Preferably one with an option to smooth out that line.
(source: gyazo.com)
Thanks
I'd suggest the following:
Convert image to monochrome
Convert image to black-white using "image thresholding"
Split image in small parts
For every part that is not entirely black, compute the Hough transform and find the approximating segment
Merge these segments into chain and then smooth them (using Catmull-Rom splines for example)
However, this is not the only possible approach; there are a lot of them.
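As an illustration of the final smoothing step in the list above, here is a minimal uniform Catmull-Rom interpolation in Python. It assumes the merged chain is already available as an ordered list of (x, y) points; the function names and the samples_per_segment parameter are made up for this sketch. The end points are duplicated so that the curve starts and finishes exactly on the chain.

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Point on the uniform Catmull-Rom segment between p1 and p2, 0 <= t <= 1."""
    t2, t3 = t * t, t * t * t
    return tuple(
        0.5 * (2 * b + (c - a) * t
               + (2 * a - 5 * b + 4 * c - d) * t2
               + (-a + 3 * b - 3 * c + d) * t3)
        for a, b, c, d in zip(p0, p1, p2, p3)
    )

def smooth_chain(points, samples_per_segment=8):
    """Return a denser polyline that passes through every input point."""
    if len(points) < 2:
        return list(points)
    # Repeat the first and last points so every segment has four neighbours.
    padded = [points[0]] + list(points) + [points[-1]]
    out = []
    for i in range(len(points) - 1):
        p0, p1, p2, p3 = padded[i:i + 4]
        for s in range(samples_per_segment):
            out.append(catmull_rom(p0, p1, p2, p3, s / samples_per_segment))
    out.append(tuple(float(v) for v in points[-1]))
    return out
```

Catmull-Rom interpolates its control points, so noisy segment end points will still show up in the result; averaging or thinning the chain first gives a smoother line.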
I would approach this with a worm. Have your worm start on one pixel and allow it to move along the line. Every time you detect a change in direction in any dimension you place a point. Then fit a spline through your points. Add the start and end locations as points too.
Important issues:
You need to maintain which pixels have been visited, such that when a worm finishes you can detect if you need to start a new one on what is left.
You need to maintain a velocity vector in your worm and weight possible forward choices based on which will more closely continue the line you're currently on. This is because...
You need to deal with topology changes, where two or more lines intersect at the same point or a line splits into two.
For fitting the spline itself have a look at Numerics on NuGet
My suggestion is:
Go row by row, saving the coordinates of the first and last appearance of a red-ish pixel in each row.
Then either draw a line between those two coordinates in every row, or take the middle pixel between them in each row and connect the middle pixels.
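A small sketch of the row-by-row suggestion, in Python. It assumes the image is available as a list of rows of (r, g, b) tuples, and the "red-ish" test (min_red, margin) uses placeholder thresholds that you would tune for your scanner; neither the names nor the values come from the answer.

```python
def is_reddish(pixel, min_red=150, margin=60):
    """Assumed test: red is strong and clearly above green and blue."""
    r, g, b = pixel
    return r >= min_red and r - max(g, b) >= margin

def line_centers(image):
    """image: list of rows, each a list of (r, g, b) tuples.

    Returns [(x_center, y)] for every row that contains red-ish pixels,
    using the midpoint between the first and last red-ish pixel.
    """
    centers = []
    for y, row in enumerate(image):
        xs = [x for x, px in enumerate(row) if is_reddish(px)]
        if xs:
            centers.append(((xs[0] + xs[-1]) / 2, y))
    return centers
```

This only works when the laser line crosses each row once, i.e. it runs roughly top to bottom. For a line that runs mostly left to right, scan column by column instead.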
I wonder what kind of algorithm Google uses to display text (street names, river names, etc.) on the map. In particular, I want to know how they render strings that are folded along the road (e.g. N Cahuenga Blvd).
I tried to have a look at Google API, but I couldn't find anything useful... :(
Here's a guess (I have no idea what algorithm they use):
I would draw the streets using a curve that passes through all the corner points of what you call the "broken line". Specifically, I would try using a Bezier curve (a parametric curve that has useful properties for graphics, such as simple and fast calculation using integer arithmetic and nice results). For laying out the text, I would do something to make sure the text flows along the curve - for example as described here.
If you look closely at Google maps on curvy roads, you will see places where letters touch, and where each letter seems to have a different slope.
Their system might be as conceptually simple as drawing two curves parallel to the center of the road, spaced one standard glyph height apart. Then, if the line curves towards the top of the letters, space the next letter by measuring one letter width along the top line. If it curves downward, use the lower line. Draw each letter normal to the line at the resulting point.
We currently have a dynamic image, which holds on it text which is created from user input. This text follows a Bézier curve to define its position and rotation.
For various reasons, the text needs to be changed to be a set of images as the font needs to be very specific. We will therefore have one PNG for every allowable character of the alphabet. So if the user enters the word "TEST", the system will pull out the letters T, E, S and T and position them next to each other. This part isn't an issue.
The problem is forcing each of the images to follow the same Bézier curve as the text did using graphics.DrawString. The images must be positioned correctly, and ideally should be rotated correctly as well.
Is this possible, and how could this be done?
The quick answer is that you "simply"
parametrise the bezier curve evenly (PDF on math) (Explanation of what is wrong with standard parametrisation)
calculate the normals to the curve
arrange your images along the curve according to the even parametrisation using the glyph widths as the parameter distance
rotate your images so that "up" for your image is the normal direction to the curve
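The four steps above can be sketched as follows, in Python for brevity. This is only an approximation under stated assumptions: the curve is a single cubic Bézier given as four (x, y) control points, the "even" parametrisation is approximated with a sampled arc-length lookup table rather than an exact formula, each image is anchored at the middle of its width, and all function names are invented for the example.

```python
import bisect
import math

def bezier_point(ctrl, t):
    """Point on a cubic Bezier; ctrl is a list of four (x, y) control points."""
    u = 1.0 - t
    return tuple(u * u * u * a + 3 * u * u * t * b + 3 * u * t * t * c + t * t * t * d
                 for a, b, c, d in zip(*ctrl))

def bezier_tangent(ctrl, t):
    """First derivative of the cubic Bezier at t."""
    u = 1.0 - t
    return tuple(3 * u * u * (b - a) + 6 * u * t * (c - b) + 3 * t * t * (d - c)
                 for a, b, c, d in zip(*ctrl))

def arc_length_table(ctrl, samples=200):
    """Cumulative chord lengths and the t values at which they were sampled."""
    lengths, ts = [0.0], [0.0]
    prev = bezier_point(ctrl, 0.0)
    for i in range(1, samples + 1):
        t = i / samples
        cur = bezier_point(ctrl, t)
        lengths.append(lengths[-1] + math.hypot(cur[0] - prev[0], cur[1] - prev[1]))
        ts.append(t)
        prev = cur
    return lengths, ts

def layout_glyphs(ctrl, widths, samples=200):
    """Return (x, y, rotation_degrees) for each glyph that fits on the curve."""
    lengths, ts = arc_length_table(ctrl, samples)
    placements, offset = [], 0.0
    for w in widths:
        s = offset + w / 2.0          # distance along the curve to the glyph centre
        if s > lengths[-1]:
            break                     # ran out of curve
        i = max(bisect.bisect_left(lengths, s), 1)
        l0, l1 = lengths[i - 1], lengths[i]
        f = 0.0 if l1 == l0 else (s - l0) / (l1 - l0)
        t = ts[i - 1] + f * (ts[i] - ts[i - 1])
        x, y = bezier_point(ctrl, t)
        dx, dy = bezier_tangent(ctrl, t)
        placements.append((x, y, math.degrees(math.atan2(dy, dx))))
        offset += w
    return placements
```

Rotating each image by the returned tangent angle about its anchor makes the image's "up" direction line up with the curve normal. Depending on whether your y axis points up or down, you may need to negate the angle.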
But even this does not give a particularly good-looking result. Usually you need to apply a nonlinear transform to each image so that the parts away from the curve have a different width than those near the curve, depending on curvature and convexity.
This site explains many of the details by decomposing the outline of an image in paths
However, as I'm sure the previous links start to show, this is a calculation-intensive process. Instead, you may find it much easier to simply convert your images to fonts and use the method you were using previously. This solution would rely upon some third-party tool to do the conversion, and I hesitate to make suggestions. One direction (of many), though, would be to use a raster-to-vector graphics tool like the open source Inkscape and create your fonts from the vector graphics output. This method scales best but may involve a separate step of converting the output to a preferred font format like TrueType.
We want a C# solution to correct a scanned image that has been rotated. To solve this problem we must first detect the rotation angle and then rotate the image. This was our first thought. But then we thought image warping would be more accurate, as I think it would make the scanned image match our template. Then we could process it, since we know all the coordinates of our template... I searched for a free SDK or a free solution in C#. Help with this would be great, as it is the last task in our work. Really, thanks for all.
We used the PrimeOCR product to do this. It's not free, but we couldn't find a free program that was comparable.
So, the hard part is to detect the angle of the page.
If you have full control over the template, the simplest way to do this is probably to come up with an easily detectable symbol (e.g. a solid black circle) and stick 3 of them on the template. Then, detect them (in the case of a solid black circle, just look for big blocks of dark pixels).
So, you'll then have 3 sets of coordinates. If you have a top circle, a left circle, and a right circle, with all 3 circles at different distances from one another, detecting which circle is the top circle should be pretty easy.
Then just call a rotation function. This part is easy and has been done before (e.g. http://www.switchonthecode.com/tutorials/csharp-tutorial-image-editing-rotate ).
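Before calling the rotation function you need the angle itself. One way to get it, sketched here in Python, is to compare the direction from the left marker to the right marker in the scan with the same direction in the template. The function name and argument layout are assumptions for this example; it also assumes you have already worked out which detected circle is which.

```python
import math

def rotation_from_markers(template_left, template_right, found_left, found_right):
    """Angle in degrees by which the scan is rotated relative to the template.

    Each argument is an (x, y) marker centre. The result is normalised
    to the range [-180, 180).
    """
    expected = math.atan2(template_right[1] - template_left[1],
                          template_right[0] - template_left[0])
    actual = math.atan2(found_right[1] - found_left[1],
                        found_right[0] - found_left[0])
    delta = math.degrees(actual - expected)
    return (delta + 180.0) % 360.0 - 180.0
```

Rotating the scan by the negative of this value should bring the markers back in line with the template. The third marker is what lets you tell a page scanned upside down from one scanned the right way up.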
Edit:
I suggested a circle because it's easier to find the center, but a rectangle should work, too.
To be more explicit about how to actually locate the rectangles/circles, take the average brightness value of every a × a group of pixels. If that value is on the marker side of a threshold b (below b for dark markers on a light page), then that a × a group of pixels is part of a rectangle. a and b are variables you'll want to come up with yourself.
Use flood-fill (or, more precisely, Connected Component Labeling) to group the resulting pixels together. The end result should give you your rectangles.
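Here is a minimal Python sketch of the block-averaging and labeling steps. It assumes dark markers on a light page, so a block counts as marker when its mean brightness is below b; flip the comparison for light markers. The image is assumed to be a 2D list of brightness values (0 to 255), and the function names and default values of a and b are placeholders.

```python
from collections import deque

def dark_blocks(gray, a, b):
    """Indices (bx, by) of a-by-a blocks whose mean brightness is below b."""
    h, w = len(gray), len(gray[0])
    blocks = set()
    for by in range(h // a):
        for bx in range(w // a):
            total = sum(gray[by * a + dy][bx * a + dx]
                        for dy in range(a) for dx in range(a))
            if total / (a * a) < b:
                blocks.add((bx, by))
    return blocks

def label_components(blocks):
    """Group 4-connected blocks with a breadth-first flood fill."""
    remaining = set(blocks)
    components = []
    while remaining:
        start = remaining.pop()
        queue, comp = deque([start]), [start]
        while queue:
            x, y = queue.popleft()
            for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if n in remaining:
                    remaining.remove(n)
                    queue.append(n)
                    comp.append(n)
        components.append(comp)
    return components

def marker_centers(gray, a=4, b=64):
    """Approximate pixel-space centre of each dark region, sorted by x then y."""
    centers = []
    for comp in label_components(dark_blocks(gray, a, b)):
        cx = sum(x for x, _ in comp) / len(comp)
        cy = sum(y for _, y in comp) / len(comp)
        centers.append(((cx + 0.5) * a, (cy + 0.5) * a))
    return sorted(centers)
```

In practice you would also discard components that are much smaller or larger than the expected marker size, since dense text can produce dark blocks too.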
I am hoping to obtain some help with 2D object detection. I'll give a brief overview of the context in which this will be implemented.
There will be an image taken of the ceiling. The ceiling will have markers placed on it so the orientation of the camera can be determined. The pictures will always be taken facing straight up. My goal is to detect one of these markers in the image and determine its rotation. So rotation and scaling (to a lesser extent) will be the two primary factors used in the image detection. I will be writing the software in either C# or MATLAB (not quite sure yet).
For example, the marker might be an arrow like this:
An image taken of the ceiling would contain markers. The software needs to detect a single marker and determine that it has been rotated by 170 degrees.
I have no prior experience with image analysis. I know image processing is a fairly broad topic and was hoping to get some advice on which direction I should take and which techniques would be best for my application. Thanks!
I'm not directly in this field but I would tell you to start by looking into edge detection specifically. If you have a background in math/engineering the materials are pretty easy to understand:
This seemed to spark some ideas:
http://www.cfar.umd.edu/~fer/cmsc426/lectures/edge1.ppt
I'd recommend MATLAB or if you're intent on using C#, Emgu CV is pretty good.
Hough transforms are a great idea. Once you detect the edges in your image using, say, a Canny edge detector, you get an edge image (which is a binary image with only 1 or 0 for values).
Then, the Hough straight-line transform (essentially) spins a line about each white pixel in the edge image (the resolution of the line is up to you), using a parametrized function for the line. It calculates the total number of white (valued at 1) pixels along each spun line and stores this information in a big accumulator, indexed by the parameters of the line.
(Image: Hough space plot example, http://upload.wikimedia.org/wikipedia/en/a/af/Hough_space_plot_example.png)
In the example above, the parametric form for a line is:
rho = x*cos(theta) + y*sin(theta)
where rho is the distance and theta is the angle
So, as you can see, if you look at the bins at a particular orientation you can find out how many lines are oriented at that angle. Of course, you'll have to do some extra work to figure out which lines are oriented at that angle, since you have 5 other lines per arrow, but that shouldn't be too hard.
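To make the accumulator idea concrete, here is a small, unoptimized Python sketch of the straight-line Hough transform using the rho/theta form given above. It assumes the edge image has already been reduced to a list of (x, y) coordinates of white pixels. In this parametrisation rho is the perpendicular distance from the origin to the line and theta is the angle of the line's normal, so a horizontal line shows up at theta = 90 degrees. The function names and the one-pixel rho resolution are choices made for this example.

```python
import math

def hough_accumulator(edge_points, theta_steps=180):
    """Vote in (theta_index, rho) bins for every edge pixel."""
    trig = [(math.cos(math.pi * i / theta_steps), math.sin(math.pi * i / theta_steps))
            for i in range(theta_steps)]
    acc = {}
    for x, y in edge_points:
        for i, (cos_t, sin_t) in enumerate(trig):
            rho = round(x * cos_t + y * sin_t)   # one-pixel rho resolution
            acc[(i, rho)] = acc.get((i, rho), 0) + 1
    return acc

def strongest_line(edge_points, theta_steps=180):
    """Return (theta_degrees, rho, votes) for the bin with the most votes."""
    acc = hough_accumulator(edge_points, theta_steps)
    (i, rho), votes = max(acc.items(), key=lambda kv: kv[1])
    return 180.0 * i / theta_steps, rho, votes
```

For an arrow you would look at several of the strongest bins rather than just one, and combine their angles to decide which way the arrow points.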
As always in computer vision, your first problem is image illumination and acquisition. Before going further, establish how your markers will be printed on the ceiling, what their form will be, what light you will be using to see them, and what camera setup you will choose to look at the markers.
Given a good material, a good light and a good camera, you may have no problem at all processing the image. For example, you can print a full arrow in a retro-reflective material, with a longer tail than your example, and use a colored light and a corresponding filter on the camera. Now all you have in your image is arrows... There are plenty of other ways of acquiring the image that will help you there.
Once you have plain arrows, a simple blob analysis (which consists of computing statistical moments of objects in the image) will give you a lot of information: each arrow should have almost equal values for the 7 Hu moments, which allows you to filter objects efficiently, and the orientation computed from the central moments will give you the angle of the arrow. Blob analysis being only statistical, it is extremely fast.
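The orientation part of the blob analysis can be sketched in a few lines of Python. The sketch assumes a single blob that has already been segmented into a list of (x, y) pixel coordinates, and the function name is made up. It uses the standard second-order central moment formula, theta = 0.5 * atan2(2*mu11, mu20 - mu02).

```python
import math

def blob_orientation(pixels):
    """Centroid and principal-axis angle (degrees) of a blob given as (x, y) pixels."""
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Second-order central moments, normalised by the pixel count.
    mu20 = sum((x - cx) ** 2 for x, _ in pixels) / n
    mu02 = sum((y - cy) ** 2 for _, y in pixels) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels) / n
    angle = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
    return (cx, cy), math.degrees(angle)
```

Note that second-order moments only give the axis of the arrow, which is ambiguous by 180 degrees. To tell head from tail you need something extra, for example checking on which side of the centroid the blob is wider.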
Several systems have been developed to detect markers and their orientation robustly:
reacTIVision (open source) uses these types of tags to find position and orientation:
ARToolKit (open source) uses a different type of tags to extract all 6 degrees of freedom:
(Image: example marker, http://www.schanes.net/docs/robot/marker.png)
If your primary goal is not to learn, but to make the application work, I would suggest you use one of these. It is not a trivial task for a beginner to robustly detect the position and orientation of a random marker in an image.
On the other hand, if you are mainly interested in learning, I would also direct you to ARToolKit and its publications (and their references) that explain how to robustly implement marker detection.
You will need to explore edge detection, so look into the Hough transform. After that you will need to look into pattern classifiers and feature extraction.
This paper has an algorithm that appears to work without edge detection.
This book excerpt is more oriented toward the kind of symbol detection you intend, once you have done the edge detection.
A rigorous way to determine the orientation of an image acquired under projective geometry (as with most cameras) is to use the vanishing points and vanishing lines. Good news for you: your marker can be used to find this information! More good news: your image can be rectified, so the image columns (the y-axis) will correspond to the up-down direction. You will find more about this in chapter 8 of Hartley and Zisserman's book, Multiple View Geometry in Computer Vision.
Also remember that you will probably need to deal with radial distortion, the distortion caused by the camera lens. The other guys are right about the arrow detection problem: you have to use edge detection and, after that, a Hough transform or template matching. Refer to Gonzalez and Woods' book Digital Image Processing for details.