I have an OCR C# project where I get a scanned document with text in it, and I need to return the text in the document.
I already have a solution for parsing the text; however, we are stuck on the case where the scanned document is rotated (to the right or to the left).
Assuming there is no noise in the image (all pixels are either white or black), can anyone help us with an algorithm to correct the rotation at runtime (without a human looking at the image)?
Thanks
Use the Hough transform to detect the strongest line orientation, which should be the horizontal text orientation. The basic premise of the Hough transform is to map each x-y point into an r-theta parameter space, where each (r, theta) pair describes a line: r is the line's distance from the origin and theta is its orientation.
Once the image is transformed, bin the votes by theta to find the strongest orientation.
Because this method votes within discrete r and theta bins, the resolution of theta is only as good as the number of bins used. So instead of sweeping -180 to +180 degrees in one-degree increments, you might want to bound the range, either for a more accurate angle or for speed.
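To make the voting idea concrete, here is a minimal C# sketch, not a tuned implementation. It assumes the page is already available as a bool[,] where true means a black pixel, and it scores each candidate angle by the sum of squared bin counts, which is one common way to pick the "strongest" orientation. The class name, method name, and scoring rule are my own choices for illustration.

```csharp
using System;

static class HoughSkew
{
    // black[y, x] is true for a black (text) pixel.
    // Returns the candidate skew, in degrees, with the strongest line response.
    // The angle is measured from the +x axis toward the +y axis (y points down).
    public static double EstimateSkewDegrees(bool[,] black, double minDeg, double maxDeg, double stepDeg)
    {
        int height = black.GetLength(0), width = black.GetLength(1);
        int maxR = (int)Math.Ceiling(Math.Sqrt(width * width + height * height));
        double bestSkew = 0, bestScore = -1;

        for (double skew = minDeg; skew <= maxDeg + 1e-9; skew += stepDeg)
        {
            // A line with this skew has its normal at skew + 90 degrees.
            double theta = (skew + 90.0) * Math.PI / 180.0;
            double cos = Math.Cos(theta), sin = Math.Sin(theta);
            var votes = new int[2 * maxR + 1];   // one bin per integer r, shifted by maxR

            for (int y = 0; y < height; y++)
                for (int x = 0; x < width; x++)
                    if (black[y, x])
                        votes[(int)Math.Round(x * cos + y * sin) + maxR]++;

            // Pixels on a line at this angle pile into few bins, so squaring rewards them.
            double score = 0;
            foreach (int v in votes) score += (double)v * v;
            if (score > bestScore) { bestScore = score; bestSkew = skew; }
        }
        return bestSkew;
    }
}
```

Bounding the sweep (for example -15 to +15 degrees) keeps the cost down, and a smaller step gives a finer angle at the price of more passes over the image.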
(I'm not an expert; I'm writing this post out of curiosity.)
IMHO, this problem can be solved cost-effectively with a brute-force trial-and-error approach, because there cannot be very many wrong orientations.
I think you can easily determine the bounding box of the text. This bounding box can have the wrong orientation in only two ways: rotated clockwise or rotated counterclockwise. So with at most two rotations of the image (rotations that make the bounding box upright) you can find the correct orientation.
That is, you could find the correct document orientation without further processing of the image to determine the text alignment. And determining the text alignment would take rather a lot of processing, I think.
UPDATE
I'm suggesting that we don't have to find the exact rotation angle. If the bounding box is upright, the text is either at the right angle or rotated by 180 degrees.
1) make the bounding box upright
2) run OCR and check the result; if it is OK, you're done
3) rotate by 180 degrees
4) run OCR again; this time it must be at the right angle
If we really have to find the exact rotation angle, I think it must start with finding the likely shapes of characters such as 'o', 'c', or 'm' (excluding italic fonts), or with finding the relative location of the period ('.'). This will require complicated operations, I think.
I'm using Emgu CV to find an isosceles triangle in an image. From the detected triangle I'm attempting to determine its orientation (front, left, right, and back side) and its rotation (e.g. -30 degrees).
I'm able to detect where the triangle is and what its three coordinates are, but I'm not sure how to go on to find the orientation and the angle of rotation. Would this be a function of Emgu CV, or just simple math? And how would I go about it?
Find the two sets of co-ordinates closest together (Pythagoras's theorem makes that simple).
That's your short side.
The point not used on that side is the front.
Left and right are just the lines clockwise and anticlockwise from the front.
The angle can be found using simple trigonometry between the first line you just found and a hypothetical line you are measuring the angle from.
You will need to look up the relevant math, but each of those steps is reasonably straightforward on its own once you break it down like that.
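The steps above can be sketched in plain C# without any Emgu CV calls. This is only an illustration under a few assumptions: the three corners are already known, the short side really is the base, and the heading is measured from the midpoint of the base to the front point, relative to the +x axis. The names TriangleHeading and Find are hypothetical.

```csharp
using System;

static class TriangleHeading
{
    // p holds the three corners. Returns the index of the front corner and the
    // heading in degrees, measured from the +x axis toward the +y axis.
    public static (int apex, double degrees) Find((double x, double y)[] p)
    {
        int apex = 0;
        double shortest = double.MaxValue;

        // The side opposite corner k joins the other two corners.
        for (int k = 0; k < 3; k++)
        {
            int i = (k + 1) % 3, j = (k + 2) % 3;
            double dx = p[i].x - p[j].x, dy = p[i].y - p[j].y;
            double d2 = dx * dx + dy * dy;          // squared length (Pythagoras)
            if (d2 < shortest) { shortest = d2; apex = k; }
        }

        // Heading: from the midpoint of the short side to the front corner.
        int a = (apex + 1) % 3, b = (apex + 2) % 3;
        double mx = (p[a].x + p[b].x) / 2, my = (p[a].y + p[b].y) / 2;
        double degrees = Math.Atan2(p[apex].y - my, p[apex].x - mx) * 180.0 / Math.PI;
        return (apex, degrees);
    }
}
```

Math.Atan2 handles all four quadrants, so the result covers the full -180 to +180 degree range; in image coordinates (y pointing down) a positive angle is clockwise on screen.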
I'm feeding in a Bitmap image to my C# program to be able to perform OCR to identify the characters in the image. I can do this fairly well if the image is not rotated. One of the program requirements, however, is that the program automatically determines if the image has been rotated, and that it automatically corrects these rotations.
I've tried implementing a simple method where lines are traced across the image and points which contact a character are recorded, and then performing a simple linear regression on the line points. This works to an extent, although it has not proven very accurate due to curvature of characters, etc.
I was wondering if there was a better method to solve this problem? Many thanks in advance! :)
I use the gmseDeskew algorithm to deskew images in my program. It works very well.
It's an interesting problem to be sure. I'd look for certain letters whose rotation is easier to tell. For example, a capital A or R or K should have both of its lower parts on roughly the same horizontal line. Another option is to take letters that cannot be identified, rotate them in various ways, and re-attempt to identify them. If a letter that could not be identified in the raw scan CAN be identified when you rotate it, that's a pretty big clue. Once you have identified the "correction" rotation that makes an unrecognizable character into a recognizable one, apply the same rotation value to the others.
If it recognizes lines of text, then try to blur the image so that lines are mostly solid and find direction of the lines (either with analysis of Fourier transform or by ridge detection).
If the text is formatted like a printed document (column(s) and lines of text) then you can take advantage of this.
An approach that I've often seen used for document text is to do projection profiles:
1) Scan the document at a specific orientation and sum up the number of "black" pixels on each scan line (creating a 1D array of counts, each index representing a Y coordinate: the profile).
2) Calculate the variance of the counts (the profile).
3) Repeat for multiple angles (this can be done in a binary-search fashion to reduce processing).
4) The angle that results in the greatest variance is the correct angle (the text lines create large peaks in the profile, and the gaps between the lines create low valleys).
Then after finding this angle you can adjust your image accordingly and do your awesome OCR.
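A rough C# sketch of the projection-profile search follows. It assumes the page is a bool[,] with true for black pixels, and instead of physically rotating the image it projects every black pixel onto a tilted row index, which gives the same profile up to rounding. The coarse-to-fine loop stands in for the "binary search" idea; the names and the shrink factor of 4 are my own assumptions.

```csharp
using System;

static class ProjectionProfile
{
    // Variance of the per-row black-pixel counts when rows are taken at angleDeg.
    public static double Variance(bool[,] black, double angleDeg)
    {
        int h = black.GetLength(0), w = black.GetLength(1);
        double a = angleDeg * Math.PI / 180.0, cos = Math.Cos(a), sin = Math.Sin(a);
        int offset = w + h;                         // keeps row indices non-negative
        var profile = new double[2 * offset + 1];

        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (black[y, x])
                    profile[(int)Math.Round(y * cos - x * sin) + offset]++;

        double mean = 0, meanSq = 0;
        foreach (double c in profile) { mean += c; meanSq += c * c; }
        mean /= profile.Length;
        meanSq /= profile.Length;
        return meanSq - mean * mean;
    }

    // Coarse-to-fine search: scan +/- range in steps of step, then narrow in.
    public static double FindSkew(bool[,] black, double range, double step, double minStep)
    {
        double best = 0;
        while (step >= minStep)
        {
            double centre = best, bestVar = -1;
            for (double a = centre - range; a <= centre + range + 1e-9; a += step)
            {
                double v = Variance(black, a);
                if (v > bestVar) { bestVar = v; best = a; }
            }
            range = step;    // next pass only looks around the current best
            step /= 4;
        }
        return best;
    }
}
```

For example, FindSkew(page, 16, 4, 0.25) tries -16 to +16 degrees in 4-degree steps, then refines twice. The coarse step should stay small enough that the variance peak is not skipped entirely.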
It might be easier to find the vertical-ish lines that are adjacent to the text (i.e., the left margin). For each scanline, record the first black pixel. Put all of those in a linear regression, and you should get a near vertical line. Measure its angle from true vertical and you should be able to unrotate the text. You could imagine doing the same thing for the top, bottom, and right sides, too, and taking an average.
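As a sketch of the left-margin idea in C#, assuming a bool[,] page with true for black pixels: take the first black pixel on each scanline and fit x as a linear function of y by least squares (x against y, because the margin is close to vertical). The class and method names are hypothetical, and in practice you would probably want to discard outliers such as indented lines first.

```csharp
using System;

static class MarginSkew
{
    // Returns the angle of the left margin from true vertical, in degrees.
    // Positive means the margin drifts to the right going down the page.
    public static double EstimateDegrees(bool[,] black)
    {
        int h = black.GetLength(0), w = black.GetLength(1);
        double n = 0, sumY = 0, sumX = 0, sumYY = 0, sumXY = 0;

        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (black[y, x])
                {
                    // First black pixel on this scanline: one regression sample.
                    n++; sumY += y; sumX += x;
                    sumYY += (double)y * y; sumXY += (double)x * y;
                    break;
                }

        double denom = n * sumYY - sumY * sumY;
        if (denom == 0) return 0;                   // fewer than two distinct rows
        double slope = (n * sumXY - sumX * sumY) / denom;   // dx/dy of the fitted line
        return Math.Atan(slope) * 180.0 / Math.PI;
    }
}
```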
We faced a similar problem before. We searched for an easy and quick solution and ended up using a commercial toolkit (LEADTOOLS). You can use it to automatically pre-process the image before running OCR on it. You can check this help topic to learn how to use the toolkit to process and scan images.
In the game I'm trying to make, I have some ships (not spaceships, but actual ships on water).
If I just rotate them directly, I get absurd results.
Do I need to make 8 pictures for each ship (given that there are 8 directions)?
Is there any way I can do it with just one image, or at least a few, instead of 8?
Essentially, mathematical rotation is an interpretation of the original image.
Sure, it works depending on the complexity of the image and the relationship between straight lines and things that are perpendicular, but some things just don't work.
If you're doing a top-down 2D game with ships (I'm going to assume sailing ships here), then rotating mathematically really isn't going to look good, as the sails themselves will move and angle depending on wind speed/direction and the angle of the ship.
Long story short? Mathematical rotation works well for an Asteroids-style triangle ship, but doesn't work well for proper graphics.
Hope this helps!
If you are talking 2D graphics and are getting "absurd results", I'm assuming you're not taking the origin into account. If you have a Texture2D and give it a rotation value, it will rotate about the default origin, which is (0,0). Try setting the origin in your SpriteBatch.Draw call to new Vector2(texture.Width / 2, texture.Height / 2) and see if that is a step in the right direction.
Another approach would be to have a sprite sheet with the 8 drawings that you mention and reference a different source rectangle of the Texture2D.
We want a C# solution to correct a scanned image that is rotated. To solve this problem we must detect the rotation angle first and then rotate the image. This was our first thought. But then we thought image warping would be more accurate, as I think it would make the scanned image match our template. Then we can process it, as we know all the coordinates of our template... I have searched for a free SDK or a free solution in C#. Help with this would be great, as it is the last task in our work. Really, thanks for all.
We used the PrimeOCR product to do this. It's not free, but we couldn't find a free program that was comparable.
So, the hard part is to detect the angle of the page.
If you have full control over the template, the simplest way to do this is probably to come up with an easily detectable symbol (e.g. a solid black circle) and stick 3 of them on the template. Then, detect them (in the case of a solid black circle, just look for big blocks of very dark pixels).
So, you'll then have 3 sets of coordinates. If you have a top circle, a left circle, and a right circle with all 3 circles at different distances from one another, detecting which circle is the top circle should be pretty easy.
Then just call a rotation function. This part is easy and has been done before (e.g. http://www.switchonthecode.com/tutorials/csharp-tutorial-image-editing-rotate ).
Edit:
I suggested a circle because it's easier to find the center, but a rectangle should work, too.
To be more explicit about how to actually locate the rectangles/circles, take the average brightness value of every a × a group of pixels. If that value is below b (for dark markers), then that a × a group of pixels is part of a rectangle. a and b are variables you'll want to come up with yourself.
Use flood fill (or, more precisely, connected-component labeling) to group the resulting pixels together. The end result should give you your rectangles.
I need to write a program that uses matrix multiplication to rotate an image (a simple square) about the center of the square by a given number of degrees. Any help on this would be greatly appreciated. I have almost no clue what I'm doing, because I have not taken so much as a glance at calculus.
Take a look at http://www.aforgenet.com/framework/. This is a complete image processing framework in C# that I'm using on a project. I just checked their help and they have a function that does what you want -
using System.Drawing;
using AForge.Imaging.Filters;
// create filter - rotate by 30 degrees, keeping the original image size
RotateBicubic filter = new RotateBicubic( 30, true );
// apply the filter to an existing Bitmap named image
Bitmap newImage = filter.Apply( image );
It is an LGPL library, so if licensing is a concern, linking against their binaries should generally be fine. There are also other libraries out there.
If you do decide to write it yourself, be careful about speed, as number crunching in C# is not great. But there are ways to work around it.
Here's a good code project article discussing just what you're wanting:
http://www.codeproject.com/KB/GDI-plus/matrix_transformation.aspx
Rotating a digital image in the plane boils down to a lot of 2×2 matrix multiplications. There's no calculus involved here! You don't need an entire image processing framework to rotate a square image, unless this is really sensitive in terms of image quality and speed.
Go and read the first half of Wikipedia's article on the rotation matrix and that should get you off to a good start.
In a nutshell, establish your origin (perhaps the center of the image, if that's where you want to rotate around), then compute, in pixel space, the coordinates of a pixel you'd like to rotate relative to that origin, and multiply by your rotation matrix (see the article). Once you've done the multiplication, you'll have the new coordinates of the pixel in pixel space. Write that pixel out to another image buffer and you'll be off and rotating. Repeat. Note that once you know your angle of rotation, you only need to compute your rotation matrix once!
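Here is a small C# sketch of that recipe, assuming the image is just an int[,] of pixel values. One deliberate difference from the description above: the loop walks over destination pixels and rotates backwards to find the source pixel (inverse mapping with nearest-neighbour lookup), because pushing source pixels forward tends to leave holes after rounding. The names are made up for illustration.

```csharp
using System;

static class MatrixRotation
{
    // Rotates (x, y) about (cx, cy) using the 2x2 matrix [cos -sin; sin cos].
    public static (double x, double y) RotatePoint(double x, double y, double cx, double cy, double degrees)
    {
        double a = degrees * Math.PI / 180.0, cos = Math.Cos(a), sin = Math.Sin(a);
        double dx = x - cx, dy = y - cy;            // move the origin to the centre
        return (cx + cos * dx - sin * dy, cy + sin * dx + cos * dy);
    }

    // Rotates src[y, x] about its centre; pixels that fall outside stay 0.
    // With y pointing down, a positive angle turns the image clockwise on screen.
    public static int[,] Rotate(int[,] src, double degrees)
    {
        int h = src.GetLength(0), w = src.GetLength(1);
        double cx = (w - 1) / 2.0, cy = (h - 1) / 2.0;
        var dst = new int[h, w];

        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
            {
                // Inverse mapping: where did this destination pixel come from?
                var (sx, sy) = RotatePoint(x, y, cx, cy, -degrees);
                int ix = (int)Math.Round(sx), iy = (int)Math.Round(sy);
                if (ix >= 0 && ix < w && iy >= 0 && iy < h)
                    dst[y, x] = src[iy, ix];
            }
        return dst;
    }
}
```

RotatePoint recomputes the sine and cosine on every call to keep the sketch short; hoisting them out of the pixel loop is the "compute your rotation matrix once" optimisation.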
Have fun,
Paul