Image straightening algorithm

I am looking for a way to auto-straighten my images, and I was wondering if anyone has come across any algorithms to do this. I realize that the ability to do this depends on the content of the image, but any known algorithms would be a start.
I am looking to eventually implement this in C# or PHP, however, I am mainly after the algorithm right now.
Is this possible with OpenCV? ImageMagick? Others?
Many thanks,
Brett

Here is my idea:
Edge detection (Sobel, Prewitt, Canny, ...)
Hough transform, restricted to near-horizontal lines (+/- 10 degrees)
Rotate the image so that the longest/strongest line becomes horizontal
This is obviously not going to work for every type of image. It is just meant to fuel the discussion.
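The Hough step of this idea can be sketched in a few lines of plain Python. This is only an illustration under some assumptions: the edge detector has already produced a list of (x, y) edge points, the dominant line is within +/- 10 degrees of horizontal, and the function name and parameters are made up for the example. A real implementation would more likely use a library such as OpenCV.

```python
import math
from collections import Counter

def estimate_skew_degrees(edge_points, max_angle=10.0, step=0.5):
    """Angle (degrees) of the strongest near-horizontal line through edge_points.

    edge_points is an iterable of (x, y) pairs, e.g. from an edge detector.
    """
    points = list(edge_points)
    best_angle, best_votes = 0.0, 0
    steps = int(round(2 * max_angle / step))
    for i in range(steps + 1):
        angle = -max_angle + i * step
        t = math.radians(angle)
        sin_t, cos_t = math.sin(t), math.cos(t)
        # Points on one line at this angle share the same value of
        # y*cos(t) - x*sin(t), so they all vote for the same bin.
        votes = Counter(round(y * cos_t - x * sin_t) for x, y in points)
        top = max(votes.values(), default=0)
        if top > best_votes:
            best_angle, best_votes = angle, top
    return best_angle

# Synthetic test data: a single line tilted by 5 degrees.
slope = math.tan(math.radians(5.0))
line = [(x, 20 + x * slope) for x in range(200)]
print(estimate_skew_degrees(line))  # prints the estimated angle in degrees
```

Rotating the image by the negative of the returned angle should then bring that line back to horizontal.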

Most OCR programs straighten the scanned image prior to running recognition. You can probably find good code in one of the many open-source OCR programs, such as Tesseract.

Of course, this depends on what type of images you want to straighten, but there seem to be some resources available for automatic straightening of text scans.
One post I found mentioned 3 programs that could do auto-straightening:
TechSoft's PixEdit 7.0.11
Mystik Media's AutoImager 3.03
Spicer's Imagenation 7.50
If manual straightening is acceptable, there are many tutorials out there for how to straighten them manually using Photoshop; just google "image straightening"

ImageMagick has the -deskew option. This will simply rotate the image to be straight.
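As a rough sketch, this is how the option could be driven from a script. The -deskew option takes a threshold argument, and 40% is the value the ImageMagick documentation suggests for most images. The binary name "magick" (ImageMagick 7; older installs use "convert"), the file names, and the helper functions are assumptions for the example.

```python
import subprocess

def deskew_command(src, dst, threshold="40%", binary="magick"):
    """Build the ImageMagick command line that deskews src into dst."""
    return [binary, src, "-deskew", threshold, dst]

def deskew(src, dst, **options):
    """Run ImageMagick; raises CalledProcessError if the tool reports failure."""
    subprocess.run(deskew_command(src, dst, **options), check=True)

# Show the command that would be run for a hypothetical scan.
print(" ".join(deskew_command("scan.png", "straight.png")))
```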
Most commercial OCR engines like ABBYY FineReader and Nuance OmniPage do this automatically.
The Leptonica research library has a command line tool called skewtest which will rotate the image.
I have not found a library which can take an image which has been distorted in any other way (like pin cushion or if it has been moved during a scanning operation, or removing the warp at the edge of a book). I am looking for a library or tool that can do this, but cannot find one.
Patrick.

Related

How to handle large Images?

I want to create my own Google Map like this:
My problem is that I can't load and edit my large images.
My Images:
PNG / JPG
700 MiB
300000px x 300000px
My attempts:
ImageMagick
.NET C# / BitmapImages ...
C++ / OpenCV
general image classes in Java and Python
Which language or library can I use to edit these large images?

I help maintain libvips, an image processing library designed to work with very large images. It's free and works on Linux, Mac and Windows. You can use it from the command-line, C#, C/C++, Python, Ruby and others.
You can make your google maps tiles from the command-line like this:
vips dzsave hugefile.tif myoutputdir --layout google
Or from Python (for example) like this:
import pyvips
image = pyvips.Image.new_from_file("somehugefile.tif", access="sequential")
image.dzsave("filename/of/pyramid", layout="google")
And it'll scan your huge tiff image and generate all the tiles. It's fast, it needs little memory and it'll work on images of any size. I regularly make 200,000 x 200,000 deepzoom images from microscope slides using my small laptop.
There's a chapter in the libvips docs introducing dzsave and explaining how to use it.

This is not a full answer, but I need a little more space than a comment can give.
Take a look at the large image support section on the ImageMagick site, or at the discussion board.
This answer mentions the VIPS package which might be helpful.
You might also consider posting in photography stackexchange, or even blender stackexchange - for example I saw this answer which mentions writing individual image tiles - also here, although that question is about rendering. Blender is not specifically for image processing and editing, but it's pretty amazing and flexible and has a very active and supportive community. You can use python within Blender as well.
You could also think of asking in gis stackexchange.
When you post in the other stackexchanges, take a look around first and make sure you write your question so that it does not look too off-topic for that site.
Good luck - it seems tiling is everywhere!

Search a specific folder for an image that is found within a picture

Been trying Google for a while and haven't even found a start point for this problem.
I am looking for a starting point for developing a check in system. I have a large database of images of inventory in house. Very clean digital HD images with no background or anything. I am looking to do a local image search.
I will have a small temp folder with only the images of the products in the current order. Then, to verify that the item in the employee's hand is the same one, I want to scan the item in real time, compare it to the images in the folder, and work from there.
I can't seem to find any documentation on any classes that can help me with this functionality.
For example say I have an image on my PC, and I print out that image to paper in a VERY high quality. I want to then be able to match the print out to the original file.
Is there anything built into .Net for this?

I have done something similar in the past. But in my case it was a facial recognition system. It worked pretty well but you have to remember that it might not work in 100% of the cases. Visual recognition is a very complicated subject and we have yet to develop a way to have a flexible 100% accuracy system.
As to how I did it, I developed an NN (neural network) algorithm. This algorithm had to be trained against a specific set of pictures.
Another popular approach is to use an SVM (support vector machine) algorithm instead of a neural network. Then again, you will most likely not get a result that is 100% accurate.
Keep in mind that there are many different algorithms that can be used to do visual recognition. Two other popular algorithms for facial recognition are Eigenfaces and Fisherfaces.
Sadly, I have not worked on this kind of project in .NET. But you might want to check for a third-party NN or SVM library for .NET.
Here is a link to a SO thread about NN
Open-source .NET neural network library?
Here is a link to a SO thread about SVM
Support Vector Machine library for C#

What should I use for monospaced digits recognition?

I have to recognize digits within an image from a video stream, and there are several things that should make recognition easier:
1) It is a fixed 6x8 font; all symbols are of equal width
2) I know the exact positions of the digits; they are always rectangular and are not rotated/skewed/scaled, but there may be some distortion because of air transmission glitches.
3) It is only digits and .
4) digit background is semi black (50% opaque)
I've tried Tesseract v2 and v3, but the .NET wrappers aren't perfect and the recognition error rate was very high, even after training with a custom font. As far as I understand, that is because of the low resolution.
I wrote a very simple algorithm myself: convert the image to black and white and count the matching pixels between the original font image and the image from the stream. It performs better than Tesseract, but I think a more sophisticated algorithm would do better.
I've tried to train AForge using ActivationNetwork with BackPropagationLearning, and it fails to converge (following the first part of this article, since I don't need scaling or several fonts: http://www.codeproject.com/Articles/11285/Neural-Network-OCR; as I understand it, the code in the article is for an older version of AForge). The bad thing is that this project is not supported anymore: the forum is closed and, as I understand it, so are the Google groups.
I know there's an OpenCV port to .NET. As far as I can see, it has different network approaches than AForge, so the question is which approach would fit best.
So, is there any .NET framework to help me with this, and if it supports more than one neural network implementation, which implementation would fit best?

For fixed size fonts at a fixed magnification, you can probably get away with a less-sophisticated OCR approach based on template matching. See here for an example of how to do template matching using OpenCV (not .NET, but hopefully enough to get you started.) The basic idea is that you create a template for each digit, then try matching all templates at your target location, choosing the one with the highest match score. Because you know where the digits are located, you can search over a very small area for each digit. For more information on the theory behind template-matching, see this wiki article on Cross-correlation.
This is actually the basis for simplified OCR applications (usually for recognizing special OCR fonts, like the SEMI standard fonts used for printing serial numbers on silicon wafers.) The production-grade algorithms can also support tolerance for scaling, rotation and translation, but the underlying techniques are pretty much the same.
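To make the template-matching idea concrete, here is a minimal sketch in plain Python. It scores each template by counting agreeing pixels (essentially the asker's pixel-counting approach) instead of using a normalized cross-correlation, and the tiny 3x5 bitmaps are invented for the example; a real version would use one 6x8 template per glyph of the actual font.

```python
# Hypothetical 3x5 binary templates; '#' is an "on" pixel, '.' is "off".
TEMPLATES = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "7": ["###", "..#", ".#.", ".#.", ".#."],
}

def match_score(glyph, template):
    """Number of pixels that agree between two equal-sized binary bitmaps."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )

def classify(glyph, templates=TEMPLATES):
    """Label of the template that agrees with the glyph on the most pixels."""
    return max(templates, key=lambda label: match_score(glyph, templates[label]))

# A "7" with one pixel knocked out by a transmission glitch.
noisy_seven = ["###", "..#", ".#.", "...", ".#."]
print(classify(noisy_seven))
```

Because the digit positions are known, each cell can be cropped and classified independently, and the winning score can double as a confidence value for rejecting badly corrupted frames.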

Try looking at this project and this project too. Both projects explain how OCR works and show you how to implement it in C# and .NET.

If you are not in an absolute hurry, I would advise you to first look for a method that solves the problem. I've had good experiences with WEKA. Using WEKA you can test a bunch of algorithms pretty fast.
Once you have found the algorithm that solves your problem, you can either port it to .NET, build a wrapper, look for an existing implementation or (if it's an easy algorithm) rebuild it in .NET.

Getting Matrix (CGAffineTransform) Info from iPhone Movies in C#/C++

When the iPhone records a video, it writes the data from the camera directly to disk. What tells the player how to reorient the video is the transform matrix. It's a mathematical structure that is used to change the position of the pixels in X,Y space.
On the iPhone and on the Macintosh I can ask the video what its transform is, and I get back a CGAffineTransform with a, b, c, d, tx and ty. Apple describes the Transform Matrix here
With this information I can determine what the Video layout is supposed to be and determine if it is expecting to be rotated before display.
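For illustration, here is how the rotation could be read off those values. For a pure rotation by an angle θ, a CGAffineTransform has a = cos θ, b = sin θ, c = -sin θ and d = cos θ, so atan2(b, a) recovers the angle. This is only a sketch: the function name is made up, and it assumes the transform is a rotation (possibly with uniform scale) and nothing else.

```python
import math

def rotation_degrees(a, b):
    """Rotation angle in degrees encoded by the a and b entries of the transform.

    Assumes a rotation, optionally with uniform scale; shear is not handled.
    """
    return math.degrees(math.atan2(b, a))

print(rotation_degrees(1, 0))  # identity transform: no rotation
print(rotation_degrees(0, 1))  # a = 0, b = 1: a quarter turn
```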
I can get this information with ease in the OS X and iOS environments. I am trying to find a way to get the same matrix information on Windows, preferably in C#; however, if I must use C++ then so be it. ActiveX solutions are entirely undesirable, and I am hoping that the QuickTime SDK for Windows is of some use. Otherwise, what the heck did Apple write it for?
If anyone knows how to obtain the Transform Matrix from a video or any place to start please, point me in the right direction.

It appears that the CGAffineTransform is something that will need to be pulled right out of the file itself. I used the QuickTime File Format Specification PDF to gain an understanding of the file and where to find the matrix.
Here is a link to the page with the Matrix data on it
Quicktime File Format Specification Matrix Info
As you can see from this clip, the matrix is in the movie header atom, identified by the 'mvhd' type.
The matrix is 36 bytes long and starts 36 bytes after the 'mvhd' atom type name.
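A simple header grabber along these lines might look like the sketch below. It is based on the layout in the QuickTime File Format Specification: nine big-endian 32-bit values stored as a, b, u, c, d, v, tx, ty, w, where u, v and w are 2.30 fixed-point and the rest are 16.16 fixed-point. The function name is made up, the 36-byte offset assumes a version 0 'mvhd' atom, and the naive byte search for "mvhd" is a shortcut that a robust parser would replace by walking the atom tree. The track header ('tkhd') carries a matrix in the same format and may be worth checking as well.

```python
import struct

def read_mvhd_matrix(data):
    """Extract the 3x3 matrix from the first 'mvhd' atom found in data (bytes)."""
    index = data.find(b"mvhd")
    if index < 0:
        raise ValueError("no 'mvhd' atom found")
    # Skip the 4-byte type name plus 36 bytes of version 0 header fields.
    start = index + 4 + 36
    a, b, u, c, d, v, tx, ty, w = struct.unpack(">9i", data[start:start + 36])
    fixed_16_16 = 65536.0
    fixed_2_30 = float(1 << 30)
    return {
        "a": a / fixed_16_16, "b": b / fixed_16_16, "u": u / fixed_2_30,
        "c": c / fixed_16_16, "d": d / fixed_16_16, "v": v / fixed_2_30,
        "tx": tx / fixed_16_16, "ty": ty / fixed_16_16, "w": w / fixed_2_30,
    }

# Usage with a hypothetical file name:
# with open("clip.mov", "rb") as movie:
#     print(read_mvhd_matrix(movie.read()))
```

Reading the whole file into memory is fine for a quick check; for large movies it would be better to seek to the 'moov' atom and read only the header.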
Given the file format specifications of the frames and tracks the Matrix can change throughout the playback of the video. But it is my experience that this method is not exercised on the videos that are output from the iphone.
I imagine that the matrix will need to be grabbed from each frame sometime in the near future, and perhaps this is something that FFmpeg or other video format applications can work into their frame grabbers and video translators. But since I currently do not have a version of FFmpeg that uses this matrix information, I will be creating a simple movie header grabber that pulls out the matrix and lets me adjust my ffmpeg command-line parameters accordingly, so that I can transform my video appropriately.
If I come up with a better idea, I will try to update this post with it.
A side note on the journey to this answer
For all of those who downvoted this question because you did not know what I was talking about: a request to clarify would have been quite sufficient. Coming in and downvoting because you don't understand or don't know the answer is neither constructive nor fair. This is a very specific question, and this answer will assist more than just myself.
I believe in helping to spread all kinds of knowledge, even to those of you who think it's helpful to downvote because you just don't have any clue how to help. I hope this gives you a better understanding of the issues that people are looking to solve. Just because you don't know what the problem is does not mean you should turn your nose up at it, and it certainly does not mean that you should discount those looking for answers that you cannot provide.
I am glad that I have an answer to my question and will definitely be open to any further criticism to the answer that I have given. Perhaps this answer will spark more questions about this issue and I will be able to learn and contribute to the future discussions about it
Thank you StackOverflow for restoring my question so I could answer it appropriately.

Howto: Improve the PDF- quality before OCR using C#

I'm creating a service that monitors a folder for scanned files. Once a file is there, the service picks it up and converts it to a readable PDF. In this process the service also searches for a barcode. After this, the text is extracted and the file, with its text, is stored in the database of our software. The location is based on the barcode.
Now, for the OCR we are using the SDK of Atalasoft (http://www.atalasoft.com/).
Also the Barcode recognizer is included in this SDK.
But the converted text still has some mistakes. (I ran some tests with other OCR-programs, but Atalasoft came out nice.)
I'm looking for some software (SDK-kit) which allows me to improve the quality of the PDF for OCR purposes.
I tested Kofax VRS Elite (http://www.kofax.com/vrs-virtualrescan/). I'm looking for something similar, but that can be implemented in the service using some kind of SDK-kit.
Anyone who did this before, or had similar problems?
thx in advance!

You may try and follow a different path altogether:
See if you can configure the scanner(s) to scan directly to PDF and do the OCR on the fly. The Lexmark scanners can do this. This creates PDFs with selectable and searchable text, which in turn can be extracted with a PDF reading library.
Alternatively you may want to have a look at http://www.abbyy.com/ and see if you get better results.
If these are not good options, you may want to break down your problem in a systematic way:
1. Is the image quality of the scanned images the problem? If so, then this will have to be fixed first. Your OCR solution may be affected by resolution, contrast, and colour.
2. Is it the OCR software? Take a highly legible document and see if the OCR software makes mistakes. If so, then you know you have to find better OCR software.
3. If your document quality is decent and your OCR software has a high success rate in deciphering a legible document, then you may want to look at the exceptions that do not work, and tackle these on a case by case basis.
If smears and background images on documents are the cause of the problem, you may want to look into ways of avoiding them, or of cleaning them up with image processing software that exposes an API.
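As one example of such cleanup, a common step before OCR is to binarize the page so that faint background marks drop out. The sketch below uses Otsu's method to pick the threshold. That technique is chosen here purely for illustration and is not something the products mentioned above are known to use. It works on a flat list of 8-bit grayscale values, and the function names are made up.

```python
def otsu_threshold(pixels):
    """Threshold (0-255) that best separates dark and light 8-bit pixels (Otsu)."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    sum_dark, dark_count = 0, 0
    best_threshold, best_variance = 0, -1.0
    for level in range(256):
        dark_count += histogram[level]
        if dark_count == 0:
            continue
        light_count = total - dark_count
        if light_count == 0:
            break
        sum_dark += level * histogram[level]
        mean_dark = sum_dark / dark_count
        mean_light = (sum_all - sum_dark) / light_count
        # Between-class variance; the largest value marks the best split.
        variance = dark_count * light_count * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_threshold, best_variance = level, variance
    return best_threshold

def binarize(pixels):
    """Map every pixel to 0 (ink) or 255 (paper) using the Otsu threshold."""
    threshold = otsu_threshold(pixels)
    return [255 if value > threshold else 0 for value in pixels]

# Dark text (around 30) on a light, slightly uneven background (around 200).
sample = [30, 28, 35, 200, 210, 190, 205, 32]
print(binarize(sample))
```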
