I have to recognize digits within an image from a video stream, and there are several things that should make recognition easier:
1) it is a fixed 6x8 font; all symbols are equal width
2) I know the exact positions of the digits; they are always rectangular and are not rotated/skewed/scaled, but there may be some distortion caused by glitches in the over-the-air transmission
3) the characters are only digits and the '.' (decimal point)
4) the digit background is semi-black (50% opaque)
I've tried Tesseract v2 and v3, but the .NET wrappers aren't perfect and the recognition error was very large, even after I trained it with a custom font; as far as I understand, that is because of the small resolution.
I've made a very simple algorithm myself by converting the image to black and white and counting matching pixels between the original font image and the image from the stream. It performs better than Tesseract, but I think a more sophisticated algorithm would do better.
I've tried to train AForge using an ActivationNetwork with BackPropagationLearning, and it fails to converge (I followed the first part of this article, since I don't need scaling or multiple fonts: http://www.codeproject.com/Articles/11285/Neural-Network-OCR; as I understand it, the code in the article is for an older version of AForge). The bad thing is that this project is not supported anymore: the forum is closed, and as I understand, so are the Google groups.
I know there's an OpenCV port to .NET; as far as I can see, it has different network approaches than AForge, so the question is which approach would fit best.
So is there any .NET framework to help me with this, and if it supports more than one neural network implementation, which implementation would fit best?
For fixed-size fonts at a fixed magnification, you can probably get away with a less sophisticated OCR approach based on template matching. See here for an example of how to do template matching using OpenCV (not .NET, but hopefully enough to get you started). The basic idea is that you create a template for each digit, then try matching all templates at your target location, choosing the one with the highest match score. Because you know where the digits are located, you can search over a very small area for each digit. For more information on the theory behind template matching, see the wiki article on cross-correlation.
This is actually the basis for simplified OCR applications (usually for recognizing special OCR fonts, like the SEMI standard fonts used for printing serial numbers on silicon wafers). The production-grade algorithms can also tolerate scaling, rotation, and translation, but the underlying techniques are pretty much the same.
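The pixel-counting scheme the asker already tried is essentially this technique. A minimal, language-agnostic sketch (in Python rather than .NET; the function names and the tiny 3x3 glyph bitmaps are invented for the example, where the real templates would be the 6x8 font cells):

```python
def match_score(region, template):
    """Fraction of pixels that agree between a binarized region and a template."""
    total = len(template) * len(template[0])
    hits = sum(1 for r_row, t_row in zip(region, template)
                 for r, t in zip(r_row, t_row) if r == t)
    return hits / total

def recognize_digit(region, templates):
    """Match every digit template at a known location; keep the best score."""
    return max(templates, key=lambda label: match_score(region, templates[label]))
```

Because the digit positions are known, each template is scored only once per cell, so even this brute-force version is fast; a production version would typically use normalized cross-correlation instead of raw agreement counts.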
Try looking at this project and this project too. Both projects explain how OCR works and show you how to implement it in C# and .NET.
If you are not in an absolute hurry, I would advise you to first look for a method that solves the problem. I've had good experiences with WEKA; using WEKA you can test a bunch of algorithms pretty quickly.
As soon as you found the algorithm that solves your problem, you can either port it to .NET, build a wrapper, look for an implementation or (if it's an easy algo) rebuild it in .NET.
I am implementing SURF to detect digits on a seven-segment display using a template, but it is not working well. Is there an approach that may be slower but more effective? I am using the Emgu wrapper for OpenCV.
I would suggest you don't use SURF and instead look into using Tesseract for character recognition.
SURF is really good for recognising patterns such as logos and images, but for characters Tesseract will not only produce better results, it's easier to implement!
You can create your own custom fonts to look for if the digits you are trying to read are non-standard.
https://www.youtube.com/watch?v=RqvvXJXuRYY&list=UUxAnMtjN08ryThpgYTBmILg
Try following this tutorial; it's really helpful for getting started with OCR.
It's in VB but it won't be hard to write in C# once you have got the logic down.
Hope this helps!
Delivering SCADA solutions, we often receive our end users' specifications as Structured Control Diagrams (Visio-like flow diagrams, seen below), usually submitted in PDF format or as images.
In order to access these in C#, I was hoping to use one of the OpenCV libraries.
I was looking at template recognition, but it seems like the wrong fit to start feeding a machine learning algorithm just to teach it to recognize the pre-known, specific shapes of the boxes and arrows.
The libraries I've looked at have some poly-edge functions. However, as can be seen from the example below, there is the danger that the system will treat the whole thing as one large polygon when there is no spacing between elements.
The annotations may be at any 90-degree rotation, and I would like to identify them, as well as the contents of the rectangles, using OCR.
I do not have any experience in this, which should be apparent by now, so I hope somebody can point me in the direction of the appropriate rabbit hole. If there are multiple approaches, please pick the least math-heavy one.
Update:
This is an example of the type of image I'm talking about.
The problems to address are:
Identification of the red rectangles with texts in cells (OCR).
The identification of arrows, including direction and endpoint annotations. Line type, if possible.
Template matching of the components.
Fallback to some polyline entity or something if template matching fails.
I'm sure you realize this is an active field of research. The algorithms and methods described in this post are fundamental; there may be better or more specific solutions, either completely heuristic or based on these fundamental methods.
I'll try to describe some methods which I have used before and which gave good results in a similar situation (we worked on simple CAD drawings to extract the logical graph of an electrical grid), and I hope they will be useful.
Identification of the red rectangles with texts in cells (OCR).
This one is trivial given your documents' high quality: you can easily adapt any of the current free OCR engines (e.g. Tesseract) for your purpose. There should be no problem with 90, 180, ... degree rotations; engines like Tesseract can detect them (you should configure the engine, and in some cases you should extract the detected boundaries and pass them individually to the OCR engine). You may just need some training and fine tuning to achieve maximum accuracy.
Template matching of the components.
Most template-matching algorithms are sensitive to scale, and scale-invariant ones are very complex, so I don't think you will get very accurate results from simple template-matching algorithms if your documents vary in scale and size.
Also, your shapes' features are too similar and sparse to get good, unique features from algorithms such as SIFT and SURF.
I suggest you use contours. Your shapes are simple, and your components are made by combining these simple shapes; using contours, you can find the simple shapes (e.g. rectangles and triangles) and then check the contours against previously gathered ones based on the component shapes. For example, if one of your components is created by combining four rectangles, you can store its relative contours together and check them later against your documents in the detection phase.
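To illustrate the idea of classifying simple shapes by cheap geometric properties, here is a crude stand-in for full contour analysis: a rectangle fills nearly all of its bounding box, a triangle about half. The fill-ratio thresholds below are invented for the sketch, not tuned values.

```python
def bounding_box(points):
    """Axis-aligned bounding box of a set of (x, y) pixel coordinates."""
    xs = [x for x, y in points]
    ys = [y for x, y in points]
    return min(xs), min(ys), max(xs), max(ys)

def classify_blob(points):
    """Classify a filled blob by how much of its bounding box it covers."""
    x0, y0, x1, y1 = bounding_box(points)
    box_area = (x1 - x0 + 1) * (y1 - y0 + 1)
    fill = len(points) / box_area
    if fill > 0.9:
        return "rectangle"
    if 0.4 < fill < 0.65:
        return "triangle"
    return "unknown"
```

A real implementation would approximate the contour polygon and count vertices (as the linked tutorials do), which also handles rotated shapes; this sketch only shows the matching-by-geometric-signature idea.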
There are lots of articles about contour analysis on the net. I suggest you have a look at these; they will give you a clue on how you can use contours to detect simple and complex shapes:
http://www.emgu.com/wiki/index.php/Shape_%28Triangle,_Rectangle,_Circle,_Line%29_Detection_in_CSharp
http://www.codeproject.com/Articles/196168/Contour-Analysis-for-Image-Recognition-in-C
http://opencv-code.com/tutorials/detecting-simple-shapes-in-an-image/
http://opencv-python-tutroals.readthedocs.org/en/latest/py_tutorials/py_imgproc/py_contours/py_contours_begin/py_contours_begin.html
By the way, porting code to C# using EmguCV is trivial, so don't worry about it.
The identification of arrows, including direction and endpoint annotations. Line type, if possible.
There are several methods for finding line segments (e.g. the Hough transform). The main problem in this part is the other components, as they are normally detected as lines too; if we find the components first and remove them from the document, detecting lines becomes a lot easier, with far fewer false detections.
Approach
1- Layer the document based on its different colors, and execute the following phases on every desired layer.
2- Detect and extract text using OCR, then remove the text regions and recreate the document without text.
3- Detect components, based on contour analysis and the gathered component database, then remove the detected components (both known and unknown types, as unknown shapes would increase false detections in the next phases) and recreate the document without components. At this point, given good detection, we should have only lines left.
4- Detect lines.
5- At this point you can create a logical graph from the extracted components, lines, and tags based on their detected positions.
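Phase 1 (color layering) might be sketched like this. The color thresholds are arbitrary placeholders, and `pixels` is assumed to map coordinates to RGB triples; a real implementation would operate on an image buffer instead.

```python
def split_layers(pixels, tolerance=40):
    """Group pixels into per-color layers (phase 1 above).
    `pixels` maps (x, y) -> (r, g, b); thresholds are illustrative guesses."""
    layers = {"red": set(), "black": set(), "other": set()}
    for xy, (r, g, b) in pixels.items():
        if r > 128 and g < tolerance and b < tolerance:
            layers["red"].add(xy)       # e.g. the red annotation rectangles
        elif r < tolerance and g < tolerance and b < tolerance:
            layers["black"].add(xy)     # e.g. lines and printed text
        else:
            layers["other"].add(xy)
    return layers
```

Each subsequent phase (OCR, component detection, line detection) then runs on one layer's pixel set at a time, which is what makes removing already-detected elements between phases cheap.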
Hope this helps.
I cannot give you solutions to all your four questions, but the first question Identification of the red rectangles with texts in cells (OCR) does not sound very difficult. Here is my solution to this question:
Step 1: separate the color image into 3 layers: Red, Blue, and Green, and only use the red layer for the following operations.
Step 2: binarization of the red layer.
Step 3: connected component analysis of the binarization result, keeping statistics for each connected component (e.g. blob width and height).
Step 4: discard large blobs and only keep blobs corresponding to text. Also use layout information to discard false text blobs (for example, text is always inside a large blob, text blobs follow a horizontal writing style, and so on).
Step 5: perform OCR on the textual components. When performing OCR, each blob will give you a confidence level, which can be used to validate whether it is a textual component or not.
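Steps 3 and 4 can be sketched with a simple flood-fill labeling pass (in Python; the size threshold is an arbitrary placeholder, and real code would also track bounding boxes for the layout checks mentioned above):

```python
from collections import deque

def connected_components(foreground):
    """Label 4-connected components in a set of foreground pixel coordinates
    (step 3 above); returns one set of pixels per blob."""
    remaining = set(foreground)
    blobs = []
    while remaining:
        seed = remaining.pop()
        queue, blob = deque([seed]), {seed}
        while queue:
            x, y = queue.popleft()
            for nbr in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nbr in remaining:
                    remaining.discard(nbr)
                    blob.add(nbr)
                    queue.append(nbr)
        blobs.append(blob)
    return blobs

def text_sized(blobs, max_pixels=200):
    """Step 4: keep only blobs small enough to be characters (threshold is a guess)."""
    return [b for b in blobs if len(b) <= max_pixels]
```

The surviving blobs are then passed individually to the OCR engine in step 5.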
Been trying Google for a while and haven't even found a start point for this problem.
I am looking for a starting point for developing a check in system. I have a large database of images of inventory in house. Very clean digital HD images with no background or anything. I am looking to do a local image search.
I will have a small temp folder with only the images of the products in the current order. Then, to verify that the item in the employee's hand is the same, I want to scan the item in real time and compare it to the images in the folder, and work from there.
I can't seem to find any documentation on any classes that can help me with this functionality.
For example say I have an image on my PC, and I print out that image to paper in a VERY high quality. I want to then be able to match the print out to the original file.
Is there anything built into .Net for this?
I have done something similar in the past, but in my case it was a facial recognition system. It worked pretty well, but you have to remember that it might not work in 100% of cases. Visual recognition is a very complicated subject, and we have yet to develop a flexible system with 100% accuracy.
As to how I did it, I developed an NN (neural network) algorithm. This algorithm had to be trained against a specific set of pictures.
Another popular approach is to use an SVM (support vector machine) algorithm instead of a neural network. Then again, you will most likely not get a result that is 100% accurate.
Keep in mind that there are many different algorithms that can be used to do visual recognition. Two other popular algorithms for facial recognition are Eigenfaces and Fisherfaces.
Sadly, I have not worked on that kind of project in .NET, but you might want to check for a third-party NN or SVM library for .NET.
Here is a link to an SO thread about NNs:
Open-source .NET neural network library?
Here is a link to an SO thread about SVMs:
Support Vector Machine library for C#
I'm using the C# tessnet2 wrapper for the Tesseract OCR engine to capture characters from image files. I've been searching everywhere for whether tessnet2 has any built-in functions to overwrite certain characters and save them back into the same image file it is reading, but I have not found anything about that. So what I'm thinking of doing is creating a new image file based on what I'm receiving from tessnet2, but I need to create the new image in exactly the same way while changing just a few things in it. I'm not sure if I'm using the correct methodology, or if there are other C# assemblies out there that let you read characters from an image file and at the same time manipulate the image as you need.
Good luck, but Tesseract has no way of replacing characters in the proper font. Raster graphics don't generally store glyph information. Even if they did, you would potentially be in violation of the licenses and/or copyrights surrounding the fonts you'd be writing in. I'm not an expert in OCR, but I will confidently say that this is not something readily available out there in the wild.
To expand on Brian's answer:
You will need to do this yourself. I have not worked with Tesseract, but I have used the Nuance OCR engine. It will return font information as well as coordinates for each character it has recognized (note that you will most likely have to compute the actual image coordinates, as the OCR engine will have deskewed the image before performing recognition). Once you have the coordinates and the deskew, so that you can compute the actual coordinates, you can use any image manipulation library (Leadtools, Accusoft, etc.) or straight GDI+ functions to clear the character, then use the font and size info to create a new character and merge it into the image. This is not trivial, but certainly doable.
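The clear-then-merge step can be illustrated independently of any particular imaging library. In this sketch (names and conventions invented for the example), the image is just a mutable 2D array of gray values and the replacement glyph a small bitmap whose zero pixels mean "background":

```python
def replace_glyph(image, x0, y0, glyph, background=255):
    """Clear the rectangle at (x0, y0) sized like `glyph`, then stamp the
    replacement glyph's non-zero pixels into it."""
    for dy, row in enumerate(glyph):
        for dx, value in enumerate(row):
            image[y0 + dy][x0 + dx] = background if value == 0 else value
    return image
```

In GDI+ the same two operations would be a filled background rectangle followed by `DrawString` with the font family and point size reported by the engine.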
Edit:
It was late when I wrote the initial answer, so I wanted to clarify what is meant by font information. The OCR engine will give you the point size, whether the text is bold or italicized, and the font family (serif, etc.). I do not know of an engine that will tell you the exact font the document is in. If you have a sample of the documents you will process, you can make a good guess based on the info the OCR engine gives you.
I am looking for a way to auto-straighten my images, and I was wondering if anyone has come across any algorithms to do this. I realize that the ability to do this depends on the content of the image, but any known algorithms would be a start.
I am looking to eventually implement this in C# or PHP, however, I am mainly after the algorithm right now.
Is this possible with OpenCV? ImageMagick? Others?
Many thanks,
Brett
Here is my idea:
edge detection (Sobel, Prewitt, Canny, ...)
Hough transformation (horizontal lines +/- 10 degrees)
straighten the image according to the longest/strongest line
This is obviously not going to work on every type of image; it is just meant to fuel the discussion.
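A minimal sketch of the Hough-voting step above (pure Python, angles in degrees; the 0.5-degree resolution and the +/- 10 degree window are illustrative choices). Each edge point votes for every candidate angle; points lying on a common line of slope tan(theta) share the same rho value at that theta, so the accumulator peaks at the skew angle, and rotating by its negative straightens the image.

```python
import math
from collections import Counter

def dominant_angle(edge_points, angle_range=10.0, step=0.5):
    """Return the angle (degrees) of the strongest near-horizontal line
    through the given edge points, via a tiny Hough accumulator."""
    acc = Counter()
    n_steps = int(2 * angle_range / step) + 1
    angles = [-angle_range + i * step for i in range(n_steps)]
    for x, y in edge_points:
        for theta in angles:
            rad = math.radians(theta)
            # points on a line of slope tan(theta) share this rho value
            rho = round(x * math.sin(rad) - y * math.cos(rad))
            acc[(theta, rho)] += 1
    (theta, _rho), _votes = acc.most_common(1)[0]
    return theta
```

A production implementation would run this on Canny or Sobel edge output and use a weighted combination of the strongest peaks rather than a single winner.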
Most OCR programs straighten the scanned image prior to running recognition. You can probably find good code in the many open-sourced OCR programs, such as Tesseract.
Of course this depends on what type of images you want to straighten, but there seem to be some resources available for the automatic straightening of text scans.
One post I found mentioned 3 programs that could do auto-straightening:
TechSoft's PixEdit 7.0.11
Mystik Media's AutoImager 3.03
Spicer's Imagenation 7.50
If manual straightening is acceptable, there are many tutorials out there on how to straighten images manually using Photoshop; just google "image straightening".
ImageMagick has the -deskew option. This will simply rotate the image to be straight.
Most commercial OCR engines like ABBYY FineReader and Nuance OmniPage do this automatically.
The Leptonica research library has a command line tool called skewtest which will rotate the image.
I have not found a library which can correct an image that has been distorted in any other way (like pincushion distortion, movement during a scanning operation, or the page warp at the edge of a book). I am looking for a library or tool that can do this, but cannot find one.
Patrick.