Delivering SCADA solutions, we often receive end-user specifications as Structured Control Diagrams (Visio-like flow diagrams, seen below), usually submitted in PDF format or as images.
In order to access these in C#, I was hoping to use one of the OpenCV libraries.
I was looking at template recognition, but it seems like the wrong fit to feed a machine learning algorithm and teach it to recognize shapes of boxes and arrows that are already known in advance.
The libraries I've looked at have some polyedge functions. However, as can be seen from the example below, there is a danger that the system will treat the whole thing as one large polygon when there is no spacing between elements.
The annotations may be rotated by any multiple of 90 degrees, and I would like to identify them, as well as the contents of the rectangles, using OCR.
I do not have any experience in this, which should be apparent by now, so I hope somebody can point me in the direction of the appropriate rabbit hole. If there are multiple approaches, please suggest the least math-heavy one.
Update:
This is an example of the type of image I'm talking about.
The problems to address are:
Identification of the red rectangles with texts in cells (OCR).
The identification of arrows, including direction and end-point annotations. Line type, if possible.
Template matching of the components.
Fallback to some polyline entity or similar if template matching fails.
I'm sure you realize this is an active field of research. The algorithms and methods described in this post are fundamental; there may be better or more specific solutions, either completely heuristic or built on these fundamental methods.
I'll try to describe some methods that I have used before and got good results from in a similar situation (we worked on simple CAD drawings to extract the logical graph of an electrical grid), and I hope they will be useful.
Identification of the red rectangles with texts in cells (OCR).
This one is trivial for your case, as your documents are high quality, and you can easily adapt any of the current free OCR engines (e.g. Tesseract) for your purpose. There should be no problem with the 90, 180, ... degree rotations; engines like Tesseract will detect them (you should configure the engine, and in some cases you should extract the detected boundaries and pass them individually to the OCR engine). You may just need some training and fine-tuning to achieve maximum accuracy.
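For reference, a minimal sketch using the Tesseract NuGet wrapper; the tessdata path, language, and the use of PageSegMode.AutoOsd for orientation detection are assumptions to adapt to your setup:

```csharp
using Tesseract;

// Minimal OCR sketch with the Tesseract .NET wrapper; paths, language
// and page segmentation mode are assumptions to adapt.
using (var engine = new TesseractEngine(@"./tessdata", "eng", EngineMode.Default))
using (var img = Pix.LoadFromFile("cell.png"))
// AutoOsd asks Tesseract to detect orientation (the 90/180 degree cases).
using (var page = engine.Process(img, PageSegMode.AutoOsd))
{
    string text = page.GetText();
    float confidence = page.GetMeanConfidence();  // useful for validation
}
```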
Template matching of the components.
Most template-matching algorithms are sensitive to scale, and the scale-invariant ones are very complex, so I don't think you will get very accurate results from simple template matching if your documents vary in scale and size.
Also, your shapes' features are too similar and too sparse to get good results and unique features from algorithms such as SIFT and SURF.
I suggest you use contours. Your shapes are simple, and your components are built by combining these simple shapes. Using contours, you can find the simple shapes (e.g. rectangles and triangles) and then check them against previously gathered contours for each component shape. For example, if one of your components is created by combining four rectangles, you can store those relative contours together and check them against your documents in the detection phase.
There are lots of articles about contour analysis on the net. I suggest you have a look at these; they will give you a clue about how to use contours to detect simple and complex shapes:
http://www.emgu.com/wiki/index.php/Shape_%28Triangle,_Rectangle,_Circle,_Line%29_Detection_in_CSharp
http://www.codeproject.com/Articles/196168/Contour-Analysis-for-Image-Recognition-in-C
http://opencv-code.com/tutorials/detecting-simple-shapes-in-an-image/
http://opencv-python-tutroals.readthedocs.org/en/latest/py_tutorials/py_imgproc/py_contours/py_contours_begin/py_contours_begin.html
By the way, porting code to C# using EmguCV is trivial, so don't worry about that.
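To give a feel for the contour route, here is a minimal EmguCV sketch that approximates each contour with a polygon and classifies the basic shapes; the file name, threshold, and area limit are placeholders to tune:

```csharp
using System;
using Emgu.CV;
using Emgu.CV.CvEnum;
using Emgu.CV.Structure;
using Emgu.CV.Util;

// Sketch: binarize, find contours, approximate each with a polygon,
// then classify by vertex count.
Image<Gray, byte> img = new Image<Gray, byte>("diagram.png");
Image<Gray, byte> binary = img.ThresholdBinaryInv(new Gray(128), new Gray(255));

using (VectorOfVectorOfPoint contours = new VectorOfVectorOfPoint())
{
    CvInvoke.FindContours(binary, contours, null,
        RetrType.List, ChainApproxMethod.ChainApproxSimple);

    for (int i = 0; i < contours.Size; i++)
    {
        using (VectorOfPoint approx = new VectorOfPoint())
        {
            CvInvoke.ApproxPolyDP(contours[i], approx,
                CvInvoke.ArcLength(contours[i], true) * 0.02, true);

            if (approx.Size == 3)
                Console.WriteLine("triangle (possible arrow head)");
            else if (approx.Size == 4 && CvInvoke.ContourArea(approx) > 100)
                Console.WriteLine("rectangle (possible box)");
        }
    }
}
```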
The identification of arrow, including direction and endpoint annotations. Line type, if possible.
There are several methods for finding line segments (e.g. the Hough transform). The main problem in this part is the other components, as they are normally detected as lines too; if we find the components first and remove them from the document, detecting lines becomes a lot easier, with far fewer false detections.
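For orientation, a minimal EmguCV sketch of probabilistic Hough line detection on an edge map; all thresholds are starting points to tune:

```csharp
using System;
using Emgu.CV;
using Emgu.CV.Structure;

// Probabilistic Hough transform on an edge map; ideally run after text
// and components have been removed, as discussed above.
Image<Gray, byte> cleaned = new Image<Gray, byte>("diagram_no_components.png");
Image<Gray, byte> edges = cleaned.Canny(50, 150);

LineSegment2D[] lines = CvInvoke.HoughLinesP(
    edges,
    1,              // distance resolution in pixels
    Math.PI / 180,  // angle resolution in radians
    30,             // minimum votes for a line
    20,             // minimum line length in pixels
    5);             // maximum gap allowed when joining segments

foreach (LineSegment2D line in lines)
    Console.WriteLine(line.P1 + " -> " + line.P2);
```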
Approach
1- Layer the document based on its different colors, and execute the following phases on every desired layer.
2- Detect and extract text using OCR, then remove the text regions and recreate the document without text.
3- Detect components based on contour analysis and the previously gathered component database, then remove the detected components (both known and unknown types, as unknown shapes would increase false detections in the next phases) and recreate the document without components. At this point, given good detection, we should only have lines left.
4- Detect lines.
5- At this point you can create a logical graph from the extracted components, lines, and tags, based on their detected positions (a sketch of such a graph follows).
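A minimal sketch of what such a logical graph could look like in C#; all names here are illustrative:

```csharp
using System.Collections.Generic;
using System.Drawing;

// Illustrative only: components become nodes; each detected line becomes
// an edge between the two components whose bounding boxes contain its
// end points.
class ComponentNode
{
    public string Type;          // e.g. "rectangle", "valve", "unknown"
    public string Label;         // text assigned from the OCR phase
    public Rectangle Bounds;     // from contour analysis
    public List<ComponentNode> Neighbors = new List<ComponentNode>();
}
```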
Hope this helps.
I cannot give you solutions to all four of your questions, but the first one, Identification of the red rectangles with texts in cells (OCR), does not sound very difficult. Here is my solution to it:
Step 1: separate the color image into 3 layers: Red, Blue, and Green, and only use the red layer for the following operations.
Step 2: binarization of the red layer.
Step 3: connected component analysis of the binarization result, keeping the statistics of each connected component (for example, the width and height of each blob).
Step 4: discard large blobs and only keep the blobs corresponding to text. Also use layout information to discard false text blobs (for example, text always lies inside a large blob, text blobs follow a horizontal writing style, and so on).
Step 5: perform OCR on the textual components. When performing OCR, each blob will give you a confidence level, which can be used to validate whether it is a textual component or not.
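A minimal sketch of steps 1 through 4 using AForge; the size thresholds are guesses to tune for your drawings:

```csharp
using System.Drawing;
using AForge.Imaging;
using AForge.Imaging.Filters;

Bitmap source = new Bitmap("diagram.png");

// Step 1: keep only the red channel as a grayscale image.
Bitmap red = new ExtractChannel(RGB.R).Apply(source);

// Step 2: binarize the red layer.
Bitmap binary = new Threshold(128).Apply(red);

// Steps 3-4: connected component analysis, filtering out blobs too
// large or too small to be text (thresholds are guesses to tune).
BlobCounter counter = new BlobCounter
{
    FilterBlobs = true,
    MinWidth = 3, MinHeight = 5,
    MaxWidth = 100, MaxHeight = 40
};
counter.ProcessImage(binary);

foreach (Blob blob in counter.GetObjectsInformation())
{
    Rectangle box = blob.Rectangle;  // candidate text blob -> OCR (step 5)
}
```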
I'm trying to use blob detection to process an image. When a blob looks similar to a sample blob, I want to branch my logic, but I'm having difficulty comparing the two blobs.
I'm currently using AForge; my best lead is to use the outer points of the blobs and reconstruct their basic shapes. These shapes are black and white and of a known form, so it shouldn't be impossible.
Is there a simpler way to compare the difference between two blobs?
Detecting shapes of any form could be done using Haar cascade classifiers. I'm more used to OpenCV (https://docs.opencv.org/2.4/modules/objdetect/doc/cascade_classification.html)
Apparently, AForge doesn't come with Haar classifiers itself, but there seems to be a decent library containing a HaarCascade implementation.
So you might want to have a look here: http://accord-framework.net/docs/html/T_Accord_Vision_Detection_HaarCascade.htm
I've used the following blog post for my OpenCV case back in the days: https://coding-robin.de/2013/07/22/train-your-own-opencv-haar-classifier.html
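For orientation, a minimal detection sketch with Accord.NET; the cascade file name and all tuning values are made-up placeholders, and you would first train a cascade for your blob shape (e.g. following the blog post above):

```csharp
using System.Drawing;
using System.IO;
using Accord.Vision.Detection;

// Sketch only: assumes a cascade trained for your blob shape and saved
// as "shape-cascade.xml" (file name and tuning values are made up).
HaarCascade cascade;
using (FileStream stream = File.OpenRead("shape-cascade.xml"))
    cascade = HaarCascade.FromXml(stream);

var detector = new HaarObjectDetector(cascade, 30)   // 30 = minimum size
{
    SearchMode = ObjectDetectorSearchMode.Average,
    ScalingFactor = 1.2f
};

Bitmap frame = new Bitmap("input.png");
Rectangle[] matches = detector.ProcessFrame(frame);
```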
I'm a novice with OpenCvSharp3, and I've been looking at some examples of image matching using this library.
The key point is that I don't know what modifications the code from this question needs in order to compare two images that are almost 100% identical, but where one of them is rotated (by an arbitrary angle) and sometimes slightly displaced from the source (by a few pixels).
The method from that question basically checks whether one image is inside the other, but my project only needs to compare 5 images of the same size, where two of them are the same apart from slight differences.
Is such an algorithm valid?
EDIT:
Here is an example of 5 images to detect the same:
It can be valid, but:
if you want unlimited rotation, you have to compare your reference image against an infinite number of rotated versions of the other image.
if your other image is displaced from the source, you will have to generate all the possible displaced images as well.
If you combine these two techniques, you get a huge number of combinations.
So yes, it can be done by generating all possible variants of one image and comparing them against your reference image, as in the sketch below.
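In practice you would discretize the search. A brute-force sketch with OpenCvSharp, where a central crop of the reference absorbs small displacements; the step size and crop margin are arbitrary choices:

```csharp
using OpenCvSharp;

// Rotate the candidate in 5-degree steps and keep the best normalized
// correlation score. File names, step size and margin are placeholders.
Mat reference = Cv2.ImRead("reference.png", ImreadModes.Grayscale);
Mat candidate = Cv2.ImRead("candidate.png", ImreadModes.Grayscale);

// Central crop of the reference acts as the template so that small
// displacements are absorbed by the template search.
Mat template = new Mat(reference,
    new Rect(10, 10, reference.Width - 20, reference.Height - 20));

var center = new Point2f(candidate.Width / 2f, candidate.Height / 2f);
double bestScore = double.MinValue, bestAngle = 0;

for (double angle = 0; angle < 360; angle += 5)
{
    using (Mat rotation = Cv2.GetRotationMatrix2D(center, angle, 1.0))
    using (Mat rotated = new Mat())
    using (Mat result = new Mat())
    {
        Cv2.WarpAffine(candidate, rotated, rotation, candidate.Size());
        Cv2.MatchTemplate(rotated, template, result, TemplateMatchModes.CCoeffNormed);
        Cv2.MinMaxLoc(result, out double minVal, out double maxVal);
        if (maxVal > bestScore) { bestScore = maxVal; bestAngle = angle; }
    }
}
// A bestScore close to 1.0 suggests the images match at bestAngle.
```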
It's not very robust, though. What will happen if you try it on images displaced by a larger number of pixels? If a color adjustment has been done on an image? If one is in grayscale?
I recommend using machine learning for this problem.
I would proceed like this:
make a dataset of images
for each image, apply data augmentation (generate all the rotations, displacements, and noise variations you can).
use a CNN and train it to recognize each variation of an image as the same image.
You're done; you have an algorithm that does the job :)
Here is an implementation of TensorFlow for C#:
https://github.com/migueldeicaza/TensorFlowSharp
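As a rough sketch of how inference could look with TensorFlowSharp, assuming the CNN was trained and exported elsewhere; the file name, tensor names, and input shape are all assumptions:

```csharp
using System.IO;
using TensorFlow;

// Sketch only: assumes the CNN was trained elsewhere (e.g. in Python),
// exported as "model.pb", and exposes tensors named "input" and
// "output"; all three names and the input shape are assumptions.
var graph = new TFGraph();
graph.Import(File.ReadAllBytes("model.pb"));

using (var session = new TFSession(graph))
{
    var input = new TFTensor(new float[1, 64, 64, 1]); // your image here
    var output = session.GetRunner()
        .AddInput(graph["input"][0], input)
        .Fetch(graph["output"][0])
        .Run();
    var scores = (float[,])output[0].GetValue();       // per-class scores
}
```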
For a simple implementation of an MNIST CNN in Python, see here.
Here is a video that explains how CNNs work (look at the feature detection and pooling operations; they can help you).
I'm writing an image processing application using C#.NET (Windows Forms), EmguCV 3.1 (an OpenCV wrapper for C#) and AForge.NET (another image processing library). I extract several key points from an image, as depicted below:
As can be seen, there are several white points and red lines. The white points show the locations of the key points. I want to extract lines for each group of pixels having these properties:
1- distance among pixels must be almost equal.
2- they must be close together.
Is there any method or approach in the aforementioned libraries to extract lines between pixels? An example of the lines I imagine is depicted in the figure.
Any ideas will be appreciated.
There are lots of ways to find lines in point sets; RANSAC is probably the best. Once you find lines in your points, find the points that lie on each line and test whether they are roughly equally spaced.
Alternatively, look at all the inter-point distances, cluster those distances, and see whether any of those clusters lie on lines (fit lines using robust techniques or RANSAC; the better choice will depend on how noisy these sets are).
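A minimal self-contained RANSAC line-fit sketch in C#; the iteration count and tolerance are arbitrary defaults. Once you have the inliers, project them onto the line and sort them to test for roughly equal spacing:

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Linq;

// Minimal RANSAC line fit over 2D points (no library dependency).
static (PointF A, PointF B, List<PointF> Inliers) RansacLine(
    IList<PointF> points, int iterations = 500, float tolerance = 2f)
{
    var rnd = new Random();
    var bestInliers = new List<PointF>();
    PointF bestA = default(PointF), bestB = default(PointF);

    for (int i = 0; i < iterations; i++)
    {
        // Hypothesize a line through two random points.
        PointF p = points[rnd.Next(points.Count)];
        PointF q = points[rnd.Next(points.Count)];
        float dx = q.X - p.X, dy = q.Y - p.Y;
        float len = (float)Math.Sqrt(dx * dx + dy * dy);
        if (len < 1e-6f) continue;   // degenerate: same point drawn twice

        // Inliers: points whose perpendicular distance to the line is
        // within the tolerance.
        var inliers = points.Where(r =>
            Math.Abs((r.X - p.X) * dy - (r.Y - p.Y) * dx) / len <= tolerance)
            .ToList();

        if (inliers.Count > bestInliers.Count)
        {
            bestInliers = inliers;
            bestA = p;
            bestB = q;
        }
    }
    return (bestA, bestB, bestInliers);
}
```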
I have to recognize digits within an image from a video stream, and there are several things that should make recognition easier:
1) it is a fixed 6x8 font, and all symbols have equal width
2) I know the exact positions of the digits; they are always rectangular and not rotated/skewed/scaled, but there may be some distortion because of air transmission glitches
3) it is only digits and the '.' character
4) the digit background is semi-black (50% opaque)
I've tried Tesseract v2 and v3, but the .NET wrappers aren't perfect, and the recognition error was very large even when I trained it with the custom font; as far as I understand, that is because of the small resolution.
I've made a very simple algorithm myself by converting the image to black and white and counting the matching pixels between the original font image and the image from the stream. It performs better than Tesseract, but I think a more sophisticated algorithm would do better.
I've tried to train AForge using an ActivationNetwork with BackPropagationLearning, and it fails to converge (the first part of this article, http://www.codeproject.com/Articles/11285/Neural-Network-OCR, as I don't need scaling or several fonts; as I understand it, the code in the article is for an older version of AForge). The bad part is that this project is not supported anymore: the forum is closed and, as I understand, the Google Groups too.
I know there's an OpenCV port to .NET; as far as I can see, it has different neural network approaches than AForge, so the question is which approach would fit best.
So, is there any .NET framework that can help me with this, and if it supports more than one neural network implementation, which one would fit best?
For fixed size fonts at a fixed magnification, you can probably get away with a less-sophisticated OCR approach based on template matching. See here for an example of how to do template matching using OpenCV (not .NET, but hopefully enough to get you started.) The basic idea is that you create a template for each digit, then try matching all templates at your target location, choosing the one with the highest match score. Because you know where the digits are located, you can search over a very small area for each digit. For more information on the theory behind template-matching, see this wiki article on Cross-correlation.
This is actually the basis for simplified OCR applications (usually for recognizing special OCR fonts, like the SEMI standard fonts used for printing serial numbers on silicon wafers.) The production-grade algorithms can also support tolerance for scaling, rotation and translation, but the underlying techniques are pretty much the same.
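A minimal EmguCV sketch of this per-digit template scoring; building the template images and extracting each digit's ROI from the frame are up to you:

```csharp
using System.Drawing;
using Emgu.CV;
using Emgu.CV.CvEnum;
using Emgu.CV.Structure;

// Sketch: score one digit ROI against ten reference templates of the
// same font and size, and return the best-matching digit.
static int ClassifyDigit(Image<Gray, byte> digitRoi, Image<Gray, byte>[] templates)
{
    int best = -1;
    double bestScore = double.MinValue;

    for (int d = 0; d < templates.Length; d++)
    {
        using (Image<Gray, float> result =
            digitRoi.MatchTemplate(templates[d], TemplateMatchingType.CcoeffNormed))
        {
            double[] minValues, maxValues;
            Point[] minLocations, maxLocations;
            result.MinMax(out minValues, out maxValues,
                          out minLocations, out maxLocations);

            if (maxValues[0] > bestScore)
            {
                bestScore = maxValues[0];
                best = d;
            }
        }
    }
    return best;   // bestScore can double as a confidence value
}
```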
Try looking at this project and this project too. Both projects explain how OCR works and show you how to implement it in C# and .NET.
If you are not in an absolute hurry, I would advise you to first look for a method that solves the problem. I've had good experiences with WEKA; using it, you can test a bunch of algorithms pretty quickly.
As soon as you have found the algorithm that solves your problem, you can port it to .NET, build a wrapper, look for an existing implementation, or (if it's an easy algorithm) rebuild it in .NET.
I need some help with an algorithm. I'm using an artificial neural network to read an electrocardiogram and trying to recognize some disturbances in the waves. That part is OK: I have the neural network, and I can test it without problems.
What I'd like to do is let the user open an electrocardiogram (import a JPEG) and have the program find the waves and convert them into the arrays that will feed my ANN. There's the problem: I have written some code that reads the image and transforms it into a binary image, but I can't find a good way for the program to locate the waves, since their exact position can vary from hospital to hospital. I need some suggestions on approaches I should use.
If you've got the wave values in a list, you can use a Fourier transform or FFT (fast Fourier transform) to determine the frequency content at any particular time. Disturbances typically create additional high-frequency content (i.e., sharp, steep waves) that you should be able to use to spot irregularities.
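A minimal sketch using AForge's FFT; the input must be evenly sampled, and its length padded to a power of two:

```csharp
using AForge.Math;

// Magnitude spectrum of an evenly sampled wave using AForge's FFT.
// The input length must be a power of two (pad with zeros otherwise).
static double[] MagnitudeSpectrum(double[] waveValues)
{
    var data = new Complex[waveValues.Length];
    for (int i = 0; i < waveValues.Length; i++)
        data[i] = new Complex(waveValues[i], 0);

    FourierTransform.FFT(data, FourierTransform.Direction.Forward);

    // Keep the first half (up to the Nyquist frequency); unusually
    // strong high-frequency bins suggest sharp, steep disturbances.
    var magnitude = new double[data.Length / 2];
    for (int i = 0; i < magnitude.Length; i++)
        magnitude[i] = data[i].Magnitude;
    return magnitude;
}
```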
You'd have to assume a certain minimal contrast between the "signal" (the waves) and the background of the image. An edge-finding algorithm might be useful in that case. You could isolate the wave from the background and plot it.
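A minimal sketch of that idea, using EmguCV for illustration: edge-detect, then take the first edge pixel in each column as the wave's y-value (this assumes a single trace on a clean background; the thresholds are starting points):

```csharp
using Emgu.CV;
using Emgu.CV.Structure;

// Isolate the trace: edge-detect, then scan each column top-down for
// the first edge pixel. Assumes one trace per image.
Image<Gray, byte> gray = new Image<Gray, byte>("ecg.jpg");
Image<Gray, byte> edges = gray.Canny(50, 150);

byte[,,] data = edges.Data;
int[] wave = new int[edges.Width];
for (int x = 0; x < edges.Width; x++)
{
    wave[x] = -1;                      // -1 = no edge found in this column
    for (int y = 0; y < edges.Height; y++)
    {
        if (data[y, x, 0] > 0) { wave[x] = y; break; }
    }
}
```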
This post by Rick Barraza deals with vector fields in Silverlight. You might be able to adapt the concept to your particular problem.