How to detect a rotated object in an image using OpenCV? - c#

I have been training an OpenCV classifier to recognize books. The requirement is to recognize a book in an image. I have used 1000+ images, and OpenCV is able to detect books with no rotation. However, when I try to detect rotated books it does not work properly. So I am wondering if there is any way to detect rotated objects in images using OpenCV?

What features are you using to detect your books? Are you training a CNN and deploying it with OpenCV? In that case, adding rotation augmentation to your training images would make it easy to detect rotated books.
If you are using traditional computer vision techniques instead, you can try a rotation-invariant feature extractor like SURF; however, the results will not be as good as with CNNs, which are now the state of the art for this kind of problem.

Your problem can be solved perfectly well using OpenCV and keypoint matching algorithms such as SURF or ORB. You don't really need a classifier. In my experience, such a solution using unmodified OpenCV can scale up to recognizing around 10,000 images.
What I would do is:
Offline: Loop over your book images to build a database of keypoint descriptors, mapping each descriptor to the id of the book it comes from.
Online: Compute the keypoints of the query image and try to match each of them (using BF, FLANN, or LSH) to a keypoint in the pre-computed database.
Vote for the database book cover that has matched the most query keypoints.
Try to compute a homography matrix between the selected database book cover and the query image to validate the match.
ORB, BRISK, SURF, and SIFT feature descriptors are all usable for this task, and all are rotation invariant. ORB and BRISK are faster but a bit less performant.
See this link for a simple example:
https://docs.opencv.org/3.3.0/dc/dc3/tutorial_py_matcher.html

Firstly, you should understand the main theoretical ideas behind pose estimation and image warping.
You should define some important points on the books (special, strong features valid for each type of book) and then estimate the pose of the book using these points. After getting the pose angles, you should warp the image to align the books. After book alignment, you should perform feature extraction; this will improve the success of book detection.
In summary, pose estimation and warping (alignment) are important for these kinds of rotation problems.

Related

Image Matching using opencvsharp3

I'm a novice in OpenCvSharp3 and I've been having a look at some examples for image matching using this library.
The key to my question is that I don't know what modifications the code from this question needs in order to compare two images that are almost 100% identical, but where one of them is rotated (by any angle) and sometimes slightly displaced from the source (by a few pixels).
The method from that question basically checks whether one image is inside the other, but my project only needs to compare 5 images of the same size, where two of them are the same with slight differences.
Is such an algorithm valid?
EDIT:
Here is an example of 5 images in which the matching pair needs to be detected:
It can be valid, but:
if you want unlimited rotation, you have to compare your reference image with an infinite number of rotated versions of the other image.
if your other image is displaced from the source, you will have to generate all the possible displaced images.
If you combine these two techniques, you will have a lot of combinations.
So yes, it can be done by generating all possible variant images for one image and comparing them with your reference image.
It's not very robust, though: what will happen if you try it on images displaced by a larger number of pixels? If a color adjustment has been done on an image? If one is in grayscale?
I recommend you to use machine learning for this problem.
I would proceed like this:
make a dataset of images
for each image, apply data augmentation (all the rotations, displacements, and noise you can generate).
Use a CNN and train it to recognize each variation of an image as the same image.
You're done; you have an algorithm that does the job :)
Here is a TensorFlow implementation for C#:
https://github.com/migueldeicaza/TensorFlowSharp
For a simple implementation of an MNIST CNN in Python, see here
Here is a video that explains how CNNs work (look at the feature detection and pooling operations; they can help you)

Recognizing visio shapes in an image

Delivering SCADA solutions, we often get our end-user specifications as Structured Control Diagrams (Visio-like flow diagrams, seen below), usually submitted in PDF format or as images.
In order to access these in C#, I was hoping to use one of the OpenCV libraries.
I was looking at template recognition, but it seems like the wrong fit to start feeding a machine learning algorithm to teach it to recognize the pre-known, specific shapes of boxes and arrows.
The libraries I've looked at have some poly-edge functions. However, as can be seen from the example below, there is the danger that the system will treat the whole thing as one large polygon where there is no spacing between elements.
The annotations may be at any 90-degree rotation, and I would like to identify them, as well as the contents of the rectangles, using OCR.
I do not have any experience in this (which should be apparent by now), so I hope somebody can point me in the direction of the appropriate rabbit hole. If there are multiple approaches, please pick the least math-heavy.
Update:
This is an example of the type of image I'm talking about.
The problems to address are:
Identification of the red rectangles with texts in cells (OCR).
The identification of arrow, including direction and end point annotations. Line type, if possible.
Template matching of the components.
Fallback to some polyline entity or something if template matching fails.
I'm sure you realize this is an active field of research. The algorithms and methods described in this post are fundamental; there may be better or more specific solutions, either completely heuristic or based on these fundamental methods.
I'll try to describe some methods which I used before and got good results from in a similar situation (we worked on simple CAD drawings to extract the logical graph of an electrical grid), and I hope they will be useful.
Identification of the red rectangles with texts in cells (OCR).
This one is trivial in your situation, since your documents are high quality and you can easily adapt any current free OCR engine (e.g. Tesseract) for your purpose. There would be no problem with 90, 180, ... degree rotations; engines like Tesseract can handle them (you should configure the engine, and in some cases extract the detected boundaries and pass them individually to the OCR engine). You may just need some training and fine-tuning to achieve maximum accuracy.
Template matching of the components.
Most template-matching algorithms are sensitive to scale, and scale-invariant ones are very complex, so I don't think you'll get very accurate results from simple template-matching algorithms if your documents vary in scale and size.
Also, your shapes' features are too similar and too sparse to get good, unique features from algorithms such as SIFT and SURF.
I suggest you use contours. Your shapes are simple, and your components are built by combining these simple shapes. Using contours, you can find the simple shapes (e.g. rectangles and triangles) and then check them against previously gathered contours for each component shape. For example, if one of your components is created by combining four rectangles, you can store their relative contours together and check against them later in the detection phase.
There are lots of articles about contour analysis on the net; I suggest you have a look at these. They will give you a clue on how to use contours to detect simple and complex shapes:
http://www.emgu.com/wiki/index.php/Shape_%28Triangle,_Rectangle,_Circle,_Line%29_Detection_in_CSharp
http://www.codeproject.com/Articles/196168/Contour-Analysis-for-Image-Recognition-in-C
http://opencv-code.com/tutorials/detecting-simple-shapes-in-an-image/
http://opencv-python-tutroals.readthedocs.org/en/latest/py_tutorials/py_imgproc/py_contours/py_contours_begin/py_contours_begin.html
By the way, porting code to C# using EmguCV is trivial, so don't worry about it.
The identification of arrow, including direction and endpoint annotations. Line type, if possible.
There are several methods for finding line segments (e.g. the Hough transform). The main problem in this part is the other components, as they are normally detected as lines too; if we find the components first and remove them from the document, detecting lines becomes a lot easier, with far fewer false detections.
Approach
1- Layer the document based on its different colors, and execute the following phases on every desired layer.
2- Detect and extract text using OCR, then remove the text regions and recreate the document without text.
3- Detect components based on contour analysis and the gathered component database, then remove the detected components (both known and unknown types, as unknown shapes would increase false detections in the next phases) and recreate the document without components. At this point, given good detection, we should only have lines left.
4- Detect lines.
5- At this point you can create a logical graph from the extracted components, lines, and tags based on their detected positions.
Hope this helps.
I cannot give you solutions to all your four questions, but the first question Identification of the red rectangles with texts in cells (OCR) does not sound very difficult. Here is my solution to this question:
Step 1: separate the color image into 3 layers: Red, Blue, and Green, and only use the red layer for the following operations.
Step 2: binarization of the red layer.
Step 3: connected component analysis of the binarization result, keeping statistics for each connected component (for example, the width and height of each blob).
Step 4: discard large blobs, and only keep blobs corresponding to text. Also use layout information to discard false text blobs (for example, text always sits inside a large blob, and text blobs have a horizontal writing style, and so on).
Step 5: perform OCR on textural components. When performing OCR, each blob will give you a confidence level, and this can be used for validation whether it is a textual component or not.

Getting Matrix (CGAffineTransform) Info from iPhone Movies in C#/C++

When the iPhone records a video, it puts the data from the camera directly onto the disk. What tells the player how to reorient the video is the transform matrix: a mathematical structure used to change the position of the pixels in X,Y space.
On the iPhone and on the Macintosh I can ask the video what its transform is, and I get back a CGAffineTransform with a, b, c, d, tx, and ty. Apple describes the transform matrix here.
With this information I can determine what the video layout is supposed to be and whether it expects to be rotated before display.
I can get this information with ease in the OS X and iOS environments. I am trying to find a way to get the same matrix information on Windows. Preferably in C#; however, if I must use C++ then so be it. ActiveX solutions are entirely undesirable, and I am hoping that the QuickTime SDK for Windows has some use. Otherwise, what the heck did Apple write it for?
If anyone knows how to obtain the Transform Matrix from a video or any place to start please, point me in the right direction.
It appears that the CGAffineTransform is something that needs to be pulled right out of the file itself. I used the QuickTime File Format Specification PDF to gain an understanding of the file and of where to get the CGAffine matrix.
Here is a link to the page with the Matrix data on it
Quicktime File Format Specification Matrix Info
As you can see from this clip, the matrix is in the movie header atom, designated with the 'mvhd' type.
It is 36 bytes long and starts 36 bytes after the 'mvhd' atom type name.
Given the file format specification of frames and tracks, the matrix can change throughout the playback of the video, but in my experience this mechanism is not exercised in the videos output from the iPhone.
I imagine the matrix will need to be grabbed from each frame sometime in the near future, and perhaps this is something that FFmpeg or other video applications can work into their frame grabbers and video translators. But since I currently do not have a version of FFmpeg that uses this matrix information, I will be creating a simple movie header grabber that pulls out the matrix and lets me adjust my FFmpeg command-line parameters accordingly, allowing me to transform my video appropriately.
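Since the matrix sits at a fixed offset inside the 'mvhd' atom, a simple header grabber needs nothing more than byte parsing. Here is a rough Python sketch; `read_mvhd_matrix` is my own helper name, and the field layout follows the QuickTime File Format Specification (a, b, c, d, tx, ty stored as 16.16 fixed-point, u, v, w as 2.30):

```python
import struct

def read_mvhd_matrix(data):
    """Return the 3x3 transform from the first 'mvhd' atom in `data`."""
    pos = data.find(b"mvhd")
    if pos < 0:
        return None
    start = pos + 4 + 36                 # skip the type tag, then 36 header bytes
    vals = struct.unpack(">9i", data[start:start + 36])  # nine big-endian int32s
    a, b, u, c, d, v, tx, ty, w = vals
    to_16_16 = lambda x: x / 65536.0     # a, b, c, d, tx, ty are 16.16 fixed-point
    to_2_30 = lambda x: x / (1 << 30)    # u, v, w are 2.30 fixed-point
    return [[to_16_16(a), to_16_16(b), to_2_30(u)],
            [to_16_16(c), to_16_16(d), to_2_30(v)],
            [to_16_16(tx), to_16_16(ty), to_2_30(w)]]

# Demo on synthetic atom bytes: a 90-degree rotation matrix.
header = b"\x00" * 36                    # version, flags, timestamps, rate, ...
matrix = struct.pack(">9i", 0, 1 << 16, 0, -(1 << 16), 0, 0, 0, 0, 1 << 30)
m = read_mvhd_matrix(b"mvhd" + header + matrix)
```

On a real file you would pass `open(path, "rb").read()`; a production parser should walk the atom tree rather than relying on `find`, since the byte pattern "mvhd" could in principle occur elsewhere in the file.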
If I come up with a better idea I will try to include this post on that knowledge.
A side note on the journey to this answer
For all of those who downvoted this question because you did not know what I was talking about: a request to clarify would have been quite sufficient. Coming in and downvoting because you don't understand, or don't know the answer, is neither constructive nor fair. This is a very specific question, and this answer will assist more than just myself.
I believe in helping to spread all kinds of knowledge, and I hope this gives those of you who downvoted a better understanding of the issues that people are looking to solve. Just because you don't know what the problem is does not mean you should turn your nose up at it, and it certainly does not mean you should discount those looking for answers that you cannot provide.
I am glad that I have an answer to my question, and I will definitely be open to any further criticism of the answer I have given. Perhaps this answer will spark more questions about this issue, and I will be able to learn from and contribute to future discussions about it.
Thank you StackOverflow for restoring my question so I could answer it appropriately.

Image straightening algorithm

I am looking for a way to auto-straighten my images, and I was wondering if anyone has come across any algorithms to do this. I realize that the ability to do this depends on the content of the image, but any known algorithms would be a start.
I am looking to eventually implement this in C# or PHP, however, I am mainly after the algorithm right now.
Is this possible with OpenCV? ImageMagick? Others?
Many thanks,
Brett
Here is my idea:
edge detection (Sobel, Prewitt, Canny, ...)
Hough transform (horizontal lines +/- 10 degrees)
straighten the image according to the longest/strongest line
This is obviously not going to work for every type of image; it's just meant to fuel the discussion.
Most OCR programs straighten the scanned image prior to running recognition. You can probably find good code in the many open-sourced OCR programs, such as Tesseract.
Of course, this depends on what type of images you want to straighten, but there seem to be some resources available for automatic straightening of text scans.
One post I found mentioned 3 programs that could do auto-straightening:
TechSoft's PixEdit 7.0.11
Mystik Media's AutoImager 3.03
Spicer's Imagenation 7.50
If manual straightening is acceptable, there are many tutorials out there for how to straighten them manually using Photoshop; just google "image straightening"
ImageMagick has the -deskew option. This will simply rotate the image to be straight.
Most commercial OCR engines like ABBYY FineReader and Nuance OmniPage do this automatically.
The Leptonica research library has a command line tool called skewtest which will rotate the image.
I have not found a library that can handle an image which has been distorted in any other way (like pincushion distortion, an image that moved during a scanning operation, or the warp at the edge of a book). I am looking for a library or tool that can do this, but cannot find one.
Patrick.

Finding a wave graphic inside an image

I need some help with an algorithm. I'm using an artificial neural network to read an electrocardiogram, trying to recognize some disturbances in the waves. That part is OK: I have the neural network and I can test it, no problem.
What I'd like to do is give the user the ability to open an electrocardiogram (import a JPEG) and have the program find the waves and convert them into the arrays that will feed my ANN, and there's the problem. I wrote some code that reads the image and transforms it into a binary image, but I can't find a good way for the program to locate the waves, since their exact position can vary from hospital to hospital. I need some suggestions on approaches I should use.
If you've got the wave values in a list, you can use a Fourier transform, or FFT (fast Fourier transform), to determine the frequency content at any particular time. Disturbances typically create additional high-frequency content (i.e., sharp, steep waves) that you should be able to use to spot irregularities.
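A small NumPy sketch of that idea, using a synthetic wave (the sampling rate and cutoff are assumed values, not ECG standards):

```python
import numpy as np

fs = 250                                    # sampling rate in Hz (assumed)
t = np.arange(0, 2, 1 / fs)
clean = np.sin(2 * np.pi * 1.2 * t)         # slow, regular rhythm
noisy = clean.copy()
noisy[200:205] += 2.0                       # a sharp, narrow disturbance

def high_freq_energy(signal, cutoff_hz=30):
    """Sum the spectrum magnitude above `cutoff_hz`."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
    return float(spectrum[freqs > cutoff_hz].sum())

# The disturbed wave carries noticeably more high-frequency energy.
print(high_freq_energy(clean), high_freq_energy(noisy))
```

In practice you would compute this over a sliding window so the disturbance can be localized in time, not just detected.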
You'd have to assume a certain minimal contrast between the "signal" (the waves) and the background of the image. An edge-finding algorithm might be useful in that case. You could isolate the wave from the background and plot the wave.
This post by Rick Barraza deals with vector fields in Silverlight. You might be able to adapt the concept to your particular problem.
