I'm trying to perform image registration without much luck.
The image below is my 'reference' image. I use a webcam to acquire images of the same object in different orientations and then need to perform a transformation on these images so that they look as close to the reference image as possible.
I've been using both the Aforge.NET and Accord.NET libraries in order to solve this problem.
Feature detection/extraction
So far I've tried the image stitching method used in this article. It works well for certain types of image but unfortunately it doesn't seem to work for my sample images. The object itself is rather bland and doesn't have many features so the algorithm doesn't find many correlation points. I've tried two versions of the above approach, one which uses the Harris corner detector and one which uses SURF, neither of which has provided me with the results I need.
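For reference, here is a sketch of the Harris + correlation + RANSAC approach I've been trying, based on the Accord.NET panorama-stitching sample (class names and constructor arguments may differ between Accord.NET versions, and the numeric thresholds are just the sample defaults, not tuned values):

```csharp
// Sketch of the correlation-based registration pipeline, loosely following the
// Accord.NET panorama-stitching sample. Thresholds are the sample defaults.
using System.Drawing;
using Accord.Imaging;
using Accord.Imaging.Filters;
using AForge;

static class FeatureRegistration
{
    public static Bitmap Register(Bitmap reference, Bitmap observed)
    {
        // 1. Detect interest points in both images (SURF is a drop-in alternative).
        var harris = new HarrisCornersDetector(0.04f, 1000f);
        IntPoint[] points1 = harris.ProcessImage(reference).ToArray();
        IntPoint[] points2 = harris.ProcessImage(observed).ToArray();

        // 2. Match the points by correlating a small window around each one.
        var matcher = new CorrelationMatching(9);
        IntPoint[][] matches = matcher.Match(reference, observed, points1, points2);

        // 3. Robustly estimate a homography from the (noisy) matches with RANSAC.
        var ransac = new RansacHomographyEstimator(0.001, 0.99);
        MatrixH homography = ransac.Estimate(matches[0], matches[1]);

        // 4. Warp/blend the observed image into the reference frame,
        //    as the stitching sample does to build its panorama.
        return new Blend(homography, reference).Apply(observed);
    }
}
```

With so few natural features on my object, step 2 simply doesn't produce enough good matches for the RANSAC step to work with.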
One option might be to 'artificially' add more features to the object (i.e. stickers, markings) but I'd like to avoid this if possible.
Shape detection
I've also tried several variations of the shape detection methods used in this article. Ideally I'd like to detect the four well-defined circles/holes on the object. I could then use the coordinates of these to create a transformation matrix (homography?) that I could use to transform the image.
Unfortunately I can't reliably detect all four of the circles. I've tried myriad different ways of pre-processing the image in order to get better circle detection, but can't quite find the perfect sequence. My usual sequence of operations is (a sketch of one such pipeline is shown below):
turn image grayscale
apply a filter (Mean, Median, Conservative Smoothing, Adaptive Smoothing, etc)
apply edge detection (Homogeneity, Sobel, Difference, Canny, etc)
apply color filtering
run shape/circle detector
I just can't quite find the right series of filters to apply in order to reliably detect the four circles.
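For what it's worth, here is a minimal sketch of the kind of pipeline I've been experimenting with in AForge.NET; the particular filters and the blob-size limits are placeholders that I keep swapping in and out, not a sequence that is known to work:

```csharp
// Sketch of one preprocessing + circle-detection attempt using AForge.NET.
// Filter choices and size limits are placeholders I've been experimenting with.
using System.Collections.Generic;
using System.Drawing;
using AForge;
using AForge.Imaging;
using AForge.Imaging.Filters;
using AForge.Math.Geometry;

static class CircleFinder
{
    public static List<AForge.Point> FindCircles(Bitmap source)
    {
        // 1. Grayscale
        Bitmap gray = Grayscale.CommonAlgorithms.BT709.Apply(source);

        // 2. Smoothing (Median here; Mean / Conservative / Adaptive are alternatives)
        gray = new Median().Apply(gray);

        // 3. Edge detection (Canny here; Sobel / Difference / Homogeneity are alternatives)
        Bitmap edges = new CannyEdgeDetector().Apply(gray);

        // 4. Binarize so the blob counter sees clean shapes
        edges = new Threshold(40).Apply(edges);

        // 5. Find blobs and keep only the ones the shape checker accepts as circles
        var blobCounter = new BlobCounter { FilterBlobs = true, MinWidth = 10, MinHeight = 10 };
        blobCounter.ProcessImage(edges);

        var shapeChecker = new SimpleShapeChecker();
        var centers = new List<AForge.Point>();
        foreach (Blob blob in blobCounter.GetObjectsInformation())
        {
            List<IntPoint> edgePoints = blobCounter.GetBlobsEdgePoints(blob);
            AForge.Point center;
            float radius;
            if (shapeChecker.IsCircle(edgePoints, out center, out radius))
                centers.Add(center);
        }
        return centers;
    }
}
```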
Image / Template matching
Again, I'd like to detect the four circles/holes in the object, so I tried an image / template matching technique, with little success. I created a template (a small image of one of the circles) and ran the Exhaustive Template Matching algorithm, but it usually detects just one of the holes - typically the one the template was created from!
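Roughly, the attempt looked like this (a sketch; the 0.9 similarity threshold is just a value I experimented with):

```csharp
// Sketch of the template-matching attempt with AForge.NET. Only matches whose
// similarity score is at least 0.9 are returned; that value was just a guess.
using System.Drawing;
using AForge.Imaging;

static class HoleMatcher
{
    public static TemplateMatch[] FindHoles(Bitmap image, Bitmap holeTemplate)
    {
        var matcher = new ExhaustiveTemplateMatching(0.9f);
        return matcher.ProcessImage(image, holeTemplate);
    }
}
```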
In summary
I feel like I'm using the correct techniques to solve this problem, I'm just not sure quite where I'm going wrong, or where I should focus my attention further.
Any help or pointers would be most appreciated.
If you added examples of the transformations you're trying to be invariant to, we could be more specific. But generally, you can try to use HOG to detect this structure, since it is rather rich in gradients.
HOG is mostly used to detect pedestrians, but it is also good for detecting distinct logos.
I am not sure about HOG's invariance to rotations, but it's pretty robust under different lighting and under moderate perspective distortion. If rotation invariance is important, you can try to train the classifier on rotated versions of the object, although your detector may become less discriminative.
After you have roughly detected the scale and position of your structure, you can try to refine it by detecting the ellipse of its boundary. After that you will have a coarse estimate of the holes, which you can further refine using something like the maximum of normalized cross-correlation in that neighbourhood.
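To make the refinement step concrete, here is a rough sketch of the normalized cross-correlation score for one candidate position; you would evaluate it for every offset in the neighbourhood and keep the maximum (the double[,] intensity-array layout is just an assumption for the sketch):

```csharp
// Sketch: normalized cross-correlation (NCC) of a small template against one
// candidate position in a larger grayscale image, both given as intensity arrays.
// Searching a neighbourhood = calling this for every offset and keeping the max.
static double Ncc(double[,] image, double[,] template, int offsetX, int offsetY)
{
    int h = template.GetLength(0), w = template.GetLength(1);

    // Means of the template and of the image patch underneath it.
    double meanT = 0, meanI = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
        {
            meanT += template[y, x];
            meanI += image[offsetY + y, offsetX + x];
        }
    meanT /= h * w;
    meanI /= h * w;

    // Zero-mean cross-correlation, normalized by both standard deviations.
    double num = 0, varT = 0, varI = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
        {
            double t = template[y, x] - meanT;
            double i = image[offsetY + y, offsetX + x] - meanI;
            num += t * i;
            varT += t * t;
            varI += i * i;
        }

    double denom = System.Math.Sqrt(varT * varI);
    return denom > 0 ? num / denom : 0; // 1 = perfect match, -1 = inverted match
}
```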
I know it's been a while, but here's a short potential solution:
I would just generate a grid of points on the original image (let's say, 16x16) and then use Lucas-Kanade (or some other) feature tracking to find those points in the second image. Of course you likely won't find all the points, but you can sort the matches and keep the best correlations. Let's say, the best four? Then you can easily compute a transformation matrix (see the sketch below).
Also if you don't get good correlations on your first grid, then you can just make other grids (shifted, etc.) until you find good matches.
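To make the "compute a transformation matrix" step concrete, here's a rough sketch that fits an affine transform to the matched point pairs by least squares (this assumes an affine model is sufficient; if you need a full projective homography, something like Accord.NET's RANSAC homography estimator is the usual route):

```csharp
// Sketch: least-squares fit of a 2D affine transform (6 parameters) from matched
// point pairs, e.g. the best grid correspondences found by tracking. src[i] and
// dst[i] are corresponding (x, y) points; at least 3 non-collinear pairs are needed.
using System;

static class AffineFit
{
    // Returns {a, b, c, d, e, f} such that x' = a*x + b*y + c and y' = d*x + e*y + f.
    public static double[] Fit(double[][] src, double[][] dst)
    {
        // Normal-equation matrix (shared by both rows of the transform) and the
        // two right-hand sides, one for x' and one for y'.
        double[,] m = new double[3, 3];
        double[] vx = new double[3], vy = new double[3];
        for (int i = 0; i < src.Length; i++)
        {
            double[] p = { src[i][0], src[i][1], 1.0 };
            for (int r = 0; r < 3; r++)
            {
                for (int c = 0; c < 3; c++) m[r, c] += p[r] * p[c];
                vx[r] += p[r] * dst[i][0];
                vy[r] += p[r] * dst[i][1];
            }
        }
        double[] abc = Solve3x3((double[,])m.Clone(), vx);
        double[] def = Solve3x3((double[,])m.Clone(), vy);
        return new[] { abc[0], abc[1], abc[2], def[0], def[1], def[2] };
    }

    // Tiny Gaussian elimination with partial pivoting for a 3x3 system.
    static double[] Solve3x3(double[,] a, double[] b)
    {
        b = (double[])b.Clone();
        for (int col = 0; col < 3; col++)
        {
            int pivot = col;
            for (int r = col + 1; r < 3; r++)
                if (Math.Abs(a[r, col]) > Math.Abs(a[pivot, col])) pivot = r;
            for (int c = 0; c < 3; c++) { double t = a[col, c]; a[col, c] = a[pivot, c]; a[pivot, c] = t; }
            double tb = b[col]; b[col] = b[pivot]; b[pivot] = tb;

            for (int r = col + 1; r < 3; r++)
            {
                double f = a[r, col] / a[col, col];
                for (int c = col; c < 3; c++) a[r, c] -= f * a[col, c];
                b[r] -= f * b[col];
            }
        }
        var x = new double[3];
        for (int r = 2; r >= 0; r--)
        {
            double s = b[r];
            for (int c = r + 1; c < 3; c++) s -= a[r, c] * x[c];
            x[r] = s / a[r, r];
        }
        return x;
    }
}
```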
Hope that helps someone.
TL;DR
Given a 2-dimensional plane (say, 800x600 pixels or even 4000x4000) that can contain obstacles (static objects) or entities (like tanks, or vehicles), how can computer-controlled tanks navigate the map without colliding with static objects while pursuing other tanks? Please note that every static object or entity can rotate freely through 360 degrees and has an arbitrary size.
Context
What I am really trying to do is to develop a game with tanks. It initially started as a modern alternative to an old arcade game called "Battle City". At first, it might have been easy to develop an AI, considering the 13x13 grid, fixed sizes and no rotation. But now, with free rotation and arbitrary sizes, I am unable to find a way of replicating such behavior in these circumstances.
The computer-controlled tanks must be able to navigate a map full of obstacles and pursue the player. I think that the main problem is generating all the possible positions for a tank to go to; the collision system is already implemented and waiting to be used. For example, tanks might be able to fit through tight spaces (which can be diagonal, for instance) just by adjusting their angle of rotation.
Although I have quite some experience in programming, this is way beyond my reach. Even though I would prefer a broader answer regarding the implementation of the tanks' artificial intelligence, a method for generating the said paths might suffice.
I initially thought about using graphs, but I do not know how to apply them considering that different tanks have different sizes, and the rotation gives me a headache. Then again, if I were to use graphs, what would a node represent? A pixel? 16,000,000 nodes would be quite a large number.
What I am using
C# as the main programming language;
MonoGame (XNA Framework alternative) for rendering;
RotatedRectangle class (http://www.xnadevelopment.com/tutorials/rotatedrectanglecollisions/rotatedrectanglecollisions.shtml).
I am looking forward to your guidance. Thank you for your time!
I've been working on a crowd simulation project that included pathfinding and avoidance of obstacles and other people.
We used Recast Navigation, an all-in-one library which implements state-of-the-art navigation mesh algorithms.
You can get more info here: https://github.com/memononen/recastnavigation
In our project it has proven to be reliable and very configurable. Even though it's written in C++, you can easily find or make a wrapper (in our case, we used it wrapped in JavaScript on a Node.js server!).
If you don't want to use this library, you can still take a look at Navigation Meshes, which is the underlying theory behind Recast.
I hope it will help!
A navigation mesh is what you're looking for. To explain a bit: in theory it's really easy. You build your world (2D/3D) and, after creation, you generate a new mesh that tells entities where they are allowed to move without colliding with the surroundings. They then move on this mesh. Next is the path generation algorithm, which is basically nothing more than checking, in some mathematical form, how to get across this mesh to the target. On an actual navigation mesh this gets rather complicated, but it's easy if you think of a grid where you check which fields to move through to get the shortest way (see the sketch below).
So, in short, you need some additional layer over your world that tells the AI where it is allowed to move, and some algorithm suited to your type of layer to calculate the path.
As a hint, for Unity, for example, there are many good free pre-built solutions. You will also find a bunch of good libraries that achieve this without a game engine like Unity.
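To make the grid analogy concrete, here is a minimal sketch of a shortest-path search (plain breadth-first search) on a boolean walkability grid. How you rasterize your rotated obstacles into the grid, what cell size you pick, and whether you upgrade to A* with a heuristic are all up to you; this just shows the shape of the idea:

```csharp
// Sketch: breadth-first search on a coarse walkability grid.
// walkable[x, y] == true means the tank (inflated by its radius) can occupy that cell.
// Returns the list of cells from start to goal, or null if no path exists.
using System.Collections.Generic;
using System.Drawing;

static class GridPath
{
    public static List<Point> FindPath(bool[,] walkable, Point start, Point goal)
    {
        int w = walkable.GetLength(0), h = walkable.GetLength(1);
        var cameFrom = new Dictionary<Point, Point>();
        var queue = new Queue<Point>();
        cameFrom[start] = start;
        queue.Enqueue(start);

        int[] dx = { 1, -1, 0, 0 };
        int[] dy = { 0, 0, 1, -1 };

        while (queue.Count > 0)
        {
            Point cur = queue.Dequeue();
            if (cur == goal)
            {
                // Walk the chain of predecessors back to the start.
                var path = new List<Point>();
                for (Point p = goal; p != start; p = cameFrom[p]) path.Add(p);
                path.Add(start);
                path.Reverse();
                return path;
            }
            for (int i = 0; i < 4; i++)
            {
                var next = new Point(cur.X + dx[i], cur.Y + dy[i]);
                if (next.X < 0 || next.Y < 0 || next.X >= w || next.Y >= h) continue;
                if (!walkable[next.X, next.Y] || cameFrom.ContainsKey(next)) continue;
                cameFrom[next] = cur;
                queue.Enqueue(next);
            }
        }
        return null; // goal not reachable from start
    }
}
```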
Despite Googling around a fair amount, the only things that surfaced were on neural networks and using existing APIs to find tags about an image, and on webcam tracking.
What I would like to do is create my own data set for some objects (a database containing the images of a product (or a fingerprint of each image), and manufacturer information about the product), and then use some combination of machine learning and object detection to find if a given image contains any product from the data I've collected.
For example, I would like to take a picture of a chair and compare that to some data to find which chair is most likely in the picture from the chairs in my database.
What would be an approach to tackling this problem? I have already considered using OpenCV, and feel that this is a starting point and probably how I'll detect the object, but I've not found how to use this to solve my problem.
I think in the end it doesn't matter what tool you use to tackle your problem. You will probably need some kind of machine learning. It's hard to say which method would give the best detection; for this I'd recommend using a tool like Weka. It's a collection of multiple machine learning algorithms and lets you easily try out what works best for you.
Before you can start trying out the machine learning, you will first need to extract some features from your dataset, since you can hardly compare the images pixel by pixel: that would result in huge computational effort and would not necessarily even provide the needed results. Try to extract features which make your images unique, like average colour or brightness, and maybe try to extract some shapes or sizes from the image. In the end you will feed your algorithm just the features you extracted from your images, not the images themselves.
Which features are good is hard to define; it depends on your specific case. Generally it helps to have not just one but multiple features covering completely different aspects of the image. To extract the features you could use OpenCV, or any other image processing tool you like. Get the features of all images in your dataset and get started with the machine learning.
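As a trivial example of what "features instead of pixels" can look like, here is a sketch that reduces an image to a small vector of average brightness values over a coarse grid (the grid size is arbitrary, and GetPixel is slow but keeps the idea clear):

```csharp
// Sketch: reduce an image to a small feature vector (average brightness per grid cell).
// GetPixel is slow; for real use you'd lock the bitmap bits, but this keeps the idea clear.
using System.Drawing;

static class SimpleFeatures
{
    public static double[] AverageBrightnessGrid(Bitmap image, int cells)
    {
        var features = new double[cells * cells];
        int cellW = image.Width / cells, cellH = image.Height / cells;

        for (int cy = 0; cy < cells; cy++)
            for (int cx = 0; cx < cells; cx++)
            {
                double sum = 0;
                for (int y = cy * cellH; y < (cy + 1) * cellH; y++)
                    for (int x = cx * cellW; x < (cx + 1) * cellW; x++)
                        sum += image.GetPixel(x, y).GetBrightness(); // 0.0 (dark) .. 1.0 (bright)
                features[cy * cells + cx] = sum / (cellW * cellH);
            }
        return features;
    }
}
```

A vector like this (one per image) is what you would then hand to Weka or any other classifier.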
From what I understood, you want to build a Content Based Image Retrieval system.
There are plenty of methods to do this. What defines the best method to solve your problem has to do with:
the type of objects you want to recognize,
the type of images that will be introduced to search the objects,
the priorities of your system (efficiency, robustness, etc.).
You gave the example of recognizing chairs. In your system, which would be the determining factor for selecting the most similar chair? The color of the chair? The shape of the chair? These are typical questions that you have to answer before choosing the method.
Either way, one of the most widely used methods to solve such problems is the Bag-of-Words model (also referred to as Bag of Features). I wish I could help more, but for that I would need you to explain in more detail what the final goals of your work / project are.
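As a rough illustration only (this follows older Accord.NET samples; the exact class and method names have changed between library versions, so treat it as pseudocode-with-types), the Bag-of-Words workflow looks something like this:

```csharp
// Rough sketch of the Bag-of-Visual-Words workflow, loosely following older
// Accord.NET samples; method names and signatures differ between versions.
using System.Drawing;
using Accord.Imaging;

static class BagOfWordsExample
{
    public static double[][] DescribeImages(Bitmap[] trainingImages, Bitmap query)
    {
        // 1. Learn a visual vocabulary (here, 36 "visual words") by clustering
        //    local descriptors (SURF by default) extracted from the training set.
        var bow = new BagOfVisualWords(36);
        bow.Compute(trainingImages);

        // 2. Describe each image as a fixed-length histogram over those words.
        //    These vectors can then be fed to any classifier (SVM, k-NN, ...).
        var vectors = new double[trainingImages.Length + 1][];
        for (int i = 0; i < trainingImages.Length; i++)
            vectors[i] = bow.GetFeatureVector(trainingImages[i]);
        vectors[trainingImages.Length] = bow.GetFeatureVector(query);
        return vectors;
    }
}
```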
I have a set of points, the coordinates are not predetermined and I can set them when I create them, but their links are predetermined. A point can have one or more links, but not zero.
I want to be able to generate a visual representation of these points at positions where these link lines between them will not intersect. From what I've learned in researching so far, I believe this will be somewhat similar to a planar graph, however there will be points with only one link, and I'm not sure planar graphs are able to represent these.
I'm not sure if there is a good way of doing what I am trying to do or not, but I'll admit that maths is not my strong suit. My 'best' idea so far is to somehow detect these intersections and then move points in a direction that takes the intersection location into account, repositioning them so that that particular intersection no longer occurs, and to loop, doing this for every point until no more intersections are detected. However, there could well be a more efficient mathematical algorithm that I could use instead that I am simply unaware of.
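For what it's worth, the intersection-detection part itself seems to be standard; below is a sketch of the usual cross-product test for whether two link segments cross (the part I'm really unsure about is how to move the points afterwards):

```csharp
// Sketch: do segments (p1,p2) and (p3,p4) properly cross? Standard orientation
// (cross product) test; collinear/touching-at-an-endpoint cases are ignored here.
using System.Drawing;

static class Segments
{
    static double Cross(PointF o, PointF a, PointF b)
    {
        return (a.X - o.X) * (b.Y - o.Y) - (a.Y - o.Y) * (b.X - o.X);
    }

    public static bool Intersect(PointF p1, PointF p2, PointF p3, PointF p4)
    {
        double d1 = Cross(p3, p4, p1);
        double d2 = Cross(p3, p4, p2);
        double d3 = Cross(p1, p2, p3);
        double d4 = Cross(p1, p2, p4);

        // The segments cross if each segment's endpoints lie on opposite
        // sides of the other segment.
        return ((d1 > 0) != (d2 > 0)) && ((d3 > 0) != (d4 > 0));
    }
}
```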
I'm interested in all advice here, whether it is efficient or not.
This is not an easy problem. Here are the Google search results for "algorithm to draw a planar graph". The Boost C++ libraries have some support for drawing planar embeddings, including an example. These of course use C++, not the C# you tagged the question with.
OK. This is part of a (non-English) OCR project. I have already completed preprocessing steps like deskewing, grayscaling, segmentation of glyphs, etc., and am now stuck at the most important step: identification of a glyph by comparing it against a database of glyph images. I thus need to devise a robust and efficient perceptual image hashing algorithm.
For many reasons, the function I require won't be as complicated as required by the generic image comparison problem. For one, my images are always grayscale (or even B&W if that makes the task of identification easier). For another, those glyphs are more "stroke-oriented" and have simpler structure than photographs.
I have tried some of my own and some borrowed ideas for defining a good similarity metric. One method was to divide the image into a grid of M x N cells and take the average "blackness" of each cell to create a hash for that image, and then take the Euclidean distance between the hashes to compare the images. Another was to find "corners" in each glyph and then compare their spatial positions. Neither has proven to be very robust.
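For reference, the first method was essentially this (a sketch; M and N are whatever grid size I happened to try):

```csharp
// Sketch of the grid-average-blackness hash described above, plus the Euclidean
// distance used to compare two hashes. Assumes grayscale or B&W glyph bitmaps.
using System;
using System.Drawing;

static class GlyphHash
{
    // Average darkness (0 = white, 1 = black) of each cell in an M x N grid.
    public static double[] Compute(Bitmap glyph, int m, int n)
    {
        var hash = new double[m * n];
        for (int gy = 0; gy < n; gy++)
            for (int gx = 0; gx < m; gx++)
            {
                int x0 = gx * glyph.Width / m, x1 = (gx + 1) * glyph.Width / m;
                int y0 = gy * glyph.Height / n, y1 = (gy + 1) * glyph.Height / n;
                double sum = 0;
                for (int y = y0; y < y1; y++)
                    for (int x = x0; x < x1; x++)
                        sum += 1.0 - glyph.GetPixel(x, y).GetBrightness();
                hash[gy * m + gx] = sum / Math.Max(1, (x1 - x0) * (y1 - y0));
            }
        return hash;
    }

    // Euclidean distance between two hashes; smaller = more similar glyphs.
    public static double Distance(double[] a, double[] b)
    {
        double sum = 0;
        for (int i = 0; i < a.Length; i++)
            sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.Sqrt(sum);
    }
}
```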
I know there are stronger candidates like SIFT and SURF out there, but I have three good reasons not to use them. One is that I believe they are proprietary (or somehow patented) and cannot be used in commercial apps. The second is that they are very general purpose and would probably be overkill for my somewhat simpler domain of images. The third is that there are no implementations available (I'm using C#). I have even tried to convert the pHash library to C#, but remained unsuccessful.
So I'm finally here. Does anyone know of code (C#, C++, Java or VB.NET, but it shouldn't require any dependencies that cannot be used in the .NET world), a library, an algorithm, a method or an idea for creating a robust and efficient hashing algorithm that can survive minor visual defects like translation, rotation, scaling, blur, spots, etc.?
It looks like you've already tried something similar to this, but it may still be of some use:
https://www.memonic.com/user/aengus/folder/coding/id/1qVeq
I was hoping that I could get some guidance from the Stack Overflow community regarding a dilemma I have run into for my senior project. First off, I want to state that I am a novice programmer, and I'm sure some of you will quickly tell me this project was way over my head. I've quickly become well aware that this is probably true.
Now that that's out of the way, let me give some definitions:
Project Goal:
The goal of the project, like many others have sought to achieve in various SO questions (many of which have been very helpful to me in the course of this effort), is to detect whether a parking space is full or available, eventually reporting such back to the user (ideally via an iPhone or Droid or other mobile app for ease of use -- this aspect was quickly deemed outside the scope of my efforts due to time constraints).
Tools in Use:
I have made heavy use of the resources of the AForge.Net library, which has provided me with all of the building blocks for bringing the project together in terms of capturing video from an IP camera, applying filters to images, and ultimately completing the goal of detection. As a result, you will have gathered that I chose to program in C#, mainly due to its ease of use for beginners. Other options included MATLAB/C++, C++ with OpenCV, and other alternatives.
The Problem
Here is where I have run into issues. Below is linked an image that has been pre-processed in the AForge Image Processing Lab. The sequence of filters and processes used was: Grayscale, Histogram Equalization, Sobel Edge Detection and finally Otsu Thresholding (though I'm not convinced the final step is needed).
http://i.stack.imgur.com/u6eqk.jpg
As you can tell from the image with the naked eye, of course, there are sequences of detected edges which clearly are parked cars in the spaces I am monitoring with the camera. These cars are clearly defined by the pattern of brightened wheels, the sort of "double railroad track" pattern that essentially represents the outer edging of the side windows, and even the outline of the license plate in this instance. Specifically though, in a continuation of the project the camera chosen would be a PTZ to cover as much of the block as possible, and thus I'd just like to focus on the side features of the car (eliminating factors such as the license plate). Features such as a rectangle for a sunroof may also be considered, but obviously this is not a universal feature of cars, whereas the general window outline is.
We can all see that there are differences to these patterns, varying of course with car make and model. But, generally this sequence not only results in successful retrieval of the desired features, but also eliminates the road from view (important as I intend to use road color as a "first litmus test" if you will for detecting an empty space... if I detect a gray level consistent with data for the road, especially if no edges are detected in a region, I feel I can safely assume an empty space). My question is this, and hopefully it is generic enough to be practically beneficial to others out there on the site:
Focused Question:
Is there a way to take an image segment (via cropping) and then compare the detected edge sequence with future new frames from the camera? More specifically, is there a way to do this while allowing leeway, essentially creating a tolerance threshold for minor differences in edges?
Personal Thoughts/Brainstorming on The Question:
-- I'm sure there's a way to literally compare pixel by pixel: crop to just the rectangle around your edges and then slide your cropped image through the new processed frame, comparing pixel by pixel, but that wouldn't help much unless you had an exact match to your detected edges.
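For illustration, the sort of "tolerance" I have in mind would be something like counting how many pixels disagree between the cropped reference edges and a same-sized region of the new frame, and accepting the comparison if that fraction stays below some empirically tuned threshold (a sketch):

```csharp
// Sketch: fraction of pixels that disagree between two same-sized, binarized
// edge images. A "match" would mean this fraction is below some tolerance
// value that has to be tuned empirically.
using System.Drawing;

static class EdgeCompare
{
    public static double DisagreementFraction(Bitmap edgesA, Bitmap edgesB)
    {
        int differing = 0;
        for (int y = 0; y < edgesA.Height; y++)
            for (int x = 0; x < edgesA.Width; x++)
            {
                bool a = edgesA.GetPixel(x, y).GetBrightness() > 0.5; // edge present?
                bool b = edgesB.GetPixel(x, y).GetBrightness() > 0.5;
                if (a != b) differing++;
            }
        return (double)differing / (edgesA.Width * edgesA.Height);
    }
}
```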
All help is appreciated, and I'm more than happy to clarify as needed as well.
Let me give it a shot.
You have two images. Let's call them BeforePic and AfterPic. For each of these two pictures you have an ROI (region of interest) - AKA a cropped segment.
You want to see if AfterPic.ROI is very different from BeforePic.ROI. By "very different" I mean that the difference is greater than some threshold.
If this is indeed your problem, then it should be split into three parts:
get BeforePic and AfterPic (and the ROI for each).
Translate the abstract concept of picture/edge difference into a numerical one.
compare the difference to some threshold.
The first part isn't really a part of your question, so I'll ignore it.
The last part basically comes down to finding the right threshold. Again, out of the scope of the question.
The second part is what I think is the heart of the question (I hope I'm not completely off here). For this I would use the Shape Context algorithm (in the PDF, it'll be best for you to implement it up to section 3.3, as it gets more robust than you need from section 3.4 on).
Shape Context is an image matching algorithm that uses image edges, with great success rates.
Implementing this was my finals project, and it seems like a perfect match (no pun intended) for you. If your edges are good and your ROI is accurate, it won't fail you.
It may take some time to implement, but if done correctly, this will work perfectly for you.
Bear in mind that a poor implementation might run slowly; I've seen a worst case of 5 seconds per image. A good (yet not perfect) implementation, on the other hand, will take less than 0.1 seconds per image.
Hope this helps, and good luck!
Edit: I found an implementation of Shape Context in C# on CodeProject, if it's of any interest.
I take on a fair number of machine vision problems in my work and the most important thing I can tell you is that simpler is better. The more complex the approach, the more likely it is for unanticipated boundary cases to create failures. In industry, we usually address this by simplifying conditions as much as possible, imposing rigid constraints that limit the number of things we need to consider. Granted, a student project is different than an industry project, as you need to demonstrate an understanding of particular techniques, which may well be more important than whether it is a robust solution to the problem you've chosen to take on.
A few things to consider:
Are there pre-defined parking spaces on the street? Do you have the option to manually pre-define the parking regions that will be observed by the camera? This can greatly simplify the problem.
Are you allowed to provide incorrect results when cars are parked illegally (taking up more than one spot, for instance)?
Are you allowed to provide incorrect results when there are unexpected environmental conditions, such as trash, pot holes, pooled water or snow in the space?
Do you need to support all categories of vehicles (cars, flat-bed trucks, vans, delivery trucks, motorcycles, mini electric cars, tripod vehicles, ?)
Are you allowed to take a baseline snapshot of the street with no cars present?
As for comparing two sets of edges, probably the most robust approach is known as geometric model finding (describing the edges of interest mathematically as a series of 'edgels', combining them into chains and comparing the geometry), but this is overkill for your application. I would look more toward thresholding the count of 'edge pixels' present in a parking region, or differencing against a baseline image (you need to be careful of image shift, however, since material expansion from outdoor temperature changes may cause the field of view to change slightly due to the camera mechanically moving).
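A sketch of that edge-pixel-count idea, assuming the parking spaces have been pre-defined as rectangles and the frame has already been through your edge-detection pipeline (the margin is something you would calibrate against an empty-space baseline):

```csharp
// Sketch: count edge pixels inside a pre-defined parking-space rectangle of an
// already edge-detected (binarized) frame, and decide occupancy by comparing
// against a baseline count taken when the space was known to be empty.
using System.Drawing;

static class SpaceOccupancy
{
    public static int CountEdgePixels(Bitmap edgeImage, Rectangle space)
    {
        int count = 0;
        for (int y = space.Top; y < space.Bottom; y++)
            for (int x = space.Left; x < space.Right; x++)
                if (edgeImage.GetPixel(x, y).GetBrightness() > 0.5)
                    count++;
        return count;
    }

    // Occupied if the region contains noticeably more edge pixels than the
    // empty-space baseline; 'margin' absorbs noise and slight camera shift.
    public static bool IsOccupied(Bitmap edgeImage, Rectangle space, int baselineCount, int margin)
    {
        return CountEdgePixels(edgeImage, space) > baselineCount + margin;
    }
}
```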