Sliding Window Object Detection - C#

I am trying to implement a sliding window technique to identify my hypotheses. I am using a 64 x 128 window, but I also want to find objects smaller or larger than my window. I have heard of an approach where, after a full traversal with the kernel window, the image is resized and the sliding window is placed at the coordinates obtained from the current position and the scaling factor, but I have no idea what that technique is called...
Does anyone know how to solve this issue?
Thank you for your time!

I assume that you are referring to the pyramidal approach, which is frequently used in optical flow.
The patch is first detected at the upper pyramid scale, then the obtained coordinates are rescaled and the patch is searched again at the lower pyramid scale.
Haar object detection does not need image rescaling, as the features can easily be rescaled using the integral image.
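A minimal sketch of that multi-scale sliding window in C#, assuming a hypothetical classify(image, window) callback and System.Drawing for the resizing; the 8-pixel stride and the 0.8 shrink factor per level are assumptions to tune, and detections found on a downscaled level are mapped back to the original image via the accumulated scale factor:

using System;
using System.Collections.Generic;
using System.Drawing;

static class MultiScaleDetector
{
    // Slide a fixed 64x128 window over an image pyramid and collect hits
    // in original-image coordinates.
    public static List<Rectangle> Detect(Bitmap image, Func<Bitmap, Rectangle, bool> classify)
    {
        var hits = new List<Rectangle>();
        const int winW = 64, winH = 128;
        const double scaleStep = 0.8;

        double scale = 1.0;
        var level = image;
        while (level.Width >= winW && level.Height >= winH)
        {
            for (int y = 0; y + winH <= level.Height; y += 8)
                for (int x = 0; x + winW <= level.Width; x += 8)
                    if (classify(level, new Rectangle(x, y, winW, winH)))
                        hits.Add(new Rectangle(                     // rescale to original coordinates
                            (int)(x / scale), (int)(y / scale),
                            (int)(winW / scale), (int)(winH / scale)));

            scale *= scaleStep;                                     // next (smaller) pyramid level
            level = new Bitmap(image, (int)(image.Width * scale), (int)(image.Height * scale));
        }
        return hits;
    }
}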

Related

Comparing focus of 2 images

I'm trying to develop an object detection algorithm. I plan to compare two images taken with different focus: one image correctly focused on the object and one correctly focused on the background.
From reading about autofocus algorithms, I think this can be done with a contrast-detection passive autofocus algorithm, which works on the light intensity at the sensor.
But I'm not sure that the light intensity values from the image file are the same as those from the sensor (it's not a RAW image file, just a JPEG). Are the light intensity values in a JPEG the same as on the sensor? Can I use them to detect focus correctness with contrast detection? Is there a better way to detect which area of the image is correctly focused?
I have tried to process the images a bit and I saw some progress. This is what I did using OpenCV:
- converted the images to gray using cvtColor(I, Mgrey, CV_RGB2GRAY);
- downsampled/decimated them a bit since they are huge (several MB)
- took the sum of absolute horizontal and vertical gradients using http://docs.opencv.org/modules/imgproc/doc/filtering.html?highlight=sobel#cv.Sobel.
The result is below. The foreground, when in focus, does look brighter than the background, and vice versa.
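For reference, a rough Emgu CV (the .NET OpenCV wrapper, since the rest of this page is C#) version of these three steps; the single PyrDown and the default 3x3 Sobel kernel are assumptions:

using Emgu.CV;
using Emgu.CV.CvEnum;

// Grayscale, downsample, then sum the absolute horizontal and vertical gradients.
Mat FocusMeasure(Mat color)
{
    var grey = new Mat();
    CvInvoke.CvtColor(color, grey, ColorConversion.Bgr2Gray);

    var small = new Mat();
    CvInvoke.PyrDown(grey, small);                     // halve the size; repeat for very large images

    var gx = new Mat(); var gy = new Mat();
    CvInvoke.Sobel(small, gx, DepthType.Cv32F, 1, 0);  // horizontal gradient
    CvInvoke.Sobel(small, gy, DepthType.Cv32F, 0, 1);  // vertical gradient

    var ax = new Mat(); var ay = new Mat();
    CvInvoke.ConvertScaleAbs(gx, ax, 1, 0);            // |dx| as 8-bit
    CvInvoke.ConvertScaleAbs(gy, ay, 1, 0);            // |dy| as 8-bit

    var sum = new Mat();
    CvInvoke.Add(ax, ay, sum);                         // bright where the image is in focus
    return sum;
}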
You can probably try to match and subtract these images using the translation obtained from matchTemplate() on the original gray images, and then assemble the pieces using the convex hull of the results as an initialization mask for grab cut, plugging in the color images. In case you aren't familiar with grab cut, check out my answer to this question.
But maybe a simpler method will work here as well. You can try to apply a strong blur to your gradient images instead of precise matching and see what the difference gives you in this case. The images below demonstrate the idea when I turned the difference into binary masks.
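A rough sketch of that blur-and-difference idea, again with Emgu CV; gradA and gradB stand for the two gradient images from above, and the 51x51 kernel and the threshold of 10 are assumptions to tune:

using Emgu.CV;
using Emgu.CV.CvEnum;
using System.Drawing;

// Heavily blur both gradient images, subtract, and turn the difference into a binary mask.
var blurA = new Mat(); var blurB = new Mat();
CvInvoke.GaussianBlur(gradA, blurA, new Size(51, 51), 0);
CvInvoke.GaussianBlur(gradB, blurB, new Size(51, 51), 0);

var diff = new Mat();
CvInvoke.Subtract(blurA, blurB, diff);                          // non-zero where image A is sharper

var mask = new Mat();
CvInvoke.Threshold(diff, mask, 10, 255, ThresholdType.Binary);  // binary mask of the in-focus region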
It would be helpful to see your images. If I understood you correctly, you are trying to separate background from foreground using a focus (or blur) cue. Contrast in the image depends on focus, but it also depends on the contrast of the target. So if the target is clouds, you will never get sharp edges or high contrast. Finally, a JPEG image that uses little compression should not affect the critical properties of your algorithm.
I would try to get a number of images at all possible focal lengths in a row and then build a graph of the contrast as a function of focal length (or, even better, focusing distance). The peak in this graph will give you the distance to the object regardless of the object's own contrast. Note, however, that the accuracy of such visual cues goes down sharply with viewing distance.
This is what I expect you to obtain when measuring the sum of absolute gradients in a small window:
The next step for you will be to combine the areas that are in focus with the areas of solid color, that is, areas that have no particular peak in the graph but nonetheless belong to the same object. Sometimes taking a convex hull of the focused areas can help to pinpoint the rough boundary of the object.

Emgu - How to extract the images likely to represent an icon or control from a screenshot?

I'm working on an experimental project in which the challenge is to identify and extract an image of the icon or control that the user has clicked on/touched. The method I'm trying is as follows (I need some help with step 3):
1) Take a screen shot when the user clicks/touches the screen:
2) Apply edge detection:
3) Extract the possible icon images around the Point associated with the user's cursor (Don't know how to do this)
There are easier cases in which the mouse-over event highlights the icon/control, which allows me to identify the control with a simple screenshot comparison (before and after mouse-over). The above method is specifically for cases in which the icon is not highlighted. I'm new to Emgu, so if anyone has any pointers on how to better achieve this, I'm all ears.
Cheers! Matt
Instead of doing edge detection, consider taking the following steps:
1) Only grab pixels which are within a certain radius of the point of the user's cursor. Create a new image with just these pixels.
2) Use thresholding to classify the pixels into foreground and background.
3) Calculate the centroid (use the mean x coordinate and mean y coordinate). Calculate the deviation from the mean. Discard foreground pixels which are beyond a certain deviation from the mean, e.g. discard pixels that are more than 1.6 deviations from the mean. (You may need to experiment with this step.)
4) Use a convex hull to find the area of the image with the icon in it (a sketch of these steps follows below).
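A hedged C#/Emgu sketch of these four steps; the Otsu threshold, the 1.6-deviation cut-off, and all names are assumptions:

using System;
using System.Collections.Generic;
using System.Drawing;
using System.Linq;
using Emgu.CV;
using Emgu.CV.CvEnum;
using Emgu.CV.Structure;

// Returns a bounding box (in screen coordinates) around the icon near the cursor.
Rectangle FindIconRegion(Image<Gray, byte> screenGray, Point cursor, int radius)
{
    // 1) Keep only the pixels within `radius` of the cursor.
    var roi = new Rectangle(cursor.X - radius, cursor.Y - radius, 2 * radius, 2 * radius);
    roi.Intersect(new Rectangle(Point.Empty, screenGray.Size));
    var patch = screenGray.Copy(roi);

    // 2) Threshold into foreground and background (Otsu picks the threshold automatically).
    var mask = new Image<Gray, byte>(patch.Size);
    CvInvoke.Threshold(patch, mask, 0, 255, ThresholdType.Binary | ThresholdType.Otsu);

    // 3) Centroid and deviation of the foreground pixels; discard outliers beyond 1.6 deviations.
    var points = new List<PointF>();
    for (int y = 0; y < mask.Height; y++)
        for (int x = 0; x < mask.Width; x++)
            if (mask.Data[y, x, 0] > 0) points.Add(new PointF(x, y));
    float mx = points.Average(p => p.X), my = points.Average(p => p.Y);
    double sx = Math.Sqrt(points.Average(p => (p.X - mx) * (p.X - mx)));
    double sy = Math.Sqrt(points.Average(p => (p.Y - my) * (p.Y - my)));
    var kept = points.Where(p => Math.Abs(p.X - mx) <= 1.6 * sx &&
                                 Math.Abs(p.Y - my) <= 1.6 * sy).ToArray();

    // 4) Convex hull of the surviving pixels, then its bounding box in screen coordinates.
    PointF[] hull = CvInvoke.ConvexHull(kept);
    int minX = (int)hull.Min(p => p.X), maxX = (int)hull.Max(p => p.X);
    int minY = (int)hull.Min(p => p.Y), maxY = (int)hull.Max(p => p.Y);
    var box = new Rectangle(minX, minY, maxX - minX, maxY - minY);
    box.Offset(roi.Location);
    return box;
}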

High CPU load when changing background image of Canvas containing overlay elements

I am working on an application that loads live video images from a camera and displays an overlay on top of said image. The overlay does not change often, so it can be considered still. However, it usually contains about 1,000 to 10,000 lines.
When the video image is updated, there is a notable impact on CPU load depending on whether the overlay is visible or not. The overlay is neither invalidated nor changed; only the image behind it is changing.
My setup is this:
<Canvas>
    <Image/>
    <Canvas>
        <OverlayElement 1/>
        <OverlayElement 2/>
        <OverlayElement 3/>
        <.../>
    </Canvas>
</Canvas>
The Image's Source is a WriteableBitmap. Every time a new camera image (type byte[]) is available, the main Canvas' Dispatcher is invoked to write the image data by using WriteableBitmap.WritePixels().
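The per-frame update looks roughly like this (a sketch; field names such as mainCanvas and writeableBitmap are placeholders):

// Called whenever the camera delivers a new frame as byte[].
void OnCameraFrame(byte[] frame)
{
    mainCanvas.Dispatcher.Invoke(() =>
    {
        // writeableBitmap is the Image.Source; size and stride match the camera format.
        writeableBitmap.WritePixels(
            new Int32Rect(0, 0, writeableBitmap.PixelWidth, writeableBitmap.PixelHeight),
            frame,
            writeableBitmap.BackBufferStride,
            0);
    });
}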
The inner Canvas contains all overlay elements, being
- a contour (PolyLine),
- a circle (Path with EllipseGeometry), and
- a set of rays (Path with one Figure containing LineSegments).
The number n of points in the contour equals the number of line segments in the last-mentioned Path. n is usually around 1,000 - 3,000.
Depending on the count and length of the lines shown in the overlay, the CPU load for showing a live image varies (it increases if length or count go up), even if the overlay does not change. At some point this affects the frame rate and makes the program unusable. Line length is mostly correlated with line intersections, so maybe the Path is struggling to calculate its fill area even though it is not painted?
So how could I improve the performance here?
What bugs me most is that even if the overlay does not change, the render time increases with its primitive count. I would expect constant render time once the overlay has been drawn in its last set state. What could I do to achieve that, aside from rendering the whole overlay to a bitmap?
I am also open to suggestions on how to get the byte[] onto the screen more efficiently. Just keep in mind that this problem is part of a bigger application and I cannot change all of its paradigms while concentrating on how to get the image drawn.
What I have tried so far:
1. Override the OnRender() method of the inner Canvas, drawing the overlay myself. This works fine but has the performance issue that brings me here ;)
2. Use Shapes (PolyLine, Ellipse, Path) as the inner Canvas' children to hold the overlay elements. This works, too. It is faster to redraw the overlay when it changes, but on the other hand it worsens the performance issue when updating the background image.
3. Like 2., but call Freeze() on Geometries wherever possible (see the sketch below). This has little or no performance impact.
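A minimal sketch of attempt 3, with placeholder names (overlayCanvas is the inner Canvas from the XAML above; centerX, centerY and radius come from the camera data):

using System.Windows;
using System.Windows.Media;
using System.Windows.Shapes;

// Build the circle from a frozen geometry before adding it to the overlay.
var circleGeometry = new EllipseGeometry(new Point(centerX, centerY), radius, radius);
circleGeometry.Freeze();                                   // make the geometry immutable and shareable
var circle = new Path { Data = circleGeometry, Stroke = Brushes.Yellow, StrokeThickness = 1 };
overlayCanvas.Children.Add(circle);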
Thanks for your help in advance.

Image Co-Ordinate System Design

I'm diving into something without sufficient background, but I feel like there may be simple solutions that don't require me to have in-depth knowledge of the topic.
What I am trying to do is have an image co-ordinate system. Basically the user will supply an image, like a house plan. They can then click on points in the image and create markers (like google maps). The next time they retrieve the map, all the markers they added before are there and they can add new ones.
I need to identify the points these markers are located on so I can store that information. I also need to be able to create a layer on the image that contains the markers and renders them in the exact locations they were placed.
I imagine the easiest way to do this is to use pixel co-ordinates... the rub here is that the image won't be a fixed size, since there is a web application and an iPad application, so the co-ordinate system needs to work as long as the image keeps the same aspect ratio.
The server side is .NET and, as mentioned, there is an iPad app, so the solution needs to be viable given that tech stack.
Any ideas?
Instead of using pixel coordinates in absolute terms, you can use the 0 to 1 range. The top left corner is (0,0), the bottom right is (1,1), and the center of the image is (0.5,0.5). This way, no matter what image size (or zoom level) you have, the markers will always be in the same place.
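A tiny C# sketch of that mapping (plain doubles, so it works on the .NET server side and is trivial to port to the iPad app):

// Convert between pixel coordinates and resolution-independent 0..1 coordinates.
static (double X, double Y) ToNormalized(double px, double py, double imgW, double imgH)
    => (px / imgW, py / imgH);

static (double X, double Y) ToPixels(double nx, double ny, double imgW, double imgH)
    => (nx * imgW, ny * imgH);

// Example: a marker stored as (0.5, 0.5) is drawn at the center at any display size.
// var (x, y) = ToPixels(0.5, 0.5, displayedWidth, displayedHeight);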
My suggestion is: don't try to figure out the correlation between the actual image and the coordinates. The only thing I would do is take the resolution of the image, e.g. 800x600, and use that for your grid. Then overlay your markers on the image using that grid. The points you'd remember would just be X and Y values and maybe a tag name/ID.

How to detect region of least energy in an Image

I want to programmatically place text on an image in an area where there is the least "going on". It has been some time since I took computer vision; could someone point me in the right direction, either with respect to C# or Matlab?
I suggest dividing the image into distinct regions, each the size of the space you need for the text overlay. Calculate some measure of visual "energy", such as standard deviation, and choose the region with the lowest value. You could also slide a window around, looking for an arbitrary space of low energy, but this would be computationally much more expensive.
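A hedged C# sketch of the block-based version, assuming the image is already available as a 2-D array of grayscale intensities:

using System;
using System.Drawing;

// Split the image into text-sized blocks, compute the standard deviation of
// intensity per block, and return the block with the lowest "energy".
static Rectangle QuietestRegion(byte[,] gray, int blockW, int blockH)
{
    int h = gray.GetLength(0), w = gray.GetLength(1);
    double best = double.MaxValue;
    var bestRect = new Rectangle(0, 0, blockW, blockH);

    for (int by = 0; by + blockH <= h; by += blockH)
        for (int bx = 0; bx + blockW <= w; bx += blockW)
        {
            double sum = 0, sumSq = 0;
            int n = blockW * blockH;
            for (int y = by; y < by + blockH; y++)
                for (int x = bx; x < bx + blockW; x++)
                {
                    double v = gray[y, x];
                    sum += v;
                    sumSq += v * v;
                }
            double mean = sum / n;
            double stdDev = Math.Sqrt(sumSq / n - mean * mean);
            if (stdDev < best) { best = stdDev; bestRect = new Rectangle(bx, by, blockW, blockH); }
        }
    return bestRect;
}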
If you have the Image Processing Toolbox for Matlab, you can run an entropy filter (ENTROPYFILT) on the image, matching the filter size to the size of your text. Then, all you need to do is find the filter result with the smallest value, and you have the center of where you want to put the text.
