I am trying to find duplicate videos in my database. To do so, I grab one frame from each video in a pair, resize both frames to the same width and height, and then compare the two images pixel by pixel.
I have a case where the images from a pair of videos look like this:
-----
These are actually the same video (and the same frame), but because of the different aspect ratios of the videos (16:9, 4:3, etc.) the pixel-by-pixel comparison reports no match.
If my standard size is 50x50, how can I transform any region of interest (ROI) to 50x50?
For the above example:
Pixel [5,0] shall be [0,0]
Pixel [45,0] shall be [50,0]
Pixel [5,50] shall be [0,50]
Pixel [45,50] shall be [50,50]
and all other pixels should be transformed accordingly.
Encouraged by the OP's comment that pseudo-code would be helpful ....
I have no knowledge of "emgucv", so I will answer in pseudo-code.
Definition
Let SRC be a source image - to be read.
Let DST be a destination image - to be written.
Both SRC and DST are 2D arrays that can be accessed as ARRAY[int pixelX, int pixelY].
Here is the pseudo-code :-
input : int srcMinX, srcMinY, srcMaxX, srcMaxY;

float linearGra(float dst1, float dst2, float src1, float src2, float dst3){
    return ( (dst3-dst1)*src2 + (dst2-dst3)*src1 ) / (dst2-dst1);
}

for(int y=0; y<50; y++){      //y of DST
    for(int x=0; x<50; x++){  //x of DST
        float xSRC = linearGra(0,50, srcMinX,srcMaxX, x);
        float ySRC = linearGra(0,50, srcMinY,srcMaxY, y);
        DST[x,y] = SRC[round(xSRC), round(ySRC)];  //find nearest pixel
    }
}
Description
The main idea is to use linear-interpolation.
The function linearGra takes two points in a 2D graph, (dst1,src1) and (dst2,src2).
Assuming a linear relationship (which holds here, because scaling plus translation is a linear mapping between the SRC and DST coordinates), it finds the point (dst3,?) lying on that line.
I use this function to calculate the pixel coordinate in SRC that matches a certain pixel in DST.
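If you want to try the same idea directly in C# on plain System.Drawing bitmaps (rather than Emgu CV structures), a minimal sketch might look like the following; GetPixel/SetPixel are used only for clarity, and the method name and the clamping are my own additions:

    using System;
    using System.Drawing;

    // Remap a rectangular ROI of src onto a size x size destination bitmap,
    // using the same linear interpolation + nearest-pixel lookup as the pseudo-code above.
    static Bitmap RemapRoi(Bitmap src, int srcMinX, int srcMinY,
                           int srcMaxX, int srcMaxY, int size = 50)
    {
        var dst = new Bitmap(size, size);
        for (int y = 0; y < size; y++)
        {
            for (int x = 0; x < size; x++)
            {
                // Equivalent to linearGra(0, size, srcMin, srcMax, coordinate).
                float xSrc = srcMinX + (srcMaxX - srcMinX) * (x / (float)size);
                float ySrc = srcMinY + (srcMaxY - srcMinY) * (y / (float)size);

                // Nearest pixel, clamped so we never index outside the source.
                int xs = Math.Min(src.Width - 1, Math.Max(0, (int)Math.Round(xSrc)));
                int ys = Math.Min(src.Height - 1, Math.Max(0, (int)Math.Round(ySrc)));

                dst.SetPixel(x, y, src.GetPixel(xs, ys));
            }
        }
        return dst;
    }

With that, RemapRoi(frame, 5, 0, 45, 50) would produce the 50x50 image described in the question; for speed you would replace GetPixel/SetPixel with LockBits.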
Further work
If you are a perfectionist, you may want to :-
bound the index (xSRC,ySRC) so it does not go out of bounds
improve the accuracy :-
The code above ignores some pixels (it uses nearest-neighbour sampling without interpolation).
The better approach is to integrate over all the SRC pixels involved, but you will get a slightly blurry image in some cases.
You may also be interested in this opencv link (not emgucv).
Related
I need an inverse perspective transform written in Pascal/Delphi/Lazarus. See the following image:
I think I need to walk through destination pixels and then calculate the corresponding position in the source image (To avoid problems with rounding errors etc.).
function redraw_3d_to_2d(sourcebitmap: tbitmap; sourceaspect: extended;
  point_a, point_b, point_c, point_d: tpoint; megapixelcount: integer): tbitmap;
var
  destinationbitmap: tbitmap;
  x, y, sx, sy: integer;
begin
  destinationbitmap := tbitmap.create;
  destinationbitmap.width := megapixelcount * sourceaspect * ???;  // I don't know how to calculate this
  destinationbitmap.height := megapixelcount * sourceaspect * ???; // I don't know how to calculate this
  for x := 0 to destinationbitmap.width - 1 do
    for y := 0 to destinationbitmap.height - 1 do
    begin
      sx := ??;
      sy := ??;
      destinationbitmap.canvas.pixels[x, y] := sourcebitmap.canvas.pixels[sx, sy];
    end;
  result := destinationbitmap;
end;
I need the real formula... So an OpenGL solution would not be ideal...
Note: There is a version of this with proper math typesetting on the Math SE.
Computing a projective transformation
A perspective transformation is a special case of a projective transformation, which in turn is defined by four points.
Step 1: Starting with the 4 positions in the source image, named (x1,y1) through (x4,y4), you solve the following system of linear equations:
[x1 x2 x3]   [λ]   [x4]
[y1 y2 y3] ∙ [μ] = [y4]
[ 1  1  1]   [τ]   [ 1]
The columns form homogeneous coordinates: one dimension more, created by adding a 1 as the last entry. In subsequent steps, multiples of these vectors will be used to denote the same points. See the last step for an example of how to turn these back into two-dimensional coordinates.
Step 2: Scale the columns by the coefficients you just computed:
    [λ∙x1 μ∙x2 τ∙x3]
A = [λ∙y1 μ∙y2 τ∙y3]
    [λ    μ    τ   ]
This matrix will map (1,0,0) to a multiple of (x1,y1,1), (0,1,0) to a multiple of (x2,y2,1), (0,0,1) to a multiple of (x3,y3,1) and (1,1,1) to (x4,y4,1). So it will map these four special vectors (called basis vectors in subsequent explanations) to the specified positions in the image.
Step 3: Repeat steps 1 and 2 for the corresponding positions in the destination image, in order to obtain a second matrix called B.
This is a map from basis vectors to destination positions.
Step 4: Invert B to obtain B⁻¹.
B maps from basis vectors to the destination positions, so the inverse matrix maps in the reverse direction.
Step 5: Compute the combined Matrix C = A∙B⁻¹.
B⁻¹ maps from destination positions to basis vectors, while A maps from there to source positions. So the combination maps destination positions to source positions.
Step 6: For every pixel (x,y) of the destination image, compute the product
[x']     [x]
[y'] = C∙[y]
[z']     [1]
These are the homogeneous coordinates of your transformed point.
Step 7: Compute the position in the source image like this:
sx = x'/z'
sy = y'/z'
This is called dehomogenization of the coordinate vector.
All this math would be so much easier to read and write if SO were to support MathJax… ☹
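For those who prefer reading code, here is a rough C# sketch of steps 1 through 7. The PointD struct and the 3x3 helper routines are my own illustrative additions (no library is assumed); only the math follows the steps above:

    using System;

    struct PointD { public double X, Y; public PointD(double x, double y) { X = x; Y = y; } }

    static class Projective
    {
        // Steps 1 + 2: build the matrix that maps basis vectors to four given points.
        static double[,] BasisToPoints(PointD p1, PointD p2, PointD p3, PointD p4)
        {
            // Step 1: solve [p1 p2 p3; 1 1 1] * (λ, μ, τ)^T = (p4, 1)^T.
            var m = new double[,] { { p1.X, p2.X, p3.X }, { p1.Y, p2.Y, p3.Y }, { 1, 1, 1 } };
            var v = Multiply(Invert(m), new[] { p4.X, p4.Y, 1.0 });   // (λ, μ, τ)

            // Step 2: scale the columns by λ, μ, τ.
            return new double[,] {
                { v[0] * p1.X, v[1] * p2.X, v[2] * p3.X },
                { v[0] * p1.Y, v[1] * p2.Y, v[2] * p3.Y },
                { v[0],        v[1],        v[2]        } };
        }

        // Steps 3-5: C = A * B^-1 maps destination coordinates to source coordinates.
        public static double[,] DestinationToSource(PointD[] src, PointD[] dst)
        {
            var a = BasisToPoints(src[0], src[1], src[2], src[3]);
            var b = BasisToPoints(dst[0], dst[1], dst[2], dst[3]);
            return Multiply(a, Invert(b));
        }

        // Steps 6 + 7: apply C to a destination pixel and dehomogenize.
        public static PointD Map(double[,] c, double x, double y)
        {
            var h = Multiply(c, new[] { x, y, 1.0 });
            return new PointD(h[0] / h[2], h[1] / h[2]);
        }

        // Minimal 3x3 linear algebra helpers.
        static double[] Multiply(double[,] m, double[] v)
        {
            var r = new double[3];
            for (int i = 0; i < 3; i++)
                r[i] = m[i, 0] * v[0] + m[i, 1] * v[1] + m[i, 2] * v[2];
            return r;
        }

        static double[,] Multiply(double[,] a, double[,] b)
        {
            var r = new double[3, 3];
            for (int i = 0; i < 3; i++)
                for (int j = 0; j < 3; j++)
                    r[i, j] = a[i, 0] * b[0, j] + a[i, 1] * b[1, j] + a[i, 2] * b[2, j];
            return r;
        }

        static double[,] Invert(double[,] m)
        {
            // Inverse via the adjugate; fine for well-conditioned 3x3 matrices.
            double det =
                  m[0, 0] * (m[1, 1] * m[2, 2] - m[1, 2] * m[2, 1])
                - m[0, 1] * (m[1, 0] * m[2, 2] - m[1, 2] * m[2, 0])
                + m[0, 2] * (m[1, 0] * m[2, 1] - m[1, 1] * m[2, 0]);
            var r = new double[3, 3];
            r[0, 0] =  (m[1, 1] * m[2, 2] - m[1, 2] * m[2, 1]) / det;
            r[0, 1] = -(m[0, 1] * m[2, 2] - m[0, 2] * m[2, 1]) / det;
            r[0, 2] =  (m[0, 1] * m[1, 2] - m[0, 2] * m[1, 1]) / det;
            r[1, 0] = -(m[1, 0] * m[2, 2] - m[1, 2] * m[2, 0]) / det;
            r[1, 1] =  (m[0, 0] * m[2, 2] - m[0, 2] * m[2, 0]) / det;
            r[1, 2] = -(m[0, 0] * m[1, 2] - m[0, 2] * m[1, 0]) / det;
            r[2, 0] =  (m[1, 0] * m[2, 1] - m[1, 1] * m[2, 0]) / det;
            r[2, 1] = -(m[0, 0] * m[2, 1] - m[0, 1] * m[2, 0]) / det;
            r[2, 2] =  (m[0, 0] * m[1, 1] - m[0, 1] * m[1, 0]) / det;
            return r;
        }
    }

To redraw the image you would compute C once from the four source and four destination corners via DestinationToSource, then loop over every destination pixel (x, y), call Map(C, x, y) and copy the source pixel at the resulting (sx, sy), exactly as in the loop sketched in the question.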
Choosing the image size
The above approach assumes that you know the location of your corners in the destination image. For these you have to know the width and height of that image, which is marked by question marks in your code as well. So let's assume the height of your output image were 1, and the width were sourceaspect. In that case, the overall area would be sourceaspect as well. You have to scale that area by a factor of pixelcount/sourceaspect to achieve an area of pixelcount. Which means that you have to scale each edge length by the square root of that factor. So in the end, you have
pixelcount = 1000000.*megapixelcount;
width = round(sqrt(pixelcount*sourceaspect));
height = round(sqrt(pixelcount/sourceaspect));
Use Graphics32, specifically TProjectiveTransformation (to use with the Transform method). Don't forget to leave some transparent margin in your source image so you don't get jagged edges.
I have seen a lot of topics about this; I understand the theory, but I am not able to code it.
I have some pictures and I want to determine whether they are blurred or not. I found a library (AForge.dll) and I used it to compute the FFT of an image.
As an example, here are the two images I'm working on:
My code is in C#:
public Bitmap PerformFFT(Bitmap picture)
{
    // Load the image into a complex-valued representation
    ComplexImage output = ComplexImage.FromBitmap(picture);
    // Perform the FFT in place
    output.ForwardFourierTransform();
    // Return the spectrum as a bitmap
    return output.ToBitmap();
}
How can I determine whether the image is blurred? I am not very comfortable with the theory, so I need a concrete example. I saw this post, but I have no idea how to do that.
EDIT:
I'll clarify my question. Once I have the 2D array of complex values in ComplexImage output (the image's FFT), what C# code (or pseudo-code) can I use to determine whether the image is blurred?
The concept of "blurred" is subjective. How much power at high frequencies indicates it's not blurry? Note that a blurry image of a complex scene has more power at high frequencies than a sharp image of a very simple scene. For example a sharp picture of a completely uniform scene has no high frequencies whatsoever. Thus it is impossible to define a unique blurriness measure.
What is possible is to compare two images of the same scene, and determine which one is more blurry (or, equivalently, which one is sharper). This is what is used in automatic focusing. I don't know exactly what process commercial cameras use, but in microscopy, images are taken at a series of focal depths and compared.
One of the classical comparison methods doesn't involve Fourier transforms at all. One computes the local variance (for each pixel, take a small window around it and compute the variance for those values), and averages it across the image. The image with the highest variance has the best focus.
Comparing high vs low frequencies as in MBo's answer would be comparable to computing the Laplace filtered image, and averaging its absolute values (because it can return negative values). The Laplace filter is a high-pass filter, meaning that low frequencies are removed. Since the power in the high frequencies gives a relative measure of sharpness, this statistic does too (again relative, it is to be compared only to images of the same scene, taken under identical circumstances).
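For concreteness, here is a minimal C# sketch of the local-variance measure described above; the window radius, the grayscale weights and the use of GetPixel (LockBits would be much faster) are my own choices:

    using System.Drawing;

    // Average local variance over the image: higher means sharper, but only as a
    // relative measure between images of the same scene.
    static double MeanLocalVariance(Bitmap image, int radius = 2)
    {
        // Convert to grayscale first.
        int w = image.Width, h = image.Height;
        var gray = new double[w, h];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
            {
                Color c = image.GetPixel(x, y);
                gray[x, y] = 0.299 * c.R + 0.587 * c.G + 0.114 * c.B;
            }

        double total = 0;
        int count = 0;
        for (int y = radius; y < h - radius; y++)
            for (int x = radius; x < w - radius; x++)
            {
                // Mean and variance of the (2*radius+1)^2 window around (x, y).
                double sum = 0, sumSq = 0;
                int n = 0;
                for (int dy = -radius; dy <= radius; dy++)
                    for (int dx = -radius; dx <= radius; dx++)
                    {
                        double v = gray[x + dx, y + dy];
                        sum += v; sumSq += v * v; n++;
                    }
                double mean = sum / n;
                total += sumSq / n - mean * mean;   // variance of the window
                count++;
            }
        return count > 0 ? total / count : 0;
    }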
A blurred image has an FFT with smaller magnitudes in the high-frequency regions. Array elements with low indexes (near Result[0][0]) represent the low-frequency region.
So divide the resulting array by some criterion, sum the magnitudes in both regions and compare them. For example, select the quarter of the result array (of size M) with indexX < M/2 and indexY < M/2.
For a series of progressively more blurred versions of the same initial image, you should see a higher and higher ratio Sum(Low)/Sum(High).
The result is a square N×N array. It has central symmetry (F(x,y) = F(-x,-y), because the source is purely real), so it is enough to treat the top half of the array, with y < N/2.
Low-frequency components are located near the top-left and top-right corners of the array (smallest values of y; smallest and largest values of x). So sum the magnitudes of the array elements in these ranges:
for y in range 0..N/2
    for x in range 0..N
        amp = magnitude(y,x)
        if (y < N/4) and ((x < N/4) or (x >= 3*N/4))
            low = low + amp
        else
            high = high + amp
Note that your picture shows the array pieces rearranged - it is standard practice to shift the zero-frequency component to the center for display.
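A rough C# version of this split, assuming you have already extracted an N×N array of magnitudes from the FFT result (the function name and the layout with zero frequency at [0,0], i.e. no shift applied, are assumptions):

    // Ratio of low-frequency to high-frequency energy for an N x N magnitude array.
    // A higher ratio suggests more blur, but only relative to images of the same scene.
    static double LowHighRatio(double[,] magnitude)
    {
        int n = magnitude.GetLength(0);
        double low = 0, high = 0;

        for (int y = 0; y < n / 2; y++)          // central symmetry: top half is enough
            for (int x = 0; x < n; x++)
            {
                double amp = magnitude[y, x];
                bool isLow = y < n / 4 && (x < n / 4 || x >= 3 * n / 4);
                if (isLow) low += amp; else high += amp;
            }

        return high > 0 ? low / high : double.PositiveInfinity;
    }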
I'm working on a photographic mosaic algorithm. There are 4 steps involved:
Determine segment regions
Determine cost of each candidate image at each segment region
Determine best assignment of each candidate image to each segment region
Render photographic mosaic.
The whole process is relatively straightforward; however, Step 2 involves comparing n candidate images with m segments, with n >> m. This is by far the most time-intensive step.
Here is the process I go through for each segment-candidate pair:
Determine if the candidate image is compatible with the segment dimensions. If not, the assignment is assumed to be forbidden.
Using an intermediate sub-picture Bitmap created with Graphics.DrawImage(Image, Rectangle, Rectangle, GraphicsUnit), I convert the bitmap data into red, green, and blue int[,] matrices for the segment of the original image. I use the LockBits() method instead of the GetPixel() method as it is vastly faster. To reduce computation time, these matrices are only about 3x3 or 5x5 rather than the full dimensions of the original segment.
I do the same process with the candidate image, creating red, green, and blue 3x3 or 5x5 int[,] matrices.
Starting with cost = 0, I add the magnitude of the difference of the red, green, and blue values of the source and candidate image segments to the cost. The sum of these absolute differences is the assignment cost (sketched in code after this list).
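For reference, the cost in the last step boils down to something like the following sketch (the method name and matrix parameters are illustrative):

    using System;

    // Sum-of-absolute-differences cost between two small RGB matrices
    // (e.g. the 3x3 or 5x5 red/green/blue int[,] arrays described above).
    static int AssignmentCost(int[,] rA, int[,] gA, int[,] bA,
                              int[,] rB, int[,] gB, int[,] bB)
    {
        int cost = 0;
        int w = rA.GetLength(0), h = rA.GetLength(1);
        for (int x = 0; x < w; x++)
            for (int y = 0; y < h; y++)
            {
                cost += Math.Abs(rA[x, y] - rB[x, y]);
                cost += Math.Abs(gA[x, y] - gB[x, y]);
                cost += Math.Abs(bA[x, y] - bB[x, y]);
            }
        return cost;
    }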
In reality, I check each candidate image with all 16 RotateFlipType transformations, so 16*n*m comparisons are needed, where n is the number of candidate images and m is the number of segment regions.
I'm wondering whether I could do an FFT of each image and, rather than comparing each pixel, compare only the low-frequency components, since the high-frequency components will not substantially affect the output. On the other hand, a lot of the overhead, such as getting the sub-images and converting them to matrices, is still there, and my gut tells me a spectral comparison will be slower than a basic comparison of 25 int values.
First, I would get a huge speed-up by:
creating info for each image, such as:
the average color and R/G/B histograms (I think 8 or 16 bins per channel will suffice). You can add any other info (darkest/brightest color, ...), but it should be rotation/flip invariant.
index-sorting the images by average color
Limit R, G, B to a few bits only (like 4) and create a single integer number from them, like
col = R + (G<<4) + (B<<8);
and finally index-sort the used images by this number.
comparison
Binary-search the index-sorted images (if you create a table of indexes for each reduced color, this drops to O(1)) and find only the images whose average color is close or equal to your segment's.
Then find the closest histogram matches among those, and apply your full comparison only to those images...
The histogram comparison can be done with a correlation coefficient, any distance metric, or a statistical deviation...
As for the FFT part of your question, I think it is more or less already answered in the comments. Yes, you can use it, but I think it is overkill for this. The overhead is huge, though you could rescale the images to a low resolution and run the FFT on that, or just compare the low-resolution images directly instead.
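A small sketch of the reduced-color key idea in C# (4 bits per channel packed into one int, as in the col = ... line above; the method name is illustrative and GetPixel is used only for brevity):

    using System.Drawing;

    // Pack the average color of an image into a 12-bit key:
    // col = R + (G << 4) + (B << 8), with each channel reduced to its top 4 bits.
    static int AverageColorKey(Bitmap image)
    {
        long r = 0, g = 0, b = 0;
        int w = image.Width, h = image.Height;
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
            {
                Color c = image.GetPixel(x, y);   // LockBits would be faster in practice
                r += c.R; g += c.G; b += c.B;
            }
        long n = (long)w * h;
        int r4 = (int)(r / n) >> 4, g4 = (int)(g / n) >> 4, b4 = (int)(b / n) >> 4;
        return r4 + (g4 << 4) + (b4 << 8);
    }

Candidate images sorted by this key can then be binary-searched (e.g. Array.BinarySearch on the sorted keys, or a Dictionary<int, List<int>> mapping each key to image indexes) before any per-pixel cost is computed.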
[Notes]
Also using HSV instead of RGB could improve visual similarity
I am using the Lucas-Kanade optical flow algorithm from the OpenCV library in C#. I have a series of frames, and for every pair of consecutive frames I want to find the optical flow and show it in a PictureBox.
I can fetch velX & velY from the following function:
Emgu.CV.OpticalFlow.LK(imGrayCurrent, imGrayNext, windSize, velX, velY);
Now, how should I use these two to show the flow between the two frames? In other words, how do I get the displacement of the pixels?
Thanks
A common way is to use an HSV-to-RGB transformation; see the Middlebury Flow Dataset. The steps:
Transform motion vectors to polar coordinates:
length = sqrt(velx² + vely²)
angle = acos(vely / length) [note: additional checks are needed, e.g. for a zero-length vector]
Normalize the angle to [0,360] (for OpenCV; or [0,1] depending on the function you use later) and the length to [0,1]. Create an HSV image (a 3-channel image whose first channel is H (hue), second is S (saturation) and third is V (value)) and set H = angle, S = length and V = 1.
Convert the HSV color space to RGB, e.g. by using cv::cvtColor(hsv, rgb, HSV2BGR);
The resulting image now shows the motion vector field (velx, vely), where the hue denotes the direction of the motion and the saturation its length (speed).
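Since the question is in C#, here is a rough sketch of the same idea without relying on cvtColor, assuming velX/velY are available as float[,] arrays indexed as [x, y]; atan2 is used instead of acos to get the full angular range without extra sign handling, and the HSV-to-RGB conversion is written out by hand:

    using System;
    using System.Drawing;

    // Visualize a flow field: hue = direction, saturation = normalized speed, value = 1.
    static Bitmap FlowToBitmap(float[,] velX, float[,] velY)
    {
        int w = velX.GetLength(0), h = velX.GetLength(1);

        // Find the maximum speed for normalization.
        double maxLen = 1e-9;
        for (int x = 0; x < w; x++)
            for (int y = 0; y < h; y++)
                maxLen = Math.Max(maxLen, Math.Sqrt(velX[x, y] * velX[x, y] + velY[x, y] * velY[x, y]));

        var bmp = new Bitmap(w, h);
        for (int x = 0; x < w; x++)
            for (int y = 0; y < h; y++)
            {
                double len = Math.Sqrt(velX[x, y] * velX[x, y] + velY[x, y] * velY[x, y]);
                double hue = (Math.Atan2(velY[x, y], velX[x, y]) + Math.PI) / (2 * Math.PI) * 360.0; // [0,360)
                double sat = len / maxLen;                                                           // [0,1]
                bmp.SetPixel(x, y, HsvToRgb(hue, sat, 1.0));
            }
        return bmp;
    }

    // Standard HSV -> RGB conversion (h in degrees, s and v in [0,1]).
    static Color HsvToRgb(double h, double s, double v)
    {
        int i = (int)Math.Floor(h / 60.0) % 6;
        double f = h / 60.0 - Math.Floor(h / 60.0);
        double p = v * (1 - s), q = v * (1 - f * s), t = v * (1 - (1 - f) * s);
        double r, g, b;
        switch (i)
        {
            case 0: r = v; g = t; b = p; break;
            case 1: r = q; g = v; b = p; break;
            case 2: r = p; g = v; b = t; break;
            case 3: r = p; g = q; b = v; break;
            case 4: r = t; g = p; b = v; break;
            default: r = v; g = p; b = q; break;
        }
        return Color.FromArgb((int)(r * 255), (int)(g * 255), (int)(b * 255));
    }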
How can I compare two images and determine whether they are 100% similar, or only altered in color or by cropping?
Well, abstractly speaking, you need to define a similarity function that compares two images. To determine whether the images are "100% similar" (equal) you can do the following (a minimal sketch follows this list):
compare the sizes of the images
if the image sizes are the same simply subtract the pixels from each other
if ( sum( abs( pixel_1_i - pixel_2_i ) ) / num_pixels < threshold ) return true
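For the equality case, a minimal C# sketch (GetPixel is used for clarity, and the threshold value is an arbitrary choice):

    using System;
    using System.Drawing;

    // Mean absolute per-channel difference; the images are treated as "equal" when it
    // is below a small threshold (to tolerate compression artefacts). Same size assumed.
    static bool AreEqual(Bitmap a, Bitmap b, double threshold = 2.0)
    {
        if (a.Width != b.Width || a.Height != b.Height)
            return false;

        long sum = 0;
        for (int y = 0; y < a.Height; y++)
            for (int x = 0; x < a.Width; x++)
            {
                Color ca = a.GetPixel(x, y), cb = b.GetPixel(x, y);
                sum += Math.Abs(ca.R - cb.R) + Math.Abs(ca.G - cb.G) + Math.Abs(ca.B - cb.B);
            }

        double meanDiff = sum / (3.0 * a.Width * a.Height);
        return meanDiff < threshold;
    }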
For the case where the images are differently colored or cropped:
apply an edge detector to both images
compute the cross-correlation (in the frequency domain, FFT)
find the highest peak
place the (smaller) edge map in the determined position
calculate the absolute error
if (error < threshold) return true
BTW: This approach will not work if your images are scaled or rotated.
Further Research:
cross-correlation: FFT (fast fourier transformation, link1, link2, FFT in C#), zero-padding (needed for the FFT if the input signals have different sizes)
edge detection: Sobel, Canny (these are very common image processing filters, they should be available in a C# library, just like the FFT)
The following is a fairly simplistic approach to the problem and won't work well with two different photographs of the same subject taken from slightly different angles, but would work if you had two copies of the same image that you wanted to verify.
The case of two identical images is straightforward - just loop through the pixel arrays, subtracting one RGB value from the other. If the difference is less than a small tolerance then the pixel is identical. Thus, as soon as you find a pixel difference greater than the tolerance, you know that the images are different.
You could allow a certain number or percentage of differences to account for differences caused by compression artefacts.
To check for alterations in colour you could look at the HLS (Hue, Lightness and Saturation) values instead. If the pixels have the same L & S values but a different H value then it's just the colour that's different (I think).
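A rough sketch of that per-pixel check in C#, using System.Drawing.Color's GetHue/GetSaturation/GetBrightness accessors (the tolerance values are arbitrary, and hue wrap-around at 0/360 is ignored for brevity):

    using System;
    using System.Drawing;

    // True if two pixels differ only in hue (i.e. a re-coloring), within tolerances.
    static bool OnlyHueDiffers(Color a, Color b, float tol = 0.02f)
    {
        bool sameLightness  = Math.Abs(a.GetBrightness() - b.GetBrightness()) < tol;
        bool sameSaturation = Math.Abs(a.GetSaturation() - b.GetSaturation()) < tol;
        bool hueDiffers     = Math.Abs(a.GetHue() - b.GetHue()) > 5f;   // degrees
        return sameLightness && sameSaturation && hueDiffers;
    }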
Cropping is more difficult as you have to try to find the location of the smaller image in the larger one.
You can use object descriptors such as:
SIFT - http://en.wikipedia.org/wiki/Scale-invariant_feature_transform
SURF - http://en.wikipedia.org/wiki/SURF
Then compare the images using the calculated descriptors. These descriptors let you deal with rotated, scaled and slightly changed images.
The descriptors are also built from oriented gradients, which makes them robust to illumination and color changes as well.
You can use Accord.NET (SURF implementation).