I'm working on a photographic mosaic algorithm. There are 4 steps involved:
Determine segment regions
Determine cost of each candidate image at each segment region
Determine best assignment of each candidate image to each segment region
Render photographic mosaic.
The whole process is relatively straightforward; however, Step 2 involves comparing n candidate images with m segments, with n >> m. This is by far the most time-intensive step.
Here is the process I go through for each segment-candidate pair:
Determine if the candidate image is compatible with the segment dimensions. If not, the assignment is assumed to be forbidden.
Using an intermediate sub-picture Bitmap created with Graphics.DrawImage(Image, Rectangle, Rectangle, GraphicsUnit), I convert the bitmap data into red, green, and blue int[,] matrices for the segment of the original image. I use the LockBits() method instead of the GetPixel() method as it is vastly faster. To reduce computation time, these matrices are only about 3x3 or 5x5 rather than the full dimensions of the original segment.
I do the same process with the candidate image, creating red, green, and blue 3x3 or 5x5 int[,] matrices.
Starting with cost = 0, I add the magnitude of the difference of the red, green, and blue values of the source and candidate image segments to the cost. The sum of these absolute differences is the assignment cost.
In reality, I check each candidate image with all 16 RotateFlipType transformations, so there are 16*n*m comparisons needed, where n is the number of candidate images and m is the number of segments.
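For clarity, the cost computation is roughly the following (a simplified sketch: the sub-picture extraction and the RotateFlipType loop are omitted, and the names are just illustrative):

using System;

// Sum of absolute differences over the reduced RGB matrices.
// All six matrices are assumed to have the same dimensions (e.g. 3x3 or 5x5).
static int AssignmentCost(int[,] segR, int[,] segG, int[,] segB,
                          int[,] candR, int[,] candG, int[,] candB)
{
    int cost = 0;
    for (int y = 0; y < segR.GetLength(0); y++)
    {
        for (int x = 0; x < segR.GetLength(1); x++)
        {
            cost += Math.Abs(segR[y, x] - candR[y, x]);
            cost += Math.Abs(segG[y, x] - candG[y, x]);
            cost += Math.Abs(segB[y, x] - candB[y, x]);
        }
    }
    return cost;
}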
I'm wondering whether I can perhaps do an FFT of each image and, rather than comparing each pixel, compare only the low-frequency components, since the high-frequency components will not substantially affect the output. On the other hand, a lot of the overhead, such as getting the sub-images and converting them to matrices, is still there, and my gut tells me a spectral comparison will be slower than a basic comparison of 25 int values.
First, I would get a huge speed-up by:
creating info for each image, like:
average color and R/G/B histograms (I think 8 or 16 bins per channel will suffice). You can add any other info (darkest/brightest color, ...), but it should be rotation/flip invariant
index sorting the images by average color
To do that, limit R, G, B to just a few bits (e.g. 4) and create a single integer number from them, like
col = R + (G << 4) + (B << 8);
and finally index sort the used images by this number (see the sketch just below).
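In C#, the packed key and a per-color lookup table could be sketched like this (a rough sketch; all names are illustrative):

using System.Collections.Generic;
using System.Drawing;

// Reduce each channel to 4 bits and pack into a single key (0..4095).
static int ReducedColorKey(Color avg)
{
    int r = avg.R >> 4, g = avg.G >> 4, b = avg.B >> 4;
    return r + (g << 4) + (b << 8);
}

// Lookup table: reduced color -> indexes of images with that average color.
static Dictionary<int, List<int>> BuildColorIndex(Color[] averageColors)
{
    var index = new Dictionary<int, List<int>>();
    for (int i = 0; i < averageColors.Length; i++)
    {
        int key = ReducedColorKey(averageColors[i]);
        if (!index.TryGetValue(key, out var list))
            index[key] = list = new List<int>();
        list.Add(i);
    }
    return index;
}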
[Comparison]
Binary search the index-sorted images (if you create a table of indexes per reduced color, this lookup is also reduced to O(1)) and find only the images whose average color is close or equal to your segment's.
Then find the closest histogram matches among these, and only then apply everything else you have to those images...
The histogram comparison can be done with a correlation coefficient, any distance metric, or a statistical deviation...
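For example, a simple sum-of-absolute-differences distance between two equally binned histograms could be sketched as:

using System;

// Sum of absolute bin differences between two histograms of equal length
// (e.g. 16 bins per channel; compare per channel or on concatenated histograms).
static double HistogramDistance(double[] h1, double[] h2)
{
    double d = 0;
    for (int i = 0; i < h1.Length; i++)
        d += Math.Abs(h1[i] - h2[i]);
    return d; // smaller means more similar
}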
As for the FFT part of your question, I think it is more or less already answered by the comments. Yes, you can use it, but I think it is overkill for this. The overhead is huge, though you could rescale the images to low resolution and use the FFT on that, or just compare the low-res images directly instead.
[Notes]
Also, using HSV instead of RGB could improve visual similarity.
Related
I have some photos of white pages with some black points drawn on them, like this:
photo (the points aren't very circular, I can draw them better),
and I would like to find the coordinates of these points.
I can binarize the images (the previous photo binarized: image), but how can I find the coordinates of these black points? I need only the coordinates of one pixel for each point, the approximate center.
This is for a school assignment.
Since it's for school work, I will only provide you with a high-level algorithm.
Since the background is guaranteed to be white, you are in luck.
First you need to define a threshold on the level of black that you want to consider as the black dot's color.
#ffffff is pure white and #000000 is pure black. I would suggest somewhere around #383838 as your threshold.
Then you make a two-dimensional bool array to keep track of which pixels you have already visited.
Now we can start looking at the picture.
You read the pixels one at a time, scanning horizontally, and check whether each pixel is darker than the threshold. If it is, you do a DFS or BFS to find the entire connected area whose pixels are also darker than the threshold.
During the process you will be marking the bool array we created earlier to indicate that you have already visited the pixel.
Since it's a roughly circular point, you can just take the min and max of the x and y coordinates and calculate the center point.
Once you are done with one point, you keep looping through the picture's pixels and find the points that you have not visited yet (false in the bool array).
Since the points in your photo contain some small dots near the edge that are not connected to the large point, you might have to do some math to check whether the radius is greater than some number before considering it a valid point. Alternatively, instead of a radius-1 neighborhood, you could run the BFS/DFS with a 5-10 pixel neighborhood to include the dots that are really close to the main point.
The basics for processing image data can be found in other questions, so I won't go into deeper detail about that, but for the threshold check specifically, I'd do it by gathering the red, green and blue bytes of each pixel (as indicated in the answer I linked), combining them into a Color c = Color.FromArgb(r, g, b), and testing whether it is "dark" using c.GetBrightness() < brightnessThreshold. A value of 0.4 was a good threshold for your test image.
You should store the result of this threshold detection in an array in which each item is a value that indicates whether the threshold check passed or failed. This means you can use something as simple as a two-dimensional Boolean array with the original image's height and width.
If you already have methods of doing all that, all the better. Just make sure you have some kind of array in which you can easily look up the result of that binarization. If the method you have gives you the result as an image, you will more likely end up with a simple one-dimensional byte array, but then your lookups will simply be of the form imagedata[y * stride + x]. This is functionally identical to how internal lookups in a two-dimensional array happen, so it won't be any less efficient.
Now, the real stuff in here, as I said in my comment, would be an algorithm to detect which pixels should be grouped together to one "blob".
The general usage of this algorithm is to loop over every single pixel on the image, then check if A) it cleared the threshold, and B) it isn't already in one of your existing detected blobs. If the pixel qualifies, generate a new list of all threshold-passed pixels connected to this one, and add that new list to your list of detected blobs. I used the Point class to collect coordinates, making each of my blobs a List<Point>, and my collection of blobs a List<List<Point>>.
As for the algorithm itself, what you do is make two collections of points. One is the full collection of neighbouring points you're building up (the points list), the other is the current edge you're scanning (the current edge list). The current edge list will start out containing your origin point, and the following steps will loop as long as there are items in your current edge list:
Add all items from the current edge list into the full points list.
Make a new collection for your next edge (the next edge list).
For each point in your current edge list, get a list of its directly neighbouring points (excluding any that would fall outside the image bounds), and check for all of these points if they clear the threshold, and if they are not already in either the points list or the next edge list. Add the points that pass the checks to the next edge list.
After this loop through the current edge list ends, replace the original current edge list by the next edge list.
...and, as I said, loop these steps as long as your current edge list after this last step is not empty.
This will create an edge that expands until it matches all threshold-clearing pixels, and will add them all to the list. Eventually, as all neighbouring pixels end up in the main list, the new generated edge list will become empty, and the algorithm will end. Then you add your new points list to the list of blobs, and any pixels you loop over after that can be detected as already being in those blobs, so the algorithm is not repeated for them.
There are two ways of doing the neighbouring points; you either get the four points around it, or all eight. The difference is that using four will not make the algorithm do diagonal jumps, while using eight will. (An added effect is that one causes the algorithm to expand in a diamond shape, while the other expands in a square.) Since you seem to have some stray pixels around your blobs, I advise you to get all eight.
As Steve pointed out in his answer, a very quick way of doing checks to see if a point is present in a collection is to create a two-dimensional Boolean array with the dimensions of the image, e.g. Boolean[,] inBlob = new Boolean[height, width];, which you keep synchronized with the actual points list. So whenever you add a point, you also mark the [y, x] position in the Boolean array as true. This will make rather heavy checks of the if (collection.contains(point)) type as simple as if (inBlob[y,x]), which requires no iterations at all.
I had a List<Boolean[,]> inBlobs which I kept synced with the List<List<Point>> blobs I built, and in the expanding-edge algorithm I kept such a Boolean[,] for both the next edge list and the points list (the latter of which was added to inBlobs at the end).
As I commented, once you have your blobs, just loop over the points inside them per blob and get the minimums and maximums for both X and Y, so you end up with the boundaries of the blob. Then just take the averages of those to get the center of the blob.
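To make the above concrete, here is a rough C# sketch of the expanding-edge search (8-neighbour version), assuming the binarization result is already available as a bool[,]; all names are illustrative and it is a simplified version rather than my exact implementation:

using System.Collections.Generic;
using System.Drawing;

// dark[y, x] is true for pixels that passed the threshold check.
static List<List<Point>> FindBlobs(bool[,] dark)
{
    int height = dark.GetLength(0), width = dark.GetLength(1);
    var inBlob = new bool[height, width];
    var blobs = new List<List<Point>>();

    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
        {
            if (!dark[y, x] || inBlob[y, x])
                continue;

            // Expand an edge outward from (x, y) until no new pixels qualify.
            var points = new List<Point>();
            var edge = new List<Point> { new Point(x, y) };
            inBlob[y, x] = true;
            while (edge.Count > 0)
            {
                points.AddRange(edge);
                var nextEdge = new List<Point>();
                foreach (var p in edge)
                    for (int dy = -1; dy <= 1; dy++)
                        for (int dx = -1; dx <= 1; dx++)
                        {
                            int nx = p.X + dx, ny = p.Y + dy;
                            if (nx < 0 || ny < 0 || nx >= width || ny >= height)
                                continue;
                            if (dark[ny, nx] && !inBlob[ny, nx])
                            {
                                inBlob[ny, nx] = true;
                                nextEdge.Add(new Point(nx, ny));
                            }
                        }
                edge = nextEdge;
            }
            blobs.Add(points);
        }
    return blobs;
}

The center of each blob then follows from the min/max of the X and Y values in its point list, as described above.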
Extras:
If all your dots are guaranteed to be a significant distance apart, a very easy way to get rid of floating edge pixels is to take the edge boundaries of each blob, expand them all by a certain threshold (I took 2 pixels for that), and then loop over these rectangles and check if any intersect, and merge those that do. The Rectangle class has both an IntersectsWith() for easy checks, and a static Rectangle.Inflate for increasing a rectangle's size.
You can optimise the memory usage of the fill method by only storing the edge points (threshold-matching points with non-matching neighbours in any of the four main directions) in the main list. The final boundaries, and thus the center, will remain the same. The important thing to remember then is that, while you exclude a bunch of points from the blob list, you should mark all of them in the Boolean[,] array that's used for checking the already-processed pixels. This doesn't take up any extra memory anyway.
The full algorithm, including optimisations, in action on your photo, using 0.4 as brightness threshold:
Blue are the detected blobs, red is the detected outline (by using the memory-optimised method), and the single green pixels indicate the center points of all blobs.
[Edit]
Since it's been almost a year since I posted this, I guess I might as well link to the implementation I made of this. I actually managed to use it myself about a month after I wrote it, when recreating the video compression algorithm of an old DOS game which used chunked up diff frames.
I have seen a lot of topics about this; I understand the theory, but I'm not able to code it.
I have some pictures and I want to determine whether they are blurred or not. I found a library (aforge.dll) and I used it to compute an FFT for an image.
As an example, here are the two images I'm working on:
My code is in C#:
public Bitmap PerformFFT(Bitmap Picture)
{
    // Load the image into a complex image
    ComplexImage output = ComplexImage.FromBitmap(Picture);
    // Perform the forward FFT
    output.ForwardFourierTransform();
    // Return the result as a bitmap
    return output.ToBitmap();
}
How can I determine whether the image is blurred? I am not very comfortable with the theory and need a concrete example. I saw this post, but I have no idea how to apply it.
EDIT:
I'll clarify my question: when I have the 2D array of complex values in ComplexImage output (the image's FFT), what C# code (or pseudo-code) can I use to determine whether the image is blurred?
The concept of "blurred" is subjective. How much power at high frequencies indicates it's not blurry? Note that a blurry image of a complex scene has more power at high frequencies than a sharp image of a very simple scene. For example a sharp picture of a completely uniform scene has no high frequencies whatsoever. Thus it is impossible to define a unique blurriness measure.
What is possible is to compare two images of the same scene and determine which one is more blurry (or, equivalently, which one is sharper). This is what is used in automatic focusing. I don't know exactly what process commercial cameras use, but in microscopy, images are taken at a series of focal depths and compared.
One of the classical comparison methods doesn't involve Fourier transforms at all. One computes the local variance (for each pixel, take a small window around it and compute the variance for those values), and averages it across the image. The image with the highest variance has the best focus.
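A rough sketch of that measure, assuming the grayscale values are already in a double[,] (the window radius here is an arbitrary choice):

using System;

// Average local variance over the image, using a (2*radius+1)^2 window.
// Higher values indicate better focus for images of the same scene.
static double AverageLocalVariance(double[,] gray, int radius = 2)
{
    int h = gray.GetLength(0), w = gray.GetLength(1);
    double total = 0;
    int count = 0;
    for (int y = radius; y < h - radius; y++)
        for (int x = radius; x < w - radius; x++)
        {
            double sum = 0, sumSq = 0;
            int n = 0;
            for (int dy = -radius; dy <= radius; dy++)
                for (int dx = -radius; dx <= radius; dx++)
                {
                    double v = gray[y + dy, x + dx];
                    sum += v;
                    sumSq += v * v;
                    n++;
                }
            double mean = sum / n;
            total += sumSq / n - mean * mean; // variance of this window
            count++;
        }
    return count > 0 ? total / count : 0;
}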
Comparing high vs low frequencies as in MBo's answer would be comparable to computing the Laplace filtered image, and averaging its absolute values (because it can return negative values). The Laplace filter is a high-pass filter, meaning that low frequencies are removed. Since the power in the high frequencies gives a relative measure of sharpness, this statistic does too (again relative, it is to be compared only to images of the same scene, taken under identical circumstances).
A blurred image has an FFT result with smaller magnitudes in the high-frequency regions. Array elements with low indexes (near Result[0][0]) represent the low-frequency region.
So divide the resulting array into regions by some criterion, sum the magnitudes in both regions, and compare them. For example, select the quarter of the result array (of size M) with indexX < M/2 and indexY < M/2.
For a series of progressively more blurred images (of the same initial image) you should see a higher and higher ratio Sum(Low)/Sum(High).
The result is a square N×N array. It has central symmetry (F(x,y) = F(-x,-y) because the source is purely real), so it is enough to treat the top half of the array, with y < N/2.
Low-frequency components are located near the top-left and top-right corners of the array (smallest values of y; smallest and highest values of x). So sum the magnitudes of the array elements in these ranges:
for y in range 0..N/2
    for x in range 0..N
        amp = magnitude(y, x)
        if (y < N/4) and ((x < N/4) or (x >= 3*N/4))
            low = low + amp
        else
            high = high + amp
Note that your picture shows the array quadrants rearranged; it is standard practice to shift the zero-frequency component to the center for display.
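In C#, assuming the FFT magnitudes have already been copied into a square double[,] (however you obtain them from your FFT library), the summation could be sketched as:

using System;

// Ratio of low-frequency to high-frequency energy for an N x N FFT result
// with the zero-frequency component at [0,0] (i.e. NOT shifted to the center).
static double LowHighRatio(double[,] magnitude)
{
    int n = magnitude.GetLength(0);
    double low = 0, high = 0;
    for (int y = 0; y < n / 2; y++)      // central symmetry: the top half is enough
        for (int x = 0; x < n; x++)
        {
            double amp = magnitude[y, x];
            if (y < n / 4 && (x < n / 4 || x >= 3 * n / 4))
                low += amp;
            else
                high += amp;
        }
    return low / high; // grows as the image gets more blurred
}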
I am trying to find duplicate videos in my database, and in order to do so I grab two frames from a pair of videos, resize them to the same width and height, and then compare both images pixel by pixel.
I have a case where the images from a video pair look like the ones below:
-----
These are actually the same videos (and images), but because of the different aspect ratios (16:9, 4:3, etc.) the pixel-by-pixel comparison comes back negative (no match).
If my standard is 50x50, how can I transform any Region Of Interest to 50x50?
For the above example:
Pixel [5,0] shall be [0,0]
Pixel [45,0] shall be [50,0]
Pixel [5,50] shall be [0,50]
Pixel [45,50] shall be [50,50]
and all other pixels are transformed accordingly.
Encouraged by the OP's comment that pseudo-code can be helpful...
I have no knowledge about "emgucv", so I will answer in pseudo-code.
Definition
Let SRC be a source image - to be read.
Let DST be a destination image - to be written.
Both SRC and DST are 2D arrays that can be accessed as ARRAY[int pixelX, int pixelY].
Here is the pseudo-code :-
input: int srcMinX, srcMinY, srcMaxX, srcMaxY;

float linearGra(float dst1, float dst2, float src1, float src2, float dst3) {
    return ((dst3 - dst1) * src2 + (dst2 - dst3) * src1) / (dst2 - dst1);
}

for (int y = 0; y < 50; y++) {        // y of DST
    for (int x = 0; x < 50; x++) {    // x of DST
        float xSRC = linearGra(0, 50, srcMinX, srcMaxX, x);
        float ySRC = linearGra(0, 50, srcMinY, srcMaxY, y);
        DST[x, y] = SRC[round(xSRC), round(ySRC)];  // find nearest pixel
    }
}
Description
The main idea is to use linear-interpolation.
The function linearGra takes two points in a 2D graph, (dst1, src1) and (dst2, src2).
Assuming a linear relation (which is true here, because scaling + translation is a linear function between the SRC and DST coordinates), it finds the point (dst3, ?) lying on that line.
I used this function to calculate the pixel coordinates in SRC that match a certain pixel in DST.
Further work
If you are a perfectionist, you may want to :-
bound the index (xSRC, ySRC) so it will not go out of bounds
improve the accuracy :-
I currently ignore some pixels (I use nearest-neighbor sampling without interpolation).
A better approach is to integrate over all of the involved SRC pixels, but you will get a slightly blurry image in some cases (see the sketch below).
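For example, a bilinear sampling helper in C# might look like this (a sketch assuming SRC is a grayscale double[,]; it blends the four surrounding pixels and also bounds the indices as mentioned above):

using System;

// Bilinear replacement for DST[x,y] = SRC[round(xSRC), round(ySRC)].
static double SampleBilinear(double[,] src, double xSrc, double ySrc)
{
    int w = src.GetLength(1), h = src.GetLength(0);
    double fx = xSrc - Math.Floor(xSrc);
    double fy = ySrc - Math.Floor(ySrc);
    int x0 = Clamp((int)Math.Floor(xSrc), 0, w - 1);
    int y0 = Clamp((int)Math.Floor(ySrc), 0, h - 1);
    int x1 = Math.Min(x0 + 1, w - 1);
    int y1 = Math.Min(y0 + 1, h - 1);
    double top = src[y0, x0] * (1 - fx) + src[y0, x1] * fx;
    double bottom = src[y1, x0] * (1 - fx) + src[y1, x1] * fx;
    return top * (1 - fy) + bottom * fy;
}

static int Clamp(int v, int min, int max) => Math.Max(min, Math.Min(max, v));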
You may also be interested in this opencv link (not emgucv).
How can I compare two images and determine whether they are 100% similar, or only altered in color or by cropping?
Well, abstractly speaking, you need to define a similarity function that compares two images. To determine whether the images are "100% similar" (equal) you can do the following:
compare the sizes of the images
if the image sizes are the same simply subtract the pixels from each other
if ( sum( abs( pixel_1_i - pixel_2_j ) ) / num_pixels < threshold ) return true
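In C#, the equal-size case could be sketched like this (raw pixel bytes of the same length assumed; the threshold is up to you):

using System;

// True if the mean absolute per-byte difference is below the threshold.
static bool ImagesEqual(byte[] pixels1, byte[] pixels2, double threshold)
{
    if (pixels1.Length != pixels2.Length)
        return false; // different sizes: not "100% similar"
    long sum = 0;
    for (int i = 0; i < pixels1.Length; i++)
        sum += Math.Abs(pixels1[i] - pixels2[i]);
    return (double)sum / pixels1.Length < threshold;
}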
For the case that images are differently colored, or cropped
apply an edge detector to both images
compute the cross-correlation (in the frequency domain, FFT)
find the highest peak
place the (smaller) edge map in the determined position
calculate the absolute error
if (error < threshold) return true
BTW: This approach will not work if your images are scaled or rotated.
Further Research:
cross-correlation: FFT (fast Fourier transform, link1, link2, FFT in C#), zero-padding (needed for the FFT if the input signals have different sizes)
edge detection: Sobel, Canny (these are very common image processing filters, they should be available in a C# library, just like the FFT)
The following is a fairly simplistic approach to the problem and won't work well with two different photographs of the same subject taken from slightly different angles, but would work if you had two copies of the same image that you wanted to verify.
The case of two identical images is straightforward: just loop through the pixel arrays, subtracting one RGB value from the other. If the difference is less than a small tolerance, the pixel is identical. Thus, as soon as you find a pixel difference greater than the tolerance, you know that the images are different.
You could allow a certain number or percentage of differences to account for differences caused by compression artefacts.
To check for alterations in colour you could look at the HLS (Hue, Lightness and Saturation) values instead. If the pixels have the same L & S values but a different H value then it's just the colour that's different (I think).
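In C#, System.Drawing.Color exposes these as GetHue(), GetSaturation() and GetBrightness(); a rough per-pixel check might look like this (the tolerances are arbitrary starting points):

using System;
using System.Drawing;

// True if two pixels differ mainly in hue (same lightness/saturation, different colour).
static bool OnlyHueDiffers(Color a, Color b)
{
    bool sameLightness = Math.Abs(a.GetBrightness() - b.GetBrightness()) < 0.05f;
    bool sameSaturation = Math.Abs(a.GetSaturation() - b.GetSaturation()) < 0.05f;
    float hueDiff = Math.Abs(a.GetHue() - b.GetHue());
    hueDiff = Math.Min(hueDiff, 360f - hueDiff); // hue wraps around at 360 degrees
    return sameLightness && sameSaturation && hueDiff > 10f;
}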
Cropping is more difficult as you have to try to find the location of the smaller image in the larger one.
You can use object descriptors such as:
SIFT - http://en.wikipedia.org/wiki/Scale-invariant_feature_transform
SURF - http://en.wikipedia.org/wiki/SURF
Then compare images by using calculated descriptors. Those descriptors will enable you to deal with rotated, scaled and slightly changed images.
Also, the descriptors consist of oriented gradients, which makes them robust to illumination and color changes as well.
You can use Accord.NET (SURF implementation).
I am looking for an EASY way to check if an image is a scaled version of another image. It does not have to be very fast, it just should be "fairly" accurate. And written in .NET. And for free.
I know, wishful thinking :-)
I am pretty sure, even without having tried it, that converting the bigger image to the smaller scale and comparing checksums will not work (especially if the smaller version was created with software other than .NET).
The next approach would be to scale down and compare pixels. But first of all, running a loop over all pixels with a boolean comparison result seems like a really bad idea; I am sure some pixels will be off by a bit or so...
Any library coming to mind? Back at university we had some MPEG-7 classes, so I am thinking about using a combination of "statistics" like tone distribution, brightness, etc.
Any ideas or links for that topic?
Thanks,
Chris
I think this is going to be your best solution. First check the aspect ratio, then scale the images to the smaller of the two if they're not the same size, and finally do a hash comparison of the two images. This is a lot faster than doing a pixel compare. I found the hash-compare method in a post from someone else and just adapted the answer here to fit. I was trying to think of the best way to do this myself for a project where I'm going to have to compare over 5200 images. After I read a few of the posts here, I realized I already had everything I needed and figured I'd share.
public class CompareImages2
{
    public enum CompareResult
    {
        ciCompareOk,
        ciPixelMismatch,
        ciAspectMismatch
    };

    public static CompareResult Compare(Bitmap bmp1, Bitmap bmp2)
    {
        CompareResult cr = CompareResult.ciCompareOk;
        //Test to see if we have the same aspect ratio (cross-multiplied to avoid integer division)
        if (bmp1.Size.Height * bmp2.Size.Width == bmp2.Size.Height * bmp1.Size.Width)
        {
            if (bmp1.Size != bmp2.Size)
            {
                if (bmp1.Size.Height > bmp2.Size.Height)
                {
                    bmp1 = (new Bitmap(bmp1, bmp2.Size));
                }
                else if (bmp1.Size.Height < bmp2.Size.Height)
                {
                    bmp2 = (new Bitmap(bmp2, bmp1.Size));
                }
            }
            //Convert each image to a byte array
            System.Drawing.ImageConverter ic = new System.Drawing.ImageConverter();
            byte[] btImage1 = new byte[1];
            btImage1 = (byte[])ic.ConvertTo(bmp1, btImage1.GetType());
            byte[] btImage2 = new byte[1];
            btImage2 = (byte[])ic.ConvertTo(bmp2, btImage2.GetType());
            //Compute a hash for each image
            SHA256Managed shaM = new SHA256Managed();
            byte[] hash1 = shaM.ComputeHash(btImage1);
            byte[] hash2 = shaM.ComputeHash(btImage2);
            //Compare the hash values
            for (int i = 0; i < hash1.Length && i < hash2.Length && cr == CompareResult.ciCompareOk; i++)
            {
                if (hash1[i] != hash2[i])
                    cr = CompareResult.ciPixelMismatch;
            }
        }
        else cr = CompareResult.ciAspectMismatch;
        return cr;
    }
}
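A hypothetical usage of the class above (the file names are placeholders):

// Hypothetical usage (inside a method; requires System and System.Drawing):
using (var a = new Bitmap("imageA.png"))
using (var b = new Bitmap("imageB.png"))
{
    var result = CompareImages2.Compare(a, b);
    Console.WriteLine(result); // ciCompareOk, ciPixelMismatch or ciAspectMismatch
}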
One idea to achieve this:
If the image is 10x10, and your original is 40x40
Loop over each pixel in the 10x10, then retrieve the block of pixels in the 40x40 that represents that pixel (a 4x4 block, i.e. 16 pixels, in this example).
So for each pixel in the smaller image, find the corresponding scaled block of pixels in the larger image.
You can then take the average colour of that block and compare it with the pixel in the smaller image. You can specify error bounds, i.e. differences within -10% to +10% are considered a match, anything else a failure.
Build up a count of matches and failures and use the bounds to determine if it is considered a match or not.
I think this might perform better than scaling the images to the same size and doing a 1-pixel:1-pixel comparison, since I'm not sure how resizing algorithms necessarily work and you might lose some detail, which would give less accurate results. Different tools and methods may also resize images differently. But again, how the resize behaves depends on how you go about doing it.
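A rough sketch of the block-averaging idea above (grayscale values and an integer scale factor assumed for simplicity):

using System;

// Compare a small image against a larger one by averaging each corresponding
// block of the larger image. Returns the fraction of "matching" pixels.
static double BlockAverageMatch(byte[,] small, byte[,] large, double tolerance = 0.10)
{
    int scaleY = large.GetLength(0) / small.GetLength(0);
    int scaleX = large.GetLength(1) / small.GetLength(1);
    int matches = 0, total = 0;
    for (int y = 0; y < small.GetLength(0); y++)
        for (int x = 0; x < small.GetLength(1); x++)
        {
            double sum = 0;
            for (int dy = 0; dy < scaleY; dy++)
                for (int dx = 0; dx < scaleX; dx++)
                    sum += large[y * scaleY + dy, x * scaleX + dx];
            double avg = sum / (scaleX * scaleY);
            if (Math.Abs(avg - small[y, x]) <= 255 * tolerance)
                matches++;
            total++;
        }
    return (double)matches / total;
}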
Just scale the larger image back to the size of the smaller one, then compare each pixel by taking the absolute value of the difference in each of the red, green and blue components.
You can then set a threshold for deciding how close you need to be to count it as a match, e.g. if 95%+ of the pixels are within 5% of the colour value, you have a match.
The fuzzy match is necessary because you may have scaling artefacts / anti-aliasing effects.
You'll have to loop over the pixels at some point or another.
Something that is easy to implement yet quite powerful is to calculate the difference between individual color components (RGB) for each pixel, find the average, and see if it crosses a certain threshold. It's certainly not the best method, but for a quick check it should do.
I'd have said roughly the same as Tom Gullen, except I'd just scale the bigger image down to the smaller one before comparing (otherwise the maths gets awkward if you are comparing a 25x25 with a 30x30 or something).
The other thing I might consider depending on image sizes is to scale them both down to a smaller image. ie if you have one that is 4000x4000 and another that is 3000x3000 then you can scale them both down to 200x200 and compare them at that size.
As others have said you would then need to do a check with a threshold (preferably on colour components) and decide what tolerances work best. I'd suggest this is probably best done by trial and error.
The easiest way is just to scale the biggest image down to the smaller image's size and compare the color differences. Since you don't know whether the scaling was cubic or linear (or something else), you have to accept a small difference.
Don't forget to take the absolute value of each pixel difference. ;)
Having absolutely no authority or experience in this area I'm going to make a stab at helping you.
I'd start with matching the aspect ratio within some tolerance, unless you're comparing cropped sections of images, which will make things a bit harder.
I'd then scan the pixels for regions of similarity (not exactness; again, a tolerance level is needed). Then, when an area is similar, run along it in a straight line comparing one image to the other, and find another similarly coloured area. Black & white is going to be harder.
If you get a hit, you'll have two areas in a line with patches of likeness. With two points you have a reference of length between them and so now you can see what the scaling might be. You could also scale the images first, but this doesn't account for cropped sections where aspects don't match.
Now choose a random point in the source image and get the colour info. Then using the scale factor, find that same random point on the other image and see if the colour checks out. Do it a few times with random points. If many turn up similar it's likely a copy.
You might then want to mark it for further, more CPU intensive, inspection. Either a pixel by pixel comparison or something else.
I know Microsoft (Photosynth) uses filters like "outline" (the sort of thing in Photoshop) to remove the image colours and leave just squiggly lines, i.e. just the 'components' of the picture for matching (they match boundaries and overlap).
For speed, I'd break the problem down into chunks and really think about how humans decide two photos are similar. For non-speed, exhaustively comparing colour will probably get you there.
The process in short:
If you hole punched a sheet of paper randomly 4 times, then put it over two photos, just by seeing the colours coming through you could tell if they were likely a copy and need further inspection.