Problem
Problem shaping
Image sequence position and size are fixed and known beforehand (it's not scaled). It will be quite short, maximum of 20 frames and in a closed loop. I want to verify (event driven by button click), that I have seen it before.
Lets say I have some image sequence, like:
http://img514.imageshack.us/img514/5440/60372aeba8595eda.gif
If seen, I want to see the ID associated with it, if not - it will be analyzed and added as new instance of image sequence, that has been seen. I have though about this quite a while, and I admit, this might be a hard problem. I seem to be having hard time of putting this all together, can someone assist (in C#)?
Limitations and uses
I am not trying to recreate copyright detection system, like content id system Youtube has implemented (Margaret Gould Stewart at TED ( link )). The image sequence can be thought about like a (.gif) file, but it is not and there is no direct way to get binary. Similar method could be used, to avoid duplicates in "image sharing database", but it is not what I am trying to do.
My effort
Gaussian blur
Mathematica function to generate Gaussian blur kernels:
getKernel[L_] := Transpose[{L}].{L}/(Total[Total[Transpose[{L}].{L}]])
getVKernel[L_] := L/Total[L]
Turns out, that it is much more efficient to use 2 passes of vector kernel, then matrix kernel. Thy are based on Pascal triangle uneven rows:
{1d/4, 1d/2, 1d/4}
{1d/16, 1d/4, 3d/8, 1d/4, 1d/16}
{1d/64, 3d/32, 15d/64, 5d/16, 15d/64, 3d/32, 1d/64}
Data input, hashing, grayscaleing and lightboxing
Example of source bits, that might be useful:
Lightbox around the known rectangle: FrameX
Using MD5CryptoServiceProvider to get md5 hash of the content inside known rectangle atm.
Using ColorMatrix to grayscale image
Source example
Source example (GUI; code):
Get current content inside defined rectangle.
private Bitmap getContentBitmap() {
Rectangle r = f.r;
Bitmap hc = new Bitmap(r.Width, r.Height);
using (Graphics gf = Graphics.FromImage(hc)) {
gf.CopyFromScreen(r.Left, r.Top, 0, 0, //
new Size(r.Width, r.Height), CopyPixelOperation.SourceCopy);
}
return hc;
}
Get md5 hash of bitmap.
private byte[] getBitmapHash(Bitmap hc) {
return md5.ComputeHash(c.ConvertTo(hc, typeof(byte[])) as byte[]);
}
Get grayscale of the image.
public static Bitmap getGrayscale(Bitmap hc){
Bitmap result = new Bitmap(hc.Width, hc.Height);
ColorMatrix colorMatrix = new ColorMatrix(new float[][]{
new float[]{0.5f,0.5f,0.5f,0,0}, new float[]{0.5f,0.5f,0.5f,0,0},
new float[]{0.5f,0.5f,0.5f,0,0}, new float[]{0,0,0,1,0,0},
new float[]{0,0,0,0,1,0}, new float[]{0,0,0,0,0,1}});
using (Graphics g = Graphics.FromImage(result)) {
ImageAttributes attributes = new ImageAttributes();
attributes.SetColorMatrix(colorMatrix);
g.DrawImage(hc, new Rectangle(0, 0, hc.Width, hc.Height),
0, 0, hc.Width, hc.Height, GraphicsUnit.Pixel, attributes);
}
return result;
}
I think you have a few issues with this:
Not all image sequences [videos] are equal [but many are similar]
Where is your data coming from?
How will you repesent the data related to your viewings?
Size of the data
Issue #1:
Many images can differ slightly by compression, water marking, missing frames, and adding clips. I would suggest sampling the video. For example you may want to consider sub-sampling small sections of the images in the video. Additionally, to avoid noisy images and issues with lossely compression algorithms. You may want to consider grayscaling the frames sampled, and doing a gaussian blur. [Guassian because its "more natural" (short answer)] Once you have enough sub samples to where you have a good confidence of similarity to the video then store it in a database. With the samples you can hash them, or store them to do a % similarity later.
Issue #2
Your datasource is going to influence the tool kits, and libraries that you use.
I would suggest keeping this simple [keep it with gifs and create a custom viewer, dont' try to write a browser plugin while developing your logic]
Issue #3
Using something like Postgres [if there are a lot of large sized objects] or SQLLite is highly suggested for indexing, storing, and recalling past meta data.
Issue #4
The size of the data will have a huge determination on recall, sampling, querying the database, etc.
Overall advice: Don't bite off more than you can handle at this stage. Start small and then grow.
Also take a look at Computer Vision algorithms for more help on the object representation/recall.
The question itself is sure very interesting and challenging, however there are many practical issues as stated by #monksy.
The opportunist pragmatic in me would take a step back, look at the big picture and see if there is another way to solve the problem. For example, if you are building some kind of "image sharing community" and want to avoid duplicates in the database, you could do a simple md5 on the file (animated gifs on the web are usually always the same, it's rare that people modify them).
Another example: if you are analyzing scientific samples (like meteo sequences) it may be easier to directly embed some kind of hash in every file when generating them.
This depends on wether you only want to know wether you've seen an absolutely identical movie again, or you also want to identify movies that are very similar but have been changed a bit (made lighter, have a watermark added, compression changed, etc.)
In the first case, just take any type of hash of the file and use that (because the file will be identical on the binary level.
In the second case (which I think is what you want) you have an interesting image processing problem on your hands. You could find yourself at the front-lines of image processing science with this if you'd want. If that is the case I suggest you start reading about SURF and OpenCV, and continue on from that.
If you want to match very similar, but not identical videos, and don't want to go the ultra-robus scientific route then I'd suggest the following process:
Do the gaussian blur you already do.
Divide each image into a few equally sized rectangles (you'd have to test for the best number, but I'd suggest you start with 9.
For each rectangle in each frame compute the full-colour histogram, then find the most occurring colour in that rectangle. This gives you 9*20 = 180 numbers. This is the "fingerprint" of this movie.
Find the most similar fingerprint in your database, if it is similar enough you already know about it, otherwise you don't.
Step 4 is a bit vague because I'm not really into this field. You are currently using an MD5 hash as a sort of fingerprint, but this is unsuitable in this case because slight differences in the input of a good cryptographic hashing function produce very large differences in the hash. This will mean that two very similar frames will have a totally different MD5 hash, so from the hash you'd never know they were similar.
As long as speed of database lookups is not an issue I'd just go for the sum of square differences as a measure of fingerprint similarity, and set a threshold on that to identify equal movies. However, this is not very fast for huge datasets, and in those cases you'd probably need to transform your fingerprint to something that will allow you to find similar fingerprints faster. One thing you could do here is start by selecting all known movies with very similar average colour for the entire video, then from that select the movies that have very similar average colour in each frame, and in the ones that remain at that point do the full rectangle-by-rectangle fingerprint match. But I'm sure there are even faster options for matching 180 numbers.
Perhaps you can find a way to get a binary copy of the image data of each frame in a variable. Hash that data (md5?) and store each of the hashes. Then you can see if you've ever seen that hash before. If you haven't, it's a new frame.
Related
I wanna send a series of integers to HLSL in the form of a 3D array using unity. I've been trying to do this for a couple of days now, but without any gain. I tried to pack the buffers into each other (StructuredBuffer<StructuredBuffer<StructuredBuffer<int>>>), but it simply won't work. And I need to make this thing resizable, so I can't use arrays in structs. What should I do?
EDIT: To clarify a bit more what I am trying to do here, this is a medical program. When you go make a scan of your body, some files are generated. Those files are called DICOM files(.dcm). Those files are given out to a doctor. The doctor should open the program, select all of the DICOM files and load them. Each DICOM file contains an image. However, those images are not as the normal images used in our daily life. Those images are grayscale and each pixel has a value that ranges between -1000 to a couple of thousands, so each pixel is saved as 2 bytes(or an Int16). I need to generate a 3D model of the body that got scanned, so I'm using the Marching Cubes algorithm to generate it(have a look at Polygonising a Scalar Field). The problem is I used to loop over each pixel in about 360 512*512 sized images, which took too much time. I used to read the pixel data from each file once I needed it when I used the CPU. Now I'm trying to make this process occur at runtime. I need to send all of the pixel data to the GPU before processing it. That's my problem. I need the GPU to read data from disk. Because that ain't possible, I need to send 360*512*512*4 bytes of data to the GPU in the form of 3D array of Ints. I'm also planning to keep the data there to avoid retransfer of that huge amount of memory. What should I do? Refer to this link to know more about what I'm doing
From what I've understood, I would suggest to try the following:
Flatten your data (nested buffers are not what you want on your gpu)
Split your data across multiple ComputeBuffers if necessary (when I played around with them on a Nvidia Titan X I could store approximately 1GB of data per buffer. I was rendering a 3D point cloud with 1.5GB of data or something, the 360MBytes of data you mentioned should not be a problem then)
If you need multiple buffers: let them overlap as needed for your marching cubes algorithm
Do all of your calculations in a ComputeShader (I think requires DX11, if you have multiple buffer, run it multiple times and accumulate your results) and then use the results in a standard shader which your call from OnPostRender function (use Graphics.DrawProcedural inside to just draw points or build a mesh on the gpu)
Edit (Might be interesting to you)
If you want to append data to a gpu buffer (because you don't know the exact size or you can't write it to the gpu at once), you can use AppendBuffers and a ComputeShader.
C# Script Fragments:
struct DataStruct
{
...
}
DataStruct[] yourData;
yourData = loadStuff();
ComputeBuffer tmpBuffer = new ComputeBuffer(512, Marshal.SizeOf(typeof(DataStruct)));
ComputeBuffer gpuData = new ComputeBuffer(MAX_SIZE, Marshal.SizeOf(typeof(DataStruct)), ComputeBufferType.Append);
for (int i = 0; i < yourData.Length / 512; i++) {
// write data subset to temporary buffer on gpu
tmpBuffer.SetData(DataStruct.Skip(i*512).Take((i+1)*512).ToArray()); // Use fancy Linq stuff to select data subset
// set up and run compute shader for appending data to "gpuData" buffer
AppendComputeShader.SetBuffer(0, "inBuffer", tmpBuffer);
AppendComputeShader.SetBuffer(0, "appendBuffer", gpuData);
AppendComputeShader.Dispatch(0, 512/8, 1, 1); // 8 = gpu work group size -> use 512/8 work groups
}
ComputeShader:
struct DataStruct // replicate struct in shader
{
...
}
#pragma kernel append
StructuredBuffer<DataStruct> inBuffer;
AppendStructuredBuffer<DataStruct> appendBuffer;
[numthreads(8,1,1)]
void append(int id: SV_DispatchThreadID) {
appendBuffer.Append(inBuffer[id]);
}
Note:
AppendComputeShader has to be assigned via the Inspector
512 is an arbitrary batch size, there is an upper limit of how much data you can append to a gpu buffer at once, but I think that depends on the hardware (for me it seemed to be 65536 * 4 Bytes)
you have to provide a maximum size for gpu buffers (on the Titan X it seems to be ~1GB)
In Unity we currently have the MaterialPropertyBlock that allows SetMatrixArray and SetVectorArray, and to make this even sweeter, we can set globally using the Shader static helpers SetGlobalVectorArray and SetGlobalMatrixArray. I believe that these will help you out.
In case you prefer the old way, please look at this quite nice article showing how to pass arrays of vectors.
I'm working on a scientific imaging software for my university, and I've encountered a major problem. Scientific camera (Apogee Alta U57) at my lab provides images as 16bpp array - it's 0-65535 values per pixel! We want to keep this range, but in fact we can't display them on monitor (0-255 grayscale range). So I found a way to resolve this problem - simply to make use of colors, and to display whole image as a heatmap (from black, blue, through green and red, to pure white).
I mean something like this - Example heatmap image I want to achieve
My only question is: How to efficiently convert 16bpp array of pixel values to complete heatmap bitmap in c#? Are there any libraries for doing that? If not, how do I achieve that using .NET resources?
My idea was to create function that maps 65536 values into (255 R, 255G, 255B), but it's a tough job - especially without using HSV model.
I would be much obliged for any help provided!
Your question consist of several parts:
reading in the 16 bit pixel data values
mapping them to 24 bit rgb colors
writing them out to an image file
I'll skip part one and three and give you a few ideas about part 2.
It is in fact harder than it seems. A unique mapping that doesn't lose any information is simple, in fact trivial, just a little bit shifting will do.
But you also want the result to work visually, meaning not so much is should be visually appealing but should make sense to a human eye. so we need a mapping that has a credible yet large enough gradient.
For this you should experiment a little. I suggest to make use of the LinearGradientBrush, as I show here. Have a look at the interpolateColors function! It uses only 6 colors in the example, way to few for your case!
You should pick many more; you may need to go through the color space in a spiral..
The trick for you will be to choose both nice and enough stop colors to create a 64k large set of unique colors, best going from blueish to reddish..
You will need to test the result for uniqueness; in fact you may want to create a pair of Dictionary and Dictionary for the mappings..
I have 2 bmp images.
ImageA is a screenshot (example)
ImageB is a subset of that. Say for example, an icon.
I want to find the X,Y coordinates of ImageB within ImageA (if it exists).
Any idea how I would do that?
Here's a quick sample but it is slow take around 4-6 seconds, but it does exactly what you looking for and i know this post is old but if anyone else visiting this post recently
you can look this thing
you need .NET AForge namespace or framework google it and install it
include AForge name space in your project and that's it
it finds the pictiure with another and gives out the coordinates.
System.Drawing.Bitmap sourceImage = (Bitmap)Bitmap.FromFile(#"C:\SavedBMPs\1.jpg");
System.Drawing.Bitmap template = (Bitmap)Bitmap.FromFile(#"C:\SavedBMPs\2.jpg");
// create template matching algorithm's instance
// (set similarity threshold to 92.1%)
ExhaustiveTemplateMatching tm = new ExhaustiveTemplateMatching(0.921f);
// find all matchings with specified above similarity
TemplateMatch[] matchings = tm.ProcessImage(sourceImage, template);
// highlight found matchings
BitmapData data = sourceImage.LockBits(
new Rectangle(0, 0, sourceImage.Width, sourceImage.Height),
ImageLockMode.ReadWrite, sourceImage.PixelFormat);
foreach (TemplateMatch m in matchings)
{
Drawing.Rectangle(data, m.Rectangle, Color.White);
MessageBox.Show(m.Rectangle.Location.ToString());
// do something else with matching
}
sourceImage.UnlockBits(data);
So is there any warping of ImageB in ImageA?
How "exact" are the images, as in, pixel-for-pixel they will be the same?
How much computational power do you have for this?
If the answers to the first two questions are No and Yes, then you have a simple problem. It also helps to know the answer to Q3.
Update:
The basic idea's this: instead of matching a window around every pixel in imageB with every pixel in imageA and checking the correlation, let's identify points of interest (or features) in both images which will be trackable. So it looks like corners are really trackable since the area around it is kinda similar (not going into details) - hence, let's find some really strong corners in both images and search for corners which look most similar.
This reduces the problem of searching every pixel in B with A to searching for, say, 500 corners in B with a 1000 corners in A (or something like that) - much faster.
And the awesome thing is you have several such corner detectors at your disposal in OpenCV. If you don't feel using emguCV (C# varriant), then use the FAST detector to find matching corners and thus locate multiple features between your images. Once you have that, you can find the location of the top-left corner of the image.
If image B is an exact subset of image A (meaning, the pixel values are exactly the same), this is not an image processing problem, it's just string matching in 2D. In 99% of the cases, taking a line form the middle of B and matching it against each line of A will do what you want, and super fast &mdhas; I guess C# has a function for that. After you get your matches (normally, a few of them), just check the whole of B against the appropriate part of A.
The only problem I can see with this is that in some cases you can get too many matches. E.g. if A is your desktop, B is an icon, and you are unlucky enough to pick a line in B consisting of background only. This problem is easy to solve (you have to choose lines from B a bit more carefully), but this depends on the specifics of your problem.
Finding sub images in an imageFind an image in an ImageCheck if an image exists within another image
I am looking for an EASY way to check if an image is a scaled version of another image. It does not have to be very fast, it just should be "fairly" accurate. And written in .NET. And for free.
I know, wishful thinking :-)
I am pretty sure, even without having tried it, that converting the bigger image to the smaller scale and comparing checksums is not working (especially if the smaller version was done with another software then .NET).
The next approach would be to scale down and compare pixels. But first of all, it seems like a really bad idea running a loop over all pixels with a bool comparison results, I am sure there will be some pixels off by a bit or so...
Any library coming to mind? Way back in the university we had some MPEG7 classes, so I am thinking about using a combination of "statistics" like tone distribution, brightness, etc..
Any ideas or links for that topic?
Thanks,
Chris
I think this is going to be your best solution. First check the aspect ratio. Then scale the images to the smaller of the 2 if they're not the same size. Finally, do a hash comparison of the 2 images. This is a lot faster than doing a pixel compare. I found the hash compare method in a post from someone else and just adapted the answer here to fit. I was trying to think of the best way to do this myself for a project where I'm going to have to compare over 5200 images. After I read a few of the posts here I realized I already had everything I needed for it and figured I'd share.
public class CompareImages2
{
public enum CompareResult
{
ciCompareOk,
ciPixelMismatch,
ciAspectMismatch
};
public static CompareResult Compare(Bitmap bmp1, Bitmap bmp2)
{
CompareResult cr = CompareResult.ciCompareOk;
//Test to see if we have the same size of image
if (bmp1.Size.Height / bmp1.Size.Width == bmp2.Size.Height / bmp2.Size.Width)
{
if (bmp1.Size != bmp2.Size)
{
if (bmp1.Size.Height > bmp2.Size.Height)
{
bmp1 = (new Bitmap(bmp1, bmp2.Size));
}
else if (bmp1.Size.Height < bmp2.Size.Height)
{
bmp2 = (new Bitmap(bmp2, bmp1.Size));
}
}
//Convert each image to a byte array
System.Drawing.ImageConverter ic = new System.Drawing.ImageConverter();
byte[] btImage1 = new byte[1];
btImage1 = (byte[])ic.ConvertTo(bmp1, btImage1.GetType());
byte[] btImage2 = new byte[1];
btImage2 = (byte[])ic.ConvertTo(bmp2, btImage2.GetType());
//Compute a hash for each image
SHA256Managed shaM = new SHA256Managed();
byte[] hash1 = shaM.ComputeHash(btImage1);
byte[] hash2 = shaM.ComputeHash(btImage2);
//Compare the hash values
for (int i = 0; i < hash1.Length && i < hash2.Length && cr == CompareResult.ciCompareOk; i++)
{
if (hash1[i] != hash2[i])
cr = CompareResult.ciPixelMismatch;
}
}
else cr = CompareResult.ciAspectMismatch;
return cr;
}
}
One idea to achieve this:
If the image is 10x10, and your original is 40x40
Loop each pixel in the 10x10, then retrieve the 4 pixels representative of that looped pixel.
So for each pixel in the smaller image, find the corresponding scaled amount of pixels in the larger image.
You can then take the average colour of the 4 pixels, and compare with the pixel in the smaller image. You can specify error bounds, IE -10% or +10% bounds are considered a match, others are considered a failure.
Build up a count of matches and failures and use the bounds to determine if it is considered a match or not.
I think this might perform better than scaling the image to the same size and doing a 1pixel:1pixel comparison as I'm not sure how resizing algorithms necesserially work and you might lose some detail which will give less accurate results. Or if there might be different ways and methods of resizing images. But, again I don't know how the resize might work depends on how you go about doing it.
Just scale the larger image back to the size of the smaller one, then compare each pixel by taking the absolute value of the difference in each of the red, green and blue components.
You can then set a threshold for deciding how close you need to be to count it as a match, e.g. if 95%+ of the pixels are within 5% of the colour value, you have a match.
The fuzzy match is necessary because you may have scaling artefacts / anti-aliasing effects.
You'll have to loop over the pixels at some point or another.
Something that is easy to implement yet quite powerful is to calculate the difference between individual color components (RGB) for each pixel, find the average, and see if it crosses a certain threshold. It's certainly not the best method, but for a quick check it should do.
I'd have said roughly what Tom Gullen except I'd just scale down the bigger image to the smaller before comparing (otherwise you're just going to have hard maths if you are comparing a 25x25 with a 30x30 or something).
The other thing I might consider depending on image sizes is to scale them both down to a smaller image. ie if you have one that is 4000x4000 and another that is 3000x3000 then you can scale them both down to 200x200 and compare them at that size.
As others have said you would then need to do a check with a threshold (preferably on colour components) and decide what tolerances work best. I'd suggest this is probably best done by trial and error.
The easiest way is just to scale the biggest image to the smaller images size and compare color difference. Since you don't know if the scaling is cubic or linear (or something else) you have to accept a small difference.
Don't forget to take the absolute value of each pixel difference. ;)
Having absolutely no authority or experience in this area I'm going to make a stab at helping you.
I'd start with the aspect ratio matching by some tolerance, unless you're comparing cropped sections of images, which will makes things a bit harder.
I'd then scan the pixels for regions of similarity, no exactness, again a tolerance level is needed. Then when an area is similar, run along in a straight line comparing one to the other, and find another similarly coloured area. Black & white's gonna be harder.
If you get a hit, you'll have two areas in a line with patches of likeness. With two points you have a reference of length between them and so now you can see what the scaling might be. You could also scale the images first, but this doesn't account for cropped sections where aspects don't match.
Now choose a random point in the source image and get the colour info. Then using the scale factor, find that same random point on the other image and see if the colour checks out. Do it a few times with random points. If many turn up similar it's likely a copy.
You might then want to mark it for further, more CPU intensive, inspection. Either a pixel by pixel comparison or something else.
I know Microsoft (Photosynth) use filters like "outline" (the sort of stuff in Photoshop) to remove the image colours and leave just squrly lines which leave just the 'components' of the picture for matching (they match boundaries and overlap).
For speed, I'd break the problem down into chunks and really think about how humans decide two photos are similar. For non-speed, exhaustively comparing colour will probably get you there.
The process in short:
If you hole punched a sheet of paper randomly 4 times, then put it over two photos, just by seeing the colours coming through you could tell if they were likely a copy and need further inspection.
How do I compare two images & recognize the pattern in an image irrespective of its size and pattern size, and using .Net C#? Also, which algorithms are used for doing so from Image Processing?
See Scale-invariant feature transform, template matching, and Hough transform. A quick and inaccurate guess may be to make a histogram of color and compare it. If the image is complicated enough, you might be able to distinguish between several sets of images.
To make the matter simple, assume we have three buckets for R, G, and B. A completely white image would have (100%, 100%, 100%) for (R, G, B). A completely red image would have (100%, 0%, 0%). A complicated image might have something like (23%, 53%, 34%). If you take the distance between the points in that (R, G, B) space, you can compare which one is "closer".
look up pattern recognition. I known very little about it other than the name.
Warning: If that is what you want, it is one of the hardest "real world" programming problems known.
I am no expert in image recognition by I once stummbeled upon the AForge library which is written in C# and does image recognition. Maybe it can help...
Techniques for image matching and image recognition can be very different. For the first task, you may make use of SIFT or hand craft your own distance function, based on RGB or otherwise. For recognition, there a vast amount of machine learning techniques that you can use, more popular techniques involves Adaboost, SVM and other hybrid neural networks method. There are no lack of related research papers in this field. Google is your friend.
Jinmala, you've asked a question here that is extremely broad. There are literally thousands of papers in the literature about these topics. There is no correct answer, and there are many unsolved issues in the comparison of images, so you really probably can't hope for a simple solution that just works (unless your situation is quite simple and constrained)
If you narrow things down, I might be able to help.
You might be looking for this
System.Drawing.Bitmap sourceImage = (Bitmap)Bitmap.FromFile(#"C:\SavedBMPs\1.jpg");
System.Drawing.Bitmap template = (Bitmap)Bitmap.FromFile(#"C:\SavedBMPs\2.jpg");
// create template matching algorithm's instance
// (set similarity threshold to 92.5%)
ExhaustiveTemplateMatching tm = new ExhaustiveTemplateMatching(0.921f);
// find all matchings with specified above similarity
TemplateMatch[] matchings = tm.ProcessImage(sourceImage, template);
// highlight found matchings
BitmapData data = sourceImage.LockBits(
new Rectangle(0, 0, sourceImage.Width, sourceImage.Height),
ImageLockMode.ReadWrite, sourceImage.PixelFormat);
foreach (TemplateMatch m in matchings)
{
Drawing.Rectangle(data, m.Rectangle, Color.White);
MessageBox.Show(m.Rectangle.Location.ToString());
// do something else with matching
}
sourceImage.UnlockBits(data);
I warn you it is quite slow takes around 6 seconds to process image of 1024x768 finding in it pciture with the size of 50x50.enter code here
template matching, you can do this with EmguCV ,OpendotnetCV,Aforge.net
Scale-invariant feature transform (SIFT) might be what you're looking for. It's not simple to understand or implement, however.