What I'm trying to do: I want to compress a 2D grey-scale map (2D array of float values between 0 and 1) into a DFT. I then want to be able to sample the value of points in continuous coordinates (i.e. arbitrary points in between the data points in the original 2D map).
What I've tried: So far I've looked at Exocortex and some similar libraries, but they seem to be missing functions for sampling a single point or performing lossy compression. Though the math is a bit above my level, I might be able to derive methods to do these things. Ideally, someone can point me to a C# library that already has this functionality. I'm also concerned that libraries using the row-column FFT algorithm don't produce sinusoid functions that can easily be sampled this way, since they unwind the 2D array into a 1D array.
More detail on what I'm trying to do: The intended application for all this is an experiment in efficiently pre-computing, storing, and querying line-of-sight information. This is similar to the way spherical harmonic light probes are used to approximate lighting on dynamic objects. A grid of visibility probes stores compressed visibility data using a small number of float values each. From this grid, an observer position can calculate an interpolated probe, then use that probe to sample the estimated visibility of nearby positions. The results don't have to be perfectly accurate; this is intended as a first pass that can cheaply identify objects that are almost certainly visible or obscured, and then maybe perform more expensive ray-casting on the few on-the-fence objects.
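For what it's worth, "sampling at continuous coordinates" just means evaluating the inverse-transform sum directly instead of on the grid. Below is a minimal sketch of that idea, not taken from any library; the normalization convention and the truncation strategy are assumptions you'd adapt:

using System;
using System.Numerics;

static class DftSampler
{
    // Evaluate the inverse 2D DFT of an n x m map at a continuous point (x, y).
    // Zeroing out high-frequency entries of 'coeffs' beforehand gives lossy
    // compression. Frequencies above n/2 (or m/2) are remapped to negative
    // frequencies so the interpolation stays band-limited.
    public static double Sample(Complex[,] coeffs, int n, int m, double x, double y)
    {
        Complex sum = Complex.Zero;
        for (int u = 0; u < n; u++)
        {
            int fu = u <= n / 2 ? u : u - n;
            for (int v = 0; v < m; v++)
            {
                int fv = v <= m / 2 ? v : v - m;
                sum += coeffs[u, v] * Complex.Exp(
                    Complex.ImaginaryOne * 2.0 * Math.PI * (fu * x / n + fv * y / m));
            }
        }
        return sum.Real / (n * m); // assumes an unnormalized forward transform
    }
}

Note that a row-column FFT still computes the true 2D transform, so its coefficients can be sampled this way; only a library that ran a single 1D FFT over the flattened array would be a problem.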
I'm working on scientific imaging software for my university, and I've encountered a major problem. The scientific camera (Apogee Alta U57) at my lab provides images as a 16bpp array: 0-65535 values per pixel! We want to keep this range, but we can't actually display it on a monitor (0-255 grayscale range). So I found a way to resolve this problem: simply make use of colors, and display the whole image as a heatmap (from black through blue, green, and red, to pure white).
I mean something like this: (example heatmap image I want to achieve)
My only question is: how do I efficiently convert a 16bpp array of pixel values into a complete heatmap bitmap in C#? Are there any libraries for doing that? If not, how do I achieve it using .NET resources?
My idea was to create a function that maps the 65536 values onto (R, G, B) colors, but it's a tough job, especially without using the HSV model.
I would be much obliged for any help provided!
Your question consists of several parts:
reading in the 16-bit pixel data values
mapping them to 24-bit RGB colors
writing them out to an image file
I'll skip parts one and three and give you a few ideas about part two.
It is in fact harder than it seems. A unique mapping that doesn't lose any information is simple, trivial even: a little bit shifting will do.
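For illustration, here is one such trivial mapping (a sketch; unique and reversible, but the output looks like noise rather than a meaningful gradient):

using System.Drawing;

// Pack the 16 bits into two 8-bit channels; v can be recovered as (R << 8) | G.
static Color Pack(ushort v) => Color.FromArgb((v >> 8) & 0xFF, v & 0xFF, 0);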
But you also want the result to work visually: not so much that it should be visually appealing, but that it should make sense to the human eye. So we need a mapping with a credible yet large enough gradient.
For this you should experiment a little. I suggest making use of a LinearGradientBrush, as I show here. Have a look at the interpolateColors function! It uses only 6 colors in the example, way too few for your case!
You should pick many more; you may need to go through the color space in a spiral. The trick will be to choose stop colors that are both nice and numerous enough to create a set of 64k unique colors, ideally going from blueish to reddish.
You will need to test the result for uniqueness; in fact, you may want to create a pair of Dictionary<ushort, Color> and Dictionary<Color, ushort> for the two directions of the mapping.
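A minimal sketch of the interpolation idea (the helper name and structure are mine, in the spirit of the interpolateColors function mentioned above): pre-compute a 65536-entry lookup table by linearly interpolating between the stop colors, then index it with each pixel value.

using System;
using System.Drawing;

// Build a lookup table mapping every 16-bit value to a color by linear
// interpolation between the given stop colors.
static Color[] BuildHeatmapLut(Color[] stops)
{
    var lut = new Color[65536];
    int segments = stops.Length - 1;
    for (int i = 0; i < lut.Length; i++)
    {
        double t = (double)i / (lut.Length - 1) * segments;
        int s = Math.Min((int)t, segments - 1); // which segment we are in
        double f = t - s;                       // position within that segment
        lut[i] = Color.FromArgb(
            (int)(stops[s].R + f * (stops[s + 1].R - stops[s].R)),
            (int)(stops[s].G + f * (stops[s + 1].G - stops[s].G)),
            (int)(stops[s].B + f * (stops[s + 1].B - stops[s].B)));
    }
    return lut;
}

// Example: var lut = BuildHeatmapLut(new[] {
//     Color.Black, Color.Blue, Color.Green, Color.Red, Color.White });

Note that with only a handful of stops the straight-line path through RGB yields far fewer than 64k distinct colors (roughly 256 per segment), which is exactly why the uniqueness test and the longer spiral path are needed.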
I am trying to apply image manipulation effects directly to the camera feed in a Windows 8 app.
I have tried an approach using a canvas, redrawing images taken from the webcam after applying effects. This works fine for basic effects, but for effects like edge detection the canvas approach creates serious lag and flickering.
The other way is to create an MFT (Media Foundation Transform), but that would have to be implemented in C++, about which I have no idea.
Can anyone tell me how to apply effects directly to the webcam stream in a Windows 8 Metro style app? Either by improving the canvas approach so that heavy effects like edge detection don't cause these issues, by applying an MFT from C# (the language I have experience with), or by some other approach?
I have just played quite a bit in this area the last week and even considered writing a blog post about it. I guess this answer can be just as good.
You can go the MFT way, which needs to be done in C++, but the things you would need to write would not be much different between C# and C++. The only thing of note is that I think the MFT works in the YUV color space, so your typical convolution filters/effects might behave a bit differently or require conversion to RGB. If you decide to go that route, on the C# application side the only thing you would need to do is call MediaCapture.AddEffectAsync(). Well, that and editing your Package.appxmanifest, etc., but let's go with first things first.
If you look at the Media capture using webcam sample, it already does what you need. It applies a grayscale effect to your camera feed. It includes a C++ MFT project that is used in an application available in a C# version. I had to apply the effect to a MediaElement, which might not be what you need, but it is just as simple: call MediaElement.AddVideoEffect() and your video file playback now applies the grayscale effect. To be able to use the MFT, you simply need to add a reference to the GrayscaleTransform project and add the following lines to your appxmanifest:
<Extensions>
  <Extension Category="windows.activatableClass.inProcessServer">
    <InProcessServer>
      <Path>GrayscaleTransform.dll</Path>
      <ActivatableClass ActivatableClassId="GrayscaleTransform.GrayscaleEffect" ThreadingModel="both" />
    </InProcessServer>
  </Extension>
</Extensions>
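With that in place, attaching the effect from C# is a single call; a sketch, assuming mediaCapture is an already-initialized MediaCapture instance:

// Inside an async method, after mediaCapture.InitializeAsync() has completed.
// Attach the grayscale MFT to the camera's preview stream. The class ID
// string must match the ActivatableClassId declared in the manifest above.
await mediaCapture.AddEffectAsync(
    MediaStreamType.VideoPreview,
    "GrayscaleTransform.GrayscaleEffect",
    null); // no effect settings needed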
How the MFT code works:
The following lines create a pixel color transformation matrix
float scale = (float)MFGetAttributeDouble(m_pAttributes, MFT_GRAYSCALE_SATURATION, 0.0f);
float angle = (float)MFGetAttributeDouble(m_pAttributes, MFT_GRAYSCALE_CHROMA_ROTATION, 0.0f);
m_transform = D2D1::Matrix3x2F::Scale(scale, scale) * D2D1::Matrix3x2F::Rotation(angle);
Depending on the pixel format of the video feed - a different transformation method is selected to scan the pixels. Look for these lines:
m_pTransformFn = TransformImage_YUY2;
m_pTransformFn = TransformImage_UYVY;
m_pTransformFn = TransformImage_NV12;
For my sample m4v file - the format is detected as NV12, so it is calling TransformImage_NV12.
For pixels within the specified range (m_rcDest), or within the entire frame if no range was specified, the TransformImage_* methods call TransformChroma(mat, &u, &v). For other pixels, the values from the original frame are copied.
TransformChroma transforms the pixels using m_transform. If you want to change the effect - you can simply change the m_transform matrix or if you need access to neighboring pixels as in an edge detection filter - modify the TransformImage_ methods to process these pixels.
This is one way to do it. I think it is quite CPU intensive, so personally I prefer to write a pixel shader for such operations. How do you apply a pixel shader to a video stream, though? Well, I am not quite there yet, but I believe you can transfer video frames to a DirectX surface fairly easily and call a pixel shader on them later. So far I was able to transfer the video frames, and I am hoping to apply the shaders next week. I might write a blog post about it.

I took the meplayer class from the Media engine native C++ playback sample and moved it to a template C++ DirectX project converted to a WinRTComponent library, then used it with a C#/XAML application, associating the swapchain the meplayer class creates with the SwapChainBackgroundPanel that I use in the C# project to display the video. I had to make a few changes in the meplayer class. First, I had to move it to a public namespace that would make it available to other assemblies. Then I had to modify the swapchain it creates to a format accepted for use with a SwapChainBackgroundPanel:
DXGI_SWAP_CHAIN_DESC1 swapChainDesc = {0};
swapChainDesc.Width = m_rcTarget.right;
swapChainDesc.Height = m_rcTarget.bottom;
// Most common swapchain format is DXGI_FORMAT_R8G8B8A8_UNORM
swapChainDesc.Format = m_d3dFormat;
swapChainDesc.Stereo = false;
// Don't use Multi-sampling
swapChainDesc.SampleDesc.Count = 1;
swapChainDesc.SampleDesc.Quality = 0;
//swapChainDesc.BufferUsage = DXGI_USAGE_BACK_BUFFER | DXGI_USAGE_RENDER_TARGET_OUTPUT;
swapChainDesc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT; // Allow it to be used as a render target.
// Use more than 1 buffer to enable Flip effect.
//swapChainDesc.BufferCount = 4;
swapChainDesc.BufferCount = 2;
//swapChainDesc.Scaling = DXGI_SCALING_NONE;
swapChainDesc.Scaling = DXGI_SCALING_STRETCH;
swapChainDesc.SwapEffect = DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL;
swapChainDesc.Flags = 0;
Finally - instead of calling CreateSwapChainForCoreWindow - I am calling CreateSwapChainForComposition and associating the swapchain with my SwapChainBackgroundPanel:
// Create the swap chain and then associate it with the SwapChainBackgroundPanel.
DX::ThrowIfFailed(
spDXGIFactory.Get()->CreateSwapChainForComposition(
spDevice.Get(),
&swapChainDesc,
nullptr, // allow on all displays
&m_spDX11SwapChain)
);
ComPtr<ISwapChainBackgroundPanelNative> dxRootPanelAsSwapChainBackgroundPanel;
// Set the swap chain on the SwapChainBackgroundPanel.
reinterpret_cast<IUnknown*>(m_swapChainPanel)->QueryInterface(
IID_PPV_ARGS(&dxRootPanelAsSwapChainBackgroundPanel)
);
DX::ThrowIfFailed(
dxRootPanelAsSwapChainBackgroundPanel->SetSwapChain(m_spDX11SwapChain.Get())
);
EDIT:
Forgot about one more thing. If your goal is to stay in pure C#, and you figure out how to capture frames to a WriteableBitmap (maybe by calling MediaCapture.CapturePhotoToStreamAsync() with a MemoryStream and then calling WriteableBitmap.SetSource() on the stream), you can use WriteableBitmapEx to process your images. It might not offer top performance, but if your resolution or frame-rate requirements are not too high, it might just be enough. The project on CodePlex does not officially support WinRT yet, but I have a version that should work that you can try here (Dropbox).
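A sketch of that capture idea, assuming mediaCapture has already been initialized (the resolution passed to WriteableBitmap is a placeholder):

// Requires Windows.Media.MediaProperties, Windows.Storage.Streams and
// Windows.UI.Xaml.Media.Imaging; must run inside an async method.
var stream = new InMemoryRandomAccessStream();
await mediaCapture.CapturePhotoToStreamAsync(
    ImageEncodingProperties.CreateJpeg(), stream);
stream.Seek(0);
var bitmap = new WriteableBitmap(640, 480);
bitmap.SetSource(stream);
// 'bitmap' can now be processed, e.g. with WriteableBitmapEx.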
As far as I know, MFTs need to be implemented in C++. I believe that there is a media transform SDK sample which shows implementing some straightforward transforms from a metro style application.
I am trying to figure out a way of using Sikuli's image recognition from within C#. I don't want to use Sikuli itself, because its scripting language is a little slow and because I really don't want to introduce a Java bridge in the middle of my .NET C# app.
So, I have a bitmap which represents an area of my screen (I will call this region BUTTON1). The screen layout may have changed slightly, or the screen may have been moved on the desktop -- so I can't use a direct position. I have to first find where the current position of BUTTON1 is within the live screen. (I tried to post pictures of this, but I guess I can't because I am a new user... I hope the description makes it clear...)
I think that Sikuli is using OpenCV under the covers. Since it is open source, I guess I could reverse engineer it, and figure out how to do what they are doing in OpenCV, implementing it in Emgu.CV instead -- but my Java isn't very strong.
I looked for examples showing this, but all of the examples are either extremely simple (e.g., how to recognize a stop sign) or very complex (e.g., how to do facial recognition)... and maybe I am just dense, but I can't seem to make the jump in logic of how to do this.
Also, I worry that all of the various image manipulation routines are actually processor intensive, and I really want this to be as lightweight as possible (in reality I might have lots of buttons and fields I am trying to find on a screen...)
So, the way I am thinking about doing this instead is:
A) Convert the bitmaps to byte arrays and do a brute-force search. (I know how to do that part.) And then
B) Use the byte array position that I found to calculate its screen position (I'm really not completely sure how to do this) instead of using the image processing stuff.
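My rough idea for step B, a sketch assuming the byte array came from Bitmap.LockBits with 32bpp pixels and a known stride (all names here are placeholders):

// Recover (x, y) from a flat byte-array index: each row occupies 'stride'
// bytes and each pixel 4 bytes, then offset by the captured region's origin.
int bytesPerPixel = 4;
int y = byteIndex / stride;
int x = (byteIndex % stride) / bytesPerPixel;
var screenPos = new System.Drawing.Point(regionLeft + x, regionTop + y);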
Is that completely crazy? Does anyone have a simple example of how one could use Aforge.Net or Emgu.CV to do this? (Or how to flesh out step B above...?)
Thanks!
Generally speaking, it sounds like you want basic object recognition. I don't have any experience with Sikuli, but there are a number of ways to do object recognition (edge-based template matching, etc.). That being said, you might be able to get by with straight histogram matching.
http://www.codeproject.com/KB/GDI-plus/Image_Processing_Lab.aspx
That page should show you how to use AForge.NET to get the histogram of an image. You would then do a brute-force search using something like this:
Bitmap ImageSearchingWithin = new Bitmap("Location of image"); // or load from a screenshot
// Slide a window of the target's size over every position in the image.
for (int x = 0; x <= ImageSearchingWithin.Width - WidthOfImageSearchingFor; ++x)
{
    for (int y = 0; y <= ImageSearchingWithin.Height - HeightOfImageSearchingFor; ++y)
    {
        // Crop the candidate region; compare its histogram against BUTTON1's here.
        using (Bitmap MySmallViewOfImage = ImageSearchingWithin.Clone(
            new Rectangle(x, y, WidthOfImageSearchingFor, HeightOfImageSearchingFor),
            System.Drawing.Imaging.PixelFormat.Format24bppRgb))
        {
        }
    }
}
Then compare the newly created bitmap's histogram to the one you calculated for the original image; whichever area matches most closely is what you would select as the region of BUTTON1. It's not the most elegant solution, but it might work for your needs. Otherwise you get into more difficult techniques (though I could be forgetting something simpler at the moment).
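A sketch of that comparison with AForge.NET's ImageStatistics (assuming both bitmaps are 24bpp RGB; the method name is mine):

using System.Drawing;
using AForge.Imaging;

// Score two bitmaps by the sum of squared differences of their per-channel
// histograms; the candidate region with the lowest score is the best match.
static long HistogramDistance(Bitmap a, Bitmap b)
{
    var sa = new ImageStatistics(a);
    var sb = new ImageStatistics(b);
    long sum = 0;
    for (int i = 0; i < 256; i++)
    {
        long dr = sa.Red.Values[i] - sb.Red.Values[i];
        long dg = sa.Green.Values[i] - sb.Green.Values[i];
        long db = sa.Blue.Values[i] - sb.Blue.Values[i];
        sum += dr * dr + dg * dg + db * db;
    }
    return sum;
}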
Problem
Problem shaping
The image sequence's position and size are fixed and known beforehand (it's not scaled). It will be quite short, a maximum of 20 frames, and in a closed loop. I want to verify (event-driven, by button click) that I have seen it before.
Let's say I have some image sequence, like:
http://img514.imageshack.us/img514/5440/60372aeba8595eda.gif
If seen, I want to get the ID associated with it; if not, it will be analyzed and added as a new instance of an image sequence that has been seen. I have thought about this for quite a while and, I admit, this might be a hard problem. I seem to be having a hard time putting this all together; can someone assist (in C#)?
Limitations and uses
I am not trying to recreate a copyright detection system like the Content ID system YouTube has implemented (Margaret Gould Stewart at TED (link)). The image sequence can be thought of as a (.gif) file, but it is not one, and there is no direct way to get its binary data. A similar method could be used to avoid duplicates in an "image sharing database", but that is not what I am trying to do.
My effort
Gaussian blur
Mathematica function to generate Gaussian blur kernels:
getKernel[L_] := Transpose[{L}].{L}/(Total[Total[Transpose[{L}].{L}]])
getVKernel[L_] := L/Total[L]
It turns out that it is much more efficient to use two passes of a vector kernel than one matrix kernel. The kernels are based on the odd-length rows of Pascal's triangle:
{1d/4, 1d/2, 1d/4}
{1d/16, 1d/4, 3d/8, 1d/4, 1d/16}
{1d/64, 3d/32, 15d/64, 5d/16, 15d/64, 3d/32, 1d/64}
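In C#, the two-pass application could look like this (a sketch; assumes a grayscale image stored as a 2D double array, with edges clamped):

using System;

// Convolve rows with the 1D kernel, then columns: two cheap passes instead
// of one expensive 2D convolution.
static double[,] SeparableBlur(double[,] src, double[] kernel)
{
    int h = src.GetLength(0), w = src.GetLength(1), r = kernel.Length / 2;
    var tmp = new double[h, w];
    var dst = new double[h, w];
    for (int y = 0; y < h; y++)                 // horizontal pass
        for (int x = 0; x < w; x++)
            for (int k = -r; k <= r; k++)
                tmp[y, x] += kernel[k + r] * src[y, Math.Min(Math.Max(x + k, 0), w - 1)];
    for (int y = 0; y < h; y++)                 // vertical pass
        for (int x = 0; x < w; x++)
            for (int k = -r; k <= r; k++)
                dst[y, x] += kernel[k + r] * tmp[Math.Min(Math.Max(y + k, 0), h - 1), x];
    return dst;
}

// Example: var blurred = SeparableBlur(image, new[] { 0.25, 0.5, 0.25 });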
Data input, hashing, grayscaling and lightboxing
Examples of source bits that might be useful:
Lightbox around the known rectangle: FrameX
Using MD5CryptoServiceProvider to get the md5 hash of the content inside the known rectangle at the moment.
Using ColorMatrix to grayscale the image.
Source example (GUI; code):
Get the current content inside the defined rectangle:
private Bitmap getContentBitmap() {
    Rectangle r = f.r; // the known screen rectangle, defined elsewhere in the GUI
    Bitmap hc = new Bitmap(r.Width, r.Height);
    using (Graphics gf = Graphics.FromImage(hc)) {
        // Copy the screen content within the rectangle into the bitmap.
        gf.CopyFromScreen(r.Left, r.Top, 0, 0,
            new Size(r.Width, r.Height), CopyPixelOperation.SourceCopy);
    }
    return hc;
}
Get the md5 hash of a bitmap:
private byte[] getBitmapHash(Bitmap hc) {
    // 'c' is an ImageConverter and 'md5' an MD5CryptoServiceProvider, both created elsewhere.
    return md5.ComputeHash(c.ConvertTo(hc, typeof(byte[])) as byte[]);
}
Get a grayscale version of the image:
public static Bitmap getGrayscale(Bitmap hc){
    Bitmap result = new Bitmap(hc.Width, hc.Height);
    // A ColorMatrix must be 5x5; the original snippet had six rows, three of
    // them six elements long, which would fail at runtime.
    ColorMatrix colorMatrix = new ColorMatrix(new float[][]{
        new float[]{0.5f, 0.5f, 0.5f, 0, 0},
        new float[]{0.5f, 0.5f, 0.5f, 0, 0},
        new float[]{0.5f, 0.5f, 0.5f, 0, 0},
        new float[]{0, 0, 0, 1, 0},
        new float[]{0, 0, 0, 0, 1}});
    using (Graphics g = Graphics.FromImage(result)) {
        ImageAttributes attributes = new ImageAttributes();
        attributes.SetColorMatrix(colorMatrix);
        g.DrawImage(hc, new Rectangle(0, 0, hc.Width, hc.Height),
            0, 0, hc.Width, hc.Height, GraphicsUnit.Pixel, attributes);
    }
    return result;
}
I think you have a few issues with this:
Not all image sequences [videos] are equal [but many are similar]
Where is your data coming from?
How will you represent the data related to your viewings?
Size of the data
Issue #1:
Many images can differ slightly due to compression, watermarking, missing frames, and added clips. I would suggest sampling the video. For example, you may want to consider sub-sampling small sections of the images in the video. Additionally, to avoid noisy images and issues with lossy compression algorithms, you may want to consider grayscaling the sampled frames and applying a Gaussian blur. [Gaussian because it's "more natural" (short answer)] Once you have enough sub-samples to have good confidence of similarity to the video, store them in a database. With the samples you can hash them, or store them to do a % similarity comparison later.
Issue #2
Your datasource is going to influence the tool kits, and libraries that you use.
I would suggest keeping this simple [stick with gifs and create a custom viewer; don't try to write a browser plugin while developing your logic].
Issue #3
Using something like Postgres [if there are a lot of large objects] or SQLite is highly suggested for indexing, storing, and recalling past metadata.
Issue #4
The size of the data will have a huge influence on recall, sampling, querying the database, etc.
Overall advice: Don't bite off more than you can handle at this stage. Start small and then grow.
Also take a look at Computer Vision algorithms for more help on the object representation/recall.
The question itself is certainly very interesting and challenging; however, there are many practical issues, as stated by @monksy.
The opportunistic pragmatist in me would take a step back, look at the big picture and see if there is another way to solve the problem. For example, if you are building some kind of "image sharing community" and want to avoid duplicates in the database, you could do a simple md5 on the file (animated gifs on the web are usually identical copies; it's rare that people modify them).
Another example: if you are analyzing scientific samples (like meteorological sequences), it may be easier to directly embed some kind of hash in every file when generating them.
This depends on whether you only want to know if you've seen an absolutely identical movie again, or whether you also want to identify movies that are very similar but have been changed a bit (made lighter, had a watermark added, compression changed, etc.).

In the first case, just take any type of hash of the file and use that (because the file will be identical on the binary level).

In the second case (which I think is what you want) you have an interesting image processing problem on your hands. You could find yourself on the front lines of image processing science with this if you wanted to. If that is the case, I suggest you start reading about SURF and OpenCV and continue from there.

If you want to match very similar, but not identical, videos, and don't want to go the ultra-robust scientific route, then I'd suggest the following process:
Do the Gaussian blur you already do.
Divide each image into a few equally sized rectangles (you'd have to test for the best number, but I'd suggest you start with 9).
For each rectangle in each frame, compute the full-colour histogram, then find the most frequently occurring colour in that rectangle. This gives you 9 * 20 = 180 numbers: the "fingerprint" of this movie (see the sketch after this list).
Find the most similar fingerprint in your database; if it is similar enough, you already know about the movie, otherwise you don't.
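A sketch of steps 2 and 3 (my own simplification: colours are quantized to 4 bits per channel to keep the counting cheap, and GetPixel is used for brevity):

using System.Collections.Generic;
using System.Drawing;
using System.Linq;

// Split a frame into a 3x3 grid and record the most frequent colour of each
// cell; 20 frames x 9 cells gives the 180-number fingerprint.
static int[] FrameFingerprint(Bitmap frame)
{
    var result = new int[9];
    int cw = frame.Width / 3, ch = frame.Height / 3;
    for (int cell = 0; cell < 9; cell++)
    {
        var counts = new Dictionary<int, int>();
        int x0 = (cell % 3) * cw, y0 = (cell / 3) * ch;
        for (int y = y0; y < y0 + ch; y++)
            for (int x = x0; x < x0 + cw; x++)
            {
                Color c = frame.GetPixel(x, y); // slow; use LockBits in practice
                int q = ((c.R >> 4) << 8) | ((c.G >> 4) << 4) | (c.B >> 4);
                counts.TryGetValue(q, out int n);
                counts[q] = n + 1;
            }
        result[cell] = counts.OrderByDescending(kv => kv.Value).First().Key;
    }
    return result;
}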
Step 4 is a bit vague because I'm not really into this field. You are currently using an MD5 hash as a sort of fingerprint, but it is unsuitable here because slight differences in the input of a good cryptographic hash function produce very large differences in the hash. This means two very similar frames would have totally different MD5 hashes, so from the hashes you'd never know they were similar.

As long as the speed of database lookups is not an issue, I'd just go for the sum of squared differences as a measure of fingerprint similarity, and set a threshold on that to identify equal movies. However, this is not very fast for huge datasets, and in those cases you'd probably need to transform your fingerprint into something that allows you to find similar fingerprints faster. One thing you could do here is start by selecting all known movies with a very similar average colour for the entire video, then from those select the movies with a very similar average colour in each frame, and only for the ones remaining at that point do the full rectangle-by-rectangle fingerprint match. But I'm sure there are even faster options for matching 180 numbers.
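A sketch of that sum-of-squared-differences comparison, matching the packed 4-bit colour codes from the fingerprint sketch above (lower score means more similar; the threshold has to be tuned):

// Compare two fingerprints channel by channel; a match below a tuned
// threshold counts as "seen before".
static int FingerprintDistance(int[] a, int[] b)
{
    int sum = 0;
    for (int i = 0; i < a.Length; i++)
    {
        int dr = ((a[i] >> 8) & 0xF) - ((b[i] >> 8) & 0xF);
        int dg = ((a[i] >> 4) & 0xF) - ((b[i] >> 4) & 0xF);
        int db = (a[i] & 0xF) - (b[i] & 0xF);
        sum += dr * dr + dg * dg + db * db;
    }
    return sum;
}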
Perhaps you can find a way to get a binary copy of the image data of each frame into a variable. Hash that data (md5?) and store each of the hashes. Then you can check whether you've ever seen that hash before. If you haven't, it's a new frame.