I want to know some information about the YV12 image format.

My first question: is the YV12 image format equal to YUV420p, YVU420p, YUV420sp, or YVU420sp? I know that U and V occupy the same amount of memory in this format, so in the case of planar image formats, does swapping U and V make any major difference? And if it does, can somebody explain what that difference is?

I have also heard that YV12 and NV12 are both 12-bit formats. Can somebody tell me what 4:2:0 means? It would be great if someone could explain it in simple words. Thanks!
According to this site, YV12 looks the same as NV12 except that NV12 is interleaved and YV12 is not. I think that YV12 corresponds to YVU420p and NV12 to YUV420sp, but I'm not 100% sure about that.
What this means, however, is that instead of having triplets [yuv][yuv][yuv][...] repeated all over your buffer to draw the image, one triplet per pixel, you have a large buffer of [y], then a buffer 1/4 the size of [y] containing all the [v], then another one 1/4 of [y] containing the [u]. For an image of 320x240 (76800 pixels), you will have 76800 Ys at the start of your buffer, then 19200 Vs, then 19200 Us.
In the NV12 case, you have a large buffer of [y], then a buffer 1/2 the size of [y] containing interleaved [uv] pairs. When drawing the image, or converting to another format such as RGB, you need at least two pointers: one that reads the Ys and another that reads the Us and Vs.
When reading such a compressed format, the trick is that you have one Y per pixel, but the UV components change only every other row and every other column, and your eye will not see the difference. You reuse the same UV values for 4 pixels at a time while changing the Y for every pixel.
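To make the layout concrete, here is a minimal sketch (assuming even dimensions, tightly packed planes, and no row padding) of how one might look up the Y, U, and V values for a pixel in a YV12 buffer. For NV12 the chroma plane is interleaved instead, so the lookups would be buffer[ySize + 2 * cIndex] for U and buffer[ySize + 2 * cIndex + 1] for V.

static (byte Y, byte U, byte V) SampleYv12(byte[] buffer, int width, int height, int x, int y)
{
    int ySize = width * height;      // full-resolution Y plane
    int cSize = ySize / 4;           // each chroma plane is 1/4 the Y plane size
    int vOffset = ySize;             // in YV12 the V plane comes first...
    int uOffset = ySize + cSize;     // ...followed by the U plane

    // One chroma sample is shared by each 2x2 block of pixels (4:2:0)
    int cIndex = (y / 2) * (width / 2) + (x / 2);

    return (buffer[y * width + x],   // Y changes for every pixel
            buffer[uOffset + cIndex],
            buffer[vOffset + cIndex]);
}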
YUV is widely used in video processing; what you are talking about is chroma subsampling. The idea behind this is that the human eye is less sensitive to changes in color than to changes in brightness.
You can find a very thorough explanation of the process on Wikipedia: https://en.wikipedia.org/wiki/Chroma_subsampling

Here is an image of the various chroma subsampling schemes in use:
http://commons.wikimedia.org/wiki/File:Common_chroma_subsampling_ratios.svg
I have a data source that provides many (4096) double values in an array. These are measured with high resolution and are the result of an FFT. For visualisation purposes, they need to be reduced. (Reapplying the FFT to the raw signal is not possible here.) I could simply average every n samples and get a resulting array of length/n values. To allow more flexible selection of the number of resulting values, though, I need interpolation.
I've looked up some basic information about this on Wikipedia. I am already familiar with 2D downsampling/interpolation from a user perspective in raster image editors. Now I need this in 1D in C# code. Think of it as changing (reducing) the raster image size of a 1D barcode image, or resampling an audio wave file.
One library I've found recommended is Math.NET Numerics. It is already used for other tasks in my application, so I could easily use it. There's a CubicSpline class in there, but I have no idea how to use it.
Q: What would be an approach to reduce the number of samples in a double[] to an arbitrary number using interpolation?
I'm not interested in finding a single double value between two others. I need to combine multiple source values into a single output value each, while not losing information (such as single frequency bins with an extreme level) at the boundaries of the groups, and without aliasing effects or rounding artifacts caused by differing group sizes when the numbers aren't evenly divisible.
Maybe the use of bitmap image functions and a 1×n bitmap size would be a good solution instead of dealing with the math directly? That would involve a lot of data conversion, though, which reduces performance and probably also precision. Or perhaps some library from the audio processing field?
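For what it's worth, here is a minimal sketch of one possible approach (my own, not from any of the libraries mentioned): area-weighted averaging with fractional group boundaries, so unequal group sizes introduce no bias at the edges of each group. To preserve extreme bins rather than average them away, one could take the maximum per group instead.

static double[] Downsample(double[] source, int targetLength)
{
    var result = new double[targetLength];
    double ratio = (double)source.Length / targetLength; // source samples per output bin

    for (int i = 0; i < targetLength; i++)
    {
        double start = i * ratio;
        double end = (i + 1) * ratio;
        double sum = 0;

        for (int j = (int)start; j < end && j < source.Length; j++)
        {
            // Weight = fraction of source sample j that falls inside [start, end)
            double weight = Math.Min(end, j + 1) - Math.Max(start, j);
            sum += source[j] * weight;
        }
        result[i] = sum / ratio; // normalize by bin width
    }
    return result;
}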
I'm curious to know what the maximum bitmap width and height are, independently of each other. I did find that the maximum size is 32768x32768, but is that just referring to a perfect square? Is 32768x32768 = 1,073,741,824 the total number of pixels I can play with, and can I rearrange those pixels between the width and the height as long as the total doesn't exceed it?
I don't get any error if I do this:
Dim theBitmap As Bitmap = New Bitmap(450, 100000)
Even though I am unable to open the image after I save it (which I don't need to do), I am still able to work with the bitmap. BUT I believe there is something not quite right... the final result is not what I expect...
The purpose of what I am doing is irrelevant. All I care about is answers to the questions I stated in the first paragraph. If the answer is that I am limited to 32768 for the height, then I'll change my code accordingly. Thanks!
I was able to figure out the answer to my initial questions. You are indeed able to work with any width and height as long as the total size stays within the maximum size specification. You may experience problems saving awkward dimensions (1 by 1,000,000), but if you only need to manipulate a bitmap, you can indeed work with such scenarios.
Cheers to everyone that contributed in the comment section!
A .bmp's size is constrained by the maximum value of a uint32_t, which is 4 GB.
Any dimensions are acceptable as long as the .bmp remains under 4GB.
However, not all bitmaps are created equal. Monochrome bitmaps need only 1 bit per pixel and also use a slightly smaller color palette (8 bytes total), so they can hold a little more than 4x the total number of pixels of a 16-color bitmap (which uses 4 bits per pixel and 64 bytes for its color palette).
This does not take compression into account, as the format allows compression for all non-monochrome bitmaps.
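As a rough illustration of that arithmetic (the 54-byte header and the palette sizing below are assumptions based on the common BITMAPINFOHEADER layout, not exact for every header variant), a size check might look like this:

static bool FitsInBmp(long width, long height, int bitsPerPixel)
{
    long rowBits = width * bitsPerPixel;
    long stride = ((rowBits + 31) / 32) * 4;                          // rows are padded to a 4-byte boundary
    long paletteBytes = bitsPerPixel <= 8 ? (4L << bitsPerPixel) : 0; // 4 bytes per palette entry
    long total = 54 + paletteBytes + stride * height;                 // 54 = file header + info header
    return total <= uint.MaxValue;                                    // must fit in a uint32_t
}

For 1 bpp this gives the 8-byte palette mentioned above (2 entries x 4 bytes), and for 4 bpp the 64-byte palette (16 entries x 4 bytes).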
PNG and JPEG have no explicit limit on file size, whereas BMP has a limit of 32K by 32K pixels, which I believe is your problem here (some places state that it can also hold 2G x 2G, but I couldn't find anything to substantiate those claims).
I have seen a lot of topics about this; I understand the theory, but I'm not able to code it.

I have some pictures and I want to determine whether they are blurred or not. I found a library (aforge.dll) and I used it to compute an FFT for an image.

As an example, here are the two images I'm working on:

My code is in C#:
public Bitmap PerformFFT(Bitmap Picture)
{
    // Load the image into a complex-valued representation
    ComplexImage output = ComplexImage.FromBitmap(Picture);
    // Perform the forward FFT
    output.ForwardFourierTransform();
    // Return the spectrum as a bitmap
    return output.ToBitmap();
}
How can I determine whether the image is blurred? I am not very comfortable with the theory; I need a concrete example. I saw this post, but I have no idea how to do that.
EDIT:
I'll clarify my question. When I have a 2D array of complex values in ComplexImage output (the image FFT), what C# code (or pseudocode) can I use to determine whether the image is blurred?
The concept of "blurred" is subjective. How much power at high frequencies indicates it's not blurry? Note that a blurry image of a complex scene has more power at high frequencies than a sharp image of a very simple scene. For example a sharp picture of a completely uniform scene has no high frequencies whatsoever. Thus it is impossible to define a unique blurriness measure.
What is possible is to compare two images of the same scene and determine which one is more blurry (or, equivalently, which one is sharper). This is what is used in automatic focusing. I don't know exactly what process commercial cameras use, but in microscopy, images are taken at a series of focal depths and compared.
One of the classical comparison methods doesn't involve Fourier transforms at all. One computes the local variance (for each pixel, take a small window around it and compute the variance for those values), and averages it across the image. The image with the highest variance has the best focus.
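A minimal sketch of that statistic, assuming the image has already been converted to a 2D grayscale array (the 3x3 window is an arbitrary choice):

static double MeanLocalVariance(byte[,] gray)
{
    int h = gray.GetLength(0), w = gray.GetLength(1);
    double total = 0;
    int count = 0;

    for (int y = 1; y < h - 1; y++)
        for (int x = 1; x < w - 1; x++)
        {
            // Mean and variance over the 3x3 neighborhood of (y, x)
            double sum = 0, sumSq = 0;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++)
                {
                    double v = gray[y + dy, x + dx];
                    sum += v;
                    sumSq += v * v;
                }
            double mean = sum / 9.0;
            total += sumSq / 9.0 - mean * mean; // variance = E[v^2] - E[v]^2
            count++;
        }

    return total / count; // higher average variance = better focus
}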
Comparing high vs. low frequencies, as in MBo's answer, is comparable to computing the Laplace-filtered image and averaging its absolute values (absolute because the filter can return negative values). The Laplace filter is a high-pass filter, meaning that low frequencies are removed. Since the power in the high frequencies gives a relative measure of sharpness, this statistic does too (again relative: it is to be compared only to images of the same scene, taken under identical circumstances).
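For completeness, a sketch of that spatial-domain alternative, using the 4-neighbor Laplacian kernel (a common choice, assumed here) on the same kind of grayscale array:

static double MeanAbsLaplacian(byte[,] gray)
{
    int h = gray.GetLength(0), w = gray.GetLength(1);
    double total = 0;

    for (int y = 1; y < h - 1; y++)
        for (int x = 1; x < w - 1; x++)
        {
            // 4-neighbor Laplacian: -4 * center plus the four direct neighbors
            double lap = -4.0 * gray[y, x]
                       + gray[y - 1, x] + gray[y + 1, x]
                       + gray[y, x - 1] + gray[y, x + 1];
            total += Math.Abs(lap); // take absolute values before averaging
        }

    return total / ((h - 2) * (w - 2)); // higher = more high-frequency content
}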
A blurred image has an FFT result with smaller magnitudes in the high-frequency regions. Array elements with low indexes (near Result[0][0]) represent the low-frequency region.

So divide the resulting array by some criterion, sum the magnitudes in both regions, and compare them. For example, select the quarter of the result array (of size M) with indexx < M/2 and indexy < M/2.

For a series of increasingly blurred versions of the same initial image, you should see a higher and higher ratio Sum(Low)/Sum(High).
The result is a square NxN array. It has central symmetry (F(x,y) = F(-x,-y), because the source is purely real), so it is enough to treat the top half of the array, with y < N/2.

Low-frequency components are located near the top-left and top-right corners of the array (smallest values of y; smallest and highest values of x). So sum the magnitudes of the array elements in these ranges:
double low = 0, high = 0;
for (int y = 0; y < N / 2; y++)
    for (int x = 0; x < N; x++)
    {
        double amp = Magnitude(y, x);  // |F(y, x)| from your FFT result
        if (y < N / 4 && (x < N / 4 || x >= 3 * N / 4))
            low += amp;                // low-frequency corner regions
        else
            high += amp;               // everything else counts as high frequency
    }
Note that your picture shows the array quadrants swapped around; it is standard practice to shift the zero-frequency component to the center of the plot.
I am trying to develop an application for image processing.
Here is my complete code in DotNetFiddle.
I have tested my application with different images from the Internet:
Cameraman is GIF.
Baboon is PNG.
Butterfly is PNG.
Pheasant is JPG.
Butterfly and Pheasant are re-sized to 300x300.
The following two images show correct Fourier and Inverse Fourier spectrum:
The following two images do not show the expected outcome:
What could be the reason?
Are there any problems with the latter two images?

Do we need to use images of a specific quality to test image-processing applications?
The code you linked to is a radix-2 FFT implementation, which works only for images whose dimensions are exact powers of 2.
Incidentally, the Cameraman image is 256 x 256 (a power of 2) and the Baboon image is 512 x 512 (again a power of 2). The other two images, having been resized to 300 x 300, are not powers of 2. After resizing those images to an exact power of 2 (for example 256 or 512), the output of FrequencyPlot for the brightness component of the last two images should look somewhat like the following:
butterfly
pheasant
A common workaround for images of other sizes is to pad the image to dimensions that are exact powers of 2. Otherwise, if you must process arbitrarily sized images, you should consider other 2D discrete Fourier transform (DFT) algorithms or libraries, which often support sizes that are the product of small primes.
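A minimal sketch of that padding workaround using System.Drawing (the padding pixels are simply left black here; mirroring or edge replication would reduce spectral artifacts at the borders):

static Bitmap PadToPowerOfTwo(Bitmap source)
{
    int NextPow2(int n)
    {
        int p = 1;
        while (p < n) p <<= 1;
        return p;
    }

    // New bitmap with each side rounded up to the next power of 2
    var padded = new Bitmap(NextPow2(source.Width), NextPow2(source.Height));
    using (var g = Graphics.FromImage(padded))
        g.DrawImage(source, 0, 0, source.Width, source.Height);
    return padded;
}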
Note that for the purpose of validating your output, you also have the option of using the direct DFT formula (though you should not expect the same performance).
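That direct formula is straightforward to write down. A sketch for a real-valued input image, using System.Numerics.Complex (O(width^2 * height^2), so only practical on small test images):

static Complex[,] DirectDft2D(double[,] input)
{
    int h = input.GetLength(0), w = input.GetLength(1);
    var output = new Complex[h, w];

    for (int v = 0; v < h; v++)
        for (int u = 0; u < w; u++)
        {
            Complex sum = Complex.Zero;
            for (int y = 0; y < h; y++)
                for (int x = 0; x < w; x++)
                {
                    // F(v, u) = sum over (y, x) of f(y, x) * exp(-2*pi*i*(v*y/h + u*x/w))
                    double angle = -2.0 * Math.PI * ((double)v * y / h + (double)u * x / w);
                    sum += input[y, x] * Complex.FromPolarCoordinates(1.0, angle);
                }
            output[v, u] = sum;
        }

    return output;
}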
I haven't had time to dig through your code. Like I said in my comments, you should focus on the difference between those images.
There is no reason why you should be able to calculate the FFT of one image but fail for another, unless there is some problem in your code that can't handle some difference between those images. If you can display them, you should be able to process them.
So the first thing that catches my eye is that both images you succeed with have even dimensions, while the images your algorithm produces garbage for have at least one odd dimension. I won't look into it any further, as from experience I'm pretty confident that this is what causes your issue.
So before you do anything else: take one of the images that works fine, remove one row or column, and see if you still get a good result. Then fix your code.
I am new to the image processing field. I have worked with BMP images, but I currently have a problem at hand which requires an image to be converted into the YCbCr color space before further processing. I have read about YCbCr and the conversion process, but the problem is that I have no idea how to store the YCbCr data in an image format, or which image format supports it.

I mean, in BMP images the RGB components are stored in BGR order, row sizes must be multiples of 4 bytes, etc., but what about YCbCr? How is it represented?

I'm sorry if this sounds very lame. I googled it a little, but I don't think I'm going in the right direction. Actually, this is for my final project and I'm running out of time.
Update: it turns out there is no need to store it in an image container, although TIFF and JPEG can be used. I got around it by converting RGB to YCbCr, processing that, and then converting it back to RGB pixel by pixel.
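A sketch of that per-pixel round trip, using the common full-range BT.601 equations (an assumption; other standards use different coefficients):

static (double Y, double Cb, double Cr) RgbToYCbCr(byte r, byte g, byte b)
{
    double y  =       0.299    * r + 0.587    * g + 0.114    * b;
    double cb = 128 - 0.168736 * r - 0.331264 * g + 0.5      * b;
    double cr = 128 + 0.5      * r - 0.418688 * g - 0.081312 * b;
    return (y, cb, cr);
}

static (byte R, byte G, byte B) YCbCrToRgb(double y, double cb, double cr)
{
    byte Clamp(double v) => (byte)Math.Max(0, Math.Min(255, Math.Round(v)));
    return (Clamp(y + 1.402 * (cr - 128)),                            // R
            Clamp(y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)), // G
            Clamp(y + 1.772 * (cb - 128)));                           // B
}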
Both formats need only three bytes for each pixel. So, as long as you store your pixels in some uncompressed format such as PPM, you do not need to bother about the conversion. When writing, put the Y into the R byte, Cb into the G byte, and Cr into the B byte, respectively. When you read the values back in, it is up to your program to interpret them: the default interpretation of most image processing programs is to treat them as RGB, but you can tell yours to read them in as YCbCr.
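For instance, a minimal binary PPM (P6) writer along those lines (using System.IO); the file is a perfectly valid PPM, and only the interpretation of the three channels differs:

static void WriteYCbCrAsPpm(string path, byte[,] y, byte[,] cb, byte[,] cr)
{
    int h = y.GetLength(0), w = y.GetLength(1);
    using (var fs = new FileStream(path, FileMode.Create))
    {
        var header = System.Text.Encoding.ASCII.GetBytes($"P6\n{w} {h}\n255\n");
        fs.Write(header, 0, header.Length);

        for (int row = 0; row < h; row++)
            for (int col = 0; col < w; col++)
            {
                fs.WriteByte(y[row, col]);   // Y goes in the "R" byte
                fs.WriteByte(cb[row, col]);  // Cb goes in the "G" byte
                fs.WriteByte(cr[row, col]);  // Cr goes in the "B" byte
            }
    }
}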
If you choose to store it in some compressed format such as JPEG, the values that you read back might not be the same as the ones you stored; whether that matters depends on the accuracy you need.