Is one of these decidedly faster?
var scan0 = (uint*)bitmapData.Scan0;
int length = pixels.Length;
for (int i = 0; i < length; i++)
{
uint j = scan0[i];
float a = (j >> 24) / 255f;
pixels[i] = new Vector(
(j >> 16 & 0xff) * a / 255,
(j >> 8 & 0xff) * a / 255,
(j & 0xff) * a / 255);
}
versus
var scan0 = (byte*)bitmapData.Scan0;
int length = pixels.Length * 4;
for (int i = 0; i < length; i += 4)
{
float a = scan0[i + 3] / 255f;
pixels[i / 4] = new Vector(
scan0[i + 2] * a / 255,
scan0[i + 1] * a / 255,
scan0[i] * a / 255);
}
In a 32-bit application, the second is about 2.5 times faster than the first. In a 64-bit application, the second is about 25% faster than the first.
Note that there is a bug in your second code: as you are adding four on each iteration, you will place the objects in every fourth item of the pixels array and cause an IndexOutOfRangeException when the index runs past the end of the array.
Slightly faster (about 5%) than the second is to move the pointer for each pixel:
byte* scan0 = (byte*)bitmapData.Scan0;
for (int i = 0; i < pixels.Length; i++) {
float a = scan0[3] / 255f;
pixels[i] = new Vector(
scan0[2] * a / 255,
scan0[1] * a / 255,
scan0[0] * a / 255
);
scan0 += 4;
}
Note also that if you are reading data from a Bitmap image, it is not stored as a continuous array of pixel data. There may be padding between the scan lines, so this code can only safely read pixels from a single scan line; it cannot safely read an entire image in one go.
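If you do need to read a whole image, a stride-aware sketch (assuming the same pixels array and Vector type from your code, and a 32bpp BGRA bitmap locked into bitmapData) would advance by Stride at the end of each row:
byte* row = (byte*)bitmapData.Scan0;
int p = 0;
for (int y = 0; y < bitmapData.Height; y++)
{
    byte* px = row;
    for (int x = 0; x < bitmapData.Width; x++)
    {
        float a = px[3] / 255f;
        pixels[p++] = new Vector(px[2] * a / 255, px[1] * a / 255, px[0] * a / 255);
        px += 4; // 4 bytes per pixel for 32bpp
    }
    row += bitmapData.Stride; // skip any padding at the end of the scan line
}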
Edit:
Also, I just realised that you put the length of the array in a variable and used that in the loop. That will just make the code slower instead of faster, as the compiler can't optimise away the range check on the array accesses.
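To illustrate (a small sketch; the exact behaviour depends on the JIT version, and ComputePixel is just a hypothetical stand-in for the per-pixel work):
// the JIT can typically remove the bounds check here, because the loop
// condition is expressed directly in terms of pixels.Length
for (int i = 0; i < pixels.Length; i++)
{
    pixels[i] = ComputePixel(i);
}
// caching the length in a local hides that relationship from the optimiser
int length = pixels.Length;
for (int i = 0; i < length; i++)
{
    pixels[i] = ComputePixel(i);
}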
I think the bit-shift version ("your first solution") is faster. However, you can test it by using Stopwatch: start the stopwatch before calling the method, run the method multiple times, then stop the watch and check its ElapsedMilliseconds. Like:
var watch = System.Diagnostics.Stopwatch.StartNew();
// run the method whose execution time you want to measure, several times
for (int testIndex = 0; testIndex < 100; testIndex++)
{
    TestWithShift();
}
watch.Stop();
Console.WriteLine("Test with shift time: {0}", watch.ElapsedMilliseconds);
And repeat the test for the other method. Hope that helps.
Related
We are doing some performance optimizations in our project and with the profiler I came upon the following method:
private int CalculateAdcValues(byte lowIndex)
{
    byte middleIndex = (byte)(lowIndex + 1);
    byte highIndex = (byte)(lowIndex + 2);
    // samples is a byte[]
    return (int)(samples[highIndex] << 24)
        + (int)(samples[middleIndex] << 16)
        + (int)(samples[lowIndex] << 8);
}
This method is already pretty fast at ~1µs per execution, but it is called ~100,000 times per second and so it takes ~10% of the CPU.
Does anyone have an idea how to further improve this method?
EDIT:
Current solution:
fixed (byte* p = samples)
{
for (; loopIndex < 61; loopIndex += 3)
{
adcValues[k++] = *((int*)(p + loopIndex)) << 8;
}
}
This takes less than 40% of the time it did before (the "whole method" took ~35µs per call before and ~13µs now). The for-loop actually takes more time than the calculation now...
I strongly suspect that after casting to byte, your indexes are being converted back to int anyway for use in the array indexing operation. That will be cheap, but may not be entirely free. So get rid of the casts, unless you were using the conversion to byte to effectively get the index within the range 0..255. At that point you can get rid of the separate local variables, too.
Additionally, your casts to int are no-ops as the shift operations are only defined on int and higher types.
Finally, using | may be faster than +:
private int CalculateAdcValues(byte lowIndex)
{
return (samples[lowIndex + 2] << 24) |
(samples[lowIndex + 1] << 16) |
(samples[lowIndex] << 8);
}
(Why is there nothing in the bottom 8 bits? Is that deliberate? Note that the result will end up being negative if samples[lowIndex + 2] has its top bit set - is that okay?)
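If a negative result is not acceptable, one option (just a sketch, not part of the original code) is to assemble the value as unsigned and return a uint:
private uint CalculateAdcValueUnsigned(byte lowIndex)
{
    // building the result as uint means the top byte cannot make the value negative
    return ((uint)samples[lowIndex + 2] << 24)
         | ((uint)samples[lowIndex + 1] << 16)
         | ((uint)samples[lowIndex] << 8);
}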
Seeing that you have a friendly endianness, go unsafe:
unsafe int CalculateAdcValuesFast1(int lowIndex)
{
fixed (byte* p = &samples[lowIndex])
{
return *(int*)p << 8;
}
}
On x86 this is about 30% faster. Not as much gain as I hoped. It is about 40% faster on x64.
As suggested by #CodeInChaos:
var bounds = samples.Length - 3;
fixed (byte* p = samples)
{
for (int i = 0; i < 1000000000; i++)
{
        var r = CalculateAdcValuesFast2(p, i % bounds); // about 2x faster
        // or inlined (use one or the other, not both):
        // var r = *((int*)(p + i % bounds)) << 8; // about 3x faster
// do something
}
}
unsafe object CalculateAdcValuesFast2(byte* p1, int p2)
{
return *((int*)(p1 + p2)) << 8;
}
Maybe the following can be a little faster. I have removed the casts to int.
var middleIndex = (byte)(lowIndex + 1);
var highIndex = (byte)(lowIndex + 2);
return (this.samples[highIndex] << 24) + (this.samples[middleIndex] << 16) + (this.samples[lowIndex] << 8);
I have this code:
BitmapData bdBackground = Background.LockBits(new Rectangle(0, 0, Background.Width,
Background.Height), ImageLockMode.ReadWrite, Background.PixelFormat);
BitmapData bdForeground = videoFrame.LockBits(new Rectangle(0, 0, videoFrame.Width,
videoFrame.Height), ImageLockMode.ReadWrite, videoFrame.PixelFormat);
unsafe
{
for (int x = 0; x < videoFrame.Width; x++)
{
byte* columnBackground = (byte*)bdBackground.Scan0 + (x * bdBackground.Stride);
byte* columnForeground = (byte*)bdForeground.Scan0 + (x * bdForeground.Stride);
for (int y = 0; y < videoFrame.Height; y++)
{
if (columnBackground[x * pixelSize] == columnForeground[x * pixelSize])
{
columnForeground[x] = 0;
}
}
}
}
Background.UnlockBits(bdBackground);
videoFrame.UnlockBits(bdForeground);
It gives me this error:
Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
at the line if (columnBackground[x * pixelSize] == columnForeground[x * pixelSize]).
What is the reason for that? I took this code from here.
First, you need to understand how an image is stored in an array.
Images "usually in most APIs" are row major, meaning they are stored row by row (usually in a one dimensional array).
To loop through a row major image (walk the pixels), the outer loop is usually from 0 to height, and the inner from 0 to width. This makes the loops easier to read, and increases cache hits.
Stride is a very important concept: it represents the number of bytes needed for each row, and it is not necessarily equal to the width times the bytes per pixel, because padding is usually added for alignment reasons.
Stride is used to access a new row, for example, if I want to access the third row:
third_Row = 3 * image_stride;
If you want to access the 10th pixel of the third row, you just add (10 * bytes per pixel) to third_Row:
third_Row_Tenth_Pixel = 3 * image_stride + 10 * Bytes_per_pixel
NOTE: please note that the above does not apply to images where the bits per pixel are lower than 8 (usually 4, 2, or 1 bits per pixel are used).
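As a small illustration of the formulas above (a sketch only; bd stands for a locked BitmapData, row and col are zero-based indices, and bytesPerPixel is derived from the pixel format):
unsafe
{
    byte* basePtr = (byte*)bd.Scan0;
    // the row index selects the scan line via Stride,
    // the column index selects the pixel within that line
    byte* pixel = basePtr + row * bd.Stride + col * bytesPerPixel;
    byte blue = pixel[0]; // first channel of a 24/32 bpp BGR(A) pixel
}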
What you are doing is the reverse: you are multiplying the column number by the stride instead of the row number, effectively stepping out of the range of the image.
In short, swap the x and y loops, making the y loop contain the x loop (to increase cache hits):
unsafe
{
for (int y = 0; y < videoFrame.Height; y++)
{
byte* columnBackground = (byte*)bdBackground.Scan0 + (y * bdBackground.Stride);
byte* columnForeground = (byte*)bdForeground.Scan0 + (y * bdForeground.Stride);
for (int x = 0; x < videoFrame.Width; x++)
{
if (columnBackground[x * pixelSize] == columnForeground[x * pixelSize])
{
columnForeground[x] = 0;
}
}
}
}
Background.UnlockBits(bdBackground);
videoFrame.UnlockBits(bdForeground);
You never use the y variable when accessing the bitmap array. You should be multiplying y by the Stride instead of x. Then add x * pixelSize like you are doing.
For a Windows Phone app, when I adjust brightness with a slider it works fine when I move the slider to the right. But when I move it back to a previous position, instead of the image darkening, it gets brighter and brighter. Here is my code, based on pixel manipulation.
private void slider1_ValueChanged(object sender, RoutedPropertyChangedEventArgs<double> e)
{
wrBmp = new WriteableBitmap(Image1, null);
for (int i = 0; i < wrBmp.Pixels.Count(); i++)
{
int pixel = wrBmp.Pixels[i];
int B = (int)(pixel & 0xFF); pixel >>= 8;
int G = (int)(pixel & 0xFF); pixel >>= 8;
int R = (int)(pixel & 0xFF); pixel >>= 8;
int A = (int)(pixel);
B += (int)slider1.Value; R += (int)slider1.Value; G += (int)slider1.Value;
if (R > 255) R = 255; if (G > 255) G = 255; if (B > 255) B = 255;
if (R < 0) R = 0; if (G < 0) G = 0; if (B < 0) B = 0;
wrBmp.Pixels[i] = B | (G << 8) | (R << 16) | (A << 24);
}
wrBmp.Invalidate();
Image1.Source = wrBmp;
}
What am I missing, and is there any problem with the slider value? I am working with small images, as is usual on mobile. I have already tried copying the original image to a duplicate. I think the code is correct; after a lot of research I found that the problem is due to the slider value. A possible solution is assigning an initial value to the slider. I would like some help with the code.
private double lastSlider3Vlaue;
private void slider3_ValueChanged(object sender, RoutedPropertyChangedEventArgs<double> e)
{
if (slider3 == null) return;
double[] contrastArray = { 1, 1.2, 1.3, 1.6, 1.7, 1.9, 2.1, 2.4, 2.6, 2.9 };
double CFactor = 0;
int nIndex = 0;
nIndex = (int)slider3.Value - (int)lastSlider3Vlaue;
if (nIndex < 0)
{
nIndex = (int)lastSlider3Vlaue - (int)slider3.Value;
this.lastSlider3Vlaue = slider3.Value;
CFactor = contrastArray[nIndex];
}
else
{
nIndex = (int)slider3.Value - (int)lastSlider3Vlaue;
this.lastSlider3Vlaue = slider3.Value;
CFactor = contrastArray[nIndex];
}
WriteableBitmap wbOriginal;
wbOriginal = new WriteableBitmap(Image1, null);
wrBmp = new WriteableBitmap(wbOriginal.PixelWidth, wbOriginal.PixelHeight);
wbOriginal.Pixels.CopyTo(wrBmp.Pixels, 0);
int h = wrBmp.PixelHeight;
int w = wrBmp.PixelWidth;
for (int i = 0; i < wrBmp.Pixels.Count(); i++)
{
int pixel = wrBmp.Pixels[i];
int B = (int)(pixel & 0xFF); pixel >>= 8;
int G = (int)(pixel & 0xFF); pixel >>= 8;
int R = (int)(pixel & 0xFF); pixel >>= 8;
int A = (int)(pixel);
R = (int)(((R - 128) * CFactor) + 128);
G = (int)(((G - 128) * CFactor) + 128);
B = (int)(((B - 128) * CFactor) + 128);
if (R > 255) R = 255; if (G > 255) G = 255; if (B > 255) B = 255;
if (R < 0) R = 0; if (G < 0) G = 0; if (B < 0) B = 0;
wrBmp.Pixels[i] = B | (G << 8) | (R << 16) | (A << 24);
}
wrBmp.Invalidate();
Image1.Source = wrBmp;
}
After debugging I found that the R, G, and B values decrease continuously when sliding forward, but when sliding backwards they also decrease, whereas they should increase.
Please help; I have been working on this for the past three months. Beyond this, I would also appreciate advice on how to complete this image-processing work.
Your algorithm is wrong. Each time the slider's value changes, you're adding that value to the picture's brightness. What makes your logic flawed is that the value returned by the slider will always be positive, and you're always adding the brightness to the same picture.
So, if the slider starts with a value of 10, I'll add 10 to the picture's brightness.
Then, I slide to 5. I'll add 5 to the previous picture's brightness (the one you already added 10 of brightness to).
Two ways to solve the issue:
Keep a copy of the original picture, and duplicate it every time your method is called. Then add the brightness to the copy (and not the original). That's the safest way.
Instead of adding the new absolute value of the slider, calculate the relative value (how much it changed since the last time the method was called):
private double lastSliderValue;
private void slider1_ValueChanged(object sender, RoutedPropertyChangedEventArgs<double> e)
{
var offset = slider1.Value - this.lastSliderValue;
this.lastSliderValue = slider1.Value;
// Insert your old algorithm here, but replace occurrences of "slider1.Value" with "offset"
}
This second way can cause a few headaches though. Your algorithm is capping the RGB values at 255. In those cases, you are losing information and cannot revert to the old state. For instance, take the extreme example of a slider value of 255. The algorithm sets all the pixels to 255, thus generating a white picture. Then you reduce the slider to 0, which should in theory restore the original picture. In this case, you'll subtract 255 from each pixel, but since every pixel's value is 255 you'll end up with a black picture.
Therefore, unless you find a clever way to solve the issue mentioned in the second solution, I'd recommend going with the first one (a minimal sketch of it follows below).
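For the first option, a minimal sketch (assuming a hypothetical originalBmp field that holds the untouched WriteableBitmap, set once when the image is loaded, and the Image1 control from your code):
private WriteableBitmap originalBmp; // assumed: captured once, before any adjustment

private void slider1_ValueChanged(object sender, RoutedPropertyChangedEventArgs<double> e)
{
    if (originalBmp == null) return;

    // work on a fresh copy of the original every time, so adjustments don't accumulate
    var copy = new WriteableBitmap(originalBmp.PixelWidth, originalBmp.PixelHeight);
    originalBmp.Pixels.CopyTo(copy.Pixels, 0);

    int offset = (int)slider1.Value;
    for (int i = 0; i < copy.Pixels.Length; i++)
    {
        int pixel = copy.Pixels[i];
        int b = (pixel & 0xFF) + offset;
        int g = ((pixel >> 8) & 0xFF) + offset;
        int r = ((pixel >> 16) & 0xFF) + offset;
        int a = (pixel >> 24) & 0xFF;
        if (r > 255) r = 255; if (r < 0) r = 0;
        if (g > 255) g = 255; if (g < 0) g = 0;
        if (b > 255) b = 255; if (b < 0) b = 0;
        copy.Pixels[i] = b | (g << 8) | (r << 16) | (a << 24);
    }
    copy.Invalidate();
    Image1.Source = copy;
}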
What's the fastest way to fill an array of bytes, where each byte represents a pixel (black or white: < 125 = black, > 125 = white), from a Bitmap class?
I used this for colored images: Better/faster way to fill a big array in C#
However, now I'm looking for something different (I can even use a single color like red for the fill, it doesn't matter; it's just something I have to choose), because the array format changed.
Any suggestion? Currently I'm using this code, which is obviously not the best idea:
for (int x = 0; x < LgLcd.NativeConstants.LGLCD_BMP_WIDTH; ++x)
{
for (int y = 0; y < LgLcd.NativeConstants.LGLCD_BMP_HEIGHT; ++y)
{
tmp = bmp.GetPixel(x, y);
array[y * LgLcd.NativeConstants.LGLCD_BMP_WIDTH + x] = (byte)((tmp.R == 255 && tmp.G == 255 && tmp.B == 255) ? 0 : 255);
//array[y * x] = (byte)0;
}
}
My idea was to parallelize everything (one thread per line, maybe? Or per column?); I think it should help.
EDIT:
Ok, first, I need a way to access different bytes of the image at the same time. Brandon Moretz suggests what is probably the correct way to access the bytes, using LockBits. However, I would like to avoid unsafe code. Does LockBits necessarily involve unsafe code?
Second, my idea for parallelization was to use Parallel.For. This method uses the ThreadPool class, which uses no more threads than your CPU has cores, and they are pre-allocated.
This method will be called many times, so I think it's not a big problem, because the thread pool will be reused heavily after the first call.
Is what I'm saying correct?
Is using "unsafe" code blocks an option? You can use LockBits on a Bitmap to get it's BitmapData, then use Scan0 & Stride properties to iterate over it.
If it's 255 colors I'm assuming a byte per pixel, so something like:
*( ( ( byte* )bmpData.Scan0 ) + ( y * bmpData.Stride ) + x ) = (byte)((tmp.R == 255 && tmp.G == 255 && tmp.B == 255) ? 0 : 255);
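Putting that together, a rough sketch (not the exact code you need: bmp and array come from your question, the 24bpp format is an assumption, and it requires the System.Drawing and System.Drawing.Imaging namespaces plus unsafe compilation):
Rectangle rect = new Rectangle(0, 0, bmp.Width, bmp.Height);
BitmapData bmpData = bmp.LockBits(rect, ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);
unsafe
{
    for (int y = 0; y < bmp.Height; y++)
    {
        byte* row = (byte*)bmpData.Scan0 + y * bmpData.Stride;
        for (int x = 0; x < bmp.Width; x++)
        {
            byte b = row[x * 3];
            byte g = row[x * 3 + 1];
            byte r = row[x * 3 + 2];
            // same threshold as your GetPixel loop: white -> 0, everything else -> 255
            array[y * bmp.Width + x] = (byte)((r == 255 && g == 255 && b == 255) ? 0 : 255);
        }
    }
}
bmp.UnlockBits(bmpData);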
The general approach is to divide the image into regions and then process them, i.e. you can use:
Thread 1) for (int x = 0; x < LGLCD_BMP_WIDTH /2; ++x) { ... }
Thread 2) for (int x = LGLCD_BMP_WIDTH / 2; x < LGLCD_BMP_WIDTH; ++x) { ... }
where the two halves of the image are processed by different threads. You can divide further into 4, 8, etc. pieces as you wish. A thread per line would be excessive, as the thread-creation overhead would overwhelm the benefits by a large margin.
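If you'd rather not split the ranges by hand, one possible sketch (ProcessColumn is a hypothetical helper wrapping your inner y loop) uses Parallel.ForEach with a range partitioner, which does the chunking for you:
// requires System.Collections.Concurrent and System.Threading.Tasks
Parallel.ForEach(Partitioner.Create(0, LgLcd.NativeConstants.LGLCD_BMP_WIDTH), range =>
{
    for (int x = range.Item1; x < range.Item2; x++)
    {
        ProcessColumn(x); // hypothetical: the existing y loop for column x
    }
});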
I found the answer myself, working with LockBits and Marshal.ReadByte, with a really nice and fast result:
public void SetPixels(Bitmap image)
{
byte[] array = Pixels;
var data = image.LockBits(new Rectangle(0, 0, image.Width, image.Height), System.Drawing.Imaging.ImageLockMode.ReadOnly, System.Drawing.Imaging.PixelFormat.Format32bppArgb);
Parallel.For(0, data.Height, new Action<int>(i =>
{
byte tmp;
int pixel4bpp, pixelPerbpp;
pixelPerbpp = data.Stride / data.Width;
for (pixel4bpp = 0; pixel4bpp < data.Stride; pixel4bpp += pixelPerbpp)
{
tmp = (byte)((
Marshal.ReadByte(data.Scan0, 0 + (data.Stride * i) + pixel4bpp)
+ Marshal.ReadByte(data.Scan0, 1 + (data.Stride * i) + pixel4bpp)
+ Marshal.ReadByte(data.Scan0, 2 + (data.Stride * i) + pixel4bpp)
+ Marshal.ReadByte(data.Scan0, 3 + (data.Stride * i) + pixel4bpp)
) / pixelPerbpp);
array[i * data.Width + (pixel4bpp / pixelPerbpp)] = tmp;
}
}));
image.UnlockBits(data);
}
I've asked before about the opposite of bitwise AND (&) and you told me it's impossible to reverse.
Well, this is the situation: the server sends an image, which is encoded with the function I want to reverse and then compressed with zlib.
This is how I get the image from the server:
UInt32[] image = new UInt32[200 * 64];
int imgIndex = 0;
byte[] imgdata = new byte[compressed];
byte[] imgdataout = new byte[uncompressed];
Array.Copy(data, 17, imgdata, 0, compressed);
imgdataout = zlib.Decompress(imgdata);
for (int h = 0; h < height; h++)
{
for (int w = 0; w < width; w++)
{
imgIndex = (int)((height - 1 - h) * width + w);
image[imgIndex] = 0xFF000000;
if (((1 << (Int32)(0xFF & (w & 0x80000007))) & imgdataout[((h * width + w) >> 3)]) > 0)
{
image[imgIndex] = 0xFFFFFFFF;
}
}
}
Width, height, decompressed image length and compressed image length are always the same.
When this function is done I put image (the UInt32[] array) into a Bitmap and I have the picture.
Now I want to be the server and send that image. I have to do two things:
Reverse that function and then compress it with zlib.
How do I reverse that function so I can encode the picture?
EDIT: The format is 32bppRGB.
The assumption that the & operator is always irreversible is incorrect.
Yes, in general if you have
c = a & b
and all you know is the value of c, then you cannot know what values a or b had before hand.
However, it's very common for & to be used to extract certain bits from a longer value, where those bits were previously combined together with the | operator and where each bit field is independent of every other. The fundamental difference from the generic & or | operators that makes this reversible is that the original bits were all zero beforehand, and the other bits in the word are left unchanged, i.e.:
0xc0 | 0x03 = 0xc3 // combine two nybbles
0xc3 & 0xf0 = 0xc0 // extract the top nybble
0xc3 & 0x0f = 0x03 // extract the bottom nybble
In this case your current function appears to be extracting a 1 bit-per-pixel (monochrome image) and converting it to 32-bit RGBA.
You'll need something like:
uint[] source_image;   // the 32-bit pixels decoded by the loop above
byte[] dest_image;     // one bit per pixel, assumed to be all zeroes beforehand

for (int h = 0; h < height; ++h) {
    for (int w = 0; w < width; ++w) {
        int offset = (h * width) + w;
        if (source_image[offset] == 0xffffffff) {
            int mask = w % 8;   // these two lines convert from one int-per-pixel
            offset /= 8;        // offset to one-bit-per-pixel
            dest_image[offset] |= (byte)(1 << mask); // only changes _one_ bit
        }
    }
}
NB: this assumes the image is a multiple of 8 pixels wide and that the dest_image array was previously all zeroes. I've used % and / in that inner test because they're easier to understand, and the compiler should convert them to a mask and shift itself. Normally I'd do the masking and shifting myself.
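For reference, the mask-and-shift form of that inner test (equivalent for the non-negative values used here) would be:
int mask = w & 7;             // same as w % 8 for non-negative w
int byteOffset = offset >> 3; // same as offset / 8 for non-negative offset
dest_image[byteOffset] |= (byte)(1 << mask);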