Enhance performance to paint image, is SIMD perhaps a solution? - c#

I have no experience with SIMD, but have a method that is too slow. I now get 40fps, and I need more.
Does anyone know how I could make this paint method faster? Perhaps the SIMD instructions are a solution?
The sourceData is now a byte[] (videoBytes) but could use a pointer too.
public bool PaintFrame(IntPtr layerBuffer, ushort vStart, byte vScale)
{
for (ushort y = 0; y < height; y++)
{
ushort eff_y = (ushort)(vScale * (y - vStart) / 128);
var newY = tileHeight > 0 ? eff_y % tileHeight : 0;
uint y_add = (uint)(newY * tileWidth * bitsPerPixel >> 3);
for (int x = 0; x < width; x++)
{
var newX = tileWidth > 0 ? x % tileWidth : 0;
ushort x_add = (ushort)(newX * bitsPerPixel >> 3);
uint tile_offset = y_add + x_add;
byte color = videoBytes[tile_offset];
var colorIndex = BitsPerPxlCalculation(color, newX);
// Apply Palette Offset
if (paletteOffset > 0)
colorIndex += paletteOffset;
var place = x + eff_y * width;
Marshal.WriteByte(layerBuffer + place, colorIndex);
}
}
return true;
}
private void UpdateBitPerPixelMethod()
{
// Convert tile byte to indexed color
switch (bitsPerPixel)
{
case 1:
BitsPerPxlCalculation = (color, newX) => color;
break;
case 2:
BitsPerPxlCalculation = (color, newX) => (byte)(color >> 6 - ((newX & 3) << 1) & 3);
break;
case 4:
BitsPerPxlCalculation = (color, newX) => (byte)(color >> 4 - ((newX & 1) << 2) & 0xf);
break;
case 8:
BitsPerPxlCalculation = (color, newX) => color;
break;
}
}
More info
Depending on the settings, the bpp can be changed. The indexed colors and the palette colors are stored separately. Here I have to recreate the image pixel indexes, so later on I can use the palette and color indexes in WPF (Windows) or SDL (Linux, Mac) to display the image.
vStart is the ability to crop the image on top.
The UpdateBitPerPixelMethod() will not change during a frame rendering, only before. During the for, no settings data can be changed.
So I was hoping that some parts can be written with SIMD, because the procedure is the same for all pixels.

Hi,
your code is not entirely clear to me. Are you trying to create a new matrix/image? If so, allocate a new 2D buffer and calculate the entire image into it. Clear it once you no longer need the results.
Replace the Marshal.WriteByte(layerBuffer + place, colorIndex); call with a write into that 2D image (or maybe layerBuffer already is that image?).
As for the rest, the difficulty is that the indexing uses non-uniform offsets and jumps. That makes a SIMD solution hard to develop (you would need masking and similar tricks). My bet would be to calculate everything for all the indices and store the results in individual 2D matrices that are allocated once at the beginning.
For example:
ushort eff_y = (ushort)(vScale * (y - vStart) / 128);
This is calculated once per image row. You could calculate it once up front as an array, since I do not believe the image dimensions change during the run.
I don't know whether vStart and vScale are constant from program start. You should do this for every calculation that depends only on constants, and then just read the precomputed matrices later.
SIMD can help, but only if every iteration performs the same calculation and you avoid branching and switch statements.
Addition 1
From my standpoint you have multiple problems and design considerations.
First of all, you need to let go of the idea that SIMD is going to help in your case as the code stands. You would need to remove all conditional statements; SIMD instructions are not built to deal with conditionals.
Your goal should be to split the logic into manageable pieces so you can see which piece of the code takes the most time.
One big problem is the Marshal.WriteByte call, which tells the compiler that you only ever handle one byte at a time. I'm guessing that this creates one big bottleneck.
From reading the code, I see that you perform checks in every loop iteration. This must be restructured.
Assuming the image rarely gets cropped, the cropping can be separated from the image calculations.
List<ushort> eff_y = new List<ushort>();
List<uint> y_add = new List<uint>();
for (ushort y = 0; y < height; y++)
{
    // Per-row values: the scaled/cropped row and the row's byte offset in the tile
    ushort effY = (ushort)(vScale * (y - vStart) / 128);
    eff_y.Add(effY);
    var newY = tileHeight > 0 ? effY % tileHeight : 0;
    y_add.Add((uint)(newY * tileWidth * bitsPerPixel >> 3));
}
So this can be precalculated and only recalculated when the cropping changes.
Now it gets really tricky.
paletteOffset - the if statement only makes sense if paletteOffset can be negative; in that case, set negative values to zero beforehand and remove the if statement.
bitsPerPixel - this looks like a fixed value for the duration of the rendering,
so remove the UpdateBitPerPixelMethod and pass it in as a parameter.
for (ushort y = 0; y < height; y++)
{
    for (int x = 0; x < width; x++)
    {
        var newX = tileWidth > 0 ? x % tileWidth : 0; // conditional statement
        ushort x_add = (ushort)(newX * bitsPerPixel >> 3);
        uint tile_offset = y_add[y] + x_add; // y_add is precalculated per row
        byte color = videoBytes[tile_offset];
        var colorIndex = BitsPerPxlCalculation(color, newX);
        // Apply Palette Offset
        if (paletteOffset > 0) // conditional statement
            colorIndex += paletteOffset;
        var place = x + eff_y[y] * width; // eff_y is precalculated per row
        Marshal.WriteByte(layerBuffer + place, colorIndex);
    }
}
These are only a few of the things that need to be done before you try anything with SIMD. By then, the changes will already give the compiler hints about what you want to do, which could improve the generated machine code. You also need to profile your code to pinpoint the bottleneck; it is very hard to guess correctly just by reading the code.
Good luck
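The restructuring suggested above could look roughly like the following sketch. It is not the asker's actual code: it writes into a managed byte[] instead of the IntPtr buffer, handles only the 4 bits-per-pixel case, assumes tileWidth > 0, and skips rows that fall above the crop or outside the destination. The names TilePainter, BuildRowTables and Paint4bpp are hypothetical. The per-row values are computed once, the modulo is replaced by a wrapping counter, and the delegate call is inlined.

```csharp
using System;

public static class TilePainter
{
    // Hypothetical helper: for every output row, precompute the scaled row (effY)
    // and the byte offset of the matching tile row (yAdd). Recompute only when
    // vStart, vScale or the tile settings change.
    public static void BuildRowTables(int height, int vStart, int vScale,
        int tileWidth, int tileHeight, int bitsPerPixel,
        out int[] effY, out int[] yAdd)
    {
        effY = new int[height];
        yAdd = new int[height];
        for (int y = 0; y < height; y++)
        {
            int e = vScale * (y - vStart) / 128;
            effY[y] = e;
            int newY = (tileHeight > 0 && e >= 0) ? e % tileHeight : 0;
            yAdd[y] = (newY * tileWidth * bitsPerPixel) >> 3;
        }
    }

    // 4bpp inner loop: two pixels per source byte, high nibble first.
    // Assumption: rows with a negative or out-of-range effY are skipped.
    public static void Paint4bpp(byte[] dest, int width, byte[] tile, int tileWidth,
        int[] effY, int[] yAdd, byte paletteOffset)
    {
        for (int y = 0; y < effY.Length; y++)
        {
            int row = effY[y];
            if (row < 0 || (row + 1) * width > dest.Length) continue;

            int rowStart = row * width;
            int srcRow = yAdd[y];
            int newX = 0; // wrapping counter, replaces x % tileWidth
            for (int x = 0; x < width; x++)
            {
                byte packed = tile[srcRow + (newX >> 1)];
                int shift = 4 - ((newX & 1) << 2);
                dest[rowStart + x] = (byte)(((packed >> shift) & 0xF) + paletteOffset);
                if (++newX == tileWidth) newX = 0;
            }
        }
    }
}
```

Whether this is actually faster has to be measured; the point is only that the inner loop now contains nothing but table lookups, shifts and one store.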

Related

Scrolling through a waveform

I have a waveform visualiser I am trying to make for some audio editing, and need to be able to scroll through the wave form. The code I'm currently using comes from this question and works after I made some modification to allow the specifying of a start audio time and end audio time:
public Texture2D PaintWaveformSpectrum(AudioClip audio, int textWidth, int textHeight, int audioStart, int audioEnd, Color col) {
Texture2D tex = new Texture2D(textWidth, textHeight, TextureFormat.RGBA32, false);
float[] samples = new float[audioLength];
float[] waveform = new float[textWidth];
audio.GetData(samples, 0);
int packSize = ((audioEnd - audioStart) / textWidth) + 1;
if (audioStart != 0) {
audioStart += packSize % audioStart;
}
int s = 0;
for (int i = audioStart; i < audioEnd; i += packSize) {
waveform[s] = Mathf.Abs(samples[i]);
s++;
}
for (int x = 0; x < textWidth; x++) {
for (int y = 0; y < textHeight; y++) {
tex.SetPixel(x, y, Color.gray);
}
}
for (int x = 0; x < waveform.Length; x++) {
for (int y = 0; y <= waveform[x] * ((float)textHeight * .75f); y++) {
tex.SetPixel(x, (textHeight / 2) + y, col);
tex.SetPixel(x, (textHeight / 2) - y, col);
}
}
tex.Apply();
return tex;
}
The issue here however, is that when I'm scrolling through the audio, the waveform changes. It does indeed scroll, but the issue is that it is now showing different values in the waveform. This is because there are significantly more samples than pixels, so there is a need to down sample. At the moment, every nth sample is chosen, but the issue is with a different start point, different samples will be chosen. Images below for comparison (additionally, here's a video. This is what I want the scroll to look like):
As you can see they are slightly different. The overall structure is there but the waveform is ultimately different.
I thought this would be an easy fix - shift the start audio value to the nearest packSize (ie, audioStart += packSize % audioStart when audioStart != 0) but this didn't work. The same issue still occurred.
If anyone has any suggestions on how I can keep the waveform consistent while scrolling it would be much appreciated.
Despite years of programming experience, I still can't seem to correctly round a number. It was as simple as that.
The line
if (audioStart != 0) {
audioStart += packSize % audioStart;
}
should be
audioStart = (int) Mathf.Round(audioStart / packSize) * packSize;
Adding 1 extra element to waveform is also necessary, as half the time the rounding will cause one extra sample to be included. As such, waveform should be defined as:
float[] waveform = new float[textWidth+1];
This solves the issue and the samples are chosen consistently. I'm not quite sure how programs like audacity manage to get nice looking waveforms that aren't super noisy (comparison below for the same song: mine on top, audacity below) but that's for another question.
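As an illustration of the fix, here is a minimal sketch without the Unity types. The class and method names (WaveformSampling, SnapToPack, Downsample) are made up for this example, and plain Math.Abs stands in for Mathf.Abs. Note that with int operands the division audioStart / packSize already truncates, so the start is effectively snapped down to the previous multiple of packSize.

```csharp
using System;

public static class WaveformSampling
{
    // Snap the start index down to a multiple of packSize so that the same
    // samples are picked regardless of the scroll position.
    public static int SnapToPack(int audioStart, int packSize)
    {
        return (audioStart / packSize) * packSize; // integer division truncates
    }

    // Picks every packSize-th sample, starting from the snapped position.
    public static float[] Downsample(float[] samples, int audioStart, int audioEnd, int textWidth)
    {
        int packSize = ((audioEnd - audioStart) / textWidth) + 1;
        int start = SnapToPack(audioStart, packSize);

        // One extra element, because snapping can include one more sample.
        var waveform = new float[textWidth + 1];
        int s = 0;
        for (int i = start; i < audioEnd && i < samples.Length && s < waveform.Length; i += packSize)
        {
            waveform[s++] = Math.Abs(samples[i]);
        }
        return waveform;
    }
}
```

This keeps the picked samples stable only while packSize itself stays the same, that is, while the visible window length does not change during scrolling.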

Rescaling Complex data after FFT Convolution

I have tested two rescaling functions by applying them on FFT convolution outputs.
The first one is collected from this link.
public static void RescaleComplex(Complex[,] convolve)
{
int imageWidth = convolve.GetLength(0);
int imageHeight = convolve.GetLength(1);
double maxAmp = 0.0;
for (int i = 0; i < imageWidth; i++)
{
for (int j = 0; j < imageHeight; j++)
{
maxAmp = Math.Max(maxAmp, convolve[i, j].Magnitude);
}
}
double scale = 1.0 / maxAmp;
for (int i = 0; i < imageWidth; i++)
{
for (int j = 0; j < imageHeight; j++)
{
convolve[i, j] = new Complex(convolve[i, j].Real * scale,
convolve[i, j].Imaginary * scale);
}
}
}
Here the problem is incorrect contrast.
The second one is collected from this link.
public static void RescaleComplex(Complex[,] convolve)
{
int imageWidth = convolve.GetLength(0);
int imageHeight = convolve.GetLength(1);
double scale = imageWidth * imageHeight;
for (int j = 0; j < imageHeight; j++)
{
for (int i = 0; i < imageWidth; i++)
{
double re = Math.Max(0.0, Math.Min(convolve[i, j].Real * scale, 1.0));
double im = Math.Max(0.0, Math.Min(convolve[i, j].Imaginary * scale, 1.0));
convolve[i, j] = new Complex(re, im);
}
}
}
Here the output is totally white.
So, you can see two of the versions are giving one correct and another incorrect outputs.
How can I solve this dilemma?
Note. Matrix is the following kernel:
0 -1 0
-1 5 -1
0 -1 0
Source Code. Here is my FFT Convolution function.
private static Complex[,] ConvolutionFft(Complex[,] image, Complex[,] kernel)
{
Complex[,] imageCopy = (Complex[,])image.Clone();
Complex[,] kernelCopy = (Complex[,])kernel.Clone();
Complex[,] convolve = null;
int imageWidth = imageCopy.GetLength(0);
int imageHeight = imageCopy.GetLength(1);
int kernelWidth = kernelCopy.GetLength(0);
int kernelHeight = kernelCopy.GetLength(1);
if (imageWidth == kernelWidth && imageHeight == kernelHeight)
{
Complex[,] fftConvolved = new Complex[imageWidth, imageHeight];
Complex[,] fftImage = FourierTransform.ForwardFFT(imageCopy);
Complex[,] fftKernel = FourierTransform.ForwardFFT(kernelCopy);
for (int j = 0; j < imageHeight; j++)
{
for (int i = 0; i < imageWidth; i++)
{
fftConvolved[i, j] = fftImage[i, j] * fftKernel[i, j];
}
}
convolve = FourierTransform.InverseFFT(fftConvolved);
RescaleComplex(convolve);
convolve = FourierShifter.ShiftFft(convolve);
}
else
{
throw new Exception("Padded image and kernel dimensions must be same.");
}
return convolve;
}
This is not really a dilemma. This is just an issue of the limited range of the display, and of your expectations, which are different in the two cases.
(top): this is a normalized kernel (its elements sum up to 1). It doesn't change the contrast of the image. But because of negative values in it, it can generate values outside the original range.
(bottom): this is not a normalized kernel. It changes the contrast of the output.
For example, play around with the kernel
0, -1, 0
-1, 6, -1
0, -1, 0
(notice the 6 in the middle). It sums up to 2. The image contrast will be doubled. That is, in a region where the input is all 0, the output is 0 as well, but where the input is all 1, the output will be 2 instead.
Typically, a convolution filter, if it is not meant to change image contrast, is normalized. If you apply such a filter, you don't need to re-scale the output for display (though you might want to clip out-of-range values if they appear). However, it is possible that the out-of-range values are relevant, in this case you need to re-scale the output to match the display range.
In your case 2 (the image kernel), you could normalize the kernel to avoid re-scaling the output. But this is not a solution in general. Some filters add up to 0 (e.g. the Sobel kernels or the Laplace kernel, both of which are based on derivatives which remove the DC component). These cannot be normalized, you will always have to re-scale the output image for display (though you wouldn't re-scale their output for analysis, since their output values have a physical meaning that is destroyed upon re-scaling).
That is to say, the convolution sometimes is meant to produce an output image with the same contrast (within approximately the same range) as the input image, and sometimes it isn't. You need to know what filter you are applying for the output to make sense, and to be able to display the output on a screen that expects images to be in a specific range.
EDIT: explanation of what is going on in your figures.
1st figure: Here you are rescaling so that the full image intensity range is visible. Logically here you don't get any saturated pixels. But because the matrix kernel enhances high frequencies, the output image has values outside the original range. Rescaling to fit the full range within the display's range reduces the contrast of the image.
2nd figure: You are rescaling the frequency-domain convolution result by N = imageWidth * imageHeight. This yields the right output. That you need to apply this scaling indicates that your forward FFT scales by 1/N, and your inverse FFT doesn't scale.
For IFFT(FFT(img))==img, it is necessary that either the FFT or the IFFT are scaled by 1/N. Typically it is the IFFT that is scaled. The reason is that then the convolution does as expected without any further scaling. To see this, imagine an image where all pixels have the same value. FFT(img) will be zero everywhere except for the 0 frequency component (DC component), which will be sum(img). The normalized kernel sums up to 1, so its DC component is sum(kernel)==1. Multiply these two, we obtain again a frequency spectrum like the input's, with a DC component of sum(img). Its inverse transform will be equal to img. This is exactly what we expect for this convolution.
Now, use the other form of normalization (i.e. the one used by the FFT you have access to). The DC component of FFT(img) will be sum(img)/N. The DC component of the kernel will be 1/N. Multiply these two, and obtain a DC component of sum(img)/(N*N). Its inverse transform will be equal to img/N. Thus, you need to multiply by N to obtain the expected result. This is exactly what you're seeing in your frequency-domain convolution for the "matrix kernel", which is normalized.
As I mentioned above, the "image kernel" isn't normalized. The DC component of FFT(kernel) is sum(img)/N, so its product with FFT(img) has a DC component of sum(img)*sum(img)/(N*N), and the inverse transform has a contrast multiplied by sum(img)/N. Multiplying by N still leaves you with a factor sum(img) too large. If you were to normalize the kernel, you would be dividing it by sum(img), which would bring your output into the expected range.
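The effect of the two normalization conventions can be checked with a small 1D experiment. The sketch below uses a naive O(N^2) DFT written for this illustration (it is not the FourierTransform class from the question, whose scaling is only inferred above) and convolves a constant signal with a kernel that sums to 1. The names DftScalingDemo, Dft and CircularConvolve are hypothetical.

```csharp
using System;
using System.Numerics;

public static class DftScalingDemo
{
    // Naive DFT. 'inverse' flips the sign of the exponent; 'scale' divides by N.
    public static Complex[] Dft(Complex[] input, bool inverse, bool scale)
    {
        int n = input.Length;
        var output = new Complex[n];
        double sign = inverse ? 1.0 : -1.0;
        for (int k = 0; k < n; k++)
        {
            Complex sum = Complex.Zero;
            for (int t = 0; t < n; t++)
            {
                double angle = sign * 2.0 * Math.PI * k * t / n;
                sum += input[t] * new Complex(Math.Cos(angle), Math.Sin(angle));
            }
            output[k] = scale ? sum / n : sum;
        }
        return output;
    }

    // Circular convolution via the frequency domain.
    // scaleForward == true : forward transforms divide by N, inverse does not.
    // scaleForward == false: forward transforms are unscaled, inverse divides by N.
    public static double[] CircularConvolve(double[] signal, double[] kernel, bool scaleForward)
    {
        int n = signal.Length;
        Complex[] s = Dft(Array.ConvertAll(signal, v => new Complex(v, 0.0)), false, scaleForward);
        Complex[] k = Dft(Array.ConvertAll(kernel, v => new Complex(v, 0.0)), false, scaleForward);
        var product = new Complex[n];
        for (int i = 0; i < n; i++) product[i] = s[i] * k[i];
        Complex[] result = Dft(product, true, !scaleForward);
        return Array.ConvertAll(result, c => c.Real);
    }
}
```

For a signal of eight ones and the kernel {3, -1, 0, 0, 0, 0, 0, -1}, the inverse-scaled convention returns ones, while the forward-scaled convention returns 1/8 everywhere, so the result has to be multiplied by N = 8, matching the argument above.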

How to convert from System.Drawing.Bitmap to grayscale, then to array of doubles (Double[,]), then back to grayscale Bitmap?

I need to perform some mathematical operations in photographs, and for that I need the floating point grayscale version of an image (which might come from JPG, PNG or BMP files with various colordepths).
I used to do that in Python using PIL and scipy.ndimage, and it was very straightforward to convert to grayscale with PIL and then to an array of floating-point numbers with numpy, but now I need to do something similar in C#, and I'm confused how to do so.
I have read this very nice tutorial, that seems to be a recurring reference, but that only covers the "convert to grayscale" part, I am not sure how to get an array of doubles from a Bitmap, and then (at some moment) to convert it back to System.Drawing.Bitmap for viewing.
I'm sure there are loads of optimal ways to do this.
As @Groo points out perfectly in the comments section, one could use, for instance, the LockBits method to read and write pixel colors from and to a Bitmap instance.
Going even further, one could use the graphics card of the computer to do the actual computations.
Furthermore, the method Color ToGrayscaleColor(Color color), which turns a color into its grayscale version, is not optically correct. There is a set of ratios which actually need to be applied to the color component strengths. I just used 1, 1, 1 ratios. That's acceptable for me and probably horrible for an artist or a scientist.
In the comments section, @plinth was kind enough to point to this question, which you should look at if you want to make an anatomically correct conversion: Converting RGB to grayscale/intensity
Just wanted to share this really easy to understand and implement solution:
First, a little helper to turn a Color into its grayscale version:
public static Color ToGrayscaleColor(Color color) {
var level = (byte)((color.R + color.G + color.B) / 3);
var result = Color.FromArgb(level, level, level);
return result;
}
Then for the color bitmap to grayscale bitmap conversion:
public static Bitmap ToGrayscale(Bitmap bitmap) {
var result = new Bitmap(bitmap.Width, bitmap.Height);
for (int x = 0; x < bitmap.Width; x++)
for (int y = 0; y < bitmap.Height; y++) {
var grayColor = ToGrayscaleColor(bitmap.GetPixel(x, y));
result.SetPixel(x, y, grayColor);
}
return result;
}
The doubles part is quite easy. The Bitmap object is a memory representation of the actual image which you can use in various operations. The colordepth and image format details are only the concern of loading and saving instances of Bitmap onto streams or files. We needn't care about those at this point:
public static double[,] FromGrayscaleToDoubles(Bitmap bitmap) {
var result = new double[bitmap.Width, bitmap.Height];
for (int x = 0; x < bitmap.Width; x++)
for (int y = 0; y < bitmap.Height; y++)
result[x, y] = (double)bitmap.GetPixel(x, y).R / 255;
return result;
}
And turning a double array back into a grayscale image:
public static Bitmap FromDoublesToGrayscal(double[,] doubles) {
var result = new Bitmap(doubles.GetLength(0), doubles.GetLength(1));
for (int x = 0; x < result.Width; x++)
for (int y = 0; y < result.Height; y++) {
int level = (int)Math.Round(doubles[x, y] * 255);
if (level > 255) level = 255; // just to be sure
if (level < 0) level = 0; // just to be sure
result.SetPixel(x, y, Color.FromArgb(level, level, level));
}
return result;
}
The following lines:
if (level > 255) level = 255; // just to be sure
if (level < 0) level = 0; // just to be sure
are really there in case you operate on the doubles and you want to allow room for little mistakes.
The final code, based mostly in tips taken from the comments, specifically the LockBits part (blog post here) and the perceptual balancing between R, G and B values (not paramount here, but something to know about):
private double[,] TransformaImagemEmArray(System.Drawing.Bitmap imagem) {
    // Converts the input image into an array of doubles
    // holding the image's grayscale values
    BitmapData bitmap_data = imagem.LockBits(new System.Drawing.Rectangle(0, 0, imagem.Width, imagem.Height),
        ImageLockMode.ReadOnly, imagem.PixelFormat);
    int pixelsize = System.Drawing.Image.GetPixelFormatSize(bitmap_data.PixelFormat) / 8;
    IntPtr pointer = bitmap_data.Scan0;
    int nbytes = bitmap_data.Height * bitmap_data.Stride;
    byte[] imagebytes = new byte[nbytes];
    // Copy the raw pixel data (including row padding) into a managed array
    System.Runtime.InteropServices.Marshal.Copy(pointer, imagebytes, 0, nbytes);
    double red;
    double green;
    double blue;
    double gray;
    var _grayscale_array = new Double[bitmap_data.Height, bitmap_data.Width];
    if (pixelsize >= 3) {
        for (int I = 0; I < bitmap_data.Height; I++) {
            for (int J = 0; J < bitmap_data.Width; J++) {
                // Pixels are stored as B, G, R (optionally followed by A)
                int position = (I * bitmap_data.Stride) + (J * pixelsize);
                blue = imagebytes[position];
                green = imagebytes[position + 1];
                red = imagebytes[position + 2];
                gray = 0.299 * red + 0.587 * green + 0.114 * blue;
                _grayscale_array[I, J] = gray;
            }
        }
    }
    // Unlock the same bitmap that was locked above
    imagem.UnlockBits(bitmap_data);
    return _grayscale_array;
}
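The function above only covers the Bitmap-to-array direction. For the way back, one possible approach is to pack the gray levels into a B, G, R byte buffer and copy that into a locked 24bpp bitmap with Marshal.Copy. The sketch below shows only the packing step, on plain arrays; GrayscalePacking, Stride24bpp and ToBgrBytes are hypothetical names, and the array is assumed to be in [row, column] order with values on the 0..255 scale, as produced above. When writing into a real bitmap, the Stride reported by LockBits should be used rather than the computed one.

```csharp
using System;

public static class GrayscalePacking
{
    // Row length in bytes for 24 bits per pixel, padded to a multiple of 4 bytes.
    public static int Stride24bpp(int width)
    {
        return ((width * 3 + 3) / 4) * 4;
    }

    // Packs gray levels into B, G, R triplets, row by row, including padding.
    public static byte[] ToBgrBytes(double[,] gray)
    {
        int height = gray.GetLength(0);
        int width = gray.GetLength(1);
        int stride = Stride24bpp(width);
        var bytes = new byte[height * stride];
        for (int i = 0; i < height; i++)
        {
            for (int j = 0; j < width; j++)
            {
                // Clamp first so that values pushed out of range by the math stay valid
                double clamped = Math.Max(0.0, Math.Min(255.0, gray[i, j]));
                byte level = (byte)Math.Round(clamped);
                int position = i * stride + j * 3;
                bytes[position] = level;     // blue
                bytes[position + 1] = level; // green
                bytes[position + 2] = level; // red
            }
        }
        return bytes;
    }
}
```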

Faster method for drawing in C#

I'm trying to draw the Mandelbrot fractal, using the following method that I wrote:
public void Mendelbrot(int MAX_Iterations)
{
int iterations = 0;
for (float x = -2; x <= 2; x += 0.001f)
{
for (float y = -2; y <= 2; y += 0.001f)
{
Graphics gpr = panel.CreateGraphics();
//System.Numerics
Complex C = new Complex(x, y);
Complex Z = new Complex(0, 0);
for (iterations = 0; iterations < MAX_Iterations && Complex.Abs(Z) < 2; iterations++)
Z = Complex.Pow(Z, 2) + C;
//ARGB color based on Iterations
int r = (iterations % 32) * 7;
int g = (iterations % 16) * 14;
int b = (iterations % 128) * 2;
int a = 255;
Color c = Color.FromArgb(a,r,g,b);
Pen p = new Pen(c);
//Transform the coordinates x(real number) and y(imaginary number)
//of the Gauss graph in x and y of the Cartesian graph
float X = (panel.Width * (x + 2)) / 4;
float Y = (panel.Height * (y + 2)) / 4;
//Draw a single pixel using a Rectangle
gpr.DrawRectangle(p, X, Y, 1, 1);
}
}
}
It works, but it's slow, because I need to add the possibility of zooming. Using this method of drawing it isn't possible, so I need something fast. I tried to use a FastBitmap, but it isn't enough, the SetPixel of the FastBitmap doesn't increase the speed of drawing. So I'm searching for something very fast, I know that C# isn't like C and ASM, but it would be interesting do this in C# and Winforms.
Suggestions are welcome.
EDIT: Mendelbrot Set Zoom Animation
I assume it would be significantly more efficient to first populate your RGB values into a byte array in memory, then write them in bulk into a Bitmap using LockBits and Marshal.Copy (follow the link for an example), and finally draw the bitmap using Graphics.DrawImage.
You need to understand some essential concepts, such as stride and image formats, before you can get this to work.
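The first step of that approach, filling a byte array in memory, could be sketched as follows. This is an illustration under assumptions, not the asker's code: it iterates per pixel instead of per 0.001 step, uses plain doubles instead of Complex.Pow, and lays the bytes out as B, G, R, A as expected by a Format32bppArgb bitmap (where the stride is simply width * 4). MandelbrotBuffer and Render are hypothetical names; width and height are assumed to be at least 2.

```csharp
using System;

public static class MandelbrotBuffer
{
    // Renders the region [-2, 2] x [-2, 2] into a 32bpp B, G, R, A byte buffer.
    public static byte[] Render(int width, int height, int maxIterations)
    {
        var pixels = new byte[width * height * 4];
        for (int py = 0; py < height; py++)
        {
            double ci = -2.0 + 4.0 * py / (height - 1);
            for (int px = 0; px < width; px++)
            {
                double cr = -2.0 + 4.0 * px / (width - 1);
                double zr = 0.0, zi = 0.0;
                int iterations = 0;
                // Z = Z^2 + C while |Z| < 2, without allocating Complex values
                while (iterations < maxIterations && zr * zr + zi * zi < 4.0)
                {
                    double next = zr * zr - zi * zi + cr;
                    zi = 2.0 * zr * zi + ci;
                    zr = next;
                    iterations++;
                }

                // Same color formula as in the question
                int offset = (py * width + px) * 4;
                pixels[offset] = (byte)((iterations % 128) * 2);     // blue
                pixels[offset + 1] = (byte)((iterations % 16) * 14); // green
                pixels[offset + 2] = (byte)((iterations % 32) * 7);  // red
                pixels[offset + 3] = 255;                            // alpha
            }
        }
        return pixels;
    }
}
```

The returned array can then be copied into a bitmap locked with LockBits via Marshal.Copy and drawn once with Graphics.DrawImage, so nothing is drawn pixel by pixel.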
As a comment said, move CreateGraphics() out of the double loop; that alone is already a good improvement.
But also
Enable double buffering
For zooming use MatrixTransformation functions like:
ScaleTransform
RotateTransform
TranslateTransform
An interesting article on CodeProject can be found here. It goes a little further than just function calls by actually explaining matrix calculus (in a simple way, don't worry), which is good and not difficult to understand, so you know what is going on behind the scenes.

Y coordinate is out of range in a loop

I am working on constructing and saving a bitmap, and I have a loop that sets the pixels in the bitmap to their proper values. However, it crashes after a short period of time with an IndexOutOfRange exception at the noted point in the code.
//data is an array of bytes of size (image width * image height) * 2;
Bitmap b = new Bitmap(width, height, PixelFormat.Format32bppArgb);
for (int i = 0; i < data.Length; i += 2)
{
int luminance = ((int)data[i] << 8) | (int)data[i + 1];
Color c = Color.FromArgb(luminance,luminance,luminance,luminance);
int x = i / 2;
int y = x / width;
x %= width;
b.SetPixel(x, y, c);//crashes here when Y is at 513, should only go to 512
}
b.Save(Path.GetFileNameWithoutExtension(fileName) + ".bmp");
I'm stumped as to why this happens. Why does this happen and how can I fix it?
(A note to all of those that recommend unsafe code: I am going for a working program first, then a fast one. I'll be sure to write up 3 questions on the subject when I start! ;) )
When Length is odd, then at some point i+1 == Length will be true.
for (int i = 0; i < data.Length; i += 2)
{
int luminance = ((int)data[i] << 8) | (int)data[i + 1];
int x = (i + 1) / 2;
}
I would suggest replacing
//data is an array of bytes of size (image width * image height) * 2;
with
System.Diagnostics.Debug.Assert(data.Length == width * height * 2);
System.Diagnostics.Debug.Assert((data.Length % 2) == 0);
It's hard to tell what might be wrong without knowing what your data actually is. I suspect that it might be organised into rows like a bitmap, but sometimes bitmap format data requires that rows be a multiple of 4 bytes in length (with unused padding at the end, see BMP file format). If this is the case, your y value might become larger than you expect. You may need to take such padding into account.
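To make the size assumption explicit at run time, the decoding step could be separated from the bitmap and guarded by a length check. The following is only a sketch under the assumption stated in the question's comment (two bytes per pixel, high byte first, no row padding); Luminance16 and Decode are hypothetical names. It loops over pixel indexes rather than byte indexes, so data[2 * p + 1] can never run past the end once the length check has passed.

```csharp
using System;

public static class Luminance16
{
    // Decodes big-endian 16-bit samples into a [row, column] array.
    public static ushort[,] Decode(byte[] data, int width, int height)
    {
        if (data.Length != width * height * 2)
            throw new ArgumentException("Expected width * height * 2 bytes.", nameof(data));

        var pixels = new ushort[height, width];
        for (int p = 0; p < width * height; p++)
        {
            int y = p / width; // row
            int x = p % width; // column
            pixels[y, x] = (ushort)((data[2 * p] << 8) | data[2 * p + 1]);
        }
        return pixels;
    }
}
```

If the check throws for real data, that points to padding or a different layout, as discussed above.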
