I need to apply a 1D Gaussian filter to a list of floats in C#, i.e., to smooth a graph.
I got as far as simply averaging each value with its n neighbors, but the result wasn't quite right, and I discovered that I need to weight the contributions of the neighboring values in each iteration according to a normal distribution.
I can't find a library like SciPy that has a function for this, and I don't quite understand the algebraic formulas I have found for computing a Gaussian kernel. The examples I found are generally geared towards 2D implementations for images.
Can anyone suggest the modifications that would need to be made to the following code to achieve the proper Gaussian effect?
public static List<float> MeanFloats(List<float> floats, int width)
{
    List<float> results = new List<float>();
    if (width % 2 == 0)
        width -= 1; // make sure width is odd
    int halfWidthMinus1 = width / 2; // width is known to be odd, so dividing by 2 rounds down
    for (int i = 0; i < floats.Count; i++) // iterate through all floats in the list
    {
        float result = 0;
        for (int j = 0; j < width; j++)
        {
            var index = i - halfWidthMinus1 + j;
            // clamp index: the first and last elements of the list are used when the
            // algorithm tries to access outside the bounds of the list
            index = Math.Max(index, 0);
            index = Math.Min(index, floats.Count - 1);
            result += floats[index]; // multiply with kernel here??
        }
        result /= width; // calculate mean
        results.Add(result);
    }
    return results;
}
If relevant this is for use in a Unity game.
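A 1-dimensional Gaussian kernel is defined as

$$G(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^2}{2\sigma^2}}$$

where sigma is the standard deviation of the Gaussian (a parameter you choose, which controls how strongly the data is smoothed; it is not the standard deviation of your list), and x is the distance in indices from the sample being smoothed.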
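As a worked example (numbers computed here for illustration, not taken from the original answer), take sigma = 1 and two samples on each side of the center. The constant factor in front of the exponential cancels once the weights are divided by their sum, so only the exponentials matter:

$$w(x) = e^{-x^2/2}: \quad w(0) = 1,\quad w(\pm 1) \approx 0.6065,\quad w(\pm 2) \approx 0.1353$$

$$\sum_x w(x) \approx 2.4837 \;\Rightarrow\; \frac{w}{\sum w} \approx (0.0545,\ 0.2442,\ 0.4026,\ 0.2442,\ 0.0545)$$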
You then create a kernel by filling each of its array slots with a multiplier. Here is an (untested) example:
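private static float[] GaussianKernel(int width, float sigma)
{
    // width is the number of samples on each side of the center,
    // so the kernel holds 2 * width + 1 weights
    float[] kernel = new float[width + 1 + width];
    for (int i = -width; i <= width; i++)
    {
        // 1D Gaussian: exp(-x^2 / (2 sigma^2)) / (sqrt(2 pi) sigma).
        // Math.Exp and Math.PI work in double, so cast the result back to float.
        kernel[width + i] = (float)(Math.Exp(-(i * i) / (2.0 * sigma * sigma)) / (Math.Sqrt(2 * Math.PI) * sigma));
    }
    return kernel;
}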
In your smoothing function, multiply each floats[index] value by its kernel weight. Finally, before adding the result, divide it by the total sum of the kernel weights (the values of the kernel array) instead of by the width.
You can accumulate that sum during each iteration of your j-loop with weightSum += kernel[j];.
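Putting the kernel and the weighted average together, here is a minimal sketch of a complete Gaussian smoothing function, assuming the same end-of-list clamping as MeanFloats. The names GaussianSmoothing, Kernel, Smooth and the radius parameter (samples on each side of the center) are hypothetical; the 1/(sqrt(2 pi) sigma) factor is dropped because dividing by the weight sum cancels it.

```csharp
using System;
using System.Collections.Generic;

public static class GaussianSmoothing
{
    // Builds 2 * radius + 1 unnormalized Gaussian weights, centered at index radius.
    public static float[] Kernel(int radius, float sigma)
    {
        var kernel = new float[2 * radius + 1];
        for (int i = -radius; i <= radius; i++)
        {
            kernel[radius + i] = (float)Math.Exp(-(i * i) / (2.0 * sigma * sigma));
        }
        return kernel;
    }

    // Each output is the kernel-weighted mean of its neighbors.
    // Indices outside the list are clamped to the first/last element.
    public static List<float> Smooth(List<float> values, int radius, float sigma)
    {
        float[] kernel = Kernel(radius, sigma);
        var results = new List<float>(values.Count);
        for (int i = 0; i < values.Count; i++)
        {
            float sum = 0f;
            float weightSum = 0f;
            for (int j = 0; j < kernel.Length; j++)
            {
                int index = i - radius + j;
                index = Math.Max(index, 0);
                index = Math.Min(index, values.Count - 1);
                sum += values[index] * kernel[j];
                weightSum += kernel[j];
            }
            // Dividing by the weight sum (not the width) normalizes the kernel.
            results.Add(sum / weightSum);
        }
        return results;
    }
}
```

For example, Smooth(values, 3, 1.5f) averages each value with three neighbors on each side. A common rule of thumb is a radius of about 3 sigma, beyond which the weights are negligible.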
I'm looking to optimize a program that bases a lot of its calculations on rotating many 2D points. I've searched around to see whether it's possible to do these calculations using SIMD in C#.
I found a C++ answer here that seems to do what I want, but I can't seem to translate it into C# using the System.Numerics.Vectors package.
Optimising 2D rotation
Can anyone point me in the right direction for how this can be done?
The code below shows the regular method without SIMD, where Point is a struct with double fields X and Y.
public static Point[] RotatePoints(Point[] points, double cosAngle, double sinAngle)
{
    var pointsLength = points.Length;
    var results = new Point[pointsLength];
    for (var i = 0; i < pointsLength; i++)
    {
        results[i].X = (points[i].X * cosAngle) - (points[i].Y * sinAngle);
        results[i].Y = (points[i].X * sinAngle) + (points[i].Y * cosAngle);
    }
    return results;
}
Edit:
I've managed to get an implementation working using two Vector<float> values, but benchmarking shows that it is a lot slower than the previous implementation.
private static void RotatePoints(float[] x, float[] y, float cosAngle, float sinAngle)
{
    var chunkSize = Vector<float>.Count;
    var resultX = new float[x.Length];
    var resultY = new float[x.Length];
    Vector<float> vectorChunk1;
    Vector<float> vectorChunk2;
    for (var i = 0; i < x.Length; i += chunkSize)
    {
        vectorChunk1 = new Vector<float>(x, i);
        vectorChunk2 = new Vector<float>(y, i);
        Vector.Subtract(Vector.Multiply(vectorChunk1, cosAngle), Vector.Multiply(vectorChunk2, sinAngle)).CopyTo(resultX, i);
        Vector.Add(Vector.Multiply(vectorChunk1, sinAngle), Vector.Multiply(vectorChunk2, cosAngle)).CopyTo(resultY, i);
    }
}
The code added in the edit is a good start. However, the codegen for Vector.Multiply(Vector<float>, float) is extremely bad, so that overload should be avoided. It's easy to avoid, though: broadcast the scalars into vectors outside the loop and multiply by those vectors. I also added a more proper loop bound and a "scalar epilog" in case the vector size does not evenly divide the length of the input arrays.
// requires: using System.Numerics;
private static (float[] X, float[] Y) RotatePoints(float[] x, float[] y, float cosAngle, float sinAngle)
{
    var chunkSize = Vector<float>.Count;
    var resultX = new float[x.Length];
    var resultY = new float[x.Length];
    Vector<float> vectorChunk1;
    Vector<float> vectorChunk2;
    // Broadcast the scalars once, outside the loop
    Vector<float> vcosAngle = new Vector<float>(cosAngle);
    Vector<float> vsinAngle = new Vector<float>(sinAngle);
    int i;
    // Vectorized main loop: runs only while a whole vector still fits
    for (i = 0; i + chunkSize - 1 < x.Length; i += chunkSize)
    {
        vectorChunk1 = new Vector<float>(x, i);
        vectorChunk2 = new Vector<float>(y, i);
        Vector.Subtract(Vector.Multiply(vectorChunk1, vcosAngle), Vector.Multiply(vectorChunk2, vsinAngle)).CopyTo(resultX, i);
        Vector.Add(Vector.Multiply(vectorChunk1, vsinAngle), Vector.Multiply(vectorChunk2, vcosAngle)).CopyTo(resultY, i);
    }
    // Scalar epilog for the elements left over after the last full vector
    for (; i < x.Length; i++)
    {
        resultX[i] = x[i] * cosAngle - y[i] * sinAngle;
        resultY[i] = x[i] * sinAngle + y[i] * cosAngle;
    }
    // Return the results; a void method would discard them
    return (resultX, resultY);
}
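The question's Point struct stores doubles, while the edit switched to float. If double precision matters, the same pattern (broadcast outside the loop, vectorized main loop, scalar epilog) should carry over to Vector<double>, which holds half as many lanes per vector as Vector<float>. A sketch, assuming the coordinates have already been split into separate X and Y arrays and that the caller supplies the output arrays; SimdRotation is a hypothetical name and this version has not been benchmarked.

```csharp
using System;
using System.Numerics;

public static class SimdRotation
{
    // Rotates points stored as separate X/Y arrays (structure-of-arrays layout).
    // Assumes ys, resultX and resultY are at least as long as xs.
    public static void RotatePoints(double[] xs, double[] ys, double[] resultX, double[] resultY,
                                    double cosAngle, double sinAngle)
    {
        int width = Vector<double>.Count;
        // Broadcast the scalars once, outside the loop.
        var vCos = new Vector<double>(cosAngle);
        var vSin = new Vector<double>(sinAngle);
        int i = 0;
        for (; i <= xs.Length - width; i += width)
        {
            var vx = new Vector<double>(xs, i);
            var vy = new Vector<double>(ys, i);
            (vx * vCos - vy * vSin).CopyTo(resultX, i);
            (vx * vSin + vy * vCos).CopyTo(resultY, i);
        }
        // Scalar epilog for elements that do not fill a whole vector.
        for (; i < xs.Length; i++)
        {
            resultX[i] = xs[i] * cosAngle - ys[i] * sinAngle;
            resultY[i] = xs[i] * sinAngle + ys[i] * cosAngle;
        }
    }
}
```

Copying Point[] into separate X and Y arrays costs a pass over the data, so the layout conversion pays off mainly when the same points are rotated repeatedly.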
I've got an input signal and I calculated its FFT. After that, I need to calculate its RMS only over a band of frequencies, not over the whole spectrum.
I solved the RMS calculation for the entire spectrum by applying Parseval's theorem, but how do I calculate this kind of "selective" RMS? I've correctly calculated the indexes of the three frequencies of interest (F0, FC, F1), but when I apply the RMS calculation to this band, Parseval's theorem does not seem to hold.
My input is a single 10 kHz frequency. The RMS from the total FFT spectrum is correct, but the selective RMS at 10 kHz gives me a wrong result (0.4 V lower than the correct RMS), even though it should give almost the same result, since there is only one frequency in the spectrum. Here is my selective RMS calculation:
public static double RMSSelectiveCalculation(double[] trama, double samplingFreq, double F0, double Fc, double F1)
{
    // Frequency of interest
    double fs = samplingFreq; // Sampling frequency
    double t1 = 1 / fs;       // Sample time
    int l = trama.Length;     // Length of signal
    double rmsSelective = 0;
    double ParsevalB = 0;
    double scalingFactor = fs;
    double dt = 1 / fs;
    // We just use half of the data, as the other half is symmetric. The middle is found at NFFT/2 + 1
    int nFFT = (int)Math.Pow(2, NextPow2(l));
    double df = fs / nFFT;
    if (nFFT > 655600)
    { }
    // Create complex array for FFT transformation. Use 0s for imaginary part
    Complex[] samples = new Complex[nFFT];
    Complex[] reverseSamples = new Complex[nFFT];
    double[] frecuencies = new double[nFFT];
    for (int i = 0; i < nFFT; i++)
    {
        frecuencies[i] = i * (fs / nFFT);
        if (i >= trama.Length)
        {
            samples[i] = new MathNet.Numerics.Complex(0, 0);
        }
        else
        {
            samples[i] = new MathNet.Numerics.Complex(trama[i], 0);
        }
    }
    ComplexFourierTransformation fft = new ComplexFourierTransformation(TransformationConvention.Matlab);
    fft.TransformForward(samples);
    ComplexVector s = new ComplexVector(samples);
    // The indexes will get the index of each frequency
    int f0Index, fcIndex, f1Index;
    double k = nFFT / fs;
    f0Index = (int)Math.Floor(k * F0);
    fcIndex = (int)Math.Floor(k * Fc);
    f1Index = (int)Math.Ceiling(k * F1);
    for (int i = f0Index; i <= f1Index; i++)
    {
        ParsevalB += Math.Pow(Math.Abs(s[i].Modulus / scalingFactor), 2.0);
    }
    ParsevalB = ParsevalB * df;
    double ownSF = fs / l; // My own scale factor, applied before taking the square root
    rmsSelective = Math.Sqrt(ParsevalB * ownSF);
    samples = null;
    s = null;
    return rmsSelective;
}
An estimate of the power spectral density (PSD) is given by the squared magnitude of the FFT.
The RMS of a section with a certain bandwidth is the square root of the area under the PSD over that section.
So in practice, integrate the squared magnitude of the FFT between the lower and upper frequencies, then take the square root.
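For a real-valued signal, each tone's power is split between a positive-frequency bin and its mirror image at negative frequency. A band sum over only the positive-frequency bins therefore captures roughly half of the power, and the RMS comes out at about 1/sqrt(2) of the true value; that is a plausible cause of the discrepancy in the question, though not one confirmed in the original thread. The sketch below squares the magnitudes, counts every bin except DC and Nyquist twice, and takes the square root. BandRms.Compute is a hypothetical helper; it uses a direct DFT restricted to the band with System.Numerics.Complex instead of Math.NET's FFT, only to stay self-contained.

```csharp
using System;
using System.Numerics;

public static class BandRms
{
    // RMS of the part of a real signal whose frequency lies in [fLow, fHigh].
    // Computes only the DFT bins inside the band (O(n * bins)); an FFT yields the same bins.
    public static double Compute(double[] signal, double sampleRate, double fLow, double fHigh)
    {
        int n = signal.Length;
        int kLow = Math.Max(0, (int)Math.Ceiling(fLow * n / sampleRate));
        int kHigh = Math.Min(n / 2, (int)Math.Floor(fHigh * n / sampleRate));
        double power = 0.0;
        for (int k = kLow; k <= kHigh; k++)
        {
            Complex bin = Complex.Zero;
            for (int t = 0; t < n; t++)
                bin += signal[t] * Complex.FromPolarCoordinates(1.0, -2.0 * Math.PI * k * t / n);
            // Parseval: mean(x^2) = sum over all n bins of |X_k|^2 / n^2.
            double p = bin.Magnitude * bin.Magnitude / ((double)n * n);
            // Every bin except DC and Nyquist has a mirror at n - k that is not
            // visited here, so count its power twice.
            bool hasMirror = k != 0 && !(n % 2 == 0 && k == n / 2);
            power += hasMirror ? 2.0 * p : p;
        }
        return Math.Sqrt(power);
    }
}
```

If the signal is zero-padded to nFFT points, as in the question, the sum of squared magnitudes should be divided by nFFT * L (L being the original number of samples) instead of n^2. A tone that does not complete a whole number of cycles in the window also leaks power into neighboring bins, so a very narrow band can under-report.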
Summary:
My class SquareDistance computes the square of the Cartesian distance in four ways using methods with these names:
Signed
UnsignedBranching
UnsignedDistribute
CastToSignedLong
The first one is the fastest and uses signed integers, but my data must be unsigned (for reasons given below). The other three methods start with unsigned numbers. My goal is to write a method like those in SquareDistance that takes unsigned data and performs better than the three I already wrote, as close as possible to the performance of #1. Code with benchmark results follows. (Unsafe code is permitted, if you think it will help.)
Details:
I am developing an algorithm to solve K-nearest neighbor problems using an index derived from the Hilbert curve. The execution time of the naive linear-scan algorithm grows quadratically with the number of points and linearly with the number of dimensions, and it spends all its time computing and comparing Cartesian distances.
The motivation behind the special Hilbert index is to reduce the number of times that the distance function is called. However, it must still be called millions of times, so I must make it as fast as possible. (It is the most frequently called function in the program. A recent failed attempt to optimize the distance function doubled the program execution time from seven minutes to fifteen minutes, so no, this is not a premature or superfluous optimization.)
Dimensions: The points may have anywhere from ten to five thousand dimensions.
Constraints: I have two annoying constraints:
The Hilbert transformation logic requires that the points be expressed as uint (unsigned integer) arrays. (The code was written by someone else, is magic, uses shifts, ANDs, ORs and the like, and can't be changed.) Storing my points as signed integers and incessantly casting them to uint arrays produced wretched performance, so I must at the very least store a uint array copy of each point.
To improve efficiency, I made a signed integer copy of each point to speed up the distance calculations. This worked very well, but once I get to about 3,000 dimensions, I run out of memory!
To save on memory, I removed the memoized signed integer arrays and tried to write an unsigned version of the distance calculation. My best results are 2.25 times worse than the signed integer version.
The benchmarks create 1000 random points of 1000 dimensions each and perform distance calculations between every point and every other point, for 1,000,000 comparisons. Since I only care about the relative distance, I save time by not performing the square root.
In debug mode:
SignedBenchmark Ratio: 1.000 Seconds: 3.739
UnsignedBranchingBenchmark Ratio: 2.731 Seconds: 10.212
UnsignedDistributeBenchmark Ratio: 3.294 Seconds: 12.320
CastToSignedLongBenchmark Ratio: 3.265 Seconds: 12.211
In release mode:
SignedBenchmark Ratio: 1.000 Seconds: 3.494
UnsignedBranchingBenchmark Ratio: 2.672 Seconds: 9.334
UnsignedDistributeBenchmark Ratio: 3.336 Seconds: 11.657
CastToSignedLongBenchmark Ratio: 3.471 Seconds: 12.127
The above benchmarks were run on a Dell with an Intel Core i7-4800MQ CPU @ 2.70GHz with 16 GB memory. My larger algorithm already uses the Task Parallel Library for larger tasks, so it is fruitless to parallelize this inner loop.
Question: Can anyone think of a faster algorithm than UnsignedBranching?
Below is my benchmark code.
UPDATE
This uses loop unrolling (thanks to @dasblinkenlight) and is 2.7 times faster:
public static long UnsignedLoopUnrolledBranching(uint[] x, uint[] y)
{
    var distance = 0UL;
    var leftovers = x.Length % 4;
    var dimensions = x.Length;
    var roundDimensions = dimensions - leftovers;
    for (var i = 0; i < roundDimensions; i += 4)
    {
        var x1 = x[i];
        var y1 = y[i];
        var x2 = x[i + 1];
        var y2 = y[i + 1];
        var x3 = x[i + 2];
        var y3 = y[i + 2];
        var x4 = x[i + 3];
        var y4 = y[i + 3];
        // Widen to ulong: squaring a uint delta above 65535 would overflow 32 bits
        ulong delta1 = x1 > y1 ? x1 - y1 : y1 - x1;
        ulong delta2 = x2 > y2 ? x2 - y2 : y2 - x2;
        ulong delta3 = x3 > y3 ? x3 - y3 : y3 - x3;
        ulong delta4 = x4 > y4 ? x4 - y4 : y4 - x4;
        distance += delta1 * delta1 + delta2 * delta2 + delta3 * delta3 + delta4 * delta4;
    }
    for (var i = roundDimensions; i < dimensions; i++)
    {
        var xi = x[i];
        var yi = y[i];
        ulong delta = xi > yi ? xi - yi : yi - xi;
        distance += delta * delta;
    }
    return (long)distance;
}
SquareDistance.cs:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace DistanceBenchmark
{
    /// <summary>
    /// Provide several alternate methods for computing the square of the Cartesian distance
    /// to allow study of their relative performance.
    /// </summary>
    public static class SquareDistance
    {
        /// <summary>
        /// Compute the square of the Cartesian distance between two N-dimensional points
        /// with calculations done on signed numbers using signed arithmetic,
        /// a single multiplication and no branching.
        /// </summary>
        /// <param name="x">First point.</param>
        /// <param name="y">Second point.</param>
        /// <returns>Square of the distance.</returns>
        public static long Signed(int[] x, int[] y)
        {
            var distance = 0L;
            var dimensions = x.Length;
            for (var i = 0; i < dimensions; i++)
            {
                var delta = x[i] - y[i];
                // Square as long: an int square overflows once |delta| > 46340
                distance += (long)delta * delta;
            }
            return distance;
        }

        /// <summary>
        /// Compute the square of the Cartesian distance between two N-dimensional points
        /// with calculations done on unsigned numbers using unsigned arithmetic, a single multiplication
        /// and a branching instruction (the ternary operator).
        /// </summary>
        /// <param name="x">First point.</param>
        /// <param name="y">Second point.</param>
        /// <returns>Square of the distance.</returns>
        public static long UnsignedBranching(uint[] x, uint[] y)
        {
            var distance = 0UL;
            var dimensions = x.Length;
            for (var i = 0; i < dimensions; i++)
            {
                var xi = x[i];
                var yi = y[i];
                // Widen to ulong: squaring a uint delta above 65535 would overflow 32 bits
                ulong delta = xi > yi ? xi - yi : yi - xi;
                distance += delta * delta;
            }
            return (long)distance;
        }

        /// <summary>
        /// Compute the square of the Cartesian distance between two N-dimensional points
        /// with calculations done on unsigned numbers using unsigned arithmetic and the distributive law,
        /// which requires four multiplications and no branching.
        ///
        /// To prevent overflow, the coordinates are cast to ulongs.
        /// </summary>
        /// <param name="x">First point.</param>
        /// <param name="y">Second point.</param>
        /// <returns>Square of the distance.</returns>
        public static long UnsignedDistribute(uint[] x, uint[] y)
        {
            var distance = 0UL;
            var dimensions = x.Length;
            for (var i = 0; i < dimensions; i++)
            {
                ulong xi = x[i];
                ulong yi = y[i];
                distance += xi * xi + yi * yi - 2 * xi * yi;
            }
            return (long)distance;
        }

        /// <summary>
        /// Compute the square of the Cartesian distance between two N-dimensional points
        /// with calculations done on unsigned numbers using signed arithmetic,
        /// by first casting the values into longs.
        /// </summary>
        /// <param name="x">First point.</param>
        /// <param name="y">Second point.</param>
        /// <returns>Square of the distance.</returns>
        public static long CastToSignedLong(uint[] x, uint[] y)
        {
            var distance = 0L;
            var dimensions = x.Length;
            for (var i = 0; i < dimensions; i++)
            {
                var delta = (long)x[i] - (long)y[i];
                distance += delta * delta;
            }
            return distance;
        }
    }
}
RandomPointFactory.cs:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace DistanceBenchmark
{
    public static class RandomPointFactory
    {
        /// <summary>
        /// Get a random list of signed integer points with the given number of dimensions to use as test data.
        /// </summary>
        /// <param name="recordCount">Number of points to get.</param>
        /// <param name="dimensions">Number of dimensions per point.</param>
        /// <returns>Signed integer test data.</returns>
        public static IList<int[]> GetSignedTestPoints(int recordCount, int dimensions)
        {
            var testData = new List<int[]>();
            var random = new Random(DateTime.Now.Millisecond);
            for (var iRecord = 0; iRecord < recordCount; iRecord++)
            {
                int[] point;
                testData.Add(point = new int[dimensions]);
                for (var d = 0; d < dimensions; d++)
                    point[d] = random.Next(100000);
            }
            return testData;
        }

        /// <summary>
        /// Get a random list of unsigned integer points with the given number of dimensions to use as test data.
        /// </summary>
        /// <param name="recordCount">Number of points to get.</param>
        /// <param name="dimensions">Number of dimensions per point.</param>
        /// <returns>Unsigned integer test data.</returns>
        public static IList<uint[]> GetUnsignedTestPoints(int recordCount, int dimensions)
        {
            var testData = new List<uint[]>();
            var random = new Random(DateTime.Now.Millisecond);
            for (var iRecord = 0; iRecord < recordCount; iRecord++)
            {
                uint[] point;
                testData.Add(point = new uint[dimensions]);
                for (var d = 0; d < dimensions; d++)
                    point[d] = (uint)random.Next(100000);
            }
            return testData;
        }
    }
}
Program.cs:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace DistanceBenchmark
{
    public class Program
    {
        private static IList<int[]> SignedTestData = RandomPointFactory.GetSignedTestPoints(1000, 1000);
        private static IList<uint[]> UnsignedTestData = RandomPointFactory.GetUnsignedTestPoints(1000, 1000);

        static void Main(string[] args)
        {
            var baseline = TimeIt("SignedBenchmark", SignedBenchmark);
            TimeIt("UnsignedBranchingBenchmark", UnsignedBranchingBenchmark, baseline);
            TimeIt("UnsignedDistributeBenchmark", UnsignedDistributeBenchmark, baseline);
            TimeIt("CastToSignedLongBenchmark", CastToSignedLongBenchmark, baseline);
            TimeIt("SignedBenchmark", SignedBenchmark, baseline);
            Console.WriteLine("Done. Type any key to exit.");
            Console.ReadLine();
        }

        public static void SignedBenchmark()
        {
            foreach (var p1 in SignedTestData)
                foreach (var p2 in SignedTestData)
                    SquareDistance.Signed(p1, p2);
        }

        public static void UnsignedBranchingBenchmark()
        {
            foreach (var p1 in UnsignedTestData)
                foreach (var p2 in UnsignedTestData)
                    SquareDistance.UnsignedBranching(p1, p2);
        }

        public static void UnsignedDistributeBenchmark()
        {
            foreach (var p1 in UnsignedTestData)
                foreach (var p2 in UnsignedTestData)
                    SquareDistance.UnsignedDistribute(p1, p2);
        }

        public static void CastToSignedLongBenchmark()
        {
            foreach (var p1 in UnsignedTestData)
                foreach (var p2 in UnsignedTestData)
                    SquareDistance.CastToSignedLong(p1, p2);
        }

        public static double TimeIt(String testName, Action benchmark, double baseline = 0.0)
        {
            var stopwatch = new Stopwatch();
            stopwatch.Start();
            benchmark();
            stopwatch.Stop();
            var seconds = stopwatch.Elapsed.TotalSeconds;
            var ratio = baseline <= 0 ? 1.0 : seconds / baseline;
            Console.WriteLine(String.Format("{0,-32} Ratio: {1:0.000} Seconds: {2:0.000}", testName, ratio, seconds));
            return seconds;
        }
    }
}
You should be able to shave off a lot of execution time by unrolling your loops:
public static long Signed(int[] x, int[] y)
{
    var distance = 0L;
    var dimensions = x.Length;
    var stop = dimensions - (dimensions % 4);
    for (var i = 0; i < stop; i += 4)
    {
        var delta0 = x[i] - y[i];
        var delta1 = x[i + 1] - y[i + 1];
        var delta2 = x[i + 2] - y[i + 2];
        var delta3 = x[i + 3] - y[i + 3];
        // Square as long so that |delta| > 46340 cannot overflow int
        distance += ((long)delta0 * delta0)
                  + ((long)delta1 * delta1)
                  + ((long)delta2 * delta2)
                  + ((long)delta3 * delta3);
    }
    for (var i = stop; i < dimensions; i++)
    {
        var delta = x[i] - y[i];
        distance += (long)delta * delta;
    }
    return distance;
}
This change alone reduced the execution time from 8.325s to 4.745s on my local system - a 43% improvement!
The idea is to process four coordinates at a time while you can, and then finish off the remaining coordinates in a separate loop.
If you cannot change the Hilbert curve, you can try a Z-order curve, i.e. a Morton curve: convert each coordinate to binary, interleave the bits, and then sort the resulting keys. You can verify the upper bounds with the most significant bits. A Hilbert curve in n dimensions uses a Gray code, so maybe you can search the internet for a faster version; you can find some fast implementations in the hackers cookbook. A Morton curve should be similar to an H-tree. When you need the precision, you can try copies of the Hilbert curve, i.e. a Moore curve. For example, in 2D you can interleave four Hilbert curves.
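A minimal sketch of the bit interleaving (Morton / Z-order key) described above. Morton.Encode is a hypothetical helper, and it assumes the whole key fits in 64 bits, which for points with ten to thousands of dimensions means keeping only a few bits per coordinate (or switching to a wider key type). Sorting points by this key orders them along the Z curve.

```csharp
using System;

public static class Morton
{
    // Interleaves the low bitsPerDim bits of each coordinate into one key:
    // bit b of coordinate d lands at position b * dims + d.
    // Assumes coords.Length * bitsPerDim <= 64.
    public static ulong Encode(uint[] coords, int bitsPerDim)
    {
        int dims = coords.Length;
        if (dims * bitsPerDim > 64)
            throw new ArgumentException("Key would need more than 64 bits.");
        ulong key = 0;
        for (int b = 0; b < bitsPerDim; b++)
            for (int d = 0; d < dims; d++)
                key |= (ulong)((coords[d] >> b) & 1u) << (b * dims + d);
        return key;
    }
}
```

In 2D, for example, Encode(new uint[] { 3, 3 }, 16) interleaves the bits 11 and 11 into 1111, i.e. 15.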
The best possible improvement I can see isn't going to be low-hanging fruit. This kind of problem is not well suited to the current version of the .NET Framework (or to a CPU in general).
Your problem is a natural fit for SIMD (single instruction, multiple data) processing. You may have heard of the Intel Pentium MMX; MMX was the marketing name of one of Intel's early SIMD instruction sets.
There are three pretty good ways to get SIMD into your program, listed from slowest to fastest, which is also from easiest to hardest:
RyuJIT (the preview version of the next .NET JIT compiler), which can take advantage of CPU SIMD instructions
P/Invoke into C++ AMP to run the calculation on your GPU
Interop with an FPGA running a core designed for this calculation
I would highly recommend trying to take advantage of your GPU with C++ AMP, especially since a uint[] can easily be passed to C++ AMP.
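For the first option, SIMD is exposed in C# through System.Numerics.Vector<T>, which RyuJIT can compile to SIMD instructions. A sketch, assuming .NET Core 2.0 or later (or a recent System.Numerics.Vectors package); SimdSquareDistance is a hypothetical name, and the code has not been benchmarked against the versions in the question. It avoids the unsigned-subtraction branch by computing |x - y| as Max(x, y) - Min(x, y), then widens to ulong before squaring.

```csharp
using System;
using System.Numerics;

public static class SimdSquareDistance
{
    // Squared Euclidean distance between two uint points. Assumes y.Length >= x.Length.
    public static ulong Compute(uint[] x, uint[] y)
    {
        int width = Vector<uint>.Count;
        var acc = Vector<ulong>.Zero;
        int i = 0;
        for (; i <= x.Length - width; i += width)
        {
            var vx = new Vector<uint>(x, i);
            var vy = new Vector<uint>(y, i);
            // Branch-free absolute difference for unsigned lanes.
            Vector<uint> diff = Vector.Max(vx, vy) - Vector.Min(vx, vy);
            // Widen each half to 64-bit lanes so the squares cannot overflow.
            Vector.Widen(diff, out Vector<ulong> lo, out Vector<ulong> hi);
            acc += lo * lo + hi * hi;
        }
        // Horizontal sum of the accumulator lanes.
        ulong sum = 0;
        for (int j = 0; j < Vector<ulong>.Count; j++)
            sum += acc[j];
        // Scalar tail for lengths that are not a multiple of the vector width.
        for (; i < x.Length; i++)
        {
            ulong d = x[i] > y[i] ? x[i] - y[i] : y[i] - x[i];
            sum += d * d;
        }
        return sum;
    }
}
```

Whether this beats the unrolled scalar loop depends on the CPU: 64-bit lane multiplication is not natively available on every instruction set, so it is worth measuring with the benchmark harness from the question.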
In the shower this morning, I came up with a way to improve this further using the dot product, shaving another fifty percent off the time when the data is stored as uint[] arrays. I had investigated this idea before, but failed to recognize a loop invariant that I could optimize by precomputing. The basis of the idea is to distribute the operations:
$$(x_i - y_i)^2 = x_i^2 + y_i^2 - 2x_i y_i$$
If I sum this over all coordinates i, the result is:
$$D^2 = |x|^2 + |y|^2 - 2(x \cdot y)$$
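A small worked check of the identity, with numbers chosen here for illustration: for x = (3, 1) and y = (1, 4),

$$|x|^2 = 10,\quad |y|^2 = 17,\quad x \cdot y = 3 \cdot 1 + 1 \cdot 4 = 7$$

$$D^2 = 10 + 17 - 2 \cdot 7 = 13 = (3 - 1)^2 + (1 - 4)^2$$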
Since I will be performing lots of distance computations, I can store the squared length of each vector. Then finding the distance between two vectors amounts to adding their squared lengths (outside the loop) and computing the dot product, which involves no subtraction and so never goes negative, meaning it needs no branching!
Why is branching a problem? Because with uint vectors you can't subtract values in the Cartesian formula without a branch that tests which value is larger. Thus, to compute (x-y)*(x-y), I needed to do this in my loop:
var delta = x[i] > y[i] ? x[i] - y[i] : y[i] - x[i];
distance += delta * delta;
In addition, to prevent overflow, I needed to cast numbers from uint to ulong a lot, which really killed performance. Since most of my coordinates are smallish, I was able to add a test: I also store the max value for each vector, and since I am unrolling my loops by four iterations at a time, if 4*xMax*yMax does not overflow uint, I can dispense with most of my casting operations. If the test fails, I do the more costly version that casts more.
In the end, I had several implementations: naive with casting, with branching, distributed with casting and loop invariants not removed, and dot product with less casting and invariants removed.
The naive approach has a subtraction, a multiplication and an addition in each loop iteration. The dot product distribution with loop invariants removed uses only a multiplication and an addition.
Here are the benchmarks:
For 100000 iterations and 2000 dimensions.
Naive time = 2.505 sec.
Branch time = 0.628 sec.
Distributed time = 6.371 sec.
Dot Product time = 0.288 sec.
Improve vs Naive = 88.5%.
Improve vs Branch = 54.14%.
Here is the code as an NUnit test:
using System;
using System.Diagnostics;
using NUnit.Framework;
using System.Linq;

namespace HilbertTransformationTests
{
    [TestFixture]
    public class CartesianDistanceTests
    {
        [Test]
        public void SquareDistanceBenchmark()
        {
            var dims = 2000;
            var x = new uint[dims];
            var y = new uint[dims];
            var xMag2 = 0L;
            var yMag2 = 0L;
            for (var i = 0; i < dims; i++)
            {
                x[i] = (uint)i;
                xMag2 += x[i] * (long)x[i];
                y[i] = (uint)(10000 - i);
                yMag2 += y[i] * (long)y[i];
            }
            var xMax = (long)x.Max();
            var yMax = (long)y.Max();
            var repetitions = 100000;
            var naiveTime = Time(() => SquareDistanceNaive(x, y), repetitions);
            var distributeTime = Time(() => SquareDistanceDistributed(x, y), repetitions);
            var branchTime = Time(() => SquareDistanceBranching(x, y), repetitions);
            var dotProductTime = Time(() => SquareDistanceDotProduct(x, y, xMag2, yMag2, xMax, yMax), repetitions);
            Console.Write($@"
For {repetitions} iterations and {dims} dimensions.
Naive time = {naiveTime} sec.
Branch time = {branchTime} sec.
Distributed time = {distributeTime} sec.
Dot Product time = {dotProductTime} sec.
Improve vs Naive = {((int)(10000 * (naiveTime - dotProductTime) / naiveTime)) / 100.0}%.
Improve vs Branch = {((int)(10000 * (branchTime - dotProductTime) / branchTime)) / 100.0}%.
");
            Assert.Less(dotProductTime, branchTime, "Dot product time should have been less than branch time");
        }

        private static double Time(Action action, int repeatCount)
        {
            var timer = new Stopwatch();
            timer.Start();
            for (var j = 0; j < repeatCount; j++)
                action();
            timer.Stop();
            return timer.ElapsedMilliseconds / 1000.0;
        }

        private static long SquareDistanceNaive(uint[] x, uint[] y)
        {
            var squareDistance = 0L;
            for (var i = 0; i < x.Length; i++)
            {
                var delta = (long)x[i] - (long)y[i];
                squareDistance += delta * delta;
            }
            return squareDistance;
        }

        /// <summary>
        /// Compute the square distance, using ternary operators for branching to keep subtraction operations from going negative,
        /// which is inappropriate for unsigned numbers.
        /// </summary>
        /// <returns>The square distance.</returns>
        /// <param name="x">First point.</param>
        /// <param name="y">Second point.</param>
        private static long SquareDistanceBranching(uint[] x, uint[] y)
        {
            long squareDistanceLoopUnrolled;
            // Unroll the loop partially to improve speed. (2.7x improvement!)
            var distance = 0UL;
            var leftovers = x.Length % 4;
            var dimensions = x.Length;
            var roundDimensions = dimensions - leftovers;
            for (var i = 0; i < roundDimensions; i += 4)
            {
                var x1 = x[i];
                var y1 = y[i];
                var x2 = x[i + 1];
                var y2 = y[i + 1];
                var x3 = x[i + 2];
                var y3 = y[i + 2];
                var x4 = x[i + 3];
                var y4 = y[i + 3];
                // Note: the squares are computed in uint and wrap if a delta exceeds 65535;
                // the benchmark data above keeps every delta below 10000.
                var delta1 = x1 > y1 ? x1 - y1 : y1 - x1;
                var delta2 = x2 > y2 ? x2 - y2 : y2 - x2;
                var delta3 = x3 > y3 ? x3 - y3 : y3 - x3;
                var delta4 = x4 > y4 ? x4 - y4 : y4 - x4;
                distance += delta1 * delta1 + delta2 * delta2 + delta3 * delta3 + delta4 * delta4;
            }
            for (var i = roundDimensions; i < dimensions; i++)
            {
                var xi = x[i];
                var yi = y[i];
                var delta = xi > yi ? xi - yi : yi - xi;
                distance += delta * delta;
            }
            squareDistanceLoopUnrolled = (long)distance;
            return squareDistanceLoopUnrolled;
        }

        private static long SquareDistanceDistributed(uint[] x, uint[] y)
        {
            long squareDistanceLoopUnrolled;
            // Unroll the loop partially to improve speed. (2.7x improvement!)
            var distance = 0UL;
            var dSubtract = 0UL;
            var leftovers = x.Length % 4;
            var dimensions = x.Length;
            var roundDimensions = dimensions - leftovers;
            for (var i = 0; i < roundDimensions; i += 4)
            {
                ulong x1 = x[i];
                ulong y1 = y[i];
                ulong x2 = x[i + 1];
                ulong y2 = y[i + 1];
                ulong x3 = x[i + 2];
                ulong y3 = y[i + 2];
                ulong x4 = x[i + 3];
                ulong y4 = y[i + 3];
                distance += x1 * x1 + y1 * y1
                          + x2 * x2 + y2 * y2
                          + x3 * x3 + y3 * y3
                          + x4 * x4 + y4 * y4;
                dSubtract += x1 * y1 + x2 * y2 + x3 * y3 + x4 * y4;
            }
            distance = distance - 2UL * dSubtract;
            for (var i = roundDimensions; i < dimensions; i++)
            {
                var xi = x[i];
                var yi = y[i];
                var delta = xi > yi ? xi - yi : yi - xi;
                distance += delta * delta;
            }
            squareDistanceLoopUnrolled = (long)distance;
            return squareDistanceLoopUnrolled;
        }

        private static long SquareDistanceDotProduct(uint[] x, uint[] y, long xMag2, long yMag2, long xMax, long yMax)
        {
            const int unroll = 4;
            // If four products of the largest coordinates fit in a uint, use the cheaper version
            if (xMax * yMax * unroll < (long)uint.MaxValue)
                return SquareDistanceDotProductNoOverflow(x, y, xMag2, yMag2);
            // Unroll the loop partially to improve speed. (2.7x improvement!)
            var dotProduct = 0UL;
            var leftovers = x.Length % unroll;
            var dimensions = x.Length;
            var roundDimensions = dimensions - leftovers;
            for (var i = 0; i < roundDimensions; i += unroll)
            {
                var x1 = x[i];
                ulong y1 = y[i];
                var x2 = x[i + 1];
                ulong y2 = y[i + 1];
                var x3 = x[i + 2];
                ulong y3 = y[i + 2];
                var x4 = x[i + 3];
                ulong y4 = y[i + 3];
                dotProduct += x1 * y1 + x2 * y2 + x3 * y3 + x4 * y4;
            }
            for (var i = roundDimensions; i < dimensions; i++)
                dotProduct += x[i] * (ulong)y[i];
            return xMag2 + yMag2 - 2L * (long)dotProduct;
        }

        /// <summary>
        /// Compute the square of the Cartesian distance using the dot product method,
        /// assuming that calculations won't overflow uint.
        ///
        /// This permits us to skip some widening conversions to ulong, making the computation faster.
        ///
        /// Algorithm:
        ///
        ///     D^2 = |x|^2 + |y|^2 - 2(x·y)
        ///
        /// Using the dot product of x and y and precomputed values for the square magnitudes of x and y
        /// permits us to use two operations (multiply and add) instead of three (subtract, multiply and add)
        /// in the main loop, saving one third of the time.
        /// </summary>
        /// <returns>The square distance.</returns>
        /// <param name="x">First point.</param>
        /// <param name="y">Second point.</param>
        /// <param name="xMag2">Distance from x to the origin, squared.</param>
        /// <param name="yMag2">Distance from y to the origin, squared.</param>
        private static long SquareDistanceDotProductNoOverflow(uint[] x, uint[] y, long xMag2, long yMag2)
        {
            // Unroll the loop partially to improve speed. (2.7x improvement!)
            const int unroll = 4;
            var dotProduct = 0UL;
            var leftovers = x.Length % unroll;
            var dimensions = x.Length;
            var roundDimensions = dimensions - leftovers;
            for (var i = 0; i < roundDimensions; i += unroll)
                dotProduct += (x[i] * y[i] + x[i + 1] * y[i + 1] + x[i + 2] * y[i + 2] + x[i + 3] * y[i + 3]);
            for (var i = roundDimensions; i < dimensions; i++)
                dotProduct += x[i] * y[i];
            return xMag2 + yMag2 - 2L * (long)dotProduct;
        }
    }
}
I have a working FFT, but my question is: how do I convert it into an IFFT?
I was told that an IFFT should be just like the FFT I am using, with only a few changes.
So how do I make an IFFT from an FFT in C#?
I tried to do it myself, but I am not getting back the values I put in: I made an array of values, passed it through the FFT and then through the IFFT, and the output does not match the input, so I do not think I changed it the right way.
This is the FFT I have:
public Complex[] FFT(Complex[] x)
{
    int N2 = x.Length;
    Complex[] X = new Complex[N2];
    if (N2 == 1)
    {
        return x;
    }
    Complex[] odd = new Complex[N2 / 2];
    Complex[] even = new Complex[N2 / 2];
    Complex[] Y_Odd = new Complex[N2 / 2];
    Complex[] Y_Even = new Complex[N2 / 2];
    for (int t = 0; t < N2 / 2; t++)
    {
        even[t] = x[t * 2];
        odd[t] = x[(t * 2) + 1];
    }
    Y_Even = FFT(even);
    Y_Odd = FFT(odd);
    Complex temp4;
    for (int k = 0; k < (N2 / 2); k++)
    {
        temp4 = Complex1(k, N2);
        X[k] = Y_Even[k] + (Y_Odd[k] * temp4);
        X[k + (N2 / 2)] = Y_Even[k] - (Y_Odd[k] * temp4);
    }
    return X;
}

public Complex Complex1(int K, int N3)
{
    Complex W = Complex.Pow((Complex.Exp(-1 * Complex.ImaginaryOne * (2.0 * Math.PI / N3))), K);
    return W;
}
Depending on the FFT, you may have to scale the entire complex vector (multiply either the input or result vector, not both) by 1/N (the length of the FFT). But this scale factor differs between FFT libraries (some already include a 1/sqrt(N) factor).
Then take the complex conjugate of the input vector, FFT it, and take the complex conjugate again to get the IFFT result. This is equivalent to doing an FFT with the sign of the exponent in the basis vectors flipped (for the FFT above, +i instead of -i).
Also, you normally do not get exactly the same values out of a computed IFFT(FFT()) as went in, because arithmetic rounding adds at least some low-level numerical noise to the result.
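A minimal sketch of the conjugation approach, assuming a power-of-two input length. Fft mirrors the structure of the question's recursive FFT, except that the twiddle factor comes from Complex.FromPolarCoordinates instead of Complex.Pow; Ifft applies conjugate, FFT, conjugate and the 1/N scale. The class and method names are hypothetical.

```csharp
using System;
using System.Numerics;

public static class Fourier
{
    // Recursive radix-2 FFT. Assumes x.Length is a power of two.
    public static Complex[] Fft(Complex[] x)
    {
        int n = x.Length;
        if (n == 1)
            return new[] { x[0] };
        var even = new Complex[n / 2];
        var odd = new Complex[n / 2];
        for (int t = 0; t < n / 2; t++)
        {
            even[t] = x[2 * t];
            odd[t] = x[2 * t + 1];
        }
        Complex[] yEven = Fft(even);
        Complex[] yOdd = Fft(odd);
        var result = new Complex[n];
        for (int k = 0; k < n / 2; k++)
        {
            // Twiddle factor e^(-2*pi*i*k/n), as in the question's Complex1.
            Complex w = Complex.FromPolarCoordinates(1.0, -2.0 * Math.PI * k / n);
            result[k] = yEven[k] + w * yOdd[k];
            result[k + n / 2] = yEven[k] - w * yOdd[k];
        }
        return result;
    }

    // Inverse FFT: conj(FFT(conj(X))) / N.
    public static Complex[] Ifft(Complex[] spectrum)
    {
        int n = spectrum.Length;
        var conjugated = new Complex[n];
        for (int i = 0; i < n; i++)
            conjugated[i] = Complex.Conjugate(spectrum[i]);
        Complex[] transformed = Fft(conjugated);
        var result = new Complex[n];
        for (int i = 0; i < n; i++)
            result[i] = Complex.Conjugate(transformed[i]) / (double)n;
        return result;
    }
}
```

Because of rounding, a round trip such as Ifft(Fft(input)) should be compared with the input using a small tolerance rather than exact equality.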