I have an 8 x 8 matrix of floating-point numbers and need to calculate its eigenvectors and eigenvalues. This is for feature reduction using PCA (Principal Component Analysis), and it is one hell of a time-consuming job if done by traditional methods. I tried to use the power method, Y = C*X, where C is my 8 x 8 matrix and X is the current estimate of the eigenvector.
float[,] XMatrix = new float[8, 1];
float[,] YMatrix = new float[8, 1];
float max = 0;
XMatrix[0, 0] = 1;
for (int i = 0; i < 8; i++)
{
    for (int j = 0; j < 1; j++)
    {
        for (int k = 0; k < 8; k++)
        {
            YMatrix[i, j] += C[i, k] * XMatrix[k, j];
            if (YMatrix[i, j] > max)
                max = YMatrix[i, j];
        }
    }
}
I know it is incorrect, but I cannot figure out why. I need help using the power method, or perhaps a more effective way of calculating them.
Thanks in advance.
Retrieving the eigenvalues/eigenvectors efficiently (i.e. fast!) for a dense matrix of any size is not entirely trivial. I would suggest something like the QR algorithm (although this may be overkill for a one-off calculation on a single 8x8 matrix).
The QR algorithm computes a Schur decomposition of a matrix. It is certainly one of the
most important algorithms in eigenvalue computations. However, it is applied to dense matrices only (as stated above).
The QR algorithm consists of two separate stages. First, by means of a similarity
transformation, the original matrix is transformed in a finite number of steps to Hessenberg
form or – in the Hermitian/symmetric case – to real tridiagonal form. This first stage of
the algorithm prepares its second stage, the actual QR iterations that are applied to the
Hessenberg or tridiagonal matrix.
The overall complexity (number of floating-point operations) of the algorithm is O(n^3). For a good explanation of this algorithm, see here. Alternatively, a Google search for "eigenvalue algorithm" should turn up many other ways of calculating the eigenvalues/eigenvectors you need.
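To make the second stage concrete, here is a minimal C# sketch of its most basic form only: unshifted QR iterations (factor A = QR with modified Gram-Schmidt, then set A to RQ) applied directly to a symmetric matrix, without the Hessenberg/tridiagonal reduction or the shifts that a production implementation would use. The class and method names are hypothetical. It assumes a real symmetric, non-singular matrix (such as a well-conditioned PCA covariance matrix) and converges slowly when two eigenvalues have nearly equal magnitude.

```csharp
using System;

// Basic unshifted QR iteration for a real symmetric matrix (sketch only:
// no Hessenberg/tridiagonal reduction, no shifts, no deflation).
public static class SymmetricQrEigen
{
    public static (double[] Values, double[,] Vectors) Decompose(
        double[,] a, int maxIterations = 10000, double tolerance = 1e-12)
    {
        int n = a.GetLength(0);
        var A = (double[,])a.Clone();
        var V = Identity(n);
        for (int iter = 0; iter < maxIterations; iter++)
        {
            var (Q, R) = Factor(A);
            A = Multiply(R, Q);   // R*Q = Q^T*A*Q: similar to A, so the eigenvalues are unchanged
            V = Multiply(V, Q);   // accumulate the orthogonal transformations
            if (OffDiagonalNorm(A) < tolerance) break;
        }
        var values = new double[n];
        for (int i = 0; i < n; i++) values[i] = A[i, i];
        return (values, V);       // column j of V is the eigenvector for values[j]
    }

    // QR factorisation by modified Gram-Schmidt; assumes a is non-singular.
    static (double[,] Q, double[,] R) Factor(double[,] a)
    {
        int n = a.GetLength(0);
        var q = (double[,])a.Clone();
        var r = new double[n, n];
        for (int j = 0; j < n; j++)
        {
            double norm = 0;
            for (int i = 0; i < n; i++) norm += q[i, j] * q[i, j];
            norm = Math.Sqrt(norm);
            r[j, j] = norm;
            for (int i = 0; i < n; i++) q[i, j] /= norm;
            // Remove the new unit column's component from every later column.
            for (int k = j + 1; k < n; k++)
            {
                double dot = 0;
                for (int i = 0; i < n; i++) dot += q[i, j] * q[i, k];
                r[j, k] = dot;
                for (int i = 0; i < n; i++) q[i, k] -= dot * q[i, j];
            }
        }
        return (q, r);
    }

    static double[,] Multiply(double[,] x, double[,] y)
    {
        int n = x.GetLength(0);
        var z = new double[n, n];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++)
                for (int j = 0; j < n; j++)
                    z[i, j] += x[i, k] * y[k, j];
        return z;
    }

    static double[,] Identity(int n)
    {
        var m = new double[n, n];
        for (int i = 0; i < n; i++) m[i, i] = 1.0;
        return m;
    }

    static double OffDiagonalNorm(double[,] m)
    {
        int n = m.GetLength(0);
        double sum = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (i != j) sum += m[i, j] * m[i, j];
        return Math.Sqrt(sum);
    }
}
```

For PCA, sorting Values in descending order (and reordering the columns of Vectors to match) gives the principal components in order of explained variance.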
Also, I have not looked into this in detail, but Math.NET, a free library, may help you here...
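As for the power method the question attempts: the loop in the question performs the multiplication Y = C*X only once, whereas the method has to repeat it, normalizing Y and feeding it back in as X until it stops changing. Here is a minimal sketch under these assumptions: C is symmetric with a positive, strictly dominant eigenvalue (true for a covariance matrix unless its largest eigenvalues tie), and the all-ones start vector is not orthogonal to the dominant eigenvector. The names are hypothetical, and the sketch finds only the largest eigenvalue and its eigenvector.

```csharp
using System;

// Power iteration: repeat Y = C*X, normalise Y, use it as the next X.
public static class PowerMethod
{
    public static (double Value, double[] Vector) Dominant(
        double[,] c, int maxIterations = 10000, double tolerance = 1e-12)
    {
        int n = c.GetLength(0);
        var x = new double[n];
        for (int i = 0; i < n; i++) x[i] = 1.0;   // start vector (assumed not orthogonal to the result)
        Normalize(x);
        for (int iter = 0; iter < maxIterations; iter++)
        {
            var y = Multiply(c, x);
            Normalize(y);
            double change = 0;
            for (int i = 0; i < n; i++) change = Math.Max(change, Math.Abs(y[i] - x[i]));
            x = y;
            if (change < tolerance) break;
        }
        // Rayleigh quotient x^T C x (x has unit length) estimates the eigenvalue.
        var cx = Multiply(c, x);
        double value = 0;
        for (int i = 0; i < n; i++) value += x[i] * cx[i];
        return (value, x);
    }

    static double[] Multiply(double[,] c, double[] x)
    {
        int n = x.Length;
        var y = new double[n];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++)
                y[i] += c[i, k] * x[k];
        return y;
    }

    static void Normalize(double[] v)
    {
        double norm = 0;
        foreach (double e in v) norm += e * e;
        norm = Math.Sqrt(norm);
        for (int i = 0; i < v.Length; i++) v[i] /= norm;
    }
}
```

To get further components, one common approach is deflation: subtract Value * v * v^T from C (v being the unit eigenvector found) and run the iteration again.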
I'm reading data from a sensor. The sensor gives an array of (x, y) points, but as you can see in the image, there is a lot of noise:
[Image: the raw, noisy sensor points]
I need to clean the data so that the filtered data gives just a few points, using something like the median, adjacent averaging, the mean of the (x, y) points, or an algorithm that removes the noise. I know there are a bunch of Python libraries that do this automatically, but all the ones I found are based on image analysis, and I don't think they work for this case, because these are (x, y) points in a dataset rather than an image.
[Image: the point cloud with the noise cleaned]
PS: I wanted to take the median of the points, but I got confused when I tried it with a two-dimensional array (i.e. ListOfPoints[[x,y],[x,y],[x,y],[x,y],[x,y],[x,y]]): I didn't know how to iterate over it with for or while to do the calculation. I prefer C#, but if there is a solution in another language without libraries, I would be open to it.
One of the methods you can use is the k-means algorithm. This picture briefly explains it: [Image: k-means]
This link (k_means algorithm) explains the k-means algorithm fully, including how to loop over the input data. I don't think any additional explanation is needed.
The k-means algorithm is very simple, and you will understand it from the first Google search.
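A minimal sketch of the k-means (Lloyd's) iteration on (x, y) points, using plain loops and no libraries, as the question asked. Each returned cluster centre can serve as one of the "few points" the question wants. Assumptions: you choose the number of clusters k, the initial centres are simply the first k points, and the class name KMeans2D is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Lloyd's k-means for 2-D points: alternate "assign each point to its nearest centre"
// and "move each centre to the mean of its points" until no assignment changes.
public static class KMeans2D
{
    public static (double X, double Y)[] Cluster(
        IReadOnlyList<(double X, double Y)> points, int k, int maxIterations = 100)
    {
        if (k <= 0 || k > points.Count) throw new ArgumentOutOfRangeException(nameof(k));

        // Initial centres: the first k points (k-means++ or random restarts are more robust).
        var centres = new (double X, double Y)[k];
        for (int c = 0; c < k; c++) centres[c] = points[c];

        var assignment = new int[points.Count];
        for (int i = 0; i < assignment.Length; i++) assignment[i] = -1;   // nothing assigned yet

        for (int iter = 0; iter < maxIterations; iter++)
        {
            // Assignment step: nearest centre by squared Euclidean distance.
            bool changed = false;
            for (int i = 0; i < points.Count; i++)
            {
                int best = 0;
                double bestDistance = double.MaxValue;
                for (int c = 0; c < k; c++)
                {
                    double dx = points[i].X - centres[c].X, dy = points[i].Y - centres[c].Y;
                    double d = dx * dx + dy * dy;
                    if (d < bestDistance) { bestDistance = d; best = c; }
                }
                if (assignment[i] != best) { assignment[i] = best; changed = true; }
            }
            if (!changed) break;

            // Update step: each centre moves to the mean of its assigned points.
            var sumX = new double[k];
            var sumY = new double[k];
            var count = new int[k];
            for (int i = 0; i < points.Count; i++)
            {
                int c = assignment[i];
                sumX[c] += points[i].X;
                sumY[c] += points[i].Y;
                count[c]++;
            }
            for (int c = 0; c < k; c++)
                if (count[c] > 0) centres[c] = (sumX[c] / count[c], sumY[c] / count[c]);   // empty clusters keep their centre
        }
        return centres;
    }
}
```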
You could try doing a weighted average of the Y-value at sampled X-positions. Something like this:
List<Point2> filtered_points = new List<Point2>();
for (int x = xmin; x <= xmax; x++)
{
    double weight_sum = 0;
    List<double> weights = new List<double>();
    foreach (Point2 p in point_list)
    {
        double w = 1.0 / ((p.x - x) * (p.x - x) + 1e-3);
        weights.Add(w);
        weight_sum += w;
    }
    double y = 0;
    for (int i = 0; i < point_list.Count; i++)
    {
        y += weights[i] * point_list[i].y / weight_sum;
    }
    filtered_points.Add(new Point2(x, y));
}
You would probably need to tune the weights to get nice results. Also, in the example I am using an inverse-quadratic decay, but other weighting functions can be used (linear decay, a Gaussian function...).
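As an example of swapping the weighting function, here is the same weighted average with Gaussian weights exp(-dx^2 / (2 sigma^2)). This is a sketch: Point2 is a hypothetical minimal struct (the answer does not define it), sigma is a tuning parameter you would pick by eye, and a single pass accumulates both sums, so the separate weights list is not needed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal point type standing in for the answer's Point2.
public readonly struct Point2
{
    public readonly double X, Y;
    public Point2(double x, double y) { X = x; Y = y; }
}

public static class KernelSmoother
{
    // Weighted average of Y at each integer X in [xmin, xmax], with Gaussian weights
    // instead of the 1 / (dx^2 + 1e-3) decay.
    public static List<Point2> Smooth(IReadOnlyList<Point2> points, int xmin, int xmax, double sigma)
    {
        var filtered = new List<Point2>();
        for (int x = xmin; x <= xmax; x++)
        {
            double weightSum = 0, ySum = 0;
            foreach (var p in points)
            {
                double dx = p.X - x;
                double w = Math.Exp(-(dx * dx) / (2 * sigma * sigma));
                weightSum += w;
                ySum += w * p.Y;
            }
            // Skip positions where every weight underflowed to 0 (no nearby points).
            if (weightSum > 0) filtered.Add(new Point2(x, ySum / weightSum));
        }
        return filtered;
    }
}
```

A larger sigma averages over more neighbours (smoother, but flattens real features); a smaller one follows the data more closely.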
I am doing Project Euler and I am at problem 15 now; here is a link:
https://projecteuler.net/problem=15 . I am trying to solve it with the binomial coefficient. Here is a site that explains it: http://www.mathblog.dk/project-euler-15/ (you can find it at the bottom).
My question is: why is the following code wrong? I think it follows the mathematical algorithm: (n-k+i)/i.
int grid = 20;
long paths = 1;
for (int i = 0; i < grid; i++)
{
    paths *= (grid * 2) - (grid + i);
    paths /= (i + 1);
}
Console.WriteLine(paths);
Console.ReadKey();
And why is this code wrong? It is exactly what the mathblog site does, but in one line.
int grid = 20;
long paths = 1;
for (int i = 0; i < grid; i++)
{
    paths *= ((grid * 2) - i) / (i + 1);
}
Console.WriteLine(paths);
Console.ReadKey();
But why is this code right, then? Isn't it the same as the previous code? And it doesn't exactly follow the mathematical algorithm, does it? Because the formula is (n-k+i)/i, and this code does (n-i)/i.
int grid = 20;
long paths = 1;
for (int i = 0; i < grid; i++)
{
    paths *= ((grid * 2) - i);
    paths /= (i + 1);
}
Console.WriteLine(paths);
Console.ReadKey();
Thnx guys!
If you want to combine the calculation into one line, it should be like this:
paths = (paths * ((grid * 2) - i)) / (i + 1);
By convention, in many programming languages (C# included), dividing an int by an int gives an int,* not a floating-point number. Your method implies that 'paths' should take non-integer values. In fact, none of the three methods should work, but by a happy coincidence the last one did: basically because all the intermediate values of 'paths' happen to be binomial coefficients too.
Advice for debugging: ask your program to output the intermediate values. This helps a lot.
*: As a mathematician, I almost never need that feature. In fact the other convention (int/int -> double) would have made my life as a programmer easier on average.
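Following the debugging advice above, here is a small sketch (the class LatticePathsDebug is hypothetical) that records 'paths' after each iteration of the three loops from the question, so the intermediate values can be printed and compared side by side.

```csharp
using System;
using System.Collections.Generic;

public static class LatticePathsDebug
{
    // Runs one loop variant and records 'paths' after every iteration.
    static List<long> Run(int grid, Func<long, int, long> step)
    {
        var history = new List<long>();
        long paths = 1;
        for (int i = 0; i < grid; i++)
        {
            paths = step(paths, i);
            history.Add(paths);
        }
        return history;
    }

    // First loop: (2*grid) - (grid + i) = grid - i, so it computes C(grid, i + 1).
    public static List<long> Variant1(int grid) =>
        Run(grid, (p, i) => p * ((grid * 2) - (grid + i)) / (i + 1));

    // Second loop: ((2*grid) - i) / (i + 1) is an int division, truncated before multiplying.
    public static List<long> Variant2(int grid) =>
        Run(grid, (p, i) => p * (((grid * 2) - i) / (i + 1)));

    // Third loop: multiply first, then divide; each value is C(2*grid, i + 1).
    public static List<long> Variant3(int grid) =>
        Run(grid, (p, i) => p * ((grid * 2) - i) / (i + 1));
}
```

For example, `foreach (long v in LatticePathsDebug.Variant2(20)) Console.WriteLine(v);` prints the truncated intermediates. For grid = 20 the final values are 1, 3309465600 and 137846528820 respectively; only the third is C(40, 20), the Project Euler answer.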
I had a look at the blog you mention. This makes your message much more understandable.
The blog mentions a formula: the product for i from 1 to k of (n-k+i)/i.
So to mimic it (with n = 2*grid and k = grid), you would need to write
for (int i = 1; i <= grid; i++) // bounds!
{
    paths *= (grid * 2) - (grid - i); // minus sign! This is n - k + i
    paths /= i;
}
About the fact that this works with ints: this is an accident, due to the fact that the intermediate values of the product (at the end of each loop iteration) are binomial coefficients all along the computation. If you computed the products and divisions in another order, you might very well get non-integers, so the computation would fail with an integer type for paths because of the int/int -> int convention. The blog is not very helpful in not mentioning that.
I am working on a program that has a pretty long execution time. I'm trying to improve performance where I can, but my knowledge is limited in this area. Can anyone recommend a way to speed up the method below?
public static double DistanceBetween2Points(double[,] p1, double[,] p2, int patchSize)
{
    double sum = 0;
    for (int i = 0; i < patchSize; i++)
    {
        for (int j = 0; j < patchSize; j++)
        {
            sum += Math.Sqrt(Math.Pow(p1[i, j] - p2[i, j], 2));
        }
    }
    return sum;
}
The method calculates the distance between two images by calculating the sum of all the distances between two points on the two images.
Think about your algorithm: a per-pixel distance probably isn't the best way to get an accurate image distance.
Replace sqrt(x^2) with abs(x), or, even faster:
if (x < 0) x = -x;
Rename your routine to OverallImageDistance or similar (this will not improve performance) ;)
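The sqrt(x^2) to abs(x) replacement as a minimal, safe sketch (hypothetical class name): it keeps the original double[,] signature and loops, and only swaps Math.Sqrt(Math.Pow(d, 2)) for Math.Abs(d), which yields the same value (up to rounding) without the pow/sqrt calls.

```csharp
using System;

public static class PatchDistance
{
    // Sum of absolute differences (L1 distance) over the top-left patchSize x patchSize block.
    public static double SumOfAbsoluteDifferences(double[,] p1, double[,] p2, int patchSize)
    {
        double sum = 0;
        for (int i = 0; i < patchSize; i++)
            for (int j = 0; j < patchSize; j++)
                sum += Math.Abs(p1[i, j] - p2[i, j]);
        return sum;
    }
}
```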
Use unsafe pointers, and calculate your distance in a single loop using these pointers:
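// Requires compiling with unsafe code enabled (/unsafe or <AllowUnsafeBlocks>).
// Assumes p1 and p2 are exactly patchSize x patchSize, so the compared
// elements are contiguous in memory (row-major order).
unsafe
{
    double sum = 0.0;
    int numPixels = patchSize * patchSize;
    fixed (double* pointer1 = &p1[0, 0])
    fixed (double* pointer2 = &p2[0, 0])
    {
        // fixed pointers are read-only, so walk the arrays with copies
        double* a = pointer1;
        double* b = pointer2;
        while (numPixels-- > 0)
        {
            double dist = *a++ - *b++;
            if (dist < 0) dist = -dist;
            sum += dist;
        }
    }
    return sum;
}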
This should be several times faster than your original.
Well, this method is really weird and does not look like a distance between pixels at all. But you would certainly want to use linear algebra instead of straightforward array calculations.
Image recognition, natural language processing and machine learning algorithms all use matrices, because matrix libraries are highly optimized for this kind of situation, where you need batch processing.
There is a plethora of matrix libraries in the wild; see the question "Recommendation for C# Matrix Library".
EDIT: Ok, thanks for feedback, trying to improve the answer...
You can use the Math.NET Numerics open-source library (install the MathNet.Numerics NuGet package) and rewrite your method like this:
using MathNet.Numerics.LinearAlgebra;

public static double DistanceBetween2Points(double[,] p1, double[,] p2, int patchSize)
{
    var A = Matrix<double>.Build.DenseOfArray(p1).SubMatrix(0, patchSize, 0, patchSize);
    var B = Matrix<double>.Build.DenseOfArray(p2).SubMatrix(0, patchSize, 0, patchSize);
    return (A - B).RowAbsoluteSums().Sum();
}
Essentially, explicit loops slow down your code; when doing batch processing, ideally you should avoid them altogether.
I just wrote a DFT implementation. Here is my code:
int T = 2205;
float[] sign = new float[T];
for (int i = 0, j = 0; i < T; i++, j++)
    sign[i] = (float)Math.Sin(2.0f * Math.PI * 120.0f * i / 44100.0f);
float[] re = new float[T];
float[] im = new float[T];
float[] dft = new float[T];
for (int k = 0; k < T; k++)
{
    for (int n = 0; n < T; n++)
    {
        re[k] += sign[n] * (float)Math.Cos(2.0f * Math.PI * k * n / T);
        im[k] += sign[n] * (float)Math.Sin(2.0f * Math.PI * k * n / T);
    }
    dft[k] = (float)Math.Sqrt(re[k] * re[k] + im[k] * im[k]);
}
So the sampling frequency is 44100 Hz and I have a 50 ms segment of a 120 Hz sine wave. According to the result, the DFT has peaks at points 7 and 2200. Did I do something wrong, and if not, how should I interpret the results?
I tried the FFT method of AForge.NET. Here's my code:
int T = 2048;
float[] sign = new float[T];
AForge.Math.Complex[] input = new AForge.Math.Complex[T];
for (int i = 0; i < T; i++)
{
    sign[i] = (float)Math.Sin(2.0f * Math.PI * 125.0f * i / 44100.0f);
    input[i].Re = sign[i];
    input[i].Im = 0.0;
}
AForge.Math.FourierTransform.FFT(input, AForge.Math.FourierTransform.Direction.Forward);
AForge.Math.FourierTransform.FFT(input, AForge.Math.FourierTransform.Direction.Backward);
I had expected to get the original signal (sign) back, but I got something different (a function with only positive values). Is that normal?
Thanks in advance!
Your code looks correct, but it could be more efficient. The DFT is usually computed with the FFT algorithm (the fast Fourier transform; it's not a new transform, just a more efficient algorithm for computing the DFT).
Even if you do not want to implement the FFT (which is a bit harder to understand, and harder to make work on data whose length is not a power of 2) or use some open-source code, you can make your implementation a bit faster. For example, 2.0f * Math.PI * k / T is constant inside the inner loop, so you can compute it once for each k (outside the inner loop) and then just multiply it by n in your cos/sin calls.
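The hoisting suggestion as a sketch (hypothetical class NaiveDft, using double instead of float): the angular step 2*pi*k/T is computed once per k, and the inner loop only multiplies it by n. The imaginary part uses the conventional minus sign of the DFT; the question's code uses plus, which only flips the sign of im and leaves the magnitude unchanged.

```csharp
using System;

public static class NaiveDft
{
    // Magnitude spectrum of a real signal by the direct O(T^2) DFT.
    public static double[] Magnitudes(double[] signal)
    {
        int T = signal.Length;
        var magnitude = new double[T];
        for (int k = 0; k < T; k++)
        {
            double step = 2.0 * Math.PI * k / T;   // constant for this k: hoisted out of the inner loop
            double re = 0, im = 0;
            for (int n = 0; n < T; n++)
            {
                double angle = step * n;
                re += signal[n] * Math.Cos(angle);
                im -= signal[n] * Math.Sin(angle);   // e^(-i*angle): conventional DFT sign
            }
            magnitude[k] = Math.Sqrt(re * re + im * im);
        }
        return magnitude;
    }
}
```

With the 120 Hz, 2205-sample signal from the question, the largest magnitude in the first half lands at index 6 with a value of T/2 = 1102.5, and index 2199 carries the same magnitude.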
As for the positions and their interpretation: you have changed domain. Your X-axis, the index into the array, now corresponds not to time but to frequency. You sample at 44100 Hz and have captured 2205 samples, which means the bins are spaced 44100 Hz / 2205 = 20 Hz apart: bin k holds the magnitude of your input signal at frequency k * 20 Hz. Your magnitude peak is at the 7th point (index 6) because your signal is 120 Hz, and 6 * 20 Hz = 120 Hz, which is what you would expect.
The second peak might seem to represent some high frequency, but it is just a mirror image: because your sampling rate is 44100 Hz, you cannot measure frequencies higher than 44100 Hz / 2 = 22050 Hz (the Nyquist frequency), which is your cut-off point; DFT data beyond that frequency is not valid. That's why the second half of your table is basically your first half mirrored (for a real input, bin T - k has the same magnitude as bin k, so your peak at point 2200, index 2199 = 2205 - 6, mirrors index 6), and you can ignore it.
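The bin-to-frequency mapping and the mirror relation as small helper functions (hypothetical names; the mirror relation holds for real-valued input such as the sine wave in the question).

```csharp
public static class DftBins
{
    // Frequency in Hz represented by bin k of an n-point DFT of a signal sampled at sampleRate Hz.
    public static double BinToHz(int k, double sampleRate, int n) => k * sampleRate / n;

    // For real-valued input, bin n - k has the same magnitude as bin k.
    public static int MirrorBin(int k, int n) => (n - k) % n;

    // Bins 0 .. n / 2 lie at or below the Nyquist frequency (sampleRate / 2).
    public static bool IsAtOrBelowNyquist(int k, int n) => k <= n / 2;
}
```

With the question's numbers, BinToHz(6, 44100, 2205) is 120 Hz, and MirrorBin(6, 2205) is 2199, the second peak (point 2200 in 1-based counting).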
Edit//
From your questions I can see that you are interested in audio processing. You might want to google the AForge.NET library, a great open-source library for audio and visual processing; its author has written many good articles on codeproject.com about many of its features.
I'm currently implementing software that measures certain values over time. The user may choose to measure the value 100 times over a duration of 28 days (just to give an example).
A linear distribution is not a problem, but I am currently trying to get a logarithmic distribution of the points over the time span.
The straightforward implementation would be to iterate over the points, and thus I'll need an exponential function. (I've gotten this far!)
My current algorithm (C#) is as follows:
long tRelativeLocation = 0;
double tValue;
double tBase = PhaseTimeSpan.Ticks;
int tLastPointMinute = 0;
TimeSpan tSpan;
for (int i = 0; i < NumberOfPoints; i++)
{
    tValue = Math.Log(i + 1, NumberOfPoints);
    tValue = Math.Pow(tBase, tValue);
    tRelativeLocation = (long)tValue;
    tSpan = new TimeSpan(tRelativeLocation);
    tCurrentPoint = new DefaultMeasuringPointTemplate(tRelativeLocation);
    tPoints.Add(tCurrentPoint);
}
This gives me a rather "good" result for 28 days and 100 points.
The first 11 points are all at 0 seconds,
12th point at 1 sec,
20th at 50 sec,
50th at 390 min,
95th at 28605 mins
99th at 37697 mins (which leaves about 43 hours until the last point)
My question is:
Does anybody out there have a good idea how to get the first 20-30 points further apart from each other, maybe getting the last 20-30 a bit closer together?
I understand that I will eventually have to add some algorithm that sets the first points apart by at least one minute or so, because I won't be able to get that kind of behaviour into a strictly mathematical algorithm.
Something like this:
if (((int)tSpan.TotalMinutes) <= tLastPointMinute)
{
    // 600000000 ticks = 1 minute (1 tick = 100 ns)
    tSpan = new TimeSpan((tLastPointMinute + 1) * 600000000L);
    tRelativeLocation = tSpan.Ticks;
    tLastPointMinute = (int)tSpan.TotalMinutes;
}
However, I'd like to get a slightly better distribution overall.
Any cool ideas from you math cracks out there would be greatly appreciated!
From a practical point of view, the log function already squeezes your time points near the origin, and a power function squeezes them even more. How about simple multiplication?
tValue = Math.Log(i + 1, NumberOfPoints);
tValue = tBase * tValue;
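Another way to flatten the curve is to start farther from the origin:
for (int i = 0; i < NumberOfPoints; i++)
{
    tValue = Math.Log(i + 10, NumberOfPoints + 9);
    // ... rest of the loop body as before
}
tValue still lies between 0 and 1: it now starts at about 0.49 (log base 109 of 10) instead of 0, and still reaches 1 at the last point.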
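Both suggestions combined into one sketch (hypothetical names): tValue = log base (NumberOfPoints + shift - 1) of (i + shift), scaled linearly by the span. shift = 1 gives the simple-multiplication variant; shift = 10 gives the start-farther-from-the-origin variant, under the assumption that it is also combined with the multiplication rather than with Math.Pow (the answer shows only the log line).

```csharp
using System;

public static class LogSpacing
{
    // Offsets for numberOfPoints points: tValue = log_(numberOfPoints + shift - 1)(i + shift),
    // which reaches exactly 1 at the last point, multiplied by the span.
    // shift = 1: first point at 0. shift > 1: starts farther from the origin.
    public static TimeSpan[] Offsets(TimeSpan span, int numberOfPoints, int shift)
    {
        double lastArgument = numberOfPoints + shift - 1;   // also used as the log base
        if (shift < 1 || lastArgument <= 1) throw new ArgumentOutOfRangeException(nameof(shift));
        var result = new TimeSpan[numberOfPoints];
        for (int i = 0; i < numberOfPoints; i++)
        {
            double tValue = Math.Log(i + shift, lastArgument);
            result[i] = new TimeSpan((long)(span.Ticks * tValue));
        }
        return result;
    }
}
```

For 28 days and 100 points, shift = 1 puts the first point at 0 and the last at exactly 28 days; shift = 10 moves the first point to about 49% of the span.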
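How about this, to keep a minimum spacing of 1 second at the beginning? (tValue is in ticks, so one second is TimeSpan.TicksPerSecond, not 1.)
double nextTick = 0;
for (int i = 0; i < NumberOfPoints; i++)
{
    tValue = Math.Log(i + 1, NumberOfPoints);
    tValue = Math.Pow(tBase, tValue);
    if (tValue < nextTick) tValue = nextTick;    // push the point forward if it is too close
    nextTick = tValue + TimeSpan.TicksPerSecond; // earliest allowed position for the next point
    // ... rest of the loop body as before
}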
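The same idea as a self-contained function (hypothetical names), keeping the question's power-curve placement and taking the minimum gap as a TimeSpan, so the conversion to ticks is explicit.

```csharp
using System;

public static class MinSpacing
{
    // Power-curve placement from the question, tValue = span^(log_n(i + 1)) ticks,
    // with each point pushed forward so it is at least minGap after the previous one.
    public static long[] Ticks(TimeSpan span, int numberOfPoints, TimeSpan minGap)
    {
        var result = new long[numberOfPoints];
        double tBase = span.Ticks;
        double nextTick = 0;   // earliest allowed position (in ticks) for the next point
        for (int i = 0; i < numberOfPoints; i++)
        {
            double tValue = Math.Pow(tBase, Math.Log(i + 1, numberOfPoints));
            if (tValue < nextTick) tValue = nextTick;
            result[i] = (long)tValue;
            nextTick = tValue + minGap.Ticks;
        }
        return result;
    }
}
```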
The distribution curve you choose depends on what you're measuring.
A straight line, a sine wave, a polynomial or exponential curve may individually be the best distribution curve for a given set of measurements.
Once you've decided on the distribution curve, you calculate the missing data points by calculating the y value for any given time value (x value), using the mathematical formula of the curve.
As an example, for a straight line, all you need is one data point and the slope of the line. Let's say at time 0 the measured value is 10, and the measurement goes up by 2 every minute. The formula would be y = 2 * x + 10. If we wanted to calculate the measurement at x = 5 (minutes), the formula gives us a measurement of 20.
For a logarithmic curve, you'd use a logarithmic formula. For simplicity, let's say that the actual measurements give us a formula of y = 2 * log(x) + 12. You plug in the time values (x values) you want, and calculate the measurements (y values).
Realize that you are introducing calculation errors by calculating data points instead of measuring. You should mark the calculated data points in some manner to help the person reading your graph differentiate them from actual measurements.
I am not exactly sure what you are trying to do, your code does not seem to match your example (it could be that I am screwing up the arithmetic). If you want your samples to have a minimum separation of 1 sec, and each point at a location of x times the last point (except for the first) then you want to find x such that x^(n - 1) = span. This is just x = exp(log(span) / (n - 1)). Then your points would be at x^i for(i = 0; i < n; i++)
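A sketch of this geometric spacing (hypothetical names), working in seconds as the answer does. Note that it guarantees the first point is at 1 s and the last at the full span, but the gap between consecutive points is x^i (x - 1), so for 100 points over 28 days (x is about 1.16) the first gaps are well under a second; enforcing a strict 1-second minimum still needs a rule like the one in the question.

```csharp
using System;

public static class GeometricSpacing
{
    // n points at x^i seconds (i = 0 .. n - 1), where x^(n - 1) = spanSeconds,
    // i.e. x = exp(ln(spanSeconds) / (n - 1)). Each point is x times the previous one.
    public static double[] Seconds(double spanSeconds, int n)
    {
        if (n < 2) throw new ArgumentOutOfRangeException(nameof(n));
        if (spanSeconds <= 1) throw new ArgumentOutOfRangeException(nameof(spanSeconds));
        double x = Math.Exp(Math.Log(spanSeconds) / (n - 1));
        var result = new double[n];
        for (int i = 0; i < n; i++) result[i] = Math.Pow(x, i);
        return result;
    }
}
```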