Detecting significant changes in data

Detecting significant changes in data - c#

I have a graph input where the X axis is time (going forwards). The Y axis is generally stable but has large drops and raises at different points (marked as the red arrows below)
Visually it's obvious but how do I efficiently detect this from within code? I'm not sure which algorithms I should be using but I would like to keep it as simple as possible.

A simple way is to calculate the difference between every two neighbouring samples, eg diff= abs(y[x point 1] - y[x point 0]) and calculate the standard deviation for all the differences. This will rank the differences in order for you and also help eliminate random noise which you get if you just sample largest diff values.
If your up/down values are over several x periods ( eg temp plotted every minute ), then calculate the diff over N samples, taking the max and min from the N samples. If you want 5 samples to be the detection period, then get samples 0,1,2,3,4 and extract min/max, use those for diff. Repeat for samples 1,2,3,4,5 and so on. You may need to play with this as too many samples starts affecting stddev.
An alternative method is to calculate the slope of up/down parts of the chart by subsampling and selecting slopes and lengths that are interesting. While this can be more accurate for automated detection it is much harder to describe the algorithm in depth.
I've worked on similar issues and built a chart categoriser, but would really love references to research in this area.
When you get this going, you may also want to look at 'control charts' from operations research, they identify several patterns that might also be worth detecting, depending on what your charts are of.

Related

Scaling/Interpolating measure data in C#

First of all, I apologize if the question has already been asked, but in about 10 hours of intensive research on every single link Google offered for every single phrase I gave it, I wasn't able to find anything that could help me with my problem.
What I want to do is the following:
I retrieve two excel sheets with data from two different scientifical measurements. Each sheet contains information that can easily be compared to the other sheet, respectively.
The only difference between the two sheets is the amount of data points they contain.
For example: The first sheet contains data for a time span of 200 seconds, with one point representing 1 second. The second sheet also contains data for the same time span, but with one point representing 0.5 seconds.
The problem I have to solve, is to "scale" the sheet with less data points in a way that they can easily be compared in a single chart, so that each line in the chart uses the same space on the X axis.
The problem I'm having with this task is that im lacking sufficient mathematical background to create an algorithm.
I've already created the entire application with a GUI, the import of the excel sheets and smoothing with moving average (only useful if datasets have equal length).
Any idea or link to any place where this could be explained is welcome.
I also want to say that any code I currently have is completely irrelevant to this question, it's just about an additional method with said functionality.
Thanks in advance,
marfuc

If there is a direct correlation between the data points of both sets - ie the time matches up for both - then it might be sufficient to do a linear interpolation on the smaller set to generate the missing points.
For instance, let's say your first set of data is:
Time Value
12:00:00.0 100.0
12:00:01.0 120.0
12:00:02.0 117.5
...and your second set looks like:
Time Value
12:00:00.0 2.5
12:00:00.5 3.0
12:00:01.0 2.6
12:00:01.5 2.9
12:00:02.0 2.8
We can fill in the gaps in the first list in a couple of ways, depending on what you're trying to do with the data afterwards.
The simplest is to do a linear interpolation of the values. If your points are equidistant from the value you're looking for (ie: you're finding the value at the half-way point) then just average them together at the missing points:
Time Value Lerp
12:00:00.0 100.0
12:00:00.5 110.0
12:00:01.0 120.0
12:00:01.5 118.75
12:00:02.0 117.5
This is OK if the sample rate is high enough with relation to the rate at which the input varies. I've seen a lot of audio processing algorithms that use this sort of calculation for doubling sample rate. Doesn't work so well when you have high frequency data with sample rates that are too low to capture the transitions well.
The second option is to use a spline function to fit a curve against the series of points, then synthesize the missing points as offsets on the curve. This will give you smoother and more natural interpolations, with humps in the data looking much more realistic. This will also give you a fairly good way to offset your data if the timing isn't well aligned between the data sets - calculate each point as an offset along the curve with distance equal to the timing offset. There are plenty of spline implementations out there that you could use for this. I'd suggest Catmull-Rom as a starting algorithm.
Warning: If you're doing some sort of statistical analysis on the outputs then you're not going to get good results doing this, no matter how you do it. Cut the bigger group down instead of fabricating data into the smaller group if analysis is your goal.

Desiring jagged results from simplex noise or another algorithm just as fast

I'm wanting to do some placement of objects like trees and the like based on noise for the terrain of a game/tech demo.
I've used value noise previously and I believe I understand perlin noise well enough. Simplex noise, however, escapes me quite well (just a tad over my head at present).
I have an implementation in C# of simplex noise, however, it's almost completely stolen from here. It works beautifully, but I just don't understand it well enough to modify it for my own purposes.
It is quite fast, but it also gives rather smooth results. I'm actually wanting something that is a little more jagged, like simple linear interpolation would give when I was doing value noise. My issue here is that due to the amount of calls I'd be doing for these object placements and using fractal Brownian motion, the speed of the algorithm becomes quite important.
Any suggestions on how to get more 'jagged' results like linear interpolation gives with value noise using a faster algorithm than value noise is?

if you are using a complex noise function to do a simple task like the placement of trees, your using completely the wrong type of maths function. It is a very specific function which is great for making textures and 3d shapes and irregular curves. Placing treas on 2d certainly doesn't need irregular curves! Unless you want to place trees along in lines that are irregular and curved!
unless you mean you want to place trees in areas of the noise which are a certain level, for example where the noise is larger than 0.98, which will give you nicely randomised zones that you can use as a central point saying some trees will be there.
it will be a lot faster and a lot easier to vary, if you just use any normal noise function, just program your placement code around the noise function. I mean a predictable pseudo-random noise function which is the same every time you use it.
use integers 0 to 10 and 20 to 30, multiplied by your level number, to select 10 X and 10 Y points on the same pseudo-random noise curve. this will give you 10 random spots on your map from where to do stuff using almost no calculations.
Once you have the central point where trees will be, use another 10 random points from the function to say how many trees will be there, another 10 to say how far apart they will be, for the distribution around the tree seed quite exceptional.
The other option, if you want to change the curve http://webstaff.itn.liu.se/~stegu/simplexnoise/simplexnoise.pdf is to read this paper and look at the polynomial function /whatever gradient function could be used in your code, looking the comments for the gradient function, commented out and do X equals Y, which should give you a straight interpolation curve.
if you vote this answer up, I should have enough points in order to comment on this forum:]

I realise this is a very old question, but I felt that the previous answer was entirely wrong, so I wanted to clarify how you should use a noise function to determine the placement of things like trees / rocks / bushes.
Basically, if you want to globally place items across a terrain, you're going to need some function which tells you where those are likely to occur. For instance, you might say "trees need to be on slopes of 45 degrees or less, and below 2000 meters". This gives you a map of possible places for trees. But now you need to choose random, but clustered locations for them.
The best way of doing this is to multiply your map of zeroes and ones by a fractal function (i.e. a Simplex noise function or one generated through subdivision and displacement - see https://fractal-landscapes.co.uk/maths).
This then gives you a probability density function, where the value at a point represents the relative probability of placing a tree at that location. Now you store the partial sum of that function for every location on the map. To place a new tree:
Choose a random number between 0 and the maximum of the summed function.
Do a binary search to find the location on the map in this range.
Place the tree there.
Rinse and repeat.
This allows you to place objects where they belong, according to their natural ranges and so on.

Simple way to calculate point of intersection between two polygons in C#

I've got two polygons defined as a list of Vectors, I've managed to write routines to transform and intersect these two polygons (seen below Frame 1). Using line-intersection I can figure out whether these collide, and have written a working Collide() function.
This is to be used in a variable step timed game, and therefore (as shown below) in Frame 1 the right polygon is not colliding, it's perfectly normal for on Frame 2 for the polygons to be right inside each other, with the right polygon having moved to the left.
My question is, what is the best way to figure out the moment of intersection? In the example, let's assume in Frame 1 the right polygon is at X = 300, Frame 2 it moved -100 and is now at 200, and that's all I know by the time Frame 2 comes about, it was at 300, now it's at 200. What I want to know is when did it actually collide, at what X value, here it was probably about 250.
I'm preferably looking for a C# source code solution to this problem.
Maybe there's a better way of approaching this for games?

I would use the separating axis theorem, as outlined here:
Metanet tutorial
Wikipedia
Then I would sweep test or use multisampling if needed.
GMan here on StackOverflow wrote a sample implementation over at gpwiki.org.
This may all be overkill for your use-case, but it handles polygons of any order. Of course, for simple bounding boxes it can be done much more efficiently through other means.

I'm no mathematician either, but one possible though crude solution would be to run a mini simulation.
Let us call the moving polygon M and the stationary polygon S (though there is no requirement for S to actually be stationary, the approach should work just the same regardless). Let us also call the two frames you have F1 for the earlier and F2 for the later, as per your diagram.
If you were to translate polygon M back towards its position in F1 in very small increments until such time that they are no longer intersecting, then you would have a location for M at which it 'just' intersects, i.e. the previous location before they stop intersecting in this simulation. The intersection in this 'just' intersecting location should be very small — small enough that you could treat it as a point. Let us call this polygon of intersection I.
To treat I as a point you could choose the vertex of it that is nearest the centre point of M in F1: that vertex has the best chance of being outside of S at time of collision. (There are lots of other possibilities for interpreting I as a point that you could experiment with too that may have better results.)
Obviously this approach has some drawbacks:
The simulation will be slower for greater speeds of M as the distance between its locations in F1 and F2 will be greater, more simulation steps will need to be run. (You could address this by having a fixed number of simulation cycles irrespective of speed of M but that would mean the accuracy of the result would be different for faster and slower moving bodies.)
The 'step' size in the simulation will have to be sufficiently small to get the accuracy you require but smaller step sizes will obviously have a larger calculation cost.
Personally, without the necessary mathematical intuition, I would go with this simple approach first and try to find a mathematical solution as an optimization later.

If you have the ability to determine whether the two polygons overlap, one idea might be to use a modified binary search to detect where the two hit. Start by subdividing the time interval in half and seeing if the two polygons intersected at the midpoint. If so, recursively search the first half of the range; if not, search the second half. If you specify some tolerance level at which you no longer care about small distances (for example, at the level of a pixel), then the runtime of this approach is O(log D / K), where D is the distance between the polygons and K is the cutoff threshold. If you know what point is going to ultimately enter the second polygon, you should be able to detect the collision very quickly this way.
Hope this helps!

For a rather generic solution, and assuming ...
no polygons are intersecting at time = 0
at least one polygon is intersecting another polygon at time = t
and you're happy to use a C# clipping library (eg Clipper)
then use a binary approach to deriving the time of intersection by...
double tInterval = t;
double tCurrent = 0;
int direction = +1;
while (tInterval > MinInterval)
{
tInterval = tInterval/2;
tCurrent += (tInterval * direction);
MovePolygons(tCurrent);
if (PolygonsIntersect)
direction = +1;
else
direction = -1;
}

Well - you may see that it's allways a point of one of the polygons that hits the side of the other first (or another point - but thats after all almost the same) - a possible solution would be to calculate the distance of the points from the other lines in the move-direction. But I think this would end beeing rather slow.
I guess normaly the distances between frames are so small that it's not importand to really know excactly where it hit first - some small intersections will not be visible and after all the things will rebound or explode anyway - don't they? :)

calculating frequency for signal in c# .net

I am developing an application for an oscilloscope in c# .NET, I am drawing different kinds of waves (sine, square etc..) with the help of zedgraph control.
I get values from oscilloscope and stored in a buffer of size 1024(byte array) and have to calculate parameters like time period, Frequency, rise time, fall time etc at run time.
for this purpose i have to extract only a single cycle of whole signal.one more problem is that values are not always rise or fall continuously mean values are stored in buffer like this[0,0,0,1,1,2,3,4,5,5,6,6,6,5,5,4,3,2,1,1,0,0,0..........]. signals are continuously receive from machine.
it is not sure that waves are always oscillating around zero.
Thanks
Regards
Nilesh

You can estimate the frequency a number a of ways. Probably the easiest, if you have a math lib, is to compute the FFT and take the lowest frequency.
Alternatively you can check the zero crossings(around the mean value). The faster it oscillates about 0 the higher its frequency. Similarly the extrema tell you a lot about the frequency(think of a sinusoid whose extrema and zeroes alternate and are evenly spaced).
There is also a transform called the period transform but I don't remember it too much. I saw it in a book about music for finding the tempo of a song.
http://www.cs.berkeley.edu/~vazirani/s09quantum/notes/lecture4.pdf
Another way might be to use the auto-correlation and when it is large it means the function is in "sync" with itself(assuming it doesn't change shape to fast). and it should be easy to calculate the distance between these the maximums.

You could find out the time period between a crest and a trough, which will give you half the wavelength for that particular wave.
For graph 1, the first trough is 2, the first crest is 12. Find out the time taking between these points, and you have half the wavelength.
For graph two, the same principle applies, you can calculate the wavelength (and thus the period) for each section of the graph

Metric to compare how similar two 2D linear lines are

Is there an algorithm ( preferably in C# implementation) that allows me to compare how similar two lines are? In my case I have one reference line, and I have a lot of secondary lines, I need to choose, out of so many secondary lines, which is the closest to the reference line.
Edit: It is a 2D line, with start and stop points. When you compare the similarities, you to take into account of the full blown line. The direction of the line ( i.e., whether it's from left to right or vice versa) is not important. And yes, it has to do with how close it is from one another
I know this is kind of subjective ( the similarity, not the question), but still, I am sure there are people who have done work on this.

Obvious metrics include slope, length, and distance between midpoints. You could calculate those and then find weightings that you like.
If you want to kind of wrap them all up into one thing, try the sum of the distances between the endpoints.
You're going to have to try a few things and see which cases irritate you and then figure out why.

lines (and in general hyperplanes) sit on an object call Grassmanian; e.g. lines in the plane sit in Gr(1,3), which is isomorphic to the 2-dimensional projective space, and yours is the simplest non trivial one: Gr(2,4). It is a compact metric space, which comes with a standard metric (arising from the plucker embedding - see the link above). However, this metric is a little expensive to compute, so you may want to consider an approximation (just as you'd consider using dot product instead of angle in 2 dimensions - it works find for small angles)
A more detailed explantion (based in the metric defined in the linked wikipedia article):
For each line l take two points (x1,y1,z1) and (x2,y2,z2) on it. Let A be the 4 by 2 matrix whose columns are (1,x1,y1,z1)^t and (1,x2,y2,z2)^t. Define P to be the 4 by 4 matrix
A(A^tA)^(-1)A^t. Then P is dependent only on l and not of the choice of the two points.
The metric you want is the absolute value of the top eigen value of the difference between the matrices corresponding to the two lines.

If you are talking about lines in the graphical sense, then I would look at a combination of things like line length and angle.
Depending on your situation, you may be able to make optimizations such as using the square of the length (saves a square root) and dy/dx for angle (saves a trig function, but watch for the divide-by-zero case).

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.