First of all, I apologize if this question has already been asked, but after about 10 hours of intensive research on every single link Google offered for every phrase I gave it, I wasn't able to find anything that could help me with my problem.
What I want to do is the following:
I retrieve two Excel sheets with data from two different scientific measurements. Each sheet contains information that can easily be compared with the other.
The only difference between the two sheets is the amount of data points they contain.
For example: The first sheet contains data for a time span of 200 seconds, with one point representing 1 second. The second sheet also contains data for the same time span, but with one point representing 0.5 seconds.
The problem I have to solve is to "scale" the sheet with fewer data points so that the two can easily be compared in a single chart, with each line in the chart using the same space on the X axis.
The problem I'm having with this task is that I lack the mathematical background to create such an algorithm.
I've already created the entire application with a GUI, the import of the Excel sheets, and smoothing with a moving average (which is only useful if the datasets have equal length).
Any idea or link to any place where this could be explained is welcome.
I also want to say that any code I currently have is irrelevant to this question; it's just about an additional method with said functionality.
Thanks in advance,
marfuc
If there is a direct correlation between the data points of both sets - i.e. the times match up for both - then it might be sufficient to do a linear interpolation on the smaller set to generate the missing points.
For instance, let's say your first set of data is:
Time Value
12:00:00.0 100.0
12:00:01.0 120.0
12:00:02.0 117.5
...and your second set looks like:
Time Value
12:00:00.0 2.5
12:00:00.5 3.0
12:00:01.0 2.6
12:00:01.5 2.9
12:00:02.0 2.8
We can fill in the gaps in the first list in a couple of ways, depending on what you're trying to do with the data afterwards.
The simplest is to do a linear interpolation of the values. If your points are equidistant from the value you're looking for (ie: you're finding the value at the half-way point) then just average them together at the missing points:
Time        Value    Lerp
12:00:00.0  100.0
12:00:00.5           110.0
12:00:01.0  120.0
12:00:01.5           118.75
12:00:02.0  117.5
This is OK if the sample rate is high enough with relation to the rate at which the input varies. I've seen a lot of audio processing algorithms that use this sort of calculation for doubling sample rate. Doesn't work so well when you have high frequency data with sample rates that are too low to capture the transitions well.
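For illustration, a minimal sketch of the half-way averaging in C# (the array layout and the fixed factor of two are my assumptions, not part of the question):

// Doubles the sample rate of 'coarse' by averaging neighbouring points.
// Assumes the coarse series is evenly spaced and exactly one extra point
// is needed between each pair of originals.
double[] UpsampleByTwo(double[] coarse)
{
    double[] fine = new double[coarse.Length * 2 - 1];
    for (int i = 0; i < coarse.Length - 1; i++)
    {
        fine[2 * i] = coarse[i];
        fine[2 * i + 1] = (coarse[i] + coarse[i + 1]) / 2.0; // half-way lerp
    }
    fine[fine.Length - 1] = coarse[coarse.Length - 1];
    return fine;
}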
The second option is to use a spline function to fit a curve against the series of points, then synthesize the missing points as offsets on the curve. This will give you smoother and more natural interpolations, with humps in the data looking much more realistic. This will also give you a fairly good way to offset your data if the timing isn't well aligned between the data sets - calculate each point as an offset along the curve with distance equal to the timing offset. There are plenty of spline implementations out there that you could use for this. I'd suggest Catmull-Rom as a starting algorithm.
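If you go the spline route, a single Catmull-Rom segment between p1 and p2 can be evaluated like this (a minimal sketch; t runs from 0 to 1 between the two middle control points):

// Catmull-Rom interpolation between p1 and p2, using p0 and p3 as the
// outer control points. t = 0 returns p1, t = 1 returns p2.
double CatmullRom(double p0, double p1, double p2, double p3, double t)
{
    double t2 = t * t;
    double t3 = t2 * t;
    return 0.5 * ((2.0 * p1)
                + (-p0 + p2) * t
                + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t2
                + (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * t3);
}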
Warning: If you're doing some sort of statistical analysis on the outputs then you're not going to get good results doing this, no matter how you do it. Cut the bigger group down instead of fabricating data into the smaller group if analysis is your goal.
I recently got an assignment where I have a 4 by 5 matrix of 0s and 1s (zeros represent the cut parts of the matrix). I need to find a way to calculate how many pieces the matrix will be sliced into. As I mentioned, zeros represent the cut parts, so if a straight line of zeros goes from one border to another it means the matrix was sliced along that line, for example (in this pic I've marked where the matrix would be split; a line of zeros slices it):
So guys, I know that you won't entirely solve this for me and I don't need that, but what I need is to understand this:
Firstly, how should I tell the program in which direction it should be going while it is traversing the matrix? I have an idea involving enumerations.
Secondly, what kind of conditional statement should I use so that the program recognises a line of zeros (if there is one)?
Any help would be appreciated :)
You did not specify any requirements for the algorithm (such as time and space complexity), so I guess one answer, corresponding to one particular family of solutions, would be:
Go in all 4 directions
Don't condition on 0s to create the line, but try to look at the 1s and find to which piece they belong.
A general algorithm for this can be implemented as follows:
Create a helper matrix of the same size, a function that gives you a new symbol (for example by incrementing a counter every time a symbol is asked for), and a data structure to store collisions
Go in all 4 directions starting from anywhere
Whenever you find a 0 in the original matrix, issue some new symbol to each of the new directions you are going from there
Whenever you find a 1, try to store the current symbol in the helper matrix. If there is already a value there, then:
Store in the collisions data structure that you found a collision between 2 symbols
Don't continue in any direction from this one.
This will traverse each cell at most 4 times, so the time complexity is O(n), and when you are done you will have a data structure with all the collisions.
Now all you need to do is combine all the entries in the collisions data structure to work out how many unique pieces you really have.
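For what it's worth, here is a minimal sketch of an equivalent way to get the piece count: a plain flood fill that counts 4-connected regions of 1s. It assumes a piece is exactly such a region, which matches the idea of lines of zeros separating the pieces.

using System.Collections.Generic;

// Counts 4-connected regions of 1s; each region is one piece of the matrix.
static int CountPieces(int[,] m)
{
    int rows = m.GetLength(0), cols = m.GetLength(1);
    bool[,] seen = new bool[rows, cols];
    int pieces = 0;
    int[] dr = { -1, 1, 0, 0 }; // the four traversal directions
    int[] dc = { 0, 0, -1, 1 };
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
        {
            if (m[r, c] != 1 || seen[r, c]) continue;
            pieces++;                       // found an unvisited piece
            var stack = new Stack<(int, int)>();
            stack.Push((r, c));
            seen[r, c] = true;
            while (stack.Count > 0)
            {
                var (cr, cc) = stack.Pop();
                for (int d = 0; d < 4; d++)
                {
                    int nr = cr + dr[d], nc = cc + dc[d];
                    if (nr >= 0 && nr < rows && nc >= 0 && nc < cols
                        && m[nr, nc] == 1 && !seen[nr, nc])
                    {
                        seen[nr, nc] = true;
                        stack.Push((nr, nc));
                    }
                }
            }
        }
    return pieces;
}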
I have a graph input where the X axis is time (going forwards). The Y axis is generally stable but has large drops and rises at different points (marked with the red arrows below).
Visually it's obvious but how do I efficiently detect this from within code? I'm not sure which algorithms I should be using but I would like to keep it as simple as possible.
A simple way is to calculate the difference between every two neighbouring samples, e.g. diff = abs(y[x point 1] - y[x point 0]), and calculate the standard deviation of all the differences. This will rank the differences for you and also help eliminate the random noise you would get if you just picked the largest diff values.
If your up/down movements span several x periods (e.g. temperature plotted every minute), then calculate the diff over N samples, taking the max and min from those N samples. If you want 5 samples to be the detection period, then take samples 0,1,2,3,4, extract the min/max, and use those for the diff. Repeat for samples 1,2,3,4,5 and so on. You may need to play with this, as too many samples starts affecting the stddev.
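A minimal sketch of the diff-plus-standard-deviation idea; the names and the threshold factor of 2 are my assumptions, tune to taste:

using System;
using System.Collections.Generic;
using System.Linq;

// Flags indices where the jump between neighbouring samples is unusually
// large compared with the standard deviation of all the jumps.
static List<int> FindJumps(double[] y, double factor = 2.0)
{
    double[] diffs = new double[y.Length - 1];
    for (int i = 0; i < diffs.Length; i++)
        diffs[i] = Math.Abs(y[i + 1] - y[i]);

    double mean = diffs.Average();
    double stdDev = Math.Sqrt(diffs.Select(d => (d - mean) * (d - mean)).Average());

    var hits = new List<int>();
    for (int i = 0; i < diffs.Length; i++)
        if (diffs[i] > mean + factor * stdDev)
            hits.Add(i + 1); // index of the sample just after the jump
    return hits;
}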
An alternative method is to calculate the slope of up/down parts of the chart by subsampling and selecting slopes and lengths that are interesting. While this can be more accurate for automated detection it is much harder to describe the algorithm in depth.
I've worked on similar issues and built a chart categoriser, but would really love references to research in this area.
When you get this going, you may also want to look at 'control charts' from operations research, they identify several patterns that might also be worth detecting, depending on what your charts are of.
I'm looking at creating a heatmap of numerical data spread over various locations within a building. I've spent a few hours researching data mapping, etc. and am looking for some advice. I am new to GIS. Most of the options available are tile APIs that use lat/long, and are overkill for my requirements...
Ultimately, I just want to output a background image (a floor plan) with the heatmap overlay demonstrating areas of high intensity. The data is bound to specific locations (example, activity level: 14, location: reception entrance) and so is not randomly distributed over the map. Data has timestamps, and the final objective is to print PNGs of hourly activity for animation.
I feel like I have two options.
I like this tutorial (http://dylanvester.com/post/Creating-Heat-Maps-with-NET-20-%28C-Sharp%29.aspx) as it offers a huge amount of flexibility and the final imagery is very similar to what I would like - it's a great head start. That said, I'd need to assign locations such as "reception entrance" to an x,y co-ordinate, or even a number of x,y co-ordinates. I'd then need to process a matrix prior to every heatmap, taking data from my CSV files and placing activity values in the appropriate co-ordinates.
The other option I think I have is to create a custom shapefile (?) from the floor plan. That is, create a vector graphic with defined regions, where each is attributable to a taggable location. This seems the most flexible option, but I'm really, really struggling to find out how to create shapefiles?
My unfamiliarity with GIS terminology is making searches difficult. The latter seems the most sensible solution (use the shapefile with something like https://gist.github.com/1370472) to change the activity values over time.
Links found:
guthcad.com/cad2shape.htm (but don't have CAD drawing, just raster floorplan)
stackoverflow.com/questions/4014072/arcgis-flex-overlay-floor-plan-png (unhelpful, don't want tiled)
oliverobrien.co.uk/2010/01/simple-choropleth-maps-in-quantum-gis/
gis.stackexchange.com/questions/20901/using-gis-for-interactive-floor-plan (looks great)
To summarise: I'd like to map data bound to locations within a building. There's very good code in a C# tutorial I'd like to use, but the linking of activity data to co-ordinates is potentially messy (although could allow for describing transitions of activity between locations as vectors between co-ordinates could be used...). The other option is to create an image with regions that can be linked to CSV data by something like QGIS. Could people with more experience suggest the best direction, or even alternatives?
Thank you!
I recently did something similar for a heatmap of certain events in the USA.
My input data was simply a CSV file with three columns: Latitude, Longitude, and Number of Events.
After examining available options, I ended up using GHeat.Net. It was quite easy to use and required only a little modification to meet my needs.
The output is a transparent PNG that I then overlaid onto Google Maps.
Although your scale is quite different, I imagine the same solution should work in your case.
UPDATE
If the x,y values are integers in a reasonably small range, and if you have enough samples, you might simply create a (sparse?) array, with each array element's value being the number of samples at that coordinate. Identify the "hottest" array element (the one with the most samples) and equate that to "white" in a heat map, with lesser values corresponding to colder colors (or in other words, normalize all values in the array using the highest value and map the normalized values to a color scale). Map the array to a PNG.
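A minimal sketch of that counting-and-normalising idea, assuming integer x,y coordinates that fit inside the image and that System.Drawing is available for the PNG output:

using System.Collections.Generic;
using System.Drawing;

// Builds a grey-scale heat map: white = coordinate with the most samples.
static Bitmap BuildHeatMap(IEnumerable<(int x, int y)> samples, int width, int height)
{
    int[,] counts = new int[width, height];
    int max = 0;
    foreach (var (x, y) in samples)
    {
        counts[x, y]++;
        if (counts[x, y] > max) max = counts[x, y];
    }

    var bmp = new Bitmap(width, height);
    for (int x = 0; x < width; x++)
        for (int y = 0; y < height; y++)
        {
            int shade = max == 0 ? 0 : (255 * counts[x, y]) / max; // normalise to 0..255
            bmp.SetPixel(x, y, Color.FromArgb(shade, shade, shade));
        }
    return bmp; // e.g. bmp.Save("heatmap.png") to write the overlay
}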
Heat maps like GHeat create a sphere of influence around each data point. Depending on your data, you may not need that.
If your sample rate is not high enough, you could lift the sphere of influence code out of GHeat and apply it to your own array.
The sphere of influence stuff basically adds a value of "1" to the specific coordinate in the data sample, and also adds a smaller value to adjacent pixels in the map in order to provide for smoother-looking maps. I don't know the specific algorithm used in GHeat, but the basic idea is to add to the specific x,y value as well as neighbors using a pattern something like this:
0.25 | 0.5 | 0.25
-----------------
0.5 | 1.0 | 0.5
-----------------
0.25 | 0.5 | 0.25
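For illustration, a minimal sketch of "splatting" one sample onto an intensity grid with that 3x3 pattern (the grid layout and method name are mine, not GHeat's):

// Adds one sample at (x, y) to an intensity grid, spreading part of its
// weight onto the 8 neighbouring cells as in the pattern above.
static void Splat(double[,] grid, int x, int y)
{
    double[,] kernel = { { 0.25, 0.5, 0.25 },
                         { 0.5,  1.0, 0.5  },
                         { 0.25, 0.5, 0.25 } };
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++)
        {
            int gx = x + dx, gy = y + dy;
            if (gx >= 0 && gx < grid.GetLength(0) && gy >= 0 && gy < grid.GetLength(1))
                grid[gx, gy] += kernel[dy + 1, dx + 1];
        }
}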
I am developing an application for an oscilloscope in c# .NET, I am drawing different kinds of waves (sine, square etc..) with the help of zedgraph control.
I get values from the oscilloscope, store them in a buffer of size 1024 (a byte array), and have to calculate parameters like time period, frequency, rise time, fall time, etc. at run time.
For this purpose I have to extract only a single cycle of the whole signal. One more problem is that the values do not always rise or fall continuously; they are stored in the buffer like this: [0,0,0,1,1,2,3,4,5,5,6,6,6,5,5,4,3,2,1,1,0,0,0,...]. Signals are continuously received from the machine.
It is also not guaranteed that the waves always oscillate around zero.
Thanks
Regards
Nilesh
You can estimate the frequency in a number of ways. Probably the easiest, if you have a math lib, is to compute the FFT and take the lowest-frequency peak.
Alternatively you can check the zero crossings (around the mean value). The faster the signal oscillates about the mean, the higher its frequency. Similarly, the extrema tell you a lot about the frequency (think of a sinusoid, whose extrema and zeroes alternate and are evenly spaced).
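A minimal sketch of the zero-crossing estimate; it assumes you know the sample rate and that the buffer holds at least a few cycles:

using System.Linq;

// Estimates frequency by counting crossings of the mean value.
// Two crossings correspond to one full cycle.
static double EstimateFrequency(double[] samples, double sampleRate)
{
    double mean = samples.Average();
    int crossings = 0;
    for (int i = 1; i < samples.Length; i++)
        if ((samples[i - 1] - mean) * (samples[i] - mean) < 0)
            crossings++;
    double seconds = samples.Length / sampleRate;
    return (crossings / 2.0) / seconds;
}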
There is also a transform called the period transform, but I don't remember it in much detail. I saw it in a book about music, used for finding the tempo of a song.
http://www.cs.berkeley.edu/~vazirani/s09quantum/notes/lecture4.pdf
Another way might be to use the auto-correlation: when it is large, it means the function is in "sync" with itself (assuming it doesn't change shape too fast), and it should be easy to calculate the distance between the maxima.
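A minimal sketch of the auto-correlation idea: the lag with the strongest correlation is an estimate of the period in samples. minLag and maxLag are assumed bounds that bracket the expected period.

using System.Linq;

// Returns the lag (in samples) at which the signal best matches itself,
// which is an estimate of the period.
static int EstimatePeriod(double[] x, int minLag, int maxLag)
{
    double mean = x.Average();
    int bestLag = minLag;
    double best = double.MinValue;
    for (int lag = minLag; lag <= maxLag; lag++)
    {
        double sum = 0;
        for (int i = 0; i + lag < x.Length; i++)
            sum += (x[i] - mean) * (x[i + lag] - mean); // correlation at this lag
        if (sum > best) { best = sum; bestLag = lag; }
    }
    return bestLag;
}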
You could find out the time period between a crest and a trough, which will give you half the wavelength for that particular wave.
For graph 1, the first trough is at 2 and the first crest is at 12. Find the time taken between these points, and you have half the wavelength.
For graph two, the same principle applies: you can calculate the wavelength (and thus the period) for each section of the graph.
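A minimal sketch under the assumption that you already have the sample index of a trough and of the crest that follows it, plus the time per sample:

// Half the period is the time between a trough and the next crest.
// Assumes crestIndex > troughIndex and a constant sampling interval.
static double EstimatePeriodFromExtrema(int troughIndex, int crestIndex, double secondsPerSample)
{
    double halfPeriod = (crestIndex - troughIndex) * secondsPerSample;
    return 2.0 * halfPeriod; // frequency = 1.0 / period
}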
I'm sampling a real-world sensor, and I need to display its filtered value. The signal is sampled at a rate of 10 Hz and during that period it could rise as much as 80 per cent of the maximum range.
Earlier I used Root Mean Square as a filter, just applying it to the last five values I had logged. That wouldn't be good for this application, because I don't store unchanged values. In other words, I need to take time into account in my filter...
I've read the DSP Guide, but I didn't get much out of it. Is there a tutorial aimed specifically at programmers rather than at Mathcad engineers? Are there some simple code snippets that could help?
Update: After several spreadsheet tests I've taken the executive decision to log all samples, and apply a Butterworth filter.
You always need to store some values (but not necessarily all input values). A filter's current output depends on a number of input values and possibly some past output values.
The simplest filter would be a first order Butterworth low-pass filter. This would only require you to store one past output value. The (current) output of the filter, y(n), is:
y(n) = x(n) - a1 * y(n-1)
where x(n) is the current input and y(n-1) is the previous output of the filter. a1 depends on the cut-off frequency and the sampling frequency. The cut-off frequency must be less than 5 Hz (half the sampling frequency): sufficiently low to filter out the noise, but not so low that the output is delayed with respect to the input. And of course not so low that the real signal is filtered out!
In code (mostly C#):
double a1 = 0.57;   // example value only; a1 depends on the cut-off and sampling frequencies
double lastY = 0.0;
while (true)
{
    double x = ReadInput();    // ReadInput() is a placeholder for however you obtain an input value
    double y = x - a1 * lastY; // y(n) = x(n) - a1 * y(n-1)
    UseOutput(y);              // UseOutput() is a placeholder for whatever you do with the filtered value
    lastY = y;
}
Whether a first order filter is sufficient depends on your requirements and the characteristics of the input signal (a higher order filter may be able to suppress more of the noise at the expense of higher delay of the output signal). For higher order filters, more values would have to be stored and the code becomes a little bit more complicated. Usually the values need to be shifted down in arrays: an array for past y values and an array for past x values.
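For illustration, a sketch of a second order section in the same style; the coefficients shown are placeholders, not a real design, and a proper (e.g. Butterworth) design would supply b0..b2 and a1, a2:

// Second order IIR filter:
// y(n) = b0*x(n) + b1*x(n-1) + b2*x(n-2) - a1*y(n-1) - a2*y(n-2)
class SecondOrderFilter
{
    double b0 = 0.2, b1 = 0.4, b2 = 0.2;   // placeholder coefficients
    double a1 = -0.3, a2 = 0.1;            // placeholder coefficients
    double x1, x2, y1, y2;                 // past inputs and outputs

    public double Next(double x)
    {
        double y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
        x2 = x1; x1 = x;   // shift the stored values down
        y2 = y1; y1 = y;
        return y;
    }
}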
In DSP, the term "filter" usually refers to the amplification or attenuation (i.e. "lowering") of frequency components within a continuous signal. This is commonly done using Fast Fourier Transform (FFT). FFT starts with a signal recorded over a given length of time (the data are in what's called the "time domain") and transforms these values into what's called the "frequency domain", where the results indicate the strength of the signal in a series of frequency "bins" that range from 0 Hz up to the sampling rate (10 Hz in your case). So, as a rough example, an FFT of one second's worth of your data (10 samples) would tell you the strength of your signal at 0-2 Hz, 2-4 Hz, 4-6 Hz, 6-8 Hz, and 8-10 Hz.
To "filter" these data, you would increase or decrease any or all of these signal strength values, and then perform a reverse FFT to transform these values back into a time-domain signal. So, for example, let's say you wanted to do a lowpass filter on your transformed data, where the cutoff frequency was 6 Hz (in other words, you want to remove any frequency components in your signal above 6 Hz). You would programatically set the 6-8 Hz value to zero and set the 8-10 Hz value to 0, and then do a reverse FFT.
I mention all this because it doesn't sound like "filtering" is really what you want to do here. I think you just want to display the current value of your sensor, but you want to smooth out the results so that it doesn't respond excessively to transient fluctuations in the sensor's measured value. The best way to do this is with a simple running average, possibly with the more recent values weighted more heavily than older values.
A running average is very easy to program (much easier than FFT, trust me) by storing a collection of the most recent measurements. You mention that your app only stores values that are different from the prior value. Assuming you also store the time at which each value is recorded, it should be easy for your running average code to fill in the "missing values" by using the recorded prior values.
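A minimal sketch of such a running average that weights each stored value by how long it was in effect; the (timestamp, value) list, the window length, and the assumption that the list is sorted by time are mine:

using System;
using System.Collections.Generic;

// Time-weighted average over the last 'window' of time, given a list of
// (timestamp, value) pairs recorded only when the value changed.
static double TimeWeightedAverage(List<(DateTime t, double v)> changes, DateTime now, TimeSpan window)
{
    DateTime start = now - window;
    double weightedSum = 0, totalSeconds = 0;
    for (int i = 0; i < changes.Count; i++)
    {
        DateTime from = changes[i].t < start ? start : changes[i].t;          // clamp to window
        DateTime to = (i + 1 < changes.Count) ? changes[i + 1].t : now;       // value held until next change
        if (to <= start) continue;                                            // entirely before the window
        double seconds = (to - from).TotalSeconds;
        weightedSum += changes[i].v * seconds;
        totalSeconds += seconds;
    }
    return totalSeconds > 0 ? weightedSum / totalSeconds : 0.0;
}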
I don't have a tutorial that will help you, but in C# you may want to consider using Reactive LINQ - see blog post Reactive programming (II.) - Introducing Reactive LINQ.
It gives you a way to get the events so you can do your processing without having to store all the values; it would just do the processing as each new event comes in.
To consider time, you could just use an exponential with a negative exponent to decrease the impact of the past measurements.
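A minimal sketch of that idea, where the weight of the old estimate decays exponentially with the time elapsed since the last sample (tau is an assumed time constant to tune):

using System;

// Exponentially weighted estimate whose memory of old data decays with
// the time elapsed between samples, so irregular sampling is handled.
class ExponentialSmoother
{
    readonly double tau;      // time constant in seconds (assumed, tune to taste)
    double estimate;
    DateTime lastTime;
    bool started;

    public ExponentialSmoother(double tauSeconds) { tau = tauSeconds; }

    public double Update(DateTime time, double value)
    {
        if (!started) { estimate = value; lastTime = time; started = true; return estimate; }
        double dt = (time - lastTime).TotalSeconds;
        double alpha = 1.0 - Math.Exp(-dt / tau); // weight of the new sample grows with elapsed time
        estimate += alpha * (value - estimate);
        lastTime = time;
        return estimate;
    }
}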
Yes, for complex real-time systems sampling multiple streams of data, there can be issues with data processing (calculation and storage of data) and with data consistency.