Have an 'Unidentified' Class in Accord.Net SVM - C#

I'm using the MulticlassSupportVectorMachine class to do some classification. Specifically, my data has 24 dimensions, with the values grouped pretty close together. I will be identifying around 10 or so classes in this data.
I'm looking to identify when an input value is really far off from all of the groups: something along the lines of having a class 0 meaning "unidentified", with classes 1 to 10 only being output when the SVM has a high degree of confidence that the sample belongs to that group.
In essence I am looking to go from the top to the bottom of this image:
[image: SVM classification, without and with an "unidentified" region]
Is something like this possible in accord.net?
Thanks!

I answered my own question!
This can be accomplished by using the Probability function of the SupportVectorMachine class to get an estimate of how confident the guess is, and then rejecting guesses whose probability falls below a threshold.
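A minimal sketch of that thresholding, assuming Accord.NET 3.x's multiclass classifier API and an already trained, probability-calibrated machine named machine (the variable names and the 0.80 cutoff are illustrative, not from the original answer):

// Assumes: using Accord.MachineLearning.VectorMachines; and a machine trained
// with probabilistic output calibration, so Probability() returns calibrated values.
int decision;
double confidence = machine.Probability(sample, out decision); // probability of the winning class

const double threshold = 0.80; // illustrative cutoff; tune it on validation data
int finalClass = confidence >= threshold
    ? decision + 1   // classes 1 to 10
    : 0;             // class 0 = "unidentified"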

Related

Is it possible to create isosurfaces using Oxyplot?

I'm using Oxyplot HeatMapSeries for representing some graphical data.
For a new application I need to represent the data with isosurfaces, something looking like this:
[image: filled-contour / isosurface plot]
Some ideas around this:
I know the ContourSeries can draw the isolines, but I can't find any option that allows me to fill the gaps between lines. Does this option exist?
I know the HeatMapSeries can be shown under the ContourSeries so I can get a similar result, but it does not fit our needs.
Another option would be limiting the HeatMapSeries colours and eliminating the interpolation. Is this possible?
If anyone has another approach to the solution, I'd be happy to hear it!
Thanks in advance!
I'm evaluating whether OxyPlot will meet my needs and this question interests me. From looking at the ContourSeries source code, it appears to be only for finding and rendering the contour lines, not for filling the areas between them. Looking at AreaSeries, I don't think you could just feed it contours, because it expects two sets of points which, when the ends are connected, create a simple closed polygon.
The best suggestion I have is "rasterizing" your data: round each data point down to the nearest contour level, then plot the heatmap of that rasterized data under the contour. The ContourSeries appears to calculate a level step that gives 20 levels across the data by default.
My shortcut for rasterizing based on a step value is to divide the data by the level step you want, then truncate the result with Math.Floor.
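A minimal sketch of that quantization (the Rasterize name and 2D-array layout are mine, not anything from OxyPlot):

// Quantize each data point down to the nearest contour level.
// 'data' is the 2D value grid, 'levelStep' the contour spacing.
double[,] Rasterize(double[,] data, double levelStep)
{
    int rows = data.GetLength(0), cols = data.GetLength(1);
    var result = new double[rows, cols];
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            result[r, c] = Math.Floor(data[r, c] / levelStep) * levelStep;
    return result;
}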
Looking at HeatMapSeries, it looks like you can try turning interpolation off, using the HeatMapRenderMethod.Rectangles render method, or supplying a LinearColorAxis with fewer steps and letting the rendering do the rasterization. The palettes available for a LinearColorAxis can be seen in the OxyPalettes source: BlueWhiteRed31, Hot64, Hue64, BlackWhiteRed, BlueWhiteRed, Cool, Gray, Hot, Hue, HueDistinct, Jet, and Rainbow.
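A sketch of that combination, assuming current OxyPlot type and property names (the axis ranges and the data variable are placeholders):

using OxyPlot;
using OxyPlot.Axes;
using OxyPlot.Series;

// A few-colour palette plus no interpolation, so the colour axis itself
// quantizes the heatmap into discrete bands that follow the contours.
var model = new PlotModel { Title = "Filled contours (approximation)" };
model.Axes.Add(new LinearColorAxis
{
    Position = AxisPosition.Right,
    Palette = OxyPalettes.Jet(8) // 8 discrete bands instead of a smooth gradient
});
model.Series.Add(new HeatMapSeries
{
    X0 = 0, X1 = 100, Y0 = 0, Y1 = 100,
    Data = data,                                 // 2D double array (assumed)
    Interpolate = false,                         // no smoothing between cells
    RenderMethod = HeatMapRenderMethod.Rectangles
});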
I'm not currently in a position to run OxyPlot to test things, but I figured I would share what I could glean from the source code and limited documentation.

Scaling/Interpolating measured data in C#

First of all, I apologize if this question has already been asked, but after about 10 hours of intensive research through every single link Google offered for every phrase I gave it, I wasn't able to find anything that could help me with my problem.
What I want to do is the following:
I retrieve two Excel sheets with data from two different scientific measurements. Each sheet contains information that can easily be compared to the other.
The only difference between the two sheets is the amount of data points they contain.
For example: The first sheet contains data for a time span of 200 seconds, with one point representing 1 second. The second sheet also contains data for the same time span, but with one point representing 0.5 seconds.
The problem I have to solve is to "scale" the sheet with fewer data points in such a way that the two can easily be compared in a single chart, so that each line in the chart uses the same space on the X axis.
The problem I'm having with this task is that I'm lacking sufficient mathematical background to create an algorithm.
I've already created the entire application with a GUI, the import of the Excel sheets, and smoothing with a moving average (only useful if the datasets have equal length).
Any idea or link to any place where this could be explained is welcome.
I also want to say that any code I currently have is completely irrelevant to this question; it's just about an additional method with said functionality.
Thanks in advance,
marfuc
If there is a direct correlation between the data points of both sets - i.e. the times match up - then it might be sufficient to do a linear interpolation on the smaller set to generate the missing points.
For instance, let's say your first set of data is:
Time Value
12:00:00.0 100.0
12:00:01.0 120.0
12:00:02.0 117.5
...and your second set looks like:
Time Value
12:00:00.0 2.5
12:00:00.5 3.0
12:00:01.0 2.6
12:00:01.5 2.9
12:00:02.0 2.8
We can fill in the gaps in the first list in a couple of ways, depending on what you're trying to do with the data afterwards.
The simplest is to do a linear interpolation of the values. If your known points are equidistant from the value you're looking for (i.e. you're finding the value at the half-way point), then just average them together at the missing points, as in the table and sketch below:
Time        Value    Lerp
12:00:00.0  100.0
12:00:00.5           110.0
12:00:01.0  120.0
12:00:01.5           118.75
12:00:02.0  117.5
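A minimal sketch of the interpolation above (the names are illustrative):

// Linear interpolation between two known samples (t0, v0) and (t1, v1).
double Lerp(double t, double t0, double v0, double t1, double v1)
{
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0);
}

// Example: the value at 12:00:01.5 from the 1-second series above
double v = Lerp(1.5, 1.0, 120.0, 2.0, 117.5); // 118.75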
Linear interpolation is OK if the sample rate is high enough relative to the rate at which the input varies. I've seen a lot of audio processing algorithms that use this sort of calculation for doubling the sample rate. It doesn't work so well when you have high-frequency data with sample rates too low to capture the transitions well.
The second option is to use a spline function to fit a curve to the series of points, then synthesize the missing points as offsets on the curve. This will give you smoother and more natural interpolations, with humps in the data looking much more realistic. It also gives you a fairly good way to offset your data if the timing isn't well aligned between the data sets: calculate each point as an offset along the curve at a distance equal to the timing offset.
There are plenty of spline implementations out there that you could use for this. I'd suggest Catmull-Rom as a starting algorithm.
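For reference, a minimal sketch of one uniform Catmull-Rom segment, evaluated between p1 and p2 with p0 and p3 as the shaping neighbours (this is the standard formulation, not code from any particular library):

// Interpolate between p1 and p2 for t in [0, 1].
double CatmullRom(double p0, double p1, double p2, double p3, double t)
{
    double t2 = t * t, t3 = t2 * t;
    return 0.5 * ((2.0 * p1)
                + (-p0 + p2) * t
                + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t2
                + (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * t3);
}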
Warning: If you're doing some sort of statistical analysis on the outputs then you're not going to get good results doing this, no matter how you do it. Cut the bigger group down instead of fabricating data into the smaller group if analysis is your goal.

How to detect string when pitch-tracking on electric guitar?

Hi, I'm a noob in audio-related coding and I'm working on a pitch-tracking DLL that I will use to try to create a sort of open-source version of the video game Rocksmith as a learning experience.
So far I have managed to get the FFT to work, so I can detect the pitch frequency (Hz); then, using an algorithm and the table below, I can determine the octave (2nd to 6th) and the note (C to B) for the played note.
The next step is to detect the string so I can determine the fret.
I've been thinking about it, and in theory I can work with this: I will know when the user is playing the right note, but the game could be "hacked", because by using only the Hz value the game cannot detect whether a note is played on the right string. For example, 5th string + 1st fret = C4 (261.63 Hz) is equal to 6th string + 5th fret = C4 (261.63 Hz).
The chances of a user playing a note on the wrong string and getting it right are low, but I think it would be really good to know the string, so I can give users some error feedback when they play the wrong string (like "you should go a string up or down").
Do you know what I can do to detect the string? Thanks in advance :)
[edit]
The guitar and strings that we are using affect the timbre, so analyzing the timbre does not seem to be an easy way of detecting strings:
"Variations in timbre on your guitar are produced by an enormous number of factors from pickup design and position, the natural resonances and damping in your guitar due to the wood used (that's a different sort of timber!) and its construction and shape, the gauge and age of your strings, your playing technique, where you fret and pluck the string, and so on."
This might be a little bit late because the post is a year old. But here's a solution, which I found after long research into pitch detection for guitar.
THIS IS WHY FFT DOESN'T WORK:
You cannot use a plain FFT, since its result is a linear array of frequency bins while notes are spaced logarithmically (the distance between notes grows exponentially). Plus, the FFT only gives you an array of bins in which your frequency COULD BE; it doesn't give you a precise result.
THIS IS WHAT I SUGGEST :
Use dywapitchtrack. It's a library that uses a wavelet algorithm, which works directly on your waveform instead of calculating large bins like the FFT.
description:
The dywapitchtrack is based on a custom-tailored algorithm which is of very high quality:
both very accurate (precision < 0.05 semitones), very low latency (< 23 ms) and
very low error rate. It has been thoroughly tested on human voice.
It can best be described as a dynamic wavelet algorithm (dywa):
DOWNLOAD : https://github.com/inniyah/sndpeek/tree/master/src/dywapitchtrack
USE (C++):
Put the .c and .h files where you need them, add them to your project, and include the header file.
// Create a dywapitchtracker object
dywapitchtracker pitchtracker;

// Initialise the tracker before use
dywapitch_inittracking(&pitchtracker);
When your buffer is full (the buffer needs to be sampled at 44100 Hz and have a power-of-two length; mine is 2048):
// Use this function with your buffer
double thePitch = dywapitch_computepitch(&pitchtracker, yourBuffer, 0, 2048);
And voilà, thePitch contains precisely what you need. (Feel free to ask questions if something is unclear.)
A simple FFT peak estimator is not a good guitar pitch detector/estimator, due to many potentially strong overtones. More robust pitch estimation algorithms exist (search Stack Overflow and DSP.stackexchange). But if you require the players to pre-characterize each string on their individual instruments, both open and fretted, before starting the game, an FFT fingerprint of those characterizations might be able to differentiate the same note played on different strings on some guitars. The thicker strings will give off slightly different ratios of energy in some of the higher overtones, as well as different amounts of slight inharmonicity.
The other answers suggest simple pitch detection methods; string detection, however, is something you will have to research yourself.
Specifically, compare the overtones of 5th string 1st fret to 6th string 5th fret: that is, look only at 261.63*2, 261.63*3, 261.63*4, etc. Also try looking at 261.63*0.5. Compare the amplitudes of the two signals at these frequencies; there might be a pattern that can be detected.
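A hypothetical sketch of that comparison (the function name and parameters are mine; 'spectrum' is an FFT magnitude array and binWidth = sampleRate / fftSize):

// Measure the strength of each overtone relative to the fundamental.
// Two fingerprints for the same pitch on different strings can then be
// compared, e.g. by the distance between the ratio vectors.
static double[] OvertoneRatios(double[] spectrum, double binWidth, double fundamentalHz, int harmonics = 6)
{
    var ratios = new double[harmonics];
    double f0Mag = spectrum[(int)Math.Round(fundamentalHz / binWidth)];
    for (int h = 2; h <= harmonics + 1; h++)
    {
        int bin = (int)Math.Round(fundamentalHz * h / binWidth);
        if (bin < spectrum.Length && f0Mag > 0)
            ratios[h - 2] = spectrum[bin] / f0Mag; // relative strength of the h-th harmonic
    }
    return ratios;
}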

Bell Curve algorithm to adjust set of scores

I am faced with a challenge whereby the business user would like a "Bell curve" applied to their scoring.
This system scores people on a 1-5 point scale. The problem is that most people score too generously, and the client would like the scores within a group of people to be adjusted down (or up) based on a bell curve.
I would assume then that they are trying to make the majority of people sit at the median level, i.e. 3 in this case. I am not sure that the client is correct in their terminology with regard to "bell curve", but the requirement is that the scores are leveled out around the 3 level.
What would be the best algorithm to achieve this?
For example, one group might have the scores 3, 4, 4, 3, 5. In this case the scoring is on average higher than 3. What would be a fair way to adjust all these scores so that the "bell curve" is applied?
The bell curve is the probability density function (PDF) of the normal distribution, so that's your goal.
The key to this transformation is the Cumulative Distribution Function (CDF). In words: "y% of the values are less than or equal to x". You can easily tabulate the CDF of your input data, and the CDF of the normal distribution is also known (the integral of the bell curve).
Together, this gives you: "y% of the scores are less than x, but according to the normal distribution, y% of the scores should be less than x', therefore the correction is x -> x'".
Mathematically, this is done via the probit function.
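A sketch of that rank-based correction, assuming MathNet.Numerics is available for the inverse normal CDF (the method name, target mean of 3, and standard deviation of 1 are illustrative):

using System;
using System.Linq;
using MathNet.Numerics.Distributions; // assumed dependency, provides Normal.InvCDF

// Map each score to the normal distribution via its empirical CDF rank
// (the probit transform described above), then clamp to the 1-5 scale.
static double[] BellCurveAdjust(double[] scores, double targetMean = 3.0, double targetStdDev = 1.0)
{
    var sorted = scores.OrderBy(s => s).ToArray();
    return scores.Select(s =>
    {
        // empirical CDF: fraction of scores <= s, kept strictly inside (0, 1)
        int rank = Array.FindLastIndex(sorted, x => x <= s) + 1;
        double p = (rank - 0.5) / scores.Length;
        double adjusted = Normal.InvCDF(targetMean, targetStdDev, p);
        return Math.Max(1.0, Math.Min(5.0, adjusted));
    }).ToArray();
}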
You usually assume that your data fits a distribution rather than transforming your data into a given distribution.
If your input data fits a normal distribution ("bell curve"), then you can adjust it by simply adding or subtracting the same value to or from every sample.
The shape of the distribution will be preserved; only the mean will change.
If you want to center your distribution on a given mean, just add the difference between your target mean and the actual one.
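A minimal sketch of that shift, using the 3, 4, 4, 3, 5 example from the question:

using System.Linq;

double[] scores = { 3, 4, 4, 3, 5 };
double shift = 3.0 - scores.Average();                        // -0.8 for this group
double[] adjusted = scores.Select(s => s + shift).ToArray();  // 2.2, 3.2, 3.2, 2.2, 4.2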

Advice: custom shapefile or C# heatmap of matrix

I'm looking at creating a heatmap of numerical data spread over various locations within a building. I've spent a few hours researching data mapping, etc., and am looking for some advice. I am new to GIS. The majority of options available are tile APIs that use lat/long, and are overkill for my requirements.
Ultimately, I just want to output a background image (a floor plan) with the heatmap overlay demonstrating areas of high intensity. The data is bound to specific locations (example, activity level: 14, location: reception entrance) and so is not randomly distributed over the map. Data has timestamps, and the final objective is to print PNGs of hourly activity for animation.
I feel like I have two options.
I like this tutorial (http://dylanvester.com/post/Creating-Heat-Maps-with-NET-20-%28C-Sharp%29.aspx) as it offers a huge amount of flexibility and the final imagery is very similar to what I would like - it's a great head start. That said, I'd need to assign locations such as "reception entrance" to an x,y co-ordinate, or even a number of x,y co-ordinates. I'd then need to process a matrix prior to every heatmap, taking data from my CSV files and placing activity values in the appropriate co-ordinates.
The other option I think I have is to create a custom shapefile from the floor plan. That is, create a vector graphic with defined regions, where each region is attributable to a taggable location. This seems the most flexible option, but I'm really struggling to find out how to create shapefiles.
My unfamiliarity with GIS terminology is making searches difficult. The latter seems the most sensible solution (use the shapefile with something like https://gist.github.com/1370472) to change the activity values over time.
Links found:
guthcad.com/cad2shape.htm (but don't have CAD drawing, just raster floorplan)
stackoverflow.com/questions/4014072/arcgis-flex-overlay-floor-plan-png (unhelpful, don't want tiled)
oliverobrien.co.uk/2010/01/simple-choropleth-maps-in-quantum-gis/
gis.stackexchange.com/questions/20901/using-gis-for-interactive-floor-plan (looks great)
To summarise: I'd like to map data bound to locations within a building. There's very good code in a C# tutorial I'd like to use, but linking the activity data to co-ordinates is potentially messy (although it could allow for describing transitions of activity between locations, as vectors between co-ordinates could be used). The other option is to create an image with regions that can be linked to CSV data by something like QGIS. Could people with more experience suggest the best direction, or even alternatives?
Thank you!
I recently did something similar for a heatmap of certain events in the USA.
My input data was simply a CSV file with three columns: Latitude, Longitude, and Number of Events.
After examining available options, I ended up using GHeat.Net. It was quite easy to use and required only a little modification to meet my needs.
The output is a transparent PNG that I then overlaid onto Google Maps.
Although your scale is quite different, I imagine the same solution should work in your case.
UPDATE
If the x,y values are integers in a reasonably small range, and if you have enough samples, you might simply create a (sparse?) array, with each array element's value being the number of samples at that coordinate. Identify the "hottest" array element (the one with the most samples) and equate it to "white" in a heat map, with lesser values corresponding to colder colors. In other words, normalize all values in the array using the highest value, map the normalized values to a color scale, and render the array to a PNG.
Heat maps like GHeat create a sphere of influence around each data point. Depending on your data, you may not need that.
If your sample rate is not high enough, you could lift the sphere of influence code out of GHeat and apply it to your own array.
The sphere-of-influence stuff basically adds a value of 1 at the specific coordinate of the data sample, and also adds a smaller value to adjacent pixels in the map in order to produce smoother-looking maps. I don't know the specific algorithm used in GHeat, but the basic idea is to add to the specific x,y value as well as its neighbours using a pattern something like this (a sketch applying it follows the table):
0.25 | 0.5 | 0.25
-----------------
0.5 | 1.0 | 0.5
-----------------
0.25 | 0.5 | 0.25
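A sketch of that stamping-plus-normalization idea (the method name, tuple input, and grid layout are illustrative, not code from GHeat):

using System.Collections.Generic;
using System.Linq;

static double[,] BuildIntensity(IEnumerable<(int X, int Y)> samples, int width, int height)
{
    // the 3x3 influence pattern from the table above
    double[,] kernel = { { 0.25, 0.5, 0.25 }, { 0.5, 1.0, 0.5 }, { 0.25, 0.5, 0.25 } };
    var grid = new double[width, height];

    foreach (var (x, y) in samples)
        for (int dx = -1; dx <= 1; dx++)
            for (int dy = -1; dy <= 1; dy++)
            {
                int px = x + dx, py = y + dy;
                if (px >= 0 && px < width && py >= 0 && py < height)
                    grid[px, py] += kernel[dx + 1, dy + 1];
            }

    // normalize so the hottest cell is 1.0 ("white" on the colour scale)
    double max = grid.Cast<double>().Max();
    if (max > 0)
        for (int x = 0; x < width; x++)
            for (int y = 0; y < height; y++)
                grid[x, y] /= max;

    return grid;
}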
