I have a few questions concerning backpropagation. I'm trying to learn the fundamentals behind neural network theory and wanted to start small, building a simple XOR classifier. I've read a lot of articles and skimmed multiple textbooks - but I can't seem to teach this thing the pattern for XOR.
Firstly, I am unclear about the learning model for backpropagation. Here is some pseudo-code to represent how I am trying to train the network. [Let's assume my network is set up properly (i.e. multiple inputs connect to a hidden layer, which connects to an output layer, and everything is wired up properly).]
SET guess = getNetworkOutput() // Note this is using a sigmoid activation function.
SET error = desiredOutput - guess
SET delta = learningConstant * error * sigmoidDerivative(guess)
For Each Node in inputNodes
For Each Weight in inputNodes[n]
inputNodes[n].weight[j] += delta;
// At this point, I am assuming the first layer has been trained.
// Then I recurse a similar function over the hidden layer and output layer.
// The main difference being that it further divides up the adjustment delta.
I realize this is probably not enough to go off of, and I will gladly expound on any part of my implementation. Using the above algorithm, my neural network does get trained, kind of. But not properly. The output is always
XOR 1 1 [smallest number]
XOR 0 0 [largest number]
XOR 1 0 [medium number]
XOR 0 1 [medium number]
I can never train [1,1] and [0,0] to produce the same value.
If you have any suggestions, additional resources, articles, blogs, etc for me to look at I am very interested in learning more about this topic. Thank you for your assistance, I appreciate it greatly!
OK, first of all: backpropagation, as the name states, works from the back, from the output through all hidden layers to the input layer. The error computed at the last layer is "propagated" to all previous ones. So let's assume you have a model of the type input - 1 hidden layer - output. In the first step you compute the error from the desired value and the one you actually got. Then you do backprop on the weights between hidden and output, and after that you do backprop on the weights between input and hidden. In each step you backpropagate the error from the next layer to the previous one. Simple, but the maths can be confusing ;) Please take a look at this short chapter for further explanation: http://page.mi.fu-berlin.de/rojas/neural/chapter/K7.pdf
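To make the order of the updates concrete, here is a minimal Python sketch of one training step for an input - hidden - output network with sigmoid units. The function names (forward, backprop_step), the weight layout (each row ends with a bias weight) and the learning rate are assumptions made for illustration; they are not taken from the question's code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Return (hidden activations, output). x is a list; each weight row ends with a bias weight."""
    h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hidden]
    o = sigmoid(sum(w * v for w, v in zip(w_out, h + [1.0])))
    return h, o

def backprop_step(x, target, w_hidden, w_out, lr=0.5):
    """One gradient step on E = 0.5 * (target - o)^2; updates the weights in place."""
    h, o = forward(x, w_hidden, w_out)
    # Output delta: error times the sigmoid derivative o * (1 - o).
    delta_out = (target - o) * o * (1.0 - o)
    # Hidden deltas: the output delta sent back through the (not yet updated) output weights.
    delta_hidden = [delta_out * w_out[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
    # Each weight changes by lr * (delta of the node it feeds) * (activation it carries).
    for j, a in enumerate(h + [1.0]):
        w_out[j] += lr * delta_out * a
    for j, row in enumerate(w_hidden):
        for i, a in enumerate(x + [1.0]):
            row[i] += lr * delta_hidden[j] * a
    return o

# Hypothetical starting weights for a 2-2-1 network (two inputs plus bias per node).
w_hidden = [[0.5, -0.4, 0.1], [-0.3, 0.8, -0.2]]
w_out = [0.7, -0.6, 0.05]
backprop_step([1.0, 0.0], 1.0, w_hidden, w_out)
```

Note that every weight gets its own adjustment (it depends on the activation that weight carries), and that the hidden deltas are computed from the output delta before any weight is changed.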
I have the following neural network, which uses RPROP - resilient backpropagation
NetCore = new BasicNetwork();
NetCore.AddLayer(new BasicLayer(null, false, 32));
NetCore.AddLayer(new BasicLayer(new ActivationTANH(), true, 65));
NetCore.AddLayer(new BasicLayer(new ActivationTANH(), true, 65));
NetCore.AddLayer(new BasicLayer(new ActivationSigmoid(), false, 1));
NetCore.Structure.FinalizeStructure();
NetCore.Reset();
(I've posted the code just to be sure that I am doing it right; if not, I hope someone will point it out.)
After training the network, the error rate is minimized to around 1%. I pass in the test data and most of the time the output is something like "5,07080020755566E-10", where I expect numbers from 0 to 1. It should also be noted that when such cases occur they are always positive numbers (I haven't encountered negative outputs yet).
The second question I wanted to ask is as follows: the neural network is meant to predict soccer matches, so I have 32 inputs. 16 inputs are team 1 performance data and the other 16 are for team 2.
The training sets are prepared like so: say we have 1000 matches, and the output for all of those training sets is 1.
So during the preparation of the training sets, reversed matches are added as well, where the output is 0 and, of course, the team 1 and team 2 inputs are swapped.
When testing, I get the following results for the same match:
Output 0,0125940938512236 Desired 1 direct
Output 0,0386960820583483 Desired 0 reversed
The question is why? :)
I will appreciate any help.
Shedding some light on this issue would point me in the direction where I should dig.
Thanks in advance.
After training the network, the error rate is minimized to around 1%. I pass in the test data and most of the time the output is something like "5,07080020755566E-10", where I expect numbers from 0 to 1. It should also be noted that when such cases occur they are always positive numbers (I haven't encountered negative outputs yet).
5,07080020755566E-10 is a number between 0 and 1. It's a very small number - only just a tiny bit more than 0. (I'm assuming your culture uses comma as a decimal separator.) It's 0,00000000050708(...) - the E-10 means "multiplied by 10^-10", i.e. the decimal separator moves 10 places to the left.
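As a quick check, here is a small Python sketch that reads the value and prints it without the exponent. The helper name parse_comma_decimal is made up for this example, and it assumes the comma really is a decimal separator.

```python
def parse_comma_decimal(text):
    """Parse a number written with ',' as the decimal separator (assumed here)."""
    return float(text.replace(",", "."))

value = parse_comma_decimal("5,07080020755566E-10")
print(0.0 < value < 1.0)   # True
print(f"{value:.14f}")     # 0.00000000050708
```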
I didn't really follow your second question - I suggest you ask it separately, and with more detail - assuming it's really a programming question at all. (It's hard to tell at the moment.)
In my project I face a scenario where I have a function with numerous inputs. At a certain point I am provided with a result and I need to find one combination of inputs that generates that result.
Here is some pseudocode that illustrates the problem:
Double y = f(x_0,..., x_n)
I am provided with y and I need to find any combination of inputs that produces it.
I tried several things on paper that could generate something, but each of my parameters has a range of 6.5 x 10^9 possible values - so I would like to get an optimal execution time.
Can someone name an algorithm or a topic that will be useful for me, so I can read up on how other people solved similar problems?
I was thinking along the lines of creating a vector from the inputs and judging how well that vector fits the problem. This sounds an awful lot like an NN, but there is no training phase available.
Edit:
Thank you all for the feedback. The comments sum up the problems I have, and I will try something along the lines of hill climbing.
The general case for your problem might be impossible to solve, but for some cases there are numerical methods that can help you solve your problem.
For example, in 1D space, if you can find one input whose result is smaller than y and one whose result is larger than y (and f is continuous in between), you can use the numerical method regula falsi to find the "root" numerically (which is y in your case, simply by invoking the method on f(x) - y).
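A minimal Python sketch of that idea, assuming a continuous one-dimensional f and two starting points that bracket y. The function name, the tolerance and the example function are placeholders, not part of the question.

```python
def regula_falsi(f, y, lo, hi, tol=1e-12, max_iter=200):
    """Find x in [lo, hi] with f(x) close to y; f(lo) - y and f(hi) - y must differ in sign."""
    g = lambda x: f(x) - y
    g_lo, g_hi = g(lo), g(hi)
    if g_lo == 0:
        return lo
    if g_hi == 0:
        return hi
    if g_lo * g_hi > 0:
        raise ValueError("f(lo) - y and f(hi) - y must have opposite signs")
    x = lo
    for _ in range(max_iter):
        # Point where the straight line through both ends crosses zero.
        x = hi - g_hi * (hi - lo) / (g_hi - g_lo)
        g_x = g(x)
        if abs(g_x) < tol:
            break
        # Keep the half of the interval that still brackets the root.
        if g_lo * g_x < 0:
            hi, g_hi = x, g_x
        else:
            lo, g_lo = x, g_x
    return x

# Example: which x gives x**3 + x == 10?
x_found = regula_falsi(lambda x: x**3 + x, 10.0, 0.0, 3.0)
```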
Another numerical method for finding roots is Newton-Raphson.
I admit, I am not familiar with how to apply these methods on multi dimensional space - but it could be a starter. I'd search the literature for these if I were you.
Note: using such a method almost always requires some knowledge on the function.
Another possible solution is to take g(X) = |f(X) - y| and use some heuristic algorithm to find a minimal value of g. The problem with heuristic methods is that they will get you "close enough" - but will seldom get you exactly to the target (unless the function is convex).
Some optimization algorithms are: genetic algorithms, hill climbing, gradient descent (where you can find the gradient numerically).
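As a rough illustration of the hill-climbing option, here is a Python sketch that minimizes g(X) = |f(X) - y| by trying a step up and down on each coordinate and halving the step when nothing improves. This is one simple deterministic variant chosen for the example; the function name, step sizes and test function are assumptions, and it can get stuck in a local minimum on less friendly functions.

```python
def hill_climb(f, y, x0, step=1.0, min_step=1e-9, max_evals=100000):
    """Coordinate-wise hill climbing on g(X) = |f(X) - y|. Returns (best X, best g)."""
    g = lambda x: abs(f(x) - y)
    x = list(x0)
    best = g(x)
    evals = 0
    while step > min_step and best > 0 and evals < max_evals:
        improved = False
        for i in range(len(x)):
            for d in (step, -step):
                cand = x[:]
                cand[i] += d
                val = g(cand)
                evals += 1
                if val < best:          # only ever move downhill
                    x, best = cand, val
                    improved = True
        if not improved:
            step /= 2                   # refine the search around the current point
    return x, best

# Example: find any (a, b) with a**2 + 3*b == 10.
f = lambda v: v[0] ** 2 + 3 * v[1]
solution, residual = hill_climb(f, 10.0, [0.0, 0.0])
```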
After reading some articles about neural networks (back-propagation), I tried to write a simple neural network by myself.
I decided on an XOR neural network.
My problem is when I am trying to train the network:
if I use only one example to train the network, let's say 1,1,0 (as input1, input2, targetOutput),
after about 500 training iterations the network answers 0.05.
But if I try more than one example (let's say 2 different ones or all 4 possibilities), the network's output tends to 0.5 :(
I searched Google for my mistake with no results :S
I'll try to give as many details as I can to help find what's wrong:
- I've tried networks with 2,2,1 and 2,4,1 (input layer, hidden layer, output layer).
- The output for every neuron is defined by:
double input = 0.0;
for (int n = 0; n < layers[i].Count; n++)
input += layers[i][n].Output * weights[n];
where 'i' is the current layer and 'weights' are all the weights from the previous layer.
- The error of the last layer (output layer) is defined by:
value*(1-value)*(targetvalue-value);
where 'value' is the neuron's output and 'targetvalue' is the target output for the current neuron.
- The error for the other neurons is defined by:
foreach neural in the nextlayer
sum+=neural.value*currentneural.weights[neural];
- All the weights in the network are adapted by this formula (the weight from neural -> neural2):
weight+=LearnRate*neural.myvalue*neural2.error;
where LearnRate is the network learning rate (set to 0.25 in my network).
- The bias weight for each neuron is defined by:
bias+=LearnRate*neural.myerror*neural.Bias;
Bias is a constant value = 1.
That's pretty much all I can detail.
As I said, the output tends to 0.5 with different training examples :(
Thank you very much for your help ^_^.
It is difficult to tell where the error is without seeing the complete code. One thing you should carefully check is that your calculation of the local error gradient for each unit matches the activation function you are using on that layer. Have a look here for the general formula: http://www.learnartificialneuralnetworks.com/backpropagation.html .
For instance, the calculation you do for the output layer assumes that you are using a logistic sigmoid activation function but you don't mention that in the code above so it looks like you are using a linear activation function instead.
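For reference, with a logistic sigmoid on every layer, the usual delta rule can be written as below. This is the standard textbook form, not something taken from the code in the question. The symbols are introduced here for the example: o is a unit's output, t the target, \eta the learning rate, and w_{jk} the weight from unit j to unit k in the next layer.

```latex
\begin{align}
\delta_k &= o_k (1 - o_k)(t_k - o_k) && \text{(output unit } k\text{)} \\
\delta_j &= o_j (1 - o_j) \sum_k w_{jk}\, \delta_k && \text{(hidden unit } j\text{)} \\
\Delta w_{jk} &= \eta\, o_j\, \delta_k
\end{align}
```

Note that the sum for a hidden unit runs over the deltas of the next layer, not over its outputs, and that it is multiplied by the hidden unit's own o_j (1 - o_j).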
In principle a 2-2-1 network should be enough to learn XOR, although the training will sometimes get trapped in a local minimum without being able to converge to the correct state. So it is important not to draw conclusions about the performance of your algorithm from a single training session. Note that simple backprop is bound to be slow; there are faster and more robust solutions, like Rprop for instance.
There are books on the subject which provide detailed step-by-step calculations for a simple network (e.g. 'A.I.: A guide to intelligent systems' by Negnevitsky); this could help you debug your algorithm. An alternative would be to use an existing framework (e.g. Encog, FANN, Matlab), set up the exact same topology and initial weights, and compare the calculation with your own implementation.
I record a daily 2-minute radio broadcast from the Internet. There's always the same starting and ending jingle. Since the exact broadcast time may vary by about 6 minutes, I have to record around 15 minutes of radio.
I wish to identify the exact times where those jingles occur in the 15-minute recording, so I can extract the portion of audio I want.
I already started a C# application where I decode an MP3 to PCM data and convert the PCM data to a spectrogram based on http://www.codeproject.com/KB/audio-video/SoundCatcher.aspx
I tried to use a cross-correlation algorithm on the PCM data, but the algorithm is very slow (around 6 minutes with a step of 10 ms) and on some occasions it fails to find the jingle start time.
Any ideas of algorithms to compare two spectrograms for a match? Or a better way to find that jingle start time?
Thanks,
Update, sorry for the delay
First, thanks for all the answers; most of them were relevant and/or interesting ideas.
I tried to implement the Shazam algorithm proposed by fonzo, but failed to detect the peaks in the spectrogram. Here are three spectrograms of the starting jingle from three different recordings. I tried AForge.NET with the blob filter (but it failed to identify peaks), blurring the image and checking for differences in height, the Laplace convolution, slope analysis, and detecting the series of vertical bars (but there were too many false positives)...
In the meantime, I tried the Hough algorithm proposed by Dave Aaron Smith, where I calculate the RMS of each column. Yes, each column: it's O(N*M), but M << N (note that a column is around 8k samples). So overall it's not that bad; the algorithm still takes about 3 minutes, but it has never failed.
I could go with that solution, but if possible I would prefer Shazam because it's O(N) and probably much faster (and cooler too). So if any of you has an idea for an algorithm that always detects the same points in those spectrograms (they don't have to be peaks), please add a comment.
New Update
Finally, I went with the algorithm explained above. I tried to implement the Shazam algorithm, but failed to find proper peaks in the spectrogram; the identified points were not constant from one sound file to another. In theory, the Shazam algorithm is the solution for that kind of problem. The Hough algorithm proposed by Dave Aaron Smith was more stable and effective. I split around 400 files, and only 20 of them failed to split properly. Disk space went from 8GB to 1GB.
Thanks for your help.
There's a description of the algorithm used by the Shazam service (which identifies a piece of music given a short, possibly noisy sample) here: http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf
From what I understood, the first thing done is to isolate peaks in the spectrogram (with some tweaks to ensure uniform coverage), which gives a "constellation" of pairs of values (time; frequency) from the initial spectrogram. Once done, the sample constellation is compared to the constellation of the full track by translating a window of the sample length from the beginning to the end and counting the number of correlated points.
The paper then describes the technical solution they found to be able to do the comparison fast even with a huge collection of tracks.
I wonder if you could use a Hough transform. You would start by cataloging each step of the opening sequence. Let's say you use 10 ms steps and the opening sequence is 50 ms long. You compute some metric on each step and get
1 10 1 17 5
Now go through your audio and analyze each 10 ms step for the same metric. Call this array have_audio
8 10 8 7 5 1 10 1 17 6 2 10...
Now create a new empty array that's the same length as have_audio. Call it start_votes. It will contain "votes" for the start of the opening sequence. If you see a 1, you may be in the 1st or 3rd step of the opening sequence, so you have 1 vote for the opening sequence starting 1 step ago and 1 vote for the opening sequence starting 3 steps ago (counting the current step as "1 step ago"). If you see a 10, you have 1 vote for the opening sequence starting 2 steps ago; a 17 votes for 4 steps ago, and so on.
So for that example have_audio, your votes will look like
2 0 0 1 0 4 0 1 0 0 1 0 ...
You have a lot of votes at position 6, so there's a good chance the opening sequence starts there.
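The voting scheme can be sketched in a few lines of Python. This assumes exact matches on the metric, as in the example numbers above; with real audio you would probably need a tolerance. The function name start_votes_for is invented for the illustration.

```python
def start_votes_for(template, audio):
    """Tally, for each position, how many steps vote for the sequence starting there."""
    votes = [0] * len(audio)
    for pos, value in enumerate(audio):
        for step, expected in enumerate(template):
            start = pos - step          # where the sequence would have begun
            if value == expected and start >= 0:
                votes[start] += 1
    return votes

opening = [1, 10, 1, 17, 5]
have_audio = [8, 10, 8, 7, 5, 1, 10, 1, 17, 6, 2, 10]
start_votes = start_votes_for(opening, have_audio)
best_start = start_votes.index(max(start_votes)) + 1   # 1-based position: 6
```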
You could improve performance by not bothering to analyze the entire opening sequence. If the opening sequence is 10 seconds long, you could just search for the first 5 seconds.
Here is a good python package that does just this:
https://code.google.com/p/py-astm/
If you are looking for a specific algorithm, good search terms to use are "acoustic fingerprinting" or "perceptual hashing".
Here's another python package that could also be used:
http://rudd-o.com/new-projects/python-audioprocessing/documentation/manuals/algorithms/butterscotch-signatures
If you already know the jingle sequence, you could analyse the correlation of the recording with that sequence instead of the cross-correlation between the full 15-minute tracks.
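As a baseline, correlating against the short known sequence could look like the Python sketch below. It is a plain time-domain normalized correlation (not the Wiener filter), written for clarity rather than speed. The function name and the toy signals are assumptions for the example; real PCM data would need a faster, FFT-based version.

```python
def find_jingle(recording, jingle):
    """Return (offset, score) where the jingle best matches; score is a normalized correlation."""
    n = len(jingle)
    jingle_norm = sum(v * v for v in jingle) ** 0.5
    best_offset, best_score = 0, float("-inf")
    for offset in range(len(recording) - n + 1):
        window = recording[offset:offset + n]
        window_norm = sum(v * v for v in window) ** 0.5
        if window_norm == 0.0:
            continue                     # silent window, nothing to compare
        dot = sum(a * b for a, b in zip(window, jingle))
        score = dot / (window_norm * jingle_norm)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score

# Toy data: a quieter copy of the jingle hidden in a repeating background.
jingle = [0.0, 1.0, -1.0, 0.5, -0.5, 1.0]
background = [0.1, -0.2, 0.05] * 5
recording = background + [0.8 * v for v in jingle] + background
offset, score = find_jingle(recording, jingle)
```

Normalizing by the energy of each window keeps a loud but unrelated passage from outscoring a quieter copy of the jingle.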
To quickly calculate the correlation against the (short) sequence, I would suggest using a Wiener filter.
Edit: a Wiener filter is a way to locate a signal in a sequence with noise. In this application, we are considering anything that is "not jingle" as noise (question for the reader: can we still assume that the noise is white and not correlated?).
( I found the reference I was looking for! The formulas I remembered were a little off and I'll remove them now)
The relevant page is Wiener deconvolution. The idea is that we can define a system whose impulse response h(t) has the same waveform as the jingle, and we have to locate the point in a noisy sequence where the system has received an impulse (i.e. emitted a jingle).
Since the jingle is known, we can calculate its power spectrum H(f), and since we can assume that a single jingle appears in a recorded sequence, we can say that the unknown input x(t) has the shape of a pulse, whose power density S(f) is constant at each frequency.
Given the knowledge above, you can use the formula to obtain a "jingle-pass" filter (as in, only signals shaped like the jingle can pass) whose output is highest when the jingle is played.
My teacher asked me to create a program to solve something like:
2x plus 7y plus 2z = 76
6x plus 1y plus 4z = 26
8x plus 2y plus 18z = 1
x = ?
y = ?
z = ?
Problem is, this is literally the first two days of class and he expects us to make something like this.
Any help?
Since this is homework, I'll provide guidance, but not a complete answer...
My suggestion: Write this out on paper. How would you approach this on paper? Once you figure out the basic logic required, translating that into C# should be fairly straightforward.
You'll need to assign a variable for each portion of the equation (not just x/y/z, but also the coefficients), and just step through it in code using the same steps you do on paper.
If you know some maths, you can solve systems of equations using a matrix library (or roll your own).
I would suggest that you come up with the algorithm in pseudo-code before you touch any C#.
At least if you have defined the steps that you need to perform, the task simply becomes one of learning the syntax of C# to accomplish the steps.
Looks like you'll need a math textbook ;)
Have a go at solving this yourself on paper, but keep a note of what steps you do and try and work out what "Algorithm" you are using.
Once you've worked out your algorithm, have a go at writing some C# that does the same thing.
One more piece of advice that can help you: you'll need to store the equations in some data structure and then (repeatedly) run some steps that modify the data structure. The question is, which data structure can nicely represent this kind of data? If you focus just on the coefficients (since each row always has the same variables in it), you can write just:
2 7 2 76
6 1 4 26
8 2 18 1
Also, you can assume that all operations are + because "minus 7y" actually means "plus (-7)y". This looks like a 2D array, so when programming in C#, you can start by representing the equations as int[,]. Once you load the data into this data structure, you'll just need to write a method that does the operation you did on paper (in general).
Once you get the coefficients represented by a matrix (2 dimensional array), try googling "RREF" (Reduced Row Echelon Form). This is the matrix operation you will want to implement in your program in order to solve the system of equations. Good luck.
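For comparison once you have worked through it on paper, here is a short sketch of row reduction to RREF, written in Python rather than C# and using exact fractions so that the divisions don't lose precision. The function name rref and the use of Fraction are choices made for this example; it reads the answer off the last column only because this particular system has exactly one solution.

```python
from fractions import Fraction

def rref(rows):
    """Return the reduced row echelon form of an augmented matrix (last column = right-hand side)."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0]) - 1):
        # Find a row at or below pivot_row with a non-zero entry in this column.
        pivot = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[pivot_row], m[pivot] = m[pivot], m[pivot_row]
        # Scale the pivot row so the pivot becomes 1.
        pv = m[pivot_row][col]
        m[pivot_row] = [v / pv for v in m[pivot_row]]
        # Clear this column in every other row.
        for r in range(len(m)):
            if r != pivot_row:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m

system = [[2, 7, 2, 76],
          [6, 1, 4, 26],
          [8, 2, 18, 1]]
x, y, z = (row[-1] for row in rref(system))
print(x, y, z)
```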