Neural network (back-propagation): error in training - C#

After reading some articles about neural networks (back-propagation), I tried to write a simple neural network myself.
I decided on an XOR neural network.
My problem is when I try to train the network:
if I use only one example, say 1,1,0 (as input1, input2, targetOutput), then after ~500 training passes the network answers about 0.05.
But if I try more than one example (say 2 different ones, or all 4 possibilities), the network converges to 0.5 as output :(
I searched Google for my mistake with no results :S
I'll give as many details as I can to help find what's wrong:
- I've tried networks of 2,2,1 and 2,4,1 (input layer, hidden layer, output layer).
- The output for every neuron is defined by:
double input = 0.0;
for (int n = 0; n < layers[i].Count; n++)
    input += layers[i][n].Output * weights[n];
where 'i' is the current layer and weights are all the weights from the previous layer.
- The error for the last layer (the output layer) is defined by:
value * (1 - value) * (targetvalue - value);
where 'value' is the neuron's output and 'targetvalue' is the target output for the current neuron.
- The error for the other neurons is defined by:
foreach neural in the nextlayer
    sum += neural.value * currentneural.weights[neural];
- All the weights in the network are adapted by this formula (for the weight from neural to neural2):
weight += LearnRate * neural.myvalue * neural2.error;
where LearnRate is the network's learning rate (0.25 in my network).
- The bias weight for each neuron is defined by:
bias += LearnRate * neural.myerror * neural.Bias;
where Bias is a constant value = 1.
That's pretty much all the detail I can give.
As I said, the output tends to 0.5 with different training examples :(
Thank you very very much for your help ^_^

It is difficult to tell where the error is without seeing the complete code. One thing you should carefully check is that your calculation of the local error gradient for each unit matches the activation function you are using on that layer. Have a look here for the general formula: http://www.learnartificialneuralnetworks.com/backpropagation.html .
For instance, the calculation you do for the output layer assumes that you are using a logistic sigmoid activation function, but you don't mention that in the code above, so it looks like you may be using a linear activation function instead.
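For reference, here is a minimal method-level sketch of what the local gradients look like for logistic sigmoid units (illustrative names, not the asker's code; requires using System for Math.Exp). Note in particular that the hidden-layer sum runs over the next layer's error terms, not its outputs:

static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

// Output layer: delta = output * (1 - output) * (target - output)
static double OutputDelta(double output, double target)
    => output * (1.0 - output) * (target - output);

// Hidden layer: weighted sum of the next layer's *deltas*, times the
// sigmoid derivative of this neuron's own output.
static double HiddenDelta(double output, double[] nextDeltas, double[] weightsToNext)
{
    double sum = 0.0;
    for (int k = 0; k < nextDeltas.Length; k++)
        sum += nextDeltas[k] * weightsToNext[k];
    return output * (1.0 - output) * sum;
}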
In principle a 2-2-1 network should be enough to learn XOR, although training will sometimes get trapped in a local minimum without being able to converge to the correct state. So it is important not to draw conclusions about the performance of your algorithm from a single training session. Note that simple backprop is bound to be slow; there are faster and more robust solutions, like Rprop for instance.
There are books on the subject which provide detailed step-by-step calculations for a simple network (e.g. 'A.I.: A Guide to Intelligent Systems' by Negnevitsky); this could help you debug your algorithm. An alternative would be to use an existing framework (e.g. Encog, FANN, MATLAB), set up the exact same topology and initial weights, and compare its calculations with your own implementation.

Related

How to use GloVe word embedding model in ML.net

I'm new to Machine Learning and working on my master's thesis using ML.net. I'm trying to use the GloVe model to vectorise a CV text, but I'm finding it hard to wrap my head around the process. I have the pipeline set up as below:
var pipeline = context.Transforms.Text.NormalizeText("Text", null,
        keepDiacritics: false, keepNumbers: false, keepPunctuations: false)
    .Append(context.Transforms.Text.TokenizeIntoWords("Tokens", "Text"))
    .Append(context.Transforms.Text.RemoveDefaultStopWords("WordsWithoutStopWords", "Tokens",
        Microsoft.ML.Transforms.Text.StopWordsRemovingEstimator.Language.English))
    .Append(context.Transforms.Text.ApplyWordEmbedding("Features", "WordsWithoutStopWords",
        Microsoft.ML.Transforms.Text.WordEmbeddingEstimator.PretrainedModelKind.GloVe300D));

var embeddingTransformer = pipeline.Fit(emptyData);
var predictionEngine = context.Model.CreatePredictionEngine<Input, Output>(embeddingTransformer);

var data = new Input { Text = TextExtractor.Extract("/attachments/CV6.docx") };
var prediction = predictionEngine.Predict(data);

Console.WriteLine($"Number of features: {prediction.Features.Length}");
Console.WriteLine("Features: ");
foreach (var feature in prediction.Features)
{
    Console.Write($"{feature} ");
}
Console.WriteLine(Environment.NewLine);
From what I've studied about vectorisation, each word in the document should be converted into a vector, but when I print the features I see 900 features being printed. Can someone explain how this works? There are very few examples and tutorials about ML.net available on the internet.
The vector of 900 features coming from the WordEmbeddingEstimator is the min/max/average of the individual word embeddings in your phrase. Each of the min/max/average vectors is 300-dimensional for the GloVe 300D model, giving 900 in total.
The min/max gives the bounding hyper-rectangle for the words in your phrase. The average gives the standard phrase embedding.
See: https://github.com/dotnet/machinelearning/blob/d1bf42551f0f47b220102f02de6b6c702e90b2e1/src/Microsoft.ML.Transforms/Text/WordEmbeddingsExtractor.cs#L748-L752
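Conceptually, the pooling step looks something like the following sketch (my own illustration of the min/average/max concatenation, not the actual ML.NET source; the ordering of the three 300-d blocks is an assumption):

using System;
using System.Collections.Generic;
using System.Linq;

// Sketch: pool per-word 300-d GloVe vectors into one 900-d phrase vector.
static float[] PoolEmbeddings(IReadOnlyList<float[]> wordVectors, int dim = 300)
{
    var min = Enumerable.Repeat(float.MaxValue, dim).ToArray();
    var max = Enumerable.Repeat(float.MinValue, dim).ToArray();
    var avg = new float[dim];

    foreach (var v in wordVectors)
        for (int d = 0; d < dim; d++)
        {
            min[d] = Math.Min(min[d], v[d]);
            max[d] = Math.Max(max[d], v[d]);
            avg[d] += v[d] / wordVectors.Count;
        }

    // min | average | max -> 3 * 300 = 900 features
    return min.Concat(avg).Concat(max).ToArray();
}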
GloVe is short for Global Vectors (for word representation).
GloVe is an unsupervised learning method (no human labelling of the training set). The vectors associated with each word are generally derived from each word's proximity to others in sentences.
Once you have trained your network (presumably on a much larger data set than a single CV/resume) then you can make interesting comparisons between words based on their absolute and relative "positions" in the vector space. A much less computationally expensive way of developing a network to analyze e.g. documents is to download a pre-trained dataset. I'm sure you've found this page (https://nlp.stanford.edu/projects/glove/) which, among other things, will allow you to access pre-trained word embeddings/vectorizations.
Final thoughts: I'm sorry if all of this is redundant information for you, especially if this really turns out to be an ML.net framework syntax question. I don't know exactly what your goal is, but:
- 900 dimensions seems like an awful lot for looking at CVs. Maybe this is an ML.net default? I suspect that 300-500 will be more than adequate. See what the pre-trained data sets provide.
- If you only intend to train your network from zero on a single CV, then this method is going to be wholly inadequate.
- Your best approach is likely a sort of transfer learning: obtain a liberally licensed network that has been pre-trained on a massive data set in your language of interest (usually easy for academic work), then perform additional training using a smaller, targeted group of training-only CVs to add any specialized words to the 'vocabulary' of your network, and finally run your experimentation and analysis on a set of test CVs which have never been used to train the network.

Neural Network Random Seed Affecting Results

I was playing around with the code from the interesting article on time-series regression by James McCaffrey (download).
This essentially uses machine learning to generate a prediction and forecast of the given airline data.
This is my graph generated using the code and data from his article. As you can see, everything appears to be working as normal.
The problem occurs when I attempt to mess with the random variable. He specifically seeds the System.Random object with 0 as seen here: this.rnd = new System.Random(0); (in the NeuralNetwork constructor). The program only uses the rnd variable when it is assigning the initial weights of the network and when it randomizes the order of data to process. The seed should be independent of the data (i.e. the order processed and random weights assigned should not affect the results).
However, observe what happens when I change only the line this.rnd = new System.Random(0); to this.rnd = new System.Random(1);. Here I've done nothing else except seed the System.Random object with 1 instead of 0. Now look at the results:
It is still able to learn and predict the data; however, the forecast is completely wrong! Why does changing the seed have such a significant effect on the results? In theory it shouldn't matter in which order the data is processed or what the starting weights are, as the point of the network is to adjust the weights until it reaches the solution. Is there something I'm missing?
I may be a little late to the party, but let me contribute.
With any prediction task we need to distinguish between interpolation and extrapolation.
Neural networks are function approximators and have the capacity to fit training data very well. If they are not overfitted, they will perform well on the interpolation task (that is, predicting data points that lie very close to the distributions observed in the training set).
When it comes to extrapolation (prediction outside of the distributions seen in the dataset), their predictions can be more varied and dependent on their initialization. The reason is that there are always weights in a neural network that are not used for prediction within the distribution, and for the other weights there is stochasticity in where you will end up during training. These two factors contribute some randomness to your predictions. The way to think of it is the following: the further you are outside of the observed training distribution, the more these random factors affect your prediction. So in your case, when predicting outside of the observed scope, the random seed starts playing a bigger role.
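As a tiny illustration (my own sketch, not McCaffrey's code), the seed fixes the starting point of training; two seeds give two different initial weight vectors and therefore potentially two different minima and very different extrapolations:

using System;
using System.Linq;

class SeedDemo
{
    // Typical small random initialization in [-0.5, 0.5).
    static double[] InitWeights(int count, int seed)
    {
        var rnd = new Random(seed);
        return Enumerable.Range(0, count)
                         .Select(_ => rnd.NextDouble() - 0.5)
                         .ToArray();
    }

    static void Main()
    {
        // Different seeds => different starting weights => potentially
        // different local minima after training.
        Console.WriteLine(string.Join(", ", InitWeights(4, seed: 0)));
        Console.WriteLine(string.Join(", ", InitWeights(4, seed: 1)));
    }
}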
You can see an example of this in the image. The blue and orange dots represent the train/test data used to train an ensemble of NNs. The green dots are from the same function but have been 'hidden'. Each line represents the predictions of one of the NNs in the ensemble. You can observe how the lines are very close to each other in the regions where they have seen training data and become more varied outside of them. This variance is possibly what you are experiencing on your side, and it is a measure of the uncertainty of the prediction. It is important to note, though, that even with the uncertainty range evaluated, this does not mean the predictions (or the mean of the ensemble's predictions) are close to reality, so this is not a measure of the error or potential error.
[Figure: ensemble NN extrapolation predictions]
Here are two papers for reference:
Xu et al., “How Neural Networks Extrapolate: from Feedforward to Graph Neural Networks“, ICLR’21;
Madras et al., “Detecting Extrapolation with Local Ensembles”, ICLR’20.

What algorithm can I use to find any valid result depending on variable integer inputs?

In my project I face a scenario where I have a function with numerous inputs. At a certain point I am provided with a result and I need to find one combination of inputs that generates that result.
Here is some pseudocode that illustrates the problem:
double y = f(x_0, ..., x_n)
I am provided with y and I need to find any combination of inputs that fits.
I tried several things on paper that could generate something, but each parameter has a range of 6.5 x 10^9 possible values, so I would like an optimal execution time.
Can someone name an algorithm or a topic that will be useful for me, so I can read up on how other people solved similar problems?
I was thinking along the lines of creating a vector from the inputs and judging how well that vector fits the problem. This sounds an awful lot like a NN, but there is no training phase available.
Edit:
Thank you all for the feedback. The comments sum up the problems I have, and I will try something along the lines of hill climbing.
The general case of your problem might be impossible to solve, but for some cases numerical methods can help.
For example, in 1D space, if you can find a number for which f is smaller than y and one for which it is higher than y, you can use the numerical method regula falsi to find the "root" (which is y in your case) by simply invoking the method on f(x) - y.
Another numerical method for finding roots is Newton-Raphson.
I admit I am not familiar with how to apply these methods in a multi-dimensional space, but it could be a starter. I'd search the literature for these if I were you.
Note: using such a method almost always requires some knowledge of the function.
Another possible solution is to take g(X) = |f(X) - y| and use heuristic algorithms to find a minimal value of g. The problem with heuristic methods is that they will get you "close enough" but will seldom get you exactly to the target (unless the function is convex).
Some optimization algorithms are: genetic algorithms, hill climbing, and gradient descent (where you can compute the gradient numerically).
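For a starting point, here is a rough random-restart hill-climbing sketch over g(X) = |f(X) - y|; F is a placeholder for your actual function, and the ranges and step sizes are assumptions you would tune to your parameter domain:

using System;

class HillClimb
{
    // Placeholder for your actual function.
    static double F(double[] x) => x[0] * x[0] + x[1];

    static double G(double[] x, double y) => Math.Abs(F(x) - y);

    static double[] Solve(double y, int n, Random rnd,
                          int restarts = 20, int steps = 100000)
    {
        double[] best = null;
        double bestScore = double.MaxValue;

        for (int r = 0; r < restarts; r++)
        {
            // Random start; adjust the range to your ~6.5e9 parameter domain.
            var x = new double[n];
            for (int i = 0; i < n; i++)
                x[i] = (rnd.NextDouble() - 0.5) * 6.5e9;

            double score = G(x, y);
            double step = 1e8; // mutation size, shrunk over time

            for (int s = 0; s < steps && score > 1e-9; s++)
            {
                int i = rnd.Next(n);
                double old = x[i];
                x[i] += (rnd.NextDouble() - 0.5) * step;

                double candidate = G(x, y);
                if (candidate < score) score = candidate; // keep improving moves
                else x[i] = old;                          // undo worsening moves

                step *= 0.9999;
            }

            if (score < bestScore) { bestScore = score; best = (double[])x.Clone(); }
        }
        return best;
    }
}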

Questions about the Backpropagation Algorithm

I have a few questions concerning backpropagation. I'm trying to learn the fundamentals behind neural network theory and wanted to start small by building a simple XOR classifier. I've read a lot of articles and skimmed multiple textbooks, but I can't seem to teach this thing the pattern for XOR.
Firstly, I am unclear about the learning model for backpropagation. Here is some pseudocode representing how I am trying to train the network. [Let's assume my network is set up properly (i.e. multiple inputs connect to a hidden layer, which connects to an output layer, and all is wired up properly).]
SET guess = getNetworkOutput() // Note: this uses a sigmoid activation function.
SET error = desiredOutput - guess
SET delta = learningConstant * error * sigmoidDerivative(guess)
FOR EACH node n IN inputNodes
    FOR EACH weight j IN inputNodes[n]
        inputNodes[n].weight[j] += delta

// At this point, I am assuming the first layer has been trained.
// Then I recurse a similar function over the hidden layer and output layer,
// the prime difference being that it further divides up the adjustment delta.
I realize this is probably not enough to go on, and I will gladly expand on any part of my implementation. Using the above algorithm, my neural network does get trained, kind of, but not properly. The output is always:
XOR 1 1 [smallest number]
XOR 0 0 [largest number]
XOR 1 0 [medium number]
XOR 0 1 [medium number]
I can never train [1,1] and [0,0] to produce the same value.
If you have any suggestions, additional resources, articles, blogs, etc for me to look at I am very interested in learning more about this topic. Thank you for your assistance, I appreciate it greatly!
OK, first of all: backpropagation, as the name states, works from the back, from the output layer through all hidden layers up to the input layer. The error computed at the last layer is "propagated" back to all previous ones. So let's assume you have a model of the type: input - 1 hidden layer - output. In the first step you compute the error between the desired value and the one you have. Then you do backprop on the weights between hidden and output, and after that you do backprop on the weights between input and hidden. In each step you backpropagate the error from the next layer to the previous layer; simple, but the maths can be confusing ;) Please take a look at this short chapter for a further explanation: http://page.mi.fu-berlin.de/rojas/neural/chapter/K7.pdf
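To make that ordering concrete, here is a minimal single training step for a 2-2-1 sigmoid network (my own sketch, not your pseudocode; biases omitted for brevity): first the forward pass, then the deltas from back to front, then the weight updates, each layer using its own delta:

using System;

class BackpropStep
{
    static double Sig(double x) => 1.0 / (1.0 + Math.Exp(-x));

    static void Main()
    {
        var rnd = new Random(0);
        var wIH = new double[2, 2]; // input -> hidden
        var wHO = new double[2];    // hidden -> output
        for (int i = 0; i < 2; i++)
        {
            wHO[i] = rnd.NextDouble() - 0.5;
            for (int h = 0; h < 2; h++) wIH[i, h] = rnd.NextDouble() - 0.5;
        }
        double lr = 0.5;
        double[] input = { 1, 0 };
        double target = 1; // XOR(1, 0)

        // 1. Forward pass.
        var hidden = new double[2];
        for (int h = 0; h < 2; h++)
            hidden[h] = Sig(input[0] * wIH[0, h] + input[1] * wIH[1, h]);
        double output = Sig(hidden[0] * wHO[0] + hidden[1] * wHO[1]);

        // 2. Deltas, computed from the back towards the front.
        double outDelta = output * (1 - output) * (target - output);
        var hidDelta = new double[2];
        for (int h = 0; h < 2; h++)
            hidDelta[h] = hidden[h] * (1 - hidden[h]) * outDelta * wHO[h];

        // 3. Weight updates, each layer using its own delta.
        for (int h = 0; h < 2; h++)
            wHO[h] += lr * outDelta * hidden[h];
        for (int i = 0; i < 2; i++)
            for (int h = 0; h < 2; h++)
                wIH[i, h] += lr * hidDelta[h] * input[i];

        Console.WriteLine($"output before update: {output}");
    }
}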

How can I solve this three variable equation with C#?

My teacher asked me to create a program to solve something like:
2x + 7y + 2z = 76
6x + 1y + 4z = 26
8x + 2y + 18z = 1
x = ?
y = ?
z = ?
Problem is, this is literally the first two days of class and he expects us to make something like this.
Any help?
Since this is homework, I'll provide guidance, but not a complete answer...
My suggestion: Write this out on paper. How would you approach this on paper? Once you figure out the basic logic required, translating that into C# should be fairly straightforward.
You'll need to assign a variable for each portion of the equation (not just x/y/z, but also the coefficients), and just step through it in code using the same steps you do on paper.
If you know some maths, you can solve systems of equations using a matrix library (or roll your own).
I would suggest that you come up with the algorithm in pseudo-code before you touch any C#.
At least if you have defined the steps that you need to perform, the task simply becomes one of learning the syntax of C# to accomplish the steps.
Looks like you'll need a math textbook ;)
Have a go at solving this yourself on paper, but keep a note of what steps you take and try to work out what "algorithm" you are using.
Once you've worked out your algorithm, have a go at writing some C# that does the same thing.
One more piece of advice that may help: you'll need to store the equation in some data structure and then (repeatedly) run steps that modify that data structure. The question is, which data structure can nicely represent this kind of data? If you focus just on the coefficients (since each row always has the same variables in it), you can write just:
2  7   2  76
6  1   4  26
8  2  18   1
Also, you can assume that all the operations are + because "minus 7y" actually means "plus (-7)y". This looks like a 2D array, so in C# you can start by representing the equations as int[,]. Once you load the data into this structure, you'll just need to write a method that performs (the general version of) the operations you did on paper.
Once you get the coefficients represented by a matrix (2 dimensional array), try googling "RREF" (Reduced Row Echelon Form). This is the matrix operation you will want to implement in your program in order to solve the system of equations. Good luck.
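As a sketch of that idea (my own illustration using Gauss-Jordan elimination, with doubles since the solution need not be integral; it skips the row swaps and singular-system handling that robust code would need):

using System;

class Rref
{
    static void Main()
    {
        // Augmented matrix [A | b] from the question.
        double[,] m =
        {
            { 2, 7,  2, 76 },
            { 6, 1,  4, 26 },
            { 8, 2, 18,  1 },
        };
        int rows = 3, cols = 4;

        for (int pivot = 0; pivot < rows; pivot++)
        {
            // Normalize the pivot row (assumes a non-zero pivot).
            double p = m[pivot, pivot];
            for (int c = 0; c < cols; c++) m[pivot, c] /= p;

            // Eliminate the pivot column from every other row.
            for (int r = 0; r < rows; r++)
            {
                if (r == pivot) continue;
                double factor = m[r, pivot];
                for (int c = 0; c < cols; c++) m[r, c] -= factor * m[pivot, c];
            }
        }

        // After reduction to RREF, the last column holds the solution.
        Console.WriteLine($"x = {m[0, 3]}, y = {m[1, 3]}, z = {m[2, 3]}");
    }
}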
