Neural Network Random Seed Affecting Results - C#

I was playing around with the code from the interesting article on time-series regression by James McCaffrey (download).
This essentially uses machine learning to generate a prediction and forecast of the given airline data.
This is my graph generated using the code and data from his article. As you can see, everything appears to be working as normal.
The problem occurs when I attempt to mess with the random variable. He specifically seeds the System.Random object with 0 as seen here: this.rnd = new System.Random(0); (in the NeuralNetwork constructor). The program only uses the rnd variable when it is assigning the initial weights of the network and when it randomizes the order of data to process. The seed should be independent of the data (i.e. the order processed and random weights assigned should not affect the results).
However, observe what happens when I change only the line this.rnd = new System.Random(0); to this.rnd = new System.Random(1);. Here I've done nothing else except seed the System.Random object with 1 instead of 0. Now look at the results:
It is still able to learn and predict the data; however, the forecast is completely wrong! Why does changing the seed have such a significant effect on the results? In theory it shouldn't matter which order the data is processed in or what the starting weights are, as that's the point of the network: to adjust the weights and biases until it reaches a solution. Is there something I'm missing?

I may be a little late to the party, but let me contribute.
With any prediction task we need to distinguish between interpolation and extrapolation.
Neural Networks are function approximators and have the capacity to fit training data very well. If they are not overfitted then they will perform well in the interpolation task (that is predicting on data points that are very close to the observed distributions in the training set).
When it comes to extrapolation (prediction outside of the seen distributions of the dataset), their predictions can be more varied and dependent on their initialization. The reason is that there are always weights in a neural network that are not used for prediction within the distribution, and for the other weights there is stochasticity in where you'll end up during training. These two factors contribute some randomness to your predictions. The way you can think of it is the following: the more you're outside of the observed training distribution, the more these random factors affect your prediction. So in your case, when predicting outside of the observed scope, the random seed starts playing a bigger role.
You can see an example of this in the image. Blue and orange dots represent the train/test data used to train an ensemble of NNs. The green dots are from the same function but have been 'hidden'. Each line represents the predictions of one of the NNs in this ensemble. You can observe how the lines are very close to each other in the regions where they have seen training data and become more varied outside of them. This variance is probably what you're experiencing on your side and is a measure of the uncertainty of the prediction. It's important to note, though, that even with the uncertainty range evaluated, this does not mean the predictions (or the mean of the ensemble's predictions) are close to reality, so it is not a measure of the error or potential error.
[Image: ensemble NN extrapolation predictions]
Here are two papers for reference:
Xu et al., “How Neural Networks Extrapolate: from Feedforward to Graph Neural Networks“, ICLR’21;
Madras et al., “Detecting Extrapolation with Local Ensembles”, ICLR’20.

Related

How to use GloVe word embedding model in ML.net

I'm new to Machine Learning and working on my master's thesis using ML.net. I'm trying to use the GloVe model to vectorise a CV text, but finding it hard to wrap my head around the process. I have the pipeline set up as below:
var pipeline = context.Transforms.Text.NormalizeText("Text", null,
        keepDiacritics: false, keepNumbers: false, keepPunctuations: false)
    .Append(context.Transforms.Text.TokenizeIntoWords("Tokens", "Text"))
    .Append(context.Transforms.Text.RemoveDefaultStopWords("WordsWithoutStopWords", "Tokens",
        Microsoft.ML.Transforms.Text.StopWordsRemovingEstimator.Language.English))
    .Append(context.Transforms.Text.ApplyWordEmbedding("Features", "WordsWithoutStopWords",
        Microsoft.ML.Transforms.Text.WordEmbeddingEstimator.PretrainedModelKind.GloVe300D));

var embeddingTransformer = pipeline.Fit(emptyData);
var predictionEngine = context.Model.CreatePredictionEngine<Input, Output>(embeddingTransformer);

var data = new Input { Text = TextExtractor.Extract("/attachments/CV6.docx") };
var prediction = predictionEngine.Predict(data);

Console.WriteLine($"Number of features: {prediction.Features.Length}");
Console.WriteLine("Features: ");
foreach (var feature in prediction.Features)
{
    Console.Write($"{feature} ");
}
Console.WriteLine(Environment.NewLine);
From what I've studied about vectorization, each word in the document should be converted into a vector, but when I print the features, I can see 900 features getting printed. Can someone explain how this works? There are very few examples and tutorials about ML.net available on the internet.
The vector of 900 features coming from the WordEmbeddingEstimator is the per-dimension min/max/average of the individual word embeddings in your phrase. Each of the min/max/average vectors is 300-dimensional for the GloVe 300D model, giving 900 in total.
The min/max gives the bounding hyper-rectangle for the words in your phrase. The average gives the standard phrase embedding.
See: https://github.com/dotnet/machinelearning/blob/d1bf42551f0f47b220102f02de6b6c702e90b2e1/src/Microsoft.ML.Transforms/Text/WordEmbeddingsExtractor.cs#L748-L752
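To make the pooling concrete, here is a minimal sketch in plain C# of how a per-dimension min/average/max over the word vectors yields a 3 × 300 = 900 feature phrase vector. This only illustrates the idea, not the ML.NET internals; the PoolPhrase name and the input list are hypothetical, and the ordering of the three blocks may differ from what the linked source actually emits.

using System;
using System.Collections.Generic;
using System.Linq;

static class PhrasePooling
{
    // wordVectors stands in for the 300-d GloVe vectors of the words in a phrase.
    public static float[] PoolPhrase(IReadOnlyList<float[]> wordVectors)
    {
        int d = wordVectors[0].Length;            // e.g. 300 for GloVe300D
        var min = Enumerable.Repeat(float.MaxValue, d).ToArray();
        var max = Enumerable.Repeat(float.MinValue, d).ToArray();
        var avg = new float[d];

        foreach (var v in wordVectors)
        {
            for (int j = 0; j < d; j++)
            {
                min[j] = Math.Min(min[j], v[j]);
                max[j] = Math.Max(max[j], v[j]);
                avg[j] += v[j] / wordVectors.Count;
            }
        }

        // 3 * d features in total: 900 for d = 300.
        return min.Concat(avg).Concat(max).ToArray();
    }
}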
GloVe is short for Global Vectors (as in Global Vectors for Word Representation).
GloVe is an unsupervised (no human labeling of the training set) learning method. The vectors associated with each word are generally derived from each word's proximity to others in sentences.
Once you have trained your network (presumably on a much larger data set than a single CV/resume) then you can make interesting comparisons between words based on their absolute and relative "positions" in the vector space. A much less computationally expensive way of developing a network to analyze e.g. documents is to download a pre-trained dataset. I'm sure you've found this page (https://nlp.stanford.edu/projects/glove/) which, among other things, will allow you to access pre-trained word embeddings/vectorizations.
Final thoughts: I'm sorry if all of this is redundant information for you, especially if this really turns out to be an ML.net framework syntax question. I don't know exactly what your goal is, but:
900 dimensions seems like an awful lot for looking at CVs. Maybe this is an ML.net default? I suspect that 300-500 will be more than adequate. See what the pre-trained data sets provide.
If you only intend to train your network from zero on a single CV, then this method is going to be wholly inadequate.
Your best approach is likely to be a sort of transfer learning approach where you obtain a liberally licensed network that has been pre-trained on a massive data set in your language of interest (usually easy for academic work). Then perform additional training using a smaller, targeted group of training-only CV's to add any specialized words to the 'vocabulary' of your network. Then perform your experimentation and analysis on a set of test CV's, which have never been used to train the network.

What is the formula to calculate a QR Code's maximum data?

I've Googled and read quite a bit on QR codes and the maximum data that can be stored given the various settings, all of it in tabular format. I can't seem to find anything giving a formula or a proper explanation of how these values are calculated.
What I would like to do is this:
Present the user with a form, allowing them to choose Format, EC & Version.
Then they can type in some data and generate a QR code.
Done deal. That part is easy.
The addition I would like to include is a "remaining character count" so that they (the user) can see how much more data they can type in, as well as what effect the properties have on the storage capacity of the QR code.
Does anyone know where I can find the formula(s)? Or do I need to purchase ISO 18004:2006?
A formula to calculate the amount of data you could put in a QR code would be quite complex to derive, not to mention it would need some approximations for the calculation to be possible. The formula would have to calculate the number of modules dedicated to the data in your QR code based on its version, and then calculate how many codewords (which are sets of 8 modules) will be used for the error correction.
To calculate the number of modules that will be used for the data, you need to know how many modules will be used for the function patterns. While this is not a problem for the three finder patterns, the timing patterns or the version/format information, there is a problem with the alignment patterns, as their number depends on the QR code's version, meaning you would have to use a table at that point anyway.
For the second part, I have to say I don't know how to calculate the number of error-correcting codewords based on the correction capacity. For some reason, more error-correcting codewords are used than would be needed to match the error correction capacity; for example a 6-H QR code can correct up to 32.6% of the data, instead of the 30% set by the H correction level.
In any case, as you can see, a formula would be quite complex to implement. Using a table, as already suggested, is probably the best thing you can do.
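To illustrate the table-driven approach for the "remaining character count" feature, here is a minimal C# sketch. The few capacities shown are the commonly published values for version 1 (e.g. 41 numeric characters at level L); the names and structure are illustrative, and the full table should be checked against ISO/IEC 18004 or your QR library.

using System.Collections.Generic;

enum EcLevel { L, M, Q, H }
enum Mode { Numeric, Alphanumeric, Byte }

static class QrCapacity
{
    // Character-capacity lookup keyed by (version, error-correction level, mode).
    // Only a few illustrative entries are shown; a full table covers versions 1-40.
    static readonly Dictionary<(int Version, EcLevel Ec, Mode Mode), int> Capacity =
        new Dictionary<(int, EcLevel, Mode), int>
        {
            { (1, EcLevel.L, Mode.Numeric),      41 },
            { (1, EcLevel.L, Mode.Alphanumeric), 25 },
            { (1, EcLevel.L, Mode.Byte),         17 },
            { (1, EcLevel.H, Mode.Numeric),      17 },
            { (1, EcLevel.H, Mode.Alphanumeric), 10 },
            { (1, EcLevel.H, Mode.Byte),          7 },
            // ... remaining versions/levels filled in from the published tables
        };

    public static int RemainingCharacters(int version, EcLevel ec, Mode mode, int typedSoFar)
        => Capacity[(version, ec, mode)] - typedSoFar;
}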
I wrote the original AIM specification for QR Code back in the '90s for Denso Corporation, and was also project editor for both editions of the ISO/IEC 18004 standard. It was felt to be much easier for people producing code printing software to use a look-up table rather than calculate capacities from a formula - no easy job as there are several independent variables that have to be taken into account iteratively when parsing the text to be encoded to minimise its length in bits, in order to achieve the smallest symbol. The most crucial factor is the mix of characters in the data, the sequence and lengths of sub-strings of numeric, alphanumeric, Kanji data, with the overhead needed to signal each change of character set, then the required level of error correction. I did produce a guidance section for this which is contained in the ISO standard.
The storage is determined by the QR mode and the version/type that you are using. More specifically, the calculation is based on how 'compressible' the characters are and which algorithm the QR generator is allowed to use on the content present.
More information can be found at http://en.wikipedia.org/wiki/QR_code#Storage

Neural network back-propagation, error in training

After reading some articles about neural networks (back-propagation), I tried to write a simple neural network myself.
I decided on an XOR neural network.
My problem is when I am trying to train the network:
if I use only one example to train the network, let's say 1,1,0 (as input1, input2, targetOutput),
after about 500 training passes the network answers 0.05.
But if I try more than one example (let's say 2 different ones, or all 4 possibilities), the network tends towards 0.5 as output :(
I searched Google for my mistakes with no results :S
I'll try to give as many details as I can to help find what's wrong:
-I've tried networks of 2,2,1 and 2,4,1 (input layer, hidden layer, output layer).
-the output for every neuron is defined by:
double input = 0.0;
for (int n = 0; n < layers[i].Count; n++)
input += layers[i][n].Output * weights[n];
where 'i' is the current layer and the weights are all the weights coming from the previous layer.
-the last layer's (output layer's) error is defined by:
value*(1-value)*(targetvalue-value);
where 'value' is the neuron's output and 'targetvalue' is the target output for the current neuron.
-the error for the other neurons is defined by:
foreach neural in the nextlayer
sum+=neural.value*currentneural.weights[neural];
-all the weights in the network are adapted by this formula (for the weight from neuron -> neuron2):
weight+=LearnRate*neural.myvalue*neural2.error;
where LearnRate is the network's learning rate (0.25 in my network).
-the bias weight for each neuron is defined by:
bias+=LearnRate*neural.myerror*neural.Bias;
where Bias is a constant value of 1.
That's pretty much all I can detail.
As I said, the output tends towards 0.5 with different training examples :(
Thank you very very much for your help ^_^.
It is difficult to tell where the error is without seeing the complete code. One thing you should carefully check is that your calculation of the local error gradient for each unit matches the activation function you are using on that layer. Have a look here for the general formula: http://www.learnartificialneuralnetworks.com/backpropagation.html .
For instance, the calculation you do for the output layer assumes that you are using a logistic sigmoid activation function but you don't mention that in the code above so it looks like you are using a linear activation function instead.
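For reference, here is a minimal sketch of the delta calculations that go with a logistic sigmoid activation. This is the standard textbook formulation with illustrative names, not the poster's exact code:

using System;

static class SigmoidBackprop
{
    // Assumed activation function for every neuron.
    static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

    // Output-layer delta: derivative of the sigmoid times the output error.
    public static double OutputDelta(double output, double target)
        => output * (1 - output) * (target - output);

    // Hidden-layer delta: derivative of the sigmoid times the weighted sum of
    // the deltas (not the raw outputs) of the next layer.
    public static double HiddenDelta(double output, double[] nextDeltas, double[] weightsToNext)
    {
        double sum = 0.0;
        for (int k = 0; k < nextDeltas.Length; k++)
            sum += nextDeltas[k] * weightsToNext[k];
        return output * (1 - output) * sum;
    }

    // Weight update for the connection from neuron i to neuron j:
    //   w_ij += learnRate * output_i * delta_j
}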
In principle a 2-2-1 network should be enough to learn XOR, although the training will sometimes get trapped in a local minimum without being able to converge to the correct state. So it is important not to draw conclusions about the performance of your algorithm from a single training session. Note that simple backprop is bound to be slow; there are faster and more robust solutions like Rprop, for instance.
There are books on the subject which provide detailed step-by-step calculations for a simple network (e.g. 'A.I.: A guide to intelligent systems' by Negnevitsky); this could help you debug your algorithm. An alternative would be to use an existing framework (e.g. Encog, FANN, Matlab), set up the exact same topology and initial weights, and compare the calculation with your own implementation.

How is a random number generated at runtime?

Since computers cannot pick random numbers (can they?), how is this random number actually generated? For example, in C# we say:
Random.Next()
What happens inside?
You may check out this article. According to the documentation, the specific implementation used in .NET is based on Donald E. Knuth's subtractive random number generator algorithm. For more information, see D. E. Knuth, "The Art of Computer Programming, volume 2: Seminumerical Algorithms", Addison-Wesley, Reading, MA, second edition, 1981.
Since computers cannot pick random numbers (can they?)
As others have noted, "Random" is actually pseudo-random. To answer your parenthetical question: yes, computers can pick truly random numbers. Doing so is much more expensive than the simple integer arithmetic of a pseudo-random number generator, and usually not required. However there are applications where you must have non-predictable true randomness: cryptography and online poker immediately come to mind. If either use a predictable source of pseudo-randomness then attackers can decrypt/forge messages much more easily, and cheaters can figure out who has what in their hands.
The .NET crypto classes have methods that give random numbers suitable for cryptography or games where money is on the line. As for how they work: the literature on crypto-strength randomness is extensive; consult any good university undergrad textbook on cryptography for details.
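As a concrete illustration, here is a minimal sketch of pulling cryptographically strong bytes out of the .NET crypto classes (RNGCryptoServiceProvider from System.Security.Cryptography; newer frameworks expose the same idea through RandomNumberGenerator):

using System;
using System.Security.Cryptography;

class CryptoRandomDemo
{
    static void Main()
    {
        var bytes = new byte[8];
        using (var rng = new RNGCryptoServiceProvider())
        {
            rng.GetBytes(bytes);   // fill the buffer with cryptographically strong random bytes
        }
        Console.WriteLine(BitConverter.ToString(bytes));
    }
}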
Specialty hardware also exists to get random bits. If you need random numbers that are drawn from atmospheric noise, see www.random.org.
Knuth covers the topic of randomness very well.
We don't really understand random well. How can something predictable be random? And yet pseudo-random sequences can appear to be perfectly random by statistical tests.
There are three categories of Random generators, amplifying on the comment above.
First, you have pseudo random number generators where if you know the current random number, it's easy to compute the next one. This makes it easy to reverse engineer other numbers if you find out a few.
Then, there are cryptographic algorithms that make this much harder. I believe they still are pseudo random sequences (contrary to what the comment above implies), but with the very important property that knowing a few numbers in the sequence does NOT make it obvious how to compute the rest. The way it works is that crypto routines tend to hash up the number, so that if one bit changes, every bit is equally likely to change as a result.
Consider a simple modulo generator (similar to some implementations in C rand() )
int rand() {
    // linear congruential step; integer overflow acts as an implicit mod 2^32
    return seed = seed * m + a;
}
if m=0 and a=0, this is a lousy generator with period 1: 0, 0, 0, 0, ....
if m=1 and a=1, it's also not very random looking: 0, 1, 2, 3, 4, 5, 6, ...
But if you pick m and a to be prime numbers around 2^16, this will jump around nicely, looking very random if you inspect it casually. But because both numbers are odd, you would see that the low bit toggles, i.e. the numbers alternate between odd and even. Not a great random number generator. And since there are only 2^32 values in a 32-bit number, by definition after at most 2^32 iterations you will repeat the sequence, making it obvious that the generator is NOT random.
If you think of the middle bits as nice and scrambled, while the lower ones aren't as random, then you can construct a better random number generator out of a few of these, with the various bits XORed together so that all the bits are covered well. Something like:
(rand1() >> 8) ^ rand2() ^ (rand3() >> 5) ...
Still, every number is flipping in sync, which makes this predictable. And if you get two sequential values they are correlated, so that if you plot them you will get lines on your screen. Now imagine you have rules combining the generators, so that sequential values are not the next ones.
For example
v1 = rand1() >> 8 ^ rand2() ...
v2 = rand2() >> 8 ^ rand5() ..
and imagine that the seeds don't always advance. Now you're starting to make something that's much harder to predict based on reverse engineering, and the sequence is longer.
For example, if you compute rand1() every time, but only advance the seed in rand2() every 3rd time, a generator combining them might not repeat for far longer than the period of either one.
Now imagine that you pump your (fairly predictable) modulo-type random number generator through DES or some other encryption algorithm. That will scramble up the bits.
Obviously, there are better algorithms, but this gives you an idea. Numerical Recipes has a lot of algorithms implemented in code and explained. One very good trick: generate not one but a block of random values in a table. Then use an independent random number generator to pick one of the generated numbers, generate a new one and replace it. This breaks up any correlation between adjacent pairs of numbers.
The third category is actual hardware-based random number generators, for example based on atmospheric noise
http://www.random.org/randomness/
This is, according to current science, truly random. Perhaps someday we will discover that it obeys some underlying rule, but currently, we cannot predict these values, and they are "truly" random as far as we are concerned.
The boost library has excellent C++ implementations of lagged Fibonacci generators, the reigning kings of pseudo-random sequences, if you want to see some source code.
I'll just add an answer to the first part of the question (the "can they?" part).
Computers can generate (well, generate may not be an entirely accurate word) random numbers (as in, not pseudo-random), specifically by using environmental randomness, which is obtained through specialized hardware devices (that generate randomness based on noise, for example) or by using environmental inputs (e.g. hard disk timings, user input event timings).
However, that has no bearing on the second question (which was how Random.Next() works).
The Random class is a pseudo-random number generator.
It is basically an extremely long but deterministic repeating sequence. The "randomness" comes from starting at different positions. Specifying where to start is done by choosing a seed for the random number generator and can for example be done by using the system time or by getting a random seed from another random source. The default Random constructor uses the system time as a seed.
The actual algorithm used to generate the sequence of numbers is documented in MSDN:
The current implementation of the Random class is based on Donald E. Knuth's subtractive random number generator algorithm. For more information, see D. E. Knuth. "The Art of Computer Programming, volume 2: Seminumerical Algorithms". Addison-Wesley, Reading, MA, second edition, 1981.
Computers use pseudorandom number generators. Essentially, they work by starting with a seed number and iterating it through an algorithm each time a new pseudorandom number is required.
The process is of course entirely deterministic, so a given seed will generate exactly the same sequence of numbers every time it is used, but the numbers generated form a statistically uniform distribution (approximately), and this is fine, since in most scenarios all you need is stochastic randomness.
The usual practice is to use the current system time as a seed, though if more security is required, "entropy" may be gathered from a physical source such as disk latency in order to generate a seed that is more difficult to predict. In this case, you'd also want to use a cryptographically strong random number generator such as this.
I don't know many details, but what I know is that a seed is used to generate the random numbers; a new number is then obtained from an algorithm based on that seed.
If you generate random numbers from the same seed, you will get the same sequence.
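A quick sketch of that determinism; the seed value 42 is just an arbitrary example:

using System;

class SeedDemo
{
    static void Main()
    {
        // Two Random instances constructed with the same seed yield the same sequence.
        var a = new Random(42);
        var b = new Random(42);
        for (int i = 0; i < 5; i++)
            Console.WriteLine($"{a.Next(100)}  {b.Next(100)}");   // the two columns are always equal
    }
}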

How can I generate truly (not pseudo) random numbers with C#?

I know that the Random class can generate pseudo-random numbers but is there a way to generate truly random numbers?
The answer here has two main sides to it. There are some quite important subtleties to which you should pay due attention...
The Easy Way (for simplicity & practicality)
The RNGCryptoServiceProvider, which is part of the Crypto API in the BCL, should do the job for you. It's still technically a pseudo-random number generator, but the quality of "randomness" is much higher - suitable for cryptographic purposes, as the name might suggest.
There are other cryptographic APIs with high-quality pseudo-random generators available too. Algorithms such as the Mersenne Twister are quite popular.
Comparing this to the Random class in the BCL, it is significantly better. If you plot the numbers generated by Random on a graph, for example, you should be able to recognise patterns, which is a strong sign of weakness. This is largely due to the fact that the algorithm simply uses a seeded lookup table of fixed size.
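If you need the crypto-strength output as an integer in a range (the way Random.Next(min, max) gives you one), a common sketch is to convert the random bytes and reduce them. Note that the naive modulo used here introduces a small bias whenever the range does not divide 2^32 evenly; it is only meant to illustrate the idea:

using System;
using System.Security.Cryptography;

static class CryptoRange
{
    // Returns a value in [0, maxExclusive) derived from cryptographically strong bytes.
    public static int Next(int maxExclusive)
    {
        var buffer = new byte[4];
        using (var rng = new RNGCryptoServiceProvider())
        {
            rng.GetBytes(buffer);
        }
        uint value = BitConverter.ToUInt32(buffer, 0);
        return (int)(value % (uint)maxExclusive);   // slight modulo bias, see note above
    }
}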
The Hard Way (for high quality theoretical randomness)
To generate truly random numbers, you need to make use of some natural phenomenon, such as nuclear decay or microscopic temperature fluctuations (CPU temperature is a comparatively convenient source), to name a few. This however is much more difficult and requires additional hardware, of course. I suspect the practical solution (RNGCryptoServiceProvider or such) should do the job perfectly well for you.
Now, note that if you really do require truly random numbers, you could use a service such as Random.org, which generates numbers with very high randomness/entropy (based on atmospheric noise). Data is freely available for download. This may nonetheless be unnecessarily complicated for your situation, although it certainly gives you data suitable for scientific study and whatnot.
The choice is yours in the end, but at least you should now be able to make an informed decision, being aware of the various types and levels of RNGs.
short answer: It is not directly possible to generate TRULY RANDOM NUMBERS using only C# (i.e. using only a purely mathematical construction).
long(er) answer: Only by means of employing an external device capable of generating "randomness" such as a white noise generator or similar - and capturing the output of that device as a seed for a pseudo random number generator (PRG). That part could be accomplished using C#.
True random numbers can only be generated if there is a truly random physical input device that provides the seed for the random function.
Whether anything physical and truly random exists is still debated (and likely will be for a long time) by the science community.
Pseudo-random number generators are the next best thing, and the best are very difficult to predict.
As John von Neumann joked, "Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin."
The thread is old and answered, but I thought I'd proceed anyway. It's for completeness, and people should know some things about Random in C#.
As for truly random, the best you can ever hope to do is use a "secure Pseudo Random Generator" like salsa20 or RC4 (sort of, sometimes). They pass a barrage of tests where "efficient" adversaries try to distinguish them from random. This comes with certain costs and is probably unnecessary for most uses.
The Random class in C# is pretty good most of the time; it has a statistical distribution that looks random. However, the default seed for new Random() is the system time, so if you create lots of Random instances at the "same time" they get the same seed and will produce the same values ("random" is completely deterministic, don't let it fool you). Similar system-time seeds may also produce similar numbers because of the Random class's shortcomings.
The way to deal with this is to set your own seeds, like:
Random random = new Random((int)DateTime.Now.Ticks & (0x0000FFFF + x));
where x is some value you increment if you've created a loop to get a bunch of random numbers, say.
Also, the methods on your new Random variable, like NextDouble(), can be helpful in manipulating the random numbers, in this case crow-baring them into the interval (0,1) to become unif(0,1), which happens to be a distribution you can plug into statistical formulas to create all the distributions in statistics.
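For example, a minimal sketch of that trick: feed the unif(0,1) output of NextDouble() through an inverse CDF to get another distribution. The exponential distribution and the rate parameter lambda are illustrative choices:

using System;

class InverseTransformDemo
{
    static void Main()
    {
        var random = new Random((int)DateTime.Now.Ticks & 0x0000FFFF);

        // Inverse CDF of the exponential distribution: x = -ln(1 - u) / lambda.
        double lambda = 2.0;
        double u = random.NextDouble();            // uniform on [0, 1)
        double exponentialSample = -Math.Log(1 - u) / lambda;

        Console.WriteLine(exponentialSample);
    }
}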
Take a look at using an algorithm like Yarrow or Fortuna with entropy accumulation. The point with these algorithms is that they keep track of entropy as a measure of theoretical information content available for predicting future numbers by knowing the past numbers and the algorithms used to produce them; and they use cryptographic techniques to fold new entropy sources into the number generator.
You'll still need an external source of random data (e.g. hardware source of random numbers), whether that's time of keystrokes, or mouse movement, or hard disk access times, or CPU temperature, or webcam data, or stock prices, or whatever -- but in any case, you keep mixing this information into the entropy pools, so that even if the truly random data is slow or low quality, it's enough to keep things going in an unpredictable fashion.
I was debating building a random number generator based off twitter or one of the other social networking sites. Basically use the api to pull recent posts and then use that to seed a high quality pseudo random number generator. It probably isn't any more effective than randomizing off the timer but seemed like fun. Besides it seems like the best use for most of the stuff people post to twitter.
I always liked this idea, for the retro 60s look:
Lavarand
There is no "true" random in computers, everything is based on something else. For some (feasible) ways to generate pseudorandom data, try something such as a pool of the HD temp, CPU temp, network usage (packets/second) and possibly hits/second to the webserver.
Just to clarify: everyone saying that there is no true RNG available in C# or on your computer is mistaken. A multi-core processor is inherently a true RNG. Very simply, by taking advantage of processor spin you can generate bools that have no discernible pattern. From there you can generate whatever number range you want by using the bools as bits and constructing the number by adding the bits together.
Yes, this is orders of magnitude slower than a purely mathematical solution, but a purely mathematical solution will always have a pattern.
// requires using System.Threading; and using System.Threading.Tasks;
public static bool GenerateBoolean()
{
    var gen1 = 0;
    var gen2 = 0;
    // Race two counters on different cores; how far each gets depends on scheduler/CPU timing.
    Task.Run(() =>
    {
        while (gen1 < 1 || gen2 < 1)
            Interlocked.Increment(ref gen1);
    });
    while (gen1 < 1 || gen2 < 1)
        Interlocked.Increment(ref gen2);
    // The parity of the combined counts reflects the outcome of the race.
    return (gen1 + gen2) % 2 == 0;
}
There is no way to generate truly random numbers with a computer. True randomness requires an external source that monitors some natural phenomenon.
That said, if you don't have access to such a source of truly random numbers you could use a "poor man's" process like this:
Create a long array (10000 or more items?) of numbers
Populate the array with current time-seeded random numbers the standard way
When a random number is required, generate a random index into the array and return the number contained at that position
Create a new, current time-seeded random number at the array index to replace the number used
This two-step process should improve the randomness of your results somewhat without the need for external input.
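Here is a minimal C# sketch of that table-based process (the table size and the time-based seeding are illustrative choices; the description would also allow an independent generator for picking the index):

using System;

class ShuffledRandom
{
    private readonly Random inner = new Random();   // time-seeded by default
    private readonly int[] table;

    public ShuffledRandom(int tableSize = 10000)
    {
        // Steps 1-2: pre-fill a long table with ordinary pseudo-random numbers.
        table = new int[tableSize];
        for (int i = 0; i < table.Length; i++)
            table[i] = inner.Next();
    }

    public int Next()
    {
        // Step 3: pick a random slot and return the number stored there.
        int index = inner.Next(table.Length);
        int result = table[index];

        // Step 4: refill the used slot so the table keeps changing.
        table[index] = inner.Next();
        return result;
    }
}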
Here's a sample library that implements the above-described algorithm in C++: http://www.boost.org/doc/libs/1_39_0/libs/random/random-generators.html
This code will return a random number between min (inclusive) and max (exclusive):
private static readonly Random random = new Random();
private static readonly object syncLock = new object();
public int RandomNumber(int min, int max)
{
lock (syncLock)
{ // synchronize
return random.Next(min, max);
}
}
Usage:
int randomNumber = RandomNumber(0, 10); // a random number between 0 and 9 (the upper bound is exclusive)
