Why are the random numbers not properly scattered? - c#

My task is to implement a random number generator using LCG algorithm.
The task is to generate 1000 axes (x,y) between [-1, 1] and print them on a pane.
If the point is inside the circle of radius 1.0, it will be printed as Red.
Otherwise, Blue.
I used the parameters suggested by Numerical Recipes suggested in this YouTube video. I am following the coding style used in this link.
I am using ZedGraph to show my plots.
Why are the random numbers not properly scattered on the pane?
And, where are the blue points?
Random number generator class:
class MyRandom
{
long m = 4294967296;// modulus
long a = 1664525; // multiplier
long c = 1013904223; // increment
public long nextRandomInt(long seed)
{
return (((a * seed + c) % m));
}
private double nextRandomDouble(long seed)
{
return (2 * (nextRandomInt(seed) / m)) - 1;
}
public double nextRandomDouble(double seed)
{
double new_seed = seed + 1.0;
new_seed = new_seed / 2.0;
new_seed = new_seed * m;
long long_seed = Convert.ToInt64(new_seed);
double new_s = nextRandomInt(long_seed);
new_s = new_s / m;
new_s = new_s * 2;
new_s = new_s - 1;
return new_s;
}
}
Output
Additional Source Code:
Driver Program:
class Program
{
static void Main(string[] args)
{
int N = 1000;
double radius = 1.0;
List<double> rx = new List<double>(); rx.Add(0.0);
List<double> ry = new List<double>(); ry.Add(1.0);
MyRandom r = new MyRandom();
for (int i = 0; i < N; i++)
{
double x = r.nextRandomDouble(rx[rx.Count - 1]);
double y = r.nextRandomDouble(ry[ry.Count - 1]);
rx.Add(x);
ry.Add(y);
}
PlotForm form = new PlotForm();
ZedGraphControl zgControl = form.ZedGrapgControl;
//// get a reference to the GraphPane
GraphPane gPane = zgControl.GraphPane;
gPane.Title.Text = "Random Numbers";
gPane.XAxis.Type = AxisType.Linear;
PointPairList insideCircleList = new PointPairList();
PointPairList outsideCircleList = new PointPairList();
for (int i = 0; i < N; i++)
{
double x = rx[i];
double y = ry[i];
if ((x * x + y * y) < radius)
{
insideCircleList.Add(x, y);
}
else
{
outsideCircleList.Add(x, y);
}
}
LineItem redCurve = gPane.AddCurve("Inside", insideCircleList, Color.Red, SymbolType.Circle);
redCurve.Line.IsVisible = false;
redCurve.Symbol.Fill.Type = FillType.Solid;
LineItem blueCurve = gPane.AddCurve("Outside", outsideCircleList, Color.Blue, SymbolType.Circle);
blueCurve.Line.IsVisible = false;
zgControl.AxisChange();
form.ShowDialog();
Console.ReadLine();
}
}
WinForms Code:
public partial class PlotForm : Form
{
public ZedGraph.ZedGraphControl ZedGrapgControl { get; set; }
public PlotForm()
{
InitializeComponent();
ZedGrapgControl = this.zgc;
}
}

seed should not be a parameter of the random function, but a field within a random class. You set it once, and it changes with every random call.
But to answer your question, you don't save new_seed anywhere. It has to be saved so that it can be used in the next random call. So, your seed is just incremented by one in every new call, and this makes the graph a straight line.

Try using regular C# Random class (https://learn.microsoft.com/en-us/dotnet/api/system.random?view=netframework-4.8), it doesn't seem worth it to "roll your own" in this case.
I think this will be good enough for your purpose.
The current implementation of the Random class is based on a modified
version of Donald E. Knuth's subtractive random number generator
algorithm.
https://learn.microsoft.com/en-us/dotnet/api/system.random?view=netframework-4.8#remarks
If that doesn't meet your requirements you can look into:
https://learn.microsoft.com/en-us/dotnet/api/system.security.cryptography.rngcryptoserviceprovider?view=netframework-4.8

Related

Rotate 2D points using System.Numerics.Vectors

I'm looking to optimize a program that is basing a lot of its calculations on the rotation of a lot of 2D Points. I've search around to see if it's possible to do these calculations using SIMD in C#.
I found a c++ answer here that seems to do what I want, but I can't seem to translate this into C# using the System.Numerics.Vectors package.
Optimising 2D rotation
Can anyone point me in the right direction for how this can be done?
The below code shows the regular method without SIMD. Where Point is a struct with doubles X and Y.
public static Point[] RotatePoints(Point[] points, double cosAngle, double sinAngle)
{
var pointsLength = points.Length;
var results = new Point[pointsLength];
for (var i = 0; i < pointsLength; i++)
{
results[i].X = (points[i].X * cosAngle) - (points[i].Y * sinAngle);
results[i].Y = (points[i].X * sinAngle) + (points[i].Y * cosAngle);
}
return results;
}
Edit:
I've managed to get an implementation working using two Vector< float> but from benchmarking this, this seems to be a lot slower than the previous implementation.
private static void RotatePoints(float[] x, float[] y, float cosAngle, float sinAngle)
{
var chunkSize = Vector<float>.Count;
var resultX = new float[x.Length];
var resultY = new float[x.Length];
Vector<float> vectorChunk1;
Vector<float> vectorChunk2;
for (var i = 0; i < x.Length; i += chunkSize)
{
vectorChunk1 = new Vector<float>(x, i);
vectorChunk2 = new Vector<float>(y, i);
Vector.Subtract(Vector.Multiply(vectorChunk1, cosAngle), Vector.Multiply(vectorChunk2, sinAngle)).CopyTo(resultX, i);
Vector.Add(Vector.Multiply(vectorChunk1, sinAngle), Vector.Multiply(vectorChunk2, cosAngle)).CopyTo(resultY, i);
}
}
The code added in the edit is a good start, however the codegen for Vector.Multiply(Vector<float>, float) is extremely bad so this function should be avoided. It's an easy change to avoid it though, just broadcast outside the loop and multiply by a vector. I also added a more proper loop bound and "scalar epilog" in case the vector size does not neatly divide the size of the input arrays.
private static void RotatePoints(float[] x, float[] y, float cosAngle, float sinAngle)
{
var chunkSize = Vector<float>.Count;
var resultX = new float[x.Length];
var resultY = new float[x.Length];
Vector<float> vectorChunk1;
Vector<float> vectorChunk2;
Vector<float> vcosAngle = new Vector<float>(cosAngle);
Vector<float> vsinAngle = new Vector<float>(sinAngle);
int i;
for (i = 0; i + chunkSize - 1 < x.Length; i += chunkSize)
{
vectorChunk1 = new Vector<float>(x, i);
vectorChunk2 = new Vector<float>(y, i);
Vector.Subtract(Vector.Multiply(vectorChunk1, vcosAngle), Vector.Multiply(vectorChunk2, vsinAngle)).CopyTo(resultX, i);
Vector.Add(Vector.Multiply(vectorChunk1, vsinAngle), Vector.Multiply(vectorChunk2, vcosAngle)).CopyTo(resultY, i);
}
for (; i < x.Length; i++)
{
resultX[i] = x[i] * cosAngle - y[i] * sinAngle;
resultY[i] = x[i] * sinAngle + y[i] * cosAngle;
}
}

How can I generate a "random constant colour" for a given string?

how can I generate a "random constant colour" for a given string at runtime?
So a given string value will always have the same colour but different strings will have different colours.
Like how gmail assigns colours to the sender names.
Thanks
Responses to comments:
Thinking to generate the colour from a hashcode.
The colours won't be stored but generated from a hashcode.
I don't know any dedicated method for this, but here is a simple method generating Hexadecimal values with MD5 based on a given string:
using System.Security.Cryptography;
using System.Text;
static string GetColor(string raw)
{
using (MD5 md5Hash = MD5.Create())
{
byte[] data = md5Hash.ComputeHash(Encoding.UTF8.GetBytes(raw));
return BitConverter.ToString(data).Replace("-", string.Empty).Substring(0, 6);
}
}
Examples:
example#example.com
-> 23463B
info#google.com
-> 3C9015
stack#exchange.com
-> 7CA5E8
Edit:
I didn't tested it enough, so you may want to tweak it a little bit to get more different and unique values.
Edit2:
If you want transparency, check out this question/answer. By setting the Substring to Substring(0,8) , you should return a string with the alpha channel.
Similar to what the other answers are suggesting (hash the string in some form then use that hash to pick the color), but instead of using the hash to directly calculate the color use it as the index to an array of "Acceptable" colors.
class ColorPicker
{
public ColorPicker(int colorCount)
{
//The ".Skip(2)" makes it skip pure white and pure black.
// If you want those two, take out the +2 and the skip.
_colors = ColorGenerator.Generate(colorCount + 2).Skip(2).ToArray();
}
private readonly Color[] _colors;
public Color StringToColor(string message)
{
int someHash = CalculateHashOfStringSomehow(message);
return _colors[someHash % _colors.Length];
}
private int CalculateHashOfStringSomehow(string message)
{
//TODO: I would not use "message.GetHashCode()" as you are not
// guaranteed the same value between runs of the program.
// Make up your own algorithom or use a existing one that has a fixed
// output for a given input, like MD5.
}
}
This prevents issues like getting a white color when you plan on showing the text with a white background and other similar problems.
To populate your Color[] see this answer for the ColorGenerator class or just make your own pre-defined list of colors that look good on whatever background they will be used on.
Appendix:
In case the link goes down, here is a copy of the ColorGenerator class
public static class ColorGenerator
{
// RYB color space
private static class RYB
{
private static readonly double[] White = { 1, 1, 1 };
private static readonly double[] Red = { 1, 0, 0 };
private static readonly double[] Yellow = { 1, 1, 0 };
private static readonly double[] Blue = { 0.163, 0.373, 0.6 };
private static readonly double[] Violet = { 0.5, 0, 0.5 };
private static readonly double[] Green = { 0, 0.66, 0.2 };
private static readonly double[] Orange = { 1, 0.5, 0 };
private static readonly double[] Black = { 0.2, 0.094, 0.0 };
public static double[] ToRgb(double r, double y, double b)
{
var rgb = new double[3];
for (int i = 0; i < 3; i++)
{
rgb[i] = White[i] * (1.0 - r) * (1.0 - b) * (1.0 - y) +
Red[i] * r * (1.0 - b) * (1.0 - y) +
Blue[i] * (1.0 - r) * b * (1.0 - y) +
Violet[i] * r * b * (1.0 - y) +
Yellow[i] * (1.0 - r) * (1.0 - b) * y +
Orange[i] * r * (1.0 - b) * y +
Green[i] * (1.0 - r) * b * y +
Black[i] * r * b * y;
}
return rgb;
}
}
private class Points : IEnumerable<double[]>
{
private readonly int pointsCount;
private double[] picked;
private int pickedCount;
private readonly List<double[]> points = new List<double[]>();
public Points(int count)
{
pointsCount = count;
}
private void Generate()
{
points.Clear();
var numBase = (int)Math.Ceiling(Math.Pow(pointsCount, 1.0 / 3.0));
var ceil = (int)Math.Pow(numBase, 3.0);
for (int i = 0; i < ceil; i++)
{
points.Add(new[]
{
Math.Floor(i/(double)(numBase*numBase))/ (numBase - 1.0),
Math.Floor((i/(double)numBase) % numBase)/ (numBase - 1.0),
Math.Floor((double)(i % numBase))/ (numBase - 1.0),
});
}
}
private double Distance(double[] p1)
{
double distance = 0;
for (int i = 0; i < 3; i++)
{
distance += Math.Pow(p1[i] - picked[i], 2.0);
}
return distance;
}
private double[] Pick()
{
if (picked == null)
{
picked = points[0];
points.RemoveAt(0);
pickedCount = 1;
return picked;
}
var d1 = Distance(points[0]);
int i1 = 0, i2 = 0;
foreach (var point in points)
{
var d2 = Distance(point);
if (d1 < d2)
{
i1 = i2;
d1 = d2;
}
i2 += 1;
}
var pick = points[i1];
points.RemoveAt(i1);
for (int i = 0; i < 3; i++)
{
picked[i] = (pickedCount * picked[i] + pick[i]) / (pickedCount + 1.0);
}
pickedCount += 1;
return pick;
}
public IEnumerator<double[]> GetEnumerator()
{
Generate();
for (int i = 0; i < pointsCount; i++)
{
yield return Pick();
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
public static IEnumerable<Color> Generate(int numOfColors)
{
var points = new Points(numOfColors);
foreach (var point in points)
{
var rgb = RYB.ToRgb(point[0], point[1], point[2]);
yield return Color.FromArgb(
(int)Math.Floor(255 * rgb[0]),
(int)Math.Floor(255 * rgb[1]),
(int)Math.Floor(255 * rgb[2]));
}
}
}
3 integer variables, r,g and b.
Loop through each character in the string in steps of 3 and add the character code.
r += n + 0
g += n + 1
b += n + 2
after the loop take r,g, and b modulo 255 and create a color using Color.FromARGB.
No guarantees the color will be pretty though, and some strings may happen to have colors very close to each other.
I see some pretty good answeers but though it whould contribute with a little fun solutuion to generate colors from string, the Hash version looks like the best way to go but if this gives any one some inspiration to bould off, have at it
ConsoleKeyInfo ch = new ConsoleKeyInfo();
while (ch.KeyChar != 'e')
{
Console.WriteLine("type string to seed color");
string s = Console.ReadLine(); // gets text from input, in this case the command line
double d=0;
foreach(char cha in s.ToCharArray())
{
d=+ (int)cha; // get the value and adds it
}
d= (255/(Math.Pow(0.2,-0.002 *d))); // Generates a seed like value from i where 255 is the maximum. basicly 255/0.2^(-0.002*d)
int i = Convert.ToInt32(d); //then convets and get rid of the decimels
Color c = Color.FromArgb(i, i, i);// add a bit more calculation for varieng colers.
Console.WriteLine(c.Name);
Console.WriteLine("To Exit press e");
ch = Console.ReadKey()
}
Edit1: It definantly needs some tweeking, since the longer the string the ligther the color, but i think something can come from it with a little work :)

Genetic Algorithm implementation in C#

I've recently started working with C# and I'm currently trying to implement a version of GA to solve Schwefel’s function(See code below). The code is based on a working Processing code that I built.
The first generation(first 100 individuals) seems to work fine but after that the fitness function gets repetitive values. I'm sure I'm missing something here but does anyone know what might be the problem?
public void button21_Click(object sender, EventArgs e)
{
Population p;
// populationNum = 100;
p = new Population();
int gen = 0;
while (gen < 8000)
{
p.evolve();
}
++gen;
}
//Class Genotype
public partial class Genotype
{
public int[] genes;
public Genotype()
{
genes = new int[2];
for (int i = 0; i < genes.Length; i++)
{
Random rnd = new Random(int.Parse(Guid.NewGuid().ToString().Substring(0, 8), System.Globalization.NumberStyles.HexNumber));
//Random rnd = new Random(0);
int random = rnd.Next(256);
genes[i] = (int)random;
}
}
public void mutate()
{
//5% mutation rate
for (int i = 0; i < genes.Length; i++)
{
Random rnd = new Random(int.Parse(Guid.NewGuid().ToString().Substring(0, 8), System.Globalization.NumberStyles.HexNumber));
int random = rnd.Next(100);
if (random < 5)
{
//Random genernd = new Random();
int generandom = rnd.Next(256);
genes[i] = (int)generandom;
}
}
}
}
static Genotype crossover(Genotype a, Genotype b)
{
Genotype c = new Genotype();
for (int i = 0; i < c.genes.Length; i++)
{
//50-50 chance of selection
Random rnd = new Random(int.Parse(Guid.NewGuid().ToString().Substring(0, 8), System.Globalization.NumberStyles.HexNumber));
float random = rnd.Next(0, 1);
if (random < 0.5)
{
c.genes[i] = a.genes[i];
}
else
{
c.genes[i] = b.genes[i];
}
}
return c;
}
//Class Phenotype
public partial class Phenotype
{
double i_x;
double i_y;
public Phenotype(Genotype g)
{
i_x = g.genes[0] * 500 / 256;
i_y = g.genes[1] * 500 / 256;
}
public double evaluate()
{
double fitness = 0;
fitness -= (-1.0*i_x * Math.Sin(Math.Sqrt(Math.Abs(i_x)))) + (-1.0*i_y * Math.Sin(Math.Sqrt(Math.Abs(i_y))));
Console.WriteLine(fitness);
return fitness;
}
}
//Class Individual
public partial class Individual : IComparable<Individual>
{
public Genotype i_genotype;
public Phenotype i_phenotype;
double i_fitness;
public Individual()
{
this.i_genotype = new Genotype();
this.i_phenotype = new Phenotype(i_genotype);
this.i_fitness = 0;
}
public void evaluate()
{
i_fitness = i_phenotype.evaluate();
}
int IComparable<Individual>.CompareTo(Individual objI)
{
Individual iToCompare = (Individual)objI;
if (i_fitness < iToCompare.i_fitness)
{
return -1; //if I am less fit than iCompare return -1
}
else if (i_fitness > iToCompare.i_fitness)
{
return 1; //if I am fitter than iCompare return 1
}
return 0; // if we are equally return 0
}
}
static Individual breed(Individual a, Individual b)
{
Individual c = new Individual();
c.i_genotype = crossover(a.i_genotype, b.i_genotype);
c.i_genotype.mutate();
c.i_phenotype = new Phenotype(c.i_genotype);
return c;
}
//Class Population
public class Population
{
Individual[] pop;
int populationNum = 100;
public Population()
{
pop = new Individual[populationNum];
for (int i = 0; i < populationNum; i++)
{
this.pop[i] = new Individual();
pop[i].evaluate();
}
Array.Sort(this.pop);
}
public void evolve()
{
Individual a = select();
Individual b = select();
//breed the two selected individuals
Individual x = breed(a, b);
//place the offspring in the lowest position in the population, thus replacing the previously weakest offspring
pop[0] = x;
//evaluate the new individual (grow)
x.evaluate();
//the fitter offspring will find its way in the population ranks
Array.Sort(this.pop);
//rnd = new Random(0);
}
Individual select()
{
Random rnd = new Random(int.Parse(Guid.NewGuid().ToString().Substring(0, 8), System.Globalization.NumberStyles.HexNumber));
float random = rnd.Next(0, 1);
//skew distribution; multiplying by 99.999999 scales a number from 0-1 to 0-99, BUT NOT 100
//the sqrt of a number between 0-1 has bigger possibilities of giving us a smaller number
//if we subtract that squares number from 1 the opposite is true-> we have bigger possibilities of having a larger number
int which = (int)Math.Floor(((float)populationNum - 1e-6) * (1.0 - Math.Pow(random, random)));
return pop[which];
}
}
This an updated code that I think it performs well:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Threading;
namespace ConsoleApplication8
{
class Program
{
static Random random = new Random();
static void Main(string[] args)
{
Population p;
System.IO.StreamWriter file = new System.IO.StreamWriter("c:\\test.txt");
int population = 100;
p = new Population(file, population);
int gen = 0;
while (gen <= 1000)
{
p.evolve(file);
++gen;
}
file.Close();
}
public static double GetRandomNumber(double min, double max)
{
return (random.NextDouble() * (max - min)) + min;
//return random.NextDouble() *random.Next(min,max);
}
//Class Genotype
public class Genotype
{
public int[] genes;
public Genotype()
{
this.genes = new int[2];
for (int i = 0; i < genes.Length; i++)
{
this.genes[i] = (int)GetRandomNumber(-500.0, 500.0);
}
}
public void mutate()
{
//5% mutation rate
for (int i = 0; i < genes.Length; i++)
{
if (GetRandomNumber(0.0, 100) < 5)
{
//Random genernd = new Random();
this.genes[i] = (int)GetRandomNumber(0.0, 256.0);
}
}
}
}
static Genotype crossover(Genotype a, Genotype b)
{
Genotype c = new Genotype();
for (int i = 0; i < c.genes.Length; i++)
{
//50-50 chance of selection
if (GetRandomNumber(0.0, 1) < 0.5)
{
c.genes[i] = a.genes[i];
}
else
{
c.genes[i] = b.genes[i];
}
}
return c;
}
//Class Phenotype
public class Phenotype
{
double i_x;
double i_y;
public Phenotype(Genotype g)
{
this.i_x = g.genes[0];
this.i_y = g.genes[1];
}
public double evaluate(System.IO.StreamWriter file)
{
double fitness = 0;
//fitness -= i_x + i_y;
fitness -= (i_x*Math.Sin(Math.Sqrt(Math.Abs(i_x)))) + i_y*(Math.Sin(Math.Sqrt(Math.Abs(i_y))));
file.WriteLine(fitness);
return fitness;
}
}
//Class Individual
public class Individual : IComparable<Individual>
{
public Genotype i_genotype;
public Phenotype i_phenotype;
double i_fitness;
public Individual()
{
this.i_genotype = new Genotype();
this.i_phenotype = new Phenotype(i_genotype);
this.i_fitness = 0.0;
}
public void evaluate(System.IO.StreamWriter file)
{
this.i_fitness = i_phenotype.evaluate(file);
}
int IComparable<Individual>.CompareTo(Individual objI)
{
Individual iToCompare = (Individual)objI;
if (i_fitness < iToCompare.i_fitness)
{
return -1; //if I am less fit than iCompare return -1
}
else if (i_fitness > iToCompare.i_fitness)
{
return 1; //if I am fitter than iCompare return 1
}
return 0; // if we are equally return 0
}
}
public static Individual breed(Individual a, Individual b)
{
Individual c = new Individual();
c.i_genotype = crossover(a.i_genotype, b.i_genotype);
c.i_genotype.mutate();
c.i_phenotype = new Phenotype(c.i_genotype);
return c;
}
//Class Population
public class Population
{
Individual[] pop;
//int populationNum = 100;
public Population(System.IO.StreamWriter file, int populationNum)
{
this.pop = new Individual[populationNum];
for (int i = 0; i < populationNum; i++)
{
this.pop[i] = new Individual();
this.pop[i].evaluate(file);
}
Array.Sort(pop);
}
public void evolve(System.IO.StreamWriter file)
{
Individual a = select(100);
Individual b = select(100);
//breed the two selected individuals
Individual x = breed(a, b);
//place the offspring in the lowest position in the population, thus replacing the previously weakest offspring
this.pop[0] = x;
//evaluate the new individual (grow)
x.evaluate(file);
//the fitter offspring will find its way in the population ranks
Array.Sort(pop);
}
Individual select(int popNum)
{
//skew distribution; multiplying by 99.999999 scales a number from 0-1 to 0-99, BUT NOT 100
//the sqrt of a number between 0-1 has bigger possibilities of giving us a smaller number
//if we subtract that squares number from 1 the opposite is true-> we have bigger possibilities of having a larger number
int which = (int)Math.Floor(((float)popNum - 1E-6) * (1.0 - Math.Pow(GetRandomNumber(0.0, 1.0), 2)));
return pop[which];
}
}
}
}
This is a problem:
float random = rnd.Next(0, 1); // returns an integer from 0 to 0 as a float
// Documentation states the second argument is exclusive
Try
float random = (float)rnd.NextDouble(); // rnd should be static, init'd once.
and replace all instances of Individual[] with List<Individual> which wraps an array and allows for easy Add(), InsertAt() and RemoveAt() methods.
PS. Also common convention has it to use PascalCasing for all methods and properties.
I think the biggest issue is with your select function.
The success of GA's depends a lot on picking the right Mutation, Evaluation and Selection techniques, although at first glance your selection function seems elegant to skew distribution, you're only skewing it based on relative position (i.e. Pop[0] < Pop[1]) but you're not taking into account how different they are from each other.
In GA's there's a HUGE difference between having the best individual have 100.0 Fitness and the Second have 99.9 than the best have 100.0 and the second have 75.0 and your selection function completely ignores this fact.
What is happening, why you see the repetitive fitness values, is because you're picking roughly the same individuals over and over, making your genetic pool stagnant and stalling in a local minimum (or maximum whatever you're looking for).
If you look for a method like Roullette (http://en.wikipedia.org/wiki/Fitness_proportionate_selection) they pick the probability as a function of the individual fitness divided over the total fitness, sharing the 'chance' of being picked among more individuals depending on how they behave, although this method can also get trapped in locals, it far less prone to than what you currently have, this should give you a very good boost on exploring the search space.
TL;DR - The selection function is not good enough as it is skewing the distribution too harshly and is only taking into account relative comparisons.
Random.next(int min,int max), will generate only integers between the min and max values.
try the (rnd.NextDouble) to generate a random number between 0 and 1.
this what i can help right now :)

Looking for a Histogram Binning algorithm for decimal data

I need to generate bins for the purposes of calculating a histogram. Language is C#. Basically I need to take in an array of decimal numbers and generate a histogram plot out of those.
Haven't been able to find a decent library to do this outright so now I'm just looking for either a library or an algorithm to help me do the binning of the data.
So...
Are there any C# libraries out there that will take in an array of decimal data and output a binned histogram?
Is there generic algorithm for building the bins to be used in generated a histogram?
Here is a simple bucket function I use. Sadly, .NET generics doesn't support a numerical type contraint so you will have to implement a different version of the following function for decimal, int, double, etc.
public static List<int> Bucketize(this IEnumerable<decimal> source, int totalBuckets)
{
var min = source.Min();
var max = source.Max();
var buckets = new List<int>();
var bucketSize = (max - min) / totalBuckets;
foreach (var value in source)
{
int bucketIndex = 0;
if (bucketSize > 0.0)
{
bucketIndex = (int)((value - min) / bucketSize);
if (bucketIndex == totalBuckets)
{
bucketIndex--;
}
}
buckets[bucketIndex]++;
}
return buckets;
}
I got odd results using #JakePearson accepted answer. It has to do with an edge case.
Here is the code I used to test his method. I changed the extension method ever so slightly, returning an int[] and accepting double instead of decimal.
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
Random rand = new Random(1325165);
int maxValue = 100;
int numberOfBuckets = 100;
List<double> values = new List<double>();
for (int i = 0; i < 10000000; i++)
{
double value = rand.NextDouble() * (maxValue+1);
values.Add(value);
}
int[] bins = values.Bucketize(numberOfBuckets);
PointPairList points = new PointPairList();
for (int i = 0; i < numberOfBuckets; i++)
{
points.Add(i, bins[i]);
}
zedGraphControl1.GraphPane.AddBar("Random Points", points,Color.Black);
zedGraphControl1.GraphPane.YAxis.Title.Text = "Count";
zedGraphControl1.GraphPane.XAxis.Title.Text = "Value";
zedGraphControl1.AxisChange();
zedGraphControl1.Refresh();
}
}
public static class Extension
{
public static int[] Bucketize(this IEnumerable<double> source, int totalBuckets)
{
var min = source.Min();
var max = source.Max();
var buckets = new int[totalBuckets];
var bucketSize = (max - min) / totalBuckets;
foreach (var value in source)
{
int bucketIndex = 0;
if (bucketSize > 0.0)
{
bucketIndex = (int)((value - min) / bucketSize);
if (bucketIndex == totalBuckets)
{
bucketIndex--;
}
}
buckets[bucketIndex]++;
}
return buckets;
}
}
Everything works well when using 10,000,000 random double values between 0 and 100 (exclusive). Each bucket has roughly the same number of values, which makes sense given that Random returns a normal distribution.
But when I changed the value generation line from
double value = rand.NextDouble() * (maxValue+1);
to
double value = rand.Next(0, maxValue + 1);
and you get the following result, which double counts the last bucket.
It appears that when a value is same as one of the boundaries of a bucket, the code as it is written puts the value in the incorrect bucket. This artifact doesn't appear to happen with random double values as the chance of a random number being equal to a boundary of a bucket is rare and wouldn't be obvious.
The way I corrected this is to define what side of the bucket boundary is inclusive vs. exclusive.
Think of
0< x <=1 1< x <=2 ... 99< x <=100
vs.
0<= x <1 1<= x <2 ... 99<= x <100
You cannot have both boundaries inclusive, as the method wouldn't know which bucket to put it in if you have a value that is exactly equal to a boundary.
public enum BucketizeDirectionEnum
{
LowerBoundInclusive,
UpperBoundInclusive
}
public static int[] Bucketize(this IList<double> source, int totalBuckets, BucketizeDirectionEnum inclusivity = BucketizeDirectionEnum.UpperBoundInclusive)
{
var min = source.Min();
var max = source.Max();
var buckets = new int[totalBuckets];
var bucketSize = (max - min) / totalBuckets;
if (inclusivity == BucketizeDirectionEnum.LowerBoundInclusive)
{
foreach (var value in source)
{
int bucketIndex = (int)((value - min) / bucketSize);
if (bucketIndex == totalBuckets)
continue;
buckets[bucketIndex]++;
}
}
else
{
foreach (var value in source)
{
int bucketIndex = (int)Math.Ceiling((value - min) / bucketSize) - 1;
if (bucketIndex < 0)
continue;
buckets[bucketIndex]++;
}
}
return buckets;
}
The only issue now is if the input dataset has a lot of min and max values, the binning method will exclude many of those values and the resulting graph will misrepresent the dataset.

How do I calculate a trendline for a graph?

Google is not being my friend - it's been a long time since my stats class in college...I need to calculate the start and end points for a trendline on a graph - is there an easy way to do this? (working in C# but whatever language works for you)
Thanks to all for your help - I was off this issue for a couple of days and just came back to it - was able to cobble this together - not the most elegant code, but it works for my purposes - thought I'd share if anyone else encounters this issue:
public class Statistics
{
public Trendline CalculateLinearRegression(int[] values)
{
var yAxisValues = new List<int>();
var xAxisValues = new List<int>();
for (int i = 0; i < values.Length; i++)
{
yAxisValues.Add(values[i]);
xAxisValues.Add(i + 1);
}
return new Trendline(yAxisValues, xAxisValues);
}
}
public class Trendline
{
private readonly IList<int> xAxisValues;
private readonly IList<int> yAxisValues;
private int count;
private int xAxisValuesSum;
private int xxSum;
private int xySum;
private int yAxisValuesSum;
public Trendline(IList<int> yAxisValues, IList<int> xAxisValues)
{
this.yAxisValues = yAxisValues;
this.xAxisValues = xAxisValues;
this.Initialize();
}
public int Slope { get; private set; }
public int Intercept { get; private set; }
public int Start { get; private set; }
public int End { get; private set; }
private void Initialize()
{
this.count = this.yAxisValues.Count;
this.yAxisValuesSum = this.yAxisValues.Sum();
this.xAxisValuesSum = this.xAxisValues.Sum();
this.xxSum = 0;
this.xySum = 0;
for (int i = 0; i < this.count; i++)
{
this.xySum += (this.xAxisValues[i]*this.yAxisValues[i]);
this.xxSum += (this.xAxisValues[i]*this.xAxisValues[i]);
}
this.Slope = this.CalculateSlope();
this.Intercept = this.CalculateIntercept();
this.Start = this.CalculateStart();
this.End = this.CalculateEnd();
}
private int CalculateSlope()
{
try
{
return ((this.count*this.xySum) - (this.xAxisValuesSum*this.yAxisValuesSum))/((this.count*this.xxSum) - (this.xAxisValuesSum*this.xAxisValuesSum));
}
catch (DivideByZeroException)
{
return 0;
}
}
private int CalculateIntercept()
{
return (this.yAxisValuesSum - (this.Slope*this.xAxisValuesSum))/this.count;
}
private int CalculateStart()
{
return (this.Slope*this.xAxisValues.First()) + this.Intercept;
}
private int CalculateEnd()
{
return (this.Slope*this.xAxisValues.Last()) + this.Intercept;
}
}
OK, here's my best pseudo math:
The equation for your line is:
Y = a + bX
Where:
b = (sum(x*y) - sum(x)sum(y)/n) / (sum(x^2) - sum(x)^2/n)
a = sum(y)/n - b(sum(x)/n)
Where sum(xy) is the sum of all x*y etc. Not particularly clear I concede, but it's the best I can do without a sigma symbol :)
... and now with added Sigma
b = (Σ(xy) - (ΣxΣy)/n) / (Σ(x^2) - (Σx)^2/n)
a = (Σy)/n - b((Σx)/n)
Where Σ(xy) is the sum of all x*y etc. and n is the number of points
Given that the trendline is straight, find the slope by choosing any two points and calculating:
(A) slope = (y1-y2)/(x1-x2)
Then you need to find the offset for the line. The line is specified by the equation:
(B) y = offset + slope*x
So you need to solve for offset. Pick any point on the line, and solve for offset:
(C) offset = y - (slope*x)
Now you can plug slope and offset into the line equation (B) and have the equation that defines your line. If your line has noise you'll have to decide on an averaging algorithm, or use curve fitting of some sort.
If your line isn't straight then you'll need to look into Curve fitting, or Least Squares Fitting - non trivial, but do-able. You'll see the various types of curve fitting at the bottom of the least squares fitting webpage (exponential, polynomial, etc) if you know what kind of fit you'd like.
Also, if this is a one-off, use Excel.
Here is a very quick (and semi-dirty) implementation of Bedwyr Humphreys's answer. The interface should be compatible with #matt's answer as well, but uses decimal instead of int and uses more IEnumerable concepts to hopefully make it easier to use and read.
Slope is b, Intercept is a
public class Trendline
{
public Trendline(IList<decimal> yAxisValues, IList<decimal> xAxisValues)
: this(yAxisValues.Select((t, i) => new Tuple<decimal, decimal>(xAxisValues[i], t)))
{ }
public Trendline(IEnumerable<Tuple<Decimal, Decimal>> data)
{
var cachedData = data.ToList();
var n = cachedData.Count;
var sumX = cachedData.Sum(x => x.Item1);
var sumX2 = cachedData.Sum(x => x.Item1 * x.Item1);
var sumY = cachedData.Sum(x => x.Item2);
var sumXY = cachedData.Sum(x => x.Item1 * x.Item2);
//b = (sum(x*y) - sum(x)sum(y)/n)
// / (sum(x^2) - sum(x)^2/n)
Slope = (sumXY - ((sumX * sumY) / n))
/ (sumX2 - (sumX * sumX / n));
//a = sum(y)/n - b(sum(x)/n)
Intercept = (sumY / n) - (Slope * (sumX / n));
Start = GetYValue(cachedData.Min(a => a.Item1));
End = GetYValue(cachedData.Max(a => a.Item1));
}
public decimal Slope { get; private set; }
public decimal Intercept { get; private set; }
public decimal Start { get; private set; }
public decimal End { get; private set; }
public decimal GetYValue(decimal xValue)
{
return Intercept + Slope * xValue;
}
}
Regarding a previous answer
if (B) y = offset + slope*x
then (C) offset = y/(slope*x) is wrong
(C) should be:
offset = y-(slope*x)
See:
http://zedgraph.org/wiki/index.php?title=Trend
If you have access to Excel, look in the "Statistical Functions" section of the Function Reference within Help. For straight-line best-fit, you need SLOPE and INTERCEPT and the equations are right there.
Oh, hang on, they're also defined online here: http://office.microsoft.com/en-us/excel/HP052092641033.aspx for SLOPE, and there's a link to INTERCEPT. OF course, that assumes MS don't move the page, in which case try Googling for something like "SLOPE INTERCEPT EQUATION Excel site:microsoft.com" - the link given turned out third just now.
I converted Matt's code to Java so I could use it in Android with the MPAndroidChart library. Also used double values instead of integer values:
ArrayList<Entry> yValues2 = new ArrayList<>();
ArrayList<Double > xAxisValues = new ArrayList<Double>();
ArrayList<Double> yAxisValues = new ArrayList<Double>();
for (int i = 0; i < readings.size(); i++)
{
r = readings.get(i);
yAxisValues.add(r.value);
xAxisValues.add((double)i + 1);
}
TrendLine tl = new TrendLine(yAxisValues, xAxisValues);
//Create the y values for the trend line
double currY = tl.Start;
for (int i = 0; i < readings.size(); ++ i) {
yValues2.add(new Entry(i, (float) currY));
currY = currY + tl.Slope;
}
...
public class TrendLine
{
private ArrayList<Double> xAxisValues = new ArrayList<Double>();
private ArrayList<Double> yAxisValues = new ArrayList<Double>();
private int count;
private double xAxisValuesSum;
private double xxSum;
private double xySum;
private double yAxisValuesSum;
public TrendLine(ArrayList<Double> yAxisValues, ArrayList<Double> xAxisValues)
{
this.yAxisValues = yAxisValues;
this.xAxisValues = xAxisValues;
this.Initialize();
}
public double Slope;
public double Intercept;
public double Start;
public double End;
private double getArraySum(ArrayList<Double> arr) {
double sum = 0;
for (int i = 0; i < arr.size(); ++i) {
sum = sum + arr.get(i);
}
return sum;
}
private void Initialize()
{
this.count = this.yAxisValues.size();
this.yAxisValuesSum = getArraySum(this.yAxisValues);
this.xAxisValuesSum = getArraySum(this.xAxisValues);
this.xxSum = 0;
this.xySum = 0;
for (int i = 0; i < this.count; i++)
{
this.xySum += (this.xAxisValues.get(i)*this.yAxisValues.get(i));
this.xxSum += (this.xAxisValues.get(i)*this.xAxisValues.get(i));
}
this.Slope = this.CalculateSlope();
this.Intercept = this.CalculateIntercept();
this.Start = this.CalculateStart();
this.End = this.CalculateEnd();
}
private double CalculateSlope()
{
try
{
return ((this.count*this.xySum) - (this.xAxisValuesSum*this.yAxisValuesSum))/((this.count*this.xxSum) - (this.xAxisValuesSum*this.xAxisValuesSum));
}
catch (Exception e)
{
return 0;
}
}
private double CalculateIntercept()
{
return (this.yAxisValuesSum - (this.Slope*this.xAxisValuesSum))/this.count;
}
private double CalculateStart()
{
return (this.Slope*this.xAxisValues.get(0)) + this.Intercept;
}
private double CalculateEnd()
{
return (this.Slope*this.xAxisValues.get(this.xAxisValues.size()-1)) + this.Intercept;
}
}
This is the way i calculated the slope:
Source: http://classroom.synonym.com/calculate-trendline-2709.html
class Program
{
public double CalculateTrendlineSlope(List<Point> graph)
{
int n = graph.Count;
double a = 0;
double b = 0;
double bx = 0;
double by = 0;
double c = 0;
double d = 0;
double slope = 0;
foreach (Point point in graph)
{
a += point.x * point.y;
bx = point.x;
by = point.y;
c += Math.Pow(point.x, 2);
d += point.x;
}
a *= n;
b = bx * by;
c *= n;
d = Math.Pow(d, 2);
slope = (a - b) / (c - d);
return slope;
}
}
class Point
{
public double x;
public double y;
}
Here's what I ended up using.
public class DataPoint<T1,T2>
{
public DataPoint(T1 x, T2 y)
{
X = x;
Y = y;
}
[JsonProperty("x")]
public T1 X { get; }
[JsonProperty("y")]
public T2 Y { get; }
}
public class Trendline
{
public Trendline(IEnumerable<DataPoint<long, decimal>> dataPoints)
{
int count = 0;
long sumX = 0;
long sumX2 = 0;
decimal sumY = 0;
decimal sumXY = 0;
foreach (var dataPoint in dataPoints)
{
count++;
sumX += dataPoint.X;
sumX2 += dataPoint.X * dataPoint.X;
sumY += dataPoint.Y;
sumXY += dataPoint.X * dataPoint.Y;
}
Slope = (sumXY - ((sumX * sumY) / count)) / (sumX2 - ((sumX * sumX) / count));
Intercept = (sumY / count) - (Slope * (sumX / count));
}
public decimal Slope { get; private set; }
public decimal Intercept { get; private set; }
public decimal Start { get; private set; }
public decimal End { get; private set; }
public decimal GetYValue(decimal xValue)
{
return Slope * xValue + Intercept;
}
}
My data set is using a Unix timestamp for the x-axis and a decimal for the y. Change those datatypes to fit your need. I do all the sum calculations in one iteration for the best possible performance.
Thank You so much for the solution, I was scratching my head.
Here's how I applied the solution in Excel.
I successfully used the two functions given by MUHD in Excel:
a = (sum(x*y) - sum(x)sum(y)/n) / (sum(x^2) - sum(x)^2/n)
b = sum(y)/n - b(sum(x)/n)
(careful my a and b are the b and a in MUHD's solution).
- Made 4 columns, for example:
NB: my values y values are in B3:B17, so I have n=15;
my x values are 1,2,3,4...15.
1. Column B: Known x's
2. Column C: Known y's
3. Column D: The computed trend line
4. Column E: B values * C values (E3=B3*C3, E4=B4*C4, ..., E17=B17*C17)
5. Column F: x squared values
I then sum the columns B,C and E, the sums go in line 18 for me, so I have B18 as sum of Xs, C18 as sum of Ys, E18 as sum of X*Y, and F18 as sum of squares.
To compute a, enter the followin formula in any cell (F35 for me):
F35=(E18-(B18*C18)/15)/(F18-(B18*B18)/15)
To compute b (in F36 for me):
F36=C18/15-F35*(B18/15)
Column D values, computing the trend line according to the y = ax + b:
D3=$F$35*B3+$F$36, D4=$F$35*B4+$F$36 and so on (until D17 for me).
Select the column datas (C2:D17) to make the graph.
HTH.
If anyone needs the JS code for calculating the trendline of many points on a graph, here's what worked for us in the end:
/**#typedef {{
* x: Number;
* y:Number;
* }} Point
* #param {Point[]} data
* #returns {Function} */
function _getTrendlineEq(data) {
const xySum = data.reduce((acc, item) => {
const xy = item.x * item.y
acc += xy
return acc
}, 0)
const xSum = data.reduce((acc, item) => {
acc += item.x
return acc
}, 0)
const ySum = data.reduce((acc, item) => {
acc += item.y
return acc
}, 0)
const aTop = (data.length * xySum) - (xSum * ySum)
const xSquaredSum = data.reduce((acc, item) => {
const xSquared = item.x * item.x
acc += xSquared
return acc
}, 0)
const aBottom = (data.length * xSquaredSum) - (xSum * xSum)
const a = aTop / aBottom
const bTop = ySum - (a * xSum)
const b = bTop / data.length
return function trendline(x) {
return a * x + b
}
}
It takes an array of (x,y) points and returns the function of a y given a certain x
Have fun :)
Here's a working example in golang. I searched around and found this page and converted this over to what I needed. Hope someone else can find it useful.
// https://classroom.synonym.com/calculate-trendline-2709.html
package main
import (
"fmt"
"math"
)
func main() {
graph := [][]float64{
{1, 3},
{2, 5},
{3, 6.5},
}
n := len(graph)
// get the slope
var a float64
var b float64
var bx float64
var by float64
var c float64
var d float64
var slope float64
for _, point := range graph {
a += point[0] * point[1]
bx += point[0]
by += point[1]
c += math.Pow(point[0], 2)
d += point[0]
}
a *= float64(n) // 97.5
b = bx * by // 87
c *= float64(n) // 42
d = math.Pow(d, 2) // 36
slope = (a - b) / (c - d) // 1.75
// calculating the y-intercept (b) of the Trendline
var e float64
var f float64
e = by // 14.5
f = slope * bx // 10.5
intercept := (e - f) / float64(n) // (14.5 - 10.5) / 3 = 1.3
// output
fmt.Println(slope)
fmt.Println(intercept)
}

Categories