Genetic Algorithm stops mutating - c#

I'm currently trying to make my genetic algorithm "generate" or "evolve" towards a given word. The problem is that it never fully reaches this word; it stalls at a fitness score that is still too high, even though it should continue mutating.
Here's an example:
User input = "HelloWorld"
After 500 generations = "XelgoWorfd"
And I have no clue why it won't continue mutating. Normally it should just carry on randomly changing some chars in the string.
So I would be very glad of some help.
Here's a basic step-by-step explanation:
Create 20 Chromosomes with fully randomized strings
Calculate the fitness score compared to the goal word.
(Counting Ascii ids differences)
Mate the two Chromosomes with the best score.
Mutate some of the Chromosomes randomly (change random string chars)
Kill 90% of the weak population and replace it with elite chromosomes (The chromosomes with the currently best fitness score).
Repeat everything.
So here are the most important methods of my algorithm:
public Chromoson[] mate(string gene) {
Console.WriteLine("[MATING] In Progress : "+gens+" "+gene);
int pivot = (int)Math.Round((double)gens.Length / 2) - 1;
string child1 = this.gens.Substring(0, pivot) + gene.Substring(pivot);
string child2 = gene.Substring(0, pivot) + this.gens.Substring(pivot);
Chromoson[] list = new Chromoson[2];
list[0] = new Chromoson(child1);
list[1] = new Chromoson(child2);
Console.WriteLine("[MATING] Pivot : "+pivot);
Console.WriteLine("[MATING] Children : "+child1+" "+child2);
return list;
}
public void mutate(float chance, int possiblyChanges) {
if (random.Next(0,101) <= chance) return;
int changes = random.Next(0, possiblyChanges + 1);
//int index = (int) Math.Floor((double)random.Next() * this.gens.Length);
for (int i = 0; i < changes; i++) {
int index = random.Next(0, 13);
StringBuilder builder = new StringBuilder(gens);
int upOrDown = random.Next(0, 101);
if (upOrDown <= 50 && (int)builder[index] > 0 && chars.Contains(Convert.ToChar(builder[index] - 1)))
builder[index] = Convert.ToChar(builder[index] - 1);
else if (upOrDown >= 50 && (int)builder[index] < 127 && chars.Contains(Convert.ToChar(builder[index] + 1)))
builder[index] = Convert.ToChar(builder[index] + 1);
else
mutate(chance, possiblyChanges);
gens = builder.ToString();
}
Console.WriteLine("[MUTATING] In Progress");
}
public void calculateCost(string otherGens)
{
int total = 0;
for (int i = 0; i < gens.Length; i++)
{
total += (((int)gens[i] - (int)otherGens[i]) * ((int)gens[i] - (int)otherGens[i])) * (i*i);
}
Console.WriteLine("[CALCULATING] Costs : " + total);
this.cost = total;
}

Something is completely off in your timesteps:
Create 20 Chromosomes with fully randomized strings. Seems okay.
Calculate the fitness score compared to the goal word. (Counting Ascii ids differences). Seems okay.
Mate the two Chromosomes with the best score. What? You're only breeding the two fittest chromosomes to create the new population? That means you will have a population that is nearly uniform. Breed fitness-proportionally, so that all genomes have a chance to produce offspring.
Mutate some of the Chromosomes randomly (change random string chars)
Kill 90% of the weak population and replace it with elite chromosomes (The chromosomes with the currently best fitness score). You kill 90%? So basically, you're keeping the 2 best genomes every iteration and then replacing the other 18 with step 1? What you want is to keep the 2 fittest at step 3, and create the other 18 individuals by breeding.
Repeat everything.
So change your steps to:
INIT. Initialise population, create 20 random chromosomes
Calculate the score for each chromosome
Save the two fittest chromosomes to the next population (aka elitism), get the other 18 needed individuals by breeding fitness-proportionally
Mutate the chromosomes with a certain chance
Repeat
Do not create random individuals every round. This turns your algorithm into a random search.
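Fitness-proportional breeding can be sketched as a roulette-wheel pick. This is only a minimal illustration: the class name SelectionSketch, both method names, and the 1 / (1 + cost) conversion from cost to weight are assumptions made here, not part of the question's code.

```csharp
using System;

public static class SelectionSketch
{
    // Turns costs (lower is better) into selection weights (higher is better).
    // The 1 / (1 + cost) mapping is an assumption; any decreasing function works.
    public static double[] WeightsFromCosts(int[] costs)
    {
        var weights = new double[costs.Length];
        for (int i = 0; i < costs.Length; i++)
            weights[i] = 1.0 / (1.0 + costs[i]);
        return weights;
    }

    // Roulette-wheel selection: returns index i with probability
    // weights[i] / sum(weights). Weights must be non-negative with a positive sum.
    public static int PickProportional(double[] weights, Random random)
    {
        double total = 0;
        foreach (double w in weights) total += w;
        double ticket = random.NextDouble() * total;
        for (int i = 0; i < weights.Length; i++)
        {
            ticket -= weights[i];
            if (ticket < 0) return i;
        }
        return weights.Length - 1; // guards against floating-point round-off
    }
}
```

Calling PickProportional twice yields the two parents for one mating; repeating that until 18 children exist fills the population next to the two elites.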

Your mutate and calculateCost functions are weird. In particular, mutate() looks designed to get trapped in local minima. Any mutation up or down will be worse than the elites (which are probably identical, so crossover changes nothing). Use a different mutate: pick a random index and change the character there completely. Also remove i*i from calculateCost(): that factor is zero for i = 0, so the first character never contributes to the cost at all.
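A minimal sketch of those two changes might look like the following. The class name GaSketch, the Alphabet constant and the static method signatures are assumptions for illustration; the question's Chromoson class keeps its genes in a field and its `chars` set is not shown.

```csharp
using System;

public static class GaSketch
{
    // Hypothetical alphabet; substitute whatever character set the real code allows.
    private const string Alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Cost without the i*i weighting: sum of squared character differences.
    public static int Cost(string genes, string target)
    {
        int total = 0;
        for (int i = 0; i < genes.Length; i++)
        {
            int diff = genes[i] - target[i];
            total += diff * diff;
        }
        return total;
    }

    // Replaces one randomly chosen position with a random character from the alphabet.
    public static string Mutate(string genes, Random random)
    {
        char[] buffer = genes.ToCharArray();
        int index = random.Next(buffer.Length); // valid for any gene length
        buffer[index] = Alphabet[random.Next(Alphabet.Length)];
        return new string(buffer);
    }
}
```

Choosing the index from the gene length, rather than a fixed range, also keeps the mutation within the bounds of the string.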

Related

In C#, how do I store the amount of times random numbers equal certain sums in an array?

I'm writing a program that rolls _ amounts of _ sided dice _ times as specified by the user. This is a project with set requirements such as the class structure and the fact I have to use an array to show my results. The results are supposed to display each total I could possibly get and the amount of times I got each total after rolling for the specified amount of times.
I've written my attempt at this and fully expected it to work, but of course it did not.
Rolling 1,000 times and rolling 2 dice with 6 sides, here are my results:
4) 4
6) 4
8) 4
10) 4
12) 4
I'd expect something like:
2) 85
3) 83
4) 84
5) 82
... until 12
The sum doesn't start at 2 it starts at 4, it rolls 20 times instead of 1,000, and all the values are the same. Any idea what could be wrong?
Here's my code:
private int[] myTotals;
private int possibleTotal = 2;
private int arrayLocation = 0;
private int myNumberOfDice;
private string results = "";
private static Random diceGenerator = new Random();
public DiceFactory()
{
}
public void rollDice(int numberOfRolls, int numberOfSides, int numberOfDice)
{
myNumberOfDice = numberOfDice;
myTotals = new int[numberOfRolls];
arrayLocation = possibleTotal - 2;
for (int i = 0; i < numberOfRolls; i++) {
myTotals[arrayLocation] = diceGenerator.Next(1, numberOfSides + 1);
myTotals[arrayLocation]++;
}
while (possibleTotal <= numberOfSides * myNumberOfDice)
{
results += (possibleTotal) + ") " + myTotals[arrayLocation] + "\r\n";
possibleTotal++;
}
}
public string getResults()
{
return results;
}
}
arrayLocation gets set to 4 (6 - 2) at the beginning of your method, and doesn't change as you go through either your rolling loop or your result-building loop. That explains why you're seeing the value 4 for all your outputs.
I'd suggest sticking with a simple for loop for your result-building loop:
for (int i = 2; i <= numberOfSides * myNumberOfDice; i++)
{
results += (i) + ") " + myTotals[i] + "\r\n";
}
That isn't your only problem. You'll need to figure out how to add up the results from rolling multiple dice. (Your code doesn't currently use myNumberOfDice in the rolling loop.) Definitely learn to use your debugger and understand what you think should be in each variable at each step of the way, and compare it with what actually shows up each step of the way. Breaking complex things down into small, verifiable steps is the essence of software development.
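As a rough sketch of the missing piece, the totals could be tallied as shown here. The class name DiceSketch and the method RollTotals are hypothetical, and the counts are indexed by (total - numberOfDice) rather than by the total itself, which is an assumption that differs from the loop above.

```csharp
using System;

public static class DiceSketch
{
    // Returns how often each total occurred. Slot 0 holds the smallest
    // possible total (every die showing 1), i.e. total == numberOfDice.
    public static int[] RollTotals(int numberOfRolls, int numberOfSides, int numberOfDice, Random random)
    {
        int minTotal = numberOfDice;
        int maxTotal = numberOfDice * numberOfSides;
        int[] counts = new int[maxTotal - minTotal + 1];
        for (int roll = 0; roll < numberOfRolls; roll++)
        {
            int total = 0;
            for (int die = 0; die < numberOfDice; die++)
                total += random.Next(1, numberOfSides + 1); // upper bound is exclusive
            counts[total - minTotal]++;
        }
        return counts;
    }
}
```

The array is sized by the number of possible totals, not by the number of rolls, and each roll increments exactly one slot.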

Minimum sum of recursive adding of elements in an array two at a time

I have an array of integers. Value of each element represents the time taken to process a file. The processing of files consists of merging two files at a time. What is the algorithm to find the minimum time that can be taken for processing all the files. E.g. - {3,5,9,12,14,18}.
The time of processing can be calculated as -
Case 1) -
a) [8],9,12,14,18
b) [17],12,14,18
c) [26],17,18
d) 26,[35]
e) 61
So total time for processing is 61 + 35 + 26 + 17 + 8 = 147
Case 2) -
a) [21],5,9,12,14
b) [17],[21],9,14
c) [21],[17],[23]
d) [40],[21]
e) 61
This time the total time is 61 + 40 + 23 + 17 + 21 = 162
Seems to me that continuously sorting the array and adding the least two elements is the best bet for the minimum as in Case 1. Is my logic right? If not what is the right and easiest way to achieve this with best performance?
Once you have the sorted list, since you are only removing the two minimum items and replacing them with one, it makes more sense to do a sorted insert and place the new item in the correct place instead of re-sorting the entire list. However, this only saves a fractional amount of time - about 1% faster.
My method CostOfMerge doesn't assume the input is a List but if it is, you can remove the conversion ToList step.
public static class IEnumerableExt {
public static long CostOfMerge(this IEnumerable<int> psrc) {
long total = 0; // running cost; long because large inputs overflow int
var src = psrc.ToList();
src.Sort();
while (src.Count > 1) {
var sum = src[0]+src[1];
src.RemoveRange(0, 2);
var index = src.BinarySearch(sum);
if (index < 0)
index = ~index;
src.Insert(index, sum);
total += sum;
}
return total;
}
}
As already discussed in other answers, the best strategy will be to always work on the two items with minimal cost for each iteration. So the only remaining question is how to efficiently take the two smallest items each time.
Since you asked for best performance, I shamelessly took the algorithm from NetMage and modified it to speed it up roughly 40% for my test case (thanks and +1 to NetMage).
The idea is to work mostly in place on a single array.
Each iteration increases the starting index by 1 and moves the elements within the array to make space for the sum from the current iteration.
public static long CostOfMerge2(this IEnumerable<int> psrc)
{
long total = 0;
var src = psrc.ToArray();
Array.Sort(src);
var i = 1;
int length = src.Length;
while (i < length)
{
var sum = src[i - 1] + src[i];
total += sum;
// find insert position for sum
var index = Array.BinarySearch(src, i + 1, length - i - 1, sum);
if (index < 0)
index = ~index;
--index;
// shift items that come before insert position one place to the left
if (i < index)
Array.Copy(src, i + 1, src, i, index - i);
src[index] = sum;
++i;
}
return total;
}
I tested with the following calling code (switching between CostOfMerge and CostOfMerge2), with a few different values for random-seed, count of elements and max value of initial items.
static void Main(string[] args)
{
var r = new Random(10);
var testcase = Enumerable.Range(0, 400000).Select(x => r.Next(1000)).ToList();
var sw = Stopwatch.StartNew();
long resultCost = testcase.CostOfMerge();
sw.Stop();
Console.WriteLine($"Cost of Merge: {resultCost}");
Console.WriteLine($"Time of Merge: {sw.Elapsed}");
Console.ReadLine();
}
Result for shown configuration for NetMage CostOfMerge:
Cost of Merge: 3670570720
Time of Merge: 00:00:15.4472251
My CostOfMerge2:
Cost of Merge: 3670570720
Time of Merge: 00:00:08.7193612
Of course, the detailed numbers are hardware-dependent, and the difference might be bigger or smaller depending on many factors.
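Another way to take the two smallest items each time is a min-heap. The sketch below assumes .NET 6 or later, where PriorityQueue<TElement, TPriority> is available; the class and method names are made up for illustration, and it has not been benchmarked against the versions above.

```csharp
using System;
using System.Collections.Generic;

public static class MergeCostSketch
{
    // Repeatedly merges the two smallest items; each merge costs their sum.
    public static long CostOfMergeHeap(IEnumerable<int> sizes)
    {
        // Each element doubles as its own priority (smallest is dequeued first).
        var heap = new PriorityQueue<long, long>();
        foreach (int size in sizes)
            heap.Enqueue(size, size);

        long total = 0;
        while (heap.Count > 1)
        {
            long sum = heap.Dequeue() + heap.Dequeue();
            total += sum;
            heap.Enqueue(sum, sum);
        }
        return total;
    }
}
```

For the question's input {3,5,9,12,14,18} this returns 147, matching Case 1. Each step costs O(log n), so the whole run is O(n log n) without shifting array elements.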
No, that's the minimum for a polyphase merge: where N is the bandwidth (number of files you can merge simultaneously), then you want to merge the smallest (N-1) files at each step. However, with this more general problem, you want to delay the larger files as long as possible -- you may want an early step or two to merge fewer than (N-1) files, somewhat like having a "bye" in an elimination tourney. You want all the latter steps to involve the full (N-1) files.
For instance, given N=4 and files 1, 6, 7, 8, 14, 22:
Early merge:
[22], 14, 22
[58]
total = 80
Late merge:
[14], 8, 14, 22
[58]
total = 72
Here, you can apply the following logic to get the desired output.
Get the two smallest values from the list.
Remove those two values from the list.
Append their sum to the list, and add that sum to a running total.
Continue until the list has size 1.
Return the running total; this is the minimum time taken to process every item. (The single element left in the list is just the sum of all the inputs.)
You can follow my Java code below, if you find it helpful .. :)
import java.util.ArrayList;
import java.util.Arrays;

public class MinimumSums {
    // Returns the smallest value in the list.
    private static Integer getMinimum(ArrayList<Integer> list) {
        Integer min = Integer.MAX_VALUE;
        for (int i = 0; i < list.size(); i++) {
            if (list.get(i) <= min)
                min = list.get(i);
        }
        return min;
    }

    public static void main(String[] args) {
        Integer[] processes = {5, 9, 3, 14, 12, 18};
        ArrayList<Integer> list = new ArrayList<Integer>(Arrays.asList(processes));
        ArrayList<Integer> temp = new ArrayList<Integer>();
        long total = 0;
        while (list.size() > 1) {
            Integer firstMin = getMinimum(list);  // smallest value
            list.remove(firstMin);                // remove(Object) drops one occurrence
            Integer secondMin = getMinimum(list); // next smallest (may equal firstMin)
            list.remove(secondMin);
            int sum = firstMin + secondMin;
            list.add(sum);
            temp.add(sum);
            total += sum;
        }
        System.out.println(temp);  // prints the intermediate sums
        System.out.println(total); // prints the minimum total time
    }
}

Distribute quantities into buckets - Not evenly

I've been searching around for a solution to this, but I think because of how I'm thinking about it, my search phrases might be a bit loaded in favor of topics that aren't completely relevant.
I have a number, say 950,000. This represents an inventory of [widgets] within an entire system. I have about 200 "buckets" that should each receive a portion of this inventory such that there are no widgets left over.
What I would like to happen is for each bucket to receive different amounts. I don't have any solid code to show right now, but here's some pseudo code to illustrate what I've been thinking:
//List<BucketObject> _buckets is a collection of "buckets", each of which has a "quantity" property for holding these numbers.
int _widgetCnt = 950000;
int _bucketCnt = _buckets.Count; //LINQ
//To start, each bucket receives (_widgetCnt / _bucketCnt) or 4750.
for (int i = 0; i < _bucketCnt - 1; i++)
{
int _rndAmt = _rnd.Next(1, _buckets[i].Quantity/2); //Take SOME from this bucket...
int _rndBucket = _rnd.Next(0,_bucketCnt - 1); //Get a random bucket index from the List<BucketObject> collection.
_buckets.ElementAt(_rndBucket).Quantity += _rndAmt;
_buckets.ElementAt(i).Quantity -= _rndAmt;
}
Is this a statistically/mathematically proper way to handle this, or is there a distribution formula out there that handles this? The kicker is that while this pseudo code would run 200 times (so each bucket has a chance to alter its quantities) it would have to run X number of times depending on the TYPE of widget (which currently stands at just 11 flavors, but is expected to expand significantly in the future).
{EDIT}
This system is for a commodity trading game. Quantities at the 200 shops must differ because the inventory will determine the price at that station. The distro can't be even because that would make all prices the same. Over time, prices will naturally get out of balance, but the inventory must start out off-balance. And all inventories have to be at least similar in scope (ie, no one shop can have 1 item, and another have 900,000)
Sure, there is a solution. You could use the Dirichlet distribution for such a task. A property of the distribution is that
\sum_i x_i = 1
So the solution would be to sample 200 (equal to the number of buckets) random values from a Dirichlet distribution, then multiply each value by 950,000 (or whatever the total inventory is); that gives you the number of items per bucket. If you want non-uniform sampling, you could tweak alpha in the Dirichlet sampling.
Items per bucket have to be rounded up/down, of course, but that is pretty trivial.
I have Dirichlet sampling in C# somewhere; if you struggle to implement it, tell me and I will dig it out.
UPDATE
I found some code, .NET Core 2; below is the excerpt. I used it to sample Dirichlet RNs with the same alphas; making all of them different is trivial.
//
// Dirichlet sampling, using Gamma sampling from Math .NET
//
using MathNet.Numerics.Distributions;
using MathNet.Numerics.Random;
static void SampleDirichlet(double alpha, double[] rn)
{
if (rn == null)
throw new ArgumentException("SampleDirichlet:: Results placeholder is null");
if (alpha <= 0.0)
throw new ArgumentException($"SampleDirichlet:: alpha {alpha} is non-positive");
int n = rn.Length;
if (n == 0)
throw new ArgumentException("SampleDirichlet:: Results placeholder is of zero size");
var gamma = new Gamma(alpha, 1.0);
double sum = 0.0;
for(int k = 0; k != n; ++k) {
double v = gamma.Sample();
sum += v;
rn[k] = v;
}
if (sum <= 0.0)
throw new ApplicationException($"SampleDirichlet:: sum {sum} is non-positive");
// normalize
sum = 1.0 / sum;
for(int k = 0; k != n; ++k) {
rn[k] *= sum;
}
}
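The rounding step mentioned above can be sketched with the largest-remainder method, so that the integer quantities still add up to the exact total. The names BucketSketch and Allocate are hypothetical, and the sketch assumes the fractions are non-negative and sum to 1 (for example, the output of SampleDirichlet).

```csharp
using System;
using System.Linq;

public static class BucketSketch
{
    // Converts fractions (assumed to sum to 1) into integer quantities
    // that sum exactly to `total`.
    public static int[] Allocate(double[] fractions, int total)
    {
        int n = fractions.Length;
        var quantities = new int[n];
        var remainders = new double[n];
        int assigned = 0;
        for (int i = 0; i < n; i++)
        {
            double exact = fractions[i] * total;
            quantities[i] = (int)Math.Floor(exact);
            remainders[i] = exact - quantities[i];
            assigned += quantities[i];
        }
        // Hand the leftover units to the buckets with the largest remainders.
        int[] order = Enumerable.Range(0, n)
                                .OrderByDescending(i => remainders[i])
                                .ToArray();
        for (int k = 0; k < total - assigned; k++)
            quantities[order[k % n]]++;
        return quantities;
    }
}
```

With 200 buckets and 950,000 widgets, at most a couple of hundred units are moved by this correction, so the shape of the sampled distribution is essentially preserved.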

Performance issue with generation of random unique numbers

I have a situation whereby I need to create tens of thousands of unique numbers. However, these numbers must be 9 digits long and cannot contain any 0s. My current approach is to generate 9 digits (1-9) and concatenate them together, and if the number is not already in the list, add it. E.g.
public void generateIdentifiers(int quantity)
{
uniqueIdentifiers = new List<string>(quantity);
while (this.uniqueIdentifiers.Count < quantity)
{
string id = string.Empty;
id += random.Next(1,10);
id += random.Next(1,10);
id += random.Next(1,10);
id += " ";
id += random.Next(1,10);
id += random.Next(1,10);
id += random.Next(1,10);
id += " ";
id += random.Next(1,10);
id += random.Next(1,10);
id += random.Next(1,10);
if (!this.uniqueIdentifiers.Contains(id))
{
this.uniqueIdentifiers.Add(id);
}
}
}
However at about 400,000 the process really slows down as more and more of the generated numbers are duplicates. I am looking for a more efficient way to perform this process, any help would be really appreciated.
Edit: - I'm generating these - http://www.nhs.uk/NHSEngland/thenhs/records/Pages/thenhsnumber.aspx
As others have mentioned, use a HashSet<T> instead of a List<T>.
Furthermore, using StringBuilder instead of simple string operations will gain you another 25%. If you can use numbers instead of strings, you win, because it only takes a third or fourth of the time.
var quantity = 400000;
var uniqueIdentifiers = new HashSet<int>();
while (uniqueIdentifiers.Count < quantity)
{
int i=0;
i = i*10 + random.Next(1,10);
i = i*10 + random.Next(1,10);
i = i*10 + random.Next(1,10);
i = i*10 + random.Next(1,10);
i = i*10 + random.Next(1,10);
i = i*10 + random.Next(1,10);
i = i*10 + random.Next(1,10);
i = i*10 + random.Next(1,10);
i = i*10 + random.Next(1,10);
uniqueIdentifiers.Add(i);
}
It takes about 270 ms on my machine for 400,000 numbers and about 700 ms for 1,000,000, and that is even without any parallelism.
Because of the use of a HashSet<T> instead of a List<T>, this algorithm runs in O(n), i.e. the duration grows linearly. 10,000,000 values therefore take about 7 seconds.
This suggestion may or may not be popular.... it depends on people's perspective. Because you haven't been too specific about what you need them for, how often, or the exact number, I will suggest a brute force approach.
I would generate a hundred thousand numbers - shouldn't take very long at all, maybe a few seconds? Then use Parallel LINQ to do a Distinct() on them to eliminate duplicates. Then use another PLINQ query to run a regex against the remainder to eliminate any with zeroes in them. Then take the top x thousand. (PLINQ is brilliant for ripping through large tasks like this). If needed, rinse and repeat until you have enough for your needs.
On a decent machine it will just about take you longer to write this simple function than it will take to run it. I would also query why you have 400K entries to test when you state you actually need "tens of thousands"?
The trick here is that you only need tens of thousands of unique numbers. Theoretically you have 9^9 = 387,420,489 possibilities, but why care when you need far fewer?
Once you realize that you can cut down on the combinations that much, creating enough unique numbers is easy:
long[] numbers = { 1, 3, 5, 7 }; //note that we just take a few digits, enough to create the number of combinations we might need
var list = (from i0 in numbers
from i1 in numbers
from i2 in numbers
from i3 in numbers
from i4 in numbers
from i5 in numbers
from i6 in numbers
from i7 in numbers
from i8 in numbers
select i0 + i1 * 10 + i2 * 100 + i3 * 1000 + i4 * 10000 + i5 * 100000 + i6 * 1000000 + i7 * 10000000 + i8 * 100000000).ToList();
This snippet creates a list of 262,144 (4^9) valid unique 9-digit numbers pretty much instantly; add more digits to the set if you need more.
Try to avoid the checks altogether by making sure that you always pick a unique number:
static char[] base9 = "123456789".ToCharArray();
static string ConvertToBase9(int value) {
int num = 9;
char[] result = new char[9];
for (int i = 8; i >= 0; --i) {
result[i] = base9[value % num];
value = value / num;
}
return new string(result);
}
public static void generateIdentifiers(int quantity) {
var uniqueIdentifiers = new List<string>(quantity);
// we have 387420489 (9^9) possible numbers of 9 digits in base 9.
// if we choose a number that is prime to that we can easily get always
// unique numbers
Random random = new Random();
int inc = 386000000;
int seed = random.Next(0, 387420489);
while (uniqueIdentifiers.Count < quantity) {
uniqueIdentifiers.Add(ConvertToBase9(seed));
seed += inc;
seed %= 387420489;
}
}
I'll try to explain the idea behind with small numbers...
Suppose you have at most 7 possible combinations. We choose a number that is prime to 7, e.g. 3, and a random starting number, e.g. 4.
At each round, we add 3 to our current number, and then we take the result modulo 7, so we get this sequence:
4 -> (4 + 3) % 7 = 0
0 -> (0 + 3) % 7 = 3
3 -> (3 + 3) % 7 = 6
6 -> (6 + 3) % 7 = 2
2 -> (2 + 3) % 7 = 5
5 -> (5 + 3) % 7 = 1
In this way, we generate all the values from 0 to 6 in a non-consecutive way. In my example, we are doing the same, but we have 9^9 possible combinations, and as a number prime to that I choose 386000000 (you just have to avoid multiples of 3).
Then, I pick up the number in the sequence and I convert it to base 9.
I hope this is clear :)
I tested it on my machine, and generating 400k unique values took ~ 1 second.
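The stepping idea can be checked with a small sketch like the one below. SteppingSketch, Gcd and Walk are hypothetical names introduced here; the point is only that a step which is coprime to the modulus visits every value exactly once before repeating.

```csharp
using System;
using System.Collections.Generic;

public static class SteppingSketch
{
    // Greatest common divisor; step and modulus are coprime when this returns 1.
    public static long Gcd(long a, long b)
    {
        while (b != 0)
        {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    // Yields `modulus` values starting at `start`, adding `step` each time.
    // The values are all distinct when Gcd(step, modulus) == 1.
    public static IEnumerable<long> Walk(long start, long step, long modulus)
    {
        long current = start % modulus;
        for (long i = 0; i < modulus; i++)
        {
            yield return current;
            current = (current + step) % modulus;
        }
    }
}
```

Walk(4, 3, 7) reproduces the small example (4, 0, 3, 6, 2, 5, 1), and Gcd(386000000, 387420489) is 1 because 387420489 is a power of 3 while 386000000 is not divisible by 3.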
Maybe this will be faster:
//we can generate a first number which, in base 9, will be between 88888888 and 888888888
//we can't start from zero because that would cause a great amount of 1 digits at the beginning
int randNumber = random.Next((int)Math.Pow(9, 8) - 1, (int)Math.Pow(9, 9));
//now we change our number to base 9, but we add 1 to each digit in our number
StringBuilder builder = new StringBuilder();
for (int i=(int)Math.Pow(9,8); i>0;i= i/9)
{
builder.Append(randNumber / i +1);
randNumber = randNumber % i;
}
id = builder.ToString();
Looking at the solutions already posted, mine seems fairly basic. But it works, and generates 1 million values in approximately 1 s (10 million in 11 s).
public static void generateIdentifiers(int quantity)
{
HashSet<int> uniqueIdentifiers = new HashSet<int>();
while (uniqueIdentifiers.Count < quantity)
{
int value = random.Next(111111111, 1000000000); // upper bound is exclusive, so 999999999 stays reachable
if (!value.ToString().Contains('0') && !uniqueIdentifiers.Contains(value))
uniqueIdentifiers.Add(value);
}
}
Use a string array or StringBuilder when working with string additions.
Moreover, your code is not efficient because, after generating many ids, your list may already hold a newly generated id, so the while loop will run more often than you need.
Use for loops and generate your ids from these loops without randomizing. If random ids are required, again use for loops to generate more than you need over a given interval, then select as many as you need from this list at random.
Use the code below to have a static list and fill it when starting your program. I will add a second piece of code later to generate a random id list. [I'm a little busy]
public static Random RANDOM = new Random();
public static List<int> randomNumbers = new List<int>();
public static List<string> randomStrings = new List<string>();
private void fillRandomNumbers()
{
int i = 100;
while (i < 1000)
{
if (i.ToString().Contains('0') == false)
{
randomNumbers.Add(i);
}
i++; // advance to the next candidate; without this the loop never ends
}
}
I think the first thing would be to use StringBuilder instead of concatenation - you'll be pleasantly surprised.
Another thing - use a more efficient data structure, for example HashSet<> or Hashtable.
If you could drop the quite odd requirement not to have zeros, then you could of course use just one random operation, and then format your resulting number the way you want.
I think @slugster is broadly right - although you could run two parallel processes, one to generate numbers, the other to verify them and add them to the list of accepted numbers when verified. Once you have enough, signal the original process to stop.
Combine this with other suggestions - using more efficient and appropriate data structures - and you should have something that works acceptably.
However the question of why you need such numbers is also significant - this requirement seems like one that should be analysed.
Something like this?
public List<string> generateIdentifiers2(int quantity)
{
var uniqueIdentifiers = new List<string>(quantity);
while (uniqueIdentifiers.Count < quantity)
{
var sb = new StringBuilder();
sb.Append(random.Next(111, 1000)); // three digits per group, nine in total
sb.Append(" ");
sb.Append(random.Next(111, 1000));
sb.Append(" ");
sb.Append(random.Next(111, 1000));
var id = sb.ToString();
id = new string(id.ToList().ConvertAll(x => x == '0' ? char.Parse(random.Next(1, 10).ToString()) : x).ToArray());
if (!uniqueIdentifiers.Contains(id))
{
uniqueIdentifiers.Add(id);
}
}
return uniqueIdentifiers;
}

Random playlist algorithm

I need to create a list of numbers from a range (for example from x to y) in a random order so that every order has an equal chance.
I need this for a music player I write in C#, to create play lists in a random order.
Any ideas?
Thanks.
EDIT: I'm not interested in changing the original list, just pick up random indexes from a range in a random order so that every order has an equal chance.
Here's what I've written so far:
public static IEnumerable<int> RandomIndexes(int count)
{
if (count > 0)
{
int[] indexes = new int[count];
int indexesCountMinus1 = count - 1;
for (int i = 0; i < count; i++)
{
indexes[i] = i;
}
Random random = new Random();
while (indexesCountMinus1 > 0)
{
int currIndex = random.Next(0, indexesCountMinus1 + 1);
yield return indexes[currIndex];
indexes[currIndex] = indexes[indexesCountMinus1];
indexesCountMinus1--;
}
yield return indexes[0];
}
}
It works, but the only problem is that I need to allocate an array of size count in memory. I'm looking for something that does not require memory allocation.
Thanks.
This can actually be tricky if you're not careful (i.e., using a naïve shuffling algorithm). Take a look at the Fisher-Yates/Knuth shuffle algorithm for proper distribution of values.
Once you have the shuffling algorithm, the rest should be easy.
Here's more detail from Jeff Atwood.
Lastly, here's Jon Skeet's implementation and description.
EDIT
I don't believe that there's a solution that satisfies your two conflicting requirements (first, to be random with no repeats and second to not allocate any additional memory). I believe you may be prematurely optimizing your solution as the memory implications should be negligible, unless you're embedded. Or, perhaps I'm just not smart enough to come up with an answer.
With that, here's code that will create an array of evenly distributed random indexes using the Knuth-Fisher-Yates algorithm (with a slight modification). You can cache the resulting array, or perform any number of optimizations depending on the rest of your implementation.
private static int[] BuildShuffledIndexArray( int size ) {
int[] array = new int[size];
Random rand = new Random();
for ( int currentIndex = array.Length - 1; currentIndex > 0; currentIndex-- ) {
int nextIndex = rand.Next( currentIndex + 1 );
Swap( array, currentIndex, nextIndex );
}
return array;
}
private static void Swap( IList<int> array, int firstIndex, int secondIndex ) {
if ( array[firstIndex] == 0 ) {
array[firstIndex] = firstIndex;
}
if ( array[secondIndex] == 0 ) {
array[secondIndex] = secondIndex;
}
int temp = array[secondIndex];
array[secondIndex] = array[firstIndex];
array[firstIndex] = temp;
}
NOTE: You can use ushort instead of int to halve the size in memory as long as you don't have more than 65,535 items in your playlist. You could always programmatically switch to int if the size exceeds ushort.MaxValue. If I, personally, added more than 65K items to a playlist, I wouldn't be shocked by increased memory utilization.
Remember, too, that this is a managed language. The VM will always reserve more memory than you are using to limit the number of times it needs to ask the OS for more RAM and to limit fragmentation.
EDIT
Okay, last try: we can look to tweak the performance/memory trade off: You could create your list of integers, then write it to disk. Then just keep a pointer to the offset in the file. Then every time you need a new number, you just have disk I/O to deal with. Perhaps you can find some balance here, and just read N-sized blocks of data into memory where N is some number you're comfortable with.
Seems like a lot of work for a shuffle algorithm, but if you're dead-set on conserving memory, then at least it's an option.
If you use a maximal linear feedback shift register, you will use O(1) of memory and roughly O(1) time. See here for a handy C implementation (two lines! woo-hoo!) and tables of feedback terms to use.
And here is a solution:
public class MaximalLFSR
{
private int GetFeedbackSize(uint v)
{
uint r = 0;
while ((v >>= 1) != 0)
{
r++;
}
if (r < 4)
r = 4;
return (int)r;
}
static uint[] _feedback = new uint[] {
0x9, 0x17, 0x30, 0x44, 0x8e,
0x108, 0x20d, 0x402, 0x829, 0x1013, 0x203d, 0x4001, 0x801f,
0x1002a, 0x2018b, 0x400e3, 0x801e1, 0x10011e, 0x2002cc, 0x400079, 0x80035e,
0x1000160, 0x20001e4, 0x4000203, 0x8000100, 0x10000235, 0x2000027d, 0x4000016f, 0x80000478
};
private uint GetFeedbackTerm(int bits)
{
if (bits < 4 || bits >= 28)
throw new ArgumentOutOfRangeException("bits");
return _feedback[bits];
}
public IEnumerable<int> RandomIndexes(int count)
{
if (count < 0)
throw new ArgumentOutOfRangeException("count");
int bitsForFeedback = GetFeedbackSize((uint)count);
Random r = new Random();
uint i = (uint)(r.Next(1, count - 1));
uint feedback = GetFeedbackTerm(bitsForFeedback);
int valuesReturned = 0;
while (valuesReturned < count)
{
if ((i & 1) != 0)
{
i = (i >> 1) ^ feedback;
}
else {
i = (i >> 1);
}
if (i <= count)
{
valuesReturned++;
yield return (int)(i-1);
}
}
}
}
Now, I selected the feedback terms (badly) at random from the link above. You could also implement a version that had multiple maximal terms and you select one of those at random, but you know what? This is pretty dang good for what you want.
Here is test code:
static void Main(string[] args)
{
while (true)
{
Console.Write("Enter a count: ");
string s = Console.ReadLine();
int count;
if (Int32.TryParse(s, out count))
{
MaximalLFSR lfsr = new MaximalLFSR();
foreach (int i in lfsr.RandomIndexes(count))
{
Console.Write(i + ", ");
}
}
Console.WriteLine("Done.");
}
}
Be aware that maximal LFSRs never generate 0. I've hacked around this by returning the i term - 1. This works well enough. Also, since you want to guarantee uniqueness, I ignore anything out of range - the LFSR only generates sequences up to powers of two, so in high ranges it will generate, in the worst case, 2x-1 too many values. These will get skipped - that will still be faster than FYK.
Personally, for a music player, I wouldn't generate a shuffled list, and then play that, then generate another shuffled list when that runs out, but do something more like:
IEnumerable<Song> GetSongOrder(List<Song> allSongs)
{
    var random = new Random();
    var playOrder = new List<Song>();
    while (true)
    {
        // this step assigns an integer weight to each song,
        // corresponding to how likely it is to be played next.
        // in a better implementation, this would look at the total number of
        // songs as well, and provide a smoother ramp up/down.
        var weights = allSongs.Select(x =>
        {
            int last = playOrder.LastIndexOf(x); // -1 if not played yet
            // recently played songs get weight 1, all others weight 50
            return last >= 0 && last > playOrder.Count - 10 ? 1 : 50;
        }).ToList();
        int position = random.Next(weights.Sum());
        foreach (int i in Enumerable.Range(0, allSongs.Count))
        {
            position -= weights[i];
            if (position < 0)
            {
                var song = allSongs[i];
                playOrder.Add(song);
                yield return song;
                break;
            }
        }
        // trim playOrder to prevent infinite memory here as well.
        if (playOrder.Count > allSongs.Count * 10)
            playOrder = playOrder.Skip(allSongs.Count * 8).ToList();
    }
}
This picks songs at random while favoring those that haven't been played recently. It gives "smoother" transitions from the end of one shuffle to the next: with back-to-back shuffles, the first song of the next shuffle could be the same as the last song of the previous shuffle with probability 1/(total songs), whereas this algorithm has a lower (and configurable) chance of replaying one of the last x songs.
Unless you shuffle the original song list (which you said you don't want to do), you are going to have to allocate some additional memory to accomplish what you're after.
If you generate the random permutation of song indices beforehand (as you are doing), you obviously have to allocate some non-trivial amount of memory to store it, either encoded or as a list.
If the user doesn't need to be able to see the list, you could generate the random song order on the fly: After each song, pick another random song from the pool of unplayed songs. You still have to keep track of which songs have already been played, but you can use a bitfield for that. If you have 10000 songs, you just need 10000 bits (1250 bytes), each one representing whether the song has been played yet.
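A minimal sketch of the bitfield idea, assuming songs are identified by their index in the list. BitfieldShuffle and PlayOrder are names made up for illustration. Instead of re-rolling until an unplayed song comes up, this version picks "the k-th unplayed song" and scans for it, which costs one linear pass per pick but always terminates.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class BitfieldShuffle
{
    // Yields every index in [0, songCount) once, in random order,
    // using one bit of bookkeeping per song.
    public static IEnumerable<int> PlayOrder(int songCount, Random random)
    {
        var played = new BitArray(songCount);   // all bits start out false

        for (int remaining = songCount; remaining > 0; remaining--)
        {
            // Choose which of the remaining unplayed songs to take...
            int k = random.Next(remaining);

            // ...then walk the bitfield to find the k-th unplayed one.
            for (int i = 0; i < songCount; i++)
            {
                if (played[i])
                    continue;
                if (k-- == 0)
                {
                    played[i] = true;
                    yield return i;
                    break;
                }
            }
        }
    }
}
```

For 10000 songs the BitArray holds 10000 bits, in line with the 1250 bytes estimated above (plus a small fixed object overhead). Because the method is an iterator, the next song is only chosen when the player asks for it.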
I don't know your exact limitations, but I have to wonder if the memory required to store a playlist is significant compared to the amount required for playing audio.
There are a number of methods of generating permutations without needing to store the state. See this question.
I think you should stick to your current solution (the one in your edit).
To reorder with no repetitions and without making your code behave unreliably, you have to track what you have already used, either directly by keeping a list of unused indexes or indirectly by swapping within the original list.
I suggest checking it in the context of the working application, i.e. whether it's of any significance compared to the memory used by other pieces of the system.
From a logical standpoint, it is possible. Given a list of n songs, there are n! permutations; if you assign each permutation a number from 1 to n! (or 0 to n!-1 :-D) and pick one of those numbers at random, you can then store the number of the permutation that you are currently using, along with the original list and the index of the current song within the permutation.
For example, if you have a list of songs {1, 2, 3}, your permutations are:
0: {1, 2, 3}
1: {1, 3, 2}
2: {2, 1, 3}
3: {2, 3, 1}
4: {3, 1, 2}
5: {3, 2, 1}
So the only data I need to track is the original list ({1, 2, 3}), the current song index (e.g. 1) and the index of the permutation (e.g. 3). Then, if I want to find the next song to play, I know it's the third (index 2, zero-based) song of permutation 3, i.e. Song 1.
However, this method relies on you having an efficient means of determining the ith song of the jth permutation, which, until I've had a chance to think (or someone with a stronger mathematical background than I can interject), is equivalent to "then a miracle happens". But the principle is there.
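One standard way to fill in the "miracle" step is the factorial number system: dividing the permutation number by (n-1)!, (n-2)!, ... tells you which of the remaining items comes next. The sketch below is an illustration under assumptions, not a drop-in solution: PermutationIndex and Unrank are made-up names, permutations are numbered lexicographically from 0 as in the table above, and n is capped at 20 because 21! no longer fits in a long.

```csharp
using System;
using System.Collections.Generic;

public static class PermutationIndex
{
    // Returns the permutation of 0..n-1 whose lexicographic number is k (0-based).
    public static int[] Unrank(int n, long k)
    {
        if (n < 0 || n > 20)
            throw new ArgumentOutOfRangeException(nameof(n));   // 21! overflows long

        // fact[i] = i!
        var fact = new long[n + 1];
        fact[0] = 1;
        for (int i = 1; i <= n; i++)
            fact[i] = fact[i - 1] * i;

        if (k < 0 || k >= fact[n])
            throw new ArgumentOutOfRangeException(nameof(k));

        var remaining = new List<int>(n);
        for (int i = 0; i < n; i++)
            remaining.Add(i);

        var result = new int[n];
        for (int pos = 0; pos < n; pos++)
        {
            long f = fact[n - 1 - pos];   // permutations per choice at this position
            int idx = (int)(k / f);       // which remaining item goes here
            k %= f;
            result[pos] = remaining[idx];
            remaining.RemoveAt(idx);
        }
        return result;
    }
}
```

For the example above, Unrank(3, 3) returns the indexes {1, 2, 0}, i.e. songs {2, 3, 1}, which matches permutation 3 in the table. Note the limits: the working list is as large as the playlist itself, so this does not save memory over storing a shuffled index list, and going past 20 songs would need something like System.Numerics.BigInteger for the permutation number.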
If memory really becomes a concern beyond a certain number of records, and it's safe to say that once that memory boundary is reached the list has enough items that occasional repeats don't matter (as long as the same song is never played twice in a row), I would use a combination method.
Case 1: If count < max memory constraint, generate the playlist ahead of time and use Knuth shuffle (see Jon Skeet's implementation, mentioned in other answers).
Case 2: If count >= max memory constraint, the song to be played will be determined at run time (I'd do it as soon as the song starts playing so the next song is already generated by the time the current song ends). Save the last X songs played (X being the max memory constraint, or some token value), generate a random number (R) between 1 and the song count, and if R is one of the last X songs played, generate a new R until it is not in that list. Play that song.
Your max memory constraints will always be upheld, although performance can suffer in case 2 if you've played a lot of songs/get repeat random numbers frequently by chance.
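Case 2 could be sketched roughly as follows. RecentHistoryPicker and its members are hypothetical names, song numbers are 0-based here rather than 1-based, and the history size is clamped below the song count so that at least one song is always eligible.

```csharp
using System;
using System.Collections.Generic;

public sealed class RecentHistoryPicker
{
    private readonly Random _random;
    private readonly int _songCount;
    private readonly int _historySize;
    private readonly Queue<int> _order = new Queue<int>();       // oldest pick first
    private readonly HashSet<int> _recent = new HashSet<int>();  // fast membership test

    public RecentHistoryPicker(int songCount, int historySize, Random random)
    {
        if (songCount < 1)
            throw new ArgumentOutOfRangeException(nameof(songCount));
        _songCount = songCount;
        // The history must be smaller than the library, or the retry loop could never end.
        _historySize = Math.Max(0, Math.Min(historySize, songCount - 1));
        _random = random;
    }

    // Returns a random song index that is not among the last X picks.
    public int Next()
    {
        int r;
        do
        {
            r = _random.Next(_songCount);
        } while (_recent.Contains(r));

        if (_historySize > 0)
        {
            _order.Enqueue(r);
            _recent.Add(r);
            if (_order.Count > _historySize)
                _recent.Remove(_order.Dequeue());   // forget the oldest pick
        }
        return r;
    }
}
```

Memory use is bounded by the history size rather than the playlist size. The retry loop is where the performance cost mentioned above shows up: the closer the history size gets to the song count, the more re-rolls each pick needs on average.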
You could use a trick we use in SQL Server to put sets in random order: sort by a GUID. The values are always distributed evenly at random.
private IEnumerable<int> RandomIndexes(int startIndexInclusive, int endIndexInclusive)
{
if (endIndexInclusive < startIndexInclusive)
throw new ArgumentException("endIndex must be equal to or higher than startIndex");
List<int> originalList = new List<int>(endIndexInclusive - startIndexInclusive + 1);
for (int i = startIndexInclusive; i <= endIndexInclusive; i++)
originalList.Add(i);
return from i in originalList
orderby Guid.NewGuid()
select i;
}
You're going to have to allocate some memory, but it doesn't have to be a lot. You can reduce the memory footprint (by how much I'm unsure, as I don't know that much about the guts of C#) by using a bool array instead of an int array. In the best case this would use only (count / 8) bytes of memory, which isn't too bad, but a C# bool[] actually stores one byte per element, so expect about count bytes; System.Collections.BitArray packs the flags into single bits if that matters.
public static IEnumerable<int> RandomIndexes(int count) {
Random rand = new Random();
bool[] used = new bool[count];
int i;
for (int counter = 0; counter < count; counter++) {
while (used[i = rand.Next(count)]); //i = some random unused value
used[i] = true;
yield return i;
}
}
Hope that helps!
As many others have said you should implement THEN optimize, and only optimize the parts that need it (which you check on with a profiler). I offer a (hopefully) elegant method of getting the list you need, which doesn't really care so much about performance:
using System;
using System.Collections.Generic;
using System.Linq;
namespace Test
{
class Program
{
static void Main(string[] a)
{
Random random = new Random();
List<int> list1 = new List<int>(); //source list
HashSet<int> seen = new HashSet<int>(); //indexes already produced
List<int> list2 = random.SequenceWhile(
(i) => !seen.Add(i), //skip i if it was produced before
() => seen.Count < list1.Count, //continue until every index has been produced
list1.Count).ToList();
}
}
public static class RandomExtensions
{
public static IEnumerable<int> SequenceWhile(
this Random random,
Func<int, bool> shouldSkip,
Func<bool> continuationCondition,
int maxValue)
{
int current = random.Next(maxValue);
while (continuationCondition())
{
if (!shouldSkip(current))
{
yield return current;
}
current = random.Next(maxValue);
}
}
}
}
It is pretty much impossible to do it without allocating extra memory. If you're worried about the amount of extra memory allocated, you could always pick a random subset and shuffle between those. You'll get repeats before every song is played, but with a sufficiently large subset I'll warrant few people will notice.
const int MaxItemsToShuffle = 20;
public static IEnumerable<int> RandomIndexes(int count)
{
Random random = new Random();
int indexCount = Math.Min(count, MaxItemsToShuffle);
int[] indexes = new int[indexCount];
if (count > MaxItemsToShuffle)
{
int cur = 0, subsetCount = MaxItemsToShuffle;
for (int i = 0; i < count; i += 1)
{
// selection sampling: take item i with probability (slots left) / (items left)
if (random.NextDouble() < (double)subsetCount / (count - i))
{
indexes[cur] = i;
cur += 1;
subsetCount -= 1;
}
}
}
else
{
for (int i = 0; i < count; i += 1)
{
indexes[i] = i;
}
}
for (int i = indexCount; i > 0; i -= 1)
{
int curIndex = random.Next(0, i);
yield return indexes[curIndex];
indexes[curIndex] = indexes[i - 1];
}
}
