Best way to randomize an array with .NET - c#

What is the best way to randomize an array of strings with .NET? My array contains about 500 strings and I'd like to create a new Array with the same strings but in a random order.
Please include a C# example in your answer.

The following implementation uses the Fisher-Yates algorithm AKA the Knuth Shuffle. It runs in O(n) time and shuffles in place, so is better performing than the 'sort by random' technique, although it is more lines of code. See here for some comparative performance measurements. I have used System.Random, which is fine for non-cryptographic purposes.*
static class RandomExtensions
{
public static void Shuffle<T> (this Random rng, T[] array)
{
int n = array.Length;
while (n > 1)
{
int k = rng.Next(n--);
T temp = array[n];
array[n] = array[k];
array[k] = temp;
}
}
}
Usage:
var array = new int[] {1, 2, 3, 4};
var rng = new Random();
rng.Shuffle(array);
rng.Shuffle(array); // different order from first call to Shuffle
* For longer arrays, in order to make the (extremely large) number of permutations equally probable it would be necessary to run a pseudo-random number generator (PRNG) through many iterations for each swap to produce enough entropy. For a 500-element array only a very small fraction of the possible 500! permutations will be possible to obtain using a PRNG. Nevertheless, the Fisher-Yates algorithm is unbiased and therefore the shuffle will be as good as the RNG you use.

If you're on .NET 3.5, you can use the following IEnumerable coolness:
Random rnd=new Random();
string[] MyRandomArray = MyArray.OrderBy(x => rnd.Next()).ToArray();
Edit: and here's the corresponding VB.NET code:
Dim rnd As New System.Random
Dim MyRandomArray = MyArray.OrderBy(Function() rnd.Next()).ToArray()
Second edit, in response to remarks that System.Random "isn't threadsafe" and "only suitable for toy apps" due to returning a time-based sequence: as used in my example, Random() is perfectly thread-safe, unless you're allowing the routine in which you randomize the array to be re-entered, in which case you'll need something like lock (MyRandomArray) anyway in order not to corrupt your data, which will protect rnd as well.
Also, it should be well-understood that System.Random as a source of entropy isn't very strong. As noted in the MSDN documentation, you should use something derived from System.Security.Cryptography.RandomNumberGenerator if you're doing anything security-related. For example:
using System.Security.Cryptography;
...
RNGCryptoServiceProvider rnd = new RNGCryptoServiceProvider();
string[] MyRandomArray = MyArray.OrderBy(x => GetNextInt32(rnd)).ToArray();
...
static int GetNextInt32(RNGCryptoServiceProvider rnd)
{
byte[] randomInt = new byte[4];
rnd.GetBytes(randomInt);
return Convert.ToInt32(randomInt[0]);
}

You're looking for a shuffling algorithm, right?
Okay, there are two ways to do this: the clever-but-people-always-seem-to-misunderstand-it-and-get-it-wrong-so-maybe-its-not-that-clever-after-all way, and the dumb-as-rocks-but-who-cares-because-it-works way.
Dumb way
Create a duplicate of your first array, but tag each string should with a random number.
Sort the duplicate array with respect to the random number.
This algorithm works well, but make sure that your random number generator is unlikely to tag two strings with the same number. Because of the so-called Birthday Paradox, this happens more often than you might expect. Its time complexity is O(n log n).
Clever way
I'll describe this as a recursive algorithm:
To shuffle an array of size n (indices in the range [0..n-1]):
if n = 0
do nothing
if n > 0
(recursive step) shuffle the first n-1 elements of the array
choose a random index, x, in the range [0..n-1]
swap the element at index n-1 with the element at index x
The iterative equivalent is to walk an iterator through the array, swapping with random elements as you go along, but notice that you cannot swap with an element after the one that the iterator points to. This is a very common mistake, and leads to a biased shuffle.
Time complexity is O(n).

This algorithm is simple but not efficient, O(N2). All the "order by" algorithms are typically O(N log N). It probably doesn't make a difference below hundreds of thousands of elements but it would for large lists.
var stringlist = ... // add your values to stringlist
var r = new Random();
var res = new List<string>(stringlist.Count);
while (stringlist.Count >0)
{
var i = r.Next(stringlist.Count);
res.Add(stringlist[i]);
stringlist.RemoveAt(i);
}
The reason why it's O(N2) is subtle: List.RemoveAt() is a O(N) operation unless you remove in order from the end.

You can also make an extention method out of Matt Howells. Example.
namespace System
{
public static class MSSystemExtenstions
{
private static Random rng = new Random();
public static void Shuffle<T>(this T[] array)
{
rng = new Random();
int n = array.Length;
while (n > 1)
{
int k = rng.Next(n);
n--;
T temp = array[n];
array[n] = array[k];
array[k] = temp;
}
}
}
}
Then you can just use it like:
string[] names = new string[] {
"Aaron Moline1",
"Aaron Moline2",
"Aaron Moline3",
"Aaron Moline4",
"Aaron Moline5",
"Aaron Moline6",
"Aaron Moline7",
"Aaron Moline8",
"Aaron Moline9",
};
names.Shuffle<string>();

Just thinking off the top of my head, you could do this:
public string[] Randomize(string[] input)
{
List<string> inputList = input.ToList();
string[] output = new string[input.Length];
Random randomizer = new Random();
int i = 0;
while (inputList.Count > 0)
{
int index = r.Next(inputList.Count);
output[i++] = inputList[index];
inputList.RemoveAt(index);
}
return (output);
}

Randomizing the array is intensive as you have to shift around a bunch of strings. Why not just randomly read from the array? In the worst case you could even create a wrapper class with a getNextString(). If you really do need to create a random array then you could do something like
for i = 0 -> i= array.length * 5
swap two strings in random places
The *5 is arbitrary.

public static void Shuffle(object[] arr)
{
Random rand = new Random();
for (int i = arr.Length - 1; i >= 1; i--)
{
int j = rand.Next(i + 1);
object tmp = arr[j];
arr[j] = arr[i];
arr[i] = tmp;
}
}

Generate an array of random floats or ints of the same length. Sort that array, and do corresponding swaps on your target array.
This yields a truly independent sort.

Ok, this is clearly a bump from my side (apologizes...), but I often use a quite general and cryptographically strong method.
public static class EnumerableExtensions
{
static readonly RNGCryptoServiceProvider RngCryptoServiceProvider = new RNGCryptoServiceProvider();
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> enumerable)
{
var randomIntegerBuffer = new byte[4];
Func<int> rand = () =>
{
RngCryptoServiceProvider.GetBytes(randomIntegerBuffer);
return BitConverter.ToInt32(randomIntegerBuffer, 0);
};
return from item in enumerable
let rec = new {item, rnd = rand()}
orderby rec.rnd
select rec.item;
}
}
Shuffle() is an extension on any IEnumerable so getting, say, numbers from 0 to 1000 in random order in a list can be done with
Enumerable.Range(0,1000).Shuffle().ToList()
This method also wont give any surprises when it comes to sorting, since the sort value is generated and remembered exactly once per element in the sequence.

Random r = new Random();
List<string> list = new List(originalArray);
List<string> randomStrings = new List();
while(list.Count > 0)
{
int i = r.Random(list.Count);
randomStrings.Add(list[i]);
list.RemoveAt(i);
}

Jacco, your solution ising a custom IComparer isn't safe. The Sort routines require the comparer to conform to several requirements in order to function properly. First among them is consistency. If the comparer is called on the same pair of objects, it must always return the same result. (the comparison must also be transitive).
Failure to meet these requirements can cause any number of problems in the sorting routine including the possibility of an infinite loop.
Regarding the solutions that associate a random numeric value with each entry and then sort by that value, these are lead to an inherent bias in the output because any time two entries are assigned the same numeric value, the randomness of the output will be compromised. (In a "stable" sort routine, whichever is first in the input will be first in the output. Array.Sort doesn't happen to be stable, but there is still a bias based on the partitioning done by the Quicksort algorithm).
You need to do some thinking about what level of randomness you require. If you are running a poker site where you need cryptographic levels of randomness to protect against a determined attacker you have very different requirements from someone who just wants to randomize a song playlist.
For song-list shuffling, there's no problem using a seeded PRNG (like System.Random). For a poker site, it's not even an option and you need to think about the problem a lot harder than anyone is going to do for you on stackoverflow. (using a cryptographic RNG is only the beginning, you need to ensure that your algorithm doesn't introduce a bias, that you have sufficient sources of entropy, and that you don't expose any internal state that would compromise subsequent randomness).

This post has already been pretty well answered - use a Durstenfeld implementation of the Fisher-Yates shuffle for a fast and unbiased result. There have even been some implementations posted, though I note some are actually incorrect.
I wrote a couple of posts a while back about implementing full and partial shuffles using this technique, and (this second link is where I'm hoping to add value) also a follow-up post about how to check whether your implementation is unbiased, which can be used to check any shuffle algorithm. You can see at the end of the second post the effect of a simple mistake in the random number selection can make.

You don't need complicated algorithms.
Just one simple line:
Random random = new Random();
array.ToList().Sort((x, y) => random.Next(-1, 1)).ToArray();
Note that we need to convert the Array to a List first, if you don't use List in the first place.
Also, mind that this is not efficient for very large arrays! Otherwise it's clean & simple.

This is a complete working Console solution based on the example provided in here:
class Program
{
static string[] words1 = new string[] { "brown", "jumped", "the", "fox", "quick" };
static void Main()
{
var result = Shuffle(words1);
foreach (var i in result)
{
Console.Write(i + " ");
}
Console.ReadKey();
}
static string[] Shuffle(string[] wordArray) {
Random random = new Random();
for (int i = wordArray.Length - 1; i > 0; i--)
{
int swapIndex = random.Next(i + 1);
string temp = wordArray[i];
wordArray[i] = wordArray[swapIndex];
wordArray[swapIndex] = temp;
}
return wordArray;
}
}

int[] numbers = {0,1,2,3,4,5,6,7,8,9};
List<int> numList = new List<int>();
numList.AddRange(numbers);
Console.WriteLine("Original Order");
for (int i = 0; i < numList.Count; i++)
{
Console.Write(String.Format("{0} ",numList[i]));
}
Random random = new Random();
Console.WriteLine("\n\nRandom Order");
for (int i = 0; i < numList.Capacity; i++)
{
int randomIndex = random.Next(numList.Count);
Console.Write(String.Format("{0} ", numList[randomIndex]));
numList.RemoveAt(randomIndex);
}
Console.ReadLine();

Could be:
Random random = new();
string RandomWord()
{
const string CHARS = "abcdefghijklmnoprstuvwxyz";
int n = random.Next(CHARS.Length);
return string.Join("", CHARS.OrderBy(x => random.Next()).ToArray())[0..n];
}

Here's a simple way using OLINQ:
// Input array
List<String> lst = new List<string>();
for (int i = 0; i < 500; i += 1) lst.Add(i.ToString());
// Output array
List<String> lstRandom = new List<string>();
// Randomize
Random rnd = new Random();
lstRandom.AddRange(from s in lst orderby rnd.Next(100) select s);

private ArrayList ShuffleArrayList(ArrayList source)
{
ArrayList sortedList = new ArrayList();
Random generator = new Random();
while (source.Count > 0)
{
int position = generator.Next(source.Count);
sortedList.Add(source[position]);
source.RemoveAt(position);
}
return sortedList;
}

Related

does fisher yates shuffle based on index? [duplicate]

What is the best way to randomize an array of strings with .NET? My array contains about 500 strings and I'd like to create a new Array with the same strings but in a random order.
Please include a C# example in your answer.
The following implementation uses the Fisher-Yates algorithm AKA the Knuth Shuffle. It runs in O(n) time and shuffles in place, so is better performing than the 'sort by random' technique, although it is more lines of code. See here for some comparative performance measurements. I have used System.Random, which is fine for non-cryptographic purposes.*
static class RandomExtensions
{
public static void Shuffle<T> (this Random rng, T[] array)
{
int n = array.Length;
while (n > 1)
{
int k = rng.Next(n--);
T temp = array[n];
array[n] = array[k];
array[k] = temp;
}
}
}
Usage:
var array = new int[] {1, 2, 3, 4};
var rng = new Random();
rng.Shuffle(array);
rng.Shuffle(array); // different order from first call to Shuffle
* For longer arrays, in order to make the (extremely large) number of permutations equally probable it would be necessary to run a pseudo-random number generator (PRNG) through many iterations for each swap to produce enough entropy. For a 500-element array only a very small fraction of the possible 500! permutations will be possible to obtain using a PRNG. Nevertheless, the Fisher-Yates algorithm is unbiased and therefore the shuffle will be as good as the RNG you use.
If you're on .NET 3.5, you can use the following IEnumerable coolness:
Random rnd=new Random();
string[] MyRandomArray = MyArray.OrderBy(x => rnd.Next()).ToArray();
Edit: and here's the corresponding VB.NET code:
Dim rnd As New System.Random
Dim MyRandomArray = MyArray.OrderBy(Function() rnd.Next()).ToArray()
Second edit, in response to remarks that System.Random "isn't threadsafe" and "only suitable for toy apps" due to returning a time-based sequence: as used in my example, Random() is perfectly thread-safe, unless you're allowing the routine in which you randomize the array to be re-entered, in which case you'll need something like lock (MyRandomArray) anyway in order not to corrupt your data, which will protect rnd as well.
Also, it should be well-understood that System.Random as a source of entropy isn't very strong. As noted in the MSDN documentation, you should use something derived from System.Security.Cryptography.RandomNumberGenerator if you're doing anything security-related. For example:
using System.Security.Cryptography;
...
RNGCryptoServiceProvider rnd = new RNGCryptoServiceProvider();
string[] MyRandomArray = MyArray.OrderBy(x => GetNextInt32(rnd)).ToArray();
...
static int GetNextInt32(RNGCryptoServiceProvider rnd)
{
byte[] randomInt = new byte[4];
rnd.GetBytes(randomInt);
return Convert.ToInt32(randomInt[0]);
}
You're looking for a shuffling algorithm, right?
Okay, there are two ways to do this: the clever-but-people-always-seem-to-misunderstand-it-and-get-it-wrong-so-maybe-its-not-that-clever-after-all way, and the dumb-as-rocks-but-who-cares-because-it-works way.
Dumb way
Create a duplicate of your first array, but tag each string should with a random number.
Sort the duplicate array with respect to the random number.
This algorithm works well, but make sure that your random number generator is unlikely to tag two strings with the same number. Because of the so-called Birthday Paradox, this happens more often than you might expect. Its time complexity is O(n log n).
Clever way
I'll describe this as a recursive algorithm:
To shuffle an array of size n (indices in the range [0..n-1]):
if n = 0
do nothing
if n > 0
(recursive step) shuffle the first n-1 elements of the array
choose a random index, x, in the range [0..n-1]
swap the element at index n-1 with the element at index x
The iterative equivalent is to walk an iterator through the array, swapping with random elements as you go along, but notice that you cannot swap with an element after the one that the iterator points to. This is a very common mistake, and leads to a biased shuffle.
Time complexity is O(n).
This algorithm is simple but not efficient, O(N2). All the "order by" algorithms are typically O(N log N). It probably doesn't make a difference below hundreds of thousands of elements but it would for large lists.
var stringlist = ... // add your values to stringlist
var r = new Random();
var res = new List<string>(stringlist.Count);
while (stringlist.Count >0)
{
var i = r.Next(stringlist.Count);
res.Add(stringlist[i]);
stringlist.RemoveAt(i);
}
The reason why it's O(N2) is subtle: List.RemoveAt() is a O(N) operation unless you remove in order from the end.
You can also make an extention method out of Matt Howells. Example.
namespace System
{
public static class MSSystemExtenstions
{
private static Random rng = new Random();
public static void Shuffle<T>(this T[] array)
{
rng = new Random();
int n = array.Length;
while (n > 1)
{
int k = rng.Next(n);
n--;
T temp = array[n];
array[n] = array[k];
array[k] = temp;
}
}
}
}
Then you can just use it like:
string[] names = new string[] {
"Aaron Moline1",
"Aaron Moline2",
"Aaron Moline3",
"Aaron Moline4",
"Aaron Moline5",
"Aaron Moline6",
"Aaron Moline7",
"Aaron Moline8",
"Aaron Moline9",
};
names.Shuffle<string>();
Just thinking off the top of my head, you could do this:
public string[] Randomize(string[] input)
{
List<string> inputList = input.ToList();
string[] output = new string[input.Length];
Random randomizer = new Random();
int i = 0;
while (inputList.Count > 0)
{
int index = r.Next(inputList.Count);
output[i++] = inputList[index];
inputList.RemoveAt(index);
}
return (output);
}
Randomizing the array is intensive as you have to shift around a bunch of strings. Why not just randomly read from the array? In the worst case you could even create a wrapper class with a getNextString(). If you really do need to create a random array then you could do something like
for i = 0 -> i= array.length * 5
swap two strings in random places
The *5 is arbitrary.
public static void Shuffle(object[] arr)
{
Random rand = new Random();
for (int i = arr.Length - 1; i >= 1; i--)
{
int j = rand.Next(i + 1);
object tmp = arr[j];
arr[j] = arr[i];
arr[i] = tmp;
}
}
Generate an array of random floats or ints of the same length. Sort that array, and do corresponding swaps on your target array.
This yields a truly independent sort.
Ok, this is clearly a bump from my side (apologizes...), but I often use a quite general and cryptographically strong method.
public static class EnumerableExtensions
{
static readonly RNGCryptoServiceProvider RngCryptoServiceProvider = new RNGCryptoServiceProvider();
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> enumerable)
{
var randomIntegerBuffer = new byte[4];
Func<int> rand = () =>
{
RngCryptoServiceProvider.GetBytes(randomIntegerBuffer);
return BitConverter.ToInt32(randomIntegerBuffer, 0);
};
return from item in enumerable
let rec = new {item, rnd = rand()}
orderby rec.rnd
select rec.item;
}
}
Shuffle() is an extension on any IEnumerable so getting, say, numbers from 0 to 1000 in random order in a list can be done with
Enumerable.Range(0,1000).Shuffle().ToList()
This method also wont give any surprises when it comes to sorting, since the sort value is generated and remembered exactly once per element in the sequence.
Random r = new Random();
List<string> list = new List(originalArray);
List<string> randomStrings = new List();
while(list.Count > 0)
{
int i = r.Random(list.Count);
randomStrings.Add(list[i]);
list.RemoveAt(i);
}
Jacco, your solution ising a custom IComparer isn't safe. The Sort routines require the comparer to conform to several requirements in order to function properly. First among them is consistency. If the comparer is called on the same pair of objects, it must always return the same result. (the comparison must also be transitive).
Failure to meet these requirements can cause any number of problems in the sorting routine including the possibility of an infinite loop.
Regarding the solutions that associate a random numeric value with each entry and then sort by that value, these are lead to an inherent bias in the output because any time two entries are assigned the same numeric value, the randomness of the output will be compromised. (In a "stable" sort routine, whichever is first in the input will be first in the output. Array.Sort doesn't happen to be stable, but there is still a bias based on the partitioning done by the Quicksort algorithm).
You need to do some thinking about what level of randomness you require. If you are running a poker site where you need cryptographic levels of randomness to protect against a determined attacker you have very different requirements from someone who just wants to randomize a song playlist.
For song-list shuffling, there's no problem using a seeded PRNG (like System.Random). For a poker site, it's not even an option and you need to think about the problem a lot harder than anyone is going to do for you on stackoverflow. (using a cryptographic RNG is only the beginning, you need to ensure that your algorithm doesn't introduce a bias, that you have sufficient sources of entropy, and that you don't expose any internal state that would compromise subsequent randomness).
This post has already been pretty well answered - use a Durstenfeld implementation of the Fisher-Yates shuffle for a fast and unbiased result. There have even been some implementations posted, though I note some are actually incorrect.
I wrote a couple of posts a while back about implementing full and partial shuffles using this technique, and (this second link is where I'm hoping to add value) also a follow-up post about how to check whether your implementation is unbiased, which can be used to check any shuffle algorithm. You can see at the end of the second post the effect of a simple mistake in the random number selection can make.
You don't need complicated algorithms.
Just one simple line:
Random random = new Random();
array.ToList().Sort((x, y) => random.Next(-1, 1)).ToArray();
Note that we need to convert the Array to a List first, if you don't use List in the first place.
Also, mind that this is not efficient for very large arrays! Otherwise it's clean & simple.
This is a complete working Console solution based on the example provided in here:
class Program
{
static string[] words1 = new string[] { "brown", "jumped", "the", "fox", "quick" };
static void Main()
{
var result = Shuffle(words1);
foreach (var i in result)
{
Console.Write(i + " ");
}
Console.ReadKey();
}
static string[] Shuffle(string[] wordArray) {
Random random = new Random();
for (int i = wordArray.Length - 1; i > 0; i--)
{
int swapIndex = random.Next(i + 1);
string temp = wordArray[i];
wordArray[i] = wordArray[swapIndex];
wordArray[swapIndex] = temp;
}
return wordArray;
}
}
int[] numbers = {0,1,2,3,4,5,6,7,8,9};
List<int> numList = new List<int>();
numList.AddRange(numbers);
Console.WriteLine("Original Order");
for (int i = 0; i < numList.Count; i++)
{
Console.Write(String.Format("{0} ",numList[i]));
}
Random random = new Random();
Console.WriteLine("\n\nRandom Order");
for (int i = 0; i < numList.Capacity; i++)
{
int randomIndex = random.Next(numList.Count);
Console.Write(String.Format("{0} ", numList[randomIndex]));
numList.RemoveAt(randomIndex);
}
Console.ReadLine();
Could be:
Random random = new();
string RandomWord()
{
const string CHARS = "abcdefghijklmnoprstuvwxyz";
int n = random.Next(CHARS.Length);
return string.Join("", CHARS.OrderBy(x => random.Next()).ToArray())[0..n];
}
Here's a simple way using OLINQ:
// Input array
List<String> lst = new List<string>();
for (int i = 0; i < 500; i += 1) lst.Add(i.ToString());
// Output array
List<String> lstRandom = new List<string>();
// Randomize
Random rnd = new Random();
lstRandom.AddRange(from s in lst orderby rnd.Next(100) select s);
private ArrayList ShuffleArrayList(ArrayList source)
{
ArrayList sortedList = new ArrayList();
Random generator = new Random();
while (source.Count > 0)
{
int position = generator.Next(source.Count);
sortedList.Add(source[position]);
source.RemoveAt(position);
}
return sortedList;
}

Random Values for an Array?

I'm looking for some help with my array. to put it in context im trying to create a console application that randomly generates a 4-digit code for the user to guess.
to do this I need an array of [3] and they need to have random numbers assigned to it.
int[] secretCode = new int[3];
secretCode[0] =
secretCode[1] =
secretCode[2] =
secretCode[3] =
My concern is what would i put here to make them generate random numbers?
thank you for your help in advance.
You can use Linq and the Random class
Random rnd = new Random();
int[] secretCode =
Enumerable.Repeat(0, 4).Select(i => rnd.Next(1000, 10000)).ToArray();
or more traditionally
int SIZE = 4;
Random rnd = new Random();
int[] secretCode = new int[SIZE];
for (int i = 0; i < SIZE; i++)
{
secretCode[i] = rnd.Next(1000, 10000);
}
Note that there is a possibility of creating the same code more than once. The solution assumes you want something in the range 1000..9999 (10000 is an exclusive upper bound) since the code is to be 4 digits. If you would also allow smaller numbers, just adjust the Select portion.
As a side note, do not create a new instance of Random inside the select, as it would likely be reseeded to the same seed, based on the system time, resulting in the same "random" number over and over. Also Random is fast, but not cryptographically strong. If you need cryptographically valid randomness use RNGCryptoServiceProvider in place of Random.

Unique number in random?

I know there are many questions on the net about it ,but I would like to know why my method fails
What Am i Doing wrong?
public class Generator
{
private static readonly Random random = new Random();
private static readonly object SyncLock = new object();
public static int GetRandomNumber(int min, int max)
{
lock (SyncLock)
{
return random.Next(min, max);
}
}
}
[TestFixture]
public class Class1
{
[Test]
public void SimpleTest()
{
var numbers=new List<int>();
for (int i = 1; i < 10000; i++)
{
var random = Generator.GetRandomNumber(1,10000);
numbers.Add(random);
}
CollectionAssert.AllItemsAreUnique(numbers);
}
}
EDIT
Test method is failing!! Sorry for not mentioning
Thanks for your time and suggestions
How can you possibly expect a sequence of 10,000 random numbers from a set of 10,000 possible values to be all unique unless you are extremely lucky? What you are expecting is wrong.
Flip a coin twice. Do you truly expect TH and HT to be the only possible sequences?
What makes you think random numbers should work any differently?
This output from a random number generator is possible:
1, 1, 1, 1, 1, 1, ..., 1
So is this:
1, 2, 3, 4, 5, 6, ..., 10000
In fact, both of those sequences are equally likely!
You seem to be under the misapprehension that the Random class generates a sequence of unique, though apparently random numbers. This is simply not the case; randomness implies that the next number could be any of the possible choices, not just any except one I've seen before.
That being the case, it is entirely unsurprising that your test fails: the probability that 10000 randomly generated integers (between 1 and 10000 no less) are unique is minuscule.
Random != Unique
The point here is that your code should model your problem and yours really doesn't. Random is not equal to unique. If you want unique, you need to get your set of values and shuffle them.
If you truly want random numbers, you can't expect them to be unique. If your (P)RNG offers an even distribution, then over many trials you should see similar counts of every value (see the Law of Large Numbers). Cases may show up that seem "wrong" but you can't discount that you hit that case by chance.
public static void FisherYatesShuffle<T>(T[] array)
{
Random r = new Random();
for (int i = array.Length - 1; i > 0; i--)
{
int j = r.Next(0, i + 1);
T temp = array[j];
array[j] = array[i];
array[i] = temp;
}
}
int[] array = new int[10000];
for (int i = 0; i < array.Length; i++) array[i] = i;
FisherYatesShuffle(array);
I think you failed to mention that your test method is failing.
It is failing because your random generator is not producing Unique numbers. I'm not sure how it would under it's current condition.

C# creating a list of random unique integers

I need to create a list with one billion integers and they must all be unique. I also need this to be done extremely fast.
Creating a list and adding random numbers one by one and checking to see if each one is a duplicate is extremely slow.
It seems to be quite fast if I just populate a list with random numbers without checking if they are duplicates and then using distinct().toList(). I repeat this until there are no more duplicates. However the extra memory used by creating a new list is not optimal. Is there a way to get the performance of distinct() but instead of creating a new list it just modifies the source list?
Do the integers need to be in a certain range? If so, you could create an array or list with all numbers in that range (for example from 1 to 1000000000) and shuffle that list.
I found this the fastest while maintaining randomness:
Random rand = new Random();
var ints = Enumerable.Range(0, numOfInts)
.Select(i => new Tuple<int, int>(rand.Next(numOfInts), i))
.OrderBy(i => i.Item1)
.Select(i => i.Item2);
...basically assigning a random id to each int and then sorting by that id and selecting the resulting list of ints.
You can track duplicates in a separate HashSet<int>:
var set = new HashSet<int>();
var nums = new List<int>();
while(nums.Count < 1000000000) {
int num;
do {
num = rand.NextInt();
} while (!set.Contains(num));
set.Add(num);
list.Add(num);
}
You need a separate List<int> to store the numbers because a hashset will not preserve your random ordering.
Taking the question literally (a list with one billion integers and they must all be unique):
Enumerable<int>.Range(0, 1000000000)
But along the lines of CodeCaster's answer, you can create the list and shuffle it at the same time:
var count = 1000000000;
var list = new List<int>(count);
var random = new Random();
list.Add(0);
for (var i = 1; i < count; i++)
{
var swap = random.Next(i - 1);
list.Add(list[swap]);
list[swap] = i;
}
If the amount of possible integers from which you draw is significantly larger (say factor 2) than the amount of integers you want you can simply use a HashSet<T> to check for duplicates.
List<int> GetUniqueRandoms(Random random, int count)
{
List<int> result = new List<int>(count);
HashSet<int> set = new HashSet<int>(count);
for(int i = 0; i < count; i++)
{
int num;
do
{
num = random.NextInt();
while(!set.Add(num));
result.Add(num);
}
return result;
}
This allocates the collections with the correct capacity to avoid reallocation during growth. Since your collections are large this should be a large improvement.
You can also use Distinct a single time:
IEnumerable<int> RandomSequence(Random random)
{
while(true)
{
yield return random.NextInt();
}
}
RandomSequence(rand).Distinct().Take(1000000000).ToList();
But with both solutions you need enough memory for a HashSet<int> and a List<int>.
If the amount of possible integers from which you draw is about as large as the amount of integers you want, you can create an array that contains all of them, shuffle them and finally cut off those you're not interested in.
You can use Jon Skeet's shuffle implementation.
What if you created the list in a sorted but still random fashion (such as adding a random number to the last element of the list as the next element), and then shuffled the list with a Fisher-Yates-Durstenfeld? That would execute in linear time overall, which is pretty much as good as it gets for list generation. However, it might have some significant bias that would affect the distribution.

Random playlist algorithm

I need to create a list of numbers from a range (for example from x to y) in a random order so that every order has an equal chance.
I need this for a music player I write in C#, to create play lists in a random order.
Any ideas?
Thanks.
EDIT: I'm not interested in changing the original list, just pick up random indexes from a range in a random order so that every order has an equal chance.
Here's what I've wrriten so far:
public static IEnumerable<int> RandomIndexes(int count)
{
if (count > 0)
{
int[] indexes = new int[count];
int indexesCountMinus1 = count - 1;
for (int i = 0; i < count; i++)
{
indexes[i] = i;
}
Random random = new Random();
while (indexesCountMinus1 > 0)
{
int currIndex = random.Next(0, indexesCountMinus1 + 1);
yield return indexes[currIndex];
indexes[currIndex] = indexes[indexesCountMinus1];
indexesCountMinus1--;
}
yield return indexes[0];
}
}
It's working, but the only problem of this is that I need to allocate an array in the memory in the size of count. I'm looking for something that dose not require memory allocation.
Thanks.
This can actually be tricky if you're not careful (i.e., using a naïve shuffling algorithm). Take a look at the Fisher-Yates/Knuth shuffle algorithm for proper distribution of values.
Once you have the shuffling algorithm, the rest should be easy.
Here's more detail from Jeff Atwood.
Lastly, here's Jon Skeet's implementation and description.
EDIT
I don't believe that there's a solution that satisfies your two conflicting requirements (first, to be random with no repeats and second to not allocate any additional memory). I believe you may be prematurely optimizing your solution as the memory implications should be negligible, unless you're embedded. Or, perhaps I'm just not smart enough to come up with an answer.
With that, here's code that will create an array of evenly distributed random indexes using the Knuth-Fisher-Yates algorithm (with a slight modification). You can cache the resulting array, or perform any number of optimizations depending on the rest of your implementation.
private static int[] BuildShuffledIndexArray( int size ) {
int[] array = new int[size];
Random rand = new Random();
for ( int currentIndex = array.Length - 1; currentIndex > 0; currentIndex-- ) {
int nextIndex = rand.Next( currentIndex + 1 );
Swap( array, currentIndex, nextIndex );
}
return array;
}
private static void Swap( IList<int> array, int firstIndex, int secondIndex ) {
if ( array[firstIndex] == 0 ) {
array[firstIndex] = firstIndex;
}
if ( array[secondIndex] == 0 ) {
array[secondIndex] = secondIndex;
}
int temp = array[secondIndex];
array[secondIndex] = array[firstIndex];
array[firstIndex] = temp;
}
NOTE: You can use ushort instead of int to half the size in memory as long as you don't have more than 65,535 items in your playlist. You could always programmatically switch to int if the size exceeds ushort.MaxValue. If I, personally, added more than 65K items to a playlist, I wouldn't be shocked by increased memory utilization.
Remember, too, that this is a managed language. The VM will always reserve more memory than you are using to limit the number of times it needs to ask the OS for more RAM and to limit fragmentation.
EDIT
Okay, last try: we can look to tweak the performance/memory trade off: You could create your list of integers, then write it to disk. Then just keep a pointer to the offset in the file. Then every time you need a new number, you just have disk I/O to deal with. Perhaps you can find some balance here, and just read N-sized blocks of data into memory where N is some number you're comfortable with.
Seems like a lot of work for a shuffle algorithm, but if you're dead-set on conserving memory, then at least it's an option.
If you use a maximal linear feedback shift register, you will use O(1) of memory and roughly O(1) time. See here for a handy C implementation (two lines! woo-hoo!) and tables of feedback terms to use.
And here is a solution:
public class MaximalLFSR
{
private int GetFeedbackSize(uint v)
{
uint r = 0;
while ((v >>= 1) != 0)
{
r++;
}
if (r < 4)
r = 4;
return (int)r;
}
static uint[] _feedback = new uint[] {
0x9, 0x17, 0x30, 0x44, 0x8e,
0x108, 0x20d, 0x402, 0x829, 0x1013, 0x203d, 0x4001, 0x801f,
0x1002a, 0x2018b, 0x400e3, 0x801e1, 0x10011e, 0x2002cc, 0x400079, 0x80035e,
0x1000160, 0x20001e4, 0x4000203, 0x8000100, 0x10000235, 0x2000027d, 0x4000016f, 0x80000478
};
private uint GetFeedbackTerm(int bits)
{
if (bits < 4 || bits >= 28)
throw new ArgumentOutOfRangeException("bits");
return _feedback[bits];
}
public IEnumerable<int> RandomIndexes(int count)
{
if (count < 0)
throw new ArgumentOutOfRangeException("count");
int bitsForFeedback = GetFeedbackSize((uint)count);
Random r = new Random();
uint i = (uint)(r.Next(1, count - 1));
uint feedback = GetFeedbackTerm(bitsForFeedback);
int valuesReturned = 0;
while (valuesReturned < count)
{
if ((i & 1) != 0)
{
i = (i >> 1) ^ feedback;
}
else {
i = (i >> 1);
}
if (i <= count)
{
valuesReturned++;
yield return (int)(i-1);
}
}
}
}
Now, I selected the feedback terms (badly) at random from the link above. You could also implement a version that had multiple maximal terms and you select one of those at random, but you know what? This is pretty dang good for what you want.
Here is test code:
static void Main(string[] args)
{
while (true)
{
Console.Write("Enter a count: ");
string s = Console.ReadLine();
int count;
if (Int32.TryParse(s, out count))
{
MaximalLFSR lfsr = new MaximalLFSR();
foreach (int i in lfsr.RandomIndexes(count))
{
Console.Write(i + ", ");
}
}
Console.WriteLine("Done.");
}
}
Be aware that maximal LFSR's never generate 0. I've hacked around this by returning the i term - 1. This works well enough. Also, since you want to guarantee uniqueness, I ignore anything out of range - the LFSR only generates sequences up to powers of two, so in high ranges, it will generate wost case 2x-1 too many values. These will get skipped - that will still be faster than FYK.
Personally, for a music player, I wouldn't generate a shuffled list, and then play that, then generate another shuffled list when that runs out, but do something more like:
IEnumerable<Song> GetSongOrder(List<Song> allSongs)
{
var playOrder = new List<Song>();
while (true)
{
// this step assigns an integer weight to each song,
// corresponding to how likely it is to be played next.
// in a better implementation, this would look at the total number of
// songs as well, and provide a smoother ramp up/down.
var weights = allSongs.Select(x => playOrder.LastIndexOf(x) > playOrder.Length - 10 ? 50 : 1);
int position = random.Next(weights.Sum());
foreach (int i in Enumerable.Range(allSongs.Length))
{
position -= weights[i];
if (position < 0)
{
var song = allSongs[i];
playOrder.Add(song);
yield return song;
break;
}
}
// trim playOrder to prevent infinite memory here as well.
if (playOrder.Length > allSongs.Length * 10)
playOrder = playOrder.Skip(allSongs.Length * 8).ToList();
}
}
This would make songs picked in order, as long as they haven't been recently played. This provides "smoother" transitions from the end of one shuffle to the next, because the first song of the next shuffle could be the same song as the last shuffle with 1/(total songs) probability, whereas this algorithm has a lower (and configurable) chance of hearing one of the last x songs again.
Unless you shuffle the original song list (which you said you don't want to do), you are going to have to allocate some additional memory to accomplish what you're after.
If you generate the random permutation of song indices beforehand (as you are doing), you obviously have to allocate some non-trivial amount of memory to store it, either encoded or as a list.
If the user doesn't need to be able to see the list, you could generate the random song order on the fly: After each song, pick another random song from the pool of unplayed songs. You still have to keep track of which songs have already been played, but you can use a bitfield for that. If you have 10000 songs, you just need 10000 bits (1250 bytes), each one representing whether the song has been played yet.
I don't know your exact limitations, but I have to wonder if the memory required to store a playlist is significant compared to the amount required for playing audio.
There are a number of methods of generating permutations without needing to store the state. See this question.
I think you should stick to your current solution (the one in your edit).
To do a re-order with no repetitions & not making your code behave unreliable, you have to track what you have already used / like by keeping unused indexes or indirectly by swapping from the original list.
I suggest to check it in the context of the working application i.e. if its of any significance vs. the memory used by other pieces of the system.
From a logical standpoint, it is possible. Given a list of n songs, there are n! permutations; if you assign each permutation a number from 1 to n! (or 0 to n!-1 :-D) and pick one of those numbers at random, you can then store the number of the permutation that you are currently using, along with the original list and the index of the current song within the permutation.
For example, if you have a list of songs {1, 2, 3}, your permutations are:
0: {1, 2, 3}
1: {1, 3, 2}
2: {2, 1, 3}
3: {2, 3, 1}
4: {3, 1, 2}
5: {3, 2, 1}
So the only data I need to track is the original list ({1, 2, 3}), the current song index (e.g. 1) and the index of the permutation (e.g. 3). Then, if I want to find the next song to play, I know it's third (2, but zero-based) song of permutation 3, e.g. Song 1.
However, this method relies on you having an efficient means of determining the ith song of the jth permutation, which until I've had chance to think (or someone with a stronger mathematical background than I can interject) is equivalent to "then a miracle happens". But the principle is there.
If memory was really a concern after a certain number of records and it's safe to say that if that memory boundary is reached, there's enough items in the list to not matter if there are some repeats, just as long as the same song was not repeated twice, I would use a combination method.
Case 1: If count < max memory constraint, generate the playlist ahead of time and use Knuth shuffle (see Jon Skeet's implementation, mentioned in other answers).
Case 2: If count >= max memory constraint, the song to be played will be determined at run time (I'd do it as soon as the song starts playing so the next song is already generated by the time the current song ends). Save the last [max memory constraint, or some token value] number of songs played, generate a random number (R) between 1 and song count, and if R = one of X last songs played, generate a new R until it is not in the list. Play that song.
Your max memory constraints will always be upheld, although performance can suffer in case 2 if you've played a lot of songs/get repeat random numbers frequently by chance.
you could use a trick we do in sql server to order sets in random like this with the use of guid. the values are always distributed equaly random.
private IEnumerable<int> RandomIndexes(int startIndexInclusive, int endIndexInclusive)
{
if (endIndexInclusive < startIndexInclusive)
throw new Exception("endIndex must be equal or higher than startIndex");
List<int> originalList = new List<int>(endIndexInclusive - startIndexInclusive);
for (int i = startIndexInclusive; i <= endIndexInclusive; i++)
originalList.Add(i);
return from i in originalList
orderby Guid.NewGuid()
select i;
}
You're going to have to allocate some memory, but it doesn't have to be a lot. You can reduce the memory footprint (the degree by which I'm unsure, as I don't know that much about the guts of C#) by using a bool array instead of int. Best case scenario this will only use (count / 8) bytes of memory, which isn't too bad (but I doubt C# actually represents bools as single bits).
public static IEnumerable<int> RandomIndexes(int count) {
Random rand = new Random();
bool[] used = new bool[count];
int i;
for (int counter = 0; counter < count; counter++) {
while (used[i = rand.Next(count)]); //i = some random unused value
used[i] = true;
yield return i;
}
}
Hope that helps!
As many others have said you should implement THEN optimize, and only optimize the parts that need it (which you check on with a profiler). I offer a (hopefully) elegant method of getting the list you need, which doesn't really care so much about performance:
using System;
using System.Collections.Generic;
using System.Linq;
namespace Test
{
class Program
{
static void Main(string[] a)
{
Random random = new Random();
List<int> list1 = new List<int>(); //source list
List<int> list2 = new List<int>();
list2 = random.SequenceWhile((i) =>
{
if (list2.Contains(i))
{
return false;
}
list2.Add(i);
return true;
},
() => list2.Count == list1.Count,
list1.Count).ToList();
}
}
public static class RandomExtensions
{
public static IEnumerable<int> SequenceWhile(
this Random random,
Func<int, bool> shouldSkip,
Func<bool> continuationCondition,
int maxValue)
{
int current = random.Next(maxValue);
while (continuationCondition())
{
if (!shouldSkip(current))
{
yield return current;
}
current = random.Next(maxValue);
}
}
}
}
It is pretty much impossible to do it without allocating extra memory. If you're worried about the amount of extra memory allocated, you could always pick a random subset and shuffle between those. You'll get repeats before every song is played, but with a sufficiently large subset I'll warrant few people will notice.
const int MaxItemsToShuffle = 20;
public static IEnumerable<int> RandomIndexes(int count)
{
Random random = new Random();
int indexCount = Math.Min(count, MaxItemsToShuffle);
int[] indexes = new int[indexCount];
if (count > MaxItemsToShuffle)
{
int cur = 0, subsetCount = MaxItemsToShuffle;
for (int i = 0; i < count; i += 1)
{
if (random.NextDouble() <= ((float)subsetCount / (float)(count - i + 1)))
{
indexes[cur] = i;
cur += 1;
subsetCount -= 1;
}
}
}
else
{
for (int i = 0; i < count; i += 1)
{
indexes[i] = i;
}
}
for (int i = indexCount; i > 0; i -= 1)
{
int curIndex = random.Next(0, i);
yield return indexes[curIndex];
indexes[curIndex] = indexes[i - 1];
}
}

Categories