I've got two lists with the following identical fields (but different content):
TriangleID
Perimeter(in pixels)
My task is to extract the pairs of triangles whose perimeter difference is smaller than a fixed threshold.
I'd like to do it with Linq.
It's not Linq that matters here, but the collections' size (N). In the worst case (all triangles equal) you have to return every possible pair of triangles as the solution; that is
N * N
pairs. If you have N ~ 1e6 triangles, you are going to obtain on the order of a trillion (1e12) pairs as the answer. That's too much for a modern personal computer (on a supercomputer, however, you can try solving the problem).
Let's assume that you don't have the worst case and you expect to obtain at most ~ N pairs. You can do it like this (C# pseudocode):
// Sort triangles by their perimeters
firstList.Sort((t1, t2) => t1.Perimeter.CompareTo(t2.Perimeter));

foreach (var left in secondList)
{
    // TODO: you have to implement BinarySearchIndex
    int from = firstList.BinarySearchIndex(left.Perimeter - threshold);
    int to = firstList.BinarySearchIndex(left.Perimeter + threshold);

    // Scan all triangles within the borders
    for (int i = from; i < to; ++i)
    {
        Triangle right = firstList[i];

        // Return the pair if right and left are different triangles
        if (right.Id != left.Id)
            yield return new Pair(left, right);
    }
}
Time complexity is
O(N * log(N))   /* sorting */
+ O(N * log(N)) /* foreach (N) * binary search (log N) * inner for (O(1) - not the worst case) */
= O(N * log(N))
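The BinarySearchIndex helper is left as a TODO above. Here is a minimal lower-bound sketch (my naming and placement, not code from the answer; it assumes the list is already sorted by Perimeter and returns the first index whose perimeter is at least the probe value, so the half-open window [from, to) matches the scan loop):

// Lower-bound binary search over a list sorted by Perimeter: returns the
// first index i with list[i].Perimeter >= value, or list.Count if none.
// Declared as an extension method, so it must live in a static class.
public static int BinarySearchIndex(this List<Triangle> list, double value)
{
    int lo = 0, hi = list.Count; // the answer always lies within [lo, hi]
    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (list[mid].Perimeter < value)
            lo = mid + 1; // mid is too small, the answer lies to the right
        else
            hi = mid;     // mid is a candidate, discard everything after it
    }
    return lo;
}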
I'm assuming it's ok to return a Triangle and associate a collection of nearby other Triangles, instead of returning a list of pairs.
The idea is to sort both lists and then iterate through them. The outer loop visits every item in the first list and compares its perimeter to items in the second list. But not every item in the second list needs to be checked: since both lists are sorted, you can scan the second list only until the perimeters are no longer within the cutoff. The other time-saving measure is advancing the starting index of the second list based on the difference to the perimeter in the first list; this is left as an exercise for the reader.
public class Triangle
{
public int TriangleId {get; set;}
public int Perimeter {get; set;}
}
// Returns a dictionary in which each triangle has an associated list of other
// triangles whose perimeter is within the specified distance. The list may be empty.
public Dictionary<Triangle, List<Triangle>> NearbyPerimeter(List<Triangle> primary, List<Triangle> compareList, int maxDistance)
{
// sort ~ O(n log n)
// The sort is required to make an orderly advance through both lists, otherwise
// every element needs to be compared to every other element.
var sorteda = primary.OrderBy(x => x.Perimeter);
// Call ToList to allow indexing with []
var sortedb = compareList.OrderBy(x => x.Perimeter).ToList();
var results = new Dictionary<Triangle, List<Triangle>>();
int minCompareIndex = 0;
int compareCount = compareList.Count;
// ~ O(n)
foreach (var tprime in sorteda)
{
var neighbors = new List<Triangle>();
// Add logic to advance minCompareIndex based on
// which is larger, tprime.Perimeter or sortedb[minCompareIndex].Perimeter
int i = minCompareIndex;
var foundMatch = false;
// Until the missing logic above is added, this is O(n) x O(n) so ~ O(n^2)
while (i < compareCount)
{
var second = sortedb[i];
if (Math.Abs(tprime.Perimeter - second.Perimeter) < maxDistance)
{
neighbors.Add(second);
foundMatch = true;
}
else if (foundMatch)
{
break;
}
i++;
}
results.Add(tprime, neighbors);
}
return results;
}
You can do it with Linq but with a small artefact - basically what you need is a Cartesian product of both collections and a filter on the differences. The Cartesian product can be obtained using Join and an always true condition.
The code below should do the trick (I'm assuming the lists contain a class called Triangle; if this is not the case adjust the code to your needs):
var results = list1.Join(list2,
_ => true,
_ => true,
(t1, t2) => new { Triangle1 = t1, Triangle2 = t2})
.Where(pair => Math.Abs(pair.Triangle1.Perimeter - pair.Triangle2.Perimeter) < threshold)
.Select(pair => new{/*…*/});
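For what it's worth, the idiomatic way to form a Cartesian product in LINQ is SelectMany (or two from clauses in query syntax). A sketch equivalent to the Join above, under the same Triangle assumption:

var results = list1
    .SelectMany(t1 => list2, (t1, t2) => new { Triangle1 = t1, Triangle2 = t2 })
    .Where(pair => Math.Abs(pair.Triangle1.Perimeter - pair.Triangle2.Perimeter) < threshold);

Either way the product is O(N*M) pairs, so the sort-and-binary-search approach above scales much better for large lists.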
We have a number of payments (Transaction) that come into our business each day. Each Transaction has an ID and an Amount. We have the requirement to match a number of these transactions to a specific amount. Example:
Transaction Amount
1 100
2 200
3 300
4 400
5 500
If we wanted to find the transactions that add up to 600 you would have a number of sets (1,2,3),(2,4),(1,5).
I found an algorithm that I have adapted; it works as defined below. For 30 transactions it takes 15 ms, but the number of transactions averages around 740 and has a maximum close to 6000. Is there a more efficient way to perform this search?
sum_up(TransactionList, remittanceValue, ref MatchedLists);
private static void sum_up(List<Transaction> transactions, decimal target, ref List<List<Transaction>> matchedLists)
{
sum_up_recursive(transactions, target, new List<Transaction>(), ref matchedLists);
}
private static void sum_up_recursive(List<Transaction> transactions, decimal target, List<Transaction> partial, ref List<List<Transaction>> matchedLists)
{
decimal s = 0;
foreach (Transaction x in partial) s += x.Amount;
if (s == target)
{
matchedLists.Add(partial);
}
if (s > target)
return;
for (int i = 0; i < transactions.Count; i++)
{
List<Transaction> remaining = new List<Transaction>();
Transaction n = new Transaction(0, transactions[i].ID, transactions[i].Amount);
for (int j = i + 1; j < transactions.Count; j++) remaining.Add(transactions[j]);
List<Transaction> partial_rec = new List<Transaction>(partial);
partial_rec.Add(new Transaction(n.MatchNumber, n.ID, n.Amount));
sum_up_recursive(remaining, target, partial_rec, ref matchedLists);
}
}
With Transaction defined as:
class Transaction
{
public int ID;
public decimal Amount;
public int MatchNumber;
public Transaction(int matchNumber, int id, decimal amount)
{
ID = id;
Amount = amount;
MatchNumber = matchNumber;
}
}
As already mentioned, your problem can be solved by a pseudo-polynomial algorithm in O(n*G), with n the number of items and G your targeted sum.
The first part of the question: is it possible to achieve the targeted sum G at all? The following pseudo/Python code solves it (I have no C# on my machine):
def subsum(values, target):
    reached = [False] * (target + 1)  # initially no sums are reachable at all
    reached[0] = True  # with 0 elements we can only achieve the sum 0
    for val in values:
        for s in reversed(range(target + 1)):  # for target, target-1, ..., 0
            if reached[s] and s + val <= target:  # if sum s is reachable, s+val is reachable too
                reached[s + val] = True
    return reached[target]
What is the idea? Let's consider values [1,2,3,6] and target sum 7:
We start with an empty set - the possible sum is obviously 0.
Now we look at the first element 1 and have two options: take it or not. That leaves us with the possible sums {0,1}.
Now looking at the next element 2: leads to possible sets {0,1} (not taking)+{2,3} (taking).
Until now there is not much difference to your approach, but now for element 3 we have the possible sets a. for not taking: {0,1,2,3} and b. for taking: {3,4,5,6}, resulting in {0,1,2,3,4,5,6} as possible sums. The difference to your approach is that there are two ways to get to 3, and your recursion would be started twice from there (which is not needed). Calculating basically the same stuff over and over again is the problem with your approach and the reason the proposed algorithm is better.
As last step we consider 6 and get {0,1,2,3,4,5,6,7} as possible sums.
But you also need the subset which leads to the targeted sum, for this we just remember which element was taken to achieve the current sub sum. This version returns a subset which results in the target sum or None otherwise:
def subsum(values, target):
    reached = [False] * (target + 1)
    val_ids = [-1] * (target + 1)
    reached[0] = True  # with 0 elements we can only achieve the sum=0
    for (val_id, val) in enumerate(values):
        for s in reversed(range(target + 1)):  # for target, target-1, ..., 0
            # record only the first discovery of s+val, so each element
            # ends up used at most once when we walk the chain backwards
            if reached[s] and s + val <= target and not reached[s + val]:
                reached[s + val] = True
                val_ids[s + val] = val_id

    # reconstruct the subset for target:
    if not reached[target]:
        return None  # means not possible
    else:
        result = []
        current = target
        while current != 0:  # search backwards, jumping from predecessor to predecessor
            val_id = val_ids[current]
            result.append(val_id)
            current -= values[val_id]
        return result
As another approach you could use memoization to speed up your current solution, remembering for each state (subsum, number_of_elements_not_considered) whether the target sum can still be achieved. But I would say the standard dynamic programming is the less error-prone option here.
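Since the question itself is C#, here is the reachability pass above as a minimal C# sketch (my translation, not code from the question; it assumes amounts are non-negative integers, e.g. whole cents, since the array is indexed by sum):

// Returns true if some subset of values sums exactly to target.
static bool SubsetSumReachable(int[] values, int target)
{
    var reached = new bool[target + 1];
    reached[0] = true; // the empty subset sums to 0
    foreach (int val in values)
    {
        // Iterate downwards so each value is used at most once.
        for (int s = target - val; s >= 0; s--)
        {
            if (reached[s])
                reached[s + val] = true;
        }
    }
    return reached[target];
}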
Yes.
I can't provide full code at the moment, but instead of iterating each list of transactions until finding matches (O(n^2)), try this concept:
Set up a hashtable with the existing transaction amounts as entries, as well as the sum of each pair of transactions - assuming each value is made up of a maximum of two transactions (weekend credit card processing).
For each total, look it up in the hashtable - the sets of transactions in that slot are the matching transactions.
Instead of O(n^2), you can get it down to roughly 4*O(n), which would make a noticeable difference in speed.
Good luck!
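A minimal sketch of the pair case this answer describes, using the Transaction class from the question (the helper name is mine):

// Index transactions by amount, then find all two-transaction sets
// summing to target with one dictionary lookup per transaction.
static List<Tuple<Transaction, Transaction>> PairsSummingTo(
    List<Transaction> transactions, decimal target)
{
    var byAmount = new Dictionary<decimal, List<Transaction>>();
    foreach (var t in transactions)
    {
        List<Transaction> bucket;
        if (!byAmount.TryGetValue(t.Amount, out bucket))
            byAmount[t.Amount] = bucket = new List<Transaction>();
        bucket.Add(t);
    }
    var pairs = new List<Tuple<Transaction, Transaction>>();
    foreach (var t in transactions)
    {
        List<Transaction> matches;
        if (byAmount.TryGetValue(target - t.Amount, out matches))
            foreach (var m in matches)
                if (m.ID > t.ID) // avoid self-pairs and duplicate orderings
                    pairs.Add(Tuple.Create(t, m));
    }
    return pairs;
}

Precomputing two-transaction sums as suggested extends the same lookup to targets made of up to four transactions, at the cost of an O(n^2)-sized table.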
Dynamic programming can solve this problem efficiently:
Assume you have n transactions and the target amount is m.
We can solve it in just O(n*m) - read up on the Knapsack problem.
For this problem we can define dp[i][sum] as the number of subsets of the first i transactions that add up to sum.
The recurrence (either leave transaction i out or take it):
for i = 1 to n:
    dp[i][sum] = dp[i - 1][sum] + dp[i - 1][sum - amount_i]
dp[n][sum] is the number you need, and you need to add some bookkeeping to recover what the subsets actually are.
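That recurrence, collapsed to one dimension in C# (my sketch, not the answerer's code; it assumes non-negative integer amounts, and uses long because subset counts grow quickly):

// dp[s] = number of subsets of the amounts processed so far that sum to s.
static long CountSubsets(int[] amounts, int target)
{
    var dp = new long[target + 1];
    dp[0] = 1; // exactly one subset (the empty one) sums to 0
    foreach (int a in amounts)
        for (int s = target; s >= a; s--)
            dp[s] += dp[s - a]; // dp[i][s] = dp[i-1][s] + dp[i-1][s-a]
    return dp[target];
}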
You have a couple of practical assumptions here that would make brute force with smartish branch pruning feasible:
items are unique, hence you wouldn't be getting a combinatorial blow-up of valid subsets (e.g. (1,1,1,1,1,1,1,1,1,1,1,1,1) adding up to 3)
if the number of resulting feasible sets is still huge, you would run out of memory collecting them before running into total runtime issues.
ordering the input ascending allows an easy early-stop check - if your remaining sum is smaller than the current element, then none of the yet-unexamined items can possibly be in a result (as the current and subsequent items only get bigger)
keeping running sums would speed up each step, as you wouldn't be recalculating it over and over again
Here's a bit of code:
public static List<T[]> SubsetSums<T>(T[] items, int target, Func<T, int> amountGetter)
{
Stack<T> unusedItems = new Stack<T>(items.OrderByDescending(amountGetter));
Stack<T> usedItems = new Stack<T>();
List<T[]> results = new List<T[]>();
SubsetSumsRec(unusedItems, usedItems, target, results, amountGetter);
return results;
}
public static void SubsetSumsRec<T>(Stack<T> unusedItems, Stack<T> usedItems, int targetSum, List<T[]> results, Func<T,int> amountGetter)
{
if (targetSum == 0)
results.Add(usedItems.ToArray());
if (targetSum < 0 || unusedItems.Count == 0)
return;
var item = unusedItems.Pop();
int currentAmount = amountGetter(item);
if (targetSum >= currentAmount)
{
// case 1: use current element
usedItems.Push(item);
SubsetSumsRec(unusedItems, usedItems, targetSum - currentAmount, results, amountGetter);
usedItems.Pop();
// case 2: skip current element
SubsetSumsRec(unusedItems, usedItems, targetSum, results, amountGetter);
}
unusedItems.Push(item);
}
I've run it against 100k input that yields around 1k results in under 25 millis, so it should be able to handle your 740 case with ease.
Suppose I have this query:
int[] Numbers= new int[5]{5,2,3,4,5};
var query = from a in Numbers
where a== Numbers.Max (n => n) //notice MAX he should also get his value somehow
select a;
foreach (var element in query)
Console.WriteLine (element);
How many times is Numbers enumerated when running the foreach?
How can I test it (I mean, write code which tells me the number of iterations)?
It will be iterated 6 times. Once for the Where and once per element for the Max.
The code to demonstrate this:
private static int count = 0;
public static IEnumerable<int> Regurgitate(IEnumerable<int> source)
{
count++;
Console.WriteLine("Iterated sequence {0} times", count);
foreach (int i in source)
yield return i;
}
int[] Numbers = new int[5] { 5, 2, 3, 4, 5 };
IEnumerable<int> sequence = Regurgitate(Numbers);
var query = from a in sequence
where a == sequence.Max(n => n)
select a;
It will print "Iterated sequence 6 times".
We could make a more general purpose wrapper that is more flexible, if you're planning to use this to experiment with other cases:
public class EnumerableWrapper<T> : IEnumerable<T>
{
private IEnumerable<T> source;
public EnumerableWrapper(IEnumerable<T> source)
{
this.source = source;
}
public int IterationsStarted { get; private set; }
public int NumMoveNexts { get; private set; }
public int IterationsFinished { get; private set; }
public IEnumerator<T> GetEnumerator()
{
IterationsStarted++;
foreach (T item in source)
{
NumMoveNexts++;
yield return item;
}
IterationsFinished++;
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
public override string ToString()
{
return string.Format(
    @"Iterations Started: {0}
Iterations Finished: {1}
Number of move next calls: {2}",
    IterationsStarted, IterationsFinished, NumMoveNexts);
}
}
This has several advantages over the other function:
It records the number of iterations started, the number of iterations that ran to completion, and the total number of MoveNext calls across all of them.
You can create different instances to wrap different underlying sequences, thus allowing you to inspect multiple sequences per program, instead of just one when using a static variable.
Here is how you can get a quick count of the number of times the collection is enumerated: wrap your collection in a CountedEnum<T>, and increment the counter on each yield return, like this --
static int counter = 0;
public static IEnumerable<T> CountedEnum<T>(IEnumerable<T> ee) {
foreach (var e in ee) {
counter++;
yield return e;
}
}
Then change your array declaration to this,
var Numbers= CountedEnum(new int[5]{5,2,3,4,5});
run your query, and print the counter. For your query, the code prints 30 (link to ideone), meaning that your collection of five items has been enumerated six times.
Here is how you can check the count
void Main()
{
var Numbers= new int[5]{5,2,3,4,5}.Select(n=>
{
Console.Write(n);
return n;
});
var query = from a in Numbers
where a== Numbers.Max (n => n)
select a;
foreach (var element in query)
{
var v = element;
}
}
Here is the output:
5 5 2 3 4 5 2 5 2 3 4 5 3 5 2 3 4 5 4 5 2 3 4 5 5 5 2 3 4 5
The number of iterations has to be equal to query.Count() - that is, to the count of elements in the result of the query.
If you're asking about something else, please clarify.
EDIT
After clarification:
If you're looking for the total count of iterations in the code provided, there will be 7 iterations (for this concrete case).
var query = from a in Numbers
where a== Numbers.Max (n => n) //5 iterations to find MAX among 5 elements
select a;
and
foreach (var element in query)
Console.WriteLine (element); //2 iterations over the resulting collection (in this question)
How many times does Numbers is enumerated when running the foreach
Loosely speaking, your code is morally equivalent to:
foreach(int a in Numbers)
{
// 1. I've gotten rid of the unnecessary identity lambda.
// 2. Note that Max works by enumerating the entire source.
var max = Numbers.Max();
if(a == max)
Console.WriteLine(a);
}
So we enumerate the following times:
One enumeration of the sequence for the outer loop (1).
One enumeration of the sequence for each of its members (Count).
So in total, we enumerate Count + 1 times.
You could bring this down to 2 by hoisting the Max query outside the loop by introducing a local.
how can I test it ( I mean , writing a code which tells me the number
of iterations)
This wouldn't be easy with a raw array. But you could write your own enumerable implementation (that perhaps wrapped an array) and add some instrumentation to the GetEnumerator method. Or if you want to go deeper, go the whole hog and write a custom enumerator with instrumentation on MoveNext and Current as well.
Count via public property also yields 6.
private static int ncount = 0;
private int[] numbers= new int[5]{5,2,3,4,5};
public int[] Numbers
{
get
{
ncount++;
Debug.WriteLine("Numbers Get " + ncount.ToString());
return numbers;
}
}
This brings the count down to 2.
Makes sense but I would not have thought of it.
int nmax = Numbers.Max(n => n);
var query = from a in Numbers
where a == nmax //notice MAX he should also get his value somehow
//where a == Numbers.Max(n => n) //notice MAX he should also get his value somehow
select a;
Define and initialize a count variable outside the foreach loop and increment it (count++) inside the loop to get the number of enumerations.
I want to generate a shuffled merged list that will keep the internal order of the lists.
For example:
list A: 11 22 33
list B: 6 7 8
valid result: 11 22 6 33 7 8
invalid result: 22 11 7 6 33 8
Just randomly select a list (e.g. generate a random number between 0 and 1; if it's < 0.5 take list A, otherwise list B), then take the next element from that list and add it to your new list. Repeat until you have no elements left in either list.
Generate A.Length random integers in the interval [0, B.Length). Sort the random numbers, then iterate i from 0..A.Length, inserting A[i] into position r[i]+i in B. The +i is because you're shifting the original values in B to the right as you insert values from A.
This will be as random as your RNG.
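A sketch of that scheme with int arrays (my code, not the answerer's; note Random.Next's upper bound is exclusive, so b.Length + 1 is used here to let elements of A also land after the last element of B):

// Merge a into b at sorted random positions. r[i] is a position in the
// original b; the +i accounts for the i elements of a already inserted.
static int[] MergeByRandomPositions(int[] a, int[] b, Random rand)
{
    int[] r = new int[a.Length];
    for (int i = 0; i < a.Length; i++)
        r[i] = rand.Next(b.Length + 1);
    Array.Sort(r);
    var result = new List<int>(b);
    for (int i = 0; i < a.Length; i++)
        result.Insert(r[i] + i, a[i]);
    return result.ToArray();
}

(As a later answer on this page points out, the sorted r produced this way is not uniformly distributed over all interleavings.)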
None of the answers provided on this page work if you need the outputs to be uniformly distributed.
To illustrate my examples, assume we are merging two lists A=[1,2,3], B=[a,b,c]
In the approach mentioned in most answers (i.e. merging the two lists a la mergesort, but choosing a list head at random each time), the output [1 a 2 b 3 c] is far less likely than [1 2 3 a b c]. Intuitively, this happens because when you run out of elements in one list, the elements of the other list are appended at the end. Because of that, the probability of the first case is 0.5*0.5*0.5 = 0.5^3 = 0.125, but in the second case there are more random events, since a random head has to be picked 5 times instead of just 3, leaving us with a probability of 0.5^5 = 0.03125. An empirical evaluation also easily validates these results.
The answer suggested by @marcog is almost correct. However, there is an issue where the distribution of r is not uniform after sorting it. This happens because, for example, the original lists [0,1,2], [1,2,0] and [2,1,0] all get sorted into [0,1,2], making this sorted r more likely than, say, [0,0,0], for which there is only one possibility.
There is a clever way of generating the list r in such a way that it is uniformly distributed, as seen in this Math StackExchange question: https://math.stackexchange.com/questions/3218854/randomly-generate-a-sorted-set-with-uniform-distribution
To summarize the answer to that question: you must sample |B| elements (uniformly at random, without repetition) from the set {0,1,...,|A|+|B|-1}, sort the result, and then subtract from each element its index in this new list. The result is the list r that can be used as a replacement in @marcog's answer.
Original Answer:
static IEnumerable<T> MergeShuffle<T>(IEnumerable<T> lista, IEnumerable<T> listb)
{
var first = lista.GetEnumerator();
var second = listb.GetEnumerator();
var rand = new Random();
bool exhaustedA = false;
bool exhaustedB = false;
while (!(exhaustedA && exhaustedB))
{
bool found = false;
if (!exhaustedB && (exhaustedA || rand.Next(0, 2) == 0))
{
exhaustedB = !(found = second.MoveNext());
if (found)
yield return second.Current;
}
if (!found && !exhaustedA)
{
exhaustedA = !(found = first.MoveNext());
if (found)
yield return first.Current;
}
}
}
Second answer based on marcog's answer
static IEnumerable<T> MergeShuffle<T>(IEnumerable<T> lista, IEnumerable<T> listb)
{
int total = lista.Count() + listb.Count();
var random = new Random();
var indexes = Enumerable.Range(0, total) // Range takes a count, so this yields 0..total-1
.OrderBy(_=>random.NextDouble())
.Take(lista.Count())
.OrderBy(x=>x)
.ToList();
var first = lista.GetEnumerator();
var second = listb.GetEnumerator();
for (int i = 0; i < total; i++)
if (indexes.Contains(i))
{
first.MoveNext();
yield return first.Current;
}
else
{
second.MoveNext();
yield return second.Current;
}
}
Rather than generating a list of indices, this can be done by adjusting the probabilities based on the number of elements left in each list. On each iteration, A will have A_size elements remaining, and B will have B_size elements remaining. Choose a random number R from 1..(A_size + B_size). If R <= A_size, then use an element from A as the next element in the output. Otherwise use an element from B.
int A[] = {11, 22, 33}, A_pos = 0, A_remaining = 3;
int B[] = {6, 7, 8}, B_pos = 0, B_remaining = 3;
while (A_remaining || B_remaining) {
int r = rand() % (A_remaining + B_remaining);
if (r < A_remaining) {
printf("%d ", A[A_pos++]);
A_remaining--;
} else {
printf("%d ", B[B_pos++]);
B_remaining--;
}
}
printf("\n");
As a list gets smaller, the probability an element gets chosen from it will decrease.
This can be scaled to multiple lists. For example, given lists A, B, and C with sizes A_size, B_size, and C_size, choose R in 1..(A_size+B_size+C_size). If R <= A_size, use an element from A. Otherwise, if R <= A_size+B_size use an element from B. Otherwise C.
Here is a solution that ensures a uniformly distributed output, and is easy to reason about. The idea is to first generate a list of tokens, where each token represents an element of a specific list, but not a specific element. For example, for two lists having 3 elements each, we generate this list of tokens: 0, 0, 0, 1, 1, 1. Then we shuffle the tokens. Finally we yield an element for each token, selecting the next element from the corresponding original list.
public static IEnumerable<T> MergeShufflePreservingOrder<T>(
params IEnumerable<T>[] sources)
{
var random = new Random();
var queues = sources
.Select(source => new Queue<T>(source))
.ToArray();
var tokens = queues
.SelectMany((queue, i) => Enumerable.Repeat(i, queue.Count))
.ToArray();
Shuffle(tokens);
return tokens.Select(token => queues[token].Dequeue());
void Shuffle(int[] array)
{
for (int i = 0; i < array.Length; i++)
{
int j = random.Next(i, array.Length);
if (i == j) continue;
if (array[i] == array[j]) continue;
var temp = array[i];
array[i] = array[j];
array[j] = temp;
}
}
}
Usage example:
var list1 = "ABCDEFGHIJKL".ToCharArray();
var list2 = "abcd".ToCharArray();
var list3 = "#".ToCharArray();
var merged = MergeShufflePreservingOrder(list1, list2, list3);
Console.WriteLine(String.Join("", merged));
Output:
ABCDaEFGHIb#cJKLd
This might be easier, assuming you have a list of three values in order that match 3 values in another table.
You can also sequence the rows using IDENTITY(1,1):
Create TABLE #tmp1 (ID int identity(1,1),firstvalue char(2),secondvalue char(2))
Create TABLE #tmp2 (ID int identity(1,1),firstvalue char(2),secondvalue char(2))
Insert into #tmp1(firstvalue,secondvalue) Select firstvalue,null secondvalue from firsttable
Insert into #tmp2(firstvalue,secondvalue) Select null firstvalue,secondvalue from secondtable
Select a.firstvalue,b.secondvalue from #tmp1 a join #tmp2 b on a.id=b.id
DROP TABLE #tmp1
DROP TABLE #tmp2
I need to create a list of numbers from a range (for example from x to y) in a random order so that every order has an equal chance.
I need this for a music player I write in C#, to create play lists in a random order.
Any ideas?
Thanks.
EDIT: I'm not interested in changing the original list; I just want to pick random indexes from a range in a random order so that every order has an equal chance.
Here's what I've written so far:
public static IEnumerable<int> RandomIndexes(int count)
{
if (count > 0)
{
int[] indexes = new int[count];
int indexesCountMinus1 = count - 1;
for (int i = 0; i < count; i++)
{
indexes[i] = i;
}
Random random = new Random();
while (indexesCountMinus1 > 0)
{
int currIndex = random.Next(0, indexesCountMinus1 + 1);
yield return indexes[currIndex];
indexes[currIndex] = indexes[indexesCountMinus1];
indexesCountMinus1--;
}
yield return indexes[0];
}
}
It's working, but the only problem with this is that I need to allocate an array in memory of size count. I'm looking for something that does not require memory allocation.
Thanks.
This can actually be tricky if you're not careful (i.e., using a naïve shuffling algorithm). Take a look at the Fisher-Yates/Knuth shuffle algorithm for proper distribution of values.
Once you have the shuffling algorithm, the rest should be easy.
Here's more detail from Jeff Atwood.
Lastly, here's Jon Skeet's implementation and description.
EDIT
I don't believe that there's a solution that satisfies your two conflicting requirements (first, to be random with no repeats and second to not allocate any additional memory). I believe you may be prematurely optimizing your solution as the memory implications should be negligible, unless you're embedded. Or, perhaps I'm just not smart enough to come up with an answer.
With that, here's code that will create an array of evenly distributed random indexes using the Knuth-Fisher-Yates algorithm (with a slight modification). You can cache the resulting array, or perform any number of optimizations depending on the rest of your implementation.
private static int[] BuildShuffledIndexArray( int size ) {
int[] array = new int[size];
Random rand = new Random();
for ( int currentIndex = array.Length - 1; currentIndex > 0; currentIndex-- ) {
int nextIndex = rand.Next( currentIndex + 1 );
Swap( array, currentIndex, nextIndex );
}
return array;
}
private static void Swap( IList<int> array, int firstIndex, int secondIndex ) {
if ( array[firstIndex] == 0 ) {
array[firstIndex] = firstIndex;
}
if ( array[secondIndex] == 0 ) {
array[secondIndex] = secondIndex;
}
int temp = array[secondIndex];
array[secondIndex] = array[firstIndex];
array[firstIndex] = temp;
}
NOTE: You can use ushort instead of int to halve the size in memory as long as you don't have more than 65,535 items in your playlist. You could always programmatically switch to int if the size exceeds ushort.MaxValue. If I, personally, added more than 65K items to a playlist, I wouldn't be shocked by increased memory utilization.
Remember, too, that this is a managed language. The VM will always reserve more memory than you are using to limit the number of times it needs to ask the OS for more RAM and to limit fragmentation.
EDIT
Okay, last try: we can look to tweak the performance/memory trade off: You could create your list of integers, then write it to disk. Then just keep a pointer to the offset in the file. Then every time you need a new number, you just have disk I/O to deal with. Perhaps you can find some balance here, and just read N-sized blocks of data into memory where N is some number you're comfortable with.
Seems like a lot of work for a shuffle algorithm, but if you're dead-set on conserving memory, then at least it's an option.
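A rough sketch of that disk-backed idea (my code; hypothetical file path, no error handling; the underlying FileStream buffers reads internally, which gives the block-at-a-time behavior):

using System.Collections.Generic;
using System.IO;

static class ShuffledIndexFile
{
    // Write the pre-shuffled indexes out once...
    public static void WriteIndexes(string path, int[] shuffled)
    {
        using (var writer = new BinaryWriter(File.Create(path)))
            foreach (int index in shuffled)
                writer.Write(index);
    }

    // ...then stream them back, keeping only the reader's buffer in memory.
    public static IEnumerable<int> ReadIndexes(string path)
    {
        using (var reader = new BinaryReader(File.OpenRead(path)))
        {
            long count = reader.BaseStream.Length / sizeof(int);
            for (long k = 0; k < count; k++)
                yield return reader.ReadInt32();
        }
    }
}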
If you use a maximal linear feedback shift register, you will use O(1) memory and roughly O(1) time per value. See here for a handy C implementation (two lines! woo-hoo!) and tables of feedback terms to use.
And here is a solution:
public class MaximalLFSR
{
private int GetFeedbackSize(uint v)
{
uint r = 0;
while ((v >>= 1) != 0)
{
r++;
}
if (r < 4)
r = 4;
return (int)r;
}
static uint[] _feedback = new uint[] {
0x9, 0x17, 0x30, 0x44, 0x8e,
0x108, 0x20d, 0x402, 0x829, 0x1013, 0x203d, 0x4001, 0x801f,
0x1002a, 0x2018b, 0x400e3, 0x801e1, 0x10011e, 0x2002cc, 0x400079, 0x80035e,
0x1000160, 0x20001e4, 0x4000203, 0x8000100, 0x10000235, 0x2000027d, 0x4000016f, 0x80000478
};
private uint GetFeedbackTerm(int bits)
{
if (bits < 4 || bits >= 28)
throw new ArgumentOutOfRangeException("bits");
return _feedback[bits];
}
public IEnumerable<int> RandomIndexes(int count)
{
if (count < 0)
throw new ArgumentOutOfRangeException("count");
int bitsForFeedback = GetFeedbackSize((uint)count);
Random r = new Random();
uint i = (uint)(r.Next(1, count - 1));
uint feedback = GetFeedbackTerm(bitsForFeedback);
int valuesReturned = 0;
while (valuesReturned < count)
{
if ((i & 1) != 0)
{
i = (i >> 1) ^ feedback;
}
else {
i = (i >> 1);
}
if (i <= count)
{
valuesReturned++;
yield return (int)(i-1);
}
}
}
}
Now, I selected the feedback terms (badly) at random from the link above. You could also implement a version that had multiple maximal terms and you select one of those at random, but you know what? This is pretty dang good for what you want.
Here is test code:
static void Main(string[] args)
{
while (true)
{
Console.Write("Enter a count: ");
string s = Console.ReadLine();
int count;
if (Int32.TryParse(s, out count))
{
MaximalLFSR lfsr = new MaximalLFSR();
foreach (int i in lfsr.RandomIndexes(count))
{
Console.Write(i + ", ");
}
}
Console.WriteLine("Done.");
}
}
Be aware that maximal LFSRs never generate 0. I've hacked around this by returning the i term minus 1, which works well enough. Also, since you want to guarantee uniqueness, I ignore anything out of range - the LFSR only generates sequences whose period is a power of two minus one, so in high ranges it will, in the worst case, generate nearly twice as many values as needed. These get skipped - that will still be faster than FYK (Fisher-Yates/Knuth).
Personally, for a music player, I wouldn't generate a shuffled list, and then play that, then generate another shuffled list when that runs out, but do something more like:
IEnumerable<Song> GetSongOrder(List<Song> allSongs)
{
    var random = new Random();
    var playOrder = new List<Song>();
    while (true)
    {
        // this step assigns an integer weight to each song, corresponding
        // to how likely it is to be played next: recently played songs get
        // a low weight, everything else a high one.
        // in a better implementation, this would look at the total number of
        // songs as well, and provide a smoother ramp up/down.
        var weights = allSongs
            .Select(x => playOrder.LastIndexOf(x) > playOrder.Count - 10 ? 1 : 50)
            .ToList();
        int position = random.Next(weights.Sum());
        foreach (int i in Enumerable.Range(0, allSongs.Count))
        {
            position -= weights[i];
            if (position < 0)
            {
                var song = allSongs[i];
                playOrder.Add(song);
                yield return song;
                break;
            }
        }
        // trim playOrder to prevent unbounded memory use here as well.
        if (playOrder.Count > allSongs.Count * 10)
            playOrder = playOrder.Skip(allSongs.Count * 8).ToList();
    }
}
This way songs are picked at random, but anything played recently is much less likely to come up again. It also provides "smoother" transitions from the end of one shuffle to the next: with back-to-back shuffled lists, the first song of the next shuffle can be the very song that just finished with probability 1/(total songs), whereas this algorithm has a lower (and configurable) chance of repeating one of the last x songs.
Unless you shuffle the original song list (which you said you don't want to do), you are going to have to allocate some additional memory to accomplish what you're after.
If you generate the random permutation of song indices beforehand (as you are doing), you obviously have to allocate some non-trivial amount of memory to store it, either encoded or as a list.
If the user doesn't need to be able to see the list, you could generate the random song order on the fly: After each song, pick another random song from the pool of unplayed songs. You still have to keep track of which songs have already been played, but you can use a bitfield for that. If you have 10000 songs, you just need 10000 bits (1250 bytes), each one representing whether the song has been played yet.
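A sketch of that bookkeeping (my code, not the answerer's; BitArray lives in System.Collections, and the linear scan per pick is a simplification that's fine for playlist-sized collections):

// Pick a uniformly random song from the not-yet-played pool and mark it
// as played. Returns -1 once every song has been played.
static int NextSongIndex(BitArray played, Random rand)
{
    int unplayed = 0;
    for (int i = 0; i < played.Count; i++)
        if (!played[i]) unplayed++;
    if (unplayed == 0)
        return -1;
    int k = rand.Next(unplayed); // choose the k-th unplayed song
    for (int i = 0; i < played.Count; i++)
    {
        if (played[i]) continue;
        if (k-- == 0)
        {
            played[i] = true;
            return i;
        }
    }
    return -1; // unreachable
}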
I don't know your exact limitations, but I have to wonder if the memory required to store a playlist is significant compared to the amount required for playing audio.
There are a number of methods of generating permutations without needing to store the state. See this question.
I think you should stick to your current solution (the one in your edit).
To do a re-order with no repetitions, and without making your code behave unreliably, you have to track what you have already used, e.g. by keeping unused indexes or, indirectly, by swapping within the original list.
I suggest checking it in the context of the working application, i.e. whether the memory involved is of any significance compared to the memory used by other pieces of the system.
From a logical standpoint, it is possible. Given a list of n songs, there are n! permutations; if you assign each permutation a number from 1 to n! (or 0 to n!-1 :-D) and pick one of those numbers at random, you can then store the number of the permutation that you are currently using, along with the original list and the index of the current song within the permutation.
For example, if you have a list of songs {1, 2, 3}, your permutations are:
0: {1, 2, 3}
1: {1, 3, 2}
2: {2, 1, 3}
3: {2, 3, 1}
4: {3, 1, 2}
5: {3, 2, 1}
So the only data I need to track is the original list ({1, 2, 3}), the current song index (e.g. 1) and the index of the permutation (e.g. 3). Then, if I want to find the next song to play, I know it's the third (index 2, zero-based) song of permutation 3, i.e. Song 1.
However, this method relies on you having an efficient means of determining the ith song of the jth permutation, which, until I've had a chance to think about it (or someone with a stronger mathematical background than mine can interject), is equivalent to "then a miracle happens". But the principle is there.
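The miracle turns out to exist: the factorial number system (Lehmer codes) decodes the jth permutation digit by digit, and the ith song is then just an index into the decoded array. A sketch (my code, using 0-based j to match the table above):

// Returns the j-th (0-based, lexicographic) permutation of 0..n-1.
// For the example above, NthPermutation(3, 3) yields {1, 2, 0},
// i.e. permutation {2, 3, 1} of the songs {1, 2, 3}.
static int[] NthPermutation(int n, long j)
{
    var remaining = new List<int>();
    for (int i = 0; i < n; i++) remaining.Add(i);
    long[] fact = new long[n];
    fact[0] = 1;
    for (int i = 1; i < n; i++) fact[i] = fact[i - 1] * i;
    var result = new int[n];
    for (int i = 0; i < n; i++)
    {
        long f = fact[n - 1 - i]; // permutations per choice at this digit
        int digit = (int)(j / f); // which remaining element to take
        j %= f;
        result[i] = remaining[digit];
        remaining.RemoveAt(digit);
    }
    return result;
}

This is O(n) per decode and stores no permutation list, though n! must fit in a long, so it only works up to n = 20 as written.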
If memory were really a concern past a certain number of records - and it's safe to say that once that boundary is reached, the list has enough items that a few repeats won't matter, as long as the same song isn't repeated twice in a row - I would use a combination method.
Case 1: If count < max memory constraint, generate the playlist ahead of time and use Knuth shuffle (see Jon Skeet's implementation, mentioned in other answers).
Case 2: If count >= max memory constraint, the song to be played is determined at run time (I'd do it as soon as the song starts playing, so the next song is already chosen by the time the current song ends). Save the last [max memory constraint, or some token value] number of songs played, generate a random number (R) between 1 and the song count, and if R is one of the last X songs played, generate a new R until it is not in the list. Play that song.
Your max memory constraints will always be upheld, although performance can suffer in case 2 if you've played a lot of songs/get repeat random numbers frequently by chance.
You could use a trick we do in SQL Server to order sets randomly, with the use of a GUID. The values are always distributed uniformly at random.
private IEnumerable<int> RandomIndexes(int startIndexInclusive, int endIndexInclusive)
{
if (endIndexInclusive < startIndexInclusive)
throw new Exception("endIndex must be equal or higher than startIndex");
List<int> originalList = new List<int>(endIndexInclusive - startIndexInclusive + 1);
for (int i = startIndexInclusive; i <= endIndexInclusive; i++)
originalList.Add(i);
return from i in originalList
orderby Guid.NewGuid()
select i;
}
You're going to have to allocate some memory, but it doesn't have to be a lot. You can reduce the memory footprint (the degree by which I'm unsure, as I don't know that much about the guts of C#) by using a bool array instead of int. Best case scenario this will only use (count / 8) bytes of memory, which isn't too bad (but I doubt C# actually represents bools as single bits).
public static IEnumerable<int> RandomIndexes(int count) {
Random rand = new Random();
bool[] used = new bool[count];
int i;
for (int counter = 0; counter < count; counter++) {
while (used[i = rand.Next(count)]); //i = some random unused value
used[i] = true;
yield return i;
}
}
Hope that helps!
As many others have said you should implement THEN optimize, and only optimize the parts that need it (which you check on with a profiler). I offer a (hopefully) elegant method of getting the list you need, which doesn't really care so much about performance:
using System;
using System.Collections.Generic;
using System.Linq;
namespace Test
{
class Program
{
static void Main(string[] a)
{
Random random = new Random();
List<int> list1 = new List<int>(); //source list
List<int> list2 = new List<int>();
list2 = random.SequenceWhile((i) =>
{
if (list2.Contains(i))
{
return false;
}
list2.Add(i);
return true;
},
() => list2.Count == list1.Count,
list1.Count).ToList();
}
}
public static class RandomExtensions
{
public static IEnumerable<int> SequenceWhile(
this Random random,
Func<int, bool> shouldSkip,
Func<bool> continuationCondition,
int maxValue)
{
int current = random.Next(maxValue);
while (continuationCondition())
{
if (!shouldSkip(current))
{
yield return current;
}
current = random.Next(maxValue);
}
}
}
}
It is pretty much impossible to do it without allocating extra memory. If you're worried about the amount of extra memory allocated, you could always pick a random subset and shuffle between those. You'll get repeats before every song is played, but with a sufficiently large subset I'll warrant few people will notice.
const int MaxItemsToShuffle = 20;
public static IEnumerable<int> RandomIndexes(int count)
{
Random random = new Random();
int indexCount = Math.Min(count, MaxItemsToShuffle);
int[] indexes = new int[indexCount];
if (count > MaxItemsToShuffle)
{
int cur = 0, subsetCount = MaxItemsToShuffle;
for (int i = 0; i < count; i += 1)
{
// select with probability (slots still needed) / (items still remaining)
if (random.NextDouble() < ((double)subsetCount / (count - i)))
{
indexes[cur] = i;
cur += 1;
subsetCount -= 1;
}
}
}
else
{
for (int i = 0; i < count; i += 1)
{
indexes[i] = i;
}
}
for (int i = indexCount; i > 0; i -= 1)
{
int curIndex = random.Next(0, i);
yield return indexes[curIndex];
indexes[curIndex] = indexes[i - 1];
}
}