Parallel.For with 64-bit unsigned indexes (UInt64)

Parallel.For with 64-bit unsigned indexes (UInt64) - c#

In C#, there's a System.Threading.Tasks.Parallel.For(...) which does the same as a for loop, without order, but in multiple threads.
The thing is, it works only on long and int, I want to work with ulong. Okay, I can typecast but I have some trouble with the borders.
Let's say, I want a loop from long.MaxValue-10 to long.MaxValue+10 (remember, I'm talking about ulong). How do I do that?
An example:
for (long i = long.MaxValue - 10; i < long.MaxValue; ++i)
{
Console.WriteLine(i);
}
//does the same as
System.Threading.Tasks.Parallel.For(long.MaxValue - 10, long.MaxValue, delegate(long i)
{
Console.WriteLine(i);
});
//except for the order, but theres no equivalent for
long max = long.MaxValue;
for (ulong i = (ulong)max - 10; i < (ulong)max + 10; ++i)
{
Console.WriteLine(i);
}

You can always write to Microsoft and ask them to add Parallel.For(ulong, ulong, Action<ulong>) to the next version of the .NET Framework. Until that comes out, you'll have to resort to something like this:
Parallel.For(-10L, 10L, x => { var index = long.MaxValue + (ulong) x; });

Or you can create a custom range for Parallel.ForEach
public static IEnumerable<ulong> Range(ulong fromInclusive, ulong toExclusive)
{
for (var i = fromInclusive; i < toExclusive; i++) yield return i;
}
public static void ParallelFor(ulong fromInclusive, ulong toExclusive, Action<ulong> body)
{
Parallel.ForEach(
Range(fromInclusive, toExclusive),
new ParallelOptions { MaxDegreeOfParallelism = 4 },
body);
}

This will work for every long value from long.MinValue inclusive to long.MaxValue exclusive
Parallel.For(long.MinValue, long.MaxValue, x =>
{
ulong u = (ulong)(x + (-(long.MinValue + 1))) + 1;
Console.WriteLine(u);
});

Related

Thread local BigInteger variable in nested Parallel.For is not processed for aggregation with standard patterns?

I tryed to refactor a nested sequential for loop into a nested Parallel.For loop.
But following the recommended parallel patterns and locks, the overall result was too low compared with the sequential result.
The problem was caused by a wrong or inconsistent use of BigInteger calculation methods.
For BigInteger you need to use ++-operator or BigInteger methods like BigInteger.Add().
My sources:
How to: Write a Parallel.For Loop with Thread-Local Variables
Threading in C# - Parallel Programming - The Parallel Class - For and ForEach
Please find sample code below:
internal static class Program
{
static Object lockObj = new Object();
static void Main()
{
//target result: 575
NestedLoopAggregationTest();
return;
}
private static void NestedLoopAggregationTest()
{
BigInteger totalSequential = 0;
BigInteger totalRecomandedPattern = 0;
BigInteger totalAntiPattern = 0;
const int iEnd1 = 5;
const int iEnd2 = 10;
const int iEnd3 = 15;
for (int iCn1 = 1; iCn1 <= iEnd1; iCn1++)
{
for (int iCn2 = 1; iCn2 <= iEnd2; iCn2++)
{
for (int iCn3 = iCn2 - 1; iCn3 <= iEnd3; iCn3++)
{
totalSequential++;
}
}
}
Parallel.For(1, iEnd1 + 1, (iCn1) =>
{
Parallel.For(1, iEnd2 + 1, (iCn2) =>
{
Parallel.For<BigInteger>(iCn2 - 1, iEnd3 + 1, () => 0, (iCn3, state, subtotal) =>
{
//Solution:
//for BigInteger use ++-operator or BigInteger.Add()
subtotal = BigInteger.Add(subtotal, 1);
return subtotal;
},
(subtotal) =>
{
lock (lockObj)
{
totalRecomandedPattern = BigInteger.Add(totalRecomandedPattern, subtotal);
}
}
);
});
});
MessageBox.Show(totalSequential.ToString() + Environment.NewLine + totalRecomandedPattern.ToString() +
}
}

Your current parallel implementation requires a lock every time subtotal is modified in the inner loop. This modified approach is faster than both your serial and parallel implementaions because it avoids a lock in the innermost loop:
Parallel.For(1, iEnd1 + 1, (iCn1) =>
{
Parallel.For(1, iEnd2 + 1, (iCn2) =>
{
BigInteger subtotal = 0;
for (var iCnt3 = iCn2 - 1; iCnt3 < iEnd3 + 1; iCnt3++)
{
//Solution:
//for BigInteger use ++-operator or BigInteger.Add()
subtotal = BigInteger.Add(subtotal, 1);
}
lock (lockObj)
{
totalRecomandedPatternModified = BigInteger.Add(totalRecomandedPatternModified, subtotal);
}
});
});
I increased each of the endpoints by a factor of 10 so the runtime is long enough to be measured on my hardware, then got the following average times:
Serial: 9ms
Parallel: 11ms
Modified: 2ms

Multiple thread accessing and editing the same double array

I need to iterate through every double in an array to do the "Laplacian Smoothing", "mixing values" with neighbour doubles.
I'll keep stored values in a temp clone array update the original at the end.
Pseudo code:
double[] A = new double[1000];
// Filling A with values...
double[] B = A.Clone as double[];
for(int loops=0;loops<10;loops++){ // start of the loop
for(int i=0;i<1000;i++){ // iterating through all doubles in the array
// Parallel.For(0, 1000, (i) => {
double v= A[i];
B[i]-=v;
B[i+1]+=v/2;
B[i-1]+=v/2;
// here i'm going out of array bounds, i know. Pseudo code, not relevant.
}
// });
}
A = B.Clone as double[];
With for it works correctly. "Smoothing" the values in the array.
With Parallel.For() I have some access sync problems: threads are colliding and some values are actually not stored correctly. Threads access and edit the array at the same index many times.
(I haven't tested this in a linear array, i'm actually working on a multidimensional array[x,y,z] ..)
How can I solve this?
I was thinking to make a separate array for each thread, and do the sum later... but I need to know the thread index and I haven't found anywhere in the web. (I'm still interested if a "thread index" exist even with a totally different solution...).
I'll accept any solution.

You probably need one of the more advanced overloads of the Parallel.For method:
public static ParallelLoopResult For<TLocal>(int fromInclusive, int toExclusive,
ParallelOptions parallelOptions, Func<TLocal> localInit,
Func<int, ParallelLoopState, TLocal, TLocal> body,
Action<TLocal> localFinally);
Executes a for loop with thread-local data in which iterations may run in parallel, loop options can be configured, and the state of the loop can be monitored and manipulated.
This looks quite intimidating with all the various lambdas it expects. The idea is to have each thread work with local data, and finally merge the data
at the end. Here is how you could use this method to solve your problem:
double[] A = new double[1000];
double[] B = (double[])A.Clone();
object locker = new object();
var parallelOptions = new ParallelOptions()
{
MaxDegreeOfParallelism = Environment.ProcessorCount
};
Parallel.For(0, A.Length, parallelOptions,
localInit: () => new double[A.Length], // create temp array per thread
body: (i, state, temp) =>
{
double v = A[i];
temp[i] -= v;
temp[i + 1] += v / 2;
temp[i - 1] += v / 2;
return temp; // return a reference to the same temp array
}, localFinally: (localB) =>
{
// Can be called in parallel with other threads, so we need to lock
lock (locker)
{
for (int i = 0; i < localB.Length; i++)
{
B[i] += localB[i];
}
}
});
I should mention that the workload of the above example is too granular, so I wouldn't expect large improvements in performance from the parallelization. Hopefully your actual workload is more chunky. If for example you have two nested loops, parallelizing only the outer loop will work greatly because the inner loop will provide the much needed chunkiness.
Alternative solution: Instead of creating auxiliary arrays per thread, you could just update directly the B array, and use locks only when processing an index in the dangerous zone near the boundaries of the partitions:
Parallel.ForEach(Partitioner.Create(0, A.Length), parallelOptions, range =>
{
bool lockTaken = false;
try
{
for (int i = range.Item1; i < range.Item2; i++)
{
bool shouldLock = i < range.Item1 + 1 || i >= range.Item2 - 1;
if (shouldLock) Monitor.Enter(locker, ref lockTaken);
double v = A[i];
B[i] -= v;
B[i + 1] += v / 2;
B[i - 1] += v / 2;
if (shouldLock) { Monitor.Exit(locker); lockTaken = false; }
}
}
finally
{
if (lockTaken) Monitor.Exit(locker);
}
});

Ok, it appears that modulus can solve pretty much all my problems.
Here a really simplified version of the working code:
(the big script is 3d and unfinished... )
private void RunScript(bool Go, ref object Results)
{
if(Go){
LaplacianSmooth(100);
// Needed to restart "RunScript" over and over
this.Component.ExpireSolution(true);
}
else{
A = new double[count];
A[100] = 10000;
A[500] = 10000;
}
Results = A;
}
// <Custom additional code>
public static int T = Environment.ProcessorCount;
public static int count = 1000;
public double[] A = new double[count];
public double[,] B = new double[count, T];
public void LaplacianSmooth(int loops){
for(int loop = 0;loop < loops;loop++){
B = new double[count, T];
// Copying values to first column of temp multidimensional-array
Parallel.For(0, count, new ParallelOptions { MaxDegreeOfParallelism = T }, i => {
B[i, 0] = A[i];
});
// Applying Laplacian smoothing
Parallel.For(0, count, new ParallelOptions { MaxDegreeOfParallelism = T }, i => {
int t = i % 16;
// Wrapped next and previous element indexes
int n = (i + 1) % count;
int p = (i + count - 1) % count;
double v = A[i] * 0.5;
B[i, t] -= v;
B[p, t] += v / 2;
B[n, t] += v / 2;
});
// Copying values back to main array
Parallel.For(0, count, new ParallelOptions { MaxDegreeOfParallelism = T }, i => {
double val = 0;
for(int t = 0;t < T;t++){
val += B[i, t];
}
A[i] = val;
});
}
}
There are no "collisions" with the threads, as confirmed by the result of "Mass Addition" (a sum) that is constant at 20000.
Thanks everyone for the tips!

Trying to find large prime numbers with Alea GPU

An exception occurs when I try to find the 100,000th prime number using Alea GPU. The algorithm works fine if I try to find a smaller prime number e.g. the 10,000th prime number.
I am using Alea v3.0.4, NVIDIA GTX 970, Cuda 9.2 drivers.
I am new to GPU programming. Any help would be greatly appreciated.
long[] primeNumber = new long[1]; // nth prime number to find
int n = 100000; // find the 100,000th prime number
var worker = Gpu.Default; // GTX 970 CUDA v9.2 drivers
long count = 0;
worker.LongFor(count, n, x =>
{
long a = 2;
while (count < n)
{
long b = 2;
long prime = 1;
while (b * b <= a)
{
if (a % b == 0)
{
prime = 0;
break;
}
b++;
}
if (prime > 0)
{
count++;
}
a++;
}
primeNumber[0] = (a - 1);
}
);
Here are the exception details:
System.Exception occurred HResult=0x80131500 Message=[CUDAError]
CUDA_ERROR_LAUNCH_FAILED Source=Alea StackTrace: at
Alea.CUDAInterop.cuSafeCall#2939.Invoke(String message) at
Alea.CUDAInterop.cuSafeCall(cudaError_enum result) at
A.cf5aded17df9f7cc4c132234dda010fa7.Copy#918-22.Invoke(Unit _arg9)
at Alea.Memory.Copy(FSharpOption1 streamOpt, Memory src, IntPtr
srcOffset, Memory dst, IntPtr dstOffset, FSharpOption1 lengthOpt)
at
Alea.ImplicitMemoryTrackerEntry.cdd2cd00c052408bcdbf03958f14266ca(FSharpFunc2
c600c458623dca7db199a0e417603dff4, Object
cd5116337150ebaa6de788dacd82516fa) at
Alea.ImplicitMemoryTrackerEntry.c6a75c171c9cccafb084beba315394985(FSharpFunc2
c600c458623dca7db199a0e417603dff4, Object
cd5116337150ebaa6de788dacd82516fa) at
Alea.ImplicitMemoryTracker.HostReadWriteBarrier(Object instance) at
Alea.GlobalImplicitMemoryTracker.HostReadWriteBarrier(Object instance)
at A.cf5aded17df9f7cc4c132234dda010fa7.clo#2359-624.Invoke(Object
arg00) at
Microsoft.FSharp.Collections.SeqModule.Iterate[T](FSharpFunc2 action,
IEnumerable1 source) at Alea.Kernel.LaunchRaw(LaunchParam lp,
FSharpOption1 instanceOpt, FSharpList1 args) at
Alea.Parallel.Device.DeviceFor.For(Gpu gpu, Int64 fromInclusive, Int64
toExclusive, Action1 op) at Alea.Parallel.GpuExtension.LongFor(Gpu
gpu, Int64 fromInclusive, Int64 toExclusive, Action1 op) at
TestingGPU.Program.Execute(Int32 t) in
C:\Users..\source\repos\TestingGPU\TestingGPU\Program.cs:line 148
at TestingGPU.Program.Main(String[] args)
Working Solution:
static void Main(string[] args)
{
var devices = Device.Devices;
foreach (var device in devices)
{
Console.WriteLine(device.ToString());
}
while (true)
{
Console.WriteLine("Enter a number to check if it is a prime number:");
string line = Console.ReadLine();
long checkIfPrime = Convert.ToInt64(line);
Stopwatch sw = new Stopwatch();
sw.Start();
bool GPUisPrime = GPUIsItPrime(checkIfPrime+1);
sw.Stop();
Stopwatch sw2 = new Stopwatch();
sw2.Start();
bool CPUisPrime = CPUIsItPrime(checkIfPrime+1);
sw2.Stop();
Console.WriteLine($"GPU: is {checkIfPrime} prime? {GPUisPrime} Time Elapsed: {sw.ElapsedMilliseconds.ToString()}");
Console.WriteLine($"CPU: is {checkIfPrime} prime? {CPUisPrime} Time Elapsed: {sw2.ElapsedMilliseconds.ToString()}");
}
}
[GpuManaged]
private static bool GPUIsItPrime(long n)
{
//Sieve of Eratosthenes Algorithm
bool[] isComposite = new bool[n];
var worker = Gpu.Default;
worker.LongFor(2, n, i =>
{
if (!(isComposite[i]))
{
for (long j = 2; (j * i) < isComposite.Length; j++)
{
isComposite[j * i] = true;
}
}
});
return !isComposite[n-1];
}
private static bool CPUIsItPrime(long n)
{
//Sieve of Eratosthenes Algorithm
bool[] isComposite = new bool[n];
for (int i = 2; i < n; i++)
{
if (!isComposite[i])
{
for (long j = 2; (j * i) < n; j++)
{
isComposite[j * i] = true;
}
}
}
return !isComposite[n-1];
}

Your code doesn't look right. Given a parallel for-loop method here (LongFor), Alea will spawn "n" threads, with an index "x" used to identify what the thread number is. So, for example a simple example like For(0, n, x => a[x] = x); uses "x" to initialize a[] with { 0, 1, 2, ...., n - 1}. But, your kernel code does not use "x" anywhere in the code. Consequently, you run the same code "n" times with absolutely no difference. Why then run on a GPU? What I think you want is to do is to compute in thread "x" whether "x" is prime. With result in hand, set bool prime[x] = true or false. Then, afterwards, in the kernel after all that, add a sync call, followed with a test using a single thread (e.g., x == 0) to go through prime[] and pick the largest prime from the array. Otherwise, there's a lot of collisions for 'primeNumber[0] = (a - 1);' by n-threads on the GPU. I can't imagine how you would ever get the right result. Finally, you probably want to make sure using some Alea call that prime[] is never copied to/from the GPU. But, I don't know how you do that in Alea. The compiler might be smart enough to know that prime[] is only used in the kernel code.

Quick Sort Implementation with large numbers [duplicate]

I learnt about quick sort and how it can be implemented in both Recursive and Iterative method.
In Iterative method:
Push the range (0...n) into the stack
Partition the given array with a pivot
Pop the top element.
Push the partitions (index range) onto a stack if the range has more than one element
Do the above 3 steps, till the stack is empty
And the recursive version is the normal one defined in wiki.
I learnt that recursive algorithms are always slower than their iterative counterpart.
So, Which method is preferred in terms of time complexity (memory is not a concern)?
Which one is fast enough to use in Programming contest?
Is c++ STL sort() using a recursive approach?

In terms of (asymptotic) time complexity - they are both the same.
"Recursive is slower then iterative" - the rational behind this statement is because of the overhead of the recursive stack (saving and restoring the environment between calls).
However -these are constant number of ops, while not changing the number of "iterations".
Both recursive and iterative quicksort are O(nlogn) average case and O(n^2) worst case.
EDIT:
just for the fun of it I ran a benchmark with the (java) code attached to the post , and then I ran wilcoxon statistic test, to check what is the probability that the running times are indeed distinct
The results may be conclusive (P_VALUE=2.6e-34, https://en.wikipedia.org/wiki/P-value. Remember that the P_VALUE is P(T >= t | H) where T is the test statistic and H is the null hypothesis). But the answer is not what you expected.
The average of the iterative solution was 408.86 ms while of recursive was 236.81 ms
(Note - I used Integer and not int as argument to recursiveQsort() - otherwise the recursive would have achieved much better, because it doesn't have to box a lot of integers, which is also time consuming - I did it because the iterative solution has no choice but doing so.
Thus - your assumption is not true, the recursive solution is faster (for my machine and java for the very least) than the iterative one with P_VALUE=2.6e-34.
public static void recursiveQsort(int[] arr,Integer start, Integer end) {
if (end - start < 2) return; //stop clause
int p = start + ((end-start)/2);
p = partition(arr,p,start,end);
recursiveQsort(arr, start, p);
recursiveQsort(arr, p+1, end);
}
public static void iterativeQsort(int[] arr) {
Stack<Integer> stack = new Stack<Integer>();
stack.push(0);
stack.push(arr.length);
while (!stack.isEmpty()) {
int end = stack.pop();
int start = stack.pop();
if (end - start < 2) continue;
int p = start + ((end-start)/2);
p = partition(arr,p,start,end);
stack.push(p+1);
stack.push(end);
stack.push(start);
stack.push(p);
}
}
private static int partition(int[] arr, int p, int start, int end) {
int l = start;
int h = end - 2;
int piv = arr[p];
swap(arr,p,end-1);
while (l < h) {
if (arr[l] < piv) {
l++;
} else if (arr[h] >= piv) {
h--;
} else {
swap(arr,l,h);
}
}
int idx = h;
if (arr[h] < piv) idx++;
swap(arr,end-1,idx);
return idx;
}
private static void swap(int[] arr, int i, int j) {
int temp = arr[i];
arr[i] = arr[j];
arr[j] = temp;
}
public static void main(String... args) throws Exception {
Random r = new Random(1);
int SIZE = 1000000;
int N = 100;
int[] arr = new int[SIZE];
int[] millisRecursive = new int[N];
int[] millisIterative = new int[N];
for (int t = 0; t < N; t++) {
for (int i = 0; i < SIZE; i++) {
arr[i] = r.nextInt(SIZE);
}
int[] tempArr = Arrays.copyOf(arr, arr.length);
long start = System.currentTimeMillis();
iterativeQsort(tempArr);
millisIterative[t] = (int)(System.currentTimeMillis()-start);
tempArr = Arrays.copyOf(arr, arr.length);
start = System.currentTimeMillis();
recursvieQsort(tempArr,0,arr.length);
millisRecursive[t] = (int)(System.currentTimeMillis()-start);
}
int sum = 0;
for (int x : millisRecursive) {
System.out.println(x);
sum += x;
}
System.out.println("end of recursive. AVG = " + ((double)sum)/millisRecursive.length);
sum = 0;
for (int x : millisIterative) {
System.out.println(x);
sum += x;
}
System.out.println("end of iterative. AVG = " + ((double)sum)/millisIterative.length);
}

Recursion is NOT always slower than iteration. Quicksort is perfect example of it. The only way to do this in iterate way is create stack structure. So in other way do the same that the compiler do if we use recursion, and propably you will do this worse than compiler. Also there will be more jumps if you don't use recursion (to pop and push values to stack).

That's the solution i came up with in Javascript. I think it works.
const myArr = [33, 103, 3, 726, 200, 984, 198, 764, 9]
document.write('initial order :', JSON.stringify(myArr), '<br><br>')
qs_iter(myArr)
document.write('_Final order :', JSON.stringify(myArr))
function qs_iter(items) {
if (!items || items.length <= 1) {
return items
}
var stack = []
var low = 0
var high = items.length - 1
stack.push([low, high])
while (stack.length) {
var range = stack.pop()
low = range[0]
high = range[1]
if (low < high) {
var pivot = Math.floor((low + high) / 2)
stack.push([low, pivot])
stack.push([pivot + 1, high])
while (low < high) {
while (low < pivot && items[low] <= items[pivot]) low++
while (high > pivot && items[high] > items[pivot]) high--
if (low < high) {
var tmp = items[low]
items[low] = items[high]
items[high] = tmp
}
}
}
}
return items
}
Let me know if you found a mistake :)
Mister Jojo UPDATE :
this code just mixes values that can in rare cases lead to a sort, in other words never.
For those who have a doubt, I put it in snippet.

Generating permutations of a set (most efficiently)

I would like to generate all permutations of a set (a collection), like so:
Collection: 1, 2, 3
Permutations: {1, 2, 3}
{1, 3, 2}
{2, 1, 3}
{2, 3, 1}
{3, 1, 2}
{3, 2, 1}
This isn't a question of "how", in general, but more about how most efficiently.
Also, I wouldn't want to generate ALL permutations and return them, but only generating a single permutation, at a time, and continuing only if necessary (much like Iterators - which I've tried as well, but turned out to be less efficient).
I've tested many algorithms and approaches and came up with this code, which is most efficient of those I tried:
public static bool NextPermutation<T>(T[] elements) where T : IComparable<T>
{
// More efficient to have a variable instead of accessing a property
var count = elements.Length;
// Indicates whether this is the last lexicographic permutation
var done = true;
// Go through the array from last to first
for (var i = count - 1; i > 0; i--)
{
var curr = elements[i];
// Check if the current element is less than the one before it
if (curr.CompareTo(elements[i - 1]) < 0)
{
continue;
}
// An element bigger than the one before it has been found,
// so this isn't the last lexicographic permutation.
done = false;
// Save the previous (bigger) element in a variable for more efficiency.
var prev = elements[i - 1];
// Have a variable to hold the index of the element to swap
// with the previous element (the to-swap element would be
// the smallest element that comes after the previous element
// and is bigger than the previous element), initializing it
// as the current index of the current item (curr).
var currIndex = i;
// Go through the array from the element after the current one to last
for (var j = i + 1; j < count; j++)
{
// Save into variable for more efficiency
var tmp = elements[j];
// Check if tmp suits the "next swap" conditions:
// Smallest, but bigger than the "prev" element
if (tmp.CompareTo(curr) < 0 && tmp.CompareTo(prev) > 0)
{
curr = tmp;
currIndex = j;
}
}
// Swap the "prev" with the new "curr" (the swap-with element)
elements[currIndex] = prev;
elements[i - 1] = curr;
// Reverse the order of the tail, in order to reset it's lexicographic order
for (var j = count - 1; j > i; j--, i++)
{
var tmp = elements[j];
elements[j] = elements[i];
elements[i] = tmp;
}
// Break since we have got the next permutation
// The reason to have all the logic inside the loop is
// to prevent the need of an extra variable indicating "i" when
// the next needed swap is found (moving "i" outside the loop is a
// bad practice, and isn't very readable, so I preferred not doing
// that as well).
break;
}
// Return whether this has been the last lexicographic permutation.
return done;
}
It's usage would be sending an array of elements, and getting back a boolean indicating whether this was the last lexicographical permutation or not, as well as having the array altered to the next permutation.
Usage example:
var arr = new[] {1, 2, 3};
PrintArray(arr);
while (!NextPermutation(arr))
{
PrintArray(arr);
}
The thing is that I'm not happy with the speed of the code.
Iterating over all permutations of an array of size 11 takes about 4 seconds.
Although it could be considered impressive, since the amount of possible permutations of a set of size 11 is 11! which is nearly 40 million.
Logically, with an array of size 12 it will take about 12 times more time, since 12! is 11! * 12, and with an array of size 13 it will take about 13 times more time than the time it took with size 12, and so on.
So you can easily understand how with an array of size 12 and more, it really takes a very long time to go through all permutations.
And I have a strong hunch that I can somehow cut that time by a lot (without switching to a language other than C# - because compiler optimization really does optimize pretty nicely, and I doubt I could optimize as good, manually, in Assembly).
Does anyone know any other way to get that done faster?
Do you have any idea as to how to make the current algorithm faster?
Note that I don't want to use an external library or service in order to do that - I want to have the code itself and I want it to be as efficient as humanly possible.

This might be what you're looking for.
private static bool NextPermutation(int[] numList)
{
/*
Knuths
1. Find the largest index j such that a[j] < a[j + 1]. If no such index exists, the permutation is the last permutation.
2. Find the largest index l such that a[j] < a[l]. Since j + 1 is such an index, l is well defined and satisfies j < l.
3. Swap a[j] with a[l].
4. Reverse the sequence from a[j + 1] up to and including the final element a[n].
*/
var largestIndex = -1;
for (var i = numList.Length - 2; i >= 0; i--)
{
if (numList[i] < numList[i + 1]) {
largestIndex = i;
break;
}
}
if (largestIndex < 0) return false;
var largestIndex2 = -1;
for (var i = numList.Length - 1 ; i >= 0; i--) {
if (numList[largestIndex] < numList[i]) {
largestIndex2 = i;
break;
}
}
var tmp = numList[largestIndex];
numList[largestIndex] = numList[largestIndex2];
numList[largestIndex2] = tmp;
for (int i = largestIndex + 1, j = numList.Length - 1; i < j; i++, j--) {
tmp = numList[i];
numList[i] = numList[j];
numList[j] = tmp;
}
return true;
}

Update 2018-05-28:
A new multithreaded version (lot faster) is available below as another answer.
Also an article about permutation: Permutations: Fast implementations and a new indexing algorithm allowing multithreading
A little bit too late...
According to recent tests (updated 2018-05-22)
Fastest is mine BUT not in lexicographic order
For fastest lexicographic order, Sani Singh Huttunen solution seems to be the way to go.
Performance test results for 10 items (10!) in release on my machine (millisecs):
Ouellet : 29
SimpleVar: 95
Erez Robinson : 156
Sani Singh Huttunen : 37
Pengyang : 45047
Performance test results for 13 items (13!) in release on my machine (seconds):
Ouellet : 48.437
SimpleVar: 159.869
Erez Robinson : 327.781
Sani Singh Huttunen : 64.839
Advantages of my solution:
Heap's algorithm (Single swap per permutation)
No multiplication (like some implementations seen on the web)
Inlined swap
Generic
No unsafe code
In place (very low memory usage)
No modulo (only first bit compare)
My implementation of Heap's algorithm:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Runtime.CompilerServices;
namespace WpfPermutations
{
/// <summary>
/// EO: 2016-04-14
/// Generator of all permutations of an array of anything.
/// Base on Heap's Algorithm. See: https://en.wikipedia.org/wiki/Heap%27s_algorithm#cite_note-3
/// </summary>
public static class Permutations
{
/// <summary>
/// Heap's algorithm to find all pmermutations. Non recursive, more efficient.
/// </summary>
/// <param name="items">Items to permute in each possible ways</param>
/// <param name="funcExecuteAndTellIfShouldStop"></param>
/// <returns>Return true if cancelled</returns>
public static bool ForAllPermutation<T>(T[] items, Func<T[], bool> funcExecuteAndTellIfShouldStop)
{
int countOfItem = items.Length;
if (countOfItem <= 1)
{
return funcExecuteAndTellIfShouldStop(items);
}
var indexes = new int[countOfItem];
// Unecessary. Thanks to NetManage for the advise
// for (int i = 0; i < countOfItem; i++)
// {
// indexes[i] = 0;
// }
if (funcExecuteAndTellIfShouldStop(items))
{
return true;
}
for (int i = 1; i < countOfItem;)
{
if (indexes[i] < i)
{ // On the web there is an implementation with a multiplication which should be less efficient.
if ((i & 1) == 1) // if (i % 2 == 1) ... more efficient ??? At least the same.
{
Swap(ref items[i], ref items[indexes[i]]);
}
else
{
Swap(ref items[i], ref items[0]);
}
if (funcExecuteAndTellIfShouldStop(items))
{
return true;
}
indexes[i]++;
i = 1;
}
else
{
indexes[i++] = 0;
}
}
return false;
}
/// <summary>
/// This function is to show a linq way but is far less efficient
/// From: StackOverflow user: Pengyang : http://stackoverflow.com/questions/756055/listing-all-permutations-of-a-string-integer
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="list"></param>
/// <param name="length"></param>
/// <returns></returns>
static IEnumerable<IEnumerable<T>> GetPermutations<T>(IEnumerable<T> list, int length)
{
if (length == 1) return list.Select(t => new T[] { t });
return GetPermutations(list, length - 1)
.SelectMany(t => list.Where(e => !t.Contains(e)),
(t1, t2) => t1.Concat(new T[] { t2 }));
}
/// <summary>
/// Swap 2 elements of same type
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="a"></param>
/// <param name="b"></param>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
static void Swap<T>(ref T a, ref T b)
{
T temp = a;
a = b;
b = temp;
}
/// <summary>
/// Func to show how to call. It does a little test for an array of 4 items.
/// </summary>
public static void Test()
{
ForAllPermutation("123".ToCharArray(), (vals) =>
{
Console.WriteLine(String.Join("", vals));
return false;
});
int[] values = new int[] { 0, 1, 2, 4 };
Console.WriteLine("Ouellet heap's algorithm implementation");
ForAllPermutation(values, (vals) =>
{
Console.WriteLine(String.Join("", vals));
return false;
});
Console.WriteLine("Linq algorithm");
foreach (var v in GetPermutations(values, values.Length))
{
Console.WriteLine(String.Join("", v));
}
// Performance Heap's against Linq version : huge differences
int count = 0;
values = new int[10];
for (int n = 0; n < values.Length; n++)
{
values[n] = n;
}
Stopwatch stopWatch = new Stopwatch();
ForAllPermutation(values, (vals) =>
{
foreach (var v in vals)
{
count++;
}
return false;
});
stopWatch.Stop();
Console.WriteLine($"Ouellet heap's algorithm implementation {count} items in {stopWatch.ElapsedMilliseconds} millisecs");
count = 0;
stopWatch.Reset();
stopWatch.Start();
foreach (var vals in GetPermutations(values, values.Length))
{
foreach (var v in vals)
{
count++;
}
}
stopWatch.Stop();
Console.WriteLine($"Linq {count} items in {stopWatch.ElapsedMilliseconds} millisecs");
}
}
}
An this is my test code:
Task.Run(() =>
{
int[] values = new int[12];
for (int n = 0; n < values.Length; n++)
{
values[n] = n;
}
// Eric Ouellet Algorithm
int count = 0;
var stopwatch = new Stopwatch();
stopwatch.Reset();
stopwatch.Start();
Permutations.ForAllPermutation(values, (vals) =>
{
foreach (var v in vals)
{
count++;
}
return false;
});
stopwatch.Stop();
Console.WriteLine($"This {count} items in {stopwatch.ElapsedMilliseconds} millisecs");
// Simple Plan Algorithm
count = 0;
stopwatch.Reset();
stopwatch.Start();
PermutationsSimpleVar permutations2 = new PermutationsSimpleVar();
permutations2.Permutate(1, values.Length, (int[] vals) =>
{
foreach (var v in vals)
{
count++;
}
});
stopwatch.Stop();
Console.WriteLine($"Simple Plan {count} items in {stopwatch.ElapsedMilliseconds} millisecs");
// ErezRobinson Algorithm
count = 0;
stopwatch.Reset();
stopwatch.Start();
foreach(var vals in PermutationsErezRobinson.QuickPerm(values))
{
foreach (var v in vals)
{
count++;
}
};
stopwatch.Stop();
Console.WriteLine($"Erez Robinson {count} items in {stopwatch.ElapsedMilliseconds} millisecs");
});
Usage examples:
ForAllPermutation("123".ToCharArray(), (vals) =>
{
Console.WriteLine(String.Join("", vals));
return false;
});
int[] values = new int[] { 0, 1, 2, 4 };
ForAllPermutation(values, (vals) =>
{
Console.WriteLine(String.Join("", vals));
return false;
});

Well, if you can handle it in C and then translate to your language of choice, you can't really go much faster than this, because the time will be dominated by print:
void perm(char* s, int n, int i){
if (i >= n-1) print(s);
else {
perm(s, n, i+1);
for (int j = i+1; j<n; j++){
swap(s[i], s[j]);
perm(s, n, i+1);
swap(s[i], s[j]);
}
}
}
perm("ABC", 3, 0);

Update 2018-05-28, a new version, the fastest ... (multi-threaded)
Time taken for fastest algorithms
Need: Sani Singh Huttunen (fastest lexico) solution and my new OuelletLexico3 which support indexing
Indexing has 2 main advantages:
allows to get anyone permutation directly
allows multi-threading (derived from the first advantage)
Article: Permutations: Fast implementations and a new indexing algorithm allowing multithreading
On my machine (6 hyperthread cores : 12 threads) Xeon E5-1660 0 # 3.30Ghz, tests algorithms running with empty stuff to do for 13! items (time in millisecs):
53071: Ouellet (implementation of Heap)
65366: Sani Singh Huttunen (Fastest lexico)
11377: Mix OuelletLexico3 - Sani Singh Huttunen
A side note: using shares properties/variables between threads for permutation action will strongly impact performance if their usage is modification (read / write). Doing so will generate "false sharing" between threads. You will not get expected performance. I got this behavior while testing. My experience showed problems when I try to increase the global variable for the total count of permutation.
Usage:
PermutationMixOuelletSaniSinghHuttunen.ExecuteForEachPermutationMT(
new int[] {1, 2, 3, 4},
p =>
{
Console.WriteLine($"Values: {p[0]}, {p[1]}, p[2]}, {p[3]}");
});
Code:
using System;
using System.Runtime.CompilerServices;
namespace WpfPermutations
{
public class Factorial
{
// ************************************************************************
protected static long[] FactorialTable = new long[21];
// ************************************************************************
static Factorial()
{
FactorialTable[0] = 1; // To prevent divide by 0
long f = 1;
for (int i = 1; i <= 20; i++)
{
f = f * i;
FactorialTable[i] = f;
}
}
// ************************************************************************
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static long GetFactorial(int val) // a long can only support up to 20!
{
if (val > 20)
{
throw new OverflowException($"{nameof(Factorial)} only support a factorial value <= 20");
}
return FactorialTable[val];
}
// ************************************************************************
}
}
namespace WpfPermutations
{
public class PermutationSaniSinghHuttunen
{
public static bool NextPermutation(int[] numList)
{
/*
Knuths
1. Find the largest index j such that a[j] < a[j + 1]. If no such index exists, the permutation is the last permutation.
2. Find the largest index l such that a[j] < a[l]. Since j + 1 is such an index, l is well defined and satisfies j < l.
3. Swap a[j] with a[l].
4. Reverse the sequence from a[j + 1] up to and including the final element a[n].
*/
var largestIndex = -1;
for (var i = numList.Length - 2; i >= 0; i--)
{
if (numList[i] < numList[i + 1])
{
largestIndex = i;
break;
}
}
if (largestIndex < 0) return false;
var largestIndex2 = -1;
for (var i = numList.Length - 1; i >= 0; i--)
{
if (numList[largestIndex] < numList[i])
{
largestIndex2 = i;
break;
}
}
var tmp = numList[largestIndex];
numList[largestIndex] = numList[largestIndex2];
numList[largestIndex2] = tmp;
for (int i = largestIndex + 1, j = numList.Length - 1; i < j; i++, j--)
{
tmp = numList[i];
numList[i] = numList[j];
numList[j] = tmp;
}
return true;
}
}
}
using System;
namespace WpfPermutations
{
public class PermutationOuelletLexico3<T> // Enable indexing
{
// ************************************************************************
private T[] _sortedValues;
private bool[] _valueUsed;
public readonly long MaxIndex; // long to support 20! or less
// ************************************************************************
public PermutationOuelletLexico3(T[] sortedValues)
{
_sortedValues = sortedValues;
Result = new T[_sortedValues.Length];
_valueUsed = new bool[_sortedValues.Length];
MaxIndex = Factorial.GetFactorial(_sortedValues.Length);
}
// ************************************************************************
public T[] Result { get; private set; }
// ************************************************************************
/// <summary>
/// Sort Index is 0 based and should be less than MaxIndex. Otherwise you get an exception.
/// </summary>
/// <param name="sortIndex"></param>
/// <param name="result">Value is not used as inpu, only as output. Re-use buffer in order to save memory</param>
/// <returns></returns>
public void GetSortedValuesFor(long sortIndex)
{
int size = _sortedValues.Length;
if (sortIndex < 0)
{
throw new ArgumentException("sortIndex should greater or equal to 0.");
}
if (sortIndex >= MaxIndex)
{
throw new ArgumentException("sortIndex should less than factorial(the lenght of items)");
}
for (int n = 0; n < _valueUsed.Length; n++)
{
_valueUsed[n] = false;
}
long factorielLower = MaxIndex;
for (int index = 0; index < size; index++)
{
long factorielBigger = factorielLower;
factorielLower = Factorial.GetFactorial(size - index - 1); // factorielBigger / inverseIndex;
int resultItemIndex = (int)(sortIndex % factorielBigger / factorielLower);
int correctedResultItemIndex = 0;
for(;;)
{
if (! _valueUsed[correctedResultItemIndex])
{
resultItemIndex--;
if (resultItemIndex < 0)
{
break;
}
}
correctedResultItemIndex++;
}
Result[index] = _sortedValues[correctedResultItemIndex];
_valueUsed[correctedResultItemIndex] = true;
}
}
// ************************************************************************
}
}
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
namespace WpfPermutations
{
public class PermutationMixOuelletSaniSinghHuttunen
{
// ************************************************************************
private long _indexFirst;
private long _indexLastExclusive;
private int[] _sortedValues;
// ************************************************************************
public PermutationMixOuelletSaniSinghHuttunen(int[] sortedValues, long indexFirst = -1, long indexLastExclusive = -1)
{
if (indexFirst == -1)
{
indexFirst = 0;
}
if (indexLastExclusive == -1)
{
indexLastExclusive = Factorial.GetFactorial(sortedValues.Length);
}
if (indexFirst >= indexLastExclusive)
{
throw new ArgumentException($"{nameof(indexFirst)} should be less than {nameof(indexLastExclusive)}");
}
_indexFirst = indexFirst;
_indexLastExclusive = indexLastExclusive;
_sortedValues = sortedValues;
}
// ************************************************************************
public void ExecuteForEachPermutation(Action<int[]> action)
{
// Console.WriteLine($"Thread {System.Threading.Thread.CurrentThread.ManagedThreadId} started: {_indexFirst} {_indexLastExclusive}");
long index = _indexFirst;
PermutationOuelletLexico3<int> permutationOuellet = new PermutationOuelletLexico3<int>(_sortedValues);
permutationOuellet.GetSortedValuesFor(index);
action(permutationOuellet.Result);
index++;
int[] values = permutationOuellet.Result;
while (index < _indexLastExclusive)
{
PermutationSaniSinghHuttunen.NextPermutation(values);
action(values);
index++;
}
// Console.WriteLine($"Thread {System.Threading.Thread.CurrentThread.ManagedThreadId} ended: {DateTime.Now.ToString("yyyyMMdd_HHmmss_ffffff")}");
}
// ************************************************************************
public static void ExecuteForEachPermutationMT(int[] sortedValues, Action<int[]> action)
{
int coreCount = Environment.ProcessorCount; // Hyper treading are taken into account (ex: on a 4 cores hyperthreaded = 8)
long itemsFactorial = Factorial.GetFactorial(sortedValues.Length);
long partCount = (long)Math.Ceiling((double)itemsFactorial / (double)coreCount);
long startIndex = 0;
var tasks = new List<Task>();
for (int coreIndex = 0; coreIndex < coreCount; coreIndex++)
{
long stopIndex = Math.Min(startIndex + partCount, itemsFactorial);
PermutationMixOuelletSaniSinghHuttunen mix = new PermutationMixOuelletSaniSinghHuttunen(sortedValues, startIndex, stopIndex);
Task task = Task.Run(() => mix.ExecuteForEachPermutation(action));
tasks.Add(task);
if (stopIndex == itemsFactorial)
{
break;
}
startIndex = startIndex + partCount;
}
Task.WaitAll(tasks.ToArray());
}
// ************************************************************************
}
}

The fastest permutation algorithm that i know of is the QuickPerm algorithm.
Here is the implementation, it uses yield return so you can iterate one at a time like required.
Code:
public static IEnumerable<IEnumerable<T>> QuickPerm<T>(this IEnumerable<T> set)
{
int N = set.Count();
int[] a = new int[N];
int[] p = new int[N];
var yieldRet = new T[N];
List<T> list = new List<T>(set);
int i, j, tmp; // Upper Index i; Lower Index j
for (i = 0; i < N; i++)
{
// initialize arrays; a[N] can be any type
a[i] = i + 1; // a[i] value is not revealed and can be arbitrary
p[i] = 0; // p[i] == i controls iteration and index boundaries for i
}
yield return list;
//display(a, 0, 0); // remove comment to display array a[]
i = 1; // setup first swap points to be 1 and 0 respectively (i & j)
while (i < N)
{
if (p[i] < i)
{
j = i%2*p[i]; // IF i is odd then j = p[i] otherwise j = 0
tmp = a[j]; // swap(a[j], a[i])
a[j] = a[i];
a[i] = tmp;
//MAIN!
for (int x = 0; x < N; x++)
{
yieldRet[x] = list[a[x]-1];
}
yield return yieldRet;
//display(a, j, i); // remove comment to display target array a[]
// MAIN!
p[i]++; // increase index "weight" for i by one
i = 1; // reset index i to 1 (assumed)
}
else
{
// otherwise p[i] == i
p[i] = 0; // reset p[i] to zero
i++; // set new index value for i (increase by one)
} // if (p[i] < i)
} // while(i < N)
}

Here is the fastest implementation I ended up with:
public class Permutations
{
private readonly Mutex _mutex = new Mutex();
private Action<int[]> _action;
private Action<IntPtr> _actionUnsafe;
private unsafe int* _arr;
private IntPtr _arrIntPtr;
private unsafe int* _last;
private unsafe int* _lastPrev;
private unsafe int* _lastPrevPrev;
public int Size { get; private set; }
public bool IsRunning()
{
return this._mutex.SafeWaitHandle.IsClosed;
}
public bool Permutate(int start, int count, Action<int[]> action, bool async = false)
{
return this.Permutate(start, count, action, null, async);
}
public bool Permutate(int start, int count, Action<IntPtr> actionUnsafe, bool async = false)
{
return this.Permutate(start, count, null, actionUnsafe, async);
}
private unsafe bool Permutate(int start, int count, Action<int[]> action, Action<IntPtr> actionUnsafe, bool async = false)
{
if (!this._mutex.WaitOne(0))
{
return false;
}
var x = (Action)(() =>
{
this._actionUnsafe = actionUnsafe;
this._action = action;
this.Size = count;
this._arr = (int*)Marshal.AllocHGlobal(count * sizeof(int));
this._arrIntPtr = new IntPtr(this._arr);
for (var i = 0; i < count - 3; i++)
{
this._arr[i] = start + i;
}
this._last = this._arr + count - 1;
this._lastPrev = this._last - 1;
this._lastPrevPrev = this._lastPrev - 1;
*this._last = count - 1;
*this._lastPrev = count - 2;
*this._lastPrevPrev = count - 3;
this.Permutate(count, this._arr);
});
if (!async)
{
x();
}
else
{
new Thread(() => x()).Start();
}
return true;
}
private unsafe void Permutate(int size, int* start)
{
if (size == 3)
{
this.DoAction();
Swap(this._last, this._lastPrev);
this.DoAction();
Swap(this._last, this._lastPrevPrev);
this.DoAction();
Swap(this._last, this._lastPrev);
this.DoAction();
Swap(this._last, this._lastPrevPrev);
this.DoAction();
Swap(this._last, this._lastPrev);
this.DoAction();
return;
}
var sizeDec = size - 1;
var startNext = start + 1;
var usedStarters = 0;
for (var i = 0; i < sizeDec; i++)
{
this.Permutate(sizeDec, startNext);
usedStarters |= 1 << *start;
for (var j = startNext; j <= this._last; j++)
{
var mask = 1 << *j;
if ((usedStarters & mask) != mask)
{
Swap(start, j);
break;
}
}
}
this.Permutate(sizeDec, startNext);
if (size == this.Size)
{
this._mutex.ReleaseMutex();
}
}
private unsafe void DoAction()
{
if (this._action == null)
{
if (this._actionUnsafe != null)
{
this._actionUnsafe(this._arrIntPtr);
}
return;
}
var result = new int[this.Size];
fixed (int* pt = result)
{
var limit = pt + this.Size;
var resultPtr = pt;
var arrayPtr = this._arr;
while (resultPtr < limit)
{
*resultPtr = *arrayPtr;
resultPtr++;
arrayPtr++;
}
}
this._action(result);
}
private static unsafe void Swap(int* a, int* b)
{
var tmp = *a;
*a = *b;
*b = tmp;
}
}
Usage and testing performance:
var perms = new Permutations();
var sw1 = Stopwatch.StartNew();
perms.Permutate(0,
11,
(Action<int[]>)null); // Comment this line and...
//PrintArr); // Uncomment this line, to print permutations
sw1.Stop();
Console.WriteLine(sw1.Elapsed);
Printing method:
private static void PrintArr(int[] arr)
{
Console.WriteLine(string.Join(",", arr));
}
Going deeper:
I did not even think about this for a very long time, so I can only explain my code so much, but here's the general idea:
Permutations aren't lexicographic - this allows me to practically perform less operations between permutations.
The implementation is recursive, and when the "view" size is 3, it skips the complex logic and just performs 6 swaps to get the 6 permutations (or sub-permutations, if you will).
Because the permutations aren't in a lexicographic order, how can I decide which element to bring to the start of the current "view" (sub permutation)? I keep record of elements that were already used as "starters" in the current sub-permutation recursive call and simply search linearly for one that wasn't used in the tail of my array.
The implementation is for integers only, so to permute over a generic collection of elements you simply use the Permutations class to permute indices instead of your actual collection.
The Mutex is there just to ensure things don't get screwed when the execution is asynchronous (notice that you can pass an UnsafeAction parameter that will in turn get a pointer to the permuted array. You must not change the order of elements in that array (pointer)! If you want to, you should copy the array to a tmp array or just use the safe action parameter which takes care of that for you - the passed array is already a copy).
Note:
I have no idea how good this implementation really is - I haven't touched it in so long.
Test and compare to other implementations on your own, and let me know if you have any feedback!
Enjoy.

Here is a generic permutation finder that will iterate through every permutation of a collection and call an evalution function. If the evalution function returns true (it found the answer it was looking for), the permutation finder stops processing.
public class PermutationFinder<T>
{
private T[] items;
private Predicate<T[]> SuccessFunc;
private bool success = false;
private int itemsCount;
public void Evaluate(T[] items, Predicate<T[]> SuccessFunc)
{
this.items = items;
this.SuccessFunc = SuccessFunc;
this.itemsCount = items.Count();
Recurse(0);
}
private void Recurse(int index)
{
T tmp;
if (index == itemsCount)
success = SuccessFunc(items);
else
{
for (int i = index; i < itemsCount; i++)
{
tmp = items[index];
items[index] = items[i];
items[i] = tmp;
Recurse(index + 1);
if (success)
break;
tmp = items[index];
items[index] = items[i];
items[i] = tmp;
}
}
}
}
Here is a simple implementation:
class Program
{
static void Main(string[] args)
{
new Program().Start();
}
void Start()
{
string[] items = new string[5];
items[0] = "A";
items[1] = "B";
items[2] = "C";
items[3] = "D";
items[4] = "E";
new PermutationFinder<string>().Evaluate(items, Evaluate);
Console.ReadLine();
}
public bool Evaluate(string[] items)
{
Console.WriteLine(string.Format("{0},{1},{2},{3},{4}", items[0], items[1], items[2], items[3], items[4]));
bool someCondition = false;
if (someCondition)
return true; // Tell the permutation finder to stop.
return false;
}
}

Here is a recursive implementation with complexity O(n * n!)1 based on swapping of the elements of an array. The array is initialised with values from 1, 2, ..., n.
using System;
namespace Exercise
{
class Permutations
{
static void Main(string[] args)
{
int setSize = 3;
FindPermutations(setSize);
}
//-----------------------------------------------------------------------------
/* Method: FindPermutations(n) */
private static void FindPermutations(int n)
{
int[] arr = new int[n];
for (int i = 0; i < n; i++)
{
arr[i] = i + 1;
}
int iEnd = arr.Length - 1;
Permute(arr, iEnd);
}
//-----------------------------------------------------------------------------
/* Method: Permute(arr) */
private static void Permute(int[] arr, int iEnd)
{
if (iEnd == 0)
{
PrintArray(arr);
return;
}
Permute(arr, iEnd - 1);
for (int i = 0; i < iEnd; i++)
{
swap(ref arr[i], ref arr[iEnd]);
Permute(arr, iEnd - 1);
swap(ref arr[i], ref arr[iEnd]);
}
}
}
}
On each recursive step we swap the last element with the current element pointed to by the local variable in the for loop and then we indicate the uniqueness of the swapping by: incrementing the local variable of the for loop and decrementing the termination condition of the for loop, which is initially set to the number of the elements in the array, when the latter becomes zero we terminate the recursion.
Here are the helper functions:
//-----------------------------------------------------------------------------
/*
Method: PrintArray()
*/
private static void PrintArray(int[] arr, string label = "")
{
Console.WriteLine(label);
Console.Write("{");
for (int i = 0; i < arr.Length; i++)
{
Console.Write(arr[i]);
if (i < arr.Length - 1)
{
Console.Write(", ");
}
}
Console.WriteLine("}");
}
//-----------------------------------------------------------------------------
/*
Method: swap(ref int a, ref int b)
*/
private static void swap(ref int a, ref int b)
{
int temp = a;
a = b;
b = temp;
}
1. There are n! permutations of n elements to be printed.

I would be surprised if there are really order of magnitude improvements to be found. If there are, then C# needs fundamental improvement. Furthermore doing anything interesting with your permutation will generally take more work than generating it. So the cost of generating is going to be insignificant in the overall scheme of things.
That said, I would suggest trying the following things. You have already tried iterators. But have you tried having a function that takes a closure as input, then then calls that closure for each permutation found? Depending on internal mechanics of C#, this may be faster.
Similarly, have you tried having a function that returns a closure that will iterate over a specific permutation?
With either approach, there are a number of micro-optimizations you can experiment with. For instance you can sort your input array, and after that you always know what order it is in. For example you can have an array of bools indicating whether that element is less than the next one, and rather than do comparisons, you can just look at that array.

There's an accessible introduction to the algorithms and survey of implementations in Steven Skiena's Algorithm Design Manual (chapter 14.4 in the second edition)
Skiena references D. Knuth. The Art of Computer Programming, Volume 4 Fascicle 2: Generating All Tuples and Permutations. Addison Wesley, 2005.

I created an algorithm slightly faster than Knuth's one:
11 elements:
mine: 0.39 seconds
Knuth's: 0.624 seconds
13 elements:
mine: 56.615 seconds
Knuth's: 98.681 seconds
Here's my code in Java:
public static void main(String[] args)
{
int n=11;
int a,b,c,i,tmp;
int end=(int)Math.floor(n/2);
int[][] pos = new int[end+1][2];
int[] perm = new int[n];
for(i=0;i<n;i++) perm[i]=i;
while(true)
{
//this is where you can use the permutations (perm)
i=0;
c=n;
while(pos[i][1]==c-2 && pos[i][0]==c-1)
{
pos[i][0]=0;
pos[i][1]=0;
i++;
c-=2;
}
if(i==end) System.exit(0);
a=(pos[i][0]+1)%c+i;
b=pos[i][0]+i;
tmp=perm[b];
perm[b]=perm[a];
perm[a]=tmp;
if(pos[i][0]==c-1)
{
pos[i][0]=0;
pos[i][1]++;
}
else
{
pos[i][0]++;
}
}
}
The problem is my algorithm only works for odd numbers of elements. I wrote this code quickly so I'm pretty sure there's a better way to implement my idea to get better performance, but I don't really have the time to work on it right now to optimize it and solve the issue when the number of elements is even.
It's one swap for every permutation and it uses a really simple way to know which elements to swap.
I wrote an explanation of the method behind the code on my blog: http://antoinecomeau.blogspot.ca/2015/01/fast-generation-of-all-permutations.html

As the author of this question was asking about an algorithm:
[...] generating a single permutation, at a time, and continuing only if necessary
I would suggest considering Steinhaus–Johnson–Trotter algorithm.
Steinhaus–Johnson–Trotter algorithm on Wikipedia
Beautifully explained here

It's 1 am and I was watching TV and thought of this same question, but with string values.
Given a word find all permutations. You can easily modify this to handle an array, sets, etc.
Took me a bit to work it out, but the solution I came up was this:
string word = "abcd";
List<string> combinations = new List<string>();
for(int i=0; i<word.Length; i++)
{
for (int j = 0; j < word.Length; j++)
{
if (i < j)
combinations.Add(word[i] + word.Substring(j) + word.Substring(0, i) + word.Substring(i + 1, j - (i + 1)));
else if (i > j)
{
if(i== word.Length -1)
combinations.Add(word[i] + word.Substring(0, i));
else
combinations.Add(word[i] + word.Substring(0, i) + word.Substring(i + 1));
}
}
}
Here's the same code as above, but with some comments
string word = "abcd";
List<string> combinations = new List<string>();
//i is the first letter of the new word combination
for(int i=0; i<word.Length; i++)
{
for (int j = 0; j < word.Length; j++)
{
//add the first letter of the word, j is past i so we can get all the letters from j to the end
//then add all the letters from the front to i, then skip over i (since we already added that as the beginning of the word)
//and get the remaining letters from i+1 to right before j.
if (i < j)
combinations.Add(word[i] + word.Substring(j) + word.Substring(0, i) + word.Substring(i + 1, j - (i + 1)));
else if (i > j)
{
//if we're at the very last word no need to get the letters after i
if(i== word.Length -1)
combinations.Add(word[i] + word.Substring(0, i));
//add i as the first letter of the word, then get all the letters up to i, skip i, and then add all the lettes after i
else
combinations.Add(word[i] + word.Substring(0, i) + word.Substring(i + 1));
}
}
}

//+------------------------------------------------------------------+
//| |
//+------------------------------------------------------------------+
/**
* http://marknelson.us/2002/03/01/next-permutation/
* Rearranges the elements into the lexicographically next greater permutation and returns true.
* When there are no more greater permutations left, the function eventually returns false.
*/
// next lexicographical permutation
template <typename T>
bool next_permutation(T &arr[], int firstIndex, int lastIndex)
{
int i = lastIndex;
while (i > firstIndex)
{
int ii = i--;
T curr = arr[i];
if (curr < arr[ii])
{
int j = lastIndex;
while (arr[j] <= curr) j--;
Swap(arr[i], arr[j]);
while (ii < lastIndex)
Swap(arr[ii++], arr[lastIndex--]);
return true;
}
}
return false;
}
//+------------------------------------------------------------------+
//| |
//+------------------------------------------------------------------+
/**
* Swaps two variables or two array elements.
* using references/pointers to speed up swapping.
*/
template<typename T>
void Swap(T &var1, T &var2)
{
T temp;
temp = var1;
var1 = var2;
var2 = temp;
}
//+------------------------------------------------------------------+
//| |
//+------------------------------------------------------------------+
// driver program to test above function
#define N 3
void OnStart()
{
int i, x[N];
for (i = 0; i < N; i++) x[i] = i + 1;
printf("The %i! possible permutations with %i elements:", N, N);
do
{
printf("%s", ArrayToString(x));
} while (next_permutation(x, 0, N - 1));
}
// Output:
// The 3! possible permutations with 3 elements:
// "1,2,3"
// "1,3,2"
// "2,1,3"
// "2,3,1"
// "3,1,2"
// "3,2,1"

// Permutations are the different ordered arrangements of an n-element
// array. An n-element array has exactly n! full-length permutations.
// This iterator object allows to iterate all full length permutations
// one by one of an array of n distinct elements.
// The iterator changes the given array in-place.
// Permutations('ABCD') => ABCD DBAC ACDB DCBA
// BACD BDAC CADB CDBA
// CABD ADBC DACB BDCA
// ACBD DABC ADCB DBCA
// BCAD BADC CDAB CBDA
// CBAD ABDC DCAB BCDA
// count of permutations = n!
// Heap's algorithm (Single swap per permutation)
// http://www.quickperm.org/quickperm.php
// https://stackoverflow.com/a/36634935/4208440
// https://en.wikipedia.org/wiki/Heap%27s_algorithm
// My implementation of Heap's algorithm:
template<typename T>
class PermutationsIterator
{
int b, e, n;
int c[32]; /* control array: mixed radix number in rising factorial base.
the i-th digit has base i, which means that the digit must be
strictly less than i. The first digit is always 0, the second
can be 0 or 1, the third 0, 1 or 2, and so on.
ArrayResize isn't strictly necessary, int c[32] would suffice
for most practical purposes. Also, it is much faster */
public:
PermutationsIterator(T &arr[], int firstIndex, int lastIndex)
{
this.b = firstIndex; // v.begin()
this.e = lastIndex; // v.end()
this.n = e - b + 1;
ArrayInitialize(c, 0);
}
// Rearranges the input array into the next permutation and returns true.
// When there are no more permutations left, the function returns false.
bool next(T &arr[])
{
// find index to update
int i = 1;
// reset all the previous indices that reached the maximum possible values
while (c[i] == i)
{
c[i] = 0;
++i;
}
// no more permutations left
if (i == n)
return false;
// generate next permutation
int j = (i & 1) == 1 ? c[i] : 0; // IF i is odd then j = c[i] otherwise j = 0.
swap(arr[b + j], arr[b + i]); // generate a new permutation from previous permutation using a single swap
// Increment that index
++c[i];
return true;
}
};

I found this algo on rosetta code and it is really the fastest one I tried. http://rosettacode.org/wiki/Permutations#C
/* Boothroyd method; exactly N! swaps, about as fast as it gets */
void boothroyd(int *x, int n, int nn, int callback(int *, int))
{
int c = 0, i, t;
while (1) {
if (n > 2) boothroyd(x, n - 1, nn, callback);
if (c >= n - 1) return;
i = (n & 1) ? 0 : c;
c++;
t = x[n - 1], x[n - 1] = x[i], x[i] = t;
if (callback) callback(x, nn);
}
}
/* entry for Boothroyd method */
void perm2(int *x, int n, int callback(int*, int))
{
if (callback) callback(x, n);
boothroyd(x, n, n, callback);
}

If you just want to calculate the number of possible permutations you can avoid all that hard work above and use something like this (contrived in c#):
public static class ContrivedUtils
{
public static Int64 Permutations(char[] array)
{
if (null == array || array.Length == 0) return 0;
Int64 permutations = array.Length;
for (var pos = permutations; pos > 1; pos--)
permutations *= pos - 1;
return permutations;
}
}
You call it like this:
var permutations = ContrivedUtils.Permutations("1234".ToCharArray());
// output is: 24
var permutations = ContrivedUtils.Permutations("123456789".ToCharArray());
// output is: 362880

Simple C# recursive solution by swapping, for the initial call the index must be 0
static public void Permute<T>(List<T> input, List<List<T>> permutations, int index)
{
if (index == input.Count - 1)
{
permutations.Add(new List<T>(input));
return;
}
Permute(input, permutations, index + 1);
for (int i = index+1 ; i < input.Count; i++)
{
//swap
T temp = input[index];
input[index] = input[i];
input[i] = temp;
Permute(input, permutations, index + 1);
//swap back
temp = input[index];
input[index] = input[i];
input[i] = temp;
}
}

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Parallel.For with 64-bit unsigned indexes (UInt64) - c#

You can always write to Microsoft and ask them to add Parallel.For(ulong, ulong, Action<ulong>) to the next version of the .NET Framework. Until that comes out, you'll have to resort to something like this: Parallel.For(-10L, 10L, x => { var index = long.MaxValue + (ulong) x; });

This will work for every long value from long.MinValue inclusive to long.MaxValue exclusive Parallel.For(long.MinValue, long.MaxValue, x => { ulong u = (ulong)(x + (-(long.MinValue + 1))) + 1; Console.WriteLine(u); });

Related

Thread local BigInteger variable in nested Parallel.For is not processed for aggregation with standard patterns?

Multiple thread accessing and editing the same double array

Trying to find large prime numbers with Alea GPU

Quick Sort Implementation with large numbers [duplicate]

Generating permutations of a set (most efficiently)

Categories

Resources