Trying to find large prime numbers with Alea GPU - c#

An exception occurs when I try to find the 100,000th prime number using Alea GPU. The algorithm works fine if I try to find a smaller prime number e.g. the 10,000th prime number.
I am using Alea v3.0.4, NVIDIA GTX 970, Cuda 9.2 drivers.
I am new to GPU programming. Any help would be greatly appreciated.
long[] primeNumber = new long[1]; // nth prime number to find
int n = 100000; // find the 100,000th prime number
var worker = Gpu.Default; // GTX 970 CUDA v9.2 drivers
long count = 0;
worker.LongFor(count, n, x =>
{
    long a = 2;
    while (count < n)
    {
        long b = 2;
        long prime = 1;
        while (b * b <= a)
        {
            if (a % b == 0)
            {
                prime = 0;
            }
            b++;
        }
        if (prime > 0)
        {
            count++;
        }
        a++;
    }
    primeNumber[0] = (a - 1);
});
Here are the exception details:
System.Exception occurred HResult=0x80131500 Message=[CUDAError]
CUDA_ERROR_LAUNCH_FAILED Source=Alea StackTrace: at
Alea.CUDAInterop.cuSafeCall#2939.Invoke(String message) at
Alea.CUDAInterop.cuSafeCall(cudaError_enum result) at
A.cf5aded17df9f7cc4c132234dda010fa7.Copy#918-22.Invoke(Unit _arg9)
at Alea.Memory.Copy(FSharpOption1 streamOpt, Memory src, IntPtr
srcOffset, Memory dst, IntPtr dstOffset, FSharpOption1 lengthOpt)
c600c458623dca7db199a0e417603dff4, Object
cd5116337150ebaa6de788dacd82516fa) at
c600c458623dca7db199a0e417603dff4, Object
cd5116337150ebaa6de788dacd82516fa) at
Alea.ImplicitMemoryTracker.HostReadWriteBarrier(Object instance) at
Alea.GlobalImplicitMemoryTracker.HostReadWriteBarrier(Object instance)
at A.cf5aded17df9f7cc4c132234dda010fa7.clo#2359-624.Invoke(Object
arg00) at
Microsoft.FSharp.Collections.SeqModule.Iterate[T](FSharpFunc2 action,
IEnumerable1 source) at Alea.Kernel.LaunchRaw(LaunchParam lp,
FSharpOption1 instanceOpt, FSharpList1 args) at
Alea.Parallel.Device.DeviceFor.For(Gpu gpu, Int64 fromInclusive, Int64
toExclusive, Action1 op) at Alea.Parallel.GpuExtension.LongFor(Gpu
gpu, Int64 fromInclusive, Int64 toExclusive, Action1 op) at
TestingGPU.Program.Execute(Int32 t) in
C:\Users..\source\repos\TestingGPU\TestingGPU\Program.cs:line 148
at TestingGPU.Program.Main(String[] args)
Working Solution:
static void Main(string[] args)
var devices = Device.Devices;
foreach (var device in devices)
while (true)
Console.WriteLine("Enter a number to check if it is a prime number:");
string line = Console.ReadLine();
long checkIfPrime = Convert.ToInt64(line);
Stopwatch sw = new Stopwatch();
bool GPUisPrime = GPUIsItPrime(checkIfPrime+1);
Stopwatch sw2 = new Stopwatch();
bool CPUisPrime = CPUIsItPrime(checkIfPrime+1);
Console.WriteLine($"GPU: is {checkIfPrime} prime? {GPUisPrime} Time Elapsed: {sw.ElapsedMilliseconds.ToString()}");
Console.WriteLine($"CPU: is {checkIfPrime} prime? {CPUisPrime} Time Elapsed: {sw2.ElapsedMilliseconds.ToString()}");
private static bool GPUIsItPrime(long n)
{
    //Sieve of Eratosthenes Algorithm
    bool[] isComposite = new bool[n];
    var worker = Gpu.Default;
    worker.LongFor(2, n, i =>
    {
        if (!(isComposite[i]))
        {
            for (long j = 2; (j * i) < isComposite.Length; j++)
            {
                isComposite[j * i] = true;
            }
        }
    });
    return !isComposite[n-1];
}
private static bool CPUIsItPrime(long n)
{
    //Sieve of Eratosthenes Algorithm
    bool[] isComposite = new bool[n];
    for (int i = 2; i < n; i++)
    {
        if (!isComposite[i])
        {
            for (long j = 2; (j * i) < n; j++)
            {
                isComposite[j * i] = true;
            }
        }
    }
    return !isComposite[n-1];
}

Your code doesn't look right. With a parallel for-loop method such as LongFor, Alea spawns "n" threads and passes each one an index "x" that identifies the thread. A simple example like For(0, n, x => a[x] = x); uses "x" to initialize a[] with { 0, 1, 2, ..., n - 1 }. Your kernel code, however, never uses "x", so you run the same code "n" times with no difference between threads. Why run it on a GPU, then?
What I think you want to do is compute in thread "x" whether "x" is prime, and store the result in a bool array: prime[x] = true or false. After that, add a sync call in the kernel, followed by a test that lets a single thread (e.g., x == 0) go through prime[] and pick the largest prime from the array. As written, all n threads collide on 'primeNumber[0] = (a - 1);', so I can't imagine how you would ever get the right result.
Finally, you probably want to make sure, using some Alea call, that prime[] is never copied to or from the GPU. I don't know how you do that in Alea, though. The compiler might be smart enough to know that prime[] is only used in the kernel code.
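The "one index, one flag" idea from this answer can be sketched on the CPU with the standard Parallel.For, used here only as a stand-in for Alea's LongFor (no Alea calls are shown because their exact usage is not confirmed by this thread). The names NthPrimeSketch, IsPrime and NthPrime are made up for illustration, and the array size relies on the known bound that the n-th prime is below n * (ln n + ln ln n) for n >= 6.

```csharp
using System;
using System.Threading.Tasks;

public static class NthPrimeSketch
{
    // Trial division: each call depends only on its own argument,
    // so calls for different x can run in parallel without shared state.
    static bool IsPrime(long x)
    {
        if (x < 2) return false;
        for (long b = 2; b * b <= x; b++)
        {
            if (x % b == 0) return false;
        }
        return true;
    }

    public static long NthPrime(int n)
    {
        // Assumed upper bound for the n-th prime (valid for n >= 6);
        // a small fixed limit covers n < 6.
        int limit = n < 6
            ? 15
            : (int)Math.Ceiling(n * (Math.Log(n) + Math.Log(Math.Log(n))));

        bool[] prime = new bool[limit + 1];

        // Parallel phase: iteration x writes only prime[x], so nothing collides.
        Parallel.For(0, limit + 1, x => prime[x] = IsPrime(x));

        // Sequential phase: a single pass counts flags until the n-th prime.
        int count = 0;
        for (int x = 0; x <= limit; x++)
        {
            if (prime[x] && ++count == n) return x;
        }
        throw new InvalidOperationException("Upper bound too small.");
    }

    public static void Main()
    {
        Console.WriteLine(NthPrime(10000));   // 104729
        Console.WriteLine(NthPrime(100000));  // 1299709
    }
}
```

The final scan is left sequential on purpose: it is cheap compared with the primality tests, and it avoids every thread writing to one shared result cell.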


Multiple thread accessing and editing the same double array

I need to iterate through every double in an array to do "Laplacian Smoothing", mixing each value with its neighbouring doubles.
I keep the updated values in a temporary clone of the array and update the original at the end.
Pseudo code:
double[] A = new double[1000];
// Filling A with values...
double[] B = A.Clone() as double[];
for(int loops=0;loops<10;loops++){ // start of the loop
    for(int i=0;i<1000;i++){ // iterating through all doubles in the array
    // Parallel.For(0, 1000, (i) => {
        double v= A[i];
        B[i] -= v;
        B[i+1] += v/2;
        B[i-1] += v/2;
        // here i'm going out of array bounds, i know. Pseudo code, not relevant.
    }
    // });
    A = B.Clone() as double[];
}
With for it works correctly. "Smoothing" the values in the array.
With Parallel.For() I have some access sync problems: threads are colliding and some values are actually not stored correctly. Threads access and edit the array at the same index many times.
(I haven't tested this in a linear array, i'm actually working on a multidimensional array[x,y,z] ..)
How can I solve this?
I was thinking of making a separate array for each thread and doing the sum later, but I need to know the thread index and I haven't found that anywhere on the web. (I'm still interested in whether a "thread index" exists, even with a totally different solution.)
I'll accept any solution.
You probably need one of the more advanced overloads of the Parallel.For method:
public static ParallelLoopResult For<TLocal>(int fromInclusive, int toExclusive,
ParallelOptions parallelOptions, Func<TLocal> localInit,
Func<int, ParallelLoopState, TLocal, TLocal> body,
Action<TLocal> localFinally);
Executes a for loop with thread-local data in which iterations may run in parallel, loop options can be configured, and the state of the loop can be monitored and manipulated.
This looks quite intimidating with all the various lambdas it expects. The idea is to have each thread work with local data, and finally merge the data at the end. Here is how you could use this method to solve your problem:
double[] A = new double[1000];
double[] B = (double[])A.Clone();
object locker = new object();
var parallelOptions = new ParallelOptions()
{
    MaxDegreeOfParallelism = Environment.ProcessorCount
};
Parallel.For(0, A.Length, parallelOptions,
    localInit: () => new double[A.Length], // create temp array per thread
    body: (i, state, temp) =>
    {
        double v = A[i];
        temp[i] -= v;
        temp[i + 1] += v / 2;
        temp[i - 1] += v / 2;
        return temp; // return a reference to the same temp array
    }, localFinally: (localB) =>
    {
        // Can be called in parallel with other threads, so we need to lock
        lock (locker)
        {
            for (int i = 0; i < localB.Length; i++)
            {
                B[i] += localB[i];
            }
        }
    });
I should mention that the workload of the above example is too granular, so I wouldn't expect large improvements in performance from the parallelization. Hopefully your actual workload is chunkier. If, for example, you have two nested loops, parallelizing only the outer loop will work well, because the inner loop will provide the much-needed chunkiness.
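To illustrate the point about nested loops, here is a minimal sketch that parallelizes only the outer loop. The class name OuterLoopSketch and the row-sum workload are invented for the example; the point is only that each outer iteration does a whole inner loop of work and writes to its own output element.

```csharp
using System;
using System.Threading.Tasks;

public static class OuterLoopSketch
{
    // Computes the sum of each row of a 2D grid.
    // Only the outer (row) loop is parallel; the inner loop is the per-iteration work.
    public static double[] RowSums(double[,] grid)
    {
        int rows = grid.GetLength(0);
        int cols = grid.GetLength(1);
        double[] sums = new double[rows];

        Parallel.For(0, rows, r =>
        {
            double sum = 0; // local to this iteration, so no locking is needed
            for (int c = 0; c < cols; c++)
            {
                sum += grid[r, c];
            }
            sums[r] = sum; // each iteration writes a distinct element
        });
        return sums;
    }

    public static void Main()
    {
        var grid = new double[,] { { 1, 2, 3 }, { 4, 5, 6 } };
        Console.WriteLine(string.Join(", ", RowSums(grid))); // 6, 15
    }
}
```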
Alternative solution: Instead of creating auxiliary arrays per thread, you could just update directly the B array, and use locks only when processing an index in the dangerous zone near the boundaries of the partitions:
Parallel.ForEach(Partitioner.Create(0, A.Length), parallelOptions, range =>
{
    bool lockTaken = false;
    try
    {
        for (int i = range.Item1; i < range.Item2; i++)
        {
            // Iteration i writes cells i - 1, i and i + 1, so the two indices next to
            // each partition edge touch cells that a neighboring partition also writes.
            bool shouldLock = i < range.Item1 + 2 || i >= range.Item2 - 2;
            if (shouldLock) Monitor.Enter(locker, ref lockTaken);
            double v = A[i];
            B[i] -= v;
            B[i + 1] += v / 2;
            B[i - 1] += v / 2;
            if (shouldLock) { Monitor.Exit(locker); lockTaken = false; }
        }
    }
    finally
    {
        if (lockTaken) Monitor.Exit(locker);
    }
});
Ok, it appears that modulus can solve pretty much all my problems.
Here a really simplified version of the working code:
(the big script is 3d and unfinished... )
private void RunScript(bool Go, ref object Results)
// Needed to restart "RunScript" over and over
A = new double[count];
A[100] = 10000;
A[500] = 10000;
Results = A;
// <Custom additional code>
public static int T = Environment.ProcessorCount;
public static int count = 1000;
public double[] A = new double[count];
public double[,] B = new double[count, T];
public void LaplacianSmooth(int loops){
    for(int loop = 0;loop < loops;loop++){
        B = new double[count, T];
        // Copying values to first column of temp multidimensional-array
        Parallel.For(0, count, new ParallelOptions { MaxDegreeOfParallelism = T }, i => {
            B[i, 0] = A[i];
        });
        // Applying Laplacian smoothing
        Parallel.For(0, count, new ParallelOptions { MaxDegreeOfParallelism = T }, i => {
            int t = i % T; // column for this iteration; B has T columns, so stay below T
            // Wrapped next and previous element indexes
            int n = (i + 1) % count;
            int p = (i + count - 1) % count;
            double v = A[i] * 0.5;
            B[i, t] -= v;
            B[p, t] += v / 2;
            B[n, t] += v / 2;
        });
        // Copying values back to main array
        Parallel.For(0, count, new ParallelOptions { MaxDegreeOfParallelism = T }, i => {
            double val = 0;
            for(int t = 0;t < T;t++){
                val += B[i, t];
            }
            A[i] = val;
        });
    }
}
There are no "collisions" with the threads, as confirmed by the result of "Mass Addition" (a sum) that is constant at 20000.
Thanks everyone for the tips!

Project Euler 549 - My functions are not returning the answer it's supposed to return and I don't know what's wrong

The problem states the following:
The smallest number m such that 10 divides m! is m = 5.
The smallest number m such that 25 divides m! is m = 10.
Let s(n) be the smallest number m such that n divides m!.
So s(10) = 5 and s(25) = 10.
Let S(n) be ∑s(i) for 2 ≤ i ≤ n.
S(100) = 2012.
Find S(10^8).
I made a function for factorials and a function for s(n) which when I tested them worked perfectly but when testing my next function S(n) it returned 1805 as an answer instead of 2012 that I should've gotten.
public class Program
{
    public static long Factorial(long i) {
        if (i <= 1)
            return 1;
        return i * Factorial(i - 1);
    }
    public static long s(long n) {
        bool dontStop = true;
        for (long i = 0; dontStop; i++) {
            if (Factorial(i)%n == 0) {
                dontStop = false;
                return i;
            }
        }
        return 0;
    }
    public static long S(long n) {
        long count = 0;
        for (long i = 1; i <= n; i++) {
            count += s(i);
        }
        return count;
    }
    public static void Main(string[] args)
    {
        Console.WriteLine(S(100));
    }
}
The data type long is simply not nearly "big enough" for high factorials, and by high I mean larger than 20, because:
long.MaxValue = 9,223,372,036,854,775,807
20! = 2,432,902,008,176,640,000
As you can see, they are in the same order of magnitude, which means that long can't hold even Factorial(21) = 51,090,942,171,709,440,000. The multiplication silently overflows, and indeed it returns -4249290049419214848.
You should use System.Numerics.BigInteger, which can store arbitrarily large integers.
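As a minimal sketch of that suggestion, the version below keeps a running factorial in a BigInteger, so nothing overflows. The names FactorialDivisibility, SmallestM and SumS are made up for the example, and the sum starts at 2 as in the problem statement. This brute force is only meant to reproduce the small test values; it is far too slow for S(10^8).

```csharp
using System;
using System.Numerics;

public static class FactorialDivisibility
{
    // Smallest m such that n divides m!, using an exact BigInteger running factorial.
    public static int SmallestM(int n)
    {
        BigInteger factorial = BigInteger.One;
        for (int m = 1; ; m++)
        {
            factorial *= m;                    // factorial now holds m!
            if (factorial % n == 0) return m;  // always reached by m == n at the latest
        }
    }

    // Sum of SmallestM(i) for 2 <= i <= n.
    public static long SumS(int n)
    {
        long total = 0;
        for (int i = 2; i <= n; i++) total += SmallestM(i);
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(SmallestM(10));  // 5
        Console.WriteLine(SmallestM(25));  // 10
        Console.WriteLine(SumS(100));      // 2012
    }
}
```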

C# OpenCL GPU implementation for double array math

How can I make the for loop of this function to use the GPU with OpenCL?
public static double[] Calculate(double[] num, int period)
{
    var final = new double[num.Length];
    double sum = num[0];
    double coeff = 2.0 / (1.0 + period);
    for (int i = 0; i < num.Length; i++)
    {
        sum += coeff * (num[i] - sum);
        final[i] = sum;
    }
    return final;
}
Your problem as written does not fit well with something that would work on a GPU. You cannot parallelize (in a way that improves performance) the operation on a single array because the value of the nth element depends on elements 1 to n. However, you can utilize the GPU to process multiple arrays, where each GPU core operates on a separate array.
The full code for the solution is at the end of the answer, but the results of the test, to calculate on 10,000 arrays each of which has 10,000 elements, generates the following (on a GTX1080M and an i7 7700k with 32GB RAM):
Task Generating Data: 1096.4583ms
Task CPU Single Thread: 596.2624ms
Task CPU Parallel: 179.1717ms
GPU CPU->GPU: 89ms
GPU Execute: 86ms
GPU GPU->CPU: 29ms
Task Running GPU: 921.4781ms
In this test, we measure the speed at which we can generate results into a managed C# array using the CPU with one thread, the CPU with all threads, and finally the GPU using all cores. We validate that the results from each test are identical, using the function AreTheSame.
The fastest time is processing the arrays on the CPU using all threads (Task CPU Parallel: 179ms).
The GPU is actually the slowest (Task Running GPU: 922ms), but this is because of the time taken to reformat the C# arrays in a way that they can be transferred onto the GPU.
If this bottleneck were removed (which is quite possible, depending on your use case), the GPU could potentially be the fastest. If the data were already formatted in a manner that can be immediately transferred onto the GPU, the total processing time for the GPU would be 204ms (CPU->GPU: 89ms + Execute: 86ms + GPU->CPU: 29ms = 204ms). This is still slower than the parallel CPU option, but on a different sort of data set, it might be faster.
To get the data back from the GPU (the most important part of actually using the GPU), we use the function ComputeCommandQueue.Read. This transfers the altered array on the GPU back to the CPU.
To run the following code, reference the Cloo NuGet package (I used 0.9.1), and make sure to compile for x64 (you will need the memory). You may also need to update your graphics card driver if it fails to find an OpenCL device.
class Program
static string CalculateKernel
{
    get
    {
        return @"
        kernel void Calc(global int* offsets, global int* lengths, global double* doubles, double periodFactor)
        {
            int id = get_global_id(0);
            int start = offsets[id];
            int length = lengths[id];
            int end = start + length;
            double sum = doubles[start];
            for(int i = start; i < end; i++)
            {
                sum = sum + periodFactor * ( doubles[i] - sum );
                doubles[i] = sum;
            }
        }";
    }
}
public static double[] Calculate(double[] num, int period)
{
    var final = new double[num.Length];
    double sum = num[0];
    double coeff = 2.0 / (1.0 + period);
    for (int i = 0; i < num.Length; i++)
    {
        sum += coeff * (num[i] - sum);
        final[i] = sum;
    }
    return final;
}
static void Main(string[] args)
{
    int maxElements = 10000;
    int numArrays = 10000;
    int computeCores = 2048;
    double[][] sets = new double[numArrays][];
    using (Timer("Generating Data"))
    {
        Random elementRand = new Random(1);
        for (int i = 0; i < numArrays; i++)
        {
            sets[i] = GetRandomDoubles(elementRand.Next((int)(maxElements * 0.9), maxElements), randomSeed: i);
        }
    }
    int period = 14;
    double[][] singleResults;
    using (Timer("CPU Single Thread"))
    {
        singleResults = CalculateCPU(sets, period);
    }
    double[][] parallelResults;
    using (Timer("CPU Parallel"))
    {
        parallelResults = CalculateCPUParallel(sets, period);
    }
    if (!AreTheSame(singleResults, parallelResults)) throw new Exception();
    double[][] gpuResults;
    using (Timer("Running GPU"))
    {
        gpuResults = CalculateGPU(computeCores, sets, period);
    }
    if (!AreTheSame(singleResults, gpuResults)) throw new Exception();
}
public static bool AreTheSame(double[][] a1, double[][] a2)
{
    if (a1.Length != a2.Length) return false;
    for (int i = 0; i < a1.Length; i++)
    {
        var ar1 = a1[i];
        var ar2 = a2[i];
        if (ar1.Length != ar2.Length) return false;
        for (int j = 0; j < ar1.Length; j++)
        {
            if (Math.Abs(ar1[j] - ar2[j]) > 0.0000001) return false;
        }
    }
    return true;
}
public static double[][] CalculateGPU(int partitionSize, double[][] sets, int period)
ComputeContextPropertyList cpl = new ComputeContextPropertyList(ComputePlatform.Platforms[0]);
ComputeContext context = new ComputeContext(ComputeDeviceTypes.Gpu, cpl, null, IntPtr.Zero);
ComputeProgram program = new ComputeProgram(context, new string[] { CalculateKernel });
program.Build(null, null, null, IntPtr.Zero);
ComputeCommandQueue commands = new ComputeCommandQueue(context, context.Devices[0], ComputeCommandQueueFlags.None);
ComputeEventList events = new ComputeEventList();
ComputeKernel kernel = program.CreateKernel("Calc");
double[][] results = new double[sets.Length][];
double periodFactor = 2d / (1d + period);
Stopwatch sendStopWatch = new Stopwatch();
Stopwatch executeStopWatch = new Stopwatch();
Stopwatch recieveStopWatch = new Stopwatch();
int offset = 0;
while (true)
int first = offset;
int last = Math.Min(offset + partitionSize, sets.Length);
int length = last - first;
var merged = Merge(sets, first, length);
ComputeBuffer<int> offsetBuffer = new ComputeBuffer<int>(
ComputeMemoryFlags.ReadWrite | ComputeMemoryFlags.UseHostPointer,
ComputeBuffer<int> lengthsBuffer = new ComputeBuffer<int>(
ComputeMemoryFlags.ReadWrite | ComputeMemoryFlags.UseHostPointer,
ComputeBuffer<double> doublesBuffer = new ComputeBuffer<double>(
ComputeMemoryFlags.ReadWrite | ComputeMemoryFlags.UseHostPointer,
kernel.SetMemoryArgument(0, offsetBuffer);
kernel.SetMemoryArgument(1, lengthsBuffer);
kernel.SetMemoryArgument(2, doublesBuffer);
kernel.SetValueArgument(3, periodFactor);
commands.Execute(kernel, null, new long[] { merged.Lengths.Length }, null, events);
using (var pin = Pinned(merged.Doubles))
commands.Read(doublesBuffer, false, 0, merged.Doubles.Length, pin.Address, events);
for (int i = 0; i < merged.Lengths.Length; i++)
int len = merged.Lengths[i];
int off = merged.Offsets[i];
var res = new double[len];
results[first + i] = res;
offset += partitionSize;
if (offset >= sets.Length) break;
Console.WriteLine("GPU CPU->GPU: " + recieveStopWatch.ElapsedMilliseconds + "ms");
Console.WriteLine("GPU Execute: " + executeStopWatch.ElapsedMilliseconds + "ms");
Console.WriteLine("GPU GPU->CPU: " + sendStopWatch.ElapsedMilliseconds + "ms");
return results;
public static PinnedHandle Pinned(object obj) => new PinnedHandle(obj);
public class PinnedHandle : IDisposable
{
    public IntPtr Address => handle.AddrOfPinnedObject();
    private GCHandle handle;
    public PinnedHandle(object val)
    {
        handle = GCHandle.Alloc(val, GCHandleType.Pinned);
    }
    public void Dispose()
    {
        handle.Free();
    }
}
public class MergedResults
{
    public double[] Doubles { get; set; }
    public int[] Lengths { get; set; }
    public int[] Offsets { get; set; }
}
public static MergedResults Merge(double[][] sets, int offset, int length)
{
    List<int> lengths = new List<int>(length);
    List<int> offsets = new List<int>(length);
    for (int i = 0; i < length; i++)
    {
        var arr = sets[i + offset];
        lengths.Add(arr.Length);
    }
    var totalLength = lengths.Sum();
    double[] doubles = new double[totalLength];
    int dataOffset = 0;
    for (int i = 0; i < length; i++)
    {
        var arr = sets[i + offset];
        Array.Copy(arr, 0, doubles, dataOffset, arr.Length);
        offsets.Add(dataOffset); // start of this array inside the merged buffer
        dataOffset += arr.Length;
    }
    return new MergedResults()
    {
        Doubles = doubles,
        Lengths = lengths.ToArray(),
        Offsets = offsets.ToArray(),
    };
}
public static IDisposable Timer(string name)
{
    return new SWTimer(name);
}
public class SWTimer : IDisposable
{
    private Stopwatch _sw;
    private string _name;
    public SWTimer(string name)
    {
        _name = name;
        _sw = Stopwatch.StartNew();
    }
    public void Dispose()
    {
        Console.WriteLine("Task " + _name + ": " + _sw.Elapsed.TotalMilliseconds + "ms");
    }
}
public static double[][] CalculateCPU(double[][] arrays, int period)
{
    double[][] results = new double[arrays.Length][];
    for (var index = 0; index < arrays.Length; index++)
    {
        var arr = arrays[index];
        results[index] = Calculate(arr, period);
    }
    return results;
}
public static double[][] CalculateCPUParallel(double[][] arrays, int period)
{
    double[][] results = new double[arrays.Length][];
    Parallel.For(0, arrays.Length, i =>
    {
        var arr = arrays[i];
        results[i] = Calculate(arr, period);
    });
    return results;
}
static double[] GetRandomDoubles(int num, int randomSeed)
{
    Random r = new Random(randomSeed);
    var res = new double[num];
    for (int i = 0; i < num; i++)
    {
        res[i] = r.NextDouble() * 0.9 + 0.05;
    }
    return res;
}
As commenter Cory stated, refer to this link for setup: How to use your GPU in .NET
Here is how you would use this project:
Add the Nuget Package Cloo
Add reference to OpenCLlib.dll
Add using OpenCL
static void Main(string[] args)
{
    int[] Primes = { 1,2,3,4,5,6,7 };
    EasyCL cl = new EasyCL();
    cl.Accelerator = AcceleratorDevice.GPU;
    cl.Invoke("GetIfPrime", 0, Primes.Length, Primes, 1.0);
}
static string IsPrime
{
    get
    {
        return @"
        kernel void GetIfPrime(global int* num, int period)
        {
            int index = get_global_id(0);
            int sum = (2.0 / (1.0 + period)) * (num[index] - num[0]);
            printf("" %d \n"",sum);
        }";
    }
}
for (int i = 0; i < num.Length; i++)
{
    sum += coeff * (num[i] - sum);
    final[i] = sum;
}
means that each step keeps (1 - coeff) of the running sum and adds coeff times the current element: f(i) = (1 - c)*f(i-1) + c*e(i), with f0 = e0 because sum starts at num[0]. Writing q = 1 - c and unrolling the recurrence goes like this:
f1 = e0*q + e1*c
f2 = e0*q*q + e1*c*q + e2*c
f3 = e0*q*q*q + e1*c*q*q + e2*c*q + e3*c
For every element, scan through all elements with a smaller id and compute this:
take the difference of the id values (let's call it k), multiply the older value by coeff and by the k-th power of (1 - coeff), and add it to the sum. The first element is the exception: it is multiplied only by the power of (1 - coeff), without the extra coeff factor. Lastly, multiply the current num value by the coefficient and add it to the current cell. The current cell value is final(i).
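Since an expansion like this is easy to get wrong, a small check on the CPU may help before moving it into a kernel. The sketch below compares the sequential loop from the question with a per-element form obtained by unrolling sum(i) = (1 - coeff)*sum(i-1) + coeff*num[i]; the names EmaClosedFormCheck, Sequential and ClosedForm are invented for the example, and the sample data is arbitrary.

```csharp
using System;

public static class EmaClosedFormCheck
{
    // Sequential version, as in the question.
    public static double[] Sequential(double[] num, int period)
    {
        var final = new double[num.Length];
        double sum = num[0];
        double coeff = 2.0 / (1.0 + period);
        for (int i = 0; i < num.Length; i++)
        {
            sum += coeff * (num[i] - sum);
            final[i] = sum;
        }
        return final;
    }

    // Independent computation of element i; this is the work one GPU work-item would do.
    public static double ClosedForm(double[] num, int period, int i)
    {
        double coeff = 2.0 / (1.0 + period);
        double q = 1.0 - coeff;
        double result = Math.Pow(q, i) * num[0];   // the first element only decays
        for (int k = 1; k <= i; k++)
        {
            result += coeff * Math.Pow(q, i - k) * num[k];
        }
        return result;
    }

    public static void Main()
    {
        double[] num = { 3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0 };
        double[] expected = Sequential(num, 14);
        double maxDiff = 0;
        for (int i = 0; i < num.Length; i++)
        {
            maxDiff = Math.Max(maxDiff, Math.Abs(expected[i] - ClosedForm(num, 14, i)));
        }
        Console.WriteLine(maxDiff < 1e-12); // True
    }
}
```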
This is O(N*N) and looks like an all-pairs compute kernel. An example using an open-source C# OpenCL project:
ClNumberCruncher cruncher = new ClNumberCruncher(ClPlatforms.all().gpus(), @"
    __kernel void foo(__global double * num, __global double * final, __global int *parameters)
    {
        int threadId = get_global_id(0);
        int period = parameters[0];
        double coeff = 2.0 / (1.0 + period);
        double q = 1.0 - coeff;
        // the first element only decays: it carries (1 - coeff)^threadId
        double sumOfElements = pown(q, threadId) * num[0];
        for(int i=1;i<threadId;i++)
        {
            // weight of an older element: coeff * (1 - coeff)^(distance to this element)
            sumOfElements += coeff * pown(q, threadId-i) * num[i];
        }
        final[threadId] = threadId == 0 ? num[0] : sumOfElements + num[threadId] * coeff;
    }");
cruncher.performanceFeed = true; // getting benchmark feedback on console
double[] numArray = new double[10000];
double[] finalArray = new double[10000];
int[] parameters = new int[10];
int period = 15;
parameters[0] = period;
ClArray<double> numGpuArray = numArray;
numGpuArray.readOnly = true; // gpus read this from host
ClArray<double> finalGpuArray = finalArray; // finalArray will have results
finalGpuArray.writeOnly = true; // gpus write this to host
ClArray<int> parametersGpu = parameters;
parametersGpu.readOnly = true;
// calculate kernels with exact same ordering of parameters
// num(double),final(double),parameters(int)
// finalGpuArray points to __global double * final
numGpuArray.nextParam(finalGpuArray, parametersGpu).compute(cruncher, 1, "foo", 10000, 100);
// first compute always lags because of compiling the kernel so here are repeated computes to get actual performance
numGpuArray.nextParam(finalGpuArray, parametersGpu).compute(cruncher, 1, "foo", 10000, 100);
numGpuArray.nextParam(finalGpuArray, parametersGpu).compute(cruncher, 1, "foo", 10000, 100);
The results are in finalArray for 10000 elements, using 100 work-items per work-group.
The GPGPU part takes 82ms on an RX 550 GPU, which has a very low ratio of 64-bit to 32-bit compute performance (newer consumer gaming cards are not good at double precision). An Nvidia Tesla or an AMD Vega would easily compute this kernel without crippled performance. An FX-8150 (8 cores) completes it in 683ms. If you need to specifically select only an integrated GPU and its CPU, you can use
ClPlatforms.all().gpus().devicesWithHostMemorySharing() + ClPlatforms.all().cpus() when creating ClNumberCruncher instance.
binaries of api:
or source code to compile on your pc:
If you have multiple GPUs, it uses them without any extra code. Including a CPU in the computations would pull GPU effectiveness down in this sample for the first iteration (repetitions complete in 76ms with CPU+GPU), so it's better to use 2-3 GPUs instead of CPU+GPU.
I didn't check numerical stability (you should use Kahan summation when adding millions or more values into the same variable, but I left it out for readability, and I don't know whether 64-bit values need it as much as 32-bit ones) or the correctness of any values; you should do that. Also, the foo kernel is not optimized. It leaves cores idle 50% of the time, so it should be better scheduled like this:
thread-0: compute element 0 and element N-1
thread-1: compute element 1 and element N-2
thread-m: compute element N/2-1 and element N/2
so that all work-items get a similar amount of work. On top of this, using 100 for the work-group size is not optimal. It should be something like 128, 256, 512 or 1024 (for Nvidia), but this means the array size should also be an integer multiple of it; otherwise the kernel needs extra control logic to avoid going out of array bounds. For even more performance, the for loop could keep multiple partial sums as a form of "loop unrolling".
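For reference, the Kahan summation mentioned above can be sketched in a few lines. This is a plain CPU version in C#, not part of the kernel; the names KahanSketch and KahanSum are invented for the example, and the test data (ten million copies of 0.1) is just a convenient way to make rounding error visible.

```csharp
using System;

public static class KahanSketch
{
    // Compensated summation: carries the rounding error of each addition
    // into the next one instead of discarding it.
    public static double KahanSum(double[] values)
    {
        double sum = 0.0;
        double compensation = 0.0; // running estimate of the lost low-order bits
        foreach (double v in values)
        {
            double y = v - compensation;
            double t = sum + y;
            compensation = (t - sum) - y;
            sum = t;
        }
        return sum;
    }

    public static void Main()
    {
        var values = new double[10000000];
        for (int i = 0; i < values.Length; i++) values[i] = 0.1;

        double naive = 0.0;
        foreach (double v in values) naive += v;

        Console.WriteLine("naive error: " + Math.Abs(naive - 1e6));
        Console.WriteLine("kahan error: " + Math.Abs(KahanSum(values) - 1e6));
    }
}
```

The same three extra operations per addition would go inside the kernel's for loop if the accumulated error turns out to matter for double precision.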

Quick Sort Implementation with large numbers [duplicate]

I learnt about quick sort and how it can be implemented in both Recursive and Iterative method.
In Iterative method:
Push the range (0...n) into the stack
Partition the given array with a pivot
Pop the top element.
Push the partitions (index range) onto a stack if the range has more than one element
Do the above 3 steps, till the stack is empty
And the recursive version is the normal one defined in wiki.
I learnt that recursive algorithms are always slower than their iterative counterpart.
So, Which method is preferred in terms of time complexity (memory is not a concern)?
Which one is fast enough to use in Programming contest?
Is c++ STL sort() using a recursive approach?
In terms of (asymptotic) time complexity, they are both the same.
"Recursive is slower than iterative": the rationale behind this statement is the overhead of the recursive stack (saving and restoring the environment between calls).
However, this is a constant number of operations per call, and it does not change the number of "iterations".
Both recursive and iterative quicksort are O(n log n) in the average case and O(n^2) in the worst case.
Just for the fun of it, I ran a benchmark with the (Java) code attached to this post, and then ran a Wilcoxon statistical test to check the probability that the running times are indeed distinct.
The results may be conclusive (P_VALUE=2.6e-34; remember that the P_VALUE is P(T >= t | H), where T is the test statistic and H is the null hypothesis). But the answer is not what you expected.
The average of the iterative solution was 408.86 ms, while that of the recursive one was 236.81 ms.
(Note: I used Integer and not int as the argument type of recursiveQsort(). Otherwise the recursive version would have done much better, because it wouldn't have to box a lot of integers, which is also time consuming. I did it because the iterative solution has no choice but to box.)
Thus, your assumption is not true: the recursive solution is faster (on my machine and in Java, at the very least) than the iterative one, with P_VALUE=2.6e-34.
public static void recursiveQsort(int[] arr,Integer start, Integer end) {
    if (end - start < 2) return; //stop clause
    int p = start + ((end-start)/2);
    p = partition(arr,p,start,end);
    recursiveQsort(arr, start, p);
    recursiveQsort(arr, p+1, end);
}
public static void iterativeQsort(int[] arr) {
    Stack<Integer> stack = new Stack<Integer>();
    stack.push(0);
    stack.push(arr.length);
    while (!stack.isEmpty()) {
        int end = stack.pop();
        int start = stack.pop();
        if (end - start < 2) continue;
        int p = start + ((end-start)/2);
        p = partition(arr,p,start,end);
        stack.push(p+1);
        stack.push(end);
        stack.push(start);
        stack.push(p);
    }
}
private static int partition(int[] arr, int p, int start, int end) {
    int l = start;
    int h = end - 2;
    int piv = arr[p];
    swap(arr,p,end-1); // park the pivot at the end of the range
    while (l < h) {
        if (arr[l] < piv) {
            l++;
        } else if (arr[h] >= piv) {
            h--;
        } else {
            swap(arr,l,h);
        }
    }
    int idx = h;
    if (arr[h] < piv) idx++;
    swap(arr,end-1,idx); // move the pivot to its final position
    return idx;
}
private static void swap(int[] arr, int i, int j) {
    int temp = arr[i];
    arr[i] = arr[j];
    arr[j] = temp;
}
public static void main(String... args) throws Exception {
    Random r = new Random(1);
    int SIZE = 1000000;
    int N = 100;
    int[] arr = new int[SIZE];
    int[] millisRecursive = new int[N];
    int[] millisIterative = new int[N];
    for (int t = 0; t < N; t++) {
        for (int i = 0; i < SIZE; i++) {
            arr[i] = r.nextInt(SIZE);
        }
        int[] tempArr = Arrays.copyOf(arr, arr.length);
        long start = System.currentTimeMillis();
        iterativeQsort(tempArr);
        millisIterative[t] = (int)(System.currentTimeMillis()-start);
        tempArr = Arrays.copyOf(arr, arr.length);
        start = System.currentTimeMillis();
        recursiveQsort(tempArr,0,arr.length);
        millisRecursive[t] = (int)(System.currentTimeMillis()-start);
    }
    int sum = 0;
    for (int x : millisRecursive) {
        sum += x;
    }
    System.out.println("end of recursive. AVG = " + ((double)sum)/millisRecursive.length);
    sum = 0;
    for (int x : millisIterative) {
        sum += x;
    }
    System.out.println("end of iterative. AVG = " + ((double)sum)/millisIterative.length);
}
Recursion is NOT always slower than iteration. Quicksort is a perfect example of this. The only way to do it iteratively is to create a stack structure, so you end up doing the same thing the compiler does when you use recursion, and you will probably do it worse than the compiler. There will also be more jumps if you don't use recursion (to pop and push values on the stack).
That's the solution i came up with in Javascript. I think it works.
const myArr = [33, 103, 3, 726, 200, 984, 198, 764, 9]
document.write('initial order :', JSON.stringify(myArr), '<br><br>')
qs_iter(myArr)
document.write('_Final order :', JSON.stringify(myArr))
function qs_iter(items) {
  if (!items || items.length <= 1) {
    return items
  }
  var stack = []
  var low = 0
  var high = items.length - 1
  stack.push([low, high])
  while (stack.length) {
    var range = stack.pop()
    low = range[0]
    high = range[1]
    if (low < high) {
      var pivot = Math.floor((low + high) / 2)
      stack.push([low, pivot])
      stack.push([pivot + 1, high])
      while (low < high) {
        while (low < pivot && items[low] <= items[pivot]) low++
        while (high > pivot && items[high] > items[pivot]) high--
        if (low < high) {
          var tmp = items[low]
          items[low] = items[high]
          items[high] = tmp
        }
      }
    }
  }
  return items
}
Let me know if you found a mistake :)
Mister Jojo UPDATE :
this code just mixes values that can in rare cases lead to a sort, in other words never.
For those who have a doubt, I put it in snippet.

Program to find prime numbers

I want to find the prime number between 0 and a long variable but I am not able to get any output.
The program is
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication16
{
    class Program
    {
        void prime_num(long num)
        {
            bool isPrime = true;
            for (int i = 0; i <= num; i++)
            {
                for (int j = 2; j <= num; j++)
                {
                    if (i != j && i % j == 0)
                    {
                        isPrime = false;
                        break;
                    }
                }
                if (isPrime)
                {
                    Console.WriteLine ( "Prime:" + i );
                }
                isPrime = true;
            }
        }
        static void Main(string[] args)
        {
            Program p = new Program();
            p.prime_num (999999999999999L);
        }
    }
}
Can any one help me out and find what is the possible error in the program?
You can do this faster using a nearly optimal trial division sieve in one (long) line like this:
Enumerable.Range(0, (int)Math.Floor(2.52*Math.Sqrt(num)/Math.Log(num))).Aggregate(
    Enumerable.Range(2, num-1).ToList(),
    (result, index) => {
        var bp = result[index]; var sqr = bp * bp;
        result.RemoveAll(i => i >= sqr && i % bp == 0);
        return result;
    }
);
The approximation formula for number of primes used here is π(x) < 1.26 x / ln(x). We only need to test by primes not greater than x = sqrt(num).
Note that the sieve of Eratosthenes has much better run time complexity than trial division (should run much faster for bigger num values, when properly implemented).
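For comparison, a basic sieve of Eratosthenes might look like the sketch below. The names SieveSketch and PrimesUpTo are invented for the example, and the limit has to fit in memory as a bool array, so this is not a way to reach the 15-digit bound from the question directly.

```csharp
using System;
using System.Collections.Generic;

public static class SieveSketch
{
    // Returns all primes <= limit using the sieve of Eratosthenes.
    public static List<int> PrimesUpTo(int limit)
    {
        var primes = new List<int>();
        if (limit < 2) return primes;

        var isComposite = new bool[limit + 1];
        for (int i = 2; i <= limit; i++)
        {
            if (isComposite[i]) continue;
            primes.Add(i);
            // Start at i * i: smaller multiples were already crossed off by smaller primes.
            // The long counter avoids overflow when i * i exceeds int.MaxValue.
            for (long j = (long)i * i; j <= limit; j += i)
            {
                isComposite[j] = true;
            }
        }
        return primes;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", PrimesUpTo(30))); // 2 3 5 7 11 13 17 19 23 29
    }
}
```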
Try this:
void prime_num(long num)
{
    // bool isPrime = true;
    for (long i = 2; i <= num; i++) // start at 2: 0 and 1 are not prime
    {
        bool isPrime = true; // Move initialization to here
        for (long j = 2; j < i; j++) // you actually only need to check up to sqrt(i)
        {
            if (i % j == 0) // you don't need the first condition
            {
                isPrime = false;
                break;
            }
        }
        if (isPrime)
        {
            Console.WriteLine ( "Prime:" + i );
        }
        // isPrime = true;
    }
}
Once even numbers are handled separately, you only need to check odd divisors up to the square root of the number. In other words, your inner loop needs to start:
for (int j = 3; j <= Math.Sqrt(i); j+=2) { ... }
You can also break out of the loop as soon as you find that the number is not prime; you don't need to check any more divisors (I see you're already doing that!).
This will only work if num is bigger than two.
No Sqrt
You can avoid the Sqrt altogether by keeping a running sum. For example:
int square_sum = 9; // always equal to j * j
for (int j = 3; square_sum <= i; square_sum += 4 * (j + 1), j += 2) {...}
This works because the running sum 1+(3+5)+(7+9)+... gives the sequence of odd squares (1, 9, 25 etc): consecutive odd squares differ by (j+2)^2 - j^2 = 4(j+1). Hence j is always the square root of square_sum, and as long as square_sum is no larger than i, j is no larger than the square root of i.
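The running-square idea can be sketched as a complete primality test. This is only an illustration: the class and method names are made up for the example, the square starts at 9 so that it always equals j*j, and even numbers are handled separately before the loop.

```csharp
using System;

static class OddSquareDemo
{
    // Trial division by odd j while j*j <= n, tracking j*j without Sqrt.
    public static bool IsPrime(long n)
    {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;   // 2 is the only even prime
        long square = 9;                 // invariant: square == j * j
        for (long j = 3; square <= n; square += 4 * (j + 1), j += 2)
        {
            if (n % j == 0) return false;
        }
        return true;
    }

    static void Main()
    {
        // Print the primes up to 50 as a quick sanity check.
        for (long n = 0; n <= 50; n++)
            if (IsPrime(n)) Console.Write(n + " ");
        Console.WriteLine();
    }
}
```

The loop condition uses `<=` so that perfect squares such as 9 and 25 are still tested against their own root.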
People have mentioned a couple of the building blocks toward doing this efficiently, but nobody's really put the pieces together. The sieve of Eratosthenes is a good start, but with it you'll run out of memory long before you reach the limit you've set. That doesn't mean it's useless though -- when you're doing your loop, what you really care about are prime divisors. As such, you can start by using the sieve to create a base of prime divisors, then use those in the loop to test numbers for primality.
When you write the loop, however, you really do NOT want to use sqrt(i) in the loop condition as a couple of answers have suggested. You and I know that sqrt is a "pure" function that always gives the same answer if given the same input parameter. Unfortunately, the compiler does NOT know that, so if you use something like '<=Math.sqrt(x)' in the loop condition, it'll re-compute the sqrt of the number every iteration of the loop.
You can avoid that a couple of different ways. You can either pre-compute the sqrt before the loop, and use the pre-computed value in the loop condition, or you can work in the other direction, and change i<Math.sqrt(x) to i*i<x. Personally, I'd pre-compute the square root though -- I think it's clearer and probably a bit faster--but that depends on the number of iterations of the loop (the i*i means it's still doing a multiplication in the loop). With only a few iterations, i*i will typically be faster. With enough iterations, the loss from i*i every iteration outweighs the time for executing sqrt once outside the loop.
That's probably adequate for the size of numbers you're dealing with -- a 15 digit limit means the square root is 7 or 8 digits, which fits in a pretty reasonable amount of memory. On the other hand, if you want to deal with numbers in this range a lot, you might want to look at some of the more sophisticated prime-checking algorithms, such as Pollard's or Brent's algorithms. These are more complex (to put it mildly) but a lot faster for large numbers.
There are other algorithms for even bigger numbers (quadratic sieve, general number field sieve) but we won't get into them for the moment -- they're a lot more complex, and really only useful for dealing with really big numbers (the GNFS starts to be useful in the 100+ digit range).
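Putting those pieces together might look roughly like the sketch below: a small sieve builds the base of prime divisors, and each candidate is then divided only by those primes, with the square root computed once before the loop. The names (BasePrimeDemo, SievePrimes, IsPrime) are invented for this example, and it assumes the base list covers the square root of every number you test.

```csharp
using System;
using System.Collections.Generic;

static class BasePrimeDemo
{
    // Plain sieve of Eratosthenes: all primes <= limit.
    public static List<int> SievePrimes(int limit)
    {
        var primes = new List<int>();
        if (limit < 2) return primes;
        var composite = new bool[limit + 1];
        for (int i = 2; i <= limit; i++)
        {
            if (composite[i]) continue;
            primes.Add(i);
            for (long j = (long)i * i; j <= limit; j += i)
                composite[j] = true;
        }
        return primes;
    }

    // Trial division by primes only. Assumes basePrimes covers sqrt(n).
    public static bool IsPrime(long n, List<int> basePrimes)
    {
        if (n < 2) return false;
        long root = (long)Math.Sqrt((double)n);        // computed once, outside the loop
        while (root * root > n) root--;                // correct floating-point rounding
        while ((root + 1) * (root + 1) <= n) root++;
        foreach (int p in basePrimes)
        {
            if (p > root) break;
            if (n % p == 0) return false;
        }
        return true;
    }

    static void Main()
    {
        var basePrimes = SievePrimes(1000);            // enough for n up to 1,000,000
        Console.WriteLine(IsPrime(999983, basePrimes));
        Console.WriteLine(IsPrime(999981, basePrimes));
    }
}
```

For the 15 digit limit in the question the base would need the primes up to roughly 31.6 million, which is still a modest amount of memory.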
First step: write an extension method to find out if an input is prime
public static bool isPrime(this int number) {
    if (number < 2) return false; // 0, 1 and negative numbers are not prime
    for (int i = 2; i < number; i++) {
        if (number % i == 0) {
            return false;
        }
    }
    return true;
}
Second step: write the method that prints all prime numbers between 0 and the input number
public static void getAllPrimes(int number)
for (int i = 0; i < number; i++)
if (i.isPrime()) Console.WriteLine(i);
It may just be my opinion, but there's another serious error in your program (setting aside the given 'prime number' question, which has been thoroughly answered).
Like the rest of the responders, I'm assuming this is homework, which indicates you want to become a developer (presumably).
You need to learn to compartmentalize your code. It's not something you'll always need to do in a project, but it's good to know how to do it.
Your method prime_num(long num) could stand a better, more descriptive name. And if it is supposed to find all prime numbers less than a given number, it should return them as a list. This makes it easier to separate your display and your functionality.
If it simply returned an IList containing prime numbers you could then display them in your main function (perhaps calling another outside function to pretty print them) or use them in further calculations down the line.
So my best recommendation to you is to do something like this:
public void main(string[] args)
{
    //Get the number you want to use as input
    long x = number;//'number' can be hard coded or retrieved from ReadLine() or from the given arguments
    IList<long> primes = FindSmallerPrimes(x);
    DisplayPrimes(primes);
}
public IList<long> FindSmallerPrimes(long largestNumber)
List<long> returnList = new List<long>();
//Find the primes, using a method as described by another answer, add them to returnList
return returnList;
public void DisplayPrimes(IList<long> primes)
foreach(long l in primes)
Console.WriteLine ( "Prime:" + l.ToString() );
Even if you end up working somewhere where separation like this isn't needed, it's good to know how to do it.
EDIT_ADD: If Will Ness is correct that the question's purpose is just to output a continuous stream of primes for as long as the program is run (pressing Pause/Break to pause and any key to start again), with no serious hope of ever getting to that upper limit, then the code should be written with no upper limit argument and a range check of "true" for the first 'i' for loop. On the other hand, if the question wanted to actually print the primes up to a limit, then the following code will do the job much more efficiently, using Trial Division only for odd numbers, with the advantage that it uses no extra memory (it could also be converted to a continuous loop as per the above):
static void primesttt(ulong top_number) {
    Console.WriteLine("Prime: 2");
    for (var i = 3UL; i <= top_number; i += 2) {
        var isPrime = true;
        for (uint j = 3u, lim = (uint)Math.Sqrt((double)i); j <= lim; j += 2) {
            if (i % j == 0) {
                isPrime = false;
                break; // stop at the first divisor found
            }
        }
        if (isPrime) Console.WriteLine("Prime: {0} ", i);
    }
}
First, the question code produces no output because its loop variables are 32-bit integers while the limit tested is a huge long integer, so the loops can never reach the limit. EDITED: Instead, the inner loop variable 'j' wraps around to negative numbers; when 'j' comes back around to -1, the tested number fails the prime test because all numbers are evenly divisible by -1. END_EDIT Even if this were corrected, the question code produces very slow output because it gets bound up doing 64-bit divisions of very large quantities of composite numbers (all the even numbers plus the odd composites) by the whole range of numbers up to that top number of ten raised to the sixteenth power for each prime that it can possibly produce. The above code works because it limits the computation to only the odd numbers and only does modulo divisions up to the square root of the current number being tested.
This takes an hour or so to display the primes up to a billion, so one can imagine the amount of time it would take to show all the primes to ten thousand trillion (10 raised to the sixteenth power), especially as the calculation gets slower with increasing range. END_EDIT_ADD
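The wrap-around described above is easy to reproduce in isolation. The following is a small sketch with made-up names (WrapDemo, Next); it only demonstrates the three facts the explanation relies on and is not the question's loop itself.

```csharp
using System;

static class WrapDemo
{
    // Adds one to an int with overflow checking explicitly off, as in a default build.
    public static int Next(int j)
    {
        unchecked { return j + 1; }
    }

    static void Main()
    {
        long num = 999999999999999L;
        int i = int.MaxValue;

        // The int is widened to long for the comparison, so this holds for every int value.
        Console.WriteLine(i <= num);

        // One more increment wraps to the most negative int instead of exceeding num.
        Console.WriteLine(Next(i) == int.MinValue);

        // Any number leaves remainder 0 when divided by -1, so it looks "not prime".
        Console.WriteLine(7 % -1 == 0);
    }
}
```

Declaring the loop variables as long, as several answers suggest, removes the wrap-around for this input range.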
Although the one-liner (kind of) answer by @SLaks using Linq works, it isn't really the Sieve of Eratosthenes, as it is just an unoptimised version of Trial Division: unoptimised in that it does not eliminate odd primes, doesn't start at the square of the found base prime, and doesn't stop culling for base primes larger than the square root of the top number to sieve. It is also quite slow due to the multiple nested enumeration operations.
It is actually an abuse of the Linq Aggregate method and doesn't effectively use the first of the two Linq Range's generated. It can become an optimized Trial Division with less enumeration overhead as follows:
static IEnumerable<int> primes(uint top_number) {
    // Range(start, count): the count must be top_number - 1 to cover 2..top_number
    var cullbf = Enumerable.Range(2, Math.Max(0, (int)top_number - 1)).ToList();
    for (int i = 0; i < cullbf.Count; i++) {
        var bp = cullbf[i]; var sqr = bp * bp; if (sqr > top_number) break;
        cullbf.RemoveAll(c => c >= sqr && c % bp == 0);
    }
    return cullbf;
}
which runs many times faster than the SLaks answer. However, it is still slow and memory intensive due to the List generation and the multiple enumerations as well as the multiple divide (implied by the modulo) operations.
The following true Sieve of Eratosthenes implementation runs about 30 times faster and takes much less memory as it only uses a one bit representation per number sieved and limits its enumeration to the final iterator sequence output, as well having the optimisations of only treating odd composites, and only culling from the squares of the base primes for base primes up to the square root of the maximum number, as follows:
static IEnumerable<uint> primes(uint top_number) {
if (top_number < 2u) yield break;
yield return 2u; if (top_number < 3u) yield break;
var BFLMT = (top_number - 3u) / 2u;
var SQRTLMT = ((uint)(Math.Sqrt((double)top_number)) - 3u) / 2u;
var buf = new BitArray((int)BFLMT + 1,true);
for (var i = 0u; i <= BFLMT; ++i) if (buf[(int)i]) {
var p = 3u + i + i; if (i <= SQRTLMT) {
for (var j = (p * p - 3u) / 2u; j <= BFLMT; j += p)
buf[(int)j] = false; } yield return p; } }
The above code calculates all the primes to ten million range in about 77 milliseconds on an Intel i7-2700K (3.5 GHz).
Either of the two static methods can be called and tested with the using statements and with the static Main method as follows:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
static void Main(string[] args) {
Console.WriteLine("This program generates prime sequences.\r\n");
var n = 10000000u;
var elpsd = -DateTime.Now.Ticks;
var count = 0; var lastp = 0u;
foreach (var p in primes(n)) { if (p > n) break; ++count; lastp = (uint)p; }
    elpsd += DateTime.Now.Ticks;
    Console.WriteLine(
        "{0} primes found <= {1}; the last one is {2} in {3} milliseconds.",
        count, n, lastp, elpsd / 10000);
    Console.Write("\r\nPress any key to exit:");
    Console.ReadKey(true);
}
which will show the number of primes in the sequence up to the limit, the last prime found, and the time expended in enumerating that far.
EDIT_ADD: However, in order to produce an enumeration of the number of primes less than ten thousand trillion (ten to the sixteenth power) as the question asks, a segmented paged approach using multi-core processing is required but even with C++ and the very highly optimized PrimeSieve, this would require something over 400 hours to just produce the number of primes found, and tens of times that long to enumerate all of them so over a year to do what the question asks. To do it using the un-optimized Trial Division algorithm attempted, it will take super eons and a very very long time even using an optimized Trial Division algorithm as in something like ten to the two millionth power years (that's two million zeros years!!!).
It isn't much wonder that his desktop machine just sat and stalled when he tried it!!!! If he had tried a smaller range such as one million, he still would have found it takes in the range of seconds as implemented.
The solutions I post here won't cut it either as even the last Sieve of Eratosthenes one will require about 640 Terabytes of memory for that range.
That is why only a page segmented approach such as that of PrimeSieve can handle this sort of problem for the range as specified at all, and even that requires a very long time, as in weeks to years unless one has access to a super computer with hundreds of thousands of cores. END_EDIT_ADD
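To give an idea of what "page segmented" means, here is a minimal single-threaded sketch that counts primes while only ever holding one fixed-size page in memory. It is nowhere near PrimeSieve in speed and is not taken from it; the names (SegmentedSieveDemo, CountPrimes, segmentSize) are invented for the example, and it assumes a limit small enough that the base primes up to its square root fit in an ordinary array.

```csharp
using System;
using System.Collections.Generic;

static class SegmentedSieveDemo
{
    public static long CountPrimes(long limit, int segmentSize = 1 << 16)
    {
        if (limit < 2) return 0;
        int root = (int)Math.Sqrt((double)limit);
        while ((long)root * root > limit) root--;              // correct rounding
        while ((long)(root + 1) * (root + 1) <= limit) root++;

        // Stage 1: base primes <= sqrt(limit) with a plain sieve.
        var basePrimes = new List<int>();
        var small = new bool[root + 1];
        for (int i = 2; i <= root; i++)
        {
            if (small[i]) continue;
            basePrimes.Add(i);
            for (long j = (long)i * i; j <= root; j += i) small[j] = true;
        }

        // Stage 2: sieve [low, high] one page at a time, reusing the same buffer.
        long count = 0;
        var composite = new bool[segmentSize];
        for (long low = 2; low <= limit; low += segmentSize)
        {
            long high = Math.Min(low + segmentSize - 1, limit);
            Array.Clear(composite, 0, composite.Length);
            foreach (int p in basePrimes)
            {
                long pp = (long)p * p;
                if (pp > high) break;
                // First multiple of p inside the page, but never below p*p.
                long start = Math.Max(pp, (low + p - 1) / p * p);
                for (long j = start; j <= high; j += p)
                    composite[j - low] = true;
            }
            for (long n = low; n <= high; n++)
                if (!composite[n - low]) count++;
        }
        return count;
    }

    static void Main()
    {
        Console.WriteLine(SegmentedSieveDemo.CountPrimes(1000000));
    }
}
```

Memory use is the page plus the base primes, regardless of how far the range extends; the running time is what remains prohibitive for the range in the question.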
Smells like more homework. My very, very old graphing calculator had an is-prime program like this. Technically the inner division-checking loop only needs to run to i^(1/2). Do you need to find "all" prime numbers between 0 and L? The other major problem is that your loop variables are "int" while your input data is "long"; this will cause an overflow and make your loops misbehave. Fix the loop variables.
One-line code in C#:
var primes = Enumerable.Range(2, 300)
    .Where(n => Enumerable.Range(2, (int)Math.Sqrt(n) - 1)
        .All(nn => n % nn != 0)).ToArray();
The Sieve of Eratosthenes answer above is not quite correct. As written it will find all the primes between 1 and 1000000. To find all the primes between 1 and num use:
private static IEnumerable Primes01(int num)
{
    return Enumerable.Range(1, Convert.ToInt32(Math.Floor(Math.Sqrt(num))))
        .Aggregate(Enumerable.Range(1, num).ToList(),
            (result, index) =>
            {
                result.RemoveAll(i => i > result[index] && i%result[index] == 0);
                return result;
            });
}
The seed of the Aggregate should be range 1 to num since this list will contain the final list of primes. The Enumerable.Range(1, Convert.ToInt32(Math.Floor(Math.Sqrt(num)))) is the number of times the seed is purged.
ExchangeCore Forums has a good console application listed that writes found primes to a file. It looks like you can also use that same file as a starting point, so you don't have to restart finding primes from 2, and they provide a download of that file with all found primes up to 100 million, so it would be a good start.
The algorithm on the page also takes a couple shortcuts (odd numbers and only checks up to the square root) which makes it extremely efficient and it will allow you to calculate long numbers.
So this is basically just two typos. The first and most unfortunate one is for (int j = 2; j <= num; j++), which is the reason for the unproductive testing of 1%2, 1%3 ... 1%(10^15-1), which goes on for a very long time, so the OP didn't get "any output". It should've been j < i; instead. The other, minor in comparison, is that i should start from 2, not from 0:
for( i=2; i <= num; i++ )
for( j=2; j < i; j++ ) // j <= sqrt(i) is really enough
Surely it can't be reasonably expected of a console print-out of 28 trillion primes or so to be completed in any reasonable time-frame. So, the original intent of the problem was obviously to print out a steady stream of primes, indefinitely. Hence all the solutions proposing simple use of sieve of Eratosthenes are totally without merit here, because simple sieve of Eratosthenes is bounded - a limit must be set in advance.
What could work here is the optimized trial division which would save the primes as it finds them, and test against the primes, not just all numbers below the candidate.
Second alternative, with much better complexity (i.e. much faster) is to use a segmented sieve of Eratosthenes. Which is incremental and unbounded.
Both these schemes would use double-staged production of primes: one would produce and save the primes, to be used by the other stage in testing (or sieving), much above the limit of the first stage (below its square of course - automatically extending the first stage, as the second stage would go further and further up).
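As a rough illustration of the first scheme (trial division against saved primes, with no upper limit set in advance), here is a short sketch. It is a simplification rather than the two-stage design described above: it keeps every prime it finds instead of only those below the square root of the current candidate, and the names (PrimeStream, Primes) are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PrimeStream
{
    // Lazily yields primes for as long as the caller keeps asking.
    public static IEnumerable<long> Primes()
    {
        var saved = new List<long>();       // odd primes found so far
        yield return 2;
        for (long n = 3; ; n += 2)          // odd candidates only
        {
            bool isPrime = true;
            foreach (long p in saved)
            {
                if (p * p > n) break;       // no divisor up to sqrt(n): n is prime
                if (n % p == 0) { isPrime = false; break; }
            }
            if (isPrime)
            {
                saved.Add(n);
                yield return n;
            }
        }
    }

    static void Main()
    {
        // Take as many as you like; the stream itself has no preset limit.
        foreach (long p in Primes().Take(20))
            Console.WriteLine("Prime: " + p);
    }
}
```

Because the sequence is lazy, a console program can simply enumerate it in a foreach loop and print until it is interrupted.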
To be quite frank, some of the suggested solutions are really slow, and therefore are bad suggestions. For testing a single number to be prime you need some dividing/modulo operator, but for calculating a range you don't have to.
Basically you just exclude numbers that are multiples of earlier found primes, as they are (by definition) not primes themselves.
I will not give the full implementation, as that would be too easy; this is the approach in pseudo code. (On my machine, the actual implementation calculates all primes in a System.Int32 (2 billion) within 8 seconds.)
public IEnumerable<long> GetPrimes(long max)
// we save the result set in an array of bytes (one bit per odd number).
var buffer = new byte[(max >> 4) + 1];
// 1 is not a prime.
buffer[0] = 1;
var iMax = (long)Math.Sqrt(max);
for(long i = 3; i <= iMax; i +=2 )
// find the index in the buffer
var index = i >> 4;
// find the bit of the buffer.
var bit = (i >> 1) & 7;
// A not set bit means: prime
if((buffer[index] & (1 << bit)) == 0)
var step = i << 2;
while(step < max)
// find position in the buffer to write bits that represent number that are not prime.
// 2 is not in the buffer.
yield return 2;
// loop through buffer and yield return odd primes too.
The solution requires a good understanding of bitwise operations, but it is way, way faster. You can also save the results to disk if you need them for later use. The result for 17 * 10^9 numbers can be saved in 1 GB, and the calculation of that result set takes about 2 minutes max.
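For readers who want something runnable, the pseudo code can be filled in along these lines. This is my own sketch of the same bit layout (index = i >> 4, bit = (i >> 1) & 7, one bit per odd number), not the author's actual implementation, and the class name PackedSieve is made up. It marks odd multiples starting from i*i, which is an assumption on my part.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PackedSieve
{
    public static IEnumerable<long> GetPrimes(long max)
    {
        if (max < 2) yield break;
        yield return 2;                       // 2 is not in the buffer

        // Bit k of the buffer stands for the odd number 2k + 1; a set bit means "not prime".
        var buffer = new byte[(max >> 4) + 1];
        for (long i = 3; i <= max; i += 2)
        {
            bool composite = (buffer[i >> 4] & (1 << (int)((i >> 1) & 7))) != 0;
            if (composite) continue;
            yield return i;

            if (i > max / i) continue;        // i * i > max: nothing left to mark
            for (long m = i * i; m <= max; m += 2 * i)   // odd multiples only
                buffer[m >> 4] |= (byte)(1 << (int)((m >> 1) & 7));
        }
    }

    static void Main()
    {
        Console.WriteLine(string.Join(" ", GetPrimes(100)));
    }
}
```

Since only odd numbers are stored, each byte covers 16 consecutive integers, which is where the "17 * 10^9 numbers in 1 GB" figure comes from.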
I know this is quite an old question, but after reading here:
Sieve of Eratosthenes Wiki
This is the way I wrote it from my understanding of the algorithm:
void SieveOfEratosthenes(int n)
bool[] primes = new bool[n + 1];
for (int i = 0; i <= n; i++) // include index n so that n itself can be reported
primes[i] = true;
for (int i = 2; i * i <= n; i++)
if (primes[i])
for (int j = i * 2; j <= n; j += i)
primes[j] = false;
for (int i = 2; i <= n; i++)
if (primes[i]) Console.Write(i + " ");
In the first loop we fill the array of booleans with true.
The second for loop starts from 2, since 1 is not a prime number; whenever the current number is still marked as prime, it sets the entries at its multiples j to false.
In the last loop we just print every number that is still marked as prime.
Very similar - from an exercise to implement Sieve of Eratosthenes in C#:
public class PrimeFinder
{
    readonly List<long> _primes = new List<long>();
    public PrimeFinder(long seed) { CalcPrimes(seed); }
    public List<long> Primes { get { return _primes; } }
    private void CalcPrimes(long maxValue)
    {
        if (maxValue >= 2) _primes.Add(2); // the loop below only visits odd candidates
        for (long checkValue = 3; checkValue <= maxValue; checkValue += 2)
            if (IsPrime(checkValue)) _primes.Add(checkValue);
    }
    private bool IsPrime(long checkValue)
    {
        foreach (long prime in _primes)
        {
            if (prime * prime > checkValue) break; // no divisor up to the square root
            if ((checkValue % prime) == 0) return false;
        }
        return true;
    }
}
Prime Helper very fast calculation
public static class PrimeHelper
public static IEnumerable<Int32> FindPrimes(Int32 maxNumber)
return (new PrimesInt32(maxNumber));
public static IEnumerable<Int32> FindPrimes(Int32 minNumber, Int32 maxNumber)
return FindPrimes(maxNumber).Where(pn => pn >= minNumber);
public static bool IsPrime(this Int64 number)
if (number < 2)
return false;
else if (number < 4 )
return true;
var limit = (Int32)System.Math.Sqrt(number) + 1;
var foundPrimes = new PrimesInt32(limit);
return !foundPrimes.IsDivisible(number);
public static bool IsPrime(this Int32 number)
return IsPrime(Convert.ToInt64(number));
public static bool IsPrime(this Int16 number)
return IsPrime(Convert.ToInt64(number));
public static bool IsPrime(this byte number)
return IsPrime(Convert.ToInt64(number));
public class PrimesInt32 : IEnumerable<Int32>
private Int32 limit;
private BitArray numbers;
public PrimesInt32(Int32 limit)
if (limit < 2)
throw new Exception("Prime numbers not found.");
startTime = DateTime.Now;
calculateTime = startTime - startTime;
this.limit = limit;
try { findPrimes(); } catch{/*Overflows or Out of Memory*/}
calculateTime = DateTime.Now - startTime;
private void findPrimes()
// The Sieve Algorithm
numbers = new BitArray(limit, true);
for (Int32 i = 2; i < limit; i++)
if (numbers[i])
for (Int32 j = i * 2; j < limit; j += i)
numbers[j] = false;
public IEnumerator<Int32> GetEnumerator()
for (Int32 i = 2; i < 3; i++)
if (numbers[i])
yield return i;
if (limit > 2)
for (Int32 i = 3; i < limit; i += 2)
if (numbers[i])
yield return i;
IEnumerator IEnumerable.GetEnumerator()
return GetEnumerator();
// Extended for Int64
public bool IsDivisible(Int64 number)
var sqrt = System.Math.Sqrt(number);
foreach (var prime in this)
if (prime > sqrt) break; // no divisor found up to the square root
if (number % prime == 0)
DivisibleBy = prime;
return true;
return false;
private static DateTime startTime;
private static TimeSpan calculateTime;
public static TimeSpan CalculateTime { get { return calculateTime; } }
public Int32 DivisibleBy { get; set; }
public static void Main()
{
    Console.WriteLine("enter the number");
    int i = int.Parse(Console.ReadLine());
    for (int j = 2; j <= i; j++)
    {
        for (int k = 2; k <= i; k++)
        {
            if (j == k) { Console.WriteLine("{0} is prime", j); break; }
            else if (j % k == 0) break; // a smaller divisor exists, so j is composite
        }
    }
}
static void Main(string[] args)
{ int i,j;
Console.WriteLine("prime no between 1 to 100");
for (i = 2; i <= 100; i++)
int count = 0;
for (j = 1; j <= i; j++)
if (i % j == 0)
{ count=count+1; }
if ( count <= 2)
{ Console.WriteLine(i); }
You can use the basic definition of a prime number: it must have exactly two factors (one and itself).
So do it like this, the easy way:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace PrimeNUmber
class Program
static void FindPrimeNumber(long num)
{
    for (long i = 1; i <= num; i++)
    {
        int totalFactors = 0;
        for (long j = 1; j <= i; j++) // long, so j cannot overflow before reaching i
        {
            if (i % j == 0)
                totalFactors = totalFactors + 1;
        }
        if (totalFactors == 2)
            Console.WriteLine(i);
    }
}
static void Main(string[] args)
{
    long num;
    Console.WriteLine("Enter any value");
    num = Convert.ToInt64(Console.ReadLine());
    FindPrimeNumber(num);
}
This solution displays all prime numbers between 0 and 100.
int counter = 0;
for (int c = 0; c <= 100; c++)
counter = 0;
for (int i = 1; i <= c; i++)
if (c % i == 0)
{ counter++; }
if (counter == 2)
{ Console.Write(c + " "); }
This is a quick way to check whether a single number is prime in C#.
void PrimeNumber(long number)
{
    bool IsprimeNumber = number >= 2; // 0, 1 and negative numbers are not prime
    long value = (long)Math.Sqrt(number); // Convert.ToInt32 would throw for very large inputs
    if (number > 2 && number % 2 == 0)
        IsprimeNumber = false;
    for (long i = 3; IsprimeNumber && i <= value; i=i+2)
    {
        if (number % i == 0)
        {
            // MessageBox.Show("It is divisible by" + i);
            IsprimeNumber = false;
        }
    }
    if (IsprimeNumber)
        MessageBox.Show("Yes Prime Number");
    else
        MessageBox.Show("No It is not a Prime NUmber");
}
class CheckIfPrime
static void Main()
while (true)
Console.Write("Enter a number: ");
decimal a = decimal.Parse(Console.ReadLine());
decimal[] k = new decimal[int.Parse(a.ToString())];
decimal p = 0;
for (int i = 2; i < a; i++)
if (a % i != 0)
p += i;
k[i] = i;
p += i;
if (p == k.Sum())
{ Console.WriteLine ("{0} is prime!", a);}
{Console.WriteLine("{0} is NOT prime", a);}
There are some very optimal ways to implement the algorithm. But if you don't know much about maths and you simply follow the definition of prime as the requirement:
a number that is only divisible by 1 and by itself (and nothing else), here's a simple to understand code for positive numbers.
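public bool IsPrime(int candidateNumber)
{
    if (candidateNumber < 2) return false; // 0 and 1 are not prime
    int fromNumber = 2;
    int toNumber = candidateNumber - 1;
    while(fromNumber <= toNumber)
    {
        bool isDivisible = candidateNumber % fromNumber == 0;
        if (isDivisible)
            return false;
        fromNumber++; // move on to the next possible divisor
    }
    return true;
}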
Since every number is divisible by 1 and by itself, we start checking from 2 onwards until the number immediately before itself. That's the basic reasoning.
You can also do this:
class Program
static void Main(string[] args)
long numberToTest = 350124;
bool isPrime = NumberIsPrime(numberToTest);
Console.WriteLine(string.Format("Number {0} is prime? {1}", numberToTest, isPrime));
private static bool NumberIsPrime(long n)
{
    bool retVal = true;
    if (n <= 3)
    {
        retVal = n > 1;
    } else if (n % 2 == 0 || n % 3 == 0)
    {
        retVal = false;
    } else
    {
        long i = 5; // long, so that i * i cannot overflow for large n
        while (retVal && i * i <= n)
        {
            if (n % i == 0 || n % (i + 2) == 0)
                retVal = false;
            i += 6;
        }
    }
    return retVal;
}
An easier approach: what I did is check whether a number has exactly two divisors, which is the essence of a prime number.
List<int> factorList = new List<int>();
int[] numArray = new int[] { 1, 0, 6, 9, 7, 5, 3, 6, 0, 8, 1 };
foreach (int item in numArray)
{
    for (int x = 1; x <= item; x++)
    {
        //check the remainder after dividing by each number up to the item itself
        if (item % x == 0)
            factorList.Add(x);
    }
    if (factorList.Count == 2) // has only 2 division factors ; prime number
        Console.WriteLine(item + " is a prime number ");
    else
        Console.WriteLine(item + " is not a prime number ");
    factorList = new List<int>(); // reinitialize list
}
Here is a solution with unit test:
The solution:
public class PrimeNumbersKata
public int CountPrimeNumbers(int n)
if (n < 0) throw new ArgumentException("Not valide numbre");
if (n == 0 || n == 1) return 0;
int cpt = 0;
for (int i = 2; i <= n; i++)
if (IsPrimaire(i)) cpt++;
return cpt;
private bool IsPrimaire(int number)
for (int i = 2; i <= number / 2; i++)
if (number % i == 0) return false;
return true;
The tests:
class PrimeNumbersKataTest
private PrimeNumbersKata primeNumbersKata;
public void Init()
primeNumbersKata = new PrimeNumbersKata();
public void CountPrimeNumbers_N_AsArgument_returnCountPrimes(int n, int expected)
var actual = primeNumbersKata.CountPrimeNumbers(n);
public void CountPrimairs_N_IsNegative_RaiseAnException()
var ex = Assert.Throws<ArgumentException>(()=> { primeNumbersKata.CountPrimeNumbers(-1); });
//Assert.That(ex.Message == "Not valide numbre");
Assert.That(ex.Message, Is.EqualTo("Not valide numbre"));
At university we had to count the prime numbers up to 10,000. I did it this way; the teacher was a little surprised, but I passed the test. Language: C#.
void Main()
{
    int number=1;
    for(long i=2;i<10000;i++)
    {
        if (PrimeTest(i))
            Console.WriteLine(number+++" " +i);
    }
}
List<long> KnownPrime = new List<long>();
private bool PrimeTest(long i)
{
    if (i == 1) return false;
    if (i == 2) { KnownPrime.Add(i); return true; }
    foreach(long k in KnownPrime)
    {
        if (i % k == 0)
            return false;
    }
    KnownPrime.Add(i); // not divisible by any earlier prime, so remember it
    return true;
}
for (int i = 2; i < 100; i++)
{
    bool isPrimeNumber = true;
    for (int j = 2; j <= i && j <= 100; j++)
    {
        if (i != j && i % j == 0)
        {
            isPrimeNumber = false; break;
        }
    }
    if (isPrimeNumber)
        Console.WriteLine(i);
}
