Fastest way to calculate primes in C#?

I actually have an answer to my question, but it is not parallelized, so I am interested in ways to improve the algorithm. Anyway, it might be useful as-is for some people.
int Until = 20000000;
BitArray PrimeBits = new BitArray(Until, true);
/*
 * Sieve of Eratosthenes
 * PrimeBits is a simple BitArray where each bit represents an integer,
 * and we mark composite numbers as false
 */
PrimeBits.Set(0, false); // You don't actually need these two, just
PrimeBits.Set(1, false); // reminding you that 2 is the smallest prime
for (int P = 2; P < (int)Math.Sqrt(Until) + 1; P++)
    if (PrimeBits.Get(P))
        // These are going to be the multiples of P if it is a prime
        for (int PMultiply = P * 2; PMultiply < Until; PMultiply += P)
            PrimeBits.Set(PMultiply, false);
// We use this to store the actual prime numbers
List<int> Primes = new List<int>();
for (int i = 2; i < Until; i++)
    if (PrimeBits.Get(i))
        Primes.Add(i);
Maybe I could use multiple BitArrays and BitArray.And() them together?

You might save some time by cross-referencing your bit array with a doubly-linked list, so you can more quickly advance to the next prime.
Also, when eliminating later composites once you hit a new prime p for the first time, the first remaining composite multiple of p will be p*p, since everything smaller has already been eliminated. In fact, you only need to multiply p by the potential primes remaining after it in the list, stopping as soon as the product goes out of range (larger than Until).
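Applied to the question's code, that is a small change. A rough sketch using the same variable names (illustrative only, not benchmarked):

for (int P = 2; (long)P * P < Until; P++)
    if (PrimeBits.Get(P))
        // Smaller multiples were already crossed out by smaller primes
        for (long M = (long)P * P; M < Until; M += P)
            PrimeBits.Set((int)M, false);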
There are also some good probabilistic algorithms out there, such as the Miller-Rabin test. The Wikipedia page is a good introduction.

Parallelisation aside, you don't want to be calculating sqrt(Until) on every iteration. You can also skip multiples of 2 and 3 (or 2, 3 and 5) implicitly, and only consider N with N % 6 in {1, 5}, or N % 30 in {1, 7, 11, 13, 17, 19, 23, 29}.
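For instance, the mod-30 membership test could look like this (a sketch; only 8 of every 30 integers survive it, besides 2, 3 and 5 themselves):

static bool OnWheel30(int n)
{
    if (n == 2 || n == 3 || n == 5) return true;
    int r = n % 30;
    return r == 1 || r == 7 || r == 11 || r == 13
        || r == 17 || r == 19 || r == 23 || r == 29;
}

Candidates that fail this test can be skipped without ever touching the bit array, and Math.Sqrt(Until) should be computed once before the loop rather than in its condition.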
You should be able to parallelize the factoring algorithm quite easily, since the Nth stage only depends on the sqrt(n)th result, so after a while there won't be any conflicts. But that's not a good algorithm, since it requires lots of division.
You should also be able to parallelize the sieve algorithms, if you have writer work packets which are guaranteed to complete before a read. Mostly the writers shouldn't conflict with the reader - at least once you've done a few entries, they should be working at least N above the reader, so you only need a synchronized read fairly occasionally (when N exceeds the last synchronized read value). You shouldn't need to synchronize the bool array across any number of writer threads, since write conflicts don't arise (at worst, more than one thread will write a true to the same place).
The main issue would be to ensure that any worker being waited on to write has completed. In C++ you'd use a compare-and-set to switch to the worker which is being waited for at any point. I'm not a C# wonk so I don't know how to do it in that language, but the Win32 InterlockedCompareExchange function should be available (in C#, System.Threading.Interlocked.CompareExchange plays the same role).
You might also try an actor-based approach, since that way you can schedule the actors working with the lowest values, which may make it easier to guarantee that you're reading valid parts of the sieve without having to lock the bus on each increment of N.
Either way, you have to ensure that all workers have got above entry N before you read it, and the cost of doing that is where the trade-off between parallel and serial is made.
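One hedged way to arrange non-conflicting writer work packets in C# is a segmented sieve: find the primes up to sqrt(N) serially, then hand disjoint segments to Parallel.For so no two workers ever touch the same entries. A rough sketch (limit and segmentSize are illustrative names, and it counts primes rather than listing them):

using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static List<int> SmallPrimes(int max)
{
    var composite = new bool[max + 1];
    var primes = new List<int>();
    for (int p = 2; p <= max; p++)
    {
        if (composite[p]) continue;
        primes.Add(p);
        for (long m = (long)p * p; m <= max; m += p)
            composite[m] = true;
    }
    return primes;
}

static long CountPrimes(int limit, int segmentSize = 1 << 20)
{
    var smallPrimes = SmallPrimes((int)Math.Sqrt(limit));
    long count = 0;
    int segments = (limit + segmentSize - 1) / segmentSize;
    Parallel.For(0, segments, () => 0L,
        (seg, state, local) =>
        {
            int low = seg * segmentSize;
            int high = Math.Min(low + segmentSize, limit);
            var composite = new bool[high - low];
            foreach (int p in smallPrimes)
            {
                // First multiple of p inside [low, high), but never below p * p
                long start = Math.Max((long)p * p, (((long)low + p - 1) / p) * p);
                for (long m = start; m < high; m += p)
                    composite[m - low] = true;
            }
            for (int n = Math.Max(low, 2); n < high; n++)
                if (!composite[n - low]) local++;
            return local;
        },
        local => Interlocked.Add(ref count, local));
    return count;
}

Because each segment owns its own marking array, there are no write conflicts and no per-entry synchronization is needed; only the final per-thread totals are combined with Interlocked.Add.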

Without profiling we cannot tell which bit of the program needs optimizing.
If this were part of a large system, then you would use a profiler to find that the prime number generator is the part that needs optimizing.
Profiling a loop with a dozen or so instructions in it is not usually worthwhile - the overhead of the profiler is significant compared to the loop body, and about the only way to improve a loop that small is to change the algorithm to do fewer iterations. So IME, once you've eliminated any expensive functions and have a known target of a few lines of simple code, you're better off changing the algorithm and timing an end-to-end run than trying to improve the code by instruction-level profiling.

#DrPizza Profiling only really helps improve an implementation; it doesn't reveal opportunities for parallel execution or suggest better algorithms (unless you have experience to the contrary, in which case I'd really like to see your profiler).
I only have single-core machines at home, but I ran a Java equivalent of your BitArray sieve, and a single-threaded version of the inversion of the sieve - holding the marking primes in an array, and using a wheel to reduce the search space by a factor of five, then marking a bit array in increments of the wheel using each marking prime. It also reduces storage to O(sqrt(N)) instead of O(N), which helps in terms of the largest N, paging, and bandwidth.
For medium values of N (1e8 to 1e12), the primes up to sqrt(N) can be found quite quickly, and after that you should be able to parallelise the subsequent search on the CPU quite easily. On my single-core machine, the wheel approach finds primes up to 1e9 in 28s, whereas your sieve (after moving the sqrt out of the loop) takes 86s - the improvement is due to the wheel; the inversion means you can handle N larger than 2^32 but makes it slower. Code can be found here. You could parallelise the output of the results from the naive sieve after you go past sqrt(N) too, as the bit array is not modified after that point; but once you are dealing with N large enough for it to matter, the array size is too big for ints.

You should also consider a possible change of algorithms.
Consider that it may be cheaper to simply add the elements to your list as you find them.
Perhaps preallocating space for your list will make it cheaper to build and populate.
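For example, a sketch of preallocation using the prime number theorem estimate (pi(N) is roughly N / ln N, and 1.3 * N / ln N is a comfortable overestimate at these sizes), so the list rarely has to grow:

// Rough upper bound on how many primes there are below Until
int estimatedCount = (int)(1.3 * Until / Math.Log(Until));
List<int> Primes = new List<int>(estimatedCount);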

Are you trying to find new primes? This may sound stupid, but you might be able to load up some sort of data structure with known primes; I am sure someone out there has a list. It might be a much easier problem to look up existing numbers than to calculate new ones.
You might also look at Microsoft's Parallel FX Library for making your existing code multi-threaded to take advantage of multi-core systems. With minimal code changes you can make your for loops multi-threaded.
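For instance, the final collection loop from the question could be handed to Parallel.For like this (a sketch; a concurrent collection is needed because List<int> is not thread-safe, and the collected primes come out unordered):

var primes = new System.Collections.Concurrent.ConcurrentBag<int>();
System.Threading.Tasks.Parallel.For(2, Until, i =>
{
    if (PrimeBits.Get(i))
        primes.Add(i);
});

Reading the BitArray from several threads is fine as long as nothing is writing to it at the same time.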

There's a very good article about the Sieve of Eratosthenes: The Genuine Sieve of Eratosthenes
It's in a functional setting, but most of the optimizations also apply to a procedural implementation in C#.
The two most important optimizations are to start crossing out at P^2 instead of 2*P and to use a wheel for the next prime numbers.
For concurrency, you can process all numbers up to P^2 in parallel with P without doing any unnecessary work.

void PrimeNumber(long number)
{
    bool isPrimeNumber = true;
    long value = (long)Math.Sqrt(number);

    if (number < 2)
    {
        MessageBox.Show("No, it is not a prime number");
        return;
    }
    if (number == 2)
    {
        MessageBox.Show("Yes, prime number");
        return;
    }
    if (number % 2 == 0)
    {
        MessageBox.Show("No, it is not a prime number");
        return;
    }
    for (long i = 3; i <= value; i += 2)
    {
        if (number % i == 0)
        {
            MessageBox.Show("It is divisible by " + i);
            isPrimeNumber = false;
            break;
        }
    }
    if (isPrimeNumber)
    {
        MessageBox.Show("Yes, prime number");
    }
    else
    {
        MessageBox.Show("No, it is not a prime number");
    }
}

Related

Parallel while loop in C#

I'm trying to write a BBS generator, but it is too slow, so I need to parallelize the while loop. I don't know how to do it. Please help me. :)
public static List<BigInteger> GenerateFullCycle(long N, long x0)
{
    List<BigInteger> randomNumbers = new List<BigInteger>();
    var bbs = new BBS_generator(N, x0);
    BigInteger rvalue = bbs.Generate();
    while (randomNumbers.Contains(rvalue) == false)
    {
        randomNumbers.Add(rvalue);
        rvalue = bbs.Generate();
    }
    return randomNumbers;
}
You might want to look into improving your algorithm first. How large does the randomNumbers list get? Contains is pretty slow on larger lists. You might want to use a HashSet instead.
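A sketch of that change, assuming the same BBS_generator as in the question (HashSet<T>.Add returns false once a value has already been seen, so it doubles as the loop test):

public static List<BigInteger> GenerateFullCycle(long N, long x0)
{
    var randomNumbers = new List<BigInteger>();
    var seen = new HashSet<BigInteger>();
    var bbs = new BBS_generator(N, x0);
    BigInteger rvalue = bbs.Generate();
    while (seen.Add(rvalue))
    {
        randomNumbers.Add(rvalue);
        rvalue = bbs.Generate();
    }
    return randomNumbers;
}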
BBS is slow, as simple as that. Using BigInteger really doesn't help at all - also quite slow.
Apart from that, there is no way to parallelize this code. The second iteration on the loop depends on the first, the third on the second etc.. You can parallelize the GenerateFullCycle calls themselves, but that's about it.
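A sketch of that outer-level parallelism (N1, x1, N2, x2 are hypothetical seeds of the same kind the question already uses):

var tasks = new[]
{
    Task.Run(() => GenerateFullCycle(N1, x1)),   // hypothetical seed
    Task.Run(() => GenerateFullCycle(N2, x2)),   // hypothetical seed
};
Task.WaitAll(tasks);
List<BigInteger>[] cycles = Array.ConvertAll(tasks, t => t.Result);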
Parallelization isn't a solve-it-all thing. Many things aren't parallelizable at all - and how much do you gain by parallelizing? Twice the speed? Even if you managed to separate the work completely and run at 100% efficiency, a 4-core CPU still only makes you four times as fast. True, it does have its applications, but I fail to see how it could help a BBS generator.
And the number one rule of performance tweaking - profile. Where is your application actually spending time?
Maybe you can use something like a Background Worker:
http://msdn.microsoft.com/en-us/library/cc221403(v=vs.95).aspx

Arbitrary precision arithmetic with very big factorials

This is a mathematical problem, not a program meant to be useful!
I want to compute factorials of very big numbers (10^n where n > 6).
I got as far as arbitrary precision, which is very helpful for tasks like 1000!. But it obviously dies (StackOverflowException :) ) at much higher values. I'm not looking for a direct answer, but some clues on how to proceed further.
static BigInteger factorial(BigInteger i)
{
    if (i < 1)
        return 1;
    else
        return i * factorial(i - 1);
}

static void Main(string[] args)
{
    long z = (long)Math.Pow(10, 12);
    Console.WriteLine(factorial(z));
    Console.Read();
}
Would I have to give up System.Numerics.BigInteger? I was thinking of some way of storing the necessary data in files, since RAM will obviously run out. Optimization is very important at this point. So what would you recommend?
Also, I need the values to be as precise as possible. Forgot to mention that I don't need all of the digits, just the last 20 or so.
As other answers have shown, the recursion is easily removed. Now the question is: can you store the result in a BigInteger, or are you going to have to go to some sort of external storage?
The number of bits you need to store n! is roughly proportional to n log n. (This is a weak form of Stirling's Approximation.) So let's look at some sizes: (Note that I made some arithmetic errors in an earlier version of this post, which I am correcting here.)
(10^6)! takes order of 2 x 10^6 bytes = a few megabytes
(10^12)! takes order of 3 x 10^12 bytes = a few terabytes
(10^21)! takes order of 10^22 bytes = ten billion terabytes
A few megs will fit into memory. A few terabytes is easily within your grasp but you'll need to write a memory manager probably. Ten billion terabytes will take the combined resources of all the technology companies in the world, but it is doable.
Now consider the computation time. Suppose we can perform a million multiplications per second per machine and that we can parallelize the work out to multiple machines somehow.
(10^6)! takes order of one second on one machine
(10^12)! takes order of 10^6 seconds on one machine =
10 days on one machine =
a few minutes on a thousand machines.
(10^21)! takes order of 10^15 seconds on one machine =
30 million years on one machine =
3 years on 10 million machines
1 day on 10 billion machines (each with a TB drive.)
So (10^6)! is within your grasp. (10^12)! you are going to have to write your own memory manager and math library, and it will take you some time to get an answer. (10^21)! you will need to organize all the resources of the world to solve this problem, but it is doable.
Or you could find another approach.
The solution is easy: Calculate the factorials without using recursion, and you won't blow out your stack.
I.e. you're not getting this error because the numbers are too large, but because you have too many levels of function calls. And fortunately, for factorials there's no reason to calculate them recursively.
Once you've solved your stack problem, you can worry about whether your number format can handle your "very big" factorials. Since you don't need the exact values, use one of the many efficient numeric approximations (which you can count on to get all of the most significant digits right). The most common one is Stirling's approximation:
n! ~ n^n * e^(-n) * sqrt(2 * pi * n)
The formula is from this page, where you'll find discussion and a second, more accurate formula (although "in most cases the difference is quite small", they say). Of course this number is still too large for you to store, but now you can work with logarithms and drop the unimportant digits before you extract the number. Or use the Wikipedia version of the approximation, which is already expressed as a logarithm.
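A sketch of the log-space route (the constants follow directly from Stirling's formula above; the printed numbers are approximations, not exact values):

double n = 1e12;
// log10(n!) ~ n * log10(n / e) + 0.5 * log10(2 * pi * n)
double log10Factorial = n * Math.Log10(n / Math.E) + 0.5 * Math.Log10(2 * Math.PI * n);
double digitCount = Math.Floor(log10Factorial) + 1;
double leadingDigits = Math.Pow(10, log10Factorial - Math.Floor(log10Factorial));
Console.WriteLine("(1e12)! has about {0:F0} digits and starts with {1:F6}...", digitCount, leadingDigits);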
Unroll recursion:
static BigInteger factorial(BigInteger n)
{
    BigInteger res = 1;
    for (BigInteger i = 2; i <= n; ++i)
        res *= i;
    return res;
}

Why would this method need to be recursive?

On a whim, I've decided to go back and seek certification, starting with 98-361, Fundamentals of Software Development. (I'm doing this more for myself than anything else. I want to fill in gaps in my knowledge.)
In the very early course of the book, they present this interesting scenario in the Proficient Assessment section:
You are developing a library of utility functions for your
application. You need to write a method that takes an integer and
counts the number of significant digits in it. You need to create a recursive program
to solve this problem. How would you write such a
program?
I find myself gaping at this scenario in befuddlement. If I understand "significant digits" correctly, there's no need whatsoever for a function that counts an integer's significant digits to be recursive. And, any architect who insisted that it be recursive should have his head examined.
Or am I not getting it? Did I completely miss something here? From what I understand, the significant digits are the digits of a number, starting from the left, and proceeding right, excluding any leading zeroes.
Under what conditions would this need to be recursive? (The whole point of this exercise for me is to learn new things. Someone throw me a bone.)
EDIT: I don't want an answer to the problem question. I can figure that out on my own. It just seems to me that this "problem" could be solved far more easily with a simple foreach loop over the characters in a string.
Final Edit
Given the sage advice of the awesome posters below, this was the simple solution I came up with to solve the problem. (Despite what misgivings I may have.)
using System;

class Program
{
    static void Main(string[] args)
    {
        var values = new[] { 5, 15, 150, 250, 2500, 25051, 255500005, -10, -1005 };
        foreach (var value in values)
        {
            Console.WriteLine("Significant digits for {0} is {1}.", value, SignificantDigits(value));
        }
    }

    public static int SignificantDigits(int n)
    {
        if (n == 0)
        {
            return 0;
        }
        return 1 + SignificantDigits(n / 10);
    }
}
There's no need for such an algorithm to be recursive. But the intent here is not to write real-world code, it's to ensure you understand recursion.
Since you stated you weren't after code, I'll be careful here, but I need to provide something to compare the complexity of the solutions, so I'll use pseudo-code. A recursive solution may be something like:
def sigDigits (n):
    # Handle negative numbers.
    if n < 0:
        return sigDigits (-n)

    # 0..9 is one significant digit.
    if n < 10:
        return 1

    # Otherwise it's one plus the count in n/10 (truncated).
    return 1 + sigDigits (n / 10)
And you're right, it is equally doable as iteration.
def sigDigits (n):
    # Handle negative numbers.
    if n < 0:
        n = -n

    # All numbers have at least one significant digit.
    digits = 1

    # Then we add one and divide by ten (truncated), until we get low enough.
    while n > 9:
        n = n / 10
        digits = digits + 1

    return digits
There are some (usually of a mathematical bent, and including myself) who consider recursive algorithms much more elegant where they're suitable (such as where the "solution search space" reduces very quickly, so as not to blow out your stack).
I question the suitability in this particular case since the iterative solution is not too complex, but the questioner had to provide some problem and this one is relatively easy to solve.
And, as per your edit:
... could be solved far more easily with a simple foreach loop over the characters in a string
You don't have a string, you have an integer. I don't doubt that you could turn that into a string and then count characters but that seems a roundabout way of doing it.
It doesn't need to be recursive. It's simply that the question is asking you to write a recursive implementation, presumably to test your understanding of how a recursive function works.
That seems like a pretty forced example. The problem can be solved with a simpler iterative algorithm.
A lot of teaching resources really struggle to provide useful examples of when to use recursion. Technically you never need to use it, but for a large class of (mostly algorithmic) problems, it can really simplify things.
For example, consider any operation on a binary tree. Because the physical structure of a binary tree is recursive, the algorithms that operate on it are also naturally recursive. You can also write imperative algorithms to operate on binary trees, but the recursive ones are simpler to write and understand.
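A minimal sketch of that shape in C# (Node is a hypothetical type; the recursion in the code mirrors the recursion in the data structure):

class Node
{
    public Node Left, Right;
}

static int Depth(Node node)
{
    if (node == null)
        return 0;                      // an empty subtree has depth 0
    return 1 + Math.Max(Depth(node.Left), Depth(node.Right));
}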

Time Complexity Evaluation?

I have implemented cryptography and steganography algorithms in an application and would like to do a time-complexity evaluation of these algorithms. I don't have any idea how these tests should be performed.
Is there any testing framework I can apply, or how should I go about these tests?
Assuming you're not using recursion, you should look at your loops. It's fairly simple to count by inspection the number of times each loop is iterated based on a certain input. The time complexity is given by multiplication of the number of times nested loops are run. If you have more than one set of nested loops, add them all together and this is the time complexity. As an example:
for (int i = 0; i < N; ++i)
{
    // Some constant time operation happens here
    for (int j = 0; j < N; ++j)
    {
        // Some constant time operation happens here
    }
    // Some constant time operation happens here
}
This loop will execute in O(N) + O(N^2) = O(N + N^2) = O(N^2) time, since the outer constant-time portions of the loop are executed in O(N) time, while the inner portion is performed in O(N) * O(N) = O(N^2).
If you're using recursion, things are a bit harder to analyze, but it boils down to the same thing ... counting how many times constant time portions of your code get executed. Recommend you read a book on analysis of algorithms (This one is pretty much the standard on the subject) which will help you understand how best to analyze other algorithms as well.
You are manipulating pixels in a bitmap; it's pretty hard to come up with an algorithm that isn't going to be O(n^2). You could run timing tests with various sizes of bitmaps, using the Stopwatch class. Bitmap manipulations that are not computationally hard, like this, are fundamentally limited by RAM bus bandwidth, and by the size of the CPU cache if you don't address the pixels the smart way (in storage order). You'll know you have cache problems when the performance suddenly falls off a cliff as you increase the bitmap size.
You could unit test it. Assuming you have a method with parameters you can time how long it takes with different input sizes n. That way you can work out the time complexity with respect to n. This would be the simplest way. Not sure of any frameworks that could automate this for you.
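A sketch of that kind of measurement, assuming a hypothetical Embed(bitmap, payload) method you want to characterise; if doubling the bitmap side roughly quadruples the time, you are looking at O(n^2) behaviour in the number of pixels:

using System;
using System.Diagnostics;
using System.Drawing;

foreach (int side in new[] { 256, 512, 1024, 2048 })
{
    using (var bmp = new Bitmap(side, side))
    {
        var payload = new byte[1024];
        var sw = Stopwatch.StartNew();
        Embed(bmp, payload);          // hypothetical method under test
        sw.Stop();
        Console.WriteLine("{0}x{0}: {1} ms", side, sw.ElapsedMilliseconds);
    }
}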

While loop execution time

We were having a performance issue in a C# while loop. The loop was super slow while doing only one simple math calculation. It turns out that parmIn can be a huge number, anywhere from 999999999 to MaxInt. We hadn't anticipated the giant value of parmIn. We have fixed our code using a different methodology.
The loop, coded for simplicity below, did one math calculation. I am just curious: what is the actual execution time for a single iteration of a while loop containing one simple math calculation?
int v1 = 0;
while (v1 < parmIn) {
    v1 += parmIn2;
}
There is something else going on here. The following completes in ~100 ms for me. You say that parmIn can approach MaxInt. If that is true, and parmIn2 > 1, you're not checking whether your int plus the increment will overflow. If parmIn >= MaxInt - parmIn2, your loop might never complete, as it will roll over to MinInt and continue.
static void Main(string[] args)
{
    int i = 0;
    int x = int.MaxValue - 50;
    int z = 42;
    System.Diagnostics.Stopwatch st = new System.Diagnostics.Stopwatch();
    st.Start();
    while (i < x)
    {
        i += z;
    }
    st.Stop();
    Console.WriteLine(st.ElapsedMilliseconds.ToString());
    Console.ReadLine();
}
Assuming an optimal compiler, it should be one operation to check the while condition, and one operation to do the addition.
The time, small as it is, to execute just one iteration of the loop shown in your question is ... surprise ... small.
However, it depends on the actual CPU speed and whatnot exactly how small it is.
It should be just a few machine instructions, so not many cycles to pass once through the iteration, but there could be a few cycles to loop back up, especially if branch prediction fails.
In any case, the code as shown either suffers from:
Premature optimization (in that you're asking about timing for it)
Incorrect assumptions. You can probably get much faster code if parmIn is big by just calculating how many loop iterations you would have to perform and doing a multiplication, as sketched below. (Note again that this might be an incorrect assumption, which is why there is only one sure way to find performance issues: measure, measure, measure.)
What is your real question?
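For instance, replacing the question's loop with a direct computation might look like this (a sketch using the question's variable names; the long arithmetic sidesteps the overflow the loop itself would hit near MaxInt):

// The loop ends at the smallest multiple of parmIn2 that is >= parmIn.
long iterations = ((long)parmIn + parmIn2 - 1) / parmIn2;   // ceiling division
long v1 = iterations * parmIn2;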
It depends on the processor you are using and the calculation it is performing. (For example, even on some modern architectures, an add may take only one clock cycle, but a divide may take many clock cycles. There is a comparison to determine if the loop should continue, which is likely to be around one clock cycle, and then a branch back to the start of the loop, which may take any number of cycles depending on pipeline size and branch prediction)
IMHO the best way to find out more is to put the code you are interested into a very large loop (millions of iterations), time the loop, and divide by the number of iterations - this will give you an idea of how long it takes per iteration of the loop. (on your PC). You can try different operations and learn a bit about how your PC works. I prefer this "hands on" approach (at least to start with) because you can learn so much more from physically trying it than just asking someone else to tell you the answer.
The while loop is a couple of instructions plus one instruction for the math operation. You're really looking at a minimal execution time for one iteration. It's the sheer number of iterations you're doing that is killing you.
Note that a tight loop like this has implications on other things as well, as it bogs down one CPU and it blocks the UI thread (if it's running on it). Thus, not only it is slow due to the number of operations, it also adds a perceived perf impact due to making the whole machine look unresponsive.
If you're interested in the actual execution time, why not time it for yourself and find out?
int parmIn = 10 * 1000 * 1000; // 10 million
int parmIn2 = 1;               // increment from the question; with 1, the loop runs parmIn times
int v1 = 0;
Stopwatch sw = Stopwatch.StartNew();
while (v1 < parmIn) {
    v1 += parmIn2;
}
sw.Stop();
double opsPerSec = (double)parmIn / sw.Elapsed.TotalSeconds;
And, of course, the time for one iteration is 1/opsPerSec.
Whenever someone asks how fast control structures are in any language, you know they are trying to optimize the wrong thing. If you find yourself changing all your i++ to ++i, or changing all your switch statements to if...else for speed, you are micro-optimizing. And micro-optimizations almost never give you the speed you want. Instead, think a bit more about what you are really trying to do and devise a better way to do it.
I'm not sure if the code you posted is really what you intend to do or if it is simply the loop stripped down to what you think is causing the problem. If it is the former then what you are trying to do is find the largest value of a number that is smaller than another number. If this is really what you want then you don't really need a loop:
// assuming v1, parmIn and parmIn2 are integers,
// and you want the largest number (v1) that is
// smaller than parmIn but is a multiple of parmIn2.
// AGAIN, assuming INTEGER MATH:
v1 = (parmIn / parmIn2) * parmIn2;
EDIT: I just realized that the code as originally written gives the smallest multiple of parmIn2 that is greater than or equal to parmIn (not the largest one below it). So the correct code is:
v1 = ((parmIn - 1) / parmIn2) * parmIn2 + parmIn2;
If this is not what you really want, then my advice remains the same: think a bit about what you are really trying to do (or ask on Stack Overflow) instead of trying to find out whether while or for is faster. Of course, you won't always find a mathematical solution to the problem. In that case there are other strategies to lower the number of loops taken. Here's one based on your current problem: keep doubling the incrementer until it is too large, and then back off until it is just right:
int v1 = 0;
int incrementer = parmIn2;

// keep doubling the incrementer to
// speed up the loop:
while (v1 < parmIn) {
    v1 += incrementer;
    incrementer = incrementer * 2;
}

// now v1 is too big, back off by the
// amount last added and resume the normal loop:
v1 -= incrementer / 2;
while (v1 < parmIn) {
    v1 += parmIn2;
}
Here's yet another alternative that speeds up the loop:
// First count at 100x speed
while (v1 < parmIn) {
    v1 += parmIn2 * 100;
}

// back off and count at 50x speed
v1 -= parmIn2 * 100;
while (v1 < parmIn) {
    v1 += parmIn2 * 50;
}

// back off and count at 10x speed
v1 -= parmIn2 * 50;
while (v1 < parmIn) {
    v1 += parmIn2 * 10;
}

// back off and count at normal speed
v1 -= parmIn2 * 10;
while (v1 < parmIn) {
    v1 += parmIn2;
}
In my experience, especially with graphics programming where you have millions of pixels or polygons to process, speeding up code usually involve adding even more code which translates to more processor instructions instead of trying to find the fewest instructions possible for the task at hand. The trick is to avoid processing what you don't have to.
