Help understanding Eratosthenes sieve implementation


I found this LINQ implementation of the Sieve of Eratosthenes on this website. I understand the basic concept of the sieve, but there's one detail I don't get: what is the purpose of the first Enumerable.Range(0, 168)?
List<int> erathostheness = Enumerable.Range(0, 168)
    .Aggregate(Enumerable.Range(2, 1000000).ToList(), (result, index) =>
    {
        result.RemoveAll(i => i > result[index] && i % result[index] == 0);
        return result;
    }).ToList();

It is the number of times the sieve will be run to eliminate all non-primes from the list.
result.RemoveAll(i => i > result[index] && i % result[index] == 0);
Each time you run the sieve, this line of code takes result[index], the smallest number in the list whose multiples have not yet been removed (which is necessarily a prime), and removes all of its multiples. This is run 168 times; on the 168th pass that number is 997, which naturally is the 168th prime.
This only needs to be run 168 times because every composite number can be expressed as a product of primes, and there is no number up to 1,000,001 (the largest value in the list) that is a multiple of the 169th prime (1,009) but NOT a multiple of some prime lower than 1009. The smallest number that sieving out 1009 would remove and that has NOT been removed already is 1009 * 1009 = 1,018,081, the square of the 169th prime, which is greater than 1,000,001 and thus doesn't need to be checked for this set of numbers.
Hence, all the multiples of 1009 have already been removed from the list when you get to that point, so there's no point in continuing as you already have removed all the non-primes from the list. :D

There are 168 primes less than 1000.
If x is less than 1,000,000 and x is not prime, then x can be factored into prime numbers p1, p2, ..., pn. At least one of these factors must be less than 1000, or else the product would be more than 1,000,000. This means at least one factor must be one of the first 168 primes.
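The argument can be checked with a conventional array-based Sieve of Eratosthenes. This is a Python sketch, not a transcription of the LINQ code (which removes items from a list rather than flagging them in an array): it only sieves with primes p where p * p does not exceed the limit, then counts how many such primes the list 2 .. 1,000,001 actually needs.

```python
import math

def sieve(limit):
    """Return all primes <= limit with a plain Sieve of Eratosthenes."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    # Only primes p with p * p <= limit ever cross anything out.
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # Start at p * p: smaller multiples of p were removed by smaller primes.
            is_prime[p * p :: p] = [False] * len(range(p * p, limit + 1, p))
    return [n for n, flag in enumerate(is_prime) if flag]

# Same list as the LINQ version: Enumerable.Range(2, 1000000) is 2 .. 1,000,001.
primes = sieve(1_000_001)
sieving_primes = [p for p in primes if p * p <= 1_000_001]
print(len(sieving_primes), sieving_primes[-1])  # 168 997
```

The count matches the 168 passes of the Aggregate call; a 169th pass with 1009 would find nothing left to remove.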

Related

Randomly selecting numbers from n consecutive numbers

Given an integer array of consecutive numbers starting from 0, i.e.
0, 1, 2, ..., n
I wish to select n/2 numbers randomly.
Say n = 5.
Then a possible set would be 0, 3, 5.
How can I achieve that easily?

You can loop through the numbers and determine the probability that each number should be in the result:
int n = 5;
int left = (n + 1) / 2;
int[] result = new int[left];
Random rnd = new Random();
for (int i = 0; left > 0; i++) {
    if (rnd.Next(n + 1 - i) < left) {
        result[result.Length - left] = i;
        left--;
    }
}
Note: This will always produce a sorted result.
Edit:
Here is a test run creating 200,000,000 results, counting the combinations generated (where the binary number represents the combination, e.g. 100110 is 0,3,4):
010011 : 9999164
110001 : 10003346
010101 : 9990975
100101 : 9998154
101001 : 10006305
100110 : 10003350
101010 : 10000583
101100 : 9995335
011001 : 10000007
001011 : 10001492
001110 : 10001158
100011 : 9994680
110100 : 9998226
110010 : 9999954
011010 : 10002269
000111 : 10004752
010110 : 9996886
011100 : 9999196
111000 : 10001094
001101 : 10003074
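The C# loop above is an instance of what is usually called selection sampling: value i is kept with probability left / (number of values not yet considered), so exactly k values are chosen and they come out in ascending order. Here is a Python transcription for experimenting; the function name `select_sorted` is made up for this sketch, and the tally mirrors the test run above on a smaller scale.

```python
import random
from collections import Counter

def select_sorted(n, k, rng=random):
    """Pick k distinct values from 0..n (inclusive), returned in ascending order."""
    result = []
    left = k
    for i in range(n + 1):
        if left == 0:
            break
        remaining = n + 1 - i              # values i..n not yet considered
        if rng.randrange(remaining) < left:
            result.append(i)
            left -= 1
    return result

# Same setup as the C# snippet: n = 5, keep (n + 1) // 2 = 3 values.
counts = Counter(tuple(select_sorted(5, 3)) for _ in range(100_000))
print(len(counts))  # 20 distinct subsets, C(6, 3)
```

When `left` equals `remaining`, the test `randrange(remaining) < left` always succeeds, which is why the loop can never come up short.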

The simplest way I've found of doing this is an incomplete Fisher-Yates shuffle. Stop after n/2 iterations.
The shuffle in effect works with two arrays, the randomly selected numbers and the pool of numbers that have not yet been used, and therefore are available for selection. As it happens the total size of the two arrays is the original array length, so they can be stored in place by partitioning.
After n/2 iterations, the partition representing the numbers that have been selected is a random choice from the original array.
Another way of looking at this is that the first n/2 numbers of the result of the full shuffle will not be changed by the n/2+1 or subsequent iterations of the shuffle.
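A minimal Python sketch of the incomplete shuffle described above; the function name is illustrative. After k iterations the prefix `pool[:k]` is the selected partition and `pool[k:]` is the unused pool, which is exactly the in-place partitioning the answer describes.

```python
import random
from collections import Counter

def partial_fisher_yates(values, k, rng=random):
    """Return k values chosen uniformly at random, using only k shuffle steps."""
    pool = list(values)                    # copy so the caller's data is untouched
    for i in range(k):
        # Pick from the not-yet-selected part pool[i:], swap it into position i.
        j = rng.randrange(i, len(pool))
        pool[i], pool[j] = pool[j], pool[i]
    return pool[:k]                        # the selected partition

# Tally which 3-element subsets of 0..5 come out; all C(6, 3) = 20 should appear.
tally = Counter(frozenset(partial_fisher_yates(range(6), 3)) for _ in range(60_000))
print(len(tally))  # 20
```

Unlike the selection-sampling loop, the result is in random order; sort it afterwards if you need ascending values.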

Use a Fisher-Yates Shuffle, then pick the first n/2 items in the array.
As Patricia Shanahan points out in her answer, it is only necessary to shuffle the first n/2 items in the array using Fisher-Yates.

One approach is to generate indexes into the array of numbers using a Random object's Next method. Add each newly generated value to a structure like an ArrayList or HashSet to remember it. For every newly generated index, check whether that value is already present in this structure, and skip it if it is.
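A minimal Python sketch of this rejection approach, using a set in place of the HashSet (the function name is made up here). When k is at most half of n, each draw finds a new value with probability at least 1/2, so the expected number of draws is at most 2k.

```python
import random

def sample_with_set(n, k, rng=random):
    """Pick k distinct indices from 0..n-1 by drawing until k unique ones are seen."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    chosen = set()
    while len(chosen) < k:
        chosen.add(rng.randrange(n))       # a repeated index leaves the set unchanged
    return sorted(chosen)

print(sample_with_set(6, 3))
```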

A method that efficiently uses the properties of binary to encode the selection is:
Generate a random 32-bit unsigned int.
Do this ceil(n/32) times, and put these in an array called masks.
The bits of these random ints select the numbers. Specifically, if the word size is 32, then i will be selected if ((masks[i/32] >> (i%32)) & 1) == 1.
Since random 32 bit ints will on average have 16 0s and 16 1s this method will get close to your value of n/2.
Say the actual number of 1s you get is W and differs from n/2 by k, k = W - n/2
If you have k too few numbers selected generate a random integer in range 0 to n, go to that bit indexed by that number and change the first 0 you encounter to a 1, searching forward.
Do this k times.
In the case where you have k too many, change the first 1 you encounter to a zero.
The advantage of this is that it is also the most compact structure to store your subset, with each bit selecting a single integer. You only need the range and this array of masks, and you have your selection.
Plus you have to generate 16 times fewer random numbers than the other shuffle methods, making it more efficient.
Finally, this increase in efficiency grows with word size. For 64-bit unsigned longs you need 32 times fewer random numbers than the "n/2 swaps permutation method".
Good luck!
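A rough Python sketch of the bitmask idea, assuming 32-bit words as in the answer. The answer does not say what happens when the forward search runs off the end, so this sketch wraps around to index 0 (an assumption). Because the correction step favours positions just after runs of the opposite bit, the result is likely not exactly uniform over all subsets; the sketch only guarantees that exactly k values are chosen.

```python
import random

WORD = 32  # word size from the answer; 64 would work the same way

def bitmask_select(n, k, rng=random):
    """Select k of the values 0..n-1; bit i of the masks marks value i as chosen."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    masks = [rng.getrandbits(WORD) for _ in range((n + WORD - 1) // WORD)]
    if n % WORD:
        masks[-1] &= (1 << (n % WORD)) - 1   # ignore padding bits past n - 1

    def bit(i):
        return (masks[i // WORD] >> (i % WORD)) & 1

    def flip(i):
        masks[i // WORD] ^= 1 << (i % WORD)

    selected = sum(bin(m).count("1") for m in masks)
    while selected != k:
        # Too few: turn the first 0 at or after a random index into a 1.
        # Too many: turn the first 1 at or after a random index into a 0.
        want = 0 if selected < k else 1
        i = rng.randrange(n)
        while bit(i) != want:
            i = (i + 1) % n                  # wrap around (assumption of this sketch)
        flip(i)
        selected += 1 if want == 0 else -1
    return [i for i in range(n) if bit(i)]

print(len(bitmask_select(100, 50)))  # 50
```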

Is Linq not allowed when you write algorithm?
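int[] arrInts = new int[] {0, 1, 2, 3, 4, 5, 6};
var r = new Random();
// Use the full range of r.Next(): with only arrInts.Length possible keys, ties are
// frequent, and OrderBy keeps tied elements in their original order, biasing the pick.
var randomInts = arrInts.OrderBy(i => r.Next())
                        .Take(arrInts.Length / 2)
                        .ToArray();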
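Edit: I am not 100% sure about performance, but it might be better to use .AsParallel() before .OrderBy() (in that case, note that System.Random is not thread-safe, so a single shared r should not be called from several threads).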
I like using Linq because it is very readable and it took me around 5 secs to write the algorithm instead of many minutes to write sorting and looping "by hand".

Longest recurring cycle in its decimal fraction - a bug or a misunderstanding?

This is fairly 'math-y' but I'm posting here because it's a Project Euler problem, & I have working code that presumably has bugs in it.
The question Determing longest repeating cycle in a decimal expansion solves the problem using logarithms, but I'm interested in solving with simple brute force. More accurately, I'm interested in understanding why my algorithm and code is not returning the correct solution.
The algorithm is simple:
replicate a 'long division',
at each step record the divisor and the remainder
when a divisor / remainder tuple is repeated, infer that the decimal representation will repeat.
Here are the private fields, as requested:
private int numerator;
private int recurrence;
private int result;
private int resultRecurrence;
private List<dynamic> digits;
and here is the code:
private void Go()
{
    foreach (var i in primes)
    {
        digits = new List<dynamic>();
        numerator = 1;
        recurrence = 0;
        while (numerator != 0)
        {
            numerator *= 10;
            // quotient
            var q = numerator / i;
            // remainder
            var r = numerator % i;
            digits.Add(new { Divisor = q, Remainder = r });
            // if we've found a repetition then break out
            var m = digits.Where(p => p.Divisor == q && p.Remainder == r).ToList();
            if (m.Count > 1)
            {
                recurrence = digits.LastIndexOf(m[0]) - digits.IndexOf(m[0]);
                break;
            }
            numerator = r;
        }
        if (recurrence > resultRecurrence)
        {
            resultRecurrence = recurrence;
            result = i;
        }
    }
}
When testing integers < 10 and < 20 I get the correct result; and I correctly identify the value of i as well. However the decimal representation that I get is incorrect - I calculate i-1 whereas the correct result is far less (something like i-250).
So presumably I either have a programming bug - which I can't find - or a logic bug.
I'm confused because it feels like a multiplicative group over p to me, in which there would be p-1 elements. I'm sure I'm missing something, can anyone provide suggestions?
edit
I'm not going to include my prime number code - it's not relevant, as I explain above I correctly identify the value of i (from memory it is 983) but I'm having problems getting the correct value for resultRecurrence.

I'm confused because it feels like a multiplicative group over p to me, in which there would be p-1 elements. I'm sure I'm missing something, can anyone provide suggestions?
Close.
For all primes except 2 and 5 (which divide 10), the sequence of remainders is formed by starting with 1 and transforming by
remainder = (10 * remainder) % prime
thus the k-th remainder is 10^k (mod prime) and the set of remainders forms a subgroup of the group of nonzero remainders modulo prime[1]. The length of the recurring cycle is the order of that subgroup, which is also known as the order of 10 modulo prime.
The order of the group of nonzero remainders modulo prime is prime-1, and there's Lagrange's theorem:
Let G be a finite group of order g and H be a subgroup of G. Then the order h of H divides g.
So the length of the cycle is always a divisor of prime-1, and sometimes it's prime-1, e.g. for 7 or 19.
[1] For composite numbers n coprime to 10, that would be the group of remainders modulo n that are coprime to n.
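The group-theory argument can be checked numerically. The Python sketch below (helper names made up here) computes the order of 10 modulo p by repeated multiplication, asserts that it divides p - 1 for every prime below 1000 other than 2 and 5, and then picks the prime with the longest cycle.

```python
def order_of_10(p):
    """Smallest k >= 1 with 10**k % p == 1: the cycle length of 1/p.

    Assumes p > 1 and p is coprime to 10, otherwise the loop never ends.
    """
    k, power = 1, 10 % p
    while power != 1:
        power = power * 10 % p
        k += 1
    return k

def primes_below(limit):
    """Plain trial division; fine for small limits like 1000."""
    return [n for n in range(2, limit)
            if all(n % d for d in range(2, int(n ** 0.5) + 1))]

coprime_primes = [p for p in primes_below(1000) if p not in (2, 5)]
for p in coprime_primes:
    # Lagrange: the subgroup generated by 10 has an order dividing p - 1.
    assert (p - 1) % order_of_10(p) == 0

print(order_of_10(7), order_of_10(19))      # 6 18
print(max(coprime_primes, key=order_of_10))  # 983
```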

First off, you don’t need the divisors, you only need the remainders.
Secondly, I would split the function into multiple independent parts instead of having everything in one big method: The long division / finding of the cycle length is independent of the rest (= finding the longest cycle).
Your break on Where coupled with Count is unintuitive. Why not just use a while loop with the condition (! digits.Contains(r))? (This would require putting 0 as a remainder into the digits list before the loop start.)
This leaves us with a much cleaner code that should be straightforward to debug.
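A minimal Python sketch of the suggested restructuring (function names made up here): track only remainders, remember the step at which each one first appeared, and stop as soon as a remainder repeats or becomes 0 (in which case the decimal terminates). The search for the longest cycle is kept separate, so each part can be tested on its own.

```python
def cycle_length(d):
    """Length of the recurring cycle of 1/d (0 if the decimal terminates)."""
    first_seen = {}                     # remainder -> step at which it first appeared
    remainder, step = 1, 0
    while remainder != 0 and remainder not in first_seen:
        first_seen[remainder] = step
        remainder = remainder * 10 % d  # one step of long division
        step += 1
    return 0 if remainder == 0 else step - first_seen[remainder]

def longest_cycle(limit):
    """The d < limit whose 1/d has the longest recurring cycle."""
    return max(range(2, limit), key=cycle_length)

best = longest_cycle(1000)
print(best, cycle_length(best))  # 983 982
```

This also works for composite d such as 6 (1/6 = 0.1666..., cycle length 1), because the cycle is measured from the first repeated remainder rather than from the start.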

recurrence = digits.LastIndexOf(m[0]) - digits.IndexOf(m[0]);
Surely the value of resultRecurrence is always going to be i-1 ? Since for a fraction of the form 1/n, the decimal starts repeating exactly when the division-in-progress (the ith digit) gives the same quotient-remainder as the very first trial division (1, hence i-1).
(as a side note, may I introduce you to Math.DivRem).

Big O notation and algorithms

I'm currently studying and trying to implement some algorithms. I'm trying to understand Big O notation and I can't figure out the Big O complexity for the algorithm below:
while (a != 0 && b != 0)
{
    if (a > b)
        a %= b;
    else
        b %= a;
}
if (a == 0)
    common = b;
else
    common = a;

It's easy to see that after two iterations the smaller of the two numbers becomes at most half as large. If it was m at the beginning, then after 2K iterations it will be no more than m/2^K. If we put K = floor(log_2(m)) + 1 here, then m/2^K < 1, so after 2K iterations the smaller number (an integer) becomes zero, and the loop terminates. Hence the number of iterations is no more than 2(log_2 m + 1) = O(log m).

That is the Euclidean algorithm for computing the greatest common divisor of two integers. I'll leave it to you to research the complexity of this algorithm, but the Fibonacci numbers play an important role.

Most people (who are not mathematicians) never need to find out that stuff, it's already documented: http://en.wikipedia.org/wiki/Euclidean_algorithm#Algorithmic_efficiency
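The bound from the first answer and the role of the Fibonacci numbers can both be checked empirically. This Python sketch runs the loop from the question while counting iterations (the function name is made up here), verifies the 2(log_2 m + 1) bound on all small pairs, and shows that consecutive Fibonacci numbers need one more iteration per Fibonacci index, even though the numbers themselves grow by a factor of about 1.618 each time.

```python
import math

def gcd_with_steps(a, b):
    """The loop from the question, returning (gcd, number of loop iterations)."""
    steps = 0
    while a != 0 and b != 0:
        if a > b:
            a %= b
        else:
            b %= a
        steps += 1
    return (b if a == 0 else a), steps

# Check the 2 * (log2(m) + 1) bound on all small pairs, with m = min(a, b).
for a in range(1, 300):
    for b in range(1, 300):
        g, steps = gcd_with_steps(a, b)
        assert g == math.gcd(a, b)
        assert steps <= 2 * (math.log2(min(a, b)) + 1)

# Consecutive Fibonacci numbers: the classic worst case.
fib = [1, 1]
while len(fib) < 12:
    fib.append(fib[-1] + fib[-2])
print([gcd_with_steps(fib[i + 1], fib[i])[1] for i in range(1, 10)])
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
```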

Getting Factors of a Number

I'm trying to refactor this algorithm to make it faster. What would be the first refactoring here for speed?
public int GetHowManyFactors(int numberToCheck)
{
    // we know 1 is a factor and the numberToCheck
    int factorCount = 2;
    // start from 2 as we know 1 is a factor, and less than as numberToCheck is a factor
    for (int i = 2; i < numberToCheck; i++)
    {
        if (numberToCheck % i == 0)
            factorCount++;
    }
    return factorCount;
}

The first optimization you could make is that you only need to check up to the square root of the number. This is because factors come in pairs where one is less than the square root and the other is greater.
The one exception is when the number is an exact square: its square root is then a factor but not part of a pair.
For example if your number is 30 the factors are in these pairs:
1 x 30
2 x 15
3 x 10
5 x 6
So you don't need to check any numbers higher than 5 because all the other factors can already be deduced to exist once you find the corresponding small factor in the pair.
Here is one way to do it in C#:
public int GetFactorCount(int numberToCheck)
{
    int factorCount = 0;
    int sqrt = (int)Math.Ceiling(Math.Sqrt(numberToCheck));
    // Start from 1 as we want our method to also work when numberToCheck is 0 or 1.
    for (int i = 1; i < sqrt; i++)
    {
        if (numberToCheck % i == 0)
        {
            factorCount += 2; // We found a pair of factors.
        }
    }
    // Check if our number is an exact square.
    if (sqrt * sqrt == numberToCheck)
    {
        factorCount++;
    }
    return factorCount;
}
There are other approaches you could use that are faster but you might find that this is already fast enough for your needs, especially if you only need it to work with 32-bit integers.

You can reduce how high the loop has to go, since you can safely stop at the square root of the number. This does require care with perfect squares, which have an odd number of factors, but it does reduce how often the loop has to be executed.

Looks like there is a lengthy discussion about this exact topic here: Algorithm to calculate the number of divisors of a given number
Hope this helps

The first thing to notice is that it suffices to find all of the prime factors. Once you have these it's easy to find the number of total divisors: for each prime, add 1 to the number of times it appears and multiply these together. So for 12 = 2 * 2 * 3 you have (2 + 1) * (1 + 1) = 3 * 2 = 6 factors.
The next thing follows from the first: when you find a factor, divide it out so that the resulting number is smaller. When you combine this with the fact that you need only check to the square root of the current number this is a huge improvement. For example, consider N = 10714293844487412. Naively it would take N steps. Checking up to its square root takes sqrt(N) or about 100 million steps. But since the factors 2, 2, 3, and 953 are discovered early on you actually only need to check to one million -- a 100x improvement!
Another improvement: you don't need to check every number to see if it divides your number, just the primes. If it's more convenient you can use 2 and the odd numbers, or 2, 3, and the numbers 6n-1 and 6n+1 (a basic wheel sieve).
Here's another nice improvement. If you can quickly determine whether a number is prime, you can reduce the need for division even further. Suppose, after removing small factors, you have 120528291333090808192969. Even checking up to its square root will take a long time: 300 billion steps. But a Miller-Rabin test (very fast, maybe 10 to 20 nanoseconds) will show that this number is composite. How does this help? It means that if you check up to its cube root and find no factors, then there are exactly two prime factors left (counted with multiplicity). If the number is a perfect square, it is the square of a prime; if it is not a square, it is the product of two distinct primes. This means you can multiply your 'running total' by 3 or 4, respectively, to get the final answer, even without knowing the factors! This can make more of a difference than you'd guess: the number of steps needed drops from 300 billion to just 50 million, a 6000-fold improvement!
The only trouble with the above is that Miller-Rabin can only prove that numbers are composite; if it's given a prime it can't prove that the number is prime. In that case you may wish to write a primality-proving function to spare yourself the effort of factoring to the square root of the number. (Alternatively, you could just do a few more Miller-Rabin tests, if you would be satisfied with high confidence that your answer is correct rather than a proof that it is. If a number passes 15 tests then it's composite with probability less than 1 in a billion.)
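The ideas in this answer can be combined into one routine. This is a Python sketch, not the author's code, and the function names are made up: trial division divides out each factor as it is found and multiplies the running total by (exponent + 1); it stops at the cube root of what remains, and then a Miller-Rabin test with fixed bases decides whether the leftover is 1, a prime, the square of a prime, or a product of two distinct primes. As in the answer, a True from `is_probable_prime` means "probable prime".

```python
import math

def is_probable_prime(n, bases=(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)):
    """Miller-Rabin with fixed bases; False means n is definitely composite."""
    if n < 2:
        return False
    for p in bases:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:              # write n - 1 = d * 2**s with d odd
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False           # a is a witness: n is composite
    return True

def count_divisors(n):
    """Number of divisors of n >= 1: trial division plus the cube-root shortcut."""
    total = 1
    d = 2
    while d * d * d <= n:          # n shrinks as factors are divided out
        exponent = 0
        while n % d == 0:
            n //= d
            exponent += 1
        total *= exponent + 1      # each prime power p**e contributes (e + 1)
        d += 1 if d == 2 else 2    # 2, then odd candidates only
    # Every prime factor left in n exceeds its cube root: n is 1, p, p*p or p*q.
    if n == 1:
        return total
    if is_probable_prime(n):
        return total * 2
    r = math.isqrt(n)
    return total * (3 if r * r == n else 4)

print(count_divisors(12))  # 6
```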

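You can limit the upper bound of your for loop to numberToCheck / 2 (inclusive, since for an even number numberToCheck / 2 is itself a factor).
If the number is odd it has no even factors, so start your loop counter at 3 and step by 2; this checks every other number and drops the loop count by another 50%. Even numbers still need every candidate from 2 upwards, otherwise odd factors such as 3 in 12 would be missed.
public int GetHowManyFactors(int numberToCheck)
{
    // we know 1 is a factor and the numberToCheck
    int factorCount = 2;
    // Odd numbers have no even factors, so they can skip every other candidate.
    int step = 1 + (numberToCheck % 2);
    int i = 2 + (numberToCheck % 2); // start at 2 (or 3 if numberToCheck is odd)
    for ( ; i <= numberToCheck / 2; i += step)
    {
        if (numberToCheck % i == 0)
            factorCount++;
    }
    return factorCount;
}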

Well, if you are going to use this function a lot, you can use a modified Sieve of Eratosthenes (http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes) and store the answers for the interval 1 to Max in an array. It runs InitializeArray() once, and after that it returns answers in O(1).
const int Max = 1000000;
int[] arr = new int[Max + 1];
public void InitializeArray()
{
    for (int i = 1; i <= Max; ++i)
        arr[i] = 1; // 1 is a factor of every number
    for (int i = 2; i <= Max; ++i)
        for (int j = i; j <= Max; j += i)
            ++arr[j]; // i is a factor of each of its multiples j
}
public int GetHowManyFactors(int numberToCheck)
{
    return arr[numberToCheck];
}
But if you are not going to use this function a lot, I think the best solution is to check up to the square root.
Note: I have corrected my code!
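The same table-building idea as a Python sketch (the function name is made up here). Starting the outer loop at 1 replaces the separate "1 is a factor" initialization. The up-front work is about Max × (1 + 1/2 + ... + 1/Max), i.e. O(Max log Max), after which each lookup is a list index.

```python
def divisor_counts(limit):
    """counts[n] = number of divisors of n, for 1 <= n <= limit (counts[0] unused)."""
    counts = [0] * (limit + 1)
    for i in range(1, limit + 1):
        # i divides i, 2i, 3i, ...; credit each of those multiples once.
        for multiple in range(i, limit + 1, i):
            counts[multiple] += 1
    return counts

counts = divisor_counts(100_000)
print(counts[12], counts[30], counts[1])  # 6 8 1
```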

An easy-to-implement algorithm that will take you much further than trial division is Pollard's rho.
Here is a Java implementation, that should be easy to adapt to C#: http://www.cs.princeton.edu/introcs/78crypto/PollardRho.java.html
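For a feel of how Pollard's rho works, here is a minimal Python sketch of the common variant with f(x) = x^2 + c and Floyd cycle detection; it is not a port of the linked Java file. It must only be called on composite numbers (on a prime it would keep retrying forever), so in practice it is paired with a primality test.

```python
import math
import random

def pollard_rho(n, rng=random):
    """Return a non-trivial factor of a composite n (Floyd cycle-finding variant)."""
    if n % 2 == 0:
        return 2
    while True:
        c = rng.randrange(1, n)
        x = y = rng.randrange(2, n)
        d = 1
        while d == 1:
            x = (x * x + c) % n            # tortoise: one step
            y = (y * y + c) % n            # hare: two steps
            y = (y * y + c) % n
            d = math.gcd(abs(x - y), n)
        if d != n:                         # d == n means this c failed; retry
            return d

print(pollard_rho(8051))  # 83 or 97, since 8051 = 83 * 97
```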

https://codility.com/demo/results/demoAAW2WH-MGF/
public int solution(int n) {
    var counter = 0;
    if (n == 1) return 1;
    counter = 2; //1 and itself
    int sqrtPoint = (Int32)(Math.Truncate(Math.Sqrt(n)));
    for (int i = 2; i <= sqrtPoint; i++)
    {
        if (n % i == 0)
        {
            counter += 2; // We found a pair of factors.
        }
    }
    // Check if our number is an exact square.
    if (sqrtPoint * sqrtPoint == n)
    {
        counter -= 1;
    }
    return counter;
}

Codility Python 100 %
Here is a solution in Python with a little explanation:
def solution(N):
    """
    Problem Statement can be found here-
    https://app.codility.com/demo/results/trainingJNNRF6-VG4/
    Codility 100%
    Idea is to count the paired factor in a single traversal, i.e. if 24 is divisible by 4 then it is also divisible by 6 (24 / 4).
    Traverse only up to the square root of the number, i.e. in the case of 24, 4*4 < 24 but 5*5 > 24, so loop only while i*i <= N
    """
    print(N)
    count = 0
    i = 1
    while i * i <= N:
        if N % i == 0:
            print()
            print("Divisible by " + str(i))
            if i * i == N:
                count += 1
                print("Count increase by one " + str(count))
            else:
                count += 2
                print("Also divisible by " + str(N // i))
                print("Count increase by two count " + str(count))
        i += 1
    return count
Example run:
if __name__ == '__main__':
    # result = solution(24)
    # result = solution(35)
    result = solution(1)
    print("")
    print("Solution " + str(result))
"""
Example1-
24
Divisible by 1
Also divisible by 24
Count increase by two count 2
Divisible by 2
Also divisible by 12
Count increase by two count 4
Divisible by 3
Also divisible by 8
Count increase by two count 6
Divisible by 4
Also divisible by 6
Count increase by two count 8
Solution 8
Example2-
35
Divisible by 1
Also divisible by 35
Count increase by two count 2
Divisible by 5
Also divisible by 7
Count increase by two count 4
Solution 4
Example3-
1
Divisible by 1
Count increase by one 1
Solution 1
"""

I got pretty good results with complexity of O(sqrt(N)).
if (N == 1) return 1;
int divisors = 0;
int max = N;
for (int div = 1; div < max; div++) {
    if (N % div == 0) {
        divisors++;
        if (div != N/div) {
            divisors++;
        }
    }
    if (N/div < max) {
        max = N/div;
    }
}
return divisors;

Python Implementation
Score 100% https://app.codility.com/demo/results/trainingJ78AK2-DZ5/
import math

def solution(N):
    # write your code in Python 3.6
    NumberFactor = 2  # one and the number itself
    if N == 1:
        return 1
    if N == 2:
        return 2
    squareN = int(math.sqrt(N)) + 1
    # print(squareN)
    for elem in range(2, squareN):
        if N % elem == 0:
            NumberFactor += 2
    if (squareN - 1) * (squareN - 1) == N:
        NumberFactor -= 1
    return NumberFactor

Generate large prime number with specified last digits

I was wondering how it is possible to generate a 512-bit (155 decimal digits) prime number whose last five decimal digits are specified/fixed (e.g. ***28071)?
The principles of generating simple primes without any specifications are quite understandable, but my case goes further.
Any hints for, at least, where should I start?
Java or C# is preferable.
Thanks!

I guess the only way would be to first generate a random number of 150 decimal digits, then append the 28071 behind it by doing number = randomnumber * 100000 + 28071 then just brute force it out with something like
while (!IsPrime(number))
number += 100000;
Of course this could take a while to compute ;-)
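A minimal Python sketch of this brute-force approach. Assumptions: instead of a 150-digit decimal prefix, it starts from a random 512-bit value (top bit set) and overwrites its last five decimal digits, and primality is checked with a Miller-Rabin probable-prime test using random bases. Both `is_probable_prime` and `prime_with_suffix` are names made up for this example. Stepping by 100000 keeps the suffix fixed.

```python
import random

def is_probable_prime(n, rounds=20, rng=random):
    """Miller-Rabin probable-prime test with random bases."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:                  # n - 1 = d * 2**s with d odd
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False               # a proves that n is composite
    return True

def prime_with_suffix(suffix=28071, bits=512, rng=random):
    """Random probable prime of about `bits` bits whose last five digits are `suffix`."""
    step = 100_000
    n = rng.getrandbits(bits) | (1 << (bits - 1))  # force the top bit
    n = n - n % step + suffix                       # overwrite the last five digits
    while not is_probable_prime(n):
        n += step                                   # the suffix stays fixed
    return n

p = prime_with_suffix()
print(p)
```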

Did you try just generating such numbers and checking them? I would expect that to be acceptably fast. The prime density decreases only as the logarithm of the number, so I'd expect you to try a few hundred numbers until you hit a prime. ln(2^512) ≈ 355, so about one number in 355 will be prime.
Roughly speaking, the prime number theorem states that if a random number nearby some large number N is selected, the chance of it being prime is about 1 / ln(N), where ln(N) denotes the natural logarithm of N. For example, near N = 10,000, about one in nine numbers is prime, whereas near N = 1,000,000,000, only one in every 21 numbers is prime. In other words, the average gap between prime numbers near N is roughly ln(N)
(from http://en.wikipedia.org/wiki/Prime_number_theorem)
You just need to take care that a number exists for your final digits. But I think that's as easy as checking that the last digit isn't divisible by 2 or 5 (i.e. it is 1, 3, 7 or 9).
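The estimate can be made a little sharper with a heuristic based on the prime number theorem for arithmetic progressions: primes are spread evenly over the residue classes coprime to 100000, so in a class like ...28071 they are about 10/4 times denser than among all integers of the same size. A quick calculation in Python (variable names are just for illustration):

```python
import math

bits = 512
ln_n = bits * math.log(2)      # ln(2**512): the average prime gap near 2**512
# A fixed suffix like 28071 is coprime to 10. Only 4 of every 10 residues mod 10
# can hold primes (beyond 2 and 5), so within that class primes are 10/4 as dense.
expected_tries = ln_n * 4 / 10
print(round(ln_n, 1), round(expected_tries))  # 354.9 142
```

So on average roughly 140 candidates of the form 100000*k + 28071 need testing, rather than roughly 355 for unrestricted numbers of that size.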
According to this performance data you can do about 2000 ModPow operations on 512 bit data per second, and since a simple prime-test is checking 2^(p-1) mod p=1 which is one ModPow operation, you should be able to generate several primes with your properties per second.
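So you could do something like this (C#; the random 512-bit starting value, randomStart, is supplied by the caller):
using System.Numerics;

// Returns a candidate ending in lastDigits that passes a base-2 Fermat test, or null.
BigInteger? FindPrimeCandidate(BigInteger randomStart, int lastDigits)
{
    BigInteger i = randomStart;
    int remainder = (int)(i % 100000);
    int increment = lastDigits - remainder;
    i += increment; // i now ends in the five digits lastDigits
    BigInteger test = BigInteger.ModPow(2, i - 1, i);
    if (test == 1)
        return i;
    else
        return null;
}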
And do more extensive prime checks on the result of that function.

As Doggot said, but start from the smallest possible 150-digit prefix, i.e. the number 100000....0028071, and add 100000 each time. For primality testing, use Miller-Rabin, like the code I provided here (it needs some customization). If the return value is true, check it for exact primality.

You can use a sieve which contains only numbers satisfying your special condition to filter out numbers divisible by small primes.
For each small prime p you need to find the correct starting point and step by taking into account that only each 100000th number is present in the sieve.
For the numbers that survive the sieve you can use BigInteger.isProbablePrime() to check whether it is prime with sufficient probability.
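A minimal Python sketch of such a sieve: only terms of the progression start + j*100000 are represented, and for each small prime p the first term divisible by p is found with a modular inverse (j = -start * step^(-1) mod p), after which every p-th term is struck out. The function names, the example prefix and the bound of 1000 for "small primes" are illustrative choices; the survivors would then go to a probable-prime test such as the `BigInteger.isProbablePrime()` mentioned above.

```python
import math

def small_primes(limit):
    """Primes below limit by trial division (fine for a limit around 1000)."""
    return [p for p in range(2, limit)
            if all(p % q for q in range(2, math.isqrt(p) + 1))]

def sieve_progression(start, step, count, prime_limit=1000):
    """Terms start + j*step (0 <= j < count) with no prime factor below prime_limit.

    Assumes gcd(start, step) == 1; primes dividing step are skipped because
    they can never divide a term of such a progression.
    """
    alive = [True] * count
    for p in small_primes(prime_limit):
        if step % p == 0:
            continue
        # First j with start + j*step = 0 (mod p).
        first = (-start * pow(step, -1, p)) % p
        for j in range(first, count, p):   # then every p-th term after it
            if start + j * step != p:      # never strike out p itself
                alive[j] = False
    return [start + j * step for j in range(count) if alive[j]]

start = 1234567 * 100_000 + 28071          # hypothetical prefix, suffix 28071
candidates = sieve_progression(start, 100_000, 10_000)
print(len(candidates), "of 10000 terms survive the small-prime sieve")
```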

Let ABCDE be the five digits number in base ten, which you are considering. Based on Dirichlet's theorem on arithmetic progressions, if ABCDE and 100000 are coprime, then there are infinitely many primes of the form 100000*k+ABCDE. Since you are looking for prime numbers, neither 2 nor 5 would divide ABCDE anyway, thus ABCDE and 100000 are coprime. So there are infinitely many primes of the form you are considering.

You could extend one of the standard methods for generating large primes by adding an extra constraint, i.e. that the last 5 decimal digits must be correct. Naively, you can just add this as an extra test but it will increase the time to find a suitable prime by 10^5.
Not-so-naively: generate a random 512-bit number then set sufficient low-order bits so that the decimal representation ends with the required sequence. Then continue with the normal primality tests.

I rewrote the brute-force algorithm from the int world to the BigDecimal one with the help of the BigSquareRoot class from http://www.merriampark.com/bigsqrt.htm. (Note that there are exactly 168 primes from 1 to 1000.)
Sorry, but if you put your range there, i.e. [10^154, 10^155 - 1], you can let your computer work and when you have retired, you may have the result... it is damn slow!
However, you can somehow find at least a part of this useful in combination with the other answers in this thread.
package edu.eli.test.primes;
import java.math.BigDecimal;
public class PrimeNumbersGenerator {
public static void main(String[] args) {
// BigDecimal lowerLimit = BigDecimal.valueOf(10).pow(154); /* 155 digits */
// BigDecimal upperLimit = BigDecimal.valueOf(10).pow(155).subtract(BigDecimal.ONE);
BigDecimal lowerLimit = BigDecimal.ONE;
BigDecimal upperLimit = new BigDecimal("1000");
BigDecimal prime = lowerLimit;
int i = 1;
/* http://www.merriampark.com/bigsqrt.htm */
BigSquareRoot bsr = new BigSquareRoot();
upperLimit = upperLimit.add(BigDecimal.ONE);
while (prime.compareTo(upperLimit) == -1) {
bsr.setScale(0);
BigDecimal roundedSqrt = bsr.get(prime);
boolean isPrimeNumber = false;
BigDecimal upper = roundedSqrt;
while (upper.compareTo(BigDecimal.ONE) == 1) {
BigDecimal div = prime.remainder(upper);
if ((prime.compareTo(upper) != 0) && (div.compareTo(BigDecimal.ZERO) == 0)) {
isPrimeNumber = false;
break;
} else if (!isPrimeNumber) {
isPrimeNumber = true;
}
upper = upper.subtract(BigDecimal.ONE);
}
if (isPrimeNumber) {
System.out.println("\n" + i + " -> " + prime + " is a prime!");
i++;
} else {
System.out.print(".");
}
prime = prime.add(BigDecimal.ONE);
}
}
}

Let's consider brute-force. Take a look at this very interesting text called "The prime number lottery":
http://plus.maths.org/content/prime-number-lottery
Given the last entry in the last table, there are ~2.79*10^14 primes less than 10^16. Thus, approximately every 35th number is a prime in that range.
EDIT: See the comment by CodeInChaos - if you just walk a few thousand 512bit numbers with last 5 digits fixed, you'll find one quickly.
