I'm trying to nail down some interview questions, so I started with a simple one.
Design the factorial function.
This function is a leaf (no dependencies, so it's easily testable), so I made it a static method in a helper class.
public static class MathHelper
{
    public static int Factorial(int n)
    {
        Debug.Assert(n >= 0);
        if (n < 0)
        {
            throw new ArgumentException("n cannot be lower than 0");
        }
        Debug.Assert(n <= 12);
        if (n > 12)
        {
            throw new OverflowException("Overflow occurs above 12 factorial");
        }
        int factorialOfN = 1;
        for (int i = 1; i <= n; ++i)
        {
            //checked
            //{
            factorialOfN *= i;
            //}
        }
        return factorialOfN;
    }
}
Testing:
[TestMethod]
[ExpectedException(typeof(OverflowException))]
public void Overflow()
{
    int temp = FactorialHelper.MathHelper.Factorial(40);
}

[TestMethod]
public void ZeroTest()
{
    int factorialOfZero = FactorialHelper.MathHelper.Factorial(0);
    Assert.AreEqual(1, factorialOfZero);
}

[TestMethod]
public void FactorialOf5()
{
    int factOf5 = FactorialHelper.MathHelper.Factorial(5);
    Assert.AreEqual(5 * 4 * 3 * 2 * 1, factOf5);
}

[TestMethod]
[ExpectedException(typeof(ArgumentException))]
public void NegativeTest()
{
    int factOfMinus5 = FactorialHelper.MathHelper.Factorial(-5);
}
I have a few questions:
Is it correct? (I hope so ;) )
Does it throw the right exceptions?
Should I use a checked context, or is this trick (n > 12) OK?
Is it better to use uint instead of checking for negative values?
Future improvements: overloads for long, decimal, BigInteger, or maybe a generic method?
Thank you
It looks right to me, but it would be inefficient with larger numbers. If you're allowing for big integers, the number will keep growing with each multiplication, so you would see a tremendous (asymptotically better) increase in speed if you multiplied the factors hierarchically. For example:
BigInteger myFactorial(uint first, uint last)
{
    if (first > last) return 1;       // empty range, covers factorial(0) and factorial(1)
    if (first == last) return first;
    uint mid = first + (last - first) / 2;
    return myFactorial(first, mid) * myFactorial(mid + 1, last);
}

BigInteger factorial(uint n)
{
    return myFactorial(2, n);         // BigInteger lives in System.Numerics
}
If you really want a fast factorial method, you also might consider something like this:
Factor the factorial with a modified Sieve of Eratosthenes
Compute the powers of each prime factor using a fast exponentiation algorithm (and fast multiplication and square algorithms)
Multiply all the powers of primes together hierarchically
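As a rough sketch of steps 1 and 2 (step 3's hierarchical multiplication is omitted and the prime powers are simply multiplied in sequence), something like this might work; the name FactorialByPrimes and its structure are mine, not a standard API:

using System.Numerics;

// Sketch: factor n! into prime powers via a sieve plus Legendre's formula,
// then multiply the powers together.
static BigInteger FactorialByPrimes(int n)
{
    var isComposite = new bool[n + 1];
    var result = BigInteger.One;
    for (int p = 2; p <= n; p++)
    {
        if (isComposite[p]) continue; // p is prime
        for (long m = (long)p * p; m <= n; m += p)
            isComposite[(int)m] = true; // Sieve of Eratosthenes
        // Legendre's formula: the exponent of p in n! is the sum of floor(n / p^k)
        int exp = 0;
        for (long pk = p; pk <= n; pk *= p)
            exp += n / (int)pk;
        result *= BigInteger.Pow(p, exp); // compute p^exp and fold it in
    }
    return result;
}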
Yes, it looks right
The exceptions seem OK to me, and also as an interviewer, I can't see myself being concerned there
Checked. Also, in an interview, you'd never know that 12 just happened to be the right number.
Uint. If you can enforce something with a signature instead of an exception, do it.
You should just make it long (or bigint) and be done with it (int is a silly choice of return types here)
Here are some follow-up questions I'd ask if I were your interviewer:
Why didn't you solve this recursively? Factorial is a naturally recursive problem.
Can you add memoization to this so that it does a faster job computing 12! if it's already done 11!? (See the sketch below.)
Do you need the n==0 case here?
As an interviewer, I'd definitely have some curveballs like that to throw at you. In general, I like the approach of practicing with a whiteboard and a mock interviewer, because so much of it is being nimble and thinking on your feet.
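On the memoization follow-up, here is a minimal sketch, assuming the long return type suggested above (the cache field and its layout are mine):

using System;
using System.Collections.Generic;

public static class MathHelper
{
    private static readonly List<long> cache = new List<long> { 1 }; // cache[0] = 0! = 1

    public static long Factorial(int n)
    {
        if (n < 0 || n > 20) // 20! is the largest factorial that fits in a long
            throw new ArgumentOutOfRangeException(nameof(n));
        // Extend the cache only as far as needed, reusing earlier results:
        // computing 12! after 11! costs a single multiplication.
        for (int i = cache.Count; i <= n; i++)
            cache.Add(cache[i - 1] * i);
        return cache[n];
    }
}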
In the for loop you can start with for (int i = 2; ...). Multiplying by 1 is quite useless.
I would throw a single ArgumentOutOfRangeException for both n < 0 and n > 12. The Debug.Assert will mask the exception when you are using your unit tests (you would have to test in Release mode).
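A minimal sketch of that combined guard, replacing both Debug.Assert/throw pairs:

if (n < 0 || n > 12) // 12! = 479001600 is the largest factorial that fits in an int
{
    throw new ArgumentOutOfRangeException(nameof(n), n,
        "n must be between 0 and 12; 13! overflows an int.");
}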
Related
I am trying to solve Project Euler Challenge 5: what is the smallest positive number that is evenly divisible by all the numbers from 1 to 20?
My problem is that my isMultiple() method doesn't return true when it should.
My Code
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace Challenge_5
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(isMultiple(2520, 10)); // should return True
            Console.WriteLine(smallestMultiple(20)); // should return 232792560
            Console.ReadLine();
        }

        static int factorial(int n)
        {
            int product = 1;
            for (int i = 1; i <= n; i++)
            {
                product *= i;
            }
            return product; // returns the factorial of n, (n * n-1 * n-2 ... * 1)
        }

        static bool isMultiple(int number, int currentFactor)
        {
            bool returnBool = false;
            if (currentFactor == 1)
            {
                returnBool = true; // if all factors below largestFactor can divide into the number, returns true
            }
            else
            {
                if (number % currentFactor == 0)
                {
                    currentFactor--;
                    isMultiple(number, currentFactor);
                }
            }
            return returnBool;
        }

        static int smallestMultiple(int largestFactor)
        {
            for (int i = largestFactor; i < factorial(largestFactor); i += largestFactor) // goes through all values from largestFactor to largestFactor factorial
            {
                if (isMultiple(i, largestFactor))
                {
                    return i; // if the current number can be evenly divided by all factors, it gets returned
                }
            }
            return factorial(largestFactor); // if no numbers get returned, the factorial is the smallest multiple
        }
    }
}
I know there are much easier ways to solve this, but I want the program to be used to check the lowest multiple of the numbers from 1 to any number, not just 20.
Help would be much appreciated.
EDIT
Thanks to the help, I have fixed my code by changing line 42 from
isMultiple(number, currentFactor);
to
returnBool = isMultiple(number, currentFactor);
I also fixed the problem of not getting an accurate return value from smallestMultiple(20) by changing some of the variables to long instead of int.
Your problem is that you forgot to use the output of isMultiple in your recursive part:
if (number % currentFactor == 0)
{
    currentFactor--;
    returnBool = isMultiple(number, currentFactor); // you need to save the value here
}
Without assigning returnBool there is no way of knowing if the inner isMultiple returned true or not.
Scott's answer is perfectly valid. Forgive me if I'm wrong, but it sounds like you're a student, so in the interest of education, I thought I'd give you some pointers for cleaning up your code.
When writing recursive functions, it's (in my opinion) usually cleaner to return the recursive call directly if possible, as well as returning base cases directly, rather than storing a value that you return at the end. (It's not always possible, in complicated cases where you have to make a recursive call, modify the return value, and then make another recursive call, but these are uncommon.)
This practice:
makes base cases very obvious, increasing readability, and forces you to consider all of your base cases
prevents you from forgetting to assign the return value, as you did in your original code
prevents potential bugs where you might accidentally do some erroneous additional processing that alters the result before you return it
reduces the number of nested if statements, increasing readability by reducing preceding whitespace
makes code flow much more obvious, increasing readability and the ability to debug
usually results in tail recursion, which is the best for performance, nearly as performant as iterative code
caveat: nowadays most compilers' optimizers will rejigger production code to produce tail-recursive functions, but getting into the habit of writing your code with tail-recursion in the first place is good practice, especially as interpreted scripting languages (e.g. JavaScript) are taking over the world, where code optimization is less possible by nature
The other change I'd make is to remove currentFactor--; and move the subtraction into the recursive call itself. It increases readability, reduces the chance of side effects, and, in the case where you don't use tail-recursion, prevents you from altering a value that you later expect to be unaltered. In general, if you can avoid altering values passed into a function (as opposed to a procedure/void), you should.
Also, in this particular case, making this change removes up to 3 assembly instructions and possibly an additional value on the stack depending on how the optimizer handles it. In long running loops with large depths, this can make a difference*.
static bool isMultiple(int number, int currentFactor)
{
    if (currentFactor == 1)
    {
        // if all factors below largestFactor can divide into the number, return true
        return true;
    }
    if (number % currentFactor != 0)
    {
        return false;
    }
    return isMultiple(number, currentFactor - 1);
}
* A personal anecdote regarding deep recursive calls and performance...
A while back, I was writing a program in C++ to enumerate the best moves for all possible Connect-4 games. The maximum depth of the recursive search function was 42, and each depth had up to 7 recursive calls. Initial versions of the code had an estimated running time of 2 million years, and that was using parallelism. Those 3 additional instructions can make a HUGE difference, both for the sheer number of additional instructions and the number of L1 and L2 cache misses.
This algorithm just came to mind, so please correct me if I'm wrong.
Composite numbers are products of prime numbers (and prime numbers cannot be produced by multiplying any other numbers), so the result must be multiplied by all the primes. Some numbers, like 6, already divide the running product once it has been multiplied by 2 and 3; for those that do not, multiplying by one prime factor (which is necessarily less than the number itself, so it is already in our prime list) makes the running product divisible by that number too. For example, when we get to 4, multiplying by 2 (a prime less than 4) is enough, and the same goes for 8, 9, and so on.
bool IsPrime(int x)
{
    if (x == 1) return false;
    for (int i = 2; i <= Math.Sqrt(x); i++) // i++, not i += 2: stepping by 2 from 2 would skip every odd divisor
        if (x % i == 0) return false;
    return true;
}
int smallest = 1;
List<int> primes = new List<int>();
for (int i = 1; i <= 20; i++)
{
    if (IsPrime(i))
    {
        smallest *= i;
        primes.Add(i);
    }
    else if (smallest % i != 0)
    {
        for (int j = 0; j < primes.Count; j++)
        {
            if ((primes[j] * smallest) % i == 0)
            {
                smallest *= primes[j];
                break;
            }
        }
    }
}
Edit:
Since we already have a list of prime numbers, the better way to find out whether a number is prime would be:
bool IsPrime(int x)
{
    if (x == 1) return false;
    for (int i = 0; i < primes.Count; i++)
        if (x % primes[i] == 0) return false;
    return true;
}
So there are several ways of creating a random bool in C#:
Using Random.Next(): rand.Next(2) == 0
Using Random.NextDouble(): rand.NextDouble() > 0.5
Is there really a difference? If so, which one actually has the better performance? Or is there another way I did not see, that might be even faster?
The first option, rand.Next(2), executes the following code behind the scenes:
if (maxValue < 0)
{
    throw new ArgumentOutOfRangeException("maxValue",
        Environment.GetResourceString("ArgumentOutOfRange_MustBePositive", new object[] { "maxValue" }));
}
return (int)(this.Sample() * maxValue);
and the second option, rand.NextDouble(), executes:
return this.Sample();
Since the first option contains maxValue validation, multiplication and casting, the second option is probably faster.
Small enhancement for the second option:
According to MSDN
public virtual double NextDouble()
returns
A double-precision floating point number greater than or equal to 0.0, and less than 1.0.
So if you want an evenly spread random bool you should use >= 0.5
rand.NextDouble() >= 0.5
Range 1: [0.0, 0.5)
Range 2: [0.5, 1.0)
|Range 1| = |Range 2|
The fastest. Calling the method Random.Next has the least overhead. The extension method below runs 20% faster than rand.NextDouble() > 0.5, and 35% faster than rand.Next(2) == 0.
public static class RandomExtensions
{
    public static bool NextBoolean(this Random random)
    {
        // Next() returns an int in the range [0..Int32.MaxValue)
        return random.Next() > (Int32.MaxValue / 2);
    }
}
Faster than the fastest. It is possible to generate random booleans with the Random class even faster, using a trick: the 31 significant bits of a generated int can be used for 31 subsequent boolean productions. The implementation below is 40% faster than the method declared fastest above.
public class RandomEx : Random
{
    private uint _boolBits;

    public RandomEx() : base() { }
    public RandomEx(int seed) : base(seed) { }

    public bool NextBoolean()
    {
        _boolBits >>= 1;
        if (_boolBits <= 1) _boolBits = (uint)~this.Next();
        return (_boolBits & 1) == 0;
    }
}
I ran tests with stopwatch. 100,000 iterations:
System.Random rnd = new System.Random();
if (rnd.Next(2) == 0)
    trues++;
CPUs like integers, so the Next(2) method was faster: 3,700 versus 7,500 ms, which is quite substantial.
Also: I think random numbers can be a bottleneck. I created around 50 per frame in Unity, and even with a tiny scene that noticeably slowed down my system, so I too was hoping to find a fast way to create a random bool.
So I also tried
if (System.DateTime.Now.Millisecond % 2 == 0)
    trues++;
but calling the static DateTime.Now property was even slower, at 9,600 ms. Worth a shot.
Finally, I skipped the comparison and only created 100,000 random values, to make sure the int vs. double comparison did not influence the elapsed time, but the result was pretty much the same.
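For reference, a minimal harness along those lines (a sketch, not the exact code behind the timings above):

using System;
using System.Diagnostics;

class RandomBoolBenchmark
{
    static void Main()
    {
        var rnd = new Random(12345); // fixed seed so runs are comparable
        const int iterations = 100000;
        int trues = 0;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            if (rnd.Next(2) == 0) trues++;
        sw.Stop();
        Console.WriteLine("Next(2) == 0:        {0} ms ({1} trues)", sw.ElapsedMilliseconds, trues);

        trues = 0;
        sw.Restart();
        for (int i = 0; i < iterations; i++)
            if (rnd.NextDouble() >= 0.5) trues++;
        sw.Stop();
        Console.WriteLine("NextDouble() >= 0.5: {0} ms ({1} trues)", sw.ElapsedMilliseconds, trues);
    }
}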
Based on the answers here, below is the code I used to generate a random bool value:
Random rand = new Random();
bool randomBool = rand.NextDouble() >= 0.5;
References:
Generate a random boolean
https://stackoverflow.com/a/28763727/2218697
I have tried to write code for the Fermat primality test, but apparently failed.
So if I understood well: if p is prime, then (a^p - a) % p = 0, where p % a != 0.
My code seems to be OK, therefore most likely I misunderstood the basics. What am I missing here?
private bool IsPrime(int candidate)
{
    // checking if candidate = 0 || 1 || 2
    int a = candidate + 1; // candidate can't be a divisor of candidate + 1
    if ((Math.Pow(a, candidate) - a) % candidate == 0) return true;
    return false;
}
Reading the Wikipedia article on the Fermat primality test: you must choose an a that is less than the candidate you are testing, not greater.
Furthermore, as MattW commented, testing only a single a won't give you a conclusive answer as to whether the candidate is prime. You must test many possible values of a before you can decide that a number is probably prime. And even then, some numbers may appear to be prime but actually be composite.
Your basic algorithm is correct, though you will have to use a larger data type than int if you want to do this for non-trivial numbers.
You should not implement the modular exponentiation in the way that you did, because the intermediate result is huge. Here is the square-and-multiply algorithm for modular exponentiation:
function powerMod(b, e, m)
    x := 1
    while e > 0
        if e % 2 == 1
            x, e := (x*b) % m, e-1
        else
            b, e := (b*b) % m, e//2
    return x
As an example, 437^13 (mod 1741) = 819. If you use the algorithm shown above, no intermediate result will be greater than 1740 * 1740 = 3027600. But if you perform the exponentiation first, the intermediate result of 437^13 is 21196232792890476235164446315006597, which you probably want to avoid.
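Here is a C# rendering of that pseudocode, together with the repeated-witness loop described above (a sketch; note that .NET already provides BigInteger.ModPow, which does the same job as PowerMod):

using System;
using System.Numerics;

static class FermatTest
{
    // Square-and-multiply modular exponentiation, mirroring the pseudocode above.
    public static BigInteger PowerMod(BigInteger b, BigInteger e, BigInteger m)
    {
        BigInteger x = 1;
        b %= m;
        while (e > 0)
        {
            if (e % 2 == 1) { x = (x * b) % m; e -= 1; }
            else { b = (b * b) % m; e /= 2; }
        }
        return x;
    }

    // Fermat test with several random witnesses; "probably prime" only.
    public static bool IsProbablyPrime(BigInteger n, int rounds = 20)
    {
        if (n < 2) return false;
        if (n < 4) return true; // 2 and 3 are prime
        var rng = new Random();
        for (int i = 0; i < rounds; i++)
        {
            // witness in [2, n-2]; good enough for small n
            BigInteger a = 2 + rng.Next() % (int)BigInteger.Min(n - 3, int.MaxValue - 3);
            if (PowerMod(a, n - 1, n) != 1) return false; // definitely composite
        }
        return true; // probably prime (Carmichael numbers can still fool this)
    }
}

For the worked example above, PowerMod(437, 13, 1741) returns 819.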
Even with all of that, the Fermat test is imperfect. There are some composite numbers, the Carmichael numbers, that will always report prime no matter what witness you choose. Look for the Miller-Rabin test if you want something that will work better. I modestly recommend this essay on Programming with Prime Numbers at my blog.
You are dealing with very large numbers and trying to store them in a double, which is only 64 bits.
The double will do the best it can to hold your number, but you are going to lose some accuracy.
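A quick illustration of that precision loss (a sketch; the values are just illustrative):

// A double keeps ~15-16 significant decimal digits; Math.Pow(a, candidate)
// blows past that almost immediately, so the low-order digits that the
// modulo needs are already gone before the % operator ever runs.
double big = Math.Pow(10, 23);      // ~1e23, far beyond double's integer precision
Console.WriteLine(big - 10 == big); // True: subtracting 10 is completely absorbed by rounding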
An alternative approach:
Remember that the mod operator can be applied at every step without changing the result, because (a * b) % m == ((a % m) * (b % m)) % m.
So, to avoid getting massive numbers, you can apply the mod operator during the calculation of your power.
Something like:
private bool IsPrime(int candidate)
{
    // checking if candidate = 0 || 1 || 2
    long a = candidate - 1; // candidate can't be a divisor of candidate - 1
    long result = 1;
    for (int i = 0; i < candidate; i++)
    {
        result = result * a;
        // Notice that without the following line,
        // this method is essentially the same as your own.
        // All this line does is keep the numbers small and manageable.
        // (result and a are long so that result * a cannot overflow,
        // even when candidate is near int.MaxValue.)
        result = result % candidate;
    }
    result -= a;
    return result == 0;
}
I was wondering if it is possible to find the largest prime factor of a number by using modulo in C#. In other words, if i % x == 0, then we could break out of a for loop or something like that, where x ranges over all natural numbers below our i value.
How would I set all natural numbers below our i value as the x variable? It becomes a little tedious to write out conditionals for every single integer, if you know what I'm saying.
By the way, I'm sure there is a far easier way to do this in C#, so please let me know about it if you have an idea, but I'd also like to try and solve it this way, just to see if I can do it with my beginner knowledge.
Here is my current code if you want to see what I have so far:
static void Main()
{
    int largestPrimeFactor = 0;
    for (long i = 98739853; i <= 98739853; i--)
    {
        if (true)
        {
            largestPrimeFactor += (int)i;
            break;
        }
    }
    Console.WriteLine(largestPrimeFactor);
    Console.ReadLine();
}
If I were to do this using a loop and modulo, I would do:
long number = 98739853;
long biggestdiv = number;
while (number % 2 == 0) // get rid of even numbers
    number /= 2;
long divisor = 3;
if (number != 1)
    while (divisor != number)
    {
        while (number % divisor == 0)
        {
            number /= divisor;
            biggestdiv = divisor;
        }
        divisor += 2;
    }
In the end, biggestdiv will be the largest prime factor.
Note: this code is written directly in the browser; I didn't try to compile or run it. It is only to show my concept. There might be algorithm mistakes; if there are, let me know. I'm aware of the fact that it is not optimized at all (I think a sieve would be best for this).
EDIT:
fixed: the previous code would return 1 when the number was prime.
fixed: the previous code would end in a loop leading to overflow of divisor when the number was a power of 2.
Ooh, this sounds like a fun use for iterator blocks. Don't turn this in to your professor, though:
private static List<int> primes = new List<int>() { 2 };

public static IEnumerable<int> Primes()
{
    int p = 0; // definite assignment; the foreach below always runs at least once
    foreach (int i in primes) { p = i; yield return p; }
    while (p < int.MaxValue)
    {
        p++;
        if (!primes.Any(i => p % i == 0))
        {
            primes.Add(p);
            yield return p;
        }
    }
}

public int LargestPrimeFactor(int n)
{
    return Primes().TakeWhile(p => p <= Math.Sqrt(n)).Where(p => n % p == 0).Last();
}
I'm not sure quite what your question is: perhaps you need a loop over the numbers? However, there are two clear problems with your code:
Your for loop has the same start and stop value, i.e. it will run once and once only.
You have a break before the largestPrimeFactor sum. That sum will NEVER execute, because break stops the for loop (and hence execution of that block). The compiler should be giving a warning that the statement is unreachable.
I need to resample big sets of data (a few hundred spectra, each containing a few thousand points) using simple linear interpolation.
I have created an interpolation method in C#, but it seems to be really slow for huge datasets.
How can I improve the performance of this code?
public static List<double> interpolate(IList<double> xItems, IList<double> yItems, IList<double> breaks)
{
    double[] interpolated = new double[breaks.Count];
    int id = 1;
    int x = 0;
    // left border case - uphold the value
    while (breaks[x] < xItems[0])
    {
        interpolated[x] = yItems[0];
        x++;
    }
    double p, w;
    for (int i = x; i < breaks.Count; i++)
    {
        while (breaks[i] > xItems[id])
        {
            id++;
            if (id > xItems.Count - 1)
            {
                id = xItems.Count - 1;
                break;
            }
        }
        System.Diagnostics.Debug.WriteLine(string.Format("i: {0}, id {1}", i, id));
        if (id <= xItems.Count - 1)
        {
            if (id == xItems.Count - 1 && breaks[i] > xItems[id])
            {
                interpolated[i] = yItems[yItems.Count - 1];
            }
            else
            {
                w = xItems[id] - xItems[id - 1];
                p = (breaks[i] - xItems[id - 1]) / w;
                interpolated[i] = yItems[id - 1] + p * (yItems[id] - yItems[id - 1]);
            }
        }
        else // right border case - uphold the value
        {
            interpolated[i] = yItems[yItems.Count - 1];
        }
    }
    return interpolated.ToList();
}
Edit
Thanks, guys, for all your responses. What I wanted when I wrote this question were some general ideas about where I could improve the performance. I wasn't expecting ready-made solutions, only some ideas, and you gave me what I wanted, thanks!
Before writing this question I thought about rewriting the code in C++, but after reading the comments on Will's answer it seems that the gain would be smaller than I expected.
Also, the code is so simple that there are no mighty code tricks to use here. Thanks to Petar for his attempt to optimize the code.
It seems that it all reduces to finding a good profiler, checking every line and subroutine, and trying to optimize from there.
Thank you again for all the responses and for taking part in this discussion!
public static List<double> Interpolate(IList<double> xItems, IList<double> yItems, IList<double> breaks)
{
    var a = xItems.ToArray();
    var b = yItems.ToArray();
    var aLimit = a.Length - 1;
    var bLimit = b.Length - 1;
    var interpolated = new double[breaks.Count];
    var total = 0;
    var initialValue = a[0];
    while (breaks[total] < initialValue)
    {
        total++;
    }
    Array.Copy(b, 0, interpolated, 0, total);
    int id = 1;
    for (int i = total; i < breaks.Count; i++)
    {
        var breakValue = breaks[i];
        while (breakValue > a[id])
        {
            id++;
            if (id > aLimit)
            {
                id = aLimit;
                break;
            }
        }
        double value = b[bLimit];
        if (id <= aLimit)
        {
            var currentValue = a[id];
            var previousValue = a[id - 1];
            if (id != aLimit || breakValue <= currentValue)
            {
                var w = currentValue - previousValue;
                var p = (breakValue - previousValue) / w;
                value = b[id - 1] + p * (b[id] - b[id - 1]);
            }
        }
        interpolated[i] = value;
    }
    return interpolated.ToList();
}
I've cached some (constant) values and used Array.Copy, but I think these are micro-optimizations that are already made by the compiler in Release mode. However, you can try this version and see if it beats the original version of the code.
Instead of interpolated.ToList(), which copies the whole array, compute the interpolated values directly into the final list (or return the array itself). This matters especially if the array/list is big enough to qualify for the large object heap.
Unlike the ordinary heap, the LOH is not compacted by the GC, which means that short-lived large objects are far more harmful than small ones.
Then again: 7,000 doubles are approx. 56,000 bytes, which is below the large object threshold of 85,000 bytes.
Looks to me like you've created an O(n²) algorithm: you are searching for the interval, which is O(n), and you then apply it n times. You'll get a quick and cheap speed-up by taking advantage of the fact that the items are already ordered in the list: use BinarySearch(), which is O(log n).
If still necessary, you should be able to do something speedier with the outer loop, what ever interval you found previously should make it easier to find the next one. But that code isn't in your snippet.
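For illustration, a minimal sketch of that binary-search lookup (the helper name FindInterval is mine, not from the poster's code):

// O(log n) interval lookup via Array.BinarySearch, replacing the linear scan.
// Returns id such that xItems[id - 1] <= breakValue <= xItems[id],
// clamped at both ends of the array.
static int FindInterval(double[] xItems, double breakValue)
{
    int idx = Array.BinarySearch(xItems, breakValue);
    if (idx < 0)
        idx = ~idx; // BinarySearch returns the bitwise complement of the insertion point
    return Math.Min(Math.Max(idx, 1), xItems.Length - 1);
}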
I'd say profile the code and see where it spends its time, then you have somewhere to focus on.
ANTS is popular, but Equatec is free I think.
A few suggestions:
As others have suggested, use a profiler to better understand where the time is spent.
The loop
while (breaks[x] < xItems[0])
could cause an exception if x grows bigger than the number of items in the breaks list. You should use something like
while (x < breaks.Count && breaks[x] < xItems[0])
But you might not need that loop at all. Why treat the first item as a special case? Just start with id = 0 and handle the first point in the for(i) loop. I understand that id might start from 0 in this case, and [id - 1] would be a negative index, but see if you can do something there.
If you want to optimize for speed then you sacrifice memory, and vice versa; you cannot usually have both unless you use a really clever algorithm. In this case, it means calculating as much as you can outside the loops, storing those values in variables (extra memory), and using them later. For example, instead of always saying:
id = xItems.Count - 1;
You could say:
int lastXItemsIndex = xItems.Count-1;
...
id = lastXItemsIndex;
This is the same suggestion Petar Petrov made with aLimit, bLimit, and so on.
Next point: your loop (or the one Petar Petrov suggested):
while (breaks[i] > xItems[id])
{
    id++;
    if (id > xItems.Count - 1)
    {
        id = xItems.Count - 1;
        break;
    }
}
could probably be reduced to:
double currentBreak = breaks[i];
while (id <= lastXItemsIndex && currentBreak > xItems[id]) id++;
And the last point I would add: check whether there is some property of your samples that is special to your problem. For example, if xItems represent time and you are sampling at regular intervals, then
w = xItems[id] - xItems[id - 1];
is constant, and you do not have to calculate it every time in the loop.
This is probably not often the case, but maybe your problem has some other property which you could use to improve performance.
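A sketch of that idea, reusing the variable names from the original code and assuming a uniformly spaced xItems grid (an assumption the original code does not make):

// With a regular grid the bracketing index can be computed directly,
// so both the division and the inner while-search disappear from the loop.
double step = xItems[1] - xItems[0]; // constant spacing, hoisted out of the loop
double invStep = 1.0 / step;         // multiply instead of divide inside the loop
for (int i = x; i < breaks.Count; i++)
{
    int id = Math.Min((int)((breaks[i] - xItems[0]) * invStep) + 1, xItems.Count - 1);
    double p = (breaks[i] - xItems[id - 1]) * invStep;
    if (p > 1) p = 1; // right border: uphold the last value, as in the original
    interpolated[i] = yItems[id - 1] + p * (yItems[id] - yItems[id - 1]);
}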
Another idea is this: maybe you do not need double precision, "float" is probably faster because it is smaller.
Good luck
System.Diagnostics.Debug.WriteLine(string.Format("i: {0}, id {1}", i, id));
I hope it's a Release build without DEBUG defined?
Otherwise, it might depend on what exactly those IList parameters are. It may be useful to store the Count value in a local instead of accessing the property every time.
This is the kind of problem where you need to move over to native code.