I was adapting a simple prime-number generation one-liner from Scala to C# (mentioned in a comment on this blog by its author). I came up with the following:
int NextPrime(int from)
{
    int n = from;
    while (true)
    {
        n++;
        if (!Enumerable.Range(2, (int)Math.Sqrt(n) - 1).Any((i) => n % i == 0))
            return n;
    }
}
It works, returning the same results I'd get from running the code referenced in the blog. In fact, it works fairly quickly: in LINQPad, it generated the 100,000th prime in about 1 second. Out of curiosity, I rewrote it without Enumerable.Range() and Any():
int NextPrimeB(int from)
{
    int n = from;
    while (true)
    {
        n++;
        bool hasFactor = false;
        for (int i = 2; i <= (int)Math.Sqrt(n); i++)
        {
            if (n % i == 0) hasFactor = true;
        }
        if (!hasFactor) return n;
    }
}
Intuitively, I'd expect them to run at the same speed, or even for the latter to be a little faster. In actuality, computing the same value (the 100,000th prime) with the second method takes 12 seconds. It's a staggering difference.
So what's going on here? There must be something fundamentally extra happening in the second approach that's eating up CPU cycles, or some optimization going on in the background of the LINQ version. Does anybody know why?
For every iteration of the for loop, you are finding the square root of n. Cache it instead.
int root = (int)Math.Sqrt(n);
for (int i = 2; i <= root; i++)
And as others have mentioned, break out of the for loop as soon as you find a factor.
The LINQ version short-circuits; your loop does not. By this I mean that once you have determined that a particular integer is in fact a factor, the LINQ code stops and moves on. Your code keeps looping until it's done.
If you change the for to include that short circuit, you should see similar performance:
int NextPrimeB(int from)
{
    int n = from;
    while (true)
    {
        n++;
        bool hasFactor = false;
        for (int i = 2; i <= (int)Math.Sqrt(n); i++)
        {
            if (n % i == 0) { hasFactor = true; break; } // short circuit
        }
        if (!hasFactor) return n;
    }
}
It looks like this is the culprit:
for (int i = 2; i <= (int)Math.Sqrt(n); i++)
{
    if (n % i == 0) hasFactor = true;
}
You should exit the loop once you find a factor:
if (n % i == 0)
{
    hasFactor = true;
    break;
}
And as others have pointed out, move the Math.Sqrt call outside the loop to avoid calling it every iteration.
Enumerable.Any takes an early out as soon as the condition succeeds, while your loop does not. As the documentation for Any puts it:
The enumeration of source is stopped as soon as the result can be determined.
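To see that early-out in action, here is a small probe (illustrative; not from the original answer, and it assumes System.Linq is imported):
var probed = new List<int>();
bool found = Enumerable.Range(2, 10).Any(i => { probed.Add(i); return i % 3 == 0; });
Console.WriteLine(string.Join(", ", probed)); // prints "2, 3"; enumeration stopped at the first match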
This is an example of a bad benchmark. Try modifying your loop and see the difference:
if (n % i == 0) { hasFactor = true; break; }
In the name of optimization, you can be a little more clever about this by avoiding even numbers after 2:
if (n % 2 != 0)
{
    bool hasFactor = false;
    int root = (int)Math.Sqrt(n);
    for (int i = 3; i <= root; i += 2)
    {
        if (n % i == 0) { hasFactor = true; break; }
    }
    if (!hasFactor) return n;
}
There are some other ways to optimize prime searches, but this is one of the easier to do and has a large payoff.
Edit: you may want to consider using (int)Math.Sqrt(n) + 1 as the bound. Floating-point rounding plus the round-down cast could otherwise cause you to miss the square of a large prime.
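Relatedly, a common way to sidestep the floating-point concern entirely is to compare i * i against n instead of computing a square root; a sketch (this variant is mine, not the original answer's):
bool hasFactor = false;
for (int i = 2; (long)i * i <= n; i++) // integer multiply replaces Math.Sqrt, so no rounding issues
{
    if (n % i == 0) { hasFactor = true; break; }
}
if (!hasFactor) return n;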
At least part of the problem is the number of times Math.Sqrt is executed. In the LINQ query it's executed once, but in the loop example it's executed N times. Try pulling it out into a local and reprofiling the application; that will give you a more representative breakdown:
int limit = (int)Math.Sqrt(n);
for (int i = 2; i <= limit; i++)
Related
So I had an interview question: write a function that takes a number and returns all numbers less than or divisible by 7.
private List<int> GetLessThanOrDivisbleBySeven(int num)
{
    List<int> ReturnList = new List<int>();
    for (int i = 0; i <= num; i++)
    {
        if (i < 7 || i % 7 == 0)
        {
            ReturnList.Add(i);
        }
    }
    return ReturnList;
}
So far so good. The follow-up question was: let's say that call was being made tens of thousands of times an hour. How could you speed it up?
I said that if you knew what your queue was, you could break up the queue and thread it. That got me some points, I feel. However, he wanted to know if there was anything in the function itself I could do.
I came up with the idea of testing whether num was greater than 7; if so, initialize the list with 1 - 7 and start the loop at i = 8, which I think was OK. But is there another way I am missing?
If you want to speed it up without caching, you can just increment i by 7 to get all the numbers divisible by 7. It would look something like this:
static private List<int> GetLessThanOrDivisbleBySeven(int num) {
    List<int> ReturnList;
    int i;
    if (num <= 7) {
        ReturnList = new List<int>();
        for (i = 0; i <= num; i++) {
            ReturnList.Add(i);
        }
        return ReturnList;
    }
    ReturnList = new List<int> { 0, 1, 2, 3, 4, 5, 6 };
    i = 7;
    while (i <= num) {
        ReturnList.Add(i);
        i += 7;
    }
    return ReturnList;
}
You can cache the results. Each time your function is called, check which numbers are already in the cache and calculate only the rest. If the current number is smaller than the largest cached one, return the appropriate cached results.
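A minimal sketch of that idea (the field names are illustrative, not from the answer; TakeWhile requires System.Linq):
// Cache the largest list computed so far; extend it only when a bigger num arrives.
private static readonly List<int> cached = new List<int>();
private static int cachedUpTo = -1;

private List<int> GetLessThanOrDivisbleBySevenCached(int num)
{
    for (int i = cachedUpTo + 1; i <= num; i++)
    {
        if (i < 7 || i % 7 == 0) cached.Add(i);
    }
    if (num > cachedUpTo) cachedUpTo = num;
    // For a smaller num than any seen before, return only the applicable prefix.
    return cached.TakeWhile(n => n <= num).ToList();
}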
Use the previous results when calculating the new list:
int oldMax = 0;
List<int> ReturnList = new List<int>();

private List<int> GetLessThanOrDivisbleBySeven(int num)
{
    if (num > oldMax)
    {
        for (int i = oldMax; i <= num; i++)
        {
            if (i < 7 || i % 7 == 0)
            {
                ReturnList.Add(i);
            }
        }
        oldMax = num + 1;
        return ReturnList;
    }
    else
    {
        // create a copy of ReturnList and remove from the copy the numbers bigger than num
        return ReturnList.Where(n => n <= num).ToList();
    }
}
Interview questions are usually more about how you approach problems in general and not so much about the technical implementation. In your case you could do a lot of small things, like caching the list outside the function, or caching different versions of the list in a dictionary if space is not a problem. Maybe somebody can come up with smarter math to save on calculations, but usually it's more about asking the right questions and considering the right options. Say, if you ask "does this program run on a web server? Maybe I can store all the data in a table and use it as a quick lookup instead of recalculating every time." There might not even be a correct or best answer; they probably just want to hear that you can think of special situations.
You can find all the numbers that are divisible by 7 and smaller than num by calculating res = num / 7 and then looping from 1 to res, multiplying each number by 7:
private List<int> GetLessThanOrDivisbleBySeven(int num)
{
    List<int> ReturnList = new List<int>();

    // Add all the numbers that are less than 7 first
    int i = 0;
    for (i = 0; i < 7 && i <= num; i++)
        ReturnList.Add(i);

    int res = num / 7; // num = res * 7 + rem
    for (i = 1; i <= res; i++)
    {
        ReturnList.Add(i * 7);
    }
    return ReturnList;
}
Think about memory management and how the List class works.
Unless you tell it the capacity it will need, it allocates a new backing array whenever it runs out of space; however, it is easy to work out the size it will need up front.
Returning an array would save one object allocation compared to using a List, so discuss the trade-off between the two.
What about using yield return to avoid allocating memory at all? Or does it have other costs to consider?
Is the same number requested often? If so, consider caching.
Would LINQ, maybe using Enumerable.Range, help?
An experienced C# programmer would be expected to know at least a little about all of the above, and that memory management is often a hidden issue.
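For instance, a minimal sketch of pre-sizing the list (assuming the same function as above; the capacity formula is mine, not the answerer's):
// 0..6 contribute at most 7 entries; beyond that, one entry per multiple of 7.
int capacity = num < 7 ? num + 1 : 7 + num / 7;
List<int> ReturnList = new List<int>(capacity); // filled without any re-allocation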
In a course, a problem was to list the first n primes. Apparently we should implement trial division while saving the primes found so far in an array, to reduce the number of divisions required. Initially I misunderstood, but got a working if slower solution using a separate function to test for primality. Now I would like to implement it the way I should have done.
Below is my attempt, with irrelevant code removed, such as the input test.
using System;

namespace PrimeNumbers
{
    class MainClass
    {
        public static void Main (string[] args)
        {
            Console.Write("How many primes?\n");
            string s = Console.ReadLine();
            uint N;
            UInt32.TryParse(s, out N);

            uint[] PrimeTable = new uint[N];
            PrimeTable[0] = 2;
            for (uint i = 1; i < N; i++) // loop n spaces in array, [0] set already so i starts from 1
            {
                uint j = PrimeTable[i - 1] + 1; // sets j bigger than biggest prime so far
                bool isPrime = false; // Just a condition to allow the loop to break??? (Is that right?)
                while (!isPrime) // so loop continues until a break is hit
                {
                    isPrime = true; // to ensure that the loop executes
                    for (uint k = 0; k < i; k++) // want to divide by first i primes
                    {
                        if (PrimeTable[k] == 0) break; // try to avoid divide by zero - unnecessary
                        if (j % PrimeTable[k] == 0) // zero remainder means not prime so break and increment j
                        {
                            isPrime = false;
                            break;
                        }
                    }
                    j++; // j increment mentioned above
                }
                PrimeTable[i] = j; // not different if this is enclosed in brace above
            }
            for (uint i = 0; i < N; i++)
                Console.Write(PrimeTable[i] + " ");
            Console.ReadLine();
        }
    }
}
My comments are my attempt to describe what I think the code is doing. I have tried very many small changes; often they would lead to divide-by-zero errors when running, so I added in a test, but I don't think it should be necessary. (I also got several out-of-range errors when trying to change the loop conditions.)
I have looked at several questions on stack exchange, in particular:
Program to find prime numbers
The first answer uses a different method; the second is close to what I want, but the exact thing is in this comment from Nick Larsson:
You could make this faster by keeping track of the primes and only trying to divide by those.
C# is not shown here: http://rosettacode.org/wiki/Sequence_of_primes_by_Trial_Division#Python
I have seen plenty of other methods and algorithms, such as the sieve of Eratosthenes and GNFS, but I really only want to implement it this way, as I think my problem is with the program logic and I don't understand why it doesn't work. Thanks.
The following should solve your problem:
for (uint i = 1; i < numberOfPrimes; i++) // loop n spaces in array, [0] set already so i starts from 1
{
    uint j = PrimeTable[i - 1] + 1; // sets j bigger than biggest prime so far
    bool isPrime = false;
    while (!isPrime) // so loop continues until a break is hit
    {
        isPrime = true;
        for (uint k = 0; k < i; k++) // want to divide by first i primes
        {
            if (PrimeTable[k] == 0) break; // try to avoid divide by zero - unnecessary
            if (j % PrimeTable[k] == 0) // zero remainder means not prime so break and increment j
            {
                isPrime = false;
                j++;
                break;
            }
        }
    }
    PrimeTable[i] = j;
}
The major change I made was moving the increment of j inside the conditional prime check. When the current value is not prime, we want to test the next candidate, so we must advance j before breaking out of the loop.
Your code was incrementing after the check was made, which means that when you found a prime candidate, you would increment to the next candidate and assign that as your prime. For example, when j = 3 it would pass the condition and isPrime would still be true, but then j++ would increment it to 4, and that is what would be added to PrimeTable.
Make sense?
This might not be a very good answer to your question, but you might want to look at this implementation and see if you can spot where yours differs.
int primesCount = 10;
List<uint> primes = new List<uint>() { 2u };
for (uint n = 3u; ; n += 2u)
{
    if (primes.TakeWhile(u => u * u <= n).All(u => n % u != 0))
    {
        primes.Add(n);
    }
    if (primes.Count >= primesCount)
    {
        break;
    }
}
This correctly and efficiently computes the first primesCount primes.
I had an interview question to write a program in C# that outputs the values occurring an odd number of times in an array.
Example: [2, 2, 3, 3, 3] => [3] (considering the array is sorted)
My solution was:
public List<int> OddOccurance(List<int> InputList)
{
    List<int> output = new List<int>();
    for (int i = 0; i < InputList.Count; i++)
    {
        int Count = 0;
        for (int j = 0; j < InputList.Count; j++)
        {
            if (InputList[i] == InputList[j])
            {
                Count++;
            }
        }
        if (Count % 2 != 0)
        {
            output.Add(InputList[i]);
        }
    }
    return output.Distinct().ToList();
}
I think the answer is correct, but the interviewer asked me about different ways to make the solution much faster.
Can anyone tell me the time complexity of the above solution?
And if there is a way to make it much faster, what would the time complexity of that solution be?
Your solution is O(n^2); if you don't know why, evaluate the sum: for each of the n outer iterations, the inner loop does about n iterations, giving n * n = O(n^2) operations, which describes the running time of your algorithm. You can solve it in linear time easily: since the array is sorted, just advance i past each run of equal values instead of running the inner loop over the whole array.
var output = new List<int>();
for (int i = 0; i < InputList.Count; ++i)
{
    int currentValue = InputList[i];
    int count = 1;
    int j = i + 1;
    while (j < InputList.Count && InputList[j] == currentValue) // bounds check must come first
    {
        count++;
        i++;
        j++;
    }
    if (count % 2 != 0)
        output.Add(currentValue);
}
If the array is not sorted, use a hash table (Dictionary in C#): the element value is the dictionary key and the occurrence count is the dictionary value (that gives you an O(1) key check). That's another way to get linear time if implemented properly.
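A minimal sketch of that dictionary approach (requires System.Linq; variable names are illustrative):
var counts = new Dictionary<int, int>();
foreach (int value in InputList)
{
    counts.TryGetValue(value, out int c); // c stays 0 when the key is absent
    counts[value] = c + 1;
}
List<int> output = counts.Where(kv => kv.Value % 2 != 0)
                         .Select(kv => kv.Key)
                         .ToList();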
The root problem of your solution is seen on this line:
return output.Distinct().ToList();
The very fact that you are doing a Distinct means that you may be adding more entries than you should.
So how can you optimize it? Observe that since the array is sorted, any number equal to the one you're looking at must be adjacent to it: your numbers come in "runs".
This observation lets you go from two nested loops and an O(N^2) solution to a single loop and an O(N) solution. Simply walk the array and remember where the current run starts; when you come across a new number, check whether the previous run's length is odd, then start a new run:
var output = new List<int>();
int start = 0;
int pos = 1;
while (pos < InputList.Count) {
    if (InputList[pos] != InputList[start]) {
        if ((pos - start) % 2 == 1) {
            output.Add(InputList[start]);
        }
        start = pos;
    }
    pos++;
}
// Process the last run
if ((InputList.Count - start) % 2 == 1) {
    output.Add(InputList[start]);
}
I can think of some very convoluted methods with loops and nested loops to solve this problem, but I'm trying to be more professional than that.
My scenario is that I need to enter a section of code every ten percent, but it isn't quite working as expected. It is entering the code at about every percent, which is due to my code, but I lack the knowledge to know how to change it.
int currentPercent = (int)Math.Truncate((current * 100M) / total);
// avoid divide by zero error
if (currentPercent > 0)
{
    if (IsDivisible(100, currentPercent))
    {
        // ....my code that works fine other than coming in too many times
    }
}
Helper referenced above where the trouble is:
private bool IsDivisible(int x, int y)
{
    return (x % y) == 0;
}
So it works exactly as written: mod eliminates a currentPercent of 3, but 1 and 2 pass, when really I don't want a true value until currentPercent = 10, and then not again until 20, etc.
Thank you, and my apologies for the elementary question.
Mod will only catch exact occurrences of your interval. Try keeping track of your next milestone instead; you'll be less likely to miss them.
const int cycles = 100;
const int interval = 10;
int nextPercent = interval;
for (int index = 0; index <= cycles; index++)
{
    int currentPercent = (index * 100) / cycles;
    if (currentPercent >= nextPercent)
    {
        // ... your code here ...
        nextPercent = currentPercent - (currentPercent % interval) + interval;
    }
}
I might be misunderstanding you, but it seems like you're making something extremely simple more complex than it needs to be. What about this?
for (int i = 1; i <= 100; i++)
{
    if (i % 10 == 0)
    {
        // Here, you can do what you want - this will happen
        // every ten iterations ("percent")
    }
}
Or, if your entire code enters from somewhere else (so no loop in this scope), the important part is the i % 10 == 0.
if (IsDivisible(100, currentPercent))
{
    // ....my code that works fine other than coming in too many times
}
Try changing that 100 to a 10. And I think your x and y are also backwards.
You can try a few sample operations using the Google calculator:
(20 mod 10) = 0
Not sure if I fully understand, but I think this is what you want. You also reversed the order of the modulo in your code (100 mod percent, rather than the other way around):
int currentPercent = current * 100 / total;
if (currentPercent % 10 == 0)
{
    // your code here, every 10%, starting at 0%
}
Note that code written this way only works properly if you are guaranteed to hit every percentage mark. If you could, say, skip from 19% to 21%, then you'll need to keep track of what the percentage was the previous time, to see whether you crossed a 10% mark.
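A small sketch of that bookkeeping (the variable names are mine, not the answerer's):
int lastBucket = -1; // persists between progress updates

// inside each progress update:
int bucket = currentPercent / 10;
if (bucket != lastBucket)
{
    lastBucket = bucket;
    // runs once per 10% band, even when individual percentages are skipped
}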
Try this:
for (int percent = 1; percent <= 100; percent++)
{
    if (percent % 10 == 0)
    {
        // code goes here
    }
}
Depending on how you increment your % value, % 10 == 0 may or may not work. For example, jumping from 89% to 91% would skip the code execution entirely. You should store the last value at which the code executed (80 in this case), then check whether you have advanced by 10 or more; that way 90 would work, and so would 91.
I need to resample big sets of data (a few hundred spectra, each containing a few thousand points) using simple linear interpolation.
I have created an interpolation method in C#, but it seems to be really slow for huge datasets.
How can I improve the performance of this code?
public static List<double> interpolate(IList<double> xItems, IList<double> yItems, IList<double> breaks)
{
    double[] interpolated = new double[breaks.Count];
    int id = 1;
    int x = 0;

    // left border case - uphold the value
    while (breaks[x] < xItems[0])
    {
        interpolated[x] = yItems[0];
        x++;
    }

    double p, w;
    for (int i = x; i < breaks.Count; i++)
    {
        while (breaks[i] > xItems[id])
        {
            id++;
            if (id > xItems.Count - 1)
            {
                id = xItems.Count - 1;
                break;
            }
        }
        System.Diagnostics.Debug.WriteLine(string.Format("i: {0}, id {1}", i, id));
        if (id <= xItems.Count - 1)
        {
            if (id == xItems.Count - 1 && breaks[i] > xItems[id])
            {
                interpolated[i] = yItems[yItems.Count - 1];
            }
            else
            {
                w = xItems[id] - xItems[id - 1];
                p = (breaks[i] - xItems[id - 1]) / w;
                interpolated[i] = yItems[id - 1] + p * (yItems[id] - yItems[id - 1]);
            }
        }
        else // right border case - uphold the value
        {
            interpolated[i] = yItems[yItems.Count - 1];
        }
    }
    return interpolated.ToList();
}
Edit
Thanks, guys, for all your responses. What I wanted when I wrote this question were some general ideas about where I could improve the performance. I wasn't expecting ready-made solutions, only ideas, and you gave me what I wanted, thanks!
Before writing this question I thought about rewriting the code in C++, but after reading the comments on Will's answer it seems the gain could be smaller than I expected. Also, the code is so simple that there are no mighty code-tricks to use here. Thanks to Petar for his attempt to optimize the code.
It seems it all comes down to finding a good profiler, checking every line and subroutine, and trying to optimize them.
Thank you again for all the responses and for taking part in this discussion!
public static List<double> Interpolate(IList<double> xItems, IList<double> yItems, IList<double> breaks)
{
    var a = xItems.ToArray();
    var b = yItems.ToArray();
    var aLimit = a.Length - 1;
    var bLimit = b.Length - 1;
    var interpolated = new double[breaks.Count];
    var total = 0;
    var initialValue = a[0];
    while (breaks[total] < initialValue)
    {
        total++;
    }
    // left border case - repeat the first y value
    for (int k = 0; k < total; k++)
    {
        interpolated[k] = b[0];
    }
    int id = 1;
    for (int i = total; i < breaks.Count; i++)
    {
        var breakValue = breaks[i];
        while (breakValue > a[id])
        {
            id++;
            if (id > aLimit)
            {
                id = aLimit;
                break;
            }
        }
        double value = b[bLimit];
        if (id <= aLimit)
        {
            var currentValue = a[id];
            var previousValue = a[id - 1];
            if (id != aLimit || breakValue <= currentValue)
            {
                var w = currentValue - previousValue;
                var p = (breakValue - previousValue) / w;
                value = b[id - 1] + p * (b[id] - b[id - 1]);
            }
        }
        interpolated[i] = value;
    }
    return interpolated.ToList();
}
I've cached some (effectively constant) values and hoisted the left-border fill out of the main loop, but I think these are micro-optimizations that the compiler already makes in Release mode. Still, you can try this version and see if it beats the original code.
Instead of
interpolated.ToList()
which copies the whole array, compute the interpolated values directly into the final list (or return the array instead), especially if the array/List is big enough to qualify for the large object heap.
Unlike the ordinary heap, the LOH is not compacted by the GC, which means that short-lived large objects are far more harmful than small ones.
Then again: 7,000 doubles are approximately 56,000 bytes, which is below the large object threshold of 85,000 bytes.
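A minimal sketch of that change (assuming the same method shape as above):
var interpolated = new List<double>(breaks.Count); // capacity known up front, one allocation
// inside the loop, append instead of writing into an array:
interpolated.Add(value);
// and at the end, return interpolated directly; no final copy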
Looks to me like you've created an O(n^2) algorithm: you search for the interval, which is O(n), and then you apply that roughly n times. You'll get a quick and cheap speed-up by taking advantage of the fact that the items are already ordered in the list: use BinarySearch(), which is O(log n).
If that's still not enough, you should be able to do something speedier with the outer loop: whatever interval you found previously should make it easier to find the next one. But that code isn't in your snippet.
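A hedged sketch of the BinarySearch idea (it assumes xItems is sorted ascending; the variable names are mine):
double[] xs = xItems.ToArray();
int idx = Array.BinarySearch(xs, breakValue);
if (idx < 0) idx = ~idx; // the bitwise complement is the index of the first element >= breakValue
// xs[idx - 1] and xs[idx] now bracket breakValue; handle the borders separately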
I'd say profile the code and see where it spends its time; then you have somewhere to focus on. ANTS is popular, but EQATEC is free, I think.
A few suggestions:
As others suggested, use a profiler to understand better where the time is spent.
The loop
while (breaks[x] < xItems[0])
could cause an exception if x grows bigger than the number of items in the breaks list. You should use something like
while (x < breaks.Count && breaks[x] < xItems[0])
But you might not need that loop at all. Why treat the first item as a special case? Just start with id = 0 and handle the first point in the for (i) loop. I understand that id might start from 0 in that case, making [id - 1] a negative index, but see if you can do something there.
If you optimize for speed, you usually sacrifice memory, and vice versa; you cannot have both unless you come up with a really clever algorithm. In this case it means calculating as much as you can outside the loops, storing those values in variables (extra memory), and using them later. For example, instead of always saying:
id = xItems.Count - 1;
You could say:
int lastXItemsIndex = xItems.Count-1;
...
id = lastXItemsIndex;
This is the same suggestion Petar Petrov made with aLimit, bLimit, and so on.
Next point: your loop (or the one Petar Petrov suggested):
while (breaks[i] > xItems[id])
{
    id++;
    if (id > xItems.Count - 1)
    {
        id = xItems.Count - 1;
        break;
    }
}
could probably be reduced to:
double currentBreak = breaks[i];
while (id <= lastXItemsIndex && currentBreak > xItems[id]) id++;
And the last point I would add: check whether there is some special property of your samples that you can exploit. For example, if xItems represent time and you are sampling at regular intervals, then
w = xItems[id] - xItems[id - 1];
is constant, and you do not have to calculate it every time in the loop.
This is probably not often the case, but maybe your problem has some other property you could use to improve performance.
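For instance, under that regular-sampling assumption (hypothetical; not guaranteed by the question's data):
double w = xItems[1] - xItems[0]; // uniform spacing: compute once, reuse in every iteration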
Another idea: maybe you do not need double precision; float is probably faster because it is smaller.
Good luck
System.Diagnostics.Debug.WriteLine(string.Format("i: {0}, id {1}", i, id));
I hope it's a Release build without DEBUG defined?
Beyond that, it may depend on what exactly those IList parameters are. It may be useful to store the Count value in a local instead of accessing the property every time.
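For example, a trivial sketch of hoisting the property access out of the loop:
int breakCount = breaks.Count; // one interface call instead of one per iteration
for (int i = x; i < breakCount; i++)
{
    // ...
}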
This is the kind of problem where you need to move over to native code.