C#'s Math class does roots and powers in double only. Various things may go a bit faster if I add float-based square-root and power functions to my Math2 class (Today is a relaxation day and I find optimization relaxing).
So - Fast square-root and power functions that I don't have to worry about licensing for, plskthx. Or a link that'll get me there.
I'm going to take it as axiomatic that no software method will compete with the hardware instruction for square roots. The only difficulty is that .NET doesn't give us direct control of the hardware as in the days of inline assembler for C code.
Let's first discuss a generic x86 hardware prospect.
The floating point x86 instruction FSQRT does come in three precisions: single, double, and extended (the native precision of the 80-bit FP registers), and there is a 25-40% shorter timing for single vs. double precision. See here for 32-bit x86 instructions.
That may sound like a big opportunity, but it's only a dozen clocks or so. That sort of economization will easily get lost in the overhead unless you are able to carefully manage the code from function call to return value. Managed C++ sounds (as Marcelo Cantos suggests) like a more practical base for this than C#.
Note: Timings for FSQRT are identical to those of FDIV, with which it shares an execution unit in the Intel architecture, and thus a common latency.
A better opportunity for specialized C# code probably exists in the direction of SSE SIMD instructions, where hardware allows for up to 4 single precision square roots to be done in parallel. JIT compiler support for this has been missing for years, but here are some leads on current development.
Intel has jumped in (Dec. 15, 2010), seeing that .NET Framework 4 wasn't doing anything with SIMD:
[Intel Performance Libraries allow... SIMD instructions in C#]
Even before that the Mono project added JIT support for SIMD in Mono 2.2:
[Mono: Release Note Mono 2.2]
The possibility of calling Mono's SIMD support from MS C# was recently raised here:
[Calling mono c# code from Microsoft .net ? -- Stackoverflow]
An earlier question also addresses (though without much love shown!) how to install Mono's SIMD support:
[how to enable Mono.Simd -- Stackoverflow]
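For readers on newer runtimes: the System.Numerics.Vector<T> API that later shipped with .NET exposes a SquareRoot operation that the RyuJIT compiler can map to the SSE/AVX hardware instruction. A minimal sketch (illustrative only; on older JITs it silently falls back to scalar code):
using System;
using System.Numerics;

static class SimdSqrt
{
    // Batched single-precision square roots, Vector<float>.Count lanes at a time
    // (4 floats with SSE, 8 with AVX).
    public static void SqrtAll(float[] input, float[] output)
    {
        int i = 0;
        int width = Vector<float>.Count;
        for (; i <= input.Length - width; i += width)
        {
            var v = new Vector<float>(input, i);      // load one lane-width chunk
            Vector.SquareRoot(v).CopyTo(output, i);   // hardware sqrt on all lanes
        }
        for (; i < input.Length; i++)                 // scalar tail
            output[i] = (float)Math.Sqrt(input[i]);
    }
}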
You should check out this link:
http://www.codecodex.com/wiki/Calculate_an_integer_square_root
It has lots of speedy algorithms in a bunch of different languages.
Ex:
// Finds the integer square root of a positive number
public static int Isqrt(int num) {
    if (0 == num) { return 0; }   // Avoid zero divide
    int n = (num / 2) + 1;        // Initial estimate, never low
    int n1 = (n + (num / n)) / 2;
    while (n1 < n) {
        n = n1;
        n1 = (n + (num / n)) / 2;
    } // end while
    return n;
} // end Isqrt()
but there are a lot more, and some of the C/C++ ones are supposed to be the fastest, or so they claim.
For the pow algorithm, I found this one HERE, along with an explanation of how to get to that algorithm, starting from simpler ones.
private double Power(double a, int b) {
    if (b < 0) {
        throw new ApplicationException("B must be a positive integer or zero");
    }
    if (b == 0) return 1;
    if (a == 0) return 0;
    if (b % 2 == 0) {
        return Power(a * a, b / 2);      // even exponent: a^b = (a*a)^(b/2)
    } else {
        return a * Power(a * a, b / 2);  // odd exponent: a^b = a * (a*a)^(b/2)
    }
}
Wikipedia has an extensive article on calculation of square roots:
http://en.wikipedia.org/wiki/Methods_of_computing_square_roots
Calculating x to the power of y is simpler:
http://www.osix.net/modules/article/?id=696
I liked this pocket calculator way of doing it:
... but I honestly have no idea whether it is fast.
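The snippet itself didn't survive the copy, but the pocket-calculator trick is presumably the identity sqrt(x) = e^(ln(x) / 2); a minimal sketch, assuming that's what was meant:
// Assumed reconstruction: sqrt(x) = exp(ln(x) / 2), valid for x > 0.
static double PocketSqrt(double x)
{
    return Math.Exp(0.5 * Math.Log(x));
}
Given the Sqrt-versus-Pow timings further down this page, I would not expect it to beat Math.Sqrt.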
Probably the easiest way is to implement the float versions in Managed C++. Whether that will go faster than the baked-in double versions or not, I can't say.
Related
I've been warned by numerous programmers not to use the square root function, and instead to raise numbers to the half power. My question is twofold:
What is the perceived/real performance benefit to doing this? Why is it faster?
If it really is faster, why does the square root function even exist?
I've performed a simple test:
Stopwatch sw = new Stopwatch();
sw.Start();
Double s = 0.0;
// compute 1e8 times either Sqrt(x) or its emulation as Pow(x, 0.5)
for (Double d = 0; d < 1e8; d += 1)
// s += Math.Sqrt(d); // <- uncomment it to test Sqrt
s += Math.Pow(d, 0.5); // <- uncomment it to test Pow
sw.Stop();
Console.Out.Write(sw.ElapsedMilliseconds);
The (averaged) outcome at my workstation (x64) is
Sqrt: 950 ms
Pow: 5500 ms
As you can see, the more specific Sqrt(x) is about 5.5 times faster than its emulation Pow(x, 0.5). So it's just one more legend (at least in C#) that Sqrt is so slow that one should prefer the Pow substitution.
You would have to know something about how each function is implemented to answer the question.
The square root function uses Newton's method to iteratively calculate the square root. It converges quadratically. Nothing will speed that up.
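To make that concrete, here is a minimal Newton iteration for the square root; this is illustrative only, since library implementations typically lean on the hardware instruction:
// Newton's method for sqrt(x): x_{n+1} = (x_n + x / x_n) / 2.
// Quadratic convergence roughly doubles the number of correct digits per step.
static double NewtonSqrt(double x)
{
    if (x < 0) return double.NaN;
    if (x == 0) return 0;
    double guess = x;                  // crude starting estimate
    for (int i = 0; i < 64; i++)
    {
        double next = 0.5 * (guess + x / guess);
        if (next == guess) break;      // converged to machine precision
        guess = next;
    }
    return guess;
}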
The other functions, exp(x) and ln(x), have implementations with their own convergence and complexity issues. For example, both can be implemented as series sums, where a certain number of terms is required to maintain sufficient accuracy.
All bets are off if those functions happen to be implemented in native code. Those might be faster than anything you'll write.
Knowing those details would let you make an informed decision. I would not take it on faith just because those programmers "know" the answer.
Unless you're doing intensive numerical work, I'd say that the choice won't affect your overall program performance. It's micro-optimization that's best avoided, unless you're doing serious large-scale scientific programming.
I'm looking for an alternative to the BigInteger type of C#, which was introduced with .NET 4.x.
The mathematical operations with this type are terribly slow; I guess this is caused by the arithmetic being done at a higher level than the primitive types - or it's badly optimized, whatever.
Int64/long/ulong and other 64-bit numbers are way too small and will overflow - I'm talking about a 64-bit integer raised to the power of a 64-bit integer.
Hopefully someone can suggest something. Thanks in advance.
Honestly, if you have extremely large numbers and need to do heavy computations with them and the BigInteger library still isn't cutting it for you, why not offload it onto an external process using whatever language or toolkit you know of that does it best? Are you truly constrained to write whatever it is you're trying to accomplish entirely in C#?
For example, you can offload to MATLAB in C#.
BigInteger is indeed very slow. One of the reasons is its immutability.
If you do a = a - b you will get a new copy of a. Normally this is fast, but with BigInteger and, say, an integer of 2048 bits, it needs to allocate an extra 256 bytes on every operation.
It should also have different multiplication algorithms depending on integer size (I assume it is not that sophisticated). What I mean is that for very large integers a different algorithm using Fourier transforms works best, while for smaller integers you break the work down into smaller multiplies (a divide-and-conquer approach). See more at http://en.wikipedia.org/wiki/Multiplication_algorithm
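As an illustration of the divide-and-conquer idea, here is a sketch of Karatsuba multiplication. It is written over System.Numerics.BigInteger purely for readability (a real implementation works on raw digit arrays, and this is not how .NET's BigInteger is actually implemented):
using System;
using System.Numerics;

static class Karatsuba
{
    // Three half-size multiplies instead of four: O(n^1.585) vs schoolbook O(n^2).
    // Assumes x, y >= 0.
    public static BigInteger Multiply(BigInteger x, BigInteger y, int cutoffBits = 512)
    {
        int n = Math.Max(BitLength(x), BitLength(y));
        if (n <= cutoffBits)
            return x * y;                               // small operands: schoolbook is fine

        int half = n / 2;
        BigInteger mask = (BigInteger.One << half) - 1;
        BigInteger xLo = x & mask, xHi = x >> half;     // split each operand in half
        BigInteger yLo = y & mask, yHi = y >> half;

        BigInteger p1 = Multiply(xHi, yHi, cutoffBits);
        BigInteger p2 = Multiply(xLo, yLo, cutoffBits);
        BigInteger p3 = Multiply(xHi + xLo, yHi + yLo, cutoffBits);

        // x * y = p1 * 2^(2*half) + (p3 - p1 - p2) * 2^half + p2
        return (p1 << (2 * half)) + ((p3 - p1 - p2) << half) + p2;
    }

    static int BitLength(BigInteger v)
        => v.IsZero ? 0 : v.ToByteArray().Length * 8;   // rough, but good enough here
}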
Either way there are alternatives, none of which I have used or tested. They might be slower than the .NET internals for all I know. (Making a test case and doing some valid testing is your friend.)
Google 'C# large integer multiplication' for a lot of homemade BigInteger implementations (usually from pre-C# 4.0, before BigInteger was introduced):
https://github.com/devoyster/IntXLib
http://gmplib.org/ (there are C# wrappers)
http://www.extremeoptimization.com/ (commercial)
http://mathnetnumerics.codeplex.com/ (nice opensource, but not much onboard for very large integers)
public static int PowerBySquaring(int baseNumber, int exponent)
{
    // Exponentiation by squaring: O(log exponent) multiplies.
    // Assumes exponent >= 0; note that int overflows quickly for real inputs.
    int result = 1;
    while (exponent != 0)
    {
        if ((exponent & 1) == 1)
        {
            result *= baseNumber;
        }
        exponent >>= 1;
        baseNumber *= baseNumber;
    }
    return result;
}
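Since the question was about a 64-bit integer raised to a 64-bit integer power, here is the same square-and-multiply loop over System.Numerics.BigInteger, as a sketch (for int-sized exponents the built-in BigInteger.Pow does the same job):
using System.Numerics;

// Exact result; note it has roughly exponent * log2(baseNumber) bits,
// so truly huge exponents will exhaust memory no matter the library.
static BigInteger PowerBySquaring(BigInteger baseNumber, ulong exponent)
{
    BigInteger result = BigInteger.One;
    while (exponent != 0)
    {
        if ((exponent & 1) == 1)
            result *= baseNumber;
        exponent >>= 1;
        baseNumber *= baseNumber;
    }
    return result;
}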
Does anyone know if the multiply operator is faster than using the Math.Pow method? Like:
n * n * n
vs
Math.Pow ( n, 3 )
I just reinstalled Windows, so Visual Studio is not installed and the code is ugly:
using System;
using System.Diagnostics;

public static class test {
    public static void Main(string[] args) {
        MyTest();
        PowTest();
    }

    static void PowTest() {
        var sw = Stopwatch.StartNew();
        double res = 0;
        for (int i = 0; i < 333333333; i++) {
            res = Math.Pow(i, 30);
        }
        Console.WriteLine("Math.Pow: " + sw.ElapsedMilliseconds + " ms: " + res);
    }

    static void MyTest() {
        var sw = Stopwatch.StartNew();
        double res = 0;
        for (int i = 0; i < 333333333; i++) {
            res = MyPow(i, 30);
        }
        Console.WriteLine("MyPow: " + sw.ElapsedMilliseconds + " ms: " + res);
    }

    static double MyPow(double num, int exp) {
        double result = 1.0;
        while (exp > 0) {
            if (exp % 2 == 1)
                result *= num;
            exp >>= 1;
            num *= num;
        }
        return result;
    }
}
The results:
csc /o test.cs
test.exe
MyPow: 6224 ms: 4.8569351667866E+255
Math.Pow: 43350 ms: 4.8569351667866E+255
Exponentiation by squaring (see https://stackoverflow.com/questions/101439/the-most-efficient-way-to-implement-an-integer-based-power-function-powint-int) is much faster than Math.Pow in my test (my CPU is a Pentium T3200 at 2 GHz).
EDIT: .NET version is 3.5 SP1, OS is Vista SP1 and power plan is high performance.
Basically, you should benchmark to see.
Educated Guesswork (unreliable):
In case it's not optimized to the same thing by some compiler: x * x * x is very likely to be faster than Math.Pow(x, 3), because Math.Pow has to handle the general case, dealing with fractional powers and other issues, while x * x * x takes just a couple of multiply instructions.
A few rules of thumb from 10+ years of optimization in image processing & scientific computing:
Optimizations at an algorithmic level beat any amount of optimization at a low level. Despite the "write the obvious, then optimize" conventional wisdom, this must be done at the start, not after.
Hand coded math operations (especially SIMD SSE+ types) will generally outperform the fully error checked, generalized inbuilt ones.
Any operation where the compiler knows beforehand what needs to be done is optimized by the compiler. These include:
1. Memory operations such as Array.Copy()
2. For loops over arrays where the array length is given, as in for (..; i < array.Length; ..) - see the sketch below.
Always set unrealistic goals (if you want to).
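A sketch of item 2 above, assuming the usual JIT behavior:
// When the loop bound is the array's own Length property, the JIT can prove
// the index is always in range and elide the per-element bounds check.
static double Sum(double[] data)
{
    double sum = 0;
    for (int i = 0; i < data.Length; i++)   // bounds check eliminated
        sum += data[i];
    return sum;
}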
I just happened to have tested this yesterday, then saw your question now.
On my machine, a Core 2 Duo running 1 test thread, it is faster to use multiplication up to an exponent of 9. At an exponent of 10, Math.Pow(b, e) is faster.
However, even at an exponent of 2, the results are often not identical. There are rounding errors.
Some algorithms are highly sensitive to rounding errors. I had to literally run over a million random tests until I discovered this.
This is so micro that you should probably benchmark it for specific platforms; I don't think the results for a Pentium Pro will necessarily be the same as for an ARM or a Pentium II.
All in all, it's most likely to be totally irrelevant.
I checked, and Math.Pow() is defined to take two doubles. This means that it can't do repeated multiplications, but has to use a more general approach. If there were a Math.Pow(double, int), it could probably be more efficient.
That being said, the performance difference is almost certainly absolutely trivial, and so you should use whichever is clearer. Micro-optimizations like this are almost always pointless, can be introduced at virtually any time, and should be left for the end of the development process. At that point, you can check if the software is too slow, where the hot spots are, and spend your micro-optimization effort where it will actually make a difference.
Let's use the convention x^n. Let's assume n is always an integer.
For small values of n, boring multiplication will be faster, because Math.Pow (likely, implementation dependent) uses fancy algorithms to allow for n to be non-integral and/or negative.
For large values of n, Math.Pow will likely be faster, but if your library isn't very smart it will use the same algorithm, which is not ideal if you know that n is always an integer. For that you could code up an implementation of exponentiation by squaring or some other fancy algorithm.
Of course modern computers are very fast and you should probably stick to the simplest, easiest to read, least likely to be buggy method until you benchmark your program and are sure that you will get a significant speedup by using a different algorithm.
Math.Pow(x, y) is typically calculated internally as Math.Exp(Math.Log(x) * y). Every power equation requires finding a natural log, a multiplication, and raising e to a power.
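A quick way to see that identity in action (assuming x > 0):
double x = 123.456, y = 7.89;
Console.WriteLine(Math.Pow(x, y));             // direct
Console.WriteLine(Math.Exp(Math.Log(x) * y));  // same value, up to rounding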
As I mentioned in my previous answer, only at a power of 10 does Math.Pow() become faster, but accuracy will be compromised if using a series of multiplications.
I disagree that hand-built functions are always faster. The built-in cosine functions are way faster and more accurate than anything I could write. As for pow(), I did a quick test to see how slow Math.pow() was in JavaScript, because Mehrdad cautioned against guesswork:
var x = 1; // accumulator (missing in the original snippet)
for (var i3 = 0; i3 < 50000; ++i3) {
    for (var n = 0; n < 9000; n++) {
        x = x * Math.cos(i3);
    }
}
Here are the results:
Each function was run 50000 times
time for 50000 Math.cos(i) calls = 8 ms
time for 50000 Math.pow(Math.cos(i),9000) calls = 21 ms
time for 50000 Math.pow(Math.cos(i),9000000) calls = 16 ms
time for 50000 homemade for loop calls 1065 ms
If you don't agree, try the program at http://www.m0ose.com/javascripts/speedtests/powSpeedTest.html
I am porting an existing application to C# and want to improve performance wherever possible. Many existing loop counters and array references are defined as System.UInt32, instead of the Int32 I would have used.
Is there any significant performance difference for using UInt32 vs Int32?
The short answer is "No. Any performance impact will be negligible".
The correct answer is "It depends."
A better question is, "Should I use uint when I'm certain I don't need a sign?"
The reason you cannot give a definitive "yes" or "no" with regards to performance is because the target platform will ultimately determine performance. That is, the performance is dictated by whatever processor is going to be executing the code, and the instructions available. Your .NET code compiles down to Intermediate Language (IL or Bytecode). These instructions are then compiled to the target platform by the Just-In-Time (JIT) compiler as part of the Common Language Runtime (CLR). You can't control or predict what code will be generated for every user.
So knowing that the hardware is the final arbiter of performance, the question becomes, "How different is the code .NET generates for a signed versus unsigned integer?" and "Does the difference impact my application and my target platforms?"
The best way to answer these questions is to run a test.
class Program
{
    static void Main(string[] args)
    {
        const int iterations = 100;
        Console.WriteLine($"Signed: {Iterate(TestSigned, iterations)}");
        Console.WriteLine($"Unsigned: {Iterate(TestUnsigned, iterations)}");
        Console.Read();
    }

    private static void TestUnsigned()
    {
        uint accumulator = 0;
        var max = (uint)Int32.MaxValue;
        for (uint i = 0; i < max; i++) ++accumulator;
    }

    static void TestSigned()
    {
        int accumulator = 0;
        var max = Int32.MaxValue;
        for (int i = 0; i < max; i++) ++accumulator;
    }

    static TimeSpan Iterate(Action action, int count)
    {
        var elapsed = TimeSpan.Zero;
        for (int i = 0; i < count; i++)
            elapsed += Time(action);
        return new TimeSpan(elapsed.Ticks / count);
    }

    static TimeSpan Time(Action action)
    {
        var sw = new Stopwatch();
        sw.Start();
        action();
        sw.Stop();
        return sw.Elapsed;
    }
}
The two test methods, TestSigned and TestUnsigned, each perform ~2 billion iterations of a simple increment on a signed and unsigned integer, respectively. The test code runs 100 iterations of each test and averages the results. This should weed out any potential inconsistencies. The results on my i7-5960X compiled for x64 were:
Signed: 00:00:00.5066966
Unsigned: 00:00:00.5052279
These results are nearly identical, but to get a definitive answer, we really need to look at the bytecode generated for the program. We can use ILDASM as part of the .NET SDK to inspect the code in the assembly generated by the compiler.
Here, we can see that the C# compiler favors signed integers and actually performs most operations natively as signed integers and only ever treats the value in-memory as unsigned when comparing for the branch (a.k.a jump or if). Despite the fact that we're using an unsigned integer for both the iterator AND the accumulator in TestUnsigned, the code is nearly identical to the TestSigned method except for a single instruction: IL_0016. A quick glance at the ECMA spec describes the difference:
blt.un.s :
Branch to target if less than (unsigned or unordered), short form.
blt.s :
Branch to target if less than, short form.
Being such a common instruction, it's safe to assume that most modern high-power processors will have hardware instructions for both operations and they'll very likely execute in the same number of cycles, but this is not guaranteed. A low-power processor may have fewer instructions and not have a branch for unsigned int. In this case, the JIT compiler may have to emit multiple hardware instructions (A conversion first, then a branch, for instance) to execute the blt.un.s IL instruction. Even if this is the case, these additional instructions would be basic and probably wouldn't impact the performance significantly.
So in terms of performance, the long answer is "It is unlikely that there will be a performance difference at all between using a signed or an unsigned integer. If there is a difference, it is likely to be negligible."
So then if the performance is identical, the next logical question is, "Should I use an unsigned value when I'm certain I don't need a sign?"
There are two things to consider here: first, unsigned integers are NOT CLS-compliant, meaning that you may run into issues if you're exposing an unsigned integer as part of an API that another program will consume (such as if you're distributing a reusable library). Second, most operations in .NET, including the method signatures exposed by the BCL (for the reason above), use a signed integer. So if you plan on actually using your unsigned integer, you'll likely find yourself casting it quite a bit. This is going to have a very small performance hit and will make your code a little messier. In the end, it's probably not worth it.
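To illustrate the CLS point (a minimal sketch; the exact warning text depends on your compiler version):
using System;

[assembly: CLSCompliant(true)]

public class Counter
{
    public uint Count;       // compiler warns: type is not CLS-compliant
    public int SafeCount;    // consumable from any CLS language
}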
TLDR; back in my C++ days, I'd say "Use whatever is most appropriate and let the compiler sort the rest out." C# is not quite as cut-and-dry, so I would say this for .NET: There's really no performance difference between a signed and unsigned integer on x86/x64, but most operations require a signed integer, so unless you really NEED to restrict the values to positive ONLY or you really NEED the extra range that the sign bit eats, stick with a signed integer. Your code will be cleaner in the end.
I don't think there are any performance considerations, other than a possible difference between signed and unsigned arithmetic at the processor level, but at that point I think the differences are moot.
The bigger difference is in CLS compliance: the unsigned types are not CLS-compliant, because not all languages support them.
I haven't done any research on the matter in .NET, but in the olden days of Win32/C++, if you wanted to cast a "signed int" to a "signed long", the CPU had to run an op to extend the sign. To cast an "unsigned int" to an "unsigned long", it just stuffed zeros into the upper bytes. The savings were on the order of a couple of clock cycles (i.e., you'd have to do it billions of times to see even a perceivable difference).
There is no difference, performance-wise. Simple integer calculations are well understood, and modern CPUs are highly optimized to perform them quickly.
These types of optimizations are rarely worth the effort. Use the data type that is most appropriate for the task and leave it at that. If this thing so much as touches a database you could probably find a dozen tweaks in the DB design, query syntax or indexing strategy that would offset a code optimization in C# by a few hundred orders of magnitude.
It's going to allocate the same amount of memory either way (although one can store a larger value, as it's not reserving space for the sign). So I doubt you'll see a 'performance' difference, unless you use large or negative values that will cause one option or the other to explode.
This isn't really to do with performance, but rather with the requirements for the loop counter.
Perhaps there were lots of iterations to complete:
Console.WriteLine(Int32.MaxValue);  // Max iterations 2147483647
Console.WriteLine(UInt32.MaxValue); // Max iterations 4294967295
The unsigned int may be there for a reason.
I've never empathized with the use of int in loops like for (int i = 0; i < bla; i++), and oftentimes I would also like to use unsigned just to avoid checking the range. Unfortunately (both in C++ and, for similar reasons, in C#), the recommendation is not to use unsigned to gain one more bit or to ensure non-negativity:
"Using an unsigned instead of an int to gain one more bit to represent positive integers is almost never a good idea. Attempts to ensure that some values are positive by declaring variables unsigned will typically be defeated by the implicit conversion rules"
page 73 from "The C++ Programming Language" by the language's creator Bjarne Stroustrup.
My understanding (I apologize for not having the source at hand) is that hardware makers also have a bias toward optimizing for the signed integer types.
Nonetheless, it would be interesting to do the same exercise that @Robear did above, but using a signed integer with some positivity assert versus unsigned.
I actually have an answer to my question, but it is not parallelized, so I am interested in ways to improve the algorithm. Anyway, it might be useful as-is for some people.
int Until = 20000000;
BitArray PrimeBits = new BitArray(Until, true);

/*
 * Sieve of Eratosthenes
 * PrimeBits is a simple BitArray where each bit represents an integer,
 * and we mark composite numbers as false
 */
PrimeBits.Set(0, false); // You don't actually need these two, just
PrimeBits.Set(1, false); // reminding you that 2 is the smallest prime

for (int P = 2; P < (int)Math.Sqrt(Until) + 1; P++)
    if (PrimeBits.Get(P))
        // These are going to be the multiples of P if it is a prime
        for (int PMultiply = P * 2; PMultiply < Until; PMultiply += P)
            PrimeBits.Set(PMultiply, false);

// We use this to store the actual prime numbers
List<int> Primes = new List<int>();
for (int i = 2; i < Until; i++)
    if (PrimeBits.Get(i))
        Primes.Add(i);
Maybe I could use multiple BitArrays and BitArray.And() them together?
You might save some time by cross-referencing your bit array with a doubly-linked list, so you can more quickly advance to the next prime.
Also, in eliminating later composites once you hit a new prime p for the first time - the first composite multiple of p remaining will be p*p, since everything before that has already been eliminated. In fact, you only need to multiply p by all the remaining potential primes that are left after it in the list, stopping as soon as your product is out of range (larger than Until).
There are also some good probabilistic algorithms out there, such as the Miller-Rabin test. The wikipedia page is a good introduction.
Parallelisation aside, you don't want to be calculating sqrt(Until) on every iteration. You can also assume multiples of 2, 3 and 5 are composite, and only check N % 6 in {1, 5} or N % 30 in {1, 7, 11, 13, 17, 19, 23, 29}.
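Putting the two cheap fixes together - hoisting the square root out of the loop and, per the earlier answer, starting each crossing-out pass at P * P - the sieve loop becomes, as a sketch:
int limit = (int)Math.Sqrt(Until) + 1;   // computed once, not every iteration
for (int P = 2; P < limit; P++)
    if (PrimeBits.Get(P))
        // Smaller multiples of P were already crossed out by smaller primes
        for (int PMultiply = P * P; PMultiply < Until; PMultiply += P)
            PrimeBits.Set(PMultiply, false);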
You should be able to parallelize the factoring algorithm quite easily, since the Nth stage only depends on the sqrt(n)th result, so after a while there won't be any conflicts. But that's not a good algorithm, since it requires lots of division.
You should also be able to parallelize the sieve algorithms, if you have writer work packets which are guaranteed to complete before a read. Mostly the writers shouldn't conflict with the reader - at least once you've done a few entries, they should be working at least N above the reader, so you only need a synchronized read fairly occasionally (when N exceeds the last synchronized read value). You shouldn't need to synchronize the bool array across any number of writer threads, since write conflicts don't arise (at worst, more than one thread will write a true to the same place).
The main issue would be to ensure that any worker being waited on to write has completed. In C++ you'd use a compare-and-set to switch to the worker which is being waited for at any point. I'm not a C# wonk so don't know how to do it that language, but the Win32 InterlockedCompareExchange function should be available.
You also might try an actor-based approach, since that way you can schedule the actors working with the lowest values, which may make it easier to guarantee that you're reading valid parts of the sieve without having to lock the bus on each increment of N.
Either way, you have to ensure that all workers have got above entry N before you read it, and the cost of doing that is where the trade-off between parallel and serial is made.
Without profiling we cannot tell which bit of the program needs optimizing.
In a large system, you would use a profiler to find that the prime number generator is the part that needs optimizing.
Profiling a loop with a dozen or so instructions in it is not usually worthwhile - the overhead of the profiler is significant compared to the loop body, and about the only way to improve a loop that small is to change the algorithm to do fewer iterations. So IME, once you've eliminated any expensive functions and have a known target of a few lines of simple code, you're better off changing the algorithm and timing an end-to-end run than trying to improve the code by instruction-level profiling.
@DrPizza Profiling only really helps improve an implementation; it doesn't reveal opportunities for parallel execution or suggest better algorithms (unless you have experience to the contrary, in which case I'd really like to see your profiler).
I only have single-core machines at home, but I ran a Java equivalent of your BitArray sieve and a single-threaded inversion of the sieve - holding the marking primes in an array, using a wheel to reduce the search space by a factor of five, then marking a bit array in increments of the wheel using each marking prime. It also reduces storage to O(sqrt(N)) instead of O(N), which helps in terms of the largest N, paging, and bandwidth.
For medium values of N (1e8 to 1e12), the primes up to sqrt(N) can be found quite quickly, and after that you should be able to parallelise the subsequent search on the CPU quite easily. On my single core machine, the wheel approach finds primes up to 1e9 in 28s, whereas your sieve (after moving the sqrt out of the loop) takes 86s - the improvement is due to the wheel; the inversion means you can handle N larger than 2^32 but makes it slower. Code can be found here. You could parallelise the output of the results from the naive sieve after you go past sqrt(N) too, as the bit array is not modified after that point; but once you are dealing with N large enough for it to matter the array size is too big for ints.
You also should consider a possible change of algorithms.
Consider that it may be cheaper to simply add the elements to your list as you find them.
Perhaps preallocating space for your list will make it cheaper to build/populate.
Are you trying to find new primes? This may sound stupid, but you might be able to load up some sort of data structure with known primes. I am sure someone out there has a list. It might be a much easier problem to look up existing numbers than to calculate new ones.
You might also look at Microsoft's Parallel FX Library for making your existing code multi-threaded to take advantage of multi-core systems. With minimal code changes you can make your for loops multi-threaded.
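A sketch of what that might look like for the sieve, assuming a bool[] instead of a BitArray (BitArray packs 32 flags into each int, so unsynchronized parallel writers could clobber each other's bits; writes to distinct bool elements are safe, and overlapping writes all store the same value, true, so no locking is needed):
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class ParallelSieve
{
    public static List<int> Primes(int until)
    {
        var composite = new bool[until];
        int limit = (int)Math.Sqrt(until - 1);

        // Phase 1 (serial): find the sieving primes up to sqrt(until).
        var sievingPrimes = new List<int>();
        for (int p = 2; p <= limit; p++)
        {
            if (composite[p]) continue;
            sievingPrimes.Add(p);
            for (int m = p * p; m <= limit; m += p)
                composite[m] = true;
        }

        // Phase 2 (parallel): each sieving prime marks its multiples independently.
        Parallel.ForEach(sievingPrimes, p =>
        {
            for (long m = (long)p * p; m < until; m += p)
                composite[m] = true;
        });

        // Collect the survivors.
        var primes = new List<int>();
        for (int i = 2; i < until; i++)
            if (!composite[i]) primes.Add(i);
        return primes;
    }
}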
There's a very good article about the Sieve of Eratosthenes: The Genuine Sieve of Eratosthenes
It's in a functional setting, but most of the optimizations also apply to a procedural implementation in C#.
The two most important optimizations are to start crossing out at P^2 instead of 2*P and to use a wheel for the next prime numbers.
For concurrency, you can process all numbers till P^2 in parallel to P without doing any unnecessary work.
void PrimeNumber(long number)
{
    // Numbers below 2 are not prime; 2 itself is the only even prime
    if (number < 2)
    {
        MessageBox.Show("No It is not a Prime Number");
        return;
    }
    if (number == 2)
    {
        MessageBox.Show("Yes Prime Number");
        return;
    }
    bool IsprimeNumber = true;
    long value = (long)Math.Sqrt(number);
    if (number % 2 == 0)
    {
        MessageBox.Show("No It is not a Prime Number");
        return;
    }
    // Only odd divisors up to sqrt(number) need to be checked
    for (long i = 3; i <= value; i = i + 2)
    {
        if (number % i == 0)
        {
            MessageBox.Show("It is divisible by " + i);
            IsprimeNumber = false;
            break;
        }
    }
    if (IsprimeNumber)
    {
        MessageBox.Show("Yes Prime Number");
    }
    else
    {
        MessageBox.Show("No It is not a Prime Number");
    }
}