I have some questions related to coding resource usage:
Q1 :
A:
for (int i = 0; i < 15; i++)
{
FunA(i);
FunB(i);
}
Or
B:
for (int i = 0; i < 15; i++)
{
FunA(i);
}
for (int i = 0; i < 15 ; i++)
{
FunB(i);
}
Does B take more resources than A because there are two loops? And why?
Q2:
A:
FunA(10*2-15+15/X);
FunA(10*2-15+15/X);
FunA(10*2-15+15/X);
B:
int result=10*2-15+15/X;
FunA(result);
FunA(result);
FunA(result);
Does A take more resources because it has to calculate the result every time, or does B take more? Does the compiler understand that the results are the same and make a variable out of them?
Use a profiler.
For your second question, the C# compiler is going to optimize compile-time constants such as 10*2-15+15.
Your second B example is better code despite the compiler potentially optimizing the A example for you.
Performance aside, the second A example is just bad coding practice: unnecessary duplicated constants mean more chance for user error, especially if you change this code later on.
That said, a good rule of thumb is: don't rely on compiler optimizations. Try to make the code readable, and for things that actually are constant you should literally define them as const:
const int WHATEVER = 10 * 2 - 15 + 15;
int result = WHATEVER / X;
FunA(result);
FunA(result);
FunA(result);
Finally, here's the obligatory mention that profiling is almost always more accurate than rolling your own benchmarks (which I see written incorrectly more often than not). Visual Studio 2017 has a built-in profiler, or you can use a variety of other ones out there.
When in doubt about performance, don't guess. Get actual metrics.
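If you do want a quick sanity check before reaching for a full profiler, a minimal Stopwatch comparison of the two loop shapes from Q1 might look like this. FunA and FunB here are trivial stand-ins I made up; with bodies this small, most of what you measure is loop overhead and jitter, so treat the numbers as rough indications only:

```csharp
using System;
using System.Diagnostics;

class LoopBench
{
    static long _sink; // accumulator so the JIT can't eliminate the work

    static void FunA(int i) => _sink += i;
    static void FunB(int i) => _sink += 2 * i;

    public static long RunFused(int n)   // variant A: one loop
    {
        _sink = 0;
        for (int i = 0; i < n; i++) { FunA(i); FunB(i); }
        return _sink;
    }

    public static long RunSplit(int n)   // variant B: two loops
    {
        _sink = 0;
        for (int i = 0; i < n; i++) FunA(i);
        for (int i = 0; i < n; i++) FunB(i);
        return _sink;
    }

    static void Main()
    {
        const int N = 10_000_000;

        var sw = Stopwatch.StartNew();
        RunFused(N);
        sw.Stop();
        Console.WriteLine($"Fused: {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        RunSplit(N);
        sw.Stop();
        Console.WriteLine($"Split: {sw.ElapsedMilliseconds} ms");
    }
}
```

Both variants do the same total work; the split version just pays the loop bookkeeping twice, which typically only matters when the loop bodies are this trivial.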
I have a newbie question guys.
Let's suppose that I have the following simple nested loop, where m and n are not necessarily the same, but are really big numbers:
x = 0;
for (i=0; i<m; i++)
{
for (j=0; j<n; j++)
{
delta = CalculateDelta(i,j);
x = x + j + i + delta;
}
}
And now I have this:
x = 0;
for (i=0; i<m; i++)
{
for (j=0; j<n; j++)
{
delta = CalculateDelta(i,j);
x = x + j + i + delta;
j++;
delta = CalculateDelta(i,j);
x = x + j + i + delta;
}
}
Rule: I do need to go through all the elements of the loop, because of this delta calculation.
My questions are:
1) Is the second algorithm faster than the first one, or is it the same?
I have this doubt because, to me, the first algorithm has a complexity of O(m * n) and the second one is O(m * n/2). Or does lower complexity not necessarily make it faster?
2) Is there any other way to make this faster without something like a Parallel.For?
3) If I make use of a Parallel.For, would it really make it faster, since I would probably need to do a synchronization lock on the x variable?
Thanks!
Definitely not. Since the time complexity is presumably dominated by the number of times CalculateDelta() is called, it doesn't matter whether you make the calls inline, within a single loop, or in any number of nested loops; the call gets made m*n times either way.
And now you have a bug (which is the reason I decided to add an answer after #Peter-Duniho had already done so quite comprehensively):
If n is odd, you do more iterations than intended, almost certainly getting the wrong answer or crashing your program.
Asking three questions in a single post is pushing the limits of "too broad". But in your case, the questions are reasonably simple, so…
1) Is the second algorithm faster than the first one, or is it the same? I have this doubt because for me the first algorithm has a complexity of O(m * n) and the second one is O(m * n/2). Or does lower complexity not necessarily make it faster?
Complexity ignores coefficients like 1/2. So there's no difference between O(m * n) and O(m * n/2). The latter should have been reduced to O(m * n), which is obviously the same as the former.
And the second isn't really O(m * n/2) anyway, because you didn't really remove work. You just partially unrolled your loop. These kinds of meaningless transformations are one of the reasons we ignore coefficients in big-O notation in the first place. It's too easy to fiddle with the coefficients without really changing the actual computational work.
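You can see this concretely by counting calls, with a toy stand-in for CalculateDelta and an even n so the unrolled version is well-defined: both loop shapes invoke the method exactly m * n times.

```csharp
using System;

class CallCount
{
    static int _calls;

    // Toy stand-in for the real CalculateDelta; it only counts invocations.
    static int CalculateDelta(int i, int j) { _calls++; return i - j; }

    public static int CountSingle(int m, int n)
    {
        _calls = 0;
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                CalculateDelta(i, j);
        return _calls;
    }

    public static int CountUnrolled(int m, int n) // n assumed even
    {
        _calls = 0;
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
            {
                CalculateDelta(i, j);
                j++;
                CalculateDelta(i, j);
            }
        return _calls;
    }

    static void Main()
    {
        Console.WriteLine(CountSingle(100, 50));   // 5000
        Console.WriteLine(CountUnrolled(100, 50)); // still 5000: same amount of work
    }
}
```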
2) Is there any other way to make this faster without something like a Parallel.For?
That's definitely too broad a question. "Any other way"? Probably. But you haven't provided enough context.
The only obvious potential improvement I can see in the code you posted is that you are computing j + i repeatedly, when you could instead just observe that that component of the whole computation increases by 1 with each iteration, so you could keep a separate incrementing variable instead of adding i and j each time. But a) it's far from clear that making that change would speed anything up (whether it would depends a lot on the specifics of the CPU's own optimization logic), and b) if it did so reliably, it's possible that a good optimizing JIT compiler would make that transformation to the code for you.
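A sketch of that transformation, with a made-up constant CalculateDelta purely to show the shape of the change (whether it actually helps is exactly the kind of thing only measurement can answer):

```csharp
using System;

class RunningSum
{
    static double CalculateDelta(int i, int j) => 0.25; // toy stand-in

    // The original form: recompute i + j every iteration.
    public static double SumDirect(int m, int n)
    {
        double x = 0;
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                x += i + j + CalculateDelta(i, j);
        return x;
    }

    // The transformed form: carry a running value equal to i + j.
    public static double SumWithCounter(int m, int n)
    {
        double x = 0;
        for (int i = 0; i < m; i++)
        {
            int sum = i;            // equals i + j when j == 0
            for (int j = 0; j < n; j++)
            {
                x += sum + CalculateDelta(i, j);
                sum++;              // tracks i + j as j advances
            }
        }
        return x;
    }

    static void Main()
    {
        Console.WriteLine(SumDirect(4, 3));      // 33
        Console.WriteLine(SumWithCounter(4, 3)); // 33
    }
}
```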
But beyond that, the CalculateDelta() method is a complete unknown here. It could be a simple one-liner that the compiler inlines for you, or it could be some enormous computation that dominates the whole loop.
There's no way for any of us to tell you if there is "any other way" to make the loop faster. For that matter, it's not even clear that the change you made makes the loop faster. Maybe it did, maybe it didn't.
3) If I make use of a Parallel.For, would it really make it faster since I would probably need to do a synchronization lock on the x variable?
That at least depends on what CalculateDelta() is doing. If it's expensive enough, then the synchronization on x might not matter. The bigger issue is that each calculation of x depends on the previous one. It's impossible to parallelize the computation, because it's inherently a serial computation.
What you could do is compute all the deltas in parallel, since they don't depend on x (at least, they don't in the code you posted). The other element of the sum is a fixed total for known m and n (the combined contribution of the i + j terms), so in the end it's just the sum of the deltas plus that constant. Again, whether this is worth doing depends somewhat on how costly CalculateDelta() is: the less costly that method is, the less likely you're going to see much if any improvement by parallelizing its execution.
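As a sketch of that idea, assuming CalculateDelta is pure (the version below is a placeholder I invented): Parallel.For with thread-local partial sums avoids locking on every addition, and the i + j contribution is added back as a closed-form constant at the end.

```csharp
using System;
using System.Threading.Tasks;

class ParallelDeltas
{
    // Hypothetical stand-in for the real CalculateDelta; assumed pure.
    static double CalculateDelta(int i, int j) => Math.Sin(i) * Math.Cos(j);

    public static double SumDeltasSequential(int m, int n)
    {
        double total = 0;
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                total += CalculateDelta(i, j);
        return total;
    }

    public static double SumDeltasParallel(int m, int n)
    {
        double total = 0;
        object gate = new object();

        Parallel.For(0, m,
            () => 0.0,                               // thread-local partial sum
            (i, state, local) =>
            {
                for (int j = 0; j < n; j++)
                    local += CalculateDelta(i, j);
                return local;
            },
            local => { lock (gate) total += local; } // one lock per thread, not per element
        );
        return total;
    }

    static void Main()
    {
        int m = 500, n = 500;
        // Closed-form contribution of the i + j terms plus the parallel delta sum.
        double x = (double)n * m * (n + m - 2) / 2 + SumDeltasParallel(m, n);
        Console.WriteLine(x);
    }
}
```

Note that floating-point addition order differs between runs, so parallel totals can vary in the last few bits; the comparison below uses a tolerance for that reason.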
One advantageous transformation is to extract the arithmetic sum of the contributions of the i and j terms using the double arithmetic series formula. This saves quite a bit of work, reducing the complexity of that portion of the calculation to O(1) from O(m*n).
x = 0;
for (i=0; i<m; i++)
{
for (j=0; j<n; j++)
{
delta = CalculateDelta(i,j);
x = x + j + i + delta;
}
}
can become
x = n * m * (n + m - 2) / 2;
for (i=0; i<m; i++)
{
for (j=0; j<n; j++)
{
x += CalculateDelta(i,j);
}
}
Optimizing further depends entirely on what CalculateDelta does, which you have not disclosed. If it has side effects, then that's a problem. But if it's a pure function (where its result is dependent only on the inputs i and j) then there's a good chance it can be computed directly as well.
The first for() will send you into the second for(); the second for() will loop until j < n. So the first version runs the inner body m*n times and the second m*n/2 times (each pass doing two steps).
I am trying to implement my own multi-layer perceptron; unfortunately I have made some mistake I can't find. Link to full program is here (it is a light, simple C# console application). I am learning from this book; the code I am trying to rewrite from batch to sequential form is at this github.
Link to my project is here (github).
The perceptron itself is here.
My test inputs are the XOR function, the AND function, the OR function, and some random function with a little noise.
My questions are:
1)
Before I covered all my code with infinity checks (for double overflow), all my results (and weights) very quickly (100+ iterations) converged to some super high values, and the results became NaN. After adding the checks I just got double.MaxValue. The interesting part is that if I run the same program about 5 times, I will get correct results (depending on the number of iterations). The only random variables there are the weights, which are initialized using random numbers (in the range -1/sqrt(n) < x < 1/sqrt(n), where n is the number of neurons in the hidden layer). What might be the cause of this?
2)
I am training and validating on the same data set (because it does not matter for now), and because it is a sequential algorithm I am shuffling training inputs and targets INSIDE my class.
public void Train(int iterations, double eta)
{
_lastHiddenUpdates = new double[_hiddenWeights.RowLength(), _hiddenWeights.ColumnLength() + 1];
_lastOutputUpdates = new double[_outputWeights.Length];
for (int i = 0; i < iterations; i++)
{
ShuffleRows(); // <---- ShuffleRows is a private method without any ref parameter!
this._currentIteration = i;
var result = ForwardPhase(_trainingInput);
BackwardsPhase(result.OutputResult, result.HiddenValues, eta);
}
}
This is inside the MultiLayerPerceptron class. The thing is that after training, the original double[] array is also shuffled! double is a struct, and structs are passed by value, not by reference, and the original array is in program.cs. Why is it changed outside of that scope? Am I missing something? For now I am just cloning the target arrays.
3)
This is super ugly
var infinity = deltasHs[i, j];
if (double.IsNegativeInfinity(infinity))
{
deltasHs[i, j] = double.MinValue;
}
else if (double.IsPositiveInfinity(infinity))
{
deltasHs[i, j] = double.MaxValue;
}
How can I simplify this?
Note: while writing this program I was not paying attention to performance; sometimes I loop many times through one array just to keep readability at a reasonable level.
I also know that you should not train and validate on the same data set, but that is not my goal here; I will be perfectly happy if my perceptron learns the noise as well. I just want this stupid goose to work (and to understand it).
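One way to simplify the infinity clamp, assuming a framework where Math.Clamp exists (.NET Core 2.0+ / .NET Standard 2.1+; on older frameworks a small if/else helper does the same), is a one-line helper. It behaves like the original chain: infinities get pulled back into range, finite values and NaN pass through unchanged.

```csharp
using System;

class ClampDemo
{
    // Replaces the if/else-if chain: infinities are pulled back into range,
    // finite values (and NaN) pass through unchanged.
    public static double ClampInfinity(double value) =>
        Math.Clamp(value, double.MinValue, double.MaxValue);

    static void Main()
    {
        Console.WriteLine(ClampInfinity(double.PositiveInfinity) == double.MaxValue); // True
        Console.WriteLine(ClampInfinity(double.NegativeInfinity) == double.MinValue); // True
        Console.WriteLine(ClampInfinity(1.5)); // 1.5
    }
}
```

Applied to the original code it is a single line: deltasHs[i, j] = ClampInfinity(deltasHs[i, j]);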
I'm refactoring my app to make it faster. I was looking for tips on doing so, and found this statement:
"ForEach can simplify the code in a For loop but it is a heavy object and is slower than a loop written using For."
Is that true? If it was true when it was written, is it still true today, or has foreach itself been refactored to improve performance?
I have the same question about this tip from the same source:
"Where possible use arrays instead of collections. Arrays are normally more efficient especially for value types. Also, initialize collections to their required size when possible."
UPDATE
I was looking for performance tips because I had a database operation that was taking several seconds.
I have found that the "using" statement is a time hog.
I completely solved my performance problem by reversing the for loop and the "using" (of course, refactoring was necessary for this to work).
The slower-than-molasses code was:
for (int i = 1; i <= googlePlex; i++) {
. . .
using (OracleCommand ocmd = new OracleCommand(insert, oc)) {
. . .
InsertRecord();
. . .
The faster-than-a-speeding-bullet code is:
using (OracleCommand ocmd = new OracleCommand(insert, oc)) {
for (int i = 1; i <= googlePlex; i++) {
. . .
InsertRecord();
. . .
Short answer:
Code that is hard to read eventually results in software that behaves and performs poorly.
Long answer:
There was a culture of micro-optimization suggestions in early .NET. Partly it was because a few of Microsoft's internal tools (such as FxCop) had gained popularity among the general public. Partly it was because C# had, and has, aspirations to be a successor to assembly, C, and C++ regarding unhindered access to raw hardware performance in the few hottest code paths of a performance-critical application. This does require more knowledge and discipline than a typical application, of course. The consequences of performance-related decisions in framework code and in app code are also quite different.
The net impact of this on C# coding culture has been positive, of course; but it would be ridiculous to stop using foreach or is or "" just in order to save a couple of CIL instructions that your recent jitter could probably optimize away completely if it wanted to.
There are probably very many loops in your app, and probably at most one of them might be a current performance bottleneck. "Optimizing" a non-bottleneck for performance at the expense of readability is a very bad deal.
It's true in many cases that foreach is slower than an equivalent for. It's also true that
for (int i = 0; i < myCollection.Length; i++) // Compiler must re-evaluate getter because value may have changed
is slower than
int max = myCollection.Length;
for (int i = 0; i < max; i++)
But that probably will not matter at all. For a very detailed discussion see Performance difference for control structures 'for' and 'foreach' in C#
Have you done any profiling to determine the hot spots of your application? I would be astonished if the loop management overhead is where you should be focusing your attention.
You should try profiling your code with Red Gate ANTS or something of that ilk - you will be surprised.
I found that in an application I was writing it was the parameter sniffing in SQL that took up 25% of the processing time. After writing a command cache which sniffed the params at the start of the application, there was a big speed boost.
Unless you are doing a large number of nested for loops, I don't think you will see much of a performance benefit from changing your loops. I can't imagine anything but a real-time application such as a game, or a heavy number-crunching or scientific application, would need that kind of optimisation.
Yes. The classic for is a bit faster than a foreach, as the iteration is index-based instead of accessing the elements of the collection through an enumerator:
static void Main()
{
const int m = 100000000;
// just to create an array of m elements
int[] array = new int[m];
for (int x = 0; x < array.Length; x++) {
array[x] = x;
}
var s1 = Stopwatch.StartNew();
var upperBound = array.Length;
for (int i = 0; i < upperBound; i++)
{
}
s1.Stop();
GC.Collect();
var s2 = Stopwatch.StartNew();
foreach (var item in array) {
}
s2.Stop();
Console.WriteLine(((double)(s1.Elapsed.TotalMilliseconds *
1000000) / m).ToString("0.00 ns"));
Console.WriteLine(((double)(s2.Elapsed.TotalMilliseconds *
1000000) / m).ToString("0.00 ns"));
Console.Read();
//2.49 ns
//4.68 ns
// In Release Mode
//0.39 ns
//1.05 ns
}
I don't know what overhead there is in int array lookups. Which would perform better (in C#):
a = aLookup[i];
b = (a % 6) == 5;
c = (b ? a+1 : a-1) >> 1; // (a + 1) / 2 or (a - 1) / 2
Or
a = aLookup[i];
b = bLookup[i];
c = cLookup[i];
Would an array lookup actually save that much time for either b or c?
Edit: I profiled it several ways. The result is that array lookups are almost four times faster.
It is so extremely unlikely to matter. You should go with what is most readable. And I can tell you that
c = (b ? a+1 : a-1) >> 1;
is pointless, as you aren't buying any performance and your code is less readable. Just go with explicitly dividing by two.
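One caveat worth knowing when you make that change: >> 1 and / 2 only agree for non-negative operands, because the arithmetic shift rounds toward negative infinity while integer division rounds toward zero. A small sketch:

```csharp
using System;

class DivideDemo
{
    // The shift version and the division version, side by side.
    public static int ViaShift(int a, bool b) => (b ? a + 1 : a - 1) >> 1;
    public static int ViaDivide(int a, bool b) => b ? (a + 1) / 2 : (a - 1) / 2;

    static void Main()
    {
        int a = 7;
        bool b = (a % 6) == 5;              // false for a = 7

        Console.WriteLine(ViaDivide(a, b)); // 3

        // The two forms agree for non-negative operands but not for negatives:
        Console.WriteLine(-3 >> 1); // -2 (shift rounds toward negative infinity)
        Console.WriteLine(-3 / 2);  // -1 (division rounds toward zero)
    }
}
```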
That said, just try it for yourself in a profiler if you really care.
A:
depends on
element type
length of array
cache locality
processor affinity, L2 cache size
cache duration (or more importantly: how many times used till cache eviction?)
B:
you need to ... Profile! ( What Are Some Good .NET Profilers? )
Both are O(1) conceptually, although you have a bounds check with the array access.
I don't think this will be your bottleneck either way; I would go with what's more readable and shows your intent better.
Also, if you use Reflector to check the implementation of the % operator, you will find it is extremely inefficient and not to be used in high-frequency code in a time-critical application, so C# game programmers tend to avoid % and use:
while (x >= n) x -= n;
but they can make assumptions about the range of x (which are verified in debug builds)
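For what it's worth, the subtraction loop only matches % under exactly that assumption (x non-negative and n positive); a quick sketch:

```csharp
using System;

class ModDemo
{
    // Matches x % n only under the stated assumptions: x >= 0 and n > 0.
    public static int ModBySubtraction(int x, int n)
    {
        while (x >= n) x -= n;
        return x;
    }

    static void Main()
    {
        Console.WriteLine(ModBySubtraction(17, 6)); // 5
        Console.WriteLine(17 % 6);                  // 5
    }
}
```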
Unless you are doing 10,000+ of these per second, I wouldn't worry about it.
Does anyone know if the multiply operator is faster than using the Math.Pow method? Like:
n * n * n
vs
Math.Pow ( n, 3 )
I just reinstalled Windows, so Visual Studio is not installed and the code is ugly:
using System;
using System.Diagnostics;
public static class test{
public static void Main(string[] args){
MyTest();
PowTest();
}
static void PowTest(){
var sw = Stopwatch.StartNew();
double res = 0;
for (int i = 0; i < 333333333; i++){
res = Math.Pow(i,30); //pow(i,30)
}
Console.WriteLine("Math.Pow: " + sw.ElapsedMilliseconds + " ms: " + res);
}
static void MyTest(){
var sw = Stopwatch.StartNew();
double res = 0;
for (int i = 0; i < 333333333; i++){
res = MyPow(i,30);
}
Console.WriteLine("MyPow: " + sw.ElapsedMilliseconds + " ms: " + res);
}
static double MyPow(double num, int exp)
{
double result = 1.0;
while (exp > 0)
{
if (exp % 2 == 1)
result *= num;
exp >>= 1;
num *= num;
}
return result;
}
}
The results:
csc /o test.cs
test.exe
MyPow: 6224 ms: 4.8569351667866E+255
Math.Pow: 43350 ms: 4.8569351667866E+255
Exponentiation by squaring (see https://stackoverflow.com/questions/101439/the-most-efficient-way-to-implement-an-integer-based-power-function-powint-int) is much faster than Math.Pow in my test (my CPU is a Pentium T3200 at 2 GHz).
EDIT: .NET version is 3.5 SP1, OS is Vista SP1 and power plan is high performance.
Basically, you should benchmark to see.
Educated Guesswork (unreliable):
In case it's not optimized to the same thing by some compiler...
It's very likely that x * x * x is faster than Math.Pow(x, 3), as Math.Pow has to deal with the problem in its general case, handling fractional powers and other issues, while x * x * x takes just a couple of multiply instructions.
A few rules of thumb from 10+ years of optimization in image processing & scientific computing:
Optimizations at an algorithmic level beat any amount of optimization at a low level. Despite the "write the obvious, then optimize" conventional wisdom, this must be done at the start, not after.
Hand-coded math operations (especially SIMD SSE+ types) will generally outperform the fully error-checked, generalized built-in ones.
Any operation where the compiler knows beforehand what needs to be done is optimized by the compiler. These include:
1. Memory operations such as Array.Copy()
2. For loops over arrays where the array length is given. As in for (..; i<array.Length;..)
Always set unrealistic goals (if you want to).
I just happened to have tested this yesterday, then saw your question now.
On my machine, a Core 2 Duo running 1 test thread, it is faster to use multiplication up to an exponent of 9. At an exponent of 10, Math.Pow(b, e) is faster.
However, even at an exponent of 2, the results are often not identical. There are rounding errors.
Some algorithms are highly sensitive to rounding errors. I had to literally run over a million random tests until I discovered this.
This is so micro that you should probably benchmark it for specific platforms; I don't think the results for a Pentium Pro will necessarily be the same as for an ARM or a Pentium II.
All in all, it's most likely to be totally irrelevant.
I checked, and Math.Pow() is defined to take two doubles. This means that it can't do repeated multiplications, but has to use a more general approach. If there were a Math.Pow(double, int), it could probably be more efficient.
That being said, the performance difference is almost certainly absolutely trivial, and so you should use whichever is clearer. Micro-optimizations like this are almost always pointless, can be introduced at virtually any time, and should be left for the end of the development process. At that point, you can check if the software is too slow, where the hot spots are, and spend your micro-optimization effort where it will actually make a difference.
Let's use the convention x^n. Let's assume n is always an integer.
For small values of n, boring multiplication will be faster, because Math.Pow (likely, implementation dependent) uses fancy algorithms to allow for n to be non-integral and/or negative.
For large values of n, Math.Pow will likely be faster, but if your library isn't very smart it will use the same algorithm, which is not ideal if you know that n is always an integer. For that you could code up an implementation of exponentiation by squaring or some other fancy algorithm.
Of course modern computers are very fast and you should probably stick to the simplest, easiest to read, least likely to be buggy method until you benchmark your program and are sure that you will get a significant speedup by using a different algorithm.
Math.Pow(x, y) is typically calculated internally as Math.Exp(Math.Log(x) * y). Every power calculation requires finding a natural log, a multiplication, and raising e to a power.
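That identity is easy to check numerically (the real implementation varies by runtime and CPU, but the two forms agree to within floating-point error):

```csharp
using System;

class PowIdentity
{
    // The exp/log identity: x^y == e^(y * ln x) for positive x.
    public static double ViaExpLog(double x, double y) => Math.Exp(Math.Log(x) * y);

    static void Main()
    {
        double viaPow = Math.Pow(2.0, 10.0);
        double viaExpLog = ViaExpLog(2.0, 10.0);

        Console.WriteLine(viaPow);                              // 1024
        Console.WriteLine(Math.Abs(viaPow - viaExpLog) < 1e-6); // True
    }
}
```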
As I mentioned in my previous answer, only at a power of 10 does Math.Pow() become faster, but accuracy will be compromised if using a series of multiplications.
I disagree that hand-built functions are always faster. The cosine functions are way faster and more accurate than anything I could write. As for pow(): I did a quick test to see how slow Math.pow() was in JavaScript, because Mehrdad cautioned against guesswork:
for (i3 = 0; i3 < 50000; ++i3) {
for(n=0; n < 9000;n++){
x=x*Math.cos(i3);
}
}
Here are the results:
Each function run 50000 times
time for 50000 Math.cos(i) calls = 8 ms
time for 50000 Math.pow(Math.cos(i),9000) calls = 21 ms
time for 50000 Math.pow(Math.cos(i),9000000) calls = 16 ms
time for 50000 homemade for loop calls 1065 ms
If you don't agree, try the program at http://www.m0ose.com/javascripts/speedtests/powSpeedTest.html