I'm doing some Wave file handling and have them read from disk into an array of bytes. I want to quickly copy portions from this byte array into another buffer for intermediate processing. Currently I use something like this:
float[] fin = new float[size - offset];
byte[] buf;
// fill buf code omitted
for (int i = offset; i < size; i++)
{
    fin[i - offset] = (float) buf[i];
}
I feel that this is a slow method, because there is about as much computation going on in the loop's condition and increment as there is in its body. If there is a block copy available in C#, or some other way I could implement one, that would be great.
Maybe it isn't too slow, but it sure looks like a lot of work to move some data over. Here "size" is between 2^10 and 2^14. I am then handing "fin" off to an FFT library, so this is by no means the slowest part of the code; maybe I'm barking up the wrong tree.
EDIT UPDATE:
I realize that micro-optimizations are not where someone should spend their time, and that profiling is a better way to achieve speedups overall. But I know this code is in a hot path and must complete in under a third of a second on varying end-user architectures, to minimize our hardware system requirements. Even though I know the FFT code that follows will be much more time-consuming, I am looking for speedups where I can get them.
Array.Copy sure looks nice, I didn't know about that before, and I consider this Q&A a success already!
There is also:
Array.Copy
Array.CopyTo
but whether these will be faster will require profiling.
But be warned about focusing on micro-optimisations to the extent that you miss the big picture; on modern PCs the effect of multi-level memory caching is likely to be greater than the difference between one approach to the copy and another.
Edit: Quick check in reflector: both of the above methods boil down to a common native implementation (good).
Note that the docs for Array.Copy cover valid type conversions; a value -> value widening conversion like byte to float should be OK.
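To make that concrete, here is a minimal sketch of what the Array.Copy call could look like for the question's byte-to-float case, assuming the widening conversion described in the docs is accepted; whether it actually beats the manual loop needs profiling on the target runtime:
byte[] buf = new byte[1 << 14];   // filled from the wave file elsewhere
int offset = 0;                   // start of the region to copy
int size = 1 << 12;               // end index of the region, as in the question

float[] fin = new float[size - offset];

// Relies on Array.Copy performing the byte -> float widening conversion;
// if the runtime rejects it, fall back to the manual conversion loop.
Array.Copy(buf, offset, fin, 0, size - offset);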
Have a look at Array.Copy; it should be faster.
Since you are converting from byte to float you are not going to get any significant speedup. No Array.Copy or variation of memcopy can cope with that.
The only possible gain would be to 'poke' the byte value into a float. I don't know enough (about the implementation of float) to know if it will work and I honestly don't want to know either.
I won't reference Knuth, but: profile your code. Put some timestamps in and measure how long things are taking. Then you can spend your optimization time well. :)
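A rough sketch of that kind of timestamping using Stopwatch (the iteration counts and array sizes here are arbitrary placeholders; run it in release mode without a debugger attached):
using System;
using System.Diagnostics;

class CopyTiming
{
    static void Main()
    {
        byte[] buf = new byte[1 << 14];
        float[] fin = new float[buf.Length];

        var sw = Stopwatch.StartNew();
        for (int run = 0; run < 10000; run++)       // repeat enough to get a stable number
        {
            for (int i = 0; i < buf.Length; i++)
                fin[i] = buf[i];                    // the manual conversion loop from the question
        }
        sw.Stop();

        Console.WriteLine("manual loop: {0} ms", sw.ElapsedMilliseconds);
    }
}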
Related
I don't have a background in C/C++ or related lower-level languages, so I've never run into pointers before. I'm a game dev working primarily in C#, and I finally decided to move to an unsafe context this morning for some performance-critical sections of code (and please, no "don't use unsafe" answers; I've read them many times while doing research, it's already yielding around 6 times the performance in certain areas with no issues so far, and I love the ability to do things like reverse arrays with no allocation). Anyhow, there's a certain situation where I expected no difference, or even a possible decrease in speed, and yet I'm saving a lot of ticks in reality (I'm talking about double the speed in some instances). This benefit seems to decrease with the number of iterations, which I don't fully understand.
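For reference, here is roughly the kind of no-allocation reverse I mean; a simplified sketch, not my production code:
// Simplified sketch of an in-place reverse using pointers.
// Requires the project to allow unsafe code (/unsafe).
static unsafe void ReverseInPlace(int[] array)
{
    if (array == null || array.Length < 2)
        return;

    fixed (int* start = array)
    {
        int* left = start;
        int* right = start + array.Length - 1;
        while (left < right)
        {
            int tmp = *left;
            *left++ = *right;
            *right-- = tmp;
        }
    }
}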
This is the situation:
int x = 0;
for(int i = 0; i < 100; i++)
x++;
Takes, on average, about 15 ticks.
EDIT: The following is unsafe code, though I assumed that was a given.
int x = 0, i = 0;
int* i_ptr;
for(i_ptr = &i; *i_ptr < 100; (*i_ptr)++)
x++;
Takes about 7 ticks, on average.
As I mentioned, I don't have a low-level background and I literally just started using pointers this morning, at least directly, so I'm probably missing quite a bit of info. So my first question is: why is the pointer more performant in this case? It isn't an isolated instance; there are of course a lot of other variables at that specific point in time in relation to the PC, but I'm getting these results very consistently across a lot of tests.
In my head, the operations are as such:
No pointer:
Get address of i
Get value at address
Pointer:
Get address of i_ptr
Get address of i from i_ptr
Get value at address
In my head, there must surely be more overhead, however ridiculously negligible, from using a pointer here. How is it that a pointer is consistently more performant than the direct variable in this case? These are all on the stack as well, of course, so it's not dependent on where they end up being stored, from what I can tell.
As touched on earlier, the caveat is that this bonus decreases with the number of iterations, and pretty fast. I took out the extremes from the following data to account for background interference.
At 1000 iterations, they are both identical at 30 to 34 ticks.
At 10000 iterations, the pointer is slower by about 20 ticks.
Jump up to 10000000 iterations, and the pointer is slower by about 10000 ticks or so.
My assumption is that the decrease comes from the extra step I covered earlier, given that there is an additional lookup, which brings me back to wondering why it's more performant with a pointer than without at low loop counts. At the very least, I'd assume they would be more or less identical (which they are in practice, I suppose, but a difference of 8 ticks from millions of repeated tests is pretty definitive to me) up until the very rough threshold I found somewhere between 100 and 1000 iterations.
Apologies if I'm nitpicking somewhat, or if this is a poor question, but I feel as though it will be beneficial to know exactly what is going on under the hood. And if nothing else, I think it's pretty interesting!
Some users suggested that the test results were most likely due to measurement inaccuracies, and it would seem so, at least up to a point. When averaged across ten million continuous tests, the mean of both is typically equal, though in some cases the use of pointers averages out to an extra tick. Interestingly, when testing as a single case, the use of pointers has a consistently lower execution time than without. There are of course a lot of additional variables at play at the specific points in time at which a test is run, which makes it somewhat of a pointless pursuit to track this down any further. But the result is that I've learned some more about pointers, which was my primary goal, so I'm pleased with the test.
I have two questions:
1) I need some expert views on how to write code that is sound in terms of performance and memory consumption.
2) Performance- and memory-consumption-wise, how good or bad is the following piece of code, and why?
I need to increment a counter that could go up to a maximum of 100, and I'm writing code like this.
Some sample code is as follows:
for(int i=0;i=100;i++)
{
    // some code
}
for(long i=0;i=1000;i++)
{
    // some code
}
How good would it be to use Int16 or anything else instead of int or long, if the requirement is the same?
I need to increment a counter that could go up to a maximum of 100, and I'm writing code like this:
Options given:
for(int i=0;i=100;i++)
for(long i=0;i=1000;i++)
EDIT: As noted, neither of these would even actually compile, due to the middle expression being an assignment rather than an expression of type bool.
This demonstrates a hugely important point: get your code working before you make it fast. Your two loops don't do the same thing - one has an upper bound of 1000, the other has an upper bound of 100. If you have to choose between "fast" and "correct", you almost always want to pick "correct". (There are exceptions to this, of course - but that's usually in terms of absolute correctness of results across large amounts of data, not code correctness.)
Changing between the variable types here is unlikely to make any measurable difference. That's often the case with micro-optimizations. When it comes to performance, architecture is usually much more important than in-method optimizations - and it's also a lot harder to change later on. In general, you should:
Write the cleanest code you can, using types that represent your data most correctly and simply
Determine reasonable performance requirements
Measure your clean implementation
If it doesn't perform well enough, use profiling etc to work out how to improve it
DateTime dtStart = DateTime.Now;
for (int i = 0; i < 10000; i++)
{
    // some code
}
Response.Write((DateTime.Now - dtStart).TotalMilliseconds.ToString());
Do the same for long and you can see which one is better... ;)
When you are doing things that require a number representing iterations, or the quantity of something, you should always use int unless you have a good semantic reason to use a different type (i.e. the data can never be negative, or it could be bigger than 2^31). Additionally, worrying about this sort of nano-optimization will basically never matter when writing C# code.
That being said, if you are wondering about the differences between things like this (incrementing a 4-byte value versus incrementing an 8-byte value), you can always consult Agner Fog's wonderful instruction tables.
On an AMD64 machine, incrementing a long takes the same amount of time as incrementing an int.**
On a 32-bit x86 machine, incrementing an int will take less time.
** The same is true for almost all logic and math operations, as long as the value is not both memory-bound and unaligned. In .NET a long will always be aligned, so the two will always be the same.
I recently changed
this.FieldValues = new object[2, fieldValues.GetUpperBound(1) + 1];
for (int i = 0; i < FieldCount; i++)
{
this.FieldValues[Current, i] = fieldValues[Current, i];
this.FieldValues[Original, i] = fieldValues[Original, i];
}
to
FieldValues = new object[2, fieldValues.GetLength(1)];
Array.Copy(fieldValues, FieldValues, FieldValues.Length);
Where the values of Current and Original are constants 0 and 1 respectively. FieldValues is a field and fieldValues is a parameter.
In the place I was using it, I found the Array.Copy() version to be faster. But another developer says he timed the for-loop against Array.Copy() in a standalone program and found the for-loop faster.
Is it possible that Array.Copy() is not really faster? I thought it was supposed to be super-optimised!
In my own experience, I've found that I can't trust my intuition about anything when it comes to performance. Consequently, I keep a quick-and-dirty benchmarking app around (that I call "StupidPerformanceTricks"), which I use to test these scenarios. This is invaluable, as I've made all sorts of surprising and counter-intuitive discoveries about performance tricks. It's also important to remember to run your benchmark app in release mode, without a debugger attached, as you otherwise don't get JIT optimizations, and those optimizations can make a significant difference: technique A might be slower than technique B in debug mode, but significantly faster in release mode, with optimized code.
That said, in general, my own testing experience indicates that if your array is < ~32 elements, you'll get better performance by rolling your own copy loop - presumably because you don't have the method call overhead, which can be significant. However, if the array is larger than ~32 elements, you'll get better performance by using Array.Copy(). (If you're copying ints or floats or similar sorts of things, you might also want to investigate Buffer.BlockCopy(), which is ~10% faster than Array.Copy() for small arrays.)
But all that said, the real answer is, "Write your own tests that match these precise alternatives as closely as possible, wrap them each with a loop, give the loop enough iterations for it to chew up at least 2-3 seconds worth of CPU, and then compare the alternatives yourself."
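For reference, a minimal sketch of the Buffer.BlockCopy variant mentioned above; note that, unlike Array.Copy, its offset and length arguments are in bytes rather than elements:
int[] source = new int[64];
int[] destination = new int[64];

// Buffer.BlockCopy counts in bytes, so multiply by the element size.
Buffer.BlockCopy(source, 0, destination, 0, source.Length * sizeof(int));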
Given the way .NET works under the hood, I'd guess that in an optimized situation Array.Copy would avoid bounds checking.
If you loop over any type of collection, by default the CLR will check that you're not going past the end of the collection; the JIT will either have to do that check at runtime or emit code that doesn't need it. (Check the article in my comment for better details of this.)
You can modify this behaviour, but generally you don't save that much. Unless you're in a tightly executed inner loop where every millisecond counts, that is.
If the Array is large, I'd use Array.Copy, if it's small, either should perform the same.
I do think it's bounds checking that's creating the different results for you though.
In your particular example, there is a factor that might (in theory) favour the for loop.
Array.Copy performs n element copies one after another, while your for loop only runs n/2 iterations (copying two elements per iteration), where n is the total size of your matrix.
Array.Copy needs to loop through all the elements in your two-dimensional array because:
When copying between multidimensional arrays, the array behaves like a long one-dimensional array, where the rows (or columns) are conceptually laid end to end. For example, if an array has three rows (or columns) with four elements each, copying six elements from the beginning of the array would copy all four elements of the first row (or column) and the first two elements of the second row (or column).
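A small sketch of that flattening behaviour, using the 3-rows-by-4-columns example from the quoted documentation:
int[,] source = new int[3, 4]
{
    { 1, 2,  3,  4 },
    { 5, 6,  7,  8 },
    { 9, 10, 11, 12 },
};
int[,] destination = new int[3, 4];

// Copies the first row (4 elements) and the first two elements of the
// second row, exactly as the documentation describes.
Array.Copy(source, destination, 6);
// destination now holds { {1,2,3,4}, {5,6,0,0}, {0,0,0,0} }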
I have an object model that I use to fill results from a query and that I then pass along to a gridview.
Something like this:
public class MyObjectModel
{
public int Variable1 {get;set;}
public int VariableN {get;set;}
}
Let's say Variable1 holds the value of a count and I know that the count will never become very large (i.e. the number of upcoming appointments for a certain day). For now, I've declared these as int. Let's say it's safe to assume that someone will book fewer than 255 appointments per day. Will changing the datatype from int to byte affect performance much? Is it worth the trouble?
Thanks
No, performance will not be affected much at all.
For each int you will be saving 3 bytes, or 6 in total for the specific example. Unless you have many millions of these, the savings in memory are very small.
Not worth the trouble.
Edit:
Just to clarify - my answer is specifically about the example code. In many cases the choices will make a difference, but it is a matter of scale and will require performance testing to ensure correct results.
To answer #Filip's comment - There is a difference between compiling an application to 64bit and selecting an isolated data type.
Using an integer variable smaller than an int (System.Int32) will not provide any performance benefit. This is because most integer operations in the CLR will promote the variable to an int prior to performing the operation. int is considered the "natural" integer size on the systems for which the CLR was developed.
Consider the following code:
for (byte appointmentIndex = 0; appointmentIndex < Variable1; appointmentIndex++)
ProcessAppointment(appointmentIndex);
In the compiled code, the comparison (appointmentIndex < Variable1) and the increment (appointmentIndex++) will (most likely) be performed using 32-bit integers. Even if the optimizer uses a smaller data type, the CPU itself will require additional work to use the smaller data type.
If you are storing an array of values, then using a smaller data type could help save space, which might give a performance advantage in some scenarios.
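For example, an illustrative sketch of the array case, where the element size does start to matter (sizes are approximate, ignoring object headers and alignment):
// One million appointment counters:
byte[] countsAsBytes = new byte[1000000];   // roughly 1 MB of element data
int[]  countsAsInts  = new int[1000000];    // roughly 4 MB of element data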
It will affect the amount of memory allocated for that variable. In my personal opinion, I don't think it's worth the trouble in the example case.
If there were a huge number of variables, or a database table where you could really save, then yes, but not in this case.
Besides, after years of maintenance programming, I can safely say that it's rarely safe to assume an upper limit on anything. If there's even a remote chance that some poor maintenance programmer is going to have to rewrite the app because you tried to save a trivial amount of resources, it's not worth the pay-off.
The .NET runtime optimizes the use of Int32 especially for counters etc.
.NET Integer vs Int16?
Contrary to popular belief, making your data type smaller does not make access faster. In fact, it's slower. Look at bool: it's implemented as an int.
This is because internally your CPU works with native-word-sized registers (32/64 bit these days), and you're forcing it to convert your data back and forth for no reason (well, only when writing the result to memory, but it's still a penalty you could easily avoid).
Fiddling with integer widths only affects memory access, and caching specifically. This is the kind of stuff you can only figure out by profiling your application and looking at page fault counters in particular.
I agree with the other answers that performance won't be worth it. But if you're going to do it at all, go with a short instead of a byte. My rule of thumb is to pick the highest number you can imagine, multiply by 10, then use that as the basis to pick your value. So if you can't possibly imagine a value higher than 200, then use 2000 as your basis, which would mean you'd need a short.
When using Array.GetLength(dimension) in C#, does the size of the array actually get calculated each time it is called, or is the size cached/stored and that value just gets accessed?
What I really want to know is if setting a local variable to the length of the dimension of an array would add any efficiency if used inside a big loop or if I can just call array.GetLength() over and over w/o any speed penalty.
It is most certainly a bad idea to start caching/optimizing by yourself here.
When dealing with arrays, you have to follow a standard path that the (JIT) optimizer can recognize. If you do, not only will the Length property be cached but, more importantly, the index bounds-check can be done just once, before the loop.
When the optimizer loses your trail you will pay the penalty of a per-access bounds-check.
This is why jagged arrays (int[][]) are faster than multi-dimensional ones (int[,]): the optimization for int[,] is simply missing. Up to Fx2 anyway; I haven't checked the status of this in Fx4 yet.
If you want to research this further, the caching you propose is usually called 'hoisting' the Length property.
It is probably inserted at compile time if it is known then. Otherwise, stored in a variable. If it weren't, how would the size be calculated?
However, you shouldn't make assumptions about the internal operations of the framework. If you want to know if something is more or less efficient, test it!
If you really need the loop to be as fast as possible, you can store the length in a variable. This will give you a slight performance increase; some quick testing that I did shows it's about 30% faster.
As the difference isn't bigger, it shows that the GetLength method is really fast. Unless you really need to squeeze the last bit of performance out of the code, you should just use the method in the loop.
This goes for multidimensional arrays only. For a single-dimensional array it's actually faster to use the Length property in the loop, as the optimiser can then remove bounds checks when you use the array in the loop.
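A sketch of the two patterns being described; the exact JIT behaviour varies by runtime version, so treat this as a guideline rather than a guarantee:
int[] single = new int[1024];
int[,] grid = new int[256, 256];

// Single-dimensional: keep array.Length in the loop condition so the JIT
// can recognise the pattern and drop the per-access bounds checks.
for (int i = 0; i < single.Length; i++)
    single[i] = i;

// Multi-dimensional: hoisting GetLength into locals avoids repeating the
// call on every iteration (the bounds checks themselves generally remain).
int rows = grid.GetLength(0);
int cols = grid.GetLength(1);
for (int r = 0; r < rows; r++)
    for (int c = 0; c < cols; c++)
        grid[r, c] = r * cols + c;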
The naming convention is a clue. The "Length" members (e.g. Array.Length) in .NET typically return a known, stored value, while "Count" (e.g. LINQ's Enumerable.Count()) may have to enumerate the contents of the collection to work out the number of items. (In later .NETs there are extension methods like Any that allow you to check whether a collection is non-empty without using the potentially expensive Count operation.) GetLength should only differ from Length in that you can request the dimension you want the length of.
A local variable is unlikely to make any difference over a call to GetLength - the compiler will optimise most situations pretty well anyway - or you could use foreach which does not need to determine the length before it starts.
(But it would be easy to write a couple of loops and time them with a high-performance counter to see what effect different calls/types have on the execution speed. Doing this sort of quick test can be a great way of gaining insights into a language that you might not really take in if you just read the answers.)