When using Array.GetLength(dimension) in C#, does the size of the array actually get calculated each time it is called, or is the size cached/stored and that value just gets accessed?
What I really want to know is whether storing the length of an array dimension in a local variable would add any efficiency inside a big loop, or whether I can just call array.GetLength() over and over without any speed penalty.
It is most certainly a bad idea to start caching/optimizing by yourself here.
When dealing with arrays, you have to follow a standard path that the (JIT) optimizer can recognize. If you do, not only will the Length property be cached but, more importantly, the index bounds-check can be done just once before the loop.
When the optimizer loses your trail you will pay the penalty of a per-access bounds-check.
This is why jagged arrays (int[][]) are faster than multi-dim (int[,]). The optimization for int[,] is simply missing. Up to Fx2 anyway, I didn't check the status of this in Fx4 yet.
If you want to research this further, the caching you propose is usually called 'hoisting' the Length property.
It is probably inserted at compile time if it is known then. Otherwise, stored in a variable. If it weren't, how would the size be calculated?
However, you shouldn't make assumptions about the internal operations of the framework. If you want to know if something is more or less efficient, test it!
If you really need the loop to be as fast as possible, you can store the length in a variable. This will give you a slight performance increase; some quick testing that I did shows that it's about 30% faster.
As the difference isn't bigger, it shows that the GetLength method is really fast. Unless you really need to squeeze the last bit of performance out of the code, you should just use the method in the loop.
This goes for multidimensional arrays, only. For a single dimensional array it's actually faster to use the Length property in the loop, as the optimiser then can remove bounds checks when you use the array in the loop.
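The two cases described above can be sketched as follows. This is only an illustration, not a benchmark: the class and method names (ArraySums, Sum) are made up for the example, and whether the JIT actually removes the bounds checks depends on the runtime version.

```csharp
public static class ArraySums
{
    // One-dimensional array: compare against Length directly in the loop
    // condition, which is the pattern the answers above say the JIT recognizes.
    public static long Sum(int[] values)
    {
        long total = 0;
        for (int i = 0; i < values.Length; i++)
        {
            total += values[i];
        }
        return total;
    }

    // Two-dimensional array: read each dimension once before the loops
    // instead of calling GetLength on every iteration.
    public static long Sum(int[,] grid)
    {
        int rows = grid.GetLength(0);
        int cols = grid.GetLength(1);
        long total = 0;
        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < cols; c++)
            {
                total += grid[r, c];
            }
        }
        return total;
    }
}
```

Both overloads return the same result as a loop that queries the length on every iteration; only the number of length lookups differs.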
The naming convention is a clue. The "Length" members (e.g. Array.Length) in .NET typically return a known value, while "Count" methods (e.g. the LINQ Count() extension method on an arbitrary IEnumerable) may enumerate the contents of the collection to work out the number of items; the List<T>.Count property itself just returns a stored value. (In later .NET versions there are extension methods like Any that allow you to check if a collection is non-empty without having to use the potentially expensive Count operation.) GetLength should only differ from Length in that you can request the dimension you want the length of.
A local variable is unlikely to make any difference over a call to GetLength - the compiler will optimise most situations pretty well anyway - or you could use foreach which does not need to determine the length before it starts.
(But it would be easy to write a couple of loops and time them (with a high performance counter) to see what effect different calls/types might have on the execution speed. Doing this sort of quick test can be a great way of gaining insights into a language that you might not really take in if you just read the answers)
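A quick timing harness along those lines might look like this. It is a rough sketch using System.Diagnostics.Stopwatch; the LoopTimer class and its Time method are hypothetical names, and a single warm-up call is only a crude way to keep JIT compilation out of the measurement.

```csharp
using System;
using System.Diagnostics;

public static class LoopTimer
{
    // Runs the action `iterations` times and returns the elapsed wall-clock time.
    public static TimeSpan Time(Action action, int iterations)
    {
        action(); // warm-up call so JIT compilation is not part of the timing

        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

To compare two loop variants, wrap each one in a lambda, pass both through Time with the same iteration count, and run the program in release mode without a debugger attached.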
I have two questions:
1) I need an expert view on writing code that is sound in terms of performance and memory consumption.
2) In terms of performance and memory consumption, how good or bad is the following piece of code, and why?
Need to increment the counter that could go maximum by 100 and writing code like this:
Some Sample Code is as follows:
for(int i=0;i=100;i++)
{
Some Code
}
for(long i=0;i=1000;i++)
{
Some Code
}
How good is it to use Int16 or some other type instead of int or long if the requirement is the same?
Need to increment the counter that could go maximum by 100 and writing code like this:
Options given:
for(int i=0;i=100;i++)
for(long i=0;i=1000;i++)
EDIT: As noted, neither of these would even actually compile, due to the middle expression being an assignment rather than an expression of type bool.
This demonstrates a hugely important point: get your code working before you make it fast. Your two loops don't do the same thing - one has an upper bound of 1000, the other has an upper bound of 100. If you have to choose between "fast" and "correct", you almost always want to pick "correct". (There are exceptions to this, of course - but that's usually in terms of absolute correctness of results across large amounts of data, not code correctness.)
Changing between the variable types here is unlikely to make any measurable difference. That's often the case with micro-optimizations. When it comes to performance, architecture is usually much more important than in-method optimizations - and it's also a lot harder to change later on. In general, you should:
Write the cleanest code you can, using types that represent your data most correctly and simply
Determine reasonable performance requirements
Measure your clean implementation
If it doesn't perform well enough, use profiling etc to work out how to improve it
    DateTime dtStart = DateTime.Now;
    for (int i = 0; i < 10000; i++)
    {
        // Some code
    }
    Response.Write((DateTime.Now - dtStart).TotalMilliseconds.ToString());
Do the same for long as well and you will know which one is better... ;)
When you are doing things that require a number representing iterations, or the quantity of something, you should always use int unless you have a good semantic reason to use a different type (i.e. the data can never be negative, or it could be bigger than 2^31). Additionally, worrying about this sort of nano-optimization will basically never matter when writing C# code.
That being said, if you are wondering about the differences between things like this (incrementing a 4 byte register versus incrementing 8 bytes), you can always consult Agner Fog's wonderful instruction tables.
On an Amd64 machine, incrementing long takes the same amount of time as incrementing int.**
On a 32 bit x86 machine, incrementing int will take less time.
** The same is true for almost all logic and math operations, as long as the value is not both memory bound and unaligned. In .NET a long will always be aligned, so the two will always be the same.
I am looking for the most efficient way to store a collection of integers. Right now they're being stored in a HashSet<T>, but profiling has shown that these collections weigh heavily on some performance-critical code and I suspect there's a better option.
Some more details:
Random lookups must be O(1) or close to it.
The collections can grow large, so space efficiency is desirable.
The values are uniformly distributed in a 64-bit space.
Mutability is not needed.
There's no clear upper bound on size, but tens of millions of elements is not uncommon.
The most painful performance hit right now is creating them. That seems to be allocation-related - clearing and reusing HashSets helps a lot in benchmarks, but unfortunately that is not a feasible option in the application code.
(added) Implementing a data structure that's tailored to the task is fine. Is a hash table still the way to go? A trie also seems like a possibility at first glance, but I don't have any practical experience with them.
HashSet is usually the best general purpose collection in this case.
If you have any specific information about your collection you may have better options.
If you have a fixed upper bound that is not incredibly large you can use a bit vector of suitable size.
If you have a very dense collection you can instead store the missing values.
If you have very small collections, <= 4 items or so, you can store them in a regular array. A full scan of such small array may be faster than the hashing required to use the hash-set.
If you don't have any more specific characteristics of your data than "large collections of int" HashSet is the way to go.
If the size of the values is bounded you could use a bitset. It stores one bit per possible integer value, so in total the memory use would be n bits, with n being the greatest integer.
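A bitset along these lines can be sketched with System.Collections.BitArray. The BoundedIntSet class is a made-up name for illustration, and the sketch assumes non-negative int values with a known upper bound, which is not the 64-bit case from the question.

```csharp
using System.Collections;

public sealed class BoundedIntSet
{
    private readonly BitArray _bits;

    // maxValue is the largest value the set may hold (inclusive).
    public BoundedIntSet(int maxValue, int[] values)
    {
        _bits = new BitArray(maxValue + 1); // all bits start as false
        foreach (int value in values)
        {
            _bits[value] = true; // throws if value is outside 0..maxValue
        }
    }

    // Lookup is a single bit test; out-of-range values are simply absent.
    public bool Contains(int value)
    {
        return value >= 0 && value < _bits.Length && _bits[value];
    }
}
```

Memory use is fixed by maxValue rather than by the number of stored items, which is why this only pays off for bounded, reasonably dense data.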
Another option is a bloom filter. Bloom filters are very compact but you have to be prepared for an occasional false positive in lookups. You can find more about them in wikipedia.
A third option is using a simple sorted array. Lookups are O(log n), with n being the number of integers. It may be fast enough.
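The sorted-array option could look roughly like this, using Array.Sort and Array.BinarySearch from the framework. SortedLongSet is a hypothetical name, and the sketch copies the input so the caller's array is left untouched.

```csharp
using System;

public sealed class SortedLongSet
{
    private readonly long[] _sorted;

    public SortedLongSet(long[] values)
    {
        _sorted = (long[])values.Clone(); // keep the caller's array unchanged
        Array.Sort(_sorted);
    }

    // Array.BinarySearch returns a non-negative index when the value is found
    // and a negative number otherwise.
    public bool Contains(long value)
    {
        return Array.BinarySearch(_sorted, value) >= 0;
    }
}
```

The backing store has no per-element overhead beyond the 8 bytes of each long, at the cost of logarithmic rather than constant lookup time.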
I decided to try and implement a special purpose hash-based set class that uses linear probing to handle collisions:
Backing store is a simple array of longs
The array is sized to be larger than the expected number of elements to be stored.
For a value's hash code, use the least-significant 31 bits.
Searching for the position of a value in the backing store is done using a basic linear probe, like so:
    int FindIndex(long value)
    {
        // Hash: least-significant 31 bits, reduced to an index in the backing array.
        var index = (int)(value & 0x7FFFFFFF) % _storage.Length;
        var slotValue = _storage[index];
        if (slotValue == 0x0 || slotValue == value) return index;
        // Linear probe: step forward, wrapping around at the end of the array.
        for (++index; ; index++)
        {
            if (index == _storage.Length) index = 0;
            slotValue = _storage[index];
            if (slotValue == 0x0 || slotValue == value) return index;
        }
    }
(I was able to determine that the data being stored will never include 0, so that number is safe to use for empty slots.)
The array needs to be larger than the number of elements stored. (Load factor less than 1.) If the set is ever completely filled then FindIndex() will go into an infinite loop if it's used to search for a value that isn't already in the set. In fact, it will want to have quite a lot of empty space, otherwise search and retrieval may suffer as the data starts to form large clumps.
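To show how the pieces fit together, here is a minimal self-contained sketch of an immutable set built around the same probing scheme. It is not the class from this answer: the name LongProbeSet, the constructor signature and the way the array size is derived from the load factor are all assumptions made for the example.

```csharp
using System;

public sealed class LongProbeSet
{
    private readonly long[] _storage; // 0 marks an empty slot

    public LongProbeSet(long[] values, double loadFactor)
    {
        if (loadFactor <= 0.0 || loadFactor >= 1.0)
            throw new ArgumentOutOfRangeException(nameof(loadFactor));

        // Always keep at least one empty slot so that every probe terminates.
        int size = Math.Max(values.Length + 1,
                            (int)Math.Ceiling(values.Length / loadFactor));
        _storage = new long[size];

        foreach (long value in values)
        {
            if (value == 0)
                throw new ArgumentException("0 is reserved for empty slots.");
            _storage[FindIndex(value)] = value;
        }
    }

    public bool Contains(long value)
    {
        return value != 0 && _storage[FindIndex(value)] == value;
    }

    // Returns the slot holding the value, or the first empty slot on its probe path.
    private int FindIndex(long value)
    {
        int index = (int)(value & 0x7FFFFFFF) % _storage.Length;
        while (_storage[index] != 0 && _storage[index] != value)
        {
            index++;
            if (index == _storage.Length) index = 0;
        }
        return index;
    }
}
```

Because the set is filled once in the constructor and never modified, there is no need for deletion markers or resizing, which keeps the probe loop as simple as the one above.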
I'm sure there's still room for optimization, and I may get stuck using some sort of BigArray<T> or sharding for the backing store on large sets. But initial results are promising. It performs over twice as fast as HashSet<T> at a load factor of 0.5, nearly twice as fast with a load factor of 0.8, and even at 0.9 it's still working 40% faster in my tests.
Overhead is 1 / load factor, so if those performance figures hold out in the real world then I believe it will also be more memory-efficient than HashSet<T>. I haven't done a formal analysis, but judging by the internal structure of HashSet<T> I'm pretty sure its overhead is well above 10%.
--
So I'm pretty happy with this solution, but I'm still curious if there are other possibilities. Maybe some sort of trie?
--
Epilogue: Finally got around to doing some competitive benchmarks of this vs. HashSet<T> on live data. (Before I was using synthetic test sets.) It's even beating my optimistic expectations from before. Real-world performance is turning out to be as much as 6x faster than HashSet<T>, depending on collection size.
What I would do is just create an array of integers large enough to handle however many integers you need. Is there any reason for staying away from the generic List<T>? http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx
The most painful performance hit right now is creating them...
As you've obviously observed, HashSet<T> does not have a constructor that takes a capacity argument to initialize its capacity.
One trick which I believe would work is the following:
int capacity = ... some appropriate number;
int[] items = new int[capacity];
HashSet<int> hashSet = new HashSet<int>(items);
hashSet.Clear();
...
Looking at the implementation with reflector, this will initialize the capacity to the size of the items array, ignoring the fact that this array contains duplicates. It will, however, only actually add one value (zero), so I'd assume that initializing and clearing should be reasonably efficient.
I haven't tested this so you'd have to benchmark it. And be willing to take the risk of depending on an undocumented internal implementation detail.
It would be interesting to know why Microsoft didn't provide a constructor with a capacity argument like they do for other collection types.
I recently changed
this.FieldValues = new object[2, fieldValues.GetUpperBound(1) + 1];
for (int i = 0; i < FieldCount; i++)
{
this.FieldValues[Current, i] = fieldValues[Current, i];
this.FieldValues[Original, i] = fieldValues[Original, i];
}
to
FieldValues = new object[2, fieldValues.GetLength(1)];
Array.Copy(fieldValues, FieldValues, FieldValues.Length);
Where the values of Current and Original are constants 0 and 1 respectively. FieldValues is a field and fieldValues is a parameter.
In the place I was using it, I found the Array.Copy() version to be faster. But another developer says he timed the for-loop against Array.Copy() in a standalone program and found the for-loop faster.
Is it possible that Array.Copy() is not really faster? I thought it was supposed to be super-optimised!
In my own experience, I've found that I can't trust my intuition about anything when it comes to performance. Consequently, I keep a quick-and-dirty benchmarking app around (that I call "StupidPerformanceTricks"), which I use to test these scenarios. This is invaluable, as I've made all sorts of surprising and counter-intuitive discoveries about performance tricks. It's also important to remember to run your benchmark app in release mode, without a debugger attached, as you otherwise don't get JIT optimizations, and those optimizations can make a significant difference: technique A might be slower than technique B in debug mode, but significantly faster in release mode, with optimized code.
That said, in general, my own testing experience indicates that if your array is < ~32 elements, you'll get better performance by rolling your own copy loop, presumably because you don't have the method call overhead, which can be significant. However, if the array is larger than ~32 elements, you'll get better performance by using Array.Copy(). (If you're copying ints or floats or similar sorts of things, you might also want to investigate Buffer.BlockCopy(), which is ~10% faster than Array.Copy() for small arrays.)
But all that said, the real answer is, "Write your own tests that match these precise alternatives as closely as possible, wrap them each with a loop, give the loop enough iterations for it to chew up at least 2-3 seconds worth of CPU, and then compare the alternatives yourself."
The way .Net works under the hood, I'd guess that in an optimized situation, Array.Copy would avoid bounds checking.
If you do a loop on any type of collection, by default the CLR will check to make sure you're not passing the end of the collection, and then the JIT will either have to do a runtime assessment or emit code that doesn't need checking. (check the article in my comment for better details of this)
You can modify this behaviour, but generally you don't save that much. Unless you're in a tightly executed inner loop where every millisecond counts, that is.
If the Array is large, I'd use Array.Copy, if it's small, either should perform the same.
I do think it's bounds checking that's creating the different results for you though.
In your particular example, there is a factor that might (in theory) indicate the for loop is faster.
Array.Copy is an O(n) operation while your for loop is O(n/2), where n is the total size of your matrix.
Array.Copy needs to loop through all the elements in your two-dimensional array because:
When copying between multidimensional arrays, the array behaves like a long one-dimensional array, where the rows (or columns) are conceptually laid end to end. For example, if an array has three rows (or columns) with four elements each, copying six elements from the beginning of the array would copy all four elements of the first row (or column) and the first two elements of the second row (or column).
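The behaviour described in that quote can be checked with a small helper. CopyDemo and CopyFirst are hypothetical names; the only framework call involved is Array.Copy(Array, Array, int).

```csharp
using System;

public static class CopyDemo
{
    // Copies the first `count` elements of `source`, in row-major order,
    // into a new array with the same dimensions.
    public static int[,] CopyFirst(int[,] source, int count)
    {
        var target = new int[source.GetLength(0), source.GetLength(1)];
        Array.Copy(source, target, count);
        return target;
    }
}
```

With a source of three rows and four columns and a count of six, the whole first row and the first two cells of the second row are copied, and the remaining cells of the target keep their default value of 0.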
When an IEnumerable needs both to be sorted and for elements to be removed, are there advantages/drawback of performing the stages in a particular order? My performance tests appear to indicate that it's irrelevant.
A simplified (and somewhat contrived) example of what I mean is shown below:
public IEnumerable<DataItem> GetDataItems(int maximum, IComparer<DataItem> sortOrder)
{
IEnumerable<DataItem> result = this.GetDataItems();
result.Sort(sortOrder);
result.RemoveAll(item => !item.Display);
result = result.Take(maximum);
return result;
}
If your tests indicate it's irrelevant, then why worry about it? Don't optimize before you need to, only when it becomes a problem. If you find a problem with performance, and have used a profiler, and have found that that method is the hotspot, then you can worry more about it.
On second thought, have you considered using LINQ? Those calls could be replaced with a call to Where and OrderBy, both of which are deferred, and then calling Take, like you have in your example. The LINQ libraries should find the best way of doing this for you, and if your data size expands to the point where it takes a noticeable amount of time to process, you can use PLINQ with a simple call to AsParallel.
You might as well RemoveAll before sorting so that you'll have fewer elements to sort.
I think that Sort() method would usually have complexity of O(n*log(n)), and RemoveAll() just O(n), so in general it is probably better to remove items first.
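One way to see the effect without a stopwatch is to count comparisons. The sketch below uses plain ints and a caller-supplied filter rather than the DataItem type from the question; CountingComparer and FilterOrderDemo are made-up names for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Comparer that records how many times the sort asked it to compare two items.
public sealed class CountingComparer : IComparer<int>
{
    public int Comparisons { get; private set; }

    public int Compare(int x, int y)
    {
        Comparisons++;
        return x.CompareTo(y);
    }
}

public static class FilterOrderDemo
{
    // Removes the unwanted items first, then sorts what is left.
    public static int FilterThenSort(IEnumerable<int> source, Predicate<int> remove)
    {
        var comparer = new CountingComparer();
        List<int> list = source.ToList();
        list.RemoveAll(remove);
        list.Sort(comparer);
        return comparer.Comparisons;
    }

    // Sorts everything first, then removes the unwanted items.
    public static int SortThenFilter(IEnumerable<int> source, Predicate<int> remove)
    {
        var comparer = new CountingComparer();
        List<int> list = source.ToList();
        list.Sort(comparer);
        list.RemoveAll(remove);
        return comparer.Comparisons;
    }
}
```

Both methods return the number of comparisons the sort performed. When the filter removes a large share of the items, the first variant should report a noticeably smaller number, since the sort only ever sees the survivors.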
You'd want something like this:
    public IEnumerable<DataItem> GetDataItems(int maximum, IComparer<DataItem> sortOrder)
    {
        IEnumerable<DataItem> result = this.GetDataItems();
        return result
            .Where(item => item.Display)
            // OrderBy needs a key selector; sort on the item itself with the comparer.
            .OrderBy(item => item, sortOrder)
            .Take(maximum);
    }
There are two answers that are correct, but won't teach you anything:
It doesn't matter.
You should probably do RemoveAll first.
The first is correct because you said your performance tests showed it's irrelevant. The second is correct because it will have an effect on larger datasets.
There's a third answer that also isn't very useful: Sometimes it's faster to do removals afterwards.
Again, it doesn't actually tell you anything, but "sometimes" always means there is more to learn.
There's also only so much value in saying "profile first". What if profiling shows that 90% of the time is spent doing x.Foo(), which it does in a loop? Is the problem with Foo(), with the loop or with both? Obviously if we can make both more efficient we should, but how do we reason about that without knowledge outside of what a profiler tells us?
When something happens over multiple items (which is true of both RemoveAll and Sort) there are five things (I'm sure there are more I'm not thinking of now) that will affect the performance impact:
The per-set constant costs (both time and memory). How much it costs to do things like calling the function that we pass a collection to, etc. These are almost always negligible, but there could be some nasty high cost hidden there (often because of a mistake).
The per-item constant costs (both time and memory). How much it costs to do something that we do on some or all of the items. Because this happens multiple times, there can be an appreciable win in improving them.
The number of items. As a rule the more items, the more the performance impact. There are exceptions (next item), but unless those exceptions apply (and we need to consider the next item to know when this is the case), then this will be important.
The complexity of the operation. Again, this is a matter of both time-complexity and memory-complexity, but here there is the chance that we might choose to improve one at the cost of the other. I'll talk about this more below.
The number of simultaneous operations. This can be a big difference between "works on my machine" and "works on the live system". If a super time-efficient approach that uses .5GB of memory is tested on a machine with 2GB of memory available, it'll work wonderfully, but when you move it to a machine with 8GB of memory available and have multiple concurrent users, it'll hit a bottleneck at 16 simultaneous operations, and suddenly what was beating other approaches in your performance measurements becomes the application's hotspot.
To talk about complexity a bit more. The time complexity is a measure of how the time taken to do something relates to the number of items it is done with, while memory complexity is a measure of how the memory used relates to that same number of items. Obtaining an item from a dictionary is O(1) or constant because it takes the same amount of time however large the dictionary is (not strictly true, strictly it "approaches" O(1), but it's close enough for most thinking). Finding something in an already sorted list can be O(log2 n) or logarithmic. Filtering through a list will be linear or O(n). Sorting something using a quicksort (which is what Sort uses) tends to be linearithmic or O(n log2 n) but in its worst case (against a list already sorted) will be quadratic, O(n^2).
Considering these, with a set of 8 items, an O(1) operation will take 1k seconds to do something, where k is a constant amount of time, O(log2 n) means 3k seconds, O(n) means 8k, O(n log2 n) means 24k and O(n^2) means 64k. These are the most commonly found, though there are plenty of others like O(nm), which is affected by two different sizes, or O(n!), which would be 40320k.
Obviously, we want as low a complexity as possible, though since k will be different in each case, sometimes the best solution for a small set has a high complexity (but low k constant) though a lower-complexity case will beat it with larger input.
So. Let's go back to the cases you are considering, viz filtering followed by sorting vs. sorting followed by filtering.
Per-set constants. Since we are moving two operations around but still doing both, this will be the same either way.
Per-item constants. Again, we're still doing the same things per item in either case, so no effect.
Number of items. Filtering reduces the number of items. Therefore the sooner we filter items, the more efficient the rest of the operation. Therefore doing RemoveAll first wins in this regard.
Complexity of the operation. It's either an O(n) followed by an average-case-O(n log2 n)-worst-case-O(n^2), or it's an average-case-O(n log2 n)-worst-case-O(n^2) followed by an O(n). Same either way.
Number of simultaneous cases. Total memory pressure will be relieved the sooner we remove some items, (slight win for RemoveAll first).
So, we've got two reasons to consider RemoveAll first as likely to be more efficient and none to consider it likely to be less efficient.
We would not assume that we were 100% guaranteed to be correct here. For a start we could simply have made a mistake in our reasoning. For another, there could be other factors we've dismissed as irrelevant that were actually pertinent. It is still true that we should profile before optimising, but reasoning about the sort of things I've mentioned above will both make us more likely to write performant code in the first place (not the same as optimising; but a matter of picking between options when readability, clarity and correctness is equal either way) and makes it easier to find likely ways to improve those things that profiling has found to be troublesome.
For a slightly different but relevant case, consider if the criteria sorted on matched those removed on. E.g. if we were to sort by date and remove all items after a given date.
In this case, if the list deallocates on all removals, it'll still be O(n), but with a much smaller constant. Alternatively, if it just moved the "last-item" pointer*, it becomes O(1). Finding the pointer is O(log2 n), so here there are reasons both to consider that filtering first will be faster (the reasons given above) and that sorting first will be faster (removal can be made a much faster operation than it was before). With this sort of case it only becomes possible to tell by extending our profiling. It is also true that the performance will be affected by the type of data sent, so we need to profile with realistic data, rather than artificial test data, and we may even find that what was the more performant choice becomes the less performant choice months later when the dataset it is used on changes. Here the ability to reason becomes even more important, because we should note the possibility that changes in real-world use may change the outcome, and know that it is something we need to keep an eye on throughout the project's life.
(*Note, List<T> does not just move a last-item pointer for a RemoveRange that covers the last item, but another collection could.)
It would probably be better to do the RemoveAll first, although it would only make much of a difference if your sorting comparison was intensive to calculate.
Something I do often if I'm storing a bunch of string values and I want to be able to find them in O(1) time later is:
foreach (String value in someStringCollection)
{
someDictionary.Add(value, String.Empty);
}
This way, I can comfortably perform constant-time lookups on these string values later on, such as:
if (someDictionary.ContainsKey(someKey))
{
// etc
}
However, I feel like I'm cheating by making the value String.Empty. Is there a more appropriate .NET Collection I should be using?
If you're using .Net 3.5, try HashSet. If you're not using .Net 3.5, try C5. Otherwise your current method is ok (bool as #leppie suggests is better, or not as #JonSkeet suggests, dun dun dun!).
HashSet<string> stringSet = new HashSet<string>(someStringCollection);
if (stringSet.Contains(someString))
{
...
}
You can use HashSet<T> in .NET 3.5, else I would just stick to your current method (actually I would prefer Dictionary<string,bool> but one does not always have that luxury).
Something you might want to add is an initial size to your hash. I'm not sure if C# is implemented differently than Java, but it usually has some default size, and if you add more than that, it extends the set. However, a properly sized hash is important for achieving as close to O(1) as possible. The goal is to get exactly 1 entry in each bucket, without making it really huge. If you do some searching, I know there is a suggested ratio for sizing the hash table, assuming you know beforehand how many elements you will be adding. For example, something like "the hash should be sized at 1.8x the number of elements to be added" (not the real ratio, just an example).
From Wikipedia:
With a good hash function, a hash table can typically contain about 70%–80% as many elements as it does table slots and still perform well. Depending on the collision resolution mechanism, performance can begin to suffer either gradually or dramatically as more elements are added. To deal with this, when the load factor exceeds some threshold, it is necessary to allocate a new, larger table, and add all the contents of the original table to this new table. In Java's HashMap class, for example, the default load factor threshold is 0.75.
I should probably make this a question, because I see the problem so often. What makes you think that dictionaries are O(1)? Technically, the only thing likely to be something like O(1) is access into a standard integer-indexed fixed-bound array using an integer index value (there being no look-up in arrays implemented that way).
The presumption that if it looks like an array reference it is O(1) when the "index" is a value that must be looked up somehow, however behind the scenes, means that it is not likely an O(1) scheme unless you are lucky to obtain a hash function with data that has no collisions (and probably a lot of wasted cells).
I see these questions and I even see answers that claim O(1) [not on this particular question, but I do see them around], with no justification or explanation of what is required to make sure O(1) is actually achieved.
Hmm, I guess this is a decent question. I will do that after I post this remark here.