This is a theoretical example but hopefully it highlights my question:
Let's say I have a master list of Item objects, and an Item has two properties, say Weight and Value.
The program will very frequently be required to sort by Weight and get the lightest Item whilst elsewhere it is being sorted by Value and getting the most expensive Item.
The master list has the potential to be very large, so it would be a lot of unnecessary work to keep sorting the master list over and over. To save time, is it possible to store each sorted result as its own list? Will these other lists simply store pointers to the real objects rather than storing the objects again?
This depends on whether Item is a struct or a class. If it is a class (the sensible default), then both lists only contain references to the objects, and the Weight / Value data is not duplicated. If it is a struct, then all the values will be duplicated, because each list has its own backing array and the structs themselves live in that array. Side note: if the values are strings, note that strings are also reference types, so the string contents won't be duplicated (unless they were created separately, without any pseudo-interning, etc).
Will these other lists simply store references to the real objects and not just store them again?
As long as Item is a class and not a struct: yes.
Classes are reference types, which means the objects won't be copied into the new list.
You can indeed keep two lists, one sorted by price (Value) and the other by Weight, and so make life easier for yourself.
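A minimal sketch of the idea discussed above, assuming a hypothetical `Item` class with `Weight` and `Value` properties as in the question (the `Name` property and the `SortedViews` helper are made up for illustration). Both sorted lists hold references to the same objects, so only the references are duplicated. The sorted lists are snapshots, so they would need to be rebuilt or updated whenever the master list changes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Item type from the question; it is a class, so lists store references.
public class Item
{
    public string Name { get; set; } = "";
    public double Weight { get; set; }
    public decimal Value { get; set; }
}

public static class SortedViews
{
    // Builds two independent orderings over the same Item instances.
    public static (List<Item> ByWeight, List<Item> ByValue) Build(IEnumerable<Item> master)
    {
        var items = master.ToList();
        var byWeight = items.OrderBy(i => i.Weight).ToList();          // lightest first
        var byValue = items.OrderByDescending(i => i.Value).ToList();  // most expensive first
        return (byWeight, byValue);
    }
}
```

With this sketch, `ByWeight[0]` is the lightest item and `ByValue[0]` is the most expensive one, and `ReferenceEquals` confirms that both lists point at the objects in the master list.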
By creating a Dictionary<int,int> or List<KeyValuePair<int,int>> I can create a list of related ids.
By calling collection[key] I can return the corresponding value stored against it.
I also want to be able to return the key by passing in a value - which I know is possible using some LINQ, however it doesn't seem very efficient.
In my case along with each key being unique, each value is too. Does this fact make it possible to use another approach which will provide better performance?
It sounds like you need a bi-directional dictionary. There are no framework classes that support this, but you can implement your own:
Bidirectional 1 to 1 Dictionary in C#
You could encapsulate two dictionaries, one with your "keys" storing your values and the other keyed with your "values" storing your keys.
Then manage access to them through a few methods. Fast and the added memory overhead shouldn't make a huge difference.
Edit: just noticed this is essentially the same as the previous answer :-/
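As a rough sketch of the two-dictionary approach described above (the `BiDictionary` name and its members are hypothetical, not a framework type): each pair is stored once in a forward dictionary and once in a reverse dictionary, so lookups in either direction are hash lookups rather than a LINQ scan.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: a one-to-one map backed by two dictionaries.
public class BiDictionary<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> _forward = new Dictionary<TKey, TValue>();
    private readonly Dictionary<TValue, TKey> _reverse = new Dictionary<TValue, TKey>();

    public int Count => _forward.Count;

    // Adds a pair; throws if either side already exists, which keeps the map one-to-one.
    public void Add(TKey key, TValue value)
    {
        if (_forward.ContainsKey(key)) throw new ArgumentException("Duplicate key.", nameof(key));
        if (_reverse.ContainsKey(value)) throw new ArgumentException("Duplicate value.", nameof(value));
        _forward.Add(key, value);
        _reverse.Add(value, key);
    }

    public bool TryGetByKey(TKey key, out TValue value) => _forward.TryGetValue(key, out value);

    public bool TryGetByValue(TValue value, out TKey key) => _reverse.TryGetValue(value, out key);

    // Removes the pair from both dictionaries so they never drift apart.
    public bool RemoveByKey(TKey key)
    {
        if (!_forward.TryGetValue(key, out var value)) return false;
        _forward.Remove(key);
        _reverse.Remove(value);
        return true;
    }
}
```

The cost is roughly double the memory of a single dictionary, in exchange for average O(1) lookups in both directions.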
I have an ILookup<TKey,TElement> lookup from which I fairly often get elements and iterate through them using LINQ or foreach. I look up like this: IEnumerable<TElement> results = lookup[key];.
Thus, results needs to be enumerated at least once every time I use the lookup results (and more often if I iterate multiple times without calling .ToList() first).
Even though it's not as "clean", wouldn't it be better (performance-wise) to use a Dictionary<TKey,List<TElement>>, so that all results from a key are only enumerated on construction of the dictionary? Just how taxing is ToList()?
ToLookup, like all the other ToXXX LINQ methods, uses immediate execution. The resulting object has no reference to the original source. It effectively does build a Dictionary<TKey, List<TElement>> - not those exact types, perhaps, but equivalent to it.
Note that there's a difference though, which may or may not be useful to you - the indexer for a lookup returns an empty sequence if you give it a key which doesn't exist, rather than throwing an exception. That can make life much easier if you want to be able to just index it by any key and iterate over the corresponding values.
Also note that although it's not explicitly documented, the implementation used for the value sequences does implement ICollection<T>, so calling the LINQ Count() method is O(1) - it doesn't need to iterate over all the elements.
See my Edulinq post on ToLookup for more details.
Assuming the implementation is System.Linq.Lookup (does ILookup have any other implementations?), the elements presented in lookup[key] are stored in an array of elements as a field of System.Linq.Lookup.Grouping. Repeatedly looking them up won't cause a re-iteration of source. Of course, rebuilding the Lookup will be more costly, but once built, the source is no longer accessed.
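A small sketch of the behaviour described in the answers above; the `LookupDemo` class and its method names are made up for illustration. It relies only on `Enumerable.ToLookup` and the `ILookup` indexer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupDemo
{
    // ToLookup executes immediately; later changes to 'words' are not reflected.
    public static ILookup<int, string> BuildByLength(IEnumerable<string> words)
    {
        return words.ToLookup(w => w.Length);
    }

    // Returns how many elements are stored under 'key'.
    // A key that was never added yields an empty sequence, so this returns 0 instead of throwing.
    public static int CountFor(ILookup<int, string> lookup, int key)
    {
        return lookup[key].Count();
    }
}
```

A `Dictionary<int, List<string>>` indexed with a missing key would throw `KeyNotFoundException` instead, which is the main practical difference when iterating over arbitrary keys.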
I have a List<IAgStkObject>. Each IAgStkObject has a property called InstanceName. How can I search through my List to find if any of the contained IAgStkObject(s) have a particular InstanceName? In the past I would have used a foreach loop.. but this seems too slow.
WulfgarPro
If the only thing you have is a List (not ordered by InstanceName), there is no faster way (if you do similar tests often, you can preprocess the data and create e.g. a Dictionary indexed by the InstanceName).
The only way different from “the past” would be those useful extension methods allowing you to write just
return myList.Any(item => item.InstanceName == "Searched name");
If the list is sorted by the InstanceName, you can use binary search algorithm, otherwise: no.
You would have to use some more advanced data structure (like the sorted list or dictionary). I think dictionary would be the solution for this. It is very fast and easy to use.
But think: how many of the objects do you have? Are you sure looping through them is performance issue? If you have < 1000 of the objects, you absolutely don't have to worry (unless you want to do something in real time).
You can use Linq:
list.Any(o => o.InstanceName == "something")
But you cannot avoid looping through the list (in the Linq case it's done implicitly). If you want a performance gain, change your data structure. Maybe a dictionary (InstanceName -> IAgStkObject) is appropriate?
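A sketch of the dictionary suggestion from the answers above. `INamedObject` and `NamedObject` are hypothetical stand-ins for `IAgStkObject`, of which only the `InstanceName` property is assumed; `NameIndex` is likewise a made-up helper.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for IAgStkObject; only InstanceName matters here.
public interface INamedObject
{
    string InstanceName { get; }
}

public class NamedObject : INamedObject
{
    public NamedObject(string instanceName) { InstanceName = instanceName; }
    public string InstanceName { get; }
}

public static class NameIndex
{
    // One O(n) pass builds the index; later lookups are O(1) on average.
    // Assumption: names are meant to be unique. If duplicates occur, the first object wins.
    public static Dictionary<string, INamedObject> Build(IEnumerable<INamedObject> objects)
    {
        var index = new Dictionary<string, INamedObject>(StringComparer.Ordinal);
        foreach (var obj in objects)
        {
            if (!index.ContainsKey(obj.InstanceName))
            {
                index.Add(obj.InstanceName, obj);
            }
        }
        return index;
    }
}
```

Building the index only pays off if the same list is searched many times; for a single search, `Any` or a plain loop does the same amount of work.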
I've always been told that adding an element to an array happens like this:
An empty copy of the array + 1 element is created, then the data from the original array is copied into it, and then the data for the new element is loaded.
If this is true, then using an array within a scenario that requires a lot of element activity is contra-indicated due to memory and CPU utilization, correct?
If that is the case, shouldn't you avoid arrays as much as possible when you will be adding a lot of elements? Should you use iStringMap instead? If so, what happens if you need more than two dimensions AND need to add a lot of elements? Do you just take the performance hit, or is there something else that should be used?
Look at the generic List<T> as a replacement for arrays. They support most of the same things arrays do, including allocating an initial storage size if you want.
This really depends on what you mean by "add."
If you mean:
T[] array;
int i;
T value;
...
if (i >= 0 && i < array.Length)
array[i] = value;
Then, no, this does not create a new array, and is in-fact the fastest way to alter any kind of IList in .NET.
If, however, you're using something like ArrayList, List, Collection, etc., then calling the "Add" method may create a new array. They are smart about it, though: they don't just resize by 1 element, they grow geometrically, so if you're adding lots of values, a new array only has to be allocated every once in a while. Even then, you can use the "Capacity" property to force it to grow beforehand, if you know how many elements you're adding (list.Capacity += numberOfAddedElements).
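To make the geometric growth concrete, here is a small sketch that counts how often a `List<int>` changes its `Capacity` while items are added. The `GrowthDemo` helper is made up for illustration, and the exact growth factor is an implementation detail that the sketch does not rely on.

```csharp
using System;
using System.Collections.Generic;

public static class GrowthDemo
{
    // Counts how often the backing array is replaced while adding 'count' items.
    public static int CountReallocations(List<int> list, int count)
    {
        int reallocations = 0;
        int lastCapacity = list.Capacity;
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                // Capacity changed, so a new, larger array was allocated and filled.
                reallocations++;
                lastCapacity = list.Capacity;
            }
        }
        return reallocations;
    }
}
```

Calling `GrowthDemo.CountReallocations(new List<int>(), 1000)` should report only a handful of reallocations, while `GrowthDemo.CountReallocations(new List<int>(1000), 1000)` reports none, because the capacity was reserved up front.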
In general, I prefer to avoid array usage. Just use List<T>. It uses a dynamically-sized array internally, and is fast enough for most usage. If you're using multi-dimensional arrays, use List<List<List<T>>> if you have to. It's not that much worse in terms of memory, and is much simpler to add items to.
If you're in the 0.1% of usage that requires extreme speed, make sure it's your list accesses that are really the problem before you try to optimize it.
If you're going to be adding/removing elements a lot, just use a List. If it's multidimensional, you can always use a List<List<int>> or something.
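On the other hand, lists can be less efficient than arrays if what you're mostly doing is traversing them. A List<T> is backed by an array, so it only adds a little overhead per access, but the nodes of a linked list are scattered all over the heap, whereas an array sits in one contiguous block that is friendly to your CPU cache.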
If you want to use an array for efficient reading but you're going to be "adding" elements frequently, you have two main options:
1) Generate it as a List (or List of Lists) and then use ToArray() to turn it into an efficient array structure.
2) Allocate the array to be larger than you need, then put the objects into the pre-allocated cells. If you end up needing even more elements than you pre-allocated, you can reallocate the array when it fills, doubling the size each time. Adding n elements then needs only O(log n) reallocations, instead of the O(n) reallocations of a reallocate-once-per-add array. Note that this is pretty much how StringBuilder works, giving you a faster way to continually append to a string.
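Option 2 can be sketched roughly as follows. `GrowableBuffer` is a hypothetical type written for illustration; in practice `List<T>` already does this for you.

```csharp
using System;

// Hypothetical growable buffer illustrating the over-allocate-and-double strategy.
public class GrowableBuffer<T>
{
    private T[] _items;

    public GrowableBuffer(int initialCapacity = 4)
    {
        _items = new T[Math.Max(1, initialCapacity)];
    }

    public int Count { get; private set; }

    public int Capacity => _items.Length;

    public void Add(T item)
    {
        if (Count == _items.Length)
        {
            // Double the storage so that n adds cause only about log2(n) reallocations.
            Array.Resize(ref _items, _items.Length * 2);
        }
        _items[Count++] = item;
    }

    public T this[int index]
    {
        get
        {
            if (index < 0 || index >= Count) throw new ArgumentOutOfRangeException(nameof(index));
            return _items[index];
        }
    }

    // Copies the used portion into a right-sized array for read-heavy work (option 1's final step).
    public T[] ToArray()
    {
        var result = new T[Count];
        Array.Copy(_items, result, Count);
        return result;
    }
}
```

Starting from a capacity of 4, adding ten items triggers two reallocations (4 to 8, then 8 to 16) rather than ten.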
When to abandon the use of arrays
First and foremost, when the semantics of arrays don't match your intent - Need a dynamically growing collection? A set which doesn't allow duplicates? A collection that has to remain immutable? Avoid arrays in all those cases. That's 99% of the cases. Just stating the obvious basic point.
Secondly, when you are not coding for absolute performance - That's about 95% of the cases. Arrays perform marginally better, especially in iteration. It almost never matters.
When you're not forced by an argument with params keyword - I just wished params accepted any IEnumerable<T> or even better a language construct itself to denote a sequence (and not a framework type).
When you are not writing legacy code, or dealing with interop
In short, it's very rare that you would actually need an array. I will add why one may want to avoid it.
The biggest reason to avoid arrays imo is conceptual. Arrays are closer to implementation and farther from abstraction. Arrays convey more about how it is done than what is done, which is against the spirit of high-level languages. That's not surprising, considering arrays are closer to the metal; they are a special type of their own (though internally an array is a class). Not to be pedagogical, but arrays really do translate to a semantic meaning that is very rarely required. The most useful and frequent semantics are those of collections with any entries, sets with distinct items, key-value maps, etc., in any combination of addable, read-only, immutable, and order-respecting variants. Think about this: you might want an addable collection, or a read-only collection with predefined items and no further modification, but how often does your logic look like "I want a collection with only a fixed number of items, but the items should be modifiable too"? Very rare I would say.
Arrays were designed in the pre-generics era; they mimic genericity with a lot of run-time hacks, and the oddities show here and there. Some of the catches I found:
Broken covariance.
string[] strings = ...
object[] objects = strings;
objects[0] = 1; //compiles, but gives a runtime exception.
Arrays can give you reference to a struct!. That's unlike anywhere else. A sample:
struct Value { public int mutable; }
var array = new[] { new Value() };
array[0].mutable = 1; //<-- compiles !
// list[0].mutable = 1; on a List<Value> doesn't compile, since editing a copy makes no sense
Console.WriteLine(array[0].mutable); // 1, expected or unexpected? confusing surely
Run time implemented methods like ICollection<T>.Contains can be different for structs and classes. It's not a big deal, but if you forget to override non generic Equals correctly for reference types expecting generic collection to look for generic Equals, you will get incorrect results.
public class Class : IEquatable<Class>
{
public bool Equals(Class other)
{
Console.WriteLine("generic");
return true;
}
public override bool Equals(object obj)
{
Console.WriteLine("non generic");
return true;
}
}
public struct Struct : IEquatable<Struct>
{
public bool Equals(Struct other)
{
Console.WriteLine("generic");
return true;
}
public override bool Equals(object obj)
{
Console.WriteLine("non generic");
return true;
}
}
class[].Contains(test); //prints "non generic"
struct[].Contains(test); //prints "generic"
The Length property and [] indexer on T[] seem to be regular properties that you can access through reflection (which should involve some magic), but when it comes to expression trees you have to spit out the exact same code the compiler does. There are ArrayLength and ArrayIndex methods to do that separately. One such question here. Another example:
Expression<Func<string>> e = () => new[] { "a" }[0];
//e.Body.NodeType == ExpressionType.ArrayIndex
Expression<Func<string>> e = () => new List<string>() { "a" }[0];
//e.Body.NodeType == ExpressionType.Call;
Yet another one. string[].IsReadOnly returns false, but if you are casting, IList<string>.IsReadOnly returns true.
Type checking gone wrong: (object)new ConsoleColor[0] is int[] returns true, whereas new ConsoleColor[0] is int[] returns false. Same is true for uint[] and int[] comparisons. No such problems if you use any other collection types.
How to abandon the use of arrays.
The most commonly used substitute is List<T> which has a cleaner API. But it is a dynamically growing structure which means you can add to a List<T> at the end or insert anywhere to any capacity. There is no substitute for the exact behaviour of an array, but people mostly use arrays as readonly collection where you can't add anything to its end. A substitute is ReadOnlyCollection<T>.
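A brief sketch of the ReadOnlyCollection<T> substitute, assuming a hypothetical `Inventory` class that owns a private list. `List<T>.AsReadOnly()` returns a wrapper around the list rather than a copy, so callers see later additions but cannot add items themselves.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical owner type: callers can read Names but must go through Add to change it.
public class Inventory
{
    private readonly List<string> _names = new List<string>();
    private readonly ReadOnlyCollection<string> _view;

    public Inventory()
    {
        // A live, read-only wrapper over _names; no elements are copied.
        _view = _names.AsReadOnly();
    }

    public ReadOnlyCollection<string> Names => _view;

    public void Add(string name) => _names.Add(name);
}
```

Even if a caller casts `Names` to `IList<string>`, calling `Add` on it throws `NotSupportedException`, whereas a returned array would still let callers overwrite its elements.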
When the array is resized, a new array must be allocated, and the contents copied. If you are only modifying the contents of the array, it is just a memory assignment.
So, you should not use arrays when you don't know the size of the array, or the size is likely to change. However, if you have a fixed length array, they are an easy way of retrieving elements by index.
ArrayList and List grow the array by more than one when needed (I think it's by doubling the size, but I haven't checked the source). They are generally the best choice when you are building a dynamically sized array.
When your benchmarks indicate that array resize is seriously slowing down your application (remember - premature optimization is the root of all evil), you can evaluate writing a custom array class with tweaked resizing behavior.
Generally, if you must have the BEST indexed lookup performance, it's best to build a List first and then turn it into an array, thus paying a small penalty at first but avoiding any later. If the issue is that you will be continually adding new data and removing old data, then you may want to use an ArrayList or List for convenience, but keep in mind that they are just special-case arrays. When they "grow" they allocate a completely new array and copy everything into it, which is an O(n) operation and can be slow for large collections.
ArrayList is just an Array which grows when needed.
Add is amortized O(1), just be careful to make sure the resize won't happen at a bad time.
Insert is O(n): all items to the right must be moved over.
Remove is O(n): all items to the right must be moved over.
Also important to keep in mind that List is not a linked list. It's just a typed ArrayList. The List documentation does note that it performs better in most cases but does not say why.
The best thing to do is to pick a data structure which is appropriate to your problem. This depends on a LOT of things, so you may want to browse the System.Collections.Generic namespace.
In this particular case I would say that if you can come up with a good key value, Dictionary would be your best bet. It has insert and remove that approach O(1). However, even with a Dictionary you have to be careful not to let it resize its internal array (an O(n) operation). It's best to give it a lot of room by specifying a larger-than-you-expect-to-use initial capacity in the constructor.
-Rick
A standard array should be defined with a length, which reserves all of the memory that it needs in a contiguous block. Adding an item to the array would put it inside of the block of already reserved memory.
Arrays are great for few writes and many reads, particularly those of an iterative nature - for anything else, use one of the many other data structures.
You are correct an array is great for look ups. However modifications to the size of the array are costly.
You should use a container that supports incremental size adjustments in the scenario where you're modifying the size of the array. You could use an ArrayList which allows you to set the initial size, and you could continually check the size versus the capacity and then increment the capacity by a large chunk to limit the number of resizes.
Or you could just use a linked list. Then however look ups are slow...
If I think I'm going to be adding items to the collection a lot over its lifetime, then I'll use a List. If I know for sure what the size of the collection will be when it's declared, then I'll use an array.
Another time I generally use an array over a List is when I need to return a collection as a property of an object - I don't want callers adding items to that collection via List's Add methods, but instead want them to add items to the collection via my object's interface. In that case, I'll take the internal List and call ToArray and return an array.
If you are going to be doing a lot of adding, and you will not be doing random access (such as myArray[i]), you could consider using a linked list (LinkedList<T>), because it will never have to "grow" like the List<T> implementation. Keep in mind, though, that a LinkedList<T> has no indexer: you can only really reach items by enumerating it (through the IEnumerable<T> interface) or by walking its nodes.
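A minimal sketch of that trade-off, using a made-up `LinkedListDemo` helper: appending to a `LinkedList<int>` never reallocates a backing array, but reading the items back means enumerating them in order.

```csharp
using System;
using System.Collections.Generic;

public static class LinkedListDemo
{
    // Appends 1..count to a linked list, then sums the items by enumeration.
    public static int SumAfterAppends(int count)
    {
        var list = new LinkedList<int>();
        for (int i = 1; i <= count; i++)
        {
            list.AddLast(i); // O(1): allocates one node, no backing array to grow
        }

        int sum = 0;
        foreach (int value in list) // sequential access only; there is no list[i]
        {
            sum += value;
        }
        return sum;
    }
}
```

Each node is a separate heap allocation, so this only tends to pay off when insertions and removals dominate and indexed access is not needed.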
The best thing you can do is to allocate as much memory as you need upfront if possible. This will prevent .NET from having to make additional calls to get memory on the heap. Failing that then it makes sense to allocate in chunks of five or whatever number makes sense for your application.
This is a rule you can apply to anything really.