I am currently building a data structure which relies a lot on efficiency.
Can anyone provide me with resources on how the Find(item => item.X == myObject.Property) method actually works?
Does it iterate linearly throughout all elements until it finds the element?
And what if I know the index of myObject and I use ElementAt(index)?
Which will be the most efficient of these two please?
From the MSDN documentation on List<T>.Find
This method performs a linear search; therefore, this method is an O(n) operation, where n is Count.
I imagine that ElementAt is optimized for IList and will do a direct index. But since you're apparently using this object from the List concrete type anyway, why not just do a direct index? Like this:
var result = list[index];
If you already know the index, there is no point to searching. Just go straight to it.
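To make the difference concrete, here is a minimal sketch of the three access paths, written as a top-level program (C# 9 or later). The tuple list and its values are made up for illustration; they stand in for whatever objects you actually store.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: (Id, Name) tuples stand in for the real objects.
var list = new List<(int Id, string Name)>
{
    (10, "alpha"), (20, "beta"), (30, "gamma")
};

// Find tests the predicate against each element in order: O(n).
var found = list.Find(item => item.Id == 30);

// The indexer is a single access into the list's internal array: O(1).
var byIndex = list[2];

// ElementAt reaches the same element through LINQ.
var byElementAt = list.ElementAt(2);

Console.WriteLine(found == byIndex && byIndex == byElementAt);
```

All three return the same element; only the amount of work needed to reach it differs.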
I recently learned about List's .ConvertAll method. I used it a couple times in code today at work to convert a large list of my objects to a list of some other object. It seems to work really well. However I'm unsure how efficient or fast this is compared to just iterating the list and converting the object. Does .ConvertAll use anything special to speed up the conversion process or is it just a shorthand way of converting Lists without having to set up a loop?
No better way to find out than to go directly to the source, literally :)
http://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs#dbcc8a668882c0db
As you can see, there's no special magic going on. It just iterates over the list and creates a new item by the converter function that you specify.
To be honest, I was not aware of this method. The more idiomatic .NET way to do this kind of projection is through the use of the Select extension method on IEnumerable<T> like so: source.Select(input => new Something(input.Name)). The advantage of this is threefold:
It's more idiomatic as I said, the ConvertAll is likely a remnant of the pre-C#3.0 days. It's not a very arcane method by any means and ConvertAll is a pretty clear description, but it might still be better to stick to what other people know, which is Select.
It's available on all IEnumerable<T>, while ConvertAll only works on instances of List<T>. It doesn't matter if it's an array, a list or a dictionary, Select works with all of them.
Select is lazy. It doesn't do anything until you iterate over it. This means that it returns an IEnumerable<TOutput> which you can then convert to a list by calling ToList() or not if you don't actually need a list. Or if you just want to convert and retrieve the first two items out of a list of a million items, you can simply do source.Select(input => new Something(input.Name)).Take(2).
But if your question is purely about the performance of converting a whole list to another list, then ConvertAll is likely to be somewhat faster as it's less generic than a Select followed by a ToList (it knows that a list has a size and can directly access elements by index from the underlying array for instance).
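A small sketch of the laziness point, assuming nothing beyond List<T> and LINQ: it counts how often the conversion delegate runs in each case. The counters and the 1000-element list are illustrative only, and this says nothing about wall-clock speed, which you would need to measure separately.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var source = Enumerable.Range(1, 1000).ToList();

int eagerCalls = 0;
// ConvertAll runs the converter for every element right away.
List<string> eager = source.ConvertAll(n => { eagerCalls++; return n.ToString(); });

int lazyCalls = 0;
// Select only builds a query; nothing runs until it is enumerated.
IEnumerable<string> lazy = source.Select(n => { lazyCalls++; return n.ToString(); });
int callsBeforeEnumeration = lazyCalls;

// Take(2) stops after two elements, so the projection runs only twice.
List<string> firstTwo = lazy.Take(2).ToList();

Console.WriteLine($"ConvertAll calls: {eagerCalls}");
Console.WriteLine($"Select calls before enumeration: {callsBeforeEnumeration}");
Console.WriteLine($"Select calls after Take(2): {lazyCalls}");
```

If you need the whole converted list anyway, both approaches do the same amount of conversion work.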
Decompiled using ILSpy:
public List<TOutput> ConvertAll<TOutput>(Converter<T, TOutput> converter)
{
    if (converter == null)
    {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.converter);
    }
    List<TOutput> list = new List<TOutput>(this._size);
    for (int i = 0; i < this._size; i++)
    {
        list._items[i] = converter(this._items[i]);
    }
    list._size = this._size;
    return list;
}
Create a new list.
Populate the new list by iterating over the current instance, executing the specified delegate.
Return the new list.
Does .ConvertAll use anything special to speed up the conversion
process or is it just a short hand way of converting Lists without
having to set up a loop?
It doesn't do anything special with regards to conversion (what "special" thing could it do?) It is directly modifying the private _items and _size members, so it might be trivially faster under some circumstances.
As usual, if the solution makes you more productive, code easier to read, etc. use it until profiling reveals a compelling performance reason to not use it.
It's the second way you described it - basically a short-hand way without setting up a loop.
Here's the guts of ConvertAll():
List<TOutput> list = new List<TOutput>(this._size);
for (int index = 0; index < this._size; ++index)
    list._items[index] = converter(this._items[index]);
list._size = this._size;
return list;
Where TOutput is whatever type you're converting to, and converter is a delegate indicating the method that will do the conversion.
So it loops through the List you passed in, running each element through the method you specify, and then returns a new List of the specified type.
For precise timing in your scenarios you need to measure yourself.
Do not expect any miracles - it has to be an O(n) operation since each element needs to be converted and added to the destination list.
Consider using Enumerable.Select instead, as it does lazy evaluation that may allow you to avoid a second copy of a large list, especially if you need to do any filtering of items along the way.
Say, I have a HashSet with elements:
HashSet<int> hsData = new HashSet<int>();
and at some point I need to process those elements (one by one). I can of course convert it into an array and work with it that way:
int[] arr = hsData.ToArray();
but I'm not sure how efficient this conversion will be?
I see that people recommend using foreach on the HashSet itself, but due to architecture of my code, I cannot use it. I need something that can work as such:
Is it the last element? If no, then get it and advance to next
element.
As you stated, converting to an array can have some performance drawbacks. What foreach does behind the scenes is get an enumerator in the HashSet and run through it.
HashSet<T> also implements IEnumerable<T>, which can be used to enumerate the collection in a much more efficient way. Look here for a reference on IEnumerable.
You can use a foreach still if you'd like. Just keep a running counter of all the elements in the given list, decrement as you do an iteration, and do a comparison against the counter.
On the other hand, is turning it into an array a big deal? Is this operation happening 1000s of times?
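HashSet<T> has no ToArray method of its own (the hsData.ToArray() call in the question resolves to LINQ's Enumerable.ToArray extension method), but you can also use CopyTo.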
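If foreach does not fit your architecture, you can drive the enumerator by hand, which is the "is there another element? if so, get it and advance" pattern from the question. This is a rough sketch with made-up values, written as a top-level program; the variable names other than hsData are my own. It also shows CopyTo into a pre-sized array.

```csharp
using System;
using System.Collections.Generic;

var hsData = new HashSet<int> { 3, 1, 4, 5, 9 };

int sum = 0;
int visited = 0;

// Option 1: walk the set with an explicit enumerator, no copy made.
using (IEnumerator<int> cursor = hsData.GetEnumerator())
{
    // MoveNext answers "is there another element?" and advances if so.
    while (cursor.MoveNext())
    {
        sum += cursor.Current;
        visited++;
    }
}

// Option 2: copy once into an array that is already the right size.
int[] arr = new int[hsData.Count];
hsData.CopyTo(arr);

Console.WriteLine($"visited {visited} elements, sum = {sum}, copied {arr.Length}");
```

Note that a HashSet<T> makes no promise about enumeration order, so neither option should rely on it.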
Currently I have the following syntax (list is a list containing objects with many different properties, where Title is one of them):
for (int i = 0; i < list.Count; i++)
{
    if (title == list[i].Title)
    {
        // do something
    }
}
How can I access list[i].Title without having to loop over my entire collection? Since my list tends to grow large, this can impact the performance of my program.
I have a lot of similar syntax across my program (accessing public properties through a for loop and by index). But I am sure there must be a better and more elegant way of doing this?
The Find method does seem to be an option since my list contains objects.
I don't know exactly what you mean, but technically speaking, this is not possible without a loop.
Maybe you mean using LINQ, for example:
list.Where(x => x.Title == title)
It's worth mentioning that the iteration is not skipped, but simply wrapped in the LINQ query.
Hope this helps.
EDIT
In other words, if you are really concerned about performance, keep coding the way you already are. Otherwise choose LINQ for more concise and clear syntax.
Here comes Linq:
var listItem = list.Single(i => i.Title == title);
It throws an exception if no item matches the predicate, or if more than one does. Alternatively, there's SingleOrDefault, which returns the default value when nothing matches.
If you want a collection of items matching the title, there's:
var listItems = list.Where(i => i.Title == title);
I had to use this for a conditional add. If you don't need the index, add
using System.Linq;
and use
if (list.Any(x => x.Title == title))
{
    // do something here
}
This tells you whether any element satisfies the given condition.
I'd suggest storing these in a Hashtable. You can then access an item in the collection using the key, it's a much more efficient lookup.
var myObjects = new Hashtable();
myObjects.Add(yourObject.Title, yourObject);
...
var myRetrievedObject = myObjects["TargetTitle"];
Consider creating an index. A dictionary can do the trick. If you need the list semantics, subclass and keep the index as a private member...
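A minimal sketch of the index idea, assuming titles are unique. The (Title, Pages) tuples and the book data are hypothetical stand-ins for your objects. The dictionary is built once in O(n); each lookup afterwards is a hash lookup instead of a scan.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample data standing in for the real objects.
var list = new List<(string Title, int Pages)>
{
    ("Dune", 612), ("Emma", 474), ("Ulysses", 730)
};

// Build the index once. Assumption: titles are unique (Add throws on a duplicate key).
var byTitle = new Dictionary<string, (string Title, int Pages)>();
foreach (var item in list)
{
    byTitle.Add(item.Title, item);
}

// Each lookup is now O(1) on average rather than a walk over the list.
if (byTitle.TryGetValue("Emma", out var match))
{
    Console.WriteLine($"{match.Title}: {match.Pages} pages");
}

bool missing = byTitle.ContainsKey("Missing");
```

The cost is that the index has to be kept in sync whenever the list changes, which is why wrapping both in one class is suggested.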
ObservableCollection is a list so if you don't know the element position you have to look at each element until you find the expected one.
Possible optimization
If your elements are sorted, use a binary search to improve performance; otherwise use a Dictionary as an index.
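For the sorted case, a sketch using List<T>.BinarySearch on a plain list of titles. The titles are invented, and the list must be sorted with the same comparer that is used for the search.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical titles; BinarySearch only works on a sorted list.
var titles = new List<string> { "Ulysses", "Dune", "Emma", "Beloved" };
titles.Sort(StringComparer.Ordinal);

// O(log n): returns the index of the match when found.
int hit = titles.BinarySearch("Emma", StringComparer.Ordinal);

// When not found, the result is negative: the bitwise complement
// of the index where the item would have to be inserted.
int miss = titles.BinarySearch("Carrie", StringComparer.Ordinal);

Console.WriteLine($"hit = {hit}, insertion point for miss = {~miss}");
```

Keeping the list sorted on every insert has its own cost, so this pays off mainly when lookups far outnumber modifications.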
You're looking for a hash-based collection (like a Dictionary or HashSet), which the ObservableCollection is not. The best solution might be to derive from a hash-based collection and implement INotifyCollectionChanged, which will give you the same behavior as an ObservableCollection.
Well if you have N objects and you need to get the Title of all of them you have to use a loop. If you only need the title and you really want to improve this, maybe you can make a separated array containing only the title, this would improve the performance.
You need to define the amount of memory available and the amount of objects that you can handle before saying this can damage the performance, and in any case the solution would be changing the design of the program not the algorithm.
Maybe this approach would solve the problem:
int result = obsCollection.IndexOf(title);
IndexOf(T)
Searches for the specified object and returns the zero-based index of the first occurrence within the entire Collection.
(Inherited from Collection)
https://learn.microsoft.com/en-us/dotnet/api/system.collections.objectmodel.observablecollection-1?view=netframework-4.7.2#methods
An ObservableCollection can be converted to a List:
{
    BuchungsSatz item = BuchungsListe.ToList().Find(x => x.BuchungsAuftragId == DGBuchungenAuftrag.CurrentItem.Id);
}
At the moment I'm using a List<short> as a buffer to hold things for a while, while a calculation is made on each value based on other values further down the buffer. I then realised that this probably wasn't very efficient, as I have been told that List<> is a linked list, so every time I do whatever = myList[100]; the poor thing has to jump down all the other nodes first to get to the value I want. I don't want to use a regular array because I have loads of Add() and Remove() calls kicking around in other places in the code. So I need a class that implements IList<T> but uses a regular array data structure. Does anyone know a class in .NET that works this way so I don't have to write my own? I tried using ArrayList but it isn't generic!
List<T> doesn't use a linked list implementation. Internally it uses an array, so it appears to be exactly what you need. Note that, because it's an array, Remove/insert could be an expensive operation depending on the size of the list and the position item being removed/inserted - O(n). Without knowing more about how you are using it, though, it's hard to recommend a better data structure.
Quoting from the Remarks section of the docs.
The List(T) class is the generic equivalent of the ArrayList class. It implements the IList(T) generic interface using an array whose size is dynamically increased as required.
List<T> is backed by an array, not a linked list. Indexed accesses of a List<T> happen in constant time.
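A quick sketch of what that means for the buffer in the question. The sizes and values are arbitrary. Capacity is the length of the internal array; the exact growth steps are an implementation detail, so the sketch prints it rather than assuming a value.

```csharp
using System;
using System.Collections.Generic;

var buffer = new List<short>();
for (short i = 0; i < 1000; i++)
{
    buffer.Add(i);
}

// Capacity is the size of the backing array; it is always at least Count.
Console.WriteLine($"Count = {buffer.Count}, Capacity = {buffer.Capacity}");

// Indexing is a direct array access, not a walk over nodes.
short value = buffer[100];

// Removing from the front shifts every later element down by one: O(n).
buffer.RemoveAt(0);
short shifted = buffer[100];

Console.WriteLine($"before removal: {value}, after removal: {shifted}");
```

So reads by index are cheap, while removals near the front are where the cost goes.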
In addition to tvanfosson's correct answer, if you're ever unsure of how something works internally, just load up the .NET Reflector and you can see exactly how things are implemented. In this case, drilling down to the indexer of List<T> shows us the following code:
public T this[int index]
{
    get
    {
        if (index >= this._size)
        {
            ThrowHelper.ThrowArgumentOutOfRangeException();
        }
        return this._items[index];
    }
    // ...
where you can see that this._items is an array of the generic type T, and this._items[index] is a plain array access.
No, a List<T> is a generic collection, not a linked list. If you need add and remove functionality then List<T> is the implementation most people default to.
I am using some of the LINQ select stuff to create some collections, which return IEnumerable<T>.
In my case I need a List<T>, so I am passing the result to List<T>'s constructor to create one.
I am wondering about the overhead of doing this. The items in my collections are usually in the millions, so I need to consider this.
I assume, if the IEnumerable<T> contains ValueTypes, it's the worst performance.
Am I right? What about Ref Types? Either way there is also the cost of calling, List<T>.Add a million times, right?
Any way to solve this? For example, can I "overload" methods like LINQ's Select using extension methods?
No, there's no particular penalty for the element type being value types, assuming you're using IEnumerable<T> instead of IEnumerable. You won't get any boxing going on.
If you actually know the size of the result beforehand (which the result of Select probably won't) you might want to consider creating the list with that size of buffer, then using AddRange to add the values. Otherwise the list will have to resize its buffer every time it fills it.
For instance, instead of doing:
Foo[] foo = new Foo[100];
IEnumerable<string> query = foo.Select(x => x.Name);
List<string> queryList = new List<string>(query);
you might do:
Foo[] foo = new Foo[100];
IEnumerable<string> query = foo.Select(x => x.Name);
List<string> queryList = new List<string>(foo.Length);
queryList.AddRange(query);
You know that calling Select will produce a sequence of the same length as the original query source, but nothing in the execution environment has that information as far as I'm aware.
It would be best to avoid the need for a list. If you can keep your caller using IEnumerable<T>, you will save yourself some headaches.
LINQ's ToList() will take your enumerable, and just construct a new List<T> directly from it, using the List<T>(IEnumerable<T>) constructor. This will be the same as making the list yourself, performance wise (although LINQ does a null check, as well).
If you're adding the elements yourself, use the AddRange method instead of the Add. ToList() is very similar to AddRange (since it's using the constructor which takes IEnumerable<T>), which typically will be your best bet, performance wise, in this case.
Generally speaking, a method returning IEnumerable doesn't have to evaluate any of the items before the item is actually needed. So, theoretically, when you return an IEnumerable, none of your items need to exist at that time.
So creating a list means that you will really need to evaluate items, get them and place them somewhere in memory (at least their references). There is nothing that can be done about this - if you really need to have a list.
A number of other responders have already provided ideas for how to improve the performance of copying an IEnumerable<T> into a List<T> - I don't think that much can be added on that front.
However, based on what you have described you need to do with the results, and the fact that you get rid of the list when you're done (which I presume means that the intermediate results are not interesting) - you may want to consider whether you really need to materialize a List<T>.
Rather than creating a List<T> and operating on the contents of that list, consider writing a lazy extension method for IEnumerable<T> that performs the same processing logic. I've done this myself in a number of cases, and writing such logic in C# is not so bad when using the yield return syntax supported by the compiler.
This approach works well if all you're trying to do is visit each item in the results and collect some information from it. Often, what you need to do is just visit each element in the collection on demand, do some processing with it, and then move on. This approach is generally more scalable and performant than creating a copy of the collection just to iterate over it.
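As a rough illustration of the yield return idea: SquaresOfEvens below is a hypothetical processing step, not anything from the question. It is written as a local iterator function so the snippet runs as a top-level program; moving it into a static class and adding this to the first parameter would turn it into an extension method. The produced counter only exists to show how few source items get pulled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int produced = 0;

// A large lazy source; the counter records how many items are actually pulled.
IEnumerable<int> source = Enumerable.Range(1, 1_000_000)
    .Select(n => { produced++; return n; });

// Only as much of the source is consumed as Take(3) needs.
List<long> firstThree = SquaresOfEvens(source).Take(3).ToList();

Console.WriteLine(string.Join(", ", firstThree));
Console.WriteLine($"Source items pulled: {produced}");

// Hypothetical processing step: filters and transforms one item at a time,
// without ever materializing an intermediate list.
static IEnumerable<long> SquaresOfEvens(IEnumerable<int> items)
{
    foreach (int item in items)
    {
        if (item % 2 == 0)
        {
            yield return (long)item * item;
        }
    }
}
```

The million-element sequence is never held in memory; each element is produced, processed, and dropped in turn.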
Now, this advice may not work for you for other reasons, but it's worth considering as an alternative to finding the most efficient way to materialize a very large list.
Don't pass an IEnumerable to the List constructor. IEnumerable has a ToList() method, which can't possibly do worse than that, and has nicer syntax (IMHO).
That said, that only changes the answer to your question to "it depends" - in particular, it depends on what the IEnumerable actually is behind the scenes. If it happens to be a List already, then ToList will of course go much faster than if it were another type. It's still not super-fast.
The best way to solve this, of course, is to try to figure out how to do your processing on an IEnumerable rather than a List. That may not be possible.
Edit: Some people in the comments are debating whether or not ToList() will actually be any faster when called on a List than if not, and whether ToList() will be any faster than the list constructor. At this point, speculating is getting pointless, so here's some code:
using System;
using System.Linq;
using System.Collections.Generic;
public static class ToListTest
{
    public static int Main(string[] args)
    {
        List<int> intlist = new List<int>();
        for (int i = 0; i < 1000000; i++)
            intlist.Add(i);
        IEnumerable<int> intenum = intlist;
        for (int i = 0; i < 1000; i++)
        {
            List<int> foo = intenum.ToList();
        }
        return 0;
    }
}
Running this code with an IEnumerable that's really a List goes about 6-10 times faster than if I replace it with a LinkedList or Stack (on my pokey 2.4 GHz P4, using Mono 1.2.6). Conceivably this could be due to some unfortunate interaction between ToList() and the particular implementations of LinkedList or Stack's enumerations, but at least the point remains: speed will depend on the underlying type of the IEnumerable. That said, even with a List as the source, it still takes 6 seconds for me to make 1000 ToList() calls, so it's far from free.
The next question is whether ToList() is any more intelligent than the List constructor. The answer to that turns out to be no: the List constructor is just as fast as ToList(). In hindsight, Jon Skeet's reasoning makes sense - I was just forgetting that ToList() was an extension method. I still (much) prefer ToList() syntactically, but there's no performance reason to use it.
So the short version is that the best answer is still "don't convert to a List if you can avoid it". Barring that, actual performance will depend drastically on what the IEnumerable actually is, but at best it'll be sluggish, as opposed to glacial. I've amended my original answer to reflect this.
From reading the various comments and the question, I gather the following requirements: for a collection of data, you need to run through that collection, filter out some objects, and then perform some transformation on the remaining objects. If that's the case, you can do something like this:
var result = from item in collection
             where item.Id > 10 // or some more sensible condition
             select Operation(item);
and if you need to perform more filtering and transformation, you can nest your LINQ queries like this:
var result = from filteredItem in (from item in collection
                                   where item.Id > 10 // or some more sensible condition
                                   select Operation(item))
             where filteredItem.SomePropertyAvailableAfterFirstTransformation == "new"
             select SecondTransformation(filteredItem);