So, I've found a piece of code like this:
class CustomDictionary
{
    Dictionary<string, string> backing;
    ...
    public string Get(int index)
    {
        return backing.ElementAtOrDefault(index); // use LINQ extensions on IEnumerable
    }
}
And then this was used like so:
for (int i = 0; i < mydictionary.Count; i++)
{
    var value = mydictionary.Get(i);
}
Aside from the performance problems and ugliness of doing it this way, is this code actually correct? I.e., is the IEnumerable on Dictionary guaranteed to always return things in the same order, assuming nothing in the dictionary is modified during the iteration?
This is NOT guaranteed. It is for a SortedDictionary<>, of course, and also for arrays and lists. But NOT for a Dictionary.
Chances are, it will be stable if the dictionary is not changed - but it's just not guaranteed. You have to ask yourself - do you feel lucky? ;)
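To make the risk concrete, here is a minimal sketch. Whether the freed slot actually gets reused is an implementation detail of the current runtimes, which is exactly why you shouldn't rely on the order:

using System;
using System.Collections.Generic;

class DictionaryOrderDemo
{
    static void Main()
    {
        var d = new Dictionary<string, string>();
        d.Add("a", "1");
        d.Add("b", "2");
        d.Add("c", "3");

        d.Remove("a");
        d.Add("d", "4"); // may reuse the slot freed by "a"

        // On typical .NET runtimes this prints d, b, c -- not insertion
        // order. The exact output is an implementation detail.
        foreach (var pair in d)
            Console.WriteLine(pair.Key);
    }
}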
If you want to get the elements back in the order they were inserted, then you should probably look into Stack or Queue, depending on which elements you want first.
Yes, you'll get the same items.
That said, as you noted, the method you presented is a very inefficient way to do it.
ElementAtOrDefault is a LINQ extension method on IEnumerable, which means that for each call it iterates all the way to the specified index.
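If you can touch the CustomDictionary class itself, exposing the enumerator and using foreach avoids the problem entirely. A minimal sketch (the dictionary contents are made up):

using System;
using System.Collections.Generic;

class EnumerateOnceDemo
{
    static void Main()
    {
        var backing = new Dictionary<string, string>
        {
            { "a", "1" },
            { "b", "2" },
            { "c", "3" }
        };

        // Enumerating once is O(n) in total; calling ElementAtOrDefault(i)
        // inside a for loop is O(n^2), because every call restarts the
        // enumeration from the beginning.
        foreach (KeyValuePair<string, string> pair in backing)
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}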
Related
Could the order of items in list1 and list2 be different when list2 = list1.ToList() and both are of type List?
If list1 is consistent in its ordering, then list2 will be in the same order.
It's possible that list1 is some type that doesn't itself promise to have the same order every time it is enumerated, in which case of course the two might differ, but it is the enumeration logic of list1 that is responsible for that, not ToList(). (The name of list1 suggests that it is itself a list, in which case the orders would certainly be the same).
One answer here already includes the source of one of the implementations of ToList(). It is not the only version of ToList() that exists, and corefx optimises for many more cases than netfx does, but it remains that all versions produce the list in the same order as their source would deliver the items on enumeration.
Another answer says that this is not guaranteed in the documentation, only by the description of the overload of the List<T> constructor that takes an enumeration (which, incidentally, is not the only constructor used by all implementations of ToList() in all cases).
However, a change to ToList() that did not promise to maintain the order would not be accepted.
Consider the case of someSource.OrderBy(x => x.ID).ToList(). In such a case (which incidentally, is a case that is optimised in corefx) if ToList() could change the order it would obviously remove the point of the OrderBy().
Okay, so what if someone changed ToList() in a way that didn't promise to maintain order, but treated OrderBy() as a special case? (After all, it's already a special case for performance reasons in one version.) Well, that would still break, say, someSource.OrderBy(x => x.ID).Where(x => !x.Deleted).ToList(). In all, if we had a version of ToList() that didn't maintain order, we'd be able to come up with some sort of linq query where a given order was promised by another part of the query and such an implementation of ToList() broke the promise of the query as a whole.
So, barring special-casing a source that explicitly doesn't promise to maintain order (ParallelEnumerable doesn't unless you use AsOrdered(), since there are a lot of advantages of not maintaining an order unless really necessary when it comes to parallel processing) we can't make a change to ToList() that doesn't maintain order without breaking the promises of linq queries as a whole.
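The ParallelEnumerable caveat is easy to see in a quick sketch (standard PLINQ operators, nothing exotic):

using System;
using System.Linq;

class ParallelOrderDemo
{
    static void Main()
    {
        var source = Enumerable.Range(0, 10000);

        // Without AsOrdered, PLINQ makes no ordering promise at all.
        var unordered = source.AsParallel().Select(x => x * 2).ToList();

        // AsOrdered restores the promise, at some cost to parallel throughput.
        var ordered = source.AsParallel().AsOrdered().Select(x => x * 2).ToList();

        Console.WriteLine(ordered.SequenceEqual(source.Select(x => x * 2))); // True
        Console.WriteLine(unordered.SequenceEqual(ordered)); // may be True or False
    }
}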
So while the guarantee isn't called out in the documentation of ToList(), it is nevertheless guaranteed and will not be changed in a later version.
The general answer is no: order is not guaranteed to be preserved, even if both lists are of type List.
Because List is not a sealed class, another class could derive from it and reimplement GetEnumerator, possibly returning items out of order.
Sounds strange, yes, but it's possible. So you can't say ToList will return the exact same list unless both are the concrete type List and not any derived type.
The other answer says that it's an implementation detail that could change in the future. I don't think so: List is a very essential part of .NET collections, and such an unreasonable breaking change is very unlikely.
Don't worry: as long as you use the concrete List, order is always preserved.
The simple answer is no: ToList will just loop over the source enumerable and keep the same order. List<T> guarantees order, so calling ToList on it won't change it.
The more nuanced answer, however, is that you may not be starting with a List<T> and may instead have a more general IEnumerable<T>, which does not guarantee order at all. This means that multiple calls to source.ToList() may produce different outputs.
In practice however, virtually all implementations of IEnumerable will preserve order.
For starters: it's safe to say everybody expects that. But why?
According to the documentation, the constructor of List<T> that takes an IEnumerable<T> is guaranteeing the order is preserved:
The elements are copied onto the List in the same order they are read by the enumerator of the collection.
While the documentation of .ToList() makes no such promises (doesn't say anything to the contrary either though).
Internally, one uses the other, so you are safe, but you are not guaranteed to be safe should the internal implementation of .ToList() change. So if you want to be sure, you should call new List<T>(oldList) directly.
Small print: if you are nitpicky about it... I could not find a guarantee that the IEnumerable<T> interface would return the elements of a list in order either. So either way, you have to look at what the current behavior is, and if you need to rely on it, maybe write some unit tests asserting that behavior so you get notified immediately should it change.
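Such a test could look roughly like this (a minimal sketch; xUnit is just one choice of test framework here):

using System.Linq;
using Xunit;

public class ToListOrderTests
{
    [Fact]
    public void ToList_PreservesEnumerationOrder()
    {
        var source = new[] { 3, 1, 4, 1, 5, 9 };

        var copy = source.ToList();

        // Fails loudly if a future runtime ever stopped preserving order.
        Assert.True(source.SequenceEqual(copy));
    }
}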
There should be no difference. Check out the source code:
public static List<TSource> ToList<TSource>(this IEnumerable<TSource> source) {
    if (source == null) throw Error.ArgumentNull("source");
    return new List<TSource>(source);
}
And the part where it creates the list:
// Constructs a List, copying the contents of the given collection. The
// size and capacity of the new list will both be equal to the size of the
// given collection.
//
public List(IEnumerable<T> collection) {
    if (collection == null)
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.collection);
    Contract.EndContractBlock();

    ICollection<T> c = collection as ICollection<T>;
    if (c != null) {
        int count = c.Count;
        if (count == 0) {
            _items = _emptyArray;
        }
        else {
            _items = new T[count];
            c.CopyTo(_items, 0);
            _size = count;
        }
    }
    else {
        _size = 0;
        _items = _emptyArray;
        // This enumerable could be empty. Let Add allocate a new array, if needed.
        // Note it will also go to _defaultCapacity first, not 1, then 2, etc.
        using (IEnumerator<T> en = collection.GetEnumerator()) {
            while (en.MoveNext()) {
                Add(en.Current);
            }
        }
    }
}
https://github.com/Microsoft/referencesource/blob/master/System.Core/System/Linq/Enumerable.cs
I recently learned about List's .ConvertAll method. I used it a couple of times in code today at work to convert a large list of my objects to a list of some other objects. It seems to work really well. However, I'm unsure how efficient or fast this is compared to just iterating the list and converting each object. Does .ConvertAll use anything special to speed up the conversion process, or is it just a shorthand way of converting lists without having to set up a loop?
No better way to find out than to go directly to the source, literally :)
http://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs#dbcc8a668882c0db
As you can see, there's no special magic going on. It just iterates over the list and creates a new item by the converter function that you specify.
To be honest, I was not aware of this method. The more idiomatic .NET way to do this kind of projection is through the use of the Select extension method on IEnumerable<T> like so: source.Select(input => new Something(input.Name)). The advantage of this is threefold:
It's more idiomatic, as I said; ConvertAll is likely a remnant of the pre-C# 3.0 days. It's not a very arcane method by any means, and ConvertAll is a pretty clear description, but it might still be better to stick to what other people know, which is Select.
It's available on all IEnumerable<T>, while ConvertAll only works on instances of List<T>. It doesn't matter if it's an array, a list or a dictionary, Select works with all of them.
Select is lazy. It doesn't do anything until you iterate over it. This means that it returns an IEnumerable<TOutput> which you can then convert to a list by calling ToList() or not if you don't actually need a list. Or if you just want to convert and retrieve the first two items out of a list of a million items, you can simply do source.Select(input => new Something(input.Name)).Take(2).
But if your question is purely about the performance of converting a whole list to another list, then ConvertAll is likely to be somewhat faster as it's less generic than a Select followed by a ToList (it knows that a list has a size and can directly access elements by index from the underlying array for instance).
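To make the comparison concrete, here is a small side-by-side sketch (the Person type is made up for illustration):

using System;
using System.Collections.Generic;
using System.Linq;

class Person
{
    public string Name;
    public Person(string name) { Name = name; }
}

class ProjectionDemo
{
    static void Main()
    {
        var people = new List<Person> { new Person("Ann"), new Person("Bob") };

        // ConvertAll: eager, List<T>-only, pre-sizes the result.
        List<string> a = people.ConvertAll(p => p.Name.ToUpper());

        // Select + ToList: lazy until ToList, works on any IEnumerable<T>.
        List<string> b = people.Select(p => p.Name.ToUpper()).ToList();

        Console.WriteLine(a.SequenceEqual(b)); // True
    }
}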
Decompiled using ILSPy:
public List<TOutput> ConvertAll<TOutput>(Converter<T, TOutput> converter)
{
    if (converter == null)
    {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.converter);
    }
    List<TOutput> list = new List<TOutput>(this._size);
    for (int i = 0; i < this._size; i++)
    {
        list._items[i] = converter(this._items[i]);
    }
    list._size = this._size;
    return list;
}
Create a new list.
Populate the new list by iterating over the current instance, executing the specified delegate.
Return the new list.
Does .ConvertAll use anything special to speed up the conversion process or is it just a short hand way of converting Lists without having to set up a loop?
It doesn't do anything special with regards to conversion (what "special" thing could it do?). It directly modifies the private _items and _size members, so it might be trivially faster under some circumstances.
As usual, if the solution makes you more productive, code easier to read, etc. use it until profiling reveals a compelling performance reason to not use it.
It's the second way you described it - basically a short-hand way without setting up a loop.
Here's the guts of ConvertAll():
List<TOutput> list = new List<TOutput>(this._size);
for (int index = 0; index < this._size; ++index)
    list._items[index] = converter(this._items[index]);
list._size = this._size;
return list;
Where TOutput is whatever type you're converting to, and converter is a delegate indicating the method that will do the conversion.
So it loops through the List you passed in, running each element through the method you specify, and then returns a new List of the specified type.
For precise timing in your scenarios you need to measure yourself.
Do not expect any miracles; it has to be an O(n) operation, since each element needs to be converted and added to the destination list.
Consider using Enumerable.Select instead, as it does lazy evaluation, which may allow avoiding a second copy of a large list, especially if you need to do any filtering of items along the way.
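A rough sketch of such a lazy pipeline (the data is just filler):

using System;
using System.Collections.Generic;
using System.Linq;

class LazyPipelineDemo
{
    static void Main()
    {
        List<int> source = Enumerable.Range(0, 1000000).ToList();

        // No intermediate list is materialized: filtering and conversion
        // happen element by element as the result is consumed.
        IEnumerable<string> converted = source
            .Where(n => n % 2 == 0)
            .Select(n => n.ToString());

        foreach (string s in converted.Take(3))
            Console.WriteLine(s); // prints 0, 2, 4
    }
}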
Currently I have the following syntax (list is a list containing objects with many different properties, where Title is one of them):
for (int i = 0; i < list.Count; i++)
{
    if (title == list[i].Title)
    {
        // do something
    }
}
How can I access list[i].Title without having to loop over my entire collection? Since my list tends to grow large, this can impact the performance of my program.
I have a lot of similar syntax across my program (accessing public properties through a for loop and by index), but I'm sure there must be a better and more elegant way of doing this.
The Find method does seem to be an option, since my list contains objects.
I don't know what you mean exactly, but technically speaking, this is not possible without a loop.
Maybe you mean using LINQ, for example:
list.Where(x=>x.Title == title)
It's worth mentioning that the iteration is not skipped, but simply wrapped inside the LINQ query.
Hope this helps.
EDIT
In other words, if you are really concerned about performance, keep coding the way you are already doing. Otherwise, choose LINQ for more concise and clear syntax.
Here comes Linq:
var listItem = list.Single(i => i.Title == title);
It throws an exception if there's no item matching the predicate. Alternatively, there's SingleOrDefault.
If you want a collection of items matching the title, there's:
var listItems = list.Where(i => i.Title == title);
I had to use it for a condition. If you don't need the index, add

using System.Linq;

and use

if (list.Any(x => x.Title == title))
{
    // do something here
}

This will tell you whether any element satisfies your given condition.
I'd suggest storing these in a Hashtable. You can then access an item in the collection using its key; it's a much more efficient lookup.
var myObjects = new Hashtable();
myObjects.Add(yourObject.Title, yourObject);
...
var myRetrievedObject = myObjects["TargetTitle"];
Consider creating an index. A dictionary can do the trick. If you need the list semantics, subclass and keep the index as a private member...
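A minimal sketch of that idea (Item, IndexedItemList and TryGetByTitle are hypothetical names; note that adding through a plain List<Item> reference would bypass the hiding Add and therefore the index):

using System;
using System.Collections.Generic;

class Item
{
    public string Title { get; set; }
}

class IndexedItemList : List<Item>
{
    // The private index: title -> item.
    private readonly Dictionary<string, Item> _byTitle =
        new Dictionary<string, Item>();

    public new void Add(Item item)
    {
        base.Add(item);
        _byTitle[item.Title] = item; // last one wins on duplicate titles
    }

    public bool TryGetByTitle(string title, out Item item)
    {
        return _byTitle.TryGetValue(title, out item);
    }
}

class IndexDemo
{
    static void Main()
    {
        var list = new IndexedItemList();
        list.Add(new Item { Title = "Report" });

        Item found;
        if (list.TryGetByTitle("Report", out found)) // O(1) instead of O(n)
            Console.WriteLine(found.Title);
    }
}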
ObservableCollection is a list, so if you don't know the element's position you have to look at each element until you find the expected one.
Possible optimization
If your elements are sorted, use a binary search to improve performance; otherwise, use a Dictionary as an index.
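A sketch of the binary search option, assuming the list is kept sorted by Title (Doc and TitleComparer are made-up names):

using System;
using System.Collections.Generic;

class Doc
{
    public string Title { get; set; }
}

class TitleComparer : IComparer<Doc>
{
    public int Compare(Doc x, Doc y)
    {
        return string.CompareOrdinal(x.Title, y.Title);
    }
}

class BinarySearchDemo
{
    static void Main()
    {
        // BinarySearch is only valid on a list sorted by the same comparer.
        var docs = new List<Doc>
        {
            new Doc { Title = "Alpha" },
            new Doc { Title = "Beta" },
            new Doc { Title = "Gamma" }
        };

        int index = docs.BinarySearch(new Doc { Title = "Beta" }, new TitleComparer());
        Console.WriteLine(index >= 0 ? docs[index].Title : "not found"); // Beta
    }
}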
You're looking for a hash-based collection (like a Dictionary or HashSet), which the ObservableCollection is not. The best solution might be to derive from a hash-based collection and implement INotifyCollectionChanged, which will give you the same behavior as an ObservableCollection.
Well, if you have N objects and you need to get the Title of all of them, you have to use a loop. If you only need the title and you really want to improve this, maybe you can keep a separate array containing only the titles; this would improve the performance.
You need to define the amount of memory available and the number of objects you can handle before saying this damages the performance, and in any case the solution would be changing the design of the program, not the algorithm.
Maybe this approach would solve the problem:
int result = obsCollection.IndexOf(title);
IndexOf(T)
Searches for the specified object and returns the zero-based index of the first occurrence within the entire Collection.
(Inherited from Collection)
https://learn.microsoft.com/en-us/dotnet/api/system.collections.objectmodel.observablecollection-1?view=netframework-4.7.2#methods
An ObservableCollection can be converted to a List:

BuchungsSatz item = BuchungsListe.ToList().Find(x => x.BuchungsAuftragId == DGBuchungenAuftrag.CurrentItem.Id);
I just found out by chance that Add(T) is defined on ICollection<T> instead of IEnumerable<T>, and the extension methods in Enumerable.cs don't contain Add(T), which I think is really weird. Since an object is enumerable, it must "look like" a collection of items. Can anyone tell me why?
An IEnumerable<T> is just a sequence of elements; see it as a forward-only cursor. Because a lot of those sequences are generated values, streams of data, or record sets from a database, it makes no sense to Add items to them.
IEnumerable is for reading, not for writing.
An enumerable is exactly that - something you can enumerate over and discover all the items. It does not imply that you can add to it.
Being able to enumerate is universal to many types of objects. For example, it is shared by arrays and collections. But you can't 'add' to an array without messing about with its structure, whereas a Collection is specifically built to be added to and removed from.
Technically you can 'add' to an enumerable by using Concat<>; however, all this does is create an enumerator that enumerates from one enumerable to the next, giving the illusion of a single contiguous set.
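A small example of that, using the standard Enumerable.Concat:

using System;
using System.Linq;

class ConcatDemo
{
    static void Main()
    {
        var first = new[] { "Red", "Blue" };
        var second = new[] { "Green" };

        // Nothing is copied and nothing is 'added'; the result simply
        // enumerates first, then second, on demand.
        foreach (var color in first.Concat(second))
            Console.WriteLine(color); // Red, Blue, Green
    }
}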
Each ICollection should be IEnumerable (I think, and the .NET Framework team seems to agree with me ;-)), but the other way around does not always make sense. There is a hierarchy of "collection like objects" in this world, and your assumption that an enumerable would be a collection you can add items to does not hold true in that hierarchy.
Example: a list of primary color names would be an IEnumerable returning "Red", "Blue" and "Green". It would make no logical sense at all to be able to call primaryColors.Add("Bright Purple") on a "collection" filled like this:
...whatever...
{
    ...
    var primaryColors = EnumeratePrimaryColors();
    ...
}

private static IEnumerable<string> EnumeratePrimaryColors() {
    yield return "Red";
    yield return "Blue";
    yield return "Green";
}
As its name says, you can enumerate (loop) over an IEnumerable, and that's about it.
When you want to be able to Add something to it, it wouldn't be just an enumerable anymore, since it would have extra features.
For instance, an array is an IEnumerable, but an array has a fixed length, so you can't add new items to it.
IEnumerable is just the 'base' for all kinds of collections (even read-only collections, which obviously have no Add() method).
The more functionality you'd add to such a 'base interface', the more specific it would become.
The name says it all. IEnumerable is for enumerating items only. ICollection is the actual collection of items and thus supports the Add method.
I've got a function that returns a Collection<string>, and that calls itself recursively to eventually return one big Collection<string>.
Now, I just wonder what the best approach is to merge the lists. Collection.CopyTo() only copies to string[], and using a foreach() loop feels inefficient. However, since I also want to filter out duplicates, I feel like I'll end up with a foreach that calls Contains() on the Collection.
I wonder, is there a more efficient way to have a recursive function that returns a list of strings without duplicates? I don't have to use a Collection; it can be pretty much any suitable data type.
The only restriction: I'm bound to Visual Studio 2005 and .NET 3.0, so no LINQ.
Edit: To clarify: the function takes a user out of Active Directory, looks at the user's direct reports, and then recursively looks at the direct reports of every one of those users. So the end result is a list of all users that are in the "command chain" of a given user. Since this is executed quite often and at the moment takes 20 seconds for some users, I'm looking for ways to improve it. Caching the result for 24 hours is also on my list, by the way, but I want to see how to improve it before applying caching.
If you're using List<> you can use .AddRange to add one list to the other list.
Or you can use yield return to combine lists on the fly like this:
public IEnumerable<string> Combine(IEnumerable<string> col1, IEnumerable<string> col2)
{
    foreach (string item in col1)
        yield return item;

    foreach (string item in col2)
        yield return item;
}
You might want to take a look at Iesi.Collections and Extended Generic Iesi.Collections (because the first edition was made in 1.1 when there were no generics yet).
Extended Iesi has an ISet class which acts exactly as a HashSet: it enforces unique members and does not allow duplicates.
The nifty thing about Iesi is that it has set operators instead of methods for merging collections, so you have the choice between a union (|), intersection (&), XOR (^) and so forth.
I think HashSet<T> is a great help.
The HashSet<T> class provides high performance set operations. A set is a collection that contains no duplicate elements, and whose elements are in no particular order.
Just add items to it and then use CopyTo.
Update: HashSet<T> is in .Net 3.5
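Sketching how that might look for the direct-reports scenario (GetDirectReports is a hypothetical stand-in for the actual Active Directory query):

using System;
using System.Collections.Generic;

class CommandChainDemo
{
    // Hypothetical lookup; stands in for the real Active Directory query.
    static IEnumerable<string> GetDirectReports(string user)
    {
        if (user == "boss") return new[] { "alice", "bob" };
        if (user == "alice") return new[] { "bob" }; // duplicate on purpose
        return new string[0];
    }

    static void Collect(string user, HashSet<string> seen)
    {
        foreach (string report in GetDirectReports(user))
        {
            // Add returns false if the item is already present, so
            // duplicates (and cycles) also stop the recursion.
            if (seen.Add(report))
                Collect(report, seen);
        }
    }

    static void Main()
    {
        var seen = new HashSet<string>();
        Collect("boss", seen);

        string[] result = new string[seen.Count];
        seen.CopyTo(result); // CopyTo, as suggested above
        Console.WriteLine(string.Join(", ", result)); // alice, bob
    }
}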
Maybe you can use Dictionary<TKey, TValue>. Setting a duplicate key on a dictionary via the indexer will not raise an exception.
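A tiny example of the difference between the indexer and Add:

using System;
using System.Collections.Generic;

class DictionaryDedupDemo
{
    static void Main()
    {
        var seen = new Dictionary<string, bool>();

        seen["alice"] = true;
        seen["alice"] = true;  // indexer assignment: no exception, value overwritten
        // seen.Add("alice", true); // Add, by contrast, would throw here

        Console.WriteLine(seen.Count); // 1
    }
}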
Can you pass the Collection into your method by reference so that you can just add items to it? That way you don't have to return anything. This is what it might look like if you did it in C#.
class Program
{
    static void Main(string[] args)
    {
        Collection<string> myitems = new Collection<string>();
        myMthod(ref myitems);
        Console.WriteLine(myitems.Count.ToString());
        Console.ReadLine();
    }

    static void myMthod(ref Collection<string> myitems)
    {
        myitems.Add("string");
        if (myitems.Count < 5)
            myMthod(ref myitems);
    }
}
As stated by @Zooba, passing by ref is not necessary here; if you pass by value it will also work.
As far as merging goes:
I wonder, is there a more efficient way to have a recursive function that returns a list of strings without duplicates? I don't have to use a Collection, it can be pretty much any suitable data type.
Your function assembles a return value, right? You're splitting the supplied list in half, invoking self again (twice) and then merging those results.
During the merge step, why not just check before you add each string to the result? If it's already there, skip it.
Assuming you're working with sorted lists, of course.
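A rough sketch of such a merge for two sorted lists (made-up names, not the asker's actual code):

using System;
using System.Collections.Generic;

class SortedMergeDemo
{
    // Merge two sorted lists, skipping duplicates as we go.
    static List<string> Merge(List<string> left, List<string> right)
    {
        var result = new List<string>(left.Count + right.Count);
        int i = 0, j = 0;
        while (i < left.Count || j < right.Count)
        {
            string next;
            if (j >= right.Count) next = left[i++];
            else if (i >= left.Count) next = right[j++];
            else if (string.CompareOrdinal(left[i], right[j]) <= 0) next = left[i++];
            else next = right[j++];

            // Skip if it's the same as the last thing we added.
            if (result.Count == 0 || result[result.Count - 1] != next)
                result.Add(next);
        }
        return result;
    }

    static void Main()
    {
        var merged = Merge(
            new List<string> { "a", "c", "e" },
            new List<string> { "b", "c", "d" });

        Console.WriteLine(string.Join(", ", merged.ToArray())); // a, b, c, d, e
    }
}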