I have a function that returns a Collection<string> and calls itself recursively to eventually return one big Collection<string>.
Now, I'm wondering what the best approach is to merge the lists. Collection<T>.CopyTo() only copies to string[], and using a foreach loop feels inefficient. However, since I also want to filter out duplicates, I feel like I'll end up with a foreach that calls Contains() on the Collection.
I wonder, is there a more efficient way to have a recursive function that returns a list of strings without duplicates? I don't have to use a Collection; it can be pretty much any suitable data type.
The only restriction: I'm bound to Visual Studio 2005 and .NET 3.0, so no LINQ.
Edit, to clarify: the function takes a user out of Active Directory, looks at the user's direct reports, and then recursively looks at the direct reports of each of those users. So the end result is a list of all users in the "command chain" of a given user. Since this is executed quite often and currently takes 20 seconds for some users, I'm looking for ways to improve it. Caching the result for 24 hours is also on my list, by the way, but I want to see how to improve the function itself before applying caching.
If you're using List<T>, you can use AddRange to append one list to the other.
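For example (a minimal sketch; the list names are made up):

List<string> merged = new List<string>();
merged.AddRange(firstHalf);   // copies all items from firstHalf
merged.AddRange(secondHalf);  // then all items from secondHalf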
Or you can use yield return to combine lists on the fly like this:
public IEnumerable<string> Combine(IEnumerable<string> col1, IEnumerable<string> col2)
{
    // Stream both sequences in turn, without building an intermediate list.
    foreach (string item in col1)
        yield return item;

    foreach (string item in col2)
        yield return item;
}
You might want to take a look at Iesi.Collections and the Extended Generic Iesi.Collections (the extended version exists because the first one was written for .NET 1.1, before generics existed).
Extended Iesi has an ISet class which acts exactly like a HashSet: it enforces unique members and does not allow duplicates.
The nifty thing about Iesi is that it has set operators instead of methods for merging collections, so you can choose between union (|), intersection (&), XOR (^), and so forth.
I think HashSet<T> is a great help.
The HashSet<T> class provides high-performance set operations. A set is a collection that contains no duplicate elements, and whose elements are in no particular order.
Just add items to it and then use CopyTo.
Update: HashSet<T> was introduced in .NET 3.5, so it isn't available on .NET 3.0.
Maybe you can use Dictionary<TKey, TValue>. Assigning to a duplicate key through the indexer will not raise an exception (unlike Add, which throws).
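A minimal sketch of that idea, using the keys of a Dictionary<string, bool> as a stand-in for a set on .NET 3.0, where HashSet<T> doesn't exist (the names here are illustrative):

using System;
using System.Collections.Generic;

class DictionaryAsSetDemo
{
    static void Main()
    {
        Dictionary<string, bool> seen = new Dictionary<string, bool>();

        // The indexer overwrites silently; Dictionary.Add() would throw
        // an ArgumentException on a duplicate key.
        seen["alice"] = true;
        seen["bob"] = true;
        seen["alice"] = true; // no exception; still one entry for "alice"

        Console.WriteLine(seen.Count); // prints 2
        foreach (string name in seen.Keys)
            Console.WriteLine(name);
    }
}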
Can you pass the Collection into your method by reference, so that you can just add items to it? That way you don't have to return anything. This is what it might look like if you did it in C#:
using System;
using System.Collections.ObjectModel;

class Program
{
    static void Main(string[] args)
    {
        Collection<string> myItems = new Collection<string>();
        MyMethod(ref myItems);
        Console.WriteLine(myItems.Count.ToString());
        Console.ReadLine();
    }

    // Adds to the same collection instance on every recursive call,
    // so nothing needs to be returned or merged.
    static void MyMethod(ref Collection<string> myItems)
    {
        myItems.Add("string");
        if (myItems.Count < 5)
            MyMethod(ref myItems);
    }
}
As stated by @Zooba, passing by ref is not necessary here; passing by value will also work, since Collection<string> is a reference type.
As far as merging goes:
I wonder, is there a more efficient way to have a recursive function that returns a list of strings without duplicates? I don't have to use a Collection, it can be pretty much any suitable data type.
Your function assembles a return value, right? You're splitting the supplied list in half, invoking the function again on each half, and then merging the results.
During the merge step, why not just check before you add each string to the result? If it's already there, skip it.
This assumes you're working with sorted lists, of course.
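A rough sketch of that merge step, assuming sorted inputs (the method and parameter names are made up):

// Merge two sorted lists, skipping duplicates along the way.
static List<string> MergeSortedUnique(List<string> left, List<string> right)
{
    List<string> result = new List<string>(left.Count + right.Count);
    int i = 0, j = 0;
    while (i < left.Count || j < right.Count)
    {
        // Take the smaller head; take from left when right is exhausted.
        string next;
        if (j >= right.Count || (i < left.Count &&
            string.CompareOrdinal(left[i], right[j]) <= 0))
            next = left[i++];
        else
            next = right[j++];

        // Skip the item if it equals the last one we emitted.
        if (result.Count == 0 || result[result.Count - 1] != next)
            result.Add(next);
    }
    return result;
}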
Related
I recently learned about List<T>'s ConvertAll method. I used it a couple of times in code at work today to convert a large list of my objects to a list of some other object, and it seems to work really well. However, I'm unsure how efficient or fast it is compared to just iterating the list and converting each object. Does ConvertAll use anything special to speed up the conversion process, or is it just a shorthand way of converting lists without having to set up a loop?
No better way to find out than to go directly to the source, literally :)
http://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs#dbcc8a668882c0db
As you can see, there's no special magic going on. It just iterates over the list and creates each new item by calling the converter function you specify.
To be honest, I was not aware of this method. The more idiomatic .NET way to do this kind of projection is through the use of the Select extension method on IEnumerable<T> like so: source.Select(input => new Something(input.Name)). The advantage of this is threefold:
It's more idiomatic, as I said; ConvertAll is likely a remnant of the pre-C# 3.0 days. It's not a very arcane method by any means, and ConvertAll is a pretty clear description, but it might still be better to stick to what other people know, which is Select.
It's available on all IEnumerable<T>, while ConvertAll only works on instances of List<T>. It doesn't matter if it's an array, a list or a dictionary, Select works with all of them.
Select is lazy. It doesn't do anything until you iterate over it. This means that it returns an IEnumerable<TOutput> which you can then convert to a list by calling ToList() or not if you don't actually need a list. Or if you just want to convert and retrieve the first two items out of a list of a million items, you can simply do source.Select(input => new Something(input.Name)).Take(2).
But if your question is purely about the performance of converting a whole list to another list, then ConvertAll is likely to be somewhat faster as it's less generic than a Select followed by a ToList (it knows that a list has a size and can directly access elements by index from the underlying array for instance).
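For illustration, both of the following produce the same list; the Select version just stays lazy until ToList() forces it (Person, Name, and LoadPeople are made-up names):

List<Person> people = LoadPeople(); // hypothetical source

// Eager: runs the conversion immediately and returns a List<string>.
List<string> names1 = people.ConvertAll(p => p.Name);

// Lazy: nothing executes until ToList() enumerates the query.
List<string> names2 = people.Select(p => p.Name).ToList();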
Decompiled using ILSpy:
public List<TOutput> ConvertAll<TOutput>(Converter<T, TOutput> converter)
{
    if (converter == null)
    {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.converter);
    }
    List<TOutput> list = new List<TOutput>(this._size);
    for (int i = 0; i < this._size; i++)
    {
        list._items[i] = converter(this._items[i]);
    }
    list._size = this._size;
    return list;
}
1. Create a new list.
2. Populate the new list by iterating over the current instance, executing the specified delegate.
3. Return the new list.
Does ConvertAll use anything special to speed up the conversion process, or is it just a shorthand way of converting lists without having to set up a loop?
It doesn't do anything special with regard to conversion (what "special" thing could it do?). It directly modifies the private _items and _size members, so it might be trivially faster under some circumstances.
As usual, if the solution makes you more productive, code easier to read, etc. use it until profiling reveals a compelling performance reason to not use it.
It's the second way you described it - basically a shorthand way, without setting up a loop yourself.
Here's the guts of ConvertAll():
List<TOutput> list = new List<TOutput>(this._size);
for (int index = 0; index < this._size; ++index)
    list._items[index] = converter(this._items[index]);
list._size = this._size;
return list;
Where TOutput is whatever type you're converting to, and converter is a delegate indicating the method that will do the conversion.
So it loops through the List you passed in, running each element through the method you specify, and then returns a new List of the specified type.
For precise timing in your scenario, you need to measure it yourself.
Do not expect any miracles - it has to be an O(n) operation, since each element needs to be converted and added to the destination list.
Consider using Enumerable.Select instead, as its lazy evaluation may let you avoid a second copy of a large list, especially if you need to do any filtering of items along the way.
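For instance, a lazy pipeline like the following (all names here are illustrative) filters and converts in a single pass, with no intermediate list:

IEnumerable<Output> results = source
    .Where(item => item.IsValid)        // filter on the fly
    .Select(item => ConvertItem(item)); // convert only the items that pass

foreach (Output o in results)
    Process(o); // items are produced one at a time, on demand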
So, I've found a piece of code like this:
class CustomDictionary
{
    Dictionary<string, string> backing;
    ...
    public string Get(int index)
    {
        return backing.ElementAtOrDefault(index); // uses LINQ extensions on IEnumerable
    }
}
And then this was used like so:
for (int i = 0; i < mydictionary.Count; i++)
{
    var value = mydictionary.Get(i);
}
Aside from the performance problems and ugliness of doing it this way, is this code actually correct? I.e., is the IEnumerable on Dictionary guaranteed to always return things in the same order, assuming nothing in the dictionary is modified during the iteration?
This is NOT guaranteed. It is guaranteed for a SortedDictionary<TKey, TValue>, of course, and also for arrays and lists - but NOT for a Dictionary<TKey, TValue>.
Chances are, it will be stable if the dictionary is not changed - but it's just not guaranteed. You have to ask yourself - do you feel lucky? ;)
If you want to get the elements in the order they were inserted, then you should probably look into a Stack or a Queue, depending on which elements you want first.
Yes, you'll get the same items.
But as you noted, the method you presented is a very inefficient way to do it.
ElementAtOrDefault is a LINQ extension method on IEnumerable<T>, which means that for each call it will iterate all the way from the start to the specified index.
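If you really do need indexed access, one sketch that avoids the quadratic cost is to snapshot the entries once and index the snapshot, assuming the dictionary is not modified in the meantime:

// One O(n) copy up front instead of O(n) work per indexed lookup.
List<KeyValuePair<string, string>> snapshot =
    new List<KeyValuePair<string, string>>(backing);

for (int i = 0; i < snapshot.Count; i++)
{
    string value = snapshot[i].Value;
    // ... use value ...
}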
I have a dictionary of strings and dates,
private readonly Dictionary<string, DateTime>
as a private member variable in a class.
I have methods which recursively get the next and previous values in the dictionary, given a particular key.
E.g. it takes the given key and does a database lookup; if that is null, it gets the next key, and so on, recursively. It does this both forwards and backwards.
I have used this to get the next one:
Find next record in a set: LINQ
My question is: can I simply add Reverse to the LINQ statement to do this backwards?
Obviously it will work, but I am worried about the state it will leave my member variable in,
i.e. will it be reversed? Or is a copy made when doing the reverse?
Thanks
Enumerable.Reverse does not mutate the sequence on which it is invoked.
As others have said, Enumerable.Reverse will not change the underlying sequence (except in the rare case where enumerating over the sequence will change it).
However, be warned that if you are using a List<T>, you might end up calling List<T>.Reverse, which DOES change the list.
List<int> list = new List<int> { 1, 2, 3 };
IEnumerable<int> asEnumerable = list;

// These call List<int>.Reverse(), which changes the list in place:
list.Reverse();
((List<int>)asEnumerable).Reverse();

// These call Enumerable.Reverse<int>(), which does not change the list
// but returns a new IEnumerable<int> (discarded here; nothing happens
// until the result is enumerated):
asEnumerable.Reverse();
((IEnumerable<int>)list).Reverse();
((IList<int>)list).Reverse();
Sorry, I would have made this a comment on one of the other correct answers, but I wanted to include code.
None of the LINQ methods, including Reverse, will alter the enumerable that is passed to them, UNLESS enumerating the enumerable causes alterations (you'd know if this were the case; few if any of the built-in enumerable types have side effects on data access).
What Reverse likely DOES do, since it requires knowledge of the cardinality of the collection, is slurp the enumerable's elements into an array, which it then iterates in reverse. A lot of oft-used LINQ methods, such as OrderBy, do this behind the scenes.
I am using some of the LINQ select stuff to create some collections, which return IEnumerable<T>.
In my case I need a List<T>, so I am passing the result to List<T>'s constructor to create one.
I am wondering about the overhead of doing this. The items in my collections are usually in the millions, so I need to consider this.
I assume that if the IEnumerable<T> contains value types, the performance will be at its worst.
Am I right? What about reference types? Either way, there is also the cost of calling List<T>.Add a million times, right?
Is there any way to solve this? For example, can I "overload" methods like LINQ's Select using extension methods?
No, there's no particular penalty for the element type being value types, assuming you're using IEnumerable<T> instead of IEnumerable. You won't get any boxing going on.
If you actually know the size of the result beforehand (which the result of Select probably won't) you might want to consider creating the list with that size of buffer, then using AddRange to add the values. Otherwise the list will have to resize its buffer every time it fills it.
For instance, instead of doing:
Foo[] foo = new Foo[100];
IEnumerable<string> query = foo.Select(f => f.Name);
List<string> queryList = new List<string>(query);
you might do:
Foo[] foo = new Foo[100];
IEnumerable<string> query = foo.Select(x => x.Name);
List<string> queryList = new List<string>(foo.Length);
queryList.AddRange(query);
You know that calling Select will produce a sequence of the same length as the original query source, but nothing in the execution environment has that information as far as I'm aware.
It would be best to avoid the need for a list. If you can keep your caller using IEnumerable<T>, you will save yourself some headaches.
LINQ's ToList() will take your enumerable and just construct a new List<T> directly from it, using the List<T>(IEnumerable<T>) constructor. Performance-wise, this is the same as making the list yourself (although LINQ also does a null check).
If you're adding the elements yourself, use the AddRange method instead of Add. ToList() is very similar to AddRange (since it uses the constructor which takes IEnumerable<T>), which will typically be your best bet, performance-wise, in this case.
Generally speaking, a method returning IEnumerable doesn't have to evaluate any of the items before an item is actually needed. So, theoretically, when you return an IEnumerable, none of your items need to exist at that time.
So creating a list means that you really do need to evaluate the items, get them, and place them somewhere in memory (at least references to them). There is nothing that can be done about this - if you really need a list.
A number of other responders have already provided ideas for how to improve the performance of copying an IEnumerable<T> into a List<T> - I don't think that much can be added on that front.
However, based on what you have described you need to do with the results, and the fact that you get rid of the list when you're done (which I presume means that the intermediate results are not interesting) - you may want to consider whether you really need to materialize a List<T>.
Rather than creating a List<T> and operating on the contents of that list, consider writing a lazy extension method for IEnumerable<T> that performs the same processing logic. I've done this myself in a number of cases, and writing such logic in C# is not so bad when using the yield return syntax supported by the compiler.
This approach works well if all you're trying to do is visit each item in the results and collect some information from it. Often, what you need to do is just visit each element in the collection on demand, do some processing with it, and then move on. This approach is generally more scalable and performant than creating a copy of the collection just to iterate over it.
Now, this advice may not work for you for other reasons, but it's worth considering as an alternative to finding the most efficient way to materialize a very large list.
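As a sketch of what such a lazy extension method can look like (the names here are invented, not an existing API):

using System;
using System.Collections.Generic;

public static class EnumerableExtensions
{
    // Transforms each item on demand; no intermediate List<T> is built.
    public static IEnumerable<TResult> SelectLazily<TSource, TResult>(
        this IEnumerable<TSource> source, Func<TSource, TResult> transform)
    {
        foreach (TSource item in source)
        {
            // Work happens only when the caller asks for the next item.
            yield return transform(item);
        }
    }
}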
Don't pass an IEnumerable to the List constructor. IEnumerable has a ToList() method, which can't possibly do worse than that, and has nicer syntax (IMHO).
That said, that only changes the answer to your question to "it depends" - in particular, it depends on what the IEnumerable actually is behind the scenes. If it happens to be a List already, then ToList will effectively be free, of course will go much faster than if it were another type. It's still not super-fast.
The best way to solve this, of course, is to try to figure out how to do your processing on an IEnumerable rather than a List. That may not be possible.
Edit: Some people in the comments are debating whether or not ToList() will actually be any faster when called on a List than if not, and whether ToList() will be any faster than the list constructor. At this point, speculating is getting pointless, so here's some code:
using System;
using System.Linq;
using System.Collections.Generic;

public static class ToListTest
{
    public static int Main(string[] args)
    {
        List<int> intlist = new List<int>();
        for (int i = 0; i < 1000000; i++)
            intlist.Add(i);

        IEnumerable<int> intenum = intlist;
        for (int i = 0; i < 1000; i++)
        {
            List<int> foo = intenum.ToList();
        }
        return 0;
    }
}
Running this code with an IEnumerable that's really a List goes about 6-10 times faster than if I replace it with a LinkedList or Stack (on my pokey 2.4 GHz P4, using Mono 1.2.6). Conceivably this could be due to some unfortunate interaction between ToList() and the particular implementations of LinkedList or Stack's enumerations, but at least the point remains: speed will depend on the underlying type of the IEnumerable. That said, even with a List as the source, it still takes 6 seconds for me to make 1000 ToList() calls, so it's far from free.
The next question is whether ToList() is any more intelligent than the List constructor. The answer to that turns out to be no: the List constructor is just as fast as ToList(). In hindsight, Jon Skeet's reasoning makes sense - I was just forgetting that ToList() was an extension method. I still (much) prefer ToList() syntactically, but there's no performance reason to use it.
So the short version is that the best answer is still "don't convert to a List if you can avoid it". Barring that, actual performance will depend drastically on what the IEnumerable actually is, but at best it'll be sluggish, as opposed to glacial. I've amended my original answer to reflect this.
From reading the various comments and the question, I get the following requirement:
for a collection of data, you need to run through that collection, filter out some objects, and then perform some transformation on the remaining objects. If that's the case, you can do something like this:
var result = from item in collection
             where item.Id > 10 // or some more sensible condition
             select Operation(item);
and if you need to perform more filtering and transformation, you can nest your LINQ queries, like:
var result = from filteredItem in (from item in collection
                                   where item.Id > 10 // or some more sensible condition
                                   select Operation(item))
             where filteredItem.SomePropertyAvailableAfterFirstTransformation == "new"
             select SecondTransformation(filteredItem);
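The same pipeline in method syntax, for comparison:

var result = collection
    .Where(item => item.Id > 10)
    .Select(item => Operation(item))
    .Where(x => x.SomePropertyAvailableAfterFirstTransformation == "new")
    .Select(x => SecondTransformation(x));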
One would think the simple code
llist1.Last.Next = llist2.First;
llist2.First.Previous = llist1.Last;
would work; however, apparently in C#'s LinkedList<T>, First, Last, and the node properties Next and Previous are get-only.
The other method I could think of was
llist1.AddLast(llist2.First);
However, this does not work either - it fails because the first node of llist2 is already in a linked list.
Does this mean that I have to have a loop that manually calls AddLast on llist1 for each node of llist2? Doesn't this defeat the efficiency of linked lists?
Yes, you have to loop, unfortunately. This is an O(n) operation - O(1) for each entry added. There's no risk of requiring a buffer to be resized and copied, etc - although of course garbage collection might do roughly that :) You could even write handy extension methods:
public static class LinkedListExtensions
{
    public static void AppendRange<T>(this LinkedList<T> source,
                                      IEnumerable<T> items)
    {
        foreach (T item in items)
        {
            source.AddLast(item);
        }
    }

    public static void PrependRange<T>(this LinkedList<T> source,
                                       IEnumerable<T> items)
    {
        LinkedListNode<T> first = source.First;

        // If the list is empty, we can just append everything.
        if (first == null)
        {
            AppendRange(source, items);
            return;
        }

        // Otherwise, add each item in turn just before the original first item.
        foreach (T item in items)
        {
            source.AddBefore(first, item);
        }
    }
}
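Usage is then a one-liner; note that this copies the values across rather than splicing nodes, so llist2 is left intact:

LinkedList<int> llist1 = new LinkedList<int>(new[] { 1, 2, 3 });
LinkedList<int> llist2 = new LinkedList<int>(new[] { 4, 5, 6 });

llist1.AppendRange(llist2); // llist1: 1 2 3 4 5 6; llist2 unchanged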
EDIT: Erich's comment suggests why you might think this is inefficient - why not just join the two lists together by updating the "next" pointer of the tail of the first list and the "prev" pointer of the head of the second? Well, think about what would happen to the second list... it would have changed as well.
Not only that, but what would happen to the ownership of those nodes? Each is essentially part of two lists now... but the LinkedListNode<T>.List property can only talk about one of them.
While I can see why you might want to do this in some cases, the way that the .NET LinkedList<T> type has been built basically prohibits it. I think this doc comment explains it best:
The LinkedList<T> class does not support chaining, splitting, cycles, or other features that can leave the list in an inconsistent state.
llist1 = new LinkedList<T>(llist1.Concat(llist2));
this concatenates the two lists (requires .NET 3.5). The drawback is that it creates a new instance of LinkedList<T>, which may not be what you want... You could do something like this instead:
foreach (var item in llist2)
{
    llist1.AddLast(item);
}
Here you can find my linked list implementation with O(1) concat and split times.
Why .NET LinkedList does not support Concat and Split operations?
Short summary
Advantages vs .NET LinkedList:
Less memory consumption: every node (SimpleLinkedListNode) has three pointers (prev, next, value) instead of the four (prev, next, list, value) of the original .NET implementation.
Supports Concat and Split operations in O(1).
Supports an IEnumerable Reverse() enumerator in O(1) - by the way, I do not see any reason why this isn't provided natively on the .NET LinkedList; the corresponding extension method requires O(n).
Disadvantages:
Does not support Count.
Concat operation leaves second list in an inconsistent state.
Split operation leaves original list in an inconsistent state.
You are able to share nodes between lists.
Other:
I have chosen an alternative strategy for implementing the enumeration and find operations, rather than the more verbose but more readable approach of the original implementation. I hope the negative performance impact remains insignificant.
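For a feel of how an O(1) concat can work with such a structure, here is a minimal sketch (my own illustration, not the implementation linked above) that splices two node chains by rewiring a few pointers:

class SimpleNode<T>
{
    public T Value;
    public SimpleNode<T> Prev, Next;
}

class SimpleLinkedList<T>
{
    public SimpleNode<T> Head, Tail;

    // O(1): rewires pointers instead of copying nodes. As noted above,
    // this leaves 'other' sharing nodes with this list, i.e. in an
    // inconsistent state.
    public void Concat(SimpleLinkedList<T> other)
    {
        if (other.Head == null) return;
        if (Head == null) { Head = other.Head; Tail = other.Tail; return; }

        Tail.Next = other.Head;
        other.Head.Prev = Tail;
        Tail = other.Tail;
    }
}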