I am looking for an elegant solution for the following situation:
I have a class that contains a List like
class MyClass
{
    ...
    public List<SomeOtherClass> SomeOtherClassList { get; set; }
    ...
}
A third class called Model holds a List<MyClass>, which is the one I operate on from outside.
Now I would like to extend the Model class with a method that returns all unique SomeOtherClass instances over all MyClass instances.
I know that there is the Union() method, and with a foreach loop I could solve this issue easily, which I actually did. However, since I am new to all the C#3+ features, I am curious how this could be achieved more elegantly, with or without LINQ.
I have found an approach, that seems rather clumsy to me, but it works:
List<SomeOtherClass> ret = new List<SomeOtherClass>();
MyClassList.Select(b => b.SomeOtherClasses).ToList().ForEach(l => ret = ret.Union(l).ToList());
return ret;
Note: The b.SomeOtherClasses property returns a List<SomeOtherClass>.
This code is far from perfect, and some questions arise from the fact that I am still figuring out what is good style when working with C#3 and what is not. So I made a little list of thoughts about that snippet, which I would be glad to get a few comments on. Apart from that, I'd be glad to hear suggestions on how to improve this code further.
The temporary list ret would perhaps have been part of a C#2 approach, but is it correct that I should be able to dispense with this list by using method chaining instead? Or am I missing the point?
Is it really required to use the intermediate ToList() method? All I want is to perform a further action on each member of a selection.
What is the cost of those ToList() operations? Are they good style? Necessary?
Thanks.
You are looking for SelectMany() + Distinct():
List<SomeOtherClass> ret = MyClassList.SelectMany(x => x.SomeOtherClasses)
                                      .Distinct()
                                      .ToList();
SelectMany() will flatten the "list of lists" into one list, then you can just pick out the distinct entries in this enumeration instead of using union between individual sub-lists.
In general you will want to avoid side effects with LINQ; your original approach is kind of abusing this by modifying ret, which is not part of the query.
ToList() is required since each standard query operator returns a new enumeration and does not modify the existing enumeration, hence you have to convert the final enumeration result back to a list. The cost of ToList() is a full iteration of the enumeration which, in most cases, is negligible. Of course if your class could use an IEnumerable<SomeOtherClass> instead, you do not have to convert to a list at all.
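For illustration, here is a minimal sketch of both flavors as methods on the Model class (assuming Model exposes the List<MyClass> as a property named MyClassList, as the snippet above does):

// Eager: materializes the unique instances into a list once.
public List<SomeOtherClass> GetUniqueOtherClasses()
{
    return MyClassList.SelectMany(m => m.SomeOtherClassList)
                      .Distinct()
                      .ToList();
}

// Lazy: callers can iterate, filter, or count without a list ever being built.
public IEnumerable<SomeOtherClass> EnumerateUniqueOtherClasses()
{
    return MyClassList.SelectMany(m => m.SomeOtherClassList).Distinct();
}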
You should have a look at SelectMany. Something like this should generate your "flat" list:
MyClassList.SelectMany(b => b.SomeOtherClasses)
It will return an IEnumerable<SomeOtherClass> which you can filter/process further.
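One caveat worth noting: Distinct() uses the default equality comparer, which for most reference types means reference equality. If two different SomeOtherClass instances holding the same data should count as duplicates, you can pass a custom IEqualityComparer<T>. A sketch, assuming a hypothetical Id key property on SomeOtherClass (not part of the question):

class SomeOtherClassComparer : IEqualityComparer<SomeOtherClass>
{
    public bool Equals(SomeOtherClass a, SomeOtherClass b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a == null || b == null) return false;
        return a.Id == b.Id; // Id is an assumed key property
    }

    public int GetHashCode(SomeOtherClass obj)
    {
        return obj.Id.GetHashCode();
    }
}

// Usage:
var unique = MyClassList.SelectMany(b => b.SomeOtherClasses)
                        .Distinct(new SomeOtherClassComparer())
                        .ToList();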
Why won't IEnumerable.ToList() work in a case like this:
var _listReleases = new List<string>();
_listReleases.Add("C#");
_listReleases.Add("Javascript");
_listReleases.Add("Python");
IEnumerable sortedItems = _listReleases.OrderBy(x => x);
_listReleases.Clear();
_listReleases.AddRange(sortedItems); // won't work
_listReleases.AddRange(sortedItems.ToList()); // won't work
Note: _listReleases will be null
It doesn't work because of this line:
_listReleases.Clear();
First of all, _listReleases is not null at this point. It's merely empty, which is a completely different thing.
But to explain why this doesn't work as you expect: the IEnumerable interface type does not actually allocate or reserve storage for anything. It represents an object that you can use with a foreach loop, and nothing more. It does not actually need to store the items in the collection itself.
Sometimes, an IEnumerable reference does have the items stored in the same object, but it doesn't have to. That's what's going on here. The OrderBy() extension method only creates an object that knows how to look at the original list and return the items in a specific order, but it does not have storage for those items. It still depends on its original data source.
The best solution for this situation is to stop using the _listReleases variable at this point, and instead just use the sortedItems variable. As long as the former is not garbage collected, the latter will do what you need. But if you really want the _listReleases variable, you can do it like this:
_listReleases = sortedItems.ToList();
Now back to IEnumerables. There are some nice benefits to this property of not requiring immediate storage of the items themselves, and merely abstracting the ability to iterate over a collection:
Lazy Evaluation - The work required to produce the items is not done until it is called for (and often that means it won't need to be done at all, greatly improving performance).
Composition - An IEnumerable object can be modified during a program to incorporate new sets of rules or operations into the final result. This reduces program complexity and improves maintainability by allowing you to break a complex set of sorting or filtering requirements apart into its component parts. It also makes it much easier to build a program where these rules can be determined by the user at run time, instead of in advance by the programmer at compile time.
Memory Efficiency - An IEnumerable makes it possible to iterate over data from sources such as a database in a way that only keeps the current record loaded in memory at any given time. This feature can also be used to create unbounded collections: sets of items that stretch on to infinity. You could build an IEnumerable over the BigInteger type that keeps computing the next prime for as long as it is asked to. Combined with the composition feature, such a collection can be used without crashing or hanging the program, because the program knows when to stop.
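As a tiny illustration of the lazy-evaluation and unbounded-collection points above, here is a sketch of an infinite sequence built with yield return; only the items actually requested are ever computed:

static IEnumerable<int> Naturals()
{
    // An infinite loop, but safe: each value is produced only when asked for.
    for (int i = 0; ; i++)
        yield return i;
}

// Only ten items are ever generated, despite the "infinite" source:
List<int> firstTen = Naturals().Take(10).ToList();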
LINQ is lazily evaluated. When you run this line:
IEnumerable sortedItems = _listReleases.OrderBy(x => x);
You aren't actually ordering the items right then and there. Instead you're building an enumerable that will, when enumerated, return the objects that are currently in _listReleases in order. So when you Clear() the list, it no longer has any items to order.
You need to force it to evaluate before you clear _listReleases. An easy way to do this is to add a ToList() call. Also, AddRange won't accept the non-generic IEnumerable type. You can just use var to implicitly type the variable as List<string>, which will work because List<T> : IEnumerable<T> (it implements the interface).
var sortedItems = _listReleases.OrderBy(x => x).ToList();
_listReleases.Clear();
_listReleases.AddRange(sortedItems);
You should also note that methods like ToList() are extension methods for IEnumerable<T>, not IEnumerable, so ((IEnumerable)something).ToList() won't work. Unlike in, say, Java, Something<T> and Something are completely distinct types in C#.
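If you do end up holding a non-generic IEnumerable, the usual bridge back to the generic world is Cast<T>() (or OfType<T>() if some elements might not match the target type); a sketch:

IEnumerable sortedItems = _listReleases.OrderBy(x => x);
List<string> asList = sortedItems.Cast<string>().ToList(); // Cast<T> is defined on plain IEnumerable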
Just in case you're wondering how this came up, I'm working with some resultsets from Entity Framework.
I have an object that is an IEnumerable<IEnumerable<string>>; basically, a list of lists of strings.
I want to merge all the lists of strings into one big list of strings.
What is the best way to do this in C#.net?
Use the LINQ SelectMany method:
IEnumerable<IEnumerable<string>> myOuterList = // some IEnumerable<IEnumerable<string>>...
IEnumerable<String> allMyStrings = myOuterList.SelectMany(sl => sl);
To be very clear about what's going on here (since I hate the thought of people thinking this is some kind of sorcery, and I feel bad that some other folks deleted the same answer):
SelectMany is an extension method (a static method that, through syntactic sugar, looks like an instance method on a specific type) on IEnumerable<T>. It takes your original enumeration of enumerations and a function for converting each item of that into an enumeration.
Because the items are already enumerations, the conversion function is simple: just return the input (sl => sl means "take a parameter named sl and return it"). SelectMany then provides an enumeration over each of these in turn, resulting in your "flattened" list.
Use the Concat method:
firstEnumerable.Concat(secondEnumerable)
Using SelectMany will force an additional evaluation of each element of both enumerations that you don't need.
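Since Concat() only joins two sequences, flattening a whole list of lists with it means folding it over the outer sequence, for example with Aggregate(); a sketch (note that this builds one Concat wrapper per inner list):

IEnumerable<string> allMyStrings =
    myOuterList.Aggregate(Enumerable.Empty<string>(),
                          (acc, inner) => acc.Concat(inner));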
I have an array of objects, where one field is a boolean field called includeInReport. In a certain case, I want to default that to always be true. I know it's as easy as doing this:
foreach (var item in awards)
{
item.IncludeInReport = true;
}
But is there an equivalent way to do this with LINQ? It's more to satisfy my curiosity than anything... My first thought was to do this...
awards.Select(a => new Award { IncludeInReport = true, SomeField = a.SomeField, ... })
But since I have a few fields in my object, I didn't want to have to type out all of the fields and it's just clutter on the screen at that point. Thanks!
ForEach is sort of LINQ:
awards.ForEach(item => item.IncludeInReport = true);
But LINQ is not about updating values, so you are not using the right tool.
Let me qualify "sort of LINQ": ForEach is not LINQ, but a method on List<T>. However, the syntax is similar to LINQ.
Here's the correct code:
awards = awards.Select(a => { a.IncludeInReport = true; return a; }).ToArray();
LINQ follows functional programming ideas and thus doesn't want you to change (mutate) existing variables.
So instead, in the code above we generate a new sequence (without changing any existing values) and then overwrite our original variable (that part is outside LINQ, so we no longer care about functional programming ideas).
Since you are starting with an array, you can use the Array.ForEach method:
Array.ForEach(awards, a => a.IncludeInReport = true);
This isn't LINQ, but in this case you don't need LINQ. As others have mentioned, you can't mutate items via LINQ. If you have a List<T> you could use its ForEach method in a similar fashion. Eric Lippert discusses this issue in more depth here: "foreach" vs "ForEach".
There is no mutating method available in LINQ. LINQ is useful for querying, ordering, filtering, joining, and projecting data. If you need to mutate it, you already have a very clean, clear method of doing so: your loop.
List<T> exposes a ForEach method to write something that reminds you of LINQ (but isn't). You can then provide an Action<T> or some other delegate/function that applies your mutation to each element in turn. (Ahmed Mageed's answer also mentions the slightly different Array.ForEach method.) You can write your own extension method to do the same with IEnumerable<T> (which would then be more generally applicable than either aforementioned method and also be available for your array). But I encourage you to simply keep your loop; it's not exactly dirty.
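For reference, a minimal sketch of such an extension method (the name ForEach is an assumption, mirroring List<T>'s method):

public static class EnumerableExtensions
{
    public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        foreach (T item in source)
            action(item); // apply the mutation (or any action) to each element
    }
}

// Usage, regardless of whether awards is an array, a List<T>, or a query result:
awards.ForEach(a => a.IncludeInReport = true);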
You can do something like this:
awards.AsParallel().ForAll(item => item.IncludeInReport = true);
That executes the action in parallel where possible.
I am using some of the LINQ select stuff to create some collections, which return IEnumerable<T>.
In my case I need a List<T>, so I am passing the result to List<T>'s constructor to create one.
I am wondering about the overhead of doing this. The items in my collections are usually in the millions, so I need to consider this.
I assume that if the IEnumerable<T> contains value types, the performance is at its worst.
Am I right? What about reference types? Either way, there is also the cost of calling List<T>.Add a million times, right?
Any way to solve this? For example, can I "overload" methods like LINQ's Select using extension methods?
No, there's no particular penalty for the element type being value types, assuming you're using IEnumerable<T> instead of IEnumerable. You won't get any boxing going on.
If you actually know the size of the result beforehand (which the result of Select probably won't) you might want to consider creating the list with that size of buffer, then using AddRange to add the values. Otherwise the list will have to resize its buffer every time it fills it.
For instance, instead of doing:
Foo[] foo = new Foo[100];
IEnumerable<string> query = foo.Select(x => x.Name);
List<string> queryList = new List<string>(query);
you might do:
Foo[] foo = new Foo[100];
IEnumerable<string> query = foo.Select(x => x.Name);
List<string> queryList = new List<string>(foo.Length);
queryList.AddRange(query);
You know that calling Select will produce a sequence of the same length as the original query source, but nothing in the execution environment has that information as far as I'm aware.
It would be best to avoid the need for a list. If you can keep your caller using IEnumerable<T>, you will save yourself some headaches.
LINQ's ToList() will take your enumerable, and just construct a new List<T> directly from it, using the List<T>(IEnumerable<T>) constructor. This will be the same as making the list yourself, performance wise (although LINQ does a null check, as well).
If you're adding the elements yourself, use the AddRange method instead of the Add. ToList() is very similar to AddRange (since it's using the constructor which takes IEnumerable<T>), which typically will be your best bet, performance wise, in this case.
Generally speaking, a method returning an IEnumerable doesn't have to evaluate any of the items before an item is actually needed. So, theoretically, when you return an IEnumerable, none of your items need to exist at that time.
So creating a list means that you will really need to evaluate items, get them and place them somewhere in memory (at least their references). There is nothing that can be done about this - if you really need to have a list.
A number of other responders have already provided ideas for how to improve the performance of copying an IEnumerable<T> into a List<T> - I don't think that much can be added on that front.
However, based on what you have described you need to do with the results, and the fact that you get rid of the list when you're done (which I presume means that the intermediate results are not interesting) - you may want to consider whether you really need to materialize a List<T>.
Rather than creating a List<T> and operating on the contents of that list, consider writing a lazy extension method for IEnumerable<T> that performs the same processing logic. I've done this myself in a number of cases, and writing such logic in C# is not so bad when using the yield return syntax supported by the compiler.
This approach works well if all you're trying to do is visit each item in the results and collect some information from it. Often, what you need to do is just visit each element in the collection on demand, do some processing with it, and then move on. This approach is generally more scalable and performant than creating a copy of the collection just to iterate over it.
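To make that concrete, here is a minimal sketch of such a lazy extension method (all names here are hypothetical): it filters and transforms each element on demand, so no intermediate List<T> is ever materialized:

public static class ResultExtensions
{
    public static IEnumerable<TResult> FilterAndTransform<T, TResult>(
        this IEnumerable<T> source,
        Func<T, bool> keep,
        Func<T, TResult> transform)
    {
        foreach (T item in source)
        {
            if (keep(item))
                yield return transform(item); // nothing executes until enumerated
        }
    }
}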
Now, this advice may not work for you for other reasons, but it's worth considering as an alternative to finding the most efficient way to materialize a very large list.
Don't pass an IEnumerable to the List constructor. IEnumerable has a ToList() method, which can't possibly do worse than that, and has nicer syntax (IMHO).
That said, that only changes the answer to your question to "it depends" - in particular, it depends on what the IEnumerable actually is behind the scenes. If it happens to be a List already, ToList will be relatively cheap and will of course go much faster than if it were another type. It's still not super-fast.
The best way to solve this, of course, is to try to figure out how to do your processing on an IEnumerable rather than a List. That may not be possible.
Edit: Some people in the comments are debating whether or not ToList() will actually be any faster when called on a List than if not, and whether ToList() will be any faster than the list constructor. At this point, speculating is getting pointless, so here's some code:
using System;
using System.Linq;
using System.Collections.Generic;

public static class ToListTest
{
    public static int Main(string[] args)
    {
        List<int> intlist = new List<int>();
        for (int i = 0; i < 1000000; i++)
            intlist.Add(i);

        IEnumerable<int> intenum = intlist;
        for (int i = 0; i < 1000; i++)
        {
            List<int> foo = intenum.ToList();
        }
        return 0;
    }
}
Running this code with an IEnumerable that's really a List goes about 6-10 times faster than if I replace it with a LinkedList or Stack (on my pokey 2.4 GHz P4, using Mono 1.2.6). Conceivably this could be due to some unfortunate interaction between ToList() and the particular implementations of LinkedList or Stack's enumerations, but at least the point remains: speed will depend on the underlying type of the IEnumerable. That said, even with a List as the source, it still takes 6 seconds for me to make 1000 ToList() calls, so it's far from free.
The next question is whether ToList() is any more intelligent than the List constructor. The answer to that turns out to be no: the List constructor is just as fast as ToList(). In hindsight, Jon Skeet's reasoning makes sense - I was just forgetting that ToList() was an extension method. I still (much) prefer ToList() syntactically, but there's no performance reason to use it.
So the short version is that the best answer is still "don't convert to a List if you can avoid it". Barring that, actual performance will depend drastically on what the IEnumerable actually is, but at best it'll be sluggish, as opposed to glacial. I've amended my original answer to reflect this.
From reading the various comments and the question, I get the following requirement: for a collection of data, you need to run through that collection, filter out some objects, and then perform some transformation on the remaining objects. If that's the case, you can do something like this:
var result = from item in collection
             where item.Id > 10 // or some more sensible condition
             select Operation(item);
and if you need to perform more filtering and transformation, you can nest your LINQ queries like this:
var result = from filteredItem in (from item in collection
                                   where item.Id > 10 // or some more sensible condition
                                   select Operation(item))
             where filteredItem.SomePropertyAvailableAfterFirstTransformation == "new"
             select SecondTransformation(filteredItem);
I have a function that returns a Collection<string>, and that calls itself recursively to eventually return one big Collection<string>.
Now, I just wonder what the best approach to merging the lists is. Collection.CopyTo() only copies to string[], and using a foreach loop feels inefficient. However, since I also want to filter out duplicates, I feel like I'll end up with a foreach that calls Contains() on the Collection.
I wonder, is there a more efficient way to have a recursive function that returns a list of strings without duplicates? I don't have to use a Collection, it can be pretty much any suitable data type.
The only restriction: I'm bound to Visual Studio 2005 and .NET 3.0, so no LINQ.
Edit: To clarify: the function takes a user from Active Directory, looks at the user's direct reports, and then recursively looks at the direct reports of each of those users. So the end result is a list of all users that are in the "command chain" of a given user. Since this is executed quite often and at the moment takes 20 seconds for some users, I'm looking for ways to improve it. Caching the result for 24 hours is also on my list, by the way, but I want to see how to improve it before applying caching.
If you're using List<T>, you can use AddRange to add one list to the other.
Or you can use yield return to combine lists on the fly like this:
public IEnumerable<string> Combine(IEnumerable<string> col1, IEnumerable<string> col2)
{
    foreach (string item in col1)
        yield return item;

    foreach (string item in col2)
        yield return item;
}
You might want to take a look at Iesi.Collections and the Extended Generic Iesi.Collections (the first edition was made for .NET 1.1, when there were no generics yet).
Extended Iesi has an ISet class which acts exactly like a HashSet: it enforces unique members and does not allow duplicates.
The nifty thing about Iesi is that it has set operators instead of methods for merging collections, so you have the choice between a union (|), intersection (&), XOR (^) and so forth.
I think HashSet<T> is a great help.
The HashSet<T> class provides high performance set operations. A set is a collection that contains no duplicate elements, and whose elements are in no particular order.
Just add items to it and then use CopyTo.
Update: HashSet<T> is in .NET 3.5
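If moving to .NET 3.5 is an option, a sketch of that approach (GetDirectReports here is a hypothetical stand-in for the recursive Active Directory lookup from the question):

HashSet<string> unique = new HashSet<string>();
foreach (string name in GetDirectReports(user)) // hypothetical recursive source
    unique.Add(name); // Add returns false for duplicates, so they are skipped

string[] result = new string[unique.Count];
unique.CopyTo(result);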
Maybe you can use Dictionary<TKey, TValue>. Setting a duplicate key via the indexer will not raise an exception (unlike Add).
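Given the .NET 3.0 constraint in the question, this is probably the most practical route; a sketch of de-duplicating with the indexer (allNames stands in for the combined recursive results):

Dictionary<string, bool> seen = new Dictionary<string, bool>();
foreach (string name in allNames) // allNames: assumed combined results
    seen[name] = true; // indexer assignment never throws on a duplicate key

// The unique strings are simply the keys:
List<string> unique = new List<string>(seen.Keys);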
Can you pass the Collection into your method by reference so that you can just add items to it? That way you don't have to return anything. This is what it might look like in C#:
using System;
using System.Collections.ObjectModel;

class Program
{
    static void Main(string[] args)
    {
        Collection<string> myitems = new Collection<string>();
        myMethod(ref myitems);
        Console.WriteLine(myitems.Count.ToString());
        Console.ReadLine();
    }

    static void myMethod(ref Collection<string> myitems)
    {
        myitems.Add("string");
        if (myitems.Count < 5)
            myMethod(ref myitems);
    }
}
As Stated by #Zooba Passing by ref is not necessary here, if you passing by value it will also work.
As far as merging goes:
I wonder, is there a more efficient way to have a recursive function that returns a list of strings without duplicates? I don't have to use a Collection, it can be pretty much any suitable data type.
Your function assembles a return value, right? You're splitting the supplied list in half, invoking itself again (twice), and then merging those results.
During the merge step, why not just check before you add each string to the result? If it's already there, skip it.
Assuming you're working with sorted lists, of course.
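A sketch of that merge step under the sorted-lists assumption, skipping duplicates as it goes:

// Merges two ascending-sorted lists into one sorted list without duplicates.
static List<string> MergeUnique(List<string> left, List<string> right)
{
    List<string> result = new List<string>(left.Count + right.Count);
    int i = 0, j = 0;
    while (i < left.Count || j < right.Count)
    {
        string next;
        if (j >= right.Count ||
            (i < left.Count && string.CompareOrdinal(left[i], right[j]) <= 0))
            next = left[i++];
        else
            next = right[j++];

        // Skip the candidate if it matches the last string we emitted.
        if (result.Count == 0 || result[result.Count - 1] != next)
            result.Add(next);
    }
    return result;
}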