How to join together all the elements in an IEnumerable of IEnumerables? - c#

Just in case you're wondering how this came up, I'm working with some result sets from Entity Framework.
I have an object that is an IEnumerable<IEnumerable<string>>; basically, a list of lists of strings.
I want to merge all the lists of strings into one big list of strings.
What is the best way to do this in C#/.NET?

Use the LINQ SelectMany method:
IEnumerable<IEnumerable<string>> myOuterList = // some IEnumerable<IEnumerable<string>>...
IEnumerable<String> allMyStrings = myOuterList.SelectMany(sl => sl);
To be very clear about what's going on here (since I hate the thought of people thinking this is some kind of sorcery, and I feel bad that some other folks deleted the same answer):
SelectMany is an extension method (a static method that, through syntactic sugar, looks like an instance method on a specific type) on IEnumerable<T>. It takes your original enumeration of enumerations and a function for converting each item of that into an enumeration.
Because the items are already enumerations, the conversion function is simple: just return the input (sl => sl means "take a parameter named sl and return it"). SelectMany then provides an enumeration over each of these in turn, resulting in your "flattened" list.
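To make the "no sorcery" point concrete, here is a minimal sketch (class and method names are made up for illustration) that puts the one-line SelectMany call next to a hand-written iterator doing the same thing with two nested foreach loops.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FlattenDemo
{
    // LINQ version: the selector sl => sl returns each inner sequence unchanged,
    // and SelectMany yields the elements of each inner sequence in turn.
    public static IEnumerable<string> FlattenWithLinq(IEnumerable<IEnumerable<string>> outer)
    {
        return outer.SelectMany(sl => sl);
    }

    // Hand-written equivalent, for illustration: an iterator with a nested foreach.
    public static IEnumerable<string> FlattenByHand(IEnumerable<IEnumerable<string>> outer)
    {
        foreach (IEnumerable<string> inner in outer)
        {
            foreach (string item in inner)
            {
                yield return item;
            }
        }
    }
}
```

Both versions are lazy: nothing is read from the inner sequences until the result is enumerated.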

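Use the Concat method:
firstEnumerable.Concat(secondEnumerable)
Concat joins exactly two sequences, so it fits when you have two separate enumerables rather than an enumerable of enumerables. Both Concat and SelectMany(sl => sl) are lazy and enumerate each inner sequence once, so neither adds an extra evaluation of the elements.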
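A sketch of how Concat relates to the original question, assuming you want to use it for more than two inner sequences: it has to be folded over the outer sequence with Aggregate, which nests one Concat per inner sequence. The class and method names are illustrative only.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ConcatDemo
{
    // Concat joins exactly two sequences; nothing runs until the result is enumerated.
    public static IEnumerable<string> JoinTwo(IEnumerable<string> first, IEnumerable<string> second)
    {
        return first.Concat(second);
    }

    // For a sequence of sequences, Concat is folded over the outer sequence,
    // starting from an empty sequence and appending each inner one.
    public static IEnumerable<string> JoinAll(IEnumerable<IEnumerable<string>> outer)
    {
        return outer.Aggregate(Enumerable.Empty<string>(), (acc, next) => acc.Concat(next));
    }
}
```

For the list-of-lists case, SelectMany(sl => sl) expresses the same thing in a single call.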

Related

Is the Find() method useless?

I've been doing some research into the differences between the two methods Find() and First(). The only differences I could find (pun intended) were that Find() uses a for loop while First() uses a foreach loop, and that First() does not require a parameter to be called.
So is there any reason that I should use Find() instead of First()?
EDIT: I have already read "C# Difference between First() and Find()", but it does not give any reason to use one over the other. It merely discusses how the two iterate over the list differently.
Mostly style preference, but in some cases there is a difference.
Find is defined only on a limited set of types (List<T> has an instance Find method; arrays have the static Array.Find), while First is defined as an extension method for all IEnumerable<T> and IQueryable<T> types. Using First lets you change the underlying collection type easily, including using the results of .Where and .Select. Converting an enumerable to a type that supports .Find is always slower than just calling .First.
Performance of both methods is roughly the same on the types where both are defined, since both simply do a linear search through the elements. There is more info in the question you linked: C# Difference between First() and Find()
If you have a "queryable" enumeration (when using LINQ to SQL, for example), using .First could be significantly faster than converting the result to a collection that supports .Find (i.e. using .ToList) and then calling .Find. The query provider will likely translate .First into a database query that returns a single row, while .ToList will have to pull in all the matching rows for client-side filtering.
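A small sketch contrasting the calls discussed above; the helper class and method names are made up for illustration. It shows where each method is available and what happens when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FindVsFirst
{
    // List<T>.Find takes a Predicate<T> and returns default(T) (null for strings) when nothing matches.
    public static string FindOnList(List<string> names)
    {
        return names.Find(n => n.StartsWith("S"));
    }

    // Arrays have no instance Find; the static Array.Find plays the same role.
    public static string FindOnArray(string[] names)
    {
        return Array.Find(names, n => n.StartsWith("S"));
    }

    // First/FirstOrDefault work on any IEnumerable<T>, including the output of Where.
    // First throws InvalidOperationException when nothing matches; FirstOrDefault returns default(T).
    public static string FirstAfterWhere(IEnumerable<string> names)
    {
        return names.Where(n => n.Length > 3).FirstOrDefault(n => n.StartsWith("S"));
    }
}
```

Note that FirstOrDefault, not First, is the closer behavioral match for Find, since Find does not throw on a miss.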

What is the IEnumerable interface in C#? What if we don't use it?

I searched the internet for "What is the IEnumerable interface in C#? What problem does it solve? What if we don't use it?" but never really found much. Lots of posts explain how to implement it.
I've also found the following example:
List<string> list = new List<string>();
list.Add("Sourav");
list.Add("Ram");
list.Add("Sachin");
IEnumerable<string> names = from n in list where n.StartsWith("S") select n;
// var names = from n in list where n.StartsWith("S") select n;
foreach (string name in names)
{
    Console.WriteLine(name);
}
The above example outputs:
Sourav
Sachin
I want to know the advantage of using IEnumerable in the above example. I can achieve the same using 'var' (commented line).
I would appreciate it if anyone could help me understand this: what's the benefit of using IEnumerable, with an example? What if we don't use it?
Beyond reading the documentation, I'd describe IEnumerable<T> as a collection of Ts: it can be iterated over and many other functions can be carried out on it (such as Where(), Any() and Count()), but it's not designed for adding and removing elements. That's what List<T> is for.
It's useful because it's a fundamental interface for many collections; various data access layers and ORMs use it, and many extension methods are automatically available for it.
Many concrete implementations (lists, arrays, bags, queues, stacks) implement it, allowing a wide variety of collections to use its extension methods.
Also, collections implementing either IEnumerable or IEnumerable<T> can be used in a foreach loop.
From MSDN: "…for each element in an array or an object collection that implements the System.Collections.IEnumerable or System.Collections.Generic.IEnumerable<T> interface."
In your code example you've got a variable called names which will be an IEnumerable<string>. It's important to understand that it will be an IEnumerable<string> regardless of whether you use the var keyword or not; var just saves you from writing the type out explicitly each time.
TLDR
It's a common base interface for many different types of collections which let you use your collection in foreach loops and provides a lot of extra extension methods for free.
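A minimal sketch of both points in that answer, with made-up helper names: var infers IEnumerable<string> for the query, and a parameter typed as IEnumerable<string> accepts a List<string>, an array, or a lazy query result alike.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EnumerableBenefit
{
    // Accepting IEnumerable<string> means this one method works for a List<string>,
    // a string[], a HashSet<string>, or the (lazy) result of a LINQ query.
    public static string JoinNames(IEnumerable<string> names)
    {
        return string.Join(", ", names);
    }

    public static IEnumerable<string> StartsWithS(List<string> list)
    {
        // var infers IEnumerable<string> here; spelling the type out changes nothing.
        var names = from n in list where n.StartsWith("S") select n;
        return names;
    }
}
```

For example, JoinNames(StartsWithS(list)) with the question's list produces "Sourav, Sachin".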
IEnumerable and the much more preferred IEnumerable<T> are the standard way to handle the 'sequence of elements' pattern.
The idea is that every type implementing IEnumerable<T> effectively carries a label saying "ENUMERATE ME". No matter what's inside (a queue of order items, a collection of controls, records from a SQL query, XML element subnodes, etc.), it's all the same from the enumerable's point of view: you've got a sequence and you can do something with each item in it.
Note that IEnumerable is somewhat limited: there's no count, no indexed access, no guarantee of repeatable results, and no way to check whether it is empty other than getting the enumerator and asking for the first item. This simplicity lets it cover almost all use cases, from collections to ad-hoc sequences (custom iterators, LINQ queries, etc.).
This question has been asked several times; here are some answers: 1, 2, 3
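The limitations described above can be seen in a small sketch (names are illustrative): an iterator produces values on demand with no Count or indexer, and checking for emptiness means starting an enumeration and asking for one item, which is roughly what Any() does for a plain iterator.

```csharp
using System.Collections.Generic;

public static class AdHocSequences
{
    // An ad-hoc sequence: nothing is stored; each value is produced on demand.
    // There is no Count or indexer, only "give me the next item".
    public static IEnumerable<int> Squares(int upTo)
    {
        for (int i = 1; i <= upTo; i++)
        {
            yield return i * i;
        }
    }

    // Checking for emptiness: get the enumerator and try to move to the first item.
    public static bool IsEmpty<T>(IEnumerable<T> source)
    {
        using (IEnumerator<T> e = source.GetEnumerator())
        {
            return !e.MoveNext();
        }
    }
}
```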
From MSDN: "The disadvantage of omitting IEnumerable and IEnumerator is that the collection class is no longer interoperable with the foreach statements, or equivalent statements, of other common language runtime languages."
So you need to implement this interface so your custom collection type can be used with other CLR languages. It seems like a CLS requirement.
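As a sketch of what implementing the interface involves, here is a hypothetical fixed-capacity ring buffer (not from any library) that implements IEnumerable<T>. The generic GetEnumerator is written as an iterator, and the non-generic one just forwards to it; that is enough for foreach, LINQ, and collection initializer syntax.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection used only to show the members IEnumerable<T> requires.
public class RingBuffer<T> : IEnumerable<T>
{
    private readonly T[] items;
    private int start; // index of the oldest item
    private int count; // number of stored items

    public RingBuffer(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException("capacity");
        items = new T[capacity];
    }

    public void Add(T item)
    {
        int end = (start + count) % items.Length;
        items[end] = item;
        if (count == items.Length)
            start = (start + 1) % items.Length; // full: the oldest item was overwritten
        else
            count++;
    }

    // Generic enumerator, used by foreach and by LINQ operators. Yields oldest to newest.
    public IEnumerator<T> GetEnumerator()
    {
        for (int i = 0; i < count; i++)
            yield return items[(start + i) % items.Length];
    }

    // Non-generic enumerator required by IEnumerable; forwards to the generic one.
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

Because it has an Add method and implements IEnumerable, new RingBuffer<int>(3) { 1, 2, 3, 4 } compiles and enumerates as 2, 3, 4.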

Change one field in an array using linq

I have an array of objects, where one field is a boolean field called IncludeInReport. In a certain case, I want to default that to always be true. I know it's as easy as doing this:
foreach (var item in awards)
{
item.IncludeInReport = true;
}
But is there an equivalent way to do this with LINQ? It's more to satisfy my curiosity than anything... My first thought was to do this:
awards.Select(a => new Award { IncludeInReport = true, SomeField = a.SomeField, ... });
But since I have a few fields in my object, I didn't want to have to type out all of the fields; it's just clutter on the screen at that point. Thanks!
ForEach is sort of LINQ:
awards.ForEach(item => item.IncludeInReport = true);
But LINQ is not about updating values, so you are not using the right tool.
Let me qualify "sort of LINQ": ForEach is not LINQ, but a method on List<T> (so awards would need to be a List<Award>). However, the syntax is similar to LINQ.
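Here's a version using Select:
awards = awards.Select(a => { a.IncludeInReport = true; return a; }).ToArray();
LINQ follows functional programming ideas and thus doesn't want you to change (mutate) existing variables.
So instead, the code above generates a new sequence and then overwrites the original variable with it (the assignment is outside LINQ, so functional programming ideas no longer apply). The ToArray() call is needed because awards is an array while Select returns a lazy IEnumerable<Award>; it also forces the lambda to run immediately. Note that the Award objects themselves are still mutated in place, since the new array holds the same references.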
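A sketch of why the ToArray() matters, using a minimal stand-in for the question's Award type (only the one property is modeled; everything else is hypothetical): without it, Select is deferred and the side effect does not happen until something enumerates the result.

```csharp
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for the question's Award type.
public class Award
{
    public bool IncludeInReport { get; set; }
}

public static class SelectSideEffects
{
    // Deferred: the lambda runs only when the result is enumerated,
    // and runs again on every enumeration.
    public static IEnumerable<Award> MarkLazily(Award[] awards)
    {
        return awards.Select(a => { a.IncludeInReport = true; return a; });
    }

    // ToArray() runs the lambda once, right now, and returns an array,
    // so the result can be assigned back to an Award[] variable.
    public static Award[] MarkNow(Award[] awards)
    {
        return awards.Select(a => { a.IncludeInReport = true; return a; }).ToArray();
    }
}
```

Calling MarkLazily and discarding the result leaves every IncludeInReport false, which is an easy bug to miss.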
Since you are starting with an array, you can use the Array.ForEach method:
Array.ForEach(awards, a => a.IncludeInReport = true);
This isn't LINQ, but in this case you don't need LINQ. As others have mentioned, you can't mutate items via LINQ. If you have a List<T> you could use its ForEach method in a similar fashion. Eric Lippert discusses this issue in more depth here: "foreach" vs "ForEach".
There is no mutating method available in Linq. Linq is useful for querying, ordering, filtering, joining, and projecting data. If you need to mutate it, you already have a very clean, clear method of doing so: your loop.
List<T> exposes a ForEach method that lets you write something that reminds you of LINQ (but isn't). You can then provide an Action<T> or some other delegate/function that applies your mutation to each element in turn. (Ahmed Mageed's answer also mentions the slightly different Array.ForEach method.) You can write your own extension method to do the same with IEnumerable<T> (which would then be more generally applicable than either of the aforementioned methods, and would also be available for your array). But I encourage you to simply keep your loop; it's not exactly dirty.
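The custom extension mentioned above could look like the following sketch. It is hypothetical, not part of the .NET framework, and the answer's advice to keep the plain loop still stands.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical extension method: a ForEach for any IEnumerable<T>,
// including arrays and LINQ query results.
public static class EnumerableExtensions
{
    public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (action == null) throw new ArgumentNullException("action");

        foreach (T item in source)
        {
            action(item);
        }
    }
}
```

With this in scope, awards.ForEach(a => a.IncludeInReport = true) compiles for an Award[]. On a List<T>, the compiler still picks the built-in List<T>.ForEach instance method, since instance methods win over extension methods.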
You can do something like this:
awards.AsParallel().ForAll(item => item.IncludeInReport = true);
This runs the action in parallel where possible.

Union multiple number of lists in C#

I am looking for an elegant solution for the following situation:
I have a class that contains a List like
class MyClass{
...
public List<SomeOtherClass> SomeOtherClassList {get; set;}
...
}
A third class called Model holds a List<MyClass>, which is the one I am operating on from outside.
Now I would like to extend the Model class with a method that returns all unique SomeOtherClass instances over all MyClass instances.
I know that there is the Union() method and with a foreach loop I could solve this issue easily, which I actually did. However, since I am new to all the C#3+ features I am curious how this could be achieved more elegantly, with or without Linq.
I have found an approach, that seems rather clumsy to me, but it works:
List<SomeOtherClass> ret = new List<SomeOtherClass>();
MyClassList.Select(b => b.SomeOtherClasses).ToList().ForEach(l => ret = ret.Union(l).ToList());
return ret;
Note: The b.SomeOtherClasses property returns a List<SomeOtherClass>.
This code is far from perfect, and some questions arise because I still have to figure out what is good style in C# 3 and what is not. So I made a little list of thoughts about that snippet, which I would be glad to get a few comments on. Apart from that, I'd be glad to hear how to improve this code further.
The temporary list ret would maybe have been part of a C# 2 approach, but is it correct that I should be able to do without this list by using method chaining instead? Or am I missing the point?
Is it really required to use the intermediate ToList() method? All I want is to perform a further action on each member of a selection.
What is the cost of those ToList() operations? Are they good style? Necessary?
Thanks.
You are looking for SelectMany() + Distinct():
List<SomeOtherClass> ret = MyClassList.SelectMany( x => x.SomeOtherClasses)
.Distinct()
.ToList();
SelectMany() will flatten the "list of lists" into one list, then you can just pick out the distinct entries in this enumeration instead of using union between individual sub-lists.
In general you will want to avoid side effects with LINQ; your original approach is kind of abusing this by modifying ret, which is not part of the query.
ToList() is required since each standard query operator returns a new enumeration and does not modify the existing enumeration, hence you have to convert the final enumeration result back to a list. The cost of ToList() is a full iteration of the enumeration which, in most cases, is negligible. Of course if your class could use an IEnumerable<SomeOtherClass> instead, you do not have to convert to a list at all.
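One caveat with Distinct(): it uses EqualityComparer<T>.Default, so for a class that does not override Equals/GetHashCode it only removes duplicate references. The sketch below assumes, hypothetically, that SomeOtherClass has an Id that identifies "the same" item, and passes a comparer to Distinct.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's classes; only the members used here.
public class SomeOtherClass
{
    public int Id { get; set; }
}

public class MyClass
{
    public List<SomeOtherClass> SomeOtherClasses { get; set; }
}

// Treats two SomeOtherClass instances as equal when their Ids match.
public class SomeOtherClassIdComparer : IEqualityComparer<SomeOtherClass>
{
    public bool Equals(SomeOtherClass x, SomeOtherClass y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Id == y.Id;
    }

    public int GetHashCode(SomeOtherClass obj)
    {
        return obj == null ? 0 : obj.Id.GetHashCode();
    }
}

public static class FlattenDistinct
{
    public static List<SomeOtherClass> AllUnique(List<MyClass> myClassList)
    {
        return myClassList
            .SelectMany(x => x.SomeOtherClasses)      // flatten the list of lists
            .Distinct(new SomeOtherClassIdComparer()) // drop items whose Id was already seen
            .ToList();                                // materialize once, at the end
    }
}
```

Alternatively, overriding Equals and GetHashCode on SomeOtherClass itself lets the parameterless Distinct() do the same.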
You should have a look at SelectMany. Something like this should generate your "flat" list:
MyClassList.SelectMany(b => b.SomeOtherClasses)
It will return an IEnumerable<SomeOtherClass>, which you can filter/process further.

Merging two Collection<T>

I have a function that returns a Collection<string> and calls itself recursively to eventually return one big Collection<string>.
Now I just wonder what the best approach to merging the lists is. Collection.CopyTo() only copies to a string[], and using a foreach() loop feels inefficient. However, since I also want to filter out duplicates, I feel like I'll end up with a foreach that calls Contains() on the Collection.
I wonder, is there a more efficient way to have a recursive function that returns a list of strings without duplicates? I don't have to use a Collection; it can be pretty much any suitable data type.
The only restriction: I'm bound to Visual Studio 2005 and .NET 3.0, so no LINQ.
Edit: To clarify: the function takes a user out of Active Directory, looks at the direct reports of the user, and then recursively looks at the direct reports of every user. So the end result is a list of all users that are in the "command chain" of a given user. Since this is executed quite often and at the moment takes 20 seconds for some users, I'm looking for ways to improve it. Caching the result for 24 hours is also on my list, but I want to see how to improve it before applying caching.
If you're using List<T>, you can use .AddRange to add one list to the other.
Or you can use yield return to combine lists on the fly like this:
public IEnumerable<string> Combine(IEnumerable<string> col1, IEnumerable<string> col2)
{
foreach(string item in col1)
yield return item;
foreach(string item in col2)
yield return item;
}
You might want to take a look at Iesi.Collections and Extended Generic Iesi.Collections (because the first edition was made in 1.1 when there were no generics yet).
Extended Iesi has an ISet class which acts exactly as a HashSet: it enforces unique members and does not allow duplicates.
The nifty thing about Iesi is that it has set operators instead of methods for merging collections, so you have the choice between a union (|), intersection (&), XOR (^) and so forth.
I think HashSet<T> is a great help.
"The HashSet<T> class provides high performance set operations. A set is a collection that contains no duplicate elements, and whose elements are in no particular order."
Just add items to it and then use CopyTo.
Update: HashSet<T> was introduced in .NET 3.5, so it is not available on the .NET 3.0 setup described in the question.
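A minimal sketch of the HashSet<T> approach (requires .NET 3.5 or later; the class and method names are illustrative): UnionWith adds every element of the other collection and skips duplicates, and CopyTo then produces an array.

```csharp
using System.Collections.Generic;

public static class HashSetMerge
{
    // Requires .NET 3.5+: HashSet<T> does not exist in .NET 3.0.
    public static string[] Merge(IEnumerable<string> first, IEnumerable<string> second)
    {
        HashSet<string> set = new HashSet<string>(first);
        set.UnionWith(second); // duplicates are silently ignored

        string[] result = new string[set.Count];
        set.CopyTo(result);    // element order is not guaranteed
        return result;
    }
}
```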
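Maybe you can use Dictionary<TKey, TValue>. Setting a duplicate key through the indexer (dict[key] = value) will not raise an exception; it simply overwrites the value (Add, by contrast, throws on a duplicate key).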
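A sketch of using a Dictionary as a set, which works on .NET 2.0/3.0; the names are illustrative and the bool values are unused placeholders. Writing through the indexer means a repeated key just overwrites its entry instead of throwing.

```csharp
using System.Collections.Generic;

public static class DictionaryAsSet
{
    public static List<string> MergeUnique(IEnumerable<string> first, IEnumerable<string> second)
    {
        Dictionary<string, bool> seen = new Dictionary<string, bool>();
        foreach (string item in first)
            seen[item] = true;  // a duplicate key just overwrites the value
        foreach (string item in second)
            seen[item] = true;

        // Key order is not guaranteed; sort afterwards if order matters.
        return new List<string>(seen.Keys);
    }
}
```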
Can you pass the Collection into your method by reference, so that you can just add items to it? That way you don't have to return anything. This is what it might look like in C#:
using System;
using System.Collections.ObjectModel;

class Program
{
    static void Main(string[] args)
    {
        Collection<string> myitems = new Collection<string>();
        MyMethod(ref myitems);
        Console.WriteLine(myitems.Count.ToString());
        Console.ReadLine();
    }

    static void MyMethod(ref Collection<string> myitems)
    {
        myitems.Add("string");
        if (myitems.Count < 5)
            MyMethod(ref myitems);
    }
}
As @Zooba pointed out, passing by ref is not necessary here; passing by value also works, because Collection<string> is a reference type and every call adds to the same instance.
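Applied to the direct-reports scenario from the question, a sketch could pass the accumulators by value and use a Dictionary as a visited set so each user is walked only once. DirectReportsLookup is a hypothetical delegate standing in for the Active Directory query; a custom delegate is used because Func<T, TResult> only arrived with .NET 3.5, and the code avoids C# 3 features so it should also build in Visual Studio 2005.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the Active Directory lookup of one user's direct reports.
public delegate IEnumerable<string> DirectReportsLookup(string user);

public static class ReportingChain
{
    public static List<string> AllReports(string user, DirectReportsLookup getDirectReports)
    {
        Dictionary<string, bool> seen = new Dictionary<string, bool>();
        seen[user] = true; // never report the starting user, even if the chain loops back
        List<string> result = new List<string>();
        Collect(user, getDirectReports, seen, result);
        return result;
    }

    // seen and result are passed by value, not by ref: both are reference types,
    // so every recursive call adds to the same two instances.
    private static void Collect(string user, DirectReportsLookup getDirectReports,
                                Dictionary<string, bool> seen, List<string> result)
    {
        foreach (string report in getDirectReports(user))
        {
            if (seen.ContainsKey(report))
                continue; // already visited: skip it and don't walk its subtree again
            seen[report] = true;
            result.Add(report);
            Collect(report, getDirectReports, seen, result);
        }
    }
}
```

Skipping already-visited users also avoids repeated lookups for people reachable through more than one manager, which may help with the 20-second runtime.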
As far as merging goes:
"I wonder, is there a more efficient way to have a recursive function that returns a list of strings without duplicates? I don't have to use a Collection, it can be pretty much any suitable data type."
Your function assembles a return value, right? You're splitting the supplied list in half, invoking the function again (twice), and then merging those results.
During the merge step, why not just check before you add each string to the result? If it's already there, skip it.
This assumes you're working with sorted lists, of course.
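For sorted inputs, the check before adding reduces to comparing against the last item written, since any duplicate must sit right next to it in the sorted output. The following sketch, with illustrative names and assuming both lists are sorted ordinally, shows that merge step.

```csharp
using System.Collections.Generic;

public static class SortedMerge
{
    // Merges two lists that are already sorted ordinally, dropping duplicates.
    // The output is sorted, so a duplicate can only equal the last item written,
    // and one comparison replaces a Contains() scan.
    public static List<string> MergeDistinct(IList<string> left, IList<string> right)
    {
        List<string> result = new List<string>(left.Count + right.Count);
        int i = 0, j = 0;

        while (i < left.Count || j < right.Count)
        {
            string next;
            if (j >= right.Count || (i < left.Count && string.CompareOrdinal(left[i], right[j]) <= 0))
                next = left[i++];
            else
                next = right[j++];

            // Skip the item if it equals the last one already added.
            if (result.Count == 0 || result[result.Count - 1] != next)
                result.Add(next);
        }
        return result;
    }
}
```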
