Does LINQ to Objects keep its order - c#

I have a List<Person> and instead want to convert them for simple processing to a List<string>, doing the following:
List<Person> persons = GetPersonsBySeatOrder();
List<string> seatNames = persons.Select(x => x.Name).ToList();
Console.WriteLine("First in line: {0}", seatNames[0]);
Is the .Select() statement on a LINQ to Objects object guaranteed to not change the order of the list members? Assuming no explicit distinct/grouping/ordering is added
Also, if an arbitrary .Where() clause is used first, is it still guaranteed to keep the relative order, or does it sometimes use non-iterative filtering?
As Fermin commented above, this is essentially a duplicate question. I failed on selecting the correct keywords to search stackoverflow
Preserving order with LINQ

It depends on the underlying collection type more than anything. You could get inconsistent ordering from a HashSet, but a List is safe. Even if the ordering you want is provided implicitly, it's better to define an explicit ordering if you need it though. It looks like you're doing that judging by the method names.

In current .Net implementation it use such code. But there are no guarantee that this implementation will be in future.
private static IEnumerable<TResult> SelectIterator<TSource, TResult>(IEnumerable<TSource> source, Func<TSource, int, TResult> selector)
{
int index = -1;
foreach (TSource source1 in source)
{
checked { ++index; }
yield return selector(source1, index);
}
}

Yes, Linq Select is guaranteed to return all its results in the order of the enumeration it is passed. Like most Linq functions, it is fully specified what it does. Barring handling of errors, this might as well be the code for Select:
IEnumerable<Y> Select<X, Y>(this IEnumerable<X> input, Func<X, Y> transform)
{
foreach (var x in input)
yield return transform(x);
}
But as Samantha Branham pointed out, the underlying collection might not have an intrinsic order. I've seen hashtables that rearrange themselves on read.

Related

How to simply convert an IEnumerable into IOrderedEnumerable in O(1)? [duplicate]

Say there is an extension method to order an IQueryable based on several types of Sorting (i.e. sorting by various properties) designated by a SortMethod enum.
public static IOrderedEnumerable<AClass> OrderByX(this IQueryable<AClass> values,
SortMethod? sortMethod)
{
IOrderedEnumerable<AClass> queryRes = null;
switch (sortMethod)
{
case SortMethod.Method1:
queryRes = values.OrderBy(a => a.Property1);
break;
case SortMethod.Method2:
queryRes = values.OrderBy(a => a.Property2);
break;
case null:
queryRes = values.OrderBy(a => a.DefaultProperty);
break;
default:
queryRes = values.OrderBy(a => a.DefaultProperty);
break;
}
return queryRes;
}
In the case where sortMethod is null (i.e. where it is specified that I don't care about the order of the values), is there a way to instead of ordering by some default property, to instead just pass the IEnumerator values through as "ordered" without having to perform the actual sort?
I would like the ability to call this extension, and then possibly perform some additional ThenBy orderings.
All you need to do for the default case is:
queryRes = values.OrderBy(a => 1);
This will effectively be a noop sort. Because the OrderBy performs a stable sort the original order will be maintained in the event that the selected objects are equal. Note that since this is an IQueryable and not an IEnumerable it's possible for the query provider to not perform a stable sort. In that case, you need to know if it's important that order be maintained, or if it's appropriate to just say "I don't care what order the result is, so long as I can call ThenBy on the result).
Another option, that allows you to avoid the actual sort is to create your own IOrderedEnumerable implementation:
public class NoopOrder<T> : IOrderedEnumerable<T>
{
private IQueryable<T> source;
public NoopOrder(IQueryable<T> source)
{
this.source = source;
}
public IOrderedEnumerable<T> CreateOrderedEnumerable<TKey>(Func<T, TKey> keySelector, IComparer<TKey> comparer, bool descending)
{
if (descending)
{
return source.OrderByDescending(keySelector, comparer);
}
else
{
return source.OrderBy(keySelector, comparer);
}
}
public IEnumerator<T> GetEnumerator()
{
return source.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return source.GetEnumerator();
}
}
With that your query can be:
queryRes = new NoopOrder<AClass>(values);
Note that the consequence of the above class is that if there is a call to ThenBy that ThenBy will effectively be a top level sort. It is in effect turning the subsequent ThenBy into an OrderBy call. (This should not be surprising; ThenBy will call the CreateOrderedEnumerable method, and in there this code is calling OrderBy, basically turning that ThenBy into an OrderBy. From a conceptual sorting point of view, this is a way of saying that "all of the items in this sequence are equal in the eyes of this sort, but if you specify that equal objects should be tiebroken by something else, then do so.
Another way of thinking of a "no op sort" is that it orders the items based in the index of the input sequence. This means that the items are not all "equal", it means that the order input sequence will be the final order of the output sequence, and since each item in the input sequence is always larger than the one before it, adding additional "tiebreaker" comparisons will do nothing, making any subsequent ThenBy calls pointless. If this behavior is desired, it is even easier to implement than the previous one:
public class NoopOrder<T> : IOrderedEnumerable<T>
{
private IQueryable<T> source;
public NoopOrder(IQueryable<T> source)
{
this.source = source;
}
public IOrderedEnumerable<T> CreateOrderedEnumerable<TKey>(Func<T, TKey> keySelector, IComparer<TKey> comparer, bool descending)
{
return new NoopOrder<T>(source);
}
public IEnumerator<T> GetEnumerator()
{
return source.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return source.GetEnumerator();
}
}
If you return always the same index value you will get an IOrderedEnumerable that preserve the original list order:
case null:
queryRes = values.OrderBy(a => 1);
break;
Btw I don't think this is a right thing to do. You will get a collection that is supposted to be ordered but actually it is not.
Bottom line, IOrderedEnumerable exists solely to provide a grammar structure to the OrderBy()/ThenBy() methods, preventing you from trying to start an ordering clause with ThenBy(). process. It's not intended to be a "marker" that identifies the collection as ordered, unless it was actually ordered by OrderBy(). So, the answer is that if the sorting method being null is supposed to indicate that the enumerable is in some "default order", you should specify that default order (as your current implementation does). It's disingenuous to state that the enumerable is ordered when in fact it isn't, even if, by not specifying a SortingMethod, you are inferring it's "ordered by nothing" and don't care about the actual order.
The "problem" inherent in trying to simply mark the collection as ordered using the interface is that there's more to the process than simply sorting. By executing an ordering method chain, such as myCollection.OrderBy().ThenBy().ThenByDescending(), you're not actually sorting the collection with each call; not yet anyway. You are instead defining the behavior of an "iterator" class, named OrderedEnumerable, which will use the projections and comparisons you define in the chain to perform the sorting at the moment you need an actual sorted element.
Servy's answer, stating that OrderBy(x=>1) is a noop and should be optimized out of SQL providers ignores the reality that this call, made against an Enumerable, will still do quite a bit of work, and that most SQL providers in fact do not optimize this kind of call; OrderBy(x=>1) will, in most Linq providers, produce a query with an "ORDER BY 1" clause, which not only forces the SQL provider to perform its own sorting, it will actually result in a change to the order, because in T-SQL at least "ORDER BY 1" means to order by the first column of the select list.

ICollection<T> is non-index based, but TakeWhile() exists

I'm trying to replace usages of T[] or List<T> as function parameters and return values with more appropriate types such as IEnumerable<T>, ICollection<T> and IList<T>.
ICollection<T> from my understanding is preferrable to IList<T> where you are only needing basic/simple collection functionality (eg an enumerator and count functionality) as it provides this with the least restriction. From reading on here one of the main differentiators I thought was that ICollection<T> doesn't require that the underlying collection to be index based where IList<T> does?
In switching my List<T> references over I needed to replace a List<T>.GetRange() call and I was very surprised to find the ICollection<T>.TakeWhile() extension method which has an overload supporting selection based on index?! (msdn link)
I'm confused why this method exists on ICollection where there is nothing index based on this interface? Have I misunderstood or how can this method actually work if the underlying collection is eg a Hashset or something?
The method, like most of LINQ, is on IEnumerable<T>. Any features that just pass the indexer to the consumer (such as TakeWhile) only need to loop while incrementing a counter. Some APIs may be able to optimize using an indexer, and then it is up to them to decide whether to do that, or just use IEnumerable<T> and simply skip (etc) unwanted data.
For example:
int i = 0;
foreach(var item in source) {
if(!predicate(i++, item)) break;
yield return item;
}
Indexing can be done without collection's support of it
int i = -1;
foreach(var item in collection)
{
i++;
// item is at index i;
}
TakeWhile and other extension methods from System.Linq.Enumerable class work on all the types implementing IEnumerable<T>. They all iterate over the collection (using foreach statement) and perform appropriate actions.
Here is the implementation of the TakeWhile method, with some simplifications:
private static IEnumerable<TSource> TakeWhile<TSource>(IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
foreach (TSource item in source)
{
if (!predicate(item))
{
break;
}
yield return item;
}
}
As you see, it simply iterates over the collection, and evaluates the predicate. This is true for almost all other LINQ methods. The same will happen when you use any other collection, like HashSet<T>.

LINQ's ForEach on HashSet?

I am curious as to what restrictions necessitated the design decision to not have HashSet's be able to use LINQ's ForEach query.
What's really going on differently behind the scenes for these two implementations:
var myHashSet = new HashSet<T>;
foreach( var item in myHashSet ) { do.Stuff(); }
vs
var myHashSet = new HashSet<T>;
myHashSet.ForEach( item => do.Stuff(); }
I'm (pretty) sure that this is just because HashSet does not implement IEnumerable -- but what is a normal ForEach loop doing differently that makes it more supported by a HashSet?
Thanks
LINQ doesn't have ForEach. Only the List<T> class has a ForEach method.
It's also important to note that HashSet does implement IEnumerable<T>.
Remember, LINQ stands for Language INtegrated Query. It is meant to query collections of data. ForEach has nothing to do with querying. It simply loops over the data. Therefore it really doesn't belong in LINQ.
LINQ is meant to query data, I'm guessing it avoided ForEach() because there's a chance it could mutate data that would affect the way the data could be queried (i.e. if you changed a field that affected the hash code or equality).
You may be confused with the fact that List<T> has a ForEach()?
It's easy enough to write one, of course, but it should be used with caution because of those aforementioned concerns...
public static class EnumerableExtensions
{
public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
{
if (source == null) throw new ArgumentNullException("source");
if (action == null) throw new ArgumentNullException("action");
foreach(var item in source)
{
action(item);
}
}
}
var myHashSet = new HashSet<T>;
myHashSet.ToList().ForEach( x => x.Stuff() );
The first use the method GetEnumerator of HashSet
The second the method ForEach
Maybe the second use GetEnumerator behind the scene but I'm not sure.

Linq Enumerable<T>.ElementAt => IEnumerable<T>

Is there a method in Linq that does the same as ElementAt except it returns an IEnumerable<T> with a single element, rather than the actual element? Isn't there some SelectRange(startIndex, endIndex) method I could use and just pass the same index twice?
The simplest way would be to use
source.Skip(count).Take(1)
or more generally
source.Skip(startIndex).Take(endIndex - startIndex)
(assuming an inclusive startIndex but exclusive endIndex).
Ah.. it's called GetRange(index, count). My bad. Just found it :)
Jon Skeet's technique is a great way to do it. I would however suggest a possible optimization that is based on an implementation detail in Enumerable.Skip: it does not currently appear to take advantage of indexers on IList or IList<T>. Fortunately, Enumerable.ElementAt does.
So an alternate solution would be:
var itemAsSequence = new[] { source.ElementAt(index) };
Do note that this will execute eagerly. If you want deferred execution semantics similar to Jon's answer, you could do something like:
public static IEnumerable<T> ElementAtAsSequence<T>
(this IEnumerable<T> source, int index)
{
// if you need argument validation, you need another level of indirection
yield return source.ElementAt(index);
}
...
var itemAsSequence = source.ElementAtAsSequence(index);
I should point out that since this relies on an implementation detail, future improvements in LINQ to Objects could make this optimization redundant.
write an extension method
public static IEnumerable<T> ToMyEnumerable<T>(this T input)
{
var enumerbale = new[] { input };
return enumerbale;
}
source.First( p => p.ID == value).ToMyEnumerable<T>()
which is O(n)

How can I add an item to a IEnumerable<T> collection?

My question as title above. For example
IEnumerable<T> items = new T[]{new T("msg")};
items.ToList().Add(new T("msg2"));
but after all it only has 1 item inside. Can we have a method like items.Add(item) like the List<T>?
You cannot, because IEnumerable<T> does not necessarily represent a collection to which items can be added. In fact, it does not necessarily represent a collection at all! For example:
IEnumerable<string> ReadLines()
{
string s;
do
{
s = Console.ReadLine();
yield return s;
} while (!string.IsNullOrEmpty(s));
}
IEnumerable<string> lines = ReadLines();
lines.Add("foo") // so what is this supposed to do??
What you can do, however, is create a new IEnumerable object (of unspecified type), which, when enumerated, will provide all items of the old one, plus some of your own. You use Enumerable.Concat for that:
items = items.Concat(new[] { "foo" });
This will not change the array object (you cannot insert items into to arrays, anyway). But it will create a new object that will list all items in the array, and then "Foo". Furthermore, that new object will keep track of changes in the array (i.e. whenever you enumerate it, you'll see the current values of items).
The type IEnumerable<T> does not support such operations. The purpose of the IEnumerable<T> interface is to allow a consumer to view the contents of a collection. Not to modify the values.
When you do operations like .ToList().Add() you are creating a new List<T> and adding a value to that list. It has no connection to the original list.
What you can do is use the Add extension method to create a new IEnumerable<T> with the added value.
items = items.Add("msg2");
Even in this case it won't modify the original IEnumerable<T> object. This can be verified by holding a reference to it. For example
var items = new string[]{"foo"};
var temp = items;
items = items.Add("bar");
After this set of operations the variable temp will still only reference an enumerable with a single element "foo" in the set of values while items will reference a different enumerable with values "foo" and "bar".
EDIT
I contstantly forget that Add is not a typical extension method on IEnumerable<T> because it's one of the first ones that I end up defining. Here it is
public static IEnumerable<T> Add<T>(this IEnumerable<T> e, T value) {
foreach ( var cur in e) {
yield return cur;
}
yield return value;
}
Have you considered using ICollection<T> or IList<T> interfaces instead, they exist for the very reason that you want to have an Add method on an IEnumerable<T>.
IEnumerable<T> is used to 'mark' a type as being...well, enumerable or just a sequence of items without necessarily making any guarantees of whether the real underlying object supports adding/removing of items. Also remember that these interfaces implement IEnumerable<T> so you get all the extensions methods that you get with IEnumerable<T> as well.
In .net Core, there is a method Enumerable.Append that does exactly that.
The source code of the method is available on GitHub..... The implementation (more sophisticated than the suggestions in other answers) is worth a look :).
A couple short, sweet extension methods on IEnumerable and IEnumerable<T> do it for me:
public static IEnumerable Append(this IEnumerable first, params object[] second)
{
return first.OfType<object>().Concat(second);
}
public static IEnumerable<T> Append<T>(this IEnumerable<T> first, params T[] second)
{
return first.Concat(second);
}
public static IEnumerable Prepend(this IEnumerable first, params object[] second)
{
return second.Concat(first.OfType<object>());
}
public static IEnumerable<T> Prepend<T>(this IEnumerable<T> first, params T[] second)
{
return second.Concat(first);
}
Elegant (well, except for the non-generic versions). Too bad these methods are not in the BCL.
No, the IEnumerable doesn't support adding items to it. The alternative solution is
var myList = new List(items);
myList.Add(otherItem);
To add second message you need to -
IEnumerable<T> items = new T[]{new T("msg")};
items = items.Concat(new[] {new T("msg2")})
I just come here to say that, aside from Enumerable.Concat extension method, there seems to be another method named Enumerable.Append in .NET Core 1.1.1. The latter allows you to concatenate a single item to an existing sequence. So Aamol's answer can also be written as
IEnumerable<T> items = new T[]{new T("msg")};
items = items.Append(new T("msg2"));
Still, please note that this function will not change the input sequence, it just return a wrapper that put the given sequence and the appended item together.
Not only can you not add items like you state, but if you add an item to a List<T> (or pretty much any other non-read only collection) that you have an existing enumerator for, the enumerator is invalidated (throws InvalidOperationException from then on).
If you are aggregating results from some type of data query, you can use the Concat extension method:
Edit: I originally used the Union extension in the example, which is not really correct. My application uses it extensively to make sure overlapping queries don't duplicate results.
IEnumerable<T> itemsA = ...;
IEnumerable<T> itemsB = ...;
IEnumerable<T> itemsC = ...;
return itemsA.Concat(itemsB).Concat(itemsC);
Others have already given great explanations regarding why you can not (and should not!) be able to add items to an IEnumerable. I will only add that if you are looking to continue coding to an interface that represents a collection and want an add method, you should code to ICollection or IList. As an added bonanza, these interfaces implement IEnumerable.
you can do this.
//Create IEnumerable
IEnumerable<T> items = new T[]{new T("msg")};
//Convert to list.
List<T> list = items.ToList();
//Add new item to list.
list.add(new T("msg2"));
//Cast list to IEnumerable
items = (IEnumerable<T>)items;
Easyest way to do that is simply
IEnumerable<T> items = new T[]{new T("msg")};
List<string> itemsList = new List<string>();
itemsList.AddRange(items.Select(y => y.ToString()));
itemsList.Add("msg2");
Then you can return list as IEnumerable also because it implements IEnumerable interface
Instances implementing IEnumerable and IEnumerator (returned from IEnumerable) don't have any APIs that allow altering collection, the interface give read-only APIs.
The 2 ways to actually alter the collection:
If the instance happens to be some collection with write API (e.g. List) you can try casting to this type:
IList<string> list = enumerableInstance as IList<string>;
Create a list from IEnumerable (e.g. via LINQ extension method toList():
var list = enumerableInstance.toList();
IEnumerable items = Enumerable.Empty(T);
List somevalues = new List();
items.ToList().Add(someValues);
items.ToList().AddRange(someValues);
Sorry for reviving really old question but as it is listed among first google search results I assume that some people keep landing here.
Among a lot of answers, some of them really valuable and well explained, I would like to add a different point of vue as, to me, the problem has not be well identified.
You are declaring a variable which stores data, you need it to be able to change by adding items to it ? So you shouldn't use declare it as IEnumerable.
As proposed by #NightOwl888
For this example, just declare IList instead of IEnumerable: IList items = new T[]{new T("msg")}; items.Add(new T("msg2"));
Trying to bypass the declared interface limitations only shows that you made the wrong choice.
Beyond this, all methods that are proposed to implement things that already exists in other implementations should be deconsidered.
Classes and interfaces that let you add items already exists. Why always recreate things that are already done elsewhere ?
This kind of consideration is a goal of abstracting variables capabilities within interfaces.
TL;DR : IMO these are cleanest ways to do what you need :
// 1st choice : Changing declaration
IList<T> variable = new T[] { };
variable.Add(new T());
// 2nd choice : Changing instantiation, letting the framework taking care of declaration
var variable = new List<T> { };
variable.Add(new T());
When you'll need to use variable as an IEnumerable, you'll be able to. When you'll need to use it as an array, you'll be able to call 'ToArray()', it really always should be that simple. No extension method needed, casts only when really needed, ability to use LinQ on your variable, etc ...
Stop doing weird and/or complex things because you only made a mistake when declaring/instantiating.
Maybe I'm too late but I hope it helps anyone in the future.
You can use the insert function to add an item at a specific index.
list.insert(0, item);
Sure, you can (I am leaving your T-business aside):
public IEnumerable<string> tryAdd(IEnumerable<string> items)
{
List<string> list = items.ToList();
string obj = "";
list.Add(obj);
return list.Select(i => i);
}

Categories