How to get excluded collection without a second LINQ query? - c#

I have a LINQ query that looks like this:
var p = option.GetType().GetProperties().Where(t => t.PropertyType == typeof(bool));
What is the most efficient way to get the items which aren't included in this query, without executing a second iteration over the list?
I could easily do this with a for loop but I was wondering if there's a shorthand with LINQ.

var p = option.GetType().GetProperties().ToLookup(t => t.PropertyType == typeof(bool));
var bools = p[true];
var notBools = p[false];
.ToLookup() partitions an IEnumerable based on a key function. In this case, it returns a Lookup containing at most two groups (one for true, one for false). Groups in the Lookup are accessed by key, much like entries in an IDictionary; a key with no matching items yields an empty sequence.
.ToLookup() is evaluated immediately and is an O(n) operation; accessing a partition in the resulting Lookup is an O(1) operation.
Lookup is very similar to Dictionary and has similar generic parameters (a key type and an element type). However, where Dictionary maps a key to a single value, Lookup maps a key to a set of values. Conceptually, a Lookup behaves like an IDictionary<TKey, IEnumerable<TValue>>.
.GroupBy() could also be used. It differs from .ToLookup() in that GroupBy is lazily evaluated, so the grouping work is repeated each time the result is enumerated. .ToLookup() is evaluated immediately and the work is done only once.
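As a rough illustration of the partitioning described above, here is a minimal sketch. The Options and NoFlags classes and the SplitByBool helper are hypothetical names invented for this example; only ToLookup and the reflection calls come from the framework.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical types, used only for this illustration.
public class Options
{
    public bool Verbose { get; set; }
    public bool DryRun { get; set; }
    public string Name { get; set; }
    public int Retries { get; set; }
}

public class NoFlags
{
    public int Size { get; set; }
}

public static class LookupDemo
{
    // Partitions the public properties of a type in a single pass.
    public static ILookup<bool, PropertyInfo> SplitByBool(Type type)
    {
        return type.GetProperties().ToLookup(p => p.PropertyType == typeof(bool));
    }

    public static void Main()
    {
        ILookup<bool, PropertyInfo> lookup = SplitByBool(typeof(Options));
        Console.WriteLine(lookup[true].Count());   // 2 (Verbose, DryRun)
        Console.WriteLine(lookup[false].Count());  // 2 (Name, Retries)

        // No bool properties here: the missing key gives an empty sequence.
        Console.WriteLine(SplitByBool(typeof(NoFlags))[true].Any()); // False
    }
}
```

Because a missing key returns an empty sequence instead of throwing, both partitions can be read without checking whether either one exists.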

You cannot get something that you don't ask for. So if you exclude everything but bool, you can't expect to get the rest later. You need to ask for them.
For what it's worth, if you need both the ones you want and all the others from a single query, you could GroupBy this condition or use ToLookup, which I would prefer:
var isboolOrNotLookup = option.GetType().GetProperties()
.ToLookup(t => t.PropertyType == typeof(bool)); // use PropertyType instead
Now you can use this lookup for further processing. For example, if you want a collection of all properties which are bool:
List<System.Reflection.PropertyInfo> boolTypes = isboolOrNotLookup[true].ToList();
or just the count:
int boolCount = isboolOrNotLookup[true].Count();
So if you want to process all which are not bool:
foreach(System.Reflection.PropertyInfo prop in isboolOrNotLookup[false])
{
}

Well, you could go for source.Except(p), but it would reiterate the list and perform a lot of comparisons.
I'd say - write an extension method that does it using foreach, basically splitting the list into two destinations. Or something like this.
How about:
using System;
using System.Collections.Generic;
public class UnzipResult<T>
{
    private readonly IEnumerator<T> _enumerator;
    private readonly Func<T, bool> _filter;
    private readonly Queue<T> _nonMatching = new Queue<T>();
    private readonly Queue<T> _matching = new Queue<T>();
    public UnzipResult(IEnumerable<T> source, Func<T, bool> filter)
    {
        _enumerator = source.GetEnumerator();
        _filter = filter;
    }
    // Items that satisfy the filter.
    public IEnumerable<T> Matching { get { return Drain(_matching, _nonMatching, true); } }
    // Items that do not satisfy the filter.
    public IEnumerable<T> Rest { get { return Drain(_nonMatching, _matching, false); } }
    // Yields buffered items first, then advances the shared enumerator,
    // buffering any item that belongs to the other sequence.
    private IEnumerable<T> Drain(Queue<T> mine, Queue<T> theirs, bool wanted)
    {
        while (true)
        {
            if (mine.Count > 0)
                yield return mine.Dequeue();
            else if (_enumerator.MoveNext())
            {
                T item = _enumerator.Current;
                if (_filter(item) == wanted)
                    yield return item;
                else
                    theirs.Enqueue(item);
            }
            else
                yield break;
        }
    }
}
public static class UnzipExtensions
{
    public static UnzipResult<T> Unzip<T>(this IEnumerable<T> source, Func<T, bool> filter)
    {
        return new UnzipResult<T>(source, filter);
    }
}
It's written in notepad, so probably doesn't compile, but my idea is: whatever collection you enumerate (matching or non-matching), you only enumerate the source once. And it should work fairly well with those pesky infinite collections (think yield return random.Next()), unless all elements do/don't fulfil filter.
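For comparison, the simpler foreach-based split suggested earlier in this answer might look roughly like the sketch below. Partition is a hypothetical helper name, not a framework method, and the sketch assumes C# 7 or later for tuple syntax.

```csharp
using System;
using System.Collections.Generic;

public static class PartitionExtensions
{
    // Hypothetical helper: splits source into two lists in one pass.
    public static (List<T> Matching, List<T> Rest) Partition<T>(
        this IEnumerable<T> source, Func<T, bool> predicate)
    {
        var matching = new List<T>();
        var rest = new List<T>();
        foreach (var item in source)
        {
            // Each item is tested once and lands in exactly one list.
            (predicate(item) ? matching : rest).Add(item);
        }
        return (matching, rest);
    }
}

public static class PartitionDemo
{
    public static void Main()
    {
        var (evens, odds) = new[] { 1, 2, 3, 4, 5 }.Partition(n => n % 2 == 0);
        Console.WriteLine(string.Join(",", evens)); // 2,4
        Console.WriteLine(string.Join(",", odds));  // 1,3,5
    }
}
```

This version is eager, so unlike the lazy approach it cannot be used on an infinite sequence; in exchange, both results can be enumerated any number of times.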

Related

How to simply convert an IEnumerable into IOrderedEnumerable in O(1)? [duplicate]

Say there is an extension method to order an IQueryable based on several types of Sorting (i.e. sorting by various properties) designated by a SortMethod enum.
public static IOrderedEnumerable<AClass> OrderByX(this IQueryable<AClass> values,
SortMethod? sortMethod)
{
IOrderedEnumerable<AClass> queryRes = null;
switch (sortMethod)
{
case SortMethod.Method1:
queryRes = values.OrderBy(a => a.Property1);
break;
case SortMethod.Method2:
queryRes = values.OrderBy(a => a.Property2);
break;
case null:
queryRes = values.OrderBy(a => a.DefaultProperty);
break;
default:
queryRes = values.OrderBy(a => a.DefaultProperty);
break;
}
return queryRes;
}
In the case where sortMethod is null (i.e. where it is specified that I don't care about the order of the values), is there a way, instead of ordering by some default property, to just pass the values through as "ordered" without performing an actual sort?
I would like the ability to call this extension, and then possibly perform some additional ThenBy orderings.
All you need to do for the default case is:
queryRes = values.OrderBy(a => 1);
This will effectively be a no-op sort. Because OrderBy performs a stable sort, the original order is maintained when the selected keys are equal. Note that since this is an IQueryable and not an IEnumerable, it's possible for the query provider to not perform a stable sort. In that case, you need to know whether it's important that order be maintained, or whether it's appropriate to just say "I don't care what order the result is in, so long as I can call ThenBy on the result".
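The stable-sort behavior can be checked in LINQ to Objects with a sketch like the following. It uses an in-memory array rather than an IQueryable, so it says nothing about how a particular query provider behaves; the class and method names are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StableSortDemo
{
    // Constant key: every element compares equal, so input order is kept.
    public static List<string> ConstantKey(IEnumerable<string> words)
    {
        return words.OrderBy(w => 1).ToList();
    }

    // The following ThenBy is then the only ordering that has any effect.
    public static List<string> ConstantKeyThenLength(IEnumerable<string> words)
    {
        return words.OrderBy(w => 1).ThenBy(w => w.Length).ToList();
    }

    public static void Main()
    {
        var input = new[] { "pear", "fig", "apple", "kiwi" };
        Console.WriteLine(string.Join(",", ConstantKey(input)));
        // pear,fig,apple,kiwi
        Console.WriteLine(string.Join(",", ConstantKeyThenLength(input)));
        // fig,pear,kiwi,apple
    }
}
```

Note that "pear" stays ahead of "kiwi" in the second result: both have length 4, and the stable sort keeps their original relative order.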
Another option, that allows you to avoid the actual sort is to create your own IOrderedEnumerable implementation:
public class NoopOrder<T> : IOrderedEnumerable<T>
{
private IQueryable<T> source;
public NoopOrder(IQueryable<T> source)
{
this.source = source;
}
public IOrderedEnumerable<T> CreateOrderedEnumerable<TKey>(Func<T, TKey> keySelector, IComparer<TKey> comparer, bool descending)
{
if (descending)
{
return source.OrderByDescending(keySelector, comparer);
}
else
{
return source.OrderBy(keySelector, comparer);
}
}
public IEnumerator<T> GetEnumerator()
{
return source.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return source.GetEnumerator();
}
}
With that your query can be:
queryRes = new NoopOrder<AClass>(values);
Note that the consequence of the above class is that a subsequent call to ThenBy will effectively be a top-level sort; it in effect turns that ThenBy into an OrderBy call. (This should not be surprising: ThenBy calls the CreateOrderedEnumerable method, and in there this code calls OrderBy.) From a conceptual sorting point of view, this is a way of saying "all of the items in this sequence are equal in the eyes of this sort, but if you specify that equal objects should be tiebroken by something else, then do so."
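To see that effect in isolation, here is a small sketch of the same idea built over a plain IEnumerable so that it runs without a query provider. PassThroughOrder is a hypothetical name for this variant; it is not the class from the answer, only an in-memory analogue of it.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// In-memory analogue of the wrapper: claims to be ordered, sorts nothing.
public class PassThroughOrder<T> : IOrderedEnumerable<T>
{
    private readonly IEnumerable<T> _source;

    public PassThroughOrder(IEnumerable<T> source)
    {
        _source = source;
    }

    // ThenBy/ThenByDescending call this; it starts a fresh top-level sort.
    public IOrderedEnumerable<T> CreateOrderedEnumerable<TKey>(
        Func<T, TKey> keySelector, IComparer<TKey> comparer, bool descending)
    {
        return descending
            ? _source.OrderByDescending(keySelector, comparer)
            : _source.OrderBy(keySelector, comparer);
    }

    public IEnumerator<T> GetEnumerator() { return _source.GetEnumerator(); }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class PassThroughDemo
{
    public static void Main()
    {
        var wrapped = new PassThroughOrder<int>(new[] { 3, 1, 2 });
        Console.WriteLine(string.Join(",", wrapped));                 // 3,1,2
        Console.WriteLine(string.Join(",", wrapped.ThenBy(n => n)));  // 1,2,3
    }
}
```

Enumerating the wrapper directly returns the items untouched, while ThenBy behaves exactly as OrderBy would on the underlying sequence.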
Another way of thinking of a "no-op sort" is that it orders the items based on their index in the input sequence. This means that the items are not all "equal"; rather, the order of the input sequence will be the final order of the output sequence, and since each item in the input sequence always sorts after the one before it, adding additional "tiebreaker" comparisons will do nothing, making any subsequent ThenBy calls pointless. If this behavior is desired, it is even easier to implement than the previous one:
public class NoopOrder<T> : IOrderedEnumerable<T>
{
private IQueryable<T> source;
public NoopOrder(IQueryable<T> source)
{
this.source = source;
}
public IOrderedEnumerable<T> CreateOrderedEnumerable<TKey>(Func<T, TKey> keySelector, IComparer<TKey> comparer, bool descending)
{
return new NoopOrder<T>(source);
}
public IEnumerator<T> GetEnumerator()
{
return source.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return source.GetEnumerator();
}
}
If you always return the same key value, you will get an IOrderedEnumerable that preserves the original list order:
case null:
queryRes = values.OrderBy(a => 1);
break;
Btw, I don't think this is the right thing to do. You will get a collection that is supposed to be ordered but actually is not.
Bottom line, IOrderedEnumerable exists solely to provide a grammar structure to the OrderBy()/ThenBy() methods, preventing you from trying to start an ordering clause with ThenBy(). It's not intended to be a "marker" that identifies the collection as ordered, unless it was actually ordered by OrderBy(). So, the answer is that if the sorting method being null is supposed to indicate that the enumerable is in some "default order", you should specify that default order (as your current implementation does). It's disingenuous to state that the enumerable is ordered when in fact it isn't, even if, by not specifying a SortingMethod, you are implying it's "ordered by nothing" and don't care about the actual order.
The "problem" inherent in trying to simply mark the collection as ordered using the interface is that there's more to the process than simply sorting. By executing an ordering method chain, such as myCollection.OrderBy().ThenBy().ThenByDescending(), you're not actually sorting the collection with each call; not yet anyway. You are instead defining the behavior of an "iterator" class, named OrderedEnumerable, which will use the projections and comparisons you define in the chain to perform the sorting at the moment you need an actual sorted element.
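The deferred nature of an ordering chain can be observed with a sketch like this one, which counts how often the key selector runs. The names are invented for the example, and it assumes C# 7 or later for the tuple return type.

```csharp
using System;
using System.Linq;

public static class DeferredSortDemo
{
    public static int KeyCalls; // how many times the key selector has run

    public static (int Before, int After, int[] Sorted) Run()
    {
        KeyCalls = 0;

        // Building the chain only records what to do; nothing is sorted yet.
        var ordered = new[] { 3, 1, 2 }
            .OrderBy(n => { KeyCalls++; return n; })
            .ThenByDescending(n => n);
        int before = KeyCalls;

        // Enumerating (here via ToArray) is what triggers the actual sort.
        int[] sorted = ordered.ToArray();
        return (before, KeyCalls, sorted);
    }

    public static void Main()
    {
        var result = Run();
        Console.WriteLine(result.Before);                    // 0
        Console.WriteLine(result.After > 0);                 // True
        Console.WriteLine(string.Join(",", result.Sorted));  // 1,2,3
    }
}
```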
Servy's answer, stating that OrderBy(x=>1) is a no-op and should be optimized out of SQL providers, ignores the reality that this call, made against an Enumerable, will still do quite a bit of work, and that most SQL providers in fact do not optimize this kind of call; OrderBy(x=>1) will, in most LINQ providers, produce a query with an "ORDER BY 1" clause, which not only forces the SQL provider to perform its own sorting, it will actually result in a change to the order, because in T-SQL at least "ORDER BY 1" means to order by the first column of the select list.

Force IEnumerable<T> to evaluate without calling .ToArray() or .ToList()

If I query EF using something like this...
IEnumerable<FooBar> fooBars = db.FooBars.Where(o => o.SomeValue == something);
IIRC, this creates a lazily evaluated, iterable state machine in the background that does not yet contain any results; rather, it contains an expression of "how" to obtain the results when required.
If I want to force the collection to contain results I have to call .ToArray() or .ToList()
Is there a way to force an IEnumerable<T> collection to contain results without calling .ToArray() or .ToList(); ?
Rationale
I don't know if the CLR is capable of doing this, but essentially I want to forcibly create an evaluated collection that implements the IEnumerable<T> interface, but is implemented under the hood by the CLR, thus NOT a List<T> or Array<T>
Presumably this is not possible, since I'm not aware of any CLR capability to create in-memory, evaluated collections that implement IEnumerable<T>
Proposal
Say for example, I could write something like this:
var x = db.FooBars
.Where(o => o.SomeValue == something)
.Evaluate(); // Does NOT return a "concrete" impl such as List<T> or Array<T>
Console.WriteLine(x.GetType().Name);
// eg. <EvaluatedEnumerable>e__123
Is there a way to force an IEnumerable<T> collection to contain results without calling .ToArray() or .ToList(); ?
Yes, but it is perhaps not what you want:
IEnumerable<T> source = …;
IEnumerable<T> cached = new List<T>(source);
The thing is, IEnumerable<T> is not a concrete type. It is an interface (contract) representing an item sequence. There can be any concrete type "hiding behind" this interface; some might only represent a query, others actually hold the queried items in memory.
If you want to force-evaluate your sequence so that the result is actually stored in memory, you need to make sure that the concrete type behind IEnumerable<T> is an in-memory collection that holds the results of the evaluation. The above code example does just that.
You can use a foreach loop:
foreach (var item in fooBars) { }
Note that this evaluates all items in fooBars, but throws away the result immediately. Next time you run the same foreach loop or .ToArray(), .ToList(), the enumerable will be evaluated once again.
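The difference between re-running a lazy sequence and reading a cached one can be shown with a counter. This is a minimal sketch using an iterator method in place of a database query; all names in it are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ReevaluationDemo
{
    public static int Produced; // counts items produced by the iterator

    // Stands in for a lazy query: runs again on every enumeration.
    private static IEnumerable<int> Numbers()
    {
        for (int i = 1; i <= 3; i++)
        {
            Produced++;
            yield return i;
        }
    }

    private static void EnumerateTwice(IEnumerable<int> items)
    {
        foreach (var item in items) { }
        foreach (var item in items) { }
    }

    public static int CountLazy()
    {
        Produced = 0;
        EnumerateTwice(Numbers());
        return Produced;
    }

    public static int CountCached()
    {
        Produced = 0;
        EnumerateTwice(new List<int>(Numbers())); // evaluated once, up front
        return Produced;
    }

    public static void Main()
    {
        Console.WriteLine(CountLazy());   // 6
        Console.WriteLine(CountCached()); // 3
    }
}
```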
A concrete use case I've run into revolves around needing to ensure that an IEnumerable that wraps a DB Query has begun returning results (indicating that the query did not time out) before returning control to the calling method. But the results are too large to evaluate fully, hence the IEnumerable to support streaming.
internal class EagerEvaluator<T>
{
private readonly T _first;
private readonly IEnumerator<T> _enumerator;
private readonly bool _hasFirst;
public EagerEvaluator(IEnumerable<T> enumerable)
{
_enumerator = enumerable.GetEnumerator();
if (_enumerator.MoveNext())
{
_hasFirst = true;
_first = _enumerator.Current;
}
}
public IEnumerable<T> ToEnumerable()
{
if (_hasFirst)
{
yield return _first;
while (_enumerator.MoveNext())
{
yield return _enumerator.Current;
}
}
}
}
The usage is pretty straightforward:
IEnumerable<FooBar> eager = new EagerEvaluator<FooBar>(fooBars).ToEnumerable();
Another option is:
<linq expression>.All( x => true);
I use Aggregate<T>() to evaluate an IEnumerable<T> with side effects:
private static IEnumerable<T> Evaluate<T>(IEnumerable<T> source)
=> source.Aggregate(Enumerable.Empty<T>(), (evaluated, s) => evaluated.Append(s));
See it in action: https://dotnetfiddle.net/iya2l0

Does IEnumerable always imply a collection?

Just a quick question regarding IEnumerable:
Does IEnumerable always imply a collection? Or is it legitimate/viable/okay/whatever to use on a single object?
The IEnumerable and IEnumerable<T> interfaces suggest a sequence of some kind, but that sequence doesn't need to be a concrete collection.
For example, where's the underlying concrete collection in this case?
foreach (int i in new EndlessRandomSequence().Take(5))
{
Console.WriteLine(i);
}
// ...
public class EndlessRandomSequence : IEnumerable<int>
{
public IEnumerator<int> GetEnumerator()
{
var rng = new Random();
while (true) yield return rng.Next();
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
An IEnumerable is always implemented by a single object; that single object is the holder or producer of zero or more other objects, which do not themselves need to have any relation to IEnumerable.
It's usual, but not mandatory, that IEnumerable represents a collection.
Enumerables can be collections, as well as generators, queries, and even computations.
Generator:
IEnumerable<int> Generate(
int initial,
Func<int, bool> condition,
Func<int, int> iterator)
{
var i = initial;
while (true)
{
yield return i;
i = iterator(i);
if (!condition(i))
{
yield break;
}
}
}
Query:
IEnumerable<Process> GetProcessesWhereNameContains(string text)
{
// Could be web-service or database call too
var processes = System.Diagnostics.Process.GetProcesses();
foreach (var process in processes)
{
if (process.ProcessName.Contains(text))
{
yield return process;
}
}
}
Computation:
IEnumerable<double> Average(IEnumerable<double> values)
{
var sum = 0.0;
var count = 0;
foreach (var value in values)
{
sum += value;
yield return sum/++count;
}
}
LINQ is itself a series of operators that produce objects that implement IEnumerable<T> that don't have any underlying collections.
Good question, BTW!
NB: Any reference to IEnumerable also applies to IEnumerable<T> as the latter inherits the former.
Yes, IEnumerable implies a collection, or possible collection, of items.
The name is derived from enumerate, which means to:
Mention (a number of things) one by one.
Establish the number of.
According to the docs, it exposes the enumerator over a collection.
You can certainly use it on a single object, but this object will then just be exposed as an enumeration containing a single object, i.e. you could have an IEnumerable<int> with a single integer:
IEnumerable<int> items = new[] { 42 };
IEnumerable represents a collection that can be enumerated, not a single item. Look at MSDN; the interface exposes GetEnumerator(), which
...[r]eturns an enumerator that iterates through a collection.
Yes, IEnumerable always implies a collection, that is what enumerate means.
What is your use case for a single object?
I don't see a problem with using it on a single object, but why do you want to do this?
I'm not sure whether you mean a "collection" or a .NET "ICollection" but since other people have only mentioned the former I will mention the latter.
http://msdn.microsoft.com/en-us/library/92t2ye13.aspx
By that definition, all ICollections are IEnumerable, but not the other way around.
Most data structures (even Array) simply implement both interfaces.
Going on this train of thought: you could have a car depot (a single object) that does not expose an internal data structure, and put IEnumerable on it. I suppose.
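The car depot idea could be sketched as follows. CarDepot and its contents are purely illustrative; the point is only that callers can enumerate the object without ever seeing how it stores its items.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical type: a single object that can be enumerated,
// while its internal storage stays private.
public class CarDepot : IEnumerable<string>
{
    private readonly string[] _cars = { "sedan", "van", "truck" };

    public IEnumerator<string> GetEnumerator()
    {
        foreach (var car in _cars)
        {
            yield return car;
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class CarDepotDemo
{
    public static void Main()
    {
        foreach (var car in new CarDepot())
        {
            Console.WriteLine(car); // sedan, van, truck (one per line)
        }
    }
}
```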

Is this achievable with a single LINQ query?

Suppose I have a given object of type IEnumerable<string> which is the return value of method SomeMethod(), and which contains no repeated elements. I would like to be able to "zip" the following lines in a single LINQ query:
IEnumerable<string> someList = SomeMethod();
if (someList.Contains(givenString))
{
return someList.Where(s => s == givenString);
}
else
{
return (someList);
}
Edit: I mistakenly used Single instead of First. Corrected now.
I know I can "zip" this by using the ternary operator, but that's just not the point. I would just like to be able to achieve this with a single line. Is that possible?
This will return items with given string or all items if given is not present in the list:
someList.Where(i => i == givenString || !someList.Contains(givenString))
The nature of your desired output requires that you either make two requests for the data, like you are now, or buffer the non-matches to return if no matches are found. The latter would be especially useful in cases where actually getting the data is a relatively expensive call (e.g. a database query or WCF service). The buffering method would look like this:
static IEnumerable<T> AllIfNone<T>(this IEnumerable<T> source,
Func<T, bool> predicate)
{
//argument checking ignored for sample purposes
var buffer = new List<T>();
bool foundFirst = false;
foreach (var item in source)
{
if (predicate(item))
{
foundFirst = true;
yield return item;
}
else if (!foundFirst)
{
buffer.Add(item);
}
}
if (!foundFirst)
{
foreach (var item in buffer)
{
yield return item;
}
}
}
The laziness of this method is either that of Where or ToList depending on if the collection contains a match or not. If it does, you should get execution similar to Where. If not, you will get roughly the execution of calling ToList (with the overhead of all the failed filter checks) and iterating the result.
What is wrong with the ternary operator?
someList.Any(s => s == givenString) ? someList.Where(s => s == givenString) : someList;
It would be better to do the Where followed by the Any but I can't think of how to one-line that.
var reducedEnumerable = someList.Where(s => s == givenString);
return reducedEnumerable.Any() ? reducedEnumerable : someList;
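One caveat with the two-line version is that a lazy source may be enumerated more than once (by Any, and then again by the caller). A possible variant, assuming the data fits in memory, materializes the source first; MatchOrAll is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MatchOrAllDemo
{
    // Returns the matching items, or everything when nothing matches.
    public static IEnumerable<string> MatchOrAll(
        IEnumerable<string> source, string given)
    {
        List<string> all = source.ToList();  // source is enumerated exactly once
        List<string> matches = all.Where(s => s == given).ToList();
        return matches.Count > 0 ? matches : all;
    }

    public static void Main()
    {
        var items = new[] { "a", "b", "c" };
        Console.WriteLine(string.Join(",", MatchOrAll(items, "b"))); // b
        Console.WriteLine(string.Join(",", MatchOrAll(items, "z"))); // a,b,c
    }
}
```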
It is not possible to change the return type on the method, which is what you're asking. The first condition returns a string and the second condition returns a collection of strings.
Just return the IEnumerable<string> collection, and call Single on the return value like this:
string test = ReturnCollectionOfStrings().Single(x => x == "test");

Check if multiple values (stored in a dedicated collection) are in a LINQ collection, in query

What is the method in LINQ to supply a collection of values and check if any/all of these values are in a collection?
Thanks
You can emulate this via .Intersect() and check if the intersection set has all the required elements. I guess this is pretty inefficient but quick and dirty.
List<T> list = ...
List<T> shouldBeContained = ...
// Assumes shouldBeContained has no duplicates, since Intersect returns distinct elements.
bool containsAll = list.Intersect(shouldBeContained).Count() == shouldBeContained.Count;
Or you could do it with .All(). I guess this is more efficient and cleaner:
List<T> list = ...
List<T> shouldBeContained = ...
bool containsAll = shouldBeContained.All(x => list.Contains(x));
LINQ has a number of operators that can be used to check existence of one set of values in another.
I would use Intersect:
Produces the set intersection of two sequences by using the default equality comparer to compare values.
While there's nothing easy that is built in...you could always create extension methods to make life easier:
public static bool ContainsAny<T>(this IEnumerable<T> data,
IEnumerable<T> intersection)
{
foreach(T item in intersection)
if(data.Contains(item))
return true;
return false;
}
public static bool ContainsAll<T>(this IEnumerable<T> data,
IEnumerable<T> intersection)
{
foreach(T item in intersection)
if(!data.Contains(item))
return false;
return true;
}
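If the collections are large, calling Contains on a plain sequence for every item can become slow. One possible alternative, sketched here with hypothetical method names, builds a HashSet once and uses its IsSupersetOf and Overlaps methods; it assumes the default equality comparer is appropriate for T.

```csharp
using System;
using System.Collections.Generic;

public static class SetChecks
{
    // True when every required value occurs in data.
    public static bool ContainsAllFast<T>(IEnumerable<T> data, IEnumerable<T> required)
    {
        return new HashSet<T>(data).IsSupersetOf(required);
    }

    // True when at least one candidate value occurs in data.
    public static bool ContainsAnyFast<T>(IEnumerable<T> data, IEnumerable<T> candidates)
    {
        return new HashSet<T>(data).Overlaps(candidates);
    }

    public static void Main()
    {
        var data = new[] { 1, 2, 3, 4 };
        Console.WriteLine(ContainsAllFast(data, new[] { 2, 4 }));  // True
        Console.WriteLine(ContainsAllFast(data, new[] { 2, 9 }));  // False
        Console.WriteLine(ContainsAnyFast(data, new[] { 9, 3 }));  // True
        Console.WriteLine(ContainsAnyFast(data, new[] { 8, 9 }));  // False
    }
}
```

Duplicates in either input do not affect the result, which sidesteps the count mismatch that the Intersect-based check can run into.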
