Is this achievable with a single LINQ query? - c#

Suppose I have a given object of type IEnumerable<string> which is the return value of method SomeMethod(), and which contains no repeated elements. I would like to be able to "zip" the following lines in a single LINQ query:
IEnumerable<string> someList = SomeMethod();
if (someList.Contains(givenString))
{
return (someList.Where(givenString));
}
else
{
return (someList);
}
Edit: I mistakenly used Single instead of First. Corrected now.
I know I can "zip" this by using the ternary operator, but that's just not the point. I would just list to be able to achieve this with a single line. Is that possible?

This will return items with given string or all items if given is not present in the list:
someList.Where(i => i == givenString || !someList.Contains(givenString))

The nature of your desired output requires that you either make two requests for the data, like you are now, or buffer the non-matches to return if no matches are found. The later would be especially useful in cases where actually getting the data is a relatively expensive call (eg: database query or WCF service). The buffering method would look like this:
static IEnumerable<T> AllIfNone<T>(this IEnumerable<T> source,
Func<T, bool> predicate)
{
//argument checking ignored for sample purposes
var buffer = new List<T>();
bool foundFirst = false;
foreach (var item in source)
{
if (predicate(item))
{
foundFirst = true;
yield return item;
}
else if (!foundFirst)
{
buffer.Add(item);
}
}
if (!foundFirst)
{
foreach (var item in buffer)
{
yield return item;
}
}
}
The laziness of this method is either that of Where or ToList depending on if the collection contains a match or not. If it does, you should get execution similar to Where. If not, you will get roughly the execution of calling ToList (with the overhead of all the failed filter checks) and iterating the result.

What is wrong with the ternary operator?
someList.Any(s => s == givenString) ? someList.Where(s => s == givenString) : someList;
It would be better to do the Where followed by the Any but I can't think of how to one-line that.
var reducedEnumerable = someList.Where(s => s == givenString);
return reducedEnumerable.Any() ? reducedEnumerable : someList;

It is not possible to change the return type on the method, which is what you're asking. The first condition returns a string and the second condition returns a collection of strings.
Just return the IEnumerable<string> collection, and call Single on the return value like this:
string test = ReturnCollectionOfStrings().Single(x => x == "test");

Related

Looking for alternative LINQ expression(s)

I'm working on a code generator that validated objects based on certain business rules. As an example, I’m curious to find out various ways below logic can be written as LINQ expression.
Assertion should evaluate to true when collection is null OR when count of "TrueAndCorrect" items is anything but 1. One possible solution is:
bool assertion = report.DeclarationOfTrusteeCollection == null
|| report.DeclarationOfTrusteeCollection.Count(f => f.FTER99.Equals("TrueAndCorrect")) != 1
Are there other ways this LINQ can be expressed as, perhaps more compact, using Any, inverting the operators, or any other?
The original code is:
bool assertion =
report.DeclarationOfTrusteeCollection == null ||
report.DeclarationOfTrusteeCollection.Count(
f => f.FTER99.Equals("TrueAndCorrect")) != 1;
There are some problems here.
First, the intention of the null check seems to be "a null collection has the same semantics as an empty collection". This is a worst-practice in C#. Never do this! If you want to represent an empty collection, make an empty collection. There's even an Enumerable.Empty helper method for you.
So, start with that; the code should be:
if (report.DeclarationOfTrusteeCollection == null)
throw some appropriate exception
or
Debug.Assert(report.DeclarationOfTrusteeCollection != null);
if the condition is impossible.
That leaves us with
bool assertion =
report.DeclarationOfTrusteeCollection.Count(
f => f.FTER99.Equals("TrueAndCorrect")) != 1;
This is bad. Suppose I show you a jar that contains some number of pennies and I ask you "is there exactly one penny in the jar?" How many pennies do you have to count before you know the answer? Your code here is counting all of them, but you could stop after two.
Enumerable gives you a method which throws if a sequence is not a singleton, but no method that tests it. Fortunately it is easy to write. The best practice here is to write a helper method that has the exact semantics you want:
static class Extensions
{
public static bool IsSingleton<T>(this IEnumerable<T> items)
{
bool seenOne = false;
foreach(T item in items)
{
if (seenOne) return false;
seenOne = true;
}
return seenOne;
}
public static bool IsSingleton<T>(
this IEnumerable<T> items, Func<T, bool> predicate) =>
items.Where(predicate).IsSingleton();
}
Done. And now your code is:
if (report.DeclarationOfTrusteeCollection == null)
throw some appropriate exception
bool assertion =
report.DeclarationOfTrusteeCollection.IsSingleton(f => ...);
Write the code so that it reads like what it is logically doing. That's the beauty and power of LINQ sequence operators.
You could use the null-propagation operator:
bool assertion = report.DeclarationOfTrusteeCollection?.Count(f => f.FTER99.Equals("TrueAndCorrect")) != 1;
Since null is not 1 this is also true if the collection is null.
It would be nice if you don't need to count the whole collection, you already know it's wrong when there's more than one matching element. But I don't know of a built-in method for that. You could write your own extension:
public static class MyExtensions
{
public static bool IsNullOrHasNotExactlyOneMatching<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
if (source == null) return true;
bool found = false;
foreach(T element in source)
{
if (!predicate(element)) continue;
if (found) return true; // this is the second match!
found = true;
}
return !found; // one match found (or not)
}
}
And use it:
bool assertion = report.DeclarationOfTrusteeCollection.IsNullOrHasNotExactlyOneMatching(f => f.FTER99.Equals("TrueAndCorrect"));
As mentioned by Rawling you could shorten the extension using Take():
public static bool IsNullOrHasNotExactlyOneMatching<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
return source?.Where(predicate).Take(2).Count() != 1;
}
or do this directly:
bool assertion = report.DeclarationOfTrusteeCollection?.Where(f => f.FTER99.Equals("TrueAndCorrect"))
.Take(2).Count() != 1;
Both versions only iterate until a second match was found (or until the end if no match was found).

How to get excluded collection without a second LINQ query?

I have a LINQ query that looks like this:
var p = option.GetType().GetProperties().Where(t => t.PropertyType == typeof(bool));
What is the most efficient way to get the items which aren't included in this query, without executing a second iteration over the list.
I could easily do this with a for loop but I was wondering if there's a shorthand with LINQ.
var p = option.GetType().GetProperties().ToLookup(t => t.PropertyType == typeof(bool));
var bools = p[true];
var notBools = p[false];
.ToLookup() is used to partition an IEnumerable based on a key function. In this case, it will return an Lookup which will have at most 2 items in it. Items in the Lookup can be accessed using a key similar to an IDictionary.
.ToLookup() is evaluated immediately and is an O(n) operation and accessing a partition in the resulting Lookup is an O(1) operation.
Lookup is very similar to a Dictionary and have similar generic parameters (a Key type and a Value type). However, where Dictionary maps a key to a single value, Lookup maps a key to an set of values. Lookup can be implemented as IDictionary<TKey, IEnumerable<TValue>>
.GroupBy() could also be used. But it is different from .ToLookup() in that GroupBy is lazy evaluated and could possibly be enumerated multiple times. .ToLookup() is evaluated immediately and the work is only done once.
You cannot get something that you don't ask for. So if you exlude all but bool you can't expect to get them later. You need to ask for them.
For what it's worth, if you need both, the one you want and all other in a single query you could GroupBy this condition or use ToLookup which i would prefer:
var isboolOrNotLookup = option.GetType().GetProperties()
.ToLookup(t => t.PropertyType == typeof(bool)); // use PropertyType instead
Now you can use this lookup for further processing. For example, if you want a collection of all properties which are bool:
List<System.Reflection.PropertyInfo> boolTypes = isboolOrNotLookup[true].ToList();
or just the count:
int boolCount = isboolOrNotLookup[true].Count();
So if you want to process all which are not bool:
foreach(System.Reflection.PropertyInfo prop in isboolOrNotLookup[false])
{
}
Well, you could go for source.Except(p), but it would reiterate the list and perform a lot of comparisons.
I'd say - write an extension method that does it using foreach, basically splitting the list into two destinations. Or something like this.
How about:
public class UnzipResult<T>{
private readonly IEnumearator<T> _enumerator;
private readonly Func<T, bool> _filter;
private readonly Queue<T> _nonMatching = new Queue<T>();
private readonly Queue<T> _matching = new Queue<T>();
public IEnumerable<T> Matching {get{
if(_matching.Count > 0)
yield return _matching.Dequeue();
else {
while(_enumerator.MoveNext()){
if(_filter(_enumerator.Current))
yield return _enumerator.Current;
else
_nonMatching.Enqueue(_enumerator.Current);
}
yield break;
}
}}
public IEnumerable<T> Rest {get{
if(_matching.Count > 0)
yield return _nonMatching.Dequeue();
else {
while(_enumerator.MoveNext()){
if(!_filter(_enumerator.Current))
yield return _enumerator.Current;
else
_matching.Enqueue(_enumerator.Current);
}
yield break;
}
}}
public UnzipResult(IEnumerable<T> source, Func<T, bool> filter){
_enumerator = source.GetEnumerator();
_filter = filter;
}
}
public static UnzipResult<T> Unzip(this IEnumerable<T> source, Func<T,bool> filter){
return new UnzipResult(source, filter);
}
It's written in notepad, so probably doesn't compile, but my idea is: whatever collection you enumerate (matching or non-matching), you only enumerate the source once. And it should work fairly well with those pesky infinite collections (think yield return random.Next()), unless all elements do/don't fulfil filter.

Is the IEnumerable<T> function with a yield more efficient than the List<T> function?

I am coding a C# forms application, and would like to know if the following two functions achieve the same result:
public List<object> Method1(int parentId)
{
List<object> allChildren = new List<object>();
foreach (var item in list.Where(c => c.parentHtmlNodeForeignKey == parentId))
{
allChildren.Add(item);
allChildren.AddRange(Method1(item.id));
}
return allChildren;
}
public IEnumerable<object> Method2(int parentId)
{
foreach (var item in list.Where(c => c.parentHtmlNodeForeignKey == parentId))
{
yield return item;
foreach (var itemy in Method2(item.id))
{
yield return itemy;
}
}
}
Am I correct in saying that the Method1 function is more efficient than the Method2?
Also, can either of the above functions be coded to be more efficient?
EDIT
I am using the function to return some objects that are then displayed in a ListView. I am then looping through these same objects to check if a string occurs.
Thanks.
This highly depends on what you want to do. For example if you use FirstOrDefault(p => ....) the yield method can be faster because it's not required to store all the stuff into a list and if the first element is the right one the list method has some overhead ( Of course the yield method has also overhead but as i said it depends ).
If you want to iterate over and over again over the data then you should go with the list.
It depends on lot's of things.
Here are some reasons to use IEnumerable<T> over List<T>:
When you are iterating a part of a collection (e.g. using FirstOrDefault, Any, Take etc.).
When you have an large collection and you can ToList() it (e.g. Fibonacci Series).
When you shouldn't use IEnumerable<T> over List<T>:
When you are enumerating a DB query multiple times with different conditions (You may want the results in memory).
When you want to iterate the whole collection more than once - There is no need to create iterators each time.

How to properly check IEnumerable for existing results

What's the best practice to check if a collection has items?
Here's an example of what I have:
var terminalsToSync = TerminalAction.GetAllTerminals();
if(terminalsToSync.Any())
SyncTerminals(terminalsToSync);
else
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
The GetAllTerminals() method will execute a stored procedure and, if we return a result, (Any() is true), SyncTerminals() will loop through the elements; thus enumerating it again and executing the stored procedure for the second time.
What's the best way to avoid this?
I'd like a good solution that can be used in other cases too; possibly without converting it to List.
Thanks in advance.
I would probably use a ToArray call, and then check Length; you're going to enumerate all the results anyway so why not do it early? However, since you've said you want to avoid early realisation of the enumerable...
I'm guessing that SyncTerminals has a foreach, in which case you can write it something like this:
bool any = false;
foreach(var terminal in terminalsToSync)
{
if(!any)any = true;
//....
}
if(!any)
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
Okay, there's a redundant if after the first loop, but I'm guessing the cost of an extra few CPU cycles isn't going to matter much.
Equally, you could do the iteration the old way and use a do...while loop and GetEnumerator; taking the first iteration out of the loop; that way there are literally no wasted operations:
var enumerator = terminalsToSync.GetEnumerator();
if(enumerator.MoveNext())
{
do
{
//sync enumerator.Current
} while(enumerator.MoveNext())
}
else
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
How about this, which still defers execution, but buffers it once executed:
var terminalsToSync = TerminalAction.GetAllTerminals().Lazily();
with:
public static class LazyEnumerable {
public static IEnumerable<T> Lazily<T>(this IEnumerable<T> source) {
if (source is LazyWrapper<T>) return source;
return new LazyWrapper<T>(source);
}
class LazyWrapper<T> : IEnumerable<T> {
private IEnumerable<T> source;
private bool executed;
public LazyWrapper(IEnumerable<T> source) {
if (source == null) throw new ArgumentNullException("source");
this.source = source;
}
IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
public IEnumerator<T> GetEnumerator() {
if (!executed) {
executed = true;
source = source.ToList();
}
return source.GetEnumerator();
}
}
}
Personally i wouldnt use an any here, foreach will simply not loop through any items if the collection is empty, so i would just do it like that. However i would recommend that you check for null.
If you do want to pre-enumerate the set use .ToArray() eg will only enumerate once:
var terminalsToSync = TerminalAction.GetAllTerminals().ToArray();
if(terminalsToSync.Any())
SyncTerminals(terminalsToSync);
var terminalsToSync = TerminalAction.GetAllTerminals().ToList();
if(terminalsToSync.Any())
SyncTerminals(terminalsToSync);
else
GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
.Length or .Count is faster since it doesn't need to go through the GetEnumerator()/MoveNext()/Dispose() required by Any()
Here's another way of approaching this problem:
int count = SyncTerminals(terminalsToSync);
if(count == 0) GatewayLogAction.WriteLogInfo(Messages.NoTerminalsForSync);
where you change SyncTerminals to do:
int count = 0;
foreach(var obj in terminalsToSync) {
count++;
// some code
}
return count;
Nice and simple.
All the caching solutions here are caching all items when the first item is being retrieved. It it really lazy if you cache each single item while the items of the list are is iterated.
The difference can be seen in this example:
public class LazyListTest
{
private int _count = 0;
public void Test()
{
var numbers = Enumerable.Range(1, 40);
var numbersQuery = numbers.Select(GetElement).ToLazyList(); // Cache lazy
var total = numbersQuery.Take(3)
.Concat(numbersQuery.Take(10))
.Concat(numbersQuery.Take(3))
.Sum();
Console.WriteLine(_count);
}
private int GetElement(int value)
{
_count++;
// Some slow stuff here...
return value * 100;
}
}
If you run the Test() method, the _count is only 10. Without caching it would be 16 and with .ToList() it would be 40!
An example of the implementation of LazyList can be found here.
If you're seeing two procedure calls for the evaluation of whatever GetAllTerminals() returns, this means that the procedure's result isn't being cached. Without knowing what data-access strategy you're using, this is quite hard to fix in a general way.
The simplest solution, as you've alluded, is to copy the result of the call before you perform any other operations. If you wanted to, you could neatly wrap this behaviour up in an IEnumerable<T> which executes the inner enumerable call just once:
public class CachedEnumerable<T> : IEnumerable<T>
{
public CachedEnumerable<T>(IEnumerable<T> enumerable)
{
result = new Lazy<List<T>>(() => enumerable.ToList());
}
private Lazy<List<T>> result;
public IEnumerator<T> GetEnumerator()
{
return this.result.Value.GetEnumerator();
}
System.Collections.IEnumerable GetEnumerator()
{
return this.GetEnumerator();
}
}
Wrap the result in an instance of this type and it will not evaluate the inner enumerable multiple times.

Why does this linq extension method hit the database twice?

I have an extension method called ToListIfNotNullOrEmpty(), which is hitting the DB twice, instead of once. The first time it returns one result, the second time it returns all the correct results.
I'm pretty sure the first time it hits the database, is when the .Any() method is getting called.
here's the code.
public static IList<T> ToListIfNotNullOrEmpty<T>(this IEnumerable<T> value)
{
if (value.IsNullOrEmpty())
{
return null;
}
if (value is IList<T>)
{
return (value as IList<T>);
}
return new List<T>(value);
}
public static bool IsNullOrEmpty<T>(this IEnumerable<T> value)
{
if (value != null)
{
return !value.Any();
}
return true;
}
I'm hoping to refactor it so that, before the .Any() method is called, it actually enumerates through the entire list.
If i do the following, only one DB call is made, because the list is already enumerated.
var pewPew = (from x in whatever
select x)
.ToList() // This enumerates.
.ToListIsNotNullOrEmpty(); // This checks the enumerated result.
I sorta don't really want to call ToList() then my extension method.
Any ideas, folks?
I confess that I see little point in this method. Surely if you simply do a ToList(), a check to see if the list is empty suffices as well. It's arguably harder to handle the null result when you expect a list because then you always have to check for null before you iterate over it.
I think that:
var query = (from ...).ToList();
if (query.Count == 0) {
...
}
works as well and is less burdensome than
var query = (from ...).ToListIfNotNullOrEmpty();
if (query == null) {
...
}
and you don't have to implement (and maintain) any code.
How about something like this?
public static IList<T> ToListIfNotNullOrEmpty<T>(this IEnumerable<T> value)
{
if (value == null)
return null;
var list = value.ToList();
return (list.Count > 0) ? list : null;
}
To actually answer your question:
This method hits the database twice because the extension methods provided by the System.Linq.Enumerable class exhibit what is called deferred execution. Essentially, this is to eliminate the need for constructing a string of temporarily cached collections for every part of a query. To understand this, consider the following example:
var firstMaleTom = people
.Where(p => p.Gender = Gender.Male)
.Where(p => p.FirstName == "Tom")
.FirstOrDefault();
Without deferred execution, the above code might require that the entire collection people be enumerated over, populating a temporary buffer array with all the individuals whose Gender is Male. Then it would need to be enumerated over again, populating another buffer array with all of the individuals from the first buffer whose first name is Tom. After all that work, the last part would return the first item from the resulting array.
That's a lot of pointless work. The idea with deferred execution is that the above code really just sets up the firstMaleTom variable with the information it needs to return what's being requested with the minimal amount of work.
Now, there's a flip side to this: in the case of querying a database, deferred execution means that the database gets queried when the return value is evaluated. So, in your IsNullOrEmpty method, when you call Any, the value parameter is actually being evaluated right then and there -- hence a database query. After this, in your ToListIfNotNullOrEmpty method, the line return new List<T>(value) also evaluates the value parameter -- because it's enumerating over the values and adding them to the newly created List<T>.
You could stick the .ToList() call inside the extension, the effect is slightly different, but does this still work in the cases you have?
public static IList<T> ToListIfNotNullOrEmpty<T>(this IEnumerable<T> value)
{
if(value == null)
{
return null;
}
var result = value.ToList();
return result.IsNullOrEmpty() ? null : result;
}

Categories