Looking for alternative LINQ expression(s) - c#

I'm working on a code generator that validates objects based on certain business rules. As an example, I'm curious to find out the various ways the logic below can be written as a LINQ expression.
The assertion should evaluate to true when the collection is null OR when the count of "TrueAndCorrect" items is anything but 1. One possible solution is:
bool assertion = report.DeclarationOfTrusteeCollection == null
|| report.DeclarationOfTrusteeCollection.Count(f => f.FTER99.Equals("TrueAndCorrect")) != 1;
Are there other ways this LINQ expression can be written, perhaps more compactly, using Any, inverting the operators, or something else?

The original code is:
bool assertion =
report.DeclarationOfTrusteeCollection == null ||
report.DeclarationOfTrusteeCollection.Count(
f => f.FTER99.Equals("TrueAndCorrect")) != 1;
There are some problems here.
First, the intention of the null check seems to be "a null collection has the same semantics as an empty collection". This is a worst-practice in C#. Never do this! If you want to represent an empty collection, make an empty collection. There's even an Enumerable.Empty helper method for you.
So, start with that; the code should be:
if (report.DeclarationOfTrusteeCollection == null)
throw some appropriate exception
or
Debug.Assert(report.DeclarationOfTrusteeCollection != null);
if the condition is impossible.
That leaves us with
bool assertion =
report.DeclarationOfTrusteeCollection.Count(
f => f.FTER99.Equals("TrueAndCorrect")) != 1;
This is bad. Suppose I show you a jar that contains some number of pennies and I ask you "is there exactly one penny in the jar?" How many pennies do you have to count before you know the answer? Your code here is counting all of them, but you could stop after two.
Enumerable gives you a method which throws if a sequence is not a singleton, but no method that tests it. Fortunately it is easy to write. The best practice here is to write a helper method that has the exact semantics you want:
static class Extensions
{
public static bool IsSingleton<T>(this IEnumerable<T> items)
{
bool seenOne = false;
foreach(T item in items)
{
if (seenOne) return false;
seenOne = true;
}
return seenOne;
}
public static bool IsSingleton<T>(
this IEnumerable<T> items, Func<T, bool> predicate) =>
items.Where(predicate).IsSingleton();
}
Done. And now your code is:
if (report.DeclarationOfTrusteeCollection == null)
throw some appropriate exception
bool assertion =
report.DeclarationOfTrusteeCollection.IsSingleton(f => ...);
Write the code so that it reads like what it is logically doing. That's the beauty and power of LINQ sequence operators.

You could use the null-propagation operator:
bool assertion = report.DeclarationOfTrusteeCollection?.Count(f => f.FTER99.Equals("TrueAndCorrect")) != 1;
Since null is not 1 this is also true if the collection is null.
It would be nice not to have to count the whole collection: you already know the result once there is more than one matching element. But I don't know of a built-in method for that. You could write your own extension:
public static class MyExtensions
{
public static bool IsNullOrHasNotExactlyOneMatching<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
if (source == null) return true;
bool found = false;
foreach(T element in source)
{
if (!predicate(element)) continue;
if (found) return true; // this is the second match!
found = true;
}
return !found; // false if exactly one match was found, true if none
}
}
And use it:
bool assertion = report.DeclarationOfTrusteeCollection.IsNullOrHasNotExactlyOneMatching(f => f.FTER99.Equals("TrueAndCorrect"));
As mentioned by Rawling you could shorten the extension using Take():
public static bool IsNullOrHasNotExactlyOneMatching<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
return source?.Where(predicate).Take(2).Count() != 1;
}
or do this directly:
bool assertion = report.DeclarationOfTrusteeCollection?.Where(f => f.FTER99.Equals("TrueAndCorrect"))
.Take(2).Count() != 1;
Both versions only iterate until a second match is found (or until the end if fewer than two matches exist).
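One way to check the early-termination claim is to count how many elements the source actually hands out. The following is a minimal sketch, not part of the original answers; `EarlyExitDemo`, `CountingSource` and `ElementsConsumed` are hypothetical names introduced only for illustration, and the predicate (even numbers) is a stand-in for the real FTER99 check.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EarlyExitDemo
{
    // Number of elements the source has produced so far.
    public static int Yielded;

    // Hypothetical source: yields 0..count-1 and records each element it produces.
    public static IEnumerable<int> CountingSource(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Yielded++;
            yield return i;
        }
    }

    // Evaluates Where(...).Take(2).Count() != 1 over a 1000-element sequence
    // and returns how many source elements were consumed while doing so.
    public static int ElementsConsumed()
    {
        Yielded = 0;
        bool assertion = CountingSource(1000)
            .Where(x => x % 2 == 0)   // stand-in predicate: many elements match
            .Take(2)                  // never ask for more than two matches
            .Count() != 1;
        Console.WriteLine($"assertion = {assertion}, elements consumed = {Yielded}");
        return Yielded;
    }
}
```

With LINQ to Objects the consumed count should stay close to the position of the second match rather than approaching 1000, whereas a plain Count(predicate) would walk the whole sequence.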

Related

Checking if all items in one generic collection exist in another using custom comparison delegate

I have a situation where I need a generic method to which I can pass two collections of type T along with a delegate that compares the two collections and returns true if every element in collection 1 has an equal element in collection 2, even if they are not in the same index of the collection. What I mean by "equal" is handled by the delegate. My initial thought was to return false if the collections were different lengths and otherwise sort them and then compare them like parallel arrays. Then it occurred to me that I can't sort a collection of a generic type without the types sharing an interface. So now I am thinking a LINQ expression might do the trick, but I can't think of how to write it. Consider my current code:
private static bool HasSameCollectionItems<T>(ICollection<T> left, ICollection<T> right, Func<T, T, bool> func)
{
if (left.Count != right.Count)
{
return false;
}
foreach (var item in left)
{
bool leftItemIsInRightCollection = ??? MAGIC ???
if (!leftItemIsInRightCollection)
{
return false;
}
}
return true;
}
I would like to replace ??? MAGIC ??? with a LINQ expression to see if item is "equal" to an element in right using the passed in delegate func. Is this even possible?
Note: For reasons I don't want to bother getting into here, implementing IEquatable or overriding the Equals method is not an option here.
It looks like you want the .All() and .Any() methods (the first checks that all elements satisfy a condition; the second only checks whether such an element exists):
bool leftItemIsInRightCollection = right.Any(rItem => func(item, rItem));
Also, I'd refactor your code to something like:
private static bool HasSameCollectionItems<T>(ICollection<T> left, ICollection<T> right, Func<T, T, bool> func)
{
return left.Count == right.Count && left.All(LI => right.Any(RI => func(LI, RI)));
}
The following works by checking whether there are elements in left which are not in right.
If you insist on a delegate to determine equality, you can use the FuncEqualityComparer from here. (Note that you must also provide an implementation for Object.GetHashCode)
private static bool HasSameCollectionItems<T>(ICollection<T> left, ICollection<T> right, IEqualityComparer<T> comparer)
{
if (left.Count != right.Count) return false;
return !left.Except(right, comparer).Any();
}
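The link to FuncEqualityComparer did not survive in this copy, so here is a minimal sketch of what such an adapter might look like. This is an assumption about its shape, not the linked implementation: the class name is reused for readability, while the constructor signature, `CollectionComparison` and `CaseInsensitiveExample` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical adapter: wraps two delegates as an IEqualityComparer<T>.
public sealed class FuncEqualityComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> _equals;
    private readonly Func<T, int> _getHashCode;

    public FuncEqualityComparer(Func<T, T, bool> equals, Func<T, int> getHashCode)
    {
        _equals = equals ?? throw new ArgumentNullException(nameof(equals));
        _getHashCode = getHashCode ?? throw new ArgumentNullException(nameof(getHashCode));
    }

    public bool Equals(T x, T y) => _equals(x, y);

    // Items that compare equal must produce the same hash code,
    // otherwise set-based operators such as Except can miss matches.
    public int GetHashCode(T obj) => _getHashCode(obj);
}

public static class CollectionComparison
{
    public static bool HasSameCollectionItems<T>(
        ICollection<T> left, ICollection<T> right, IEqualityComparer<T> comparer)
    {
        if (left.Count != right.Count) return false;
        return !left.Except(right, comparer).Any();
    }

    // Usage sketch: compare two string collections ignoring case and order.
    public static bool CaseInsensitiveExample()
    {
        var ignoreCase = new FuncEqualityComparer<string>(
            (a, b) => string.Equals(a, b, StringComparison.OrdinalIgnoreCase),
            s => StringComparer.OrdinalIgnoreCase.GetHashCode(s));
        return HasSameCollectionItems(new[] { "a", "B" }, new[] { "b", "A" }, ignoreCase);
    }
}
```

The hash delegate is the part that is easy to get wrong: Except builds a hash set internally, so two items the equality delegate considers equal are only matched if they also hash to the same value.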

Does Any() stop on success?

To be more specific: will the Linq extension method Any(IEnumerable collection, Func predicate) stop checking all the remaining elements of the collections once the predicate has yielded true for an item?
Because I don't want to spend too much time on figuring out if I need to do the really expensive parts at all:
if(lotsOfItems.Any(x => x.ID == target.ID))
//do expensive calculation here
So if Any is always checking all the items in the source this might end up being a waste of time instead of just going with:
var candidate = lotsOfItems.FirstOrDefault(x => x.ID == target.ID);
if(candidate != null)
//do expensive calculation here
because I'm pretty sure that FirstOrDefault does return once it got a result and only keeps going through the whole Enumerable if it does not find a suitable entry in the collection.
Does anyone have information about the internal workings of Any, or could anyone suggest a solution for this kind of decision?
Also, a colleague suggested something along the lines of:
if(!lotsOfItems.All(x => x.ID != target.ID))
since this is supposed to stop once the condition returns false for the first time, but I'm not sure about that, so if anyone could shed some light on this as well it would be appreciated.
As we see from the source code, Yes:
internal static bool Any<T>(this IEnumerable<T> source, Func<T, bool> predicate) {
foreach (T element in source) {
if (predicate(element)) {
return true; // Attention to this line
}
}
return false;
}
Any() is the most efficient way to determine whether any element of a sequence satisfies a condition with LINQ.
Also: "a colleague suggested something along the lines of if(!lotsOfItems.All(x => x.ID != target.ID)) since this is supposed to stop once the condition returns false for the first time but I'm not sure on that, so if anyone could shed some light on this as well it would be appreciated"
All() determines whether all elements of a sequence satisfy a condition. So, the enumeration of source is stopped as soon as the result can be determined.
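To make the symmetry between Any(predicate) and !All(inversePredicate) concrete, the sketch below counts how many elements each call pulls from a lazily generated sequence. It assumes LINQ to Objects; `ShortCircuitDemo`, `Tracked`, `PulledByAny` and `PulledByNotAll` are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ShortCircuitDemo
{
    // Yields 1..count and records every element it produces in 'seen'.
    private static IEnumerable<int> Tracked(int count, List<int> seen)
    {
        for (int i = 1; i <= count; i++)
        {
            seen.Add(i);
            yield return i;
        }
    }

    // Returns how many elements Any() pulled before it found 'target'.
    public static int PulledByAny(int count, int target)
    {
        var seen = new List<int>();
        Tracked(count, seen).Any(x => x == target);
        return seen.Count;
    }

    // Returns how many elements All() pulled with the inverted predicate,
    // i.e. the work done by !All(x => x != target).
    public static int PulledByNotAll(int count, int target)
    {
        var seen = new List<int>();
        Tracked(count, seen).All(x => x != target);
        return seen.Count;
    }
}
```

For example, PulledByAny(100, 3) and PulledByNotAll(100, 3) both return 3: each stops at the third element, where the outcome is already decided.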
Additional note:
The above is true for LINQ to Objects. With a database-backed LINQ provider, the call is instead translated into a query and executed against the database.
You could test it yourself: https://ideone.com/nIDKxr
public static IEnumerable<int> Tester()
{
yield return 1;
yield return 2;
throw new Exception();
}
static void Main(string[] args)
{
Console.WriteLine(Tester().Any(x => x == 1));
Console.WriteLine(Tester().Any(x => x == 2));
try
{
Console.WriteLine(Tester().Any(x => x == 3));
}
catch
{
Console.WriteLine("Error here");
}
}
Yes, it does :-)
Also: "a colleague suggested something along the lines of if(!lotsOfItems.All(x => x.ID != target.ID)) since this is supposed to stop once the condition returns false for the first time but I'm not sure on that, so if anyone could shed some light on this as well it would be appreciated"
Using the same reasoning, All() could continue even if one of the elements returns false :-) No, even All() is programmed correctly :-)
It does whatever is the quickest way of doing what it has to do.
When used on an IEnumerable this will be along the lines of:
foreach(var item in source)
if(predicate(item))
return true;
return false;
Or for the variant that doesn't take a predicate:
using(var en = source.GetEnumerator())
return en.MoveNext();
When run against a database it will be something like
SELECT EXISTS(SELECT null FROM [some table] WHERE [some where clause])
And so on. How that was executed would depend in turn on what indices were available for fulfilling the WHERE clause, so it could be a quick index lookup, a full table scan aborting on first match found, or an index lookup followed by a partial table scan aborting on first match found, depending on that.
Yet other Linq providers would have yet other implementations, but generally the people responsible will be trying to be at least reasonably efficient.
In all, you can depend upon it being at least slightly more efficient than calling FirstOrDefault, as FirstOrDefault uses similar approaches but does have to return a full object (perhaps constructing it). Likewise !All(inversePredicate) tends to be pretty much on a par with Any(predicate) as per this answer.
Single is an exception to this
Update: The following from this point on no longer applies to .NET Core, which has changed the implementation of Single.
It's important to note that in the case of linq-to objects, the overloads of Single and SingleOrDefault that take a predicate do not stop on identified failure. While the obvious approach to Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) would be something like:
public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
/* do null checks */
using(var en = source.GetEnumerator())
while(en.MoveNext())
{
var val = en.Current;
if(predicate(val))
{
while(en.MoveNext())
if(predicate(en.Current))
throw new InvalidOperationException("too many matching items");
return val;
}
}
throw new InvalidOperationException("no matching items");
}
The actual implementation is something like:
public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
/* do null checks */
var result = default(TSource);
long tally = 0;
foreach(var item in source)
if(predicate(item))
{
result = item;
checked{++tally;}
}
switch(tally)
{
case 0:
throw new InvalidOperationException("no matching items");
case 1:
return result;
default:
throw new InvalidOperationException("too many matching items");
}
}
Now, while a successful Single has to scan everything, this means that an unsuccessful Single can be much, much slower than it needs to be (and can even potentially throw an undocumented error). If the reason for the unexpected duplicate is a bug that is duplicating items into the sequence, and hence making it far larger than it should be, then the Single that should have helped you find that problem is now dragging its way through all of it.
SingleOrDefault has the same issue.
This only applies to linq-to-objects, but it remains safer to do .Where(predicate).Single() rather than Single(predicate).
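The difference can be measured rather than taken on trust. The sketch below counts how many elements each form pulls from a sequence that contains many matches; `SingleScanDemo` and its members are hypothetical names. Only the Where(predicate).Single() figure is expected to be small everywhere; what Single(predicate) reports depends on the runtime, as noted in the update above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SingleScanDemo
{
    // Yields 0..count-1 and increments pulled[0] for every element produced.
    private static IEnumerable<int> Numbers(int count, int[] pulled)
    {
        for (int i = 0; i < count; i++)
        {
            pulled[0]++;
            yield return i;
        }
    }

    // Elements pulled by Where(predicate).Single() before it throws
    // because more than one element matches.
    public static int PulledByWhereThenSingle(int count)
    {
        var pulled = new int[1];
        try
        {
            Numbers(count, pulled).Where(x => x % 2 == 0).Single();
        }
        catch (InvalidOperationException)
        {
            // Expected: the sequence contains several even numbers.
        }
        return pulled[0];
    }

    // Same measurement for Single(predicate); the value depends on the runtime.
    public static int PulledBySinglePredicate(int count)
    {
        var pulled = new int[1];
        try
        {
            Numbers(count, pulled).Single(x => x % 2 == 0);
        }
        catch (InvalidOperationException)
        {
            // Expected: the sequence contains several even numbers.
        }
        return pulled[0];
    }
}
```

Comparing PulledByWhereThenSingle(1000) with PulledBySinglePredicate(1000) on the runtime you target shows whether the predicate overload still scans to the end there.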
Any stops at the first match. All stops at the first non-match.
I don't know whether the documentation guarantees that but this behavior is now effectively fixed for all time due to compatibility reasons. It also makes sense.
Yes it stops when the predicate is satisfied once. Here is code via RedGate Reflector:
[__DynamicallyInvokable]
public static bool Any<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
if (source == null)
{
throw Error.ArgumentNull("source");
}
if (predicate == null)
{
throw Error.ArgumentNull("predicate");
}
foreach (TSource local in source)
{
if (predicate(local))
{
return true;
}
}
return false;
}

How do I verify a collection of values is unique (contains no duplicates) in C#

Surely there is an easy way to verify a collection of values has no duplicates [using the default Comparison of the collection's Type] in C#/.NET ? Doesn't have to be directly built in but should be short and efficient.
I've looked a lot but I keep hitting examples of using collection.Count() == collection.Distinct().Count() which for me is inefficient. I'm not interested in the result and want to bail out as soon as I detect a duplicate, should that be the case.
(I'd love to delete this question and/or its answer if someone can point out the duplicates)
Okay, if you just want to get out as soon as the duplicate is found, it's simple:
// TODO: add an overload taking an IEqualityComparer<T>
public static bool AllUnique<T>(this IEnumerable<T> source)
{
if (source == null)
{
throw new ArgumentNullException("source");
}
var distinctItems = new HashSet<T>();
foreach (var item in source)
{
if (!distinctItems.Add(item))
{
return false;
}
}
return true;
}
... or use All, as you've already shown. I'd argue that this is slightly simpler to understand in this case... or if you do want to use All, I'd at least separate the creation of the set from the method group conversion, for clarity:
public static bool IsUnique<T>(this IEnumerable<T> source)
{
// TODO: validation
var distinctItems = new HashSet<T>();
// Add will return false if the element already exists. If
// every element is actually added, then they must all be unique.
return source.All(distinctItems.Add);
}
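The TODO about an overload taking an IEqualityComparer<T> could be filled in roughly as follows. This is a sketch under the assumption that the comparer is simply handed to the HashSet<T>; the class name `UniquenessExtensions` is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class UniquenessExtensions
{
    // Uses the default equality comparer for T.
    public static bool AllUnique<T>(this IEnumerable<T> source) =>
        source.AllUnique(EqualityComparer<T>.Default);

    // Uses the supplied comparer to decide what counts as a duplicate.
    public static bool AllUnique<T>(this IEnumerable<T> source, IEqualityComparer<T> comparer)
    {
        if (source == null)
        {
            throw new ArgumentNullException(nameof(source));
        }
        var distinctItems = new HashSet<T>(comparer);
        foreach (var item in source)
        {
            // Add returns false when an equal item is already in the set,
            // so we can bail out at the first duplicate.
            if (!distinctItems.Add(item))
            {
                return false;
            }
        }
        return true;
    }
}
```

For instance, new[] { "a", "A" }.AllUnique() is true with the default comparer, while new[] { "a", "A" }.AllUnique(StringComparer.OrdinalIgnoreCase) is false.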
Doing it inline, you can replace:
collection.Count() == collection.Distinct().Count()
with
collection.All( new HashSet<T>().Add );
(where T is the type of your collection's elements)
Or you can extract the above to a helper extension method[1] so you can say:
collection.IsUnique()
[1]
static class EnumerableUniquenessExtensions
{
public static bool IsUnique<T>(this IEnumerable<T> that)
{
return that.All( new HashSet<T>().Add );
}
}
(and as Jon has pointed out in his answer, one really should separate and comment the two lines as such 'cuteness' is generally Not A Good Idea)

Is this achievable with a single LINQ query?

Suppose I have a given object of type IEnumerable<string> which is the return value of method SomeMethod(), and which contains no repeated elements. I would like to be able to "zip" the following lines in a single LINQ query:
IEnumerable<string> someList = SomeMethod();
if (someList.Contains(givenString))
{
return (someList.Where(s => s == givenString));
}
else
{
return (someList);
}
Edit: I mistakenly used Single instead of First. Corrected now.
I know I can "zip" this by using the ternary operator, but that's just not the point. I would just like to be able to achieve this with a single line. Is that possible?
This will return items with given string or all items if given is not present in the list:
someList.Where(i => i == givenString || !someList.Contains(givenString))
The nature of your desired output requires that you either make two requests for the data, like you are now, or buffer the non-matches to return if no matches are found. The latter would be especially useful in cases where actually getting the data is a relatively expensive call (e.g. a database query or WCF service). The buffering method would look like this:
static IEnumerable<T> AllIfNone<T>(this IEnumerable<T> source,
Func<T, bool> predicate)
{
//argument checking ignored for sample purposes
var buffer = new List<T>();
bool foundFirst = false;
foreach (var item in source)
{
if (predicate(item))
{
foundFirst = true;
yield return item;
}
else if (!foundFirst)
{
buffer.Add(item);
}
}
if (!foundFirst)
{
foreach (var item in buffer)
{
yield return item;
}
}
}
The laziness of this method is either that of Where or ToList depending on if the collection contains a match or not. If it does, you should get execution similar to Where. If not, you will get roughly the execution of calling ToList (with the overhead of all the failed filter checks) and iterating the result.
What is wrong with the ternary operator?
someList.Any(s => s == givenString) ? someList.Where(s => s == givenString) : someList;
It would be better to do the Where followed by the Any but I can't think of how to one-line that.
var reducedEnumerable = someList.Where(s => s == givenString);
return reducedEnumerable.Any() ? reducedEnumerable : someList;
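Note that in the two-line version reducedEnumerable is a lazy query, so Any() and the eventual consumer each enumerate the source. One possible variation, sketched below, materializes the matches once with ToList(); `FilterOrAll` and `MatchesOrAll` are hypothetical names, and this trades laziness for a single filtering pass.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FilterOrAll
{
    // Returns the items equal to givenString, or the whole list if none match.
    public static IEnumerable<string> MatchesOrAll(
        IEnumerable<string> someList, string givenString)
    {
        // Run the filter exactly once and keep the results.
        List<string> matches = someList.Where(s => s == givenString).ToList();

        // If nothing matched, fall back to the original sequence; the caller
        // will then enumerate someList a second time.
        return matches.Count > 0 ? matches : someList;
    }
}
```

If someList is expensive to enumerate even once more in the no-match case, the buffering AllIfNone approach shown earlier avoids that second pass as well.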
It is not possible to change the return type on the method, which is what you're asking. The first condition returns a string and the second condition returns a collection of strings.
Just return the IEnumerable<string> collection, and call Single on the return value like this:
string test = ReturnCollectionOfStrings().Single(x => x == "test");

Efficient Linq Enumerable's 'Count() == 1' test

Similar to this question but rephrased for Linq:
You can use Enumerable<T>.Any() to test if the enumerable contains data. But what's the efficient way to test if the enumerable contains a single value (i.e. Enumerable<T>.Count() == 1) or greater than a single value (i.e. Enumerable<T>.Count() > 1) without using an expensive count operation?
int constrainedCount = yourSequence.Take(2).Count();
// if constrainedCount == 0 then the sequence is empty
// if constrainedCount == 1 then the sequence contains a single element
// if constrainedCount == 2 then the sequence has more than one element
One way is to write a new extension method
public static bool IsSingle<T>(this IEnumerable<T> enumerable) {
using (var enumerator = enumerable.GetEnumerator()) {
if (!enumerator.MoveNext()) {
return false;
}
return !enumerator.MoveNext();
}
}
This code takes LukeH's excellent answer and wraps it up as an IEnumerable extension so that your code can deal in terms of None, One and Many rather than 0, 1 and 2.
public enum Multiplicity
{
None,
One,
Many,
}
In a static class, e.g. EnumerableExtensions:
public static Multiplicity Multiplicity<TElement>(this IEnumerable<TElement> @this)
{
switch (@this.Take(2).Count())
{
case 0: return General.Multiplicity.None;
case 1: return General.Multiplicity.One;
case 2: return General.Multiplicity.Many;
default: throw new Exception("WTF‽");
}
}
Another way:
bool containsMoreThanOneElement = yourSequence.Skip(1).Any();
Or for exactly 1 element:
bool containsOneElement = yourSequence.Any() && !yourSequence.Skip(1).Any();
Efficient Count() == n test:
public static bool CountIsEqualTo<T>(this IEnumerable<T> enumerable, int c)
{
using (var enumerator = enumerable.GetEnumerator())
{
for(var i = 0; i < c ; i++)
if (!enumerator.MoveNext())
return false;
return !enumerator.MoveNext();
}
}
With linq to objects, SingleOrDefault throws if there is more than one element, so you're probably best off if you roll your own.
EDIT: Now I've seen LukeH's answer, and I have to say I prefer it. Wish I'd thought of it myself!
bool hasTwo = yourSequence.ElementAtOrDefault(1) != default(T);
...for a class type, where values can be null, this might be useful.
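One caveat with the ElementAtOrDefault approach is worth spelling out: it cannot tell "there is no second element" apart from "the second element is the default value". The sketch below contrasts it with Skip(1).Any(); `SecondElementCheck` and its method names are hypothetical, and EqualityComparer<T>.Default is used here because != is not available on an unconstrained generic T.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SecondElementCheck
{
    // Compares the element at index 1 against default(T).
    // Misreports when that element legitimately equals default(T).
    public static bool HasTwoViaElementAt<T>(IEnumerable<T> source) =>
        !EqualityComparer<T>.Default.Equals(source.ElementAtOrDefault(1), default(T));

    // Asks only whether anything exists after the first element.
    public static bool HasTwoViaSkip<T>(IEnumerable<T> source) =>
        source.Skip(1).Any();
}
```

For new[] { 5, 0 }, HasTwoViaElementAt returns false even though the sequence has two elements, because the second element is 0; HasTwoViaSkip returns true. The same happens with a null second element in a sequence of reference types.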
