I came across the following piece of code in the Sprache repository :
Parser<string> identifier =
from leading in Parse.WhiteSpace.Many()
from first in Parse.Letter.Once().Text()
from rest in Parse.LetterOrDigit.Many().Text()
from trailing in Parse.WhiteSpace.Many()
select first + rest;
var id = identifier.Parse(" abc123 ");
I see a contradiction here: the from clause docs say the source (Parse.WhiteSpace.Many() or Parse.Letter.Once().Text() in our case) must be IEnumerable:
The data source referenced in the from clause must have a type of IEnumerable, IEnumerable<T>, or a derived type such as IQueryable<T>
But it isn't and the compiler says that's fine!
I thought there was some implicit cast to IEnumerable, but there isn't: Parse.WhiteSpace.Many() returns Parser<IEnumerable<char>> and Parse.Letter.Once().Text() returns Parser<string> (neither type is IEnumerable).
1st question: Why does the compiler allow this code?
Also, the final expression select first + rest doesn't take into account leading and trailing variables, but the final result identifier, for sure, uses them inside.
2nd question: By what rule or mechanism are the leading and trailing variables incorporated into identifier?
P.S.
It'd be great if someone shared an all-encompassing doc about internal work of LINQ query syntax. I've found nothing on this topic.
After like five minutes of looking at the code I have observed:
parser is a delegate that returns an intermediate result
public delegate IResult<T> Parser<out T>(IInput input);
there are linq compliant methods declared that allow linq syntax like:
public static Parser<U> Select<T, U>(this Parser<T> parser, Func<T, U> convert)
{
if (parser == null) throw new ArgumentNullException(nameof(parser));
if (convert == null) throw new ArgumentNullException(nameof(convert));
return parser.Then(t => Return(convert(t)));
}
https://github.com/sprache/Sprache/blob/develop/src/Sprache/Parse.cs#L357
It is not true that the IEnumerable interface is required for the from x in set syntax to work. You just need a method (an extension method will do) with the correct name that accepts the correct set of parameters. So the above makes select valid. Here is the Where method:
public static Parser<T> Where<T>(this Parser<T> parser, Func<T, bool> predicate)
{
if (parser == null) throw new ArgumentNullException(nameof(parser));
if (predicate == null) throw new ArgumentNullException(nameof(predicate));
return i => parser(i).IfSuccess(s =>
predicate(s.Value) ? s : Result.Failure<T>(i,
$"Unexpected {s.Value}.",
new string[0]));
}
https://github.com/sprache/Sprache/blob/develop/src/Sprache/Parse.cs#L614
and so on.
This is a separate implementation of the LINQ abstraction that has nothing to do with collections; it is about parsing text. It produces a nested chain of delegates that process the given string to verify that it conforms to a particular grammar, and returns a structure that describes the parsed text.
That answers the first question. For the second, you need to follow the code: every from x in set clause except the first one maps to the SelectMany function:
public static Parser<V> SelectMany<T, U, V>(
this Parser<T> parser,
Func<T, Parser<U>> selector,
Func<T, U, V> projector)
{
if (parser == null) throw new ArgumentNullException(nameof(parser));
if (selector == null) throw new ArgumentNullException(nameof(selector));
if (projector == null) throw new ArgumentNullException(nameof(projector));
return parser.Then(t => selector(t).Select(u => projector(t, u)));
}
https://github.com/sprache/Sprache/blob/develop/src/Sprache/Parse.cs#L635
and Then method
https://github.com/sprache/Sprache/blob/develop/src/Sprache/Parse.cs#L241
There you will see that the second parser (the single-letter parser) is applied to the remainder of the string only if the first one succeeds (the leading whitespace was matched). So again, it is not collection processing; it is a chain of steps that leads to parsing the string.
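To see the pattern in isolation, here is a minimal sketch of a type that supports query syntax without implementing IEnumerable. Box<T>, BoxExtensions and BoxDemo are hypothetical names made up for this illustration; they are not part of Sprache. The point is only that the compiler looks for methods named Select and SelectMany with a suitable shape.

```csharp
using System;

// Hypothetical wrapper type: it does NOT implement IEnumerable.
public sealed class Box<T>
{
    public T Value { get; }
    public Box(T value) { Value = value; }
}

public static class BoxExtensions
{
    // Makes "from x in box select f(x)" compile.
    public static Box<U> Select<T, U>(this Box<T> box, Func<T, U> convert)
        => new Box<U>(convert(box.Value));

    // Makes every additional "from" clause compile.
    public static Box<V> SelectMany<T, U, V>(
        this Box<T> box, Func<T, Box<U>> selector, Func<T, U, V> projector)
    {
        T t = box.Value;
        U u = selector(t).Value;   // the second source is always evaluated
        return new Box<V>(projector(t, u));
    }
}

public static class BoxDemo
{
    public static Box<string> Combine()
    {
        // The compiler rewrites this into chained SelectMany calls.
        // 'skipped' still takes part in the chain even though the
        // select clause never mentions it.
        return from first in new Box<string>("a")
               from skipped in new Box<int>(42)
               from rest in new Box<string>("bc123")
               select first + rest;
    }
}
```

BoxDemo.Combine().Value is "abc123". The unused range variable plays the same role as leading and trailing in the Sprache query: its source is still run as part of the chain, only its value is dropped by the final projection.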
I'm working on a code generator that validates objects based on certain business rules. As an example, I'm curious to find out the various ways the logic below can be written as a LINQ expression.
Assertion should evaluate to true when collection is null OR when count of "TrueAndCorrect" items is anything but 1. One possible solution is:
bool assertion = report.DeclarationOfTrusteeCollection == null
|| report.DeclarationOfTrusteeCollection.Count(f => f.FTER99.Equals("TrueAndCorrect")) != 1
Are there other, perhaps more compact, ways to express this in LINQ, for example using Any or inverting the operators?
The original code is:
bool assertion =
report.DeclarationOfTrusteeCollection == null ||
report.DeclarationOfTrusteeCollection.Count(
f => f.FTER99.Equals("TrueAndCorrect")) != 1;
There are some problems here.
First, the intention of the null check seems to be "a null collection has the same semantics as an empty collection". This is a worst-practice in C#. Never do this! If you want to represent an empty collection, make an empty collection. There's even an Enumerable.Empty helper method for you.
So, start with that; the code should be:
if (report.DeclarationOfTrusteeCollection == null)
throw some appropriate exception
or
Debug.Assert(report.DeclarationOfTrusteeCollection != null);
if the condition is impossible.
That leaves us with
bool assertion =
report.DeclarationOfTrusteeCollection.Count(
f => f.FTER99.Equals("TrueAndCorrect")) != 1;
This is bad. Suppose I show you a jar that contains some number of pennies and I ask you "is there exactly one penny in the jar?" How many pennies do you have to count before you know the answer? Your code here is counting all of them, but you could stop after two.
Enumerable gives you a method which throws if a sequence is not a singleton, but no method that tests it. Fortunately it is easy to write. The best practice here is to write a helper method that has the exact semantics you want:
static class Extensions
{
public static bool IsSingleton<T>(this IEnumerable<T> items)
{
bool seenOne = false;
foreach(T item in items)
{
if (seenOne) return false;
seenOne = true;
}
return seenOne;
}
public static bool IsSingleton<T>(
this IEnumerable<T> items, Func<T, bool> predicate) =>
items.Where(predicate).IsSingleton();
}
Done. And now your code is:
if (report.DeclarationOfTrusteeCollection == null)
throw some appropriate exception
bool assertion =
report.DeclarationOfTrusteeCollection.IsSingleton(f => ...);
Write the code so that it reads like what it is logically doing. That's the beauty and power of LINQ sequence operators.
You could use the null-propagation operator:
bool assertion = report.DeclarationOfTrusteeCollection?.Count(f => f.FTER99.Equals("TrueAndCorrect")) != 1;
Since null is not 1 this is also true if the collection is null.
It would be nice not to have to count the whole collection: you already know it's wrong as soon as there is more than one matching element. But I don't know of a built-in method for that. You could write your own extension:
public static class MyExtensions
{
public static bool IsNullOrHasNotExactlyOneMatching<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
if (source == null) return true;
bool found = false;
foreach(T element in source)
{
if (!predicate(element)) continue;
if (found) return true; // this is the second match!
found = true;
}
return !found; // true if no match was found, false if exactly one was
}
}
And use it:
bool assertion = report.DeclarationOfTrusteeCollection.IsNullOrHasNotExactlyOneMatching(f => f.FTER99.Equals("TrueAndCorrect"));
As mentioned by Rawling you could shorten the extension using Take():
public static bool IsNullOrHasNotExactlyOneMatching<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
return source?.Where(predicate).Take(2).Count() != 1;
}
or do this directly:
bool assertion = report.DeclarationOfTrusteeCollection?.Where(f => f.FTER99.Equals("TrueAndCorrect"))
.Take(2).Count() != 1;
Both versions only iterate until a second match is found (or until the end if there are fewer than two matches).
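If you want to convince yourself of the early exit, a small sketch like the following counts how many elements are actually pulled from the source. ExactlyOneDemo, Numbers and Pulled are names invented for this illustration, and the helper is written as a plain static method rather than an extension.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExactlyOneDemo
{
    // Number of elements handed out by Numbers() so far (illustration only).
    public static int Pulled;

    // A lazy source of 1..1000 that records every element it yields.
    public static IEnumerable<int> Numbers()
    {
        for (int i = 1; i <= 1000; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    // Same expression as above: true when source is null or does not
    // contain exactly one element matching the predicate.
    public static bool IsNullOrHasNotExactlyOneMatching<T>(
        IEnumerable<T> source, Func<T, bool> predicate)
    {
        return source?.Where(predicate).Take(2).Count() != 1;
    }
}
```

Calling ExactlyOneDemo.IsNullOrHasNotExactlyOneMatching(ExactlyOneDemo.Numbers(), n => n % 10 == 0) returns true, and Pulled stays far below 1000 because enumeration stops once the second multiple of ten has been seen.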
I basically want to create my own implementation of LINQ .First(item =>...) and .Single(item => ...), objects only, which throws an exception with a meaningful message for the logfile:
var items = new List<Item>();
// fill items...
var itemIdToFind = 1234; // not supposed to be constant
var itemFound = items.First(
i => i.ID==1234,
() => new NotFoundException("Item " + itemIdToFind + " not found in items"));
Implementation is like this:
internal static class MyExtendedLinq
{
public static T First<T, TEx>(this IEnumerable<T> elements, Func<T, bool> predicate, Func<TEx> notFoundErrorFunc)
where TEx : Exception
{
var firstOnly = elements.Where(predicate).Take(1).ToArray();
// don't confuse found default value with default due to element not found - not FirstOrDefault!.
if (firstOnly.Length == 1)
{
return firstOnly[0];
}
throw notFoundErrorFunc(); // don't care for null func in example
}
}
This keeps giving me the Implicitly Captured Closure warning from ReSharper, both for the Exception lambda and the predicate function.
Especially for the Func predicate, I see no difference to the regular LINQ First(predicate) implementation, which doesn't show this warning.
I don't want the meaningless InvalidOperationExceptions from the regular First(predicate) method, leaving people searching for days for the place where something expected is missing.
The difference in your case is that you have two different lambdas, each of which closes over different variables. Enumerable.First only has a single lambda, so it can't do that.
Now, you don't need to care about this warning, because neither delegate is long lived (neither will even outlive either variable), so there is no problem here. Of course, Resharper can't know that, and so has chosen to warn you about it so that you can determine that there isn't actually a problem here.
To be more specific: will the Linq extension method Any(IEnumerable collection, Func predicate) stop checking all the remaining elements of the collections once the predicate has yielded true for an item?
Because I don't want to spend too much time on figuring out whether I need to do the really expensive parts at all:
if(lotsOfItems.Any(x => x.ID == target.ID))
//do expensive calculation here
So if Any is always checking all the items in the source this might end up being a waste of time instead of just going with:
var candidate = lotsOfItems.FirstOrDefault(x => x.ID == target.ID);
if(candidate != null)
//do expensive calculation here
because I'm pretty sure that FirstOrDefault does return once it got a result and only keeps going through the whole Enumerable if it does not find a suitable entry in the collection.
Does anyone have information about the internal workings of Any, or could anyone suggest a solution for this kind of decision?
Also, a colleague suggested something along the lines of:
if(!lotsOfItems.All(x => x.ID != target.ID))
since this is supposed to stop once the condition returns false for the first time, but I'm not sure about that, so if anyone could shed some light on this as well it would be appreciated.
As we see from the source code, Yes:
public static bool Any<T>(this IEnumerable<T> source, Func<T, bool> predicate) {
foreach (T element in source) {
if (predicate(element)) {
return true; // Attention to this line
}
}
return false;
}
Any() is the most efficient way to determine whether any element of a sequence satisfies a condition with LINQ.
Also, a colleague suggested something along the lines of
if(!lotsOfItems.All(x => x.ID != target.ID)) since this is supposed to
stop once the condition returns false for the first time, but I'm not
sure about that, so if anyone could shed some light on this as well it
would be appreciated.
All() determines whether all elements of a sequence satisfy a condition. So, the enumeration of source is stopped as soon as the result can be determined.
Additional note:
The above is true if you are using LINQ to Objects. If you are using LINQ against a database, it will create a query and execute it against the database.
You could test it yourself: https://ideone.com/nIDKxr
public static IEnumerable<int> Tester()
{
yield return 1;
yield return 2;
throw new Exception();
}
static void Main(string[] args)
{
Console.WriteLine(Tester().Any(x => x == 1));
Console.WriteLine(Tester().Any(x => x == 2));
try
{
Console.WriteLine(Tester().Any(x => x == 3));
}
catch
{
Console.WriteLine("Error here");
}
}
Yes, it does :-)
Also, a colleague suggested something along the lines of
if(!lotsOfItems.All(x => x.ID != target.ID))
since this is supposed to stop once the condition returns false for the first time, but I'm not sure about that, so if anyone could shed some light on this as well it would be appreciated.
By the same reasoning you could worry that All() keeps going even after one of the elements returns false :-) No, All() is programmed correctly too :-)
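The same kind of check works for All(). This is a small sketch, assuming LINQ to Objects; AllDemo, NoneHasId and Calls are made-up names used only to count how often the predicate runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AllDemo
{
    // How many times the predicate was invoked (illustration only).
    public static int Calls;

    // True when no id equals targetId; this is the inner All(...) from the
    // colleague's suggestion, without the leading negation.
    public static bool NoneHasId(IEnumerable<int> ids, int targetId)
    {
        Calls = 0;
        return ids.All(id =>
        {
            Calls++;
            return id != targetId;
        });
    }
}
```

AllDemo.NoneHasId(new[] { 5, 7, 9, 11 }, 7) returns false and leaves Calls at 2: the predicate fails on the second element and the remaining two are never inspected.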
It does whatever is the quickest way of doing what it has to do.
When used on an IEnumerable this will be along the lines of:
foreach(var item in source)
if(predicate(item))
return true;
return false;
Or for the variant that doesn't take a predicate:
using(var en = source.GetEnumerator())
return en.MoveNext();
When run against a database it will be something like
SELECT EXISTS(SELECT null FROM [some table] WHERE [some where clause])
And so on. How that was executed would depend in turn on what indices were available for fulfilling the WHERE clause, so it could be a quick index lookup, a full table scan aborting on first match found, or an index lookup followed by a partial table scan aborting on first match found, depending on that.
Yet other Linq providers would have yet other implementations, but generally the people responsible will be trying to be at least reasonably efficient.
In all, you can depend upon it being at least slightly more efficient than calling FirstOrDefault, as FirstOrDefault uses similar approaches but does have to return a full object (perhaps constructing it). Likewise !All(inversePredicate) tends to be pretty much on a par with Any(predicate) as per this answer.
Single is an exception to this
Update: The following from this point on no longer applies to .NET Core, which has changed the implementation of Single.
It's important to note that in the case of linq-to objects, the overloads of Single and SingleOrDefault that take a predicate do not stop on identified failure. While the obvious approach to Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) would be something like:
public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
/* do null checks */
using(var en = source.GetEnumerator())
while(en.MoveNext())
{
var val = en.Current;
if(predicate(val))
{
while(en.MoveNext())
if(predicate(en.Current))
throw new InvalidOperationException("too many matching items");
return val;
}
}
throw new InvalidOperationException("no matching items");
}
The actual implementation is something like:
public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
/* do null checks */
var result = default(TSource);
long tally = 0;
foreach(var item in source)
if(predicate(item))
{
result = item;
checked{++tally;}
}
switch(tally)
{
case 0:
throw new InvalidOperationException("no matching items");
case 1:
return result;
default:
throw new InvalidOperationException("too many matching items");
}
}
Now, while a successful Single has to scan everything anyway, this means that an unsuccessful Single can be much, much slower than it needs to be (and can even potentially throw an undocumented error, since the checked increment can overflow). If the reason for the unexpected duplicate is a bug that is duplicating items into the sequence, and hence making it far larger than it should be, then the Single that should have helped you find that problem is now dragging its way through all of it.
SingleOrDefault has the same issue.
This only applies to linq-to-objects, but it remains safer to do .Where(predicate).Single() rather than Single(predicate).
Any stops at the first match. All stops at the first non-match.
I don't know whether the documentation guarantees that but this behavior is now effectively fixed for all time due to compatibility reasons. It also makes sense.
Yes it stops when the predicate is satisfied once. Here is code via RedGate Reflector:
[__DynamicallyInvokable]
public static bool Any<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
if (source == null)
{
throw Error.ArgumentNull("source");
}
if (predicate == null)
{
throw Error.ArgumentNull("predicate");
}
foreach (TSource local in source)
{
if (predicate(local))
{
return true;
}
}
return false;
}
The title pretty much sums up the question. I don't have any problems with it, I'm just curious about reasons behind that design choice.
My Guess? Simplicity and compatibility for different providers.
Contrary to some other answers this has nothing to do with deferred execution - which is an important concept but irrelevant to the issue. For example I could make the following completely valid method:
public static IEnumerable<T> NotBuffered<T>(this IEnumerable<T> input)
{
return (IEnumerable<T>)input.ToList(); //not deferred
}
Alternatively I could expose a WhereEnumerable that works just like an IEnumerable but has the following properties:
WhereEnumerable data = source.Where(x=> x.Name == "Cheese"); //still deferred
print(data.First());
print(data.skipped); //Number of items that failed the test.
print(data.returned); //Number of items that passed the test.
And this could conceivably be useful, as demonstrated, and easy to implement in the basic LinqToObjects implementation. However, it might be considerably harder, or even impossible, to implement the same functionality in the LinqToSQL or LinqToMongo or LinqToOpenCL drivers. This would risk making code less portable between implementations, and increase the implementor's complexity.
For example MongoDB runs the query on the server (in a specialized query language) and does not make these stats available to the user. Furthermore, with concepts such as indexes, these stats could be meaningless, e.g. users.Where(user => user.ID == "{ID}").First() on an index might 'skip' 0 records before finding the result, even if it's at position 100,412 in the index or 40,231 on the disk or on index node 431. That's a 'simple' problem...
Lastly, you can always write your own LINQ methods to return your own custom types with this functionality if you wish or through overloads that output a 'stats' object and similar. For a hypothetical example of the latter:
var stats = new WhereStats();
WhereEnumerable data = source.Where(x=> x.Name == "Cheese", stats);
print(data.First());
print(stats.skipped); //Number of items that failed the test.
print(stats.returned); //Number of items that passed the test.
Edit: Example of a typed where (Proof of Concept Only):
using System;
using System.Collections.Generic;
using System.Linq;
namespace TypedWhereExample
{
class Program
{
static void Main(string[] args)
{
var data = Enumerable.Range(0, 1000);
var typedWhere1 = data.TypedWhere(x => x % 2 == 0);
var typedWhere2 = typedWhere1.TypedWhere(x => x % 3 == 0);
var result = typedWhere2.Take(10).ToList(); //Works like usual Linq
//But returns additional data
Console.WriteLine("Result: " + string.Join(",", result));
Console.WriteLine("Typed Where 1 Skipped: " + typedWhere1.Skipped);
Console.WriteLine("Typed Where 1 Returned: " + typedWhere1.Returned);
Console.WriteLine("Typed Where 2 Skipped: " + typedWhere2.Skipped);
Console.WriteLine("Typed Where 2 Returned: " + typedWhere2.Returned);
Console.ReadLine();
//Result: 0,6,12,18,24,30,36,42,48,54
//Typed Where 1 Skipped: 27
//Typed Where 1 Returned: 28
//Typed Where 2 Skipped: 18
//Typed Where 2 Returned: 10
}
}
public static class MyLINQ
{
public static TypedWhereEnumerable<T> TypedWhere<T>
(this IEnumerable<T> source, Func<T, bool> filter)
{
return new TypedWhereEnumerable<T>(source, filter);
}
}
public class TypedWhereEnumerable<T> : IEnumerable<T>
{
IEnumerable<T> source;
Func<T, bool> filter;
public int Skipped { get; private set; }
public int Returned { get; private set; }
public TypedWhereEnumerable(IEnumerable<T> source, Func<T, bool> filter)
{
this.source = source;
this.filter = filter;
}
IEnumerator<T> IEnumerable<T>.GetEnumerator()
{
foreach (var o in source)
if (filter(o)) { Returned++; yield return o; }
else Skipped++;
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
// Reuse the generic enumerator instead of duplicating the loop.
return ((IEnumerable<T>)this).GetEnumerator();
}
}
}
Just to make sure I understand your question correctly, I'll use an example:
Take this method:
public static IEnumerable<TSource> Where<TSource>
(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
I assume your question is: why does it return IEnumerable<TSource> and not, for example, an Enumerable.WhereEnumerableIterator<TSource> ?
Note that the type above is the actual runtime type of the object returned, but the method simply declares it to be IEnumerable<TSource>.
The answer is that there would be virtually no benefit in doing otherwise, but there would be a non-zero cost. When the cost is higher than the benefit, don't do it.
Why is there no benefit?
First of all, because there would be many WhereEnumerableIterator<TSource> objects around that are still statically typed as IEnumerable<TSource>. As a result, method overloading would not work, and just as today, the Where method would have to try to cast its input to a WhereEnumerableIterator<TSource> if it wants to optimize the .Where(...).Where(...) sequence. There are several reasons why this is the case, one of them being this pattern:
IEnumerable<whatever> q = source;
if (!string.IsNullOrEmpty(searchText))
{
q = q.Where(item => item.Name.Contains(searchText));
}
if (startDate.HasValue)
{
// What is the runtime type of q? And what is its compiletime type?
q = q.Where(item => item.Date > startDate.Value);
}
The non-zero cost is composed of maintenance and documentation cost (you make it more difficult to change your implementation if you expose it, and you have to document it), and increased complexity for the user.
IEnumerables only yield a result upon enumeration. Say you have a query such as:
someEnumerable
.Select(a=>new b(a))
.Where(b=>b.someProp > 10)
.Select(b=>new c(b));
If the return value of the LINQ steps were something with eagerly evaluated contents, such as a List<T>, then each step would have to be fully evaluated before its result could be passed to the next. With large inputs, this could mean a noticeable wait/lag while performing that step.
LINQ queries return a lazily evaluated IEnumerable<T>. The query is performed upon enumeration. Even if your source IEnumerable<T> had millions of records, building the above query would be instantaneous.
Edit: Think of LINQ queries as creating a pipeline for your result rather than imperatively creating the result. Enumeration is essentially opening the resulting pipeline and seeing what comes out.
Additionally, IEnumerables are the most upcasted form of a sequence in .NET e.g.
IList<T> :> ICollection<T> :> IEnumerable<T>
This gives you the most flexible interface available.
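To make the laziness visible, here is a minimal sketch (LINQ to Objects assumed) that counts how often the Select projection actually runs. LazyDemo, BuildQuery and Evaluated are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    // How many times the Select projection ran (illustration only).
    public static int Evaluated;

    // Builds the pipeline; nothing is enumerated here.
    public static IEnumerable<int> BuildQuery(IEnumerable<int> source)
    {
        return source
            .Select(a => { Evaluated++; return a * 2; })
            .Where(b => b > 10);
    }
}
```

After var q = LazyDemo.BuildQuery(Enumerable.Range(0, 1000000)); the counter Evaluated is still 0. Calling q.First() then returns 12 and leaves Evaluated at 7, because only the inputs 0 through 6 had to be projected before the first value above 10 came out of the pipeline.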
I think my mind is exploding trying to figure out Funcs... If this makes no sense, I apologize; right now it makes sense to me, but it's been a long day already ....
1) Assuming you are given a func which takes in T and outputs a string:
Func<T, string>
Can you transform that into a func that takes in a T and returns a bool based on some logic (in this case, whether the returned string is empty, i.e. String.IsNullOrWhiteSpace)?
Func<T, bool>
2) Can you do the same thing if you are given an
Expression<Func<T, string>>
and need to convert it to a
Func<T, bool>
that returns true/false based on if the returned string is empty (String.IsNullOrWhiteSpace)?
Thanks
for the first part you can even make some "higher"-order function:
Func<A,C> MapFun<A,B,C>(Func<A,B> input, Func<B,C> transf)
{
return a => transf(input(a));
}
use with
Func <T,string> test = ...
var result = MapFun(test, String.IsNullOrWhiteSpace);
(I hope C# type inference is working here)
If you define this as extension on Func it gets even easier:
public static class FuncExtension
{
public static Func<A,C> ComposeWith<A,B,C>(this Func<A,B> input, Func<B,C> f)
{
return a => f(input(a));
}
}
here is a very simple test:
Func<int, string> test = i => i.ToString();
var result = test.ComposeWith(string.IsNullOrEmpty);
For the second one: I think you can compile the expression into a "real" Func and then use the above code. See the MSDN docs on Expression.Compile.
PS: renamed the function to better match its intent (it's function composition).
Could you not define it as a separate delegate:
Func<T, string> func1 = t => t.ToString();
Func<T, bool> func2 = t => string.IsNullOrEmpty(func1(t));
For the first part the technique is known as function composition i.e you compose 2 functions to create a new function.
In your case you have a function of type Func<T,string> and another function (like "string is null or empty") of type Func<string,bool>. Using function composition you can compose these two functions to create a new function of type Func<T,bool>.
Most functional programming languages have function composition already defined in their standard library or in the language itself. But it is not hard to create one for your language if the language supports functions as first-class values.
In C# you can use the below function which will allow you to compose functions:
public static Func<X,Z> Compose<X,Y,Z>(Func<X,Y> a, Func<Y,Z> b)
{
return (v) => b(a(v));
}
To 1: Yes (You can also parametrize bool and string):
Func<T, bool> Compose<T>(Func<T, string> source, Func<string, bool>map)
{
return x => map(source(x));
}
To 2: Yes, but you need to compile the expression first:
Func<T, bool> Compose<T>(Expression<Func<T, string>> source, Func<string, bool> map)
{
return Compose(source.Compile(), map); // compile once, then reuse the Func overload
}
.Compile will compile the expression into a dynamic CLR method that you can invoke with the returned delegate.
You can use this code like this:
Func<int, string> ts = i => i.ToString();
var result = Compose(ts, string.IsNullOrEmpty);
By the way, in this case you should really write a higher-order function. What you are doing here (algebraically) is function composition: (f . g)(x) := f(g(x)).
Think of source as g:A->B and map as f:B->C (where A, B and C are sets), so the result of f . g is h:A->C. By the way, the . operator is often built into functional programming languages such as Haskell, and achieves the same thing as your compose function (but with cleaner syntax).
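Putting both parts of the question together, a fully generic sketch could look like this. Composition, Compose and ComposeExpression are names chosen for the illustration; the expression-based variant has a different name on purpose, so that passing a lambda directly is not ambiguous between the Func and the Expression parameter.

```csharp
using System;
using System.Linq.Expressions;

public static class Composition
{
    // Part 1: plain function composition, second(first(a)).
    public static Func<A, C> Compose<A, B, C>(Func<A, B> first, Func<B, C> second)
    {
        return a => second(first(a));
    }

    // Part 2: the same for an expression tree. Compile() is called once,
    // outside the returned lambda, so the cost is not paid on every call.
    public static Func<A, C> ComposeExpression<A, B, C>(
        Expression<Func<A, B>> first, Func<B, C> second)
    {
        Func<A, B> compiled = first.Compile();
        return Compose(compiled, second);
    }
}
```

For example, given Expression<Func<int, string>> expr = i => i == 0 ? " " : i.ToString(); the call Composition.ComposeExpression<int, string, bool>(expr, string.IsNullOrWhiteSpace) yields a Func<int, bool> that returns true for 0 and false for 5.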