C# PLINQ .AsParallel() position in query

StackOverflow,
Within C# PLINQ, I understand that the position of ".AsParallel()" affects how the query is run. For example, when ".AsParallel()" occurs in the middle of a query, the operators before it execute sequentially and the operators after it execute in parallel (PLINQ: Parallel Queries in .NET).
My question is: with a more complex query (below), if ".AsParallel()" is placed at the start of the query (as a prefix to .Select), will all the following methods also execute in parallel? (Currently, ".AsParallel()" occurs after the .Select.)
Collection =
typeof (Detail).GetProperties(BindingFlags.Public | BindingFlags.Instance)
.SelectMany(propertyInfo => recentPhases
.Where(phase => phase.Finalised)
.SelectMany(phase => phase.PhaseDetail
.Select(keyValuePair => new
{
phase.Direction,
phase.Momentum,
keyValuePair.Key,
keyValuePair.Value
}))
.Select(arg => new
{
Key = new BmkKey
{
Direction = (arg.Direction == Dir.Up ? Dir.Up : Dir.Down),
Momentum = (arg.Momentum == Mom.Price ? Mom.Price : Mom.Time),
BarNumber = arg.Key,
DetailType = propertyInfo.Name
},
Value = (double) propertyInfo.GetValue(arg.Value, null)
}))
.AsParallel().GroupBy(grp => grp.Key)
.ToDictionary(grp => grp.Key, grp => new Distribution(grp.Select(x => x.Value)));

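Yes, everything after the AsParallel() call will be executed in parallel (strictly speaking, PLINQ may still run a query sequentially if it judges parallelism unprofitable, unless you force it with WithExecutionMode(ParallelExecutionMode.ForceParallelism)). From MSDN: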
public static ParallelQuery<TSource> AsParallel<TSource>(
this IEnumerable<TSource> source)
So the input is an IEnumerable<T> and the output a ParallelQuery<T>.
If we then look at the ParallelEnumerable class:
Provides a set of methods for querying objects that implement ParallelQuery<TSource>. This is the parallel equivalent of Enumerable.
So from then on, you won't be calling the methods defined for IEnumerable<T> but you will be calling their parallel counterparts defined for ParallelEnumerable.
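A small in-memory sketch of the point above (the numbers are arbitrary, not taken from the question): the operators up to AsParallel() are plain Enumerable, the ones after it bind to ParallelEnumerable because their source is a ParallelQuery<int>, and AsSequential() switches back. AsOrdered() is only there so the printed order is deterministic.

```csharp
using System;
using System.Linq;

// Sequential part: Enumerable.Select, evaluated on the consuming thread.
var squares = Enumerable.Range(1, 10).Select(x => x * x);

// After AsParallel() the source is a ParallelQuery<int>, so Where and Select
// resolve to the ParallelEnumerable versions and run through PLINQ.
ParallelQuery<int> parallel = squares
    .AsParallel()
    .AsOrdered()                // preserve source order in the results
    .Where(x => x % 2 == 0)
    .Select(x => x + 1);

// AsSequential() turns the rest of the query back into ordinary LINQ to Objects.
int[] firstThree = parallel.AsSequential().Take(3).ToArray();

Console.WriteLine(string.Join(", ", firstThree)); // 5, 17, 37
```

The same rule applies in reverse: an operator placed before AsParallel() is resolved against IEnumerable<T> and stays sequential.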

Related

Combine Expressions instead of using multiple queries in Entity Framework

I have the following generic queryable (which may already have selections applied):
IQueryable<TEntity> queryable = DBSet<TEntity>.AsQueryable();
Then there is the Provider class that looks like this:
public class Provider<TEntity>
{
public Expression<Func<TEntity, bool>> Condition { get; set; }
[...]
}
The Condition could be defined per instance in the following fashion:
Condition = entity => entity.Id == 3;
Now I want to select all Provider instances whose Condition is met by at least one entity of the DBSet:
List<Provider<TEntity>> providers = [...];
var matchingProviders = providers.Where(provider => queryable.Any(provider.Condition));
The problem with this: I'm starting a query for each Provider instance in the list. I'd rather use a single query to achieve the same result, because performance is the main concern here. How can I achieve the same results with a single query, and improve performance, using LINQ statements or expression trees?
Interesting challenge. The only way I see is to dynamically build a UNION ALL query like this:
SELECT TOP 1 0 FROM Table WHERE Condition[0]
UNION ALL
SELECT TOP 1 1 FROM Table WHERE Condition[1]
...
UNION ALL
SELECT TOP 1 N-1 FROM Table WHERE Condition[N-1]
and then use the returned numbers as indexes to get the matching providers.
Something like this:
var parameter = Expression.Parameter(typeof(TEntity), "e");
var indexQuery = providers
.Select((provider, index) => queryable
.Where(provider.Condition)
.Take(1)
.Select(Expression.Lambda<Func<TEntity, int>>(Expression.Constant(index), parameter)))
.Aggregate(Queryable.Concat);
var indexes = indexQuery.ToList();
var matchingProviders = indexes.Select(index => providers[index]);
Note that I could have built the query without using the Expression class by replacing the above Select with
.Select(_ => index)
but that would introduce an unnecessary SQL query parameter for each index.
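To try the idea without a database, the sketch below runs the same index query against an in-memory IQueryable<int> created with AsQueryable(). The integer "entities" and the three conditions are made up for illustration, and the concatenation is written with a lambda instead of the Queryable.Concat method group. Against EF, the concatenated query would instead be translated into the UNION ALL shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-ins: an in-memory IQueryable<int> replaces the DbSet,
// and each provider is reduced to its Condition expression.
IQueryable<int> queryable = new[] { 3, 8, 15 }.AsQueryable();
var conditions = new List<Expression<Func<int, bool>>>
{
    e => e == 3,      // matches 3
    e => e > 100,     // matches nothing
    e => e % 5 == 0,  // matches 15
};

var parameter = Expression.Parameter(typeof(int), "e");

// One Where(condition).Take(1).Select(constant index) per condition,
// concatenated into a single query (UNION ALL when translated to SQL).
IQueryable<int> indexQuery = conditions
    .Select((condition, index) => queryable
        .Where(condition)
        .Take(1)
        .Select(Expression.Lambda<Func<int, int>>(Expression.Constant(index), parameter)))
    .Aggregate((left, right) => left.Concat(right));

List<int> indexes = indexQuery.ToList();
Console.WriteLine(string.Join(", ", indexes)); // 0, 2
```

The returned numbers are positions in the conditions list, which is why the original maps them back with providers[index].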
Here is another (crazy) idea that came to mind. Please note that, similar to my previous answer, it doesn't guarantee better performance (in fact it could be worse). It just presents a way to do what you are asking with a single SQL query.
Here we are going to create a query that returns a single string of length N consisting of '0' and '1' characters, with '1' denoting a match (something like a string bit array). The query will use my favorite group-by-constant technique to dynamically build something like this (pseudocode):
var matchInfo = queryable
.GroupBy(e => 1)
.Select(g =>
(g.Max(Condition[0] ? "1" : "0")) +
(g.Max(Condition[1] ? "1" : "0")) +
...
(g.Max(Condition[N-1] ? "1" : "0")))
.FirstOrDefault() ?? "";
And here is the code:
var group = Expression.Parameter(typeof(IGrouping<int, TEntity>), "g");
var concatArgs = providers.Select(provider => Expression.Call(
typeof(Enumerable), "Max", new[] { typeof(TEntity), typeof(string) },
group, Expression.Lambda(
Expression.Condition(
provider.Condition.Body, Expression.Constant("1"), Expression.Constant("0")),
provider.Condition.Parameters)));
var concatCall = Expression.Call(
typeof(string).GetMethod("Concat", new[] { typeof(string[]) }),
Expression.NewArrayInit(typeof(string), concatArgs));
var selector = Expression.Lambda<Func<IGrouping<int, TEntity>, string>>(concatCall, group);
var matchInfo = queryable
.GroupBy(e => 1)
.Select(selector)
.FirstOrDefault() ?? "";
var matchingProviders = matchInfo.Zip(providers,
(match, provider) => match == '1' ? provider : null)
.Where(provider => provider != null)
.ToList();
Enjoy:)
P.S. In my opinion, this query will run at a roughly constant speed regardless of the number and type of the conditions; it can be considered O(N) in the best, worst and average cases, where N is the number of records in the table, because the database always has to perform a full table scan. It would still be interesting to know the actual performance, but most likely doing something like this just isn't worth the effort.
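A runnable sketch of the group-by-constant "bit string" idea, executed against an in-memory IQueryable<int> instead of a table. The integer entities and the three conditions are hypothetical; the expression building mirrors the code above with TEntity fixed to int, and the decoding step keeps the matching positions rather than the providers themselves.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-ins: an in-memory IQueryable<int> replaces the table,
// and each provider is reduced to its Condition expression.
IQueryable<int> queryable = new[] { 3, 8, 15 }.AsQueryable();
var conditions = new List<Expression<Func<int, bool>>>
{
    e => e == 3,      // matches 3
    e => e > 100,     // matches nothing
    e => e % 5 == 0,  // matches 15
};

var group = Expression.Parameter(typeof(IGrouping<int, int>), "g");

// One g.Max(e => condition ? "1" : "0") call per condition.
var concatArgs = conditions.Select(condition => Expression.Call(
    typeof(Enumerable), nameof(Enumerable.Max), new[] { typeof(int), typeof(string) },
    group,
    Expression.Lambda(
        Expression.Condition(condition.Body, Expression.Constant("1"), Expression.Constant("0")),
        condition.Parameters)));

// string.Concat(string[]) joins the flags into one "bit string".
var concatCall = Expression.Call(
    typeof(string).GetMethod(nameof(string.Concat), new[] { typeof(string[]) }),
    Expression.NewArrayInit(typeof(string), concatArgs));
var selector = Expression.Lambda<Func<IGrouping<int, int>, string>>(concatCall, group);

string matchInfo = queryable
    .GroupBy(e => 1)
    .Select(selector)
    .FirstOrDefault() ?? "";

// Position i holds '1' when at least one row satisfied condition i.
var matchingIndexes = matchInfo
    .Select((flag, index) => (flag, index))
    .Where(t => t.flag == '1')
    .Select(t => t.index)
    .ToList();

Console.WriteLine(matchInfo);                          // 101
Console.WriteLine(string.Join(", ", matchingIndexes)); // 0, 2
```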
Update: Regarding the bounty and the updated requirement:
Find a fast query that only reads a record of the table once and ends the query if already all conditions are met
There is no standard SQL construct (let alone one that a LINQ query can be translated into) that satisfies both conditions. Constructs that allow an early exit, like EXISTS, work for a single condition, so executing them for multiple conditions violates the first rule of reading each table record only once. Constructs that use aggregates, like the one in this answer, satisfy the first rule, but to produce the aggregate value they have to read all the records, so they cannot exit early.
In short, there is no query that satisfies both requirements. As for the "fast" part, it really depends on the size of the data, the number and type of the conditions, the table indexes, etc., so again there is simply no "best" general solution for all cases.
Based on this post by @Ivan, I created an expression that is slightly faster in some cases.
It uses Any instead of Max to get the desired results.
var group = Expression.Parameter(typeof(IGrouping<int, TEntity>), "g");
var anyMethod = typeof(Enumerable)
.GetMethods()
.First(m => m.Name == "Any" && m.GetParameters().Length == 2)
.MakeGenericMethod(typeof(TEntity));
var concatArgs = Providers.Select(provider =>
Expression.Call(anyMethod, group,
Expression.Lambda(provider.Condition.Body, provider.Condition.Parameters)));
var convertExpression = concatArgs.Select(concat =>
Expression.Condition(concat, Expression.Constant("1"), Expression.Constant("0")));
var concatCall = Expression.Call(
typeof(string).GetMethod("Concat", new[] { typeof(string[]) }),
Expression.NewArrayInit(typeof(string), convertExpression));
var selector = Expression.Lambda<Func<IGrouping<int, TEntity>, string>>(concatCall, group);
var matchInfo = queryable
.GroupBy(e => 1)
.Select(selector)
.FirstOrDefault() ?? ""; // GroupBy produces no group when the table is empty
var MatchingProviders = matchInfo.Zip(Providers,
(match, provider) => match == '1' ? provider : null)
.Where(provider => provider != null)
.ToList();
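The reflection lookup is the only really new piece in this variant, so here is a small sketch of it in isolation: it picks the two-parameter Enumerable.Any overload, builds g => g.Any(condition) ? "1" : "0" for one hypothetical condition, compiles it and runs it on an in-memory group. Single() is used instead of First() so the lookup fails loudly if it ever becomes ambiguous.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Enumerable.Any has two overloads, Any(source) and Any(source, predicate);
// filtering on the parameter count picks the predicate overload.
MethodInfo anyMethod = typeof(Enumerable)
    .GetMethods()
    .Single(m => m.Name == nameof(Enumerable.Any) && m.GetParameters().Length == 2)
    .MakeGenericMethod(typeof(int));

// One hypothetical condition, turned into g => g.Any(e => e % 5 == 0) ? "1" : "0".
Expression<Func<int, bool>> condition = e => e % 5 == 0;
var group = Expression.Parameter(typeof(IGrouping<int, int>), "g");
var body = Expression.Condition(
    Expression.Call(anyMethod, group, condition),
    Expression.Constant("1"),
    Expression.Constant("0"));
var flag = Expression.Lambda<Func<IGrouping<int, int>, string>>(body, group).Compile();

// Group everything under one constant key, as the query does with GroupBy(e => 1).
IGrouping<int, int> allRows = new[] { 3, 8, 15 }.GroupBy(e => 1).Single();
Console.WriteLine(flag(allRows)); // 1
```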
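The approach I tried here was to create Conditions and nest them into one Expression. If one of the Conditions is met, we get the index of the Provider for it. Note that each row yields only the index of the first condition it satisfies, so a provider whose matching rows all also match an earlier provider's condition is not reported.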
private static Expression NestedExpression(
IEnumerable<Expression<Func<TEntity, bool>>> expressions,
int startIndex = 0)
{
var range = expressions.ToList();
range.RemoveRange(0, startIndex);
if (range.Count == 0)
return Expression.Constant(-1);
return Expression.Condition(
range[0].Body,
Expression.Constant(startIndex),
NestedExpression(expressions, ++startIndex));
}
Because the Expressions may still use different ParameterExpression instances, we need an ExpressionVisitor to rewrite them:
private class PredicateRewriterVisitor : ExpressionVisitor
{
private readonly ParameterExpression _source;
private readonly ParameterExpression _target;
public PredicateRewriterVisitor(ParameterExpression source, ParameterExpression target)
{
_source = source;
_target = target;
}
// Replace only the predicate's own parameter; parameters of nested lambdas
// (e.g. the i in e.Items.Any(i => ...)) must be left untouched.
protected override Expression VisitParameter(ParameterExpression node)
{
return node == _source ? _target : base.VisitParameter(node);
}
}
For the rewrite we only need to call this method:
private static Expression<Func<T, bool>> Rewrite<T>(
Expression<Func<T, bool>> exp,
ParameterExpression parameterExpression)
{
var newExpression = new PredicateRewriterVisitor(exp.Parameters[0], parameterExpression).Visit(exp);
return (Expression<Func<T, bool>>)newExpression;
}
The query itself and the selection of the Provider instances works like this:
var parameterExpression = Expression.Parameter(typeof(TEntity), "src");
var conditions = Providers.Select(provider =>
Rewrite(provider.Condition, parameterExpression)
);
var nestedExpression = NestedExpression(conditions);
var lambda = Expression.Lambda<Func<TEntity, int>>(nestedExpression, parameterExpression);
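// Materialize once; calling Contains on the IQueryable would run one query per provider.
var matchInfo = new HashSet<int>(queryable.Select(lambda).Distinct());
var MatchingProviders = Providers.Where((provider, index) => matchInfo.Contains(index)).ToList();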
Note: this is another option that isn't really fast either.
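A runnable sketch of the nested-conditional idea against an in-memory IQueryable<int>. To keep it short, the three hypothetical conditions are built directly on one shared parameter, so no parameter rewriting is needed; the conditional chain is built with a loop from the innermost -1 fallback outwards, which is equivalent to the recursive NestedExpression. The data is chosen to show the first-match behaviour: the row 15 satisfies both condition 0 and condition 2, but only index 0 comes back.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-ins: an in-memory table of ints and three conditions
// that share one parameter.
IQueryable<int> queryable = new[] { 3, 8, 15 }.AsQueryable();
var src = Expression.Parameter(typeof(int), "src");
var conditions = new List<Expression>
{
    Expression.Equal(src, Expression.Constant(15)),                                          // src == 15
    Expression.GreaterThan(src, Expression.Constant(100)),                                   // src > 100
    Expression.Equal(Expression.Modulo(src, Expression.Constant(5)), Expression.Constant(0)) // src % 5 == 0
};

// cond0 ? 0 : (cond1 ? 1 : (cond2 ? 2 : -1)), built from the innermost fallback outwards.
Expression nested = Expression.Constant(-1);
for (int k = conditions.Count - 1; k >= 0; k--)
    nested = Expression.Condition(conditions[k], Expression.Constant(k), nested);

var lambda = Expression.Lambda<Func<int, int>>(nested, src);

// Run the query once and keep the matched indexes in memory for Contains checks.
var matched = new HashSet<int>(queryable.Select(lambda).Distinct().Where(idx => idx >= 0));

Console.WriteLine(string.Join(", ", matched.OrderBy(idx => idx))); // 0
```

If missing such overlapping matches matters, the Max or Any based answers above, which evaluate every condition for every row, avoid this limitation.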
Here is another view of the problem that has nothing to do with expressions.
Since the main goal is to improve performance, if the attempts to produce the result with a single query don't help, we could try to improve the speed by parallelizing the execution of the original multi-query solution.
Since it's really a LINQ to Objects query (which internally executes multiple EF queries), theoretically it should be a simple matter of turning it into a PLINQ query by inserting AsParallel like this (non working):
var matchingProviders = providers
.AsParallel()
.Where(provider => queryable.Any(provider.Condition))
.ToList();
However, it turns out that EF's DbContext is not thread-safe, so the above simply generates runtime errors. So I had to resort to the TPL, using one of the Parallel.ForEach overloads that lets us supply thread-local state, which I used to create a separate DbContext instance for each worker task during the execution.
The final working code looks like this:
var matchingProviders = new List<Provider<TEntity>>();
Parallel.ForEach(providers,
() => new
{
context = new MyDbContext(),
matchingProviders = new List<Provider<TEntity>>()
},
(provider, state, data) =>
{
if (data.context.Set<TEntity>().Any(provider.Condition))
data.matchingProviders.Add(provider);
return data;
},
data =>
{
data.context.Dispose();
if (data.matchingProviders.Count > 0)
{
lock (matchingProviders)
matchingProviders.AddRange(data.matchingProviders);
}
}
);
If you have a multi-core CPU (which is normal nowadays) and a good database server, this should give you the improvement you are looking for.
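The local-state pattern can be tried without a database. In this sketch the "providers" are just numbers and the per-provider database check is replaced by a hypothetical divisibility test; localInit creates a per-worker list (where the answer creates a DbContext), the body touches only that list, and localFinally merges the results under a lock.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical workload: providers are numbers, and the per-provider
// database check is replaced by a divisibility test.
var providers = Enumerable.Range(1, 1000).ToList();
var matchingProviders = new List<int>();

Parallel.ForEach(
    providers,
    // localInit: runs once per worker task (the answer creates a DbContext here).
    () => new List<int>(),
    // body: touches only the worker's own list, so no lock is needed here.
    (provider, loopState, localMatches) =>
    {
        if (provider % 7 == 0)
            localMatches.Add(provider);
        return localMatches;
    },
    // localFinally: runs once per worker; merge under a lock (and dispose per-worker resources).
    localMatches =>
    {
        lock (matchingProviders)
            matchingProviders.AddRange(localMatches);
    });

Console.WriteLine(matchingProviders.Count); // 142
```

The merged list is not in source order, because workers finish in an arbitrary order; sort it afterwards if order matters.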

ToUpper() in lambda not working

var Result = addressContext.Address_Lookup
.Where(c => c.Address_Full.ToUpper().Contains(term.ToUpper())
|| c.Address_Full.ToUpper().Contains(TermModified.ToUpper()))
.Select(e => new {
id = e.Address_ID,
label = e.Address_Full,
value = e.Address_Full })
.ToList();
To ensure the search is case-insensitive, I am using ToUpper().
I am searching for something like Jimmy (with a capital J), but jimmy (all lower case) doesn't match. Why?
Since you're using Entity Framework, whose LINQ provider translates queries into SQL, you're actually asking the database to perform the .ToUpper() rather than performing it in memory, as would happen if you were running through an IEnumerable. If the query translation in your framework doesn't support the function, it will either not be used or throw an exception.
You can generally predict this behaviour by checking whether you're calling a function on an IQueryable object, which queues all calls as an expression tree for translation, or on an IEnumerable, which uses foreach and yield return to handle evaluation. Since the LINQ functions are extension methods, the static type of the variable decides which implementation is called; polymorphism doesn't apply here.
If you're not worried about the performance hit of pulling EVERY row of that table into memory, add an .AsEnumerable() call, and your function will be evaluated on local data.
var Result = addressContext.Address_Lookup
.AsEnumerable()
.Where(c => c.Address_Full.ToUpper().Contains(term.ToUpper())
|| c.Address_Full.ToUpper().Contains(TermModified.ToUpper()))
.Select(e => new
{
id = e.Address_ID,
label = e.Address_Full,
value = e.Address_Full
})
.ToList();
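A quick in-memory sketch (the address strings are made up) of what the filter does once it runs in LINQ to Objects. ToUpperInvariant() is used instead of ToUpper() so the result does not depend on the current culture, and the second variant uses the string.Contains(string, StringComparison) overload (available in .NET Core 2.1 and later), which avoids allocating upper-cased copies. That overload is meant for in-memory data; whether a given EF provider can translate it into SQL is not something to rely on.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up rows standing in for Address_Lookup.Address_Full.
var addresses = new List<string> { "12 Jimmy Street", "4 JIMMY LANE", "7 jimmy road", "9 Other Avenue" };
string term = "Jimmy";

// In-memory version of the answer's approach; ToUpperInvariant avoids culture surprises.
var viaUpper = addresses
    .Where(a => a.ToUpperInvariant().Contains(term.ToUpperInvariant()))
    .ToList();

// Same result without allocating upper-cased copies (.NET Core 2.1 and later).
var viaComparison = addresses
    .Where(a => a.Contains(term, StringComparison.OrdinalIgnoreCase))
    .ToList();

Console.WriteLine(viaUpper.Count);      // 3
Console.WriteLine(viaComparison.Count); // 3
```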

Can't get Contains to work in Linq query where clause

List<Guid> toBeFilteredCarIds = new List<Guid>();
toBeFilteredCarIds.Add(new Guid("4cc70c3a-405c-4a5c-b2cd-0429a5bc06ef"));
var cars = ef.Cars
.Include("CarProfile.Option.OptionType")
.Where(c => c.CarStatusId == 1);
cars.Join(ef.CarProfile,
t1 => t1.CarId,
t2 => t2.Car.CarId,
(t1, t2) => new { t1, t2 }).Where(o => o.t2.IsActive == true).Select(o => o.t1);
var filteredCars = cars.ToList().Where(u => toBeFilteredCarIds.Contains(u.CarId));
The above code is trying to get the list of Cars where the CarProfile is active and the CarId is in the toBeFilteredCarIds list.
However, as you can see in the last line, I am calling .ToList() first and only then using a Where clause to filter by the CarIds.
This obviously fetches all the cars from the DB first and then filters them, which is very expensive.
I have tried the approach others have suggested in other answers:
List<Guid> toBeFilteredCarIds = new List<Guid>();
toBeFilteredCarIds.Add(new Guid("4cc70c3a-405c-4a5c-b2cd-0429a5bc06ef"));
var cars = ef.Cars
.Include("CarProfile.Option.OptionType")
.Where(c => toBeFilteredCarIds.Contains(c.CarId) && c.CarStatusId == 1);
cars.Join(ef.CarProfile,
t1 => t1.CarId,
t2 => t2.Car.CarId,
(t1, t2) => new { t1, t2 }).Where(o => o.t2.IsActive == true).Select(o => o.t1);
var filteredCars = cars.ToList();
but that's not working for me, it gives me this error:
LINQ to Entities does not recognize the method 'Boolean Contains(System.Guid)' method, and this method cannot be translated into a store expression.
I can see on Stack Overflow that many answers using the above approach have been marked as accepted:
.Where(c => toBeFilteredCarIds.Contains(c.CarId))
but it's not working for me.
By the way, I am using VS2008 and EF 3.5, and I have using System.Data.Entity; in my using statements.
The "Include" used above is important, as I need to load everything beforehand; there will be a huge set of loops reading the data afterwards.
LINQ to Entities support for Contains on an IEnumerable wasn't added until EF 4.0, so if you're stuck on 3.5 you'll need to do something like unrolling the values before sending the query.
Look at: http://blogs.msdn.com/b/alexj/archive/2009/03/26/tip-8-writing-where-in-style-queries-using-linq-to-entities.aspx
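A sketch of the "unrolling" idea: build element => key == v1 || key == v2 || ... from the list, so the provider only sees equality comparisons. The helper name BuildOrEquals is hypothetical (the linked tip may structure its helper differently), and here it runs against an in-memory AsQueryable() source with made-up data rather than EF 3.5.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical helper: turns a value selector and a list of values into
// element => selector(element) == v1 || selector(element) == v2 || ...
static Expression<Func<TElement, bool>> BuildOrEquals<TElement, TValue>(
    Expression<Func<TElement, TValue>> valueSelector, IEnumerable<TValue> values)
{
    Expression body = values
        .Select(value => (Expression)Expression.Equal(
            valueSelector.Body, Expression.Constant(value, typeof(TValue))))
        .DefaultIfEmpty(Expression.Constant(false))  // empty list: match nothing
        .Aggregate((left, right) => Expression.OrElse(left, right));
    return Expression.Lambda<Func<TElement, bool>>(body, valueSelector.Parameters);
}

// Made-up data: car ids paired with a name, standing in for ef.Cars.
var toBeFilteredCarIds = new List<Guid> { new Guid("4cc70c3a-405c-4a5c-b2cd-0429a5bc06ef") };
IQueryable<KeyValuePair<Guid, string>> cars = new[]
{
    new KeyValuePair<Guid, string>(toBeFilteredCarIds[0], "wanted car"),
    new KeyValuePair<Guid, string>(Guid.Empty, "other car"),
}.AsQueryable();

var predicate = BuildOrEquals<KeyValuePair<Guid, string>, Guid>(car => car.Key, toBeFilteredCarIds);
var names = cars.Where(predicate).Select(car => car.Value).ToList();

Console.WriteLine(string.Join(", ", names)); // wanted car
```

With a real context the call would look like ef.Cars.Where(BuildOrEquals<Car, Guid>(c => c.CarId, toBeFilteredCarIds)), assuming the provider can translate the resulting chain of equality comparisons.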
Since LINQ to Entities in .NET 3.5 doesn't support Contains, you can use the Dynamic LINQ library, which lets you generate your where clause from a string. That way, you don't have to fiddle with all the Expression lingo.
Available on NuGet: https://www.nuget.org/packages/System.Linq.Dynamic.Library/.
Your Where clause would look something like this:
.Where("CarStatusId=1 AND (CarId=1 OR CarId=2 OR CarId=3)")
You'd just have to generate that string based on your toBeFilteredCarIds list.
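A sketch of generating that string from the toBeFilteredCarIds list. Rather than splicing Guid text into the expression, it emits numbered @0, @1, ... placeholders plus a matching values array; this assumes the package follows the original Dynamic LINQ sample's placeholder syntax and its Where(string predicate, params object[] values) overload, so check the library's documentation before relying on it. The library call itself is only shown as a comment, and the second Guid is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var toBeFilteredCarIds = new List<Guid>
{
    new Guid("4cc70c3a-405c-4a5c-b2cd-0429a5bc06ef"),
    new Guid("0f8fad5b-d9cb-469f-a165-70867728950e"),  // made-up second id
};

// One "CarId = @i" term per id; the ids themselves travel in a separate values array.
string idTerms = string.Join(" OR ", toBeFilteredCarIds.Select((id, i) => $"CarId = @{i}"));

// Note: an empty list would yield "()", which is not a valid predicate; handle that case separately.
string predicate = $"CarStatusId = 1 AND ({idTerms})";
object[] values = toBeFilteredCarIds.Cast<object>().ToArray();

Console.WriteLine(predicate); // CarStatusId = 1 AND (CarId = @0 OR CarId = @1)

// Assumed Dynamic LINQ usage (not executed here):
// var filtered = cars.Where(predicate, values);
```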
You can use Any(). Not as elegant as Contains(), but it does the job in .NET 3.5:
.Where(c => toBeFilteredCarIds.Any(g => g == c.CarId) && c.CarStatusId == 1);

Internal .NET Framework Data Provider error 1025

IQueryable<Organization> query = context.Organizations;
Func<Reservation, bool> predicate = r => !r.IsDeleted;
query.Select(o => new {
Reservations = o.Reservations.Where(predicate)
}).ToList();
This query throws an "Internal .NET Framework Data Provider error 1025" exception, but the query below does not.
query.Select(o => new {
Reservations = o.Reservations.Where( r => !r.IsDeleted)
}).ToList();
I need to use the first form because I have to check a few if statements to construct the right predicate. I know I cannot use if statements inside the query, which is why I pass a delegate as a parameter.
How can I make the first query work?
While the other answers are correct, note that when using the expression after a Select, you have to call AsQueryable() explicitly; otherwise the compiler assumes you are trying to use the IEnumerable methods, which expect a Func and not an Expression<Func>.
This was probably the original poster's issue, since in most other cases the compiler complains that it expects an Expression<Func> and not a Func.
Demo:
The following will fail:
MyContext.MySet.Where(m =>
m.SubCollection.Select(s => s.SubItem).Any(expr))
.Load();
While the following will work:
MyContext.MySet.Where(m =>
m.SubCollection.Select(s => s.SubItem).AsQueryable().Any(expr))
.Load();
After creating the bounty (rats!), I found this answer, which solved my problem. (My problem involved a .Any() call, which is a little more complicated than this question...)
In short, here's your answer:
IQueryable<Organization> query = context.Organizations;
Expression<Func<Reservation, bool>> expr = r => !r.IsDeleted;
query.Select(o => new { Reservations = o.Reservations.AsQueryable().Where(expr) })
.ToList();
Read the referenced answer for an explanation of why you need the local variable expr, and you can't directly reference another method of return type Expression<Func<Reservation, bool>>.
Thanks for pinging me. I guess I was on the right track after all.
Anyway, to reiterate: LINQ to Entities (thanks to Jon Skeet for correcting me when I got mixed up in my own thought process in the comments) operates on expression trees, which the QueryProvider translates, lambda expressions included, into SQL.
Regular Func<> works well for LINQ to Objects.
So in this case, when you're using the Entity Framework, any predicate passed to the EF's IQueryable has to be the Expression<Func<>>.
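To see the difference these answers describe, a sketch can inspect what ends up inside the expression tree. A List<int> and the predicate x > 0 are hypothetical stand-ins for o.Reservations and r => !r.IsDeleted. When Where binds to Enumerable.Where, the tree only holds a reference to a compiled Func, which a provider has no way to look inside; after AsQueryable(), Where binds to Queryable.Where and its argument is an Expression<Func<...>>.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-ins: List<int> for o.Reservations, x > 0 for !r.IsDeleted.
Func<int, bool> predicate = x => x > 0;
Expression<Func<int, bool>> expr = x => x > 0;

// Enumerable.Where: the captured Func is an opaque delegate inside the tree.
Expression<Func<List<int>, IEnumerable<int>>> withFunc = items => items.Where(predicate);

// AsQueryable() makes Where bind to Queryable.Where, whose argument is an expression tree.
Expression<Func<List<int>, IQueryable<int>>> withExpression = items => items.AsQueryable().Where(expr);

// Type of the second argument of the top-level Where call in a query lambda.
Type WhereArgumentType(LambdaExpression query) =>
    ((MethodCallExpression)query.Body).Arguments[1].Type;

Console.WriteLine(WhereArgumentType(withFunc));       // a Func delegate type
Console.WriteLine(WhereArgumentType(withExpression)); // an Expression<Func<...>> type

// Compiled and run in memory, both versions still filter the same way.
var data = new List<int> { -1, 2, 3 };
Console.WriteLine(withFunc.Compile()(data).Count());       // 2
Console.WriteLine(withExpression.Compile()(data).Count()); // 2
```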
I just experienced this issue in a different scenario.
I have a static class full of Expression predicates which I can then combine or pass to an EF query. One of them was:
public static Expression<Func<ClientEvent, bool>> ClientHasAttendeeStatus(
IEnumerable<EventEnums.AttendeeStatus> statuses)
{
return ce => ce.Event.AttendeeStatuses
.Where(a => a.ClientId == ce.Client.Id)
.Select(a => a.Status.Value)
.Any(statuses.Contains);
}
This was throwing the 1025 error because of the Contains method group. Entity Framework expected an expression and found a method group, which resulted in the error. Converting the code to use a lambda (which the compiler can turn into an expression) fixed the error:
public static Expression<Func<ClientEvent, bool>> ClientHasAttendeeStatus(
IEnumerable<EventEnums.AttendeeStatus> statuses)
{
return ce => ce.Event.AttendeeStatuses
.Where(a => a.ClientId == ce.Client.Id)
.Select(a => a.Status.Value)
.Any(x => statuses.Contains(x));
}
Aside: I then simplified the expression to ce => ce.Event.AttendeeStatuses.Any(a => a.ClientId == ce.Client.Id && statuses.Contains(a.Status.Value));
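A sketch of why the method group broke the query while the lambda did not, with int statuses as a hypothetical stand-in for EventEnums.AttendeeStatus. Inspecting the predicate argument passed to Any shows that the method group is stored as a delegate-creation call rather than as a LambdaExpression; the exact node the compiler emits for the method group is an implementation detail, so the sketch only checks that it is not a lambda.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-in for the statuses collection.
var statuses = new List<int> { 1, 3 };

// Method group: the tree records a delegate-creation call, not a lambda.
Expression<Func<List<int>, bool>> withMethodGroup = items => items.Any(statuses.Contains);

// Lambda: the tree records a LambdaExpression whose body a provider can translate.
Expression<Func<List<int>, bool>> withLambda = items => items.Any(s => statuses.Contains(s));

// Node type of the predicate argument passed to Any.
ExpressionType AnyPredicateKind(LambdaExpression query) =>
    ((MethodCallExpression)query.Body).Arguments[1].NodeType;

Console.WriteLine(AnyPredicateKind(withMethodGroup)); // something other than Lambda
Console.WriteLine(AnyPredicateKind(withLambda));      // Lambda
```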
I had a similar problem with a library of view models that look like this:
public class TagViewModel
{
public int Id { get; set; }
public string Name { get; set; }
public static Expression<Func<SiteTag, TagViewModel>> Select = t => new TagViewModel
{
Id = t.Id,
Name = t.Name,
};
}
This works:
var tags = await db.Tags.Take(10).Select(TagViewModel.Select)
.ToArrayAsync();
But, this won't compile:
var post = await db.Posts.Take(10)
.Select(p => new {
Post = p,
Tags = p.Tags.Select(pt => pt.Tag).Select(TagViewModel.Select)
})
.ToArrayAsync();
Because the second .Select is a mess: the first one is actually called on an ICollection, which is not IQueryable, so it binds to Enumerable.Select, takes its lambda as a plain Func rather than an Expression<Func<...>>, and returns an IEnumerable<...>. The second .Select is then also Enumerable.Select, which cannot accept the Expression in TagViewModel.Select, as discussed on this page. So .AsQueryable() to the rescue:
var post = await db.Posts.Take(10)
.Select(p => new {
Post = p,
Tags = p.Tags.Select(pt => pt.Tag).AsQueryable()
.Select(TagViewModel.Select)
})
.ToArrayAsync();
But that creates a new, weirder problem: either I get Internal Framework...Error 1025, or I get the post variable with a fully loaded .Post property, but the .Tags property holds an EF proxy object that seems to be used for lazy loading.
The solution is to control the return type of Tags, by ending use of the Anonymous class:
public class PostViewModel
{
public Post Post { get; set; }
public IEnumerable<TagViewModel> Tags { get; set; }
}
Now select into this and it all works:
var post = await db.Posts.Take(10)
.Select(p => new PostViewModel {
Post = p,
Tags = p.Tags.Select(pt => pt.Tag).AsQueryable()
.Select(TagViewModel.Select)
})
.ToArrayAsync();

Calling a method inside a Linq query

I want to insert into my table a column named 'S' that will get some string value based on a value it gets from a table column.
For example: for each ID (a.z) I want to get its string value stored in another table. The string value is returned by another method that gets it through a LINQ query.
Is it possible to call a method from Linq?
Should I do everything in the same query?
This is the structure of the information I need to get:
a.z is the ID in the first square in table #1, from this ID I get another id in table #2, and from that I can get my string value that I need to display under column 'S'.
var q = (from a in v.A join b in v.B
on a.i equals b.j
where a.k == "aaa" && a.h == 0
select new {T = a.i, S = someMethod(a.z).ToString()});
return q;
The line S = someMethod(a.z).ToString() causing the following error:
Unable to cast object of type 'System.Data.Linq.SqlClient.SqlColumn'
to type 'System.Data.Linq.SqlClient.SqlMethodCall'.
You have to execute your method call in a Linq-to-Objects context, because on the database side that method call makes no sense. You can do this using AsEnumerable(): the rest of the query is then evaluated as an in-memory collection using Linq-to-Objects, and you can use method calls as expected:
var q = (from a in v.A join b in v.B
on a.i equals b.j
where a.k == "aaa" && a.h == 0
select new {T = a.i, Z = a.z })
.AsEnumerable()
.Select(x => new { T = x.T, S = someMethod(x.Z).ToString() });
You'll want to split it up into two statements. Return the results from the query (which is what will hit the database), and then enumerate the results a second time in a separate step to transform them into the new object list. This second "query" won't hit the database, so you'll be able to use someMethod() inside it.
Linq-to-Entities is a bit of a strange thing, because it makes the transition to querying the database from C# extremely seamless: but you always have to remind yourself, "This C# is going to get translated into some SQL." And as a result, you have to ask yourself, "Can all this C# actually get executed as SQL?" If it can't - if you're calling someMethod() inside it - your query is going to have problems. And the usual solution is to split it up.
(The other answer from @BrokenGlass, using .AsEnumerable(), is basically another way to do just that.)
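A sketch of the split described in these answers, using an in-memory IQueryable<int> in place of the LINQ to SQL tables; SomeMethod is a hypothetical stand-in for someMethod. Everything before AsEnumerable() stays an IQueryable (and would be translated to SQL by a real provider), while the final Select is ordinary LINQ to Objects and can call any .NET method.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for someMethod(a.z): plain .NET code with no SQL equivalent.
static string SomeMethod(int id) => $"value-{id}";

IQueryable<int> ids = new[] { 1, 2, 3, 4 }.AsQueryable();

var q = ids
    .Where(z => z % 2 == 0)                    // IQueryable: translatable part
    .Select(z => new { T = z * 10, Z = z })    // project only the columns needed
    .AsEnumerable()                            // switch to LINQ to Objects
    .Select(x => new { x.T, S = SomeMethod(x.Z) });

foreach (var row in q)
    Console.WriteLine($"{row.T}: {row.S}");    // 20: value-2, then 40: value-4
```

Projecting only the needed columns before AsEnumerable() keeps the in-memory part small, which is the main cost of this approach.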
This is an old question, but I see nobody has mentioned one "hack" that allows calling methods during select without reiterating. The idea is to use a constructor, and in the constructor you can call whatever you wish (at least it works fine in LINQ with NHibernate; I'm not sure about LINQ to SQL or EF, but I guess it should be the same).
Below is the source code of a benchmark program. In my case the reiterating approach is about twice as slow as the constructor approach, which is no wonder: my business logic was minimal, so things like iteration and memory allocation matter.
I also wish there were a better way to say that this or that should not be executed on the database.
using System;
using System.Linq;
// Here are the results of selecting sum of 1 million ints on my machine:
// Name Iterations Percent
// reiterate 294 53.3575317604356%
// constructor 551 100%
public class A
{
public A()
{
}
public A(int b, int c)
{
Result = Sum(b, c);
}
public int Result { get; set; }
public static int Sum(int source1, int source2)
{
return source1 + source2;
}
}
class Program
{
static void Main(string[] args)
{
var range = Enumerable.Range(1, 1000000).ToList();
BenchmarkIt.Benchmark.This("reiterate", () =>
{
var tst = range
.Select(x => new { b = x, c = x })
.AsEnumerable()
.Select(x => new A
{
Result = A.Sum(x.b, x.c)
})
.ToList();
})
.Against.This("constructor", () =>
{
var tst = range
.Select(x => new A(x, x))
.ToList();
})
.For(60)
.Seconds()
.PrintComparison();
Console.ReadKey();
}
}
