LINQ to SQL Func as input Performance - c#

I have created a simple method using EF 6 that runs a grouped query based on some input information and some possible Type and SubType values, as follows:
public int GetOriginal(DateTime startDate, DateTime endDate, List<int> userIds)
{
DateTime dt = DateTime.UtcNow;
var ret = DbContext.ContactFeedback
.Where(c => c.FeedbackDate >= startDate &&
c.FeedbackDate <= endDate && userIds.Contains(c.UserId) &&
(c.Type == FeedbackType.A || c.Type == FeedbackType.B || c.Type == FeedbackType.C))
.GroupBy(x => new {TruncateTime = DbFunctions.TruncateTime(x.FeedbackDate), x.LeadId, x.UserId})
.Count();
Console.WriteLine(string.Format("{0}",DateTime.UtcNow - dt));
return ret;
}
It works as expected. However, if I try to create a new auxiliary method that receives the "query" (a Func type object) as input to be run, I see a very big difference in performance which I'm not able to explain, because they should run exactly the same.
Here are my rewritten methods:
public int GetRewritten(DateTime startDate, DateTime endDate, List<int> userIds)
{
DateTime dt = DateTime.UtcNow;
var query = new Func<ContactFeedback, bool>(c => c.FeedbackDate >= startDate && c.FeedbackDate <= endDate && userIds.Contains(c.UserId) &&
(c.Type == FeedbackType.A || c.Type == FeedbackType.B ||
c.Type == FeedbackType.C));
var ret = GetTotalLeadsByFeedback(query);
Console.WriteLine(string.Format("{0}",DateTime.UtcNow - dt));
return ret;
}
private int GetTotalLeadsByFeedback(Func<ContactFeedback, bool> query)
{
return DbContext.ContactFeedback
.Where(query)
.GroupBy(x => new { TruncateTime = DbFunctions.TruncateTime(x.FeedbackDate), x.LeadId, x.UserId })
.Count();
}
Here are the running times in seconds:
GetOriginal with 1 userId: 0.0156318; with ~100 userIds: 0.1455635
GetRewritten with 1 userId: 0.4742711; with ~100 userIds: 7.2555701
As you can see, the difference is huge. Can anyone shed light on why this occurs?
I'm running everything on Azure with a SQL Server DB, if it helps.

I see a very big difference in performance which I'm not able to explain, because they should run exactly the same.
They're considerably different in approach. The first part of your initial method's query:
DbContext.ContactFeedback
.Where(c => c.FeedbackDate >= startDate &&
c.FeedbackDate <= endDate && userIds.Contains(c.UserId) &&
(c.Type == FeedbackType.A || c.Type == FeedbackType.B || c.Type == FeedbackType.C))
Is effectively equivalent to the following, because the compiler converts the lambda into an expression tree rather than a compiled delegate:
Expression<Func<ContactFeedback, bool>> predicate = c => c.FeedbackDate >= startDate && c.FeedbackDate <= endDate && userIds.Contains(c.UserId) &&
(c.Type == FeedbackType.A || c.Type == FeedbackType.B ||
c.Type == FeedbackType.C);
DbContext.ContactFeedback
.Where(predicate)
When you call .Where on an IQueryable<T> it will (barring a case where the type implementing IQueryable<T> has its own applicable .Where, which would be strange) call into:
public static IQueryable<TSource> Where<TSource>(
this IQueryable<TSource> source,
Expression<Func<TSource, bool>> predicate
)
Bearing in mind that lambdas in source code can be turned into either a Func<…> or an Expression<Func<…>> as applicable.
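The two forms a lambda can take are easy to compare in isolation. The following is a minimal sketch, not taken from the question; the class and method names (LambdaForms, DescribeTree, RunDelegate) are made up for illustration. The same lambda text becomes a data structure that a provider such as EF can inspect when the target type is Expression<Func<…>>, and opaque compiled code when the target type is Func<…>.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical names, for illustration only.
public static class LambdaForms
{
    // As an expression tree, the lambda is data: its operator and parameter can be read back.
    public static string DescribeTree()
    {
        Expression<Func<int, bool>> tree = n => n > 2;
        return tree.Body.NodeType + " on " + tree.Parameters[0].Name;
    }

    // As a delegate, the lambda is compiled code: it can only be invoked.
    public static bool RunDelegate(int value)
    {
        Func<int, bool> compiled = n => n > 2;
        return compiled(value);
    }
}
```

A query provider can walk the first form and translate it to SQL; with the second form all it can do is call it on objects that are already in memory.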
Entity Framework then combines this query with the GroupBy and finally upon Count() turns the entire query into the appropriate SELECT COUNT … query, which the database performs (just how quickly depending on table contents and what indices are set, but which should be reasonably quick) and then a single value is sent back from the database for EF to obtain.
Your version, though, has explicitly assigned the lambda to a Func<ContactFeedback, bool>. As such, using it with Where has to call into:
public static IEnumerable<TSource> Where<TSource>(
this IEnumerable<TSource> source,
Func<TSource, bool> predicate
)
So to do the Where, EF has to retrieve every column of every row from the database, keep only those rows for which that Func returns true, and then group them in memory (which requires storing the partially-constructed groups) before doing a Count by a mechanism like:
public static int Count<T>(this IEnumerable<T> source)
{
/* some attempts at optimising that don't apply to this case, and so in fact just waste a tiny amount of time, omitted */
int tally = 0;
using(var en = source.GetEnumerator())
while(en.MoveNext())
++tally;
return tally;
}
This is a lot more work, with a lot more traffic between EF and the database, and so a lot slower.
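The overload switch can be observed without a database. This is a small sketch under the assumption that an in-memory array wrapped with AsQueryable() stands in for the EF set; OverloadDemo and StaysQueryable are hypothetical names. It checks whether the result of Where is still an IQueryable<int>, which is what a provider needs in order to keep translating the rest of the query.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical names, for illustration only.
public static class OverloadDemo
{
    // Returns true when the filtered sequence is still an IQueryable<int>.
    public static bool StaysQueryable(bool useExpression)
    {
        IQueryable<int> source = new[] { 1, 2, 3, 4 }.AsQueryable();
        Expression<Func<int, bool>> asExpression = n => n > 2;
        Func<int, bool> asDelegate = n => n > 2;

        // The Expression argument binds to Queryable.Where;
        // the Func argument can only bind to Enumerable.Where.
        object result = useExpression
            ? (object)source.Where(asExpression)
            : source.Where(asDelegate);

        return result is IQueryable<int>;
    }
}
```

With the Func argument the chain has silently become LINQ to Objects, so any GroupBy or Count that follows runs in the application rather than in the query provider.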
A rewrite of the sort you attempted would be better approximated by:
public int GetRewritten(DateTime startDate, DateTime endDate, List<int> userIds)
{
DateTime dt = DateTime.UtcNow;
Expression<Func<ContactFeedback, bool>> query = c => c.FeedbackDate >= startDate && c.FeedbackDate <= endDate && userIds.Contains(c.UserId) &&
(c.Type == FeedbackType.A || c.Type == FeedbackType.B ||
c.Type == FeedbackType.C);
var ret = GetTotalLeadsByFeedback(query);
Console.WriteLine(string.Format("{0}",DateTime.UtcNow - dt));
return ret;
}
private int GetTotalLeadsByFeedback(Expression<Func<ContactFeedback, bool>> predicate)
{
return DbContext.ContactFeedback
.Where(predicate)
.GroupBy(x => new { TruncateTime = DbFunctions.TruncateTime(x.FeedbackDate), x.LeadId, x.UserId })
.Count();
}
(Note also that I changed the name of the predicate to predicate, as predicate is more commonly used for predicates, query for a source along with zero or more methods acting upon it; so DbContext.ContactFeedback, DbContext.ContactFeedback.Where(predicate) and DbContext.ContactFeedback.Where(predicate).GroupBy(x => new { TruncateTime = DbFunctions.TruncateTime(x.FeedbackDate), x.LeadId, x.UserId }) would all be queries if enumerated, and DbContext.ContactFeedback.Where(predicate).GroupBy(x => new { TruncateTime = DbFunctions.TruncateTime(x.FeedbackDate), x.LeadId, x.UserId }).Count() is a query that immediately executes and returns a single value).
Conversely, the form you ended up with could be written back into the style of GetOriginal as:
public int GetNotOriginal(DateTime startDate, DateTime endDate, List<int> userIds)
{
DateTime dt = DateTime.UtcNow;
var ret = DbContext.ContactFeedback
.AsEnumerable()
.Where(c => c.FeedbackDate >= startDate &&
c.FeedbackDate <= endDate && userIds.Contains(c.UserId) &&
(c.Type == FeedbackType.A || c.Type == FeedbackType.B || c.Type == FeedbackType.C))
.GroupBy(x => new {TruncateTime = DbFunctions.TruncateTime(x.FeedbackDate), x.LeadId, x.UserId})
.Count();
Console.WriteLine(string.Format("{0}",DateTime.UtcNow - dt));
return ret;
}
Note the AsEnumerable forcing the Where and everything that follows to be executed in the .NET application, rather than on the database.

Related

How to efficiently grab results from Linq with or without user input params

I have a linq query that I would like to return results with user input data. However, if this function gets called and there is zero data from user, OR user just wants to search via data, OR just one of the other parameters, how can I efficiently write the linq to accommodate for this? Here is the Linq and function:
public static List<Objects.Logs.GenericLog> GetLogs(int entityId, int logLevelId,
DateTime startDate, DateTime endDate)
{
var logsList = new List<Objects.Logs.GenericLog>();
using(var db = CORAContext.GetCORAContext())
{
logsList = (from i in db.GenericLog select new Objects.Logs.GenericLog()
{
EntityId = i.FkEntityId,
LogSourceCode = i.FkLogSourceCode,
LogLevelId = i.FkLogLevelId,
LogDateTime = i.LogDateTime,
LogId = i.PkLogId,
Message = i.Message
})
.Where(i => i.LogDateTime >= startDate && i.LogDateTime <= endDate)
.Where(i => i.EntityId == entityId || i.EntityId == null)
.Where(i => i.LogLevelId == logLevelId || i.EntityId == null)
.ToList();
}
return logsList;
}
For example, in the second and third Where(), I have || i.EntityId == null, thinking this would accommodate the case where the user input for Entity is null.
Will this work?
Also, how can I do this for date ranges? Can I also do the same?
Finally, is there a BETTER way to do this?
Separate building the query from materializing the final result with .ToList().
When you generate a query, you can add where statements on demand, like this:
public static List<Objects.Logs.GenericLog> GetLogs(int entityId, int logLevelId, DateTime startDate, DateTime endDate)
{
var logsList = new List<Objects.Logs.GenericLog>();
using(var db = CORAContext.GetCORAContext())
{
var query = (from i in db.GenericLog select new Objects.Logs.GenericLog()
{
EntityId = i.FkEntityId,
LogSourceCode = i.FkLogSourceCode,
LogLevelId = i.FkLogLevelId,
LogDateTime = i.LogDateTime,
LogId = i.PkLogId,
Message = i.Message
});
if(someCondition) {
query = query.Where(i => i.LogDateTime >= startDate && i.LogDateTime <= endDate);
}
query = query.Where(i => i.EntityId == entityId || i.EntityId == null);
query = query.Where(i => i.LogLevelId == logLevelId || i.EntityId == null);
logsList = query.ToList();
}
return logsList;
}
If I understand you correctly, you have a method that gets a filtered set of data based on the values of the parameters passed in. But you want to make the parameters optional, so if the user wants data for all entities, they wouldn't pass in an entityId.
If that's the case, then you can make the arguments optional by providing a default value for them in the method signature. We can then check if the argument has the default value, and if it does, don't apply that filter; otherwise apply it.
We can do this by doing .Where(x => argHasDefaultValue || someFilter). This works because if the argument has the default value, then the second part of the || is ignored.
For example:
public static List<Objects.Logs.GenericLog> GetLogs(int entityId = int.MinValue,
int logLevelId = int.MinValue, DateTime startDate = default(DateTime),
DateTime endDate = default(DateTime))
{
using(var db = CORAContext.GetCORAContext())
{
return db.GenericLog
.Where(i => startDate == default(DateTime) || i.LogDateTime >= startDate)
.Where(i => endDate == default(DateTime) || i.LogDateTime <= endDate)
.Where(i => entityId == int.MinValue || i.FkEntityId == entityId)
.Where(i => logLevelId == int.MinValue || i.FkLogLevelId == logLevelId)
.Select(i => new Objects.Logs.GenericLog
{
EntityId = i.FkEntityId,
LogSourceCode = i.FkLogSourceCode,
LogLevelId = i.FkLogLevelId,
LogDateTime = i.LogDateTime,
LogId = i.PkLogId,
Message = i.Message
}).ToList();
}
}
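A variation on the same idea, sketched here under assumptions: nullable parameters replace the sentinel defaults, and each filter is only added to the query when a value was supplied. LogEntry and LogFilters are hypothetical stand-ins for the real entity and context, and the queryable is built from an in-memory list so the sketch runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the GenericLog entity.
public sealed class LogEntry
{
    public int? EntityId { get; set; }
    public int LogLevelId { get; set; }
    public DateTime LogDateTime { get; set; }
}

public static class LogFilters
{
    // Adds a Where clause only for the arguments the caller actually supplied.
    public static IQueryable<LogEntry> Apply(
        IQueryable<LogEntry> logs,
        int? entityId = null,
        int? logLevelId = null,
        DateTime? startDate = null,
        DateTime? endDate = null)
    {
        if (entityId.HasValue)
        {
            int id = entityId.Value;
            logs = logs.Where(l => l.EntityId == id);
        }
        if (logLevelId.HasValue)
        {
            int level = logLevelId.Value;
            logs = logs.Where(l => l.LogLevelId == level);
        }
        if (startDate.HasValue)
        {
            DateTime from = startDate.Value;
            logs = logs.Where(l => l.LogDateTime >= from);
        }
        if (endDate.HasValue)
        {
            DateTime to = endDate.Value;
            logs = logs.Where(l => l.LogDateTime <= to);
        }
        return logs; // nothing has executed yet; the caller decides when to call ToList()
    }
}
```

Because the filters are composed before the query is enumerated, an unused argument leaves no trace in the query at all, instead of a condition that is always true.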

entity framework, order by preventing left outer join

I have the following query, which is causing me performance issues in SQL Server.
ctx.Articles
.Where(m => m.Active)
.Where(m => m.PublishDate <= DateTime.Now)
.Where(m => m.Sponsored == false)
.WhereIf(request.ExcludeFirstNews, m => m.PositionForCategory != 1)
.WhereIf(category != null, m => m.RootCategoryId == category.RootCategoryId)
.WhereIf(request.TimeRange != 0 && request.TimeRange == TimeRange.today, m => m.PublishDate.Value >= today)
.WhereIf(request.TimeRange != 0 && request.TimeRange == TimeRange.yesterday, m => m.PublishDate.Value >= yesterday)
.WhereIf(request.TimeRange != 0 && request.TimeRange == TimeRange.week, m => m.PublishDate.Value >= week &&
m.PublishDate.Value <= DateTime.Now)
.WhereIf(request.TimeRange != 0 && request.TimeRange == TimeRange.month, m => m.PublishDate.Value >= month &&
m.PublishDate.Value <= DateTime.Now)
.OrderByDescending(m => m.ArticleViewCountSum.ViewCountSum)
Is there a way to make Entity Framework use an inner join instead of a left outer join for the order by part of the lambda expression?
I've checked the query in the profiler, and it generates a left outer join, which is not necessary in this case; the query runs much faster if I change it manually in SQL.
EDIT:
Per questions, in the comments, this is code for WhereIf
public static IQueryable<T> WhereIf<T>(
this IQueryable<T> source,
bool condition,
Expression<Func<T, bool>> predicate)
{
if (condition)
{
return source.Where(predicate);
}
return source;
}

LINQ DateTime Query that ignores milliseconds

x.CreateDate DateTime is stored in our database down to milliseconds. My dateTimePicker values startdate and enddate only allows for querying down to seconds.
How can I change my query to ignore the milliseconds of x.CreateDate? I thought the code I wrote below would work, but it does not.
if (stardDateIsValid && endDateIsValid && startdate == enddate)
query = _context.Logs
.Where(x => x.ApplicationID == applicationId &&
x.CreateDate.AddMilliseconds(-x.CreateDate.Millisecond) == startdate)
.OrderByDescending(x => x.ID)
.Take(count);
var query = (from l in _context.Logs
where l.ApplicationID == applicationId
&& SqlMethods.DateDiffSecond(l.CreateDate,startdate) == 0
orderby l.ID descending
select l).Take(count);
This avoids converting every date in your table into a string and the subsequent string comparison, by comparing the two dates as dates.
Getting CreateDate and startdate in the same format will help you compare apples to apples. This should accomplish that.
if (stardDateIsValid && endDateIsValid && startdate == enddate)
query = _context.Logs
.Where(x => x.ApplicationID == applicationId &&
x.CreateDate.ToString(@"MM/dd/yyyy h:mm:ss") == startdate.ToString(@"MM/dd/yyyy h:mm:ss"))
.OrderByDescending(x => x.ID)
.Take(count);
I have no idea why I could not get any results from the queries posted above, as I tried several variations on their themes. However, I did get it working correctly by adding milliseconds to the startdate and enddate variables.
if (stardDateIsValid && endDateIsValid)
{
startdate = startdate.AddMilliseconds(000);
// Extend the end of the range to the last millisecond of the selected second.
enddate = enddate.AddMilliseconds(999);
query = _context.Logs.Where(x => x.ApplicationID == applicationId && x.CreateDate >= startdate && x.CreateDate <= enddate).OrderByDescending(x => x.ID).Take(count);
}
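The range idea can also be written as a half-open interval, which avoids depending on the stored precision. This is a sketch under assumptions, not the poster's code: SecondRange and WithinSameSecond are hypothetical names, and the values are filtered in memory so the example is self-contained. The picked value is truncated to the whole second, and a match is anything from that instant up to, but not including, the next second.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical names, for illustration only.
public static class SecondRange
{
    // Returns the values that fall inside the same whole second as 'second'.
    public static List<DateTime> WithinSameSecond(IEnumerable<DateTime> values, DateTime second)
    {
        // Drop any sub-second part of the picked value.
        DateTime start = new DateTime(second.Ticks - second.Ticks % TimeSpan.TicksPerSecond, second.Kind);
        DateTime end = start.AddSeconds(1);

        // Half-open range: start <= value < end.
        return values.Where(v => v >= start && v < end).ToList();
    }
}
```

Plain comparisons against two precomputed DateTime values like these are the kind of condition a LINQ provider can normally translate, and they leave the column itself untouched so an index on it remains usable.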
You can create an extension method.
public static class DateTimeExtensions
{
public const long TicksPerMillisecond = 10000;
public const long TicksPerSecond = TicksPerMillisecond * 1000;
// True when the two values are less than one second apart, in either direction.
public static bool IsEqualIgnoreMilliseconds(this DateTime date, DateTime compareDate)
{
long tickDiff = date.Ticks - compareDate.Ticks;
return tickDiff > -TicksPerSecond && tickDiff < TicksPerSecond;
}
}
Then you can use this:
if (stardDateIsValid && endDateIsValid && startdate == enddate)
query = _context.Logs
.Where(x => x.ApplicationID == applicationId &&
x.CreateDate.IsEqualIgnoreMilliseconds(startdate))
.OrderByDescending(x => x.ID)
.Take(count);

Entity Framework Dynamic Query Expression being ignored

I have an operation that takes a serializable QueryModel and converts it to an Expression to be passed to Entity Framework. My query against the database looks like:
public IEnumerable<PhotoVerifySessionOverview> FindSessions(Expression<Func<vwPhotoVerifySession, bool>> predicate, PaginationModel model)
{
var sessions = Context.vwPhotoVerifySessions
.AsQueryable()
.Where(predicate)
.OrderBy(string.Format("{0} {1}", model.OrderByColumn, model.OrderByDirection))
.Skip(model.Offset)
.Take(model.PageSize);
return Mapper.Map<IEnumerable<PhotoVerifySessionOverview>>(sessions);
}
and my predicate builder looks like:
var predicate = PredicateBuilder.True<vwPhotoVerifySession>();
//Add the tenant to the where clause
if (model.TenantId.HasValue)
predicate.And(p => p.TenantId == model.TenantId.Value);
else
predicate.And(p => p.TenantReferenceId == model.TenantReferenceId);
//Add a date range if one is present
if (model.CreatedOnRange != default(DateRange))
{
var endDate = model.CreatedOnRange.End == default(DateTime) ? DateTime.Now : model.CreatedOnRange.End;
predicate.And(p => p.CreatedOn >= model.CreatedOnRange.Start && p.CreatedOn <= endDate);
}
//Include status filtering if any filters are present
if (model.StatusFilter != null && model.StatusFilter.Any())
{
//use Id and name to search for status
predicate.And(p => model.StatusFilter.Any(f => f.StatusId == p.StatusId || p.Status == f.Name));
}
var pagination = model as PaginationModel;
var sessions = Manager.FindSessions(predicate, pagination);
return sessions;
The problem is, my Where clause is not being evaluated and all results are being returned. Is there something else I should be doing to get the Where statement to work correctly?
You need to assign predicate back to itself.
predicate = predicate.And(p => p.TenantId == model.TenantId.Value);
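The reason is that expression trees are immutable, so a combinator like And cannot modify its receiver and has to return a new expression. The sketch below illustrates this with a hand-written And helper; PredicateSketch and its members are hypothetical and are not the PredicateBuilder used in the question, which is only assumed to behave the same way.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical helper, for illustration only.
public static class PredicateSketch
{
    // Builds a NEW lambda equal to (left && right); neither input is changed.
    public static Expression<Func<T, bool>> And<T>(
        Expression<Func<T, bool>> left,
        Expression<Func<T, bool>> right)
    {
        // Rewrite the right-hand body so both sides use the same parameter.
        Expression rightBody =
            new ParameterSwap(right.Parameters[0], left.Parameters[0]).Visit(right.Body);

        return Expression.Lambda<Func<T, bool>>(
            Expression.AndAlso(left.Body, rightBody),
            left.Parameters);
    }

    private sealed class ParameterSwap : ExpressionVisitor
    {
        private readonly ParameterExpression _from;
        private readonly ParameterExpression _to;

        public ParameterSwap(ParameterExpression from, ParameterExpression to)
        {
            _from = from;
            _to = to;
        }

        protected override Expression VisitParameter(ParameterExpression node)
        {
            return node == _from ? _to : base.VisitParameter(node);
        }
    }
}
```

Calling And and discarding the result therefore leaves the original "always true" predicate in place, which matches the symptom of every row being returned. The same reassignment is needed for each of the other predicate.And(...) calls in the question.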

How to combine the multiple part linq into one query?

The operator should be ‘AND’, not ‘OR’.
I am trying to refactor the following code, and I understand that writing the LINQ query this way may not be the correct approach. Can someone advise me on how to combine the following into one query?
AllCompany.Where(itm => itm != null).Distinct().ToList();
if (AllCompany.Count > 0)
{
//COMPANY NAME
if (isfldCompanyName)
{
AllCompany = AllCompany.Where(company => company["Company Name"].StartsWith(fldCompanyName)).ToList();
}
//SECTOR
if (isfldSector)
{
AllCompany = AllCompany.Where(company => fldSector.Intersect(company["Sectors"].Split('|')).Any()).ToList();
}
//LOCATION
if (isfldLocation)
{
AllCompany = AllCompany.Where(company => fldLocation.Intersect(company["Location"].Split('|')).Any()).ToList();
}
//CREATED DATE
if (isfldcreatedDate)
{
AllCompany = AllCompany.Where(company => company.Statistics.Created >= createdDate).ToList();
}
//LAST UPDATED DATE
if (isfldUpdatedDate)
{
AllCompany = AllCompany.Where(company => company.Statistics.Updated >= updatedDate).ToList();
}
//Allow Placements
if (isfldEmployerLevel)
{
fldEmployerLevel = (fldEmployerLevel == "Yes") ? "1" : "";
AllCompany = AllCompany.Where(company => company["Allow Placements"].ToString() == fldEmployerLevel).ToList();
}
Firstly, unless AllCompany is of some magic custom type, the first line gives you nothing, because its result is never assigned.
I also doubt that Distinct works the way you want it to. I don't know the type of AllCompany, but I would guess it only gives you distinctness by reference.
Either way, here's what I think you want:
fldEmployerLevel = (fldEmployerLevel == "Yes") ? "1" : "";
var result = AllCompany.Where(itm => itm != null)
.Where(company => !isfldCompanyName || company["Company Name"].StartsWith(fldCompanyName))
.Where(company => !isfldSector || fldSector.Intersect(company["Sectors"].Split('|')).Any())
.Where(company => !isfldLocation || fldLocation.Intersect(company["Location"].Split('|')).Any())
.Where(company => !isfldcreatedDate || company.Statistics.Created >= createdDate)
.Where(company => !isfldUpdatedDate || company.Statistics.Updated >= updatedDate)
.Where(company => !isfldEmployerLevel || company["Allow Placements"].ToString() == fldEmployerLevel)
.Distinct()
.ToList();
Edit:
I moved Distinct to the end of the query to optimize the processing.
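An alternative to the !flag || condition pattern is to add each filter only when its flag is set, so the flags are checked once rather than once per element. The following is a sketch on simplified data: ConditionalFilters, WhereIf and Filter are hypothetical names, and plain strings stand in for the company objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical names, for illustration only.
public static class ConditionalFilters
{
    // Applies the predicate only when the condition holds; otherwise returns the source unchanged.
    public static IEnumerable<T> WhereIf<T>(IEnumerable<T> source, bool condition, Func<T, bool> predicate)
    {
        return condition ? source.Where(predicate) : source;
    }

    public static List<string> Filter(
        IEnumerable<string> names,
        bool byPrefix, string prefix,
        bool byLength, int minLength)
    {
        IEnumerable<string> query = names.Where(n => n != null);
        query = WhereIf(query, byPrefix, n => n.StartsWith(prefix, StringComparison.Ordinal));
        query = WhereIf(query, byLength, n => n.Length >= minLength);

        // Nothing is enumerated until here, so there is a single pass and a single list.
        return query.Distinct().ToList();
    }
}
```

Compared with the original code, this keeps one deferred chain and materializes it once, instead of building an intermediate list after every filter.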
How about trying like this;
AllCompany = AllCompany.Where(company => company.Statistics.Created >= createdDate && company.Statistics.Updated >= updatedDate).ToList();
If every part of the query is optional (created date, last updated date, and so on), then you can build the LINQ query string dynamically.
Here's a sneaky trick. If you define the following extension method in its own static class:
public static IEnumerable<T> WhereAll<T>(this IEnumerable<T> source, params Func<T, bool>[] filters)
{
// Fold the filters into a single chain of Where calls over the source.
return filters.Aggregate(source, (acc, filter) => acc.Where(filter));
}
then you can write
var result = AllCompany.WhereAll(itm => itm != null,
company => !isfldCompanyName || company["Company Name"].StartsWith(fldCompanyName),
company => !isfldSector || fldSector.Intersect(company["Sectors"].Split('|')).Any(),
company => !isfldLocation || fldLocation.Intersect(company["Location"].Split('|')).Any(),
company => !isfldcreatedDate || company.Statistics.Created >= createdDate,
company => !isfldUpdatedDate || company.Statistics.Updated >= updatedDate,
company => !isfldEmployerLevel || company["Allow Placements"].ToString() == fldEmployerLevel)
.Distinct()
.ToList();
