I have a method that returns IEnumerable<User>, for which I have been using LINQ / Entity Framework / SQL Server to return results.
I came across a difficult conditional scenario which was much more easily solved in C# by iterating on the web server (at the end of a chain of LINQ statements, just before returning the data to the client):
public IEnumerable<User> ReturnUsersNotInRoles()
{
    List<User> z = (from users
                    //...many joins..conditions...
                    ).Distinct().Include(x => x.RoleUserLinks).ToList();
    List<User> list = new List<User>();
    foreach (User user in z)
    {
        bool shouldReturnUser = true;
        foreach (var rul in user.RoleUserLinks)
        {
            if (rul.LinkStatusID == (byte)Enums.LinkStatus.Added)
                shouldReturnUser = false;
        }
        if (shouldReturnUser)
            list.Add(user);
    }
    return list;
}
Question: In C# is there a more performant / less memory overhead way of doing this?
I am only bringing back the entities I need from LINQ. There is no N+1 scenario. Performance currently is excellent.
I realise that ideally I'd be writing this in SQL / LINQ, as then SQL Server would do its magic and serve me the data quickly. However, I'm balancing that against a potentially very hard to understand query, versus the excellent performance I currently get with iterating and the ease of understanding the C# way.
How about this:
public IEnumerable<User> ReturnUsersNotInRoles()
{
    var z = (from users
             //...many joins..conditions...
             ).Distinct().Include(x => x.RoleUserLinks);
    var addedLinkStatusID = (byte)Enums.LinkStatus.Added;
    return z.Where(user =>
            !user.RoleUserLinks.Any(link => link.LinkStatusID == addedLinkStatusID))
        .ToList();
}
This should run completely as a SQL query. If you instead wanted the filtering to run in memory, you could make the first part (z) materialize by adding a .ToList() at the end of the line that defines it.
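To make that concrete, here is a sketch reusing z and addedLinkStatusID from the code above:
// Stays IQueryable: the Where is translated into the SQL statement,
// so the filtering happens on SQL Server.
var serverSide = z
    .Where(user => !user.RoleUserLinks.Any(link => link.LinkStatusID == addedLinkStatusID))
    .ToList();

// Materialized first: the same Where now runs in memory via LINQ to Objects.
var inMemory = z.ToList()
    .Where(user => !user.RoleUserLinks.Any(link => link.LinkStatusID == addedLinkStatusID))
    .ToList();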
By the way, regarding your question "In C# is there a more performant / less memory overhead way of doing this?" - well, firstly you can add a break statement right after you set shouldReturnUser = false;.
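That would make the inner loop from the question look like this:
foreach (var rul in user.RoleUserLinks)
{
    if (rul.LinkStatusID == (byte)Enums.LinkStatus.Added)
    {
        shouldReturnUser = false;
        break; // no need to inspect the remaining links
    }
}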
Secondly, I prefer using the LINQ primitives whenever possible whether or not I'm working with a database:
When used correctly, an implementation using LINQ methods will probably be as fast as or faster than anything you could write yourself.
More importantly, they promote functional, stateless programming over stateful, bug-prone programming.
Also, if you are working with a database you have the bonus of being able to decide whether or not you want the code to run as a SQL query - all you have to do is decide where to materialize.
Your loop is equivalent to the following LINQ query - I find it easier to understand than the loop and it allows for complete execution on the server when combined with the first part of the query.
var linkStatusAdded = (Byte)Enums.LinkStatus.Added;
return z.Where(user => user.RoleUserLinks
.All(rul => rul.LinkStatusID != linkStatusAdded))
.ToList();
I have 200k rows in my table, and I need to filter the table and then show the results in a datatable. When I run the SQL directly, it is fast, but when I get the row count or call ToList(), it takes a long time - even though after filtering there are only 15 rows, so it is not a huge amount of data.
public static List<Books> GetBooks()
{
    List<Books> bookList = new List<Books>();
    var v = from a in ctx.Books select a;
    int allBooksCount = v.Count(); // I need all books count before filtering - my first problem: it is very slow
    if (isFilter)
    {
        v = v.Where(a => a.startdate <= DateTime.Now && a.enddate >= DateTime.Now);
    }
    // ...
    bookList = v.ToList(); // my second problem: ToList is also very slow
    return bookList;
}
There's nothing wrong with the code you've shown. So either you have some trouble in the database itself, or you're ruining the query by using IEnumerable instead of IQueryable.
My guess is that either ctx.Books is IEnumerable<Books> (instead of IQueryable<Books>), or that the Count (and Where etc.) method you're calling is the Enumerable version, rather than the Queryable version.
Which version of Count are you actually calling?
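To illustrate the difference (a sketch; ctx and Books are taken from the question):
// Queryable.Count: translated into a single SELECT COUNT(*) on the server.
IQueryable<Books> queryable = ctx.Books;
int fastCount = queryable.Count();

// Enumerable.Count: streams all 200k rows to the client and counts them
// in memory - this is the slow variant.
IEnumerable<Books> enumerable = ctx.Books;
int slowCount = enumerable.Count();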
First, to get help you need to provide quantitative values for "fast" vs. "too long". Loading entities from EF will take longer than running a raw SQL statement in a client tool like TOAD etc. Are you seeing differences of 15ms vs. 15 seconds, or 15ms vs. 150ms?
To help identify and eliminate possible culprits:
Eliminate the possibility that a long-running DbContext instance tracking too many entities is bogging down performance. The longer a DbContext is used and the more entities it tracks, the slower it gets. Temporarily change the code to:
List<Books> bookList = new List<Books>();
using (var context = new YourDbContext())
{
    var v = from a in context.Books select a;
    int allBooksCount = v.Count(); // the "slow" count
    if (isFilter)
    {
        v = v.Where(a => a.startdate <= DateTime.Now && a.enddate >= DateTime.Now);
    }
    // ...
    bookList = v.ToList();
}
Using a fresh DbContext ensures that queries are not sifting through a large set of already-tracked in-memory entities to find instances to return after running a query. It also ensures we are running against IQueryable off the Books DbSet within the context; we can only guess what "ctx" in your code actually represents.
Next: look at a profiler for MySQL, or have your database log the SQL statements, to capture exactly what EF is requesting. Check that the Count and the ToList each trigger just one query against the database, and then run those exact statements against the database yourself. If more than these two queries are being run, then something odd is happening behind the scenes that you need to investigate - for example, your example may not really represent what your real code is doing, or you could be tripping client-side evaluation (if using EF Core 2) or lazy loading. The next thing I would look at is the execution plan for these queries, for hints such as missing indexes. (My DB experience is primarily SQL Server, so I cannot advise on tools for MySQL.)
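If you are on EF6, a minimal sketch of capturing the SQL from the application side (EF Core has an equivalent logging hook on its DbContextOptionsBuilder):
using (var context = new YourDbContext())
{
    // EF6: echo every SQL statement EF sends to the database.
    context.Database.Log = Console.Write;

    var v = from a in context.Books select a;
    int allBooksCount = v.Count(); // should appear as one SELECT COUNT(*)
    var bookList = v.ToList();     // should appear as one SELECT
}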
I would log the actual SQL queries here. You can then use DESCRIBE to look at how many rows each one hits. There are various tools that can further analyse the queries if DESCRIBE isn't sufficient. This way you can see whether it's the queries or the (lack of) indices that is the problem. The next step has to be guided by that.
I have a function in my ASP.NET Core app which updates a bunch of records based on certain criteria I write in a where clause. I read that ToList() has bad performance, so is there a better and faster way than using ToList and foreach?
This is my current way of doing it; I would appreciate it if someone could provide a more efficient way.
public async Task UpdateCatalogOnTenantApproval(int tenantID)
{
    var catalogQuery = GetQueryable();
    var catalog = await catalogQuery.Where(x => x.IdTenant == tenantID).ToListAsync();
    catalog.ForEach(c => { c.IsApprovedByAdmin = true; c.IsActive = true; });
    Context.UpdateRange(catalog);
    await Context.SaveChangesAsync();
}
"I read that ToList() has bad performance"
That is wrong. ToList has as good a performance as you will get. It is only slow if you submit a bad, overly complex query that results in bad SQL which SQL Server takes ages to execute.
Also, many people think "ToList" is slow (as in: in the profiler). You see, you start with a db context, take a set of entities there, add some where clauses - all fast. Then you call ToList and it takes "long" (compared to the rest). Well, THAT is where the query is sent to the SQL server ;) Where(x => whatever) takes "no time" because all it does is add some nodes to the expression tree, it does not execute the query. THAT is mostly what people mix up - deferred execution, which executes only when the results are asked for.
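A sketch of that deferred execution (the entity and context names are illustrative):
// Nothing hits the database here - each call only adds nodes to the expression tree.
IQueryable<User> query = context.Users
    .Where(u => u.IsActive)
    .OrderBy(u => u.Name);

// THIS is the moment the SQL is built, sent and executed - which is why ToList()
// looks "slow" in a profiler: it carries the cost of the whole query.
List<User> results = query.ToList();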
And third, some people write "ToList().Where()" and complain about performance. Filter as much as possible on the DB.
All three reasons are why people think ToList is slow - but all they show is a lack of understanding of how LINQ and SQL operate.
Entity Framework does not handle bulk update operations by default -- hence your existing code. If you really want to do these bulk operations, then you have two options:
Write the SQL yourself and use the ExecuteSqlCommand() method to execute it (see the sketch below); or
Look at 3rd party extensions, such as https://entityframework-extensions.net/
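For the first option, a minimal sketch for the method in the question (the table and column names are assumptions about your schema; ExecuteSqlRawAsync is the EF Core 3+ name, older versions call it ExecuteSqlCommandAsync):
public async Task UpdateCatalogOnTenantApproval(int tenantID)
{
    // One UPDATE statement runs on the server; no entities are loaded or tracked.
    await Context.Database.ExecuteSqlRawAsync(
        "UPDATE Catalog SET IsApprovedByAdmin = 1, IsActive = 1 WHERE IdTenant = {0}",
        tenantID);
}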
We can reduce the query cost by selecting only a subset of the data before attaching it for EF to track, and then updating.
However, this may be pointless micro-optimization that does not perform significantly better unless you are processing a massive amount of records.
// select the PK for EF to track, plus the 2 fields to be modified
var catalog = await catalogQuery.Where(x => x.IdTenant == tenantID)
    .Select(x => new Catalog
    {
        CatalogId = x.CatalogId,
        IsApprovedByAdmin = x.IsApprovedByAdmin,
        IsActive = x.IsActive
    }).ToListAsync();
// next we attach the range so EF tracks the list
Context.AttachRange(catalog);
// perform the update as usual; changed properties are flagged as modified
catalog.ForEach(c => { c.IsApprovedByAdmin = true; c.IsActive = true; });
// save and let EF update based on the modified fields
await Context.SaveChangesAsync();
Let me explain what you have done and what you are trying to do.
You are partially right about the performance issues related to ToList and ToListAsync: they are what actually loads the entities into memory and makes EF track them.
Based on that, if your request is expected to deal only with light data, you are not required to enhance your code. If it is not, however, there are many open approaches; each one has its pros and cons, and you have to balance between them for each case where you do not want the double app-to-SQL round trip.
Let's be more concrete about your case:
1- We assume that your method is resource-consuming (loading a high volume of data, being called intensively, or both).
2- The modification is static: every row is updated with c.IsApprovedByAdmin = true; c.IsActive = true;.
From (1) and (2), I suggest writing a stored procedure, or using ExecuteSqlCommand (as Bryan Lewis suggested), to do this for you. The caveat is that (3) stored procedures, triggers, and SQL-based operations in general are hard to maintain and a likely source of hidden exceptions. In your case, however, you are less likely to fall into that, as your code is very basic, and you can reduce the risk further by constructing the query from dynamic elements such as nameof(YourClassName) for the table name, nameof(YourProperty), and the like.
Anyway, this is an example showing that there is no ideal approach; you have to study each case on its own.
Finally, I do not agree with using the third-party extensions: most of the free ones are developed by non-professionals and tracking the exceptions they cause is a nightmare, while the paid versions are too expensive and still not exception-free. The third-party extensions are better suited to complex bulk updates/deletes and/or huge data, e.g.:
await Context.UpdateAsync(e => new Catalog
{
    Archived = e.LastUpdate > DateTime.UtcNow.AddYears(-99) ? false : true
});
I had the following:
List<Message> unreadMessages = this.context.Messages
.Where( x =>
x.AncestorMessage.MessageID == ancestorMessageID &&
x.Read == false &&
x.SentTo.Id == userID ).ToList();
foreach(var unreadMessage in unreadMessages)
{
unreadMessage.Read = true;
}
this.context.SaveChanges();
But there must be a way of doing this without having to run 2 SQL queries: one for selecting the items, and one for updating them.
How do I do this?
Current idiomatic support in EF
As far as I know, there is no direct support for "bulk updates" yet in Entity Framework (there has been an ongoing discussion for bulk operation support for a while though, and it is likely it will be included at some point).
(Why) Do you want to do this?
It is clear that this is an operation that, in native SQL, can be achieved in a single statement, and provides some significant advantages over the approach followed in your question. Using the single SQL statement, only a very small amount of I/O is required between client and DB server, and the statement itself can be completely executed and optimized by the DB server. No need to transfer to and iterate through a potentially large result set client side, just to update one or two fields and send this back the other way.
How
So although not directly supported by EF, it is still possible to do this, using one of two approaches.
Option A. Handcode your SQL update statement
This is a very simple approach that does not require any other tools/packages and can be performed async as well:
var sql = "UPDATE SomeTable SET FieldA = @fieldA WHERE FieldB = @fieldB";
var parameters = new SqlParameter[] { ..., ... };
int result = db.Database.ExecuteSqlCommand(sql, parameters);
or
int result = await db.Database.ExecuteSqlCommandAsync(sql, parameters);
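Filled in for the question's case, it could look roughly like this (the table and column names are assumptions based on the model; [Read] is bracketed because READ is a reserved word in T-SQL):
var sql = "UPDATE Messages SET [Read] = 1 " +
          "WHERE [Read] = 0 AND SentToId = @userId AND AncestorMessageId = @ancestorId";
var parameters = new SqlParameter[]
{
    new SqlParameter("@userId", userID),
    new SqlParameter("@ancestorId", ancestorMessageID)
};
int rowsAffected = await db.Database.ExecuteSqlCommandAsync(sql, parameters);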
The obvious downside is, well, breaking the nice LINQy paradigm and having to hand-code your SQL (possibly for more than one target SQL dialect).
Option B. Use one of the EF extension/utility packages
For a while now, a number of open-source NuGet packages have been available that offer specific extensions to EF. Several of them provide a nice "LINQy" way to issue a single UPDATE SQL statement to the server. Two examples are:
Entity Framework Extended Library that allows performing a bulk update using a statement like:
context.Messages.Update(
x => x.Read == false && x.SentTo.Id == userID,
x => new Message { Read = true });
It is also available on github
EntityFramework.Utilities that allows performing a bulk update using a statement like:
EFBatchOperation
.For(context, context.Messages)
.Where(x => x.Read == false && x.SentTo.Id == userID)
.Update(x => x.Read, x => x.Read = true);
It is also available on github
And there are definitely other packages and libraries out there that provide similar support.
Even SQL has to do this in two steps in a sense, in that an UPDATE query with a WHERE clause first runs the equivalent of a SELECT behind the scenes, filtering via the WHERE clause, then applying the update. So really, I don't think you need to be worried about improving this.
Further, the reason why it's broken into two steps like this in LINQ is precisely for performance reasons. You want that "select" to be as minimal as possible, i.e. you don't want to load any more objects from the database into in-memory objects than you have to. Only then do you alter the objects (in the foreach).
If you really want to run a native UPDATE on the SQL side, you could use a System.Data.SqlClient.SqlCommand to issue the update, instead of having LINQ give you back objects that you then update. That will be faster, but then you conceptually move some of your logic out of your C# code object model space into the database model space (you are doing things in the database, not in your object space), even if the SqlCommand is being issued from your code.
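A sketch of that approach (the connection string and the table/column names are placeholders):
using System.Data.SqlClient;

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(
    "UPDATE Messages SET [Read] = 1 WHERE [Read] = 0 AND SentToId = @userId",
    connection))
{
    command.Parameters.AddWithValue("@userId", userID);
    connection.Open();
    int rowsAffected = command.ExecuteNonQuery();
}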
I'd like some expert advice on this. I've used compiled queries before, but for this particular case, I'm not sure whether it's appropriate.
It's a search form where the query changes and is dependent on what is being searched on.
static Func<DBContext, int, IQueryable<Foo>> Search = CompiledQuery.Compile(
    (DBContext db, int ID) =>
        db.Person
          .Where(w => w.LocationID == ID)
          .Select(s => new Foo
          {
              Name = s.PersonName,
              Age = s.Age,
              Location = s.LocationName,
              Kin = s.Kin
          }));
Now if someone fills in the search box, I want to extend the query by adding another Where statement:
var query = Search(context, 123);
query = query.Where(w => w.Name.Contains(searchString));
So my question is, is it returning all the results where LocationID == 123, then checking the results for a searchString match? Or is it actually extending the compiled query?
If it's the former (which I suspect it is), should I scrap the CompiledQuery and just create a method that extends the query and then returns it as a list?
Also, what are the best practices for CompiledQuery usage and is there a guideline of when they should be used?
Note: I'm using the above in an ASP.NET website with Linq to SQL. Not sure if that makes any difference.
Thanks
The problem is that the compiled query is set in stone; it knows exactly what SQL it will run against the database. The Where you append afterwards cannot modify the compiled query at run time. The bad news is that it will return all of the records matching the compiled part from the database, and then refine them further in memory.
If you want to compile the query then I would suggest writing two queries with different signatures.
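A sketch of that two-signature approach, reusing the types from the question (the projection is shortened for brevity):
// Compiled once: filter by location only.
static readonly Func<DBContext, int, IQueryable<Foo>> SearchByLocation =
    CompiledQuery.Compile((DBContext db, int id) =>
        db.Person
          .Where(w => w.LocationID == id)
          .Select(s => new Foo { Name = s.PersonName, Age = s.Age }));

// Compiled once: filter by location and name.
static readonly Func<DBContext, int, string, IQueryable<Foo>> SearchByLocationAndName =
    CompiledQuery.Compile((DBContext db, int id, string search) =>
        db.Person
          .Where(w => w.LocationID == id && w.PersonName.Contains(search))
          .Select(s => new Foo { Name = s.PersonName, Age = s.Age }));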
As far as I know, it is good practice to compile your query once; that is the whole point of a pre-compiled query (and that's why your pre-compiled query is static), as it saves the time needed to compile that query into SQL. If you extend that pre-compiled query, it gets compiled again, and you lose the gains.
Querying on the result (your query variable) is no longer LINQ to SQL.
Just include your additional condition in your compiled query.
db.Person.Where(w => w.LocationID == ID
    && (searchString == "" || w.Name.Contains(searchString)))
If I am right, then you need a dynamic where clause in LINQ. For that I would suggest going this way:
IEnumerable<Foo> list = null;
if (condition1)
{
    list = baseQuery; // whatever LINQ statement builds the initial list
}
if (condition2)
{
    list = from f in list where con1 == con && con2 == con select f;
}
if (condition3)
{
    list = from f in list where con1 == con && con2 == con select f;
}
I hope you get what I mean.
The following (cut down) code excerpt is a LINQ-to-Entities query that results in SQL (via ToTraceString) that is much slower than a hand-crafted query. Am I doing anything stupid, or is LINQ-to-Entities just bad at optimizing queries?
I have a ToList() at the end of the query as I need to execute it before using it to build an XML data structure (which was a whole other pain).
var result = (from mainEntity in entities.Main
              where (mainEntity.Date >= today) && (mainEntity.Date <= tomorrow) && (!mainEntity.IsEnabled)
              select new
              {
                  Id = mainEntity.Id,
                  Sub =
                      from subEntity in mainEntity.Sub
                      select new
                      {
                          Id = subEntity.Id,
                          FirstResults =
                              from firstResultEntity in subEntity.FirstResult
                              select new
                              {
                                  Value = firstResultEntity.Value,
                              },
                          SecondResults =
                              from secondResultEntity in subEntity.SecondResult
                              select new
                              {
                                  Value = secondResultEntity.Value,
                              },
                          SubSub =
                              from subSubEntity in entities.SubSub
                              where (subEntity.Id == subSubEntity.MainId) && (subEntity.Id == subSubEntity.SubId)
                              select new
                              {
                                  Name = (from name in entities.Name
                                          where subSubEntity.NameId == name.Id
                                          select name.Name).FirstOrDefault()
                              }
                      }
              }).ToList();
While working on this, I've also had some real problems with dates. When I just tried to include the returned dates in my data structure, I got internal error "1005".
Just as a general observation and not based on any practical experience with Linq-To-Entities (yet): having four nested subqueries inside a single query doesn't look like it's awfully efficient and speedy to begin with.
I think your very broad statement about the (lack of) quality of the SQL generated by Linq-to-Entities is not warranted - and you don't really back it up by much evidence, either.
Several well respected folks including Rico Mariani (MS Performance guru) and Julie Lerman (author of "Programming EF") have been showing in various tests that in general and overall, the Linq-to-SQL and Linq-to-Entities "engines" aren't really all that bad - they achieve overall at least 80-95% of the possible peak performance. Not every .NET app dev can achieve this :-)
Is there any way for you to rewrite that query or change the way you retrieve the bits and pieces that make up its contents?
Marc
Have you tried not materializing the result immediately by calling .ToList()? I'm not sure it will make a difference, but you might see improved performance if you iterate over the result instead of calling .ToList() ...
foreach( var r in result )
{
// build your XML
}
Also, you could try breaking up the one huge query into separate queries and then iterating over the results. Sending everything in one big gulp might be the issue.