Caching Linq Query Question

Caching Linq Query Question - c#

I am creating a forum package for a cms and looking at caching some of the queries to help with performance, but I'm not sure if caching the below will help/do what it should on the below (BTW: Cachehelper is a simple helper class that just adds and removes from cache)
// Set cache variables
IEnumerable<ForumTopic> maintopics;
if (!CacheHelper.Get(topicCacheKey, out maintopics))
{
// Now get topics
maintopics = from t in u.ForumTopics
where t.ParentNodeId == CurrentNode.Id
orderby t.ForumTopicLastPost descending
select t;
// Add to cache
CacheHelper.Add(maintopics, topicCacheKey);
}
//End Cache
// Pass to my pager helper
var pagedResults = new PaginatedList<ForumTopic>(maintopics, p ?? 0, Convert.ToInt32(Settings.ForumTopicsPerPage));
// Now bind
rptTopicList.DataSource = pagedResults;
rptTopicList.DataBind();
Doesn't linq only execute when its enumerated? So the above won't work will it? as its only enumerated when I pass it to the paging helper which .Take()'s a certain amount of records based on a querystring value 'p'

You need to enumerate your results, for example by calling the ToList() method.
maintopics = from t in u.ForumTopics
where t.ParentNodeId == CurrentNode.Id
orderby t.ForumTopicLastPost descending
select t;
// Add to cache
CacheHelper.Add(maintopics.ToList(), topicCacheKey);

My experience with Linq-to-Sql is that it's not super performant when you start getting into complex objects and/or joins.
The first step is to set up LoadOptions on the datacontext. This will force joins so that a complete record is recalled. This was a problem in a ticket tracking system I wrote. I was displaying a list of 10 tickets and saw about 70 queries come across the wire. I had ticket->substatus->status. Due to L2S's lazy initialization, that caused each foreign key for each object that I referenced in the grid to fire off a new query.
Here's a blog post (not mine) about this subject (MSDN was weak): http://oakleafblog.blogspot.com/2007/08/linq-to-sql-query-execution-with.html
The next option is to create precompiled Linq queries. I had to do this with large joins. Here's another blog post on the subject: http://aspguy.wordpress.com/2008/08/15/speed-up-linq-to-sql-with-compiled-linq-queries/
The next option is to convert things over to using stored procedures. This makes programming and deployment harder for sure, but for complex queries where you only need a subset of data, they will be orders of magnitude faster.
The reason I bring this up is because the way you're talking about caching things (why not use the built in Cache in ASP.NET?) is going to cause you lots of headaches in the long term. I'd recommend building your system and then running SQL traces to see where your database performance problems are, then build optimizations around that. You might find that your real issues aren't in the "top 10 topics" but in other, much simpler to fix areas.

Yes, you need to enumerate your results. Linq will not evaluate your query until you enumerate the results.
If you want a general caching strategy for Linq, here is a great tutorial:
http://petemontgomery.wordpress.com/2008/08/07/caching-the-results-of-linq-queries/
The end goal is the ability to automatically generate unique cache keys for any Linq query.

Related

I want a way to update a range of records based on a where clause in entity framework without using ToList() and foreach

I have a function in my asp.net core app which updates a bunch of records based on a certain criteria I write in a where clause ... I read that ToList() has bad performance , so is there a better and faster way than using tolist and foreach ???
This is my current way doing it , I would appreciate it if someone provides a more efficient way
public async Task UpdateCatalogOnTenantApproval(int tenantID)
{
var catalogQuery = GetQueryable();
var catalog = await catalogQuery.Where(x => x.IdTenant == tenantID).ToListAsync();
catalog.ForEach(c => { c.IsApprovedByAdmin = true; c.IsActive = true; });
Context.UpdateRange(catalog);
await Context.SaveChangesAsync(); ;
}

read that ToList() has bad performance ,
That is wrong. ToList has as good a performance as you will get - submit a bad query which is overly complex and which results in bad SQL that SQL Server will take ages to execute and it is slow.
Also, many people think "ToList" is slow (as in: in the profiler). You see, yo ustart with a db context, take a set of entities there, add some where clauses - all fast. Then ToList and it takes "long" (compared to the rest). Well, THAT is where the query is sent to the sql server ;) WHere (x=>whatever) takes "no time" because all it does is add some nodes to the expression tree, not executing the query. THAT is mostly what people mix up - delayed execution which exeutes only when asked for the results.
And third, some people like "ToList().Where() and complain about performance. Filter as much as possible no the DB.
All three reasons are why people think ToList is slow - but all it shows is a lack of understanding of how LINQ and SQL operate.

Entity Framework does not handle bulk update operations by default -- hence your existing code. If you really want to do these bulk operations, then you have two options:
Write the SQL yourself and use the ExecuteSqlCommand() method
to execute it; or
Look at 3rd party extensions, such as https://entityframework-extensions.net/

We can reduce query cost by selecting a subset of data before attaching for EF to track, and then updating.
However, it may be just pointless micro-optimization that does not perform significantly better unless you are processing massive amount of records.
// select pk for EF to track, and the 2 fields to be modified
var catalog = await catalogQuery.Where(x => x.IdTenant == tenantID)
.Select(x => new Catelog{x.CatelogId, x.IsApprovedByAdmin, x.IsActive }).ToListAsync();
//next we attach range here to let EF track the list
Context.AttachRange(catalog);
//perform your update as usual, this will be flagged as modified if changed
catalog.ForEach(c => { c.IsApprovedByAdmin = true; c.IsActive = true; });
//save and let EF update based on modified fields.
await Context.SaveChangesAsync();

Let me explain to you what you have done and what you are trying to do.
You are partially right about the performance issues related to ToList and ToListAsync as they are mainly responsible to upload entities to the memory and track them.
Based on that if your request is expected to deal intensively with light data you are not required to enhance your code. if it is not, however, there are many open approaches each one has its pros and cons and you have to treat and balance between them for each case you do not want to use the dual app-SQL requests.
let's be more realistic by talking about your case:
1- we assume that your method is a resource-consuming by (loading high volume of data, intensively called, or both)
2- I see the modification is too static by updating all of the rows by c.IsApprovedByAdmin = true; c.IsActive = true;
form (1) and (2) I suggest to write a stored procedure or ExexcuteSqlCammand (as Bryan Lewis suggested) that does this for you
because (3) the stored procedures, triggers, and all the SQL based operation are hard-maintainable and are highly potential for hidden exceptions. In your case, however, you less likely to fell into that as your code is too basic and you could reduce more the risk by construct your query from dynamic elements such as nameof(yourClassName that is the table name).YouProperty and the like ...
Anyway, this is an example to show that there is no ideal approach and you have study each case alone.
Finally, I do not agree with the 3d parties extensions as most of freely provided developed by unprofessionals and tracking exceptions caused by them are nightmares, and the paid versions are too expensive and not 0-exception extensions. The 3d party extension are more oriented to the complex bulk update/delete and/or huge data.
e.g.
await Context.UpdateAsync(e=> new Catalog
{ Archived = e.LastUpdate >
DateTime.UtcNow.AddYears(-99)? false : true
});

Most efficient collection for storing data from LINQ to Entities?

I have read several different sources over the years that indicate that when storing a collection of data, a List<T> is efficient when you want to insert objects, and an IEnumerable<T> is best for enumerating over a collection.
In LINQ-to-Entities, there is the AsEnumerable() function, that will return an IEnumerable<T>, but it will not resolve the SQL created by the LINQ statement until you start enumerating over the list.
What if I want to store objects from LINQ to Entities in a collection and then query on that collection later?
Using this strategy causes the SQL to be resolved by adding a WHERE clause and querying each record separately. I specifically don't want to do that because I'm trying to limit network chatter:
var myDataToLookup = context.MyData.AsEnumerable();
for(var myOtherDatum in myOtherDataList)
{
// gets singular record from database each time.
var myDatum = myDataToLookup.SingleOrDefault(w => w.key == myOtherDatum.key)
}
How do I resolve the SQL upfront so myDataToLookup actually contains the data in memory? I've tried ToArray:
var myDataToLookup = context.MyData.ToArray();
But I recently learned that it actually uses more memory than ToList does:
Is it better to call ToList() or ToArray() in LINQ queries?
Should I use a join instead?
var myCombinedData = from o in myOtherDataList
join d in myDataToLookup on
o.key equals d.key
select { myOtherData: o, myData: d};
Should I use ToDictionary and store my key as the key to the dictionary? Or am I worrying too much about this?

If you're using LINQ to Entities then you should not worry if ToArray is slower than ToList. There is almost no difference between them in terms of performance and LINQ to Entities itself will be a bottleneck anyway.
Regarding a dictionary. It is a structure optimized for reads by keys. There is an additional cost on adding new items though. So, if you will read by key a lot and add new items not that often then that's the way to go. But to be honest - you probably should not bother at all. If data size is not big enough, you won't see a difference.

Think of IEnumerable, ICollection and IList/IDictionary as a hierarchy each one inheriting from the previous one. Arrays add a level of restriction and complexity on top of Lists. Simply, IEnumerable gives you iteration only. ICollection adds counting and IList then gives richer functionality including find, add and remove elements by index or via lambda expressions. Dictionaries provide efficient access via a key. Arrays are much more static.
So, the answer then depends on your requirements. If it is appropriate to hold the data in memory and you need to frequently re-query it then I usually convert the Entity result to a List. This also loads the data.
If access via a set of keys is paramount then I use a Dictionary.
I cannot remember that last time I used an array except for infrequent and very specific purposes.
SO, not a direct answer, but as your question and the other replies indicate there isn't a single answer and the solution will be a compromise.

When I code and measure performance and data carried over the network, here is how I look at things based on your example above.
Let's say your result returns 100 records. Your code has now run a query on the server and performed 1 second of processing (I made the number up for sake of argument).
Then you need to cast it to a list which is going to be 1 more second of processing. Then you want to find all records that have a value of 1. The code will now Loop through the entire list to find the values with 1 and then return you the result. This is let's say another 1 second of processing and it finds 10 records.
Your network is going to carry over 10 records that took 3 seconds to process.
If you move your logic to your Data layer and make your query search right away for the records that you want, you can then save 2 seconds of performance and still only carry 10 records across the network. The bonus side is also that you can just use IEnumerable<T> as a result and not have to cast it a list. Thus eliminating the 1 second of casting to list and 1 second of iterating through the list.
I hope this helps answer your question.

How does linq actually execute the code to retrieve data from the data source?

I will start working on xamarin shortly and will be transferring a lot of code from android studio's java to c#.
In java I am using a custom classes which are given arguments conditions etc, convert them to SQL statements and then loads the results to the objects in the project's model
What I am unsure of is wether linq is a better option for filtering such data.
For example what would happen currently is somethng along these lines
List<Customer> customers = (new CustomerDAO()).get_all()
Or if I have a condition
List<Customer> customers = (new CustomerDAO()).get(new Condition(CustomerDAO.Code, equals, "code1")
Now let us assume I have transferred the classes to c# and I wish to do somethng similar to the second case.
So I will probably write something along the lines of:
var customers = from customer
in (new CustomerDAO()).get_all()
where customer.code.equals("code1")
select customer
I know that the query will only be executed when I actually try to access customers, but if I have multiple accesses to customers ( let us say that I use 4 foreach loops later on) will the get_all method be called 4 times? or are the results stored at the first execution?
Also is it more efficient (time wise because memory wise it is probably not) to just keep the get_all() method and use linq to filter the results? Or use my existing setup which in effect executes
Select * from Customers where code = 'code1'
And loads the results to an object?
Thanks in advance for any help you can provide
Edit: yes I do know there is sqlite.net which pretty much does what my daos do but probably better, and at some point I will probably convert all my objects to use it, I just need to know for the sake of knowing

if I have multiple accesses to customers ( let
us say that I use 4 foreach loops later on) will the get_all method be
called 4 times? or are the results stored at the first execution?
Each time you enumerate the enumerator (using foreach in your example), the query will re-execute, unless you store the materialized result somewhere. For example, if on the first query you'd do:
var customerSource = new CustomerDAO();
List<Customer> customerSource.Where(customer => customer.Code.Equals("code1")).ToList();
Then now you'll be working with an in-memory List<Customer> without executing the query over again.
On the contrary, if each time you'd do:
var filteredCustomers = customerSource.Where(customer => customer.Code.Equals("code1"))
foreach (var customer in filteredCustomers)
{
// Do stuff
}
Then for each enumeration you'll be exeucting the said query over again.
Also is it more efficient (time wise because memory wise it is
probably not) to just keep the get_all() method and use linq to filter
the results? Or use my existing setup which in effect executes
That really depends on your use-case. Lets imagine you were using LINQ to EF, and the customer table has a million rows, do you really want to be bringing all of them in-memory and only then filtering them out to use a subset of data? It would usually be better to full filtered query.

Linq slowness materializing complex queries

I have often found that if I have too many joins in a Linq query (whether using Entity Framework or NHibernate) and/or the shape of the resulting anonymous class is too complex, Linq takes a very long time to materialize the result set into objects.
This is a generic question, but here's a specific example using NHibernate:
var libraryBookIdsWithShelfAndBookTagQuery = (from shelf in session.Query<Shelf>()
join sbttref in session.Query<ShelfBookTagTypeCrossReference>() on
shelf.ShelfId equals sbttref.ShelfId
join bookTag in session.Query<BookTag>() on
sbttref.BookTagTypeId equals (byte)bookTag.BookTagType
join btbref in session.Query<BookTagBookCrossReference>() on
bookTag.BookTagId equals btbref.BookTagId
join book in session.Query<Book>() on
btbref.BookId equals book.BookId
join libraryBook in session.Query<LibraryBook>() on
book.BookId equals libraryBook.BookId
join library in session.Query<LibraryCredential>() on
libraryBook.LibraryCredentialId equals library.LibraryCredentialId
join lcsg in session
.Query<LibraryCredentialSalesforceGroupCrossReference>()
on library.LibraryCredentialId equals lcsg.LibraryCredentialId
join userGroup in session.Query<UserGroup>() on
lcsg.UserGroupOrganizationId equals userGroup.UserGroupOrganizationId
where
shelf.ShelfId == shelfId &&
userGroup.UserGroupId == userGroupId &&
!book.IsDeleted &&
book.IsDrm != null &&
book.BookFormatTypeId != null
select new
{
Book = book,
LibraryBook = libraryBook,
BookTag = bookTag
});
// add a couple of where clauses, then...
var result = libraryBookIdsWithShelfAndBookTagQuery.ToList();
I know it's not the query execution, because I put a sniffer on the database and I can see that the query is taking 0ms, yet the code is taking about a second to execute that query and bring back all of 11 records.
So yeah, this is an overly complex query, having 8 joins between 9 tables, and I could probably restructure it into several smaller queries. Or I could turn it into a stored procedure - but would that help?
What I'm trying to understand is, where is that red line crossed between a query that is performant and one that starts to struggle with materialization? What's going on under the hood? And would it help if this were a SP whose flat results I subsequently manipulate in memory into the right shape?
EDIT: in response to a request in the comments, here's the SQL emitted:
SELECT DISTINCT book4_.bookid AS BookId12_0_,
libraryboo5_.librarybookid AS LibraryB1_35_1_,
booktag2_.booktagid AS BookTagId15_2_,
book4_.title AS Title12_0_,
book4_.isbn AS ISBN12_0_,
book4_.publicationdate AS Publicat4_12_0_,
book4_.classificationtypeid AS Classifi5_12_0_,
book4_.synopsis AS Synopsis12_0_,
book4_.thumbnailurl AS Thumbnai7_12_0_,
book4_.retinathumbnailurl AS RetinaTh8_12_0_,
book4_.totalpages AS TotalPages12_0_,
book4_.lastpage AS LastPage12_0_,
book4_.lastpagelocation AS LastPag11_12_0_,
book4_.lexilerating AS LexileR12_12_0_,
book4_.lastpageposition AS LastPag13_12_0_,
book4_.hidden AS Hidden12_0_,
book4_.teacherhidden AS Teacher15_12_0_,
book4_.modifieddatetime AS Modifie16_12_0_,
book4_.isdeleted AS IsDeleted12_0_,
book4_.importedwithlexile AS Importe18_12_0_,
book4_.bookformattypeid AS BookFor19_12_0_,
book4_.isdrm AS IsDrm12_0_,
book4_.lightsailready AS LightSa21_12_0_,
libraryboo5_.bookid AS BookId35_1_,
libraryboo5_.libraryid AS LibraryId35_1_,
libraryboo5_.externalid AS ExternalId35_1_,
libraryboo5_.totalcopies AS TotalCop5_35_1_,
libraryboo5_.availablecopies AS Availabl6_35_1_,
libraryboo5_.statuschangedate AS StatusCh7_35_1_,
booktag2_.booktagtypeid AS BookTagT2_15_2_,
booktag2_.booktagvalue AS BookTagV3_15_2_
FROM shelf shelf0_,
shelfbooktagtypecrossreference shelfbookt1_,
booktag booktag2_,
booktagbookcrossreference booktagboo3_,
book book4_,
librarybook libraryboo5_,
library librarycre6_,
librarycredentialsalesforcegroupcrossreference librarycre7_,
usergroup usergroup8_
WHERE shelfbookt1_.shelfid = shelf0_.shelfid
AND booktag2_.booktagtypeid = shelfbookt1_.booktagtypeid
AND booktagboo3_.booktagid = booktag2_.booktagid
AND book4_.bookid = booktagboo3_.bookid
AND libraryboo5_.bookid = book4_.bookid
AND librarycre6_.libraryid = libraryboo5_.libraryid
AND librarycre7_.librarycredentialid = librarycre6_.libraryid
AND usergroup8_.usergrouporganizationid =
librarycre7_.usergrouporganizationid
AND shelf0_.shelfid = #p0
AND usergroup8_.usergroupid = #p1
AND NOT ( book4_.isdeleted = 1 )
AND ( book4_.isdrm IS NOT NULL )
AND ( book4_.bookformattypeid IS NOT NULL )
AND book4_.lightsailready = 1
EDIT 2: Here's the performance analysis from ANTS Performance Profiler:

It is often database "good" practice to place lots of joins or super common joins into views. ORMs don't let you ignore these facts nor do they supplement the decades of time spent fine tuning databases to do these kinds of things efficiently. Refactor those joins into a singular view or a couple views if that'd make more sense in the greater perspective of your application.
NHibernate should be optimizing the query down and reducing the data so that .Net only has to mess with the important parts. However, if those domain objects are just naturally large, that's still a lot of data. Also, if it's a really large result set in terms of rows returned, that's a lot of objects getting instantiated even if the DB is able to return the set quickly. Refactoring this query into a view that only returns the data you actually need would also reduce object instantiation overhead.
Another thought would be to not do a .ToList(). Return the enumerable and let your code lazily consume the data.

According to profiling information, the CreateQuery takes 45% of the total execution time. However as you mentioned the query took 0ms when you executed directly. But this alone is not enough to say there is a performance problem because,
You are running the query with the profiler which has significant impact on execution time.
When you use a profiler, it will affect every code is being profiled but not the sql execution time (because it happens in the SQL server), so you can see everything else is slower compared to SQL statement.
so ideal scenario is to measure how long it takes to execute entire code block, measure time for SQL query and calculate times, and if you do that you will probably end up with different values.
However, I'm not saying that the the NH Linq to SQL implementation is optimized for any query you come up with, but there are other ways in NHibernate to deal with those situations such as QueryOverAPI, CriteriaQueries, HQL and finally SQL.
Where is that red line crossed between a query that is performant and
one that starts to struggle with materialization. What's going on under the hood?
This one is pretty hard question and without having detail knowledge of NHibernate Linq to SQL provider it's hard to provide a accurate answer. You can always try different mechanisms provided and see which one is the best for given scenario.
And would it help if this were a SP whose flat results I subsequently
manipulate in memory into the right shape?
Yes, using a SP would help things to work pretty fast, but using SP would add more maintenance problems to your code base.

You have generic question, I'll tell you generic answer :)
If you query data for reading (not for update) try to use anonymous classes. The reason is - they are lighter to create, they have no navigatoin properties. And you select only data you need! It's very important rule. So, try to replace your select with smth like this:
select new
{
Book = new { book.Id, book.Name},
LibraryBook = new { libraryBook.Id, libraryBook.AnotherProperty},
BookTag = new { bookTag.Name}
}
Stored procedures are good, when query is complex and linq-provider generates not effective code, so, you can replace it with plain SQL or stored procedure. It's not offten case and, I think, it's not your situation
Run your sql-query. How many rows it returns? Is it the same value as result? Sometimes linq provider generates code, that select much more rows to select one entity. It happens, when entity has one to many relationship with another selecting entity. For example:
class Book
{
int Id {get;set;}
string Name {get;set;}
ICollection<Tag> Tags {get;set;}
}
class Tag
{
string Name {get;set;}
Book Book {get;set;}
}
...
dbContext.Books.Where(o => o.Id == 1).Select(o=>new {Book = o, Tags = o.Tags}).Single();
I Select only one book with Id = 1, but provider will generate code, that returns rows amount equals to Tags amount (entity framework does this).
Split complex query to set of simple and join in client side. Sometimes, you have complex query with many conditionals and resulting sql become terrible. So, you split you big query to more simple, get results of each and join/filter on client side.
At the end, I advice you to use anonymous class as result of select.

Don’t use Linq’s Join. Navigate!
in that post you can see:
As long as there are proper foreign key constraints in the database, the navigation properties will be created automatically. It is also possible to manually add them in the ORM designer. As with all LINQ to SQL usage I think that it is best to focus on getting the database right and have the code exactly reflect the database structure. With the relations properly specified as foreign keys the code can safely make assumptions about referential integrity between the tables.

I agree 100% with the sentiments expressed by everyone else (with regards to their being two parts to the optimisation here and the SQL execution being a big unknown, and likely cause of poor performance).
Another part of the solution that might help you get some speed is to pre-compile your LINQ statements. I remember this being a huge optimisation on a tiny project (high traffic) I worked on ages and ages ago... seems like it would contribute to the client side slowness you're seeing. Having said all that though I've not found a need to use them since... so heed everyone else's warnings first! :)
https://msdn.microsoft.com/en-us/library/vstudio/bb896297(v=vs.100).aspx

Using Linq to return the count of items from a query along with its resultset

I am using C# MVC4 with Linq.
I have used dependency injection for my project which resulted in me having a separate Model's project along with a separate Repository project (and one for testing ect). All this no problem.
I moved my queries out of the controllers (old style) and into the repository (new DI style), and injected them. It works fine.
I have a standard linq query (pick any example, they are basic enough), which returns a set of items from the database as normal. No problems here either.
My problem is, that I want to implement paging, and I taught it would be simple enough to. Here is my steps:
Take in the results of the linq query from the repository (injected into the controller) store it in a var. It looks something like:
var results = _someInjectedCode.GetListById(SomeId);
Before, I was able to do something simple like:
results.Count()
results.Skip(SomeNum).Take(SomeOtherNum)
But now that I want paging, I need to do my Skip Take something like this:
var results = from xyz in _someInjectedCode.GetListById(SomeId).SomeId).Skip(SomeNum).Take(SomeOtherNum)
select new[] {a,id, a.fName, a.lName .....}
The problem with this is that I no longer have access to the total count of items before the list was shortened to the Pre Skip...Take state unless I do two queries which means hitting the DB twice.
What is the best way to resolve this issue.

I just do it like this:
var result = (from n in mycollection
where n.someprop == "some value"
select n).ToList();
var count = result.Count;
There are probably other ways, but this is the simplest that I know of.

Thinking about it from a SQL point of view, I can't think of a way in a single normal query to retrieve both the total count and a subset of the data, so I don't think you will be able to do it in LINQ either.
To avoid creating two separate commands, only thing I can think of is a stored proc that returns two tables (one with just the count, the other with your subset of results). It would still execute two queries, but in a single connection. You'd lose your LINQ though. So if you want to keep your LINQ query, you might be stuck with making two separate calls.
The other way is to retrieve the entire unpaged resultset into memory, then run your Take and Skip against the array, but this is pretty wasteful and probably worse than two calls.

You can either add additional parameters to your repository interface/class which will provide paging parameters and return count alongside your result or modify your interfaces to return IQueryable and then apply count and then skip/take before query is compiled and sent for execution.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.