I'm an EF noob (as in I just started today, I've only used other ORMs), and I'm experiencing a baptism of fire.
I've been asked to improve the performance of this query created by another dev:
var questionnaires = await _myContext.Questionnaires
.Include("Sections")
.Include(q => q.QuestionnaireCommonFields)
.Include("Sections.Questions")
.Include("Sections.Questions.Answers")
.Include("Sections.Questions.Answers.AnswerMetadatas")
.Include("Sections.Questions.Answers.SubQuestions")
.Include("Sections.Questions.Answers.SubQuestions.Answers")
.Include("Sections.Questions.Answers.SubQuestions.Answers.AnswerMetadatas")
.Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions")
.Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers")
.Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.AnswerMetadatas")
.Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions")
.Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers")
.Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.AnswerMetadatas")
.Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions")
.Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers")
.Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.AnswerMetadatas")
.Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions")
.Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers")
.Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.AnswerMetadatas")
.Where(q => questionnaireIds.Contains(q.Id))
.ToListAsync().ConfigureAwait(false);
A quick web search tells me that Include() produces a rows × columns product, and performance suffers when you nest it multiple levels deep.
I've seen some helpful answers on SO, but their examples are much less complex than this, and I can't figure out the best approach for a rewrite of the above.
The repeated "Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers..." part looks suspicious to me, as if it could be fetched separately with another query, but I don't know how to build that up or whether such an approach would even improve performance.
Questions:
How do I rewrite this query to something more sensible to improve performance, while ensuring that the eventual result set is the same?
Given the last line: .Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.AnswerMetadatas")
Why do I need all the intermediate lines? (I guess it's because some of the joins may not be left joins?)
EF Version info: package id="EntityFramework" version="6.2.0" targetFramework="net452"
I realise this question is a bit rubbish, but I'm trying to resolve as fast as I can from a point of no knowledge.
Edit
After mulling over this for half a day and thanks to StuartLC's suggestions I came up with some options:
Poor - split the query so that it performs multiple round-trips to fetch the data. This is likely to provide a slightly slower experience for the user, but will stop the SQL timing out. (This is not much better than just increasing the EF command timeout).
Good - change the clustered indexing on child tables to be clustered by their parent's foreign key (assuming you don't have a lot of insert operations).
Good - change the code to only query the first few levels and lazy-load (separate db hit) anything below this, i.e. remove all but the top few Includes and mark the ICollections - Answers.SubQuestions, Answers.AnswerMetadatas, and Question.Answers - as virtual (see the sketch after this list). Presumably the downside to making these virtual is that if any other existing code in the app expects those ICollection properties to be eager-loaded, you may have to update that code to load them explicitly. I will be investigating this option further. Further edit - unfortunately this won't work if you need to serialize the response, due to the self-referencing loop.
Non-trivial - Write a sql stored proc/view manually and build a new EF object pointed at it.
Longer term
The obvious, best, but most time-consuming option - rewrite the app design, so it doesn't need the whole data tree in a single api call, or go with the option below:
Rewrite the app to store the data in a NoSQL fashion (e.g. store the object tree as json so there are no joins). As Stuart mentioned this is not a good option if you need to filter the data in other ways (via something other than the questionnaireId), which you might need to do. Another alternative is to partially store NoSQL-style and partially relational as required.
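A minimal sketch of the lazy-loading option above, assuming simplified entity shapes (the real classes will have more members, and SubQuestions is assumed to be Question-like, giving the recursive tree). Marking the collections virtual lets EF6 create proxies that load each collection on first access; lazy loading and proxy creation are on by default in EF6:
public class Question
{
    public int Id { get; set; }
    // virtual => the EF6 proxy loads these on first access, each as its own db hit
    public virtual ICollection<Answer> Answers { get; set; }
}
public class Answer
{
    public int Id { get; set; }
    public virtual ICollection<Question> SubQuestions { get; set; }
    public virtual ICollection<AnswerMetadata> AnswerMetadatas { get; set; }
}
// The query then keeps only the shallow includes; deeper levels load lazily:
var questionnaires = await _myContext.Questionnaires
    .Include(q => q.QuestionnaireCommonFields)
    .Include(q => q.Sections.Select(s => s.Questions))
    .Where(q => questionnaireIds.Contains(q.Id))
    .ToListAsync()
    .ConfigureAwait(false);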
First up, it must be said that this isn't a trivial query. Seemingly we have:
6 levels of recursion through a nested question-answer tree
A total of 20 tables are joined in this way via eager loaded .Include
I would first take the time to determine where this query is used in your app, and how often it is needed, with particular attention to where it is used most frequently.
YAGNI optimizations
The obvious place to start is to see where the query is used in your app. If you don't need the whole tree all the time, then don't join in the nested question and answer tables for the usages that don't need them.
Also, it is possible to compose on IQueryable dynamically, so if there are multiple use cases for your query (e.g. from a "Summary" screen which doesn't need the question + answers, and a details tree which does need them), then you can do something like:
var questionnaireQuery = _myContext.Questionnaires
.Include(q => q.Sections)
.Include(q => q.QuestionnaireCommonFields);
// Conditionally extend the joins
if (mustIncludeQandA)
{
questionnaireQuery = questionnaireQuery
        .Include(q => q.Sections.Select(s => s.Questions.Select(qn => qn.Answers))); // ..... etc (inner lambda renamed so it doesn't shadow the outer q)
}
// Execute + materialize the query
var questionnaires = await questionnaireQuery
.Where(q => questionnaireIds.Contains(q.Id))
.ToListAsync()
.ConfigureAwait(false);
SQL Optimizations
If you really have to fetch the whole tree all the time, then look at your SQL table design and indexing.
1) Filters
.Where(q => questionnaireIds.Contains(q.Id))
(I'm assuming SQL Server terminology here, but the concepts are applicable in most other RDBMs as well.)
I'm guessing Questionnaires.Id is a clustered primary key, so it will be indexed, but just check for sanity (it will look something like PK_Questionnaires CLUSTERED UNIQUE PRIMARY KEY in SSMS)
2) Ensure all child tables have indexes on their foreign keys back to the parent.
e.g. q => q.Sections means that table Sections has a foreign key back to Questionnaires.Id - make sure this has at least a non-clustered index on it - EF Code First should do this automagically, but again, check to be sure.
This would look like IX_QuestionnaireId NONCLUSTERED on column Sections(QuestionnaireId)
3) Consider changing the clustered indexing on child tables to be clustered by their parent's foreign key, e.g. cluster Questions by its SectionId. This will keep all child rows related to the same parent together and reduce the number of pages of data that SQL needs to fetch. It isn't trivial to achieve in EF Code First, but your DBA can assist you in doing this, perhaps as a custom migration step.
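For reference, a hedged sketch of what that custom step could look like in an EF6 code-first migration. The constraint and index names here are assumptions (check yours in SSMS first), and any foreign keys referencing Questions.Id would have to be dropped and re-created around the PK swap:
using System.Data.Entity.Migrations;

public partial class ClusterQuestionsBySectionId : DbMigration
{
    public override void Up()
    {
        // Re-create the PK as nonclustered, then cluster the table on the parent FK
        // so child rows for the same Section are stored together.
        Sql(@"ALTER TABLE dbo.Questions DROP CONSTRAINT [PK_dbo.Questions];
              ALTER TABLE dbo.Questions ADD CONSTRAINT [PK_dbo.Questions] PRIMARY KEY NONCLUSTERED (Id);
              CREATE CLUSTERED INDEX [IX_Questions_SectionId] ON dbo.Questions (SectionId, Id);");
    }

    public override void Down()
    {
        Sql(@"DROP INDEX [IX_Questions_SectionId] ON dbo.Questions;
              ALTER TABLE dbo.Questions DROP CONSTRAINT [PK_dbo.Questions];
              ALTER TABLE dbo.Questions ADD CONSTRAINT [PK_dbo.Questions] PRIMARY KEY CLUSTERED (Id);");
    }
}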
Other comments
If this query is only used to query data, not to update or delete, then adding .AsNoTracking() will marginally reduce the memory consumption and in-memory performance of EF.
Unrelated to performance, but you've mixed the weakly typed (.Include("Sections")) and strongly typed (.Include(q => q.QuestionnaireCommonFields)) include statements. I would suggest moving to the strongly typed includes for the additional compile-time safety.
Note that you only need to specify the include path for the longest chain(s) which are eager loaded - this forces EF to include all the higher levels on that path too. Each leaf branch still needs its own include, though: the deepest path covers every Sections/Questions/Answers/SubQuestions level, but the AnswerMetadatas collection at each intermediate depth is a separate branch. So the 20 .Include statements reduce to roughly 7 (one per AnswerMetadatas depth, plus QuestionnaireCommonFields). The pattern looks like this, and does the same job more efficiently:
.Include(q => q.QuestionnaireCommonFields)
.Include(q => q.Sections.Select(s => s.Questions.Select(qn => qn.Answers))) // .... etc
You'll need .Select any time there is a 1:Many relationship, but if the navigation is 1:1 (or N:1) then you don't need the .Select, e.g. City c => c.Country
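Putting the last two points together, a sketch of the reduced query (assuming SubQuestions expose their own Answers collection, as the original include strings imply; only the first two AnswerMetadatas depths are written out, the deeper branches repeat the same SubQuestions/Answers Select pattern):
var questionnaires = await _myContext.Questionnaires
    .AsNoTracking() // read-only query: skip change-tracker overhead
    .Include(q => q.QuestionnaireCommonFields)
    // depth-1 AnswerMetadatas branch
    .Include(q => q.Sections.Select(s => s.Questions.Select(qn =>
        qn.Answers.Select(a => a.AnswerMetadatas))))
    // depth-2 branch: one SubQuestions/Answers hop, then AnswerMetadatas
    .Include(q => q.Sections.Select(s => s.Questions.Select(qn =>
        qn.Answers.Select(a => a.SubQuestions.Select(sq =>
            sq.Answers.Select(a2 => a2.AnswerMetadatas))))))
    // ...repeat, adding one SubQuestions/Answers hop per depth, down to the deepest branch
    .Where(q => questionnaireIds.Contains(q.Id))
    .ToListAsync()
    .ConfigureAwait(false);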
Redesign
Last but not least, if data is only ever filtered from the top level (i.e. Questionnaires), and if the whole questionnaire 'tree' (Aggregate Root) is typically added or updated all at once, then you might approach the data modelling of the question and answer tree in a NoSQL way, e.g. by modelling the whole tree as XML or JSON and treating it as a single long string. This avoids all the nasty joins altogether, but you would need a custom deserialization step in your data tier. This latter approach won't be very useful if you need to filter on nodes within the tree (i.e. a query like "find me all questionnaires where the SubAnswer to Question 5 is 'Foo'" won't be a good fit).
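A minimal sketch of that redesign, assuming Json.NET for the custom (de)serialization step. QuestionnaireDocument and SectionTree are illustrative names, with SectionTree standing in for the POCO tree of sections, questions and answers:
using Newtonsoft.Json;

public class QuestionnaireDocument
{
    public int Id { get; set; }
    public string TreeJson { get; set; } // the whole tree, persisted as one string column
}

public class QuestionnaireStore
{
    private readonly MyContext _ctx;
    public QuestionnaireStore(MyContext ctx) { _ctx = ctx; }

    public SectionTree LoadTree(int questionnaireId)
    {
        var doc = _ctx.QuestionnaireDocuments.Find(questionnaireId);
        return JsonConvert.DeserializeObject<SectionTree>(doc.TreeJson); // no joins at all
    }

    public void SaveTree(int questionnaireId, SectionTree tree)
    {
        var doc = _ctx.QuestionnaireDocuments.Find(questionnaireId);
        doc.TreeJson = JsonConvert.SerializeObject(tree);
        _ctx.SaveChanges();
    }
}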
Related
I am using EF Core 7. It looks like, since EF Core 5, there is now Single vs Split Query execution.
I see that the default configuration still uses the Single Query execution though.
I noticed in my logs it was saying:
'Microsoft.EntityFrameworkCore.Query.MultipleCollectionIncludeWarning':
Compiling a query which loads related collections for more than one
collection navigation, either via 'Include' or through projection, but
no 'QuerySplittingBehavior' has been configured. By default, Entity
Framework will use 'QuerySplittingBehavior.SingleQuery', which can
potentially result in slow query performance.
Then I configured the warning on the DbContext to get more details:
services.AddDbContextPool<TheBestDbContext>(
options => options.UseSqlServer(configuration.GetConnectionString("TheBestDbConnection"))
.ConfigureWarnings(warnings => warnings.Throw(RelationalEventId.MultipleCollectionIncludeWarning))
);
Then I was able to specifically see which call was actually causing that warning.
var user = await _userManager.Users
.Include(x => x.UserRoles)
.ThenInclude(x => x.ApplicationRole)
.ThenInclude(x => x.RoleClaims)
.SingleOrDefaultAsync(u => u.Id == userId);
So basically the same code would be:
var user = await _userManager.Users
.Include(x => x.UserRoles)
.ThenInclude(x => x.ApplicationRole)
.ThenInclude(x => x.RoleClaims)
.AsSplitQuery() // <===
.SingleOrDefaultAsync(u => u.Id == userId);
with the split query option.
I went through the documentation, but I'm still not sure how to create a pattern out of it.
I would like to set the most common one as a default value across the project, and only use the other for specific scenarios.
Based on the documentation, I have a feeling that "Split" should be used as the default in general, but with caution. I also noticed that their documentation specific to pagination says:
When using split queries with Skip/Take, pay special attention to making your query ordering fully unique; not doing so could cause incorrect data to be returned. For example, if results are ordered only by date, but there can be multiple results with the same date, then each one of the split queries could each get different results from the database. Ordering by both date and ID (or any other unique property or combination of properties) makes the ordering fully unique and avoids this problem. Note that relational databases do not apply any ordering by default, even on the primary key.
which completely makes sense as the query will be split.
But if we are mainly fetching from database for a single record, regardless how big or small the include list with its navigation properties, should I always go with "Split" approach?
I would love to hear if there are any best practices on that and when to use which approach.
But if we are mainly fetching from database for a single record, regardless how big or small the include list with its navigation properties, should I always go with "Split" approach?
It depends - let's examine your example with the single query approach:
var user = await _userManager.Users // 1 record via SingleOrDefault, but the server is asked for TOP(2)
.Include(x => x.UserRoles) // R roles
.ThenInclude(x => x.ApplicationRole) // 1 record
.ThenInclude(x => x.RoleClaims) // C claims
.SingleOrDefaultAsync(u => u.Id == userId);
As a result, RecordCount = 1 * R * 1 * C records will be returned to the client. They are then deduplicated and placed into the appropriate collections.
If RecordCount is reasonably small, a single query can be the best approach.
Also, EF Core adds an ORDER BY to such a query, which may slow down execution, so it is better to examine the execution plan.
Side note: it is better to use FirstOrDefault/Async - it CAN be a lot faster than SingleOrDefault/Async when SQL Server fails to detect early that there is no second record in the recordset.
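The difference shows up in the generated SQL on SQL Server: SingleOrDefaultAsync asks for two rows so it can detect a duplicate, while FirstOrDefaultAsync asks for one and can stop at the first match:
// SingleOrDefaultAsync translates to SELECT TOP(2) ... - the server keeps looking for a second row
var single = await _userManager.Users.SingleOrDefaultAsync(u => u.Id == userId);

// FirstOrDefaultAsync translates to SELECT TOP(1) ... - the server stops at the first match
var first = await _userManager.Users.FirstOrDefaultAsync(u => u.Id == userId);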
The documentation at https://learn.microsoft.com/en-us/ef/core/querying/single-split-queries outlines the considerations where split queries could have unintended consequences, particularly around isolation and ordering. As mentioned, when loading a single record with related details, single query execution is generally preferred. The warning appears because you have a one-to-many which contains a one-to-many, so EF is warning that this can potentially lead to a much larger cartesian product in a JOIN-based query. If you are confident that the query is reasonable in size, you can specify .AsSingleQuery() explicitly and the warning should disappear.
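To make one behavior the project-wide default and override it per query - which is what the question asks for - EF Core 5+ exposes UseQuerySplittingBehavior on the provider options. A sketch reusing the registration from the question:
services.AddDbContextPool<TheBestDbContext>(options =>
    options.UseSqlServer(
        configuration.GetConnectionString("TheBestDbConnection"),
        sql => sql.UseQuerySplittingBehavior(QuerySplittingBehavior.SplitQuery)));

// ...then opt back in to a single query where the result set is known to be small:
var user = await _userManager.Users
    .Include(x => x.UserRoles)
    .ThenInclude(x => x.ApplicationRole)
    .ThenInclude(x => x.RoleClaims)
    .AsSingleQuery() // overrides the SplitQuery default for this query only
    .SingleOrDefaultAsync(u => u.Id == userId);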
When working with object graphs like this, you can consider designing operations against the data state to be as atomic as possible. If you are editing a User that has Roles & Claims, rather than loading everything for a User and attempting to edit the entire graph in memory in one go, you might structure the application to perform actions like "AddRoleToUser", "RemoveRoleFromUser", "AddClaimToUserRole", etc. So instead of loading User w/ Roles w/ Claims, these actions just load the Roles for a user, or the Claims for a UserRole respectively, to alter this data.
After searching around to figure out whether there is any pattern for this, and with all the great content provided in the other answers, I was still not sure about "when to use split queries" and "when not to", so I have tried to summarize my understanding below.
I will use the same example that Microsoft shows on Single vs Split Queries
var blogs = ctx.Blogs
.Include(b => b.Posts)
.Include(b => b.Contributors)
.ToList();
and here is the generated SQL for that:
SELECT [b].[Id], [b].[Name], [p].[Id], [p].[BlogId], [p].[Title], [c].[Id], [c].[BlogId], [c].[FirstName], [c].[LastName]
FROM [Blogs] AS [b]
LEFT JOIN [Posts] AS [p] ON [b].[Id] = [p].[BlogId]
LEFT JOIN [Contributors] AS [c] ON [b].[Id] = [c].[BlogId]
ORDER BY [b].[Id], [p].[Id]
Microsoft says:
In this example, since both Posts and Contributors are collection
navigations of Blog - they're at the same level - relational databases
return a cross product: each row from Posts is joined with each row
from Contributors. This means that if a given blog has 10 posts and 10
contributors, the database returns 100 rows for that single blog. This
phenomenon - sometimes called cartesian explosion - can cause huge
amounts of data to unintentionally get transferred to the client,
especially as more sibling JOINs are added to the query; this can be a
major performance issue in database applications.
However, what it doesn't clearly mention is that, apart from the sorting/ordering issues, split queries can easily hurt performance themselves.
The first concern is that we are going to be hitting the database multiple times in that case.
Let's check this one:
using (var context = new BloggingContext())
{
var blogs = context.Blogs
.Include(blog => blog.Posts)
.AsSplitQuery()
.ToList();
}
And check out the generated SQL when .AsSplitQuery() is used.
SELECT [b].[BlogId], [b].[OwnerId], [b].[Rating], [b].[Url]
FROM [Blogs] AS [b]
ORDER BY [b].[BlogId]
SELECT [p].[PostId], [p].[AuthorId], [p].[BlogId], [p].[Content], [p].[Rating], [p].[Title], [b].[BlogId]
FROM [Blogs] AS [b]
INNER JOIN [Posts] AS [p] ON [b].[BlogId] = [p].[BlogId]
ORDER BY [b].[BlogId]
The query above kind of surprised me. It is interesting that, with the split option, EF still joins in the second query, even though that query should only be pulling data from the Posts table. Pretty sure the EF Core folks had some idea behind that, but it just doesn't make sense to me. Then what is the point of having that foreign key over there?
Looks like Microsoft was mainly focused on a solution to avoid the cartesian explosion problem, but obviously that doesn't mean "split queries" should be treated as the best-practice default going forward. Definitely not!
And another possible problem I can think of is data inconsistency: since the queries are run separately, you can't guarantee data consistency (unless everything is completely locked).
I just don't want to throw the feature away, of course. There are still some "good" scenarios for split queries imo (unless you are really worried about data consistency), e.g. if we are returning lots of columns across a relation and the size is pretty large, splitting can be a real performance win. Or when there isn't much parent data but there are tons of sibling navigation collections - there is your cartesian explosion.
PS: Note that cartesian explosion does not occur when the two JOINs aren't at the same level.
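A small illustration of that PS, reusing the Blogs example from above (the Comments navigation is an assumption here, purely for the nested case):
// Sibling collections at the same level cross-multiply in a single query:
// rows per blog ≈ posts × contributors - the cartesian explosion
var siblings = ctx.Blogs
    .Include(b => b.Posts)
    .Include(b => b.Contributors)
    .ToList();

// Nested collections sit on one path: rows grow along the chain
// (one row per comment), but nothing is cross-multiplied
var nested = ctx.Blogs
    .Include(b => b.Posts)
    .ThenInclude(p => p.Comments) // assumed navigation
    .ToList();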
Last but not least, personally, if I am really going to be pulling a heavy amount of data with a bunch of relations of relations, I would still prefer those "good old" stored procedures. They never get old!
I am currently using EF Extensions. One thing I don't understand: it's supposed to help with performance, yet placing a million+ records into a List variable is a memory issue in itself.
So, if I want to update a million records without holding everything in memory, how can this be done efficiently?
Should we use a for loop and update in batches of, say, 10,000? Does EF Extensions' BulkUpdate have any native functionality to support this?
Example:
var productUpdate = _dbContext.Set<Product>()
    .Where(x => x.ProductType == "Electronics"); // this creates an IQueryable
await productUpdate.ForEachAsync(c => c.ProductBrand = "ABC Company");
await _dbContext.BulkUpdateAsync(productUpdate.ToList());
Resource:
https://entityframework-extensions.net/bulk-update
This is actually something that EF is not made for. EF's database interactions start from the record object and flow from there. EF cannot generate a partial UPDATE (i.e. one that doesn't overwrite everything) if the entity wasn't change-tracked (and therefore loaded), and similarly it cannot DELETE records based on a condition instead of a key.
There is no EF equivalent (without loading all of those records) for conditional update/delete logic such as
UPDATE People
SET FirstName = 'Bob'
WHERE FirstName = 'Robert'
or
DELETE FROM People
WHERE FirstName = 'Robert'
Doing this using the EF approach will require you to load all of these entities just to send them back (with an update or delete) to the database, and that's a waste of bandwidth and performance as you've already found.
The best solution I've found here is to bypass EF's LINQ-friendly methods and instead execute the raw SQL yourself. This can still be done using an EF context.
using (var ctx = new MyContext())
{
string updateCommand = "UPDATE People SET FirstName = 'Bob' WHERE FirstName = 'Robert'";
int noOfRowsUpdated = ctx.Database.ExecuteSqlCommand(updateCommand);
string deleteCommand = "DELETE FROM People WHERE FirstName = 'Robert'";
int noOfRowsDeleted = ctx.Database.ExecuteSqlCommand(deleteCommand);
}
More information here. Of course don't forget to protect against SQL injection where relevant.
The specific syntax to run raw SQL may vary per version of EF/EF Core but as far as I'm aware all versions allow you to execute raw SQL.
I can't comment on the performance of EF Extensions or BulkUpdate specifically, and I'm not going to buy it from them.
Based on their documentation, they don't seem to have the methods with the right signatures to allow for conditional update/delete logic.
BulkUpdate doesn't seem to allow you to input the logical condition (the WHERE in your UPDATE command) that would allow you to optimize this.
BulkDelete still has a BatchSize setting, which suggests that they are still handling the records one at a time (well, per batch I guess), and not using a single DELETE query with a condition (WHERE clause).
Based on your intended code in the question, EF Extensions isn't really giving you what you need. It's more performant and cheaper to simply execute raw SQL on the database, as this bypasses EF's need to load its entities.
Update
I might stand corrected: there is some support for conditional update logic, as seen here. However, it is unclear to me why the example still loads everything into memory, and what the purpose of that conditional WHERE logic is if you've already loaded it all into memory (why not use in-memory LINQ then?).
However, even if this works without loading the entities, it's still:
more limited (only equality checks are allowed, compared to SQL allowing any boolean condition that is valid SQL),
relatively complex (I don't like their syntax, maybe that's subjective)
and more costly (still a paid library)
compared to rolling your own raw SQL query. I would still suggest rolling your own raw SQL here, but that's just my opinion.
I found the "proper" EF Extensions way to do a bulk update with a query-like condition:
var productUpdate = _dbContext.Set<Product>()
    .Where(x => x.ProductType == "Electronics")
    .UpdateFromQuery(x => new Product { ProductBrand = "ABC Company" });
This should result in a proper SQL UPDATE ... SET ... WHERE, without the need to load entities first, as per the documentation:
Why UpdateFromQuery is faster than SaveChanges, BulkSaveChanges, and BulkUpdate?
UpdateFromQuery executes a statement directly in SQL such as UPDATE [TableName] SET [SetColumnsAndValues] WHERE [Key].
Other operations normally require one or multiple database round-trips which makes the performance slower.
You can check the working syntax on this dotnet fiddle example, adapted from their example of BulkUpdate.
Other considerations
No mention of batch operations for this, unfortunately.
Before doing a big update like this, it might be worth deactivating the indexes you have on that column and rebuilding them afterward; this is especially useful if you have many of them (see the sketch below).
Be careful with the condition in the Where: if EF can't translate it to SQL, it will be evaluated client-side, meaning you're back to the "usual" terrible round trip of load, change in memory, update.
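For the index point above, a hedged sketch using raw SQL through the EF context (index and table names are assumptions; SQL Server syntax):
using (var ctx = new MyContext())
{
    // Disable the index on the column being mass-updated
    ctx.Database.ExecuteSqlCommand("ALTER INDEX IX_Products_ProductBrand ON dbo.Products DISABLE");

    ctx.Set<Product>()
        .Where(x => x.ProductType == "Electronics")
        .UpdateFromQuery(x => new Product { ProductBrand = "ABC Company" });

    // Rebuild (and thereby re-enable) the index after the bulk update
    ctx.Database.ExecuteSqlCommand("ALTER INDEX IX_Products_ProductBrand ON dbo.Products REBUILD");
}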
I'm using EF 6 and .NET Framework 4.6.1. I have a scenario where I need to exclude a parent record if all of its child records meet a certain condition.
This is a generic version of what I've done so far:
public IQueryable<ParentRecord> GetParentRecordsExceptWhereSpecificStringOnAllChildren(string aSpecificString)
{
    return ParentRecords
        .Where(parent => !parent.ChildRecords
            .Select(child => child.SomeStringProperty)
            .All(c => c.Equals(aSpecificString)));
}
This takes a little too long to run (on the order of one second per child record), and the SQL generated by EF contains n-1 UNION ALL statements, where n is the number of child records.
I suspect I'm missing an obvious way to write this that would improve performance dramatically, but I'm not seeing it (but I'm not a LINQ/EF master by any means).
I wrote a stored procedure that returns the same data, but much faster, and not exactly in the same layout (one flat row versus a row for each child record). We're trying to avoid stored procedures, though, so I'm back to the grindstone on figuring out how to make this LINQ faster.
Any suggestions would be greatly appreciated. If I haven't explained this clearly, please let me know. I tried to make it generic for the sake of re-use, in case anyone else is in this situation.
You can remove the select of SomeStringProperty from your code.
Using Any
ParentRecords.Where(parent =>
parent.ChildRecords.Any(child => !child.SomeStringProperty.Equals(aSpecificString)));
Using All
ParentRecords.Where(parent =>
!parent.ChildRecords.All(child => child.SomeStringProperty.Equals(aSpecificString)));
I have the following:
var objectives = _objectivesRepository
.GetAll()
.Where(o => o.ExamId == examId || examId == 0)
.Include(o => o.ObjectiveDetails)
.ToList();
In a previous post one of the users said that it was important to put the where before the include in a LINQ query.
Can someone let me know if this is correct? Does order matter? How about if there are many where and includes ?
In Entity Framework yes it does matter, but only in certain scenarios. When using groupings or projections, it will fail to include the requested data.
See this blog post on the subject.
The actual answer is that usually the order does not matter significantly. Following your example statement, I would describe the logical translation steps to a relational query:
Get all objects, with all their properties (in relational algebra these are considered attributes)
Restrict the retrieved rows based on your condition (the relational algebra selection operation)
Restrict the eagerly loaded attributes of the retrieved rows (the relational algebra projection operation)
In your specific query, steps 2 and 3 are interchangeable without altering the final outcome. As stated here, this is the default case. Nevertheless, even if the final outcome would not change, the performance could be significantly affected. This is the reason modern databases have query optimizers which create an execution plan to optimize each specific query.
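A concrete sketch of that interchangeability, using the query from the question - both forms should produce the same SQL, as long as no projection or grouping changes the query shape in between:
// Where first, then Include
var a = _objectivesRepository.GetAll()
    .Where(o => o.ExamId == examId || examId == 0)
    .Include(o => o.ObjectiveDetails)
    .ToList();

// Include first, then Where - same result, and typically identical SQL
var b = _objectivesRepository.GetAll()
    .Include(o => o.ObjectiveDetails)
    .Where(o => o.ExamId == examId || examId == 0)
    .ToList();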
Nevertheless, this is not always the case, so I suppose you could always find a case where the above does not apply. Regarding performance, no assumptions are safe - you should always measure. You can use the SQL Server Profiler to see the translation of your LINQ-to-Entities query into the final SQL query, and then use the SQL Server tools (like the query analyzer) to see the execution plan of that final query.
Hope I helped!
I am creating a forum package for a CMS and looking at caching some of the queries to help with performance, but I'm not sure if caching the below will help/do what it should (BTW: CacheHelper is a simple helper class that just adds and removes from the cache).
// Set cache variables
IEnumerable<ForumTopic> maintopics;
if (!CacheHelper.Get(topicCacheKey, out maintopics))
{
// Now get topics
maintopics = from t in u.ForumTopics
where t.ParentNodeId == CurrentNode.Id
orderby t.ForumTopicLastPost descending
select t;
// Add to cache
CacheHelper.Add(maintopics, topicCacheKey);
}
//End Cache
// Pass to my pager helper
var pagedResults = new PaginatedList<ForumTopic>(maintopics, p ?? 0, Convert.ToInt32(Settings.ForumTopicsPerPage));
// Now bind
rptTopicList.DataSource = pagedResults;
rptTopicList.DataBind();
Doesn't LINQ only execute when it's enumerated? So the above won't work, will it? It's only enumerated when I pass it to the paging helper, which .Take()'s a certain number of records based on a querystring value 'p'.
You need to enumerate your results, for example by calling the ToList() method.
maintopics = from t in u.ForumTopics
where t.ParentNodeId == CurrentNode.Id
orderby t.ForumTopicLastPost descending
select t;
// Add to cache
CacheHelper.Add(maintopics.ToList(), topicCacheKey);
My experience with Linq-to-Sql is that it's not super performant when you start getting into complex objects and/or joins.
The first step is to set up LoadOptions (a DataLoadOptions instance) on the DataContext. This forces joins so that a complete record is retrieved in one go. This was a problem in a ticket tracking system I wrote: I was displaying a list of 10 tickets and saw about 70 queries come across the wire. I had ticket -> substatus -> status, and due to L2S's lazy initialization, each foreign key on each object referenced in the grid fired off a new query.
Here's a blog post (not mine) about this subject (MSDN was weak): http://oakleafblog.blogspot.com/2007/08/linq-to-sql-query-execution-with.html
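A sketch of the LoadOptions setup for that ticket -> substatus -> status chain (the DataContext and entity names are assumptions):
using System.Data.Linq;

var db = new TicketsDataContext();

var options = new DataLoadOptions();
options.LoadWith<Ticket>(t => t.SubStatus);   // join substatus in with each ticket
options.LoadWith<SubStatus>(s => s.Status);   // and status in with each substatus
db.LoadOptions = options;

// One joined query instead of a new query per foreign key per row
var tickets = db.Tickets.Take(10).ToList();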
The next option is to create precompiled Linq queries. I had to do this with large joins. Here's another blog post on the subject: http://aspguy.wordpress.com/2008/08/15/speed-up-linq-to-sql-with-compiled-linq-queries/
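And a sketch of a precompiled query in LINQ to SQL (again with assumed names); the expression tree is translated to SQL once and reused on every call:
using System.Data.Linq;

static readonly Func<TicketsDataContext, int, IQueryable<Ticket>> TicketsByStatus =
    CompiledQuery.Compile((TicketsDataContext db, int statusId) =>
        db.Tickets.Where(t => t.StatusId == statusId));

// Usage: pays the LINQ-to-SQL translation cost only on the first call
var open = TicketsByStatus(db, openStatusId).ToList();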
The next option is to convert things over to using stored procedures. This makes programming and deployment harder for sure, but for complex queries where you only need a subset of data, they will be orders of magnitude faster.
The reason I bring this up is because the way you're talking about caching things (why not use the built in Cache in ASP.NET?) is going to cause you lots of headaches in the long term. I'd recommend building your system and then running SQL traces to see where your database performance problems are, then build optimizations around that. You might find that your real issues aren't in the "top 10 topics" but in other, much simpler to fix areas.
Yes, you need to enumerate your results. Linq will not evaluate your query until you enumerate the results.
If you want a general caching strategy for Linq, here is a great tutorial:
http://petemontgomery.wordpress.com/2008/08/07/caching-the-results-of-linq-queries/
The end goal is the ability to automatically generate unique cache keys for any Linq query.
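A minimal sketch of that idea using System.Runtime.Caching; note the tutorial's version also evaluates local variables captured in the query, which this crude expression-text key does not:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.Caching;

public static class QueryCacheExtensions
{
    // Cache the materialized results of a query, keyed on its expression text
    public static List<T> FromCacheOrDb<T>(this IQueryable<T> query, TimeSpan ttl)
    {
        string key = query.Expression.ToString(); // crude cache key; see the tutorial for a robust one
        var cached = MemoryCache.Default.Get(key) as List<T>;
        if (cached == null)
        {
            cached = query.ToList(); // force execution now, not at bind time
            MemoryCache.Default.Set(key, cached, DateTimeOffset.Now.Add(ttl));
        }
        return cached;
    }
}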