How to query based on a previous object result? Entity Framework - C#

I am extremely stuck with getting the right information from the DB. Basically, the problem is that I need to add a Where clause to my statement so that it only retrieves the information I actually need.
public async Task<IEnumerable<Post>> GetAllPosts(int userId, int pageNumber)
{
var followersIds = _dataContext.Followees.Where(f => f.CaUserId == userId).AsQueryable();
pageNumber *= 15;
var posts = await _dataContext.Posts
.Include(p => p.CaUser)
.Include(p => p.CaUser.Photos)
.Include(c => c.Comments)
.Where(u => u.CaUserId == followersIds.Id) // <=== ERROR
.Include(l => l.LikeDet).ToListAsync();
return posts.OrderByDescending(p => p.Created).Take(pageNumber);
}
As you can see, followersIds contains all the required IDs that I need to check against in the posts query. I have tried a foreach, but nothing seems to work here. Can somebody help me with this issue?

The short version is that you can change the error line you marked above to something like .Where(u => followersIds.Contains(u.CaUserId)), which returns all entities whose CaUserId is contained in followersIds. For that to work, followersIds has to be a collection of IDs rather than Followee entities, so project the ID out of the Followees query with a Select first. However, this still has the potential to return a much larger dataset than you actually need, and to be quite a large query. (You also might need to check the syntax a bit; I'm shooting from memory without an IDE open.) You are including a lot of linked entities in that query, so maybe you'd be better off using a Select query instead of all those Includes, which would load only the properties that you need from each entity.
Take a look at this article from Jon Smith, who wrote the book "Entity Framework Core In Action", where he talks about using Select queries and DTOs to get out only what you need. Chances are, you don't need every property of every entity you are asking for in the query you have above. (Maybe you do, what do I know :p) Using this might help you get something much more efficient for just the dataset you need: more lines of code in the query, but potentially better performance on the back end and a lighter memory footprint.
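A minimal sketch of that idea, run against in-memory lists so it is self-contained. The FolloweeId property, the Content property and the PostSummary DTO are assumptions made for illustration (the question does not show the entity definitions); with EF the same method chain would be written against _dataContext.Followees and _dataContext.Posts and finished with ToListAsync. Note that the ordering and Take are applied before the query is materialized, so only one page of rows is loaded.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-ins for the DbSets in the question (sample data is made up).
var followees = new List<Followee>
{
    new(CaUserId: 1, FolloweeId: 10),
    new(CaUserId: 1, FolloweeId: 11),
    new(CaUserId: 2, FolloweeId: 12),
};
var posts = new List<Post>
{
    new(Id: 100, CaUserId: 10, Content: "first", Created: new DateTime(2020, 1, 1)),
    new(Id: 101, CaUserId: 12, Content: "second", Created: new DateTime(2020, 1, 2)),
    new(Id: 102, CaUserId: 11, Content: "third", Created: new DateTime(2020, 1, 3)),
};

var feed = FeedQueries.GetFeed(followees.AsQueryable(), posts.AsQueryable(), userId: 1, pageSize: 15);
foreach (var item in feed)
{
    Console.WriteLine($"{item.Id}: {item.Content}");
}

// Hypothetical entity shapes; FolloweeId and Content are assumed names.
public record Followee(int CaUserId, int FolloweeId);
public record Post(int Id, int CaUserId, string Content, DateTime Created);

// DTO carrying only the properties the caller needs.
public record PostSummary(int Id, string Content, DateTime Created);

public static class FeedQueries
{
    public static List<PostSummary> GetFeed(
        IQueryable<Followee> followees, IQueryable<Post> posts, int userId, int pageSize)
    {
        // Project the followed users' IDs instead of keeping whole Followee entities.
        var followedIds = followees
            .Where(f => f.CaUserId == userId)
            .Select(f => f.FolloweeId);

        return posts
            .Where(p => followedIds.Contains(p.CaUserId)) // keep posts by followed users only
            .OrderByDescending(p => p.Created)            // sort before paging
            .Take(pageSize)                               // page inside the query, not afterwards
            .Select(p => new PostSummary(p.Id, p.Content, p.Created))
            .ToList();
    }
}
```

Whether a DTO projection is worth the extra code depends on how many of the included properties the page really uses.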

Related

How to correctly cache query results when using EF Core with a lot of Include()?

I am using the https://github.com/VahidN/EFCoreSecondLevelCacheInterceptor package to cache EF Core query results in my ASP.NET Core app. According to its creator, it works like this:
The results of EF commands will be stored in the cache, so that the
same EF commands will retrieve their data from the cache rather than
executing them against the database again.
So this library returns cached results if the generated SQL is the same.
The problem is that I am using a lot of Include() methods while querying database. This is needed due to some pages showing a lot of information that is stored in different related database tables.
Also, the query contains current-user-specific info, e.g. if the current user liked the Post or not. Caching user-specific info is not right as I heard and results in redundant cache entries since the only data that is changing is the current user ID.
Example of a query:
var post = dbContext.Posts
.Where(p => p.Id == 1)
.Include(p => p.Comments)
.Include(p => p.VoteTracker
.Where(t => t.UserId == "{current user ID}"))
// ... other Include(...) calls
.ToList();
The SQL generated can be quite huge, with a lot of JOINs. So the problems with caching here are:
If Comments get changed, then the whole query above will be invalidated as well, since this library
watches for all of the CRUD operations using its interceptor and
then invalidates the related cache entries automatically
Having current-user-specific info in the query creates a lot of identical cache entries, which are only varying by that current-user-specific info
What is the better approach for caching here?
The first thing that comes to mind is to have separate queries that can be cached. For example:
// Post and Comments query will be cached
var post = dbContext.Posts.SingleOrDefault(p => p.Id == 1);
var comments = dbContext.Comments
.Where(c => c.PostId == 1)
.ToList();
// This one will be excluded from caching (because of user-specific info)
var voteTracker = dbContext.VoteTrackers
.SingleOrDefault(t => t.PostId == 1 && t.UserId == "{current user ID}");
Is this a good approach? And if it's not, which one is better?
I am struggling with this a lot, but having a hard time finding the right solution. Thank you very much in advance! :)

C# Entity Framework Pagination Children Includes

I am looking for the fastest way to do the following operation. I have a screen that displays the "Parts" that are defined inside of a "Lot". Each part has stations, each station has tools, and each tool can have measurements.
My problem is that I cannot get the pagination to work. The incoming offset is 0 and the number of records to take is 20; however, the following operation is not working:
Lot foundLot = EntitiesContext.Lots.Where(x => x.ID == lotID)
.IncludeFilter(lot => lot.Parts.OrderBy(n => n.PartID).Skip(offset).Take(numberOfRecords).ToList())
.IncludeOptimized(lot => lot.Parts.Select(parts => parts.Stations))
.IncludeOptimized(lot => lot.Parts.Select(parts => parts.Stations.Select(station => station.Tools)))
.IncludeOptimized(lot => lot.Parts.Select(parts => parts.Stations.Select(station =>
station.Tools.Select(tools => tools.Measurements))))
.FirstOrDefault();
So I am trying to filter to only grab certain parts, and then, for those filtered parts, I want to grab all of the children's data associated with them. I have checked all of the existing Stack Overflow questions related to this, and the changes I make either result in the Z.EntityFrameworkPlus package throwing a generic exception that provides no details (which is what the above code does), or, if I use the regular Entity Framework functions, an exception for an invalid path.
Thank you for your assistance.
Okay, after reading the docs for Z.EntityFrameworkPlus, I now see that you cannot use IncludeFilter and IncludeOptimized in the same LINQ statement:
https://entityframework-plus.net/query-include-optimized
I switched IncludeFilter to IncludeOptimized and it works.

Most efficient way to get a single value with EF?

I have been using EF6 for a while and I'm at a point where I need to optimize to the maximum every single query I'm performing to the DB.
There is a point where I simply need to get a string based on a Guid, which is not precisely a complex query but I wanted to know what would be best practice and why:
a) Find/FindAsync
string senderName = Context.Senders.Find(senderId).Name;
b) Where, Select and FirstOrDefault/FirstOrDefaultAsync
string senderName = Context.Senders.Where(x => x.Id == senderId)
.Select(x => x.Name)
.FirstOrDefault();
I can't profile the SQL being executed right now. Query a) seems "simpler", but query b) seems to use deferred execution (IQueryable), which could be more interesting, especially combined with async execution.
Am I right? What would be the best choice and why?
Using Find is much faster. The query it issues is very simple:
select * from t where id=[ID]
It is much cleaner, there won't be any of the DB checks etc. that happen in EF6, and on top of that EF won't need to parse the LINQ Where and Select statements.
Also, for those people who hate EF: I have built an ORM library that works like EF and the old ADO.NET, with migrations, code-to-DB and so on, all of which are of course optional. It is 100% faster, with tests to prove it; please check out EntityWorker.Core.
As pointed out in the comments: a) loads the whole entity into memory while b) loads only the name. If you need the name only, then b is the much better choice.

Linq performance when diffing two lists using inner Contains

EDIT 01: I seem to have found a solution (see my answer) that works for me: going from an hour to mere seconds by pre-computing and then applying the .Except() extension method. I'm leaving this open in case anyone else encounters this problem or finds a better solution.
ORIGINAL QUESTION
I have the following set of queries for different kinds of objects that I'm staging from a source system, so that I can keep them in sync and produce a delta stamp myself, as the source system doesn't provide one and we can neither build nor touch it.
I load all the data into memory and then, for example, run this query, which looks for objects that no longer exist in the source system but are still present in the staging database, and thus have to be marked "deleted". The bottleneck is the first part of the LINQ query, the .Contains(). How can I improve its performance? Maybe with .Except() and a custom comparer?
Or would it be best to put them in a hashing list and then perform the comparison?
The catch is that I need the staged objects afterwards to do some property transforms on them. This seemed the simplest solution, but unfortunately it's very slow on 20k objects:
stagedSystemObjects.Where(stagedSystemObject =>
!sourceSystemObjects.Select(sourceSystemObject => sourceSystemObject.Code)
.Contains(stagedSystemObject.Code)
)
.Select(x =>
{
x.ActiveStatus = ActiveStatuses.Disabled;
x.ChangeReason = ChangeReasons.Edited;
return x;
})
.ToList();
Based on Yves Schelpe's answer, I made a few tweaks to make it faster.
The basic idea is to drop the first two ToList calls and use PLINQ. See if this helps:
var stagedSystemCodes = stagedSystemObjects.Select(x => x.Code);
var sourceSystemCodes = sourceSystemObjects.Select(x => x.Code);
var codesThatNoLongerExistInSourceSystem = stagedSystemCodes.Except(sourceSystemCodes).ToArray();
var y = stagedSystemObjects.AsParallel()
.Where(stagedSystemObject =>
codesThatNoLongerExistInSourceSystem.Contains(stagedSystemObject.Code))
.Select(x =>
{
x.ActiveStatus = ActiveStatuses.Disabled;
x.ChangeReason = ChangeReasons.Edited;
return x;
}).ToArray();
Note that PLINQ may only work well for computationally bound tasks on a multi-core CPU. It could make things worse in other scenarios.
I have found a solution for this problem, which brought it down to mere seconds instead of an hour for 200k objects.
It's done by pre-computing and then applying the .Except() extension method.
So no more "chaining" LINQ queries or doing the Select inside the .Contains() predicate. Instead, make it "simpler" by first projecting both collections to a list of strings, so that the inner calculation doesn't have to happen over and over again as it did in the original question's example code.
Here is my solution, which is satisfactory for now. However, I'm leaving this open in case anyone comes up with a refined or better solution!
var stagedSystemCodes = stagedSystemObjects.Select(x => x.Code).ToList();
var sourceSystemCodes = sourceSystemObjects.Select(x => x.Code).ToList();
var codesThatNoLongerExistInSourceSystem = stagedSystemCodes.Except(sourceSystemCodes).ToList();
return stagedSystemObjects
.Where(stagedSystemObject =>
codesThatNoLongerExistInSourceSystem.Contains(stagedSystemObject.Code))
.Select(x =>
{
x.ActiveStatus = ActiveStatuses.Disabled;
x.ChangeReason = ChangeReasons.Edited;
return x;
})
.ToList();
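On the "hashing list" idea raised in the question: a possible refinement, sketched below under assumptions, is to put the source codes into a HashSet<string> so that each lookup is a constant-time hash probe on average, rather than a scan of a list. The SystemObject class and the enum definitions here are hypothetical stand-ins, since the original types are not shown; only the property and enum member names come from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up sample data: "B" exists in staging but no longer in the source system.
var staged = new List<SystemObject> { new("A"), new("B"), new("C") };
var source = new List<SystemObject> { new("A"), new("C") };

var disabled = Staging.MarkMissingAsDisabled(staged, source);
Console.WriteLine(string.Join(", ", disabled.Select(o => o.Code))); // prints: B

// Hypothetical stand-ins for the types used in the question.
public enum ActiveStatuses { Active, Disabled }
public enum ChangeReasons { None, Edited }

public class SystemObject
{
    public SystemObject(string code) => Code = code;
    public string Code { get; }
    public ActiveStatuses ActiveStatus { get; set; } = ActiveStatuses.Active;
    public ChangeReasons ChangeReason { get; set; } = ChangeReasons.None;
}

public static class Staging
{
    public static List<SystemObject> MarkMissingAsDisabled(
        IEnumerable<SystemObject> staged, IEnumerable<SystemObject> source)
    {
        // Build the lookup once; HashSet.Contains does not rescan the source list.
        var sourceCodes = new HashSet<string>(source.Select(s => s.Code));

        // Keep the staged objects themselves, since they are transformed afterwards.
        var missing = staged.Where(s => !sourceCodes.Contains(s.Code)).ToList();

        // Mutate in a plain loop rather than inside Select, so the side effects are explicit.
        foreach (var obj in missing)
        {
            obj.ActiveStatus = ActiveStatuses.Disabled;
            obj.ChangeReason = ChangeReasons.Edited;
        }
        return missing;
    }
}
```

This avoids the intermediate list of codes that no longer exist, at the cost of holding one hash set of source codes in memory.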

Linq Take on Include?

I have a query with a lot of includes, and I'm wondering if I can do Takes on some of the includes.
For example, here's one of my queries, with the (illegal) Take illustrating what I want to do.
var primaryLocation = context.Locations
.Include("PhoneNumbers")
.Include("Invoices").Take(50)
.Include("Invoices.Items")
.Include("Schedules")
.Include("Staffs")
.SingleOrDefault(d => d.Id == locationId);
Currently the only way I can think to do it would be like so:
var primaryLocation = context.Locations
.Include("Invoices")
.Include("Etc")
.SingleOrDefault(d => d.Id == locationId);
primaryLocation.Invoices = primaryLocation.Invoices.Take(50).ToList();
I'd prefer not to do it that way, since it means pulling back the entire Invoice list from the database, which I don't need.
Is there a handy way to build the Take into my query?
It seems like you have two conflicting criteria for what you're doing. I'm guessing here, but you didn't leave us all that much to go on.
Since your primaryLocation.Invoices = primaryLocation.Invoices.Take(50).ToList(); statement only makes use of 1 of your includes, I'm assuming you're doing more things with your primaryLocation than what you've shown us. This leads me to believe that you want that primaryLocation to include all of the stuff. And then you seem not to want more than those 50, so that's not all of the stuff after all then... To me this is a contradiction. If you require all, you should include it all.
If you want your selection of 50 invoices specifically, you could get those separately in their own query. I use NHibernate myself, so I'm not sure of the syntax for futures in Entity Framework, but if you want to ask for multiple things with only one round-trip to the server, in NHibernate you can turn a series of queries into futures to allow this. I expect Entity Framework has something similar.
In short, what I'm suggesting is that if you want primaryLocation to include all of your data, then that's what you'll get, and if you're after more specific information with filters like Take, then you might want to query more specifically.
Use projection instead of blindly calling Include if you don't want everything:
var primaryLocation = context.Locations
.Select(location => new {
Id = location.Id,
Name = location.Name,
// ... other properties needed on the front end
RecentInvoices = location.Invoices
// really should sort if you're only taking 50
.OrderByDescending(invoice => invoice.CreatedAt)
.Take(50),
AllPhoneNumbers = location.PhoneNumbers,
})
.SingleOrDefault(location => location.Id == locationId);
You could use projection to get just the invoice information you need too, I just didn't want to over-complicate the example.
Using this method you get exactly the data you want without adding confusion. It also allows you to name your properties (such as RecentInvoices above) to add more meaning.
