How to retrieve entities with a subset of related entities?

I have two entities in a one-to-many relation:
Meter (1) -> (n) Reading
I believe my two entities are set up correctly to provide the relation, so assume that.
I wish to retrieve Meters with their related Readings, but because there may be many Readings per Meter, I wish to limit them, e.g. by Reading.Date. Another option could be to read at most X Readings per Meter.
How can I do that in EF Core?

What I think the other answer missed is that you are asking for a subset of the related entities, i.e. not the entire collection of related entities.
If you want to be selective about the related entities that are fetched, you cannot just rely on an Include statement (or implicit lazy loading), because these are set up to load all related entities.
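There is no selective Include in EF6 or in EF Core before 5.0 (EF Core 5.0 added filtered Include, which stores the filtered subset in the navigation property itself; see the caveat about that in the remarks). But you can make an inclusive Select: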
DateTime filterDate = DateTime.Now;
var myData = db.Meters
.Select(m => new
{
Meter = m,
Readings = m.Readings.Where(r => r.Date == filterDate)
})
.ToList();
Remarks
I used an anonymous type, but you can of course also use a concrete DTO class.
Where(r => r.Date == filterDate) can be improved (e.g. by comparing only the Date component, or by using a range); this is just a simple example. You can use whatever filter criteria you want here.
Notice that you do not need an Include statement for this. A Select (on a yet unenumerated IQueryable) does not need an explicit Include because the Select itself is already aware of what data you want to fetch.
I suggest not putting the subset of related entities in the meter.Readings nav prop. This is going to lead to confusion down the line as to whether this list is a subset or the full set, and EF may actually register this as a change when you call SaveChanges(). Nav props should not be used as storage space for collections with the same type but a different functional meaning.
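To cover the "at most X Readings per Meter" variant from the question, the same inclusive Select can be extended with ordering and Take, and the subset can go into a DTO instead of the nav prop. The sketch below uses hypothetical Meter, Reading and MeterWithRecentReadings classes and runs the projection with LINQ to Objects over AsQueryable(), so it works without a database; against db.Meters, recent EF Core versions should translate the same shape to SQL, but check the generated query for your version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity shapes mirroring Meter (1) -> (n) Reading.
public class Reading
{
    public int Id { get; set; }
    public int MeterId { get; set; }
    public DateTime Date { get; set; }
}

public class Meter
{
    public int Id { get; set; }
    public List<Reading> Readings { get; set; } = new List<Reading>();
}

// DTO holding the meter plus a *subset* of its readings, kept apart from Meter.Readings.
public class MeterWithRecentReadings
{
    public Meter Meter { get; set; }
    public List<Reading> Readings { get; set; }
}

public static class MeterQueries
{
    // With EF Core you would pass db.Meters (an IQueryable<Meter>).
    public static List<MeterWithRecentReadings> WithRecentReadings(
        IQueryable<Meter> meters, DateTime from, int maxPerMeter)
    {
        return meters
            .Select(m => new MeterWithRecentReadings
            {
                Meter = m,
                // Range filter, newest first, capped at maxPerMeter readings per meter.
                Readings = m.Readings
                    .Where(r => r.Date >= from)
                    .OrderByDescending(r => r.Date)
                    .Take(maxPerMeter)
                    .ToList()
            })
            .ToList();
    }
}
```

The full Meter.Readings list is left untouched; only the DTO carries the subset.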

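If your tables are designed correctly, i.e. the key in Meter is mapped to Reading (see foreign key constraints), then EF automatically gives you the related records when you access the navigation property on its POCO class (in EF Core this requires lazy loading to be enabled, e.g. with lazy-loading proxies; otherwise use Include or a projection).
Make sure the Reading table has a foreign key to the Meter table in the database.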

Related

EF Core Single vs. Split Queries

I am using EF Core 7. It looks like, since EF Core 5, there is now Single vs Split Query execution.
I see that the default configuration still uses the Single Query execution though.
I noticed in my logs it was saying:
'Microsoft.EntityFrameworkCore.Query.MultipleCollectionIncludeWarning': Compiling a query which loads related collections for more than one collection navigation, either via 'Include' or through projection, but no 'QuerySplittingBehavior' has been configured. By default, Entity Framework will use 'QuerySplittingBehavior.SingleQuery', which can potentially result in slow query performance.
Then I configured the DbContext to throw on that warning, to get more details:
services.AddDbContextPool<TheBestDbContext>(
options => options.UseSqlServer(configuration.GetConnectionString("TheBestDbConnection"))
.ConfigureWarnings(warnings => warnings.Throw(RelationalEventId.MultipleCollectionIncludeWarning))
);
Then I was able to specifically see which call was actually causing that warning.
var user = await _userManager.Users
.Include(x => x.UserRoles)
.ThenInclude(x => x.ApplicationRole)
.ThenInclude(x => x.RoleClaims)
.SingleOrDefaultAsync(u => u.Id == userId);
So basically, the same code with the split query option would look like this:
var user = await _userManager.Users
.Include(x => x.UserRoles)
.ThenInclude(x => x.ApplicationRole)
.ThenInclude(x => x.RoleClaims)
.AsSplitQuery() // <===
.SingleOrDefaultAsync(u => u.Id == userId);
I went through the documentation, but I'm still not sure how to turn it into a pattern.
I would like to set the most common option as the default across the project, and only use the other one in specific scenarios.
Based on the documentation, I have a feeling that "Split" should be used as the default in general, but with caution. I also noticed that the documentation specific to pagination says:
When using split queries with Skip/Take, pay special attention to making your query ordering fully unique; not doing so could cause incorrect data to be returned. For example, if results are ordered only by date, but there can be multiple results with the same date, then each one of the split queries could each get different results from the database. Ordering by both date and ID (or any other unique property or combination of properties) makes the ordering fully unique and avoids this problem. Note that relational databases do not apply any ordering by default, even on the primary key.
which makes complete sense, as the query will be split.
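The ordering advice in the quote amounts to adding a unique tie-breaker before Skip/Take. A sketch with a hypothetical DatedPost class, using AsQueryable() over a list in place of a real DbSet (in memory the order is deterministic anyway; the point is the query shape):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity for illustration only.
public class DatedPost
{
    public int Id { get; set; }
    public DateTime Date { get; set; }
}

public static class Paging
{
    // Orders by Date, then by the unique Id, so rows sharing a Date still have one
    // well-defined order and every query that repeats this ordering sees the same page.
    public static List<DatedPost> GetPage(IQueryable<DatedPost> posts, int pageIndex, int pageSize) =>
        posts
            .OrderBy(p => p.Date)
            .ThenBy(p => p.Id) // tie-breaker makes the ordering fully unique
            .Skip(pageIndex * pageSize)
            .Take(pageSize)
            .ToList();
}
```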
But if we are mainly fetching a single record from the database, regardless of how big or small the include list with its navigation properties is, should I always go with the "Split" approach?
I would love to hear if there are any best practices on that and when to use which approach.
"But if we are mainly fetching a single record from the database, regardless of how big or small the include list with its navigation properties is, should I always go with the "Split" approach?"
It depends. Let's examine your example with the single query approach:
var user = await _userManager.Users // 1 record expected (SingleOrDefault), but the SQL sent to the server asks for TOP(2)
.Include(x => x.UserRoles) // R roles
.ThenInclude(x => x.ApplicationRole) // 1 record
.ThenInclude(x => x.RoleClaims) // C claims
.SingleOrDefaultAsync(u => u.Id == userId);
As a result, RecordCount = 1 * R * 1 * C rows will be returned to the client. EF then deduplicates them and places them in the appropriate collections.
If RecordCount is reasonably small, a single query can be the best approach.
EF Core also adds an ORDER BY to such a query, which may slow down execution, so it is better to examine the execution plan.
Side note: it is better to use FirstOrDefault/FirstOrDefaultAsync; it CAN be a lot faster than SingleOrDefault/SingleOrDefaultAsync when SQL Server fails to detect early that there is no second matching record in the result set.
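The TOP(1) vs TOP(2) point has an in-memory analogue with LINQ to Objects: SingleOrDefault must keep reading after the first match to prove there is no second one, while FirstOrDefault can stop. This is only an analogy for what the database has to do; the hypothetical CountPulled helper just counts how many elements an operator consumes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SingleVsFirst
{
    // Runs `op` over `source` and returns how many elements the operator pulled.
    public static int CountPulled(IEnumerable<int> source, Func<IEnumerable<int>, int> op)
    {
        int pulled = 0;

        IEnumerable<int> Tracked()
        {
            foreach (var item in source)
            {
                pulled++; // one more element consumed by the operator
                yield return item;
            }
        }

        op(Tracked());
        return pulled;
    }
}
```

With ids = Enumerable.Range(1, 1000).ToList() and a match on the first element, FirstOrDefault(x => x == 1) pulls 1 element while SingleOrDefault(x => x == 1) pulls all 1000.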
The documentation at https://learn.microsoft.com/en-us/ef/core/querying/single-split-queries outlines the considerations for when split queries could have unintended consequences, particularly around isolation and ordering. As mentioned, when loading a single record with related details, single query execution is generally preferred. The warning appears because you have a one-to-many which contains another one-to-many, so EF warns that this can potentially lead to a much larger Cartesian product in a JOIN-based query. If you are confident that the query is reasonable in size, you can specify .AsSingleQuery() explicitly and the warning should disappear.
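On the question of a project-wide default: since EF Core 5.0 the splitting behavior can be set once on the provider options and overridden per query with AsSplitQuery()/AsSingleQuery(). The fragment below reuses the question's names (TheBestDbContext, configuration) plus a hypothetical dbContext with the Blogs/Posts/Contributors model from Microsoft's example; choosing SingleQuery as the default simply follows the advice above and could be flipped. Configuring either value explicitly should also stop the MultipleCollectionIncludeWarning, since that warning is about no behavior having been configured.

```csharp
using Microsoft.EntityFrameworkCore; // UseSqlServer, QuerySplittingBehavior, AsSplitQuery, ToListAsync

// Registration (fragment): set the default behavior for the whole context.
services.AddDbContextPool<TheBestDbContext>(options =>
    options.UseSqlServer(
        configuration.GetConnectionString("TheBestDbConnection"),
        sqlServer => sqlServer.UseQuerySplittingBehavior(QuerySplittingBehavior.SingleQuery)));

// Per-query override where sibling collections could explode the row count.
var blogs = await dbContext.Blogs
    .Include(b => b.Posts)
    .Include(b => b.Contributors)
    .AsSplitQuery()
    .ToListAsync();
```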
When working with object graphs like this, consider designing operations against the data state to be as atomic as possible. If you are editing a User that has Roles and Claims, rather than loading everything for the User and attempting to edit the entire graph in memory in one go, you might structure the application to perform actions like "AddRoleToUser", "RemoveRoleFromUser", "AddClaimToUserRole", etc. So instead of loading User with Roles with Claims, these actions just load the Roles for a user, or the Claims for a UserRole, respectively, to alter this data.
After searching through this to figure out whether there is any pattern to apply, and even with all the great content in the other answers, I was still not sure, as I was looking for "when to use split queries" and "when not to", so I tried to summarize my understanding below.
I will use the same example that Microsoft shows on Single vs Split Queries
var blogs = ctx.Blogs
.Include(b => b.Posts)
.Include(b => b.Contributors)
.ToList();
and here is the generated SQL for that:
SELECT [b].[Id], [b].[Name], [p].[Id], [p].[BlogId], [p].[Title], [c].[Id], [c].[BlogId], [c].[FirstName], [c].[LastName]
FROM [Blogs] AS [b]
LEFT JOIN [Posts] AS [p] ON [b].[Id] = [p].[BlogId]
LEFT JOIN [Contributors] AS [c] ON [b].[Id] = [c].[BlogId]
ORDER BY [b].[Id], [p].[Id]
Microsoft says:
In this example, since both Posts and Contributors are collection navigations of Blog - they're at the same level - relational databases return a cross product: each row from Posts is joined with each row from Contributors. This means that if a given blog has 10 posts and 10 contributors, the database returns 100 rows for that single blog. This phenomenon - sometimes called cartesian explosion - can cause huge amounts of data to unintentionally get transferred to the client, especially as more sibling JOINs are added to the query; this can be a major performance issue in database applications.
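The arithmetic in the quote can be reproduced in memory: a LEFT JOIN of two sibling collections yields posts x contributors rows for one blog, while split queries return the blog row plus one row per post and per contributor. A sketch that only counts rows (no database involved); the class and method names are hypothetical.

```csharp
using System.Linq;

public static class JoinRowCounts
{
    // Rows one blog contributes to a single query that LEFT JOINs two sibling
    // collections: every post row is paired with every contributor row.
    public static int SingleQueryRows(int posts, int contributors)
    {
        // DefaultIfEmpty mimics LEFT JOIN: an empty collection still yields one (null) row.
        var postRows = Enumerable.Range(1, posts).Select(p => (int?)p).DefaultIfEmpty();
        var contributorRows = Enumerable.Range(1, contributors).Select(c => (int?)c).DefaultIfEmpty();
        return (from p in postRows
                from c in contributorRows
                select (p, c)).Count();
    }

    // Rows returned by split queries for the same blog: the blog row itself,
    // then the posts query, then the contributors query.
    public static int SplitQueryRows(int posts, int contributors) => 1 + posts + contributors;
}
```

For 10 posts and 10 contributors this gives 100 rows for the single query versus 21 across the split queries, at the cost of three round trips instead of one.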
However, what it doesn't clearly mention is that, beyond the sorting/ordering issues, split queries can easily hurt query performance too.
The first concern is that we are going to hit the database multiple times in that case.
Let's check this one:
using (var context = new BloggingContext())
{
var blogs = context.Blogs
.Include(blog => blog.Posts)
.AsSplitQuery()
.ToList();
}
And check out the generated SQL when .AsSplitQuery() is used.
SELECT [b].[BlogId], [b].[OwnerId], [b].[Rating], [b].[Url]
FROM [Blogs] AS [b]
ORDER BY [b].[BlogId]
SELECT [p].[PostId], [p].[AuthorId], [p].[BlogId], [p].[Content], [p].[Rating], [p].[Title], [b].[BlogId]
FROM [Blogs] AS [b]
INNER JOIN [Posts] AS [p] ON [b].[BlogId] = [p].[BlogId]
ORDER BY [b].[BlogId]
So the above query surprised me a bit. It is interesting that when it uses the split option, it still joins Blogs in the second query, even though the second query should only be pulling data from the Posts table. I'm pretty sure the EF Core folks had some idea behind that, but it just doesn't make sense to me. Then what is the point of having that foreign key there?
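One plausible reason for that JOIN (my reading, not something stated in this answer): the second query must load posts for exactly the same set of blogs the first query returned. With no filter the join looks redundant, but once the root query has a Where (or Skip/Take), reading Posts by BlogId alone would also return posts of blogs that were filtered out. The sketch mimics this with LINQ to Objects; the BlogRow/PostRow records and the minRating filter are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows standing in for the Blogs and Posts tables.
public record BlogRow(int BlogId, int Rating);
public record PostRow(int PostId, int BlogId);

public static class SplitQuerySketch
{
    // Query 1: the root query, including its filter.
    public static List<BlogRow> LoadBlogs(IEnumerable<BlogRow> blogs, int minRating) =>
        blogs.Where(b => b.Rating >= minRating)
             .OrderBy(b => b.BlogId)
             .ToList();

    // Query 2: re-applies the root filter and joins to it, so it returns posts
    // only for the blogs that query 1 returned.
    public static List<PostRow> LoadPosts(IEnumerable<BlogRow> blogs, IEnumerable<PostRow> posts, int minRating) =>
        (from b in blogs
         where b.Rating >= minRating
         join p in posts on b.BlogId equals p.BlogId
         orderby b.BlogId
         select p).ToList();
}
```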
It looks like Microsoft was mainly focused on a solution to avoid the cartesian explosion problem, but that obviously doesn't mean that split queries should be used as the default best practice going forward. Definitely not!
Another possible problem I can think of is data inconsistency: since the queries are run separately, you can't guarantee data consistency (unless everything is completely locked).
I don't want to throw away the feature, of course. There are still some good scenarios for split queries, IMO (unless you are really worried about data consistency): for example, if we are returning lots of columns with a relation and the size is pretty large, this can be a real performance factor. Or the parent data is not much, but there are tons of navigation sets; then there is your cartesian explosion.
PS: Note that cartesian explosion does not occur when the two JOINs aren't at the same level.
Last but not least, personally, if I am really going to be pulling some heavy amount of data with bunch of relation of relation of relation, I would still prefer those "good old" Stored Procedures. It never gets old!

Improving performance of big EF multi-level Include

I'm an EF noob (as in I just started today, I've only used other ORMs), and I'm experiencing a baptism of fire.
I've been asked to improve the performance of this query created by another dev:
var questionnaires = await _myContext.Questionnaires
.Include("Sections")
.Include(q => q.QuestionnaireCommonFields)
.Include("Sections.Questions")
.Include("Sections.Questions.Answers")
.Include("Sections.Questions.Answers.AnswerMetadatas")
.Include("Sections.Questions.Answers.SubQuestions")
.Include("Sections.Questions.Answers.SubQuestions.Answers")
.Include("Sections.Questions.Answers.SubQuestions.Answers.AnswerMetadatas")
.Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions")
.Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers")
.Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.AnswerMetadatas")
.Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions")
.Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers")
.Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.AnswerMetadatas")
.Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions")
.Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers")
.Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.AnswerMetadatas")
.Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions")
.Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers")
.Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.AnswerMetadatas")
.Where(q => questionnaireIds.Contains(q.Id))
.ToListAsync().ConfigureAwait(false);
A quick web-surf tells me that Include() results in a cols * rows product and poor performance if you run multiple levels deep.
I've seen some helpful answers on SO, but they have limited less complex examples, and I can't figure out the best approach for a rewrite of the above.
The multiple repeat of the part -"Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers..." looks suspicious to me like it could be done separately and then another query issued, but I don't know how to build this up or whether such an approach would even improve performance.
Questions:
How do I rewrite this query to something more sensible to improve performance, while ensuring that the eventual result set is the same?
Given the last line: .Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.AnswerMetadatas")
Why do I need all the intermediate lines? (I guess it's because some of the joins may not be left joins?)
EF Version info: package id="EntityFramework" version="6.2.0" targetFramework="net452"
I realise this question is a bit rubbish, but I'm trying to resolve as fast as I can from a point of no knowledge.
Edit
After mulling over this for half a day and thanks to StuartLC's suggestions I came up with some options:
Poor - split the query so that it performs multiple round-trips to fetch the data. This is likely to provide a slightly slower experience for the user, but will stop the SQL timing out. (This is not much better than just increasing the EF command timeout).
Good - change the clustered indexing on child tables to be clustered by their parent's foreign key (assuming you don't have a lot of insert operations).
Good - change the code to only query the first few levels and lazy-load (separate db hit) anything below this, i.e. remove all but the top few Includes, then change the ICollections - Answers.SubQuestions, Answers.AnswerMetadatas, and Question.Answers to all be virtual. Presumably the downside to making these virtual is that if any (other) existing code in the app expects those ICollection properties to be eager-loaded, you may have to update that code (i.e. if you want/need them to load immediately within that code). I will be investigating this option further. Further edit - unfortunately this won't work if you need to serialize the response due to self-referencing loop.
Non-trivial - Write a sql stored proc/view manually and build a new EF object pointed at it.
Longer term
The obvious, best, but most time-consuming option - rewrite the app design, so it doesn't need the whole data tree in a single api call, or go with the option below:
Rewrite the app to store the data in a NoSQL fashion (e.g. store the object tree as json so there are no joins). As Stuart mentioned this is not a good option if you need to filter the data in other ways (via something other than the questionnaireId), which you might need to do. Another alternative is to partially store NoSQL-style and partially relational as required.
First up, it must be said that this isn't a trivial query. Seemingly we have:
6 levels of recursion through a nested question-answer tree
A total of 20 tables are joined in this way via eager loaded .Include
I would first take the time to determine where this query is used in your app, and how often it is needed, with particular attention to where it is used most frequently.
YAGNI optimizations
The obvious place to start is to see where the query is used in your app; if you don't need the whole tree all the time, then I suggest you don't join in the nested question and answer tables in the usages that don't need them.
Also, it is possible to compose on IQueryable dynamically, so if there are multiple use cases for your query (e.g. from a "Summary" screen which doesn't need the question + answers, and a details tree which does need them), then you can do something like:
var questionnaireQuery = _myContext.Questionnaires
.Include(q => q.Sections)
.Include(q => q.QuestionnaireCommonFields);
// Conditionally extend the joins
if (mustIncludeQandA)
{
questionnaireQuery = questionnaireQuery
.Include(q => q.Sections.Select(s => s.Questions.Select(qn => qn.Answers /* ... etc. */)));
}
// Execute + materialize the query
var questionnaires = await questionnaireQuery
.Where(q => questionnaireIds.Contains(q.Id))
.ToListAsync()
.ConfigureAwait(false);
SQL Optimizations
If you really have to fetch the whole tree all the time, then look at your SQL table design and indexing.
1) Filters
.Where(q => questionnaireIds.Contains(q.Id))
(I'm assuming SQL Server terminology here, but the concepts are applicable in most other RDBMSs as well.)
I'm guessing Questionnaires.Id is a clustered primary key, so it will be indexed, but just check for sanity (it will look something like PK_Questionnaires CLUSTERED UNIQUE PRIMARY KEY in SSMS).
2) Ensure all child tables have indexes on their foreign keys back to the parent.
e.g. q => q.Sections means that table Sections has a foreign key back to Questionnaires.Id; make sure this has at least a non-clustered index on it. EF Code First should do this automagically, but again, check to be sure.
This would look like IX_QuestionnaireId NONCLUSTERED on column Sections(QuestionnaireId)
3) Consider changing the clustered indexing on child tables to be clustered by their parent's foreign key, e.g. cluster Questions by Questions.SectionId. This will keep all child rows related to the same parent together, and reduce the number of pages of data that SQL needs to fetch. It isn't trivial to achieve in EF Code First, but your DBA can assist you in doing this, perhaps as a custom step.
Other comments
If this query is only used to read data, not to update or delete it, then adding .AsNoTracking() will reduce EF's memory consumption and marginally improve its in-memory performance.
Unrelated to performance, but you've mixed the weakly typed ("Sections") and strongly typed .Include statements (q => q.QuestionnaireCommonFields). I would suggest moving to the strongly typed includes for the additional compile time safety.
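Note that you only need to specify the include path for the longest chain(s) that are eager loaded; EF will include all the higher levels along each path too, so you don't need the intermediate Includes. In this query each level's AnswerMetadatas is a separate branch, so you still need one path per level (six) plus QuestionnaireCommonFields: 7 .Include statements instead of 20. This does the same job more concisely:
.Include(q => q.QuestionnaireCommonFields)
.Include(q => q.Sections.Select(s => s.Questions.Select(qn => qn.Answers.Select(a => a.AnswerMetadatas))))
.Include(q => q.Sections.Select(s => s.Questions.Select(qn => qn.Answers.Select(a => a.SubQuestions.Select(sq => sq.Answers.Select(sa => sa.AnswerMetadatas))))))
// ... and so on, one Include per level down to the deepest Answers.AnswerMetadatas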
You'll need .Select any time there is a 1:Many relationship, but if the navigation is 1:1 (or N:1) then you don't need the .Select, e.g. City c => c.Country
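Since EF6's string Include loads every navigation along a path, the 20 Includes in the question collapse to QuestionnaireCommonFields plus one path per level ending at AnswerMetadatas. The hypothetical helper below only generates those six strings (the level count and navigation names come from the question's Include list); it is a sketch, not something EF provides.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class QuestionnaireIncludePaths
{
    // Hypothetical helper: one EF6 string Include path per level of the
    // Answers -> SubQuestions -> Answers recursion, each ending at that level's
    // AnswerMetadatas. EF loads every navigation along a path, so these paths also
    // cover the intermediate Sections/Questions/Answers/SubQuestions Includes.
    public static List<string> AnswerMetadataPaths(int levels)
    {
        const string root = "Sections.Questions.Answers";
        return Enumerable.Range(0, levels)
            .Select(depth => root
                + string.Concat(Enumerable.Repeat(".SubQuestions.Answers", depth))
                + ".AnswerMetadatas")
            .ToList();
    }
}
```

Each generated path would then be applied with .Include(path) on _myContext.Questionnaires, alongside .Include(q => q.QuestionnaireCommonFields). It mainly removes the redundant intermediate paths; whether the generated SQL changes is worth checking in a profiler.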
Redesign
Last but not least, if data is only ever filtered from the top level (i.e. Questionnaires), and if the whole questionnaire 'tree' (Aggregate Root) is typically always added or updated all at once, then you might try to approach the data modelling of the question and answer tree in a NoSQL way, e.g. by simply modelling the whole tree as XML or JSON, and then treating the whole tree as a long string. This will avoid all the nasty joins altogether. You would need a custom deserialization step in your data tier. This latter approach won't be very useful if you need to filter on nodes in the tree (i.e. a query like "find me all questionnaires where the SubAnswer to Question 5 is 'Foo'" won't be a good fit).

Entity Framework 5 (Code First) Navigation Properties

Is it the correct behaviour of entity framework to load all items with the given foreign key for a navigation property before querying/filtering?
For example:
myUser.Apples.First(a => a.Id == 1 && !a.Expires.HasValue);
Will load all apples associated with that user. (The SQL query doesn't query the ID or Expires fields).
There are two other ways of doing it (which generate the correct SQL) but neither as clean as using the navigation properties:
myDbContext.Entry(myUser).Collection(u => u.Apples).Query().First(a => a.Id == 1 && !a.Expires.HasValue);
myDbContext.Apples.First(a => a.UserId == myUser.Id && a.Id == 1 && !a.Expires.HasValue);
Things I've Checked
Lazy load is enabled and is not disabled anywhere.
The navigation properties are virtual.
EDIT:
OK, based on your edit I think I had the wrong idea about what you were asking (which makes a lot more sense now). I'll leave the previous answer below, as I think it's useful as an explanation, but it is much less relevant to your specific question as it stands.
From what you've posted your user object is enabled for lazy loading. EF enables lazy loading by default, however there is one requirement to lazy loading which is to mark navigation properties as virtual (which you have done).
Lazy loading works by attaching to the get method on a navigation property and performing a SQL query at that point to retrieve the foreign entity. Navigation properties are also not queryable collections, which means that when you execute the get method, the query is executed immediately.
In your example above, the Apples collection on User is enumerated before you execute the .First call (which runs as plain old LINQ to Objects). This means that SQL will return all of the apples associated with the user, and they are filtered in memory on the querying machine (as you have observed). It also means you need two queries to pull down the apples you are interested in (one for the user and one for the nav property), which may not be efficient if all you want is apples.
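The difference is visible in the types involved. In the sketch below (a hypothetical Apple class with in-memory data; AsQueryable() stands in for a DbSet, since both expose IQueryable<T>), .Where on an ICollection<T> binds to Enumerable.Where and runs a compiled delegate in memory, whereas .Where on an IQueryable<T> binds to Queryable.Where and is recorded as an expression tree that a provider such as EF can translate to SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Apple
{
    public int Id { get; set; }
    public DateTime? Expires { get; set; }
}

public static class QueryShapes
{
    // A navigation property such as User.Apples is an ICollection<Apple>:
    // Enumerable.Where filters the already-loaded items in memory.
    public static IEnumerable<Apple> FilterCollection(ICollection<Apple> apples) =>
        apples.Where(a => a.Id == 1 && !a.Expires.HasValue);

    // On an IQueryable<Apple> (a DbSet, or Query() on a collection entry),
    // Queryable.Where only records the filter in an expression tree.
    public static Expression FilterQuery(IQueryable<Apple> apples) =>
        apples.Where(a => a.Id == 1 && !a.Expires.HasValue).Expression;
}
```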
A perhaps better way of doing this is to keep the whole expression as a query for as long as possible. An example of this would be something like the following:
myDbContext.Users
.Where(u=>u.Id == userId)
.SelectMany(u=>u.Apples)
.Where(a=>a.Id == 1 && !a.Expires.HasValue);
This should execute as a single SQL statement and only pull down the apples you care about.
HTH
OK, from what I can understand of your question, you are asking why EF appears to allow you to use navigation properties in a query even though they may be null in the result set.
In answer to your question: yes, this is expected behavior. Here's why:
When you write a query, it is translated into SQL; for example, something like
myDbContext.Apples.Where(a=>a.IsRed)
will turn into something like
Select * from Apples
where [IsRed] = 1
Similarly, something like the following will also be translated directly to SQL:
myDbContext.Apples.Where(a=>a.Tree.Height > 100)
will turn into something like
Select a.* from Apples as a
inner join Tree as t on a.TreeId = t.Id
where t.Height > 100
However, it's a bit of a different story when we actually pull down the result sets.
To avoid pulling down too much data and making things slow, EF offers several mechanisms for specifying what comes back in the result set. One is lazy loading (which incidentally needs to be used carefully if you want to avoid performance issues) and the second is the Include syntax. These methods restrict what we pull back so that queries are quick and don't consume unneeded resources.
For example in the above you will note that only Apple fields are returned.
If we were to add an include to that as below you could get a different result:
myDbContext.Apples.Include(a=>a.Tree).Where(a=>a.Tree.Height > 100)
will translate to SQL similar to:
Select a.*, t.* from Apples as a
inner join Tree as t on a.TreeId = t.Id
where t.Height > 100
In your example above (which I'm fairly sure isn't syntactically correct, as myContext.Users should be a collection and therefore shouldn't have an .Apples property), you are creating a query, therefore all variables are available. When you enumerate that query, you have to be explicit about what's returned.
For more details on navigation properties and how they work (and the .Include syntax) check out my blog: http://blog.staticvoid.co.nz/2012/07/entity-framework-navigation-property.html

CRM 2011: Limitation of query expression?

I believe the answer to this question may be to use Linq to Sql, but wanted to see if this is something which is possible using QueryExpressions:-
I create a query expression which queries against Entity A, it also links to Entity B (via LinkEntity) and imposes additional criteria. It is possible to retrieve columns from Entity B by adding the appropriate attribute names. However, it will only retrieve the linked entity (inner join).
Is it possible using QueryExpression to retrieve all related records (and required columns) from Entity B related to Entity A (e.g. all cases associated with a contact where the contact passes specified criteria)? Normally I would consider inverting the query and searching for Entity B relating to Entity A with the appropriate LinkEntity conditions, but there are a number of linked entities which I would like to retrieve for the same contact query.
So I'm left with some options:-
(1) Perform a second query (not ideal when iterating over a large number of results from the initial query),
(2) Perform a query using Linq to CRM on the filtered views,
(3) A different method entirely?
Any thoughts would be appreciated.
EDIT:
I ended up using Linq-to-Sql to complete this task and the code used is similar to that below (albeit with a few more joins for the actual query!):-
var dataCollection = (from eA in xrmServiceContext.EntityASet
join eB in xrmServiceContext.EntityBSet on new EntityReference(EntityA.EntityLogicalName, eA.Id) equals (EntityReference)eB.EntityBLookupToEntityA
select new
{
Id = eA.Id,
EntityBInterestingAttribute = eB.InterestingAttributeName
}).ToList();
So this will bring back a row per Entity A, per Entity B. To make things easier I then defined a custom class "MyEntityAClass" which had properties which were Lists so I could return one object for filling of GridView etc. This is more to do with the processing of these results though so I haven't posted that code here.
I hope that makes sense. Essentially, it is getting the multiple rows per record a la SQL which makes this method work.
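The step of turning one row per (Entity A, Entity B) pair into one object per Entity A can be done with GroupBy. A sketch with hypothetical JoinedRow and MyEntityAClass shapes (the author's actual class isn't shown); it is meant to run on the materialized results, since I'm not assuming the CRM LINQ provider can translate GroupBy.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened row: one per (Entity A, Entity B) pair, as the join returns.
public class JoinedRow
{
    public Guid Id { get; set; }
    public string EntityBInterestingAttribute { get; set; }
}

// Hypothetical stand-in for MyEntityAClass: one object per Entity A, with a list property.
public class MyEntityAClass
{
    public Guid Id { get; set; }
    public List<string> EntityBInterestingAttributes { get; set; }
}

public static class RowGrouping
{
    // Groups the flattened rows by Entity A's Id and collects the Entity B values.
    public static List<MyEntityAClass> GroupByEntityA(IEnumerable<JoinedRow> rows) =>
        rows.GroupBy(r => r.Id)
            .Select(g => new MyEntityAClass
            {
                Id = g.Key,
                EntityBInterestingAttributes = g.Select(r => r.EntityBInterestingAttribute).ToList()
            })
            .ToList();
}
```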
QueryExpression can only return fields from one type of entity, the one specified in QueryExpression.EntityName.
You can use FetchXML which allows you to also get the fields of any link entities, which would be an option 3 for you, unfortunately it returns the data as XML which you would then have to parse yourself.
It might be quicker to run the FetchXML, but it will take longer to write and test, and it's not the easiest thing to maintain either.
Sample code: this gets the first 101 active cases for all active accounts (statecode 0 is Active for both account and incident).
string fetch = "<fetch count='101' mapping='logical'><entity name='account'><filter type='and'><condition attribute='statecode' operator='eq' value='0'/></filter><link-entity name='incident' from='customerid' to='accountid'><all-attributes/><filter type='and'><condition attribute='statecode' operator='eq' value='0'/></filter></link-entity></entity></fetch>";
string data = yourCrmServiceObject.Fetch(fetch);

How do I apply the LINQ to SQL Distinct() operator to a List<T>?

I have a serious problem (it's driving me crazy) with LINQ to SQL. I am developing an ASP.NET MVC3 application using C# and Razor in Visual Studio 2010.
I have two database tables, Product and Categories:
Product(Prod_Id[primary key], other attributes)
Categories((Dept_Id, Prod_Id) [primary keys], other attributes)
Obviously Prod_Id in Categories is a foreign key. Both classes are mapped using the Entity Framework (EF). I do not mention the context of the application for simplicity.
In Categories there are multiple rows containing the Prod_Id. I want to make a projection of all Distinct Prod_Id in Categories. I did it using plain (T)SQL in SQL Server MGMT Studio according to this (really simple) query:
SELECT DISTINCT Prod_Id
FROM Categories
and the result is correct. Now I need to make this query in my application so I used:
var query = _StoreDB.Categories.Select(m => m.Prod_Id).Distinct();
I go to check the result of my query by using:
query.Select(m => m.Prod_Id);
or
foreach(var item in query)
{
item.Prod_Id;
//other instructions
}
and it does not work. First of all, when I attempt to write query.Select(m => m. or item., IntelliSense shows only suggestions about methods (such as Equals, etc.) and not properties. I thought that maybe there was something wrong with IntelliSense (I guess most of you have often hoped that IntelliSense was wrong :-D), but when I launch the application I receive an error at runtime.
Before giving your answer, keep in mind that:
I checked many forums, I tried the normal LINQ to SQL (without using lambdas) but it does not work. The fact that it works in (T)SQL means that there is something wrong with the LINQ to SQL instruction (other queries in my application work perfectly).
For application related reasons, I used a List<T> variable instead of _StoreDB.Categories and I thought that was the problem. If you can offer me a solution without using a List<T> is appreciated as well.
This line:
var query = _StoreDB.Categories.Select(m => m.Prod_Id).Distinct();
Your LINQ query most likely returns an IEnumerable of ints (judging by Select(m => m.Prod_Id)). You have a list of integers, not a list of entity objects. Try to print them and see what you get.
Calling _StoreDB.Categories.Select(m => m.Prod_Id) means that query will contain Prod_Id values only, not the entire entity. It would be roughly equivalent to this SQL, which selects only one column (instead of the entire row):
SELECT Prod_Id FROM Categories;
So when you iterate through query using foreach (var item in query), the type of item is probably int (or whatever your Prod_Id column is), not your entity. That's why Intellisense doesn't show the entity properties that you expect when you type "item."...
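To make this concrete, here is an in-memory sketch with a hypothetical Category class (only Dept_Id and Prod_Id, as in the question): after Select(m => m.Prod_Id) the elements are plain ints, so each item already is the id and there is no .Prod_Id left to access.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity with the two key columns from the question.
public class Category
{
    public int Dept_Id { get; set; }
    public int Prod_Id { get; set; }
}

public static class DistinctIds
{
    // Select projects each Category to its int Prod_Id; Distinct then works on ints.
    public static List<int> DistinctProductIds(IEnumerable<Category> categories) =>
        categories.Select(m => m.Prod_Id).Distinct().ToList();
}
```

Iterating the result is then simply foreach (int prodId in ids) { ... }, using prodId directly.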
If you want all of the columns in Categories to be included in query, you don't even need to use .Select(m => m). You can just do this:
var query = _StoreDB.Categories.Distinct();
Note that if you don't explicitly pass an IEqualityComparer<T> to Distinct(), EqualityComparer<T>.Default will be used (which may or may not behave the way you want it to, depending on the type of T, whether or not it implements System.IEquatable<T>, etc.).
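If the goal is instead one whole Category row per Prod_Id, a comparer that looks only at Prod_Id can be passed to Distinct. This is a sketch with a hypothetical CategoryRow class; run it on materialized data (after ToList() or AsEnumerable()), since a query provider such as LINQ to SQL or EF generally cannot translate a custom IEqualityComparer<T> to SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity; no Equals override, so the default comparer uses reference equality.
public class CategoryRow
{
    public int Dept_Id { get; set; }
    public int Prod_Id { get; set; }
}

// Treats two rows as equal when they refer to the same product.
public sealed class SameProductComparer : IEqualityComparer<CategoryRow>
{
    public bool Equals(CategoryRow x, CategoryRow y) =>
        ReferenceEquals(x, y) || (x != null && y != null && x.Prod_Id == y.Prod_Id);

    public int GetHashCode(CategoryRow obj) => obj.Prod_Id.GetHashCode();
}
```

Usage: rows.Distinct(new SameProductComparer()) keeps the first row seen for each Prod_Id, whereas rows.Distinct() without a comparer keeps every distinct object reference.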
For more info on getting Distinct to work in situations similar to yours, take a look at this question or this question and the related discussions.
As has been explained by the other answers, the error that the OP ran into was because the result of his code was a collection of ints, not a collection of Categories.
What hasn't been answered was his question about how to use the collection of ints in a join or something in order to get at some useful data. I will attempt to do that here.
Now, I'm not really sure why the OP wanted to get a distinct list of Prod_Ids from Categories, rather than just getting the Prod_Ids from Products. Perhaps he wanted to find out which Products are related to one or more Categories, so that any uncategorized Products would be excluded from the results. I'll assume this is the case and that the desired result is a collection of distinct Products that have associated Categories. I'll first answer the question about what to do with the Prod_Ids, and then offer some alternatives.
We can take the collection of Prod_Ids exactly as they were created in the question as a query:
var query = _StoreDB.Categories.Select(m => m.Prod_Id).Distinct();
Then we would use join, like so:
var products = query.Join(_StoreDB.Products, id => id, p => p.Prod_Id,
(id,p) => p);
This takes the query, joins it with the Products table, specifies the keys to use, and finally says to return the Product entity from each matching set. Because we know that the Prod_Ids in query are unique (because of Distinct()) and the Prod_Ids in Products are unique (by definition because it is the primary key), we know that the results will be unique without having to call Distinct().
Now, the above will get the desired results, but it's definitely not the cleanest or simplest way to do it. If the Category entities are defined with a relational property that returns the related record from Products (which would likely be called Product), the simplest way to do what we're trying to do would be the following:
var products = _StoreDB.Categories.Select(c => c.Product).Distinct();
This gets the Product from each Category and returns a distinct collection of them.
If the Category entity doesn't have the Product relational property, then we can go back to using the Join function to get our Products.
var products = _StoreDB.Categories.Join(_StoreDB.Products, c => c.Prod_Id,
p => p.Prod_Id, (c,p) => p).Distinct();
Finally, if we don't just want a simple collection of Products, then some more thought would have to go into this, and perhaps the simplest thing would be to handle that when iterating through the Products. Another example would be getting a count of the number of Categories each Product belongs to. If that's the case, I would reverse the logic and start with Products, like so:
var productsWithCount = _StoreDB.Products.Select(p => new { Product = p,
NumberOfCategories = _StoreDB.Categories.Count(c => c.Prod_Id == p.Prod_Id)});
This would result in a collection of anonymously typed objects that reference the Product and the NumberOfCategories related to that Product. If we still needed to exclude any uncategorized Products, we could append .Where(r => r.NumberOfCategories > 0) before the semicolon. Of course, if the Product entity is defined with a relational property for the related Categories, you wouldn't need this, because you could just take any Product and do the following:
int NumberOfCategories = product.Categories.Count();
Anyway, sorry for rambling on. I hope this proves helpful to anyone else that runs into a similar issue. ;)
