I was going through some sample code about DBEntities and DbContext. Is there any limit on the number of rows a DbSet pulls from the database? In the code sample below, let's say there is a DbSet<History> history or DbSet<Logs> logs. When a DbContext is created, will dbcontext.logs or dbcontext.history hold all the logs present in the database? If so, what if the tables have millions of rows? Doesn't that hurt performance during LINQ queries, updates, and saving the context?
public virtual DbSet<Course> Courses { get; set; }
public virtual DbSet<Standard> Standards { get; set; }
public virtual DbSet<Student> Students { get; set; }
public virtual DbSet<StudentAddress> StudentAddresses { get; set; }
using (var context = await _contextFactory.CreateContext())
{
context.History.Add(history);
context.SaveChanges();
}
Entity Framework doesn't need to pull any rows to do an insert, which is what Add() and SaveChanges() do. It should do what you would do in SQL: insert a row into the table in question.
So in your example, nothing "explodes".
The following line basically only adds an item to an empty change tracker:
context.History.Add(history);
If you were to execute
context.History.ToList()
then the query would be executed as "select * from History", and you would definitely hit a performance issue if the table contains millions of rows.
The key point is that EF is "smart enough" not to load the whole set into memory. You could attach a profiler (or enable EF logging) to see the actual queries being executed. Experiment with it a bit to gain some experience.
If you expand the set, for example with a debugger, then you don't apply any filter and will retrieve the whole set. By misusing navigation properties you could even load your whole database into memory.
The subtle difference lies in IQueryable versus the other IEnumerable-like interfaces.
While the object is still only an IQueryable, the actual query has not been executed yet and can be extended with filters. As I said, once you start enumerating, the actual query is executed; hence an unfiltered DbSet will return all rows in the table.
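To see deferred execution without a database, the following sketch uses LINQ to Objects as a stand-in: HistoryTable is a hypothetical lazy sequence that counts how many rows it hands out. EF would translate the composed query into SQL instead, but the timing is the same: composing Where and Take reads nothing, and only enumerating (ToList) runs the query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Counts how many "rows" the simulated table has handed out so far.
int rowsRead = 0;

// Hypothetical stand-in for a large table: yields ids lazily, one row at a time.
IEnumerable<int> HistoryTable()
{
    for (int id = 1; id <= 1_000_000; id++)
    {
        rowsRead++;
        yield return id;
    }
}

// Composing the query only builds an expression tree; no row is read yet.
IQueryable<int> query = HistoryTable().AsQueryable()
    .Where(id => id % 2 == 0)
    .Take(10);
int readBeforeEnumeration = rowsRead;

// Enumerating (here via ToList) executes the query.
List<int> firstTen = query.ToList();

Console.WriteLine($"before: {readBeforeEnumeration}, after: {rowsRead}, returned: {firstTen.Count}");
```

readBeforeEnumeration stays 0, and because Take stops the enumeration early, only a handful of the million rows are ever read.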
Also note LINQ methods such as .Skip and .Take, which limit how many rows a query returns. There are several more, like GroupBy, Join, Where, etc.
You have to realise that a DbSet<Student> does not represent your collection of Students, it represents the Students table in your database. This means that you can query for sequences of properties of Students.
If desired, you can query the complete sequence, but that will lead to performance problems, if not memory problems.
Therefore, if you ask for Student data, you have to keep in mind what you will be using of the fetched data: don't Select properties that you already know the value of, don't Select items that you don't plan to use.
An example: a database with Schools and Students, with a one-to-many relation, every School has zero or more Students, every Student attends exactly one School:
class School
{
public int Id {get; set;}
public string Name {get; set;}
...
// every School has zero or more Students (one-to-many)
public virtual ICollection<Student> Students {get; set;}
}
class Student
{
public int Id {get; set;}
public string Name {get; set;}
...
// Every Student attends exactly one School, using foreign key:
public int SchoolId {get; set;}
public virtual School School {get; set;}
}
In entity framework the columns of the tables are represented by non-virtual properties. The virtual properties represent the relations between the tables (one-to-many, many-to-many, ...)
Don't do the following!
public IEnumerable<School> GetSchoolByLocation(string city)
{
return dbContext.Schools
.Where(school => school.City == city)
.Include(school => school.Students)
.ToList();
}
Why not? This looks like perfect code, doesn't it?
Well, maybe you are fetching way more data than your caller will use:
var mySchoolId = GetSchoolByLocation("Oxford")
.Where(school => school.Street == "Main Street")
.Select(school => school.Id)
.FirstOrDefault();
What a waste, to first fetch all Oxford schools, and then keep only this one!
Furthermore: you get the school with all its Students, and all you use is the Id of the school?
Try to return IQueryable<...> as long as possible, and let your caller decide what to do with the returned data.
Maybe he wants to do ToList, or Count, or FirstOrDefault. Maybe he only wants the Id and the Name. As long as you don't know that, don't make the decision for him; doing so only makes your code less reusable.
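A runnable sketch of that advice, with an in-memory array standing in for dbContext.Schools (the tuple fields Id, Name, City and Street and the data are assumptions for illustration). GetSchoolsByLocation returns an IQueryable, so the caller's extra Where, Select and FirstOrDefault become part of the same query; against EF that composed query would be sent as one SQL statement returning a single Id.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-in for dbContext.Schools.
IQueryable<(int Id, string Name, string City, string Street)> schools = new[]
{
    (Id: 1, Name: "North School", City: "Oxford", Street: "High Street"),
    (Id: 2, Name: "Central School", City: "Oxford", Street: "Main Street"),
    (Id: 3, Name: "River School", City: "Cambridge", Street: "Main Street"),
}.AsQueryable();

// Returns the query itself instead of a List: nothing is fetched here.
IQueryable<(int Id, string Name, string City, string Street)> GetSchoolsByLocation(string city)
{
    return schools.Where(school => school.City == city);
}

// The caller decides what it needs: one more filter, and only the Id.
int mySchoolId = GetSchoolsByLocation("Oxford")
    .Where(school => school.Street == "Main Street")
    .Select(school => school.Id)
    .FirstOrDefault();

Console.WriteLine(mySchoolId); // 2
```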
Always use Select to select the properties, and select only the data you actually plan to use. Only use Include if you plan to update the included data.
var schools = dbContext.Schools.Where(school => ...)
// Keep only the Schools that you actually plan to use:
.Select(school => new
{
// only select the properties that you plan to use
Id = school.Id,
Name = school.Name,
...
// Only the Students you plan to use:
Students = school.Students.Where(student => ...)
.Select(student => new
{
// Again, only the properties you plan to use
Id = student.Id,
Name = student.Name,
// no need for the foreign key: you already know the value
// SchoolId = student.SchoolId,
}),
});
Finally, if you want access to all Students to show them, but you don't want to fetch all million Students at once, consider fetching them page by page. Remember the primary key of the last item of the last fetched page, and use `.Where(item => item.Id > lastFetchedPrimaryKey).OrderBy(item => item.Id).Take(pageSize)` to get the next page, until there are no more pages.
This way, you might ask for 50 students while you'll only display 25 of them, but at least you don't have all million students in memory. Fetching the next page is fairly fast, because there is already an index on the primary key. Keep the OrderBy on that key: without an ORDER BY, a database does not guarantee the order of the returned rows.
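A minimal sketch of this keyset ("seek") pagination, with an in-memory sequence standing in for the Students table (the 1,000 rows and the page size of 50 are assumptions). Each page continues after the last key seen, ordered by that same key, and the loop ends when a page comes back empty.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for dbContext.Students: 1,000 rows with ids 1..1000.
var students = Enumerable.Range(1, 1000)
    .Select(i => new { Id = i, Name = $"Student {i}" })
    .AsQueryable();

const int pageSize = 50;
int lastFetchedId = 0;   // no page fetched yet
int pagesFetched = 0;
int rowsFetched = 0;

while (true)
{
    // Keyset pagination: continue after the last key seen, in key order, one page at a time.
    var page = students
        .Where(student => student.Id > lastFetchedId)
        .OrderBy(student => student.Id)
        .Take(pageSize)
        .ToList();

    if (page.Count == 0)
    {
        break;           // no more pages
    }

    pagesFetched++;
    rowsFetched += page.Count;
    lastFetchedId = page[page.Count - 1].Id;   // remember the key of the last row
}

Console.WriteLine($"{pagesFetched} pages, {rowsFetched} rows"); // 20 pages, 1000 rows
```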
Related
I have two classes like so.
public class Client { public Guid Id { get; set; } ... }
public class Meeting
{
public Guid Id { get; set; }
public Client[] Invitees { get; set; } = new Client[0];
public Client[] Attendees { get; set; } = new Client[0];
}
The config in the context is as follows.
private static void OnModelCreating(EntityTypeBuilder<Client> entity) { }
private static void OnModelCreating(EntityTypeBuilder<Meeting> entity)
{
entity.HasMany(a => a.Invitees);
entity.HasMany(a => a.Attendees);
}
I only need the reference to the clients from my meetings. The clients need not know anything. The meetings need to reference the clients twice or less (voluntary presence, optional invitation).
The migration on the above creates two tables, which I'm perfectly fine with. But it creates two indices as well, like this.
migrationBuilder.CreateIndex(
name: "IX_Clients_MeetingId",
table: "Clients",
column: "MeetingId");
migrationBuilder.CreateIndex(
name: "IX_Clients_MeetingId1",
table: "Clients",
column: "MeetingId1");
I'm not fine with that. First of all, I expected only one index to be created, as we're indexing the same table's primary keys. Secondly, if I can't dodge that, I dislike the digit in IX_Clients_MeetingId1.
What can I do (if anything) to only have a single index created?
How can I specify the name of the index if I'm not using WithMany()?
I'm not providing any links as a proof of effort. Checking MSDN, SO and blogs resulted in a lot of hits on the full M2M relation, i.e. .HasMany(...).WithMany(...), and that's not what I'm heading for. I saw a suggestion to manually make the change in the migration file, but tampering with those is begging for issues later. I'm not sure how to filter out the irrelevant results, and I'm starting to fear that the "half" M2M I'm attempting is a bad idea (there's no in-between table created, for instance).
Well, it seems that EF is assuming you have two one-to-many relationships, so one Client could only be invited to at most one meeting.
As a quick resolution, you can either:
1. add two join entities explicitly and configure the appropriate one-to-many relationships. Then you have one table for Invitations and one for Attendance.
2. add one many-to-many join entity that also tracks a link type (Client, Meeting, LinkType), so that "invited" and "attended" are link types.
3. add two properties to Client to show EF that you mean this as a many-to-many relationship, like so:
public class Client {
public Guid Id { get; set; }
public ICollection<Meeting> InvitedTo { get; set; }
public ICollection<Meeting> Attended { get; set; }
}
These should not show up in the Clients table, but as two separate tables (essentially solution 1 with implicit join entities).
Stepping back, I think you can simply improve the model by introducing a MeetingMember entity. In the current model there's no way a client can be invited to two meetings, nor are clients restricted to attending meetings to which they are invited. So you need an M2M relationship, and you can get away with a single one if you use an explicit linking entity, like
MeetingMember(MeetingId, ClientId, InvitedAt, Attended)
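A sketch of that linking entity in EF Core, assuming Guid keys as in the question. The nullable InvitedAt (null meaning "not invited"), the Members collection name and the DbContext placement are assumptions, not part of the answer. Because the key is composite, it is configured in OnModelCreating; WithMany() without an argument lets Client stay unaware of its meetings.

```csharp
public class MeetingMember
{
    public Guid MeetingId { get; set; }
    public Meeting Meeting { get; set; }

    public Guid ClientId { get; set; }
    public Client Client { get; set; }

    public DateTime? InvitedAt { get; set; }   // null: attended without an invitation (assumption)
    public bool Attended { get; set; }
}

public class Meeting
{
    public Guid Id { get; set; }
    public List<MeetingMember> Members { get; set; } = new List<MeetingMember>();
}

// In the DbContext:
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    // One row per (meeting, client) pair.
    modelBuilder.Entity<MeetingMember>()
        .HasKey(member => new { member.MeetingId, member.ClientId });

    // Client has no navigation back to its meetings, hence WithMany() without an argument.
    modelBuilder.Entity<MeetingMember>()
        .HasOne(member => member.Client)
        .WithMany()
        .HasForeignKey(member => member.ClientId);
}
```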
I am new to Entity Framework and LINQ. I am using .NET Core and EF Core 5. I like it so far, but have hit a couple of issues that I am struggling with. This is the one I am really confused about.
I have some products represented in a class. I have customers that buy these products in another class. Each customer may call one of my products a different name within their business so I need to allow them to define an alias that they use for my product.
Here are my two parent classes (Products & Customers)
public class Product
{
public int Id {get; set;}
public Guid IdGuid {get; set;}
public string Name {get; set;}
}
public class Customer
{
public int Id {get; set;}
public Guid IdGuid {get; set;}
public string Name {get; set;}
}
In these classes I have an Id column that is used by the Database for referential integrity. I do not pass this Id back to the user. Instead whenever the user gets one of these objects they get the Guid returned so they can uniquely identify the row in the DB. The alias class that joins these two tables is as follows:
public class ProductCustomerAlias
{
public Product Product {get; set;}
public Customer Customer {get; set;}
public string Alias {get; set;}
}
Since I need the database table to use a composite key consisting of the Product Id and the Customer Id, I have to override the OnModelCreating method of the context object:
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
base.OnModelCreating(modelBuilder);
modelBuilder.Entity<ProductCustomerAlias>()
.HasKey("ProductId", "CustomerId");
}
This ends up creating a database table that has the following structure (in Oracle):
CREATE TABLE "RTS"."ProductCustomerAlias"
(
"ProductId" NUMBER(10,0),
"CustomerId" NUMBER(10,0),
"Alias" NVARCHAR2(100)
)
So far so good. I now have an intersect table to store aliases in, with a primary key of ProductId and CustomerId, both being the key integer values from the Products and Customers within the DB context.
So my next step is to start creating the Repo class that retrieves data from these objects. Keep in mind that the end user that submits the request can only pass me the Product.IdGuid and Customer.IdGuid because that is all they ever have. In my Repo I have the following code:
public async Task<ProductCustomerAlias> GetProductCustomerAliasAsync(Guid pProductId, Guid pCustomerId)
{
var alias = await (from ali in _context.ProductCustomerAlias
join prod in _context.Products on ali.Product.Id equals prod.Id
join cust in _context.Customers on ali.Customer.Id equals cust.Id
where cust.IdGuid == pCustomerId && prod.IdGuid == pProductId
select new {Product = prod,
Customer = cust,
Alias = ali.Alias}
).FirstOrDefaultAsync();
return (IEnumerable<ProductCustomerAlias>)alias;
}
My problem is that it is giving me the following error:
Cannot convert type '<anonymous type: Models.Product Product, Models.Customer Customer, string Alias>' to 'System.Collections.Generic.IEnumerable<Models.ProductCustomerAlias>'
Please don't tell me that the error is telling me exactly what is wrong. I am sure it is but if I understood where I was screwing up I would not be wasting my time typing out this ridiculously long explanation. So how can I cast the results from my Linq query to the specified type? Is there something else I am doing wrong? Any help would be greatly appreciated.
Answering your concrete questions.
So how can I cast the results from my Linq query to the specified type?
You can't, because the LINQ query result is anonymous type which is not compatible with the desired result type (concrete entity type), thus cannot be cast to it.
Is there something else I am doing wrong?
Sorry to say that, but basically everything is wrong.
Select is not needed, because the desired result type is exactly the type of the entity being queried, i.e. the ali variable here:
from ali in _context.ProductCustomerAlias
is exactly what you need as a result (after applying the filter and limiting operators)
Manual joins are also not needed, because navigation properties provide them automatically, i.e. here:
join prod in _context.Products on ali.Product.Id equals prod.Id
prod is exactly the same thing as ali.Product.
The attempt to cast a single object to an enumerable is wrong:
return (IEnumerable<ProductCustomerAlias>)alias
Even if the alias variable were of the correct type, this would fail, because it is a single object rather than a collection.
So, the solution is quite simple - use the corresponding DbSet, apply filter (Where), limit the result to zero or one (FirstOrDefault{Async}) and you are done.
With one small detail. Since you are querying and returning a full entity, its navigation properties (like Product and Customer) are considered to be a related data, and are not populated (loaded) automatically by EF Core. You have to explicitly opt-in for that, which is called eager loading and explained in the Loading Related Data section of the official EF Core documentation (I would also recommend familiarizing with navigation properties and the whole Relationships concept). With simple words, this requires usage of specifically provided EF Core extension methods called Include and ThenInclude.
With all that being said, the solution is something like this:
public async Task<ProductCustomerAlias> GetProductCustomerAliasAsync(
Guid pProductId, Guid pCustomerId)
{
return await _context.ProductCustomerAlias
.Include(a => a.Product)
.Include(a => a.Customer)
.Where(a => a.Customer.IdGuid == pCustomerId && a.Product.IdGuid == pProductId)
.FirstOrDefaultAsync();
}
You can even replace the last two lines with a single call to the predicate overload of FirstOrDefaultAsync, but that's not essential, since it is just a shortcut for the above:
return await _context.ProductCustomerAlias
.Include(a => a.Product)
.Include(a => a.Customer)
.FirstOrDefaultAsync(a => a.Customer.IdGuid == pCustomerId && a.Product.IdGuid == pProductId);
I'm trying to learn to write efficient Entity Framework queries when data has to be fetched based on multiple joins, including a many-to-many via a junction table. In the following example, I'd like to fetch all States that contain a particular Book.
Let's use a model with the following tables/entities, all linked by navigation properties:
State, City, Library, Book, LibraryBook (junction table for many-to-many relationship between library and book.)
Each State has 1 or more Cities
Each City has 1 or more Libraries
Each Library has many Books & Each Book may exist at more than 1 library.
How can I best return all of the States that contain a particular Book? I'm inclined to think separate queries may work better than 1 large one, but I'm not certain what the best implementation is. I think that getting the LibraryId from the many-to-many relation first in a separate query is probably a good way to start.
So for that:
var bookId = 12;
var libraryIds = _context.LibraryBook.Where(l => l.BookId == bookId).Select(s => s.LibraryId);
If that comes first, I'm uncertain how to best query the next data in order to get the cities which contain each of those LibraryIds. I could use a foreach:
var cities = new List<City>();
foreach (var libraryId in libraryIds)
{
// one extra query per library
var city = _context.City.First(c => c.Libraries.Any(l => l.Id == libraryId));
cities.Add(city);
}
But then I'd have to do yet another foreach for the states that contain the city, and this all adds up to a lot of separate SQL queries!
Is this really the only way to go about this? If not, what is a better alternative?
Thanks in advance!
Database management systems are extremely optimized in combining tables and selecting columns from the result. The transport of the selected data is the slower part.
Hence it is usually better to limit the data that needs to be transported: let the DBMS do all the joining and selecting.
For this, you don't need to put everything in one big LINQ statement that is hard to understand (and thus hard to test, reuse, and maintain). As long as your LINQ statements remain IQueryable<...>, the query is not executed. Concatenating several of these LINQ statements is not costly.
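As a sketch of that, here is the question's step-by-step idea without the foreach loops, using in-memory tuples as hypothetical stand-ins for the LibraryBook, Library and City tables (only key columns, made-up data). Each step stays an IQueryable and the next step uses it inside Contains, so nothing runs until ToList; against EF, such nested Contains calls should be translated into a single SQL statement with subqueries.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-ins for the tables (only the key columns).
var libraryBooks = new[] { (LibraryId: 1, BookId: 12), (LibraryId: 2, BookId: 7), (LibraryId: 3, BookId: 12) }.AsQueryable();
var libraries = new[] { (Id: 1, CityId: 10), (Id: 2, CityId: 20), (Id: 3, CityId: 30) }.AsQueryable();
var cities = new[] { (Id: 10, StateId: 100), (Id: 20, StateId: 200), (Id: 30, StateId: 100) }.AsQueryable();

int bookId = 12;

// Step 1: ids of the libraries that own the book (still just a query).
IQueryable<int> libraryIds = libraryBooks
    .Where(link => link.BookId == bookId)
    .Select(link => link.LibraryId);

// Step 2: ids of the cities of those libraries; step 1 is used as a subquery.
IQueryable<int> cityIds = libraries
    .Where(library => libraryIds.Contains(library.Id))
    .Select(library => library.CityId);

// Step 3: ids of the states of those cities; only ToList executes the whole chain.
var stateIds = cities
    .Where(city => cityIds.Contains(city.Id))
    .Select(city => city.StateId)
    .Distinct()
    .ToList();

Console.WriteLine(string.Join(", ", stateIds)); // 100
```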
Back to your question
If you followed the entity framework conventions, your one-to-many relations and your many-to-many will have resulted in classes similar to the following:
class State
{
public int Id {get; set;}
public string Name {get; set;}
...
// every State has zero or more Cities (one-to-many)
public virtual ICollection<City> Cities {get; set;}
}
class City
{
public int Id {get; set;}
public string Name {get; set;}
...
// Every City is a City in exactly one State, using foreign key:
public int StateId {get; set;}
public virtual State State {get; set;}
// every City has zero or more Libraries (one-to-many)
public virtual ICollection<Library> Libraries {get; set;}
}
Library and Books: many-to-many:
class Library
{
public int Id {get; set;}
public string Name {get; set;}
...
// Every Library is a Library in exactly one City, using foreign key:
public int CityId {get; set;}
public virtual City City {get; set;}
// every Library has zero or more Books (many-to-many)
public virtual ICollection<Book> Books {get; set;}
}
class Book
{
public int Id {get; set;}
public string Title {get; set;}
...
// Every Book is a Book in zero or more Libraries (many-to-many)
public virtual ICollection<Library> Libraries {get; set;}
}
This is all that entity framework needs to know to recognize your tables, the columns in the tables and the relations between the tables.
You will only need attributes or fluent API if you want to deviate from the conventions: different identifiers for columns or tables, non-default types for decimals, non default behaviour for cascade on delete, etc.
In entity framework, the columns in the tables are represented by the non-virtual properties; the virtual properties represent the relations between the tables.
The foreign key is an actual column in the table, hence it is non-virtual. The one-to-many has virtual ICollection<Type> on the "one" side and virtual Type on the "many" side. The many-to-many has virtual ICollection<...> on both sides.
There is no need to specify the junction table. Entity framework recognizes the many-to-many and creates the junction table for you. If you use database first, you might need to use fluent API to specify the junction table.
But how am I supposed to do the joins without a junction table?
Answer: don't do the (group-)joins yourself, use the virtual ICollections!
How can I best return all of the States that contain a particular Book?
int bookId = ...
var statesWithThisBook = dbContext.States
.Where(state => state.Cities.SelectMany(city => city.Libraries)
.SelectMany(library => library.Books)
.Select(book => book.Id)
.Contains(bookId));
In words: you have a lot of States. From every State, get all Books that are in all Libraries that are in all Cities of this State. Use SelectMany to make this one big sequence of Books. From every Book, Select the Id. The result is one big sequence of Ids (of Books that are in Libraries that are in Cities that are in the State). Keep only those States whose sequence contains bookId, i.e. the States that have at least one copy of this Book.
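The same query can be tried without a database by running it over nested in-memory objects; the anonymous-type data below (two states with one city each) is made up for illustration. LINQ to Objects evaluates the Where directly, whereas EF would translate the same expression into a single SQL query over the junction table.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory data: State -> Cities -> Libraries -> Books.
var states = new[]
{
    new
    {
        Name = "Ohio",
        Cities = new[]
        {
            new { Name = "Columbus", Libraries = new[] { new { Books = new[] { new { Id = 12 }, new { Id = 7 } } } } },
        }
    },
    new
    {
        Name = "Iowa",
        Cities = new[]
        {
            new { Name = "Ames", Libraries = new[] { new { Books = new[] { new { Id = 3 } } } } },
        }
    },
}.AsQueryable();

int bookId = 12;

// Flatten every State to the ids of all Books in all its Libraries, then test for bookId.
var statesWithThisBook = states
    .Where(state => state.Cities
        .SelectMany(city => city.Libraries)
        .SelectMany(library => library.Books)
        .Select(book => book.Id)
        .Contains(bookId))
    .Select(state => state.Name)
    .ToList();

Console.WriteLine(string.Join(", ", statesWithThisBook)); // Ohio
```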
Room for Optimization
If you regularly need to answer similar questions, like "Give me all States that have a Book by a certain Author" or "Give me all Libraries that have a Book with a certain title", consider creating extension methods for this. This way you can chain them like any other LINQ method.
Advantages of the extension method: simpler to understand, reusable, easier to test and easier to change.
If you are not familiar with extension methods, read Extension Methods Demystified
using System.Collections.Generic;
using System.Linq;
// Extension methods must live in a non-generic static class.
// They extend single entities, so that state.GetBooks() and city.GetBooks() below compile;
// navigation collections are already IEnumerable, so no AsQueryable() is needed.
public static class BookExtensions
{
public static IEnumerable<Book> GetBooks(this Library library)
{
return library.Books;
}
public static IEnumerable<Book> GetBooks(this City city)
{
return city.Libraries.SelectMany(library => library.GetBooks());
}
public static IEnumerable<Book> GetBooks(this State state)
{
return state.Cities.SelectMany(city => city.GetBooks());
}
}
Usage:
Get all states that have a book by Karl Marx:
string author = "Karl Marx";
var statesWithCommunistBooks = dbContext.States
.Where(state => state.GetBooks()
.Select(book => book.Author)
.Contains(author));
Get all Cities without a bible:
string title = "Bible";
var citiesWithoutBibles = dbContext.Cities
.Where(city => !city.GetBooks()
.Select(book => book.Title)
.Contains(title));
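Because you extended your classes with the method GetBooks(), it is as if States and Cities have Books. You've seen the reusability above. Changes are easy too: if, for instance, you extend your database so that Cities have BookStores, GetBooks can check both the Libraries and the BookStores. Your change will be in one place; users of GetBooks() won't have to change.
One caveat: EF cannot translate a call to your own C# method inside a query and will throw at runtime, so the two usage examples above only work on entities that are already in memory. To run such a filter in the database, inline the SelectMany chain in the query, as in the earlier States query.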
I have an entity that consists only of foreign keys of other Entities.
My simplified class looks like this:
DeliverNote
Adress add1 {get; set;}
Adress add2 {get; set;}
I can load addresses by themselves just fine, but I can't load a DeliveryNote, because EF doesn't load the related data by default, I think.
So I saw solutions, mainly with context.notes.Include(dn => dn.Adresses), but I just can't figure out how I tell the note or the adress class how they're related to each other. Basically when I type "dn." nothing shows up.
The simplest, probably working, solution I saw was from microsoft. In the github from this page https://learn.microsoft.com/de-de/ef/core/querying/related-data you can see the Blog and the Post classes. To me the Post class looks flawed though, why would a Post have to know about the Blog it is in? This will mess up the database too in code first solutions. What if the same post is gonna be posted in several blogs?
Most solutions also seem to involve lists of some kind. I don't have a list, just simple single objects; a 1-1 relationship, I think.
So you have a database with a table of Addresses and a table of DeliveryNotes. Every DeliveryNote has two foreign keys to the Addresses: one From and one To (you call it addr1 and addr2)
If you follow the entity framework code first conventions, you'll have something like this:
class Address
{
public int Id {get; set;}
... // other properties
// every Address has sent zero or more delivery Notes (one-to-many)
public virtual ICollection<DeliveryNote> SentNotes {get; set;}
// every Address has received zero or more delivery Notes (one-to-many)
public virtual ICollection<DeliveryNote> ReceivedNotes {get; set;}
}
class DeliveryNote
{
public int Id {get; set;}
... // other properties
// every DeliveryNote comes from an Address, using foreign key
public int FromId {get; set;}
public virtual Address FromAddress {get; set;}
// every DeliveryNote is sent to an Address, using foreign key:
public int ToId {get; set;}
public virtual Address ToAddress {get; set;}
}
In entity framework the columns of the tables are represented by non-virtual properties. The virtual properties represent the relations between the tables.
Note that the ICollections and FromAddress / ToAddress are virtual and thus not columns in your tables. If desired, you can leave them out of your classes. However, if you have these virtual properties, you don't have to do the (Group)Joins yourself.
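With two relationships between the same two classes, EF Core usually cannot tell by convention which collection on Address belongs to which reference on DeliveryNote, so they are paired explicitly in OnModelCreating. A sketch using the property names from the classes above; DeleteBehavior.Restrict is an assumption (on SQL Server, two cascading foreign keys from one table to the same table are typically rejected as multiple cascade paths).

```csharp
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    // DeliveryNote.FromAddress <-> Address.SentNotes, foreign key FromId
    modelBuilder.Entity<DeliveryNote>()
        .HasOne(note => note.FromAddress)
        .WithMany(address => address.SentNotes)
        .HasForeignKey(note => note.FromId)
        .OnDelete(DeleteBehavior.Restrict);

    // DeliveryNote.ToAddress <-> Address.ReceivedNotes, foreign key ToId
    modelBuilder.Entity<DeliveryNote>()
        .HasOne(note => note.ToAddress)
        .WithMany(address => address.ReceivedNotes)
        .HasForeignKey(note => note.ToId)
        .OnDelete(DeleteBehavior.Restrict);
}
```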
I can load addresses by themselves just fine, but I can't load a DeliveryNote, because EF doesn't load the related data by default, I think.
From this it is not easy to detect what kind of queries you want.
One of the slower parts of database queries is the transport of the selected data from your DBMS to your local process. Hence it is wise to minimize the data being transported.
If you use Include, then the complete object is transported, including the foreign keys and all properties you don't need. If you have a database with Schools and Students, then every Student will have a foreign key to the School he attends. If you ask for the School with Id 4 together with its 1000 Students using Include, you don't want to transport the foreign key SchoolId 1000 times, because you already know it will have the value 4.
In entity framework only use Include if you want to change / update the fetched item, otherwise use Select
Given a bunch of DeliveryNotes, give me some details of their addresses:
var deliveryNotes = dbContext.DeliveryNotes
.Where (deliveryNote => ...) // probably something with Id, or Date, or subject
.Select(deliveryNote => new
{
// select only the delivery note properties you actually plan to use
Subject = deliveryNote.Subject,
DeliveryDate = deliveryNote.DeliveryDate,
...
From = new
{
// select only the From properties you plan to use
Id = deliveryNote.FromAddress.Id,
Name = deliveryNote.FromAddress.Name,
Address = deliveryNote.FromAddress.Address,
...
},
To = new
{
// again: only properties you'll use
Name = deliveryNote.ToAddress.Name,
...
},
});
Entity framework knows the one-to-many relationship and will perform the proper join for you.
Given a bunch of Addresses, give me some of the DeliveryNotes they received:
var query = dbContext.Addresses
.Where(address => address.City == "New York" && ...)
.Select(address => new
{
// only properties you plan to use
Id = address.Id,
Name = address.Name,
ReceivedNotes = address.ReceivedNotes
.Where(note => note.DeliveryDate.Year == 2018)
.Select(note => new
{
// only properties you plan to use:
Title = note.Title,
...
// no need for this, you know it equals Id
// AddressId = note.ToId,
}),
});
Entity Framework knows the one-to-many relationship and will do the proper GroupJoin for you.
If you have a one-to-many relationship and you want the "item with its many sub-items", start on the one-side and use the virtual ICollection. If you want the sub-item with the item that it belongs to, start with the many-side and use the virtual property to the one-side
If you define your model as:
public class DeliverNote {
public int Id { get; set; }
public Adress addr1 { get; set; }
public Adress addr2 { get; set; }
}
public class Adress {
public int Id { get; set; }
}
You can then call:
context.notes.Include(dn => dn.addr1).Include(dn => dn.addr2);
Which will include the related data.
Your model doesn't define foreign keys for addr1 or addr2, so EF Core will create shadow properties for you, i.e. columns that exist in the table but not as properties in the C# model.
Is it possible to get the value of the primary key of the entity to be created next? Before it is created?
I tried:
Order newOrder = new Order();
MessageBox.Show(newOrder.orderId.toString());
It showed 0.
Is it possible?
Bigger Picture:
I am trying to build a fast food order management system. I have Order, Item and OrderItem tables. There is a many-to-many relationship between Order and Item and OrderItem table resolves this relationship.
So, when adding an order, I need to add OrderItems whose orderId field should be populated with the key of the Order just being created, i.e. one that has not been created yet.
EDIT: I use Code-First approach.
EF will take care of it.
So if you have Code First, you could do something like:
class Order
{
[Key]
public int Id { get; set; }
public virtual List<OrderItem> Items { get; set; }
public Order()
{
Items = new List<OrderItem>();
}
}
class OrderItem
{
[Key]
public int Id { get; set; }
public string ItemName { get; set; } //of course just a demo property
}
and do something like:
Order order = new Order();
OrderItem item = new OrderItem();
item.ItemName = "Super Burger";
order.Items.Add( item );
context.Orders.Add( order );
context.SaveChanges();
All is well and the keys populated accordingly.
If the key is generated by the database, then no, it's not possible. So I see 2 alternatives:
You can just add the entity to your collection, then call dbContext.SaveChanges() to write it to the database and use the id value in your code. (You could do this inside a transaction and rollback the transaction subsequently, or even just remove the record, if needs be.)
You can generate the id value yourself in your code rather than having it generated by the database, perhaps by using a GUID as the ID.
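For the second option, a minimal sketch: the key is a GUID created in code, so the order lines can reference it before anything is saved. Tuples stand in for the Order/OrderItem entities here, and the item names are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The key is generated in code, so it is known before the order is saved.
Guid orderId = Guid.NewGuid();

// Hypothetical order lines that point to the new order straight away.
var orderItems = new List<(Guid OrderId, string ItemName)>
{
    (orderId, "Super Burger"),
    (orderId, "Fries"),
};

Console.WriteLine($"Order {orderId} has {orderItems.Count} items");
```

The trade-off is that GUID keys are larger than int keys, and random GUIDs are inserted out of order into a clustered index.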
There are positives and negatives to each approach depending on how this fits into whatever your 'big picture' is.
UPDATE:
In terms of positives and negatives: the first option is probably closer to what you originally planned, and it can be made to work. Arguably, it would make it easier to set up relationships with other entities. On the flip side, you need to hit the database each time you want to get a new id, which could be slower.
It's really an (opinion-based!) design decision; either method can be made to work.