I am wondering what the best practice is on a database schema for inheriting from a base class where the base class has the PK id.
Let's say I'm building an app for a school. There are Teachers and Staff each with their own "duties". I want an Employee base class. Then, I want ONE Duty class that can be for any type of employee. Meaning, I don't want to create a TeacherDuty class with a FK to the TeacherId and a second StaffDuty class with a FK to StaffId, I'd like one Duty class, bound to the EmployeeId.
What is best practice here? Or is it best practice to have a TeacherDuty and StaffDuty class with FK's to each of their respective classes. I'm trying to minimize the number of classes and make them as reusable as possible (for instance, if there's another type of Employee, like Administrator, I want to avoid creating yet another table for AdministratorDuty).
public class Employee
{
public string Id {get; set;}
}
public class Teacher: Employee
{
public string Id {get; set;}
}
public class Teacher: Employee
{
public string Id {get; set;}
}
public class Duty
{
//what's the FK here?
}
Your hierarchy still doesn't quite make sense because you have a Teacher class there twice identically. I assume one of them is meant to be Staff instead of Teacher. But even with that, I'm not quite sure why Teacher would not be considered staff also?
It's important to properly design your schema before continuing. I'll attempt to answer your actual question, but that doesn't mean this is necessarily the correct way to go about it. I've proposed an alternative schema design at the end for you to consider.
Answering your Question
To answer your question, we can look at how ORMs (namely Entity Framework Core) solve this kind of thing. The issue is that the concepts of OOP and classes don't fully map over to the relational database world. C# ORMs like Entity Framework have to bridge that gap, so even if you're not going to use Entity Framework, looking at how they solve this issue can give you insight.
To make it more obvious what is going on, I'm going to add some fake columns to each of your classes. Let's pretend this is what your C# classes look like:
public class Employee
{
public string Id { get; set; }
public bool IsActive { get; set; }
}
public class Administrator : Employee
{
public string PhoneNumber { get; set; }
}
public class Teacher : Employee
{
public string HomeRoom { get; set; }
}
So you can see both Administrators and teachers should share an Id and whether they are active or not. Administrators store their phone number, and teachers need a home room. Administrators do not need a home room and teachers do not need a phone number. Again, the columns aren't meant to be a real use case, just as an example.
Table per Hierarchy
The first way (and recommended/default way in Entity Framework Core) is to do a single table per Hierarchy. This means you would have a table called Employee, and that table would have all the columns required to represent every type of employee (Staff, Teacher, Administrator, etc). This means there would be 1 EmployeeID column to represent all employees. The magic comes from having a 'Discriminator' column to specify 'This employee is a X'.
So your Employee table would look something like this:
| Column | Type |
|---- |-------|
| Id | NVARCHAR(50) PK |
| Discriminator | NVARCHAR(100) (NOT NULL) |
| IsActive | BIT (NOT NULL) |
| PhoneNumber | NVARCHAR(50) (NULLABLE) |
| HomeRoom | NVARCHAR(50) (NULLABLE) |
And the data might look something like this:
| Id | Discriminator | IsActive | PhoneNumber | HomeRoom |
|---- |------- |------- |------- |-------|
| 1 | Teacher | 1 | NULL | 2A |
| 2 | Administrator | 1 | 3256-6986 | NULL |
In this situation your duty table would have a FK of EmployeeID, which would join to this table. Conceptually it's simple, but you have to make sure you always add the discriminator when doing queries for a specific type. Using EF Core, this is mostly handled for you.
If you wanted to query out all the Teachers, you'd do a query something like this:
SELECT * FROM Employee WHERE Discriminator = 'Teacher'
The reason this is the default way in EF Core is that it allows for faster querying of data. Everything about a given type is stored in the one table. That benefit is also one of its weaknesses. You can see that a Teacher row has to store that its PhoneNumber is null, even though that data is not relevant to a teacher. This is okay in this small sample, but if you have a bunch of employee types that each have a lot of unique columns, you add a lot of bloat to your table. This can make it hard to understand what is going on.
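As a rough sketch of what this means for the Duty class in the question: it only needs one foreign key, pointing at the base Employee. The code below is plain in-memory C# rather than EF Core, the `EmployeeId` and `Description` members are assumptions made up for illustration, and `OfType<Teacher>()` merely stands in for the `WHERE Discriminator = 'Teacher'` filter.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Employee
{
    public string Id { get; set; } = "";
    public bool IsActive { get; set; }
}

public class Administrator : Employee
{
    public string PhoneNumber { get; set; } = "";
}

public class Teacher : Employee
{
    public string HomeRoom { get; set; } = "";
}

// One Duty class for every kind of employee: the FK targets the base class.
public class Duty
{
    public int Id { get; set; }
    public string Description { get; set; } = ""; // assumed column, for illustration only
    public string EmployeeId { get; set; } = "";  // FK to Employee.Id
}

public static class TphSketch
{
    // In-memory equivalent of joining Duty to Employee and filtering on the discriminator.
    public static List<string> DutiesOfTeachers(IEnumerable<Employee> employees, IEnumerable<Duty> duties)
    {
        return employees
            .OfType<Teacher>()
            .Join(duties, e => e.Id, d => d.EmployeeId, (e, d) => d.Description)
            .ToList();
    }
}
```

Adding an Administrator or any other subtype later does not touch Duty at all, which is the reuse the question asks for.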
Table-per-type
The second way to do it is by doing a table per type. In this way, you'd have an Employee table with just Id and IsActive. Then there'd be a separate table for Teacher/Administrator which holds the specific data for those classes.
The schemas might look something like this:
Employee
| Column | Type |
|---- |-------|
| Id | NVARCHAR(50) PK |
| IsActive | BIT (NOT NULL) |
Administrator
| Column | Type |
|---- |-------|
| EmployeeId | NVARCHAR(50) PK |
| PhoneNumber | NVARCHAR(50) (NULLABLE) |
Teacher
| Column | Type |
|---- |-------|
| EmployeeId | NVARCHAR(50) PK |
| HomeRoom | NVARCHAR(50) (NULLABLE) |
It's a bit harder to show what the data would look like, but essentially you'd have every employee inside the employee table. Then if they were a teacher, there would be a record in the Teacher table with a matching EmployeeId.
So if you wanted to query out all the teachers with this method, you'd do something like this:
SELECT * FROM Employee
INNER JOIN Teacher ON Teacher.EmployeeId = Employee.Id
The benefit to this way is that you aren't storing any wasted data.
The downside is that you could potentially make your queries a lot more complex by adding in a lot of joins. In the EF Core docs they mention this performance hit.
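To make the join concrete, here is a minimal in-memory sketch of the same query in C#. The record names (`EmployeeRow`, `TeacherRow`, `TeacherView`) are hypothetical and only mirror the tables above; an ORM would do the equivalent work when it materializes a Teacher.

```csharp
using System.Collections.Generic;
using System.Linq;

// Row shapes mirroring the Employee and Teacher tables.
public record EmployeeRow(string Id, bool IsActive);
public record TeacherRow(string EmployeeId, string HomeRoom);

// The combined shape that ends up representing one teacher.
public record TeacherView(string Id, bool IsActive, string HomeRoom);

public static class TptSketch
{
    // In-memory equivalent of:
    // SELECT * FROM Employee INNER JOIN Teacher ON Teacher.EmployeeId = Employee.Id
    public static List<TeacherView> LoadTeachers(IEnumerable<EmployeeRow> employees, IEnumerable<TeacherRow> teachers)
    {
        return employees
            .Join(teachers,
                  e => e.Id,
                  t => t.EmployeeId,
                  (e, t) => new TeacherView(e.Id, e.IsActive, t.HomeRoom))
            .ToList();
    }
}
```

Every additional subtype table means another join of this kind whenever you need the full object, which is where the extra query cost comes from.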
Alternative
I mentioned at the start that I would try to answer your question as is, which I have done. But I also think it's important to offer alternatives that may or may not apply. To me it seems like you're starting with an OOP approach to how you want your data to look, and then you're asking 'How can I map this to a database'. An OOP design doesn't map 100% onto schemas though, so instead start with a schema, then figure out how to represent that with classes.
Consider the following: Is it ever possible for an employee to have multiple roles? With your current system, someone is locked into being 1 type of employee. What if someone changes roles? Maybe they started as an Administrator but changed to a teacher? I'm not saying this is definitely an issue. Maybe you'd consider someone changing jobs a new employee, in which case your current way would work.
I'd propose structuring it completely differently though. Have your Employee table contain all the information all employees could need (no matter what role they have). Then have a separate table called something like Role, which has a row for all the different types of employees possible. For example it'd have a row for 'Teacher', 'Administrator', 'Grounds Keeper', etc. This way adding new roles is as easy as adding a row into a table.
Then you'd have a many-to-many table called EmployeeRole, which just stores an EmployeeID and RoleID (and maybe an IsActive to say if they are still in that role). That way a single employee can have multiple roles, and you'd have a history of the roles they've held before.
Then you could have your Duty table have a FK to this EmployeeRole table, instead of directly to the Employee table. Each role the employee has would then have a different set of duties.
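A minimal sketch of how those tables could look as classes, assuming integer keys, a surrogate `Id` on EmployeeRole for Duty to point at, and a `Description` column on Duty (none of which are fixed by the design above):

```csharp
using System.Collections.Generic;
using System.Linq;

public class Employee
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

// One row per kind of employee: 'Teacher', 'Administrator', 'Grounds Keeper', ...
public class Role
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

// Many-to-many link between Employee and Role.
public class EmployeeRole
{
    public int Id { get; set; }          // assumed surrogate key, so Duty can reference it
    public int EmployeeId { get; set; }  // FK to Employee.Id
    public int RoleId { get; set; }      // FK to Role.Id
    public bool IsActive { get; set; }   // false once the employee has left the role
}

public class Duty
{
    public int Id { get; set; }
    public int EmployeeRoleId { get; set; } // FK to EmployeeRole.Id
    public string Description { get; set; } = "";
}

public static class RoleSketch
{
    // Duties an employee has through the roles they currently hold.
    public static List<string> ActiveDuties(int employeeId, IEnumerable<EmployeeRole> employeeRoles, IEnumerable<Duty> duties)
    {
        return employeeRoles
            .Where(er => er.EmployeeId == employeeId && er.IsActive)
            .Join(duties, er => er.Id, d => d.EmployeeRoleId, (er, d) => d.Description)
            .ToList();
    }
}
```

Duties attached to an inactive EmployeeRole row stay in the database as history but drop out of the result.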
I'm not a Database designer though and this design is definitely not without its own flaws, so think about your use case and carefully design what limitations you need to allow for. Don't think about your code in OOP terms, create a class structure from your schema. Use my example as a way to think differently about the problem, but don't just follow what I proposed blindly.
Summing up
If you want to stick with your original design, I think Table-Per-Hierarchy works best for you. But I strongly recommend re-thinking your schema and figuring out what kind of limitations apply to it, and whether you need to allow for those.
Related
Currently working on some learning in Entity Framework Core and would appreciate some help as I've been stuck for a decent while.
I've got a set of Models:
Users (Usernames, emails, names, etc)
Students (Each student has a one-one relationship with a user, this is just for readability)
This is the part that is stumping me:
School - Each school has ONE admin (user), a name and a collection/list of classes.
School Class - Has ONE teacher (user) and a collection/list of students.
The issue I have is how do I set these models up and create a new one without having to pass a whole user object into the class/school during creation? E.g. my model might look like
class School
{
[Required]
public string id {get; set;}
[Required]
public User Admin {get; set;}
[Required]
public List<Student> Students {get; set;}
[Required]
public List<SchoolClass> Classes {get; set;}
}
(Note this is just quickly thrown together for the sake of the question)
I can provide more detail if asked but I feel my whole current approach is slightly wrong.
Essentially whenever I add a user I should be able to add a student (optional), then I should be able to create schools but ideally I don't want to have to fetch/pass in a whole user object / list of classes when initially creating a school -> I'd just like to pass in the ID of a user to be the admin for example.
Then when I create a SchoolClass, it shouldn't need to create new users, I should just be able to add current users. Not entirely sure where to go with this.
You can make it nullable with the nullable annotation (?):
public User? Admin {get; set;}
When you do your database calls, you'll add the Admin User object to the School object. However, it will not add the entire object to the database. It will only update the foreign key in the School table, and the Admin will not be marked for modification unless you explicitly do so.
Also be aware that many EF Core projects favor the Fluent API over attributes, since it can express configuration that attributes cannot.
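Another option, offered as an assumption about what the question is after rather than as part of the answer above, is to expose the foreign key as its own property next to the navigation property. By EF Core convention, a property named `AdminId` is paired with a navigation property named `Admin`, so setting the ID alone is enough to link to an existing user. The classes below are plain C#; `SchoolFactory` and the `Name` and `Username` members are hypothetical.

```csharp
public class User
{
    public string Id { get; set; } = "";
    public string Username { get; set; } = "";
}

public class School
{
    public string Id { get; set; } = "";
    public string Name { get; set; } = "";

    // Foreign key value: sufficient on its own to point at an existing user.
    public string AdminId { get; set; } = "";

    // Navigation property: may stay null until the related user is loaded.
    public User? Admin { get; set; }
}

public static class SchoolFactory
{
    // Hypothetical helper: builds a School from just the admin's ID,
    // without fetching or constructing a User object.
    public static School Create(string id, string name, string adminId)
    {
        return new School { Id = id, Name = name, AdminId = adminId };
    }
}
```

The same idea would apply to SchoolClass and its teacher: store a teacher ID and leave the navigation property unset when creating the class.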
I am attempting to create a new Web API using ASP.Net Core 3, Entity Framework Core, and AutoMapper against a previously existing database. I will try to explain my problem briefly.
For the sake of example, assume the database has the following tables:
Person
--------------------
person_id int PK
first_name varchar
last_name varchar
...other fields common to all Persons...
Owner
--------------------
owner_id int PK, FK Person.person_id
...fields specific to Owners...
Renter
--------------------
renter_id int PK, FK Person.person_id
...fields specific to Renters...
Note: The original developers of the database did not make the person_id an identity column. They use a [UniqueIds] table and a stored procedure to fetch and increment Ids for tables in the database.
In the context of the data model objects, Person, Owner, and Renter are all distinct classes with their own DbSet<> properties in the DbContext. The owner_id or renter_id is a person_id, and ties the tables together through a foreign key constraint.
From a Domain object perspective, I've designed the Owner and Renter domain classes as sub-classes of a Person class.
class Person { ... }
class Owner : Person { ... }
class Renter : Person { ... }
I'm still learning the plumbing of how to put all this together, which is proving to be difficult given the plethora of information available, which is sometimes incomplete, lacking context, or outdated. So I really could use some up-to-date guidance. What isn't clear to me:
If I were using the data model directly, my application would be responsible for knowing that in order to create a new Owner record (i.e., add a new class instance to DbSet<Owner>), a new Person record with the same ID must be added to DbSet<Person>. So, I am assuming that a domain model in the layer that sits atop the data/persistence layer has to do something similar, and that AutoMapper will take care of ensuring the domain object's properties are properly mapped to the data objects if properly configured.
With that in mind, when defining the domain model, should I define a separate OwnerId field in the Owner class and somehow map it in AutoMapper? This seems rather sloppy to me, and relies on the consumer of my domain to ensure that OwnerId and PersonId from the base class hold the same value when creating a new Owner.
It would seem to me that AutoMapper should support the ability to represent the inheritance (which it does with IMappingExpression.Include or IMappingExpression.IncludeBase) and map Owner.PersonId (Domain) to both Owner.OwnerId and Person.PersonId in the data objects. But I cannot find any practical examples of this. Perhaps my Google Fu is failing me and a similar question has been asked, but I could not find one. Any help or guidance would be much appreciated.
I am currently looking for a way to pass a foreign key to a table entry that is listed in one table and should be extracted in another table.
For example purposes I created this:
public class Parent
{
public string Name {get; set;}
public virtual ICollection<Child> Children {get; set;}
public virtual ICollection<School> Schools {get; set;}
}
public class Child
{
public string Name {get; set;}
public School School {get; set;} // Which should be a school Name that the Parent Should know?
}
public class School
{
//ParentID
//ChildID
public string SchoolName {get; set;}
}
How do I give my Child instance a SchoolName that the Parent contains within the SchoolNames?
Children and SchoolNames are separate tables, but a child only needs to know a specific entry.
Caveat
Your code does not work, since EF does not serialize collections of primitive types. EF Core does have value conversions but it is unclear what you're exactly looking for. I'm going to assume you meant to store these as actual School entities, since your question asks how to "extract one entry from a table".
For the sake of answering your question, I assume that your child should have a reference to the school entity, not a string property that's technically unrelated to the school entity itself, which would make it a question not related to Entity Framework and thus the question tags would be wrong.
I'll address both my assumption and your literal question, just to be sure.
If you need a relationship between a child and a school
From a purely database standpoint, there is no way to specify that an entity's (Child) foreign key should refer to an entity (School) which in and of itself has a foreign key to another entity (Parent). It simply doesn't exist in SQL and therefore EF cannot generate this behavior for you.
What you can do, is implement business validation on your code and refuse to store any child with a school that doesn't belong to its parent. Keep in mind, this requires you to load the parent and their schools every time you want to save a child to the database (because otherwise you can't check if the selected school is allowed for this child), so it will become a somewhat expensive operation.
However, that doesn't prevent the possibility for someone to introduce data into the database (circumventing your business logic, e.g. by a DBA) where this rule is violated but the FK constraint itself is upheld.
How you handle these bad data states is up to you. Do you remove those entries when you stumble upon them? Do you proactively scan the database once in a while? Do you allow it to exist but restrict your application's users to only choosing schools from the parent's scope? These are all business decisions that we cannot make for you.
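The business validation mentioned above could look roughly like the following. This is only a sketch: the `Id` properties and the `Parent` navigation on Child are assumptions that are not in the question's classes, and `ChildValidator` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;

public class School
{
    public int Id { get; set; }
    public string SchoolName { get; set; } = "";
}

public class Parent
{
    public string Name { get; set; } = "";
    public List<School> Schools { get; } = new List<School>();
}

public class Child
{
    public string Name { get; set; } = "";
    public Parent? Parent { get; set; } // assumed navigation back to the parent
    public School? School { get; set; }
}

public static class ChildValidator
{
    // True when the child's school is one of its parent's schools
    // (or when no school has been chosen yet).
    public static bool HasAllowedSchool(Child child)
    {
        var school = child.School;
        if (school == null) return true;

        var parent = child.Parent;
        if (parent == null) return false;

        return parent.Schools.Any(s => s.Id == school.Id);
    }
}
```

You would call a check like this before saving and refuse the save when it returns false, which is why the parent and its schools have to be loaded first.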
If a child needs a school name without a relation to the school itself
At first sight, this seems to me to be a bad solution. What happens when the school's name changes? Wouldn't you expect the child's schoolname to also change? Because that's not going to happen in your current setup.
In either case, if you are looking to set a string property, that's trivial, you simply set the property. Presumably, your question is how to restrict the user's options to the child's parent's schools.
This restrictive list can be fetched from the database using the child's identifier:
var childId = 123;
var schoolsFromParent = db
.Children
.Where(c => c.Id == childId)
.Select(c => c.Parent.Schools)
.FirstOrDefault();
Note that this code works regardless of whether you have a School entity or a list of strings - though the type of schoolsFromParent will be different.
And then restrict your end user to only being able to pick from the presented options. Note that to prevent bad data, you should double-check the chosen name after the user has selected it.
For school I'm working on a project in C# WPF and SQL Server. I made the database and use Linq to SQL. Now I got the following tables:
Patients
-------
ID int PK
name varchar
insurancecompany int FK
Insurancecompanies
-------
ID int PK
name varchar
insurancecompany in patients is a FK to id in insurancecompanies
I left out a lot of unnecessary columns for my question since it would take too long. So I added the database tables to my Linq to SQL database model. I created an instance to the patient class. Looking at it, I see 2 properties. One is insurancecompany, which is an int. The other is insurancecompany1, which is an insurancecompany type.
Is it safe to make the int private, or remove it? Or is there a way to make it so there's only one property?
What is happening is that the database model sees that you have a foreign key relationship to Insurancecompanies. It looks at the value you've assigned, finds that insurance company, and adds it as an additional property, which it calls "insurancecompany1" (it would have called it "insurancecompany", but couldn't because you already have a property with that name).
This is a nice feature because it makes it easy to look at the insurance company for a given patient without needing to use joins;
var dave = new patient();
//assign a patient from your database to "dave" here..
var nameOfDavesInsuranceCompany = dave.insurancecompany1.name;
If you remove the int insurancecompany you will lose this feature because the model would no longer be able to work out which insurance company to look at. You could make it private, but you would lose the ability to assign an insurance company to a patient by simply giving an int value (you would always have to set an insurancecompany object to insurancecompany1).
If you don't like the names, you could rename insurancecompany to something like insuranceCompanyId and then call insurancecompany1 insuranceCompany.
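To show why the two properties belong together, here is a simplified hand-written approximation of the pattern, using the renamed properties. It is not the code the Linq to SQL designer actually generates (that uses `EntityRef<T>` and change notifications); it only illustrates how the int and the object are kept in step.

```csharp
public class InsuranceCompany
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public class Patient
{
    private InsuranceCompany? _insuranceCompany;

    public int Id { get; set; }
    public string Name { get; set; } = "";

    // The foreign key column: can be assigned directly with a plain int.
    public int InsuranceCompanyId { get; set; }

    // The related object: assigning it also updates the foreign key value.
    public InsuranceCompany? InsuranceCompany
    {
        get => _insuranceCompany;
        set
        {
            _insuranceCompany = value;
            if (value != null)
            {
                InsuranceCompanyId = value.Id;
            }
        }
    }
}
```

Removing the int property would leave nothing for the object property to be resolved from when a patient row is loaded.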
So, I'd love some feedback on the best way to design the classes and store the data for the following situation:
I have an interface called Tasks that looks like this:
interface ITask
{
int ID{ get; set;}
string Title {get; set;}
string Description{get; set;}
}
I would like the ability to create different types of Tasks depending on who is using the application...for example:
public class SoftwareTask: ITask
{
//ITask Implementation
string BuildVersion {get; set;}
bool IsBug {get; set;}
}
public class SalesTask: ITask
{
//ITask Implementation
int AccountID {get; set;}
int SalesPersonID {get; set;}
}
So the way I see it I can create a Tasks table in the database with columns that match the ITask interface and a column that shoves all of the properties of more specific tasks in a single column (or maybe even serialize the task object into a single column)
OR
Create a table for each task type to store the properties that are unique to that type.
I really don't like either solution right now. I need to be able to create different types of Tasks ( or any other class) that all share a common core set of properties and methods through a base interface, but have the ability to store their unique properties in a fashion that is easy to search and filter against without having to create a bunch of database tables for each type.
I've started looking into Plug-In architecture and the strategy pattern, but I don't see where either would address my problem with storing and accessing the data.
Any help or push in the right direction is greatly appreciated!!!
Your second approach (one table per type) is the canonical way to solve this problem - while it requires a bit more effort to implement it fits better with the relational model of most databases and preserves a consistent and cohesive representation of the data. The approach of using one table per concrete type works well, and is compatible with most ORM libraries (like EntityFramework and NHibernate).
There are, however, a couple of alternative approaches sometimes used when the number of subtypes is very large, or subtypes are created on the fly.
Alternative #1: The Key-Value extension table. This is a table with one row per additional field of data you wish to store, a foreign key back to the core table (Task), and a column that specifies what kind of field this is. Its structure is typically something like:
TaskExt Table
=================
TaskID : Number (foreign key back to Task)
FieldType : Number or String (this would be AccountID, SalesPersonID, etc)
FieldValue : String (this would be the value of the associated field)
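As a small illustration of reading such a table back, the sketch below gathers the rows for one task into a field-name lookup. `TaskExtRow` and `KeyValueSketch` are hypothetical names, and the rows are held in memory rather than queried from a database.

```csharp
using System.Collections.Generic;
using System.Linq;

// One row of the TaskExt table: a single extra field of a single task.
public record TaskExtRow(int TaskId, string FieldType, string FieldValue);

public static class KeyValueSketch
{
    // Collects all extension rows of one task into a field name -> value lookup.
    // Values stay strings; converting them to int, bool, etc. is up to the caller.
    public static Dictionary<string, string> FieldsFor(int taskId, IEnumerable<TaskExtRow> rows)
    {
        return rows
            .Where(r => r.TaskId == taskId)
            .ToDictionary(r => r.FieldType, r => r.FieldValue);
    }
}
```

Because every value is stored as a string, type safety and per-field constraints have to be enforced in application code.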
Alternative #2: The Type-Mapped Extension Table. In this alternative, you create a table with a bunch of nullable columns of different data types (numbers, strings, date/time, etc) with names like DATA01, DATA02, DATA03 ... and so on. For each kind of Task, you select a subset of the columns and map them to particular fields. So, DATA01 may end up being the BuildVersion for a SoftwareTask and an AccountName for a SalesTask. In this approach, you must manage some metadata somewhere that control which column you map specific fields to. A type-mapped table will often look something like:
TaskExt Table
=================
TaskID : Number (foreign key back to task)
Data01 : String
Data02 : String
Data03 : String
Data04 : String
Data05 : Number
Data06 : Number
Data07 : Number
Data08 : Number
Data09 : Date
Data10 : Date
Data11 : Date
Data12 : Date
// etc...
The main benefit of option #1 is that you can dynamically add as many different fields as you need, and you can even support a level of backward compatibility. A significant downside, however, is that even simple queries can become challenging because fields of the objects are pivoted into rows in the table. Unpivoting turns out to be an operation that is both complicated and often poorly performing.
The benefit of option #2 is that it's easy to implement and preserves a 1-to-1 correspondence between rows, making queries easy. Unfortunately, there are some downsides to this as well. The first is that the column names are completely uninformative, and you have to refer to some metadata dictionary to understand which column maps to which field for which type of task. The second downside is that databases limit the number of columns on a table (for example, 1,024 in SQL Server and 1,600 in PostgreSQL). As a result, you can only have so many numeric, string, datetime, etc. columns available to use. So if your type ends up having more DateTime fields than the table supports, you have to either use string fields to store dates or create multiple extension tables.
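The metadata dictionary for option #2 can be as simple as a nested lookup. The mapping below is made up for illustration (only BuildVersion and AccountName sharing Data01 comes from the description above), and `TypeMappedSketch` is a hypothetical name.

```csharp
using System.Collections.Generic;

public static class TypeMappedSketch
{
    // Hypothetical metadata: task type -> (logical field name -> physical column).
    private static readonly Dictionary<string, Dictionary<string, string>> ColumnMap =
        new Dictionary<string, Dictionary<string, string>>
        {
            ["SoftwareTask"] = new Dictionary<string, string>
            {
                ["BuildVersion"] = "Data01", // string column
                ["IsBug"] = "Data05"         // number column, storing 0 or 1
            },
            ["SalesTask"] = new Dictionary<string, string>
            {
                ["AccountName"] = "Data01",  // same physical column, different meaning
                ["AccountID"] = "Data05",
                ["SalesPersonID"] = "Data06"
            }
        };

    // Resolves which generic column holds a given field for a given task type.
    // Throws KeyNotFoundException for an unmapped type or field.
    public static string ColumnFor(string taskType, string fieldName)
    {
        return ColumnMap[taskType][fieldName];
    }
}
```

Every query against the extension table has to go through a lookup like this, which is the price paid for the uninformative column names.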
Be forewarned, most ORM libraries do not provide built-in support for either of these modeling patterns.
You should probably take a lead from how ORMs deal with this, using patterns like TPH/TPC/TPT.
Given that ITask is an interface, you should probably go for TPC (Table per Concrete Type). If you make it a base class, TPT and TPH are also options.