I'm developing a web app (not ASP.NET), and I encountered a small architectural problem:
I have two classes for working with users:
public class User
{
public int Id { get; set; }
public string Username { get; set; }
public string Password { get; set; }
// Other properties...
}
public class Profile
{
public int Id { get; set; }
public string PhotoUrl { get; set; }
public string DisplayName { get; set; }
public string FirstName { get; set; }
public string LastName { get; set; }
public List<PostItem> Posts { get; set; }
}
I had to split these classes because there is a feature that lets you view the profile of a particular member, and obviously you don't want to retrieve the user's password, name and other private data from the database for that (even though it isn't displayed in the view). So I'm storing this data in different tables: the Users table contains private information, while the Profiles table contains public information (it can be viewed by anyone).
But at the same time, in order not to break the single responsibility principle, I had to implement UserRepository and ProfileRepository classes that do the checking, adding and other work.
And here the problems come:
Issue 1: The code that handles user registration has turned into real hell: I have to check whether a record with a specific username exists in two different tables by instantiating two repositories.
Issue 2: The page that shows public data also needs to display the latest posts. I can't store complex values in one column, so I have to store posts in another table too. That means I need to implement a PostRepository, and at the same time the Posts property in the Profile class becomes useless (though I need it to display the latest posts in the view): to retrieve the latest posts you need to look through another table inside UserRepository, but that should be handled by PostRepository. The same goes for comments, for example.
So, this is my small problem. Any advice?
OK, taking each item in turn:
1) It's perfectly normal to have the identity of a user checked through one repository and their permissions for your application stored in another. In fact, this is the basic idea behind federated identity. Consider that you might extend your application to allow identity to be provided by Facebook but permissions by your own application, and you will see that separating them makes sense.
2) Yes, absolutely. What makes you think that a high-volume store like Posts is best served by the same repository that holds a low-change-rate set of data like Permissions? One might be in Mongo and the other in Active Directory, with the identity coming from OAuth. Because you own the whole application, you see these as unnecessary complexities, whereas they represent good architectural separation.
Identity => not owned by your application. Slow change rate.
Permissions => owned by your application. Slow change rate.
Posts => owned by your application. Fast change rate.
Just looking at those three use-cases, it seems that using different repositories would be a good idea since they have such different profiles. If ultimately your repositories all map to a SQL Server (or other) implementation, then so be it; but by separating these architecturally you can use the best possible underlying implementation.
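To make the registration case concrete, here is a minimal sketch in which a small service coordinates the two repositories, so the workflow lives in one place while each repository still owns one table. All names here (IUserRepository, IProfileRepository, RegistrationService and the in-memory classes) are hypothetical and only stand in for your real data access; the sketch also assumes the username is checked once, in the Users store.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical repository interfaces; names and members are illustrative only.
public interface IUserRepository
{
    bool UsernameExists(string username);
    int Add(string username, string passwordHash);
}

public interface IProfileRepository
{
    void Add(int userId, string displayName);
}

// In-memory stand-ins so the sketch runs without a database.
public class InMemoryUserRepository : IUserRepository
{
    private readonly Dictionary<string, int> _idsByUsername =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

    public bool UsernameExists(string username) => _idsByUsername.ContainsKey(username);

    public int Add(string username, string passwordHash)
    {
        int id = _idsByUsername.Count + 1;
        _idsByUsername.Add(username, id);
        return id;
    }
}

public class InMemoryProfileRepository : IProfileRepository
{
    public Dictionary<int, string> DisplayNamesByUserId { get; } = new Dictionary<int, string>();

    public void Add(int userId, string displayName) => DisplayNamesByUserId[userId] = displayName;
}

// The service owns the registration workflow; each repository still owns one table.
public class RegistrationService
{
    private readonly IUserRepository _users;
    private readonly IProfileRepository _profiles;

    public RegistrationService(IUserRepository users, IProfileRepository profiles)
    {
        _users = users;
        _profiles = profiles;
    }

    // Returns the new user id, or null when the username is already taken.
    public int? Register(string username, string passwordHash, string displayName)
    {
        if (_users.UsernameExists(username))
            return null;

        int userId = _users.Add(username, passwordHash);
        _profiles.Add(userId, displayName); // the profile shares the user's id
        return userId;
    }
}
```

The calling code then depends on RegistrationService only, and either repository can later be moved to a different store without touching the registration logic.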
I use SQL Azure and have an application that syncs data with an external resource. The data set is large, approx. 10K records, so I load it from the DB once, update entities where necessary over several minutes, and then save the changes. It works, but there is a problem with simultaneous access to the data: if another service makes changes during those minutes, its changes are overwritten.
But in most cases this concerns fields that my application does not touch!
So, for example, my Table Device:
public partial class Device : BaseEntity
{
public string Name { get; set; }
public string IMEI { get; set; }
public string SN { get; set; }
public string ICCID { get; set; }
public string MacAddress { get; set; }
public DeviceStatus Status { get; set; }
// Other properties...
}
The first service (the application with the long-running process) can modify SN, ICCID and MacAddress, but not Status; the second service, vice versa, can modify only Status.
Code to update in the first service:
_allLocalDevicesWithIMEI = _context.GetAllDevicesWithImei().ToList();
(it gets entities, not DTOs, because many fields can really be changed)
and then:
_context.Devices.Update(localDevice);
for every device that should be changed
and, eventually:
await _context.SaveChangesAsync();
How can I mark the Status field so that it is excluded from change tracking?
One simple way to avoid updating the Status field from the first service is to create an update entity that does not include the Status field, and another update entity for the second service that does include it.
Another way is to override the SaveChangesAsync method and control the update logic yourself, but I think that is complex, and the behavior is implicit, so it will not be easy for others to understand your code.
To avoid overwrites, you can specify a RowVersion on the entities. This is so-called optimistic concurrency: it throws an error if an overwrite would happen, and you can retry the operation if someone has already changed something. Or you can raise your transaction isolation level to something like RepeatableRead/Serializable to lock these rows for the entire operation (which will of course have a huge performance impact and cause timeouts). The second option is simple and good enough for background jobs and distributed transactions; the first is more flexible and usually faster, but harder to implement across multiple endpoints/entities.
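The row-version idea can be sketched without any database. The example below is a simplified in-memory model, not EF code: DeviceTable, TryWrite and UpdateWithRetry are hypothetical names, and the Device class is cut down to three fields. In EF Core the same pattern is usually expressed with a row-version property on the entity and a DbUpdateConcurrencyException when the check fails.

```csharp
using System;
using System.Collections.Generic;

public enum DeviceStatus { Inactive, Active }

public class Device
{
    public int Id { get; set; }
    public string SN { get; set; }
    public DeviceStatus Status { get; set; }
    public int RowVersion { get; set; } // bumped on every successful write
}

// Hypothetical in-memory table standing in for the database.
public class DeviceTable
{
    private readonly Dictionary<int, Device> _rows = new Dictionary<int, Device>();

    public void Insert(Device device) => _rows[device.Id] = Clone(device);

    public Device Read(int id) => Clone(_rows[id]);

    // Writes only if nobody changed the row since it was read.
    public bool TryWrite(Device changed)
    {
        Device current = _rows[changed.Id];
        if (current.RowVersion != changed.RowVersion)
            return false; // concurrent change detected, nothing is written

        Device stored = Clone(changed);
        stored.RowVersion++;
        _rows[changed.Id] = stored;
        return true;
    }

    private static Device Clone(Device d) => new Device
    {
        Id = d.Id, SN = d.SN, Status = d.Status, RowVersion = d.RowVersion
    };
}

public static class DeviceSync
{
    // Re-reads the row and re-applies the change until the write succeeds.
    public static void UpdateWithRetry(DeviceTable table, int id, Action<Device> apply, int maxAttempts = 3)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            Device fresh = table.Read(id);
            apply(fresh);
            if (table.TryWrite(fresh))
                return;
        }
        throw new InvalidOperationException("Row kept changing; giving up.");
    }
}
```

Because the retry re-reads the row before applying only its own fields, a Status change made by the other service in the meantime survives the SN update.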
EDIT 1: story example at the end
Years ago, we created tables in order to count how many products there were in our boxes.
There are two simple tables:
product (
code VARCHAR(16) PK,
length INT,
width INT,
height INT
)
box (
pkid INT IDENTITY(1,1),
barcode varchar(18),
product_code VARCHAR(16) FK,
quantity INT
)
And there are two associated classes:
public struct Product
{
public string Code { get; set; }
public int Length { get; set; }
public int Width { get; set; }
public int Height { get; set; }
}
public struct Box
{
public int Id { get; set; }
public string BarCode { get; set; }
public Product Product { get; set; }
public int Quantity { get; set; }
}
After years, we need to put multiple different products in the same box, so we now need this:
product (
code VARCHAR(16) PK,
length INT,
width INT,
height INT
)
-- box changed
box (
pkid INT IDENTITY(1,1),
barcode varchar(18)
)
-- stock created
stock (
box_pkid INT M-PK FK,
product_code VARCHAR(16) M-PK FK,
quantity INT
)
and this:
public struct Product
{
public string Code { get; set; }
public int Length { get; set; }
public int Width { get; set; }
public int Height { get; set; }
}
public struct Box
{
public int Id { get; set; }
public string BarCode { get; set; }
public Dictionary<Product, int> Content { get; set; } // <-- this changed
public int Quantity { get; set; }
}
But after years, we have a lot of code, maybe with duplicates in some dark places, left by collaborators who have since gone. I am a trainee, so I am asking for my future experience, in order to avoid this later.
What could be a solution to update our schema and keep data integrity safe, even with millions of rows in the DB?
Example:
In 2014, we needed to store 10 Romeo and Juliet books in one box. If we had some Hamlet books, then we put them in another box. All 10 Romeo and Juliet books were the 'same' product (same cover, same content, same reference).
Today, we want to store, let's say, different Shakespeare books in the same box. Or maybe different Love books. Or even Romeo and Juliet books AND figurines? So different products together: we should change the box table and Box class, shouldn't we?
You have many challenges; I'd split them into two high-level groups.
Firstly, how do you change your application at the code level, and secondly, how do you migrate your data from the old schema to the new one.
How do you change your code?
The first question is: can you be 100% certain that the classes you list are the only ways the data is accessed and modified? Are there any triggers, stored procedures, batch jobs, or other applications? I'm not aware of any way of finding this out other than by trawling through both the database schema artifacts, and the code base.
Within your "own" application, you have a choice. It's usually better to extend than modify your interface. In practical terms, that means that instead of changing the public Product Product { get; set; } signature to handle a dictionary, you keep it around, and add public Dictionary<Product, int> Content { get; set; } - if you can guarantee that the old member still works. This limits how much of your class's dependencies you have to rewrite - you only have to worry about clients that need to understand that there could be more than 1 product in a box.
This allows you to follow a "lots of small changes, but the existing code continues to work" model; you can manage this via feature toggles etc. It's much lower risk - so the lesson here is "design your solution to be open to extension, but closed to change".
In this case, it doesn't seem possible - the "set" method may be okay (you can default that to a "one product in a box" solution), but the "get" method would have no graceful way of handling the case where you have more than 1 product in a box. If that's true, you change the class, and look for all the instances where your code won't compile, and follow the chain of dependencies.
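A small sketch of what the "extend" option would look like here, and why the getter is the weak point. This is an illustration under assumptions, not the asker's real code: Box is written as a class rather than a struct for brevity, Product is reduced to its Code, and throwing from the getter is only one possible choice for the multi-product case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public struct Product
{
    public string Code { get; set; }
}

public class Box
{
    public int Id { get; set; }
    public string BarCode { get; set; }

    // New model: any number of products, each with its own quantity.
    public Dictionary<Product, int> Content { get; set; } = new Dictionary<Product, int>();

    // Old single-product view, kept only so existing callers still compile.
    [Obsolete("Use Content; a box may now hold several products.")]
    public Product Product
    {
        get
        {
            // There is no graceful answer once a box holds several products.
            if (Content.Count != 1)
                throw new InvalidOperationException(
                    $"Box {Id} holds {Content.Count} products; the single-product view is undefined.");
            return Content.Keys.Single();
        }
        set
        {
            // Default to "one product in a box", keeping the current quantity if there is one.
            int quantity = Content.Count == 1 ? Content.Values.Single() : 0;
            Content = new Dictionary<Product, int> { [value] = quantity };
        }
    }
}
```

Old callers keep working for single-product boxes and fail loudly, at run time, as soon as they meet a mixed box, which is exactly the risk that pushes you towards changing the class and following the compile errors instead.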
For instance, in a typical MVC framework, in this case you'd be changing the model; this should cause the controller to report a compile error. In resolving that error, you almost certainly modify the signature of the controller methods. This in turn should break the view. So you follow that chain; doing this means your schema change becomes a "big bang, all-or-nothing" release. This typically is stressful for all involved...
How do you release your change?
This depends hugely on which of the two options you've chosen. @gburton's answer covers the database steps; these are necessary with both code options.
The second challenge is releasing new versions of your software; if it's a desktop client, for instance, you must make sure all clients are updated at the same time as your database change. This is (usually) horrible. If it's a web application, it's usually a little easier - fewer client machines to worry about.
Safely updating a legacy system is a classic problem. I'm guessing from your post that there isn't a nice safe dev copy of the DB, or at least one that is up to date, or you would already have a process to apply here.
I've written this in a system agnostic way even though you're obviously using MS SQL Server.
The key is to use caution, and ensure you are never 100% stuck if something goes wrong.
1. Back up the old DB. Ensure you know how to do this without breaking anything.
2. Restore that backup into a new location.
3. Figure out a test plan (this can be the longest part of the job).
4. Make the changes to the new copy of the DB (don't touch the live one).
5. Run through your test plan to ensure nothing has been broken.
If step 5 showed some errors, you just have to work through them. Once this is done, you have the scary part. The backup restore drill is critical here.
6. Take a backup of the live database (your previous backup is probably out of date; you want as fresh a backup as possible to reduce data loss).
7. Run a backup restore drill to make 100% sure you can recover.
8. Apply the changes to the live database.
9. Re-run your tests.
Recovering a database down to the individual transaction is possible with many database engines. Consider using that process for step 6 if possible. How to achieve this would be a separate question.
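One check that fits naturally into the test plan is to confirm that the schema change did not lose or invent stock. The sketch below models the old and new tables from the question as plain in-memory rows and compares the total quantity per product before and after the split. The record and method names are hypothetical, and against a real database the same comparison would be run as queries on the restored copy.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Row shapes mirror the old and new tables from the question.
public record OldBoxRow(int PkId, string Barcode, string ProductCode, int Quantity);
public record BoxRow(int PkId, string Barcode);
public record StockRow(int BoxPkId, string ProductCode, int Quantity);

public static class BoxMigration
{
    // Each old row becomes one box row plus one stock row.
    public static (List<BoxRow> Boxes, List<StockRow> Stock) Split(IEnumerable<OldBoxRow> oldRows)
    {
        var rows = oldRows.ToList();
        var boxes = rows.Select(r => new BoxRow(r.PkId, r.Barcode)).ToList();
        var stock = rows.Select(r => new StockRow(r.PkId, r.ProductCode, r.Quantity)).ToList();
        return (boxes, stock);
    }

    // Integrity check: the quantity per product must be identical before and after.
    public static bool TotalsMatch(IEnumerable<OldBoxRow> oldRows, IEnumerable<StockRow> stock)
    {
        var before = oldRows.GroupBy(r => r.ProductCode)
                            .ToDictionary(g => g.Key, g => g.Sum(r => r.Quantity));
        var after = stock.GroupBy(s => s.ProductCode)
                         .ToDictionary(g => g.Key, g => g.Sum(s => s.Quantity));

        return before.Count == after.Count
            && before.All(kv => after.TryGetValue(kv.Key, out int total) && total == kv.Value);
    }
}
```

Running TotalsMatch after step 4 and again after step 8 gives a cheap, repeatable signal that the migrated data still adds up.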
Is it OK (read: good practice) to re-use POCOs for the request and response DTOs? Our POCOs are lightweight (OrmLite) with only properties and some decorating attributes.
Or, should I create other objects for the request and/or response?
Thanks,
I would say it depends on your system, and ultimately how big and complex it is now, and has the potential to become.
The ServiceStack documentation doesn't specify which design pattern you should use. Ultimately it provides the flexibility for separating the database model POCO's from the DTOs, but it also provides support for their re-use.
When using OrmLite:
OrmLite was designed so that you could re-use your data model POCOs as your request and response DTOs. As noted from the ServiceStack documentation, this was an intentional design aim of the framework:
The POCOs used in Micro ORMS are particularly well suited for re-using as DTOs since they don't contain any circular references that the Heavy ORMs have (e.g. EF). OrmLite goes 1-step further and borrows pages from NoSQL's playbook where any complex property e.g. List is transparently blobbed in a schema-less text field, promoting the design of frictionless Pure POCOS that are uninhibited by RDBMS concerns.
Consideration:
If you do opt to re-use your POCOs, because it is supported, you should be aware that there are situations where it will be smarter to use separate request and response DTOs.
In many cases these POCO data models already make good DTOs and can be returned directly instead of mapping to domain-specific DTOs.
^ Not all cases. Sometimes the difficulty of choosing your design pattern is foreseeing the cases where it may not be suitable for re-use. So hopefully a scenario will help illustrate a potential problem.
Scenario:
You have a system where users can register for your service.
You, as the administrator, have the ability to list users of your service.
If you take the OrmLite POCO re-use approach, then we may have this User POCO:
public class User
{
[PrimaryKey, AutoIncrement, Alias("Id")]
public int UserId { get; set; }
public string Username { get; set; }
public string Password { get; set; }
public string Salt { get; set; }
public bool Enabled { get; set; }
}
When you make your Create User request you populate Username and Password of your User POCO as your request to the server.
We can't just push this POCO into the database because:
The password in the Password field will be plain text. We are good programmers, and security is important, so we need to create a salt which we add to the Salt property, and hash Password with the salt and update the Password field. OK, that's not a major problem, a few lines of code will sort that before the insert.
The client may have set a UserId, but for create this wasn't required and will cause our database query to fail the insert. So we have to default this value before inserting into the database.
The Enabled property may have been passed with the request. What if somebody has set this? We only wanted to deal with Username and Password, but now we have to consider other fields that would affect the database insert. Similarly they could have set the Salt (though this wouldn't be a problem because we would be overriding the value anyway). So now you have added validation to do.
But now consider when we come to returning a List<User>.
If you re-use the POCO as your response type, there are a lot of fields that you don't want exposed back to the client. It wouldn't be smart to do:
return Db.Select<User>();
Because you don't have a tight purpose built response for listing Users, the Password hash and the Salt would need to be removed in the logic to prevent it being serialised out in the response.
Consider also that during the registration of a user, that as part of the create request we want to ask if we should send a welcome email. So we would update the POCO:
public class User
{
// Properties as before
...
[Ignore] // This isn't a database field
public bool SendWelcomeEmail { get; set; }
}
We now have the additional property that is only useful in the user creation process. If you use the User POCO over and over again, you will find over time you are adding more and more properties that don't apply to certain contexts.
When we return the list of users, for example, there is now an optional property of SendWelcomeEmail that could be populated - it just doesn't make sense. It can then be difficult to maintain the code.
A key thing to remember when sharing a POCO as both a request and a response object: properties that you send in a response will also be accepted in a request. You will have to do more validation on requests, so ultimately sharing the POCO may not save effort.
In this scenario wouldn't it be far easier to do:
public class CreateUserRequest
{
public string Username { get; set; }
public string Password { get; set; }
public bool SendWelcomeEmail { get; set; }
}
public class UserResponse
{
public int UserId { get; set; }
public string Username { get; set; }
public bool Enabled { get; set; }
}
public class User
{
[PrimaryKey, AutoIncrement, Alias("Id")]
public int UserId { get; set; }
public string Username { get; set; }
public string Password { get; set; }
public string Salt { get; set; }
public bool Enabled { get; set; }
}
We know now when we create a request (CreateUserRequest) that we don't have to consider UserId, Salt or Enabled.
When returning a list of users it's now List<UserResponse> and there is no chance the client will see any properties we don't want them to see.
It's clear to other people looking at the code, the required properties for requests, and what will be exposed in response.
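The price of separate DTOs is a little mapping code. A minimal sketch of that mapping follows; it assumes .NET 6 or later for the static RandomNumberGenerator.GetBytes and Rfc2898DeriveBytes.Pbkdf2 helpers. UserMapper, the iteration count and the choice of Enabled = true are my own assumptions for illustration, and the OrmLite attributes are left off the User class to keep the block dependency-free.

```csharp
using System;
using System.Security.Cryptography;

public class CreateUserRequest
{
    public string Username { get; set; }
    public string Password { get; set; }
    public bool SendWelcomeEmail { get; set; }
}

public class UserResponse
{
    public int UserId { get; set; }
    public string Username { get; set; }
    public bool Enabled { get; set; }
}

// Data model as in the answer, without the ORM attributes.
public class User
{
    public int UserId { get; set; }
    public string Username { get; set; }
    public string Password { get; set; }
    public string Salt { get; set; }
    public bool Enabled { get; set; }
}

public static class UserMapper
{
    private const int Iterations = 100_000; // assumed work factor, tune for your hardware

    // Builds the row to insert; UserId, Salt and Enabled never come from the client.
    public static User ToNewUser(CreateUserRequest request)
    {
        byte[] salt = RandomNumberGenerator.GetBytes(16);
        byte[] hash = Rfc2898DeriveBytes.Pbkdf2(
            request.Password, salt, Iterations, HashAlgorithmName.SHA256, 32);

        return new User
        {
            Username = request.Username,
            Password = Convert.ToBase64String(hash),
            Salt = Convert.ToBase64String(salt),
            Enabled = true // server-side default (assumption)
        };
    }

    // Copies only the fields that are safe to expose.
    public static UserResponse ToResponse(User user) => new UserResponse
    {
        UserId = user.UserId,
        Username = user.Username,
        Enabled = user.Enabled
    };
}
```

Listing users then becomes a projection of the stored rows through ToResponse, so the hash and salt cannot leak by accident.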
Summary:
Sorry, it's a really long answer, but I think this addresses an aspect of sharing POCOs that some people miss, or fail to grasp initially, I was one of them.
Yes you can re-use POCOs for requests and response.
The documentation says it's OK to do so. In fact it is by design.
In many cases it will be fine to re-use.
There are cases where it's not suitable. (My scenario tries to show this, but you'll find as you develop your own real situations.)
Consider how many additional properties may be exposed because your shared POCO tries to support multiple actions, and how much extra validation work may be required.
Ultimately it's about what you are comfortable maintaining.
Hope this helps.
We take a different approach, and my answer is opinionated,
because we work not only with C# clients, but mainly with JavaScript clients.
The request and response DTOs, the routes and the data entities are negotiated between
the customer and the front-end analyst. They are part of the specs in detailed form,
even if the "customer", in some cases, is our own product UI.
These DTOs don't change without an important reason and can be reused on both sides.
The objects in the data layer, however, can be the same class, a partial class or a different one.
They can change internally, including sensitive or workflow information,
but they have to stay compatible with the specification of the API.
We start with the API first, not the database or ORM.
Person { ... }
Address { ... }
ResponseDTO
{
public bool success {get; set;}
public string message {get; set;}
public Person person {get; set;}
public List<Address> addresses {get; set;}
//customer can use the Person & Address, which can be the same or different
//with the ones in the data-layer. But we have defined these POCO's in the specs.
}
RequestDTO
{
public int Id {get; set;}
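public int? FilterByAge {get; set;} // type assumed; it is missing in the original
public string FilterByZipCode {get; set;} // type assumed; it is missing in the original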
}
UpdatePersonRequest
{
public int Id {get; set;}
public bool IsNew {get; set;}
public Person person {get; set;}
public List<Address> addresses {get; set;}
}
We don't expose only the request or response DTOs.
Person and Address are negotiated with the customer and are referenced in the API specs.
They can be the same as, a subset of, or different from the internal data-layer implementation.
The customer will use them in their application, web site or mobile app,
but the important thing is that we design and negotiate the API interface first.
We also often pass the request DTO as a parameter to the business-layer function,
which returns the response object or collection.
This way the service code is a thin wrapper in front of the business layer.
ResponseDTO Get(RequestDTO request)
{
return GetPersonData(request);
}
See also the API-first development approach described in the ServiceStack wiki.
This will not be a problem given you are OK with exposing the structure of your data objects (if this is a publicly consumed API). Otherwise, Restsharp is made to be used with simple POCOs :)
I think it all depends on how you're using your DTOs, and how you want to balance reusability of code against readability. If your requests and responses both use a majority of the properties on your DTOs, then you'll get a lot of reuse without really lowering readability. If, for instance, your request object has 10 properties but your response only needs 1 of them (or vice versa), someone could argue that it's easier to understand if your response object had only that 1 property on it.
In summary, good practice is just clean code. You have to evaluate your particular use case on whether or not your code is easy to use and read. Another way to think of it, is to write code for the next person who will read it, even if that person is you.
Summary
I am currently prototyping a (very straightforward?) multi-tenant web application where users (stored in database 1) can register with different tenants (stored in one database per tenant, all with the same db schema). I thought this architecture would apply to a lot of multi-tenant solutions.
Sadly, I found out that cross database relations are not supported in Entity Framework (I assumed it's still the case for EF6). I provided the links below.
The next short sections explain my problem, and ultimately my question(s).
The rationale behind the design
I chose to have separate databases: one for users (1), and one for each tenant with their customer-specific information. That way a user does not have to create a new account when he joins another tenant (one customer can have different domains for different departments).
How it's implemented
I implemented this using two different DbContexts, one for the users, and one for the tenant information. In the TenantContext I define DbSets which holds entities which refer to the User entity (navigation properties).
The 'per-tenant' context:
public class CaseApplicationContext : DbContext, IDbContext
{
public DbSet<CaseType> CaseTypes { get; set; }
public DbSet<Case> Cases { get; set; }
// left out some irrelevant code
}
The Case entity:
[Table("Cases")]
public class Case : IEntity
{
public int Id { get; set; }
public User Owner { get; set; } // <== the navigation property
public string Title { get; set; }
public string Description { get; set; }
public Case()
{
Tasks = new List<Task>();
}
}
The User entity
[Table("Users")]
public class User : IEntity
{
public int Id { get; set; }
public string Name { get; set; }
public string EmailAddress { get; set; }
public string Password { get; set; }
}
This User entity is also contained by the Users database by my other DbContext derivative:
public class TenantApplicationContext : DbContext, IDbContext
{
public DbSet<Tenant> Tenants { get; set; }
public DbSet<User> Users { get; set; } // <== here it is again
// left out irrelevant code
}
Now, what goes wrong?
Expected:
What I (in all my stupidity) thought that would happen is that I would actually create a cross database relation:
The 'per-tenant' database contains a table 'Cases'. This table contains rows with a 'UserID'. The 'UserID' refers to the 'Users' database.
Actual:
When I start adding Cases I am also creating another table 'Users' in my 'per-tenant' database. In my 'cases' table the UserID refers to the table in the same database.
Cross database relations do not exist in EF
So I started googling, and found that this feature simply is not supported. This made me think, should I even use EF for an application like this? Should I move towards NHibernate instead?
But I also can't imagine that the huge market for multi tenant applications simply is ignored by Microsoft's Entity Framework?! So I most probably am doing something rather stupid.
Finally, the question...
I think the main question is about my 'database design'. Since I am new to EF and learning as I go, I might have taken the wrong turn on several occasions (is my design broken?). Since SO is well represented with EF experts I am very eager to learn which alternatives I could use to achieve the same thing (multi tenant, shared users, deployable in azure). Should I use one single DbContext and still be able to deploy a multi tenant web-application with a shared Users database?
I'd really appreciate your help!
Things learned:
NHibernate does support cross-database relations (but I want to deploy into Azure and would rather stick to Microsoft technologies)
Views or Synonyms can be an alternative (not sure if that will cause more difficulties in Azure)
Cross database relations are not supported by EF:
EF4 cross database relationships
ADO.Net Entity Framework across multiple databases
Entity framework 4 and multiple database
(msdn forum with EF devs) http://social.msdn.microsoft.com/Forums/en/adodotnetentityframework/thread/cad06147-2168-4c20-ac23-98f32987b126
PS: I realize this is a lengthy question. Feel free to edit the question and remove irrelevant parts to improve the readability.
PPS: I can share more code if needed
Thank you so much in advance. I will gladly reward you with upvotes for all your efforts!
I don't quite understand why you need cross-database relations at all. Assuming your application can talk to the two databases, the user database and a tenant database, it can easily use the first database for authentication and then find the related user in the tenant database with a "by name" convention.
For example, if you authenticate a user JOHN using user database then you search for a user JOHN in the tenant database.
This would be much easier to implement and still match your requirements: users are stored in the users database together with their passwords, while "shadow copies" of the user records, without passwords, are stored in the tenant databases, and there is NO physical relation between the two.
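A rough sketch of that convention, using in-memory dictionaries in place of the two databases. TenantUser, TenantLogin and SignIn are hypothetical names, the caller is assumed to pass an already computed password hash, and Case carries a plain OwnerId that points at the tenant-local shadow record instead of a navigation property into the shared database.

```csharp
using System;
using System.Collections.Generic;

// Lives in the shared users database.
public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string PasswordHash { get; set; }
}

// "Shadow copy" in a tenant database: no password, no foreign key to the shared database.
public class TenantUser
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Case
{
    public int Id { get; set; }
    public int OwnerId { get; set; } // key into the tenant's own Users table
    public string Title { get; set; }
}

public class TenantLogin
{
    private readonly Dictionary<string, User> _sharedUsers;
    private readonly Dictionary<string, TenantUser> _tenantUsers;

    public TenantLogin(Dictionary<string, User> sharedUsers, Dictionary<string, TenantUser> tenantUsers)
    {
        _sharedUsers = sharedUsers;
        _tenantUsers = tenantUsers;
    }

    // Authenticates against the shared store, then resolves the tenant-local record by name.
    // Returns null when the credentials are wrong or the user is not registered with this tenant.
    public TenantUser SignIn(string name, string passwordHash)
    {
        if (!_sharedUsers.TryGetValue(name, out User user) || user.PasswordHash != passwordHash)
            return null;

        return _tenantUsers.TryGetValue(name, out TenantUser shadow) ? shadow : null;
    }
}
```

With this shape each DbContext only ever maps tables from its own database, so EF never needs to know that the two Users tables are related.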
Is the following OK to do? I know Domain Models should never be used in views but is it ok to use Domain Models in your View Models? For some very small models it doesn't seem worth it to be creating and managing a View Model for them.
For Example
public class LoginDomainModel
{
public string Email { get; set; }
public string Password { get; set; }
public string DisplayName { get; set; }
public long UserTypeID { get; set; }
public virtual UserType UserType { get; set; }
}
public class UserTypeDomainModel
{
public UserTypeDomainModel()
{
this.Logins = new List<Login>();
}
public long UserTypeID { get; set; }
public string UserType { get; set; }
public string Description { get; set; }
public virtual ICollection<Login> Logins { get; set; }
}
public class LoginViewModel
{
public string Email { get; set; }
public long UserTypeID {get; set;}
//Right here
public List<UserTypeDomainModel> UserTypesSelectList {get; set;}
}
Personally I use domain models in the view if they would naturally be an exact fit. That is likely to happen only on trivial projects that are CRUD in nature (editing the domain entities in a straightforward way). I find it a waste of time to create an exact copy of a domain entity for the sake of purity.
I will never modify a domain model in the slightest to account for needs of the view. In 95%+ of my projects, this is the circumstance I find myself in. The moment you pollute the domain for the sake of the view, you introduce maintainability headaches.
It depends on what you mean by "Domain model". Do you mean EF entities? Or do you mean business layer objects?
It's never a good idea to pass EF entities to the view, particularly if you're using default model binding. This can create security issues if you are not careful. Although the same issues can occur if you're not careful with business objects passed to the view.
One of the huge advantages of view models is that you have much finer control over mapping of data, so you can validate more easily that only the correct maps occur.
It all comes down to your app though. If it's a simple app, then it may not be worth the trouble of doing more complex mappings. If it's a complex app that must live for a long time and is likely to be updated a lot, then you should definitely invest the effort.
I struggled for a long time with the perceived duplication caused by separate view models and domain models. I would assert that since they are intended for different purposes it's not really duplication, but it still feels "wrong" to declare so many similar properties.
In very small projects (especially ones with a highly trusted group of authenticated users) I may just bind directly to the domain models and be done with it. Or I may mix and match if the view model requires a different structure (as @Eric J. describes).
However: The ModelBinder will attempt to match values in the request to properties on your model. This means that any property on your domain model can potentially be populated by a (rogue) request. There are ways to prevent this, but for me the peace of mind outweighs a little extra effort creating separate view models.
I don't see an absolute need to create a separate view model for readonly, unbound values (possibly the list of user types in your case, though public virtual ICollection<Login> Logins may negate this).
Alternatively, you may wish to project the domain model to a UI-oriented abstraction (e.g. IEnumerable<SelectListItem>). You can use SelectListItems for a variety of input mechanisms, so you aren't tying yourself to a particular UI behavior.
Even with abstraction, you may still need to validate that the request doesn't contain an illegal value. For example, perhaps only super admins can assign certain UserTypeDomainModel IDs. Regardless of abstraction, you still need to validate this.
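As a rough illustration of both points, the sketch below projects user types to a simple option type and then checks a posted id against what the current user was allowed to pick. SelectOption is a stand-in for a framework type such as SelectListItem, and the SuperAdminOnly flag and the UserTypeOptions helper are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class UserTypeDomainModel
{
    public long UserTypeID { get; set; }
    public string UserType { get; set; }
    public bool SuperAdminOnly { get; set; } // hypothetical flag for this example
}

// UI-oriented option; stands in for a framework type such as SelectListItem.
public record SelectOption(string Value, string Text);

public static class UserTypeOptions
{
    // Only the user types that the current user may assign.
    public static List<UserTypeDomainModel> AllowedFor(
        bool isSuperAdmin, IEnumerable<UserTypeDomainModel> all) =>
        all.Where(t => isSuperAdmin || !t.SuperAdminOnly).ToList();

    // Projection handed to the view instead of the domain objects themselves.
    public static List<SelectOption> ToOptions(IEnumerable<UserTypeDomainModel> types) =>
        types.Select(t => new SelectOption(t.UserTypeID.ToString(), t.UserType)).ToList();

    // Server-side check: the posted id must be one that was actually offered.
    public static bool IsAllowed(
        long postedUserTypeId, bool isSuperAdmin, IEnumerable<UserTypeDomainModel> all) =>
        AllowedFor(isSuperAdmin, all).Any(t => t.UserTypeID == postedUserTypeId);
}
```

The view only ever sees the options, and the posted UserTypeID is still re-checked on the server, because hiding an option in the UI does not stop a crafted request from sending it.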
TLDR: abstract domain models as much as is practical, find appropriate abstractions (a new view model isn't always the correct answer), and be (slightly) paranoid about input validation.