EDIT 1: story example at the end
Years ago, we created tables in order to count how many products there were in our boxes.
There are two simple tables:
product (
code VARCHAR(16) PK,
length INT,
width INT,
height INT
)
box (
pkid INT IDENTITY(1,1),
barcode varchar(18),
product_code VARCHAR(16) FK,
quantity INT
)
And there are two associated classes:
public struct Product
{
public string Code { get; set; }
public int Length { get; set; }
public int Width { get; set; }
public int Height { get; set; }
}
public struct Box
{
public int Id { get; set; }
public string BarCode { get; set; }
public Product Product { get; set; }
public int Quantity { get; set; }
}
After years, we need to put multiple different products in the same box, so we now need this:
product (
code VARCHAR(16) PK,
length INT,
width INT,
height INT
)
-- box changed
box (
pkid INT IDENTITY(1,1),
barcode varchar(18)
)
-- stock created
stock (
box_pkid INT M-PK FK,
product_code VARCHAR(16) M-PK FK,
quantity INT
)
and this:
public struct Product
{
public string Code { get; set; }
public int Length { get; set; }
public int Width { get; set; }
public int Height { get; set; }
}
public struct Box
{
public int Id { get; set; }
public string BarCode { get; set; }
public Dictionary<Product, int> Content { get; set; } // <-- this changed
public int Quantity { get; set; }
}
But after years, we have a lot of code, maybe with duplicates in some dark places, left behind by collaborators who have since left. I am a trainee, so I am asking for the sake of my future experience, in order to avoid this later.
What could be a solution to update our schema and keep data integrity safe? Even with millions of rows in the DB?
Example:
In 2014, we needed to store 10 Romeo and Juliet books in one box. If we had some Hamlet books, then we put them in another box. All 10 Romeo and Juliet books were the 'same' product (same cover, same content, same reference).
Today, we want to store, let's say, different Shakespeare books in the same box. Or maybe different Love books. Or even Romeo and Juliet books AND figurines? So different products together: we should change the box table and Box class, shouldn't we?
You have many challenges; I'd split them into two high-level groups.
Firstly, how do you change your application at the code level, and secondly, how do you migrate your data from the old schema to the new one.
How do you change your code?
The first question is: can you be 100% certain that the classes you list are the only ways the data is accessed and modified? Are there any triggers, stored procedures, batch jobs, or other applications? I'm not aware of any way of finding this out other than by trawling through both the database schema artifacts, and the code base.
Within your "own" application, you have a choice. It's usually better to extend than modify your interface. In practical terms, that means that instead of changing the public Product Product { get; set; } signature to handle a dictionary, you keep it around, and add public Dictionary<Product, int> Content { get; set; } - if you can guarantee that the old method still works. This would limit the rewriting of your class's dependencies - you only have to worry about clients that need to understand that there could be more than 1 product in a box.
This allows you to follow a "lots of small changes, but the existing code continues to work" model; you can manage this via feature toggles etc. It's much lower risk - so the lesson here is "design your solution to be open to extension, but closed to change".
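A minimal sketch of the "extend rather than modify" idea, assuming Product is a class here and that the legacy getter is allowed to throw (both are illustrative choices, not the asker's actual code):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class Product
{
    public string Code { get; set; }
}

public class Box
{
    public int Id { get; set; }
    public string BarCode { get; set; }

    // New member: every product in the box mapped to its quantity.
    public Dictionary<Product, int> Content { get; } = new Dictionary<Product, int>();

    // Legacy member kept so that existing callers still compile.
    // It only has a sensible answer while the box holds at most one product.
    [Obsolete("Use Content; a box may now hold several products.")]
    public Product Product
    {
        get
        {
            if (Content.Count > 1)
                throw new InvalidOperationException("Box holds more than one product.");
            return Content.Keys.SingleOrDefault(); // null for an empty box
        }
    }
}
```

Old callers keep working for single-product boxes and get a compiler warning pointing them at Content; the exception marks the one case the old contract cannot express.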
In this case, it doesn't seem possible - the "set" method may be okay (you can default that to a "one product in a box" solution), but the "get" method would have no graceful way of handling the case where you have more than 1 product in a box. If that's true, you change the class, and look for all the instances where your code won't compile, and follow the chain of dependencies.
For instance, in a typical MVC framework, in this case you'd be changing the model; this should cause the controller to report a compile error. In resolving that error, you almost certainly modify the signature of the controller methods. This in turn should break the view. So you follow that chain; doing this means your schema change becomes a "big bang, all-or-nothing" release. This typically is stressful for all involved...
How do you release your change?
This depends hugely on which of the two options you've chosen. @gburton's answer covers the database steps; these are necessary in both code options.
The second challenge is releasing new versions of your software; if it's a desktop client, for instance, you must make sure all clients are updated at the same time as your database change. This is (usually) horrible. If it's a web application, it's usually a little easier - fewer client machines to worry about.
Safely updating a legacy system is a classic problem. I'm guessing from your post that there isn't a nice safe dev copy of the DB, or at least one that is up to date, or you would already have a process to apply here.
I've written this in a system agnostic way even though you're obviously using MS SQL Server.
The key is to use caution, and ensure you are never 100% stuck if something goes wrong.
1. Back up the old DB. Ensure you know how to do this without breaking anything.
2. Restore that backup into a new location.
3. Figure out a test plan (this can be the longest part of the job).
4. Make the changes to the new copy of the DB (don't touch the live one).
5. Run through your test plan to ensure nothing has been broken.
If step 5 showed some errors, you just have to work through them. Once this is done, you have the scary part. The backup restore drill is critical here.
6. Take a backup of the live database (your previous backup is probably out of date; you want as fresh a backup as possible to reduce data loss).
7. Run a backup restore drill to make 100% sure you can recover.
8. Apply the changes to the live database.
9. Re-run your tests.
Recovering a database down to the individual transaction is possible with many database engines. Consider using that process for step 6 if possible. How to achieve this would be a separate question.
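As one example of what the test plan could contain, here is a hedged sketch of a "nothing was lost" check for the box-to-stock split from the question. The row types and the BoxMigration helper are hypothetical stand-ins for the real tables, and records assume C# 9 or later:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Row shapes mirroring the old and new tables (illustrative names).
public record OldBoxRow(int PkId, string Barcode, string ProductCode, int Quantity);
public record NewBoxRow(int PkId, string Barcode);
public record StockRow(int BoxPkId, string ProductCode, int Quantity);

public static class BoxMigration
{
    // Splits each old box row into one box row plus one stock row.
    public static (List<NewBoxRow> Boxes, List<StockRow> Stock) Migrate(IEnumerable<OldBoxRow> oldRows)
    {
        var rows = oldRows.ToList();
        var boxes = rows.Select(r => new NewBoxRow(r.PkId, r.Barcode)).ToList();
        var stock = rows.Select(r => new StockRow(r.PkId, r.ProductCode, r.Quantity)).ToList();
        return (boxes, stock);
    }

    // Check for the test plan: same number of boxes, and the same
    // quantity for every (box, product) pair before and after.
    public static bool IsLossless(
        IEnumerable<OldBoxRow> oldRows, IEnumerable<NewBoxRow> boxes, IEnumerable<StockRow> stock)
    {
        var before = oldRows.ToDictionary(r => (r.PkId, r.ProductCode), r => r.Quantity);
        var after = stock.ToDictionary(s => (s.BoxPkId, s.ProductCode), s => s.Quantity);
        return boxes.Count() == before.Count
            && before.Count == after.Count
            && before.All(kv => after.TryGetValue(kv.Key, out var q) && q == kv.Value);
    }
}
```

Against a real database the same comparison would normally be written as queries on the restored copy; the point is only that the check is defined before the live change is applied.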
Related
I use SQL Azure and have an application that syncs data with an external resource. The data set is large, approx. 10K records, so I read it from the DB once, update records where necessary over a few minutes, and then save the changes. It works, but there is a problem with simultaneous access to the data: if another service makes changes during those few minutes, its changes are overwritten.
But in most cases this concerns fields that my application does not touch!
So, for example, my Table Device:
public partial class Device : BaseEntity
{
public string Name { get; set; }
public string IMEI { get; set; }
public string SN { get; set; }
public string ICCID { get; set; }
public string MacAddress { get; set; }
public DeviceStatus Status { get; set; }
The first service (the application with the long-running process) can modify SN, ICCID and MacAddress, but not Status; the second service, vice versa, can modify only Status.
Code to update in the first service:
_allLocalDevicesWithIMEI = _context.GetAllDevicesWithImei().ToList();
(it gets entities, not DTOs, because many fields can actually be changed)
and then:
_context.Devices.Update(localDevice);
for every device that should be changed
and, eventually:
await _context.SaveChangesAsync();
How can I mark the Status field so that it is excluded from change tracking?
One simple way to avoid updating the Status field from the first service is to create an update entity that does not include the Status field, and another update entity for the second service that does include it.
Another way to resolve this problem is to override the SaveChangesAsync method and control the update logic yourself, but I think that is complex and the behavior is implicit, so it will not be easy for others to understand your code.
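A rough sketch of the first suggestion, with no EF involved. DeviceHardwareUpdate and the DeviceStatus values are hypothetical names introduced for illustration:

```csharp
public enum DeviceStatus { Inactive, Active } // illustrative values

public class Device
{
    public string IMEI { get; set; }
    public string SN { get; set; }
    public string ICCID { get; set; }
    public string MacAddress { get; set; }
    public DeviceStatus Status { get; set; }
}

// Update object for the first service: it has no Status member,
// so applying it cannot overwrite a status written by the second service.
public class DeviceHardwareUpdate
{
    public string SN { get; set; }
    public string ICCID { get; set; }
    public string MacAddress { get; set; }

    public void ApplyTo(Device device)
    {
        device.SN = SN;
        device.ICCID = ICCID;
        device.MacAddress = MacAddress;
    }
}
```

With EF Core this only helps if the save writes just the changed columns. Calling Update on an entity marks all of its properties as modified, so an alternative worth checking is to keep the entities tracked and let change detection find the changed properties, or to set Entry(device).Property(d => d.Status).IsModified = false before saving.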
To avoid overwrites, you can specify a RowVersion on the entities. This is so-called optimistic concurrency: it throws an error if an overwrite would happen, and you can retry the operation if someone has already changed something. Or you can raise your transaction isolation level to something like RepeatableRead/Serializable to lock these rows for the entire operation (which of course has a large performance impact and can cause timeouts). The second option is simple and good enough for background jobs and distributed transactions; the first is more flexible and usually faster, but harder to implement across multiple endpoints/entities.
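To make the optimistic-concurrency idea concrete, here is an in-memory sketch of the version check and retry. All types are hypothetical; in EF Core the Version field would typically be a rowversion column marked with [Timestamp], and the conflict would surface as a DbUpdateConcurrencyException.

```csharp
using System;
using System.Collections.Generic;

public class VersionedDevice
{
    public string SN { get; set; }
    public string Status { get; set; }
    public int Version { get; set; } // stands in for a rowversion column
}

public class ConcurrencyConflictException : Exception { }

public class InMemoryDeviceStore
{
    private readonly Dictionary<int, VersionedDevice> _rows = new Dictionary<int, VersionedDevice>();
    private readonly object _gate = new object();

    public void Insert(int id, VersionedDevice d) { lock (_gate) _rows[id] = Copy(d); }

    public VersionedDevice Read(int id) { lock (_gate) return Copy(_rows[id]); }

    // Succeeds only if nobody has saved since the caller read the row.
    public void Save(int id, VersionedDevice changed)
    {
        lock (_gate)
        {
            if (_rows[id].Version != changed.Version)
                throw new ConcurrencyConflictException();
            var stored = Copy(changed);
            stored.Version++;
            _rows[id] = stored;
        }
    }

    private static VersionedDevice Copy(VersionedDevice d) =>
        new VersionedDevice { SN = d.SN, Status = d.Status, Version = d.Version };
}

public static class DeviceUpdater
{
    // Re-reads the row and re-applies the change when a conflict is detected.
    public static void UpdateWithRetry(
        InMemoryDeviceStore store, int id, Action<VersionedDevice> change, int maxAttempts = 3)
    {
        for (int attempt = 1; ; attempt++)
        {
            var current = store.Read(id);
            change(current);
            try { store.Save(id, current); return; }
            catch (ConcurrencyConflictException) when (attempt < maxAttempts) { /* retry */ }
        }
    }
}
```

Because the retry re-reads the row before re-applying only its own fields, a Status written by the other service in the meantime survives.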
I have an ASP.NET 5 Web API application using EF Core with Npgsql (PostgreSQL).
I'm trying to implement a file download tracker that increments some fields in my database when a file is requested.
Let's say I have this object:
public class Stats
{
[Key]
public long ID { get; set; }
public long DownloadCount { get; set; }
}
This application will obviously have many users requesting many different files at the same time, resulting in lots of changes made to the same value for each file. Now, I'm wondering whether this type of thing is already implemented in the internal change tracker or not.
If it isn't, how can I implement it in a way that will make sure that every request is counted?
Using : dotnet core 1.1, entity framework code first, sql server.
Is there any elegant way to enable a user working on a large form, represented by a complex model (40+ tables/C# objects) with multiple "required" fields, to save their work temporarily and come back to complete it afterward?
Let's say I have this model :
[Table("IdentificationInfo", Schema = "Meta")]
public class IdentificationInfo : PocoBase
{
[...]
public int MetaDataId { get; set; }
[ForeignKey("MetaDataId")]
public virtual MetaData MetaData { get; set; }
public int ProgressId { get; set; }
[ForeignKey("ProgressId")]
public Progress Progress { get; set; }
public virtual MaintenanceInfo MaintenanceInfo { get; set; }
public int PresentationFormId { get; set; }
[ForeignKey("PresentationFormId")]
public PresentationForm PresentationForm { get; set; }
private string _abstract;
[Required]
public string Abstract
{
get { return _abstract; }
set { SetFieldValue(ref _abstract, value, "Abstract"); }
}
[...]
}
[Table("PresentationForm", Schema = "Meta")]
public class PresentationForm : PocoEnumeration
{
[...]
}
The user starts to fill in everything (in a big form with multiple tabs or a really long page!), but needs to stop and save the progress without having had time to fill in the PresentationForm part or the abstract. Normally, in the database, those fields are not null, so saving the model would fail. Similarly, it would also fail EF validation in the UI.
What would be nice is to use the Progress property to disable EF model validation (model.isValid()) and also allow the database insert even if the fields are null (it is not possible to put default values in those non-nullable fields, as they are often foreign keys to enum-like tables).
For the model validation part, I know we can make some custom validator, with custom annotation such as [RequiredIf("FieldName","Value","Message")]. I'm really curious about some method to do something similar in the database?
Would the easy way to do that be to save the model as JSON in a temporary table as long as the progress status is not completed, retrieve it when needed for edition directly from the JSON, and save it to the database only when the status is completed?
To support (elegantly) what you ask, you should design for it.
One table with its required columns should be the minimum segment that has to be entered before any save, so choose the segment size accordingly.
You could set all fields to allow null, but that would be a very bad design, so I would not consider that option at all.
Now suppose your input consists of several logical parts, shown as different tabs on the form, so that each tab maps to one table in the database and the main table has FKs to the other tables.
Those FKs could be nullable, which would let you finish, say, the first two tabs, save them, and leave the rest for later. You would know that FK columns with values are finished (and may still be edited), while the others are yet to be inserted. You can also have a Status column: Draft/Active/...
What's more, this design would allow you to have configurable tabs: for example, based on a selection in the main input you could choose which tables can be filled in, and enable or disable the corresponding tabs.
If, however, you don't want nullable FKs, then the solution would be some temporary storage, one option being JSON in a single string column, as you mentioned yourself. But I see no issue with nullable FKs in this case.
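A small sketch of the nullable-FK design with a status column. The class name, the MaintenanceInfoId column and the ProgressStatus enum are assumptions made for illustration, not part of the asker's model:

```csharp
using System.Collections.Generic;

public enum ProgressStatus { Draft, Active }

public class IdentificationInfoDraft
{
    public int Id { get; set; }
    public int MetaDataId { get; set; }          // required before any save
    public int? PresentationFormId { get; set; } // null until that tab is filled in
    public int? MaintenanceInfoId { get; set; }  // null until that tab is filled in
    public ProgressStatus Status { get; set; } = ProgressStatus.Draft;

    // Names of the tabs (FK columns) that still have no value.
    public IReadOnlyList<string> MissingParts()
    {
        var missing = new List<string>();
        if (PresentationFormId == null) missing.Add(nameof(PresentationFormId));
        if (MaintenanceInfoId == null) missing.Add(nameof(MaintenanceInfoId));
        return missing;
    }

    // Moves the record out of Draft only once every part is present.
    public bool TryActivate()
    {
        if (MissingParts().Count > 0) return false;
        Status = ProgressStatus.Active;
        return true;
    }
}
```

The "required" rule then lives in the Draft-to-Active transition instead of in NOT NULL constraints on the foreign keys, so a partial save is always valid at the database level.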
I'm developing a web app (not ASP.NET), and I encountered a small architectural problem:
So, I have two classes to work with users.
public class User
{
public int Id { get; set; }
public string Username { get; set; }
public string Password { get; set; }
// Other properties...
}
public class Profile
{
public int Id { get; set; }
public string PhotoUrl { get; set; }
public string DisplayName { get; set; }
public string FirstName { get; set; }
public string LastName { get; set; }
public List<PostItem> Posts { get; set; }
}
I had to split these classes because there is a feature that lets you view the profile of a given member, and obviously you don't want to retrieve data from the database that contains the user's password, name and other private stuff (even though it's not displayed in the view). So I'm storing this data in different tables: the Users table contains personal information, while the Profiles table contains public information (it can be viewed by anyone).
But at the same time, in order not to break the Single Responsibility Principle, I had to implement UserRepository and ProfileRepository classes that do the checking, adding and other work.
And here they come:
Issue 1: the code that handles user registration has turned into real hell; I have to check whether a record with a specific username exists in two different tables by instantiating two repositories.
Issue 2: On the page where you can view public data, the latest posts also need to be displayed, but here is another problem: I can't store complex values in one column, so I have to store posts in another table too. This means I need to implement a PostRepository, and at the same time the Posts property in the Profile class is useless (though I need it to display the latest posts in the view), because to retrieve the latest posts you need to look through another table inside UserRepository, whereas that should be handled by PostRepository. The same goes for comments, for example.
So, this is my small problem. Any advice?
Ok, taking each item in turn;
1) It's perfectly normal to have the Identity of a user checked through one repository and their permissions to your application stored in another. In fact this is the basic idea behind federated identity. Consider that you might extend your application to allow Identity to be provided by Facebook, but permissions by your own application, and you will see that separating them makes sense.
2) Yes, absolutely. What makes you think that a high-volume store like Posts is best served by the same repository in which you store a low-change-rate set of data like Permissions? One might be in Mongo, the other in Active Directory, with the Identity being OAuth. Because you own the whole application, you see these as unnecessary complexities, whereas they represent good architectural separation.
Identity => not owned by your application. Slow change rate.
Permissions => owned by your application. Slow change rate.
Posts => owned by your application. Fast change rate.
Just looking at those three use-cases, it seems that using different repositories would be a good idea since they have such different profiles. If ultimately your repositories all map to a SQL Server (or other) implementation, then so be it; but by separating these architecturally you can use the best possible underlying implementation.
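One possible shape for that separation, sketched with in-memory stand-ins. The interfaces and RegistrationService are hypothetical; the point is that the username check lives in the user repository only, and a small service composes the repositories:

```csharp
using System;
using System.Collections.Generic;

public interface IUserRepository
{
    bool Exists(string username);
    void Add(string username, string passwordHash);
}

public interface IProfileRepository
{
    void Add(string username, string displayName);
    string GetDisplayName(string username);
}

public class InMemoryUserRepository : IUserRepository
{
    private readonly Dictionary<string, string> _hashes =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    public bool Exists(string username) => _hashes.ContainsKey(username);
    public void Add(string username, string passwordHash) => _hashes.Add(username, passwordHash);
}

public class InMemoryProfileRepository : IProfileRepository
{
    private readonly Dictionary<string, string> _names =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    public void Add(string username, string displayName) => _names.Add(username, displayName);
    public string GetDisplayName(string username) =>
        _names.TryGetValue(username, out var name) ? name : null;
}

public class RegistrationService
{
    private readonly IUserRepository _users;
    private readonly IProfileRepository _profiles;

    public RegistrationService(IUserRepository users, IProfileRepository profiles)
    {
        _users = users;
        _profiles = profiles;
    }

    // Identity is checked in one place; the profile is created alongside it.
    public bool Register(string username, string passwordHash, string displayName)
    {
        if (_users.Exists(username)) return false;
        _users.Add(username, passwordHash);
        _profiles.Add(username, displayName);
        return true;
    }
}
```

A post repository would be composed the same way by whatever builds the public profile page, rather than being reached through the user repository.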
I need advice for using Entity Framework 6. Suppose the website features many products with many reviews (think Amazon.com). Assuming most visitors view products more often than they write reviews, then if I want to display the average user rating for each product, should I add an AverageReviewRating column to store its value to (supposedly) speed up query performance?
Is this a good or bad practice? The alternative would be to access each Review from the navigation property and calculate the average rating from that. What's the recommended approach?
public class Product
{
public int ProductID { get; set; }
public string Name { get; set; }
// Should I store the rating in a pre-calculated column or not?
public int AverageReviewRating { get; set; }
public virtual ICollection<Review> Reviews { get; set; }
}
I would say that if it is something huge like Amazon, you can store the pre-calculated value.
This would give better performance when fetching. You can also delegate the task of updating the average to some kind of background process.
I would not store the average in a separate field because:
- I would need to update it frequently
- I would not save much in terms of performance
If you query using an Entity Framework context and you request an average of the review properties, the review will not be fetched. Entity Framework will generate a SQL select statement returning the average.
Something like: SELECT AVG(ReviewScore) FROM ReviewsTable WHERE ReviewedProductId = ProductId
Furthermore, database denormalization should only be used when you meet serious performance problems.
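For illustration, the query could look roughly like this in LINQ. The Review shape and property names are assumptions taken from the SQL above; the sketch runs against any IQueryable, and against an EF context a provider would be expected to translate it to an aggregate query rather than load the reviews:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Review
{
    public int ReviewedProductId { get; set; }
    public int ReviewScore { get; set; }
}

public static class RatingQueries
{
    // Casting to double? makes the result null (instead of throwing)
    // when the product has no reviews yet.
    public static double? AverageRating(IQueryable<Review> reviews, int productId) =>
        reviews.Where(r => r.ReviewedProductId == productId)
               .Average(r => (double?)r.ReviewScore);
}
```

If profiling later shows this aggregate to be a real bottleneck, that would be the moment to consider a stored average.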