Is there any way to alter the underlying database with EF using the Code First approach?
I have 2 tables which have a static model:
Users and Info1.
I also have another table, which I'll call Info2.
I would like to be able to add and remove columns from Info2 from the admin section of my website.
My goal is to have a website which can dynamically be altered as you go, adding and removing fields as the user likes, without the user having to know anything about coding.
I've considered using a separate database outside of the one specified in the model of my MVC3 project and issuing straight SQL requests against that instead.
This could also be accomplished by having a table with the dynamically created fields, and another with the data, but this gets messy fast.
Has anyone done anything like this? Is it a bad idea?
I'd recommend not trying to expand the table horizontally; changing the schema is an operation that you should make a conscious decision to perform.
Instead, I'd recommend that you store the values as name/value pairs. You can have tables that have specific types (let's say you needed an integer value paired with a key), and then you would select those into a dictionary for the user.
You'd also have a table which has the keys, if you are concerned about replicating key values.
For example, you'd have a UserDefinedKey table
UserDefinedKeyId (int, PK)  Key (varchar(?))
--------------------------  -------------------
1                           'My Website'
2                           'My favorite color'
Then you would have a UserDefinedString table (for string values)
UserDefinedStringId  UserId     UserDefinedKeyId  Value
(int, PK)            (int, FK)  (int, FK)         (varchar(max))
-------------------  ---------  ----------------  --------------------------
1                    1          1                 'http://stackoverflow.com'
2                    1          2                 'Blue'
3                    2          2                 'Red'
You'd probably want to place a unique index on the UserId and UserDefinedKeyId fields to prevent people from entering multiple values for the same key (if you want that, have a separate table without the unique constraint).
Then, when you want to add a value for users, you add it to the UserDefinedKey table, and then drive your logic off that table and the other tables which hold the values.
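The name/value layout above can be tried out with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, so the column types are SQLite's rather than the int/varchar types shown in the tables. The helper names `build_demo_db` and `get_user_values` are hypothetical and exist only for this illustration.

```python
import sqlite3


def build_demo_db():
    """Create the two tables from the answer in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE UserDefinedKey (
            UserDefinedKeyId INTEGER PRIMARY KEY,
            "Key" TEXT NOT NULL UNIQUE
        );
        CREATE TABLE UserDefinedString (
            UserDefinedStringId INTEGER PRIMARY KEY,
            UserId INTEGER NOT NULL,
            UserDefinedKeyId INTEGER NOT NULL
                REFERENCES UserDefinedKey (UserDefinedKeyId),
            Value TEXT,
            -- one value per user per key
            UNIQUE (UserId, UserDefinedKeyId)
        );
        INSERT INTO UserDefinedKey VALUES
            (1, 'My Website'),
            (2, 'My favorite color');
        INSERT INTO UserDefinedString VALUES
            (1, 1, 1, 'http://stackoverflow.com'),
            (2, 1, 2, 'Blue'),
            (3, 2, 2, 'Red');
    """)
    return conn


def get_user_values(conn, user_id):
    """Select one user's key/value pairs into a dictionary."""
    rows = conn.execute(
        """
        SELECT k."Key", s.Value
        FROM UserDefinedString AS s
        JOIN UserDefinedKey AS k ON k.UserDefinedKeyId = s.UserDefinedKeyId
        WHERE s.UserId = ?
        """,
        (user_id,),
    )
    return dict(rows.fetchall())
```

Adding a new user-defined field is then a plain INSERT into UserDefinedKey, with no schema change and no recompile.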
Another benefit of storing the values vertically is that you aren't wasting space for columns with values that aren't being used by all users.
For example, assuming you take the approach of modifying the table, for the attributes above, you would get:
UserId  WebSite                   Color
------  ------------------------  -----
1       http://stackoverflow.com  Blue
2       (null)                    Red
Now let's say a third user comes along and adds a Favorite Sports Team value, and they are the only one who uses it. The table then looks like:
UserId  WebSite                   Color   FavoriteSportsTeam
------  ------------------------  ------  ------------------
1       http://stackoverflow.com  Blue    (null)
2       (null)                    Red     (null)
3       (null)                    (null)  Yankees
As the number of users and attributes grows, the amount of sparse data that you have will increase dramatically.
Now, assuming you are using SQL Server 2008, you could use sparse columns; if you don't, your table is going to get huge while holding very little data.
Also, using sparse columns doesn't take away from the fact that it's pretty dirty to use data definition language (DDL) to change the schema on the fly.
Additionally, Entity Framework isn't going to be able to adapt its object model to account for the new attributes; every time an attribute is added, you will have to add it to your object model, recompile, and redeploy.
With a vertical approach, it takes more work, granted, but it will be far more flexible, and it will use your database space more efficiently.
BACKGROUND TO THE DOMAIN
I have a .NET application with a SQL Server database underneath. Each customer has their own database.
In that database, I have a table called Label which is empty on new installations, and gets populated when the user creates their own labels.
NEW REQUIREMENT
There is a new requirement that when the database is patched for the next release, we should add some default labels that come with the system.
These default labels must have the same ID in every customer database that they are added to (via SQL patching on upgrade), i.e. it must be 'static data'
ATTEMPT 1
I tried giving these 'default labels' an enormous ID number, e.g. above 200,000. We 'know' to a decent degree of certainty that no customer will have more than 5 to 10 rows in this table, and since the ID column is of type identity(1, 1), no customer will have already used these IDs in this table.
But I've been told we want to avoid this because it seems like bad practice to mix static and dynamic data in one table.
ATTEMPT 2
I tried adding a new identical table called StaticLabel, also with an ID column of type int identity(1, 1).
I added a corresponding entity in the application (StaticLabelEntity.cs), an entity map (StaticLabelEntity.cs, using fluent NHibernate) and a repository (StaticLabelRepository.cs).
Now, in the existing LabelService.cs code that retrieves labels from the database (via the repository), I tell it to get both the Labels and StaticLabels, and combine them into one list of ILabel.
This works fine when viewing the labels config in my application.
When it comes to assigning labels to an author (an author can have many labels), there's already an AuthorLabel table with a foreign key to the Author table (AuthorId) and one to the Label table (LabelId).
I guess we need to add another column to AuthorLabel, which is a foreign key to StaticLabel, and also add a check constraint so that only LabelId OR StaticLabelId is populated.
QUESTION
Does what I've done in Attempt #2 sound like a good idea? It seems a bit weird, and there might be some better way that I haven't heard of due to lack of experience.
It starts to get weird in the code if I do this, because I end up with two properties on the AuthorLabel entity (a Label and a StaticLabel) where one is always null.
This will then propagate through the code and I'll end up with lots of 'if staticlabel property is not null, then x, else y' etc - it feels a bit messy.
I need to do a BULK INSERT of several hundred-thousand records across 3 tables. A simple breakdown of the tables would be:
TableA
--------
TableAID (PK)
TableBID (FK)
TableCID (FK)
Other Columns
TableB
--------
TableBID (PK)
Other Columns
TableC
--------
TableCID (PK)
Other Columns
The problem with a bulk insert, of course, is that it only works with one table, so FKs become a problem.
I've been looking around for ways to work around this, and from what I've gleaned from various sources, using a SEQUENCE column might be the best bet. I just want to make sure I have correctly cobbled together the logic from the various threads and posts I've read on this. Let me know if I have the right idea.
First, I would modify the tables to look like this:
TableA
--------
TableAID (PK)
TableBSequence
TableCSequence
Other Columns
TableB
--------
TableBID (PK)
TableBSequence
Other Columns
TableC
--------
TableCID (PK)
TableCSequence
Other Columns
Then, from within the application code, I would make five calls to the database with the following logic:
1. Request X Sequence numbers from TableC, where X is the known number of records to be inserted into TableC. (1st DB call.)
2. Request Y Sequence numbers from TableB, where Y is the known number of records to be inserted into TableB. (2nd DB call.)
3. Modify the existing objects for A, B and C (which are models generated to mirror the tables) with the now known Sequence numbers.
4. Bulk insert to TableA. (3rd DB call.)
5. Bulk insert to TableB. (4th DB call.)
6. Bulk insert to TableC. (5th DB call.)
And then, of course, we would always join on the Sequence.
I have three questions:
1. Do I have the basic logic correct?
2. In Tables B and C, would I remove the clustered index from the PK and put it on the Sequence instead?
3. Once the Sequence numbers are requested from Tables B and C, are they then somehow locked between the request and the bulk insert? I just need to make sure that between the request and the insert, some other process doesn't request and use the same numbers.
Thanks!
EDIT:
After typing this up and posting it, I've been reading deeper into the SEQUENCE documentation. I think I misunderstood it at first. SEQUENCE is not a column type. For the actual column in the table, I would just use an INT (or maybe a BIGINT, depending on the number of records I expect to have). The actual SEQUENCE object is an entirely separate entity whose job is to generate numeric values on request and keep track of which ones have already been generated. So, if I understand correctly, I would create two SEQUENCE objects, one to be used in conjunction with Table B and one with Table C.
So that answers my third question.
Do I have the basic logic correct?
Yes. The other common approach here is to bulk load your data into a staging table, and do something similar on the server-side.
From the client you can request ranges of sequence values using the sp_sequence_get_range stored procedure.
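As a rough sketch of the client-side half of this, the snippet below reserves a block of keys for each parent table and stamps them onto in-memory rows before any bulk insert happens. `RangeAllocator` is a hypothetical in-process stand-in for the server-side sequence (in SQL Server the range would come from sp_sequence_get_range), and the `b_index`/`c_index` fields are an assumed way for a TableA row to point at its parents before real keys exist.

```python
import threading


class RangeAllocator:
    """Hypothetical in-process stand-in for a database SEQUENCE object."""

    def __init__(self, start=1):
        self._next = start
        self._lock = threading.Lock()

    def get_range(self, count):
        # Reserve `count` consecutive values; no other caller can receive them.
        with self._lock:
            first = self._next
            self._next += count
        return range(first, first + count)


def assign_keys(rows_a, rows_b, rows_c, seq_b, seq_c):
    """Fill in TableBID/TableCID on the parents, then copy them onto TableA rows."""
    for row, key in zip(rows_b, seq_b.get_range(len(rows_b))):
        row["TableBID"] = key
    for row, key in zip(rows_c, seq_c.get_range(len(rows_c))):
        row["TableCID"] = key
    for row in rows_a:
        # b_index / c_index are positions in the in-memory parent lists (assumed)
        row["TableBID"] = rows_b[row["b_index"]]["TableBID"]
        row["TableCID"] = rows_c[row["c_index"]]["TableCID"]
```

Once every row carries its final key values, the three bulk inserts no longer depend on each other's generated IDs.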
In Tables B and C, would I remove the clustered index from the PK
No. As you later noted, the sequence just supplies the PK values for you.
Sorry, I read your question wrong at first. I see now that you are trying to generate your own PKs rather than allow MS SQL to generate them for you. Scratch my above comment.
As David Browne mentioned, you might want to use a staging table to avoid the strain you'll put on your app's heap. Use tempdb and do the modifications directly on the table using a single transaction for each table. Then, copy the staging tables over to their target or use a MERGE if appending. If you are enforcing FKs, you can temporarily remove those constraints if you choose to insert in reverse order (C=>B=>A). You may also want to consider temporarily removing indexes if you experience performance issues during the insert. Last, consider using SSIS instead of a custom app.
Note 1: I REPHRASED THE QUESTION. It now consists of Suppliers and Orders, instead of Cars and Parts.
Note 2: THIS PROBLEM IS HYPOTHETICAL.
My goal is to understand how to create object counters.
For regulatory requirements, I need TO SEQUENTIALLY NUMBER EACH Order for each of the suppliers.
I'm using Entity Framework with SQL Server.
In my hypothetical example, I have a Supplier class and an Order class.
Each supplier has Orders. Each order has a product and a quantity. Meaning, it states which product was ordered from the supplier and how many of it.
I need to be able to create counters, like an auto incremented number, to count the orders FOR EACH supplier.
For regulatory reasons, each supplier must sequentially number its orders, in the order of creation, and using an integer only.
When we examine an Order, we should know from its OrderCountForSupplier column what its order of creation was (the regulatory authorities consider a DateTime / TimeStamp column insufficient; they require such a counter).
For simplicity of this question, an order cannot be deleted (its status can change, but it cannot be deleted).
It's very important for me to have a solution which includes the technical/programming way, not only theoretic way.
I've made a diagram in order to explain my problem in the most clear way possible:
I have a way that might work, and would be glad to hear feedback.
I'm thinking of an external table/tables, to hold the counters. Something like:
Supplier Order Counters Table
| SupplierId | OrderCountForSupplier
------------------------
| 54654 | 3
| 78787 | 2
| 99666 | 4
Would I need a trigger in order to increment the OrderCountForSupplier counter on each insertion, for each supplier?
If not, how can this increment be done safely? (Without, for example, two processes racing to get the next counter and increment it, which could eventually result in a duplicate order count.)
And another note:
Can this be done with Entity Framework? If not, a SQL Server solution would be appreciated.
First answer, the example in the question has changed after it was written.
You say that it is OK to have gaps in the Part IDs, because "some parts might be deleted along the way".
So, what's the difference between your example:
Car PartID
54654 1
54654 2
54654 3
78787 1
78787 2
99666 1
99666 2
99666 5
99666 7
And this variant:
Car PartID
54654 1
54654 2
54654 3
78787 4
78787 5
99666 6
99666 7
99666 8
99666 9
In the second variant each part has some ID that is unique for each car (it is also globally unique as well, but it doesn't matter). In the second variant PartID specifies the order in which parts were inserted into the table, same as in the first variant.
So, I'd use a simple IDENTITY column:
Parts
PartID int IDENTITY NOT NULL (PRIMARY KEY)
CarLicenseNum int NOT NULL (FOREIGN KEY)
PartName varchar(255)
Update for Supplier-Order example
The most important bit in the updated question is "regulatory reasons". It answers the question of why you would want to do such an unnatural thing. Regulatory requirements and efficiency are often at odds.
Essentially, it means that you have to use the serializable transaction isolation level when inserting a new row and calculating the next number in the sequence. It will hurt concurrency/throughput, but it will guarantee consistency and "be safe" in a multi-user environment.
I don't know how to do it in Entity Framework, though it should be possible. But, again, for "regulatory reasons" I'd put this logic in a stored procedure in the DB and make sure that ordinary users don't have write access to the Orders table directly, but only have rights to execute this dedicated stored procedure. You can replicate the logic of this stored procedure in the EF code, but the database itself will be open to changes made through other applications, which may not follow the regulatory requirements.
You can implement it using a separate table, which stores the latest sequence number for each supplier, or you can read the last maximum sequence number on the fly. If each supplier has only a few orders, then this separate table with the latest counter values would be comparable in size to the Orders table and you would not gain much. In any case, having a proper index is the key. Getting the latest counter value would be one seek in the index.
Here is an example of a stored procedure that does not use an extra table.
Make sure that the Orders table has a unique index on (SupplierId, OrderCountForSupplier). In fact, you must have this index to enforce the constraint even if you are using an extra table.
CREATE PROCEDURE [dbo].[AddOrder]
    @ParamSupplierID int,
    @ParamProductSerial varchar(10),
    @ParamQuantity int,
    @NewOrderID int OUTPUT
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;
    -- Serializable isolation stops another session from taking the same counter value
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    BEGIN TRANSACTION;
    BEGIN TRY
        DECLARE @VarMaxCounter int;
        -- Latest counter for this supplier: one seek in the unique index
        SELECT TOP(1) @VarMaxCounter = OrderCountForSupplier
        FROM dbo.Orders
        WHERE SupplierID = @ParamSupplierID
        ORDER BY OrderCountForSupplier DESC;
        -- The first order for a supplier gets counter 1
        SET @VarMaxCounter = ISNULL(@VarMaxCounter, 0) + 1;
        INSERT INTO dbo.Orders
            (SupplierID
            ,OrderCountForSupplier
            ,ProductSerial
            ,Quantity)
        VALUES
            (@ParamSupplierID
            ,@VarMaxCounter
            ,@ParamProductSerial
            ,@ParamQuantity);
        SET @NewOrderID = SCOPE_IDENTITY();
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        -- TODO: handle the error
        SET @NewOrderID = 0;
        ROLLBACK TRANSACTION;
    END CATCH;
END
GO
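The same read-the-maximum-then-insert pattern can be exercised locally with a small sketch. It uses Python's sqlite3 module rather than SQL Server, and `BEGIN IMMEDIATE` (which takes SQLite's write lock before the read) stands in for the serializable isolation level; that substitution, and the helper names `open_db` and `add_order`, are assumptions of this illustration, not part of the procedure above.

```python
import sqlite3


def open_db():
    # isolation_level=None puts the connection in autocommit mode so that
    # the BEGIN / COMMIT / ROLLBACK statements below are issued explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("""
        CREATE TABLE Orders (
            OrderID INTEGER PRIMARY KEY,
            SupplierID INTEGER NOT NULL,
            OrderCountForSupplier INTEGER NOT NULL,
            ProductSerial TEXT,
            Quantity INTEGER,
            UNIQUE (SupplierID, OrderCountForSupplier)
        )
    """)
    return conn


def add_order(conn, supplier_id, product_serial, quantity):
    """Insert an order and return (new OrderID, per-supplier counter)."""
    conn.execute("BEGIN IMMEDIATE")  # lock before reading the current maximum
    try:
        (current,) = conn.execute(
            "SELECT COALESCE(MAX(OrderCountForSupplier), 0) "
            "FROM Orders WHERE SupplierID = ?",
            (supplier_id,),
        ).fetchone()
        cursor = conn.execute(
            "INSERT INTO Orders "
            "(SupplierID, OrderCountForSupplier, ProductSerial, Quantity) "
            "VALUES (?, ?, ?, ?)",
            (supplier_id, current + 1, product_serial, quantity),
        )
        conn.execute("COMMIT")
        return cursor.lastrowid, current + 1
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

The unique constraint remains the last line of defense: even if the locking were wrong, a duplicate counter could not be stored.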
After investigating some possible approaches (see links at the bottom), I've come up with a very basic solution, with the help of @Vladimir Baranov.
I've ruled out using SQL Server triggers / stored procedures. They seemed hard to implement in conjunction with Entity Framework, and they seem like overkill to me in this scenario.
I've also ruled out the Optimistic Concurrency approach (using a concurrency token), because in this scenario, the counters cannot be updated simultaneously. They only get updated after a successful insertion into the orders table.
In my orders table, I've added a unique constraint on the OrderId, SupplierId and OrderCountForSupplier trio, so insertion of the same order count for a supplier would fail.
I've indeed used a counters table, from which I can take the latest counter - for each of the suppliers.
Supplier Order Counters Table
| SupplierId | OrderCountForSupplier
------------------------
| 54654 | 3
| 78787 | 2
| 99666 | 4
These are the steps:
1. Get the current supplier orders counter.
2. Try to insert a new order for the supplier, using the current counter + 1.
3. If the insertion goes OK, increase the orders counter for this supplier in the supplier counters table.
4. If the insertion goes wrong, and we get an error stating that there has been a violation of the constraint (the same order count already exists), try up to 2 more times to get the current counter and insert the order again.
The Code:
using System.Data.SqlClient;
using System.Linq;

public class SupplierRepository
{
    private readonly MyContext _context;
    private readonly Supplier _supplier;

    public SupplierRepository(int supplierId)
    {
        _context = new MyContext();
        _supplier = _context.Suppliers.Single(x => x.SupplierId == supplierId);
    }

    // Retrieve the latest counter for the supplier
    public SupplierCounter GetCounter()
    {
        return _context.SupplierCounters.Single(x => x.SupplierId == _supplier.SupplierId);
    }

    // Add an order for the supplier, retrying if its counter value is already taken
    public void AddOrder(Order order)
    {
        int retries = 3;
        while (retries > 0)
        {
            SupplierCounter currentCounter = GetCounter();
            try
            {
                // Set the current counter into the order object
                order.OrderCountForSupplier = currentCounter.OrderCountForSupplier;
                _context.Add(order);
                // Increase the counter (+1) for the next order
                currentCounter.OrderCountForSupplier += 1;
                // SaveChanges is called once, after adding the order and increasing the counter,
                // so both happen in the same transaction. This prevents a scenario where the
                // order is added and the counter is not incremented.
                _context.SaveChanges();
                break; // Success: leave the retry loop
            }
            // Note: EF may wrap the SqlException in a DbUpdateException; if so, inspect InnerException.
            catch (SqlException ex)
            {
                if (ex.Number != 2627) // 2627 = unique constraint violation
                {
                    throw; // Any other error would otherwise loop forever
                }
                --retries;
            }
        }
    }
}
Some useful links:
SQL Server Unique Composite Key of Two Field With Second Field Auto-Increment
Atomic Increment with Entity Framework
how to inc/dec multi user safe in entity framework 5
This is not a real-world example. That's why you are struggling. For example, a real-world part entity is a lot more complicated than that. A real-world part will have a ManufacturerId (BMW, Audi etc.), PartNumber (B4-773284-YT), VehicleModelId (Audi A4 etc.), Description, ManufacturerYear and so on. Usually when it comes to part entities, we use a concatenated primary key on ManufacturerId and PartNumber.
The same goes for your car table. It's not a real-world example either. A Car entity should have a VIN, which is unique. When you say each part is specific, you are not talking about a Part entity. You are talking about a PartInventory entity. PartInventory has a unique serial number (barcode) for each part, so every single part can be identified uniquely. When you attach a part to a vehicle, you are not just attaching a Part; you are actually attaching a PartInventory item, which is recognizable by a unique serial number.
Once the PartInventory item is attached to a vehicle, it becomes a fitted part item of the vehicle, which means the part gets transferred to the VehicleParts table.
Unfortunately, I see a lot of gaps in your vehicle industry domain knowledge. We develop systems to address real-world problems. When you try to address hypothetical problems, you run into this kind of issue. That leads to wasting a lot of time for the other people who are trying to help you out.
First things first: it is not OK to change your question entirely! Delete this question and create a new one. Having said that ...
Answer to the current question:
Answers to hypothetical questions are just opinion-based and/or too broad. There is actually a flag for this: "Many good questions generate some degree of opinion based on expert experience, but answers to this question will tend to be almost entirely based on opinions, rather than facts, references, or specific expertise."
My answer to the current question is: I do not see any benefit (or advantage or use) of the OrderCountForSupplier in the database! Creating such a counter in the database makes adding orders and maintaining them (in a multi-threaded environment) very complicated and error-prone.
I think the problem can be solved more easily with the help of EF (move the creation of the counters in the code) and a different design of the database:
- To allow Orders to be added concurrently, create two columns: a GUID as the Order primary key and a CreationDate of type DateTime. Filling those two columns from multiple threads is not a problem.
- When retrieving all Orders for a specific SupplierId, sort the result list in ascending order by CreationDate.
- When iterating over the result list using (for example) a for-loop, the loop position is the desired sequential counter.
- As an alternative to the EF solution, the creation of the sequential counter can stay in SQL: create a view or stored procedure for the Order items and use ROW_NUMBER to create the artificial sequential count, partitioning the items by SupplierId and sorting on CreationDate.
Reading the database from multiple threads (and creating the counter in every thread) is then no longer a problem.
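A minimal sketch of the "derive the counter when reading" idea, written in plain Python rather than EF or SQL: orders are grouped by supplier, sorted by creation date, and numbered by position. The function name `number_orders` and the tuple layout are assumptions for illustration. Two orders with the same supplier and timestamp are ordered by their ID here, which is an arbitrary tie-break.

```python
from collections import defaultdict


def number_orders(orders):
    """Map each order ID to its per-supplier position by creation date.

    `orders` is an iterable of (order_id, supplier_id, created_at) tuples.
    """
    by_supplier = defaultdict(list)
    for order_id, supplier_id, created_at in orders:
        by_supplier[supplier_id].append((created_at, order_id))

    counters = {}
    for supplier_orders in by_supplier.values():
        # Sort by creation date (then by ID on ties) and number from 1
        for position, (_, order_id) in enumerate(sorted(supplier_orders), start=1):
            counters[order_id] = position
    return counters
```

Because the number is computed and never stored, it only stays stable as long as orders are never deleted and creation dates never change.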
Answer to the first question:
You are almost there. You need to normalize your data model a little bit more. This is a common scenario in which you want to minimize redundancy of the data and at the same time still maintain a meaningful relation (without the use of triggers).
One possible solution would be to create a Car_has_Part-Table in order to represent the relation between a Car and a Part entity:
| Car_has_Part |
----------------
| PartId |
| CarId |
The primary key of the Car_has_Part table is a composite primary key consisting of CarId + PartId which is unique and at the same time you avoid data duplication.
In your example in the Parts table the Doors part is repeated for every Car. Using this intermediate table the data is not duplicated and you have a proper relation.
Your new data model could look like this:
| Car | | Car_has_Part | | Part |
------- ---------------- ----------
|CarId | | PartId | | PartId |
|Model | | | | Descr |
| etc. | | CarId | | etc. |
This model covers the specified requirements:
I need to be able to create a counter, like an auto incremented
number, to count the parts for each car. Car 1, could have parts 1, 2,
3... and Car 2 would also have parts 1, 2, 5, 7... (some parts might be deleted along the way).
Select all PartId's from the Car_has_Part table over CarId.
Each part HAS to be counted separately for its related car. That's the
base requirement.
Same as above (without data duplication like in your example). Adding or removing a relation, or modifying a part name, has also become easier: you need to update only one row in the Parts table and the change is reflected for every car.
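The Car / Car_has_Part / Part model can be sketched as follows, using Python's sqlite3 module in place of SQL Server. The sample rows and the helper names `build_car_parts_db` and `parts_for_car` are made up for illustration.

```python
import sqlite3


def build_car_parts_db():
    """Create the three tables with a composite key on the junction table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Car (CarId INTEGER PRIMARY KEY, Model TEXT);
        CREATE TABLE Part (PartId INTEGER PRIMARY KEY, Descr TEXT);
        CREATE TABLE Car_has_Part (
            CarId INTEGER NOT NULL REFERENCES Car (CarId),
            PartId INTEGER NOT NULL REFERENCES Part (PartId),
            PRIMARY KEY (CarId, PartId)
        );
        INSERT INTO Car VALUES (1, 'Model A'), (2, 'Model B');
        INSERT INTO Part VALUES (1, 'Doors'), (2, 'Wheels');
        INSERT INTO Car_has_Part VALUES (1, 1), (1, 2), (2, 1);
    """)
    return conn


def parts_for_car(conn, car_id):
    """Return the part descriptions attached to one car."""
    rows = conn.execute(
        """
        SELECT p.Descr
        FROM Car_has_Part AS cp
        JOIN Part AS p ON p.PartId = cp.PartId
        WHERE cp.CarId = ?
        ORDER BY p.PartId
        """,
        (car_id,),
    )
    return [descr for (descr,) in rows]
```

Renaming 'Doors' means updating a single Part row, and every car that references it picks up the change.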
About the triggers question: you can only create a trigger with EF (using the Code First approach). Regarding execution, triggers are always executed in the database and EF can't control trigger execution (you can certainly enable or disable a trigger using raw SQL queries, but if I understand your question correctly, this is not what you want).
I have an application running that has entities that might be: CustomerType1, CustomerType2, and CustomerType3.
All three CustomerType entities might have completely different information, but they all have a CustomerID field which is an integer.
I am trying to figure out how to set things up so that no matter which type is created, the CustomerID will always be unique across all three types, and remain an integer.
For example, creating the following would result in the following CustomerID
CustomerType1 - 1
CustomerType1 - 2
CustomerType1 - 3
CustomerType2 - 4
CustomerType1 - 5
CustomerType3 - 6
CustomerType1 - 7
What is the best way to approach this?
2 possible approaches:
Use a single table for all of your customer types with a "discriminator" field to track each customer type, and include the CustomerID as its identity.
Use an external table to manage creating the CustomerID as an identity field.
The first approach has the advantage of direct support in the Entity Framework, as outlined in the following tutorial:
http://www.asp.net/mvc/tutorials/getting-started-with-ef-using-mvc/implementing-inheritance-with-the-entity-framework-in-an-asp-net-mvc-application
Note that although your customer types may contain different data, this approach could end up being cheaper in the long run in terms of scalability despite the wasted database space. In your business layer, the customer types could simply ignore the fields that don't pertain to them.
The second approach would probably be best suited to adding onto existing applications that are too difficult to change. In the long run, there is more work involved with keeping track of the IDs this way. For one, your business layer will need to fetch an ID from one table in order to insert into another table, which can be expensive for large datasets. Depending on the requirements of the business layer, there may also be scenarios where you have to discard an unused CustomerID and it would simply not exist in the system (you would skip from CustomerID 58 to CustomerID 60 for example).
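The second approach might look roughly like the sketch below, again using Python's sqlite3 module as a stand-in for SQL Server. A shared `CustomerIds` table hands out the identity value, and the type-specific table reuses it as its key. The table name `CustomerIds`, the `Name` column, and the helper names are all assumptions for illustration.

```python
import sqlite3

CUSTOMER_TABLES = ("CustomerType1", "CustomerType2", "CustomerType3")


def build_customer_db():
    conn = sqlite3.connect(":memory:")
    # The external table that owns the identity column
    conn.execute(
        "CREATE TABLE CustomerIds ("
        "CustomerID INTEGER PRIMARY KEY AUTOINCREMENT, "
        "CustomerType TEXT NOT NULL)"
    )
    for table in CUSTOMER_TABLES:
        conn.execute(
            f"CREATE TABLE {table} ("
            "CustomerID INTEGER PRIMARY KEY REFERENCES CustomerIds (CustomerID), "
            "Name TEXT)"
        )
    conn.commit()
    return conn


def create_customer(conn, customer_type, name):
    """Reserve a CustomerID, then insert the type-specific row with it."""
    if customer_type not in CUSTOMER_TABLES:
        raise ValueError(f"unknown customer type: {customer_type}")
    with conn:  # both inserts commit together or roll back together
        cursor = conn.execute(
            "INSERT INTO CustomerIds (CustomerType) VALUES (?)", (customer_type,)
        )
        customer_id = cursor.lastrowid
        conn.execute(
            f"INSERT INTO {customer_type} (CustomerID, Name) VALUES (?, ?)",
            (customer_id, name),
        )
    return customer_id
```

If the type-specific insert fails after an ID was reserved in a separate transaction, that ID is simply never used, which is the gap described above; here both inserts share one transaction to reduce that risk.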
Your approach is similar to having the same entity type in network databases, with unique IDs.
Usually, many developers use a UID or a related type. It's a type that has 64, 128, or more bits, and it's generated randomly and automatically. If you use the same database on different machines, it's almost impossible to get the same value twice.
Some databases store that value as a string, instead of an integer.
If you only had a single table with integer keys, how would you generate the primary key value? Automatically? Do you generate the value in code, and later assign it to the primary key field?
Solution 1
If your database supports U.I.D., O.I.D., or unique identifiers that are generated automatically as integers, use them.
Solution 2
If your database supports U.I.D. or O.I.D. as varchar / string, or the database engine or program that you use has a function that generates U.I.D.s, you may use that function, cast the resulting value from string to integer (stripping separators like "-"), and store it in an integer primary key field.
Summary
Many developers prefer to let the database engine generate the primary key automatically when inserting a new record. In cases like this, it's better to generate the primary key in code and assign it directly. Since you are using Entity Framework, I don't know how that library handles primary keys.
Cheers.
I have several tables within my database that contain nothing but "metadata".
For example, we have different grouptypes, contentItemTypes, languages, etc.
The problem is that if you use automatic numbering, it is possible to create gaps.
The IDs are used within our code, so the number is very important.
Now I wonder if it would be better not to use autonumbering within these tables.
Right now we have to create the row in the database first, before we can write our code. In my opinion this should not be the case.
What do you guys think?
I would use an identity column, as you suggest, as your primary key (surrogate key) and then store your candidate key (the identifier from your system) in a standard column with a unique constraint applied to it. This way you can ensure you do not insert duplicate records.
Make sense?
If these are FK tables used just to expand codes into a description or to hold other attributes, then I would NOT use an IDENTITY. Identity columns are good for user data that is constantly being inserted; metadata tables are usually static. When you deploy an update to your code, you don't want to be surprised by an IDENTITY value different from the one you expect.
For example, you add a new value to the "Languages" table and expect the ID to be 6, but for some reason (development is out of sync, another person has not implemented their next language type, etc.) the next identity you get is different, say 7. You then insert or convert a bunch of rows using Language ID = 6, which all fail because it does not exist (it is 7 in the metadata table). Worse yet, they all actually insert or update because the value 6 you thought was yours was already in the metadata table, and you now have a mix of two items sharing the same value 6, while your new value 7 is left unused.
I would pick the proper data type based on how many codes you need and how often you will need to look at the raw data (CHARs are nice to look at for a few values and help with memory).
For example, if you only have a few groups, and you'll often look at the raw data, then a char(1) may be good:
GroupTypes table
-----------------
GroupType char(1) --'M'=manufacturing, 'P'=purchasing, 'S'=sales
GroupTypeDescription varchar(100)
However, if there are many different values, then some form of an int (tinyint, smallint, int, bigint) may do it:
EmailTypes table
----------------
EmailType smallint --2 bytes, up to 32k different positive values
EmailTypeDescription varchar(100)
If the numbers are hardcoded in your code, don't use identity fields. Hardcode them in the database as well, as they'll be less prone to changing because someone scripted a database badly.
I would also use an identity column as the primary key, just for the sake of simplicity when inserting records into the database, but then use a column for the type of metadata, which I call LookUpType (int), as well as columns for LookUpId (the int value used in code or in select lists) and LookUpName (string). If those values require additional settings, so to speak, use extra columns. I personally use two extras: LookUpKey for hierarchical relations, and LookUpValue for abbreviations or alternate values of LookUpName.
Well, if those numbers are important to you because they'll be in code, I would probably not use an IDENTITY.
Instead, just make sure you use an INT column and make it the primary key. In that case, you will have to provide the IDs yourself, and they'll have to be unique.
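One way to keep the code and the lookup table in step is to define the IDs once in code and seed the table from that definition. The sketch below does this with Python's sqlite3 module and an IntEnum; the `Language` members, their numbers, and the `seed_languages` helper are hypothetical examples, not values from the question. Note that in SQLite an INTEGER PRIMARY KEY column would auto-assign a value if none were supplied, so the sketch always supplies the ID explicitly.

```python
import sqlite3
from enum import IntEnum


class Language(IntEnum):
    # Hypothetical codes; the point is that they are fixed in one place
    ENGLISH = 1
    DUTCH = 2
    FRENCH = 3


def seed_languages(conn):
    """Create the lookup table if needed and insert any missing codes."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Languages ("
        "LanguageId INTEGER PRIMARY KEY, "
        "Name TEXT NOT NULL UNIQUE)"
    )
    # Explicit IDs come from the enum, so every database gets the same values
    conn.executemany(
        "INSERT OR IGNORE INTO Languages (LanguageId, Name) VALUES (?, ?)",
        [(int(lang), lang.name.title()) for lang in Language],
    )
    conn.commit()
```

Running the seed step on every deployment is harmless, since rows that already exist are left untouched.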