In my database, there is a table that essentially contains questions with their options and answers. The first field, questionid, is the primary key, as expected (I've disabled AUTOINCREMENT for now). My client may want to delete some questions. This leaves me with two options:
1. All subsequent questions move up so that there is no empty row. This option implies that those questions will have their question IDs changed.
2. Leave it as it is, so there will be empty rows. If there's a new entry, it should fill the first empty row.
How do I go about implementing either of them? I actually prefer the second, but if anyone has a different opinion, it's welcome.
I'm using a MySQL database and C#.
You are using a database, so you don't have to worry about these issues.
There is no concept of "empty" row in a SQL table (well, one could say if all the columns are NULL then the row is empty, but that is not relevant here). Rows in a SQL table are not inherently ordered.
The rows themselves are stored on pages, which may or may not have extra space for more rows. This may be what you are thinking of when you think of an empty row.
When a row is deleted, the data is not rearranged. There is just some additional space on the page in case a new row is added later. If you add a new row with a primary key between two existing rows and the page is full, the database "splits" the page into two. Both resulting pages then have extra space.
The important point, though, is not how this works. One reason you are using a relational database for your application is so you can add and delete rows without having to worry about their actual physical storage.
If you have a database with lots of transactions (deletions and insertions), then you may want to periodically rearrange the data so it fits better on the pages. Such optimizations, though, are usually necessary only when there is a high volume of such transactions.
One thing, though: your application should not depend on the primary keys being sequential, so that it can handle deletes correctly.
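A minimal sketch of this behaviour, assuming Python's built-in sqlite3 module as a stand-in for MySQL and a hypothetical questions table: deleting a row leaves a gap in the keys, a new row gets a fresh key instead of filling the gap, and any question number shown to users is derived from the ordered result rather than stored in the primary key.

```python
import sqlite3

def demo_gaps() -> list:
    conn = sqlite3.connect(":memory:")
    # AUTOINCREMENT: SQLite never reuses the id of a deleted row.
    conn.execute(
        "CREATE TABLE questions ("
        " question_id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " question TEXT NOT NULL)"
    )
    with conn:
        conn.executemany(
            "INSERT INTO questions (question) VALUES (?)",
            [("Q1",), ("Q2",), ("Q3",), ("Q4",)],
        )
        # Deleting a question leaves a gap; the remaining rows keep their ids.
        conn.execute("DELETE FROM questions WHERE question_id = 2")
        # The new row gets a fresh id (5), not the freed id 2.
        conn.execute("INSERT INTO questions (question) VALUES ('Q5')")
    rows = conn.execute(
        "SELECT question_id, question FROM questions ORDER BY question_id"
    ).fetchall()
    conn.close()
    # Numbers shown to users come from the ordered result, not from the key.
    return [(number, qid, text) for number, (qid, text) in enumerate(rows, start=1)]

if __name__ == "__main__":
    for number, qid, text in demo_gaps():
        print(f"#{number}: {text} (question_id={qid})")
```

The display numbering stays contiguous no matter how many gaps the keys have, so neither renumbering nor gap-filling is needed.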
I am not sure how you have implemented it. I would have done it this way:
questions: question_id (PK), question
answers: answer_id (PK), answers
question_answer: question_id, answer_id
This has an advantage: many questions can share the same answer. If a question can be deleted, delete its rows from the question_answer table along with it.
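A minimal sketch of this schema, using SQLite through Python's sqlite3 module (column types and sample data are assumptions). ON DELETE CASCADE is one way to remove a question's rows from question_answer automatically; issuing the two DELETE statements from application code would work as well. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

SCHEMA = """
CREATE TABLE questions (
    question_id INTEGER PRIMARY KEY,
    question    TEXT NOT NULL
);
CREATE TABLE answers (
    answer_id INTEGER PRIMARY KEY,
    answer    TEXT NOT NULL
);
-- Link table: the same answer row can be shared by many questions.
CREATE TABLE question_answer (
    question_id INTEGER NOT NULL REFERENCES questions(question_id) ON DELETE CASCADE,
    answer_id   INTEGER NOT NULL REFERENCES answers(answer_id),
    PRIMARY KEY (question_id, answer_id)
);
"""

def build_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    with conn:
        conn.executemany(
            "INSERT INTO questions VALUES (?, ?)",
            [(1, "What is 2+2?"), (2, "What is 8/2?")],
        )
        conn.execute("INSERT INTO answers VALUES (10, '4')")  # one answer shared by both questions
        conn.executemany("INSERT INTO question_answer VALUES (?, ?)", [(1, 10), (2, 10)])
    return conn

def delete_question(conn: sqlite3.Connection, question_id: int) -> None:
    # ON DELETE CASCADE removes this question's rows from question_answer;
    # the shared row in answers stays in place for the other questions.
    with conn:
        conn.execute("DELETE FROM questions WHERE question_id = ?", (question_id,))
```

After delete_question(conn, 1), only the link (2, 10) remains in question_answer, and answer 10 is still available to question 2.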
We have many lookup tables in the system, and if a lookup row is already referenced by other tables, users shouldn't be allowed to update or delete the lookup table's "value" column, e.g. EnrollStatusName in the table below.
Example:
Lookup table: EnrollStatus
ID | EnrollStatusName
1 | Pending
2 | Approved
3 | Rejected
Other table: UserRegistration
URID | EnrollStatusID (FK)
11 | 1
12 | 1
13 | 2
In this example, I should be able to edit row 3 of the lookup table, since nothing references it.
The solution that comes to mind is to add a read-only column to the lookup table and, whenever there is DML on the UserRegistration table, set that column to true. Is there a better approach? It can be handled either in application code or in SQL, which is why I'm also tagging C#.
Delete is easy: just establish a foreign key relationship from the other tables to the lookup table, and don't use cascade or set null. It's then no longer possible to delete an in-use row, because it has dependent rows in other tables.
Update is perhaps trickier. You can use the same mechanism, and I think this is the neatest option: instead of doing the update as an UPDATE, do it as a DELETE followed by an INSERT. If the row is in use, the foreign key will prevent the delete.
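A minimal sketch of both ideas using the EnrollStatus and UserRegistration tables from the question, with SQLite (via Python's sqlite3) standing in for the real database; the helper names are hypothetical. The foreign key blocks deleting a referenced status, and an "update" done as DELETE plus INSERT inside one transaction fails for the same reason, leaving the row untouched.

```python
import sqlite3

def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE EnrollStatus (
            ID INTEGER PRIMARY KEY,
            EnrollStatusName TEXT NOT NULL
        );
        -- No ON DELETE CASCADE / SET NULL, so deleting a referenced status is rejected.
        CREATE TABLE UserRegistration (
            URID INTEGER PRIMARY KEY,
            EnrollStatusID INTEGER NOT NULL REFERENCES EnrollStatus(ID)
        );
        INSERT INTO EnrollStatus VALUES (1, 'Pending'), (2, 'Approved'), (3, 'Rejected');
        INSERT INTO UserRegistration VALUES (11, 1), (12, 1), (13, 2);
    """)
    return conn

def delete_status(conn: sqlite3.Connection, status_id: int) -> bool:
    """Return True if deleted, False if a referencing row blocked the delete."""
    try:
        with conn:
            conn.execute("DELETE FROM EnrollStatus WHERE ID = ?", (status_id,))
        return True
    except sqlite3.IntegrityError:
        return False

def rename_status(conn: sqlite3.Connection, status_id: int, new_name: str) -> bool:
    """'Update' as DELETE + INSERT in one transaction; fails if the row is in use."""
    try:
        with conn:  # both statements commit together, or the transaction is rolled back
            conn.execute("DELETE FROM EnrollStatus WHERE ID = ?", (status_id,))
            conn.execute(
                "INSERT INTO EnrollStatus (ID, EnrollStatusName) VALUES (?, ?)",
                (status_id, new_name),
            )
        return True
    except sqlite3.IntegrityError:
        return False

if __name__ == "__main__":
    conn = setup()
    print(rename_status(conn, 1, "On hold"))   # status 1 is referenced by users 11 and 12
    print(rename_status(conn, 3, "Declined"))  # status 3 is not referenced
```

Because the check lives in the database, it holds no matter which application or script touches the lookup table.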
Belayer pointed out in the comments that you can also use UPDATE: you'll have to include the PK column in the list of columns you set, and you can't set it to the value it already has, nor to a value that is already in use. You'll probably need a strategy like two updates in a row if you want to keep a controlled list of IDs:
UPDATE EnrollStatus SET id=-id, EnrollStatusName='whatever' WHERE id=3
UPDATE EnrollStatus SET id=-id WHERE id=-3
This strategy of flipping the ID negative and then back to positive works only if the row is not in use. If it is in use, the first statement will error out.
If you don't care that your PKs end up a mix of positive and negative values (and you shouldn't, but people do seem to care more than they should about what values PKs have), you can forgo the second update. You can always insert new values as positive, incrementing IDs and flip them while they're being edited, before they are brought into use.
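A minimal sketch of the two-update approach, again with SQLite via Python's sqlite3 standing in for the real database (the helper name is hypothetical). In SQLite, changing a referenced parent key is rejected when no ON UPDATE action is declared, which is the behaviour this relies on. Both updates run in one transaction, so a failure on the first statement leaves the row unchanged.

```python
import sqlite3

def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce the FK
    conn.executescript("""
        CREATE TABLE EnrollStatus (ID INTEGER PRIMARY KEY, EnrollStatusName TEXT NOT NULL);
        CREATE TABLE UserRegistration (
            URID INTEGER PRIMARY KEY,
            EnrollStatusID INTEGER NOT NULL REFERENCES EnrollStatus(ID)
        );
        INSERT INTO EnrollStatus VALUES (1, 'Pending'), (2, 'Approved'), (3, 'Rejected');
        INSERT INTO UserRegistration VALUES (11, 1), (12, 1), (13, 2);
    """)
    return conn

def rename_by_flipping(conn: sqlite3.Connection, status_id: int, new_name: str) -> bool:
    """Flip the key negative while renaming, then flip it back; False if the row is in use."""
    try:
        with conn:
            # Changing the key fails here if any UserRegistration row references it.
            conn.execute(
                "UPDATE EnrollStatus SET ID = -ID, EnrollStatusName = ? WHERE ID = ?",
                (new_name, status_id),
            )
            # Restore the original positive key (skip this if mixed signs are acceptable).
            conn.execute("UPDATE EnrollStatus SET ID = -ID WHERE ID = ?", (-status_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```

Calling rename_by_flipping(conn, 1, "On hold") is rejected because users 11 and 12 reference status 1, while status 3 can be renamed.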
I need to update a column in a large table (over 30 million rows) that has no primary key. The table has a unique email address column. The update involves generating a value, which must happen in C#, and appending it to a column value. So each row must be read, the column value updated, and the row written back out.
I was hoping there was a concept of cursors in ADO.NET, but I don't see one. I can read the rows quickly enough, but the update call, using a WHERE clause on the email address, takes forever. After researching this, most answers seem to be "put in a primary key!", but that is not an option here. Any thoughts?
For a 30-million-row heap, there aren't many options. Without any index, there is basically nothing you can do to speed it up.
The only remedy is to check the heap's fragmentation. You should add a clustered index to alleviate the table fragmentation, then drop it immediately. But if you cannot change that table in any way, it could be faster to move all the data into a new table :-)
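A minimal sketch of the "move all the data into a new table" route, assuming SQLite through Python's sqlite3 as a stand-in for the real server, a hypothetical users(email, code) table, and a hypothetical generate_value() in place of the C# logic. The rows are read in one sequential scan and written in batches to a new table, which then replaces the old one, so no per-row WHERE lookup on email is needed.

```python
import sqlite3

def generate_value(email: str) -> str:
    # Hypothetical placeholder for the value that has to be computed in application code.
    return "-" + email.split("@", 1)[0][:3]

def rebuild_with_new_values(conn: sqlite3.Connection, batch_size: int = 10_000) -> None:
    """Copy every row into a new table with the appended value, then swap the tables."""
    conn.execute("CREATE TABLE users_new (email TEXT NOT NULL, code TEXT)")
    reader = conn.execute("SELECT email, code FROM users")  # one sequential scan
    with conn:  # commits once at the end; rolls back this block's work on error
        while True:
            rows = reader.fetchmany(batch_size)
            if not rows:
                break
            conn.executemany(
                "INSERT INTO users_new (email, code) VALUES (?, ?)",
                [(email, (code or "") + generate_value(email)) for email, code in rows],
            )
        reader.close()  # release the scan before dropping the table it reads from
        conn.execute("DROP TABLE users")
        conn.execute("ALTER TABLE users_new RENAME TO users")

def demo() -> list:
    conn = sqlite3.connect(":memory:")
    # Heap-like source table: no primary key and no index.
    conn.execute("CREATE TABLE users (email TEXT NOT NULL, code TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("ann@example.com", "A"), ("bob@example.com", None)],
    )
    conn.commit()
    rebuild_with_new_values(conn, batch_size=1)
    return conn.execute("SELECT email, code FROM users ORDER BY email").fetchall()
```

On a real server the batch size, the bulk-load mechanism, and recreating any permissions or constraints on the new table would all need checking; the sketch only shows the read-compute-write-swap shape.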
I am rewriting a timesheet application, including redesigning the database, and it will require data migration from Oracle to Oracle.
In the old system, the field 'EmployeeCod' is the primary key, and it is alphanumeric, e.g. 'UK001', 'UK002', 'FR001', 'FR002', 'US001'. The Employee table is also linked to the timesheet table and other tables, where EmpCode is referenced as a FK.
To make JOINs perform faster in the new system, I was thinking about adding a new INT column to the Employee table and making it the PK. (I don't know if it will make any big difference.)
- The Employee table has about 600 rows.
- The data type of EmpCode is VARCHAR2(20) in the old DB; I can reduce it to VARCHAR2(6) in the new system and alter it later as the company expands.
I am wondering whether it is better to keep EmpCode as the primary key, which will make migrating the data easier, or whether I should add an INT column.
Someone gave me the following advice in one of my previous threads:
“if you need to create a composite code of AANNN then I'd split this into two: a simple 'Prefix' field of CHAR(2) and an identity field of INT, then turn EmpCode into a computed field that concats the two and stick an index on there that (#Chris)”
I am not sure this option would work, as the Employee table is linked to other tables as well (EmpCode is used as a FK in other tables).
If you do add this PK and also keep the former PK, you will have some data management issues to deal with, or perhaps your customers will. Getting rid of the old PK may not be feasible if there are existing users who will be upgrading to the new database.
If EmployeeCode, the former PK, is used by the users of the data to identify employees, then you will have to add a constraint to make sure that this field stays unique. Carrying both codes will wipe out any performance gains you were hoping for.
If it were me, I'd leave well enough alone. The performance gains, if any, will be trivial.
The performance difference will be negligible if the index you're creating on the alphanumeric field is the clustered index for the table. Based on your question, that is going to be the case, but I wanted to note it for completeness. I say this for two reasons:
A clustered index defines the physical order of the table. So when seeking against that index (presumably to fetch more data off the data page in a query), a binary search can be performed, because the data is also physically stored in that order.
A binary search is just about as efficient as you can get, though let's not forget statistical indexes. I call this out because integer primary keys build statistical indexes, which are as fast a seek as you can get because, mathematically speaking, we know that 2 comes after 1, for example.
So just keep that in mind when building alphanumeric, or even compound, keys and indexes and comparing them with an integer key. Personally, I prefer to stick with integer primary keys, because I have found that they perform better over time during extreme growth.
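The binary-search point can be illustrated with a small Python sketch: an in-memory sorted list stands in for a clustered index, and 600 keys are used to match the Employee table size from the question (the UK001-style codes are generated for illustration). The number of probes depends only on a key's position in the sorted order, not on whether it is an integer or an alphanumeric code; what differs is the cost of each individual comparison.

```python
def binary_search(sorted_keys, key):
    """Return (index, probes); index is -1 if the key is absent."""
    lo, hi, probes = 0, len(sorted_keys) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] == key:
            return mid, probes
        if sorted_keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes

# 600 integer keys and 600 AANNN-style codes, both in sorted order.
int_keys = list(range(1, 601))
code_keys = sorted(f"{prefix}{n:03d}" for prefix in ("FR", "UK", "US") for n in range(1, 201))

if __name__ == "__main__":
    for position in (0, 417, 599):
        print(position,
              binary_search(int_keys, int_keys[position]),
              binary_search(code_keys, code_keys[position]))
```

For 600 keys, no lookup needs more than 10 probes with either key type.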
I hope this helps.
I use alphanumeric primary keys regularly and see absolutely no issues with it. There is no performance issue, you have a wider addressable space, and you can be more expressive/human readable. Integer keys are just a convention.
Add to that the risk you're adding to your project by making a major architectural change on top of the porting issues, and I'd say stick with the existing schema as much as possible.
There will be no performance improvement. In fact, unless you know, and can prove or measure, that you have a performance problem, changing things "to make them faster" usually leads to pain.
However, there is a concern that your primary key appears to carry meaning - it's a country code, concatenated with a number. What if an employee moves from the US to the UK? What if the UK hires its 1000th employee?
For that reason, I'd refactor the application to use a meaningless primary key; whether it's an INT or a VARCHAR is not hugely relevant.
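The concern about meaningful keys can be made concrete with a small Python sketch of the AANNN format (make_emp_code, Employee, and transfer are hypothetical names). The 1000th employee in a country no longer fits five characters, and a code that embeds the country has to change when the employee moves, whereas a meaningless surrogate key stays the same.

```python
from dataclasses import dataclass

def make_emp_code(prefix: str, number: int) -> str:
    """Build an AANNN code: two-letter country prefix plus a zero-padded 3-digit number."""
    if len(prefix) != 2 or not prefix.isalpha():
        raise ValueError("prefix must be two letters")
    if not 1 <= number <= 999:
        # 'UK1000' would be six characters and break the AANNN pattern.
        raise ValueError(f"number {number} does not fit in three digits")
    return f"{prefix.upper()}{number:03d}"

@dataclass
class Employee:
    employee_id: int  # meaningless surrogate key: never changes, safe to use as the FK
    emp_code: str     # meaningful business code: may change, so it should not be the FK

def transfer(employee: Employee, new_prefix: str, new_number: int) -> Employee:
    # Only the business code changes; rows referencing employee_id are unaffected.
    return Employee(employee.employee_id, make_emp_code(new_prefix, new_number))
```

If emp_code were the primary key instead, the transfer would have to be propagated to every timesheet row that references it.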
You do occasionally come across alphanumeric primary keys. Personally, I find they just make life more difficult. If you are able to change it and you want to change it, I would say go ahead; it will make things easier for you later. As for it being an FK, you would need to be careful to write a script that properly updates all the data. One way you can do this is:
Step 1: Create a new int column for the PK and set Identity Insert to true
Step 2: Add a new int column in your child table and then:
Step 3: write an update script like this:
UPDATE childTable
SET myNewEmpIDColumn =
  (SELECT P.myNewEmpIDColumn FROM parentTable P WHERE P.oldEmpID = childTable.oldEmpID)
Step 4: Repeat steps 2 & 3 for all child tables
Step 5: Delete all old FK columns
Something like that, and don't forget to back up your current DB first ;)
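The steps above can be tried end to end with a small sketch, assuming SQLite via Python's sqlite3 in place of Oracle and reusing the answer's placeholder names (parentTable, oldEmpID, myNewEmpIDColumn) plus a hypothetical timesheet child table. Step 3 uses a correlated subquery, which SQLite, Oracle, and SQL Server all accept. Step 5 is left out (older SQLite versions have no DROP COLUMN), and the sketch checks that every child row found a matching parent before anything would be dropped.

```python
import sqlite3

def migrate_to_int_keys(conn: sqlite3.Connection, child_tables: list) -> None:
    """Steps 1-4 of the migration; table names must come from trusted code, not user input."""
    with conn:
        # Step 1: add an integer key to the parent and number the existing rows.
        conn.execute("ALTER TABLE parentTable ADD COLUMN myNewEmpIDColumn INTEGER")
        codes = [row[0] for row in conn.execute("SELECT oldEmpID FROM parentTable ORDER BY oldEmpID")]
        conn.executemany(
            "UPDATE parentTable SET myNewEmpIDColumn = ? WHERE oldEmpID = ?",
            list(enumerate(codes, start=1)),
        )
        for child in child_tables:  # Step 4: repeat steps 2 and 3 for every child table
            # Step 2: add the new integer FK column to the child table.
            conn.execute(f"ALTER TABLE {child} ADD COLUMN myNewEmpIDColumn INTEGER")
            # Step 3: copy the new key across, matching on the old alphanumeric key.
            conn.execute(
                f"UPDATE {child} SET myNewEmpIDColumn = ("
                " SELECT p.myNewEmpIDColumn FROM parentTable p"
                f" WHERE p.oldEmpID = {child}.oldEmpID)"
            )
            # Do not proceed to step 5 while any child row is left without a new key.
            unmatched = conn.execute(
                f"SELECT COUNT(*) FROM {child} WHERE myNewEmpIDColumn IS NULL"
            ).fetchone()[0]
            if unmatched:
                raise RuntimeError(f"{unmatched} rows in {child} have no matching parent")

def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE parentTable (oldEmpID TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE timesheet (
            entry_id INTEGER PRIMARY KEY,
            oldEmpID TEXT REFERENCES parentTable(oldEmpID),
            hours REAL
        );
        INSERT INTO parentTable VALUES ('FR001', 'Ana'), ('UK001', 'Bob'), ('US001', 'Cy');
        INSERT INTO timesheet (oldEmpID, hours) VALUES ('UK001', 7.5), ('US001', 8), ('UK001', 6);
    """)
    migrate_to_int_keys(conn, ["timesheet"])
    return conn.execute(
        "SELECT oldEmpID, myNewEmpIDColumn FROM timesheet ORDER BY entry_id"
    ).fetchall()
```

Running demo() maps FR001, UK001, and US001 to 1, 2, and 3, so the timesheet rows get 2, 3, and 2.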
I have a C# app that allows the user to update some columns in a DB. My problem is that I have 300,000 records in the DB, and updating just 50,000 took 30 minutes. Can I do something to speed things up?
My update query looks like this:
UPDATE SET UM = 'UM', Code = 'Code' WHERE Material = 'MaterialCode'.
My only unique constraint is Material. I read the file the user selects, put the data in a DataTable, and then go row by row, updating the corresponding material in the DB.
Limit the number of indexes in your database, especially if your application updates data very frequently. Each index takes up disk space and slows the adding, deleting, and updating of rows, so you should create new indexes only after analyzing how the data is used, the types and frequencies of the queries performed, and how your queries will use the new indexes.
In many cases, the speed advantages of creating new indexes outweigh the disadvantages of the additional space used and slower row modification. However, avoid redundant indexes; create them only when necessary. For read-only tables, the number of indexes can be increased.
Use a nonclustered index on the table if updates are frequent.
Use a clustered index on the table if updates/inserts are infrequent.
The C# code may not be the problem; your update statement is what matters. The WHERE clause of the update statement is the place to look. You need an indexed column in the WHERE clause.
Another thing: is the Material field indexed? And does the WHERE clause need to be on a field with a varchar value? Can't it be an integer-valued field?
Performance will be better if you filter on fields having integers and not strings. Not sure if this is possible for you.
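A minimal sketch of what these answers point to, using SQLite via Python's sqlite3 as a stand-in for the real database and a hypothetical materials table: the WHERE-clause column is indexed so each UPDATE can seek instead of scanning, and the whole batch is sent as one parameterized statement inside a single transaction. The single transaction and the parameters are additions of this sketch, not something the answers above state; in ADO.NET the comparable shape would be one parameterized command reused inside one transaction.

```python
import sqlite3

def apply_updates(conn: sqlite3.Connection, rows) -> None:
    """rows: iterable of (um, code, material) tuples read from the user's file."""
    with conn:  # one transaction for the whole batch instead of one commit per row
        conn.executemany(
            "UPDATE materials SET UM = ?, Code = ? WHERE Material = ?",
            rows,
        )

def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE materials (Material TEXT NOT NULL, UM TEXT, Code TEXT)")
    # Index the WHERE-clause column so each UPDATE can seek instead of scanning the table.
    conn.execute("CREATE UNIQUE INDEX ix_materials_material ON materials (Material)")
    with conn:
        conn.executemany(
            "INSERT INTO materials (Material) VALUES (?)",
            [(f"M{i:05d}",) for i in range(1000)],
        )
    apply_updates(conn, [("KG", "C1", "M00001"), ("PCS", "C2", "M00999")])
    return conn.execute(
        "SELECT Material, UM, Code FROM materials WHERE UM IS NOT NULL ORDER BY Material"
    ).fetchall()
```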
I know you can insert new items into your SQL database (LINQ to SQL, code generated by SQLMetal.exe). You can attach new items with the Attach method on your entity table and whatnot, or you can edit existing records.
Now, let's say that instead of one new entity, you're presented with many, some of which may well already exist in the table. There is a primary key, but some records in the collection may have been altered, so the primary key probably isn't going to be the best way to figure out what has changed.
Do I have to go through every record in my LINQ table and compare all of its column data with all of the column data of the entities in the collection in question? This would tell me which ones are new, which ones have changed, and which ones can be discarded. It just seems like a really long-winded way of doing it.
Is there an easier way?
Thanks.
I think an "UPSERT" is what you're after.
It's basically a combined insert/update command in SQL: if the row exists, update it; if not, create it.
http://www.databasejournal.com/features/mssql/article.php/3739131/UPSERT-Functionality-in-SQL-Server-2008.htm
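The linked article covers SQL Server 2008, where this is done with MERGE. As a minimal sketch of the same idea, here is SQLite's upsert syntax (INSERT ... ON CONFLICT ... DO UPDATE, available since SQLite 3.24) driven from Python's sqlite3; the items table is hypothetical. Rows whose key already exists are updated and the rest are inserted, with no column-by-column comparison in application code.

```python
import sqlite3

# 'excluded' refers to the row that could not be inserted because its id already exists.
UPSERT = """
INSERT INTO items (id, name, price) VALUES (?, ?, ?)
ON CONFLICT(id) DO UPDATE SET
    name = excluded.name,
    price = excluded.price
"""

def upsert_items(conn: sqlite3.Connection, items) -> None:
    """items: iterable of (id, name, price); new ids are inserted, existing ids updated."""
    with conn:
        conn.executemany(UPSERT, items)

def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
    upsert_items(conn, [(1, "bolt", 0.10), (2, "nut", 0.05)])
    # Second batch: id 2 already exists (updated), id 3 is new (inserted).
    upsert_items(conn, [(2, "hex nut", 0.07), (3, "washer", 0.02)])
    return conn.execute("SELECT id, name, price FROM items ORDER BY id").fetchall()
```

Unchanged rows are simply rewritten with the same values; if that matters, a WHERE clause on the DO UPDATE can restrict the update to rows that actually differ.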