We have a production database that uses the Change Data Capture (CDC) feature to maintain audit data for a few tables. Because of the performance impact, and because we need to make changes to the database structure (such as adding indexes) that we can't make while CDC is enabled, we now want to disable CDC.
The requirement is to capture all Insert, Delete and Update actions performed from the web application, for a set of tables, in a single audit table.
For example, I have to maintain audit information for TableA in TableB for every Add, Update and Remove done to TableA through the web application (in C#).
I'm currently updating every stored procedure that can perform any of these three actions, for each table in the set, but this approach is error-prone and time-consuming because I have a huge list of tables.
Is there a better way to achieve this that is more efficient in terms of both development time and performance?
Quick answer: just make sure that every action that changes the database is recorded!
In our project, we created a separate table with the following fields:
Date_And_Time | Actions | UserID | Part_of_Program
Whenever a user executes an insert, update or delete command, logs in or out of the system, or the system runs an automatic operation against our database, the program automatically inserts a record here that tells us what action was taken, who did it, and what part of the program was affected.
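A minimal sketch of such a table, using the field names above (the table name and data types are assumptions):

    -- Action-log table as described above (name and data types are assumptions)
    CREATE TABLE dbo.ActionLog
    (
        Date_And_Time   DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME(),
        Actions         NVARCHAR(200) NOT NULL,   -- e.g. 'UPDATE TableA', 'LOGIN'
        UserID          INT           NOT NULL,
        Part_of_Program NVARCHAR(100) NOT NULL    -- which screen/module was affected
    );

    -- The application writes one row per action, e.g.:
    INSERT INTO dbo.ActionLog (Actions, UserID, Part_of_Program)
    VALUES (N'UPDATE TableA', 42, N'Accounts screen');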
...Or
You could write code that constantly checks the current size of your tables and, if the size changes, records it in the audit table.
If the size increases, data was inserted; otherwise data was deleted.
Evening all,
Background: I have many XML data files which I need to import into an SQL database via my C#/WPF Windows application. While some data files are a simple, straightforward INSERT, many require validation and verification checks, and need GUIDs from existing records (within the db) before the INSERT takes place in order to maintain certain relationships.
So I've broken the process into three stages:
1. Validate records. Many checks exist, e.g. the 50,000 accounts in the xml file must each have a reference matching an associated record already in the database; if there is no corresponding account, abandon the entire import. Another is that only the 'Current' record can be updated, so if an account points to a 'Historic' record the import needs to crash and burn.
2. UPDATE the associated database records, e.g. set them from 'Current' to 'Historic', as each record is about to be superseded by what comes next...
3. INSERT the records across multiple tables, e.g. the 50,000 accounts to be inserted.
So, in summary, I need to validate records with a few checks before any changes can be made to the database, at which point I will update various tables for existing records before inserting the 'latest' records.
My first attempt was to load the xml file into an XmlDocument or XDocument instance, iterate over every account in the xml file and perform an SQL command for every account: verify it exists, verify it's a current account, change the existing record, then insert the new one. Rinse and repeat for thousands of records - not ideal to say the least.
So my second attempt is to load the xml file into a DataTable, export the corresponding accounts from the database into another DataTable, and perform a nested-loop validation, e.g. does DT1.AccountID exist in DT2.AccountID, move DT2.GUID to DT1.GUID, etc. I appreciate this could also be a slow process. That said, I then have the luxury of performing both the UPDATE and INSERT stored procedures with a table-valued parameter (TVP) and making use of the DataTable information.
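For reference, the TVP route I have in mind is roughly this shape (the type, procedure, table and column names below are just placeholders):

    -- Table type matching the account rows loaded from the xml file
    CREATE TYPE dbo.AccountImportType AS TABLE
    (
        AccountRef NVARCHAR(50)  NOT NULL,
        SomeValue  NVARCHAR(100) NULL
    );
    GO

    -- Validation step: return any incoming account that has no matching
    -- 'Current' record already in the database
    CREATE PROCEDURE dbo.ValidateAccountImport
        @Accounts dbo.AccountImportType READONLY
    AS
    BEGIN
        SET NOCOUNT ON;
        SELECT a.AccountRef
        FROM   @Accounts AS a
        LEFT JOIN dbo.Accounts AS db
               ON db.AccountRef = a.AccountRef
              AND db.Status = 'Current'
        WHERE  db.AccountRef IS NULL;
    END;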
I appreciate many will suggest letting the SQL do all of the work, but I'm lacking in that skillset unfortunately (happy to learn if that's the general consensus); I would much rather do the work in C# code if at all possible.
Any views on this are greatly appreciated. Many of the questions I've found are around bulk INSERT, not so much about validating existing records, followed by updating records, followed by inserting records. I suppose my question is around the first part, the validation. Does extracting the data from the db into a DataTable to work on seem wrong, old-fashioned, pointless?
I'm sure I've missed out some vital piece of information, so apologies if anything is unclear.
Cheers
What is the best approach to auditing the database in order to:
Make a copy of the deleted and updated records.
Save the date and the user ID of whoever performed the DML (the user ID is stored in an ASP.NET session).
My manager told me to add three extra columns to each table: one for the ID of whoever updated the record, one for whoever deleted it, and a third holding a boolean value (0 for deleted, 1 for active, i.e. not deleted yet). But I think this is a workaround and I'm pretty sure there is a better way to do it.
I was thinking of creating history tables and writing an AFTER DELETE trigger that saves all the DML.
Is this the way to go, or is there a more straightforward way?
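Something along these lines is what I had in mind (the table and column names are just illustrative, and passing the ASP.NET user ID down to the trigger via SESSION_CONTEXT is only one assumed option):

    -- History table mirroring the audited table plus 'who/when' columns
    CREATE TABLE dbo.Orders_History
    (
        OrderId   INT           NOT NULL,
        Total     DECIMAL(18,2) NOT NULL,
        DeletedBy INT           NOT NULL,
        DeletedAt DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
    );
    GO

    CREATE TRIGGER trg_Orders_AfterDelete
    ON dbo.Orders
    AFTER DELETE
    AS
    BEGIN
        SET NOCOUNT ON;
        INSERT INTO dbo.Orders_History (OrderId, Total, DeletedBy)
        SELECT d.OrderId, d.Total,
               -- user ID set per connection by the web app via sp_set_session_context
               CAST(SESSION_CONTEXT(N'UserId') AS INT)
        FROM deleted AS d;
    END;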
From SQL Server 2016 onwards you can do this using temporal tables:
A system-versioned temporal table is a type of user table designed to keep a full history of data changes, allowing easy point-in-time analysis. This type of temporal table is referred to as a system-versioned temporal table because the period of validity for each row is managed by the system (that is, the database engine).
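A minimal example of creating one (the table and column names are illustrative):

    -- System-versioned temporal table; the engine maintains the history table
    CREATE TABLE dbo.TableA
    (
        Id        INT           NOT NULL PRIMARY KEY CLUSTERED,
        Name      NVARCHAR(100) NOT NULL,
        ValidFrom DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
        ValidTo   DATETIME2 GENERATED ALWAYS AS ROW END   NOT NULL,
        PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
    )
    WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.TableA_History));

    -- Point-in-time query over the full history
    SELECT * FROM dbo.TableA
    FOR SYSTEM_TIME AS OF '2020-01-01T00:00:00';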
If what you are really trying to do is to record who changed what, a better approach is to use roles and groups to prevent users altering anything they shouldn't.
There is an MSSQL table in an external customer's network. The aim is to create and maintain a copy of the same table on the local server. The external table's data can, of course, change every hour, and somehow I have to check for changes and reflect them in the local table when rows are added, deleted or updated. Is there any efficient way to do this? Additionally, I know this table will have thousands of records. My first thought was some kind of Windows service application, but I have no idea what approach to take; I don't think a DataTable/DataSet is suitable for that many records, as I remember getting an out-of-memory exception in the past. Any ideas?
The way I would go about it is to create triggers on the existing tables that, upon insert, update and delete, insert into a new sync table (or one sync table per existing table) marking the change as pending synchronization. Your C# code would read from this table on a schedule, apply the changes to the local DB and delete the rows from the 'pending' table.
This is how Azure SQL Data Sync works, for example; it creates a tracking table per existing table in the source database and then checks all of those tables. Depending on how many tables you have, their structure, etc., you could instead write something like JSON into just one table, and it would be easier to check one table than many (obviously this depends on how many tables we're actually talking about).
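A rough sketch of the trigger-plus-pending-table idea (all of the table and column names here are made up):

    -- One 'pending changes' table shared by all tracked tables
    CREATE TABLE dbo.PendingChanges
    (
        ChangeId  BIGINT IDENTITY PRIMARY KEY,
        TableName SYSNAME       NOT NULL,
        KeyValue  NVARCHAR(100) NOT NULL,
        Operation CHAR(1)       NOT NULL,   -- 'I', 'U' or 'D'
        ChangedAt DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
    );
    GO

    -- Per-table trigger recording which rows changed and how
    CREATE TRIGGER trg_Customers_Track
    ON dbo.Customers
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;
        INSERT INTO dbo.PendingChanges (TableName, KeyValue, Operation)
        SELECT 'Customers',
               CAST(COALESCE(i.CustomerId, d.CustomerId) AS NVARCHAR(100)),
               CASE WHEN i.CustomerId IS NOT NULL AND d.CustomerId IS NOT NULL THEN 'U'
                    WHEN i.CustomerId IS NOT NULL THEN 'I'
                    ELSE 'D' END
        FROM inserted AS i
        FULL OUTER JOIN deleted AS d ON d.CustomerId = i.CustomerId;
    END;

The C# service then polls dbo.PendingChanges on a schedule, applies each change to the local copy, and deletes the rows it has processed.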
I want to be able to back up the original data for any rows in a table that have been modified or deleted, and I also want to keep track of all inserted rows. The reason is that my application will let a user make changes to a database, and I do not want to commit the changes until the user confirms that all of them are correct.
How would I go about doing this? I took a look at the TransactionScope and Transaction classes, but what if the program has 100 updates to make across 100 different tables? If my understanding is correct, I would then need 100 different threads open until the user confirms the changes? I came across this while searching about committing and rolling back transactions.
Help :(
If you're waiting for user input you shouldn't leave a transaction open the whole time. Transactions are better suited to handling low-level rollbacks within the database itself, or from .NET code managing database operations.
If you're waiting for user input, store the temporary data in a different table until you have confirmation from them and then move the data to its ultimate location.
As you want to track all changed rows, your issue is not really about transactions. A transaction is for keeping the integrity of the database in case of an error between two commands. I think you need an undo/redo history model.
You can solve your problem with additional tables: one table keeping all changes, and one pointing to the approved version of each change. For a products table it may look something like this:
Products Table
RowId | ProductId | ProductName | Description | Price | ChangingTime
ApprovedProducts Table
ProductId | RowId
Whenever a user updates a product, you add a new row to the Products table, and you point to the desired row via RowId in the ApprovedProducts table.
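A rough sketch of those two tables and the approval step (data types and sample values are assumptions):

    CREATE TABLE dbo.Products
    (
        RowId        BIGINT IDENTITY PRIMARY KEY,
        ProductId    INT           NOT NULL,
        ProductName  NVARCHAR(100) NOT NULL,
        Description  NVARCHAR(400) NULL,
        Price        DECIMAL(18,2) NOT NULL,
        ChangingTime DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
    );

    CREATE TABLE dbo.ApprovedProducts
    (
        ProductId INT    NOT NULL PRIMARY KEY,
        RowId     BIGINT NOT NULL REFERENCES dbo.Products(RowId)
    );

    -- An edit adds a new version row; approval simply repoints RowId
    INSERT INTO dbo.Products (ProductId, ProductName, Description, Price)
    VALUES (42, N'Widget', N'New description', 9.99);

    UPDATE dbo.ApprovedProducts
    SET    RowId = SCOPE_IDENTITY()
    WHERE  ProductId = 42;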
I think you should leave it to the database to do this: you can write a trigger on insert, update and delete and save the new and old data. Doing this on the front end is not recommended; triggers are easy and portable to any database engine.
Regards,
I need to update about 250k rows in a table, and each field to update will have a different value depending on the row itself (not calculated from the row id or the key, but externally).
I tried a parametrized query, but it turns out to be slow. (I could still try a table-valued parameter, SqlDbType.Structured, in SQL Server 2008, but I'd like a general way to do this across several databases, including MySQL, Oracle and Firebird.)
Making a huge concatenation of individual updates is also slow (but about two times faster than making thousands of individual calls (round trips!) using parametrized queries).
What about creating a temp table and running an update that joins my table with the temp one? Would that be faster?
How slow is "slow"?
The main problem with this is that it would create an enormous entry in the database's log file (in case there's a power failure half-way through the update, the database needs to log each action so that it can rollback in the event of failure). This is most likely where the "slowness" is coming from, more than anything else (though obviously with such a large number of rows, there are other ways to make the thing inefficient [e.g. doing one DB roundtrip per update would be unbearably slow], I'm just saying once you eliminate the obvious things, you'll still find it's pretty slow).
There are a few ways you can do it more efficiently. One would be to do the update in chunks, say 1,000 rows at a time. That way, the database writes lots of small log entries rather than one really huge one.
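For example, a chunked update might look roughly like this (SQL Server syntax; the staging table #NewValues and the column names are assumptions):

    -- Assumes #NewValues(Id, NewPrice) already holds the externally computed values
    DECLARE @rows INT = 1;

    WHILE @rows > 0
    BEGIN
        UPDATE TOP (1000) t
        SET    t.Price = s.NewPrice
        FROM   dbo.MyTable AS t
        JOIN   #NewValues  AS s ON s.Id = t.Id
        WHERE  t.Price <> s.NewPrice;   -- skip rows already updated

        SET @rows = @@ROWCOUNT;         -- stop once a pass changes nothing
    END;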
Another way would be to turn off - or turn "down" - the database's logging for the duration of the update. In SQL Server, for example, you can set the recovery model to "Simple" or "Bulk-logged", which would speed it up considerably (with the caveat that you are more at risk if there's a power failure or something during the update).
Edit: Just to expand a little more, probably the most efficient way to actually execute the queries in the first place would be to do a BULK INSERT of all the new rows into a temporary table, and then do a single UPDATE of the existing table from that (or do the UPDATE in chunks of 1,000 as I said above). Most of my answer was addressing the problem once you've implemented it like that: you'll still find it's pretty slow...
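A sketch of that staging-table approach (the file path, table and column names are placeholders):

    -- 1. Bulk load the new values into a temp table
    CREATE TABLE #NewValues (Id INT PRIMARY KEY, NewPrice DECIMAL(18,2));

    BULK INSERT #NewValues
    FROM 'C:\temp\new_values.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

    -- 2. Apply all changes in a single set-based UPDATE
    UPDATE t
    SET    t.Price = s.NewPrice
    FROM   dbo.MyTable AS t
    JOIN   #NewValues  AS s ON s.Id = t.Id;

    DROP TABLE #NewValues;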
Call a stored procedure if possible.
If the columns being updated are part of indexes, you could:
drop these indexes
do the update
re-create the indexes.
If you need these indexes to retrieve the data, well, it doesn't help.
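For example (the index, table and column names are illustrative; a disabled nonclustered index keeps its definition and can simply be rebuilt afterwards):

    ALTER INDEX IX_MyTable_Price ON dbo.MyTable DISABLE;

    -- the big update on the indexed column
    UPDATE dbo.MyTable
    SET    Price = Price * 1.10
    WHERE  CategoryId = 7;

    ALTER INDEX IX_MyTable_Price ON dbo.MyTable REBUILD;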
You should use SqlBulkCopy with the KeepIdentity option set.
As part of a SqlTransaction do a query to SELECT all the records that need updating and then DELETE THEM, returning those selected (and now removed) records. Read them into C# in a single batch. Update the records on the C# side in memory, now that you've narrowed the selection and then SqlBulkCopy those updated records back, keys and all. And don't forget to commit the transaction. It's more work, but it's very fast.
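The database-side half of that pattern might look roughly like this (the re-insert itself is done from C# via SqlBulkCopy; the table, key and temp-table names are made up):

    -- Run inside the SqlTransaction opened from C#: delete the target rows
    -- and return them to the caller in the same statement
    DELETE FROM dbo.MyTable
    OUTPUT DELETED.*                           -- result set read into C# in one batch
    WHERE Id IN (SELECT Id FROM #RowsToUpdate);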
Here's what I would do:
Retrieve the entire table, that is, the columns you need in order to calculate/retrieve/find/produce the changes externally
Calculate/produce those changes
Run a bulk insert to a temporary table, uploading the information you need server-side in order to do the changes. This would require the key information + new values for all the rows you intend to change.
Run SQL on the server to copy new values from the temporary table into the production table.
Pros:
Running the final step server-side is faster than running tons and tons of individual SQL, so you're going to lock the table in question for a shorter time
Bulk insert like this is fast
Cons:
Requires extra space in your database for the temporary table
Produces more log data, logging both the bulk insert and the changes to the production table
Here are things that can make your updates slow:
executing the updates one by one through a parametrized query
solution: do the update in one statement
a large transaction creates a big log entry
see codeka's answer
updating indexes (the RDBMS updates the index after each row; if you change an indexed column, this can be very costly on a large table)
if you can, drop the indexes before the update and recreate them afterwards
updating a field that has a foreign key constraint - for each changed record the RDBMS will go and look up the appropriate key
if you can, disable foreign key constraints before the update and re-enable them afterwards
triggers and row-level checks
if you can, disable triggers before the update and re-enable them afterwards
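A sketch of the last two points (SQL Server syntax; the constraint, trigger, table and column names are examples):

    -- Temporarily relax the foreign key and triggers around the big update
    ALTER TABLE dbo.MyTable NOCHECK CONSTRAINT FK_MyTable_Category;
    DISABLE TRIGGER ALL ON dbo.MyTable;

    UPDATE dbo.MyTable
    SET    Price = Price * 1.10
    WHERE  CategoryId = 7;

    ENABLE TRIGGER ALL ON dbo.MyTable;
    -- WITH CHECK re-validates existing rows when the constraint is re-enabled
    ALTER TABLE dbo.MyTable WITH CHECK CHECK CONSTRAINT FK_MyTable_Category;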