There is an MSSQL table in an external customer network. The aim is to create and maintain a copy of that table on our local server. The external table's data can change every hour, and somehow I have to detect those changes and reflect them in the local table when rows are added, deleted or updated. Is there any efficient way to do this? Additionally, I know this table will have thousands of records. My first thought was some kind of Windows service, but I have no idea which approach to take; I don't think loading everything into a DataTable/DataSet is a good idea with that many records, as I remember hitting an out-of-memory exception in the past. Any ideas?
The way I would go about it is to create triggers on the existing tables that, upon insert, update and delete, would insert a row into a new sync table (or one sync table per existing table), marking the change as pending synchronization. Your C# code would read from this table on a schedule, apply the changes to the local DB and delete the rows from the 'pending' table.
This is how Azure SQL Data Sync works, for example: it creates a tracking table per existing table in the source database and then checks all of those tables. Depending on how many tables you have and their structure, you could instead write something like JSON into a single table; checking one table is easier than checking many (obviously this depends on how many actual tables we're talking about).
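For illustration, here is a minimal sketch of such triggers on the source side, assuming a hypothetical source table Customer with an integer primary key Id and a companion change table Customer_SyncPending:

```sql
-- Hypothetical change-tracking table: one row per pending change.
CREATE TABLE dbo.Customer_SyncPending (
    Id         INT       NOT NULL,   -- primary key of the changed Customer row
    ChangeType CHAR(1)   NOT NULL,   -- 'I' = insert, 'U' = update, 'D' = delete
    ChangedAt  DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
);
GO

-- One trigger can cover all three DML actions by inspecting the
-- inserted/deleted pseudo-tables.
CREATE TRIGGER dbo.trg_Customer_Sync
ON dbo.Customer
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Rows present in both pseudo-tables were updated.
    INSERT INTO dbo.Customer_SyncPending (Id, ChangeType)
    SELECT i.Id, 'U'
    FROM inserted AS i
    JOIN deleted  AS d ON d.Id = i.Id;

    -- Rows only in inserted are new.
    INSERT INTO dbo.Customer_SyncPending (Id, ChangeType)
    SELECT i.Id, 'I'
    FROM inserted AS i
    WHERE NOT EXISTS (SELECT 1 FROM deleted AS d WHERE d.Id = i.Id);

    -- Rows only in deleted were removed.
    INSERT INTO dbo.Customer_SyncPending (Id, ChangeType)
    SELECT d.Id, 'D'
    FROM deleted AS d
    WHERE NOT EXISTS (SELECT 1 FROM inserted AS i WHERE i.Id = d.Id);
END;
```

The scheduled C# job would then join the pending rows back to Customer to fetch current values for inserts/updates, apply them locally, and delete the processed rows from Customer_SyncPending.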
Related
What is the best approach to audit the database for:
Make a copy of the deleted and updated records.
Save the date and the user ID of whoever performed the DML (the ID is stored in an ASP.NET session).
My manager told me to add 3 extra columns to each table: one for the ID of whoever updated the record, one for whoever deleted it, and a third holding a boolean value, 0 for deleted and 1 for active (not deleted yet). But I think this is a workaround and I'm pretty sure there is a better way to do it.
I was thinking of creating history tables and writing an AFTER DELETE trigger that saves all the DML.
Is this it, or is there a more straightforward way?
From SQL Server 2016 onwards you can do this using temporal tables:
A system-versioned temporal table is a type of user table designed to keep a full history of data changes, allowing easy point-in-time analysis. This type of temporal table is referred to as a system-versioned temporal table because the period of validity for each row is managed by the system (that is, the database engine).
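For illustration, a minimal sketch of a system-versioned table and a point-in-time query; the table, columns and history-table name (Employee, EmployeeHistory, etc.) are hypothetical:

```sql
CREATE TABLE dbo.Employee (
    EmployeeId INT            NOT NULL PRIMARY KEY,
    Name       NVARCHAR(100)  NOT NULL,
    Salary     DECIMAL(10, 2) NOT NULL,
    -- Validity period columns maintained by the engine, not the application.
    ValidFrom  DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo    DATETIME2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory));

-- Point-in-time query: the table as it looked at a given moment.
SELECT *
FROM dbo.Employee
FOR SYSTEM_TIME AS OF '2023-01-01T00:00:00';
```

Note that the engine records when each row version was valid, not who changed it; capturing the user still needs an extra column (or session context) populated by the application.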
If what you are really trying to do is to record who changed what, a better approach is to use roles and groups to prevent users altering anything they shouldn't.
We have a production database using the Change Data Capture (CDC) feature to maintain audit data for a few tables. But because of the performance impact, and because we need to make structural changes (like adding indexes) which we can't do due to CDC, we now want to disable CDC.
The requirement is to capture all the insert, delete and update actions from the web application, for a set of tables, in a single audit table.
For example, say I have to maintain audit information for TableA in TableB for each add, update and remove done to TableA through the web application (in C#).
I'm currently updating each stored procedure that can cause any of these 3 actions, for each table in the set. But this is error-prone and time-consuming, as I have a huge list of tables.
Please suggest a better way to achieve this that is more efficient in terms of both time and performance.
Quick answer: just make sure that every action that changes the database is recorded!
In our project, we created a separate table with the following fields:
Date_And_Time, Actions, UserID, Part_of_Program
Whenever a user executes an insert, update or delete command, logs in or out of the system, or the system runs an automatic operation inside our database, the program automatically inserts a record here telling us what action was taken, who did it and what part of the program was affected.
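As a rough sketch (the table name Activity_Log and the exact types are illustrative; the field names follow the list above):

```sql
CREATE TABLE dbo.Activity_Log (
    LogId           INT IDENTITY(1, 1) PRIMARY KEY,
    Date_And_Time   DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME(),
    Actions         NVARCHAR(50)  NOT NULL,  -- e.g. 'INSERT', 'UPDATE', 'DELETE', 'LOGIN'
    UserID          INT           NOT NULL,
    Part_of_Program NVARCHAR(100) NOT NULL   -- screen or module that performed the action
);

-- Example: the application logs an update made from a hypothetical Orders screen.
INSERT INTO dbo.Activity_Log (Actions, UserID, Part_of_Program)
VALUES ('UPDATE', 42, 'Orders');
```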
..Or
you could write code that constantly checks the current size of your tables. If the size changes, then record it in the audit table.
If the size increases, then data was inserted; otherwise, data was deleted.
I have a project where I am to synchronize an MS Access DB with a MySQL DB, and I was wondering about the best approach to reading from and writing to the Access DB, since it will lock the DB while I do it.
(There will be other applications reading from and writing to the same Access DB, so I would like to minimize the amount of time I lock it.)
Does the language I choose make a difference? I am most used to writing applications in C# and .NET.
Anyone more experienced out there with recommendations/experiences?
Store the queries your application will use exclusively for lookup purposes in the Access DB, and mark them as ReadOnly in the query properties. Then have your application call those ReadOnly queries, which will prevent unnecessary locks.
Keeping your update and insert SQL statements in the application short and simple will limit lock time to some extent as well.
I don't think the language matters much - you can use C# or VB in a .NET environment. Are you planning real-time synchronization or off-hours (let's say daily) synchronization of the two databases?
One approach for the synchronization would be to dump the tables to a text file, load them into properly indexed staging tables, then find updated records (using data-access queries) and new records (with an outer join) and append them to the datastore.
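For the "find new records with an outer join and append them" step, a rough sketch in generic SQL, assuming hypothetical staging and destination tables Stg_Customer and Customer keyed on Id:

```sql
-- Append rows that exist in the freshly loaded staging table
-- but are not yet present in the destination table.
INSERT INTO Customer (Id, Name, Email)
SELECT s.Id, s.Name, s.Email
FROM Stg_Customer AS s
LEFT JOIN Customer AS c ON c.Id = s.Id
WHERE c.Id IS NULL;
```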
If your table does not contain a primary key, add one; let's name it "Id".
Create a table "sync_log" to store changes:
one column for the SQL statement to apply to the target DB;
one column for that statement's execution status (pending, finished);
one column for the time of the statement, so you can apply the statements to the target DB in their original order.
Create triggers for insert, update and delete.
insert:
when a row is inserted, your trigger writes a line like the following into "sync_log":
"INSERT INTO T VALUES (CA, CB, CC, CD)"
delete:
when a row is deleted, your trigger writes a line like:
"DELETE FROM T WHERE Id = 99"
update: similar; again identify the target row by the primary key "Id".
Let your app poll the "sync_log" table, apply each new "pending" SQL statement to the target DB, then update that statement's status in "sync_log" to "finished".
This covers sync in one direction; the other direction works the same way.
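A rough sketch of the "sync_log" table and the delete trigger, shown in T-SQL for concreteness (Access data macros or MySQL triggers would need the equivalent syntax); the source table T(Id, CA, CB) is hypothetical and only the delete case is shown to keep it short:

```sql
CREATE TABLE dbo.sync_log (
    LogId    INT IDENTITY(1, 1) PRIMARY KEY,
    SqlText  NVARCHAR(MAX) NOT NULL,                    -- statement to replay on the target DB
    Status   VARCHAR(10)   NOT NULL DEFAULT 'pending',  -- 'pending' or 'finished'
    LoggedAt DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
);
GO

-- Delete trigger: record an equivalent DELETE statement for every removed row.
CREATE TRIGGER dbo.trg_T_Delete ON dbo.T
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.sync_log (SqlText)
    SELECT N'DELETE FROM T WHERE Id = ' + CAST(d.Id AS NVARCHAR(20))
    FROM deleted AS d;
END;
```

The insert and update triggers would build the corresponding INSERT/UPDATE text the same way, and the polling application reads rows ordered by LoggedAt, executes SqlText against the target, and flips Status to "finished".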
If you are using a Microsoft Access project, you can create triggers as described here: http://office.microsoft.com/en-au/access-help/create-a-trigger-adp-HP003085415.aspx
Otherwise, Access data macros are similar to triggers: http://blogs.office.com/b/microsoft-access/archive/2009/08/13/access-2010-data-macros-similar-to-triggers.aspx
I have 3 tables:
Item_Detail: ID, Name
ItemPurchased_Detail: QtyPurchased, RateOfPurchase, DiscountReceived
ItemSold_Detail: QtySold, RateofSale, DiscountGiven
Also, I have a main table, ITEM_FULL_DETAIL, which contains all the columns from the above 3 tables.
I have a WinForms application with a single form containing all the textboxes to insert data into the ITEM_FULL_DETAIL table. The user enters all the data and clicks the SUBMIT button.
I want to insert the data into the main table first, and then have it distributed to the 3 individual tables. What should I use for this: triggers, procedures, views or joins?
Also, I am using the ITEM_FULL_DETAIL table because I want to protect my actual tables from any loss of data, such as in case of a power outage.
Shall I use a temporary table in place of the ITEM_FULL_DETAIL table, or is it fine to keep using the current one?
Is there any other way also?
You can use database triggers or insert all records on the application level.
You should probably re-think your design: duplicating the same data in different tables is usually a bad idea. In this case, you could replace ITEM_FULL_DETAIL with a view, then maintain the data in the underlying tables. That way you only have one copy of the data, so you don't need to worry about inconsistencies between tables.
If you did that then you could either insert new data into the 3 underlying tables in the correct order (probably the best idea) or use an INSTEAD OF trigger on the ITEM_FULL_DETAIL view (more complicated). The INSERTs can be done using an ORM, ADO.NET, a stored procedure or whatever suits your application design best.
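As a minimal sketch of the view idea, reusing the column names from the question; the ItemID foreign key in the purchase/sale tables is an assumption, since the question doesn't show how they relate to Item_Detail:

```sql
CREATE VIEW dbo.ITEM_FULL_DETAIL
AS
SELECT i.ID, i.Name,
       p.QtyPurchased, p.RateOfPurchase, p.DiscountReceived,
       s.QtySold, s.RateofSale, s.DiscountGiven
FROM dbo.Item_Detail               AS i
LEFT JOIN dbo.ItemPurchased_Detail AS p ON p.ItemID = i.ID
LEFT JOIN dbo.ItemSold_Detail      AS s ON s.ItemID = i.ID;
```

The form's SUBMIT handler would then insert into Item_Detail first (so the ID exists) and into the other two tables afterwards, ideally inside one transaction.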
If you do have a good reason for duplicating your data, then it would be helpful if you could share it, because someone may have a better suggestion for that scenario.
Also, I am using the ITEM_FULL_DETAIL table because I want to protect my actual tables from any loss of data, such as in case of a power outage.
..What? How do you suppose you are protecting your tables? What are you trying to prevent? There is absolutely no need to have the ITEM_FULL_DETAIL table if what you are worried about is data integrity. You're probably creating a situation in which data integrity can be compromised by using this intermediate table.
Are you aware of transactions? Use them. If two out of the three tables have been written to and the power on the client goes off before the third write completes, the transaction will fail and the partial data will be rolled back.
Unless I'm totally missing the point here..
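For illustration, a minimal sketch of the transactional insert described above, reusing the question's tables and the hypothetical ItemID foreign key; the @-parameters stand for the values collected from the form:

```sql
BEGIN TRY
    BEGIN TRANSACTION;

    INSERT INTO dbo.Item_Detail (ID, Name)
    VALUES (@ID, @Name);

    INSERT INTO dbo.ItemPurchased_Detail (ItemID, QtyPurchased, RateOfPurchase, DiscountReceived)
    VALUES (@ID, @QtyPurchased, @RateOfPurchase, @DiscountReceived);

    INSERT INTO dbo.ItemSold_Detail (ItemID, QtySold, RateofSale, DiscountGiven)
    VALUES (@ID, @QtySold, @RateofSale, @DiscountGiven);

    COMMIT TRANSACTION;        -- all three writes succeed together
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;  -- any failure undoes the partial writes
    THROW;
END CATCH;
```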
I need to update about 250k rows in a table, and each field to update will have a different value depending on the row itself (not calculated from the row id or the key, but provided externally).
I tried with a parametrized query but it turns out to be slow (I still can try with a table-value parameter, SqlDbType.Structured, in SQL Server 2008, but I'd like to have a general way to do it on several databases including MySql, Oracle and Firebird).
Making one huge concatenation of individual updates is also slow (but about 2 times faster than making thousands of individual calls (round trips!) using parametrized queries).
What about creating a temp table and running an update that joins my table and the temp one? Would that be faster?
How slow is "slow"?
The main problem with this is that it would create an enormous entry in the database's log file (if there's a power failure half-way through the update, the database needs to have logged each action so that it can roll back). That is most likely where the "slowness" is coming from, more than anything else. Obviously, with such a large number of rows there are other ways to make the whole thing inefficient (doing one DB round trip per update would be unbearably slow, for example), but even once you eliminate the obvious things you'll still find it's pretty slow.
There are a few ways you can do it more efficiently. One would be to do the update in chunks of, say, 1,000 rows at a time. That way the database writes lots of small log entries rather than one really huge one.
Another way would be to turn off, or turn "down", the database's logging for the duration of the update. In SQL Server, for example, you can set the recovery model to "simple" or "bulk-logged", which would speed it up considerably (with the caveat that you are more at risk if there's a power failure or something during the update).
Edit: just to expand a little more, probably the most efficient way to actually execute the queries in the first place would be to BULK INSERT all the new values into a temporary table and then do a single UPDATE of the existing table from that (or do the UPDATE in chunks of 1,000 as I said above). Most of my answer was addressing the problem once you've implemented it like that: you'll still find it's pretty slow.
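As a rough sketch of that approach, assuming the externally computed values end up in a CSV and the hypothetical target is TargetTable(Id, Value) with a non-nullable Value column; the batching keeps each transaction's log entry small:

```sql
-- Stage the externally computed values (file path and column names are illustrative).
CREATE TABLE #NewValues (Id INT PRIMARY KEY, Value NVARCHAR(100) NOT NULL);

BULK INSERT #NewValues
FROM 'C:\temp\new_values.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

-- Apply the changes in batches of 1,000 instead of one enormous statement
-- or 250k individual round trips.
DECLARE @BatchSize INT = 1000;
WHILE 1 = 1
BEGIN
    UPDATE TOP (@BatchSize) t
    SET    t.Value = n.Value
    FROM   dbo.TargetTable AS t
    JOIN   #NewValues      AS n ON n.Id = t.Id
    WHERE  t.Value <> n.Value;          -- skip rows already brought up to date

    IF @@ROWCOUNT < @BatchSize BREAK;   -- last batch processed
END;
```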
Call a stored procedure if possible.
If the columns updated are part of indexes you could
drop these indexes
do the update
re-create the indexes.
If you need these indexes to retrieve the data, well, it doesn't help.
You should use SqlBulkCopy with the KeepIdentity option set.
As part of a SqlTransaction, run a query to SELECT all the records that need updating and then DELETE them, returning those selected (and now removed) records. Read them into C# in a single batch. Update the records on the C# side in memory, now that you've narrowed the selection, and then SqlBulkCopy those updated records back, keys and all. And don't forget to commit the transaction. It's more work, but it's very fast.
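The bulk copy itself happens in C#, but the "delete and return the removed rows" step can be a single statement; here is a sketch where TargetTable and the NeedsUpdate filter are hypothetical stand-ins for however you identify the rows to change:

```sql
-- Remove the rows that will be rewritten and hand them back to the caller
-- in the same round trip; the application edits them in memory and
-- SqlBulkCopy writes them back before the transaction commits.
DELETE FROM dbo.TargetTable
OUTPUT deleted.*
WHERE NeedsUpdate = 1;
```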
Here's what I would do:
Retrieve the entire table, that is, the columns you need in order to calculate/retrieve/find/produce the changes externally
Calculate/produce those changes
Run a bulk insert to a temporary table, uploading the information you need server-side in order to do the changes. This would require the key information + new values for all the rows you intend to change.
Run SQL on the server to copy new values from the temporary table into the production table.
Pros:
Running the final step server-side is faster than running tons and tons of individual SQL, so you're going to lock the table in question for a shorter time
Bulk insert like this is fast
Cons:
Requires extra space in your database for the temporary table
Produces more log data, logging both the bulk insert and the changes to the production table
Here are things that can make your updates slow:
executing updates one by one through parametrized queries
solution: do the update in one statement
a large transaction creates a big log entry
see codeka's answer
updating indexes (the RDBMS will update the index after each row; if you change an indexed column, this can be very costly on a large table)
if you can, drop the indexes before the update and recreate them afterwards
updating a field that has a foreign key constraint - for each updated record the RDBMS will go and look for the appropriate key
if you can, disable foreign key constraints before the update and enable them after the update
triggers and row-level checks
if you can, disable triggers before the update and enable them afterwards (a T-SQL sketch of these disable/re-enable steps follows this list)
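For the last three points, a rough T-SQL sketch of the disable/re-enable pattern, with hypothetical index, constraint and trigger names (don't disable the clustered index, as that makes the table unreadable):

```sql
-- Before the bulk update: switch off the expensive per-row work.
ALTER INDEX IX_TargetTable_Value ON dbo.TargetTable DISABLE;
ALTER TABLE dbo.TargetTable NOCHECK CONSTRAINT FK_TargetTable_Parent;
DISABLE TRIGGER trg_TargetTable_Audit ON dbo.TargetTable;

-- ... run the single set-based UPDATE here ...

-- Afterwards: rebuild and re-enable (WITH CHECK revalidates the existing rows).
ALTER INDEX IX_TargetTable_Value ON dbo.TargetTable REBUILD;
ALTER TABLE dbo.TargetTable WITH CHECK CHECK CONSTRAINT FK_TargetTable_Parent;
ENABLE TRIGGER trg_TargetTable_Audit ON dbo.TargetTable;
```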