The database for my application contains tables (not editable by the user) that are necessary for my application to run. For instance, there is a Report table containing a list of my SSRS reports.
Except for the Auto-Increment and GUID fields, the data in my Report Table should match across all databases.
To keep existing client databases in sync with the ones created from scratch, there is a database updater app that runs scripts to update the existing client databases.
There are Unit Tests to ensure Reports run correctly on both types of databases. However, other than a developer's eye, there is no system check to ensure the rows and the values in those rows match among the tables. This is prone to human error.
To fix this, I plan to add a small report to the Unit Test report that will inform development of the following:
Records missing from the "Made From Scratch" database that exist in the "Updated" Database
Records missing from the "Updated" database that exist in the "Made From Scratch" Database
Fields that do not match between the tables
So far, I have a query to report the above information for all tables involved.
A sample query would look something like this:
--Take the fields I want to compare from TableToCompare in MadeFromScratch and put them in #First_Table_Var
--NOTE: MyFirstField should match in both tables in order to compare the values between rows
DECLARE @First_Table_Var table(
MyFirstField Varchar(255),
MySecondField VarChar(255),
MyThirdField Varchar(255)
);
INSERT INTO @First_Table_Var
SELECT
r.MyFirstField,
r.MySecondField,
l.MyThirdField
FROM
MadeFromScratch.dbo.TableToCompare r
INNER JOIN MadeFromScratch.dbo.LookUpTable l ON r.ForeignKeyID = l.PrimaryKeyID
--Take the fields I want to compare from TableToCompare in UpdatedDatabase and put them in @Second_Table_Var
DECLARE @Second_Table_Var table(
MyFirstField Varchar(255),
MySecondField VarChar(255),
MyThirdField Varchar(255)
);
INSERT INTO @Second_Table_Var
SELECT
r.MyFirstField,
r.MySecondField,
l.MyThirdField
FROM
UpdatedDatabase.dbo.TableToCompare r
INNER JOIN UpdatedDatabase.dbo.LookUpTable l ON r.ForeignKeyID = l.PrimaryKeyID
--**********************
-- CREATE OUTPUT
--**********************
--List Rows that exist in @Second_Table_Var but not @First_Table_Var
--(e.g. these rows need to be added to the table in MadeFromScratch)
SELECT
Problem = '1 MISSING ROW IN A MADE-FROM-SCRATCH DATABASE',
hur.MyFirstField,
hur.MySecondField,
hur.MyThirdField
FROM
@Second_Table_Var hur
WHERE
NOT EXISTS
(SELECT
*
FROM
@First_Table_Var hu
WHERE
hu.MyFirstField = hur.MyFirstField
)
UNION
--List Rows that exist in @First_Table_Var but not @Second_Table_Var
--(e.g. these rows need to be added to the table in UpdatedDatabase)
SELECT
Problem = '2 MISSING ROW IN AN UPDATED DATABASE',
hur.MyFirstField,
hur.MySecondField,
hur.MyThirdField
FROM
@First_Table_Var hur
WHERE
NOT EXISTS
(SELECT
*
FROM
@Second_Table_Var hu
WHERE
hu.MyFirstField = hur.MyFirstField
)
UNION
--Compare fields between the tables where MyFirstField matches but the other field values differ
SELECT
Problem = '3 MISMATCHED FIELD',
h.MyFirstField,
MySecondField = CASE WHEN h.MySecondField = hu.MySecondField THEN '' ELSE 'Created Value: ' + h.MySecondField + ' Updated Value: ' + hu.MySecondField END,
MyThirdField = CASE WHEN h.MyThirdField = hu.MyThirdField THEN '' ELSE 'Created Value: ' + CAST(h.MyThirdField AS VARCHAR(4)) + ' Updated Value: ' + CAST(hu.MyThirdField AS VARCHAR(4)) END
FROM
@First_Table_Var h
INNER JOIN @Second_Table_Var hu on h.MyFirstField = hu.MyFirstField
WHERE
NOT EXISTS
(SELECT
*
FROM
@Second_Table_Var hu2
WHERE
hu2.MyFirstField = h.MyFirstField and
hu2.MySecondField = h.MySecondField and
hu2.MyThirdField = h.MyThirdField
)
ORDER BY Problem
I won't have any problem writing code to parse through the results, but this methodology feels antiquated for the following reasons:
Several queries (which essentially do the same thing) will need to be written
Maintenance for this process can get cumbersome
I would like to be able to write something where the list of tables and fields to compare is maintained in some kind of file (XML?). That way, whenever fields are added or changed, all the user has to do is update this file.
Is there a way to use LINQ and/or Reflection (or any feature in .NET 4.0 for that matter) where I could compare tables between two databases and maintain them like I've listed above?
Ideas are welcome. Ideas with an example would be great! :D
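Not the .NET/LINQ answer being asked for, but the same "maintain the list in one place" idea can also be sketched in T-SQL, by driving the comparison from a config table plus dynamic SQL so that adding a table or column only means touching the config. This is only a rough sketch: the #ComparisonConfig layout, the MadeFromScratch/UpdatedDatabase names and the column list are placeholders, and it ignores the LookUpTable join from the sample (that would need a view or a richer config).
-- Hypothetical config: one row per table to compare.
IF OBJECT_ID('tempdb..#ComparisonConfig') IS NOT NULL DROP TABLE #ComparisonConfig;
CREATE TABLE #ComparisonConfig
(
    TableName      SYSNAME NOT NULL,       -- table that exists in both databases
    KeyColumn      SYSNAME NOT NULL,       -- column used to match rows between databases
    CompareColumns NVARCHAR(MAX) NOT NULL  -- comma-separated list of columns to compare
);
INSERT INTO #ComparisonConfig (TableName, KeyColumn, CompareColumns)
VALUES ('dbo.TableToCompare', 'MyFirstField', 'MyFirstField, MySecondField');
DECLARE @tbl SYSNAME, @key SYSNAME, @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);
DECLARE cfg CURSOR LOCAL FAST_FORWARD FOR
    SELECT TableName, KeyColumn, CompareColumns FROM #ComparisonConfig;
OPEN cfg;
FETCH NEXT FROM cfg INTO @tbl, @key, @cols;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- rows missing on either side, plus rows whose compared columns differ
    SET @sql = N'
    SELECT ''1 MISSING ROW IN MADE-FROM-SCRATCH DATABASE'' AS Problem, ' + @cols + N'
    FROM UpdatedDatabase.' + @tbl + N' u
    WHERE NOT EXISTS (SELECT 1 FROM MadeFromScratch.' + @tbl + N' m WHERE m.' + @key + N' = u.' + @key + N')
    UNION ALL
    SELECT ''2 MISSING ROW IN UPDATED DATABASE'', ' + @cols + N'
    FROM MadeFromScratch.' + @tbl + N' m
    WHERE NOT EXISTS (SELECT 1 FROM UpdatedDatabase.' + @tbl + N' u WHERE u.' + @key + N' = m.' + @key + N')
    UNION ALL
    SELECT ''3 MISMATCHED FIELD'', ' + @cols + N'
    FROM (SELECT ' + @cols + N' FROM MadeFromScratch.' + @tbl + N'
          EXCEPT
          SELECT ' + @cols + N' FROM UpdatedDatabase.' + @tbl + N') d
    WHERE EXISTS (SELECT 1 FROM UpdatedDatabase.' + @tbl + N' u WHERE u.' + @key + N' = d.' + @key + N');';
    EXEC sys.sp_executesql @sql;   -- one result set per configured table
    FETCH NEXT FROM cfg INTO @tbl, @key, @cols;
END
CLOSE cfg;
DEALLOCATE cfg;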
you said "Except for the Auto-Increment and GUID fields, the data in my Report Table should match across all databases."
I assume that these fields are ID fields, ideally, replication of the database should replicate the id fields too ensuring this will allow you to check for new inserts by ID, in case of updates, you can set a timestamp field for comparison.
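A rough illustration of that suggestion (the Report.ReportID and LastModified names and the @LastChecked variable are my own placeholders, and I am reading "timestamp" loosely as a last-modified datetime, since rowversion values cannot be compared across databases):
-- Hypothetical: stamp each row when it is written (the updater app or an UPDATE
-- trigger would need to refresh LastModified on subsequent changes).
ALTER TABLE dbo.Report
ADD LastModified DATETIME2 NOT NULL
CONSTRAINT DF_Report_LastModified DEFAULT SYSUTCDATETIME();
GO
-- New inserts: IDs present in the updated database but missing from the made-from-scratch one.
SELECT u.ReportID
FROM UpdatedDatabase.dbo.Report u
LEFT JOIN MadeFromScratch.dbo.Report m ON m.ReportID = u.ReportID
WHERE m.ReportID IS NULL;
-- Updates: rows touched since the last comparison run.
DECLARE @LastChecked DATETIME2 = '20240101';   -- supplied by the comparison job
SELECT ReportID
FROM UpdatedDatabase.dbo.Report
WHERE LastModified > @LastChecked;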
Related
I have two databases as mentioned below:
[QCR_DEV]
[QCR_DEV_LOG]
All application data is stored in [QCR_DEV]. On each table of [QCR_DEV], there is a trigger that inserts the details of insertions and updates on that table into the [QCR_DEV_LOG] database.
Suppose I have a table [party] in the [QCR_DEV] database. Whenever I insert, update or delete a record in that table, there will be one insertion into table [party_log], which exists in the [QCR_DEV_LOG] database. In short, I am keeping a log of the actions performed on the tables of [QCR_DEV] in the [QCR_DEV_LOG] database.
When we connect to the database through the application, it connects using a connection string. In my stored procedures, I do not qualify table names with the database name, like this:
Select * From [QCR_DEV].[dbo].[party];
I am using this instead:
Select * From [party];
This is because, in the future, if I need to change the database name I will only need to change the connection string.
Now to the point: I need to get data from the [QCR_DEV_LOG] database. I am writing a stored procedure in which I need to get data from both databases, like this:
Select * From [QCR_DEV_LOG].[dbo].[party_log]
INNER JOIN [person] on [person].person_id = [party_log].person_id
where party_id = 1
This stored procedure is in the [QCR_DEV] database. I need to get data from both databases, and for that I need to mention the database name in the query. I don't want this. Is there any way to set the database name globally and use that name in my queries, so that if I need to change the database name in the future, I only change it in the one place where it is set globally?
I would second Jeroen Mostert's comment and use synonyms:
CREATE SYNONYM [party_log] FOR [QCR_DEV_LOG].[dbo].[party_log];
And when the target database is renamed, this query would generate a migration script:
SELECT 'DROP SYNONYM [' + name + ']; CREATE SYNONYM [' + name + '] FOR ' + REPLACE(base_object_name, '[OldLogDbName].', '[NewLogDbName].') + ';'
FROM sys.synonyms
WHERE base_object_name LIKE '\[OldLogDbName\].%' ESCAPE '\';
You could do this in the DEV database:
CREATE VIEW [dbo].[party_log]
AS
SELECT * FROM [QCR_DEV_LOG].[dbo].[party_log]
Then you can write SELECT-queries as if the [party_log] table exists in the DEV database.
Any WHERE or JOIN..ON clauses get folded into the combined query before it is executed, because SQL Server expands the view definition into the outer query rather than materializing the view first.
If the LOG database ever gets moved or renamed, then you'd only need to update the view (or a couple of views, but probably never a lot).
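For example, the join from the question could then be written against the view without mentioning the database name (a sketch reusing the question's table and column names):
SELECT *
FROM [party_log] pl
INNER JOIN [person] p ON p.person_id = pl.person_id
WHERE pl.party_id = 1;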
If you expect regular changes, or if you need to use this on multiple servers then you could use dynamic SQL:
IF OBJECT_ID('[dbo].[party_log]') IS NOT NULL DROP VIEW [dbo].[party_log]
-- etc, repeat to DROP other views
DECLARE @logdb VARCHAR(80) = 'QCR_DEV_LOG'
EXEC ('CREATE VIEW [dbo].[party_log] AS SELECT * FROM [' + @logdb + '].[dbo].[party_log]')
-- etc, repeat to create other views
I'm converting an application from Access to SQL Server 2014. One of the capabilities of this tool is to allow users to create ad-hoc SQL queries to modify, delete, or add data in a number of tables.
Right now in Access there is no tracking of who does what, so if something gets messed up by accident, there is no way to know who it was or when it happened (it has happened enough times that it is a serious issue and one of many reasons the tool is being rewritten).
The application I'm writing is a Windows application in C#. I'm looking for ANY and all suggestions on ways this can be done without putting a huge demand on the server (processing or space). Since the users are creating their own queries I can't just add a column for user name and date (also that would only track the most recent change).
We don't need to keep the old data or even identifying exactly what was changed. Just who changed data and when they did. I want to be able to look at something (view, table or even separate database) that shows me a list of users that made a change and when they did it.
You haven't specified the SQL Server version; anyway, if you have a version >= 2008 R2 you can use Extended Events to monitor your system.
On Stack Overflow you can read my answer to a similar problem.
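A minimal sketch of such a session (not a complete audit solution; the session name, database name, and file path are placeholders, and the syntax below assumes SQL Server 2012 or later, which covers the 2014 instance mentioned in the question):
-- Capture completed batches together with the login that ran them and the statement text.
CREATE EVENT SESSION [WhoChangedWhat] ON SERVER
ADD EVENT sqlserver.sql_batch_completed
(
    ACTION (sqlserver.username, sqlserver.client_app_name, sqlserver.sql_text)
    WHERE sqlserver.database_name = N'MyDatabase'   -- placeholder database name
)
ADD TARGET package0.event_file (SET filename = N'C:\XEL\WhoChangedWhat.xel');   -- placeholder path
GO
ALTER EVENT SESSION [WhoChangedWhat] ON SERVER STATE = START;
GO
-- Read the captured events back (filter/shred the XML as needed):
SELECT CAST(event_data AS XML) AS event_data
FROM sys.fn_xe_file_target_read_file('C:\XEL\WhoChangedWhat*.xel', NULL, NULL, NULL);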
You can consider using triggers and a log table; this will work on all SQL Server versions. Triggers are a bit more expensive than CDC, but if your users are already updating your tables directly, this should not be a problem. I think this will also depend on how many tables you want to log.
I will provide you with a simple example for logging the users that have changed a table, or several tables (just add the trigger to the tables):
CREATE TABLE UserTableChangeLog
(
ChangeID INT PRIMARY KEY IDENTITY(1,1)
, TableName VARCHAR(128) NOT NULL
, SystemUser VARCHAR(256) NOT NULL DEFAULT SYSTEM_USER
, ChangeDate DATETIME NOT NULL DEFAULT GETDATE()
)
GO
CREATE TABLE TestTable
(
ID INT IDENTITY(1,1)
, Test VARCHAR(255)
)
GO
--This sql can be added for multiple tables, just change the trigger name, and the table name
CREATE TRIGGER TRG_TABLENAME_Log ON TestTable
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
SET NOCOUNT ON;
--Can be used to get the type of change, and which data was altered.
--SELECT * FROM INSERTED;
--SELECT * FROM DELETED;
DECLARE @tableName VARCHAR(255) = (SELECT OBJECT_NAME( parent_id ) FROM sys.triggers WHERE object_id = @@PROCID);
INSERT INTO UserTableChangeLog (TableName) VALUES (@tableName);
END
GO
This is how it will work:
INSERT INTO TestTable VALUES ('1001');
INSERT INTO TestTable VALUES ('2002');
INSERT INTO TestTable VALUES ('3003');
GO
UPDATE dbo.TestTable SET Test = '4004' WHERE ID = 2
GO
SELECT * FROM UserTableChangeLog
I have a windows service which basically watches a folder for any CSV file. Each record in the CSV file is inserted into a SQL table. If the same CSV file is put in that folder, it can lead to duplicate record entries in the table. How can I avoid duplicate insertions into the SQL table?
Try INSERT ... WHERE NOT EXISTS, where a, b and c are the relevant columns, and @a, @b and @c are the relevant values.
INSERT INTO table
(
a,
b,
c
)
VALUES
(
@a,
@b,
@c
)
WHERE NOT EXISTS
(
SELECT 0 FROM table WHERE a = @a AND b = @b AND c = @c
)
The accepted answer has a syntax error and is not compatible with relational databases like MySQL.
Specifically, the following is not compatible with most databases:
values(...) where not exists
While the following is generic SQL, and is compatible with all databases:
select ... where not exists
Given that, if you want to insert a single record into a table after checking if it already exists, you can do a simple select with a where not exists clause as part of your insert statement, like this:
INSERT
INTO table_name (
primary_col,
col_1,
col_2
)
SELECT 1234,
'val_1',
'val_2'
WHERE NOT EXISTS (
SELECT 1
FROM table_name
WHERE primary_col=1234
);
Simply pass all values with the select keyword, and put the primary or unique key condition in the where clause.
Problems with the answers using WHERE NOT EXISTS are:
performance -- row-by-row processing requires, potentially, a very large number of table scans against table
NULL handling -- for every column where there might be NULLs you will have to write the matching condition in a more complicated way, like
(a = #a OR (a IS NULL AND #a IS NULL)).
Repeat that for 10 columns and voila - you hate SQL :)
A better answer would take into account the great SET processing capabilities that relational databases provide (in short -- never use row-by-row processing in SQL if you can avoid it. If you can't -- think again and avoid it anyway).
So for the answer:
load (all) data into a temporary table (or a staging table that can be safely truncated before load)
run the insert in a "set"-way:
INSERT INTO table (<columns>)
select <columns> from #temptab
EXCEPT
select <columns> from table
Keep in mind that EXCEPT deals safely with NULLs for every kind of column ;) and that the optimizer can choose a high-performance join type for the matching (hash, loop, or merge join) depending on the available indexes and table statistics.
A database exists with two tables:
Data_t: DataID, primary key, Identity(1,1). Also has another field 'LEFT' TINYINT.
Data_Link_t: DataID, PK and FK where DataID MUST exist in Data_t. Also has another field 'RIGHT' SMALLINT.
Coming from a Microsoft Access environment into C# and SQL Server, I'm looking for a good method of importing a record into this relationship.
The record contains information that belongs on both sides of this join (possibly inserting/updating upwards of 5000 records at once). It would be a bonus to process the entire batch in some kind of LINQ list-type command, but even if this is done record by record, the key goal is that BOTH sides of this record are processed in the same step.
There are countless approaches, and I'm looking at too many to determine which way I should go, so I thought it faster to ask the general public. Is LINQ an option for inserting/updating a big list like this with LINQ to SQL? Should I go record by record? What approach should I use to add a record to normalized tables that, when joined, create the full record?
Sounds like a case where I'd write a small stored proc and call that from C# - e.g. as a function on my Linq-to-SQL data context object.
Something like:
CREATE PROCEDURE dbo.InsertData(@Left TINYINT, @Right SMALLINT)
AS BEGIN
DECLARE @DataID INT
INSERT INTO dbo.Data_t([LEFT]) VALUES(@Left)
SELECT @DataID = SCOPE_IDENTITY();
INSERT INTO dbo.Data_Link_T(DataID, [RIGHT]) VALUES(@DataID, @Right)
END
If you import that into your data context, you could call this something like:
using(YourDataContext ctx = new YourDataContext())
{
foreach(YourObjectType obj in YourListOfObjects)
{
ctx.InsertData(obj.Left, obj.Right);
}
}
and let the stored proc handle all the rest (all the details, like determining and using the IDENTITY from the first table in the second one) for you.
I have never tried it myself, but you might be able to do exactly what you are asking for by creating an updateable view and then inserting records into the view.
UPDATE
I just tried it, and it doesn't look like it will work.
Msg 4405, Level 16, State 1, Line 1
View or function 'Data_t_and_Data_Link_t' is not updatable because the modification affects multiple base tables.
I guess this is just one more thing for all the Relational Database Theory purists to hate about SQL Server.
ANOTHER UPDATE
Further research has found a way to do it. It can be done with a view and an "instead of" trigger.
create table Data_t
(
DataID int not null identity primary key,
[LEFT] tinyint
)
GO
create table Data_Link_t
(
DataID int not null primary key foreign key references Data_T (DataID),
[RIGHT] smallint
)
GO
create view Data_t_and_Data_Link_t
as
select
d.DataID,
d.[LEFT],
dl.[RIGHT]
from
Data_t d
inner join Data_Link_t dl on dl.DataID = d.DataID
GO
create trigger trgInsData_t_and_Data_Link_t on Data_t_and_Data_Link_T
instead of insert
as
insert into Data_t ([LEFT]) select [LEFT] from inserted
-- note: @@IDENTITY yields a single value, so this trigger only handles single-row inserts correctly
insert into Data_Link_t (DataID, [RIGHT]) select @@IDENTITY, [RIGHT] from inserted
go
insert into Data_t_and_Data_Link_t ([LEFT],[RIGHT]) values (1, 2)
I've taken over an ASP.NET application that needs to be re-written. The core functionality of this application that I need to replicate modifies a SQL Server database that is accessed via ODBC from third party software.
The third-party application creates files that represent printer labels, generated by a user. These label files directly reference an ODBC source's fields. Each row of the table represents a product that populates the label's fields. (So, within these files are direct references to the column names of the table.)
The ASP.NET application allows the user to create/update the data for these fields that are referenced by the labels, by adding or editing a particular row representing a product.
It also allows the occasional addition of new fields... where it actually creates a new column in the core table that is referenced by the labels.
My concern: I've never programmatically altered an existing table's columns before. The existing application seems to handle this functionality fine, but before I blindly do the same thing in my new application, I'd like to know what sort of pitfalls exist in doing this, if any... and if there are any obvious alternatives.
It can become a problem when too many columns are added to tables, and you have to be careful if performance is a consideration (covering indexes are not applicable, so expensive bookmark lookups might be performed).
The other alternative is a Key-Value Pair structure: Key Value Pairs in Database design, but that too has its pitfalls and you are better off creating new columns, as you are suggesting. (KVPs are good for settings.)
One option I think is to use a KVP table for storing the dynamic "columns" (as first mentioned by Mitch), join the products table with the KVP table based on the product id, then pivot the results in order to have all the dynamic columns in the resultset.
EDIT: something along these lines:
Prepare:
create table Product(ProductID nvarchar(50))
insert Product values('Product1')
insert Product values('Product2')
insert Product values('Product3')
create table ProductKVP(ProductID nvarchar(50), [Key] nvarchar(50), [Value] nvarchar(255))
insert ProductKVP values('Product1', 'Key2', 'Value12')
insert ProductKVP values('Product2', 'Key1', 'Value21')
insert ProductKVP values('Product2', 'Key2', 'Value22')
insert ProductKVP values('Product2', 'Key3', 'Value23')
insert ProductKVP values('Product3', 'Key4', 'Value34')
Retrieve:
declare @forClause nvarchar(max),
@sql nvarchar(max)
select @forClause = isnull(@forClause + ',', '') + '[' + [Key] + ']' from (
select distinct [Key] from ProductKVP /* WHERE CLAUSE */
) t
set @forClause = 'for [Key] in (' + @forClause + ')'
set @sql = '
select * from (
select
ProductID, [Key], [Value]
from (
select k.* from
Product p
inner join ProductKVP k on (p.ProductID = k.ProductID)
/* WHERE CLAUSE */
) sq
) t pivot (
max([Value])' +
@forClause + '
) pvt'
exec(@sql)
Results:
ProductID Key1 Key2 Key3 Key4
----------- --------- --------- --------- -------
Product1 NULL Value12 NULL NULL
Product2 Value21 Value22 Value23 NULL
Product3 NULL NULL NULL Value34
It very much depends on the queries you want to run against those tables. The main disadvantage of KVP is that more complex queries can become very inefficient.
A "hybrid" approach of both might be interesting.
Store the values you want to query in dedicated columns and leave the rest in an XML blob (MS SQL has nice features to even query inside the XML), or alternatively in a KVP bag. Personally I really don't like KVPs in DBs because you cannot build application-logic-specific indexes anymore.
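For instance, a sketch of that hybrid layout (table, column, and attribute names are all made up here): the heavily queried attributes get real, indexable columns, and the long tail lives in a single XML column that can still be filtered and projected with the built-in XML methods.
-- Hypothetical hybrid table: fixed columns plus one XML bag for the rest.
CREATE TABLE dbo.Product
(
    ProductID  INT IDENTITY(1,1) PRIMARY KEY,
    Name       NVARCHAR(100) NOT NULL,   -- dedicated, indexable column
    ExtraAttrs XML NULL                  -- everything else
);
INSERT INTO dbo.Product (Name, ExtraAttrs)
VALUES (N'Label stock 4x6',
        N'<attrs><attr name="Adhesive">Permanent</attr><attr name="Core">3in</attr></attrs>');
-- Query inside the XML: project one attribute and filter on another.
SELECT ProductID,
       ExtraAttrs.value('(/attrs/attr[@name="Adhesive"])[1]', 'nvarchar(50)') AS Adhesive
FROM dbo.Product
WHERE ExtraAttrs.exist('/attrs/attr[@name="Core"][. = "3in"]') = 1;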
Just another approach would be not to model the specific columns at all. You create generic "custom attribute" columns like Attribute1, Attribute2, Attribute3, Attribute4 (of the required data types, etc.). You then add metadata to your database that describes what AttrX means for a specific type of printer label.
Again, it really depends on how you want to use that data in the end.
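A rough illustration of that generic-attribute idea (all table and column names here are invented):
-- Data table with a fixed set of typed, generic columns.
CREATE TABLE dbo.LabelData
(
    LabelID    INT IDENTITY(1,1) PRIMARY KEY,
    LabelType  INT NOT NULL,
    Attribute1 NVARCHAR(255) NULL,
    Attribute2 NVARCHAR(255) NULL,
    Attribute3 DECIMAL(18,4) NULL,
    Attribute4 DATETIME2 NULL
);
-- Metadata describing what each generic column means for a given label type.
CREATE TABLE dbo.LabelTypeAttribute
(
    LabelType   INT NOT NULL,
    ColumnName  SYSNAME NOT NULL,      -- 'Attribute1' ... 'Attribute4'
    DisplayName NVARCHAR(100) NOT NULL,
    PRIMARY KEY (LabelType, ColumnName)
);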
One risk is the table getting too wide. I used to maintain a horrible app that added 3 columns "automagically" when new values were added to some XML (for some reason it thought everything would be a string, a date, or a number, hence the creation of 3 columns).
There are other techniques like serializing a BLOB or designing the tables differently that may help.