I have two big tables (about 100-150k rows in each).
The structure of these tables is the same. Ids of entities are also the same in each table.
I need a very fast way to compare these tables and answer the following questions:
Which rows have fields that differ from the matching row in the other table?
Which ids exist in the first table but not in the second table?
Which ids exist in the second table but not in the first table?
Thank you!
Edit: I need to do this comparison using C#, or maybe stored procedures (and then select the results from C#).
If you have two tables Table1 and Table2 and they have the same structure and primary key named ID you can use this SQL:
--Find rows that exist in both Table1 and Table2
SELECT *
FROM Table1
WHERE EXISTS (SELECT 0 FROM Table2 WHERE Table1.ID = Table2.ID)
--Find rows that exist in Table1 but not Table2
SELECT *
FROM Table1
WHERE NOT EXISTS (SELECT 0 FROM Table2 WHERE Table1.ID = Table2.ID)
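To cover both "missing id" questions, the NOT EXISTS query can simply be run once in each direction. Here is a minimal sketch of that, assuming an in-memory SQLite database as a stand-in for the real server; the `Name` column, the sample rows, and the helper `ids_missing_from` are all made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the real server; the schema is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Table2 (ID INTEGER PRIMARY KEY, Name TEXT);
    INSERT INTO Table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO Table2 VALUES (2, 'b'), (3, 'x'), (4, 'd');
""")

def ids_missing_from(conn, present_in, absent_from):
    """Return IDs found in `present_in` that have no match in `absent_from`."""
    # Table names cannot be bound as parameters; they are trusted constants here.
    sql = (
        f"SELECT ID FROM {present_in} AS p "
        f"WHERE NOT EXISTS (SELECT 0 FROM {absent_from} AS a WHERE a.ID = p.ID) "
        "ORDER BY ID"
    )
    return [row[0] for row in conn.execute(sql)]

only_in_table1 = ids_missing_from(conn, "Table1", "Table2")
only_in_table2 = ids_missing_from(conn, "Table2", "Table1")
print(only_in_table1, only_in_table2)
```

With the sample rows above, id 1 exists only in Table1 and id 4 only in Table2.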
If you need to find rows that differ in one or more columns, that is a little trickier. You can write SQL that compares each and every column yourself, but it may be simpler to add a temporary CHECKSUM column to both tables and compare those. If the checksums differ, at least one column differs. Matching checksums do not strictly prove that the rows are identical, because checksums can collide.
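The same "one digest per row" idea can also be done on the client side. The sketch below is not SQL Server's CHECKSUM; it swaps in SHA-256 from Python's standard library, and it assumes the rows have already been fetched as tuples with the id in the first position. The names `row_digest` and `changed_ids` are hypothetical.

```python
import hashlib

def row_digest(row):
    """Hash every column value of a row into one hex digest."""
    h = hashlib.sha256()
    for value in row:
        # repr() plus a separator keeps ('ab', 'c') distinct from ('a', 'bc').
        h.update(repr(value).encode("utf-8"))
        h.update(b"\x1f")
    return h.hexdigest()

def changed_ids(rows_a, rows_b):
    """Return sorted IDs present in both inputs whose non-key columns differ."""
    digests_a = {row[0]: row_digest(row[1:]) for row in rows_a}
    digests_b = {row[0]: row_digest(row[1:]) for row in rows_b}
    shared = digests_a.keys() & digests_b.keys()
    return sorted(i for i in shared if digests_a[i] != digests_b[i])

# Sample data: (id, name, amount); only id 3 differs between the two sets.
table1 = [(1, "a", 10), (2, "b", 20), (3, "c", 30)]
table2 = [(2, "b", 20), (3, "x", 30), (4, "d", 40)]
changed = changed_ids(table1, table2)
print(changed)
```

Ids that exist on only one side are ignored here on purpose, since the NOT EXISTS queries already answer that part.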
SQL Data Compare is a great tool for doing this. Also Microsoft Visual Studio SQL Server Data Tools has a Data Compare function.
I found the following method to perform very well when comparing large data sets.
http://weblogs.sqlteam.com/jeffs/archive/2004/11/10/2737.aspx
Basically UNION ALL of the two data sources then aggregate them and return only rows which don't have an identical matching row in the other table.
With unionCTE As (
Select 'TableA' As TableName, col1, col2
From TableA
Union All
Select 'TableB', col1, col2
From TableB)
--Alias the aggregate so that Order By can refer to it
Select Max(TableName) As TableName, col1, col2
From unionCTE
Group By col1, col2
Having Count(*) = 1
Order By col1, col2, TableName;
This will show the results in a single resultset, and if there are any rows that have the same key but different values the rows will be one above the other so that you can easily compare which values have changed between the tables.
This can easily be put into a stored procedure, if you want.
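For anyone who wants to try the UNION ALL / GROUP BY approach on toy data first, here is a small sketch run against in-memory SQLite rather than SQL Server. The sample rows are invented, and it assumes each table has no duplicate rows of its own (a row duplicated inside one table would be counted twice and hidden from the result).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (col1 INTEGER, col2 TEXT);
    CREATE TABLE TableB (col1 INTEGER, col2 TEXT);
    INSERT INTO TableA VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO TableB VALUES (2, 'b'), (3, 'x'), (4, 'd');
""")

# Rows that appear exactly once across both tables have no identical twin.
diff_sql = """
    WITH unionCTE AS (
        SELECT 'TableA' AS TableName, col1, col2 FROM TableA
        UNION ALL
        SELECT 'TableB', col1, col2 FROM TableB
    )
    SELECT MAX(TableName) AS TableName, col1, col2
    FROM unionCTE
    GROUP BY col1, col2
    HAVING COUNT(*) = 1
    ORDER BY col1, col2, TableName
"""
diff_rows = conn.execute(diff_sql).fetchall()
for row in diff_rows:
    print(row)
```

The two rows with col1 = 3 come out next to each other, one labelled TableA and one TableB, which is the "one above the other" effect described above.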
Related
I have two database tables, one to store the current data and another to store historical data. Whenever a user makes a change, the current table is updated and the data that was in the current table is saved to the historical table. However, I would like to use a SQL query to display both the current and the historical data in a GridView. I've outlined a picture of the results I would like to achieve.
You seem to be looking for union all:
select ct.* from current_table ct
union all
select ht.* from history_table ht
For this to work properly, both tables must have the same number of columns, with the same datatype (and length) - the description of your question makes me think that this really is the case here.
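A quick way to check the result shape is to run the query on throwaway data. The sketch below uses in-memory SQLite; the `id` and `price` columns and the extra `source` label column are assumptions added for illustration so that each row shows which table it came from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE current_table (id INTEGER, price REAL);
    CREATE TABLE history_table (id INTEGER, price REAL);
    INSERT INTO current_table VALUES (1, 12.5);
    INSERT INTO history_table VALUES (1, 10.0), (1, 11.0);
""")

# Both SELECTs return the same number of columns with compatible types.
combined_sql = """
    SELECT 'current' AS source, ct.id AS id, ct.price AS price
    FROM current_table AS ct
    UNION ALL
    SELECT 'history', ht.id, ht.price
    FROM history_table AS ht
    ORDER BY source, price
"""
combined = conn.execute(combined_sql).fetchall()
for row in combined:
    print(row)
```

The combined list can then be bound to the grid as a single result set.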
Use the AS keyword in your SQL query, as below:
select column1, column2 from table1
union
select column4 as column1, column2 from table2
This aliases the non-matching column so that its name matches the column in the first query.
I have a query like this:
Select table1.*, table2.column1 from table1 join table2 on table1.column1=table2.column1
It works, but it puts the column at the end of the DataGridView. I have to put table2.column1 after a specified column of table2, and I have to use table1.* and can't list table1's columns. Is this possible?
And why exactly can't you use a list of all the fields?
No, it's not possible to place a column in the middle of the columns specified with *, neither with pure SQL nor with dynamic SQL.
Just specify them, don't be lazy, it's better practice:
SELECT table1.col1,
table1.col2,
table2.col1,
table1.col3
..........
Because I am using union queries, the table names change, and one table contains more columns than the other.
If table1 differs, that above all should be a strong argument for specifying all the needed fields separately. If a new field were added to table1, your query would break, because the number of fields would differ from the number used in the next union.
I am programming an Excel add-in in C# where I process data contained in different DataTable objects. I would like to provide a function to perform SQL queries on the data, with the ability to reference data from other tables in where and sort by clauses (for example, using a join).
An example of such a query would be
SELECT name
FROM Table1
WHERE id = Table2.id AND Table2.age > 18
The problem with this is that a DataTable doesn't know of the existence of the other DataTables, so (as far as I know) there are no such methods in the class. Also, I cannot use something like LINQ, since the queries will be written by the users of the add-in in Excel.
Would it be a good solution to copy the data to an in-memory database, where each DataTable is mapped to a table? How would this work performance-wise? Is there a simpler solution?
In terms of the SQL query, you are missing a table reference in the FROM clause. The corrected query would look like:
SELECT name
FROM Table1, Table2
WHERE Table1.id = Table2.id AND Table2.age > 18
Use Table1.name if Table2 has an attribute with the same name.
However, joining tables only through WHERE conditions, without an explicit join clause, is not recommended. Use JOIN instead:
SELECT Table1.name
FROM Table1 INNER JOIN Table2 ON Table1.id = Table2.id WHERE Table2.age > 18
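On the in-memory database idea from the question, here is a rough sketch of what the approach could look like. It is written in Python with the standard `sqlite3` module rather than C#, and `load_tables` plus the dictionary layout are hypothetical; it only illustrates copying each table into an in-memory database and then running the user's query text against it.

```python
import sqlite3

def load_tables(tables):
    """Copy {name: (columns, rows)} into a fresh in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    for name, (columns, rows) in tables.items():
        # Identifiers are quoted; values go through placeholders.
        col_list = ", ".join(f'"{c}"' for c in columns)
        marks = ", ".join("?" for _ in columns)
        conn.execute(f'CREATE TABLE "{name}" ({col_list})')
        conn.executemany(f'INSERT INTO "{name}" VALUES ({marks})', rows)
    return conn

# Stand-ins for the DataTable objects held by the add-in.
tables = {
    "Table1": (["id", "name"], [(1, "Ann"), (2, "Bob"), (3, "Cy")]),
    "Table2": (["id", "age"], [(1, 30), (2, 17), (3, 45)]),
}
conn = load_tables(tables)

# The query text would come from the user; this one mirrors the JOIN above.
user_query = """
    SELECT Table1.name
    FROM Table1 INNER JOIN Table2 ON Table1.id = Table2.id
    WHERE Table2.age > 18
    ORDER BY Table1.name
"""
names = [row[0] for row in conn.execute(user_query)]
print(names)
```

How well this performs will depend on how often the tables have to be reloaded, so it is worth measuring with realistic data sizes.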
I have a DataSet that contains two tables. One is considered to be nested in the other. All I want is for it not to be nested and for there to be one table. .Merge() and LINQ just aren't doing the trick.
Here is a sample of what the main table would look like
student-id ID
--------------------
123456789 1
654987321 2
But each of these has multiple rows that they correspond to in the next table
ID Col1 Col2 etc.
----------------------
1 fact1 fact2
1 fact3 fact4
2 fact5 fact6
I want to combine them so they would look like this...
student-id Col1 Col2
-------------------------------
123456789 fact1 fact2
123456789 fact3 fact4
654987321 fact5 fact6
Every time I try the merge it doesn't work: I get an error that I can't duplicate the primary key, which is "ID", and since the merge is based on the primary key (I believe), I can't remove it.
I can't use LINQ because I want to make this generic, so that the second table could have any number of columns, and I can't get the select to work for that.
UPDATE: MY SOLUTION
I ended up cloning the second table to a new DataTable, then adding a column called 'student-id' and deleting the ID column. Then I looped through the rows of the main table, found the related rows in the second table, combined all the data in an array, and created a row in the final table.
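The shape of that loop can be sketched independently of DataTable. The version below is in Python, with each row as a plain dict; `flatten` and its parameters are hypothetical names, and it assumes every row in the second table has a matching ID in the main table.

```python
def flatten(main_rows, detail_rows, key="ID", label="student-id"):
    """Replace `key` in each detail row with the matching `label` value."""
    lookup = {row[key]: row[label] for row in main_rows}
    flat = []
    for detail in detail_rows:
        merged = {label: lookup[detail[key]]}
        # Copy every other column, whatever its name, so the shape stays generic.
        merged.update({k: v for k, v in detail.items() if k != key})
        flat.append(merged)
    return flat

main_table = [
    {"student-id": "123456789", "ID": 1},
    {"student-id": "654987321", "ID": 2},
]
second_table = [
    {"ID": 1, "Col1": "fact1", "Col2": "fact2"},
    {"ID": 1, "Col1": "fact3", "Col2": "fact4"},
    {"ID": 2, "Col1": "fact5", "Col2": "fact6"},
]
flat_rows = flatten(main_table, second_table)
for row in flat_rows:
    print(row)
```

Because the columns are copied by name rather than listed, the second table can have any number of columns.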
The LINQ isn't as bad as you suggest. You can just use an anonymous type that holds two DataRows:
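// Needs: using System; using System.Data; using System.Linq;
// (on .NET Framework, also reference System.Data.DataSetExtensions for AsEnumerable)
var result = from t1 in table1.AsEnumerable()
             join t2 in table2.AsEnumerable() on (int)t1["ID"] equals (int)t2["ID"]
             select new
             {
                 Student = t1, // the whole row from the main table
                 Facts = t2    // the whole matching row from the second table
             };
foreach (var s in result)
    Console.WriteLine("{0} {1} {2}", s.Student["student-id"], s.Facts["Col1"], s.Facts["Col2"]);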
That way, you're not including specific columns in your output, so you can access them after the fact.
That being said, the other poster's suggestion of using a pivot table is probably a better direction to go.
Let's try it in SQL.
Let the first table be Table1 and the second table be Table2.
SQL:
--student-id contains a hyphen, so it has to be bracketed
Select
X.[student-id],Y.Col1,Y.Col2
From Table1 As X Inner Join Table2 As Y On Y.ID=X.ID
I think it's easy to do if you try it in SQL!
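To see that join produce the flattened shape from the question, it can be run against the sample data. This is a sketch using in-memory SQLite instead of SQL Server; double quotes are used to quote the hyphenated column name, and the ORDER BY is an addition to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 ("student-id" TEXT, ID INTEGER PRIMARY KEY);
    CREATE TABLE Table2 (ID INTEGER, Col1 TEXT, Col2 TEXT);
    INSERT INTO Table1 VALUES ('123456789', 1), ('654987321', 2);
    INSERT INTO Table2 VALUES
        (1, 'fact1', 'fact2'), (1, 'fact3', 'fact4'), (2, 'fact5', 'fact6');
""")

# The hyphenated column has to be quoted, otherwise it parses as a subtraction.
flat_sql = """
    SELECT X."student-id", Y.Col1, Y.Col2
    FROM Table1 AS X INNER JOIN Table2 AS Y ON Y.ID = X.ID
    ORDER BY X."student-id", Y.Col1
"""
flat = conn.execute(flat_sql).fetchall()
for row in flat:
    print(row)
```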
Sounds like what you need is a Pivot table.
This will essentially allow you to display the data how you want.
Here are a couple of tutorials/projects
http://www.codeproject.com/Articles/25167/Simple-Advanced-Pivots-with-C-and-ASP-NET
http://www.codeproject.com/Articles/46486/Pivoting-DataTable-Simplified
Update
You may find it better to do the 'pivot' part in MS SQL as a stored procedure and then populate your DataTable with the results of calling that stored procedure. This example is a great starting point:
http://blogs.msdn.com/b/spike/archive/2009/03/03/pivot-tables-in-sql-server-a-simple-sample.aspx
I am using C# and SQL Server 2008. I need to select 10 random rows from one table and insert them into another table. I wanted to do this with a cursor in SQL Server, but I have read a lot about the disadvantages of cursors, so now I want to do it via C# code. Does anybody have a better suggestion? Thanks in advance.
Idea from Select n random rows from SQL Server table
No cursors required:
Insert Into Table1
(col1, col2)
Select Top 10
col1, col2
From
Table2
Order By
NewID()
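The same single-statement pattern can be tried out on a scratch database. The sketch below uses in-memory SQLite, where the equivalent spelling is ORDER BY RANDOM() with LIMIT 10 instead of TOP 10 with NewID(); the 100 sample rows are generated purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (col1 INTEGER, col2 TEXT);
    CREATE TABLE Table2 (col1 INTEGER, col2 TEXT);
""")
source_rows = [(i, f"row{i}") for i in range(100)]
conn.executemany("INSERT INTO Table2 VALUES (?, ?)", source_rows)

# One set-based statement: shuffle the source rows, keep 10, insert them.
conn.execute("""
    INSERT INTO Table1 (col1, col2)
    SELECT col1, col2 FROM Table2 ORDER BY RANDOM() LIMIT 10
""")
picked = conn.execute("SELECT col1, col2 FROM Table1").fetchall()
print(len(picked))
```

Sorting the whole table by a random value is fine for small tables but reads every row, so it may be slow on very large ones.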