I have a CSV file with one column. Each row in this column may have a matching record in TestTable in the database (a sample table only). I would like to send the entire data set via C# to a stored procedure in one call and return the rows from the database that have a match.
Input parameter (csv column):
Stringcolumn
apple
banana
copper
dig
....
Output (possibly dataset):
ID|StringColumn|AddedDate
2|apple|2021-01-02
35|copper|2021-01-02
The requirement is to send and receive the data via C#.
Right now, I'm thinking of using a User-Defined Table Type in MS SQL to receive the data.
Thank you.
If it's a big CSV, I would recommend dumping it into a temp table and then using a join to get the data. Alternatively, you can use a user-defined table type to pass the data into a stored procedure.
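If you go the user-defined table type route, a minimal sketch of the round trip might look like the following; the type name (dbo.StringList), procedure name (dbo.GetMatchingStrings), and the CSV handling are assumptions, not part of the question:

// T-SQL assumed to exist on the server (type and proc names are made up):
//   CREATE TYPE dbo.StringList AS TABLE (StringColumn NVARCHAR(255));
//   CREATE PROCEDURE dbo.GetMatchingStrings @Items dbo.StringList READONLY
//   AS
//     SELECT t.ID, t.StringColumn, t.AddedDate
//     FROM dbo.TestTable AS t
//     JOIN @Items AS i ON i.StringColumn = t.StringColumn;

using System.Data;
using System.Data.SqlClient;
using System.IO;

class CsvMatcher
{
    public static DataTable GetMatches(string connectionString, string csvPath)
    {
        // Build a DataTable whose single column matches the user-defined table type.
        var input = new DataTable();
        input.Columns.Add("StringColumn", typeof(string));
        var lines = File.ReadAllLines(csvPath);
        for (int i = 1; i < lines.Length; i++)           // skip the header row
            if (lines[i].Trim().Length > 0)
                input.Rows.Add(lines[i].Trim());

        var result = new DataTable();
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("dbo.GetMatchingStrings", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            var p = cmd.Parameters.AddWithValue("@Items", input);
            p.SqlDbType = SqlDbType.Structured;          // table-valued parameter
            p.TypeName = "dbo.StringList";

            using (var da = new SqlDataAdapter(cmd))
                da.Fill(result);                         // matched rows (ID, StringColumn, AddedDate)
        }
        return result;
    }
}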
I have two databases in my SQL Server, each containing a single table as of now.
I also have 2 source databases like below:
1) Db1 (MySQL)
2) Db2 (Oracle)
Now what I want to do is fill my SQL Server Db1 table with data from the MySQL Db1, like below:
Insert into Table1 select * from Table1
Select * from Table1 (MySQL Db1) -- data coming from the MySQL database
Insert into Table1 (SQL Server Db1) -- insert the data coming from the MySQL database, assuming the same schema
I don't want to use SqlBulkCopy because I don't want to insert the data chunk by chunk. I want to insert all the data in one go, considering there are millions of rows, because my operation is not limited to just inserting records into the database. Otherwise the user has to sit and wait a long time while millions of rows are inserted chunk by chunk, and then wait again for my further operation, which is also a long-running operation.
So if I can speed this process up, then my second operation also speeds up, since all the records will be in my one local SQL Server instance.
Is this possible to achieve in a C# application?
Update: I researched linked servers, as @Gordon Linoff suggested that a linked server can be used to achieve this scenario, but based on my research it seems like I cannot create a linked server through code.
I want to do this with the help of ADO.NET.
This is what I am trying to do exactly:
Consider that I have 2 different client RDBMSs, with 2 databases and some tables, on the client premises.
So the databases look like this:
SQL Server:
Db1
Order
Id Amount
1 100
2 200
3 300
4 400
MySQL or Oracle:
Db1:
Order
Id Amount
1 1000
2 2000
3 3000
4 400
Now I want to compare the Amount column from the source (SQL Server) to the destination database (MySQL or Oracle).
I will need to join the tables of these 2 different RDBMS databases to compare the Amount columns.
In C#, what I can do is fetch the records chunk by chunk into a DataTable (in memory) and then compare these records in code, but this will take a lot of time considering there are millions of records.
So I want to do something better than this.
Hence I was thinking that I could bring the records from these 2 RDBMSs into 2 databases on my local SQL Server instance, and then write a join query joining the 2 tables on Id, taking advantage of the DBMS processing capability, which can compare these millions of records efficiently.
A query like this compares millions of records efficiently:
select SqlServer.Id, Mysql.Id, SqlServer.Amount, Mysql.Amount from SqlServerDb.dbo.[Order] as SqlServer
left join MysqlDb.dbo.[Order] as Mysql on SqlServer.Id = Mysql.Id
where SqlServer.Amount != Mysql.Amount
The above query works when I have the data from these 2 different RDBMSs in my local server instance, in the databases SqlServerDb and MysqlDb, and it will fetch the records below, whose amounts do not match:
So I am trying to get those records whose Amount column values do not match between the source (SQL Server DB) and MySQL.
Expected Output :
Id Amount
1 1000
2 2000
3 3000
So is there any way to achieve this scenario?
On the SELECT side, create a .csv file (tab-delimited) using SELECT ... INTO OUTFILE ...
On the INSERT side, use LOAD DATA INFILE ... (or whatever the target machine syntax is).
Doing it all at once may be easier to code than chunking, and may (or may not) be faster running.
SqlBulkCopy can accept either a DataTable or a System.Data.IDataReader as its input.
Using your query to read the source DB, set up an ADO.NET DataReader on the source MySQL or Oracle DB and pass the reader to the WriteToServer() method of the SqlBulkCopy.
This can copy almost any number of rows without limit. I have copied hundreds of millions of rows using the data reader approach.
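A rough sketch of that reader-based copy, assuming the source is MySQL via the Connector/NET provider; the table and column names here are illustrative, not taken from the question:

using System.Data.SqlClient;
using MySql.Data.MySqlClient;   // Connector/NET package (assumption: the source is MySQL)

class BulkCopier
{
    public static void CopyOrders(string mySqlConnStr, string sqlServerConnStr)
    {
        using (var source = new MySqlConnection(mySqlConnStr))
        using (var cmd = new MySqlCommand("SELECT Id, Amount FROM `Order`", source))
        {
            source.Open();
            using (var reader = cmd.ExecuteReader())
            using (var bulk = new SqlBulkCopy(sqlServerConnStr))
            {
                bulk.DestinationTableName = "dbo.MysqlOrder";  // staging table, illustrative name
                bulk.BulkCopyTimeout = 0;                      // no timeout for large copies
                bulk.BatchSize = 10000;
                bulk.ColumnMappings.Add("Id", "Id");
                bulk.ColumnMappings.Add("Amount", "Amount");
                bulk.WriteToServer(reader);                    // streams rows; no in-memory DataTable
            }
        }
    }
}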
What about adding a changed-date column in the remote database?
Then you could get all rows that have changed since the last sync and just compare those.
First of all, do not use a linked server. It is tempting, but it will bring more trouble than benefit to the table: updates and inserts will fetch all of the target DB to the source DB, do the insert/update, and post all the data back to the target.
As far as I understand you are trying to copy changed data to target system for some stuff.
I recommend using a timestamp (rowversion) column on the source table. When anything changes on the source table, the timestamp column is updated by SQL Server.
On the target, get the max ID and the max timestamp: two queries at most.
On the source, rows where source.ID <= target.MaxID && source.timestamp >= target.MaxTimestamp are the rows that changed after the last sync (they need an update), and rows where source.ID > target.MaxID are the rows that were inserted after the last sync.
Now you do not have to compare two worlds; you just get all the updates and inserts.
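A rough C# sketch of those two steps is below; the table name (dbo.[Order]) and the rowversion column (RowVer) are assumptions for illustration only:

using System;
using System.Data.SqlClient;

class IncrementalSync
{
    public static void FindChanges(string targetConnStr)
    {
        long maxId;
        byte[] maxVersion;

        // Query 1 (against the target): get the high-water marks from the last sync.
        using (var conn = new SqlConnection(targetConnStr))
        using (var cmd = new SqlCommand(
            "SELECT ISNULL(MAX(Id), 0) AS MaxId, ISNULL(MAX(RowVer), 0x0) AS MaxVer FROM dbo.[Order]", conn))
        {
            conn.Open();
            using (var r = cmd.ExecuteReader())
            {
                r.Read();
                maxId = Convert.ToInt64(r["MaxId"]);
                maxVersion = (byte[])r["MaxVer"];
            }
        }

        // Query 2 (against the source): rows changed or inserted since that point.
        //   changed rows:  Id <= @MaxId AND RowVer >= @MaxVer  (need an update on the target)
        //   new rows:      Id >  @MaxId                        (need an insert on the target)
        const string sourceSql =
            "SELECT Id, Amount FROM dbo.[Order] " +
            "WHERE (Id <= @MaxId AND RowVer >= @MaxVer) OR Id > @MaxId";
        // ... run sourceSql against the source connection, passing maxId / maxVersion as
        // @MaxId / @MaxVer, and upsert the returned rows into the target.
    }
}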
You need to create a linked server connection using ODBC and the proper driver; after that you can execute the queries using OPENQUERY.
Take a look at openquery:
https://msdn.microsoft.com/en-us/library/ms188427(v=sql.120).aspx
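For illustration, a hedged sketch of running the comparison through OPENQUERY from C#, assuming a linked server named MYSQL_LINKED has already been created (for example with sp_addlinkedserver and the MySQL ODBC driver); all object names are made up:

using System.Data;
using System.Data.SqlClient;

class LinkedServerCompare
{
    public static DataTable GetMismatches(string sqlServerConnStr)
    {
        const string sql = @"
            SELECT s.Id, m.Amount
            FROM dbo.[Order] AS s
            JOIN OPENQUERY(MYSQL_LINKED, 'SELECT Id, Amount FROM Db1.`Order`') AS m
              ON m.Id = s.Id
            WHERE m.Amount <> s.Amount;";

        var result = new DataTable();
        using (var conn = new SqlConnection(sqlServerConnStr))
        using (var da = new SqlDataAdapter(sql, conn))
            da.Fill(result);   // rows whose amounts differ between the two systems
        return result;
    }
}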
Yes, SQL Server is very efficient when it's working with sets so let's keep that in play.
In a nutshell, what I'm pitching is
Load data from the source to a staging table on the target database (staging table = table to temporarily hold raw data from the source table, same structure as the source table... add tracking columns to taste). This will be done by your C# code... select from source_table into DataTable then SqlBulkCopy to the staging table.
Have a stored proc on the target database to reconcile the data between your target table and the staging table. Your C# code calls the stored proc.
Given that you're talking about millions of rows, another thing that can make things faster is dropping indices on the staging table before inserting to it and recreating those after the inserts and before any select is performed.
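A minimal sketch of that pitch is below; the staging table, index, and reconcile procedure names (dbo.Staging_Order, IX_Staging_Order_Id, dbo.usp_ReconcileOrders) are made up for illustration:

using System.Data;
using System.Data.SqlClient;

class StagingLoader
{
    public static void LoadAndReconcile(string targetConnStr, DataTable sourceRows)
    {
        using (var conn = new SqlConnection(targetConnStr))
        {
            conn.Open();

            // Disable the nonclustered index on the staging table before the load.
            using (var cmd = new SqlCommand(
                "ALTER INDEX IX_Staging_Order_Id ON dbo.Staging_Order DISABLE;", conn))
                cmd.ExecuteNonQuery();

            // Bulk load the raw source rows into the staging table.
            using (var bulk = new SqlBulkCopy(conn))
            {
                bulk.DestinationTableName = "dbo.Staging_Order";
                bulk.BulkCopyTimeout = 0;
                bulk.WriteToServer(sourceRows);
            }

            // Rebuild the index, then let the database reconcile staging vs. target.
            using (var cmd = new SqlCommand(
                "ALTER INDEX IX_Staging_Order_Id ON dbo.Staging_Order REBUILD;", conn))
                cmd.ExecuteNonQuery();

            using (var proc = new SqlCommand("dbo.usp_ReconcileOrders", conn))
            {
                proc.CommandType = CommandType.StoredProcedure;
                proc.CommandTimeout = 0;   // reconciling millions of rows can take a while
                proc.ExecuteNonQuery();
            }
        }
    }
}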
I'm attempting to create an Excel pivot table based on data in a Microsoft Dynamics NAV database.
The trick is that I need Excel to access the data directly from the SQL Server database with Power Query - and furthermore it must be able to access the data from the same table in multiple databases with different names and table names.
Does anybody have any experience or advice regarding this issue?
Step 1. First you should make a function where you can pass a server name, database name and table to be queried. Something like
let getData = (servername, dbname, tablename) =>
    let
        Source = Sql.Database(servername, dbname, [Query = "select abc, def from " & tablename & " where condition etc etc"]),
        #"CustomStep1" = Source,          // some action on Source
        // ... further custom steps ...
        #"CustomStepN" = #"CustomStep1"   // some action on the previous step
    in
        #"CustomStepN"
in
    getData
You have a function ready which you can use in a table to create a custom column.
Step 2. Now use a parameter table approach. Create a table in the normal Excel area, something like
Server Name | DatabaseName | Table_to_be_used
Now use the From Table menu option in the Power Query options (or the Data tab in Excel 2016). Add a custom column to this table in the Power Query steps, using the getData function created in the previous step. Perform any other "Expand" (by default the function is going to return a table if you are not doing any other transformation), "Summarize", or rename operations.
However, the Power Query formula firewall is going to give you a hard time, as Power Query doesn't trust native SQL queries and you will have to approve each and every native SQL query. You may try to uncheck the checkbox for "Require user approval for new native database query" in the query options.
Hope you get the idea and it helps.
Perhaps it would be worth looking into creating Query objects and exposing them via OData, which is something Excel can read from. The benefit here is that it can handle table relations natively and can expose FlowFields, which you cannot see in direct SQL queries against the table.
Aside from a stored procedure to manage the different table names, there's not a simple way to query specific tables without hard coding the names in some capacity.
The Company table will give you the CompanyName$ prefix, and the table names are static between companies. You could write some fancy Excel logic to loop through them.
I need to send a DataTable to SQL Server 2012 and insert the DataTable into a specific table. How can I do that?
(I don't want to do this work in C#; I want to send the DataTable to SQL Server and do this work in SQL Server.) I saw a similar question, but didn't find my answer.
Edit:
In SQL Server, I want to insert the DataTable rows into a table.
Your point is not clear enough about what your purpose is. You can use the DataAdapter's Update() method, which would perform the operation of pushing newly added records to your table; this would implicitly send the table to SQL Server and update the data.
I think you want to save bulk data from a DataTable in SQL Server. Please perform the following steps:
1) Create a custom table-valued data type in SQL Server 2008 onward.
2) Create a stored procedure with a parameter of that custom table-valued data type.
3) Use the following code to set the parameter value from code:
SqlParameter param = cmd.Parameters.AddWithValue("@RECORD_TBL", dataTable);
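For completeness, a sketch of how that parameter is usually wired up end to end; the procedure name (dbo.usp_SaveRecords) and type name (dbo.RECORD_TBL_TYPE) are placeholders for whatever you created in steps 1 and 2:

using System.Data;
using System.Data.SqlClient;

class RecordSaver
{
    public static void SaveRecords(string connStr, DataTable dataTable)
    {
        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand("dbo.usp_SaveRecords", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            SqlParameter param = cmd.Parameters.AddWithValue("@RECORD_TBL", dataTable);
            param.SqlDbType = SqlDbType.Structured;     // mark it as a table-valued parameter
            param.TypeName = "dbo.RECORD_TBL_TYPE";     // the custom table type from step 1
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}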
In a C# program I have an array with about 100,000 elements.
Then I have a SQL Server 2008 table where the primary key column contains more or less all elements of the array (but a few are missing). The table can have up to 30,000,000 rows.
Now I want to determine which elements of the array do not exist in the table. How can this be achieved efficiently?
The most efficient method would probably be to bulk-insert those 100,000 elements into a temp table and then perform the comparison within the database itself.
(Note that I haven't tested this theory; it's just an educated guess.)
Query the table with a
select <primarykey> from <table> where <primarykey> in (<primary keys of your list of elements in C#>)
This should be faster than inserting all rows into a table and then checking with an except/minus command for missing elements, because it does not involve any write operation.
Once you have the list of primary keys which are common, pull it back into C# and compare.
A way to avoid creating temp tables would be to use a stored procedure which accepts a table-valued parameter of a user-defined table type (UDTT). This table would have a schema of one column with a data type matching that of your array.
If you populate a DataTable (with a schema matching the UDTT schema) with your array values and supply the data table as your stored proc's parameter, you can pass up all 100,000 of your items in their SQL binary format. The proc can just do a join between the 30M-row table and the table-valued parameter, returning the items in the TVP table with no matches in the master table.
This avoids needing to build massive IN statements.
EDIT Regarding the comment from @Kyro below:
I'm now less confident in this approach. I found an article showing the under-the-covers row-by-row inserts that Kyro describes. What you might gain by sending binary data over the network rather than a large T-SQL WHERE IN() statement may well be taken away by the performance on the SQL side. However, it's a fairly simple code approach, so it might be worth a quick test. Let us know how you get on.
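For reference, a minimal sketch of the TVP approach described above; the table type (dbo.IntList), procedure (dbo.usp_FindMissingKeys), and target table (dbo.BigTable) are hypothetical names:

// T-SQL assumed to exist on the server (names are made up):
//   CREATE TYPE dbo.IntList AS TABLE (Id INT PRIMARY KEY);
//   CREATE PROCEDURE dbo.usp_FindMissingKeys @Keys dbo.IntList READONLY
//   AS
//     SELECT k.Id
//     FROM @Keys AS k
//     LEFT JOIN dbo.BigTable AS t ON t.Id = k.Id
//     WHERE t.Id IS NULL;

using System.Data;
using System.Data.SqlClient;

class MissingKeyFinder
{
    public static DataTable FindMissing(string connStr, int[] keys)
    {
        var tvp = new DataTable();
        tvp.Columns.Add("Id", typeof(int));
        foreach (var k in keys)
            tvp.Rows.Add(k);

        var missing = new DataTable();
        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand("dbo.usp_FindMissingKeys", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            var p = cmd.Parameters.AddWithValue("@Keys", tvp);
            p.SqlDbType = SqlDbType.Structured;
            p.TypeName = "dbo.IntList";
            using (var da = new SqlDataAdapter(cmd))
                da.Fill(missing);   // the array elements with no match in the big table
        }
        return missing;
    }
}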
I have SQL Server 2008 and VS 2008 Pro. And I am coding in C#. I accept a text input file from the customer and I parse this data into a DataTable in my C# aspx.cs file. I know that this is working correctly. So I populated the DataTable. But now how do I load this data into an SP? Like can I use a dynamic table parameter? Or should I use XML instead?
The problem is that the number of required columns may vary depending on which table they want to insert into. The details: I let the user select which table they want to append data to. For simplicity, let's say:
TABLE_NAME NUM_COLS
A 2
B 3
C 4
And also let's assume that the first column in each of these is an INT primary key.
So if they choose Table B, then DataTable would look something like:
PK C1 C2 C3
1 'd' 'e' '3/10/99'
2 'g' 'h' '4/10/99'
So now I want to append this data above into Table B in my AdventureWorks DB. What is the easiest way to implement this both in the SP definition and also the C# code which calls this SP?
Thank you!
I think I understand what you're asking. I'm going to assume each row of your data import will map directly/cleanly to a table in the database. I am also going to assume your application logic can determine where each row of data shall be persisted.
This said, I suggest working with each row of the .NET DataTable individually rather than passing the data in bulk to SQL as a single stored procedure parameter and then depending on SQL to do any data parsing and table mapping.
Basically, loop through your DataTable, determine the type of data and execute the appropriate insert for each row. I hope this helps.
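In case it helps, a bare-bones sketch of that loop, assuming the application has already resolved the chosen table and its column list (the names here are illustrative, and the table name should be validated against your known list of tables):

using System.Data;
using System.Data.SqlClient;
using System.Linq;

class RowByRowImporter
{
    public static void InsertRows(SqlConnection conn, DataTable data, string tableName, string[] columns)
    {
        // Build "INSERT INTO [B] ([PK],[C1],[C2],[C3]) VALUES (@p0,@p1,@p2,@p3)" once and reuse it per row.
        string colList = "[" + string.Join("],[", columns) + "]";
        string paramList = string.Join(",", columns.Select((c, i) => "@p" + i).ToArray());
        string sql = string.Format("INSERT INTO [{0}] ({1}) VALUES ({2})", tableName, colList, paramList);

        using (var cmd = new SqlCommand(sql, conn))
        {
            foreach (DataRow row in data.Rows)
            {
                cmd.Parameters.Clear();
                for (int i = 0; i < columns.Length; i++)
                    cmd.Parameters.AddWithValue("@p" + i, row[columns[i]]);
                cmd.ExecuteNonQuery();
            }
        }
    }
}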