I am working on a C#/.NET desktop application, using Visual Studio 2019 as the IDE. I have two tables in SQLite, each with around 1 million records. I have to update a few columns of Table A from Table B based on some matching criteria. I have also loaded the records into DataTables for local processing.
Search and update field criteria vary from time to time, so I have to build dynamic queries.
I have the records in the SQLite tables as well as loaded in DataTables, so any approach is welcome, whether it updates through direct table queries or through local processing of the DataTables.
I have tried the approaches below:
Approach 1, using a query: I found through a Google search that in standard SQL we can easily update records in one table from another table with a single query, but that we can't do this in SQLite. I tried the query approach, but it failed; only later did I find out that SQLite does not support it.
I also tried updating the records one by one in a loop. Insert queries in a loop are very fast (they took just 4-5 seconds for 200K records), but an update query in a loop is very slow, at around 9 records per second, which would take hours to update all the records.
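For reference, a per-row update loop along those lines might look like the sketch below (shown with System.Data.SQLite; the table, column, and DataTable names are made up, not taken from the post). In SQLite, the two things that usually decide whether such a loop runs at around 10 rows per second or at thousands of rows per second are an explicit transaction around the whole loop and an index on the column used in the WHERE clause.

// needs using System.Data; using System.Data.SQLite;
using (var conn = new SQLiteConnection("Data Source=app.db"))
{
    conn.Open();
    using (var tx = conn.BeginTransaction())
    using (var cmd = new SQLiteCommand(
        "UPDATE TableA SET Col1 = @col1 WHERE KeyCol = @key", conn, tx))
    {
        var pCol1 = cmd.Parameters.Add("@col1", DbType.String);
        var pKey = cmd.Parameters.Add("@key", DbType.String);

        foreach (DataRow row in table2Data.Rows)   // table2Data = DataTable loaded from Table B
        {
            pCol1.Value = row["SourceCol"];
            pKey.Value = row["MatchCol"];
            cmd.ExecuteNonQuery();                 // one prepared statement, reused per row
        }
        tx.Commit();                               // single commit instead of one per UPDATE
    }
}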
Approach 2, using LINQ:
The schemas of the two tables are not the same, but they are linked by a few columns.
string field1 = "A";
string field2 = "B";
var updateQuery = from r1 in table1Data.AsEnumerable()
                  join r2 in table2Data.AsEnumerable()
                    on r1.Field<string>(field1) equals r2.Field<string>(field2)
                  select new { r1, r2 };
This works fine for a single field, but I need a dynamic approach that also supports multiple columns (the column count will be decided at run time based on user selection).
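One way to make the join dynamic is to build a composite key from whatever columns the user selected. This is only a sketch: the joinFields arrays, the separator character, and the copied columns are illustrative, not taken from the question.

// needs using System; using System.Data; using System.Linq;
string[] joinFields1 = { "A", "C" };   // user-selected match columns on table1Data
string[] joinFields2 = { "B", "D" };   // corresponding columns on table2Data (same count and order)

Func<DataRow, string[], string> makeKey = (row, fields) =>
    string.Join("\u0001", fields.Select(f => row.Field<string>(f) ?? string.Empty));

var updateQuery = from r1 in table1Data.AsEnumerable()
                  join r2 in table2Data.AsEnumerable()
                    on makeKey(r1, joinFields1) equals makeKey(r2, joinFields2)
                  select new { r1, r2 };

foreach (var pair in updateQuery)
    pair.r1["TargetColumn"] = pair.r2["SourceColumn"];   // update columns chosen at run time

The separator just needs to be a character that cannot occur in the data, so that two different column combinations cannot produce the same concatenated key.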
I have two databases in my SQL Server, each containing a single table as of now.
I have 2 other databases, like below:
1) Db1 (MySQL)
2) Db2 (Oracle)
Now what I want to do is fill my SQL Server Db1 table with the data from the MySQL Db1, like below:
Insert into Table1 select * from Table1
Select * from Table1 (MySQL Db1) - data coming from the MySQL database
Insert into Table1 (SQL Server Db1) - insert the data coming from the MySQL database, assuming the same schema
I don't want to use SqlBulkCopy, because I don't want to insert the data chunk by chunk. I want to insert all the data in one go, considering there are millions of rows, because my operation is not just limited to inserting records into the database. Otherwise the user has to sit and wait a long time while millions of rows are inserted chunk by chunk, and then wait again for my further operation, which is also a long-running operation.
So if I can speed up this process, then I can also speed up my second operation, considering that all the records will be in my one local SQL Server instance.
Is this possible to achieve in a C# application?
Update: I researched linked servers, as Gordon Linoff suggested that a linked server could be used to achieve this scenario, but based on my research it seems like I cannot create a linked server through code.
I want to do this with the help of ADO.NET.
This is what I am trying to do exactly:
Consider that I have 2 different client RDBMSs, with 2 databases and some tables, on the client's premises.
So the databases are like this:
SQL Server:
Db1
Order
Id Amount
1 100
2 200
3 300
4 400
MySQL or Oracle:
Db1:
Order
Id Amount
1 1000
2 2000
3 3000
4 400
Now I want to compare the Amount column from the source (SQL Server) to the destination database (MySQL or Oracle).
I will have to join the tables of these 2 different RDBMS databases to compare the Amount columns.
In C#, what I can do is fetch the records chunk by chunk into a DataTable (in memory) and then compare these records in code, but this will take a lot of time considering there are millions of records.
So I want to do something better than this.
Hence I was thinking that I could bring the records of these 2 RDBMSs into 2 databases on my local SQL Server instance, then create a join query joining these 2 tables on Id, and take advantage of the DBMS's processing capability, which can compare these millions of records efficiently.
A query like this compares millions of records efficiently:
select SqlServer.Id, Mysql.Id, SqlServer.Amount, Mysql.Amount
from SqlServerDb.dbo.[Order] as SqlServer
left join MysqlDb.dbo.[Order] as Mysql on SqlServer.Id = Mysql.Id
where SqlServer.Amount != Mysql.Amount
The above query works when I have the data of these 2 different RDBMSs in my local server instance, in the databases SqlServerDb and MysqlDb, and it will fetch the records below, whose Amounts do not match.
So I am trying to get those records from the source (SQL Server DB) to MySQL whose Amount column values do not match.
Expected Output :
Id Amount
1 1000
2 2000
3 3000
So is there any way to achieve this scenario?
On the SELECT side, create a .csv file (tab-delimited) using SELECT ... INTO OUTFILE ...
On the INSERT side, use LOAD DATA INFILE ... (or whatever the target machine syntax is).
Doing it all at once may be easier to code than chunking, and may (or may not) be faster running.
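Driven from C#, that idea could look roughly like the sketch below. Two caveats worth stating: SELECT ... INTO OUTFILE writes to the MySQL server's file system and BULK INSERT reads from the SQL Server machine's file system, so the file path has to be reachable by both (for example a network share); and all names and paths here are placeholders.

using (var mysql = new MySqlConnection(mySqlConnStr))
using (var export = new MySqlCommand(
    @"SELECT Id, Amount INTO OUTFILE '/shared/orders.tsv'
      FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
      FROM `Order`", mysql))
{
    mysql.Open();
    export.ExecuteNonQuery();   // file is written by the MySQL server itself
}

using (var sql = new SqlConnection(sqlServerConnStr))
using (var import = new SqlCommand(
    @"BULK INSERT dbo.[Order] FROM '\\shared\orders.tsv'
      WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n')", sql))
{
    sql.Open();
    import.ExecuteNonQuery();   // file is read by the SQL Server instance
}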
SqlBulkCopy can accept either a DataTable or a System.Data.IDataReader as its input.
Using your query to read the source DB, set up an ADO.NET DataReader on the source MySQL or Oracle DB and pass the reader to the WriteToServer() method of SqlBulkCopy.
This can copy almost any number of rows without limit. I have copied hundreds of millions of rows using the data reader approach.
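A minimal sketch of that reader-to-SqlBulkCopy flow (shown with the MySQL provider; the Oracle managed provider works the same way). The connection strings, table, and column names are placeholders, not from the original post.

// needs using System.Data.SqlClient; using MySql.Data.MySqlClient;
static void CopyOrders(string mySqlConnStr, string sqlServerConnStr)
{
    using (var src = new MySqlConnection(mySqlConnStr))
    using (var dst = new SqlConnection(sqlServerConnStr))
    {
        src.Open();
        dst.Open();

        using (var cmd = new MySqlCommand("SELECT Id, Amount FROM `Order`", src))
        using (var reader = cmd.ExecuteReader())
        using (var bulk = new SqlBulkCopy(dst))
        {
            bulk.DestinationTableName = "dbo.[Order]";
            bulk.BatchSize = 10000;      // commit in batches on the SQL Server side
            bulk.BulkCopyTimeout = 0;    // no timeout for large copies
            bulk.WriteToServer(reader);  // streams rows; nothing is buffered in a DataTable
        }
    }
}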
What about adding a changed-date column in the remote database?
Then you could get all the rows that have changed since the last sync and just compare those.
First of all, do not use a linked server. It is tempting, but it will cause more trouble than it brings to the table: updates and inserts can end up fetching all of the target table over to the source, doing the insert/update there, and posting all the data back to the target.
As far as I understand, you are trying to copy changed data to a target system for further processing.
I recommend using a timestamp (rowversion) column on the source table. Whenever anything changes in a row of the source table, SQL Server updates that row's timestamp column.
On the target, get the max ID and the max timestamp. That's two queries at most.
On the source, the rows where source.ID <= target.MaxID && source.timestamp >= target.MaxTimestamp are the rows that changed after the last sync (they need an update), and the rows where source.ID > target.MaxID are the rows that were inserted after the last sync.
Now you do not have to compare two whole data sets; you have exactly the updates and the inserts.
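A rough ADO.NET sketch of that sync logic, assuming the source is a SQL Server table with an ever-increasing Id and a rowversion column; the table, column, and connection-string names are placeholders.

long maxId, maxVersion;
using (var target = new SqlConnection(targetConnStr))
{
    target.Open();
    using (var c1 = new SqlCommand("SELECT ISNULL(MAX(Id), 0) FROM dbo.[Order]", target))
        maxId = Convert.ToInt64(c1.ExecuteScalar());
    using (var c2 = new SqlCommand(
        "SELECT ISNULL(MAX(CAST(RowVersion AS BIGINT)), 0) FROM dbo.[Order]", target))
        maxVersion = Convert.ToInt64(c2.ExecuteScalar());
}

using (var source = new SqlConnection(sourceConnStr))
using (var delta = new SqlCommand(
    "SELECT * FROM dbo.[Order] " +
    "WHERE (Id <= @maxId AND CAST(RowVersion AS BIGINT) >= @maxVersion) " +   // changed rows (need update)
    "   OR  Id > @maxId", source))                                            // new rows (need insert)
{
    delta.Parameters.AddWithValue("@maxId", maxId);
    delta.Parameters.AddWithValue("@maxVersion", maxVersion);
    source.Open();
    // Feed delta.ExecuteReader() to SqlBulkCopy or a staging table as in the other answers.
}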
You need to create a linked server connection using ODBC and the proper driver; after that you can execute the queries using OPENQUERY.
Take a look at openquery:
https://msdn.microsoft.com/en-us/library/ms188427(v=sql.120).aspx
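Once the linked server exists (MYSQL_LINK below is an invented name), OPENQUERY runs the inner statement on the remote server and returns its rows, so you can consume them from C# like any other result set:

using (var sql = new SqlConnection(sqlServerConnStr))
using (var cmd = new SqlCommand(
    "SELECT l.Id, l.Amount " +
    "FROM OPENQUERY(MYSQL_LINK, 'SELECT Id, Amount FROM Db1.`Order`') AS l", sql))
{
    sql.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // compare or stage the remote rows here
        }
    }
}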
Yes, SQL Server is very efficient when it's working with sets so let's keep that in play.
In a nutshell, what I'm pitching is:
1. Load data from the source into a staging table on the target database (staging table = a table to temporarily hold raw data from the source table, same structure as the source table... add tracking columns to taste). This will be done by your C# code... select from source_table into a DataTable, then SqlBulkCopy to the staging table.
2. Have a stored proc on the target database to reconcile the data between your target table and the staging table. Your C# code calls the stored proc.
Given that you're talking about millions of rows, another thing that can speed things up is dropping the indexes on the staging table before inserting into it, and recreating them after the inserts and before any select is performed.
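In C#, those two steps look roughly like this; the staging table dbo.Order_Staging and the procedure dbo.usp_ReconcileOrders are invented names, and the IDataReader would come from the source database as in the earlier answers.

// needs using System.Data; using System.Data.SqlClient;
static void LoadAndReconcile(string targetConnStr, IDataReader sourceReader)
{
    using (var dst = new SqlConnection(targetConnStr))
    {
        dst.Open();

        // Step 1: land the raw source rows in the staging table.
        using (var bulk = new SqlBulkCopy(dst))
        {
            bulk.DestinationTableName = "dbo.Order_Staging";
            bulk.BulkCopyTimeout = 0;
            bulk.WriteToServer(sourceReader);
        }

        // Step 2: let the server reconcile staging vs. target in one set-based pass.
        using (var reconcile = new SqlCommand("dbo.usp_ReconcileOrders", dst))
        {
            reconcile.CommandType = CommandType.StoredProcedure;
            reconcile.CommandTimeout = 0;
            reconcile.ExecuteNonQuery();
        }
    }
}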
With EF Core: is there a way to tell it that the new value of the single field I am updating is the same for all entities, and make it update the entities with one (or a few) SQL statements?
I want to set a timestamp on a lot of records at the same time.
I currently have this code:
var downloadTimeStamp = DateTime.UtcNow;
foreach (var e in entities)
{
    e.DownloadDateTime = downloadTimeStamp;
}
await db.SaveChangesAsync();
It generates SQL that executes one UPDATE statement for each entity. E.g.:
UPDATE [TableName] SET [DownloadDateTime] = @p1988
WHERE [Id] = @p1989;
SELECT @@ROWCOUNT;
Is that really the most optimal way to do this? Is it possible to make EF Core do this in fewer SQL statements?
In my current dataset there are 100,000 records that need to be updated. It is not the full contents of the table, only a subset. It takes about 90 seconds to do the above updates when running the code inside Visual Studio against SQL Server Express on my local PC. When running in a small Azure web app against a small Azure SQL instance, it takes several minutes. That's why I'm looking to optimize it, if possible.
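If EF Core 7 or later is an option, ExecuteUpdateAsync can push this to the server as a single UPDATE; on older versions a raw SQL statement does the same thing. This is only a sketch: db.MyEntities and the filters below are placeholders for whatever selects the 100,000-row subset.

// needs using Microsoft.EntityFrameworkCore;  (EF Core 7+ for ExecuteUpdateAsync)
var downloadTimeStamp = DateTime.UtcNow;

await db.MyEntities
    .Where(e => e.DownloadDateTime == default)   // placeholder filter, adjust to your subset
    .ExecuteUpdateAsync(s => s.SetProperty(e => e.DownloadDateTime, downloadTimeStamp));

// On older EF Core versions, the equivalent single statement via raw SQL
// (the WHERE clause here is a placeholder as well):
await db.Database.ExecuteSqlInterpolatedAsync(
    $"UPDATE [TableName] SET [DownloadDateTime] = {downloadTimeStamp} WHERE [DownloadDateTime] IS NULL");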
I'm looking for an efficient way of inserting records into SQL Server for my C#/MVC application. Does anyone know what the best method would be?
Normally I've just done a while loop and insert statement within, but then again I've not had quite so many records to deal with. I need to insert around half a million, and at 300 rows a minute with the while loop, I'll be here all day!
What I'm doing is looping through a large holding table and using its rows to create records in a different table. I've set up some functions to look up data needed for the new table, and this is no doubt adding to the drain.
So here is the query I have. Extremely inefficient for large amounts of data!
Declare @HoldingID int
Set @HoldingID = (Select min(HoldingID) From HoldingTable)

While @HoldingID IS NOT NULL
Begin
    Insert Into Journeys (DepartureID, ArrivalID, ProviderID, JourneyNumber, Active)
    Select
        dbo.GetHubIDFromName(StartHubName),
        dbo.GetHubIDFromName(EndHubName),
        dbo.GetBusIDFromName(CompanyName),
        JourneyNo, 1
    From Holding
    Where HoldingID = @HoldingID

    Set @HoldingID = (Select MIN(HoldingID) From Holding Where HoldingID > @HoldingID)
End
I've heard about set-based approaches - is there anything that might work for the above problem?
If you want to insert a lot of data into MS SQL Server then you should use bulk inserts - there is a command-line tool called bcp for this, and also a C# wrapper for performing bulk copy operations (SqlBulkCopy), but under the covers they are all using BULK INSERT.
Depending on your application you may want to insert your data into a staging table first, and then either MERGE or INSERT INTO SELECT... to transfer those rows from the staging table(s) to the target table(s). If you have a lot of data then this will take some time, however it will be a lot quicker than performing the inserts individually.
If you want to speed this up then there are various things you can do, such as changing the recovery model or tweaking/removing triggers and indexes (depending on whether or not this is a live database). If it's still really slow then you should look into doing this process in batches (e.g. 1000 rows at a time).
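As a sketch of the staging-plus-MERGE step mentioned above, run from C#, where the question's Holding table plays the role of the staging table: the join key (JourneyNumber = JourneyNo) and the WHEN MATCHED action are assumptions, while the table, column, and function names come from the question.

const string mergeSql = @"
MERGE dbo.Journeys AS target
USING (SELECT dbo.GetHubIDFromName(StartHubName) AS DepartureID,
              dbo.GetHubIDFromName(EndHubName)   AS ArrivalID,
              dbo.GetBusIDFromName(CompanyName)  AS ProviderID,
              JourneyNo
       FROM dbo.Holding) AS src
   ON target.JourneyNumber = src.JourneyNo
WHEN MATCHED THEN
    UPDATE SET target.Active = 1
WHEN NOT MATCHED BY TARGET THEN
    INSERT (DepartureID, ArrivalID, ProviderID, JourneyNumber, Active)
    VALUES (src.DepartureID, src.ArrivalID, src.ProviderID, src.JourneyNo, 1);";

using (var conn = new SqlConnection(connStr))
using (var cmd = new SqlCommand(mergeSql, conn) { CommandTimeout = 0 })
{
    conn.Open();
    cmd.ExecuteNonQuery();   // one set-based pass instead of a row-by-row loop
}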
This should do exactly what you are doing now, but as a single set-based statement:
Insert Into Journeys(DepartureID, ArrivalID, ProviderID, JourneyNumber, Active)
Select
dbo.GetHubIDFromName(StartHubName),
dbo.GetHubIDFromName(EndHubName),
dbo.GetBusIDFromName(CompanyName),
JourneyNo, 1
From Holding
ORDER BY HoldingID ASC
You are (probably) able to do it in one statement of the form:
INSERT INTO JOURNEYS
SELECT * FROM HOLDING;
Without more information about your schema it is difficult to be absolutely sure.
SQL Server 2008 introduced table-valued parameters. These allow you to insert multiple rows in a single trip to the database (sent as one large blob), without using a temporary table. This article describes how it works (step four in the article):
http://www.altdevblogaday.com/2012/05/16/sql-server-high-performance-inserts/
It differs from bulk inserts in that you do not need special utilities and that all constraints and foreign keys are checked.
I quadrupled my throughput by using this and parallelizing the inserts. I'm now at 15,000 inserts/second sustained into the same table, a regular table with indexes and over a billion rows.
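For reference, a minimal sketch of the table-valued-parameter approach in C#. It assumes a user-defined table type has been created on the server, and every name here (the type, table, and columns) is illustrative rather than taken from the linked article.

// Assumes this type was created on the server once (names are illustrative):
//   CREATE TYPE dbo.JourneyTvp AS TABLE
//       (DepartureID int, ArrivalID int, ProviderID int, JourneyNumber int, Active bit);

var rows = new DataTable();
rows.Columns.Add("DepartureID", typeof(int));
rows.Columns.Add("ArrivalID", typeof(int));
rows.Columns.Add("ProviderID", typeof(int));
rows.Columns.Add("JourneyNumber", typeof(int));
rows.Columns.Add("Active", typeof(bool));
// ... fill rows with the batch to insert ...

using (var conn = new SqlConnection(connStr))
using (var cmd = new SqlCommand(
    "INSERT INTO dbo.Journeys (DepartureID, ArrivalID, ProviderID, JourneyNumber, Active) " +
    "SELECT DepartureID, ArrivalID, ProviderID, JourneyNumber, Active FROM @batch", conn))
{
    var p = cmd.Parameters.AddWithValue("@batch", rows);
    p.SqlDbType = SqlDbType.Structured;
    p.TypeName = "dbo.JourneyTvp";

    conn.Open();
    cmd.ExecuteNonQuery();   // one round trip for the whole batch
}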
I have used the BulkCopy command to transfer rows from one table to another with bulk data, about 3 to 5 million rows. Now I want to update these rows.
Is there any BulkUpdate command similar to the BulkCopy command? I'm using ASP.NET with C#.
No, there isn't.
This might help:
http://itknowledgeexchange.techtarget.com/itanswers/bulk-update-in-sql-server-2005/
Assuming that you have a column with distinct values to show you which rows correspond between the two tables, this can be done with a simple update statement.
UPDATE TableA
SET TableA.A1 = TableB.B1,
TableA.A2 = TableB.B2
FROM TableB
WHERE TableA.A3 = TableB.B3
If you are worried about creating one massive transaction you can batch the operation into smaller chunks. This is done via the TOP keyword.
UPDATE TOP (1000) TableA
SET TableA.A1 = TableB.B1,
TableA.A2 = TableB.B2
FROM TableB
WHERE TableA.A3 = TableB.B3
AND (TableA.A1 <> TableB.B1 OR TableA.A2 <> TableB.B2)
You can put that into a loop...
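One way to drive that loop from C# is to repeat the batched statement until a batch affects zero rows. The SQL follows the answer above; the batch size and the connection string are arbitrary.

const string batchSql = @"
UPDATE TOP (1000) TableA
SET    TableA.A1 = TableB.B1,
       TableA.A2 = TableB.B2
FROM   TableB
WHERE  TableA.A3 = TableB.B3
  AND (TableA.A1 <> TableB.B1 OR TableA.A2 <> TableB.B2);";

using (var conn = new SqlConnection(connStr))
using (var cmd = new SqlCommand(batchSql, conn) { CommandTimeout = 0 })
{
    conn.Open();
    int affected;
    do
    {
        affected = cmd.ExecuteNonQuery();   // rows updated in this batch
    } while (affected > 0);                 // stop when nothing is left to change
}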
Here's another link (with basically the same solution):
http://www.sqlusa.com/bestpractices2005/hugeupdate/
A common approach here is:
Bulk-load (SqlBulkCopy) into an empty staging table - meaning a table with the same columns/types as the actual data, but not part of the main transactional system.
Now do an update joining the real data to the staging data, to update the values in the real data.
Disclaimer: I'm the owner of the project Bulk Operations
The Bulk Operations library allows you to Insert, Delete, Update and Merge millions of rows in a few seconds.
It's very easy to learn and use if you already know the SqlBulkCopy class.
var bulk = new BulkOperation(connection);
// ... Mappings ....
bulk.BulkUpdate(dt);
I am confused about choosing between two approaches.
Scenario
There are two tables, Table 1 and Table 2. Table 1 contains users' data, for example first name, last name, etc.
Table 2 contains the cars each user has, with their descriptions, i.e. color, registration no., etc.
Now, if I want to get all the information for all users, which approach will complete in the minimum time?
Approach 1.
Query for all rows in Table 1 and store them all in a list, for example.
Then loop through the list and, for each user saved in the first step, query Table 2 to get that user's data.
Approach 2.
Query for all rows in Table 1 and, while saving each row, get all of its values from Table 2 and save them too.
If I think about the processing involved, I suspect it might be the same, because the same number of records has to be processed in both approaches.
If there is any other, better idea, please let me know.
Your two approaches will have about the same performance (slow because of N+1 queries). It would be faster to do a single query like this:
select *
from T1
left join T2 on ...
order by T1.PrimaryKey
Your client app can then interpret the results and have all the data from a single query. An alternative would be:
select *, 1 as Tag
from T1
union all
select *, 2 as Tag
from T2
order by T1.PrimaryKey, Tag
This is just pseudo code, but you could make it work.
The union-all query will have surprisingly good performance because SQL Server will do a "merge union", which works like a merge join. This pattern also works for multi-level parent-child relationships, although not as well.
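As a sketch of how the client side of the single-join approach can look (the class, table, and column names are invented for illustration): read the ordered result once and group the child rows under their parent as you go.

// needs using System.Collections.Generic; using System.Data.SqlClient;
class Car { public string RegistrationNo; public string Color; }
class UserWithCars
{
    public int Id; public string FirstName; public string LastName;
    public List<Car> Cars = new List<Car>();
}

var users = new Dictionary<int, UserWithCars>();

using (var conn = new SqlConnection(connStr))
using (var cmd = new SqlCommand(
    @"SELECT u.Id, u.FirstName, u.LastName, c.RegistrationNo, c.Color
      FROM   Table1 u
      LEFT JOIN Table2 c ON c.UserId = u.Id
      ORDER BY u.Id", conn))
{
    conn.Open();
    using (var r = cmd.ExecuteReader())
    {
        while (r.Read())
        {
            int id = r.GetInt32(0);
            if (!users.TryGetValue(id, out var u))
            {
                u = new UserWithCars { Id = id, FirstName = r.GetString(1), LastName = r.GetString(2) };
                users.Add(id, u);
            }
            if (!r.IsDBNull(3))   // the LEFT JOIN produces NULLs for users without cars
                u.Cars.Add(new Car { RegistrationNo = r.GetString(3), Color = r.GetString(4) });
        }
    }
}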