What is the functionality of __syncTransactions table in SQL Compact database - c#

Does anyone know what the __syncTransactions table is used for? Here's my scenario:
We have a few clients running on SQL Server Compact databases, syncing with a SQL Server 2008 Express server database using Sync Framework v3.5, for about a year now. It seems that for tables with a large number of records (i.e. > 20,000), a sync takes roughly more than a minute and CPU reaches 100% as well.
After enabling sync trace logging, I managed to narrow down the query that was behind the slow syncing. The following is the query that checks for inserts in the client database for one of the tables, which contains ~70,000 records:
select
    ut.*
from
(
    select
        ut0.*
    from
        [tblPermissionGroupResourceRole] as ut0
    where
        (ut0._sysTrackingContext <> 'a4e40127-4083-4b27-88d0-ef3aed4ae343'
         OR ut0._sysTrackingContext IS NULL)
        AND (ut0._sysChangeTxBsn >= 9486853)
        AND (ut0._sysInsertTxBsn NOT IN (SELECT SyncBsn FROM __syncTransactions))
) as ut
LEFT OUTER JOIN
(
    select
        txcs0.*
    from
        _sysTxCommitSequence as txcs0
) as txcs ON (ut._sysInsertTxBsn = txcs._sysTxBsn)
WHERE
    COALESCE(txcs._sysTxCsn, ut._sysInsertTxBsn) > 9486853
    AND COALESCE(txcs._sysTxCsn, ut._sysInsertTxBsn) <= 9487480
The slow line is the NOT IN (SELECT SyncBsn FROM __syncTransactions) predicate, which takes roughly 1 minute to execute. The reason is that the __syncTransactions table contains around 1,200 records, and cross-referencing them against the 70,000-odd records in my tblPermissionGroupResourceRole table makes the query quite slow.
Therefore, I need to understand how the __syncTransactions table is used, so I can try to clear records from it, or find out whether there's any other way to sort out my problem.
Any help is much appreciated.
Kind regards,
Sasanka.

Related

SQL Server request inserting 200 rows in a local database takes 20 seconds and growing

I am working on a console app (C#, ASP.NET Core 2.1, Entity Framework Core) which is connected to a local SQL Server database, the default (localdb)\MSSQLLocalDB (SQL Server 2016 v13.0) provided with Visual Studio.
The problem I am facing is that it takes quite a long time to insert data into a table. The table has 400,000 rows and 6 columns, and I insert them 200 at a time.
Right now, the request takes 20 seconds to execute, and that execution time keeps increasing. Considering that I still have 20,000 x 200 rows to insert, it's worth figuring out where this problem comes from!
A couple of facts:
There is no index on the table
My computer is not new but I have quite good hardware (i7, 16 GB RAM) and I don't hit 100% CPU while inserting
So, my questions are:
Is 400K rows considered to be a 'large' database? I've never worked with a table that big before, but I thought it was common to have a dataset like this.
How can I investigate where the inserting time comes from? I have only Visual Studio installed so far (but I am open to other options).
Here is the SQL code of the table in question:
CREATE TABLE [dbo].[KfStatDatas]
(
[Id] INT IDENTITY (1, 1) NOT NULL,
[DistrictId] INT NOT NULL,
[StatId] INT NOT NULL,
[DataSourceId] INT NOT NULL,
[Value] NVARCHAR(300) NULL,
[SnapshotDate] DATETIME2(7) NOT NULL
);
EDIT
I ran SQL Server Management Studio and found the request that is slowing down the whole process: it is the insertion request.
But, looking at the SQL request created by Entity Framework, it looks like it's doing an inner join and going through the whole table, which would explain why the processing time increases with the size of the table.
I may be missing a point, but why would you need to enumerate the whole table to add rows?
Raw request being executed:
SELECT [t].[Id]
FROM [KfStatDatas] t
INNER JOIN #inserted0 i ON ([t].[Id] = [i].[Id])
ORDER BY [i].[_Position]
EDIT and SOLUTION
I eventually found the issue, and it was a stupid mistake: my Id field was not declared as a primary key! So the system had to go through the whole DB for every inserted row. I added the PK and it now takes... 100 ms for 200 rows, and this duration is stable.
Thanks for your time!
I think you may simply be missing a primary key. You've declared to EF that Id is the entity key, but you don't have a unique index on the table to enforce that.
And when EF wants to fetch the inserted IDs without an index, it's expensive. So this query
SELECT t.id from KfStatDatas t
inner join #inserted0 i
on t.id = i.id
order by i._Position
performs 38K logical reads and takes 16 seconds on average.
So try:
ALTER TABLE [dbo].[KfStatDatas]
ADD CONSTRAINT PK_KfStatDatas
PRIMARY KEY (id)
BTW are you sure this is EF6? This looks more like EF Core batch insert.
No, 400K rows is not large.
The most efficient way to insert a large number of rows from .NET is with SqlBulkCopy. This should take seconds rather than minutes for 400K rows.
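For reference, here is a minimal sketch of the SqlBulkCopy approach; the table and column names come from the question's CREATE TABLE, while the connection string and batch size are placeholders to adapt:

using System.Data;
using Microsoft.Data.SqlClient; // or System.Data.SqlClient

// Bulk-copy a pre-filled DataTable into dbo.KfStatDatas in one streaming
// operation. The column names mirror the CREATE TABLE above; the Id identity
// column is generated by the server, so it is not mapped here.
static void BulkInsert(DataTable rows, string connectionString)
{
    using var connection = new SqlConnection(connectionString);
    connection.Open();

    using var bulkCopy = new SqlBulkCopy(connection)
    {
        DestinationTableName = "dbo.KfStatDatas",
        BatchSize = 10_000 // rows per round trip; tune to taste
    };

    bulkCopy.ColumnMappings.Add("DistrictId", "DistrictId");
    bulkCopy.ColumnMappings.Add("StatId", "StatId");
    bulkCopy.ColumnMappings.Add("DataSourceId", "DataSourceId");
    bulkCopy.ColumnMappings.Add("Value", "Value");
    bulkCopy.ColumnMappings.Add("SnapshotDate", "SnapshotDate");

    bulkCopy.WriteToServer(rows);
}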
With batching individual inserts, execute the entire batch in a single transaction to improve throughput. Otherwise, each insert is committed individually, requiring a synchronous flush of the log buffer to disk for each insert to harden the transaction.
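And a minimal sketch of that batched-insert variant, committing each batch in a single transaction; the table and column names follow the question, while the row tuple shape and parameter names are illustrative:

using System;
using System.Collections.Generic;
using System.Data;
using Microsoft.Data.SqlClient; // or System.Data.SqlClient

// Commit a whole batch of inserts in one transaction so the log buffer is
// flushed once per batch instead of once per row.
static void InsertBatch(
    IEnumerable<(int DistrictId, int StatId, int DataSourceId, string Value, DateTime SnapshotDate)> batch,
    string connectionString)
{
    using var connection = new SqlConnection(connectionString);
    connection.Open();
    using var transaction = connection.BeginTransaction();

    using var cmd = new SqlCommand(
        @"INSERT INTO dbo.KfStatDatas (DistrictId, StatId, DataSourceId, Value, SnapshotDate)
          VALUES (@d, @s, @ds, @v, @t)", connection, transaction);
    cmd.Parameters.Add("@d", SqlDbType.Int);
    cmd.Parameters.Add("@s", SqlDbType.Int);
    cmd.Parameters.Add("@ds", SqlDbType.Int);
    cmd.Parameters.Add("@v", SqlDbType.NVarChar, 300);
    cmd.Parameters.Add("@t", SqlDbType.DateTime2);

    foreach (var row in batch)
    {
        cmd.Parameters["@d"].Value = row.DistrictId;
        cmd.Parameters["@s"].Value = row.StatId;
        cmd.Parameters["@ds"].Value = row.DataSourceId;
        cmd.Parameters["@v"].Value = (object)row.Value ?? DBNull.Value;
        cmd.Parameters["@t"].Value = row.SnapshotDate;
        cmd.ExecuteNonQuery();
    }

    transaction.Commit(); // one log flush for the whole batch
}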
EDIT:
I see from your comment that you are using Entity Framework. This answer may help you use SqlBulkCopy with EF.

How to insert millions of data of different RDBMS in to SQL Server database with insert statement?

I have two databases in my SQL Server, with each database containing a single table as of now.
I have 2 databases like below:
1) Db1 (MySQL)
2) Db2 (Oracle)
Now what I want to do is fill my SQL Server Db1 table with data from the MySQL Db1, like below:
Insert into Table1 select * from Table1
Select * from Table1 (MySQL Db1) - data coming from the MySQL database
Insert into Table1 (SQL Server Db1) - insert the data coming from the MySQL database, considering the same schema
I don't want to use SqlBulkCopy, as I don't want to insert the data chunk by chunk. I want to insert all the data in one go, considering there are millions of records, because my operation is not limited to just inserting the records into the database. Otherwise the user has to sit and wait a long time, first while millions of records are inserted chunk by chunk into the database, and then again for my further operation, which is also a long-running operation.
So if I can speed this process up, then my second operation will also be faster, considering all the records will be in my one local SQL Server instance.
Is this possible to achieve in a C# application?
Update: I researched linked servers. As @Gordon Linoff suggested, a linked server can be used to achieve this scenario, but based on my research it seems I cannot create a linked server through code.
I want to do this with the help of ADO.NET.
This is what I am trying to do exactly:
Consider that I have 2 different client RDBMSs, with 2 databases and some tables, on the client premises.
So the databases are like this:
SQL Server:
Db1
Order
Id Amount
1 100
2 200
3 300
4 400
MySQL or Oracle:
Db1:
Order
Id Amount
1 1000
2 2000
3 3000
4 400
Now I want to compare the Amount column from the source (SQL Server) to the destination database (MySQL or Oracle).
I will need to join the tables of these 2 different RDBMS databases to compare the Amount columns.
In C#, what I can do is fetch the records chunk by chunk into a DataTable (in memory) and then compare the records in code, but this will take a lot of time considering there are millions of records.
So I want to do something better than this.
Hence I was thinking of bringing the records from these 2 RDBMSs into 2 databases on my local SQL Server instance, and then creating a join query over these 2 tables based on Id, taking advantage of the DBMS's processing capability, which can compare these millions of records efficiently.
A query like this compares millions of records efficiently:
select SqlServer.Id, Mysql.Id, SqlServer.Amount, Mysql.Amount from SqlServerDb.dbo.[Order] as SqlServer
Left join MysqlDb.dbo.[Order] as Mysql on SqlServer.Id = Mysql.Id
where SqlServer.Amount != Mysql.Amount
The above query works when I have the data from these 2 different RDBMSs in my local server instance, in the databases SqlServerDb and MysqlDb, and it will fetch the records below, whose amounts do not match.
So I am trying to get those records from the source (SQL Server Db) whose Amount column value does not match the MySQL value.
Expected Output:
Id Amount
1 1000
2 2000
3 3000
So, is there any way to achieve this scenario?
On the SELECT side, create a .csv file (tab-delimited) using SELECT ... INTO OUTFILE ...
On the INSERT side, use LOAD DATA INFILE ... (or whatever the target machine's syntax is).
Doing it all at once may be easier to code than chunking, and may (or may not) run faster.
SqlBulkCopy can accept either a DataTable or a System.Data.IDataReader as its input.
Using your query to read the source DB, set up an ADO.NET DataReader on the source MySQL or Oracle DB and pass the reader to the WriteToServer() method of the SqlBulkCopy.
This can copy almost any number of rows without limit. I have copied hundreds of millions of rows using the data reader approach.
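A rough sketch of that data-reader approach, assuming the MySql.Data ADO.NET provider on the source side; the connection strings and the three-part destination name are placeholders:

using Microsoft.Data.SqlClient;   // or System.Data.SqlClient
using MySql.Data.MySqlClient;     // MySQL ADO.NET provider (assumed)

// Stream rows from the MySQL source straight into the SQL Server target
// without materializing them all in memory first.
static void CopyOrders(string mySqlConnStr, string sqlServerConnStr)
{
    using var source = new MySqlConnection(mySqlConnStr);
    source.Open();

    using var cmd = new MySqlCommand("SELECT Id, Amount FROM `Order`", source);
    using var reader = cmd.ExecuteReader();

    using var bulkCopy = new SqlBulkCopy(sqlServerConnStr)
    {
        DestinationTableName = "MysqlDb.dbo.[Order]", // local copy of the MySQL table
        BulkCopyTimeout = 0,                          // no timeout for very large copies
        BatchSize = 50_000
    };

    bulkCopy.WriteToServer(reader); // streams until the reader is exhausted
}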
What about adding a changed date in the remote database?
Then you could get all rows that have changed since the last sync and compare just those.
First of all, do not use a linked server. It is tempting, but it will bring more trouble than benefit to the table; for example, updates and inserts will pull all of the target DB's data over to the source DB, do the insert/update there, and then post all the data back to the target.
As far as I understand, you are trying to copy changed data to the target system for further processing.
I recommend using a timestamp column on the source table. When anything changes in the source table, the timestamp column is updated by SQL Server.
On the target, get the max ID and the max timestamp; two queries at most.
On the source, the rows where source.ID <= target.MaxID AND source.timestamp >= target.MaxTimestamp are the rows that changed after the last sync (they need an update), and the rows where source.ID > target.MaxID are the rows that were inserted after the last sync.
Now you do not have to compare two worlds; you just got all the updates and inserts.
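A minimal sketch of that incremental approach from C#; both sides are shown with SqlClient for brevity (for a MySQL or Oracle source you would use that provider's ADO.NET classes instead), and the Order table plus a LastModified datetime column maintained on the source are assumptions for illustration:

using System;
using Microsoft.Data.SqlClient; // or System.Data.SqlClient

// 1) Read the high-water marks (max Id, max timestamp) from the target,
// 2) pull only the new and changed rows from the source.
static void SyncDelta(string sourceConnStr, string targetConnStr)
{
    int maxId;
    DateTime maxStamp;

    using (var target = new SqlConnection(targetConnStr))
    {
        target.Open();
        using var marks = new SqlCommand(
            "SELECT ISNULL(MAX(Id), 0), ISNULL(MAX(LastModified), '19000101') FROM dbo.[Order]",
            target);
        using var r = marks.ExecuteReader();
        r.Read();
        maxId = r.GetInt32(0);
        maxStamp = r.GetDateTime(1);
    }

    using var source = new SqlConnection(sourceConnStr);
    source.Open();
    using var delta = new SqlCommand(
        @"SELECT Id, Amount, LastModified
          FROM dbo.[Order]
          WHERE Id > @maxId                                  -- inserted since last sync
             OR (Id <= @maxId AND LastModified >= @maxStamp) -- changed since last sync
          ORDER BY Id", source);
    delta.Parameters.AddWithValue("@maxId", maxId);
    delta.Parameters.AddWithValue("@maxStamp", maxStamp);

    using var rows = delta.ExecuteReader();
    // ... apply the returned rows to the target, e.g. via SqlBulkCopy into a staging table
}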
You need to create a linked server connection using ODBC and the proper driver; after that you can execute the queries using OPENQUERY.
Take a look at openquery:
https://msdn.microsoft.com/en-us/library/ms188427(v=sql.120).aspx
Yes, SQL Server is very efficient when it's working with sets so let's keep that in play.
In a nutshell, what I'm pitching is
Load the data from the source into a staging table on the target database (staging table = a table that temporarily holds the raw data from the source table, with the same structure as the source table... add tracking columns to taste). This will be done by your C# code: select from source_table into a DataTable, then SqlBulkCopy into the staging table.
Have a stored proc on the target database to reconcile the data between your target table and the staging table. Your C# code calls the stored proc.
Given that you're talking about millions of rows, another thing that can make things faster is dropping the indexes on the staging table before inserting into it, and recreating them after the inserts and before any select is performed.
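A minimal sketch of that pitch; the staging table Staging_Order, the stored procedure usp_ReconcileOrders and the connection string are hypothetical names:

using System.Data;
using Microsoft.Data.SqlClient; // or System.Data.SqlClient

// Push the source rows into a staging table, then let a stored procedure on
// the target database do the set-based reconciliation against the real table.
static void LoadAndReconcile(DataTable sourceRows, string targetConnStr)
{
    using var connection = new SqlConnection(targetConnStr);
    connection.Open();

    // 1) Bulk copy into the staging table (hypothetical name).
    using (var bulkCopy = new SqlBulkCopy(connection)
    {
        DestinationTableName = "dbo.Staging_Order"
    })
    {
        bulkCopy.WriteToServer(sourceRows);
    }

    // 2) Reconcile staging vs. target inside the database (hypothetical proc).
    using var reconcile = new SqlCommand("dbo.usp_ReconcileOrders", connection)
    {
        CommandType = CommandType.StoredProcedure,
        CommandTimeout = 0 // the reconciliation may run for a while
    };
    reconcile.ExecuteNonQuery();
}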

Does anyone know of a way to paginate a call to GetSchema from C#?

I'm using the ADO.NET provider function "GetSchema" to fetch metadata out of a SQL Server database (and an Informix system as well), and I want to know if there is any way to paginate the results. I ask because one of the systems has over 3,000 tables (yes, three thousand) and twice that many views, and let's not even talk about the stored procedures.
Needless to say, trying to bring down that list in one shot is too much for the VM I have running (a mere 4 GB of memory). I'm already aware of the restrictions that can be applied; these are all tables in the "dbo" schema, so there isn't much else that I'm aware of for limiting the result set before it gets to my client.
Instead of using GetSchema, I suggest using the more flexible INFORMATION_SCHEMA system views. These views already divide the information about the Tables, StoredProcedures and Views, and you can write a specific query to retrieve your data in a paginated way.
For example, to retrieve the first 100 table names you could write a query like this:
SELECT *
FROM ( SELECT ROW_NUMBER() OVER ( ORDER BY TABLE_NAME) AS RowNum, *
FROM INFORMATION_SCHEMA.TABLES
) AS TableWithRowNum
WHERE RowNum >= 1
AND RowNum <= 100
ORDER BY RowNum
Subsequent pages can easily be prepared by changing the min and max values used by the query.
The same approach can be applied to stored procedures (using INFORMATION_SCHEMA.ROUTINES WHERE ROUTINE_TYPE = 'PROCEDURE') or to views (using INFORMATION_SCHEMA.VIEWS).
Note: if you are using SQL Server 2012 or later, the first query could be rewritten to use this syntax:
SELECT *
FROM INFORMATION_SCHEMA.TABLES
ORDER BY TABLE_NAME
OFFSET 0 ROWS FETCH NEXT 100 ROWS ONLY
And the C# code could also use parameters for the offset (0) and fetch count (100) values, as sketched below.
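For illustration, a minimal C# sketch of that parameterized paging query; the connection string is a placeholder, and the 'dbo' filter follows the question:

using System.Collections.Generic;
using Microsoft.Data.SqlClient; // or System.Data.SqlClient

// Fetch one page of table names via INFORMATION_SCHEMA instead of GetSchema.
// offset and pageSize drive the paging.
static List<string> GetTablePage(string connectionString, int offset, int pageSize)
{
    const string sql = @"
        SELECT TABLE_SCHEMA, TABLE_NAME
        FROM INFORMATION_SCHEMA.TABLES
        WHERE TABLE_SCHEMA = 'dbo'
        ORDER BY TABLE_NAME
        OFFSET @offset ROWS FETCH NEXT @pageSize ROWS ONLY";

    var tables = new List<string>();
    using var connection = new SqlConnection(connectionString);
    connection.Open();

    using var cmd = new SqlCommand(sql, connection);
    cmd.Parameters.AddWithValue("@offset", offset);
    cmd.Parameters.AddWithValue("@pageSize", pageSize);

    using var reader = cmd.ExecuteReader();
    while (reader.Read())
        tables.Add(reader.GetString(1)); // TABLE_NAME

    return tables;
}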

Efficient insert statement

I'm looking for an efficient way of inserting records into SQL server for my C#/MVC application. Anyone know what the best method would be?
Normally I've just done a while loop with an insert statement inside, but then again I've not had quite so many records to deal with. I need to insert around half a million, and at 300 rows a minute with the while loop, I'll be here all day!
What I'm doing is looping through a large holding table and using its rows to create records in a different table. I've set up some functions for looking up data that is necessary for the new table, and this is no doubt adding to the drain.
So here is the query I have. Extremely inefficient for large amounts of data!
Declare @HoldingID int
Set @HoldingID = (Select min(HoldingID) From Holding)
While @HoldingID IS NOT NULL
Begin
    Insert Into Journeys (DepartureID, ArrivalID, ProviderID, JourneyNumber, Active)
    Select
        dbo.GetHubIDFromName(StartHubName),
        dbo.GetHubIDFromName(EndHubName),
        dbo.GetBusIDFromName(CompanyName),
        JourneyNo, 1
    From Holding
    Where HoldingID = @HoldingID
    Set @HoldingID = (Select MIN(HoldingID) From Holding Where HoldingID > @HoldingID)
End
I've heard about set-based approaches - is there anything that might work for the above problem?
If you want to insert a lot of data into a MSSQL Server then you should use BULK INSERTs - there is a command-line tool called the bcp utility for this, and also a C# wrapper for performing bulk copy operations, but under the covers they are all using BULK INSERT.
Depending on your application you may want to insert your data into a staging table first, and then either MERGE or INSERT INTO ... SELECT to transfer those rows from the staging table(s) to the target table(s) - if you have a lot of data then this will take some time, however it will be a lot quicker than performing the inserts individually.
If you want to speed this up then there are various things that you can do, such as changing the recovery model or tweaking/removing triggers and indexes (depending on whether or not this is a live database). If it's still really slow then you should look into doing this process in batches (e.g. 1000 rows at a time).
This should be exactly what you are doing now, but as a single set-based statement:
Insert Into Journeys(DepartureID, ArrivalID, ProviderID, JourneyNumber, Active)
Select
dbo.GetHubIDFromName(StartHubName),
dbo.GetHubIDFromName(EndHubName),
dbo.GetBusIDFromName(CompanyName),
JourneyNo, 1
From Holding
ORDER BY HoldingID ASC
You are (probably) able to do it in one statement of the form:
INSERT INTO JOURNEYS
SELECT * FROM HOLDING;
Without more information about your schema it is difficult to be absolutely sure.
SQL Server 2008 introduced table-valued parameters. These allow you to insert multiple rows in a single trip to the database (sent as one large blob), without using a temporary table. This article describes how it works (step four in the article):
http://www.altdevblogaday.com/2012/05/16/sql-server-high-performance-inserts/
It differs from bulk inserts in that you do not need special utilities and that all constraints and foreign keys are checked.
I quadrupled my throughput using this and by parallelizing the inserts. Now at 15,000 inserts/second into the same table, sustained - a regular table with indexes and over a billion rows.
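A minimal sketch of the table-valued parameter approach; the table type dbo.JourneyType and the stored procedure dbo.usp_InsertJourneys are hypothetical names and would have to be created in T-SQL first:

using System.Data;
using Microsoft.Data.SqlClient; // or System.Data.SqlClient

// Send many rows to the server in one round trip via a table-valued parameter.
// Assumes a user-defined table type dbo.JourneyType and a stored procedure
// dbo.usp_InsertJourneys (@rows dbo.JourneyType READONLY) already exist.
static void InsertJourneys(DataTable journeys, string connectionString)
{
    using var connection = new SqlConnection(connectionString);
    connection.Open();

    using var cmd = new SqlCommand("dbo.usp_InsertJourneys", connection)
    {
        CommandType = CommandType.StoredProcedure
    };

    var tvp = cmd.Parameters.AddWithValue("@rows", journeys);
    tvp.SqlDbType = SqlDbType.Structured; // mark the parameter as a TVP
    tvp.TypeName = "dbo.JourneyType";     // the user-defined table type

    cmd.ExecuteNonQuery();
}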

SELECT with "datetime > string" performance issue in EF4 / SQL Server 2008

I am using EntityFramework 4 to access a SQL Server 2008 database.
One of the SQL queries that the EF generates is having a behavior that I cannot explain.
The query is like this:
SELECT tableA.field1, tableA.field2, ...
FROM tableA join tableB on tableA.field1 = tableB.field1
WHERE
tableA.field2 > '20110825'
and tableA.field3 in ('a', 'b', 'c')
and tableB.field4 = 'xxx'
Where tableA.field2 is datetime not null, and the other fields are varchars.
tableA contains circa 1.5 million records, tableB contains circa 2 million records, and the query returns 1877 rows.
The problem is, it returns them in 86 seconds, and that time changes dramatically when I change the '20110825' literal to older values.
For instance if I put '20110725' the query returns 3483 rows in 35 milliseconds.
I found out in the execution plan that the difference between the two lies in the indexes SQL Server chooses to use depending on the date used to compare.
When it is taking time, the execution plan shows:
50%: index seek on tableA.field2 (it's a clustered index on this field alone)
50%: index seek on tableB.field1 (non-unique, non-clustered index on this field alone)
0%: join
When it is almost instantaneous, the execution plan shows:
98%: index seek on tableA.field1 (non-unique, non-clustered index on this field alone)
2%: index seek on tableB.field1 (non-unique, non-clustered index on this field alone)
0%: join
So it seems to me that the decision of the optimizer to use the clustered index on tableA.field2 is not optimal.
Is there a flaw in the database design? In the SQL query?
Can I force in any way the database to use the correct execution plan?
Given that you are using literal values and are only encountering the issue with recent date strings, I would suspect you are hitting the issue described here and need to schedule a job to update your statistics.
Presumably, when the statistics were last updated, there were few or no rows meeting the '20110825' criteria, and SQL Server is using a join strategy predicated on that assumption.
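A minimal sketch of what such a scheduled job could execute; tableA is the placeholder name from the question, and FULLSCAN is an optional assumption to tune:

using Microsoft.Data.SqlClient; // or System.Data.SqlClient

// Refresh statistics on the table whose date column keeps moving forward, so
// the optimizer sees the recent date range. FULLSCAN is optional; sampled
// statistics are often sufficient.
static void RefreshStatistics(string connectionString)
{
    using var connection = new SqlConnection(connectionString);
    connection.Open();

    using var cmd = new SqlCommand("UPDATE STATISTICS dbo.tableA WITH FULLSCAN;", connection)
    {
        CommandTimeout = 0 // a full scan over 1.5M rows can take a while
    };
    cmd.ExecuteNonQuery();
}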
