I need to update a set of alarm rows with LINQ to SQL, and the table can contain over 100,000 rows.
This means that a simple update such as:
foreach (var alarm in Alarms)
{
alarm.Alarm_Ack_UTC = DateTime.UtcNow;
}
SubmitChanges();
gives me a SQL query of
SELECT [t0].[Alarm_ID], [t0].[Alarm_Application_Number], [t0].[Alarm_Ack_UTC], [t0].[Alarm_DateTime_UTC], [t0].[Alarm_Message_Number], [t0].[Username], [t0].[Runtime_Message], [t0].[Alarm_Application_Name], [t0].[Alarm_Application_Computer], [t0].[Alarm_GUID], [t0].[Alarm_Comments]
FROM [Alarms] AS [t0]
GO
-- Region Parameters
DECLARE @p0 Int = 1
DECLARE @p1 DateTime = '2012-03-16 11:56:25.850'
-- EndRegion
UPDATE [Alarms]
SET [Alarm_Ack_UTC] = @p1
WHERE [Alarm_ID] = @p0
GO
-- Region Parameters
DECLARE @p0 Int = 2
DECLARE @p1 DateTime = '2012-03-16 11:56:25.851'
-- EndRegion
UPDATE [Alarms]
SET [Alarm_Ack_UTC] = @p1
WHERE [Alarm_ID] = @p0
GO
-- Region Parameters
DECLARE @p0 Int = 3
DECLARE @p1 DateTime = '2012-03-16 11:56:25.851'
-- EndRegion
UPDATE [Alarms]
SET [Alarm_Ack_UTC] = @p1
WHERE [Alarm_ID] = @p0
GO
This is repeated 100,000 times, which is really slow, inefficient and unoptimized.
The real query is more advanced, updates more data, uses a .Where(a => a.Time != null) and other things.
But just to improve the query above: it could be replaced with this very efficient SQL query:
UPDATE [Alarms]
SET Alarm_Ack_UTC = GETUTCDATE()
GO
How can one achieve this with Linq to SQL? Or is it impossible?
You can't do this with LINQ to SQL (or any other O/RM). They will always fetch from the database each object you want to change and issue a single UPDATE statement per entity, so if you change 10,000 entities you will get at least 10,000 UPDATE statements.
If this is too slow, switch to a stored procedure or a hand-written SQL statement.
I'd opt for writing a stored procedure.
You can then map this stored procedure in your LINQ to SQL designer by dragging it onto the design surface. It will then appear as a method on your DataContext and will execute far more efficiently.
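For illustration, here is a minimal sketch of such a procedure; the procedure name and the WHERE filter are assumptions for this example, not something from the original question:
CREATE PROCEDURE dbo.AcknowledgeAlarms
AS
BEGIN
    SET NOCOUNT ON;

    -- Acknowledge every alarm that has not been acknowledged yet,
    -- in one set-based statement instead of one UPDATE per row.
    UPDATE [Alarms]
    SET Alarm_Ack_UTC = GETUTCDATE()
    WHERE Alarm_Ack_UTC IS NULL;
END
Once dragged onto the designer it can be called as a normal method on the DataContext (e.g. db.AcknowledgeAlarms()).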
If you need to run specific, optimized SQL like this via LINQ to SQL, you will need to use ExecuteCommand (ExecuteQuery is for statements that return rows).
So using your example you could do:
db.ExecuteCommand("UPDATE [Alarms] SET Alarm_Ack_UTC = GETUTCDATE()");
If you want a more optimized way of updating multiple rows with different values, then you would need to think about using SqlBulkCopy, which is SQL Server specific but isn't LINQ to SQL.
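As a rough sketch of that approach (the staging table and column names here are assumptions): bulk-copy the new values into a staging table with SqlBulkCopy, then apply them with one set-based UPDATE:
-- Staging table filled by SqlBulkCopy from the client.
CREATE TABLE #AlarmUpdates (Alarm_ID int PRIMARY KEY, Alarm_Ack_UTC datetime);

-- One UPDATE joining the target to the staged values,
-- instead of 100,000 individual statements.
UPDATE a
SET a.Alarm_Ack_UTC = s.Alarm_Ack_UTC
FROM [Alarms] AS a
JOIN #AlarmUpdates AS s ON s.Alarm_ID = a.Alarm_ID;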
Related
I am facing a peculiar issue with loading a list of tables from a specific database (or rather a group of databases) while attached to the master database. Currently my query loads all of the databases on the server, then loops through those databases sending information back to the client via RAISERROR. As this loop is executing I need a nested loop to load all of the tables of the current database, for later transmission as a SELECT once the query has completed. The issue I'm running into is that this has to be executed as a single query inside C# code. Ideally I would like to load everything in SQL and return it to the client for processing. For example:
WHILE (@dbLoop < @dbCount) BEGIN
-- Do cool things and send details back to client.
SET @dbName = (SELECT _name FROM dbTemp WHERE _id = @dbLoop);
-- USE [@dbName]
-- Get a count of the tables from info schema on the newly specified database.
WHILE (@tableLoop < @tableCount) BEGIN
-- USE [@dbName]
-- Do super cool things and load tables from info schema.
SET @tableLoop += 1;
END
SET @dbLoop += 1;
END
-- Return the list of tables from all databases to the client for use with SqlDataAdapter.
SELECT * FROM tableTemp;
This topic is pretty straightforward; I just need a way to access tables in a specified database (preferably by name) without having to change the connection on the SqlConnection object, and without having a loop inside my C# code running the same query against each database from the C# side. It would be more efficient to load everything in SQL and send it back to the application. Any help that can be provided on this would be great!
Thanks,
Jamie
All the tables are in the metadata; you can just query that and join it to your list of the schemas you want to look at.
SELECT tab.name
FROM sys.tables AS tab
JOIN sys.schemas AS sch on tab.schema_id = sch.schema_id
JOIN dbTemp temp on sch.name = temp.[_name]
This returns a list of the tables to send back as a result set.
The statement USE [@dbName] takes effect only AFTER it is run (usually via the GO statement).
USE [@dbName]
GO
The above 2 lines would make you start using the new database. You cannot use this in the middle of your SQL or stored procedure.
One other option is to use dot notation, i.e. the dbname..tablename syntax, to query your tables.
double dot notation post
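As a rough illustration of that dot notation (the database and table names below are placeholders): you can reference another database's objects without changing the connection or issuing USE:
-- Three-part name: database.schema.table (the '..' form defaults to the dbo schema).
SELECT TABLE_NAME, TABLE_TYPE
FROM SomeOtherDb.INFORMATION_SCHEMA.TABLES;

SELECT TOP (10) *
FROM SomeOtherDb..SomeTable;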
Okay, after spending all day working on this, I have finally come up with a solution. I load all the databases into a table variable, then I begin looping through those databases and send their details back to the client. After the database details have been sent to the client via RAISERROR, I use sp_executesql to execute a sub-query against the current database to get its list of tables, for processing at the end of the primary query. The example below demonstrates the basic structure of this process for others experiencing this issue in the future.
Thank you all once again for your help!
-Jamie
DECLARE @LoopCounter INT = 1, @DatabaseCount INT = 0;
DECLARE @SQL NVARCHAR(MAX), @dbName NVARCHAR(MAX);
DECLARE @Databases TABLE ( _id INT, _name NVARCHAR(MAX) );
DECLARE @Tables TABLE ( _name NVARCHAR(MAX), _type NVARCHAR(15) );

-- Load all user databases into the table variable.
INSERT INTO @Databases
SELECT ROW_NUMBER() OVER(ORDER BY name) AS id, name
FROM sys.databases
WHERE name NOT IN ( 'master', 'tempdb', 'msdb', 'model' );

SET @DatabaseCount = (SELECT COUNT(*) FROM @Databases);

WHILE (@LoopCounter <= @DatabaseCount) BEGIN
    SET @dbName = (SELECT _name FROM @Databases WHERE _id = @LoopCounter);

    -- Query the current database's INFORMATION_SCHEMA via dynamic SQL.
    SET @SQL = 'SELECT TABLE_NAME, TABLE_TYPE
                FROM [' + @dbName + '].INFORMATION_SCHEMA.TABLES';
    INSERT INTO @Tables EXEC sp_executesql @SQL;

    SET @LoopCounter += 1;
END
The development team has finished their application, and as a tester I need to insert 1,000,000 records into its 20 tables for performance testing.
I went through the tables and there are relationships between all of them.
To insert that much dummy data into the tables I would need to understand the application completely in a very short span of time, and I don't have the dummy data at this point either.
Is there any way in SQL Server to insert this much data? Please share your approaches.
Currently I am planning to create the dummy data in Excel, but I am not sure about the relationships between the tables.
I found on Google that SQL Profiler will provide the order of execution, but I am still waiting for access so I can analyze this.
Another thing I found on Google is that a Red Gate tool can be used.
Is there a script or any other solution to perform this task in a simple way?
I am very sorry if this is a common question; this is my first time working on a real-world SQL scenario, but I do have knowledge of SQL.
Why don't you generate those records in SQL Server? Here is a script to generate a table with 1,000,000 rows:
DECLARE @values TABLE (DataValue int, RandValue INT)
;WITH mycte AS
(
SELECT 1 DataValue
UNION all
SELECT DataValue + 1
FROM mycte
WHERE DataValue + 1 <= 1000000
)
INSERT INTO @values(DataValue,RandValue)
SELECT
DataValue,
convert(int, convert (varbinary(4), NEWID(), 1)) AS RandValue
FROM mycte m
OPTION (MAXRECURSION 0)
SELECT
v.DataValue,
v.RandValue,
(SELECT TOP 1 [User_ID] FROM tblUsers ORDER BY NEWID())
FROM @values v
In the @values table you will have a random int value (column RandValue) which can be used to generate values for the other columns. You also have an example of getting a random foreign key.
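As a hedged sketch of how those generated rows might feed a real table (the target table tblOrders and its columns are invented for this example):
-- One dummy row per generated value, with a pseudo-random amount
-- and a random existing user as the foreign key.
INSERT INTO tblOrders (OrderNumber, Amount, [User_ID])
SELECT
    v.DataValue,
    ABS(v.RandValue) % 1000,
    (SELECT TOP 1 [User_ID] FROM tblUsers ORDER BY NEWID())
FROM @values v;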
Below is a simple procedure I wrote to insert millions of dummy records into a table. I know it's not the most efficient one, but it serves the purpose; for a million records it takes around 5 minutes. You need to pass the number of records you want to generate when executing the procedure.
IF EXISTS (SELECT 1 FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[DUMMY_INSERT]') AND type in (N'P', N'PC'))
BEGIN
DROP PROCEDURE DUMMY_INSERT
END
GO
CREATE PROCEDURE DUMMY_INSERT (
@noOfRecords INT
)
AS
BEGIN
DECLARE @count int
SET @count = 1;
WHILE (@count < @noOfRecords)
BEGIN
INSERT INTO [dbo].[LogTable] ([UserId],[UserName],[Priority],[CmdName],[Message],[Success],[StartTime],[EndTime],[RemoteAddress],[TId])
VALUES(1,'user_'+CAST(@count AS VARCHAR(256)),1,'dummy command','dummy message.',0,convert(varchar(50),dateadd(D,Round(RAND() * 1000,1),getdate()),121),convert(varchar(50),dateadd(D,Round(RAND() * 1000,1),getdate()),121),'160.200.45.1',1);
SET @count = @count + 1;
END
END
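Calling it is just a matter of passing the row count, for example:
-- Insert one million dummy log rows.
EXEC DUMMY_INSERT @noOfRecords = 1000000;
Wrapping the WHILE loop in an explicit transaction (BEGIN TRAN ... COMMIT) usually speeds this kind of row-by-row insert up considerably, since each INSERT then no longer commits individually.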
You can use a cursor to repeat data. For example, this simple code:
Declare @SYMBOL nchar(255), --sample V
@SY_ID int --sample V
Declare R2 Cursor
For SELECT [Column1], [Column2]
FROM [TableName]
For Read Only;
Open R2
Fetch Next From R2 INTO @SYMBOL,@SY_ID
While (@@FETCH_STATUS <> -1)
Begin
Insert INTO [TableName] ([Column1], [Column2])
Values (@SYMBOL,@SY_ID)
Fetch Next From R2 INTO @SYMBOL,@SY_ID
End
Close R2
Deallocate R2
/*wait a ... moment*/
SELECT COUNT(*) --check result
FROM [TableName]
This question already has an answer here:
Stored Proc slower from application than Management Studio
(1 answer)
Closed 9 years ago.
This is my dynamic query, used on a search form, which runs in SSMS in roughly 300 to 400 ms:
exec sp_executesql N'set arithabort off;
set transaction isolation level read uncommitted;
With cte as
(Select ROW_NUMBER() OVER
(Order By Case When d.OldInstrumentID IS NULL
THEN d.LastStatusChangedDateTime Else d.RecordingDateTime End
desc) peta_rn,
d.DocumentID
From Documents d
Inner Join Users u on d.UserID = u.UserID
Inner Join IGroupes ig on ig.IGroupID = d.IGroupID
Inner Join ITypes it on it.ITypeID = d.ITypeID
Where 1=1
And (CreatedByAccountID = @0 Or DocumentStatusID = @1 Or DocumentStatusID = @2 )
And (d.JurisdictionID = @3 Or DocumentStatusID = @4 Or DocumentStatusID = @5)
AND ( d.DocumentStatusID = 9 )
)
Select d.DocumentID, d.IsReEfiled, d.IGroupID, d.ITypeID, d.RecordingDateTime,
d.CreatedByAccountID, d.JurisdictionID,
Case When d.OldInstrumentID IS NULL THEN d.LastStatusChangedDateTime
Else d.RecordingDateTime End as LastStatusChangedDateTime,
dbo.FnCanChangeDocumentStatus(d.DocumentStatusID,d.DocumentID) as CanChangeStatus,
d.IDate, d.InstrumentID, d.DocumentStatusID,ig.Abbreviation as IGroupAbbreviation,
u.Username, j.JDAbbreviation, inf.DocumentName,
it.Abbreviation as ITypeAbbreviation, d.DocumentDate,
ds.Abbreviation as DocumentStatusAbbreviation,
Upper(dbo.GetFlatDocumentName(d.DocumentID)) as FlatDocumentName
From Documents d
Left Join IGroupes ig On d.IGroupID = ig.IGroupID
Left Join ITypes it On d.ITypeID = it.ITypeID
Left Join Users u On u.UserID = d.UserID
Left Join DocumentStatuses ds On d.DocumentStatusID = ds.DocumentStatusID
Left Join InstrumentFiles inf On d.DocumentID = inf.DocumentID
Left Join Jurisdictions j on j.JurisdictionID = d.JurisdictionID
Inner Join cte on cte.DocumentID = d.DocumentID
Where 1=1
And peta_rn>=@6 AND peta_rn<=@7
Order by peta_rn',
N'@0 int,@1 int,@2 int,@3 int,@4 int,@5 int,@6 bigint,@7 bigint',
@0=44,@1=5,@2=9,@3=1,@4=5,@5=9,@6=94200,@7=94250
This SQL is formed in C# code and the WHERE clauses are added dynamically based on the values the user has searched for in the search form. It takes roughly 3 seconds to move from one page to the next. I already have the necessary indexes on most of the columns I search on.
Any idea why my ADO.NET code would be slow?
Update: Not sure if execution plans would help but here they are:
It is possible that SQL Server has created an inappropriate query plan for the ADO.NET connection. We have seen similar issues with ADO; the usual solution is to clear any cached query plans and run the slow query again - this may create a better plan.
The most general way to clear the query plans is to update statistics for the involved tables. For example:
update statistics documents with fullscan
Do the same for the other tables involved and then run your slow query from ADO.NET (without running it in SSMS first).
Note that such timing inconsistencies may hint at bad query or database design - at least for us that is usually the case :)
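If you just want to throw away cached plans without rebuilding statistics, two alternatives exist; both affect other queries as well, so they are best limited to a development box:
-- Mark one table's cached plans for recompilation on next use.
EXEC sp_recompile N'Documents';

-- Or clear the entire plan cache (dev/test only).
DBCC FREEPROCCACHE;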
If you run a query repeatedly in SSMS, the database may re-use a previously created execution plan, and the required data may already be cached in memory.
There are a couple of things I notice in your query:
the CTE joins Users, IGroupes and ITypes, but the joined records are not used in the SELECT
the CTE performs an ORDER BY on a calculated expression (notice the 85% cost in the (unindexed) Sort)
replacing the CASE expression with a persisted computed column, which can be indexed, would probably speed up execution (see the sketch after this list)
note that the ORDER BY is executed on data resulting from joining 4 tables
the WHERE condition of the CTE states AND d.DocumentStatusID = 9, yet it also ANDs conditions on other DocumentStatusID values
paging is performed on the result of 8 joined tables
most likely, creating an intermediate CTE which filters the first CTE on peta_rn would improve performance
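A minimal sketch of that computed-column idea; the column name EffectiveDateTime is invented here, and you should verify that the plan actually uses the new index:
-- Persisted computed column for the CASE expression used in the ORDER BY.
ALTER TABLE Documents
ADD EffectiveDateTime AS
    (CASE WHEN OldInstrumentID IS NULL
          THEN LastStatusChangedDateTime
          ELSE RecordingDateTime END) PERSISTED;

-- Index it so the ROW_NUMBER ... ORDER BY can avoid the expensive Sort.
CREATE NONCLUSTERED INDEX IX_Documents_EffectiveDateTime
    ON Documents (EffectiveDateTime DESC);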
.NET by default uses Unicode strings, which map to NVARCHAR as opposed to VARCHAR.
When you are doing a WHERE ID = @foo in .NET, you are likely to be implicitly doing
WHERE CONVERT(NVARCHAR, ID) = @foo
The result is that this WHERE clause can't use an index and must be resolved with a table scan. The solution is to actually pass each parameter into the SqlCommand as a DbParameter with the type explicitly set to VARCHAR (SqlDbType.VarChar in the case of strings).
A similar situation could of course occur with int types if the .NET parameter is "wider" than the SQL column equivalent.
P.S. The easiest way to "prove" this issue is to run your query in SSMS with the following declared above it:
DECLARE @p0 INT = 123
DECLARE @p1 NVARCHAR(100) = N'foobar' -- etc.
and compare with
DECLARE @p0 INT = 123
DECLARE @p1 VARCHAR(100) = 'foobar' -- etc.
I am running SQL Server and I have a stored procedure. I want to do a SELECT statement with a WHERE IN clause. I don't know how long the list will be, so right now I have tried something like the following:
SELECT * FROM table1 WHERE id IN (@idList)
In this solution @idList is a varchar(max), but this doesn't work. I heard about passing in table values, but I am confused about how to do that. Any help would be great.
I would suggest using a function to split the incoming list (use the link that Martin put in his comment).
Store the results of the split function in a temporary table or table variable and join it in your query instead of using the WHERE clause:
select * into #ids from dbo.Split(',', @idList)
select t.*
from table1 t
join #ids i
on t.id = i.s
The most efficient way would be to pass in a table valued parameter (if you're on SQL Server 2008), or an XML parameter (if you're on SQL Server 2005/2000). If your list is small (and you're on SQL Server 2005/2000), passing in your list as a comma (or otherwise) delimited list and using a split function to divide the values out into rows in a temporary table is also an option.
Whichever option you use, you would then join this table (either the table parameter, the table resulting from the XML select, or the temporary table created by the values from the split) to your main query.
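As a minimal sketch of the table-valued parameter approach on SQL Server 2008+ (the type, procedure and column names here are illustrative, not from the question):
-- A one-column table type to carry the list of ids.
CREATE TYPE dbo.IdList AS TABLE (id int NOT NULL PRIMARY KEY);
GO

CREATE PROCEDURE dbo.GetByIdList
    @idList dbo.IdList READONLY
AS
BEGIN
    SET NOCOUNT ON;
    SELECT t.*
    FROM table1 AS t
    JOIN @idList AS i ON i.id = t.id;
END
GO
From ADO.NET you would pass a DataTable (or IEnumerable of SqlDataRecord) as a parameter with SqlDbType.Structured.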
Here is a table-valued function that takes an nvarchar and returns a table to join on:
Create function [ReturnValues]
(
@Values nvarchar(4000)
)
Returns @ValueTable table(Value nvarchar(2000))
As
Begin
Declare @Start int
Declare @End int
Set @Start = 1
Set @End = 1
While @Start <= len(@Values)
Begin
Set @End = charindex(',', @Values, @Start)
If @End = 0
Set @End = len(@Values) + 1
Insert into @ValueTable
Select rtrim(ltrim(substring(@Values, @Start, @End - @Start)))
Set @Start = @End + 1
End
Return
End
GO
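Inside the stored procedure you would then join against the function's output instead of using IN, along these lines (assuming the id column compares cleanly with the nvarchar values the function returns; add a cast if it is an int column):
-- Split the incoming list once and join it to the target table.
SELECT t.*
FROM table1 AS t
JOIN dbo.ReturnValues(@idList) AS v ON t.id = v.Value;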
Binding an @idList parameter the way you suggested is not possible in SQL.
The best option would be to bulk insert the ids into a separate table and then query that table, using a subselect or joining on the IDs.
e.g.
INSERT INTO idTable (id, context) values (@idValue, 1);
INSERT INTO idTable (id, context) values (@idValue, 1);
INSERT INTO idTable (id, context) values (@idValue, 1); -- as often as you like
SELECT * FROM table1, idTable WHERE table1.id = idTable.id and idTable.context = 1
The context must be a unique value that identifies the ID range. That is important for running the stored proc in parallel; without the context information, running the stored procedure in parallel would mix the values from different selections.
If the number of parameters is reasonably small (< 100) you can use several parameters:
SELECT * FROM table1 WHERE id IN (@id1, @id2, @id3)
If the list is longer, look for a split function.
I have a SQL statement which is hardcoded in an existing VB6 app. I'm building a new version in C# using LINQ to SQL. I was able to get LINQ to SQL to generate the same SQL (before I start refactoring), but for some reason the SQL generated by LINQ to SQL is 5x slower than the original SQL. This is when running the generated SQL directly in LINQPad.
The only real difference my meager SQL eyes can spot is the WITH (NOLOCK), which, if I add it into the LINQ to SQL generated SQL, makes no difference.
Can someone point out what I'm doing wrong here? Thanks!
Existing Hard Coded Sql (5.0 Seconds)
SELECT DISTINCT
CH.ClaimNum, CH.AcnProvID, CH.AcnPatID, CH.TinNum, CH.Diag1, CH.GroupNum, CH.AllowedTotal
FROM Claims.dbo.T_ClaimsHeader AS CH WITH (NOLOCK)
WHERE
CH.ContractID IN ('123A','123B','123C','123D','123E','123F','123G','123H')
AND ( ( (CH.Transmited Is Null or CH.Transmited = '')
AND CH.DateTransmit Is Null
AND CH.EobDate Is Null
AND CH.ProcessFlag IN ('Y','E')
AND CH.DataSource NOT IN ('A','EC','EU')
AND CH.AllowedTotal > 0 ) )
ORDER BY CH.AcnPatID, CH.ClaimNum
Generated Sql from LinqToSql (27.6 Seconds)
-- Region Parameters
DECLARE @p0 NVarChar(4) SET @p0 = '123A'
DECLARE @p1 NVarChar(4) SET @p1 = '123B'
DECLARE @p2 NVarChar(4) SET @p2 = '123C'
DECLARE @p3 NVarChar(4) SET @p3 = '123D'
DECLARE @p4 NVarChar(4) SET @p4 = '123E'
DECLARE @p5 NVarChar(4) SET @p5 = '123F'
DECLARE @p6 NVarChar(4) SET @p6 = '123G'
DECLARE @p7 NVarChar(4) SET @p7 = '123H'
DECLARE @p8 VarChar(1) SET @p8 = ''
DECLARE @p9 NVarChar(1) SET @p9 = 'Y'
DECLARE @p10 NVarChar(1) SET @p10 = 'E'
DECLARE @p11 NVarChar(1) SET @p11 = 'A'
DECLARE @p12 NVarChar(2) SET @p12 = 'EC'
DECLARE @p13 NVarChar(2) SET @p13 = 'EU'
DECLARE @p14 Decimal(5,4) SET @p14 = 0
-- EndRegion
SELECT DISTINCT
[t0].[ClaimNum],
[t0].[acnprovid] AS [AcnProvID],
[t0].[acnpatid] AS [AcnPatID],
[t0].[tinnum] AS [TinNum],
[t0].[diag1] AS [Diag1],
[t0].[GroupNum],
[t0].[allowedtotal] AS [AllowedTotal]
FROM [Claims].[dbo].[T_ClaimsHeader] AS [t0]
WHERE
([t0].[contractid] IN (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7))
AND (([t0].[Transmited] IS NULL) OR ([t0].[Transmited] = @p8))
AND ([t0].[DATETRANSMIT] IS NULL)
AND ([t0].[EOBDATE] IS NULL)
AND ([t0].[PROCESSFLAG] IN (@p9, @p10))
AND (NOT ([t0].[DataSource] IN (@p11, @p12, @p13)))
AND ([t0].[allowedtotal] > @p14)
ORDER BY [t0].[acnpatid], [t0].[ClaimNum]
New LinqToSql Code (30+ seconds... Times out )
var contractIds = T_ContractDatas.Where(x => x.EdiSubmissionGroupID == "123-01").Select(x => x.CONTRACTID).ToList();
var processFlags = new List<string> {"Y","E"};
var dataSource = new List<string> {"A","EC","EU"};
var results = (from claims in T_ClaimsHeaders
where contractIds.Contains(claims.contractid)
&& (claims.Transmited == null || claims.Transmited == string.Empty )
&& claims.DATETRANSMIT == null
&& claims.EOBDATE == null
&& processFlags.Contains(claims.PROCESSFLAG)
&& !dataSource.Contains(claims.DataSource)
&& claims.allowedtotal > 0
select new
{
ClaimNum = claims.ClaimNum,
AcnProvID = claims.acnprovid,
AcnPatID = claims.acnpatid,
TinNum = claims.tinnum,
Diag1 = claims.diag1,
GroupNum = claims.GroupNum,
AllowedTotal = claims.allowedtotal
}).OrderBy(x => x.ClaimNum).OrderBy(x => x.AcnPatID).Distinct();
I'm using the lists of constants above to make LINQ to SQL generate IN ('xxx','xxx', ...); otherwise it uses subqueries, which are just as slow...
Compare the execution plans for the two queries. The LINQ to SQL query uses lots of parameters, so the query optimiser builds an execution plan based on what MIGHT be in the parameters; the hard-coded SQL has literal values, so the optimiser builds an execution plan based on the actual values, and it is probably producing a much more efficient plan for the literals. Your best bet is to try to spot the slow parts of the execution plan and get LINQ to SQL to produce a better query. If you can't, but you think you can build one by hand, then create a stored procedure, which you can expose as a method on your DataContext class in LINQ to SQL.
The hard-coded values in the first SQL may be allowing the query optimizer to use indexes that it doesn't know it can efficiently use for the second, parameterised, SQL.
Another possibility is that if you're running the hand-crafted SQL in SQL Server Management Studio, the different default SET options of SSMS compared to the .NET SQL Server provider may be affecting performance. If this is the case, changing some of the SET options on the .NET connection prior to executing the command might help (e.g. SET ARITHABORT ON), but I don't know if you can do this in LINQPad. See here for more info on this possibility.
The big difference are the parameters.
I can't know for sure without analyzing the plans, but L2S parameterizes queries so that their plans can be effectively reused, avoiding excessive query recompilation on the server. This is, in general, a Good Thing because it keeps the CPU time low on the SQL Server -- it doesn't have to keep generating and generating and generating the same plan.
But L2S goes a bit overboard when you use constants. It parameterizes them, too, which can be detrimental for performance in certain situations.
Putting on my Aluminum-Foil Clairvoyancy Hat, I'm visualizing the kinds of index structures you might have on that table. For example, you may have an index just on ProcessFlag, and there may be very few values for "Y" and "E" for ProcessFlag, causing the query with the hard-coded constants to do a scan only of the values where ProcessFlag = "Y" and "E". For the parameterized query, SQL Server generates a plan which is judged to be optimal for arbitrary input. That means that the server can't take advantage of this little hint (the constants) that you give it.
My advice to you at this point is to take a good look at your indexes and favor composite indexes which cover more of your WHERE conditions together. I will bet that with a bit of that type of analysis, you will find that the query performance becomes far more similar. (and probably improves, in both cases!)
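For example, a composite index covering several of the WHERE conditions might look something like the sketch below; the key order and INCLUDE list are guesses based on the query shown, and should be tuned against your actual plans:
CREATE NONCLUSTERED INDEX IX_ClaimsHeader_Search
    ON dbo.T_ClaimsHeader (ContractID, ProcessFlag, AllowedTotal)
    INCLUDE (DataSource, Transmited, DateTransmit, EobDate,
             ClaimNum, AcnProvID, AcnPatID, TinNum, Diag1, GroupNum);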
You might also check out compiled LINQ queries - http://www.jdconley.com/blog/archive/2007/11/28/linq-to-sql-surprise-performance-hit.aspx