I have a database schema that includes tables Orders, Options [Id, OptionGroupId, ...], OrderOptions [Id, OptionId] and an associative table OrderSheetOptions [OrderSheetId, OrderOptionId].
My C# code is creating an OrderOption EF object (an object that maps to the database table):
// Select all default options
var options = Sheet.OptionGroups.Select(og => og.Option);

foreach (var option in options)
{
    // Add each default option to the order
    OrderOptions.Add(new OrderOption
    {
        Option = option
    });
}
The problem seems to arise because I am setting the Option property of OrderOption when it gets created. This results in EF running this query:
exec sp_executesql N'SELECT
    [Extent1].[Id] AS [Id],
    [Extent1].[OptionId] AS [OptionId]
    FROM [dbo].[OrderOptions] AS [Extent1]
    WHERE [Extent1].[OptionId] = @EntityKeyValue1',N'@EntityKeyValue1 int',@EntityKeyValue1=1
Every Order has a default OrderOption with an OptionId = 1, so as you can see from the above query, every record is returned. As this table grows, this query is getting very slow and is severely impacting the website's UX, with delays of minutes.
I am typically only adding 3 records, as those are the default options. I have added an index to OrderOptions, but it doesn't help because every record in that table has an OptionId = 1, so the automatic EF query returns all records.
It looks like EF is querying the entire OrderOptions table to ensure that a duplicate object is not being created, but the OrderOption is Order-specific, so it only has to worry about the current order. However, Options is not directly tied to Orders; only indirectly via OrderSheetOptions. So, the EF query can't restrict itself to the current order. Can anyone tell me how I could optimize this so that EF doesn't have to query the entire OrderOptions table just to add a new item?
Thanks!
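One commonly suggested mitigation (not from the original post) is to assign the scalar foreign-key value instead of the Option navigation property, so EF never attaches the Option entity and never lazy-loads its OrderOptions collection during relationship fix-up; turning off lazy loading and AutoDetectChanges while adding the rows is another frequently mentioned knob. A rough sketch, assuming OrderOption exposes an OptionId foreign-key property:
// Sketch only (not from the original post): assumes OrderOption maps the
// OptionId column as a plain foreign-key property.
// Setting the FK value instead of the navigation property means EF has no
// reason to load option.OrderOptions while fixing up the relationship.
var optionIds = Sheet.OptionGroups.Select(og => og.Option.Id);

foreach (var optionId in optionIds)
{
    OrderOptions.Add(new OrderOption
    {
        OptionId = optionId
    });
}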
I made a table that stores the property name that changed, the value before, and the value after.
How can I use Change Tracking to store changes in this table?
You can track the operation, the changed columns, and the new values by using Change Tracking. However, getting the old value out of Change Tracking is not possible. SQL Server 2016 offers the "Change Data Capture" feature, which gives you the information about the old value before the update/delete happened (see https://msdn.microsoft.com/en-us/library/bb933994.aspx).
If you don't have access to SQL Server 2016, here is how you can configure Change Tracking:
Activate it at the database level:
ALTER DATABASE <YourDatabase>   -- e.g. DeviceDatabase
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)
Activate Change Tracking for the tables you need:
ALTER TABLE <YourTable>   -- e.g. Devices
ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = ON)
Set up a database job that copies the change information into your custom table every minute, hour, or day (whatever you need):
DECLARE @minversion bigint;
SET @minversion = CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('Devices'));

SELECT c.SYS_CHANGE_COLUMNS, e.Id
FROM CHANGETABLE(CHANGES Devices, @minversion) AS c
LEFT OUTER JOIN Devices AS e
    ON e.Id = c.Id;
To get the latest value of a changed column you can try this (but beware of multiple updates to the same row: you only get the latest value).
CHANGE_TRACKING_IS_COLUMN_IN_MASK(
    COLUMNPROPERTY(OBJECT_ID('Devices'), 'Id', 'ColumnId'),
    c.SYS_CHANGE_COLUMNS)
This returns 1 if the column changed and 0 if not. You can add this for every column of your table, filter on the value 1, and then add the current value to your query.
Finally, I would recommend using stored procedures for your updates/inserts/deletes. In those you can easily record everything you want to store about the change in your custom table.
If you do have SQL Server 2016, though, try what I mentioned above instead.
Actually, if you override the SaveChanges() method in your data context class you can access the ChangeTracker. This gives you all the entities currently tracked by the context and their EntityState (whether they are Added/Modified/Deleted/Unchanged, etc.).
From there you can get the DbEntityEntry class and, from that, the entity's current values and/or its original values if the entity is in the Modified state.
public override int SaveChanges()
{
    // Inspect everything the context is tracking before the base implementation saves it
    var allTrackedEntities = this.ChangeTracker.Entries().ToList();
    return base.SaveChanges();
}
I currently use this method to do some basic auditing of who is doing what to which entity.
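To make the auditing idea concrete, here is a sketch (not the poster's actual code) of a SaveChanges override that walks the modified entries and compares OriginalValues with CurrentValues; it lives in the same DbContext subclass, and the AuditLogs DbSet and CurrentUserName property are hypothetical placeholders:
public override int SaveChanges()
{
    // Materialize the modified entries before the base implementation saves them
    var modifiedEntries = this.ChangeTracker.Entries()
        .Where(e => e.State == EntityState.Modified)
        .ToList();

    foreach (var entry in modifiedEntries)
    {
        foreach (var propertyName in entry.OriginalValues.PropertyNames)
        {
            var before = entry.OriginalValues[propertyName];
            var after = entry.CurrentValues[propertyName];

            if (!Equals(before, after))
            {
                // AuditLogs and CurrentUserName are placeholders for whatever
                // audit table and user lookup the application actually uses.
                this.AuditLogs.Add(new AuditLog
                {
                    EntityName = entry.Entity.GetType().Name,
                    PropertyName = propertyName,
                    OldValue = before == null ? null : before.ToString(),
                    NewValue = after == null ? null : after.ToString(),
                    ChangedBy = CurrentUserName,
                    ChangedAt = DateTime.UtcNow
                });
            }
        }
    }

    // The audit rows added above are persisted in the same call
    return base.SaveChanges();
}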
I have an Entity Framework 6.1 project that is querying a SQL Server 2012 database table and getting back incorrect results.
To illustrate what is happening, I created 2 queries that should have the exact same results. The table ProjectTable has 23 columns and 20500ish rows:
var test1 = db.ProjectTable
.GroupBy(t => t.ProjectOwner)
.Select(g => g.Key)
.ToArray();
var test2 = db.ProjectTable
.ToArray()
.GroupBy(t => t.ProjectOwner)
.Select(g => g.Key)
.ToArray();
The queries are designed to get a list of all of the distinct project owners in the table. The first query does the heavy lifting on the SQL Server, whereas the second query downloads the entire table into memory and then processes it on the client side.
The first variable test1 has a length of about 300 items. The second variable test2 has a length of 5.
Here are the raw SQL queries that EF is generating:
-- test1
SELECT [Distinct1].[ProjectOwner] AS [ProjectOwner]
FROM ( SELECT DISTINCT
[Extent1].[ProjectOwner] AS [ProjectOwner]
FROM [dbo].[ProjectTable] as [Extent1]
) AS [Distinct1]
-- test2
SELECT Col1, Col2 ... ProjectOwner, ... Col23
FROM [dbo].[ProjectTable]
When I run this query and analyze the returned entities, I notice that the full 20500ish rows are returned, but the ProjectOwner column gets overridden with one of only 5 different users!
var test = db.ProjectTable.ToArray();
I thought that maybe it was the SQL Server, so I did a packet trace and filtered on TDS. Randomly looking through the raw streams I see many names that aren't in the list of 5, so I know that data is being sent across the wire correctly.
How do I see the raw data that EF is getting? Is there something that might be messing with the cache and pulling incorrect results?
If I run the queries in either SSMS or Visual Studio, the list is returned correctly. It is only EF that has this issue.
EDIT
Ok, I added another test to make sure my sanity is in check.
I took the test2 raw sql query and did the following:
var test3 = db.Database
    .SqlQuery<ProjectTable>(@"SELECT Col1..Col23")
    .ToArray()
    .Select(t => t.ProjectOwner)
    .Distinct()
    .ToArray();
and I get the correct 300ish names back!
So, in short:
Having EF send a projected DISTINCT query to SQL Server returns the correct results.
Having EF select the entire table and then using LINQ to project and DISTINCT the data returns incorrect results.
Giving EF the exact same query that bullet #2 generates, but as a raw SQL query, returns the correct results.
After downloading the Entity Framework source and stepping through many an Enumerator, I found the issue.
In the Shaper.HandleEntityAppendOnly method (found here), on line 187 the Context.ObjectStateManager.FindEntityEntry method is called. To my surprise, a non-null value was returned! Wait a minute, there shouldn't be any cached results, since I'm returning all rows?!
That's when I discovered that my Table has no Primary Key!
In my defence, the table is actually a cache of a view that I'm working with; I just did a SELECT * INTO CACHETABLE FROM USERVIEW.
I then looked at which column Entity Framework thought was my Primary Key (they call it a singleton key) and it just so happens that the column they picked had only... drum roll please... 5 unique values!
When I looked at the model that EF generated, sure enough! That column was specified as a primary key. I changed the key to the appropriate column and now everything is working as it should!
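For anyone hitting the same symptom with a code-first model, the key can be pinned to the correct column explicitly, either with a data annotation or via the fluent API. This is only an illustrative sketch (the original fix was made in the generated database-first model), and ProjectId is an assumed column name:
using System.ComponentModel.DataAnnotations;
using System.Data.Entity;

public class ProjectTable
{
    [Key]                                // pin the key to the truly unique column
    public int ProjectId { get; set; }   // assumed name of the real key column

    public string ProjectOwner { get; set; }
    // ... the remaining columns
}

public class ProjectContext : DbContext
{
    public DbSet<ProjectTable> ProjectTable { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // Equivalent fluent-API form of the same mapping
        modelBuilder.Entity<ProjectTable>().HasKey(t => t.ProjectId);
    }
}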
We are refactoring a project from plain MySQL queries to NHibernate.
In the MySQL connector there is the ExecuteNonQuery function, which returns the number of rows affected. So
int RowsDeleted = ExecuteNonQuery("DELETE FROM `table` WHERE ...");
would show me how many rows were actually deleted.
How can I achieve the same with NHibernate? As far as I can see, it is not possible with Session.Delete(query).
My current workaround is first loading all of the objects that are about to be deleted and deleting them one by one, incrementing a counter on each delete. But I assume that will cost performance.
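For reference, the workaround described above looks roughly like this (a sketch only; Post, the Title filter, and sessionFactory are placeholders, not from the original project):
// Sketch of the "load, then delete one-by-one while counting" workaround.
int rowsDeleted = 0;

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    var doomed = session.Query<Post>()              // requires NHibernate.Linq
        .Where(p => p.Title == "obsolete")          // placeholder filter
        .ToList();

    foreach (var post in doomed)
    {
        session.Delete(post);
        rowsDeleted++;                              // count each delete by hand
    }

    tx.Commit();
}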
If you don't mind that NHibernate will create delete statements for each row, and possibly additional statements for orphans and/or other relationships, you can use session.Delete.
For better performance I would recommend batch deletes (see the example below).
session.Delete
If you delete many objects with session.Delete, NHibernate makes sure that integrity is preserved; it will load everything into the session if needed anyway. So there is no real reason to count your objects or to have a method that returns the number of objects deleted, because you can simply run a query before the delete to determine the number of objects that will be affected...
The following statement will delete all entities of type Post by id.
The select statement queries the database only for the ids, so it is actually very performant...
var idList = session.Query<Post>().Select(p => p.Id).ToList<int>();
session.Delete(string.Format("from Post where Id in ({0})", string.Join(",", idList.ToArray())));
The number of objects deleted will be equal to the number of ids in the list...
This is actually the same (in terms of the queries NHibernate will fire against your database) as if you used Query<T>, looped over the result, and deleted the objects one by one...
Batch delete
You can use session.CreateSQLQuery to run native SQL commands. It also allows you to have input and output parameters.
The following statement simply deletes everything from the table, as you would expect:
session.CreateSQLQuery(@"Delete from MyTableName").ExecuteUpdate();
To retrieve the number of rows deleted, we'll use the normal T-SQL @@ROWCOUNT variable and return it via SELECT. To read the selected row count, we add an output scalar to the created query via AddScalar, and UniqueResult simply returns the integer:
var rowsAffected = session.CreateSQLQuery(@"
    Delete from MyTableName;
    Select @@ROWCOUNT as NumberOfRows")
    .AddScalar("NumberOfRows", NHibernateUtil.Int32)
    .UniqueResult();
To pass input variables you can do this with .SetParameter(<name>,<value>)
var rowsAffected = session.CreateSQLQuery(@"
    DELETE from MyTableName where ColumnName = :val;
    select @@ROWCOUNT as NumberOfRows;")
    .AddScalar("NumberOfRows", NHibernateUtil.Int32)
    .SetParameter("val", 1)
    .UniqueResult();
I'm not so comfortable with MySQL; the example I wrote is for MSSQL. I think the MySQL equivalent of @@ROWCOUNT would be SELECT ROW_COUNT();
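If that is correct, the MySQL flavour of the example would presumably look like the sketch below (untested; it also assumes the connection allows several statements in one command):
// Hedged sketch of the MySQL variant (not verified):
// ROW_COUNT() returns the number of rows affected by the preceding statement.
var rowsAffected = session.CreateSQLQuery(@"
    DELETE from MyTableName where ColumnName = :val;
    select ROW_COUNT() as NumberOfRows;")
    .AddScalar("NumberOfRows", NHibernateUtil.Int32)
    .SetParameter("val", 1)
    .UniqueResult();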
I am building a batch processing system. Batches of Units come in quantities from 20-1000. Each Unit is essentially a hierarchy of models (one main model and many child models). My task involves saving each model hierarchy to a database as a single transaction (either each hierarchy commits or it rolls back). Unfortunately EF was unable to handle two portions of the model hierarchy due to their potential to contain thousands of records.
What I've done to resolve this is set up SqlBulkCopy to handle these two potentially high count models and let EF handle the rest of the inserts (and referential integrity).
Batch Loop:
foreach (var unitDetails in BatchUnits)
{
    var unitOfWork = new Unit(unitDetails);
    Task.Factory.StartNew(() =>
    {
        unitOfWork.ProcessX(); // data preparation
        unitOfWork.ProcessY(); // data preparation
        unitOfWork.PersistCase();
    });
}
Unit:
class Unit
{
    public void PersistCase()
    {
        using (var dbContext = new CustomDbContext())
        {
            // Need an explicit transaction so that
            // EF + SqlBulkCopy act as a single block
            using (var scope = new TransactionScope(TransactionScopeOption.Required,
                new TransactionOptions()
                {
                    IsolationLevel = System.Transactions.IsolationLevel.ReadCommitted
                }))
            {
                // Let EF insert most of the records
                // Note: Insert is all it is doing, no update or delete
                dbContext.Units.Add(thisUnit);
                dbContext.SaveChanges(); // deadlocks, DbConcurrencyExceptions here

                // Copy the auto-increment Id (set by EF) to the DataTables
                // for referential integrity of the SqlBulkCopy inserts
                CopyGeneratedId(thisUnit.AutoIncrementedId, dataTables);

                // Execute SqlBulkCopy for potentially numerous model #1
                SqlBulkCopy bulkCopy1 = new SqlBulkCopy(...);
                ...
                bulkCopy1.WriteToServer(dataTables["#1"]);

                // Execute SqlBulkCopy for potentially numerous model #2
                SqlBulkCopy bulkCopy2 = new SqlBulkCopy(...);
                ...
                bulkCopy2.WriteToServer(dataTables["#2"]);

                // Commit transaction
                scope.Complete();
            }
        }
    }
}
Right now I'm essentially stuck between a rock and a hard place. If I leave the IsolationLevel set to ReadCommitted, I get deadlocks between EF INSERT statements in different Tasks.
If I set the IsolationLevel to ReadUncommitted (which I thought would be fine since I'm not doing any SELECTs) I get DbConcurrencyExceptions.
I've been unable to find any good information about DbConcurrencyExceptions and Entity Framework but I'm guessing that ReadUncommitted is essentially causing EF to receive invalid "rows inserted" information.
UPDATE
Here is some background information on what is actually causing my deadlocking issues while doing INSERTS:
http://connect.microsoft.com/VisualStudio/feedback/details/562148/how-to-avoid-using-scope-identity-based-insert-commands-on-sql-server-2005
Apparently this same issue was present a few years ago when Linq To SQL came out and Microsoft fixed it by changing how scope_identity() gets selected. Not sure why their position has changed to this being a SQL Server problem when the same issue came up with Entity Framework.
This issue is explained fairly well here: http://connect.microsoft.com/VisualStudio/feedback/details/562148/how-to-avoid-using-scope-identity-based-insert-commands-on-sql-server-2005
Essentially it's an internal EF issue. I migrated my code to use LINQ to SQL and it now works fine (it no longer does the unnecessary SELECT for the identity value).
Relevant quote from the exact same issue in LINQ to SQL, which was fixed:
When a table has an identity column, LINQ to SQL generates extremely inefficient SQL for insertion into such a table. Assume the table is Order and the identity column is Id. The SQL generated is:
exec sp_executesql N'INSERT INTO [dbo].[Order]([Column1], [Column2])
VALUES (@p0, @p1)
SELECT [t0].[Id] FROM [dbo].[Order] AS [t0] WHERE [t0].[Id] =
(SCOPE_IDENTITY()) ',N'@p0 int,@p1 int,@p0=124,@p1=432
As one can see, instead of returning SCOPE_IDENTITY() directly by using 'SELECT SCOPE_IDENTITY()', the generated SQL performs a SELECT on the Id column using the value returned by SCOPE_IDENTITY(). When the number of records in the table is large, this significantly slows down the insertion. When the table is partitioned, the problem gets even worse.
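For contrast, the pattern the quote recommends (returning SCOPE_IDENTITY() directly) looks roughly like this if you issue the insert yourself through ADO.NET; this is only a sketch, and the table/column names and connection string are placeholders:
// Sketch only (requires System.Data.SqlClient): insert a row and read the
// generated identity in one round trip, without the extra
// SELECT ... WHERE Id = SCOPE_IDENTITY() that the quote criticizes.
var connectionString = "..."; // placeholder
int newId;

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(
    @"INSERT INTO [dbo].[Order] ([Column1], [Column2]) VALUES (@p0, @p1);
      SELECT CAST(SCOPE_IDENTITY() AS int);", connection))
{
    command.Parameters.AddWithValue("@p0", 124);
    command.Parameters.AddWithValue("@p1", 432);

    connection.Open();
    newId = (int)command.ExecuteScalar(); // identity of the inserted row
}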
So I have an old database that I'm migrating to a new one. The new one has a slightly different but mostly compatible schema. Additionally, I want to renumber all tables from zero.
Currently I have been using a tool I wrote that manually retrieves the old record, inserts it into the new database, and updates a v2 ID field in the old database to show its corresponding ID location in the new database.
For example, I'm selecting from MV5.Posts and inserting into MV6.Posts. Upon the insert, I retrieve the ID of the new row in MV6.Posts and update it in the old MV5.Posts.MV6ID field.
Is there a way to do this UPDATE via INSERT INTO SELECT FROM so I don't have to process every record manually? I'm using SQL Server 2005, dev edition.
The key with migration is to do several things:
First, do not do anything without a current backup.
Second, if the keys will be changing, you need to store both the old and new keys in the new structure, at least temporarily (permanently if the key field is exposed to the users, because they may be searching by it to get old records).
Next, you need a thorough understanding of the relationships to child tables. If you change the key field, all related tables must change as well. This is where having both the old and new keys stored comes in handy. If you forget to change any of them, the data will no longer be correct and will be useless, so this is a critical step.
Pick out some test cases of particularly complex data, making sure to include one or more test cases for each related table. Store the existing values in work tables.
To start the migration, you insert into the new table using a select from the old table. Depending on the number of records, you may want to loop through batches (not one record at a time) to improve performance. If the new key is an identity, you simply put the value of the old key into its own column and let the database create the new keys.
Then do the same with the related tables. Then use the old key value in the table to update the foreign key fields with something like:
Update t2
set fkfield = t1.newkey
from table2 t2
join table1 t1 on t1.oldkey = t2.fkfield
Test your migration by running the test cases and comparing the data with what you stored from before the migration. It is utterly critical to thoroughly test migration data or you can't be sure the data is consistent with the old structure. Migration is a very complex action; it pays to take your time and do it very methodically and thoroughly.
Probably the simplest way would be to add a column on MV6.Posts for oldId, then insert all the records from the old table into the new table. Lastly, update the old table by matching on oldId in the new table with something like:
UPDATE o
SET o.newid = n.id
FROM mv5.posts o
JOIN mv6.posts n ON o.id = n.oldid
You could clean up and drop the oldId column afterwards if you wanted to.
The best approach that I know of is with the OUTPUT clause, assuming you have SQL Server 2005 or 2008.
USE AdventureWorks;
GO
DECLARE @MyTableVar table( ScrapReasonID smallint,
                           Name varchar(50),
                           ModifiedDate datetime);
INSERT Production.ScrapReason
    OUTPUT INSERTED.ScrapReasonID, INSERTED.Name, INSERTED.ModifiedDate
        INTO @MyTableVar
VALUES (N'Operator error', GETDATE());
It would still require a second pass to update the original table; however, it might help make your logic simpler. Do you need to update the source table at all? You could just store the new ids in a third cross-reference table.
Heh. I remember doing this in a migration.
Putting the old_id in the new table makes the update easier -- you can just do an insert into newtable select ... from oldtable -- and it also makes the subsequent "stitching" of records easier. In the "stitch" you'll either update the child tables' foreign keys within the insert, by doing a subselect on the new parent (insert into newchild select ..., (select id from new_parent where old_id = oldchild.fk) as fk, ... from oldchild), or you'll insert the children and do a separate update to fix the foreign keys.
Doing it in one insert is faster; doing it in a separate step means that your inserts aren't order dependent and can be redone if necessary.
After the migration, you can either drop the old_id columns or, if the legacy system exposed the ids and users treated the keys as data, keep them to allow lookups based on the old_id.
Indeed, if you have the foreign keys correctly defined, you can use systables/information-schema to generate your insert statements.
Is there a way to do this UPDATE via INSERT INTO SELECT FROM so I don't have to process every record manually?
Since you wouldn't want to do it manually but automatically, create a trigger on MV6.Posts so that the UPDATE of MV5.Posts occurs automatically whenever you insert into MV6.Posts.
Your trigger might look something like this:
create trigger trg_MV6Posts
on MV6.Posts
after insert
as
begin
    -- The join column is an assumption: MV6.Posts needs something that
    -- identifies the matching row in MV5.Posts (e.g. an OldMV5Id column).
    update o
    set o.MV6ID = i.ID
    from MV5.Posts o
    join inserted i on o.ID = i.OldMV5Id
end
AFAIK, you cannot update two different tables with a single SQL statement.
You can however use triggers to achieve what you want to do.
Make a column MV6.Posts.OldMV5Id, then do an
insert into MV6.Posts (..., OldMV5Id)
select ..., ID from MV5.Posts
and then update MV5.Posts.MV6ID by joining on that column.