Entity Framework returning bad data - C#

I have an Entity Framework 6.1 project that is querying a SQL Server 2012 database table and getting back incorrect results.
To illustrate what is happening, I created 2 queries that should have the exact same results. The table ProjectTable has 23 columns and 20500ish rows:
var test1 = db.ProjectTable
.GroupBy(t => t.ProjectOwner)
.Select(g => g.Key)
.ToArray();
var test2 = db.ProjectTable
.ToArray()
.GroupBy(t => t.ProjectOwner)
.Select(g => g.Key)
.ToArray();
The queries are designed to get a list of all of the distinct project owners in the table. The first query does the heavy lifting on the SQL Server, whereas the second query downloads the entire table into memory and then processes it on the client side.
The first variable test1 has a length of about 300 items. The second variable test2 has a length of 5.
Here are the raw SQL queries that EF is generating:
-- test1
SELECT [Distinct1].[ProjectOwner] AS [ProjectOwner]
FROM ( SELECT DISTINCT
[Extent1].[ProjectOwner] AS [ProjectOwner]
FROM [dbo].[ProjectTable] AS [Extent1]
) AS [Distinct1]
-- test2
SELECT Col1, Col2 ... ProjectOwner, ... Col23
FROM [dbo].[ProjectTable]
When I run this query and analyze the returned entities, I notice that the full 20500ish rows are returned, but the ProjectOwner column gets overridden with one of only 5 different users!
var test = db.ProjectTable.ToArray();
I thought that maybe it was the SQL Server, so I did a packet trace and filtered on TDS. Randomly looking through the raw streams I see many names that aren't in the list of 5, so I know that data is being sent across the wire correctly.
How do I see the raw data that EF is getting? Is there something that might be messing with the cache and pulling incorrect results?
If I run the queries in either SSMS or Visual Studio, the returned list is correct. It is only EF that has this issue.
EDIT
Ok, I added another test to make sure my sanity is in check.
I took the test2 raw SQL query and did the following:
var test3 = db.Database
.SqlQuery<ProjectTable>(@"SELECT Col1..Col23")
.ToArray()
.Select(t => t.ProjectOwner)
.Distinct()
.ToArray();
and I get the correct 300ish names back!
So, in short:
Having EF send a projected DISTINCT query to SQL Server returns the correct results.
Having EF select the entire table and then using LINQ to project and DISTINCT the data returns incorrect results.
Giving EF THE EXACT SAME QUERY that bullet #2 generates, but as a raw SQL query, returns the correct results!

After downloading the Entity Framework source and stepping through many an Enumerator, I found the issue.
In the Shaper.HandleEntityAppendOnly method, on line 187, the Context.ObjectStateManager.FindEntityEntry method is called. To my surprise, a non-null value was returned! Wait a minute, there shouldn't be any cached results, since I'm returning all rows?!
That's when I discovered that my Table has no Primary Key!
In my defence, the table is actually a cache of a view that I'm working with; I just did a SELECT * INTO CACHETABLE FROM USERVIEW.
I then looked at which column Entity Framework thought was my Primary Key (they call it a singleton key) and it just so happens that the column they picked had only... drum roll please... 5 unique values!
When I looked at the model that EF generated, sure enough! That column was specified as a primary key. I changed the key to the appropriate column and now everything is working as it should!
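For anyone who hits the same thing, here is a minimal code-first sketch of pinning the key explicitly so EF can't infer the wrong column. The ProjectId property and the CACHETABLE mapping are hypothetical stand-ins for whichever column actually identifies a row in the cache table:

using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;
using System.Data.Entity;

[Table("CACHETABLE")]
public class ProjectTable
{
    [Key] // force EF to use this column instead of guessing
    public int ProjectId { get; set; }

    public string ProjectOwner { get; set; }
    // ... remaining columns ...
}

// Or equivalently, with the fluent API, in your DbContext:
public class ProjectContext : DbContext
{
    public DbSet<ProjectTable> ProjectTable { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        modelBuilder.Entity<ProjectTable>().HasKey(t => t.ProjectId);
    }
}

Either way, once the key points at a genuinely unique column, the identity-resolution cache in the ObjectStateManager stops folding distinct rows together.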

Related

Does the order of items in DbSet match the order of rows in the database table in Entity Framework Core?

For example, I have a Postgres database with a Clients table whose primary key is an INT, and whose rows appear naturally sorted by ascending Id. And I have a .NET Core application with Entity Framework Core as the ORM and Npgsql as the data provider. So, my main questions:
Will the order of items in the collection returned by this listing always match the order of rows in the original table in the database?
var clients = context.Clients.ToList();
Will Take() applied to a DbSet without OrderBy() always return items from the beginning of the table, in order?
Will Skip() applied to a DbSet without OrderBy() always skip items from the beginning of the table, in order?
Are these two listings equal?
var clients = context.Clients
.Skip(10)
.Take(5)
.ToList();
var clients = context.Clients
.OrderBy(c => c.Id)
.Skip(10)
.Take(5)
.ToList();
Do I always have to use OrderBy() in expressions with Skip() and Take() when I want to paginate a table?
Is all this behavior determined by the framework or by the data provider? For example, will these things be the same in MSSQL, Postgres and MySql?
There is no inherent order in the table. The rows may be physically stored in the order of the clustered index, but the engine may return them to you in any order it sees fit, for performance and/or consistency reasons, unless you specify a sort order.
The original spec (http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt) says:
If an order by clause is not specified, then the ordering of the
rows of Q is implementation-dependent.
You should not rely on implementation-dependent details, as they are prone to change.
So basically: Yes, you must specify an order. No, they are not the same. Yes, you need an OrderBy() to use Skip() or Take(). And it is determined by BOTH the provider and the framework, neither of which can be relied upon to stay this way, even between runs on the same version. Just because you get the results in the order you expect a number of times doesn't mean that will continue to happen.
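To make that concrete, here is a minimal paging sketch, assuming a context (AppDbContext here, a hypothetical name) with a DbSet<Client> named Clients and an int Id primary key as in the question. Ordering by a unique column before Skip/Take is what makes the pages stable:

using System.Collections.Generic;
using System.Linq;

static List<Client> GetPage(AppDbContext context, int pageIndex, int pageSize)
{
    return context.Clients
        .OrderBy(c => c.Id)          // unique sort key => deterministic pages
        .Skip(pageIndex * pageSize)  // translated to OFFSET on the server
        .Take(pageSize)              // translated to LIMIT / FETCH NEXT
        .ToList();
}

// Usage: page 2 (rows 10-14), repeatable across runs and providers.
// var clients = GetPage(context, 2, 5);

Without the OrderBy, consecutive pages may overlap or skip rows, because the database is free to return rows in a different order each time.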

How to select from DB tables without duplicates

I have a database where each table has an ID field, but the ID fields don't follow a single administrator-defined scheme. I want to fetch the tables and their field contents so that the serial numbers are unified, without duplicate values.
Using Except seems appropriate in this context.
Is there code that can fetch the tables, either in SQL or in Entity Framework?
Here is what I tried, based on "Get All Except from SQL database using Entity Framework" (thanks to its author, daryal):
List<int> tempIdList = answeripE.Select(q => q.ID).ToList();
var quslist = db.Qustion.Where(q => !tempIdList.Contains(q.ID));
I need to do this without asking for each table and querying it separately, and also to request the SQL from the database as a whole, without exceptions, for example:
SELECT * IDfield
FROM MSDB_Table T
WHERE T.id == MaxBy(T.OrderBy(x => x.id))
Something that can replace "WHERE TABLE1.id OR TABLE2.id", walk all the tables, and give a result.
All I'm looking for is a way to query one database as a whole and get the result in a list, without going table by table or using a composite key, because it serves me in analyzing a set of data converted to other formats, for example when representing a database as JSON; there are a lot of them on more than one platform and in a single database. To avoid repetition of the data I need a comprehensive query that can be compared or investigated, like the Solver tool in Excel. So far I have not gotten an answer showing me the first step; is that because it does not exist, or because it is not possible?
If you want Entity Framework to retrieve all columns except a subset of them, the best way to do that is via either a stored procedure or a view. With a view you can query it using LINQ and add your predicates in code, whereas with a stored procedure you have to write the predicate conditions into it and feed them in as parameters... so it sounds like a view would be better for you.
Old example, but should guide you through the process:
https://www.mssqltips.com/sqlservertip/1990/how-to-use-sql-server-views-with-the-entity-framework/
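As a rough sketch of the view approach (EF6 code-first, with hypothetical names): create a view such as dbo.vw_Questions in the database, map an entity to it, and add predicates in LINQ:

using System.ComponentModel.DataAnnotations.Schema;
using System.Data.Entity;
using System.Linq;

// EF maps an entity to a view exactly like a table (treat it as read-only).
[Table("vw_Questions")]
public class QuestionRow
{
    public int ID { get; set; }       // picked up as the key by convention
    public string Title { get; set; }
}

public class AppContext : DbContext
{
    public DbSet<QuestionRow> Questions { get; set; }
}

// Usage: the predicate is translated to a WHERE clause against the view.
// var rows = db.Questions.Where(q => q.ID > 100).ToList();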

Entity Framework - how can I optimize “Contains” statement?

In our current application we have some performance issues with some of our queries. Usually we have something like:
List<int> idList = some data here…;
var query = (from a in someTable where idList.Contains(a.Id) select a);
While this is acceptable for simple queries, it becomes a bottleneck when there are more items in idList (in some queries we have about 700 ids to check, for example).
Is there any way to use something other than Contains? We are thinking of using temporary tables, first inserting the ids and then executing a join instead of Contains, but it would seem Entity Framework does not support such operations (creating temporary tables in code) :(
What else can we try?
I suggest using LINQPad; it offers a "Transform to SQL" option which allows you to see your query in SQL syntax.
There is a chance that this is already the optimal solution (if you're not into messy stuff).
You might also try holding the idList as a sorted array and replacing the Contains call with a binary search (you can implement your own extension).
You can try this:
var query = someTable.Where(a => idList.Any(id => id == a.Id));
If you don't mind having a physical table you could use a semi-temporary table. The basic idea is:
Create a physical table with a "query id" column
Generate a unique ID (not random, but unique)
Insert data into the table tagging the records with the query ID
Pass the query id to the main query, using it to join to the link table
Once the query is complete, delete the temporary records
At worst if something goes wrong you will have orphaned records in the link table (which is why you use a unique query ID).
It's not the cleanest solution but it will be faster than using Contains if you have a lot of values to check against.
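A rough sketch of that pattern in EF6, with a hypothetical QueryLink entity for the physical link table (SomeTable stands in for the table being filtered):

using System;
using System.Linq;

// Permanent table; only its rows are transient.
public class QueryLink
{
    public int Id { get; set; }
    public Guid QueryId { get; set; } // tags all rows of one query run
    public int ValueId { get; set; }  // the id being searched for
}

var queryId = Guid.NewGuid(); // unique, identifies this run

// Steps 1-3: insert the ids, tagged with this run's query id.
db.QueryLinks.AddRange(idList.Select(id =>
    new QueryLink { QueryId = queryId, ValueId = id }));
db.SaveChanges();

// Step 4: join to the link table instead of using Contains.
var results = (from a in db.SomeTable
               join l in db.QueryLinks on a.Id equals l.ValueId
               where l.QueryId == queryId
               select a).ToList();

// Step 5: clean up this run's rows.
db.QueryLinks.RemoveRange(db.QueryLinks.Where(l => l.QueryId == queryId));
db.SaveChanges();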
When Entity Framework starts being a performance bottleneck, generally it's time to write actual SQL.
So what you could do for example is build a table-valued function that takes a table-valued parameter (your list of IDs) as parameter. The function would just return the result of your JOIN.
The table-valued function feature requires EF5, so it might not be an option if you're really stuck with EF4.
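As a hedged sketch of that route: assuming you have created a user-defined table type dbo.IntList (with a single int column Id) and a table-valued function dbo.GetByIds in the database (both names hypothetical), you can pass the ids as a table-valued parameter through a raw query:

using System.Data;
using System.Data.SqlClient;
using System.Linq;

// Pack the ids into a DataTable matching the dbo.IntList table type.
var table = new DataTable();
table.Columns.Add("Id", typeof(int));
foreach (var id in idList)
    table.Rows.Add(id);

var idsParam = new SqlParameter("@ids", SqlDbType.Structured)
{
    TypeName = "dbo.IntList", // the user-defined table type
    Value = table
};

// The join against the id list happens server-side, inside the function.
var results = db.Database
    .SqlQuery<SomeEntity>("SELECT * FROM dbo.GetByIds(@ids)", idsParam)
    .ToList();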
The idea is to refactor your queries to get rid of idList.
For example, suppose you need to return the list of orders of male users aged 18-25, from France. If you filter the Users table by age, sex, and country to get an idList of users, you end up with 700+ ids. Instead, join the Orders table with Users and apply the filters to the Users table. That way you don't have 2 requests (one for the ids and one for the orders), and it works much faster because the database can use indexes while joining the tables.
Makes sense?
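A minimal before/after sketch, with hypothetical Users/Orders entities and a User navigation property on Order:

using System.Linq;

// Before: two round trips and a huge Contains/IN list.
var idList = db.Users
    .Where(u => u.Sex == "M" && u.Age >= 18 && u.Age <= 25
             && u.Country == "France")
    .Select(u => u.Id)
    .ToList();
var orders = db.Orders
    .Where(o => idList.Contains(o.UserId))
    .ToList();

// After: one query; the filters travel into a server-side JOIN
// via the navigation property, where indexes can be used.
var ordersJoined = db.Orders
    .Where(o => o.User.Sex == "M"
             && o.User.Age >= 18 && o.User.Age <= 25
             && o.User.Country == "France")
    .ToList();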

Linq to entities with raw stored procedure returns duplicate copies of data

Interfacing to SQL Server 2008R2:
I have a linq expression:
IQueryable<xxx> xxxResult =
(from t in _context.xxxx.AsNoTracking().Include("yyy")
where t.zzz >= lowNumber
&& t.zzz <= highNumber
&& t.qqq == someValue
select t);
(It probably doesn't matter on the exact query, but it's there in case it does.)
LINQ generated SQL for which SQL Server produced a terrible plan, and, since I can't add index/join hints, I created a stored procedure that wraps the SQL the above LINQ expression generated.
I know I should be able to access the stored procedure through Entity Framework, but this project uses a very light code-first implementation (no .edmx file, for instance), and I'm kinda new to the whole EF thing and didn't know how to tie the new procedure into EF. I know it can be done, but for now I am trying to call the procedure directly.
I worked this out:
IQueryable<xxx> xxxResult =
_context.xxxx.SqlQuery("GetData @p0, @p1, @p2", someValue, lowNumber, highNumber)
.AsNoTracking().AsQueryable();
This seems to work, except for one problem. When iterating over the linq queryable, everything works swimmingly. But, when I use the stored procedure, I get duplicate records.
For instance, if I have an xxx record that includes 3 yyy records in a collection, I get a single xxx record from the LINQ expression and it, as expected, includes 3 yyy records in the internal collection.
The stored procedure, for the same dataset, iterating over the queryable, returns three xxx records, EACH of which contains the same 3 yyy records.
Again, the stored procedure executes the exact same SQL that the linq expression generated.
Why is that? Any ideas?
(Again, new to EF, so please forgive errors in terminology.)
I believe that EF is seeing your results as duplicate based on the primary key you have defined. In EF5, this would be defined using the "Entity Key" property on the fields which uniquely define the entity (a multi-part primary key would have this set on multiple fields).
If your procedure returns a record that matches one it has already returned (based solely on the primary key fields), then it will return a reference to the previous record.
Your LINQ expression uses .AsNoTracking which should prevent this caching behavior.
I'm guessing that with the stored proc, the .AsNoTracking() is applied after the results have already been materialized and cached, so it doesn't have the effect you are looking for.
Make sure that you have your primary keys set properly on your model.
Here's an article that describes the behavior with a view:
http://jepsonsblog.blogspot.in/2011/11/enitity-framework-duplicate-rows-in.html which should be similar to what you are seeing with the stored procedure.
It looks like in Code First, you would use the [Key] annotation to specify your unique keys: http://msdn.microsoft.com/en-us/data/jj591583.aspx
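For example, a code-first sketch of pinning a composite key with the [Key] annotation; the property names here are hypothetical, and the point is that every column needed to uniquely identify a row must be part of the key, or EF will hand back references to the first matching entity:

using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

public class Xxx
{
    [Key, Column(Order = 0)] // part 1 of the composite key
    public int HeaderId { get; set; }

    [Key, Column(Order = 1)] // part 2; without it, rows sharing
                             // HeaderId are collapsed into duplicates
    public int DetailId { get; set; }

    public int Zzz { get; set; }
    public string Qqq { get; set; }
}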

SELECT with "datetime > string" performance issue in EF4 / SQL Server 2008

I am using Entity Framework 4 to access a SQL Server 2008 database.
One of the SQL queries that the EF generates is having a behavior that I cannot explain.
The query is like this:
SELECT tableA.field1, tableA.field2, ...
FROM tableA join tableB on tableA.field1 = tableB.field1
WHERE
tableA.field2 > '20110825'
and tableA.field3 in ('a', 'b', 'c')
and tableB.field4 = 'xxx'
Here tableA.field2 is a datetime not null column, and the other fields are varchars.
tableA contains circa 1.5 million records, tableB contains circa 2 million records, and the query returns 1877 rows.
The problem is, it returns them in 86 seconds, and that time changes dramatically when I change the '20110825' literal to older values.
For instance if I put '20110725' the query returns 3483 rows in 35 milliseconds.
I found out in the execution plan that the difference between the two lies in the indexes SQL Server chooses to use depending on the date used to compare.
When it is taking time, the execution plan shows:
50%: index seek on tableA.field2 (it's a clustered index on this field alone)
50%: index seek on tableB.field1 (non-unique, non-clustered index on this field alone)
0%: join
When it is almost instantaneous, the execution plan shows:
98%: index seek on tableA.field1 (non-unique, non-clustered index on this field alone)
2%: index seek on tableB.field1 (non-unique, non-clustered index on this field alone)
0%: join
So it seems to me that the decision of the optimizer to use the clustered index on tableA.field2 is not optimal.
Is there a flaw in the database design? In the SQL query?
Can I force in any way the database to use the correct execution plan?
Given that you are using literal values and are only encountering the issue with recent date strings, I suspect you are hitting the well-known ascending key problem, where the statistics histogram contains no entries for newly added values at the top of the range, and you need to schedule a job to update your statistics.
Presumably, when the statistics were last updated there were few or no rows meeting the '20110825' criteria, and SQL Server chose a join strategy predicated on that assumption.
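If that is the cause, the immediate workaround is to refresh the statistics. A one-line sketch from the EF4 side, using ObjectContext.ExecuteStoreCommand (the table name is a placeholder; a scheduled maintenance job doing the same thing is the durable fix):

// Rebuild the statistics so recent date values appear in the histogram
// and the optimizer stops assuming the range is nearly empty.
context.ExecuteStoreCommand("UPDATE STATISTICS dbo.tableA WITH FULLSCAN");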
