I'm using a LINQ query that looks like this:
public List<TEntity> GetEntities<TEntity>(int[] ids)
{
    var someDbSet = context.Set<TEntity>(); // simplified; the set comes from the DbContext
    var resultQ = someDbSet.Where(t => !ids.Any() || ids.Contains(t.ID)); // <= crashing line
    return resultQ.ToList();
}
It usually works, but in some cases, when ids contains ~7000 items, it crashes.
The exception message is "Exception of type 'System.StackOverflowException' was thrown.".
It has no stack trace or InnerException.
I also get this info: "EntityFramework.pdb not loaded... contains the debug information required to find the source for the module EntityFramework.dll"
Is this a known bug or can someone explain why it doesn't work when the array is bigger?
I'm using .NET Framework 4.5, EntityFramework 6.1.3, EntityFramework6.Npgsql 3.0.3
If we pass an array with only two values, int[] ids = {1, 2}, to your GetEntities method, Entity Framework will generate the following query:
SELECT
[Extent1].[Id] AS [Id],
...
FROM [dbo].[Entity] AS [Extent1]
WHERE ( NOT EXISTS (SELECT
1 AS [C1]
FROM (SELECT
1 AS [C0]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
UNION ALL
SELECT
1 AS [C0]
FROM ( SELECT 1 AS X ) AS [SingleRowTable2]) AS [UnionAll1]
)) OR (1 = [Extent1].[Id]) OR (2 = [Extent1].[Id])
If we increase the number of elements in the ids array, this query becomes more complex, with more levels of nesting. I think Entity Framework uses some recursive algorithm to generate the SQL for the !ids.Any() expression. As the number of elements in the ids array grows, the depth of the recursion grows with it. Therefore it throws a StackOverflowException when the number of elements in the ids array (and with it the depth of the recursion) is large.
If we delete the !ids.Any() expression, the following query is generated:
SELECT
[Extent1].[Id] AS [Id],
...
FROM [dbo].[Entity] AS [Extent1]
WHERE [Extent1].[Id] IN (1,2)
Such a query does not cause a StackOverflowException even when the ids array is large. Therefore it is better to move the !ids.Any() check out of the LINQ query:
public List<TEntity> GetEntities<TEntity>(int[] ids)
{
    var someDbSet = context.Set<TEntity>(); // simplified, as in the question
    if (!ids.Any())
        return someDbSet.ToList();
    var resultQ = someDbSet.Where(t => ids.Contains(t.ID));
    return resultQ.ToList();
}
You should also take into account that there is a limit on the number of items in a WHERE col IN (...) condition: Limit on the WHERE col IN (...) condition.
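If the id list can legitimately grow past that limit, one way to stay under it is to batch the Contains query. A minimal sketch of the idea, not part of the original fix, assuming a concrete entity class Entity with an int ID property and a context type MyDbContext exposing DbSet<Entity> Entities (all hypothetical names, since the generic method above is a simplification):

public static List<Entity> GetEntitiesChunked(MyDbContext context, int[] ids, int chunkSize = 1000)
{
    if (!ids.Any())
        return context.Entities.ToList();

    var result = new List<Entity>();
    // One query per chunk keeps each generated IN (...) list small.
    for (int i = 0; i < ids.Length; i += chunkSize)
    {
        int[] chunk = ids.Skip(i).Take(chunkSize).ToArray();
        result.AddRange(context.Entities.Where(t => chunk.Contains(t.ID)).ToList());
    }
    return result;
}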
ionutnespus wrote:
Yes, extracting the condition outside Where() works. Still, I
couldn't find any explanation why EF would use such a complicated
algorithm for such a simple condition. Any thoughts on that?
I've decided to answer this question by extending this post, because the answer is long and contains code.
I don't know for sure why EF generates such a complex query, but I've done some research and here are my thoughts. If we modify your GetEntities method and use the following condition in the LINQ query:
someDbSet.Where(t => !ids.Any(i => i == 3) || ids.Contains(t.ID));
the following SQL query is generated for ids = {1, 2}:
SELECT
[Extent1].[Id] AS [Id],
...
FROM [dbo].[Entity] AS [Extent1]
WHERE ( NOT EXISTS (
SELECT 1 AS [C1]
FROM (
SELECT 1 AS [C0] FROM ( SELECT 1 AS X ) AS [SingleRowTable1] WHERE 3 = 1
UNION ALL
SELECT 1 AS [C0] FROM ( SELECT 1 AS X ) AS [SingleRowTable2] WHERE 3 = 2
) AS [UnionAll1]
)) OR (1 = [Extent1].[Id]) OR (2 = [Extent1].[Id])
Here you can see that the NOT EXISTS condition contains two subqueries, each of which checks whether the corresponding element of the ids array equals the required value. I think it is logical to use a NOT EXISTS SQL condition to represent the Any() method. But why does EF generate one subquery per array element? In my opinion, EF does so because the EF team tried to write an algorithm that generates queries that do not depend on the database type. But this is only my opinion. Maybe it would be better to ask the EF team this question on GitHub.
Can you try it like this?
public List<TEntity> GetEntities<TEntity>(int[] ids)
{
    var someDbSet = context.Set<TEntity>();
    var resultQ = new List<TEntity>();
    // Note: this issues one database query per id.
    foreach (var id in ids)
    {
        resultQ.Add(someDbSet.Where(prm => prm.ID == id).FirstOrDefault());
    }
    return resultQ;
}
As per your error message: a StackOverflowException is thrown when the execution stack overflows because it contains too many nested method calls.
Per MSDN:
The default maximum size of an array is 2 gigabytes (GB).
In a 64-bit environment, you can avoid the size restriction by setting the enabled attribute of the gcAllowVeryLargeObjects configuration element to true in the run-time environment.
Moreover, your ids may exceed the 2 GB limit; I think this might be the cause.
Related
Predicament
I have a large dataset, part of which I need to perform a complex calculation on. For the calculation, I need to take a chunk of ordered data from the large set, based on input parameters.
My method signature looks like this:
double Process(Entity e, DateTimeOffset? start, DateTimeOffset? end)
Potential Solutions
The two following methods spring to mind:
Method 1 - WHERE Clause
double result = 0d;
IEnumerable<Item> items = from item in e.Items
where (!start.HasValue || item.Date >= start.Value)
&& (!end.HasValue || item.Date <= end.Value)
orderby item.Date ascending
select item;
...
return result;
Method 2 - Skip & Take
double result = 0d;
IEnumerable<Item> items = e.Items.OrderBy(i => i.Date);
if (start.HasValue)
items = items.SkipWhile(i => i.Date < start.Value);
if (end.HasValue)
items = items.TakeWhile(i => i.Date <= end.Value);
...
return result;
Question
If I were just throwing this together, I'd probably just go with Method 1, but the size of my dataset and the size of the set of datasets are both too large to ignore slight efficiency losses, and it is of vital importance that the resulting enumerable is ordered.
Which approach will generate the more efficient query? And is there a more efficient approach that I am yet to consider?
Any solutions presented can make the safe assumption that the table is well indexed.
SkipWhile is not supported for translation to SQL. You need to throw that option away.
The best way to go about this is to create an index on the field you use for the range selection and then issue a query that is SARGable. The predicate date >= start && date < end is SARGable and can make use of an index.
!start.HasValue || is not a good idea because it destroys SARGability. Build the query so that it is not needed. For example:
if(start != null) query = query.Where(...);
Make the index covering and you get optimal performance: not a single extra row needs to be processed.
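A minimal sketch of that pattern, using the Item and Date names from the question (context.Items is a hypothetical stand-in for wherever the queryable comes from):

IQueryable<Item> query = context.Items;
// Add each range predicate only when it is actually supplied,
// so the generated WHERE clause stays SARGable.
if (start.HasValue)
    query = query.Where(i => i.Date >= start.Value);
if (end.HasValue)
    query = query.Where(i => i.Date <= end.Value);
var items = query.OrderBy(i => i.Date).ToList();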
According to this link, you can't use SkipWhile without materializing the query, so in case 2 you materialize all the entities and then compute the result.
In scenario 1, you let SQL handle the query and materialize only the necessary records, so it is the better option.
EDIT:
I generated sample data; these are the queries sent to the database:
SELECT
[Project1].[Id] AS [Id],
[Project1].[AddedDate] AS [AddedDate],
[Project1].[SendDate] AS [SendDate]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[AddedDate] AS [AddedDate],
[Extent1].[SendDate] AS [SendDate]
FROM [dbo].[Alerts] AS [Extent1]
WHERE ([Extent1].[AddedDate] >= @p__linq__0) AND ([Extent1].[AddedDate] <= @p__linq__1)
) AS [Project1]
ORDER BY [Project1].[AddedDate] ASC
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[AddedDate] AS [AddedDate],
[Extent1].[SendDate] AS [SendDate]
FROM [dbo].[Alerts] AS [Extent1]
ORDER BY [Extent1].[AddedDate] ASC
I inserted 1,000,000 records and wrote a query expected to return 1 row. In the first case the query took 291 ms and materialized instantly. In the second case the query took 1065 ms and I had to wait about 10 seconds for the result to materialize.
I am using EF6 and I would like to get the records in a table whose IDs are in a given group of IDs.
In my test, for example, I am using 4 IDs.
I tried two options; the first is with Any.
dbContext.MyTable
.Where(x => myIDS.Any(y=> y == x.MyID));
And the T-SQL that this LINQ expression generates is:
SELECT
*
FROM [dbo].[MiTabla] AS [Extent1]
WHERE EXISTS (SELECT
1 AS [C1]
FROM (SELECT
[UnionAll2].[C1] AS [C1]
FROM (SELECT
[UnionAll1].[C1] AS [C1]
FROM (SELECT
cast(130 as bigint) AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
UNION ALL
SELECT
cast(139 as bigint) AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable2]) AS [UnionAll1]
UNION ALL
SELECT
cast(140 as bigint) AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable3]) AS [UnionAll2]
UNION ALL
SELECT
cast(141 as bigint) AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable4]) AS [UnionAll3]
WHERE [UnionAll3].[C1] = [Extent1].[MiID]
)
As can be seen, the T-SQL is a WHERE EXISTS that uses many subqueries and unions.
The second option is with Contains.
dbContext.MyTable
.Where(x => myIDS.Contains(x.MiID));
And the T-SQL:
SELECT
*
FROM [dbo].[MiTabla] AS [Extent1]
WHERE [Extent1].[MiID] IN (cast(130 as bigint), cast(139 as bigint), cast(140 as bigint), cast(141 as bigint))
Contains is translated into a WHERE ... IN, and the query is much less complex.
I have read that Any tends to be faster, so I am unsure whether Any, although its query looks more complex at first glance, is actually faster or not.
Thanks so much.
EDIT: I ran some tests (I don't know if this is the best way to test this).
System.Diagnostics.Stopwatch miswContains = new System.Diagnostics.Stopwatch();
miswContains.Start();
for (int i = 0; i < 100; i++)
{
IQueryable<MyTable> iq = dbContext.MyTable
.Where(x => myIDS.Contains(x.MyID));
iq.ToArrayAsync();
}
miswContains.Stop();
System.Diagnostics.Stopwatch miswAny = new System.Diagnostics.Stopwatch();
miswAny.Start();
for (int i = 0; i < 20; i++)
{
IQueryable<MyTable> iq = dbContext.MyTable
.Where(x => myIDS.Any(y => y == x.MyID));
iq.ToArrayAsync();
}
miswAny.Stop();
The results: miswAny is about 850 ms and miswContains is about 4251 ms.
So the second option, with Contains, is slower.
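Note that the loops above run 100 and 20 iterations respectively, and ToArrayAsync() is never awaited, so the timings mostly measure query construction rather than execution. A sketch of a fairer version of the same test (same names as above, inside an async method):

var sw = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < 100; i++)
{
    // Awaiting forces the query to execute and materialize before timing stops.
    var rows = await dbContext.MyTable
        .Where(x => myIDS.Contains(x.MyID))
        .ToArrayAsync();
}
sw.Stop();
long containsMs = sw.ElapsedMilliseconds;

sw.Restart();
for (int i = 0; i < 100; i++)
{
    var rows = await dbContext.MyTable
        .Where(x => myIDS.Any(y => y == x.MyID))
        .ToArrayAsync();
}
sw.Stop();
long anyMs = sw.ElapsedMilliseconds;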
Your second option is the fastest solution I can think of (at least for id arrays that are not very large), provided MiTabla.MiID is indexed.
If you want to read more about IN clause performance: Is SQL IN bad for performance?.
If you know the ID, then using the LINQ to SQL Count() method would produce much cleaner and faster SQL than both Any and Contains:
dbContext.MyTable
.Where(x => myIDS.Count(y=> y == x.MyID) > 0);
The generated SQL for the count should look something like this:
DECLARE @p0 Decimal(9,0) = 12345
SELECT COUNT(*) AS [value]
FROM [ids] AS [t0]
WHERE [t0].[id] = #p0
You can tell by the shape of the queries that Any is not scalable at all. It doesn't take many elements in myIDS (probably ~50) to get a SQL exception saying that the maximum nesting level has been exceeded.
Contains is much better in this respect. It can handle a couple of thousand elements before its performance is severely affected.
So I would go for the scalable solution, even though Any may be faster with small numbers. It is also possible to make Contains scale even better.
I have read that Any tends to be faster,
In LINQ-to-objects that's generally true, because the enumeration stops at the first hit. But with LINQ against a SQL backend, the generated SQL is what counts.
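A small LINQ-to-Objects sketch of that short-circuiting (in-memory only, not EF):

var probed = new List<int>();
int[] myIDS = { 130, 139, 140, 141 };

// Any() stops enumerating at the first hit: only 130 and 139 are probed here.
bool found = myIDS.Where(id => { probed.Add(id); return true; })
                  .Any(y => y == 139);

Console.WriteLine(found);        // True
Console.WriteLine(probed.Count); // 2, not 4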
I've noticed that Entity Framework translates LINQ queries with negative boolean filters such that the generated query plan won't use a filtered index. For example, the query:
context.Foo.Count(f => !f.IsActive)
generates the SQL statement:
SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM [dbo].[Foos] AS [Extent1]
WHERE [Extent1].[IsActive] <> cast(1 as bit)
) AS [GroupBy1]
Notice the WHERE clause uses [IsActive] <> cast(1 as bit), rather than the more intuitive [IsActive] = 0. This becomes an issue when using filtered indexes. The plan for the above query will not use the following index:
CREATE INDEX IX_Foo_IsActive ON Foos (IsActive) WHERE (IsActive = 0)
I suspect the reason that EF generates queries this way has something to do with DB null semantics, but this happens even with non-nullable bit fields. I've verified that writing the filtered index with EF's syntax (IsActive <> 1) fixes the issue, but that would break any non-EF queries using the more common syntax.
Is there a better workaround?
Full example program here: http://dotnetfiddle.net/3kZugt. The entity type used above is:
public class Foo
{
public int Id { get; set; }
public bool IsActive { get; set; }
}
It's not unusual that, for some strange reason, we fail to see something really obvious: do a direct translation of your SQL predicate into a C# predicate, i.e.
WHERE IsActive = 0
is translated to
f => f.IsActive == false
You have to stop thinking in C# and start thinking in SQL ;)
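Applied to the question's query, the change is just this (the SQL in the comments is what you should see if EF translates the comparison literally, as in the question's own experiments):

// Negation: EF emits WHERE [IsActive] <> cast(1 as bit), missing the filtered index.
var countNegated = context.Foo.Count(f => !f.IsActive);

// Direct translation of "IsActive = 0": EF can emit WHERE [IsActive] = 0,
// which matches the filtered index IX_Foo_IsActive.
var countDirect = context.Foo.Count(f => f.IsActive == false);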
I'm trying to create a query similar to this:
select randomId
from myView
where ...
group by randomId
NOTE: EF doesn't support Distinct here, so I was thinking of working around the lack of it with the group by (or so I think)
randomId is numeric
Entity Framework V.6.0.2
This gives me the expected result, and the query runs in < 1 second.
When trying to do the same with EF I have been having some issues.
If I write LINQ similar to this:
context.myView
.Where(...)
.GroupBy(mt => mt.randomId)
.Select(group => new { Id = group.Key, Count = group.Count() })
I get sort of the same result, but it forces a count and makes the query take > 6 seconds.
The SQL EF generates is something like this:
SELECT
1 AS [C1],
[GroupBy1].[K1] AS [randomId],
[GroupBy1].[A1] AS [C2]
FROM (
SELECT
[Extent1].[randomId] AS [K1],
COUNT(1) AS [A1]
FROM [dbo].[myView] AS [Extent1]
WHERE (...)
GROUP BY [Extent1].[randomId]
) AS [GroupBy1]
But if the query had the count commented out, it would be back to < 1 second.
If I change the Select to:
.Select(group => new { Id = group.Key })
I get all of the rows, with no GROUP BY statement in the SQL query and no DISTINCT whatsoever:
SELECT
[Extent1].[anotherField] AS [anotherField], -- this field got included automatically and I don't know why; removing it in SQL Server doesn't affect the outcome
[Extent1].[randomId] AS [randomId]
FROM [dbo].[myView] AS [Extent1]
WHERE (...)
Other failed attempts:
query.GroupBy(x => x.randomId).Select(group => group.FirstOrDefault());
The query that was generated is as follows:
SELECT
[Limit1].ALL FIELDS,...
FROM (SELECT
[Extent1].[randomId] AS [randomId]
FROM [dbo].[myView] AS [Extent1]
WHERE (...) AS [Project1]
OUTER APPLY (SELECT TOP (1)
[Extent2].ALL FIELDS,...
FROM [dbo].[myView] AS [Extent2]
WHERE (...) AS [Limit1] -- same as the where above
This query performed rather poorly and still managed to return all the Ids matching the where clause.
Does anyone have an idea how to force the GROUP BY without an aggregate function like Count?
In SQL it works, but then again I have the DISTINCT keyword available there as well...
Cheers,
J
var query = from p in TableName
select new {Id = p.ColumnNameId};
var distinctItems = query.Distinct().ToList();
Here is the LINQ query; you should be able to write an equivalent against an EF DbSet too. If you have issues, let me know.
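For reference, the method-syntax equivalent against the EF set from the question might look like this (the Where predicate is a stand-in for the question's elided filter):

var distinctIds = context.myView
    .Where(mt => true)            // stand-in for the question's elided filter
    .Select(mt => mt.randomId)    // project just the key...
    .Distinct()                   // ...and let EF generate a SQL DISTINCT
    .ToList();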
Cheers!
Anybody know how to write a LINQ to SQL statement that returns every nth row from a table? I need to get the title of the item at the top of each page in a paged data grid back for fast user scanning. So if I wanted the first record, then every 3rd one after that, from the following names:
Amy, Eric, Jason, Joe, John, Josh, Maribel, Paul, Steve, Tom
I'd get Amy, Joe, Maribel, and Tom.
I suspect this can be done... LINQ to SQL statements already invoke the ROW_NUMBER() SQL function in conjunction with sorting and paging. I just don't know how to get back every nth item. The SQL statement would be something like WHERE ROW_NUMBER() % 3 = 1, but I don't know the LINQ statement to use to get the right SQL.
Sometimes, T-SQL is the way to go. I would use ExecuteQuery<T> here:
var data = db.ExecuteQuery<SomeObjectType>(#"
SELECT * FROM
(SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS [__row]
FROM [YourTable]) x WHERE (x.__row % 25) = 1");
You could also parameterize the n (ExecuteQuery accepts {0}-style placeholders):
var n = 3;
var data = db.ExecuteQuery<SomeObjectType>(@"
SELECT * FROM
(SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS [__row]
FROM [YourTable]) x WHERE (x.__row % {0}) = 1", n);
Once upon a time, there was no such thing as Row_Number, and yet such queries were possible. Behold!
var query =
from c in db.Customers
let i = (
from c2 in db.Customers
where c2.ID < c.ID
select c2).Count()
where i%3 == 0
select c;
This generates the following SQL:
SELECT [t2].[ID], [t2]. --(more fields)
FROM (
SELECT [t0].[ID], [t0]. --(more fields)
(
SELECT COUNT(*)
FROM [dbo].[Customer] AS [t1]
WHERE [t1].[ID] < [t0].[ID]
) AS [value]
FROM [dbo].[Customer] AS [t0]
) AS [t2]
WHERE ([t2].[value] % @p0) = @p1
Here's an option that works, but it might be worth checking that it doesn't have any performance issues in practice:
var nth = 3;
var ids = Table
.Select(x => x.Id)
.ToArray()
.Where((x, n) => n % nth == 0)
.ToArray();
var nthRecords = Table
.Where(x => ids.Contains(x.Id));
Just googling around a bit, I haven't found (or experienced) an option for LINQ to SQL to directly support this.
The only option I can offer is that you write a stored procedure with the appropriate SQL query and then call the sproc via LINQ to SQL. Not the best solution, especially if you have any kind of complex filtering going on.
There really doesn't seem to be an easy way to do this:
How do I add ROW_NUMBER to a LINQ query or Entity?
How to find the ROW_NUMBER() of a row with Linq to SQL
But there's always:
peopleToFilter.AsEnumerable().Where((x,i) => i % AmountToSkipBy == 0)
NOTE: This still doesn't execute on the database side of things!
This will do the trick, but it isn't the most efficient query in the world:
var count = query.Count();
var pageSize = 10;
var pageTops = query.Take(1);
for(int i = pageSize; i < count; i += pageSize)
{
pageTops = pageTops.Concat(query.Skip(i - (i % pageSize)).Take(1));
}
return pageTops;
It dynamically constructs a query to pull the (nth, 2*nth, 3*nth, etc.) value from the given query. If you use this technique, you'll probably want to cap it at maybe ten or twenty names, similar to how Google pages its results (1-10, then Next), to avoid building an expression so large the database refuses to attempt to parse it.
If you need better performance, you'll probably have to use a stored procedure or a view to represent your query, and include the row number as part of the stored proc results or the view's fields.
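As a sketch of the view approach (all names hypothetical): expose ROW_NUMBER() as a column of the view, map an entity to it, and filter with the modulo operator, which does translate to SQL:

// View created once in the database:
//   CREATE VIEW dbo.NumberedPeople AS
//     SELECT Id, Name, ROW_NUMBER() OVER (ORDER BY Name) AS RowNum
//     FROM dbo.People;

public class NumberedPerson
{
    public int Id { get; set; }
    public string Name { get; set; }
    public long RowNum { get; set; } // ROW_NUMBER() is bigint
}

// Every 3rd row starting with the first (rows 1, 4, 7, ...),
// evaluated entirely on the database side:
var pageTops = db.NumberedPeople
    .Where(p => p.RowNum % 3 == 1)
    .OrderBy(p => p.RowNum)
    .ToList();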