IQueryable<>.ToString() too slow - c#

I'm using BatchDelete found on the answer to this question: EF Code First Delete Batch From IQueryable<T>?
The method seems to be wasting too much time building the delete clause from the IQueryable. Specifically, deleting 20.000 elements using the IQueryable below is taking almost two minutes.
context.DeleteBatch(context.SomeTable.Where(x => idList.Contains(x.Id)));
All the time is spent on this line:
var sql = clause.ToString();
The line is part of this method, available on the original question linked above but pasted here for convenience:
private static string GetClause<T>(DbContext context, IQueryable<T> clause) where T : class
{
const string Snippet = "FROM [dbo].[";
var sql = clause.ToString();
var sqlFirstPart = sql.Substring(sql.IndexOf(Snippet, System.StringComparison.OrdinalIgnoreCase));
sqlFirstPart = sqlFirstPart.Replace("AS [Extent1]", string.Empty);
sqlFirstPart = sqlFirstPart.Replace("[Extent1].", string.Empty);
return sqlFirstPart;
}
I imagine making context.SomeTable.Where(x => idList.Contains(x.Id)) into a compiled query could help, but AFAIK you can't compile queries while using DbContext on EF 5. In theory they should be cached, but I see no sign of improvement on a second execution of the same BatchDelete.
Is there a way to make this faster? I would like to avoid manually building the SQL delete statement.

The IQueryable isn't cached and each time you evaluate it you're going out to SQL. Running ToList() or ToArray() on it will evaluate it once and then you can work with the list as the cached version.
If you want to preserve your interfaces, you'd use ToList().AsQueryable() and this would pass in a cached version.
Related post.
How do I cache an IQueryable object?

It seems there is no way to cache the IQueryable in this case, because the query contains a list of ids to check against and the list changes in every call.
The only way I found to avoid the two minute delay in building the query every time I had to mass-delete objects was to use ExecuteSqlCommand as below:
var list = string.Join("','", ids.Select(x => x.ToString()));
var qry = string.Format("DELETE FROM SomeTable WHERE Id IN ('{0}')", list);
context.Database.ExecuteSqlCommand(qry);
I'll mark this as the answer for now. If any other technique is suggested that doesn't rely on ExecuteSqlCommand, I'll gladly change the answer.
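If you go the raw SQL route, it is probably worth passing the ids as parameters rather than formatting them into the string. The following is only a sketch: BatchDeleteSql and Build are hypothetical names, the chunk size of 1000 is an assumption (SQL Server allows at most 2100 parameters per command), and the table and column names are concatenated into the SQL, so they must come from your own code and never from user input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchDeleteSql
{
    // Hypothetical helper: splits the ids into chunks and builds one
    // parameterized DELETE statement (placeholders @p0, @p1, ...) per chunk.
    // Each entry pairs the SQL text (Key) with its parameter values (Value).
    public static List<KeyValuePair<string, object[]>> Build<T>(
        string table, string keyColumn, IEnumerable<T> ids, int chunkSize = 1000)
    {
        if (chunkSize < 1) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        List<T> all = ids.ToList();
        var batches = new List<KeyValuePair<string, object[]>>();
        for (int start = 0; start < all.Count; start += chunkSize)
        {
            // Values for this chunk, boxed so they can be passed as parameters.
            object[] values = all.Skip(start).Take(chunkSize)
                                 .Select(id => (object)id).ToArray();
            string placeholders = string.Join(", ", values.Select((value, i) => "@p" + i));
            string sql = "DELETE FROM [" + table + "] WHERE [" + keyColumn + "] IN (" + placeholders + ")";
            batches.Add(new KeyValuePair<string, object[]>(sql, values));
        }
        return batches;
    }
}
```

Each batch could then be run with something like context.Database.ExecuteSqlCommand(batch.Key, batch.Value), which in EF 6 accepts positional values for @p0, @p1 and so on.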

There is an EF pattern that works OK.
It uses a projection to return ONLY the keys from the DB (projections are not added to the context),
so this is pretty quick.
Then you load the context with key-only stub POCOs, and light the fuse....
Basically:
var deleteMagazine = context.Set<DeadMeat>().Where(t => t.IhateYou == true).Select(t => t.THEKEY).ToList();
var batchSize = 1000; // performance: trial different values
var count = 0;
foreach (var bullet in deleteMagazine)
{
// Now instantiate a dummy POCO with the KEY only, attach it, and mark it as deleted
var stub = new DeadMeat { THEKEY = bullet };
context.Set<DeadMeat>().Attach(stub);
context.Set<DeadMeat>().Remove(stub);
// consider saving changes every 1000 records
if (++count % batchSize == 0)
context.SaveChanges();
}
// shoot anyone still moving
context.SaveChanges();
check SQL server profiler....

Related

Linq ToList() and Count() performance problem

I have 200k rows in my table and I need to filter the table and then show the result in a datatable. The SQL itself runs fast, but getting the row count or calling ToList() takes a long time. The list has only 15 rows after the filter, so it is not a huge amount of data.
public static List<Books> GetBooks()
{
List<Books> bookList = new List<Books>();
var v = from a in ctx.Books select a;
int allBooksCount = v.Count(); // I need all books count before filter. but it is so slow is my first problem
if (isFilter)
{
v = v.Where(a => a.startdate <= DateTime.Now && a.enddate>= DateTime.Now);
}
.
.
bookList = v.ToList(); // also toList is so slow is my second problem
}
There's nothing wrong with the code you've shown. So either you have some trouble in the database itself, or you're ruining the query by using IEnumerable instead of IQueryable.
My guess is that either ctx.Books is IEnumerable<Books> (instead of IQueryable<Books>), or that the Count (and Where etc.) method you're calling is the Enumerable version, rather than the Queryable version.
Which version of Count are you actually calling?
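To illustrate the difference, here is a small sketch that uses an in-memory array wrapped with AsQueryable() as a stand-in for a DbSet (that substitution is my assumption; OverloadDemo and its method names are made up). The same object picks a different Where overload depending only on the static type of the variable:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class OverloadDemo
{
    // Variable typed as IQueryable: the compiler binds to Queryable.Where,
    // so the result is still an IQueryable that a provider could translate.
    public static bool StaysQueryable()
    {
        IQueryable<int> query = new[] { 1, 2, 3 }.AsQueryable();
        return query.Where(x => x > 1) is IQueryable<int>;
    }

    // Same kind of source, but typed as IEnumerable: the compiler now binds
    // to Enumerable.Where, which filters in memory.
    public static bool StaysQueryableWhenTypedAsEnumerable()
    {
        IEnumerable<int> sequence = new[] { 1, 2, 3 }.AsQueryable();
        return sequence.Where(x => x > 1) is IQueryable<int>;
    }
}
```

The first method returns true and the second returns false. With a real context, the second shape means every row is pulled from the database before Count or Where is applied.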
First, to get help you need to provide quantitative values for "fast" vs. "too long". Loading entities from EF will take longer than running a raw SQL statement in a client tool like TOAD etc. Are you seeing differences of 15ms vs. 15 seconds, or 15ms vs. 150ms?
To help identify and eliminate possible culprits:
Eliminate the possibility of a long-running DbContext instance tracking too many entities bogging down performance. The longer a DbContext is used and the more entities it tracks, the slower it gets. Temporarily change the code to:
List<Books> bookList = new List<Books>();
using (var context = new YourDbContext())
{
var v = from a in context.Books select a;
int allBooksCount = v.Count(); // I need all books count before filter. but it is so slow is my first problem
if (isFilter)
{
v = v.Where(a => a.startdate <= DateTime.Now && a.enddate>= DateTime.Now);
}
.
.
bookList = v.ToList();
}
Using a fresh DbContext ensures queries are not sifting through in-memory entities after running a query to find tracked instances to return. This also ensures we are running against IQueryable off the Books DbSet within the context. We can only guess what "ctx" in your code actually represents.
Next: Look at a profiler for MySQL, or have your database log SQL statements, to capture exactly what EF is requesting. Check that the Count and the ToList each trigger just one query against the database, and then run these exact statements against the database. If more than these two queries are being run, then something odd is happening behind the scenes that you need to investigate; for instance, your example may not really represent what your real code is doing. You could be tripping client-side evaluation (if using EF Core 2) or lazy loading. Next, if possible, I would look at the execution plan for these queries for hints such as missing indexes. (My DB experience is primarily with SQL Server, so I cannot advise on tools to use for MySQL.)
I would log the actual SQL queries here. You can then use DESCRIBE to look at how many rows it hits. There are various tools that can further analyse the queries if DESCRIBE isn't sufficient. This way you can see whether it's the queries or the (lack of) indices that is the problem. Next step has to be guided by that.

Speeding up LINQ queries (selecting data from table)

I have written a code which looks like this:
using(var ctx = new myentitiesContext())
{
var currentLoggedUser = ctx.Users.FirstOrDefault(x=>x.Email==User.Identity.Name);
var items = ctx.Items.Where(x=>x.Sales>0 && x.UserId==currentLoggedUser.UserId).ToList();
}
As you can see it's a simple select from the DB. But the tricky part is that sometimes I can select a large quantity of data (50-100k records at a time). So what I've been wondering, are there any ways to tweak the LINQ to perform faster when the data is being pulled out of the table?
I've already created indexes in my table on FK UserId, so that part is done.
My question here is, is there any way to speed up LINQ queries via some tweaks in context configuration section, or perhaps by creating compiled queries, or via some other method ?
P.S. guys, would something like this work good:
ctx.Configuration.AutoDetectChangesEnabled = false;
// my queries...
ctx.Configuration.AutoDetectChangesEnabled = true;
In addition to what the other users have written, you could disable lazy loading. That way, if the Items table has references to other tables, the related entities will not be loaded implicitly; you load them only when you actually need them. Check these links:
thecodegarden
mehdi
One more thing I would recommend is to log the SQL queries that your LINQ expressions generate and try to optimise them with your DBA. You could do this by assigning an Action<string> delegate to DbContext.Database.Log, which will emit everything between a connection.Open() and a connection.Close(). You could also get the SQL query out of your IQueryable or IQueryable<T> by calling .ToString() on the IQueryable variable.
You should make projection first. For example, this:
var items = ctx.Items.Where(x=>x.Sales>0 && x.UserId==currentLoggedUser.UserId).ToList();
will be better if you write it like this:
var items = ctx.Items.Where(x => x.UserId == currentLoggedUser.UserId).Where(x2 => x2.Sales > 0).ToList();
And if you don't need the whole object, use a "Select" clause before the "Where" and project just the properties that you need to minimize the cost, like this:
ctx.Items.Select(e => new { e.UserId, e.Sales }).Where(x => x.UserId == currentLoggedUser.UserId).Where(x2 => x2.Sales > 0).ToList();

How to save and reuse an IQueryable, or its Where clause?

I have an IQueryable that has a Where clause with many parameters. I could save each parameter to the ASP.NET session and recreate the IQueryable from zero, but I figured is easier to save only one parameter to the session: the IQueryable or at least the where clause of the IQueryable.
How to do this?
The query:
IQueryable<DAL.TradeCard> data = dc.TradeCard.Include("Address").Include("Vehicle");
data = data.Where(it =>
(tbOrderNumber.Text == null || it.orderNumber == tbOrderNumber.Text) &&
(tbPlateNumber.Text == null || it.Vehicle.plateNumber == tbPlateNumber.Text));
(now there are only 2 but there will be many more parameters)
PS: I don't want to save the query result to the session.
I'm not sure if this will work, but you could try to use DataContext.GetCommand to get the SQL command for the query and then save it to use it with DataContext.ExecuteQuery.
I had whole big answer ready but I misunderstood what you're trying to achieve :)
Ok, so let me write out what I think you're trying to do:
Server gets request ("get data for paramA = 1, paramB = 2, ... paramZ = 24")
Server runs a series of "Where" and gets a filtered result set
Server sends data to client
Client does some stuff that operates on the same set of filtered data. But you don't want server to re-run the query! And you cannot save the data to client's session cause it's a lot of records.
Until the client explicitly calls the query with different params, the query should not be re-run
I was working on a similar problem lately, but there isn't one magic bullet :)
Some ideas for the solution:
Cache the list of IDs. Unless the data runs to hundreds of thousands of records, you can probably save the IDs of the selected items to the session. It's what, 4-8 bytes per ID + overhead? That does re-run the query, just more efficiently: data = source.Include("...").Where(i => IdsFromSession.Contains(i.Id));
(Added in edit) Cache the query input string/object/however your search values are passed. You can probably fairly easily serialize it and use that (or hash of that) as a server-side cache key.
(I love the idea, but it's a bit wonky :) ) Cache the Wheres! It works like this:
Create method that takes expressions instead of Funcs
Write your where lambdas to that method and have that magic method actually return Funcs for "Where"
Get a unique hash for the where lambdas
Check the server-side cache for that hash, if needed run the query and save the results under that hash.
Now, this is a huge overkill and overengineered solution, but I am personally dying to actually implement this. It would look like this:
class MagicClass{ // don't have time for name-inventing :)
private List<string> hashes = new List<string>();
public string Hash {get{ return String.Join("_", hashes);}}
public Func<TIn,bool> MagicWhere<TIn>(Expression<Func<TIn,bool>> where){
var v = new MagicExpressionVisitor();
v.Visit(where);
hashes.Add(v.ExpressionHash);
return where.Compile(); // I think that should do...
}
}
class MagicExpressionVisitor : ExpressionVisitor
{
public string ExpressionHash {get;set;}
// Override ExpressionVisitor methods to get a possibly unique hash depending on what's actually in that expression
}
Usage:
var magic = new MagicClass();
data = data.Where(magic.MagicWhere<DAL.TradeCard>(it => it.IsSomething && it.Name != "some name"));
if(!Cache.HasKey(magic.Hash))
Cache[magic.Hash] = data.ToList();
return Cache[magic.Hash];
This is obviously untested, but doable. It won't execute the Wheres on the data source if such a query was already run (within the cache period). It has two advantages over caching IDs: 1. It works for many clients simultaneously (so a second client benefits from the first one having requested the same query). 2. It doesn't touch the data source at all.
If the datasource is DB, you can probably find out what is the actual SQL command being run, SHA-### it and save results under that hash, but my crazy solution works for other datasources, like LINQ-to-Objects etc. ;)
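The simpler idea from the list above, keeping only the search inputs and rebuilding the query from them, could look roughly like the sketch below. TradeCardRow and TradeCardFilter are hypothetical stand-ins (not the asker's DAL.TradeCard), and the in-memory AsQueryable() source in the test is just for illustration; the point is that the small filter object is what goes into the session, not the IQueryable.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the entity being queried.
public class TradeCardRow
{
    public string OrderNumber { get; set; }
    public string PlateNumber { get; set; }
}

// Small serializable object that holds only the search inputs.
[Serializable]
public class TradeCardFilter
{
    public string OrderNumber { get; set; }
    public string PlateNumber { get; set; }

    // Rebuilds the Where clauses from the stored inputs; empty inputs are skipped.
    public IQueryable<TradeCardRow> Apply(IQueryable<TradeCardRow> source)
    {
        if (!string.IsNullOrEmpty(OrderNumber))
        {
            string orderNumber = OrderNumber; // copy to a local so the lambda captures a plain value
            source = source.Where(r => r.OrderNumber == orderNumber);
        }
        if (!string.IsNullOrEmpty(PlateNumber))
        {
            string plateNumber = PlateNumber;
            source = source.Where(r => r.PlateNumber == plateNumber);
        }
        return source;
    }
}
```

On a later request you would read the filter back from the session and call Apply on a fresh query, so nothing tied to a disposed context is kept around.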

EF LINQ ToList is very slow

I am using ASP NET MVC 4.5 and EF6, code first migrations.
I have this code, which takes about 6 seconds.
var filtered = _repository.Requests.Where(r => some conditions); // this is fast, conditions match only 8 items
var list = filtered.ToList(); // this takes 6 seconds, has 8 items inside
I thought that this is because of relations, it must build them inside memory, but that is not the case, because even when I return 0 fields, it is still as slow.
var filtered = _repository.Requests.Where(r => some conditions).Select(e => new {}); // this is fast, conditions match only 8 items
var list = filtered.ToList(); // this takes still around 5-6 seconds, has 8 items inside
Now the Requests table is quite complex, lots of relations and has ~16k items. On the other hand, the filtered list should only contain proxies to 8 items.
Why is ToList() method so slow? I actually think the problem is not in ToList() method, but probably EF issue, or bad design problem.
Anyone has had experience with anything like this?
EDIT:
These are the conditions:
_repository.Requests.Where(r => ids.Any(a => a == r.Student.Id) && r.StartDate <= cycle.EndDate && r.EndDate >= cycle.StartDate)
So basically, I am checking if the Student id is in my id list and checking if the dates match.
Your filtered variable contains a query which is a question, and it doesn't contain the answer. If you request the answer by calling .ToList(), that is when the query is executed. And that is the reason why it is slow, because only when you call .ToList() is the query executed by your database.
It is called Deferred execution. A google might give you some more information about it.
If you show some of your conditions, we might be able to say why it is slow.
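A small way to see deferred execution for yourself, sketched with LINQ to Objects instead of EF (DeferredDemo is a made-up name, and the counter only exists to make the timing visible):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Returns { predicate calls before ToList, predicate calls after ToList, result count }.
    public static int[] Run()
    {
        int evaluated = 0;
        int[] source = { 1, 2, 3, 4 };

        // Defining the query runs nothing; the predicate has not been called yet.
        IEnumerable<int> query = source.Where(x => { evaluated++; return x % 2 == 0; });
        int before = evaluated;

        // ToList() is the point where the query is actually executed.
        List<int> list = query.ToList();
        int after = evaluated;

        return new[] { before, after, list.Count };
    }
}
```

Run() returns 0, 4 and 2: no work is done when the Where is declared, and all of it happens inside ToList(). With EF the same applies, except that the work is the round trip to the database.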
In addition to Maarten's answer, I think the problem comes down to one of two situations:
The condition is complex and results in complex, heavy joins or queries in your database.
The condition filters on a column which does not have an index, which causes a full table scan and makes your query slow.
I suggest you start by monitoring the query generated by Entity Framework. It's very simple: you just need to set the Log function of your context and see the results:
using (var context = new MyContext())
{
context.Database.Log = Console.Write;
// Your code here...
}
If you see something strange in the generated query, try to improve it by breaking it into parts; sometimes the queries Entity Framework generates are not so good.
If the query is okay, then the problem lies in your database (assuming no network problem).
Run your query with an SQL profiler and check what's wrong.
UPDATE
I suggest you to:
add index for StartDate and EndDate Column in your table (one for each, not one for both)
ToList executes the query against DB, while first line is not.
Can you show some conditions code here?
To increase the performance you need to optimize query/create indexes on the DB tables.
Your first line of code only returns an IQueryable. This is a representation of a query that you want to run, not the result of the query. The query itself only runs on the database when you call .ToList() on your IQueryable, because that is the first point at which you have actually asked for data.
Your adjustment to add the .Select only adds to the existing IQueryable query definition. It doesn't change what conditions have to execute. You have essentially changed the following, where you get back 8 records:
select * from Requests where [some conditions];
to something like:
select '' from Requests where [some conditions];
You will still have to perform the full query with the conditions giving you 8 records, but for each one, you only asked for an empty string, so you get back 8 empty strings.
The long and the short of this is that any performance problem you are having is coming from your "some conditions". Without seeing them, it is difficult to know. But I have seen people in the past add .Where clauses inside a loop before calling .ToList(), inadvertently creating a massively complicated query.
Jaanus, the most likely reason for this issue is the complexity of the SQL query generated by Entity Framework. I guess that your filter condition involves checks against other tables.
Try checking the generated query with "SQL Server Profiler". Then copy this query into "Management Studio" and check the "Estimated execution plan". As a rule, "Management Studio" generates index recommendations for your query; try to follow them.

Is there a wildcard for the .Take method in LINQ?

I am trying to create a method using LINQ that would take X amount of products from the DB, so I am using the .Take method for that.
The thing is, in some situations I need to take all the products, so is there a wildcard I can give to .Take, or some other method that would bring me all the products in the DB?
Also, what happens if I do a .TAKE (50) and there are only 10 products in the DB?
My code looks something like :
var ratingsToPick = context.RatingAndProducts
.ToList()
.OrderByDescending(c => c.WeightedRating)
.Take(pAmmount);
You could split it into a separate call based on a flag:
IQueryable<RatingAndProducts> ratingsToPick = context.RatingAndProducts
.OrderByDescending(c => c.WeightedRating);
if (!takeAll)
ratingsToPick = ratingsToPick.Take(pAmmount);
var results = ratingsToPick.ToList();
If you don't include the Take, then it will simply take everything.
Note that you may need to type your original query as IQueryable<MyType>, as OrderByDescending returns an IOrderedQueryable and the result of the Take call can't be assigned back to it. Typing it as IEnumerable<MyType> would also compile, but Take would then run in memory instead of in the database. (Or you can simply work around this as appropriate based on your actual code.)
Also, as @Rene147 pointed out, you should move your ToList to the end; otherwise it will retrieve all items from the database every time, and the OrderByDescending and Take then operate on a List<> of objects in memory rather than running as a database query, which I assume is unintended.
Regarding your second question if you perform a Take(50) but only 10 entries are available. That might depend on your database provider, but in my experience, they tend to be smart enough to not throw exceptions and will simply give you whatever number of items are available. (I would suggest you perform a quick test to make sure for your specific case)
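For the in-memory case at least, the behaviour is easy to confirm. This is a minimal sketch with LINQ to Objects (TakeDemo is a hypothetical name); it says nothing definitive about any particular database provider, although Take is normally translated to a row limit that behaves the same way:

```csharp
using System.Linq;

public static class TakeDemo
{
    // Builds a sequence of `available` items and asks for `requested` of them.
    public static int CountAfterTake(int available, int requested)
    {
        return Enumerable.Range(1, available).Take(requested).Count();
    }
}
```

TakeDemo.CountAfterTake(10, 50) returns 10: Take yields whatever is there and does not throw when fewer items exist.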
Your current solution always takes all products from the database, because you are calling ToList(). After loading all products from the database, you take the first N in memory. In order to conditionally load only the first N products, you need to build the query first:
int? countToTake = 50;
IQueryable<RatingAndProducts> ratingsToPick = context.RatingAndProducts
.OrderByDescending(c => c.WeightedRating);
// conditionally take only first results
if (countToTake.HasValue)
ratingsToPick = ratingsToPick.Take(countToTake.Value);
var result = ratingsToPick.ToList(); // execute query
