Speeding up LINQ queries (selecting data from a table) - C#

I have written code that looks like this:
using (var ctx = new myentitiesContext())
{
    var currentLoggedUser = ctx.Users.FirstOrDefault(x => x.Email == User.Identity.Name);
    var items = ctx.Items.Where(x => x.Sales > 0 && x.UserId == currentLoggedUser.UserId).ToList();
}
As you can see it's a simple select from the DB. The tricky part is that sometimes I select a large quantity of data (50-100k records at a time). Are there any ways to tweak the LINQ so it performs faster when that much data is being pulled out of the table?
I've already created an index on the UserId foreign key in my table, so that part is done.
My question is: is there any way to speed up LINQ queries via tweaks in the context configuration, by creating compiled queries, or via some other method?
P.S. Would something like this work well:
ctx.Configuration.AutoDetectChangesEnabled = false;
// my queries...
ctx.Configuration.AutoDetectChangesEnabled = true;

In addition to what the other users have written: you could disable lazy loading. That way, if the Items table references other tables, the related entities will not be loaded along with the Items unless you absolutely need them. Check these links:
thecodegarden
mehdi
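As an illustration, here is a minimal sketch of turning lazy loading off per context instance (assuming the EF 5/6-style DbContext from the question):

using (var ctx = new myentitiesContext())
{
    // Don't load navigation properties unless explicitly requested
    ctx.Configuration.LazyLoadingEnabled = false;
    ctx.Configuration.ProxyCreationEnabled = false;

    var items = ctx.Items
        .Where(x => x.Sales > 0)
        .ToList(); // entities referenced by Items stay unloaded
}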
One more thing I would recommend is to log the SQL queries that your LINQ expressions create and try to optimise them with your DBA. You can do this by assigning an Action<string> delegate to DbContext.Database.Log, which will emit everything between a connection.Open() and a connection.Close(). You can also get the SQL out of an IQueryable or IQueryable<T> by calling the .ToString() method on your IQueryable variable.
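A hedged sketch of both techniques (assuming EF6, where Database.Log is available):

using (var ctx = new myentitiesContext())
{
    // Emits every command, with timings, between connection open and close
    ctx.Database.Log = sql => System.Diagnostics.Debug.WriteLine(sql);

    var query = ctx.Items.Where(x => x.Sales > 0);
    var generatedSql = query.ToString(); // the SELECT that EF would run
    var items = query.ToList();          // executes; the Log delegate fires here
}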

You should do the projection first. For example, this:
var items = ctx.Items.Where(x=>x.Sales>0 && x.UserId==currentLoggedUser.UserId).ToList();
will be better if you write it like this:
var items = ctx.Items.Where(x => x.UserId == currentLoggedUser.UserId).Where(x2 => x2.Sales > 0).ToList();
And if you don't need the whole object, use a Select before the Where and project just the properties you need, to minimize the cost, like this:
ctx.Items.Select(e => new { e.UserId, e.Sales }).Where(x => x.UserId == currentLoggedUser.UserId).Where(x2 => x2.Sales > 0).ToList();


How does LINQ actually execute the code to retrieve data from the data source?

I will start working on Xamarin shortly and will be transferring a lot of code from Android Studio's Java to C#.
In Java I am using custom classes which are given arguments, conditions, etc., convert them to SQL statements, and then load the results into the objects of the project's model.
What I am unsure of is whether LINQ is a better option for filtering such data.
For example, what would happen currently is something along these lines:
List<Customer> customers = (new CustomerDAO()).get_all();
Or if I have a condition:
List<Customer> customers = (new CustomerDAO()).get(new Condition(CustomerDAO.Code, equals, "code1"));
Now let us assume I have transferred the classes to C# and I wish to do something similar to the second case.
So I will probably write something along the lines of:
var customers = from customer in (new CustomerDAO()).get_all()
                where customer.code.Equals("code1")
                select customer;
I know that the query will only be executed when I actually try to access customers, but if I have multiple accesses to customers (let us say I use 4 foreach loops later on), will the get_all method be called 4 times? Or are the results stored at the first execution?
Also, is it more efficient (time-wise; memory-wise it probably is not) to just keep the get_all() method and use LINQ to filter the results? Or to use my existing setup, which in effect executes
Select * from Customers where code = 'code1'
And loads the results into an object?
Thanks in advance for any help you can provide.
Edit: yes, I do know there is sqlite.net, which pretty much does what my DAOs do (but probably better), and at some point I will probably convert all my objects to use it. I just need to know for the sake of knowing.
if I have multiple accesses to customers (let us say I use 4 foreach loops later on), will the get_all method be called 4 times? Or are the results stored at the first execution?
Each time you enumerate the query (using foreach in your example), it will re-execute, unless you store the materialized result somewhere. For example, if on the first access you'd do:
var customerSource = new CustomerDAO().get_all();
List<Customer> customers = customerSource
    .Where(customer => customer.Code.Equals("code1"))
    .ToList();
From then on you'll be working with an in-memory List<Customer> without executing the query over again.
On the contrary, if each time you'd do:
var filteredCustomers = customerSource.Where(customer => customer.Code.Equals("code1"));
foreach (var customer in filteredCustomers)
{
// Do stuff
}
Then for each enumeration you'll be executing that query over again.
Also, is it more efficient (time-wise; memory-wise it probably is not) to just keep the get_all() method and use LINQ to filter the results? Or to use my existing setup, which in effect executes
That really depends on your use case. Let's imagine you were using LINQ to EF and the customer table had a million rows: do you really want to bring all of them into memory and only then filter them down to the subset you need? It is usually better to run the fully filtered query against the database.
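To make that concrete, a rough sketch contrasting the two approaches (assuming an EF-style context with a Customers set):

// Filtering in the database: only the matching rows cross the wire
var filtered = ctx.Customers
    .Where(c => c.Code == "code1")
    .ToList();

// Filtering in memory: every row is fetched first, then filtered client-side
var all = ctx.Customers.ToList();
var filteredInMemory = all.Where(c => c.Code == "code1").ToList();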

Efficiently paging large data sets with LINQ

When looking into the best ways to implement paging in C# (using LINQ), most suggestions are something along these lines:
// Execute the query
var query = db.Entity.Where(e => e.Something == something);
// Get the total num records
var total = query.Count();
// Page the results
var paged = query.Skip((pageNum - 1) * pageSize).Take(pageSize);
This seems to be the commonly suggested strategy (simplified).
For me, the main purpose of paging is efficiency. If my table contains 1.2 million records where Something == something, I don't want to retrieve all of them at the same time. Instead, I want to page the data, grabbing as few records as possible. But with this method, it seems that this is a moot point.
If I understand it correctly, the first statement is still retrieving the 1.2 million records, then it is being paged as necessary.
Does paging in this way actually improve performance? If the 1.2 million records are going to be retrieved every time, what's the point (besides the obvious UI benefits)?
Am I misunderstanding this? Any .NET gurus out there who can give me a lesson on LINQ, paging, and performance (when dealing with large data sets)?
The first statement does not execute the actual SQL query; it only builds the part of the query you intend to run.
It is when you call query.Count() that the first query will be executed:
SELECT COUNT(*) FROM Table WHERE Something = something
Calling query.Skip().Take() won't execute the query either; it is only when you try to enumerate the results (doing a foreach over paged or calling .ToList() on it) that the appropriate SQL statement executes, retrieving only the rows for the page (using ROW_NUMBER).
If you watch this in SQL Profiler you will see that exactly two queries are executed, and at no point does it try to retrieve the full table.
Be careful when using the debugger: if you step past the first statement and inspect the contents of query, that will execute the SQL query. Maybe that is the source of your misunderstanding.
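Putting the answer together, a hedged sketch of the full pattern (the OrderBy is my addition, since EF requires an explicit ordering before Skip, and Id is an assumed key); only two narrow queries reach the database:

var query = db.Entity.Where(e => e.Something == something); // builds the query, no SQL yet

var total = query.Count(); // executes SELECT COUNT(*) ...

var paged = query
    .OrderBy(e => e.Id)              // paging needs a deterministic order
    .Skip((pageNum - 1) * pageSize)
    .Take(pageSize)
    .ToList();                       // executes the paged SELECT (ROW_NUMBER)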
// Execute the query
var query = db.Entity.Where(e => e.Something == something);
For your information, no call to the database has been made after the first statement.
// Get the total num records
var total = query.Count();
This count query will be translated to SQL and will make a call to the database.
This call will not fetch all the records, because the generated SQL is something like this:
SELECT COUNT(*) FROM Entity WHERE Something LIKE 'something'
The last query doesn't fetch all the records either. It is translated into SQL, and the paging runs in the database.
Maybe you'll find this question useful: efficient way to implement paging
I believe Entity Framework structures the SQL query with the appropriate conditions based on the LINQ statements (e.g. using ROW_NUMBER() OVER ...).
I could be wrong on that, however. I'd run SQL Profiler and see what the generated query looks like.

Entity Framework and LINQ string comparison issue

I have a problem with LINQ. Let's look at the code.
I have an Article class:
public class Article
{
    public string Tag { get; set; }
}
I save the tags of each article in a single string, separated by commas, for example: first,second,third.
When I fetch an article, I also want to get the articles that share any tag with it.
I use this query:
var relatedArticles =
    _db.Articles.Where(a => a.Tag
        .Split('،')
        .Any(t => article.Tag
            .Split('،')
            .Any(ac => ac == t)));
but I am getting this exception:
LINQ to Entities does not recognize the method 'System.String[] Split(Char[])' method
Any other way?
Update:
I can't keep the tags in a separate table because I must let the user create as many tags as they want when inserting an article,
say 50, and it would be an overhead to check whether each tag already exists when saving the article to the DB.
Set your class as follows:
public class Article
{
    public List<string> Tag { get; set; }
}
I'm not 100% sure what your 2nd statement does, but you can use a.Tag.Contains() to check your values.
I think my comment could be worth posting as an answer to your problem, so I'm writing it down as one :)
You should think about your table / class design.
You need to normalize it, because what you have is an n:m relationship. Keep the articles in one table, reference them from a mapping table that holds the articleId and the tagId, and then have one table of tags with tagId as the primary key.
If a tag changes in the future, you don't need to update every article; you just update that particular tag and it changes for every article.
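As a sketch of that normalized design (class and property names are illustrative, not from the question):

public class Article
{
    public int ArticleId { get; set; }
    public virtual ICollection<Tag> Tags { get; set; }
}

public class Tag
{
    public int TagId { get; set; }
    public string Name { get; set; }
    public virtual ICollection<Article> Articles { get; set; }
}

With EF Code First this maps to an n:m relationship through a join table, and renaming a tag becomes a single-row update.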
The "a.Tag.Split('،')" is a node in the expression tree that your IQueryable is, not an actual operation. When you materialize results by calling something like ToList, the whole expression tree is translated into SQL expression before execution - this is the point of error because translator doesnt have an idea how to convert Split() method into sql statement.
As an alternative, you can retrieve all results into app, materialize them by ToList() and then do what you want with them as IEnumerables. Also, you can write a stored procedure and pass search tags array into there.
Also, this might work: try to pass a simple array of values (not method calls) into the query, so the resulting SQL looks like
"WHERE ... IN ...".
It means that LINQ to Entities failed to find a translation of the Split method that can be written as a SQL query. If you want to use Split, you have to bring the records into memory by calling ToList(), AsEnumerable(), etc.
But the better approach would be to create a separate table for tags in your DB.
The LINQ query would look something like this (supposing a many-to-many relationship between articles and tags):
var relatedArticles =
    _db.Articles.Where(a => a.Tags.Any(t => t.Articles.Count() > 1));
// if an Article has a Tag that is assigned to more than one Article, it suits us
You can't call regular methods like string.Split() directly in LINQ when working with EF, since they can't be translated to SQL.
You could append AsEnumerable() to your query to make LINQ fetch the data, allowing you to perform such operations on it afterwards. Unfortunately, that will not let you do what you want without fetching the whole list, which I'm sure you would rather avoid.
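That variant would look roughly like this (a sketch; it pulls every article into memory before filtering):

var relatedArticles = _db.Articles
    .AsEnumerable() // everything after this runs in memory, so Split is allowed
    .Where(a => a.Tag.Split('،')
        .Any(t => article.Tag.Split('،').Contains(t)))
    .ToList();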
Perhaps you could do something like this instead:
List<string> tagsForCurrentArticle = ... // fetch this first somehow
var relatedArticles = _db.Articles.Where(a =>
    tagsForCurrentArticle.Any(tag =>
        a.Tag.Contains(tag)));
Note, just to be clear: This should work, but the better option, if possible, would be to move your tags out into a separate table, as others have suggested.

Using LINQ to return the count of items from a query along with its result set

I am using C# MVC4 with LINQ.
I have used dependency injection for my project, which resulted in me having a separate Models project along with a separate Repository project (and one for testing, etc.). All of this works, no problem.
I moved my queries out of the controllers (old style) and into the repository (new DI style) and injected them. It works fine.
I have a standard LINQ query (pick any example, they are basic enough) which returns a set of items from the database as normal. No problems here either.
My problem is that I want to implement paging, and I thought it would be simple enough. Here are my steps:
Take the results of the LINQ query from the repository (injected into the controller) and store them in a var. It looks something like:
var results = _someInjectedCode.GetListById(SomeId);
Before, I was able to do something simple like:
results.Count()
results.Skip(SomeNum).Take(SomeOtherNum)
But now that I want paging, I need to do my Skip/Take something like this:
var results = from a in _someInjectedCode.GetListById(SomeId).Skip(SomeNum).Take(SomeOtherNum)
              select new { a.Id, a.fName, a.lName, ..... };
The problem with this is that I no longer have access to the total count of items before the list was shortened by the Skip...Take, unless I do two queries, which means hitting the DB twice.
What is the best way to resolve this issue?
I just do it like this:
var result = (from n in mycollection
              where n.someprop == "some value"
              select n).ToList();
var count = result.Count;
There are probably other ways, but this is the simplest that I know of.
Thinking about it from a SQL point of view, I can't think of a way to retrieve both the total count and a subset of the data in a single normal query, so I don't think you will be able to do it in LINQ either.
To avoid creating two separate commands, the only thing I can think of is a stored proc that returns two tables (one with just the count, the other with your subset of results). It would still execute two queries, but within a single connection. You'd lose your LINQ, though. So if you want to keep your LINQ query, you might be stuck with making two separate calls.
The other way is to retrieve the entire unpaged result set into memory and then run your Take and Skip against the array, but this is pretty wasteful and probably worse than two calls.
You can either add parameters to your repository interface/class that provide the paging values and return the count alongside the result, or change your interfaces to return IQueryable and apply Count() and then Skip/Take before the query is compiled and sent for execution.
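A minimal sketch of the second option (the interface and names are illustrative):

public interface IItemRepository
{
    IQueryable<Item> GetListById(int someId); // deliberately IQueryable, not List
}

// In the controller, both calls compose into narrow SQL queries:
var query = _someInjectedCode.GetListById(SomeId);
var total = query.Count();                         // SELECT COUNT(*) ...
var page = query.OrderBy(x => x.Id)                // an assumed key, for a stable order
                .Skip(SomeNum).Take(SomeOtherNum)
                .ToList();                         // paged SELECT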

IQueryable<>.ToString() too slow

I'm using the BatchDelete found in the answer to this question: EF Code First Delete Batch From IQueryable<T>?
The method seems to waste too much time building the delete clause from the IQueryable. Specifically, deleting 20,000 elements using the IQueryable below takes almost two minutes.
context.DeleteBatch(context.SomeTable.Where(x => idList.Contains(x.Id)));
All the time is spent on this line:
var sql = clause.ToString();
The line is part of this method, available on the original question linked above but pasted here for convenience:
private static string GetClause<T>(DbContext context, IQueryable<T> clause) where T : class
{
    const string Snippet = "FROM [dbo].[";

    var sql = clause.ToString();
    var sqlFirstPart = sql.Substring(sql.IndexOf(Snippet, System.StringComparison.OrdinalIgnoreCase));

    sqlFirstPart = sqlFirstPart.Replace("AS [Extent1]", string.Empty);
    sqlFirstPart = sqlFirstPart.Replace("[Extent1].", string.Empty);

    return sqlFirstPart;
}
I imagine making context.SomeTable.Where(x => idList.Contains(x.Id)) into a compiled query could help, but AFAIK you can't compile queries while using DbContext on EF 5. In theory they should be cached, but I see no sign of improvement on a second execution of the same BatchDelete.
Is there a way to make this faster? I would like to avoid building the SQL delete statement by hand.
The IQueryable isn't cached, and each time you evaluate it you're going out to SQL. Running ToList() or ToArray() on it will evaluate it once, and then you can work with the list as the cached version.
If you want to preserve your interfaces, you'd use ToList().AsQueryable(), and this would pass in a cached version.
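Roughly like this (SomeEntity is an assumed entity type):

// Evaluate once against SQL, then keep working with the in-memory copy
List<SomeEntity> cached = context.SomeTable
    .Where(x => idList.Contains(x.Id))
    .ToList();

// Preserves an IQueryable-shaped interface over the cached data
IQueryable<SomeEntity> cachedQuery = cached.AsQueryable();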
Related post.
How do I cache an IQueryable object?
It seems there is no way to cache the IQueryable in this case, because the query contains a list of ids to check against, and that list changes on every call.
The only way I found to avoid the two-minute delay in building the query every time I had to mass-delete objects was to use ExecuteSqlCommand as below:
var list = string.Join("','", ids.Select(x => x.ToString()));
var qry = string.Format("DELETE FROM SomeTable WHERE Id IN ('{0}')", list);
context.Database.ExecuteSqlCommand(qry);
I'll mark this as the answer for now. If any other technique is suggested that doesn't rely on ExecuteSqlCommand, I'll gladly change the answer.
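As a side note not in the original answer: ExecuteSqlCommand also accepts parameters, which avoids injection issues if the ids ever come from untrusted input. A sketch, assuming SQL Server (SqlParameter lives in System.Data.SqlClient):

var parameters = ids.Select((id, i) => new SqlParameter("@p" + i, id)).ToArray();
var placeholders = string.Join(",", parameters.Select(p => p.ParameterName));
var sql = string.Format("DELETE FROM SomeTable WHERE Id IN ({0})", placeholders);
context.Database.ExecuteSqlCommand(sql, parameters);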
There is an EF pattern that works OK.
It uses projection to return ONLY the keys from the DB (projections are not added to the context, so this part is pretty quick).
Then you build key-only stub POCOs, attach them to the context, and light the fuse....
Basically:
// Projection: fetch only the keys of the rows to delete
var deleteMagazine = context.Set<DeadMeat>()
    .Where(t => t.IhateYou == true)
    .Select(t => t.THEKEY)
    .ToList();

// Now instantiate a dummy POCO with KEY only for each entry in the list
foreach (var bullet in deleteMagazine)
{
    var stub = new DeadMeat { THEKEY = bullet };
    context.Set<DeadMeat>().Attach(stub);
    context.Set<DeadMeat>().Remove(stub);
    // consider saving changes every 1000 records .... performance; trial different values
    if (magazineisEmpty) // your counter logic here :-)
        context.SaveChanges();
}

// shoot anyone still moving
context.SaveChanges();
Check SQL Server Profiler to see what actually gets executed....
