Linq to XML how many times is this XML file getting read? - c#

My site navigation has a concept of categories that have a description, image, and pages associated with them.
In _ViewStart.cshtml I have the following LINQ query and then store the results in PageData because I might be using the categories more than once on a given page and didn't want to re-run the query.
XDocument navigation = XDocument.Load(Server.MapPath("~/App_Data/Navigation.xml"));
IEnumerable<Category> categories = from category in navigation.Root.Descendants("category")
                                   select new Category(
                                       category.Attribute("name").Value,
                                       category.Element("description").Value,
                                       new CategoryImage(
                                           category.Element("image").Element("path").Value,
                                           category.Element("image").Element("cssClass").Value,
                                           category.Element("image").Element("description").Value
                                       ),
                                       (from page in category.Descendants("page") select new BetterSolutions.ViewModels.ProductPage(page.Attribute("display").Value, page.Value)).ToList()
                                   );
PageData["categories"] = categories;
When I watch what happens through the debugger, anytime I access PageData["categories"] it keeps going back to the query in _ViewStart.cshtml.
When I change the above code by adding parentheses around the LINQ query and adding .ToList() at the end, it appears to execute once and then never again.
What is the way I should be doing this? I think that adding the .ToList() is correct since the query appears to be only running once, but I might be misunderstanding how deferred execution within LINQ to XML is actually working.

What you see is deferred execution. When you create categories, it is not a collection of items: it is just a query that will be executed when results are needed. That query definition is what gets stored in PageData["categories"], so every time you take it from there, it has to be executed again to produce results.
Adding ToList() forces the results right away, because you are asking for a list of results. That list is then stored in PageData["categories"]. That's why the query does not run over and over: the results are already stored in a list, and the list no longer knows or cares where they came from.
What is the right way to go? It depends. If you expect the file to change between PageData["categories"] calls and you need it to always return accurate results, you may stay with your current solution. If the file does not change, or it's OK to read the file just once and ignore any changes made to it while the program runs, you should use ToList() to improve performance and avoid unnecessary file access.
Update
My main answer is not completely correct. Even without ToList(), accessing PageData["categories"] would not access the file again, because the file is already completely loaded and parsed into the XDocument instance. It would, however, traverse the document again to execute the query.
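A minimal sketch of that behaviour, assuming an inline XML string in place of Navigation.xml and a hypothetical projections counter to make the re-execution visible. The document is parsed once; only the projection runs again on each enumeration, until ToList() stores the results.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Parsed once, as with XDocument.Load: the XML text is not read again after this line.
XDocument navigation = XDocument.Parse(
    "<navigation><category name='A'/><category name='B'/></navigation>");

int projections = 0; // hypothetical counter: how often the projection runs

// Deferred: this only defines the query, nothing is projected yet.
var deferred = navigation
    .Descendants("category")
    .Select(category =>
    {
        projections++;
        return category.Attribute("name")?.Value ?? "";
    });

string first = string.Join(",", deferred);   // enumerates the query: 2 projections
string second = string.Join(",", deferred);  // enumerates it again: 2 more
int afterDeferred = projections;

projections = 0;
List<string> materialized = deferred.ToList(); // runs the query once and keeps the results
string third = string.Join(",", materialized); // reads the stored list, no projection
string fourth = string.Join(",", materialized);
int afterMaterialized = projections;

Console.WriteLine($"deferred, enumerated twice: {afterDeferred} projections");
Console.WriteLine($"ToList(), enumerated twice: {afterMaterialized} projections");

Storing the ToList() result in PageData["categories"] corresponds to the second half of the sketch.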

It looks like you forgot the ToList() call for categories.
The simple fix would be:
PageData["categories"] = categories.ToList();

Related

How does linq actually execute the code to retrieve data from the data source?

I will start working on Xamarin shortly and will be transferring a lot of code from Android Studio's Java to C#.
In Java I am using custom classes which are given arguments, conditions, etc., convert them to SQL statements, and then load the results into the objects in the project's model.
What I am unsure of is whether LINQ is a better option for filtering such data.
For example, what currently happens is something along these lines:
List<Customer> customers = (new CustomerDAO()).get_all();
Or if I have a condition:
List<Customer> customers = (new CustomerDAO()).get(new Condition(CustomerDAO.Code, equals, "code1"));
Now let us assume I have transferred the classes to C# and I wish to do something similar to the second case.
So I will probably write something along the lines of:
var customers = from customer
in (new CustomerDAO()).get_all()
where customer.code.equals("code1")
select customer
I know that the query will only be executed when I actually try to access customers, but if I have multiple accesses to customers ( let us say that I use 4 foreach loops later on) will the get_all method be called 4 times? or are the results stored at the first execution?
Also is it more efficient (time wise because memory wise it is probably not) to just keep the get_all() method and use linq to filter the results? Or use my existing setup which in effect executes
Select * from Customers where code = 'code1'
And loads the results to an object?
Thanks in advance for any help you can provide
Edit: yes I do know there is sqlite.net which pretty much does what my daos do but probably better, and at some point I will probably convert all my objects to use it, I just need to know for the sake of knowing
if I have multiple accesses to customers ( let
us say that I use 4 foreach loops later on) will the get_all method be
called 4 times? or are the results stored at the first execution?
Each time you enumerate the query (using foreach in your example), it will re-execute, unless you store the materialized result somewhere. For example, if on the first query you'd do:
var customerSource = new CustomerDAO();
List<Customer> customers = customerSource.Where(customer => customer.Code.Equals("code1")).ToList();
Now you'll be working with an in-memory List<Customer>, without executing the query over again.
By contrast, if each time you'd do:
var filteredCustomers = customerSource.Where(customer => customer.Code.Equals("code1"));
foreach (var customer in filteredCustomers)
{
    // Do stuff
}
then each enumeration will execute the query over again.
Also is it more efficient (time wise because memory wise it is
probably not) to just keep the get_all() method and use linq to filter
the results? Or use my existing setup which in effect executes
That really depends on your use case. Let's imagine you were using LINQ to EF and the customer table has a million rows: do you really want to bring all of them into memory and only then filter them down to the subset you need? It would usually be better to run the fully filtered query.
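A rough sketch of the re-execution point, assuming a hypothetical GetAllCodes() iterator as a stand-in for a data source that is read lazily (it is not the CustomerDAO from the question, and the loads counter exists only for illustration):

using System;
using System.Collections.Generic;
using System.Linq;

int loads = 0; // hypothetical counter: how often the source is actually read

// Hypothetical lazy source. An iterator body runs again for every enumeration.
IEnumerable<string> GetAllCodes()
{
    loads++;
    yield return "code1";
    yield return "code2";
    yield return "code1";
}

// Deferred query: every enumeration goes back to the source.
var deferred = GetAllCodes().Where(code => code == "code1");
int firstPass = deferred.Count();
int secondPass = deferred.Count();
int loadsDeferred = loads;

// Materialized query: the source is read once, later passes use the list.
loads = 0;
List<string> materialized = GetAllCodes().Where(code => code == "code1").ToList();
int thirdPass = materialized.Count;
int fourthPass = materialized.Count;
int loadsMaterialized = loads;

Console.WriteLine($"deferred: {loadsDeferred} source reads for two passes");
Console.WriteLine($"materialized: {loadsMaterialized} source read for two passes");

Note that if the source method eagerly builds and returns a List, the method itself is only called once when the query is defined; in that case it is the filtering over that list that repeats on each enumeration.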

Why is using tolist() not a good approach here?

This is not a good approach here. Can anyone say why?
var dbc = new SchoolContext();
var a = dbc.Menus.ToList().Select(x => new {
    x.Type.Name,
    ListOfChildmenus = x.ChildMenu.Select(cm => cm.Name),
    ListOfSettings = x.Settings.SelectMany(set => set.Role)
});
Because when you call .ToList() or .FirstOrDefault() and so on (that is, when you enumerate), your query gets executed.
So when you do dbc.Menus.ToList(), you bring all your Menus from the database into memory, and you didn't want that.
You want to bring into memory only what you select (the list of child menus and the list of settings).
Relevant further reading: http://www.codeproject.com/Articles/652556/Can-you-explain-Lazy-Loading - you are probably using lazy loading.
And if you want to add a filter to your IQueryable, you may want to read about the difference between IEnumerable and IQueryable: http://blog.micic.ch/net/iqueryable-vs-ienumerable-vs-ihaveheadache
And some dynamic filtering: https://codereview.stackexchange.com/questions/3560/is-there-a-better-way-to-do-dynamic-filtering-and-sorting-with-entity-framework
Actually Razvan's answer isn't totally accurate. What happens in your query is this:
When you call ToList() the contents of the entire table get dumped into memory.
When you access navigation properties such as ChildMenu and Settings a new query is generated and run for each element in that table.
If you'd done it like so:
dbc.Menus
    .Select(x => new {
        x.Type.Name,
        ListOfChildmenus = x.ChildMenu.Select(m => m.Name),
        ListOfSettings = x.Settings.SelectMany(z => z.Role)
    })
    .ToList()
your whole structure would have been generated in one query and one round trip to the database.
Also, as Alex said in his comment, it's not necessarily a bad approach. For instance if your database is under a lot of load it's sometimes better to just dump things in the web application's memory and work with them there.
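To see where the projection ends up relative to ToList(), here is a small sketch that uses an in-memory list with AsQueryable() as a stand-in for dbc.Menus. That stand-in is an assumption: no database or Entity Framework is involved, so this only shows which LINQ flavour each Select binds to, not the SQL a real provider would generate.

using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for dbc.Menus: an in-memory IQueryable, not a real EF DbSet.
IQueryable<string> menus = new List<string> { "File", "Edit", "View" }.AsQueryable();

// Select before ToList(): the projection is added to the query definition,
// so a database provider could translate it together with the rest.
IEnumerable<int> composed = menus.Select(name => name.Length);

// ToList() before Select: everything is materialized first,
// and the projection then runs in memory as LINQ to Objects.
IEnumerable<int> inMemory = menus.ToList().Select(name => name.Length);

bool composedIsQueryable = composed is IQueryable<int>;
bool inMemoryIsQueryable = inMemory is IQueryable<int>;

Console.WriteLine($"Select, then ToList: still a query definition = {composedIsQueryable}");
Console.WriteLine($"ToList, then Select: still a query definition = {inMemoryIsQueryable}");

Both variants give the same values here; the difference is only in who does the work, which is what matters once the source is a database table.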

EF LINQ ToList is very slow

I am using ASP NET MVC 4.5 and EF6, code first migrations.
I have this code, which takes about 6 seconds.
var filtered = _repository.Requests.Where(r => some conditions); // this is fast, conditions match only 8 items
var list = filtered.ToList(); // this takes 6 seconds, has 8 items inside
I thought that this is because of relations, it must build them inside memory, but that is not the case, because even when I return 0 fields, it is still as slow.
var filtered = _repository.Requests.Where(r => some conditions).Select(e => new {}); // this is fast, conditions match only 8 items
var list = filtered.ToList(); // this takes still around 5-6 seconds, has 8 items inside
Now the Requests table is quite complex, lots of relations and has ~16k items. On the other hand, the filtered list should only contain proxies to 8 items.
Why is ToList() method so slow? I actually think the problem is not in ToList() method, but probably EF issue, or bad design problem.
Anyone has had experience with anything like this?
EDIT:
These are the conditions:
_repository.Requests.Where(r => ids.Any(a => a == r.Student.Id) && r.StartDate <= cycle.EndDate && r.EndDate >= cycle.StartDate)
So basically, I am checking whether the Student id is in my id list and whether the dates match.
Your filtered variable contains a query, which is a question; it doesn't contain the answer. The query is only executed when you request the answer by calling .ToList(). That is why that line is slow: it is the point at which your database actually runs the query.
This is called deferred execution. A quick search should give you more information about it.
If you show some of your conditions, we might be able to say why it is slow.
In addition to Maarten's answer, I think the problem comes down to one of two situations:
The condition is complex and results in complex, heavy joins or queries in your database.
The condition filters on a column which does not have an index, which causes a full table scan and makes your query slow.
I suggest you start by monitoring the query generated by Entity Framework. It's very simple: you just need to set the Log function of your context and look at the results:
using (var context = new MyContext())
{
context.Database.Log = Console.Write;
// Your code here...
}
If you see something strange in the generated query, try to improve it by breaking it into parts; sometimes the queries Entity Framework generates are not so good.
If the query is okay, then the problem lies in your database (assuming there is no network problem).
Run your query with an SQL profiler and check what's wrong.
UPDATE
I suggest that you:
add an index on the StartDate and EndDate columns in your table (one for each, not one for both)
ToList executes the query against the DB, while the first line does not.
Can you show some of the conditions code here?
To increase the performance you need to optimize the query or create indexes on the DB tables.
Your first line of code only returns an IQueryable. This is a representation of a query that you want to run, not the result of the query. The query itself only runs on the database when you call .ToList() on your IQueryable, because that is the first point at which you have actually asked for data.
Your adjustment to add the .Select only adds to the existing IQueryable query definition. It doesn't change which conditions have to execute. You have essentially changed the following, where you get back 8 records:
select * from Requests where [some conditions];
to something like:
select '' from Requests where [some conditions];
You will still have to perform the full query with the conditions giving you 8 records, but for each one, you only asked for an empty string, so you get back 8 empty strings.
The long and the short of this is that any performance problem you are having is coming from your "some conditions". Without seeing them, it is difficult to know more. But I have seen people in the past add .Where clauses inside a loop before calling .ToList(), inadvertently creating a massively complicated query.
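As an illustration of how a loop can quietly grow a query, here is a sketch that uses an in-memory AsQueryable() source (an assumption standing in for _repository.Requests) and walks the resulting expression tree to count the stacked Where calls. Nothing runs until ToList().

using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Stand-in for a database-backed IQueryable; the values are made up.
IQueryable<int> query = new List<int> { 1, 2, 3, 4, 5 }.AsQueryable();

// Each iteration wraps the existing query definition in another Where call.
foreach (int excluded in new[] { 2, 4, 5 })
{
    query = query.Where(x => x != excluded);
}

// Walk the expression tree from the outside in and count the nested Where calls.
int whereCalls = 0;
Expression node = query.Expression;
while (node is MethodCallExpression call && call.Method.Name == "Where")
{
    whereCalls++;
    node = call.Arguments[0]; // the first argument is the inner query
}

// Only now is the whole stacked definition executed.
List<int> result = query.ToList();

Console.WriteLine($"Where calls in the query definition: {whereCalls}");
Console.WriteLine($"Results: {string.Join(", ", result)}");

With a real provider, every one of those stacked calls has to be translated into the generated SQL, which is how a short loop can end up producing a very large statement.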
Jaanus, the most likely reason for this issue is the complexity of the SQL query generated by Entity Framework. I guess that your filter condition involves checks against other tables.
Try inspecting the generated query with "SQL Server Profiler". Then copy the query into "Management Studio" and check the "Estimated execution plan". As a rule, "Management Studio" generates index recommendations for your query; try to follow these recommendations.

Using FirstOrDefault() in a Where() clause

I've got a LINQ query that's returning no results when I know that it should be returning at least one. I'm building up the query dynamically. I looked at the result set in the debugger right before I get to the line that filters out all of the results and it contains hundreds of rows. After this line, it contains 0 when it really should contain at least one.
query = query.Where(x =>
x.Lineages.FirstOrDefault().Sire.Contains(options.PedigreeContains));
'x' in this case represents an entity called 'Horse'. 'options.PedigreeContains' is just a string value. The Lineages table looks like this:
ID HorseID Sire Dam etc...
I can even pull up a Horse entity in the debugger (the one I know should be returned as a result), inspect the Lineages property and see it fully populated, including the Sire value that matches my search. So everything SEEMS like it should be working, except there's obviously some issue with the LINQ query that I'm using.
Does anyone see anything inherently wrong with what I'm doing that would cause this to filter out results that I know should be there?
EDIT: For clarification, it's a 1-to-1 relationship. I know the Lineages object exists, I know there's only one, and I know it matches. It's just for some reason it's returning zero results so I thought there might be a problem with the way I wrote the query. If that query should work the way it's written though (minus all of the extra "possibilities" if no lineages exist, more than one, etc) then it must be an issue somewhere else in my code.
What if FirstOrDefault returns the "Default"? You'll get a NullReferenceException.
You are providing no means to order the Lineages. If the first one returned does not have a Sire containing options.PedigreeContains, the result set would be empty, regardless of the other Sires in the Lineages.
Actually answering your question: no. There is nothing inherently wrong with your query. It must be an issue somewhere else in your query construction, in the database structure, or in your data.
When debugging, instead of enumerating and verifying the result count, copy the query expression value and look at what the generated SQL looks like. You can do that before and after altering the IQueryable query. Other suggestions, like @Jalalx's use of .Any(), avoid what @John Saunders points out.
If you do FirstOrDefault() where you have it, aren't you taking the first of what could be many sires, so if a later one matches your where you won't find it?
query = query.Where(x =>
    x.Lineages.FirstOrDefault(lineage => lineage.Sire.Contains(options.PedigreeContains)) != null);
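To make the difference between the two shapes concrete, here is an in-memory sketch with made-up horses and sire names (all of the data is hypothetical, and this is LINQ to Objects rather than a database query). The first filter only looks at the first lineage; the second looks at all of them.

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: each horse has a name and the Sire values of its lineages.
var horses = new[]
{
    (Name: "A", Sires: new[] { "Northern Dancer", "Secretariat" }),
    (Name: "B", Sires: new[] { "Secretariat" }),
    (Name: "C", Sires: Array.Empty<string>()),
};

string pedigreeContains = "Secretariat";

// Only the first lineage is checked; the ?? "" guards against a horse with no lineages.
List<string> firstOnly = horses
    .Where(h => (h.Sires.FirstOrDefault() ?? "").Contains(pedigreeContains))
    .Select(h => h.Name)
    .ToList();

// Every lineage is checked, and an empty collection simply yields false.
List<string> anyMatch = horses
    .Where(h => h.Sires.Any(sire => sire.Contains(pedigreeContains)))
    .Select(h => h.Name)
    .ToList();

Console.WriteLine($"FirstOrDefault().Contains: {string.Join(", ", firstOnly)}");
Console.WriteLine($"Any(Contains):             {string.Join(", ", anyMatch)}");

With a strict 1-to-1 relationship the two filters agree, which fits the observation that the original query is not inherently wrong.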

Using EF4 how can I track the # of times a record is part of a Skip().Take() result set

So using EF4, I'm running a search query that ends with this common function:
query = query.Skip(5).Take(10);
Those records in the database have a column called ImpressionCount, which I intend to use to count the number of times that each record displayed on a page of search results.
What's the most efficient way to do this? Off the top of my head, I'm just going to look at the result set, get a list of ID's and then hit the database again using ADO.NET to do something like:
UPDATE TableName SET ImpressionCount = ImpressionCount + 1 WHERE Id IN (1,2,3,4,5,6,7,8,9,10)
Seems simple enough, just wondering if there's a more .NET 4 / Linq-ish way to do this that I'm not thinking of. One that doesn't involve another hit to the database would be nice too. :)
EDIT: So I'm leaning towards IAbstract's response as the answer since it doesn't appear there's a "built in" way to do this. I didn't think there was but it never hurts to ask. However, the only other question I think I want to throw out there is: is it possible to write a SQL trigger that could only operate on this particular query? I don't want ImpressionCount to update on EVERY select statement for the record (for example, when someone goes to view the detail page, that's not an impression -- if an admin edits the record in the back end, that's not an impression)...possible using LINQ or no?
SQL Server would somehow need to be able to identify that the query was generated by that Linq command, not sure if that's possible or not. This site is expecting relatively heavy traffic so I'm just trying to optimize where possible, but if it's overkill, I might just go ahead and hit the database again one time for each page of results.
In my opinion, it is better to go ahead and run with the SQL command as you have it. Just because we have LINQ does not mean it is always the best choice. Your statement gives you a one-call process to update impression counts and should be fairly quick.
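A small sketch of building that single statement from the IDs of the page that was just displayed. The pageIds list and the @p0-style placeholder names are assumptions for illustration; how the statement and its parameter values are handed to the database is left out.

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical IDs taken from the already materialized page of results.
var pageIds = new List<int> { 11, 12, 13 };

// One placeholder per ID, so the values can be passed as parameters
// instead of being concatenated into the SQL text.
List<string> placeholders = pageIds.Select((id, index) => "@p" + index).ToList();

// Caller should skip the update when pageIds is empty: "IN ()" is not valid SQL.
string sql =
    "UPDATE TableName SET ImpressionCount = ImpressionCount + 1 WHERE Id IN ("
    + string.Join(", ", placeholders) + ")";

Console.WriteLine(sql);

This keeps the update to one round trip per page of results, regardless of page size.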
You can technically use a .Select to modify each element in a returned result, but it's not idiomatic C#/linq, so you're better off using a foreach loop.
Example:
var updated = query.ToList().Select(x => { x.ImpressionCount++; return x; });
As IAbstract said, be careful of performance issues. Using the example above, or a foreach will execute 10 updates. Your one update statement is better.
I know Linq2NHibernate has this same issue - trying to stick with LINQ just isn't any good for updates (that's why it's called "language integrated query").
Edit:
Actually there's probably no reason why EF4 or NHibernate couldn't parse the select expression, realize it's an update, translate it into an update statement and execute it, but certainly neither framework will do that. If that were something that could happen, you'd want a new .Update() extension method for IQueryable<T> to explicitly state that you're modifying data. Using .Select() for it is a dirty hack.
... which means there's no reason you couldn't write your own .Update(x => x.ImpressionCount++) extension method for IQueryable<T> that output the SQL you want and call ExecuteStoreCommand, but it would be a lot of work.
