Entity Framework translates .First() into SQL Server's TOP(1). But calling .Last() throws an exception: SQL Server has no equivalent function, for obvious reasons.
I used to work around it by sorting descending and taking the first matching row:
var v = db.Table.OrderByDescending(t => t.ID).FirstOrDefault(t => t.ClientNumber == ClientNumberDetected);
This does it in a single query, but it sorts the whole table (millions of rows) before filtering...
Do I have good reason to think there will be speed issues if I abuse this technique?
I thought of something similar... but it requires two queries:
int maxID_of_Client = db.Table.Where(t => t.ClientNumber == ClientNumberDetected).Max(t => t.ID);
var v = db.Table.First(t => t.ID == maxID_of_Client);
It consists of retrieving the client's max ID, then using that ID to retrieve the client's last row.
It doesn't seem faster to query twice...
There must be a way to optimize this and use a single query without sorting millions of rows.
Unless there is something I don't understand, I'm probably not the first to think about this problem, and I want to solve it for good!
Thanks in advance.
The assumption driving this question is that, even with no ordering clause, result sets come back from your DB in some predictable order.
In reality, result sets that come back from SQL have no implicit ordering and none should be assumed.
Therefore, the result of
db.Table.FirstOrDefault(t => t.ClientNumber == ClientNumberDetected)
is actually indeterminate.
Whether you're taking first or last, without ordering it's all meaningless anyway.
Now, what goes to SQL when you add an ordering clause to your LINQ? It will be something similar to...
SELECT TOP(1) something FROM somewhere WHERE foo=bar ORDER BY somevalue
or, in the descending/last case,
SELECT TOP(1) something FROM somewhere WHERE foo=bar ORDER BY somevalue DESC
From SQL's POV, there's no significant difference here and your DB will be optimized for this sort of query. The index can be scanned in either direction, and the cost of each query above is the same.
TL;DR:
db.Table.OrderByDescending(t => t.ID)
.FirstOrDefault(t => t.ClientNumber == ClientNumberDetected)
is just fine.
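For what it's worth, filtering before ordering expresses the same intent and composes into the same single query; EF still emits one SELECT TOP(1) ... WHERE ... ORDER BY ... DESC. A sketch using the question's names:
// Same single round trip; the predicate simply moves into the WHERE clause.
var v = db.Table
    .Where(t => t.ClientNumber == ClientNumberDetected)
    .OrderByDescending(t => t.ID)
    .FirstOrDefault();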
Related
Is it possible in LINQ to write a nice one-liner to get the first matched element, or, if there's no match, the first element in the collection?
E.g. you have a collection of parrots and you want a yellow parrot, but if there are no yellow parrots then any parrot will do; something like this:
Parrots.MatchedOrFirst(x => x.Yellow == true)
I'm trying to avoid a double round trip to SQL Server; the ORM we use in this particular case is Dapper.
What about:
var matchedOrFirst = Parrots.FirstOrDefault(x => x.Yellow == true)
?? Parrots.FirstOrDefault();
Edit
For structs, this should work:
var matchedOrFirst = Parrots.Any(x => x.Yellow == true)
? Parrots.First(x => x.Yellow == true)
: Parrots.FirstOrDefault();
Edit: here is a LINQ to SQL solution.
First, build a handy extension method:
public static T MatchedOrFirstOrDefault<T>(this IQueryable<T> collection,
    System.Linq.Expressions.Expression<Func<T, bool>> predicate)
{
    // Matched rows first, then the unconditional first row as a fallback;
    // Concat keeps it a single query, and FirstOrDefault picks the winner.
    return collection.Where(predicate)
        .Concat(collection.Take(1))
        .ToList()          // convert to a query result: one round trip
        .FirstOrDefault();
}
Using the code
var matchedOrFirst = Parrots.MatchedOrFirstOrDefault(x => x.Yellow);
If you want to avoid a second SQL call, and since this requires branching logic, it's unlikely that Dapper will know how to convert whatever LINQ query you come up with into the appropriate SQL IIF, CASE, or other SQL-specific constructs.
I recommend you write a simple stored procedure to do that and call it from Dapper.
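For illustration, calling such a procedure from Dapper is short; the procedure name and parameter below are hypothetical, purely to sketch the shape of the call:
using System.Data;
using System.Data.SqlClient;
using Dapper;

// "dbo.GetMatchedOrFirstParrot" is an assumed stored procedure that
// returns the matched row if one exists, otherwise the first row.
// connectionString is assumed to be defined elsewhere.
using (var connection = new SqlConnection(connectionString))
{
    var parrot = connection.QueryFirstOrDefault<Parrot>(
        "dbo.GetMatchedOrFirstParrot",
        new { Yellow = true },
        commandType: CommandType.StoredProcedure);
}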
Depending on its usage, though: if this page only has one or two queries on it already, and is located reasonably close (latency-wise) to the server, a second simple SELECT won't hurt the overall application much, unless it's in a loop, or your example is trivial compared to the cost of the actual first SELECT.
If the SQL query I use to populate a generic List orders the result set like so (List<InventoryItem> inventoryItems is populated):
SELECT id, pack_size, description, department+(subdepartment/100) AS Dept, vendor_id, vendor_item, ave_cost, unit_list FROM t_inv ORDER BY id, pack_size
...is it redundant (it seems to me that it is, but I want to verify it) to use .OrderBy().ThenBy() in subsequent LINQ code like this:
public IEnumerable<InventoryItem> Get(string ID, int packSize, int CountToFetch)
{
return inventoryItems
.Where(i => (i.Id.CompareTo(ID) == 0 && i.PackSize > packSize) || i.Id.CompareTo(ID) > 0)
.OrderBy(i => i.Id)
.ThenBy(i => i.PackSize)
.Take(CountToFetch);
}
?
I probably could afford the minuscule amount of additional Purina Gerbil Chow required to power this, but (call me a PETA pet if you will) I'd still rather not waste energy needlessly.
If you are sorting the data at the database, there is no need to sort it again.
Since the SQL query that serves as the source of your LINQ query is part of your code, and since you know for sure that the records will come back ordered, there is no point in enforcing the ordering in your LINQ as well. If anything, add a debug assertion to check that the data is arriving correctly sorted, and turn debug assertions off in production.
The philosophy behind it is that you should have only one piece of code responsible for each piece of functionality. If someone modifies your query to bring items out of order, in all likelihood he does not understand what you were doing. It is better to point that out right away by an assertion, than to mask it with "belt and suspenders" sort in the LINQ part of your code.
It goes without saying that decisions like this one need to be well documented: you need to add a comment explaining that your function expects its input data to be sorted in a particular way, and that feeding unsorted data is an error.
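A minimal sketch of such an assertion, assuming the Id/PackSize ordering from the question; Debug.Assert is compiled out of release builds, so it costs nothing in production:
using System.Diagnostics;
using System.Linq;

// Check every adjacent pair is ordered by Id, then PackSize.
Debug.Assert(
    inventoryItems.Zip(inventoryItems.Skip(1), (a, b) =>
        a.Id.CompareTo(b.Id) < 0
        || (a.Id.CompareTo(b.Id) == 0 && a.PackSize <= b.PackSize))
        .All(ok => ok),
    "inventoryItems must arrive sorted by id, then pack_size");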
While refactoring some code that has gotten really slow recently, I came across a code block that takes 5+ seconds to execute.
The code consists of 2 statements:
IEnumerable<int> StudentIds = _entities.Filters
.Where(x => x.TeacherId == Profile.TeacherId.Value && x.StudentId != null)
.Select(x => x.StudentId)
.Distinct<int>();
and
_entities.StudentClassrooms
.Include("ClassroomTerm.Classroom.School.District")
.Include("ClassroomTerm.Teacher.Profile")
.Include("Student")
.Where(x => StudentIds.Contains(x.StudentId)
&& x.ClassroomTerm.IsActive
&& x.ClassroomTerm.Classroom.IsActive
&& x.ClassroomTerm.Classroom.School.IsActive
&& x.ClassroomTerm.Classroom.School.District.IsActive).AsQueryable<StudentClassroom>();
So it's a bit messy, but first I get a distinct list of IDs from one table (Filters), then I query another table using it.
These are relatively small tables, but it's still 5+ seconds of query time.
I put this in LINQPad, and it showed that it was running the bottom query first, then running 1000 "distinct" queries afterwards.
On a whim I changed the "StudentIds" code by just adding .ToArray() at the end. This improved the speed 1000x ... it now takes like 100ms to complete the same query.
What's the deal? What am I doing wrong?
This is one of the pitfalls of deferred execution in LINQ: in your first approach, StudentIds is really an IQueryable, not an in-memory collection. That means using it in the second query will re-run the first query on the database - each and every time.
Forcing execution of the first query with ToArray() makes StudentIds an in-memory collection, and the Contains part of your second query will run over this fixed sequence of items - it gets mapped to something equivalent to a SQL WHERE StudentId IN (1, 2, 3, 4) clause.
This query will, of course, be much, much faster, since you determined the sequence once up front rather than every time the Where clause is executed. Without ToArray(), I would think the second query gets mapped to SQL with a WHERE EXISTS (...) sub-query that is evaluated for each row.
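A minimal sketch of that fix, using the question's names: materializing the ID list once means the second query receives a fixed sequence and translates to an IN (...) clause:
// ToArray() executes the query immediately; StudentIds is a plain
// in-memory array from here on, not a live IQueryable.
int[] StudentIds = _entities.Filters
    .Where(x => x.TeacherId == Profile.TeacherId.Value && x.StudentId != null)
    .Select(x => x.StudentId)
    .Distinct()
    .ToArray();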
ToArray() materializes the initial query into the application server's memory.
My guess would be that the query provider is not able to parse the expression StudentIds.Contains(x.StudentId). Hence it probably assumes StudentIds is an array already loaded into memory, and queries the database over and over again during the parsing phase. The only way to know for sure is to set up the profiler.
If you need to do this on the db server, use a join, instead of "contains". If you need to use contains to do what looks like a join problem, you are likely to be missing a surrogate primary key or a foreign key somewhere.
You could also declare StudentIds as IQueryable instead of IEnumerable. This might give the query provider the hint it needs to interpret StudentIds as an expression, i.e. data not already loaded into memory. I somewhat doubt it, but it's worth a try.
If all else fails, use ToArray(). This will load the initial StudentIds into memory.
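A hedged sketch of the join alternative, using the entity names from the question; Distinct compensates for a student matching several filter rows:
// One server-side JOIN instead of a correlated Contains sub-query.
// Assumes the two StudentId columns have compatible types.
var studentClassrooms = (
    from sc in _entities.StudentClassrooms
    join f in _entities.Filters on sc.StudentId equals f.StudentId
    where f.TeacherId == Profile.TeacherId.Value
    select sc)
    .Distinct();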
I need to have the parent and parent.child.Count() in the query. When I do this, it takes 20 seconds... it's not a huge database... Any ideas for optimization?
var plist = context.persons
.Select(p => new
{
p.fullName,
p.personID,
p.Status,
p.Birthdate,
p.Accounts.Count
}).ToList();
Here is a great article on using Count() when you really meant to use Any():
http://blogs.teamb.com/craigstuntz/2010/04/21/38598/
Do you need to use .Count(), or could you use .Any()?
http://msdn.microsoft.com/en-us/library/bb534972.aspx
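For instance, if a call site only needs to know whether a person has any accounts, Any() lets the provider emit an EXISTS test instead of a full COUNT. A sketch against the question's model (somePersonId is illustrative):
// Translates to EXISTS(...) rather than COUNT(*) over Accounts.
bool hasAccounts = context.persons
    .Where(p => p.personID == somePersonId)
    .Any(p => p.Accounts.Any());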
Since this is entity framework, open up the sql profiler and take a look at what sql queries are being sent to the database. It sounds like you may see that a single query is sent to fetch the group identifiers, and then another set of queries (one for each group) might be fetching the count. If that's happening, you'll have to post the linq query for someone to resolve the issue.
Based on the code you sent, it doesn't look like things should be taking that long. I have a few suggestions:
Use LinqPad to do this query. It will let you see the SQL that gets generated. Then run that SQL code in SQL Server Management Studio, and tell it to include the actual execution plan. This will help you learn whether there's a particular point in the query that's taking a lot of time. For example, if you don't have an index on the Account table's PersonId reference, this query will take a lot longer.
Look at how you're using this data. It's very rare that you really need to have all the people in your entire system in memory at the same time. In fact, I suspect that simply getting all this person data out of the database is probably taking a lot more time than the Count() is.
Are you displaying this data? If so, wouldn't it be better to "page" the results, only showing maybe ten entries at a time? You can use the .Take(int) method before calling .ToList() to get only as many entries as you need (a paging sketch follows below).
If you're processing and aggregating this data for the sake of site metrics, it's probably better to set up your query to return the end result before it gets evaluated.
If you can describe how this data is being used, or provide a screenshot of the SQL's execution, we can provide more feedback.
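A sketch of the paging suggestion above; the page size, ordering, and pageIndex variable are illustrative assumptions:
// Skip/Take compose into paging SQL; only ten rows cross the wire.
var page = context.persons
    .OrderBy(p => p.fullName)
    .Skip(pageIndex * 10)
    .Take(10)
    .Select(p => new { p.fullName, p.personID, AccountCount = p.Accounts.Count })
    .ToList();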
I solved a similar problem using the GroupBy method.
IEnumerable<IGrouping<int, Account>> accounts = Accounts.GroupBy(x => x.personID);
For each group in accounts, Count() returns the number of accounts that belong to that person, and Key returns the personID of the group.
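Projected in one statement, so the per-person counts come back from a single query (a minimal sketch, assuming Accounts is the queryable set of account rows):
// One result row per person: the group key and its account count.
var accountCounts = Accounts
    .GroupBy(x => x.personID)
    .Select(g => new { personID = g.Key, Count = g.Count() });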
I had a somewhat similar problem; I tried these, and things worked out better:
child.Count(x => x.parentID == inputParentID)
child.Where(x => x.parentID == inputParentID)
My original code, which took around 15-20 seconds on each iteration, was:
return (isEdit)
    ? db.ChasisBuys
        .Single(x => x.ChasisBuyID == long.Parse(Request.QueryString["chbid"]))
        .Chasises
        .Count(y => y.Bikes.Count > 0
            && y.ColorID == buyItems[(int)index].ColorID
            && y.ChasisTypeID == buyItems[(int)index].ChasisTypeID)
        .ToString()
    : "-";
The new code, which runs well, is:
return (isEdit)
    ? db.Chasises
        .Where(x => x.ChasisBuyID == long.Parse(Request.QueryString["chbid"]))
        .Count(y => y.Bikes.Count > 0
            && y.ColorID == buyItems[(int)index].ColorID
            && y.ChasisTypeID == buyItems[(int)index].ChasisTypeID)
        .ToString()
    : "-";
The database has around 1000 records in Chasises, about 5 in ChasisBuys, and about 20 in Bikes.
My opinion is that LINQ to SQL queries do not short-circuit the way logical statements do. For instance, if you write "return a && b && c;" and a is false, the other operands are not evaluated; I was expecting something similar from LINQ to SQL, but that's not the case.
I have a linq query that is causing some timeout issues. Basically, I have a query that is returning the top 100 results from a table that has approximately 500,000 records.
Here is the query:
using (var dc = CreateContext())
{
var accounts = string.IsNullOrEmpty(searchText)
? dc.Genealogy_Accounts
.Where(a => a.Genealogy_AccountClass.Searchable)
.OrderByDescending(a => a.ID)
.Take(100)
: dc.Genealogy_Accounts
.Where(a => (a.Code.StartsWith(searchText)
|| a.Name.StartsWith(searchText))
&& a.Genealogy_AccountClass.Searchable)
.OrderBy(a => a.Code)
.Take(100);
return accounts.Select(a =>
}
}
Oddly enough, it is the first LINQ query that causes the timeout. I thought that by doing a Take we wouldn't need to scan all 500k records, but that must be what is happening. I'm guessing the join to find what is 'searchable' is causing the issue. I'm not able to denormalize the tables... so I'm wondering if there is a way to rewrite the LINQ query to make it return quicker, or whether I should just write this query as a stored procedure (and if so, what it might look like). Thanks.
Well to start with, I'd find out what query is being generated (in LINQ to SQL you'd set the Log on the data context) and then profile it in SQL Server Management Studio. Play with it there until you've found something that is fast enough (either by changing the query or adding indexes) and if you've had to change the query, work out how to represent that in LINQ.
I suspect the problem is that you're combining OrderBy and Take - which means it potentially needs to find out all the results in order to work out which the top 100 would look like. Is Code indexed? If not, try indexing that - it may help by allowing the server to consider records in the order in which they'd be returned, so it can stop after it's found 100 records. You should look at indexes for the other columns too.
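A minimal sketch of that logging step; LINQ to SQL's DataContext exposes a Log TextWriter for exactly this:
using (var dc = CreateContext())
{
    // Echo the generated SQL so it can be copied into SSMS and profiled.
    dc.Log = Console.Out;
    // ... run the slow query here and capture the emitted SQL ...
}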
The Take(100) translates to "Select Top 100" etc. This would help if your problem was an otherwise huge result set, where there are a lot of columns returned. I bet though that your problem is a table scan resulting from the query. In this case, .Take(100) might not help much at all.
So, the likely culprit is the same as if you were doing SQL using ADO.NET: how are your indexes? Are the fields being searched fields for which you don't have good indexes? That would cause a drastic decrease in performance compared to queries that utilize good indexes. Add an index that includes Code and Name and see what happens. Not having an index on Code is guaranteed to hose you, because of the Order By. Also, what field links Genealogy_Accounts and Genealogy_AccountClass? A lack of an index on either table could hose things. (I would guess an index including Searchable is unlikely to help.)
Use SQL Profiler to see the actual query being run (though you can do this in VS too), and to see how bad it really is on the server.
The problem might be LINQ doing something stupid generating the query, but this is probably not the case. We're finding LINQ to SQL often makes better queries than we do. Even if it looks goofy, it's usually very efficient. You can put the SQL in Query Analyzer and check out the query plan. Then rewrite the SQL to be more human-simple and see if it improves things -- I bet it won't. I think you'll still see a table scan, indicating something is wrong with your index.