Retrieve only the first row from a table in Entity Framework - c#

Background:
Entity Framework 4, with SQL Server 2008
Problem:
I have a table Order. Each row has a column Timestamp.
The user can choose a time in the past, and I need to get the Order closest to that time that occurred before it. In other words, the last order before the specified time.
For example, if I have orders
2008-01-12
2009-04-17
2009-09-24
2010-11-02
2010-12-01
2011-05-16
and choose a date 2010-07-22, I should get the 2009-09-24 order, because that's the last order before the specified date.
var query = (from oData in db.OrderDatas
where oData.Timestamp <= userTime
orderby oData.Timestamp ascending
select oData).Last();
This is the closest I have got. However, I am not sure exactly how the Last operator works when translated to SQL, if it is translated at all.
Question:
Will this query fetch all data (earlier than userTime) and then take the last element, or will it be translated so that only one element is returned from the database? My table can hold a very large number of rows (100000+), so performance is an issue here.
Also, how would one retrieve the closest time in the database (not necessarily the earlier time)? In the example of 2010-07-22, one would get 2010-11-02, because it is closer to the date specified than 2009-09-24.

In general, if you're concerned about how LINQ behaves, you should check the SQL that is actually generated. If you haven't worked out how to see how your LINQ queries are turned into SQL, that should be the very next thing you do.
As you noted in your comment, Last() isn't supported by LINQ to SQL so the same may be true for EF. Fortunately, it's easy to use First() instead:
var query = (from oData in db.OrderDatas
where oData.Timestamp <= userTime
orderby oData.Timestamp descending
select oData).First();
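The second part of the question (the closest order in either direction) can be handled with two such single-row queries, one on each side of the target, followed by a comparison on the client. The following is only a sketch under assumptions: the ClosestOrderDemo class and Closest method are hypothetical names, and the sample runs against an in-memory IQueryable rather than a real EF context, so the SQL generated for your model should still be checked.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClosestOrderDemo
{
    // Hypothetical helper: returns the timestamp closest to target,
    // or null when the source is empty.
    public static DateTime? Closest(IQueryable<DateTime> timestamps, DateTime target)
    {
        // Latest timestamp at or before the target.
        DateTime? before = timestamps.Where(t => t <= target)
                                     .OrderByDescending(t => t)
                                     .Select(t => (DateTime?)t)
                                     .FirstOrDefault();

        // Earliest timestamp after the target.
        DateTime? after = timestamps.Where(t => t > target)
                                    .OrderBy(t => t)
                                    .Select(t => (DateTime?)t)
                                    .FirstOrDefault();

        if (before == null) return after;
        if (after == null) return before;

        // Compare the two distances on the client; ties go to the earlier one.
        return (target - before.Value) <= (after.Value - target) ? before : after;
    }

    public static void Main()
    {
        // In-memory stand-in for db.OrderDatas.Select(o => o.Timestamp).
        var orders = new List<DateTime>
        {
            new DateTime(2008, 1, 12), new DateTime(2009, 4, 17),
            new DateTime(2009, 9, 24), new DateTime(2010, 11, 2),
            new DateTime(2010, 12, 1), new DateTime(2011, 5, 16)
        }.AsQueryable();

        DateTime? closest = Closest(orders, new DateTime(2010, 7, 22));
        Console.WriteLine(closest.Value.ToString("yyyy-MM-dd")); // prints "2010-11-02"
    }
}
```

Each of the two queries asks for at most one row, so with an index on Timestamp neither side should need to read the whole table.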

Try using:
var query = (from oData in db.OrderDatas
where oData.Timestamp <= userTime
orderby oData.Timestamp descending
select oData).Take(1);
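It's the equivalent of TOP 1. Note that Take(1) returns a sequence containing at most one element rather than a single entity, so the query only runs when you enumerate it.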

Question:
Will this query fetch all data (earlier than userTime) and then take
the last element, or will it be translated so that only one element
will be returned from the database? My table can hold very large
number of rows (100000+) so performance is an issue here.
In this case, using the First() approach, the query will be executed immediately and it will be optimized so that it only retrieves one record, most probably with a TOP (1) select. You really need to check the generated SQL with a SQL profiling tool or by using the log of the DataContext. Alternatively, you can use LINQPad. LINQ to SQL can lead to N+1 queries if it is not used properly. This behaviour is quite predictable, but in the beginning you really have to be aware of it.

Related

Query ODataV4 connected service with LINQ - Get last record from table

I'm trying to query my OData web service from a C# application.
When I do the following:
var SecurityDefs = from SD in nav.ICESecurityDefinition.Take(1)
orderby SD.Entry_No descending
select SD;
I get an exception because .top() and .orderby are not supposed to be used together.
I need to get the last record in the dataset, and only the last.
The purpose is to get the last used entry number in a ledger and then continue creating new entries, incrementing from the entry number found.
I can't seem to find anything online that explains how to do this.
It's very important that the service only returns the last record from the feed, since speed is paramount in this solution.
I get an exception because .top() and .orderby are not supposed to be used together.
Where did you read that? In general .top() or .Take() should ONLY be used in conjunction WITH .orderby(), otherwise the record being retrieved is not guaranteed to be repeatable or predictable.
Probably the compounding issue here is mixing query and fluent expression syntax, which is valid, but you have to understand the order of precedence.
Your syntax is taking 1 record, then applying a sort order... you might find it easier to start with a query like this:
// build your query
var SecurityDefsQuery = from SD in nav.ICESecurityDefinition
orderby SD.Entry_No descending
select SD;
// Take the first item from the list, if it exists, will be a single record.
var SecurityDefs = SecurityDefsQuery.FirstOrDefault();
// Build a deferred query that yields only the first record, if it exists
var SecurityDefsDeferred = SecurityDefsQuery.Take(1);
This can be executed on a single line using brackets, but you can see how the query is the same in both cases. SecurityDefs in this case is a single ICESecurityDefinition typed record, whereas SecurityDefsDeferred is an IQueryable<ICESecurityDefinition> that yields at most a single record.
If you only need the record itself, use this one-liner:
var SecurityDefs = (from SD in nav.ICESecurityDefinition
orderby SD.Entry_No descending
select SD).FirstOrDefault();
You can execute the same query using fluent notation as well:
var SecurityDefs = nav.ICESecurityDefinition.OrderByDescending(sd => sd.Entry_No)
.FirstOrDefault();
In both cases, .FirstOrDefault() applies the equivalent of .Take(1) or .top() for you. You have indicated that speed is important, so use .First() or .FirstOrDefault() instead of .Single() or .SingleOrDefault(), because the single variants will actually request .Take(2) and will throw an exception if more than one result comes back (.Single() also throws when there are no results).
The OrDefault variants of both of these methods will not impact the performance of the query itself and should have a negligible effect on your code. Use the one that is appropriate for the logic that consumes the returned record, depending on whether you need to handle the case where there is no existing record.
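The difference between the First and Single families can be seen in a minimal sketch. It runs on in-memory arrays (LINQ to Objects), so it only illustrates the exception behaviour, not what an OData or EF provider sends over the wire. FirstVsSingleDemo and Describe are hypothetical names used for illustration.

```csharp
using System;
using System.Linq;

public static class FirstVsSingleDemo
{
    // Hypothetical helper: runs the query and reports either its value
    // or the fact that it threw InvalidOperationException.
    public static string Describe(Func<int> query)
    {
        try { return query().ToString(); }
        catch (InvalidOperationException) { return "InvalidOperationException"; }
    }

    public static void Main()
    {
        int[] many = { 3, 2, 1 };
        int[] empty = new int[0];

        Console.WriteLine(Describe(() => many.First()));            // prints "3"
        Console.WriteLine(Describe(() => many.Single()));           // prints "InvalidOperationException"
        Console.WriteLine(Describe(() => empty.First()));           // prints "InvalidOperationException"
        Console.WriteLine(Describe(() => empty.FirstOrDefault()));  // prints "0"
        Console.WriteLine(Describe(() => empty.SingleOrDefault())); // prints "0"
    }
}
```

Single is the right choice only when more than one match would indicate a bug; for "give me the top row of an ordered feed", First or FirstOrDefault expresses the intent directly.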
If the record being returned has many columns, and you are only interested in the Entry_No column value, then perhaps you should simply query for that specific value itself:
Query expression:
var lastEntryNo = (from SD in nav.ICESecurityDefinition
orderby SD.Entry_No descending
select SD.Entry_No).FirstOrDefault();
Fluent expression:
var lastEntryNo = nav.ICESecurityDefinition.OrderByDescending(sd => sd.Entry_No)
.Select(sd => sd.Entry_No)
.FirstOrDefault();
If speed is paramount, then look at providing a specific custom endpoint on the service to serve the record, or do not process the 'Entry_No' in the client at all: make that the job of the code that receives data from the client, and compute it at the time the entries are inserted.
Making the query perform faster is not the silver bullet you might be looking for, though. Even if this is highly optimised, your current pattern means that any number of clients could all call the service to get the current value of Entry_No, meaning all of them would start incrementing from the same value.
If you MUST increment the Entry_No from the client, then you should look at putting a custom endpoint on the service that simply returns the next Entry_No to use. This should be optimistic, meaning that you don't care whether the Entry_No actually gets used in the end, but you can implement the endpoint such that every call increments the field in the database and returns the next value.
It's getting a bit beyond the scope of your initial post, but SQL Server now has support for Sequences, which formalise this type of logic from a database and schema point of view. Using a Sequence simplifies how we manage these types of increments from the client, because we no longer rely on data updates being committed to the table before the client can increment the next record (which is what your TOP, ORDER BY DESC solution is trying to do).

Is Queryable.OrderBy unstable for SQL Server database?

OrderBy is stable for LINQ to Objects, but MSDN on Queryable.OrderBy doesn't mention if it is stable or not.
I guess it depends on the provider implementation. Is it unstable for SQL Server? Because it looks so. I did a quick look at Queryable source code, but it is not obvious from there.
I need to order a collection before other operations and I want to use IQueryable, rather than IEnumerable for the sake of performance.
// All the timestamps are the same and I am getting inconsistent
// results by running it multiple times, first few pages return the same results
var result = data.OrderBy(i => i.TimeStamp).Skip(start).Take(length);
but if I use
var result = data.ToList().OrderBy(i => i.TimeStamp).Skip(start).Take(length);
It works just fine, but I lose the performance boost from LINQ to SQL. It seems the combination of Queryable OrderBy/Skip/Take produces inconsistent results.
SQL Code generated seems fine to me:
SELECT
...
FROM [dbo].[Table] AS [Extent1]
ORDER BY [Extent1].[TimeStamp] ASC
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY
In Linq-to-Entities, LINQ queries are translated into SQL queries, so the Linq-to-Objects implementation of OrderBy doesn't matter. You should look at your database's implementation of ORDER BY. If you are using MS SQL, you can find in the docs that:
To achieve stable results between query requests using OFFSET and FETCH, the following conditions must be met:
(...)
The ORDER BY clause contains a column or combination of columns that are guaranteed to be unique.
So ORDER BY on equal values does not guarantee the same row order between executions, which means that limiting the output with OFFSET/FETCH can produce different result sets. To solve this, you can simply sort by an additional column that has unique values, e.g. id. So basically you will have:
var result = data
.OrderBy(i => i.TimeStamp)
.ThenBy(i => i.Id)
.Skip(start)
.Take(length);
I take it that by "stable", you mean consistent. If you don't have an ORDER BY in a SQL query, the order of the data is not guaranteed from one run of the query to the next. It will simply return all of the data in whatever order is most efficient for the server. When you add the ORDER BY, it will sort that data. Since you are sorting data where all of the sort values are the same, the sort imposes no order among those rows, so they can come back in an order you don't expect. If you need a specific order, you will need to add a secondary sort column such as an ID.
It is best never to assume the order of data coming back from the server unless you explicitly define what that order is.
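A small sketch of the tiebreaker idea: when the sort key is followed by a unique column, each page is fully determined by the data and no longer depends on the order in which rows happen to arrive. The sample uses in-memory lists, with the same rows supplied in two different orders to stand in for a server that returns ties arbitrarily. Row, StablePagingDemo and PageIds are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class Row
{
    public int Id { get; set; }
    public DateTime TimeStamp { get; set; }
}

public static class StablePagingDemo
{
    // Pages by TimeStamp, using the unique Id as a tiebreaker.
    public static int[] PageIds(IEnumerable<Row> data, int start, int length)
    {
        return data.OrderBy(r => r.TimeStamp)
                   .ThenBy(r => r.Id)
                   .Skip(start)
                   .Take(length)
                   .Select(r => r.Id)
                   .ToArray();
    }

    public static void Main()
    {
        // Every row shares the same timestamp, as in the question.
        var stamp = new DateTime(2020, 1, 1);
        var rows = new[] { 5, 3, 1, 4, 2 }
            .Select(id => new Row { Id = id, TimeStamp = stamp })
            .ToList();
        var reversed = Enumerable.Reverse(rows).ToList();

        // The page contents do not depend on the input order.
        Console.WriteLine(string.Join(",", PageIds(rows, 0, 2)));     // prints "1,2"
        Console.WriteLine(string.Join(",", PageIds(reversed, 0, 2))); // prints "1,2"
        Console.WriteLine(string.Join(",", PageIds(reversed, 2, 2))); // prints "3,4"
    }
}
```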

Create LINQ expression with two aggregate operators [duplicate]

This rather simple SQL query is proving to be quite perplexing when attempting it from LINQ.
I have a SQL table Plant with column ZoneMin.
I want to find the minimum and maximum of the values in the column.
The answer in T-SQL is quite simple:
SELECT MIN(ZoneMin), MAX(ZoneMin) FROM Plant
What's a LINQ query that could get me to this (or some similar) SQL?
I've made various attempts at .Aggregate() and .GroupBy() with no luck. I've also looked at several SO questions that seem similar.
This could be simply achieved with methods applied to a resulting array, but I shouldn't need to transport a value from every SQL row when it's so simple in T-SQL.
To achieve the same performance as your original query, you'll need to use grouping (by a constant to minimize impact, e.g. 0), so that you can refer to the same set of records twice in the same query. Using the table name causes a new query to be produced on each reference. Try the following:
(from plant in db.Plants
group plant by 0 into plants
select new { Min = plants.Min(p => p.ZoneMin), Max = plants.Max(p => p.ZoneMin) }
).Single()
This produces the following query:
SELECT MIN(plants.ZoneMin), MAX(plants.ZoneMin)
FROM (SELECT 0 AS Grp, ZoneMin FROM Plants) AS plants
GROUP BY plants.Grp
And after the optimizer is done with it, it spits out something equivalent to your query, at least according to SQL Server Management Studio.
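The shape of the group-by-constant query can be tried without a database. The sketch below runs the same query expression over an in-memory IQueryable of integers, so it shows the LINQ shape only, not the SQL a provider would emit. MinMaxDemo and MinMax are hypothetical names.

```csharp
using System;
using System.Linq;

public static class MinMaxDemo
{
    // Returns { min, max } using a single grouped query.
    // Single() throws InvalidOperationException when the source is empty,
    // because grouping an empty source produces no groups.
    public static int[] MinMax(IQueryable<int> zoneMins)
    {
        var range = (from z in zoneMins
                     group z by 0 into g
                     select new { Min = g.Min(), Max = g.Max() }).Single();
        return new[] { range.Min, range.Max };
    }

    public static void Main()
    {
        // In-memory stand-in for db.Plants.Select(p => p.ZoneMin).
        var zoneMins = new[] { 4, 9, 2, 7 }.AsQueryable();
        int[] result = MinMax(zoneMins);
        Console.WriteLine(result[0] + " " + result[1]); // prints "2 9"
    }
}
```

If the table can be empty, SingleOrDefault() with a null check may be the safer terminal operator.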

Entity Framework SQL Query Execution

Using the Entity Framework, when one executes a query on, let's say, 2000 records requiring a group by and some other calculations, does the query get executed on the server with only the results sent over to the client, or is everything sent over to the client and then executed?
This is using SQL Server.
I'm looking into this, as I'm going to be starting a project where there will be loads of queries required on a huge database and want to know if this will produce a significant load on the network, if using the Entity Framework.
I would think all database querying is done on the server side (where the database is!) and the results are passed over. However, in LINQ you have what's known as deferred execution, so your information isn't actually retrieved until you try to access it, e.g. by calling ToList(). Related tables are lazily loaded when you access the corresponding navigation property.
You have the option to use LoadWith to do eager loading if you require it (LoadWith belongs to LINQ to SQL; the Entity Framework counterpart is Include).
So in terms of performance, if you only really want to make one trip to the database for your query (which has related tables), I would advise using the eager loading options. However, it really does depend on the particular situation.
It's always executed on SQL Server. This also means sometimes you have to change this:
from q in ctx.Bar
where q.Id == new Guid(someString)
select q
to
Guid g = new Guid(someString);
from q in ctx.Bar
where q.Id == g
select q
This is because the constructor call cannot be translated to SQL.
Sql's groupby and linq's groupby return differently shaped results.
Sql's groupby returns keys and aggregates (no group members)
Linq's groupby returns keys and group members.
If you use those group members, they must be (re-)fetched by the grouping key. This can result in +1 database roundtrip per group.
Well, I had the same question some time ago.
Basically, your LINQ statement is converted to a SQL statement. However, some groupings will get translated and others will not, depending on how you write your statement.
So yes, both are possible.
example:
var a = (from entity in myTable where entity.Property == 1 select entity).ToList();
versus
var a = (from entity in myTable.ToList() where entity.Property == 1 select entity).ToList();

Selecting first 100 records using Linq

How can I return first 100 records using Linq?
I have a table with 40million records.
This code works, but it's slow, because it returns all values before filtering:
var values = (from e in dataContext.table_sample
where e.x == 1
select e)
.Take(100);
Is there a way to return only the filtered rows, like the T-SQL TOP clause?
No, that doesn't return all the values before filtering. The Take(100) will end up being part of the SQL sent up - quite possibly using TOP.
Of course, it makes more sense to do that when you've specified an orderby clause.
LINQ doesn't execute the query when it reaches the end of your query expression. It only sends up any SQL when either you call an aggregation operator (e.g. Count or Any) or you start iterating through the results. Even calling Take doesn't actually execute the query - you might want to put more filtering on it afterwards, for instance, which could end up being part of the query.
When you start iterating over the results (typically with foreach) - that's when the SQL will actually be sent to the database.
(I think your where clause is a bit broken, by the way. If you've got problems with your real code it would help to see code as close to reality as possible.)
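Deferred execution can be observed directly with LINQ to Objects. In this sketch a counter records how many items the source has produced: building the query produces none, and enumerating it stops as soon as Take has enough results. This is an in-memory illustration under assumptions, not a trace of what a database provider does. DeferredExecutionDemo, Produced and Source are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Number of items the source has produced so far.
    public static int Produced;

    // Lazily yields 1..1000, counting each item as it is produced.
    public static IEnumerable<int> Source()
    {
        for (int i = 1; i <= 1000; i++)
        {
            Produced++;
            yield return i;
        }
    }

    public static void Main()
    {
        Produced = 0;

        // Building the query runs nothing: Where and Take are deferred.
        IEnumerable<int> query = Source().Where(x => x % 2 == 0).Take(3);
        Console.WriteLine(Produced); // prints "0"

        // Enumeration (here via ToList) is what actually executes the query.
        List<int> result = query.ToList();
        Console.WriteLine(string.Join(",", result)); // prints "2,4,6"

        // Far fewer than 1000 items were produced, because Take stops early.
        Console.WriteLine(Produced < 1000); // prints "True"
    }
}
```

With a database provider the same principle applies one level up: the filter and the Take become part of the SQL text, which is only sent when enumeration starts.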
I don't think you are right about it returning all records before taking the top 100. I think LINQ decides what the SQL string is going to be at the time the query is executed (deferred execution), and your database server will optimize it.
Have you compared standard SQL query with your linq query? Which one is faster and how significant is the difference?
I do agree with above comments that your linq query is generally correct, but...
in your 'where' clause should probably be x==1 not x=1 (comparison instead of assignment)
'select e' will return all columns where you probably need only some of them - be more precise with the select clause (list only the required columns); 'select *' is a waste of resources
make sure your database is well indexed and try to make use of indexed data
Anyway, a database with 40 million records is quite large - do you need all that data all the time? Maybe some kind of partitioning can reduce it to the most commonly used records.
I agree with Jon Skeet, but just wanted to add:
The generated SQL will use TOP to implement Take().
If you're able to run SQL-Profiler and step through your code in debug mode, you will be able to see exactly what SQL is generated and when it gets executed. If you find the time to do this, you will learn a lot about what happens underneath.
There is also a DataContext.Log property that you can assign a TextWriter to view the SQL generated, for example:
dbContext.Log = Console.Out;
Another option is to experiment with LINQPad. LINQPad allows you to connect to your data source and easily try different LINQ expressions. In the results panel, you can switch to see the SQL generated from the LINQ expression.
I'm going to go out on a limb and guess that you don't have an index on the column used in your where clause. If that's the case then it's undoubtedly doing a table scan when the query is materialized and that's why it's taking so long.
