I am trying to run a Select on an IQueryable linked to a database query. It's working correctly, but it's running a separate query for every property being selected.
My code looks something like this:
IQueryable<MyDataSource> data = [Some Complicated Query I've been Building Up];
var results = data.Select(d => new
{
A = d.A,
B = d.B,
C = d.C
}).Take(100).ToArray();
Now, this is taking ages, even though the actual query isn't taking that long.
When I ran an SQL profiler on it, I found that it's running a separate SQL select for each property I'm selecting, for each entity I'm returning (so in the above example, around 300 different queries on top of the initial query that performs the filtering).
I'm quite sure I'm doing something wrong here, but what is it? I'm expecting it to run a single large query that selects the right columns from the data source (you know, Select top 100 d.A, d.B, d.C from [bla bla]), not all this mess.
You can influence the lazy/eager loading using DataLoadOptions (LINQ to SQL). A separate query per returned entity is the classic sign of lazily loaded associations, and DataLoadOptions lets you force them to load eagerly.
This will force eager loading for B:
DataLoadOptions options = new DataLoadOptions();
options.LoadWith<MyDataSource>(d => d.B);
dc.LoadOptions = options;
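For context, a minimal sketch of how this fits together, assuming a LINQ to SQL DataContext called MyDataContext with a MyDataSources table and that B is an association property (all of those names are illustrative); note that LoadOptions has to be assigned before the first query runs on that context:

using (var dc = new MyDataContext())
{
    // Force the B association to load together with each MyDataSource row
    var options = new DataLoadOptions();
    options.LoadWith<MyDataSource>(d => d.B);
    dc.LoadOptions = options;

    // With eager loading configured, the projection should no longer trigger
    // a separate query per returned entity
    var results = dc.MyDataSources
                    .Select(d => new { d.A, d.B, d.C })
                    .Take(100)
                    .ToArray();
}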
I moved from Linq 2 SQL to Linq 2 Entities due to performance reasons for complex queries.
It's a little bit frustrating to get everything working.
In L2SQL I can run subqueries without any problems.
Now I've figured out that in L2E I have to place the subquery outside the query.
That's usually no problem, but how can I run a query if the subquery depends on the query?
For example:
I want to get the status of a document, but for that I need the current document ID.
var query = from Dok in dbContext.Dokumente
orderby Dok.DokumentID descending
select new ClassDok() {
_Nr = Dok.Dokumentnr,
_StatusID = dbContext.Database.SqlQuery<int>("GetDokStatusID", Dok.DokumentID).Single()
};
I already tried to put
dbContext.Database.SqlQuery<int>("GetDokStatusID", Dok.DokumentID).Single()
into a method that returns an integer, but it doesn't work.
I have a little program that needs to do some calculations on a data range. The range may contain about half a million records. I just looked at my DB and saw that a GROUP BY was executed.
I thought that the query was executed at the first line, and that afterwards I was just working with the data in RAM. But now I think the query builder combines the expressions.
var Test = db.Test.Where(x => x.Date > DateTime.Now.AddDays(-7));
var Test2 = (from p in Test
group p by p.CustomerId into g
select new { UniqueCount = g.Count() } );
In my real-world app I have more subqueries that are based on the range selected by the first query. I think I just added a big overhead by letting the DB run a different select for each of them.
Now I basically just call .ToList() after the first expression.
So my question is: am I right that the query builder combines the different IQueryables when it builds the expression tree?
Yes, you are correct. LINQ queries are lazily evaluated: nothing runs until you enumerate the results (via .ToList(), for example). At that point, Entity Framework will look at the total query and build a single SQL statement to represent it.
In this particular case, it's probably wiser to not evaluate the first query, because the SQL database is optimized for performing set-based operations like grouping and counting. Rather than forcing the database to send all the Test objects across the wire, deserializing the results into in-memory objects, and then performing the grouping and counting locally, you will likely see better performance by having the SQL database just return the resulting Counts.
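To make that concrete, here is a minimal sketch of keeping everything as an IQueryable so the grouping and counting run in a single SQL statement (it assumes the entity has a date column, called Date here purely for illustration):

// No SQL has run yet; these statements only build up the expression tree
var recent = db.Test.Where(x => x.Date > DateTime.Now.AddDays(-7));
var counts = recent
    .GroupBy(p => p.CustomerId)
    .Select(g => new { CustomerId = g.Key, UniqueCount = g.Count() });

// A single SELECT ... GROUP BY is generated and executed here
var result = counts.ToList();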
Is there a way to get the count of a result set but return only the top 5 records, while making just one DB hit instead of two (one for the count and a second for the data)?
There is not a particularly good way to do this in Entity Framework, at least as of v4. @Tobias writes a single LINQ query below, but his suspicions are correct: you'll see multiple queries roll by in SQL Profiler.
Ignoring EF for a minute, this is a relatively complicated problem for SQL Server. Well, it's complicated once your data size gets large or your query gets complicated. You can get a flavor for what's involved here.
With that said, I wouldn't worry about it being 2 queries just yet. Don't optimize until you know it is an actual performance problem. You'll likely end up working around EF, maybe using the EF extensions and creating a stored proc that can take advantage of windowed functions and CTE's. Or maybe it will just return two result sets in a single procedure.
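If you do end up dropping to raw SQL, a windowed count is one way to get both pieces back in a single result set. A sketch only: it assumes entities is an ObjectContext, SomeTable has an Id column, and RowWithTotal is a small class with Id and Total properties (all of these names are made up for the example).

// Single round trip: the top 5 rows, each carrying the total row count
var rows = entities.ExecuteStoreQuery<RowWithTotal>(
    "SELECT TOP 5 Id, COUNT(*) OVER () AS Total FROM SomeTable ORDER BY Id").ToList();

var totalCount = rows.Count > 0 ? rows[0].Total : 0;
var top5Ids = rows.Select(r => r.Id).ToList();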
This little query should do the trick (I'm not sure if that's really just one physical query though, and it could be that the grouping is done in the code rather than in the DB), but it's definitely more convenient:
var obj = (from x in entities.SomeTable
let item = new { N = 1, x }
group item by item.N into g
select new { Count = g.Count(), First = g.Take(5) }).FirstOrDefault();
Nonetheless, just doing this in two queries will definitely be much faster (especially if you define them in one stored procedure, as proposed here).
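For comparison, the straightforward two-round-trip version would look something like this (assuming the same entities.SomeTable set as above):

// First round trip: just the count
var total = entities.SomeTable.Count();

// Second round trip: just five rows (add an OrderBy if a particular five are needed)
var top5 = entities.SomeTable.Take(5).ToList();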
I'm trying to get a list of rows in DataTableA where the value in Column 1 is not in Column1 of DataTableB.
I'm using the following LINQ query:
//Not in Database
var query = from i in dtImport.AsEnumerable()
where !dtProducts.AsEnumerable().Any(p => p[colP] == i[colI])
select i;
Such that I want a list of products in the import table that aren't already in the products table.
That line seems to execute quickly when I'm stepping through in the debugger, but as soon as I call anything relating to the query, such as int rows = query.Count<DataRow>(); or DataTable dtResult = query.CopyToDataTable();, it takes so long that I just stop the program.
So, what am I doing wrong?
LINQ uses deferred execution: the query is executed when it is used, not when it is declared.
For better performance you can use a HashSet, like the following:
var set = new HashSet<object>(dtProducts.AsEnumerable().Select(p => p[colP]));
var result = dtImport.AsEnumerable().Where(i => !set.Contains(i[colI])).ToList();
The slowdown is expected: the query does not get evaluated until you enumerate the results, which is why you skip past that line in the debugger so quickly. All it does is prepare to query the data source; the actual querying happens when the results are enumerated.
As far as I can tell without profiling your code, the issue is probably related to a big out-of-db select that happens when you convert dtProducts and dtImport to IEnumerable: essentially, you bring the data from both tables into memory before doing your select. If your tables are of considerable size, this is probably where most of the time goes. But again, the only sure way to tell is profiling.
Your query is slow, because it has to enumerate the products for each record in dtImport. Put the products into a dictionary first, to speed up your query.
var prod = dtProducts.AsEnumerable().ToDictionary(p => p[colP]);
var query = from imp in dtImport.AsEnumerable()
where !prod.ContainsKey(imp[colI])
select imp;
I have written what I thought was a pretty solid LINQ statement, but it is getting 2 to 5 second wait times on execution. Does anybody have thoughts about how to speed this up?
t.states = (from s in tmdb.tmZipCodes
where zips.Contains(s.ZipCode) && s.tmLicensing.Required.Equals(true)
group s by new Licensing {
stateCode = s.tmLicensing.StateCode,
stateName = s.tmLicensing.StateName,
FIPSCode = s.tmLicensing.FIPSCode,
required = (bool)s.tmLicensing.Required,
requirements = s.tmLicensing.Requirements,
canWorkWhen = s.tmLicensing.CanWorkWhen,
appProccesingTime = (int) s.tmLicensing.AppProcessingTime
}
into state
select state.Key).ToList();
I've changed it to a two-stage query which runs almost instantaneously by doing a distinct query to make my grouping work, but it seems to me that it is a little counterintuitive to have that run so much faster than a single query.
I'm not sure why it's taking so long, but it might help to have a look at LINQPad; it will show you the actual query being generated and help you optimize it.
Also, it might not be the actual query that's taking a long time; it might be the query generation. I've found that the longest part is when the LINQ is being converted to the SQL statement.
You could possibly use a compiled query to speed up the SQL generation process. A little information can be found on 3devs. I'm not trying to promote my blog entry, but I think it fits.
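As a rough illustration of what a compiled query looks like, assuming LINQ to SQL with a DataContext type called TmDataContext, a tmZipCode entity, and a single string zip-code parameter (all of which are made up here, and the query itself is simplified):

// Compile once (for example into a static field) and reuse it; the cost of
// translating the LINQ into SQL is then paid only on the first call.
static readonly Func<TmDataContext, string, IQueryable<tmZipCode>> LicensedZips =
    CompiledQuery.Compile((TmDataContext db, string zip) =>
        db.tmZipCodes.Where(z => z.ZipCode == zip && z.tmLicensing.Required == true));

// Usage:
var rows = LicensedZips(tmdb, "12345").ToList();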
I would hope it's irrelevant, but
s.tmLicensing.Required.Equals(true)
looks an awful lot (to me) like:
s.tmLicensing.Required
assuming it's a Boolean property.
Given that you know it's true, I don't see much point in having it in the grouping either.
Having said those things, John Boker is absolutely right on both counts: find out whether it's the SQL or LINQ, and then attack the relevant bit.
You don't seem to be using the group, just selecting the key at the end. So, does this do the same thing that you want?
t.states = (from s in tmdb.tmZipCodes
where zips.Contains(s.ZipCode) && s.tmLicensing.Required.Equals(true)
select new Licensing {
stateCode = s.tmLicensing.StateCode,
stateName = s.tmLicensing.StateName,
FIPSCode = s.tmLicensing.FIPSCode,
required = (bool)s.tmLicensing.Required,
requirements = s.tmLicensing.Requirements,
canWorkWhen = s.tmLicensing.CanWorkWhen,
appProccesingTime = (int) s.tmLicensing.AppProcessingTime
}).Distinct().ToList();
Also bear in mind that LINQ does not execute a query until it has to. So if you build your query up in two statements, it will not execute anything against the data context (in this case SQL Server) until the call to ToList. When the query does run, it will merge the multiple queries into one query and execute that.
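A small sketch of that composition, reusing illustrative names from the queries above:

// Neither of these statements sends any SQL; they only compose the expression tree
var filtered = tmdb.tmZipCodes.Where(s => zips.Contains(s.ZipCode));
var projected = filtered.Select(s => s.tmLicensing.StateCode);

// One combined query is generated and executed here
var stateCodes = projected.Distinct().ToList();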