I've got a LINQ query that's returning no results when I know that it should be returning at least one. I'm building up the query dynamically. I looked at the result set in the debugger right before I get to the line that filters out all of the results and it contains hundreds of rows. After this line, it contains 0 when it really should contain at least one.
query = query.Where(x =>
x.Lineages.FirstOrDefault().Sire.Contains(options.PedigreeContains));
'x' in this case represents an entity called 'Horse'. 'options.PedigreeContains' is just a string value. The Lineages table looks like this:
ID HorseID Sire Dam etc...
I can even pull up a Horse entity in the debugger (the one I know should be returned as a result), inspect the Lineages property and see it fully populated, including the Sire value that matches my search. So everything SEEMS like it should be working, except there's obviously some issue with the LINQ query that I'm using.
Does anyone see anything inherently wrong with what I'm doing that would cause this to filter out results that I know should be there?
EDIT: For clarification, it's a 1-to-1 relationship. I know the Lineages object exists, I know there's only one, and I know it matches. It's just for some reason it's returning zero results so I thought there might be a problem with the way I wrote the query. If that query should work the way it's written though (minus all of the extra "possibilities" if no lineages exist, more than one, etc) then it must be an issue somewhere else in my code.
What if FirstOrDefault returns the "Default"? You'll get a NullReferenceException.
You are providing no means to order the Lineages, so the first one returned may not be the one whose Sire contains options.PedigreeContains. In that case the result set would be empty, regardless of the other Sires in the Lineages.
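A minimal sketch of a shape that sidesteps both issues by checking every lineage with Any(), using the same names as in the question:
query = query.Where(x =>
    x.Lineages.Any(lineage => lineage.Sire.Contains(options.PedigreeContains)));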
Actually answering your question: no, there is nothing inherently wrong with your query. It must be an issue somewhere else in your query construction, in the database structure, or in your data.
When debugging, instead of enumerating and verifying the result count, copy the query expression value and look at the generated SQL. You can do that before and after altering the IQueryable query. Other suggestions, like @Jalalx's use of .Any(), avoid the problem @John Saunders points out.
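For example, a quick way to inspect that at a breakpoint (a sketch; it assumes LINQ to SQL or EF 6, where ToString() on the query returns the store command, while older EF versions expose ObjectQuery.ToTraceString() instead):
// Dump the SQL that the current IQueryable would generate.
System.Diagnostics.Debug.WriteLine(query.ToString());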
If you do FirstOrDefault() where you have it, aren't you taking the first of what could be many sires, so that if a later one matches your where clause you won't find it?
query = query.Where(x =>
    x.Lineages.FirstOrDefault(lineage => lineage.Sire.Contains(options.PedigreeContains)) != null);
After reading: http://forums.devart.com/viewtopic.php?f=31&t=22425&p=74949&hilit=IQueryable#p74949
It seems like I should use IQueryable to figure out whether an entity's child table has a record or not.
My initial attempt was like:
Ace.DirDs.Any();
This line of code (or similar lines of code) could be run hundreds of times and was causing a huge performance issue.
So from reading the previous post in the link above I thought I would try something like:
IQueryable<DirD> dddd = CurrentContext.DirDs
.Where(d => d.AceConfigModelID == ace.ID).Take(1);
bool hasAChild = dddd.Any();
Would there be a better way?
There's no need for a Take(1). Plus, this one is shorter to type.
bool hasAChild = CurrentContext.DirDs.Any(d => d.AceConfigModelID == ace.ID);
I may be wrong, but I think Any() will still cause an initial Read() of the first row from the database server back to the client. You may be better off getting a Count so you only get a number back:
bool hasAChild = CurrentContext.DirDs.Count(d => d.AceConfigModelID == ace.ID) > 0;
By the way, this doesn't appear to be looking at a child table, just DirDs.
In your example, you are materializing the IQueryable into an IEnumerable, so the entire query is executed and you are then only taking the first row of the result. Either of the previously shown answers will be significantly faster than this.
Be careful when using Count, as there is both a Count property and a Count() method. To avoid your original problem (if you choose the Count route), you'll want to use the Count() method as in Rhumborl's example; otherwise you'll execute the query and read the Count property of the IEnumerable that was returned.
The Count() method essentially translates into a SQL COUNT, whereas the Any() method translates into a SQL EXISTS (at least when working with Microsoft SQL Server). One can be more efficient than the other at this level, depending on what your backend database is and which version of EF you're using.
My vote would be to always default to Any() and explore Count() if you run into performance troubles. There can still be performance costs with the method Count() at the database level, but that still depends on the database you're using.
Here's a good related answer: Linq To Entities - Any VS First VS Exists
I have executed a LINQ query using Entity Framework, like the one below:
GroupMaster getGroup = null;
getGroup = DataContext.Groups.FirstOrDefault(item =>
    keyword.IndexOf(item.Keywords, StringComparison.OrdinalIgnoreCase) >= 0 && item.IsEnabled);
When executing this method, I got an exception like the one below:
LINQ to Entities does not recognize the method 'Int32 IndexOf(System.String, System.StringComparison)' method, and this
method cannot be translated into a store expression.
The Contains() method is case sensitive by default, so again I would need to convert to lower case. Is there any method for checking a string match other than Contains(), and is there any way to solve the IndexOf issue?
The IndexOf method of the string class is not recognized by Entity Framework. Replace it with a SQL function or a canonical function.
You can use the code sample below:
DataContext.Groups.FirstOrDefault(item =>
    // CHARINDEX is 1-based and returns 0 when nothing is found, hence > 0
    System.Data.Objects.SqlClient.SqlFunctions.CharIndex(item.Keywords, keyword).Value > 0 && item.IsEnabled);
You really only have four options here.
1. Change the collation of the database globally. This can be done in several ways; a simple Google search should reveal them.
2. Change the collation of individual tables or columns.
3. Use a stored procedure and specify the COLLATE statement in your query.
4. Perform a query that returns a large set of results, then filter in memory using LINQ to Objects.
Number 4 is not a good option unless your result set is pretty small (a sketch of that approach follows the links below). Number 3 is good if you can't change the database (but you can't use LINQ with it).
Numbers 1 and 2 are choices you need to make about your data model as a whole, or about whether you only want to do it on specific fields.
Changing the Servers collation:
http://technet.microsoft.com/en-us/library/ms179254.aspx
Changing the Database Collation:
http://technet.microsoft.com/en-us/library/ms179254.aspx
Changing the Columns Collation:
http://technet.microsoft.com/en-us/library/ms190920(v=sql.105).aspx
Using the Collate statement in a stored proc:
http://technet.microsoft.com/en-us/library/ms184391.aspx
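For option 4, here is a rough sketch of filtering in memory with LINQ to Objects, using the query from the question; AsEnumerable() switches to client-side evaluation, so this is only reasonable when the server-side filter keeps the result set small:
GroupMaster getGroup = DataContext.Groups
    .Where(item => item.IsEnabled)          // push as much filtering as possible to the server
    .AsEnumerable()                         // from here on, LINQ to Objects runs in memory
    .FirstOrDefault(item =>
        keyword.IndexOf(item.Keywords, StringComparison.OrdinalIgnoreCase) >= 0);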
Instead, you can use the method below to lower the case:
var lowerCaseItem = item.ToLower();
If your item is of type string, this might get you past that exception.
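Applied to the query from the question, that approach might look like the sketch below (lowering both sides so the comparison is effectively case insensitive; how well this translates depends on your EF version and the column's collation):
GroupMaster getGroup = DataContext.Groups.FirstOrDefault(item =>
    keyword.ToLower().Contains(item.Keywords.ToLower()) && item.IsEnabled);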
Erik Funkenbush's answer is perfectly valid when looking at it as a database problem. But I get the feeling that you need a better structure for keeping keyword data if you want to traverse it efficiently.
Note that this answer isn't intended to be better, it is intended to fix the problem in your data model rather than making the environment adapt to the current (apparently flawed, since there is an issue) data model you have.
My main suggestion, regardless of time constraints (I realize this isn't the easiest fix), would be to add a separate table for the keywords (with a many-to-many relationship to its related classes).
[GROUPS] * ------- * [KEYWORD]
This should allow you to search for the keyword and only then retrieve the items that have that keyword related to them (based on an ID rather than a compound string).
int? keywordID = DataContext.Keywords
    .Where(x => x.Name == keywordFilter)
    .Select(x => (int?)x.Id)    // cast so FirstOrDefault() yields null rather than 0 when there is no match
    .FirstOrDefault();

if (keywordID != null)
{
    getGroup = DataContext.Groups.FirstOrDefault(group => group.Keywords.Any(kw => kw.Id == keywordID));
}
But I can understand completely if this type of fix is not possible anymore in the current project. I wanted to mention it though, in case anyone in the future stumbles on this question and still has the option for improving the data structure.
I have a LINQ query that is causing some timeout issues. Basically, I have a query that is returning the top 100 results from a table that has approximately 500,000 records.
Here is the query:
using (var dc = CreateContext())
{
var accounts = string.IsNullOrEmpty(searchText)
? dc.Genealogy_Accounts
.Where(a => a.Genealogy_AccountClass.Searchable)
.OrderByDescending(a => a.ID)
.Take(100)
: dc.Genealogy_Accounts
.Where(a => (a.Code.StartsWith(searchText)
|| a.Name.StartsWith(searchText))
&& a.Genealogy_AccountClass.Searchable)
.OrderBy(a => a.Code)
.Take(100);
return accounts.Select(a =>
}
}
Oddly enough, it is the first LINQ query that is causing the timeout. I thought that by doing a Take we wouldn't need to scan all 500k records; however, that must be what is happening. I'm guessing that the join to find what is 'searchable' is causing the issue. I'm not able to denormalize the tables, so I'm wondering if there is a way to rewrite the LINQ query to get it to return quicker, or if I should just write this query as a stored procedure (and if so, what it might look like). Thanks.
Well to start with, I'd find out what query is being generated (in LINQ to SQL you'd set the Log on the data context) and then profile it in SQL Server Management Studio. Play with it there until you've found something that is fast enough (either by changing the query or adding indexes) and if you've had to change the query, work out how to represent that in LINQ.
I suspect the problem is that you're combining OrderBy and Take - which means it potentially needs to find all the results in order to work out what the top 100 would look like. Is Code indexed? If not, try indexing it - that may help by allowing the server to consider records in the order in which they'd be returned, so it can stop after it's found 100 records. You should look at indexes for the other columns too.
The Take(100) translates to "Select Top 100" etc. This would help if your problem was an otherwise huge result set, where there are a lot of columns returned. I bet though that your problem is a table scan resulting from the query. In this case, .Take(100) might not help much at all.
So, the likely culprit is the same as if you were doing SQL using ADO.NET: how are your indexes? Are the fields being searched fields for which you don't have good indexes? That would cause a drastic decrease in performance compared to queries that do utilize good indexes. Add an index that includes Code and Name and see what happens. Not using an index for Code is guaranteed to hose you, because of the Order By. Also, what field links Genealogy_Accounts and Genealogy_AccountClass? A lack of an index on either table could hose things. (I would guess an index including Searchable is unlikely to help.)
Use SQL Profiler to see the actual query being run (though you can do this in VS too), and to see how bad it really is on the server.
The problem might be LINQ doing something stupid generating the query, but this is probably not the case. We're finding LINQ to SQL often makes better queries than we do. Even if it looks goofy, it's usually very efficient. You can put the SQL in Query Analyzer and check out the query plan. Then rewrite the SQL to be more human-simple and see if it improves things - I bet it won't. I think you'll still see a table scan, indicating something is wrong with your index.
var adminCov = db.SearchAgg_AdminCovs.SingleOrDefault(l => l.AdminCovGuid == covSourceGuid);
adminCov keeps coming back null. When I run SQL Profiler, I can see the generated SQL; when I paste that into Management Studio, I get the result I expect.
LINQ to SQL generates this:
exec sp_executesql N'SELECT [t0].[AdminCovGuid], [t0].[AdminPolicyId], [t0].[CertSerialNumber], [t0].[CertNumber], [t0].[PseudoInsurerCd], [t0].[SourceSystemCode], [t0].[CovSeqNumber], [t0].[RiderSeqNumber], [t0].[CovRiderIndicator], [t0].[CovCd], [t0].[AddrSeqNumber], [t0].[TransferSeqNumber], [t0].[CovStatusIndicator], [t0].[CovEffectiveDate], [t0].[CovExpirationDate], [t0].[CovCancelDate], [t0].[ClmIntegCode], [t0].[ClmNumber], [t0].[ClmCertSeqNumber], [t0].[TermNumber], [t0].[CovPaidThruDate], [t0].[BillThruDate], [t0].[BillModeCode], [t0].[BillModeDesc], [t0].[CalcModeCode], [t0].[CalcModeDesc], [t0].[Form1Name], [t0].[BenefitAmt], [t0].[CovDesc], [t0].[ProdLineDesc], [t0].[PremiumAmt], [t0].[PremiumTypeIndicator], [t0].[PremiumTypeDesc]
FROM [dbo].[SearchAgg_AdminCov] AS [t0]
WHERE [t0].[AdminCovGuid] = @p0',N'@p0 uniqueidentifier',@p0='D2689692-33E8-4B31-A77B-2D3A627145D4'
When I execute, I get a result. What am I missing here?
Thanks for any help,
~ck in San Diego
This is a really good question. I had the same issue with LINQ to SQL when selecting invoices in a date range. Some of them were not present in the object results even though they were included in the generated SQL query result. I had some serious trouble with it, because some invoices were not exported to the accounting software.
What I did was to create stored procedure and everything worked perfectly fine.
I would really like to know the true solution for this and why it happened.
Do you get your result back if you change your statement, as follows (note the "Equals" instead of the "==")?
var adminCov = db.SearchAgg_AdminCovs.SingleOrDefault(l => l.AdminCovGuid.Equals(covSourceGuid));
I have run into some comparison equality issues with GUIDs in the past (usually in Unit Testing), but the same might apply here.
Using Single or SingleOrDefault is always risky if there could be zero or more than one record matching the criteria. SingleOrDefault returns null when there are no matches and throws an exception when there is more than one match (which could be the case here, since you say the data is there); plain Single throws in both situations. As an alternative, you could use FirstOrDefault to get the first match if there is at least one; it returns null when there are no matches.
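For the query in the question, that alternative would be (a sketch):
var adminCov = db.SearchAgg_AdminCovs.FirstOrDefault(l => l.AdminCovGuid == covSourceGuid);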
Does your SearchAgg_AdminCovs table have a primary key set? I'm not sure, but I once had a headache from forgetting to set one, though I don't remember whether it affected select, update, insert, or delete.
How can I return first 100 records using Linq?
I have a table with 40million records.
This code works, but it's slow, because it will return all values before filtering:
var values = (from e in dataContext.table_sample
where e.x == 1
select e)
.Take(100);
Is there a way to return the filtered results directly, like the T-SQL TOP clause does?
No, that doesn't return all the values before filtering. The Take(100) will end up being part of the SQL sent up - quite possibly using TOP.
Of course, it makes more sense to do that when you've specified an orderby clause.
LINQ doesn't execute the query when it reaches the end of your query expression. It only sends up any SQL when either you call an aggregation operator (e.g. Count or Any) or you start iterating through the results. Even calling Take doesn't actually execute the query - you might want to put more filtering on it afterwards, for instance, which could end up being part of the query.
When you start iterating over the results (typically with foreach) - that's when the SQL will actually be sent to the database.
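A small sketch of that behaviour, using the names from the question (no SQL is sent until the foreach starts):
var values = dataContext.table_sample
    .Where(e => e.x == 1)
    .Take(100);                  // still just an IQueryable; nothing has executed yet

foreach (var value in values)    // the query (with TOP 100) runs here
{
    // process value
}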
(I think your where clause is a bit broken, by the way. If you've got problems with your real code it would help to see code as close to reality as possible.)
I don't think you are right about it returning all records before taking the top 100. LINQ decides what the SQL string is going to be at the time the query is executed (deferred execution), and your database server will optimize it.
Have you compared a standard SQL query with your LINQ query? Which one is faster, and how significant is the difference?
I do agree with the above comments that your LINQ query is generally correct, but...
- in your 'where' clause it should probably be x == 1, not x = 1 (comparison instead of assignment)
- 'select e' will return all columns where you probably need only some of them - be more precise with the select clause (list only the required columns); 'select *' is a waste of resources
- make sure your database is well indexed and try to make use of the indexed data
Anyway, a table with 40 million records is quite huge - do you need all that data all the time? Maybe some kind of partitioning can reduce it to the most commonly used records.
I agree with Jon Skeet, but just wanted to add:
The generated SQL will use TOP to implement Take().
If you're able to run SQL-Profiler and step through your code in debug mode, you will be able to see exactly what SQL is generated and when it gets executed. If you find the time to do this, you will learn a lot about what happens underneath.
There is also a DataContext.Log property that you can assign a TextWriter to view the SQL generated, for example:
dbContext.Log = Console.Out;
Another option is to experiment with LINQPad. LINQPad allows you to connect to your data source and easily try different LINQ expressions. In the results panel, you can switch to see the SQL generated by the LINQ expression.
I'm going to go out on a limb and guess that you don't have an index on the column used in your where clause. If that's the case then it's undoubtedly doing a table scan when the query is materialized and that's why it's taking so long.