LINQ: Difference between 'Select c' and 'Select new (c...' - c#

What is the difference between these two statements:
var result = from c in context.CustomerEntities
join p in context.ProjectEntities on c.Pk equals p.CustomerPk
where p.Entered > DateTime.Now.AddDays(-15)
select c;
and
var result = from c in context.CustomerEntities
join p in context.ProjectEntities on c.Pk equals p.CustomerPk
where p.Entered > DateTime.Now.AddDays(-15)
select new { c.Company, c.Entered, c.Pk };
Is there any performance-related issue with these statements? (For simplicity, c contains only these 3 columns.)
Thanks.

What is difference between these two statements
The first returns a filtered sequence of the original/complete source object; the second still does the filter, but returns a sequence of an anonymous type with just those three properties.
Is there any performance related issue in these statements
Performance depends on the back-end. If this is LINQ-to-Objects, then with new {...} you are creating extra objects (anonymous types) per record, so there may be a very small overhead. However, if this is LINQ-to-SQL etc (a database back-end), then this can be a huge benefit. The query builder will check which columns are needed, and will only fetch the three in your anon-type; if you have (for example) a BLOB (or just long varchar) in your data that you don't need, this can be a huge benefit.
Additional notes: you can't include anonymous types in the signature of a method, so you might find you need to declare your own DTO type for this purpose:
return new CustomerDto { Company = c.Company, Entered = c.Entered, PK = c.Pk };
...
public class CustomerDto { ... }

The main difference is that the first example returns references to existing instances while the second example creates new instances of an anonymous type. I would be more concerned with this issue than any possible performance issues.
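To make the reference point concrete, here is a minimal LINQ-to-Objects sketch. The in-memory customers array and its values are made up for illustration (they stand in for context.CustomerEntities), and the code is written as C# top-level statements, which assumes C# 9 or later.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory data standing in for context.CustomerEntities.
// Anonymous types are reference types, so they can play the role of entities here.
var customers = new[]
{
    new { Pk = 1, Company = "Acme", Entered = new DateTime(2020, 1, 1) },
    new { Pk = 2, Company = "Globex", Entered = new DateTime(2021, 6, 1) },
};

// 'select c' yields the very same instances that live in the source.
var whole = (from c in customers select c).ToList();

// 'select new { ... }' allocates a new object per row holding only the chosen members.
var projected = (from c in customers select new { c.Company, c.Pk }).ToList();

bool sameInstance = object.ReferenceEquals(whole[0], customers[0]);
bool projectionIsCopy = !object.ReferenceEquals(projected[0], customers[0]);

Console.WriteLine(sameInstance);      // True
Console.WriteLine(projectionIsCopy);  // True
```

So a change made through an item of the first result is visible in the source object, while the projected objects are independent (and read-only) copies of the selected values.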

I ran some tests (using Stopwatch). In no cases were anonymous types faster, in Linq-to-SQL (against SQL Server), Linq-to-Entities (against MySQL), and Linq-to-Objects (against a List). In fact, usually it was slower, depending on how many columns you select.
One of my results:
I ran each query 5000 times against a 5-column table populated by 400 rows with Linq-to-Entities.
anonymous object (selecting 1 column): 17314ms
anonymous object (selecting 5 columns): 19193ms
source object: 16055ms
Anyway, the best way to find out is to test it yourself (takes about the time to write a good post).
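If you want to try this yourself, a rough Stopwatch harness could look like the sketch below. It only covers LINQ-to-Objects; the rows data, the TimeIt helper and the filter are assumptions of this sketch, not the queries I actually timed, and the numbers will differ from machine to machine. It uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical data: 400 rows with 5 "columns", similar in size to the table mentioned above.
var rows = Enumerable.Range(0, 400)
    .Select(i => new { Id = i, A = i * 2, B = i * 3, C = "row" + i, D = i % 7 })
    .ToList();

// Helper assumed for this sketch: runs an action N times and returns the elapsed milliseconds.
static long TimeIt(int iterations, Action action)
{
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++) action();
    sw.Stop();
    return sw.ElapsedMilliseconds;
}

// Same filter both times; only the projection differs.
long wholeMs = TimeIt(5000, () => rows.Where(r => r.D == 0).ToList());
long anonMs = TimeIt(5000, () => rows.Where(r => r.D == 0).Select(r => new { r.Id }).ToList());

Console.WriteLine($"source object:    {wholeMs} ms");
Console.WriteLine($"anonymous object: {anonMs} ms");
```

Against a real database you would swap rows for your context's table, and the picture can change because the provider then decides which columns to fetch.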

If in doubt, profile.
But yes, I think that there is a performance overhead. If you do select c then the collection will contain references to the original items. If you do select new { ... } then C# is building an anonymous type for you, creating new instances of that type and filling them with data. Sounds definitely slower to me.

Related

change value of a member of a list of type anonymous

I have a list of anonymous objects that all have the same members, and I create this list with something like this:
var listWorthies = balanceWorthies.Select(w => new
{
OwnerName = w.OwnerOnInquery,
OwnerDocDate = w.OwnerDocDate,
}).ToList();
Now I want to convert the OwnerDocDate member of each anonymous object,
something like this:
var listWorthies = balanceWorthies.Select(w => new
{
OwnerName = w.OwnerOnInquery,
OwnerDocDate = ConvertDate(w.OwnerDocDate),
}).ToList();
This gives me the error:
LINQ to Entities does not recognize the method 'System.String ConvertDate(System.DateTime)' method, and this method cannot be translated into a store expression.
After creating the list, I also tried something like this:
foreach (dynamic Worthy in listWorthies)
{
DateTime OwnerDocDate_2 = ConvertDate(Worthy.OwnerDocDate);
Worthy.AddProperty(OwnerDocDate_2);
}
but it gives me this error :
'<>f__AnonymousType8<string,string,System.DateTime,System.DateTime,System.Guid,string>' does not contain a definition for 'AddProperty'
UPDATE
The output of the ConvertDate function is a String, and I want to change the type of OwnerDocDate from DateTime to string.
How can I solve this?
You forgot to tell us that balanceWorthies is an IQueryable<...>, not an IEnumerable<...>.
To understand why this matters, you must understand the difference between an IQueryable<...>, and an IEnumerable<...>.
IEnumerable
An object of a class that implements IEnumerable<...>, represents a sequence. You can get the first element, and once you've got an element, you can get the next element, as long as there are elements.
At its lowest level this is done using the methods GetEnumerator() / MoveNext() / Current:
IEnumerable<Customer> customers = ...
IEnumerator<Customer> enumerator = customers.GetEnumerator();
while (enumerator.MoveNext())
{
// There is still a Customer in the sequence:
Customer customer = enumerator.Current;
ProcessCustomer(customer);
}
Internally, foreach does exactly this.
If you look at the LINQ methods, you will see that there are two groups of methods: the ones that return IEnumerable<...> and the others. The first group won't enumerate the sequence. We say that they use lazy-execution or deferred-execution. In the description of these LINQ methods, you'll find this term in the remarks section.
Concatenating methods of this group isn't expensive: the query is not executed, only the Enumerator is adjusted.
The LINQ methods that return something other than IEnumerable<...> will execute the query. Deep inside, GetEnumerator / MoveNext / Current are called to access the elements of the source sequence one by one.
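A small sketch of the two groups, assuming an in-memory list and a counter (both made up for this example) to show when the predicate actually runs. It uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 1, 2, 3, 4 };
int predicateCalls = 0;

// Where returns a lazy IEnumerable<int>: nothing has been enumerated yet.
IEnumerable<int> evens = numbers.Where(n => { predicateCalls++; return n % 2 == 0; });
int callsBeforeEnumeration = predicateCalls;   // still 0

// ToList returns a List<int>, not a lazy sequence, so it has to enumerate the source.
List<int> materialized = evens.ToList();
int callsAfterEnumeration = predicateCalls;    // 4, one call per source element

Console.WriteLine($"{callsBeforeEnumeration} {callsAfterEnumeration} {materialized.Count}"); // 0 4 2
```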
IQueryable
An object of a class that implements IQueryable<...> doesn't represent an enumerable sequence, it represents the potential to fetch an enumerable sequence.
To do this, the class holds an Expression and a Provider. The Expression holds what data must be fetched in some generic format. The Provider knows where the data must be fetched (usually a Database Management System) and what language is used to communicate with this DBMS (usually SQL).
When you ask the IQueryable to get the enumerator, the Expression is sent to the Provider, who will translate the Expression into SQL and fetch the data at the DBMS. The fetched data is presented as an IEnumerator<...>. The caller can use MoveNext() / Current to access the fetched elements one by one.
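You can look at the Expression and the Provider yourself. The sketch below assumes an in-memory array wrapped with AsQueryable, so the provider is the built-in LINQ-to-Objects one rather than a database provider, but the shape is the same. It uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// AsQueryable wraps an in-memory array; with Entity Framework the source would be a DbSet or ObjectSet.
IQueryable<int> query = new[] { 1, 2, 3 }.AsQueryable().Where(n => n > 1);

// Nothing has been fetched yet. The query only carries a description of what to fetch...
Expression expression = query.Expression;
var call = (MethodCallExpression)expression;
string methodName = call.Method.Name;           // "Where"

// ...and the provider that knows how to execute that description.
IQueryProvider provider = query.Provider;

// Enumerating (here via ToArray) hands the expression to the provider for execution.
int[] results = query.ToArray();

Console.WriteLine(methodName);       // Where
Console.WriteLine(results.Length);   // 2
```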
Back to your question
The Provider doesn't know your own methods. Hence it doesn't know how to translate them into SQL. In fact, there are several standard LINQ methods that are not supported by entity framework. See Supported and unsupported LINQ methods
Your compiler doesn't know how smart your Provider is, so it can't complain. You'll get the error at runtime.
The easiest way to solve your problem is by transferring the data to your local process and let your local process execute the methods as if it was an IEnumerable<...>.
This is done using the method Enumerable.AsEnumerable. As transferring data is expensive, it is wise to limit the data being transferred to a minimum before you call AsEnumerable. So first do all your Where, (Group-)Join, etc. everything that limits the amount of transferred data.
var listWorthies = balanceWorthies.Select(w => new
{
OwnerName = w.OwnerOnInquery,
OwnerDocDate = w.OwnerDocDate,
})
// OwnerDocDate has the type of balanceWorthy.OwnerDocDate
// move the data to local process, so you can ConvertDate
.AsEnumerable()
// now you can call your own methods:
.Select(fetchedItem => new
{
OwnerName = fetchedItem.OwnerName,
OwnerDocDate = ConvertDate(fetchedItem.OwnerDocDate),
});
Database management systems are extremely well optimized for selecting data. ConvertDate appears only to translate the data into a different format, so you won't lose much efficiency by running it locally.
If in other cases the method will change the amount of selected data, try to change the expression such that the DBMS can handle it, especially if it is before a Where. If the DBMS must execute your code, and you can't translate the LINQ into something that your provider supports, you'll have to write an extension method that changes the Expression. How to do that is something for a different question.

Linq2Entities CompiledQuery for query that uses joins

I have a query that is not performing too well, i.e. the generated SQL code is sub-optimal.
The original statement looked something like this (simplified):
ctx.Table1.Where(t => ...)
.OrderBy(t => ....)
.Select(t => new {Table1 = t,
SomeProperty = t.Table2.SomeProperty,
SomeProperty2 = t.Table2.SomeProperty2,
AnotherProperty = t.Table3.AnotherProperty,
...
}
I looked in SQL Profiler and found that the generated SQL would join the same table multiple times and the statement would take around 1 second to execute.
I then rewrote the statement to something along these lines:
from t in ctx.Table1
join t2 in ctx.Table2 on t.key equals t2.key into lt2
from t2 in lt2.DefaultIfEmpty()
join t3 in ctx.Table3 on t.key equals t3.key into lt3
from t3 in lt3.DefaultIfEmpty()
where t ...
orderby t...
select new {Table1 = t, .... }
This generated a much nicer statement that, when grabbed from SQL Profiler and executed in Management Studio, is twice as fast as the statement generated by the code in the previous example.
However, when running the code from the second example, the time EF takes to generate the expression far exceeds the time gained from the query optimization.
So how do I go about writing statement number two as a CompiledQuery? I basically don't know how to return an anonymous type from a CompiledQuery.
A workaround I found for using CompiledQueries is:
Add a private InitQueryX() method before each QueryX() method that uses LINQ to Entity.
Use attributes and reflection to call all the InitQueryX() methods from an Init() method.
Call Init() method once on app start.
This forces compilation of queries at start, yet enables writing queries in a more flexible manner than CompiledQueries do.
The InitQueryX() should use multiple dummy inputs so that it covers all the paths within QueryX() method (sort of like unit tests code coverage).
When possible, the InitQueryX()'s inputs should be mocks that result in 0 rows in database, so that the Init() method will take less time to run.
You can use the Tuple class if your return object will have 8 or fewer properties. If you have more properties and don't want to declare a class for them, you can use dynamic as the return type.

Using a delegate CompiledQuery in a join?

I have a question related to this previous question of mine. In an existing bit of LINQ which involves a number of joins, I'm trying to take each separate method comprising the join and convert it to a CompiledQuery.
First, the normal LINQ method:
private IQueryable<Widget> GetWidgetQuery()
{
return db.Widgets.Where(u => (!u.SomeField.HasValue || !u.SomeField.Value));
}
And here, a delegate (field) definition for a CompiledQuery along these lines:
private static readonly Func<DBDataContext, IQueryable<Widget>> GetWidgetQuery =
CompiledQuery.Compile((DBDataContext db) =>
db.Widgets.Where(u => (!u.SomeField.HasValue || !u.SomeField.Value)));
If I hover over the normal LINQ statement for the method GetWidgetQuery(), I see that it's a method as below:
(method) IQueryable<Widget> GetWidgetQuery()
However, the compiled query delegate (field) differs as follows:
(field) Func<DBDataContext, IQueryable<Widget>> GetWidgetQuery
Upon executing the latter as part of the LINQ statement, the syntax differs as follows. First, the normal LINQ's participation in the join:
var myquery =
from wxr in GetWidgetXRQuery()
join w in GetWidgetQuery() on wxr.WidgetID equals w.ID
select new DTO.WidgetList
{
...
}
And here, the invocation of the CompiledQuery in the form of the delegate:
var myquery =
from wxr in GetWidgetXRQuery()
join w in GetWidgetQuery.Invoke(myContext) on wxr.WidgetID equals w.ID
select new DTO.WidgetList
{
...
}
The former returns the expected result set; the latter, when I attempt myquery.ToList(), yields a stackoverflow exception, in part related to this limitation of .NET 3.5, I think.
Can someone please help me understand how the compiled statement existing as a field (or I guess I should say a delegate) rather than a method is killing my query? In short I know what I'm doing is wrong, but I'm not sure I understand what I misunderstand.
I tried doing roughly the same thing you're doing on EF 4, and everything seems to work fine. So it's either an EF 3.5 issue, or it has something to do with your implementation of GetWidgetXRQuery, or some combination of the two.
But the real point I'd like to make is that, as Roy Goode stated in an answer to your previous question, you lose all the advantages of a precompiled query once you extend that query in any way. By trying to perform a Join on your query, you are converting it to just a plain old query. So you might as well just use the non-compiled version which appears to work for you.
Update
Realized you were talking about LINQ to SQL. This sort of query does appear to have support in Entity Framework, but not LINQ to SQL. In .NET 4, I'm getting the following error:
An IQueryable that returns a self-referencing Constant expression is not supported.
That doesn't mean much to me, but I'm guessing that it has something to do with the way the compiled query is represented internally. I still get the same error if I evaluate the query into a variable and use that variable in the query later, so it clearly has nothing to do with the difference between a delegate and a function. I still maintain that a compiled query is not appropriate to use here. Either you need to create one big compiled query to represent the whole query you want to perform, or you need to use regular queries if you want to piece them together this way.
I just came across this same error while doing DB integration testing. To get straight to the point without explaining my specific issue: LINQ to SQL builds the SQL query internally while you compose an IQueryable, and executes it against the database the moment you call a method such as ToList() on it. In my case I was joining to a method that returns an IQueryable but is mocked to return a fixed result. LINQ to SQL tries to compile that into a SQL query, but the IQueryable I created has no underlying SQL query.

What's the difference between these LINQ queries?

I use LINQ-SQL as my DAL, I then have a project called DB which acts as my BLL. Various applications then access the BLL to read / write data from the SQL Database.
I have these methods in my BLL for one particular table:
public IEnumerable<SystemSalesTaxList> Get_SystemSalesTaxList()
{
return from s in db.SystemSalesTaxLists
select s;
}
public SystemSalesTaxList Get_SystemSalesTaxList(string strSalesTaxID)
{
return Get_SystemSalesTaxList().Where(s => s.SalesTaxID == strSalesTaxID).FirstOrDefault();
}
public SystemSalesTaxList Get_SystemSalesTaxListByZipCode(string strZipCode)
{
return Get_SystemSalesTaxList().Where(s => s.ZipCode == strZipCode).FirstOrDefault();
}
All pretty straight forward I thought.
Get_SystemSalesTaxListByZipCode is always returning a null value though, even when it has a ZIP Code that exists in that table.
If I write the method like this, it returns the row I want:
public SystemSalesTaxList Get_SystemSalesTaxListByZipCode(string strZipCode)
{
var salesTax = from s in db.SystemSalesTaxLists
where s.ZipCode == strZipCode
select s;
return salesTax.FirstOrDefault();
}
Why does the other method not return the same, as the query should be identical ?
Note that, the overloaded Get_SystemSalesTaxList(string strSalesTaxID) returns a record just fine when I give it a valid SalesTaxID.
Is there a more efficient way to write these "helper" type classes ?
Thanks!
This is probably down to the different ways LINQ handles IEnumerable<T> and IQueryable<T>.
You have declared Get_SystemSalesTaxList as returning IEnumerable<SystemSalesTaxList>. That means that when, in your first code sample, you apply the Where operator to the results of Get_SystemSalesTaxList, it gets resolved to the Enumerable.Where extension method. (Note that what matters is the declared type. Yes, at runtime Get_SystemSalesTaxList is returning an IQueryable<SystemSalesTaxList>, but its declared type -- what the compiler sees -- is IEnumerable<SystemSalesTaxList>.) Enumerable.Where runs the specified .NET predicate over the target sequence. In this case, it iterates over all the SystemSalesTaxList objects returned by Get_SystemSalesTaxList, yielding the ones where the ZipCode property equals the specified zip code string (using the .NET String == operator).
But in your last code sample, you apply the Where operator to db.SystemSalesTaxLists, which is declared as a type that implements IQueryable<SystemSalesTaxList>. So the Where operator in that sample gets resolved to Queryable.Where, which translates the specified predicate expression to SQL and runs it on the database.
So what's different in the zip code methods is that the first one runs the C# s.ZipCode == strZipCode test in .NET, and the second translates that into a SQL query WHERE ZipCode = 'CA 12345' (parameterised SQL really but you get the idea). Why do these give different results? Hard to be sure, but the C# == predicate is case-sensitive, and depending on your collation settings the SQL may or may not be case-sensitive. So my suspicion is that strZipCode doesn't match the database zip codes in case, but in the second version SQL Server collation is smoothing this over.
The best solution is probably to change the declaration of Get_SystemSalesTaxList to return IQueryable<SystemSalesTaxList>. The major benefit of this is that it means queries built on Get_SystemSalesTaxList will be executed database side. At the moment, your methods are pulling back EVERYTHING in the database table and filtering it client side. Changing the declaration will get your queries translated to SQL and they will run much more efficiently, and hopefully will solve your zip code issue into the bargain.
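Here is a minimal sketch of how the declared type picks the Where overload. The zipCodes array is a made-up in-memory stand-in for db.SystemSalesTaxLists, so no SQL is generated here; the point is only which extension method the compiler binds to. It uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for db.SystemSalesTaxLists (in real code, a LINQ to SQL table).
IQueryable<string> zipCodes = new[] { "CA 12345", "NY 10001" }.AsQueryable();

// Same object, but the compiler only sees IEnumerable<string> from here on.
IEnumerable<string> declaredAsEnumerable = zipCodes;

var viaQueryable = zipCodes.Where(z => z == "CA 12345");              // binds to Queryable.Where
var viaEnumerable = declaredAsEnumerable.Where(z => z == "CA 12345"); // binds to Enumerable.Where

bool stillComposable = viaQueryable is IQueryable<string>;         // provider can still translate it
bool fellBackToObjects = !(viaEnumerable is IQueryable<string>);   // filtering now happens in .NET

// In .NET, == on strings is an ordinal, case-sensitive comparison.
string requested = "ca 12345";
bool caseSensitiveMiss = !declaredAsEnumerable.Any(z => z == requested);

Console.WriteLine($"{stillComposable} {fellBackToObjects} {caseSensitiveMiss}"); // True True True
```

With a real database behind the IQueryable, the first Where would become part of the SQL statement and follow the column's collation, which is why the two versions of your method can disagree.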
The real issue here is the use of IEnumerable<T>, which breaks "composition" of queries; this has two effects:
you are reading all (or at least, more than you need) of your table each time, even if you ask for a single row
you are running LINQ-to-Objects rules, so case-sensitivity applies
Instead, you want to be using IQueryable<T> inside your data layer, allowing you to combine multiple queries with additional Where, OrderBy, Skip, Take, etc as needed and have it build the TSQL to match (and use your db's case-sensitivity rules).
Is there a more efficient way to write these "helper" type classes ?
For more efficient (less code to debug, doesn't stream the entire table, better use of the identity-map to short-circuit additional lookups (via FirstOrDefault etc)):
public IQueryable<SystemSalesTaxList> Get_SystemSalesTaxList()
{
return db.SystemSalesTaxLists;
}
public SystemSalesTaxList Get_SystemSalesTaxList(string salesTaxID)
{
return db.SystemSalesTaxLists.FirstOrDefault(s => s.SalesTaxID==salesTaxID);
}
public SystemSalesTaxList Get_SystemSalesTaxListByZipCode(string zipCode)
{
return db.SystemSalesTaxLists.FirstOrDefault(s => s.ZipCode == zipCode);
}

Returning multiple streams from LINQ query

I want to write a LINQ query which returns two streams of objects. In F# I would write a Seq expression which creates an IEnumerable of 2-tuples and then run Seq.unzip. What is the proper mechanism to do this in C# (on .NET 3.5)?
Cheers, Jurgen
Your best bet is probably to create a Pair<T1, T2> type and return a sequence of that. (Or use an anonymous type to do the same thing.)
You can then "unzip" it with:
var firstElements = pairs.Select(pair => pair.First);
var secondElements = pairs.Select(pair => pair.Second);
It's probably worth materializing pairs first though (e.g. call ToList() at the end of your first query) to avoid evaluating the query twice.
Basically this is exactly the same as your F# approach, but with no built-in support.
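Put together, a minimal sketch could look like this. It uses an anonymous type instead of a Pair<T1, T2> class; the source array and the First/Second values are made up for illustration, and the code is written as C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var source = new[] { 1, 2, 3 };

// One query producing pairs. ToList() materializes it, so the two Selects below
// read the cached pairs instead of re-running the query.
var pairs = source
    .Select(n => new { First = n, Second = "item" + n })
    .ToList();

// "Unzip" into two separate sequences.
List<int> firstElements = pairs.Select(pair => pair.First).ToList();
List<string> secondElements = pairs.Select(pair => pair.Second).ToList();

Console.WriteLine(string.Join(",", firstElements));   // 1,2,3
Console.WriteLine(string.Join(",", secondElements));  // item1,item2,item3
```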
Since C# (on .NET 3.5) has no built-in tuples, you can create an anonymous type instead.
The syntax for this is:
someEnumerable.Select( inst => new { AnonTypeFirstStream = inst.FieldA, AnonTypeSecondStream = inst.FieldB });
This way you're not limited in the number of streams you return; you can just add a field to the anonymous type, much like you can add an element to a tuple.
