Returning IQueryable or Enumerated Object - c#

I was wondering about the performance difference between these two scenarios, and what the disadvantages of each are compared with the other.
First scenario :
public class Helper // returns IQueryable
{
    public IQueryable<Customer> CurrentCustomer
    {
        get { return new DataContext().Where(t => t.CustomerId == 1); }
    }
}
public class SomeClass
{
public void Main()
{
Console.WriteLine(new Helper().CurrentCustomer.First().Name);
}
}
The second scenario :
public class Helper // returns enumerated result
{
    public Customer CurrentCustomer
    {
        get { return new DataContext().First(t => t.CustomerId == 1); }
    }
}
public class SomeClass
{
public void Main()
{
Console.WriteLine(new Helper().CurrentCustomer.Name);
}
}
Thanks in advance.

Well, the main difference that I can see is when the query is executed and what else you can do with the query.
For example, suppose your Customer object has some large fields. Using the second approach, you will always fetch them. Using the first approach you could write:
string name = helper.CurrentCustomer.Select(x => x.Name).First();
That would then only need to query the single field in the database. In terms of timing, the query will only be executed when you actually request the data (which is how it's able to wait until after you've used Select to work out what to put in the query in the above case). That has pros and cons - it can make it harder to reason about, but it can save some work too. In terms of the "reasoning about" side, you know that once you've got a customer, you've got an object you can just work with. If you use the same queryable twice though, you need to know whether your LINQ query provider is going to cache the result... if you write:
IQueryable<Customer> currentCustomerQuery = helper.CurrentCustomer;
Customer x = currentCustomerQuery.First();
Customer y = currentCustomerQuery.First();
will that issue the query once or twice? I suspect it very much depends on the provider, but I wouldn't like to make any guesses about specific ones.
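To make the "once or twice?" question concrete for one provider only, here is a minimal sketch using LINQ to Objects via AsQueryable(). The Customer class, the Source() iterator and the Enumerations counter are hypothetical stand-ins, not the question's DataContext, and a database-backed provider may well behave differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity, for illustration only.
public class Customer
{
    public int CustomerId { get; set; }
    public string Name { get; set; }
}

public static class DeferredExecutionDemo
{
    // Counts how many times the underlying source is enumerated.
    public static int Enumerations;

    // Stand-in for a table: the body runs once per enumeration, not once per call.
    private static IEnumerable<Customer> Source()
    {
        Enumerations++;
        yield return new Customer { CustomerId = 1, Name = "Ada" };
        yield return new Customer { CustomerId = 2, Name = "Bob" };
    }

    public static int CountEnumerations()
    {
        Enumerations = 0;
        IQueryable<Customer> query = Source().AsQueryable().Where(c => c.CustomerId == 1);

        Customer x = query.First(); // enumerates the source
        Customer y = query.First(); // enumerates it again; nothing is cached here

        return Enumerations;
    }
}
```

With this in-memory provider the source is walked once per First() call, which is the behaviour you would have to check for whichever provider you actually use.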
The other thing to think about is how easy it is to use the API you're building. Personally I'd normally find it easier to use an API which gives me the data I want rather than a query I can fetch that data from. On the other hand, it is slightly less flexible.
One option would be to allow both - have a GetCurrentCustomerQuery() and a GetCurrentCustomer() method. (I probably wouldn't make them properties myself, but that's merely a matter of personal preference.) That way you can get the flexibility you want when you really need it, but have a simple way of just getting the current customer as an object.
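As a rough sketch of the "offer both" idea, the class below exposes a query method and a convenience method built on top of it. CustomerHelper, its constructor and the Customer class are assumptions made for the example; the source is passed in as an IQueryable<Customer> so the sketch runs against an in-memory list.

```csharp
using System;
using System.Linq;

// Hypothetical entity, for illustration only.
public class Customer
{
    public int CustomerId { get; set; }
    public string Name { get; set; }
}

public class CustomerHelper
{
    private readonly IQueryable<Customer> customers;

    // The source is injected so the sketch does not depend on a real data context.
    public CustomerHelper(IQueryable<Customer> customers)
    {
        this.customers = customers;
    }

    // Flexible: callers can keep composing (Select, Where, ...) before anything executes.
    public IQueryable<Customer> GetCurrentCustomerQuery()
    {
        return customers.Where(c => c.CustomerId == 1);
    }

    // Simple: executes the query and hands back a plain object.
    public Customer GetCurrentCustomer()
    {
        return GetCurrentCustomerQuery().First();
    }
}
```

A caller that only needs the name can write helper.GetCurrentCustomerQuery().Select(c => c.Name).First(), while most callers just use helper.GetCurrentCustomer().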

In short, using IQueryable is far better, because it allows you to filter the returned IQueryable further down the path without actually having the object or collection loaded into memory. In this case the return type is a simple Customer class and the impact would be minimal, but in the case of collections you are strongly advised to use IQueryable. Chris Sells shows the problem in more depth here

The difference between the methods is that the first one returns an expression that can return the object, while the second one has already executed the expression and returns the object.
In this exact scenario the difference isn't very useful, and returning a single object as an expression is not very intuitive.
A scenario where the difference is more useful is if you have a method that returns several objects. The deferred execution of the expression means that you will only load the objects that you actually use. In the case that you only need the first few objects, the rest of the objects will not be created.
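The "only the objects you actually use" point can be illustrated with plain LINQ to Objects. This is only a sketch: the Numbers() iterator and the Created counter are made up for the example, and with a database provider the equivalent saving would come from the generated query rather than from an iterator.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyLoadingDemo
{
    // Counts how many items the source actually produced.
    public static int Created;

    // Hypothetical source that could yield 1000 items, one at a time.
    private static IEnumerable<int> Numbers()
    {
        for (int i = 1; i <= 1000; i++)
        {
            Created++;
            yield return i;
        }
    }

    public static List<int> FirstThree()
    {
        Created = 0;
        // Take(3) stops pulling from the source once three items have been returned.
        return Numbers().Take(3).ToList();
    }
}
```

After calling FirstThree(), Created reflects only the items that were requested, not the full 1000 the source could have produced.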

Related

Wrapping A Long List of Parameters as a Single Object

Consider the following interface:
public interface SomeRepo
{
IEnumerable<IThings> GetThingsByParameters(DateTime startDate,
DateTime endDate,
IEnumerable<int> categorIds,
IEnumerable<int> userIds,
IEnumerable<int> typeIds,
string someStringToFilerBy);
}
Is there any benefit in doing this instead?
IEnumerable<IThings> GetThingsByParameters(IParameter parameter);
Where IParameter is an object defined as such:
public interface IParameter
{
DateTime startDate { get; }
DateTime endDate { get; }
IEnumerable<int> categorIds { get; }
IEnumerable<int> userIds { get; }
IEnumerable<int> typeIds { get; }
string someStringToFilerBy { get; }
}
I don't see any benefit in doing IParameter other than it makes it a bit more readable but the extra layer of complexity doesn't seem to be worth it.
Anything that I maybe missing? Thanks.
If that's just for that single place, it may not be worth it all that much.
Creating a class on its own does have some possible benefits, but they're quite dependent on exactly that; whether you would be able to reuse it.
You could add some sort of early data validation to your IParameters implementation (eg. endDate can't be earlier than startDate - it's common sense, you don't need to be a repository object to know that).
If some values are optional and some are not, a Parameters class gives you an opportunity to clearly distinguish these two categories.
It's much easier to find all usages of Parameters in your code than all the occurrences of raw "start date / end date / ids" packs.
This being said, readability isn't a minor concern. I feel that 6 parameters per method is twice too many. And based off experience, I wouldn't bet it will stop at 6.
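A minimal sketch of the early-validation and required-versus-optional points, assuming a concrete class instead of the IParameter interface. The name ThingQuery and the choice of which members are optional are hypothetical; only the "endDate can't be earlier than startDate" rule comes from the discussion above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical parameter object for a GetThingsByParameters-style method.
public sealed class ThingQuery
{
    public DateTime StartDate { get; private set; }
    public DateTime EndDate { get; private set; }
    public IEnumerable<int> CategoryIds { get; private set; }

    // Required values are constructor arguments; optional ones have defaults.
    public ThingQuery(DateTime startDate, DateTime endDate, IEnumerable<int> categoryIds = null)
    {
        // Validation happens once, here, rather than in every repository.
        if (endDate < startDate)
        {
            throw new ArgumentException("endDate must not be earlier than startDate.", "endDate");
        }

        StartDate = startDate;
        EndDate = endDate;
        // Treat "no filter supplied" as an empty list rather than null.
        CategoryIds = categoryIds ?? Enumerable.Empty<int>();
    }
}
```

Because the object is immutable after construction, a repository that receives a ThingQuery can rely on the date range being consistent without checking it again.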
The book Clean Code (Robert C. Martin) argues that it is not a good idea to use many parameters in a method (it recommends at most 3). If you have a method that requires so many parameters, you should reconsider your design; it may suggest that your model needs one more class.
The extreme of that is developing your own expression system where IParameter has a string operator ("Equals", "LessThanOrEqualTo", "Plus", etc.) and then has an array of IParameter[] called Children or something. Of course if you're going to do that, why not use something built-in like LINQ or C#'s Expressions? If this isn't backed by a database and you need to use string filtering that's a good option (or use a DataTable's built-in filtering/parsing of expressions if you don't care about performance).
If this is backed by a database, it's usually a bad idea to expose arbitrary querying on a repository that, say, ties to a SQL database, because the end developer may not know which columns are indexed and may write poorly performing queries (particularly if they don't have easy access to production-scale data). It's better to provide specific query methods that take specific arguments, each mapping essentially to one SQL SELECT, and to fine-tune each query (assuming your repository is backed by a SQL database).
This is more performant because now you explicitly control which indexes the end developer can query from by exposing a method that takes in explicit arguments.
This also makes unit-testing dependencies of your repository much easier, because it's easy to mock a strongly typed repository method like that. If you allow your services to define their own queries, you'd end up making a fake in-memory abstraction of the database using LINQ-to-Objects, and that can sometimes give false positives.
There's nothing inherently wrong with the idea - I just see the typical use case as either being very explicit if backed by a database, or leveraging an already-existing filtering/expression system if this is all in-memory.

Recursion in Fluent API

I am designing a fluent API for writing SQL. Keep in mind that one of my goals is to have the API not suggest functions that can't be called in that part of the chain. For instance, if you have just finished defining a field in the select clause, you can't call Where until you have called From first. A simple query looks like this:
string sql = SelectBuilder.Create()
.Select()
.Fld("field1")
.From("table1")
.Where()
.Whr("field1 > field2")
.Whr("CURRENT_TIMESTAMP > field3")
.Build()
.SQL;
My problem comes with recursion in SQL code. Say you wanted to have a field contain another SQL statement like below:
string sql = SelectBuilder.Create()
.Select()
.Fld("field1")
.SQLFld()
.Select()
.Count("field6")
.From("other table")
.EndSQLFld()
.Fld("field2")
.From("table1")
.Where()
.Whr("field1 > field2")
.Whr("CURRENT_TIMESTAMP > field3")
.Build()
.SQL;
I am using method chaining to build my fluent API. In many ways it is a state machine spread out across many classes, each of which represents one state. To add this functionality I would need to copy essentially every state I already have and wrap the copies between the two SQLFld and EndSQLFld states. I would need yet another copy if you were one more level down and were embedding a SQL statement into a field of the already embedded SQL statement. This goes on to infinity, so with an infinitely deep embedded SQL query I would need an infinite number of classes to represent the infinite states.
I thought about writing a SelectBuilder query that was taken to the point of the Build method and then embedding that SelectBuilder in to another SelectBuilder and that fixes my infinity problem, but it is not very elegant and that is the point of this API.
I could also throw out the idea that the API only offers functions when they are appropriate but I would really hate to do that. I feel like that helps you best discover how to use the API. In many fluent APIs it doesn't matter which order you call what, but I want the API to appear as close to the actual SQL statement as possible and enforce its syntax.
Anyone have any idea how to solve this issue?
Glad to see you are trying fluent interfaces; I think they are very elegant and expressive.
The builder pattern is not the only implementation for fluent interfaces. Consider this design, and let us know what you think =)
This is an example and I leave to you the details of your final implementation.
Interface design example:
public class QueryDefinition
{
// The members don't need to be strings; they can be whatever you use to handle the construction of the query.
private string select;
private string from;
private string where;
public QueryDefinition AddField(string select)
{
this.select = select;
return this;
}
public QueryDefinition From(string from)
{
this.from = from;
return this;
}
public QueryDefinition Where(string where)
{
this.where = where;
return this;
}
public QueryDefinition AddFieldWithSubQuery(Action<QueryDefinition> definitionAction)
{
var subQueryDefinition = new QueryDefinition();
definitionAction(subQueryDefinition);
// Add here any action needed to consider the sub query, which should be defined in the object subQueryDefinition.
return this;
}
}
Example usage:
static void Main(string[] args)
{
// 1 query deep
var def = new QueryDefinition();
def
.AddField("Field1")
.AddField("Field2")
.AddFieldWithSubQuery(subquery =>
{
subquery
.AddField("InnerField1")
.AddField("InnerField2")
.From("InnerTable")
.Where("<InnerCondition>");
})
.From("Table")
.Where("<Condition>");
// 2 queries deep
var def2 = new QueryDefinition();
def2
.AddField("Field1")
.AddField("Field2")
.AddFieldWithSubQuery(subquery =>
{
subquery
.AddField("InnerField1")
.AddField("InnerField2")
.AddFieldWithSubQuery(subsubquery =>
{
subsubquery
.AddField("InnerInnerField1")
.AddField("InnerInnerField2")
.From("InnerInnerTable")
.Where("<InnerInnerCondition>");
})
.From("InnerTable")
.Where("<InnerCondition>");
})
.From("Table")
.Where("<Condition>");
}
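The interface above leaves the handling of the subquery as a comment. One possible way to fill it in, shown purely as a sketch, is to render the nested definition and add it as a parenthesised field. The Build() method and the list of fields are assumptions added for this example, and the sketch does not attempt to enforce call ordering.

```csharp
using System;
using System.Collections.Generic;

public class QueryDefinition
{
    private readonly List<string> fields = new List<string>();
    private string from;
    private string where;

    public QueryDefinition AddField(string field)
    {
        fields.Add(field); // accumulate instead of overwriting
        return this;
    }

    public QueryDefinition From(string table)
    {
        from = table;
        return this;
    }

    public QueryDefinition Where(string condition)
    {
        where = condition;
        return this;
    }

    public QueryDefinition AddFieldWithSubQuery(Action<QueryDefinition> definitionAction)
    {
        var subQueryDefinition = new QueryDefinition();
        definitionAction(subQueryDefinition);
        // The nested definition is rendered recursively, so any depth works with one class.
        fields.Add("(" + subQueryDefinition.Build() + ")");
        return this;
    }

    // Hypothetical rendering step; real SQL generation would need escaping and validation.
    public string Build()
    {
        string sql = "SELECT " + string.Join(", ", fields) + " FROM " + from;
        if (where != null)
        {
            sql += " WHERE " + where;
        }
        return sql;
    }
}
```

Because the lambda receives a fresh QueryDefinition, nesting depth is handled by ordinary recursion rather than by an extra set of state classes per level.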
You can't "have only applicable methods available" without either sub-APIs for the substructures or clear bracketing/ending of all inner structural levels (SELECT columns, expressions in WHERE clause, subqueries).
Even then, running it all through a single API will require it to be stateful & "modal" with "bracketing" methods, to track whereabouts in the decl you are. Error reporting & getting these right will be tedious.
Ending bracketing by "fluent" methods, to me, seems non-fluent & ugly. This would result in an ugly appearance of EndSelect, EndWhere, EndSubquery etc. I'd prefer to build substructures (eg SUBQUERY for select) into a local variable & add that.
I don't like the EndSQLFld() idiom, which terminates the Subquery implicitly by terminating the Field. I'd prefer & guess it would be better design to terminate the subquery itself which is the complex part of the nested structure -- not the field.
To be honest, trying to enforce ordering of a "declarative" API for a "declarative" language (SQL) seems to be a waste of time.
Probably what I'd consider closer to an ideal usage:
SelectBuilder select = SelectBuilder.Create("CUSTOMER")
.Column("ID")
.Column("NAME")
/*.From("CUSTOMER")*/ // look, I'm just going to promote this onto the constructor.
.Where("field1 > field2")
.Where("CURRENT_TIMESTAMP > field3");
SelectBuilder countSubquery = SelectBuilder.Create("ORDER")
.Formula("count(*)")
.Where("ORDER.FK_CUSTOMER = CUSTOMER.ID")
.Where("STATUS = 'A'");
select.Formula(countSubquery, "ORDER_COUNT");
string sql = select.SQL;
Apologies to the Hibernate Criteria API :)

how to return an anonymous type in linq-to-sql

I have the following bogus code, but the idea is to return a generic class
public static var getSeaOf(string user)
{
var periodsSettings = from p in sea
where p.add_by == new Guid(user)
select new { p.id, p.name };
return var;
}
I have read here - How to return anonymous type from c# method that uses LINQ to SQL that the best solution for this case is to create a class for the return type.
But my question is if I have hundreds of functions like this does it mean I need to have hundreds of classes?
I hope there is a more generic solution, thanks for your help!!
Edit
I took a look at
Silverlight - LinqToEntities - How Do I Return Anonymous Types
But I cannot specify the class name in the select new, like the article does?
public static IEnumerable<retSea> getBskSeasonsOf(string user)
{
var periodsSettings = from p in sea
where p.add_by == new Guid(user)
select new retSea { p.id, p.name };
return periodsSettings;
}
If I remember correctly, the spec says that the anonymous type generated for that object cannot escape the method it's defined in. Therefore the only method that could ever have variables of that type is the method the object is instantiated in. This gets a bit sketchy when you consider the fact that the LINQ query could get compiled into a bunch of methods, but that's magic.
The object itself, however, can escape the method. The way to make this work is to... return object. You'll have to access it using reflection (or dynamic) though, so you'll lose type safety. You might want to consider whether this is worth it or not. Most likely it's not. And most likely you don't have hundreds of different types of results either - I bet many of your queries return the same type of data. Re-use those classes.
If you really have hundreds of classes just like that, just make a class. Or use something already built in for a key value pair, like KeyValuePair.
Here's a blog post on naming anonymous types in .NET using VS 2010's Generate From Usage functionality.
http://diditwith.net/2009/10/24/NamingAnonymousTypesWithGenerateFromUsage.aspx
If each one of those methods returns an anonymous type that has the same fields as the others, then you only have to create one class and re-use it throughout the methods.
If they each return an anonymous type that has different fields, then yes, you'll have to create a class for each of those methods.
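As a sketch of the "create a class and re-use it" route: when projecting into a named class, the object initializer needs explicit member names (Id = p.id), unlike an anonymous type. SeaRow and RetSea are hypothetical classes for this example, and the query runs over an in-memory sequence passed in as a parameter rather than a real LINQ to SQL table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type standing in for the mapped table.
public class SeaRow
{
    public Guid add_by { get; set; }
    public int id { get; set; }
    public string name { get; set; }
}

// Small reusable result class; any query returning (id, name) can share it.
public class RetSea
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class SeaQueries
{
    public static IEnumerable<RetSea> GetSeaOf(IEnumerable<SeaRow> sea, string user)
    {
        Guid userId = new Guid(user); // parse once, outside the query
        return from p in sea
               where p.add_by == userId
               select new RetSea { Id = p.id, Name = p.name };
    }
}
```

Any other method that returns an id and a name can reuse RetSea, which is what keeps the number of result classes well below the number of queries.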
If you're using C# 4.0, you could attempt to take advantage of the dynamic type and see what kind of trouble you could get yourself into there.
Well, you probably want to return periodsSettings.
But, after that, consider enumerating the collection with ToDictionary() within this method, where the key is p.id and the value is p.name.
public static IDictionary<int, string> getSeaOf(string user) {
return (from p in sea
where p.add_by == new Guid(user)
select p)
.ToDictionary(p => p.id, p => p.name);
}
This effectively gets you your data without having to make a new class. It enumerates it here, of course, but that might not matter in your case.

Linq based generic alternate to Predicate<T>?

I have an interface called ICatalog as shown below where each ICatalog has a name and a method that will return items based on a Predicate<Item> function.
public interface ICatalog
{
string Name { get; }
IEnumerable<Item> GetItems(Predicate<Item> predicate);
}
A specific implementation of a catalog may be linked to catalogs in various formats, such as XML or a SQL database.
With an XML catalog I end up deserializing the entire XML file into memory, so testing each item with the predicate function does not add a whole lot more overhead as it's already in memory.
Yet with the SQL implementation I'd rather not retrieve the entire contents of the database into memory, and then filter the items with the predicate function. Instead I'd want to find a way to somehow pass the predicate to the SQL server, or somehow convert it to a SQL query.
This seems like a problem that can be solved with Linq, but I'm pretty new to it. Should my interface return IQueryable instead? I'm not concerned right now with how to actually implement a SQL version of my ICatalog. I just want to make sure my interface will allow for it in the future.
Rob has indicated how you might do this (although a more classic LINQ approach might take Expression<Func<Item,bool>>, and possibly return IQueryable<IFamily>).
The good news is that if you want to use the predicate with LINQ-to-Objects (for your xml scenario) you can then just use:
Predicate<Item> func = predicate.Compile();
or (for the other signature):
Func<Item,bool> func = predicate.Compile();
and you have a delegate (func) to test your objects with.
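A minimal sketch of how an in-memory catalog (the XML scenario) could accept an expression and compile it. InMemoryCatalog and the Item class are hypothetical; a SQL-backed implementation would inspect or translate the expression tree instead of compiling it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical item type, for illustration only.
public class Item
{
    public string Name { get; set; }
}

public class InMemoryCatalog
{
    private readonly List<Item> items;

    public InMemoryCatalog(IEnumerable<Item> items)
    {
        this.items = items.ToList();
    }

    public string Name
    {
        get { return "in-memory"; }
    }

    // Takes an expression tree so that other implementations could translate it.
    public IEnumerable<Item> GetItems(Expression<Func<Item, bool>> predicate)
    {
        // Everything is already in memory, so compile to a delegate and filter.
        Func<Item, bool> test = predicate.Compile();
        return items.Where(test);
    }
}
```

Callers write the same lambda either way, for example catalog.GetItems(i => i.Name.StartsWith("A")); only the implementation decides whether to compile it or translate it.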
The problem though, is that this is a nightmare to unit test - you can only really integration test it.
The problem is that you can't reliably mock (with LINQ-to-Objects) anything involving complex data-stores; for example, the following will work fine in your unit tests but won't work "for real" against a database:
var foo = GetItems(x => SomeMagicFunction(x.Name));
static bool SomeMagicFunction(string name) { return name.Length > 3; } // why not
The problem is that only some operations can be translated to TSQL. You get the same problem with IQueryable<T> - for example, EF and LINQ-to-SQL support different operations on a query; even just First() behaves differently (EF demands you explicitly order it first, LINQ-to-SQL doesn't).
So in summary:
it can work
but think carefully whether you want to do this; a more classic black box repository / service interface may be more testable
You don't need to go all the way and create an IQueryable implementation
If you declare your GetItems method as:
IEnumerable<IFamily> GetItems(Expression<Predicate<Item>> predicate);
Then your implementing class can inspect the Expression to determine what is being asked.
Have a read of the IQueryable article though, because it explains how to build an expression tree visitor, which you'll need to build a simple version of.

LINQ to SQL business object creation best practices

I've been using LINQ extensively in my recent projects, however, I have not been able to find a way of dealing with objects that doesn't either seem sloppy or impractical.
I'll also note that I primarily work with ASP.net.
I hate the idea of exposing my data context or LINQ returned types to my UI code. I prefer finer grained control over my business objects, and it also seems too tightly coupled to the db to be good practice.
Here are the approaches I've tried ..
Project items into a custom class
dc.TableName.Select(λ => new MyCustomClass(λ.ID, λ.Name, λ.Monkey)).ToList();
This obviously tends to result in a lot of wireup code for creating, updating etc...
Creating a wrapper around returned object
public class MyCustomClass
{
LinqClassName _core;
internal MyCustomClass(LinqClassName blah)
{
_core = blah;
}
int ID {get { return _core.ID;}}
string Name { get {return _core.Name;} set {_core.Name = value;} }
}
...
dc.TableName.Select(λ => new MyCustomClass(λ)).ToList();
Seems to work pretty well but reattaching for updates seems to be nigh impossible somewhat defeating the purpose.
I also tend to like using LINQ Queries for transformations and such through my code and I'm worried about a speed hit with this method, although I haven't tried it with large enough sets to confirm yet.
Creating a wrapper around returned object while persisting data context
public class MyCustomClass
{
LinqClassName _core;
MyDataContext _dc;
...
}
Persisting the data context within my object greatly simplifies updates but seems like a lot of overhead especially when utilizing session state.
A quick Note: I know the usage of λ is not mathematically correct here - I tend to use it for my bound variable because it stands out visually, and in most lambda statements it is the transformation that is important not the variable - not sure if that makes any sense but blah
Sorry for the extremely long question.
Thanks in advance for your input and Happy New Years!
I create "Map" extension functions on the tables returning from the LINQ queries. The Map function returns a plain old CLR object. For example:
public static MyClrObject Map(this MyLinqObject o)
{
MyClrObject myObject = new MyClrObject()
{
stringValue = o.String,
secondValue = o.Second
};
return myObject;
}
You can then add the Map function to the select list in the LINQ query and have LINQ return the CLR Object like:
return (from t in dc.MyLinqObject
select t.Map()).FirstOrDefault();
If you are returning a list, you can use ToList to get a List<> back. If you prefer to create your own list types, you need to do two things. First, create a constructor that takes an IEnumerable<> of the underlying type as its one argument. That constructor should copy the items from the IEnumerable<> collection. Second, create a static extension method to call that constructor from the LINQ query:
public static MyObjectList ToMyObjectList(this IEnumerable<MyClrObject> collection)
{
return new MyObjectList (collection);
}
Once these methods are created, they kind of hide in the background. They don't clutter up the LINQ queries and they don't limit what operations you can perform in the query.
This blog entry has a more thorough explanation.
