Manipulating Linq Expressions because I want them to be magic - c#

I have been carving out a section of code for a reporting app. I have derived many interfaces, for example
public interface IDateRangeSearchable
{
DateTime StartDate { get; }
DateTime EndDate { get; }
}
for which I have then created helper methods that take an expression and add to it, so that I don't need to rewrite the same logic over and over and can keep consistency and business logic in one place:
public static Expression<Func<Thingy, bool>> AddDateRangeFilterExpr<T>(this T model, Expression<Func<Thingy,bool>> webOrderItemExpr)
where T : IDateRangeSearchable
{
return webOrderItemExpr.AndAlso(thing => thing.Date >= model.StartDate && thing.Date <= model.EndDate);
}
AndAlso is a custom function that essentially combines expressions so as to avoid using multiple Where statements.
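The question does not show the custom AndAlso helper, so the following is only a minimal sketch of how such a combinator is commonly written, not the author's implementation. The names PredicateExtensions and ParameterReplacer are assumptions made for illustration. The idea is to rewrite the second lambda's body so that it uses the first lambda's parameter, then join both bodies with Expression.AndAlso.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical helper class; the original AndAlso implementation is not shown.
public static class PredicateExtensions
{
    // Combines two predicates into a single lambda: t => left(t) && right(t).
    public static Expression<Func<T, bool>> AndAlso<T>(
        this Expression<Func<T, bool>> left,
        Expression<Func<T, bool>> right)
    {
        ParameterExpression parameter = left.Parameters[0];
        // Rewrite the right-hand body so it refers to the left-hand parameter.
        Expression rightBody =
            new ParameterReplacer(right.Parameters[0], parameter).Visit(right.Body);
        return Expression.Lambda<Func<T, bool>>(
            Expression.AndAlso(left.Body, rightBody), parameter);
    }

    // Swaps one parameter for another while walking the tree.
    private sealed class ParameterReplacer : ExpressionVisitor
    {
        private readonly ParameterExpression source;
        private readonly ParameterExpression target;

        public ParameterReplacer(ParameterExpression source, ParameterExpression target)
        {
            this.source = source;
            this.target = target;
        }

        protected override Expression VisitParameter(ParameterExpression node)
        {
            return node == source ? target : base.VisitParameter(node);
        }
    }
}
```

Because the result is a single expression tree with one parameter, a LINQ provider sees one predicate rather than a chain of Where calls.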
But here is the problem I see developing from this pattern:
Option 1: I have to write a custom implementation of "AddDateRangeFilterExpr" for Entity Object "Thingy1", "Thingy2", ad infinitum.
Problem: this is not DRY. It is probably what I will do starting out, since I am mainly concerned with 3 or 4 entity objects and prefer duplication to the wrong abstraction. But I am looking for the right abstraction here, since more entities may be added.
Option 2: I add an interface onto the Entity Object that has field "Date", and rewrite the signature.
Problem: the date fields I deal with vary. They can be nullable or not nullable, and can be named "Date", "DateAdded", "thingDate", etc. That means multiple interfaces AND implementations, which is clunky, not DRY, and probably even worse.
Option 3:???
Call me crazy, but I am not an expression wizard yet. I am very interested in them though. I want to know if it is possible to transform an expression from:
Expression<Func<Thingy, DateTime?>> dateExpr = t => t.Date;
into
Expression<Func<Thingy, bool>> thingExpr = t => t.Date >= someDate;
which would allow me to just pass in the expression which would then perform that date filter on the column specified.
thingExpr = model.AddDateRangeFilterExpr(thingExpr, dateExpr);
Then I would only need an implementation for DateTime and one for DateTime?, and for entity objects with multiple date columns I could choose a different date column depending on what was needed.
Or in other words, can you somehow transform an expression (correct term?) that selects a date into a boolean predicate built on the column of that date field?
Sorry, I am really on the border of my knowledge here with expressions, so my language gets less precise as I tread into what I don't fully understand. I am really just here to determine whether my wishful thinking could bear fruit in this direction. I am open to criticism of the whole approach as well, or to resources for learning more about expressions in relation to this.
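The transformation asked about here does appear to be possible with the System.Linq.Expressions factory methods. The following is a minimal sketch under stated assumptions, not a definitive answer: DateRangeFilter and Between are hypothetical names, and Thingy is a stand-in for the real entity. The selector's body (the date column) is reused as the left operand of two comparisons, and the selector's own parameter is reused for the resulting predicate.

```csharp
using System;
using System.Linq.Expressions;

// Stand-in for the entity in the question (assumed shape).
public class Thingy
{
    public DateTime? Date { get; set; }
}

// Hypothetical helper; not part of the original code.
public static class DateRangeFilter
{
    // Turns t => t.SomeDate into t => t.SomeDate >= start && t.SomeDate <= end.
    public static Expression<Func<TEntity, bool>> Between<TEntity>(
        Expression<Func<TEntity, DateTime?>> dateSelector,
        DateTime start,
        DateTime end)
    {
        Expression column = dateSelector.Body;
        // Constants are typed as DateTime? so both operands have the same type.
        Expression lower = Expression.GreaterThanOrEqual(
            column, Expression.Constant(start, typeof(DateTime?)));
        Expression upper = Expression.LessThanOrEqual(
            column, Expression.Constant(end, typeof(DateTime?)));
        return Expression.Lambda<Func<TEntity, bool>>(
            Expression.AndAlso(lower, upper), dateSelector.Parameters);
    }
}
```

A call such as DateRangeFilter.Between<Thingy>(t => t.Date, model.StartDate, model.EndDate) could then be passed to the existing AndAlso helper. A non-nullable DateTime property can also be passed through this signature, since the compiler inserts a conversion to DateTime?; whether a given LINQ provider translates that conversion, and whether it parameterizes the constants, is something to verify against the provider in use.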

Related

Wrapping A Long List of Parameters as a Single Object

Consider the following interface:
public interface SomeRepo
{
public IEnumerable<IThings> GetThingsByParameters(DateTime startDate,
DateTime endDate,
IEnumerable<int> categorIds,
IEnumerable<int> userIds,
IEnumerable<int> typeIds,
string someStringToFilerBy);
}
Is there any benefit in doing this instead?
public IEnumerable<IThings> GetThingsByParameters(IParameter parameter);
Where IParameter is an object defined as such:
public interface IParameter
{
DateTime startDate { get; }
DateTime endDate { get; }
IEnumerable<int> categorIds { get; }
IEnumerable<int> userIds { get; }
IEnumerable<int> typeIds { get; }
string someStringToFilerBy { get; }
}
I don't see any benefit in using IParameter other than that it makes the signature a bit more readable, and the extra layer of complexity doesn't seem to be worth it.
Is there anything I may be missing? Thanks.
If that's just for that single place, it may not be worth it all that much.
Creating a class on its own does have some possible benefits, but they're quite dependent on exactly that; whether you would be able to reuse it.
You could add some sort of early data validation to your IParameters implementation (eg. endDate can't be earlier than startDate - it's common sense, you don't need to be a repository object to know that).
If some values are optional and some are not, a Parameters class gives you an opportunity to clearly distinguish these two categories.
It's much easier to find all usages of Parameters in your code than all the occurrences of raw "start date / end date / ids" packs.
That being said, readability isn't a minor concern. I feel that 6 parameters per method is twice too many. And based on experience, I wouldn't bet it will stop at 6.
The book Clean Code (Robert C. Martin) argues that it is not a good idea to use many parameters in a method (it recommends at most 3). If you have a method that requires this many parameters, you should rethink your design; it may be a sign that your model needs one more class.
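The early validation and the required/optional split described above could be sketched roughly as follows. This is an illustration only: ThingSearchParameters and its member names are assumptions, and only some of the original parameters are shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical parameter object; names are illustrative, not from the original code.
public sealed class ThingSearchParameters
{
    // Required values are constructor arguments without defaults.
    public DateTime StartDate { get; }
    public DateTime EndDate { get; }

    // Optional values have defaults, which makes the distinction visible.
    public IReadOnlyList<int> CategoryIds { get; }
    public string TextFilter { get; }

    public ThingSearchParameters(
        DateTime startDate,
        DateTime endDate,
        IEnumerable<int> categoryIds = null,
        string textFilter = null)
    {
        // Validate once, here, instead of in every repository method.
        if (endDate < startDate)
        {
            throw new ArgumentException(
                "endDate must not be earlier than startDate.", nameof(endDate));
        }

        StartDate = startDate;
        EndDate = endDate;
        CategoryIds = (categoryIds ?? Enumerable.Empty<int>()).ToList();
        TextFilter = textFilter;
    }
}
```

A repository method would then take a single ThingSearchParameters argument and could assume the date range is already consistent.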
The extreme of that is developing your own expression system, where IParameter has a string operator ("Equals", "LessThanOrEqualTo", "Plus", etc.) and an array of IParameter[] called Children or something similar. Of course, if you're going to do that, why not use something built in, like LINQ or C#'s Expressions? If this isn't backed by a database and you need string filtering, that's a good option (or use a DataTable's built-in filtering/parsing of expressions if you don't care about performance).
If this is backed by a database, it is usually a bad idea to expose arbitrary querying on a repository that, say, is tied to a SQL database, because the end developer may not know which columns are indexed and may write poorly performing queries (particularly if they don't have easy access to production-scale data). It is better to expose specific query methods that take specific arguments, each mapping to essentially one SQL SELECT, and to fine-tune each query (assuming your repository is backed by a SQL database).
This is more performant because now you explicitly control which indexes the end developer can query from by exposing a method that takes in explicit arguments.
This also makes unit-testing the code that depends on your repository much easier, because it's easy to mock a strongly typed repository method like that. If you allow your services to define their own queries, you'd end up making a fake in-memory abstraction of the database using LINQ-to-Objects, and that can sometimes give false positives.
There's nothing inherently wrong with the approach. I just see the typical use case as either being very explicit if backed by a database, or leveraging an already-existing filtering/expression system if this is all in memory.

Recursion in Fluent API

I am designing a fluent API for writing SQL. Keep in mind that one of my goals is to have the API not suggest functions that can't be called in that part of the chain. For instance, if you have just finished defining a field in the select clause, you can't call Where until you have called From first. A simple query looks like this:
string sql = SelectBuilder.Create()
.Select()
.Fld("field1")
.From("table1")
.Where()
.Whr("field1 > field2")
.Whr("CURRENT_TIMESTAMP > field3")
.Build()
.SQL;
My problem comes with recursion in SQL code. Say you wanted to have a field contain another SQL statement like below:
string sql = SelectBuilder.Create()
.Select()
.Fld("field1")
.SQLFld()
.Select()
.Count("field6")
.From("other table")
.EndSQLFld()
.Fld("field2")
.From("table1")
.Where()
.Whr("field1 > field2")
.Whr("CURRENT_TIMESTAMP > field3")
.Build()
.SQL;
I am using method chaining to build my fluent API. In many ways it is a state machine spread across many classes, each of which represents one state. To add this functionality I would need to copy essentially every state I already have and wrap the copies between the SQLFld and EndSQLFld states. I would need yet another copy if you were one more level down, embedding a SQL statement into a field of the already embedded SQL statement. This goes on to infinity, so with an infinitely deep embedded SQL query I would need an infinite number of classes to represent the infinite states.
I thought about writing a SelectBuilder query taken up to the point of the Build method and then embedding that SelectBuilder into another SelectBuilder. That fixes my infinity problem, but it is not very elegant, and elegance is the point of this API.
I could also throw out the idea that the API only offers functions when they are appropriate but I would really hate to do that. I feel like that helps you best discover how to use the API. In many fluent APIs it doesn't matter which order you call what, but I want the API to appear as close to the actual SQL statement as possible and enforce its syntax.
Anyone have any idea how to solve this issue?
Glad to see you are trying fluent interfaces; I think they are very elegant and expressive.
The builder pattern is not the only implementation for fluent interfaces. Consider this design, and let us know what you think =)
This is an example and I leave to you the details of your final implementation.
Interface design example:
public class QueryDefinition
{
// The members don't need to be strings; they can be whatever you use to handle the construction of the query.
private string select;
private string from;
private string where;
public QueryDefinition AddField(string select)
{
this.select = select;
return this;
}
public QueryDefinition From(string from)
{
this.from = from;
return this;
}
public QueryDefinition Where(string where)
{
this.where = where;
return this;
}
public QueryDefinition AddFieldWithSubQuery(Action<QueryDefinition> definitionAction)
{
var subQueryDefinition = new QueryDefinition();
definitionAction(subQueryDefinition);
// Add here any action needed to consider the sub query, which should be defined in the object subQueryDefinition.
return this;
}
}
Example usage:
static void Main(string[] args)
{
// 1 query deep
var def = new QueryDefinition();
def
.AddField("Field1")
.AddField("Field2")
.AddFieldWithSubQuery(subquery =>
{
subquery
.AddField("InnerField1")
.AddField("InnerField2")
.From("InnerTable")
.Where("<InnerCondition>");
})
.From("Table")
.Where("<Condition>");
// 2 queries deep
var def2 = new QueryDefinition();
def2
.AddField("Field1")
.AddField("Field2")
.AddFieldWithSubQuery(subquery =>
{
subquery
.AddField("InnerField1")
.AddField("InnerField2")
.AddFieldWithSubQuery(subsubquery =>
{
subsubquery
.AddField("InnerInnerField1")
.AddField("InnerInnerField2")
.From("InnerInnerTable")
.Where("<InnerInnerCondition>");
})
.From("InnerTable")
.Where("<InnerCondition>");
})
.From("Table")
.Where("<Condition>");
}
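The answer above leaves the handling of the sub query open. One possible way to fill it in is sketched below, assuming the builder simply collects strings and renders them at the end. QuerySketch and ToSql are hypothetical names, and this variant accumulates fields and conditions in lists rather than overwriting a single string. It does not enforce call order; it only shows that the lambda-based nesting recurses to any depth with a single class.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant of the QueryDefinition idea; not the original author's code.
public class QuerySketch
{
    private readonly List<string> fields = new List<string>();
    private readonly List<string> conditions = new List<string>();
    private string table;

    public QuerySketch AddField(string field)
    {
        fields.Add(field);
        return this;
    }

    public QuerySketch From(string tableName)
    {
        table = tableName;
        return this;
    }

    public QuerySketch Where(string condition)
    {
        conditions.Add(condition);
        return this;
    }

    // The sub query is defined on a fresh instance and rendered as one field.
    public QuerySketch AddFieldWithSubQuery(Action<QuerySketch> define)
    {
        var sub = new QuerySketch();
        define(sub);
        fields.Add("(" + sub.ToSql() + ")");
        return this;
    }

    public string ToSql()
    {
        string sql = "SELECT " + string.Join(", ", fields) + " FROM " + table;
        if (conditions.Count > 0)
        {
            sql += " WHERE " + string.Join(" AND ", conditions);
        }
        return sql;
    }
}
```

Since each nested lambda receives its own QuerySketch, the nesting depth is bounded only by how deeply the caller nests lambdas, not by the number of classes.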
You can't "have only applicable methods available" without either sub-APIs for the substructures or clear bracketing/ending of all inner structural levels (SELECT columns, expressions in WHERE clause, subqueries).
Even then, running it all through a single API will require it to be stateful & "modal" with "bracketing" methods, to track whereabouts in the decl you are. Error reporting & getting these right will be tedious.
Ending bracketing by "fluent" methods seems, to me, non-fluent and ugly. It would result in the ugly appearance of EndSelect, EndWhere, EndSubquery, etc. I'd prefer to build substructures (e.g. a SUBQUERY for a select) into a local variable and add that.
I don't like the EndSQLFld() idiom, which terminates the Subquery implicitly by terminating the Field. I'd prefer & guess it would be better design to terminate the subquery itself which is the complex part of the nested structure -- not the field.
To be honest, trying to enforce ordering of a "declarative" API for a "declarative" language (SQL) seems to be a waste of time.
Probably what I'd consider closer to an ideal usage:
SelectBuilder select = SelectBuilder.Create("CUSTOMER")
.Column("ID")
.Column("NAME")
/*.From("CUSTOMER")*/ // look, I'm just going to promote this onto the constructor.
.Where("field1 > field2")
.Where("CURRENT_TIMESTAMP > field3");
SelectBuilder countSubquery = SelectBuilder.Create("ORDER")
.Formula("count(*)")
.Where("ORDER.FK_CUSTOMER = CUSTOMER.ID")
.Where("STATUS = 'A'");
select.Formula( countSubquery, "ORDER_COUNT");
string sql = select.SQL;
Apologies to the Hibernate Criteria API :)

What's better design/practice: Nullable property or 1 value property and 1 bool "has" property?

I'm working on an ASP.NET MVC app, designing the domain models, using (testing) the new EF Code First feature.
I have an Activity entity that may or may not have a Deadline, what is the best way to approach it?
1 property:
public DateTime? Deadline {get; set;}
and check vs null before using
or
2 properties:
public DateTime Deadline {get; set;}
public bool HasDeadline {get; set;}
At first I thought of the first option, but then I started thinking that maybe the second option would be better regarding the DB...
Is there any best practice regarding this?
I'd go with the first option. After all, it's exactly an encapsulated form of the second.
The encapsulation makes it clear that you've only got one logical value (or lack thereof). In the second form you can treat the properties as if they were entirely independent, which they're logically not.
In terms of the database, I'd expect the first form to be just as easy too... presumably you'll have a nullable DATETIME field in the database, won't you? It should map directly.
How about a combination of both just for the sake of making your code more readable?
public DateTime? Deadline { get; set; }
public bool HasDeadline
{
get
{
return (Deadline != null);
}
}
It's easy to read and does exactly the same thing that the consumer would have to do anyway. Besides...
if(HasDeadline)
doStuff();
is easier to read than
if(Deadline != null)
doStuff();
:)
I would use the first option. In the long run the second option will probably cause some maintenance problems because you have to remember to check and use both of the properties.
Also one option is to use one property but instead of making it nullable, you could return a Null object (also known as Special Case).
The database is designed to store NULL values. Storing a minimum value in the database and then having a flag to indicate whether you should trust that value makes queries complicated.
I like nullable types since they reflect the domain's intent: there is no date, not 'there isn't a date, so pretend the first of January 1970 means no date'.
There is also an overhead in maintaining the HasDeadline value: you need to set it each time the corresponding property is updated. Also, how do you clear it? If you set the Deadline to a date, it will set HasDeadline to true. How do I 'unset' it? Would you set HasDeadline to false, but leave the Deadline field intact with the previous value?
Overall icky.
You should use the nullable, as it does exactly what you want. Using two separate properties means that you lose the connection between them, and you need to explain with documentation that they have a relation.
The nullable type should also fit better against a database type. However, you should first design your object for how it works as an object, not for how you will store it in the database. If the use of a database generation tool causes you to make bad decisions when designing the code, it's counterproductive.

Getting magic strings out of QueryOver (or Fluent NHibernate perhaps)?

One of the many reasons to use Fluent NHibernate, the new QueryOver API, and the new LINQ provider is that they eliminate "magic strings": strings representing properties or other things that could instead be checked at compile time.
Sadly, I am using the spatial extensions for NHibernate which haven't been upgraded to support QueryOver or LINQ yet. As a result, I'm forced to use a combination of QueryOver Lambda expressions and strings to represent properties, etc. that I want to query.
What I'd like to do is this -- I want a way to ask Fluent NHibernate (or perhaps the NHibernate QueryOver API) what the magic string "should be." Here's a pseudo-code example:
Currently, I'd write --
var x = session.QueryOver<Shuttle>().Add(SpatialRestrictions.Intersects("abc", other_object));
What I'd like to write is --
var x = session.QueryOver<Shuttle>().Add(SpatialRestrictions.Intersects(session.GetMagicString<Shuttle>(x => x.Abc), other_object));
Is there anything like this available? Would it be difficult to write?
EDIT: I just wanted to note that this would apply to a lot more than spatial; really anything that hasn't been converted to QueryOver or LINQ yet could benefit.
update
The nameof operator in C# 6 provides compile time support for this.
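As a small illustration of the nameof approach, assuming a Shuttle class with an Abc property (the real entity is not shown, so the shape below is a stand-in, and ShuttleNames is a hypothetical holder class):

```csharp
// Stand-in for the mapped entity; the property type is an assumption.
public class Shuttle
{
    public object Abc { get; set; }
}

public static class ShuttleNames
{
    // nameof is evaluated at compile time and yields the simple member name.
    public static string AbcProperty()
    {
        return nameof(Shuttle.Abc);
    }
}
```

Renaming the property with a refactoring tool updates the nameof expression as well, and a stale reference becomes a compile error instead of a runtime failure. Note that nameof yields only the last identifier, so a nested path such as a.B.C produces "C".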
There is a much simpler solution - Expressions.
Take the following example:
public static class ExpressionsExtractor
{
public static string GetMemberName<TObj, TProp>(Expression<Func<TObj, TProp>> expression)
{
var memberExpression = expression.Body as MemberExpression;
if (memberExpression == null)
return null;
return memberExpression.Member.Name;
}
}
And the usage:
var propName = ExpressionsExtractor.GetMemberName<Person, int>(p => p.Id);
The ExpressionsExtractor is just a suggestion; you can wrap this method in whatever class you want, maybe as an extension method or preferably in a non-static class.
Your example may look a little like this:
var abcPropertyName = ExpressionsExtractor.GetMemberName<Shuttle, IGeometry>(s => s.Abc);
var x = session.QueryOver<Shuttle>().Add(SpatialRestrictions.Intersects(abcPropertyName, other_object));
Assuming I'm understanding your question what you might want is a helper class for each entity you have with things like column names, property names and other useful things, especially if you want to use ICriteria searches. http://nhforge.org/wikis/general/open-source-project-ecosystem.aspx has plenty of projects that might help. NhGen (http://sourceforge.net/projects/nhgen/) creates very simple helper classes which might help point you down a design path for what you might want.
Clarification Edit: following an "I don't understand" comment
In short, I don't believe there is a solution for you just yet. The QueryOver project hasn't made it as far as you want it to. So, as a possible solution in the meantime, build a helper class to remove the magic strings, so that your query becomes
var x = session.QueryOver<Shuttle>().Add(SpatialRestrictions.Intersects(ShuttleHelper.Abc, other_object));
That way your magic string is behind some other property (I just chose .Abc to demonstrate, but I'm sure you'll have a better idea of what you want). Then, if "abc" changes (say to "xyz"), you have two choices. You can rename the property from .Abc to .Xyz, in which case build errors will show you where you need to update your code (much as they would with lambda expressions). Or you can just change the value of the .Abc property to "xyz", which would really only work if your property had some meaningful name (such as .OtherObjectIntersectingColumn) rather than the property name itself. That second choice has the advantage of not requiring code updates to correct build errors. At that point your query could be
var x = session.QueryOver<Shuttle>().Add(SpatialRestrictions.Intersects(ShuttleHelper.OtherObjectIntersectingColumn, other_object));
I mentioned the open source project ecosystem page because it can give you some pointers on what types of helper classes other people have made, so you're not reinventing the wheel, so to speak.

Pulling the WHERE clause out of LINQ to SQL

I'm working with a client who wants to mix LINQ to SQL with their in-house DAL. Ultimately they want to be able to query their layer using typical LINQ syntax. The point where this gets tricky is that they build their queries dynamically. So ultimately what I want is to be able to take a LINQ query, pull it apart and be able to inspect the pieces to pull the correct objects out, but I don't really want to build a piece to translate the 'where' expression into SQL. Is this something I can just generate using Microsoft code? Or is there an easier way to do this?
(you mean just LINQ, not really LINQ-to-SQL)
Sure, you can do it - but it is massive amounts of work. Here's how; I recommend "don't". You could also look at the source code for DbLinq - see how they do it.
If you just want Where, it is a bit easier - but as soon as you start getting joins, groupings, etc - it will be very hard to do.
Here's just Where support on a custom LINQ implementation (not a fully queryable provider, but enough to get LINQ with Where working):
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;
namespace YourLibrary
{
public static class MyLinq
{
public static IEnumerable<T> Where<T>(
this IMyDal<T> dal,
Expression<Func<T, bool>> predicate)
{
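BinaryExpression be = predicate.Body as BinaryExpression;
// Guard against non-binary predicates (e.g. a bare bool property) before touching be.Left.
if (be == null) throw new InvalidOperationException("binary comparisons only, please");
var me = be.Left as MemberExpression;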
if(me == null) throw new InvalidOperationException("don't be silly");
if(me.Expression != predicate.Parameters[0]) throw new InvalidOperationException("direct properties only, please!");
string member = me.Member.Name;
object value;
switch (be.Right.NodeType)
{
case ExpressionType.Constant:
value = ((ConstantExpression)be.Right).Value;
break;
case ExpressionType.MemberAccess:
var constMemberAccess = ((MemberExpression)be.Right);
var capture = ((ConstantExpression)constMemberAccess.Expression).Value;
switch (constMemberAccess.Member.MemberType)
{
case MemberTypes.Field:
value = ((FieldInfo)constMemberAccess.Member).GetValue(capture);
break;
case MemberTypes.Property:
value = ((PropertyInfo)constMemberAccess.Member).GetValue(capture, null);
break;
default:
throw new InvalidOperationException("simple captures only, please");
}
break;
default:
throw new InvalidOperationException("more complexity");
}
return dal.Find(member, value);
}
}
public interface IMyDal<T>
{
IEnumerable<T> Find(string member, object value);
}
}
namespace MyCode
{
using YourLibrary;
static class Program
{
class Customer {
public string Name { get; set; }
public int Id { get; set; }
}
class CustomerDal : IMyDal<Customer>
{
public IEnumerable<Customer> Find(string member, object value)
{
Console.WriteLine("Your code here: " + member + " = " + value);
return new Customer[0];
}
}
static void Main()
{
var dal = new CustomerDal();
var qry = from cust in dal
where cust.Name == "abc"
select cust;
int id = int.Parse("123");
var qry2 = from cust in dal
where cust.Id == id // capture
select cust;
}
}
}
Technically, if your DAL exposes IQueryable<T> instead of IEnumerable<T>, you can also implement an IQueryProvider and do exactly what you describe. However, this is not for the faint of heart.
But if you expose the LINQ to SQL tables themselves in the DAL, they will do exactly this for you. There is a (big) risk, though, since you'll be handing the client code total control over how to express SQL queries, and the usual result is some complex query that joins everything and slaps pagination on top of it, with less than spectacular run-time performance.
I think you should consider carefully what is actually needed from the DAL and expose only that.
I just read an interesting article on Expression Trees, LINQ to SQL uses these to translate the query into SQL and send it over the wire.
Maybe that's something you could use?
Just a thought. I know some languages support building a string that can be executed by the code itself. I never tried it with .NET, but this is common in functional languages like LISP. Since .NET supports lambdas, maybe this is possible.
Since F# is coming to .NET soon, maybe it will be possible then if it is not right now.
What I am trying to say is that if you can do this, then maybe you can build the string that will be used as the LINQ statement and then execute it. Since it is a string, it will be possible to analyse it and get the information you want.
Try Dynamic Linq
To anyone else with the same question: pulling the where clause out of LINQ-to-SQL isn’t as straightforward as one would have hoped. Additionally, doing that by itself is probably meaningless. There are a couple of options, depending on the requirements. One is to grab it from the generated string, but that string would contain parameter references and object property mappings that would also have to be resolved, so those would have to be pulled out of the original provider somehow; otherwise this would be pointless. Another would be to find a modular provider that can do that and that also makes member mappings easily accessible. Once again, though, without the rest of the query I see little utility in doing that, because the where clause would reference table/column aliases from the select statement.
I had a similar task a couple of years ago: writing a full-blown provider for a custom ORM/DAL. While it qualifies as the most complex thing I’ve worked on, as an experienced developer I can say it’s not as bad as some people claim, once you wrap your head around the concepts that lie at the foundation of such a component. Some solutions that I’ve seen go about it the wrong way: they add redundant functionality and have extra code addressing problems introduced by the underlying logic. One example is the “optimization” stage/module that attempts to refactor bloated, nested SQL produced by the main parser. If the parser were designed to output clean SQL from the start, no clean-up phase would be needed. I’ve seen providers that create a new level of nesting for each where and join call. That’s a bad strategy. By breaking a query down into three or four main parts (select, from, where and orderby), which are built individually as the tree is being visited, this problem is avoided altogether. I’ve developed an object-to-data (aka LINQ-to-SQL) provider based on these principles for a custom ORM/DAL, and it produces nice, clean SQL with excellent performance, as each statement is compiled to IL and cached.
For anyone that is looking to do something similar, please see my posts that include a sample project with a tutorial/barebones implementation that makes it easy to see how it works. Included is also the full solution:
How to write a LINQ to SQL provider in C# Part 1 - Introduction
How to write a LINQ to SQL provider in C# Part 2 - Expression Visitor
How to write a LINQ to SQL provider in C# Part 3 - Where Clause Visitor
How to write a LINQ to SQL provider in C# Part 4 - Compiling Expression Trees
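To give a rough idea of what "building the where part as the tree is visited" can look like, here is a minimal sketch using the framework's ExpressionVisitor class. It is not taken from the posts listed above; WhereClauseVisitor, Translate and the Customer class are assumptions for illustration. It handles only direct property access on the lambda parameter, constants, and a handful of comparison and logical operators, and it emits numbered parameter placeholders instead of inlining values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Text;

// Stand-in entity for the example.
public class Customer
{
    public string Name { get; set; }
    public int Id { get; set; }
}

// Hypothetical visitor that renders a simple predicate as a WHERE fragment.
public sealed class WhereClauseVisitor : ExpressionVisitor
{
    private readonly StringBuilder sql = new StringBuilder();
    private readonly List<object> parameters = new List<object>();

    public static string Translate<T>(
        Expression<Func<T, bool>> predicate, out List<object> parameterValues)
    {
        var visitor = new WhereClauseVisitor();
        visitor.Visit(predicate.Body);
        parameterValues = visitor.parameters;
        return visitor.sql.ToString();
    }

    protected override Expression VisitBinary(BinaryExpression node)
    {
        sql.Append("(");
        Visit(node.Left);
        sql.Append(" ").Append(ToSqlOperator(node.NodeType)).Append(" ");
        Visit(node.Right);
        sql.Append(")");
        return node;
    }

    protected override Expression VisitMember(MemberExpression node)
    {
        // Only properties read directly off the lambda parameter are supported.
        if (!(node.Expression is ParameterExpression))
        {
            throw new NotSupportedException("direct properties only");
        }
        sql.Append(node.Member.Name);
        return node;
    }

    protected override Expression VisitConstant(ConstantExpression node)
    {
        // Emit a placeholder and keep the value separately.
        sql.Append("@p").Append(parameters.Count);
        parameters.Add(node.Value);
        return node;
    }

    private static string ToSqlOperator(ExpressionType type)
    {
        switch (type)
        {
            case ExpressionType.Equal: return "=";
            case ExpressionType.NotEqual: return "<>";
            case ExpressionType.GreaterThan: return ">";
            case ExpressionType.GreaterThanOrEqual: return ">=";
            case ExpressionType.LessThan: return "<";
            case ExpressionType.LessThanOrEqual: return "<=";
            case ExpressionType.AndAlso: return "AND";
            case ExpressionType.OrElse: return "OR";
            default: throw new NotSupportedException(type.ToString());
        }
    }
}
```

Captured local variables, method calls, null comparisons and column mappings are all left out here; a real provider has to deal with each of them.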
