I am designing a fluent API for writing SQL. Keep in mind one of my goals is to have API not suggest functions that can't be called in that part of the chain. For instance if you just got done defining a field in the select clause you can't call Where until you called From first. A simple query looks like this:
string sql = SelectBuilder.Create()
.Select()
.Fld("field1")
.From("table1")
.Where()
.Whr("field1 > field2")
.Whr("CURRENT_TIMESTAMP > field3")
.Build()
.SQL;
My problem comes with recursion in SQL code. Say you wanted to have a field contain another SQL statement like below:
string sql = SelectBuilder.Create()
.Select()
.Fld("field1")
.SQLFld()
.Select
.Count("field6")
.From("other table")
.EndSQLFld()
.FLd("field2")
.From("table1")
.Where()
.Whr("field1 > field2")
.Whr("CURRENT_TIMESTAMP > field3")
.Build()
.SQL;
I am using method chaining to build my fluent API. It many ways it is a state machine strewn out across many classes which represent each state. To add this functionality I would need to copy essentially every state I already have and wrap them around the two SQLFld and EndSQLFld states. I would need yet another copy if you were one more level down and were embedding a SQL statement in to a field of the already embedded SQL statement. This goes on to infinity, so with an infinitely deep embedded SQL query I would need an infinite number of classes to represent the infinite states.
I thought about writing a SelectBuilder query that was taken to the point of the Build method and then embedding that SelectBuilder in to another SelectBuilder and that fixes my infinity problem, but it is not very elegant and that is the point of this API.
I could also throw out the idea that the API only offers functions when they are appropriate but I would really hate to do that. I feel like that helps you best discover how to use the API. In many fluent APIs it doesn't matter which order you call what, but I want the API to appear as close to the actual SQL statement as possible and enforce its syntax.
Anyone have any idea how to solve this issue?
Glad to see you are trying fluent interfaces, I think they are a very elegant and expressive.
The builder pattern is not the only implementation for fluent interfaces. Consider this design, and let us know what you think =)
This is an example and I leave to you the details of your final implementation.
Interface design example:
public class QueryDefinition
{
// The members doesn't need to be strings, can be whatever you use to handle the construction of the query.
private string select;
private string from;
private string where;
public QueryDefinition AddField(string select)
{
this.select = select;
return this;
}
public QueryDefinition From(string from)
{
this.from = from;
return this;
}
public QueryDefinition Where(string where)
{
this.where = where;
return this;
}
public QueryDefinition AddFieldWithSubQuery(Action<QueryDefinition> definitionAction)
{
var subQueryDefinition = new QueryDefinition();
definitionAction(subQueryDefinition);
// Add here any action needed to consider the sub query, which should be defined in the object subQueryDefinition.
return this;
}
Example usage:
static void Main(string[] args)
{
// 1 query deep
var def = new QueryDefinition();
def
.AddField("Field1")
.AddField("Filed2")
.AddFieldWithSubQuery(subquery =>
{
subquery
.AddField("InnerField1")
.AddField("InnerFiled2")
.From("InnerTable")
.Where("<InnerCondition>");
})
.From("Table")
.Where("<Condition>");
// 2 queries deep
var def2 = new QueryDefinition();
def2
.AddField("Field1")
.AddField("Filed2")
.AddFieldWithSubQuery(subquery =>
{
subquery
.AddField("InnerField1")
.AddField("InnerField2")
.AddFieldWithSubQuery(subsubquery =>
{
subsubquery
.AddField("InnerInnerField1")
.AddField("InnerInnerField2")
.From("InnerInnerTable")
.Where("<InnerInnerCondition>");
})
.From("InnerInnerTable")
.Where("<InnerCondition>");
})
.From("Table")
.Where("<Condition>");
}
You can't "have only applicable methods available" without either sub-APIs for the substructures or clear bracketing/ending of all inner structural levels (SELECT columns, expressions in WHERE clause, subqueries).
Even then, running it all through a single API will require it to be stateful & "modal" with "bracketing" methods, to track whereabouts in the decl you are. Error reporting & getting these right will be tedious.
Ending bracketing by "fluent" methods, to me, seems non-fluent & ugly. This would result in a ugly appearence of EndSelect, EndWhere, EndSubquery etc. I'd prefer to build substructures (eg SUBQUERY for select) into a local variable & add that.
I don't like the EndSQLFld() idiom, which terminates the Subquery implicitly by terminating the Field. I'd prefer & guess it would be better design to terminate the subquery itself which is the complex part of the nested structure -- not the field.
To be honest, trying to enforce ordering of a "declarative" API for a "declarative" language (SQL) seems to be a waste of time.
Probably what I'd consider closer to an ideal usage:
SelectBuilder select = SelectBuilder.Create("CUSTOMER")
.Column("ID")
.Column("NAME")
/*.From("CUSTOMER")*/ // look, I'm just going to promote this onto the constructor.
.Where("field1 > field2")
.Where("CURRENT_TIMESTAMP > field3");
SelectBuilder countSubquery = SelectBuilder.Create("ORDER")
.Formula("count(*)");
.Where("ORDER.FK_CUSTOMER = CUSTOMER.ID");
.Where("STATUS = 'A'");
select.Formula( countSubquery, "ORDER_COUNT");
string sql = SelectBuilder.SQL;
Apologies to the Hibernate Criteria API :)
Related
I am still (or again) on my student project and one part of the project is the ability to send and view (of course) messages between users, and we use HotChocalte (.Net 5.0) for queries (read) and mutations (send). Data are usually obtained from a MySQL database. We also applied [UseFiltering] and [UseSorting] to queries and that works find with the members of the DB model.
Along others, we added a custom field containing the number of messages within a "thread" (we called that a conversation), and it tells the correct number of queries, but it seems to be impossible to filter or sort by this special fiels - the message "The specified input object field '...' does not exist.".
I have no more ideas what can I do. Do you have any suggestions? Hope following code is good enough to understand what I'm doing here:
C# implementation (backend):
public class ConversationType : ObjectType<Model.Conversation>
protected override void Configure(IObjectTypeDescriptor<Model.Conversation> descriptor)
{
// [UseFiltering] // This doesn't help
// [UseSorting] // This doesn't help
descriptor.Field("messageCount")
.UseDbContext<ReviverDbContext>()
.Type<IntType>()
.ResolveWith<ConversationResolvers>(
product => product.GetMessageCount(default, default)
)
// .UseFiltering(); // This doesn't help
// ...
private class ConversationResolvers
{
/**
* #brief Get the number of message within a conversation
*/
public async Task<int>
GetMessageCount([Parent] Model.Conversation conversation,
[ScopedService] OurDbContext dbContext)
{
return await Task.FromResult(dbContext.Messages
.Where(message => message.Conversation == conversation)
.Count()
);
}
// ...
}
}
}
HotChocolate QueryType in C# (backend too):
[ExtendObjectType(OperationTypeNames.Query)]
public class ConversationQueries
{
[UseOurDbContext]
[UsePaging(IncludeTotalCount = true)]
[UseFiltering]
[UseSorting]
public async Task<IEnumerable<Model.Conversation>>
GetConversationsAsync([ScopedService] OurDbContext dbContext)
{
return dbContext.Conversations.ToListAsync();
}
}
Example of query that does not work:
query {
conversations
(
where: {
messageCount: {neq : 0} <- "The specified input object field does not exist"
}
)
nodes {
messageCount
...
}
}
Thanks for any advise.
TLDR: No, its not possible. And for good reasons.
Please note, that when you are configure() in Hot Chocolate, you are really setting a pipeline. Every piece (middleware) have input, process and is returning output. Thats why order of middleware really matter. This is important for further understanding.
[UseFiltering]
There is no magic in [UseFiltering] - it is just middleware which will generate gql parameter object translateable to Linq Expression and at execution time, it will take this parameter, make Linq Expression from it and call return pipeInput.Where(linq_expression_i_just_generated_from_useFiltering_parameter_object)
So, when your input is EF DbSet<Model.Conversation>, then it simply call conversationDbSet.Where(expression). Entity framework then take this expression and when someone read results, it translate it to SQL select and fetch results from server.
But your expression is not translatable to sql - it contains field unknown to sql server, hence the error.
This is the reason why it work, when you call .ToList() -- it fetched whole table from sql server and do everything locally, which is not good (maybe its doable right now, but this will bite you later)
Why this is actually good behaviour:
Ok, imagine yourself few years in future where your Conversation table have 100.000.000 rows. Your users ara really chatty or your app is smashing success, congratulations :)
Imagine you have (privacy nightmare, but this is only example) api endpoint returning all conversations. And because there is so many conversation, you did what any sane guy/girl do: you added filtering and paging. Lets say 100 items per page.
Now, you have this messageCount field, which is executed locally.
One of your user have no idea what he is doing and filter "where messageCount = -1" (or 123 or anything which have 0 result).
Now what... Your pipeline do select top 100 (paging) but when it try to evaluate filter locally, it found out that there is not a single result. And this is where problems start: what should it do? Fetch another 100 rows? And another, and another and (100.000.000 / 100 = 1.000.000x select query) another, until it iterate whole table only to find there is no result? Or, second options, when it found out that part of your Where must be evaluated locally, do paging locally too and fetch 100.000.000 items in one go?
Both cases are really bad for performance and RAM. You just DDoS yourself.
By the way, option B is what EF (before-core, no idea how it is called... EF Classic? :D ) did. And it was a source of hard to find bugs, when in one moment query take few ms but then it take minutes. Good folks from EF team dropped this in first version of EFCore.
What to do instead...
One, AFAIK future proof (but it take some work), solution for this is using DB Views. Actually, in this case it is pretty easy. Just create view like this:
create view ConversationDto
as
select Conversation.*,
isnull(c.[count], 0) as MessageCount
from Conversation
left join (
select ConversationId,
count(Id)
from Messages
group by ConversationId
) as c on c.ConversationId = Conversation.Id
(This is mssql query, no idea which db you are using. As a small bonus, you can optimalise hell out of your view (add custom index(ex), better joins, etc))
Then create new EF Entity:
public class ConversationDto
: Conversation
{
public int MessageCount { get; set; }
}
public class ConversationDtoMap
: IEntityTypeConfiguration<ConversationDto>
{
public void Configure(EntityTypeBuilder<SubjectOverviewDto> builder)
{
builder.ToView("ConversationDto"); // ToView! not ToTable!
}
}
And now you are pretty much ready to go.
Just throw out your custom resolver (you moved your work to db, which is (well... should be. MySql, I am looking at you :-| ) great in this kind of work - working with data sets.) and change (query) type to ConversationDtoType.
As side effect, your ConversationDto (gql query model) is no longer same as your input model (gql mutation model - but your (domain?) model Conversation still is), but thats ok - you dont want to let users set messageCount anyway.
What to do instead #2
Your second option is to use ComputedColumns. I am not sure if its doable in your case, not a big fan of ComputedColumns, but when your custom resolver do something stupid like var p = ctx.Parent<Foo>(); return p.x + p.y * 42; then this should be fine.
Again, you moved work to database and there is no need for custom resolver and you can throw it out.
What to do instead #3
Also, calling .ToList()/.ToArray() before returning your DbSet<> at the beginning of your pipe and simply emberacing that "this will fetch, materialize and filter whole table" is another possible solution.
But be aware, that this can come back later and bite you really hard, especially if your Conversation is not some sort of list and there is chance that this table will grow big. (I have a feeling that Conversation is exactly that).
Also, because your implementation is another query to database, be aware that you just created 1 + n problem.
Do it at your own risk.
Impossible cases:
Of course, there can be resolver which do things that are impossible to do in sql (for example, sending http request to REST api on another server).
Then you are out of luck for reason I showed you in example "Why this is actually good behaviour".
AFAIK you have two possible solutions: You can reimplement UseFiltering (I did that, its not THAT hard, but my case was relatively simple and HC is open source, so you have great starting point...) or you can just add custom argument + custom middleware when configuring your endpoint and implement (pre) filtering yourself.
Foot note:
please, dont do this:
public async Task<int> GetMessageCount([Parent] Model.Conversation conversation,
[ScopedService] OurDbContext dbContext)
{
return await Task.FromResult(dbContext.Messages
.Where(message => message.Conversation == conversation)
.Count()
);
}
at least, name it GetMessageCountAsync() but even better, its not async, so no need to wrap it in Task<> and await Task.FromResult() at all, just:
public int GetMessageCount([Parent] Model.Conversation conversation,
[ScopedService] OurDbContext dbContext)
{
return dbContext.Messages
.Where(message => message.Conversation == conversation)
.Count();
}
One of the many reason to use FluentNHibernate, the new QueryOver API, and the new Linq provider are all because they eliminate "magic string," or strings representing properties or other things that could be represented at compile time.
Sadly, I am using the spatial extensions for NHibernate which haven't been upgraded to support QueryOver or LINQ yet. As a result, I'm forced to use a combination of QueryOver Lambda expressions and strings to represent properties, etc. that I want to query.
What I'd like to do is this -- I want a way to ask Fluent NHibernate (or perhaps the NHibernate QueryOver API) what the magic string "should be." Here's a pseudo-code example:
Currently, I'd write --
var x = session.QueryOver<Shuttle>().Add(SpatialRestrictions.Intersects("abc", other_object));
What I'd like to write is --
var x = session.QueryOver<Shuttle>().Add(SpatialRestriction.Intersects(session.GetMagicString<Shuttle>(x => x.Abc), other_object));
Is there anything like this available? Would it be difficult to write?
EDIT: I just wanted to note that this would apply for a lot more than spatial -- really anything that hasn't been converted to QueryOver or LINQ yet could be benefit.
update
The nameof operator in C# 6 provides compile time support for this.
There is a much simpler solution - Expressions.
Take the following example:
public static class ExpressionsExtractor
{
public static string GetMemberName<TObj, TProp>(Expression<Func<TObj, TProp>> expression)
{
var memberExpression = expression.Body as MemberExpression;
if (memberExpression == null)
return null;
return memberExpression.Member.Name;
}
}
And the usage:
var propName = ExpressionsExtractor.GetMemberName<Person, int>(p => p.Id);
The ExpressionsExtractor is just a suggestion, you can wrap this method in whatever class you want, maybe as an extension method or preferably a none-static class.
Your example may look a little like this:
var abcPropertyName = ExpressionsExtractor.GetMemberName<Shuttle, IGeometry>(x => x.Abc);
var x = session.QueryOver<Shuttle>().Add(SpatialRestriction.Intersects(abcPropertyName, other_object));
Assuming I'm understanding your question what you might want is a helper class for each entity you have with things like column names, property names and other useful things, especially if you want to use ICriteria searches. http://nhforge.org/wikis/general/open-source-project-ecosystem.aspx has plenty of projects that might help. NhGen (http://sourceforge.net/projects/nhgen/) creates very simple helper classes which might help point you down a design path for what you might want.
Clarification Edit: following an "I don't understand" comment
In short, I don't beleive there is a solution for you just yet. The QueryOver project hasn't made it as far as you want it to. So as a possible solution in the mean time, to remove magic strings build a helper class, so your query becomes
var x = session.QueryOver<Shuttle>().Add(SpatialRestrictions.Intersects(ShuttleHelper.Abc, other_object));
That way your magic string is behind some other property ( I just chose .Abc to demonstrate but I'm sure you'll have a better idea of what you want ) then if "abc" changes ( say to "xyz" ) you either change the property name from .Abc to .Xyz and then you will have build errors to show you where you need to update your code ( much like you would with lambda expressions ) or just change the value of the .Abc property to "xyz" - which would really only work if your property had some meaningfull name ( such as .OtherObjectIntersectingColumn etc ) not that property name itself. That does have the advantage of not having to update code to correct the build errors. At that point your query could be
var x = session.QueryOver<Shuttle>().Add(SpatialRestrictions.Intersects(ShuttleHelper.OtherObjectIntersectingColumn, other_object));
I mentioned the open source project ecosystem page as it can give you some pointers on what types of helper classes other people have made so your not re-inventing the wheel so to speak.
I was wondering about the performance difference between these two scenarios and what could the disadvantages be over each other?
First scenario :
public class Helper //returns IQueryable
{
public IQueryable<Customer> CurrentCustomer
{
get{return new DataContext().Where(t=>t.CustomerId == 1);
}
}
public class SomeClass
{
public void Main()
{
Console.WriteLine(new Helper().CurrentCustomer.First().Name;
}
}
The second scenario :
public class Helper //returns Enumerated result
{
public Customer CurrentCustomer
{
get{return new DataContext().First(t=>t.CustomerId == 1);
}
}
public class SomeClass
{
public void Main()
{
Console.WriteLine(new Helper().CurrentCustomer.Name;
}
}
Thanks in advance.
Well, the main difference that I can see is when the query is executed and what else you can do with the query.
For example, suppose your Customer object has some large fields. Using the second approach, you will always fetch them. Using the first approach you could write:
string name = helper.CurrentCustomer.Select(x => x.Name).First();
That would then only need to query the single field in the database. In terms of timing, the query will only be executed when you actually request the data (which is how it's able to wait until after you've used Select to work out what to put in the query in the above case). That has pros and cons - it can make it harder to reason about, but it can save some work too. In terms of the "reasoning about" side, you know that once you've got a customer, you've got an object you can just work with. If you use the same queryable twice though, you need to know whether your LINQ query provider is going to cache the result... if you write:
IQueryable<Customer> currentCustomerQuery = helper.CurrentCustomer;
Customer x = currentCustomerQuery.First();
Customer y = currentCustomerQuery.First();
will that issue the query once or twice? I suspect it very much depends on the provider, but I wouldn't like to make any guesses about specific ones.
The other thing to think about is how easy it is to use the API you're building. Personally I'd normally find it easier to use an API which gives me the data I want rather than a query I can fetch that data from. On the other hand, it is slightly less flexible.
One option would be to allow both - have a GetCurrentCustomerQuery() and a GetCurrentCustomer() method. (I probably wouldn't make them properties myself, but that's merely a matter of personal preference.) That way you can get the flexibility you want when you really need it, but have a simple way of just getting the current customer as an object.
In short, using IQueryable is far better and allows you further filter the returned IQueryable down the path, without actually having the object or collection loaded into the memory. In this case, the return type is a simple Customer class and impact would be minimal, but in case of collections, you are strongly advised to use IQueryable. Chris Sells shows the problem in more depth here
The difference between the methods is that the first one returns an expression that can return the object, whlie the second one has already executed the expression and returns the object.
In this exacty scenario the difference isn't very useful, and returning a single object as an expression is not very intuitive.
A scenario where the difference is more useful is if you have a method that returns several objects. The deferred execution of the expression means that you will only load the objects that you actually use. In the case that you only need the first few objects, the rest of the objects will not be created.
I've been using LINQ extensively in my recent projects, however, I have not been able to find a way of dealing with objects that doesn't either seem sloppy or impractical.
I'll also note that I primarily work with ASP.net.
I hate the idea of exposing the my data context or LINQ returned types to my UI code. I prefer finer grained control over my business objects, and it also seems too tightly coupled to the db to be good practice.
Here are the approaches I've tried ..
Project items into a custom class
dc.TableName.Select(λ => new MyCustomClass(λ.ID, λ.Name, λ.Monkey)).ToList();
This obviously tends to result in a lot of wireup code for creating, updating etc...
Creating a wrapper around returned object
public class MyCustomClass
{
LinqClassName _core;
Internal MyCustomClass(LINQClassName blah)
{
_core = blah;
}
int ID {get { return _core.ID;}}
string Name { get {return _core.Name;} set {_core.Name = value;} }
}
...
dc.TableName.Select(λ => new MyCustomClass(λ)).ToList();
Seems to work pretty well but reattaching for updates seems to be nigh impossible somewhat defeating the purpose.
I also tend to like using LINQ Queries for transformations and such through my code and I'm worried about a speed hit with this method, although I haven't tried it with large enough sets to confirm yet.
Creating a wrapper around returned object while persisting data context
public class MyCustomClass
{
LinqClassName _core;
MyDataContext _dc;
...
}
Persisting the data context within my object greatly simplifies updates but seems like a lot of overhead especially when utilizing session state.
A quick Note: I know the usage of λ is not mathematically correct here - I tend to use it for my bound variable because it stands out visually, and in most lambda statements it is the transformation that is important not the variable - not sure if that makes any sense but blah
Sorry for the extremely long question.
Thanks in advance for your input and Happy New Years!
I create "Map" extension functions on the tables returning from the LINQ queries. The Map function returns a plain old CLR object. For example:
public static MyClrObject Map(this MyLinqObject o)
{
MyClrObject myObject = new MyClrObject()
{
stringValue = o.String,
secondValue = o.Second
};
return myObject;
}
You can then add the Map function to the select list in the LINQ query and have LINQ return the CLR Object like:
return (from t in dc.MyLinqObject
select t.Map()).FirstOrDefault();
If you are returning a list, you can use the ToList to get a List<> back. If you prefer to create your own list types, you need to do two things. First, create a constructor that takes an IEnumerable<> of the underlying type as it's one argument. That constructor should copy the items from the IEnumerable<> collection. Second, create a static extension method to call that constructor from the LINQ query:
public static MyObjectList ToMyObjectList(this IEnumerable<MyObjectList> collection)
{
return new MyObjectList (collection);
}
Once these methods are created, they kind of hide in the background. They don't clutter up the LINQ queries and they don't limit what operations you can perform in teh query.
This blog entry has a more thorough explanation.
I'm working with a client who wants to mix LINQ to SQL with their in-house DAL. Ultimately they want to be able to query their layer using typical LINQ syntax. The point where this gets tricky is that they build their queries dynamically. So ultimately what I want is to be able to take a LINQ query, pull it apart and be able to inspect the pieces to pull the correct objects out, but I don't really want to build a piece to translate the 'where' expression into SQL. Is this something I can just generate using Microsoft code? Or is there an easier way to do this?
(you mean just LINQ, not really LINQ-to-SQL)
Sure, you can do it - but it is massive amounts of work. Here's how; I recommend "don't". You could also look at the source code for DbLinq - see how they do it.
If you just want Where, it is a bit easier - but as soon as you start getting joins, groupings, etc - it will be very hard to do.
Here's just Where support on a custom LINQ implemention (not a fully queryable provider, but enough to get LINQ with Where working):
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Reflection;
namespace YourLibrary
{
public static class MyLinq
{
public static IEnumerable<T> Where<T>(
this IMyDal<T> dal,
Expression<Func<T, bool>> predicate)
{
BinaryExpression be = predicate.Body as BinaryExpression;
var me = be.Left as MemberExpression;
if(me == null) throw new InvalidOperationException("don't be silly");
if(me.Expression != predicate.Parameters[0]) throw new InvalidOperationException("direct properties only, please!");
string member = me.Member.Name;
object value;
switch (be.Right.NodeType)
{
case ExpressionType.Constant:
value = ((ConstantExpression)be.Right).Value;
break;
case ExpressionType.MemberAccess:
var constMemberAccess = ((MemberExpression)be.Right);
var capture = ((ConstantExpression)constMemberAccess.Expression).Value;
switch (constMemberAccess.Member.MemberType)
{
case MemberTypes.Field:
value = ((FieldInfo)constMemberAccess.Member).GetValue(capture);
break;
case MemberTypes.Property:
value = ((PropertyInfo)constMemberAccess.Member).GetValue(capture, null);
break;
default:
throw new InvalidOperationException("simple captures only, please");
}
break;
default:
throw new InvalidOperationException("more complexity");
}
return dal.Find(member, value);
}
}
public interface IMyDal<T>
{
IEnumerable<T> Find(string member, object value);
}
}
namespace MyCode
{
using YourLibrary;
static class Program
{
class Customer {
public string Name { get; set; }
public int Id { get; set; }
}
class CustomerDal : IMyDal<Customer>
{
public IEnumerable<Customer> Find(string member, object value)
{
Console.WriteLine("Your code here: " + member + " = " + value);
return new Customer[0];
}
}
static void Main()
{
var dal = new CustomerDal();
var qry = from cust in dal
where cust.Name == "abc"
select cust;
int id = int.Parse("123");
var qry2 = from cust in dal
where cust.Id == id // capture
select cust;
}
}
}
Technically if your DAL exposes IQueryable<T> instead of IEnumerable<T> you can also implement a IQueryProvider and do exactly what you describe. However, this is not for the faint of heart.
But if you expose the LINQ to SQL tables themselves in the DAL, they will do exactly this for you. There is a (big) risk though since you'll be handling the client code total control over how to express SQL queries, and the usual result is some complex query that joins everything and slaps pagination a top of it with less than spectacular run time performance.
I think you should consider carefully what is actually needed from the DAL and expose only that.
I just read an interesting article on Expression Trees, LINQ to SQL uses these to translate the query into SQL and send it over the wire.
Maybe that's something you could use?
Just some though. I know some language support building a string that can be execute in the code itself. I never tried it with .Net, but this is common in functional languages like LISP. Since .Net support lambdas, maybe this is possible.
Since F# is coming to .Net soon, maybe it will possible if it is not right now.
What I am trying to say is if you can do this then maybe you can build that string that will be use as the LINQ statement and then execute it. Since it is a string, it will be possible to analyse the string and get the information you want.
Try Dynamic Linq
To anyone else with the same question out there. Pulling out the where clause from LINQ-to-SQL isn’t quite as straightforward, as one would’ve hoped for. Additionally, doing that by itself is probably meaningless. There are a couple of options, depending on the requirements – either grab it from the generated string, but then it would contain parameter references and object property mappings that would also have to be resolved, so those would also have to be pulled out of the original provider somehow, otherwise this would be pointless. Another – would be to find a modular provider that can do that, as well as make member mappings easily accessible, but once again, without the rest of the query, I see little utility in doing that, because the where clause would reference table/column aliases from the select statement.
I had a similar task to write a full blown provider for a custom ORM/DAL a couple of years ago. While it qualifies as the most complex thing I’ve worked on, being an experience developer, I can say it’s not as bad, as some people claim once you wrap your head around the concepts that lie at the foundation of such a component. Some solutions that I’ve seen go the wrong way about it, add redundant functionality and have extra code addressing problems introduced by underlying logic. E.g. the “optimization” stage/module that attempts to re-factor bloated, nested SQL produced by the main parser. If the latter was designed in such a way that would output clean SQL from the start, then no clean-up phase would be needed. I’ve seen providers that create a new level of nesting for each where and join call. That’s a bad strategy. By breaking down a query into three/four main parts – select, from, where and orderby, which are built individually as the tree is being visited, this problem is avoided altogether. I’ve developed an object-to-data (aka LINQ-to-SQL) provided based on these principles for a custom ORM/DAL and it produces nice, clean SQL, with an excellent performance, as each statement is compiled to IL and cached.
For anyone that is looking to do something similar, please see my posts that include a sample project with a tutorial/barebones implementation that makes it easy to see how it works. Included is also the full solution:
How to write a LINQ to SQL provider in C# Part 1 - Introduction
How to write a LINQ to SQL provider in C# Part 2 - Expression Visitor
How to write a LINQ to SQL provider in C# Part 3 - Where Clause Visitor
How to write a LINQ to SQL provider in C# Part 4 - Compiling Expression Trees