Both LINQ to SQL and LINQ to Entities can already convert a LINQ expression into a SQL text string. But I want my application to make the conversion without the DB context (and therefore the active database connection) that both of those providers require.
I'd like to convert a LINQ expression into an equivalent SQL string(s) for WHERE and ORDER BY clauses, without a DB context dependency, to make the following repository interface work:
public interface IStore<T> where T : class
{
    void Add(T item);
    void Remove(T item);
    void Update(T item);
    T FindByID(Guid id);
    // sure could use a LINQ to SQL converter!
    IEnumerable<T> Find(Expression<Func<T, bool>> predicate);
    IEnumerable<T> FindAll();
}
QUESTION
It is primarily the expression tree traversal and transform I am interested in. Does anyone know of an existing library (nuget?) that I can incorporate to be used in such a custom context?
As it is I've already built my own working "LINQ transformed to SQL text" tool, similar to this expression tree to SQL example which works in my above repository. It allows me to write code like this:
IRepository<Person> repo = new PersonRepository();
var maxWeight = 170;
var results = repo.Find(x => (x.Age > 40 || x.Age < 20) && x.Weight < maxWeight);
But my code and that sample are primitive (and that sample itself relies on a LINQ to SQL DB context). For example, neither handles generation of "LIKE" statements.
I don't expect or need a generator-tool that handles every conceivable LINQ query. For example, I'm not worried about handling and generating joins or includes. In fact, with another ~20 hours my own custom code may cover all the cases that I care about (mostly "WHERE" and "ORDER BY" statements).
But at the same time I feel that I should not have to write my own custom code to do this. If I'm stuck writing my own, then I'd still be interested if someone could point me to specific classes I can reflect and imitate (NHibernate, EF, etc.). I'm asking about specific classes to peek at, if you know them, because I don't want to spend hours sifting through the code of a massive tool just to find the part I need.
Not that it matters, but if anyone wants to know why I'm not simply using LINQ to SQL or LINQ to Entities...for my specific application I simply prefer to use a tool such as Dapper.
USE CASES
Whether I finish building the tool myself, or find a 3rd party library, here are reasons why a "LINQ to SQL text string" would be useful:
The predicate I type into the IRepository.Find method has intellisense and basic compile-time checking.
My proposed IStore interface can be implemented for DB access or web service access. To clarify, if I can convert the LINQ "WHERE/ORDER BY" predicate to a SQL "WHERE/ORDER BY" clause then...
The SQL string could be used by Dapper directly.
The SQL string, unlike a LINQ expression, can be sent to a WCF service to be used for direct DB access (which itself might not be using Dapper).
The SQL string could be deserialized, with custom code, back into a LINQ statement by the WCF service. Eric Lippert comments on this.
The UI can use IQueryable mechanics to dynamically generate a predicate to give to the repository
In short, such a tool helps fulfill the "specification" or "query object" notion of repositories according to DDD, and does so without taking a dependency on EF or LINQ to SQL.
Doing this properly is extremely complicated, especially if you are not yet familiar with expression trees (which is what IQueryable uses to represent queries).
But if you really want to get started (or just get an idea of how much work it would be), have a look at Matt Warren's 17-part series Building an IQueryable provider.
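To give a feel for the traversal involved, here is a minimal sketch of an ExpressionVisitor that turns a predicate into a parameterized WHERE clause without any DB context. The names WhereClauseBuilder and Person are hypothetical, the bracketed column quoting assumes SQL Server style, and only comparisons, AND/OR and string.StartsWith are handled; null comparisons and escaping of LIKE wildcards are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;
using System.Text;

// Hypothetical entity used only for illustration.
public class Person
{
    public int Age { get; set; }
    public int Weight { get; set; }
    public string Name { get; set; }
}

// Hypothetical helper: translates a small subset of predicates into SQL text.
public sealed class WhereClauseBuilder : ExpressionVisitor
{
    private readonly StringBuilder _sql = new StringBuilder();

    // Parameter values collected during translation, keyed by "@p0", "@p1", ...
    public Dictionary<string, object> Parameters { get; } = new Dictionary<string, object>();

    public string Translate<T>(Expression<Func<T, bool>> predicate)
    {
        _sql.Clear();
        Parameters.Clear();
        Visit(predicate.Body);
        return _sql.ToString();
    }

    protected override Expression VisitBinary(BinaryExpression node)
    {
        _sql.Append('(');
        Visit(node.Left);
        _sql.Append(' ').Append(ToSqlOperator(node.NodeType)).Append(' ');
        Visit(node.Right);
        _sql.Append(')');
        return node;
    }

    protected override Expression VisitMember(MemberExpression node)
    {
        // A member of the lambda parameter (x.Age) is treated as a column name.
        if (node.Expression is ParameterExpression)
        {
            _sql.Append('[').Append(node.Member.Name).Append(']');
            return node;
        }
        // Anything else (e.g. a captured local) is evaluated and sent as a parameter.
        AddParameter(Evaluate(node));
        return node;
    }

    protected override Expression VisitConstant(ConstantExpression node)
    {
        AddParameter(node.Value);
        return node;
    }

    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        // x.Name.StartsWith("A") becomes [Name] LIKE @pN with the value "A%".
        if (node.Method.DeclaringType == typeof(string)
            && node.Method.Name == "StartsWith"
            && node.Arguments.Count == 1)
        {
            Visit(node.Object);
            _sql.Append(" LIKE ");
            AddParameter(Evaluate(node.Arguments[0]) + "%");
            return node;
        }
        throw new NotSupportedException("Unsupported method: " + node.Method.Name);
    }

    private void AddParameter(object value)
    {
        string name = "@p" + Parameters.Count;
        Parameters[name] = value;
        _sql.Append(name);
    }

    // Evaluates a sub-expression that does not depend on the lambda parameter.
    private static object Evaluate(Expression expression)
    {
        return Expression.Lambda(expression).Compile().DynamicInvoke();
    }

    private static string ToSqlOperator(ExpressionType type)
    {
        switch (type)
        {
            case ExpressionType.AndAlso: return "AND";
            case ExpressionType.OrElse: return "OR";
            case ExpressionType.Equal: return "=";
            case ExpressionType.NotEqual: return "<>";
            case ExpressionType.GreaterThan: return ">";
            case ExpressionType.GreaterThanOrEqual: return ">=";
            case ExpressionType.LessThan: return "<";
            case ExpressionType.LessThanOrEqual: return "<=";
            default: throw new NotSupportedException("Unsupported operator: " + type);
        }
    }
}
```

The resulting string and the Parameters dictionary could then be appended to a hand-written SELECT and passed to something like Dapper. Everything beyond this subset (joins, projections, nullable handling, more string methods) is where the real work starts.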
I can confirm that this is a fairly large amount of work, suited only to experienced .NET developers. One must be very well versed in both C# (or VB.NET) and T-SQL, because the job is to write a translator from the former into the latter. Additionally, this is in the realm of metaprogramming, which is considered a fairly advanced branch of computer science. There is a lot of abstract thinking involved: layers of abstract concepts stacked on top of each other.
If none of this is a barrier, then the exercise can actually be quite enjoyable and rewarding, at least for the first month or so. One common problem I noticed in these providers is that inflexibility and questionable design choices at the start led to difficulties and hacky fixes later on. Planning as much as possible in advance, clearly understanding the whole process and its stages, and properly identifying components, layers and concerns make development much easier. The biggest mistake I saw in one provider was failing to break the output query down into its parts: select, from, where and order by. Each part should be represented by its own object throughout and then put together at the end. I explain this approach in my end-to-end tutorial on how to write a provider, in the series linked below. There's a sample project included, with a simplified tutorial variant and the full version made from scratch for a project. Finding the time to write about it was a challenge in itself.
How to write a LINQ to SQL provider in C#:
Introduction
Expression Visitor
Where Clause Visitor
Compiling Expression Trees
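As a rough illustration of the "one object per query part" idea described above, a container along these lines keeps select, from, where and order by separate until the final string is assembled. SqlQueryParts and its members are hypothetical names, not taken from the tutorial series.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical container: each clause is collected separately by the
// visitors and only joined into SQL text at the very end.
public sealed class SqlQueryParts
{
    public List<string> Select { get; } = new List<string>();
    public string From { get; set; }
    public List<string> Where { get; } = new List<string>();
    public List<string> OrderBy { get; } = new List<string>();

    public string ToSql()
    {
        var sql = new StringBuilder();
        // No explicit projection means "all columns".
        sql.Append("SELECT ").Append(Select.Count == 0 ? "*" : string.Join(", ", Select));
        sql.Append(" FROM ").Append(From);
        if (Where.Count > 0)
        {
            // Multiple Where calls are combined with AND.
            sql.Append(" WHERE ").Append(string.Join(" AND ", Where));
        }
        if (OrderBy.Count > 0)
        {
            sql.Append(" ORDER BY ").Append(string.Join(", ", OrderBy));
        }
        return sql.ToString();
    }
}
```

Because the clauses stay separate, a later Where or OrderBy call in the LINQ chain can simply add to its own list instead of patching an already generated string.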
This is something I briefly looked into quite a while ago. You may want to have a look at http://iqtoolkit.codeplex.com/ and/or http://expressiontree.codeplex.com/ for ideas. As mentioned by others, building a LINQ query provider is far from trivial unless you limit your scope to the minimum set of features you really need.
If your goals relate to the "specification" or "query object" notion of repositories according to DDD, this may not be the best direction to take. Instead of CRUD-like, technology-related abstractions, it may be more productive to focus on ways in which the behaviour of the domain can be expressed, with a minimum of direct dependencies on technology-related abstractions. As Eric Evans recently discussed, he regrets the focus on the technical building blocks, such as repositories, in his initial descriptions of DDD.
There was a discussion on Kotlin Slack about a possibility of adding code trees to support things like C# LINQ.
In C#, LINQ has many applications, but I want to focus on only one (because the others are presumably already covered by Kotlin syntax): composing SQL queries to remote databases.
Prerequisites:
We have a data schema of an SQL database expressed somehow in the code so that static tools (or the type system) could check the correctness (at least the naming) of an SQL query
We have to generate the queries as strings
We want a syntax close to SQL or Java streams
Question: what do expression trees add to the syntax that is so crucial for the task at hand? How good an SQL builder can be without them?
How good a query dsl can be without expression trees?
As JINQ has shown you can get very far by analyzing the bytecode to understand the intent of developer and thus translate predicate to SQL. So in principle expression trees are not essential to build a great looking query dsl:
val alices = database.customerStream().where { it.name == "Alice" }
Even without hackery such as bytecode analysis, it's possible to get a decent query DSL with code generation. Querydsl and JOOQ are great examples. With a little bit of Kotlin wrapping code you can then write
val alices = db.findAll(QCustomer.customer, { it.name.eq("Alice") })
How do expression trees help building query dsl
An expression tree is a structure representing some code that resolves to a value. With such a structure generated by the compiler, one does not need bytecode analysis to understand what the code is supposed to do. Given the example
val alices = database.customerStream().where { it.name == "Alice" }
The argument to the where function would be an expression that we can inspect at runtime and translate to SQL or another query language. Because expression trees represent code, you don't need to switch between the Kotlin and SQL paradigms to write queries. Query code expressed using LINQ/Jinq looks pretty much the same regardless of whether it is executed in memory using POCOs/POJOs or in the database engine using its query language. The compiler can also do more type checking. Furthermore, it's very easy to replace the underlying database with an in-memory representation to make tests run much faster.
Further reading:
What are Expression Trees and how do you use them and why would you use them?
Practical use of expression trees
JOOQ and Querydsl:
The typical solution to ORM has been to employ either a DSL or an embedded DSL from the application logic. While great advances have been made with these schemes, culminating in JOOQ and Querydsl, there are still many caveats to such a system:
many of the paradigms the people writing these queries are used to (namely type safety) are either missing or different in key ways
the exact semantics are non-obvious: in the previous answer it is suggested that we use an extension method eq to perform a db-native equality filter. It is highly probable that a new developer will mistakenly use equals instead of eq.
this second point is compounded by the difficulty in testing: using a live connector with fake data is a notoriously difficult problem, so depending on the testing procedure, the incorrect code jooqDB.where { it.name.equals("alice") } may not be discovered until much further down the development pipeline.
Jinq
Jinq is not a first-party data connector. While I think the importance of this is largely psychological, it is important nonetheless. Many projects are going to use the tools suggested by the vendors, and all of the major DB providers have Java connectors, so it is likely that most developers will simply use them.
While I have not used Jinq, it is my belief that another reason Jinq has not seen wide-spread adoption is largely because it's attempting to use a much tougher domain to solve the problem: building queries from AST's is much easier than building queries from byte code for the same reason that building the back-end of a compiler is easier than building a transcompiler. While I cannot help but tip my hat to the Jinq team for doing such an amazing job, I also cannot help but think they are hampered by their tools: building queries out of bytecode is hard. By definition, java bytecode is committed to running on the JVM, trying to retrofit that commitment for another interpreter is a very hard problem.
My current work does not permit me to use a traditional database, but if I was to switch projects, knowing that I would need a great deal of data exposure in my DAL, I would likely retreat from Kotlin and Java back to .net, largely because of Linq, rather than investigate Jinq. "Linq from Kotlin" might well change my mind.
Support from DB vendors:
The LINQ-to-SQL and LINQ-to-mongo database connectors have seen wide-spread adoption in the .net community. This is because they are first party, high quality, and act in a reasonably straightforward manner: compiling an AST to SQL (or the mongo-query-language) is at least conceptually straight forward. Many of the traditional caveats of ORM's apply, but the vendors (Microsoft and Mongo) continue to address these problems.
If Kotlin supported runtime code trees in a similar vein to Linq, and if Kotlin continues to gain traction at its current rate, then I believe the MongoDB and the Hibernate teams would be quick to start retrofitting their existing LINQ-to-X connectors to support Kotlin clients, and eventually even the bigger slower companies like Microsoft and IBM would begin to support the same flow.
Linq from Kotlin
What’s more, the exact roles the Kotlin-unique concepts of a "receiver type" and the aggressive implementation of inline might play in the Linq space is interesting. Linq-from-Kotlin might well be more effective than LINQ-from-C#.
Where C# has
someInterface
    .Where(customer => customer.Name.StartsWith("A") && ComplexFunctionCallDoneByMongoDriver(customer))
    .Select(customer => new ResultObject(customer.Name, customer.Age, OtherContext()))
Kotlin might be able to make advances:
someInterface
.filter { it.name startsWith "A" && inlinedComplexFunctionCallDoneOnDB(it) }
//inlined methods would have their AST exposed -> can be run on the DB process instead of on the driver.
.map { ResultObject(name, age, otherContext()) }
//uses a receiver type, no need to specify "customer ->"
//otherContext() might also be inlined
This is off the top of my head, I suspect that smarter brains than mine can put these tools to better use.
Other uses:
It's worth mentioning that the assumption made about the applications of runtime code ASTs is false:
other [runtime AST problem domains] [are] already presumably covered by Kotlin syntax
The reason I brought this up in the first place was because I was annoyed with Kotlin's null-safety feature and its interaction with Mockito: Having spent some time researching the issue, there is no Mocking framework designed for Kotlin, only java frameworks that can be used from Kotlin, with some pain.
Some currently unsolved problems in both the java domain and the Kotlin domain:
Mocking frameworks, as above. With access to the AST all of the clever but bizarre tricks around argument-operation-order employed by Mockito become obsolete. Other more traditional mocking frameworks gain a much more intuitive and type-safe front-end.
binding expressions, for UI frameworks or otherwise, often devolve to strings. Consider a UI framework where developers could write notifyOfUIElementChange { this.model.displayName } instead of notifyOfUIElementChange("model.displayName"). The latter suffers from the crippling problem of being stale if somebody renames the property.
I'm very excited to see what the ControlsFX guys or Thomas Mikula might do with a feature such as this.
similar to Kotlin specific Linq: I suspect Kotlin’s applications here might provide for a number of tools I'm not aware of. But I'm very confident that they do exist.
I really like Linq, and I cannot help but think that with Kotlin's focus on industry problems, a Linq-from-Kotlin module would be a perfect fit and make a number of people's lives, including mine, a fair bit easier.
In most cases, classes are known beforehand (e.g. Customer, Order); they're described for ORMs (e.g. Entity Framework, LINQ to SQL, NHibernate, BLToolkit) using visual designers, attributes in code or configuration files. When you need to use objects of class Customer, for example, you can write strongly-typed queries like this:
db.Customers
.Where(c => c.FirstName == "John")
.Select(c => new { c.Id, c.LastName })
.GroupBy(c => c.Id)
However, in my application, input data will be processed into models which are defined by users at runtime. Users will be able to add and remove properties, therefore I cannot define models in the code. The idea that this will require manually generating string SQL queries horrifies me. I'd like to be able to write code like this:
db.Tables["Customers"]
.Where("c.FirstName = :FirstName")
.Select(new[] { "c.Id", "c.LastName" })
.GroupBy("c.Id")
.Param(":FirstName", "John")
(Just example invented syntax.)
Question: is there a library for .NET which helps to construct complex SQL queries without defining models in the code first?
P.S. Looks like libraries like this exist, just not for .NET. I'd love to see something like SQLAlchemy, Ruby on Rails ActiveRecord, PHP Yii ActiveRecord, Django Models etc.
Rob Conery's Massive may be helpful to you. It's a lightweight, dynamic DAL that has the ad-hoc capabilities you're looking for while still giving you some of the benefits of a DAL.
Because it's dynamic, it doesn't give you compile-time checking or some of the other benefits of its bigger brothers, but it's a very handy little tool that straddles the space between a full DAL and raw SQL.
You ARE in a lot of pain now. This is generally seen as an anti-pattern, and it actually is one, but sometimes it is required (and you may just have one of those apps). The bad news is that no ORM supports it.
Most ORMs can generate a model from the database, but that is a "development time" operation, so it cannot sensibly be used when, for example, your data model is totally unfixed. That said, I am not sure how such an application would even work.
I am not aware of any library for this. There are basically two extremes: SQL as strings, which is generally seen as bad by anyone with a clue, and strongly typed objects. Going by the requirements, your app falls right in between, and that is something that I think has no support.
One problem you face is that LINQ is a compile-time checked operation. That is sold as a huge advantage, but now it bites you. You POSSIBLY could work by having an "Entity" base class that is DYNAMIC, then take it from there (because the compiler will not complain about anything on a dynamic object). Theoretically this could allow you to construct an object query integration for LINQ without compile-time checking, but again, no such thing exists IIRC due to the lack of "common usage scenarios". I, for example, would choke on having no compile-time checking.
Solution to the problem: Simple.Data by Mark Rendle. It lets you write code like this without defining any classes or properties:
Database.Open().Users.FindAllByEmail(email).FirstOrDefault()
(It was mentioned in the description of Rob Conery's Massive library.)
I have a customized collection built for indexing purposes; it's an implementation of IDictionary (non-generic). It is used to hold string-based keys and object-based values.
Now please pardon my ignorance, I have just come out of the cave.
I want to put an adapter in between: a LINQ adapter that takes LINQ queries and performs the operations on this existing IndexedDictionary.
Why so?
This was designed for a .NET 2.0 application. Now, slowly and steadily, we are moving toward 4.0 as part of its natural evolution, and we are taking a side-by-side approach: everything written previously should continue to exist, and 4.0 features should be implemented as adapters wherever possible.
To cut a long story short: I have an existing .NET 2.0 IDictionary implementation, and I would like to use it with LINQ so that I can take full advantage of expressions. How can I do this?
It seems to me that you don't need an adapter. An adapter is for transforming expression trees (IQueryable) to communicate with an underlying data source. For in-memory collections, the existing extension methods such as LINQ to Objects, LINQ to XML and LINQ to DataSets will normally do. If this is not enough, you can write your own extension methods or write instance LINQ methods on your type.
Not sure that I fully understand what your purpose is and what the code needs to do. LINQ is very powerful, you probably don't need an adapter.
For going from LINQ results to Dictionary there is a ToDictionary method in LINQ. See http://www.hookedonlinq.com/ToDictionaryOperator.ashx
Cheers
Chris Farrell
You don't need to do anything to the .NET 2.0 Dictionary, just use it like any other LINQ source. It implements the IEnumerable<T> interface, so it can be used by LINQ. Note that when you iterate through the dictionary using LINQ, the elements will not be in "order" and you will have to work with KeyValuePair<TKey, TValue>.
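One caveat for the non-generic case in the question: a non-generic IDictionary only exposes the non-generic IEnumerable, so the standard query operators need a Cast<T>() first. The sketch below assumes the custom dictionary enumerates DictionaryEntry values, as Hashtable does; DictionaryLinq and KeysWithPrefix are hypothetical names.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class DictionaryLinq
{
    // Returns all string keys starting with the given prefix, sorted ordinally.
    // Assumes the dictionary's enumerator yields DictionaryEntry items.
    public static List<string> KeysWithPrefix(IDictionary source, string prefix)
    {
        return source.Cast<DictionaryEntry>()   // bridges non-generic IEnumerable to LINQ
            .Where(entry => ((string)entry.Key).StartsWith(prefix, StringComparison.Ordinal))
            .Select(entry => (string)entry.Key)
            .OrderBy(key => key, StringComparer.Ordinal)
            .ToList();
    }
}
```

If the custom collection's enumerator yields some other element type, the type argument passed to Cast would need to change accordingly.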
First, I apologize if this is not an appropriate venue to ask this question, but I wasn't really sure where else to get input from.
I have created an early version of a .NET object persistence library. Its features are:
A very simple interface for persistence of POCOs.
The main thing: support for just about every conceivable storage medium. This would be everything from plain text files on the local filesystem, to embedded systems like SQLite, any standard SQL server (MySQL, postgres, Oracle, SQL Server, whatever), to various NoSQL databases (Mongo, Couch, Redis, whatever). Drivers could be written for nearly anything, so for instance you could fairly easily write a driver where the actual backing store could be a web-service.
When I first had this idea I was convinced it was totally awesome. I quickly created an initial prototype. Now, I'm at the 'hard part' where I am debating issues like connection pooling, thread safety, and debating whether to try to support IQueryable for LINQ, etc. And I'm taking a harder look at whether it is worthwhile to develop this library beyond my own requirements for it.
Here is a basic example of usage:
var to1 = new TestObject { id = "fignewton", number = 100, FruitType = FruitType.Apple };
ObjectStore db = new SQLiteObjectStore("d:/objstore.sqlite");
db.Write(to1);
var readback = db.Read<TestObject>("fignewton");
var readmultiple = db.ReadObjects<TestObject>(collectionOfKeys);
The querying interface that works right now looks like:
var appleQuery = new Query<TestObject>().Eq("FruitType", FruitType.Apple).Gt("number",50);
var results = db.Find<TestObject>(appleQuery);
I am also working on an alternative query interface that lets you just pass in something very like a SQL WHERE clause. And obviously, in the .NET world it would be great to support IQueryable / expression trees.
Because the library supports many storage mediums with disparate capabilities, it uses attributes to help the system make the best use of each driver.
[TableName("AttributeTest")]
[CompositeIndex("AutoProperty","CreatedOn")]
public class ComplexTypesObject
{
    [Id]
    public string id;
    [QueryableIndexed]
    public FruitType FruitType;
    public SimpleTypesObject EmbeddedObject;
    public string[] Array;
    public int AutoProperty { get; set; }
    public DateTime CreatedOn = DateTime.Now;
}
All of the attributes are optional, and are basically all about performance. In a simple case you don't need any of them.
In a SQL environment, the system will by default take care of creating tables and indexes for you, though there is a DbaSafe option that will prevent the system from executing DDLs.
It is also fun to be able to migrate your data from, say, a SQL engine to MongoDB in one line of code. Or to a zip file. And back again.
OK, The Question:
The root question is "Is this useful?" Is it worth taking the time to really polish, make thread-safe or connection pooled, write a better query interface, and upload somewhere?
Is there another library already out there that already does something like this, NAMELY, providing a single interface that works across multiple data sources (beyond just different varieties of SQL)?
Is it solving a problem that needs to be solved, or has someone else already solved it better?
If I proceed, how do you go about trying to make your project visible?
Obviously this isn't a replacement for ORMs (and it can co-exist with ORMs, and coexist with your traditional SQL server). I guess its main use cases are for simple persistence where an ORM is overkill, or for NoSQL type scenarios and where a document-store type interface is preferable.
My advice: Write it for your own requirements and then open-source it. You'll soon find out if there's a market for it. And, as a bonus, you'll find that other people will tell you which bits need polishing; there's a very high chance they'll polish it for you.
Ben, I think it's awesome. At the very least post it to CodePlex and share with the rest of the world. I'm quite sure there are developers out there who can use an object persistence framework (or help polish it up).
For what it's worth, I think it's a great idea.
But more importantly, you've chosen a project (in my opinion) that will undoubtedly improve your code construction and design chops. It is often quite difficult to find projects that both add value while improving your skills.
At least complete it to your initial requirements and then open source it. Anything after that is a bonus!
While I think the idea is intriguing, and could be useful, I am not sure what long-term value it may hold. Given the considerable advances with EF v4 recently, including things like Code-Only, true POCO support, etc. achieving what you are talking about is actually not that difficult with EF. I am a true believer in Code-Only these days, as it is simple, powerful, and best of all, compile-time checked.
The idea about supporting any kind of data store is intriguing, and something that is worth looking into. However, I think it might be more useful, and reach a considerably broader audience, if you implemented store providers for EF v4, rather than trying to reinvent the wheel that Microsoft has now spent years on. Small projects more often than not grow...and things like pooling, thread safety, LINQ/IQueryable support, etc. become more important...little by little, over time.
By developing EF data store providers for things like SqLite, MongoDB, Xml files or flat files, etc. you add to the capabilities of an existing, familiar, accessible framework, without requiring people to learn an additional one.
Is LINQ a new feature in .NET 4.0, unsupported in older versions like .NET 3.5? What is it useful for? It seems to be able to build Expression Trees. What is an Expression Tree, actually? Is LINQ able to extract info like class, method and field from a C# file?
Can someone provide me a working piece of code to demonstrate what LINQ can do?
Linq was added in .Net 3.5 (and added to the c# 3.0 compiler as well as in slightly limited form to the VB.net compiler in the same release)
LINQ stands for Language Integrated Query, though achieving it required many complex additions to both the language and the runtime, which are useful in and of themselves.
The Expression functionality is, simply put, the ability for a program to inspect, at runtime, the abstract syntax of certain code constructs passed around. These constructs are lambdas, which are in essence a way of writing anonymous functions more easily while also making runtime introspection of their structure possible.
The 'SQL-like' functionality LINQ is most closely associated with (though by no means the only one) is called LINQ to SQL, whereby something like this:
from f in Foo where f.Blah == "wibble" select f.Wobble;
is compiled into a representation of this query, rather than simply code to execute the query. The part that makes it linq to sql is the 'backend' which converts it into sql. For this the expression is translated into sql server statements to execute the query against a linked database with mapping from rows to .net objects and conversion of the c# logic into equivalent where clauses. You could apply exactly the same code if Foo was a collection of plain .net objects (at which point it is "Linq to objects") the conversion of the expression would then be to straight .Net code.
The query above, written in the language-integrated way, is actually the equivalent of:
Foo.Where(f => f.Blah == "wibble").Select(f => f.Wobble);
Where Foo is a typed collection. For databases classes are synthesized to represent the values in the database to allow this to both compile, and to allow round tripping values from the sql areas to the .net areas and vice versa.
The critical aspect of the Language Integrated part of Linq is that the resulting language constructs are first class parts of the resulting code. Rather than simply resulting in a function they provide the way the function was constructed (as an expression) so that other aspects of the program can manipulate it.
Consumers of this functionality may simply choose to run it (execute the function which the lambda is compiled to) or to ask for the expression which describes it and then do something different with it.
Many aspects of what makes this possible are placed under the "Linq" banner despite not really being Linq themselves.
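A small sketch of that choice: the same lambda text can be assigned to a delegate type (just run it) or to an Expression<T> (inspect it, then optionally compile and run it). LambdaDemo is a hypothetical name used only for this illustration.

```csharp
using System;
using System.Linq.Expressions;

public static class LambdaDemo
{
    public static string Describe()
    {
        // Compiled straight to a method: can only be invoked.
        Func<int, bool> asDelegate = n => n > 5;

        // Compiled to a data structure describing the code.
        Expression<Func<int, bool>> asTree = n => n > 5;

        // The tree can be taken apart at runtime...
        var body = (BinaryExpression)asTree.Body;

        // ...or turned into a delegate and executed after all.
        bool viaTree = asTree.Compile()(7);

        return body.NodeType + " " + body.Left + " " + body.Right
            + " " + asDelegate(7) + " " + viaTree;
    }
}
```

Calling LambdaDemo.Describe() returns "GreaterThan n 5 True True": the operator, operands and parameter name are all visible to the program, which is what a provider relies on when it translates the lambda into something else.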
For example anonymous types are required for easy use of projection (choosing a subset of the possible properties) but anonymous types can be used outside of Linq as well.
Linq, especially via the lambdas (which make writing anonymous delegates very lightweight in terms of syntax), has led to an increase in the functional capabilities of C#. This is reinforced by the extension methods on IEnumerable<T> like Select(), corresponding to map in many functional languages, and Where(), corresponding to filter. Like the anonymous types, this is not in and of itself "Linq", though it is viewed by many as a strongly beneficial effect on C# development (this is not a universal view but is widely held).
For an introduction to Linq from microsoft read this article
For an introduction to how to use Linq-to-Sql in Visual Studio see this series from Scott Guthrie
For a guide to how you can use linq to make plain c# easier when using collections read this article
Expressions are a more advanced topic, and understanding them is entirely unnecessary to use linq, though certain 'tricks' are possible using them.
In general you would care about Expressions only if you were attempting to write linq providers which is code to take an expression rather than just a function and use that to do something other than what the plain function would do, like talk to an external data source.
Here are some Linq Provider examples
A multi part guide to implementing your own provider
The MSDN documentation for the namespace
Other uses would be when you wish to get some meta data about what the internals of the function is doing, perhaps then compiling the expression (resulting in a delegate which will allow you to execute the expression as a function) and doing something with it or just looking at the metadata of the objects to do reflective code which is compile time verified as this answer shows.
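As one hedged example of such compile-time verified reflective code, a property name can be read from an expression instead of being passed as a string, so a rename refactoring cannot leave it stale. PropertyName.Of is a hypothetical helper, not part of the framework.

```csharp
using System;
using System.Linq.Expressions;

public static class PropertyName
{
    // Extracts "Length" from an accessor such as s => s.Length.
    public static string Of<T, TProperty>(Expression<Func<T, TProperty>> accessor)
    {
        Expression body = accessor.Body;

        // Value types returned as object are wrapped in a Convert node; unwrap it.
        if (body is UnaryExpression unary && unary.NodeType == ExpressionType.Convert)
        {
            body = unary.Operand;
        }

        if (body is MemberExpression member)
        {
            return member.Member.Name;
        }

        throw new ArgumentException("Expected a simple member access such as x => x.Name.");
    }
}
```

For instance, PropertyName.Of<string, int>(s => s.Length) returns "Length", and the call stops compiling if the member is renamed or removed.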
One area of this question that hasn't been covered yet is expression trees. There is a really good article on expression trees (and lambda expression) available here.
The other important thing to bring up about expression trees is that by building an expression tree to define what you are going to do, you don't have to actually do anything. I am referring to deferred execution.
// this code will only build the expression tree
var itemsInStock = from item in warehouse.Items
                   where item.Quantity > 0
                   select item;
// this code will cause the actual execution
Console.WriteLine("Items in stock: {0}", itemsInStock.Count());
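Deferred execution can be observed without a database. The sketch below uses LINQ to Objects (delegates rather than expression trees, but the same deferral applies): the query is defined once, and each enumeration re-reads the source. DeferredDemo is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    public static int[] Run()
    {
        var quantities = new List<int> { 0, 3 };

        // Defining the query does not enumerate the list yet.
        IEnumerable<int> inStock = quantities.Where(q => q > 0);

        int before = inStock.Count();   // first execution: sees { 0, 3 }

        quantities.Add(7);

        int after = inStock.Count();    // second execution: sees { 0, 3, 7 }

        return new[] { before, after };
    }
}
```

Run() returns { 1, 2 }, because the filter runs each time Count() is called rather than when the query is declared.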
LINQ was introduced with .NET 3.5. This site has a lot of examples.
System.Linq.Expressions is for hand-building (or machine-generating) expression trees. I have a feeling that, given the complexity of building more complicated functionality, this namespace is underused. However, it is exceedingly powerful. For instance, one of my co-workers recently implemented an expression tree that can auto-scale any LINQ to SQL object using a cumulative density function. Every column gets its own tree, which gets compiled, so it's fast. I have been building a specialized compiler that uses them extensively to implement basic functionality as well as to glue the rest of the generated code together.
Please see this blog post for more information and ideas.
LINQ is a .NET 3.5 feature with built-in language support from C# 3.0 and Visual Basic 2008. There are plenty of examples on MSDN.