Translate C# code into AST?

Is it currently possible to translate C# code into an Abstract Syntax Tree?
Edit: some clarification; I don't necessarily expect the compiler to generate the AST for me - a parser would be fine, although I'd like to use something "official." Lambda expressions are unfortunately not going to be sufficient given they don't allow me to use statement bodies, which is what I'm looking for.

The Roslyn project, released as a preview (CTP) for Visual Studio 2010, gives you programmatic access to the syntax tree, among other things.
SyntaxTree tree = SyntaxTree.ParseCompilationUnit(
    @" C# code here ");
var root = (CompilationUnitSyntax)tree.Root;
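The snippet above uses the preview API. In the released Roslyn packages the entry points have different names; the following is a minimal sketch, assuming the Microsoft.CodeAnalysis.CSharp NuGet package is referenced. The class and method names (RoslynSketch, MethodNames) are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;               // assumption: NuGet package Microsoft.CodeAnalysis.CSharp is referenced
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class RoslynSketch
{
    // Parses C# source text and returns the names of all methods it declares.
    public static List<string> MethodNames(string source)
    {
        SyntaxTree tree = CSharpSyntaxTree.ParseText(source);
        var root = (CompilationUnitSyntax)tree.GetRoot();

        return root.DescendantNodes()
                   .OfType<MethodDeclarationSyntax>()
                   .Select(m => m.Identifier.Text)
                   .ToList();
    }
}
```

Unlike expression trees, this covers whole files: classes, statements and method bodies all appear as nodes under the root.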

Is it currently possible to translate C# code into an Abstract Syntax Tree?
Yes, trivially in special circumstances (= using the new Expressions framework):
// Requires 'using System.Linq.Expressions;'
Expression<Func<int, int>> f = x => x * 2;
This creates an expression tree for the lambda, i.e. a function taking an int and returning its double. You can modify the expression tree by using the Expressions framework (= the classes from that namespace) and then compile it at run-time:
var newBody = Expression.Add(f.Body, Expression.Constant(1));
f = Expression.Lambda<Func<int, int>>(newBody, f.Parameters);
var compiled = f.Compile();
Console.WriteLine(compiled(5)); // Result: 11
Notice that all expressions are immutable, so they have to be built anew by composition. In this case, I've wrapped the original body in an addition of 1.
Notice also that these expression trees only work on real expressions, i.e. content found inside a C# function. You can't get syntax trees for higher-level constructs such as classes this way. Use the CodeDom framework for these.
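On the statement-body limitation raised in the question: the C# compiler will not convert a statement lambda into an expression tree, but since .NET 4 the System.Linq.Expressions API itself has nodes for blocks, variables and assignments, so such a tree can be built by hand. A minimal sketch (the class name BlockSketch is made up for illustration):

```csharp
using System;
using System.Linq.Expressions;

public static class BlockSketch
{
    // Builds, by hand, the equivalent of:
    //   x => { int t = x * 2; t = t + 1; return t; }
    public static Func<int, int> Build()
    {
        ParameterExpression x = Expression.Parameter(typeof(int), "x");
        ParameterExpression t = Expression.Variable(typeof(int), "t");

        BlockExpression body = Expression.Block(
            new[] { t },   // variables local to the block
            Expression.Assign(t, Expression.Multiply(x, Expression.Constant(2))),
            Expression.Assign(t, Expression.Add(t, Expression.Constant(1))),
            t);            // the last expression is the value of the block

        return Expression.Lambda<Func<int, int>>(body, x).Compile();
    }
}
```

This only helps when you construct the tree yourself; it does not parse existing source code.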

Check out .NET CodeDom support. There is an old article on CodeProject for a C# CodeDOM parser, but it won't support the new language features.
There is also supposed to be support in #develop for generating a CodeDom tree from C# source code according to this posting.

There is something much more powerful than the R# project:
Nemerle.Peg:
https://code.google.com/p/nemerle/source/browse/nemerle/trunk/snippets/peg-parser/
It includes a C# parser which parses all C# code and translates it into an AST!
https://code.google.com/p/nemerle/source/browse/nemerle/trunk/snippets/csharp-parser/
You can download installer here: https://code.google.com/p/nemerle/

Personally, I would use NRefactory, which is free, open source and gaining popularity.

It looks like this sort of functionality will be included with whatever comes after C# 4, according to Anders Hejlsberg's 'Future of C#' PDC video.

The ANTLR Parser Generator has a grammar for C# 3.0 which covers everything except for LINQ syntax.

ANTLR is not very useful. LINQ is not what you want.
Try Mono.Cecil! http://www.mono-project.com/Cecil
It is used in many projects, including NDepend! http://www.ndepend.com/

I've just posted an answer on another Stack Overflow thread describing an API I implemented to create and manipulate ASTs from C# source code.

Our C# front end for DMS parses full C# 3.0 including LINQ and produces ASTs. DMS is in fact an ecosystem for analyzing/transforming source code using ASTs for the languages its front ends support.
EDIT 3/10/2010: ... Now handles full C# 4.0
EDIT: 6/27/2014: Has handled C# 5.0 for quite a while.
EDIT: 6/15/2016: Handles C# 6.0. See https://stackoverflow.com/a/37847714/120163 for a sample AST.

Please see the R# project (sorry the docs are in Russian, but there are some code examples). It allows AST manipulations on C# code.
http://www.rsdn.ru/projects/rsharp/article/rsharp_mag.xml
Project's SVN is here: (URL updated, thanks, derigel)
Also please see the Nemerle language. It is a .Net language with strong support for metaprogramming.

It is strange that nobody suggested hacking the existing Mono C# compiler.

Related

.NET TypeScript parser to AST

I'm looking for a TypeScript parser which produces an AST (Abstract Syntax Tree) from TypeScript code, like code created with Visual Studio.
I think Visual Studio must have such parser since it uses it for code intelligence.
I know I can compile TS to JS and then use something like Jint to produce an AST, but that's no good for me. I need a strict relation between AST nodes and the original lines in the TS source.
Is there a way to put my hands on a VS / Windows dll to get AST, or maybe there is a library providing such functionality? I've done some research and all I found was very incomplete and limited.
There is a Microsoft TypeScript compiler written in TypeScript, but how to use it from C#? Would it be fast enough to parse edited code in real-time?
For the sake of clarification: I need the parser written in C# or in C++ with C# bindings. Or... OK, it could be written in any language, but accessible from the level of C# code. I'm afraid I'll have to write my own parser, but I don't want to reinvent the wheel.
The point is I want to visualize the code. I do not want the code to be executed from C#. I only want to see its structure and it has to be accurate, no missing elements.
Most parsers / compilers I've seen had thousands of lines of code written solely for the purpose of executing scripts. They covered a very limited subset of the language syntax. I need just the opposite: no running, but full syntax. I don't need control structures; they are irrelevant to my visualization. All I need from the AST are function declarations and object definition declarations.
I know there is a parser / compiler for almost every imaginable language written in JavaScript, but are there any good ones written in C#?

I'm looking for a TypeScript parser which produces an AST (Abstract Syntax Tree) from TypeScript code, like code created with Visual Studio.
Check out this section: http://basarat.gitbooks.io/typescript/content/docs/compiler/parser.html
Here is a code sample to print out the AST:
import * as ts from "ntypescript";
function printAllChildren(node: ts.Node, depth = 0) {
    console.log(new Array(depth + 1).join('----'), ts.syntaxKindToName(node.kind), node.pos, node.end);
    depth++;
    node.getChildren().forEach(c => printAllChildren(c, depth));
}
var sourceCode = `
var foo = 123;
`.trim();
var sourceFile = ts.createSourceFile('foo.ts', sourceCode, ts.ScriptTarget.ES5, true);
printAllChildren(sourceFile);

C#'s LINQ and .NET Framework, which one depends on the other?

I think of the C# compiler as a self-contained black box capable of understanding text of a certain syntax and producing compiled code. The .NET Framework, on the other hand, is a massive library containing functionality written partly in C# and partly in C++. So the .NET Framework depends on the C# language, not the other way around.
But I cannot fit this into how LINQ works. LINQ queries are text of a particular syntax that C# compiler can understand. But to build my own LINQ provider I need to work with interfaces like IQueryable and IQueryProvider, both of which are defined in the System.Linq namespace of the framework.
Does that mean that functionality the C# language offers depends on a part of the .NET Framework? Does the C# language know about the .NET Framework?

The .NET Framework consists of many pieces. One of the most important is the CLR, the Common Language Runtime. All .NET languages depend on it, C# included, because they produce IL code, which the machine's processor cannot execute directly. Instead, the CLR executes it.
And there is also Base Class Library, BCL, which is available to use for every .NET language: C#, VB.NET, Managed C++, F#, IronRuby, you name it. I doubt it was written in C#. It doesn't depend on any features of those languages, because classes and OOP are built in CLR.
So, yes, the C# language knows about the .NET Framework; it absolutely must. Think about IEnumerable: to compile foreach into GetEnumerator() and MoveNext() calls, the C# compiler has to know that, well, IEnumerable exists. And is somewhat special.
Or think about attributes! The C# compiler has intrinsic knowledge of the members the Attribute class provides.
But CLR itself doesn't know anything about C#. At all.
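To make the foreach point concrete, here is a rough sketch of what the compiler generates for a foreach over an IEnumerable<int>. The exact expansion is defined by the language specification and has more cases than this; the class and method names are made up for illustration.

```csharp
using System.Collections.Generic;

public static class ForeachSketch
{
    // The loop as you would normally write it.
    public static int SumForeach(IEnumerable<int> items)
    {
        int sum = 0;
        foreach (int n in items) sum += n;
        return sum;
    }

    // Roughly what the compiler turns it into: explicit calls to
    // GetEnumerator(), MoveNext(), Current and Dispose().
    public static int SumExpanded(IEnumerable<int> items)
    {
        int sum = 0;
        IEnumerator<int> e = items.GetEnumerator();
        try
        {
            while (e.MoveNext())
            {
                int n = e.Current;
                sum += n;
            }
        }
        finally
        {
            e.Dispose();
        }
        return sum;
    }
}
```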

LINQ queries are text of a particular syntax that C# compiler can understand.
Well, query expressions are - but the compiler doesn't really "understand" them. It just translates them in a pretty mechanical manner. For example, take this query:
var query = from foo in bar
            where foo.X > 10
            select foo.Y;
That is translated into:
var query = bar.Where(foo => foo.X > 10)
               .Select(foo => foo.Y);
The compiler doesn't know anything about what Where and Select mean here. They don't even have to be methods - if you had appropriate fields or properties of delegate types, the compiler would be fine with it. Basically, if the second form will compile, so will the query expression.
Most LINQ providers use extension methods to provide these methods (Where, Select, SelectMany etc). Again, they're just part of the C# language - the compiler doesn't know or care what the extension methods do.
For more details about how query expressions are translated, see part 41 of my Edulinq blog series. You may find the rest of my Edulinq series informative, too - it's basically a series of blog posts in which I reimplement LINQ to Objects, one method at a time. Again, this demonstrates that the C# compiler doesn't rely on the LINQ implementation being in the System.Linq namespace, or anything like that.
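As an illustration of how mechanical the translation is, the sketch below runs a query expression against a type that implements no LINQ interface, in a file that never imports System.Linq. Box<T> and QueryDemo are hypothetical names invented for this example; all the compiler needs is that Where and Select calls with lambda arguments compile.

```csharp
using System;

// Hypothetical container holding zero or one value. It knows nothing about LINQ.
public sealed class Box<T>
{
    public static readonly Box<T> Empty = new Box<T>();

    public T Value { get; }
    public bool HasValue { get; }

    public Box(T value) { Value = value; HasValue = true; }
    private Box() { }

    // Ordinary instance methods that happen to have the names the compiler looks for.
    public Box<T> Where(Func<T, bool> predicate) =>
        HasValue && predicate(Value) ? this : Empty;

    public Box<TResult> Select<TResult>(Func<T, TResult> selector) =>
        HasValue ? new Box<TResult>(selector(Value)) : Box<TResult>.Empty;
}

public static class QueryDemo
{
    // Compiles to: bar.Where(s => s.Length > 3).Select(s => s.Length)
    public static Box<int> Run(Box<string> bar) =>
        from s in bar
        where s.Length > 3
        select s.Length;
}
```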

Parser for the Mathematica syntax?

Is there a ready-made parser that I can use from C# that can parse Mathematica expressions?
I know that I can use the Kernel itself to parse an expression, and use .NET/Link to retrieve the tree structure... But I'm looking for something that doesn't rely on the Kernel.

My matheclipse-parser module implements a parser in Java which can parse a big subset of Mathematica expressions. See the readme.md page for usage. Maybe you can port the parser to C#?

The Mathematica grammar isn't well documented, true. But AFAIK, it is LALR(1) and likely LL(1); the bracketed/tagged syntax form gives the parser complete clues about what to expect next, just like LISP and XML.
The DMS Software Reengineering Toolkit does have a Mathematica grammar that has been used for real tasks. This includes MMa programs as well as pure expression forms. That probably doesn't help you, since you want one in C#.
If you have access to the Kernel, I'd stick to that.

I don't think such a thing exists already (I'd love to know about it). But it may be useful that within Mathematica you can apply the function FullForm to any expression and get something very easy to parse, kind of like an s-expression in Lisp. For example,
FullForm[a+b*c]
yields
Plus[a, Times[b,c]]
That's the underlying representation of all Mathematica expressions and should be straightforward to parse.
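A minimal sketch of such a parser in C#, assuming the input is already in FullForm and contains only symbols, unsigned numbers and bracketed argument lists. Strings, negative numbers and compound heads such as f[x][y] are not handled. Expr and FullFormParser are names invented for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node type: a head (symbol or number) plus zero or more arguments.
public sealed class Expr
{
    public string Head { get; }
    public List<Expr> Args { get; } = new List<Expr>();
    public Expr(string head) { Head = head; }

    public override string ToString() =>
        Args.Count == 0 ? Head : Head + "[" + string.Join(", ", Args) + "]";
}

public static class FullFormParser
{
    public static Expr Parse(string text)
    {
        int pos = 0;
        Expr result = ParseExpr(text, ref pos);
        SkipSpaces(text, ref pos);
        if (pos != text.Length) throw new FormatException("Unexpected input at " + pos);
        return result;
    }

    // expr := atom | atom '[' ']' | atom '[' expr (',' expr)* ']'
    private static Expr ParseExpr(string s, ref int pos)
    {
        SkipSpaces(s, ref pos);
        int start = pos;
        while (pos < s.Length && (char.IsLetterOrDigit(s[pos]) || s[pos] == '$' || s[pos] == '.')) pos++;
        if (pos == start) throw new FormatException("Expected a symbol or number at " + pos);
        var node = new Expr(s.Substring(start, pos - start));

        SkipSpaces(s, ref pos);
        if (pos < s.Length && s[pos] == '[')
        {
            pos++; // consume '['
            SkipSpaces(s, ref pos);
            if (pos < s.Length && s[pos] == ']') { pos++; return node; }
            while (true)
            {
                node.Args.Add(ParseExpr(s, ref pos));
                SkipSpaces(s, ref pos);
                if (pos >= s.Length) throw new FormatException("Missing ']'");
                char c = s[pos++];
                if (c == ']') break;
                if (c != ',') throw new FormatException("Expected ',' or ']' at " + (pos - 1));
            }
        }
        return node;
    }

    private static void SkipSpaces(string s, ref int pos)
    {
        while (pos < s.Length && char.IsWhiteSpace(s[pos])) pos++;
    }
}
```

Because every compound expression announces itself with a head followed by '[', one character of lookahead is enough and no operator precedence table is needed.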

What is System.Linq.Expressions in C# used for?

Is LINQ a new feature in .NET 4.0, unsupported in older versions like .NET 3.5? What is it useful for? It seems to be able to build Expression Trees. What is an Expression Tree, actually? Is LINQ able to extract info like class, method and field from a C# file?
Can someone provide me a working piece of code to demonstrate what LINQ can do?

Linq was added in .Net 3.5 (and added to the c# 3.0 compiler, as well as in slightly limited form to the VB.net compiler, in the same release).
It stands for Language Integrated Query, though it covers many complex additions to both the language and the runtime in order to achieve this, which are useful in and of themselves.
The Expression functionality is, simply put, the ability for a program to inspect, at runtime, the abstract syntax of certain code constructs passed around. These are called lambdas, and they are in essence a way of writing anonymous functions more easily whilst making runtime introspection of their structure easier.
The 'SQL like' functionality Linq is most closely associated with (though by no means the only one) is called Linq to Sql, whereby something like this:
from f in Foo where f.Blah == "wibble" select f.Wobble;
is compiled into a representation of this query, rather than simply code to execute the query. The part that makes it linq to sql is the 'backend' which converts it into sql. For this the expression is translated into sql server statements to execute the query against a linked database, with mapping from rows to .net objects and conversion of the c# logic into equivalent where clauses. You could apply exactly the same code if Foo was a collection of plain .net objects (at which point it is "Linq to objects"); the conversion of the expression would then be to straight .Net code.
The query above, written in the language integrated way, is actually the equivalent of:
Foo.Where(f => f.Blah == "wibble").Select(f => f.Wobble);
where Foo is a typed collection. For databases, classes are synthesized to represent the values in the database, both to allow this to compile and to allow round tripping values from the sql areas to the .net areas and vice versa.
The critical aspect of the Language Integrated part of Linq is that the resulting language constructs are first class parts of the resulting code. Rather than simply resulting in a function, they provide the way the function was constructed (as an expression) so that other aspects of the program can manipulate it.
Consumers of this functionality may simply choose to run it (execute the function which the lambda is compiled to) or ask for the expression which describes it and then do something different with it.
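A small sketch of that choice: the same lambda text can be converted either to a delegate, which can only be invoked, or to an expression tree, which can be inspected and compiled on demand. The class and member names are made up for illustration.

```csharp
using System;
using System.Linq.Expressions;

public static class LambdaFormsSketch
{
    // Converted to a delegate: compiled code that can only be run.
    public static readonly Func<int, bool> AsDelegate = n => n > 10;

    // Converted to an expression tree: data that describes the code.
    public static readonly Expression<Func<int, bool>> AsExpression = n => n > 10;

    // Reads the structure of the lambda body instead of executing it.
    public static string Describe()
    {
        var body = (BinaryExpression)AsExpression.Body;
        return body.NodeType + " with operands " + body.Left + " and " + body.Right;
    }
}
```

A provider such as Linq to Sql asks for the second form so that it can translate the comparison instead of running it.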
Many aspects of what makes this possible are placed under the "Linq" banner despite not really being Linq themselves.
For example, anonymous types are required for easy use of projection (choosing a subset of the possible properties), but anonymous types can be used outside of Linq as well.
Linq, especially via the lambdas (which make writing anonymous delegates very lightweight in terms of syntax), has led to an increase in the functional capabilities of c#. This is reinforced by the extension methods on IEnumerable<T> like Select(), corresponding to map in many functional languages, and Where(), corresponding to filter. Like the anonymous types, this is not in and of itself "Linq", though it is viewed by many as a strongly beneficial effect on c# development (this is not a universal view but is widely held).
For an introduction to Linq from microsoft read this article
For an introduction to how to use Linq-to-Sql in Visual Studio see this series from Scott Guthrie
For a guide to how you can use linq to make plain c# easier when using collections read this article
Expressions are a more advanced topic, and understanding them is entirely unnecessary to use linq, though certain 'tricks' are possible using them.
In general you would care about Expressions only if you were attempting to write linq providers, that is, code which takes an expression rather than just a function and uses it to do something other than what the plain function would do, like talk to an external data source.
Here are some Linq Provider examples
A multi part guide to implementing your own provider
The MSDN documentation for the namespace
Other uses would be when you wish to get some metadata about what the internals of the function are doing, perhaps then compiling the expression (resulting in a delegate which will allow you to execute the expression as a function) and doing something with it, or just looking at the metadata of the objects to write reflective code which is compile time verified, as this answer shows.
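One common form of compile time verified reflection is reading a member's name from a lambda rather than from a string. A minimal sketch, not taken from the linked answer; MemberNameSketch and Product are hypothetical names.

```csharp
using System;
using System.Linq.Expressions;

public static class MemberNameSketch
{
    // Returns the name of the property or field the lambda accesses.
    public static string NameOf<T>(Expression<Func<T, object>> accessor)
    {
        Expression body = accessor.Body;

        // Value-typed members are boxed to object, which appears as a Convert node.
        if (body is UnaryExpression unary && unary.NodeType == ExpressionType.Convert)
            body = unary.Operand;

        if (body is MemberExpression member)
            return member.Member.Name;

        throw new ArgumentException("Expected a simple member access such as x => x.Name");
    }
}

// Hypothetical type used only for the illustration.
public sealed class Product
{
    public string Title { get; set; }
    public int Quantity { get; set; }
}
```

If Quantity is renamed, a call such as MemberNameSketch.NameOf<Product>(p => p.Quantity) stops compiling, whereas a string literal would silently go stale.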

One area of this question that hasn't been covered yet is expression trees. There is a really good article on expression trees (and lambda expressions) available here.
The other important thing to bring up about expression trees is that by building an expression tree to define what you are going to do, you don't have to actually do anything. I am referring to deferred execution.
// this code will only build the expression tree
var itemsInStock = from item in warehouse.Items
                   where item.Quantity > 0
                   select item;
// this code will cause the actual execution
Console.WriteLine("Items in stock: {0}", itemsInStock.Count());
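Deferred execution can be observed directly by counting how often the filter runs. The sketch below uses LINQ to Objects on an in-memory array rather than a database; the names are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredSketch
{
    // Returns the number of predicate calls before and after enumeration,
    // plus the number of matching items.
    public static (int before, int after, int count) Run()
    {
        int calls = 0;
        int[] quantities = { 0, 3, 0, 7 };

        // Defining the query does not run the predicate.
        IEnumerable<int> inStock = quantities.Where(q => { calls++; return q > 0; });
        int before = calls;

        // Count() enumerates the source, so the predicate runs once per element.
        int count = inStock.Count();

        return (before, calls, count);
    }
}
```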

LINQ was introduced with .NET 3.5. This site has a lot of examples.

System.Linq.Expressions is for hand building (or machine generating) expression trees. I have a feeling that, given the complexity of building more complicated functionality, this namespace is underused. However, it is exceedingly powerful. For instance, one of my co-workers recently implemented an expression tree that can auto scale any LINQ to SQL object using a cumulative density function. Every column gets its own tree that gets compiled, so it's fast. I have been building a specialized compiler that uses them extensively to implement basic functionality as well as glue the rest of the generated code together.
Please see this blog post for more information and ideas.

LINQ is a .NET 3.5 feature with built-in language support from C# 3.0 and Visual Basic 2008. There are plenty of examples on MSDN.

New vb.net enhancements?

I've been hearing/reading a lot about the new language enhancements for C# 4. I'm a little curious whether these same enhancements are also going to be applied to VB. Does anyone know where I can get some insight here? With all the new changes happening to C#, it seems like there will be very little reason left to use VB unless you happen to like the syntax. Are there enhancements that MS isn't making to VB this time that are getting included in C#, or vice versa?

I'd actually overlook the dismissal of VB.Net by Lou Franco. Check out Panopticon Central:
http://www.panopticoncentral.net/archive/2008/10/31/24803.aspx
http://www.panopticoncentral.net/archive/2008/10/29/24764.aspx
For example:
Then Lucian did a really wonderful demo of VB 10.0, which is shipping in Visual Studio 2010. He showed (IIRC) the following features that should be familiar to the readers of this blog: array literals, collection initializers, automatic properties, implicit line continuations, statement lambdas, generic variance, and a feature that embeds primary interop assembly types in your assembly so you don’t have to deploy the PIA. I may have missed some, so check out the video when it’s posted!

Some of the changes to C# (e.g. named and optional parameters) were already in VB. The main strength of VB.NET over C# was Office/COM integration, and the new C# is addressing that.
If you need to target an older .NET version, VB.NET will still be the one to use if you need these features.

Something still missing from C# that VB.NET has had for a little while: XML literals. But this isn't exactly new.
