.NET TypeScript parser to AST - c#

I'm looking for a TypeScript parser that produces an AST (Abstract Syntax Tree) from TypeScript code, such as code created with Visual Studio.
I think Visual Studio must have such a parser, since it uses one for code intelligence.
I know I can compile TS to JS and then use something like Jint to produce an AST, but that's no good for me. I need a strict mapping between AST nodes and the original lines in the TS source.
Is there a way to get my hands on a VS / Windows DLL to get the AST, or is there a library providing such functionality? I've done some research, and everything I found was very incomplete and limited.
There is the Microsoft TypeScript compiler written in TypeScript, but how would I use it from C#? Would it be fast enough to parse edited code in real time?
For the sake of clarification: I need a parser written in C#, or in C++ with C# bindings. Or... OK, it could be written in any language, as long as it is accessible from C# code. I'm afraid I'll have to write my own parser, but I don't want to reinvent the wheel.
The point is that I want to visualize the code. I do not want the code to be executed from C#. I only want to see its structure, and it has to be accurate, with no missing elements.
Most parsers / compilers I've seen had thousands of lines of code written solely for the purpose of executing scripts, and they covered a very limited subset of the language syntax. I need just the opposite: no running, but full syntax. Control structures can be left out; they are irrelevant to my visualization. All I need from the AST are function declarations and object definition declarations.
I know there is a parser / compiler for almost every imaginable language written in JavaScript, but are there any good ones written in C#?

I'm looking for a TypeScript parser that produces an AST (Abstract Syntax Tree) from TypeScript code, such as code created with Visual Studio.
Check out this section: http://basarat.gitbooks.io/typescript/content/docs/compiler/parser.html
Here is a code sample to print out the AST:
import * as ts from "ntypescript";
// Recursively print each node's kind name and its start/end offsets in the source text.
function printAllChildren(node: ts.Node, depth = 0) {
    console.log(new Array(depth + 1).join('----'), ts.syntaxKindToName(node.kind), node.pos, node.end);
    depth++;
    node.getChildren().forEach(c => printAllChildren(c, depth));
}
var sourceCode = `
var foo = 123;
`.trim();
// The final argument (setParentNodes = true) sets parent pointers, which getChildren() relies on here.
var sourceFile = ts.createSourceFile('foo.ts', sourceCode, ts.ScriptTarget.ES5, true);
printAllChildren(sourceFile);
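Since the question only needs function and type declarations together with their original source lines, the same parser can be used to collect just those. The following is a minimal sketch, assuming the standard `typescript` npm package rather than `ntypescript`; the names `DeclarationInfo` and `listDeclarations` are made up for this illustration.

```typescript
import * as ts from "typescript";

// One entry per declaration found in the source (hypothetical shape for this sketch).
interface DeclarationInfo {
  kind: string; // e.g. "FunctionDeclaration"
  name: string; // identifier text, or "<anonymous>"
  line: number; // 1-based line in the original .ts source
}

function listDeclarations(fileName: string, sourceCode: string): DeclarationInfo[] {
  // setParentNodes = true so that position helpers work on every node.
  const sourceFile = ts.createSourceFile(fileName, sourceCode, ts.ScriptTarget.Latest, true);
  const found: DeclarationInfo[] = [];

  function visit(node: ts.Node): void {
    if (
      ts.isFunctionDeclaration(node) ||
      ts.isClassDeclaration(node) ||
      ts.isInterfaceDeclaration(node) ||
      ts.isMethodDeclaration(node)
    ) {
      // getStart() skips leading whitespace and comments, so the line is where the declaration begins.
      const start = node.getStart(sourceFile);
      const { line } = sourceFile.getLineAndCharacterOfPosition(start);
      found.push({
        kind: ts.SyntaxKind[node.kind],
        name: node.name ? node.name.getText(sourceFile) : "<anonymous>",
        line: line + 1,
      });
    }
    // Keep descending so that nested declarations (e.g. methods) are found too.
    ts.forEachChild(node, visit);
  }

  visit(sourceFile);
  return found;
}

// Usage: print every declaration with its source line.
const sample = "function add(a: number, b: number) { return a + b; }\n\nclass Point {\n  norm(): number { return 0; }\n}\n";
for (const d of listDeclarations("sample.ts", sample)) {
  console.log(`${d.line}: ${d.kind} ${d.name}`);
}
```

This only covers the TypeScript side; calling it from C# would still need some bridge (for example running it as a separate Node process), which is outside what this sketch shows.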

Related

How to generate an ANTLR g4 parser and lexer in code?

Is it possible to generate the ANTLR lexer and parser (from a given g4 grammar) directly in code, using the ANTLR 4 runtime from Python or C#?
I think it would be much more convenient than calling the external tool every time I need it.
[EDIT]
It looks like I am looking for something similar to an in-memory ANTLR feature for C# or Python:
https://stackoverflow.com/a/38053163/4636721
How to create AST with ANTLR4?
The code that parses ANTLR4 grammars, converts them to an ATN, and generates the target files is written in Java. This tool code is not translated to the target languages (only the runtime is), so it is not possible to do the same job in other languages. The inmemantlr project only uses the Java code from ANTLR4 in its own Java code to do the same thing, just without the need to run the tool as an external jar.
The only way to make your wish possible would be to also translate all of the tool code to the target language.
However, depending on your needs there's a way to generate a parser interpreter for your target language. I have done this in my vscode-antlr4 extension, where users can debug their ANTLR4 grammars. For that I added an export feature of the data required for the interpreter to ANTLR4 (it's available there since 4.7.2). This data can then be used to set up the lexer + parser interpreters (which are translated to the target language) to parse a file with that grammar. These interpreters use the same prediction engine as the generated parsers, but do not keep parse contexts, variables etc.

Generating doxygen comments for swig-generated C# that wraps C++

I have a project written in C++ where I'm using swig to generate some C# wrappers as well. The C++ code uses Doxygen style comments to annotate the classes and functions. Is it possible to get Swig to take those doxygen comments and produce doxygen comments for the C# wrapper classes and functions?
Currently, SWIG does not parse code comments including Doxygen documentation at all.
There is a SWIG branch that has been in development for a couple of years to enable SWIG to deal with Doxygen comments, but even that currently (AFAIK) only maps them to Java and Python documentation.
The best option currently is therefore to extract the Doxygen documentation from the C++ source code and insert it into the SWIG-generated wrapper. To understand how this can be done, here is a brief explanation of what doxy2swig.py does (it is meant for Python docstrings):
1. Let Doxygen extract the documentation into its XML format.
2. Parse the XML and reformat it into suitable Python docstrings.
3. Write %feature("docstring") SWIG directives to tell SWIG to attach the docstrings to the wrapped classes and methods.
Basically, something similar can be done for C# as well. I do not know how to do (2) for C#, i.e., how to translate the Doxygen XML output into suitable C# documentation; you may need to implement this yourself (perhaps by modifying the doxy2swig.py script).
For (3) there is a neat trick that is sort of documented here, noting that the same can also be done for C# using the %csclassmodifiers and %csmethodmodifiers. These SWIG feature directives are AFAIK used to prepend either public or protected to C# methods or classes. But they can be hijacked to prepend the extracted documentation (+ the public keyword, not to forget). So they effectively allow the same functionality as the %feature("docstring") directive for Python.
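To make steps (2) and (3) more concrete for C#, here is a rough sketch of the text generation involved, written in TypeScript purely for illustration. The `MethodDoc` shape and the function names are hypothetical, the Doxygen XML parsing is assumed to have happened already, and the quoting of the directive string is an assumption about how SWIG reads string literals that should be checked against real SWIG output.

```typescript
// Hypothetical shape for documentation already extracted from Doxygen's XML output.
interface MethodDoc {
  qualifiedName: string; // C++ name as SWIG sees it, e.g. "Shape::area"
  brief: string;
  params: { name: string; text: string }[];
  returns?: string;
}

// Escape characters that are special inside C# XML documentation comments.
function escapeXml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Step (2): turn the extracted documentation into C# "///" comment lines.
function toCSharpDocLines(doc: MethodDoc): string[] {
  const lines: string[] = [`/// <summary>${escapeXml(doc.brief)}</summary>`];
  for (const p of doc.params) {
    lines.push(`/// <param name="${p.name}">${escapeXml(p.text)}</param>`);
  }
  if (doc.returns !== undefined) {
    lines.push(`/// <returns>${escapeXml(doc.returns)}</returns>`);
  }
  return lines;
}

// Step (3): put the comment lines, followed by the "public" keyword,
// into a %csmethodmodifiers directive for the wrapped method.
function toSwigDirective(doc: MethodDoc): string {
  const body = toCSharpDocLines(doc).concat(["public"]).join("\n");
  // Assumption: backslashes and double quotes need escaping inside the SWIG string.
  const escaped = body.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
  return `%csmethodmodifiers ${doc.qualifiedName} "\n${escaped}";`;
}

// Usage: emit one directive that could be appended to a generated .i file.
console.log(
  toSwigDirective({
    qualifiedName: "Shape::area",
    brief: "Computes the area.",
    params: [{ name: "scale", text: "Scale factor." }],
    returns: "The scaled area.",
  })
);
```

The same idea would apply to classes with %csclassmodifiers.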
Finally, I don't know C#, but what is the point of having the Doxygen comments included in the C# wrapper? If you only want to use Doxygen to generate documentation, you can do this from the C++ sources directly, so you don't gain anything. In Python, the docstrings can be displayed as help at runtime, and are used by some IDEs. Does C# have this, too?
As of October 2022, the accepted answer from m7thon has become outdated. I have started work in the (public) merge request https://github.com/swig/swig/pull/2421, based on the nice prior work from https://github.com/swig/swig/pull/1695, to add support for doxygen comments for SWIG-generated C#.
The current status in the above MR still has quite significant limitations. Also it has not yet been extensively tested. But it can already achieve basic documentation in C# XML format, and may be a good starting point for people in need of a solution.

Need to construct an XML representation for C# code

I need to convert C# code to an equivalent XML representation.
I plan to convert the C# code (C# 2.0 code snippets, no generics or nullable types) to an AST and then convert the AST to XML.
Looking for a simple lexer/parser for C# which outputs an AST.
Any pointers on converting C# code to an XML representation (which can be converted back to C#) would also be very helpful.
Kind regards,
MinosseCC: a lexer/parser generator for C#
Also SO questions:
Parser-generator that outputs C# given a BNF grammar? which suggests using ANTLR
Translate C# code into AST?
C# String to Expression Tree
Developing a simple parser
As Mitch says, ANTLR can be your solution. You can transform ANTLR's AST output depending on your needs and then serialize it with xstream. That's the approach I'm using in my bs project. If anyone knows a better way, I'd be glad to hear about it as well.
You can find C# grammar samples such as http://www.antlr.org/grammar/1127720913326/tkCSharp.g or http://www.antlr.org/grammar/1151612545460/CSharpParser.g, but you might have to adapt them to ANTLR v3 or to your own needs.
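Whichever parser produces the tree, the AST-to-XML step itself is a small recursive walk. Below is a minimal sketch in TypeScript; the `AstNode` shape is hypothetical (a real ANTLR tree exposes different members), and it assumes node type names are valid XML element names.

```typescript
// Hypothetical, minimal AST node shape; a real parser's nodes carry more detail.
interface AstNode {
  type: string; // e.g. "MethodDeclaration"; assumed to be a valid XML element name
  attributes?: { [key: string]: string }; // e.g. { name: "Main" }
  text?: string; // source text, only used for leaf nodes
  children?: AstNode[];
}

// Escape characters that are special in XML text and attribute values.
function escapeXmlValue(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Serialize a node and its subtree as indented XML.
function astToXml(node: AstNode, indent: string = ""): string {
  const attributes = node.attributes ?? {};
  const attrText = Object.keys(attributes)
    .map(key => ` ${key}="${escapeXmlValue(attributes[key] ?? "")}"`)
    .join("");
  const children = node.children ?? [];

  if (children.length === 0) {
    // Leaf: either an empty element or an element wrapping its source text.
    return node.text === undefined
      ? `${indent}<${node.type}${attrText}/>`
      : `${indent}<${node.type}${attrText}>${escapeXmlValue(node.text)}</${node.type}>`;
  }
  const inner = children.map(child => astToXml(child, indent + "  ")).join("\n");
  return `${indent}<${node.type}${attrText}>\n${inner}\n${indent}</${node.type}>`;
}

// Usage with a tiny hand-built tree.
console.log(
  astToXml({
    type: "Class",
    attributes: { name: "Foo" },
    children: [{ type: "Method", attributes: { name: "Bar" } }],
  })
);
```

Converting the XML back to C# requires that every token needed to regenerate the source is kept somewhere in the tree, for example as leaf text.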
Our DMS Software Reengineering Toolkit is an ecosystem for building code analyzers and transformers. DMS is parameterized by a language definition, and has language definitions for C#, Java, C++, C, PL/SQL, PHP, JavaScript, COBOL and a variety of other languages. When DMS parses according to a language definition, it automatically builds an AST. An AST library provided by DMS can print the tree in Lisp-like parenthesized form, or in XML format.
Rather than convert XML back into source code, DMS can regenerate the source code directly from the AST. DMS also provides source-to-source transformations to allow manipulation of the AST.

What is TinyPG and how does it work?

What is TinyPG and how does it work? I know it's a "compiler-compiler", but how do I get started and create my own compiler in C#?
I've understood approximately how you use it, and here's a brief summary.
TinyPG is a complete compiler-compiler IDE, with a Windows GUI for RegExp, EBNF and C#/VB. The following outlines the procedure of developing your own "compiler" within TinyPG:
1. You define Terminals using Regular Expressions.
   - You write these RegExps within TinyPG, which basically extracts tokens from the input source code.
   - RegExps are natively supported in .NET, which means that even your generated "compiler" code uses .NET's RegExps.
2. You define Non-terminals and parser rules in Extended BNF meta-syntax.
   - You write EBNF within TinyPG to describe the language of your choice.
   - Some free BNF grammars that describe modern programming languages.
3. You define the compiler in managed code.
   - You write C#/VB code within TinyPG to convert the tokens into an output of your choice.
   - Only one C#/VB code block per BNF grammar rule.
4. TinyPG can compile and run your "tokenizer + parser + compiler" using the command-line compiler.
   - TinyPG generates C# code for your new "compiler".
   - It generates a parse tree from the input source code, using your RegExps along with your EBNF.
   - It translates this parse tree into an output, using your C#/VB code.
5. You develop the front end of your compiler in C# or VB.NET.
   - A basic front end would invoke the generated C# classes with an input file and display the output.
To begin, you can open the "simple expression2.tpg" file within the provided Samples of TinyPG to see a demo of a calculator "compiler".
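To show how the three layers described above (regex terminals, EBNF rules, one code block per rule) fit together, here is a small hand-written calculator in TypeScript. It is not TinyPG output and not taken from its sample file; the terminal names and the grammar are assumptions chosen for illustration, and TinyPG itself would generate C# rather than TypeScript.

```typescript
// 1. Terminals: each token type is defined by a regular expression anchored at the current position.
const terminals: { name: string; pattern: RegExp }[] = [
  { name: "WHITESPACE", pattern: /^\s+/ },
  { name: "NUMBER", pattern: /^[0-9]+(\.[0-9]+)?/ },
  { name: "PLUSMINUS", pattern: /^[+-]/ },
  { name: "MULTDIV", pattern: /^[*\/]/ },
  { name: "LPAREN", pattern: /^\(/ },
  { name: "RPAREN", pattern: /^\)/ },
];

interface Token { name: string; text: string; }

function tokenize(input: string): Token[] {
  const tokens: Token[] = [];
  let rest = input;
  while (rest.length > 0) {
    let matched = false;
    for (const t of terminals) {
      const m = t.pattern.exec(rest);
      if (m !== null) {
        const text = m[0] ?? "";
        if (t.name !== "WHITESPACE") tokens.push({ name: t.name, text }); // whitespace is skipped
        rest = rest.slice(text.length);
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error(`Unexpected input at: ${rest}`);
  }
  return tokens;
}

// 2. Non-terminals in EBNF, and 3. one code block per rule (here each rule computes a number):
//    Expr   -> Term   (PLUSMINUS Term)* ;
//    Term   -> Factor (MULTDIV Factor)* ;
//    Factor -> NUMBER | LPAREN Expr RPAREN ;
function evaluate(input: string): number {
  const tokens = tokenize(input);
  let pos = 0;

  function peek(): Token | undefined { return tokens[pos]; }
  function expect(name: string): Token {
    const token = tokens[pos];
    if (token === undefined || token.name !== name) throw new Error(`Expected ${name} at token ${pos}`);
    pos++;
    return token;
  }
  function expr(): number {
    let value = term();
    for (let t = peek(); t !== undefined && t.name === "PLUSMINUS"; t = peek()) {
      pos++;
      const right = term();
      value = t.text === "+" ? value + right : value - right;
    }
    return value;
  }
  function term(): number {
    let value = factor();
    for (let t = peek(); t !== undefined && t.name === "MULTDIV"; t = peek()) {
      pos++;
      const right = factor();
      value = t.text === "*" ? value * right : value / right;
    }
    return value;
  }
  function factor(): number {
    const t = peek();
    if (t !== undefined && t.name === "LPAREN") {
      pos++;
      const value = expr();
      expect("RPAREN");
      return value;
    }
    return parseFloat(expect("NUMBER").text);
  }

  const result = expr();
  if (pos !== tokens.length) throw new Error("Unexpected trailing input");
  return result;
}

// 4./5. A minimal "front end": feed input to the parser and display the output.
console.log(evaluate("(1 + 2) * 3"));
```

A parser generator automates the hand-written `expr`/`term`/`factor` functions: they follow mechanically from the EBNF rules, which is why only the regular expressions, the rules, and the per-rule code need to be supplied.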

Translate C# code into AST?

Is it currently possible to translate C# code into an Abstract Syntax Tree?
Edit: some clarification; I don't necessarily expect the compiler to generate the AST for me - a parser would be fine, although I'd like to use something "official." Lambda expressions are unfortunately not going to be sufficient given they don't allow me to use statement bodies, which is what I'm looking for.
The Roslyn project is in Visual Studio 2010 and gives you programmatic access to the Syntax Tree, among other things.
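// API from the early Roslyn preview; released versions use
// CSharpSyntaxTree.ParseText(...) and tree.GetRoot() instead.
SyntaxTree tree = SyntaxTree.ParseCompilationUnit(
    @" C# code here ");
var root = (CompilationUnitSyntax)tree.Root;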
Is it currently possible to translate C# code into an Abstract Syntax Tree?
Yes, trivially in special circumstances (= using the new Expressions framework):
// Requires 'using System.Linq.Expressions;'
Expression<Func<int, int>> f = x => x * 2;
This creates an expression tree for the lambda, i.e. a function taking an int and returning its double. You can modify the expression tree by using the Expressions framework (= the classes in that namespace) and then compile it at run-time:
var newBody = Expression.Add(f.Body, Expression.Constant(1));
f = Expression.Lambda<Func<int, int>>(newBody, f.Parameters);
var compiled = f.Compile();
Console.WriteLine(compiled(5)); // Result: 11
Notice that all expressions are immutable, so they have to be built anew by composition. In this case, I've wrapped the original body in an addition of 1.
Notice that these expression trees only work on real expressions i.e. content found in a C# function. You can't get syntax trees for higher constructs such as classes this way. Use the CodeDom framework for these.
Check out the .NET CodeDom support. There is an old article on CodeProject for a C# CodeDOM parser, but it won't support the new language features.
There is also supposed to be support in #develop for generating a CodeDom tree from C# source code according to this posting.
There is something much more powerful than the R# project:
Nemerle.Peg:
https://code.google.com/p/nemerle/source/browse/nemerle/trunk/snippets/peg-parser/
And it has a C# parser which parses all C# code and translates it to an AST!
https://code.google.com/p/nemerle/source/browse/nemerle/trunk/snippets/csharp-parser/
You can download installer here: https://code.google.com/p/nemerle/
Personally, I would use NRefactory, which is free, open source, and gaining popularity.
It looks like this sort of functionality will be included with whatever comes after C# 4, according to Anders Hejlsberg's 'Future of C#' PDC video.
The ANTLR Parser Generator has a grammar for C# 3.0 which covers everything except for LINQ syntax.
ANTLR is not very useful. LINQ is not what you want.
Try Mono.Cecil! http://www.mono-project.com/Cecil
It is used in many projects, including NDepend! http://www.ndepend.com/
I've just posted an answer in another thread here on Stack Overflow describing a solution in which I implemented an API to create and manipulate ASTs from C# source code.
Our C# front end for DMS parses full C# 3.0 including LINQ and produces ASTs. DMS in fact is an ecosystem for analyzing/transforming source code using ASTs for front-end provided languages.
EDIT 3/10/2010: ... Now handles full C# 4.0
EDIT: 6/27/2014: Has handled C# 5.0 for quite a while.
EDIT: 6/15/2016: Handles C# 6.0. See https://stackoverflow.com/a/37847714/120163 for a sample AST.
Please see the R# project (sorry the docs are in Russian, but there are some code examples). It allows AST manipulations on C# code.
http://www.rsdn.ru/projects/rsharp/article/rsharp_mag.xml
Project's SVN is here: (URL updated, thanks, derigel)
Also please see the Nemerle language. It is a .Net language with strong support for metaprogramming.
It is strange that nobody suggested hacking the existing Mono C# compiler.
