How to generate an ANTLR parser and lexer from a g4 grammar in code?

Is it possible to generate the ANTLR lexer and parser from a given g4 grammar directly in code, using only the ANTLR 4 runtime from Python or C#?
I think that would be much more convenient than calling the external tool every time I need it.
[EDIT]
It looks like I am after something similar to an in-memory ANTLR feature, but for C# or Python:
https://stackoverflow.com/a/38053163/4636721 (How to create AST with ANTLR4?)

The code that parses ANTLR4 grammars, converts them to an ATN and generates the target files is written in Java. This tool code is not translated to the target languages (only the runtime is), so the same job cannot be done in other languages. The inmemantlr project simply calls the Java code from ANTLR4 from its own Java code to do the same thing, just without the need to run the tool as an external jar.
The only way to make your wish possible would be to translate all the tool code also to the target language.
However, depending on your needs there's a way to generate a parser interpreter for your target language. I have done this in my vscode-antlr4 extension, where users can debug their ANTLR4 grammars. For that I added an export feature of the data required for the interpreter to ANTLR4 (it's available there since 4.7.2). This data can then be used to set up the lexer + parser interpreters (which are translated to the target language) to parse a file with that grammar. These interpreters use the same prediction engine as the generated parsers, but do not keep parse contexts, variables etc.
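Since the tool itself exists only in Java, the closest a Python program can get to "generating in code" is to wrap the jar invocation in a helper. The following is a minimal sketch, not an ANTLR API: it assumes a Java runtime on the PATH and a locally downloaded ANTLR complete jar, and the function names, jar path and output directory are hypothetical.

```python
import subprocess
from pathlib import Path


def build_antlr_command(jar_path, grammar_path, out_dir, target="Python3"):
    """Build the command line that runs the ANTLR4 tool (Java) on a grammar."""
    return [
        "java",
        "-jar", str(jar_path),    # the ANTLR complete jar (location is an assumption)
        f"-Dlanguage={target}",   # target language, e.g. Python3 or CSharp
        "-o", str(out_dir),       # directory for the generated lexer/parser files
        str(grammar_path),
    ]


def generate_parser(jar_path, grammar_path, out_dir, target="Python3"):
    """Run the tool; raises CalledProcessError if the tool exits with an error."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_antlr_command(jar_path, grammar_path, out_dir, target), check=True)
```

This still launches the external tool, but it hides the call behind one function, so the generated files can be refreshed from a build script or at program start.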

Related

.NET TypeScript parser to AST

I'm looking for a TypeScript parser which produces an AST (Abstract Syntax Tree) from TypeScript code, like code created with Visual Studio.
I think Visual Studio must have such a parser, since it uses one for code intelligence.
I know I can compile TS to JS and then use something like Jint to produce an AST, but that's no good for me. I need a strict relation between the AST nodes and the original lines in the TS source.
Is there a way to get my hands on a VS / Windows DLL to get the AST, or is there perhaps a library providing such functionality? I've done some research and all I found was very incomplete and limited.
There is a Microsoft TypeScript compiler written in TypeScript, but how do I use it from C#? Would it be fast enough to parse edited code in real time?
For the sake of clarification: I need the parser written in C#, or in C++ with C# bindings. Or... OK, it could be written in any language, as long as it is accessible from C# code. I'm afraid I'll have to write my own parser, but I don't want to reinvent the wheel.
The point is that I want to visualize the code. I do not want the code to be executed from C#. I only want to see its structure, and it has to be accurate, with no missing elements.
Most parsers / compilers I've seen had thousands of LOC written solely for the purpose of executing scripts, and they covered a very limited subset of the language syntax. I need just the opposite: no running, but full syntax. Control structures are irrelevant to my visualization. All I need from the AST are function declarations and object definition declarations.
I know there is a parser / compiler for almost every imaginable language written in JavaScript, but are there any good ones written in C#?

"I'm looking for a TypeScript parser which produces an AST (Abstract Syntax Tree) from TypeScript code, like code created with Visual Studio."
Check out this section: http://basarat.gitbooks.io/typescript/content/docs/compiler/parser.html
Here is a code sample to print out the AST:
import * as ts from "ntypescript";

// Recursively print each node's kind and its start/end offsets in the source.
function printAllChildren(node: ts.Node, depth = 0) {
    console.log(new Array(depth + 1).join('----'), ts.syntaxKindToName(node.kind), node.pos, node.end);
    node.getChildren().forEach(c => printAllChildren(c, depth + 1));
}

const sourceCode = `
var foo = 123;
`.trim();

// The last argument is setParentNodes: true, so each node keeps a link to its parent.
const sourceFile = ts.createSourceFile('foo.ts', sourceCode, ts.ScriptTarget.ES5, true);
printAllChildren(sourceFile);

Lua AST within C#

What is the easiest way to get an abstract-syntax-tree within C# from a Lua script? I'm trying to do a simple static code analysis within C# for a Lua script.
Many existing code analysis tools like LuaInspect are based on MetaLua, but I don't see an easy way to integrate MetaLua within C#. And projects like Lua for Irony seem to be in an alpha stage, or their development stopped years ago.
What would be your suggestion for getting an AST for Lua within C# for static code analysis?

You might want to try MoonSharp: http://www.moonsharp.org/
It uses ANTLR to build a C# AST. You will probably have to do some code spelunking though, since its main objective isn't "creating an AST" but actually "interpreting Lua directly in C#".

If you can run Lua code, nothing should stop you from integrating LuaInspect and MetaLua. The earlier versions of MetaLua required some manual work to avoid compilation steps, but the recent versions (0.7+) don't require much work. The generation of AST is as simple as:
require('metalua.compiler').new():src_to_ast(src, filename)
Note that LuaInspect hasn't been updated to support lineinfo format changes in the recent metalua versions, so if you plan on using it with ML 0.7+, you may review these changes I made to make it work in the Lua IDE I'm working on.

Parsing C Header Files in C#

I'm working with Visual Studio C#, and I need to parse C header files to extract information only about the function declarations contained within. For each function I need the name, return type, and its parameters. If possible, I'd like the parameters in the order in which they appear in the function declaration.
I've seen stuff online about using Visual Studio's tags, or Exuberant Ctags, etc. But from what I gathered, those aren't really options that let me perform the parse from my C# program with C# code (I may be mistaken?). I've also looked through all the other answers to related questions, but they don't really seem to apply to my situation (I may just be dumb).
If I could at least get all the lines of code that represent function declarations I'd have a good start and could hand-parse the rest myself.
Thanks in advance

To "parse" C (header) files in a deep sense and pick up the type information for function declarations, in practice you need:
- a full preprocessor (including the peccadilloes added by the vendor; MS has some pretty odd stuff in their headers),
- a full (syntax) parser/AST builder for the C dialect of interest (there's no such thing as "C"; there is what the vendor offers in this revision of the compiler),
- a full symbol table construction (because typedefs are aliases for the actual types of interest).
Many people will suggest "write your own parser (for C)". Mostly those people haven't done this; it's a lot more work to do this and get it right than they understand. If you don't start with production-level machinery, you won't get through real C header files without fixing it all.
Just parsing plain C is hard; consider the problem of parsing the ambiguous phrase
T*X;
A classic parser cannot parse this without additional hackery.
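To make the ambiguity concrete, here is a small illustrative sketch (the function name and the result strings are hypothetical, and it handles only statements of the exact shape `A*B;`). It shows that the same text is a declaration or an expression depending on whether the left-hand name is a known typedef, which is information a pure grammar does not have:

```python
def classify_statement(statement, typedef_names):
    """Classify a statement of the form 'A*B;' using a set of known typedef names."""
    left, right = (part.strip() for part in statement.rstrip(";").split("*", 1))
    if left in typedef_names:
        # A names a type, so the statement declares B as a pointer to A.
        return f"declaration: {right} is a pointer to {left}"
    # Otherwise A is an ordinary identifier and '*' is multiplication.
    return f"expression: {left} multiplied by {right}"


print(classify_statement("T*X;", {"T"}))   # T was introduced by a typedef
print(classify_statement("T*X;", set()))   # T is a variable
```

This is why a real C front end has to feed symbol table information back into parsing, or defer the decision until after parsing.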
You will also not be able to parse a C header file by itself, in general. You need to have the source code context (often including the compiler command line) in which it is included, or typedefs, preprocessor conditionals and macros in a specific header file will be undefined and therefore unexpandable into the valid C that the compiler normally sees.
You are better off getting pre-existing pre-tested machinery that will do this for you. Clang comes to mind as an option, although I'm not sure it handles the MS header files. GCC is kind of an option, but it really, really wants to be a compiler, not your local friendly C source code analysis tool, and again I'm unsure of its support for MS dialects of C. Our DMS Software Reengineering Toolkit has all of the above for various MS dialects of C.
Having chosen a tool that can actually parse such headers, you'll likely want to do something with the collected header information. You are vague about what you want to accomplish. Having mentioned C# and C in the same breath, there's a hint that you want to call C programs from C# code, and thus need to generate C# equivalent APIs for the C code. For this you will need machinery to manipulate the type information provided, and to build the "text" for the C# declarations. For this, you are likely to find that you need other supporting tooling to do that part, too. Here GCC is a complete non-starter; it will offer you no additional help. Clang and DMS are both designed to be libraries of custom-tool building machinery.
Of course, this may all be moot depending on how much header file text you want to handle; if it is just one header file, doing it manually is probably easiest. You suggest you are willing to do that ("could hand-parse..."). In that case, all you really need to do is run the preprocessor and interpret the output. I believe you can do this with command line switches for GCC and Clang and even the MS compilers; I know DMS can do this. For easily available options here, see How do I see a C/C++ source file after preprocessing in Visual Studio?
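As a rough illustration of the "run the preprocessor and interpret the output" route, the sketch below scans already-preprocessed text (for example the output of `gcc -E` or `clang -E`) for simple prototypes. It is a regex heuristic, not a parser: it assumes the text contains only declarations, with no function pointers, attributes, or nested parentheses, and the function name is hypothetical.

```python
import re

# Matches simple prototypes such as "int add(int a, int b);".
# Assumption: no function pointers, attributes, or nested parentheses.
PROTOTYPE = re.compile(
    r"^\s*(?P<ret>[A-Za-z_][\w\s\*]*?)\s*\b(?P<name>[A-Za-z_]\w*)\s*\((?P<params>[^()]*)\)\s*;",
    re.MULTILINE,
)


def extract_prototypes(preprocessed_text):
    """Return (return_type, name, [parameters]) for each simple prototype found."""
    results = []
    for match in PROTOTYPE.finditer(preprocessed_text):
        # Parameters stay in declaration order.
        params = [p.strip() for p in match.group("params").split(",") if p.strip()]
        if params == ["void"]:
            params = []  # "f(void)" declares no parameters
        return_type = " ".join(match.group("ret").split())
        results.append((return_type, match.group("name"), params))
    return results
```

Anything beyond such simple declarations (typedef'd function pointers, vendor extensions, inline function bodies) will be missed or misread, which is the point made above about needing real parsing machinery.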

Need to construct a XML representation for C# code

I need to convert C# code to an equivalent XML representation.
I plan to convert the C# code (C# 2.0 code snippets, no generics or nullable types) to an AST and then convert the AST to XML.
Looking for a simple lexer/parser for C# which outputs an AST.
Any pointers on converting C# code to an XML representation (which can be converted back to C#) would also be very helpful.
Kind regards,

MinosseCC: a lexer/parser generator for C#
Also SO questions:
Parser-generator that outputs C# given a BNF grammar? which suggests using ANTLR
Translate C# code into AST?
C# String to Expression Tree
Developing a simple parser

As Mitch says, ANTLR can be your solution. You can transform ANTLR's AST output depending on your needs and then serialize it with XStream. That's the approach I'm using in my BS project. If anyone knows a better way, that would be great for me as well.
You can find C# grammar samples such as http://www.antlr.org/grammar/1127720913326/tkCSharp.g or http://www.antlr.org/grammar/1151612545460/CSharpParser.g but you might have to adapt them to ANTLR v3 or to your own needs.

Our DMS Software Reengineering Toolkit is an ecosystem for building code analyzers and transformers. DMS is parameterized by a language definition, and has language definitions for C#, Java, C++, C, PL/SQL, PHP, JavaScript, COBOL and a variety of other languages. When DMS parses according to a language definition, it automatically builds an AST. An AST library provided by DMS can print the tree in Lisp-like parenthesized form, or in XML format.
Rather than convert XML back into source code, DMS can regenerate the source code directly from the AST. DMS also provides source-to-source transformations to allow manipulation of the AST.

What is TinyPG and how does it work?

What is TinyPG and how does it work? I know it's a "compiler-compiler", but how do I get started and create my own compiler in C#?

I've understood approximately how you use it, and here's a brief overview.
TinyPG is a complete compiler-compiler IDE, with a Windows GUI for RegExp, EBNF and C#/VB. The following outlines the procedure for developing your own "compiler" within TinyPG:
1. You define terminals using regular expressions.
    - You write these RegExps within TinyPG; they are what extracts tokens from the input source code.
    - RegExps are natively supported in .NET, which means that even your generated "compiler" code uses .NET's RegExps.
2. You define non-terminals and parser rules in Extended BNF meta-syntax.
    - You write EBNF within TinyPG to describe the language of your choice.
    - There are some free BNF grammars that describe modern programming languages.
3. You define the compiler in managed code.
    - You write C#/VB code within TinyPG to convert the tokens into an output of your choice.
    - Only one C#/VB code block is allowed per BNF grammar rule.
4. TinyPG can compile and run your "tokenizer + parser + compiler" using the command-line compiler.
    - TinyPG generates C# code for your new "compiler".
    - It generates a parse tree from the input source code, using your RegExps along with your EBNF.
    - It translates this parse tree into an output, using your C#/VB code.
5. You develop the front end of your compiler in C# or VB.NET.
    - A basic front end would invoke the generated C# classes with an input file and display the output.
To begin, you can open the "simple expression2.tpg" file within the provided Samples of TinyPG to see a demo of a calculator "compiler".
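The same pipeline (terminals as regular expressions, rules in EBNF, one code block per rule) can be sketched by hand for a calculator. This is illustrative Python, not TinyPG output and not its grammar syntax; the grammar, class and function names are all assumptions made for the example.

```python
import re

# Step 1: terminals defined as regular expressions.
TOKEN = re.compile(r"\s*(?:(?P<NUMBER>\d+)|(?P<OP>[-+*/()]))")


def tokenize(text):
    """Split the input into (kind, value) tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


# Step 2: non-terminals in EBNF.
#   Expr   -> Term   { ("+" | "-") Term }
#   Term   -> Factor { ("*" | "/") Factor }
#   Factor -> NUMBER | "(" Expr ")"
# Step 3: one method per rule computes that rule's value.
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek()[1] in ("+", "-"):
            op = self.take()[1]
            right = self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self):
        value = self.factor()
        while self.peek()[1] in ("*", "/"):
            op = self.take()[1]
            right = self.factor()
            value = value * right if op == "*" else value / right
        return value

    def factor(self):
        kind, text = self.take()
        if kind == "NUMBER":
            return int(text)
        if text == "(":
            value = self.expr()
            if self.take()[1] != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {text!r}")


def evaluate(source):
    """Front end: tokenize, parse, and return the computed value."""
    parser = Parser(tokenize(source))
    value = parser.expr()
    if parser.peek()[0] is not None:
        raise SyntaxError("trailing input")
    return value
```

Calling `evaluate("2 + 3 * (4 - 1)")` plays the role of the basic front end: it hands the input to the tokenizer and parser and returns the result. A generator like TinyPG produces the tokenizer and parser classes from the grammar instead of having you write them by hand.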
