Need to construct an XML representation for C# code - c#

I need to convert C# code to an equivalent XML representation.
I plan to convert the C# code (C# 2.0 code snippets, no generics or nullable types) to an AST and then convert the AST to XML.
I'm looking for a simple lexer/parser for C# that outputs an AST.
Any pointers on converting C# code to an XML representation (which can be converted back to C#) would also be very helpful.
Kind regards,

MinosseCC: a lexer/parser generator for C#
Also SO questions:
Parser-generator that outputs C# given a BNF grammar? which suggests using ANTLR
Translate C# code into AST?
C# String to Expression Tree
Developing a simple parser

As Mitch says, ANTLR can be your solution. You can transform ANTLR's AST output depending on your needs and then serialize it with XStream. That's the approach I'm using in my bs project. If anyone knows a better way, that would be great for me as well.
You can find sample C# grammars, for example http://www.antlr.org/grammar/1127720913326/tkCSharp.g or http://www.antlr.org/grammar/1151612545460/CSharpParser.g, but you might have to adapt them to ANTLR v3 or to your own needs.
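The "AST to XML and back" idea can be sketched independently of any particular parser. The following is a minimal Python illustration, not the ANTLR or XStream API: the `Node` class, the node kinds and the hand-built tree for `int x = 1 ;` are all hypothetical stand-ins for whatever a real C# parser would produce. It only shows that a tree which keeps every token can be written to XML, read back, and unparsed.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical generic AST node: a kind, optional token text, children."""
    kind: str
    text: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def to_xml(node: Node) -> ET.Element:
    """Map each AST node to one XML element; token text becomes an attribute."""
    elem = ET.Element(node.kind)
    if node.text is not None:
        elem.set("text", node.text)
    for child in node.children:
        elem.append(to_xml(child))
    return elem


def from_xml(elem: ET.Element) -> Node:
    """Inverse of to_xml, so the XML form loses no information."""
    return Node(elem.tag, elem.get("text"), [from_xml(c) for c in elem])


def unparse(node: Node) -> str:
    """Naive source regeneration: join the leaf tokens in document order."""
    if not node.children:
        return node.text or ""
    parts = (unparse(child) for child in node.children)
    return " ".join(p for p in parts if p)


# Hand-built tree for the C# statement `int x = 1 ;` (assumed parser output).
tree = Node("LocalDeclaration", children=[
    Node("Type", "int"),
    Node("Identifier", "x"),
    Node("Equals", "="),
    Node("Literal", "1"),
    Node("Semicolon", ";"),
])

xml_text = ET.tostring(to_xml(tree), encoding="unicode")
restored = from_xml(ET.fromstring(xml_text))
print(xml_text)
print(unparse(restored))
```

Keeping punctuation tokens as leaves is what makes the round trip trivial here; a tree that drops them would need a real pretty-printer to regenerate source.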

Our DMS Software Reengineering Toolkit is an ecosystem for building code analyzers and transformers. DMS is parameterized by a language definition, and has language definitions for C#, Java, C++, C, PL/SQL, PHP, JavaScript, COBOL and a variety of other languages. When DMS parses according to a language definition, it automatically builds an AST. An AST library provided by DMS can print the tree in Lisp-like parenthesized form, or in XML format.
Rather than convert XML back into source code, DMS can regenerate the source code directly from the AST. DMS also provides source-to-source transformations to allow manipulation of the AST.

Related

How to generate an ANTLR g4 parser and lexer in code?

Is it possible to generate the ANTLR lexer and parser (from a given g4 grammar) directly within code, using the ANTLR 4 runtime from Python or C#?
I think it would be much more convenient than calling the external tool every time I need it.
[EDIT]
It looks like I am looking for something similar to an in-memory ANTLR feature, but for C# or Python:
https://stackoverflow.com/a/38053163/4636721
How to create AST with ANTLR4?
The code that parses ANTLR4 grammars, converts them to an ATN and generates the target files is written in Java. This tool code is not translated to the target languages (only the runtime is), so it is not possible to do the same job in other languages. The inmemantlr project simply uses the Java code from ANTLR4 in its own Java code to do the same thing, just without the need to run it as an external jar.
The only way to make your wish possible would be to translate all the tool code also to the target language.
However, depending on your needs there's a way to generate a parser interpreter for your target language. I have done this in my vscode-antlr4 extension, where users can debug their ANTLR4 grammars. For that I added an export feature of the data required for the interpreter to ANTLR4 (it's available there since 4.7.2). This data can then be used to set up the lexer + parser interpreters (which are translated to the target language) to parse a file with that grammar. These interpreters use the same prediction engine as the generated parsers, but do not keep parse contexts, variables etc.
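Since the generator itself exists only in Java, the usual fallback from Python is to wrap the external tool call in a helper so it is at least a single function call. This is a rough sketch under assumptions: the jar and grammar file names are hypothetical, and a Java runtime must be installed for `generate_parser` to work. `-Dlanguage=` and `-o` are options of the ANTLR 4 tool.

```python
import subprocess
from pathlib import Path
from typing import List


def build_antlr_command(grammar: str, jar: str, language: str = "Python3",
                        out_dir: str = "generated") -> List[str]:
    """Build the command line that runs the Java-based ANTLR 4 tool."""
    return [
        "java", "-jar", jar,
        f"-Dlanguage={language}",  # target language of the generated code
        "-o", out_dir,             # directory for the generated lexer/parser
        grammar,
    ]


def generate_parser(grammar: str, jar: str, **kwargs) -> Path:
    """Run the tool; needs a Java runtime and the ANTLR jar on disk."""
    cmd = build_antlr_command(grammar, jar, **kwargs)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(cmd[cmd.index("-o") + 1])


# Hypothetical file names; only the command is built here, nothing is executed.
cmd = build_antlr_command("Expr.g4", "antlr-4-complete.jar", language="CSharp")
print(" ".join(cmd))
```

This does not remove the dependency on the external jar; it only hides it behind one call that a build script or test setup can invoke when the grammar changes.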

Generating doxygen comments for swig-generated C# that wraps C++

I have a project written in C++ where I'm using SWIG to generate some C# wrappers as well. The C++ code uses Doxygen-style comments to annotate the classes and functions. Is it possible to get SWIG to take those Doxygen comments and produce Doxygen comments for the C# wrapper classes and functions?
Currently, SWIG does not parse code comments including Doxygen documentation at all.
There is a SWIG branch that has been in development for a couple of years to enable SWIG to deal with Doxygen comments, but even that currently (AFAIK) only maps them to Java and Python documentation.
The best option currently is therefore to extract the Doxygen documentation from the C++ source code and insert it into the SWIG-generated wrapper. To understand how this can be done, here is a brief explanation of what doxy2swig.py does (and this is indeed meant for Python docstrings):
1. Let Doxygen extract the documentation into its XML format
2. Parse the XML, and reformat into suitable Python docstrings
3. Write %feature("docstring") SWIG directives to tell SWIG to attach the docstrings to the wrapped classes and methods.
Basically, something similar can be done for C# as well. I do not know how to do (2) for C#, i.e., how to translate the Doxygen XML output into suitable C# documentation. You may need to implement this yourself (perhaps by modifying the doxy2swig.py script).
For (3) there is a neat trick that is sort of documented here, noting that the same can also be done for C# using the %csclassmodifiers and %csmethodmodifiers. These SWIG feature directives are AFAIK used to prepend either public or protected to C# methods or classes. But they can be hijacked to prepend the extracted documentation (+ the public keyword, not to forget). So they effectively allow the same functionality as the %feature("docstring") directive for Python.
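Steps (2) and (3) for C# might look roughly like the sketch below. It is not doxy2swig.py and not a tested SWIG workflow. The XML sample is a hand-trimmed, hypothetical fragment in the shape of Doxygen's XML output, only the brief description is translated, and the multi-line string passed to %csmethodmodifiers follows the trick described above, so treat the exact quoting as an assumption to verify against your SWIG version.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical, heavily trimmed sample of Doxygen XML output for one class.
DOXYGEN_XML = """
<doxygen>
  <compounddef kind="class">
    <compoundname>Shape</compoundname>
    <sectiondef kind="public-func">
      <memberdef kind="function">
        <name>area</name>
        <briefdescription><para>Returns the area of the shape.</para></briefdescription>
      </memberdef>
    </sectiondef>
  </compounddef>
</doxygen>
"""


def csharp_doc(brief: str) -> str:
    """Format a brief description as a C# XML documentation comment."""
    return f"/// <summary>{escape(brief)}</summary>"


def swig_directives(doxygen_xml: str) -> list:
    """Emit one %csmethodmodifiers directive per documented member function."""
    root = ET.fromstring(doxygen_xml)
    directives = []
    for compound in root.iter("compounddef"):
        class_name = compound.findtext("compoundname")
        for member in compound.iter("memberdef"):
            if member.get("kind") != "function":
                continue
            brief_elem = member.find("briefdescription")
            if brief_elem is None:
                continue
            # Collapse all nested text and whitespace into a single line.
            brief = " ".join("".join(brief_elem.itertext()).split())
            if not brief:
                continue
            # Escape characters that would end the SWIG string literal early.
            doc = csharp_doc(brief).replace("\\", "\\\\").replace('"', '\\"')
            name = member.findtext("name")
            directives.append(
                f'%csmethodmodifiers {class_name}::{name} "\n{doc}\npublic";'
            )
    return directives


for directive in swig_directives(DOXYGEN_XML):
    print(directive)
```

The generated lines would go into the SWIG interface file before the class is wrapped. Parameter and return documentation would need additional mapping to `<param>` and `<returns>` elements.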
Finally, I don't know C#, but what is the point of having the Doxygen comments included in the C# wrapper? If you only want to use Doxygen to generate documentation, you can do this from the C++ sources directly, so you don't gain anything. In Python, the docstrings can be displayed as help at runtime, and are used by some IDEs. Does C# have this, too?
As of October 2022, the accepted answer from m7thon has become outdated. I have started work in the (public) merge request https://github.com/swig/swig/pull/2421, based on the nice prior work from https://github.com/swig/swig/pull/1695, to add support for doxygen comments for SWIG-generated C#.
The current status in the above MR still has quite significant limitations. Also it has not yet been extensively tested. But it can already achieve basic documentation in C# XML format, and may be a good starting point for people in need of a solution.

.NET TypeScript parser to AST

I'm looking for a TypeScript parser which produces an AST (Abstract Syntax Tree) from TypeScript code, like code created with Visual Studio.
I think Visual Studio must have such a parser, since it uses it for code intelligence.
I know I can compile TS to JS and then use something like Jint to produce an AST, but that's no good for me. I need a strict relation between AST nodes and the original lines in the TS source.
Is there a way to get my hands on a VS / Windows DLL to get the AST, or is there a library providing such functionality? I've done some research and all I found was very incomplete and limited.
There is a Microsoft TypeScript compiler written in TypeScript, but how can I use it from C#? Would it be fast enough to parse edited code in real time?
For the sake of clarification: I need the parser written in C# or in C++ with C# bindings. Or... OK, it could be written in any language, as long as it is accessible from C# code. I'm afraid I'll have to write my own parser, but I don't want to reinvent the wheel.
The point is that I want to visualize the code. I do not want the code to be executed from C#. I only want to see its structure, and it has to be accurate, with no missing elements.
Most parsers / compilers I've seen had thousands of LOC written solely for the purpose of executing scripts. They covered a very limited subset of the language syntax. I need just the opposite: no running, but full syntax. Control structures can be left out, as they are irrelevant to my visualization. All I need from the AST are function declarations and object definition declarations.
I know there is a parser / compiler for almost every imaginable language written in JavaScript, but are there any good ones written in C#?
I'm looking for a TypeScript parser which produces an AST (Abstract Syntax Tree) from TypeScript code, like code created with Visual Studio.
Check out this section: http://basarat.gitbooks.io/typescript/content/docs/compiler/parser.html
Here is a code sample to print out the AST:
import * as ts from "ntypescript";

function printAllChildren(node: ts.Node, depth = 0) {
    console.log(new Array(depth + 1).join('----'), ts.syntaxKindToName(node.kind), node.pos, node.end);
    depth++;
    node.getChildren().forEach(c => printAllChildren(c, depth));
}

var sourceCode = `
var foo = 123;
`.trim();

var sourceFile = ts.createSourceFile('foo.ts', sourceCode, ts.ScriptTarget.ES5, true);
printAllChildren(sourceFile);

Parser for the Mathematica syntax?

Is there a ready-made parser that I can use from C# to parse Mathematica expressions?
I know that I can use the Kernel itself to parse an expression, and use .NET/Link to retrieve the tree structure... But I'm looking for something that doesn't rely on the Kernel.
My matheclipse-parser module implements a parser in Java which can parse a big subset of Mathematica expressions. See the readme.md page for usage. Maybe you can port the parser to C#?
The Mathematica grammar isn't well documented, true. But AFAIK, it is LALR(1) and likely LL(1); the bracketed/tagged syntax gives the parser complete clues about what to expect next, just like Lisp and XML.
The DMS Software Reengineering Toolkit does have a Mathematica grammar that has been used for real tasks. This includes MMa programs as well as pure expression forms.
That probably doesn't help you, since you want one in C#.
If you have access to the Kernel, I'd stick to that.
I don't think such a thing exists already (I'd love to know about it). But it may be useful that within Mathematica you can apply the function FullForm to any expression and get something very easy to parse, kind of like an s-expression in Lisp. For example,
FullForm[a+b*c]
yields
Plus[a, Times[b,c]]
That's the underlying representation of all Mathematica expressions and should be straightforward to parse.
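To give a feel for how little machinery FullForm needs, here is a small recursive-descent sketch in Python (the same structure ports directly to C#). It is an illustration under assumptions, not a full Mathematica parser: it only handles symbols, unsigned numbers and `head[arg, ...]` with a symbol as head, and the function names are made up for this example.

```python
import re
from typing import List

# A token is a symbol/number, or one of the punctuation marks [ ] ,
TOKEN = re.compile(r"\s*([A-Za-z0-9_.$]+|[\[\],])")


def tokenize(text: str) -> List[str]:
    """Split FullForm text into symbols, numbers and punctuation."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens: List[str], pos: int = 0):
    """expr := atom | atom '[' [ expr (',' expr)* ] ']'  -> (tree, next_pos)"""
    if pos >= len(tokens) or tokens[pos] in "[],":
        raise SyntaxError("expected a symbol or number")
    head = tokens[pos]
    pos += 1
    if pos >= len(tokens) or tokens[pos] != "[":
        return head, pos                      # plain atom
    pos += 1                                  # consume '['
    args = []
    if pos < len(tokens) and tokens[pos] == "]":
        return (head, args), pos + 1          # zero-argument call such as f[]
    while True:
        arg, pos = parse(tokens, pos)
        args.append(arg)
        if pos >= len(tokens):
            raise SyntaxError("missing ']'")
        if tokens[pos] == "]":
            return (head, args), pos + 1
        if tokens[pos] != ",":
            raise SyntaxError(f"expected ',' or ']' but found {tokens[pos]!r}")
        pos += 1                              # consume ','


def parse_fullform(text: str):
    """Parse a complete FullForm expression into nested (head, args) tuples."""
    tokens = tokenize(text)
    tree, pos = parse(tokens)
    if pos != len(tokens):
        raise SyntaxError("trailing input after expression")
    return tree


print(parse_fullform("Plus[a, Times[b,c]]"))
```

Each compound expression becomes a `(head, [arguments])` tuple and each atom stays a string. Strings, negative numbers and compound heads such as `f[x][y]` would need extra token and grammar cases.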

What is TinyPG and how does it work?

What is TinyPG and how does it work? I know it's a "compiler-compiler", but how do I get started and create my own compiler in C#?
I've understood approximately how you use it, and here's a brief summary.
TinyPG is a complete compiler-compiler IDE, with a Windows GUI for RegExp, EBNF and C#/VB. The following outlines the procedure for developing your own "compiler" within TinyPG:
1. You define terminals using regular expressions.
   - You write these RegExps within TinyPG; they extract the tokens from the input source code.
   - RegExps are natively supported in .NET, which means that even your generated "compiler" code uses .NET's RegExps.
2. You define non-terminals and parser rules in Extended BNF meta-syntax.
   - You write EBNF within TinyPG to describe the language of your choice.
   - Some free BNF grammars that describe modern programming languages are available.
3. You define the compiler in managed code.
   - You write C#/VB code within TinyPG to convert the tokens into an output of your choice.
   - Only one C#/VB code block is allowed per BNF grammar rule.
   - TinyPG can compile and run your "tokenizer + parser + compiler" using the command-line compiler.
4. TinyPG generates C# code for your new "compiler".
   - It generates a parse tree from the input source code, using your RegExps along with your EBNF.
   - It translates this parse tree into an output, using your C#/VB code.
5. You develop the front end of your compiler in C# or VB.NET.
   - A basic front end would invoke the generated C# classes with an input file and display the output.
To begin, you can open the "simple expression2.tpg" file within the provided Samples of TinyPG to see a demo of a calculator "compiler".
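The same three ingredients (regex terminals, EBNF rules, one code block per rule) can be illustrated by hand in a few lines of Python. This is not TinyPG's generated code or its grammar file format. The terminal names, the grammar and the `Calculator` class are assumptions made up for the sketch, which evaluates arithmetic expressions in the spirit of the calculator sample mentioned above.

```python
import re

# Step 1: terminals, each defined by a regular expression (names are illustrative).
TERMINALS = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("PLUSMINUS", r"[+-]"),
    ("MULTDIV", r"[*/]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("WHITESPACE", r"\s+"),
]
SCANNER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TERMINALS))


def scan(source):
    """Turn the input text into (terminal, text) pairs, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(source):
        match = SCANNER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        if match.lastgroup != "WHITESPACE":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    tokens.append(("EOF", ""))
    return tokens


class Calculator:
    # Step 2: the grammar in EBNF.
    #   Start   -> AddExpr EOF
    #   AddExpr -> MulExpr (PLUSMINUS MulExpr)*
    #   MulExpr -> Atom (MULTDIV Atom)*
    #   Atom    -> NUMBER | LPAREN AddExpr RPAREN
    # Step 3: one method per rule computes that rule's value.
    def __init__(self, source):
        self.tokens = scan(source)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0]

    def take(self, kind):
        found, text = self.tokens[self.pos]
        if found != kind:
            raise SyntaxError(f"expected {kind} but found {found}")
        self.pos += 1
        return text

    def start(self):
        value = self.add_expr()
        self.take("EOF")
        return value

    def add_expr(self):
        value = self.mul_expr()
        while self.peek() == "PLUSMINUS":
            op, rhs = self.take("PLUSMINUS"), self.mul_expr()
            value = value + rhs if op == "+" else value - rhs
        return value

    def mul_expr(self):
        value = self.atom()
        while self.peek() == "MULTDIV":
            op, rhs = self.take("MULTDIV"), self.atom()
            value = value * rhs if op == "*" else value / rhs
        return value

    def atom(self):
        if self.peek() == "LPAREN":
            self.take("LPAREN")
            value = self.add_expr()
            self.take("RPAREN")
            return value
        return float(self.take("NUMBER"))


# Steps 4 and 5: a minimal "front end" that feeds input to the parser.
print(Calculator("2 + 3 * (4 - 1)").start())
```

A generator such as TinyPG produces the scanner and the rule methods from the grammar file, so only the terminals, the rules and the per-rule code blocks are written by hand.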
