Tokenizer for C#? - c#

Is there any functionality built into the .NET framework somewhere to tokenize C# code? I'm not looking to build a tokenizer in C#, I'm looking for something that can tokenize C# source code.

The only thing that comes to mind is a parser generator like ANTLR, which has a sample C# grammar available. Bison/Flex also appears to have a fairly decent C# grammar. Parsing any language and then actually making sense of it is fairly difficult, so I wish you the best of luck.

No, not built into the framework.
However, you may want to look at Irony and C# Parser on CodePlex, as they both provide a parser/lexer for at least simple C#.

The GOLD Parser too has a C# grammar (to parse C#), and run-time engines written in C# (so that you can execute that grammar using C# code).
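To make concrete what any of these lexers would hand back, here is a minimal sketch of a regex-based tokenizer for a small subset of C#. It is written in Python purely for illustration and is not taken from ANTLR, Irony, or GOLD. The names `Token`, `TOKEN_SPEC`, `KEYWORDS`, and `tokenize` are assumptions made up for this example, and the token table is deliberately incomplete (no verbatim strings, character literals, or preprocessor directives).

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical, deliberately incomplete keyword list for the sketch.
KEYWORDS = {"class", "int", "public", "return", "static", "string", "void"}

# Order matters: comments must be tried before the "/" punctuator.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*|/\*.*?\*/"),
    ("STRING", r'"(?:\\.|[^"\\\n])*"'),
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("PUNCT", r"==|!=|<=|>=|&&|\|\||[{}()\[\];,.=+\-*/<>!]"),
    ("SPACE", r"\s+"),
    ("ERROR", r"."),  # any character the table above does not cover
]
MASTER = re.compile(
    "|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def tokenize(source: str) -> Iterator[Token]:
    """Yield tokens for a small C# subset, skipping whitespace and comments."""
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind in ("SPACE", "COMMENT"):
            continue
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"
        yield Token(kind, text)


if __name__ == "__main__":
    for token in tokenize("public static int Add(int a, int b) { return a + b; }"):
        print(token.kind, token.text)
```

A real C# lexer has to handle far more than this, which is why the grammar-based tools mentioned above are the practical route.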

Related

c# parser for ActiproSyntaxEditor

Is there any freeware C# parser that can be used as the parser in the ActiproSyntaxEditor control?
Your first choice should be Mono if you are looking for a C# parser, as it is kept in sync with Microsoft's C# compiler:
http://tirania.org/blog/archive/2010/Apr-27.html
However, you seem to be asking for one that works with SyntaxEditor. This is strange, as this control already has C# support:
Syntax Languages
Syntax languages are a core piece of the text/parsing
framework. They basically encapsulate all functionality for a
particular code language that is being used within a SyntaxEditor
control. This is everything from various types of parsing all the way
to simpler features like determining word breaks or performing line
commenting.
Over 20 full source sample language definitions are included with
SyntaxEditor for common languages like Assembly, Batch files, C, C++,
C#, CSS, HTML, INI files, Java, JScript, Lua, MSIL, Pascal, Perl, PHP,
PowerShell, Python, RTF, SQL, VB.NET, VBScript, and XML. Custom
language definition can easily be created, thereby making it possible
to build a code editor for any proprietary language.
http://www.actiprosoftware.com/products/controls/windowsforms/syntaxeditor/editing

Generate and parse Python code from C# application

I need to generate Python code, or to be more specific, IronPython code. I also need to be able to parse the code and load it into an AST. I just started looking at some tools. I played with "Oslo" and decided that it's not the right tool for me. I looked very briefly at Coco/R and it looks promising.
Does anyone use Coco/R?
If you did, what's your experience with the tool?
Can you recommend some other tool?
The IronPython implementation itself includes a parser and an AST representation of Python programs which can be walked with a PythonWalker.
Not really my area of expertise but you might want to try ANTLR 4. It has support for generating Python 2 and Python 3.
I think you should look at the Dynamic Language Runtime. This will be a standard part of some later version of .NET and C# (.NET 4, from memory).
I've used it to compile and execute Python code generated at runtime, but I haven't played with all the AST stuff yet.
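As a rough illustration of what parsing Python into an AST and walking it looks like, here is a minimal sketch using the `ast` module from CPython's standard library. This is not IronPython's parser or its PythonWalker; it only shows the same visitor idea. `SOURCE`, `NameCollector`, and `summarize` are hypothetical names made up for the example.

```python
import ast

SOURCE = """
def area(width, height):
    return width * height

print(area(3, 4))
"""


class NameCollector(ast.NodeVisitor):
    """Hypothetical visitor: records defined functions and directly called names."""

    def __init__(self) -> None:
        self.defined = []
        self.called = []

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        self.defined.append(node.name)
        self.generic_visit(node)  # keep walking into the function body

    def visit_Call(self, node: ast.Call) -> None:
        # Only plain calls like f(x); attribute calls such as obj.f(x) are ignored.
        if isinstance(node.func, ast.Name):
            self.called.append(node.func.id)
        self.generic_visit(node)  # arguments may contain nested calls


def summarize(source: str):
    """Parse source text into an AST and return (defined, called) name lists."""
    tree = ast.parse(source)
    collector = NameCollector()
    collector.visit(tree)
    return collector.defined, collector.called


if __name__ == "__main__":
    print(summarize(SOURCE))
```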

Parser builders for C#/.NET

I want to write a simple DSL in C#. Nothing too complicated. I'm looking for the .NET equivalent of Lex & Yacc. The easiest one I've found so far is GOLD Parser builder. The other choice is to use the lex & yacc available with F#, but I'm not keen to program in F# right now.
If you have any suggestions for the .NET version of Lex or Yacc, I'd love to hear them!
Thanks!
If you really want to stay in C#, I would recommend using the Irony toolkit - it allows you to specify grammars in C# code.
ANTLR 3 has a C# target.
How much F# programming do you need to do to take advantage of its Lex & Yacc? Can you put what you need into an F# DLL and reference it from a C# project?
I don't know if it's what you're looking for, but Oslo has the ability to create a textual DSL.
It's got more features than that, but you can ignore all the repository and Grand Vision stuff and just produce a grammar you can use to parse your DSL into an AST. Alternatively, you can take advantage of the built-in support for parsing into a set of rows in a database.
There is also a C# Lex and a C# CUP that I found on a Vienna University website:
C# Lex Manual

How to manipulate C# AST?

I am working on a reverse engineering school project, which requires me to translate and manipulate the AST of a compiled C# project. I have seen the post on "Translate C# code into AST?" on this website, but it doesn't look like what I am looking for.
As far as I know, C# currently doesn't provide a library class like the one available for Java: http://help.eclipse.org/help33/index.jsp?topic=/org.eclipse.cdt.doc.isv/reference/api/org/eclipse/cdt/core/dom/ast/ASTVisitor.html. If there is such a library class in C#, everything here is solved.
I have consulted with someone, and here are the possible solutions. But I have problems working out these solutions as well:
Find another compiler that provides a library which allows its AST to be exposed for manipulation. But I can't find a compiler like that.
Use the ANTLR parser generator to come up with my own compiler that does that (it will be a much more difficult and longer process). The download there provides sample grammars for different languages but not C# (it can generate parsers written in various languages, including C#, but has no grammar for parsing C#). Hence the problem is that I can't find a C# grammar.
What is the shortest and fastest way to approach this issue? If I really have to take one of the alternatives above, how should I go about solving the problems I faced?
I know the answer for this one was accepted long ago. But I had a similar question and wasn't sure of the options out there. I did a little investigation of the NRefactory library that ships as part of SharpDevelop. It does generate an AST from C# code.
Here's an image of the NRefactory demo application that is part of the SD source code. Type in some C# code and it generates and displays the AST in a treeview.
Why don't you try NRefactory? I've seen it discussed for AST work on some SharpDevelop forums.
Here is an article on CodeProject regarding this topic.
ANTLR is not a good choice. I am now trying out Mono Cecil instead. Mono Cecil is good for analyzing any source code that can be compiled into Common Intermediate Language (CIL). The disadvantage is that it doesn't have proper documentation.
I've just posted an answer on another thread here at StackOverflow describing a solution where I implemented an API to create and manipulate an AST from C# source code.
A full C# 3.0 parser is available with our DMS Software Reengineering Toolkit (DMS for short). It has been used to process tens of thousands of C# files accurately. It provides automated AST building, tree traversals,
surface-syntax pattern matching and transformation and lots more.
As a commercial product it might not work out for a student project.
ANTLR arguably offers a C# parser, but I don't know how complete or robust it is,
or whether it actually builds ASTs.
[EDIT Jan 25 2010: C# 4.0 parser now available for DMS with all the above properties]
[EDIT May 2016: C# 6.0 parser available for DMS.]

C# ANTLR grammar?

I'm looking for a turn-key ANTLR grammar for C# that generates a usable Abstract Syntax Tree (AST) and is either back-end language agnostic or targets C#, C, C++ or D.
It doesn't need to support error reporting.
P.S. I'm willing to do hardly any fix-up, as the alternative is not very hard.
This may be waaaay too late, but you can get a C# 4 grammar.
Here's a C# grammar link, as well as an overview of C# and ANTLR. There are others for the other languages you mentioned here.
The DMS Software Reengineering Toolkit provides a full, validated grammar for C# 1.2, 2.0 and 3.0 with generics and LINQ expressions.
It automatically builds ASTs and gives you programmatic access to them for analysis or transformation, or you can apply source-to-source transformations that also directly manipulate the tree. The resulting AST can be prettyprinted back to source code, even retaining indentation and comments.
DMS also has mature front ends for other languages such as Java, PHP5, JavaScript, COBOL, C and C++.
EDIT: 1/31/2010: The DMS C# parser has been extended to handle full C# 4.0.
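The parse, transform, and print-back workflow described in this answer can be sketched in a few lines with the `ast` module from Python's standard library (Python 3.9 or later for `ast.unparse`). This is not DMS and says nothing about its API; it only illustrates the general idea on Python source. `RenameFunction` and `rename` are hypothetical names for the example. Unlike what is claimed for DMS above, `ast.unparse` does not retain comments or the original layout.

```python
import ast


class RenameFunction(ast.NodeTransformer):
    """Hypothetical transform: rename a function and plain references to it.

    Naive on purpose: it renames every Name node with a matching identifier,
    without any scope analysis.
    """

    def __init__(self, old: str, new: str) -> None:
        self.old, self.new = old, new

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:
        self.generic_visit(node)  # transform the body first
        if node.name == self.old:
            node.name = self.new
        return node

    def visit_Name(self, node: ast.Name) -> ast.AST:
        if node.id == self.old:
            node.id = self.new
        return node


def rename(source: str, old: str, new: str) -> str:
    """Parse source, apply the rename on the tree, and print it back as text."""
    tree = ast.parse(source)
    tree = RenameFunction(old, new).visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


if __name__ == "__main__":
    print(rename("def f(x):\n    return x + 1\n\ny = f(2)  # call\n", "f", "g"))
```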
You can find a C# 6 ANTLR grammar in the official grammars repository.
