Parser builders for C#/.NET

I want to write a simple DSL in C#. Nothing too complicated. I'm looking for the .NET equivalent of Lex & Yacc. The easiest one I've found so far is GOLD Parser builder. The other choice is to use the lex & yacc available with F#, but I'm not keen to program in F# right now.
If you have any suggestions for the .NET version of Lex or Yacc, I'd love to hear them!
Thanks!

If you really want to stay in C#, I would recommend using the Irony toolkit - it allows you to specify grammars in C# code.

ANTLR 3 has a C# target.

How much F# programming do you need to do to take advantage of its Lex & Yacc? Can you throw what you need into an F# DLL and reference it from a C# project?

I don't know if it's what you're looking for, but Oslo has the ability to create a textual DSL.
It's got more features than that, but you can ignore all the repository and Grand Vision stuff and just produce a grammar you can use to parse your DSL into an AST. Alternatively, you can take advantage of the built-in support for parsing into a set of rows in a database.

There is also a C# Lex and a C# CUP that I found on a Vienna University website:
C# Lex Manual

Related

How to define a DSL over C#

For a little night project I would like to write a validation component that could be used in a .NET application to do the usual, tedious validation of objects, input parameters and postconditions.
My first idea was to put all this validation setup logic into an XML configuration file and provide a fluent interface for people who would rather have it in code.
Because I would like to deliver something that is actually usable, I thought about providing a specialized DSL (domain-specific language). The question is: what tools should I use to do this?
I thought about parsing it by hand using regex. But personally I would like to have something more...usable.
So what would you suggest?
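To make the "by hand using regex" option concrete, here is a minimal sketch of a regex-driven tokenizer for a validation rule such as `order.quantity <= 100`. The rule syntax, the token names and the `tokenize` function are all invented for illustration, and the sketch is in Python only to keep it short; the same shape carries over to C# with `System.Text.RegularExpressions`.

```python
import re

# Hypothetical token kinds for a tiny validation DSL (an assumption, not a real tool).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_.]*"),
    ("OP", r"<=|>=|==|!=|<|>"),
    ("SKIP", r"\s+"),
]
# One alternation of named groups; the first alternative that matches wins.
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split text into (kind, value) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

This only covers lexing. Turning the tokens into rules still needs a parser on top, which is the point where a generator such as the ones suggested in the answers starts to pay off.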
It sounds like you're talking about implementing one of .NET 4.0's features, code contracts.
So I guess my recommended tool would be Visual Studio 2010.
If you're looking specifically at a DSL, have a look at the ANTLR project. We've used it at my company quite successfully in the past.
The thing with DSLs is that they are rarely effective in isolation. To be useful for writing real software, you really need to be able to embed the DSL inside a host language. Compare, for example, the way LINQ works versus plain SQL. Another good example is the XML literal feature in VB. Both let you write real code in a general-purpose programming language and interweave it with simpler declarative DSL code.
The result is something much more powerful than standalone SQL or a simple XML editor.
The downside to this, unfortunately, is that neither C# nor VB offers any metaprogramming features, so the only way to do that for mainstream .NET devs is to build your own language. If this is something you are doing just for fun, you might be able to modify the Mono C# compiler to add the features you are interested in to the language. Another alternative might be to try Ruby. It has a flexible syntax which lets you get away with a lot of craziness. Personally, however, I would prefer the hacked C# approach.
Might want to check out Building Domain Specific Languages in Boo (Boo is a CLR language, concepts should carry over to C#). An example project is Simple State Machine.
Take a look at Oslo from Microsoft. I'm not sure if it does what you want, but I know that you can build DSL parsers, specify grammars, and have it generate class libraries that can parse data based on the grammar.
More details of the "M" language and how to use it are here.
Why go so far? Start off using generics and a fluent interface. Start simple and work it through some production. If the friction is too high dealing with a fluent interface, then look at using a DSL.
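As a rough illustration of how little a fluent interface needs, here is a minimal sketch. The `Validator` class, its method names and the `Order` example are all made up for this answer, and it is written in Python for brevity, so it shows only the method-chaining part; in C# the same shape would typically be a generic `Validator<T>` taking lambdas.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class Validator:
    """Hypothetical fluent validator: collects named rules, then applies them."""
    rules: List[Tuple[str, Callable[[Any], bool]]] = field(default_factory=list)

    def must(self, description, predicate):
        self.rules.append((description, predicate))
        return self  # returning self is what makes the calls chainable

    def not_none(self, attr):
        return self.must(f"{attr} must be set",
                         lambda obj: getattr(obj, attr, None) is not None)

    def in_range(self, attr, low, high):
        return self.must(f"{attr} must be between {low} and {high}",
                         lambda obj: low <= getattr(obj, attr) <= high)

    def validate(self, obj):
        """Return the description of every rule that obj fails."""
        return [desc for desc, pred in self.rules if not pred(obj)]


@dataclass
class Order:
    customer: Any
    quantity: int


# The rule set reads almost like a sentence, which is the appeal of the approach.
order_rules = Validator().not_none("customer").in_range("quantity", 1, 100)
```

Calling `order_rules.validate(some_order)` returns a list of failure messages, empty when the object passes. If chains like this become awkward to read or maintain, that is the friction that would justify moving to a real DSL.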
You can try DSL Tools for Visual Studio.
You should look at the Irony project at http://irony.codeplex.com/

Using IronPython to learn the .NET framework, is this bad?

Because I'm a Python fan, I'd like to learn the .NET framework using IronPython. Would I be missing out on something? Is this in some way not recommended?
EDIT:
I'm pretty knowledgeable about Java (so learning or using a new language is not a problem for me). If needed, will I be able to use everything I learned in IronPython (excluding language features) to write C# code?
No, sounds like a good way to learn to me. You get to stick with a language and syntax that you are familiar with, and learn about the huge range of classes available in the framework, and how the CLR supports your code.
Once you've got to grips with some of the framework and the CLR services you could always pick up C# in the future. By that point it will just be a minor syntax change from what you already know.
Bear in mind that if you are thinking with respect to a career, you won't find many IronPython jobs, but like I say, this could be a good way to learn about the framework first, then build on that with C# in a month or two's time.
You can definitely do that to learn the class library, but I'm not sure if it's such a good idea when it comes to fundamental CLR concepts (e.g. delegates and events). You'll need to pay attention and distinguish what is strictly an IronPython feature from what is a CLR feature exposed in IronPython in a way that matches its dynamic semantics better.
If I wanted to just "learn the framework", I would do it in C# or VB for two main reasons:
Intellisense - the framework is huge, and being offered suggestions for function overloads is one of the ways to find new stuff. There's almost no good intellisense for the framework with IronPython at the moment (Michael Foord has done some work on building the appropriate info for Wing, but I haven't tried it myself).
Code samples - pretty much all the educational material that exists about the .NET framework is given with C# or VB. You'll be much more on your own with IronPython.
I find .NET a lot easier to learn with intellisense. If you can get IronPython to work in Visual Studio as a first-class language, go for it. If you try, please document it!
Hmmm: http://www.codeplex.com/IronPythonStudio

Tokenizer for C#?

Is there any functionality built into the .NET framework somewhere to tokenize C# code? I'm not looking to build a tokenizer in C#, I'm looking for something that can tokenize C# source code.
The only thing that comes to mind is a parser generator like ANTLR, which has a sample C# grammar available. Bison/Flex also looks like it has a pretty decent C# grammar. Parsing any language and then actually making sense of it is fairly difficult, so I wish you the best of luck.
No, not built into the framework.
However, you may want to look at Irony and at C# Parser on CodePlex, as they both provide a parser/lexer for at least simple C#.
The GOLD Parser too has a C# grammar (to parse C#), and run-time engines written in C# (so that you can execute that grammar using C# code).

How to manipulate C# AST?

I am working on a reverse engineering school project, which requires me to translate and manipulate the AST of a compiled C# project. I have seen the post "Translate C# code into AST?" on this website, but it doesn't look like what I am looking for.
As far as I know, C# currently doesn't provide a library class like this one for Java: http://help.eclipse.org/help33/index.jsp?topic=/org.eclipse.cdt.doc.isv/reference/api/org/eclipse/cdt/core/dom/ast/ASTVisitor.html. If there is such a library class for C#, everything here is solved.
I have consulted with someone, and here are the possible solutions. But I have problems working these out as well:
Find another compiler that provides a library which allows its AST to be exposed for manipulation. But I can't find a compiler like that.
Use the ANTLR parser generator to come up with my own compiler that does that (it will be a much more difficult and longer process). The download there provides sample grammars for different languages but not for C# (it can generate parsers in various target languages, including C#, but has no grammar for C# itself). Hence the problem is that I can't find a C# grammar.
What is the shortest and fastest way to approach this issue? If I really have to take one of the alternatives above, how should I go about solving the problems I faced?
I know the answer for this one was accepted long ago. But I had a similar question and wasn't sure of the options out there. I did a little investigation of the NRefactory library that ships as part of SharpDevelop. It does generate an AST from C# code.
Here's an image of the NRefactory demo application that is part of the SD source code. Type in some C# code and it generates and displays the AST in a treeview.
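For readers who haven't worked with a visitor-style AST API before, the general workflow (parse, visit, rewrite, print back) looks roughly like the sketch below. It uses Python's standard `ast` module (Python 3.9+ for `ast.unparse`) purely as a stand-in; it is not NRefactory's API, and the `RenameVariable` class is invented for the example.

```python
import ast


class RenameVariable(ast.NodeTransformer):
    """Hypothetical transformer: renames every use of one variable."""

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def visit_Name(self, node):
        # Called once for each Name node; return a replacement or the node itself.
        if node.id == self.old:
            return ast.copy_location(ast.Name(id=self.new, ctx=node.ctx), node)
        return node


tree = ast.parse("total = price * count")            # source text -> AST
tree = RenameVariable("price", "unit_price").visit(tree)  # walk and rewrite
ast.fix_missing_locations(tree)
result = ast.unparse(tree)                           # AST -> source text
```

A C# library that exposes the AST would be used the same way in outline: subclass a visitor, override the method for the node kind you care about, and return the modified node.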
Why don't you try NRefactory? I've seen it discussed for AST work on some SharpDevelop forums.
Here is an article on CodeProject regarding this topic.
ANTLR is not a good choice. I am now trying out Mono Cecil instead. Mono Cecil is good for analyzing any source code that can be compiled into Common Intermediate Language (CIL). The disadvantage is that it doesn't have proper documentation.
I've just posted, in another thread here on Stack Overflow, a solution in which I implemented an API to create and manipulate an AST from C# source code.
A full C# 3.0 parser is available with our DMS Software Reengineering Toolkit (DMS for short). It has been used to process tens of thousands of C# files accurately. It provides automated AST building, tree traversals, surface-syntax pattern matching and transformation, and lots more.
As a commercial product it might not work out for a student project.
ANTLR arguably offers a C# parser, but I don't know how complete or robust it is, or whether it actually builds ASTs.
[EDIT Jan 25 2010: C# 4.0 parser now available for DMS with all the above properties]
[EDIT May 2016: C# 6.0 parser available for DMS.]

C# ANTLR grammar?

I'm looking for a turn-key ANTLR grammar for C# that generates a usable Abstract Syntax Tree (AST) and is either back-end language agnostic or targets C#, C, C++ or D.
It doesn't need to support error reporting.
P.S. I'm not willing to do much fix-up, as the alternative is not very hard.
This may be waaaay too late, but you can get a C# 4 grammar.
Here's a C# grammar link, as well as an overview of C# and ANTLR. There are others for the other languages you mentioned here.
The DMS Software Reengineering Toolkit provides a full, validated grammar for C# 1.2, 2.0 and 3.0 with generics and LINQ expressions.
It automatically builds ASTs and gives you programmatic access to them for analysis or transformation, or you can apply source-to-source transformations that also directly manipulate the tree. The resulting AST can be prettyprinted back to source code, even retaining indentation and comments.
DMS also has mature front ends for other languages such as Java, PHP5, JavaScript, COBOL, C and C++.
EDIT: 1/31/2010: The DMS C# parser has been extended to handle full C# 4.0.
You can find a C# 6 ANTLR grammar in the official grammars repository.
