Extending the Mono C# compiler: is there any documentation or precedent?

Extending the Mono C# compiler: is there any documentation or precedent? - c#

I am currently involved in some interesting programming language research which has, up until now, centred around extending the upcoming Java 7.0 compiler with some very powerful programmer-productivity-based features. The work should be equally applicable to related programming languages such as C#.
I'm currently scoping out the options for prototyping a C# port of the functionality. I would prefer open-source options so that the fruits of this work can be shared with the broadest-possible audience. Thus the Mono C# compiler seems to be the most obvious starting point. I'm an experienced C# developer so writing the code isn't the problem. I'm mainly concerned about extending the compiler in a maintainable and supported fashion. In the Mono FAQ on the subject (link) it is stated that "Mono has already been used as a foundation for trying out new ideas for the C# language (there are three or four compilers derived from Mono's C# compiler)". Unfortunately, there are no further pointers than this and, so far, Google searches have not turned anything up.
I'm wondering if anybody out there has any information on this. Do mcs/gmcs/dmcs have a standard extensibility model? Specifically, I will be performing some interesting transformations on a program's abstract syntax tree. Is there a standard mechanism for inserting functionality into the compiler chain between abstract syntax tree generation and the type checker and then code generation?
Up until now I've written some ad-hoc extensions to the code (primarily in the code generator) but this doesn't seem to be a maintainable solution especially given that I intend to keep my extensions up to date with the Git trunk of Mono as much as possible. Furthermore it would be nice to be able to make updates to my extensions without having to recompile the whole compiler every time I make a change. I would like to be able to wrap all my AST manipulations into a single .NET assembly that could be dynamically loaded by mcs/gmcs/dmcs without having to hack at the core compiler code directly.
Any thoughts or pointers on extending the Mono C# compiler would be gratefully received!
UPDATES (23 October 2010)
In response to the responses to my question, I decided that I would start working on a branch of Mono in order to create a simple extensibility model for the compiler. It's in its very early stages, but here it is at GitHub:
http://github.com/rcook/mono-extensibility
And the main commit is: http://github.com/rcook/mono-extensibility/commit/a0456c852e48f6822e6bdad7b4d12a357ade0d01
If anybody would be interested in collaborating on this project, please let me know!

Unfortunately, I cannot adequately answer your question, but if you look at the examples of C# extensions on Miguel de Icaza's blog, you will notice that all of them take the form of patches to the compiler, not plugins or extensions. This seems to indicate that there is no such API.
Note that all of these examples are of much smaller scope than what you seem to be working on:
Parameterless Anonymous Methods (this post actually explicitly mentions concerns about the maintainability of such language extensions)
String Interpolation
Destructuring Assignment for Tuples
Syntactic Sugar for IEnumerable
These are mostly localized syntactic sugar, with no "interesting" behavior. The fourth patch, for example, implements Cω's syntactic sugar for IEnumerables, but without any of Cω's semantics that make this syntax interesting. If you look at the patch you can see that it literally does stupid syntactical expansion of ~T → IEnumerable<T>, as opposed to Cω, where member access and method invocation are properly lifted over streams.
Microsoft Research's Phoenix Compiler Pipeline was once explicitly touted as the solution to such extensibility problems, but it seems that it now focuses mostly on optimizations and analysis on the IR level in a code generation backend. In fact, I'm not even sure if the project is even still alive.

The mono C# compiler is a bit of a hack. I spent around a week figuring out how to use information from the parse tree. The compiler does not produce any intermediate representation and code generation may break parts of the parse tree.
Still, the parser and tokenizer might prove useful to you and you just take it from there.
SharpDevelop also provides a C# parser.
The SharpDevelop parser is easier to use than the mono C# parser.
If F# also works for you, I would recommended. The source much cleaner than mono and available under open source license.

Related

Would aspect oriented programming be a positive addition to the C# language?

I've had some interesting debates with colleagues about the merits if incorporating aspect oriented programming as a native paradigm to the C# language.
The debate seems to be divided into three camps:
Those folks who think that C# is already too complicated as it is, and another major feature like AOP would only muddy the waters further.
Those who think that it would be a great addition because anything that can increase the expressiveness of the language without breaking existing is a good thing.
Those who don't think it's necessary because libraries like PostSharp that perform post-compilation IL weaving already allow it in a language neutral way.
I'm curious what the community of C#/.NET developers out there think.

It would be great if languages would make it easier to develop and use AOP extensions.
For instance:
It would be nice if one could give a delegate (or anonymous method, or lambda) as a parameter to a custom attribute. It's not a lot of work to implement this in C#, it's quite easy to implement it in the CLR (since it supports types, why not methods?). And it would allow to express 'pointcuts' in an elegant way.
Support for 'fieldof' and 'methodof'. It is somewhat supported by the CLR (with bugs), not by C#. The same for 'eventof' and 'propertyof' (they have currently no support in the CLR).
Better debugging symbols could make it easier for an aspect weaver to report error messages and give the location in code.
It would be great to have a modular compiler; it would be less expensive to implement some features like source code generation based on aspects (for method and interface introductions).
That said, I don't think that the language should provide AOP extensions. This is too large (I think PostSharp 2.0 is more complex than the C# compiler itself, at least than C# 2.0). Let's face it: AOP is still rather experimental in the sense that we still don't know exactly what we want from it. There is still little experience. But we want the specification of a language to be stable and to address well-understood problems (imagine the Entity Framework were a part of the language).
Additionally, there are different ways to achieve AOP, and build-time is only one of them. There is nothing wrong in using runtime technologies, like JIT-emitted proxies (Spring/Castle); these are just for different use cases and have their own pros and cons.
So my opinion in one sentense: yes for limited and well-defined language extensions that make it easier to develop AOP frameworks; no for a full AOP implementation in the language.

I agree with the first camp. C# is already loaded with features. Leverage the PostSharp library instead.

It would be useful, but many of the uses are being included in various ways as it is. For example, we will have the ability to do post and pre validation in .NET4, which was one use for using around, in aspectJ.
By using extension methods you can inject new methods into objects you may not have source code for.
And, I don't believe it would be used much as C# developers seem to approach problems differently than Java programmers, which is easy to do since the two languages have diverged so much now.
I don't know if the companies that tend to use .NET would want to use something like AOP, and you would need tools to help understand what aspects are being injected where, such as AJDT on Eclipse.

Why is post-compilation code injection a better idea than pre-compilation code injection?

So we all know that C# doesn't have a C-like macro pre-processor (and there's a good thread on why here). But now that AOP is gaining traction, it seems like we're starting to do stuff with post-processors that we used to do with pre-processors (bear in mind that I am only getting my feet wet with PostSharp so am perhaps off base).
I am a huge fan of attributes in C#, but if a pre-processor was left out for good reasons (which, as a former MFC user I still question but nevertheless accept) why is post-compilation code injection a better idea than pre-compilation code injection?

The reasons why I chose post-compilation when designing PostSharp 5 years ago are:
Language agnosticism.
MSIL has stabler specifications compared to high-level languages (which have non-trivial updates every second year).
Most of the time, MSIL is the level of abstraction you need when dealing with aspects. You don't need to know all the equivalent constructs (think f 'using' and 'try-finally').
Before 2008, nobody has succeeded in producing a decent C# compiler. The difficulties met by Mono were impressive enough, even if they have caught up now.
Dealing with binary seemed much faster than dealing with source code.
Dealing with a binary assembly makes it possible to execute it -- the assembly being processed can transforme itself. It was unheard before PostSharp Laos was first released.
That said, implementations of AOP for C/C++ are indeed a pre-compiler (WeaveC) and implementations in Java are a compiler extension (for the good reason that there are many OSS implementations of the Java compiler).
-gael

Technically, there is a pre-compilation option for C# built into Visual Studio: The Text Template Transformation Toolkit (T4). This allows you to do pretty amazing things in a pre-compilation step, and is the basis of quite a few products, such as some ORMs, etc.

If you were to do pre-compilation you would have to interpret the source files from all the different languages you support then generate code in that language before it gets passed to the compiler. With post-processing you can simply use reflection to examine the assemblies whether the original language was C#, Visual Basic, or whatever.

It is just simpler. IL is a heckofalot easier to parse than C# source code. And it is language agnostic.

Should C# and Java incorporate each others syntax?

I've been jumping from C# to Java an awful lot and the "differences" between the two are a bit of an annoyance.
Would it be possible to write a set of extentions/plugins that would merge the two languages syntaxes.
I'm talking about adding either IDE support or using language constructs that would for example:
treat these two lines equivalently:
System.out.println("Blah");
Console.out.writeline("Blah");
Automatically notice that when you type in string you mean String
Recognise common API calls and translate them in the background.
The end goal being to be able to write a java/C# program and to pick at compile time which VM/Runtime you are targeting.
If you could do this would it be a good idea?
If not why not?
The two languages are so similar it's painful in some aspects but in other aspects they are really different.
I've seen Code that will translate a C# project into Java and I'm assuming there is probably the reverse, what I am proposing is a middle ground, so we can all just "get along".

No, absolutely not. Certainly not in the languages themselves (as implied by the title) and preferably (IMO) not in the IDEs (as requested in the body).
They are different languages. The idioms and conventions are subtly different. I don't want to be thinking in Java when I'm writing C# or vice versa. I believe developers should be actively encouraged to separate their thinking. It's not too hard to switch between the two, but that switch should be present, IMO.

While I totally agree with Jon Skeet, if you must have this why not create your own library of Java API so you can create System.out namespace which has a method call printLn which calls Console.Writeline()?
That gets you close to what you want.

Just because Java and C# share some similar syntax you need to see past this and think in terms of Java Platform and .NET Platform. The two are distinctly different, so my answer is definitely not.

There actually already is a Java language for the .NET framework, developed by microsoft: J#
This way you get the java-syntax but you are still developing with the .NET framework.
But i am not recommending anyone to use it.
I knew Java before i knew C# so i tried out J# because i thought it would be an easier transition. At first I liked it but after I tried C# I'm never going back. First of all, nobody uses J# so it's kinda hard to find examples and tutorials. Second, C# has (IMO) much more convenient syntax, specially for events, properties, lambda, anonymus methods and alot of other things, it's also being updated every now and then with even more syntax sugar which i don't think J# is.
Maybe if you often write Java and sometimes have to write a .net app it might be a good option.

I think no. I also switch from java to c#. But if the syntax is identical was is to stop someone from trying to compile c# in a Java compiler, or vice-versa.

Visual Studio actually ships with a Java to C# converter, which tries to do some of the things you mention. Unfortunately it fails miserably (1) for anything beyond the simple hello world application.
Despite being very similar on the surface, there are many significant differences between Java and C#, so you would achieve very little by doing what you suggest imo.
(1) To be fair, it actually does a fairly good job if you consider the limitations given for such a task, but in practice the resulting code is of limited use and you have to do a lot of clean up after the conversion.

Firstly what you are describing is not a difference in language syntax but a differences in class libraries. Both languages are relatively simple in terms of keywords and features but understanding or knowing the libraries and how they operate requires considerable learning.
The mistakes you are describing are things that the developer should not be making to begin with - the IDE should not be guessing. There are going to be many cases where you can't easily / trivially translate between java or dotnet. In the end a skilled developer learns and knows when and which class libraries to use.

Actually in the beginning there was no dotnet - microsoft was behind java. They however proceeded to change java in ways not compatible with the java plstform standard. To paraphrase sun sued microsoft and won I'm court. Following that ms proceeded to create dotnet and particularly c# which became microsofts VM platform. Of course along the way a whole stack of things got changed. Microsoft introduced many things which broke Javas run anywhere etc. They have done the same thing with dotnet which have cause problems for the mono team to be able to faithfully reimplemwnt everything for other non windows platforms.
• String vs string.
• lowercase method names (java) v uppercase method names(dotnet).
• Giving java keywords new names - "package".
In the end dotnet was microsoft response so they can control the platform and do their own thing instead of following a standar

How do I move from Java to C#?

I know Java well. Which caveats and resources will help me cross to the other side (C#) as painlessly as possible.

Biggest tip: go with the .NET naming conventions from the word go. That way you'll constantly be reminded about which language you're in. (Sounds silly, but it really is helpful.) Embrace the idioms of the language as far as possible.
There are various books specifically for folks in your situation - search for "C# for Java" in Amazon and you'll get plenty of hits. It's worth reading carefully to make sure you don't assume that things will work the same in C# as in Java. (For instance, in C# instance variable initializers are executed before the base class constructor body; in Java they happen after. Subtle things like this can take a while to learn, and are easy to miss if you're skimming.)
If you're going to be using C# 3, I'd get a book which definitely covers that - everything in C# 3 will be new to you. Gratuitous plug: my own book (C# in Depth) covers C# 2 and 3, but assumes you already know C# 1. (In other words, it won't be enough on its own, but you may want it as a "second" book.)

See this great article on C# from a Java Developer's Perspective. It has several insights on the things that can be done in both sides to avoid minimum overhead. Having example in both the language you know and the language you want to learn eases the learning curve quite a bit.

Install Visual Studio 2008 and Resharper with IntelliJ IDEA key bindings. This gives you things like prompting you to include namespaces if you start using them.
Start a new project and start writing Java code, when you run into something that doesn't work properly or it's unable to find the class you're trying to use Google "PrintLn in c#".
Write tests or code snippets for sanity checks, like you may want to check if == works for strings (it does)
realize that c# alias Data Types (int is an alias for System.Int32, string for System.String)
look at other peoples code I recommend JP Boodhoos Google code
Take a job in C#, there's lots of jobs requiring both Java and C# especially in support.
Know your libraries, most Java libraries have been ported and most of the time the name is either like (Hibernate => NHibernate) or (Xstream => Xstream.Net). Not every library has an obvious name so just start looking into random ones you hear about here. ie (Rhino.Mocks,HTMLAgilityPack,MBUnit,Rhino.Commons,Castle Project)
Go to usergroup meetings look for a DNUG (Dot Net User Group) they'll be helpful and you can get some good advice.

There's a cheat-sheet from Microsoft for Java developers using C# :)

I know that a good answer has already been accepted. However, I'd like to make an addition...
I find that learning a new language typically involves learning subtle syntactic differences....especially when dealing with the difference between languages in the C/C++/Java/C# family.
In addition to a nice thick reference book I recommend getting a pocket reference like C# 3 Pocket Reference from O'Reilly. It won't help you with the design patterns etc...but will provide a very quick reference about the specific differences of the language you are using.
Here's a quick blurb about this book from that site:
C# 3.0 Pocket Reference includes plenty of illustrations and code examples to explain:
Features new to C# 3.0, such as lambda expressions, anonymous types, automatic properties, and more
All aspects of C# syntax, predefined types, expressions, and operators
Creating classes, structs, delegates and events, enums, generics and constraints, exception handling, and iterators
The subtleties of boxing, operating overloading, delegate covariance, extension method resolution, interface reimplementation, nullable types, and operating lifting
LINQ, starting with the principles of sequences, deferred execution and standard query operators, and finishing with a complete reference to query syntax-including multiple generators, joining, grouping, and query continuations
Consuming, writing, and reflecting on custom attributes
I used this book (well the original) to help me go from being a Java to a C# developer. While I was learning, I kept it by my desk at all times and it really helped.

I made the transition pretty easily by using C# at work, but one of the most important things to do is familiarize yourself with the .NET API and some of the powerful techniques available in C#.
After I learned the .net library I relied on it a lot more than I used to, so learning about the things it can do for you is very helpful. After that, if you work with db code at all, learn LINQ, and also techniques lambas, anonymous types and delegates are also a useful to pick up.

The language syntax is vary similar, so I should only read a small reference of the C# syntax. Like a simple book (for experienced programmers) or maybe wikipedia (http://en.wikipedia.org/wiki/Comparison_of_Java_and_C_Sharp) will tell enough.
The biggest difference is the library: Asp.Net websites are totally different from java servlets.
Don't read much, just start programming!

Here's a link that has syntax comparison between Java and C# (even though it's almost identical, there are a few differences).
http://www.harding.edu/fmccown/java1_5_csharp_comparison.html

Use Sharpen to convert your Java programs to C# and see the differences.

How should I convert Java code to C# code?

I'm porting a Java library to C#. I'm using Visual Studio 2008, so I don't have the discontinued Microsoft Java Language Conversion Assistant program (JLCA).
My approach is to create a new solution with a similar project structure to the Java library, and to then copy the java code into a c# file and convert it to valid c# line-by-line. Considering that I find Java easy to read, the subtle differences in the two languages have surprised me.
Some things are easy to port (namespaces, inheritance etc.) but some things have been unexpectedly different, such as visibility of private members in nested classes, overriding virtual methods and the behaviour of built-in types. I don't fully understand these things and I'm sure there are lots of other differences I haven't seen yet.
I've got a long way to go on this project. What rules-of-thumb I can apply during this conversion to manage the language differences correctly?

Your doing it in the only sane way you can...the biggest help will be this document from Dare Obasanjo that lists the differences between the two languages:
http://www.25hoursaday.com/CsharpVsJava.html
BTW, change all getter and setter methods into properties...No need to have the C# library function just the same as the java library unless you are going for perfect interface compatibility.

Couple other options worth noting:
J# is Microsoft's Java language
implementation on .NET. You can
access Java libraries (up to version
1.4*, anyways).
*actually Java 1.1.4 for java.io/lang,
and 1.2 for java.util + keep in mind that J# end of
life is ~ 2015-2017 for J# 2.0 redist
Mono's IKVM also runs Java on
the CLR, with access to other .NET
programs.
Microsoft Visual Studio 2005 comes
with a "Java language conversion
assistant" that converts Java
programs to C# programs
automatically for you.

One more quick-and-dirty idea: you could use IKVM to convert the Java jar to a .NET assembly, then use Reflector--combined with the FileDisassembler Add-in--to disassemble it into a Visual C# project.
(By the way, I haven't actually used IKVM--anyone care to vouch that this process would work?)

If you have a small amount of code then a line by line conversion is probably the most efficient.
If you have a large amount of code I would consider:
Looking for a product that does the conversation for you.
Writing a script (Ruby or Perl might be a good candidate) to do the conversion for you - at least the monotonous stuff! It could be a simple search/replace for keyword differences and renaming of files. Gives you more time/fingers to concentrate on the harder stuff.

I'm not sure if it is really the best way to convert the code line by line especially if the obstacles become overwhelming. Of course the Java code gives you a guideline and the basic structure but I think at the end the most important thing is that the library does provide the same functionality like it does in Java.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.